AutoHotkey Community Let's help each other out
SKAN
Joined: 26 Dec 2005 Posts: 7186
|
Posted: Fri Nov 20, 2009 1:22 am Post subject: StrX() :: Auto-Parser for XML / HTML |
|
|
StrX() is a wrapper that extends SubStr()'s functionality. It accepts two strings as extremes ( begin & end ) and extracts the text between them. It is much like
RegExMatch( Str, "BeginStr(.*)EndStr", SubPat ), but the major difference is that StrX() allows flexibility in the final length of the resultant string. To be precise, it can trim or expand characters at either end, or both ends, of the resultant string.
| Quote: |
Announcement: The current version 1.0 can auto-parse when used with a While loop. Please check out the updated examples.
|
StrX( H, BS,BO,BT, ES,EO,ET, NextOffset )
Parameters
- 1 ) H = HayStack. The "Source Text"
- 2 ) BS = BeginStr. Pass the String that marks the left extreme of the Resultant String
- 3 ) BO = BeginOffset.
Offset in "Source Text" from which the search for BeginStr starts ( it is passed to InStr() as StartingPos )
- Pass a 0 to search in reverse ( from right-to-left ) in "Source Text"
- If you intend to call StrX() from a Loop, pass the same variable used as the 8th Parameter, which simplifies the parsing process.
- 4 ) BT = BeginTrim.
Number of characters to trim from the left extreme of the Resultant String
- Pass the String length of BeginStr if you want to omit it from the Resultant String
- Pass a Negative value if you want to expand the left extreme of the Resultant String
- 5 ) ES = EndStr. Pass the String that marks the right extreme of the Resultant String
- 6 ) EO = EndOffset.
Can be only True or False.
If False, EndStr will be searched from the end of "Source Text".
If True, the search will start from the offset where BeginStr was found, or from offset 1, whichever is applicable.
- 7 ) ET = EndTrim.
Number of characters to trim from the right extreme of the Resultant String
- Pass the String length of EndStr if you want to omit it from the Resultant String
- Pass a Negative value if you want to expand the right extreme of the Resultant String
- 8 ) NextOffset : The name of a ByRef Variable that StrX() will update with the current offset. You may pass the same variable as Parameter 3 to simplify data parsing in a loop
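The effect of BeginTrim and EndTrim is easiest to see on a tiny string. The following is only a sketch in AutoHotkey v1 syntax: the sample text and the variable names R1 to R4 are made up for illustration, and the StrX() body at the bottom is copied unchanged from this post so that the snippet runs on its own.

```autohotkey
; Sample text is hypothetical. StrX() at the bottom is copied from the title post.
Str := "Name: <b>John</b>;"

R1 := StrX( Str, "<b>",1, 0, "</b>",1, 0 )   ; BT=0,  ET=0  : both markers are kept
R2 := StrX( Str, "<b>",1, 3, "</b>",1, 4 )   ; BT=3,  ET=4  : both markers are trimmed
R3 := StrX( Str, "<b>",1,-6, "</b>",1, 4 )   ; BT=-6        : left extreme expanded by 6 characters
R4 := StrX( Str, "<b>",1, 3, "</b>",1,-1 )   ; ET=-1        : right extreme expanded by 1 character

MsgBox, % R1 "`n" R2 "`n" R3 "`n" R4
; R1 = <b>John</b>
; R2 = John
; R3 = Name: <b>John
; R4 = John</b>;
Return

StrX( H, BS="",BO=0,BT=1, ES="",EO=0,ET=1, ByRef N="" ) { ; | by Skan | 19-Nov-2009
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
<0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
+(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html
}
```

A positive trim value shrinks the result from that side and a negative one grows it, which is the flexibility a fixed RegExMatch() pattern does not offer.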
Here are real-world examples that demonstrate StrX()'s functionality:
Example 1 : A Script to retrieve real-time details of the last 15 posts made in our forum.
| Code: | UrlDownloadToFile, http://www.autohotkey.com/forum/rss.php, ahkrss.xml ; 01
FileRead, xml, ahkrss.xml ; 02
While Item := StrX( xml , "<item>" ,N,0, "</item>" ,1,0, N ) ; 03
Title := StrX( Item, "<title>",1,7, "</title>",1,8 ) ; 04
, Link := StrX( Item, "<link>" ,1,6, "</link>" ,1,7 ) ; 05
, List .= "`n`n" A_Index ")`t" Title "`n`t" Link ; 06
MsgBox, 64, Latest Posts on AHK Forum, %List% ; 07 |
| Quote: | Note: The result of the above script may contain HTML formatting, as shown below:
15) Ask for Help :: &quot;Jump to&quot; video frame (i.e. &quot;seek&quot;
You may use UnHTM() on Title to convert it to proper text.
|
Example 2 : Download and extract links from a Google Search Result
| Code: | UrlDownloadToFile, % "http://www.google.com/search?hl=en&lr=&safe=active&rlz=1C1GGLS_enIN"
. "307IN307&num=10&q=site:autohotkey.com&aq=f&oq=&aqi=", Google.htm
FileRead, html, Google.htm
While Item := StrX( html, "<h3 class=r><a href=",N,0, "<li class=g>",1,12, N )
Sub1 := StrX( Item, "<a href=",1,9, """" ,1,1, T )
, Sub2 := StrX( Item, ">", T,1, "</a>",1,4 )
, Text .= UnHTM( Sub2 ) "`n" Sub1 "`n`n"
MsgBox, %Text% ; Dependency :: Get UnHTM() www.autohotkey.com/forum/viewtopic.php?t=51342 |
... and finally here is StrX()
| Code: | StrX( H, BS="",BO=0,BT=1, ES="",EO=0,ET=1, ByRef N="" ) { ; | by Skan | 19-Nov-2009
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
<0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
+(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html
} |
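For anyone who finds the one-liner hard to follow, here is a rough step-by-step unrolling of the same expression. It is a reading aid and not SKAN's code: the name StrX_Expanded() and the local variable names are hypothetical, and it assumes the same AutoHotkey v1 InStr() / SubStr() behaviour that the original relies on.

```autohotkey
; Quick check with a made-up string: extracts the text of the first <a> element.
MsgBox, % StrX_Expanded( "<a>1</a><a>2</a>", "<a>",1,3, "</a>",1,4 )   ; shows 1
Return

; Hypothetical unrolled equivalent of StrX() v1.0, for reading only.
StrX_Expanded( H, BS="", BO=0, BT=1, ES="", EO=0, ET=1, ByRef N="" ) {
    HLen := StrLen( H ), ELen := StrLen( ES )

    ; Step 1: find the start position P of the result.
    If ( StrLen( BS ) ) {
        T := InStr( H, BS, 0, ( BO < 0 ) ? 1 : BO )   ; never pass a negative offset to InStr()
        P := T ? T + BT : HLen + 1                    ; not found: point past the end of H
    } Else {
        P := 1                                        ; no BeginStr: start at the first character
    }

    ; Step 2: work out the length of the result.
    If ( ELen ) {
        T := InStr( H, ES, 0, EO ? P + 1 : 0 )        ; EO false: search in reverse from the end
        Len := T ? T - P + ELen - ET : HLen + P       ; not found: take everything that is left
    } Else {
        Len := HLen                                   ; no EndStr: take everything that is left
    }

    ; Step 3: report the offset just past the result and return the extract.
    N := P + Len
    Return SubStr( H, P, Len )
}
```

When BeginStr is not found, P points past the end of the haystack, so SubStr() returns an empty string. That empty result is what ends a While loop built around the function.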
_________________ Suresh Kumar A N
Last edited by SKAN on Sat Nov 21, 2009 6:06 pm; edited 9 times in total |
|
The Naked General
Joined: 22 Feb 2009 Posts: 12 Location: Dallas TX
|
Posted: Fri Nov 20, 2009 5:18 am Post subject: |
|
|
This is great Skan!
I'll have to go back and clean up some old parsing scripts with it. Thanks a bunch |
|
hugov
Joined: 27 May 2007 Posts: 2473
|
|
linpinger
Joined: 20 Oct 2007 Posts: 10 Location: china,hubei
|
Posted: Fri Nov 20, 2009 10:50 am Post subject: |
|
|
If I comment out line No. 03,
it goes into an endless loop.
Why does this happen?
My English is poor, ^_^ |
|
SKAN
Joined: 26 Dec 2005 Posts: 7186
|
Posted: Fri Nov 20, 2009 12:43 pm Post subject: |
|
|
"Title Post" Updated with Example 2
Download and extract links from a Google Search Result
On a related note here is Lexikos' COM version for the same:
http://www.autohotkey.com/forum/viewtopic.php?p=182714#182714
Last edited by SKAN on Fri Nov 20, 2009 9:06 pm; edited 1 time in total |
|
linpinger
Joined: 20 Oct 2007 Posts: 10 Location: china,hubei
|
Posted: Fri Nov 20, 2009 2:58 pm Post subject: |
|
|
Thanks for SKAN's reply!
I still don't understand:
when the search reaches the end of the string, why doesn't it stop and break?
I had to add some extra check code.
Adding these three lines inside the While loop makes it break:
| Code: |
if ( N < old )
break
old := N
|
|
|
SKAN
Joined: 26 Dec 2005 Posts: 7186
|
Posted: Fri Nov 20, 2009 9:17 pm Post subject: |
|
|
| linpinger wrote: | I still don't understand:
when the search reaches the end of the string, why doesn't it stop and break?
I had to add some extra check code.
Adding these three lines inside the While loop makes it break:
| Code: |
if ( N < old )
break
old := N
|
|
My code was at fault. I have rewritten the function, which is now posted at the top.
You do not have to add that code anymore. When used with a While loop, StrX() will
automatically parse the data and exit the loop gracefully.
Please test the updated examples and let me know the status.
| linpinger wrote: | | Thanks SKAN's Reply ! |
er.. You might find my reply missing as I have deleted it
... as it does not fit the current version of StrX() and may cause confusion.
Thank You. |
|
linpinger
Joined: 20 Oct 2007 Posts: 10 Location: china,hubei
|
Posted: Sat Nov 21, 2009 3:13 am Post subject: |
|
|
I have got the latest StrX().
It's completely great!
I noticed that the new Example 1 doesn't have
N := 1
It means that N is blank. Does it matter?
(The result is right, there is no problem.) |
|
SKAN
Joined: 26 Dec 2005 Posts: 7186
|
Posted: Sat Nov 21, 2009 9:04 am Post subject: |
|
|
| linpinger wrote: | I have got the latest StrX().
It's completely great! |
Thanks for testing it.
| linpinger wrote: | I noticed that the new Example 1 doesn't have
N := 1
It means that N is blank. Does it matter?
(The result is right, there is no problem.) |
It is a side effect. The code tests the value of BeginOffset to make sure a negative value is not being passed to InStr().
| Code: | | BO < 0 ? 1 : BO ; If BO is less than 0 use 1, otherwise use BO itself |
If you want to run both of the posted examples from the same script,
then you have to put N := 1 in between them to reset N
.. or you can name the variables differently, like N1 and N2
Maybe StrX() should reset N to 1 when it is about to return an empty string? |
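The need for the reset can be sketched with two small loops in one script. This assumes AutoHotkey v1 syntax; the two sample strings and the List1 / List2 names are made up, N is set to 1 explicitly before the first loop as well, and StrX() is copied unchanged from the title post so that the snippet stands alone.

```autohotkey
; Hypothetical sample data standing in for the downloaded files.
xml := "<item>one</item><item>two</item>"
htm := "<li>red</li><li>green</li>"

N := 1
While Item := StrX( xml, "<item>",N,6, "</item>",1,7, N )
    List1 .= Item "`n"

; Here N points far beyond StrLen( xml ), because the last call found no <item>.
; Without the next line the second loop would start searching from that stale
; offset, find nothing and exit at once.
N := 1
While Item := StrX( htm, "<li>",N,4, "</li>",1,5, N )
    List2 .= Item "`n"

MsgBox, % List1 "`n" List2   ; List1 holds one, two and List2 holds red, green
Return

StrX( H, BS="",BO=0,BT=1, ES="",EO=0,ET=1, ByRef N="" ) { ; | by Skan | 19-Nov-2009
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
<0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
+(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html
}
```

Using separate offset variables such as N1 and N2 would avoid the reset line altogether.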
|
linpinger
Joined: 20 Oct 2007 Posts: 10 Location: china,hubei
|
Posted: Sat Nov 21, 2009 1:08 pm Post subject: |
|
|
| SKAN wrote: |
Maybe StrX() should reset N with 1 when it is about to return an empty string?
|
I think resetting N is a good idea,
because when we use N as the last Parameter,
it always ends up as N > StrLen(xml).
So N does not seem very useful afterwards, and resetting it is a good idea. |
|
daonlyfreez
Joined: 16 Mar 2005 Posts: 841 Location: Berlin
|
Posted: Sat Nov 21, 2009 1:39 pm Post subject: |
|
|
Very nice!
|
|
SKAN
Joined: 26 Dec 2005 Posts: 7186
|
Posted: Mon Nov 23, 2009 6:18 am Post subject: |
|
|
| linpinger wrote: | I think resetting N is a good idea,
because when we use N as the last Parameter,
it always ends up as N > StrLen(xml).
So N does not seem very useful afterwards, and resetting it is a good idea. |
| Code: | While Item := StrX( html, "<h3 class=r><a href=",N,0, "<li class=g>",1,12, N )
Sub1 := StrX( Item, "<a href=",1,9, """" ,1,1, T )
, Sub2 := StrX( Item, ">", T,1, "</a>",1,4 ) |
In the above, if the Sub1 result is empty, Sub2 will definitely be empty too, which is the best behaviour to expect. |
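A small sketch of that chaining, assuming AutoHotkey v1 syntax. The two sample items, the example.com address and the names Item1, Item2, T1 and T2 are hypothetical, and StrX() is copied unchanged from the title post.

```autohotkey
; Item1 contains a link, Item2 does not. Both strings are made up.
Item1 := "<a href=""http://example.com/"">Label</a>"
Item2 := "<b>no link here</b>"

Sub1a := StrX( Item1, "<a href=",1,9, """" ,1,1, T1 )   ; the URL between the quotes
Sub2a := StrX( Item1, ">", T1,1, "</a>",1,4 )           ; the link text, searched from T1 onwards

Sub1b := StrX( Item2, "<a href=",1,9, """" ,1,1, T2 )   ; BeginStr is missing, so the result is empty
Sub2b := StrX( Item2, ">", T2,1, "</a>",1,4 )           ; T2 is past the end, so this is empty too

MsgBox, % "[" Sub1a "] [" Sub2a "]`n[" Sub1b "] [" Sub2b "]"
; [http://example.com/] [Label]
; [] []
Return

StrX( H, BS="",BO=0,BT=1, ES="",EO=0,ET=1, ByRef N="" ) { ; | by Skan | 19-Nov-2009
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
<0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
+(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html
}
```

Because the offset is left beyond the end of the string after a failed search, every later call that reuses it also fails. If StrX() reset the offset to 1 instead, Sub2b would pick up unrelated text from the start of Item2.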
|