
StrX() :: Auto-Parser for XML / HTML

 
SKAN



Joined: 26 Dec 2005
Posts: 7186

Posted: Fri Nov 20, 2009 1:22 am    Post subject: StrX() :: Auto-Parser for XML / HTML


    StrX() is a wrapper that extends SubStr()'s functionality. It accepts two delimiter strings ( begin & end ) and extracts the text between them. It is similar to
    RegExMatch( Str, "BeginStr(.*)EndStr", SubPat ), but the major difference is that StrX() gives control over the final length of the resultant string. To be precise, it can trim or expand characters at either or both ends of the resultant string.

    Quote:

      Announcement: The current version 1.0 can auto-parse when used with a While loop. Please check out the updated examples.



    StrX( H, BS,BO,BT, ES,EO,ET, NextOffset )

      Parameters

    • 1 ) H = HayStack. The "Source Text"


    • 2 ) BS = BeginStr. The string that marks the left extreme of the Resultant String
    • 3 ) BO = BeginOffset.
        Position in "Source Text" at which the search for BeginStr starts ( 1 = first character ); it is passed to InStr() as the starting position
      • Pass 0 to search in reverse ( from right to left ) in "Source Text"
      • If you intend to call StrX() from a loop, pass the same variable used as the 8th parameter, which simplifies the parsing process.
    • 4 ) BT = BeginTrim.
        Number of characters to trim from the left extreme of the Resultant String
      • Pass the string length of BeginStr if you want to omit it from the Resultant String
      • Pass a negative value if you want to expand the left extreme of the Resultant String


    • 5 ) ES = EndStr. The string that marks the right extreme of the Resultant String
    • 6 ) EO = EndOffset.
        Can only be True or False.
        If False, EndStr is searched for from the end of "Source Text".
        If True, the search starts from the position where BeginStr was found, or from the start of "Source Text" when BeginStr is empty.
    • 7 ) ET = EndTrim.
        Number of characters to trim from the right extreme of the Resultant String
      • Pass the string length of EndStr if you want to omit it from the Resultant String
      • Pass a negative value if you want to expand the right extreme of the Resultant String


    • 8 ) NextOffset : The name of a ByRef variable that StrX() updates with the current offset. You may pass the same variable as Parameter 3 to simplify data parsing in a loop
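
    To make the trim parameters above concrete, here is a small AutoHotkey v1 sketch. The sample text in H is made up for illustration; StrX() is copied from the end of this post so the snippet runs on its own. The results in the comments follow from the function's arithmetic rather than from a test run.

```autohotkey
; Hypothetical sample text, not from the thread
H := "<b>Hello</b> world"

A := StrX( H, "<b>",1,0,  "</b>",1,0  )   ; no trimming: keeps both tags    -> <b>Hello</b>
B := StrX( H, "<b>",1,3,  "</b>",1,4  )   ; trim StrLen of each delimiter   -> Hello
C := StrX( H, "<b>",1,3,  "</b>",1,-6 )   ; negative EndTrim expands right  -> Hello</b> world

MsgBox, % A "`n" B "`n" C

StrX( H,  BS="",BO=0,BT=1,   ES="",EO=0,ET=1,  ByRef N="" ) { ; by Skan, copied from this post
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
 <0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
 +(0-ET)):(X+P)):(X)))-P)
}
```

    Passing the delimiter lengths as BeginTrim and EndTrim is the usual way to get only the text between the two markers.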


    Here follow real-world examples that demonstrate StrX()'s functionality:


    Example 1 : A script to retrieve real-time details of the last 15 posts made in our forum.

    Code:
    UrlDownloadToFile, http://www.autohotkey.com/forum/rss.php, ahkrss.xml   ; 01
    FileRead, xml, ahkrss.xml                                                ; 02

    While Item  := StrX( xml ,  "<item>" ,N,0,  "</item>" ,1,0,  N )         ; 03
          Title := StrX( Item,  "<title>",1,7,  "</title>",1,8     )         ; 04
        , Link  := StrX( Item,  "<link>" ,1,6,  "</link>" ,1,7     )         ; 05
        , List  .= "`n`n" A_Index ")`t" Title "`n`t" Link                    ; 06

    MsgBox, 64, Latest Posts on AHK Forum, %List%                            ; 07


    Quote:
    Note: The result of the above script may contain HTML entities, as in the line below:

    15) Ask for Help :: &amp;quot;Jump to&amp;quot; video frame (i.e. &amp;quot;seek&amp;quot;

    You may use UnHTM() on Title to convert it to proper text.


    Example 2 : Download and extract links from a Google Search Result

    Code:
    UrlDownloadToFile, % "http://www.google.com/search?hl=en&lr=&safe=active&rlz=1C1GGLS_enIN"
                       . "307IN307&num=10&q=site:autohotkey.com&aq=f&oq=&aqi=", Google.htm
    FileRead, html, Google.htm

    While Item := StrX( html,  "<h3 class=r><a href=",N,0, "<li class=g>",1,12, N )
          Sub1 := StrX( Item, "<a href=",1,9,  """"  ,1,1,  T )
        , Sub2 := StrX( Item, ">",       T,1,  "</a>",1,4     )
        , Text .= UnHTM( Sub2 ) "`n" Sub1 "`n`n"

    MsgBox, %Text% ; Dependency :: Get UnHTM() www.autohotkey.com/forum/viewtopic.php?t=51342



        ... and finally here is StrX()


      Code:
      StrX( H,  BS="",BO=0,BT=1,   ES="",EO=0,ET=1,  ByRef N="" ) { ;    | by Skan | 19-Nov-2009
      Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
       <0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
       +(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html
      }
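
    For readers who find the one-liner hard to follow, the same logic can be unrolled into steps, roughly as below. This is only a sketch for AutoHotkey v1: the name StrX_Readable is hypothetical and does not appear in this thread, and the comments are an interpretation of the expression above, not SKAN's own notes.

```autohotkey
; Usage: extracts the text between the two tags
MsgBox, % StrX_Readable( "<b>Hello</b> world", "<b>",1,3, "</b>",1,4 )

; Hypothetical step-by-step rewrite of StrX(); intended to behave the same way
StrX_Readable( H, BS="", BO=0, BT=1, ES="", EO=0, ET=1, ByRef N="" ) {
    X := StrLen( H )                                  ; length of the source text
    Z := StrLen( ES )                                 ; length of EndStr

    ; Step 1: P = position where the result starts
    If !StrLen( BS )
        P := 1                                        ; no BeginStr: start at the first character
    Else {
        T := InStr( H, BS, 0, ( BO < 0 ) ? 1 : BO )   ; BO = 0 makes InStr() search from the right
        P := T ? ( T + BT ) : ( X + 1 )               ; not found: point past the end, result is empty
    }

    ; Step 2: N = position just after the result ( also returned through ByRef )
    If !Z
        N := P + X                                    ; no EndStr: take everything up to the end
    Else {
        T := InStr( H, ES, 0, EO ? ( P + 1 ) : 0 )    ; EO false: search for EndStr from the right
        N := T ? ( T + Z - ET ) : ( P + X + P )       ; not found: overshoot, as the one-liner does
    }

    Return SubStr( H, P, N - P )
}
```

    Because N ends up far beyond StrLen( H ) once BeginStr is no longer found, the next call in a While loop returns an empty string and the loop stops.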



    _________________
    Suresh Kumar A N


    Last edited by SKAN on Sat Nov 21, 2009 6:06 pm; edited 9 times in total
    The Naked General

    Joined: 22 Feb 2009
    Posts: 12
    Location: Dallas TX

    Posted: Fri Nov 20, 2009 5:18 am

    This is great Skan!

    I'll have to go back and clean up some old parsing scripts with it. Thanks a bunch :D
    hugov

    Joined: 27 May 2007
    Posts: 2473

    Posted: Fri Nov 20, 2009 9:26 am

    No Sir, I'm definitely not disappointed :D
    linpinger

    Joined: 20 Oct 2007
    Posts: 10
    Location: china,hubei

    Posted: Fri Nov 20, 2009 10:50 am

    If I comment out line No. 03,
    it goes into an endless loop.

    Why does this happen?

    My English is poor ^_^
    SKAN

    Posted: Fri Nov 20, 2009 12:43 pm

    The first post has been updated with Example 2: download and extract links from a Google Search Result.


    On a related note here is Lexikos' COM version for the same:
    http://www.autohotkey.com/forum/viewtopic.php?p=182714#182714


    Last edited by SKAN on Fri Nov 20, 2009 9:06 pm; edited 1 time in total
    linpinger

    Posted: Fri Nov 20, 2009 2:58 pm

    Thanks for SKAN's reply!

    I still don't understand:
    when the search reaches the end of the string, why doesn't it stop and break?

    I had to add some extra check code.

    Adding these three lines inside the While loop makes it break:

    Code:

    if ( N < old )
       break
    old := N
    SKAN

    Posted: Fri Nov 20, 2009 9:17 pm

    linpinger wrote:
    I still don't understand:
    when the search reaches the end of the string, why doesn't it stop and break?

    I had to add some extra check code.

    Adding these three lines inside the While loop makes it break:

    Code:

    if ( N < old )
       break
    old := N


    My code was at fault. I have rewritten the function, and the new version is posted at the top.
    You do not have to add code anymore. When used with a While loop, StrX() will
    automatically parse the data and exit the loop gracefully.
    Please test the updated examples and let me know the status.

    linpinger wrote:
    Thanks for SKAN's reply!

    er.. You might find my reply missing, as I have deleted it:
    it does not fit the current version of StrX() and may cause confusion.

    Thank You.
    linpinger

    Posted: Sat Nov 21, 2009 3:13 am

    I have got the latest StrX()

    It's completely great!

    I noticed that the new Example 1 doesn't have
    N := 1

    That means N is blank; does it matter?
    (The result is right, there is no problem.)
    SKAN

    Posted: Sat Nov 21, 2009 9:04 am

    linpinger wrote:
    I have got the latest StrX()
    It's completely great!


    Thanks for testing it. :)

    linpinger wrote:
    I noticed that the new Example 1 doesn't have
    N := 1

    That means N is blank; does it matter?
    (The result is right, there is no problem.)


    It is a side effect. The code tests the value of BeginOffset to make sure a negative value is not being passed to InStr().

    Code:
    BO < 0 ? 1 : BO  ; If BO is less than 0, use 1; otherwise use BO itself


    If you want to run both posted examples from the same script,
    then you have to put N := 1 between them to reset N,
    or you can name the variables differently, like N1 and N2.

    Idea: maybe StrX() should reset N to 1 when it is about to return an empty string?
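
    The reset described above might look like the following AutoHotkey v1 sketch. The sample string in Data is made up for illustration, and StrX() is repeated from the first post so the snippet runs on its own. N starts out blank for the first loop, as in Example 1.

```autohotkey
; Hypothetical sample data, not from the thread
Data := "<a>1</a><a>2</a>"

While Item := StrX( Data, "<a>",N,3,  "</a>",1,4,  N )
    First .= Item " "

N := 1    ; reset: after the first loop N points far past the end of Data
While Item := StrX( Data, "<a>",N,3,  "</a>",1,4,  N )
    Second .= Item " "

MsgBox, % "First: " First "`nSecond: " Second

StrX( H,  BS="",BO=0,BT=1,   ES="",EO=0,ET=1,  ByRef N="" ) { ; by Skan, copied from the first post
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
 <0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
 +(0-ET)):(X+P)):(X)))-P)
}
```

    Without the N := 1 line, the second loop would exit immediately, because the search for BeginStr would start beyond the end of Data.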
    linpinger

    Posted: Sat Nov 21, 2009 1:08 pm

    SKAN wrote:

    Idea: maybe StrX() should reset N to 1 when it is about to return an empty string?


    I think resetting N is a good idea,

    because when we use N as the last parameter,

    it always ends up with N > StrLen(xml).

    So N does not seem very useful at that point; resetting it is a good idea.
    daonlyfreez

    Joined: 16 Mar 2005
    Posts: 841
    Location: Berlin

    Posted: Sat Nov 21, 2009 1:39 pm

    Very nice!

    8)
    SKAN

    Posted: Mon Nov 23, 2009 6:18 am

    linpinger wrote:
    I think resetting N is a good idea,

    because when we use N as the last parameter,

    it always ends up with N > StrLen(xml).

    So N does not seem very useful at that point; resetting it is a good idea.


    Code:
    While Item := StrX( html,  "<h3 class=r><a href=",N,0, "<li class=g>",1,12, N )
          Sub1 := StrX( Item, "<a href=",1,9,  """"  ,1,1,  T )
        , Sub2 := StrX( Item, ">",       T,1,  "</a>",1,4     )


    In the above, if the Sub1 result is empty, Sub2 will definitely be empty as well, which is the best behaviour to expect.
    All times are GMT