AutoHotkey Community

All times are UTC [ DST ]




Post new topic Reply to topic  [ 53 posts ]  Go to page 1, 2, 3, 4  Next
Posted: November 20th, 2009, 1:22 am
Joined: December 26th, 2005, 4:40 pm
Posts: 8776
    StrX() is a wrapper that extends SubStr()'s functionality. It accepts two delimiter strings ( begin & end ) and extracts the text between them. It is similar to
    RegExMatch( Str, "BeginStr(.*)EndStr", SubPat ), but the major difference is that StrX() lets you control the final length of the resulting string. To be precise, it can trim or expand characters at either or both ends of the result.

    Quote:
      Announcement: The current version 1.0 can auto-parse when used with a While loop. Please check out the updated examples.



    StrX( H, BS,BO,BT, ES,EO,ET, NextOffset )

      Parameters

    • 1 ) H = HayStack. The "Source Text"

    • 2 ) BS = BeginStr. The string that marks the left extreme of the Resultant String
    • 3 ) BO = BeginOffset.
        Starting position in "Source Text" for the BeginStr search ( 1 = first character )
      • Pass 0 to search in reverse ( from right to left ) in "Source Text"
      • If you intend to call StrX() from a loop, pass the same variable used as the 8th parameter, which simplifies the parsing process.
    • 4 ) BT = BeginTrim.
        Number of characters to trim from the left extreme of the Resultant String
      • Pass the string length of BeginStr if you want to omit it from the Resultant String
      • Pass a negative value if you want to expand the left extreme of the Resultant String
    • 5 ) ES = EndStr. The string that marks the right extreme of the Resultant String
    • 6 ) EO = EndOffset.
        Can only be True or False.
        If False, EndStr is searched in reverse, from the end of Source Text.
        If True, EndStr is searched forward from the offset where BeginStr was found, or from the start of Source Text, whichever is applicable.
    • 7 ) ET = EndTrim.
        Number of characters to trim from the right extreme of the Resultant String
      • Pass the string length of EndStr if you want to omit it from the Resultant String
      • Pass a negative value if you want to expand the right extreme of the Resultant String
    • 8 ) NextOffset : The name of a ByRef variable that StrX() updates with the current offset. You may pass the same variable as parameter 3 to simplify data parsing in a loop.
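
    To make the trim parameters concrete, here is a minimal sketch. It assumes AutoHotkey v1.x syntax and uses a made-up sample string; the variable names A, B and C are only for illustration. It runs the same search three times: delimiters kept, delimiters trimmed, and the left end expanded with a negative BeginTrim.

```autohotkey
; Minimal sketch (assumes AutoHotkey v1.x). The sample string is invented for illustration.
Str := "Name: <b>John</b>, Age: 30"

A := StrX( Str, "<b>",1,0,   "</b>",1,0 )   ; keep both delimiters      -> <b>John</b>
B := StrX( Str, "<b>",1,3,   "</b>",1,4 )   ; trim both delimiters      -> John
C := StrX( Str, "<b>",1,-6,  "</b>",1,4 )   ; expand left end by 6      -> Name: <b>John

MsgBox, % A "`n" B "`n" C
Return

; StrX() as posted in this topic
StrX( H, BS="",BO=0,BT=1,   ES="",EO=0,ET=1,  ByRef N="" ) { ;    | by Skan | 19-Nov-2009
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
 <0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
 +(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html
}
```

    Passing StrLen( BeginStr ) and StrLen( EndStr ) as the trim values ( 3 and 4 here ) is what removes the delimiters from the result.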

    Here are some real-world examples that demonstrate StrX()'s functionality:


    Example 1 : A script to retrieve real-time details of the last 15 posts made in our forum.

    Code:
    UrlDownloadToFile, http://www.autohotkey.com/forum/rss.php, ahkrss.xml   ; 01
    FileRead, xml, ahkrss.xml                                                ; 02

    While Item  := StrX( xml ,  "<item>" ,N,0,  "</item>" ,1,0,  N )         ; 03
          Title := StrX( Item,  "<title>",1,7,  "</title>",1,8     )         ; 04
        , Link  := StrX( Item,  "<link>" ,1,6,  "</link>" ,1,7     )         ; 05
        , List  .= "`n`n" A_Index ")`t" Title "`n`t" Link                    ; 06

    MsgBox, 64, Latest Posts on AHK Forum, %List%                            ; 07


    Quote:
    Note: The result of the above script may contain HTML formatting like below:

    15) Ask for Help :: &amp;quot;Jump to&amp;quot; video frame (i.e. &amp;quot;seek&amp;quot;

    You may use UnHTM() on Title to convert it to proper text.


    Example 2 : Download and extract links from a Google Search Result

    Code:
    UrlDownloadToFile, % "http://www.google.com/search?hl=en&lr=&safe=active&rlz=1C1GGLS_enIN"
                       . "307IN307&num=10&q=site:autohotkey.com&aq=f&oq=&aqi=", Google.htm
    FileRead, html, Google.htm

    While Item := StrX( html,  "<h3 class=""r""><a href=",N,0, "<li class=g>",1,12, N )
          Sub1 := StrX( Item, "<a href=",1,9,  """"  ,1,1,  T )
        , Sub2 := StrX( Item, ">",       T,1,  "</a>",1,4     )
        , Text .= UnHTM( Sub2 ) "`n" Sub1 "`n`n"

    MsgBox, %Text% ; Dependency :: Get UnHTM() www.autohotkey.com/forum/viewtopic.php?t=51342


    Example 3 : Movie-DB Creator 66L for IMDb.com

    Example 4 : ListView for http://www.google.com/movies

    Example 5 : Yahoo! Weather in TrayTip

        ... and finally here is StrX()

      Code:
      StrX( H, BS="",BO=0,BT=1,   ES="",EO=0,ET=1,  ByRef N="" ) { ;    | by Skan | 19-Nov-2009
      Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
       <0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
       +(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html
      }
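
      The one-liner above is dense, so here is an unrolled reading of the same logic, offered as a sketch rather than a replacement. StrX_Readable and the local name Len are hypothetical names introduced only for this illustration; AutoHotkey v1.x syntax is assumed.

```autohotkey
; Unrolled sketch of the StrX() logic (assumes AutoHotkey v1.x).
; StrX_Readable and Len are hypothetical names used only for this illustration.
StrX_Readable( H, BS="", BO=0, BT=1, ES="", EO=0, ET=1, ByRef N="" ) {
  X := StrLen(H), Z := StrLen(ES)

  ; P = start position of the result
  If ( StrLen(BS) = 0 )                              ; no BeginStr: start at the first character
    P := 1
  Else If ( T := InStr( H, BS, 0, BO < 0 ? 1 : BO ) )
    P := T + BT                                      ; found: apply BeginTrim
  Else
    P := X + 1                                       ; not found: point past the end (empty result)

  ; Len = length of the result
  If ( Z = 0 )                                       ; no EndStr: take everything that is left
    Len := X
  Else If ( T := InStr( H, ES, 0, EO ? P + 1 : 0 ) )
    Len := T - P + Z - ET                            ; found: include EndStr, then apply EndTrim
  Else
    Len := X + P                                     ; not found: take everything that is left

  N := P + Len                                       ; NextOffset reported back to the caller
  Return SubStr( H, P, Len )
}

MsgBox, % StrX_Readable( "Name: <b>John</b>, Age: 30", "<b>",1,3, "</b>",1,4 )   ; John
```

      Note that when BeginStr is not found, the start position lands past the end of the text, so the result is empty; that empty result is what lets a While loop stop.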




    Last edited by SKAN on May 14th, 2010, 8:46 am, edited 13 times in total.

    Posted: November 20th, 2009, 5:18 am

    Joined: February 22nd, 2009, 7:24 pm
    Posts: 21
    Location: Dallas TX
    This is great Skan!

    I'll have to go back and clean up some old parsing scripts with it. Thanks a bunch :D

    _________________
    "lol, i made this thing, but it didn't work... so I read the forums and now it does!"


    Posted: November 20th, 2009, 9:26 am

    Joined: May 27th, 2007, 9:41 am
    Posts: 4999
    No Sir, I'm definitely not disappointed :D

    _________________
    AHK FAQ
    TF : Text files & strings lib, TF Forum


    Posted: November 20th, 2009, 10:50 am

    Joined: October 20th, 2007, 10:40 am
    Posts: 15
    Location: china,hubei
    If I comment out line No. 03,

    it goes into an endless loop.

    Why does this happen?

    My English is poor, ^_^


    Posted: November 20th, 2009, 12:43 pm

    Joined: December 26th, 2005, 4:40 pm
    Posts: 8776
    "Title Post" Updated with Example 2

    Download and extract links from a Google Search Result

    Code:
    UrlDownloadToFile, % "http://www.google.com/search?hl=en&lr=&safe=active&rlz=1C1GGLS_enIN"
                       . "307IN307&num=10&q=site:autohotkey.com&aq=f&oq=&aqi=", Google.htm
    FileRead, html, Google.htm

    While Item := StrX( html,  "<h3 class=""r""><a href=",N,0, "<li class=g>",1,12, N )
          Sub1 := StrX( Item, "<a href=",1,9,  """"  ,1,1,  T )
        , Sub2 := StrX( Item, ">",       T,1,  "</a>",1,4     )
        , Text .= UnHTM( Sub2 ) "`n" Sub1 "`n`n"

    MsgBox, %Text% ; Dependency :: Get UnHTM() www.autohotkey.com/forum/viewtopic.php?t=51342


    On a related note here is Lexikos' COM version for the same:
    http://www.autohotkey.com/forum/viewtop ... 714#182714


    Last edited by SKAN on May 8th, 2010, 4:25 am, edited 2 times in total.

    Posted: November 20th, 2009, 2:58 pm

    Joined: October 20th, 2007, 10:40 am
    Posts: 15
    Location: china,hubei
    Thanks for SKAN's reply!

    I still don't understand:
    when the search reaches the end of the string, why doesn't it stop and break?

    I had to add some extra check code.

    Adding these three lines inside the While loop makes it break:

    Code:
    if ( N < old )
       break
    old := N


    Posted: November 20th, 2009, 9:17 pm

    Joined: December 26th, 2005, 4:40 pm
    Posts: 8776
    linpinger wrote:
    I still don't understand:
    when the search reaches the end of the string, why doesn't it stop and break?

    I had to add some extra check code.

    Adding these three lines inside the While loop makes it break:

    Code:
    if ( N < old )
       break
    old := N


    My code was at fault. I have rewritten the function and posted it at the top.
    You do not have to add that code anymore. When used with a While loop, StrX() will
    automatically parse the data and exit the loop gracefully.
    Please test the updated examples and let me know the status.

    linpinger wrote:
    Thanks for SKAN's reply!

    er.. You might find my reply missing as I have deleted it
    ... as it does not fit the current version of StrX() and may cause confusion.

    Thank You.


    Posted: November 21st, 2009, 3:13 am

    Joined: October 20th, 2007, 10:40 am
    Posts: 15
    Location: china,hubei
    I have got the latest StrX()

    It's completely great!

    I noticed that the new Example 1 doesn't have
    N := 1

    That means N is blank; does it matter?
    (The result is right, there is no problem.)


    Posted: November 21st, 2009, 9:04 am

    Joined: December 26th, 2005, 4:40 pm
    Posts: 8776
    linpinger wrote:
    I have got the latest StrX()
    It's completely great!


    Thanks for testing it. :)

    linpinger wrote:
    I noticed that the new Example 1 doesn't have
    N := 1

    That means N is blank; does it matter?
    (The result is right, there is no problem.)


    It is a side effect. The code tests the value of BeginOffset to make sure a negative value is not being passed to InStr().

    Code:
    BO < 0 ? 1 : BO  ; If BO is less than 0, use 1; otherwise use BO itself


    If you want to run both of the posted examples from the same script,
    then you have to put N := 1 in between them to reset N,
    or you can name the variables differently, like N1 and N2.
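
    A small sketch of why the reset matters, assuming AutoHotkey v1.x. The sample data and the variable names First, Skipped and Second are made up for this illustration. After a While loop finishes, N is left pointing beyond the end of the text, so a second loop that reuses N finds nothing until N is set back to 1.

```autohotkey
; Sketch (assumes AutoHotkey v1.x). Data and variable names are invented for illustration.
Data := "<i>a</i><i>b</i>"

N := 1
While Item := StrX( Data, "<i>",N,3, "</i>",1,4, N )
  First .= Item " "                       ; collects: a b

While Item := StrX( Data, "<i>",N,3, "</i>",1,4, N )   ; N not reset: still beyond the end of Data
  Skipped .= Item " "                     ; never runs, Skipped stays empty

N := 1                                    ; reset the offset before parsing again
While Item := StrX( Data, "<i>",N,3, "</i>",1,4, N )
  Second .= Item " "                      ; collects: a b

MsgBox, % "First: " First "`nWithout reset: " Skipped "`nAfter reset: " Second
Return

; StrX() as posted in this topic
StrX( H, BS="",BO=0,BT=1,   ES="",EO=0,ET=1,  ByRef N="" ) { ;    | by Skan | 19-Nov-2009
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
 <0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
 +(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html
}
```

    Using separate offset variables ( N1, N2 ) avoids the reset altogether.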

    :idea: Maybe StrX() should reset N with 1 when it is about to return an empty string?


    Posted: November 21st, 2009, 1:08 pm

    Joined: October 20th, 2007, 10:40 am
    Posts: 15
    Location: china,hubei
    SKAN wrote:
    :idea: Maybe StrX() should reset N with 1 when it is about to return an empty string?


    I think resetting N is a good idea.

    Because when we use N as the last parameter,

    it always ends up with N > StrLen(xml).

    So N does not seem very useful afterwards, and resetting it is a good idea.


    Posted: November 21st, 2009, 1:39 pm

    Joined: March 16th, 2005, 10:33 pm
    Posts: 969
    Location: Frisia
    Very nice!

    8)



    Posted: November 23rd, 2009, 6:18 am

    Joined: December 26th, 2005, 4:40 pm
    Posts: 8776
    linpinger wrote:
    I think resetting N is a good idea.

    Because when we use N as the last parameter,

    it always ends up with N > StrLen(xml).

    So N does not seem very useful afterwards, and resetting it is a good idea.


    Code:
    While Item := StrX( html,  "<h3 class=r><a href=",N,0, "<li class=g>",1,12, N )
          Sub1 := StrX( Item, "<a href=",1,9,  """"  ,1,1,  T )
        , Sub2 := StrX( Item, ">",       T,1,  "</a>",1,4     )


    In the above, if the Sub1 result is empty, Sub2 will definitely be empty too, which is the best behaviour to expect.


    Posted: March 25th, 2010, 9:21 am

    Joined: December 26th, 2005, 4:40 pm
    Posts: 8776
      Real World example for using StrX() & UnHTM() to parse out text from HTML

        Row Structure for Text DB
        1. Year
        2. Movie title
        3. MPAA Rating
        4. Runtime ( in Minutes)
        5. IMDb hash - should be prefixed with www.imdb.com/title/tt to form a proper URL
        6. User Rating ( 1.0 to 10.0 )
        7. User Votes
        8. Genre ( Pipe Delimited Values )
        9. Director
        10. Stars ( Comma Separated Values )
        11. Movie Outline
      Quote:
      One may use the "IMDb hash" to connect with other providers:

      The IMDb hash for "The Shawshank Redemption" is 0111161

      1) Connect to www.themoviedb.org for extended Movie Info : http://api.themoviedb.org/2.1/Movie.imdbLookup/en/xml/APIKEY/tt0111161
      2) Retrieve Images (Posters/Cover) from www.themoviedb.org : http://api.themoviedb.org/2.1/Movie.getImages/en/xml/APIKEY/tt0111161
      3) Connect to www.opensubtitles.org for subtitles : http://www.opensubtitles.org/en/search/imdbid-0111161/sublanguageid-eng/rss_2_00

      Again, the above methods return data in XML format which you may parse out with StrX()


      Movie-DB Creator 66L for IMDb.com
    Code:
    ; Movie-DB Creator 66L for IMDb.com ; By Skan / Last Modified: 24-Mar-2010
    ; Forum Post : www.autohotkey.com/forum/viewtopic.php?p=342196#342196
    ; Sample Output: www.autohotkey.net/~Skan/Scripts/StrX/IMDb/IMDb.txt (3.94 MiB)
    ; !!! Caution : Downloads around 1000 webpages from IMDb. Time/Bandwidth consuming operation.

    #SingleInstance, Force
    SetBatchLines, -1

    IMDB_SR := A_Temp "\IMDB_Search_Results.htm" ,     IMDB_WF := A_Temp "\IMDB_Work_File.txt"
    IMDB_URL := "http://www.imdb.com/search/title?has=asin-dvd-us&languages=en&num_votes=6,&"
              . "sort=num_votes,desc&start=1&title_type=feature"
    FileDelete, %IMDB_WF%
    URLDownloadToFile, %IMDB_URL%, %IMDB_SR%
    FileRead, HTM, %IMDB_SR%
    TotalE := StrX( HTM, "<div id=""left"">",1,24, "titles",1,7 )
    SysGet, m, MonitorWorkArea, 1
    Y := (mBottom-46-2),  X := (mRight-200-2), TotalE := RegExReplace( TotalE,"," )
    Progress, CWE6E3E4 CT000020 CBF73D00  x%x% y%y% w200 h46 B1 FS8 WM700 WS400 FM8 ZH8 ZY3
            , Downloading from IMDb, % "Page 1/" TP:=Round(TotalE/20), , Arial

    Z := A_Tab, StartE := 1
    Loop {
     List := "", N := 1
     While(  TR := StrX( HTM, "<tr class=",N,0, "</tr>",1,5, N ) )
        URat  := StrX( TR, "Users rated this ",1,17, "/",1,1, O1 )
      , Vote  := StrX( TR, "(",O1,1, " votes",1,6, O2 )
                , Vote := RegExReplace( Vote,"," )
      , IMDB := StrX( TR, "href=""/title/tt",0,15, """",1,2, O2 )            ; Reverse Search
      , Title := UnHTM( StrX( TR, ">",O2,1, "<",1,1, O3 ))
      , Year  := StrX( TR, "year_type"">(",O3,12,")",1,1 )
      , OutL  := UnHTM( StrX( TR, "outline"">",1,9, "<",1,1 ))
      , Dir   := UnHTM( StrX( TR, "Dir: <",1,5, "</a>",1,4 ))
      , Star  := UnHTM( StrX( TR, "With: <",1,6, "</span>",1,8 ))
      , Gen   := UnHTM( StrX( TR, "class=""genre",1,-6, "</span>",1,0, O4 ))
              ,  Gen := RegExReplace( Gen,A_Space )
      , CE    := StrX( TR, "title=",O4,6, A_Space,1,1 )
      , RunT  := UnHTM( StrX( TR, "class=""runtime",1,-6, " mins",1,5 ))
      , List  .= Year Z Title Z CE Z RunT Z IMDB Z URat Z Vote Z Gen Z Dir Z Star Z OutL "`n"
     FileAppend, %List%, %IMDB_WF%
     StringReplace, IMDB_URL, IMDB_URL, start=%StartE%, % "start=" ( StartE := StartE+20 )
     IfGreater,StartE,%TotalE%, Break
     Progress, % (StartE/TotalE)*100, % "Page " Round(StartE/20) "/" TP, Downloading from IMDb
     URLDownloadToFile, %IMDB_URL%, %IMDB_SR%
     FileRead, HTM, %IMDB_SR%
    }
    FileCopy, %IMDB_WF%, %A_ScriptDir%\IMDb.txt, 1
    Return                                                 ; // end of auto-execute section //

    StrX( H, BS="",BO=0,BT=1,   ES="",EO=0,ET=1,  ByRef N="" ) { ;    | by Skan | 19-Nov-2009
    Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
     <0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
     +(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html
    }

    UnHTM( HTM ) { ; Remove HTML formatting / Convert to ordinary text     by SKAN 19-Nov-2009
     Static HT     ; Forum Topic: www.autohotkey.com/forum/topic51342.html
     IfEqual,HT,,   SetEnv,HT, % "&aacuteá&acircâ&acute´&aeligæ&agraveà&amp&aringå&atildeã&au"
     . "mlä&bdquo„&brvbar¦&bull•&ccedilç&cedil¸&cent¢&circˆ&copy©&curren¤&dagger†&dagger‡&deg"
     . "°&divide÷&eacuteé&ecircê&egraveè&ethð&eumlë&euro€&fnofƒ&frac12½&frac14¼&frac34¾&gt>&h"
     . "ellip…&iacuteí&icircî&iexcl¡&igraveì&iquest¿&iumlï&laquo«&ldquo“&lsaquo‹&lsquo‘&lt<&m"
     . "acr¯&mdash—&microµ&middot·&nbsp &ndash–&not¬&ntildeñ&oacuteó&ocircô&oeligœ&ograveò&or"
     . "dfª&ordmº&oslashø&otildeõ&oumlö&para¶&permil‰&plusmn±&pound£&quot""&raquo»&rdquo”&reg"
     . "®&rsaquo›&rsquo’&sbquo‚&scaronš&sect§&shy­&sup1¹&sup2²&sup3³&szligß&thornþ&tilde˜&tim"
     . "es×&trade™&uacuteú&ucircû&ugraveù&uml¨&uumlü&yacuteý&yen¥&yumlÿ"
     TXT := RegExReplace( HTM,"<[^>]+>" )               ; Remove all tags between  "<" and ">"
     Loop, Parse, TXT, &`;                              ; Create a list of special characters
       L := "&" A_LoopField ";", R .= (!(A_Index&1)) ? ( (!InStr(R,L,1)) ? L:"" ) : ""
     StringTrimRight, R, R, 1
     Loop, Parse, R , `;                                ; Parse Special Characters
      If F := InStr( HT, A_LoopField )                  ; Lookup HT Data
        StringReplace, TXT,TXT, %A_LoopField%`;, % SubStr( HT,F+StrLen(A_LoopField), 1 ), All
      Else If ( SubStr( A_LoopField,2,1)="#" )
        StringReplace, TXT, TXT, %A_LoopField%`;, % Chr(SubStr(A_LoopField,3)), All
    Return RegExReplace( TXT, "(^\s*|\s*$)")            ; Remove leading/trailing white spaces
    }


    EDITS:

    13-Sep-2010 : IMDb updated its HTML layout, which broke the script. The script has now been altered and is working again.


    Last edited by SKAN on September 13th, 2010, 3:04 pm, edited 4 times in total.

    Posted: March 25th, 2010, 9:35 am

    Joined: May 14th, 2009, 12:43 pm
    Posts: 57
    Location: UK
    Excellent, thanks SKAN.

    _________________
    PrimalNoise.com
    It's a rock. Can't wait to tell my friends. They don't have a rock this big.


    Posted: March 25th, 2010, 1:32 pm

    Joined: December 26th, 2005, 4:40 pm
    Posts: 8776
    noise wrote:
    Excellent, thanks SKAN.


    Thanks, Welcome. :)

    BTW, I have updated the script to include one more field: MPAA Rating
    and have also provided additional info on IMDb hash usage.

    :)

