AutoHotkey Community

Posted: May 18th, 2008, 11:26 am
Hi guys, I'm interested in extracting 11075 and e9.bogeyman from the line below.

Code:
<td><a href="?section=profile&amp;show=sig&amp;id=11075">e9.bogeyman</a> <img src="gfx/laenderflaggen/Singapore.gif" alt="[SG]" title="Singapore"> <img src="gfx/laenderflaggen/Denmark.gif" alt="[DK]" title="Denmark"></td>'


The lengths of the name and the number may differ. The line number may also differ (in this case it's 220).

My own best suggestion would be to download the file and use a loop to search each line for the string:

href="?section=profile&amp;show=sig&amp;id

However, this method might be slow, and the script needs to parse around 100,000 HTML documents. Also, I don't know how to pick out the number and the name (as these may vary in length).

Any suggestions?

Thanks in advance.


Posted: May 18th, 2008, 11:31 am
Loop, Read plus something like InStr(), or FileRead plus RegExMatch(), or XPath.
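
A minimal sketch of the first option, in AutoHotkey v1 syntax. It assumes the page has already been saved locally; the file name page.html is hypothetical and not from the thread. Loop, Read walks the file line by line, and InStr() cheaply rejects every line that lacks the marker text.

```autohotkey
; Sketch only: "page.html" is a hypothetical file name, not from the thread.
marker := "?section=profile&amp;show=sig&amp;id="

Loop, Read, page.html
{
    ; InStr() returns 0 when the marker is absent, so most lines are skipped quickly.
    if (!InStr(A_LoopReadLine, marker))
        continue
    ; Inside Loop, Read, A_Index is the current line number.
    MsgBox % "Line " . A_Index . " contains a profile link:`n" . A_LoopReadLine
    break   ; stop at the first hit; remove this line to keep scanning
}
```

Because the marker is searched as plain text, nothing in it needs regex escaping here.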


Posted: May 18th, 2008, 11:34 am
The name and number will differ each time, so what should I search for?

I could use RegExMatch, but wouldn't I need clumsy loops too?

If I search for "http://thisiswhatcomesbeforethenumber?", let's say I get a match at position 7.

From 7 I can get to the end position of that string, but I still don't know how long the number or the name is. The length could be anywhere between 1 and 10.

So I would have to do a clumsy loop to look for a < sign?
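
For what it's worth, no character-by-character loop should be needed: InStr() accepts a starting position, so it can find the closing delimiter directly, and SubStr() then cuts out whatever lies in between, however long it is. A rough sketch in AutoHotkey v1 syntax, using a shortened copy of the sample line (the variable names are made up for illustration):

```autohotkey
; Shortened sample line from the first post; literal quotes are doubled for AHK.
line := "<td><a href=""?section=profile&amp;show=sig&amp;id=11075"">e9.bogeyman</a></td>"
marker := "&amp;id="

; First digit of the ID: start of the marker plus the marker's length.
idStart := InStr(line, marker) + StrLen(marker)
; The ID ends at the closing quote, wherever that happens to be.
idEnd := InStr(line, """", false, idStart)
id := SubStr(line, idStart, idEnd - idStart)

; The name sits between the ">" that follows the ID and the next "<".
nameStart := InStr(line, ">", false, idEnd) + 1
nameEnd := InStr(line, "<", false, nameStart)
name := SubStr(line, nameStart, nameEnd - nameStart)

MsgBox % id . " " . name   ; shows: 11075 e9.bogeyman
```

The same idea works on a whole file read with FileRead, as long as each search starts from the position returned by the previous one.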


Last edited by Buckie on May 18th, 2008, 12:12 pm, edited 1 time in total.

Posted: May 18th, 2008, 12:11 pm
See the documentation pages for RegExMatch(), "Regular Expressions (RegEx) - Quick Reference", and #EscapeChar (which includes an explanation of escape sequences).

Untested (and there are RegEx experts around who could probably shorten this considerably):

Code:
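; the question mark is escaped for regex; each literal double quote is doubled for ahk
before := "<td><a href=""\?section=profile&amp;show=sig&amp;id="
; "> closes the href attribute and the opening <a> tag
between := """>"
after := "<"

; (\d+) captures the digits of the id; ([^<]*) captures the name up to the next "<"
; the captures are returned in match1 and match2
pattern := before . "(\d+)" . between . "([^<]*)" . after

; pathToAFile is a placeholder for the real path of the downloaded file
FileRead, aFile, pathToAFile
; RegExMatch() returns 0 when nothing matches
if RegExMatch(aFile, pattern, match)
    MsgBox % match1 . " " . match2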
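
A possible extension for the batch case, sketched under assumptions: the folder C:\pages is hypothetical, and the Loop, Files syntax needs a reasonably recent AutoHotkey v1.1 release. It reads every saved document and collects all id/name pairs, since a page may contain more than one profile link.

```autohotkey
; Sketch only: the folder C:\pages is a hypothetical location for the saved documents.
pattern := "href=""\?section=profile&amp;show=sig&amp;id=(\d+)"">([^<]*)<"
results := ""

Loop, Files, C:\pages\*.html
{
    FileRead, html, %A_LoopFileFullPath%
    pos := 1
    ; RegExMatch() returns the position of the match, or 0 when there is none.
    while (pos := RegExMatch(html, pattern, m, pos))
    {
        results .= m1 . "`t" . m2 . "`n"   ; m1 = id, m2 = name
        pos += StrLen(m)                   ; continue after the current match
    }
}
MsgBox % results
```

For a run over many thousands of files, appending the results to a file would be more practical than a message box.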


HTH


Posted: May 18th, 2008, 12:12 pm
Brilliant, thanks.


Posted: May 18th, 2008, 12:55 pm
There is a mistake in the script ;)

To make it easier:

regex wrote:
the characters \.*?+[{|()^$ must be preceded by a backslash to be seen as literal


ahk wrote:
the characters ,%`'" might get special treatment, or have to be escaped (depends). Within an expression, two consecutive quotes enclosed inside a literal string resolve to a single literal quote.
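
A small illustration of both rules together, as a sketch in AutoHotkey v1 syntax (the variable names and sample strings are made up for the example):

```autohotkey
; Regex rule: "?" is a metacharacter, so "\?" is needed to match a literal question mark.
; AHK rule: inside a quoted expression string, "" stands for one literal double quote.
haystack := "<a href=""?section=profile"">"
escaped := "href=""\?section"
unescaped := "href=""?section"

; With the backslash, the needle matches the text href="?section at position 4.
MsgBox % RegExMatch(haystack, escaped)
; Without it, the "?" only makes the preceding quote optional, so nothing matches.
MsgBox % RegExMatch(haystack, unescaped)   ; shows 0
```

Leaving out the doubled quote, by contrast, is an AutoHotkey syntax problem rather than a regex one: the string literal ends early and the line no longer parses as intended.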

