AutoHotkey Community


All times are UTC [ DST ]




[ 11 posts ]
PostPosted: July 17th, 2009, 6:00 am 
I'm new to AutoHotkey. I'm trying to parse some information from an HTML file. What I want to do is search through the URLs inside the file and find the first substring (of known length) in a URL that follows another particular substring. For example, say an HTML file has the following URLs in it:

"http://abc.def/ghi"
"http://jkl.def/mno"

I'm looking for the first string after "def/", which would be "ghi" in this case. Once it is found, the script should return "ghi" and terminate the loop. The lengths of both substrings are known in advance. I'm a C programmer, so I'm not very used to AutoHotkey's syntax, especially its loops. My idea is to take every substring of the key's length (4 in my example) in each line; if it equals the word I'm looking for, return the next few characters after it (3 in the example). But I'm not sure exactly how that is done in AutoHotkey. Could anyone help me with this?
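A minimal sketch of that sliding-window idea in AutoHotkey v1 syntax. The variable names (Html, Key, Len, Word) are hypothetical, and the two example URLs are simply placed in one string instead of being read from a file. Note that AutoHotkey string positions are 1-based, unlike C arrays.

```autohotkey
; Sliding-window search, C style, in AutoHotkey v1 syntax.
; Html, Key, Len and Word are hypothetical names chosen for this example.
Html = "http://abc.def/ghi" "http://jkl.def/mno"   ; legacy assignment: quotes are literal
Key := "def/"   ; known search key (length 4)
Len := 3        ; known length of the wanted substring
Word := ""
Loop, % StrLen(Html) - StrLen(Key) + 1   ; A_Index counts 1, 2, 3, ... (positions are 1-based)
{
    if (SubStr(Html, A_Index, StrLen(Key)) = Key)   ; compare a 4-character window
    {
        Word := SubStr(Html, A_Index + StrLen(Key), Len)
        break   ; first hit only, like returning early from a C loop
    }
}
; Word now contains ghi
```

In practice InStr(Html, Key) performs the same scan in a single call and returns the 1-based position (0 if not found); the explicit loop is only there to show how Loop, A_Index and break map onto a C for-loop.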


PostPosted: July 17th, 2009, 6:44 am 

Joined: March 24th, 2005, 11:50 am
Posts: 398
Location: germany
Use RegExMatch:
FoundPos := RegExMatch("abcXYZ123", "abc(.*)123", SubPat) ; Returns 1 and stores "XYZ" in SubPat1

http://www.autohotkey.com/docs/commands/RegExMatch.htm


Read about the RegEx operations in the help file; they are very handy. I used to rely on the ordinary string commands, but with RegExMatch everything becomes easier.

Alternatively, use StringGetPos to find the key and then StringTrimLeft to cut off everything up to and including it.
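Both suggestions can be sketched on the example URLs from the first post (AutoHotkey v1; Html, Sub, Pos, Skip, Rest and Word2 are hypothetical names). Since the wanted substring has a known length, the pattern below uses .{3} rather than (.*).

```autohotkey
; Two ways to take the 3 characters that follow "def/" (AutoHotkey v1).
Html = "http://abc.def/ghi" "http://jkl.def/mno"

; 1) RegExMatch: capture exactly three characters after "def/".
RegExMatch(Html, "def/(.{3})", Sub)   ; Sub = whole match (def/ghi), Sub1 = ghi

; 2) Legacy commands: find "def/", drop everything up to its end, keep 3 characters.
StringGetPos, Pos, Html, def/         ; zero-based position, -1 if not found
Skip := Pos + StrLen("def/")
StringTrimLeft, Rest, Html, %Skip%    ; Rest now starts with ghi
StringLeft, Word2, Rest, 3            ; Word2 = ghi
```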


PostPosted: July 17th, 2009, 3:21 pm 
Thanks a lot for the help!

Now I'm getting an error every time I load the script saying it's missing a '('. This is what I have so far. I know that the word that I'm looking for ends with a quotation mark. Is there anything I did wrong here?

Code:
 
Loop, read, html.txt {
    found := RegExMatch(%A_LoopReadLine%, "key(.*)\"", word)
    if found = 1 {
      FileDelete html.txt
      return %word%
    }
  }


PostPosted: July 17th, 2009, 4:02 pm 

Joined: March 19th, 2008, 12:43 am
Posts: 5482
Location: the tunnel(?=light)
The problem is likely here:

Code:
RegExMatch(%A_LoopReadLine%, "key(.*)\"", word)


The extra double quote makes it look as if the RegEx needle is still being defined; eliminate one of them and try again. Also, the One True Brace style is not supported for a file-reading loop, so you may want to move the brace to the next line:


Code:
Loop, read, html.txt
{
  found := RegExMatch(%A_LoopReadLine%, "key(.*)\"", word)
  if found = 1
  {
    FileDelete html.txt
    return %word%
  }
}

_________________
Try Quick Search for Autohotkey or see the tutorial for newbies.


PostPosted: July 17th, 2009, 5:43 pm 
Thanks for the advice.

Now the script loads, but more problems have emerged. Every time it runs, it says the variable I'm reading contains illegal characters, since I'm parsing an HTML document. What should I do? Also, for RegExMatch, the string I'm looking for is the word in the following URL:

<a href="http://www.abc.def/ghi=word">blah</a>

So the word sits between an equals sign and a quotation mark. I used "ghi=" as the keyword before (.*), but it seems I can't use a quotation mark to end it. Is there another way to handle this situation properly?


PostPosted: July 17th, 2009, 5:59 pm 

Joined: December 26th, 2005, 4:40 pm
Posts: 8776
wwkuter1 wrote:
Now the script loads, but more problems have emerged. Every time it runs, it says the variable I'm reading contains illegal characters, since I'm parsing an HTML document.


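Remove the percent signs around A_LoopReadLine; inside an expression such as the parameters of RegExMatch, a variable is referenced without them.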
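The likely reason for the "illegal characters" error: in an expression, %Var% means "the variable whose name is stored in Var" (a double-deref). A small sketch with hypothetical variables (AutoHotkey v1):

```autohotkey
; Bare variable versus %...% double-deref inside an expression (AutoHotkey v1).
; Line and Name are hypothetical variables for this example.
Line := "abc"
Name := "Line"
Direct := Line        ; inside an expression a variable is used bare: abc
Indirect := %Name%    ; double-deref: the variable NAMED by Name, i.e. Line, so also abc
; RegExMatch(%A_LoopReadLine%, ...) therefore tries to use the text of the line
; as a variable name; HTML such as <a href=...> is not a valid name, hence the error.
```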

sinkfaze wrote:
Code:
Loop, read, html.txt
{
  found := RegExMatch(%A_LoopReadLine%, "key(.*)"", word)
  if found = 1
  {
    FileDelete html.txt
    return %word%
  }
}


Code:
Str=<a href="http://www.abc.def/ghi=word">blah</a>
Match=ghi=(.*)"
RegExMatch( Str, Match, Result )
MsgBox, % Result1 ; Result holds the whole match (ghi=word"), Result1 only the captured word
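One caveat with (.*): it is greedy, so if a line contains more than one quote after ghi=, it runs to the last one. A negated character class stops at the first quote instead. The second link in Str below is hypothetical, added only so the difference shows (AutoHotkey v1):

```autohotkey
; Greedy .* versus a negated character class (AutoHotkey v1).
Str = <a href="http://www.abc.def/ghi=word">blah</a><a href="x">y</a>
; In an expression, "" inside a quoted string is one literal quote character.
RegExMatch(Str, "ghi=(.*)""", Greedy)     ; .* runs to the LAST quote on the line
RegExMatch(Str, "ghi=([^""]*)""", Stop)   ; [^"]* stops at the FIRST quote
; Greedy1 = word">blah</a><a href="x    Stop1 = word
```

A lazy quantifier, ghi=(.*?)", gives the same result as the negated class for this string.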


PostPosted: July 17th, 2009, 6:00 pm 

Joined: March 19th, 2008, 12:43 am
Posts: 5482
Location: the tunnel(?=light)
Try this:

Code:
Loop, read, html.txt
{
  found := RegExMatch(A_LoopReadLine, "key(.*)""", word) ; no percent signs around A_LoopReadLine; "" is a literal quote
  if (found) ; RegExMatch returns the match position (0 if no match), not 1
  {
    FileDelete html.txt
    return %word%
  }
}


And if you're trying to capture a subpattern here:

Code:
RegExMatch(A_LoopReadLine, "key(.*)""", word)


Wouldn't you want to do this?

Code:
return %word1%
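Putting the suggestions in this thread together, a sketch of the whole loop might look like this (AutoHotkey v1). It assumes a hypothetical throwaway file sample.html created just for the demo, uses break instead of return, and deletes only its own demo file:

```autohotkey
; Read a file line by line and keep the first captured word (AutoHotkey v1).
; sample.html is a hypothetical throwaway file created only for this demo.
FileDelete, sample.html
FileAppend, <p>no link here</p>`n<a href="http://www.abc.def/ghi=word">blah</a>`n, sample.html
Found := ""
Loop, Read, sample.html
{
    ; RegExMatch returns the match position (0 = no match), so test for non-zero.
    if (RegExMatch(A_LoopReadLine, "ghi=([^""]*)""", Match))
    {
        Found := Match1   ; Match is the whole match, Match1 the first (...) subpattern
        break             ; stop at the first matching line
    }
}
FileDelete, sample.html
; Found now contains word
```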

PostPosted: July 17th, 2009, 7:32 pm 
Thanks a lot!! It's working now!

I have a few other questions.

Is it possible to give InputBox multiple input fields? I know I could use StringSplit and let the user type everything into one field, but that doesn't look very good.


 Post subject: Parsing of URLs
PostPosted: August 30th, 2009, 5:30 pm 
So you have an HTML file and want to find all strings between def/ and ". The following script will do that.


Code:
# Script def.txt
# Read file in.
var str file ; cat "http://www.somesite.com/somepage.html" > $file
# Go thru all instance of def/  .
while ( { sen -c "^def/^" $file } > 0 )
do
    # Throw away portion up to def/.
    stex "^def/^]" $file > null
    # Print out string up to the following ".
    stex "]^\"^" $file
done



The script is written in biterscripting (http://www.biterscripting.com). To try it, save the script as C:/Scripts/def.txt, start biterscripting, and enter the following command.

Code:
script "C:/Scripts/def.txt"



Make sure you enter the correct HTML file location in the script instead of "http://www.somesite.com/somepage.html". It can be a local file such as "C:/xyz.html" or a document on the internet such as "http://www.something.com/somepage.html". Double quotes are required in both cases. For a document on the internet, the leading http:// is required. The extension does not matter; it can be a .txt file, an .asp file, or anything else. As long as the script finds instances of def/ in it, it will extract the strings between def/ and ".
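Since this is an AutoHotkey forum, the same extraction (every string between def/ and the next double quote) can also be sketched in AutoHotkey v1 by calling RegExMatch repeatedly with its StartingPosition parameter. Html is a hypothetical variable holding the page text:

```autohotkey
; Collect every string between def/ and the next double quote (AutoHotkey v1).
; Html, Results, Pos and M are hypothetical names.
Html = "http://abc.def/ghi" "http://jkl.def/mno"
Results := ""
Pos := 1
while (Pos := RegExMatch(Html, "def/([^""]*)""", M, Pos))
{
    Results .= M1 . "`n"   ; keep only the captured part, one per line
    Pos += StrLen(M)       ; resume the search right after this match
}
; Results now holds ghi and mno, one per line
```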

Hope this helps.


Jenni

(There is also a sample script at http://www.biterscripting.com/SS_URLs.html that extracts all URLs from a web page, if that's what you are trying to do.)


PostPosted: August 30th, 2009, 6:02 pm 

Joined: March 19th, 2008, 12:43 am
Posts: 5482
Location: the tunnel(?=light)
Jenni, I think you're in the wrong forum.

PostPosted: August 31st, 2009, 11:21 am 

Joined: October 17th, 2006, 4:15 pm
Posts: 7503
Location: Australia
wwkuter1, since you mentioned you're a C programmer, I suppose that when you write \" you mean a literal " character. Perhaps you've figured it out already, but AutoHotkey uses ` (backtick) as its escape character, except for double-quote marks, which must be doubled to be interpreted literally in an expression:
Quote:
To include an actual quote-character inside a literal string, specify two consecutive quotes as shown twice in this example: "She said, ""An apple a day."""
Source: Variables and Expressions
\ has special meaning only in regular expressions, which are processed at a much later stage than ` or "".
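The three layers described above can be sketched as follows (AutoHotkey v1; the variable names are hypothetical):

```autohotkey
; Quote doubling, backtick escapes and legacy assignment (AutoHotkey v1).
Needle := "key(.*)"""              ; expression: "" is one literal quote, so Needle is key(.*)"
Legacy = key(.*)"                  ; legacy (=) assignment: the quote is already literal
Said := "She said, ""An apple a day."""
Tabbed := "a`tb"                   ; ` is the escape character: `t is a tab, `n a newline
; Only after these steps does RegExMatch see the needle, which is where \ matters,
; e.g. \. in a needle matches a literal dot.
```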
Quote:
Is it possible to give InputBox multiple input fields?
It is not. However, it can easily be accomplished with AutoHotkey's GUI command/framework. A very basic example:
Code:
Gui, Add, Text,, Put your prompt here.
Gui, Add, Edit, vFirst
Gui, Add, Edit, vSecond
Gui, Add, Edit, vThird
Gui, Add, Button,, OK
Gui, Add, Button, x+10, Cancel
Gui, Show
return

ButtonOK:
Gui, Submit
MsgBox %First%, %Second%, %Third%
ButtonCancel:
GuiClose:
ExitApp

