AutoHotkey Community

Posted: June 27th, 2007, 4:48 pm
Hi,
after hours of debugging I have not been able to solve a problem.

I want to parse the HTML of a Spanish forum page and extract the thread titles.
I tested the regular expression in a separate program and it seems to be fine. My problem is that the value of the capture group (?<hilo>...) changes only every 7 iterations (not on every iteration, as it should), while the value returned by RegExMatch seems correct: it changes on each call.

The code:

Code:
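; Download the forum index and read it into a variable.
UrlDownloadToFile, http://foros.acb.com/viewforum.php?f=1, fACB
FileRead, html, fACB
res =
pos = 1
Loop
{
   ; sitio is the position of the next match, or 0 when there is none.
   sitio := MyRegEx(html,pos)

   ; Resume one character after the start of the previous match.
   pos := sitio + 1

   if sitio = 0
      break
}

MsgBox, Result:%res%

MyRegEx(text,position)
{
   ; A local declaration on the first line makes every other variable here (such as res) global.
   local dani, danihilo
   cursor:=RegExMatch(text,"<a href=""viewtopic\.php\?t=.+?""\sclass=""topictitle"">(?<hilo>.+?)</a>",dani,position)
   ; danihilo holds the text captured by the named group (?<hilo>...).
   res .= "match:" . danihilo . "`n"
   return cursor
}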


Have you experienced something similar, where the captured variable's value does not update between iterations?
After reading another thread I declared the variables for the regex groups as local, but the result is the same.

Thanks for reading


Posted: June 27th, 2007, 5:34 pm
I've found that my regular expression is not as good as I thought: it returns more results than expected.

So, to make my question concrete:

How can I write a regular expression that returns only anchor tags that have a class attribute?

My current expression:
<a href=""viewtopic\.php\?t=.+?""\sclass=""topictitle"">(?<hilo>.+?)</a>

returns anchor tags with or without a class attribute, whenever the href is viewtopic.php?t=...
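
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the lazy .+? in the href part is still allowed to run past the closing quote and the end of its own tag until it reaches a later class="topictitle". Restricting it to [^"]+ keeps the href inside a single attribute value. A minimal sketch in AutoHotkey v1 syntax, using a made-up HTML fragment (not the real forum page):

```autohotkey
; Hypothetical fragment: a page link without a class, followed by a title link.
sample := "<a href=""viewtopic.php?t=1&start=15"">2</a> <a href=""viewtopic.php?t=2"" class=""topictitle"">Some thread</a>"

; [^""]+ cannot cross the closing quote of href, and [^<]+ cannot cross the start of a tag.
needle := "<a href=""viewtopic\.php\?t=[^""]+""\sclass=""topictitle"">(?<hilo>[^<]+)</a>"

; found is the position where the match starts; danihilo holds the named group.
found := RegExMatch(sample, needle, dani)
MsgBox % "Position: " . found . "`nTitle: " . danihilo
```

With this fragment the first anchor cannot match, because its href is followed directly by > rather than by a class attribute, so only the second anchor is reported.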


Posted: June 27th, 2007, 7:30 pm
Search the forum for XPath; I think you can get the titles directly that way. A fancy regex would work too, but I am just learning regex.

Posted: June 29th, 2007, 11:37 am
songless wrote:
How can I write a regular expression that returns only anchor tags that have a class attribute?
Start of an answer:
Get all a href links in a Web page

Although that's not exactly what you asked.
Well, I suggest writing a little script that applies your regex to a continuation section, so we can test it and find the problem.
The RE you give looks OK, so I wonder what is wrong.
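
A test script of that kind could look like the sketch below (AutoHotkey v1 syntax). The HTML line in the continuation section is invented for illustration, and the helper name Titles is hypothetical; the pattern is the one from the first post. It only shows how the same title can be reported at several positions when other viewtopic links precede the title link, which may or may not be what happens on the real page.

```autohotkey
; Hypothetical one-line sample: two page links without a class, then one title link.
html =
(
<a href="viewtopic.php?t=1&start=15">2</a> <a href="viewtopic.php?t=1&start=30">3</a> <a href="viewtopic.php?t=2" class="topictitle">Some thread</a>
)

; The pattern from the first post, unchanged.
needle := "<a href=""viewtopic\.php\?t=.+?""\sclass=""topictitle"">(?<hilo>.+?)</a>"

MsgBox % Titles(html, needle)

; Collects "position: title" lines for every match found in text.
Titles(text, needle)
{
   out := ""
   pos := 1
   Loop
   {
      pos := RegExMatch(text, needle, m, pos)
      if (pos = 0)
         break
      out .= pos . ": " . mhilo . "`n"
      ; Resume one character after the start of this match, as the original loop does.
      pos += 1
   }
   return out
}
```

On this sample each of the three anchors is a valid starting point, because .+? can stretch from any of them up to the one class="topictitle", so the list shows three different positions with the same title. Replacing pos += 1 with pos += StrLen(m) would skip past the whole match instead.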

