AutoHotkey Community


All times are UTC [ DST ]




Posted: February 5th, 2010, 11:33 am

Joined: February 24th, 2006, 12:56 am
Posts: 172
Here is the code I have (it is supposed to get info from Amazon).
It just doesn't recognize the HTML tag <div class="productTitle". It was working before. Can somebody verify this?

Code:
^F1::

    page := 1
    endpage := 5598  ; category specific
    html_tag = <div class="productTitle"

    Loop
    {
        ; build the url on each pass so that %page% is current
        url=http://www.amazon.com/s/qid=1265362596/ref=sr_pg_1?ie=UTF8&rs=226680&bbn=226680&rh=i:stripbooks,n:!1000,n:75,n:226680&page=%page%

        ; download given url
        URLDownloadToFile, %url%, file.tmp
        if ErrorLevel
            break  ; stop if the download failed

        Loop, read, file.tmp
        {
            IfInString, A_LoopReadLine, %html_tag%
            {
                str := A_LoopReadLine
                MsgBox, %str%

                ; [^""]+ stops at the first closing quote, so r1 holds only the first href
                fp := RegExMatch(str, "<a\s+href=""([^""]+)""", r)

                if (fp > 0)
                {
                    MsgBox, r: %r%
                    MsgBox, r1: %r1%
                }
            }
        }

        ;get urls in the web page

        ;place data into spreadsheet

        ; stop after the last page, otherwise move on to the next one
        if (page >= endpage)
            break
        page++
    }

return


Posted: February 5th, 2010, 2:14 pm

Joined: March 19th, 2008, 12:43 am
Posts: 5480
Location: the tunnel(?=light)
Have you checked the lines most recently executed to find out where the script is terminating?

_________________
Try Quick Search for Autohotkey or see the tutorial for newbies.


Posted: February 5th, 2010, 2:59 pm

Joined: February 24th, 2006, 12:56 am
Posts: 172
The script works; it just doesn't recognize the HTML tag I specified.
I tried searching for something else, and it does recognize that.
The problem is with this one tag, and I don't know what is going on.

Can someone take a quick look at it?


Posted: February 5th, 2010, 6:03 pm

Joined: February 24th, 2006, 12:56 am
Posts: 172
Let me simplify.

I have this line in the web page:
<div class="productTitle"><a href="http://www.amazon.com/Omnivores-Dilemma-Natural-History-Meals/dp/0143038583/ref=sr_1_1?ie=UTF8&s=books&qid=1265389329&sr=1-1"> The Omnivore's Dilemma: A Natural History of Four Meals</a> <span class="ptBrand">by <a href="/Michael-Pollan/e/B000AQ74HQ/ref=sr_ntt_srch_lnk_1?_encoding=UTF8&amp;qid=1265389329&amp;sr=1-1">Michael Pollan</a></span><span class="binding"> (<span class="format">Paperback</span> - Aug. 28, 2007)</span></div>

I need to extract this:

http://www.amazon.com/Omnivores-Dilemma ... 329&sr=1-1

How am I supposed to do that?
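
For a line like the one quoted above, one possible approach is a non-greedy capture of the first href value. This is only a sketch in AutoHotkey v1 syntax: the variable names `line` and `m` are made up for illustration, the sample line is shortened from the one in this post, and it assumes the href is wrapped in double quotes.

```autohotkey
; Sample line, shortened from the post above (the trailing binding span is left out).
line = <div class="productTitle"><a href="http://www.amazon.com/Omnivores-Dilemma-Natural-History-Meals/dp/0143038583/ref=sr_1_1?ie=UTF8&s=books&qid=1265389329&sr=1-1"> The Omnivore's Dilemma: A Natural History of Four Meals</a> <span class="ptBrand">by <a href="/Michael-Pollan/e/B000AQ74HQ/ref=sr_ntt_srch_lnk_1?_encoding=UTF8&amp;qid=1265389329&amp;sr=1-1">Michael Pollan</a></span></div>

; [^""]+ matches everything up to the next double quote, so the capture
; ends at the first href instead of running on to a later tag.
; Inside an AutoHotkey v1 string, "" stands for one literal double quote.
if (RegExMatch(line, "<a\s+href=""([^""]+)""", m))
    MsgBox % "First link: " m1      ; m1 holds the first capture group
else
    MsgBox, No link found on this line.
```

A greedy `(.*)` would keep matching past the first closing quote, which is why the character class is used here instead.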


Posted: February 5th, 2010, 6:46 pm

Joined: March 19th, 2008, 12:43 am
Posts: 5480
Location: the tunnel(?=light)
When I manually search the generated temp file for that link there is no <div class="productTitle" in it. Perhaps they've changed their page coding.

Posted: February 5th, 2010, 7:25 pm

Joined: February 24th, 2006, 12:56 am
Posts: 172
Here is the first page I need to extract info from:

http://www.amazon.com/s/qid=1265362596/ ... 680&page=1

Basically, I need to extract all the URLs that lead to the 12 listed books so that I can get info from each of them later.


productTitle is there on the page; it seems that somehow it can't be used.


Posted: February 5th, 2010, 7:50 pm

Joined: March 19th, 2008, 12:43 am
Posts: 5480
Location: the tunnel(?=light)
This is what you'll find in the source for productTitle:

matches wrote:
listView div.productTitle
div.productTitle


Nothing with <div class="productTitle"... so you'll need to change your search options.
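
To check this on your own machine, a rough sketch like the following counts how often the tag appears in the downloaded file. It assumes file.tmp has already been written by URLDownloadToFile as in the first post; the variable names (`html`, `needle`, `count`, `pos`, `first`, `start`) are only illustrative.

```autohotkey
; Read the whole downloaded page into memory.
FileRead, html, file.tmp
if ErrorLevel
{
    MsgBox, Could not read file.tmp.
    return
}

needle = <div class="productTitle"

; Count occurrences by searching repeatedly from just past the last hit.
count := 0
pos := 1
while (pos := InStr(html, needle, false, pos))
{
    count++
    pos += StrLen(needle)
}
MsgBox % "Occurrences of the tag: " count

; Show some context around the first plain "productTitle", if there is one,
; to see in what form it appears in the downloaded text.
if (first := InStr(html, "productTitle"))
{
    start := (first > 40) ? first - 40 : 1
    MsgBox % SubStr(html, start, 120)
}
return
```

If the count is 0 but the second message box shows text, the downloaded file contains the class name only in some other form, which would match what is described above.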

Posted: February 5th, 2010, 7:58 pm

Joined: February 24th, 2006, 12:56 am
Posts: 172
I wonder how it is visible on my system, then.
Take a look at the screenshot:

http://img188.yfrog.com/i/88767958.jpg/


Posted: February 5th, 2010, 8:30 pm

Joined: March 19th, 2008, 12:43 am
Posts: 5480
Location: the tunnel(?=light)
I don't know. I downloaded the URL using URLDownloadToFile, and the resulting text file contained no instances of productTitle as a div class, only the dotted path format shown above.

If you select 'View Source' from the page itself, the productTitle div class tags are all there. With that in mind, you could use COM if you don't mind the page being open (requires the COM Standard Library and the iWeb functions):

Code:
iWeb_Init()
pwb:=iWeb_getWin("http://www.amazon.com/s/qid=1265362596/ref=sr_pg_1?ie=UTF8&rs=226680&bbn=226680&rh=i:stripbooks,n:!1000,n:75,n:226680&page=1")
Loop % iWeb_getTagLen(pwb,"div") {
  if InStr(iWeb_getTagObj(pwb,"div",A_Index-1,-1,-1,"className")
   ,"productTitle") {
    RegExMatch(iWeb_getTagObj(pwb,"div",A_Index-1,-1,-1,"outerHTML")
     ,"i)(?<=<a href="")http://.*?(?="">)",m)
    MsgBox % m
  }
}
iWeb_Release(pwb)
iWeb_Term()
return

iWeb_getTagLen(pdsp,tag,t="-1",r="-1",frm="") {

   If pWin:=iWeb_DomWin(pdsp,frm) ; doesn't this cover the frames already?
      result:=COM_Invoke(pWin,"document.all.tags[" tag "]"
       . ((tag="table" && t>=0) ? ".item[" t "].rows" : "") ((tag="table" && r>=0) ? "[" r "].cells" : "")
       . ".length")
   COM_Release(pWin)
   return result

}

iWeb_getTagObj(pdsp,tag,itm,r="-1",c="-1",type="innerText",frm="") {

   If pWin:=iWeb_DomWin(pdsp,frm)
      result:=COM_Invoke(pWin,"document.all.tags[" tag "].item[" itm "]"
       . ((tag="table" && r>=0) ? ".rows[" r "]" : "") ((tag="table" && c>=0) ? ".cells[" c "]" : "")
       . "." type)
   COM_Release(pWin)
   return result

}
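
For comparison, here is a rough sketch of the same idea without the COM Standard Library or the iWeb functions. It is not part of the script above and rests on assumptions: it needs AutoHotkey v1.1 or later (AutoHotkey_L) with built-in COM support, Internet Explorer must be installed, and the page must still serve the productTitle divs. The names `wb`, `divs`, `div` and `links` are illustrative.

```autohotkey
; Assumption: AutoHotkey v1.1+ (built-in COM) and Internet Explorer installed.
url := "http://www.amazon.com/s/qid=1265362596/ref=sr_pg_1?ie=UTF8&rs=226680&bbn=226680&rh=i:stripbooks,n:!1000,n:75,n:226680&page=1"

wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := false
wb.Navigate(url)

; Wait until the page has finished loading (ReadyState 4 = complete).
while (wb.Busy || wb.ReadyState != 4)
    Sleep, 100

divs := wb.document.getElementsByTagName("div")
Loop % divs.length
{
    div := divs[A_Index - 1]            ; DOM collections are zero-based
    if (!InStr(div.className, "productTitle"))
        continue
    links := div.getElementsByTagName("a")
    if (links.length > 0)
        MsgBox % links[0].href          ; first link inside the title div
}

wb.Quit()
return
```

Reading the href from the DOM avoids running a regular expression over outerHTML, at the cost of starting a hidden browser instance.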
