AutoHotkey Community
Looking for: Parse rows from HTML table
jethrow



Joined: 24 May 2009
Posts: 1907
Location: Iowa, USA

Posted: Tue Feb 09, 2010 5:38 am

Code:
html =
(
   ; enter HTML here
)

doc := COM_CreateObject( "HTMLfile" )
doc.write( html )
data := doc.forms[0].childNodes[0].innerText "`n"
table := doc.all.tags( "table" )
         
Loop, % ( rows := table[0].all.tags( "tr" ) ).length {
   If A_Index = 1 ; build headers
      Loop, % ( item := rows[ A_Index-1 ].all.tags( "font" ) ).length
         data .=   item[ A_Index-1 ].innerText
               .   ( A_Index<4 ? " " : "`t" ) ; combine first 4 headers in first column
   Else
      Loop, % ( item := rows[ A_Index-1 ].all.tags( "td" ) ).length
         data .=   ( (text := item[ A_Index-1 ].innerText)="" && A_Index=1 ) ? ""
               :   ( data~="`n$" && text+0<>"" ? "`t" : "" ) text "`t"
   data .= "`n"
}

FileAppend, %data%, text.txt
MsgBox, %data%
Return

randallf



Joined: 06 Jul 2009
Posts: 678

Posted: Wed Feb 10, 2010 2:29 pm

Thank you again, that works very well. Unfortunately, there is an unforeseen issue...

When I compile the script and run the resulting EXE (built with AHK2EXE or AHK_COMPILE 2), I get this error:

Error at Line 40
The following variable name contains an illegal character:
"ppv.prm_"

Code:
#Include E:\FSROOT\FILES\SCRIPTS\AutoHotKey\MODULES\COM_L\com.ahk

;ready
filedelete, %A_scriptdir%\text.txt


;get the website html
url:="my URL"
pwb := COM_CreateObject("InternetExplorer.Application")
pwb.Visible := False
pwb.Navigate(url)
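Loop {
  Sleep, 100 ; poll every 100 ms instead of spinning the CPU
  if (pwb.readyState = 4) ; 4 = READYSTATE_COMPLETE: page has finished loading
    break
}
html := pwb.document.documentElement.innerHTML ; markup inside the <html> element, as parsed by IE
pwb.Quit() ; close the hidden IE instance so it does not linger after the script exits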


;parse the data from the table
doc := COM_CreateObject( "HTMLfile" )
doc.write( html )
data := doc.forms[0].childNodes[0].innerText "`n"
table := doc.all.tags( "table" )

Loop, % ( rows := table[0].all.tags( "tr" ) ).length {
   If A_Index = 1 ; build headers
      Loop, % ( item := rows[ A_Index-1 ].all.tags( "font" ) ).length
         data .=   item[ A_Index-1 ].innerText
               .   ( A_Index<4 ? " " : "`t" ) ; combine first 4 headers in first column
   Else
      Loop, % ( item := rows[ A_Index-1 ].all.tags( "td" ) ).length
         data .=   ( (text := item[ A_Index-1 ].innerText)="" && A_Index=1 ) ? ""
               :   ( data~="`n$" && text+0<>"" ? "`t" : "" ) text "`t"
   data .= "`n"
}

;append and launch
FileAppend, %data%, %A_scriptdir%\text.txt
run, Excel.exe "%A_scriptdir%\text.txt"
Exitapp
Return

Pause::
exitapp
JenniC
Guest





Posted: Wed Feb 10, 2010 2:55 pm    Post subject: Extracting table from html <table...</table>

randallf said

Quote:
I shall clarify, what I am looking for is something that will turn the above, which is a single table row, into a tab delimited row of the same data.



I copied the source HTML (thanks for attaching it) to file C:/Table.html, then ran the following script in biterscripting.

Code:
scr SS_WebPageToCSV.txt page("C:/Table.html")


I got the following output.

Quote:
, T3h Customer / 901094 , 1403 , 4.6 , 4.6 , 4.6 , 4.7 , 4.6 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , , ,


Is that what you are looking for? If so, take a look at the WebPageToCSV script at http://www.biterscripting.com/SS_WebPageToCSV.html . I have been using it for a while, and it has extracted the table from a web page (or local file) in every case I have encountered. You can also specify a table number using number(5), etc.

The script outputs in CSV (Comma Separated Values). I understand you want TSV (Tab Separated Values). Just change comma (,) in the script to tab (\t).
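
If editing the script itself is not convenient, the CSV output could also be converted afterwards. Below is a minimal AutoHotkey sketch of that idea, assuming AutoHotkey_L / v1.1 or later (for Trim()). CsvLineToTsv is a hypothetical helper name, not part of WebPageToCSV or COM.ahk, and the sample line is a shortened copy of the output quoted above.

```autohotkey
; Hypothetical helper: convert one CSV line into a tab-delimited line.
CsvLineToTsv(line) {
    tsv := ""
    ; The CSV mode of a parsing loop splits on commas and honours double-quoted fields.
    Loop, Parse, line, CSV
        tsv .= (A_Index > 1 ? "`t" : "") . Trim(A_LoopField) ; Trim() drops the padding around each value
    return tsv
}

sample := " , T3h Customer / 901094 , 1403 , 4.6 , 4.6"
MsgBox % CsvLineToTsv(sample)
```

This only handles a single line. A whole file would need an outer loop over its lines, and cell text that itself contains tabs would still need escaping.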
randallf



Joined: 06 Jul 2009
Posts: 678

Posted: Wed Feb 10, 2010 3:11 pm    Post subject: Re: Extracting table from html <table...</table>


!!! I will have to check this out as yes, that's exactly what I'm after, thanks for the info!

Edit: Unfortunately, I'm not after anything that isn't 10,000% free. Why does it want a license? Whatever.
jethrow - nli
Guest





Posted: Wed Feb 10, 2010 6:38 pm

randallf wrote:
When I compile the script and run the resulting EXE (built with AHK2EXE or AHK_COMPILE 2), I get this error:

Error at Line 40
The following variable name contains an illegal character:
"ppv.prm_"

I wasn't able to duplicate this error. Make sure you have the most recent versions of AHKL & COM.ahk - and make sure you're consistent with either Unicode or ANSI.
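
One way to check the points above is to have the script report which interpreter is actually executing it, once as a plain .ahk and once as the compiled EXE. This is only a diagnostic sketch. It assumes a build that defines the built-in variable A_IsUnicode, which is blank on ANSI builds and on builds that do not know it.

```autohotkey
; Report which interpreter build is executing this script.
; A_IsUnicode is blank on ANSI builds; A_IsCompiled is blank when run as a plain .ahk file.
info := "AutoHotkey version: " . A_AhkVersion . "`n"
      . "String encoding: " . (A_IsUnicode ? "Unicode" : "ANSI") . "`n"
      . "Running compiled: " . (A_IsCompiled ? "yes" : "no")
MsgBox, %info%
```

If the two runs report different versions or encodings, the compiler may be building the EXE from a different interpreter than the one used to run the script directly.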
randallf



Joined: 06 Jul 2009
Posts: 678

Posted: Fri Feb 19, 2010 6:43 pm

I did get this issue sorted. If my brief troubleshooting is correct, you need the dec version of AHK_L... I am learning a lot of object-related stuff right now (while learning Python), which helps greatly in understanding this (and again, thank you for the excellent code).

Code:
 ( A_Index<4 ? " " : "`t" ) ; combine first 4 headers in first column


It seems that "4" is not a constant in my environment. I am making some attempts at counting the rows and changing the line %accordingly%, but if you have any suggestions I am all ears :)

thanks again!

Edit: I think I may have got my head around this: it may depend on how many levels of the page have been expanded. The URLs that I am pulling from are actually the 'expand' links from the site itself, so the "4" above probably depends on the hierarchy of expansion chosen.

:)
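
If the number of merged headers really does vary with the expansion level, one option is to make it a parameter instead of a hard-coded 4. The following is a rough sketch, assuming AutoHotkey v1.1 or later (for-loops and array literals). JoinHeaders and mergeCount are hypothetical names, and the header texts are made-up placeholders rather than data from the real page.

```autohotkey
; Hypothetical helper: join header texts, putting the first mergeCount
; headers into one space-separated column and tab-separating the rest.
JoinHeaders(headers, mergeCount) {
    line := ""
    for index, text in headers
        line .= text . (index < mergeCount ? " " : "`t")
    return line
}

; With mergeCount = 4 this behaves like the hard-coded A_Index<4 test.
MsgBox % JoinHeaders(["H1", "H2", "H3", "H4", "H5", "H6"], 4)
```

In the original loop, the header texts would come from item[ A_Index-1 ].innerText, and mergeCount could then be chosen per URL or expansion level.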
All times are GMT