AutoHotkey Community Let's help each other out
jethrow
Joined: 24 May 2009 Posts: 1907 Location: Iowa, USA
Posted: Tue Feb 09, 2010 5:38 am
| Code: | html =
(
; enter HTML here
)
doc := COM_CreateObject( "HTMLfile" )
doc.write( html )
data := doc.forms[0].childNodes[0].innerText "`n"
table := doc.all.tags( "table" )
Loop, % ( rows := table[0].all.tags( "tr" ) ).length {
    If A_Index = 1 ; build headers
        Loop, % ( item := rows[ A_Index-1 ].all.tags( "font" ) ).length
            data .= item[ A_Index-1 ].innerText
                . ( A_Index<4 ? " " : "`t" ) ; combine first 4 headers in first column
    Else
        Loop, % ( item := rows[ A_Index-1 ].all.tags( "td" ) ).length
            data .= ( (text := item[ A_Index-1 ].innerText)="" && A_Index=1 ) ? ""
                : ( data~="`n$" && text+0<>"" ? "`t" : "" ) text "`t"
    data .= "`n"
}
FileAppend, %data%, text.txt
MsgBox, %data%
Return |
randallf
Joined: 06 Jul 2009 Posts: 678
Posted: Wed Feb 10, 2010 2:29 pm
Thank you again, that works very well. Unfortunately, there is an unforeseen issue...
When I compile the script and try to run it as an EXE, using AHK2EXE or AHK_COMPILE 2, I get this error:
Error at Line 40
The following variable name contains an illegal character:
"ppv.prm_"
| Code: | #Include E:\FSROOT\FILES\SCRIPTS\AutoHotKey\MODULES\COM_L\com.ahk
;ready
filedelete, %A_scriptdir%\text.txt
;get the website html
url:="my URL"
pwb := COM_CreateObject("InternetExplorer.Application")
pwb.Visible := False
pwb.Navigate(url)
Loop
    if (pwb.readyState = 4) ; wait for page to load
        break
html := pwb.document.documentElement.innerHTML ; should be correct to get the source
;parse the data from the table
doc := COM_CreateObject( "HTMLfile" )
doc.write( html )
data := doc.forms[0].childNodes[0].innerText "`n"
table := doc.all.tags( "table" )
Loop, % ( rows := table[0].all.tags( "tr" ) ).length {
    If A_Index = 1 ; build headers
        Loop, % ( item := rows[ A_Index-1 ].all.tags( "font" ) ).length
            data .= item[ A_Index-1 ].innerText
                . ( A_Index<4 ? " " : "`t" ) ; combine first 4 headers in first column
    Else
        Loop, % ( item := rows[ A_Index-1 ].all.tags( "td" ) ).length
            data .= ( (text := item[ A_Index-1 ].innerText)="" && A_Index=1 ) ? ""
                : ( data~="`n$" && text+0<>"" ? "`t" : "" ) text "`t"
    data .= "`n"
}
;append and launch
FileAppend, %data%, %A_scriptdir%\text.txt
run, Excel.exe "%A_scriptdir%\text.txt"
Exitapp
Return
Pause::
exitapp |
JenniC Guest
Posted: Wed Feb 10, 2010 2:55 pm Post subject: Extracting table from html <table...</table>
randallf said
| Quote: | | I shall clarify, what I am looking for is something that will turn the above, which is a single table row, into a tab delimited row of the same data. |
I copied the source HTML (thanks for attaching it) to file C:/Table.html, then ran the following script in biterscripting.
| Code: | | scr SS_WebPageToCSV.txt page("C:/Table.html") |
I got the following output.
| Quote: | | , T3h Customer / 901094 , 1403 , 4.6 , 4.6 , 4.6 , 4.7 , 4.6 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , , , |
Is that what you are looking for? If so, take a look at the WebPageToCSV script at http://www.biterscripting.com/SS_WebPageToCSV.html . I have been using it for a while, and it has extracted the table from every web page (and local file) I have tried it on. You can also select a particular table by number, for example number(5).
The script outputs CSV (comma-separated values). I understand you want TSV (tab-separated values); just change the comma (,) in the script to a tab (\t). |
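If the comma-separated output is already in hand, the same comma-to-tab swap can also be done afterwards in AutoHotkey rather than inside the biterscripting script. This is only a minimal sketch: the variable names csv and tsv are made up for illustration, the sample string is a shortened copy of the output quoted above, and it assumes that no field contains a comma of its own.

```autohotkey
; Sketch only: csv holds one line of comma-separated output (shortened sample).
csv := " , T3h Customer / 901094 , 1403 , 4.6 , 4.6"

; Replace every comma, together with any spaces around it, with a single tab.
; Assumption: no field contains a literal comma, so a plain regex swap is safe.
tsv := RegExReplace(csv, " *, *", "`t")

MsgBox, %tsv%
```

A field with an embedded comma would be split in two by this approach, so it only suits simple numeric tables like the one in this thread.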
randallf
Joined: 06 Jul 2009 Posts: 678
Posted: Wed Feb 10, 2010 3:11 pm Post subject: Re: Extracting table from html <table...</table>
| JenniC wrote: | Is that what you are looking for? If so, take a look at the WebPageToCSV script at http://www.biterscripting.com/SS_WebPageToCSV.html . |
!!! I will have to check this out; yes, that's exactly what I'm after. Thanks for the info!
Edit: Unfortunately, I'm not after anything that isn't 10,000% free. Why does it want a license? Whatever. |
jethrow - nli Guest
Posted: Wed Feb 10, 2010 6:38 pm
| randallf wrote: | When I compile the script and try to run from an EXE, using AHK2EXE or AHK_COMPILE 2 I get an error of:
Error at Line 40
The following variable name contains an illegal character:
"ppv.prm_" |
I wasn't able to duplicate this error. Make sure you have the most recent versions of AHK_L and COM.ahk, and make sure you use either the Unicode or the ANSI build consistently. |
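One quick way to check which version and build a script is actually running on is to have it report its own built-in variables. This is a rough sketch, not part of the original code; the variable name build is made up for illustration, and it assumes an AutoHotkey_L release recent enough to define A_IsUnicode.

```autohotkey
; A_IsUnicode is 1 on Unicode builds of AutoHotkey_L and blank on ANSI builds.
build := A_IsUnicode ? "Unicode" : "ANSI"

; A_AhkVersion holds the version of the interpreter running this script.
MsgBox, AutoHotkey %A_AhkVersion% (%build%)
```

Running it once as a plain .ahk file and once compiled shows whether the compiler bundles a different interpreter than the installed one, which is one possible cause of an error that only appears in the EXE.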
randallf
Joined: 06 Jul 2009 Posts: 678
Posted: Fri Feb 19, 2010 6:43 pm
I did get this issue sorted. If I am correct in my brief troubleshooting, you need the dec version of AHK_L... I am learning a lot of object-related stuff right now (in learning Python), which helps greatly in understanding this (and again, thank you for the excellent code).
| Code: | | ( A_Index<4 ? " " : "`t" ) ; combine first 4 headers in first column |
It seems that "4" is not a constant in my environment. I am trying to count the rows and change the line %accordingly%, but if you have any suggestions I am all ears.
Thanks again!
Edit: I think I may have got my head around this: it may depend on how many levels of the page have been expanded. The URLs that I am pulling from are actually the 'expand' links from the site itself, so the "4" above probably depends on the hierarchy of expansion chosen.
 |