AutoHotkey Community

PostPosted: February 9th, 2010, 6:38 am 
Joined: May 24th, 2009, 5:35 am
Posts: 2099
Location: Iowa, USA
Code:
html =
(
   ; enter HTML here
)

doc := COM_CreateObject( "HTMLfile" )
doc.write( html )
data := doc.forms[0].childNodes[0].innerText "`n"
table := doc.all.tags( "table" )
         
Loop, % ( rows := table[0].all.tags( "tr" ) ).length {
   If A_Index = 1 ; build headers
      Loop, % ( item := rows[ A_Index-1 ].all.tags( "font" ) ).length
         data .=   item[ A_Index-1 ].innerText
               .   ( A_Index<4 ? " " : "`t" ) ; combine first 4 headers in first column
   Else
      Loop, % ( item := rows[ A_Index-1 ].all.tags( "td" ) ).length
         data .=   ( (text := item[ A_Index-1 ].innerText)="" && A_Index=1 ) ? ""
               :   ( data~="`n$" && text+0<>"" ? "`t" : "" ) text "`t"
   data .= "`n"
}

FileAppend, %data%, text.txt
MsgBox, %data%
Return
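
The nested ternary in the data-row loop is fairly dense, so here is a sketch of the same logic unrolled into a function. `AppendCell` is a hypothetical name used only for illustration and is not part of the script above; the sample values are made up.

```autohotkey
; AppendCell is a hypothetical helper that mirrors the ternary in the loop above.
; data  = output accumulated so far
; text  = innerText of the current cell
; index = 1-based position of the cell within its row
AppendCell(data, text, index) {
    ; An empty first cell is skipped entirely.
    if (text = "" && index = 1)
        return data
    ; data ending in a newline means nothing has been written for this row yet.
    ; If the first value written is numeric, add a tab first so it lands in column 2.
    if (RegExMatch(data, "`n$") && (text + 0 <> ""))
        data .= "`t"
    return data . text . "`t"
}

row := "Header`n"
row := AppendCell(row, "", 1)      ; empty first cell: skipped
row := AppendCell(row, "1403", 2)  ; numeric at row start: leading tab added
row := AppendCell(row, "4.6", 3)
MsgBox, % row
```

The `text + 0 <> ""` test relies on AutoHotkey v1 returning an empty string when a non-numeric string is used in arithmetic.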

PostPosted: February 10th, 2010, 3:29 pm 

Joined: July 6th, 2009, 9:58 pm
Posts: 678
Thank you again, that works very well. Unfortunately, there is an unforeseen issue...

When I compile the script (using AHK2EXE or AHK_COMPILE 2) and try to run it from the EXE, I get this error:

Error at Line 40
The following variable name contains an illegal character:
"ppv.prm_"

Code:
#Include E:\FSROOT\FILES\SCRIPTS\AutoHotKey\MODULES\COM_L\com.ahk

;ready
filedelete, %A_scriptdir%\text.txt


;get the website html
url:="my URL"
pwb := COM_CreateObject("InternetExplorer.Application")
pwb.Visible := False
pwb.Navigate(url)
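while (pwb.readyState <> 4) ; 4 = READYSTATE_COMPLETE; wait for page to load
  Sleep, 100 ; pause between polls instead of busy-waiting
html := pwb.document.documentElement.innerHTML ; markup of the loaded page as parsed by IE
pwb.Quit() ; close the hidden IE instance so it does not linger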


;parse the data from the table
doc := COM_CreateObject( "HTMLfile" )
doc.write( html )
data := doc.forms[0].childNodes[0].innerText "`n"
table := doc.all.tags( "table" )

Loop, % ( rows := table[0].all.tags( "tr" ) ).length {
   If A_Index = 1 ; build headers
      Loop, % ( item := rows[ A_Index-1 ].all.tags( "font" ) ).length
         data .=   item[ A_Index-1 ].innerText
               .   ( A_Index<4 ? " " : "`t" ) ; combine first 4 headers in first column
   Else
      Loop, % ( item := rows[ A_Index-1 ].all.tags( "td" ) ).length
         data .=   ( (text := item[ A_Index-1 ].innerText)="" && A_Index=1 ) ? ""
               :   ( data~="`n$" && text+0<>"" ? "`t" : "" ) text "`t"
   data .= "`n"
}

;append and launch
FileAppend, %data%, %A_scriptdir%\text.txt
run, Excel.exe "%A_scriptdir%\text.txt"
Exitapp
Return

Pause::
exitapp


PostPosted: February 10th, 2010, 3:55 pm 
randallf said

Quote:
I shall clarify, what I am looking for is something that will turn the above, which is a single table row, into a tab delimited row of the same data.



I copied the source HTML (thanks for attaching it) to file C:/Table.html, then ran the following script in biterscripting.

Code:
scr SS_WebPageToCSV.txt page("C:/Table.html")


I got the following output.

Quote:
, T3h Customer / 901094 , 1403 , 4.6 , 4.6 , 4.6 , 4.7 , 4.6 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , , ,


Is that what you are looking for? If so, take a look at the WebPageToCSV script at http://www.biterscripting.com/SS_WebPageToCSV.html . I have been using it for a while, and it extracts a table from any web page (or local file) in all the table cases I have encountered. You can also specify a table number using number(5), etc.

The script outputs in CSV (Comma Separated Values). I understand you want TSV (Tab Separated Values). Just change comma (,) in the script to tab (\t).
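
If you would rather leave that script alone, the comma-separated line could also be converted afterwards in AutoHotkey. This is only a minimal sketch, assuming AutoHotkey_L (for `Trim()`) and assuming no cell contains a comma of its own; the sample line is a shortened copy of the output quoted above.

```autohotkey
; Sample line shortened from the output quoted above.
line := ", T3h Customer / 901094 , 1403 , 4.6 , 0.0 , ,"
tsv := ""
; Split on commas (the comma delimiter must be escaped in Loop, Parse).
Loop, Parse, line, `,
    tsv .= (A_Index > 1 ? "`t" : "") . Trim(A_LoopField) ; Trim() requires AutoHotkey_L
MsgBox, % tsv
```

Because it splits on every comma, a cell such as "Smith, John" would be broken into two columns, so treat it as a starting point rather than a general CSV parser.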


PostPosted: February 10th, 2010, 4:11 pm 

Joined: July 6th, 2009, 9:58 pm
Posts: 678
JenniC wrote:
Is that what you are looking for? If so, take a look at the WebPageToCSV script at http://www.biterscripting.com/SS_WebPageToCSV.html . [...]


!!! I will have to check this out as yes, that's exactly what I'm after, thanks for the info!

Edit: Unfortunately, I'm not after anything that isn't 10,000% free. Why does it want a license? Whatever.


PostPosted: February 10th, 2010, 7:38 pm 
randallf wrote:
When I compile the script and try to run from an EXE, using AHK2EXE or AHK_COMPILE 2 I get an error of:

Error at Line 40
The following variable name contains an illegal character:
"ppv.prm_"

I wasn't able to duplicate this error. Make sure you have the most recent versions of AHK_L and COM.ahk, and make sure you're consistent with either Unicode or ANSI.
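
One way to check which build is actually executing the code is to have the script report it, both uncompiled and from the EXE. This is a small diagnostic sketch, not a fix; it uses only built-in variables, and `A_IsUnicode` is specific to AutoHotkey_L (it is simply blank elsewhere).

```autohotkey
; Diagnostic sketch: show which interpreter is executing this script.
; A_IsUnicode is 1 on Unicode builds of AutoHotkey_L and blank otherwise.
build := A_IsUnicode ? "Unicode" : "ANSI"
mode  := A_IsCompiled ? "compiled EXE" : "uncompiled script"
MsgBox, % "AutoHotkey " A_AhkVersion "`n" build " build, running as " mode
```

If the two runs report different versions or builds, the compiler is probably packaging a different interpreter than the one used for testing.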


PostPosted: February 19th, 2010, 7:43 pm 

Joined: July 6th, 2009, 9:58 pm
Posts: 678
I did get this issue sorted. If I am correct in my brief troubleshooting, you need the dec version of AHK_L... I am learning a lot of object-related stuff right now (in learning Python), which helps greatly in understanding this (and again, thank you for the excellent code).

Code:
 ( A_Index<4 ? " " : "`t" ) ; combine first 4 headers in first column


It seems that "4" is not a constant in my environment. I am making some attempts at counting the rows out and changing the line %accordingly%, but if you have any suggestions I am all ears :)

thanks again!

Edit: I think I may have got my head around this. It may depend on how many levels of the page have been expanded. The URLs that I am pulling from are actually the 'expand' links from the site itself, so the "4" above probably depends on the hierarchy of expansion chosen.

:)
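
One possible way to avoid hard-coding the "4" is to derive it from the table itself. The sketch below assumes that the extra header cells all belong to the first column, so the number of headers to merge is the header count minus the data-column count, plus one. That assumption, the function name `JoinHeaders`, and the sample headers are all hypothetical; the array literal and for-loop need a current AutoHotkey v1.1.

```autohotkey
; JoinHeaders is a hypothetical helper: it joins header texts so that the
; header row ends up with the same number of columns as a data row.
JoinHeaders(headers, dataCols) {
    ; Number of leading headers that share the first column.
    merge := headers.MaxIndex() - dataCols + 1
    out := ""
    for i, text in headers
        out .= text . (i < merge ? " " : "`t") ; space inside column 1, tab elsewhere
    return out
}

; Made-up sample: 6 header cells, 3 data columns, so the first 4 headers merge.
headers := ["Cust", "Name", "/", "ID", "Jan", "Feb"]
MsgBox, % JoinHeaders(headers, 3)
```

In the script from this thread the two counts would presumably come from the `.length` of the "font" collection in the header row and the "td" collection in a data row, but whether they line up that way depends on the page.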

