Extract table element from a page Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
sqlcode
Posts: 27
Joined: 02 Nov 2017, 08:55

Extract table element from a page

18 Jan 2018, 18:16

I have HTML like the below that I think gives me a table, and I need to extract "value_of_my_field". I ran my AHK script as shown below and it gives me both rows, but I am interested only in the first column of the first row.

How do I extract only the first-column value of the first row?

Code: Select all

	<div class="row" id="DtlId">
		<div class="col-md-12" style="-ms-overflow-x: auto;">
			<table class="Tabform tblCont" style="width: 50%; margin-top: 8px;">
			<tbody>
			<tr>
			<th title="Some  Desc">Something</th>
			<th title="Some1 Desc">Something1</th>
			<th title="Some2 Desc">Something2</th>
						<tr>
			<td>value_of_my_field</td>
			<td>value_of_some2</td>
			<td>value_of_some3</td>
		</div>
	</div>
            

Code: Select all

table := pwb.document.getElementById("DtlId").innertext
Msgbox % table
sqlcode

Re: Extract table element from a page

01 Feb 2018, 15:23

Hello, could anyone please shed some light?
A_AhkUser
Posts: 1147
Joined: 06 Mar 2017, 16:18
Location: France

Re: Extract table element from a page

01 Feb 2018, 16:45

Hi sqlcode,

Try the following:

Code: Select all

#NoEnv
HTML =
(Ltrim Join
<!DOCTYPE html>
<html>
	<head>
	<meta http-equiv='X-UA-Compatible' content='IE=edge'><meta charset='utf-8' />
	<title>HTMLFile</title>
	</head>
<body>
<div class='row' id='DtlId'>
	<div class='col-md-12' style='-ms-overflow-x: auto;'>
		<table class='Tabform tblCont' style='width: 50`%; margin-top: 8px;'>
		<tbody>
		<tr>
			<th title='Some  Desc'>Something</th>
			<th title='Some1 Desc'>Something1</th>
			<th title='Some2 Desc'>Something2</th>
		<tr>
		<td>value_of_my_field</td>
		<td>value_of_some2</td>
		<td>value_of_some3</td>
	</div>
</div>
</body>
</html>
)
(oHTML:=ComObjCreate("HTMLFile")).write(HTML)
MsgBox % oHTML.getElementById("DtlId").querySelector("td").innerText ; <<<<
MsgBox % oHTML.getElementById("DtlId").getElementsByTagName("td")[0].innerText ; <<<<
Hope this helps.
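
To apply this to the live page instead of the test document, something like the following should work. It is only a sketch: it assumes pwb is the Internet Explorer COM object from your first post and that the page has finished loading. Note that querySelector is only available when the page renders in IE8 standards mode or later, whereas getElementsByTagName also works in older document modes.

```autohotkey
; Assumption: pwb is an existing Internet Explorer COM object
; (as in the first post) and the page has finished loading.
oDiv := pwb.document.getElementById("DtlId")

; querySelector returns the first element matching the selector
MsgBox % oDiv.querySelector("td").innerText

; getElementsByTagName returns a collection; index 0 is the first <td>
MsgBox % oDiv.getElementsByTagName("td")[0].innerText
```

Both lines target the first td inside the DtlId div, so on the sample HTML they should show the same cell.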
sqlcode

Re: Extract table element from a page

27 Feb 2018, 14:26

Sorry for not responding sooner. I didn't get alerts and totally missed your response. So if I understand this correctly (I'm a novice with AHK), the HTML = ( ... ) portion is something I don't need, because you added it for testing purposes, and my script will only contain the last 3 lines of what you posted. Is that understanding correct? I am asking because I have tried that and I am not getting anything in the MsgBox.

Is this what the final script would look like?

Code: Select all

oHTML:=ComObjCreate("HTMLFile").write(HTML)
MsgBox % oHTML.getElementById("DtlId").querySelector("td").innerText ; <<<<
MsgBox % oHTML.getElementById("DtlId").getElementsByTagName("td")[0].innerText ; <<<<
sqlcode

Re: Extract table element from a page

27 Feb 2018, 16:57

OK, never mind. I now realize you are just creating a COM object to test this.

I tested this and I am getting the required data element. Out of curiosity, which of the two methods is the better one?

Thanks a lot for taking the time to test and post the solution.
sqlcode

Re: Extract table element from a page

02 Mar 2018, 12:23

If you don't mind, I have a follow-up question. I have a page defined something like the below, and I am trying to use your method but it's not working. I feel like I am close, but I have been stuck on this for days now. I am interested in extracting "111111" from the below.

Code: Select all

<table class="Tabform first-data" id="TabDetails">
	<tbody>
		<tr>
			<th title="Header1">Header1</th>
			<th title="Header2">Header2</th>
			<th title="Header3">Header3</th>			
		</tr>			
		<tr>
			<td>
				<input name="fdto" tabindex="64" class="first-data Tabtextboxinput-md-lg2 Datefield" id="_fdto" type="text" maxlength="8" value="111111">
			</td>
			<td>
			 <input name="ldto" tabindex="65" class="first-data Tabtextboxinput-md-lg2 Datefield1" id="_ldto" type="text" maxlength="5" value="2222">
			</td>
			<td>
			 <something similar_0>
			</td>
		</tr>			
		<tr>
			<td>
				<input name="fdto" tabindex="76" class="first-data Tabtextboxinput-md-lg2 Datefield" id="_fdto" type="text" maxlength="8" value="333333">
			</td>
			<td>
			 <input name="ldto" tabindex="77" class="first-data Tabtextboxinput-md-lg2 Datefield1" id="_ldto" type="text" maxlength="5" value="4444">
			</td>
			 <something similar_1>
		</tr>
Thanks,
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Extract table element from a page  Topic is solved

02 Mar 2018, 14:49

I saved your HTML as an .htm file, opened it in Internet Explorer, and tried the following code on it. It can easily be adjusted to work with an HTMLFile object: to use the oHTML approach, replace 'oWB.document' with 'oHTML'. Your table uses input fields, which is not something I'd tested code on before; however, I managed to figure something out by trial and error.

Code: Select all

q:: ;Internet Explorer - table get text
;note: WBGet is not built in; it is a user-defined function that must be included in the script
WinGet, hWnd, ID, A
oWB := WBGet("ahk_id " hWnd)
;MsgBox, % oWB.document.getElementsByTagName("table").length

oTable := oWB.document.getElementsByTagName("table")[0]
oRows := oTable.rows
vOutput := ""
;MsgBox, % oRows.length

;note: the html has two elements with same ID '_fdto'
MsgBox, % oWB.document.getElementById("_fdto").value

Loop % oRows.length
{
	oCells := oRows[A_Index-1].cells
	;MsgBox, % oCells.length
	Loop, % oCells.length
	{
		if oCells[A_Index-1].all.length
		{
			vTemp := ""
			try vTemp := oCells[A_Index-1].getElementsByTagName("input")[0].value
			vOutput .= vTemp "`t"
		}
		else
			vOutput .= oCells[A_Index-1].innerText "`t"
	}
	vOutput := SubStr(vOutput, 1, -1) "`r`n"
}
oWB := oTable := oRows := oCells := ""
MsgBox, % Clipboard := vOutput
return
Link:
[see 'TABLES: GET TEXT FROM CELLS' section]
jeeswg's Internet Explorer and HTML tutorial - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=7&t=31766
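
If only one specific cell is needed, the loop above can be skipped by indexing the row and cell directly. A minimal sketch, assuming oWB was obtained via WBGet as above, that the table keeps the id 'TabDetails', and that row 0 is the header row; the names oInput1 and oInput2 are just illustrative:

```autohotkey
; Assumption: oWB is a valid WebBrowser object (e.g. from WBGet)
; and the page contains the table with id "TabDetails".
oTable := oWB.document.getElementById("TabDetails")

; rows[0] is the header row, so rows[1] is the first data row
oInput1 := oTable.rows[1].cells[0].getElementsByTagName("input")[0]
MsgBox, % oInput1.value

; the second data row has an input with the same id '_fdto',
; so it is addressed here by position rather than by id
oInput2 := oTable.rows[2].cells[0].getElementsByTagName("input")[0]
MsgBox, % oInput2.value

oTable := oInput1 := oInput2 := ""
```

Because ids are meant to be unique, getElementById will typically only ever return the first '_fdto' element. Indexing by row and cell makes it explicit which one is being read.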
sqlcode

Re: Extract table element from a page

02 Mar 2018, 18:10

Thank you, excellent.

It worked, but could you please elaborate on what happened here? After playing around with your solution, I realized I was making it overly complicated. For example, if I always need the first row and the 3rd cell, which has an id (but the same id is repeated across rows in the table), I can just use getElementById and extract the details. It will always give me content from the first row, but that may be OK for my case, right?

I tried the one-liner below from your solution and it worked perfectly. I tried something similar earlier, but I was going from the table and then to getElementById, and was not using ".value". It seems I don't really need to traverse the table at all. Do you see any issues with this? I know it works for my use case, but is there anything to consider for future reference?

Code: Select all

 MsgBox, % oWB.document.getElementById("_fdto").value
Thanks,
