How to extract certain value from a HTML table

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
hellotherenjny
Posts: 8
Joined: 17 Oct 2021, 10:49

How to extract certain value from a HTML table

Post by hellotherenjny » 25 Oct 2021, 07:40

Hello!

Thanks for all the help and input; I've gotten a lot of help from this forum.

I have a question about extracting specific values from an HTML table.

I have a table that shows each title and its value (Comparison to Revenue: 17 / Contract Start Date: 2021-10-22 / Contract End Date: 2022-12-20).

I've been having trouble extracting the values of these three elements (17 / 2021-10-22 / 2022-12-20).

I have to locate them by their titles (Comparison to Revenue, Contract Start Date, Contract End Date), because the table is dynamic and other windows can contain additional rows.

After spending an entire day, I can extract all the values (cells) of the table, but I could not find a way to extract just those three values.

Any help will be deeply appreciated! Thank you for your help!

Progress I made so far
Code:

^3::

	wb := ComObjCreate("InternetExplorer.Application")
	wb.Visible := true
	wb.Navigate("https://kind.krx.co.kr/common/disclsviewer.do?method=search&acptno=20211025000089&docno=&viewerhost=&viewerport=")

	while wb.busy or wb.ReadyState != 4
		Sleep, 10

	Sleep, 2000

	Frame := wb.document.getElementById("docViewFrm")
	table := Frame.contentWindow.document.getElementsByTagName("table")[0]
	rows := table.rows
	Loop % rows.length {
		cells := rows[A_Index - 1].cells
		Loop % cells.length
			MsgBox, % cells[A_Index - 1].innerText
	}
return

HTML code

Code:

<iframe name="docViewFrm" width="100%" height="100%" title="MainContent" id="docViewFrm" src="" frameborder="0" marginwidth="0" marginheight="0" scrolling="auto" style="width: 1132px; height: 142px;"></iframe>
<html><head> 
  <meta http-equiv="X-UA-TextLayoutMetrics" content="gdi"> 
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> 
  <style>.xforms * { font-family: Calibri;} .xforms_title * { font-size: 13pt; padding : 0 0 10 0;} .xforms table { font-size: 10px; padding:0px; border-collapse:collapse; color: #3D3D3D; } .xforms td { padding-left:0px; padding-top:0px; border-collapse:collapse; line-height:22px; color: #3D3D3D; /*text-align:right;*/ vertical-align:middle; border: 1px solid #7f7f7f; } .xforms span { /*display: inline-block;*/ padding-left:3px; padding-right: 1px; line-height:22px; padding-top:0px; padding-bottom:0px; text-align:left; vertical-align:middle; border: 0px solid #DBDBDB; } .xforms.img{ border : 0px; } .xforms a:link { color: #194866; text-decoration: none; } .xforms a:active { color: #194866; text-decoration: none; } .xforms a:visited { color: #194866; text-decoration: none; } .xforms A:hover { color:008BE3; text-decoration: none; } .xforms_input{ border: 0px solid #DBDBDB; padding-left:3px; padding-right: 1px; height:22px; padding-top:0px; padding-bottom:0px; text-align:left; }</style> 
  <title>:: 70370_New Contract</title> 
 </head> 
 <body> 
  <div class="xforms"> 
   <div> 
    <span class="noprint" style="width: 600px; font-size: 10pt; display: none;"> </span> 
    <div class="xforms_title"> 
     <div> 
      <span style="width: 600px; text-align: center; font-weight: bold;">New Contract</span> 
     </div> 
    </div> 
    <span class="noprint" style="width: 600px; font-size: 10pt; display: none;"> </span> 
    <table id="XFormD6_Form0_Table0" bordercolorlight="#666666" bordercolordark="white" style="margin: 0px 0px 20px; border: 1px solid rgb(127, 127, 127); border-image: none; width: 600px; font-size: 10pt;" border="1" cellspacing="0" cellpadding="1"> 
     <tbody> 
      <tr> 
       <td width="268" colspan="2"> <span style="width: 268px; font-size: 10pt;">1. Contract Content</span> </td> 
       <td width="332" colspan="2"> <span class="xforms_input" style="width: 332px; font-size: 10pt;">Supply Screen Sensor</span> </td> 
      </tr> 
      <tr> 
       <td width="144" rowspan="6"> <span style="width: 144px; font-size: 10pt;">2. Contract Details</span> </td> 
       <td width="124"> <span style="width: 124px; font-size: 10pt;">Conditions</span> </td> 
       <td width="332" colspan="2"> <span class="xforms_input" style="width: 332px; font-size: 10pt;">Not Applicable</span> </td> 
      </tr> 
      <tr> 
       <td width="124"> <span style="width: 124px; font-size: 10pt;">Contract Amount</span> </td> 
       <td width="332" colspan="2"> <span class="xforms_input" style="width: 332px; text-align: right; font-size: 10pt;">2,550,000,000</span> </td> 
      </tr> 
      <tr> 
       <td width="124"> <span style="width: 124px; font-size: 10pt;">Contract Amount</span> </td> 
       <td width="332" colspan="2"> <span class="xforms_input" style="width: 332px; text-align: right; font-size: 10pt;">-</span> </td> 
      </tr> 
      <tr> 
       <td width="124"> <span style="width: 124px; font-size: 10pt;">Total Contract Amount</span> </td> 
       <td width="332" colspan="2"> <span class="xforms_input" style="width: 332px; text-align: right; font-size: 10pt;">2,550,000,000</span> </td> 
      </tr> 
      <tr> 
       <td width="124"> <span style="width: 124px; font-size: 10pt;">Last Year Revenue</span> </td> 
       <td width="332" colspan="2"> <span class="xforms_input" style="width: 332px; text-align: right; font-size: 10pt;">15,041,930,242</span> </td> 
      </tr> 
      <tr> 
       <td width="124"> <span style="width: 124px; font-size: 10pt;">Comparison to Revenue</span> </td> 
       <td width="332" colspan="2"> <span class="xforms_input" style="width: 332px; text-align: right; font-size: 10pt;">17.0</span> </td> 
      </tr> 
      <tr> 
      <tr> 
       <td width="144" rowspan="2"> <span style="width: 144px; font-size: 10pt;">5. Contract Duration</span> </td> 
       <td width="124"> <span style="width: 124px; text-align: center; font-size: 10pt;">Contract Start  Date</span> </td> 
       <td width="332" colspan="2"> <span class="xforms_input" style="width: 332px; font-size: 10pt;">2021-10-22</span> </td> 
      </tr> 
      <tr> 
       <td width="124"> <span style="width: 124px; text-align: center; font-size: 10pt;">Contract End Date</span> </td> 
       <td width="332" colspan="2"> <span class="xforms_input" style="width: 332px; font-size: 10pt;">2022-12-20</span> </td> 
      </tr> 
     </tbody> 
    </table> 
   </div> 
  </div>  
 
</body></html>
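One way to look the values up by title, sketched here under the assumption that each value sits in the cell immediately to the right of its title cell (as in the HTML above), is to extend the row/cell loop from the progress code: compare each cell's text with the wanted title and return the next cell's text. `FindValueByLabel()` and `NormalizeText()` are hypothetical helper names, not part of AutoHotkey or of the KRX page. Whitespace is collapsed before comparing because the page spells one title `Contract Start  Date` with two spaces.

```autohotkey
; Sketch only. Usage inside the ^3:: hotkey, after the page has loaded:
;   Frame := wb.document.getElementById("docViewFrm")
;   table := Frame.contentWindow.document.getElementsByTagName("table")[0]
;   For each, title in ["Comparison to Revenue", "Contract Start Date", "Contract End Date"]
;       MsgBox, 64, %title%, % FindValueByLabel(table, title)

; Hypothetical helper: returns the text of the cell to the right of the cell
; whose text matches label (whitespace-insensitive), or "" if none matches.
FindValueByLabel(table, label) {
	want := NormalizeText(label)
	rows := table.rows
	Loop % rows.length {
		cells := rows[A_Index - 1].cells
		Loop % cells.length {
			; The value is in the next cell, so the last cell of a row is never a label
			if (A_Index < cells.length && NormalizeText(cells[A_Index - 1].innerText) = want)
				return NormalizeText(cells[A_Index].innerText)
		}
	}
	return ""
}

; Hypothetical helper: trims the text and collapses whitespace runs to one space,
; so "Contract Start  Date" (two spaces) compares equal to "Contract Start Date".
NormalizeText(text) {
	return Trim(RegExReplace(text, "\s+", " "))
}
```

Because the lookup matches on title text rather than on row position, additional rows in other disclosures should not change which cell it returns.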

mikeyww
Posts: 26883
Joined: 09 Sep 2014, 18:38

Re: How to extract certain value from a HTML table

Post by mikeyww » 25 Oct 2021, 08:08

Code:

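If !FileExist(table := "d:\temp2\table.html") { ; Adjust as needed
 MsgBox, 48, Error, File not found.`n`n%table%
 Return
} Else FileRead, html, %table%
; Look up each value by its title rather than by its position in the table
For each, str in ["Comparison to Revenue", "Contract Start Date", "Contract End Date"]
 MsgBox, 64, %str%, % str ": " value(html, str)

; Returns the first number or date that follows the title str, or "" if none is found.
; Spaces in str become \s+ so they match any run of whitespace, "s)" lets "." cross
; line breaks, ".+?>" skips to the first tag end followed by digits, dots, or hyphens,
; and \K drops everything before that value from the match.
value(html, str) {
 RegExMatch(html, "s)" RegExReplace(str, " +", "\s+") ".+?>\K[\d.-]+", val)
 Return val
}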
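The `value()` function can be tried without downloading anything. This small sketch runs it on an inline excerpt, assumed here to be a shortened copy of the question's HTML (attributes trimmed). The double space in `Contract Start  Date` still matches because the title's spaces become `\s+`. The page shows `17.0` rather than `17`; the `Round()` call that drops the decimal is an addition, not part of the answer above.

```autohotkey
; Sketch: an excerpt shortened from the HTML posted in the question
html =
(
<td> <span>Comparison to Revenue</span> </td>
<td> <span class="xforms_input">17.0</span> </td>
<td> <span>Contract Start  Date</span> </td>
<td> <span class="xforms_input">2021-10-22</span> </td>
)
revenue := value(html, "Comparison to Revenue")  ; "17.0"
start := value(html, "Contract Start Date")      ; "2021-10-22", despite the double space
MsgBox, % "Comparison to Revenue: " Round(revenue) "`nContract Start Date: " start

value(html, str) {
 RegExMatch(html, "s)" RegExReplace(str, " +", "\s+") ".+?>\K[\d.-]+", val)
 Return val
}
```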

hellotherenjny
Posts: 8
Joined: 17 Oct 2021, 10:49

Re: How to extract certain value from a HTML table

Post by hellotherenjny » 25 Oct 2021, 18:59

Thanks a ton Mikey!

It seemed that I had to download the page to a file, so I did, but the message boxes didn't show the values I'm looking for ("17", "2021-10-22", "2022-12-20").

Please let me know if I'm missing something. Thanks!

Code:

^3::

	wb := ComObjCreate("InternetExplorer.Application")
	wb.Visible := true
	wb.Navigate("https://kind.krx.co.kr/common/disclsviewer.do?method=search&acptno=20211025000089&docno=&viewerhost=&viewerport=")

	while wb.busy or wb.ReadyState != 4
		Sleep, 10

	Sleep, 2000

	URLDownloadToFile, https://kind.krx.co.kr/common/disclsviewer.do?method=search&acptno=20211025000089&docno=&viewerhost=&viewerport=, A:\Explorer\Temporary Internet Files\table.html

	If !FileExist(table := "A:\Explorer\Temporary Internet Files\table.html") { ; Adjust as needed
		MsgBox, 48, Error, File not found.`n`n%table%
		Return
	} Else FileRead, html, %table%
	For each, str in ["Comparison to Revenue", "Contract Start Date", "Contract End Date"]
		MsgBox, 64, %str%, % str ": " value(html, str)
return

value(html, str) {
	RegExMatch(html, "s)" RegExReplace(str, " +", "\s+") ".+?>\K[\d.-]+", val)
	Return val
}

mikeyww
Posts: 26883
Joined: 09 Sep 2014, 18:38

Re: How to extract certain value from a HTML table

Post by mikeyww » 25 Oct 2021, 20:15

Look inside the downloaded file. See if it contains the text strings that you mentioned.
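A likely reason the file lacks those strings: `URLDownloadToFile` saves only the raw response for the URL it is given and does not run scripts or load frames. The table, however, lives in the document inside the `docViewFrm` iframe, which has `src=""` in the snippet posted in the question, so it is presumably filled in after the viewer page loads. Assuming the `wb` object from the `^3::` hotkey has finished loading, the iframe's HTML can be read from it directly and passed to `value()`. `GetFrameHtml()` is a hypothetical helper name, and this sketch is untested against the live site.

```autohotkey
; Sketch, assuming wb is the loaded InternetExplorer.Application object from the
; ^3:: hotkey. Use it in place of URLDownloadToFile/FileRead:
;   html := GetFrameHtml(wb)
;   For each, str in ["Comparison to Revenue", "Contract Start Date", "Contract End Date"]
;       MsgBox, 64, %str%, % str ": " value(html, str)

; Hypothetical helper: returns the markup of the document shown in the
; docViewFrm iframe, i.e. the HTML that actually contains the table.
GetFrameHtml(wb) {
	frame := wb.document.getElementById("docViewFrm")
	return frame.contentWindow.document.documentElement.outerHTML
}

; mikeyww's value() function, unchanged
value(html, str) {
	RegExMatch(html, "s)" RegExReplace(str, " +", "\s+") ".+?>\K[\d.-]+", val)
	Return val
}
```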

hellotherenjny
Posts: 8
Joined: 17 Oct 2021, 10:49

Re: How to extract certain value from a HTML table

Post by hellotherenjny » 25 Oct 2021, 20:44

Awesome. Thanks for the help Mikey!

SirSocks
Posts: 360
Joined: 26 Oct 2018, 08:14

Re: How to extract certain value from a HTML table

Post by SirSocks » 25 Oct 2021, 21:03

Check out xStr() by SKAN; it's for text extraction and parsing HTML.
viewtopic.php?t=74050
