Recommended way to get a webpage and parse it later

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
menteith
19 Apr 2016, 11:10

Hi all,

I am trying to get a web page and parse some text from it (with document.getElementById and so on). What would be the best way to do it?

I could use:

First solution:

Code: Select all

url := "http://www.google.pl" ; an example (WinHttp needs the scheme in the URL)

	whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
	whr.Open("GET", url, false), whr.Send() ; false = synchronous request
	www := whr.ResponseText
It works fine, but if a webpage uses the UTF-8 charset it sometimes throws an error (0x80070459 - No mapping for the Unicode character exists in the target multi-byte code page.). A possible workaround would be:

Second solution:

Code: Select all

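url := "http://www.google.pl" ; an example (WinHttp needs the scheme in the URL)

	whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
	whr.Open("GET", url, false), whr.Send() ; false = synchronous request
	www := whr.ResponseText
	if A_LastError
		www := URLDownloadToVar(url, "UTF-8") ; was sURL, which is never defined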
		
; found at https://autohotkey.com/board/topic/101686-objectresponsetext-error/
URLDownloadToVar(url, Encoding = ""){
	hObject:=ComObjCreate("WinHttp.WinHttpRequest.5.1")
	hObject.Open("GET",url)
	hObject.Send()

	If Encoding {
		oADO          := ComObjCreate("adodb.stream")
		oADO.Type     := 1
		oADO.Mode     := 3
		oADO.Open()
		oADO.Write( hObject.ResponseBody )
		oADO.Position := 0
		oADO.Type     := 2
		oADO.Charset  := Encoding
		return oADO.ReadText(), oADO.Close()
	}
	return hObject.ResponseText
}
However, this workaround sometimes fails, even though I think the if A_LastError check should work fine. I also tried if ErrorLevel.
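One possible reason the check is unreliable is that the failure is raised as a COM error on the line that reads ResponseText, so the script may never reach the if statement in a usable state. A minimal sketch of catching it instead, assuming AutoHotkey v1.1 (which supports try/catch), assuming the page really is UTF-8, and assuming the URLDownloadToVar function from the second solution is defined in the same script:

```autohotkey
url := "http://www.google.pl" ; example URL

whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
whr.Open("GET", url, false)
whr.Send()

try
{
	; Reading ResponseText is the step that can raise 0x80070459.
	www := whr.ResponseText
}
catch e
{
	; Fall back to decoding the raw bytes. URLDownloadToVar is the function
	; from the second solution; "UTF-8" is an assumption about the page.
	www := URLDownloadToVar(url, "UTF-8")
}
```

Note that the fallback requests the page a second time, exactly as the second solution does.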

Third solution:

Then I tried to use URLDownloadToFile, but the URL has to be encoded and, more importantly, the downloaded webpage could be in a different charset. I could be wrong, but reading files in different charsets can be challenging at times in AHK.

The following used to work, but it stopped working after I installed Internet Explorer 11 (I previously had IE 8). Edit: it works fine again after I restarted my computer.

Fourth solution:

Code: Select all

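IE := ComObjCreate("InternetExplorer.Application")
IE.Visible := false
url := "http://www.google.pl"
IE.Navigate(url)

; Wait until navigation has finished and the document is loaded (READYSTATE_COMPLETE = 4)
While IE.Busy || IE.ReadyState != 4
	Sleep, 10

contents := IE.document.body.outerHTML
IE.Quit() ; close the hidden IE instance so it does not stay running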
Any ideas?
Capn Odin

19 Apr 2016, 11:31

How about requesting a specific charset?

Code: Select all

url := "http://www.google.pl" ; an example

whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
whr.Open("GET", url)
whr.SetRequestHeader("Accept-Charset", "iso-8859-1")
whr.Send()
www := whr.ResponseText
menteith

19 Apr 2016, 11:45

It will work. Basically, my second solution does that, but I only know the charset once I have downloaded the page :)
Joe Glines

30 May 2021, 14:59

I was helping a friend work with an API and getting a weird error on the ResponseText.

The error was "No mapping for the Unicode character exists in the target multi-byte code page"

I asked Tank to give me a hand and he showed me how to iterate over the ResponseBody and convert the array of bytes to text.
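The code Tank shared is not included in this post. As a rough sketch of one way to turn ResponseBody into text in AutoHotkey v1.1, the example below does not loop over the bytes one at a time. Instead it reads the data pointer of the returned byte array and lets StrGet decode it. BytesToText is a hypothetical helper name, the URL is only an example, and UTF-8 is an assumption about the response. An empty response body is not handled.

```autohotkey
url := "http://www.google.pl" ; example URL (assumption)

whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
whr.Open("GET", url, false)
whr.Send()

; ResponseBody is a SAFEARRAY of bytes, so no charset conversion has happened yet.
text := BytesToText(whr.ResponseBody, "UTF-8")
MsgBox % SubStr(text, 1, 500) ; show the first 500 characters

; Hypothetical helper: decodes a one-dimensional SAFEARRAY of bytes into a string.
BytesToText(bytes, encoding := "UTF-8")
{
	; Number of bytes in the array.
	length := bytes.MaxIndex() - bytes.MinIndex() + 1
	; pvData field of the SAFEARRAY structure: pointer to the raw bytes.
	pData := NumGet(ComObjValue(bytes) + 8 + A_PtrSize)
	return StrGet(pData, length, encoding)
}
```

Because the bytes are decoded explicitly, the "No mapping for the Unicode character" error from ResponseText should not come into play, provided the encoding passed in matches what the server actually sent.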
