[SOLVED] Encoding issues retrieving text from the internet

A_AhkUser
Posts: 1076
Joined: 06 Mar 2017, 16:18
GitHub: AAhkUser
Location: France

16 Nov 2017, 21:01

Hi,

I have this simple script. It downloads the same file from the internet twice: first using UrlDownloadToFile, then using ComObjCreate("WinHttp.WinHttpRequest.5.1") and FileAppend:

Code: Select all

url := "https://linguee-api.herokuapp.com/api?q=route&src=en&dst=ru"

UrlDownloadToFile % url, % A_Desktop . "\test_1"

(whr:=ComObjCreate("WinHttp.WinHttpRequest.5.1")).Open("GET", url, true)
whr.Send()
whr.WaitForResponse()
MsgBox % SubStr(out:=whr.ResponseText, 1, 1000)
FileAppend % out, % A_Desktop . "\test_2", utf-8

The problem is that the second method, which downloads the text to a variable, doesn't give the same result as the first one.

first one:

Code: Select all

...
"examples": [
{
	"source": "The parade will follow a very long route this year.",
	"target": "В этом году парад пройдёт по очень длинному маршруту."
}
]
...
second one:

Code: Select all

...
"examples": [
{
	"source": "The parade will follow a very long route this year.",
	"target": "п эѿом годѿ паѿад пѿойдѿѿ по оѿенѿ длинномѿ маѿѿѿѿѿѿ."
}
]
...

Also, in Notepad++ I noticed that the status bar displays UNIX for the first downloaded file and Dos\Windows for the second one. I guess this is an encoding-related issue. So my questions are: 1) why is this actually happening, and 2) how can I make the second method, using ComObjCreate, retrieve the text with the correct encoding (if possible)?
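
A first diagnostic step for question 1 could be to look at what the server declares about its encoding. This is only a sketch: it assumes the cause is related to the charset (or lack of one) in the Content-Type response header, which the thread does not confirm. The variable name contentType is illustrative.

```autohotkey
url := "https://linguee-api.herokuapp.com/api?q=route&src=en&dst=ru"

whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
whr.Open("GET", url, false)   ; synchronous request, so no WaitForResponse is needed
whr.Send()

; Show the declared content type. If it carries no "charset=utf-8" part,
; ResponseText may be decoded with a different code page (assumption, not verified here).
contentType := whr.GetResponseHeader("Content-Type")
MsgBox % "Content-Type: " . contentType
```

If the header turns out to lack a charset, decoding the raw ResponseBody explicitly is the safer route.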

Thanks
Last edited by A_AhkUser on 21 Nov 2017, 16:33, edited 1 time in total.
jeeswg
Posts: 6904
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Encoding issues retrieving text from the internet

16 Nov 2017, 21:44

- It was a different but related issue that led me to an interest in retrieving the raw contents of the html and not ResponseText.
- Sometimes I would get an error, mentioning encoding I believe, when retrieving ResponseText. I don't know the full details, because I did this very rarely.
- So what I would do instead is download the contents to a variable, and then do StrGet as ANSI or perhaps UTF-8.
download urls to vars, partially/fully, via WinHttpRequest - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=5&t=26528
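
The idea above (take the raw bytes rather than ResponseText and decode them explicitly) can also be sketched without DllCall by passing ResponseBody through an ADODB.Stream object. This is a minimal sketch, not the method from the linked thread; it assumes ADODB.Stream is available on the system and that the server really sends UTF-8. The function name DownloadAsUtf8 is hypothetical.

```autohotkey
; Hypothetical helper: download a URL and decode the raw bytes as UTF-8.
DownloadAsUtf8(url) {
	whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
	whr.Open("GET", url, false)        ; synchronous request
	whr.Send()

	stream := ComObjCreate("ADODB.Stream")
	stream.Type := 1                   ; adTypeBinary: accept the raw byte array
	stream.Open()
	stream.Write(whr.ResponseBody)     ; undecoded response bytes
	stream.Position := 0               ; rewind before switching to text mode
	stream.Type := 2                   ; adTypeText
	stream.Charset := "utf-8"          ; assumed encoding of the response
	text := stream.ReadText()
	stream.Close()
	return text
}

out := DownloadAsUtf8("https://linguee-api.herokuapp.com/api?q=route&src=en&dst=ru")
MsgBox % SubStr(out, 1, 1000)
```

The charset is set by the caller here rather than detected, so a response in another encoding would still come out garbled.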
teadrinker
Posts: 1130
Joined: 29 Mar 2015, 09:41
Contact:

Re: Encoding issues retrieving text from the internet

16 Nov 2017, 22:08

Try Microsoft.XmlHttp instead of WinHttp.WinHttpRequest.5.1

Code: Select all

url := "https://linguee-api.herokuapp.com/api?q=route&src=en&dst=ru"
whr := ComObjCreate("Microsoft.XmlHttp")
whr.Open("GET", url, false)
whr.Send()

Run, notepad,,, PID
WinWait, ahk_pid %PID%
ControlSetText, Edit1, % whr.ResponseText
A_AhkUser
Posts: 1076
Joined: 06 Mar 2017, 16:18
GitHub: AAhkUser
Location: France

Re: Encoding issues retrieving text from the internet

16 Nov 2017, 23:36

@jeeswg
Following your instructions, it works just fine :thumbup: :

Code: Select all

#NoEnv  ; Recommended for performance and compatibility with future AutoHotkey releases.
; #Warn  ; Enable warnings to assist with detecting common errors.
#SingleInstance, force

JEE_UrlDownloadToVar("https://linguee-api.herokuapp.com/api?q=route&src=en&dst=ru", data, size)
FileAppend % str := StrGet(&data, size, "utf-8"), % A_Desktop . "\test_2" ; pass the byte count so decoding does not rely on a null terminator

JEE_UrlDownloadToVar(vUrl, ByRef vData, ByRef vSize) { ; https://autohotkey.com/boards/viewtopic.php?f=5&t=26528
	
	oHTTP := ComObjCreate("WinHttp.WinHttpRequest.5.1")
	oHTTP.Open("GET", vUrl)
	oHTTP.Send()

	oHTTP.WaitForResponse()
	vSize := oHTTP.ResponseBody.MaxIndex() + 1 ; MaxIndex() is the zero-based upper bound, so the byte count is one more
	VarSetCapacity(vData, vSize, 0)

	VarSetCapacity(vIndex, 4, 0)
	if !DllCall("oleaut32\SafeArrayPtrOfIndex", Ptr,ComObjValue(oHTTP.ResponseBody), Ptr,&vIndex, PtrP,vPtrOfIndex)
		DllCall("kernel32\RtlMoveMemory", Ptr,&vData, Ptr,vPtrOfIndex, UPtr,vSize)
	oHTTP := ""
	
}

I'm not sure I understand the VarSetCapacity(vIndex, 4, 0) --> vIndex part, though; it is way beyond my understanding... :headwall:
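
On the vIndex question, a small sketch may help. SafeArrayPtrOfIndex takes a pointer to an array of LONG indices, one per dimension of the SAFEARRAY. VarSetCapacity(vIndex, 4, 0) reserves 4 bytes (one LONG) filled with zeros, so &vIndex points at the index 0, and the call returns the address of the first byte of the array. The example below uses a locally built byte array instead of ResponseBody; the names arr and pElem are illustrative.

```autohotkey
; Build a 3-byte SAFEARRAY of VT_UI1 (0x11), the same kind of array ResponseBody returns.
arr := ComObjArray(0x11, 3)
arr[0] := 65, arr[1] := 66, arr[2] := 67   ; the bytes of "ABC"

; One LONG index for the single dimension, zero-filled, meaning "element 0".
VarSetCapacity(vIndex, 4, 0)

; On success (return value 0), pElem receives the address of element 0.
hr := DllCall("oleaut32\SafeArrayPtrOfIndex", "Ptr", ComObjValue(arr), "Ptr", &vIndex, "PtrP", pElem)
if (hr = 0)
	MsgBox % StrGet(pElem, 3, "utf-8")     ; read the three bytes back as text
```

In JEE_UrlDownloadToVar the same address is handed to RtlMoveMemory, which copies the whole response starting from that first byte.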

@teadrinker

Actually, it works! Very nice solution :thumbup: (one of the first times I've seen one of your scripts without DllCall or SendMessage, btw; and that is just as well: this solution is the one I'm most comfortable with. Unlike for you, DllCall is not my cup of tea :lol: ).

Thank you both so much for your precious input.
