Getting unicode strings from WinHttpRequest, why is this necessary? Topic is solved

hachi
Posts: 15
Joined: 16 Jan 2017, 19:29

Getting unicode strings from WinHttpRequest, why is this necessary?

12 Mar 2017, 12:28

Bear with me since I'm quite clueless on char encoding, and probably everything related to it.

I'm working on some functions to extract channel info from the Twitch API through Jxon:

Code:

twitch_status(chan) {
	static oauth := twitch_oauth()
	if !oauth
		return 0
	
	; get json obj from twitch
	obj := ComObjCreate("WinHTTP.WinHttpRequest.5.1")
	obj.Open("GET", "https://api.twitch.tv/kraken/streams/" chan "?oauth_token=" oauth)
	obj.Send()

	; deal with unicode response
	oADO          := ComObjCreate("adodb.stream")
	oADO.Type     := 1
	oADO.Mode     := 3
	oADO.Open()
	oADO.Write( obj.ResponseBody )
	oADO.Position := 0
	oADO.Type     := 2
	oADO.Charset  := "utf-8"

	; parse the JSON text into an AHK object
	rx := jxon_load(oADO.ReadText())
	if !isObject(rx)
		return 0
	if (rx["status"] = 400)
		return "error: 400"
	return rx["stream"]["channel"]["status"]
}

; read oauth key from file
twitch_oauth() {
	fo := FileOpen(a_scriptdir "\twitch_irc_oauth", 0)
	return fo ? fo.Read() : 0
}

The GET request returns a JSON object in UTF-8 like so:

Code:

{
   "stream": {
      "_id": 23932774784,
      "game": "BATMAN - The Telltale Series",
      "viewers": 7254,
      "video_height": 720,
      "average_fps": 60,
      "delay": 0,
      "created_at": "2016-12-14T22:49:56Z",
      "is_playlist": false,
      "preview": {
         "small": "https://static-cdn.jtvnw.net/previews-ttv/live_user_dansgaming-80x45.jpg",
         "medium": "https://static-cdn.jtvnw.net/previews-ttv/live_user_dansgaming-320x180.jpg",
         "large": "https://static-cdn.jtvnw.net/previews-ttv/live_user_dansgaming-640x360.jpg",
         "template": "https://static-cdn.jtvnw.net/previews-ttv/live_user_dansgaming-{width}x{height}.jpg"
      },
      "channel": {
         "mature": false,
         "status": "Dan is Batman? - Telltale's Batman",
         "broadcaster_language": "en",
         "display_name": "DansGaming",
         "game": "BATMAN - The Telltale Series",
         "language": "en",
         "_id": 7236692,
         "name": "dansgaming",
         "created_at": "2009-07-15T03:02:41Z",
         "updated_at": "2016-12-15T01:33:58Z",
         "partner": true,
         "logo": "https://static-cdn.jtvnw.net/jtv_user_pictures/dansgaming-profile_image-76e4a4ab9388bc9c-300x300.png",
         "video_banner": "https://static-cdn.jtvnw.net/jtv_user_pictures/dansgaming-channel_offline_image-d3551503c24c08ad-1920x1080.png",
         "profile_banner": "https://static-cdn.jtvnw.net/jtv_user_pictures/dansgaming-profile_banner-4c2b8ece8cd010b4-480.jpeg",
         "profile_banner_background_color": null,
         "url": "https://www.twitch.tv/dansgaming",
         "views": 63906830,
         "followers": 538598
      }
   }
}

Somewhere between the response and the parsing, the Unicode characters kept getting garbled, until I found this beautiful ADO solution here.

But if I just hand the response text straight to the parser, as in rx := jxon_load(obj.ResponseText), all the Unicode characters are lost. I'm on AutoHotkeyU64, the script is saved as UTF-8 with BOM, WinHTTP makes UTF-8 requests by default, and the server responds with UTF-8, so why do I need to send this through ADO?

My lack of understanding makes this feel like a ham-fisted workaround. Am I doing something wrong?

If you want to experiment with this it's simple to get an oauth key through any twitch account, just go to the app section in your control panel and generate an IRC key, this is good enough for any endpoints having to do with public channel info.
hachi
Posts: 15
Joined: 16 Jan 2017, 19:29

Re: Getting unicode strings from WinHttpRequest, why is this necessary?

16 Mar 2017, 17:12

bump, any takers? what I really want to know is, can such a request return usable data without the additional ADO stream

more example strings,
no ADO:

Code:

rx := jxon_load(obj.ResponseText)
	; rx["stream"]["channel"]["status"] = Толстяк из Космоса !розыгрыш !музыка

with ADO:

Code:

rx := jxon_load(oADO.ReadText())
	; rx["stream"]["channel"]["status"] = Толстяк из Космоса !розыгрыш !музыка

lexikos
Posts: 6731
Joined: 30 Sep 2013, 04:07
GitHub: Lexikos

Re: Getting unicode strings from WinHttpRequest, why is this necessary?  Topic is solved

17 Mar 2017, 04:03

I suppose ResponseText might work like this:
  • Look for a charset in the response headers; if found, use that to decode the text.
  • Look for a byte order mark in the text; if found, use that to identify the charset.
  • Just assume it's ANSI text and translate it to UTF-16, perhaps corrupting it in the process if it isn't ANSI text.
    (It must always be translated to UTF-16 because COM objects use BSTR for string values.)
Maybe it doesn't look in the headers. Maybe it doesn't check for a BOM. Maybe it does. I suppose that usually neither will be found, and the charset will only be defined within the document, if at all.
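
The last fallback in the list above (treating the bytes as ANSI) can be reproduced without any network request. The following is a minimal sketch for AutoHotkey v1.1 Unicode, not a description of what WinHTTP does internally: it encodes a string as UTF-8 and then decodes the same bytes twice. CP1252 is an assumption standing in for "ANSI"; the real ANSI code page depends on the system locale.

```autohotkey
#NoEnv
original := "Толстяк"                      ; the script must be saved as UTF-8 with BOM

; Encode the string as UTF-8 into a byte buffer, as a server would send it.
size := StrPut(original, "UTF-8")           ; required bytes, including the null terminator
VarSetCapacity(buf, size, 0)
StrPut(original, &buf, "UTF-8")

; Decode the same bytes two ways (size - 1 excludes the terminator).
wrong := StrGet(&buf, size - 1, "CP1252")   ; assumed ANSI code page: each byte becomes one character
right := StrGet(&buf, size - 1, "UTF-8")    ; the encoding the bytes were actually written in

MsgBox % "Read as CP1252: " wrong
    . "`nRead as UTF-8: " right
    . "`nUTF-8 round trip matches: " (right == original)
```

The CP1252 result has one character per byte, so every two-byte Cyrillic letter turns into two unrelated characters. Decoding with the wrong code page produces exactly this kind of garbling.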

ResponseBody is a SafeArray of bytes. There are two ways to get at the raw data. Where body := obj.ResponseBody...

The proper way:

Code:

DllCall("oleaut32\SafeArrayAccessData", "ptr", ComObjValue(body), "ptr*", pdata)
... use pdata ...
DllCall("oleaut32\SafeArrayUnaccessData", "ptr", ComObjValue(body))

The lazy way:

Code:

pdata := NumGet(ComObjValue(body)+8+A_PtrSize)

pdata contains the address of the data.
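
The offset 8+A_PtrSize comes from the layout of the SAFEARRAY structure: cDims (2 bytes), fFeatures (2), cbElements (4) and cLocks (4) come first, and the pvData pointer follows, aligned to pointer size (offset 12 on 32-bit, 16 on 64-bit). A small sketch, assuming AutoHotkey v1.1, that compares both ways on a locally built byte array, so no request is needed. The array made with ComObjArray only stands in for ResponseBody.

```autohotkey
#NoEnv
; Build a 3-byte SafeArray of VT_UI1 (0x11) to stand in for obj.ResponseBody.
body := ComObjArray(0x11, 3)
body[0] := 0x41                             ; "A"
body[1] := 0x42                             ; "B"
body[2] := 0x43                             ; "C"

psa := ComObjValue(body)                    ; address of the SAFEARRAY structure

; Lazy way: read the pvData field directly from the structure.
pLazy := NumGet(psa + 8 + A_PtrSize, 0, "Ptr")

; Proper way: lock the array and let the API return the data pointer.
DllCall("oleaut32\SafeArrayAccessData", "Ptr", psa, "Ptr*", pProper)
text := StrGet(pProper, 3, "UTF-8")         ; read exactly 3 bytes
DllCall("oleaut32\SafeArrayUnaccessData", "Ptr", psa)

MsgBox % "Same address: " (pLazy = pProper) "`nBytes as text: " text
```

The lazy way skips the lock that SafeArrayAccessData takes, which is why it is only reasonable while the script holds its own reference to the array and nothing else is modifying it.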

Use StrGet to read a UTF-8 string from the address. Since the response likely won't contain a binary zero at the end, you need to specify the length.

Code:

length := body.MaxIndex() - body.MinIndex() + 1
; or if you assume MinIndex will always be 0:
;  length := body.MaxIndex() + 1

text := StrGet(pdata, length, "UTF-8")
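
To see why the length matters, here is a minimal sketch (AutoHotkey v1.1) that imitates a response body with no terminating zero. The buffer is filled with "X" bytes to play the role of whatever memory follows the data. The extra terminator at the very end exists only so that the unbounded read in this demo stops inside the buffer.

```autohotkey
#NoEnv
text := "status"
len := StrPut(text, "UTF-8") - 1            ; UTF-8 byte length without the null terminator

VarSetCapacity(buf, len + 8, Asc("X"))      ; pre-fill the whole buffer with "X" bytes
StrPut(text, &buf, "UTF-8")                 ; writes the text plus a null terminator
NumPut(Asc("X"), buf, len, "UChar")         ; overwrite that terminator: data is now unterminated
NumPut(0, buf, len + 7, "UChar")            ; safety terminator at the end of the buffer (demo only)

bounded   := StrGet(&buf, len, "UTF-8")     ; reads exactly len bytes
unbounded := StrGet(&buf, "UTF-8")          ; reads until the first zero byte it finds

MsgBox % "With length: " bounded "`nWithout length: " unbounded
```

With a real ResponseBody there is no such safety terminator, so an unbounded read could run past the end of the array.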

hachi
Posts: 15
Joined: 16 Jan 2017, 19:29

Re: Getting unicode strings from WinHttpRequest, why is this necessary?

17 Mar 2017, 16:02

OK, this is making sense to me. Since a BOM/MIME charset is not supported for application/json, WinHTTP just doesn't know how to deal with this text. So for all practical purposes I'm going to consider the ResponseText property useless for anything that isn't ANSI here.
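
One way to check what a server declares is to look at the Content-Type response header. The sketch below assumes AutoHotkey v1.1. DeclaredCharset is a hypothetical helper written for this illustration, not part of WinHTTP or Jxon, and the header values are made-up samples. Whether ResponseText actually honors a declared charset is not confirmed anywhere in this thread.

```autohotkey
#NoEnv
; Hypothetical helper: returns the charset named in a Content-Type value, or "" if none.
DeclaredCharset(contentType) {
	if RegExMatch(contentType, "i)charset\s*=\s*""?([^"";\s]+)", m)
		return m1
	return ""
}

; Sample header values (assumptions for illustration, not captured from Twitch).
withCharset    := DeclaredCharset("application/json; charset=utf-8")
withoutCharset := DeclaredCharset("application/json")

MsgBox % "With charset: [" withCharset "]`nWithout charset: [" withoutCharset "]"

; Against a live request (obj being the WinHttpRequest object after Send()):
;   try contentType := obj.GetResponseHeader("Content-Type")
;   catch
;       contentType := ""
;   MsgBox % DeclaredCharset(contentType)
```

If the helper returns an empty string for a response, decoding ResponseBody with an explicit encoding is the safer route.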

your method is also consistently about 4x more efficient than ADO stream (turnaround of ~120 vs 500 ms), thanks

Code:

	obj := ComObjCreate("WinHTTP.WinHttpRequest.5.1")
	obj.Open("GET", "https://api.twitch.tv/kraken/streams/" chan "?oauth_token=" oauth)
	obj.Send()

	body := obj.ResponseBody
	DllCall("oleaut32\SafeArrayAccessData", "ptr", ComObjValue(body), "ptr*", pdata)
	length := (body.MaxIndex() - body.MinIndex() + 1)
	rx := jxon_load(StrGet(pdata, length, "UTF-8"))
	DllCall("oleaut32\SafeArrayUnaccessData", "ptr", ComObjValue(body))
	
	return isObject(rx) ? rx : 0
