mysterious Unicode text encoding issue

Joe Glines
Posts: 697
Joined: 30 Sep 2013, 20:49
Facebook: https://www.facebook.com/theAutomatorGuru/
Google: https://plus.google.com/105328929654286634910
GitHub: joetazz
Location: Dallas

mysterious Unicode text encoding issue

23 Aug 2014, 08:04

I am using WinHttpRequest.5.1 to make an API call to a vendor tool. My call sets several parameters to specify that it should use UTF-8, for example:

    WebRequest.SetRequestHeader("Content-Type", "text/xml;charset=UTF-8")

I have used this same call with other vendors and have received Traditional and Simplified Chinese, Japanese, and other characters, so I don't think the call itself is the problem.

Where the response contains Unicode characters, I open it in SciTE and Excel and see text like this: 感谢GPS器件的啥程师

I assumed this was "gobbly-gook" and that something was wrong with my call. The text above sits inside an XML document. Excel opens the XML with the correct structure, but I still see the same strange characters listed above.

I then saved the above Excel file as a "text file". To my surprise, when I open the newly saved text file with SciTE, I can see the correct Unicode text "感谢GPS器件的工程师". If I re-open the text file using the import wizard and specify 65001-Unicode (UTF-8), the Unicode text shows up correctly in Excel.

So apparently the "gobbly-gook" text is actually meaningful, but why isn't it displaying correctly in SciTE? My original text file has UTF-8 encoding. How is Excel converting the characters? Any ideas on what it is using, and whether I can do the same conversion in my WinHTTP call?

As always, thank you for any / all help!
Joe
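
One way to take charset guessing out of the picture is to read the raw bytes of the response and decode them as UTF-8 explicitly. The sketch below is not confirmed anywhere in this thread as the fix. It assumes AutoHotkey v1.1 (Unicode build), that the server really sends UTF-8 bytes, and that ResponseText is what was being read before. The URL and the function name are hypothetical.

```autohotkey
; Sketch only: decode a WinHttpRequest response as UTF-8 explicitly.
; Assumptions: AutoHotkey v1.1 Unicode build; the URL below is a placeholder.
url := "https://example.com/api"   ; hypothetical endpoint

WebRequest := ComObjCreate("WinHttp.WinHttpRequest.5.1")
WebRequest.Open("GET", url, false)
WebRequest.SetRequestHeader("Content-Type", "text/xml;charset=UTF-8")
WebRequest.Send()

xml := DecodeBodyAsUtf8(WebRequest)
MsgBox % xml

; Hypothetical helper: reads ResponseBody (raw bytes) instead of ResponseText
; and lets ADODB.Stream convert those bytes from UTF-8 to a string.
DecodeBodyAsUtf8(req) {
    stream := ComObjCreate("ADODB.Stream")
    stream.Type := 1                ; adTypeBinary
    stream.Open()
    stream.Write(req.ResponseBody)  ; bytes exactly as the server sent them
    stream.Position := 0
    stream.Type := 2                ; adTypeText
    stream.Charset := "utf-8"
    text := stream.ReadText()
    stream.Close()
    return text
}
```

If the decoded text already looks right in a MsgBox, the garbling probably happens when ResponseText is read or when the file is written, rather than on the vendor's side.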

lexikos
Posts: 7057
Joined: 30 Sep 2013, 04:07
GitHub: Lexikos

Re: mysterious Unicode text encoding issue

24 Aug 2014, 23:24

SciTE does not always auto-detect UTF-8. Perhaps Excel wrote a UTF-8 BOM into the newly created file. From the SciTE documentation:

SciTE will automatically detect the encoding scheme used for Unicode files that start with a Byte Order Mark (BOM). The UTF-8 and UTF-16 encodings are recognised including both Little Endian and Big Endian variants of UTF-16.

UTF-8 files will also be recognised when they contain a coding cookie on one of the first two lines. A coding cookie looks similar to "coding: utf-8" ("coding" followed by ':' or '=', optional whitespace, optional quote, "utf-8") and is normally contained in a comment:

    # -*- coding: utf-8 -*-

For XML there is a declaration:

    <?xml version='1.0' encoding='utf-8'?>

For other encodings set the code.page and character.set properties.
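
To check whether a given file actually starts with a UTF-8 BOM, the first three bytes can be compared against EF BB BF. This is a minimal sketch assuming AutoHotkey v1.1. The function name and the file path are hypothetical.

```autohotkey
; Sketch only: report whether a file begins with the UTF-8 BOM (EF BB BF).
; Assumption: AutoHotkey v1.1; HasUtf8Bom and the path are illustrative names.
HasUtf8Bom(path) {
    f := FileOpen(path, "r")
    if !IsObject(f)
        return false
    f.Pos := 0                     ; FileOpen skips a detected BOM, so rewind first
    VarSetCapacity(buf, 3, 0)
    bytesRead := f.RawRead(buf, 3)
    f.Close()
    return bytesRead = 3
        && NumGet(buf, 0, "UChar") = 0xEF
        && NumGet(buf, 1, "UChar") = 0xBB
        && NumGet(buf, 2, "UChar") = 0xBF
}

MsgBox % HasUtf8Bom("C:\temp\response.xml") ? "UTF-8 BOM found" : "No UTF-8 BOM"
```

Running it on the original file and on the copy saved from Excel would show whether a BOM is what differs between them.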
Guest

Re: mysterious Unicode text encoding issue

25 Aug 2014, 07:49

Thanks Lexikos! The first line of the XML file returned from my WinHttpRequest call contains <?xml version="1.0" encoding="utf-8"?><xml>, and SciTE shows the file's encoding as UTF-8 with BOM, so I do not believe that is the issue.

On this page: http://www.w3schools.com/xml/xml_encoding.asp I saw two links at the bottom that show both correct and incorrect encoding. I can now see that my "gobbly-gook" text looks closer to the correct encoding. However, I'm still surprised that the file first looks one way in both SciTE and Excel and then, after being saved as text with Excel, renders in both with the actual Asian characters. To your point, Excel must be adding something... What that is and how I can tell is beyond me!
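
One plausible reading of this behaviour, not confirmed anywhere in this thread, is double encoding: the UTF-8 bytes were decoded as a single-byte code page such as Windows-1252 somewhere along the way and then saved again as UTF-8. If Excel's "save as text" writes the file in the ANSI code page, it would undo one of those layers. The sketch below reproduces that round trip under those assumptions. It needs AutoHotkey v1.1 (Unicode build), the Reinterpret helper is hypothetical, and the choice of Windows-1252 is a guess.

```autohotkey
; Sketch only: simulate "UTF-8 bytes read as Windows-1252" and its reversal.
; Assumptions: AutoHotkey v1.1 Unicode build; CP1252 is a guess at the code page.
original := Chr(0x5DE5) . Chr(0x7A0B) . Chr(0x5E08)    ; the last three characters of the sample text

garbled  := Reinterpret(original, "UTF-8", "CP1252")   ; what a wrong decode produces
restored := Reinterpret(garbled, "CP1252", "UTF-8")    ; what undoing that decode produces

MsgBox % "Garbled: " garbled "`nRestored: " restored
    . "`n" (restored == original ? "Round trip OK" : "Round trip failed")

; Hypothetical helper: encode str with one encoding, then decode the same
; bytes with another. Intended for byte-oriented encodings (UTF-8, CPnnn),
; where StrPut's returned size is a byte count including the terminator.
Reinterpret(str, encodeAs, decodeAs) {
    VarSetCapacity(buf, StrPut(str, encodeAs))
    StrPut(str, &buf, encodeAs)
    return StrGet(&buf, decodeAs)
}
```

If the garbled output resembles what SciTE shows for the original file, the second call is the conversion to apply to the text before saving it.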
Joe Glines

Re: mysterious Unicode text encoding issue

25 Aug 2014, 08:38

The above was from me, by the way. I don't know how I got logged out.
