[Making an AHK linter, encoding issue] Unicode characters like € are shown as â?¬ : convert them back somehow?

Cerberus
Posts: 138
Joined: 12 Jan 2016, 15:46

27 Mar 2019, 23:53

I'm writing a code linter for AutoHotkey, that is, a programme that automatically marks errors in your code. Demonstration in Sublime Text:
[Image: AHKLint_Demonstration.gif]
My linter uses, amongst other sources, the error messages that the AutoHotkey compiler (I think?) itself puts out. Everything works perfectly, except that, somehow, it cannot handle Unicode characters in the file it is linting.

What happens is that the compiler (I think?) error message wrongly shows a Unicode character like € as â?¬ after I save the script-to-lint in Sublime Text or Notepad++, though this doesn't happen when I run the code as a "pipe", for some reason, nor when I save the file with the standard Windows Notepad. So it happens when I create a file test.ahk in Sublime Text or Notepad++ and put only € in the body as the code to run; if I then run the script (from Windows Explorer), I get this error message:
[Image: error message with € sign]
But even with files where the error message shows the correct € (e.g. saved in standard Notepad), I still get the wrong encoding when the output comes in through the linter script.

So I have two options:
1. Try to fix the underlying issue with the wrong output/file format; but:
a.) That seems way over my head, and:
b.) It may not work the same way on every person's computer.
2. Work around the issue by converting bad characters like â?¬ back into €.

The second approach would seem the easiest. Does anyone know of a way to do this? I have Googled one of my heads off for a way to auto-convert characters like â?¬ back into €. The only thing I can come up with would be regexing a finite list of hard-coded characters into the correct originals; but that seems ugly, and it will only work on a limited number of characters. Is there another way?

For the record, here is the code from the linter that gets the text from the error messages (I think I got part of it from an old post by Lexikos):

Code: Select all

    path = D:\Dropbox\Autohotkey\test_eurosign.ahk
    shell := ComObjCreate("WScript.Shell")
    ; Msgbox % A_AhkPath
    exec := shell.Exec(A_AhkPath . " /iLib NUL /ErrorStdOut *")
    warn := IsByRef(warnings) ? "StdOut" : "Off"
    script =
    (LTrim
    #Include %path%
    #Warn,, %warn%
    )
    exec.StdIn.Write(script)
    exec.StdIn.Close()
    error := exec.StdErr.ReadAll()
    warnings := exec.StdOut.ReadAll()
    If not Error
        Return

    Msgbox % Error
    FileDelete, ahklint_errors.txt
    FileAppend, %Error%, ahklint_errors.txt
jeeswg
Posts: 6832
Joined: 19 Dec 2016, 01:58
Location: UK

Re: [Making an AHK linter, encoding issue] Unicode characters like € are shown as â?¬ : convert them back somehow?

28 Mar 2019, 00:10

You can use StrPut to convert UTF-8 bytes to Unicode characters. Cheers.
ü ä ö ... getCorrectedStringUAOSS - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=5&t=44955&p=203507#p203507
homepage | tutorials | wish list | fun threads | donate
WARNING: copy your posts/messages before hitting Submit as you may lose them due to CAPTCHA
Cerberus
Posts: 138
Joined: 12 Jan 2016, 15:46

Re: [Making an AHK linter, encoding issue] Unicode characters like € are shown as â?¬ : convert them back somehow?

28 Mar 2019, 00:51

jeeswg wrote:
28 Mar 2019, 00:10
You can use StrPut to convert UTF-8 bytes to Unicode characters. Cheers.
ü ä ö ... getCorrectedStringUAOSS - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=5&t=44955&p=203507#p203507
Thank you for your help! But I'm afraid it didn't work. I've added your functions to the bottom of the script, and your function calls at the appropriate place. It works when I set vText to your Chr(8730) character, but not when I set it to â?¬ directly, nor with the text I want to use it on. This is the code I used; even in a separate file, the below doesn't work for me (bad characters):

Code: Select all

; vText := Chr(8730) ;square root symbol
vText = â?¬
vText := JEE_StrTextToUtf8Bytes(vText)
Msgbox % vText
vText := JEE_StrUtf8BytesToText(vText)
Msgbox % vText


;==================================================

;e.g. vText := JEE_StrUtf8BytesToText(vUtf8Bytes)

JEE_StrUtf8BytesToText(ByRef vUtf8Bytes) ;{
{
  if A_IsUnicode
  {
    VarSetCapacity(vTemp, StrPut(vUtf8Bytes, "CP0"))
    StrPut(vUtf8Bytes, &vTemp, "CP0")
    return StrGet(&vTemp, "UTF-8")
  }
  else
    return StrGet(&vUtf8Bytes, "UTF-8")
}

;==================================================

;e.g. vUtf8Bytes := JEE_StrTextToUtf8Bytes(vText)

JEE_StrTextToUtf8Bytes(ByRef vText) ;{
{
  VarSetCapacity(vTemp, StrPut(vText, "UTF-8"))
  StrPut(vText, &vTemp, "UTF-8")
  return StrGet(&vTemp, "CP0")
}

;==================================================
jeeswg
Posts: 6832
Joined: 19 Dec 2016, 01:58
Location: UK

Re: [Making an AHK linter, encoding issue] Unicode characters like € are shown as â?¬ : convert them back somehow?

28 Mar 2019, 00:56

I get this for the euro symbol: â‚¬, not â?¬.
Cerberus
Posts: 138
Joined: 12 Jan 2016, 15:46

Re: [Making an AHK linter, encoding issue] Unicode characters like € are shown as â?¬ : convert them back somehow?

28 Mar 2019, 15:24

jeeswg wrote:
28 Mar 2019, 00:56
I get this for the euro symbol: â‚¬, not â?¬.
Oh, dear. I hadn't noticed. So the problem is even worse than I thought, and it cannot be easily solved by back-conversion. I wonder what I could do.
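
Why back-conversion breaks down once a byte has been lost can be sketched as follows. This is a minimal illustration, not the linter's code: it assumes an AutoHotkey v1.1 Unicode build and assumes the wrong code page involved is Windows-1252; the variable names are made up for the example.

```autohotkey
; Sketch for AutoHotkey v1.1 (Unicode build). All names here are illustrative.
euro := Chr(0x20AC)                      ; the euro sign, independent of this script's own file encoding

; Encode the string as UTF-8 into a raw buffer.
size := StrPut(euro, "UTF-8")            ; required size in bytes, including the null terminator
VarSetCapacity(buf, size, 0)
StrPut(euro, &buf, "UTF-8")

; List the encoded bytes in hex (skipping the terminator).
hex := ""
Loop % size - 1
    hex .= Format("{:02X} ", NumGet(buf, A_Index - 1, "UChar"))

; Decode the very same bytes with the wrong code page (assumed: Windows-1252).
mojibake := StrGet(&buf, "CP1252")
MsgBox % "UTF-8 bytes: " hex "`nRead as CP1252: " mojibake

; Simulate the lossy step: the middle byte is replaced by "?" (0x3F).
NumPut(0x3F, buf, 1, "UChar")
MsgBox % "Decoded as UTF-8 after the loss: " StrGet(&buf, "UTF-8")
```

The first message box should list the three bytes E2 82 AC and the text â‚¬. After the middle byte is overwritten, the remaining bytes no longer form a valid UTF-8 sequence, so no decoder can restore the original character from â?¬ alone.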
jeeswg
Posts: 6832
Joined: 19 Dec 2016, 01:58
Location: UK

Re: [Making an AHK linter, encoding issue] Unicode characters like € are shown as â?¬ : convert them back somehow?

28 Mar 2019, 15:28

Maybe you need to save the files as UTF-8 with BOM. Or UTF-16 with BOM.
Cerberus
Posts: 138
Joined: 12 Jan 2016, 15:46

Re: [Making an AHK linter, encoding issue] Unicode characters like € are shown as â?¬ : convert them back somehow?

06 Apr 2019, 23:16

jeeswg wrote:
28 Mar 2019, 15:28
Maybe you need to save the files as UTF-8 with BOM. Or UTF-16 with BOM.
Thanks for your advice; I think saving as UTF-8 with BOM may have helped somewhat. I get most Unicode characters correctly now, using your script to decode the gibberish. But I think there is a different issue here. This minimal example (taken from some other post) doesn't work for the € sign for me, though it works for some other Unicode characters:

Code: Select all

FileAppend € é Û`n, ~badscript.ahk
MsgBox % ScriptErrors("~badscript.ahk")
FileDelete ~badscript.ahk
; MsgBox % ScriptErrors("D:\Dropbox\Autohotkey\unicode_test_script.ahk")

ScriptErrors(Script)
{
    return ComObjCreate("WScript.Shell")
        .Exec("AutoHotkey.exe /iLib nul /ErrorStdOut """ Script """")
        .StdErr.ReadAll()
}
[Image: TYHckqp[1].png]
There is a question mark in place of the € sign (the rest of the error message is as it should be). So there must be some issue with .StdErr.ReadAll()? I wonder what that could mean.
jeeswg
Posts: 6832
Joined: 19 Dec 2016, 01:58
Location: UK

Re: [Making an AHK linter, encoding issue] Unicode characters like € are shown as â?¬ : convert them back somehow?

06 Apr 2019, 23:23

You can see here that the UTF-8 bytes for the euro symbol contain a comma, so you could end up with a comma there, and hence an extra argument for FileAppend, if you save the script as UTF-8 with no BOM.

Code: Select all

;Unicode:
FileAppend € é Û`n, ~badscript.ahk

;UTF-8 bytes:
FileAppend â‚¬ Ã© Ã›`n, ~badscript.ahk
You could try one of:

Code: Select all

FileAppend € é Û`n, ~badscript.ahk, UTF-8
FileAppend € é Û`n, ~badscript.ahk, UTF-16
Another consideration is whether the script you're running that contains the FileAppend line, has a BOM.
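
Whether a given file actually starts with a BOM can be checked by looking at its first three bytes. The following is a rough sketch for AutoHotkey v1.1; the scratch file name bom_test.ahk is hypothetical, and the script relies on the documented v1.1 behaviour that the UTF-8 option of FileAppend writes a BOM when it creates a new file.

```autohotkey
; Sketch for AutoHotkey v1.1; "bom_test.ahk" is a hypothetical scratch file.
file := A_Temp "\bom_test.ahk"
FileDelete, %file%                       ; start clean so that FileAppend creates a new file

; Chr(0x20AC) is used so the result does not depend on how this script itself was saved.
FileAppend, % "MsgBox " Chr(0x20AC) "`n", %file%, UTF-8

; Read the first three raw bytes of the new file.
f := FileOpen(file, "r")
f.Seek(0)                                ; FileOpen may position past a BOM, so go back to byte 0
b1 := f.ReadUChar(), b2 := f.ReadUChar(), b3 := f.ReadUChar()
f.Close()

MsgBox % Format("First bytes: {:02X} {:02X} {:02X}", b1, b2, b3)
FileDelete, %file%
```

If the BOM was written, the message box should show EF BB BF. Pointing the same check at the file being linted (instead of the scratch file) shows which encoding the editor really saved it in.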
Cerberus
Posts: 138
Joined: 12 Jan 2016, 15:46

Re: [Making an AHK linter, encoding issue] Unicode characters like € are shown as â?¬ : convert them back somehow?

07 Apr 2019, 09:18

jeeswg wrote:
06 Apr 2019, 23:23
You can see here that the UTF-8 bytes for the euro symbol contain a comma, so you could end up with a comma there, and hence an extra argument for FileAppend, if you save the script as UTF-8 with no BOM.
...
Another consideration is whether the script you're running that contains the FileAppend line, has a BOM.
That's nice, I didn't know you could add the encoding to the FileAppend command directly. But I had already tried it with the FileEncoding command, which didn't help. And I'm afraid I still get a question mark in your examples. And it's not just the euro sign; it seems to be basically all the red characters on this page, and also characters that aren't on that page at all. And it also happens for characters where there is no comma in the byte code, such as ž:

Code: Select all


FileAppend ¾ ž é ō Û`n, ~badscript.ahk, UTF-8      ; Result: ¾ ? é ? 
; FileAppend € é Û`n, ~badscript.ahk, UTF-16       ; Result: ? é Û
; FileAppend € é Û`n, ~badscript.ahk, UTF-8    ; Result: � é �
FileRead, FileContent, ~badscript.ahk 			   ; Result: ¾ ž é ō Û (correct)
Msgbox % FileContent
MsgBox % ScriptErrors("~badscript.ahk")
FileDelete ~badscript.ahk
; MsgBox % ScriptErrors("D:\Dropbox\Autohotkey\unicode_test_script.ahk")

ScriptErrors(Script){
    return ComObjCreate("WScript.Shell")
        .Exec("AutoHotkey.exe /iLib nul /ErrorStdOut """ Script """")
        .StdErr.ReadAll()
}
It also happens when the characters are in a pre-saved file with UTF-8 with BOM, or without BOM. I had also tried saving the code above as UTF-8 with BOM, or without BOM, but it makes no difference, alas. As you can see above, when I just do a FileRead on the temporary file, the characters are read correctly. So there must be something happening inside the ScriptErrors function. The above is the complete code I'm using for testing.
