Detecting whether AutoHotkey.exe or a compiled script is Unicode

lexikos

23 Jun 2020, 06:23

Within a script, A_IsUnicode tells you whether the running AutoHotkey executable is Unicode. (Unless you're using a recent v2 alpha, in which case A_IsUnicode is undefined, but there's no ANSI version anyway.)

But what if you want to know whether some other AutoHotkey.exe, AutoHotkeySC.bin or compiled script is Unicode?

You can scan the file for a string.

Certain strings are only present in the native encoding, so if you search for a UTF-16 string in an ANSI executable, you won't find it. However, you have to be careful about which string you use.

In order to operate, the interpreter binary must contain the name of every built-in function/command, so those are good candidates (if they exist in all versions of AutoHotkey). "AutoHotkey" won't work because it's always present in both UTF-16 and UTF-8 (in the manifest resource). Built-in variables can be used, but you must omit the "A_" prefix, and remember that A_IsUnicode isn't defined in recent v2 alphas. Older versions of AutoHotkey included them in lower-case, with some names broken up, like "loop" "file" "fullpath".
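To make the difference concrete, here is a minimal sketch of the bytes each kind of build would store for the same name. It is written in Python rather than AutoHotkey, purely as an illustration. The helper name `encode_needle` is hypothetical, and code page 1252 is only an assumed stand-in for whatever the system ANSI code page is.

```python
def encode_needle(text: str, unicode_build: bool) -> bytes:
    """Return the null-terminated bytes a build would store for `text`.

    Assumption: "ANSI" is approximated with cp1252; a real system may use
    a different ANSI code page (identical for plain ASCII names).
    """
    encoding = "utf-16-le" if unicode_build else "cp1252"
    return (text + "\0").encode(encoding)


ansi = encode_needle("MsgBox", unicode_build=False)
wide = encode_needle("MsgBox", unicode_build=True)

print(ansi.hex(" "))  # 4d 73 67 42 6f 78 00
print(wide.hex(" "))  # 4d 00 73 00 67 00 42 00 6f 00 78 00 00 00

# Neither byte sequence occurs inside the other, which is why a search
# for one encoding does not match data stored in the other.
print(ansi in wide, wide in ansi)  # False False
```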

Code:

IsUnicodeAutoHotkey(path) {
    FileRead fd, *c %path%  ; Get the raw file data.
    fb := StrLen(fd) * (A_IsUnicode ? 2 : 1)  ; Get the size in bytes (may be safer to use StrLen than FileGetSize).
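    needle := "MsgBox"  ; Pick a string to search for, and convert it to UTF-16:
    nb := StrPut(needle, "utf-16") * 2  ; StrPut returns characters (including the terminator), so double it for bytes.
    VarSetCapacity(ns, nb), StrPut(needle, &ns, "utf-16")
    ; Search! (The 2-byte null terminator is left out of the needle.)
    return InBuf(&fd, fb, &ns, nb - 2) != -1  ; -1 means "not found"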
}

Loop Files, %A_AhkPath%\..\*.exe
    MsgBox % A_LoopFileName "`n" IsUnicodeAutoHotkey(A_LoopFilePath)
Loop Files, %A_AhkPath%\..\Compiler\*.bin
    MsgBox % A_LoopFileName "`n" IsUnicodeAutoHotkey(A_LoopFilePath)

#NoEnv
To run this, you would need to copy InBuf() from Ahk2Exe/BinMod.ahk (for some reason the parameters are ordered differently to wOxxOm's original version).

What about without InBuf?

RegExMatch and RegExReplace can search past null characters, but there are caveats:
  • You must pass a variable, and the variable's internal string length (StrLen) must match the data size. FileRead fd, *c %path% does set StrLen appropriately, but...
  • Binary clipboard variables (from var := ClipboardAll or FileRead fd, *c %path%) are... special. By design, the internal string length of that type of variable is ignored in many cases, including RegExMatch.
  • If you're searching for a string, you need to take care that it's in the right encoding - it will depend on which version of AutoHotkey you run, unless you perform conversion.
Laszlo wrote: When a binary file is to be read into RAM, we have to use the *c option, which sets StrLen the file size, but the data is stored in a special variable, not usable for RegEx. We have to copy it into another variable (or use dllcalls to open/read/close the file).

Code:

FileRead a, *c %A_AhkPath%
VarSetCapacity(b,StrLen(a),1)
DllCall("RtlMoveMemory", UInt,&b, UInt,&a, Uint,StrLen(a))
MsgBox % "It is found: " RegExMatch(b, "\0\03")
Source: Machine code binary buffer searching regardless of NULL - Scripts and Functions - AutoHotkey Community
Above, Laszlo shows how to get a normal variable from a binary clipboard variable. This was written before Unicode support, so it needs to be adjusted for that. We can use it like this:

Code:

IsUnicodeAutoHotkey(path) {
    FileRead fd, *c %path%
    VarSetCapacity(b, cb := StrLen(fd)*(A_IsUnicode?2:1), 1)
    DllCall("RtlMoveMemory", "Ptr",&b, "Ptr",&fd, "UPtr",cb)  ; Ptr (not UInt) keeps the addresses intact on 64-bit builds.
    return !!RegExMatch(b, "MsgBox")
}

Loop Files, %A_AhkPath%\..\*.exe
    MsgBox % A_LoopFileName "`n" IsUnicodeAutoHotkey(A_LoopFilePath)
Loop Files, %A_AhkPath%\..\Compiler\*.bin
    MsgBox % A_LoopFileName "`n" IsUnicodeAutoHotkey(A_LoopFilePath)

#NoEnv
Most of the function is just adjusting for the weirdness of ClipboardAll/FileRead *c. There's actually an easier way: AutoHotkey removes the "binary clipboard" status from a variable if you tamper with it by passing it directly to NumPut:

Code:

IsUnicodeAutoHotkey(path) {
    FileRead fd, *c %path%
    NumPut(NumGet(fd, "char"), fd, "char") ; Clear the "binary clip" status.
    return !!RegExMatch(fd, "MsgBox")
}
Actually, we don't have to use FileRead *c. Originally it was necessary because normal FileRead specifically recalculated the variable's length based on the location of the first null character. FileRead hasn't done this since Unicode support was added, but it does convert from the "source" encoding to the native encoding, which will corrupt any data that isn't actually in the "source" encoding. To avoid that (and get the original data), you just need to match the source encoding to the native encoding:

Code:

IsUnicodeAutoHotkey(path) {
    FileRead fd, % "*p" (A_IsUnicode ? 1200 : 0) " " path
    return !!RegExMatch(fd, "MsgBox")
}
Now, we also have the File object and its RawRead method:

Code:

IsUnicodeAutoHotkey(path) {
    f := FileOpen(path, "r"), f.RawRead(fd, f.Length)
    return !!RegExMatch(fd, "MsgBox")
}
This only works because when RawRead resizes the variable, it sets the variable's length regardless of any null characters. It only works in v1.1, because in v2 RawRead doesn't accept a variable "ByRef".

Backing up a bit, I mentioned that you need to be careful about the encoding of the needle - "MsgBox" in this case. If we run the above code with an ANSI version of AutoHotkey, the result is wrong because RegExMatch is searching for an ANSI string. You might think to simply invert the result with something like this:

Code:

IsUnicodeAutoHotkey(path) {
    f := FileOpen(path, "r"), f.RawRead(fd, f.Length)
    return !!RegExMatch(fd, "MsgBox") = !!A_IsUnicode
}
However, this will get the wrong result in compiled Unicode scripts if the script itself contains "MsgBox". The way around that is to include a null terminator in the search string. Fortunately this is easy with RegEx:

Code:

IsUnicodeAutoHotkey(path) {
    f := FileOpen(path, "r"), f.RawRead(fd, f.Length)
    return !!RegExMatch(fd, "MsgBox\0") = !!A_IsUnicode
}
Since we don't actually need the raw binary data, there's another trick we can use: let FileRead return "corrupt" data. If UTF-16 is specified for the source encoding, the entire file will be treated as a UTF-16 string. In a Unicode version of AutoHotkey, the file data will be returned as is and we will search for a UTF-16 string within it. In an ANSI version of AutoHotkey, any real UTF-16 strings will convert correctly to ANSI while any ANSI strings will be "corrupted" and won't be found by the search (because each pair of bytes will be misinterpreted as a UTF-16 code unit and undergo "conversion" to ANSI).

Code:

IsUnicodeAutoHotkey(path) {
    FileRead fd, *p1200 %path%
    return !!RegExMatch(fd, "MsgBox\0")
}
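The effect of treating the whole file as UTF-16 can be reproduced outside AutoHotkey. The Python sketch below is only an illustration of the idea, not the interpreter's actual conversion path; the function name `has_utf16_string` is hypothetical. Every pair of bytes is decoded as one UTF-16 code unit, so genuine UTF-16 text survives while ANSI text turns into unrelated characters.

```python
def has_utf16_string(data: bytes, name: str = "MsgBox") -> bool:
    """Decode the whole file as UTF-16 (little-endian) and look for
    `name` followed by a null terminator.

    errors="replace" keeps decoding past invalid sequences such as lone
    surrogates or a trailing odd byte instead of raising an exception.
    """
    text = data.decode("utf-16-le", errors="replace")
    return (name + "\0") in text


# A real UTF-16 string at an even offset is found.
print(has_utf16_string(b"\x00\x00" + "MsgBox\0".encode("utf-16-le")))  # True

# The same name stored as single bytes decodes to other characters.
print(has_utf16_string(b"MsgBox\x00\x00"))  # False
```

Because this sketch pairs bytes starting at offset 0, it only sees UTF-16 strings that start at an even offset.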

For reference, it can be done with AutoHotkey v2.0-a112 like this:

Code:

IsUnicodeAutoHotkey(path) {
    fd := FileRead(path, "UTF-16")
    return !!RegExMatch(fd, "MsgBox\0")
}

Or, reading the raw data into a Buffer first:

Code:

IsUnicodeAutoHotkey(path) {
    buf := FileRead(path, "RAW")  ; Returns a Buffer.
    fd := StrGet(buf, -buf.size//2)  ; Get the data as a string.
    return !!RegExMatch(fd, "MsgBox\0")
}

One possible flaw in this technique is that compiled scripts can FileInstall other versions of AutoHotkey. In such cases, it might be necessary to scan for strings in multiple encodings, and compare the found positions. I'm not really sure where UpdateResource puts resources (i.e. the FileInstall data) in relation to code or strings.
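Scanning for both encodings and comparing positions could look roughly like the Python sketch below. The function names are hypothetical, and the tie-break rule (the earliest match belongs to the host interpreter, later matches to embedded files) is purely an assumption for illustration; as noted above, it is not established where the resource data actually ends up.

```python
def find_needles(data: bytes, name: str = "MsgBox") -> dict:
    """Return the byte offset of the null-terminated name in each
    encoding, or -1 where it is not found."""
    ansi = (name + "\0").encode("ascii")
    wide = (name + "\0").encode("utf-16-le")
    return {"ansi": data.find(ansi), "utf16": data.find(wide)}


def guess_encoding(data: bytes, name: str = "MsgBox"):
    """Guess the host encoding from whichever needle appears first.

    ASSUMPTION: the host interpreter's strings precede any embedded
    executables. This has not been verified against real files.
    """
    hits = {enc: pos for enc, pos in find_needles(data, name).items() if pos != -1}
    if not hits:
        return None  # Neither encoding found; probably not AutoHotkey.
    return min(hits, key=hits.get)
```

If both offsets come back non-negative, the file most likely contains a second interpreter, and the result deserves a closer look rather than blind trust.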
joedf

Re: Detecting whether AutoHotkey.exe or a compiled script is Unicode

23 Jun 2020, 08:41

Super neat, I was thinking of exploring how to detect AutoHotkey the other day. :+1:
I'd imagine a small dictionary of the most common (but somewhat unique to AHK) words could be used, say? :think:
MsgBox, UrlDownloadToFile, VarSetCapacity, SetTimer, SetBatchLines, #NoEnv, SendInput, WinExist, WinActivate
SKAN

Re: Detecting whether AutoHotkey.exe or a compiled script is Unicode

23 Jun 2020, 14:38

lexikos wrote: Within a script, A_IsUnicode tells you whether the running AutoHotkey executable is Unicode. (Unless you're using a recent v2 alpha, in which case A_IsUnicode is undefined, but there's no ANSI version anyway.)

But what if you want to know whether some other AutoHotkey.exe, AutoHotkeySC.bin or compiled script is Unicode?
Checking the PE import directory seems to be a better option.
I quickly put together a function that checks for imports from COMDLG32.dll, specifically GetOpenFileNameW vs GetOpenFileNameA.
The lookup takes under 1.3 ms for me.

Let me know if I can share here. :)
lexikos

Re: Detecting whether AutoHotkey.exe or a compiled script is Unicode

23 Jun 2020, 16:59

Feel free to share. But is it better, or just faster? Saving a few milliseconds for this task is not particularly helpful. Even if it was, a synthetic benchmark on my system gave between 0.6ms and 1.2ms for my method depending on which file I check and with which AutoHotkey version.

It also wouldn't have been as interesting. ;)

RegisterClassExA vs RegisterClassExW might be more fitting. This is what decides the result of IsUnicodeWindow and whether the system sends Unicode messages or ANSI messages to the window.
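As a rough illustration of the import-based idea, the Python sketch below looks for the imported function name directly in the file bytes. Imported names are stored in a PE file as null-terminated ASCII strings regardless of how the program itself was built, so a plain byte search can find them. This is a crude shortcut, not a parser for the import directory: the function name `imported_variant` is hypothetical, and the same bytes could in principle appear elsewhere in the file.

```python
def imported_variant(data: bytes, api: str = "RegisterClassEx"):
    """Return "W" or "A" depending on which variant's name appears in
    the file, or None when the result is ambiguous.

    ASSUMPTION: a given build references only one of the two variants.
    """
    has_w = (api + "W\0").encode("ascii") in data
    has_a = (api + "A\0").encode("ascii") in data
    if has_w == has_a:
        return None  # Both or neither present: cannot decide.
    return "W" if has_w else "A"
```

Usage would be along the lines of `imported_variant(open(path, "rb").read())`, where `path` points at the executable to check.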
SKAN

Re: Detecting whether AutoHotkey.exe or a compiled script is Unicode

23 Jun 2020, 17:48

lexikos wrote:Feel free to share.
 
Thank you.
 
lexikos wrote:But is it better, or just faster? Saving a few milliseconds for this task is not particularly helpful.
 
Not faster. I would have to parse the file directly if I needed speed. Most of the overhead comes from ImageHlp\MapAndLoad().
Ok. I withdraw the word 'better'. After wrapping it into a function, it seems too verbose and cannot match your short versions.
 
lexikos wrote:It also wouldn't have been as interesting. ;)
:D
lexikos wrote:RegisterClassExA vs RegisterClassExW might be more fitting. This is what decides the result of IsUnicodeWindow and whether the system sends Unicode messages or ANSI messages to the window.
RegisterClassEx takes too many iterations.
Slightly favoring v2, WINMM.dll\mciSendString? seems to need the fewest loop iterations.

Usage, e.g.: MsgBox % DllCheckImport(A_AhkPath, "WINMM.dll", "mciSendString")
 
DllCheckImport()
TAC109

Re: Detecting whether AutoHotkey.exe or a compiled script is Unicode

23 Jun 2020, 17:57

@lexikos
A good in-depth exposition! Are you leaning towards any particular approach?
Edit: After more in-depth study, the last ones (v1 & v2) I guess. Do you have a preference for whether/how to include v1 in Ahk2Exe?
Edit2: I guess it should go into AHKType(). After I hear back on the point below, I can implement this tomorrow.

Incidentally can you expand on the comment on this line of code in your first example?

Code:

fb := StrLen(fd) * (A_IsUnicode ? 2 : 1)  ; Get the size in bytes (may be safer to use StrLen than FileGetSize).
I use FileGetSize prior to FileRead ... *c in BinMod and it would be useful to know any shortcomings. (It is also used in Ahk2Exe's Compile.ahk and Directives.ahk.)
lexikos

Re: Detecting whether AutoHotkey.exe or a compiled script is Unicode

23 Jun 2020, 23:03

In general, there is a chance the size returned by FileGetSize won't match the size of the data if the file is being modified. Prior to 1.1.16.02 it could be quite inaccurate due to caching. It's probably fine in this case, but it's still better to check the size of the data you've already read.
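The same principle applies in any language: measure the data actually read instead of asking the file system separately. A minimal Python sketch, with the hypothetical helper name `read_and_measure`:

```python
import os


def read_and_measure(path):
    """Read the whole file and return (data, size_read, size_reported).

    size_reported comes from a separate file system query and can differ
    from size_read if the file changes between the two operations.
    """
    reported = os.path.getsize(path)  # Separate query; may already be stale.
    with open(path, "rb") as f:
        data = f.read()
    return data, len(data), reported
```

Any later search should use `len(data)` as the haystack size, since that is guaranteed to match the buffer being searched.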

I think AHKType already reads the file to check the machine type, so it shouldn't need to open the file twice.
