But what if you want to know whether some other AutoHotkey.exe, AutoHotkeySC.bin or compiled script is Unicode?
You can scan the file for a string.
Certain strings are only present in the native encoding, so if you search for a UTF-16 string in an ANSI executable, you won't find it. However, you have to be careful about which string you use.
In order to operate, the interpreter binary must contain the name of every built-in function/command, so those are good candidates (if they exist in all versions of AutoHotkey). "AutoHotkey" won't work because it's always present in both UTF-16 and UTF-8 (in the manifest resource). Built-in variables can be used, but you must omit the "A_" prefix, and remember that A_IsUnicode isn't defined in recent v2 alphas. Older versions of AutoHotkey included them in lower-case, with some names broken up, like "loop" "file" "fullpath".
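To see why a UTF-16 needle can't turn up in an ANSI binary, it may help to look at the bytes of the same string in each encoding. This is only a sketch: HexBytes is a hypothetical helper written for this illustration, and it assumes AutoHotkey v1.1 (for Format) and an ASCII-only needle.

```autohotkey
needle := "MsgBox"
MsgBox % "UTF-16: " HexBytes(needle, "UTF-16") "`nANSI:   " HexBytes(needle, "CP0")
; UTF-16: 4D 00 73 00 67 00 42 00 6F 00 78 00
; ANSI:   4D 73 67 42 6F 78

; Hypothetical helper: returns the bytes of str in the given encoding as hex text.
HexBytes(str, encoding) {
    unit := (encoding = "UTF-16") ? 2 : 1   ; Bytes per code unit.
    bytes := StrPut(str, encoding) * unit   ; StrPut counts characters, including the terminator.
    VarSetCapacity(buf, bytes, 0)
    StrPut(str, &buf, encoding)
    hex := ""
    Loop % bytes - unit                     ; Skip the null terminator.
        hex .= Format("{:02X} ", NumGet(buf, A_Index - 1, "UChar"))
    return RTrim(hex)
}
```

Every ASCII character is followed by a zero byte in UTF-16, so the two byte sequences never match each other.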
Code:
IsUnicodeAutoHotkey(path) {
    FileRead fd, *c %path% ; Get the raw file data.
    fb := StrLen(fd) * (A_IsUnicode ? 2 : 1) ; Get the size in bytes (may be safer to use StrLen than FileGetSize).
    needle := "MsgBox" ; Pick a string to search for, and convert it to UTF-16.
    ; StrPut returns a size in characters (including the null terminator), so double it to get bytes.
    VarSetCapacity(ns := "", StrPut(needle, "utf-16") * 2), StrPut(needle, &ns, "utf-16")
    nb := StrLen(needle) * 2 ; Size of the UTF-16 needle in bytes, without the null terminator.
    ; Search! (InBuf is not built in; it must be included separately.)
    return InBuf(&fd, fb, &ns, nb) != -1 ; -1 means "not found"
}
Loop Files, %A_AhkPath%\..\*.exe
    MsgBox % A_LoopFileName "`n" IsUnicodeAutoHotkey(A_LoopFilePath)
Loop Files, %A_AhkPath%\..\Compiler\*.bin
    MsgBox % A_LoopFileName "`n" IsUnicodeAutoHotkey(A_LoopFilePath)
#NoEnv
What about without InBuf?
RegExMatch and RegExReplace can search past null characters, but there are caveats:
- You must pass a variable, and the variable's internal string length (StrLen) must match the data size. FileRead fd, *c %path% does set StrLen appropriately, but...
- Binary clipboard variables (from var := ClipboardAll or FileRead fd, *c %path%) are... special. By design, the internal string length of that type of variable is ignored in many cases, including RegExMatch.
- If you're searching for a string, you need to take care that it's in the right encoding - it will depend on which version of AutoHotkey you run, unless you perform conversion.
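The first caveat can be tried out on a small scale. The following is a minimal sketch, assuming AutoHotkey v1.1 and a normal (not binary clipboard) variable; the variable names are made up for the illustration. NumPut writes a null into the middle of the string without changing the variable's internal length, and the pattern then looks for text that comes after the null.

```autohotkey
str := "abc|def"
charSize := A_IsUnicode ? 2 : 1
; Overwrite the "|" (fourth character) with a null character.
; NumPut only changes the buffer; the variable's internal length is left alone.
NumPut(0, str, 3 * charSize, A_IsUnicode ? "UShort" : "UChar")
; \0 in the pattern matches a null character.
MsgBox % "StrLen: " StrLen(str) "`nFound at: " RegExMatch(str, "\0def")
```

A non-zero "Found at" value means RegExMatch searched past the embedded null.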
Laszlo wrote:
When a binary file is to be read into RAM, we have to use the *c option, which sets StrLen the file size, but the data is stored in a special variable, not usable for RegEx. We have to copy it into another variable (or use dllcalls to open/read/close the file).
Code:
FileRead a, *c %A_AhkPath%
VarSetCapacity(b,StrLen(a),1)
DllCall("RtlMoveMemory", UInt,&b, UInt,&a, Uint,StrLen(a))
MsgBox % "It is found: " RegExMatch(b, "\0\03")
Source: Machine code binary buffer searching regardless of NULL - Scripts and Functions - AutoHotkey Community
Above, Laszlo shows how to get a normal variable from a binary clipboard variable. This was before Unicode, so it needs to be adjusted for that. We can use it like this:
Code:
IsUnicodeAutoHotkey(path) {
    FileRead fd, *c %path%
    VarSetCapacity(b, cb := StrLen(fd)*(A_IsUnicode?2:1), 1)
    ; Use Ptr/UPtr rather than UInt so the addresses are not truncated on 64-bit builds.
    DllCall("RtlMoveMemory", "Ptr", &b, "Ptr", &fd, "UPtr", cb)
    return !!RegExMatch(b, "MsgBox")
}
Loop Files, %A_AhkPath%\..\*.exe
    MsgBox % A_LoopFileName "`n" IsUnicodeAutoHotkey(A_LoopFilePath)
Loop Files, %A_AhkPath%\..\Compiler\*.bin
    MsgBox % A_LoopFileName "`n" IsUnicodeAutoHotkey(A_LoopFilePath)
#NoEnv
Code:
IsUnicodeAutoHotkey(path) {
    FileRead fd, *c %path%
    NumPut(NumGet(fd, "char"), fd, "char") ; Clear the "binary clip" status.
    return !!RegExMatch(fd, "MsgBox")
}
Code:
IsUnicodeAutoHotkey(path) {
    ; Read with the native code page (1200 = UTF-16, 0 = system ANSI code page).
    FileRead fd, % "*p" (A_IsUnicode ? 1200 : 0) " " path
    return !!RegExMatch(fd, "MsgBox")
}
Code:
IsUnicodeAutoHotkey(path) {
    ; Read the raw bytes through a File object.
    f := FileOpen(path, "r"), f.RawRead(fd, f.Length)
    return !!RegExMatch(fd, "MsgBox")
}
Backing up a bit, I mentioned that you need to be careful about the encoding of the needle - "MsgBox" in this case. If we run the above code with an ANSI version of AutoHotkey, the result is wrong because RegExMatch is searching for an ANSI string. You might think to simply invert the result with something like this:
Code:
IsUnicodeAutoHotkey(path) {
    f := FileOpen(path, "r"), f.RawRead(fd, f.Length)
    return !!RegExMatch(fd, "MsgBox") = !!A_IsUnicode
}
Code:
IsUnicodeAutoHotkey(path) {
    f := FileOpen(path, "r"), f.RawRead(fd, f.Length)
    return !!RegExMatch(fd, "MsgBox\0") = !!A_IsUnicode
}
Code:
IsUnicodeAutoHotkey(path) {
    FileRead fd, *p1200 %path%
    return !!RegExMatch(fd, "MsgBox\0")
}
For reference, it can be done with AutoHotkey v2.0-a112 like this:
Code:
IsUnicodeAutoHotkey(path) {
    fd := FileRead(path, "UTF-16")
    return !!RegExMatch(fd, "MsgBox\0")
}
Code:
IsUnicodeAutoHotkey(path) {
    buf := FileRead(path, "RAW") ; Returns a Buffer.
    fd := StrGet(buf, -buf.size//2) ; Get the data as a string.
    return !!RegExMatch(fd, "MsgBox\0")
}
One possible flaw in this technique is that compiled scripts can FileInstall other versions of AutoHotkey. In such cases, it might be necessary to scan for strings in multiple encodings, and compare the found positions. I'm not really sure where UpdateResource puts resources (i.e. the FileInstall data) in relation to code or strings.
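Scanning for both encodings and comparing positions could look roughly like the sketch below. FindNeedleOffsets and FindBytes are hypothetical helpers written for this illustration, not part of AutoHotkey. It assumes AutoHotkey v1.1 and an ASCII-only needle, and uses a plain byte-by-byte loop, so it will be slow on large files. How to interpret the two offsets is left open, since that depends on where the FileInstall data ends up.

```autohotkey
; Hypothetical helper: first byte offset of needle in each encoding (-1 if absent).
FindNeedleOffsets(path, needle := "MsgBox") {
    f := FileOpen(path, "r")
    if !IsObject(f)
        return ""
    size := f.Length
    f.RawRead(data, size)                   ; Raw bytes, no conversion.
    f.Close()
    result := {}
    for encoding, unit in {"UTF-16": 2, "CP0": 1} {
        bytes := StrPut(needle, encoding) * unit   ; StrPut counts characters.
        VarSetCapacity(buf, bytes, 0)
        StrPut(needle, &buf, encoding)
        result[encoding] := FindBytes(&data, size, &buf, bytes - unit) ; Exclude the terminator.
    }
    return result
}

; Hypothetical helper: naive byte search; returns a 0-based offset or -1.
FindBytes(hay, haySize, ndl, ndlSize) {
    if (ndlSize < 1 || haySize < ndlSize)
        return -1
    first := NumGet(ndl + 0, "UChar")
    Loop % haySize - ndlSize + 1
    {
        p := hay + A_Index - 1
        if (NumGet(p + 0, "UChar") = first
            && !DllCall("msvcrt\memcmp", "Ptr", p, "Ptr", ndl, "UPtr", ndlSize, "Cdecl Int"))
            return A_Index - 1
    }
    return -1
}

offsets := FindNeedleOffsets(A_AhkPath)
MsgBox % "UTF-16 offset: " offsets["UTF-16"] "`nANSI offset: " offsets["CP0"]
```

Note the `+ 0` on the NumGet calls: passing a variable directly would read from that variable's own string buffer rather than from the address it contains.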