RegExMatch on binary data from FileRead

Posted: 28 May 2017, 12:25
by jeeswg
How can RegExMatch be reliably performed on a binary file?

Code:

q:: ;test RegExMatch on binary data
FileGetSize, vSize, % A_AhkPath
VarSetCapacity(vData1, vSize, 1)
FileRead, vData2, % "*c " A_AhkPath
DllCall("kernel32\RtlMoveMemory", Ptr,&vData1, Ptr,&vData2, UPtr,vSize)

vNeedle := "AutoHotkeyGUI"
vSizeNeedle := StrLen(vNeedle)*2
Loop, % Floor(vSizeNeedle/2)
{
	vNum := NumGet(&vNeedle+0, A_Index*2-2, "UShort")
	vNeedleRegEx .= "\x{" Format("{:04X}", vNum) "}"
}

MsgBox, % RegExMatch(vData1, vNeedleRegEx)*2-2 ;842752 first time, -2 afterwards
MsgBox, % RegExMatch(vData2, vNeedleRegEx)*2-2 ;-2
return
[EDIT:] The only useful link I've found on this:
[SOLVED] AHK_L unicode - binary search problem - Ask for Help - AutoHotkey Community
https://autohotkey.com/board/topic/7749 ... h-problem/

Code:

VarSetCapacity(v, 8, 1), NumPut(0x1020304000000000, v, 0, "int64")  ; (reversed byte order)
MsgBox % (RegExMatch(v, "\x{3040}\x{1020}") - 1) * 2
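Why the needle in that example is written "reversed": the sketch below reads the same 8 bytes back as 2-byte (UShort) units, which is the unit size RegExMatch works with in a Unicode build. It assumes a little-endian machine (x86/x64, which is what AutoHotkey runs on); vList is just an illustrative variable name.

```autohotkey
;Same buffer as in the example above: 8 bytes filled with 0x01,
;then overwritten by the int64 value.
VarSetCapacity(v, 8, 1), NumPut(0x1020304000000000, v, 0, "int64")

;Little-endian storage puts the low byte first, so memory now holds:
;00 00 00 00 40 30 20 10
;Read the 8 bytes back as four 2-byte units.
vList := ""
Loop, 4
	vList .= Format("{:04X}", NumGet(v, A_Index*2-2, "UShort")) " "

MsgBox, % vList ;shows: 0000 0000 3040 1020
```

So the needle \x{3040}\x{1020} targets the third and fourth units. If RegExMatch finds it there, it returns position 3, and (3 - 1) * 2 = 4 is the byte offset.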

Re: RegExMatch on binary data from FileRead (Topic is solved)

Posted: 11 Sep 2017, 12:17
by jeeswg
The problem was simply that the line vNeedleRegEx := "" was missing. Without it, the needle string grew every time the hotkey ran, so only the first run found a match.

This script now works every time. The idea is to search binary data for a string. It appears that you can't use FileRead and immediately perform a RegExMatch on the data: you first have to copy the data into a variable prepared with VarSetCapacity, and then perform the RegExMatch.

Code:

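q:: ;RegExMatch on binary data
FileGetSize, vSize, % A_AhkPath
VarSetCapacity(vData1, vSize, 1) ;buffer of vSize bytes, filled with 0x01
FileRead, vData2, % "*c " A_AhkPath ;*c: read the file as raw binary data
DllCall("kernel32\RtlMoveMemory", Ptr,&vData1, Ptr,&vData2, UPtr,vSize) ;copy the file bytes into vData1

vNeedle := "AutoHotkeyGUI"
vSizeNeedle := StrLen(vNeedle)*2 ;needle size in bytes (2 bytes per character)
vNeedleRegEx := "" ;reset, otherwise the needle keeps growing on every run
Loop, % Floor(vSizeNeedle/2)
{
	;turn each 2-byte code unit into a \x{XXXX} escape
	vNum := NumGet(&vNeedle, A_Index*2-2, "UShort")
	vNeedleRegEx .= "\x{" Format("{:04X}", vNum) "}"
}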
;for a UTF-16 string, this would also work:
;Loop, Parse, vNeedle
;	vNeedleRegEx .= "\x{" Format("{:04X}", Ord(A_LoopField)) "}"

MsgBox, % RegExMatch(vData1, vNeedleRegEx)*2-2 ;byte offset of the match, e.g. 842752
MsgBox, % RegExMatch(vData2, vNeedleRegEx)*2-2 ;-2, i.e. RegExMatch returned 0 (no match)
return
Note: In Unicode versions of AutoHotkey, RegExMatch works in 2-byte units: the needle must be an even number of bytes long, and matches are only found at even byte offsets.
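The needle-building loop and the *2-2 conversion can be packaged as reusable functions. This is a minimal sketch assuming a Unicode build of AutoHotkey v1.1; StrToRegExNeedle and CharPosToByteOffset are hypothetical names. Because RegExMatch returns a 1-based position counted in 2-byte characters, the byte offset is (pos-1)*2. Unlike pos*2-2, the helper returns -1 rather than -2 when there is no match.

```autohotkey
;Hypothetical helper: build a needle that matches the UTF-16 code units
;of vText literally, one \x{XXXX} escape per 2-byte unit.
StrToRegExNeedle(vText)
{
	;local variable, explicitly emptied on each call, so the needle
	;cannot grow across calls like in the original bug
	vNeedleRegEx := ""
	Loop, % StrLen(vText)
		vNeedleRegEx .= "\x{" Format("{:04X}", NumGet(&vText, A_Index*2-2, "UShort")) "}"
	return vNeedleRegEx
}

;Hypothetical helper: convert RegExMatch's 1-based position (counted in
;2-byte characters) to a 0-based byte offset, or -1 if there was no match.
CharPosToByteOffset(vPos)
{
	return vPos ? (vPos-1)*2 : -1
}
```

With the variables from the script above, usage would look like: MsgBox, % CharPosToByteOffset(RegExMatch(vData1, StrToRegExNeedle("AutoHotkeyGUI")))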

For general binary searching see:
InBuf function currently 32-bit only (machine code binary buffer searching) - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=5&t=28393

Re: RegExMatch on binary data from FileRead

Posted: 11 Sep 2017, 15:39
by Helgef
jeeswg wrote: It appears that you can't do FileRead
Greetings. I suggest you use FileOpen and RawRead.

Re: RegExMatch on binary data from FileRead

Posted: 11 Sep 2017, 16:34
by jeeswg
Yeah, good call. Funnily enough, I wrote this File object code in the last few days, and SKAN had mentioned the File object and binary searching on my Wish List 2.0.

Because of the slight complications in the script above, the File object is probably the simpler option in this case, even though the FileXXX commands are usually the simpler option. Also, the first method copies the data into two variables, whereas this method only reads it into one.

Haha, just as I thought I'd finally dealt with this problem ... another thing to try. For a moment I wondered how I could get a handle on the data for use with RegEx, but I believe that's the whole point of the File object: it doesn't retrieve any data until you ask for it, which is useful for dealing with massive files ... you get a bit at a time (unless you use RawRead to read the whole file at once).

Code:

q:: ;RegExMatch on binary data
vPath := A_AhkPath
;needle corresponds to 'AutoHotkeyGUI'
vNeedleRegEx := "\x{0041}\x{0075}\x{0074}\x{006F}\x{0048}\x{006F}\x{0074}\x{006B}\x{0065}\x{0079}\x{0047}\x{0055}\x{0049}"

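oFile := FileOpen(vPath, "r")
oFile.Pos := 0 ;start reading from the first byte of the file
oFile.RawRead(vData, oFile.Length) ;read the whole file into vData as raw bytes
oFile.Close()

MsgBox, % RegExMatch(vData, vNeedleRegEx)*2-2 ;byte offset of the match (-2 if no match)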
return
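The same FileOpen/RawRead approach can be wrapped in a function so any file can be searched with one call. This is a sketch that relies on the behaviour shown in the script above (a Unicode build, so matches only occur at even byte offsets); BinFileRegExOffset is a hypothetical name, and returning -1 on failure is a design choice, not a built-in convention.

```autohotkey
;Hypothetical helper: search a whole file with RegExMatch using FileOpen
;and RawRead. Returns the 0-based byte offset of the first match, or -1
;if there is no match or the file can't be opened.
BinFileRegExOffset(vPath, vNeedleRegEx)
{
	if !(oFile := FileOpen(vPath, "r"))
		return -1
	oFile.Pos := 0 ;rewind to byte 0, as in the script above
	vSize := oFile.Length
	oFile.RawRead(vData, vSize) ;read all bytes into vData
	oFile.Close()
	vPos := RegExMatch(vData, vNeedleRegEx)
	return vPos ? (vPos-1)*2 : -1 ;character position -> byte offset
}
```

For example, with the needle from the script above: MsgBox, % BinFileRegExOffset(A_AhkPath, vNeedleRegEx)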