RegExMatch on binary data from FileRead Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

RegExMatch on binary data from FileRead

28 May 2017, 12:25

How can RegExMatch be reliably performed on a binary file?

Code:

q:: ;test RegExMatch on binary data
FileGetSize, vSize, % A_AhkPath
VarSetCapacity(vData1, vSize, 1)
FileRead, vData2, % "*c " A_AhkPath
DllCall("kernel32\RtlMoveMemory", Ptr,&vData1, Ptr,&vData2, UPtr,vSize)

vNeedle := "AutoHotkeyGUI"
vSizeNeedle := StrLen(vNeedle)*2
Loop, % Floor(vSizeNeedle/2)
{
	vNum := NumGet(&vNeedle+0, A_Index*2-2, "UShort")
	vNeedleRegEx .= "\x{" Format("{:04X}", vNum) "}"
}

MsgBox, % RegExMatch(vData1, vNeedleRegEx)*2-2 ;842752 first time, -2 afterwards
MsgBox, % RegExMatch(vData2, vNeedleRegEx)*2-2 ;-2
return
[EDIT:] The only useful link I've found on this:
[SOLVED] AHK_L unicode - binary search problem - Ask for Help - AutoHotkey Community
https://autohotkey.com/board/topic/7749 ... h-problem/

Code:

VarSetCapacity(v, 8, 1), NumPut(0x1020304000000000, v, 0, "int64")  ; (reversed byte order)
MsgBox % (RegExMatch(v, "\x{3040}\x{1020}") - 1) * 2
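To see why the needle above is written as \x{3040}\x{1020}, it can help to dump the buffer as 16-bit units. This is only a small sketch that reuses the same NumPut call; the variable vList is introduced here for illustration.

```autohotkey
; Write the same 8-byte value as in the example above.
VarSetCapacity(v, 8, 1), NumPut(0x1020304000000000, v, 0, "Int64")

; Read the buffer back as four little-endian 16-bit units.
vList := ""
Loop, 4
	vList .= Format("0x{:04X}", NumGet(v, A_Index*2-2, "UShort")) " "

MsgBox, % vList ; 0x0000 0x0000 0x3040 0x1020
```

The two units of interest sit in the third and fourth positions, so a match reported at position 3 corresponds to byte offset (3 - 1) * 2 = 4.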
jeeswg

Re: RegExMatch on binary data from FileRead  Topic is solved

11 Sep 2017, 12:17

The problem was simply that the line vNeedleRegEx := "" was missing. Without it, the needle string grew every time the hotkey was pressed.

This script works every time now. The idea is to search binary data for a string. It appears that you can't do FileRead and immediately perform a RegExMatch on the data. You first have to copy the data to a ready-made variable, and then perform the RegExMatch on that copy.

Code:

q:: ;RegExMatch on binary data
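FileGetSize, vSize, % A_AhkPath
VarSetCapacity(vData1, vSize, 1) ;destination variable, pre-filled with non-zero bytes
FileRead, vData2, % "*c " A_AhkPath ;*c reads the file as binary data
DllCall("kernel32\RtlMoveMemory", Ptr,&vData1, Ptr,&vData2, UPtr,vSize) ;copy the raw bytes into vData1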

vNeedle := "AutoHotkeyGUI"
vSizeNeedle := StrLen(vNeedle)*2
vNeedleRegEx := ""
Loop, % Floor(vSizeNeedle/2)
{
	vNum := NumGet(&vNeedle, A_Index*2-2, "UShort")
	vNeedleRegEx .= "\x{" Format("{:04X}", vNum) "}"
}

;for a UTF-16 string, this would also work:
;Loop, Parse, vNeedle
;	vNeedleRegEx .= "\x{" Format("{:04X}", Ord(A_LoopField)) "}"

MsgBox, % RegExMatch(vData1, vNeedleRegEx)*2-2 ;e.g. 842752
MsgBox, % RegExMatch(vData2, vNeedleRegEx)*2-2 ;-2
return
Note: In Unicode versions of AutoHotkey, RegExMatch works in 2-byte units, so the needle length must be an even number of bytes, and matches are only found at even byte offsets.
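The needle-building loop and the *2-2 arithmetic used in the script above could be wrapped in two small helpers. This is a minimal sketch assuming a Unicode build of AutoHotkey v1.1; StrToHexNeedle and PosToByteOffset are hypothetical names, not built-in functions.

```autohotkey
; Hypothetical helpers, not part of AutoHotkey itself.

; Builds a RegEx needle such as \x{0047}\x{0055}\x{0049} from a string,
; one \x{....} escape per UTF-16 code unit.
StrToHexNeedle(vText)
{
	vNeedleRegEx := ""
	Loop, Parse, vText
		vNeedleRegEx .= "\x{" Format("{:04X}", Ord(A_LoopField)) "}"
	return vNeedleRegEx
}

; Converts a 1-based RegExMatch position (counted in 2-byte units)
; to a 0-based byte offset, or -1 if there was no match (position 0).
PosToByteOffset(vPos)
{
	return vPos ? (vPos-1)*2 : -1
}

MsgBox, % StrToHexNeedle("GUI") ; \x{0047}\x{0055}\x{0049}
MsgBox, % PosToByteOffset(421377) ; 842752
MsgBox, % PosToByteOffset(0) ; -1
```

Returning -1 for "not found" is a design choice made here; the scripts in this thread simply show -2 in that case because RegExMatch returns 0 and 0*2-2 is -2.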

For general binary searching see:
InBuf function currently 32-bit only (machine code binary buffer searching) - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=5&t=28393
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02

Re: RegExMatch on binary data from FileRead

11 Sep 2017, 15:39

jeeswg wrote: It appears that you can't do FileRead
Greetings. I suggest you use FileOpen and RawRead.
jeeswg

Re: RegExMatch on binary data from FileRead

11 Sep 2017, 16:34

Yeah, good call. Funnily enough, I wrote this File object code in the last few days, and SKAN had mentioned the File object and binary search on my Wish List 2.0.

The slight complications in the script above mean that, in this case, the File object is probably the simpler option, whereas usually the FileXXX commands are simpler. Also, I suppose the first method involves copying the data to two variables, whereas this method copies it to only one.

Haha, just as I thought I'd finally dealt with this problem ... another thing to try. For a moment I wondered how I could get a handle on the data for use with RegEx, but I believe that's the whole point of the File object: it doesn't retrieve any data until you ask for it, which is useful for dealing with massive files. You get a bit at a time (unless you read everything at once with RawRead, as below).

Code:

q:: ;RegExMatch on binary data
vPath := A_AhkPath
;needle corresponds to 'AutoHotkeyGUI'
vNeedleRegEx := "\x{0041}\x{0075}\x{0074}\x{006F}\x{0048}\x{006F}\x{0074}\x{006B}\x{0065}\x{0079}\x{0047}\x{0055}\x{0049}"

oFile := FileOpen(vPath, "r")
oFile.Pos := 0
oFile.RawRead(vData, oFile.Length)
oFile.Close()

MsgBox, % RegExMatch(vData, vNeedleRegEx)*2-2
return
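For very large files, the "bit at a time" idea could look roughly like the following. This is a sketch only: it reads the file in fixed-size chunks and counts the bytes, and the chunk size and the variable names (vChunkSize, vChunk, vTotal, vCount) are assumptions chosen for illustration.

```autohotkey
; Sketch: read a file in fixed-size chunks with the File object.
vPath := A_AhkPath
vChunkSize := 65536 ; arbitrary chunk size in bytes (an assumption)

oFile := FileOpen(vPath, "r")
if !IsObject(oFile)
{
	MsgBox, % "Could not open " vPath
	ExitApp
}
oFile.Pos := 0 ; start at the first byte, as in the script above
vLength := oFile.Length
vTotal := 0
vCount := 0
Loop
{
	; RawRead returns the number of bytes actually read
	vRead := oFile.RawRead(vChunk, vChunkSize)
	if !vRead
		break
	vTotal += vRead
	vCount++
	; vChunk now holds vRead bytes of raw data for processing
}
oFile.Close()
MsgBox, % vCount " chunks, " vTotal " of " vLength " bytes read"
```

Note that a search performed chunk by chunk would miss a needle that straddles two chunks unless consecutive reads overlap by at least the needle length minus one unit, so reading the whole file in one RawRead call remains the simpler route for files that fit in memory.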