case-insensitive searching in binary buffers

jeeswg
Posts: 6904
Joined: 19 Dec 2016, 01:58
Location: UK

20 Oct 2018, 21:45

tl;dr I can do it, but I'm interested in ideas for the algorithm

- I've been interested in having a function for case-insensitive searching in binary buffers. E.g. search a binary file for a string.
- AFAIK there are no existing Winapi functions for searching for ANSI/UTF-8/UTF-16 strings that can handle null characters, i.e. they all stop at the first null character.
- One way to do this would be to repeatedly use NumGet, but this is very slow. Writing a function in C++, compiling it, and using it as a machine code function would be better, e.g. something like this:
GitHub - HelgeffegleH/buf: Functions for searching in and writing to buffers
https://github.com/HelgeffegleH/buf
C++: C++ to machine code via TDM-GCC - AutoHotkey Community
https://autohotkey.com/boards/viewtopic ... 23&t=49554
- (Generally speaking I write everything in AHK, but a few times, machine code functions that can be used in AHK are necessary.)
- (Note: case-sensitive searching is easier, because you only have to search for an exact stream of bytes. You can search for 'AutoHotkey', you don't have to also check for 'a'/'U'/'T' etc.)
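
To illustrate the null-character problem and the case-sensitive baseline, here is a minimal C++ sketch (the function names `found_with_strstr` and `find_bytes` are made up for this example, they are not from any library). It assumes the haystack is held as a byte range with an explicit length, so null bytes are treated as ordinary data.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// strstr treats the first null byte as the end of the haystack,
// so it cannot see anything stored after that byte.
bool found_with_strstr(const std::string& hay, const char* needle)
{
    return std::strstr(hay.c_str(), needle) != nullptr;
}

// std::search works on an explicit [begin, end) range, so embedded nulls
// are just data. Returns the offset of the first exact match, or -1.
long find_bytes(const std::string& hay, const std::string& needle)
{
    auto it = std::search(hay.begin(), hay.end(), needle.begin(), needle.end());
    return it == hay.end() ? -1 : static_cast<long>(it - hay.begin());
}
```

For a haystack that starts with two null bytes followed by 'AutoHotkey', the strstr-based check reports nothing, while the range-based search reports offset 2.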

- To do a case-insensitive search for 'AutoHotkey', I check each character until I find 'A' or 'a', then I check if the next character is 'U' or 'u', then I check for 'T'/'t' etc. Either I find the entire word, or I start looking for 'A'/'a' again.
- The question is how to most efficiently handle checking multiple byte blocks.
- E.g. to do a case-insensitive search for 'résumé' in UTF-8, these are the two forms to check against:
R É S U M É
r é s u m é
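
The single-byte case described above can be sketched in C++ as follows. This is only an ASCII illustration under the default "C" locale, and `find_ascii_nocase` is a hypothetical name. It does not handle multi-byte characters such as 'é', where a whole byte group has to match one form or the other.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// ASCII-only sketch: each haystack byte must equal the upper- or lower-case
// form of the corresponding needle byte. Returns the offset of the first
// match, or -1. Not valid for multi-byte characters.
long find_ascii_nocase(const std::string& hay, const std::string& needle)
{
    if (needle.empty() || needle.size() > hay.size())
        return -1;
    for (std::size_t start = 0; start + needle.size() <= hay.size(); ++start)
    {
        std::size_t i = 0;
        while (i < needle.size())
        {
            unsigned char h = static_cast<unsigned char>(hay[start + i]);
            unsigned char n = static_cast<unsigned char>(needle[i]);
            if (h != std::toupper(n) && h != std::tolower(n))
                break; // mismatch: resume scanning at the next start offset
            ++i;
        }
        if (i == needle.size())
            return static_cast<long>(start);
    }
    return -1;
}
```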

- Also, for some letters, the upper case and lower case forms do not have the same number of bytes.
- Actually, this case is so rare that I would suggest any code use 2 separate algorithms. One algorithm for searching strings where each Unicode character has the same number of bytes for upper/lower case forms in UTF-8, and one algorithm to handle the general case where the number can differ.
- This script sought to find any characters where the upper case and lower case forms had a different number of bytes when stored as UTF-8.
- I found only 7 pairs of characters.

Code: Select all

;==================================================

q:: ;characters where upper/lower case have different sizes in UTF-8
vOutput := "`r`n"
Loop, 1114111
{
	vCharU := Format("{:U}", Chr(A_Index))
	vCharL := Format("{:L}", Chr(A_Index))
	vSizeU := StrPut(vCharU, "UTF-8") - 1
	vSizeL := StrPut(vCharL, "UTF-8") - 1
	vOrdU := Ord(vCharU)
	vOrdL := Ord(vCharL)
	vTemp := vCharU " " vCharL " " vSizeU " " vSizeL " " vOrdU " " vOrdL
	if !(vSizeU = vSizeL)
	&& !InStr(vOutput, "`r`n" vTemp "`r`n")
		vOutput .= vTemp "`r`n"
}
vOutput := SubStr(vOutput, 3)
Clipboard := vOutput
MsgBox, % vOutput
return

;==================================================

;Ⱥ ⱥ 2 3 570 11365
;Ⱦ ⱦ 2 3 574 11366
;Ɐ ɐ 3 2 11375 592
;Ɑ ɑ 3 2 11373 593
;Ɫ ɫ 3 2 11362 619
;Ɱ ɱ 3 2 11374 625
;Ɽ ɽ 3 2 11364 637

;Ⱥⱥ Ⱦⱦ Ɐɐ Ɑɑ Ɫɫ Ɱɱ Ɽɽ

;==================================================
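
The byte counts in the listing above follow from the UTF-8 code point ranges: each of the 7 pairs has one form below U+0800 (2 bytes) and the other form at or above it (3 bytes). A small C++ sketch of the size rule, with `utf8_len` as a hypothetical helper name:

```cpp
// Number of bytes UTF-8 uses for a Unicode code point.
// Surrogates and values above 0x10FFFF are not validated here.
int utf8_len(unsigned long cp)
{
    if (cp < 0x80)    return 1; // ASCII
    if (cp < 0x800)   return 2; // e.g. 570 (Ⱥ), 592 (ɐ)
    if (cp < 0x10000) return 3; // e.g. 11365 (ⱥ), 11375 (Ɐ)
    return 4;
}
```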
- If we assume that each character has the same number of bytes in its upper and lower case forms, one algorithm could work like this, where each space represents a pad byte:
A BC D EF
a bc d ef
- We split the needle into byte groups, one byte group for each character.
- The length of each split-up needle is 9 (including spaces). Let's say when searching for 'A'/'a', we've found 'ABCdef'.
- We check if 'A'/'a' matches. 'A' matches. The next byte is a pad byte so we skip to the next byte. We check if 'B'/'b' matches. 'B' matches. The next byte is not the pad byte, (so we are still within the byte group for a character,) so we only check if 'C' matches, we do not check for 'c'. 'C' matches. Etc.

- The more general algorithm, where byte groups could vary in length, could ignore pad characters, and wait until there is a pad character in both byte lists (to signal a new byte group).

- Another algorithm could make use of a list. E.g. using '110110' to represent 'A BC D EF'. I.e. 1 marks the start of a new byte group.
- [EDIT:] This approach is probably better. One problem with having a pad byte, is that you need to specify a pad byte number between 0 and 255. Your needle might include all 256 different bytes.
- [EDIT:] You could use a ternary (base 3) pattern for each string. 2 for the start of a byte group, 1 for the continuation of a byte group, 0 for a pad byte.
- [EDIT:] Perhaps the best system. The upper and lower case versions of the needle each have a binary 'guide' number that represents them. While checking a lower case byte group, you move the pointer along on the lower case 'guide' until you hit 1, at which point you move the pointer along on the upper case 'guide' until it hits 1. (That way you can have byte groups of differing lengths, but avoid using pad bytes. You keep the pointers on each binary 'guide' in sync.)
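
As a rough illustration of the 'guide' idea for UTF-8 only, a guide can be derived directly from the bytes, because UTF-8 continuation bytes always have the bit pattern 10xxxxxx. This is a sketch with a hypothetical function name (`utf8_guide`). For ANSI code pages or UTF-16 the guide would have to be built from per-character sizes instead.

```cpp
#include <string>

// Builds a '1'/'0' guide for a UTF-8 byte string: '1' marks the first byte
// of a character, '0' marks a continuation byte (bit pattern 10xxxxxx).
std::string utf8_guide(const std::string& bytes)
{
    std::string guide;
    for (unsigned char b : bytes)
        guide += ((b & 0xC0) == 0x80) ? '0' : '1';
    return guide;
}
```

For the UTF-8 bytes of 'aéaé' (1, 2, 1 and 2 bytes) this gives '110110', the same shape as 'A BC D EF'.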

- Or perhaps another idea.
- The question is, which algorithm is best?

- [EDIT:] I realised that the UTF-8 forms of upper and lower case characters, can start with the same bytes, but then differ, so ... if one 'character' (byte group) fails to match, you need to backtrack and try the other one. Or you check the bytes from both byte groups simultaneously.
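
One way to sidestep the backtracking issue is to compare whole byte groups rather than single bytes. The C++ sketch below is an alternative formulation, not the approach used in the scripts in this thread: the needle is passed as a list of (upper, lower) byte groups, one pair per character, and at each position the haystack must contain one complete group. It assumes each character maps to exactly one upper and one lower form, and that neither group is a proper prefix of the other (true for single characters in UTF-8 and UTF-16). The names `CasePair`, `group_at` and `find_nocase_groups` are made up for the example.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One needle character: its upper-case and lower-case byte groups.
using CasePair = std::pair<std::string, std::string>;

// True if `group` occurs in `hay` at offset `pos`.
static bool group_at(const std::string& hay, std::size_t pos, const std::string& group)
{
    return pos + group.size() <= hay.size()
        && hay.compare(pos, group.size(), group) == 0;
}

// Returns the offset of the first match, or -1. The two groups of a pair
// may differ in length, so the match length is not fixed in advance.
long find_nocase_groups(const std::string& hay, const std::vector<CasePair>& needle)
{
    if (needle.empty())
        return -1;
    for (std::size_t start = 0; start < hay.size(); ++start)
    {
        std::size_t pos = start;
        bool ok = true;
        for (const CasePair& ch : needle)
        {
            if (group_at(hay, pos, ch.first))       pos += ch.first.size();
            else if (group_at(hay, pos, ch.second)) pos += ch.second.size();
            else { ok = false; break; }
        }
        if (ok)
            return static_cast<long>(start);
    }
    return -1;
}
```

Because a group either matches completely or not at all, bytes from the upper and lower forms of one character can never be mixed, e.g. the first byte of 'Ⱥ' followed by the last two bytes of 'ⱥ' is rejected.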
jeeswg

Re: case-insensitive searching in binary buffers

23 Oct 2018, 23:45

- I've come up with some code via NumGet to do a case-insensitive search for an ANSI/UTF-8/UTF-16 string in a binary buffer, as a prelude to a potential machine code function.
- The test script creates some binary data consisting only of null bytes, and writes a string at 2 places, the function then searches for that string from the front and from the back.
- Do notify of any errors, although the function has passed some tough tests e.g. the test below.
- The function is called 'InBufStrAlt' rather than 'InBufStr', because I use the Alt suffix for alternative versions of a proper function. The proper function would use machine code.
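
For comparison, the layout of the test data can be reproduced in a few lines of C++. This sketch only covers the buffer setup and an exact (case-sensitive) forward and reverse search via `std::string::find` and `rfind`, which accept embedded nulls. The helper names are hypothetical, and the needle is assumed to be shorter than 1000 bytes.

```cpp
#include <cstddef>
#include <string>

// Builds a null-filled 10000-byte buffer with the needle written at
// offsets 1000 and 9000, mirroring the layout used by the test script.
std::string make_test_buffer(const std::string& needle)
{
    std::string buf(10000, '\0');
    buf.replace(1000, needle.size(), needle);
    buf.replace(9000, needle.size(), needle);
    return buf;
}

// Offset of the first exact match when searching from the front
// (std::string::npos if there is none).
std::size_t first_match(const std::string& buf, const std::string& needle)
{
    return buf.find(needle);
}

// Offset of the last exact match, i.e. the first one found from the back.
std::size_t last_match(const std::string& buf, const std::string& needle)
{
    return buf.rfind(needle);
}
```

With this layout a forward search should report offset 1000 and a reverse search offset 9000, which are the values the six results in the MsgBox can be checked against.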

Code: Select all

q:: ;case-insensitive search for ANSI/UTF-8/UTF-16 string in binary buffer
vNeedle := "Ⱥⱥ Ⱦⱦ Ɐɐ Ɑɑ Ɫɫ Ɱɱ Ɽɽ"
;vNeedle := "Résumé"
vNeedleU := Format("{:U}", vNeedle)
vNeedleL := Format("{:L}", vNeedle)

vListEnc := "CP0,UTF-8,UTF-16"
Loop, Parse, vListEnc, % ","
{
	vEnc := A_LoopField
	vSize := 10000, vOffset1 := 1000, vOffset2 := 9000
	vHaystack := ""
	VarSetCapacity(vHaystack, vSize, 0)
	StrPut(vNeedle, &vHaystack+vOffset1, vEnc)
	StrPut(vNeedle, &vHaystack+vOffset2, vEnc)

	vRet := JEE_InBufStrAlt(&vHaystack, vSize, vNeedle, vEnc)
	vRetU := JEE_InBufStrAlt(&vHaystack, vSize, vNeedleU, vEnc)
	vRetL := JEE_InBufStrAlt(&vHaystack, vSize, vNeedleL, vEnc)

	vRetRev := JEE_InBufStrAlt(&vHaystack, vSize, vNeedle, vEnc, -1)
	vRetRevU := JEE_InBufStrAlt(&vHaystack, vSize, vNeedleU, vEnc, -1)
	vRetRevL := JEE_InBufStrAlt(&vHaystack, vSize, vNeedleL, vEnc, -1)
	MsgBox, % vRet "`r`n" vRetU "`r`n" vRetL
	. "`r`n" vRetRev "`r`n" vRetRevU "`r`n" vRetRevL
}
return

;==================================================

JEE_InBufStrAlt(vAddr, vSize, vNeedle, vEnc:="CP0", vStep:=1)
{
	local
	vNeedleU := Format("{:U}", vNeedle)
	vNeedleL := Format("{:L}", vNeedle)
	vBinU := vBinL := ""
	Loop, Parse, vNeedle
	{
		vChar := A_LoopField
		vCharU := Format("{:U}", vChar)
		vCharL := Format("{:L}", vChar)
		if !(vEnc = "UTF-16")
			vSizeU := StrPut(vCharU, vEnc) - 1
			, vSizeL := StrPut(vCharL, vEnc) - 1
		else
			vSizeU := StrLen(vCharU)*2
			, vSizeL := StrLen(vCharL)*2
		vBinU .= "1" JEE_StrRept("0", vSizeU - 1)
		vBinL .= "1" JEE_StrRept("0", vSizeL - 1)
	}
	vBinLenU := StrLen(vBinU)
	vBinLenL := StrLen(vBinL)
	if !(vEnc = "UTF-16")
		vSizeNeedleU := StrPut(vNeedleU, vEnc)
		, vSizeNeedleL := StrPut(vNeedleL, vEnc)
	else
		vSizeNeedleU := StrLen(vNeedleU)*2
		, vSizeNeedleL := StrLen(vNeedleL)*2
	vNeedle8U := ""
	VarSetCapacity(vNeedle8U, vSizeNeedleU+1)
	vNeedle8L := ""
	VarSetCapacity(vNeedle8L, vSizeNeedleL+1)
	StrPut(vNeedleU, &vNeedle8U, vEnc)
	StrPut(vNeedleL, &vNeedle8L, vEnc)
	vByte1U := NumGet(&vNeedle8U, 0, "UChar")
	vByte1L := NumGet(&vNeedle8L, 0, "UChar")

	vEnd := vAddr + vSize
	if (vStep > 0)
		vAddrTemp1 := vAddr - vStep
	else
		vAddrTemp1 := vEnd ;the loop adds vStep (negative) before the first read, so the search starts at the last byte
	vDoCheckU := vDoCheckL := 0
	vPosU := vPosL := 0
	vIsMatch := 0
	vBinU .= "1", vBinL .= "1" ;trailing 1s to aid parsing
	Loop, % vSize - Min(vBinLenU, vBinLenL) + 1
	{
		vAddrTemp1 += vStep
		if (vAddrTemp1 < vAddr) || (vAddrTemp1 >= vEnd)
			return -1
		vByte := NumGet(vAddrTemp1+0, "UChar")
		if (vByte = vByte1U)
			vDoCheckU := 1
		if (vByte = vByte1L)
			vDoCheckL := 1
		if !vDoCheckU && !vDoCheckL
			continue
		vPosU := vPosL := 1
		vAddrTemp := vAddrTemp1-1
		Loop, % Max(vBinLenU, vBinLenL) + 1
		{
			vAddrTemp++
			vByte := NumGet(vAddrTemp+0, "UChar")

			;diagnostic:
			;MsgBox, % vByte " " (NumGet(&vNeedle8U, vPosU-1, "UChar")) " " vPosU
			;. "`r`n`r`n" vByte " " (NumGet(&vNeedle8L, vPosL-1, "UChar")) " " vPosL

			if vDoCheckU
				if (vByte = NumGet(&vNeedle8U, vPosU-1, "UChar"))
				{
					vPosU += 1
					if (vPosU > vBinLenU)
					{
						vIsMatch := 1
						break 2
					}
					if SubStr(vBinU, vPosU, 1)
					{
						vDoCheckL := 1
						vPosL := InStr(vBinL, "1", 0, vPosL+1)
						continue
					}
				}
				else
					vDoCheckU := 0
			if vDoCheckL
				if (vByte = NumGet(&vNeedle8L, vPosL-1, "UChar"))
				{
					vPosL += 1
					if (vPosL > vBinLenL)
					{
						vIsMatch := 1
						break 2
					}
					if SubStr(vBinL, vPosL, 1)
					{
						vDoCheckU := 1
						vPosU := InStr(vBinU, "1", 0, vPosU+1)
						continue
					}
				}
				else
					vDoCheckL := 0
			if !vDoCheckU && !vDoCheckL
				break
		}
	}
	return vIsMatch ? (vAddrTemp1-vAddr) : -1
}

JEE_StrRept(vText, vNum)
{
	local
	if (vNum <= 0)
		return
	return StrReplace(Format("{:" vNum "}", ""), " ", vText)
	;return StrReplace(Format("{:0" vNum "}", 0), 0, vText)
}

;==================================================
jeeswg

Re: case-insensitive searching in binary buffers

31 Dec 2018, 11:56

Well, here's a working script. Do notify of any issues, or make any suggestions.

Code: Select all

q:: ;case-insensitive search for ANSI/UTF-8/UTF-16 string in binary buffer
vNeedle := "Ⱥⱥ Ⱦⱦ Ɐɐ Ɑɑ Ɫɫ Ɱɱ Ɽɽ"
;vNeedle := "abc"
;vNeedle := "résumé"
;vNeedle := Format("{:U}", vNeedle)
;vNeedle := Format("{:T}", vNeedle)
;vNeedle := Format("{:L}", vNeedle)

vNeedleU := Format("{:U}", vNeedle)
vNeedleL := Format("{:L}", vNeedle)

vListEnc := "CP0,UTF-8,UTF-16"
Loop, Parse, vListEnc, % ","
{
	vEnc := A_LoopField
	vSize := 10000, vOffset1 := 1000, vOffset2 := 9000
	vHaystack := ""
	VarSetCapacity(vHaystack, vSize, 0)
	StrPut(vNeedle, &vHaystack+vOffset1, vEnc)
	StrPut(vNeedle, &vHaystack+vOffset2, vEnc)

	vSfx := ""
	;vSfx := "Alt"

	vRet := JEE_InBufStr%vSfx%(&vHaystack, vSize, vNeedle, vEnc)
	vRetU := JEE_InBufStr%vSfx%(&vHaystack, vSize, vNeedleU, vEnc)
	vRetL := JEE_InBufStr%vSfx%(&vHaystack, vSize, vNeedleL, vEnc)

	vRetRev := JEE_InBufStr%vSfx%(&vHaystack, vSize, vNeedle, vEnc, -1)
	vRetRevU := JEE_InBufStr%vSfx%(&vHaystack, vSize, vNeedleU, vEnc, -1)
	vRetRevL := JEE_InBufStr%vSfx%(&vHaystack, vSize, vNeedleL, vEnc, -1)

	MsgBox, % vRet "`r`n" vRetU "`r`n" vRetL
	. "`r`n" vRetRev "`r`n" vRetRevU "`r`n" vRetRevL
}
return

;==================================================

;case-insensitive searching in binary buffers - AutoHotkey Community
;https://autohotkey.com/boards/viewtopic.php?f=5&t=58103

;search for ANSI/UTF-8/UTF-16 strings, case insensitive
;vStep: -1 for reverse search

JEE_InBufStr(vAddr, vSize, vNeedle, vEnc:="CP0", vStep:=1)
{
	local
	static vIsReady := 0, vFunc
	if !vIsReady
	{
		vHex32 := "5557565383EC188B44243C8B5C24388B7C242C8B6C244C8D48018B44244083C00139C189C20F46D10F42C88B4424302B7C24440FB6008844240E8B4424340FB6008844240F8B44242C01D8837C24440089C6894424140F4EF88D430129D089442410746E037C2444397C242C776439FE76608D4101C744240801000000C644240D0031F689442404908DB426000000000FB61F385C240E0F8483010000385C240F743D0FB644240D89F283F00138D0763431F683442408018B4424083B4424107710037C2444397C242C7706397C241477BE83C418B8FFFFFFFF5B5E5F5DC390C644240D018B44240485C074C631C9B80100000031D2890C2489"
		. "6C244C8D760089F184C974108B4C24308B2C243A1C290F848A000000807C240D000F84230100008B7424343A1C160F854301000031F683C201395424400F84260100008B5C244CC644240D01803C1300743E830424018B7424488B1C24803C1E000F85F80000008B6C244C89D989F38DB4260000000083C101803C0B0074F7890C24896C244CBE01000000C644240D0183C00139442404725D0FB65C07FFE963FFFFFF8D7600830424018B2C24396C243C0F84B20000008B4C2448803C290074458B5C244C83C201803C13000F858300000089E989DD83C201807C15000074F683C00139442404890C24896C244CC644240D0173A4908B6C244C"
		. "E9BAFEFFFF8DB42600000000807C240D0074838B4C24343A1C110F841CFFFFFFC644240D00E96CFFFFFF66900FB654240EB801000000BE010000003854240F0FB654240D0F44D08854240DE9A1FEFFFF8B6C244C31F6E964FEFFFFC644240D01E931FFFFFFBE01000000E927FFFFFF89F82B44242C83C4185B5E5F5DC38B6C244CC644240D00E932FEFFFF"
		vHex64 := "4157415641554154555756534883EC28448BBC24900000008B84249800000048894C24708B8C24A00000004C8BAC24A80000004C8BA424B0000000418D7F0183C0014189C239C7440F46D70F42F80FB60288442412410FB600884424134489C8480344247085C948894424084889C37E0B488B5C24704863C14829C3418D41014429D08944241474664863C14801C348395C24704889442418775448395C2408764D41BE0100000031F631C083C70190440FB61B44385C24120F843601000044385C2413743F89F183F10138C1763B31C04183C6014439742414721348035C241848395C2470770748395C240877C1B8FFFFFFFF4883C4285B5E"
		. "5F5D415C415D415E415FC3BE0100000085FF74C331C94531C94531D2662E0F1F84000000000084C0740D4489D544381C2A0F84830000004084F674994489C845381C000F85E400000031C04183C10144398C24980000000F84C50000004589CBBE0100000043803C1C007433418D420141807C0500004989C20F85990000000F1F8000000000418D420141807C0500004989C274F1BE01000000B8010000004883C101448D59014439DF0F8227FFFFFF440FB61C0BE96CFFFFFF4183C2014539D7745F4489D541807C2D000075184084F674CC4489CE453A1C300F846BFFFFFF31F6EBBB6690458D590143803C1C004D89D974F2BE01000000EB"
		. "A4440FB6542412B901000000B80100000044385424130F44F1E9F8FEFFFFB801000000E97DFFFFFF89D82B442470E9CDFEFFFF31F631C0E9A1FEFFFF"
		vHex := A_PtrSize=8 ? vHex64 : vHex32
		VarSetCapacity(vFunc, StrLen(vHex)//2)
		Loop, % StrLen(vHex)//2
			NumPut("0x" SubStr(vHex, 2*A_Index-1, 2), vFunc, A_Index-1, "UChar")
		vHex64 := vHex32 := ""
		vIsReady := 1
	}

	vNeedleU := Format("{:U}", vNeedle)
	vNeedleL := Format("{:L}", vNeedle)
	vBinU := vBinL := ""
	Loop, Parse, vNeedle
	{
		vChar := A_LoopField
		vCharU := Format("{:U}", vChar)
		vCharL := Format("{:L}", vChar)
		if !(vEnc = "UTF-16")
			vSizeU := StrPut(vCharU, vEnc) - 1
			, vSizeL := StrPut(vCharL, vEnc) - 1
		else
			vSizeU := StrLen(vCharU)*2
			, vSizeL := StrLen(vCharL)*2
		vBinU .= "1" JEE_StrRept("0", vSizeU - 1)
		vBinL .= "1" JEE_StrRept("0", vSizeL - 1)
	}
	vBinU .= "1", vBinL .= "1" ;trailing 1s to aid parsing
	vBinLenU := StrLen(vBinU)
	vBinLenL := StrLen(vBinL)
	if !(vEnc = "UTF-16")
		vSizeNeedleU := StrPut(vNeedleU, vEnc) - 1
		, vSizeNeedleL := StrPut(vNeedleL, vEnc) - 1 ;sizes exclude the null terminator, otherwise a match would need a null byte after it
	else
		vSizeNeedleU := StrLen(vNeedleU)*2
		, vSizeNeedleL := StrLen(vNeedleL)*2
	vNeedle8U := ""
	VarSetCapacity(vNeedle8U, vSizeNeedleU+1)
	vNeedle8L := ""
	VarSetCapacity(vNeedle8L, vSizeNeedleL+1)
	StrPut(vNeedleU, &vNeedle8U, vEnc)
	StrPut(vNeedleL, &vNeedle8L, vEnc)

	;==============================

	VarSetCapacity(vBinU2, vBinLenU)
	VarSetCapacity(vBinL2, vBinLenL)
	Loop, % vBinLenU
		NumPut(SubStr(vBinU, A_Index, 1), &vBinU2, A_Index-1, "UChar")
	Loop, % vBinLenL
		NumPut(SubStr(vBinL, A_Index, 1), &vBinL2, A_Index-1, "UChar")
	;pHaystack, pNeedleU, pNeedleL
	;vSize, vSizeNeedleU, vSizeNeedleL
	;vStep, pBinU, pBinL ;note: pBinU/pBinL are 1s and 0s indicating the start/end of characters
	return DllCall(&vFunc, Ptr,vAddr, Ptr,&vNeedle8U, Ptr,&vNeedle8L, UInt,vSize, UInt,vSizeNeedleU, UInt,vSizeNeedleL, Int,vStep, Ptr,&vBinU2, Ptr,&vBinL2, Int)
}

;==================================================

JEE_InBufStrAlt(vAddr, vSize, vNeedle, vEnc:="CP0", vStep:=1)
{
	local
	vNeedleU := Format("{:U}", vNeedle)
	vNeedleL := Format("{:L}", vNeedle)
	vBinU := vBinL := ""
	Loop, Parse, vNeedle
	{
		vChar := A_LoopField
		vCharU := Format("{:U}", vChar)
		vCharL := Format("{:L}", vChar)
		if !(vEnc = "UTF-16")
			vSizeU := StrPut(vCharU, vEnc) - 1
			, vSizeL := StrPut(vCharL, vEnc) - 1
		else
			vSizeU := StrLen(vCharU)*2
			, vSizeL := StrLen(vCharL)*2
		vBinU .= "1" JEE_StrRept("0", vSizeU - 1)
		vBinL .= "1" JEE_StrRept("0", vSizeL - 1)
	}
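	vBinLenU := StrLen(vBinU) ;lengths are taken before the trailing 1s are added,
	vBinLenL := StrLen(vBinL) ;otherwise a match would need a null byte after it
	vBinU .= "1", vBinL .= "1" ;trailing 1s to aid parsing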
	if !(vEnc = "UTF-16")
		vSizeNeedleU := StrPut(vNeedleU, vEnc)
		, vSizeNeedleL := StrPut(vNeedleL, vEnc)
	else
		vSizeNeedleU := StrLen(vNeedleU)*2
		, vSizeNeedleL := StrLen(vNeedleL)*2
	vNeedle8U := ""
	VarSetCapacity(vNeedle8U, vSizeNeedleU+1)
	vNeedle8L := ""
	VarSetCapacity(vNeedle8L, vSizeNeedleL+1)
	StrPut(vNeedleU, &vNeedle8U, vEnc)
	StrPut(vNeedleL, &vNeedle8L, vEnc)

	vByte1U := NumGet(&vNeedle8U, 0, "UChar")
	vByte1L := NumGet(&vNeedle8L, 0, "UChar")
	vEnd := vAddr + vSize
	if (vStep > 0)
		vAddrTemp1 := vAddr - vStep
	else
		vAddrTemp1 := vEnd
	vDoCheckU := vDoCheckL := 0
	vPosU := vPosL := 0
	vIsMatch := 0

	;==============================

	Loop, % vSize - Min(vBinLenU, vBinLenL) + 1
	{
		vAddrTemp1 += vStep
		if (vAddrTemp1 < vAddr) || (vAddrTemp1 >= vEnd)
			return -1
		vByte := NumGet(vAddrTemp1+0, "UChar")
		if (vByte = vByte1U)
			vDoCheckU := 1
		if (vByte = vByte1L)
			vDoCheckL := 1
		if !vDoCheckU && !vDoCheckL
			continue
		vPosU := vPosL := 1
		vAddrTemp := vAddrTemp1-1
		Loop, % Max(vBinLenU, vBinLenL) + 1
		{
			vAddrTemp++
			vByte := NumGet(vAddrTemp+0, "UChar")

			;diagnostic:
			;MsgBox, % vByte " " (NumGet(&vNeedle8U, vPosU-1, "UChar")) " " vPosU
			;. "`r`n`r`n" vByte " " (NumGet(&vNeedle8L, vPosL-1, "UChar")) " " vPosL

			if vDoCheckU
				if (vByte = NumGet(&vNeedle8U, vPosU-1, "UChar"))
				{
					vPosU++
					if (vPosU > vBinLenU)
					{
						vIsMatch := 1
						break 2
					}
					if SubStr(vBinU, vPosU, 1)
					{
						vDoCheckL := 1
						vPosL := InStr(vBinL, "1", 0, vPosL+1)
						continue
					}
				}
				else
					vDoCheckU := 0
			if vDoCheckL
				if (vByte = NumGet(&vNeedle8L, vPosL-1, "UChar"))
				{
					vPosL++
					if (vPosL > vBinLenL)
					{
						vIsMatch := 1
						break 2
					}
					if SubStr(vBinL, vPosL, 1)
					{
						vDoCheckU := 1
						vPosU := InStr(vBinU, "1", 0, vPosU+1)
						continue
					}
				}
				else
					vDoCheckL := 0
			if !vDoCheckU && !vDoCheckL
				break
		}
	}
	return vIsMatch ? (vAddrTemp1-vAddr) : -1
}

;==================================================
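
The hex-loading loop in JEE_InBufStr reads the hex string two digits at a time and stores each pair as one byte. A minimal C++ sketch of just that decoding step (`hex_to_bytes` is a hypothetical name, and the input is assumed to be an even number of valid hex digits):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Converts a hex string such as "5557" into the bytes {0x55, 0x57}.
// Assumes an even number of valid hex digits.
std::vector<unsigned char> hex_to_bytes(const std::string& hex)
{
    std::vector<unsigned char> out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        out.push_back(static_cast<unsigned char>(std::stoi(hex.substr(i, 2), nullptr, 16)));
    return out;
}
```

This covers only the conversion from text to bytes, not how the resulting bytes are then called as a function.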
Here's some C++ code, which is translated to machine code, stored as hex, and used in the function.

Code: Select all

;==================================================

;C++: C++ to machine code via TDM-GCC - AutoHotkey Community
;https://autohotkey.com/boards/viewtopic.php?f=23&t=49554

;note: replace bool with _Bool before compiling with TDM-GCC

/*
int inbufstr(unsigned char* pHaystack, unsigned char* pNeedleU, unsigned char* pNeedleL, unsigned int vSize, unsigned int vSizeNeedleU, unsigned int vSizeNeedleL, int vStep, unsigned char* pBinU, unsigned char* pBinL)
{
	unsigned char vByte, vByte1U, vByte1L;
	unsigned char *vEnd, *vAddrTemp, *vAddrTemp1; //pointers to chars
	bool vDoCheckU, vDoCheckL, vIsMatch;
	unsigned int vPosU, vPosL, vBinLenU, vBinLenL, vMax, vMin;
	vBinLenU = vSizeNeedleU + 1;
	vBinLenL = vSizeNeedleL + 1;
	vMin = vBinLenU < vBinLenL ? vBinLenU : vBinLenL;
	vMax = vBinLenU > vBinLenL ? vBinLenU : vBinLenL;
	vByte1U = pNeedleU[0];
	vByte1L = pNeedleL[0];
	vEnd = pHaystack + vSize;
	if (vStep > 0)
		vAddrTemp1 = pHaystack - vStep;
	else
		vAddrTemp1 = vEnd;
	vDoCheckU = 0; vDoCheckL = 0;
	vPosU = 0; vPosL = 0;
	vIsMatch = 0;
	for (int j = 1; j <= (vSize - vMin + 1); ++j)
	{
		vAddrTemp1 += vStep;
		if ((vAddrTemp1 < pHaystack) || (vAddrTemp1 >= vEnd))
			return 0xFFFFFFFF;
		vByte = *vAddrTemp1;
		if (vByte == vByte1U)
			vDoCheckU = 1;
		if (vByte == vByte1L)
			vDoCheckL = 1;
		if (!vDoCheckU && !vDoCheckL)
			continue;
		vPosU = 0; vPosL = 0;
		vAddrTemp = vAddrTemp1 - 1;
		for (int i = 1; i <= (vMax + 1); ++i)
		{
			vAddrTemp++;
			vByte = *vAddrTemp;
			if (vDoCheckU)
			{
				if (vByte == pNeedleU[vPosU])
				{
					vPosU++;
					if (vPosU == vSizeNeedleU)
					{
						vIsMatch = 1;
						break;
					}
					if (pBinU[vPosU])
					{
						vDoCheckL = 1;
						vPosL++;
						while (!pBinL[vPosL])
							vPosL++;
						continue;
					}
				}
				else
					vDoCheckU = 0;
			}
			if (vDoCheckL)
			{
				if (vByte == pNeedleL[vPosL])
				{
					vPosL++;
					if (vPosL == vSizeNeedleL)
					{
						vIsMatch = 1;
						break;
					}
					if (pBinL[vPosL])
					{
						vDoCheckU = 1;
						vPosU++;
						while (!pBinU[vPosU])
							vPosU++;
						continue;
					}
				}
				else
					vDoCheckL = 0;
			}
			if (!vDoCheckU && !vDoCheckL)
				break;
		}
		if (vIsMatch)
			break;
	}
	return vIsMatch ? (vAddrTemp1 - pHaystack) : 0xFFFFFFFF;
}
*/

;==================================================
I found out that I had to change 'bool' to '_Bool' to compile with TDM-GCC, as discussed in the thread linked below. I usually write and test my code in Visual Studio. Cheers.
C++: TDM-GCC: error with bool - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=23&t=59979