Of course, my mistake, fixed.
Search optimisation Topic is solved
-
- Posts: 4391
- Joined: 29 Mar 2015, 09:41
- Contact:
Re: Search optimisation
For memcmp the cdecl can be ignored since it is a int as return.
[AHK] v2.0.5 | [WIN] 11 Pro (Version 22H2) | [GitHub] Profile
Re: Search optimisation
That solves the issue for me.
-
- Posts: 4391
- Joined: 29 Mar 2015, 09:41
- Contact:
Re: Search optimisation
@pneumatic
Can you test this, please:
Can you test this, please:
Code: Select all
SetBatchLines, -1
filePath := "C:\Users\mi\AppData\Roaming\XnViewMP\Thumb.db"
bytes := "74 72 61"
size := ReadFile(filePath, buff)
len := CryptStringToBinary(bytes, needle, "CRYPT_STRING_HEX")
start := A_TickCount
offsets := SearchInBuff(&buff, size, &needle, len)
MsgBox,, % A_TickCount - start . "ms", % offsets
SearchInBuff(pHaystack, hstkSize, pNeedle, needleSize) {
arr := GetMax(pNeedle, needleSize)
maxByte := arr.max, offset := arr.maxIdx
prevByte := arr.prevMax, prevByteShift := arr.prevMaxIdx - offset
bool := needleSize > 1
found := "", addr := pHaystack + offset - 1, maxAddr := addr + hstkSize - needleSize + 1
while addr := DllCall("msvcrt\memchr", "Ptr", addr + 1, "Int", maxByte, "Ptr", maxAddr - addr, "Cdecl Ptr") {
( (bool ? NumGet(addr + prevByteShift, "UChar") = prevByte : true)
&& DllCall("msvcrt\memcmp", "Ptr", addr - offset, "Ptr", pNeedle, "Ptr", needleSize) = 0
&& found .= Format("{:#x}", addr - offset - pHaystack) . "`n" )
}
Return RTrim(found, "`n")
}
ReadFile(filePath, ByRef buff) {
File := FileOpen(filePath, "r")
File.Pos := 0
File.RawRead(buff, size := File.Length)
File.Close()
Return size
}
CryptStringToBinary(string, ByRef outData, formatName := "CRYPT_STRING_BASE64")
{
static formats := { CRYPT_STRING_BASE64: 0x1
, CRYPT_STRING_HEX: 0x4
, CRYPT_STRING_HEXRAW: 0xC }
fmt := formats[formatName]
chars := StrLen(string)
if !DllCall("Crypt32\CryptStringToBinary", "Str", string, "UInt", chars, "UInt", fmt
, "Ptr", 0, "UIntP", bytes, "UIntP", 0, "UIntP", 0)
throw "CryptStringToBinary failed. LastError: " . A_LastError
VarSetCapacity(outData, bytes)
DllCall("Crypt32\CryptStringToBinary", "Str", string, "UInt", chars, "UInt", fmt
, "Str", outData, "UIntP", bytes, "UIntP", 0, "UIntP", 0)
Return bytes
}
GetMax(pData, len) {
VarSetCapacity(sortedData, len, 0)
DllCall("RtlMoveMemory", "Ptr", &sortedData, "Ptr", pData, "Ptr", len)
pCmp := RegisterCallback("cmp", "C F", 2)
DllCall("msvcrt\qsort", "Ptr", &sortedData, "Ptr", len, "Ptr", 1, "Ptr", pCmp, "Cdecl")
max := NumGet(sortedData, "UChar"), (len > 1 && prevMax := NumGet(sortedData, 1, "UChar"))
Loop % len {
i := A_Index - 1
res := NumGet(pData + i, "UChar")
(res = max && maxIdx := i), (res = prevMax && prevMaxIdx := i)
}
Return {max: max, maxIdx: maxIdx, prevMax: prevMax, prevMaxIdx: prevMaxIdx}
}
cmp(a, b) {
a := NumGet(a + 0, "UChar"), b := NumGet(b + 0, "UChar")
Return a < b ? 1 : a > b ? -1 : 0
}
Re: Search optimisation
It only works if I put either Cdecl or Cdecl Ptr as final parameter on both memchr and memcmp in SearchInBuff (in your example it's only present on memchr).
Re: Search optimisation
Strange. (ReturnType: If the function returns a 32-bit signed integer (Int), BOOL, or nothing at all, ReturnType may be omitted.)
Than use
But it looks like it is needed
(If you omit Cdecl but the call yields ErrorLevel An -- where n is the total size of the arguments you passed -- Cdecl might be required.)
Than use
Code: Select all
while addr := DllCall("msvcrt\memchr", "Ptr", addr + 1, "Int", maxByte, "Ptr", maxAddr - addr, "Cdecl Ptr") {
( (bool ? NumGet(addr + prevByteShift, "UChar") = prevByte : true)
&& DllCall("msvcrt\memcmp", "Ptr", addr - offset, "Ptr", pNeedle, "Ptr", needleSize, "Cdecl Int") = 0
&& found .= Format("{:#x}", addr - offset - pHaystack) . "`n" )
}
But it looks like it is needed
(If you omit Cdecl but the call yields ErrorLevel An -- where n is the total size of the arguments you passed -- Cdecl might be required.)
[AHK] v2.0.5 | [WIN] 11 Pro (Version 22H2) | [GitHub] Profile
-
- Posts: 4391
- Joined: 29 Mar 2015, 09:41
- Contact:
Re: Search optimisation
Looks like I overread myself on "Cdecl ReturnType"
After 99.8% of all Windows 10 are on 64-bit, I haven't used 32-bit AutoHotkey for years.
And I don't see any reason why you should use 32-bit since Window 10.
But nvm. Im glad we solved it with "cdecl ptr" and "cdecl int"
After 99.8% of all Windows 10 are on 64-bit, I haven't used 32-bit AutoHotkey for years.
And I don't see any reason why you should use 32-bit since Window 10.
But nvm. Im glad we solved it with "cdecl ptr" and "cdecl int"
[AHK] v2.0.5 | [WIN] 11 Pro (Version 22H2) | [GitHub] Profile
Re: Search optimisation
There's challenge: to compile a 64-bit version of the mcode for even more performance:
https://autohotkey.com/board/topic/23627-machine-code-binary-buffer-searching-regardless-of-null/#entry152824
https://autohotkey.com/board/topic/23627-machine-code-binary-buffer-searching-regardless-of-null/#entry152824
Re: Search optimisation
Not everyone is running Windows 10, so if you write code for distribution to others, 32-bit AHK ensures compatibility over a wider user base. I'm not saying you should use it, but there are valid reasons why others do.
Re: Search optimisation
Oops! I didn't read your code fully... only the part I quoted. Great idea on searching the max value.teadrinker wrote: ↑06 May 2021, 05:51I assumed that the byte with max value is generally encountered less frequently. At least it can be non-zero, zero generally encountered very often.Thanks! I took previous byte by value:
I have written a machine code version of InStr(), which can deal with pointers and search past nulls.
Just wondering what to do with the CaseSensitive parameter?
I will tag you when I post the function. Thanks for sharing wonderful code.
Edit: 09-May-2021
Done! InBin() : Machine code variant of InStr(), for searching binary buffer.
Re: Search optimisation
99.8% of all third-party DLLs were written for 32-bit, I would say..
I have a powerful PC, but only for sake of Photoshop x64.
Almost all installed software are x64 except AutoHotkey.
I will default to x64 when V2 beta is available.
Re: Search optimisation
I don’t have time to look at converting InBuf to 64-bit at present. As far as I know the major changes needed would relate to the handling of the input parameters, which is different for x64.
This post has some information on using RegExMatch to search binary data. It is rather specialised but you may be able to get some ideas from it, depending on what you’re trying to achieve.
This post has some information on using RegExMatch to search binary data. It is rather specialised but you may be able to get some ideas from it, depending on what you’re trying to achieve.
My scripts:-
XRef - Produces Cross Reference lists for scripts
ReClip - A Text Reformatting and Clip Management utility
ScriptGuard - Protects Compiled Scripts from Decompilation
I also maintain Ahk2Exe
XRef - Produces Cross Reference lists for scripts
ReClip - A Text Reformatting and Clip Management utility
ScriptGuard - Protects Compiled Scripts from Decompilation
I also maintain Ahk2Exe
Re: Search optimisation
The mcode solution seems to be unusually slow when searching for needles containing 0's -- slower than the msvcrt solution.
I suspect it's due to teadrinker's optimisation of searching for the largest byte first, which doesn't seem to be implementable with the mcode, since you can't vary the size of its search address window (only the starting position to search the entire buffer thereafter).
edit: actually mcode is still slower even when the needle is just a single byte 0x00.
Also I think we could implement a 'search maxbyte first' solution in mcode, but I suspect performance may be destroyed by all the per-byte ahk code needed (whereas in mscvcrt solution, this work is done by memchr).
I suspect it's due to teadrinker's optimisation of searching for the largest byte first, which doesn't seem to be implementable with the mcode, since you can't vary the size of its search address window (only the starting position to search the entire buffer thereafter).
edit: actually mcode is still slower even when the needle is just a single byte 0x00.
Also I think we could implement a 'search maxbyte first' solution in mcode, but I suspect performance may be destroyed by all the per-byte ahk code needed (whereas in mscvcrt solution, this work is done by memchr).
Last edited by pneumatic on 06 May 2021, 22:31, edited 5 times in total.
Re: Search optimisation
A slight inconvenience with the msvcrt solution is that after finding a needle, it doesn't seem to advance the search offset by the length of the needle, so you can get overlapping duplicates for needles which require manual removal afterwards.
eg.
haystack: 01 04 0A 01 04 0A 01 04 FF FF
needle: 01 04 0A 01 04
matches: offset 0; offset 3.
Although there are probably circumstances where it's desirable to find overlaps as well.
A compromise might be to have a parameter to control it.
eg.
haystack: 01 04 0A 01 04 0A 01 04 FF FF
needle: 01 04 0A 01 04
matches: offset 0; offset 3.
Although there are probably circumstances where it's desirable to find overlaps as well.
A compromise might be to have a parameter to control it.
-
- Posts: 4391
- Joined: 29 Mar 2015, 09:41
- Contact:
Re: Search optimisation
Code: Select all
SearchInBuff(pHaystack, hstkSize, pNeedle, needleSize, overlap := false) {
arr := GetMax(pNeedle, needleSize)
maxByte := arr.max, offset := arr.maxIdx
prevByte := arr.prevMax, prevByteShift := arr.prevMaxIdx - offset
bool := needleSize > 1
found := "", addr := pHaystack + offset - 1, maxAddr := addr + hstkSize - needleSize + 1
while (maxAddr - addr > 0) && addr := DllCall("msvcrt\memchr", "Ptr", addr + 1, "Int", maxByte, "Ptr", maxAddr - addr, "Cdecl Ptr") {
( (bool ? NumGet(addr + prevByteShift, "UChar") = prevByte : true)
&& DllCall("msvcrt\memcmp", "Ptr", addr - offset, "Ptr", pNeedle, "Ptr", needleSize, "Cdecl") = 0
&& (found .= Format("{:#x}", addr - offset - pHaystack) . "`n")
&& (overlap ? true : addr += needleSize - 1) )
}
Return RTrim(found, "`n")
}
GetMax(pData, len) {
VarSetCapacity(sortedData, len, 0)
DllCall("RtlMoveMemory", "Ptr", &sortedData, "Ptr", pData, "Ptr", len)
pCmp := RegisterCallback("cmp", "C F", 2)
DllCall("msvcrt\qsort", "Ptr", &sortedData, "Ptr", len, "Ptr", 1, "Ptr", pCmp, "Cdecl")
max := NumGet(sortedData, "UChar"), (len > 1 && prevMax := NumGet(sortedData, 1, "UChar"))
Loop % len {
i := A_Index - 1
res := NumGet(pData + i, "UChar")
(res = max && maxIdx := i)
(res = prevMax && prevMaxIdx := i)
}
Return {max: max, maxIdx: maxIdx, prevMax: prevMax, prevMaxIdx: prevMaxIdx}
}
cmp(a, b) {
a := NumGet(a + 0, "UChar"), b := NumGet(b + 0, "UChar")
Return a < b ? 1 : a > b ? -1 : 0
}
Re: Search optimisation
@teadrinker
Just to note that your newer versions seems to contain more pre-code leading up to the msvcrt\memchr and msvcrt\memcmp , which may negate the performance gain when searching for many needles in a haystack (I'm searching for around 1000 of them )
edit: after comparing msvcrt vs mcode performance, it seems mcode is only around 4x faster on average with the needles I'm searching, which contain 0x00 in 30% of their bytes. However some of this is being obscured by the ahk code being executed in between. If I remove all the ahk code I get anywhere from mcode being 10x faster to 4x slower depending on the byte pattern.
edit2: for the msvcrt solution, I'd like to specify a starting position to search the haystack -- do I just need to add it to addr ?
i.e modifying only the following 2 lines of SearchMsvcrt():
SearchMsvcrt(pHaystack, hstkSize, pNeedle, needleSize, overlap := false, startaddr:=0)
found := [], addr := pHaystack + offset - 1 + startaddr, maxAddr := addr + hstkSize - needleSize + 1
Or will this break things elsewhere?
Just to note that your newer versions seems to contain more pre-code leading up to the msvcrt\memchr and msvcrt\memcmp , which may negate the performance gain when searching for many needles in a haystack (I'm searching for around 1000 of them )
edit: after comparing msvcrt vs mcode performance, it seems mcode is only around 4x faster on average with the needles I'm searching, which contain 0x00 in 30% of their bytes. However some of this is being obscured by the ahk code being executed in between. If I remove all the ahk code I get anywhere from mcode being 10x faster to 4x slower depending on the byte pattern.
edit2: for the msvcrt solution, I'd like to specify a starting position to search the haystack -- do I just need to add it to addr ?
i.e modifying only the following 2 lines of SearchMsvcrt():
SearchMsvcrt(pHaystack, hstkSize, pNeedle, needleSize, overlap := false, startaddr:=0)
found := [], addr := pHaystack + offset - 1 + startaddr, maxAddr := addr + hstkSize - needleSize + 1
Or will this break things elsewhere?
-
- Posts: 4391
- Joined: 29 Mar 2015, 09:41
- Contact:
Re: Search optimisation
You need to increase pHaystack and decrease hstkSize. I'd do it before calling the function.
Who is online
Users browsing this forum: peter_ahk and 140 guests