Search optimisation Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
teadrinker
Posts: 4309
Joined: 29 Mar 2015, 09:41
Contact:

Re: Search optimisation

06 May 2021, 08:45

jNizM wrote: Maybe at least a "ptr" after memchr
Of course, my mistake, fixed.
User avatar
jNizM
Posts: 3183
Joined: 30 Sep 2013, 01:33
Contact:

Re: Search optimisation

06 May 2021, 08:56

For memcmp the cdecl can be ignored since it is a int as return.
[AHK] v2.0.5 | [WIN] 11 Pro (Version 22H2) | [GitHub] Profile
pneumatic
Posts: 338
Joined: 05 Dec 2016, 01:51

Re: Search optimisation

06 May 2021, 08:57

teadrinker wrote:
06 May 2021, 08:30
Perhaps, "Cdecl" must be added to all msvcrt functions:
That solves the issue for me.
teadrinker
Posts: 4309
Joined: 29 Mar 2015, 09:41
Contact:

Re: Search optimisation

06 May 2021, 09:00

@pneumatic
Can you test this, please:

Code: Select all

SetBatchLines, -1

filePath := "C:\Users\mi\AppData\Roaming\XnViewMP\Thumb.db"
bytes := "74 72 61"

size := ReadFile(filePath, buff)
len := CryptStringToBinary(bytes, needle, "CRYPT_STRING_HEX")
start := A_TickCount
offsets := SearchInBuff(&buff, size, &needle, len)
MsgBox,, % A_TickCount - start . "ms", % offsets

SearchInBuff(pHaystack, hstkSize, pNeedle, needleSize) {
   arr := GetMax(pNeedle, needleSize)
   maxByte := arr.max, offset := arr.maxIdx
   prevByte := arr.prevMax, prevByteShift := arr.prevMaxIdx - offset
   bool := needleSize > 1
   found := "", addr := pHaystack + offset - 1, maxAddr := addr + hstkSize - needleSize + 1
   while addr := DllCall("msvcrt\memchr", "Ptr", addr + 1, "Int", maxByte, "Ptr", maxAddr - addr, "Cdecl Ptr") {
      ( (bool ? NumGet(addr + prevByteShift, "UChar") = prevByte : true)
        && DllCall("msvcrt\memcmp", "Ptr", addr - offset, "Ptr", pNeedle, "Ptr", needleSize) = 0
        && found .= Format("{:#x}", addr - offset - pHaystack) . "`n" )
   }
   Return RTrim(found, "`n")
}

ReadFile(filePath, ByRef buff) {
   File := FileOpen(filePath, "r")
   File.Pos := 0
   File.RawRead(buff, size := File.Length)
   File.Close()
   Return size
}

CryptStringToBinary(string, ByRef outData, formatName := "CRYPT_STRING_BASE64")
{
   static formats := { CRYPT_STRING_BASE64: 0x1
                     , CRYPT_STRING_HEX:    0x4
                     , CRYPT_STRING_HEXRAW: 0xC }
   fmt := formats[formatName]
   chars := StrLen(string)
   if !DllCall("Crypt32\CryptStringToBinary", "Str", string, "UInt", chars, "UInt", fmt
                                            , "Ptr", 0, "UIntP", bytes, "UIntP", 0, "UIntP", 0)
      throw "CryptStringToBinary failed. LastError: " . A_LastError
   VarSetCapacity(outData, bytes)
   DllCall("Crypt32\CryptStringToBinary", "Str", string, "UInt", chars, "UInt", fmt
                                        , "Str", outData, "UIntP", bytes, "UIntP", 0, "UIntP", 0)
   Return bytes
}

GetMax(pData, len) {
   VarSetCapacity(sortedData, len, 0)
   DllCall("RtlMoveMemory", "Ptr", &sortedData, "Ptr", pData, "Ptr", len)
   pCmp := RegisterCallback("cmp", "C F", 2)
   DllCall("msvcrt\qsort", "Ptr", &sortedData, "Ptr", len, "Ptr", 1, "Ptr", pCmp, "Cdecl")
   max := NumGet(sortedData, "UChar"), (len > 1 && prevMax := NumGet(sortedData, 1, "UChar"))
   Loop % len {
      i := A_Index - 1
      res := NumGet(pData + i, "UChar")
      (res = max && maxIdx := i), (res = prevMax && prevMaxIdx := i)
   }
   Return {max: max, maxIdx: maxIdx, prevMax: prevMax, prevMaxIdx: prevMaxIdx}
}

cmp(a, b) {
   a := NumGet(a + 0, "UChar"), b := NumGet(b + 0, "UChar")
   Return a < b ? 1 : a > b ? -1 : 0
}
pneumatic
Posts: 338
Joined: 05 Dec 2016, 01:51

Re: Search optimisation

06 May 2021, 09:07

teadrinker wrote:
06 May 2021, 09:00
@pneumatic
Can you test this, please
It only works if I put either Cdecl or Cdecl Ptr as final parameter on both memchr and memcmp in SearchInBuff (in your example it's only present on memchr).
User avatar
jNizM
Posts: 3183
Joined: 30 Sep 2013, 01:33
Contact:

Re: Search optimisation

06 May 2021, 09:09

Strange. (ReturnType: If the function returns a 32-bit signed integer (Int), BOOL, or nothing at all, ReturnType may be omitted.)

Than use

Code: Select all

   while addr := DllCall("msvcrt\memchr", "Ptr", addr + 1, "Int", maxByte, "Ptr", maxAddr - addr, "Cdecl Ptr") {
      ( (bool ? NumGet(addr + prevByteShift, "UChar") = prevByte : true)
        && DllCall("msvcrt\memcmp", "Ptr", addr - offset, "Ptr", pNeedle, "Ptr", needleSize, "Cdecl Int") = 0
        && found .= Format("{:#x}", addr - offset - pHaystack) . "`n" )
   }

But it looks like it is needed
(If you omit Cdecl but the call yields ErrorLevel An -- where n is the total size of the arguments you passed -- Cdecl might be required.)
[AHK] v2.0.5 | [WIN] 11 Pro (Version 22H2) | [GitHub] Profile
teadrinker
Posts: 4309
Joined: 29 Mar 2015, 09:41
Contact:

Re: Search optimisation

06 May 2021, 09:14

jNizM wrote: ReturnType may be omitted
But "Cdecl" is not "ReturnType". (?)
User avatar
jNizM
Posts: 3183
Joined: 30 Sep 2013, 01:33
Contact:

Re: Search optimisation

06 May 2021, 09:19

Looks like I overread myself on "Cdecl ReturnType"

After 99.8% of all Windows 10 are on 64-bit, I haven't used 32-bit AutoHotkey for years.
And I don't see any reason why you should use 32-bit since Window 10. 🤷‍♂️

But nvm. Im glad we solved it with "cdecl ptr" and "cdecl int"
[AHK] v2.0.5 | [WIN] 11 Pro (Version 22H2) | [GitHub] Profile
pneumatic
Posts: 338
Joined: 05 Dec 2016, 01:51

Re: Search optimisation

06 May 2021, 09:23

jNizM wrote:
06 May 2021, 09:19
After 99.8% of all Windows 10 are on 64-bit, I haven't used 32-bit AutoHotkey for years.
And I don't see any reason why you should use 32-bit since Window 10. 🤷‍♂️
For compatibility with mcode.
pneumatic
Posts: 338
Joined: 05 Dec 2016, 01:51

Re: Search optimisation

06 May 2021, 09:55

There's challenge: to compile a 64-bit version of the mcode for even more performance:
https://autohotkey.com/board/topic/23627-machine-code-binary-buffer-searching-regardless-of-null/#entry152824
User avatar
boiler
Posts: 16769
Joined: 21 Dec 2014, 02:44

Re: Search optimisation

06 May 2021, 10:03

jNizM wrote:
06 May 2021, 09:19
After 99.8% of all Windows 10 are on 64-bit, I haven't used 32-bit AutoHotkey for years.
And I don't see any reason why you should use 32-bit since Window 10. 🤷‍♂️
Not everyone is running Windows 10, so if you write code for distribution to others, 32-bit AHK ensures compatibility over a wider user base. I'm not saying you should use it, but there are valid reasons why others do.
User avatar
SKAN
Posts: 1551
Joined: 29 Sep 2013, 16:58

Re: Search optimisation

06 May 2021, 13:20

teadrinker wrote:
06 May 2021, 05:51
I assumed that the byte with max value is generally encountered less frequently. At least it can be non-zero, zero generally encountered very often.
SKAN wrote: A suggestion: When memchr() succeeds, check whether the last byte of needle is a match
Thanks! I took previous byte by value:
Oops! I didn't read your code fully... only the part I quoted. Great idea on searching the max value.
I have written a machine code version of InStr(), which can deal with pointers and search past nulls.
Just wondering what to do with the CaseSensitive parameter? :roll:

I will tag you when I post the function. Thanks for sharing wonderful code.

Edit: 09-May-2021
Done!   InBin() : Machine code variant of InStr(), for searching binary buffer.
User avatar
SKAN
Posts: 1551
Joined: 29 Sep 2013, 16:58

Re: Search optimisation

06 May 2021, 13:30

jNizM wrote:
06 May 2021, 09:19
99.8% of all Windows 10 are on 64-bit
99.8% of all third-party DLLs were written for 32-bit, I would say..
I have a powerful PC, but only for sake of Photoshop x64.
Almost all installed software are x64 except AutoHotkey.
I will default to x64 when V2 beta is available.
TAC109
Posts: 1098
Joined: 02 Oct 2013, 19:41
Location: New Zealand

Re: Search optimisation

06 May 2021, 17:56

I don’t have time to look at converting InBuf to 64-bit at present. As far as I know the major changes needed would relate to the handling of the input parameters, which is different for x64.

This post has some information on using RegExMatch to search binary data. It is rather specialised but you may be able to get some ideas from it, depending on what you’re trying to achieve.
My scripts:-
XRef - Produces Cross Reference lists for scripts
ReClip - A Text Reformatting and Clip Management utility
ScriptGuard - Protects Compiled Scripts from Decompilation
I also maintain Ahk2Exe
pneumatic
Posts: 338
Joined: 05 Dec 2016, 01:51

Re: Search optimisation

06 May 2021, 21:12

The mcode solution seems to be unusually slow when searching for needles containing 0's -- slower than the msvcrt solution.

I suspect it's due to teadrinker's optimisation of searching for the largest byte first, which doesn't seem to be implementable with the mcode, since you can't vary the size of its search address window (only the starting position to search the entire buffer thereafter).

edit: actually mcode is still slower even when the needle is just a single byte 0x00.

Also I think we could implement a 'search maxbyte first' solution in mcode, but I suspect performance may be destroyed by all the per-byte ahk code needed (whereas in mscvcrt solution, this work is done by memchr).
Last edited by pneumatic on 06 May 2021, 22:31, edited 5 times in total.
pneumatic
Posts: 338
Joined: 05 Dec 2016, 01:51

Re: Search optimisation

06 May 2021, 21:26

A slight inconvenience with the msvcrt solution is that after finding a needle, it doesn't seem to advance the search offset by the length of the needle, so you can get overlapping duplicates for needles which require manual removal afterwards.

eg.
haystack: 01 04 0A 01 04 0A 01 04 FF FF
needle: 01 04 0A 01 04
matches: offset 0; offset 3.

Although there are probably circumstances where it's desirable to find overlaps as well.

A compromise might be to have a parameter to control it.
teadrinker
Posts: 4309
Joined: 29 Mar 2015, 09:41
Contact:

Re: Search optimisation

07 May 2021, 03:06

pneumatic wrote: A compromise might be to have a parameter to control it.

Code: Select all

SearchInBuff(pHaystack, hstkSize, pNeedle, needleSize, overlap := false) {
   arr := GetMax(pNeedle, needleSize)
   maxByte := arr.max, offset := arr.maxIdx
   prevByte := arr.prevMax, prevByteShift := arr.prevMaxIdx - offset
   bool := needleSize > 1
   found := "", addr := pHaystack + offset - 1, maxAddr := addr + hstkSize - needleSize + 1
   while (maxAddr - addr > 0) && addr := DllCall("msvcrt\memchr", "Ptr", addr + 1, "Int", maxByte, "Ptr", maxAddr - addr, "Cdecl Ptr") {
      ( (bool ? NumGet(addr + prevByteShift, "UChar") = prevByte : true)
        && DllCall("msvcrt\memcmp", "Ptr", addr - offset, "Ptr", pNeedle, "Ptr", needleSize, "Cdecl") = 0
        && (found .= Format("{:#x}", addr - offset - pHaystack) . "`n")
        && (overlap ? true : addr += needleSize - 1) )
   }
   Return RTrim(found, "`n")
}

GetMax(pData, len) {
   VarSetCapacity(sortedData, len, 0)
   DllCall("RtlMoveMemory", "Ptr", &sortedData, "Ptr", pData, "Ptr", len)
   pCmp := RegisterCallback("cmp", "C F", 2)
   DllCall("msvcrt\qsort", "Ptr", &sortedData, "Ptr", len, "Ptr", 1, "Ptr", pCmp, "Cdecl")
   max := NumGet(sortedData, "UChar"), (len > 1 && prevMax := NumGet(sortedData, 1, "UChar"))
   Loop % len {
      i := A_Index - 1
      res := NumGet(pData + i, "UChar")
     (res = max && maxIdx := i)
     (res = prevMax && prevMaxIdx := i)
   }
   Return {max: max, maxIdx: maxIdx, prevMax: prevMax, prevMaxIdx: prevMaxIdx}
}

cmp(a, b) {
   a := NumGet(a + 0, "UChar"), b := NumGet(b + 0, "UChar")
   Return a < b ? 1 : a > b ? -1 : 0
}
pneumatic
Posts: 338
Joined: 05 Dec 2016, 01:51

Re: Search optimisation

08 May 2021, 02:50

@teadrinker

Just to note that your newer versions seems to contain more pre-code leading up to the msvcrt\memchr and msvcrt\memcmp , which may negate the performance gain when searching for many needles in a haystack (I'm searching for around 1000 of them :) )

edit: after comparing msvcrt vs mcode performance, it seems mcode is only around 4x faster on average with the needles I'm searching, which contain 0x00 in 30% of their bytes. However some of this is being obscured by the ahk code being executed in between. If I remove all the ahk code I get anywhere from mcode being 10x faster to 4x slower depending on the byte pattern.

edit2: for the msvcrt solution, I'd like to specify a starting position to search the haystack -- do I just need to add it to addr ?

i.e modifying only the following 2 lines of SearchMsvcrt():
SearchMsvcrt(pHaystack, hstkSize, pNeedle, needleSize, overlap := false, startaddr:=0)
found := [], addr := pHaystack + offset - 1 + startaddr, maxAddr := addr + hstkSize - needleSize + 1


Or will this break things elsewhere?
teadrinker
Posts: 4309
Joined: 29 Mar 2015, 09:41
Contact:

Re: Search optimisation

08 May 2021, 05:45

You need to increase pHaystack and decrease hstkSize. I'd do it before calling the function.

Return to “Ask for Help (v1)”

Who is online

Users browsing this forum: FanaticGuru and 136 guests