using mcode to speed up array search

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
colt
Posts: 291
Joined: 04 Aug 2014, 23:12
Location: Portland Oregon

using mcode to speed up array search

Post by colt » 17 Sep 2021, 10:46

I have an array that is a cache of all pdf files on system. There is a function that searches this array for a fuzzy match and returns an array of all matches. This is called many of times in my main program on launch and if I could speed it up would greatly improve performance. I am planning on transferring the search function to mcode, and am looking for tips on how to best implement it.

Specifically, can I pass a pointer to the native ahk array and navigate it in c? I tried making a simple mcode function to explore the memory after the array pointer but didn't see anything recognizable.

Code: Select all

out := ""
test := [1,2,3,4,5,6,7,8,9]

MyFunction := MCode("2,x86:VYnlg+wEi0UMiEX8D75V/ItFCAHQD7YAycOQkA==")

loop 200
{
	raw := DllCall(MyFunction,"ptr",&test ,"int",A_index-1, "cdecl")
	out .= chr(raw) . "  " . raw . "`n"
}

clipboard := out
msgbox done

/*
char explore(char *arrayPtr,char offset)
{
	return *(arrayPtr+offset);
}
*/

MCode(mcode)
{
  static e := {1:4, 2:1}, c := (A_PtrSize=8) ? "x64" : "x86"
  if (!regexmatch(mcode, "^([0-9]+),(" c ":|.*?," c ":)([^,]+)", m))
    return
  if (!DllCall("crypt32\CryptStringToBinary", "str", m3, "uint", 0, "uint", e[m1], "ptr", 0, "uint*", s, "ptr", 0, "ptr", 0))
    return
  p := DllCall("GlobalAlloc", "uint", 0, "ptr", s, "ptr")
  if (c="x64")
    DllCall("VirtualProtect", "ptr", p, "ptr", s, "uint", 0x40, "uint*", op)
  if (DllCall("crypt32\CryptStringToBinary", "str", m3, "uint", 0, "uint", e[m1], "ptr", p, "uint*", s, "ptr", 0, "ptr", 0))
    return p
  DllCall("GlobalFree", "ptr", p)
}
My second thought was to convert the cache to a giant string and pass that pointer to the mcode.

Thanks for any tips.
User avatar
Spawnova
Posts: 554
Joined: 08 Jul 2015, 00:12
Contact:

Re: using mcode to speed up array search

Post by Spawnova » 18 Sep 2021, 05:40

How many pdfs do you have? AutoHotKey should still be plenty fast enough to do what you want

Here's an example script which parses over 100k pdfs, for me it takes between 15-30ms to check 100k, 0-15ms on 50k and below

Code: Select all

#NoEnv  ; Recommended for performance and compatibility with future AutoHotkey releases.
; #Warn  ; Enable warnings to assist with detecting common errors.
SendMode Input  ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir%  ; Ensures a consistent starting directory.
setbatchlines,-1  ;improves speed, but increases cpu usage during loops, if doing long loops consider sleeping

pdf := []
loop 100000 ;100k random pdfs
	pdf.push(GetRandomString(pdf))

loop 1000 ; 1000 target pdfs
	pdf.push("zFile" a_index ".pdf")


start := a_tickcount
a := GetExactMatches(pdf,"zFile1") ;match exactly the string only
msgbox % "Took " (a_tickcount - start) " ms`nMatches: " a.length()

start := a_tickcount
a := GetFuzzyMatches(pdf,"zFile\d+") ;match zFile and any digits after it only
msgbox % "Took " (a_tickcount - start) " ms`nMatches: " a.length()


start := a_tickcount
a := GetFuzzyMatches(pdf,"i)abc") ;match anything that has abc in it, case insensitive
msgbox % "Took " (a_tickcount - start) " ms`nMatches: " a.length()

str := ""
loop % a.length() {
	str .= "Match " a_index ": " pdf[a[a_index]] "`n"
	if (a_index = 10) ;only show up to 10 results
		break
}
msgbox % str
exitapp



GetExactMatches(pdf,needle) {
	a := []
	for k,v in pdf
		if (v = needle)
			a.push(k)
	return a
}

GetFuzzyMatches(pdf,needle) {
	a := []
	for k,v in pdf
		if (RegExMatch(v,needle))
			a.push(k)
	return a
}

GetRandomString(pdf) {
	str := a_workingdir "\"
	loop % random(8,20) {   ;make a random string from 8 to 20 length
		str .= chr(random(65,90) + (random(0,1) ? 32 : 0))
	}
	return str ".pdf"
}

Random(min,max:=100) {
	random,result,% min,% max
	return result
}
swagfag
Posts: 6222
Joined: 11 Jan 2017, 17:59

Re: using mcode to speed up array search

Post by swagfag » 18 Sep 2021, 12:29

colt wrote:
17 Sep 2021, 10:46
can I pass a pointer to the native ahk array and navigate it in c?
yes, but u have to pass the pointer to where the elements are actually stored(which is not the address of the array itself, but mFields see viewtopic.php?p=409399#p409399)
I tried making a simple mcode function to explore the memory after the array pointer but didn't see anything recognizable.
u dont need mcode for that, u can use NumGet. u wont see anything recognizable, unless u already know what to look for
colt
Posts: 291
Joined: 04 Aug 2014, 23:12
Location: Portland Oregon

Re: using mcode to speed up array search

Post by colt » 24 Sep 2021, 15:02

@Spawnova Thanks for the benchmark. I get similar times on my machine too. So in reality I have multiple cache structures that are arrays of full paths of each file type. I then call getFromCache and it decides which array to search. On real world data my getFromCache function gets called 3500 times each time my program is launched and spends approx 4 seconds inside this function or 1/3 my script launch time. Maybe converting to m-code would not produce much gain, and instead I should focus on constructing more searchable list arrays. I was hoping to just convert to mcode and have it magically reduce launch time.

edit
17000 .pdf
4000 .sldasm
5000 .sldprt

Code: Select all

getFromCache(searchTerm,fuzzyExtension := "",getFirstHitOnly := false)
{	
	global	
	
	timeIn := A_TickCount
	result := []
	;soundBeep 1000,1
	cacheStructure := {}
	if(instr(searchTerm,".PDF")||(fuzzyExtension = ".PDF"))
	{
		cacheStructure := "pdfCache"		
	}
	else if(instr(searchTerm,".SLDASM")||(fuzzyExtension = ".SLDASM"))
	{
		cacheStructure := "asmCache"
	}
	else if(instr(searchTerm,".SLDPRT")||(fuzzyExtension = ".SLDPRT"))
	{
		cacheStructure := "prtCache"
	}
	else
	{
		Msgbox %fuzzyExtension% extension not supported.`n Only .PDF .SLDASM .SLDPRT are supported.
	}
	if(getFirstHitOnly)
	{
		firstHitOnlyCount++
	}
	if(fuzzyExtension) ;doing fuzzy search so it will take inefficient search
	{
		fuzzySearch++
		for item in %cacheStructure%
		{
			if(instr(item , searchTerm))
			{
				if(fileExist(item))
				{
					result.push(item)
					if(getFirstHitOnly)
					{
						break
					}
				}
			}
		}
	}
	else ;fast search using some assumptions
	{		
		fastSearch++
		lenOffset := -1*(strLen(searchTerm)-1)
		for item in %cacheStructure%
		{
			;msgbox % searchTerm . "`n" . subStr(item,-(strLen(searchTerm)-1))
			;if(regexMatch(item,"(?i)" . searchTerm . "$"))
			if(subStr(item,lenOffset) = searchTerm)			
			{
				if(fileExist(item))
				{
					result.push(item)
					if(getFirstHitOnly)
					{
						break
					}
				}
			}
		}
	}
	totalTime += A_TickCount - timeIn
	return result
}
resulting in

Code: Select all

msgbox % "totalTime : " . totalTime . "`nfastSearch : " . fastSearch . "`nfuzzySearch : " . fuzzySearch . "`nfirstHitOnlyCount : " . firstHitOnlyCount

totalTime : 4202
fastSearch : 3468
fuzzySearch : 0
firstHitOnlyCount : 3277
@swagfag thanks for the link. it will give me a start.
Post Reply

Return to “Ask for Help (v1)”