AHK v2: converting/optimizing scripts Topic is solved

Get help with using AutoHotkey (v2 or newer) and its commands and hotkeys
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

20 Dec 2018, 14:38

Proof of concept :D I also reduced the mcode to 47 bytes of binary and added automatic alignment of the args structure to a 16-byte boundary: this.a:=(c+n+15)&~15
Updated the mcode class to v1.1.

class mcode{ ;v1.1 by vvhitevvizard
;0x8000=MEM_RELEASE
	static __Delete:=(this)=>DllCall("VirtualFree", 'ptr',this.c, 'uint',0, 'uint',0x8000)
	__New(_s){
;VirtualAlloc reserves and commits memory in the virtual address space of the calling process
;1st arg=0 lets the system pick the address; size=1 byte is rounded up to one page (enough for mcodes <4096 bytes)
;Allocated memory is auto-initialized to zero
;0x3000=MEM_COMMIT | MEM_RESERVE, 0x40=PAGE_EXECUTE_READWRITE
		c:=this.c:=DllCall("VirtualAlloc", 'ptr',0, 'uint',1, 'uint',0x3000, 'uint',0x40, "PTR")
;0=zero terminated, 1=Base64 w/o hdr, n (out) contains the mcode's binary size
		DllCall("crypt32\CryptStringToBinary", 'str',_s, 'uint',0, 'uint',1, 'ptr',c
			, 'uintp',n:=4096, 'ptr',0, 'ptr',0)
			|| (this.__Delete(),this.c:=0) ;cleanup if Base64 decoding FAILS
		this.a:=(c+n+15)&~15 ;addr of binary struct for passed args aligned at 16
	}
}

needle:="fly", haystack:="The quick brown fox jumps over the lazy dog."
strIn:=new mcode("SIsNSQAAAEyLFUoAAABIichmRIsAZkWFwHQdTInSZkSLCmZFOch0EkiDwgJmRYXJdexIg8AC69kxwEgpyDHSSNH4/8APSMLD")
NumPut(&needle, (NumPut(&haystack, strIn.a)))
msgbox(DllCall(strIn.c))
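
The alignment expression (c+n+15)&~15 rounds an address up to the next multiple of 16. A minimal C sketch of the same arithmetic follows; the names align16 and args_addr are hypothetical and only illustrate the formula, they are not part of the mcode class.

```c
#include <stdint.h>

/* Round addr up to the next multiple of 16.
   An address that is already aligned is returned unchanged. */
uintptr_t align16(uintptr_t addr) {
    return (addr + 15u) & ~(uintptr_t)15u;
}

/* Hypothetical helper mirroring this.a := (c+n+15)&~15:
   the args structure starts at the first aligned address
   after code_size bytes of machine code. */
uintptr_t args_addr(uintptr_t code_base, uint32_t code_size) {
    return align16(code_base + code_size);
}
```

With a page-aligned base and the 47-byte mcode from this post, the args structure would land 48 bytes after the start of the code.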
Last edited by vvhitevvizard on 20 Dec 2018, 23:48, edited 1 time in total.
oif2003
Posts: 214
Joined: 17 Oct 2018, 11:43
Contact:

Re: AHK v2: converting/optimizing scripts

20 Dec 2018, 14:52

vvhitevvizard wrote:
20 Dec 2018, 14:38
Nice work! You can further improve speed by storing the addresses in variables instead of objects!
Spoiler
I think your latest is about 2x as fast as the previous :D
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

20 Dec 2018, 15:02

oif2003 wrote:
You can further improve speed by storing the addresses in variables instead of objects!
Yup, I always do that for inner loops. :D

mc := strIn.c
ma := strIn.a
t := A_TickCount
loop n
{ ;braces are needed so that both lines run inside the loop
	NumPut(&needle, (NumPut(&haystack, ma)))
	DllCall(mc)
}
msgbox(A_tickcount - t)
This one takes 126 ms vs. 312 ms for the previous method: about 2.5 times as fast!

Btw, I didn't optimize the return value. It can be passed through the binary structure as well. And we could pass the address of MessageBoxA to the mcode for debugging.
Last edited by vvhitevvizard on 20 Dec 2018, 15:14, edited 1 time in total.
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

20 Dec 2018, 15:06

Spoiler
And the C code. I see it can actually be optimized, for code size at least. We have redundant math here: the 2 inner subfunctions calculate an address, while the main function then tries to make an offset out of it.
Why not calculate the offset everywhere? :) And the whole C source split into 3 functions looks terrible.
Spoiler
In the assembly, everything from label L3 onward is the calculation of that offset. It actually takes 1/4 of the whole code size.
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

20 Dec 2018, 16:00

It never ends. ;) I replaced int with unsigned int and turned it into pure address arithmetic:

unsigned int match(){
	const wchar_t* r=wcspbrk2(h,n);
	return r ? r-h+1 : 0;
}
The L3 part became simple :) -4 bytes

L3:
	sub	rax, rcx
	sar	eax, 1 ;byte offset/2 = token offset
	inc	eax ;starting from 1
	ret
"SIsNSQAAAEyLFUoAAABIichmRIsAZkWFwHQdTInSZkSLCmZFOch0E0iDwgJmRYXJdexIg8AC69kxwMNIKchI0fj/wMM="

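The match() function above turns the pointer returned by the search into a 1-based position. A self-contained C sketch of that idea is shown here, using the standard wcspbrk() in place of the thread's wcspbrk2 helper (whose source is not shown, so treating the two as equivalent is an assumption). The name match_pos is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* 1-based position of the first character of h that also occurs in set,
   or 0 if there is none. */
unsigned int match_pos(const wchar_t *h, const wchar_t *set) {
    const wchar_t *r = wcspbrk(h, set);
    /* Pointer subtraction yields a count of characters, not bytes;
       in the assembly this is the "sub" followed by the shift right by 1. */
    return r ? (unsigned int)(r - h) + 1u : 0u;
}
```

For the test strings from the first post, the first of the characters f, l, y in the haystack is the f of "fox".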
2.
Btw, the lowest-level memory allocation function available in user space is ZwAllocateVirtualMemory(). It should be the fastest.
https://msdn.microsoft.com/en-us/librar ... s.85).aspx

GlobalFree needs to be replaced with VirtualFree just in case, though both accept just the starting address of a memory region.

And I added a regular destructor to free the allocated memory when the last reference to the instance is deleted (e.g. when we reassign the same var with a new mcode instance), like:
strIn:=new mcode(s) ;this memory is auto-freed on next line
strIn:=new mcode(s)
updated full script above to v1.1

3.
The mcode class should reserve a 2nd memory region at addr+0x1000 for data (read/write). This way all variables simply live at address+0x1000. No relocation tables: we still use a flat model and relative addresses inside the mcode.
And the 1st region, holding the mcode itself, should be given the execute-only attribute.

But I'm thinking of adding collection functionality to mcode, since allocating 4K chunks just for small snippets is suboptimal.
Last edited by vvhitevvizard on 20 Dec 2018, 23:53, edited 3 times in total.
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

20 Dec 2018, 23:45

OK. A regular a:=DllCall(f) is faster than a:=NumGet(f, 16), DllCall(f). We return uint64 (default) here.
0.000125 ms vs 0.000156 ms, i.e. 20% less time, so there is no point in changing it.
Spoiler
.

2.
The asm code is now optimized for performance AND size (another -3 lines, -5 bytes):
I exploited the idea of using an offset instead of a pointer (address math), e.g. the mov cx, [r11+rdx*2] addressing mode (here r11 = string base, rdx = offset, *2 = sizeof(wchar)). So I simply increment the offset and don't change the base address at all.
The length of both strings is restricted to 4GB; doing so saved a few bytes thanks to shorter instructions :)
s:=("TIsVOQAAAE0xwEyLHTcAAABmQ4sMQkyJwGaFyXQdTTHJZkOLFEtmhdJ0C0n/wWY50XXu/8DDSf/A69YxwMM=")
I'm not sure about the naming convention. Let it be InStrS.

The working InStrS is only 13% slower than a DllCall of an _empty_ mcode, so it's hard to measure any perf changes.
I had to increase the length of both haystack and needle to see at least a rough difference. Now it should return 107 (the position of f):
needle:="ыфвапролджейцук-01234567890@#$%^*()!lfy"
haystack:="..........................................................................................The quick brown fox jumps over the lazy dog."

source for .asm
The next optimization step would be trying to get rid of conditional branching (je, jne). Maybe later. I'm not sure this is even possible for string processing.
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

21 Dec 2018, 04:13

An addition to InStrS: InStrC, an analogue of InStr(string, char,1), aka strchr().
The 2nd arg could be ushort-sized (the wchar itself, not a pointer), but I pass it at ptr size for the convenience of skipping the default params of NumPut/NumGet (it doesn't affect perf). Usage:

InStrC:=new mcode("TIsFKQAAAE0xyWaLFScAAABmQ4sMSEyJyGaFyXQLSf/BZjnRdev/wMMxwMM=")
NumPut(Ord('f'), (NumPut(&haystack, InStrC.a)))
msgbox(DllCall(InStrC.c))
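
For readers who don't want to decode the Base64 blob: a plain C sketch of what an InStrC-style search computes, written with a fixed base pointer and a running index in the spirit of the [base+index*2] addressing described earlier. This is an illustration only, not the source the mcode was built from; the name instr_c is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* 1-based position of ch in the zero-terminated string s, or 0 if absent.
   The base pointer s stays fixed; only the index i advances. */
size_t instr_c(const wchar_t *s, wchar_t ch) {
    for (size_t i = 0; s[i] != L'\0'; ++i) {
        if (s[i] == ch)
            return i + 1;
    }
    return 0;
}
```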
2.
I tested a custom mcode that could replace the ~= "[\]\},\s]|$" regex in json.Get():

;v1.1
global InStrS1:=new mcode("U0yLFWgAAABFMcBJuwAmAAABEAAAZkOLDEJEicBmhcl0O2aD+VuJyg+Uw4Pi"
	. "32aD+l2yAUEPlMFmg/ksdxBMidpI0+qD4gFIg/IBg+IBQQnZSf/ARDjKd7z/wOsCMcBbww==")
and found that we have a huge issue here (one that seems to have been there since the original version):
(d:=SubStr(_s, n) ~= "[\]\},\s]|$",v:=SubStr(_s, n, d-1), n+=d-2
This line is fine for the small json we used for testing, but for big ones it's a disaster. The SubStr call without a 3rd parameter copies everything from the current position to the end of the string into a new string, and it repeats that redundant work for every number-type token found in the json. Now imagine a 1MB json :D

So replacing that line with : (NumPut(&_s+n*2,ma), d:=DllCall(mc), v:=SubStr(_s, n, d), n+=d-1 gives some boost. For the "compact" test string the boost is +2%, for the 14KB real-world json it's +13%. Not a wow effect, but we replaced only a small part of the inner loop.
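
The point of the replacement is that the scan runs in place, from the current offset, so no tail of the string is copied. A minimal C sketch of such an in-place scan follows; it is not the author's mcode, the name number_token_len is hypothetical, and it only checks the four whitespace characters the parser tests elsewhere in this thread.

```c
#include <stddef.h>
#include <wchar.h>

/* Length of the number token that starts at s[start] (0-based).
   Scans until ',', ']', '}', a whitespace character or the end of
   the string. Nothing is copied; start must not exceed the length. */
size_t number_token_len(const wchar_t *s, size_t start) {
    size_t i = start;
    while (s[i] != L'\0' && s[i] != L',' && s[i] != L']' && s[i] != L'}'
           && s[i] != L' ' && s[i] != L'\t' && s[i] != L'\n' && s[i] != L'\r')
        ++i;
    return i - start;
}
```

The cost per token is proportional to the token length instead of the remaining length of the whole document.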

EDIT: I ended up just adding the related mcode initialization inside json.Get

	Get(ByRef _s){
		(f) || f:=new mcode("TIsFOQAAAE0xyWZDiwRISf/BZoP4IA+WwWaD+FsPlMII0Wa"
			. "D+CwPlMII0WaD4N9mg/hdD5TCCNF00EyJyMM=") ;~= "[,\]\}\s]|$"
		fc:=f.c, fa:=f.a

		...
Last edited by vvhitevvizard on 22 Dec 2018, 04:15, edited 11 times in total.
oif2003
Posts: 214
Joined: 17 Oct 2018, 11:43
Contact:

Re: AHK v2: converting/optimizing scripts

21 Dec 2018, 12:19

I see that you have been putting the MCode to good use already :thumbup:
Seems like you are on the path to replacing nearly everything inside the while loop of Get() with MCode :D
With that idea, is it possible to load the string into MCode's address space at the beginning of Get(), use the MCode to determine whether or not the current segment is an object/array/string/number, and then return the corresponding flag, thus making the while loop contain only the MCode, NumGet/StrGet and the push/assignment actions?
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

22 Dec 2018, 01:45

oif2003 wrote:
21 Dec 2018, 12:19
I see that you have been putting the MCode to good use already :thumbup:
Seems like you are on the path to replacing nearly everything inside the while loop of Get() with MCode :D
It's hard to beat InStr when it operates on <5 symbols. Regexes can be beaten with ease (performance-wise), especially complex ones.
Of course I might think of replacing the whole json.Get logic with mcode, BUT it uses AHK objects, so things get overcomplicated. Rather, I'm just testing the possibilities of mixing AHK with mcode. json.Get is just a guinea pig. :D

I've manually reworked the custom-logic mcode from post #152 to check for all whitespace in 1 operation (\t, \r, \n, " " and all other codes below Chr(32) can be considered whitespace and thus mark the end of a number token). And by using setxx asm instructions, I managed to get rid of 3 of the 4 conditional branches inside the loop. So the resulting code is shorter and faster at the same time. It gives an additional 1.5% performance boost for the 14KB test file (see below).

;v1.2
		this.s1:=new mcode("TIsFOQAAAE0xyWZDiwRISf/BZoP4IA+WwWaD+FsPlMII0WaD+CwPlMII0W"
			. "aD4N9mg/hdD5TCCNF00EyJyMM=")
And the 3rd version: +0.3% performance (-1 conditional branch, only 1 is left), though it is 3 bytes longer.

;v1.3
		(this.s1) || this.s1:=new mcode("TIsFOQAAAE0xyWZDiwRISf/BZoP4IA+WwWaD+FsPlMII0WaD+CwPlMII0WaD4N9mg/hdD5TCCNF00EyJyMM=") ;~= "[,\]\}\s]|$"
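
The setxx trick described above replaces a chain of compare-and-jump pairs with flag results that are OR-ed together. A rough C analogue is sketched here, assuming the delimiter set from the regex comment plus "everything at or below 0x20 is whitespace". It is not a decompilation of the Base64 blob, the name is_number_end is hypothetical, and a compiler may or may not emit branch-free code for it.

```c
#include <wchar.h>

/* Returns 1 if c ends a number token, else 0.
   Code points at or below 0x20 (including the terminating zero) count
   as whitespace. Bitwise | is used instead of || so that no
   short-circuit evaluation is required. */
int is_number_end(wchar_t c) {
    unsigned long u = (unsigned long)c;
    /* Clearing bit 5 maps '}' (0x7D) onto ']' (0x5D), so one comparison
       covers both closing brackets. */
    return (int)((u <= 0x20u) | (u == 0x2Cu) | ((u & ~0x20ul) == 0x5Du));
}
```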

Here is a 14KB real-world json file (I also receive files of 500+KB).
Attachment: 1678982546.7z (1.69 KiB)
To test with it, load it into j and reduce the loop count n by a factor of 10:
j:=FileRead("1678982546"), n:=2000
Last edited by vvhitevvizard on 22 Dec 2018, 03:36, edited 7 times in total.
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

22 Dec 2018, 01:55

I also found 1 more disappointment. I tried to replace InStr(" `t`n`r", c,1) with Ord(c)<33.

But according to my tests, where c contains a single wide char, e.g. c="f":
InStr(" `t`n`r", c,1) vs Ord(c)<33
and InStr(" `t`n`r", c,1) vs NumGet(&c,0,"ushort")<33
the Ord variant is 1.9% slower
and the NumGet variant is 4.8% slower.

That's not right. Ord should be faster than InStr with a 4-char string to compare against, because Ord just loads a pure ushort integer from &char or &string and compares it with another pure integer (33); no number-to-string or string-to-number conversions are required!

2.
I think the AHK v2 logic that compares strings with anything except = != == !== should be abandoned. Comparing 2 strings with <= >= < > makes no sense in the majority of cases. E.g. comparing 2 stringified numbers ("11.11" < "2.01") gives the wrong result: it evaluates to true, although 11.11 is numerically greater than 2.01. Oh, c'mon... This AHK v2 logic leaves lots of sleeping errors... Just a nightmare. I would like v2 to raise a run-time "Type mismatch" error in such cases.
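
To make the pitfall concrete outside of AHK: a character-by-character comparison decides on the first differing character, so "11.11" sorts before "2.01" because '1' is less than '2'. The C sketch below contrasts that with a numeric comparison. It only illustrates the general behaviour of lexicographic comparison and makes no claim about how a particular AHK build evaluates the expression; both function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Lexicographic comparison: the first differing character decides. */
int less_as_text(const char *a, const char *b) {
    return strcmp(a, b) < 0;
}

/* Numeric comparison: convert both operands to numbers first. */
int less_as_number(const char *a, const char *b) {
    return strtod(a, NULL) < strtod(b, NULL);
}
```

With the operands from the post, less_as_text("11.11", "2.01") is true while less_as_number("11.11", "2.01") is false.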
Last edited by vvhitevvizard on 22 Dec 2018, 05:55, edited 1 time in total.
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

22 Dec 2018, 02:57

oif2003 wrote:
21 Dec 2018, 12:19
With that idea, is it possible to load the string into MCode's address space at the beginning of Get(),
They share the same virtual address space. All AHK Unicode strings can be accessed as widechar strings without any copying and/or conversion: we just give the mcode that string's address via the binary structure. :D I guess we could mess with AHK's inner logic as well, but I see no practical use in it.

I wrote a small debugging utility function just to see a memory dump around a specified AHK string. It's very handy for inspecting the binary mcode's address as well.
You use it with the mem(&var) syntax.

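mem(_b, _n:=0x80){ ;address of binary data, size in bytes
	s:=""
	Loop(_n){
		;start a new row, labelled with its address, every 16 bytes
		i:=A_Index-1, (i&0xF) || s.=(i ? "`n":"") Format("{:x}", _b+i) ": "
		s.=Format("{:02x}", NumGet(_b,i,"uchar")) " " ;uchar avoids sign-extended output
	}
	return s
}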
I also planned to print the printable chars to the right of it. Btw, can you beautify it by improving the algorithm even further? :D
oif2003 wrote:
use the MCode to determine whether or not the current segment is an object/array/string/number, and then return the corresponding flag, thus making the while loop contain only the MCode, NumGet/StrGet and the push/assignment actions?
Sounds good!! We can't really win against optimized InStr(",:",c,1) lines with multiple inlined mcodes, but we can indeed mcode this logic as a whole:

			InStr(" `t`n`r", c,1) f:=1... 
				InStr(z,c,1) f:=2 ...
				, InStr("}]", c) f:=3...
				: InStr(",:",c,1) f:=4...
				: InStr("{[",c,1) f:=5...
                                : ((c==q) f:=6...
Well, it needs to be tested.
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

22 Dec 2018, 06:13

Oh, I moved the u variable out of the main loop :D

InStr(z,c,1) || E(c, n), u:=0
->

InStr(z,c,1) || E(c, n)
...
				: ((c==q) ? (d:=InStr(_s,q,1, n+1),v:=SubStr(_s, n+1, d-n-1)
						, n:=d, (u:=kf) && (k:=v, z:=":")) ;string literals
					: (NumPut(&_s+n*2,fa), d:=DllCall(fc), v:=SubStr(_s, n, d), n+=d-1
						, (v is "Number") ? v+=0:E(c, n), u:=0) ;number
					, (u) || (a ? (k:=b.Push(v), z:=",]"):(b[k]:=v, z:=",}")))
+1.1% performance boost. It seems we had not reached perfection yet.

2.
And when I got rid of u completely (by adding some trivial redundancy), I got another +2% for the compact test string and +5% for the 14KB json file.
v1.3.2 Rearranged the branches to start with the most frequently occurring chars: the double quote and ",:". Got +1.3% for the 14KB json file.
v1.3.3 Then I deleted 3 instances of k assignments for the simple array type (its members have no name!): a ? (k:=b.Push(v)... -> a ? (b.Push(v)...
v1.3.4 After separating the ",:" logic (and fixing a potential bug in the "," case), I got another +1.5% for both the compact test string and the 14KB json file.
: InStr(",:",c,1) ? (z:=(t:=!a && c==",") ? q:x)->
: (c==":") ? (t:=0, z:=x)
: (c==",") ? (z:=(t:=!a) ? q "{[":x)
v1.3.5: the best variant so far.
I did this to get ready for the next steps:
.

3.
And here I got stuck. I tried to combine the InStr branches into 1: o:=InStr('"{[]},:',c,1) and use (o=index) ? for branching, but I can't beat the fast InStr: it works faster than comparing o with a number.
Then I tried a table-driven approach, which is typically the fastest in Assembly and C/C++. But an array of functions (a lookup table) for calling the one that matches the index in o failed as well: the deref F%o%() (calling closures named F0(), F1(), F2(), etc.) is very slow...
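
For reference, this is roughly what the table-driven approach looks like in C, where an indexed call through a function pointer is cheap. It is a sketch only: the handler names and their return tags are hypothetical, and the character set is the one from the InStr call above.

```c
#include <string.h>

typedef int (*handler_fn)(char c);

/* Hypothetical handlers; each returns a tag so the dispatch is observable. */
int on_other(char c) { (void)c; return 0; }
int on_quote(char c) { (void)c; return 1; }
int on_open(char c)  { (void)c; return 2; }
int on_close(char c) { (void)c; return 3; }
int on_comma(char c) { (void)c; return 4; }
int on_colon(char c) { (void)c; return 5; }

/* Index 0 = not found; indexes 1..7 follow the order of the set. */
static const char kSet[] = "\"{[]},:";
static handler_fn const kTable[8] = {
    on_other, on_quote, on_open, on_open, on_close, on_close, on_comma, on_colon
};

int dispatch(char c) {
    /* strchr would match the terminator for c == 0, so guard against it */
    const char *p = (c != '\0') ? strchr(kSet, c) : NULL;
    size_t o = p ? (size_t)(p - kSet) + 1 : 0;  /* like o:=InStr(set, c, 1) */
    return kTable[o](c);
}
```

In compiled code the lookup costs one indexed load and one indirect call. The measurements in this post suggest that the equivalent dynamic call in AHK is far more expensive than a chain of InStr checks.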

Main loop could look like this (2 lines) but...

		while((c:=SubStr(_s, ++n, 1))!=="")
			InStr(" `t`n`r",c,1) || (InStr(z,c,1) || E(c, n), o:=InStr('"{[]},:',c,1), F%o%() )
performance: retrograde step
.

Using an array of closures, u[o].Call(), increases code size while being even slower (-50% compared to v1.3.5 in this post)...
performance: retrograde step 2
Last edited by vvhitevvizard on 23 Dec 2018, 01:41, edited 8 times in total.
oif2003
Posts: 214
Joined: 17 Oct 2018, 11:43
Contact:

Re: AHK v2: converting/optimizing scripts

22 Dec 2018, 23:51

vvhitevvizard wrote:
22 Dec 2018, 01:45
I've manually reworked the post #152 custom-logic mcode to search for every white space as 1 operation (\t, \r, \n, " " and all other codes below Chr(32) can be considered whitespace and thus denoting number token field's end). And using setxx asm instructions, I managed to get rid of 3/4 conditional branches inside the loop. So resulting code is shorter and faster at the same time.
That's pretty slick! :)
vvhitevvizard wrote:
22 Dec 2018, 01:55
Ord should be faster than Instr
...
I think AHK v2 logic where it tries to compare strings using anything except = != == !== should be abandoned.
That is very strange... My simple test shows that Ord(c)<33 is indeed faster than InStr(" `t`n`r", c)
Spoiler

I think the logic for string comparison is fine. It is the coder's responsibility to know what the types are. The "2.01" > "11.11" issue shows up in filename sorting as well. It may not be ideal for all situations, but it does seem to be the usual way of comparing strings.


If you do end up replacing all the checks inside the while loop with MCode, perhaps it can return an Int64 with the first 2 bits as a flag (covering string/number/array/object) plus two 31 bit numbers (like Int32 without negative) representing start and end positions of the sub-string.
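
One possible bit layout for such a return value, sketched in C. The layout (flag in the top 2 bits, then start, then end) is an assumption made for illustration, since the post does not fix an order; all names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: bits 63..62 = kind, bits 61..31 = start, bits 30..0 = end. */
enum { KIND_STRING = 0, KIND_NUMBER = 1, KIND_ARRAY = 2, KIND_OBJECT = 3 };

uint64_t pack_token(unsigned kind, uint32_t start, uint32_t end) {
    return ((uint64_t)(kind & 3u) << 62)
         | ((uint64_t)(start & 0x7FFFFFFFu) << 31)
         |  (uint64_t)(end & 0x7FFFFFFFu);
}

unsigned token_kind(uint64_t t)  { return (unsigned)(t >> 62); }
uint32_t token_start(uint64_t t) { return (uint32_t)((t >> 31) & 0x7FFFFFFFu); }
uint32_t token_end(uint64_t t)   { return (uint32_t)(t & 0x7FFFFFFFu); }
```

With the flag in the top bits, kinds 2 and 3 set the sign bit, so the value looks negative when read as a signed Int64. Putting the flag in the low 2 bits instead would avoid that.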
oif2003
Posts: 214
Joined: 17 Oct 2018, 11:43
Contact:

Re: AHK v2: converting/optimizing scripts

23 Dec 2018, 00:07

vvhitevvizard wrote:
22 Dec 2018, 06:13
Might not be worthwhile, but can we push each function reference into an array and then call the function by index (o) to avoid using %o%?

Edit: Nevermind, you already tested that :facepalm:
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

23 Dec 2018, 01:53

oif2003 wrote:
22 Dec 2018, 23:51
That is very strange... My simple test shows that Ord(c)<33 is indeed faster than InStr(" `t`n`r", c)
Yeah, I checked your test file. In your synthetic test Ord does indeed seem much faster.
But when I replace json.Get's line InStr(" `t`n`r",c,1) || ( with (Ord(c)<33) || ( I get about a 5% slowdown with the 14KB json file.
A disastrous slowdown. I can't fathom the mystery of it.
oif2003 wrote:
Nevermind, you already tested that
Yeah, and the function-array variant is even slower than using a deref. You can check it for yourself.
oif2003 wrote:
I think the logic for string comparison is fine. It is the coder's responsibility to know what the types are.
I often "import" numbers from .CSV, .INI, .JSON, GUI Edit fields, etc. It's a real pain to make sure I've added +0 all over the script to convert all of them to pure numbers. Otherwise I get a sleeping error that might not show up for a long time. I would prefer AHK v2 to be a bit stricter about using math operators on strings. Concatenation (.), = and != are all that should be allowed for strings. If one needs to compare strings using < >, one should use Ord(next_char_in_the_string).

oif2003 wrote:
to be the usual way of comparing strings.
I beg to differ. As far as I know it's an AHK feature. In Java, for example, < and > are not defined for strings at all, only equality checks are (JavaScript does allow < and > and compares lexicographically), and you can compare strings by their length.
Other operators make no sense. It's like doing math operations on AHK Objects: the only operation that makes sense is checking whether one Object reference = another (both have the same address).
oif2003 wrote:
22 Dec 2018, 23:51
If you do end up replacing all the checks inside the while loop with MCode, perhaps it can return an Int64 with the first 2 bits as a flag (covering string/number/array/object) plus two 31 bit numbers (like Int32 without negative) representing start and end positions of the sub-string.
I thought about using the offset found in each_char_from_string as the flag at the same time:
',:{[' -> offset =1: we found ",", =2: ":", =3..4: "{[", =0: end of string. So no additional math needs to be applied to it.
The issue is that I don't see a way to branch efficiently in AHK for parsing. Honestly, a DllCall for every few chars of json won't gain anything here.
Last edited by vvhitevvizard on 23 Dec 2018, 02:58, edited 5 times in total.
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

23 Dec 2018, 02:43

Btw, have you checked the mem dump utility from post #155?
It helps to detect not-so-obvious things. E.g. here is a funny discovery. I always assumed that 2 bytes would be enough to represent 1 Cyrillic symbol in a wide string. Aha! It actually takes 4 bytes! So for "два2" the token's length is 7, not 4.
Spoiler
Never mind, I found the issue: I had saved the file as Unicode without a BOM. That makes AHK go crazy. With Unicode WITH a BOM, everything works as expected :D
Last edited by vvhitevvizard on 23 Dec 2018, 08:47, edited 1 time in total.
oif2003
Posts: 214
Joined: 17 Oct 2018, 11:43
Contact:

Re: AHK v2: converting/optimizing scripts

23 Dec 2018, 03:02

I suppose the word "usual" does need to be qualified. If I recall correctly, Matlab, Lua, Julia and VBA behave like this, so I didn't see it as anything special in AHK. I've never seen it in C but I've seen it used in C++. In some languages you can do === to see if objects are not only identical but are in fact the same object. In AHK one can always do a &v1 == &v2, but you already know that. I suppose it all depends on what one is used to working with. I see no obvious overall advantage or disadvantages either way. I like the convenience of comparison operator for strings bring, but as you have pointed out, there are drawbacks.

I remember reading about UTF-8 here: http://unifoundry.com/unicode-tutorial.html
I'm glad I don't have to know how it works in order to use it :D
From the article:
If the upper two bits in a byte are "10", then the byte must be part of a multi-byte UTF-8 sequence, but not the first byte. If the upper two bits begin with "11", then the byte must be the first byte of a multi-byte UTF-8 code point sequence. This byproduct allows rapid string searching. The ability to perform string searching quickly even if jumping into the middle of a multi-byte character sequence was one of Ken Thompson's design goals for what has become UTF-8 encoding.
Interesting stuff!
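
The property quoted above fits in a few lines of C. A minimal sketch with hypothetical function names: a byte of the form 10xxxxxx is a continuation byte, so stepping backwards until a byte does not match that pattern finds the start of the current code point.

```c
#include <stddef.h>

/* A continuation byte has the bit pattern 10xxxxxx. */
int is_continuation(unsigned char b) {
    return (b & 0xC0u) == 0x80u;
}

/* Step back from index i to the first byte of the code point containing it. */
size_t code_point_start(const unsigned char *s, size_t i) {
    while (i > 0 && is_continuation(s[i]))
        --i;
    return i;
}
```

For example, the Cyrillic letter д is encoded in UTF-8 as the two bytes D0 B4, and only the second one is a continuation byte.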

Also that utility is quite nice. I used to fire up Cheat Engine for stuff like that.
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

23 Dec 2018, 11:05

I'm tinkering with an x64 SSE4.2 widechar alternative to StrLen.
It's blazing fast. Yet it can't compete with StrLen for regular AHK strings, because AHK saves the string's length somewhere and just retrieves that value on request. But this mcode is very useful for finding the length of a zero-terminated string or an array of WORD (2-byte) elements returned by some DLL call.
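
As a baseline for checking the output of any faster implementation, here is a plain scalar C version of the same computation. It is deliberately not SSE4.2 code and says nothing about how the mcode is written; the name wstrlen16 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 16-bit code units before the terminating zero. */
size_t wstrlen16(const uint16_t *s) {
    size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}
```

The scalar loop inspects one code unit per iteration, whereas a 16-byte SSE register holds 8 of them, which is where the vectorized version gets its headroom on long strings.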
Test file: I compare it with StrLen just to verify the output =)

class mcode{ ;v1.1 by vvhitevvizard
;0x8000=MEM_RELEASE
	static __Delete:=(this)=>DllCall("VirtualFree", 'ptr',this.c, 'uint',0, 'uint',0x8000)
	__New(_s){
;VirtualAlloc reserves and commits memory in the virtual address space of the calling process
;1st arg=0 lets the system pick the address; size=1 byte is rounded up to one page (enough for mcodes <4096 bytes)
;Allocated memory is auto-initialized to zero
;0x3000=MEM_COMMIT | MEM_RESERVE, 0x40=PAGE_EXECUTE_READWRITE
		c:=this.c:=DllCall("VirtualAlloc", 'ptr',0, 'uint',1, 'uint',0x3000, 'uint',0x40, "PTR")
;0=zero terminated, 1=Base64 w/o hdr, n (out) contains the mcode's binary size
		DllCall("crypt32\CryptStringToBinary", 'str',_s, 'uint',0, 'uint',1, 'ptr',c
			, 'uintp',n:=4096, 'ptr',0, 'ptr',0)
			|| (this.__Delete(),this.c:=0) ;cleanup if Base64 decoding FAILS
		this.a:=(c+n+15)&~15 ;addr of binary struct for passed args aligned at 16
	}
}

; Test runs
	f:=new mcode("SMfA8P///0iLFSIAAABmD+/Ag8AQZg86YwQCCXX00egByMM=")
	fc:=f.c, fa:=f.a

j:='12345678901234567'
;VarSetCapacity(b,1024,0)
;StrPut(j, &b, "CP0")
n:=200*1000

t:=A_TickCount
loop(n)
	(NumPut(&j,fa)), s1:=DllCall(fc)
a1:=(A_TickCount-t)/n

t:=A_TickCount
loop(n)
	s2:=StrLen(j)
a2:=(A_TickCount-t)/n

msgbox(clipboard:=a1 "|" a2 "`n`n" s1 "|" s2)
oif2003
Posts: 214
Joined: 17 Oct 2018, 11:43
Contact:

Re: AHK v2: converting/optimizing scripts

23 Dec 2018, 12:56

I keep forgetting we are dealing with UTF-16 here. My mind keeps defaulting back to UTF-8 because that's what I'm used to seeing :crazy:
vvhitevvizard wrote:
AHK strings cuz AHK saves string's length somewhere and just retrieves that value on request
That explains a lot. It does appear to be at least 2x faster on my tests of random length strings if the string is stored in memory.
Spoiler
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: AHK v2: converting/optimizing scripts

23 Dec 2018, 21:40

oif2003 wrote:
It does appear to be at least 2x faster on my tests of random length strings if the string is stored in memory.
3.2 to 3.9 times faster in my case, haha. But that's for a 7000-widechar string; the average string is around 10 widechars or less. :) Anyway, this SSE 4.2 StrLen beats even Microsoft's StrLen.
asm source (FASM format)
It might be sped up further: the main loop can be aligned to a 16-byte boundary, etc.
String comparison like InStr can be done this way too, BUT
I don't feel we would get any tangible performance increase for json parsing, because the latter consists of short substrings 1..10 widechars long.

2.
Just test this slightly optimized version: I aligned it and made sure all the inner-loop instructions fit into a single aligned 16-byte fetch block of the CPU.
Now it's an additional ~30..60% performance boost :) Overall about 4 to 4.5 times as fast as AHK's StrLen.
3.
OK, a few more optimization hacks. Now it's 5 times and (if cached luckily) up to 16.2 times as fast for a 7000-char line. I'm serious, try running it several times :)
f:=new mcode("TIsFOQAAAEjHwhAAAABIx8Dw////Zg/vwEjHwfD///8B0GZBDzpjBAAJdfTR6AHIww==")
Last edited by vvhitevvizard on 24 Dec 2018, 00:25, edited 1 time in total.
