AHK v2: converting/optimizing scripts

vvhitevvizard

Re: AHK v2: converting/optimizing scripts

14 Dec 2018, 21:27

I ended up with a one-liner :)

Code:

;for "stringified" integer|float, get rid of leading/trailing zeros
;	keep the decimal point only if 1+ digits follow it
;	respect the digit count after the point and just trim. Faster than Format("{:g}", v)
SmartT(_s)=>Round(_s, Strlen(_s)-InStr(_s,".",1))
But it's 281 vs 328: "only" 14.3% faster than Format.
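For reference, the 14.3% is the time saved relative to Format, assuming 281 and 328 are run times for the same benchmark (the units aren't stated). Measured as a speed ratio instead, it comes out to roughly 1.17x:

```latex
\frac{328 - 281}{328} = \frac{47}{328} \approx 0.143 \quad \text{(14.3\% less time)},
\qquad
\frac{328}{281} \approx 1.167 \quad \text{(about 16.7\% faster as a speed ratio)}
```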
oif2003

14 Dec 2018, 22:27

Nicely done! Being a one liner sorta makes up for it :D I did some investigating and found this in the AHK manual:
Even faster performance can be achieved by looking up the function's address beforehand. For example:

Code:

; In the following example, if the DLL isn't yet loaded, use LoadLibrary in place of GetModuleHandle.
MulDivProc := DllCall("GetProcAddress", "Ptr", DllCall("GetModuleHandle", "Str", "kernel32", "Ptr"), "AStr", "MulDiv", "Ptr")
Loop 500
    DllCall(MulDivProc, "Int", 3, "Int", 4, "Int", 3)
Note: You must use AStr for GetProcAddress's function-name parameter, since that API only comes in an ANSI flavor (there is no GetProcAddressW).

This brings the compiled version (DLL) within 10% of MCode. I prefer DLLs over MCode because MCode sometimes doesn't behave the way you expect it to. If you prefer MCode because it doesn't require a separate file, we can always cheat: pack the DLL as compressed text (Windows Compression API, then CryptBinaryToString to base64) inside the script and extract it on the fly (maybe even to a pipe?).

About TCC and x86: it has a 64-bit variant, which is the one I use... unless you are referring to something else? Anyway, TCC is not that great for fast compiled code. Its selling points are size and compile speed, not code speed.
vvhitevvizard

14 Dec 2018, 23:02

oif2003 wrote:
14 Dec 2018, 22:27
Nicely done! Being a one liner sorta makes up for it :D
Do you recall that the original SmartT looked ugly, BUT was incredibly fast nonetheless? The reason is that pure Float<->String conversions are very slow, and so is Round. It's faster than Format but not fast enough. Even string crunching with unoptimized AHK loops appears to be much faster. So trimming a stringified float|integer is another candidate for m-coding. But for now we can try to optimize the original SmartT (the one without Round calls). Can you try? :) It's 2.2 times faster, but ~5-10 lines.
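A minimal C sketch of that kind of string crunching, assuming the number arrives as an ASCII buffer of the form [sign]digits[.digits] with no exponent. The name trim_number is hypothetical, and since AHK v2 strings are UTF-16, a real MCode version would walk 16-bit units instead of char. It uses only plain loops, in line with the no-library-calls rule discussed in this thread:

```c
#include <string.h> /* strcmp: only for checking results, not used by trim_number */

/* Hypothetical trim_number(): trims a "stringified" number in place, no float conversion.
 * Assumes ASCII input of the form [sign]digits[.digits] (no exponent, no spaces).
 * - leading zeros of the integer part are dropped (one zero is kept before '.' or alone)
 * - trailing zeros of the fractional part are dropped
 * - the '.' is dropped when no fractional digit survives */
char *trim_number(char *s)
{
    char *p = s;
    if (*p == '-' || *p == '+')
        p++;                                   /* leave the sign where it is */

    char *q = p;                               /* first character worth keeping */
    while (q[0] == '0' && q[1] >= '0' && q[1] <= '9')
        q++;

    char *w = p;                               /* write cursor, never ahead of q */
    char *dot = NULL;                          /* where '.' landed in the output */
    char *keep_end = NULL;                     /* one past the last non-zero fraction digit */
    for (; *q; q++, w++) {
        *w = *q;
        if (*q == '.')
            dot = w;
        else if (dot && *q != '0')
            keep_end = w + 1;
    }
    if (dot)
        w = keep_end ? keep_end : dot;         /* no significant fraction: drop the '.' too */
    *w = '\0';
    return s;
}
```

With this sketch, "-0012.3400" becomes "-12.34", "1.000" becomes "1", "0.500" becomes "0.5", and "100" stays "100". Note that "-0.0" would come out as "-0".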
vvhitevvizard

14 Dec 2018, 23:11

oif2003 wrote:
14 Dec 2018, 22:27
In the following example, if the DLL isn't yet loaded, use LoadLibrary in place of GetModuleHandle.
...
This brings compiled version (dll) within 10% of mcode.
That requires the DLL to be a satellite file of the script. Sometimes we don't need the whole assortment that's stored in a particular DLL, so small m-code snippets are preferable.
I prefer Dlls over mcodes because mcodes sometimes don't behave the way you expect them to.
What do you mean? Could you elaborate?
With m-code you can store both x64 and x86 code (or even more variants, e.g. with and without SSE4). You would need at least two satellite DLLs for that.
If you meant the CPU's NX-bit protection, my AHK MCode function calls VirtualProtect via DllCall, so it's not an issue.

Well, anyway, once you have compiled machine code, either approach works. The real question here is compiling the C code. :D
About TCC and 0x86, it has a 64 bit variant, which is the one I use... unless you are referring to something else? Anyway, for faster compiled code TCC is not that great anyway. It's selling point is size and compile speed, not code speed.
It does NOT have an x64 inline assembler. Correct me if I'm wrong.

PS: I used to use "Flat Assembler" (http://flatassembler.net/). It can create DLLs/EXEs (x64 as well) directly, bypassing the OBJ step, and it uses macros extensively. Code looks like this:

Code:

format PE64 console
entry start

include 'win64a.inc'

section '.text' code readable executable

start:
	sub	rsp,8

	invoke	WriteMessage,message

	invoke	ExitProcess,0

section '.data' data readable

  message db "Hi! I'm the example program!",0

section '.idata' import data readable writeable

  library kernel32,'KERNEL32.DLL',\
	  writemsg,'WRITEMSG.DLL'

  include 'api/kernel32.inc'

  import writemsg,\
	 WriteMessage,'WriteMessage'
And here is where the real fun begins: fasm avx instructions example.

2.
For json.Get we need a snippet for the typical pos:=UnicodeSearch(haystack, needle), where haystack is binary data of UTF-16 code units (WORDs) and needle is a zero-terminated Unicode string. It should not calculate the haystack's StrLen before processing, but just treat the terminating zero WORD as the end of the haystack data.
And another m-code, same as above, but with just one Unicode character as the needle. We could effectively replace json.Get's InStr instances with it.
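A rough C prototype of the two routines described above, assuming NUL-terminated UTF-16 buffers (the format AHK v2 uses for strings) and InStr-style 1-based results. The function names are hypothetical, and this is a plain brute-force scan, not a tuned search algorithm:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical u16_find_char / u16_find_str: sketches of the two InStr replacements.
 * Buffers are UTF-16 code units terminated by a zero WORD, like AHK v2 strings.
 * Results are 1-based positions (as InStr returns), 0 = not found.
 * Neither function measures the haystack first: both stop at its terminator. */
ptrdiff_t u16_find_char(const uint16_t *hay, uint16_t ch)
{
    for (const uint16_t *p = hay; *p; p++)
        if (*p == ch)
            return (p - hay) + 1;
    return 0;
}

ptrdiff_t u16_find_str(const uint16_t *hay, const uint16_t *needle)
{
    if (!*needle)
        return 1;                        /* empty needle: treated as a match at 1 (a choice made here) */
    for (const uint16_t *p = hay; *p; p++) {
        if (*p != *needle)
            continue;                    /* cheap first-unit filter */
        const uint16_t *h = p, *n = needle;
        while (*n && *h == *n) {
            h++;
            n++;
        }
        if (!*n)
            return (p - hay) + 1;        /* the whole needle matched */
        if (!*h)
            return 0;                    /* haystack ran out mid-match: no later start can fit */
    }
    return 0;
}
```

The early return when the haystack ends mid-match is what lets the scan skip any length computation: once the terminator is hit, no later starting position can hold the whole needle.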


I feel like I'll write it in assembler, but later.
Feel free to make a compiled C version of that algorithm as a prototype; I'll make sure to catch up and optimize it in assembler.
oif2003

15 Dec 2018, 03:40

According to https://wiki.osdev.org/TCC:
TCC includes also a linker and an assembler (only x86). But this assembler is limited: no 16/64-bit support, instructions up to MMX are supported.
I will leave that fancy assembly stuff to you. :thumbup:

According to nnnik's MCode Tutorial: https://autohotkey.com/boards/viewtopic.php?f=7&t=32
There are also other things that won't work in AutoHotkey:
Global & Static Variables
Objects (thiscall convention)
Sometimes even calling the function itself
Preset floats can cause difficulties, because the compiler stores them at a specific address rather than in the code.
You can verify that static variables do not work correctly under MCode. I updated the previous script to compile both a DLL and an object file. While the DLL version gives the correct output, the MCode version crashes with exit code 0xC0000005 (access violation).

Code:

	;Need TCC: https://download.savannah.gnu.org/releases/tinycc/tcc-0.9.27-win64-bin.zip
	;place script in the same folder as libtcc.dll

#singleinstance force

hModule := load_cDll(make_c("btest.dll"))
load_cObject(make_c("otest.o"), mCode)

cTest := getProcAddress("test", hModule)
/*_C DLL
	#define DLL __attribute__ ((dllexport)) 
	
	DLL int test(void) {
		static int i = 0;
		return ++i;
	}
*/

/*_C mCode
	int test(void) {
		static int i = 0;
		return ++i;
	}
*/

t := A_TickCount
loop 1000000
	x := DllCall(cTest)
msgbox(A_TickCount - t " : " x)
msgbox(DllCall(&mCode)) ;Crashes with exit code: 0xC000 0005 (3221225477)

DllCall("FreeLibrary", 'Ptr', hModule)
;==================================================================================================
getProcAddress(fName, hModule) => DllCall("GetProcAddress", 'Ptr', hModule, 'AStr', fName, 'Ptr')	;'Ptr' return type: the default Int would truncate a 64-bit address
load_cDll(bName) => DllCall("LoadLibrary", 'Str', bName, 'Ptr')

;StrPutVar helper function straight from the v2 docs
StrPutVar(string, ByRef var, encoding := "cp0") {    
	VarSetCapacity(var, StrPut(string, encoding) * ((encoding="utf-16"||encoding="cp1200") ? 2 : 1) )
	return StrPut(string, &var, encoding)
}

load_cObject(oName, ByRef code) {
	;parse ELF (Executable and Linkable Format) file
	o := FileRead(A_ScriptDir "\" oName, "RAW")
	,tableAdd := NumGet(o, 0x28, "Ptr")
	,toffset := 0x40
	,padd := NumGet(o, tableAdd + toffset +0x18, "UInt")
	,plen := NumGet(o, tableAdd + toffset +0x20, "UInt")
	,VarSetCapacity(code, plen)
	Loop plen//4	;write program data to memory in 4-byte chunks
		NumPut(NumGet(o, padd + (A_Index - 1)*4, "UInt"), code, (A_Index-1)*4, "UInt")
	return DllCall("VirtualProtect", "Ptr", &code, "UPtr", plen, "UInt", 0x40, "UInt*", 0)	;0x40 = PAGE_EXECUTE_READWRITE
}	
	
make_c(cFileName) {
	directory := A_ScriptDir
	,outputDll := (SubStr(cFileName, -4) = ".dll")
	
	;read c code
	,startLabel := "`n/*_C " (outputDll ? "DLL" : "mCode")			
	,endLabel   := "`n*/"
	,script := fileRead(A_ScriptFullPath)
	,cStart := InStr(script, startLabel) + StrLen(startLabel) + 2
	,cEnd   := InStr(script, endLabel, , cStart + 1)
	,_cStr  := SubStr(script, cStart, cEnd - cStart)
	cStr := ""
	loop parse _cStr, "`n", "`r"
		cStr .= (A_LoopField ~= "^\s*;") ? "" : A_LoopField "`n"	;keep line breaks (#define/#include need them); skip lines starting with ; (AHK escape)
	cStr := Trim(cStr, " `t`n`r") ? cStr 
								  : outputDll ? "DLL __attribute__ ((dllexport)) int main() {return 0;}"
											  : "int main() {return 0;}"
	;LIBTCC calls
	,htcclib := DllCall("LoadLibrary", "Str", directory "\libtcc.dll", "Ptr") 
	,Context := DllCall("libtcc\tcc_new", "Ptr")

	,TCC_OUTPUT_DLL := 3 ; dynamic library
	,TCC_OUTPUT_OBJ := 4 ; object file
	,DllCall("libtcc\tcc_set_output_type", "Ptr", Context, "UInt", outputDll ? TCC_OUTPUT_DLL : TCC_OUTPUT_OBJ, "Int")
	
	,StrPutVar(cStr, _cStr)
	,DllCall("libtcc\tcc_compile_string", "Ptr", Context, "Str", _cStr, "Int")
	
	,StrPutVar(directory "\" cFileName, filename)
	,DllCall("libtcc\tcc_output_file", "Ptr", Context, "Str", filename, "Int")
	
	;clean up
	,DllCall("libtcc\tcc_delete", "Ptr", Context)
	,DllCall("FreeLibrary", "Ptr", htcclib)
	
	return cFileName
}
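For readers wondering about the magic numbers in load_cObject: they line up with ELF64 header fields. The sketch below assumes a Linux toolchain that ships <elf.h> (Windows does not) and a little-endian host; elf64_section is a hypothetical helper, not part of the script. Note that the AHK code reads section header index 1 (one 0x40-byte header past e_shoff), which appears to assume TCC puts .text there:

```c
#include <elf.h>     /* Linux/glibc header with the ELF64 structs; not available on Windows */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The magic numbers in load_cObject map onto these ELF64 fields. */
enum {
    OFF_E_SHOFF   = offsetof(Elf64_Ehdr, e_shoff),   /* 0x28: file offset of the section header table */
    SHDR_SIZE     = sizeof(Elf64_Shdr),              /* 0x40: size of one section header */
    OFF_SH_OFFSET = offsetof(Elf64_Shdr, sh_offset), /* 0x18: file offset of the section's bytes */
    OFF_SH_SIZE   = offsetof(Elf64_Shdr, sh_size)    /* 0x20: size of the section in bytes */
};

/* Hypothetical helper: read the file offset and size of section `idx` from an
 * in-memory ELF64 image (host byte order assumed, i.e. little-endian x64).
 * Returns 0 on success, -1 if the image is not ELF64 or too small. */
int elf64_section(const unsigned char *img, size_t len, unsigned idx,
                  uint64_t *off, uint64_t *size)
{
    Elf64_Ehdr eh;
    Elf64_Shdr sh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);                     /* memcpy avoids unaligned reads */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (idx >= eh.e_shnum)
        return -1;
    uint64_t at = eh.e_shoff + (uint64_t)idx * sizeof sh;
    if (at > len || len - at < sizeof sh)
        return -1;
    memcpy(&sh, img + at, sizeof sh);
    *off = sh.sh_offset;
    *size = sh.sh_size;
    return 0;
}
```

Unlike the AHK loop, this checks the magic bytes and bounds first, so a truncated or non-ELF file is rejected instead of producing garbage offsets.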


I have not had any luck writing those functions in C. The fastest variant I found was a simple call to strpbrk, but even that is slower than the built-in RegExMatch. Some ideas to try later: 1) exploit the fact that the haystack keeps shrinking, 2) keep track of which characters have been checked and skip over them, 3) find a known fast algorithm.

I do think that whatever needs to be written in C ought to be done through MCode whenever possible; however, using a DLL really is almost the same thing. The issue of having a satellite file can be dealt with by embedding the DLL in text form, just like MCode. As for 32-bit support, I thought we were going for a pure 64-bit solution?

Also there is one more advantage of TCC: C source code can be included instead of MCode/Dlls for maximum transparency, provided the user is willing to download and extract TCC. The compiler is fast enough to be used as a JIT. Performance may suffer, however, as TCC is not optimized for runtime speed.
vvhitevvizard

15 Dec 2018, 03:51

oif2003 wrote:
15 Dec 2018, 03:40
There are also other things that won't work in AutoHotkey:
Global & Static Variables
Objects (thiscall convention)
Sometimes even calling the function itself
1. Of course, the caller's global and static variables should not be accessible inside m-code.
2. M-code should not know anything about AHK objects. It should be callable from virtually any compiled or interpreted code, as long as the latter can make DLL calls and knows the calling convention.
3. If we make recursive calls inside m-code, we are probably doing something wrong anyway.
4. M-code's local data actually lives on the CPU stack (addressed via ESP/EBP, or RSP/RBP on x64), so static variables (initialized when a DLL or EXE is loaded) won't work. We just design m-code differently.

As I wrote on the previous page: true m-code should NOT rely on other DLL calls; it has an input buffer and an output buffer to process. We can call DLLs from AHK as usual, so it's up to the AHK code to fetch data for the m-code using external DLLs, and the idea of m-code is to process the provided data (e.g. searching and replacing in binary buffers and strings).
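A small C sketch of that design, with hypothetical function names. The first function keeps its counter in a static, which lands in the object file's data/.bss section; that section is not copied out with the code, and the relocation pointing at it is never applied, which fits the 0xC0000005 crash reported above. The second function takes everything it touches from the caller: the input buffer, an output buffer with its capacity, and a state pointer that replaces the static:

```c
#include <stddef.h>
#include <stdint.h>

/* NOT MCode-friendly: `calls` lives in a data section outside the copied code bytes.
 * Fine in a normal DLL, broken when only the raw .text bytes are extracted. */
int count_calls_static(void)
{
    static int calls = 0;
    return ++calls;
}

/* MCode-friendly (hypothetical example routine): the caller owns every byte touched.
 * Copies UTF-16 units from `in` to `out`, uppercasing ASCII letters, until the zero
 * WORD or until `out_cap` units (including the terminator) are used.
 * `state` stands in for the static: here it just counts calls. */
size_t upper_ascii_u16(const uint16_t *in, uint16_t *out, size_t out_cap, int *state)
{
    size_t n = 0;
    if (out_cap == 0)
        return 0;
    while (in[n] && n + 1 < out_cap) {
        uint16_t c = in[n];
        out[n] = (c >= 'a' && c <= 'z') ? (uint16_t)(c - 32) : c;
        n++;
    }
    out[n] = 0;         /* always terminate inside the caller's buffer */
    ++*state;           /* state kept by the caller, e.g. in an AHK variable */
    return n;           /* units written, excluding the terminator */
}
```

From AHK, the state would simply be a variable passed by reference, so the m-code itself stays free of any storage of its own.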
oif2003

15 Dec 2018, 03:55

I also haven't figured out how to get #include to work in MCode, so it leaves much to be desired. I think the list of issues is actually longer than that, though.
vvhitevvizard

15 Dec 2018, 04:10

oif2003 wrote:
15 Dec 2018, 03:55
I also haven't figure out how to get #include to work in MCode, so it leaves much to be desired.
I'm not sure I understood your question.

Code:

#Include %A_ScriptDir%
#Include your_file.ahk2 ;placeholder: file containing the encoded machine code string
..
f:=MCode(uuencoded_string) ;initialize once: decode the string, put it in memory, get a pointer (f) to use with DllCall
nnnik

15 Dec 2018, 04:10

nnnik wrote:
30 Sep 2013, 12:00
There are also other things that won't work in AutoHotkey:
Global & Static Variables
Objects (thiscall convention)
Let me clarify this:
Global and static variables inside C code won't work with MCode.
C++ objects won't be usable in AHK even if you compile them, since AHK's DllCall cannot use the thiscall convention.
vvhitevvizard

15 Dec 2018, 04:22

nnnik wrote:
15 Dec 2018, 04:10
Global and static variables inside C code wont work with MCode.
We should just assume that every DLL library function (and every m-code chunk of machine code) knows nothing about the caller (except maybe the shared stack). It cares only about valid input data and sufficient memory (maybe even provided by the callee) to perform some data transformation.
If an m-code procedure meant just for string manipulation calls external DLLs to open files, read registry data, etc., that's just poor design.
nnnik

15 Dec 2018, 04:32

I never said anything about the callee. I said something about how C features like:
(Forgive me if the code is invalid; it's been ages since I last wrote C.)

Code:

int f = 0;	/* file-scope ("global") variable; C has no "global" keyword */
int function(int u) {
	static int i = 0;
	i += f;
	f = u * f + u + f;
	return i;
}
simply don't work with MCode.
vvhitevvizard

15 Dec 2018, 04:35

nnnik wrote:
15 Dec 2018, 04:32
Simply does not work
Yup. C code designed for m-coding should use only local (not static) variables, because static variables are placed in a separate memory area (a data section), not on the stack.

And no calls to other functions. Some library functions should just be inlined inside the function.

And it should be just a function, not a class method.
nnnik

15 Dec 2018, 04:51

Well, I'm trying to work around the global/static restriction with my new compiler.
But I ran into a few issues there and haven't continued since.

You can call external dlls if you need to by passing them as a parameter (as shown in my tutorial I think).
vvhitevvizard

15 Dec 2018, 04:56

oif2003 wrote:
15 Dec 2018, 03:40
however, using DLL really is almost the same thing. The issue of having a satellite file can be dealt with by embedding the dll in text form just like MCode.
An embedded DLL is just the same as m-code, only bigger, with a lot of redundant data.
As for 32bit support, I thought we were going for a pure 64bit solution?
Sure. We skip 32-bit machine code support.
Also there is one more advantage of TCC: C source code can be included instead of MCode/Dlls for maximum transparency, provided the user is willing to download and extract TCC. The compiler is fast enough to be used as a JIT. Performance may suffer, however, as TCC is not optimized for runtime speed.
You mean it compiles at run time? I see nothing good in that. M-code is meant to increase performance (and maybe to obfuscate the code a bit), not to be editable on the fly. :D

nnnik wrote:
15 Dec 2018, 04:51
You can call external dlls if you need to by passing them as a parameter (as shown in my tutorial I think).
But better not. Let external DLL calls live in the AHK caller. :) M-code should be portable and autonomous, without dependencies. The main idea of m-coding is to unleash CPU power: plenty of machine registers and SIMD instructions, or at least compiled code.
nnnik

15 Dec 2018, 08:40

When you need performance the first step is to avoid AHK as much as possible.
Therefore calling external functions using MCode is a valid strategy and there is no reason not to.
oif2003

15 Dec 2018, 11:57

vvhitevvizard wrote:
15 Dec 2018, 04:56
Hey, I think you are misinterpreting some of the things I was saying. Perhaps my issue with MCode is due to my lack of skills in C. MCode has always seemed like a hack job to me, and can be difficult to work with at times. But that is the price one has to pay for speed sometimes. I understand TCC has little (or maybe none?) to offer in terms of performance, but I still see other potentials in it. For example, I just figured out how to use TCC's "output to memory." I can now get function pointers directly from TCC and C code no longer needs to be compiled to disk! Also there are at least two ways to "inline" C code with TCC:

Code:

;define a C function string
stringC1 := new tcc('
	(
		char* hello() {
			return "hello world";
		}
	)', path)
msgbox(DllCall(stringC1["hello"], "AStr"))
or like previously

Code:

;define a C comment block
commentC := new tcc(parseC())
/*_C
	#include <math.h>
	
	void sqrt3(double *d) {
		*d = sqrt(3);
	}
*/
DllCall(commentC["sqrt3"], "Ptr", &_sqrt := 0.0)
msgbox("sqrt(3) = " _sqrt)
Despite all the simplifications, I did run into one issue. I can't call windows.h's MessageBoxA directly from C anymore, so I wonder what else is broken?
vvhitevvizard

15 Dec 2018, 15:13

nnnik wrote:
15 Dec 2018, 08:40
When you need performance the first step is to avoid AHK as much as possible.
Therefore calling external functions using MCode is a valid strategy and there is no reason not to.
I beg to differ. We don't optimize the whole program structure for performance. We pinpoint the time-critical loops that take 90% of the CPU time at run time. In the case of the JSON parsing we've been optimizing over the last 5 pages, it might be just a few simple loops comparing two string buffers. No external DLL calls needed.
Of course, it depends on the use case...

But let's take as an example these AHK lines from the MCode initialization code, which do some string conversion and memory allocation:
We could pre-calculate the DLL functions' addresses via AHK and be done with it; I doubt moving this fragment to MCode would yield any further tangible gains.

Another example is drawing graphics with extensive GDI+ calls. Here calls like gp_RcFill (fill rectangle) take considerable CPU time by themselves. Any MCode optimization would have no noticeable effect unless you replace the GDI+ DLL function itself (gdiplus\GdipFillRectangle).
vvhitevvizard

15 Dec 2018, 16:12

oif2003 wrote:
15 Dec 2018, 11:57
Hey, I think you are misinterpreting some of the things I was saying.
Yeah, I have a bad habit of skimming through posts. You definitely meant more profound issues.
I understand TCC has little (or maybe none?) to offer in terms of performance, but I still see other potentials in it. For example, I just figured out how to use TCC's "output to memory." I can now get function pointers directly from TCC and C code no longer needs to be compiled to disk! Also there are at least two ways to "inline" C code with TCC:

Code:

;define a C function string
stringC1 := new tcc('
	(
		char* hello() {
			return "hello world";
		}
	)', path)
msgbox(DllCall(stringC1["hello"], "AStr"))
:thumbup: So it's possible to compile small pieces of code one at a time using DLL calls! Even auto-generate C for-loops on the fly! That should definitely be something like 10 times as fast as running a compiler on the same small pieces of code in the ordinary way.

That's good for drafting: testing the very feasibility of the lines (syntax and algorithm) to be run in C. But the final step is to resort to m-coding/DLL storing, because the code above would still require TCC and we need portability. You can keep the C source as a comment next to the m-code's uuencoded strings, just to be able to edit it later :)

oif2003 wrote:
15 Dec 2018, 11:57
Despite all the simplifications, I did run into one issue. I can't call windows.h's MessageBoxA directly from C anymore, so I wonder what else is broken?
Nothing is broken. Code designed to be m-coded shouldn't rely on DLL calls in general. Just design it to output debug info within the returned array.
I guess that's happening because the address of "MessageBoxA" (USER32.DLL) is not resolved automatically (as normally happens at startup via import tables), since your machine code is generated on the fly. TCC doesn't know that address at compile time, and since it skips EXE/DLL creation, no import tables are created to link the DLL dynamically.
For testing purposes you need to pass one additional argument: a pointer to that MessageBoxA function.
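A C sketch of that workaround, which is also what nnnik suggested earlier (passing external functions in as parameters). The names are hypothetical; void * stands in for HWND so it compiles without <windows.h>, and fake_msgbox exists only so the sketch can be exercised off Windows. The caption is passed in as well, because a string literal inside the function would live in a data section, the same trap as the preset floats from nnnik's tutorial:

```c
#include <stddef.h>

/* Same shape as MessageBoxA: int (HWND, LPCSTR, LPCSTR, UINT). void * stands in
 * for HWND; on Windows the real pointer type would also carry WINAPI,
 * which has no effect on x64. */
typedef int (*msgbox_fn)(void *hwnd, const char *text, const char *caption, unsigned type);

/* Hypothetical MCode-style routine: it never links against user32. The caller
 * resolves the address (e.g. with GetProcAddress) and passes it in; NULL = no output. */
int report_length(const char *s, const char *caption, msgbox_fn show)
{
    int n = 0;
    while (s[n])
        n++;                                  /* inlined strlen: no library call */
    if (show)
        show(NULL, s, caption, 0);            /* 0 == MB_OK */
    return n;
}

/* Stand-in with the same signature, so the sketch can be tested off Windows. */
static int shown_count;
static int fake_msgbox(void *hwnd, const char *text, const char *caption, unsigned type)
{
    (void)hwnd; (void)text; (void)caption; (void)type;
    return ++shown_count;
}
```

On the AHK side, the pointer would come from something like the GetProcAddress call quoted earlier in the thread (with user32 loaded) and be passed as a "Ptr" argument.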
oif2003

15 Dec 2018, 17:25

Yeah, I agree. For now it gives us a more efficient way to test things out. It can also give us the tools to do some metaprogramming if the need ever arises (by we, I mean you). A fun tool to play around with, at the very least :)

Except that wasn't MCode, and I was under the impression that compiling to memory should work the same as compiling to disk. I tried the options -luser32 -lkernel32, but that didn't work. Anyway, it's not terribly important at the moment, as long as it only affects windows.h (?). I might do some more testing later. Hopefully it's only a configuration issue.
vvhitevvizard

15 Dec 2018, 19:52

oif2003 wrote:
15 Dec 2018, 17:25
Yeah, I agree. For now it gives us a more efficient way to test things out. It can also give us the tools to do some metaprogramming if the need ever arise (by we, I mean you). Fun tool to play around with at the very least :)
I'll give you a quick hack idea here. Just look at the InStr function in the AHK sources (written for VC2010 if I'm not mistaken), get rid of the unnecessary stuff (we just expect Unicode and do case-sensitive comparisons) and bingo: we've got our first compiled-code optimization for json.Get done :) It would effectively replace InStr with two variants of it, for char search and string search, and we could start making custom string search routines with hardcoded comparisons to get rid of most of the slow regex expressions.
But for further optimizations, strpbrk, memchr and strchr have to be inlined inside the C function body: no calls to C libraries.
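A sketch of what an inlined, strpbrk-style search could look like for json.Get, assuming UTF-16 input and an ASCII-only delimiter set; u16_find_any is a hypothetical name. It builds a 128-entry lookup table on the stack (no statics) and then does one table lookup per code unit. One caveat worth checking in the disassembly: optimizing compilers may turn a zero-initialized array or a zeroing loop into a memset call, which would break the no-library-calls rule:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: position (1-based, like InStr) of the first UTF-16 unit in
 * `hay` that belongs to the ASCII delimiter set `set`, or 0 if none is found
 * before the zero WORD. Like wcspbrk, but with no library calls. */
ptrdiff_t u16_find_any(const uint16_t *hay, const char *set)
{
    unsigned char is_delim[128] = {0};          /* on the stack: no statics */
    for (const char *s = set; *s; s++)
        if ((unsigned char)*s < 128)
            is_delim[(unsigned char)*s] = 1;

    for (const uint16_t *p = hay; *p; p++)
        if (*p < 128 && is_delim[*p])           /* non-ASCII units never match */
            return (p - hay) + 1;
    return 0;
}
```

For a parser that searches for the same delimiters over and over, the caller could build the table once and pass it in, which keeps per-call work to the scan itself.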

I, for one, haven't written a single line of C since 2010. Actually, I remember more about assembly optimization than about C programming, so I would be slow at drafting C.
