MCode Tutorial (Compiled Code in AHK)

geek
Posts: 1056
Joined: 02 Oct 2013, 22:13
Location: GeekDude

Post by geek » 20 Dec 2023, 10:52

With great appreciation for the original MCode Tutorial by nnnik, from which this tutorial has been rewritten and extended.

Please visit the Wiki for the full tutorial.
AutoHotkey Wiki wrote:Machine code is the lowest level of binary code that your computer can run on. Programming languages like C, C++, Rust, and Go all compile to machine code in order for your computer to understand them. Machine code can run several hundred times faster than equivalent AutoHotkey code, which does not compile to machine code.

In the AutoHotkey community the term "MCode" refers to tools and methods used for putting machine code into scripts. These MCode tools normally take code written in a language like C, use a compiler to turn that code into machine code, then turn that machine code either directly into AutoHotkey code or into text that can be loaded using a custom AutoHotkey library and then called by DllCall.

MCode is important for optimization when writing scripts that need to process a relatively large amount of data quickly. Here are some common situations where MCode can be helpful:
  • Encoding or decoding data more than a few kilobytes in size, in formats like JSON
  • Hashing files or large amounts of text
  • Manipulating images like GDI+ bitmaps (especially for custom ImageSearch algorithms)
  • Performing real-time calculations, such as for a physics engine
MCode is not the only way to achieve these performance goals. It is possible, and sometimes more flexible, to use the normal tooling of those compiled languages to produce a standard machine code DLL that can be used from AutoHotkey. However, a script that comes with a custom DLL is harder to share because it takes multiple files, it can take up more disk space than a normal script, and is more likely to be blocked by antivirus and corporate application filters.
The wiki follows this up with a proper introduction to using MCode in AutoHotkey v2.0, with interactive examples where you can adjust the C code being compiled, test different ways to call the C code, and even output code that can be pasted directly into your script so you can use MCode without having a compiler installed.

Re: MCode Tutorial (Compiled Code in AHK)

Post by geek » 22 Dec 2023, 17:50

neogna2 wrote: Suggestion: advise script creators to bundle C source code and mcode creation tips as comments next to the mcode in source that they publish. That way script users have the power to examine the C source, generate mcode from it and thus verify the mcode. Otherwise the user has to blindly trust the mcode, like any binary blob.

Some earlier discussion on verifying mcode with the godbolt online compiler here and here.
For what it's worth, publishing the source of your machine code blobs is not a requirement in many of the situations where you might want to use machine code. And users have to blindly trust code they don't understand all the time, even when it is plain-text AHK, just because it is using techniques they do not understand. But I get the sentiment that where sharing should occur it would be helpful to have instructions of how to properly do so.

The output of godbolt is only somewhat similar to the output of other MCode tools, and will not be similar at all to the output of MCL, at least not in a way that will be obvious to someone who isn't already very familiar with working with machine code. Verifying against godbolt output, although theoretically straightforward, does not sound like a very effective validation method.

Instead, I have much more ambitious plans for safer MCode.

First, following industry standards of reproducible builds, I've developed a Docker container in which AutoHotkey and MCL can run. This container can be connected to platforms like GitHub or GitLab to allow automated building of the MCode. Or in the case of GitHub I guess it could run in a Windows environment, since GitHub Actions is Microsoft-pilled now. Anyways, the point is that all of the build steps for a script utilizing MCode would occur in this third-party environment rather than on the developer's PC, where tampering could occur. Eventually my project cJson is likely to use this as a method for generating releases, and I encourage other developers of large MCode projects to do something similar.

Second and similar, but likely easier to use for small projects, I have plans to update the pastebin (https://p.autohotkey.com/) with access to the CloudAHK Runner sandbox (like the Wiki). I would allow long-lived pastes that include both the originally pastebinned code and the output of the sandbox, so that anyone using the pastebin can prove (at least to the extent that people trust a pastebin at p.autohotkey.com) that their built code matches their source. This would be a difficult interface to use for multi-file C code, but it should work with a little coercion and I imagine most MCode isn't that expansive to begin with. If the long-lived paste did for some reason expire, the compilation could just be run again through the pastebin with similar settings. Though, a compilation now may not perfectly match a compilation months or years prior because of updates to the compiler environment.
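To make the "built code matches their source" check concrete, here is a minimal sketch of what a user could do once they have both a published MCode string and one rebuilt from the source. The function names and the sample strings are hypothetical, and the sketch assumes the rebuild is byte-for-byte reproducible, which may not hold after compiler updates.

```python
import hashlib

def mcode_digest(mcode_text: str) -> str:
    """Return the SHA-256 hex digest of an MCode string, ignoring surrounding whitespace."""
    return hashlib.sha256(mcode_text.strip().encode("ascii")).hexdigest()

def matches_published(published: str, rebuilt: str) -> bool:
    """True when the rebuilt MCode text is identical to the published text."""
    return mcode_digest(published) == mcode_digest(rebuilt)

# Hypothetical strings standing in for a published blob and a local rebuild.
published = "uCoAAADD"
rebuilt = "uCoAAADD\n"
print(matches_published(published, rebuilt))
```

A mismatch does not prove tampering on its own; it only says the two builds differ, which a different compiler version or flag set would also cause.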

I also have some vague ideas involving code signing, where I could provide a service that will sign your MCode against the source code using my private key, but this seems like it may not be as helpful as the pastebin concept.

neogna2
Posts: 600
Joined: 15 Sep 2016, 15:44

Post by neogna2 » 23 Dec 2023, 05:22

My previous post disappeared, due to the forum issues the last few days I assume. But the text is still visible in the quote by geek. Here are the links to previous godbolt mcode discussions again
viewtopic.php?p=445337#p445490
viewtopic.php?t=112753#p539687
geek wrote:
22 Dec 2023, 17:50
publishing the source of your machine code blobs is not a requirement in many of the situations where you might want to use machine code.
Could you give an example where a script creator makes the AutoHotkey source available but still has a good reason to not make the C source for the included mcode available?

My suggestion was made from the POV of users of code posted to the AutoHotkey forum. When someone posts only compiled .exe scripts or .dll files here it is pretty common to ask for the source code. I think that's good (reduces malware risk and eases collaborative/derivative work) and I think the same goes for including the C source when mcode is posted.

I like your reproduction ideas (docker container, GitHub actions, long-lived pastebin) but see them as complements to providing the C source.
geek wrote:
22 Dec 2023, 17:50
The output of godbolt is only somewhat similar to the output of other MCode tools, and is not going to be much similar at all to the output of MCL.
Do you have time to expand on that? A short explainer, here or in the wiki, would be useful on the factors that make the same C source input become different mcode output, such as different compilers and compiler flags.

For example I know SKAN has used Pelles C to generate mcode which I so far haven't been able to reproduce in godbolt. People are of course free to make different such choices, but having a standard or recommended approach and set of choices could also be helpful.

boiler
Posts: 17471
Joined: 21 Dec 2014, 02:44

Post by boiler » 23 Dec 2023, 06:48

geek wrote: For what it's worth, publishing the source of your machine code blobs is not a requirement in many of the situations where you might want to use machine code.
Please describe such a situation, because the forum rules do not note any exceptions to the rule stating “No closed-source scripts/code.”

geek wrote: And users have to blindly trust code they don't understand all the time, even when it is plain-text AHK, just because it is using techniques they do not understand.
This is not a valid reason for not providing source code. It is a bit condescending and patronizing to say, “I don’t need to provide the source code because many of you wouldn’t understand it anyway.” Not fully understanding source code and not being able to see it are not the same thing. When it is open source, there is at least the knowledge that it is available to be inspected and understood by others. Not being hidden itself provides a significant level of comfort to all users. But more importantly, it does allow it to be inspected and understood by anyone who has the requisite knowledge, while of course, closed-source code does not.

Post by geek » 23 Dec 2023, 09:21

neogna2 wrote: Could you give an example where a script creator makes the AutoHotkey source available but still has a good reason to not make the C source for the included mcode available?

My suggestion was made from the POV of users of code posted to the AutoHotkey forum. When someone posts only compiled .exe scripts or .dll files here it is pretty common to ask for the source code. I think that's good (reduces malware risk and eases collaborative/derivative work) and I think the same goes for including the C source when mcode is posted.
I am referring to projects outside the scope of the forum, such as the production of non-FOSS freeware or code meant for personal or internal business use.
neogna2 wrote: Do you have time to expand on that? It would be useful with a short explainer, here or in the wiki, on factors that make the same C source input become different mcode output, such as different compilers and compiler flags.
Previous MCode generation tools would take the compiler's debug output from compiling a single function and manipulate it with regular expressions and other text manipulation techniques in order to extract the hex bytes of the compiled code, preserving, for example, the relative positions of all the bytes. This means their output ends up being very similar to a debug view of the compiler output like the one provided by Godbolt (assuming your Godbolt instance is set to use a similar compiler with similar settings).
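A rough sketch of that regex-based extraction, for illustration only. The listing layout here (line number, 4-digit offset, hex bytes, then the instruction text) is an assumed format, and the function name is hypothetical; real tools each parse their own compiler's output.

```python
import re

# Hypothetical assembler listing: line number, 4-digit offset, hex bytes, source text.
LISTING = """\
   1                    add:
   2 0000 8B442404      movl 4(%esp), %eax
   3 0004 03442408      addl 8(%esp), %eax
   4 0008 C3            ret
"""

# Capture the offset column and the hex byte column; lines without bytes do not match.
LINE_RE = re.compile(
    r"^[ \t]*\d+[ \t]+([0-9A-Fa-f]{4})[ \t]+([0-9A-Fa-f]+)\b",
    re.MULTILINE,
)

def extract_hex(listing: str) -> str:
    """Concatenate the hex byte column of every listing line that has one."""
    return "".join(m.group(2) for m in LINE_RE.finditer(listing)).lower()

print(extract_hex(LISTING))
```

Because the bytes come out in the same order the listing shows them, the result lines up directly with what a disassembly view displays, which is why output from this style of tool is easy to compare by eye.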

MCL takes the .o object file produced by compiling a source file with multiple exported functions and potentially very many extra functions pulled in from other C files by #include. The order of exported functions within the object file is very likely to be different than the order of the functions in the Godbolt debug output. Then MCL uses a custom linker to manipulate the object data, replacing certain bytes in order to make the code more functional. MCL also sets up rules for run-time imports from DLLs, where the contents of the machine code will be altered dynamically each time the script gets run, making static analysis more complicated. Before getting put into the script, the static portion of the binary machine code gets LZ compressed and then base64 encoded, which complicates the reversal process for casual inspection.

To compare MCL output to Godbolt output you would need to get past the encoding, compression, rearrangement, and automated modifications applied both at runtime and compile time, provided Godbolt even supports the compiler that was used. It's not impossible, just complicated. Consequently, Godbolt is not a great way to definitively say that the compiled code matches the source code.
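To illustrate just the encoding and compression layers, here is a small sketch that packs a few machine code bytes and then reverses the process. It uses zlib purely as a stand-in; this is not MCL's actual LZ format, and the function names are hypothetical. The rearrangement and linker modifications described above are not modeled at all.

```python
import base64
import zlib

def pack(code: bytes) -> str:
    """Compress then base64-encode machine code (zlib is a stand-in, not MCL's LZ format)."""
    return base64.b64encode(zlib.compress(code)).decode("ascii")

def unpack(blob: str) -> bytes:
    """Reverse pack(): base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(blob))

code = bytes.fromhex("8b44240403442408c3")
blob = pack(code)

print(blob)                  # text that no longer resembles the hex listing
print(unpack(blob) == code)  # the original bytes are still recoverable
```

Even in this simplified form, the packed text cannot be compared against a disassembly by eye; both layers have to be undone first.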
boiler wrote:
geek wrote: For what it's worth, publishing the source of your machine code blobs is not a requirement in many of the situations where you might want to use machine code.
Please describe such a situation, because the forum rules do not note any exceptions to the rule stating “No closed-source scripts/code.”
I am referring to projects outside the scope of the forum, such as the production of non-FOSS freeware or code meant for personal or internal business use.

That some projects may be using MCode for sharing onto the forum is secondary to the case of people using MCode to perform work. High performance data processing is critical in many cases as a business need. Distributing a script within an organization can be done easily by .ahk file, where source visibility or obscurity is not really the point.

There are an uncountable number of ways to write, use, and share code outside of a forum with a strict open-source policy. That does not negate the requirement to follow the rules of the forum when posting on the forum, and I apologize if it sounded like I meant it did. I just mean that I am not writing a tutorial on posting on the forum, I am writing a tutorial on creating and using MCode.
boiler wrote: This is not a valid reason for not providing source code. It is a bit condescending and patronizing to say, “I don’t need to provide the source code because many of you wouldn’t understand it anyway.” Not fully understanding source code and not being able to see it are not the same thing. When it is open source, there is at least the knowledge that it is available to be inspected and understood by others. Not being hidden itself provides a significant level of comfort to all users. But more importantly, it does allow it to be inspected and understood by anyone who has the requisite knowledge, while of course, closed-source code does not.
I do not mean to be condescending or patronizing. I mean to evoke the mental image of non-forum software distribution situations like tools created and distributed by Nir Sofer over at NirSoft, cresstone.com's apps like Shutdown Blocker, Antibody Software's WizTree, etc. There are tons of people online, outside of forum situations, to whom an open-source requirement policy does not apply, producing useful apps where, if they had been written using AHK, MCode would have been extremely valuable to them. People download and run these applications all the time; it is just the nature of freeware.

If you would like me to remove my guide for making machine code because it does not guide people how to follow the rules of the forum, I will gladly do so.

Post by boiler » 23 Dec 2023, 10:01

No, sorry. I didn’t understand the reason for you stating that. There is no reason to remove your guide. Thanks for clarifying.

And more importantly, I owe you a huge apology for accidentally editing out most of your post when trying to trim parts of it I meant to use for my reply. I tried to recover your original text, but was not able to. I have no excuse as I need to be much more careful! I’m very sorry!

gregster
Posts: 9124
Joined: 30 Sep 2013, 06:48

Post by gregster » 23 Dec 2023, 10:03

I think I can retrieve geek's original post from an open tab that I still have.

Edit: geek was faster.

Post by geek » 23 Dec 2023, 10:08

I had a copy of the original content too, I've restored the original post content.

Post by boiler » 23 Dec 2023, 10:26

Thank you! Need to be more careful posting from my phone. With limited/trivial power comes great responsibility! ;)

Post by neogna2 » 23 Dec 2023, 17:28

geek wrote:
23 Dec 2023, 09:21
I am referring to projects outside the scope of the forum, such as the production of non-FOSS freeware or code meant for personal or internal business use.
Aha, that makes sense. I guess we were talking past each other a bit. Cleared up now. :thumbup:
Thanks for the expanded information on MCL vs godbolt, it helped me. Consider adding it to the wiki, as I think others can learn from it too.

iseahound
Posts: 1479
Joined: 13 Aug 2016, 21:04

Post by iseahound » 24 Dec 2023, 00:03

Honestly, if you want my feedback, it's too complex. (All of it). Even I don't understand the problem that is being solved.

The reason for machine code (as I see it) is speed. That means assembly. We only use C because hand writing assembly is hard.

So your wiki should be more focused on methods to increase the speed of AutoHotkey using MCode while reducing the complexity of such an endeavor. I think godbolt.org is a great choice to disassemble the C output. I'm honestly shocked it's not at the very top of your wiki. Nor are there simple, lightweight examples that can be constructed using a loop to parse a hexadecimal string of assembly instructions, or Lexikos' NumPut methods (see here: viewtopic.php?f=6&t=21223). The MCL library should be at the bottom: normal MCode users should not be forced to place an "export" function in their cross-platform, cross-compiler C code.

Post by geek » 24 Dec 2023, 08:22

iseahound wrote:I don't understand the problem that is being solved
The examples are fabricated problems that could absolutely be solved other ways. Problems that benefit from MCode are difficult to stage in a tutorial like this because they are typically very contextual. Yesterday, for example, I was helping to improve someone's script meant to simulate the travel of electrons across a two-dimensional material, by calculating the velocity of each electron from the effect of all other electrons in the simulation. This essentially means performing n^2 floating point calculations, where n is the quantity of electrons which, for an interesting simulation, can trend into the hundreds.

https://p.ahkscript.org/?p=850aeef9 (uncompiled)
https://p.ahkscript.org/?p=996132b1 (compiled)
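For a sense of why this kind of workload grows so quickly, here is a minimal sketch of an all-pairs loop. It is not the code from the pastes above; the function names, the inverse-square repulsion, and the `strength` parameter are assumptions made for illustration, and it assumes no two particles share the same position.

```python
import math

def pairwise_accelerations(positions, strength=1.0):
    """Sum an inverse-square repulsion on each particle from every other particle."""
    result = []
    for i, (xi, yi) in enumerate(positions):
        ax = ay = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue  # a particle does not act on itself
            dx, dy = xi - xj, yi - yj
            dist_sq = dx * dx + dy * dy
            dist = math.sqrt(dist_sq)
            magnitude = strength / dist_sq
            # Scale the unit vector pointing away from particle j.
            ax += magnitude * dx / dist
            ay += magnitude * dy / dist
        result.append((ax, ay))
    return result

def interaction_count(n):
    """Number of inner-loop evaluations per simulation step for n particles."""
    return n * (n - 1)

print(pairwise_accelerations([(0.0, 0.0), (1.0, 0.0)]))
print(interaction_count(300))
```

Each simulation step repeats the whole nested loop, so a few hundred particles already means tens of thousands of square roots and divisions per step. That inner loop is the part worth moving into compiled code.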
iseahound wrote:The MCL library should be at the bottom: Normal MCode users should not be forced to place a "export" function in their cross-platform cross-compiler C code.
The export is not mandatory for single-function embeddings, but it paves a clear pathway to multi-function embeddings which is something traditional mcode tools have always fallen short on. "Normal" MCode users also have to contend with issues like non-functional floating point constants, broken ability to call across multiple defined functions even when exporting only a single function, etc.

Post by geek » 27 Dec 2023, 23:46

I have made adjustments to the tutorial to start with the basics of MCode and with the simple examples that can be easily implemented using godbolt, before moving into MCode features that are challenging to implement with traditional tools (like multiple-function codes or codes with global variables). I do intend to write more, moving into examples where MCode is more practical than pure-ahk code (such as for a custom data encoder) but I do not have the time to do so right now.

Post by neogna2 » 28 Dec 2023, 06:08

@geek Great update! It is now easier for beginners to understand the mcode examples and the page then stepwise guides us to the point where we see the usefulness of MCL.

Two small suggestions:

At "the provided function will output a string like this" add a mention that in the MCode4GCC string the mcode parts are Base64, as compared to Hex in the earlier godbolt examples. Maybe with a link to https://en.wikipedia.org/wiki/Base64#Examples or https://stackoverflow.com/q/3183841 ?

In section "3. Multiple Functions" at "If you look in the code" add a godbolt link https://godbolt.org/z/G16YrYWsY

Post by geek » 28 Dec 2023, 10:12

neogna2 wrote: In section "3. Multiple Functions" at "If you look in the code" add a godbolt link https://godbolt.org/z/G16YrYWsY
I hesitate to post just godbolt share links, out of concern that the content at that link may expire or change at some point in the future, which would cause confusion. Though, the godbolt admins seem to indicate that there are no plans to introduce expiration. What do you think of a link and an image or plain-text replication of the full output text?
neogna2 wrote: At "the provided function will output a string like this" add a mention that in the MCode4GCC string the mcode parts are Base64, as compared to Hex in the earlier godbolt examples. Maybe with a link to https://en.wikipedia.org/wiki/Base64#Examples or https://stackoverflow.com/q/3183841 ?
Some more context about base64 sounds like a good idea. I'd love to write an article about it myself but don't really have the time, so I might just add a brief description to the MCode article highlighting its purpose and advantages/disadvantages compared to hexadecimal, and add the Wikipedia link.
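As a quick illustration of the size trade-off between the two encodings, the sketch below encodes the same six bytes of x86 machine code (`mov eax, 42` followed by `ret`) both ways. The byte sequence is chosen only as an example and is not taken from the tutorial.

```python
import base64

code = bytes.fromhex("b82a000000c3")  # x86: mov eax, 42 ; ret

hex_text = code.hex()                               # two characters per byte
b64_text = base64.b64encode(code).decode("ascii")   # four characters per three bytes

print(len(code), len(hex_text), len(b64_text))  # prints: 6 12 8
```

Hex doubles the size but each pair of characters maps to exactly one byte, so it can be read alongside a disassembly. Base64 is about a third more compact than hex, but its characters do not line up with byte boundaries.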

More generally, what topics would people like to see covered first? There's so much potential for MCode that I'm not sure how to prioritize.
  • Working with callbacks created by CallbackCreate. (See part 5)
  • Using MCode functions instead of CallbackCreate callbacks.
  • Calling functions from DLLs like the Windows API or DLL files you have on hand like lua54.dll. Especially important to access common stdlib functions like sqrt. (See part 6)
  • Interacting with AutoHotkey objects (well, COM objects generally) from the C code.
  • Working with structs populated by AHK then processed by MCode and vice versa.
  • Processing GDI+ bitmaps from MCode
  • Inventing new COM objects that get exported to AHK, like a high performance hashmap

Post by geek » 08 Feb 2024, 10:08

Some UI for MCL has been built out now, allowing you to invoke the compiler directly from the browser by a three step process:

1. Paste C code into wiki page
2. Press compile
3. Copy mcode out of wiki page

I put the component on this page https://autohotkey.wiki/cloudahk#compiler though I may put another copy of it on this tutorial page too. Not sure yet.

The buttons for changing optimization and other settings aren't implemented yet, but shouldn't be too hard to add once I find the time. If you visit the page and find it just entirely isn't working, you may need to force-refresh the page cache with Control-F5 (Chrome browsers) or Shift-F5 (Firefox) to get the new javascript to load.

Post by iseahound » 08 Feb 2024, 22:57

nice work.

thqby
Posts: 442
Joined: 16 Apr 2021, 11:18

Post by thqby » 02 May 2024, 08:08

Recently, I modified geek's PEObjectLinker to support linking *.obj files generated by the msvc C/C++ compiler as if you were writing a dll with export functions, and then processing it with the ahk linker to generate mcode that can be used in ahk.

https://github.com/thqby/ahk2_lib/blob/master/MCode/example.ahk

The generation process produces output similar to the following. You can copy the bytecode to https://shell-storm.org/online/online-assembler-and-disassembler/ for disassembly, and compare it with the output of https://godbolt.org/ or dumpbin.exe /disasm code.obj. The generated mcode consists of a number of function blocks arranged in a certain order, with some relative addresses filled in and CC bytes used as padding for alignment, followed by data segments (strings, etc.) and the export, import, and relocation tables at the tail.

Code: Select all

BASE64(LZCompress):
b3,pbAAi0QkBFZqAGgAbAAAAP80xYAFAGCNAzBqAP8VTAEAOIvGXsIEAMwhBABqBOgNAFCDxAAExwBhaGsAwzEAJv8lUAAkBwBIZQhsbG8ADkdvb2QAYnllAGRvZwAAY2F0AENhbGwAIE1Db2RlIEYAdW5jdGlvbgCqVAAnFAADXAADWgADKmQAA8gAA2gAA4QDAACYgJCAiICAAIBGHhYPCEwCADAAgIAD

EXPORT OFFSETS:
128	?Map@@3PAUMyMap@@A
0	?call_msgbox_and_return_item@@YGPAUMyMap@@H@Z
48	?new_char_array_and_set_val@@YGPADXZ

32BIT CODE WITH 21 BYTE HEADER:
0000	8b 44 24 04 56 6a 00 68  6c 00 00 00 ff 34 c5 80   .D$.Vj.hl....4..
0010	00 00 00 8d 34 c5 80 00  00 00 6a 00 ff 15 4c 00   ....4.....j...L.
0020	00 00 8b c6 5e c2 04 00  cc cc cc cc cc cc cc cc   ....^...........
0030	6a 04 e8 0d 00 00 00 83  c4 04 c7 00 61 68 6b 00   j...........ahk.
0040	c3 cc cc cc ff 25 50 00  00 00 00 00 00 00 00 00   .....%P.........
0050	00 00 00 00 48 65 6c 6c  6f 00 00 00 47 6f 6f 64   ....Hello...Good
0060	62 79 65 00 64 6f 67 00  63 61 74 00 43 61 6c 6c   bye.dog.cat.Call
0070	20 4d 43 6f 64 65 20 46  75 6e 63 74 69 6f 6e 00    MCode Function.
0080	54 00 00 00 14 00 00 00  5c 00 00 00 5a 00 00 00   T.......\...Z...
0090	64 00 00 00 c8 00 00 00  68 00 00 00 84 03 00 98   d.......h.......
00a0	80 90 80 88 80 80 80 46  1e 16 0f 08 4c 02 30 00   .......F....L.0.
00b0	80 80 03                                           ...

RELOCATION OFFSETS:
8, 15, 22, 30, 70, 128, 136, 144, 152

IMPORT TABLE ENTRY OFFSET: 76
user32:MessageBoxA
msvcrt:??_U@YAPAXI@Z
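One way to read the dump above is to look at what sits at each relocation offset. The sketch below copies the first 160 bytes of the hex dump and assumes each relocation offset marks a 32-bit little-endian value that is relative to the start of the blob. That interpretation is an assumption based on the values themselves, not a description of the linker's documented format, and the helper names are hypothetical.

```python
import struct

# First 0xA0 bytes of the hex dump above (ASCII column omitted).
DUMP_HEX = """
8b 44 24 04 56 6a 00 68 6c 00 00 00 ff 34 c5 80
00 00 00 8d 34 c5 80 00 00 00 6a 00 ff 15 4c 00
00 00 8b c6 5e c2 04 00 cc cc cc cc cc cc cc cc
6a 04 e8 0d 00 00 00 83 c4 04 c7 00 61 68 6b 00
c3 cc cc cc ff 25 50 00 00 00 00 00 00 00 00 00
00 00 00 00 48 65 6c 6c 6f 00 00 00 47 6f 6f 64
62 79 65 00 64 6f 67 00 63 61 74 00 43 61 6c 6c
20 4d 43 6f 64 65 20 46 75 6e 63 74 69 6f 6e 00
54 00 00 00 14 00 00 00 5c 00 00 00 5a 00 00 00
64 00 00 00 c8 00 00 00 68 00 00 00 84 03 00 98
"""
code = bytes.fromhex("".join(DUMP_HEX.split()))

RELOCATION_OFFSETS = [8, 15, 22, 30, 70, 128, 136, 144, 152]

def reloc_targets(blob, offsets):
    """Read the 32-bit little-endian value stored at each relocation offset."""
    return {off: struct.unpack_from("<I", blob, off)[0] for off in offsets}

def c_string_at(blob, offset):
    """Decode the NUL-terminated ASCII string starting at offset."""
    end = blob.index(b"\x00", offset)
    return blob[offset:end].decode("ascii")

targets = reloc_targets(code, RELOCATION_OFFSETS)
for off, target in targets.items():
    print(f"reloc at {off:3d} -> blob offset {target}")

# Offsets 8 and 128..152 hold values that land on the embedded strings.
for off in (8, 128, 136, 144, 152):
    print(off, repr(c_string_at(code, targets[off])))
```

Read this way, the values at offsets 15 and 22 point at 128 (the `Map` export), and those at 30 and 70 point at 76 and 80, which matches the listed import table entry offset and the 4-byte slot after it.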
