How to obtain the memory address of a variable in AutoHotkey v2?

teadrinker · 26 Mar 2024, 13:18

Guys, maybe I don't understand something obvious, but there is a classical way to convert an unsigned number into a signed one: check the high bit, if it is equal to one, then invert all bits, add one and change the sign.

Code:

ushortNum := 64759
shortNum := ushortNum >> 15 ? -((ushortNum ^ 0xFFFF) + 1) : ushortNum
MsgBox shortNum

lexikos · 28 Mar 2024, 22:17

iseahound wrote: ↑
26 Mar 2024, 09:19
If you think the masking is important, then just say so, otherwise I've clearly given the exact method I normally use.

To be clear, I never criticized your method of using NumPut and NumGet to extract a number of one type from a value of another type. I criticized:

How you introduced it as "the current method", not "the exact method I normally use". This gives a misleading implication that it is the method generally accepted or used. There have been many examples of bitwise operators used for this purpose in publicly available scripts over the years, and not so many using NumPut.
Your argument against the use of macros that no one previously brought up. (Looks like a "straw man fallacy".)
Incorrect: "signed values can't be retrieved using bitwise operators"
Incorrect: "a purely bitwise operation can't be compatible across 32-bit and 64-bit versions."

iseahound wrote: In any case, you proved yourself that the exact method the C/C++ code uses to sign extend isn't based off the size of an integer buffer (apparently hard coded to 8 bytes in AHK) but rather it calls the sign extend assembly instruction via type casts.

Why do you think it is a good idea to tell me what I did or wrote, especially after you've already been contradicted several times in this topic? All you've achieved is to show how little you understand it.

Assembly instructions aren't "called", nor do the type-casts directly correspond to instructions. As I said, "The effect of the chain of type-casts depends on where the expression is used." The type-cast just establishes context for the compiler. GET_X_LPARAM(lParam) ultimately just gets the low 16 bits of lParam, and is established (at compile time) to be signed. If you were to use short x = GET_X_LPARAM(lParam);, there would obviously be no sign-extension. If you use __int64 x = GET_X_LPARAM(lParam);, the language rules indicate that sign-extension is required, therefore the compiler will output a sign-extending instruction. It is the same for __int64 x = y; where y is of type short and there are no type-casts.

NumGet isn't based off the size of an integer buffer either, nor is that size "hard coded to 8 bytes in AHK". When you use NumGet, you specify which type you want to get. Internally, NumGet reinterprets (by way of type cast) the address as a pointer of the specified type, dereferences it and assigns the result to an __int64 or double variable. If you specified "short", the exact C++ code used is aResultToken.value_int64 = *(short *)op.target;. Here the compiler must take a short from the address contained by op.target, sign-extend it to 64-bit and store in aResultToken.value_int64.

NumGet is effectively just a combination of a type-cast and dereference. (Interpreting the parameters is probably the most costly part.)

Sign-extension is most likely achieved using the same instruction either way, but such details are irrelevant, as long as the compiler achieves whatever result is correct according to the language spec.

I did not prove any of this, merely explained it.

If you wish to understand how the compiler achieves its result, you can inspect the generated assembly. Take this C++ snippet for example:

Code:

void NumGetShort(__int64 *a, UINT_PTR b) {
	*a = *(short*)b;
}

void NumGetInt64(__int64 *a, UINT_PTR b) {
	*a = *(__int64*)b;
}

The disassembly may look like this:

Code:

?NumGetInt64@@YAXPEA_J_K@Z PROC                         ; NumGetInt64, COMDAT
  00000 48 8b 02         mov     rax, QWORD PTR [rdx]
  00003 48 89 01         mov     QWORD PTR [rcx], rax
  00006 c3               ret     0
?NumGetInt64@@YAXPEA_J_K@Z ENDP

?NumGetShort@@YAXPEA_J_K@Z PROC                         ; NumGetShort, COMDAT
  00000 48 0f bf 02      movsx   rax, WORD PTR [rdx]
  00004 48 89 01         mov     QWORD PTR [rcx], rax
  00007 c3               ret     0
?NumGetShort@@YAXPEA_J_K@Z ENDP

Here the parameters are already in registers due to the x64 calling convention, but the real NumGet would do more work leading up to this, with the value of op.target and the address of aResultToken.value_int64 ending up in registers. In one case there are just two mov instructions (one is insufficient because only one operand of mov can be a memory operand, i.e. a dereferenced register). In the other case, one of the mov instructions is replaced with movsx, the sign-extending version of mov.

iseahound wrote: I tend to avoid bit-magic that relies on the size of the underlying buffer,

Even at the assembly level, behaviour is defined by the data type, not "the size of the underlying buffer", which is also defined by the data type. It is well known and documented that AutoHotkey uses signed 64-bit integers.

As for how any of this relates to the original topic, you (iseahound) first suggested that retrieving the address of the number contained by a variable would allow NumGet to be used on it directly. It has since been proven that such a thing isn't needed for extracting signed shorts from a message parameter or similar.

However, if you have an integer value (perhaps received by a CallbackCreate or OnMessage callback) and want to reinterpret it as a floating-point number, NumPut and NumGet are probably the most practical way.
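
As a rough illustration of what such a NumPut/NumGet round trip amounts to, here is a minimal C++ sketch (not AutoHotkey's actual implementation). The helper names bits_to_float and bits_to_double are made up for this example, and it assumes IEEE-754 floats and doubles; the point is only that the bytes of the integer are copied into a floating-point variable of the same size, with no numeric conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers, named for this sketch only.
// Assumption: float and double use the IEEE-754 32-bit and 64-bit formats.

// Reinterpret the low 32 bits of an integer as a float.
inline float bits_to_float(std::int64_t value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float assumed to be 32-bit");
    std::uint32_t low = static_cast<std::uint32_t>(value);  // discard the upper 32 bits
    float result;
    std::memcpy(&result, &low, sizeof result);  // copy the bytes, no numeric conversion
    return result;
}

// Reinterpret all 64 bits of an integer as a double.
inline double bits_to_double(std::int64_t value) {
    static_assert(sizeof(double) == sizeof(std::int64_t), "double assumed to be 64-bit");
    double result;
    std::memcpy(&result, &value, sizeof result);
    return result;
}

// Spot checks with well-known bit patterns.
inline void check_reinterpretation() {
    assert(bits_to_float(0x3F800000) == 1.0f);
    assert(bits_to_double(0x3FF0000000000000) == 1.0);
}
```

For instance, 0x3F800000 is the single-precision bit pattern for 1.0, so a callback that receives that integer would recover 1.0 this way.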

@teadrinker if I'm not mistaken, ushortNum >= 0x8000 ? ushortNum - 0x10000 : ushortNum would achieve the same result without bitwise operators. The inverse would be shortNum < 0 ? 0x10000 + shortNum : shortNum.
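
A quick way to check this without taking it on trust is to compare both variants over the whole 16-bit range. The following is a small C++ sketch in which std::int64_t stands in for AutoHotkey's 64-bit integers; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// std::int64_t stands in for AutoHotkey's 64-bit integers; all names are hypothetical.

// Unsigned 16-bit value (0..65535) -> signed 16-bit value (-32768..32767), by subtraction.
constexpr std::int64_t ushort_to_short(std::int64_t u) {
    return u >= 0x8000 ? u - 0x10000 : u;
}

// The inverse: signed 16-bit value -> unsigned 16-bit value.
constexpr std::int64_t short_to_ushort(std::int64_t s) {
    return s < 0 ? 0x10000 + s : s;
}

// The xor-and-negate variant posted earlier in the thread, for comparison.
constexpr std::int64_t ushort_to_short_xor(std::int64_t u) {
    return (u >> 15) ? -((u ^ 0xFFFF) + 1) : u;
}

// Returns true if both variants agree and the inverse round-trips for every 16-bit input.
inline bool variants_agree() {
    for (std::int64_t u = 0; u <= 0xFFFF; ++u) {
        if (ushort_to_short(u) != ushort_to_short_xor(u)) return false;
        if (short_to_ushort(ushort_to_short(u)) != u) return false;
    }
    return true;
}

// The value used in the first post: 64759 - 65536 = -777.
inline void check_example() {
    assert(ushort_to_short(64759) == -777);
    assert(variants_agree());
}
```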

Understanding any of these methods probably requires an understanding of two's complement.

vmech · 28 Mar 2024, 23:31

lexikos wrote: ↑
28 Mar 2024, 22:17
Understanding any of these methods probably requires an understanding of two's complement.

It's even nice to be back at school again. Thank you!

teadrinker · 29 Mar 2024, 11:54

lexikos wrote: ushortNum >= 0x8000 ? ushortNum - 0x10000 : ushortNum

It turned out that this variant has even better performance. But the one I gave also gives you a better idea of how negative numbers are organized in memory.

iseahound · 29 Mar 2024, 21:51

No one needs to agree with my opinion. They're just simple facts:

1. Bit magic is pretty much solved so it doesn't really represent a good body of reusable knowledge, but rather a collection of curios.
2. Numbers seem to pop up out of nowhere.

For 1, see https://graphics.stanford.edu/~seander/bithacks.html (really good and classic source for bit hacks)
For 2 just look up De Bruijn numbers

Seven0528 · 29 Mar 2024, 22:43

iseahound wrote: ↑
29 Mar 2024, 21:51
No one needs to agree with my opinion. They're just simple facts:

1. Bit magic is pretty much solved so it doesn't really represent a good body of reusable knowledge, but rather a collection of curios.
2. Numbers seem to pop up out of nowhere.

For 1, see https://graphics.stanford.edu/~seander/bithacks.html (really good and classic source for bit hacks)
For 2 just look up De Bruijn numbers

So are you claiming that the method using bit shifting is incorrect or inappropriate? Um...
(Since we have strayed too far from the original topic, this will be my final reply. Please understand.)

Code:

//  C++
#define GET_X_LPARAM(lp) ((int)(short)LOWORD(lp))
#define LOWORD(l) ((WORD)(((DWORD_PTR)(l)) & 0xffff))

#define GET_Y_LPARAM(lp) ((int)(short)HIWORD(lp))
#define HIWORD(l) ((WORD)((((DWORD_PTR)(l)) >> 16) & 0xffff))

#define MAKELPARAM(l, h) ((LPARAM)(DWORD)MAKELONG(l, h))
#define MAKELONG(a, b) ((LONG)(((WORD)(((DWORD_PTR)(a)) & 0xffff)) | ((DWORD)((WORD)(((DWORD_PTR)(b)) & 0xffff))) << 16))

Code:

;  AHK
GET_X_LPARAM(lParam)    => (lParam << 48 >> 48) ;  Int  ==  Short
LOWORD(l)               => (l << 48 >>> 48) ;  WORD  ==  UShort

GET_Y_LPARAM(lParam)    => (lParam << 32 >> 48) ;  Int  ==  Short
HIWORD(l)               => (l << 32 >>> 48) ;  WORD  ==  UShort

MAKELPARAM(l, h)        => (lp := (l & 0xffff) | ((h & 0xffff) << 16), A_PtrSize == 8 ? lp : (lp << 32 >> 32)) ;  LPARAM  ==  Ptr
MAKELONG(a, b)          => (((a & 0xffff) | ((b & 0xffff) << 16)) << 32 >> 32) ;  LONG  ==  Int

iseahound · 29 Mar 2024, 23:13

Well how can you determine the position of the sign bit? Many readers of this forum did not know that AHK uses 64-bit integers on both 32- and 64-bit versions (including myself). Since you have to know the position of the sign bit (or know the size of the integer data type) to fully realize bitwise operations, and this is not common knowledge, it wasn't appropriate (to do things like <<32>>32) until lexikos gave his reply in this thread.

I could find the following snippets from the v1.1 → v2.0 changes page

Integer constants and numeric strings outside of the supported range (of 64-bit signed integers) now overflow/wrap around, instead of being capped at the min/max value. This is consistent with math operators, so 9223372036854775807+1 == 9223372036854775808 (but both produce -9223372036854775808). This facilitates bitwise operations on 64-bit values.

The ~ (bitwise-NOT) operator now always treats its input as a 64-bit signed integer; it no longer treats values between 0 and 4294967295 as unsigned 32-bit.

but couldn't really find clear documentation that the integers are 64-bit.

Likewise, the page on Pure Numbers https://www.autohotkey.com/docs/v2/Concepts.htm#pure-numbers says AHK primarily uses 64-bit integers, but it doesn't seem like there are any scenarios where this isn't the case according to this thread.

In other words, I do not program based off the "run once it works" mentality, because I need a stronger form of proof that my code is consistent across all scenarios to be reliable. It only takes one yet-to-be-discovered counterexample to prove that these bitwise manipulations cannot be trusted to the same extent they are in C/C++ where data types are explicitly specified. I think it would be appropriate to update the documentation pages to show some bitwise manipulation examples (outside of DllCall and CallbackCreate), and maybe state explicitly that all integers are 64-bit (or list out the few cases where it doesn't apply).

Seven0528 · 30 Mar 2024, 00:02

Ah, if you're suggesting that it should be made clearer that both 32-bit and 64-bit systems use Int64, then I agree.
It's a definite fact, so one can confidently write code based on it, but... As you mentioned, it's not something easily understandable enough to be considered "common knowledge," and I too had to struggle quite a bit to grasp this fact. (Just a few months ago, I was under the same misconception.)

While engaged in self-study, I've seen discussions on Stack Overflow threads explaining that the 32-bit or 64-bit executable file is separate from the 64-bit data type.
This might be common knowledge to some people... but considering the need to differentiate between 32-bit and 64-bit data types when using DllCall, there could be potential for misunderstanding.
(Especially considering that such concepts need to be learned while using DllCall, which is not a conventional AHK command, I believe there is ample room for misunderstanding.)

Although the official documentation mentions AutoHotkey using Int64 several times,
I've never seen any mention that this is not dependent on whether it's 32-bit or 64-bit.
I'm not sure if I missed relevant information or if it simply wasn't deemed necessary to mention,
but it would be helpful if it were clearly stated in the official documentation.

Although it wasn't my intention, this thread has provided me with an opportunity to delve deeper into bitwise operations.
I hope this thread has been beneficial to you as well.
While I feel somewhat uneasy about seeming to highlight someone's oversight, I hope you don't mind too much.
Have a nice day! @iseahound.

(This is indeed my final response.)

lexikos · 31 Mar 2024, 03:55

The documentation now explicitly states that bitwise operators perform 64-bit integer operations.

teadrinker wrote: ↑
29 Mar 2024, 11:54
[The variant] I gave also gives you a better idea of how negative numbers are organized in memory.

I'm not sure I even understand what you mean by "organized in memory". If you mean how the negative number relates to the exact sequence of bits (1s and 0s), I think the lack of symmetry in your formula obscures even that. I didn't fully understand it until I considered that unary negation literally finds the two's complement of its operand, and that's exactly what ((ushortNum ^ 0xFFFF) + 1) does (but for 16-bit).

Your method is finding the 16-bit two's complement of ushortNum, and then finding the 64-bit two's complement of the result, much the same as this:

Code:

shortNum := ushortNum >> 15 ? (((ushortNum ^ 0xFFFF) + 1) ^ 0xFFFFFFFFFFFFFFFF) + 1 : ushortNum

The way I look at it, we don't read and write memory in bits, but in chunks of bits. We don't manipulate memory solely using bitwise operators, but using all kinds of operators and functions, and code that generally expresses numbers in hexadecimal and decimal. I don't think that using xor, addition and unary negation to get from unsigned to signed is any more informative than using simple subtraction.

I came up with ushortNum - 0x10000 on the spot, after considering the difference (mathematically, as it turns out) between the unsigned value and the negative value.

iseahound wrote: ↑
29 Mar 2024, 21:51
They're just simple facts:

That's your opinion, actually.

iseahound wrote: ↑
29 Mar 2024, 23:13
Well how can you determine the position of the sign bit?

Are you ignoring the other "bit magic" methods that were demonstrated, which do not rely on the position of the sign bit?

In any case, I think the simplest way, making the fewest assumptions, is like this:

Code:

MsgBox position_of_sign_bit()

position_of_sign_bit() {
    Loop
        if (1 << A_Index) < 0
            return A_Index
}

There is no need for optimization as one can simply retain the result until the program exits; otherwise, a smarter method can be designed to rely on the assumption that the sign bit can only reasonably be at 7, 15, 31 or 63. You are screwed if the sign bit is at 7 or 15 (since you want to work with a 32-bit lParam), and << doesn't support shifting above 63 (as per the documentation), so you may as well just check if it is at 31, and when it isn't, assume it is at 63.

Code:

sign_bit := (1 << 31) < 0 ? 31 : 63
MsgBox sign_bit

Then the value can be used with certainty, to calculate the number of bits that the X or Y coordinate must be shifted to put its sign bit into the right place.
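
To make that calculation concrete, here is a hedged C++ sketch that derives the shift counts from the sign bit position. It assumes the check above yields 63, as it would with 64-bit integers, and it emulates the two shift operators with well-defined C++ operations; shl, sar, get_x and get_y are names invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 63 is what the sign_bit expression yields with 64-bit signed integers.
constexpr int kSignBit = 63;

// Left shift on the unsigned representation, so bits may move into or past the sign bit.
// The conversion back to signed is modular (guaranteed since C++20, usual behaviour before).
constexpr std::int64_t shl(std::int64_t v, int n) {
    return static_cast<std::int64_t>(static_cast<std::uint64_t>(v) << n);
}

// Arithmetic (sign-propagating) right shift, written so it only shifts non-negative values.
constexpr std::int64_t sar(std::int64_t v, int n) {
    return v < 0 ? ~(~v >> n) : v >> n;
}

// X is the low word: move bit 15 up to the sign bit, then shift back down by the same amount.
constexpr std::int64_t get_x(std::int64_t lParam) {
    return sar(shl(lParam, kSignBit - 15), kSignBit - 15);
}

// Y is the high word: move bit 31 up to the sign bit, then shift down until bit 16 lands at bit 0.
constexpr std::int64_t get_y(std::int64_t lParam) {
    return sar(shl(lParam, kSignBit - 31), kSignBit - 15);
}

// 0xFFFEFFFF packs x = -1 (0xFFFF) and y = -2 (0xFFFE).
inline void check_coordinates() {
    assert(get_x(0xFFFEFFFF) == -1);
    assert(get_y(0xFFFEFFFF) == -2);
}
```

With kSignBit at 63 the counts come out as 48 and 48 for X, and 32 and 48 for Y, which matches the << 48 >> 48 and << 32 >> 48 expressions posted earlier in the thread.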

iseahound wrote: Many readers of this forum did not know that AHK uses 64-bit integers on both 32- and 64-bit versions (including myself).

I think it is safe to assume that a group largely composed of people new to programming would have "many" members who have not even learned what "64-bit" means. So what you say is probably true, though I think "many readers of this forum" is misdirection and not something you actually know. There is no shame in saying "I didn't know that".

Some may hold the unfounded misconception that AutoHotkey 32-bit is limited to 32-bit integers, but it is more reasonable to assume that "AutoHotkey primarily uses [...] 64-bit signed integers (int64)" and "64-bit signed values are supported", completely absent any mention of AutoHotkey build or other limitations for specific operators, means that all builds support integer values in this full range, and operations will be 64-bit.

If AutoHotkey 32-bit can restrict general use of integers to 32-bit without any such restriction being documented, why not the same for DllCall and NumPut? I'm sure you didn't consider that your NumPut/NumGet method might fail due to such a reason.

It was common knowledge (and documented) before 2010 that 64-bit integers were supported, and at that time, all versions were 32-bit. Come to think of it, 64-bit integers were supported and used for math operations even back when numbers were stored in variables as strings.

iseahound wrote: It only takes one yet-to-be-discovered counterexample to prove that these bitwise manipulations cannot be trusted

Can you prove that the same argument does not apply equally to your own method of using NumPut and NumGet?

iseahound wrote: Since you have to know the position of the sign bit (or know the size of the integer data type) to fully realize bitwise operations, and this is not common knowledge, it wasn't appropriate (to do things like <<32>>32) until lexikos gave his reply in this thread.

This technique was already demonstrated in the documentation, and therefore can be assumed safe to use. It is common sense that one does not need to understand the underlying mechanics of the code to put it to use reliably. It is certainly appropriate to put documented examples to use.

If an incoming parameter is intended to be a signed integer, any negative numbers can be revealed by following either of the following methods:
Code:
; Method #1
if (wParam > 0x7FFFFFFF)
    wParam := -(~wParam) - 1

; Method #2: Relies on the fact that AutoHotkey natively uses signed 64-bit integers.
wParam := wParam << 32 >> 32
Source: CallbackCreate - Syntax & Usage | AutoHotkey v2

It even says "AutoHotkey natively uses signed 64-bit integers", though the example can be trusted to work regardless.

In the registry, REG_DWORD values are always expressed as positive decimal numbers. If the number was intended to be negative, convert it to a signed 32-bit integer by using OutputVar := OutputVar << 32 >> 32 or similar.
Source: RegRead - Syntax & Usage | AutoHotkey v2
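
For readers more used to C-style casts, the effect of << 32 >> 32 on a 64-bit integer can be sketched in C++ as a narrowing cast followed by a widening one. This is only an emulation under the assumption of 64-bit signed integers, with hypothetical function names; the subtraction form is included for comparison.

```cpp
#include <cassert>
#include <cstdint>

// Emulates value << 32 >> 32 on a 64-bit signed integer: keep the low 32 bits, then sign-extend.
// The narrowing cast is modular (guaranteed since C++20, usual behaviour before).
constexpr std::int64_t sign_extend_32(std::int64_t value) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(value));
}

// Same result by subtraction, valid for inputs in the range 0..0xFFFFFFFF.
constexpr std::int64_t dword_to_int(std::int64_t value) {
    return value > 0x7FFFFFFF ? value - 0x100000000 : value;
}

// A REG_DWORD read back as 4294967295 was presumably meant to be -1.
inline void check_dword() {
    assert(sign_extend_32(4294967295) == -1);
    assert(dword_to_int(4294967295) == -1);
}
```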

iseahound wrote: this is not common knowledge

Would you say that it is common knowledge that NumPut and NumGet can be used to extract the X and Y components of lParam? Would the majority of users have ever even had reason to consider doing such a thing?

iseahound wrote: but couldn't really find clear documentation that the integers are 64-bit.

What integers? Integers stored in variables, arrays, map elements? Integers used as keys in a map, indices in an array? Integers passed to various built-in functions? You want a clear answer, but "are integers 64-bit?" is the wrong question.

The implementation would be free to store an integer as 8-bit if it was in that range, while still supporting signed 64-bit integers as documented. The user does not need to know which data type is actually used in a variable, because you can't get its address (surprise, we return to the original topic!).

In the context of the bitwise operators, it does not matter what type the input value has or how it was stored, only the actual value and how the operator itself behaves. For instance, ~ in v1 operates as 32-bit or 64-bit depending on the input value itself, not the type of the input value (and certainly not the build).
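
As a sketch of that difference, going only by the changes-page wording quoted earlier in the thread (v1 treats values between 0 and 4294967295 as unsigned 32-bit, v2 always treats the input as a 64-bit signed integer), the two behaviours could be modelled in C++ like this. The function names are hypothetical, and only integer inputs are considered.

```cpp
#include <cassert>
#include <cstdint>

// v2 behaviour as described: the operand is always a 64-bit signed integer.
constexpr std::int64_t bitnot_v2(std::int64_t v) {
    return ~v;
}

// v1 behaviour as described: the operator width depends on the operand's value.
constexpr std::int64_t bitnot_v1(std::int64_t v) {
    return (v >= 0 && v <= 0xFFFFFFFF)
        ? static_cast<std::int64_t>(~static_cast<std::uint32_t>(v))  // 32-bit unsigned
        : ~v;                                                        // 64-bit signed
}

// The same input gives different results under the two rules.
inline void check_bitnot() {
    assert(bitnot_v1(0xF0F) == 0xFFFFF0F0);
    assert(bitnot_v2(0xF0F) == -3856);
}
```

In both cases the result is decided by the value handed to the operator and by the operator's own rules, not by how the value happened to be stored.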

Even just in the bit shifting documentation, there were strong indicators that they are 64-bit, without anything to suggest that this would be conditional on the build.

If Value2 is less than 0 or greater than 63, an exception is thrown.
Source: Variables and Expressions - Definition & Usage | AutoHotkey v2

For example, -1 has the same bit representation as the unsigned 64-bit integer 0xffffffffffffffff, therefore -1 >>> 1 is 0x7fffffffffffffff.
Source: Variables and Expressions - Definition & Usage | AutoHotkey v2
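
The documented -1 >>> 1 example can be reproduced outside AutoHotkey. Below is a minimal C++ sketch contrasting a logical shift (what >>> is documented to do) with an arithmetic shift (what >> does), assuming 64-bit integers and shift counts from 0 to 63; the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Logical right shift: zeros are shifted in from the left, as with >>>.
// The conversion back to signed is modular (guaranteed since C++20, usual behaviour before).
constexpr std::int64_t logical_shr(std::int64_t v, int n) {
    return static_cast<std::int64_t>(static_cast<std::uint64_t>(v) >> n);
}

// Arithmetic right shift: copies of the sign bit are shifted in, as with >>.
// Written so that only non-negative values are ever shifted.
constexpr std::int64_t arithmetic_shr(std::int64_t v, int n) {
    return v < 0 ? ~(~v >> n) : v >> n;
}

// -1 is all ones, so a logical shift by one clears only the top bit.
inline void check_shifts() {
    assert(logical_shr(-1, 1) == 0x7fffffffffffffff);
    assert(arithmetic_shr(-1, 1) == -1);
}
```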

Seven0528 wrote: ↑
30 Mar 2024, 00:02
I've never seen any mention that this is not dependent on whether it's 32-bit or 64-bit.

It can be assumed, otherwise the documentation would be incorrect. If 64-bit integers were not supported on 32-bit, the unqualified statement "64-bit signed values are supported" would be false.
