AutoHotkey Community

It is currently May 26th, 2012, 9:04 pm

All times are UTC [ DST ]




Post new topic Reply to topic  [ 10 posts ] 
Author Message
 Post subject: 64-bit arithmetic
PostPosted: July 15th, 2005, 7:56 pm 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
It looks like there is something inconsistent with the 64-bit arithmetic (I’d like to have it documented in the Help). If the conversion of a literal constant to machine integer overflows, like 0x8000000000000001 (> 0x7fffffffffffffff), it is maxed at 0x7fffffffffffffff. Other operations wrap around (ignore overflows)
Code:
      Operation                  Result              Expected

0x7fffffffffffffff + 0 ->  0x7fffffffffffffff  -0x8000000000000000  OK
0x7fffffffffffffff + 1 -> -0x8000000000000000  -0x8000000000000000  OK
0x7fffffffffffffff + 2 -> -0x7fffffffffffffff  -0x7fffffffffffffff  OK
0x7fffffffffffffff + 3 -> -0x7ffffffffffffffe  -0x7ffffffffffffffe  OK

0x8000000000000000 + 0 ->  0x7fffffffffffffff  -0x8000000000000000
0x8000000000000000 + 1 -> -0x8000000000000000  -0x7fffffffffffffff
0x8000000000000000 + 2 -> -0x7fffffffffffffff  -0x7ffffffffffffffe
0x8000000000000000 + 3 -> -0x7ffffffffffffffe  -0x7ffffffffffffffd

0x8000000000000001 + 0 ->  0x7fffffffffffffff  -0x7fffffffffffffff
0x8000000000000001 + 1 -> -0x8000000000000000  -0x7ffffffffffffffe
0x8000000000000001 + 2 -> -0x7fffffffffffffff  -0x7ffffffffffffffd
0x8000000000000001 + 3 -> -0x7ffffffffffffffe  -0x7ffffffffffffffc

      (1<<63) + 0      -> -0x8000000000000000  -0x8000000000000000  OK
      (1<<63) + 1      -> -0x7fffffffffffffff  -0x7fffffffffffffff  OK
      (1<<63) + 2      -> -0x7ffffffffffffffe  -0x7ffffffffffffffe  OK
      (1<<63) + 3      -> -0x7ffffffffffffffd  -0x7ffffffffffffffd  OK


Report this post
Top
 Profile  
Reply with quote  
 Post subject: Re: 64-bit arithmetic
PostPosted: July 16th, 2005, 5:00 pm 
Offline

Joined: March 2nd, 2004, 3:36 pm
Posts: 10720
Laszlo wrote:
If the conversion of a literal constant to machine integer overflows ... it is maxed at 0x7fffffffffffffff.
Perhaps there will be a way to fix this so that unsigned 64-bit integers can be supported. In the meantime, I've documented it as you suggested on the Variables page under "Notes about variable capacity and memory usage":
Quote:
Commands, functions, and expressions that accept numeric inputs generally support 15 digits of precision for floating point values. For integers, 64-bit signed values are supported, which range from -9223372036854775808 (-0x8000000000000000) to 9223372036854775807 (0x7FFFFFFFFFFFFFFF). Any integer constants outside this range are set to the nearest in-range integer.
Suggestions/revisions are welcome.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: July 16th, 2005, 5:15 pm 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
Supporting unsigned 64 bit integers does not seem to be easy. You might need a flag A_IntegerType, like the A_FormatInteger. Setting it to +32, -32, +64, -64 could force the arithmetic mode to unsigned 32, signed 32, unsigned 64 or signed 64 bit arithmetic. Otherwise there will be always problems with the behavior of left-shift, right-shift or with the effects of overflows, etc.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: July 17th, 2005, 3:37 pm 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
For changing the documentation of the current system I meant that it would be nice to mention somewhere that arithmetic operations wrap around at overflow, not like literal constant conversion. Maybe it is there, but I could not find it.

--- --- --- ---

Another idea is that a decimal or hex number string, as it is now, designates a signed 64 bit number. A special char attached to the end of the number designates different type, like U for unsigned 64-bit, S for short (32-bit), T (Two postfixes) for unsigned short. The result should also have the correct postfix, that is, 1T + 2T = 3T. Or, we can use special appended chars, like !,@,#,$,%,&, but these are less intuitive.

Maybe, using prefixes is even better. A negative sign, of course, indicates signed number, like no prefix in front of a positive number. An explicit + sign could indicate unsigned 64-bit numbers. 32-bit or 16-bit numbers could have a leading 0 or 00, but they are less important, because operations on them can be simulated with & 0xffffffff or 0xffff. They are useful, though, if an AHK script is used to check or reproduce the results of a compiled program, simulating its arithmetic. For example, the TEA cipher I posted relies on 32-bit unsigned arithmetic; with 64 bits the results are different, and other parties with C code cannot decrypt our messages.

The behavior of arithmetic operations could be the same as in C, like shifting-right unsigned numbers brings in 0, not the sign bit. Similarly, we should have the same results as in C when operating on mixed signed and unsigned numbers or of different precisions.

One implication is that "If var is [not] type" needs to be changed and some scripts might have to be modified. If a directive, like #AllowTypedNumbers activates this feature, old scripts won’t break.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: July 18th, 2005, 5:36 pm 
Offline

Joined: March 2nd, 2004, 3:36 pm
Posts: 10720
Laszlo wrote:
I meant that it would be nice to mention somewhere that arithmetic operations wrap around at overflow...
I've documented it in the same section, which now reads:
Quote:
Any integer constants outside this range are set to the nearest in-range integer. By contrast, arithmetic operations on integers wrap around upon overflow (e.g. 0x7FFFFFFFFFFFFFFF + 1 = -0x8000000000000000).

Concerning the rest of your post, I'm glad we have an expert like you "on staff". I should admit that one of the areas I'm not strong in is the repercussions and side effects of integer capacity (32 vs. 64 bit) and signed vs. unsigned on math results. My general impression is that 99.9% of scripts are not affected by these distinctions. If true, the proposed #AllowTypedNumbers mode might understandably be given a somewhat low priority.

Another reason that it should perhaps be a low priority is performance: As the script runs, all numbers must be parsed and converted to integers and floating point numbers. Anything that slows down this process would impact the performance of nearly every aspect of the program.

I'd welcome more discussion on this from you or anyone else who has an interest, especially anyone who has found the signed 64-bit nature of AutoHotkey to be problematic for some tasks.

Thanks for sharing your expertise.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: July 18th, 2005, 9:02 pm 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
Algorithms computing something sufficiently complex often depend on the precision of the arithmetic and the sign. For example, long integer arithmetic packages often use sign-magnitude number representation, where signed digits (machine words) cause unnecessary complications. In case of an overflow some multiplication operations return the least significant bits (modulo), others return the most significant bits (rounding). Comparisons of signed numbers are different from unsigned ones. Shifting of different length numbers give different results when 1-bits leave the word or when they have to be shifted in.

If AHK scripts could give the result of all kind of typed integer arithmetic, we could use it for "serious" applications, like
- Cryptography
- Pseudorandom number generation
- Hash functions
- Coding theory, error correction
- SW verification, prototyping
+ Image/signal processing
+ Long integer algorithms
+ Transforms
+ Searching and sorting
+ Complex data structures
+ Cellular automata, shift registers, CRC
+ Computation in Galois fields (bit vectors)
+ Polynomial/Integer matrix computation
+ Symbolic computation, Set operations
+ Exact numeric computation

A good example is the chat-room encryption. I can write the crypto code in C for the cipher in the matter of hours (or get it from the Web), but assigning the function to a hotkey, decrypting selected text from a Windows control, using popup windows for the results are hard. AHK gives these for free, but to get the cipher work as specified is a pain now, because of the different integer type AHK uses.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: July 18th, 2005, 9:53 pm 
Offline

Joined: March 2nd, 2004, 3:36 pm
Posts: 10720
Great response (very persuasive). Since I'm not an expert in computational theory and machine math, I would either have to rely on someone else for the design or adopt the syntax of one or more existing languages verbatim. For example, perhaps a limited subset of ANSI C type-casting can be implemented in expressions. Something like:

Var := (float)((int)Var1 + (int64)Var2 + (UInt)-1)
; Where "float" should probably be 64-bit vs. 32-bit by default.

Hopefully this wouldn't be too hard to implement. But I'm not sure if it would provide complete control over math operations because I lack the expertise to judge it, especially in light of the fact that the program stores all intermediate expression results as 64-bit signed integers or floating point numbers. Perhaps this storage method wouldn't hamper type-casting as long as the integers are 32-bit or signed 64-bit. Unsigned 64-bit might lead to some complications.

There is also a tentative plan to expand variables to be capable of storing integers and floating point numbers directly rather than as strings, which should boost performance and precision in some cases.

If you or anyone would like to help with the design of type casting or some similar mechanism, I'd welcome it.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: July 19th, 2005, 5:16 pm 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
The fastest running solution could be, as Chris says, storing internally next to the variable name and length the type of the value and the value of the variable (which is a pointer in case of strings). These can be unsigned or signed integers of length 8, 16, 32 or 64 bits (8 choices), float, double, pointer, string, file, window ID, pixel color, even structures. This list could grow as the language evolves.

Short of these, we have to make some decisions:
- Data type is assigned dynamically, like with casting: (uint32)100. It requires a new construct, and modifying the behavior of the arithmetic operators. x := (uint32)100 is the same as x := 100 or x = 100, and x := (uint32)1 + (uint32)2 still gives the string "3", but adding large numbers will be different. The difficulty is that the addition operator needs to find the types of both of its parameters, which might not always be obvious: (uint32)2*3 + 4*(uint8)5. Also, functions should pass the cast data types: Sin((float)2) or Sin((double)2) should be different.
- Operators are distinguished with explicit specification of data types. This we can do with functions w/o changing the language: Add(2,3,uint8) or ShiftRight(x,1,int32), but the simplicity of the expressions will be lost. 1+2*3 would look Add(1,Mult(2,3,uint8),uint8). We would need to add extra parameters to the math functions, like Abs(x,type=int64) to be able to give right answers: Abs(0xffffffff,int32) is 1, but Abs(0xffffffff,int64) is 0xffffffff.
- Let the data itself carry the information of its type. Changing the type could be done with functions or by editing the string representing the numbers. This was my previous proposal. Since the internal data representation must not be affected, the strings representing the numbers need to be tweaked.

+0 - prefix:
~~~~~~~~
-1, 325, 0xff: signed 64-bit numbers
+1, +100, +0x1a: unsigned 64-bit numbers
-01, 0325, 0x0ff: signed 32-bit numbers
+01, +0100, +0x01a: unsigned 32-bit numbers
-001, 00325, 0x00ff: signed 16-bit numbers
+001, +00100, +0x001a: unsigned 16-bit numbers
-0001, 000325, 0x000ff: signed 8-bit numbers
+0001, +000100, +0x0001a: unsigned 8-bit numbers

One could use postfixes, like 325UL, but that will break more scripts than prefixes, which yield still valid numbers. Operators would produce results of similarly encoded data types according to the C convention, like "+03" + "+05" = "+08". This looks ugly only with constants, variables carry their type (and the extra characters) inside, set up at the time of the last assignment.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: July 20th, 2005, 3:08 am 
Offline

Joined: March 2nd, 2004, 3:36 pm
Posts: 10720
Although abs((int32)0xFFFFFFFF) would produce 1 as desired, I can see from your post that there are probably many other examples for which such simple type casting wouldn't be a complete solution (and possibly not even a very useful idea by itself).

So rather than risk breaking scripts by adding a plus sign prefix right away (which seems ingenious, by the way), I'm leaning toward waiting until a more complete solution is designed: something that stores intermediate, integer expression results as the right type rather than always as 64-bit integers. And perhaps also the ability for variables to internally store numbers rather than only strings.

This is the kind of thing that is a little daunting to work on because of difficulty in finding the right balance between how many scripts get broken vs. doing the job right. So I'm a little reluctant to even start on this anytime soon because I could easily see spending two weeks or more on it, mostly in deep thought pondering things that are mysterious to me but obvious to a professional computer scientist.

I don't expect you to devote more time to this unless you have a lot of desire and/or time to see this get done, in which case you might have to do a lot of the design and feasibility analysis yourself. I don't know how large such a task would be since you would have to reconcile the current 64-bit signed nature of AHK expressions -- along with all its repercussions -- against how typed integers would work. From this, a plan would have to be devised to retain backward compatibility while still being correct and useful for the implementation of the algorithms you mentioned.

Anyway, I hope you won't be too disappointed by this. It all comes down to prioritization: weighing my own limitations and abilities against the list of pending features, and how many users actually benefit.

Edit: Fixed -1 to be 1 at the top.


Last edited by Chris on July 20th, 2005, 12:50 pm, edited 1 time in total.

Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: July 20th, 2005, 4:04 am 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
It was too nice to be hoped for, the full integer data typing. Matlab is very expensive commercial SW and still it took more than ten years for them to include some kind of support of these, which is still not complete. Once I was a member of team which extended atoms in Lisp with this kind of attributes to facilitate numerical computations. Half a year was necessary for the five of us to do it right.

I have a job with a lot of work, so I cannot dig in the AHK code to do some implementation or detailed design. I am really sorry about it. Let’s just hope we live long enough to see it happening. In the mean time I shall go with the way of the explicit functions (sometimes simplified as in the TEA cipher, where masking of the final results was enough - assuming I did not make a mistake. Somebody please check it!).


Report this post
Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 10 posts ] 

All times are UTC [ DST ]


Who is online

Users browsing this forum: No registered users and 4 guests


You can post new topics in this forum
You can reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Powered by phpBB® Forum Software © phpBB Group