Here is the alpha version of a Multi Precision (long integer) arithmetic library for AutoHotkey. It uses machine code functions for low level operations, so working with many thousand digit numbers is reasonably fast. There is no dependency from external dll's.
The application has to include the MP_ functions listed below, and call first the MP_Init() function, to initialize machine code functions.
Long integers are stored in binary structures {Int:Sign, UInt:LenInUInts, UInts:LSword..MSword}, therefore no direct assignments can be performed. The functions below create new long integers for their results, if needed:
MP_LEN(x) ; Length in 32-bit integers
MP_IS0(x) ; test if x IS 0
MP_CMP(x, y, eq="") ; Compare: -0+ for x <=> y, eq = #equal MS words
MP_SFT(y, x, s) ; Shift: s>0: y <- x << s, s<0: y <- x >> -s
MP_ADD(z, y, x) ; Add: z <- y + x
MP_SUB(z, y, x) ; Subtract: z <- y - x
MP_MUL(z, y, x) ; Multiply: z <- y * x
MP_DIV(q, r, y, x) ; Divide, Remainder: q <- y/x, r <- Mod(y,x), sign(q) = SY*SX
MP_GCD(g, c, d, x, y) ; Extended Euclidean GCD <- g = c*x + d*y
MP_AddIpp(y, x, U32) ; Add a 32-bit integer (pos+pos) to x
MP_SubIpp(y, x, U32) ; Subtract a 32-bit integer (pos+pos) from x
MP_MulIpp(y, x, U32) ; Multiply x with max 32-bit integer (pos*pos)
MP_DivIpp(y, x, U32) ; Divide y <- x // 32-bit integer (pos/pos), RETURN remainder
MP_Cpy(y, x) ; Copy y <- x
MP_Set(x, h) ; Set x <- string (hex, decimal integer, double precision float)
MP_Hex(x, tab="") ; HEX representation of MP integer x
MP_Dec(z, tab="") ; DECIMAL representation of MP integer x
PI(digits) ; Newton's method for the digits of Pi
PI0(digits) ; Newton's method for the digits of Pi, faster for less than 20,000 digits
SRT(value,digits) ; Square root with Newton's iteration, value <= 16 digits integer
If your script contains
MsgBox % SRT(2,100)
The first 100 digits of sqrt(2) is shown, immediately. For the first 100 digits of Pi a fraction of a second is used with
MsgBox % PI(100)
Dividing a million-bit number with one of half size finishes in about 2 seconds in my 2GHz Centrino notebook.
Here is a long list of test cases
Code:
#include mpl.ahk
MsgBox % "100 Digits of Sqrt(2):`n" SQT(2,100)
; 1414213562373095048801688724209698078569671875376948073176679737990732478462107038850387534327641572
MsgBox % "100 Digits of Pi:`n" PI0(100)
; 3141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067
S = GCD:`n
MP_Set(x,30)
MP_Set(y,18)
MP_GCD(g,c,d, x,y)
S .= "`n" . MP_Dec(g) ", " MP_Dec(c) ", " MP_Dec(d) ; 6, -1, 2
MP_Set(x,2*3*5*7*11*13*17*23**4*29**3)
MP_Set(y,17**5*23*31**2)
MP_GCD(g,c,d, x,y)
S .= "`n" . MP_Dec(g) ", " MP_Dec(c) ", " MP_Dec(d) ; 391, -6878429, 763665233731
MP_Set(x,176023680645013966468226945392411250770384383304492191886725992896575345044216019675)
MP_Set(y,222232244629420445529739893461909967206666939096499764990979600)
MP_GCD(g,c,d, x,y) ; GCD of the 400th and 300th Fibonacci numbers = 100th Fibonacci number
S .= "`n" . MP_Dec(g) ", " MP_Dec(c) ", " MP_Dec(d)
; 354224848179261915075, -792070839848372253127, 627376215338105766356982006981782561278128
MP_Set(x,1414213562373095048801688724209698078569671875376948073176679737990732478462107038850387534327641572)
MP_Set(y,3141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117068)
MP_GCD(g,c,d, x,y)
S .= "`n" . MP_Dec(g) "`n" MP_Dec(c) "`n" MP_Dec(d) ; 4
; 277618068276139372139970459366882728416722612456219294511921887244235666622301044198307082946902119
; -124972038264512876885991215247412539326590817554290162245629946727004984493348100970663253404223898
Msgbox %S%
/*
; Measure running time
n = 10000
MP_Set(a,1)
MP_SFT(x,a,64*n-1)
MP_SFT(y,a,32*n-1)
i := j := 4
Loop %n% {
Random r, -0x80000000,0x7fffffff
NumPut(r,x,i+=4)
Random r, -0x80000000,0x7fffffff
NumPut(r,x,i+=4)
Random r, -0x80000000,0x7fffffff
NumPut(r,y,j+=4)
}
t := A_TickCount
MP_Div(q,r,x,y)
MsgBox % A_TickCount - t ; 20,000/10,000 words 881ms(DIVpq),1065ms(DIVpp), 60,000/0,000 dec digits = 100 ms
; Cut'n Paste comparisions with other MP packages
Loop 5000 {
Random r, 0, 999
a .= SubStr(1000+r,2,2)
b .= SubStr(r,0)
}
MsgBox Dividend: %a%
MsgBox Divisor: %b%
MP_Set(x,a)
MP_Set(y,b)
MP_Div(q,r,x,y)
c := MP_Dec(q)
d := MP_Dec(r)
ClipBoard := "floor(" a "/" b ") - " c
MsgBox ClipBoard = floor(a / b) - q`nQuotient: %c%
ClipBoard := "(" a " mod " b ") - " d
MsgBox ClipBoard = (a mod b) - r`nRemainder: %d%
*/
S = Division:`n
MP_Set(x,12345678)
MP_Set(y,-1000)
MP_Div(q,r,x,y)
S .= "`n" . MP_Dec(q) " / " MP_Dec(r) ; -12345 / 678
MP_Set(x,1234567890123)
MP_Set(y,1000)
MP_Div(q,r,x,y)
S .= "`n" . MP_Dec(q) " / " MP_Dec(r) ; 1234567890 / 123
MP_Div(q,r,y,x)
S .= "`n" . MP_Dec(q) " / " MP_Dec(r) ; 0 / 1000
MP_Set(x,1234567890123456789012345)
MP_Set(y,1000000000000)
MP_Div(q,r,x,y)
S .= "`n" . MP_Dec(q) " / " MP_Dec(r) ; 1234567890123 / 456789012345
MP_Set(x,0xfffffffffffffffe00000000)
MP_Set(y,0xffffffffffffffff)
MP_Div(q,r,x,y)
S .= "`n" . MP_Hex(q) " / " MP_Hex(r) ; 0xFFFFFFFF / 0xFFFFFFFeFFFFFFFF
MP_Set(x,0xffffffffe00000000)
MP_Set(y,0xfffffffff)
MP_Div(q,r,x,y)
S .= "`n" . MP_Hex(q) " / " MP_Hex(r) ; 0xFFFFFFFF / 0xeFFFFFFFF (9 digits)
MP_Set(x,0x10000000000000000) ; 1,0,0
MP_Set(y,0x100000000) ; 1,0
MP_Div(q,r,x,y)
S .= "`n" . MP_Hex(q) " / " MP_Hex(r) ; 0x100000000 / 0x0 (9/1 digits)
MP_Set(x,0x1000000000000000000000000) ; 1,0,0,0
MP_Set(y,0x10000000000000001) ; 1,0,1
MP_Div(q,r,x,y)
S .= "`n" . MP_Hex(q) " / " MP_Hex(r) ; 0xFFFFFFFF / 0xFFFFFFFF00000001 (8/16 digits)
MP_Set(x,0xffffffffe00000000) ; f,-2,0
MP_Set(y,0x111111111) ; 1,1..1
MP_Div(q,r,x,y)
S .= "`n" . MP_Hex(q) " / " MP_Hex(r) ; 0xeFFFFFFFF / 0x11111111 (9/8 digits)
MsgBox %S%
S = Shift:`n
MP_Set(x,0x2)
MP_SFT(y,x,0)
S .= "`n" . MP_Dec(y) ; 2
MP_SFT(y,x,-2)
S .= "`n" . MP_Dec(y) ; 0
MP_Set(x,0x100000000)
MP_SFT(y,x,-32)
S .= "`n" . MP_Dec(y) ; 1
MP_Set(x,2222220000000022222222)
MP_SFT(y,x,1)
S .= "`n" . MP_Dec(y) ; 4444440000000044444444
MP_SFT(y,x,-1)
S .= "`n" . MP_Dec(y) ; 1111110000000011111111
MP_Set(x,1)
MP_SFT(y,x,2000)
S .= "`n" . MP_Dec(y) ; 2**2000 = 114813069527425452...62184851149029376
MsgBox %S%
S = Multiply:`n
MP_Set(x,10000000)
MP_Set(y,200)
MP_Mul(z,y,x)
S .= "`n" . MP_Dec(z) ; 2000000000
MP_Set(x,1000000000000)
MP_Set(y,-12345678901234256789)
MP_Mul(z,y,x)
S .= "`n" . MP_Dec(z) ; -12345678901234256789000000000000
MsgBox %S%
S = Subtract: `n
MP_Set(x,0xff)
MP_Set(y,255)
MP_Sub(z,y,x)
S .= "`n" . MP_Hex(z) ; 0x0
MP_Set(x,1)
MP_Set(y,0x100000000)
MP_Sub(z,y,x)
S .= "`n" . MP_Hex(z) ; 0xffffffff
MP_Set(x,0xffffffffffffffffffffffffffffffffff)
MP_Set(y,-1)
MP_Sub(z,y,x)
S .= "`n" . MP_Hex(z) ; -0x10000000000000000000000000000000000
MP_Set(x, 0xffffffffffffffffffffffffff12345678)
MP_Set(y, 0x10000000012345678)
MP_Sub(z,y,x)
S .= "`n" . MP_Hex(z) ; -0xFFFFFFFFFFFFFFFFFEFFFFFFFF00000000
MsgBox %S%
S = Add:`n
MP_Set(x, 0xffffffff00000001)
MP_Set(y,0xffffffffffffffffff00000000ffffffff)
MP_Add(z,x,y)
S .= "`n" . MP_Hex(z) ; 0x10000000000000000000000000000000000
MP_Set(x,0xffffffffffffffffffffffffffffffffff)
MP_Set(y,1)
MP_Add(z,y,x)
S .= "`n" . MP_Hex(z) ; 0x10000000000000000000000000000000000
MP_Set(x,-0xffffffffffffffffffffffffff12345678)
MP_Set(y,-0x10000000012345678)
MP_Add(z,y,x)
S .= "`n" . MP_Hex(z) ; -0x1000000000000000000FFFFFFFF2468ACF0
MP_Set(x,111111110000000033333333)
MP_Cpy(y,x)
MP_Add(z,y,x)
S .= "`n" . MP_Dec(z) ; 222222220000000066666666
MsgBox %S%
S = Compare:`n
MP_Set(x,-0x123456789abcdef0123456789abcdef01)
MP_Set(y,-0x123456789abcdef0123456789abcdef01)
c := MP_CMP(x,y,eq)
S .= "`n" c "/" eq ; 0/5
MP_Set(x,-0xfffffffffffffffffffccccccccccccc)
MP_Set(y,-0xfffffffffffffffffffffffffffffffd)
c := MP_CMP(x,y,eq)
S .= "`n" c "/" eq ; 1/2
MP_Set(x,-1)
MP_Set(y, 1)
c := MP_CMP(x,y)
S .= "`n" c ; -1
MP_Set(x,-0xffffffffffffffffffffffffffffffffff)
MP_Set(y,-0xfffffffffffffffff)
c := MP_CMP(x,y)
S .= "`n" c ; -1
MP_Set(x,0xffffffffffffffffffffffffffffffffff)
MP_Set(y,0xfffffffffffffffffffffffffffffffffd)
c := MP_CMP(x,y,eq)
S .= "`n" c "/" eq ; 1/4
MsgBox %S%
S = Operations with UInt:`n
MP_Set(x,1)
MP_SubIpp(y,x,0)
S .= "`n" . MP_Dec(y) ; 1
MP_SubIpp(y,x,1)
S .= "`n" . MP_Dec(y) ; 0
MP_SubIpp(y,x,2)
S .= "`n" . MP_Dec(y) ; -1
MP_Set(x,0x100000000)
MP_SubIpp(y,x,1)
S .= "`n" . MP_Hex(y) ; 0xffffffff
MP_Set(x,0xffffffffffffffffffffffffffffffffff)
MP_AddIpp(y,x,0xfffffff0)
S .= "`n" . MP_Hex(y) ; 0x100000000000000000000000000FFFFFFEF
MP_Set(x,0xffffffffffffffffffffffffffffffffff)
MP_MulIpp(y,x,0xfffffff0)
S .= "`n" . MP_Hex(y) ; 0xFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFF00000010
MP_Set(x,0xffffffffffffffffffffffffffffffffffffff)
r := MP_DivIpp(y,x,0xfffffff0)
S .= "`n" . r " / " MP_Hex(y) ; 4095(0xFFF) / 0x1000000100000010000001000000100
MP_Set(x,0xffffffffffffffffffffffffffffffffff)
r := MP_DivIpp(y,x,0xfffffff0)
S .= "`n" . r " / " MP_Hex(y) ; 16777215(0xFFFFFF) / 0x100000010000001000000100000
MsgBox %S%
S = Set Hex:`n
i := 0
i++, MP_Set(a%i%,"0x0")
i++, MP_Set(a%i%, 0x1234567) ; no need for quotes
i++, MP_Set(a%i%, 0x12345678)
i++, MP_Set(a%i%, 0x123456789)
i++, MP_Set(a%i%,-0x123456789)
i++, MP_Set(a%i%, 0x1234567890abcdef1234567890abcdef0fedcba98765432100)
i++, MP_Set(a%i%,-0x1234567890abcdef1234567890abcdef0fedcba98765432100)
i++, MP_Set(a%i%,-9223372036854775807) ; 2^63-1
i++, MP_Set(a%i%, 100)
i++, MP_Set(a%i%,-0)
i++, MP_Set(a%i%, 1.5)
i++, MP_Set(a%i%,-0.75)
i++, MP_Set(a%i%, 2.**52)
i++, MP_Set(a%i%, 2.**52+3)
i++, MP_Set(a%i%, 2.**62+2.**59)
i++, MP_Set(a%i%, 2.**63-2.**59)
i++, MP_Set(a%i%, 2.**63)
i++, MP_Set(a%i%, 2.**63+2.**59)
i++, MP_Set(a%i%, 2.**64-2.**59)
i++, MP_Set(a%i%, 2.**64)
i++, MP_Set(a%i%, 2.**64+2.**59)
i++, MP_Set(a%i%, 2.**128-2.**120)
i++, MP_Set(a%i%, 2.**128)
i++, MP_Set(a%i%, 2.**128+2.**120)
i++, MP_Set(a%i%, 2.**843+2.**842+2.**840+2.**839+2.**837) ; ~ max 0xDA000...
i++, MP_Set(a%i%, 36028797018963968) ; 2**55 0x80000000000000
i++, MP_Set(a%i%, 4611686018427387904) ; 2**62 0x4000000000000000
i++, MP_Set(a%i%, 9223372036854775808) ; 2**63 0x8000000000000000
i++, MP_Set(a%i%, 18446744073709551616) ; 2**64 0x10000000000000000
i++, MP_Set(a%i%, 36893488147419103232) ; 2**65 0x20000000000000000
i++, MP_Set(a%i%,-987654321098765432109876543210) ;-0xC7748819DFFB62438D1C67EEA
Loop %i%
S .= A_Index . ": " . MP_Hex(a%A_Index%) . "`n"
MsgBox %S%
S = Set Decimal:`n
i := 0
i++, MP_Set(a%i%, 36028797018963968) ; 2**55 0x80000000000000
i++, MP_Set(a%i%, 4611686018427387904) ; 2**62 0x4000000000000000
i++, MP_Set(a%i%, 9223372036854775808) ; 2**63 0x8000000000000000
i++, MP_Set(a%i%, 18446744073709551616) ; 2**64 0x10000000000000000
i++, MP_Set(a%i%, 36893488147419103232) ; 2**65 0x20000000000000000
i++, MP_Set(a%i%,-987654321098765432109876543210) ;-0xC7748819DFFB62438D1C67EEA
Loop %i%
S .= A_Index . ": " . MP_Dec(a%A_Index%) . "`n"
MsgBox %S%
S = Hex/Decimal Conversion:`n
VarSetCapacity(z,999,Asc("0"))
MP_Set(x, 0x1 . SubStr(z,500))
S .= "`n" . MP_Dec(x) ; 2**2000 in decimal form = 114813069527425452...62184851149029376
MP_Set(x, 1 . SubStr(z,500))
S .= "`n`n" . MP_Hex(x) ; 10**500 in hexadecimal form = 0x1F365FEC9370D2BAEE34B150720878EA52CCBA5E661...000(*125)
MsgBox %S%
The long integer manipulation AHK functions are below:
Code:
; Multi Precision Library v.1.0
; Large Integer = {Int:Sign, UInt:LenInUInts, UInts:LSword..MSword}
; = little endian, signed magnitude (not 2's complement!)
MP_Init() ; Initialize Multi Precision Library
MP_CMP(ByRef x, ByRef y, ByRef eq="") { ; -0+ for x <=> y, eq = #equal MS words
Global CMPpp
eq := 0
SX := NumGet(x,0,"Int"), SY := NumGet(y,0,"Int")
If (SX < SY)
Return -1
If (SX > SY)
Return 1
LX := NumGet(x,4), LY := NumGet(y,4)
If (LX < LY)
Return -SX
If (LX > LY)
Return SX
Return SX*dllcall(&CMPpp,"Int",LX, "UInt*",eq, "UInt",&x+8, "UInt",&y+8, "CDECL Int") ; same sign, length
}
MP_SFT(ByRef y, ByRef x, s) { ; s>0: y <- x << s, s<0: y <- x >> -s
Global SFL, SFR, LEN
L8 := NumGet(x,4)
If (s > 0) { ; shift left
offs := s//32, s &= 31
VarSetCapacity(y,(L8+offs)*4+8,0)
NumPut(NumGet(x,0,"Int"),y) ; sign
NumPut(L8+offs,y,4) ; new length
dllcall(&SFL, "UInt",&y+8+4*offs, "Int",L8, "UInt",&x+8, "UInt",s, "CDECL")
Return
} ; shift right
s := -s, offs := s//32, s &= 31
If (offs >= L8) { ; too large shift: nothing remains
VarSetCapacity(y,12,0)
NumPut(1,y), NumPut(1,y,4)
Return
}
VarSetCapacity(y,(L8-offs)*4+8,0)
NumPut(NumGet(x,0,"Int"),y) ; sign
dllcall(&SFR, "UInt",&y+8, "Int",L8-offs, "UInt",&x+8+4*offs, "UInt",s, "CDECL")
NumPut(dllcall(&LEN,"Int",L8-offs,"UInt",&y+8, "CDECL Int"), y, 4)
}
MP_ADD(ByRef z, ByRef y, ByRef x) { ; z <- y + x
Global CMPpp, ADDpp, SUBpp, LEN
SX := NumGet(x,0,"Int"), SY := NumGet(y,0,"Int")
LX := NumGet(x,4), LY := NumGet(y,4), LZ := LX > LY ? LX : LY
If (SX = SY) { ; ADD
VarSetCapacity(z, LZ*4+12) ; room for carry
If (LY > LX)
dllcall(&ADDpp,"UInt",&z+8, "Int",LY,"UInt",&y+8, "Int",LX,"UInt",&x+8, "CDECL")
Else dllcall(&ADDpp,"UInt",&z+8, "Int",LX,"UInt",&x+8, "Int",LY,"UInt",&y+8, "CDECL")
NumPut(SX,z) ; set sign(z)
NumPut(dllcall(&LEN,"Int",LZ+1,"UInt",&z+8, "CDECL Int"), z, 4)
Return
} ; SUBTRACT
NumPut(1,x), NumPut(1,y) ; set both positive
c := MP_CMP(x,y,eq) ; compare, find same MS part
eq -= eq = LZ ; leave at least one word
VarSetCapacity(z,(LZ-eq)*4+8,0) ; no room needed for equal MS parts (fill 0 for x=y)
If (c = 0) ; x = y
NumPut(1,z), NumPut(1,z,4) ; z <- 0
Else If (c < 0) { ; x < y
dllcall(&SUBpp,"UInt",&z+8, "Int",LY-eq,"UInt",&y+8, "Int",LX-eq,"UInt",&x+8, "CDECL")
NumPut(Sy,z) ; sign(z) <- Sy
}
Else { ; x > y
dllcall(&SUBpp,"UInt",&z+8, "Int",LX-eq,"UInt",&x+8, "Int",LY-eq,"UInt",&y+8, "CDECL")
NumPut(Sx,z) ; sign(z) <- Sx
}
NumPut(dllcall(&LEN,"Int",LZ,"UInt",&z+8, "CDECL Int"), z, 4)
NumPut(Sx,x), NumPut(Sy,y) ; reset both signs
}
MP_SUB(ByRef z, ByRef y, ByRef x) { ; z <- y - x
SX := NumGet(x,0,"Int")
NumPut(-SX,x) ; x <- (-x)
MP_ADD(z,y,x) ; z <- y + (-x)
NumPut( SX,x) ; reset sign(x)
}
MP_MUL(ByRef z, ByRef y, ByRef x) { ; z <- y * x
Global MULpp, LEN
SX := NumGet(x,0,"Int"), SY := NumGet(y,0,"Int")
LX := NumGet(x,4), LY := NumGet(y,4), LZ := LX + LY
VarSetCapacity(z, LZ*4+8, 0)
dllcall(&MULpp,"UInt",&z+8, "Int",LY,"UInt",&y+8, "Int",LX,"UInt",&x+8, "CDECL")
NumPut(dllcall(&LEN,"Int",LZ,"UInt",&z+8, "CDECL Int"), z, 4)
NumPut(Sy*Sx,z) ; sign(z) <- Sy*Sx
}
MP_DIV(ByRef q, ByRef r, ByRef y, ByRef x) { ; q <- y/x, r <- Mod(y,x), sign(q) = SY*SX, sign(r) = SY
Global DIVpq, LEN, CPY
SX := NumGet(x,0,"Int"), SY := NumGet(y,0,"Int")
LX := NumGet(x,4), LY := NumGet(y,4)
If (Lx = 1) { ; Single digit x
If (NumGet(x,8) = 0)
Return "ERROR: Divide by 0"
rem := MP_DivIpp(q, y, NumGet(x,8))
VarSetCapacity(r,12) ; put rem in MP format
NumPut(SY,r), NumPut(1,r,4), NumPut(rem,r,8)
NumPut(Sy*Sx,q) ; set lengths, signs
Return
} ; Multi-digit x
If (MP_CMP(y,x) < 0) { ; y < x
VarSetCapacity(q,12,0)
NumPut(SY*Sx,q), NumPut(1,q,4) ; q <- 0
MP_CPY(r,y) ; r <- y
Return
} ; Multi-digits, y >= x
LQ := LY > LX ? LY-LX+1 : 1
VarSetCapacity(q, LQ*4+ 8, 0) ; q initialized to all 0!
VarSetCapacity(r, LY*4+12, 0) ; r <- y, 0-padded on both ends
dllcall(&CPY, "UInt",&r+8, "UInt",&y+8, "Int",LY*4, "CDECL")
dllcall(&DIVpq,"UInt",&q+8, "Int",LY,"UInt",&r+8, "Int",LX,"UInt",&x+8, "CDECL")
NumPut(dllcall(&LEN,"Int",LQ,"UInt",&q+8, "CDECL Int"), q, 4)
NumPut(dllcall(&LEN,"Int",LX,"UInt",&r+8, "CDECL Int"), r, 4)
NumPut(Sy*Sx,q), NumPut(Sy,r) ; sign(q) <- Sy*Sx, sign(r) <- Sy
}
MP_GCD(ByRef g, ByRef c, ByRef d, ByRef x, ByRef y) { ; Extended shifting Euclidean GCD = g = c*x + d*y
Global xGCD, CPY
If (MP_IS0(x)) {
MP_CPY(g,y), MP_SET(c,0), MP_SET(d,1)
Return
}
If (MP_IS0(y)) {
MP_CPY(g,x), MP_SET(c,1), MP_SET(d,0)
Return
}
L := ((MP_LEN(x) > MP_LEN(y) ? MP_LEN(x) : MP_LEN(y)) + 3) * 4 ; temp length in bytes
VarSetCapacity(Z, 6*L, 0)
dllcall(&CPY, "UInt", &Z, "UInt",&x, "Int",MP_LEN(x)*4+8, "CDECL") ; copy input
dllcall(&CPY, "UInt",&Z+L, "UInt",&y, "Int",MP_LEN(y)*4+8, "CDECL") ; copy destroyed in xGCD
dllcall(&xGCD, "UInt*",f,"UInt*",a,"UInt*",b, "UInt",&Z,"UInt",&Z+L ; do the work
, "UInt",&Z+2*L,"UInt",&Z+3*L,"UInt",&Z+4*L,"UInt",&Z+5*L)
s := NumGet(f+0,4)*4+8, VarSetCapacity(g,s) ; copy result from buffer
dllcall(&CPY, "UInt",&g, "UInt",f, "Int",s, "CDECL")
s := NumGet(a+0,4)*4+8, VarSetCapacity(c,s)
dllcall(&CPY, "UInt",&c, "UInt",a, "Int",s, "CDECL")
s := NumGet(b+0,4)*4+8, VarSetCapacity(d,s)
dllcall(&CPY, "UInt",&d, "UInt",b, "Int",s, "CDECL")
}
MP_AddIpp(ByRef y, ByRef x, I32) { ; add a 32-bit integer (pos+pos) to x
Global AddIpp
L8 := NumGet(x,4)
VarSetCapacity(y,L8*4+12) ; one extra word for carry
c := dllcall(&AddIpp, "UInt",&y+8, "Int",L8, "UInt",&x+8, "UInt",I32, "CDECL UInt")
NumPut(1,y) ; positive
NumPut(c>0 ? L8+1 : L8,y,4) ; true length
}
MP_SubIpp(ByRef y, ByRef x, I32) { ; subtract a 32-bit integer (pos+pos) from x
Global SubIpp, LEN
L8 := NumGet(x,4)
If (L8 < 2) {
VarSetCapacity(y,12)
d := NumGet(x,8,"UInt") - I32
s := d < 0 ? -1 : 1
NumPut(s,y), NumPut(1,y,4), NumPut(s*d,y,8)
Return
}
VarSetCapacity(y,L8*4+8) ; no borrow
c := dllcall(&SubIpp, "UInt",&y+8, "Int",L8, "UInt",&x+8, "UInt",I32, "CDECL UInt")
NumPut(1,y) ; positive
NumPut(dllcall(&LEN,"Int",L8,"UInt",&y+8, "CDECL Int"), y, 4)
}
MP_MulIpp(ByRef y, ByRef x, I32) { ; multiply x with max 32-bit integer (pos*pos)
Global MulIpp
L8 := NumGet(x,4)
VarSetCapacity(y,L8*4+12) ; one extra word for carry
c := dllcall(&MulIpp,"UInt",&y+8, "Int",L8, "UInt",&x+8, "UInt",I32, "CDECL UInt")
NumPut(1,y) ; positive
NumPut(c>0 ? L8+1 : L8,y,4) ; true length
}
MP_DivIpp(ByRef y, ByRef x, U32) { ; y := x // 32-bit integer (pos/pos), RETURN remainder
Global LEN, DivIpp
L8 := NumGet(x,4)
VarSetCapacity(y,L8*4+8)
r := dllcall(&DivIpp,"UInt",&y+8, "Int",L8, "UInt",&x+8, "UInt",U32, "CDECL UInt")
NumPut(1,y) ; positive
NumPut(dllcall(&LEN,"Int",L8,"UInt",&y+8, "CDECL Int"), y, 4)
Return r
}
MP_Cpy(ByRef y, ByRef x) { ; y <- x
Global CPY
L8 := NumGet(x,4)
VarSetCapacity(y,L8*4+8)
dllcall(&CPY, "UInt",&y, "UInt",&x, "Int",L8*4+8, "CDECL")
}
MP_Set(ByRef x, h) { ; x <- string (hex, decimal integer, double precision float)
Global U32toHex
Static D52 := 4503599627370496.0, d := 0x1234567890123456, mk = 0xFFFFFFFFFFFFF
If (SubStr(h,1,1) = "-")
sign := -1, h := SubStr(h,2) ; negative
Else
sign := 1 ; positive
If (SubStr(h,1,2) = "0x") { ; arbitrary long hex
L8 := (StrLen(h)+5)//8
VarSetCapacity(x,L8*4+8)
Loop % L8-1
NumPut("0x" . SubStr(h,1-A_Index*8,8), x, 4+A_Index*4)
NumPut(SubStr(h,1,StrLen(h)+8-L8*8), x, 4+L8*4)
NumPut(L8,x,4)
}
Else If h contains e,. ; decimal double
{
If (h <= D52) ; 52-bit precise AHK conversion
MP_Set(x, Round(h)) ; sign fixed at the end
Else {
NumPut(h,d,0,"Double")
e := (NumGet(d,6,"Short")>>4) - 1075 ; power_of_2 exponent (0x3ff+52)
f := NumGet(d,0,"Int64") & mk | D52 ; fraction 1..2 * 2.**-52
If e < 11
MP_Set(x, f<<e) ; h < 2**63 (Int64)
Else {
fm := f >> 32
fl := f & 0xffffffff
el := e & 31, ek := e >> 5
If (el < 12) { ; 2 words
VarSetCapacity(x,16+4*ek,0)
NumPut(2+ek,x,4)
NumPut(fl<<el,x,8+4*ek)
NumPut(fm<<el | fl>>(32-el), x ,12+4*ek)
}
Else { ; 3 words
VarSetCapacity(x,20+4*ek,0)
NumPut(3+ek,x,4)
NumPut(fl<<el,x,8+4*ek)
NumPut(fm<<el | fl>>(32-el), x ,12+4*ek)
NumPut(fm>>(32-el), x ,16+4*ek)
}
}
}
}
Else If (StrLen(h) < 19) ; decimal integer
If (h > 0xffffffff) {
VarSetCapacity(x,16) ; 63 bits
NumPut(h & 0xffffffff,x,8)
NumPut(h>>32,x,12)
NumPut(2,x,4)
} Else { ; 32 bits
VarSetCapacity(x,12)
NumPut(h,x,8)
NumPut(1,x,4)
}
Else { ; > 18 digits
d := Mod(StrLen(h)-1,9) + 1 ; 1..9
MP_Set(x,SubStr(h,1,d)) ; MS 1..9 digits
StringTrimLeft h, h, d
Loop % StrLen(h)//9 {
MP_MulIpp(y, x, 1000000000)
MP_AddIpp(x, y, SubStr(h,1,9))
StringTrimLeft h, h, 9
}
}
NumPut(sign,x)
}
MP_Hex(ByRef x, tab="") { ; HEX representation of MP integer x, tab before every 8 digits
Global U32toHex, U32to8Hex
Static S := "12345678"
L8 := NumGet(x,4)
VarSetCapacity(h,3+L8*8)
DllCall(&U32toHex, "Str",h, "UInt",NumGet(x,4+L8*4), "CDECL")
h := NumGet(x)=1 ? "0x" . h : "-0x" . h
Loop % L8-1
DllCall(&U32to8Hex, "UInt",&S, "UInt",NumGet(x,L8*4+4-A_Index*4), "CDECL"), h .= tab . S
Return h
}
MP_Dec(ByRef z, tab="") { ; DECIMAL representation of MP integer x, tab before every 9 digits
Static d9 := 1000000000
MP_CPY(x,z) ; copy of input (destroyed)
sign := NumGet(x)
VarSetCapacity(d,NumGet(x,4)*10) ; allocate enough memory
Loop {
c := MP_DivIpp(y, x, d9) ; no leading 0's
If MP_IS0(y)
Break
d := SubStr(c+d9,-8) . tab . d ; attach to the left with leaging 0's
c := MP_DivIpp(x, y, d9) ; no leading 0's
If MP_IS0(x)
Break
d := SubStr(c+d9,-8) . tab . d ; pad-left, prepend
}
Return (sign = 1 ? "" : "-") . c . tab . d
}
MP_LEN(ByRef x) {
Return NumGet(x,4)
}
MP_IS0(ByRef x) {
Return NumGet(x,4)=1 && NumGet(x,8)=0
}
PI(digits) { ; Newton's method, 2 accurate bits/iteration, less rounding errors, < 905 million digits
VarSetCapacity(y,digits*1.3,Asc("0"))
MP_SET(x, 3 . y) ; initial estimate pi = 3
MP_CPY(y,x)
Loop {
i := 2*A_Index-1 ; for better speed: start with a large i and good initial value
MP_SET(n,i*i)
MP_MUL(z,y,n)
MP_SET(d,4*(i+1)*(i+2)) ; up to 63 bits, i < 1.4 * 2**30, up to 1.4*2**31 accurate bits
MP_DIV(y,n,z,d) ; the remainder in n can be used for rounding: fewer guard digits
MP_ADD(z,x,y)
MP_CPY(x,z)
If (MP_LEN(y)<2) ; last few terms can be floats
Break
}
Return SubStr(MP_DEC(x),1,digits)
}
PI0(digits) { ; Newton's method, little faster, for less than 20,000 digits
VarSetCapacity(y,digits+sqrt(2*digits),Asc("0")) ; digits+sqrt(digits) is expected to be OK
MP_SET(x, 3 . y) ; x = estimate of pi, x0 = 3
MP_CPY(y,x) ; y = current term to be added
Loop {
i := 2*A_Index-1
MP_MulIpp(z,y,i*i) ; 32-bit UInt parameter
d := 4*(i+1)*(i+2) ; i < 2**15 ==> accuracy < 2**16 bits =~ 19,728 digits
r := MP_DivIpp(w,z,d) ; remainder used for rounding: fewer guard digits
MP_AddIpp(y,w,r>=d//2)
MP_ADD(z,x,y)
MP_CPY(x,z)
If (MP_LEN(y) < 2)
Break
}
Return SubStr(MP_DEC(x),1,digits)
}
SQT(value,digits) { ; Newton iteration, value <= 16 digits integer
VarSetCapacity(z,digits,Asc("0"))
StringLeft z, z, digits ; trailing 0's
a = %value%%z%%z% ; scaled input
s := Round(sqrt(value)*1.e16, 1) ; float form, to have more than 18 digits
StringTrimRight s, s, 2 ; remove ".0" from end
s .= z ; pad with 0's
StringTrimRight s, s, 16 ; initial value
MP_SET(V, a)
MP_SET(X, s)
Loop {
MP_DIV(y,z,V,X)
MP_ADD(z,X,y)
MP_SFT(X,z,-1)
If MP_CMP(X,y) = 0
Break
}
Return SubStr(MP_DEC(x),1,digits)
}
MCode(ByRef code, hex) { ; allocate memory and write Machine Code there
VarSetCapacity(code,StrLen(hex)//2)
Loop % StrLen(hex)//2
NumPut("0x" . SubStr(hex,2*A_Index-1,2), code, A_Index-1, "Char")
}
MP_Init() { ; INITIALIZE global machine code functions
GLOBAL
MCode(CPY,"568B74241085F67E118B4C240C8B4424088A11881040414E75F75EC3") ; ~memcpy, copy BYTES
MCode(LEN,"8B442404EB0A8B4C2408833C810077054879F333C040C3")
MCode(SFL,"558BEC8B450C568B7508576A205A2B551433FF85C07E2489450C8B4510538B188B4D14D3E38BCA0BDF891E8B3883C604D3EF"
. "83C004FF4D0C75E45B893E5F5E5DC3")
MCode(SFR,"558BEC8B551053568B7508578B7D0C6A208BCFC1E102582B451433DB03F103D185FF7E1F897D0C8B4D1483EA048B3AD3EF83"
. "EE048BC80BFB893E8B1AD3E3FF4D0C75E45F895EFC5E5B5DC3")
MCode(UDIV,"8B4424048B5424088B4C2410F774240C8911C3") ; ErrorLevel = 0xc0000095: Integer Overflow
MCode(ADC, "33D28B4C240C8B010344240883D2000344240483D2008911C3") ; x+y+*c, c <- carry
MCode(SBB, "33D28B4C240C8B4424042B0183D2002B44240883D2008911C3") ; x-y-*b, b <- borrow
MCode(MulIpp,"558BEC518365FC008B4D08578B7D0C85FF7E2153568B75108B06F7651433DB83C6040345FC13D3890183C1044F8955FC75E6"
. "5E5B8B45FC89015FC9C3")
MCode(DivIpp,"558BEC83EC0C8365FC00578B7D0C4F78398D45FC538B5D1089450C8B4508568D34B82BD88B45FC8945F88B04338945F48B45"
. "F48B55F88B4D0CF77514891189064F83EE0485FF7DDC5E5B8B45FC5FC9C3")
MCode(AddIpp,"558BEC8B45148B4D08568B750C85F67E1F8B5510578B3A03F8893983C20483C1043BF8730533C040EB0233C04E75E65F8901"
. "5E5DC3")
MCode(SubIpp,"558BEC8B550C85D27E2C8B45108B4D08568B302B751489318B3083C10483C0043B75147309C7451401000000EB048365140"
. "04A75DC5E8B45145DC3")
MCode(CMPpp,"558BEC8B550C8B450883220048535657781C8B75148B7D108D0C862BFE8B1C0F3B197511FF024883E90485C07DEF33C05F5E"
. "5B5DC38B4D10C1E0028B0C08390C301BC083E00248EBE7")
MCode(ADDpp,"558BEC83EC108B45148365FC0085C053568B7508578B7D107E3B8B5D188D4DFC894DF08945F88B038945F48B0789450833D2"
. "8B4DF08B010345F483D20003450883D2008911890683C60483C70483C304FF4DF875D18B450C2B451485C07E2D8D4DFC894D148BD88B07"
. "89450833D28B4D148B0183C00083D20003450883D2008911890683C60483C7044B75DB8B45FC5F89065E5BC9C3")
MCode(SUBpp,"558BEC83EC148B45148365FC0085C053568B7508578B7D107E3B8B5D188D4DFC894DEC8945F88B038945F48B078945F033D2"
. "8B4DEC8B45F02B0183D2002B45F483D2008911890683C60483C70483C304FF4DF875D18B450C2B451485C07E2D8D4DFC894D0C8BD88B07"
. "89451433D28B4D0C8B45142B0183D20083E80083D2008911890683C60483C7044B75DB5F5E5BC9C3")
MCode(MULpp,"558BEC8B451483EC0C85C07E628B4D08538B5D18568B751057894D142BF18945F48B450C8B4D1433D285C08955FC7E2B8945"
. "F88B040EF72333FF0345FC13D701018B3983C1043BF8730533C040EB0233C003D0FF4DF88955FC75D88345140483C30483EE04FF4DF489"
. "1175B55F5E5BC9C3")
MCode(DIVpp,"558BEC83EC288B4D1856578B7D148B44B9FC0FBDD08955FC6A1F5E2BF28B54B9F88B4DFCD3EA8BCED3E08975ECD1EA0BD08B"
. "450C3BC78955E88945F00F8C210100008B4D102BC78D048189450C8B45082BC1538945D8EB038B4D108B5DF0C1E3028D740BFC8B068B76"
. "FC03D98B4DFC8BD0D3EA8B4DEC895DE48B1BD3E38B4DFCD3EE8B4DECD3E0D1EAD1EE0BD30BF03B55E88955F88975E073168D45148945F4"
. "8B45E08B55F88B4DF4F775E88911EB0383C8FF836514008365F80085FF8945F47E328B750C8B4DF88B45188B0488F765F433C903451413"
. "D18B0E3BC81BDBF7DB2BC803DAFF45F8890E83C604397DF8895D147CD18B45148B4DE43901744233DB3BFB895D147E368B750C8D451489"
. "45DC8B45188B04988945F88B068945E033D28B4DDC8B010345F883D2000345E083D200891189064383C6043BDF7CD3FF4DF48B45E48320"
. "008B550CFF4DF08B45D88B4DF4836D0C04397DF0890C100F8DF6FEFFFF5B5F5EC9C3") ; shortest code 359 bytes
MCode(DIVpq,"558BEC83EC2C538B5D1856578B7D148B44BBFC0FBDC88B54BBF8894DFC6A1F592B4DFC8BF2894DF48B4DFCD3EE8B4DF4D3E0"
. "D3E2D1EE0BF083FF028975E48955E87E0E8B44BBF48B4DFCD3E8D1E80945E88B450C3BC78945EC0F8C300100002BC7C1E0028945F88B5D"
. "EC8B4D10C1E3028D740BFC8B068B76FC03D98B4DFC8BD0D3EA8B4DF4895DE08B1BD3E38B4DFCD3EE8B4DF4D3E0D1EAD1EE0BD30BF03B55"
. "E48955F08975DC732A8D451489450C8B45DC8B55F08B4D0CF775E4891189450CF765E88945D48B4514403BD07609FF4D0CEB04834D0CFF"
. "836514008365F00085FF7E388B45F88B4D108D34088B4DF08B45188B0488F7650C33C903451413D18B0E3BC81BDBF7DB2BC803DAFF45F0"
. "890E83C604397DF0895D147CD18B45148B4DE03901744833DB3BFB895D147E3C8B4D108D45148945D88B45F88D34088B45188B04988945"
. "F08B068945DC33D28B4DD88B010345F083D2000345DC83D200891189064383C6043BDF7CD3FF4D0C8B45E08320008B55F8FF4DEC8B4508"
. "8B4D0C836DF804397DEC890C020F8DD8FEFFFF5F5E5BC9C3") ; Quotient correction 404 bytes, 20..30% faster @ large args!
MCode(U32toHex,"8B542404566A1C33F6598B44240CD3E883E00F03F0750485C9750E83F80976040437EB02043088024283E90479DCC602005EC3")
MCode(U32to8Hex,"8B5424046A1C598B442408D3E883E00F83F8097E040437EB02043088024283E90479E4C60200C3")
MCode(xGCD, "558BEC8B551C836208008B4D2433C0408942048901894104894108830AFF8B4D2083EC20538B5D1489018941048941088B4D"
. "28836108008309FF56578941048B43048B55188BC82B4A04740A3BC81BC083E00248EB234083F8027C1B8BFB8D34822BFA8B0C372B0E0F"
. "85C60000004883EE0483F8027DEC33C085C00F840B0400007D228B4D1C8BC38945188B4524894D248B4D2089451C8B45288BDA8B551889"
. "4D288945208B43040FBD4C8304894D148B4A040FBD748A048B7D142BFE8975EC8BF02BF185FF897D147D054E83451420408D7E023BC789"
. "45EC7E426A1F592B4D142BC68D0482894DF88945F08B78FC8B4DF88B00D3EF8B4D14D3E08B4DECD1EF0BF88B048B2BC775338B45F04983"
. "E8048D7E023BCF894DEC8945F07FCF8B4D148B42088B3CBBD3E08BC88BC72BC174173BC7EB0B3B0C83E912FFFFFF3B048B1BC083E00248"
. "EB138D4601EB07833C830077154883F8027DF433C085C00F842A0300007D03FF4D14837D14007D054E834514208B4D148365FC008D45FC"
. "8945F48B4208D3E08D7CB3088945EC8B078945F033D28B4DF48B45F02B0183D2002B45EC83D200891189078B4B048D4603413BC18945F4"
. "77608D4DFC894DE86A1F592B4D142BC6894DF88B4D188D3C818B4DF88B47FC8B17D3E88B4D14D3E2D1E80BC28945EC8B45F48B04838945"
. "F033D28B4DE88B45F02B0183D2002B45EC83D20089118B4DF4FF45F489048B8B430483C704403945F476B78B430433C94140EB07833C83"
. "0075084883F80277F4EB038D48FF894B04837C8B04000F844C0200008B4D148365FC008D45FC8945F08B451C8B4008D3E08945EC8B4524"
. "8D7CB0088B078945E833D28B4DF08B010345EC83D2000345E883D200891189078B451C8B40048D7C30018B45248B4004403BF873028BF8"
. "8D46033BC78945F4776C8D4DFC894DE46A1F592B4D142BC6894DF88B4D1C8D04818945F0EB038B45F08B50FC8B4DF88B00D3EA8B4D14D3"
. "E08B4DF4D1EA0BD08B45248D04888945E08B008955EC8945E833D28B4DE48B010345EC83D2000345E883D20089118B4DE0FF45F48345F0"
. "04397DF4890176B08B551C8D45FC8945E86A1F8BC72BC68B0482592B4D14D3E8894DF8D1E88945E48B45248B44B8048945E033D28B4DE8"
. "8B010345E483D2000345E083D20089118B4D248944B9048B45FC85C089790476098944B908FF4104EB0B837CB9040075044F8979048B4D"
. "148B7D288365FC008D45FC8945E88B45208B4008D3E08945E48B44B7088945E033D28B4DE88B010345E483D2000345E083D20089118B4D"
. "208944B7088B41048D5430018B4704403BD08955F473038945F48D46033B45F48945F0775D2BC68D55FC8D04818955E88945ECEB038B45"
. "EC8B50FC8B4DF88B00D3EA8B4D14D3E0D1EA0BD08B45F08B04878955E48945E033D28B4DE88B010345E483D2000345E083D20089118B4D"
. "F08345EC0489048F413B4DF4894DF076B68B4D208D45FC8945E48B45F48BD02BD68B14918B4DF8D3EA8D7487048B06894514D1EA8955E0"
. "33D28B4DE48B010345E083D20003451483D20089118B4DFC85C989068B45F4894704760C894C8708FF4704E9C8FBFFFF833E000F85BFFB"
. "FFFF8B4D2848E9B3FBFFFF8B55188B45088B4D1C89108B450C89088B45108B4D205F5E89085BC9C3") ; 1190 bytes
}
The low level functions were coded in C, compiled by VS'05, and their machine code listing was converted to hex and pasted into the MP_Init() function. Here they are:
Code:
typedef long long Int64;
typedef int Int32;
typedef short Int16;
typedef char Int8;
typedef unsigned long long UInt64;
typedef unsigned int UInt32;
typedef unsigned long ULong;
typedef unsigned short UInt16;
typedef unsigned char UInt8;
// for dll creation
#define static
// Intrinsic functions (insted of #include <intrin.h>)
UInt64 __emulu(UInt32 ua, UInt32 ub); // <- 64bit ua*ub (no intrinsic division!)
__forceinline UInt8 _BitScanForward( UInt32 *Index, UInt32 Mask);// bsf, <- 0 if Mask=0, Index <- LS bit position
__forceinline UInt8 _BitScanReverse( UInt32 *Index, UInt32 Mask);// bsr, <- 0 if Mask=0, Index <- MS bit
#define XTR 2
#define SGN(x) x[0]
#define LEN(x) x[1]
#define LSW(x) x[2]
#define LST(x) XTR-1+LEN(x)
#define MSW(x) x[LST(x)]
#define SET(x,v) SGN(x)=1; LEN(x) = 1; LSW(x) = v
#define SWP(x,y) t = x; x = y; y = t
__forceinline static UInt32 UDIV(UInt32 LS, UInt32 MS, UInt32 d, UInt32 *r) { // <- n64/u, r <- n64%u. (n64=MS|LS)
__asm {
mov eax, LS
mov edx, MS
mov ecx, r
div d
mov DWORD PTR [ecx], edx
}
}
__forceinline static UInt32 ADC(UInt32 x, UInt32 y, UInt32 *c) { // <- x + y + c, c <- new carry
__asm { // x += y + *c; *c = carry; return x;
xor edx, edx // edx = 0
mov ecx, c // ecx = c
mov eax, dword ptr[ecx] // eax = *c
add eax, y // eax += y, set carry
adc edx, 0 // edx = carry
add eax, x // eax += x, eax --> return
adc edx, 0 // edx += carry
mov dword ptr[ecx], edx // *c = carry
}
}
__forceinline static UInt32 SBB(UInt32 x, UInt32 y, UInt32 *b) { // <- x - y - b, b <- new borrow
__asm { // x -= y + *b; *b = borrow; return x;
xor edx, edx // edx = 0
mov ecx, b // ecx = b
mov eax, x // eax = x
sub eax, dword ptr[ecx] // eax -= *b
adc edx, 0 // edx = carry
sub eax, y // eax -= y, set carry
adc edx, 0 // edx += carry
mov dword ptr[ecx], edx // *b = borrow
}
}
__forceinline Int32 CMP(UInt32 *x, UInt32 *y) { // MP_ numbers compared -0+
Int32 i; UInt32 c;
c = LEN(x) - LEN(y);
if (c != 0)
return LEN(x)>c ? 1 : -1;
for (i = LEN(x)+XTR-1; i >= XTR; --i) {
c = x[i] - y[i];
if (c != 0)
return x[i]>c ? 1 : -1;
}
return 0;
}
__forceinline Int32 shCMP(UInt32 *x, UInt32 *y, Int32 o, Int32 s) { // MP_ number x compared with y[+o]<<s
Int32 i; UInt32 c; // s = shift #bits in words, o = ofset in words
for (i = LST(x); i > o+XTR; --i) { // from MS to LS words, ASSUME equal MS bits!!
c = x[i] - ((y[i-o]<<s) | (y[i-o-1]>>(31-s)>>1));
if (c != 0) // not equal, we can finish compare
return x[i]>c ? 1 : -1;
} // all equal until LS word of shifted y
c = x[o+XTR] - (y[XTR]<<s); // partial shifted LS word of y
if (c != 0)
return x[o+XTR]>c ? 1 : -1;
for (i = o+XTR-1; i >=XTR; --i) // the rest of y == 0
if (x[i] > 0) return 1;
return 0;
}
__forceinline void shSUB(UInt32 *x, UInt32 *y, Int32 o, Int32 s) { // MP_INT x -= y[+o]<<s. x be larger!
UInt32 i, b = 0; // s = shift #bits in words, o = ofset in words
x[o+XTR] = SBB(x[o+XTR], y[XTR]<<s, &b); // Subtract with borrow, partial shifted word
for (i = o+XTR+1; i <= LST(x); ++i) // from LSW to MSW, subtract with borrow
x[i] = SBB(x[i], (y[i-o]<<s) | (y[i-o-1]>>(31-s)>>1), &b);
b = 1; // no final borrow, get MS word
for (i = LST(x); i > XTR; --i) // scan for MS non-zero word
if (x[i] != 0) {
b = i-XTR+1; break; }
LEN(x) = b; // update length
}
__forceinline void shADD(UInt32 *x, UInt32 *y, Int32 o, Int32 s) { // MP_INT x+=y[+o]<<s. Unused words be 0!
UInt32 c = 0, i, L; // s = shift #bits in words, o = ofset in words
x[o+XTR] = ADC(x[o+XTR], y[XTR]<<s, &c); // Add with carry, partial shifted word
L = LST(y)+o; if (L < LST(x)) L = LST(x); // loop until both x and y are used up
for (i = o+XTR+1; i <= L; ++i) // from LS to MS words, add with carry to x the shifted y
x[i] = ADC(x[i], (y[i-o]<<s) | (y[i-o-1]>>(31-s)>>1), &c);
x[L+1] = ADC(x[L+1],y[L-o]>>(31-s)>>1,&c);// shifted y can be one longer
LEN(x) = L-XTR+2; // update length, keep sign
if (c > 0) {
x[L+2] = c; // final carry
LEN(x) += 1; }
else if (x[L+1]==0)
LEN(x) -= 1; // no carry, last set x was 0
}
// MP_INT: {sign,len,word[0],..word[len-1]}. e,f,x,y == all 0, each one longer than max(a,b)!
void xGCD(UInt32 **g, UInt32 **c, UInt32 **d, // OUTPUT: g = gcd(a,b) = c*a + d*b
UInt32 *a, UInt32 *b, // INPUT: a,b > 0, (destroyed)
UInt32 *e, UInt32 *f, UInt32 *x, UInt32 *y) {// Work buffers: e,f, x,y of len = max(La,Lb)+1
Int32 i, o, s, v; UInt32 *t;
SET(e,0); SET(x,1); SGN(e) = -1; // start with "-0"
SET(f,1); SET(y,0); SGN(y) = -1; // x-=(e<<s) or e-=(x<<s) will be OK
for (;;) {
i = CMP(a,b);
if (i == 0)
break;
if (i < 0) {
SWP(a,b); SWP(x,e); SWP(y,f); }
_BitScanReverse(&s, MSW(a)); // index of MS bit, 0..31
_BitScanReverse(&v, MSW(b));
s -= v;
o = LEN(a)-LEN(b);
if (s < 0) { o -= 1; s += 32; }
i = shCMP(a,b,o,s);
if (i == 0)
break;
if (i < 0) s -= 1; if (s < 0) { o -= 1; s += 32; }
shSUB(a,b,o,s);
if (MSW(a)==0)
break;
shADD(x,e,o,s);
shADD(y,f,o,s);
}
*g = b; *c = e; *d = f;
}
static void CPY(char *dest, char *src, Int32 count) { // copy chars
Int32 i;
for (i = 0; i < count; ++i)
*dest++ = *src++;
}
static void SFL(UInt32 *y, Int32 Lx, UInt32 *x, UInt32 s) { // y <- x<<s, s<32
Int32 i; UInt32 c = 0, t = 32-s;
for (i = 0; i < Lx; ++i) {
*y++ = (*x<<s) | c;
c = *x++ >> t;
}
*y = c;
}
static void SFR(UInt32 *y, Int32 Lx, UInt32 *x, UInt32 s) { // y <- x>>s, s<32
Int32 i; UInt32 c = 0, t = 32-s;
y += Lx; x += Lx;
for (i = 0; i < Lx; ++i) {
*--y = c | (*--x>>s);
c = *x << t;
}
*--y = c;
}
static UInt32 DivIpp(UInt32 y[], Int32 Lx, UInt32 x[], UInt32 d) { // y <- x/u
Int32 i; UInt32 r = 0;
for (i = Lx-1; i >= 0; --i)
y[i] = UDIV(x[i],r,d,&r);
return r;
}
static void DIVpq(UInt32 q[], Int32 Ly, UInt32 y[], Int32 Lx, UInt32 x[]) { // q <- y/x, y <- y%x
Int32 i, j, k, n, m; UInt32 c, t, B, D, M, L; // ASSUME Ly >= Lx >= 2, y[Ly]==y[-1]==0
UInt64 pp; UInt32 *p = (UInt32*)(&pp); // p[0] = LS, p[1] = MS word of UInt64 pp
_BitScanReverse(&n, x[Lx-1]); k = 31-n; // MSb of x (n = 1..31), (.>>32 undefined)
D = (x[Lx-1] << k) | (x[Lx-2]>>n>>1); // normalize divisor by k = 31-n (0..30)
B = (x[Lx-2] << k); // MS part of next digit of divisor
if (Lx > 2) B |= (x[Lx-3]>>n>>1); // use one more word, if can
for (i = Ly; i >= Lx; --i) { // y[Ly] = 0
M = (y[ i ] << k) | (y[i-1]>>n>>1); // dividend << k
L = (y[i-1] << k) | (y[i-2]>>n>>1); // y is padded on both ends
if (M < D) { // test (almost never overflow)
t = UDIV(L,M,D,&c); // trial quotient digit (c <- mod) can be 1 too large
pp = __emulu(t,B); // extra multiplication for test too large t
if (p[1]>(c+1)) t--; } // correction when t is known too large
else t = 0xffffffff; // max digit
c = 0; m = i - Lx; // carry, index of q
for (j = 0; j < Lx; ++j) { // reduce dividend
pp = __emulu(x[j],t) + c; // 64-bit product, with carry
c = p[1] + (y[m+j] < p[0]); // new 32-bit carry
y[m+j] -= p[0]; } // reduce dividend digits
if (y[i] != c) { // t too large: y[i] not cleared
c = 0; // clear carry
for (j = 0; j < Lx; ++j) // add back dividend
y[m+j] = ADC(y[m+j],x[j],&c); // using/updating c = carry
t--; } // decrement trial quotient
y[i] = 0; // MS word of y is cleared
q[m] = t; // store quotient digit
}
}
static void DIVpp(UInt32 q[], Int32 Ly, UInt32 y[], Int32 Lx, UInt32 x[]) { // q <- y/x, y <- y%x
Int32 i, j, n, k; UInt32 c, t, D, M, L; // ASSUME Ly >= Lx >= 2, y[Ly]==y[-1]==0
UInt64 pp; UInt32 *p = (UInt32*)(&pp); // p[0] = LS, p[1] = MS word of UInt64 pp
_BitScanReverse(&n, x[Lx-1]); k = 31-n; // MSb of x (n = 1..31), (.>>32 undefined)
D = (x[Lx-1] << k) | (x[Lx-2]>>n>>1); // normalize divisor by k = 31-n (0..30)
for (i = Ly; i >= Lx; --i) { // y[Ly] = 0
M = (y[ i ] << k) | (y[i-1]>>n>>1); // dividend << k
L = (y[i-1] << k) | (y[i-2]>>n>>1); // y is padded on both ends
t = M < D ? UDIV(L,M,D,&c) : 0xffffffff; // trial quotient digit (c <- mod) can be 1 too large
c = 0; // carry
for (j = 0; j < Lx; ++j) { // reduce dividend
pp = __emulu(x[j],t) + c; // 64-bit product, with carry
c = p[1] + (y[i-Lx+j] < p[0]); // new 32-bit carry
y[i-Lx+j] -= p[0]; } // reduce dividend digits
if (y[i] != c) { // t too large: y[i] not cleared
c = 0; // clear carry
for (j = 0; j < Lx; ++j) // add back dividend
y[j+i-Lx] = ADC(y[j+i-Lx],x[j],&c); // using/updating c = carry
t--; } // decrement trial quotient
y[i] = 0; // MS word of y is cleared
q[i-Lx] = t; // store quotient digit
}
}
static void MULpp(UInt32 *z, Int32 Ly, UInt32 *y, Int32 Lx, UInt32 *x) { // z <- y*x
Int32 i, j; UInt32 c, *yy, *zz; UInt64 p; UInt32 *q = (UInt32*)(&p)+1; // MS 32 bits of p
for (i = 0; i < Lx; ++i) {
c = 0; yy = y; zz = z + i;
for (j = 0; j < Ly; ++j) {
p = __emulu(*yy++,*x) + c;
*zz += (UInt32)p;
c = *q + (*zz++ < (UInt32)p);
}
*zz = c; ++x;
}
}
static UInt32 MulIpp(UInt32 *y, Int32 Lx, UInt32 *x, UInt32 u) { // y <- x*u
Int32 i; UInt32 c = 0; UInt64 p;
for (i = 0; i < Lx; ++i) {
p = __emulu(*x++,u) + c;
c = (UInt32)(p >> 32);
*y++ = (UInt32)p;
}
*y = c;
return c;
}
static Int32 LENGTH(Int32 Lx, UInt32 x[]) { // <- index of nonzero MS word = 1..Len
Int32 i;
for (i = Lx-1; i >= 0; --i)
if (x[i] > 0) return i+1;
return 1;
}
static void ADDpp(UInt32 *z, Int32 Ly, UInt32 *y, Int32 Lx, UInt32 *x) { // z <- y+x
Int32 i, c = 0; // ASSUMED: y > x, Len(z) = Ly+1
for (i = 0; i < Lx; ++i)
*z++ = ADC(*y++,*x++,&c); // c = new carry
for (i = 0; i < Ly-Lx; ++i)
*z++ = ADC(*y++,0,&c); // c = new carry
*z = c;
}
static void SUBpp(UInt32 *z, Int32 Ly, UInt32 *y, Int32 Lx, UInt32 *x) { // z <- y-x
Int32 i, b = 0; // ASSUMED: z is all 0, y > x: no final borrow
for (i = 0; i < Lx; ++i)
*z++ = SBB(*y++,*x++,&b); // b = new borrow
for (i = 0; i < Ly-Lx; ++i)
*z++ = SBB(*y++,0,&b);
}
static Int32 CMPpp(Int32 Len, Int32 *eq, UInt32 x[], UInt32 y[]) { // <- -0+, eq <- #equal words
Int32 i;
*eq = 0;
for (i = Len-1; i >= 0; --i)
if (x[i] == y[i])
*eq += 1;
else
return x[i]>y[i] ? 1 : -1;
return 0;
}
static UInt32 AddIpp(UInt32 *y, Int32 Lx, UInt32 *x, UInt32 u) { // y <- x+u, return carry
Int32 i;
for (i = 0; i < Lx; ++i) {
*y = u + *x++;
u = *y++ < u; // carry
}
*y = u;
return u;
}
static UInt32 SubIpp(UInt32 *y, Int32 Lx, UInt32 *x, UInt32 u) { // y <- x-u, return borrow
Int32 i;
for (i = 0; i < Lx; ++i) {
*y++ = *x - u;
u = *x++ < u; // borrow
}
return u;
}
static void UInt32toHex(UInt8 *hex, UInt32 u) { // in hex room for 5 bytes
Int32 i; UInt32 d, f = 0;
for (i=28; i>=0; i-=4) {
d = (u >> i) & 15;
f += d;
if ((f>0) || (i==0)) {
if (d > 9) *hex++ = d + 55;
else *hex++ = d + 48; }
}
*hex = 0;
}
static void UInt32to8Hex(UInt8 *hex, UInt32 u) { // in hex room for 5 bytes
Int32 i, d;
for (i=28; i>=0; i-=4) {
d = (u >> i) & 15;
if (d > 9) *hex++ = d + 55;
else *hex++ = d + 48;
}
*hex = 0;
}
They are more or less straightforward. There are 3 assembler functions:
UDIV, a wrapper for the single Pentium instruction for 64/32-bit division, which is, surprisingly, not available as intrinsic function, so it had to be coded in assembler.
ADC and SBB, which perform addition and subtraction, with carry/borrow, respectively. One can determine in C if an addition produces a carry (checking if the result is less than one of the added numbers), but it is quite tedious: a+b+carry needs a test after each +.
One somewhat more complicated function is the long division: DIVpp, and its faster variant DIVpq. I tried to make them as short as possible, without relying on any external function (except the inlined assembler functions). They are the traditional (grammar school) division algorithm with a couple of tricks to reduce code size. DIVpq has a little longer code, but it is noticeably faster on large numbers. (The trial quotient digit is estimated better, so fewer correction steps are necessary.)
The most complicated function is the extended Euclidean algorithm for finding the greatest common divisor of two long integers. This variant does not use multiplication or division, but shifts. The algorithm also returns two multipliers, to express the GCD as the linear combination of the two input. It can be used to compute modular inverses, that is, finding y for an x, such that mod(x*y, m) = 1.
Test the code on large, structured input (random values likely work), and post here the input which yield the wrong result or cause the script to crash, or cycle indefinitely.
Edit 20070802: Added functions ADC and SBB to simplify handling carry/borrow; and sped up and simplified further the division algorithm.
Edit 20070817:
- bugfix in MP_DEC
- tab implemented in MP_DEC
- new functions MP_SubIpp, MP_GCD, PI0
- faster division
Edit 20100613: A directory in AutoHotkey.net contains now the
Multi-precision integer library and related functions. The
library has to be included, before the long integer functions are used. (It contains a call to the initialization function.)
Some tests and example code is included in a
test script.
The
C source code of the machine code routines is also uploaded for reference.