The application has to include the MP_ functions listed below, and call first the MP_Init() function, to initialize machine code functions.
Long integers are stored in binary structures {Int:Sign, UInt:LenInUInts, UInts:LSword..MSword}, therefore no direct assignments can be performed. The functions below create new long integers for their results, if needed:
MP_LEN(x) ; Length in 32-bit integers
MP_IS0(x) ; test if x IS 0
MP_CMP(x, y, eq="") ; Compare: -0+ for x <=> y, eq = #equal MS words
MP_SFT(y, x, s) ; Shift: s>0: y <- x << s, s<0: y <- x >> -s
MP_ADD(z, y, x) ; Add: z <- y + x
MP_SUB(z, y, x) ; Subtract: z <- y - x
MP_MUL(z, y, x) ; Multiply: z <- y * x
MP_DIV(q, r, y, x) ; Divide, Remainder: q <- y/x, r <- Mod(y,x), sign(q) = SY*SX
MP_GCD(g, c, d, x, y) ; Extended Euclidean GCD <- g = c*x + d*y
MP_AddIpp(y, x, U32) ; Add a 32-bit integer (pos+pos) to x
MP_SubIpp(y, x, U32) ; Subtract a 32-bit integer (pos+pos) from x
MP_MulIpp(y, x, U32) ; Multiply x with max 32-bit integer (pos*pos)
MP_DivIpp(y, x, U32) ; Divide y <- x // 32-bit integer (pos/pos), RETURN remainder
MP_Cpy(y, x) ; Copy y <- x
MP_Set(x, h) ; Set x <- string (hex, decimal integer, double precision float)
MP_Hex(x, tab="") ; HEX representation of MP integer x
MP_Dec(z, tab="") ; DECIMAL representation of MP integer x
PI(digits) ; Newton's method for the digits of Pi
PI0(digits) ; Newton's method for the digits of Pi, faster for less than 20,000 digits
SRT(value,digits) ; Square root with Newton's iteration, value <= 16 digits integer
If your script contains
MsgBox % SRT(2,100)
The first 100 digits of sqrt(2) is shown, immediately. For the first 100 digits of Pi a fraction of a second is used with
MsgBox % PI(100)
Dividing a million-bit number with one of half size finishes in about 2 seconds in my 2GHz Centrino notebook.
Here is a long list of test cases
#include mpl.ahk MsgBox % "100 Digits of Sqrt(2):`n" SQT(2,100) ; 1414213562373095048801688724209698078569671875376948073176679737990732478462107038850387534327641572 MsgBox % "100 Digits of Pi:`n" PI0(100) ; 3141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067 S = GCD:`n MP_Set(x,30) MP_Set(y,18) MP_GCD(g,c,d, x,y) S .= "`n" . MP_Dec(g) ", " MP_Dec(c) ", " MP_Dec(d) ; 6, -1, 2 MP_Set(x,2*3*5*7*11*13*17*23**4*29**3) MP_Set(y,17**5*23*31**2) MP_GCD(g,c,d, x,y) S .= "`n" . MP_Dec(g) ", " MP_Dec(c) ", " MP_Dec(d) ; 391, -6878429, 763665233731 MP_Set(x,176023680645013966468226945392411250770384383304492191886725992896575345044216019675) MP_Set(y,222232244629420445529739893461909967206666939096499764990979600) MP_GCD(g,c,d, x,y) ; GCD of the 400th and 300th Fibonacci numbers = 100th Fibonacci number S .= "`n" . MP_Dec(g) ", " MP_Dec(c) ", " MP_Dec(d) ; 354224848179261915075, -792070839848372253127, 627376215338105766356982006981782561278128 MP_Set(x,1414213562373095048801688724209698078569671875376948073176679737990732478462107038850387534327641572) MP_Set(y,3141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117068) MP_GCD(g,c,d, x,y) S .= "`n" . MP_Dec(g) "`n" MP_Dec(c) "`n" MP_Dec(d) ; 4 ; 277618068276139372139970459366882728416722612456219294511921887244235666622301044198307082946902119 ; -124972038264512876885991215247412539326590817554290162245629946727004984493348100970663253404223898 Msgbox %S% /* ; Measure running time n = 10000 MP_Set(a,1) MP_SFT(x,a,64*n-1) MP_SFT(y,a,32*n-1) i := j := 4 Loop %n% { Random r, -0x80000000,0x7fffffff NumPut(r,x,i+=4) Random r, -0x80000000,0x7fffffff NumPut(r,x,i+=4) Random r, -0x80000000,0x7fffffff NumPut(r,y,j+=4) } t := A_TickCount MP_Div(q,r,x,y) MsgBox % A_TickCount - t ; 20,000/10,000 words 881ms(DIVpq),1065ms(DIVpp), 60,000/0,000 dec digits = 100 ms ; Cut'n Paste comparisions with other MP packages Loop 5000 { Random r, 0, 999 a .= SubStr(1000+r,2,2) b .= SubStr(r,0) } MsgBox Dividend: %a% MsgBox Divisor: %b% MP_Set(x,a) MP_Set(y,b) MP_Div(q,r,x,y) c := MP_Dec(q) d := MP_Dec(r) ClipBoard := "floor(" a "/" b ") - " c MsgBox ClipBoard = floor(a / b) - q`nQuotient: %c% ClipBoard := "(" a " mod " b ") - " d MsgBox ClipBoard = (a mod b) - r`nRemainder: %d% */ S = Division:`n MP_Set(x,12345678) MP_Set(y,-1000) MP_Div(q,r,x,y) S .= "`n" . MP_Dec(q) " / " MP_Dec(r) ; -12345 / 678 MP_Set(x,1234567890123) MP_Set(y,1000) MP_Div(q,r,x,y) S .= "`n" . MP_Dec(q) " / " MP_Dec(r) ; 1234567890 / 123 MP_Div(q,r,y,x) S .= "`n" . MP_Dec(q) " / " MP_Dec(r) ; 0 / 1000 MP_Set(x,1234567890123456789012345) MP_Set(y,1000000000000) MP_Div(q,r,x,y) S .= "`n" . MP_Dec(q) " / " MP_Dec(r) ; 1234567890123 / 456789012345 MP_Set(x,0xfffffffffffffffe00000000) MP_Set(y,0xffffffffffffffff) MP_Div(q,r,x,y) S .= "`n" . MP_Hex(q) " / " MP_Hex(r) ; 0xFFFFFFFF / 0xFFFFFFFeFFFFFFFF MP_Set(x,0xffffffffe00000000) MP_Set(y,0xfffffffff) MP_Div(q,r,x,y) S .= "`n" . MP_Hex(q) " / " MP_Hex(r) ; 0xFFFFFFFF / 0xeFFFFFFFF (9 digits) MP_Set(x,0x10000000000000000) ; 1,0,0 MP_Set(y,0x100000000) ; 1,0 MP_Div(q,r,x,y) S .= "`n" . MP_Hex(q) " / " MP_Hex(r) ; 0x100000000 / 0x0 (9/1 digits) MP_Set(x,0x1000000000000000000000000) ; 1,0,0,0 MP_Set(y,0x10000000000000001) ; 1,0,1 MP_Div(q,r,x,y) S .= "`n" . MP_Hex(q) " / " MP_Hex(r) ; 0xFFFFFFFF / 0xFFFFFFFF00000001 (8/16 digits) MP_Set(x,0xffffffffe00000000) ; f,-2,0 MP_Set(y,0x111111111) ; 1,1..1 MP_Div(q,r,x,y) S .= "`n" . MP_Hex(q) " / " MP_Hex(r) ; 0xeFFFFFFFF / 0x11111111 (9/8 digits) MsgBox %S% S = Shift:`n MP_Set(x,0x2) MP_SFT(y,x,0) S .= "`n" . MP_Dec(y) ; 2 MP_SFT(y,x,-2) S .= "`n" . MP_Dec(y) ; 0 MP_Set(x,0x100000000) MP_SFT(y,x,-32) S .= "`n" . MP_Dec(y) ; 1 MP_Set(x,2222220000000022222222) MP_SFT(y,x,1) S .= "`n" . MP_Dec(y) ; 4444440000000044444444 MP_SFT(y,x,-1) S .= "`n" . MP_Dec(y) ; 1111110000000011111111 MP_Set(x,1) MP_SFT(y,x,2000) S .= "`n" . MP_Dec(y) ; 2**2000 = 114813069527425452...62184851149029376 MsgBox %S% S = Multiply:`n MP_Set(x,10000000) MP_Set(y,200) MP_Mul(z,y,x) S .= "`n" . MP_Dec(z) ; 2000000000 MP_Set(x,1000000000000) MP_Set(y,-12345678901234256789) MP_Mul(z,y,x) S .= "`n" . MP_Dec(z) ; -12345678901234256789000000000000 MsgBox %S% S = Subtract: `n MP_Set(x,0xff) MP_Set(y,255) MP_Sub(z,y,x) S .= "`n" . MP_Hex(z) ; 0x0 MP_Set(x,1) MP_Set(y,0x100000000) MP_Sub(z,y,x) S .= "`n" . MP_Hex(z) ; 0xffffffff MP_Set(x,0xffffffffffffffffffffffffffffffffff) MP_Set(y,-1) MP_Sub(z,y,x) S .= "`n" . MP_Hex(z) ; -0x10000000000000000000000000000000000 MP_Set(x, 0xffffffffffffffffffffffffff12345678) MP_Set(y, 0x10000000012345678) MP_Sub(z,y,x) S .= "`n" . MP_Hex(z) ; -0xFFFFFFFFFFFFFFFFFEFFFFFFFF00000000 MsgBox %S% S = Add:`n MP_Set(x, 0xffffffff00000001) MP_Set(y,0xffffffffffffffffff00000000ffffffff) MP_Add(z,x,y) S .= "`n" . MP_Hex(z) ; 0x10000000000000000000000000000000000 MP_Set(x,0xffffffffffffffffffffffffffffffffff) MP_Set(y,1) MP_Add(z,y,x) S .= "`n" . MP_Hex(z) ; 0x10000000000000000000000000000000000 MP_Set(x,-0xffffffffffffffffffffffffff12345678) MP_Set(y,-0x10000000012345678) MP_Add(z,y,x) S .= "`n" . MP_Hex(z) ; -0x1000000000000000000FFFFFFFF2468ACF0 MP_Set(x,111111110000000033333333) MP_Cpy(y,x) MP_Add(z,y,x) S .= "`n" . MP_Dec(z) ; 222222220000000066666666 MsgBox %S% S = Compare:`n MP_Set(x,-0x123456789abcdef0123456789abcdef01) MP_Set(y,-0x123456789abcdef0123456789abcdef01) c := MP_CMP(x,y,eq) S .= "`n" c "/" eq ; 0/5 MP_Set(x,-0xfffffffffffffffffffccccccccccccc) MP_Set(y,-0xfffffffffffffffffffffffffffffffd) c := MP_CMP(x,y,eq) S .= "`n" c "/" eq ; 1/2 MP_Set(x,-1) MP_Set(y, 1) c := MP_CMP(x,y) S .= "`n" c ; -1 MP_Set(x,-0xffffffffffffffffffffffffffffffffff) MP_Set(y,-0xfffffffffffffffff) c := MP_CMP(x,y) S .= "`n" c ; -1 MP_Set(x,0xffffffffffffffffffffffffffffffffff) MP_Set(y,0xfffffffffffffffffffffffffffffffffd) c := MP_CMP(x,y,eq) S .= "`n" c "/" eq ; 1/4 MsgBox %S% S = Operations with UInt:`n MP_Set(x,1) MP_SubIpp(y,x,0) S .= "`n" . MP_Dec(y) ; 1 MP_SubIpp(y,x,1) S .= "`n" . MP_Dec(y) ; 0 MP_SubIpp(y,x,2) S .= "`n" . MP_Dec(y) ; -1 MP_Set(x,0x100000000) MP_SubIpp(y,x,1) S .= "`n" . MP_Hex(y) ; 0xffffffff MP_Set(x,0xffffffffffffffffffffffffffffffffff) MP_AddIpp(y,x,0xfffffff0) S .= "`n" . MP_Hex(y) ; 0x100000000000000000000000000FFFFFFEF MP_Set(x,0xffffffffffffffffffffffffffffffffff) MP_MulIpp(y,x,0xfffffff0) S .= "`n" . MP_Hex(y) ; 0xFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFF00000010 MP_Set(x,0xffffffffffffffffffffffffffffffffffffff) r := MP_DivIpp(y,x,0xfffffff0) S .= "`n" . r " / " MP_Hex(y) ; 4095(0xFFF) / 0x1000000100000010000001000000100 MP_Set(x,0xffffffffffffffffffffffffffffffffff) r := MP_DivIpp(y,x,0xfffffff0) S .= "`n" . r " / " MP_Hex(y) ; 16777215(0xFFFFFF) / 0x100000010000001000000100000 MsgBox %S% S = Set Hex:`n i := 0 i++, MP_Set(a%i%,"0x0") i++, MP_Set(a%i%, 0x1234567) ; no need for quotes i++, MP_Set(a%i%, 0x12345678) i++, MP_Set(a%i%, 0x123456789) i++, MP_Set(a%i%,-0x123456789) i++, MP_Set(a%i%, 0x1234567890abcdef1234567890abcdef0fedcba98765432100) i++, MP_Set(a%i%,-0x1234567890abcdef1234567890abcdef0fedcba98765432100) i++, MP_Set(a%i%,-9223372036854775807) ; 2^63-1 i++, MP_Set(a%i%, 100) i++, MP_Set(a%i%,-0) i++, MP_Set(a%i%, 1.5) i++, MP_Set(a%i%,-0.75) i++, MP_Set(a%i%, 2.**52) i++, MP_Set(a%i%, 2.**52+3) i++, MP_Set(a%i%, 2.**62+2.**59) i++, MP_Set(a%i%, 2.**63-2.**59) i++, MP_Set(a%i%, 2.**63) i++, MP_Set(a%i%, 2.**63+2.**59) i++, MP_Set(a%i%, 2.**64-2.**59) i++, MP_Set(a%i%, 2.**64) i++, MP_Set(a%i%, 2.**64+2.**59) i++, MP_Set(a%i%, 2.**128-2.**120) i++, MP_Set(a%i%, 2.**128) i++, MP_Set(a%i%, 2.**128+2.**120) i++, MP_Set(a%i%, 2.**843+2.**842+2.**840+2.**839+2.**837) ; ~ max 0xDA000... i++, MP_Set(a%i%, 36028797018963968) ; 2**55 0x80000000000000 i++, MP_Set(a%i%, 4611686018427387904) ; 2**62 0x4000000000000000 i++, MP_Set(a%i%, 9223372036854775808) ; 2**63 0x8000000000000000 i++, MP_Set(a%i%, 18446744073709551616) ; 2**64 0x10000000000000000 i++, MP_Set(a%i%, 36893488147419103232) ; 2**65 0x20000000000000000 i++, MP_Set(a%i%,-987654321098765432109876543210) ;-0xC7748819DFFB62438D1C67EEA Loop %i% S .= A_Index . ": " . MP_Hex(a%A_Index%) . "`n" MsgBox %S% S = Set Decimal:`n i := 0 i++, MP_Set(a%i%, 36028797018963968) ; 2**55 0x80000000000000 i++, MP_Set(a%i%, 4611686018427387904) ; 2**62 0x4000000000000000 i++, MP_Set(a%i%, 9223372036854775808) ; 2**63 0x8000000000000000 i++, MP_Set(a%i%, 18446744073709551616) ; 2**64 0x10000000000000000 i++, MP_Set(a%i%, 36893488147419103232) ; 2**65 0x20000000000000000 i++, MP_Set(a%i%,-987654321098765432109876543210) ;-0xC7748819DFFB62438D1C67EEA Loop %i% S .= A_Index . ": " . MP_Dec(a%A_Index%) . "`n" MsgBox %S% S = Hex/Decimal Conversion:`n VarSetCapacity(z,999,Asc("0")) MP_Set(x, 0x1 . SubStr(z,500)) S .= "`n" . MP_Dec(x) ; 2**2000 in decimal form = 114813069527425452...62184851149029376 MP_Set(x, 1 . SubStr(z,500)) S .= "`n`n" . MP_Hex(x) ; 10**500 in hexadecimal form = 0x1F365FEC9370D2BAEE34B150720878EA52CCBA5E661...000(*125) MsgBox %S%
The long integer manipulation AHK functions are below:
; Multi Precision Library v.1.0 ; Large Integer = {Int:Sign, UInt:LenInUInts, UInts:LSword..MSword} ; = little endian, signed magnitude (not 2's complement!) MP_Init() ; Initialize Multi Precision Library MP_CMP(ByRef x, ByRef y, ByRef eq="") { ; -0+ for x <=> y, eq = #equal MS words Global CMPpp eq := 0 SX := NumGet(x,0,"Int"), SY := NumGet(y,0,"Int") If (SX < SY) Return -1 If (SX > SY) Return 1 LX := NumGet(x,4), LY := NumGet(y,4) If (LX < LY) Return -SX If (LX > LY) Return SX Return SX*dllcall(&CMPpp,"Int",LX, "UInt*",eq, "UInt",&x+8, "UInt",&y+8, "CDECL Int") ; same sign, length } MP_SFT(ByRef y, ByRef x, s) { ; s>0: y <- x << s, s<0: y <- x >> -s Global SFL, SFR, LEN L8 := NumGet(x,4) If (s > 0) { ; shift left offs := s//32, s &= 31 VarSetCapacity(y,(L8+offs)*4+8,0) NumPut(NumGet(x,0,"Int"),y) ; sign NumPut(L8+offs,y,4) ; new length dllcall(&SFL, "UInt",&y+8+4*offs, "Int",L8, "UInt",&x+8, "UInt",s, "CDECL") Return } ; shift right s := -s, offs := s//32, s &= 31 If (offs >= L8) { ; too large shift: nothing remains VarSetCapacity(y,12,0) NumPut(1,y), NumPut(1,y,4) Return } VarSetCapacity(y,(L8-offs)*4+8,0) NumPut(NumGet(x,0,"Int"),y) ; sign dllcall(&SFR, "UInt",&y+8, "Int",L8-offs, "UInt",&x+8+4*offs, "UInt",s, "CDECL") NumPut(dllcall(&LEN,"Int",L8-offs,"UInt",&y+8, "CDECL Int"), y, 4) } MP_ADD(ByRef z, ByRef y, ByRef x) { ; z <- y + x Global CMPpp, ADDpp, SUBpp, LEN SX := NumGet(x,0,"Int"), SY := NumGet(y,0,"Int") LX := NumGet(x,4), LY := NumGet(y,4), LZ := LX > LY ? LX : LY If (SX = SY) { ; ADD VarSetCapacity(z, LZ*4+12) ; room for carry If (LY > LX) dllcall(&ADDpp,"UInt",&z+8, "Int",LY,"UInt",&y+8, "Int",LX,"UInt",&x+8, "CDECL") Else dllcall(&ADDpp,"UInt",&z+8, "Int",LX,"UInt",&x+8, "Int",LY,"UInt",&y+8, "CDECL") NumPut(SX,z) ; set sign(z) NumPut(dllcall(&LEN,"Int",LZ+1,"UInt",&z+8, "CDECL Int"), z, 4) Return } ; SUBTRACT NumPut(1,x), NumPut(1,y) ; set both positive c := MP_CMP(x,y,eq) ; compare, find same MS part eq -= eq = LZ ; leave at least one word VarSetCapacity(z,(LZ-eq)*4+8,0) ; no room needed for equal MS parts (fill 0 for x=y) If (c = 0) ; x = y NumPut(1,z), NumPut(1,z,4) ; z <- 0 Else If (c < 0) { ; x < y dllcall(&SUBpp,"UInt",&z+8, "Int",LY-eq,"UInt",&y+8, "Int",LX-eq,"UInt",&x+8, "CDECL") NumPut(Sy,z) ; sign(z) <- Sy } Else { ; x > y dllcall(&SUBpp,"UInt",&z+8, "Int",LX-eq,"UInt",&x+8, "Int",LY-eq,"UInt",&y+8, "CDECL") NumPut(Sx,z) ; sign(z) <- Sx } NumPut(dllcall(&LEN,"Int",LZ,"UInt",&z+8, "CDECL Int"), z, 4) NumPut(Sx,x), NumPut(Sy,y) ; reset both signs } MP_SUB(ByRef z, ByRef y, ByRef x) { ; z <- y - x SX := NumGet(x,0,"Int") NumPut(-SX,x) ; x <- (-x) MP_ADD(z,y,x) ; z <- y + (-x) NumPut( SX,x) ; reset sign(x) } MP_MUL(ByRef z, ByRef y, ByRef x) { ; z <- y * x Global MULpp, LEN SX := NumGet(x,0,"Int"), SY := NumGet(y,0,"Int") LX := NumGet(x,4), LY := NumGet(y,4), LZ := LX + LY VarSetCapacity(z, LZ*4+8, 0) dllcall(&MULpp,"UInt",&z+8, "Int",LY,"UInt",&y+8, "Int",LX,"UInt",&x+8, "CDECL") NumPut(dllcall(&LEN,"Int",LZ,"UInt",&z+8, "CDECL Int"), z, 4) NumPut(Sy*Sx,z) ; sign(z) <- Sy*Sx } MP_DIV(ByRef q, ByRef r, ByRef y, ByRef x) { ; q <- y/x, r <- Mod(y,x), sign(q) = SY*SX, sign(r) = SY Global DIVpq, LEN, CPY SX := NumGet(x,0,"Int"), SY := NumGet(y,0,"Int") LX := NumGet(x,4), LY := NumGet(y,4) If (Lx = 1) { ; Single digit x If (NumGet(x,8) = 0) Return "ERROR: Divide by 0" rem := MP_DivIpp(q, y, NumGet(x,8)) VarSetCapacity(r,12) ; put rem in MP format NumPut(SY,r), NumPut(1,r,4), NumPut(rem,r,8) NumPut(Sy*Sx,q) ; set lengths, signs Return } ; Multi-digit x If (MP_CMP(y,x) < 0) { ; y < x VarSetCapacity(q,12,0) NumPut(SY*Sx,q), NumPut(1,q,4) ; q <- 0 MP_CPY(r,y) ; r <- y Return } ; Multi-digits, y >= x LQ := LY > LX ? LY-LX+1 : 1 VarSetCapacity(q, LQ*4+ 8, 0) ; q initialized to all 0! VarSetCapacity(r, LY*4+12, 0) ; r <- y, 0-padded on both ends dllcall(&CPY, "UInt",&r+8, "UInt",&y+8, "Int",LY*4, "CDECL") dllcall(&DIVpq,"UInt",&q+8, "Int",LY,"UInt",&r+8, "Int",LX,"UInt",&x+8, "CDECL") NumPut(dllcall(&LEN,"Int",LQ,"UInt",&q+8, "CDECL Int"), q, 4) NumPut(dllcall(&LEN,"Int",LX,"UInt",&r+8, "CDECL Int"), r, 4) NumPut(Sy*Sx,q), NumPut(Sy,r) ; sign(q) <- Sy*Sx, sign(r) <- Sy } MP_GCD(ByRef g, ByRef c, ByRef d, ByRef x, ByRef y) { ; Extended shifting Euclidean GCD = g = c*x + d*y Global xGCD, CPY If (MP_IS0(x)) { MP_CPY(g,y), MP_SET(c,0), MP_SET(d,1) Return } If (MP_IS0(y)) { MP_CPY(g,x), MP_SET(c,1), MP_SET(d,0) Return } L := ((MP_LEN(x) > MP_LEN(y) ? MP_LEN(x) : MP_LEN(y)) + 3) * 4 ; temp length in bytes VarSetCapacity(Z, 6*L, 0) dllcall(&CPY, "UInt", &Z, "UInt",&x, "Int",MP_LEN(x)*4+8, "CDECL") ; copy input dllcall(&CPY, "UInt",&Z+L, "UInt",&y, "Int",MP_LEN(y)*4+8, "CDECL") ; copy destroyed in xGCD dllcall(&xGCD, "UInt*",f,"UInt*",a,"UInt*",b, "UInt",&Z,"UInt",&Z+L ; do the work , "UInt",&Z+2*L,"UInt",&Z+3*L,"UInt",&Z+4*L,"UInt",&Z+5*L) s := NumGet(f+0,4)*4+8, VarSetCapacity(g,s) ; copy result from buffer dllcall(&CPY, "UInt",&g, "UInt",f, "Int",s, "CDECL") s := NumGet(a+0,4)*4+8, VarSetCapacity(c,s) dllcall(&CPY, "UInt",&c, "UInt",a, "Int",s, "CDECL") s := NumGet(b+0,4)*4+8, VarSetCapacity(d,s) dllcall(&CPY, "UInt",&d, "UInt",b, "Int",s, "CDECL") } MP_AddIpp(ByRef y, ByRef x, I32) { ; add a 32-bit integer (pos+pos) to x Global AddIpp L8 := NumGet(x,4) VarSetCapacity(y,L8*4+12) ; one extra word for carry c := dllcall(&AddIpp, "UInt",&y+8, "Int",L8, "UInt",&x+8, "UInt",I32, "CDECL UInt") NumPut(1,y) ; positive NumPut(c>0 ? L8+1 : L8,y,4) ; true length } MP_SubIpp(ByRef y, ByRef x, I32) { ; subtract a 32-bit integer (pos+pos) from x Global SubIpp, LEN L8 := NumGet(x,4) If (L8 < 2) { VarSetCapacity(y,12) d := NumGet(x,8,"UInt") - I32 s := d < 0 ? -1 : 1 NumPut(s,y), NumPut(1,y,4), NumPut(s*d,y,8) Return } VarSetCapacity(y,L8*4+8) ; no borrow c := dllcall(&SubIpp, "UInt",&y+8, "Int",L8, "UInt",&x+8, "UInt",I32, "CDECL UInt") NumPut(1,y) ; positive NumPut(dllcall(&LEN,"Int",L8,"UInt",&y+8, "CDECL Int"), y, 4) } MP_MulIpp(ByRef y, ByRef x, I32) { ; multiply x with max 32-bit integer (pos*pos) Global MulIpp L8 := NumGet(x,4) VarSetCapacity(y,L8*4+12) ; one extra word for carry c := dllcall(&MulIpp,"UInt",&y+8, "Int",L8, "UInt",&x+8, "UInt",I32, "CDECL UInt") NumPut(1,y) ; positive NumPut(c>0 ? L8+1 : L8,y,4) ; true length } MP_DivIpp(ByRef y, ByRef x, U32) { ; y := x // 32-bit integer (pos/pos), RETURN remainder Global LEN, DivIpp L8 := NumGet(x,4) VarSetCapacity(y,L8*4+8) r := dllcall(&DivIpp,"UInt",&y+8, "Int",L8, "UInt",&x+8, "UInt",U32, "CDECL UInt") NumPut(1,y) ; positive NumPut(dllcall(&LEN,"Int",L8,"UInt",&y+8, "CDECL Int"), y, 4) Return r } MP_Cpy(ByRef y, ByRef x) { ; y <- x Global CPY L8 := NumGet(x,4) VarSetCapacity(y,L8*4+8) dllcall(&CPY, "UInt",&y, "UInt",&x, "Int",L8*4+8, "CDECL") } MP_Set(ByRef x, h) { ; x <- string (hex, decimal integer, double precision float) Global U32toHex Static D52 := 4503599627370496.0, d := 0x1234567890123456, mk = 0xFFFFFFFFFFFFF If (SubStr(h,1,1) = "-") sign := -1, h := SubStr(h,2) ; negative Else sign := 1 ; positive If (SubStr(h,1,2) = "0x") { ; arbitrary long hex L8 := (StrLen(h)+5)//8 VarSetCapacity(x,L8*4+8) Loop % L8-1 NumPut("0x" . SubStr(h,1-A_Index*8,8), x, 4+A_Index*4) NumPut(SubStr(h,1,StrLen(h)+8-L8*8), x, 4+L8*4) NumPut(L8,x,4) } Else If h contains e,. ; decimal double { If (h <= D52) ; 52-bit precise AHK conversion MP_Set(x, Round(h)) ; sign fixed at the end Else { NumPut(h,d,0,"Double") e := (NumGet(d,6,"Short")>>4) - 1075 ; power_of_2 exponent (0x3ff+52) f := NumGet(d,0,"Int64") & mk | D52 ; fraction 1..2 * 2.**-52 If e < 11 MP_Set(x, f<<e) ; h < 2**63 (Int64) Else { fm := f >> 32 fl := f & 0xffffffff el := e & 31, ek := e >> 5 If (el < 12) { ; 2 words VarSetCapacity(x,16+4*ek,0) NumPut(2+ek,x,4) NumPut(fl<<el,x,8+4*ek) NumPut(fm<<el | fl>>(32-el), x ,12+4*ek) } Else { ; 3 words VarSetCapacity(x,20+4*ek,0) NumPut(3+ek,x,4) NumPut(fl<<el,x,8+4*ek) NumPut(fm<<el | fl>>(32-el), x ,12+4*ek) NumPut(fm>>(32-el), x ,16+4*ek) } } } } Else If (StrLen(h) < 19) ; decimal integer If (h > 0xffffffff) { VarSetCapacity(x,16) ; 63 bits NumPut(h & 0xffffffff,x,8) NumPut(h>>32,x,12) NumPut(2,x,4) } Else { ; 32 bits VarSetCapacity(x,12) NumPut(h,x,8) NumPut(1,x,4) } Else { ; > 18 digits d := Mod(StrLen(h)-1,9) + 1 ; 1..9 MP_Set(x,SubStr(h,1,d)) ; MS 1..9 digits StringTrimLeft h, h, d Loop % StrLen(h)//9 { MP_MulIpp(y, x, 1000000000) MP_AddIpp(x, y, SubStr(h,1,9)) StringTrimLeft h, h, 9 } } NumPut(sign,x) } MP_Hex(ByRef x, tab="") { ; HEX representation of MP integer x, tab before every 8 digits Global U32toHex, U32to8Hex Static S := "12345678" L8 := NumGet(x,4) VarSetCapacity(h,3+L8*8) DllCall(&U32toHex, "Str",h, "UInt",NumGet(x,4+L8*4), "CDECL") h := NumGet(x)=1 ? "0x" . h : "-0x" . h Loop % L8-1 DllCall(&U32to8Hex, "UInt",&S, "UInt",NumGet(x,L8*4+4-A_Index*4), "CDECL"), h .= tab . S Return h } MP_Dec(ByRef z, tab="") { ; DECIMAL representation of MP integer x, tab before every 9 digits Static d9 := 1000000000 MP_CPY(x,z) ; copy of input (destroyed) sign := NumGet(x) VarSetCapacity(d,NumGet(x,4)*10) ; allocate enough memory Loop { c := MP_DivIpp(y, x, d9) ; no leading 0's If MP_IS0(y) Break d := SubStr(c+d9,-8) . tab . d ; attach to the left with leaging 0's c := MP_DivIpp(x, y, d9) ; no leading 0's If MP_IS0(x) Break d := SubStr(c+d9,-8) . tab . d ; pad-left, prepend } Return (sign = 1 ? "" : "-") . c . tab . d } MP_LEN(ByRef x) { Return NumGet(x,4) } MP_IS0(ByRef x) { Return NumGet(x,4)=1 && NumGet(x,8)=0 } PI(digits) { ; Newton's method, 2 accurate bits/iteration, less rounding errors, < 905 million digits VarSetCapacity(y,digits*1.3,Asc("0")) MP_SET(x, 3 . y) ; initial estimate pi = 3 MP_CPY(y,x) Loop { i := 2*A_Index-1 ; for better speed: start with a large i and good initial value MP_SET(n,i*i) MP_MUL(z,y,n) MP_SET(d,4*(i+1)*(i+2)) ; up to 63 bits, i < 1.4 * 2**30, up to 1.4*2**31 accurate bits MP_DIV(y,n,z,d) ; the remainder in n can be used for rounding: fewer guard digits MP_ADD(z,x,y) MP_CPY(x,z) If (MP_LEN(y)<2) ; last few terms can be floats Break } Return SubStr(MP_DEC(x),1,digits) } PI0(digits) { ; Newton's method, little faster, for less than 20,000 digits VarSetCapacity(y,digits+sqrt(2*digits),Asc("0")) ; digits+sqrt(digits) is expected to be OK MP_SET(x, 3 . y) ; x = estimate of pi, x0 = 3 MP_CPY(y,x) ; y = current term to be added Loop { i := 2*A_Index-1 MP_MulIpp(z,y,i*i) ; 32-bit UInt parameter d := 4*(i+1)*(i+2) ; i < 2**15 ==> accuracy < 2**16 bits =~ 19,728 digits r := MP_DivIpp(w,z,d) ; remainder used for rounding: fewer guard digits MP_AddIpp(y,w,r>=d//2) MP_ADD(z,x,y) MP_CPY(x,z) If (MP_LEN(y) < 2) Break } Return SubStr(MP_DEC(x),1,digits) } SQT(value,digits) { ; Newton iteration, value <= 16 digits integer VarSetCapacity(z,digits,Asc("0")) StringLeft z, z, digits ; trailing 0's a = %value%%z%%z% ; scaled input s := Round(sqrt(value)*1.e16, 1) ; float form, to have more than 18 digits StringTrimRight s, s, 2 ; remove ".0" from end s .= z ; pad with 0's StringTrimRight s, s, 16 ; initial value MP_SET(V, a) MP_SET(X, s) Loop { MP_DIV(y,z,V,X) MP_ADD(z,X,y) MP_SFT(X,z,-1) If MP_CMP(X,y) = 0 Break } Return SubStr(MP_DEC(x),1,digits) } MCode(ByRef code, hex) { ; allocate memory and write Machine Code there VarSetCapacity(code,StrLen(hex)//2) Loop % StrLen(hex)//2 NumPut("0x" . SubStr(hex,2*A_Index-1,2), code, A_Index-1, "Char") } MP_Init() { ; INITIALIZE global machine code functions GLOBAL MCode(CPY,"568B74241085F67E118B4C240C8B4424088A11881040414E75F75EC3") ; ~memcpy, copy BYTES MCode(LEN,"8B442404EB0A8B4C2408833C810077054879F333C040C3") MCode(SFL,"558BEC8B450C568B7508576A205A2B551433FF85C07E2489450C8B4510538B188B4D14D3E38BCA0BDF891E8B3883C604D3EF" . "83C004FF4D0C75E45B893E5F5E5DC3") MCode(SFR,"558BEC8B551053568B7508578B7D0C6A208BCFC1E102582B451433DB03F103D185FF7E1F897D0C8B4D1483EA048B3AD3EF83" . "EE048BC80BFB893E8B1AD3E3FF4D0C75E45F895EFC5E5B5DC3") MCode(UDIV,"8B4424048B5424088B4C2410F774240C8911C3") ; ErrorLevel = 0xc0000095: Integer Overflow MCode(ADC, "33D28B4C240C8B010344240883D2000344240483D2008911C3") ; x+y+*c, c <- carry MCode(SBB, "33D28B4C240C8B4424042B0183D2002B44240883D2008911C3") ; x-y-*b, b <- borrow MCode(MulIpp,"558BEC518365FC008B4D08578B7D0C85FF7E2153568B75108B06F7651433DB83C6040345FC13D3890183C1044F8955FC75E6" . "5E5B8B45FC89015FC9C3") MCode(DivIpp,"558BEC83EC0C8365FC00578B7D0C4F78398D45FC538B5D1089450C8B4508568D34B82BD88B45FC8945F88B04338945F48B45" . "F48B55F88B4D0CF77514891189064F83EE0485FF7DDC5E5B8B45FC5FC9C3") MCode(AddIpp,"558BEC8B45148B4D08568B750C85F67E1F8B5510578B3A03F8893983C20483C1043BF8730533C040EB0233C04E75E65F8901" . "5E5DC3") MCode(SubIpp,"558BEC8B550C85D27E2C8B45108B4D08568B302B751489318B3083C10483C0043B75147309C7451401000000EB048365140" . "04A75DC5E8B45145DC3") MCode(CMPpp,"558BEC8B550C8B450883220048535657781C8B75148B7D108D0C862BFE8B1C0F3B197511FF024883E90485C07DEF33C05F5E" . "5B5DC38B4D10C1E0028B0C08390C301BC083E00248EBE7") MCode(ADDpp,"558BEC83EC108B45148365FC0085C053568B7508578B7D107E3B8B5D188D4DFC894DF08945F88B038945F48B0789450833D2" . "8B4DF08B010345F483D20003450883D2008911890683C60483C70483C304FF4DF875D18B450C2B451485C07E2D8D4DFC894D148BD88B07" . "89450833D28B4D148B0183C00083D20003450883D2008911890683C60483C7044B75DB8B45FC5F89065E5BC9C3") MCode(SUBpp,"558BEC83EC148B45148365FC0085C053568B7508578B7D107E3B8B5D188D4DFC894DEC8945F88B038945F48B078945F033D2" . "8B4DEC8B45F02B0183D2002B45F483D2008911890683C60483C70483C304FF4DF875D18B450C2B451485C07E2D8D4DFC894D0C8BD88B07" . "89451433D28B4D0C8B45142B0183D20083E80083D2008911890683C60483C7044B75DB5F5E5BC9C3") MCode(MULpp,"558BEC8B451483EC0C85C07E628B4D08538B5D18568B751057894D142BF18945F48B450C8B4D1433D285C08955FC7E2B8945" . "F88B040EF72333FF0345FC13D701018B3983C1043BF8730533C040EB0233C003D0FF4DF88955FC75D88345140483C30483EE04FF4DF489" . "1175B55F5E5BC9C3") MCode(DIVpp,"558BEC83EC288B4D1856578B7D148B44B9FC0FBDD08955FC6A1F5E2BF28B54B9F88B4DFCD3EA8BCED3E08975ECD1EA0BD08B" . "450C3BC78955E88945F00F8C210100008B4D102BC78D048189450C8B45082BC1538945D8EB038B4D108B5DF0C1E3028D740BFC8B068B76" . "FC03D98B4DFC8BD0D3EA8B4DEC895DE48B1BD3E38B4DFCD3EE8B4DECD3E0D1EAD1EE0BD30BF03B55E88955F88975E073168D45148945F4" . "8B45E08B55F88B4DF4F775E88911EB0383C8FF836514008365F80085FF8945F47E328B750C8B4DF88B45188B0488F765F433C903451413" . "D18B0E3BC81BDBF7DB2BC803DAFF45F8890E83C604397DF8895D147CD18B45148B4DE43901744233DB3BFB895D147E368B750C8D451489" . "45DC8B45188B04988945F88B068945E033D28B4DDC8B010345F883D2000345E083D200891189064383C6043BDF7CD3FF4DF48B45E48320" . "008B550CFF4DF08B45D88B4DF4836D0C04397DF0890C100F8DF6FEFFFF5B5F5EC9C3") ; shortest code 359 bytes MCode(DIVpq,"558BEC83EC2C538B5D1856578B7D148B44BBFC0FBDC88B54BBF8894DFC6A1F592B4DFC8BF2894DF48B4DFCD3EE8B4DF4D3E0" . "D3E2D1EE0BF083FF028975E48955E87E0E8B44BBF48B4DFCD3E8D1E80945E88B450C3BC78945EC0F8C300100002BC7C1E0028945F88B5D" . "EC8B4D10C1E3028D740BFC8B068B76FC03D98B4DFC8BD0D3EA8B4DF4895DE08B1BD3E38B4DFCD3EE8B4DF4D3E0D1EAD1EE0BD30BF03B55" . "E48955F08975DC732A8D451489450C8B45DC8B55F08B4D0CF775E4891189450CF765E88945D48B4514403BD07609FF4D0CEB04834D0CFF" . "836514008365F00085FF7E388B45F88B4D108D34088B4DF08B45188B0488F7650C33C903451413D18B0E3BC81BDBF7DB2BC803DAFF45F0" . "890E83C604397DF0895D147CD18B45148B4DE03901744833DB3BFB895D147E3C8B4D108D45148945D88B45F88D34088B45188B04988945" . "F08B068945DC33D28B4DD88B010345F083D2000345DC83D200891189064383C6043BDF7CD3FF4D0C8B45E08320008B55F8FF4DEC8B4508" . "8B4D0C836DF804397DEC890C020F8DD8FEFFFF5F5E5BC9C3") ; Quotient correction 404 bytes, 20..30% faster @ large args! MCode(U32toHex,"8B542404566A1C33F6598B44240CD3E883E00F03F0750485C9750E83F80976040437EB02043088024283E90479DCC602005EC3") MCode(U32to8Hex,"8B5424046A1C598B442408D3E883E00F83F8097E040437EB02043088024283E90479E4C60200C3") MCode(xGCD, "558BEC8B551C836208008B4D2433C0408942048901894104894108830AFF8B4D2083EC20538B5D1489018941048941088B4D" . "28836108008309FF56578941048B43048B55188BC82B4A04740A3BC81BC083E00248EB234083F8027C1B8BFB8D34822BFA8B0C372B0E0F" . "85C60000004883EE0483F8027DEC33C085C00F840B0400007D228B4D1C8BC38945188B4524894D248B4D2089451C8B45288BDA8B551889" . "4D288945208B43040FBD4C8304894D148B4A040FBD748A048B7D142BFE8975EC8BF02BF185FF897D147D054E83451420408D7E023BC789" . "45EC7E426A1F592B4D142BC68D0482894DF88945F08B78FC8B4DF88B00D3EF8B4D14D3E08B4DECD1EF0BF88B048B2BC775338B45F04983" . "E8048D7E023BCF894DEC8945F07FCF8B4D148B42088B3CBBD3E08BC88BC72BC174173BC7EB0B3B0C83E912FFFFFF3B048B1BC083E00248" . "EB138D4601EB07833C830077154883F8027DF433C085C00F842A0300007D03FF4D14837D14007D054E834514208B4D148365FC008D45FC" . "8945F48B4208D3E08D7CB3088945EC8B078945F033D28B4DF48B45F02B0183D2002B45EC83D200891189078B4B048D4603413BC18945F4" . "77608D4DFC894DE86A1F592B4D142BC6894DF88B4D188D3C818B4DF88B47FC8B17D3E88B4D14D3E2D1E80BC28945EC8B45F48B04838945" . "F033D28B4DE88B45F02B0183D2002B45EC83D20089118B4DF4FF45F489048B8B430483C704403945F476B78B430433C94140EB07833C83" . "0075084883F80277F4EB038D48FF894B04837C8B04000F844C0200008B4D148365FC008D45FC8945F08B451C8B4008D3E08945EC8B4524" . "8D7CB0088B078945E833D28B4DF08B010345EC83D2000345E883D200891189078B451C8B40048D7C30018B45248B4004403BF873028BF8" . "8D46033BC78945F4776C8D4DFC894DE46A1F592B4D142BC6894DF88B4D1C8D04818945F0EB038B45F08B50FC8B4DF88B00D3EA8B4D14D3" . "E08B4DF4D1EA0BD08B45248D04888945E08B008955EC8945E833D28B4DE48B010345EC83D2000345E883D20089118B4DE0FF45F48345F0" . "04397DF4890176B08B551C8D45FC8945E86A1F8BC72BC68B0482592B4D14D3E8894DF8D1E88945E48B45248B44B8048945E033D28B4DE8" . "8B010345E483D2000345E083D20089118B4D248944B9048B45FC85C089790476098944B908FF4104EB0B837CB9040075044F8979048B4D" . "148B7D288365FC008D45FC8945E88B45208B4008D3E08945E48B44B7088945E033D28B4DE88B010345E483D2000345E083D20089118B4D" . "208944B7088B41048D5430018B4704403BD08955F473038945F48D46033B45F48945F0775D2BC68D55FC8D04818955E88945ECEB038B45" . "EC8B50FC8B4DF88B00D3EA8B4D14D3E0D1EA0BD08B45F08B04878955E48945E033D28B4DE88B010345E483D2000345E083D20089118B4D" . "F08345EC0489048F413B4DF4894DF076B68B4D208D45FC8945E48B45F48BD02BD68B14918B4DF8D3EA8D7487048B06894514D1EA8955E0" . "33D28B4DE48B010345E083D20003451483D20089118B4DFC85C989068B45F4894704760C894C8708FF4704E9C8FBFFFF833E000F85BFFB" . "FFFF8B4D2848E9B3FBFFFF8B55188B45088B4D1C89108B450C89088B45108B4D205F5E89085BC9C3") ; 1190 bytes }
The low level functions were coded in C, compiled by VS'05, and their machine code listing was converted to hex and pasted into the MP_Init() function. Here they are:
typedef long long Int64; typedef int Int32; typedef short Int16; typedef char Int8; typedef unsigned long long UInt64; typedef unsigned int UInt32; typedef unsigned long ULong; typedef unsigned short UInt16; typedef unsigned char UInt8; // for dll creation #define static // Intrinsic functions (insted of #include <intrin.h>) UInt64 __emulu(UInt32 ua, UInt32 ub); // <- 64bit ua*ub (no intrinsic division!) __forceinline UInt8 _BitScanForward( UInt32 *Index, UInt32 Mask);// bsf, <- 0 if Mask=0, Index <- LS bit position __forceinline UInt8 _BitScanReverse( UInt32 *Index, UInt32 Mask);// bsr, <- 0 if Mask=0, Index <- MS bit #define XTR 2 #define SGN(x) x[0] #define LEN(x) x[1] #define LSW(x) x[2] #define LST(x) XTR-1+LEN(x) #define MSW(x) x[LST(x)] #define SET(x,v) SGN(x)=1; LEN(x) = 1; LSW(x) = v #define SWP(x,y) t = x; x = y; y = t __forceinline static UInt32 UDIV(UInt32 LS, UInt32 MS, UInt32 d, UInt32 *r) { // <- n64/u, r <- n64%u. (n64=MS|LS) __asm { mov eax, LS mov edx, MS mov ecx, r div d mov DWORD PTR [ecx], edx } } __forceinline static UInt32 ADC(UInt32 x, UInt32 y, UInt32 *c) { // <- x + y + c, c <- new carry __asm { // x += y + *c; *c = carry; return x; xor edx, edx // edx = 0 mov ecx, c // ecx = c mov eax, dword ptr[ecx] // eax = *c add eax, y // eax += y, set carry adc edx, 0 // edx = carry add eax, x // eax += x, eax --> return adc edx, 0 // edx += carry mov dword ptr[ecx], edx // *c = carry } } __forceinline static UInt32 SBB(UInt32 x, UInt32 y, UInt32 *b) { // <- x - y - b, b <- new borrow __asm { // x -= y + *b; *b = borrow; return x; xor edx, edx // edx = 0 mov ecx, b // ecx = b mov eax, x // eax = x sub eax, dword ptr[ecx] // eax -= *b adc edx, 0 // edx = carry sub eax, y // eax -= y, set carry adc edx, 0 // edx += carry mov dword ptr[ecx], edx // *b = borrow } } __forceinline Int32 CMP(UInt32 *x, UInt32 *y) { // MP_ numbers compared -0+ Int32 i; UInt32 c; c = LEN(x) - LEN(y); if (c != 0) return LEN(x)>c ? 1 : -1; for (i = LEN(x)+XTR-1; i >= XTR; --i) { c = x[i] - y[i]; if (c != 0) return x[i]>c ? 1 : -1; } return 0; } __forceinline Int32 shCMP(UInt32 *x, UInt32 *y, Int32 o, Int32 s) { // MP_ number x compared with y[+o]<<s Int32 i; UInt32 c; // s = shift #bits in words, o = ofset in words for (i = LST(x); i > o+XTR; --i) { // from MS to LS words, ASSUME equal MS bits!! c = x[i] - ((y[i-o]<<s) | (y[i-o-1]>>(31-s)>>1)); if (c != 0) // not equal, we can finish compare return x[i]>c ? 1 : -1; } // all equal until LS word of shifted y c = x[o+XTR] - (y[XTR]<<s); // partial shifted LS word of y if (c != 0) return x[o+XTR]>c ? 1 : -1; for (i = o+XTR-1; i >=XTR; --i) // the rest of y == 0 if (x[i] > 0) return 1; return 0; } __forceinline void shSUB(UInt32 *x, UInt32 *y, Int32 o, Int32 s) { // MP_INT x -= y[+o]<<s. x be larger! UInt32 i, b = 0; // s = shift #bits in words, o = ofset in words x[o+XTR] = SBB(x[o+XTR], y[XTR]<<s, &b); // Subtract with borrow, partial shifted word for (i = o+XTR+1; i <= LST(x); ++i) // from LSW to MSW, subtract with borrow x[i] = SBB(x[i], (y[i-o]<<s) | (y[i-o-1]>>(31-s)>>1), &b); b = 1; // no final borrow, get MS word for (i = LST(x); i > XTR; --i) // scan for MS non-zero word if (x[i] != 0) { b = i-XTR+1; break; } LEN(x) = b; // update length } __forceinline void shADD(UInt32 *x, UInt32 *y, Int32 o, Int32 s) { // MP_INT x+=y[+o]<<s. Unused words be 0! UInt32 c = 0, i, L; // s = shift #bits in words, o = ofset in words x[o+XTR] = ADC(x[o+XTR], y[XTR]<<s, &c); // Add with carry, partial shifted word L = LST(y)+o; if (L < LST(x)) L = LST(x); // loop until both x and y are used up for (i = o+XTR+1; i <= L; ++i) // from LS to MS words, add with carry to x the shifted y x[i] = ADC(x[i], (y[i-o]<<s) | (y[i-o-1]>>(31-s)>>1), &c); x[L+1] = ADC(x[L+1],y[L-o]>>(31-s)>>1,&c);// shifted y can be one longer LEN(x) = L-XTR+2; // update length, keep sign if (c > 0) { x[L+2] = c; // final carry LEN(x) += 1; } else if (x[L+1]==0) LEN(x) -= 1; // no carry, last set x was 0 } // MP_INT: {sign,len,word[0],..word[len-1]}. e,f,x,y == all 0, each one longer than max(a,b)! void xGCD(UInt32 **g, UInt32 **c, UInt32 **d, // OUTPUT: g = gcd(a,b) = c*a + d*b UInt32 *a, UInt32 *b, // INPUT: a,b > 0, (destroyed) UInt32 *e, UInt32 *f, UInt32 *x, UInt32 *y) {// Work buffers: e,f, x,y of len = max(La,Lb)+1 Int32 i, o, s, v; UInt32 *t; SET(e,0); SET(x,1); SGN(e) = -1; // start with "-0" SET(f,1); SET(y,0); SGN(y) = -1; // x-=(e<<s) or e-=(x<<s) will be OK for (;;) { i = CMP(a,b); if (i == 0) break; if (i < 0) { SWP(a,b); SWP(x,e); SWP(y,f); } _BitScanReverse(&s, MSW(a)); // index of MS bit, 0..31 _BitScanReverse(&v, MSW(b)); s -= v; o = LEN(a)-LEN(b); if (s < 0) { o -= 1; s += 32; } i = shCMP(a,b,o,s); if (i == 0) break; if (i < 0) s -= 1; if (s < 0) { o -= 1; s += 32; } shSUB(a,b,o,s); if (MSW(a)==0) break; shADD(x,e,o,s); shADD(y,f,o,s); } *g = b; *c = e; *d = f; } static void CPY(char *dest, char *src, Int32 count) { // copy chars Int32 i; for (i = 0; i < count; ++i) *dest++ = *src++; } static void SFL(UInt32 *y, Int32 Lx, UInt32 *x, UInt32 s) { // y <- x<<s, s<32 Int32 i; UInt32 c = 0, t = 32-s; for (i = 0; i < Lx; ++i) { *y++ = (*x<<s) | c; c = *x++ >> t; } *y = c; } static void SFR(UInt32 *y, Int32 Lx, UInt32 *x, UInt32 s) { // y <- x>>s, s<32 Int32 i; UInt32 c = 0, t = 32-s; y += Lx; x += Lx; for (i = 0; i < Lx; ++i) { *--y = c | (*--x>>s); c = *x << t; } *--y = c; } static UInt32 DivIpp(UInt32 y[], Int32 Lx, UInt32 x[], UInt32 d) { // y <- x/u Int32 i; UInt32 r = 0; for (i = Lx-1; i >= 0; --i) y[i] = UDIV(x[i],r,d,&r); return r; } static void DIVpq(UInt32 q[], Int32 Ly, UInt32 y[], Int32 Lx, UInt32 x[]) { // q <- y/x, y <- y%x Int32 i, j, k, n, m; UInt32 c, t, B, D, M, L; // ASSUME Ly >= Lx >= 2, y[Ly]==y[-1]==0 UInt64 pp; UInt32 *p = (UInt32*)(&pp); // p[0] = LS, p[1] = MS word of UInt64 pp _BitScanReverse(&n, x[Lx-1]); k = 31-n; // MSb of x (n = 1..31), (.>>32 undefined) D = (x[Lx-1] << k) | (x[Lx-2]>>n>>1); // normalize divisor by k = 31-n (0..30) B = (x[Lx-2] << k); // MS part of next digit of divisor if (Lx > 2) B |= (x[Lx-3]>>n>>1); // use one more word, if can for (i = Ly; i >= Lx; --i) { // y[Ly] = 0 M = (y[ i ] << k) | (y[i-1]>>n>>1); // dividend << k L = (y[i-1] << k) | (y[i-2]>>n>>1); // y is padded on both ends if (M < D) { // test (almost never overflow) t = UDIV(L,M,D,&c); // trial quotient digit (c <- mod) can be 1 too large pp = __emulu(t,B); // extra multiplication for test too large t if (p[1]>(c+1)) t--; } // correction when t is known too large else t = 0xffffffff; // max digit c = 0; m = i - Lx; // carry, index of q for (j = 0; j < Lx; ++j) { // reduce dividend pp = __emulu(x[j],t) + c; // 64-bit product, with carry c = p[1] + (y[m+j] < p[0]); // new 32-bit carry y[m+j] -= p[0]; } // reduce dividend digits if (y[i] != c) { // t too large: y[i] not cleared c = 0; // clear carry for (j = 0; j < Lx; ++j) // add back dividend y[m+j] = ADC(y[m+j],x[j],&c); // using/updating c = carry t--; } // decrement trial quotient y[i] = 0; // MS word of y is cleared q[m] = t; // store quotient digit } } static void DIVpp(UInt32 q[], Int32 Ly, UInt32 y[], Int32 Lx, UInt32 x[]) { // q <- y/x, y <- y%x Int32 i, j, n, k; UInt32 c, t, D, M, L; // ASSUME Ly >= Lx >= 2, y[Ly]==y[-1]==0 UInt64 pp; UInt32 *p = (UInt32*)(&pp); // p[0] = LS, p[1] = MS word of UInt64 pp _BitScanReverse(&n, x[Lx-1]); k = 31-n; // MSb of x (n = 1..31), (.>>32 undefined) D = (x[Lx-1] << k) | (x[Lx-2]>>n>>1); // normalize divisor by k = 31-n (0..30) for (i = Ly; i >= Lx; --i) { // y[Ly] = 0 M = (y[ i ] << k) | (y[i-1]>>n>>1); // dividend << k L = (y[i-1] << k) | (y[i-2]>>n>>1); // y is padded on both ends t = M < D ? UDIV(L,M,D,&c) : 0xffffffff; // trial quotient digit (c <- mod) can be 1 too large c = 0; // carry for (j = 0; j < Lx; ++j) { // reduce dividend pp = __emulu(x[j],t) + c; // 64-bit product, with carry c = p[1] + (y[i-Lx+j] < p[0]); // new 32-bit carry y[i-Lx+j] -= p[0]; } // reduce dividend digits if (y[i] != c) { // t too large: y[i] not cleared c = 0; // clear carry for (j = 0; j < Lx; ++j) // add back dividend y[j+i-Lx] = ADC(y[j+i-Lx],x[j],&c); // using/updating c = carry t--; } // decrement trial quotient y[i] = 0; // MS word of y is cleared q[i-Lx] = t; // store quotient digit } } static void MULpp(UInt32 *z, Int32 Ly, UInt32 *y, Int32 Lx, UInt32 *x) { // z <- y*x Int32 i, j; UInt32 c, *yy, *zz; UInt64 p; UInt32 *q = (UInt32*)(&p)+1; // MS 32 bits of p for (i = 0; i < Lx; ++i) { c = 0; yy = y; zz = z + i; for (j = 0; j < Ly; ++j) { p = __emulu(*yy++,*x) + c; *zz += (UInt32)p; c = *q + (*zz++ < (UInt32)p); } *zz = c; ++x; } } static UInt32 MulIpp(UInt32 *y, Int32 Lx, UInt32 *x, UInt32 u) { // y <- x*u Int32 i; UInt32 c = 0; UInt64 p; for (i = 0; i < Lx; ++i) { p = __emulu(*x++,u) + c; c = (UInt32)(p >> 32); *y++ = (UInt32)p; } *y = c; return c; } static Int32 LENGTH(Int32 Lx, UInt32 x[]) { // <- index of nonzero MS word = 1..Len Int32 i; for (i = Lx-1; i >= 0; --i) if (x[i] > 0) return i+1; return 1; } static void ADDpp(UInt32 *z, Int32 Ly, UInt32 *y, Int32 Lx, UInt32 *x) { // z <- y+x Int32 i, c = 0; // ASSUMED: y > x, Len(z) = Ly+1 for (i = 0; i < Lx; ++i) *z++ = ADC(*y++,*x++,&c); // c = new carry for (i = 0; i < Ly-Lx; ++i) *z++ = ADC(*y++,0,&c); // c = new carry *z = c; } static void SUBpp(UInt32 *z, Int32 Ly, UInt32 *y, Int32 Lx, UInt32 *x) { // z <- y-x Int32 i, b = 0; // ASSUMED: z is all 0, y > x: no final borrow for (i = 0; i < Lx; ++i) *z++ = SBB(*y++,*x++,&b); // b = new borrow for (i = 0; i < Ly-Lx; ++i) *z++ = SBB(*y++,0,&b); } static Int32 CMPpp(Int32 Len, Int32 *eq, UInt32 x[], UInt32 y[]) { // <- -0+, eq <- #equal words Int32 i; *eq = 0; for (i = Len-1; i >= 0; --i) if (x[i] == y[i]) *eq += 1; else return x[i]>y[i] ? 1 : -1; return 0; } static UInt32 AddIpp(UInt32 *y, Int32 Lx, UInt32 *x, UInt32 u) { // y <- x+u, return carry Int32 i; for (i = 0; i < Lx; ++i) { *y = u + *x++; u = *y++ < u; // carry } *y = u; return u; } static UInt32 SubIpp(UInt32 *y, Int32 Lx, UInt32 *x, UInt32 u) { // y <- x-u, return borrow Int32 i; for (i = 0; i < Lx; ++i) { *y++ = *x - u; u = *x++ < u; // borrow } return u; } static void UInt32toHex(UInt8 *hex, UInt32 u) { // in hex room for 5 bytes Int32 i; UInt32 d, f = 0; for (i=28; i>=0; i-=4) { d = (u >> i) & 15; f += d; if ((f>0) || (i==0)) { if (d > 9) *hex++ = d + 55; else *hex++ = d + 48; } } *hex = 0; } static void UInt32to8Hex(UInt8 *hex, UInt32 u) { // in hex room for 5 bytes Int32 i, d; for (i=28; i>=0; i-=4) { d = (u >> i) & 15; if (d > 9) *hex++ = d + 55; else *hex++ = d + 48; } *hex = 0; }They are more or less straightforward. There are 3 assembler functions:
UDIV, a wrapper for the single Pentium instruction for 64/32-bit division, which is, surprisingly, not available as intrinsic function, so it had to be coded in assembler.
ADC and SBB, which perform addition and subtraction, with carry/borrow, respectively. One can determine in C if an addition produces a carry (checking if the result is less than one of the added numbers), but it is quite tedious: a+b+carry needs a test after each +.
One somewhat more complicated function is the long division: DIVpp, and its faster variant DIVpq. I tried to make them as short as possible, without relying on any external function (except the inlined assembler functions). They are the traditional (grammar school) division algorithm with a couple of tricks to reduce code size. DIVpq has a little longer code, but it is noticeably faster on large numbers. (The trial quotient digit is estimated better, so fewer correction steps are necessary.)
The most complicated function is the extended Euclidean algorithm for finding the greatest common divisor of two long integers. This variant does not use multiplication or division, but shifts. The algorithm also returns two multipliers, to express the GCD as the linear combination of the two input. It can be used to compute modular inverses, that is, finding y for an x, such that mod(x*y, m) = 1.
Test the code on large, structured input (random values likely work), and post here the input which yield the wrong result or cause the script to crash, or cycle indefinitely.
Edit 20070802: Added functions ADC and SBB to simplify handling carry/borrow; and sped up and simplified further the division algorithm.
Edit 20070817:
- bugfix in MP_DEC
- tab implemented in MP_DEC
- new functions MP_SubIpp, MP_GCD, PI0
- faster division
Edit 20100613: A directory in AutoHotkey.net contains now the Multi-precision integer library and related functions. The library has to be included, before the long integer functions are used. (It contains a call to the initialization function.)
Some tests and example code is included in a test script.
The C source code of the machine code routines is also uploaded for reference.