- are programmed fully in AHK
- are much faster than cryptographic hash functions
- provide long enough hash values (64 or 128 bits), so a collision is very unlikely
They are modeled after LFSR based hash functions, like CRC-32. LFSR stands for Linear Feedback Shift Register. It can be programmed with a CPU register, holding an integer. This register is shifted to the left at processing the next bit of the input stream. The input bit is written to the now empty position of the register. The leftmost bit of the shifted register is removed. If it is 1, a certain integer is XOR-ed to the register. We repeat this until all the input bits are processed, when the final content of the register is returned as the hash value.
This procedure can be accelerated if more bits are processed at once. E.g. the register is shifted to the left by 8 bits, the whole next input byte is written to the end of the register, and one of 256 numbers is XOR-ed to the register, indexed by the shifted out byte. The presented 64-bit hash function works this way, with the table filled up with pseudorandom entries.
We can replace the XOR with another arithmetic function, like subtraction. Using the same constant table we get very different hash values. Concatenating this to the XOR version gives 128-bit hash values (32 hex digits).
The global pseudorandom table is initialized only at the first call, the values are re-used until the script is terminated. It can be set up using the built in random number generator, if an explicit seed is provided (otherwise hash values will be different at each run of the script). If hash values are used as integrity check (as CRC-32), we want them to be the same on different computers, regardless if the hash functions are programmed in AHK or any other language. Thus, we have to explicitly program the pseudorandom number generator. One really good and simple one calls the TEA cipher recursively, with an arbitrary key. It requires only a handful of AHK commands. The speed is not critical, because the table is created with 256 calls of TEA, only once at the first call, finishing in less than a second.
This hash functions are very fast, requiring only a handful operations for each input character. On my 2 GHz Centrino laptop I got the following results:
64-bit hash of 2.1 MB string = 10.8 sec
128-bit hash of 2.1 MB string = 21 sec
100,000 calls of 64-bit hash operations on 12 character strings: 8.25 sec
100,000 calls of 64-bit hash operations on 30 character strings: 19.25 sec
MsgBox % L64Hash("12345678")
MsgBox % L128Hash("12345678")
L64Hash(x) { ; 64-bit generalized LFSR hash of string x
Local i, R = 0
LHashInit() ; 1st time set LHASH0..LHAS256 global table
Loop Parse, x
{
i := (R >> 56) & 255 ; dynamic vars are global
R := (R << 8) + Asc(A_LoopField) ^ LHASH%i%
}
Return Hex8(R>>32) . Hex8(R)
}
L128Hash(x) { ; 128-bit generalized LFSR hash of string x
Local i, S = 0, R = -1
LHashInit() ; 1st time set LHASH0..LHAS256 global table
Loop Parse, x
{
i := (R >> 56) & 255 ; dynamic vars are global
R := (R << 8) + Asc(A_LoopField) ^ LHASH%i%
i := (S >> 56) & 255
S := (S << 8) + Asc(A_LoopField) - LHASH%i%
}
Return Hex8(R>>32) . Hex8(R) . Hex8(S>>32) . Hex8(S)
}
Hex8(i) { ; integer -> LS 8 hex digits
SetFormat Integer, Hex
i:= 0x100000000 | i & 0xFFFFFFFF ; mask LS word, set bit32 for leading 0's --> hex
SetFormat Integer, D
Return SubStr(i,-7) ; 8 LS digits = 32 unsigned bits
}
LHashInit() { ; build pseudorandom substitution table
Local i, u = 0, v = 0
If LHASH0=
Loop 256 {
i := A_Index - 1
TEA(u,v, 1,22,333,4444, 8) ; <- to be portable, no Random()
LHASH%i% := (u<<32) | v
}
}
; [y,z] = 64-bit I/0 block, [k0,k1,k2,k3] = 128-bit key
TEA(ByRef y,ByRef z, k0,k1,k2,k3, n = 32) { ; n = #Rounds
s := 0, d := 0x9E3779B9
Loop %n% { ; standard = 32, 8 for speed
k := "k" . s & 3 ; indexing the key
y := 0xFFFFFFFF & (y + ((z << 4 ^ z >> 5) + z ^ s + %k%))
s := 0xFFFFFFFF & (s + d) ; simulate 32 bit operations
k := "k" . s >> 11 & 3
z := 0xFFFFFFFF & (z + ((y << 4 ^ y >> 5) + y ^ s + %k%))
}
}




