I have been playing around with binary encoding lately and, along the way, wrote a small helper library that I want to share with you.
As the topic title says, this library contains a function "gpBinEncode()", which lets you run any data serially through any binary code that maps fixed source words to fixed code words.
This covers, among others: parity codes, Shannon-Fano or Huffman coding, Hamming codes, cyclic codes, ...
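To give an idea of what such a code looks like as an encoding table, here is a minimal sketch that builds a table for a Hamming(7,4) code. It is only an illustration and not part of the library: the name hammingEncode and the bit layout p1 p2 d1 p3 d2 d3 d4 are my own assumptions. The table format (source words as keys, code words as values, both stored as strings) follows the function documentation further down.

```autohotkey
#NoEnv
; Hypothetical table name; assumed layout of each code word: p1 p2 d1 p3 d2 d3 d4
hammingEncode := Object()
Loop 16 {
    n := A_Index - 1
    ; Split the 4-bit source word into its data bits (d1 = most significant bit)
    d1 := (n >> 3) & 1, d2 := (n >> 2) & 1, d3 := (n >> 1) & 1, d4 := n & 1
    ; Each parity bit gives its group (itself plus three data bits) an even number of 1's
    p1 := d1 ^ d2 ^ d4, p2 := d1 ^ d3 ^ d4, p3 := d2 ^ d3 ^ d4
    ; Prepending "" keeps keys and values stored as strings
    hammingEncode["" . d1 . d2 . d3 . d4] := "" . p1 . p2 . d1 . p3 . d2 . d3 . d4
}
MsgBox % "1011 -> " . hammingEncode["1011"]   ; shows 1011 -> 0110011
```

A table built like this could then be passed to gpBinEncode() as encodingTable, the same way as the tables in the examples at the end of this post.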
Additionally you'll find 2 helper functions, "gpLoadBinString()" and "gpStoreBinString()", to read and write binary data of any length from/to memory. These were especially useful for me when testing different codes with gpBinEncode().
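A short round trip through both helper functions, as a sketch (the variable names are just for illustration, and the file name gpBinEncode.ahk is the one used in the examples at the end of this post):

```autohotkey
#NoEnv
#Include gpBinEncode.ahk

; Store 11 bits; they occupy Ceil(11 / 8) = 2 bytes, the last 5 bits are zero padding
bitCount := gpStoreBinString(binData, "10110011101")

; Read the same 11 bits back as a string of 0's and 1's
roundTrip := gpLoadBinString(binData, bitCount)   ; 10110011101

; Reading all 16 bits also shows the zero padding of the second byte
padded := gpLoadBinString(binData, 16)            ; 1011001110100000

MsgBox % "Stored bits:`t" . bitCount . "`nRead back:`t" . roundTrip . "`nWith padding:`t" . padded
```

Since the bit length is not stored anywhere in the data itself, you have to keep the return value of gpStoreBinString() around if you want to read back exactly what you wrote.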
Another function, "gpApproxAverageLength()", is only used internally to estimate how large the output buffer has to be. It approximates the average code word length per source bit from a given encoding table.
The full range of functions:
gpBinEncode(ByRef sourceData, ByRef encodedData, encodingTable, sourceLength="StrLen", ByRef leftoverData := "omitted", ByRef leftoverLength := "")
gpStoreBinString(ByRef var, binString)
gpLoadBinString(ByRef var, length)
gpApproxAverageLength(encodingTable)
The library (with a short documentation for each function):
gpBinEncode.ahk
/*
#####################################################################################
General Purpose Binary Encode - "gpBinEncode" - v0.2 by Alibaba - 19 August 2015
http://ahkscript.org/boards/viewtopic.php?f=10&t=9209
#####################################################################################
MAIN FUNCTIONS:
gpBinEncode( ByRef sourceData,
ByRef encodedData,
encodingTable,
sourceLength="StrLen",
ByRef leftoverData := "omitted",
ByRef leftoverLength := "" )
INTERNAL / HELP FUNCTIONS:
gpStoreBinString(ByRef var, binString)
gpLoadBinString(ByRef var, length)
gpApproxAverageLength(encodingTable)
#####################################################################################
*/
; Function: gpBinEncode
; Description: Performs a serial binary coding on the given data, based on any given encoding table.
;
; ByRef sourceData The variable holding the source data for encoding
; ByRef encodedData The variable in which the encoded output will be stored
; encodingTable An array/object where indices are the binary source words and values are the respective code words
; Note: source words and code words must always be stored as strings! Specifying a "-" as code word will act as "no output".
; sourceLength Amount of bits (not bytes!) to be encoded
; If omitted, the length will be calculated from StrLen() and the character size of the running AutoHotkey build (8 bits per character on ANSI, 16 on Unicode)
; A string should be passed in order for this to work correctly!
; ByRef leftoverData (Optional) The variable in which the remainder of the source data which couldn't be encoded (no match in the encoding table), will be stored
; ByRef leftoverLength (Optional) The variable in which the length (amount of bits) of leftoverData will be stored
;
; returns: The length of encodedData in bits
;
; notes: Any output that is produced (encodedData, leftoverData), will be written to an integer amount of bytes, although the length of the respective
; data in bits doesn't have to be a multiple of 8, meaning that the last bits of the output will always be filled up to a full byte with zeros.
gpBinEncode(ByRef sourceData, ByRef encodedData, encodingTable, sourceLength="StrLen", ByRef leftoverData := "omitted", ByRef leftoverLength := ""){
    if (sourceLength == "StrLen")
        sourceLength := A_IsUnicode ? (StrLen(sourceData) * 16) : StrLen(sourceData) * 8
    ; Preallocate the output buffer based on the estimated code ratio
    encodedApproxSize := ceil(sourceLength * gpApproxAverageLength(encodingTable))
    VarSetCapacity(encodedData,encodedApproxSize,0)
    VarSetCapacity(inputbuffer,1,0)
    sourcewordbuffer := "", codewordbuffer := "", bytecount := 0, encodedBitsCounter := 0, encodedDataLength := 0
    Loop % sourceLength {
        ; Read the next source byte and walk through its bits, most significant bit first
        inputbuffer := NumGet(sourceData, (A_Index - 1), "UChar")
        Loop, 8 {
            bitposition := (8 - A_Index), nextbit := (inputbuffer & (2 ** (8 - A_Index))) && 1, sourcewordbuffer .= "" . nextbit
            ; As soon as the collected bits match a source word, emit its code word
            if (encodingTable["" . sourcewordbuffer] != "") {
                if (encodingTable["" . sourcewordbuffer] != "-"){
                    bitcount := StrLen(codewordbuffer .= encodingTable["" . sourcewordbuffer])
                    ; Flush every complete byte of pending output
                    while (bitcount >= 8) {
                        NumPut(Dec(SubStr(codewordbuffer, 1, 8)), encodedData, bytecount, "UChar")
                        codewordbuffer := SubStr(codewordbuffer, 9), bitcount -= 8, bytecount++, encodedDataLength += 8
                    }
                }
                sourcewordbuffer := ""
            }
            encodedBitsCounter++
            ; Stop after exactly sourceLength bits, even in the middle of a byte
            if (encodedBitsCounter == sourceLength)
                break, 2
        }
    }
    ; Write the remaining output bits, left-aligned and padded with zeros
    if(bitcount := StrLen(codewordbuffer .= encodingTable["" . sourcewordbuffer])){
        NumPut((Dec(SubStr(codewordbuffer, 1, 8)) << (8 - bitcount)), encodedData, bytecount, "UChar")
        encodedDataLength += bitcount
    }
    ; Hand back the source bits that did not match any source word
    if((leftoverData != "omitted") && (leftoverLength := StrLen(sourcewordbuffer))){
        gpStoreBinString(leftoverData, sourcewordbuffer)
    }
    return encodedDataLength
}
;##################################################################################
;
; Function: gpStoreBinString
; Description: Stores any "binary" string (simple string of 0's and 1's) as binary data into an integer amount of bytes at the given location
;
; ByRef var The destination variable
; binString The "binary" string that will be converted to binary data and stored
;
; returns: The amount of stored bits / equal to StrLen(binString)
;
; notes: Any output that is produced, will be written to an integer amount of bytes, although the length of the respective data in bits doesn't
; have to be a multiple of 8, meaning that the last bits of the output will always be filled up to a full byte with zeros.
; The output is written from most significant bit to least significant bit, meaning that the first "bit" from the string will be the first
; byte's most significant bit at the specified destination
gpStoreBinString(ByRef var, binString){
    requiredBytes := ceil((strlen := StrLen(binString)) / 8)
    VarSetCapacity(var,requiredBytes,0)
    Loop, % requiredBytes {
        byte := SubStr(binString, A_Index * 8 - 7, 8), len := StrLen(byte)
        NumPut(Dec(byte) << (8 - len), var, (A_Index - 1), "UChar")
    }
    return strlen
}
;##################################################################################
;
; Function: gpLoadBinString
; Description: Reads a specified amount of bits from the specified location, returns it as a string of 0's and 1's
;
; ByRef var The source variable to read from
; length The amount of bits to read
;
; returns: The "binary" string containing the bits from the source variable
;
; notes: Reading is done from most significant bit to least significant bit, meaning that the first byte's most significant bit at the specified
; source, will be the first "bit" in the "binary" string.
gpLoadBinString(ByRef var, length){
    binStr := "", requiredBytes := ceil(length / 8)
    Loop, % requiredBytes {
        bin := Bin(NumGet(var, (A_Index - 1), "UChar")), bin := SubStr("00000000", 1, 8 - StrLen(bin)) . bin
        binStr .= (length > 8) ? bin : SubStr(bin, 1, length), length -= 8
    }
    return binStr
}
;##################################################################################
;
; Function: gpApproxAverageLength
; Description: Approximates the amount of output per sourceword-bit, based on a given encoding table, by taking the largest ratio of
; code word length to source word length found in the table (an upper bound, so the output buffer is not sized too small)
;
; encodingTable An array/object where indices are the binary source words and values are the respective code words
; Note: source words and code words must always be stored as strings!
;
; returns: The approximated amount of bits produced by encoding a bit from a binary source word (code ratio)
gpApproxAverageLength(encodingTable){
    maxLm := 0
    for sourceWord, codeWord in encodingTable {
        ; Compare word lengths, not the numeric values of the bit strings; a "-" code word produces no output
        Lm := (codeWord == "-") ? 0 : StrLen(codeWord) / StrLen(sourceWord)
        if (Lm > maxLm)
            maxLm := Lm
    }
    return maxLm
}
;##################################################################################
;##################################################################################
;
;Binary and Decimal conversion functions by infogulch
;http://www.autohotkey.com/board/topic/49990-binary-and-decimal-conversion/
Dec(x){
b:=StrLen(x),r:=0
loop,parse,x
r|=A_LoopField<<--b
return r
}
Bin(x){
while x
r:=1&x r,x>>=1
return r
}
3 simple examples (you could question the real-world applicability of these examples, but they give a good quick summary of the library's functionality). I'm happy about any feedback!
Examples.ahk
#NoEnv
#Include gpBinEncode.ahk
;Create our binary data
dataLength := gpStoreBinString(data, "000111010100001")
;===== EXAMPLE 1 =====
;Remove zeros from data / get data "weight" (amount of 1's)
remZeros := Object("1","1","0","-")
encodedLength := gpBinEncode(data, encodedData, remZeros, dataLength)
MsgBox % "EXAMPLE 1`nInput:`t" . gpLoadBinString(data, dataLength) . " (Length: " . dataLength . ")`n" . "Encoded:`t" . gpLoadBinString(encodedData, encodedLength) . " (Length: " . encodedLength . ")"
;===== EXAMPLE 2 =====
;Interpret input as 3bit groups, extend to 4bit groups with odd parity:
addParity := Object("000","0001","001","0010","010","0100","011","0111","100","1000","101","1011","110","1101","111","1110")
remParity := Object("0001","000","0010","001","0100","010","0111","011","1000","100","1011","101","1101","110","1110","111")
encodedLength := gpBinEncode(data, encodedData, addParity, dataLength)
decodedLength := gpBinEncode(encodedData, decodedData, remParity, encodedLength)
MsgBox % "EXAMPLE 2`nInput:`t" . gpLoadBinString(data, dataLength) . " (Length: " . dataLength . ")`n" . "Encoded:`t" . gpLoadBinString(encodedData, encodedLength) . " (Length: " . encodedLength . ")`n" . "Decoded:`t" . gpLoadBinString(decodedData, decodedLength) . " (Length: " . decodedLength . ")"
;===== EXAMPLE 3 =====
;Replace each 4bit binary fixed-point number of value n with exactly n 1's and a subsequent 0.
;15 bit Input will produce a 3 bit leftover since mod(15,4) = 3
;Since this code fulfills the prefix property, it's possible to decode it afterwards
;Generate encoding table by looping through all possible source words.
serialEncode := Object(), serialDecode := Object()
Loop 16 {
sourceWord := "" . SubStr("0000", 1, 4 - StrLen(tmpsw := Bin(A_Index - 1))) . tmpsw, codeWord := SubStr("111111111111111", 1, A_Index - 1) . "0"
serialEncode["" . sourceWord] := "" . codeWord, serialDecode["" . codeWord] := "" . sourceWord
}
encodedLength := gpBinEncode(data, encodedData, serialEncode, dataLength, leftoverData, leftoverLength)
decodedLength := gpBinEncode(encodedData, decodedData, serialDecode, encodedLength)
MsgBox % "EXAMPLE 3`nInput:`t" . gpLoadBinString(data, dataLength) . " (Length: " . dataLength . ")`n" . "Encoded:`t" . gpLoadBinString(encodedData, encodedLength) . " (Length: " . encodedLength . ")`n`tLeftover: " . gpLoadBinString(leftoverData, leftoverLength) . "`nDecoded:`t" . gpLoadBinString(decodedData, decodedLength) . " (Length: " . decodedLength . ")"
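One more example in the same style, as a sketch for the Huffman-like case mentioned at the top: 2-bit source words are mapped to code words of different lengths. The table and the input are made up for illustration (prefixEncode and prefixDecode are hypothetical names). Because no code word is a prefix of another one, the output can be decoded again with the inverted table.

```autohotkey
#NoEnv
#Include gpBinEncode.ahk
; 16 bits of input = eight 2-bit source words: 00 00 01 00 10 11 00 00
dataLength := gpStoreBinString(data, "0000010010110000")
; Hypothetical prefix code: the word assumed to be most frequent ("00") gets the shortest code word
prefixEncode := Object("00","0","01","10","10","110","11","111")
prefixDecode := Object("0","00","10","01","110","10","111","11")
encodedLength := gpBinEncode(data, encodedData, prefixEncode, dataLength)
decodedLength := gpBinEncode(encodedData, decodedData, prefixDecode, encodedLength)
; Expected: 0010011011100 (13 bits) after encoding, the original 16 bits after decoding
MsgBox % "Input:`t" . gpLoadBinString(data, dataLength) . " (Length: " . dataLength . ")`n" . "Encoded:`t" . gpLoadBinString(encodedData, encodedLength) . " (Length: " . encodedLength . ")`n" . "Decoded:`t" . gpLoadBinString(decodedData, decodedLength) . " (Length: " . decodedLength . ")"
```

With this input the output is 3 bits shorter than the input, but only because "00" appears so often; an input consisting mostly of "10" and "11" would grow instead.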
Greetings, Alibaba