AutoHotkey Community Let's help each other out
Laszlo
Joined: 14 Feb 2005 Posts: 4517 Location: Boulder, CO
Posted: Sat Jul 08, 2006 11:02 pm
@Veovis: Although PhiLho did a nice job with his codec, I would stick to Base64. As I said, when you receive an email attachment, most of the time the file is already encoded this way, so you could save the conversion. The main advantage of Pebwa is that you could see embedded text in the encoded binary, but the size of the coded file is not much different.

Veovis
Joined: 13 Feb 2006 Posts: 390 Location: Utah
Posted: Sun Jul 09, 2006 5:10 am
Alright, I like the idea of the Base64 encoder.
When you write binary data out as hex text it doubles in size, and Base64 can shrink that hex to about 66% (3 hex digits become 2 Base64 characters), so you can store the data in text files at about 133% of its original size. Which isn't too bad, considering.
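The size arithmetic above can be checked with a quick sketch. It is written in Python rather than AutoHotkey purely for illustration, and the 30-byte sample is an arbitrary assumption:

```python
import base64

raw = bytes(range(30))                             # 30 arbitrary bytes standing in for a file
hex_text = raw.hex()                               # 2 hex digits per byte
b64_text = base64.b64encode(raw).decode("ascii")   # 4 characters per 3 bytes

# Sizes of the raw data, its hex form and its Base64 form
print(len(raw), len(hex_text), len(b64_text))      # 30 60 40
```

So the hex text is 200% of the raw size and the Base64 text is about 133%, which is two thirds of the hex size.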
So I took your Base64 encoder/decoder and stared at it for a couple of hours until I understood how it worked. Then I modified it to work on hex rather than ASCII. However, I seem to have broken it. It works fine when the hex string you feed it has a length divisible by 3. Otherwise it either adds a zero or the last hex digit turns into a zero. Kind of a major problem when dealing with encoding files. @Laszlo, HELP! I assume the problem is in the lines right after the string parsing loop. But it's my bedtime right now (one of the joys of being a teenager) and I can't figure it out. Thanks for your lovely encoder, and thanks in advance for your help!
Code:
#singleinstance force
Chars = 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+ ;i rearranged the chars so "0000 00" = 0 and "1111 11" =
test1 = ff7e4c3bc023
test2 = 8ca9d
test3 = f001
msgbox, % "string:`t" test1 "`nencoded:`t" HextoBase64(test1) "`ndecoded:`t" Base64toHex(HextoBase64(test1))
msgbox, % "string:`t" test2 "`nencoded:`t" HextoBase64(test2) "`ndecoded:`t" Base64toHex(HextoBase64(test2))
msgbox, % "string:`t" test3 "`nencoded:`t" HextoBase64(test3) "`ndecoded:`t" Base64toHex(HextoBase64(test3))
StringCaseSense On
;msgbox, % "encoding:`n`n" In "`n`nInto: " HextoBase64(In)"`n`nstring was " strlen(in) " long and is now " strlen(HextoBase64(In))
HextoBase64(string) {
Loop Parse, string
{
m := Mod(A_Index,3)
IfEqual m,1, SetEnv buffer, % Dec("0x" A_loopfield) << 8
Else IfEqual m,2, EnvAdd buffer, % Dec("0x" A_loopfield) << 4
Else {
buffer += Dec("0x" A_loopfield)
out := out Code(buffer>>6) code(buffer)
}
}
IfEqual m,0, Return out
IfEqual m,1, Return out Code(buffer) "=="
Return out Code(buffer>>6) Code(buffer) "="
}
Base64toHex(code) {
stringreplace,code,code,=,,all
Loop Parse, code
{
m := Mod(A_index,2)
IfEqual m,0, {
buffer += DeCode(A_LoopField)
out := out Trim(Hex(buffer>>8)) Trim(Hex(15 & buffer>>4)) Trim(Hex(15 & buffer))
}
Else SetEnv buffer, % DeCode(A_LoopField) << 6
}
IfEqual m,0, return out
IfEqual m,1, Return out Trim(Hex(15 & buffer>>8))
Return out Trim(Hex(15 & buffer>>8)) Trim(Hex(15 & buffer>>4))
}
Code(i) { ; <== Chars[i & 63], 0-base index
Global Chars
StringMid i, Chars, (i&63)+1, 1
Return i
}
DeCode(c) { ; c = a char in Chars ==> position [0,63]
Global Chars
Return InStr(Chars,c,1) - 1
}
Dec(hexin) {
currentformat := A_formatinteger
setformat,integer,d
hexin += 0
setformat,integer, %currentformat%
return hexin
}
Hex(decin) {
currentformat := A_formatinteger
setformat,integer,h
decin += 0
setformat,integer, %currentformat%
return decin
}
Trim(hexin) {
stringleft,beg,hexin,2
if beg = 0x
stringtrimleft,hexin,hexin,2
return hexin
}
Lol, I know something is wrong, because in the decoding function Mod(A_Index,2) can only return 0 or 1, and I have three ifs. But I'm too tired to think.
_________________
"Power can be given overnight, but responsibility must be taught. Long years go into its making."

Laszlo
Joined: 14 Feb 2005 Posts: 4517 Location: Boulder, CO
Posted: Sun Jul 09, 2006 5:20 pm
A direct conversion from hex to base 64 is a bit more complicated, because you have to remember half-digits. It looks easier to use binary as an intermediate format:
Code:
Hex2Bin(bin,"0123456789")
MsgBox % Bin2Hex(bin)
Bin2Hex(ByRef b, n=0) ; n bytes binary data -> stream of 2-digit hex
{ ; n = 0: all (SetCapacity can be larger than used!)
format = %A_FormatInteger% ; save original integer format
SetFormat Integer, Hex ; for converting bytes to hex
m := VarSetCapacity(b)
If n not between 1 and %m% ; invalid length -> all allocated
n = %m%
Loop %n%
h := h 256+*(&b+A_Index-1) ; concatenate 0x1xx
StringReplace h, h, 0x1,,All ; remove every 0x1
SetFormat Integer, %format% ; restore original format
Return h
}
Hex2Bin(ByRef bin, hex) { ; Convert hex and write as binary to bin
VarSetCapacity(bin, StrLen(hex)//2)
Loop Parse, hex
If (A_Index & 1) ; Odd index
x = 0x%A_LoopField% ; 1st hex digit of a Byte
Else
DllCall("RtlFillMemory",UInt,&bin+A_Index//2-1, UInt,1, UChar,x A_LoopField)
}
I don't understand your point with "", but I'll see how easy a direct conversion is.
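As a rough illustration of the binary-as-intermediate idea, here is a minimal Python sketch. It is not a translation of the AutoHotkey functions above: `hex_to_base64` and `base64_to_hex` are hypothetical names, and the work is delegated to Python's standard `base64` module.

```python
import base64

def hex_to_base64(hex_text):
    """Hex digits -> raw bytes -> standard Base64 text."""
    raw = bytes.fromhex(hex_text)          # needs an even number of hex digits
    return base64.b64encode(raw).decode("ascii")

def base64_to_hex(b64_text):
    """Standard Base64 text -> raw bytes -> lowercase hex digits."""
    return base64.b64decode(b64_text).hex()

encoded = hex_to_base64("0123456789")
print(encoded)
print(base64_to_hex(encoded))              # round trip gives back 0123456789
```

Going through bytes sidesteps the half-digit bookkeeping, because the Base64 step only ever sees whole bytes.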

Veovis
Joined: 13 Feb 2006 Posts: 390 Location: Utah
Posted: Sun Jul 09, 2006 5:30 pm
Alright, I figured it out.
The reason I use (and rearranged the chars) is just a personal preference, so that hex of "ffffff000000" would turn into "0000" rather than "AAAA////".
Code:
/* Hex to Base64 encoder/decoder
by Veovis
Based off of Laszlos Ascii to Base64 encoder
Example of how it works:
Hex: f f 7 e 4 c
1111 1111 0111 1110 0100 1100 transform to binary
111111 110111 111001 001100 rearrange the bits into groups of 6
(63) (55) (57) (12) (what those are in decimal)
t v C
because of half digits, I use the char "-" to tell the decoder to remove the last digit after decoding the string
So we get about 66% compression, but shorter strings are less successful, especially if they have half digits
*/
#singleinstance force
Chars = 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+ ;i rearranged the chars so "0000 00" = 0 and "1111 11" =
StringCaseSense On
test1 = ff7e4c8cba838d8e0c9
test2 = 8ca9dec
test3 = f0011e
test4 = c8ffc
test5 = ff00
test6 = 103
test7 = ff
loop 7
msgbox, % "string:`t" test%A_index% "`nencoded:`t" HextoBase64(test%A_index%) "`ndecoded:`t" Base64toHex(HextoBase64(test%A_index%))
HextoBase64(string) {
Loop Parse, string
{
m := Mod(A_Index,3)
IfEqual m,1, SetEnv buffer, % Dec("0x" A_loopfield) << 8
Else IfEqual m,2, EnvAdd buffer, % Dec("0x" A_loopfield) << 4
Else {
buffer += Dec("0x" A_loopfield)
out := out Code(buffer>>6) code(buffer)
}
}
IfEqual, m, 0, return out
IfEqual, m, 1, return out Code(buffer>>6)
IfEqual, m, 2, return out Code(buffer>>6) Code(buffer) "-"
}
Base64toHex(code) {
ifinstring,code,-,setenv,trim,1
stringreplace,code,code,-,,a
Loop Parse, code
{
m := Mod(A_index,2)
IfEqual m,0, {
buffer += DeCode(A_LoopField)
out := out Trim(Hex(buffer>>8)) Trim(Hex(15 & buffer>>4)) Trim(Hex(15 & buffer))
}
Else SetEnv buffer, % DeCode(A_LoopField) << 6
}
IfEqual m,1, setenv,out,% out Trim(Hex(15 & buffer>>8))
IfEqual trim,1, stringtrimright,out,out,1
return out
}
Code(i) { ; <== Chars[i & 63], 0-base index
Global Chars
StringMid i, Chars, (i&63)+1, 1
Return i
}
DeCode(c) { ; c = a char in Chars ==> position [0,63]
Global Chars
Return InStr(Chars,c,1) - 1
}
Dec(hexin) {
currentformat := A_formatinteger
setformat,integer,d
hexin += 0
setformat,integer, %currentformat%
return hexin
}
Hex(decin) {
currentformat := A_formatinteger
setformat,integer,h
decin += 0
setformat,integer, %currentformat%
return decin
}
Trim(hexin) { ;trims the 0x off of hex
stringleft,beg,hexin,2
if beg = 0x
stringtrimleft,hexin,hexin,2
return hexin
}
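The bit regrouping described in the comment at the top of the script can be sketched like this in Python. It is an illustration only: `six_bit_groups` is a made-up helper, and it assumes the hex length is a multiple of 3.

```python
def six_bit_groups(hex_text):
    """Split hex text (length divisible by 3) into a list of 6-bit values."""
    groups = []
    for i in range(0, len(hex_text), 3):
        twelve_bits = int(hex_text[i:i + 3], 16)   # 3 hex digits = 12 bits
        groups.append(twelve_bits >> 6)            # upper 6 bits
        groups.append(twelve_bits & 63)            # lower 6 bits
    return groups

print(six_bit_groups("ff7e4c"))   # [63, 55, 57, 12]
```

Each value is then used as an index into the 64-character string, which is what Code() does in the script.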

PhiLho
Joined: 27 Dec 2005 Posts: 6723 Location: France (near Paris)
Posted: Sun Jul 09, 2006 6:55 pm
Laszlo wrote:
Nice idea! If you use base64 encoding (see here), you can save some space and time, and still conform to the standard. With base 85, 128 etc. further insignificant memory savings are possible, but the data will be nonstandard.

What is the advantage of using a "standard" encoding here? I don't necessarily advocate the use of Pebwa (although it is mostly a toy, it produces a smaller encoding than Base64; I guess Ascii85 is even better). I can understand the necessity of sticking to standards with encryption, where an error can be costly! But here, it is mostly for use within a given script, i.e. if it works and has good performance, it can be used. This is not for exchanging data with friends or something.
_________________
vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")

Laszlo
Joined: 14 Feb 2005 Posts: 4517 Location: Boulder, CO
Posted: Sun Jul 09, 2006 8:04 pm
PhiLho wrote:
What is the advantage of using "standard" encoding here?

You can copy into the script already encoded files, like email attachments.

corrupt
Joined: 29 Dec 2004 Posts: 2446
Posted: Sun Jul 09, 2006 9:38 pm
I played around a bit with the script for compressing an existing file to be included, and put together a couple of functions for compressing/decompressing the data. Nothing too fancy. Basically a combination of hex to ASCII and pattern compression, put together for fun. The compression ratio seems reasonable for most files so far, considering the time spent on it (50-85% compression of the hex output in tests so far), but it hasn't been extensively tested and isn't incredibly fast. I also added a small function for splitting the lines so that the data can be easily copied and pasted into a script. When using Join, the ` option is required. Maybe someone will find the modifications useful.
Edit: Posted an updated version of the code here: http://www.autohotkey.com/forum/viewtopic.php?p=68357#68357
Last edited by corrupt on Mon Jul 17, 2006 10:53 am; edited 1 time in total

corrupt
Joined: 29 Dec 2004 Posts: 2446
Posted: Sun Jul 09, 2006 10:01 pm
Laszlo wrote:
PhiLho wrote: What is the advantage of using "standard" encoding here?
You can copy into the script already encoded files, like email attachments.

Good point. I just tested adding a script into a script using a slightly modified version of the scripts I posted above, and the included script extracts and runs OK.
An uncompiled script can be used as a self-extracting archive. Cool.

Laszlo
Joined: 14 Feb 2005 Posts: 4517 Location: Boulder, CO
Posted: Sun Jul 09, 2006 10:38 pm
@Corrupt: could you tell in a few words, how the compression works?

corrupt
Joined: 29 Dec 2004 Posts: 2446
Posted: Sun Jul 09, 2006 11:30 pm
Laszlo wrote:
@Corrupt: could you tell in a few words, how the compression works?

Sure. It's a bit messy to follow, but there's not much to it. I changed it a couple of times, so hopefully I'm giving the right values that I used.
- it first starts a loop that counts from 2A to FE in Hex
- the hex values are then replaced in the text with ASCII characters except for those between 127 - 176, upper and lower case letters a - f and numeric characters 0 - 9
- once the characters have been replaced a string is built of each ASCII character from 128 - 176
- another couple loops are then started. One that counts from 42 to 255 (the range from 127 to 176 is skipped again) and another that loops 48 times
- for each character in the ASCII range a string of 48 of the same character is created
- The loop then counts down looking for a set of characters in a row that are between 3 - 51 characters long. If a match is found it replaces the string of x characters (anywhere from 3 - 51) with the character that is repeated followed by the next available character from the parsing loop (the characters between 128-176). The value of the character that is added to replace the group of multiple characters is used to identify how many characters were removed.
In short, hex pairs are replaced with ASCII characters when within certain ranges, then characters that repeat (up to 51 times) are replaced with 2 characters (the repeated character followed by an ASCII character in a different range than the first range used for replacement).
I'd welcome any input for improvements. I know of a few ways to improve the compression, but I figured I'd stop there for now as a compromise between speed and compression.
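One possible reading of the run-length step described above, as a Python sketch. It is not corrupt's actual code: the constants follow the numbers given in the post, the function names are hypothetical, and it assumes the input never contains characters 128 to 176 (those are reserved as run markers).

```python
MIN_RUN, MAX_RUN = 3, 51     # run lengths mentioned in the post
MARKER_BASE = 128            # chr(128)..chr(176) stand for run lengths 3..51

def rle_encode(text):
    """Replace each run of 3..51 equal characters by the character plus a marker."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        run = 1
        while i + run < len(text) and text[i + run] == ch and run < MAX_RUN:
            run += 1
        if run >= MIN_RUN:
            out.append(ch + chr(MARKER_BASE + run - MIN_RUN))
        else:
            out.append(ch * run)         # runs of 1 or 2 are copied unchanged
        i += run
    return "".join(out)

def rle_decode(packed):
    """Expand every marker into repeats of the character just before it."""
    out = []
    for ch in packed:
        code = ord(ch)
        if out and MARKER_BASE <= code <= MARKER_BASE + MAX_RUN - MIN_RUN:
            out[-1] = out[-1] * (code - MARKER_BASE + MIN_RUN)
        else:
            out.append(ch)
    return "".join(out)

sample = "ab" + "c" * 5 + "dd" + "e" * 60
print(len(sample), len(rle_encode(sample)))
print(rle_decode(rle_encode(sample)) == sample)   # True
```

Runs longer than 51 are simply split into several marked runs, which keeps the marker range small at the cost of a little compression.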

Laszlo
Joined: 14 Feb 2005 Posts: 4517 Location: Boulder, CO
Posted: Sun Jul 09, 2006 11:44 pm
@Veovis: I looked at your hex to base64 converter. It looks good.
If you could keep it conforming to the standard, we could find other uses for it, like generating/processing a hex file with a script, encoding it and sending it with a command line email program.
Also, half-bytes do not seem to be necessary. The purpose is to handle binary files in a script, and they always contain an integer number of bytes. Therefore, we could assume an even number of hex digits. Do you know of an application that needs an odd number of digits? It looks ambiguous, too: do we assume an implicit leading or trailing 0?
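The ambiguity with an odd number of digits is easy to see in a tiny Python sketch (illustration only; the three-digit input is an arbitrary example):

```python
digits = "abc"                              # odd number of hex digits

leading = bytes.fromhex("0" + digits)       # implicit leading 0: bytes 0A BC
trailing = bytes.fromhex(digits + "0")      # implicit trailing 0: bytes AB C0

print(leading.hex(), trailing.hex())        # 0abc abc0
```

The two readings give different bytes, so a converter has to pick one convention or refuse odd-length input.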

Veovis
Joined: 13 Feb 2006 Posts: 390 Location: Utah
Posted: Mon Jul 10, 2006 12:25 am
@PhiLho
Concerning Pebwa: while I think it is a great idea, I am not nearly advanced enough to fully understand how it works, and since it leaves all normal chars alone, it does not encode the hexadecimal. It could probably be rewritten to compress much better than Base64, but I'm fine with 133%. And I think Laszlo has a valid point about keeping to the standard.
@Laszlo
I assume the standard for Base64 is:
Code:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
I have now changed my function to use this string, and unless you correct me it is the one I will use in the app I'm working on. I agree that it would be beneficial to keep to the standard so that any Base64 file will work (like email attachments).
You also have a valid point about half bytes. But (and I'm actually still really confused about all this) since it encodes things 3 hex digits at a time into 2 Base64 digits, and bytes come in sets of 2 hex digits, you could have a whole number of bytes that still needs that "-" sign.
For example:
Code:
string: ff7e4c8c ;4 bytes of data
encoded: /35MjA-
decoded: ff7e4c8c
In case I wasn't clear: when you encode a string of hex that has mod(strlen,3) = 2 (for example a string 8 hex digits long), because of the ratio at which it compresses things (take 3 hex digits, give 2 Base64 digits), the last 2 hex digits still make 2 Base64 digits, and when you decode those you get 3 hex digits, so that last digit has to be trimmed off when you decode the string. So I place a "-" at the end of the string to tell the decoder to remove the trailing zero after it decodes.
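To make the leftover-digit problem concrete, here is a small Python sketch of the decoding side. It is an illustration, not the AutoHotkey function: `decode_pairs` is a hypothetical helper that turns every 2 Base64 characters into 3 hex digits, using the standard alphabet.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def decode_pairs(code):
    """Turn each pair of Base64 characters back into 3 hex digits."""
    digits = []
    for i in range(0, len(code), 2):
        value = (ALPHABET.index(code[i]) << 6) | ALPHABET.index(code[i + 1])
        digits.append(format(value, "03x"))    # 12 bits -> exactly 3 hex digits
    return "".join(digits)

decoded = decode_pairs("/35MjA")   # the example string without its "-" marker
print(decoded)                     # ff7e4c8c0: nine digits, one more than went in
print(decoded[:-1])                # ff7e4c8c once the filler digit is trimmed
```

Without some marker the decoder cannot tell whether that final 0 was data or filler, which is the job the "-" does here.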
I am curious as to how the "standard" base64 avoids this ratio problem.

Veovis
Joined: 13 Feb 2006 Posts: 390 Location: Utah
Posted: Mon Jul 10, 2006 12:44 am
Ah, I found the answers to most of my questions.
See here for more details.
Quote:
If there are two input bytes remaining (the remainder of the total input bytes divided by three is two), pad with one "=". If there is one input byte remaining (remainder was one), pad with two "=", otherwise, don't pad. This prevents extra bits being added to the reconstructed data.

That also answers my question about what the standard is: + and / are the last 2 digits, and = is the padding character. If we didn't want to keep to the standard, I would almost be tempted to use "-" instead of "=" and "=" instead of "==".
Also, I will add this to my function as well:
Quote:
newlines are inserted in the encoded data every 76 characters,
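Both quoted rules can be observed with Python's standard `base64` module. This is only a sketch for comparison, not part of the AutoHotkey script; the sample inputs are arbitrary.

```python
import base64

# Padding: 1, 2 and 3 input bytes give "==", "=" and no padding respectively
for hex_text in ("12", "1234", "123456"):
    raw = bytes.fromhex(hex_text)
    print(hex_text, "->", base64.b64encode(raw).decode("ascii"))
# 12 -> Eg==
# 1234 -> EjQ=
# 123456 -> EjRW

# Line wrapping: encodebytes() starts a new line after every 76 output characters
wrapped = base64.encodebytes(bytes(100)).decode("ascii")
line_lengths = [len(line) for line in wrapped.splitlines()]
print(line_lengths)   # [76, 60]
```

Note that `b64encode` never inserts newlines; only `encodebytes` applies the 76-character wrapping.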

Laszlo
Joined: 14 Feb 2005 Posts: 4517 Location: Boulder, CO
Posted: Mon Jul 10, 2006 12:52 am
You are fast! Anyway, this is a version of your script, which seems to conform to the standard:
Code:
StringCaseSense On
Chars = ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
MsgBox % HextoBase64("12")
MsgBox % Base64toHex("Eg==")
MsgBox % HextoBase64("1234")
MsgBox % Base64toHex("EjQ=")
MsgBox % HextoBase64("123456")
MsgBox % Base64toHex("EjRW")
HextoBase64(hex) { ; StrLen(hex) must be even
Loop Parse, hex
{
m := Mod(A_Index,3)
x = 0x%A_loopfield%
IfEqual m,1, SetEnv z, % x << 8
Else IfEqual m,2, EnvAdd z, % x << 4
Else {
z += x
o := o Code(z>>6) code(z)
}
}
IfEqual m,2, Return o Code(z>>6) Code(z) "=="
IfEqual m,1, Return o Code(z>>6) "="
Return o
}
Base64toHex(code) {
StringReplace code, code, =,, All
Loop Parse, code
If (A_Index & 1)
z := DeCode(A_LoopField) << 6
Else {
z += DeCode(A_LoopField)
o := o H1(z>>8) H1(z>>4) H1(z)
}
If (StrLen(code)&3 = 3)
Return o H1(z>>8)
If (StrLen(code)&3 = 2)
StringTrimRight o,o,1
Return o
}
H1(x) { ; LS hex digit
Return Chr((x&15)+48 + 7*(x&15>9))
}
Code(i) { ; <== Chars[i & 63], 0-base index
Global Chars
StringMid i, Chars, (i&63)+1, 1
Return i
}
DeCode(c) { ; c = a char in Chars ==> position [0,63]
Global Chars
Return InStr(Chars,c,1) - 1
}
Edit 20060717: Simplified H1, added tests
Last edited by Laszlo on Mon Jul 17, 2006 7:48 pm; edited 4 times in total
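The H1() one-liner in the script above packs the digit lookup into character-code arithmetic. The same trick, spelled out as a Python sketch (`h1` is just a transliteration for illustration, not part of the script):

```python
def h1(x):
    """Least significant hex digit of x, as an uppercase character."""
    nibble = x & 15
    # '0'..'9' are character codes 48..57 and 'A' is 65, so values 10..15
    # need an extra offset of 7 to jump over the codes 58..64.
    return chr(nibble + 48 + 7 * (nibble > 9))

print("".join(h1(n) for n in range(16)))            # 0123456789ABCDEF
print(h1(0x123 >> 8) + h1(0x123 >> 4) + h1(0x123))  # 123
```

Because only the low 4 bits are kept, the caller can pass z>>8, z>>4 and z without masking first, as Base64toHex does.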

Veovis
Joined: 13 Feb 2006 Posts: 390 Location: Utah
Posted: Mon Jul 10, 2006 1:21 am
Hmmmm, (I might be wrong) but I think you did that wrong. You check the remainder of strlen(code) / 3, and you should have checked the remainder of strlen(hex) / 3.
Wait, it only gets it wrong if you give it a half-byte. Hmmmm. Not sure how that works, and I guess it doesn't matter since no one should give it half-bytes.
Also, it appears that in your code you switched where the "=" and "==" should be added.
But I do like how you eliminated the need for my silly Dec(), Hex() and Trim() functions. And I like your H1() function.
In any case, as much as I want to stick to the standard, I don't understand the purpose of adding "==" or "=" to the code if all you do is immediately delete it when you decode the Base64.