AutoHotkey Community

All times are UTC [ DST ]




Posted: July 8th, 2006, 11:02 pm

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
@Veovis: Although PhiLho did a nice job with his codec, I would stick to Base64. As I said, when you receive an email attachment, most of the time the file is already encoded this way, so you could save the conversion. The main advantage of Pebwa is that you could see embedded text in the encoded binary, but the size of the coded file is not much different.


Posted: July 9th, 2006, 5:10 am

Joined: February 13th, 2006, 10:40 pm
Posts: 389
Location: Utah
Alright, I like the idea of the Base64 encoder.

When you store binary as hex text, it doubles in size. Base64 packs every three hex digits into two characters, so it shrinks the hex text to about 66%, and the file ends up stored as text at about 133% of its original size. Which isn't too bad, considering.
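
To make the size arithmetic concrete, here is a minimal Python sketch (an illustration only; the function names are hypothetical). It assumes standard padded Base64, where every started group of 3 bytes becomes 4 characters.

```python
import math

def hex_text_size(n_bytes):
    # Every byte is written as two hex digits.
    return 2 * n_bytes

def base64_text_size(n_bytes):
    # Every started group of 3 bytes becomes 4 characters (padding included).
    return 4 * math.ceil(n_bytes / 3)

n = 3000
print(hex_text_size(n), base64_text_size(n))  # 6000 4000
```

For 3000 bytes that is 6000 hex characters versus 4000 Base64 characters: two thirds of the hex size, four thirds of the binary size.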

So I took your Base64 encoder/decoder and stared at it for a couple of hours till I understood how it worked, then modified it to work on hex rather than ASCII. However, I seem to have broken it. It works fine when the hex string you feed it has a length divisible by 3, but otherwise it either adds a zero or the last hex digit turns into a zero. Kind of a major problem when encoding files. @Laszlo, HELP! I assume the problem is in the lines right after the string-parsing loop. But it's my bedtime right now (one of the joys of being a teenager) and I can't figure it out. Thanks for your lovely encoder, and thanks in advance for your help!

Code:
#singleinstance force
Chars = 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+ƒ  ;i rearranged the chars so "0000 00" = 0 and "1111 11" = ƒ

test1 = ff7e4c3bc023
test2 = 8ca9d
test3 = f001

msgbox, % "string:`t" test1 "`nencoded:`t" HextoBase64(test1) "`ndecoded:`t" Base64toHex(HextoBase64(test1))
msgbox, % "string:`t" test2 "`nencoded:`t" HextoBase64(test2) "`ndecoded:`t" Base64toHex(HextoBase64(test2))
msgbox, % "string:`t" test3 "`nencoded:`t" HextoBase64(test3) "`ndecoded:`t" Base64toHex(HextoBase64(test3))

StringCaseSense On

;msgbox, % "encoding:`n`n" In "`n`nInto: " HextoBase64(In)"`n`nstring was " strlen(in) " long and is now " strlen(HextoBase64(In))

HextoBase64(string) {
   Loop Parse, string
   {
      m := Mod(A_Index,3)
      IfEqual      m,1, SetEnv buffer, % Dec("0x" A_loopfield) << 8
      Else IfEqual m,2, EnvAdd buffer, % Dec("0x" A_loopfield) << 4
      Else {
         buffer += Dec("0x" A_loopfield)
         out := out Code(buffer>>6) code(buffer)
      }
   }
   IfEqual m,0, Return out
   IfEqual m,1, Return out Code(buffer) "=="
   Return out Code(buffer>>6) Code(buffer) "="

}

Base64toHex(code) {
   stringreplace,code,code,=,,all
   Loop Parse, code
   {
      m := Mod(A_index,2)
      IfEqual m,0, {
         buffer += DeCode(A_LoopField)
         out := out Trim(Hex(buffer>>8)) Trim(Hex(15 & buffer>>4)) Trim(Hex(15 & buffer))
      }
      Else SetEnv buffer, % DeCode(A_LoopField) << 6
   }
   IfEqual m,0, return out
   IfEqual m,1, Return out Trim(Hex(15 & buffer>>8))
   Return out Trim(Hex(15 & buffer>>8)) Trim(Hex(15 & buffer>>4))

}

Code(i) {   ; <== Chars[i & 63], 0-base index
   Global Chars
   StringMid i, Chars, (i&63)+1, 1
   Return i
}

DeCode(c) { ; c = a char in Chars ==> position [0,63]
   Global Chars
   Return InStr(Chars,c,1) - 1
}

Dec(hexin) {
   currentformat := A_formatinteger
   setformat,integer,d
   hexin += 0
   setformat,integer, %currentformat%
   return hexin
}

Hex(decin) {
   currentformat := A_formatinteger
   setformat,integer,h
   decin += 0
   setformat,integer, %currentformat%
   return decin
}

Trim(hexin) {
   stringleft,beg,hexin,2
   if beg = 0x
      stringtrimleft,hexin,hexin,2
   return hexin
}


Lol, I know something is wrong because in the decoding function Mod(A_Index,2) can only return 0 or 1, and I have 3 ifs. But I'm too tired to think.

_________________
"Power can be given overnight, but responsibility must be taught. Long years go into its making."


Posted: July 9th, 2006, 5:20 pm

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
A direct conversion from hex to base 64 is a bit more complicated, because you have to remember half-digits. It looks easier to use binary as an intermediate format:
Code:
Hex2Bin(bin,"0123456789")
MsgBox % Bin2Hex(bin)

Bin2Hex(ByRef b, n=0)            ; n bytes binary data -> stream of 2-digit hex
{                                ; n = 0: all (SetCapacity can be larger than used!)
   format = %A_FormatInteger%    ; save original integer format
   SetFormat Integer, Hex        ; for converting bytes to hex

   m := VarSetCapacity(b)
   If n not between 1 and %m%    ; invalid length -> all allocated
       n = %m%
   Loop %n%
      h := h 256+*(&b+A_Index-1) ; concatenate  0x1xx
   StringReplace h, h, 0x1,,All  ; remove every 0x1

   SetFormat Integer, %format%   ; restore original format
   Return h
}

Hex2Bin(ByRef bin, hex) {        ; Convert hex and write as binary to bin
   VarSetCapacity(bin, StrLen(hex)//2)
   Loop Parse, hex
      If (A_Index & 1)           ; Odd index
         x = 0x%A_LoopField%     ; 1st hex digit of a Byte
      Else
         DllCall("RtlFillMemory",UInt,&bin+A_Index//2-1, UInt,1, UChar,x A_LoopField)
}
I don't understand your point with "ƒ", but I'll see how easy a direct conversion is.
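
For comparison, the same "binary as an intermediate format" route can be sketched in Python with the standard library. This is only an illustration of the idea, not a translation of the AutoHotkey functions above; the names hex_to_base64 and base64_to_hex are hypothetical.

```python
import base64

def hex_to_base64(hex_text):
    # hex text -> raw bytes -> standard Base64 text
    raw = bytes.fromhex(hex_text)
    return base64.b64encode(raw).decode("ascii")

def base64_to_hex(code):
    # standard Base64 text -> raw bytes -> lowercase hex text
    return base64.b64decode(code).hex()

print(hex_to_base64("ff7e4c8c"))                 # /35MjA==
print(base64_to_hex(hex_to_base64("ff7e4c8c")))  # ff7e4c8c
```

Going through bytes means an odd number of hex digits is rejected up front (bytes.fromhex raises ValueError), so the half-digit question never reaches the encoder.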


Posted: July 9th, 2006, 5:30 pm

Joined: February 13th, 2006, 10:40 pm
Posts: 389
Location: Utah
Alright, I figured it out.

The reason I use ƒ (and rearranged the chars) is just personal preference, so that the hex "ffffff000000" turns into "ƒƒƒƒ0000" rather than "////AAAA".


Code:
/*   Hex to Base64 encoder/decoder
         by Veovis
      Based off of Laszlos Ascii to Base64 encoder

   Example of how it works:

Hex:  f    f    7    e    4    c   
 
      1111 1111 0111 1110 0100 1100   transform to binary

      111111  110111 111001  001100   rearrange the bits into groups of 6

      (63)    (55)   (57)    (12)     (what those are in decimal)

      ƒ       t      v       C


because of half digits, I use the char "-" to mark that the last digit must be removed when you decode the string

So we get about 66% compression, but shorter strings are less successful, especially if they have half digits
*/
     
#singleinstance force
Chars = 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+ƒ  ;i rearranged the chars so "0000 00" = 0 and "1111 11" = ƒ
StringCaseSense On

test1 = ff7e4c8cba838d8e0c9
test2 = 8ca9dec
test3 = f0011e
test4 = c8ffc
test5 = ff00
test6 = 103
test7 = ff

loop 7
   msgbox, % "string:`t" test%A_index% "`nencoded:`t" HextoBase64(test%A_index%) "`ndecoded:`t" Base64toHex(HextoBase64(test%A_index%))

HextoBase64(string) {
   Loop Parse, string
   {
      m := Mod(A_Index,3)
      IfEqual      m,1, SetEnv buffer, % Dec("0x" A_loopfield) << 8
      Else IfEqual m,2, EnvAdd buffer, % Dec("0x" A_loopfield) << 4
      Else {
         buffer += Dec("0x" A_loopfield)
         out := out Code(buffer>>6) code(buffer)
      }
   }
   IfEqual, m, 0, return out
   IfEqual, m, 1, return out Code(buffer>>6)
   IfEqual, m, 2, return out Code(buffer>>6) Code(buffer) "-"
}

Base64toHex(code) {
   ifinstring,code,-,setenv,trim,1
   stringreplace,code,code,-,,a
   Loop Parse, code
   {
      m := Mod(A_index,2)
      IfEqual m,0, {
         buffer += DeCode(A_LoopField)
         out := out Trim(Hex(buffer>>8)) Trim(Hex(15 & buffer>>4)) Trim(Hex(15 & buffer))
      }
      Else SetEnv buffer, % DeCode(A_LoopField) << 6
   }
   IfEqual m,1, setenv,out,% out Trim(Hex(15 & buffer>>8))
   IfEqual trim,1, stringtrimright,out,out,1
   return out
}

Code(i) {   ; <== Chars[i & 63], 0-base index
   Global Chars
   StringMid i, Chars, (i&63)+1, 1
   Return i
}

DeCode(c) { ; c = a char in Chars ==> position [0,63]
   Global Chars
   Return InStr(Chars,c,1) - 1
}

Dec(hexin) {
   currentformat := A_formatinteger
   setformat,integer,d
   hexin += 0
   setformat,integer, %currentformat%
   return hexin
}

Hex(decin) {
   currentformat := A_formatinteger
   setformat,integer,h
   decin += 0
   setformat,integer, %currentformat%
   return decin
}

Trim(hexin) {      ;trims the 0x off of hex
   stringleft,beg,hexin,2
   if beg = 0x
      stringtrimleft,hexin,hexin,2
   return hexin
}
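
As a cross-check of the scheme above, here is a minimal Python sketch of the same packing: 3 hex digits become 2 characters, a single leftover digit becomes 1 character, and 2 leftover digits become 2 characters plus "-". It is an illustration under those assumptions, not a line-by-line port, and the function names are hypothetical.

```python
CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+ƒ"

def hex_to_custom64(hex_text):
    out = []
    for i in range(0, len(hex_text), 3):
        group = hex_text[i:i + 3]
        value = int(group.ljust(3, "0"), 16)  # 12 bits, zero-filled on the right
        if len(group) == 1:
            out.append(CHARS[value >> 6])     # the 4 data bits fit in one char
        else:
            out.append(CHARS[value >> 6] + CHARS[value & 63])
            if len(group) == 2:
                out.append("-")               # marks a filler digit to drop
    return "".join(out)

def custom64_to_hex(code):
    trim = code.endswith("-")
    code = code.rstrip("-")
    out = []
    for i in range(0, len(code), 2):
        pair = code[i:i + 2]
        if len(pair) == 2:
            value = (CHARS.index(pair[0]) << 6) | CHARS.index(pair[1])
            out.append(format(value, "03x"))
        else:
            out.append(format(CHARS.index(pair) >> 2, "x"))
    text = "".join(out)
    return text[:-1] if trim else text

print(hex_to_custom64("ff7e4c"))  # ƒtvC
```

The decoder returns lowercase hex, which matches the lowercase test strings used in the script.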

Posted: July 9th, 2006, 6:55 pm

Joined: December 27th, 2005, 1:46 pm
Posts: 6837
Location: France (near Paris)
Laszlo wrote:
Nice idea! If you use base64 encoding (see here), you can save some space and time, and still be standard conform. With base 85, 128 etc. further insignificant memory savings are possible, but the data will be nonstandard.
What is the advantage of using "standard" encoding here? I don't necessarily advocate the use of Pebwa (although it is mostly a toy, it produces a smaller encoding than Base64; I guess Ascii85 is even better). I can understand the necessity to stick to standards with encryption, where an error can be costly! But here it is mostly for use within a given script, i.e. if it works and has good performance, it can be used. This is not for exchanging data with friends or something.

_________________
vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")


Posted: July 9th, 2006, 8:04 pm

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
PhiLho wrote:
What is the advantage of using "standard" encoding here?
You can copy into the script already encoded files, like email attachments.


Posted: July 9th, 2006, 9:38 pm

Joined: December 29th, 2004, 1:28 pm
Posts: 2545
I played around a bit with the script for compressing an existing file to be included and put together a couple of functions for compressing/decompressing the data. Nothing too fancy. Basically a combination of Hex to ASCII and pattern compression put together for fun. The compression ratio seems reasonable for most files so far considering the time spent on it (tested 50-85% compression of the hex output so far), but it hasn't been extensively tested and isn't incredibly fast. I also added a small function for splitting the lines so that the data can be easily copied and pasted into a script. When using Join the ` option is required. Maybe someone will find the modifications useful :) .

Edit: Posted an updated version of the code here :) : http://www.autohotkey.com/forum/viewtop ... 8357#68357


Last edited by corrupt on July 17th, 2006, 10:53 am, edited 1 time in total.

Posted: July 9th, 2006, 10:01 pm

Joined: December 29th, 2004, 1:28 pm
Posts: 2545
Laszlo wrote:
PhiLho wrote:
What is the advantage of using "standard" encoding here?
You can copy into the script already encoded files, like email attachments.
Good point. I just tested adding a script into a script using a slightly modified version of the scripts I posted above and the included script extracts and runs Ok :D .

An uncompiled script can be used as a Self-extracting archive :!: Cool 8)


Posted: July 9th, 2006, 10:38 pm

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
@Corrupt: could you tell in a few words, how the compression works?


Posted: July 9th, 2006, 11:30 pm

Joined: December 29th, 2004, 1:28 pm
Posts: 2545
Laszlo wrote:
@Corrupt: could you tell in a few words, how the compression works?

Sure :) . It's a bit messy to follow but there's not much to it. I changed it a couple times so hopefully I'm giving the right values that I used.

- it first starts a loop that counts from 2A to FE in Hex
- the hex values are then replaced in the text with ASCII characters except for those between 127 - 176, upper and lower case letters a - f and numeric characters 0 - 9
- once the characters have been replaced a string is built of each ASCII character from 128 - 176
- another couple loops are then started. One that counts from 42 to 255 (the range from 127 to 176 is skipped again) and another that loops 48 times
- for each character in the ASCII range a string of 48 of the same character is created
- The loop then counts down looking for a set of characters in a row that are between 3 - 51 characters long. If a match is found it replaces the string of x characters (anywhere from 3 - 51) with the character that is repeated followed by the next available character from the parsing loop (the characters between 128-176). The value of the character that is added to replace the group of multiple characters is used to identify how many characters were removed.

In short, hex pairs are replaced with ASCII characters when within certain ranges, then characters that repeat (up to 51 times) are replaced with 2 characters (the repeated character followed by an ASCII character in a different range than the first range used for replacement).
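
The run-length step described above can be sketched roughly as follows. This is a Python illustration, not the AutoHotkey code from the script: the mapping of run lengths 3-51 onto marker characters 128-176 (marker = 128 + run length - 3) is an assumption, chosen because both ranges have 49 values, and the function names are hypothetical. It also assumes the literal text never contains a character in the marker range.

```python
MARKER_BASE = 128          # first marker character (assumed mapping)
MIN_RUN, MAX_RUN = 3, 51   # run lengths that get replaced

def rle_compress(text):
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        run = 1
        # Count how often ch repeats, capped at MAX_RUN.
        while i + run < len(text) and text[i + run] == ch and run < MAX_RUN:
            run += 1
        if run >= MIN_RUN:
            # Repeated character followed by a marker that encodes the length.
            out.append(ch + chr(MARKER_BASE + run - MIN_RUN))
        else:
            out.append(ch * run)
        i += run
    return "".join(out)

def rle_decompress(packed):
    out = []
    for ch in packed:
        code = ord(ch)
        if out and MARKER_BASE <= code <= MARKER_BASE + MAX_RUN - MIN_RUN:
            # A marker expands the character just before it.
            out[-1] = out[-1] * (code - MARKER_BASE + MIN_RUN)
        else:
            out.append(ch)
    return "".join(out)
```

Runs longer than 51 are simply split into several marked runs, so the cap costs two extra characters per additional 51 repeats.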

I'd welcome any input for improvements. I know of a few ways to improve the compression but I figured I'd stop there for now as a compromise on speed vs compression.


Posted: July 9th, 2006, 11:44 pm

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
@Veovis: I looked at your hex to base64 converter. It looks good.

If you could keep it conformant to the standard, we could find other uses for it, like generating or processing a hex file with a script, encoding it, and sending it with a command-line email program.

Also, half-bytes do not seem to be necessary. The purpose is to handle binary files in a script, and they always contain an integer number of bytes. Therefore, we could assume an even number of hex digits. Do you know of an application that needs an odd number of digits? It looks ambiguous, too: do we assume an implicit leading or trailing 0?


Posted: July 10th, 2006, 12:25 am

Joined: February 13th, 2006, 10:40 pm
Posts: 389
Location: Utah
@PhiLho

Concerning Pebwa, while I think it is a great idea, I am nowhere near advanced enough to fully understand how it works, and since it leaves all normal chars alone, it does not encode the hexadecimal. It could probably be rewritten to compress much better than Base64, but I'm fine with 133%. And I think Laszlo has a valid point about keeping to the standard.

@Laszlo

I assume the standard for Base64 is:
Code:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

I have now changed my function to use this string, and unless you correct me it is the one I will use in the app I'm working on. I agree that it would be beneficial to stay standard so that any Base64 file will work (like email attachments).

You also have a valid point about half-bytes. But (and I'm actually still really confused about all this) since it encodes things 3 hex digits at a time into 2 Base64 digits, and bytes come in sets of 2 hex digits, you could have a whole number of bytes that still needs that "-" sign.

For example:

Code:
string:    ff7e4c8c       ;4 bytes of data
encoded:   /35MjA-
decoded:   ff7e4c8c


In case I wasn't clear: when you encode a string of hex that has Mod(StrLen,3) = 2 (for example a string 8 hex digits long), because of the ratio at which it packs things (take 3 hex digits, give 2 Base64 digits), the last 2 hex digits still make 2 Base64 digits, and decoding those yields 3 hex digits, so the last digit has to be trimmed off. So I place a "-" at the end of the string to tell the decoder to remove the trailing zero after it decodes.

I am curious as to how the "standard" base64 avoids this ratio problem.

Posted: July 10th, 2006, 12:44 am

Joined: February 13th, 2006, 10:40 pm
Posts: 389
Location: Utah
Ah, I found the answers to most of my questions.

See here for more details

Quote:
If there are two input bytes remaining (the remainder of the total input bytes divided by three is two), pad with one "=". If there is one input byte remaining (remainder was one), pad with two "=", otherwise, don’t pad. This prevents extra bits being added to the reconstructed data.


That also answers my question about what the standard is: + and / are the last 2 digits, and = is the padding character. If we didn't want to stay standard, I would almost be tempted to use "-" instead of "=" and "=" instead of "==".
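
The quoted padding rule can be checked with a few lines of Python. This is a minimal sketch: pad_count is a hypothetical helper that restates the rule, and base64.b64encode from the standard library is used only for comparison.

```python
import base64

def pad_count(n_bytes):
    # 1 leftover byte -> "==", 2 leftover bytes -> "=", none -> no padding
    return (3 - n_bytes % 3) % 3

for hex_text in ("12", "1234", "123456"):
    raw = bytes.fromhex(hex_text)
    encoded = base64.b64encode(raw).decode("ascii")
    print(hex_text, encoded, pad_count(len(raw)))
```

This prints Eg== for "12", EjQ= for "1234" and EjRW for "123456", with pad counts 2, 1 and 0.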

I will also add this to my function:

Quote:
newlines are inserted in the encoded data every 76 characters,
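
A minimal Python sketch of that line-wrapping step, assuming the encoded text is simply cut into 76-character lines joined by "\n" (the helper name wrap76 is hypothetical):

```python
import base64

def wrap76(encoded, width=76):
    # Cut the encoded text into lines of at most `width` characters.
    return "\n".join(encoded[i:i + width] for i in range(0, len(encoded), width))

data = bytes(range(100))
wrapped = wrap76(base64.b64encode(data).decode("ascii"))
print(wrapped)
```

With 100 input bytes the encoded text is 136 characters, so the result is one line of 76 characters and one of 60. Python's base64.b64decode skips the newlines by default, so the wrapped text still decodes to the original bytes.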

Posted: July 10th, 2006, 12:52 am

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
You are fast! Anyway, this is a version of your script that seems to conform to the standard:
Code:
StringCaseSense On
Chars = ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

MsgBox % HextoBase64("12")
MsgBox % Base64toHex("Eg==")

MsgBox % HextoBase64("1234")
MsgBox % Base64toHex("EjQ=")

MsgBox % HextoBase64("123456")
MsgBox % Base64toHex("EjRW")


HextoBase64(hex) { ; StrLen(hex) must be even
   Loop Parse, hex
   {
      m := Mod(A_Index,3)
      x  = 0x%A_loopfield%
      IfEqual      m,1, SetEnv z, % x << 8
      Else IfEqual m,2, EnvAdd z, % x << 4
      Else {
         z += x
         o := o Code(z>>6) code(z)
      }
   }
   IfEqual m,2, Return o Code(z>>6) Code(z) "=="
   IfEqual m,1, Return o Code(z>>6) "="
   Return o
}

Base64toHex(code) {
   StringReplace code, code, =,, All
   Loop Parse, code
      If (A_Index & 1)
         z := DeCode(A_LoopField) << 6
      Else {
         z += DeCode(A_LoopField)
         o := o H1(z>>8) H1(z>>4) H1(z)
      }
   If (StrLen(code)&3 = 3)
      Return o H1(z>>8)
   If (StrLen(code)&3 = 2)
      StringTrimRight o,o,1
   Return o
}

H1(x) {     ; LS hex digit
   Return Chr((x&15)+48 + 7*(x&15>9))
}

Code(i) {   ; <== Chars[i & 63], 0-base index
   Global Chars
   StringMid i, Chars, (i&63)+1, 1
   Return i
}

DeCode(c) { ; c = a char in Chars ==> position [0,63]
   Global Chars
   Return InStr(Chars,c,1) - 1
}

Edit 20060717: Simplified H1, added tests


Last edited by Laszlo on July 17th, 2006, 7:48 pm, edited 4 times in total.

Posted: July 10th, 2006, 1:21 am

Joined: February 13th, 2006, 10:40 pm
Posts: 389
Location: Utah
Hmmmm, (I might be wrong) but I think you did that wrong. You check the remainder of StrLen(code) / 3 and you should have checked the remainder of StrLen(hex) / 3.

Wait, it only gets it wrong if you give it a half-byte. Hmmmm. Not sure how that works, and I guess it doesn't matter since no one should give it half-bytes.

Also, it appears that in your code you switched where the "=" and "==" should be added.

But I do like how you eliminated the need for my silly Dec(), Hex() and Trim() functions. And I like your H1() function.

In any case, as much as I want to stick to the standard, I don't understand the purpose of adding "==" or "=" to the code if all you do is immediately delete it when you decode the Base64.
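
On the padding question, a small Python sketch shows why a decoder can afford to strip the "=" signs when it sees the whole string: the length of the stripped text modulo 4 already says how many bytes the last group holds. The helper name decode_unpadded is hypothetical; base64.b64decode is the standard library call.

```python
import base64

def decode_unpadded(code):
    code = code.rstrip("=")
    # Stripped length mod 4: 0 -> full group, 2 -> 1 byte, 3 -> 2 bytes.
    # Restoring the padding is just rounding the length up to a multiple of 4.
    return base64.b64decode(code + "=" * (-len(code) % 4))

print(decode_unpadded("Eg").hex())    # 12
print(decode_unpadded("EjQ").hex())   # 1234
print(decode_unpadded("EjRW").hex())  # 123456
```

So for a single, complete string the padding carries no extra information, which is why deleting it before decoding does no harm.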
