AutoHotkey Community

All times are UTC [ DST ]

 Post subject: FileMD5() and MD5()
PostPosted: June 18th, 2009, 3:11 pm 
Joined: December 26th, 2005, 4:40 pm
Posts: 8776
FileMD5()

Computes and returns the MD5 hash [ RFC1321 Specification ] of the file passed as a parameter, at speeds comparable to Hashes.DLL.
Refer to MSDN for the Message Digest API.
    Parameters:

    sFile : The full path of the file to hash
    cSz : *Chunk size factor. Accepted values are 0 through 8; the default is 4. Values outside this range fall back to a 4 MB buffer.
Quote:
* To hash large files, the function has to read the file in manageable chunks into a buffer variable.
The size of this buffer is derived from cSz parameter as follows:

0 = 256 KB, 1 = 512 KB, 2 = 1.00 MB, 3 = 2.00 MB, 4 = 4.00 MB, 5 = 8.00 MB, 6 = 16.0 MB, 7 = 32.0 MB, 8 = 64.0 MB


Code:
FileMD5( sFile="", cSz=4 ) { ; www.autohotkey.com/forum/viewtopic.php?p=275910#275910
 ; Buffer size is 2**(18+cSz) bytes; an out-of-range cSz falls back to 4 MB
 cSz  := (cSz<0||cSz>8) ? 2**22 : 2**(18+cSz), VarSetCapacity( Buffer,cSz,0 )
 ; Open for reading (GENERIC_READ, OPEN_EXISTING); return the invalid handle value on failure
 hFil := DllCall( "CreateFile", Str,sFile,UInt,0x80000000, Int,3,Int,0,Int,3,Int,0,Int,0 )
 IfLess,hFil,1, Return,hFil
 ; Buffer doubles as the 8-byte output for the file size
 DllCall( "GetFileSizeEx", UInt,hFil, Str,Buffer ),   fSz := NumGet( Buffer,0,"Int64" )
 VarSetCapacity( MD5_CTX,104,0 ),    DllCall( "advapi32\MD5Init", Str,MD5_CTX )
 ; One iteration per chunk, rounding up for a final partial chunk
 Loop % ( fSz//cSz+!!Mod(fSz,cSz) )
   DllCall( "ReadFile", UInt,hFil, Str,Buffer, UInt,cSz, UIntP,bytesRead, UInt,0 )
 , DllCall( "advapi32\MD5Update", Str,MD5_CTX, Str,Buffer, UInt,bytesRead )
 DllCall( "advapi32\MD5Final", Str,MD5_CTX ), DllCall( "CloseHandle", UInt,hFil )
 ; The 16-byte digest sits at offsets 88..103; Hex ends in "0" because SubStr() treats position 0 as the last character
 Loop % StrLen( Hex:="123456789ABCDEF0" )
  N := NumGet( MD5_CTX,87+A_Index,"Char"), MD5 .= SubStr(Hex,N>>4,1) . SubStr(Hex,N&15,1)
Return MD5
}


Code:
MsgBox, % FileMD5( A_AhkPath ) ; Usage Example
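
To cross-check FileMD5() results outside AutoHotkey, the same chunked-read idea can be sketched in Python. This is only an illustration under stated assumptions, not the function above: it relies on Python's hashlib instead of advapi32, and the names file_md5 and c_sz are hypothetical. Only the chunk-size mapping is taken from the post.

```python
import hashlib

def file_md5(path, c_sz=4):
    """Hash a file in chunks of 2**(18 + c_sz) bytes (hypothetical helper)."""
    # Same mapping as the post: 0 -> 256 KB ... 8 -> 64 MB; out of range -> 4 MB.
    chunk = 2 ** 22 if c_sz < 0 or c_sz > 8 else 2 ** (18 + c_sz)
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:          # an empty read means end of file
                break
            md5.update(block)      # feed each chunk into the running hash
    return md5.hexdigest().upper()
```

The chunk size affects only memory use and speed; the digest is the same for every value of c_sz.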



    Edit 2009-06-18 :
  • An important bug fixed: Changed VarSetCapacity( MD5_CTX,24,0 ) to VarSetCapacity( MD5_CTX,128,0 )
  • Corrected again to VarSetCapacity( MD5_CTX,104,0). Thanks to Laszlo.

MD5()

Computes and returns the MD5 hash [ RFC1321 Specification ] of a variable's contents in memory.

Code:
MD5( ByRef V, L=0 ) { ; www.autohotkey.com/forum/viewtopic.php?p=275910#275910
 VarSetCapacity( MD5_CTX,104,0 ), DllCall( "advapi32\MD5Init", Str,MD5_CTX )
 DllCall( "advapi32\MD5Update", Str,MD5_CTX, Str,V, UInt,L ? L : StrLen(V) )
 DllCall( "advapi32\MD5Final", Str,MD5_CTX )
 Loop % StrLen( Hex:="123456789ABCDEF0" )
  N := NumGet( MD5_CTX,87+A_Index,"Char"), MD5 .= SubStr(Hex,N>>4,1) . SubStr(Hex,N&15,1)
Return MD5
}

; Usage Example

V := "The quick brown fox jumps over the lazy dog"
L := StrLen(V)
MsgBox, % MD5( V,L )


Last edited by SKAN on May 3rd, 2011, 8:07 am, edited 5 times in total.

PostPosted: June 18th, 2009, 4:32 pm 
Joined: March 27th, 2008, 2:14 pm
Posts: 700
Cool MD5 functions!

AutoHotkey doesn't have an MD5 function on Rosetta Code yet:
http://www.rosettacode.org/wiki/Tasks_not_implemented_in_AutoHotkey

You (or somebody) should post on Rosetta Code's MD5 page: http://www.rosettacode.org/wiki/MD5

Thanks!

PostPosted: June 18th, 2009, 5:25 pm 
Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
Nice! I guess the right size is VarSetCapacity(MD5_CTX,104), though. Anything larger works, too. The string version at the second dll call needs VarSetCapacity(V) at the end of the line, not VarSetCapacity(L).

Also, the right offset to read the digest from is 87+A_Index. 7+A_Index works, too, but it is not the intended place.

It is not a fair rosettacode solution, because the work is done in the Windows advapi32.dll, not in AHK. I am not even sure if machine code functions qualify, because they are processor dependent, not in the control of AHK.


Last edited by Laszlo on June 18th, 2009, 5:42 pm, edited 1 time in total.

PostPosted: June 18th, 2009, 5:40 pm 
Joined: December 26th, 2005, 4:40 pm
Posts: 8776
infogulch wrote:
Cool MD5 functions!


Thanks! :)


Laszlo wrote:
I guess the right size is VarSetCapacity(MD5_CTX,104)


I do not understand that struct at all!.. I vaguely picked that number from here

Quote:
It is not a fair rosettacode solution, because the work is done in the Windows advapi32.dll, not in AHK.


I agree Sir. Maybe you can consider posting your native implementation of MD5()

:)


PostPosted: June 18th, 2009, 5:46 pm 
Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
MSDN says:
Code:
typedef struct {
  ULONG         i[2];
  ULONG         buf[4];
  unsigned char in[64];
  unsigned char digest[16];
} MD5_CTX;
It is 2*4+4*4+64+16 = 104 bytes. The offset for the result (digest) is 88.
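
The arithmetic above can be checked mechanically. The following Python sketch mirrors the quoted struct with ctypes; it assumes ULONG is a 32-bit unsigned integer (as it is on Windows), and the field name in_ is a stand-in because "in" is a reserved word in Python.

```python
import ctypes

class MD5_CTX(ctypes.Structure):
    # Field layout copied from the struct quoted above; ULONG assumed 32-bit.
    _fields_ = [
        ("i", ctypes.c_uint32 * 2),
        ("buf", ctypes.c_uint32 * 4),
        ("in_", ctypes.c_ubyte * 64),    # "in" in the C struct
        ("digest", ctypes.c_ubyte * 16),
    ]

print(ctypes.sizeof(MD5_CTX))   # 104
print(MD5_CTX.digest.offset)    # 88
```

This matches the AutoHotkey code, which reads the digest with NumGet at 87+A_Index for A_Index 1 to 16, that is, offsets 88 to 103.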


PostPosted: June 18th, 2009, 5:47 pm 
Joined: March 27th, 2008, 2:14 pm
Posts: 700
:shock: oops. I didn't know you had already done that, Laszlo. I agree with SKAN, you should post yours on rosetta code.

PostPosted: June 18th, 2009, 6:23 pm 
Joined: December 26th, 2005, 4:40 pm
Posts: 8776
Laszlo wrote:
It is 2*4+4*4+64+16 = 104 bytes.


Thank you Sir. :)

Laszlo wrote:
The offset for the result (digest) is 88.


:shock:

I did not look that far with my Binspector. Seeing that the digest is also available at offset 8, I happily contented myself with 24 until I noticed I was not getting proper results for small strings.

Thanks again for the clarification..

Edit: I have updated the code with right VarSetCapacity and corrected the Message offset to read from.


PostPosted: August 15th, 2010, 6:14 pm 
Joined: February 17th, 2008, 5:01 pm
Posts: 303
Hi SKAN, many thanks for the MD5() function.

I'm curious. I notice that if you omit the second argument, it calculates the length on its own. Why would you ever want to include the second argument? And why does the function use VarSetCapacity to calculate the capacity rather than just using StrLen?


PostPosted: August 15th, 2010, 6:43 pm 
Joined: December 26th, 2005, 4:40 pm
Posts: 8776
Second question first:

Quote:
why does the function use VarSetCapacity to calculate the capacity rather than just using StrLen?


StrLen() will not return the correct length for binary data which contain null(s).

Code:
FileRead, Bin, %A_AhkPath%
MsgBox, % "StrLen()          `t: " StrLen( Bin )
        . "`nVarSetCapacity()`t: " VarSetCapacity( Bin )
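
The same effect can be illustrated outside AutoHotkey. This Python sketch assumes that a string-length function stops at the first null byte, as C-style strlen does; the helper name c_strlen and the sample bytes are made up for the illustration.

```python
def c_strlen(data: bytes) -> int:
    """Length up to the first null byte, as a C-style strlen would report."""
    end = data.find(b"\x00")
    return len(data) if end < 0 else end

# Hypothetical sample of binary data with embedded nulls.
binary = b"MZ\x90\x00\x03\x00\x00\x00"

print(c_strlen(binary))  # 3
print(len(binary))       # 8
```

Hashing only the first 3 bytes of such data would give the digest of a different input, which is why the byte count has to come from somewhere other than the string length.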


Quote:
if you omit the second argument, it calculates the length on its own. Why would you ever want to include the second argument?


VarSetCapacity() will not be reliable, when a variable is assigned with := operator.
In those cases, it is best to use StrLen( Data ) as second parameter.

Code:
Var := "Hello"
MsgBox, % "StrLen()          `t: " StrLen( Var )
        . "`nVarSetCapacity()`t: " VarSetCapacity( Var )


:)


PostPosted: August 16th, 2010, 3:42 am 
Joined: February 17th, 2008, 5:01 pm
Posts: 303
That's very interesting and a bit disturbing. Is VarSetCapacity wrong in those cases because it is telling the capacity of the variable rather than the length of the string that has been assigned to it? I may have already found a case in which not using StrLen leads to an incorrect hash.

I'm only going to be using it for relatively short strings that were generated within AHK, so I presume that these won't have any Nulls. I'd like to remove the second parameter, just using StrLen instead. To allow for calling the function on a constant string (rather than a variable), I'd also like to remove the "byref." Do you know if the following would be safe and reliable? I don't know if it matters, but I will be using both Unicode and Ascii builds.
Code:
MD5(V) { ; www.autohotkey.com/forum/viewtopic.php?p=275910#275910
 VarSetCapacity( MD5_CTX,104,0 ), DllCall( "advapi32\MD5Init", Str,MD5_CTX )
 DllCall( "advapi32\MD5Update", Str,MD5_CTX, Str,V, UInt, StrLen(V) )
 DllCall( "advapi32\MD5Final", Str,MD5_CTX )
 Loop % StrLen( Hex:="123456789ABCDEF0" )
  N := NumGet( MD5_CTX,87+A_Index,"Char"), MD5 .= SubStr(Hex,N>>4,1) . SubStr(Hex,N&15,1)
Return MD5
}


Thanks again for the prompt help...


PostPosted: August 16th, 2010, 5:05 am 
Joined: December 26th, 2005, 4:40 pm
Posts: 8776
JoeSchmoe wrote:
I'd also like to remove the "byref."


Then the function would work on the copy of original string. Redundant!

Quote:
I will be using both Unicode and Ascii builds


As is, the function would return incorrect results on AHK_L. I need some time to test it; I will get back to you as soon as I can.

:)


PostPosted: August 16th, 2010, 9:56 am 
Joined: October 17th, 2006, 4:15 pm
Posts: 7502
Location: Australia
SKAN wrote:
Then the function would work on the copy of original string. Redundant!
It may be redundant, but ByRef isn't necessarily more efficient. In one case you have a one-time string copy when the function is called, and in the other case every access to the variable has an extra level of indirection, i.e. V -> variable passed by caller. I'd generally recommend using ByRef only when required for the function to work. For instance, ByRef would be required to support hashing binary data. For usability, you could do something like this:
Code:
MD5( V, L=0 ) {
    return MD5_ref( V, L )
}
MD5_ref( ByRef V, L=0 ) {
    ... actual function here ...
}
The extra function call is minimal overhead. However, for very large strings ByRef will perform better -- I guess in such cases the string would generally be in a variable already. Rather than having the extra function, JoeSchmoe can do this:
Code:
MD5(_:="literal string")

Quote:
As is, the function would return incorrect results on AHK_L..
Looks like you only need to convert the string length to bytes; i.e. multiply by (A_IsUnicode ? 2:1). VarSetCapacity already returns a value in bytes.


PostPosted: August 17th, 2010, 5:49 am 
Joined: December 26th, 2005, 4:40 pm
Posts: 8776
Lexikos wrote:
Quote:
As is, the function would return incorrect results on AHK_L..
Looks like you only need to convert the string length to bytes; i.e. multiply by (A_IsUnicode ? 2:1). VarSetCapacity already returns a value in bytes.


Maybe I am missing something obvious, but I am not able to produce correct results ( AHK_L ) without calling WideCharToMultiByte() prior to MD5Update()


PostPosted: August 17th, 2010, 6:40 am 
Joined: December 26th, 2005, 4:40 pm
Posts: 8776
@JoeSchmoe

Okay! .. I got it:


Code:
MsgBox, % StrMD5( "The Quick Brown Fox Jumps Over The Lazy Dog" ) ; 58826469C2606F4791B9F75880DFBE2A

StrMD5( V ) { ; www.autohotkey.com/forum/viewtopic.php?p=376840#376840
 VarSetCapacity( MD5_CTX,104,0 ), DllCall( "advapi32\MD5Init", UInt,&MD5_CTX )
 DllCall( "advapi32\MD5Update", UInt,&MD5_CTX, A_IsUnicode ? "AStr" : "Str",V
 , UInt,StrLen(V) ), DllCall( "advapi32\MD5Final", UInt,&MD5_CTX )
 Loop % StrLen( Hex:="123456789ABCDEF0" )
  N := NumGet( MD5_CTX,87+A_Index,"Char"), MD5 .= SubStr(Hex,N>>4,1) . SubStr(Hex,N&15,1)
Return MD5
}


You may compare results with http://md5-hash-online.waraxe.us/
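
A local cross-check is also possible with Python's hashlib. The sketch below is an illustration, not part of StrMD5(): it assumes a Unicode build stores text as UTF-16LE (two bytes per character) and shows why the string has to be converted to single-byte form before hashing if the result should match the usual reference digest. The sample text is the sentence from the first post.

```python
import hashlib

text = "The quick brown fox jumps over the lazy dog"

ansi_bytes = text.encode("ascii")      # one byte per character
wide_bytes = text.encode("utf-16-le")  # two bytes per character (assumed Unicode-build layout)

ansi_md5 = hashlib.md5(ansi_bytes).hexdigest().upper()
wide_md5 = hashlib.md5(wide_bytes).hexdigest().upper()

print(ansi_md5)                          # 9E107D9D372BB6826BD81D3542A419D6
print(len(ansi_bytes), len(wide_bytes))  # 43 86
print(ansi_md5 == wide_md5)              # False
```

Because MD5 operates on bytes, the two encodings of the same text yield different digests; fixing the length alone does not make the wide-character buffer hash to the single-byte value.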


PostPosted: August 17th, 2010, 7:01 am 
Joined: February 17th, 2008, 5:01 pm
Posts: 303
Thanks, SKAN! This looks perfect. I can't wait to try it out in the code tomorrow.

