FileCRC32, FileSHA1, FileMD5() and MD5()


45 replies to this topic
SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005
FileCRC32()
Computes and returns the CRC32 hash of the file passed as a parameter.
Please refer to FileMD5() for an explanation of parameter 2.

MsgBox, % FileCRC32( A_AhkPath ) ; Usage Example

FileCRC32( sFile="",cSz=4 ) { ; by SKAN www.autohotkey.com/community/viewtopic.php?t=64211
 cSz := (cSz<0||cSz>8) ? 2**22 : 2**(18+cSz), VarSetCapacity( Buffer,cSz,0 ) ; 10-Oct-2009
 hFil := DllCall( "CreateFile", Str,sFile,UInt,0x80000000, Int,3,Int,0,Int,3,Int,0,Int,0 )
 IfLess,hFil,1, Return,hFil
 hMod := DllCall( "LoadLibrary", Str,"ntdll.dll" ), CRC32 := 0
 DllCall( "GetFileSizeEx", UInt,hFil, UInt,&Buffer ),    fSz := NumGet( Buffer,0,"Int64" )
 Loop % ( fSz//cSz + !!Mod( fSz,cSz ) )
   DllCall( "ReadFile", UInt,hFil, UInt,&Buffer, UInt,cSz, UIntP,Bytes, UInt,0 )
 , CRC32 := DllCall( "NTDLL\RtlComputeCrc32", UInt,CRC32, UInt,&Buffer, UInt,Bytes, UInt )
 DllCall( "CloseHandle", UInt,hFil )
 SetFormat, Integer, % SubStr( ( A_FI := A_FormatInteger ) "H", 0 )
 CRC32 := SubStr( CRC32 + 0x1000000000, -7 ), DllCall( "CharUpper", Str,CRC32 )
 SetFormat, Integer, %A_FI%
Return CRC32, DllCall( "FreeLibrary", UInt,hMod )
}
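
For an independent cross-check of FileCRC32(), the same chunked approach can be sketched in Python. This assumes RtlComputeCrc32 produces the standard CRC-32 that zlib.crc32 implements; the name file_crc32 and its chunk_size parameter are illustrative and not part of the original post.

```python
import sys
import zlib

def file_crc32(path, chunk_size=4 * 1024 * 1024):
    """Return the CRC32 of a file as 8 uppercase hex digits, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in, as the AHK loop does with CRC32.
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08X")

if __name__ == "__main__":
    # Hash the running interpreter, much like the A_AhkPath usage example.
    print(file_crc32(sys.executable))
```

If both tools report the same eight hex digits for the same file, the chunk size factor had no effect on the result, which is the expected behaviour.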


FileSHA1()
Computes and returns the SHA1 hash of the file passed as a parameter.
Please refer to FileMD5() for an explanation of parameter 2.

MsgBox, % FileSHA1( A_AhkPath ) ; Usage Example

FileSHA1( sFile="", cSz=4 ) { ; by SKAN www.autohotkey.com/community/viewtopic.php?t=64211
 cSz := (cSz<0||cSz>8) ? 2**22 : 2**(18+cSz), VarSetCapacity( Buffer,cSz,0 ) ; 09-Oct-2012
 hFil := DllCall( "CreateFile", Str,sFile,UInt,0x80000000, Int,3,Int,0,Int,3,Int,0,Int,0 )
 IfLess,hFil,1, Return,hFil
 hMod := DllCall( "LoadLibrary", Str,"advapi32.dll" )
 DllCall( "GetFileSizeEx", UInt,hFil, UInt,&Buffer ),    fSz := NumGet( Buffer,0,"Int64" )
 VarSetCapacity( SHA_CTX,136,0 ),  DllCall( "advapi32\A_SHAInit", UInt,&SHA_CTX )
 Loop % ( fSz//cSz + !!Mod( fSz,cSz ) )
   DllCall( "ReadFile", UInt,hFil, UInt,&Buffer, UInt,cSz, UIntP,bytesRead, UInt,0 )
 , DllCall( "advapi32\A_SHAUpdate", UInt,&SHA_CTX, UInt,&Buffer, UInt,bytesRead )
 DllCall( "advapi32\A_SHAFinal", UInt,&SHA_CTX, UInt,&SHA_CTX + 116 )
 DllCall( "CloseHandle", UInt,hFil )
 Loop % StrLen( Hex:="123456789ABCDEF0" ) + 4
  N := NumGet( SHA_CTX,115+A_Index,"Char"), SHA1 .= SubStr(Hex,N>>4,1) SubStr(Hex,N&15,1)
Return SHA1, DllCall( "FreeLibrary", UInt,hMod )
}
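
The final loop in FileSHA1() turns each digest byte into two hex characters by looking up the high and low nibble in a digit string. The AHK version writes that string as "123456789ABCDEF0" because SubStr is 1-based and, in AutoHotkey v1, position 0 refers to the last character. A minimal Python sketch of the same nibble lookup follows, using a 0-based digit string; to_hex is a hypothetical helper name.

```python
import hashlib

HEX_DIGITS = "0123456789ABCDEF"

def to_hex(digest):
    """Convert raw digest bytes to an uppercase hex string, one nibble at a time."""
    out = []
    for byte in digest:
        out.append(HEX_DIGITS[byte >> 4])   # high nibble
        out.append(HEX_DIGITS[byte & 15])   # low nibble
    return "".join(out)

# A SHA1 digest is 20 bytes, so the result is 40 hex characters.
print(to_hex(hashlib.sha1(b"abc").digest()))
```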


FileMD5()
Computes and returns the MD5 hash [ RFC1321 Specification ] of the file passed as a parameter, with speeds comparable to Hashes.DLL.
You may refer to MSDN for the Message Digest API ( Edit 09-Oct-2012: MSDN has removed the documentation for MD5 )

Parameters:

sFile : The full path of the file to hash
cSz : *Chunk size factor. Accepted values are 0 through 8. The default value is 4

* To hash large files, the function has to read the file in manageable chunks into a buffer variable.
The size of this buffer is derived from the cSz parameter as follows:

0 = 256 KB, 1 = 512 KB, 2 = 1.00 MB, 3 = 2.00 MB, 4 = 4.00 MB, 5 = 8.00 MB, 6 = 16.0 MB, 7 = 32.0 MB, 8 = 64.0 MB
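
The table above follows from the expression 2**(18+cSz) used in the functions, with out-of-range factors falling back to 2**22 (4 MB). A small Python sketch of that arithmetic, together with the loop count fSz//cSz + !!Mod( fSz,cSz ), is shown here; the names chunk_bytes and chunk_count are illustrative only.

```python
def chunk_bytes(csz=4):
    """Buffer size in bytes for a chunk size factor, mirroring the AHK expression."""
    if csz < 0 or csz > 8:
        return 2 ** 22          # out-of-range factors fall back to 4 MB
    return 2 ** (18 + csz)

def chunk_count(file_size, chunk):
    """Number of reads needed: full chunks plus one more for any remainder."""
    return file_size // chunk + (1 if file_size % chunk else 0)

for factor in range(9):
    print(factor, chunk_bytes(factor) // 1024, "KB")
```

An empty file gives a count of zero, so the read loop never runs and the hash is that of zero bytes.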


MsgBox, % FileMD5( A_AhkPath ) ; Usage Example

FileMD5( sFile="", cSz=4 ) {  ; by SKAN www.autohotkey.com/community/viewtopic.php?t=64211
 cSz := (cSz<0||cSz>8) ? 2**22 : 2**(18+cSz), VarSetCapacity( Buffer,cSz,0 ) ; 18-Jun-2009
 hFil := DllCall( "CreateFile", Str,sFile,UInt,0x80000000, Int,3,Int,0,Int,3,Int,0,Int,0 )
 IfLess,hFil,1, Return,hFil
 hMod := DllCall( "LoadLibrary", Str,"advapi32.dll" )
 DllCall( "GetFileSizeEx", UInt,hFil, UInt,&Buffer ),    fSz := NumGet( Buffer,0,"Int64" )
 VarSetCapacity( MD5_CTX,104,0 ),    DllCall( "advapi32\MD5Init", UInt,&MD5_CTX )
 Loop % ( fSz//cSz + !!Mod( fSz,cSz ) )
   DllCall( "ReadFile", UInt,hFil, UInt,&Buffer, UInt,cSz, UIntP,bytesRead, UInt,0 )
 , DllCall( "advapi32\MD5Update", UInt,&MD5_CTX, UInt,&Buffer, UInt,bytesRead )
 DllCall( "advapi32\MD5Final", UInt,&MD5_CTX )
 DllCall( "CloseHandle", UInt,hFil )
 Loop % StrLen( Hex:="123456789ABCDEF0" )
  N := NumGet( MD5_CTX,87+A_Index,"Char"), MD5 .= SubStr(Hex,N>>4,1) . SubStr(Hex,N&15,1)
Return MD5, DllCall( "FreeLibrary", UInt,hMod )
}
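
FileMD5() and FileSHA1() both follow an init / update-per-chunk / final pattern. To cross-check their output without advapi32.dll, the same pattern can be sketched with Python's hashlib. The function name file_hash and its parameters are assumptions made for this example, not part of the original code.

```python
import hashlib

def file_hash(path, algorithm="md5", chunk_size=4 * 1024 * 1024):
    """Hash a file chunk by chunk and return the digest as uppercase hex."""
    ctx = hashlib.new(algorithm)                 # corresponds to MD5Init / A_SHAInit
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            ctx.update(chunk)                    # corresponds to MD5Update / A_SHAUpdate
    return ctx.hexdigest().upper()               # corresponds to the Final call plus hex loop
```

Calling file_hash(path, "md5") or file_hash(path, "sha1") on the file passed to the AHK functions should give the same string if both sides read the same bytes.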


MD5()

Computes and returns the MD5 hash [ RFC1321 Specification ] of the contents of a variable in memory.

MD5( ByRef V, L=0 ) { ; www.autohotkey.com/forum/viewtopic.php?p=275910#275910
 VarSetCapacity( MD5_CTX,104,0 ), DllCall( "advapi32\MD5Init", Str,MD5_CTX )
 DllCall( "advapi32\MD5Update", Str,MD5_CTX, Str,V, UInt,L ? L : StrLen(V) )
 DllCall( "advapi32\MD5Final", Str,MD5_CTX )
 Loop % StrLen( Hex:="123456789ABCDEF0" )
  N := NumGet( MD5_CTX,87+A_Index,"Char"), MD5 .= SubStr(Hex,N>>4,1) . SubStr(Hex,N&15,1)
Return MD5
}

; Usage Example

V := "The quick brown fox jumps over the lazy dog"
L := StrLen(V)
MsgBox, % MD5( V,L )


infogulch
  • Moderators
  • 717 posts
  • Last active: Jul 31 2014 08:27 PM
  • Joined: 27 Mar 2008
Cool MD5 functions!

AutoHotkey doesn't have an MD5 function on Rosetta Code yet:
http://www.rosettaco...d_in_AutoHotkey

You (or somebody) should post on Rosetta Code's MD5 page: http://www.rosettacode.org/wiki/MD5

Thanks!

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
Nice! I guess the right size is VarSetCapacity(MD5_CTX,104), though. Anything larger works, too. The string version at the second dll call needs VarSetCapacity(V) at the end of the line, not VarSetCapacity(L).

Also, the right offset to read the digest from is 87+A_Index. 7+A_Index works, too, but it is not the intended place.

It is not a fair rosettacode solution, because the work is done in the Windows advapi32.dll, not in AHK. I am not even sure if machine code functions qualify, because they are processor dependent, not in the control of AHK.

SKAN

infogulch wrote: Cool MD5 functions!

Thanks! :)

Laszlo wrote: I guess the right size is VarSetCapacity(MD5_CTX,104)

I do not understand that struct at all! I vaguely picked that number from here

Laszlo wrote: It is not a fair rosettacode solution, because the work is done in the Windows advapi32.dll, not in AHK.

I agree, Sir. Maybe you can consider posting your native implementation of MD5()

:)

Laszlo
MSDN says:
typedef struct {
  ULONG         i[2];
  ULONG         buf[4];
  unsigned char in[64];
  unsigned char digest[16];
} MD5_CTX;
It is 2*4+4*4+64+16 = 104 bytes. The offset for the result (digest) is 88.
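
The size and offset arithmetic can be checked by declaring the same layout with Python's ctypes. This is only a sketch of the struct as quoted from MSDN above, assuming ULONG is a 32-bit unsigned integer as it is on Windows; the field is named in_ because in is a Python keyword.

```python
import ctypes

class MD5_CTX(ctypes.Structure):
    _fields_ = [
        ("i", ctypes.c_uint32 * 2),       # ULONG i[2]
        ("buf", ctypes.c_uint32 * 4),     # ULONG buf[4]
        ("in_", ctypes.c_ubyte * 64),     # unsigned char in[64]
        ("digest", ctypes.c_ubyte * 16),  # unsigned char digest[16]
    ]

print(ctypes.sizeof(MD5_CTX))     # 104
print(MD5_CTX.digest.offset)      # 88
```

The offset of 88 matches the 87+A_Index used in the hex loop, since A_Index starts at 1.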

infogulch
:shock: oops. I didn't know you had already done that, Laszlo. I agree with SKAN, you should post yours on rosetta code.

SKAN

Laszlo wrote: It is 2*4+4*4+64+16 = 104 bytes.

Thank you, Sir. :)

Laszlo wrote: The offset for the result (digest) is 88.

:shock:

I did not look that far with my Binspector. Seeing that the digest is also available at offset 8, I happily contented myself with 24 until I noticed I was not getting proper results for small strings.

Thanks again for the clarification.

Edit: I have updated the code with the right VarSetCapacity and corrected the message digest offset to read from.

JoeSchmoe
  • Members
  • 304 posts
  • Last active: Feb 28 2013 05:39 PM
  • Joined: 17 Feb 2008
Hi SKAN, many thanks for the MD5() function.

I'm curious. I notice that if you omit the second argument, it calculates the length on its own. Why would you ever want to include the second argument? And why does the function use VarSetCapacity to calculate the capacity rather than just using StrLen?

SKAN
Second question first:

JoeSchmoe wrote: why does the function use VarSetCapacity to calculate the capacity rather than just using StrLen?


StrLen() will not return the correct length for binary data which contain null(s).

FileRead, Bin, %A_AhkPath%
MsgBox, % "StrLen()          `t: " StrLen( Bin )
        . "`nVarSetCapacity()`t: " VarSetCapacity( Bin )
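
The effect SKAN describes can be illustrated outside AutoHotkey. The Python sketch below assumes, as stated above, that a string-length function stops at the first null, and shows that hashing only that many bytes gives a different digest from hashing the whole buffer. The helper c_strlen and the sample bytes are hypothetical.

```python
import hashlib

def c_strlen(data):
    """Length up to the first NUL byte, as a C-style string length would report."""
    end = data.find(b"\x00")
    return len(data) if end == -1 else end

# Hypothetical binary content: the bytes continue after the first NUL.
blob = b"MZ\x90\x00\x03\x00\x00\x00"

print(c_strlen(blob), len(blob))    # 3 8

# Hashing only the bytes before the first NUL does not hash the whole buffer.
truncated = hashlib.md5(blob[:c_strlen(blob)]).hexdigest()
complete = hashlib.md5(blob).hexdigest()
print(truncated == complete)        # False
```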

JoeSchmoe wrote: if you omit the second argument, it calculates the length on its own. Why would you ever want to include the second argument?


VarSetCapacity() is not reliable when a variable is assigned with the := operator.
In those cases, it is best to pass StrLen( Data ) as the second parameter.

Var := "Hello"
MsgBox, % "StrLen()          `t: " StrLen( Var )
        . "`nVarSetCapacity()`t: " VarSetCapacity( Var )

:)

JoeSchmoe
That's very interesting and a bit disturbing. Is VarSetCapacity wrong in those cases because it is telling the capacity of the variable rather than the length of the string that has been assigned to it? I may have already found a case in which not using StrLen leads to an incorrect hash.

I'm only going to be using it for relatively short strings that were generated within AHK, so I presume that these won't have any nulls. I'd like to remove the second parameter and just use StrLen instead. To allow for calling the function on a constant string (rather than a variable), I'd also like to remove the "ByRef". Do you know if the following would be safe and reliable? I don't know if it matters, but I will be using both Unicode and ASCII builds.
MD5(V) { ; www.autohotkey.com/forum/viewtopic.php?p=275910#275910 
 VarSetCapacity( MD5_CTX,104,0 ), DllCall( "advapi32\MD5Init", Str,MD5_CTX ) 
 DllCall( "advapi32\MD5Update", Str,MD5_CTX, Str,V, UInt, StrLen(V) ) 
 DllCall( "advapi32\MD5Final", Str,MD5_CTX ) 
 Loop % StrLen( Hex:="123456789ABCDEF0" ) 
  N := NumGet( MD5_CTX,87+A_Index,"Char"), MD5 .= SubStr(Hex,N>>4,1) . SubStr(Hex,N&15,1) 
Return MD5 
}

Thanks again for the prompt help...

SKAN

JoeSchmoe wrote: I'd also like to remove the "byref."

Then the function would work on a copy of the original string. Redundant!

JoeSchmoe wrote: I will be using both Unicode and Ascii builds

As is, the function would return incorrect results on AHK_L. I need some time to test it. I will get back to you at the earliest.

:)

Lexikos
  • Administrators
  • 9844 posts
  • AutoHotkey Foundation
  • Last active:
  • Joined: 17 Oct 2006

SKAN wrote: Then the function would work on the copy of original string. Redundant!

It may be redundant, but ByRef isn't necessarily more efficient. In one case you have a one-time string copy when the function is called, and in the other case every access to the variable has an extra level of indirection, i.e. V -> variable passed by caller. I'd generally recommend using ByRef only when required for the function to work. For instance, ByRef would be required to support hashing binary data. For usability, you could do something like this:
MD5( V, L=0 ) {
    return MD5_ref( V, L )
}
MD5_ref( ByRef V, L=0 ) { 
    ; ... actual function here ...
}
The extra function call is minimal overhead. However, for very large strings ByRef will perform better -- I guess in such cases the string would generally be in a variable already. Rather than having the extra function, JoeSchmoe can do this:
MD5(_:="literal string")

SKAN wrote: As is, the function would return incorrect results on AHK_L..

Looks like you only need to convert the string length to bytes; i.e. multiply by (A_IsUnicode ? 2:1). VarSetCapacity already returns a value in bytes.

SKAN

SKAN wrote: As is, the function would return incorrect results on AHK_L..

Lexikos wrote: Looks like you only need to convert the string length to bytes; i.e. multiply by (A_IsUnicode ? 2:1). VarSetCapacity already returns a value in bytes.


Maybe I am missing something obvious, but I am not able to produce correct results ( AHK_L ) without calling WideCharToMultiByte() prior to MD5Update()

SKAN
@JoeSchmoe

Okay! .. I got it:


MsgBox, % StrMD5( "The Quick Brown Fox Jumps Over The Lazy Dog" ) ; 58826469C2606F4791B9F75880DFBE2A

StrMD5( V ) { ; www.autohotkey.com/forum/viewtopic.php?p=376840#376840
 VarSetCapacity( MD5_CTX,104,0 ), DllCall( "advapi32\MD5Init", UInt,&MD5_CTX )
 DllCall( "advapi32\MD5Update", UInt,&MD5_CTX, A_IsUnicode ? "AStr" : "Str",V
 , UInt,StrLen(V) ), DllCall( "advapi32\MD5Final", UInt,&MD5_CTX )
 Loop % StrLen( Hex:="123456789ABCDEF0" )
  N := NumGet( MD5_CTX,87+A_Index,"Char"), MD5 .= SubStr(Hex,N>>4,1) . SubStr(Hex,N&15,1)
Return MD5
}

You may compare results with http://md5-hash-online.waraxe.us/
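
Why the "AStr" conversion matters can be shown with a short Python sketch. It assumes a Unicode build stores strings as UTF-16, two bytes per character, so hashing those bytes directly (even with the length doubled) gives a digest that differs from the one most tools report for the 8-bit text. The variable names are illustrative.

```python
import hashlib

text = "The quick brown fox jumps over the lazy dog"

ansi_bytes = text.encode("ascii")        # what "AStr" passes to MD5Update
utf16_bytes = text.encode("utf-16-le")   # assumed in-memory form on a Unicode build

md5_ansi = hashlib.md5(ansi_bytes).hexdigest()
md5_utf16 = hashlib.md5(utf16_bytes).hexdigest()

print(len(ansi_bytes), len(utf16_bytes))   # 43 86
print(md5_ansi)                            # 9e107d9d372bb6826bd81d3542a419d6
print(md5_ansi == md5_utf16)               # False
```

Both digests are valid MD5 values; they simply describe different byte sequences, which is why the results only match the usual published digests after conversion to 8-bit text.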

JoeSchmoe
Thanks, SKAN! This looks perfect. I can't wait to try it out in the code tomorrow.