Jump to content


Photo

CRC-32


  • Please log in to reply
17 replies to this topic

#1 Laszlo

Laszlo
  • Fellows
  • 4713 posts

Posted 14 August 2005 - 04:07 AM

CRC, the Cyclic Redundancy Check is a simple hash function: a fixed length value computed from arbitrary length data. It is used mainly for error checking: to see if the data (program code, compressed archive, Internet message) is intact; or narrowing down the search for identical files. A few random bit flips or dropped bytes almost always result in different CRC value, but it is easy to modify the data e.g. with appending a few bytes, so the result will provide any desired CRC value. Consequently, CRC cannot be used for security (message authentication, or detecting malicious modifications).

CRCs are based on division in the ring of polynomials over the integers modulo 2. A string of bits is interpreted as the coefficients of a polynomial of this sort, and to find the CRC, we divide it by another fixed polynomial. The coefficients of the remainder polynomial are the CRC. (See more details in http://www.repairfaq...K/F_crc_v3.html.) The IEEE 802.3 standard specifies the polynomial and the padding of the data (with its length represented in 4 bytes).

There are several CRC variants in use:

CRC-8 x^8 + x^2 + x^ + 1
CRC-CCITT x^16 + x^12 + x^5 + 1
CRC-16 (IBM) x^16 +x^15 + x^2 + 1
CRC-32 (802.3) x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x^ + 1
CRC32c (Castagnoli) x^32 + x^28 + x^27 + x^26 + x^25 + x^23 + x^22 + x^20 + x^19 + x^18 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + 1

Here is a simple script for CRC-32, the most often used CRC version. When using it the for a (binary) file, the data has to be converted to a string of hex digits, e.g. with this script, also used here. For large files the input has to be modified, so pieces of data are read, without closing and reopening the file. Also, the code can be optimized with using tables and handling several bits together, but this script could still serve as a verifier for the accelerated code.
SetFormat Integer, H

x = 00000000000000000000000000000000000000000000000000000000000000000000000000000000 ; 864d7f99
MsgBox % CRC32(x)
x = FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF ; c55e457a
MsgBox % CRC32(x)
x = 0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728 ; bf671ed0
MsgBox % CRC32(x)
Return

CRC32(x)
{
   L := StrLen(x)>>1          ; length in bytes
   StringTrimLeft L, L, 2     ; remove leading 0x
   L = 0000000%L%
   StringRight L, L, 8        ; 8 hex digits
   x = %x%%L%                 ; standard pad
   R =  0xFFFFFFFF            ; initial register value
   Loop Parse, x
   {
      y := "0x" A_LoopField   ; one hex digit at a time
      Loop 4
      {
         R := (R << 1) ^ ( (y << (A_Index+28)) & 0x100000000)
         IfGreater R,0xFFFFFFFF
            R := R ^ 0x104C11DB7
      }
   }
   Return ~R                  ; ones complement is the CRC
}


#2 Chris

Chris
  • Administrators
  • 10727 posts

Posted 14 August 2005 - 12:05 PM

It's great to have a CRC function posted. I think it could be used with PixelGetColor to identify a pattern of pixels on the screen, which might be easier than ImageSearch for some tasks. And I'm sure CRC has many other uses.

Thanks.

#3 Laszlo

Laszlo
  • Fellows
  • 4713 posts

Posted 14 August 2005 - 08:54 PM

The script above implements the standard. For internal use (like at pattern search), we can even drop the padding with length and the final bit inversion, making the code really short.
CRC(x)                        ; CRC of x = string of hex digits

{

   x = %x%12345678            ; non-standard pad

   R =  0xFFFFFFFF            ; initial register value

   Loop Parse, x

   {

      y := "0x" A_LoopField   ; one hex digit at a time

      Loop 4

      {

         R := (R << 1) ^ ( (y << (A_Index+28)) & 0x100000000)

         IfGreater R,0xFFFFFFFF

            R := R ^ 0x104C11DB7

      }

   }

   Return R                   ; final Reg = CRC

}
Also, don't forget to convert other types of data to strings of hex digits, what the CRC script needs. The following is simplified from the list functions library.
String2Hex(x)                 ; Convert a string to hex digits

{                             ; needs SetFormat Integer, H

   Loop Parse, x

   {

      y := ASC(A_LoopField)   ; 2 digit ASCII code of chars of x, 15 < y < 256

      StringTrimLeft y, y, 2  ; Remove leading 0x

      hex = %hex%%y%

   }

   Return hex

}


#4 Rajat

Rajat
  • Members
  • 1886 posts

Posted 14 August 2005 - 10:33 PM

wow Laszlo, you're really good with encryption stuff! nice work here too!

#5 Invalid User

Invalid User
  • Members
  • 447 posts

Posted 14 August 2005 - 11:39 PM

Rajat I think you mentioned Laszlo was an encryption expert, your wern't joking. Nice work here Laszlo.

#6 corrupt

corrupt
  • Members
  • 2558 posts

Posted 14 August 2005 - 11:49 PM

Thanks Laszlo :D .

#7 Laszlo

Laszlo
  • Fellows
  • 4713 posts

Posted 14 August 2005 - 11:59 PM

It was my pleasure! (And I really do crypto for living.)

#8 Rajat

Rajat
  • Members
  • 1886 posts

Posted 15 August 2005 - 12:19 AM

Rajat I think you mentioned Laszlo was an encryption expert, your wern't joking.

actually i came to know about that from his webpage... its url was in some script of his.

#9 SKAN

SKAN
  • Administrators
  • 9062 posts

Posted 25 May 2006 - 01:20 PM

Dear Laszlo, :)

Currently, for file verification I call an external dll function.
I like to have it done in pure AHK.

The CRC32() function expects a hex string.
Can you post a variant that accepts a filename and returns a CRC32 hash!

Something like :
Hash:=CRC32("Autohotkey.exe")

I require to check binary files sized less than 1 MB.

Please help me!

Regards, :)

#10 Laszlo

Laszlo
  • Fellows
  • 4713 posts

Posted 25 May 2006 - 03:19 PM

It is called CRC. I simplified the original CRC32:
MsgBox % CRC(A_ScriptFullPath)

MsgBox %ErrorLevel%                 ; Error from file read



CRC(file) {

   format = %A_FormatInteger%       ; save original integer format

   SetFormat Integer, Hex           ; for converting bytes to hex

   c := CRC32(HexRead(file))

   SetFormat Integer, %format%      ; restore original format

   Return c                         ; result in hex. Add "+0" for original integer format

}



HexRead(file, n=0, offset=0) {

   h := DllCall("CreateFile",Str,file,UInt,0x80000000,UInt,3,UInt,0,UInt,3,UInt,0,UInt,0)

   IfEqual h,-1, SetEnv, ErrorLevel, -1

   IfNotEqual ErrorLevel,0,Return,0 ; couldn't open the file



   m = 0                            ; seek to offset

   IfLess offset,0, SetEnv,m,2

   r := DllCall("SetFilePointerEx",UInt,h,Int64,offset,UIntP,p,Int,m)

   IfEqual r,0, SetEnv, ErrorLevel, -3

   IfNotEqual ErrorLevel,0, {

      t = %ErrorLevel%              ; save ErrorLevel to be returned

      DllCall("CloseHandle", UInt, h)

      ErrorLevel = %t%              ; return seek error

      Return 0

   }



   IfEqual n,0, SetEnv n,0xffffffff ; almost infinite



   Loop %n%

   {

      result := DllCall("ReadFile",UInt,h,UCharP,c,UInt,1,UIntP,Read,UInt,0)

      if (!result or Read < 1 or ErrorLevel)

         break

      c += 0x100                    ; convert to hex with a leading 1

      StringTrimLeft c, c, 3        ; remove 0x1

      data = %data%%c%              ; append 2 hex digits

   }



   IfNotEqual ErrorLevel,0, SetEnv,t,%ErrorLevel%



   h := DllCall("CloseHandle", UInt, h)

   IfEqual h,-1, SetEnv, ErrorLevel, -2

   IfNotEqual t,,SetEnv, ErrorLevel, %t%



   Return data

}



CRC32(x) {

   L := 0x100000000 + (StrLen(x)>>1) ; length in bytes with leading 0x1

   StringTrimLeft L, L, 3     ; remove leading 0x1, length in 8 digits

   x = %x%%L%                 ; standard pad

   R = 0xFFFFFFFF             ; initial register value

   Loop Parse, x

   {

      y = 0x%A_LoopField%     ; one hex digit at a time

      Loop 4

      {

         R := (R << 1) ^ ( (y << (A_Index+28)) & 0x100000000)

         IfGreater R,0xFFFFFFFF

            R := R ^ 0x104C11DB7

      }

   }

   Return ~R                  ; ones complement is the CRC

}


#11 SKAN

SKAN
  • Administrators
  • 9062 posts

Posted 25 May 2006 - 04:05 PM

Dear Laszlo, :)

Many Thanks! :D

It is called CRC. I simplified the original CRC32:


I tested it with a binary file sized 193847 bytes

The time taken varies between 100 - 156 seconds!
Is this normal for this algorithm?

My system: AMD Sempron 2 Ghz / 256 MB / WIN 2000 SP4

Thanks again for the help.

Regards, :)

#12 Laszlo

Laszlo
  • Fellows
  • 4713 posts

Posted 25 May 2006 - 05:39 PM

Try it with
Process Priority,,High

SetBatchLines -1
It could speed it up. However, AHK is always much slower for large scale data processing than compiled code. The script performs several AHK commands on each data byte, so don't expect really high performance.

#13 SKAN

SKAN
  • Administrators
  • 9062 posts

Posted 25 May 2006 - 05:53 PM

Dear Laszlo, :)

The script performs several AHK commands on each data byte, so don't expect really high performance.


:shock:

I was always wondering if we could have a FileCopy command written with pure AHK

- so that we have a progress bar while copying/moving large data ( especially Video files! ).

- so that simultaneously CRC32 will be computed and appended to the file name.

This is disheartening!

:D Anyways! I am very happy to have a Pure AHK solution for checking file integrity! :D

Thank you very much for all this help...

Regards, :)

#14 AhkRich

AhkRich
  • Guests

Posted 09 November 2007 - 05:08 PM

Hi,

Great work Laszlo! I'm really glad that someone is bringing such great expertise to Autohotkey users.

I think there is a problem with your CRC32 implementation though.
The result of your code does not match the result obtained from PHP or several online calculators. Now, this could just mean that the PHP function is wrong and those online calculators all use the PHP function.

For example, let's use the text "clipboard".
The hex-encoded form is 636c6970626f617264

Your crc32 code (the "standard" one) produced the following result: ebf4f949
However, the php function (and all the online sites) produced the following: 8fdbc496

So it seems that your implementation does match the one PHP uses. Is there somewhere I can find some standard references to see which implimentation is correct?

Here are some sites to check against your results:
http://www.fileforma...o/tool/hash.htm
http://crc32-checksum.waraxe.us/

Thanks,
Richard

#15 AhkRich

AhkRich
  • Members
  • 1 posts

Posted 09 November 2007 - 07:03 PM

Beware if you are trying to find the CRC32 of BINARY data.
The String2Hex function shown above will NOT give the correct result for binary data.

However the String2Hex function shown above only gives the correct result for bytes that have a value of 15-256 (as stated in the comments). Many binary files (including pure-text files) have characters that have values less than 15. Common examples are carriage return, line feed, tab etc. String2Hex will NOT give the correct result for any string or file that contains these characters.

Each byte in the binary (or text) data MUST be represented by two hex digits. For example a 4-character string will have an 8-digit hex representation.

For example take the string "AA[TAB]A" where [TAB] is the tab character (ASCII 0x09) and A is just the letter A (ASCII 0x41).
The correct hex representation of this string is 41410941. However String2hex gives 4141941 which actually represents a completely different string where the characters have the following ascii codes (from left to right): 0x04, 0x14, 0x19, and 0x41

The following code corrects the problem (although I'm sure there are more optimized ways of doing it).
I also hate functions that change global parameters (such as integer formatting), so I made sure this function restores A_FormatInteger to its original value (H or D).

String2Hex(x)                 ; Convert a string to hex digits
{
  prevFmt = %A_FormatInteger%
  SetFormat Integer, H ;this function requires hex format
   Loop Parse, x
   {
      y := Asc(A_LoopField)   ; 2 digit ASCII code
      StringTrimLeft, y,y,2  ;remove 0x
      y=0%y%  ;pad on left with 0 to make sure each char gives 2 hex digits.
      StringRight, y,y,2 ;take only the right-most two digits
      hex := hex y
   }
   SetFormat Integer, %prevFmt% ;restore original integer formatting
   return hex
}