AutoHotkey Community

PostPosted: January 9th, 2009, 11:40 pm 
I've been searching the forum for fast file checksum code and found this post by Laszlo, a wrapper for the CRC32 function:
Quote:
The function CRC32 has three parameters.
- The first one is the name of a buffer, which can contain binary data.
- The second parameter is the length of the data in bytes. If omitted or not positive, Strlen(Buffer) is used internally.
- The 3rd parameter is used for continuing the CRC computation for second or later data sections. If omitted, -1 is used, the standard initial value for CRC32. If an earlier CRC operation is to be continued (which returned C), put here ~C. If a different CRC is needed than the standard CRC-32 (e.g. to resolve collisions), you can use any 32 bit integer for initialization.
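The calling convention described in the quote can be sketched in Python. This is only an illustration, not Laszlo's code: it assumes the standard reflected CRC-32 polynomial 0xEDB88320, and the names make_table, TABLE and crc32 are made up for the example. The start argument is the raw register value, so -1 begins a fresh checksum and ~C continues an earlier result C.

```python
import zlib

def make_table():
    """Build the 256-entry lookup table for the reflected CRC-32 polynomial."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

TABLE = make_table()  # hypothetical name, built once like the static table in the wrapper

def crc32(buf, start=-1):
    """Return the CRC-32 of buf; start is the raw register value (-1 for a fresh run)."""
    c = start & 0xFFFFFFFF
    for b in buf:
        c = TABLE[(c ^ b) & 0xFF] ^ (c >> 8)
    return c ^ 0xFFFFFFFF  # final inversion, so callers pass ~C to continue

data = b"123456789"
whole = crc32(data)
first = crc32(data[:4])
resumed = crc32(data[4:], ~first)  # continue the earlier result by passing ~C
print(hex(whole), whole == resumed == zlib.crc32(data))  # 0xcbf43926 True
```

Because every call returns the inverted register, a CRC computed over several sections matches the CRC of the whole data as long as each later call receives the complement of the previous result.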


Can anybody show me, in a simple script, how to use this CRC32 function to generate the checksum of any file (chosen with FileSelectFile, for example), read as binary (and in chunks if it is big, so as not to load the whole file into memory)?

:?:

I'd like to use this for a duplicate file finder script.


PostPosted: January 12th, 2009, 5:53 am 
Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
Code:
bufSz := 1 << 26             ; 64MB buffer
VarSetCapacity(buff,bufSz)   ; allocate buffer
file := A_ScriptFullPath     ; put your filename here
FileGetSize Sz, %file%

c := 0, offs := -bufSz
h := OpenFile(file)          ; handle to file
Loop % Sz//bufSz {           ; for each buff-full of data
   BinRead(h, buff, bufSz, offs+=bufSz)
   c := CRC32(buff,bufSz,~c) ; compute accumulated CRC
}

If (m:=mod(Sz,bufSz)) {      ; the slack
   BinRead(h, buff, m, offs+=bufSz)
   c := CRC32(buff,m,~c)
}
CloseFile(h)
                             ; c = CRC here
SetFormat Integer, Hex
MsgBox  % c+0                ; show CRC32 in hex


OpenFile(file) {             ; only for read!
   Return DllCall("CreateFile",Str,file, UInt,0x80000000, UInt,3, UInt,0, UInt,3, UInt,0, UInt,0)
}

BinRead(hFile, ByRef data, n, offset=0)  { ; offset<0: counted from the end backwards
   DllCall("SetFilePointerEx",UInt,hFile, Int64,offset, UIntP,U, Int,2*(offset<0))
   DllCall("ReadFile",UInt,hFile, Str,data, UInt,n, UIntP,r, UInt,0)
   Return r                  ; the number of bytes read
}

CloseFile(hFile) {
   DllCall("CloseHandle", UInt,hFile)
}

CRC32(ByRef Buffer, Bytes=0, Start=-1) {
   Static CRC32, CRC32LookupTable
   If (CRC32 = "") {
      MCode(CRC32,"33c06a088bc85af6c101740ad1e981f12083b8edeb02d1e94a75ec8b542404890c82403d0001000072d8c3")
      VarSetCapacity(CRC32LookupTable, 1024)
      DllCall(&CRC32, "uint",&CRC32LookupTable, "cdecl")
      MCode(CRC32,"558bec33c039450c7627568b4d080fb60c08334d108b55108b751481e1ff000000c1ea0833148e403b450c89551072db5e8b4510f7d05dc3")
   }
   If Bytes <= 0
      Bytes := StrLen(Buffer)
   Return DllCall(&CRC32, "uint",&Buffer, "uint",Bytes, "int",Start, "uint",&CRC32LookupTable, "cdecl uint")
}

MCode(ByRef code, hex) { ; allocate memory and write Machine Code there
   VarSetCapacity(code,StrLen(hex)//2)
   Loop % StrLen(hex)//2
      NumPut("0x" . SubStr(hex,2*A_Index-1,2), code, A_Index-1, "Char")
}

Edit 20090112: Faster BinRead, larger buffer for speedup


Last edited by Laszlo on January 12th, 2009, 8:04 pm, edited 1 time in total.

PostPosted: January 12th, 2009, 6:13 pm 
Great! Much obliged... 8)

Is there any way to speed up the scanning of large files more? I found two (fast) command line utilities that beat this script considerably in speed:

rehash.exe (has other checksum options too)
crc32.exe (fastest of the crc32.exe's I found)

Time to scan a 700 MB iso file:
- script: 30 seconds (buffer-size: 32768, latest AutoHotkey beta)
- rehash: 19 seconds
- crc32: 18 seconds

Other command line programs I tried (but were slower than the ones mentioned above):

- crc32.exe
- crc32.exe
- crc.exe

Test script for the AHK code:

Code:
#NoEnv
SetBatchLines -1
;Critical On
;Process, Priority, , Realtime

StartTime := A_TickCount

aBuffer := 1024 * 32

VarSetCapacity(data,aBuffer)   ; allocate 32KB buffer (change to your taste)
file := "E:\Downloads\gos-3.1-gadgets-20081205.iso" ;A_ScriptFullPath    ; put your filename here
FileGetSize Sz, %file%

c := 0, offs := -aBuffer
Loop % Sz//aBuffer {           ; for each block of data
   BinRead(file, data, aBuffer, offs+=aBuffer)
   c := CRC32(data,aBuffer,~c) ; compute accumulated CRC
}

If (m:=mod(Sz,aBuffer)) {      ; the slack
   BinRead(file, data, m, offs+=aBuffer)
   c := CRC32(data,m,~c)
}
                            ; c = CRC here
SetFormat Integer, Hex
crc := c+0               ; store CRC32 formatted as hex
SetFormat Integer, D

ElapsedTime := Round((A_TickCount - StartTime)/1000)

MsgBox  % "It took " ElapsedTime " seconds to scan`n`n" file "`n`nBuffer size: " aBuffer " bytes`nCRC: " crc

Return

BinRead(file, ByRef data, n=0, offset=0)  {
   h := DllCall("CreateFile",Str,file, UInt,0x80000000, UInt,3, UInt,0, UInt,3, UInt,0, UInt,0)
   DllCall("SetFilePointerEx",UInt,h, Int64,offset, UIntP,U, Int,2*(offset<0))
   m := DllCall("GetFileSize",UInt,h, Int64P,r)
   If n not between 1 and %m%
      n := m
   VarSetCapacity(data, n)
   DllCall("ReadFile",UInt,h, Str,data, UInt,n, UIntP,r, UInt,0)
   DllCall("CloseHandle", UInt,h)
   Return r
}

CRC32(ByRef Buffer, Bytes=0, Start=-1) {
   Static CRC32, CRC32LookupTable
   If (CRC32 = "") {
      MCode(CRC32,"33c06a088bc85af6c101740ad1e981f12083b8edeb02d1e94a75ec8b542404890c82403d0001000072d8c3")
      VarSetCapacity(CRC32LookupTable, 1024)
      DllCall(&CRC32, "uint",&CRC32LookupTable, "cdecl")
      MCode(CRC32,"558bec33c039450c7627568b4d080fb60c08334d108b55108b751481e1ff000000c1ea0833148e403b450c89551072db5e8b4510f7d05dc3")
   }
   If Bytes <= 0
      Bytes := StrLen(Buffer)
   Return DllCall(&CRC32, "uint",&Buffer, "uint",Bytes, "int",Start, "uint",&CRC32LookupTable, "cdecl uint")
}

MCode(ByRef code, hex) { ; allocate memory and write Machine Code there
   VarSetCapacity(code,StrLen(hex)//2)
   Loop % StrLen(hex)//2
      NumPut("0x" . SubStr(hex,2*A_Index-1,2), code, A_Index-1, "Char")
}


Test script for the console programs:

Code:
#NoEnv
SetBatchLines -1
;Critical On
;Process, Priority, , Realtime

StartTime := A_TickCount

;aBuffer := 1024 * 4

fileToScan := "E:\Downloads\gos-3.1-gadgets-20081205.iso"
CMDdir := A_ScriptDir "\bin"

CMDin := A_ScriptDir "\bin\rehash.exe -none -crc32 -norcrsv -f """ fileToScan """"


crc := CMDret_RunReturn(CMDin, CMDdir)

ElapsedTime := Round((A_TickCount - StartTime)/1000)

MsgBox  % "It took " ElapsedTime " seconds to scan`n`n" fileToScan "`n`nCRC: " crc              ; show the tool's CRC32 output

Return

; ******************************************************************
; CMDret-AHK functions
; version 1.10 beta
;
; Updated: Dec 5, 2006
; by: corrupt
; Code modifications and/or contributions made by:
; Laszlo, shimanov, toralf, Wdb 
; ******************************************************************
; Usage:
; CMDin - command to execute
; WorkingDir - full path to working directory (Optional)
; ******************************************************************
; Known Issues:
; - If using dir be sure to specify a path (example: cmd /c dir c:\)
; or specify a working directory   
; - Running 16 bit console applications may not produce output. Use
; a 32 bit application to start the 16 bit process to receive output 
; ******************************************************************
; Additional requirements:
; - none
; ******************************************************************
; Code Start
; ******************************************************************

CMDret_RunReturn(CMDin, WorkingDir=0)
{
  Global cmdretPID
  tcWrk := WorkingDir=0 ? "Int" : "Str"
  idltm := A_TickCount + 20
  CMsize = 1
  VarSetCapacity(CMDout, 1, 32)
  VarSetCapacity(sui,68, 0)
  VarSetCapacity(pi, 16, 0)
  VarSetCapacity(pa, 12, 0)
  Loop, 4 {
    DllCall("RtlFillMemory", UInt,&pa+A_Index-1, UInt,1, UChar,12 >> 8*A_Index-8)
    DllCall("RtlFillMemory", UInt,&pa+8+A_Index-1, UInt,1, UChar,1 >> 8*A_Index-8)
  }
  IF (DllCall("CreatePipe", "UInt*",hRead, "UInt*",hWrite, "UInt",&pa, "Int",0) <> 0) {
    Loop, 4
      DllCall("RtlFillMemory", UInt,&sui+A_Index-1, UInt,1, UChar,68 >> 8*A_Index-8)
    DllCall("GetStartupInfo", "UInt", &sui)
    Loop, 4 {
      DllCall("RtlFillMemory", UInt,&sui+44+A_Index-1, UInt,1, UChar,257 >> 8*A_Index-8)
      DllCall("RtlFillMemory", UInt,&sui+60+A_Index-1, UInt,1, UChar,hWrite >> 8*A_Index-8)
      DllCall("RtlFillMemory", UInt,&sui+64+A_Index-1, UInt,1, UChar,hWrite >> 8*A_Index-8)
      DllCall("RtlFillMemory", UInt,&sui+48+A_Index-1, UInt,1, UChar,0 >> 8*A_Index-8)
    }
    IF (DllCall("CreateProcess", Int,0, Str,CMDin, Int,0, Int,0, Int,1, "UInt",0, Int,0, tcWrk, WorkingDir, UInt,&sui, UInt,&pi) <> 0) {
      Loop, 4
        cmdretPID += *(&pi+8+A_Index-1) << 8*A_Index-8
      Loop {
        idltm2 := A_TickCount - idltm
        If (idltm2 < 10) {
          DllCall("Sleep", Int, 10)
          Continue
        }
        IF (DllCall("PeekNamedPipe", "uint", hRead, "uint", 0, "uint", 0, "uint", 0, "uint*", bSize, "uint", 0 ) <> 0 ) {
          Process, Exist, %cmdretPID%
          IF (ErrorLevel OR bSize > 0) {
            IF (bSize > 0) {
              VarSetCapacity(lpBuffer, bSize+1)
              IF (DllCall("ReadFile", "UInt",hRead, "Str", lpBuffer, "Int",bSize, "UInt*",bRead, "Int",0) > 0) {
                IF (bRead > 0) {
                  TRead += bRead
                  VarSetCapacity(CMcpy, (bRead+CMsize+1), 0)
                  CMcpy = a
                  DllCall("RtlMoveMemory", "UInt", &CMcpy, "UInt", &CMDout, "Int", CMsize)
                  DllCall("RtlMoveMemory", "UInt", &CMcpy+CMsize, "UInt", &lpBuffer, "Int", bRead)
                  CMsize += bRead
                  VarSetCapacity(CMDout, (CMsize + 1), 0)
                  CMDout=a   
                  DllCall("RtlMoveMemory", "UInt", &CMDout, "UInt", &CMcpy, "Int", CMsize)
                  VarSetCapacity(CMDout, -1)   ; fix required by change in autohotkey v1.0.44.14
                }
              }
            }
          }
          ELSE
            break
        }
        ELSE
          break
        idltm := A_TickCount
      }
      cmdretPID=
      DllCall("CloseHandle", UInt, hWrite)
      DllCall("CloseHandle", UInt, hRead)
    }
  }
  IF (StrLen(CMDout) < TRead) {
    VarSetCapacity(CMcpy, TRead, 32)
    TRead2 = %TRead%
    Loop {
      DllCall("RtlZeroMemory", "UInt", &CMcpy, Int, TRead)
      NULLptr := StrLen(CMDout)
      cpsize := Tread - NULLptr
      DllCall("RtlMoveMemory", "UInt", &CMcpy, "UInt", (&CMDout + NULLptr + 2), "Int", (cpsize - 1))
      DllCall("RtlZeroMemory", "UInt", (&CMDout + NULLptr), Int, cpsize)
      DllCall("RtlMoveMemory", "UInt", (&CMDout + NULLptr), "UInt", &CMcpy, "Int", cpsize)
      TRead2 --
      IF (StrLen(CMDout) > TRead2)
        break
    }
  }
  StringTrimLeft, CMDout, CMDout, 1
  Return, CMDout
}


PostPosted: January 12th, 2009, 7:36 pm 
Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
n-l-i-d wrote:
speed up the scanning of large files
An AHK script that is only 1.66 times slower than the speed champion is pretty good. The machine code part is written in C and optimized for size, not speed, so you could gain a few percent by replacing it with faster, hand-written assembly code. The AHK overhead of the DLL calls can be reduced if you precompute the file I/O function addresses and use larger buffers, such as 4MB (use a power of two). You can also remove a few superfluous lines from the BinRead function and open and close the file only once, but that does not matter much with large buffers. I'll update the script in the previous post in a few minutes...
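The two structural suggestions (open the file once, read through one large power-of-two buffer) can be sketched in Python. This is an assumption-laden illustration rather than the updated AHK script: file_crc32 and its buf_size parameter are hypothetical names, and zlib.crc32 stands in for the machine-code CRC routine.

```python
import os
import tempfile
import zlib

def file_crc32(path, buf_size=4 * 1024 * 1024):
    """CRC-32 of a file, read in buf_size chunks through a single open handle."""
    crc = 0
    buf = bytearray(buf_size)              # one reusable buffer, allocated once
    view = memoryview(buf)
    with open(path, "rb", buffering=0) as f:   # the file is opened only once
        while True:
            n = f.readinto(buf)            # number of bytes actually read
            if not n:                      # 0 means end of file
                break
            crc = zlib.crc32(view[:n], crc)    # continue the running CRC
    return crc & 0xFFFFFFFF

# Small demonstration on a temporary file, using a deliberately tiny buffer
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(100_000))
    demo_path = tmp.name
try:
    with open(demo_path, "rb") as f:
        expected = zlib.crc32(f.read()) & 0xFFFFFFFF
    print(file_crc32(demo_path, buf_size=4096) == expected)  # True
finally:
    os.remove(demo_path)
```

The last chunk is usually shorter than the buffer, which is why only the first n bytes are checksummed; this plays the role of the "slack" branch in the AHK loop.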


PostPosted: January 12th, 2009, 8:09 pm 
Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
The CRC32 loop script is updated in the third previous post. Do you see a speedup?


PostPosted: January 12th, 2009, 9:14 pm 
Not really. I tried experimenting with different buffer sizes, and it seems that a buffer of 32 MB is fastest (on this particular file that is). If I change the buffer size in your latest code from 64 to 32 MB, the scanning is much faster (around 18 seconds, as fast as the fastest command line utility).


PostPosted: January 12th, 2009, 9:31 pm 
Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
The running time could be dominated by the actual disk I/O. Buffers that are too large might hurt, because Windows has to shuffle things around to make room. In theory, double buffering could speed things up by loading data into one buffer while processing the other, but in this case the processing is much faster than reading from disk (unless you have RAID or 15K rpm drives), so you cannot gain much.
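For readers who want to see what double buffering looks like, here is a minimal Python sketch under stated assumptions: crc32_double_buffered is a hypothetical name, a background thread plays the role of the second buffer, and zlib.crc32 replaces the machine-code routine. It does not handle read errors, and whether it helps at all depends on the disk, as noted above.

```python
import io
import queue
import threading
import zlib

def crc32_double_buffered(stream, buf_size=1 << 20):
    """CRC-32 of a binary stream; a reader thread fetches the next chunk
    while the calling thread checksums the current one."""
    chunks = queue.Queue(maxsize=1)     # at most one finished chunk waits here

    def reader():
        while True:
            chunk = stream.read(buf_size)
            chunks.put(chunk)           # blocks until the consumer takes the previous chunk
            if not chunk:               # an empty bytes object marks end of data
                break

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    crc = 0
    while True:
        chunk = chunks.get()
        if not chunk:
            break
        crc = zlib.crc32(chunk, crc)    # runs while the reader fetches the next chunk
    t.join()
    return crc & 0xFFFFFFFF

payload = bytes(range(256)) * 4096
print(crc32_double_buffered(io.BytesIO(payload), buf_size=65536) == zlib.crc32(payload))  # True
```

If the checksum step is much faster than the read, the consumer simply waits on the queue most of the time, which is the situation described in the post.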


PostPosted: January 13th, 2009, 2:36 am 
While searching for even faster scanning methods (CRC32 assembly code being the optimum, I guess), I stumbled across some code that might interest you: CodeProject - CRC32: Generating a checksum for a file (8 functions, including assembly), and a related set of AutoIt scripts that run inline assembly, with a crc32 example (and if I'm not misinterpreting the code, the author also loads an inline DLL into memory and uses it from there!)

This is all way over my head, but looks like very interesting stuff to convert to AHK.

8)


PostPosted: January 25th, 2009, 8:40 pm 
Joined: May 10th, 2007, 10:54 am
Posts: 649
Location: .switzerland
I've made a little machine code extractor... Hope it is useful for someone: http://www.autohotkey.com/forum/viewtopic.php?p=245780 :)

PostPosted: January 28th, 2009, 5:09 pm 
Joined: June 14th, 2008, 5:09 pm
Posts: 9
How useful is all this? Like someone said, you can only have assembly code snippets here. Far calls are out of the question. Why don't you just put all the snippets you want into a DLL and call them? And if you use a DLL, you aren't limited to self-relative code.

It's still pretty cool :)


PostPosted: January 28th, 2009, 5:32 pm 
Joined: November 24th, 2007, 9:07 pm
Posts: 774
However, if you use a DLL, you're not demonstrating something as cool as running machine code natively in AHK :)

PostPosted: January 28th, 2009, 5:46 pm 
Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
mitchi wrote:
How useful is all this?
If you need a few small functions that are hard to program or too slow with the AHK commands, you can embed them directly in your script. If you make a DLL, you have to recompile it each time you find a useful function. If others use your software, you have to make sure they have the right version of the DLL. If you update a function in the DLL, older scripts could break, so you face a complex compatibility management problem. It is also easier to include a few lines of machine code somebody else developed than to copy the source code into your ever-growing DLL source and recompile. The source of the machine code need not be C, and mixing source code in different languages is hard.

On the other hand, if your function is large, calls library functions, etc. it is better kept in a separate dll. You may end up needing several of them for a larger project, though.


PostPosted: January 31st, 2009, 10:48 pm 
Hi, is there any way to have bitmap data in memory and convert it to hex directly within memory?

i.e., hBitmap or pBitmap --->>> straight assignment of hex code to a variable, or fileappend write to file with hBitmap or pBitmap converted to hex code? :)

I commonly use the GDI+ and Gdip functions posted elsewhere in this forum, and I'm curious whether this bit wizardry is robust enough to do this.


PostPosted: January 31st, 2009, 10:56 pm 
Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
Glasso wrote:
Hi, is there any way to have bitmap data in memory and convert it to hex directly within memory?
Check out the Bin2Hex and Hex2Bin functions in this thread.
Glasso wrote:
this bit wizardry is robust enough to do this?
Yes


PostPosted: January 31st, 2009, 11:42 pm 
Well, what I try is:

Code:
hexData := Bin2Hex (&hBitmap, not_sure_how_to_get_hbitmap_size_at_this_level)


What can I do better?
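A hex encoder needs two things: a pointer to the pixel bytes and their length. &hBitmap is only the address of the variable holding the handle, not the pixel data. The Python sketch below is a generic illustration, not the thread's Bin2Hex: dib_size, bin2hex and hex2bin are hypothetical names, and the size formula assumes an uncompressed DIB whose rows are padded to a multiple of 4 bytes. On Windows the pixel bytes themselves would have to be fetched first, for example with GetDIBits.

```python
def dib_size(width, height, bits_per_pixel):
    """Byte size of an uncompressed DIB: each row is padded to a multiple of 4 bytes."""
    stride = ((width * bits_per_pixel + 31) // 32) * 4
    return stride * abs(height)         # height may be negative for top-down DIBs

def bin2hex(data):
    """Encode a bytes-like object as an uppercase hex string."""
    return bytes(data).hex().upper()

def hex2bin(text):
    """Decode a hex string back into bytes."""
    return bytes.fromhex(text)

# Stand-in pixel buffer for a 3x2 image at 24 bits per pixel (not real bitmap data)
pixels = bytes(range(dib_size(3, 2, 24)))
encoded = bin2hex(pixels)
print(dib_size(3, 2, 24), encoded[:8], hex2bin(encoded) == pixels)  # 24 00010203 True
```

The hex string is always twice as long as the binary data, so a large bitmap produces a correspondingly large variable or file.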

