Machine code functions: Bit Wizardry
n-l-i-d
Guest





Posted: Fri Jan 09, 2009 11:40 pm

I've been searching the forum for fast file checksum code and found this post by Laszlo, a wrapper for the CRC32 function:
Quote:
The function CRC32 has three parameters.
- The first one is the name of a buffer, which can contain binary data.
- The second parameter is the length of the data in bytes. If omitted or not positive, Strlen(Buffer) is used internally.
- The 3rd parameter is used for continuing the CRC computation for second or later data sections. If omitted, -1 is used, the standard initial value for CRC32. If an earlier CRC operation is to be continued (which returned C), put here ~C. If a different CRC is needed than the standard CRC-32 (e.g. to resolve collisions), you can use any 32 bit integer for initialization.
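
To make the third parameter concrete, here is a minimal sketch of the chaining rule. It does not call the machine-code CRC32 from the quoted post: CRC32_Slow is a hypothetical, bit-by-bit stand-in written in plain AHK that follows the same convention (Start defaults to -1, pass ~C to continue). Unlike the quoted function, it takes an address instead of a ByRef buffer, so the second piece can start in the middle of the buffer. The test data is the ASCII string "123456789", whose standard CRC-32 is 0xCBF43926.

```autohotkey
VarSetCapacity(buf, 9)
Loop 9
   NumPut(0x30 + A_Index, buf, A_Index-1, "UChar")  ; bytes "1".."9"

whole := CRC32_Slow(&buf, 9)             ; one pass over all 9 bytes
c := CRC32_Slow(&buf, 4)                 ; first 4 bytes, standard start value -1
c := CRC32_Slow(&buf+4, 5, ~c)           ; remaining 5 bytes, continued with ~c

MsgBox % "Whole = chained: " (whole = c) "`nMatches 0xCBF43926: " (whole = 0xCBF43926)
Return

CRC32_Slow(addr, Bytes, Start=-1) {      ; hypothetical helper, not from the thread
   c := Start & 0xFFFFFFFF               ; keep the running value in 32 bits
   Loop %Bytes% {
      c ^= NumGet(addr+A_Index-1, 0, "UChar")
      Loop 8                             ; one shift per bit, reflected polynomial
         c := (c & 1) ? ((c >> 1) ^ 0xEDB88320) : (c >> 1)
   }
   Return ~c & 0xFFFFFFFF                ; final inversion, as in standard CRC-32
}
```

Both comparisons should show 1. Passing ~c works because the function inverts its result before returning, so inverting it again restores the internal running value.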


Can anybody show me, in a simple script, how to use this CRC32 function to generate the checksum of any file (chosen with FileSelectFile, for example), read as binary and in chunks if it is big, so that the whole file is not loaded into memory?

I'd like to use this for a duplicate file finder script.
Laszlo



Joined: 14 Feb 2005
Posts: 4515
Location: Boulder, CO

Posted: Mon Jan 12, 2009 5:53 am

Code:
bufSz := 1 << 26             ; 64MB buffer
VarSetCapacity(buff,bufSz)   ; allocate buffer
file := A_ScriptFullPath     ; put your filename here
FileGetSize Sz, %file%

c := 0, offs := -bufSz
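h := OpenFile(file)          ; handle to file
If (h = -1) {                ; INVALID_HANDLE_VALUE: the file could not be opened
   MsgBox Cannot open %file%
   ExitApp
}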
Loop % Sz//bufSz {           ; for each buff-full of data
   BinRead(h, buff, bufSz, offs+=bufSz)
   c := CRC32(buff,bufSz,~c) ; compute accumulated CRC
}

If (m:=mod(Sz,bufSz)) {      ; the slack
   BinRead(h, buff, m, offs+=bufSz)
   c := CRC32(buff,m,~c)
}
CloseFile(h)
                             ; c = CRC here
SetFormat Integer, Hex
MsgBox  % c+0                ; show CRC32 in hex


OpenFile(file) {             ; read only: GENERIC_READ, shared read/write, OPEN_EXISTING
   Return DllCall("CreateFile",Str,file, UInt,0x80000000, UInt,3, UInt,0, UInt,3, UInt,0, UInt,0)
}

BinRead(hFile, ByRef data, n, offset=0)  { ; offset<0: counted from the end backwards
   DllCall("SetFilePointerEx",UInt,hFile, Int64,offset, Int64P,U, Int,2*(offset<0)) ; the new position is a 64-bit value
   DllCall("ReadFile",UInt,hFile, Str,data, UInt,n, UIntP,r, UInt,0)
   Return r                  ; the number of bytes read
}

CloseFile(hFile) {
   DllCall("CloseHandle", UInt,hFile)
}

CRC32(ByRef Buffer, Bytes=0, Start=-1) {
   Static CRC32, CRC32LookupTable
   If (CRC32 = "") {
      MCode(CRC32,"33c06a088bc85af6c101740ad1e981f12083b8edeb02d1e94a75ec8b542404890c82403d0001000072d8c3")
      VarSetCapacity(CRC32LookupTable, 1024)
      DllCall(&CRC32, "uint",&CRC32LookupTable, "cdecl")
      MCode(CRC32,"558bec33c039450c7627568b4d080fb60c08334d108b55108b751481e1ff000000c1ea0833148e403b450c89551072db5e8b4510f7d05dc3")
   }
   If Bytes <= 0
      Bytes := StrLen(Buffer)
   Return DllCall(&CRC32, "uint",&Buffer, "uint",Bytes, "int",Start, "uint",&CRC32LookupTable, "cdecl uint")
}

MCode(ByRef code, hex) { ; allocate memory and write Machine Code there
   VarSetCapacity(code,StrLen(hex)//2)
   Loop % StrLen(hex)//2
      NumPut("0x" . SubStr(hex,2*A_Index-1,2), code, A_Index-1, "Char")
}

Edit 20090112: Faster BinRead, larger buffer for speedup
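
Since the question mentioned FileSelectFile, here is a minimal sketch of how the hard-coded filename could be replaced by a file dialog. The prompt text is only an illustration; the resulting file variable would take the place of the "put your filename here" line in the script above.

```autohotkey
FileSelectFile file, 3, , Select a file to checksum   ; 3 = file and path must exist
If (ErrorLevel or file = "")                          ; the dialog was cancelled
   ExitApp
MsgBox Selected: %file%
```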


Last edited by Laszlo on Mon Jan 12, 2009 8:04 pm; edited 1 time in total
n-l-i-d
Guest





Posted: Mon Jan 12, 2009 6:13 pm

Great! Much obliged...

Is there any way to speed up the scanning of large files further? I found two fast command-line utilities that beat this script considerably:

rehash.exe (has other checksum options too)
crc32.exe (fastest of the crc32.exe's I found)

Time to scan a 700 MB iso file:
- script: 30 seconds (buffer size: 32768 bytes, latest AutoHotkey beta)
- rehash: 19 seconds
- crc32: 18 seconds

Other command-line programs I tried (all slower than the ones mentioned above):

- crc32.exe
- crc32.exe
- crc.exe

Test script for the AHK code:

Code:
#NoEnv
SetBatchLines -1
;Critical On
;Process, Priority, , Realtime

StartTime := A_TickCount

aBuffer := 1024 * 32           ; buffer size in bytes (32 KB)

VarSetCapacity(data,aBuffer)   ; allocate the buffer (change the size to your taste)
file := "E:\Downloads\gos-3.1-gadgets-20081205.iso" ;A_ScriptFullPath    ; put your filename here
FileGetSize Sz, %file%

c := 0, offs := -aBuffer
Loop % Sz//aBuffer {           ; for each block of data
   BinRead(file, data, aBuffer, offs+=aBuffer)
   c := CRC32(data,aBuffer,~c) ; compute accumulated CRC
}

If (m:=mod(Sz,aBuffer)) {      ; the slack
   BinRead(file, data, m, offs+=aBuffer)
   c := CRC32(data,m,~c)
}
                            ; c = CRC here
SetFormat Integer, Hex
crc := c+0               ; store the CRC32 as a hex string
SetFormat Integer, D

ElapsedTime := Round((A_TickCount - StartTime)/1000)

MsgBox  % "It took " ElapsedTime " seconds to scan`n`n" file "`n`nBuffer size: " (aBuffer//1024) " kB`nCRC: " crc

Return

BinRead(file, ByRef data, n=0, offset=0)  {
   h := DllCall("CreateFile",Str,file, UInt,0x80000000, UInt,3, UInt,0, UInt,3, UInt,0, UInt,0)
   DllCall("SetFilePointerEx",UInt,h, Int64,offset, UIntP,U, Int,2*(offset<0))
   m := DllCall("GetFileSize",UInt,h, Int64P,r)
   If n not between 1 and %m%
      n := m
   VarSetCapacity(data, n)
   DllCall("ReadFile",UInt,h, Str,data, UInt,n, UIntP,r, UInt,0)
   DllCall("CloseHandle", UInt,h)
   Return r
}

CRC32(ByRef Buffer, Bytes=0, Start=-1) {
   Static CRC32, CRC32LookupTable
   If (CRC32 = "") {
      MCode(CRC32,"33c06a088bc85af6c101740ad1e981f12083b8edeb02d1e94a75ec8b542404890c82403d0001000072d8c3")
      VarSetCapacity(CRC32LookupTable, 1024)
      DllCall(&CRC32, "uint",&CRC32LookupTable, "cdecl")
      MCode(CRC32,"558bec33c039450c7627568b4d080fb60c08334d108b55108b751481e1ff000000c1ea0833148e403b450c89551072db5e8b4510f7d05dc3")
   }
   If Bytes <= 0
      Bytes := StrLen(Buffer)
   Return DllCall(&CRC32, "uint",&Buffer, "uint",Bytes, "int",Start, "uint",&CRC32LookupTable, "cdecl uint")
}

MCode(ByRef code, hex) { ; allocate memory and write Machine Code there
   VarSetCapacity(code,StrLen(hex)//2)
   Loop % StrLen(hex)//2
      NumPut("0x" . SubStr(hex,2*A_Index-1,2), code, A_Index-1, "Char")
}


Test script for the console programs:

Code:
#NoEnv
SetBatchLines -1
;Critical On
;Process, Priority, , Realtime

StartTime := A_TickCount

;aBuffer := 1024 * 4

fileToScan := "E:\Downloads\gos-3.1-gadgets-20081205.iso"
CMDdir := A_ScriptDir "\bin"

CMDin := A_ScriptDir "\bin\rehash.exe -none -crc32 -norcrsv -f """ fileToScan """"


crc := CMDret_RunReturn(CMDin, CMDdir)

ElapsedTime := Round((A_TickCount - StartTime)/1000)

MsgBox  % "It took " ElapsedTime " seconds to scan`n`n" fileToScan "`n`nCRC: " crc              ; show the CRC32 reported by the tool

Return

; ******************************************************************
; CMDret-AHK functions
; version 1.10 beta
;
; Updated: Dec 5, 2006
; by: corrupt
; Code modifications and/or contributions made by:
; Laszlo, shimanov, toralf, Wdb 
; ******************************************************************
; Usage:
; CMDin - command to execute
; WorkingDir - full path to working directory (Optional)
; ******************************************************************
; Known Issues:
; - If using dir be sure to specify a path (example: cmd /c dir c:\)
; or specify a working directory   
; - Running 16 bit console applications may not produce output. Use
; a 32 bit application to start the 16 bit process to receive output 
; ******************************************************************
; Additional requirements:
; - none
; ******************************************************************
; Code Start
; ******************************************************************

CMDret_RunReturn(CMDin, WorkingDir=0)
{
  Global cmdretPID
  tcWrk := WorkingDir=0 ? "Int" : "Str"
  idltm := A_TickCount + 20
  CMsize = 1
  VarSetCapacity(CMDout, 1, 32)
  VarSetCapacity(sui,68, 0)
  VarSetCapacity(pi, 16, 0)
  VarSetCapacity(pa, 12, 0)
  Loop, 4 {
    DllCall("RtlFillMemory", UInt,&pa+A_Index-1, UInt,1, UChar,12 >> 8*A_Index-8)
    DllCall("RtlFillMemory", UInt,&pa+8+A_Index-1, UInt,1, UChar,1 >> 8*A_Index-8)
  }
  IF (DllCall("CreatePipe", "UInt*",hRead, "UInt*",hWrite, "UInt",&pa, "Int",0) <> 0) {
    Loop, 4
      DllCall("RtlFillMemory", UInt,&sui+A_Index-1, UInt,1, UChar,68 >> 8*A_Index-8)
    DllCall("GetStartupInfo", "UInt", &sui)
    Loop, 4 {
      DllCall("RtlFillMemory", UInt,&sui+44+A_Index-1, UInt,1, UChar,257 >> 8*A_Index-8)
      DllCall("RtlFillMemory", UInt,&sui+60+A_Index-1, UInt,1, UChar,hWrite >> 8*A_Index-8)
      DllCall("RtlFillMemory", UInt,&sui+64+A_Index-1, UInt,1, UChar,hWrite >> 8*A_Index-8)
      DllCall("RtlFillMemory", UInt,&sui+48+A_Index-1, UInt,1, UChar,0 >> 8*A_Index-8)
    }
    IF (DllCall("CreateProcess", Int,0, Str,CMDin, Int,0, Int,0, Int,1, "UInt",0, Int,0, tcWrk, WorkingDir, UInt,&sui, UInt,&pi) <> 0) {
      Loop, 4
        cmdretPID += *(&pi+8+A_Index-1) << 8*A_Index-8
      Loop {
        idltm2 := A_TickCount - idltm
        If (idltm2 < 10) {
          DllCall("Sleep", Int, 10)
          Continue
        }
        IF (DllCall("PeekNamedPipe", "uint", hRead, "uint", 0, "uint", 0, "uint", 0, "uint*", bSize, "uint", 0 ) <> 0 ) {
          Process, Exist, %cmdretPID%
          IF (ErrorLevel OR bSize > 0) {
            IF (bSize > 0) {
              VarSetCapacity(lpBuffer, bSize+1)
              IF (DllCall("ReadFile", "UInt",hRead, "Str", lpBuffer, "Int",bSize, "UInt*",bRead, "Int",0) > 0) {
                IF (bRead > 0) {
                  TRead += bRead
                  VarSetCapacity(CMcpy, (bRead+CMsize+1), 0)
                  CMcpy = a
                  DllCall("RtlMoveMemory", "UInt", &CMcpy, "UInt", &CMDout, "Int", CMsize)
                  DllCall("RtlMoveMemory", "UInt", &CMcpy+CMsize, "UInt", &lpBuffer, "Int", bRead)
                  CMsize += bRead
                  VarSetCapacity(CMDout, (CMsize + 1), 0)
                  CMDout=a   
                  DllCall("RtlMoveMemory", "UInt", &CMDout, "UInt", &CMcpy, "Int", CMsize)
                  VarSetCapacity(CMDout, -1)   ; fix required by change in autohotkey v1.0.44.14
                }
              }
            }
          }
          ELSE
            break
        }
        ELSE
          break
        idltm := A_TickCount
      }
      cmdretPID=
      DllCall("CloseHandle", UInt, hWrite)
      DllCall("CloseHandle", UInt, hRead)
    }
  }
  IF (StrLen(CMDout) < TRead) {
    VarSetCapacity(CMcpy, TRead, 32)
    TRead2 = %TRead%
    Loop {
      DllCall("RtlZeroMemory", "UInt", &CMcpy, Int, TRead)
      NULLptr := StrLen(CMDout)
      cpsize := Tread - NULLptr
      DllCall("RtlMoveMemory", "UInt", &CMcpy, "UInt", (&CMDout + NULLptr + 2), "Int", (cpsize - 1))
      DllCall("RtlZeroMemory", "UInt", (&CMDout + NULLptr), Int, cpsize)
      DllCall("RtlMoveMemory", "UInt", (&CMDout + NULLptr), "UInt", &CMcpy, "Int", cpsize)
      TRead2 --
      IF (StrLen(CMDout) > TRead2)
        break
    }
  }
  StringTrimLeft, CMDout, CMDout, 1
  Return, CMDout
}
Laszlo



Joined: 14 Feb 2005
Posts: 4515
Location: Boulder, CO

Posted: Mon Jan 12, 2009 7:36 pm

n-l-i-d wrote:
speed up the scanning of large files
An AHK script that is only about 1.66 times slower than the speed champion is pretty good. The machine code part is written in C and optimized for size, not speed, so you could gain a few percent by replacing it with faster, pure assembly code. The AHK overhead of the DLL calls can be reduced if you precompute the addresses of the file I/O functions and use larger buffers, such as 4 MB (use a power of two). You can also remove a few superfluous lines from the BinRead function and open and close the file only once, although this does not matter much with large buffers. I'll update the script in the previous post in a few minutes...
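
A minimal sketch of precomputing a file I/O function address, assuming 32-bit ANSI AutoHotkey as used elsewhere in this thread (a Unicode build would need a different string type for GetProcAddress). The address of ReadFile is looked up once and then passed to DllCall in place of the function name, so AHK does not have to resolve the name on every call. The variable names and the 4096-byte read are only for illustration.

```autohotkey
; Look up the address of kernel32's ReadFile once.
hKernel   := DllCall("GetModuleHandle", Str,"kernel32", UInt)
pReadFile := DllCall("GetProcAddress", UInt,hKernel, Str,"ReadFile", UInt)

VarSetCapacity(buff, 4096)
h := DllCall("CreateFile", Str,A_ScriptFullPath, UInt,0x80000000, UInt,3, UInt,0, UInt,3, UInt,0, UInt,0)
DllCall(pReadFile, UInt,h, UInt,&buff, UInt,4096, UIntP,r, UInt,0)   ; call by address
DllCall("CloseHandle", UInt,h)
MsgBox % "ReadFile at address " pReadFile " read " r " bytes"
```

The saving per call is small, so it matters mostly when the buffer is small and ReadFile is called many thousands of times.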
Laszlo



Joined: 14 Feb 2005
Posts: 4515
Location: Boulder, CO

Posted: Mon Jan 12, 2009 8:09 pm

The CRC32 loop script is updated in the third previous post. Do you see a speedup?
n-l-i-d
Guest





Posted: Mon Jan 12, 2009 9:14 pm

Not really. I tried experimenting with different buffer sizes, and it seems that a buffer of 32 MB is fastest (on this particular file that is). If I change the buffer size in your latest code from 64 to 32 MB, the scanning is much faster (around 18 seconds, as fast as the fastest command line utility).
Laszlo



Joined: 14 Feb 2005
Posts: 4515
Location: Boulder, CO

Posted: Mon Jan 12, 2009 9:31 pm

The running time could be dominated by the actual disk I/O. Too large buffers might be bad, because Windows has to shuffle things around to make room. In theory double buffering could speed things up, by loading the data into one buffer, while processing the other, but in this case the processing is much faster than reading from disk (unless you have RAID or 15K rpm drives), so you cannot gain much.
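
One way to check whether disk I/O dominates is to time the reads alone, without any CRC computation, at a few buffer sizes. The following is a rough sketch under stated assumptions: 32-bit AutoHotkey, a placeholder filename, and three arbitrarily chosen buffer sizes. Note that after the first pass Windows may serve the file from its cache, so the later passes can look faster than a cold read from disk.

```autohotkey
#NoEnv
SetBatchLines -1
file := A_ScriptFullPath               ; placeholder: point this at a large file
report := ""
Loop 3 {
   bufSz := 1 << (14 + 4*A_Index)      ; 256 KB, 4 MB, 64 MB
   VarSetCapacity(buff, bufSz)
   h := DllCall("CreateFile", Str,file, UInt,0x80000000, UInt,3, UInt,0, UInt,3, UInt,0, UInt,0)
   If (h = -1) {                       ; INVALID_HANDLE_VALUE
      MsgBox Cannot open %file%
      ExitApp
   }
   total := 0, t0 := A_TickCount
   Loop {                              ; sequential reads advance the file pointer themselves
      r := 0
      DllCall("ReadFile", UInt,h, UInt,&buff, UInt,bufSz, UIntP,r, UInt,0)
      If (r = 0)                       ; end of file (or read error)
         Break
      total += r
   }
   DllCall("CloseHandle", UInt,h)
   report .= (bufSz >> 10) " KB buffer: " (A_TickCount - t0) " ms for " total " bytes`n"
}
MsgBox % report
```

If these times are close to the full CRC run, the CRC code is not the bottleneck and double buffering would gain little.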
n-l-i-d
Guest





Posted: Tue Jan 13, 2009 2:36 am

While searching for even faster scanning methods (crc32 assembly code being the optimum, I guess), I stumbled across this code that might interest you: CodeProject - "CRC32: Generating a checksum for a file" (8 functions, including assembly), and a related set of AutoIt scripts to run inline assembly, with a crc32 example (and if I'm not misinterpreting the code, the author also loads an inline DLL into memory and uses it from there!).

This is all way over my head, but looks like very interesting stuff to convert to AHK.

IsNull



Joined: 10 May 2007
Posts: 166
Location: .switzerland

Posted: Sun Jan 25, 2009 8:40 pm

I've made a little machine code extractor. I hope it is useful for someone: http://www.autohotkey.com/forum/viewtopic.php?p=245780
_________________
http://securityvision.ch

AHK 2D GAME ENGINE
mitchi



Joined: 14 Jun 2008
Posts: 9

Posted: Wed Jan 28, 2009 5:09 pm

How useful is all this? As someone said, you can only have assembly code snippets here. Far calls are out of the question. Why don't you just put all the snippets you want into a DLL and call them? If you use a DLL, you aren't limited to self-relative code.

It's still pretty cool.
bmcclure



Joined: 24 Nov 2007
Posts: 766

Posted: Wed Jan 28, 2009 5:32 pm

However, if you use a DLL, you're not demonstrating something as cool as running machine code natively in AHK.
_________________
Ben

My Trac projects
My Wiki
[Broken] - My music
Laszlo



Joined: 14 Feb 2005
Posts: 4515
Location: Boulder, CO

Posted: Wed Jan 28, 2009 5:46 pm

mitchi wrote:
How useful is all this?
If you need a few small functions that are hard to program or too slow with AHK commands, you can embed them directly in your script. If you make a DLL, you have to recompile it each time you find a useful function. If others use your software, you have to make sure they have the right version of the DLL. If you update a function in the DLL, older scripts could break, so you face a complex compatibility management problem. It is also easier to include a few lines of machine code somebody else developed than to copy the source code into your ever-growing DLL source and recompile. The source of the machine code need not be C, and mixing source code in different languages is hard.

On the other hand, if your function is large, calls library functions, etc., it is better kept in a separate DLL. You may end up needing several of them for a larger project, though.
Glasso
Guest





Posted: Sat Jan 31, 2009 10:48 pm

Hi, is there any way to have bitmap data in memory and convert it to hex directly within memory?

I.e., hBitmap or pBitmap --->>> straight assignment of the hex code to a variable, or a FileAppend write to a file with the hBitmap or pBitmap converted to hex code?

I commonly use the GDI+ and Gdip functions posted elsewhere in this forum, and I am curious whether this bit wizardry is robust enough to do this.
Laszlo



Joined: 14 Feb 2005
Posts: 4515
Location: Boulder, CO

Posted: Sat Jan 31, 2009 10:56 pm

Glasso wrote:
Hi, is there any way to have bitmap data in memory and convert it to hex directly within memory?
Check out the Bin2Hex and Hex2Bin functions in this thread.
Glasso wrote:
this bit wizardry is robust enough to do this?
Yes.
Glasso
Guest





Posted: Sat Jan 31, 2009 11:42 pm

Well, what I tried is:

Code:

hexData := Bin2Hex(&hBitmap, not_sure_how_to_get_hbitmap_size_at_this_level)


What can I do better?
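
One possible approach, offered as a sketch rather than a definitive answer: hBitmap is a handle, not a pointer to the pixel data, so &hBitmap is only the address of the variable that holds the handle. The size can be obtained with GetObject, and the pixel bytes can be copied into a buffer with GetBitmapBits. The example assumes 32-bit AutoHotkey (the BITMAP structure is 24 bytes there) and creates a tiny test bitmap so it can run on its own. The plain-AHK hex loop is only a slow stand-in; the buffer address and byte count are presumably what a Bin2Hex-style function would take instead.

```autohotkey
hdc := DllCall("GetDC", UInt,0, UInt)                       ; screen DC
hBitmap := DllCall("CreateCompatibleBitmap", UInt,hdc, Int,4, Int,2, UInt)  ; tiny test bitmap
DllCall("ReleaseDC", UInt,0, UInt,hdc)

VarSetCapacity(bm, 24, 0)                                   ; 32-bit BITMAP structure
DllCall("GetObject", UInt,hBitmap, Int,24, UInt,&bm)
size := NumGet(bm, 12, "Int") * NumGet(bm, 8, "Int")        ; bmWidthBytes * bmHeight

VarSetCapacity(bits, size, 0)
got := DllCall("GetBitmapBits", UInt,hBitmap, Int,size, UInt,&bits)  ; bytes copied
DllCall("DeleteObject", UInt,hBitmap)

hex := ""
old := A_FormatInteger
SetFormat Integer, Hex
Loop %got%                                                  ; two hex digits per byte
   hex .= SubStr("0" . SubStr(NumGet(bits, A_Index-1, "UChar") + 0, 3), -1)
SetFormat Integer, %old%

MsgBox % size " bytes in the bitmap`n" hex
```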
All times are GMT
Page 14 of 17