Machine code binary buffer searching regardless of NULL
wOxxOm



Joined: 09 Feb 2006
Posts: 326

Posted: Sat Nov 24, 2007 1:18 pm    Post subject: Machine code binary buffer searching regardless of NULL

Blazing-fast, CASE-SENSITIVE machine-code search for a sequence of bytes in a (binary) buffer. Both the buffer and the sequence may contain NULL characters. Returns the position of the needle inside the haystack, or -1 if it is not found.

Search time is generally 0 ms. Even with very hostile buffer contents (all bytes identical and *equal to the Needle's first byte*), 60 megabytes take 0.1 - 0.5 sec on my PC.

InBuf - look for a binary Needle in a binary Buffer.
0-based (-1 = not found), case-sensitive.

Code:
InBuf(haystackAddr, needleAddr, haystackSize, needleSize, StartOffset=0)
{   Static fun
   IfEqual,fun,
   {
      h=
      ( LTrim join
         5589E583EC0C53515256579C8B5D1483FB000F8EC20000008B4D108B451829C129D9410F8E
         B10000008B7D0801C78B750C31C0FCAC4B742A4B742D4B74364B74144B753F93AD93F2AE0F
         858B000000391F75F4EB754EADF2AE757F3947FF75F7EB68F2AE7574EB628A26F2AE756C38
         2775F8EB569366AD93F2AE755E66391F75F7EB474E43AD8975FC89DAC1EB02895DF483E203
         8955F887DF87D187FB87CAF2AE75373947FF75F789FB89CA83C7038B75FC8B4DF485C97404
         F3A775DE8B4DF885C97404F3A675D389DF4F89F82B45089D5F5E5A595BC9C2140031C0F7D0EBF0
      )
      VarSetCapacity(fun,StrLen(h)//2)
      Loop % StrLen(h)//2
         NumPut("0x" . SubStr(h,2*A_Index-1,2), fun, A_Index-1, "Char")
   }
   Return DllCall(&fun
      , "uint",haystackAddr, "uint",needleAddr
      , "uint",haystackSize, "uint",needleSize
      , "uint",StartOffset)
}


InBufRev - reverse look for binary Needle in binary Buffer.
0-based (-1 = not found), case-sensitive.
StartOffsetOfLastNeedleByte - maximum hayStack offset to contain Needle's bytes (-1=whole haystackSize)

Code:
InBufRev(haystackAddr, needleAddr, haystackSize, needleSize, StartOffsetOfLastNeedleByte=-1)
{   Static fun
   IfEqual,fun,
   {
      h=
      ( LTrim join
         5589E583EC0C53515256579C8B5D1483FB000F8EDE0000008B4510488B4D1883F9FF0F44
         C839C80F4CC829D989CF410F8EC1000000037D088B750C83E000FCAC4B74224B742A4B74
         354B74434B754E93AD93FDF2AE0F859B000000395F0275F3E981000000FDF2AE0F858800
         0000EB76FD8A26F2AE757F38670275F7EB689366AD93FDF2AE756F66395F0275F6EB574E
         ADFDF2AE756039470175F7EB494E43AD8975FC89DAC1EB02895DF483E2038955F887DF87
         D1FD87FB87CAF2AE753839470175F7FC89FB89CA83C7058B75FC8B4DF485C97404F3A775
         DC8B4DF885C97404F3A675D189DF4789F82B45089D5F5E5A595BC9C2140031C0F7D0EBF0
      )
      VarSetCapacity(fun,StrLen(h)//2)
      Loop % StrLen(h)//2
         NumPut("0x" . SubStr(h,2*A_Index-1,2), fun, A_Index-1, "Char")
   }
   return DllCall(&fun
      , "uint",haystackAddr, "uint",needleAddr
      , "uint",haystackSize, "uint",needleSize
      , "uint",StartOffsetOfLastNeedleByte)
}


Wrappers for searching a string in a binary buffer
Code:
;InBufStr - look for string Needle in binary Buffer
;         0-based (-1 = not found), case-sensitive.
InBufStr(haystackAddr, needleStr, haystackSize, StartOffset=0)
{   return InBuf(haystackAddr, &needleStr, haystackSize, strlen(needleStr), StartOffset)
}

;InBufStrRev - reverse look for string Needle in binary Buffer
;         0-based (-1 = not found), case-sensitive.
;         StartOffsetOfLastNeedleByte - maximum hayStack offset to contain Needle's bytes (-1=whole haystackSize)
InBufStrRev(haystackAddr, needleStr, haystackSize, StartOffsetOfLastNeedleByte=-1)
{   return InBufRev(haystackAddr, &needleStr, haystackSize, strlen(needleStr), StartOffsetOfLastNeedleByte)
}


Wrappers for extracting a string from a binary buffer
Code:
;substrBuf - extract a string of specified length from a binary buffer
;         usage: stringVar:=substrBuf( &buf+Offset, 100 )
;         Length is optional; if it is omitted, a NULL-terminated string is extracted
;         take care not to read past the end of buf if it contains no NULL
substrBuf(bufAddr, Length="")
{  IfEqual,Length,
      Length:=dllCall("lstrlen","uint",bufAddr)
   VarSetCapacity(result,Length)
   DllCall("RtlMoveMemory", "str",result, "uint", bufAddr, "uint",Length)
   return result
}


Usage:
Code:
Offset := InBuf( &Buffer, &sought, 10000, 100)
Offset := InBufRev( &Buffer, &sought, 10000, 100, 5000)

Offset := InBufStrRev( &Buffer, "ImmediateString", 10000, -1)
Offset := InBufStr( &Buffer, StringVar, 10000, StartOffset)
Offset := InBufStr( &Buffer, StringVar, 10000000)
Code:
ifNotEqual, Offset, -1
   Text := substrBuf( &buffer+Offset)
ifNotEqual, Offset, -1
   Text := substrBuf( &buffer+Offset1, Offset2-Offset1)
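
As a self-contained illustration of the calls above, the sketch below builds a small buffer with an embedded NULL and searches it. It assumes InBuf() from this post is defined in the same script and that the script runs under 32-bit AutoHotkey (the machine code is 32-bit). The buffer layout and variable names are made up for the demonstration.

```autohotkey
; Assumption: InBuf() from the first post is included in this script.
bufSize := 16
VarSetCapacity(buf, bufSize, 0)       ; 16 zero bytes
NumPut(0x41, buf, 5, "UChar")         ; buf[5] = 'A'
NumPut(0x00, buf, 6, "UChar")         ; buf[6] = NULL, part of the pattern
NumPut(0x42, buf, 7, "UChar")         ; buf[7] = 'B'

VarSetCapacity(needle, 3, 0)          ; needle = 'A', NULL, 'B'
NumPut(0x41, needle, 0, "UChar")
NumPut(0x00, needle, 1, "UChar")
NumPut(0x42, needle, 2, "UChar")

pos := InBuf(&buf, &needle, bufSize, 3)
MsgBox % "Needle found at offset " pos   ; with this layout the offset should be 5
```

The needle is built with NumPut rather than a string literal because an ordinary AutoHotkey string cannot carry a NULL in the middle.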


InFile - InBuf-based case-sensitive search in the contents of a file of any* size
*: even larger than 4GB; StartOffset may also be larger than 4GB.
Code:
InFile( fileName, needleAddr, needleLen, StartOffset=0 )
{
   lRet=-1
   IfEqual,needleLen,0, return lRet
   IfEqual,needleAddr,0, return lRet
   hFile:=DllCall("CreateFile", "str", fileName,"uint",0x80000000 ;GENERIC_READ
            ,"uint", 1 ;FILE_SHARE_READ
            ,"uint", 0, "uint",3 ;OPEN_EXISTING
            ,"uint",0x2000000 ;FILE_FLAG_BACKUP_SEMANTICS
            ,"uint", 0)
   ifEqual,hFile,-1, return lRet

   VarSetCapacity( lBufLen, 8, 0 )
   NumPut( DllCall("GetFileSize","uint",hFile,"uint",&lBufLen+4), lBufLen )
   DllCall( "RtlMoveMemory", "int64 *",lBufLen64, "uint",&lBufLen, "uint",8 )
   lBufLen64 -= StartOffset
   If( lBufLen64>=0 )
   {   hMap:=DllCall("CreateFileMapping", "uint",hFile, "uint",0, "uint",2 ;PAGE_READONLY
               ,"uint",0,"uint",0,"uint",0)
      if( hMap )
      {   lMax32b=0xFFFFFFFF
         lMaxView=0x40000000 ;1GB
         VarSetCapacity( SI, 36, 0 )
         DllCall("GetSystemInfo","uint",&SI)
         memAllocGranularity:=NumGet( SI, 28 )
         loop
         {   FileOffs:=(StartOffset//memAllocGranularity)*memAllocGranularity
            delta:=StartOffset-FileOffs
            lBufLenLo:=(lBufLen64+delta > lMaxView) ? lMaxView : lBufLen64+delta
            hView:=DllCall("MapViewOfFile", "uint",hMap, "uint", 4 ;FILE_MAP_READ
                     ,"uint",FileOffs>>32,"uint",FileOffs & lMax32b,"uint",lBufLenLo)
            ifEqual,hView,0, break
            lRet:=InBuf( hView, needleAddr, lBufLenLo, needleLen, delta )
            DllCall("UnmapViewOfFile","uint",hView)
            if( lRet!=-1 )
            {   lRet += FileOffs
               break
            }
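            ; step past the start positions just searched; the next view overlaps by needleLen-1 bytes
            advance := lBufLenLo-delta-needleLen+1
            StartOffset += advance
            lBufLen64 -= advance
            if( lBufLen64<needleLen ) ; fewer bytes left than the needle needs
               break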
         }
         DllCall("CloseHandle","uint",hMap)
      }
   }
   DllCall("CloseHandle","uint",hFile)
   return lRet
}

Usage:
fileOffs:=InFile( "d:\filename", &needle, needleLen, StartOffset )
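
Because StartOffset is a parameter, every occurrence can be collected by restarting the search one byte after the previous hit. The sketch below is only an illustration: it assumes InFile() and InBuf() are defined in the same script, that an ANSI build of AutoHotkey is used (so &needle points at single-byte characters), and the search text is an arbitrary example.

```autohotkey
; Assumptions: InFile()/InBuf() from this post are included; ANSI AutoHotkey.
needle := "AutoHotkey"          ; arbitrary example text
needleLen := StrLen(needle)
list := ""
start := 0
Loop
{
   pos := InFile(A_AhkPath, &needle, needleLen, start)
   if (pos = -1)                ; no further occurrence (or the file could not be opened)
      break
   list .= pos "`n"
   start := pos + 1             ; continue right after this hit
}
MsgBox % "Offsets of '" needle "' in " A_AhkPath ":`n" list
```

Each call reopens and remaps the file, so this is convenient rather than fast.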

The advantage of using CreateFileMapping: as long as your file is under 1GB there is no need to manage chunked reading, and no need to read the whole file into memory; the OS does it for you automatically when the actual memory access occurs (in the InBuf code). The disadvantage may also be huge: given the current implementation of InFile and InBuf there is no way to stop the process or show feedback. This can be circumvented, if needed, by calling InBuf repeatedly in a loop with the haystackSize parameter set to a small value.
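
The looped-call idea can be sketched roughly as follows. InBufChunked is a hypothetical helper, not part of the original functions, and the default chunk size is an arbitrary choice. Each call to InBuf covers chunkSize start positions, and the window is extended by needleSize-1 bytes so that a match straddling two chunks is not missed.

```autohotkey
; Hypothetical wrapper: same result as InBuf(), but returns to the script between chunks.
InBufChunked(haystackAddr, needleAddr, haystackSize, needleSize, chunkSize=1048576)
{
   result := -1
   if (needleSize < 1 or haystackSize < needleSize)
      return result
   start := 0
   Loop
   {
      ; window end = chunkSize start positions plus needleSize-1 bytes of overlap
      end := start + chunkSize + needleSize - 1
      if (end > haystackSize)
         end := haystackSize
      pos := InBuf(haystackAddr, needleAddr, end, needleSize, start)
      if (pos != -1)
      {
         result := pos
         break
      }
      start += chunkSize
      if (start > haystackSize - needleSize)   ; no valid start position left
         break
      ToolTip % "Searched " start " of " haystackSize " bytes"
   }
   ToolTip   ; hide the progress tooltip
   return result
}
```

Between chunks the script regains control, so it could also check a cancel flag there instead of only updating the tooltip.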


Last edited by wOxxOm on Thu Jan 03, 2008 6:12 am; edited 13 times in total
Laszlo



Joined: 14 Feb 2005
Posts: 4710
Location: Boulder, CO

Posted: Sat Nov 24, 2007 4:55 pm

This can come in handy! Thanks for sharing it.

Could you post the full ASM code and detailed explanations? In some cases, if someone is really after speed, he might want to tweak the code. E.g. if the search string is 2, 4 or 8 bytes long, you could use integer comparisons, or, if the search pattern is very long or has some regularities, a more complex algorithm.

Does anyone have experience with MMX (SSE or 3DNow!)? They could make the code even faster, if the user has the right processor.
wOxxOm



Joined: 09 Feb 2006
Posts: 326

Posted: Sun Nov 25, 2007 3:12 am

Built with FlatASM.
There are optimizations for 1-, 2-, 3-, 4- and 5-byte sequences.
Note: only the PROC code matters; everything else is of course just scaffolding that lets fasm compile an exe from which the code can be ripped.
Code:
format PE GUI 4.0
entry start

include 'win32a.inc'

section '.data' data readable writeable

  hayStack db '1111111122222111111'
  Needle db '22222'

section '.code' code readable executable

  start:

   push   0 5 19 Needle hayStack
   call   InBuf
   push   -1 5 19 Needle hayStack
   call   InBufRev
   invoke   ExitProcess,0

proc InBuf stdcall uses ebx ecx edx esi edi, hayStack,Needle,hayStackSize,NeedleSize,StartOffset
   local   lNeedleRemDwords:DWORD   ;(NeedleSize-4)>>2
   local   lNeedleRemTail:DWORD   ;Needle remainder byte count (NeedleSize-4) mod 4 -> (0..3)
   local   lNeedleRemPtr4:DWORD   ;&Needle[4]

   pushfd

   mov   ebx,[NeedleSize]
   cmp   ebx,0
   jle   .NotFound
   mov   ecx,[hayStackSize]
   mov   eax,[StartOffset]
   sub   ecx,eax
   sub   ecx,ebx
   inc   ecx   ;repetitions=hayStackSize-StartOffset-NeedleSize+1
   jle   .NotFound

   mov   edi,[hayStack]
   add   edi,eax ;edi=&(hayStack[StartOffset])

   ;load Needle FirstByte
   mov   esi,[Needle]
   xor   eax,eax
   cld
   lodsb   ; AL=Needle[0], keep EAX now!

   ;decide on needle length
   dec   ebx
   jz   .NeedleLenIs1
   dec   ebx
   jz   .NeedleLenIs2
   dec   ebx
   jz   .NeedleLenIs3
   dec   ebx
   jz   .NeedleLenIs4
   dec   ebx
   jnz   .NeedleLenIsLong

;.NeedleLenIs5:
   xchg   eax,ebx
   lodsd      ;EAX=bytes 1..4 of Needle (0-based)
   xchg   eax,ebx ;AL=Needle[0] again, EBX=bytes 1..4 of Needle (0-based)

   .ScanNeedleLenIs5:
   repne   scasb
   jne   .NotFound
   cmp   [edi],ebx
   jne   .ScanNeedleLenIs5
   jmp   .Found

.NeedleLenIs4:
   dec   esi
   lodsd   ;EAX=first 4 bytes of Needle
   .ScanNeedleLenIs4:
   repne   scasb
   jne   .NotFound
   cmp   [edi-1],eax
   jne   .ScanNeedleLenIs4
   jmp   .Found

.NeedleLenIs1:
   repne   scasb
   jne   .NotFound
   jmp   .Found

.NeedleLenIs2:
   mov   ah,[esi]
   .ScanNeedleLenIs2:
   repne   scasb
   jne   .NotFound
   cmp   [edi],ah
   jne   .ScanNeedleLenIs2
   jmp   .Found

.NeedleLenIs3:
   xchg   ebx,eax
   lodsw
   xchg   ebx,eax
   .ScanNeedleLenIs3:
   repne   scasb
   jne   .NotFound
   cmp   [edi],bx
   jne   .ScanNeedleLenIs3
   jmp   .Found

.NeedleLenIsLong:
   ; get (NeedleSize-4)//4, (NeedleSize-4) mod 4
   dec   esi   ;ESI=&(Needle[0])
   inc   ebx   ;EBX=NeedleSize-4
   lodsd   ;EAX=first 4 bytes of Needle
   mov   [lNeedleRemPtr4],esi
   mov   edx,ebx
   shr   ebx,2
   mov   [lNeedleRemDwords],ebx
   and   edx,3
   mov   [lNeedleRemTail],edx

   xchg   ebx,edi ;EBX=save EDI buf ptr for scasb
   xchg   edx,ecx ;EDX=save ECX counter for scasb

   .ScanNeedleLenIsLong:
   xchg   edi,ebx ;load saved buf ptr
   xchg   ecx,edx ;load saved counter
   .ScanNeedleLenIsLongJustScan:
   repne   scasb
   jne   .NotFound

   ;check all 4 bytes
   cmp   [edi-1],eax
   jne   .ScanNeedleLenIsLongJustScan

   ;check up to Needle's tail
   mov   ebx,edi
   mov   edx,ecx
   add   edi,3
   mov   esi,[lNeedleRemPtr4]
   mov   ecx,[lNeedleRemDwords]
   test   ecx,ecx
   jz   .ScanNeedleLenIsLongTail
   repe   cmpsd
   jne   .ScanNeedleLenIsLong
   .ScanNeedleLenIsLongTail:
   mov   ecx,[lNeedleRemTail]
   test   ecx,ecx
   jz   .ScanNeedleLenIsLongFound
   repe   cmpsb
   jne   .ScanNeedleLenIsLong
   .ScanNeedleLenIsLongFound:
   mov   edi,ebx ;FOUND!

.Found:
   dec   edi
   mov   eax,edi
   sub   eax,[hayStack]
.popOut:
   popfd
   ret
.NotFound:
   xor   eax,eax
   not   eax
   jmp   .popOut
endp

;@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

proc InBufRev stdcall uses ebx ecx edx esi edi, hayStack,Needle,hayStackSize,NeedleSize,StartOffsetOfLastByte
   local   lNeedleRemDwords:DWORD   ;(NeedleSize-4)>>2
   local   lNeedleRemTail:DWORD   ;Needle remainder byte count (NeedleSize-4) mod 4 -> (0..3)
   local   lNeedleRemPtr4:DWORD   ;&Needle[4]

   pushfd

   mov   ebx,[NeedleSize]
   cmp   ebx,0
   jle   .NotFound
   mov   eax,[hayStackSize]
   dec   eax
   mov   ecx,[StartOffsetOfLastByte]
   cmp   ecx,-1
   cmovE   ecx,eax
   cmp   eax,ecx
   cmovL   ecx,eax
   sub   ecx,ebx
   mov   edi,ecx
   inc   ecx   ;repetitions=min(hayStackSize-1,StartOffsetOfLastByte)-NeedleSize+2
   jle   .NotFound

   add   edi,[hayStack]   ;edi=&(hayStack[min(hayStackSize-1,StartOffsetOfLastByte)-NeedleSize+1])

   ;load Needle FirstByte
   mov   esi,[Needle]
   and   eax,0
   cld
   lodsb   ; AL=Needle[0], keep EAX now!

   ;decide on needle length
   dec   ebx
   jz   .NeedleLenIs1
   dec   ebx
   jz   .NeedleLenIs2
   dec   ebx
   jz   .NeedleLenIs3
   dec   ebx
   jz   .NeedleLenIs4
   dec   ebx
   jnz   .NeedleLenIsLong

;.NeedleLenIs5:
   xchg   eax,ebx
   lodsd      ;EAX=bytes 1..4 of Needle (0-based)
   xchg   eax,ebx ;EBX=bytes 1..4 of Needle (0-based)
   std

   .ScanNeedleLenIs5:
   repne   scasb
   jne   .NotFound
   cmp   [edi+2],ebx
   jne   .ScanNeedleLenIs5
   jmp   .Found

.NeedleLenIs1:
   std
   repne   scasb
   jne   .NotFound
   jmp   .Found

.NeedleLenIs2:
   std
   mov   ah,[esi]   ;AH=Needle[1]
   .ScanNeedleLenIs2:
   repne   scasb
   jne   .NotFound
   cmp   [edi+2],ah
   jne   .ScanNeedleLenIs2
   jmp   .Found

.NeedleLenIs3:
   xchg   ebx,eax
   lodsw
   xchg   ebx,eax
   std
   .ScanNeedleLenIs3:
   repne   scasb
   jne   .NotFound
   cmp   [edi+2],bx
   jne   .ScanNeedleLenIs3
   jmp   .Found

.NeedleLenIs4:
   dec   esi
   lodsd   ;EAX=first 4 bytes of Needle
   std
   .ScanNeedleLenIs4:
   repne   scasb
   jne   .NotFound
   cmp   [edi+1],eax
   jne   .ScanNeedleLenIs4
   jmp   .Found

.NeedleLenIsLong:
   ; get (NeedleSize-4)//4, (NeedleSize-4) mod 4
   dec   esi   ;ESI=&(Needle[0])
   inc   ebx   ;EBX=NeedleSize-4
   lodsd   ;EAX=first 4 bytes of Needle
   mov   [lNeedleRemPtr4],esi
   mov   edx,ebx
   shr   ebx,2
   mov   [lNeedleRemDwords],ebx
   and   edx,3
   mov   [lNeedleRemTail],edx

   xchg   ebx,edi ;EBX=save EDI buf ptr for scasb
   xchg   edx,ecx ;EDX=save ECX counter for scasb

   .ScanNeedleLenIsLong:
   std
   xchg   edi,ebx ;load saved buf ptr
   xchg   ecx,edx ;load saved counter
   .ScanNeedleLenIsLongJustScan:
   repne   scasb
   jne   .NotFound

   ;check all 4 bytes
   cmp   [edi+1],eax
   jne   .ScanNeedleLenIsLongJustScan

   ;check up to Needle's tail
   cld
   mov   ebx,edi
   mov   edx,ecx
   add   edi,5
   mov   esi,[lNeedleRemPtr4]
   mov   ecx,[lNeedleRemDwords]
   test   ecx,ecx
   jz   .ScanNeedleLenIsLongTail
   repe   cmpsd
   jne   .ScanNeedleLenIsLong
   .ScanNeedleLenIsLongTail:
   mov   ecx,[lNeedleRemTail]
   test   ecx,ecx
   jz   .ScanNeedleLenIsLongFound
   repe   cmpsb
   jne   .ScanNeedleLenIsLong
   .ScanNeedleLenIsLongFound:
   mov   edi,ebx ;FOUND!

.Found:
   inc   edi
   mov   eax,edi
   sub   eax,[hayStack]
.popOut:
   popfd
   ret
.NotFound:
   xor   eax,eax
   not   eax
   jmp   .popOut
endp


data import

library kernel32,'KERNEL32.DLL'
import kernel32,ExitProcess,'ExitProcess'

end data
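
For readers who do not read assembler, the strategy of the InBuf proc (scan for the needle's first byte, then compare the remaining bytes) can be modelled in plain AutoHotkey. InBufRef is a hypothetical name; this is a slow reference sketch that could be used to cross-check the machine code on small buffers, not a replacement for it, and it leaves out the special cases for needle lengths 1 to 5.

```autohotkey
; Hypothetical reference model of InBuf: same parameters, same 0-based result, -1 if not found.
InBufRef(haystackAddr, needleAddr, haystackSize, needleSize, StartOffset=0)
{
   if (needleSize < 1)
      return -1
   last := haystackSize - needleSize            ; highest offset where the needle still fits
   first := NumGet(needleAddr+0, 0, "UChar")    ; needle's first byte (the REPNE SCASB target)
   pos := StartOffset
   Loop
   {
      if (pos > last)
         return -1
      if (NumGet(haystackAddr+pos, 0, "UChar") = first)
      {
         matched := true
         Loop % needleSize-1                    ; compare bytes 1..needleSize-1
         {
            if (NumGet(haystackAddr+pos+A_Index, 0, "UChar") != NumGet(needleAddr+A_Index, 0, "UChar"))
            {
               matched := false
               break
            }
         }
         if (matched)
            return pos
      }
      pos += 1
   }
}
```

Calling InBufRef(&buf, &needle, size, len) and InBuf(&buf, &needle, size, len) on the same data should give the same offset.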


Last edited by wOxxOm on Tue Nov 27, 2007 4:10 pm; edited 9 times in total
Laszlo



Joined: 14 Feb 2005
Posts: 4710
Location: Boulder, CO

Posted: Sun Nov 25, 2007 4:45 am

This seems to be the first pure assembler function written for AHK. Of course we won't laugh at you; we'll learn from it and at some point try to improve it or adapt it to some other task. Not many programmers want to write perfect code, just code that does the job.
wOxxOm



Joined: 09 Feb 2006
Posts: 326

Posted: Mon Nov 26, 2007 1:20 pm

InBuf (and consequently the InBufStr wrapper) has undergone very serious optimization; I've updated my earlier posts (including the asm code).

Time to search very hostile buffer contents (all bytes identical and equal to the Needle's first byte) on my PC: 60 megabytes in 0.1 sec minimum for a 1-byte search, 0.5 sec for other lengths. The time is measured immediately before and after the DllCall. For a simple buffer under 1MB the time was sometimes less than 1 ms, which in practice means anything under 18 ms; not bad either way.

The initial version took 1.2 seconds in most cases for the hostile 60MB buffer, so the maximum difference is 12 times!

InBufRev will be the next to get optimized.
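
The hostile-buffer timing described above can be reproduced with a sketch along these lines. It assumes InBuf() is defined in the same script (32-bit AutoHotkey); the byte values are arbitrary, and A_TickCount is coarse, so short runs will often read as 0 ms.

```autohotkey
; Assumption: InBuf() from the first post is included in this script.
size := 60 * 1000 * 1000
VarSetCapacity(buf, size, 0x41)        ; every byte equals the needle's first byte
VarSetCapacity(needle, 2, 0)
NumPut(0x41, needle, 0, "UChar")       ; needle = 'A','B'
NumPut(0x42, needle, 1, "UChar")       ; 'B' never occurs, so the whole buffer is scanned

t := A_TickCount
pos := InBuf(&buf, &needle, size, 2)   ; pos should be -1 (not found)
elapsed := A_TickCount - t
MsgBox % "Result: " pos "`nElapsed: " elapsed " ms"
```

Every position passes the first-byte scan and fails on the second byte, which is the worst case for this search strategy.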
majkinetor



Joined: 24 May 2006
Posts: 4511
Location: Belgrade

Posted: Mon Nov 26, 2007 1:49 pm

Thx for your work. Highly appreciated.

I don't have use for it right now, so I will not test it ATM.


Have fun.
derRaphael



Joined: 23 Nov 2007
Posts: 841
Location: ~/.

Posted: Tue Nov 27, 2007 5:17 am

hi
I wonder if this routine could be used to update/modify the array that autohotkey.exe generates at runtime for its controls.

If that array is accessible, this tiny script could search for a particular name that a script assigned to a control and modify it somehow (without corrupting the array structure, e.g. replacing the varName with 012N123 without changing its value). It should then be possible to remove a control by sending WM_CANCEL or WM_DESTROY, modify the AutoHotkey array, and assign that varName again to a newly generated control.

I'm sure that unless the array is updated correctly (meaning the variable and its value are deleted from the structure), this would hit the 11,000 Control Limit / Gui when used frequently.

Found on:
http://www.autohotkey.com/forum/viewtopic.php?t=2859&highlight=array#18313

But it might be a workaround if someone desperately needs to destroy a control item and reassign the previously used varName to a new Control.

Greets

DerRaphael
wOxxOm



Joined: 09 Feb 2006
Posts: 326

Posted: Tue Nov 27, 2007 4:41 pm

New: InBufRev (reverse lookup) is now also optimized (the same way InBuf is).

Fixed: a dumb bug; -1 is now returned if nothing is found (it was '0').
bmcclure



Joined: 24 Nov 2007
Posts: 774

Posted: Sat Dec 08, 2007 5:41 pm

Great functions!

wOxxOm wrote:
Code:
;substrBuf - extract a string of specified length from a binary buffer
;         usage: stringVar:=substrBuf( &buf+Offset, 100 )
substrBuf(bufAddr, Length)
{  VarSetCapacity(result,Length)
   DllCall("RtlMoveMemory", "str",result, "uint", bufAddr, "uint",Length)
   return result
}
;substrZBuf - extract a NULL terminated string from a binary buffer
;         usage: stringVar:=substrZBuf( &buf+Offset )
substrZBuf(bufAddr)
{  L:=dllCall("lstrlen","uint",bufAddr)
   VarSetCapacity(result,L)
   DllCall("RtlMoveMemory", "str",result, "uint", bufAddr, "uint",L)
   return result
}


Couldn't this be combined?
Code:
;substrBuf - extract a string of specified length from a binary buffer
;         usage: stringVar:=substrBuf( &buf+Offset, 100 )
; Omit Length to extract null-terminated string
substrBuf(bufAddr, Length="")
{
   If (Length = "")
      Length:=dllCall("lstrlen","uint",bufAddr)
   VarSetCapacity(result,Length)
   DllCall("RtlMoveMemory", "str",result, "uint", bufAddr, "uint",Length)
   return result
}

wOxxOm



Joined: 09 Feb 2006
Posts: 326

Posted: Sat Dec 08, 2007 5:48 pm

NICE, the first post is updated
polyethene



Joined: 11 Aug 2004
Posts: 5248
Location: UK

Posted: Mon Dec 31, 2007 1:23 pm

You know RegEx can search past null chars too.
Laszlo



Joined: 14 Feb 2005
Posts: 4710
Location: Boulder, CO

Posted: Mon Dec 31, 2007 1:37 pm

Titan wrote:
You know RegEx can search past null chars too.
If Haystack contains nulls, it looks easy, but could you give some example, how to handle search- and replacement strings, which contain null chars?
polyethene



Joined: 11 Aug 2004
Posts: 5248
Location: UK

Posted: Mon Dec 31, 2007 1:42 pm

Laszlo wrote:
If Haystack contains nulls, it looks easy, but could you give some example, how to handle search- and replacement strings, which contain null chars?
Don't see why you couldn't have experimented yourself, but for the sake of it here's a lousy demonstration:

Code:
var = abcdef123ghi
NumPut(0, var, 3, "Char") ; replace 4th char with null byte

MsgBox, % "Position of null: " . RegExMatch(var, "\0") ; find
   . "`nPosition of last char: " . RegExMatch(var, ".$") ; search past

new := RegExReplace(var, "\d", "-") ; replace digits which are past the null char
NumPut(46, new, 3, "Char") ; 'unmask' var from ahk display
MsgBox, %new%


wOxxOm wrote:
Time to search a very hostile buffer contents (all bytes are the same and equal Needle's first byte) on my pc - 60megabytes for 0.1sec minimum in case of 1 byte search, 0.5 sec - other Needle's lengths (full 60MB span)
Using the following script I got times as low as 0.0069785545369593 s (about 7 ms) on a P4 3 GHz to search for a char past a null byte in a 60MB stack:

Code:
i = 10
s := VarSetCapacity(var, 60 * 1000 * 1000, 0xff) ; 60 MB
Random, r, 1, s - 1
NumPut(0, var, r, "UChar")
NumPut(0x61, var, r + 1, "UChar")

t1 := t2 := 0x00000000 + 0
RegExMatch(var, "\x61")

DllCall("QueryPerformanceCounter", "Int64P", t1)

Loop, %i%
   RegExMatch(var, "\x61")

DllCall("QueryPerformanceCounter", "Int64P", t2)
DllCall("QueryPerformanceFrequency", "Int64P", f)

SetFormat, Float, 0.16
MsgBox, % (t2 - t1) / i / f * 1000 . " ms" ; counter ticks / frequency = seconds, so scale to ms

wOxxOm



Joined: 09 Feb 2006
Posts: 326

Posted: Mon Dec 31, 2007 10:46 pm

Hm, your example is the simplest possible one. Of course my function will be even faster in such a case, because it checks all 60MB with ONE asm instruction, REPNE SCASB, and there could hardly be anything faster.

A good call anyway, though.

P.S. RegExMatch has the advantage of being built into the AHK core code, whereas my InBuf is called via the DllCall mechanism, which is extremely slow in itself.
On the other hand, there is no sane way to use RegExMatch to search for arbitrary binary data inside a binary buffer, unless you create an escaping function, which is of course also possible.
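
Such an escaping function could look roughly like this. BinToRegEx is a hypothetical name, and the sketch simply writes every byte of the needle as a \xHH escape, so no regex metacharacter or NULL ever appears literally in the pattern.

```autohotkey
; Hypothetical helper: turn needleSize bytes at needleAddr into a pattern such as \x41\x00\x42
BinToRegEx(needleAddr, needleSize)
{
   hex := "0123456789ABCDEF"
   pattern := ""
   Loop % needleSize
   {
      b := NumGet(needleAddr+0, A_Index-1, "UChar")
      ; high nibble, then low nibble (SubStr is 1-based)
      pattern .= "\x" . SubStr(hex, (b >> 4) + 1, 1) . SubStr(hex, (b & 15) + 1, 1)
   }
   return pattern
}

VarSetCapacity(needle, 3, 0)           ; needle = 0x41, 0x00, 0x42
NumPut(0x41, needle, 0, "UChar")
NumPut(0x42, needle, 2, "UChar")
MsgBox % BinToRegEx(&needle, 3)        ; shows \x41\x00\x42
```

The result can be passed to RegExMatch as the needle. Note that RegExMatch returns a 1-based position (0 when not found) and needs the haystack in a variable whose string length covers the binary data.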
Laszlo



Joined: 14 Feb 2005
Posts: 4710
Location: Boulder, CO

Posted: Mon Dec 31, 2007 11:42 pm

Titan wrote:
Don't see why you couldn't have experimented yourself
I could, but you sounded like someone who had already done the work. Still, your example does not answer the question.

It looks like RegExMatch/Replace uses counted strings for the haystack, not NULL-terminated ones. I did not find it documented, but it is a pleasant surprise. However, I asked for an example of searching for a string containing nulls. Your example indicates that in the search string we have to replace each special character ([]()"\'.*?<>^$|null…) with a hex (or octal, as in your example) escape sequence (or precede it with "\"), which is not a very elegant solution.

AHK passes the length of HayStack to the regular expression function. The difficulty is setting the right StrLen when we load a binary buffer into RAM. When characters of an existing string are overwritten with NumPut (as in your example), StrLen stays unchanged.

When a binary file is to be read into RAM, we have to use the *c option, which sets StrLen to the file size, but the data is stored in a special variable that is not usable for RegEx. We have to copy it into another variable (or use DllCalls to open/read/close the file).
Code:
FileRead a, *c %A_AhkPath%
VarSetCapacity(b,StrLen(a),1)
DllCall("RtlMoveMemory", UInt,&b, UInt,&a, Uint,StrLen(a))
MsgBox % "It is found: " RegExMatch(b, "\0\03")
All times are GMT
Page 1 of 4