[Lib] binGetString - Get textual part of binary {binfile()}


8 replies to this topic

#1 Tuncay

Tuncay
  • Members
  • 1943 posts

Posted 11 March 2010 - 04:38 PM

Ahk Compatible: Vanilla / Mainstream
License: New BSD License

Download (~11 kb)
The archive contains the library, a demo script, and the documentation. All required modules are included as well.
Online Documentation

Description

I wrote a function that reads the textual portion of binary files. When I searched the forum, I did not find an existing solution (that day). The function can be used to read out headers and similar data.

Examples

MsgBox % str := binGetString(bindata, 800, 64, filename)

#Include binGetString.ahk
#NoEnv
SendMode Input
SetWorkingDir %A_ScriptDir%
SetBatchLines, -1

; Number of bytes to load; any positive number.
sizeToRead := 5000

; Specify how many characters to skip before
; starting to work with the file.
seek := 0

; Select source file
FileSelectFile, filepath, %A_WorkingDir%,, Select binary file
If (ErrorLevel = 0)
{
	; Read file into memory
	FileRead, bindata, %filepath%
	
	; Strip non-printable ASCII characters from the file
	str := binGetString(bindata, sizeToRead, seek, filepath)
	
	; Show that content now
	MsgBox,, %filepath%, %str%
}

Edit: Name changed from binfile to binGetString. In general, it is the same function. In addition, it now also leaves out the first 31 ASCII characters (the control characters with codes below 32).

#2 Laszlo

Laszlo
  • Fellows
  • 4713 posts

Posted 11 March 2010 - 05:23 PM

The expression
Chr(NumGet(_data, _offset + A_Index - 1, "UChar"))
gives you the character that is already in _data, so you are just copying it to str, except that NUL characters are dropped. The ByRef _data already contains the full result. If you know there is no NUL in _data, SubStr is an easier way to extract the desired bytes.

#3 Tuncay

Tuncay
  • Members
  • 1943 posts

Posted 11 March 2010 - 05:34 PM

I don't understand why. The data does contain NULs, as in, say, an mp3 or wma file. There IS NUL in the data, and this function converts it to a string representation. It is not meant for reading text files. :shock:

#4 Laszlo

Laszlo
  • Fellows
  • 4713 posts

Posted 11 March 2010 - 05:39 PM

When you append NUL {=chr(0)} to a string, nothing changes. This is why NUL characters are removed from str, which you return. Its length will be smaller, and you will not know where the original NULs were.
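
The point about appending NUL can be checked with a few lines. This is only a minimal sketch, assuming AutoHotkey v1 (where Chr(0) yields an empty string); the variable names are made up for the illustration and are not taken from the library.

```autohotkey
; Sketch, assuming AutoHotkey v1: appending Chr(0) is expected
; to leave the string unchanged, as described in the post above.
str := "ID3"
lenBefore := StrLen(str)

str .= Chr(0)            ; try to append a NUL character
lenAfter := StrLen(str)

str .= "v2"              ; append more text after the "NUL"

; If NUL is dropped, both lengths are equal and the text is contiguous.
MsgBox % "Before: " lenBefore "`nAfter: " lenAfter "`nResult: " str
```

If both lengths match, the position of the original NUL is lost in the returned string, which is the limitation described here.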

#5 Tuncay

Tuncay
  • Members
  • 1943 posts

Posted 11 March 2010 - 05:49 PM

This is why I made it. It's meant more for things like reading the header of an mp3.

#6 Laszlo

Laszlo
  • Fellows
  • 4713 posts

Posted 11 March 2010 - 06:01 PM

I see. The script description does not say that you wanted to extract ANSI text from binary files or drop NUL chars. If you want only ANSI text, you could exclude other funny characters too (those with code <32 or >126), making the result even more readable.
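
A filter along those lines might look like the following. It is a rough sketch under the assumption of AutoHotkey v1 and single-byte data; PrintableOnly is a hypothetical name, not the library's binGetString, and the caller has to make sure that offset + size does not run past the loaded data.

```autohotkey
; Hypothetical helper (not part of the library): copy only printable
; ASCII bytes (codes 32..126) from a binary buffer into a string.
PrintableOnly(ByRef data, size, offset = 0)
{
	str := ""
	Loop, %size%
	{
		; Read one byte as an unsigned number (0..255).
		code := NumGet(data, offset + A_Index - 1, "UChar")
		; Skip NUL, control codes (<32), DEL (127) and everything above.
		If (code >= 32 && code <= 126)
			str .= Chr(code)
	}
	Return str
}
```

It would be called like the example in the first post, e.g. str := PrintableOnly(bindata, 800, 64).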

#7 Tuncay

Tuncay
  • Members
  • 1943 posts

Posted 11 March 2010 - 06:23 PM

OK, I have renamed the function from binary_read() to binfile() and changed the topic title.
You are right about stripping characters above 126, but among those below 32 there are some characters, like tab, which should not be removed. I have to look at which codes can be eliminated. But I think this should not really be necessary, because correct usage of this function assumes the user knows the offsets and size of the data to extract.

Edit: Stripping all codes above 126, is this correct?
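
One way to express "strip everything above 126, but keep a few codes below 32" is a small predicate. This is just a sketch for AutoHotkey v1; IsTextCode is a hypothetical name, and the choice of tab, line feed and carriage return as the codes to keep is an assumption, not what the library does.

```autohotkey
; Hypothetical predicate: returns 1 (true) if a byte value should be kept.
IsTextCode(code)
{
	; Assumption: tab (9), line feed (10) and carriage return (13)
	; are the only control codes worth keeping.
	If (code = 9 || code = 10 || code = 13)
		Return 1
	; Printable ASCII runs from space (32) to tilde (126);
	; 127 (DEL) and all codes above it are rejected.
	Return (code >= 32 && code <= 126)
}
```

Inside a byte loop, the test would then read If IsTextCode(code) before appending Chr(code).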

#8 Laszlo

Laszlo
  • Fellows
  • 4713 posts

Posted 11 March 2010 - 07:51 PM

If you are expecting diacritical letters or other special characters, you need to filter several intervals, or use "if var is alpha" for letters and intervals for symbols. The script is useful for finding error message texts in program files, determining the version of a document stored in binary form, figuring out what software created a file, etc. In these cases you might not know the offset at which to look.
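
When the offset is not known, one option is to scan the whole buffer and collect runs of printable characters, similar in spirit to the Unix strings tool. The following is a minimal sketch assuming AutoHotkey v1 and single-byte text; FindStrings and the minLen parameter are hypothetical and not part of binGetString.

```autohotkey
; Hypothetical helper: list every run of printable ASCII characters
; that is at least minLen characters long, one run per line.
FindStrings(ByRef data, size, minLen = 4)
{
	result := ""
	chunk := ""
	Loop, %size%
	{
		code := NumGet(data, A_Index - 1, "UChar")
		If (code >= 32 && code <= 126)
			chunk .= Chr(code)       ; extend the current run
		Else
		{
			; A non-printable byte ends the run; keep it if long enough.
			If (StrLen(chunk) >= minLen)
				result .= chunk . "`n"
			chunk := ""
		}
	}
	; Do not lose a run that reaches the end of the buffer.
	If (StrLen(chunk) >= minLen)
		result .= chunk . "`n"
	Return result
}
```

The minimum length keeps short random byte sequences out of the output; 4 is only an assumed default and may need tuning per file type.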

#9 Tuncay

Tuncay
  • Members
  • 1943 posts

Posted 29 September 2010 - 09:31 AM

Updated and license changed.