Binary Data | VarSetCapacity | VarSetLength | Heap Object

Discuss the future of the AutoHotkey language
just me

13 May 2019, 04:51

Buffer Object

Helgef wrote: I think WriteType and ReadType would be a good replacement for NumGet/NumPut.
Me too! (Together with Buffer.Pos)

Buffer.Data: Retrieves a copy of the buffer's content, as a string.
What is the intended usage? If the buffer contains a UTF-16 string, one should get it with StrGet(). Otherwise, the returned data isn't actually a string but arbitrary binary data, and you might lose one important byte if the buffer's size is odd. So why?
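
To make the odd-size concern concrete, here is a small C++ sketch (not AutoHotkey's actual code) that copies a buffer's bytes into a UTF-16 string of size / 2 code units, which is roughly what any string-returning Data property would have to do. The helpers `bytesAsUtf16` and `bytesPreserved` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copy raw bytes into a UTF-16 string the way a
// string-returning property would have to, i.e. as size / 2 code units.
std::u16string bytesAsUtf16(const std::vector<unsigned char>& bytes) {
    // Integer division silently drops a trailing odd byte.
    std::u16string s(bytes.size() / 2, u'\0');
    if (!s.empty())
        std::memcpy(&s[0], bytes.data(), s.size() * sizeof(char16_t));
    return s;
}

// How many of the original bytes survive the trip through the string.
std::size_t bytesPreserved(const std::vector<unsigned char>& bytes) {
    return bytesAsUtf16(bytes).size() * sizeof(char16_t);
}
```

For a 4-byte buffer all 4 bytes survive; for a 5-byte buffer the last byte is dropped.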
Helgef

13 May 2019, 05:16

Flipeador wrote: I think, as lexikos said above, it should pad to default C alignment values by default.
lexikos said perhaps, and I see no indication that it does. But if it did, then it would be for structs.
Flipeador

13 May 2019, 05:22

WriteType and ReadType can be added as methods to the Buffer object, but NumPut and NumGet must be kept. Sometimes you have to read or write memory allocated by the system or by other functions (such as CoTaskMemAlloc), or you just want to avoid using a Buffer object to write a simple value.
Helgef wrote: then it would be for structs
For simple structures. A Struct object can be implemented later on.
lexikos

13 May 2019, 05:28

I changed my mind about handling alignment, as it would require scanning through the parameter list twice - once to calculate the alignment requirement, and again to put the numbers - and it might be unwanted. I do not want to add another parameter to NumGet or NumPut.

Instead, NumPut just puts the numbers in sequence, the same as if you nested calls. If you had to deal with alignment requirements before, you still do.
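
A rough C++ model of "puts the numbers in sequence", assuming each value is simply copied to the next free byte with no padding and the address after the last value is returned. `putOne` and `putSeq` are hypothetical names; this is a sketch of the semantics described, not AutoHotkey's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical single-value put: write v at p, return the address just past it.
template <typename T>
unsigned char* putOne(unsigned char* p, T v) {
    std::memcpy(p, &v, sizeof v);  // memcpy avoids unaligned-access problems
    return p + sizeof v;
}

// Base case: nothing left to write.
inline unsigned char* putSeq(unsigned char* p) { return p; }

// Hypothetical multi-value put: values are written back to back, no padding
// is inserted between them, and the end address is returned.
template <typename T, typename... Rest>
unsigned char* putSeq(unsigned char* p, T v, Rest... rest) {
    return putSeq(putOne(p, v), rest...);
}
```

With this model, an int32 followed by an int64 takes 12 bytes, exactly as with two nested single-value calls. If the target is a C struct that expects the int64 at offset 8, the gap is still the caller's responsibility.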
So if it is not for structs, then what is it for?
I did not say it is not for structs. I said it was not a substitute for struct support, meaning that proper struct support, which would be much better than just being able to write the fields in sequence, may still be added (much) later. This you can have now, at the very low cost of "I already implemented it" (and I optimized the code while I was there).

There are probably more appropriate ways to do arrays of homogeneous numbers. I can't say I've ever needed to put a sequence of numbers into an array, other than in a loop where the old NumPut would suffice. There are many times I would have benefited from the new NumPut mode.
Flipeador

13 May 2019, 08:37

I think these could be optional (for performance reasons in large buffers perhaps):
  • All bytes within the buffer are zero-initialized.
  • Likewise for the newly added Resize method:

    // https://github.com/Lexikos/AutoHotkey_L/commit/fbc503470579d421c2bae9cd5322f07ed98af6c2
    // If the block grew, zero-fill only the newly added bytes.
    if (aNewSize > mSize)
        memset((BYTE*)new_data + mSize, 0, aNewSize - mSize);
    If the size is increased, all data is preserved and any new bytes are zero-initialized.
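
A C++ sketch of what making zero-initialization optional could look like, for both allocation and Resize. `ByteBuffer` and its `zeroInit` flag are hypothetical; the zero-fill in `resize` mirrors the memset in the commit excerpt above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical buffer type: zero-initialization is a per-call option rather
// than always on, so large buffers that will be overwritten can skip it.
class ByteBuffer {
public:
    explicit ByteBuffer(std::size_t size, bool zeroInit = true)
        : data_(static_cast<unsigned char*>(zeroInit ? std::calloc(size ? size : 1, 1)
                                                     : std::malloc(size ? size : 1))),
          size_(size) {
        if (!data_) throw std::bad_alloc();
    }
    ~ByteBuffer() { std::free(data_); }
    ByteBuffer(const ByteBuffer&) = delete;
    ByteBuffer& operator=(const ByteBuffer&) = delete;

    // Grows or shrinks the block. Existing bytes are preserved; bytes added
    // beyond the old size are zeroed only if zeroInit is true.
    void resize(std::size_t newSize, bool zeroInit = true) {
        void* p = std::realloc(data_, newSize ? newSize : 1);
        if (!p) throw std::bad_alloc();  // on failure data_ is still valid
        data_ = static_cast<unsigned char*>(p);  // realloc freed the old block if it moved
        if (zeroInit && newSize > size_)
            std::memset(data_ + size_, 0, newSize - size_);
        size_ = newSize;
    }

    unsigned char* ptr() { return data_; }
    std::size_t size() const { return size_; }

private:
    unsigned char* data_;
    std::size_t size_;
};
```

Passing `zeroInit = false` skips the calloc/memset work, which only matters for large buffers whose contents are about to be overwritten anyway.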
lexikos wrote: I did not say it is not for structs. I said it was not a substitute for struct support, meaning that proper struct support, which would be much better than just being able to write the fields in sequence, may still be added (much) later.
Can't we just take part of the work done in AutoHotkey_H and turn that 'much later' into 'a few days'? ;)
A Clone method (or, much better, making BufferAlloc accept a Buffer object as its first parameter) and a new MemCopy function may be useful: MemCopy(TargetSrc, TargetDest [, Bytes]). This function may call memmove, which seems safer than memcpy, but surely slower.
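
For reference, the difference between the two C functions only shows up when the source and destination overlap: memcpy has undefined behavior in that case, while memmove copies as if through a temporary buffer. A minimal C++ sketch of the proposed function, keeping the (source, destination, bytes) parameter order from the proposal; `MemCopy` here is hypothetical, not an existing AutoHotkey function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical MemCopy(src, dest, bytes), following the proposed parameter
// order. memmove is used so that overlapping ranges are copied correctly;
// memcpy would have undefined behavior for them.
void MemCopy(const void* src, void* dest, std::size_t bytes) {
    std::memmove(dest, src, bytes);
}
```

For example, shifting "abcd" two bytes to the right within "abcdef" overlaps source and destination, and memmove yields "ababcd".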
* Shouldn't the current buffer be released before replacing it with the new one (mData = new_data;)?
Helgef

13 May 2019, 14:21

Shouldn't the current buffer be released before replacing it with the new one?: mData = new_data;
Assuming you refer to BufferObject::Resize, it uses realloc, which can return the same pointer or move the entire buffer to a new location, in which case realloc itself frees the old buffer. So no additional action is required.

Cheers.
lexikos

13 May 2019, 15:08

I assumed that the guarantees provided by initialization would outweigh the loss of performance, but I might have underestimated the latter. Also, Helgef pointed out BufferAlloc() doesn't currently zero-initialize, so one can compare the performance of b := BufferAlloc(0), b.Size := n to b := BufferAlloc(n), b.Size := n or even b := BufferAlloc(n+1), b.Size := n.

If you want AutoHotkey_H's implementation of structs as is, you can use AutoHotkey_H. I will not be using any code from AutoHotkey_H without thorough review. I will not be working on structs before v2.0, except in the unlikely event that it particularly takes my interest. If you've been paying attention, you can see how much I can get done in a few days when I'm actually interested.

ClipboardAll(buf.Ptr, buf.Size) actually gives you a clone of the Buffer, but with a different type name. It wouldn't be BufferAlloc() in that case, but BufferFrom(). When it comes to low level stuff like this, I think the added clarity is important. Incidentally, the name was inspired by Node.js, which also has Buffer.allocUnsafe() for skipping initialization.
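
A minimal C++ sketch of the BufferFrom() idea: allocate a new block and copy the bytes into it, so the result owns an independent copy. `bufferFrom` is a hypothetical name, and std::vector stands in for a Buffer object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical BufferFrom(ptr, size): allocate a new block and copy `size`
// bytes from `ptr` into it. The result owns its memory, so later changes to
// the source do not affect it.
std::vector<unsigned char> bufferFrom(const void* ptr, std::size_t size) {
    const auto* bytes = static_cast<const unsigned char*>(ptr);
    return std::vector<unsigned char>(bytes, bytes + size);
}
```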
This function may call memmove, which seems safer than memcpy, but slower surely
In the context of a function called by the script, I doubt that it would make any real difference. If you use DllCall, for instance, msvcrt\memcpy vs. msvcrt\memmove makes zero difference.
lexikos

13 May 2019, 15:30

I didn't see the third page before.

@just me: Pos, Write and Read would be more appropriate for a Stream, Pointer, or some such. This is just a Buffer. I did consider implementing something like a bounds-checked pointer type at some future date. As much as these things can make dealing with structs or binary data easier, I do not see a need to design and develop them right now.

Actually, Ptr (pointer) might be mistaken for the Buffer's current position, if it had that. Anyway, I prefer the composable approach, where Buffer is just a block of memory and can be used as the basis of various other objects. For instance, AutoHotkeySC.bin already contains a C++ class TextMem, which allows one to treat a block of memory as a file. It won't take much to create a File object from a Buffer.
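
A C++ sketch of the composable approach described here, assuming the buffer stays a plain block of memory and a separate object layers a position plus bounds-checked Read/Write on top of it. `MemStream` and its members are hypothetical; this is not the TextMem class mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical stream layered on top of a plain block of memory: the block
// itself has no position; the stream object adds Pos, Read and Write.
class MemStream {
public:
    MemStream(unsigned char* data, std::size_t size) : data_(data), size_(size) {}

    std::size_t pos() const { return pos_; }
    void seek(std::size_t p) {
        if (p > size_) throw std::out_of_range("seek past end");
        pos_ = p;
    }

    template <typename T>
    void write(const T& v) {
        check(sizeof v);
        std::memcpy(data_ + pos_, &v, sizeof v);
        pos_ += sizeof v;
    }

    template <typename T>
    T read() {
        check(sizeof(T));
        T v;
        std::memcpy(&v, data_ + pos_, sizeof v);
        pos_ += sizeof v;
        return v;
    }

private:
    // Bounds check: refuse any access that would run past the end of the block.
    void check(std::size_t n) const {
        if (n > size_ - pos_) throw std::out_of_range("access past end of buffer");
    }

    unsigned char* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};
```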

Buffer.Data is just ClipboardAll.Data. I didn't write any new code for it. It was, and is, used to pass the data to FileAppend, although now you can pass the object itself to File.RawWrite(). I planned to let FileAppend accept an object too, but it didn't seem urgent. Data returns a copy of the data, which might be useful in some other context as well. On the other hand, in other ways I was trying to get away from using strings for binary data, so I'll probably remove Data.

ClipboardAll works around the odd-byte truncation by rounding up. The additional byte doesn't make any difference to the Clipboard variable. Rounding up for the memory block while also retaining the original size (which I figured would be useful for Buffer) seemed like it would add complexity that would outweigh the benefits, so I left it as is for now.
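
The rounding itself is simple; the extra complexity is in tracking two sizes. A C++ sketch, assuming the allocation is rounded up to a whole number of UTF-16 code units while the logical size is kept separately; `roundUpToEven`, `SizedBlock` and `layoutFor` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Round a byte count up to the next multiple of sizeof(char16_t) (2), so an
// odd-sized block can be viewed as whole UTF-16 code units without dropping
// its last byte.
constexpr std::size_t roundUpToEven(std::size_t n) {
    return (n + 1) & ~static_cast<std::size_t>(1);
}

// Hypothetical bookkeeping: the padding byte is allocated but not counted
// in the logical size reported to the script.
struct SizedBlock {
    std::size_t size;      // logical size
    std::size_t capacity;  // allocated size, rounded up to an even number
};

constexpr SizedBlock layoutFor(std::size_t n) { return {n, roundUpToEven(n)}; }
```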

As I said, I implemented only the very bare minimum to replace SetCapacity/GetAddress for now. My purpose was to eliminate those methods so that I could continue working on splitting Object into Array and Map without having to update the methods or reduce the objects' flexibility.
nnnik

14 May 2019, 05:57

Love the recent changes :) <3
lexikos

18 Oct 2019, 22:49

I don't think I (explicitly) mentioned one of the problems with NumPut (hypothetically) taking C struct alignment requirements into account by default: adding padding is easier and more intuitive than removing unwanted padding, such as when the struct has non-default requirements, or if NumPut is being used for something that isn't a C struct (like file/network data).
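
An illustration of the point in C++: for a packed layout (here requested with `#pragma pack`, which GCC, Clang and MSVC all accept), writing the fields back to back already produces the right bytes, whereas a writer that inserted natural-alignment padding would typically move the 4-byte field to offset 4 and would need a way to opt out. `PackedHeader` and `writeSequential` are hypothetical, loosely modeled on a 14-byte file header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
// Hypothetical packed header: 2 + 4 + 2 + 2 + 4 = 14 bytes, no padding.
struct PackedHeader {
    std::uint16_t type;
    std::uint32_t fileSize;
    std::uint16_t reserved1;
    std::uint16_t reserved2;
    std::uint32_t dataOffset;
};
#pragma pack(pop)

// Write the same fields back to back into `out`, as a sequential NumPut
// would, and return the number of bytes written.
std::size_t writeSequential(unsigned char* out, std::uint16_t type,
                            std::uint32_t fileSize, std::uint32_t dataOffset) {
    unsigned char* p = out;
    auto put = [&p](const auto& v) { std::memcpy(p, &v, sizeof v); p += sizeof v; };
    put(type);
    put(fileSize);
    put(std::uint16_t{0});  // reserved1
    put(std::uint16_t{0});  // reserved2
    put(dataOffset);
    return static_cast<std::size_t>(p - out);
}
```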
nnnik

19 Oct 2019, 05:17

NumPut shouldn't consider struct padding; that much is obvious.
But we also need built-in support for structs, even more than we need NumPut, because most of the time NumPut is used to access C structs.
lexikos

19 Oct 2019, 07:10

That was mainly for fincs' benefit (I was going to link to this topic), but I may have been misreading this:
fincs wrote:
15 Oct 2019, 19:25
NumPut's new mode is nice and with a bit of wrapping/bound-function magic it could be used to serialize structs in one go; however in practice it might not be as useful due to it not obeying struct packing rules (basically, all struct fields are aligned to the required alignment of their type, which is often just the size of the type itself). An alternative solution would be to use explicit padding fields in struct wrappers, but it might not be a great solution.
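
The rule fincs describes amounts to "round each field's offset up to a multiple of its type's alignment". A C++ sketch that computes field offsets that way, so they can be checked against offsetof; `alignUp` and `fieldOffsets` are hypothetical helpers, and the trailing padding at the end of the struct is not computed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round offset up to the next multiple of align (align must be a power of two).
constexpr std::size_t alignUp(std::size_t offset, std::size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

// Hypothetical helper: offsets of the fields of a C struct whose members are
// Ts..., in declaration order, where each field starts at a multiple of its
// type's alignment.
template <typename... Ts>
std::array<std::size_t, sizeof...(Ts)> fieldOffsets() {
    std::array<std::size_t, sizeof...(Ts)> out{};
    const std::size_t sizes[] = {sizeof(Ts)...};
    const std::size_t aligns[] = {alignof(Ts)...};
    std::size_t offset = 0;
    for (std::size_t i = 0; i < sizeof...(Ts); ++i) {
        offset = alignUp(offset, aligns[i]);  // insert padding if needed
        out[i] = offset;
        offset += sizes[i];
    }
    return out;
}
```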
I don't know whether fincs has read this topic yet. If not:
lexikos wrote:
13 May 2019, 05:28
I changed my mind about handling alignment, as it would require scanning through the parameter list twice - once to calculate the alignment requirement, and again to put the numbers - and it might be unwanted. I do not want to add another parameter to NumGet or NumPut.

Instead, NumPut just puts the numbers in sequence, the same as if you nested calls. If you had to deal with alignment requirements before, you still do.
I'm not sure what "might not be as useful" means; I certainly believe that having the new mode is more useful than not having it, but it obviously does not serve as a ready-made struct implementation, nor is it intended to. The new parameter mode is just a low cost change that makes the function more suitable for a purpose it is already serving.

One may benefit from the new mode of NumPut even when creating structs which require explicit padding.

If you are using Bind to predefine the field types, you can also use it to predefine the padding (a type and an arbitrary value).

Often, padding can be handled by promoting the previous field to a larger type. For example: If an int has 4 bytes of padding after it because the struct contains an int64, promote the int to int64. If the padding is needed only on x64 because the struct contains a pointer, promote the int to ptr.
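
Both workarounds sketched in C++, for an assumed layout { int32 id; 4 bytes of padding; int64 value; }. `put`, `putWithPadding` and `putPromoted` are hypothetical; `putWithPadding` plays the role of a bound function with the padding predefined, and the promotion trick assumes a little-endian machine (such as x86/x64), where the low 4 bytes of the promoted int64 hold the original int.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Minimal sequential writer (hypothetical): writes v at p, returns p + sizeof v.
template <typename T>
unsigned char* put(unsigned char* p, T v) {
    std::memcpy(p, &v, sizeof v);
    return p + sizeof v;
}

// Workaround 1: an explicit padding field with an arbitrary value, fixed once
// in a wrapper, in the spirit of using Bind to predefine the field types.
unsigned char* putWithPadding(unsigned char* p, std::int32_t id, std::int64_t value) {
    return put(put(put(p, id), std::int32_t{0}), value);
}

// Workaround 2: promote the field before the padding to the larger type, so
// its upper bytes cover the padding (little-endian assumed).
unsigned char* putPromoted(unsigned char* p, std::int32_t id, std::int64_t value) {
    return put(put(p, static_cast<std::int64_t>(id)), value);
}
```

In both cases the value lands at offset 8 and a reader of the id field sees the original int; only the padding bytes may differ (for negative ids), and a C struct ignores those.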
fincs

19 Oct 2019, 16:55

Just for the record:
lexikos wrote: I'm not sure what "might not be as useful" means; I certainly believe that having the new mode is more useful than not having it, but it obviously does not serve as a ready-made struct implementation, nor is it intended to. The new parameter mode is just a low cost change that makes the function more suitable for a purpose it is already serving.
I assumed the new mode was primarily intended to be used for wrapping structs, which is why I raised the point of obeying alignment rules. I now know alignment rules are in the end a non-issue:
lexikos wrote: If you are using Bind to predefine the field types, you can also use it to predefine the padding (a type and an arbitrary value).
Often, padding can be handled by promoting the previous field to a larger type.
I hadn't thought of those workarounds and yes, that would cancel out my concerns. Thanks for the insight.
lexikos

19 Oct 2019, 17:10

It was primarily motivated by structs, but works just as well for everything else. Even if you had to break it into multiple calls wherever an offset was needed, it would still be an improvement over requiring exactly one call for each field. Even if you're calling it once for each field, it can be an improvement - when you nest calls, instead of being inside-out...

NumPut(y, NumPut(x, pt, "int"), "int")
...it is right to left:

NumPut("int", y, NumPut("int", x, pt))

Return to “AutoHotkey v2 Development”
