The OP contains significant exaggeration and some false assumptions.
nnnik wrote: ↑
07 Feb 2019, 08:35
Closing the file will force windows to completely flush the buffer onto the disk before execution continues.
Microsoft's documentation explicitly states otherwise:
Closing the File object will cause it to pass the contents of its write buffer to WriteFile and then call CloseHandle. It does not force anyone to write the contents to disk (even if the file is on a disk).
The File object and FileAppend both use an internal class named TextStream. TextStream currently has an 8KB buffer that it uses for both reads and writes, but for simplicity, never uses it for both purposes at the same time. This is why calling Read(0) flushes the write buffer. The buffer is also flushed if you call File.__Handle (or in v2, File.Handle), so that the position of the file pointer and actual content of the file (as seen by whatever Win32 function you're calling) will be what you expect. However, this only synchronizes the File object's perspective with the Win32 one; it does not guarantee that data is written to physical storage.
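The difference between the program's buffer, the OS's view of the file and physical storage can be seen outside AutoHotkey as well. The following is a minimal Python sketch of the idea, not AutoHotkey's TextStream: Python's buffered file object stands in for the File object's 8KB buffer, and the helper name observe_layers is hypothetical.

```python
import os
import tempfile

def observe_layers(data=b"hello"):
    """Return the file size the OS reports before and after flushing."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # An 8 KB user-space buffer, standing in for the File object's buffer.
        with open(path, "wb", buffering=8192) as f:
            f.write(data)                   # stays in the program's own buffer
            before = os.path.getsize(path)  # the OS has not been given the data yet
            f.flush()                       # hand the buffer to the OS (comparable to WriteFile)
            after = os.path.getsize(path)   # now visible to other OS calls
            os.fsync(f.fileno())            # only this asks the OS to write to the device
        return before, after
    finally:
        os.remove(path)

before, after = observe_layers()
```

Here the flush only makes the program's view and the OS's view agree; asking for the data to reach the device is a separate request.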
When you read data from a file, the file pointer is advanced by as many bytes as were read. When you write data to a file, it is written at the position indicated by the file pointer, which is then moved by as many bytes as were written. So in effect, writing n bytes will cause the next n bytes (if there are that many) to be overwritten and therefore not be returned by the next read. Reads and writes could share the buffer by emulating this behaviour. Doing so would improve the performance of scripts that intersperse reads and writes within an 8KB block. In that case, Read(0) would have no reason to flush the write buffer, so it's probably better to use File.Handle if you want to flush the buffers.
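The file pointer behaviour described above is easy to check with any seekable stream. This is a small Python sketch using an in-memory stream as a stand-in for a file; the function name read_write_read is made up for illustration.

```python
import io

def read_write_read():
    f = io.BytesIO(b"0123456789")  # in-memory stand-in for a file
    first = f.read(3)              # pointer moves from 0 to 3
    f.write(b"XY")                 # overwrites the bytes at 3 and 4; pointer moves to 5
    second = f.read(2)             # continues from 5, so the overwritten bytes are not returned
    return first, second, f.getvalue()

first, second, content = read_write_read()
```

The two bytes that were overwritten ("34") are never returned by a read, which is the behaviour a shared read/write buffer would have to emulate.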
The File object performs buffering because calls to ReadFile and WriteFile are costly. I believe this is mostly because of the overhead of switching to kernel mode (and perhaps calling the file system driver), not because of disk write performance. Originally when Unicode support was added, ReadFile and WriteFile were called very frequently (perhaps for every character). When I tested the performance, I found that the overhead of these calls dwarfed any differences caused by storage devices (such as between a hard disk drive, SSD or RAM disk).
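To give a rough sense of how much buffering reduces the number of underlying write calls, here is a Python sketch. It only counts calls and does not measure kernel-mode overhead; CountingRaw is a hypothetical stand-in for a file handle, and Python's io.BufferedWriter stands in for the 8KB buffer.

```python
import io

class CountingRaw(io.RawIOBase):
    """Hypothetical stand-in for a file handle: counts write calls, stores nothing."""
    def __init__(self):
        super().__init__()
        self.calls = 0
        self.total = 0

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.total += len(b)
        return len(b)  # report that every byte was accepted

def count_calls(n_chars=10000, buffer_size=8192):
    # One underlying call per character, as in the unbuffered case.
    unbuffered = CountingRaw()
    for _ in range(n_chars):
        unbuffered.write(b"x")

    # The same writes routed through an 8 KB buffer.
    raw = CountingRaw()
    buffered = io.BufferedWriter(raw, buffer_size=buffer_size)
    for _ in range(n_chars):
        buffered.write(b"x")
    buffered.flush()
    return unbuffered.calls, raw.calls, raw.total

direct_calls, buffered_calls, buffered_total = count_calls()
```

With one call per character the underlying write is invoked 10000 times, while the buffered version needs only a handful of calls to deliver the same bytes.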
nnnik wrote: ↑
11 Feb 2019, 01:06
Edit: Now I tested it myself and the buffer is indeed not flushed when I crash the script - I guess this is to prevent the program from writing bad data to the file - I need to change some stuff in the topic.
The buffer you are flushing is in the TextStream object (within the File object), which the OS knows nothing about. The buffer is not deliberately discarded; it is lost because the program has crashed and no further program logic is executed. It is like accumulating your data in a variable with the intention of passing it to FileAppend, but then having the program crash before FileAppend is called.
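The same effect can be reproduced in other languages that buffer writes in the program itself. The sketch below is Python, not AutoHotkey, and makes some assumptions: os._exit is used to simulate a crash in a child process, and the helper name size_after_crash is hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Child script: writes into a user-space buffer, then exits abruptly.
CHILD = """
import os, sys
f = open(sys.argv[1], "wb", buffering=8192)
f.write(b"buffered but never flushed")
os._exit(1)  # simulated crash: no further program logic runs
"""

def size_after_crash():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        subprocess.run([sys.executable, "-c", CHILD, path])
        return os.path.getsize(path)  # what actually reached the OS
    finally:
        os.remove(path)

lost_size = size_after_crash()
```

Nothing discards the data on purpose; the process simply ends before the code that would have passed the buffer to the OS gets a chance to run.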
nnnik wrote: ↑
07 Feb 2019, 08:35
Each time you close a file Windows will back up this new version of the file in addition to removing the old one.
I think you have a mistaken idea about how shadow copies work. Shadow copies are taken on request, such as when backup software runs or a system restore point is created, not whenever you write to a file. Repeatedly opening and closing a file will not cause multiple backups of the file. Backups of the user's files might never be (and generally are not) created via shadow copy without the use of additional software (or configuring File History).
As for your assumption that an antivirus would wait until the file is closed, I think it is unfounded. I would think that most antivirus programs are like a black box to anyone but the developer, and each one may be different; perhaps some behave the way you say, and some do not.
Expressions have overhead, and objects have more overhead. In some cases the overhead of FileOpen + File.Write can outweigh any performance gain received from buffering and keeping the file open, and it is probably always slower than accumulating the data in a variable and writing once with FileAppend. Whether it helps or hinders performance should be confirmed by benchmarking your script, if that kind of performance matters to you. But if you can't see the difference without benchmarks, it probably doesn't matter in the real world.
It appears that all of the arguments against FileAppend so far are based on the assumption that it opens and closes the file each time, but consider Loop Read's OutputFile parameter. In that case, the first call to FileAppend which omits the Filename opens the output file, and it is closed only when the loop stops. Each call to FileAppend (omitting the Filename) utilizes the same buffering as File.Write.
jeeswg wrote: ↑
07 Feb 2019, 15:09
But is it possible to reserve file space for the text? Otherwise the file appending via the File object might be inefficient.
How does this relate specifically to the File object and not FileAppend? You could accumulate data into a variable and then pass it to File.Write once, or you could call FileAppend repeatedly.
In fact, you can "reserve file space" by setting File.Length, but there is probably no guarantee that the OS will physically reserve space on the disk, or that it will do so efficiently, or that the reserved space will be contiguous (it could be fragmented).
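As a rough analogy to setting File.Length, most languages let you set a file's length without writing any data. This Python sketch uses truncate for that; the helper name reserve_length is hypothetical, and whether the space is physically allocated depends on the OS and file system.

```python
import os
import tempfile

def reserve_length(size=1024 * 1024):
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "r+b") as f:
            f.truncate(size)          # set the file's length without writing data
        return os.path.getsize(path)  # reported length, not necessarily allocated space
    finally:
        os.remove(path)

reserved = reserve_length()
```

The reported size changes immediately, but that says nothing about whether the space is contiguous or even allocated on the disk yet.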
File.Write typically only writes when the 8KB buffer is near full, whereas FileAppend writes immediately; but even so, the file system has a minimum allocation unit (often 4KB) which would reduce the chance of small writes causing fragmentation.
swagfag wrote: ↑
07 Feb 2019, 17:32
ultimately calls WriteFile. can it be altered to expose nNumberOfBytesToWrite
1. FileOpen corresponds to CreateFile, not WriteFile. It just opens the file handle.
2. When you write text, you don't want to specify the number of bytes. You want exactly that text encoded into bytes and written, and that's what the File object's methods do, if you ignore buffering.
3. RawWrite accepts a byte count.
If you want data flushed from the buffer immediately by each Write/RawWrite call, what you need is an option for that, not to expose some parameter of WriteFile. (This reminds me of the XY problem