Why you shouldn't use FileAppend to continuously write to the disk


Post by nnnik » 07 Feb 2019, 08:35

Some stuff first
If you use AHK and have had to write something to a file, you have probably used the FileAppend command.
It is a command that lets you append plain text to plain text files.
It's stupidly simple: you state a piece of text and name a file to append it to, and AHK will append the text or create a new file with that text.

Code: Select all

Loop, 99
	FileAppend, %A_Index%, numbers.txt
This piece of code writes all numbers from 1 to 99 into a file named numbers.txt, one after another without separators.
It's really simple, but it's also really bad code that can cause major issues on newer systems.

Code: Select all

file := FileOpen("numbers.txt","a")
Loop 99
	file.write(A_Index)
file.close()
This code is longer and many people will say it's a lot more complex. Regardless, it's a lot less problematic.

Each FileAppend command is equivalent to a few actions with a file object:

Code: Select all

Loop 99 {
	file := FileOpen("numbers.txt", "a")
	file.write(A_Index)
	file.close()
}
If you look at this, you might see that a few of these actions are rather poorly optimized.
You might think that closing and opening a file is negligible - however that's not the case.
You are dealing with the file system: a layer on top of the storage hardware that is heavily optimized, backed up and relevant for security.
Opening and closing a file is not designed to be fast, because it is not expected to happen that often.
As a result each open and close takes a comparatively long time - however once a file is open you can write a lot to it very quickly.
And when you close it, all of this will have been written to the file on the disk.

Here are the 3 biggest problems that repeatedly opening and closing a file like this can cause:

1. Not using the buffer
While the file is open you can always tell Windows to write new data to the file.
Windows will then take that data and slowly write it to the file - however it will tell your program to move on before it has even finished writing.
The data is buffered before it's slowly written to the file. When you write more data while the file is still open, it just gets added to the buffer.
Windows will only tell your program to wait when this buffer is full and it can't buffer any more - however that is unlikely to happen easily.
This buffering is important for making your program faster. It is also used to increase the lifetime of SSDs.
Closing the file forces Windows to completely flush the buffer onto the disk before execution continues.
Excessively cycling through opening and closing completely fails to make use of this feature.
This can massively decrease the performance of your script while also damaging your users' hardware in the long run.
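
To get a feeling for the difference on your own machine, a rough timing sketch could look like the following. The scratch file names, the iteration count and the use of A_TickCount for timing are assumptions for illustration and not part of the original post; absolute numbers will vary a lot between systems.

```autohotkey
#NoEnv
SetBatchLines, -1

iterations := 2000                               ; assumed size, raise it to widen the gap
appendPath := A_Temp . "\bench_fileappend.txt"   ; hypothetical scratch files
objectPath := A_Temp . "\bench_fileobject.txt"
FileDelete, %appendPath%
FileDelete, %objectPath%

; Variant 1: FileAppend opens and closes the file on every iteration.
start := A_TickCount
Loop, %iterations%
	FileAppend, % A_Index . "`n", %appendPath%
appendMs := A_TickCount - start

; Variant 2: open once, write many times, close once.
start := A_TickCount
file := FileOpen(objectPath, "a")
if (!IsObject(file)) {
	MsgBox, Could not open %objectPath%
	ExitApp
}
Loop, %iterations%
	file.Write(A_Index . "`n")
file.Close()
objectMs := A_TickCount - start

MsgBox, % "FileAppend: " . appendMs . " ms`nFile object: " . objectMs . " ms"
```

Run it a few times before drawing conclusions, since anti-virus scanners and caching can distort a single run.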

2. Shadow copies
Starting with one of the newer Windows versions (I think Vista or 7), Windows keeps backups of old versions of your files.
This is so that you can restore them or so that Windows can analyze and optimize its programs.
Regardless of whether you like this feature or not - the fact is that this feature is active on many if not most Windows PCs.
And don't even dream about your users disabling this feature just for your program.
Each time you close a file, Windows will back up this new version of the file in addition to removing the old one.
If you are appending like this, Windows will have to start moving around some serious data once your file gets large enough.
Each successive close operation will take longer. A file with 99 lines might be fine, but what about 300k lines? What if each new line contained hundreds of characters instead of 1 or 2?
Once again this might decrease the durability of your users' hardware, and it massively decreases performance.

3. Other programs
If you actively use this PC for more than producing a single AutoHotkey script, you probably have other programs installed.
People commonly have anti-virus programs installed, and these tend to take a look at the files that programs write.
People might also have a synchronised folder like Dropbox or similar, and these tend to take a look at new files that programs write in order to upload them to their cloud.
Of course, while the file is open the cloud folder won't move an inch. The anti-virus program might jump into action a bit and take a look at the file that is being written.
But honestly, only if there are enough resources and your PC doesn't have anything else to do.
However, after the file is closed these programs jump into action and scream: "Now it is my time to block your resources and hog your PC to find out if this is dangerous" or "Now it is my time to block your resources and hog your internet to upload this new file version to your cloud folder", upon which the anti-virus program once again jumps into action to check this new action that's being taken.
If you close and open repeatedly, each program will jump into action every single time and do its thing again.
You won't be able to write to your file until these programs are done with it.
In the best case your program will become extraordinarily slow. Consider that the anti-virus program performs a heuristic search on the entire file that you just touched with FileAppend and on the process that is still running.
And perhaps it might find these consecutive writes suspicious and do a complete search each time. Multiple minutes for each new open and close.
Perhaps it might even find that this behavior is a bother in itself, toss the program from the PC and give a malware warning.
In the worst case FileAppend will fail silently - as all things tend to fail silently in AHK v1.
Your Dropbox provider might even block you from uploading new data automatically because you are using too much of their bandwidth.
You don't really want to harm the users of your programs, do you?
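
Since failures are silent by default, it can help to check for them explicitly. The following is only a minimal sketch: it relies on FileAppend setting ErrorLevel on failure and on FileOpen returning a non-object when the file cannot be opened, and the file name and message texts are placeholders.

```autohotkey
; FileAppend reports failure only through ErrorLevel.
FileAppend, some text`n, numbers.txt
if (ErrorLevel)
	MsgBox, % "FileAppend failed, A_LastError = " . A_LastError

; FileOpen does not return a file object when the file cannot be opened.
file := FileOpen("numbers.txt", "a")
if (!IsObject(file)) {
	MsgBox, % "FileOpen failed, A_LastError = " . A_LastError
	ExitApp
}
file.Write("some text`n")
file.Close()
```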

Open your files when you start writing to them - close them once there is nothing more that you will write to them.
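
For a script that keeps running, this rule could be sketched roughly as follows: open the file once at startup, write whenever something happens, and close it in an exit handler. The log file name, the timer and the function name CloseLog are made up for this illustration, and the function form of OnExit assumes AHK v1.1.20 or later.

```autohotkey
#NoEnv
#Persistent

global logFile                          ; super-global so the exit handler can see it
logFile := FileOpen(A_ScriptDir . "\events.log", "a")   ; hypothetical file name
if (!IsObject(logFile)) {
	MsgBox, Could not open the log file.
	ExitApp
}
OnExit("CloseLog")                      ; close the file when the script exits
SetTimer, LogTick, 1000                 ; stand-in for whatever produces the data
return

LogTick:
	logFile.Write(A_Now . " tick`r`n")  ; the file stays open between writes
return

CloseLog(exitReason, exitCode) {
	logFile.Close()                     ; returning nothing lets the exit proceed
}
```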

Additional interesting results:

Keeping a safe logfile:
Using File.write will not immediately write to the file - instead the data will be written to the file when the system pleases.
Reading from the file object that you are writing to makes it write the buffered data to the file immediately.
When a crash occurs, Windows will not write the buffer that's currently held by the file object onto the disk - meaning that you might lose the data that's in there even if you used file.write.
In order to prevent that you can use file.read(0) to immediately write the buffer to the disk.
Further research is necessary on that matter though, as I do not know the performance impact.
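
A minimal sketch of such a logfile, based on the file.read(0) trick described above, might look like this. The helper name WriteLogLine and the file name are hypothetical, and the assumption that read(0) flushes the pending data is taken from this post rather than checked against every AHK version.

```autohotkey
logFile := FileOpen(A_ScriptDir . "\crash_safe.log", "a")   ; hypothetical file name
if (!IsObject(logFile)) {
	MsgBox, Could not open the log file.
	ExitApp
}

WriteLogLine(logFile, "step 1 done")
WriteLogLine(logFile, "step 2 done")
logFile.Close()
ExitApp

; Writes one line, then reads zero bytes so the buffered data is written out.
WriteLogLine(file, text) {
	file.Write(A_Now . " " . text . "`r`n")
	file.Read(0)   ; assumption from the post: forces the pending write buffer out
}
```

Flushing after every line gives up part of the buffering benefit, so this trades some speed for safety.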

The batch size problem:
Now you might ask yourself "why don't we just collect the entire text in a variable and write it to the disk once the text creation is done?".
And in my opinion that's fine. It's a good solution - however there are a few situations where you might prefer using file.write during the loop:

You have a lot of data and cannot hold it all in memory.
You want to make sure at least part of it is saved even if a crash occurs during text generation.
You generate a lot of data that you need to write to the disk as fast as possible.

Generally however there isn't a lot of difference between the two alternatives - so you might just stick with the one that you like more.
I think both times you are just being lazy - which isn't a bad thing, you need to avoid extra work to get things done.

Of course when looping 99 times and just writing the number down each iteration we don't generate a lot of data.
So it's easy to imagine that the system won't write to the disk until we close the file anyway - you will call file.write every time but it won't have any effect.
But what if we loop 4,000,000 times? Sure, the first 100 times the system won't act on us calling file.write, but what about the 101st time?
It just might be enough data to cause the system to write to the disk. However that also means that we made 100 useless attempts at doing that.
Of course the system will make the writes highly optimized with regard to many things, taking into account things we cannot possibly imagine.
However these considerations take time. If we want maximum performance we should avoid triggering them too often.

So what we do in this case is pack a few iterations into a batch and notify the system of the results of this whole batch at once.
In this setup 'calling file.write every iteration' would be a batch size of 1 and 'calling it once at the end' would be a batch size of infinity.
Both are equally lazy and probably equally bad. Example code:

Code: Select all

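#NoEnv
SetBatchLines, -1
file := FileOpen("test.txt", "w")
batchBuffer := "" ; collects the output of one batch
batchSize := 4000 ; iterations per file.write call

maxNr := 1000000

Loop %maxNr% {
	batchBuffer .= A_Index
	; hand the batch over when it is full or on the last iteration
	if (!mod(A_Index, batchSize) || A_Index = maxNr) {
		file.write(batchBuffer)
		batchBuffer := ""
	}
}
file.close() ; writes any remaining buffered data and releases the file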

Re: Why you shouldn't use FileAppend

Post by gregster » 07 Feb 2019, 09:26

Using the File Object is all fine and dandy with me, but it is a statement like "I just want to say that I will move or delete their posts and not waste any breath on them" that "triggers" me.

I would reconsider this statement - this is not adequate for an admin of this site. Feel free to ignore other opinions, but to suppress a discussion (or certain opinions) before it even started (while we don't know if there ever will be one), seems to ignore the spirit of this forum, totally independent from the actual topic of this thread. At least, I know of no forum rule that prohibits dissent (even if it is wrong or considered wrong). This doesn't mean that obvious trolling attempts cannot be moderated (although I would leave this to other mods or admins which are not involved in this topic).

Anyway, thanks for the post about the File Object!

Re: Why you shouldn't use FileAppend

Post by nnnik » 07 Feb 2019, 09:33

I've just had it with the discussion about this topic. Even if I don't move their posts, I'll at the very least completely ignore them.
It just shows how completely exhausted I am from talking about this. I hope that nobody starts another discussion about it and ruins this topic for me, because I spend a lot of time writing topics like these.
If they keep getting ruined by the same discussion over and over again, maybe I'll just quit instead.

Re: Why you shouldn't use FileAppend

Post by gregster » 07 Feb 2019, 09:40

Well, surely nobody wants you to quit - but you should perhaps realise that differing opinions are not a personal attack on you or will devalue the good things you are doing here.

Re: Why you shouldn't use FileAppend

Post by garry » 07 Feb 2019, 14:24

I like to collect everything in a variable and write the variable to the file once (if needed).
Yes, FileAppend can only append; otherwise the existing file must be deleted first.
FileOpen allows the parameters a / w / rw ....

Code: Select all

loop,99
  e .= a_index . "`r`n"
fileappend,%e%,numbers.txt
e=
return
Question about the 3rd example (I haven't worked much with FileOpen yet):
isn't it better to collect into a variable first and then write once?

Code: Select all

f1 := A_Desktop . "\numbers.txt"
Loop, 99
	e .= A_Index . "`r`n"
; only the line above belongs to the loop; the file is opened and written once
file := FileOpen(f1, "w")
file.Write(e)
file.Close()
e =
return

Re: Why you shouldn't use FileAppend

Post by nnnik » 07 Feb 2019, 14:59

There isn't any real advantage to this over simply using the file object.
With the file object you start writing to the file from the moment your loop starts.
With this code the data is only appended once the loop has finished.
This might not seem like much at first, but a loop that potentially runs for hours could create a rather large amount of data.
Perhaps more than AutoHotkey's memory limit can hold.

Using your code is neither easier nor more difficult than using a file object.
In my opinion it's practically the same thing.
Why are you going out of your way to use an old feature that creates broken/bad code?

Re: Why you shouldn't use FileAppend

Post by jeeswg » 07 Feb 2019, 15:09

But is it possible to reserve file space for the text? Otherwise the file appending via the File object might be inefficient.

Re: Why you shouldn't use FileAppend

Post by garry » 07 Feb 2019, 15:11

have no experience with fileopen
but in your example I think not needed 99-times
- file := FileOpen(F1, "w")
and
- file.close()

your example :

Code: Select all

Loop 99 {
	file := FileOpen("numbers.txt", "a")
	file.write(A_Index)
	file.close()
}
idea :

Code: Select all

f1:= a_desktop . "\numbers2.txt"
file := FileOpen(F1, "w")  ;- write , not append
loop,99
	file.write(A_Index . "`r`n")
file.close()
run,%f1%
return
or see 2 examples above ( collect to variable and write once )

Code: Select all

f1 := A_Desktop . "\numbers.txt"
Loop, 99
	e .= A_Index . "`r`n"
; only the line above belongs to the loop; the file is opened and written once
file := FileOpen(f1, "w")
file.Write(e)
file.Close()
e =
return
Last edited by garry on 08 Feb 2019, 12:57, edited 1 time in total.

Re: Why you shouldn't use FileAppend

Post by nnnik » 07 Feb 2019, 16:35

garry wrote:
07 Feb 2019, 15:11
have no experience with fileopen
but in your example I think not needed 99-times
- file := FileOpen(F1, "w")
and
- file.close()

your example :

Code: Select all

Loop 99 {
	file := FileOpen("numbers.txt", "a")
	file.write(A_Index)
	file.close()
}
idea :

Code: Select all

f1:= a_desktop . "\numbers2.txt"
file := FileOpen(F1, "w")  ;- write , not append
loop,99
	file.write(A_Index . "`r`n")
file.close()
run,%f1%
return
My example was meant to showcase the behavior of FileAppend and show that doing what it does is bad.

@jeeswg - you could surely reserve the filesize in some way. However this is not a major concern.
The biggest concern is the excessive opening and closing of files.

Re: Why you shouldn't use FileAppend

Post by swagfag » 07 Feb 2019, 17:32

FileOpen ultimately calls WriteFile. can it be altered to expose nNumberOfBytesToWrite? probably. is that a common enough usecase to warrant implementing it? /shrug

Re: Why you shouldn't use FileAppend

Post by just me » 08 Feb 2019, 06:30

jeeswg wrote:
07 Feb 2019, 15:09
But is it possible to reserve file space for the text? Otherwise the file appending via the File object might be inefficient.
How would FileAppend reserve 'file space for the text'?

Re: Why you shouldn't use FileAppend

Post by jeeswg » 08 Feb 2019, 07:14

- You either: append text to a variable multiple times and use FileAppend *once*, or you open access to a file, and write *multiple* times. The latter would ideally have a reserved block of space, otherwise, depending on how it works, you might end up with heavily fragmented files.

Re: Why you shouldn't use FileAppend

Post by Helgef » 08 Feb 2019, 07:39

Good topic nnnik, thanks for sharing.

Cheers.

Re: Why you shouldn't use FileAppend

Post by jeeswg » 08 Feb 2019, 08:31

So what is the cost of using a File object to repeatedly write to a file, versus using FileAppend to write to a file once? Because if the costs are severe, perhaps it's a reason ...
Why you shouldn't use File objects

Re: Why you shouldn't use FileAppend

Post by nnnik » 09 Feb 2019, 04:45

From practical experience, a task that takes 30 seconds to 2 minutes with the file object requires 20+ minutes with FileAppend.
FileAppend doesn't have this optimization either, and you would end up with heavily fragmented files too.
If anything, files written with FileAppend will end up more fragmented because there are multiple successive writes to the disk.

Re: Why you shouldn't use FileAppend

Post by DRocks » 09 Feb 2019, 09:00

I am joining in and asking: why would you not just use the variable to write once, as others have mentioned?

Let's say you loop 99 times to build the variable and then just do either FileAppend or file object open, write, close?

I cannot think of a situation where the variable wouldn't be preferable to looping 99 times to disk.

Re: Why you shouldn't use FileAppend

Post by haichen » 09 Feb 2019, 10:05

On my Samsung SSD the ratio fileappendtime/fileobjectmethodtime is 13000 (10000 loops)! So slow... I would not have thought that. Thanks for this tutorial!

Re: Why you shouldn't use FileAppend

Post by garry » 09 Feb 2019, 12:16

How do you measure the FileAppend time?
In short:
Is the lifetime of an SSD longer if you use File.Write?
And is File.Write faster?

Code: Select all

f1:= a_desktop . "\Test.txt"
; e is the variable 
	file := FileOpen(F1, "w")
	file.write(e)
	file.close()
return
- or
Fileappend,%e%,%f1%

Re: Why you shouldn't use FileAppend

Post by nnnik » 10 Feb 2019, 03:53

DRocks wrote:
09 Feb 2019, 09:00
I am joining in and asking: why would you not just use the variable to write once, as others have mentioned?

Let's say you loop 99 times to build the variable and then just do either FileAppend or file object open, write, close?

I cannot think of a situation where the variable wouldn't be preferable to looping 99 times to disk.
Reason 1: The data is better off not in memory.
If you create a file that can grow very large, it might be wise not to store it all in memory.
You might use up memory that the user needs for other programs. Both size and bandwidth are limited.

Reason 2: The system will make that variable for you:
Using File.write may not directly lead to something being written to the disk.
With File.write we tell the system that this data should be written to the disk.
However, since we keep the file open we also tell the system that more data will come.
The system can then decide when the optimal time for flushing the accumulated data to disk is.
It will take things into consideration that we cannot deal with in our scripts.
That being said, those considerations take time.

Reason 3: The data won't get lost on a crash.

Reason 4: You don't have to wait until you are done looping for things to get written to disk.
Using this method you will be faster overall than waiting until your loop is done and then writing everything to disk.

My opinion:
Generally though I wouldn't say you are completely wrong.
This problem is essentially about batch execution. If you know that your loop won't produce enough data in one iteration to make the system write to the disk, then it might make sense not to tell the system that it might have to.
But what about every second iteration, or every tenth, or every thousandth? Then you just might have enough to make the system write to the disk.
So in order to include this knowledge about our task in the script we will write code like:

Code: Select all

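#NoEnv
SetBatchLines, -1
file := FileOpen("test.txt", "w")
batchBuffer := "" ; collects the output of one batch
batchSize := 4000 ; iterations per file.write call

maxNr := 1000000

Loop %maxNr% {
	batchBuffer .= A_Index
	; hand the batch over when it is full or on the last iteration
	if (!mod(A_Index, batchSize) || A_Index = maxNr) {
		file.write(batchBuffer)
		batchBuffer := ""
	}
}
file.close() ; writes any remaining buffered data and releases the file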
In the example I have given you can surely let the loop run 99 times before notifying the system.
But that's just because the batch size is larger than the size of the entire data set.
Depending on your problem you might want to shift those values around.
Always setting it to infinity by never telling the system inside the loop that data can be written is probably just as bad as setting it to 1 - that's just being lazy. (Not a bad thing imo)
At this point you might just stick to the side which you prefer or which offers the bigger benefits for your current problem.
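
If you want to find a sensible batch size for your own problem, one way is to simply time a few candidates. The sketch below does that; the function name TimeBatchedWrite, the scratch file, the candidate sizes and the smaller maxNr are all assumptions for illustration, and the results depend heavily on the machine.

```autohotkey
#NoEnv
SetBatchLines, -1

maxNr := 200000
results := ""
; the last candidate equals maxNr, i.e. "write once at the end"
for index, batchSize in [1, 100, 4000, maxNr]
	results .= "batchSize " . batchSize . ": " . TimeBatchedWrite(batchSize, maxNr) . " ms`n"
MsgBox, % results
ExitApp

; Writes the numbers 1..maxNr in batches and returns the elapsed milliseconds.
TimeBatchedWrite(batchSize, maxNr) {
	path := A_Temp . "\batch_test.txt"   ; hypothetical scratch file
	start := A_TickCount
	file := FileOpen(path, "w")
	if (!IsObject(file))
		return "open failed"
	batchBuffer := ""
	Loop %maxNr% {
		batchBuffer .= A_Index . "`n"
		if (!Mod(A_Index, batchSize) || A_Index = maxNr) {
			file.Write(batchBuffer)
			batchBuffer := ""
		}
	}
	file.Close()
	return A_TickCount - start
}
```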

@garry
In this topic I point out the consequences of repeatedly opening and closing a file, either by using FileOpen with an immediate close or by using FileAppend repeatedly.
I pointed out that using:

Code: Select all

file := FileOpen(F1, "w")
file.write(e)
file.close()
Is pretty much the same as using:

Code: Select all

Fileappend,%e%,%f1%
It's not about the general performance of FileAppend vs the file object but rather about how they perform in a specific situation.
When writing in a loop, File.Write performs better than FileAppend because it is the difference between doing

Code: Select all

file := FileOpen("test.txt","w")
Loop
	file.write(data)
file.close()
and this:

Code: Select all

Loop {
	file := FileOpen("test.txt","a")
	file.write(data)
	file.close()
}
;which is the same as
Loop
	FileAppend, %data%, test.txt

Re: Why you shouldn't use FileAppend

Post by garry » 10 Feb 2019, 04:59

@nnnik, thank you for explanation