Dealing with millions of files ...

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Bacon19331
Posts: 50
Joined: 09 Apr 2019, 20:45

Dealing with millions of files ...

09 Aug 2020, 23:40

So I'm making this script that potentially deals with millions of files. Basically, my script randomly selects a file and deals with it; once it's done, it FileAppends the file's ID to a list so it knows not to go over it again, and it uses FileRead with "if InStr" to search that list for the ID (if the ID is found, it goes back to randomly picking a new one). Is this the best way I could do this? I'm dealing with probably a hundred files per minute and always FileAppend after dealing with each file, which makes the list file really large, so over time it will take longer to FileRead it and scan it with InStr. Right? Will AHK slow down significantly if FileRead is reading a file that's 20 MB? 60 MB?

Second question: my script uses UrlDownloadToFile, FileRead, and if InStr. Do I have to put a Sleep between them to be safe, or will the script not move on to the next part until each one has completed? I'm trying to make my script as fast as possible. Thank you in advance :).
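
For reference, here is a minimal sketch of the loop described above, in AHK v1. The journal path and the GetRandomId() stub are placeholders invented for illustration, not taken from the actual script:

journal := A_ScriptDir . "\used_ids.txt"

Loop, 5                              ; small count just for the sketch
{
    id := GetRandomId()
    FileRead, usedIds, %journal%     ; re-read the whole journal every pass
    if InStr(usedIds, id)            ; ID already handled -> pick another
        continue
    ; ... UrlDownloadToFile / other processing would go here ...
    FileAppend, %id%`n, %journal%    ; remember the ID for next time
}

GetRandomId() {                      ; stand-in for however the real script picks a file
    Random, n, 1, 100
    return "ID" n
}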
Last edited by BoBo on 10 Aug 2020, 00:21, edited 2 times in total.
Reason: Changed subject line from 'A few questions' ...
Rohwedder
Posts: 7672
Joined: 04 Jun 2014, 08:33
Location: Germany

Re: Dealing with millions of files ...

10 Aug 2020, 01:43

Hello,
if your IDs can be valid variable names, set those variables to True:
%ID% := True
Otherwise, write them into an array.
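
A minimal sketch of the array variant in AHK v1, with made-up sample IDs: the keys of an associative array act as the set of used IDs, so the membership check stays in memory instead of scanning a file.

seen := {}                                 ; associative array of processed IDs

for each, id in ["A100", "B200", "A100"]   ; sample IDs, for illustration only
{
    if seen.HasKey(id)                     ; already handled?
    {
        MsgBox, %id% has already been processed.
        continue
    }
    seen[id] := true                       ; remember it for the rest of the run
}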
Chunjee
Posts: 1429
Joined: 18 Apr 2014, 19:05

Re: Dealing with millions of files ...

10 Aug 2020, 06:24

One benefit of the way you have done it is that processing can be resumed if the script is restarted. The downside is that your journal file will get very large.

Is it possible to move your input files to a "finished processing" folder? That would remove the need to index them, as they would no longer be considered for input.

If you put your processed files into an array, it will be searchable slightly faster than re-reading and writing a journal file. With some array tricks it could be almost instant. But those tricks add complexity and require more code if restarting the script is part of the design.
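
One possible way (a sketch, not from the thread) to keep the resume-on-restart benefit while avoiding a full FileRead + InStr scan for every file: load the journal into an object once at startup, then test membership in memory and still append each finished ID to the journal. The journal file name is an assumption.

journal := A_ScriptDir . "\used_ids.txt"

seen := {}
if FileExist(journal)
{
    FileRead, journalText, %journal%
    Loop, Parse, journalText, `n, `r
        if (A_LoopField != "")
            seen[A_LoopField] := true
}

; later, for each candidate ID:
;     if seen.HasKey(id)   -> skip it and pick another
;     else                 -> process it, then:
;         seen[id] := true
;         FileAppend, %id%`n, %journal%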
Bacon19331
Posts: 50
Joined: 09 Apr 2019, 20:45

Re: Dealing with millions of files ...

10 Aug 2020, 10:19

Yes, good thought. I do sometimes resume after quitting my script, and that's why I like this approach. I can't move the files into a finished-processing folder; good idea though. I made a little GUI that tells me when a collision happens, and it only collided with an ID that was already in the used list about 3 times after dealing with about 10,000 files. Is it even worth storing the used IDs? Also, my script handles up to 5 files per second.
divanebaba
Posts: 809
Joined: 20 Dec 2016, 03:53
Location: Diaspora

Re: Dealing with millions of files ...

10 Aug 2020, 11:46

Hi.
I avoid FileReadLine and FileAppend inside loops, because FileAppend accesses your hard disk or SSD every single time it runs.
And when you have millions of files to check, you can be sure that every check costs at least one disk access from FileAppend.

Use the following technique instead:
Read the whole content of your list into a variable once with FileRead, for example "VAR_A", and parse that variable with "Loop, Parse, VAR_A, `n, `r".
Append the path of each newly processed file to a second variable, for example "VAR_B", to avoid FileAppend inside the loop.

At the end of your process, write the content of the new variable "VAR_B" to the end of the existing list with a single FileAppend.

Using FileAppend inside fast and big loops is the easiest way, but it is not the fastest, and it is the hardest on your hard disk.
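
A sketch of this batching technique in AHK v1: one FileRead at the start, one FileAppend at the end, and no disk writes inside the loop. The file name, the loop count, and the ID source are placeholders:

listFile := A_ScriptDir . "\used_ids.txt"

FileRead, VAR_A, %listFile%              ; existing list, read once
seen := {}
Loop, Parse, VAR_A, `n, `r
    if (A_LoopField != "")
        seen[A_LoopField] := true

VAR_B := ""                              ; new entries collected in memory
Loop, 100                                ; placeholder for the real work loop
{
    id := "ID" . A_Index                 ; placeholder for the real ID source
    if seen.HasKey(id)
        continue
    ; ... process the file here ...
    seen[id] := true
    VAR_B .= id . "`n"                   ; no FileAppend inside the loop
}

FileAppend, %VAR_B%, %listFile%          ; written to disk once, at the end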
Simply a great guy. :mrgreen:
