 |
AutoHotkey Community Let's help each other out
|
| Author |
Message |
Shoot Here Guest
|
Posted: Fri Mar 02, 2007 8:15 am Post subject: Help splitting large text files |
|
|
This is my first foray into the language. I'm trying to split large CSV (TXT) files (i.e. more than 65,536 rows) into a series of smaller files that can be opened in Excel.
I have a working script, but it's too slow: because FileAppend opens and closes the output file on every iteration, splitting 78K rows into 2 files took about 30 minutes.
I've tried a few different methods without success:
- Incrementing the output file name from within the loop has no effect. The incremented file is created, but the output continues to go to the original file.
- Directing output to a temporary file, then copying that to the intended output file and deleting the temp file before proceeding. The file copy works fine, but I can't delete the temp file because it's in use.
Does anyone have any suggestions on how to do this?
Many thanks for any/all suggestions. |
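Some background on the slowness described above, as a minimal sketch assuming AutoHotkey v1 and the hypothetical file names big.csv and copy.csv: Loop, Read takes an optional second parameter, OutputFile. Inside the loop, FileAppend called with only the text appends to that file, which stays open for the duration of the loop instead of being opened and closed for every line. Because the OutputFile name is taken when the loop starts and the file stays open until the loop ends, this would also explain both of the failed attempts listed above.

```autohotkey
; Copy big.csv to copy.csv line by line (hypothetical file names).
; The OutputFile (second parameter of Loop, Read) stays open for the
; whole loop, so the one-parameter FileAppend below does not reopen
; and close the file for every line.
FileDelete, copy.csv      ; remove any earlier copy so lines are not appended to it
Loop, Read, big.csv, copy.csv
{
    ; A_LoopReadLine holds the current line without its line break.
    FileAppend, %A_LoopReadLine%`n
}
; The loop has ended here, so copy.csv is closed again.
```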
|
|
 |
BoBo Guest
|
Posted: Fri Mar 02, 2007 8:28 am Post subject: |
|
|
time for 'show' ...
a) show your code
b) show an excerpt of the input file
c) show a template of the output files |
|
|
 |
PhiLho
Joined: 27 Dec 2005 Posts: 6721 Location: France (near Paris)
|
Posted: Fri Mar 02, 2007 11:27 am Post subject: |
|
|
Pseudo-code:
| Code: | targetFileName = segment
maxLinesPerFile := 100
fileID := 1
count := 0
lines =
Loop Read, %originalFileName%
{
    ; In a read loop the current line is in A_LoopReadLine (A_LoopField is for Loop, Parse).
    lines .= A_LoopReadLine . "`n"
    count++
    If (count == maxLinesPerFile)
    {
        FileAppend %lines%, %targetFileName%%fileID%.csv
        fileID++
        lines =
        count := 0
    }
}
; Write whatever is left over after the last full chunk.
If (count > 0)
    FileAppend %lines%, %targetFileName%%fileID%.csv
| OK, that's real code, but untested!
I would also pad the fileID with leading zeros, for better sorting. _________________
vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2") |
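Padding the file number, as suggested above, could look like the following sketch. It assumes AutoHotkey v1.1.17 or later for Format(); the SubStr() line is an alternative for older versions. Both should produce segment007.csv for a fileID of 7, and three digits keep the names sorting correctly up to 999 pieces.

```autohotkey
fileID := 7
; AutoHotkey v1.1.17+: Format() pads the number with zeros to width 3.
nameA := "segment" . Format("{:03}", fileID) . ".csv"
; Older versions: prepend zeros, then keep only the last three characters
; (a StartingPos of -2 means "start 3 characters from the end").
nameB := "segment" . SubStr("000" . fileID, -2) . ".csv"
MsgBox, %nameA%`n%nameB%
```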
|
|
 |
Shoot Here Guest
|
Posted: Tue Mar 06, 2007 1:52 am Post subject: Help Splitting Large Text Files |
|
|
Thanks for the code!!
It's obvious when you see it, but it never occurred to me to assign X number of rows to a single variable.
I had been trying to work out whether I could do this using an array, but this is much (much!) simpler and neater.
Not sure why anyone would need to see input/output samples: they're text files, and I'm not doing any parsing or manipulation. The output is the same as the input, just in smaller volumes...
And FWIW, here's the (working) version of my code (apologies if it's untidy; I'm learning as I go here...):
| Code: |
Header:=
RecordCount = 0
FileCount = 0
fileselectfile, FNAME,,,TXT_CSV(*.csv;*.txt)
splitpath, fname,,Dir,XTN,NAME_NoXTN
Setworkingdir, %dir%
Start:=A_TickCount
if XTN not contains CSV,TXT
{
    msgbox, selected file is not CSV or TXT. Try again with the correct file
    exitapp
}
loop, read, %FNAME%
    RecordCount ++ ;= 1
unit:= recordcount/100
unit:= round(unit)
maltiplier = 1
prog:= (maltiplier*1)
split:= unit*maltiplier
if RecordCount < 65001
{
    msgbox, %RecordCount% records - no split needed!
    ExitApp
}
else
    progress, M X250 Y250 , working... ,elapsed: , Script Progress
progress, 0
gosub fileout
loop, read, %FNAME% , %Out_File%
{
    gosub MyET
    if errorlevel
        break
    if a_index = %FileSplit%
        gosub Fileout
    fileappend, %A_loopreadline%`n
    progress, ,working - %prog%`% completed , elapsed: %runtime%, Script Progress
    if (mod(a_index - 1, split)=0) {
        progress, ,working - %prog%`% completed , elapsed: %runtime%, Script Progress
        progress, %Prog%
        maltiplier ++
        prog:= 1*maltiplier
        split:= unit*maltiplier
    }
}
exitapp
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
MyET:
elapsed:= (A_TickCount - Start) / 1000
mins:= (elapsed / 60)
mins:= floor(mins)
sex:= mod(elapsed,60)
sex:= floor(sex)
RunTime = %mins% : %sex%
return
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
FileOut:
FileCount ++
FileSplit:= (50000*filecount)
Out_File = %Name_noxtn%(%FileCount%).%XTN%
ifexist, %Out_File%
filedelete, %out_file%
ifequal, filecount, 1
filereadline, Header, %FNAME%, 1
ifgreater, filecount, 1
fileappend, %Header%`n, %Out_File%
Return
|
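One likely pitfall in the script above: the one-parameter fileappend inside "loop, read, %FNAME% , %Out_File%" writes to the OutputFile named when the loop started, so reassigning Out_File in the FileOut subroutine should not redirect it (the same behaviour described in the first post). A possible alternative, sketched below under the assumption of AutoHotkey v1.1 or later (where FileOpen() and file objects are available; they postdate this thread), keeps one file object open per piece and switches to a new one every rowsPerFile rows, repeating the header at the top of each piece. inputPath and rowsPerFile are placeholders; 65000 data rows plus a header stays under the 65,536-row limit of the Excel versions discussed here.

```autohotkey
; Sketch (AutoHotkey v1.1+): split a large CSV into pieces of at most
; rowsPerFile data rows, repeating the header line at the top of each piece.
; inputPath and rowsPerFile are placeholders to adjust.
inputPath := "big.csv"
rowsPerFile := 65000
SplitPath, inputPath, , dir, ext, nameNoExt
if (dir = "")
    dir := A_WorkingDir
out := ""          ; current output file object
header := ""
fileCount := 0
rowCount := 0
Loop, Read, %inputPath%
{
    if (A_Index = 1)
    {
        header := A_LoopReadLine   ; remember the header; it is written per piece below
        continue
    }
    ; Start a new piece before the first data row and after every full piece.
    if (rowCount = 0)
    {
        if (IsObject(out))
            out.Close()
        fileCount++
        ; "w" creates or overwrites the file; the `n flag writes line breaks as CR+LF.
        out := FileOpen(dir . "\" . nameNoExt . "(" . fileCount . ")." . ext, "w`n")
        if (!IsObject(out))
        {
            MsgBox, Could not create output file %fileCount%.
            ExitApp
        }
        out.WriteLine(header)
    }
    out.WriteLine(A_LoopReadLine)
    rowCount++
    if (rowCount = rowsPerFile)
        rowCount := 0
}
if (IsObject(out))
    out.Close()
MsgBox, Wrote %fileCount% file(s).
```

The pieces are named the same way as in the script above, e.g. big(1).csv, big(2).csv, so existing workflows should not need to change.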
|
|
|
 |
automaticman
Joined: 27 Oct 2006 Posts: 322
|
Posted: Thu Apr 24, 2008 10:59 am Post subject: |
|
|
How would one split text files according to specific strings at the beginning of lines? For example, lines starting with
- Chapter
- Section
...
should each start a new file, using that line as the saved file name. That way one could split books (in the form of text files) into chapter files and read only those. One could also easily compare the sizes of chapters. |
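For the size comparison mentioned above: once each chapter sits in its own file, a file loop can list them by size. A minimal sketch, assuming AutoHotkey v1.1.21 or later (for Loop, Files) and a hypothetical chapters folder next to the script:

```autohotkey
; Sketch (AutoHotkey v1.1.21+): list the chapter files, largest first.
; chapterDir is a hypothetical folder; adjust it to where the parts are.
chapterDir := A_ScriptDir . "\chapters"
list := ""
Loop, Files, %chapterDir%\*.txt
    list .= A_LoopFileSize . "`t" . A_LoopFileName . "`n"   ; size in bytes, then name
Sort, list, N R          ; numeric sort on the leading size, descending
MsgBox, %list%
```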
|
|
 |
[VxE]
Joined: 07 Oct 2006 Posts: 1129
|
Posted: Thu Apr 24, 2008 7:33 pm Post subject: |
|
|
| automaticman wrote: | How would one split text files according to specific strings at the beginning of lines? For example, lines starting with
- Chapter
- Section
...
should each start a new file, using that line as the saved file name. That way one could split books (in the form of text files) into chapter files and read only those. One could also easily compare the sizes of chapters. |
| Code: | [pseudocode]
CompleteBookFilePath = c:\MyBook.txt
FileRead, book, %CompleteBookFilePath%
StringReplace, book, book, `r`n, `n, All   ; turn CRLF into LF without creating blank lines
SetFormat, float, 03.0                      ; zero-pad float results to 3 digits (e.g. 001)
chapter = 000
section = 000
Loop ; break at end of book
{
    MySplitBookPath := SubStr( CompleteBookFilePath, 1, -4) "-Chapter" chapter "-Section" section ".txt"
    Loop, Read, %CompleteBookFilePath%, %MySplitBookPath%
    { ; a read loop keeps an output file open for writing
        Loop, Parse, book, `n
        { ; begin parsing the remaining total contents of the book
            StringTrimLeft, book, book, % StrLen(A_LoopField) + 1
            If InStr(A_LoopField, " - Chapter", 1)
            { ; if the line contains the word " - chapter", increment 'chapter'
                chapter += 1.0
                section := 0.0 ; reset. this should get incremented at the
                break          ; next instance of the word ' - section'
            }
            If InStr(A_LoopField, " - Section", 1)
            { ; if the line contains ' - section', increment
                section += 1.0
                break
            }
            FileAppend, %A_LoopField%`n ; write a line to the current output file
        }
        break
    }
    If StrLen(book) < 3 ; are we done?
        break
} |
Untested, but it should be enough to give you an idea. _________________ My Home Thread
More Common Answers: 1. It's in the FAQ 2. Ternary ( ? : ) guide 3. Post code with [code][/code] tags |
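An alternative sketch that follows the original question more literally: it starts a new output file whenever a line begins with Chapter or Section and names the file after that line, keeping the heading as the first line of the part. It assumes AutoHotkey v1.1.17 or later (FileOpen(), Trim(), Format()); bookPath and outDir are hypothetical paths. Characters that Windows does not allow in file names are stripped, the name is shortened to 100 characters, and a running number is prefixed so the parts keep the book's order.

```autohotkey
; Sketch (AutoHotkey v1.1.17+): start a new output file whenever a line
; begins with "Chapter" or "Section", naming the file after that line.
; bookPath and outDir are hypothetical; adjust them to your files.
bookPath := "C:\MyBook.txt"
outDir := "C:\MyBook-parts"
FileCreateDir, %outDir%
out := ""
partNo := 0
Loop, Read, %bookPath%
{
    line := A_LoopReadLine
    if (RegExMatch(line, "^(Chapter|Section)\b"))
    {
        if (IsObject(out))
            out.Close()
        partNo++
        ; Strip characters not allowed in Windows file names and limit the length.
        name := RegExReplace(line, "[\\/:*?""<>|]", "")
        name := SubStr(Trim(name), 1, 100)
        ; Prefix a running number so the parts sort in the book's order.
        name := Format("{:03}", partNo) . " " . name . ".txt"
        out := FileOpen(outDir . "\" . name, "w`n")
    }
    ; Lines before the first heading are skipped (no file is open yet);
    ; if a file could not be created, lines are skipped until the next heading.
    if (IsObject(out))
        out.WriteLine(line)
}
if (IsObject(out))
    out.Close()
```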
|
|
 |
automaticman
Joined: 27 Oct 2006 Posts: 322
|
Posted: Fri Apr 25, 2008 12:11 am Post subject: |
|
|
| Thank you, [VxE]. After trying it out, I will post my experience here. |
|
|
 |
|
|
|
Powered by phpBB © 2001, 2005 phpBB Group
|