AutoHotkey Homepage AutoHotkey Community
Let's help each other out
 

Help splitting large text files

 
AutoHotkey Community Forum Index -> Ask for Help
Shoot Here
Guest





Posted: Fri Mar 02, 2007 8:15 am    Post subject: Help splitting large text files

This is my first foray into the language, and I'm trying to split large (i.e. more than 65,536 rows) CSV (TXT) files into a series of smaller files that can be opened in Excel.

I have a working script, but it's too slow: the FileAppend command opens and closes the output file on every iteration, so splitting 78K rows into 2 files took about 30 minutes...
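To see how much the per-row open/close costs, here is a rough sketch that times both patterns on your own machine. It assumes AutoHotkey v1.1 or later (it returns an object); the function name TimeAppendPatterns and the file paths are hypothetical, and no particular timings are implied.

```autohotkey
; Rough timing sketch; assumes AutoHotkey v1.1+ (object returned at the end).
; TimeAppendPatterns and the file names passed to it are hypothetical.
TimeAppendPatterns(rows, slowFile, fastFile)
{
    FileDelete, %slowFile%
    FileDelete, %fastFile%

    ; Pattern 1: one FileAppend per row, so the file is opened and closed every time
    start := A_TickCount
    Loop, %rows%
        FileAppend, row %A_Index%`n, %slowFile%
    slowMs := A_TickCount - start

    ; Pattern 2: collect the rows in a variable, then write them with one FileAppend
    start := A_TickCount
    buf := ""
    Loop, %rows%
        buf .= "row " . A_Index . "`n"
    FileAppend, %buf%, %fastFile%
    fastMs := A_TickCount - start

    return {slow: slowMs, fast: fastMs}
}

; Example:
; t := TimeAppendPatterns(5000, A_Temp "\per_line.txt", A_Temp "\buffered.txt")
; MsgBox % "Per-line: " t.slow " ms`nBuffered: " t.fast " ms"
```

Both files end up with identical contents; only the number of times the file is opened differs.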

I've tried a few different methods without success -

- incrementing the output file from within the loop has no effect. The incremented file is created, but the output continues to go to the original file
- directing output to a temporary file, then copying that to the intended output file and deleting the temp file before proceeding. The file copy works fine, but I can't delete the temp file because it's in use
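Both symptoms are consistent with how Loop, Read treats its OutputFile parameter: the AutoHotkey documentation describes it as the file kept open for the duration of the loop, so pointing the variable at a new name later has no effect, and the file stays in use until the loop ends. One way around this, assuming AutoHotkey v1.1 or later where FileOpen() returns a file object, is to manage the output file yourself and close each part as soon as it is full. The function name SplitByLineCount and the ".csv" suffix are assumptions for illustration.

```autohotkey
; Minimal sketch, assuming AutoHotkey v1.1+ (FileOpen and file objects are
; not available in the older v1.0 releases). SplitByLineCount is hypothetical.
; Each part gets its own file object, so the output can be switched inside
; the loop, and each finished part is closed (unlocked) right away.
SplitByLineCount(inFile, prefix, maxLines)
{
    part := 0
    count := maxLines          ; forces a new part on the first line
    out := ""
    Loop, Read, %inFile%
    {
        if (count >= maxLines)
        {
            if IsObject(out)
                out.Close()    ; release the finished part
            part++
            out := FileOpen(prefix . part . ".csv", "w")   ; "w" overwrites an old part
            if !IsObject(out)
                throw Exception("Cannot create " . prefix . part . ".csv")
            count := 0
        }
        out.WriteLine(A_LoopReadLine)   ; writes the line followed by `n
        count++
    }
    if IsObject(out)
        out.Close()
    return part                ; number of parts written
}
```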

does anyone have any suggestions on how to do this??

many thanks for any/all suggestions
BoBo
Guest





Posted: Fri Mar 02, 2007 8:28 am

time for 'show' ...
a) show your code
b) show an excerpt of the input file
c) show a template of the output files
PhiLho



Joined: 27 Dec 2005
Posts: 6721
Location: France (near Paris)

Posted: Fri Mar 02, 2007 11:27 am

Pseudo-code:
Code:
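originalFileName = big.csv   ; hypothetical input file
targetFileName = segment
maxLinesPerFile := 100
fileID := 1
count := 0
lines =
; FileAppend appends, so delete old segment files before a re-run
Loop Read, %originalFileName%
{
   lines .= A_LoopReadLine . "`n"   ; collect rows in memory (A_LoopField is only set by parsing loops)
   count++
   If (count == maxLinesPerFile)
   {  ; one FileAppend per segment instead of one per row
      FileAppend %lines%, %targetFileName%%fileID%.csv
      fileID++
      lines =
      count := 0
   }
}
If (count > 0)   ; write the remaining rows
   FileAppend %lines%, %targetFileName%%fileID%.csv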
OK, that's real code, but untested!
I would also format fileID with leading zeros, so the files sort in the right order.
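For the zero-padding, a small helper could look like this. It is a sketch in plain v1 syntax; the name PadId and the default width of 3 are assumptions.

```autohotkey
; Hypothetical helper: left-pad a part number with zeros so that
; segment002.csv sorts before segment010.csv in directory listings.
; Numbers that already have 'width' or more digits are returned unchanged.
PadId(id, width = 3)
{
    zeros := ""
    Loop, %width%
        zeros .= "0"
    ; SubStr with a start position below 1 counts from the end of the string,
    ; so 1 - width keeps the last 'width' characters
    return StrLen(id) >= width ? id : SubStr(zeros . id, 1 - width)
}

; Example:
; paddedID := PadId(fileID)
; FileAppend %lines%, %targetFileName%%paddedID%.csv
```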
Shoot Here
Guest





Posted: Tue Mar 06, 2007 1:52 am    Post subject: Help Splitting Large Text Files

Thanks for the code!!

It's obvious once you see it, but it never occurred to me to collect X rows in a single variable.

I had been trying to work out whether I could do this with an array, but this is much (much!) simpler and neater.

Not sure why anyone would need to see input/output samples: they're text files, and I'm not doing any parsing or manipulation. The output is the same as the input, just in smaller volumes...
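Since the parts should contain nothing but the input rows, a round-trip check can confirm that a split lost nothing. This is a minimal sketch, assuming AutoHotkey v1.1 or later, the Name(1).ext, Name(2).ext naming used in the script below, and that every part after the first starts with a copy of the header row; PartsMatchOriginal is a hypothetical name.

```autohotkey
; Hypothetical check (AutoHotkey v1.1+): rebuild the original from parts named
; "<name>(1).<ext>", "<name>(2).<ext>", ... dropping the header row that every
; part after the first starts with, then compare with the original file.
PartsMatchOriginal(original, partPrefix, ext)
{
    FileRead, orig, %original%
    StringReplace, orig, orig, `r`n, `n, All   ; compare rows, not line endings
    if (SubStr(orig, 0) != "`n")               ; make sure the last row ends with `n
        orig .= "`n"
    rebuilt := ""
    Loop
    {
        part := partPrefix . "(" . A_Index . ")." . ext
        if !FileExist(part)
            break
        FileRead, text, %part%
        StringReplace, text, text, `r`n, `n, All
        if (A_Index > 1)                       ; drop the repeated header row
            text := SubStr(text, InStr(text, "`n") + 1)
        rebuilt .= text
    }
    return (rebuilt == orig)
}
```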

And FWIW, here's the (working) version of my code (apologies if it's untidy; I'm learning as I go here):

Code:


Header := ""
RecordCount := 0
FileCount := 0
LinesPerFile := 50000   ; rows per output file, well under the 65,536-row limit of Excel 97-2003
FileSelectFile, FName, 3,, Select a CSV or TXT file to split, CSV/Text files (*.csv; *.txt)
if (FName = "")   ; dialog was cancelled
   ExitApp
SplitPath, FName,, Dir, XTN, Name_NoXTN
SetWorkingDir, %Dir%
Start := A_TickCount
if XTN not in CSV,TXT
{
   MsgBox, Selected file is not CSV or TXT. Try again with the correct file.
   ExitApp
}
Loop, Read, %FName%   ; first pass: count the records
   RecordCount++

if (RecordCount < 65001)
{
   MsgBox, %RecordCount% records - no split needed!
   ExitApp
}

FileReadLine, Header, %FName%, 1
unit := Round(RecordCount / 100)   ; refresh the progress bar about once per 1%
Progress, M X250 Y250, working..., elapsed:, Script Progress
Progress, 0

; Second pass: collect rows in a variable and write each part with a single
; FileAppend (a read loop's OutputFile cannot be switched mid-loop)
lines := ""
count := 0
Loop, Read, %FName%
{
   lines .= A_LoopReadLine . "`n"
   count++
   if (count = LinesPerFile)
      GoSub, FileOut
   if (Mod(A_Index, unit) = 0)
   {
      GoSub, MyET
      pct := A_Index * 100 // RecordCount
      Progress, %pct%, working - %pct%`% completed, elapsed: %RunTime%, Script Progress
   }
}
if (count > 0)   ; write the leftover rows
   GoSub, FileOut
Progress, Off
ExitApp
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
MyET:
elapsed := (A_TickCount - Start) // 1000   ; whole seconds
mins := elapsed // 60
secs := Mod(elapsed, 60)
RunTime := mins . ":" . SubStr("0" . secs, -1)   ; e.g. 3:07
return
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
FileOut:
FileCount++
Out_File = %Name_NoXTN%(%FileCount%).%XTN%
FileDelete, %Out_File%   ; remove output from an earlier run, if any
if (FileCount > 1)       ; part 1 already starts with the header row
   lines := Header . "`n" . lines
FileAppend, %lines%, %Out_File%
lines := ""
count := 0
return

automaticman



Joined: 27 Oct 2006
Posts: 322

Posted: Thu Apr 24, 2008 10:59 am

How would one split text files according to specific strings at the beginning of lines, e.g. lines starting with

- Chapter
- Section
...

so that each such line starts a new file, saved under that line as its file name? That way one could split books (in the form of text files) into chapter files and read only those. One could also easily compare the sizes of the chapters.
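A minimal sketch of that idea, combining the buffering approach from earlier in the thread with heading detection. It assumes AutoHotkey v1.1 or later (for Trim()), case-sensitive matching of lines that begin with "Chapter" or "Section", and the hypothetical function names SplitOnHeadings and FlushPart.

```autohotkey
; Minimal sketch, assuming AutoHotkey v1.1+. A line that starts with "Chapter"
; or "Section" (case-sensitive) begins a new file named after that line,
; prefixed with a running number so the files keep the book's order and
; repeated headings such as "Section 1" don't overwrite each other.
; Lines before the first heading go to "000 preamble.txt".
SplitOnHeadings(inFile, outDir)
{
    part := 0
    name := "000 preamble"
    buf := ""
    Loop, Read, %inFile%
    {
        if RegExMatch(A_LoopReadLine, "^(Chapter|Section)\b")
        {
            FlushPart(outDir . "\" . name . ".txt", buf)
            part++
            ; replace characters that are not allowed in Windows file names
            clean := Trim(RegExReplace(A_LoopReadLine, "[\\/:*?""<>|]", "_"))
            num := StrLen(part) >= 3 ? part : SubStr("00" . part, -2)
            name := num . " " . clean
            buf := ""
        }
        buf .= A_LoopReadLine . "`n"
    }
    FlushPart(outDir . "\" . name . ".txt", buf)
    return part            ; number of headings found
}

FlushPart(path, text)
{
    if (text = "")         ; nothing collected (e.g. no preamble)
        return
    FileDelete, %path%     ; overwrite output from an earlier run
    FileAppend, %text%, %path%
}
```

Comparing chapter sizes then only needs the sizes of the resulting files.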
[VxE]



Joined: 07 Oct 2006
Posts: 1129

Posted: Thu Apr 24, 2008 7:33 pm


Code:
CompleteBookFilePath = c:\MyBook.txt
FileRead, book, %CompleteBookFilePath%
StringReplace, book, book, `r`n, `n, All ; normalise CR+LF line endings to `n
SetFormat, float, 03.0 ; zero-padded counters: 000, 001, ...
chapter = 000
section = 000
Loop ; one pass per output file; break at end of book
{
   ; a part that begins with a heading line bumps the matching counter
   If RegExMatch(book, "^Chapter\b")
   {
      chapter += 1.0
      section = 000 ; reset; incremented at the next 'Section' heading
   }
   Else If RegExMatch(book, "^Section\b")
      section += 1.0
   MySplitBookPath := SubStr( CompleteBookFilePath, 1, -4) "-Chapter" chapter "-Section" section ".txt"
   FileDelete, %MySplitBookPath% ; FileAppend appends, so clear any earlier run
   used := 0
   Loop, Read, %CompleteBookFilePath%, %MySplitBookPath%
   { ; a read loop keeps an output file open for writing
      Loop, Parse, book, `n
      { ; Loop Parse works on a copy of 'book' taken when it starts
         If (A_Index > 1 && RegExMatch(A_LoopField, "^(Chapter|Section)\b"))
            break ; this heading starts the next file; leave it in 'book'
         used += StrLen(A_LoopField) + 1
         FileAppend, %A_LoopField%`n ; write a line to the current output file
      }
      break
   }
   StringTrimLeft, book, book, %used% ; drop the lines just written
   If (book = "") ; are we done?
      break
}

untested, but should be enough to give you an idea.
automaticman



Joined: 27 Oct 2006
Posts: 322

Posted: Fri Apr 25, 2008 12:11 am

Thank you, [VxE]. After trying it out, I will post my experience here.
All times are GMT