Reads Excel (xlsx) Fast
Re: Reads Excel (xlsx) Fast
this post is regarding.. Excel parsing with XML file format for fun
the most famous one of kind is MS' OOXML
I have tried it
And
Got a silly result
Tried one million file using C#'s OOXML type
Result ? took me millions of years, I just stopped it
It looked like that it is using my Intel i7 processor "fully" - that means concurrent/asynchronous something.. right ?
But, made nothing. Only big noise of CPU fans. Idiot.
Does anybody know which is the nicer way to test Ten millions Excel xml file using AHK ? It just gives me Out Of Memory error.
the most famous one of kind is MS' OOXML
I have tried it
And
Got a silly result
Tried one million file using C#'s OOXML type
Result ? took me millions of years, I just stopped it
It looked like that it is using my Intel i7 processor "fully" - that means concurrent/asynchronous something.. right ?
But, made nothing. Only big noise of CPU fans. Idiot.
Does anybody know which is the nicer way to test Ten millions Excel xml file using AHK ? It just gives me Out Of Memory error.
Re: Reads Excel (xlsx) Fast
Numbers...
Excel Cell's appearance
0.01
0.012
0.0034
0.00034
(1/2) XML file's contents. <-- When simply typed in - so, it is saved ascrazy Scientific "Numbers"
0.01
1.2E-2
3.3999999999999998E-3
3.4000000000000002E-4
(2/2) XML file's contents. <-- Firstly make cells as "Text" and simply typed in - so, it is saved as "Text(String)". Good, very good.
0.01
0.012
0.0034
0.00034
Excel Cell's appearance
0.01
0.012
0.0034
0.00034
(1/2) XML file's contents. <-- When simply typed in - so, it is saved as
0.01
1.2E-2
3.3999999999999998E-3
3.4000000000000002E-4
(2/2) XML file's contents. <-- Firstly make cells as "Text" and simply typed in - so, it is saved as "Text(String)". Good, very good.
0.01
0.012
0.0034
0.00034
Re: Reads Excel (xlsx) Fast
to write MS Excel file, simply (?)
https://www.autohotkey.com/boards/viewtopic.php?f=74&t=63344
to write MS Word file, simply (?)
https://www.autohotkey.com/boards/viewtopic.php?f=74&t=49922
https://www.autohotkey.com/boards/viewtopic.php?f=74&t=63344
to write MS Word file, simply (?)
https://www.autohotkey.com/boards/viewtopic.php?f=74&t=49922
Re: Reads Excel (xlsx) Fast
The concept of this idea is quite good, open .xlsx files with no Excel, because not everybody is going to have Microsoft Office. There are now many alternatives, like LibreOffice, Open Office, etc... That a person only needs AutoHotkey to open up and work with an Excel document (.xlsx) can be very useful.
This is also why I disagree with the C# dependency, even if it's marginally faster. The slight difference in speed is irrelevant. What is more important is the functionality and convenience attained, where just AutoHotkey can be used, without depending on C# or Excel.
This is also why I disagree with the C# dependency, even if it's marginally faster. The slight difference in speed is irrelevant. What is more important is the functionality and convenience attained, where just AutoHotkey can be used, without depending on C# or Excel.
- flyingDman
- Posts: 2821
- Joined: 29 Sep 2013, 19:01
Re: Reads Excel (xlsx) Fast
Some speed comparisons in prior posts are not completely fair. Although there is obviously merit to any code that tries to improve on any other code, for practical reasons, I'll stick to my preferred (and used for many years) way of extracting large amounts of data from an Excel sheet. I.e.: using:
This actually beats any of the other examples shown above (including the saveas CSV method, but marginally). More importantly, although I use UsedRange in my example, the range can be anything you want and MyContents is a variable that I can easily manipulate.
Code: Select all
myExcel := "C:\Users\xxxx\Documents\Spreadsheets\test 123.xlsx"
resultFile := RegExReplace( myExcel, "\.xlsx") "_resultFile.txt"
filedelete, resultFile
startTime := A_TickCount
oExcel := ComObjCreate( "Excel.Application" )
oExcel.Workbooks.Open( myExcel )
oExcel.ActiveSheet.UsedRange.copy ;<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
myContents := clipboard
clipboard =
oExcel.Quit
oExcel =
FileAppend,% myContents, % resultFile
;Run, % resultFile,,, Max
MsgBox % A_TickCount - startTime
14.3 & 1.3.7
Re: Reads Excel (xlsx) Fast
@flyingDman, that's what I'd call a "Speed Dman" . I just tried it on a workbook with 8,000 records with 11 columns per record.
Just 3432 milliseconds on my slow old computer. Notepad took about twice that long just to open the result file!
Thanks for this.
Regards,
burque505
Just 3432 milliseconds on my slow old computer. Notepad took about twice that long just to open the result file!
Thanks for this.
Regards,
burque505
- flyingDman
- Posts: 2821
- Joined: 29 Sep 2013, 19:01
Re: Reads Excel (xlsx) Fast
@flyingDman ,
Can you do the same but for all the sheets and not only the active one?
Maybe also with an append the name of the sheet in the result file.
Can you do the same but for all the sheets and not only the active one?
Maybe also with an append the name of the sheet in the result file.
- flyingDman
- Posts: 2821
- Joined: 29 Sep 2013, 19:01
Re: Reads Excel (xlsx) Fast
try this:
Code: Select all
file2open := "C:\Users\xxxx\Documents\Spreadsheets\test 123.xlsx"
resultFile := RegExReplace( file2open, "\.xlsx") "_resultFile.txt"
filedelete, %resultFile%
XL := ComObjCreate( "Excel.Application" )
XL.Workbooks.Open(file2open)
For sheet in XL.Worksheets
{
XL.sheets(sheet.name).UsedRange.copy
myContents .= sheet.name ":`n" clipboard "`n`n"
clipboard =
}
XL.Quit
XL =
FileAppend,% myContents, % resultFile
Run, % resultFile,,, Max
14.3 & 1.3.7
Re: Reads Excel (xlsx) Fast
[Updated 2019 January 18]
This time, just collects contents.
-fixed "...cells is formatted (Bold/Italic/Underlined/formatted as text/foreground color/background color etc.) it will be skipped.."
Thanks. It works great. And the document doesn't have to be opened.
I want to print all the content on a string. I want to print it in the most efficient way. What kind of path would you suggest?
This time, just collects contents.
-fixed "...cells is formatted (Bold/Italic/Underlined/formatted as text/foreground color/background color etc.) it will be skipped.."
Code: Select all
myExcel := "Testing.xlsx"
tempFolder := RegExReplace(myExcel, ".*\K\\.*") "\_Excel_UnZip\"
tempName := RegExReplace(myExcel, "\.xlsx") ".zip"
FileCopy, % myExcel , % tempName
FileCreateDir, % tempFolder
tempObject := ComObjCreate("Shell.Application")
tempObject.Namespace(tempFolder).CopyHere(tempObject.Namespace(tempName).items, 4|16)
FileDelete, % tempName
FileEncoding, UTF-8
FileRead, myCNT, % tempFolder "\" "xl\worksheets\sheet1.xml"
FileRead, mySST, % tempFolder "\" "xl\sharedStrings.xml"
FileRemoveDir, % tempFolder, 1
sharedStringArray := []
While @ := RegExMatch(mySST, "<t[^>]*>(.+?)</t>", _, @ ? StrLen(_) + @ : 1)
sharedStringArray.Push(_1)
While @ := RegExMatch(myCNT, "<c r=""([A-Z]+)(\d+)""[^>]*?(t=""s"")?><v>(.+?)</v>", _, @ ? StrLen(_) + @ : 1)
{
rowIndex := _2 ; 1, 2, 3... 1048576
columnName := _1 ; A, B...AA, AB,... XFD
If(columnName = "A") ; at the moment column "A" Only
myResult .= (_3 ? sharedStringArray[1 + _4] : _4) "`n"
}
MsgBox % myResult
I want to print all the content on a string. I want to print it in the most efficient way. What kind of path would you suggest?
Re: Reads Excel (xlsx) Fast
Thank you @flyingDman on that.
One question to add, what if I have 5 columns in excel and I want to have in that .txt file at the end only column 1,2 and 4?
How to accomplish that?
Thank you
One question to add, what if I have 5 columns in excel and I want to have in that .txt file at the end only column 1,2 and 4?
How to accomplish that?
Thank you
Re: Reads Excel (xlsx) Fast
Ok, I have find a way
Code: Select all
Loop Read, Book1.txt, Book2.txt
{
StringSplit Field, A_LoopReadLine, `,
FileAppend %Field1%`,%Field2%`,%Field4%`n
}
- flyingDman
- Posts: 2821
- Joined: 29 Sep 2013, 19:01
Re: Reads Excel (xlsx) Fast
Well, I would go this way:
Code: Select all
myExcel := "C:\Users\xxxx\Documents\Spreadsheets\test 123.xlsx"
resultFile := RegExReplace( myExcel, "\.xlsx") "_resultFile.txt"
filedelete, resultFile
oExcel := ComObjCreate( "Excel.Application" )
oExcel.Workbooks.Open( myExcel )
oExcel.ActiveSheet.UsedRange.copy
myContents := clipboard
clipboard =
oExcel.Quit
oExcel =
for x,y in strsplit(myContents,"`r","`n")
for a,b in z:=strsplit(y,"`t")
newcontents .= z.1 "`t" z.3 "`n" ; only extracts field 1 and 3
FileAppend,% newContents, % resultFile
14.3 & 1.3.7
Re: Reads Excel (xlsx) Fast
Hi flyingDman,
Yes, that`s it
Thank you
Yes, that`s it
Thank you
Re: Reads Excel (xlsx) Fast
Hi @flyingDman how to put 1 from z.1 or 3 from z.3 into variable?
Because sometimes columns which I need to extract are changing positions.
I have tryied all combinations but without success.
Because sometimes columns which I need to extract are changing positions.
I have tryied all combinations but without success.
Re: Reads Excel (xlsx) Fast
i modified the code a little.
i works for my use case.
i works for my use case.
Code: Select all
; original by IMEime
; changed to be a function and different tempFolder
; https://www.autohotkey.com/boards/viewtopic.php?f=6&t=29408
ExcelReadFast(myExcel){
;ensure tempfolder is empty
tempFolder := A_Temp "\ExcelReadFast\"
FileRemoveDir, % tempFolder, 1
FileCreateDir, % tempFolder
;copy xlsx file as zip file to temp folder
SplitPath, myExcel, OutFileName, OutDir, OutExtension, OutNameNoExt, OutDrive
tempName := tempFolder OutNameNoExt ".zip"
FileCopy, % myExcel , % tempName
;unpack
tempObject := ComObjCreate("Shell.Application")
tempObject.Namespace(tempFolder).CopyHere(tempObject.Namespace(tempName).items, 4|16)
;read content
oldFE := A_FileEncoding
FileEncoding, UTF-8
FileRead, myCNT, % tempFolder "\" "xl\worksheets\sheet1.xml"
FileRead, mySST, % tempFolder "\" "xl\sharedStrings.xml"
FileEncoding, %oldFE%
;clean up tempfolder and files
FileRemoveDir, % tempFolder, 1
;process content sharedStrings
sharedString := []
While @ := RegExMatch(mySST, "<t[^>]*>(.+?)</t>", _, @ ? StrLen(_) + @ : 1)
sharedString.Push(_1)
;process content sheet
myResult := {}
While @ := RegExMatch(myCNT, "<c r=""([A-Z]+)(\d+)""[^>]*?(t=""s"")?><v>(.+?)</v>", _, @ ? StrLen(_) + @ : 1)
{
rowIndex := _2 ; 1, 2, 3... 1048576
columnName := _1 ; A, B...AA, AB,... XFD
If !myResult.HasKey(rowIndex)
myResult[rowIndex] := {}
myResult[rowIndex, columnName] := (_3 ? sharedString[1 + _4] : _4)
}
Return myResult
}
ciao
toralf
toralf