Reads Excel (xlsx) Fast

Post your working scripts, libraries and tools for AHK v1.1 and older
IMEime
Posts: 750
Joined: 20 Sep 2014, 06:15

Re: Reads Excel (xlsx) Fast

Post by IMEime » 26 Jan 2019, 11:53

this post is regarding.. Excel parsing with XML file format for fun

the most famous one of kind is MS' OOXML

I have tried it
And
Got a silly result

Tried one million file using C#'s OOXML type
Result ? took me millions of years, I just stopped it
It looked like that it is using my Intel i7 processor "fully" - that means concurrent/asynchronous something.. right ?
But, made nothing. Only big noise of CPU fans. Idiot.

Does anybody know which is the nicer way to test Ten millions Excel xml file using AHK ? It just gives me Out Of Memory error.

IMEime
Posts: 750
Joined: 20 Sep 2014, 06:15

Re: Reads Excel (xlsx) Fast

Post by IMEime » 26 Jan 2019, 12:53

Numbers...

Excel Cell's appearance
0.01
0.012
0.0034
0.00034


(1/2) XML file's contents. <-- When simply typed in - so, it is saved as crazy Scientific "Numbers"
0.01
1.2E-2
3.3999999999999998E-3
3.4000000000000002E-4


(2/2) XML file's contents. <-- Firstly make cells as "Text" and simply typed in - so, it is saved as "Text(String)". Good, very good.
0.01
0.012
0.0034
0.00034

Klarion
Posts: 176
Joined: 26 Mar 2019, 10:02

Re: Reads Excel (xlsx) Fast

Post by Klarion » 03 Apr 2019, 22:07


SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: Reads Excel (xlsx) Fast

Post by SOTE » 21 Jun 2019, 22:47

The concept of this idea is quite good, open .xlsx files with no Excel, because not everybody is going to have Microsoft Office. There are now many alternatives, like LibreOffice, Open Office, etc... That a person only needs AutoHotkey to open up and work with an Excel document (.xlsx) can be very useful.

This is also why I disagree with the C# dependency, even if it's marginally faster. The slight difference in speed is irrelevant. What is more important is the functionality and convenience attained, where just AutoHotkey can be used, without depending on C# or Excel.

User avatar
flyingDman
Posts: 2821
Joined: 29 Sep 2013, 19:01

Re: Reads Excel (xlsx) Fast

Post by flyingDman » 22 Jun 2019, 15:59

Some speed comparisons in prior posts are not completely fair. Although there is obviously merit to any code that tries to improve on any other code, for practical reasons, I'll stick to my preferred (and used for many years) way of extracting large amounts of data from an Excel sheet. I.e.: using:

Code: Select all

myExcel := "C:\Users\xxxx\Documents\Spreadsheets\test 123.xlsx"		
resultFile := RegExReplace( myExcel, "\.xlsx") "_resultFile.txt"
filedelete, resultFile

startTime  :=  A_TickCount															
oExcel := ComObjCreate( "Excel.Application" )
oExcel.Workbooks.Open( myExcel ) 		
oExcel.ActiveSheet.UsedRange.copy                                ;<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
myContents := clipboard
clipboard = 
oExcel.Quit				
oExcel  =		
FileAppend,% myContents, % resultFile
;Run, % resultFile,,, Max	
MsgBox % A_TickCount - startTime
This actually beats any of the other examples shown above (including the saveas CSV method, but marginally). More importantly, although I use UsedRange in my example, the range can be anything you want and MyContents is a variable that I can easily manipulate.
14.3 & 1.3.7

burque505
Posts: 1736
Joined: 22 Jan 2017, 19:37

Re: Reads Excel (xlsx) Fast

Post by burque505 » 25 Jun 2019, 17:06

@flyingDman, that's what I'd call a "Speed Dman" :). I just tried it on a workbook with 8,000 records with 11 columns per record.
Just 3432 milliseconds on my slow old computer. Notepad took about twice that long just to open the result file!
Thanks for this.
Regards,
burque505

User avatar
flyingDman
Posts: 2821
Joined: 29 Sep 2013, 19:01

Re: Reads Excel (xlsx) Fast

Post by flyingDman » 25 Jun 2019, 17:20

Can't fly slow!
14.3 & 1.3.7

ozzii
Posts: 482
Joined: 30 Oct 2013, 06:04

Re: Reads Excel (xlsx) Fast

Post by ozzii » 26 Jun 2019, 01:43

@flyingDman ,
Can you do the same but for all the sheets and not only the active one?
Maybe also with an append the name of the sheet in the result file.

User avatar
flyingDman
Posts: 2821
Joined: 29 Sep 2013, 19:01

Re: Reads Excel (xlsx) Fast

Post by flyingDman » 26 Jun 2019, 11:11

try this:

Code: Select all

file2open := "C:\Users\xxxx\Documents\Spreadsheets\test 123.xlsx"		
resultFile := RegExReplace( file2open, "\.xlsx") "_resultFile.txt"
filedelete, %resultFile%

XL := ComObjCreate( "Excel.Application" )
XL.Workbooks.Open(file2open)
For sheet in XL.Worksheets
	{
	XL.sheets(sheet.name).UsedRange.copy
	myContents .= sheet.name ":`n" clipboard "`n`n"
	clipboard = 
	}
XL.Quit				
XL  =		
FileAppend,% myContents, % resultFile
Run, % resultFile,,, Max	
14.3 & 1.3.7

hasantr
Posts: 933
Joined: 05 Apr 2016, 14:18
Location: İstanbul

Re: Reads Excel (xlsx) Fast

Post by hasantr » 12 Nov 2019, 16:51

[Updated 2019 January 18]

This time, just collects contents.
-fixed "...cells is formatted (Bold/Italic/Underlined/formatted as text/foreground color/background color etc.) it will be skipped.."

Code: Select all

myExcel := "Testing.xlsx" 
tempFolder := RegExReplace(myExcel, ".*\K\\.*") "\_Excel_UnZip\"		
tempName := RegExReplace(myExcel, "\.xlsx") ".zip"
FileCopy, % myExcel , % tempName
FileCreateDir, % tempFolder
tempObject := ComObjCreate("Shell.Application")
tempObject.Namespace(tempFolder).CopyHere(tempObject.Namespace(tempName).items, 4|16)
FileDelete, % tempName
FileEncoding, UTF-8
FileRead, myCNT, % tempFolder "\" "xl\worksheets\sheet1.xml"	
FileRead, mySST, % tempFolder "\" "xl\sharedStrings.xml"		
FileRemoveDir, % tempFolder, 1
sharedStringArray := []
While @ := RegExMatch(mySST, "<t[^>]*>(.+?)</t>", _, @ ? StrLen(_) + @ : 1)  	
	sharedStringArray.Push(_1)
While @ := RegExMatch(myCNT, "<c r=""([A-Z]+)(\d+)""[^>]*?(t=""s"")?><v>(.+?)</v>", _, @ ? StrLen(_) + @ : 1)
{
	rowIndex := _2				;  1, 2, 3... 1048576
	columnName := _1			;  A, B...AA, AB,... XFD
	If(columnName = "A")    	;  at the moment column "A" Only
		myResult .= (_3 ? sharedStringArray[1 + _4] : _4) "`n"	
}
MsgBox % myResult
Thanks. It works great. And the document doesn't have to be opened.

I want to print all the content on a string. I want to print it in the most efficient way. What kind of path would you suggest?

blue83
Posts: 157
Joined: 11 Apr 2018, 06:38

Re: Reads Excel (xlsx) Fast

Post by blue83 » 18 Nov 2019, 03:00

Thank you @flyingDman on that.

One question to add, what if I have 5 columns in excel and I want to have in that .txt file at the end only column 1,2 and 4?
How to accomplish that?

Thank you

blue83
Posts: 157
Joined: 11 Apr 2018, 06:38

Re: Reads Excel (xlsx) Fast

Post by blue83 » 18 Nov 2019, 03:29

Ok, I have find a way :)

Code: Select all

Loop Read, Book1.txt, Book2.txt
{
   StringSplit Field, A_LoopReadLine, `,
   FileAppend %Field1%`,%Field2%`,%Field4%`n
}

User avatar
flyingDman
Posts: 2821
Joined: 29 Sep 2013, 19:01

Re: Reads Excel (xlsx) Fast

Post by flyingDman » 18 Nov 2019, 13:23

Well, I would go this way:

Code: Select all

myExcel := "C:\Users\xxxx\Documents\Spreadsheets\test 123.xlsx"		
resultFile := RegExReplace( myExcel, "\.xlsx") "_resultFile.txt"
filedelete, resultFile

oExcel := ComObjCreate( "Excel.Application" )
oExcel.Workbooks.Open( myExcel ) 		
oExcel.ActiveSheet.UsedRange.copy
myContents := clipboard
clipboard = 
oExcel.Quit				
oExcel  =		

for x,y in strsplit(myContents,"`r","`n")
	for a,b in z:=strsplit(y,"`t")
		newcontents .= z.1 "`t" z.3 "`n"                                        ; only extracts field 1 and 3
	
FileAppend,% newContents, % resultFile
14.3 & 1.3.7

blue83
Posts: 157
Joined: 11 Apr 2018, 06:38

Re: Reads Excel (xlsx) Fast

Post by blue83 » 20 Nov 2019, 05:30

Hi flyingDman,

Yes, that`s it :)

Thank you

blue83
Posts: 157
Joined: 11 Apr 2018, 06:38

Re: Reads Excel (xlsx) Fast

Post by blue83 » 21 Nov 2019, 04:52

Hi @flyingDman how to put 1 from z.1 or 3 from z.3 into variable?

Because sometimes columns which I need to extract are changing positions.

I have tryied all combinations but without success.

toralf
Posts: 868
Joined: 27 Apr 2014, 21:08
Location: Germany

Re: Reads Excel (xlsx) Fast

Post by toralf » 11 Aug 2022, 03:08

i modified the code a little.
i works for my use case.

Code: Select all

; original by IMEime
; changed to be a function and different tempFolder
; https://www.autohotkey.com/boards/viewtopic.php?f=6&t=29408
ExcelReadFast(myExcel){
  ;ensure tempfolder is empty
  tempFolder := A_Temp "\ExcelReadFast\"		
  FileRemoveDir, % tempFolder, 1
  FileCreateDir, % tempFolder
  
  ;copy xlsx file as zip file to temp folder
  SplitPath, myExcel, OutFileName, OutDir, OutExtension, OutNameNoExt, OutDrive
  tempName := tempFolder OutNameNoExt ".zip"
  FileCopy, % myExcel , % tempName

  ;unpack
  tempObject := ComObjCreate("Shell.Application")
  tempObject.Namespace(tempFolder).CopyHere(tempObject.Namespace(tempName).items, 4|16)
  
  ;read content
  oldFE := A_FileEncoding
  FileEncoding, UTF-8
  FileRead, myCNT, % tempFolder "\" "xl\worksheets\sheet1.xml"	
  FileRead, mySST, % tempFolder "\" "xl\sharedStrings.xml"		
  FileEncoding, %oldFE%
  
  ;clean up tempfolder and files
  FileRemoveDir, % tempFolder, 1
  
  ;process content sharedStrings
  sharedString := []
  While @ := RegExMatch(mySST, "<t[^>]*>(.+?)</t>", _, @ ? StrLen(_) + @ : 1)  	
    sharedString.Push(_1)

  ;process content sheet
  myResult := {}
  While @ := RegExMatch(myCNT, "<c r=""([A-Z]+)(\d+)""[^>]*?(t=""s"")?><v>(.+?)</v>", _, @ ? StrLen(_) + @ : 1)
  {
    rowIndex := _2          ;  1, 2, 3... 1048576
    columnName := _1        ;  A, B...AA, AB,... XFD
    If !myResult.HasKey(rowIndex)
      myResult[rowIndex] := {}
    myResult[rowIndex, columnName] := (_3 ? sharedString[1 + _4] : _4) 
  }
  
  Return myResult
}
ciao
toralf

Post Reply

Return to “Scripts and Functions (v1)”