Reads Excel (xlsx) Fast

Post your working scripts, libraries and tools for AHK v1.1 and older
IMEime
Posts: 750
Joined: 20 Sep 2014, 06:15

Reads Excel (xlsx) Fast

Post by IMEime » 20 Mar 2017, 07:43

[EDIT]
**********************************************************************************************
It came to me a couple of days ago though.
It is quite old specifications (Let's say millions of years or so..)


Just have a (archeological) fun.
That is my real intention here.

**********************************************************************************************


I wrote a brief code for fun.

It reads xlsx file's content.
(strings and numbers-simply data.)

First of all.
I made a testing.xlsx file.
It has 100,000 cells within a sheet.
Each of it has 10 alpha-numeric random characters.
For instance, "d4j9415852".
And all of the cells are unique (no duplications).

Now.
I tried to extract its contents and made a txt file.

When I run the COM PIA codes.
It took me 24.9 secs.

And, this "Fast" codes took only 0.8 sec.

OMG, it is 30 times faster.
That is good, real good !!!

Though, it is just a level of pseudo code, it works.

Regards



COM PIA codes (it is slow, too slow)

Code: Select all

myExcel := "Testing.xlsx"		
startTime  :=  A_TickCount															
oExcel := ComObjCreate( "Excel.Application" )
oExcel.Workbooks.Open( myExcel ) 		
For eachCell In oExcel.ActiveSheet.UsedRange.Columns(1).Cells
	myContents .= eachCell.Text "`n"
oExcel.Quit				
oExcel  =		
resultFile := RegExReplace( myExcel, "\.xlsx") "_resultFile.txt"
FileAppend,% myContents, % resultFile
;Run, % resultFile,,, Max	
MsgBox % A_TickCount - startTime
my "Fast" codes

Code: Select all

;  [Updated 2019 January 18] This is OLD Code
myExcel := "Testing.xlsx" 	
startTime  :=  A_TickCount	
tempFolder := RegExReplace( myExcel, ".*\K\\.*") "\_Excel_UnZip\"		
tempName := RegExReplace( myExcel, "\.xlsx") ".zip"
FileCopy, % myExcel , % tempName
FileCreateDir, % tempFolder
tempObject := ComObjCreate("Shell.Application")
tempObject.Namespace(tempFolder).CopyHere( tempObject.Namespace(tempName).items, 4|16)	
FileDelete, % tempName
FileEncoding, UTF-8
FileRead, contentNumber, % tempFolder "\" "xl\worksheets\sheet1.xml"	
FileRead, contentString, % tempFolder "\" "xl\sharedStrings.xml"			
stringArray := []
While @ := RegExMatch( contentString, "<t>(.+?)</t>", _, @ ? StrLen(_) + @ : 1 )  	
	stringArray.Insert( _1)
myPattern := "<c r=""[A-Z]+[0-9]+""( t=""s"")?><v>(.+?)</v></c>"
While @ := RegExMatch( contentNumber, myPattern, _, @ ? StrLen(_) + @ : 1 )  		
	myContents .= ( StrLen( _1) ? stringArray[1+_2] : _2 ) "`n"		
FileRemoveDir, % tempFolder, 1
resultFile := RegExReplace( myExcel, "\.xlsx") "_Extracted.txt"
FileAppend,% myContents, % resultFile
;Run, % resultFile,,, Max	
MsgBox % A_TickCount - startTime

[Updated 2019 January 18]

This time, just collects contents.
-fixed "...cells is formatted (Bold/Italic/Underlined/formatted as text/foreground color/background color etc.) it will be skipped.."

Code: Select all

myExcel := "Testing.xlsx" 
tempFolder := RegExReplace(myExcel, ".*\K\\.*") "\_Excel_UnZip\"		
tempName := RegExReplace(myExcel, "\.xlsx") ".zip"
FileCopy, % myExcel , % tempName
FileCreateDir, % tempFolder
tempObject := ComObjCreate("Shell.Application")
tempObject.Namespace(tempFolder).CopyHere(tempObject.Namespace(tempName).items, 4|16)
FileDelete, % tempName
FileEncoding, UTF-8
FileRead, myCNT, % tempFolder "\" "xl\worksheets\sheet1.xml"	
FileRead, mySST, % tempFolder "\" "xl\sharedStrings.xml"		
FileRemoveDir, % tempFolder, 1
sharedStringArray := []
While @ := RegExMatch(mySST, "<t[^>]*>(.+?)</t>", _, @ ? StrLen(_) + @ : 1)  	
	sharedStringArray.Push(_1)
While @ := RegExMatch(myCNT, "<c r=""([A-Z]+)(\d+)""[^>]*?(t=""s"")?><v>(.+?)</v>", _, @ ? StrLen(_) + @ : 1)
{
	rowIndex := _2				;  1, 2, 3... 1048576
	columnName := _1			;  A, B...AA, AB,... XFD
	If(columnName = "A")    	;  at the moment column "A" Only
		myResult .= (_3 ? sharedStringArray[1 + _4] : _4) "`n"	
}
MsgBox % myResult
Note:
-I am not good at numbers in Excel. For instance;
"Excel cell values" --> "Values in Here"
"0.01" --> "0.01" It is OK. No problem at all
"0.002" --> "2.0000000000000001E-3" It starts problem here. Suddenly changed into Scientific. What the heck.. ?
"0.0034" --> "3.3999999999999998E-3" Woooo... It is the worst I have ever seen. I just gave it up.


When you are going to use someone else's script, Please just leave a brief comment saying thank you.
타인의 스크립트를 이용할 때는 최소한의 감사 표시를 남기시기 바랍니다. 개싸가지 도적질은 그만 하시고..
Last edited by IMEime on 18 Jan 2019, 23:05, edited 43 times in total.


IMEime
Posts: 750
Joined: 20 Sep 2014, 06:15

Re: Reads Excel (xlsx) Fast

Post by IMEime » 20 Mar 2017, 10:47

Nice try.

Though, it is still slower 3 times than mine.

Mine is less than 1s.
Yours is more than 2s. Always..

Regards.

ozzii
Posts: 481
Joined: 30 Oct 2013, 06:04

Re: Reads Excel (xlsx) Fast

Post by ozzii » 21 Mar 2017, 04:20

Doesn't work with xlsm files ;(

IMEime
Posts: 750
Joined: 20 Sep 2014, 06:15

Re: Reads Excel (xlsx) Fast

Post by IMEime » 21 Mar 2017, 06:09

ozzii wrote:Doesn't work with xlsm files ;(
I have tested it with xlsx not with xlsm.
What is xlsm ? Sorry, I do not know it.

ozzii
Posts: 481
Joined: 30 Oct 2013, 06:04

Re: Reads Excel (xlsx) Fast

Post by ozzii » 22 Mar 2017, 09:06

Same but with macros.
I've changed xlsx in the code with xlsm, but same.

IMEime
Posts: 750
Joined: 20 Sep 2014, 06:15

Re: Reads Excel (xlsx) Fast

Post by IMEime » 22 Mar 2017, 10:55

I don no
I wrote all I know in the codes
That is all
I have nothing else to say

Regards

User avatar
FanaticGuru
Posts: 1906
Joined: 30 Sep 2013, 22:25

Re: Reads Excel (xlsx) Fast

Post by FanaticGuru » 22 Mar 2017, 17:11

IMEime wrote:
Nice try.

Though, it is still slower 3 times than mine.

Mine is less than 1s.
Yours is more than 2s. Always..

Regards.
My results are reversed from yours.

Below code of yours (with a modified path to my test folder) executes always more than 2 seconds for me.

Code: Select all

myExcel := A_Desktop "\Test\Test.xlsx" 	
tempFolder := RegExReplace( myExcel, ".*\K\\.*") "\_Excel_UnZip\"		
tempName := RegExReplace( myExcel, "\.xlsx") ".zip"
FileCopy, % myExcel , % tempName
FileCreateDir, % tempFolder
tempObject := ComObjCreate("Shell.Application")
tempObject.Namespace(tempFolder).CopyHere( tempObject.Namespace(tempName).items, 4|16)	
FileDelete, % tempName
FileEncoding, UTF-8
FileRead, contentNumber, % tempFolder "\" "xl\worksheets\sheet1.xml"	
FileRead, contentString, % tempFolder "\" "xl\sharedStrings.xml"			
stringArray := []
While @ := RegExMatch( contentString, "<t>(.+?)</t>", _, @ ? StrLen(_) + @ : 1 )  	
	stringArray.Insert( _1)
myPattern := "<c r=""[A-Z]+[0-9]+""( t=""s"")?><v>(.+?)</v></c>"
While @ := RegExMatch( contentNumber, myPattern, _, @ ? StrLen(_) + @ : 1 )  		
	myContents .= ( StrLen( _1) ? stringArray[1+_2] : _2 ) "`n"		
FileRemoveDir, % tempFolder, 1
resultFile := RegExReplace( myExcel, "\.xlsx") "_Extracted.txt"
FileAppend,% myContents, % resultFile
This code always less than a second.

Code: Select all

oExcel := ComObjCreate("Excel.Application")
oExcel.Workbooks.Open(A_Desktop "\Test\Test.xlsx") 
oExcel.DisplayAlerts := False
oExcel.ActiveWorkbook.SaveAs(A_Desktop "\Test\Test.csv", 6)
oExcel.Quit()
And most of that time is just the ComObjCreate("Excel.Application") which is the overhead it takes to start Excel running.

Yours is a nice demonstration that Excel files are really just specialized Zip files (at least in newer versions) and can be manipulated without Excel.

FG
Hotkey Help - Help Dialog for Currently Running AHK Scripts
AHK Startup - Consolidate Multiply AHK Scripts with one Tray Icon
Hotstring Manager - Create and Manage Hotstrings
[Class] WinHook - Create Window Shell Hooks and Window Event Hooks

User avatar
FanaticGuru
Posts: 1906
Joined: 30 Sep 2013, 22:25

Re: Reads Excel (xlsx) Fast

Post by FanaticGuru » 22 Mar 2017, 17:22

ozzii wrote:Doesn't work with xlsm files ;(
It works with xlsm files for me.
Xlsx and Xlsm file structures are very similar. Xlsm just has a few extra files in the Zip but the same folder layout.

FG
Hotkey Help - Help Dialog for Currently Running AHK Scripts
AHK Startup - Consolidate Multiply AHK Scripts with one Tray Icon
Hotstring Manager - Create and Manage Hotstrings
[Class] WinHook - Create Window Shell Hooks and Window Event Hooks

IMEime
Posts: 750
Joined: 20 Sep 2014, 06:15

Re: Reads Excel (xlsx) Fast

Post by IMEime » 22 Mar 2017, 20:38

[EDIT]
People do not read comment carefully
I understand it fully.
I, myself too, do not read whole comments character by character. Because, I am not crazy

But, I do not ask questions on the basis of that kind of skimmed reading.
Following testing is done on the basis of OLD codes




@FG

It depends.

<< XML method>>
Cells 10,000 --> 0.7s
Cells 100,000 --> 0.7s
Cells 1,000,000 --> 38s (not 3.8s)

<< COM Save as>>
Cells 10,000 --> 1.8s
Cells 100,000 --> 2.2s
Cells 1,000,000 --> 5.5s

(AHK 64 Unicode, Office 2013 32bit, Win 8.1, Core i7)

Regards

-Ahh.. sorry, I forgot, I have two VSTO settings (Ribbons) in my everyday Excel. That could make it slower.
-On second thought, the result still has its meaning to me. I need all geared Excle, I do not need bare/clean Excel. It is useless to me.
Last edited by IMEime on 26 Jan 2019, 11:42, edited 2 times in total.

sv270190
Posts: 45
Joined: 06 Feb 2014, 11:48

Re: Reads Excel (xlsx) Fast

Post by sv270190 » 18 Jun 2017, 12:30

kind attention: Blackholyman
saving-excel-file-data-to-csv
your codes works good. thank you.

1. kindly modify the codes so that the excel file is open and active sheet is saved as csv file

2. all the sheets of the active work book is saved as csv files

3. normal fileselectfile save all sheets as CSV files

it will greatly help us in office work
S.V. SRINIVASAN
SRIVILLIPUTTUR
TAMIL NADU

User avatar
AlphaBravo
Posts: 586
Joined: 29 Sep 2013, 22:59

Re: Reads Excel (xlsx) Fast

Post by AlphaBravo » 12 Jan 2019, 12:20

Hello IMEime,

I have noticed your reference to this script from "Comparing 2 excel Files takes Forever to finish" request, I gave it a bit of testing and while it is for sure faster than iterating through Excel cells via COM, I noticed that if a cells is formatted (Bold/Italic/Underlined/formatted as text/foreground color/background color etc.) it will be skipped from the result text file.

IMEime
Posts: 750
Joined: 20 Sep 2014, 06:15

Re: Reads Excel (xlsx) Fast

Post by IMEime » 14 Jan 2019, 03:03

@AlphaBravo

Ho.. really ?
Nice comment

I just tested only with extremely simple xlsx.

For your reference, I guess you know already well (or not)
All of its information are documented Perfectly ahhhhh ten or twenty years ago.
You can find it out somewhere else for sure

Any way
I will check it out with time

bye

IMEime
Posts: 750
Joined: 20 Sep 2014, 06:15

Re: Reads Excel (xlsx) Fast

Post by IMEime » 18 Jan 2019, 04:20

Updated A Little Bit
[Updated 2019 January 18]

IMEime
Posts: 750
Joined: 20 Sep 2014, 06:15

Re: Reads Excel (xlsx) Fast

Post by IMEime » 18 Jan 2019, 18:19

Excel column Name --> column Index
Excel column Index --> column Name

Mine here
https://www.autohotkey.com/boards/viewt ... =6&t=61118

jeeswg's here
https://www.autohotkey.com/boards/viewt ... 23#p164723

IMEime
Posts: 750
Joined: 20 Sep 2014, 06:15

Re: Reads Excel (xlsx) Fast

Post by IMEime » 18 Jan 2019, 22:58

I have test this code with a one million cells Excel file.

- AHK : 95 seconds
- C# : 8 seconds

This is a Huge gap...

Half of its time consumed at the first Array
The other half is consumed at the 2nd Array. (just modified string 'myResult' into an Array)

One million cells Excel file here.
https://drive.google.com/file/d/17_T1cK ... sp=sharing

SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: Reads Excel (xlsx) Fast

Post by SOTE » 25 Jan 2019, 14:41

IMEime wrote:
18 Jan 2019, 22:58
I have test this code with a one million cells Excel file.

- AHK : 95 seconds
- C# : 8 seconds

This is a Huge gap...

Half of its time consumed at the first Array
The other half is consumed at the 2nd Array. (just modified string 'myResult' into an Array)

One million cells Excel file here.
https://drive.google.com/file/d/17_T1cK2Aoq-Add27WV9v1Z_NrJUYLMiV/view?usp=sharing
Interesting. But I'm not quite sure of who is doing any real work with one million cell Excel files.

And why the difference from the previous test?
@FG

It depends.

<< XML method>>
Cells 10,000 --> 0.7s
Cells 100,000 --> 0.7s
Cells 1,000,000 --> 38s (not 3.8s)

<< COM Save as>>
Cells 10,000 --> 1.8s
Cells 100,000 --> 2.2s
Cells 1,000,000 --> 5.5s

(AHK 64 Unicode, Office 2013 32bit, Win 8.1, Core i7)

Regards

-Ahh.. sorry, I forgot, I have two VSTO settings (Ribbons) in my everyday Excel. That could make it slower.
-On second thought, the result still has its meaning to me. I need all geared Excle, I do not need bare/clean Excel. It is useless to me.

User avatar
FanaticGuru
Posts: 1906
Joined: 30 Sep 2013, 22:25

Re: Reads Excel (xlsx) Fast

Post by FanaticGuru » 25 Jan 2019, 15:34

SOTE wrote:
25 Jan 2019, 14:41
And why the difference from the previous test?
@FG

It depends.

<< XML method>>
Cells 10,000 --> 0.7s
Cells 100,000 --> 0.7s
Cells 1,000,000 --> 38s (not 3.8s)

<< COM Save as>>
Cells 10,000 --> 1.8s
Cells 100,000 --> 2.2s
Cells 1,000,000 --> 5.5s

(AHK 64 Unicode, Office 2013 32bit, Win 8.1, Core i7)

Regards

-Ahh.. sorry, I forgot, I have two VSTO settings (Ribbons) in my everyday Excel. That could make it slower.
-On second thought, the result still has its meaning to me. I need all geared Excle, I do not need bare/clean Excel. It is useless to me.
COM is faster in doing the work but there is an overhead of that Excel has to startup. That can take a second or two, after that overhead of starting up Excel. Excel is going to do the work much faster. On my system with a SSD, Excel starts hidden in less than a second when it don't have to do all the junk to display its GUI.

COM Excel is faster for me with even small workbooks. Now if you have add-ins and other things that makes your Excel startup very slow (which IMEime says they do) then the results could be different.

Basically in my opinion, as far as speed, this technique is all hype. It is like saying, "I can run faster than Usain Bolt!". When what you are really saying is that "If we wake up from sleeping, I will finish running a 100 meters sooner than Usain Bolt because I run bare foot and he will have to put on his shoes before running so I will get a head start but he will still catchup and pass me if the run is very long because he actually runs faster!".

It is still a nice demonstration that new Excel files are a special type of zip file that uses an XML structure that can be accessed directly without using Excel. It could be very useful for accessing Excel files on a computer that does not even have Excel installed. But as far as speed goes, Excel written in compiled C++ is going to do everything with that XML faster than AHK can hope as an interpreted language once they both start actually working on the XML.

FG
Hotkey Help - Help Dialog for Currently Running AHK Scripts
AHK Startup - Consolidate Multiply AHK Scripts with one Tray Icon
Hotstring Manager - Create and Manage Hotstrings
[Class] WinHook - Create Window Shell Hooks and Window Event Hooks

awel20
Posts: 211
Joined: 19 Mar 2018, 14:09

Re: Reads Excel (xlsx) Fast

Post by awel20 » 25 Jan 2019, 15:49

FanaticGuru wrote:
25 Jan 2019, 15:34
It is still a nice demonstration
Maybe just as a curiosity.

But parsing XML with RegEx? ...surely nothing could go wrong \s

User avatar
FanaticGuru
Posts: 1906
Joined: 30 Sep 2013, 22:25

Re: Reads Excel (xlsx) Fast

Post by FanaticGuru » 25 Jan 2019, 16:50

awel20 wrote:
25 Jan 2019, 15:49
FanaticGuru wrote:
25 Jan 2019, 15:34
It is still a nice demonstration
Maybe just as a curiosity.

But parsing XML with RegEx? ...surely nothing could go wrong \s
Kind of like running bare foot! :P

FG
Hotkey Help - Help Dialog for Currently Running AHK Scripts
AHK Startup - Consolidate Multiply AHK Scripts with one Tray Icon
Hotstring Manager - Create and Manage Hotstrings
[Class] WinHook - Create Window Shell Hooks and Window Event Hooks

Post Reply

Return to “Scripts and Functions (v1)”