Manage "blank" rows in a csv file Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Albireo
Posts: 1821
Joined: 16 Oct 2013, 13:53

Manage "blank" rows in a csv file

31 May 2022, 10:26

I want to handle a number of CSV files ("," comma separated files).
Trying to find a good way to handle these on.

My wish .:
  • Delete any rows that lack information in all fields.
  • Give a message if a row has the wrong number of fields.
(It may be that a field has been incorrectly described
for example 24,56 instead of "24,56" and I want to draw attention to this.)

Background .:
(A CSV-file can be from a few rows to maybe 50,000 rows)
Encode = UTF-8
csvFile = the name of the CSV-file.

If I want to read the headlines I use .:

Code: Select all

FileEncoding %vEncode%	; Encoding for FileReadLine and Loop Read
		FileReadLine outLine, % csvFile, 1
And to read the rest of the file I use this instructions.

Code: Select all

	Loop Read, %csvFile%
	{	Loop Parse, A_LoopReadLine, CSV
		{
But how to know that the row is empty? (the content of the different fields is not as structured as in the examples)

Some examples
example 1. (row 3 is empty)
Head1,Head2,Head3
info1,info2,info3

InfoA,infoB,infoC
example 2. (row 3 is empty)
Head1,Head2,Head3
info1,info2,info3
,,
InfoA,infoB,infoC
example 3. (row 3 is empty)
Head1,Head2,Head3
info1,info2,info3
"","",""
InfoA,infoB,infoC
example 4. (row 3 is empty)
Head1,Head2,Head3,Head4,Head5,Head6
info1,info2,info3,,Info5,
"",,"",,"",""
InfoA,infoB,infoC,,,
example 5. (row 1 and row 4 is empty)

Head1,Head2,Head3,Head4,Head5,Head6
info1,info2,info3,,Info5,
"","",,,,""
InfoA,infoB,infoC,,,
Perhaps this example is impossible in all different cases.
But completely blank lines before the headline can occur.
example 6. (row 1, 4 and 6 is empty)
row 7 has the wrong number of fields

Head1,Head2,Head3,Head4,Head5,Head6
info1,info2,info3,,Info5,
,"","","","",
InfoA,infoB,infoC,,,
,,,,,
,,
BoBo
Posts: 6563
Joined: 13 May 2014, 17:15

Re: Manage "blank" rows in a csv file

31 May 2022, 11:23

Well, example 6's row 1 isn't empty as it contains an EOL character! But you might end up with an empty line at the end of a text section (that is missing a trailing EOL).
Pretty sure that it can be easily sorted out using RegExMatch(). Nevertheless, here's a "workaround": viewtopic.php?p=464831#p464831 :shifty:

For "counting" the number of columns to identify a mismatch (expecting a confirmed unique/default delimiter) you could (ab)use MsgBox % StrSplit(A_LoopField,",").Count()!=6 ? "This line sucks!" : "Muito Bueno!"

But that won't handle a missing column-delimiter while one of the other cells contains a 'false friend' ie. '24,56' that would add up to a "correct" amount of cells again.

Which delimiter is finally used? https://www.autohotkey.com/docs/commands/LoopParse.htm#ExDelimiter
Albireo
Posts: 1821
Joined: 16 Oct 2013, 13:53

Re: Manage "blank" rows in a csv file

31 May 2022, 17:05

Thank you!

You have right about "EOL" but is not counted by strLen()
The instruction Loop Parse, A_LoopReadLine, CSV is not read a row from the CSV-file with only EOL.
I'm not sure how to use your other suggestions in my case.

Can add additional information .:
a)
The first thing that happens in the "real" program, is that all headings are placed in an array aHeader,
which makes it easy to check how many fields the CSV-file contains. fieldCount: = aHeader.count()

b)
The CSV-files are always "," (comma separated) to be able to use the CSV as delimiter in AHK (works very good)

Wrote a simplified test program to more easily test ideas and suggestions.
To test my various examples, just modify Gosub.
for example Gosub ex1 tests example 1, Gosub ex5 tests example 5 and so on.
The correct fieldCount (number of columns) has been specified for each example below.

Code: Select all

#SingleInstance, Force

csvFile := A_ScriptDir "\exampleFile.csv"

; - - -   initiate csvFile   - - - - - - - 
vEncode := "UTF-8" 
FileEncoding %vEncode%	; Encoding for FileReadLine and Loop Read

if FileExist(csvFile)
	FileDelete % csvFile

Gosub ex1

FileAppend % csvField1, %  csvFile, % vEncode
; - - -   csvFile is created   - - - - - - - 


; - - - - Handle empty rows - - - - -
Loop Read, % csvFile
{	tmp := A_Index
	if (strLen(A_LoopReadLine) < 1)
		Continue	; Ignore rows without content
		
	; MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % A_LoopReadLine
	Loop Parse, A_LoopReadLine, CSV
	{	res .= tmp "`n"
		; MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % A_LoopField
	}
	res .= "`n"
}
MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % res

ExitApp


; Example1
ex1:
	csvField1 =
	( LTrim
		Head1,Head2,Head3
		info1,info2,info3
	
		InfoA,infoB,infoC
	)
	fieldCount = 3
Return


; Example2
ex2:
	csvField1 =
	( LTrim
		Head1,Head2,Head3
		info1,info2,info3
		,,
		InfoA,infoB,infoC
	)
	fieldCount = 3
Return



; Example3
ex3:
	csvField1 =
	( LTrim
		Head1,Head2,Head3
		info1,info2,info3
		"","",""
		InfoA,infoB,infoC
	)
	fieldCount = 3
Return


; Example4
ex4:
	csvField1 =
	( LTrim
		Head1,Head2,Head3,Head4,Head5,Head6
		info1,info2,info3,,Info5,
		"",,"",,"",""
		InfoA,infoB,infoC,,,
	)
	fieldCount = 6
Return


; Example5
ex5:
	csvField1 =
	( LTrim
		Head1,Head2,Head3,Head4,Head5,Head6
		info1,info2,info3,,Info5,
		"","",,,,""
		InfoA,infoB,infoC,,,
	)
	fieldCount = 6
Return


; Example6
ex6:
	csvField1 =
	( LTrim

		Head1,Head2,Head3,Head4,Head5,Head6
		info1,info2,info3,,Info5,
		,"","","","",
		InfoA,infoB,infoC,,,
		,,,,,
		,,
	)
	fieldCount = 6
Return
BoBo
Posts: 6563
Joined: 13 May 2014, 17:15

Re: Manage "blank" rows in a csv file

01 Jun 2022, 00:26

You have right about "EOL" but is not counted by strLen()
Probably, bc you're getting each line extracted using A_LoopReadLine and that is stripping any EOL before you start parsing that line?

Code: Select all

#SingleInstance, Force

csv = 
(LTrim

a,b,c,d

)

Loop, Parse, csv, CSV
   MsgBox % StrLen(A_LoopField) "|" A_LoopField "|"
This is returning a length of 2 for a and d regardless that they should be counted as "single" characters, but StrLen() is adding the leading/trailing EOL to them. But as said, that might be different if Loop, Parse is embedded within a Loop, Read.

Code: Select all

csv = 
(

"1","2,5","3","4"
"","",""
,,,
)

Loop, Parse, csv, `n
   Msgbox   % StrSplit(A_LoopField,"""").Count()=0 
            ? StrSplit(A_LoopField,"""").Count() 
            : Round((StrSplit(A_LoopField,"""").Count()-1)/2)
Another workaround until a RegEx-solution pops up - that should validate the number of cells per row, if its content is "quoted" correctly.
Albireo
Posts: 1821
Joined: 16 Oct 2013, 13:53

Re: Manage "blank" rows in a csv file

01 Jun 2022, 04:26

Thank you!
The handling of "blank rows" was more difficult than I thought.

A small modification of the wish.
If field 1 is missing and one of the other fields has some content, the event must be noted (in some way)

I created the following test script .:

Code: Select all

#SingleInstance Force

csv = 
(

"1","2,5","3","4"
"bla","bla","bla "bla" bla",
"","",""
"",34
,,,

)

Loop Parse, csv, `n
{	MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % "Before the filter for blank rows`nA_Index .: " A_Index "`n`nstrLen . .: " strLen(A_LoopField) "`n|" A_LoopField "|"

	flagField := false
	field1 := ""
	loop parse, A_LoopField, CSV, """"
	{	fieldCount := A_Index
		If ( strLen(A_LoopField) > 0 )
			flagField := true	; any field exist
		
		If A_Index = 1
			field1 := A_LoopField	; Save the first field for errorhandling
	}
	If !flagField	; No fields exist
		Continue
	
	; MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % "A_Index .: " A_Index "`nflagField .: " flagField "`nfield1 .: " field1
	If flagField and (strLen(field1) = 0)
	{	MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % "Error handling!`n`n Field 1 is missing .: " A_LoopField
		Continue
	}
	
	MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % "After compare`nA_Index .: " A_Index "`n`nstrLen . .: " strLen(A_LoopField) "`nnumber of fields . .: " fieldCount "`nA_LoopField .:`t |" A_LoopField "|"
}

ExitApp
I do not know if it works in all cases, and hope all comparisons are fast (can be many 100,000 compares in one run)
BoBo
Posts: 6563
Joined: 13 May 2014, 17:15

Re: Manage "blank" rows in a csv file

01 Jun 2022, 04:39

"bla","bla","bla "bla" bla", You're able to catch this? Otherwise, it means that your "quote"-regime/spec gets busted. :think: IMHO you can't handle any weird content setting and shouldn't allow it in the first place. Btw, "Quoting 'quoted' quotes" has to be done this way or in a similar fashion, JM2€ents
LazarouJoinery
Posts: 22
Joined: 02 Jan 2022, 21:41

Re: Manage "blank" rows in a csv file

01 Jun 2022, 07:38

Hi Albireo;

My advices:
- maybe consider ObjCSV.ahk?
I wrote this for it;
maybe it will help you
viewtopic.php?f=6&t=101791
- maybe consider Excel?
viewtopic.php?f=6&t=103500
Albireo
Posts: 1821
Joined: 16 Oct 2013, 13:53

Re: Manage "blank" rows in a csv file

03 Jun 2022, 05:03

Suppose I know the field separator (in this example as ; )
Then every rows with only the following characters " and/or ; should be able to be skipped / deleted?
Then check if anything is left. (if nothing is left, the row is empty)

Edit .:
How to write a RegexMatch() that removes all field dividers ( ; in this case) and " ?
(or is it better to use StrSplit() ) ?
end Edit

Comments!
BoBo wrote:
01 Jun 2022, 04:39
"bla","bla","bla "bla" bla", You're able to catch this? Otherwise, it means that your "quote"-regime/spec gets busted. :think: IMHO you can't handle any weird content setting and shouldn't allow it in the first place. Btw, "Quoting 'quoted' quotes" has to be done this way or in a similar fashion, JM2€ents
I know it's difficult. but only the CSV option in AHK can handle this. (that's why I want to have all my CSV files with , (comma) as separator.
But the challenge is that many of the source files have ; as separator, because in Sweden we have , (comma) as decimal point.
LazarouJoinery wrote:
01 Jun 2022, 07:38
...My advices:
- maybe consider ObjCSV.ahk?
I wrote this for it;
maybe it will help you
viewtopic.php?f=6&t=101791 ....
I have tried to understand how your proposal could be used in my case (but failed)

At the beginning of the link there are 3 + 4 points that try to explain what can be done.
(but I do not understand a single of the points - sorry)
One example point 1 loads from txt, instead of working with files.
I understand loads from txt (maybe the source is a textfile)
but Instead of working with files ??? What does this want to convey to me?
( I don't use Excel - I use LibreOffice. )

The main issue in this case was to handle blank lines. (after that the result was placed in an array)
If I know which field separator is used in the source file.
(or it's always converted to , as field separator)
the function .: CheckDelim()
the function .: ChangeDelim()
BoBo
Posts: 6563
Joined: 13 May 2014, 17:15

Re: Manage "blank" rows in a csv file

04 Jun 2022, 18:01

@Albireo
Determines which delimiter was encountered...
https://www.autohotkey.com/docs/commands/LoopParse.htm#
viewtopic.php?f=10&t=104590 :shh: :lol:

Code: Select all

#SingleInstance, Force

delm  = ";"
ColNo =  5
row   =
   (LTrim Comment
   "a";"2.300,50€";"b";"Hello World";"3"  ; 1
   ""                                     ; 0
   ,,,,                                   ; 0
   ";                                     ; 0
   ","                                    ; 0
                                          ; 0
   "bla";'blu';"blo"                      ; 0
   "";"";"";""                            ; 1 (valid but skipped bc number of 'columns' < 5)
   "x";"y";"z";"@";"#"                    ; 1
   ) 

Loop, parse, row, `n
   {  if InStr(A_LoopField,delm)!=0
         filteredRows .= StrSplit(A_LoopField,delm).Count()=ColNo ? A_LoopField "`n" : ""
   }
MsgBox % filteredRows      
Return
HTH
Albireo
Posts: 1821
Joined: 16 Oct 2013, 13:53

Re: Manage "blank" rows in a csv file

07 Jun 2022, 15:27

Thank you!
But the delimitercheck from the manual, only works with some csv-files.
BoBo wrote:
04 Jun 2022, 18:01
... Determines which delimiter was encountered...
https://www.autohotkey.com/docs/commands/LoopParse.htm# ...
(It doesn't work on this cases)
"1,5;2,5;3,5;4,5;5,5;6,5;7,5;8,5;9"
"0,001"; "0,002"; "0,003"; "0,004"; "0,005"
"34,25",,"43,52"
And so on.

@BoBo shows a second example - it probably works perfectly because the delimiter is already known.

My questions still remains.
How to write a RegexMatch() that removes all field dividers ( ; in this case) and " ?
(or is it better to use StrSplit() )
?
BoBo
Posts: 6563
Joined: 13 May 2014, 17:15

Re: Manage "blank" rows in a csv file

08 Jun 2022, 06:00

But the delimitercheck from the manual, only works with some csv-files.
Well, you gave yourself the answer already - bc there are too many possible irregularities.
My questions still remain.
How to write a RegexMatch() that removes all field dividers ( ; in this case) and " ?
Tricky for the above reason. How to decide what is in fact a divider or part of a cell's content? If those are fixed valid settings you wouldn't need RegEx-stuff but only StrReplace(), right?
(or is it better to use StrSplit() ) ?
I don't think so. While it can be (ab)used to do some basic pattern detection, it's more error-prone and less flexible bc it's not meant to be used that way. Go 4 RegEx!
Albireo
Posts: 1821
Joined: 16 Oct 2013, 13:53

Re: Manage "blank" rows in a csv file

08 Jun 2022, 08:13

If I use this test data .:

Code: Select all


"";"";"";"";"";""
"1,5;2,5;3,5;4,5;5,5;6,5;7,5;8,5;9"
"0,001"; "0,002"; "0,003"; "0,004"; "0,005"; "0,006"; "0,007"; "0,008"; "0,009"
0,001; 0,002; 0,003
"";"";"";"";"";34

"1";"2,5";"3";"4";;
"bla";"bla";"bla "bla" bla";;;;
"";"";"";;
"34,25";;"43,52"
"";"34,25"
;;;

and run the function() CheckDelim() I got the result ;
CheckDelim() checks a maximum of 5 lines. - if there is still uncertainty, which delimiter has ben used, an error message is displayed.

If the field divider is changed from ; to , with the function() ChangeDelim() (above)
Gave the following results (with the empty rows removed).:

Code: Select all

"1,5;2,5;3,5;4,5;5,5;6,5;7,5;8,5;9"
"0,001", "0,002", "0,003", "0,004", "0,005", "0,006", "0,007", "0,008", "0,009"
0,001, 0,002, 0,003
"1","2,5","3","4",,
"bla","bla","bla "bla" bla",,,,
"34,25",,"43,52"
"","34,25"
The only problem I found in this field divider conversion was .: from 0,001; 0,002; 0,003 to 0,001, 0,002, 0,003
(In this case it is not possible to split the fields with Loop parse, aString, CSV)
Will try to handle this, so the result will instead be: "0,001", "0,002", "0,003"
After analysis of the field divider that the CSV-file uses, it should not be difficult for someone who can use RegexMatch() to filter out, for example, a line like this. ""; ""; ""; "" (if the field divider is ; )
Albireo
Posts: 1821
Joined: 16 Oct 2013, 13:53

Re: Manage "blank" rows in a csv file  Topic is solved

10 Jun 2022, 05:29

Thank you!

Have not done so many different tests, but since the field divider is known, the solution was the following. (the idea by flyingDman)
this solution is not speed tested, eg if 50,000 rows and many fields are to be processed. - Most rows have content - How long will it take?)

Code: Select all

#Singleinstance force
csv =
( LTrim

	"";"";"";"";"";""
	"1";"2,5";"3";"4";;
	"";"";"";"";"";3
	"";"";"";;;
	"bla";"bla";"bla "bla" bla";;;

	123;"some text" 125 "abcdef";"174,50";asd";"asd;
	0,001; 0,002; 0,003; 0,004; 0,005; 0,006
	"0,001"; "0,002"; "0,003"; "0,004"; "0,005"; "0,006"
	"1,5;2,5;3,5;4,5;5,5;6,5;7,5;8,5;9"
	"34,25";;"43,52"
	;;;;"";"34,25"
	;;;
)
Delim := ";"
for x, y in strsplit(csv,"`n")
	if (y ~= "[^" Delim """""]" )
		ncsv .= y "`n"
msgbox % ">>" rtrim(ncsv,"`n") "<<"    ; << and >> included to show we're catching the start and the end of the string

Return to “Ask for Help (v1)”

Who is online

Users browsing this forum: leothlon and 103 guests