Manage the Input to - Loop parse , ... , CSV

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Albireo
Posts: 1774
Joined: 16 Oct 2013, 13:53

Manage the Input to - Loop parse , ... , CSV

11 Jun 2022, 03:27

Hi!
Encountered a problem, when I wanted to split / handle a CSV file

My question / desire
How to remove the spaces between the fields in the source?


Description of the challenge
Both rows in the example script below, contains 5 fields.
But, the first row can't loop parse, v, CSV handle as I wish
(due to the space between the fields)

Code: Select all

infoCSV =
(
"abc def", "0,02", "0,03", "0,04", "0,05"
"fed cba","29,90","35,75","48,50","55,10"
)

for k, v in StrSplit(infoCSV, "`n")
	loop parse, v, CSV
		s .= "k.: " k "`tv.: " A_LoopField "`n"
MsgBox % s
Rohwedder
Posts: 7718
Joined: 04 Jun 2014, 08:33
Location: Germany

Re: Manage the Input to - Loop parse , ... , CSV

11 Jun 2022, 04:19

Hallo,
try:

Code: Select all

infoCSV =
(
"abc def", "0,02", "0,03", "0,04", "0,05"
"fed cba","29,90","35,75","48,50","55,10"
)

for k, v in StrSplit(infoCSV, "`n")
	loop parse,% StrReplace(v, ", ", ","), CSV
		s .= "k.: " k "`tv.: " A_LoopField "`n"
MsgBox % s
or:

Code: Select all

infoCSV =
(
"abc def", "0,02", "0,03",   "0,04",      "0,05"
"fed cba","29,90","35,75","48,50","55,10"
)

for k, v in StrSplit(infoCSV, "`n")
	loop parse,% RegExReplace(v, ",\s+", ","), CSV
		s .= "k.: " k "`tv.: " A_LoopField "`n"
MsgBox % s
Albireo
Posts: 1774
Joined: 16 Oct 2013, 13:53

Re: Manage the Input to - Loop parse , ... , CSV

11 Jun 2022, 04:43

Thank you!
the second one works better
Albireo
Posts: 1774
Joined: 16 Oct 2013, 13:53

Re: Manage the Input to - Loop parse , ... , CSV

11 Jun 2022, 10:23

Hmmm.
Expands the problem in the example a bit (the third/fourth line)
The third line is Empty.
This code change the delimiter from ; to , (in this case)
But how to handle the rows with a space around the delimiter? (now the proposal does not work ...)
(and delete rows "without" content)

Code: Select all

#SingleInstance Force
infoCSV =
(

"abc def"; "0,02"   ; "0,03";   "0,04" ;      "0,05"
"fed cba";"29,90";"35,75";"48,50";"55,10"
"";"" ;"";  ""  ;"" ;""
"bla";"bla";"bla "bla" bla";;;
"";"";"";;;

)


arrayCSV := {}
delimFrom := ";"
delimTo := ","

aInputCSV := StrSplit(infoCSV, "`n", "`r") ; Create an array of each row

rowOK = 0
For LineNo, inputLine in aInputCSV
{	newInputCSV := "" 
	charPos = 1
	countDelim = 0
	; countQuotes = 0
	For CharNo, Char in StrSplit(inputLine)
	{	if !inQuotes
		{	; Check for START quote character
			if ( Char = """" )
			{	inQuotes := True
				; countQuotes += 1
				Continue
			}
			
			; Check if the character is a delimiter
			if ( Char = delimFrom )
			{	If ( delimFrom <> delimTo )	; If the delimiter is to be changed
					newInputCSV .= SubStr(inputLine, charPos, A_Index - charPos) delimTo
				charPos := A_Index + 1	; Next field begin
				countDelim += 1
			}
		}
		else
		{	; Check for END quote character
			if ( Char = """" )
			{	inQuotes := False
				; countQuotes += 1
			}
		}
	}
	newInputCSV .= SubStr(inputLine, StrLen(newInputCSV) + 1 , StrLen(inputLine) - StrLen(newInputCSV) + 1 )
	; MsgBox ,, Row.: %A_LineNumber% -> %A_ScriptName%, % "A_Index .: " A_Index "`n|" newInputCSV "|"
	
	arrayCSV.Push( newInputCSV )	; Lägg in resultatsträngen i en array
	if rowOK = 1
	{	fieldNo := countDelim + 1	; Räkna antal fält i första godkända rad!
	}
}

for k, v in arrayCSV
	s .= "k.: " k "`tv.: " v "`n"
MsgBox 64, Row.: %A_LineNumber% -> %A_ScriptName%, % "The result (is OK!) .: `n" s

s:= ""
for k, v in StrSplit(infoCSV, "`n")
	loop parse,% RegExReplace(v, ",\s*", ","), CSV
		s .= "k.: " k "`tv.: " A_LoopField "`n"
MsgBox % s

Rohwedder
Posts: 7718
Joined: 04 Jun 2014, 08:33
Location: Germany

Re: Manage the Input to - Loop parse , ... , CSV

11 Jun 2022, 10:48

Then perhaps?:

Code: Select all

#SingleInstance Force
infoCSV =
(
"abc def", "0,02", "0,03",   "0,04",      "0,05"
"fed cba","29,90","35,75","48,50","55,10"

"abc def"; "0,02"   ; "0,03";   "0,04" ;      "0,05"
"fed cba";"29,90";"35,75";"48,50";"55,10"
"";"" ;"";  ""  ;"" ;""
"bla";"bla";"bla "bla" bla";;;
"";"";"";;;

)
for k, v in StrSplit(infoCSV, "`n")
	IF RegExMatch(v, "\d") ;If there is a digit in v 
		loop parse,% RegExReplace(v, "(;|,)\s*""", ","""), CSV
			s .= "k.: " k "`tv.: " A_LoopField "`n"
MsgBox % s
; RegExReplace(v, "(;|,)\s*""", ",""") means:
; all (semicolon or comma & none to several tabs or spaces & quotation mark)
; in v will be replaced by (comma & quotation mark). CSV needs commas!
https://i.redd.it/nac35ntlfg831.jpg
User avatar
flyingDman
Posts: 2832
Joined: 29 Sep 2013, 19:01

Re: Manage the Input to - Loop parse , ... , CSV

11 Jun 2022, 12:51

the IF RegExMatch(v, "\d") eliminates all elements of the alpha only lines like the bla,bla line.
What about this:

Code: Select all

for k, v in StrSplit(infoCSV, "`n")
	loop parse,% RegExReplace(v, "(,|;)\s*""",","""), CSV
		(v~="\w") ?? s .= "k.: " k "`tv.: " A_LoopField "`n"  ; eliminates lines with at least a digit or an alpha
It does not report the 3rd element in the bla,bla line, but i guess that's and extreme anomaly anyway.
14.3 & 1.3.7
Albireo
Posts: 1774
Joined: 16 Oct 2013, 13:53

Re: Manage the Input to - Loop parse , ... , CSV

12 Jun 2022, 17:40

Thank you!
It was more complicated to solve than I thought ...
I do not think it's a "quickfix".

I understand that not all different possibilities can be handled.
But is it possible with a message that there is no solution?
(probably wrong number of columns before / after? or..)

Maybe the solution must contain several separate steps.

This code (change delimiter, below) works well
Input value .:

"abc def"; "0,02" ; "0,03"; "0,04" ; "0,05"
"fed cba";"29,90";"35,75";"48,50";"55,10"
"";"" ;""; "" ;""
10; 17,90 ; 15,30; 4,50; "700; 32"
"bla";"bla";"bla "bla" bla";;
"";"";"";;
pensé; abc åäö;; 12345;
result value .:
"abc def", "0,02" , "0,03", "0,04" , "0,05"
"fed cba","29,90","35,75","48,50","55,10"
"","" ,"", "" ,""
10, 17,90 , 15,30, 4,50, "700; 32"
"bla","bla","bla "bla" bla",,
pensé, abc åäö,, 12345,
Change delimiters
But, I thought that loop parse ???, CSV could handle "spaces" around the field dividers - but there I was wrong.


Come to the conclusion that it is better to try to handle the spaces around the field divider before delimiter is changed.
Maybe even insert quotation marks around all fields that do not have quotation marks.
Split rows in fields
My wish is still, to be able to handle the result with loop parse ?? CSV.
If the number of fields mismatches, it should be noted in a log file - (but that is another challenge)
hd0202
Posts: 183
Joined: 04 Oct 2013, 03:07
Location: Germany near Cologne

Re: Manage the Input to - Loop parse , ... , CSV

14 Jun 2022, 07:30

But, I thought that loop parse ???, CSV could handle "spaces" around the field dividers - but there I was wrong.
have a look at wikipedia - https://en.wikipedia.org/wiki/Comma-separated_values
In some CSV implementations[which?], leading and trailing spaces and tabs are trimmed (ignored). Such trimming is forbidden by RFC 4180, which states "Spaces are considered part of a field and should not be ignored."

1997, Ford, E350
not same as
1997,Ford,E350

According to RFC 4180, spaces outside quotes in a field are not allowed; however, the RFC also says that "Spaces are considered part of a field and should not be ignored." and "Implementers should 'be conservative in what you do, be liberal in what you accept from others' (RFC 793, section 2.10) when processing CSV files."

1997, "Ford" ,E350

In CSV implementations that do trim leading or trailing spaces, fields with such spaces as meaningful data must be quoted.

1997,Ford,E350," Super luxurious truck "
Here is my implementation, please test it !

Code: Select all

setbatchlines, -1

; for test purpose, replace it with the data read
; I added 3 lines 
lines =
(
"abc def"; "0,02" ; "0,03"; "0,04" ; "0,05"
"fed cba";"29,90";"35,75";"48,50";"55,10"
"";"" ;""; "" ;""
10; 17,90 ; 15,30; 4,50; "700; 32"
"bla";"bla";"bla "bla" bla";;
"bla";"bla";"bla ""bla"" bla";;
"";"";"";;
pensé; abc åäö;; 12345;
"0,001"; "0,002"; "0,003"; "0,004"; "0,005"
"first field",SecondField,"the word ""special"" is quoted literally",,"last field, has literal comma"ß
)

fexp := 5	; expected number of fields
for k, _inline in strsplit(lines, "`n", "`r")	; my replacement for loop, parse, v, CSV
{
	_array := SplitCSV(_inline, ";")	; _array.0 = number of fields
										; _array.1 to _array[_array.0] = fields
										; _array[_array.0 + 1] = string of delimiters, separated by char(1)
	gosub, process_results
}
exitapp

process_results:
;	<== evaluate _array to lines for output file and _msg for log info ==>
	_msg =
	_del := strsplit(_array[_array.0 + 1], chr(1))
	pos := instr(_array[_array.0 + 1], "?")
	if (pos != 0)
		_msg .= "--> unexpected chars with delimiters <--`n"
	if (_array.0 != fexp)
		_msg .= "--> only " _array.0 "  fields found <--`n"
	loop, % _array.0
	{
		_msg .= "field " a_index ": "
	    _msg .= "<" _array[a_index] "> with <" _del[a_index] ">`n"
	}
	MsgBox % _msg
return

SplitCSV(str, delims := ",")
{
	if (substr(str, 0) = "ß")
	{
		str := substr(str, 1, -1)
		delims := ","
	}
	str := strreplace(str, delims " """, delims """", all)
	str := strreplace(str, """" " " delims, """" delims, all)
	str := strreplace(str, """""", "~", all)
	if (substr(str, 1, 1) != """")
		str := """" strreplace(str, delims, """" delims, oc, 1)	; only first
	hk := true
	array := SplitDlms("""", str)
	a2 := {}
	c := 0
	d := 0
	str =
	Loop % array.0
	{
		if (!array.1 and a_index = 1)
			continue
		curr := a_index
		next := curr + 1
		c++
		if (mod(c, 2) = 1)
		{	
			stringreplace, line, % array[curr], ~, "", all
			dellen := strlen(delims)
			stringleft, left, % array[next], % dellen
			if (left = delims)
			{
				array[next] := substr(array[next], dellen + 1)
			}
			if !y
			{
				a2.insert(line)
				str .= delims chr(1)
				d++
			}
			continue
		}
		a3 := strsplit(array[curr], delims)
		a3.0 := a3.maxindex()
		if (a3.0 = 1 and a3.1)
		{
			stringreplace, line, % array[curr], ~, "", all
			if !line
			{
				a2.insert(line)
				d++
			}
			str .= a3.1 "?"
		}
		else
			loop, % a3.0 - 1
			{
				stringreplace, line, % a3[a_index], ~, "", all
				if !y
				{
					a2.insert(line)
					str .= delims chr(1)
					d++
				}
			}
	}
	a2.0 := d
	a2[d + 1] := str
	Return a2
}
SplitDlms(delims,str)
{
	array := strsplit(str, delims)
	array.0 := array.maxindex()
	Loop % array.0
	{
		str := strreplace(str, array[a_index], oc, limit := 1)
	}
	array[array.0 + 1] := str
	Return array
}
escape::exitapp
Hubert
Albireo
Posts: 1774
Joined: 16 Oct 2013, 13:53

Re: Manage the Input to - Loop parse , ... , CSV

14 Jun 2022, 09:37

Thank you!
Yes! (Wikipedia has right - but...)
In Sweden, it is common with field dividers other than , (comma) as a tab, ; or | one reason is that we have , decimal comma (not decimal point)

My CSV-files are coming from different sources (with different "problems") - enough about this.
But if these files will be easy to handle with AHK, they must be converted to , comma as file delimiter.
Which is not simple.

Should try to analyze your strategy, and test the speed of the script.

It was more complicated than I thought, but this proposal could be a solution (for me) .:
Seems to work for all fields, even those with poor structure.
Program handle CSV-field

just counting the number of quotes, was complicated (any easier way to do this on?) - I just want to know if it is an even or odd number

Code: Select all

countQuotes = 0
	flagField := False
	Loop
	{	tmpNo := inStr(inputLine, """",,, A_Index )
		if ( tmpNo > 0 )	; quotes have been found
			countQuotes ++
		else ;  if the number of quotation marks is uneven - problem to handle
		{	if ( mod(countQuotes, 2) <> 0 )
			{	MsgBox 48, Row.: %A_LineNumber% -> %A_ScriptName%, % "Row can't be handled .: `n`n" inputLine "`t" countQuotes "`n`nCan be saved to a logfile!"
				Continue 2	; Skip the string with an odd number of quotes
			}
			else
				Break	; next row
		}
	}
I think the solution for managing a CSV file is approaching :D
hd0202
Posts: 183
Joined: 04 Oct 2013, 03:07
Location: Germany near Cologne

Re: Manage the Input to - Loop parse , ... , CSV

14 Jun 2022, 10:44

But if these files will be easy to handle with AHK, they must be converted to , comma as file delimiter.
in my script the delimiter can be selected:

Code: Select all

_array := SplitCSV(_inline, ";")
it can even be changed for special cases, I used it for test in one run, see the last char in this line <ß>:

Code: Select all

"first field",SecondField,"the word ""special"" is quoted literally",,"last field, has literal comma"ß
and the code lines:

Code: Select all

	if (substr(str, 0) = "ß")
	{
		str := substr(str, 1, -1)
		delims := ","
	}
Hubert
Albireo
Posts: 1774
Joined: 16 Oct 2013, 13:53

Re: Manage the Input to - Loop parse , ... , CSV

14 Jun 2022, 18:12

Thank you!
(I have no idea what a "literal comma" is - Have never heard about this) :?

My function()
1) Will clear empty rows / fields.
Empty rows / fields also include e.g. "", "", "" ,, "" (maybe even " ","","" ,,,)

2) Analyzes which fields will have problems after a change of field dividers (and possibly handle these)
for example 37,56 ; 45,75 ; 90 ; 9,90 (is not the same as 37,56 , 45,75 , 90 , 9,90 (after the delimiter is changed to , )

3) Manage the lines that will have problems with a conversion
Right now can I not handle an uneven number of quotes.
e.g. ; Diff"icult;""; 98 456;perhaps impossible to handle?

4) If the field contains control characters or similar - delete all this characters? (these are not handled - right now)

So far - the function() to read the CSV-file (and analyze any problems)

If the number of fields differs from e.g. first line (usually the header line), these should probably be moved to a log file.
In my case I will do this after the CSV file is cleaned from unwanted rows

Your program doesn't give the same result as mine.

Assume the CSV information below (I hope it appears in the same way as I want)
All rows contain 5 fields (if I counted correctly)

Code: Select all

(

"abc def"; "0,02" ; "0,03"; "0,04" ; "0,05"
"";"" ;""; "" ;""
"";" " ;""; "" ;""
10; 17,90 ; 15,30; 4,50; "700; 32"
"bla";"bla";"bla "bla" bla";;
"bla";"bla";"bla ""bla"" bla";;
"";"";"";;
pensé; abc åäö;; 12345;
"first field";SecondField;"the word ""special"" is quoted literally";;"last field, has literal comma"ß
; Diff"icult;""; 98 456;perhaps impossible to handle?
;,;,;,;,
"0,001"; "0,002"; "0,003"; "0,004"; "0,005"

)
The input and result from my program .:

"abc def"; "0,02" ; "0,03"; "0,04" ; "0,05"
"";"" ;""; "" ;""
"";" " ;""; "" ;""
10; 17,90 ; 15,30; 4,50; "700; 32"
"bla";"bla";"bla "bla" bla";;
"bla";"bla";"bla ""bla"" bla";;
"";"";"";;
pensé; abc åäö;; 12345;
"first field";SecondField;"the word ""special"" is quoted literally";;"last field, has literal comma"ß
; Diff"icult;""; 98 456;perhaps impossible to handle?
;,;,;,;,
"0,001"; "0,002"; "0,003"; "0,004"; "0,005"


k .: 1 v .: "abc def","0,02","0,03","0,04","0,05"
k .: 2 v .: 10,"17,90","15,30","4,50","700; 32"
k .: 3 v .: "bla","bla","bla "bla" bla",,
k .: 4 v .: "bla","bla","bla ""bla"" bla",,
k .: 5 v .: pensé,abc åäö,,12345,
k .: 6 v .: "first field",SecondField,"the word ""special"" is quoted literally",,"last field, has literal comma"ß
k .: 7 v .: ,",",",",",",","
k .: 8 v .: "0,001","0,002","0,003","0,004","0,005"

Return to “Ask for Help (v1)”

Who is online

Users browsing this forum: Chunjee and 123 guests