parse CSV (specify delimiters and qualifier)

Post your working scripts, libraries and tools for AHK v1.1 and older
User avatar
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

parse CSV (specify delimiters and qualifier)

11 Jul 2019, 19:28

- I wrote an attempt at parsing a line of CSV text, where you can specify multiple delimiters, multiple omit characters and one qualifier.
- I've only done a few tests on it, and it's not something I'm likely to use very much.
- The principle it uses is: use RegExMatch to find the next delimiter, if there are an even number of qualifiers (e.g. quotes) between the first character and delimiter, then you have reached the end of a field, otherwise you are still within a field.
- People are welcome to make suggestions or modify/adapt the script.
- Has anything similar been written? Thanks.

Code: Select all

q:: ;parse CSV
;vText = """a""bc""","def","ghi"
vText = ""","",a,"",b,c,"","""," def ",;;," ghi "

vOutput := ""
for vKey, vValue in StrSplitCSV(vText)
	vOutput .= vKey " " vValue "`r`n"
MsgBox, % vOutput

vOutput := ""
for vKey, vValue in StrSplitCSV(vText, ",;", " ")
	vOutput .= vKey " " vValue "`r`n"
MsgBox, % vOutput
return

StrSplitCSV(vText, vDelim:=",", vOmitChars:="", vQual:="_DQ_") ;note: AHK v1: """", AHK v2: "`""
{
	local
	static vIsV1 := !!SubStr(1, 0)
	if (vQual = "_DQ_")
		vQual := Chr(34)
	if !(StrLen(vQual) = 1)
		throw Exception("invalid qualifier", -1)
	if (vText = "")
		return []
	if IsObject(vDelim)
	{
		oDelim := vDelim
		vDelim := ""
		for _, vValue in oDelim
			vDelim .= vValue
	}
	if InStr(vDelim, vQual, 1)
		throw Exception("qualifier matches a delimiter", -1)
	;4 characters that need escaping in a RegEx character class: ^-]\
	vNeedle := "[" RegExReplace(vDelim, "[\^\-\]\\]", "\$0") "]"
	vPos := 1
	vPos1 := 1
	vDoEnd := 0
	oArray := []
	Loop
	{
		vPos := RegExMatch(vText, vNeedle,, vPos)
		if vPos
			vPos2 := vPos-1
		else
			vPos2 := StrLen(vText), vDoEnd := 1
		vTemp := SubStr(vText, vPos1, vPos2-vPos1+1)
		StrReplace(vTemp, vQual,, vCount)
		if !(vCount & 1) ;an even number of qualifiers
		|| vDoEnd
		{
			vIsQuote1 := (SubStr(vTemp, 1, 1) = vQual)
			vIsQuote2 := (SubStr(vTemp, vIsV1-1) = vQual)
			if !(vIsQuote1 = vIsQuote2)
				throw Exception("qualifier start/end mismatch:`r`n" vTemp, -1)
			if vIsQuote1
				vTemp := SubStr(vTemp, 2, -1)
			oArray.Push(Trim(StrReplace(vTemp, vQual vQual, vQual), vOmitChars))
			vPos1 := vPos + 1
		}
		if vDoEnd
		{
			if (vCount & 1)
				throw Exception("qualifier count mismatch:`r`n" vTemp, -1)
			break
		}
		vPos++
	}
	return oArray
}
homepage | tutorials | wish list | fun threads | donate
WARNING: copy your posts/messages before hitting Submit as you may lose them due to CAPTCHA
carno
Posts: 265
Joined: 20 Jun 2014, 16:48

Re: parse CSV (specify delimiters and qualifier)

11 Jul 2019, 23:05

great script that could come in handy! :D
ahk7
Posts: 575
Joined: 06 Nov 2013, 16:35

Re: parse CSV (specify delimiters and qualifier)

12 Jul 2019, 11:31

Delimiter Seperated Values by DerRaphael
https://autohotkey.com/board/topic/30102-how-can-i-parse-a-csv-file-with-multi-line-values/#entry191846
which is used the CSV libary https://github.com/hi5/CSV

As the old forum sometimes has messed up code I would check/use the one in the CSV lib as it is confirmed to work (at least for past several years), starts here https://github.com/hi5/CSV/blob/master/csv.ahk#L578

Edit: and ObjCSV by JnLlnd @ https://github.com/JnLlnd/ObjCSV/tree/master/Lib
Albireo
Posts: 1761
Joined: 16 Oct 2013, 13:53

Re: parse CSV (specify delimiters and qualifier)

24 Jul 2019, 13:32

jeeswg wrote:
11 Jul 2019, 19:28
- I wrote an attempt at parsing a line of CSV text, where you can specify multiple delimiters, multiple omit characters and one qualifier.
Good suggestion, will do a test of the function .: StrSplitCSV(vText)
There are some things that I do not understand "multiple delimiters" is it many different delimiters you mean? (not at the same time?) or delimiters with many characters?
jeeswg wrote:
11 Jul 2019, 19:28
... it's not something I'm likely to use very much...
I would probably use this quite often (in connection with CSV files) depending on how fast your function is.

I did it in a different way in order to use eg. StrSplit() from AHK in my case, I wrote a function that analyzes a CSV variable, replaces the existing delimiter with an optional delimiter.
One of the problems that arises, is which delimiter / sign combination works best to work with later. (| or § or …)
Nor do I make any control that the selected output delimiter is not already exist in the CSV variable to be processed.
I cannot handle delimiters with many characters (like ||)
I have tested your CSV-string vText = ""","",a,"",b,c,"","""," def ",;;," ghi " with my ChangeDelim() function :D
(Excuse for some comments in Swedish and some other test MsgBoxes)
Function ChangeDelim()
As return, an array of row(s) from the input string with the selected(new) delimiter.

Example of a short program that tests the strings from @jeeswg

Code: Select all

#NoEnv  ; Recommended for performance and compatibility with future AutoHotkey releases.
; #Warn  ; Enable warnings to assist with detecting common errors.
#SingleInstance Force
SendMode Input  ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir%  ; Ensures a consistent starting directory.

Support = Jackie Ögonsten - 070 1234567

vText = ""","",a,"",b,c,"","""," def ",;;," ghi "
; vText = """a""bc""","def","ghi"

DelimFrom := ","
DelimTo := "|"
aInputCSV := ChangeDelim(vText, DelimFrom, FieldNo, DelimTo)

aResult := StrSplit(aInputCSV.1, "|", "`r`n""""")
Loop % aResult.Length()
	vOutput .= A_Index " " aResult[A_Index] "`r`n"
MsgBox 64, Rad.: %A_LineNumber% -> %A_ScriptName%, % vOutput

ExitApp
I got another result than @jeeswg

The result from the two test strings
My analysis shows that the result in field 1 differs between the solutions. (but are both wrong?)
Wouldn't that have been the following result? (I was expecting this)
String 1, Field 1 "","",a,"",b,c,"",""
String 2, Field 1 ""a""bc""
Albireo
Posts: 1761
Joined: 16 Oct 2013, 13:53

Re: parse CSV (specify delimiters and qualifier)

24 Jul 2019, 16:49

I wrote a new example, how I think the result should be (did not change anything in the function ChangeDelim() )
New testprogram
---------------------------
No Length String
1 20 "","",a,"",b,c,"",""
2 5 def
3 2 ;;
4 5 ghi

---------------------------
(Difficult to display nice columns in this forum)

I have not done any test how fast CSV files can be handled in the function StrSplitCSV() or ChangeDelim().
( Maybe it would be better to create a matrix directly in the function ChangeDelim())
But I don't know how to do it and handle the result )

Return to “Scripts and Functions (v1)”

Who is online

Users browsing this forum: No registered users and 56 guests