- I wrote an attempt at parsing a line of CSV text, where you can specify multiple delimiters, multiple omit characters and one qualifier.
- I've only done a few tests on it, and it's not something I'm likely to use very much.
- The principle it uses is: use RegExMatch to find the next delimiter, if there are an even number of qualifiers (e.g. quotes) between the first character and delimiter, then you have reached the end of a field, otherwise you are still within a field.
- People are welcome to make suggestions or modify/adapt the script.
- Has anything similar been written? Thanks.
As the old forum sometimes has messed up code I would check/use the one in the CSV lib as it is confirmed to work (at least for past several years), starts here https://github.com/hi5/CSV/blob/master/csv.ahk#L578
- I wrote an attempt at parsing a line of CSV text, where you can specify multiple delimiters, multiple omit characters and one qualifier.
Good suggestion, will do a test of the function .: StrSplitCSV(vText)
There are some things that I do not understand "multiple delimiters" is it many different delimiters you mean? (not at the same time?) or delimiters with many characters?
... it's not something I'm likely to use very much...
I would probably use this quite often (in connection with CSV files) depending on how fast your function is.
I did it in a different way in order to use eg. StrSplit() from AHK in my case, I wrote a function that analyzes a CSV variable, replaces the existing delimiter with an optional delimiter.
One of the problems that arises, is which delimiter / sign combination works best to work with later. (| or § or …)
Nor do I make any control that the selected output delimiter is not already exist in the CSV variable to be processed.
I cannot handle delimiters with many characters (like ||)
I have tested your CSV-string vText = ""","",a,"",b,c,"","""," def ",;;," ghi " with my ChangeDelim() function
(Excuse for some comments in Swedish and some other test MsgBoxes)
ChangeDelim(aInputCSV, DelimFrom, ByRef FieldNo, DelimTo := ",")
{ ; Version 15 juli 2019
; aString := ChangeDelim(CSVData, DelimFrom, CountField, DelimTo)
; aString := ChangeDelim(CSVData, DelimFrom, CountField)
;
;
; Both Strings and Arrays can be managed.
; Change the field delimiters From ?? to ","
; Manages both to change the delimiter, and the same delimiter as the result
;
; All CSV-files must be checked / handled
; If no fields with content could be found, give an Error message.
; Skip all lines without characters (< 2 characters)
; Skip lines with only field delimiters and quotes characters.
; Don't handle fields with many quotes (like this .: ;""Test"";
; Assumes that all strings are delimited with quotation marks, especially strings with delimiters in the text eg. price.
; If number of fields is equal in all rows, it is not checked (summa / antal godkända poster/rader = jämt antal?)
;
; An array of rows is created in Return
; Also the number of fields is returned from the function
; Only the first field is checked)
FuncName := "ChangeDelim()" ; Endast för felhantering
arrayCSV := {}
If ( !IsObject(aInputCSV) ) ; If InputCSV is a string (not object) - Convert aInputCSV to an Object
aInputCSV := StrSplit(aInputCSV, "`n", " `t`r")
; MsgBox 64, Rad.: %A_LineNumber% -> %A_ScriptName%, % aInputCSV[1] "`n" aInputCSV[2] "`n`n" aInputCSV.Length()
For LineNo, Line in aInputCSV
{ If ( StrLen(Line) < 2 ) ; Behandla EJ helt tomma rader
Continue
NewInputCSV =
CharPos = 1
CountField = 0
CountQuotes = 0
For CharNo, Char in StrSplit(Line)
{ if !inQuotes
{ ; Check for START quote character
if ( Char = """" )
{ inQuotes := True
CountQuotes += 1
Continue
}
; Check if the character is a delimiter
if ( Char = DelimFrom )
{ If ( DelimFrom <> DelimTo ) ; Om fältdelare ska ändras
NewInputCSV .= SubStr(Line, CharPos, A_Index - CharPos) DelimTo
CharPos := A_Index + 1
CountField += 1
}
}
else
{ ; Check for END quote character
if ( Char = """" )
{ inQuotes := False
CountQuotes += 1
}
}
}
; Create the result string
NewInputCSV .= SubStr(Line, StrLen(NewInputCSV) + 1 , StrLen(Line) - StrLen(NewInputCSV) + 1 )
; MsgBox 64, Rad.: %A_LineNumber% -> %A_ScriptName%, % "`n- " CountField "`n`n- " CountQuotes "`n`n- " StrLen(NewInputCSV)
If ( StrLen(NewInputCSV) > (CountField + CountQuotes + 2) )
{ arrayCSV.Push( NewInputCSV )
RowOK += 1
If RowOK = 1
{ FieldNo := CountField + 1 ; Räkna antal fält i första godkända sträng!
; MsgBox 64, Rad.: %A_LineNumber% -> %A_ScriptName%, % NewInputCSV "`n`nGodkänd! ( " FieldNo " )"
}
}
}
; StopTime := A_TickCount
; RunTime := A_TickCount - StartTime
; MsgBox ,, Rad.: %A_LineNumber% -> %A_ScriptName%, % "Input CSV .: `n" aInputCSV[1] "`n`nThe new CSV .: `n" NewInputCSV "`n`n`nRun time .: " RunTime " ms"
; MsgBox ,, Rad.: %A_LineNumber% -> %A_ScriptName%, % arrayCSV.1
; MsgBox ,, Rad.: %A_LineNumber% -> %A_ScriptName%, % CountField
; If no fields with content could be found
If ( FieldNo < 1 )
{ MsgBox 64, Rad.: %A_LineNumber% -> %A_ScriptName%,
( LTrim Join
% "Fält saknas i CSV-variabeln! `n
Funktion .: " FuncName " `n
Variabel .: aInputCSV[] `n`n
Detta program kan inte fortsätta. `n
`tKommer att avslutas!"
)
MsgBox ,,, Programmet avslutas!, 1
ExitApp
}
Return % arrayCSV
}
As return, an array of row(s) from the input string with the selected(new) delimiter.
Example of a short program that tests the strings from @jeeswg
#NoEnv ; Recommended for performance and compatibility with future AutoHotkey releases.
; #Warn ; Enable warnings to assist with detecting common errors.
#SingleInstance Force
SendMode Input ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir% ; Ensures a consistent starting directory.
Support = Jackie Ögonsten - 070 1234567
vText = ""","",a,"",b,c,"","""," def ",;;," ghi "
; vText = """a""bc""","def","ghi"
DelimFrom := ","
DelimTo := "|"
aInputCSV := ChangeDelim(vText, DelimFrom, FieldNo, DelimTo)
aResult := StrSplit(aInputCSV.1, "|", "`r`n""""")
Loop % aResult.Length()
vOutput .= A_Index " " aResult[A_Index] "`r`n"
MsgBox 64, Rad.: %A_LineNumber% -> %A_ScriptName%, % vOutput
ExitApp
I got another result than @jeeswg
The result from the two test strings
String 1 .: ""","",a,"",b,c,"","""," def ",;;," ghi "
Result from @jeeswg .:
---------------------------
",",a,",b,c,","
def
;;
ghi
---------------------------
Result from the function ChangeDelim()
---------------------------
,"",a,"",b,c,"",
def
;;
ghi
---------------------------
String 2 .: """a""bc""","def","ghi"
Result from @jeeswg .:
---------------------------
"a"bc"
def
ghi
---------------------------
Result from the function ChangeDelim()
---------------------------
a""bc
def
ghi
---------------------------
My analysis shows that the result in field 1 differs between the solutions. (but are both wrong?)
Wouldn't that have been the following result? (I was expecting this) String 1, Field 1 "","",a,"",b,c,"","" String 2, Field 1 ""a""bc""
#NoEnv ; Recommended for performance and compatibility with future AutoHotkey releases.
; #Warn ; Enable warnings to assist with detecting common errors.
#SingleInstance Force
SendMode Input ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir% ; Ensures a consistent starting directory.
Support = Jackie Ögonsten - 070 1234567
vText = ""","",a,"",b,c,"","""," def ",;;," ghi "
; vText = """a""bc""","def",ghi
DelimFrom := ","
DelimTo := "|"
aInputCSV := ChangeDelim(vText, DelimFrom, FieldNo, DelimTo)
vOutput = No `tLength `tString `n
aResult := StrSplit(aInputCSV.1, "|", "`r`n")
Loop % aResult.Length()
{ If ( SubStr(aResult[A_Index], 1, 1) = Chr(34) )
vOutput .= A_Index "`t" StrLen(aResult[A_Index])-2 " `t" SubStr(aResult[A_Index], 2, StrLen(aResult[A_Index])-2) "`r`n"
else
vOutput .= A_Index "`t" StrLen(aResult[A_Index]) " `t" aResult[A_Index] "`r`n"
}
MsgBox 64, Rad.: %A_LineNumber% -> %A_ScriptName%, % vOutput
ExitApp
ChangeDelim(aInputCSV, DelimFrom, ByRef FieldNo, DelimTo := ",")
{ ; Version 24 juli 2019
; aString := ChangeDelim(CSVData, DelimFrom, CountField, DelimTo)
; aString := ChangeDelim(CSVData, DelimFrom, CountField)
;
;
; Both Strings and Arrays can be managed.
; Change the field delimiters From ?? to ","
; Manages both to change the delimiter, and the same delimiter as the result
;
; All CSV-files must be checked / handled
; If no fields with content could be found, give an Error message.
; Skip all lines without characters (< 2 characters)
; Skip lines with only field delimiters and quotes characters.
; Don't handle fields with many quotes (like this .: ;""Test"";
; Assumes that all strings are delimited with quotation marks, especially strings with delimiters in the text eg. price.
; If number of fields is equal in all rows, it is not checked (summa / antal godkända poster/rader = jämt antal?)
;
; An array of rows is created in Return
; Also the number of fields is returned from the function
; Only the first field is checked
; Do not handles delimiters with more than one character
Global Support
FuncName := "ChangeDelim()" ; Only for error handling
DelimFromNo := StrLen(DelimFrom)
DelimToNo := StrLen(DelimTo)
If ((DelimFromNo <> 1) OR (DelimToNo <> 1))
{ MsgBox 64, Rad.: %A_LineNumber% -> %A_ScriptName%,
( LTrim
% "Fältdelare med fler än ett tecken kan inte hanteras! `n
Fältdelare från " DelimFrom " (" DelimFromNo " tkn.)
FältDelare till " DelimTo " (" DelimToNo " tkn.) `n
Programnamn .: " A_ScriptName "
Funktion .: " FuncName "`n
Kontakta .: " Support "`n
Detta program måste avslutas!"
)
MsgBox ,,, Programmet avslutas!, 1
ExitApp
}
arrayCSV := {}
If ( !IsObject(aInputCSV) ) ; If InputCSV is a string (not object) - Convert aInputCSV to an Object
aInputCSV := StrSplit(aInputCSV, "`n", " `t`r")
For LineNo, Line in aInputCSV
{ If ( StrLen(Line) < 2 ) ; Do NOT process completely empty lines
Continue
NewInputCSV =
CharPos = 1
CountField = 0
CountQuotes = 0
For CharNo, Char in StrSplit(Line)
{ if !inQuotes
{ ; Check for START quote character
if ( Char = """" )
{ inQuotes := True
CountQuotes += 1
Continue
}
; Check if the character is a delimiter
if ( Char = DelimFrom )
{ If ( DelimFrom <> DelimTo ) ; If the delimiter is to be changed
NewInputCSV .= SubStr(Line, CharPos, A_Index - CharPos) DelimTo
CharPos := A_Index + 1 ; Next field begin
CountField += 1
}
}
else
{ ; Check for END quote character
if ( Char = """" )
{ inQuotes := False
CountQuotes += 1
}
}
}
; Create the result string
NewInputCSV .= SubStr(Line, StrLen(NewInputCSV) + 1 , StrLen(Line) - StrLen(NewInputCSV) + 1 )
If ( StrLen(NewInputCSV) > (CountField + CountQuotes + 2) )
{ arrayCSV.Push( NewInputCSV )
RowOK += 1
If RowOK = 1
{ FieldNo := CountField + 1 ; Räkna antal fält i första godkända sträng!
}
}
}
; If no fields with content could be found
If ( FieldNo < 1 )
{ MsgBox 64, Rad.: %A_LineNumber% -> %A_ScriptName%,
( LTrim Join
% "Fält saknas i CSV-variabeln! `n
Funktion .: " FuncName "
Programnamn .: " A_ScriptName "`n
Variabel .: aInputCSV[] `n`n
Kontakta .: " Support "`n
Detta program måste avslutas!"
)
MsgBox ,,, Programmet avslutas!, 1
ExitApp
}
Return % arrayCSV
}
---------------------------
(Difficult to display nice columns in this forum)
I have not done any test how fast CSV files can be handled in the function StrSplitCSV() or ChangeDelim().
( Maybe it would be better to create a matrix directly in the function ChangeDelim())
But I don't know how to do it and handle the result )