[Library] DSV Parser

04 Dec 2019, 03:06

A simple utility for parsing delimiter-separated values (DSV) in AutoHotkey scripts, whether comma-separated (CSV), tab-separated (TSV), or something else, possibly even exotic ones. Check out the full list of features at https://github.com/jasonsparc/DSVParser-AHK

I hope this ends up being useful to someone.

Example usage:

; Load a TSV data string
FileRead tsvStr, data.tsv

; Parse the TSV data string
MyTable := TSVParser.ToArray(tsvStr)

; ...

MsgBox % MyTable[2][1] ; Access 1st cell of 2nd row

; ... do something else with `MyTable` ...

; Convert into a CSV, with custom line break settings
csvStr := CSVParser.FromArray(MyTable, "`n", false)

FileDelete new-data.csv
; The * prefix makes FileAppend write in binary mode, so "`n" is not translated to "`r`n"
FileAppend %csvStr%, *new-data.csv

EDIT 1: Rather than Critical, I now recommend SetBatchLines -1, so that other threads and timers can still interrupt.
EDIT 2: I decided to bake SetBatchLines -1 into the library, so you no longer have to set it manually when calling DSVParser.ToArray(…).

DRocks · 04 Dec 2019, 06:05

Nice code!
Thanks.

I am wondering why you did not use the built-in Loop, Parse, _, CSV?
Was it for specific custom needs?
(Seeing that you do some complex RegExMatch stuff.)

04 Dec 2019, 13:08

DRocks wrote:
04 Dec 2019, 06:05
I am wondering why you did not use the built-in Loop, Parse, _, Csv ?
Was it for specific custom needs ?

I created it to parse a fairly large and complex TSV.

If you're familiar with Anki (you may skip this paragraph if not), I created this utility to parse my huge (as in huge) Anki export file, which happens to contain newlines and other weird characters packed into single TSV cells. I wanted to migrate one of my Anki decks (with over 15k cards) to SuperMemo and was having a hard time preserving all my complex notes, each of which has multiple complex fields (there are also JS scripts and custom CSS styling). So I also needed the DSV data in a form I could manipulate easily (e.g., a string and an array), to convert it into a format that SuperMemo can eat raw without spewing hellfire. Also, if you know SuperMemo, then you know how fantastically poopy that funny buggy thingy is.

Anyway, I could have solved my problem with Loop, Parse, _, CSV. I could have converted the TSV file to CSV first. But, ...

The first problem I faced was: "How do I split the file into rows when some newlines could be part of a single cell's data?" In CSV, a newline delimits a record only when it's not inside quotes, but Loop, Parse, _, CSV doesn't recognize record-delimiting newlines at all: if you feed it a string, it assumes the entire string is a single row. Here's a demonstration:

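; Two CSV records, separated by a line break
data =
(
1,2,3,4
5,6,7,8
)

; Loop, Parse with the CSV option treats the whole string as a single record
Loop, parse, data, CSV
{
	MsgBox, 4, , Cell %A_Index% is:`n%A_LoopField%`n`nContinue?
	IfMsgBox, No
		return
}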

By the time you hit cell 4, AHK treats the newline as part of the record rather than as a record delimiter, so cell 4's data comes out as 4`n5 (a continuation section joins its lines with `n by default; data read from a file with Windows line endings would give 4`r`n5). So you first have to figure out how to break the data down into records, while taking into account that newlines may indeed be part of a single cell's data (i.e., you can't even use Loop, Parse, _, `n, `r or Loop, Read).
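To make the record-splitting problem concrete, here's a minimal sketch of the kind of hand-rolled workaround this calls for. SplitRecords is a hypothetical helper, not part of DSVParser and not how the library works internally: it walks the string one character at a time, flips an in-quotes flag on every double quote, and treats `n or `r`n as a record break only outside quotes. It assumes AutoHotkey v1.1 (arrays with Push, for-loops) and does not handle lone `r line breaks. Each record can then go through Loop, Parse, _, CSV on its own.

```autohotkey
; A CSV string whose 2nd record contains a quoted line break
data := "1,2,3,4`n5,""six`nstill six"",7,8"

for i, record in SplitRecords(data)
{
    Loop, Parse, record, CSV
        MsgBox % "Record " i ", cell " A_Index ":`n" A_LoopField
}

; Hypothetical helper, NOT part of DSVParser: splits CSV text into records,
; treating `n or `r`n as a record break only when it's outside double quotes.
SplitRecords(text) {
    records := [], current := "", inQuotes := false
    Loop, Parse, text   ; blank delimiter: one iteration per character
    {
        ch := A_LoopField
        if (ch = """")
            inQuotes := !inQuotes   ; an escaped "" flips twice, so the state is unchanged
        else if (!inQuotes && ch = "`r")
            continue                ; drop the CR of a CRLF break
        else if (!inQuotes && ch = "`n")
        {
            records.Push(current)   ; end of the current record
            current := ""
            continue
        }
        current .= ch               ; keep everything else, quotes included
    }
    if (current != "")
        records.Push(current)       ; last record (no trailing line break)
    return records
}
```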

I could have hacked my way into it and written some Frankenstein code, but I thought, "Why not make a simple, elegant utility instead?", thinking that maybe someone out there is also facing (or will face) the same problem. Several AHK scripts have tackled this in the past, though, so I'm not the first.

But at least mine was built to handle complex DSV data more robustly (and it's battle-tested with unit tests). Parsing is packed into a single RegEx match, so all the parsing overhead stays on the native side (I did also try a non-RegEx implementation, just to make sure mine is indeed performant). It also reads and outputs raw DSV strings, unlike most of the scripts I've found so far, and it allows custom delimiters and qualifiers (no need to convert the input data to CSV first). For most use cases, you'll only need 2 methods for DSV manipulation (i.e., ToArray(…) and FromArray(…)), which is hopefully a win for people who just want a DSV string parsed into a 2D array quickly.
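For comparison, a minimal sketch of similar input going through the library's two methods. It assumes DSVParser.ahk sits next to the script and uses the predefined CSVParser from the first post's example; the FromArray arguments are copied from that example as-is.

```autohotkey
#Include %A_ScriptDir%\DSVParser.ahk  ; assumption: the library file sits next to this script

; Same shape as the Loop, Parse demo, but the 2nd record has a quoted line break
data := "1,2,3,4`n5,""six`nstill six"",7,8"

MyTable := CSVParser.ToArray(data)   ; string -> 2D array
MsgBox % MyTable[2][2]               ; the multi-line cell of the 2nd row

; 2D array -> string, with the same line-break arguments as the first post's example
csvStr := CSVParser.FromArray(MyTable, "`n", false)
MsgBox % csvStr
```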

Also, it feels bad to write throwaway Frankenstein code, don't you think?

DRocks · 05 Dec 2019, 20:22

Thanks for your detailed reply, it's super interesting.
I was not aware of Anki, but I read the paragraph anyway lol.
Seems gigantic and complex to parse correctly!

Good job on accomplishing that.
Sometimes we don't want to "reinvent the wheel", as they say, but for me it can still be necessary in many scenarios.

Chunjee · 27 Jan 2020, 14:00

I believe this library is of high quality and have recommended it to someone seeking a way to parse their CSV files. I too will keep it in mind for future projects.

AHK_user · 27 Jan 2020, 17:03

Quite useful; in my country we use ";" as the delimiter...
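For semicolon-separated files like these, a minimal sketch, assuming DSVParser.ahk sits next to the script and that the constructor takes the delimiter as its first argument, as in the `new DSVParser("|")` call later in this thread. SSVParser is just an illustrative variable name.

```autohotkey
#Include %A_ScriptDir%\DSVParser.ahk  ; assumption: library file next to this script

; A parser instance for semicolon-separated values
SSVParser := new DSVParser(";")

; Commas are ordinary characters here (e.g., decimal separators)
ssvStr := "name;price`nApple;1,50`nPear;2,00"

MyTable := SSVParser.ToArray(ssvStr)
MsgBox % MyTable[2][2]   ; 2nd cell of the 2nd row
```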

guest3456 · 27 Jan 2020, 17:07

Chunjee wrote:
27 Jan 2020, 14:00
I believe this library is of high quality

When unit tests ship along with the library, it's usually high quality.

arimania · 17 Mar 2021, 07:14

I tried to get the data and write it to cell A2, but no luck: it doesn't work. The CSV is extracted from iMacros, and then I changed the commas to pipes ("|").

#Include D:\Documents\AutoHotkey\Lib\DSVParser.ahk
global BSVParser := new DSVParser("|")

; Load a CSV data string
FileRead csvStr, "D:\sampleofpipe - Copy.csv"


; Parse the BSV data string (DSVParser.ToArray(csvStr) didn't work either, so I switched to BSVParser)
MyTable := BSVParser.ToArray(csvStr)

MsgBox % MyTable[1][2] ; Access 2nd cell of 1st row

My CSV file is in "cellA1"|"cellB1" format, i.e., 2 columns (with 3 rows). Here's a sample of the file content, as shown when opened in Notepad:

"https website.com /wp-admin/post.php?post=7493&action=edit"|"[vc_row][vc_column][vc_column_text]  Broken Link for safety
<h2 style=""text-align: center; color: red;"">Something</h2>"
"https website.com /wp-admin/post.php?post=7493&action=edit"|"[vc_row][vc_column][vc_column_text]  Broken Link for safety
<h2 style=""text-align: center; color: red;"">WAhyu Revealed</h2>"
<div style=""text-align: left;""><img class=""aligncenter"" src=""https website.com /wp-content/uploads/2020/03/goodluck.jpg""  Broken Link for safety /></div>
<div style=""text-align: left;"">
SOMETEXT ARE HERE"
"https website.com /wp-admin/post.php?post=7484&action=edit"|"[vc_row][vc_column][vc_column_text]  Broken Link for safety
<h2 style=""text-align: center; color: red;"">  For JUST A PATCHOULI OIL</h2>
<div style=""text-align: left;""><img class=""aligncenter"" src=""https scontent.fsub8-1.fna.fbcdn.net /v/t1.0-9/74371753_2337951126331775_8279704392630796288_n.jpg?_nc_cat=104&amp;_nc_eui2=AeGbfwLAO0GpIB0azSnemXN1PN5C6WaJIHHGFFuHr67YlnmzivA09qlDoq2MfgxPqCvNnTuxV-V6R5HQcn83Mr4-oMSP9MqUGUJjgXcyjsQFoQ&amp;_nc_oc=AQkGaowHr-ycWVmiBMAElYnw9fxCi76w_r11iJnaJq4c7x-q1Hg9UW6DAwePGEmr5OE&amp;_nc_ht=scontent.fsub8-1.fna&amp;oh=cad6b5ca70f9960b256a59efa99a792b&amp;oe=5E59F271""  Broken Link for safety width=""673"" height=""672"" /></div>
<div style=""text-align: left;"">
OK STILL ANOTHER TEXT"

Please help: what's wrong, and how do I make it work?

03 Jun 2021, 14:08

Hi, arimania! Sorry for the late reply. (My bad for not checking.)

The problem is here:

; Load a CSV data string
FileRead csvStr, "D:\sampleofpipe - Copy.csv"

Notice the quotes around your input file path. It should instead have been:

; Load a CSV data string
FileRead csvStr, D:\sampleofpipe - Copy.csv

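That is, no quotes, as per AHK syntax rules: FileRead's file name parameter is plain text rather than an expression, so any quotes you add become part of the file name.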

And actually, when I tried your code and sample file, it worked for me. (See the image attachment.)
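Putting the fix together, a sketch of the whole corrected script, using the paths from arimania's post. The ErrorLevel check is an added safeguard that wasn't in the original script: FileRead sets ErrorLevel to 1 when it can't read the file, so a wrong path fails loudly instead of silently leaving an empty string to parse.

```autohotkey
#Include D:\Documents\AutoHotkey\Lib\DSVParser.ahk
BSVParser := new DSVParser("|")

; Traditional command syntax: the file name is plain text, so no quotes
FileRead, bsvStr, D:\sampleofpipe - Copy.csv
; Equivalent expression syntax (the leading % forces an expression, so quotes ARE needed):
; FileRead, bsvStr, % "D:\sampleofpipe - Copy.csv"

; FileRead sets ErrorLevel to 1 if the file couldn't be read
if ErrorLevel
{
    MsgBox, Could not read the file.
    ExitApp
}

MyTable := BSVParser.ToArray(bsvStr)
MsgBox % MyTable[1][2] ; Access 2nd cell of 1st row
```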
