[Library] DSV Parser

jasonsparc
Posts: 2
Joined: 27 Nov 2019, 03:26
GitHub: jasonsparc

[Library] DSV Parser

04 Dec 2019, 03:06

A simple utility for parsing delimiter-separated values (DSV) in AutoHotkey scripts, whether comma-separated (CSV), tab-separated (TSV), or something else, possibly even exotic ones. Check out the full list of features at https://github.com/jasonsparc/DSVParser-AHK

I hope this ends up being useful to anyone. :salute:

Example usage:

Code:

; Load a TSV data string
FileRead tsvStr, data.tsv

; Parse the TSV data string
MyTable := TSVParser.ToArray(tsvStr)

; ...

MsgBox % MyTable[2][1] ; Access 1st cell of 2nd row

; ... do something else with `MyTable` ...

; Convert into a CSV, with custom line break settings
csvStr := CSVParser.FromArray(MyTable, "`n", false)

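; Overwrite any previous output file
FileDelete new-data.csv
; The * prefix appends in binary mode, so each `n is written as-is instead of being translated to `r`n
FileAppend %csvStr%, *new-data.csv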

EDIT 1: Rather than Critical, I now recommend SetBatchLines -1, which still allows other threads and timers to interrupt.
EDIT 2: I decided to bake SetBatchLines -1 into DSVParser.ToArray(…), so you don't have to set it manually.
Last edited by jasonsparc on 08 Dec 2019, 12:07, edited 4 times in total.
DRocks
Posts: 559
Joined: 08 May 2018, 10:20

Re: [Library] DSV Parser

04 Dec 2019, 06:05

Nice code !
Thanks.

I am wondering why you did not use the built-in Loop, Parse, _, CSV?
Was it for specific custom needs?
I'm asking because I see you do some complex RegExMatch stuff.
jasonsparc
Posts: 2
Joined: 27 Nov 2019, 03:26
GitHub: jasonsparc

Re: [Library] DSV Parser

04 Dec 2019, 13:08

DRocks wrote:
04 Dec 2019, 06:05
I am wondering why you did not use the built-in Loop, Parse, _, Csv ?
Was it for specific custom needs ?
I created it to parse a fairly large and complex TSV file. :think:

If you're familiar with Anki (you may skip this paragraph if not), I created this utility to parse my huge (as in huge) Anki export file, which happens to pack newlines and other weird characters into single TSV cells. I wanted to migrate one of my Anki decks (15k+ cards) to SuperMemo and was having a hard time preserving all my complex notes, each of which has multiple complex fields (there are also JS scripts and custom CSS styling). So I also needed the DSV data in a state I could manipulate easily (e.g., as a string and as an array), to convert it into a format that SuperMemo can eat raw without spewing hellfire. Also, if you know SuperMemo, then you know how fantastically poopy that funny, buggy thingy is.

Anyway, I could have solved my problem with Loop, Parse, _, CSV, after first converting the TSV file to CSV. But...

The first problem I faced was: "How do I split the file into rows, when some newlines are part of a single cell's data?" In CSV, a newline delimits a record only when it isn't inside quotes, but Loop, Parse, _, CSV doesn't treat any newline as a record delimiter. If you feed it a multi-line string, it assumes the entire string is a single row. Here's a demonstration:

Code:

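; Two CSV records in a continuation section (its lines are joined with `n)
data =
(
1,2,3,4
5,6,7,8
)

; CSV mode splits on commas only; the line break is NOT treated as a record delimiter
Loop, parse, data, CSV
{
	MsgBox, 4, , Cell %A_Index% is:`n%A_LoopField%`n`nContinue?
	IfMsgBox, No
		return
}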
By the time you hit cell 4, AHK treats the newline as part of the record rather than as a record delimiter, so cell 4's data comes out as 4`n5 (a continuation section joins its lines with `n; the same data read from a Windows file would give 4`r`n5). So you first have to break the data down into records, while taking into account that newlines may be part of a single cell's data (i.e., you can't simply use Loop, Parse, _, `n, `r or Loop, Read either).
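For comparison, here is a minimal character-by-character sketch of the core idea: a record ends only at a line break that sits outside the qualifier (quotes). SplitDSV and its parameters are hypothetical names made up for this example; it is not DSVParser's API or its implementation, and walking a string one character at a time in AHK is much slower than the single-RegEx approach the library uses.

```autohotkey
; Hypothetical helper for illustration only; this is NOT how DSVParser works internally.
; Splits a DSV string into a 2D array (rows of cells). A cell wrapped in the
; qualifier may contain delimiters, line breaks, and doubled qualifiers ("" -> ").
SplitDSV(str, delim := ",", qual := """") {
    rows := [], row := [], cell := ""
    inQuotes := false, recStart := 1
    i := 1, n := StrLen(str)
    while (i <= n) {
        ch := SubStr(str, i, 1)
        if (inQuotes) {
            if (ch == qual) {
                if (SubStr(str, i + 1, 1) == qual) {
                    cell .= qual                  ; doubled qualifier -> one literal qualifier
                    i++
                } else
                    inQuotes := false             ; closing qualifier
            } else
                cell .= ch                        ; delimiters and line breaks are plain data here
        } else if (ch == qual) {
            inQuotes := true                      ; opening qualifier
        } else if (ch == delim) {
            row.Push(cell)                        ; end of cell
            cell := ""
        } else if (ch == "`r" || ch == "`n") {
            if (ch == "`r" && SubStr(str, i + 1, 1) == "`n")
                i++                               ; treat `r`n as a single line break
            row.Push(cell)                        ; end of cell and of record
            rows.Push(row)
            row := [], cell := "", recStart := i + 1
        } else
            cell .= ch
        i++
    }
    if (recStart <= n) {                          ; last record had no trailing line break
        row.Push(cell)
        rows.Push(row)
    }
    return rows
}
```

For example, SplitDSV("1,2,3,""4`n5""`n6,7,8,9") returns 2 records of 4 cells each, with the quoted line break kept inside cell 4 of the first record. Passing a different delimiter, e.g. SplitDSV(str, ";") or SplitDSV(str, "`t"), covers semicolon- or tab-separated data without converting it to CSV first.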

I could have hacked my way around it with code that looks like Frankenstein's monster, but I thought, "Why not make a simple, elegant utility instead?", thinking that maybe someone out there is also facing (or will face) the same problem. Several AHK scripts have tackled this in the past, though, so I'm not the first. :beard:

But at least mine was built to handle complex DSV data more robustly (and it's battle tested :morebeard: with unit tests). It does all the parsing in a single RegEx match, so the parsing overhead stays on the native side (I also tried a non-RegEx implementation, just to make sure mine is indeed performant). Unlike most of the scripts I've found so far, it reads and outputs raw DSV strings, and it allows custom delimiters and qualifiers, so there's no need to convert the input data to CSV first. For most use cases you'll only need 2 methods, ToArray(…) and FromArray(…), which is hopefully a win for people who just want a DSV string parsed into a 2D array quickly.
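Going the other way (array back to DSV string), the only tricky part is deciding which cells need qualifiers. Below is a minimal sketch assuming RFC 4180-style rules: wrap a cell in the qualifier if it contains the delimiter, the qualifier, or a line break, and double any embedded qualifier. JoinDSV and its parameters are hypothetical; this is not the library's FromArray(…) signature or implementation, and it always ends every row, including the last, with eol.

```autohotkey
; Hypothetical helper for illustration only; NOT DSVParser's FromArray(…).
; Joins a 2D array (rows of cells) into a DSV string, qualifying only the
; cells that contain the delimiter, the qualifier, or a line break.
JoinDSV(table, delim := ",", qual := """", eol := "`r`n") {
    out := ""
    for rowIndex, row in table {
        line := ""
        for cellIndex, cell in row {
            needsQual := InStr(cell, delim) || InStr(cell, "`r") || InStr(cell, "`n")
                || (qual != "" && InStr(cell, qual))
            ; With an empty qualifier there is no way to escape, so the cell is written raw
            if (needsQual && qual != "")
                cell := qual . StrReplace(cell, qual, qual . qual) . qual   ; "" escapes an embedded qualifier
            line .= (cellIndex > 1 ? delim : "") . cell
        }
        out .= line . eol      ; every row, including the last, ends with eol
    }
    return out
}
```

Quoting only the cells that need it keeps the output close to what spreadsheet programs write, while still letting a quote-aware parser read multi-line cells back correctly.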

Also, it feels bad to write throw-away Frankenstein code, don't you think? :salute:
DRocks
Posts: 559
Joined: 08 May 2018, 10:20

Re: [Library] DSV Parser

05 Dec 2019, 20:22

Thanks for your detailed reply, it's super interesting.
I was not aware of Anki, but I read the paragraph anyway lol.
Seems gigantic and complex to parse correctly!

Good job on accomplishing that.
Sometimes we don't want to "re-invent" the wheel, as they say, but in many scenarios I still find it necessary.
Chunjee
Posts: 690
Joined: 18 Apr 2014, 19:05
GitHub: Chunjee

Re: [Library] DSV Parser

27 Jan 2020, 14:00

I believe this library is of high quality and have recommended it to someone seeking a way to parse their CSV files. I too will keep it in mind for future projects.
AHK_user
Posts: 70
Joined: 04 Dec 2015, 14:52
Location: Belgium

Re: [Library] DSV Parser

27 Jan 2020, 17:03

Quite useful; in my country we use ";" as the delimiter...
guest3456
Posts: 3145
Joined: 09 Oct 2013, 10:31

Re: [Library] DSV Parser

27 Jan 2020, 17:07

Chunjee wrote:
27 Jan 2020, 14:00
I believe this library is of high quality
when unit tests are shipped along with the library, it's usually high quality

