DRocks wrote: ↑
04 Dec 2019, 06:05
I am wondering why you did not use the built-in Loop, Parse, _, Csv
Was it for specific custom needs ?
I created it to parse a fairly large and complex TSV.
If you're familiar with Anki (you may skip this paragraph if not), I created this utility to parse my huge (as in huge) Anki export file, which happens to contain newlines and other weird characters packed into single TSV cells. I wanted to migrate a particular Anki deck of mine (over 15k cards) to SuperMemo and was having a hard time preserving all my complex notes, each of which has multiple complex fields (there are also JS scripts and custom CSS stylings). I also needed the DSV data in a state I could manipulate easily (e.g., a string or an array), so I could convert it to a format that SuperMemo can eat raw without spewing hellfire. And if you know SuperMemo, then you should know how fantastically poopy that funny buggy thingy is.
Anyway, I could have solved my problem with Loop, Parse, _, CSV by converting the TSV file to CSV first. But...
The first problem I faced was: "How do you split the file into multiple rows when some newlines could be part of a single cell's data?" Remember how in CSV, newlines delimit a record when they're not inside quotes? Apparently Loop, Parse, _, CSV doesn't recognize those newlines at all: if you feed it a string, it will assume that the entirety of that string is a single row. Here's a demonstration:
Code:

data =
( Join`r`n
1,2,3,"4
5",6
)

Loop, parse, data, CSV
    MsgBox, 4, , Cell %A_Index% is:`n%A_LoopField%`n`nContinue?
By the time you hit cell 4, AHK thinks the newline is part of the record, not a record delimiter, so you'll get 4`r`n5 as cell 4's data. You therefore have to figure out how to break the data down into records first, while taking into account that newlines may indeed be part of a single cell's data (i.e., you can't even use Loop, Parse, _, `n, `r nor Loop, Read).
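To give an idea of how quote-aware record splitting can work, here's a minimal sketch (my assumption of one workable approach, not the library's actual code): a single RegEx matches one whole record at a time, treating newlines inside quoted cells as ordinary characters.

Code:

; Minimal sketch only (assumes well-formed CSV; NOT the library's actual code).
; A quoted cell may contain newlines and "" escapes; a bare cell may not.
data =
( Join`r`n
1,2,3,"4
5",6
a,b,c,d,e
)

cell := """[^""]*(?:""""[^""]*)*""|[^"",\r\n]*"   ; quoted cell | bare cell
rec  := "(?:" . cell . ")(?:,(?:" . cell . "))*"  ; cells joined by commas

pos := 1, out := ""
while (pos <= StrLen(data))
{
    RegExMatch(data, rec, record, pos)  ; matches one record starting at pos
    out .= "Record: [" . record . "]`n`n"
    pos += StrLen(record)
    if (StrLen(record) = 0 && SubStr(data, pos, 1) != "`r" && SubStr(data, pos, 1) != "`n")
        break  ; malformed input guard, avoids an infinite loop
    ; skip the newline(s) that delimit records
    while (SubStr(data, pos, 1) = "`r" or SubStr(data, pos, 1) = "`n")
        pos++
}
MsgBox % out  ; the first record keeps the newline inside its quoted cell intact

The point of the sketch is that one native RegEx pass can do the record splitting that would otherwise need a hand-rolled character-by-character loop.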
I could have hacked my way through it and set up code that looks like Frankenstein's monster, but I thought, "Why not make a simple, elegant utility instead?", figuring that maybe there's someone out there who's also facing (or will be facing) the same problem. Several AHK scripts have tackled this in the past, though, so I'm not the first.
But at least mine was built to handle complex DSV data more robustly (and it's battle-tested with unit tests). The parsing is packed into a single RegEx match to ensure that all the parsing overhead stays on the native side (and of course I also tried a non-RegEx implementation, just to make sure that mine is indeed performant). It feeds on and outputs raw DSV strings, unlike most of the scripts I've found so far; it allows custom delimiters and qualifiers (no need to convert the input data to CSV first); and for most use cases you'll only need 2 methods for DSV manipulation (i.e., ToArray(…)), which is hopefully a win for people who just want a DSV string parsed into a 2D array quickly.
Also, it feels bad to write throw-away Frankenstein code, don't you think?