Loop, Parse, Var, CSV

just me
Posts: 9453
Joined: 02 Oct 2013, 08:51
Location: Germany

Loop, Parse, Var, CSV

21 Jun 2019, 05:41

The CSV option is quite useful if (and only if) you parse a CSV file using the English default Excel format:
  • " text field delimiter.
  • , field separator.
Both characters (" and ,) are hard-coded in AHK's module script.cpp:

ResultType Line::PerformLoopParseCSV(ExprTokenType *aResultToken, bool &aContinueMainLoop, Line *&aJumpToLine, Line *aUntil)
// This function is similar to PerformLoopParse() so the two should be maintained together.
// See PerformLoopParse() for comments about the below (comments have been mostly stripped
// from this function).
{
	if (!*ARG2) // Since the input variable's contents are blank, the loop will execute zero times.
		return OK;

	// See comments in PerformLoopParse() for details.
	size_t space_needed = ArgLength(2) + 1;  // +1 for the zero terminator.
	LPTSTR stack_buf, buf;
	if (space_needed <= LOOP_PARSE_BUF_SIZE)
	{
		stack_buf = (LPTSTR)talloca(space_needed); // Helps performance.  See comments above.
		buf = stack_buf;
	}
	else
	{
		if (   !(buf = tmalloc(space_needed))   )
			return LineError(ERR_OUTOFMEM, FAIL, ARG2);
		stack_buf = NULL; // For comparison purposes later below.
	}
	_tcscpy(buf, ARG2); // Make the copy.

	TCHAR omit_list[512];
	tcslcpy(omit_list, ARG4, _countof(omit_list));

	ResultType result;
	Line *jump_to_line;
	TCHAR *field, *field_end, saved_char;
	size_t field_length;
	bool field_is_enclosed_in_quotes;
	global_struct &g = *::g; // Primarily for performance in this case.

	for (field = buf;;)
	{
		if (*field == '"')
		{
			// For each field, check if the optional leading double-quote is present.  If it is,
			// skip over it since we always know it's the one that marks the beginning of
			// that field.  This assumes that a field containing escaped double-quote is
			// always contained in double quotes, which is how Excel does it.  For example:
			// """string with escaped quotes""" resolves to a literal quoted string:
			field_is_enclosed_in_quotes = true;
			++field;
		}
		else
			field_is_enclosed_in_quotes = false;

		for (field_end = field;;)
		{
			if (   !(field_end = _tcschr(field_end, field_is_enclosed_in_quotes ? '"' : ','))   )
			{
				// This is the last field in the string, so set field_end to the position of
				// the zero terminator instead:
				field_end = field + _tcslen(field);
				break;
			}
			if (field_is_enclosed_in_quotes)
			{
				// The quote discovered above marks the end of the string if it isn't followed
				// by another quote.  But if it is a pair of quotes, replace it with a single
				// literal double-quote and then keep searching for the real ending quote:
				if (field_end[1] == '"')  // A pair of quotes was encountered.
				{
					tmemmove(field_end, field_end + 1, _tcslen(field_end + 1) + 1); // +1 to include terminator.
					++field_end; // Skip over the literal double quote that we just produced.
					continue; // Keep looking for the "real" ending quote.
				}
				// Otherwise, this quote marks the end of the field, so just fall through and break.
			}
			// else field is not enclosed in quotes, so the comma discovered above must be a delimiter.
			break;
		}
		// ... (remainder of the function not shown)

German CSV files often use the semicolon (;) as field separator, because the comma is used as the decimal separator. It would be a useful enhancement to allow specifying the characters used as text delimiter and field separator. It shouldn't be a really complicated task. There have been some new contributors to AHK development recently; maybe one of them is willing to do it.
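For illustration only, the splitting logic quoted above could be generalized roughly as in the following C++ sketch, where the field separator and the text qualifier are parameters instead of the hard-coded , and ". It is not AutoHotkey code: SplitCsvRecord is a hypothetical name, it works on std::string instead of editing a TCHAR buffer in place, and two behaviours are assumptions, because the rest of PerformLoopParseCSV is not shown: '\0' means "no qualifier", and any text between a closing quote and the next separator is dropped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch only: splits one record into fields, with the separator and the text
// qualifier passed in instead of being hard-coded as ',' and '"'.
// SplitCsvRecord is a hypothetical name, not an AutoHotkey function.
// Assumption: a qualifier of '\0' disables quote handling.
std::vector<std::string> SplitCsvRecord(const std::string& record,
                                        char delimiter = ',',
                                        char qualifier = '"')
{
    std::vector<std::string> fields;
    if (record.empty())
        return fields; // Blank input: zero fields, like the "if (!*ARG2)" check.

    std::size_t pos = 0;
    for (;;)
    {
        std::string field;
        bool enclosed = qualifier != '\0' && pos < record.size() && record[pos] == qualifier;
        if (enclosed)
        {
            ++pos; // Skip the opening qualifier.
            while (pos < record.size())
            {
                if (record[pos] != qualifier)
                {
                    field += record[pos++];
                }
                else if (pos + 1 < record.size() && record[pos + 1] == qualifier)
                {
                    field += qualifier; // A doubled qualifier is one literal qualifier.
                    pos += 2;
                }
                else
                {
                    ++pos; // A single qualifier closes the quoted text.
                    break;
                }
            }
        }
        // Find the separator that ends this field. For enclosed fields, any text
        // between the closing qualifier and the separator is dropped (assumption).
        std::size_t next = record.find(delimiter, pos);
        if (!enclosed)
            field = record.substr(pos, next == std::string::npos ? std::string::npos : next - pos);
        fields.push_back(field);
        if (next == std::string::npos)
            break;           // That was the last field.
        pos = next + 1;      // Continue after the separator.
    }
    return fields;
}
```

With delimiter ';', a German-style record such as 1,5;"He said ""hi""; ok";42 yields three fields, with the decimal comma in 1,5 preserved, the semicolon inside the quoted field kept as text, and the doubled quotes around hi collapsed to single quotes.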
Ragnar
Posts: 613
Joined: 30 Sep 2013, 15:25

Re: Loop, Parse, Var, CSV

21 Jun 2019, 08:31

This could be done by offering other delimiter options common in Excel, such as SSV (semicolon-separated values) or TSV (tab-separated values), or even DSV (delimiter-separated values, covering CSV, SSV and TSV). Spaces as delimiters seem to be common in Excel as well, but "space-separated values" would clash with the SSV abbreviation and would not be very useful in most cases.

Or maybe CSVx, where x is the delimiter. If x is omitted, it defaults to a comma.
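A minimal C++ sketch of how the Delimiters argument might be interpreted under this CSVx idea, assuming exactly one character may follow CSV (so CSV; selects a semicolon and CSV followed by a tab selects a tab). CsvSeparatorFromOption is a hypothetical name, and the keyword is compared case-sensitively here purely for simplicity.

```cpp
#include <optional>
#include <string>

// Hypothetical helper for the "CSVx" idea: returns the field separator when the
// Delimiters argument selects CSV mode, or std::nullopt when the argument should
// be treated as an ordinary list of delimiter characters.
// Note: "CSV" is compared case-sensitively; this does not model AHK's own check.
std::optional<char> CsvSeparatorFromOption(const std::string& option)
{
    if (option.compare(0, 3, "CSV") != 0)
        return std::nullopt;   // Not CSV mode at all.
    if (option.size() == 3)
        return ',';            // Plain "CSV": comma, as today.
    if (option.size() == 4)
        return option[3];      // "CSV;" -> ';', "CSV" + tab -> '\t'.
    return std::nullopt;       // Longer strings stay a delimiter list.
}
```

One consequence of this rule: a four-character Delimiters value beginning with CSV, which today is just a list of four separator characters, would change meaning.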
ahk7
Posts: 575
Joined: 06 Nov 2013, 16:35

Re: Loop, Parse, Var, CSV

21 Jun 2019, 13:01

Perhaps borrow the Dx[1] from the Sort command?

Loop, parse, CSV, Dx

[1] Dx: Specifies x as the delimiter character - https://www.autohotkey.com/docs/commands/Sort.htm
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02

Re: Loop, Parse, Var, CSV

07 Jul 2019, 13:16

I implemented this suggestion for v2, see 1d48aa7a...

If someone does the testing and wants to write the documentation, I might make a PR. Edit: if someone else wants to make the PR, feel free.

It is now:

Loop ParseCSV String [, Delimiter := ",", Qualifier := '"', OmitChars]

Delimiter and Qualifier must each be of length 0 or 1.
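The parameter rule above can be sketched in C++ as a small validation step. ParseCsvOptions, SingleCharParam and MakeParseCsvOptions are hypothetical names, OmitChars is left out, and mapping an empty string to '\0' (meaning "none") is an assumption about what length 0 is meant to do.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical options for the proposed
// Loop ParseCSV String [, Delimiter := ",", Qualifier := '"', OmitChars]
struct ParseCsvOptions
{
    char delimiter = ',';   // Default taken from the proposed signature.
    char qualifier = '"';   // Default taken from the proposed signature.
};

// Enforces "must be of length 0 or 1".
// Assumption: an empty string maps to '\0', read as "no delimiter" / "no qualifier".
char SingleCharParam(const std::string& value, const std::string& name)
{
    if (value.size() > 1)
        throw std::invalid_argument(name + " must be of length 0 or 1");
    return value.empty() ? '\0' : value[0];
}

ParseCsvOptions MakeParseCsvOptions(const std::string& delimiter = ",",
                                    const std::string& qualifier = "\"")
{
    ParseCsvOptions options;
    options.delimiter = SingleCharParam(delimiter, "Delimiter");
    options.qualifier = SingleCharParam(qualifier, "Qualifier");
    return options;
}
```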

Cheers.
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: Loop, Parse, Var, CSV

07 Jul 2019, 22:07

ahk7 wrote:
21 Jun 2019, 13:01
Perhaps borrow the Dx[1] from the Sort command?

Loop, parse, CSV, Dx

[1] Dx: Specifies x as the delimiter character - https://www.autohotkey.com/docs/commands/Sort.htm
Looking at this, I thought that was a good idea. To maintain syntax consistency, perhaps it should be:

Loop, Parse, CSV [, Delimiters, Qualifiers, OmitChars]
nnnik
Posts: 4500
Joined: 30 Sep 2013, 01:01
Location: Germany

Re: Loop, Parse, Var, CSV

08 Jul 2019, 00:45

IIRC, we were going in the direction of replacing Loop Parse with an iterator.
Recommends AHK Studio
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: Loop, Parse, Var, CSV

08 Jul 2019, 01:18

nnnik wrote:
08 Jul 2019, 00:45
IIRC, we were going in the direction of replacing Loop Parse with an iterator.
Loop Parse is an easy-to-understand and useful syntax.

Could you clarify or provide examples of how it will be replaced?
Do you mean replacing only Loop Parse for CSV, or replacing Loop Parse entirely?
Will this replacement only affect AHK v2?
swagfag
Posts: 6222
Joined: 11 Jan 2017, 17:59

Re: Loop, Parse, Var, CSV

08 Jul 2019, 01:28

for index, line in Parse(string, "`n", "`r")
    ...
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02

Re: Loop, Parse, Var, CSV

08 Jul 2019, 02:27

@nnnik, it was mentioned in this post that it is more likely after the alpha. Anyway, I can't imagine it would involve large rewrites of Line::PerformLoopParseCSV, so making that function more general now wouldn't be wasted effort, IMO.

Disregarding that, none of the suggestions made in this thread (except the OP's) are possible for v1. In particular, Loop, Parse, CSV [, Delimiters, Qualifiers, OmitChars] is not possible, because in v1 the parameter that follows CSV is already OmitChars. If we were to add to the current v1 syntax, it would have to be Loop, Parse, CSV [, OmitChars, DelimiterAgain, Qualifier]. I cannot imagine that happening.

The best chance for v1 to get these features would be if they were added to v2 in a way that makes them easy to backport without breaking anything. My suggestion works for v1, except that it breaks a special case of the old-syntax file loop:

Loop, FilePattern [, IncludeFolders?, Recurse?]
where FilePattern = ParseCSV. This would be so uncommon that I guess it would be OK, but I really do not care.

For v2, there is absolutely no reason to let the word CSV be a special case for the delimiters parameter of Loop Parse.

Cheers.
nnnik
Posts: 4500
Joined: 30 Sep 2013, 01:01
Location: Germany

Re: Loop, Parse, Var, CSV

08 Jul 2019, 02:38

There is no real reason to keep the special loops anymore when there are more consistent alternatives with the for loop available.
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: Loop, Parse, Var, CSV

08 Jul 2019, 02:46

Helgef wrote:
Disregarding that, none of the suggestions made in this thread (except the OP's) are possible for v1. In particular, Loop, Parse, CSV [, Delimiters, Qualifiers, OmitChars] is not possible, because in v1 the parameter that follows CSV is already OmitChars. If we were to add to the current v1 syntax, it would have to be Loop, Parse, CSV [, OmitChars, DelimiterAgain, Qualifier]. I cannot imagine that happening.
I see what you mean.
Helgef wrote: The best chance for v1 to get these features would be if they were added to v2 in a way that makes them easy to backport without breaking anything. My suggestion works for v1, except that it breaks a special case of the old-syntax file loop:

Loop, FilePattern [, IncludeFolders?, Recurse?]
where FilePattern = ParseCSV. This would be so uncommon that I guess it would be OK, but I really do not care.

For v2, there is absolutely no reason to let the word CSV be a special case for the delimiters parameter of Loop Parse.

Cheers.
It seems that being so specific, by using the word CSV, may have caused a bit of an issue. It would have been better to allow multiple kinds of delimiters and then add qualifiers at the end. As a user (and maybe others think the same), I don't mind the order of the parameters. As long as the added functionality is there, I would be happy.
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: Loop, Parse, Var, CSV

08 Jul 2019, 03:01

nnnik wrote:
08 Jul 2019, 02:38
There is no real reason to keep the special loops anymore when there are more consistent alternatives with the for loop available.
I would think a lot of people use Loop Parse instead of the for loop. Loop is one of the "foundation" syntaxes of AutoHotkey that it is arguably famous for. Possibly many people getting familiar with programming and AutoHotkey learn what Loop does first, and then branch out to the special loops. Though it's not common in other programming languages, assembly language uses the word LOOP too.

AutoHotkey's for loop doesn't quite match how for is used in other programming languages, so in that regard it can't really be said to be any better than the Loop syntax. Maybe preferring Loop Parse or the for loop is a matter of taste. That said, I'm not disputing that the for loop has distinct uses with regard to arrays and objects.
nnnik
Posts: 4500
Joined: 30 Sep 2013, 01:01
Location: Germany

Re: Loop, Parse, Var, CSV

08 Jul 2019, 03:15

It's not a preference. When you do not need something and it isn't uniquely beneficial, there is no real reason to add it to the language.

I could even imagine that we add a variadic parameter to the v2 special loops to give them more flexibility, if we decide to keep them.
Then we would have:

Loop outputVar, modeName, specialModeParameters* {

}
like:

Loop outputVar, CSV, inputString, [OmitChars, DelimiterAgain, Qualifier] ;CSV mode rather than Parse mode as CSV is its one unique mode imo
At that point, the only difference between this and the for loop is this:

for outputVar in CSVParse(inputString, omitChars, delimiterAgain, qualifier)
Do you need to keep the first form as something unique just to give AHK a false sense of "unique character"?

AHK's for loop does what is most commonly called a for-each loop.
The normal C-style for loop was a mistake.
just me
Posts: 9453
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: Loop, Parse, Var, CSV

08 Jul 2019, 03:25

Moin, I didn't post my wish for it to be realized in AHK v2. The " as text delimiter is rather common, so the (IMO) easiest way to specify a separator might be:

Loop, Parse, Var, CSV;			; specifies the semicolon as field separator
   ...
Loop, Parse, Var, CSV%A_Tab%	;  (or CSV`t) specifies the tab as field separator
swagfag
Posts: 6222
Joined: 11 Jan 2017, 17:59

Re: Loop, Parse, Var, CSV

08 Jul 2019, 03:27

dude, what, lol
you're reaching. Loop Parse is not what AHK is "famous" for

for item in iterable is a very standard way of doing it in many programming languages. The exact keywords/symbols gluing it together may vary, but the concept stays the same
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02

Re: Loop, Parse, Var, CSV

08 Jul 2019, 03:35

Moin, I didn't post to realise your wish. Your suggestion is much more likely to break existing scripts, and less useful.

Cheers.
just me
Posts: 9453
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: Loop, Parse, Var, CSV

08 Jul 2019, 06:53

Helgef wrote:Your suggestion is much more likely to break existing scripts, and less useful.
How would it break existing scripts?

#NoEnv
Str := "1,2,3,4,5,6,7,8,9"
Loop, Parse, Str, CSV;
   MsgBox, %A_LoopField%
ExitApp
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Loop, Parse, Var, CSV

08 Jul 2019, 07:12

Loop parse wrote: If this parameter is CSV, InputVar will be parsed in standard comma separated value format.
Otherwise, Delimiters contains one or more characters (case sensitive), each of which is used to determine where the boundaries between substrings occur in InputVar.
Meaning that if the parameter isn't exactly CSV but, say, CSVx, then C, S, V and x are all delimiters.

MyFavouriteWorkingScript.ahk:

Str := "1C2S3V4;5"
Loop, Parse, Str, CSV;
   MsgBox, %A_LoopField%
ExitApp
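To make the compatibility concern concrete, here is a C++ sketch of the documented rule for a Delimiters value other than CSV: every character is its own separator. SplitOnAnyOf is a hypothetical name and ignores quoting entirely. With "CSV;" the script's string 1C2S3V4;5 splits into the five fields 1 to 5; if CSV; instead meant "CSV mode with a semicolon separator", the string would be split only at the semicolon, which the sketch approximates by passing ";" alone.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the documented non-CSV rule: every character in `delimiters`
// is a separate separator. SplitOnAnyOf is a hypothetical name; no quote handling.
std::vector<std::string> SplitOnAnyOf(const std::string& text, const std::string& delimiters)
{
    std::vector<std::string> fields;
    if (text.empty())
        return fields; // Blank input: no iterations.
    std::size_t start = 0;
    for (;;)
    {
        std::size_t stop = text.find_first_of(delimiters, start);
        if (stop == std::string::npos)
        {
            fields.push_back(text.substr(start)); // Last field.
            return fields;
        }
        fields.push_back(text.substr(start, stop - start));
        start = stop + 1; // Continue after the separator.
    }
}
```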
Cheers.
Ragnar
Posts: 613
Joined: 30 Sep 2013, 15:25

Re: Loop, Parse, Var, CSV

08 Jul 2019, 07:36

However, it is very unlikely that scripts exist which use "CSV;" or similar as delimiters.
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: Loop, Parse, Var, CSV

08 Jul 2019, 14:05

But do we even need to deal with CSV as the delimiter? Why can't the Qualifier be put at the end? That would not seem to break scripts.

Loop, Parse, InputVar [, Delimiters, OmitChars, Qualifiers]

If we are using CSV, then perhaps tack on an alternative delimiter at the end, to specify a separator other than a comma.

Loop, Parse, InputVar, CSV [, OmitChars, Qualifiers, AlternativeDelimiter]

I don't think this is as awkward as it was initially presented ("DelimiterAgain"). In the case of CSV, that's where you could specify an alternative field separator.