Is it possible to keep string processing syntax consistent?

Discuss the future of the AutoHotkey language
tuzi
Posts: 223
Joined: 27 Apr 2016, 23:40

Is it possible to keep string processing syntax consistent?

31 Jul 2020, 21:52

For example, substr(), trim(), strsplit() and strreplace(): all string functions return the processed string, except RegExMatch() and RegExReplace().

It seems that v1 compromised in order to maintain compatibility, but v2 has already given up compatibility with v1. Why not unify their behavior? :?:
boiler
Posts: 16772
Joined: 21 Dec 2014, 02:44

Re: Is it possible to keep string processing syntax consistent?

31 Jul 2020, 22:44

RegExReplace() does return the processed string. There was discussion here regarding the suggestion to have RegExMatch return a match object instead of the found position.
lexikos
Posts: 9560
Joined: 30 Sep 2013, 04:07

Re: Is it possible to keep string processing syntax consistent?

01 Aug 2020, 00:26

The non-RegEx equivalent of RegExMatch is InStr, which does not return a string either.
nnnik
Posts: 4500
Joined: 30 Sep 2013, 01:01
Location: Germany

Re: Is it possible to keep string processing syntax consistent?

01 Aug 2020, 02:55

Hey, while we are at it, I think StrLen should return a string too. /s
Recommends AHK Studio
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: Is it possible to keep string processing syntax consistent?

01 Aug 2020, 04:03

@tuzi @lexikos @nnnik
IMHO regexmatch and instr return what is naturally expected (the found index within the string), but counterparts that return a substring could be added too, or an additional output variable parameter could be added to them. That would save CPU time and shorten scripts, because in the majority of cases substr follows the call anyway.
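For reference, the two-step pattern described here (a position lookup followed by substr) looks roughly like this. It is a minimal sketch assuming AutoHotkey v2 syntax; the sample string and variable names are made up for illustration.

```autohotkey
haystack := "name=tuzi"                  ; sample input, not from the thread

pos := InStr(haystack, "=")              ; position of "=", or 0 if it is absent
if pos
    value := SubStr(haystack, pos + 1)   ; everything after the "="
else
    value := ""

MsgBox value
```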
tuzi
Posts: 223
Joined: 27 Apr 2016, 23:40

Re: Is it possible to keep string processing syntax consistent?

01 Aug 2020, 06:31

@boiler
@lexikos
@nnnik
@vvhitevvizard
I have read the previous discussion, but I still stand by my point.

The reasoning is:

With instr(), what we pass in is a known string like "abc", so the expected result is whether "abc" is in the Haystack: if it is, return the pos; if it is not, return zero.

With RegExMatch(), what we pass in is a pattern like "\d+", so the expected result is the string that matches the pattern, not whether the pattern is in the Haystack.

With Strlen(), what we pass in is a known string like "abc", so we just want to get the length.
swagfag
Posts: 6222
Joined: 11 Jan 2017, 17:59

Re: Is it possible to keep string processing syntax consistent?

01 Aug 2020, 06:37


MsgBox % StrLen("hello world")
; => "eleven"
@nnnik was onto something :lol:
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: Is it possible to keep string processing syntax consistent?

01 Aug 2020, 06:44

tuzi wrote:
01 Aug 2020, 06:31
With RegExMatch(), what we pass in is a pattern like "\d+", so the expected result is the string that matches the pattern, not whether the pattern is in the Haystack.
I do support your point. I, for one, would like all that stuff to follow the same design rule.
Changing RegExMatch() to return the substring or NULL (empty string) sounds reasonable. The function creates a Match object anyway, which provides the position, length and so on for the found substrings.
boiler
Posts: 16772
Joined: 21 Dec 2014, 02:44

Re: Is it possible to keep string processing syntax consistent?

01 Aug 2020, 06:46

tuzi wrote:
01 Aug 2020, 06:31
With RegExMatch(), what we pass in is a pattern like "\d+", so the expected result is the string that matches the pattern, not whether the pattern is in the Haystack.
The pattern is not always so simple, thus the result is not always a simple string. An object is used to capture the various aspects of the match (captured subpatterns, for example). In the thread I linked in my prior post, fincs suggested having it return the match object, and lexikos pointed out the reasons why he is not implementing that suggestion — largely for performance reasons.
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: Is it possible to keep string processing syntax consistent?

01 Aug 2020, 07:58

boiler wrote:
01 Aug 2020, 06:46
The pattern is not always so simple, thus the result is not always a simple string.
It could return the whole match for simple patterns (=Match.Value(0)), or the first captured group inside () (=Match.Value(1)), or NULL (empty string) if nothing is found.
It makes sense to have a simplified counterpart to RegExMatch (RegEx(Haystack, NeedleRegEx, Starting Pos) or something) that returns no Match object but just a substring, analogous to the ~= shorthand that just returns a position or NULL.
tuzi
Posts: 223
Joined: 27 Apr 2016, 23:40

Re: Is it possible to keep string processing syntax consistent?

02 Aug 2020, 11:00

@boiler

I read the views of lexikos and fincs.
At first, I shared fincs's view that we should return the match object directly.
But lexikos is right: people often want to use it simply. So my new view is that for simple use, we use the return value (a string), and for complex cases, we use the match object.

When people get the pos, they most often do one of two things:
1. Use the pos to know whether the pattern matched or not.
2. Use the pos to get the matched string. Obviously, further processing is needed, such as getting the length or using the match object.

If a string is returned, both situations are handled well.

In conclusion, my point is that complex cases should use the match object, and simple cases should use a string instead of a pos, because an isolated pos is less useful than a string.
tuzi
Posts: 223
Joined: 27 Apr 2016, 23:40

Re: Is it possible to keep string processing syntax consistent?

02 Aug 2020, 11:09

@vvhitevvizard
If we end up with three different variants (RegExMatch(), RegEx() and ~=), won't that confuse people?
lexikos
Posts: 9560
Joined: 30 Sep 2013, 04:07

Re: Is it possible to keep string processing syntax consistent?

02 Aug 2020, 20:54

tuzi wrote:
01 Aug 2020, 06:31
With RegExMatch(), what we pass in is a pattern like "\d+", so the expected result is the string that matches the pattern.
The expected result is a value indicating whether the RegEx matches. If it returned the overall matched value, it would not be a boolean value, so statements such as if RegExMatch(haystack, pattern) would be unreliable. Patterns can match strings of zero digits or of zero length.
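To illustrate the zero-length case, here is a minimal sketch assuming AutoHotkey v2.0 syntax (the `&m` output parameter and `m[0]` for the overall match). The haystack and pattern are invented for the example.

```autohotkey
haystack := "abc"
pattern  := "\d*"    ; zero or more digits, so it can match nothing at all

pos := RegExMatch(haystack, pattern, &m)

; The pattern matches (pos is non-zero), yet the overall match is an empty string.
; If the function returned m[0] instead of pos, an if-test could not tell
; this successful match apart from a failed one.
MsgBox "Position: " pos "`nMatched text: '" m[0] "'`nLength: " m.Len
```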

I need the position more often than the overall match, such as for repeated matching in a large string. If I need a matched value, it is usually a subpattern, not the overall match.
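Repeated matching of this kind might be sketched as follows, assuming AutoHotkey v2.0 syntax. The sample haystack, the pattern and the variable names are assumptions made for illustration only.

```autohotkey
haystack := "a1 b22 c333"    ; sample input
found := ""
pos := 1

; Each call resumes where the previous match ended, so the haystack is scanned once.
while (pos := RegExMatch(haystack, "\d+", &m, pos)) {
    found .= m[0] " at " pos "`n"
    pos += m.Len             ; safe here because \d+ never matches an empty string
}

MsgBox found
```

The returned position doubles as the loop condition and as the basis for the next starting position, which a string return value could not provide.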

You can always write your own very simple function that adapts RegExMatch to whatever usage you want.
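As a sketch of what such a wrapper might look like, assuming AutoHotkey v2.0 syntax: the name `RegExFind` and its `Group` parameter are hypothetical, not built-in. It returns an empty string on failure, so an empty match and no match are not distinguished.

```autohotkey
; RegExFind is a hypothetical helper, not part of AutoHotkey.
; Returns the text of the overall match (Group 0) or of a subpattern,
; or an empty string if the pattern does not match.
RegExFind(Haystack, NeedleRegEx, Group := 0, StartingPos := 1) {
    if RegExMatch(Haystack, NeedleRegEx, &m, StartingPos)
        return m[Group]
    return ""
}

MsgBox RegExFind("order 1234 shipped", "\d+")       ; overall match
MsgBox RegExFind("key=value", "(\w+)=(\w+)", 2)     ; second subpattern
```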
vvhitevvizard
Posts: 454
Joined: 25 Nov 2018, 10:15
Location: Russia

Re: Is it possible to keep string processing syntax consistent?

03 Aug 2020, 07:00

tuzi wrote:
02 Aug 2020, 11:09
@vvhitevvizard
If we end up with three different variants (RegExMatch(), RegEx() and ~=), won't that confuse people?
The possibility to choose would confuse some. :D
But in general, having counterparts of a function with slightly different in/out types and/or a different number of params is common:
E.g. gdiplus\GdipFillRectangle accepting float coordinates and gdiplus\GdipFillRectangleI accepting integers,
kernel32\LoadLibrary accepting 1 parameter and kernel32\LoadLibraryEx accepting 3 parameters (including additional flags),
and so on, not to mention that there are A and W variants of many string-accepting functions to distinguish ANSI from Unicode ones.
nnnik
Posts: 4500
Joined: 30 Sep 2013, 01:01
Location: Germany

Re: Is it possible to keep string processing syntax consistent?

03 Aug 2020, 08:37

Yes but those follow a commonly established convention.
For RegExMatch it would be completely random.
Nobody would expect the output parameter to change depending on the calling convention.
Recommends AHK Studio
_3D_
Posts: 277
Joined: 29 Jan 2014, 14:40

Re: Is it possible to keep string processing syntax consistent?

06 Aug 2020, 08:30

Why do string methods return an integer?
1. An integer is the closest thing to a boolean, so you can check the result faster.
2. An integer is the closest thing to the CPU, so it is faster to handle.
3. An integer is small, so the result takes less RAM and/or stack.

If you think of the string as "abc" and the result as "a", then yes, a string is better. But if your string is 10K and the result is 8K, then:
1. The result takes nearly double the RAM, and if you only need to check if(strmethod(...)), the RAM becomes highly fragmented.
2. If you use the result as an argument, you fill the stack with unneeded data, or you need special treatment for strings used as arguments (passing the string by reference or pointer).

In AHK, strmethods(...) have an extra benefit: they return 0 if nothing is found. If the result were a string, "" would not tell you whether that is the result or there was no match.

In conclusion: the result must be simple and unambiguous.
AHKv2.0 alpha forever.
