RegexMatch get next position by default

Propose new features and changes
RaptorX
Posts: 395
Joined: 06 Dec 2014, 14:27

RegexMatch get next position by default

01 May 2022, 02:18

I think it would be beneficial for RegExMatch to either allow global matches (and avoid the looping we are forced to do at the moment) or have some way of telling the function to start at the previous position + match.Len by default, without the additional code shown below.

Code: Select all

pos := 0 ; assumed to be initialised before the loop, since the loop reads it
while (pos := RegExMatch(http.responseText, regex, &def, pos + (IsSet(def) ? def.Len : 1)))
{
	OutputDebug pos ' ' def.var ' := ' def.val ' `; ' def.comment
	hFile.Write(def.var ' := ' def.val ' `; ' def.comment '`n')
}
Something along these lines would be nice:

Code: Select all

while RegExMatch(http.responseText, regex, &def, "nextMatch") ;or something like that
Projects:
AHK-ToolKit
lexikos
Posts: 9690
Joined: 30 Sep 2013, 04:07

Re: RegexMatch get next position by default

02 May 2022, 00:08

I think it would make more sense to have a separate function with different syntax/usage, to optimize for the fact that the same haystack and needle are going to be used repeatedly.
RaptorX

Re: RegexMatch get next position by default

02 May 2022, 09:35

lexikos wrote:
02 May 2022, 00:08
I think it would make more sense to have a separate function with different syntax/usage, to optimize for the fact that the same haystack and needle are going to be used repeatedly.
That makes sense.

Something along the lines of GlobalMatch (if that's the direction you are going with it) might be a good idea.

In some situations I use the VB regex match engine to work around that limitation, but I have been having issues on x64, so there's that.
I think there might be a legitimate need to optimize for global matching.

But that raises the question: would there be an analogous shorthand like ~=? :P
Maybe I'm just pushing it now.
RaptorX

Re: RegexMatch get next position by default

06 May 2022, 12:52

@lexikos
I just found this other post I had forgotten about, and you are very consistent with your opinions on the matter, haha.

I like that. I hope to see that idea implemented at some point. Of course I use my own function for it, but I keep coming back to it because it might be a nice addition to the language itself, rather than forcing users to create their own.
And mainly because regular expressions already do that. It's a "missing something that I know is there" kind of thing.
lexikos

Re: RegexMatch get next position by default

06 May 2022, 19:31

And mainly because regular expressions already do that.
They do what? Get all matches? No, they don't. One has to apply the regex repeatedly, whether it is in script or with equivalent C++.

Some other languages from which regex can be used have standard functions which return all matches.
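
As a rough illustration of "applying the regex repeatedly" in script, here is a minimal sketch of a match-all helper for AutoHotkey v2. `RegExMatchAll` is a hypothetical name, not a built-in function, and the sample haystack and pattern are made up for the example.

```autohotkey
#Requires AutoHotkey v2.0

; Hypothetical helper (not built in): returns an Array of match objects,
; one for each match of needle in haystack.
RegExMatchAll(haystack, needle) {
    matches := []
    pos := 1
    ; Stop once pos has moved past the empty string at the end of haystack.
    while pos <= StrLen(haystack) + 1 && RegExMatch(haystack, needle, &m, pos) {
        matches.Push(m)
        ; Continue after this match. Step at least one character so that
        ; an empty match cannot keep the loop on the same position.
        pos := m.Pos + (m.Len ? m.Len : 1)
    }
    return matches
}

; Usage: iterate over all matches and read the named subpatterns.
for m in RegExMatchAll("a=1, b=22, c=333", "(?<key>\w)=(?<val>\d+)")
    OutputDebug m["key"] " := " m["val"]
```

This still passes the haystack and needle to RegExMatch on every iteration, so it only hides the loop; it does not give the optimization a built-in function could.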
RaptorX

Re: RegexMatch get next position by default

07 May 2022, 09:45

lexikos wrote:
06 May 2022, 19:31
And mainly because regular expressions already do that.
They do what? Get all matches? No, they don't. One has to apply the regex repeatedly, whether it is in script or with equivalent C++.

Some other languages from which regex can be used have standard functions which return all matches.
I'm not referring to other languages but to regex itself, by applying the /.../g option, which is what I was suggesting, I think, in my older post:

I would be in favor of allowing the g option in the regex options string and having the &Match object return an array for them, such as:

Code: Select all

RegexMatch(var, "ig)(?<thing>.*)", &matched)
MsgBox matched.thing.length
MsgBox matched.thing[1]

for thing in matched.thing
	msgbox thing
lexikos

Re: RegexMatch get next position by default

07 May 2022, 16:54

No, you are referring to other languages. /.../g is a JavaScript construct. Only the ... part is regex.

The g flag causes match to return only the overall matches. To get all matches including captured subpatterns, there is matchAll, which requires the g flag. It's actually a relatively new function that doesn't exist in Internet Explorer. Apparently before it was added, the way to get all matches with captured subpatterns was to call regexp.exec() in a loop.

The flag does not affect the matching behaviour of the regex; it only affects how specific JavaScript functions utilize the regex.

Also, the JavaScript RegExp object is not a regular expression or regex. It is an object which encapsulates a regex pattern with a set of methods for applying it in different ways, like putting together a regex string, RegExMatch, RegExReplace and a hypothetical future match-all function. You could create such an object yourself, although it wouldn't be as efficient as a built-in object.

The RegExMatch function or object could be "overloaded" with match-all functionality, but there are a number of reasons I do not think it would be appropriate:
  • The OutputVar parameter is what it says, an output parameter. If it differentiated by whether the variable already contains a match object, you would have to explicitly reset the variable between runs, if you're using the RegExMatch loop inside a larger loop.
  • It is inefficient to pass the haystack and needle again if they are stored in the match object (although I think only the necessary substring is stored in the match object at the moment).
  • It doesn't make sense to pass the haystack and needle again if the matching state (i.e. last match index) relies on the specific haystack or needle. What is the function supposed to do if the parameters have been changed?
  • For using the match object itself to get repeated matches, you would need to first call RegExMatch once to get the match object.
In JavaScript, the RegExp object represents the regex itself, not the matches.
JavaScript RegExp objects are stateful when they have the global or sticky flags set (e.g., /foo/g or /foo/y). They store a lastIndex from the previous match.
Source: RegExp.prototype.test() - JavaScript | MDN
For repeated calls, it looks like you pass the haystack in repeatedly. Because the RegExp object represents the pattern and not a match, it does not store the haystack. I suppose the only "state" it has aside from the pattern itself is a single integer, lastIndex.
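
To make the "single integer of state" idea concrete, here is a minimal AutoHotkey v2 sketch of an object that stores only a pattern and a last index, loosely modelled on a JavaScript RegExp with the g flag. The class name `GlobalRegEx` and its `Exec` method are hypothetical, and details differ from JavaScript (positions are 1-based, and the sketch always steps past an empty match).

```autohotkey
#Requires AutoHotkey v2.0

; Hypothetical class (not built in): a pattern plus one integer of state.
class GlobalRegEx {
    __New(needle) {
        this.Needle := needle
        this.LastIndex := 1    ; 1-based position where the next search starts
    }

    ; Returns the next match object, or "" when there are no more matches.
    ; The haystack is passed in on every call, because it is not stored.
    Exec(haystack) {
        inRange := this.LastIndex <= StrLen(haystack) + 1
        if inRange && RegExMatch(haystack, this.Needle, &m, this.LastIndex) {
            ; Remember where to continue; step at least one character.
            this.LastIndex := m.Pos + (m.Len ? m.Len : 1)
            return m
        }
        this.LastIndex := 1    ; start over on the next call after a failure
        return ""
    }
}

; Usage: the same haystack is passed repeatedly until Exec returns "".
re := GlobalRegEx("\d+")
while m := re.Exec("10 apples, 20 pears")
    OutputDebug m[0] " at " m.Pos
```

If a different haystack is passed between calls, the stored index simply carries over, which is the ambiguity described in the list above.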


I agree that more convenient RegEx functions would be useful, but there are many such features that could (and will) be implemented, and usually only one developer (sometimes zero) working on AutoHotkey. Currently my priority is on making v2 more accessible to users, although if I'm here writing posts, I'm probably procrastinating...
RaptorX

Re: RegexMatch get next position by default

10 May 2022, 20:30

lexikos wrote:
07 May 2022, 16:54
No, you are referring to other languages. /.../g is a JavaScript construct. Only the ... part is regex.
This is something that I didn't know. I was convinced that the g flag was like the other flags, but after reading the manual I only see these:
i for PCRE2_CASELESS
m for PCRE2_MULTILINE
n for PCRE2_NO_AUTO_CAPTURE
s for PCRE2_DOTALL
x for PCRE2_EXTENDED
xx for PCRE2_EXTENDED_MORE
lexikos wrote:
07 May 2022, 16:54
You could create such an object yourself, although it wouldn't be as efficient as a built-in object.
This is the reason why I usually go around wishing for stuff on the forums. Even though I can, and usually do, build those objects, I think it would be beneficial to have a built-in, efficient method instead of what we amateurs might come up with. :)

It's the same sentiment behind wanting a built-in JSON parser that converts strings to AHK objects and vice versa. Not that I can't build a library, but it would be way more efficient as a built-in.
lexikos wrote:
07 May 2022, 16:54
The RegExMatch function or object could be "overloaded" with match-all functionality, but there are a number of reasons I do not think it would be appropriate:
  • The OutputVar parameter is what it says, an output parameter. If it differentiated by whether the variable already contains a match object, you would have to explicitly reset the variable between runs, if you're using the RegExMatch loop inside a larger loop.
  • It is inefficient to pass the haystack and needle again if they are stored in the match object (although I think only the necessary substring is stored in the match object at the moment).
  • It doesn't make sense to pass the haystack and needle again if the matching state (i.e. last match index) relies on the specific haystack or needle. What is the function supposed to do if the parameters have been changed?
  • For using the match object itself to get repeated matches, you would need to first call RegExMatch once to get the match object.
After understanding how the global flag actually works, I tend to agree that RegexMatch should not be overloaded like that.
Maybe at some point we might see a sister function but for now, I understand your point.
lexikos wrote:
07 May 2022, 16:54
I agree that more convenient RegEx functions would be useful, but there are many such features that could (and will) be implemented, and usually only one developer (sometimes zero) working on AutoHotkey. Currently my priority is on making v2 more accessible to users, although if I'm here writing posts, I'm probably procrastinating...
Haha, yeah my friend, I totally get your point. I really wish I understood the C++ code well enough to be of help. Maybe that will be my personal project... get good enough to add those little things if possible :)
