Minor Change to RegExMatch()

Discuss the future of the AutoHotkey language
fincs
Posts: 505
Joined: 30 Sep 2013, 14:17
GitHub: fincs
Location: Seville, Spain

Minor Change to RegExMatch()

01 Jul 2014, 10:24

Currently RegExMatch has this syntax:

Code:

FoundPos := RegExMatch(Haystack, NeedleRegEx [, OutputVar, StartingPosition = 1])
I think it should be:

Code:

MatchObj := RegExMatch(Haystack, NeedleRegEx [, StartingPosition = 1])
This change makes the ~= operator actually useful, while removing a redundant return value: FoundPos is the same thing as MatchObj.Pos. E.g.:

Code:

if m := "The current year is %A_Year%" ~= "(\d+)"
    msgbox Found numeric sequence '%m[0]%' at %m.Pos%!
fincs
Windows 10 x64 Build 18362 | AMD Ryzen 7 3700X with 32 GB of RAM | AutoHotkey v1.1.31.01
Get SciTE4AutoHotkey v3.0.06.01 - [My project list]
toralf
Posts: 792
Joined: 27 Apr 2014, 21:08
Location: Germany

Re: Minor Change to RegExMatch()

01 Jul 2014, 11:12

I like it.
How would named and unnamed subpatterns be referenced?
ciao
toralf
toralf

Re: Minor Change to RegExMatch()

01 Jul 2014, 11:16

At the same time, the ~= version doesn't really improve on this:

Code:

if Pos := RegExMatch("The current year is " A_Year, "(\d+)", m)
    msgbox Found numeric sequence '%m1%' at %Pos%!
ciao
toralf
fincs

Re: Minor Change to RegExMatch()

01 Jul 2014, 11:19

toralf wrote:How would named and unnamed subpatterns be referenced?
(...)
At the same time, the ~= version doesn't really improve on this
In v2, the old capture system based on pseudo-arrays was removed completely; only match objects are supported (please read the documentation). What I propose is that RegExMatch return the match object instead of FoundPos, which is redundant. The match object contains all information about captures, positions and lengths. ~= improves because it now returns something useful instead of FoundPos, which is very rarely used and, as I said, redundant.
toralf

Re: Minor Change to RegExMatch()

01 Jul 2014, 12:01

Thanks, I didn't know that mode 1 was removed in v2.

The only times I used Pos were when I had to go over a haystack multiple times.

Code:

Pos := 1
While Pos := RegExMatch(Haystack, Needle, Var, Pos) {
    ; ...process the match in Var
    Pos += StrLen(Var) ; continue after the match (assumes a non-empty match)
}
The same could work with an object if zero is returned when nothing matches, and an empty string (with ErrorLevel set) on errors.
I assume an object by itself evaluates to true. I would hate to have to write While IsObject(m := RegExMatch(Haystack, Needle, Pos)) {
ciao
toralf
fincs

Re: Minor Change to RegExMatch()

01 Jul 2014, 12:06

Objects are true when interpreted as booleans, so it would suffice to write while m := RegExMatch(Haystack, Needle, Pos). Also, RegExMatch (and thus ~=) would return either an object (in case of match) or empty string (in case of non-match). In v2, RegEx functions always throw an exception if the pattern is malformed or some other PCRE error occurs; ErrorLevel is not involved. The loop you wrote could be expressed as follows:

Code:

Pos := 1
while m := RegExMatch(Haystack, Needle, Pos)
{
    ;...process the match
    Pos := m.Pos + m.Len ; continue search at the end of the overall match
}
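The loop above relies on the proposed return value. A minimal sketch of the same idea, assuming the released v2.0 syntax (where RegExMatch takes &OutputVar and returns the position); RegExMatchObj is a hypothetical helper name, not a built-in, and the haystack and pattern are made up for illustration:

```autohotkey
#Requires AutoHotkey v2.0

; Hypothetical helper (not a built-in): returns the match object,
; or an empty string when nothing matches, as proposed in this thread.
RegExMatchObj(Haystack, Needle, StartingPos := 1)
{
    return RegExMatch(Haystack, Needle, &m, StartingPos) ? m : ""
}

Haystack := "a1 b22 c333"   ; sample data, assumed for illustration
Found := ""
Pos := 1
while m := RegExMatchObj(Haystack, "\d+", Pos)
{
    Found .= "'" m[0] "' at " m.Pos "`n"
    Pos := m.Pos + m.Len    ; continue after the overall match
}
MsgBox Found
```

The pattern here never matches an empty string; a pattern that can would need Pos advanced by at least one character to avoid looping forever.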
HotKeyIt
Posts: 2157
Joined: 29 Sep 2013, 18:35
Contact:

Re: Minor Change to RegExMatch()

01 Jul 2014, 12:22

+1

EDIT:
fincs wrote:Also, RegExMatch (and thus ~=) would return either an object (in case of match) or empty string (in case of non-match).
Btw. RegExMatch returns 0 if no match is found.
fincs

Re: Minor Change to RegExMatch()

01 Jul 2014, 16:36

HotKeyIt wrote:Btw. RegExMatch returns 0 if no match is found.
That's what currently happens, and it follows from the fact that the function returns a position value. In the proposed RegExMatch modification, returning 0 for "no match" seems inconsistent to me, since the other possible return value is not an integer either.
lexikos
Posts: 7088
Joined: 30 Sep 2013, 04:07
GitHub: Lexikos

Re: Minor Change to RegExMatch()

01 Jul 2014, 20:50

I had already considered and rejected this possibility.

The ~= operator is already useful, for simple matching of patterns. This is its intended purpose - "does x match pattern y?". obj := str ~= pattern is awkward and particularly obscure.

Returning the object does not make RegExMatch more convenient, and it is likely to decrease performance substantially with large haystacks when only a position is needed: Haystack has to be copied into the match object. Currently that copy happens only if OutputVar is specified, since the object isn't created otherwise.
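The two uses being contrasted can be sketched as follows, assuming the released v2.0 syntax (position return value, &OutputVar for the match object); the haystack string is borrowed from the first post:

```autohotkey
#Requires AutoHotkey v2.0

Haystack := "The current year is " A_Year

; Simple "does x match pattern y?" test: only a position is returned.
if Haystack ~= "\d+"
    MsgBox "Haystack contains a number"

; Captures are needed, so an output variable is passed for the match object.
if RegExMatch(Haystack, "(\d+)", &m)
    MsgBox "Found numeric sequence '" m[1] "' at " m.Pos
```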
Coco
Posts: 771
Joined: 29 Sep 2013, 20:37
GitHub: cocobelgica

Re: Minor Change to RegExMatch()

15 Nov 2014, 00:24

How about a new operator, ~==, that returns the RegExMatch object, while ~= retains its behavior of returning the position?
Sjc1000
Posts: 39
Joined: 02 Oct 2013, 02:07

Re: Minor Change to RegExMatch()

17 Nov 2014, 04:27

fincs wrote:FoundPos is the same thing as MatchObj.Pos.
What would happen if you had a named subpattern called Pos?

Maybe MatchObj could have a setup like this:

Code:

MatchObj := RegExMatch("Testing 123", "(\w+)\s(?<subpattern>\d+)")
Unnamed subpatterns:
MatchObj.1, MatchObj.2 (or MatchObj[1], etc.)

Named subpatterns:
MatchObj.subpattern (or MatchObj["subpattern"])

Both could also expose their own length and position, e.g. MatchObj.subpattern.length and MatchObj.1.pos.

It would also allow for easy iteration with a For loop.

Code:

For key, val in MatchObj
{
     length := val.length
     MsgBox, Match %key% has a length of %length%
}
Please find me on the IRC if you have any questions, I'm never on the forum anymore.
Coco-guest

Re: Minor Change to RegExMatch()

17 Nov 2014, 05:49

Sjc1000 wrote:What would happen if you had a named subpattern called Pos?
You can use the method call syntax MatchObj.Pos().
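A minimal sketch of the name clash, assuming the released v2.0 match object rather than the 2014 alpha discussed here: there, built-in properties such as Pos take precedence over the MatchObj.Name shorthand, and a subpattern with a clashing name is read through the index syntax. The sample string and pattern are made up for illustration:

```autohotkey
#Requires AutoHotkey v2.0

; The subpattern is deliberately named "Pos" to provoke the clash.
if RegExMatch("x=10", "(?<Pos>\d+)", &m)
{
    overallPos := m.Pos     ; built-in property: position of the overall match
    captured := m["Pos"]    ; text captured by the subpattern named "Pos"
    MsgBox "Overall match at " overallPos ", subpattern text: " captured
}
```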
Sjc1000

Re: Minor Change to RegExMatch()

17 Nov 2014, 05:53

I see, thanks :)

I now see it's also documented in the help file. I must have missed that. :P