Minor Change to RegExMatch()
Posted: 01 Jul 2014, 10:24
by fincs
Currently RegExMatch has this syntax:
Code:
FoundPos := RegExMatch(Haystack, NeedleRegEx [, OutputVar, StartingPosition = 1])
I think it should be:
Code:
MatchObj := RegExMatch(Haystack, NeedleRegEx [, StartingPosition = 1])
This change makes the ~= operator actually useful, while removing a redundant return value: FoundPos is the same thing as MatchObj.Pos. E.g.:
Code:
if m := "The current year is %A_Year%" ~= "(\d+)"
    msgbox Found numeric sequence '%m[0]%' at %m.Pos%!
Re: Minor Change to RegExMatch()
Posted: 01 Jul 2014, 11:12
by toralf
I like it.
How would named and unnamed patterns be referenced?
Re: Minor Change to RegExMatch()
Posted: 01 Jul 2014, 11:16
by toralf
At the same time, ~= doesn't really improve on:
Code:
if Pos := RegExMatch("The current year is " A_Year, "(\d+)", m)
    msgbox Found numeric sequence '%m1%' at %Pos%!
Re: Minor Change to RegExMatch()
Posted: 01 Jul 2014, 11:19
by fincs
toralf wrote:How would named and unnamed pattern be referenced?
(...)
At the same time the ~= doesn't really improve
In v2, the old capture system that uses pseudo-arrays was completely removed, and only match objects are supported. Please read the documentation. So what I propose is that, instead of returning MatchPos (which is redundant), it should return the match object. This match object contains all information about captures, positions and lengths.
~= is improved because it now returns something useful instead of MatchPos (which is very rarely used and as I said, it is redundant).
Re: Minor Change to RegExMatch()
Posted: 01 Jul 2014, 12:01
by toralf
Thanks, I didn't know that mode1 was removed in v2.
The times I used Pos was when I had to go over a haystack multiple times.
Code:
Pos = 0
While Pos := RegExMatch(Haystack, Needle, Var, Pos){
}
The same could work if the function returned an object on a match, zero when nothing is found, and an empty string with ErrorLevel set on errors.
I assume an object by itself is true. I would hate to write:
While isObject(m := RegExMatch(Haystack, Needle, Pos)){
Re: Minor Change to RegExMatch()
Posted: 01 Jul 2014, 12:06
by fincs
Objects are true when interpreted as booleans. It would suffice to say while m := RegExMatch(Haystack, Needle, Pos). Also, RegExMatch (and thus ~=) would return either an object (in case of match) or empty string (in case of non-match). In v2, RegEx functions always throw an exception if a malformed pattern is used or there's some other PCRE error (ErrorLevel is not involved). The loop you wrote could be expressed as such:
Code:
Pos := 1
while m := RegExMatch(Haystack, Needle, Pos)
{
    ;...process the match
    Pos := m.Pos + m.Len ; continue search at the end of the overall match
}
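For comparison, here is a minimal sketch of the same loop written against the position-returning form of RegExMatch rather than the proposed one. It assumes the AutoHotkey v2.0 release syntax, where the match object is passed back through a ByRef output variable (&m); CollectMatches is a hypothetical helper name, and the extra Max() guard against zero-length matches is an addition not present in the loop above.

```autohotkey
; Sketch only: assumes AutoHotkey v2.0 release syntax (ByRef output via &m).
; CollectMatches is a hypothetical helper, not part of AutoHotkey.
CollectMatches(Haystack, Needle)
{
    found := []
    pos := 1
    ; RegExMatch returns the found position (0 when there is no match),
    ; so the return value doubles as the loop condition.
    while pos := RegExMatch(Haystack, Needle, &m, pos)
    {
        found.Push(m[0])               ; m[0] is the overall match
        ; Continue after the overall match; advance by at least one
        ; character so a zero-length match cannot loop forever.
        pos := m.Pos + Max(m.Len, 1)
    }
    return found
}

nums := CollectMatches("a1 b22 c333", "\d+")   ; three matches: 1, 22, 333
```

Either way the loop body needs both the position and the match object, so the two signatures differ mainly in which of the two is the return value.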
Re: Minor Change to RegExMatch()
Posted: 01 Jul 2014, 12:22
by HotKeyIt
+1
EDIT:
fincs wrote:Also, RegExMatch (and thus ~=) would return either an object (in case of match) or empty string (in case of non-match).
Btw. RegExMatch returns 0 if no match is found.
Re: Minor Change to RegExMatch()
Posted: 01 Jul 2014, 16:36
by fincs
HotKeyIt wrote:Btw. RegExMatch returns 0 if no match is found.
That's what currently happens, and is related to the fact that it returns a position value. In the proposed RegExMatch modification, it seems IMO inconsistent to return 0 for false when the other possible return value is not an integer either.
Re: Minor Change to RegExMatch()
Posted: 01 Jul 2014, 20:50
by lexikos
I had already considered and rejected this possibility.
The ~= operator is already useful, for simple matching of patterns. This is its intended purpose - "does x match pattern y?". obj := str ~= pattern is awkward and particularly obscure.
Returning the object does not improve convenience for RegExMatch, but it is likely to decrease performance substantially with large Haystacks when only a position is needed. Haystack has to be copied into the object, but only if OutputVar is specified, since the object isn't created otherwise.
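The split described here can be sketched roughly as follows, assuming the AutoHotkey v2.0 release syntax: ~= for a plain "does it match?" test, and RegExMatch with an output variable only when capture details are wanted. HasDigits and yearText are hypothetical names used for illustration.

```autohotkey
; Sketch only: assumes AutoHotkey v2.0 release syntax.
; HasDigits is a hypothetical helper, not part of AutoHotkey.
HasDigits(str)
{
    ; ~= is shorthand for RegExMatch and yields the found position
    ; (0 when nothing matches), so it can be used as a boolean test.
    return (str ~= "\d") ? true : false
}

; A match object is requested (via &m) only when the captures are needed.
if RegExMatch("The current year is " A_Year, "(\d+)", &m)
    yearText := m[1]    ; first capturing group
```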
Re: Minor Change to RegExMatch()
Posted: 15 Nov 2014, 00:24
by Coco
How about a new operator, ~==, that returns the RegExMatch object, while ~= retains its behavior of returning the position?
Re: Minor Change to RegExMatch()
Posted: 17 Nov 2014, 04:27
by Sjc1000
fincs wrote:FoundPos is the same thing as MatchObj.Pos. E.g.:
What would happen if you had a named subpattern called Pos?
Maybe MatchObj could have a setup like this:
Code:
MatchObj := RegExMatch("Testing 123", "(\w+)\s(?<subpattern>\d+)")
Unnamed subpatterns:
MatchObj.1 MatchObj.2 ( or MatchObj[1], etc )
Named subpatterns:
MatchObj.subpattern ( or MatchObj["subpattern"] )
Both of these could also expose their own .length and .pos:
MatchObj.subpattern.length and MatchObj.1.pos
It would also allow for easy iteration with a For loop.
Code:
For key, val in MatchObj
{
    length := val.length
    MsgBox, Match %key% has a length of %length%
}
Re: Minor Change to RegExMatch()
Posted: 17 Nov 2014, 05:49
by Coco-guest
Sjc1000 wrote:What would happen if you had a named sub pattern called Pos?
You can use the method call syntax MatchObj.Pos().
Re: Minor Change to RegExMatch()
Posted: 17 Nov 2014, 05:53
by Sjc1000
I see, thanks
I now see it's also documented in the help file. I must have missed that. :P