RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 03:47
by mslonik
Dear Forum,
I can't figure out how to formulate regex syntax.
Test string:
Expected output after regex:
In other words, I'd like to make several substitutions at once, one for each subpattern:
So far I've only figured out the first part, so I've put "???" in place of the substitution.
Code: Select all
Haystack := "foo boo woo"
NewStr := RegExReplace(Haystack, "(foo)|(boo)|(woo)", "???")
Please help me to figure it out.
Kind regards, mslonik
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 04:02
by braunbaer
I don't think it is possible with a single regexreplace.
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 04:38
by Rohwedder
Hallo,
does not work properly yet!
Code: Select all
Haystack := "foo boo woo"
RegExReplace(Haystack, "(\woo)(?CCallNewStr)", NewStr)
MsgBox,% NewStr
CallNewStr(Match)
{
Global NewStr .= {"foo":"cat","boo":"dog","woo":"cow"}[Match]
}
Re: RegExReplace, substitution challenge, multiple subpaterns Topic is solved
Posted: 24 Jun 2021, 05:19
by braunbaer
A callout function is a good idea. This works
Code: Select all
NumpadSub::
Haystack := "foo boo abc woo moo xyz", newstr:=""
RegExReplace(Haystack, "(.*?)(foo|boo|woo|$)(?CCallNewStr)")
MsgBox,% NewStr
CallNewStr(Match)
{
static repl:={"foo":"cat","boo":"dog","woo":"cow"}
Global NewStr
NewStr .= Match1 repl[Match2]
}
return
The actual replacement occurs in the callout function. RegExReplace can't do that by itself, because the replacement parameter is evaluated only once, not every time a new match is found.
And, to make the function more flexible when other replacements are added, you can build the regex from the replacement table:
Code: Select all
NumpadSub::
Global NewStr
Global repl:={foo:"cat",boo:"dog",woo:"cow"}
rgx:=""
for key in repl
rgx.=key "|"
rgx:= "(.*?)(" rgx "$)(?CCallNewStr)"
Haystack := "foo boo abc woo moo xyz", newstr:=""
RegExReplace(Haystack, rgx)
MsgBox,% NewStr
CallNewStr(Match) {
NewStr .= Match1 repl[Match2]
}
return
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 06:24
by Hellbent
I don't know anything about RegEX, but aren't you basically just doing this in this case?
Code: Select all
Table := {"foo":"cat","boo":"dog","woo":"cow"}
Haystack := "foo boo abc woo moo xyz"
loop, % ( arr := StrSplit(HayStack," ") ).Length()
( Table.HasKey(arr[A_Index]))?( NewStr .= Table[arr[A_Index]] " ")
MsgBox, % NewStr
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 06:42
by braunbaer
Well, first he asked for a solution with a single regexreplace. Making multiple replacements in a loop was explicitly unwanted. There are lots of different ways to make such replacements using a loop.
Second, this is much more flexible. An element to replace could consist of several words, this would not work using strsplit.
Third, your version loses everything that does not match one of the replace keys. At least, the ternary should have a subexpression for the case when a part of the string is not found in the replacement table.
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 06:53
by Hellbent
braunbaer wrote: ↑24 Jun 2021, 05:19
And, to make the function more flexible when other replacements are added, you can build the regex from the replacement table:
Code: Select all
Table := {"foo":"cat","boo":"dog","woo":"cow"},Haystack := "foo boo abc woo moo xyz"
loop, % ( arr := StrSplit(HayStack," ") ).Length()
(Table.HasKey(arr[A_Index]))?( NewStr .= Table[arr[A_Index]] " "):(NewStr .= arr[A_Index] " ")
MsgBox, % NewStr
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 07:01
by braunbaer
In the version I posted, word boundaries are not relevant. fooxyz will be translated to catxyz. If that is unwanted, the regex would have to be changed, according to what exactly is wanted.
Of course, you don't need regex at all for doing whatever you want, but it is much more flexible and facilitates string manipulation and pattern recognition tremendously.
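As a rough illustration of the point about word boundaries, the pattern could be changed as sketched below so that only whole words are replaced. This is a minimal sketch for AutoHotkey v1, not something posted in the thread: the callout name AppendWholeWord and the sample haystack are made up for the example, and `\b` treats only letters, digits and underscore as word characters.

```autohotkey
; Sketch: same callout technique, but keys must be whole words.
Haystack := "foo fooxyz boo"
NewStr := ""
; (?:...) groups the alternatives without adding a capture group,
; so Match2 is still the key that was found (or empty at the end).
RegExReplace(Haystack, "(.*?)(\b(?:foo|boo|woo)\b|$)(?CAppendWholeWord)")
MsgBox, % NewStr    ; "fooxyz" should be left untouched here
return

; Hypothetical callout name, called once per match.
AppendWholeWord(Match)
{
    static repl := {"foo":"cat","boo":"dog","woo":"cow"}
    global NewStr
    ; Match1 = text before the key, Match2 = the key itself
    NewStr .= Match1 repl[Match2]
}
```

Only the pattern changes; the callout body is the same as in the earlier script.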
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 07:03
by Hellbent
braunbaer wrote: ↑24 Jun 2021, 06:42
Well, first he asked for a solution with a single regexreplace. Making multiple replacements in a loop was explicitly unwanted.
Didn't see that in the OP. My bad
braunbaer wrote: ↑24 Jun 2021, 06:42
Second, this is much more flexible. An element to replace could consist of several words, this would not work using strsplit.
You got me there
braunbaer wrote: ↑24 Jun 2021, 06:42
Third, your version loses everything that does not match one of the replace keys. At least, the ternary should have a subexpression for the case when a part of the string is not found in the replacement table.
See my last post.
As Meatloaf used to sing. Two Out Of Three Ain't Bad.
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 07:09
by braunbaer
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 07:21
by Hellbent
Hellbent wrote: ↑24 Jun 2021, 07:03
braunbaer wrote: ↑24 Jun 2021, 06:42
Second, this is much more flexible. An element to replace could consist of several words, this would not work using strsplit.
You got me there
I spoke too soon.
Code: Select all
Table := {"foo":"cat","boo":"dog","woo":"cow"} , Haystack := "fooxyz boohoohoo abc woohoo moochewchew xyz"
for k, v in Table
Haystack := StrReplace(Haystack,k,v)
MsgBox, % Haystack
Three out of three
or
Code: Select all
Haystack := "fooxyz boohoohoo abc woohoo moochewchew xyz"
for k, v in {"foo":"cat","boo":"dog","woo":"cow"}
Haystack := StrReplace(Haystack,k,v)
MsgBox, % Haystack
*Edit*
And bonus points to boot!!
That was unexpected
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 12:14
by braunbaer
Hi!
First:
braunbaer wrote: ↑24 Jun 2021, 07:01
Of course, you don't need regex at all for doing whatever you want, but it is much more flexible and facilitates string manipulation and pattern recognition tremendously.
I am quite sure you can find 54 other ways to replace the strings without regex, but what's your point?
Hellbent wrote: ↑24 Jun 2021, 07:21
To replace simple substrings, use StrReplace() or StringReplace because it is faster than RegExReplace()
Certainly regexreplace is slower when replacing a single string. But that is not the case here. One single regexreplace call replaces a lot of different strings. You need a loop and a lot of strreplace (or the like) calls to achieve the same.
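One practical difference between the two approaches may be worth a small sketch: a loop of StrReplace calls works on the result of the previous call, while the single RegExReplace pass looks at each part of the original string only once. The swap table below is a made-up example (not from the thread) in which every key is also another key's replacement; SwapCallout is a hypothetical name.

```autohotkey
; Hypothetical swap table: each key is also another key's replacement.
Table := {"cat":"dog","dog":"cat"}

; Approach 1: one StrReplace call per key, each applied to the previous result.
Looped := "cat dog"
for k, v in Table
    Looped := StrReplace(Looped, k, v)

; Approach 2: a single RegExReplace pass with a callout.
SinglePass := ""
RegExReplace("cat dog", "(.*?)(cat|dog|$)(?CSwapCallout)")

MsgBox, % "Looped: " Looped "`nSingle pass: " SinglePass
return

SwapCallout(Match)
{
    global SinglePass, Table
    ; Match1 = text before the key, Match2 = the key itself
    SinglePass .= Match1 Table[Match2]
}
```

With the loop, the second StrReplace also rewrites what the first one just produced, so both words end up the same; the single pass should give "dog cat".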
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 15:37
by mslonik
Hello again,
@braunbaer
Thank you very much for your support!
I don't think it is possible with a single regexreplace.
(...)
Regexreplace can't do that, because the replace parameter is only evaluated once and not everytime a new match is found.
I thought so, but as a newbie I didn't mind asking. The solution you provided, namely the callout function, is elegant, and I even understand how it works.
Well, first he asked for a solution with a single regexreplace. (...)
Exactly. I prepared a trivial example for the sake of my question. In my actual application, where the issue arose, my RegExReplace is far more complicated (flexible), and I wondered whether it would be possible to do even more in one shot. That is why I decided to ask for support.
@Hellbent and
@Rohwedder
Thank you too for paying some attention to my challenge!
I consider this issue as closed.
Kind regards, mslonik
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 16:39
by Hellbent
braunbaer wrote: ↑24 Jun 2021, 12:14
Hi!
First:
I am quite sure you can find 54 other ways to replace the strings without regex, but what's your point?
Hi.
I don't know anything about RegEX,
but aren't you basically just doing this in this case?
Looked to me that you were just doing a simple look up from a table.
braunbaer wrote: ↑24 Jun 2021, 12:14
Certainly regexreplace is slower when replacing a single string. But that is not the case here. One single regexreplace call replaces a lot of different strings. You need a loop and a lot of strreplace (or the like) calls to achieve the same.
Do you have an estimate of how much faster your method is than StrReplace() in this case?
As in this actual case, not just regex in general.
I ask because what we see in an AHK script is normally just the tip of the iceberg of the actual processes that are going on behind the scenes, and since you know a lot about this subject you seem as good a person as any to ask.
HB.
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 24 Jun 2021, 17:10
by Hellbent
Another thing of confusion for me.
braunbaer wrote: ↑24 Jun 2021, 05:19
The
actual replacement occurs in the callout function.
Regexreplace can't do that, because the replace parameter is only evaluated once and not everytime a new match is found.
And
braunbaer wrote: ↑24 Jun 2021, 05:19
You need a loop and a lot of strreplace (or the like) calls to achieve the same.
This looks a lot like a loop.
Code: Select all
global Index := 0
Haystack := "foo boo abc woo moo xyz", newstr:=""
RegExReplace(Haystack, "(.*?)(foo|boo|woo|$)(?CCallNewStr)")
MsgBox,% Index
return
CallNewStr(Match)
{
static repl:={"foo":"cat","boo":"dog","woo":"cow"}
Global NewStr
NewStr .= Match1 repl[Match2]
++Index
}
What am I missing here?
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 25 Jun 2021, 04:32
by braunbaer
Hellbent wrote: ↑24 Jun 2021, 16:39
Do you have a estimate on how much faster your method is to the StrReplace() in this case?
No. The algorithms are completely different, and it depends strongly on the data (how many alternatives, how many matches, etc.). Regex may even be slower than a GOOD handcrafted parsing algorithm. The reason for using regex is not that it saves computing time; the reason is that it saves a tremendous amount of programming time (and even if it's a little slower, in most cases that will be irrelevant). You don't have to program an algorithm for parsing your text, you just describe the patterns and call regexmatch/regexreplace (of course, if the pattern consists of a single string, using regex does not make sense; you should use instr/strreplace in that case).
This thread with a very simple task is a good example: You started off with strsplit, then, as you saw this as a dead end, you switched to strreplace and completely rewrote the code. But what if you want to replace whole words only, but allow all kinds of delimiters? Again, you will have to completely rewrite your algorithm (and it's going to become complicated). Using regex, you always leave the program structure as it is and just adapt the regex that describes the pattern.
Hellbent wrote: ↑24 Jun 2021, 16:39
I don't know anything about RegEX, but aren't you basically just doing this in this case?
Doing what, precisely?
This looks a lot like a loop.
What am I missing here?
Where do you see a loop? In this script, there is one single regexreplace call, no loop. Of course, internally, in a program virtually everything happens in loops - regexreplace loops through the data, just as strreplace does, and as most other AHK commands and functions do.
Note: In an interpreted language like AHK, the speed gain of regex is probably bigger (or the speed loss is smaller) than in a compiled language, as there is some overhead for every script line executed.
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 17 Nov 2022, 09:20
by george-laurentiu
braunbaer wrote: ↑24 Jun 2021, 05:19
A callout function is a good idea. This works
Code: Select all
NumpadSub::
Haystack := "foo boo abc woo moo xyz", newstr:=""
RegExReplace(Haystack, "(.*?)(foo|boo|woo|$)(?CCallNewStr)")
MsgBox,% NewStr
CallNewStr(Match)
{
static repl:={"foo":"cat","boo":"dog","woo":"cow"}
Global NewStr
NewStr .= Match1 repl[Match2]
}
return
The actual replacement occurs in the callout function. Regexreplace can't do that, because the replace parameter is only evaluated once and not everytime a new match is found.
And, to make the function more flexible when other replacements are added, you can build the regex from the replacement table:
Code: Select all
NumpadSub::
Global NewStr
Global repl:={foo:"cat",boo:"dog",woo:"cow"}
rgx:=""
for key in repl
rgx.=key "|"
rgx:= "(.*?)(" rgx "$)(?CCallNewStr)"
Haystack := "foo boo abc woo moo xyz", newstr:=""
RegExReplace(Haystack, rgx)
MsgBox,% NewStr
CallNewStr(Match) {
NewStr .= Match1 repl[Match2]
}
return
Can you please make a version of this script that also works with non-literal strings? I've tried, but it doesn't work.
When I search, for example, for "/s,/s" it replaces with nothing (" , " just disappears)
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 22 Nov 2022, 07:53
by braunbaer
george-laurentiu wrote: ↑17 Nov 2022, 09:20
When I search, for example, for "/s,/s" it replaces with nothing (" , " just disappears)
I don't understand exactly what you are looking for. Could you give a complete example: What does the input string look like, what should the replacement rules be, and what is the expected output?
you could just post the script that does not work for you, along with the desired output.
Re: RegExReplace, substitution challenge, multiple subpaterns
Posted: 22 Nov 2022, 13:31
by george-laurentiu
you could just post the script that does not work for you, along with the desired output.
Code: Select all
^#8::
Global NewStr
Global repl:={foo:"cat",boo:"dog",woo:"cow", "\d,\d":"aaa"}
rgx:=""
for key in repl
rgx.=key "|"
rgx:= "(.*?)(" rgx "$)(?CCallNewStr)"
Clipboard := ""
Sendinput ^c
Clipwait
newstr:=""
RegExReplace(Clipboard, rgx)
CallNewStr(Match) {
NewStr .= Match1 repl[Match2]
}
return
Selected string which is copied into clipboard: foo hei 3,3 boo salut woo
Expected result: cat hei aaa dog salut cow
Actual result: cat hei dog salut cow
When I use regex syntax, the script doesn't replace anything, that's the problem.
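A possible explanation, offered as a reading of the script rather than a confirmed answer: the table is keyed by the pattern text ("\d,\d"), but the callout looks up the text that was actually matched ("3,3"), so `repl[Match2]` finds nothing and the match is dropped. A minimal sketch of one workaround for AutoHotkey v1 is to test the matched text against each key as a regex inside the callout. The fixed test string replaces the clipboard handling, and the anchored retest is an assumption that may not suit keys relying on lookarounds or surrounding context.

```autohotkey
; Sketch: table keys are treated as regex patterns, not literal strings.
repl := {"foo":"cat","boo":"dog","woo":"cow","\d,\d":"aaa"}
NewStr := ""

; Build the alternation from the table keys, as in the original script.
rgx := ""
for pattern in repl
    rgx .= pattern "|"
rgx := "(.*?)(" rgx "$)(?CCallNewStr)"

Haystack := "foo hei 3,3 boo salut woo"
RegExReplace(Haystack, rgx)
MsgBox, % NewStr
return

CallNewStr(Match)
{
    global NewStr, repl
    replacement := ""
    ; Match2 holds the matched text (e.g. "3,3"), not the pattern,
    ; so find the key whose pattern matches that text in full.
    for pattern, value in repl
    {
        if RegExMatch(Match2, "^(?:" pattern ")$")
        {
            replacement := value
            break
        }
    }
    NewStr .= Match1 replacement
}
```

With the sample string from the post, this should produce the expected "cat hei aaa dog salut cow". The lookup is now a loop over the table for every match, which is slower than a direct key lookup.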
[Mod edit: Fixed quote tags.]