RegExReplace, substitution challenge, multiple subpaterns (solved)

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
mslonik
Posts: 144
Joined: 21 Feb 2019, 04:38
Location: Poland

RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 03:47

Dear Forum,

I can't figure out how to formulate the regex syntax.

Test string:

Code:

foo boo woo
Expected output after regex:

Code:

cat dog cow
In other words, I'd like to make a few substitutions at once, individually for each subpattern:

Code:

foo → cat
boo → dog
woo → cow
So far I've only figured out the first part, so I've put "???" in place of the substitution.

Code:

Haystack := "foo boo woo"
NewStr := RegExReplace(Haystack, "(foo)|(boo)|(woo)", "???")
Please help me to figure it out.

Kind regards, mslonik (🐘)

My scripts on this forum: Hotstrings Diacritic O T A G L E
Please become my patreon: Patreon👍
Written in AutoHotkey text replacement tool: Hotstrings.technology
Courses on AutoHotkey :ugeek:
braunbaer
Posts: 478
Joined: 22 Feb 2016, 10:49

Re: RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 04:02

I don't think it is possible with a single regexreplace.
Rohwedder
Posts: 7616
Joined: 04 Jun 2014, 08:33
Location: Germany

Re: RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 04:38

Hello,
this does not work properly yet!

Code:

Haystack := "foo boo woo"
RegExReplace(Haystack, "(\woo)(?CCallNewStr)", NewStr)
MsgBox,% NewStr


CallNewStr(Match)
{
	Global NewStr .= {"foo":"cat","boo":"dog","woo":"cow"}[Match]
}
braunbaer

Re: RegExReplace, substitution challenge, multiple subpaterns (accepted solution)

24 Jun 2021, 05:19

A callout function is a good idea. This works

Code:

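NumpadSub::
Haystack := "foo boo abc woo moo xyz", NewStr := ""
; (.*?) captures the text before each key (Match1), the alternation captures the key (Match2).
; "$" lets the last match pick up whatever text follows the last key.
; The callout (?CCallNewStr) is reached each time the pattern has matched.
RegExReplace(Haystack, "(.*?)(foo|boo|woo|$)(?CCallNewStr)")
MsgBox, % NewStr
return

CallNewStr(Match)
{
    static repl := {"foo":"cat","boo":"dog","woo":"cow"}
    Global NewStr
    ; Match1/Match2 are filled in by the callout, like RegExMatch's pseudo-array.
    NewStr .= Match1 repl[Match2]
}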
The actual replacement occurs in the callout function. RegExReplace can't do that, because the replacement parameter is evaluated only once, not every time a new match is found.
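For comparison only (this is Python, not AutoHotkey): Python's `re.sub()` accepts a function as its replacement argument and calls it once per match, which is the per-match evaluation the callout emulates here. A minimal sketch using the same table; the names `REPL`, `replace_match` and `substitute` are illustrative only.

```python
import re

# Same replacement table as in the AHK example above.
REPL = {"foo": "cat", "boo": "dog", "woo": "cow"}

def replace_match(m):
    # Called by re.sub() once per match; the matched text selects the substitute.
    return REPL[m.group(0)]

def substitute(haystack):
    # A single pass over the string; the replacement is computed for every match.
    return re.sub(r"foo|boo|woo", replace_match, haystack)

print(substitute("foo boo woo"))  # cat dog cow
```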

And, to make the function more flexible when other replacements are added, you can build the regex from the replacement table:

Code:

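NumpadSub::
Global NewStr                                   ; super-global: also visible inside CallNewStr
Global repl := {foo:"cat",boo:"dog",woo:"cow"}
; Build "(.*?)(key1|key2|...|$)(?CCallNewStr)" from the table keys.
; Note: the keys are inserted as regex syntax, not as escaped literals.
rgx := ""
for key in repl
    rgx .= key "|"
rgx := "(.*?)(" rgx "$)(?CCallNewStr)"

Haystack := "foo boo abc woo moo xyz", NewStr := ""
RegExReplace(Haystack, rgx)
MsgBox, % NewStr
return

CallNewStr(Match) {
    NewStr .= Match1 repl[Match2]
}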
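One caveat with building the pattern from the table: the keys are pasted into the regex as pattern syntax, so a key containing characters such as `.` or `+` would not be matched literally, and when one key is a prefix of another (e.g. `foo` and `foobar`) the order of the alternatives decides which one wins. A Python sketch of the same idea that escapes the keys with `re.escape()` and tries longer keys first; the function names are hypothetical, and this is an illustration rather than a translation of the AHK code above.

```python
import re

def build_pattern(table):
    # Longer keys first, so "foobar" is tried before its prefix "foo".
    keys = sorted(table, key=len, reverse=True)
    # re.escape() makes characters such as "." or "+" match literally.
    return re.compile("|".join(re.escape(k) for k in keys))

def replace_all(haystack, table):
    if not table:
        return haystack  # an empty alternation would match everywhere
    pattern = build_pattern(table)
    # The matched text is always one of the (literal) keys, so the lookup cannot fail.
    return pattern.sub(lambda m: table[m.group(0)], haystack)

print(replace_all("foo boo abc woo moo xyz", {"foo": "cat", "boo": "dog", "woo": "cow"}))
# cat dog abc cow moo xyz
```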
Hellbent
Posts: 2109
Joined: 23 Sep 2017, 13:34

Re: RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 06:24

I don't know anything about RegEX, but aren't you basically just doing this in this case?

Code:

Table := {"foo":"cat","boo":"dog","woo":"cow"}
Haystack := "foo boo abc woo moo xyz"

loop, % ( arr := StrSplit(HayStack," ") ).Length()	
	( Table.HasKey(arr[A_Index]))?( NewStr .= Table[arr[A_Index]] " ")

MsgBox, % NewStr
braunbaer

Re: RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 06:42

Well, first, he asked for a solution with a single RegExReplace. Making multiple replacements in a loop was explicitly unwanted. There are lots of different ways to make such replacements using a loop.

Second, this is much more flexible. An element to replace could consist of several words, this would not work using strsplit.

Third, your version loses everything that does not match one of the replace keys. At least, the ternary should have a subexpression for the case when a part of the string is not found in the replacement table.
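To illustrate the second and third points, here is a small Python comparison, assuming a hypothetical table with a two-word key. A word-by-word lookup (with a fallback that keeps unmatched words) can never see the key `"foo boo"`, while a single regex pass over the whole string can. The table and function names are made up for the example.

```python
import re

# Hypothetical table: "foo boo" is a two-word key.
TABLE = {"foo boo": "pair", "woo": "cow"}

def split_lookup(haystack, table):
    # Word-by-word lookup; unmatched words are kept, but "foo boo" is never one word.
    return " ".join(table.get(word, word) for word in haystack.split(" "))

def regex_lookup(haystack, table):
    # One pass over the whole string, so a key may span several words.
    alternation = "|".join(re.escape(k) for k in sorted(table, key=len, reverse=True))
    return re.sub(alternation, lambda m: table[m.group(0)], haystack)

text = "foo boo abc woo"
print(split_lookup(text, TABLE))  # foo boo abc cow
print(regex_lookup(text, TABLE))  # pair abc cow
```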
Hellbent

Re: RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 06:53

braunbaer wrote:
24 Jun 2021, 05:19

And, to make the function more flexible when other replacements are added, you can build the regex from the replacement table:

Code:

Table := {"foo":"cat","boo":"dog","woo":"cow"},Haystack := "foo boo abc woo moo xyz"
loop, % ( arr := StrSplit(HayStack," ") ).Length()	
	(Table.HasKey(arr[A_Index]))?( NewStr .= Table[arr[A_Index]] " "):(NewStr .= arr[A_Index] " ")
MsgBox, % NewStr 	
braunbaer

Re: RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 07:01

In the version I posted, word boundaries are not relevant. fooxyz will be translated to catxyz. If that is unwanted, the regex would have to be changed, according to what exactly is wanted.
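If only whole words should be replaced, one common option is to surround the alternation with the word-boundary assertion `\b`, which both PCRE (the engine behind AutoHotkey's RegEx functions) and Python's `re` support. A Python sketch contrasting the two behaviours; carrying the change over to the AHK pattern above is assumed to work the same way but has not been tested here.

```python
import re

REPL = {"foo": "cat", "boo": "dog", "woo": "cow"}
ALTERNATION = "|".join(re.escape(k) for k in REPL)

def replace_substrings(text):
    # Keys match anywhere, so "fooxyz" becomes "catxyz".
    return re.sub(ALTERNATION, lambda m: REPL[m.group(0)], text)

def replace_whole_words(text):
    # \b on both sides: a key only matches when it is not part of a longer word,
    # whatever the delimiter (space, comma, ...) happens to be.
    return re.sub(r"\b(?:" + ALTERNATION + r")\b", lambda m: REPL[m.group(0)], text)

print(replace_substrings("fooxyz foo,boo"))   # catxyz cat,dog
print(replace_whole_words("fooxyz foo,boo"))  # fooxyz cat,dog
```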

Of course, you don't need regex at all for doing whatever you want, but it is much more flexible and facilitates string manipulation and pattern recognition tremendously.
Hellbent

Re: RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 07:03

braunbaer wrote:
24 Jun 2021, 06:42
Well, first he asked for a solution with a single regexreplace. Making multiple replacements in a loop was explicitly unwanted.
Didn't see that in the OP. My bad :lol:
braunbaer wrote:
24 Jun 2021, 06:42
Second, this is much more flexible. An element to replace could consist of several words, this would not work using strsplit.
You got me there :thumbup:
braunbaer wrote:
24 Jun 2021, 06:42
Third, your version loses everything that does not match one of the replace keys. At least, the ternary should have a subexpression for the case when a part of the string is not found in the replacement table.
See my last post.


As Meat Loaf used to sing: "Two Out of Three Ain't Bad."
Hellbent

Re: RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 07:21

Hellbent wrote:
24 Jun 2021, 07:03
braunbaer wrote:
24 Jun 2021, 06:42
Second, this is much more flexible. An element to replace could consist of several words, this would not work using strsplit.
You got me there :thumbup:
I spoke too soon.

Code:

Table := {"foo":"cat","boo":"dog","woo":"cow"} , Haystack := "fooxyz boohoohoo abc woohoo moochewchew xyz"
for k, v in Table
	Haystack := StrReplace(Haystack,k,v)
MsgBox, % Haystack
Three out of three :lol:

or

Code:

Haystack := "fooxyz boohoohoo abc woohoo moochewchew xyz"
for k, v in {"foo":"cat","boo":"dog","woo":"cow"}
	Haystack := StrReplace(Haystack,k,v)
MsgBox, % Haystack
*Edit*

And bonus points to boot!!
https://www.autohotkey.com/docs/commands/RegExReplace.htm wrote:
To replace simple substrings, use StrReplace() or StringReplace because it is faster than RegExReplace().
That was unexpected :D
braunbaer

Re: RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 12:14

Hi!

First:
braunbaer wrote:
24 Jun 2021, 07:01
Of course, you don't need regex at all for doing whatever you want, but it is much more flexible and facilitates string manipulation and pattern recognition tremendously.
I am quite sure you can find 54 other ways to replace the strings without regex, but what's your point?
Hellbent wrote:
24 Jun 2021, 07:21
To replace simple substrings, use StrReplace() or StringReplace because it is faster than RegExReplace()

Certainly regexreplace is slower when replacing a single string. But that is not the case here. One single regexreplace call replaces a lot of different strings. You need a loop and a lot of strreplace (or the like) calls to achieve the same.
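Speed aside, there is also a behavioural difference between one pass and a chain of replacements: in a chain, each call sees the output of the previous one, so a replacement value that happens to be another key gets replaced again, and the result depends on the key order. A Python sketch with a hypothetical table in which `"cat"` is both a value and a key:

```python
import re

# Hypothetical table in which the value "cat" is also a key.
TABLE = {"dog": "cat", "cat": "cow"}

def chained_replace(text, table):
    # One replace per key, like a StrReplace() loop: later keys see earlier output.
    for key, value in table.items():
        text = text.replace(key, value)
    return text

def single_pass(text, table):
    # Each position is examined once; substituted text is never re-scanned.
    alternation = "|".join(re.escape(k) for k in sorted(table, key=len, reverse=True))
    return re.sub(alternation, lambda m: table[m.group(0)], text)

print(chained_replace("dog cat", TABLE))  # cow cow
print(single_pass("dog cat", TABLE))      # cat cow
```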
mslonik

Re: RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 15:37

Hello again,

@braunbaer

Thank you very much for your support!
I don't think it is possible with a single regexreplace.
(...)
Regexreplace can't do that, because the replace parameter is only evaluated once and not everytime a new match is found.
I thought so, but as a newbie I didn't mind asking. The solution you provided, namely the callout function, is elegant, and I even understand how it works.
Well, first he asked for a solution with a single regexreplace. (...)
Exactly. I prepared a trivial example for the sake of my question. In the application where the issue actually arose, my RegExReplace is far more complicated (flexible), and I wondered whether it would be possible to do even more in one shot. That is why I decided to ask for support.

@Hellbent and @Rohwedder
Thank you too for paying some attention to my challenge!

I consider this issue closed.

Kind regards, mslonik (🐘)

Hellbent

Re: RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 16:39

braunbaer wrote:
24 Jun 2021, 12:14
Hi!
First:
I am quite sure you can find 54 other ways to replace the strings without regex, but what's your point?
Hi. :wave:
I don't know anything about RegEX, but aren't you basically just doing this in this case?

Looked to me that you were just doing a simple look up from a table.
braunbaer wrote:
24 Jun 2021, 12:14
Certainly regexreplace is slower when replacing a single string. But that is not the case here. One single regexreplace call replaces a lot of different strings. You need a loop and a lot of strreplace (or the like) calls to achieve the same.
Do you have an estimate of how much faster your method is than StrReplace() in this case?
I mean in this actual case, not regex in general.
I ask because what we see in an AHK script is normally just the tip of the iceberg of the processes going on behind the scenes, and since you know a lot about this subject, you seem as good a person as any to ask.

HB.
Hellbent

Re: RegExReplace, substitution challenge, multiple subpaterns

24 Jun 2021, 17:10

Another thing confuses me.
braunbaer wrote:
24 Jun 2021, 05:19
The actual replacement occurs in the callout function. Regexreplace can't do that, because the replace parameter is only evaluated once and not everytime a new match is found.
And
braunbaer wrote:
24 Jun 2021, 05:19
You need a loop and a lot of strreplace (or the like) calls to achieve the same.
This looks a lot like a loop.

Code:

global Index := 0
Haystack := "foo boo abc woo moo xyz", newstr:=""
RegExReplace(Haystack, "(.*?)(foo|boo|woo|$)(?CCallNewStr)")
MsgBox,% Index
return

CallNewStr(Match)
{
    static repl:={"foo":"cat","boo":"dog","woo":"cow"}
	Global NewStr
    NewStr .= Match1 repl[Match2]
	++Index
}

What am I missing here?
braunbaer

Re: RegExReplace, substitution challenge, multiple subpaterns

25 Jun 2021, 04:32

Hellbent wrote:
24 Jun 2021, 16:39
Do you have a estimate on how much faster your method is to the StrReplace() in this case?
No. The algorithms are completely different, and the result depends strongly on the data (how many alternatives, how many matches, etc.). Regex may even be slower than a GOOD handcrafted parsing algorithm. The reason for using regex is not that it saves computing time; it is that it saves a tremendous amount of programming time (and even if it is a little slower, in most cases that will be irrelevant). You don't have to program an algorithm for parsing your text: you just describe the patterns and call RegExMatch/RegExReplace. (Of course, if the pattern consists of a single literal string, using regex does not make sense; use InStr/StrReplace in that case.)

This thread, with a very simple task, is a good example. You started off with StrSplit, then, when you saw that as a dead end, you switched to StrReplace and completely rewrote the code. But what if you want to replace whole words only, while allowing all kinds of delimiters? Again, you will have to completely rewrite your algorithm (and it is going to get complicated). With regex, the program structure stays as it is, and you only adapt the regex that describes the pattern.
Hellbent wrote:
24 Jun 2021, 16:39
I don't know anything about RegEX, but aren't you basically just doing this in this case?

Doing what, precisely?
This looks a lot like a loop.
What am I missing here?
Where do you see a loop? In this script, there is one single regexreplace call, no loop. Of course, internally, in a program virtually everything happens in loops - regexreplace loops through the data, just as strreplace does, and as most other AHK commands and functions do.
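Hellbent's `Index` counter counts how often the engine invokes the callback during that single call, not iterations of a loop in the script. The same can be observed in Python 3.7 or later (where an empty match directly after a non-empty one is also replaced) by counting callback invocations for the pattern from this thread. This is a Python analogue; it does not show how many callouts AutoHotkey itself makes.

```python
import re

REPL = {"foo": "cat", "boo": "dog", "woo": "cow"}
calls = 0

def on_match(m):
    # Invoked by re.sub() for each match; group 1 is the text before the key.
    global calls
    calls += 1
    return m.group(1) + REPL.get(m.group(2), "")

# A single re.sub() call; the counting happens inside the callback.
result = re.sub(r"(.*?)(foo|boo|woo|$)", on_match, "foo boo abc woo moo xyz")
print(result)  # cat dog abc cow moo xyz
print(calls)   # 5: four non-empty matches plus one empty match at the very end
```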

Note: In an interpretive language like AHK, the speed gain of regex is probably bigger (or the speed loss is smaller) than in a compiled language, as there is some overhead for every script line executed.
george-laurentiu
Posts: 11
Joined: 17 Nov 2022, 09:13

Re: RegExReplace, substitution challenge, multiple subpaterns

17 Nov 2022, 09:20

braunbaer wrote:
24 Jun 2021, 05:19
[...]


Can you please make a version of this script that also works with non-literal strings (regex patterns)? I've tried, but it doesn't work.
When I search, for example, for "/s,/s", it replaces the match with nothing (" , " just disappears).
braunbaer

Re: RegExReplace, substitution challenge, multiple subpaterns

22 Nov 2022, 07:53

george-laurentiu wrote:
17 Nov 2022, 09:20
When I search, for example, for "/s,/s" it replaces with nothing (" , " just disappears)
I don't understand exactly what you are looking for. Could you give a complete example: what does the input string look like, what should the replacement rules be, and what is the expected output?

You could just post the script that does not work for you, along with the desired output.
george-laurentiu

Re: RegExReplace, substitution challenge, multiple subpaterns

22 Nov 2022, 13:31

you could just post the script that does not work for you, along with the desired output.

Code:

^#8::
Global NewStr
Global repl:={foo:"cat",boo:"dog",woo:"cow", "\d,\d":"aaa"}
rgx:=""
for key in repl
    rgx.=key "|"
rgx:= "(.*?)(" rgx "$)(?CCallNewStr)"
Clipboard := ""
Sendinput ^c
Clipwait
newstr:=""
RegExReplace(Clipboard, rgx)
CallNewStr(Match) {
    NewStr .= Match1 repl[Match2]
}
return
Selected string which is copied into clipboard: foo hei 3,3 boo salut woo

Expected result: cat hei aaa dog salut cow

Actual result: cat hei dog salut cow

When I use regex syntax, the script doesn't replace anything, that's the problem.
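The thread ends without a reply to this. A likely cause: the keys are inserted into the pattern as regex, but `repl[Match2]` looks up the text that was actually matched (`3,3`), which is not a key of `repl` (the key is `\d,\d`), so the lookup yields an empty string and the match disappears. One possible approach, sketched in Python under that assumption: wrap each pattern in its own named group and choose the replacement by which group matched (`m.lastgroup`) instead of by the matched text. The group names `g0`, `g1`, ... and the function `replace_by_pattern` are hypothetical; patterns that use numbered backreferences would need adjusting, because the added groups shift the numbering.

```python
import re

# Hypothetical rules: keys are regex patterns, values are replacements.
RULES = {r"foo": "cat", r"boo": "dog", r"woo": "cow", r"\d,\d": "aaa"}

def replace_by_pattern(text, rules):
    items = list(rules.items())
    if not items:
        return text  # an empty alternation would match everywhere
    # Each pattern gets its own named group g0, g1, ... inside one alternation.
    combined = "|".join(f"(?P<g{i}>{pattern})" for i, (pattern, _) in enumerate(items))

    def pick(m):
        # lastgroup names the outer group that matched, e.g. "g3" for r"\d,\d".
        index = int(m.lastgroup[1:])
        return items[index][1]

    return re.sub(combined, pick, text)

print(replace_by_pattern("foo hei 3,3 boo salut woo", RULES))
# cat hei aaa dog salut cow
```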


[Mod edit: Fixed quote tags.]
