Use regexreplace maintaining original capitalization

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys

Post by Nixcalo » 19 Sep 2021, 00:39

Hi everyone!

I have a large number of strings on which I need to perform some replace operations. I have created an array called arr from a variable called Replacements, which holds the substitutions I need, in the following way:

Code:

Replacements =  ; This is the variable
(
retire:quite
entrada:admisión
arnés:mazo de cables
muelle:resorte
)

arr := {} ; Create an array named arr from the contents of Replacements
for x, y in StrSplit(Replacements, "`n", "`r")
	arr[StrSplit(y, ":").1] := StrSplit(y, ":").2

and then

Code:

TargetText := SourceText ; start from the source text, then apply every replacement in turn
For Word1, Word2 in arr
	TargetText := StrReplace(TargetText, Word1, Word2)

So, to be clear: when the code finds "retire" in my text it changes it to "quite", when it finds "entrada" it changes it to "admisión", etc.

So far so good.

The thing is, I need two extra things for this to be useful. First, I need to maintain capitalization because I might have the entry "retire" but it could also be "Retire", with a capital R. Or it could be "RETIRE", all in uppercase (there are only those two types of capitalization). And of course, it's not optimal having to write all capitalization possibilities!

I have this covered thanks to the function JEE_StrReplaceMaintainCase, which I found in this same forum. It works beautifully, as can be seen in the following thread: replace text maintain case

But alas, I have another problem. I need to use Regular Expressions and the function JEE_StrReplaceMaintainCase only works with StrReplace. I have not been able to adapt it to use RegexReplace.

The thing is, I would also need entries like

Code:

Replacements =  
(
hacia a(dentro|fuera):hacia $1
njunto de (\w+)s\b:njunto de $1
nto(s?) de arneses:nto$1 de mazo de cables
)
I cannot maintain a million strings to replace, so I need to write the entries smartly to lower their number and cover more cases, and that is what regexes are for.

So how the heck do you think you can replace a NeedleRegEx within a Haystack so capitalization is preserved? So I can apply it to my large array of a hundred substitutions (or more)?

Ideally, if I have

Code:

Replacements =
(
cas(on)?a(s?):choza$2
)

a single entry in my array would replace casas, casa, casona, casonas, Casonas, CASONAS, CASONA, etc. with chozas, choza, choza, chozas, Chozas, CHOZAS, CHOZA, etc. I have got this working without regular expressions, thanks to the function JEE_StrReplaceMaintainCase above, but when I try to change StrReplace to RegExReplace, everything goes to hell.

Any ideas? There are only two possibilities: either the replaced string goes all in uppercase, or only its first letter is capitalized. Nothing crazy like CamelCase.

Post by mikeyww » 19 Sep 2021, 05:57

Code:

replacements =
(
retire:quite
entrada:admisión
arnés:mazo de cables
muelle:resorte
)
MsgBox, 64, Result, % replace("This muelle is neither a MUELLE nor an Entrada.", replacements)

replace(str, replacements) {
 StringCaseSense, On
 For each, line in StrSplit(replacements, "`n", "`r")
  For each, case in ["L", "U", "T"]
   str := StrReplace(str
        , Format("{:" case "}", (word := StrSplit(line, ":")).1)
        , Format("{:" case "}", word.2))
 Return str
}
Or:

Code:

replacements =
(
retire:quite
entrada:admisión
arnés:mazo de cables
muelle:resorte
)
MsgBox, 64, Result, % replace("This muelle is neither a MUELLE nor an Entrada.", replacements)

replace(str, replacements) {
 StringCaseSense, On
 Static arr := {}
 If !arr.Count()
  For each, line in StrSplit(replacements, "`n", "`r")
   arr[(word := StrSplit(line, ":")).1] := word.2
 For word1, word2 in arr
  For each, case in ["L", "U", "T"]
   str := StrReplace(str, Format("{:" case "}", word1), Format("{:" case "}", word2))
 Return str
}

Post by Nixcalo » 19 Sep 2021, 10:57

Hi, thank you for your reply!

First of all, I think your solution works perfectly and in a simpler way than the one I had.

If you swap StrReplace for RegExReplace, which I need because I work with regular expressions, it works great as well, as far as I have tested.

However, I believe there is a problem when either the needle or its replacement consists of two or more words. I believe the spaces are causing problems, so I am trying to find out why and solve it.

For example, if I have this:

Code:

  replacements =
(
retire:quite
arnés:mazo de cables
de la herramienta de trabajo:del implemento
a la herramienta de trabajo:al implemento
la herramienta de trabajo:el implemento
herramientas de trabajo:implementos
; cas(on)?a:choza
casa:choza
casona:choza
)
MsgBox, 64, Result, % replace("Retire retire RETIRE arnés Arnés ARNÉS
, las casas CASAS La herramienta de trabajo la herramienta de trabajo LA HERRAMIENTA DE TRABAJO
, casa, casona, casonas, Casonas, CASONAS, CASONA la herramienta de trabajo
, A la herramienta de trabajo A LA HERRAMIENTA DE TRABAJO", replacements)

replace(str, replacements) {
 StringCaseSense, On
 For each, line in StrSplit(replacements, "`n", "`r")
  For each, case in ["L", "U", "T"]
   str := Strreplace(str
        , Format("{:" case "}", (word := StrSplit(line, ":")).1)
        , Format("{:" case "}", word.2))
 Return str
}
The result is this.
[image.png: screenshot of the resulting message box]
There are a few issues here. First, "Arnés" is changed to "Mazo De Cables" (Title Case) when I am aiming for "Mazo de cables".
"La herramienta de trabajo" is unchanged, and I don't know why. And "A la herramienta de trabajo" is changed to "A el implemento" when it should be "Al implemento"...

I think we are on the right track, but I am going to do some more research. I believe there is some issue with the spaces in the needles...
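
A likely explanation for the symptoms above is not the spaces themselves but the "T" (title case) format, which capitalizes every word of a multi-word needle or replacement. The title-cased needle "La Herramienta De Trabajo" then never matches "La herramienta de trabajo" under case-sensitive matching. The following is a minimal sketch for AutoHotkey v1.1 that contrasts the two behaviours; SentenceCase is a hypothetical helper name introduced only for illustration, not a built-in or library function.

```autohotkey
; AutoHotkey v1.1 sketch. SentenceCase is a hypothetical helper name.
needle := "la herramienta de trabajo"
titleCased    := Format("{:T}", needle)   ; La Herramienta De Trabajo
sentenceCased := SentenceCase(needle)     ; La herramienta de trabajo
MsgBox % titleCased "`n" sentenceCased

SentenceCase(text) {
    ; Lowercase everything, then uppercase only the very first character.
    return RegExReplace(Format("{:L}", text), "^(.)", "$U1")
}
```

Using a first-letter-only conversion like this in place of the "T" format would keep "de" and "cables" in lowercase, which is the "Mazo de cables" form asked for in the post.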

Post by Nixcalo » 19 Sep 2021, 14:00

I have found a neat solution, using a sentence-case conversion instead of the Format function. Here it is, in case anyone can make good use of it.

In short, replace is a function for performing lots of substitutions on a string from a list of pairs of the form {"Needle1":"Replacement1", "Needle2":"Replacement2", etc...}
It supports regular expressions and maintains the capitalization of the original string.

Invaluable help came from mikeyww and from the user tidbit, who created the st_setCase function in the library String Things - Common String & Array Functions.

Code:

replacements =
(
retire:quite
arnés:mazo de cables
cas(on)?a:choza
de la herramienta de trabajo:del implemento
\ba la herramienta de trabajo:al implemento
la herramienta de trabajo:el implemento
herramientas de trabajo:implementos
casa:choza
casona:choza
)
MsgBox, 64, Result, % replace("Retire retire RETIRE arnés Arnés ARNÉS
, las casas CASAS La herramienta de trabajo la herramienta de trabajo LA HERRAMIENTA DE TRABAJO
, casa, casona, casonas, Casonas, CASONAS, CASONA la herramienta de trabajo
, A la herramienta de trabajo A LA HERRAMIENTA DE TRABAJO", replacements)

replace(str, replacements) {
	StringCaseSense, On
	MsgBox, % str  ; debug: show the input text
	For each, line in StrSplit(replacements, "`n", "`r")
		For each, case in ["l", "u", "s"]  ; case names understood by st_setCase
		{
			word := StrSplit(line, ":")
			MsgBox, % "Line: " line "`nWord1: " word.1 "`nWord2: " word.2  ; debug
			MsgBox, % st_setCase(word.1, case)  ; debug
			MsgBox, % st_setCase(word.2, case)  ; debug
			; Earlier attempt using Format():
			; str := RegexReplace(str, Format("{:" case "}", (word := StrSplit(line, ":")).1), Format("{:" case "}", word.2))
			str := RegexReplace(str, st_setCase(word.1, case), st_setCase(word.2, case))
			MsgBox, % str  ; debug: show the text after this replacement
		}
	Return str
}

st_setCase(string, case="s") ; Changes string to all uppercase, all lowercase, sentence mode, title mode, inverted mode...
/*
SetCase
   Set the case (Such as UPPERCASE or lowercase) for the specified text.

   string = The text you want to modify.
   case   = The case you would like the specified text to be.

   The following types of Case are allowed:
   .-===============================================-.
   |    Use any cell as a name. CaSE-InSEnsitIVe.    |
   |----|-----|---------|------------|---------------|
   | 1  |  U  |   UP    |   UPPER    |   UPPERCASE   |
   |----|-----|---------|------------|---------------|
   | 2  |  l  |   low   |   lower    |   lowercase   |
   |----|-----|---------|------------|---------------|
   | 3  |  T  |  Title  |  TitleCase |               |
   |----|-----|---------|------------|---------------|
   | 4  |  S  |   Sen   |  Sentence  |  Sentencecase |
   |----|-----|---------|------------|---------------|
   | 5  |  i  |   iNV   |   iNVERT   |   iNVERTCASE  |
   |----|-----|---------|------------|---------------|
   | 6  |  r  |  rANd   |   rAnDOm   |   RAndoMcASE  |
   '-===============================================-'

example: st_setCase("ABCDEFGH", "l")
output: abcdefgh
*/
{
   if (case=1 || case="u" || case="up" || case="upper" || case="uppercase")
      StringUpper, new, string
   else if (case=2 || case="l" || case="low" || case="lower" || case="lowercase")
      StringLower, new, string
   else if (case=3 || case="t" || case="title" || case="titlecase")
   {
      StringLower, string, string, T
      string:=RegExReplace(string, "i)(with|amid|atop|from|into|onto|over|past|plus|than|till|upon|are|via|and|but|for|nor|off|out|per|the|\b[a-z]{1,2}\b)", "$L1")
      new:=RegExReplace(string, "^(\w)|(\bi\b)|(\w)(\w+)$", "$U1$U2$U3$4")
   }
   else if (case=4 || case="s" || case="sen" || case="sentence" || case="sentencecase")
   {
      StringLower string, string
      new:=RegExReplace(string, "([.?\s!(]\s\w)|^(\b\w)|(\.\s*[(]\w)|(\bi\b)", "$U0")
   }
   else if (case=5 || case="i" || case="inv" || case="invert" || case="invertcase")
   {
      Loop, parse, string
      {
         if A_LoopField is upper
            new.= Chr(Asc(A_LoopField) + 32)
         else if A_LoopField is lower
            new.= Chr(Asc(A_LoopField) - 32)
         else
            new.= A_LoopField
      }
   }
   else if (case=6 || case="r" || case="rand" || case="random" || case="randomcase")
   {
      loop, parse, string
      {
         random, rcase, 0, 1
         if (rcase==0)
            StringUpper, out, A_LoopField
         Else
            StringLower, out, A_LoopField
         new.=out
      }
      return new
   }
   Else
      return -1
   return new
}

Post by Nixcalo » 19 Sep 2021, 16:17

There is an issue with the replace function that I am trying to solve.

If the needle is, for example, man(kind)\b, the function will search for the strings man(kind)\b, Man(kind)\b and MAN(KIND)\B. Evidently, since \b is NOT the same as \B, I must find a way to keep the regex operators from being modified or having their capitalization changed.

This is an issue because regex has many operators (\b, \s, \p{Ll}, \w, among others) that are case-sensitive. I must find a way to skip those.

I am on it.

Post by boiler » 19 Sep 2021, 17:52

In fact, the uppercase version of a RegEx token typically indicates the exact opposite of the lowercase version (for example, \B versus \b).
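
One way to sidestep the \b versus \B problem is to never change the case of the pattern at all: search case-insensitively with the "i)" option, inspect the case of each matched text, and apply that case to the replacement only. The thread does not arrive at this approach, so what follows is just a sketch for AutoHotkey v1.1 under stated assumptions. ReplaceKeepCase and HasOnlyUpper are hypothetical names; the needle is assumed to carry no options prefix of its own, and the replacement is assumed to use only plain $1..$N back-references.

```autohotkey
; AutoHotkey v1.1 sketch (Unicode build assumed).
; ReplaceKeepCase and HasOnlyUpper are hypothetical helper names.
text := "Casonas, CASA, casas"
MsgBox % ReplaceKeepCase(text, "cas(on)?a(s?)", "choza$2")
; Traced by hand, not run: Chozas, CHOZA, chozas

ReplaceKeepCase(haystack, needle, replacement) {
    out := "", pos := 1
    ; "i" = case-insensitive search, "O" = store the match in an object.
    while (pos <= StrLen(haystack) && RegExMatch(haystack, "iO)" needle, m, pos)) {
        if (m.Len(0) = 0) {  ; empty match: copy one character and move on
            out .= SubStr(haystack, pos, m.Pos(0) - pos + 1)
            pos := m.Pos(0) + 1
            continue
        }
        rep := replacement
        Loop % m.Count()  ; expand back-references, highest number first
        {
            n := m.Count() - A_Index + 1
            rep := StrReplace(rep, "$" n, m.Value(n))
        }
        matched := m.Value(0)
        if (HasOnlyUpper(matched))                     ; RETIRE -> QUITE
            rep := Format("{:U}", rep)
        else if (HasOnlyUpper(SubStr(matched, 1, 1)))  ; Retire -> Quite
            rep := Format("{:U}", SubStr(rep, 1, 1)) . SubStr(rep, 2)
        ; Copy the text before the match, then the case-adjusted replacement.
        out .= SubStr(haystack, pos, m.Pos(0) - pos) . rep
        pos := m.Pos(0) + m.Len(0)
    }
    return out . SubStr(haystack, pos)
}

HasOnlyUpper(s) {
    ; == is always case-sensitive: true if s has letters and none are lowercase.
    return (s == Format("{:U}", s)) && !(s == Format("{:L}", s))
}
```

Because the pattern itself is passed through untouched, a needle such as \ba la herramienta de trabajo keeps its \b, and only the matched text decides whether the replacement comes out in lowercase, with a capital first letter, or all in uppercase.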