REGEX word boundary

zuzu_kuc · 29 Jan 2020, 03:43

Hi,
Why doesn't the regex marked in RED work? I am trying to change the case of the word Grilovací to lowercase grilovací, but when I use the word boundary \b at both the beginning and the end of the word, nothing happens. When I remove the last \b, it works. What am I doing wrong? Thank you.
- The red square (behind the bullet) is the InDesign metacharacter for a hard space.
https://1drv.ms/u/s!Au4b1k0c3BzYkZs6LTVcsWRNE99XUw?e=fZbUIj [Mod edit: Link fixed.] - It is a OneDrive link, why was it denied?

info := "Grilovací plocha: 36 cm • Grilovací výška: 27 cm"
info := RegExReplace(info,"i)\bGrilovací","grilovací")
info := RegExReplace(info,"i)\bGrilovací\b","grilovací")

just me · 29 Jan 2020, 04:01

Hi,

\w Matches any single "word" character, namely alphanumeric or underscore. This is equivalent to [a-zA-Z0-9_]. Conversely, capital \W means "any non-word character".

If you are running the Unicode version of AHK, try (*UCP).
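A minimal sketch of that suggestion, applied to the haystack from the first post. It assumes AutoHotkey v1.1 Unicode and a script file saved as UTF-8 with BOM; the placement of (*UCP) directly after the i) options follows my reading of the AHK regex documentation, so treat it as something to try rather than a guaranteed fix.

```autohotkey
; Assumption: AutoHotkey v1.1 Unicode build, script saved as UTF-8 with BOM
; so that the accented literals are read correctly.
info := "Grilovací plocha: 36 cm • Grilovací výška: 27 cm"

; (*UCP) sits at the very start of the pattern, right after the "i)" options.
; It asks PCRE to use Unicode properties for \w and \b, so í should count
; as a word character and the trailing \b can match before the space.
info := RegExReplace(info, "i)(*UCP)\bGrilovací\b", "grilovací")

MsgBox % info
```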

zuzu_kuc · 29 Jan 2020, 04:30

I know, but I have many different words in different locations, and I need the word boundary to avoid mixing up words like THIS and IS.
But I don't understand why \bword\b doesn't work in my script. Where could the problem be?

just me · 29 Jan 2020, 05:12

By default, word characters are defined in the ASCII range [a-zA-Z0-9_]. The í character in your haystack is not part of this range. The same is true for e.g. 'German Umlauts' [äöüÄÖÜß].
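One way to check this on your own machine is to test a single character against \w with and without (*UCP). This is only a sketch, assuming AutoHotkey v1.1 Unicode and a UTF-8 with BOM script file; the variable names are made up for the illustration.

```autohotkey
; Assumption: AutoHotkey v1.1 Unicode build, script saved as UTF-8 with BOM.
ch := "í"

; RegExMatch returns the position of the match, or 0 if there is none.
asciiResult := RegExMatch(ch, "^\w$")        ; default: \w is [a-zA-Z0-9_]
ucpResult   := RegExMatch(ch, "(*UCP)^\w$")  ; \w decided by Unicode properties

MsgBox % "default \w: " asciiResult "`n(*UCP) \w: " ucpResult
```

A result of 0 on the first line means the character is not treated as a word character by default, which is what makes the trailing \b fail.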

zuzu_kuc · 29 Jan 2020, 05:41

Hmm, right, when I change í to i it works. But it is strange, the same code works in InDesign. I know InDesign has its own metacharacters, but something as basic as a word boundary? OK, I have to find another way to do that. Thank you.

zuzu_kuc · 29 Jan 2020, 05:56

EDIT: Hmm, as the regex documentation says: \b can be used on characters which are in \w, and řžýáíé are not in \w. And a dot (.) next to \b is not "any character" but a dot, which means the end of the sentence for \b.

No, I still don't get it:
The word Napájení, which contains á and í, works in these cases:

info := RegExReplace(info,"i)\bNapájení","napájení")
OR
info := RegExReplace(info,"i)Napájení","napájení")
OR
info := RegExReplace(info,"i)Nap.jen.","napájení")

but these don't work:
info := RegExReplace(info,"i)\bNapájení\b","napájení")
OR
info := RegExReplace(info,"i)Napájení\b","napájení")
or even
info := RegExReplace(info,"i)\bnap.jen.\b","napájení")

just me · 29 Jan 2020, 12:00

Regular Expressions -> Commonly Used Symbols and Syntax:

\b ... It requires the current character's status as a word character (\w) to be the opposite of the previous character's. ...

So what do we have:

1. RegExReplace(info,"i)\bNapájení","napájení")
The character following the \b is a word character (N). The 'previous' character isn't a word character. I.e. it's a word boundary and a match.
2. RegExReplace(info,"i)Napájení","napájení")
No boundary check. I.e. it's a match.
3. RegExReplace(info,"i)Nap.jen.","napájení")
No boundary check. The dots (.) stand for any single character, á and í in this case. I.e. same as 2.
4. RegExReplace(info,"i)\bNapájení\b","napájení")
The result of the first boundary check is true (see 1.). The character following the second boundary check isn't a word character. The previous character í isn't a word character either. I.e. it's not a word boundary and no match.
5. RegExReplace(info,"i)Napájení\b","napájení")
The result of the boundary check is false (see 4.). I.e. it's no match.
6. RegExReplace(info,"i)\bnap.jen.\b","napájení")
The dots (.) stand for any single character, á and í in this case. I.e. same as 5.
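The six cases above can be checked in one go with a small script like the following. It is only a sketch: the haystack line is a made-up sample (the thread does not show the real one), and it assumes AutoHotkey v1.1 Unicode with the script saved as UTF-8 with BOM.

```autohotkey
; Hypothetical sample line; the real haystack is not shown in the thread.
haystack := "Napájení: 230 V"

; The six patterns from the walkthrough, in the same order.
patterns := ["i)\bNapájení", "i)Napájení", "i)Nap.jen."
           , "i)\bNapájení\b", "i)Napájení\b", "i)\bnap.jen.\b"]

report := ""
for index, pattern in patterns
{
    ; RegExMatch returns 0 when the pattern is not found.
    result := RegExMatch(haystack, pattern) ? "match" : "no match"
    report .= index ". " pattern " -> " result "`n"
}
MsgBox % report
```

Going by the explanation above, the first three patterns should be reported as matches and the last three as non-matches, because only the last three require a boundary directly after the í.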
