REGEX word boundary Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
zuzu_kuc
Posts: 76
Joined: 30 Mar 2016, 12:36

REGEX word boundary

29 Jan 2020, 03:43

Hi,
Why doesn't the RED regex work? I am trying to change the case of the word Grilovací to lowercase (grilovací), but when I use the word boundary \b at both the beginning and the end of the word, nothing happens. When I remove the last \b, it works. What am I doing wrong? Thank you.
- The red square (behind the bullet) is the metacharacter for a hard space in InDesign.
https://1drv.ms/u/s!Au4b1k0c3BzYkZs6LTVcsWRNE99XUw?e=fZbUIj [Mod edit: Link fixed.] - It is a OneDrive link, why was it denied?

info := "Grilovací plocha: 36 cm • Grilovací výška: 27 cm"
info := RegExReplace(info,"i)\bGrilovací","grilovací")
info := RegExReplace(info,"i)\bGrilovací\b","grilovací")
just me
Posts: 9450
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: REGEX word boundary  Topic is solved

29 Jan 2020, 04:01

Hi,
\w Matches any single "word" character, namely alphanumeric or underscore. This is equivalent to [a-zA-Z0-9_]. Conversely, capital \W means "any non-word character".
If you run AHK Unicode, try adding (*UCP) at the start of the pattern.
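
A minimal sketch of that suggestion applied to the code from the first post. It assumes a Unicode build of AutoHotkey v1.1 and a script file saved as UTF-8 with BOM, so that the accented literals are read correctly. The haystack is the one from the question.

```autohotkey
; Assumption: Unicode build of AutoHotkey v1.1, script saved as UTF-8 with BOM.
info := "Grilovací plocha: 36 cm • Grilovací výška: 27 cm"

; (*UCP) goes at the very start of the pattern, right after the options ("i)").
; With it, \w and \b use Unicode properties, so í counts as a word character.
info := RegExReplace(info, "i)(*UCP)\bGrilovací\b", "grilovací")

MsgBox % info
```

With (*UCP) in place, both occurrences of Grilovací should be replaced, because the trailing \b now sees í as a word character followed by a space.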
zuzu_kuc
Posts: 76
Joined: 30 Mar 2016, 12:36

Re: REGEX word boundary

29 Jan 2020, 04:30

I know, but I have many different words in different locations, and I need the word boundary to avoid cases like THIS and IS.
But I don't understand why \bword\b doesn't work in my script :/ Where could the problem be?
just me
Posts: 9450
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: REGEX word boundary

29 Jan 2020, 05:12

By default, word characters are defined in the ASCII range [a-zA-Z0-9_]. The í character in your haystack is not part of this range. The same is true for e.g. 'German Umlauts' [äöüÄÖÜß].
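
A quick way to check this is to test \w against a single character. This is only a sketch, assuming a Unicode build of AutoHotkey v1.1 and a script saved as UTF-8 with BOM. The variable names are made up for illustration.

```autohotkey
; Assumption: Unicode build of AutoHotkey v1.1, script saved as UTF-8 with BOM.
; RegExMatch returns the position of the first match, or 0 if there is none.
asciiOnly := RegExMatch("í", "\w")         ; default: \w covers only [a-zA-Z0-9_]
withUcp   := RegExMatch("í", "(*UCP)\w")   ; (*UCP): \w uses Unicode properties

MsgBox % "default \w: " . asciiOnly . "`n(*UCP) \w: " . withUcp
```

On such a setup the first value should be 0 (no match) and the second should be 1.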
zuzu_kuc
Posts: 76
Joined: 30 Mar 2016, 12:36

Re: REGEX word boundary

29 Jan 2020, 05:41

Hmm, right, when I change í to i it works. But it is strange, the same code works in InDesign. I know InDesign has its own metacharacters, but something as basic as a word boundary? OK, I have to find another way to do that. Thank you.
zuzu_kuc
Posts: 76
Joined: 30 Mar 2016, 12:36

Re: REGEX word boundary

29 Jan 2020, 05:56

EDIT: Hmm, as the regex documentation says, \b can only be used with characters that are in \w, and řžýáíé are not in \w. And a DOT (.) next to \b is not "any character" but a literal dot, which means the end of the sentence for \b -

No, I still don't get it:
The word Napájení, which contains á and í, works in these cases:

info := RegExReplace(info,"i)\bNapájení","napájení")
OR
info := RegExReplace(info,"i)Napájení","napájení")
OR
info := RegExReplace(info,"i)Nap.jen.","napájení")

but it doesn't work in these cases:
info := RegExReplace(info,"i)\bNapájení\b","napájení")
OR
info := RegExReplace(info,"i)Napájení\b","napájení")
or even
info := RegExReplace(info,"i)\bnap.jen.\b","napájení")
just me
Posts: 9450
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: REGEX word boundary

29 Jan 2020, 12:00

Regular Expressions -> Commonly Used Symbols and Syntax:
\b ... It requires the current character's status as a word character (\w) to be the opposite of the previous character's. ...
So what do we have:
  1. RegExReplace(info,"i)\bNapájení","napájení")
    The character following the \b is a word character (N). The 'previous' character isn't a word character. I.e. it's a word boundary and a match.
  2. RegExReplace(info,"i)Napájení","napájení")
    No boundary check. I.e. it's a match.
  3. RegExReplace(info,"i)Nap.jen.","napájení")
    No boundary check. The dots (.) stand for any single character, á and í in this case. I.e. same as 2.
  4. RegExReplace(info,"i)\bNapájení\b","napájení")
    The result of the first boundary check is true (see 1.). The character following the second boundary check isn't a word character. The previous character í isn't a word character either (by default, \w covers only ASCII). I.e. it's not a word boundary and no match.
  5. RegExReplace(info,"i)Napájení\b","napájení")
    The result of the boundary check is false (see 4.). I.e. it's no match.
  6. RegExReplace(info,"i)\bnap.jen.\b","napájení")
    The dots (.) stand for any single character, á and í in this case. I.e. same as 4.
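
The six cases above can be checked side by side with a small script. This is a sketch only: the haystack is hypothetical (the original one was not posted for Napájení), the variable names are made up, and it assumes a Unicode build of AutoHotkey v1.1 with the script saved as UTF-8 with BOM.

```autohotkey
; Assumption: Unicode build of AutoHotkey v1.1, script saved as UTF-8 with BOM.
info := "Napájení: 230 V"   ; hypothetical haystack, not from the original post

; The six patterns from the list above, without the options prefix.
patterns := ["\bNapájení", "Napájení", "Nap.jen.", "\bNapájení\b", "Napájení\b", "\bnap.jen.\b"]

report := ""
for index, pattern in patterns
{
    ; Default behaviour: \w and \b only know ASCII word characters.
    plain := RegExMatch(info, "i)" . pattern) ? "match" : "no match"
    ; Same pattern with (*UCP): \w and \b use Unicode properties.
    ucp := RegExMatch(info, "i)(*UCP)" . pattern) ? "match" : "no match"
    report .= index . ". " . pattern . " -> default: " . plain . ", (*UCP): " . ucp . "`n"
}
MsgBox % report
```

Going by the explanation above, the default column should show a match for 1 to 3 and no match for 4 to 6, while the (*UCP) column should show a match for all six.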
