regex problem

16 May 2020, 20:38

I'd like to get the second word of a string that contains random words separated by a random amount of whitespace, with an occasional accented letter. Because of the accent, my regex does not match the entire second word and leaves off the final é. How do I make it so that an accented letter at the beginning or end of a word is still treated as part of that word?

line:= "él     	esté   	ellos	         estén"

RegExMatch(Line, "^(?:.*?\K\b\S+\b){2}", word1)	; <-- finds the 2nd word
msgbox, % word1 ; prints "est" instead of "esté"

Re: regex problem

16 May 2020, 20:55

By default, accented letters are not treated as word characters, so the word boundary \b sees a boundary between the t and the é in "esté" and none after the é, and the match stops at "est". To get around that, split on whitespace versus non-whitespace only, by removing both \b markers from your needle. Alternatively:

RegExMatch(Line, "\S+\s+\K\S+", word1)
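For comparison, here is a minimal sketch of two other ways to write it, assuming AutoHotkey v1.1 on a Unicode build and a script file saved as UTF-8 with BOM so the accented literals are read correctly. The first is the original needle with both \b markers removed. The second keeps the \b markers but prefixes the needle with (*UCP), which the AutoHotkey RegEx quick reference describes as switching \w and \b from ASCII-only to Unicode properties. The variable names noBoundary and withUcp are made up for this example, and the tabs in the test string are written as `t escapes.

```autohotkey
; Sketch only: assumes AutoHotkey v1.1 (Unicode build), script saved as UTF-8 with BOM.
line := "él     `testé   `tellos`t         estén"

; Original needle with both \b markers removed:
; a "word" is simply a run of non-whitespace characters.
RegExMatch(line, "^(?:.*?\K\S+){2}", noBoundary)

; Original needle unchanged except for the (*UCP) prefix,
; so that \b uses Unicode properties and counts é as a word character.
RegExMatch(line, "(*UCP)^(?:.*?\K\b\S+\b){2}", withUcp)

MsgBox, % noBoundary "`n" withUcp
```

The whitespace-only version is the simpler choice when words are always separated by spaces or tabs. The (*UCP) version is only worth it if you really need \b or \w, for example to keep punctuation out of the match.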

