RegEx misses unicode bullet

Posted: 06 Jul 2020, 18:38
by Kapitano
I'm using AHK v2 (version a108) to process text containing some non-ASCII characters. I read it from the clipboard into an array, splitting at `n:

PageArray := StrSplit(Clipboard , "`n")
...and I want to remove the start of lines up to the bullet character (U+2022).

Line := RegExReplace(Line , "^(?:.+?•)(.+?)" , "$1")
But AHK does not seem to recognise the bullet character: the RegEx line does nothing. AHK a113 has the same problem. Is this part of a larger problem with non-ASCII characters? Is there a workaround?

Re: RegEx misses unicode bullet

Posted: 06 Jul 2020, 18:49
by swagfag

MsgBox RegExReplace('abc•xyz' , "^(?:.+?•)(.+?)" , "$1") ; xyz
can't reproduce with a115

Clipboard is A_Clipboard post a111

make sure u save ur scripts as utf-8 with BOM
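
If the file encoding can't be controlled, one possible workaround is to keep the literal bullet out of the script altogether and build it from its code point. This is only a minimal sketch, assuming AHK v2: the variable name Bullet and the simplified pattern (which drops everything up to and including the first bullet) are illustrative and not from the original post.

```autohotkey
; Build U+2022 at run time so the pattern does not depend on
; how the script file itself is encoded.
Bullet := Chr(0x2022)

; Sample line for illustration only.
Line := "abc" Bullet "xyz"

; Remove everything from the start of the line through the first bullet.
Line := RegExReplace(Line, "^.+?" Bullet, "")

MsgBox Line  ; xyz
```

Unlike the original pattern, this one does not require a character after the bullet, so a line ending in a bullet would be reduced to an empty string.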

Re: RegEx misses unicode bullet

Posted: 06 Jul 2020, 19:45
by Kapitano
> make sure u save ur scripts as utf-8 with BOM

Oh crap, I think that's it. Excuse me while I go and slap myself.

Re: RegEx misses unicode bullet

Posted: 07 Jul 2020, 22:35
by lexikos
FYI, UTF-8 is the default since v2.0-a112.