
4-byte Unicode support?

Posted: 29 Mar 2017, 10:04
by mbirth
I am trying to strip multibyte Unicode characters from clipboard data by looping through all characters with SubStr() and Asc() and removing those with values larger than 254. During testing (with the Unicode 64-bit version) I noticed that it doesn't recognise 4-byte characters. E.g. a string consisting of the single character U+1F60A gives a StrLen() of 2, and looping through it with Asc() yields two characters, Chr(55357) [0xD83D] and Chr(56842) [0xDE0A] (the UTF-16 surrogate pair for the character), when it should be one single Chr(128522) [0x1F60A].

However, for "❤" (U+2764) it works correctly: StrLen() returns 1 and Asc() returns 10084.
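The difference between the two examples is that U+2764 lies in the Basic Multilingual Plane and fits in one UTF-16 code unit, while U+1F60A is above U+FFFF and is stored as a high/low surrogate pair. StrLen() and Asc() work on code units, which is why they report 2 and 55357. The following is a minimal Python sketch of the UTF-16 arithmetic, not AutoHotkey code. The function names `to_utf16_units` and `from_surrogates` are hypothetical and chosen for illustration.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Split a Unicode code point into the UTF-16 code units that StrLen()/Asc() see."""
    if cp <= 0xFFFF:
        return [cp]  # BMP character (e.g. U+2764): a single code unit
    v = cp - 0x10000            # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)   # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (v & 0x3FF)  # low (trail) surrogate: bottom 10 bits
    return [high, low]


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the full code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(to_utf16_units(0x1F60A))        # [55357, 56842]
print(to_utf16_units(0x2764))         # [10084]
print(from_surrogates(55357, 56842))  # 128522
```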

Is that a bug or a feature? How can I work around it so that I can parse the string character by character instead of getting halves of UTF-16 surrogate pairs in between?

Re: 4-byte Unicode support?

Posted: 30 Mar 2017, 03:19
by just me

Re: 4-byte Unicode support?

Posted: 30 Mar 2017, 07:46
by mbirth
Thanks. Using Ord() instead of Asc() was the trick (and skipping the next character whenever it returned a value > 65535).
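The following is a rough Python model of the loop described above, offered as a sketch rather than the poster's actual script. It assumes the AutoHotkey string is represented as a list of UTF-16 code units. `ord_at` imitates Ord() by decoding a surrogate pair when one starts at the current position, and the loop advances by two units whenever the result exceeds 65535 (0xFFFF). Characters above 254 are dropped, as in the original cleanup. The names `ord_at`, `clean` and `utf16_units` are hypothetical.

```python
def ord_at(units: list[int], i: int) -> int:
    """Like Ord(): return the full code point of the character starting at index i."""
    u = units[i]
    # A high surrogate followed by a low surrogate encodes one supplementary character.
    if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
        return 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
    return u


def clean(units: list[int], limit: int = 254) -> str:
    """Keep only characters with code point <= limit, stepping by whole characters."""
    out = []
    i = 0
    while i < len(units):
        cp = ord_at(units, i)
        # Values above 0xFFFF occupied two code units, so skip the trailing surrogate too.
        i += 2 if cp > 0xFFFF else 1
        if cp <= limit:
            out.append(chr(cp))
    return "".join(out)


def utf16_units(s: str) -> list[int]:
    """Helper for the demo: turn a Python str into UTF-16 code units."""
    b = s.encode("utf-16-le")
    return [int.from_bytes(b[k:k + 2], "little") for k in range(0, len(b), 2)]


print(clean(utf16_units("caf\u00e9\u2764\U0001F60A!")))  # café!
```

Skipping on `cp > 0xFFFF` rather than testing for a surrogate range keeps the loop identical to the rule in the post. A lone (unpaired) surrogate is returned as-is by `ord_at` and then removed by the `limit` check.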