I tested this using FileOpen and got the same exact results. Unfortunately, it's a tradeoff between the O(log n) access of a map vs the linear access of an array. Now if you wanted to do aaaa → 工, you could use the file pointer and binary search as I do here:
UnicodeData("🍅")

UnicodeData(s, filepath := "UnicodeData.txt", show := True) {
    if (s == "")
        return

    ; Download UnicodeData from backup sites if needed.
    if not FileExist(filepath)
        try Download "https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt", filepath
        catch
            try Download "http://www.unicode.org/Public/UNIDATA/UnicodeData.txt", filepath
            catch
                Download "https://raw.githubusercontent.com/latex3/unicode-data/main/UnicodeData.txt", filepath

    ; Open UnicodeData.txt
    database := FileOpen(filepath, "r`n", "UTF-8")

    ; Binary Search.
    l := 0 ; lower bound
    h := database.length ; higher bound
    while (n := (l + h) / 2, n != l && n != h) { ; Allow 0.5 so it breaks after this loop.
        database.Seek(n) ; Move file pointer to middle of file.
        (database.Pos != 0) && database.ReadLine() ; Ensure a full line can be read below.
        row := database.ReadLine() ; Read a full line of text.
        codepoint := RegExReplace(row, "^(.*?);.*$", "0x$1") ; Extract and convert the unicode hex to decimal.

        ; Limit min or max bound of binary search.
        if (Ord(s) < codepoint)
            h := Floor(n) ; Converges h == l
        else if (Ord(s) > codepoint)
            l := Ceil(n) ; Converges h == l
        else
            break
    }
    database.Close()

    r := StrSplit(row, ";") ; Split row into an array.
    desc := (r[2] = "<control>") ? r[11] : r[2] ; Retrieve alternate description if control character.

    if not show
        return Format("<U+{:X}> " desc, Ord(s))
    MsgBox Format(" <U+{:X}> " desc, Ord(s)), " " Chr(Ord(s)) ; Show a Message Box.
}
Actually, a Map can't handle repeated keys, so I assume you're doing a lookup like aaaa → 工 where you can take advantage of the fact that the data comes pre-sorted already!
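For what it's worth, here is a minimal in-memory sketch of the same idea, assuming the rows have already been read into an array and are sorted by their first field. `FindRow` and the sample `rows` are made up for illustration and are not part of the function above. It uses integer bounds instead of a file pointer, so it only shows how pre-sorted data lets you skip building a Map.

```autohotkey
#Requires AutoHotkey v2.0

; Hypothetical sample rows in "HEX;NAME" form, already sorted by code point.
rows := [
    "0041;LATIN CAPITAL LETTER A",
    "0061;LATIN SMALL LETTER A",
    "00E9;LATIN SMALL LETTER E WITH ACUTE",
    "03A9;GREEK CAPITAL LETTER OMEGA",
    "20AC;EURO SIGN",
    "1F345;TOMATO"
]

; Binary search over rows sorted by code point.
; Returns the matching row, or "" if the code point is not present.
FindRow(rows, codepoint) {
    lo := 1                 ; AutoHotkey v2 arrays are 1-based
    hi := rows.Length
    while (lo <= hi) {
        mid := (lo + hi) // 2                             ; integer division keeps the index whole
        key := Integer("0x" StrSplit(rows[mid], ";")[1])  ; first field: hex string -> integer
        if (codepoint < key)
            hi := mid - 1   ; target is in the lower half
        else if (codepoint > key)
            lo := mid + 1   ; target is in the upper half
        else
            return rows[mid]
    }
    return ""
}

found   := FindRow(rows, 0x20AC)  ; a code point that is in the sample rows
missing := FindRow(rows, 0x0042)  ; a code point that is not in the sample rows
```

Unlike the file-based version, this returns an empty string when nothing matches, so the caller can tell a miss apart from a neighbouring row. Pass `found` to `MsgBox` if you want to see the result.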