UdeExport.dll (x64), based on the Mozilla Universal Charset Detector (Ude C# port)
Original code for Ude can be found here.
"Mostly" Accurate Encoding Detector
Not much different from the original. I updated some of the code a little, but for the most part I just added a couple of exported functions and packed it up so it's easily callable from AutoHotkey.
There are two exported functions:
GetFileEncoding
GetStringEncoding <-- Not Very Useful, but it's there
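GetFileEncoding will often return "ASCII" even if the file was saved as UTF-8. Unless the file contains a UTF-8 BOM or at least one non-ASCII character (which UTF-8 stores as a multi-byte sequence), this will always be the case, because plain 7-bit ASCII text is byte-for-byte identical in both encodings.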
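The ASCII/UTF-8 overlap is easy to demonstrate in C#. The sketch below is illustrative only: AsciiVsUtf8Demo, HasUtf8Bom and IsPureAscii are hypothetical names, not part of UdeExport.dll or Ude. It shows that plain English text produces the same bytes in both encodings, so without a BOM or a non-ASCII character there is nothing for a detector to tell apart.

```csharp
using System;
using System.Linq;
using System.Text;

// Illustrative sketch, NOT part of UdeExport.dll: shows why a BOM-less file
// containing only 7-bit characters is indistinguishable from ASCII.
public static class AsciiVsUtf8Demo
{
    static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // True if the data begins with the UTF-8 byte order mark (EF BB BF).
    public static bool HasUtf8Bom(byte[] data) =>
        data.Length >= 3 && data.Take(3).SequenceEqual(Utf8Bom);

    // True if every byte is in the 7-bit ASCII range (0x00-0x7F).
    public static bool IsPureAscii(byte[] data) => data.All(b => b < 0x80);

    public static void Main()
    {
        // Plain English text: UTF-8 and ASCII produce identical bytes.
        byte[] utf8Plain = Encoding.UTF8.GetBytes("Hello, world");
        byte[] asciiPlain = Encoding.ASCII.GetBytes("Hello, world");
        Console.WriteLine(utf8Plain.SequenceEqual(asciiPlain)); // True
        Console.WriteLine(IsPureAscii(utf8Plain));              // True

        // "é" is two bytes in UTF-8 (0xC3 0xA9), which gives a detector something to go on.
        byte[] utf8Accented = Encoding.UTF8.GetBytes("Café");
        Console.WriteLine(IsPureAscii(utf8Accented));           // False

        // GetBytes never emits a BOM; GetPreamble() returns it separately.
        byte[] withBom = Encoding.UTF8.GetPreamble().Concat(utf8Plain).ToArray();
        Console.WriteLine(HasUtf8Bom(withBom));                 // True
        Console.WriteLine(HasUtf8Bom(utf8Plain));               // False
    }
}
```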
Here is the Dictionary<Encoding string, Codepage int> that contains all possible return values. I could not find code pages for some encodings; those are mapped to 0.
internal static Dictionary<string, int> UdeCharsetCodePages = new()
{
{ "ASCII", 20127 },
{ "UTF-8", 65001 },
{ "UTF-16LE", 1200 },
{ "UTF-16BE", 1201 },
{ "UTF-32BE", 12001 },
{ "UTF-32LE", 12000 },
{ "X-ISO-10646-UCS-4-3412", 0 },
{ "X-ISO-10646-UCS-4-2413", 0 },
{ "windows-1251", 1251 },
{ "windows-1252", 1252 },
{ "windows-1253", 1253 },
{ "windows-1255", 1255 },
{ "Big-5", 950 },
{ "EUC-KR", 51949 },
{ "EUC-JP", 51932 },
{ "EUC-TW", 0 },
{ "gb18030", 54936 },
{ "ISO-2022-JP", 50222 },
{ "ISO-2022-CN", 0 },
{ "ISO-2022-KR", 50225 },
{ "HZ-GB-2312", 52936 },
{ "Shift-JIS", 932 },
{ "x-mac-cyrillic", 10007 },
{ "KOI8-R", 20866 },
{ "IBM855", 855 },
{ "IBM866", 866 },
{ "ISO-8859-2", 28592 },
{ "ISO-8859-5", 28595 },
{ "ISO-8859-7", 28597 },
{ "ISO-8859-8", 28598 },
{ "TIS620", 874 }
};
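If you consume the returned name from .NET, a lookup along these lines is one way to turn it into a code page while treating the 0 entries as "unknown". This is a sketch under assumptions: CodePageLookup and TryGetCodePage are hypothetical names, it copies only part of the table above, and the case-insensitive comparer is a choice made for this sketch rather than something the DLL requires.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: CodePageLookup and TryGetCodePage are hypothetical names,
// not functions exported by UdeExport.dll.
public static class CodePageLookup
{
    // A subset of the table above; 0 means "no known code page".
    static readonly Dictionary<string, int> CodePages =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            { "ASCII", 20127 },
            { "UTF-8", 65001 },
            { "UTF-16LE", 1200 },
            { "windows-1252", 1252 },
            { "Shift-JIS", 932 },
            { "EUC-TW", 0 },
            { "ISO-2022-CN", 0 },
        };

    // Returns true only when the name is known AND maps to a real code page.
    public static bool TryGetCodePage(string name, out int codePage)
    {
        if (name != null && CodePages.TryGetValue(name, out codePage) && codePage != 0)
            return true;
        codePage = 0;
        return false;
    }

    public static void Main()
    {
        foreach (var name in new[] { "utf-8", "Shift-JIS", "EUC-TW", "nonsense" })
        {
            Console.WriteLine(TryGetCodePage(name, out int cp)
                ? $"{name} -> {cp}"
                : $"{name} -> no code page");
        }
    }
}
```

Returning false for both unknown names and 0 entries keeps callers from passing code page 0 on to other APIs. Note also that on .NET Core and .NET 5+, Encoding.GetEncoding can only create most of the legacy code pages in this table (for example 1252 or 932) after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called.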