UTF-8 ini files

jeeswg

UTF-8 ini files

18 Oct 2017, 08:33

A workaround to create 'UTF-8 ini files'.

Note: ANSI/Unicode ini files cannot handle certain characters; this limitation also affects 'UTF-8 ini files'.

IniRead/IniWrite/IniDelete can only handle ANSI and UTF-16. They are based on:
GetPrivateProfileString function (Windows)
https://msdn.microsoft.com/en-us/librar ... s.85).aspx
WritePrivateProfileString function (Windows)
https://msdn.microsoft.com/en-us/librar ... s.85).aspx

And:
GetPrivateProfileSection function (Windows)
https://msdn.microsoft.com/en-us/librar ... s.85).aspx
GetPrivateProfileSectionNames function (Windows)
https://msdn.microsoft.com/en-us/librar ... s.85).aspx
WritePrivateProfileSection function (Windows)
https://msdn.microsoft.com/en-us/librar ... s.85).aspx

The following code provides a way to use UTF-8 instead: the file is handled as an ANSI ini, and the text is converted between UTF-8 and UTF-16 (or between UTF-8 and ANSI, in AHK ANSI) every time you do a read/write.

;==================================================

Code:

q:: ;test UTF-8 conversion
vText := Chr(8730) Chr(33) Chr(333) Chr(3333) Chr(33333) Chr(8730)
MsgBox, % vText
vUtf8 := JEE_StrTextToUtf8Bytes(vText)
MsgBox, % vUtf8
vText2 := JEE_StrUtf8BytesToText(vUtf8)
MsgBox, % (vText = vText2)
return

w:: ;test 'UTF-8 ini files'
vText := ";this line is required for a 'UTF-8 ini file'"
vPath := A_Desktop "\MyUtf8Ini.ini"
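if !FileExist(vPath) ;create the file once only, with a UTF-8 BOM and the comment as its first line
	FileAppend, % vText, % "*" vPath, UTF-8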
vSection := Chr(8730) "Section" Chr(8730)
vKey := Chr(8730) "Key" Chr(8730)
vValue := Chr(8730) Chr(33) Chr(333) Chr(3333) Chr(33333) Chr(8730)
JEE_IniWriteUtf8(vValue, vPath, vSection, vKey)
MsgBox, % JEE_IniReadUtf8(vPath, vSection, vKey)
return

;==================================================

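;note: a 'UTF-8 ini file' will need a comment as the first line
;e.g. ';my comment'
;note: IniRead does not set ErrorLevel, if there is a problem it returns the default value
JEE_IniReadUtf8(vPath, vSection:="", vKey:="", vDefault:="")
{
	local vOutput
	vSection := JEE_StrTextToUtf8Bytes(vSection)
	vKey := JEE_StrTextToUtf8Bytes(vKey)
	;convert the default too, so that it survives the conversion applied to the output
	vDefault := JEE_StrTextToUtf8Bytes(vDefault)
	IniRead, vOutput, % vPath, % vSection, % vKey, % vDefault
	return JEE_StrUtf8BytesToText(vOutput)
}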

;==================================================

JEE_IniWriteUtf8(vValue, vPath, vSection, vKey:="")
{
	vSection := JEE_StrTextToUtf8Bytes(vSection)
	vKey := JEE_StrTextToUtf8Bytes(vKey)
	vValue := JEE_StrTextToUtf8Bytes(vValue)
	IniWrite, % vValue, % vPath, % vSection, % vKey
	return !ErrorLevel
}

;==================================================

JEE_IniDeleteUtf8(vPath, vSection, vKey:="")
{
	vSection := JEE_StrTextToUtf8Bytes(vSection)
	vKey := JEE_StrTextToUtf8Bytes(vKey)
	IniDelete, % vPath, % vSection, % vKey
	return !ErrorLevel
}

;==================================================

JEE_StrUtf8BytesToText(vUtf8)
{
	if A_IsUnicode
	{
		VarSetCapacity(vUtf8X, StrPut(vUtf8, "CP0"))
		StrPut(vUtf8, &vUtf8X, "CP0")
		return StrGet(&vUtf8X, "UTF-8")
	}
	else
		return StrGet(&vUtf8, "UTF-8")
}

;==================================================

JEE_StrTextToUtf8Bytes(vText)
{
	VarSetCapacity(vUtf8, StrPut(vText, "UTF-8"))
	StrPut(vText, &vUtf8, "UTF-8")
	return StrGet(&vUtf8, "CP0")
}

;==================================================
dd900

Re: UTF-8 ini files

29 Oct 2017, 15:12

Nice code. But what's wrong with the normal Ini commands?

Works fine:

Code:

IniWrite, % Chr(8730) Chr(33) Chr(333) Chr(3333) Chr(33333) Chr(8730), ini.ini, sec, key
IniRead, nonASCII, ini.ini, sec, key
MsgBox, % nonASCII

Notepad++ tells me the above ini is encoded as UCS-2 LE BOM.

Oddly, the documented workaround does not work for me:

Code:

FileAppend,, ini.ini, UTF-8-RAW
IniWrite, % Chr(8730) Chr(33) Chr(333) Chr(3333) Chr(33333) Chr(8730), ini.ini, sec, key
IniRead, nonASCII, ini.ini, sec, key
MsgBox, % nonASCII

Notepad++ says UTF-8, but the characters in the ini are not stored as UTF-8.

Maybe you can shed some light on this for me?
jeeswg

Re: UTF-8 ini files

29 Oct 2017, 15:24

What Notepad++ calls 'UCS-2 LE BOM' is, in effect, UTF-16 LE with a BOM.

There's nothing 'wrong' with the IniRead/IniWrite commands per se; it's just that they can only handle UTF-16 and ANSI. UTF-16 files are often bigger than UTF-8 files if your language uses the Latin alphabet, and converting Unicode text to ANSI is lossy.
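
To illustrate the size point, here is a minimal sketch that compares how many bytes the same text needs in each encoding. The sample string is just an assumption for the demo. It relies on StrPut, with no target specified, returning the required buffer size including the null terminator (counted in bytes for UTF-8, and in 2-byte units for UTF-16).

```autohotkey
;sample text (an arbitrary example, not taken from a real ini file)
vText := "Section names and keys in plain English"

;required size in bytes, excluding the null terminator
vSizeUtf8 := StrPut(vText, "UTF-8") - 1
;StrPut counts UTF-16 in 2-byte units, so multiply by 2 to get bytes
vSizeUtf16 := (StrPut(vText, "UTF-16") - 1) * 2

MsgBox, % "UTF-8: " vSizeUtf8 " bytes`nUTF-16: " vSizeUtf16 " bytes"
```

For ASCII-only text like this, the UTF-16 figure should be exactly double the UTF-8 figure. For text that is mostly non-Latin, the difference can shrink or reverse.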

FileAppend,, ini.ini, UTF-8-RAW
- This creates a blank (0-byte) file, which IniWrite then treats as ANSI, whether AHK is ANSI or Unicode (see the points below).
- AHK Unicode creates new ini files as UTF-16.
- AHK ANSI creates new ini files as ANSI.
- AHK Unicode and ANSI see any existing file as ANSI, unless it has a UTF-16 BOM. This means that both see any blank (0 byte) file as ANSI.
- FileAppend with 'UTF-8' appends the text, and if the file is empty/doesn't exist, prepends a BOM. Creating a file by appending an empty string results in a 3-byte file (the 3 bytes are the BOM).
- FileAppend with 'UTF-8-RAW' appends the text, but never prepends a BOM. Creating a file by appending an empty string results in a 0-byte file.
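
The BOM behaviour described in the last two points can be checked with a quick test along these lines. This is only a sketch: the two temp file names are made up for the demo, and any existing files with those names are deleted first.

```autohotkey
;hypothetical test paths (deleted first, so that both files start from nothing)
vPathBom := A_Temp "\MyUtf8BomTest.txt"
vPathRaw := A_Temp "\MyUtf8RawTest.txt"
FileDelete, % vPathBom
FileDelete, % vPathRaw

;create each file by appending an empty string
FileAppend,, % vPathBom, UTF-8
FileAppend,, % vPathRaw, UTF-8-RAW

FileGetSize, vSizeBom, % vPathBom
FileGetSize, vSizeRaw, % vPathRaw
MsgBox, % "UTF-8: " vSizeBom " bytes`nUTF-8-RAW: " vSizeRaw " bytes"
```

Going by the description above, the first file should be 3 bytes (just the BOM) and the second 0 bytes.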

FileEncoding
https://autohotkey.com/docs/commands/FileEncoding.htm
• UTF-8: Unicode UTF-8, equivalent to CP65001.
• UTF-16: Unicode UTF-16 with little endian byte order, equivalent to CP1200.
• UTF-8-RAW or UTF-16-RAW: As above, but no byte order mark is written when a new file is created.

IniWrite
https://autohotkey.com/docs/commands/IniWrite.htm
New files are created in either the system's default ANSI code page or UTF-16, depending on the version of AutoHotkey.

...

In Unicode scripts, IniWrite uses UTF-16 for each new file. If this is undesired, ensure the file exists before calling IniWrite.
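
Since an existing file is seen as ANSI unless it has a UTF-16 BOM, one way to predict how the Ini commands will treat a file is to check its first 2 bytes. The helper below is a hypothetical sketch (the function name is made up, and it is not part of the functions in the first post); it assumes a UTF-16 LE BOM, i.e. the bytes FF FE.

```autohotkey
;hypothetical helper: returns 1 if the file starts with FF FE (UTF-16 LE BOM), else 0
MyFileHasUtf16LeBom(vPath)
{
	local oFile, vData, vCount
	if !(oFile := FileOpen(vPath, "r"))
		return 0 ;file could not be opened (e.g. it doesn't exist)
	oFile.Seek(0) ;FileOpen may have moved past a BOM, so go back to the start
	VarSetCapacity(vData, 2, 0)
	vCount := oFile.RawRead(vData, 2)
	oFile.Close()
	;bytes FF FE read as a little-endian UShort give 0xFEFF
	return (vCount = 2) && (NumGet(vData, 0, "UShort") = 0xFEFF)
}

vPath := A_Desktop "\MyUtf8Ini.ini" ;assumed path, as in the test script in the first post
MsgBox, % MyFileHasUtf16LeBom(vPath) ? "UTF-16 BOM found" : "no UTF-16 BOM (seen as ANSI)"
```

A 0-byte file, or a file that starts with a UTF-8 BOM, would report no UTF-16 BOM here, which matches the 'seen as ANSI' case.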
jeeswg

Re: UTF-8 ini files

27 Oct 2018, 11:54

There is more info on why the first line (;this line is required for a 'UTF-8 ini file') is needed, here:
IniRead requires blank line - AutoHotkey Community
https://autohotkey.com/boards/viewtopic ... 91#p246291
DRocks

Re: UTF-8 ini files

02 Nov 2018, 05:41

Thank you
