V2 faq UTF8 recommendation Topic is solved

TAC109
Posts: 1129
Joined: 02 Oct 2013, 19:41
Location: New Zealand

V2 faq UTF8 recommendation

30 Sep 2023, 17:38

@Ragnar
If you agree, could you change the v2 FAQ item "Why are the non-ASCII characters in my script displaying or sending incorrectly?", replacing "Short answer: Save the script as UTF-8." with "Short answer: Save the script as UTF-8, or for complete compatibility with utilities such as editors, UTF-8 with BOM."

Thanks
My scripts:-
XRef - Produces Cross Reference lists for scripts
ReClip - A Text Reformatting and Clip Management utility
ScriptGuard - Protects Compiled Scripts from Decompilation
I also maintain Ahk2Exe
neogna2
Posts: 600
Joined: 15 Sep 2016, 15:44

Re: V2 faq UTF8 recommendation

01 Oct 2023, 02:48

Which utilities/editors work better with v2 scripts saved in UTF-8 BOM?
TAC109

Re: V2 faq UTF8 recommendation

01 Oct 2023, 04:21

Writing a BOM (byte order mark) at the start of a UTF-8 file lets any modern utility (including editors) that understands UTF-8 determine with certainty that the file is actually UTF-8. Without a BOM at the start of the file, the program has to guess the file encoding and sometimes gets it wrong.
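
As a rough illustration of what "a BOM at the start of the file" means at the byte level, here is a minimal Python sketch. The helper name has_utf8_bom and the sample script line are made up for this example; they are not taken from AutoHotkey or from any particular editor.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

line = "MsgBox 'ö'"                  # hypothetical one-line script
plain = line.encode("utf-8")         # UTF-8 without BOM
signed = line.encode("utf-8-sig")    # UTF-8 with BOM

print(signed[:3].hex())              # efbbbf
print(has_utf8_bom(signed), has_utf8_bom(plain))  # True False
```

Apart from the three leading bytes, the two encodings of the same text are identical.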

Cheers
neogna2

Re: V2 faq UTF8 recommendation

01 Oct 2023, 05:09

I know what a BOM is. I wonder whether you have encountered any real-world practical issues with particular utilities/editors and v2 scripts that you think support making the suggested change. I use UTF-8 without BOM for v2 scripts and encounter no issues.
TAC109

Re: V2 faq UTF8 recommendation

01 Oct 2023, 17:51

My suggested alteration to the FAQ is meant to help users who may have problems when editing or processing scripts. If your editor already handles UTF-8 files correctly, that is good; it will still process files saved as UTF-8 with BOM correctly, and with more certainty. This recommendation should not adversely affect your editor unless it does not understand Unicode encodings.

Cheers
neogna2

Re: V2 faq UTF8 recommendation

02 Oct 2023, 06:14

But are such possible issues a real problem for v2 users often enough to merit a BOM recommendation in the documentation?
v2 defaults to UTF-8 (without BOM), and Notepad has defaulted to UTF-8 (without BOM) since Windows 10 version 1903.
So I'm curious whether your suggestion was prompted by particular utilities/editors that recently caused issues for v2 users with non-BOM UTF-8 files.

If v2 users run into such issues only very rarely, then it is unclear whether the documentation should recommend BOM.

This 2020 thread is relevant to this topic and has the opposite wish: remove a BOM recommendation.
viewtopic.php?t=79309

In that thread, user need4speed points to a difference between these two doc locations:
https://www.autohotkey.com/docs/v2/Program.htm#create
"Be sure to save the file as UTF-8 with BOM if it will contain non-ASCII characters."
and
https://www.autohotkey.com/docs/v2/FAQ.htm#nonascii
"Short answer: Save the script as UTF-8."
(Though two paragraphs below that it says: "To save as UTF-8 in Notepad, select UTF-8 or UTF-8 with BOM from ...")

The two doc locations could be aligned in one of two ways: by recommending BOM in both or in neither.

The 2020 thread has an argument against recommending BOM
need4speed wrote:
03 Aug 2020, 07:35
In certain circumstances "UTF-8 with BOM" can cause trouble.
lexikos replied that those circumstances are not relevant to AutoHotkey.

But I think they can be indirectly relevant: a BOM recommendation in the v2 docs may lead a user to set their editor to create UTF-8 with BOM files by default, and the user may then use that editor to create other kinds of files where a BOM causes issues. A BOM is easy to miss because many editors hide it by default. Though if that problem happens only very rarely, this argument carries little weight.
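
To make the "other files" concern concrete, here is a small Python sketch. The circumstances need4speed had in mind are not spelled out in this thread, so the JSON case below is just one assumed example, and the payload is invented for illustration.

```python
import json

payload = '{"hotkey": "^j"}'               # hypothetical config content
raw_with_bom = payload.encode("utf-8-sig")  # as saved by an editor set to "UTF-8 with BOM"

# A consumer that decodes as plain UTF-8 keeps the BOM as an invisible U+FEFF character.
text = raw_with_bom.decode("utf-8")
try:
    json.loads(text)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

# Decoding with "utf-8-sig" strips the BOM first, so parsing succeeds.
clean = json.loads(raw_with_bom.decode("utf-8-sig"))

print(parsed_ok)  # False
print(clean)      # {'hotkey': '^j'}
```

The text looks identical in most editors either way, which is why this kind of failure can be hard to spot.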
TAC109

Re: V2 faq UTF8 recommendation

02 Oct 2023, 18:51

The thread you quoted:
"This 2020 thread is relevant to this topic and has the opposite wish: remove a BOM recommendation.
viewtopic.php?t=79309"
resulted in no changes to the documentation:
"Be sure to save the file as UTF-8 with BOM if it will contain non-ASCII characters."
and the negative example quoted in that thread was regarded by @lexikos as not being relevant to AutoHotkey scripts.

Rather than having to determine whether a script contains non-ASCII characters (a rather technical exercise), it is simpler to save the script as UTF-8 with BOM from the start. This only has to be done once, when the script is created: the BOM is then stored at the beginning of the file and signals that the file is always to be treated as UTF-8. Without the BOM, the file carries no other information about its encoding, so the editor has to guess when opening it unless the encoding is selected at that time.
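
Here is a minimal Python sketch of how such a guess can go wrong. It assumes a hypothetical tool that trusts a BOM when present and otherwise falls back to the Windows-1252 code page; real editors use their own heuristics, so the function decode_script is only an illustration.

```python
import codecs

def decode_script(data: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 when a BOM is present, otherwise use the fallback encoding."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(fallback)

script = 'Send "café"'                  # hypothetical script line
with_bom = script.encode("utf-8-sig")   # saved once as UTF-8 with BOM
without_bom = script.encode("utf-8")    # saved as UTF-8 without BOM

print(decode_script(with_bom))          # Send "café"
print(decode_script(without_bom))       # Send "cafÃ©"
```

A tool that falls back to UTF-8 instead (for example, decode_script(without_bom, fallback="utf-8")) reads the BOM-less file correctly, which is the situation neogna2 describes.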

Initially saving an AutoHotkey script as UTF-8 with BOM is all upside with no downside, as I see it.

Cheers
Ragnar
Posts: 630
Joined: 30 Sep 2013, 15:25

Re: V2 faq UTF8 recommendation  Topic is solved

03 Oct 2023, 04:59

Thanks for the suggestion. I've added the recommendation to save the script as UTF-8 with BOM. See the commit for details.
neogna2

Re: V2 faq UTF8 recommendation

03 Oct 2023, 05:36

@TAC109
Back in 2020, Notepad had only recently switched its default to BOM-less UTF-8. Three years have now passed, and Windows 8 and all earlier versions are end of life. We can ask:
1. Are almost all AutoHotkey v2 users today on either Windows 10 version 1903 or later, or Windows 11?
2. Is there a significant number of (or even any) known recent cases, in this forum or elsewhere, where an AutoHotkey v2 script saved as BOM-less UTF-8 caused issues with some tool or editor on Windows 10 version 1903 or later?

If the answers are 1 = yes and 2 = no, then I think the v2 documentation should not recommend UTF-8 with BOM.

Instead, the general recommendation, suitable for almost all users, should be BOM-less UTF-8, or just UTF-8 without mentioning BOM. There could then also be a small note further down in the doc text, something like: "Note: if you use a version of Windows older than Windows 10 version 1903 and write code in Notepad, it is best to create and save v2 script files with UTF-8 with BOM (byte order mark) encoding."
TAC109 wrote:
02 Oct 2023, 18:51
saving an AutoHotkey script as UTF-8 with BOM is all upside with no downside
I cited the possible downside from need4speed and expanded how it can be indirectly relevant. Any objection to that?

Another downside: BOM-less UTF-8 is the default in Notepad on up-to-date Windows 10/11 and in VS Code (the most popular code editor), so recommending that users still save v2 scripts as UTF-8 with BOM means recommending an extra step.

As a side note, I really liked this recent text, "The Absolute Minimum Every Software Developer Must Know About Unicode in 2023": https://tonsky.me/blog/unicode/ (archived link: https://archive.ph/LtKk0).
It is a rejoinder to Spolsky's classic 2003 text.
It doesn't say anything about BOM (which may be a sign that issues from BOM-lessness are not very common nowadays), but it has a nice graph on the rise of UTF-8 and lots of other interesting material. I wonder whether the ICU4C library mentioned there could be used from AutoHotkey v2 via DllCall to count graphemes in strings.
TAC109

Re: V2 faq UTF8 recommendation

04 Oct 2023, 19:11

Thanks @Ragnar

Cheers
TAC109

Re: V2 faq UTF8 recommendation

04 Oct 2023, 19:31

@neogna2
I enjoyed reading many of the 'Joel on Software' articles too.

Cheers