Notepad and UTF-8

Discuss Autohotkey related topics here. Not a place to share code.
lexikos
Posts: 9560
Joined: 30 Sep 2013, 04:07

Notepad and UTF-8

05 Sep 2019, 23:19

What's new in the Windows 10 Insider Preview Builds 1903 - Windows Insider Program | Microsoft Docs

We’ve made significant improvements to the way Notepad handles encoding. Starting with this build, we are adding the option to save files in UTF-8 without a Byte Order Mark and making this the default for new files. UTF-8 without a Byte Order Mark is backwards-compatible with ASCII and will provide better interoperability with the web, where UTF-8 has become the default encoding. Additionally, we added a column to the status bar that displays the encoding of the document.
Currently if a script file lacks a BOM, AutoHotkey defaults to interpreting it as ANSI. UTF-8 was the default for a short time, but it caused issues since Notepad is the default editor, and Notepad did not default to UTF-8. IIRC, the "UTF-8" option in Notepad was UTF-8 with BOM; now it is without BOM.
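Here is a minimal Python sketch of the rule described above, for illustration only. If a BOM is present, it decides the encoding; otherwise a fixed default applies. `decode_script` is a hypothetical name, and `cp1252` stands in for the system ANSI code page (the real one depends on the locale). The set of BOMs checked is an assumption, and passing `default="utf-8"` models what /cp65001 does for BOM-less files. This is not AutoHotkey's actual loader code.

```python
import codecs

# BOMs checked in this sketch (an assumption): UTF-8 and UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def decode_script(data: bytes, default: str = "cp1252") -> str:
    """Decode script bytes, honouring a BOM and otherwise using `default`.

    "cp1252" stands in for the system ANSI code page; "utf-8" models the
    effect of /cp65001 on script files that have no BOM.
    """
    for bom, codec in _BOMS:
        if data.startswith(bom):
            # The BOM states the encoding explicitly; strip it and decode.
            return data[len(bom):].decode(codec)
    # No BOM: nothing in the file says how it was encoded, so use the default.
    return data.decode(default, errors="replace")


if __name__ == "__main__":
    saved = "MsgBox äöü".encode("utf-8")  # UTF-8 without BOM, as Notepad 1903+ saves
    print(decode_script(saved))                    # MsgBox Ã¤Ã¶Ã¼
    print(decode_script(saved, default="utf-8"))   # MsgBox äöü
    print(decode_script(codecs.BOM_UTF8 + saved))  # MsgBox äöü
```

With the ANSI default, a BOM-less UTF-8 file turns "äöü" into "Ã¤Ã¶Ã¼". Either a BOM or a UTF-8 default avoids that.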

The /cp65001 command-line switch can be used to make UTF-8 the default for script files (but it does not affect A_FileEncoding). If you add this into the command line associated with .ahk files in the registry (i.e. HKCR\AutoHotkeyScript\Shell\Open\Command) it effectively becomes the default for .ahk files launched via Explorer. There's currently no option in the installer for this, but it does detect whether the switch was present and preserve it when reinstalling.
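The following Python sketch shows where the switch goes in that command line. It only edits a string and does not touch the registry. The sample value `"C:\Program Files\AutoHotkey\AutoHotkey.exe" "%1" %*` is an assumed example of a typical registration, and the path on a given system may differ. `add_codepage_switch` is a hypothetical helper. AutoHotkey switches must come before the script filename, so the sketch inserts the switch right after the executable path.

```python
import re


def add_codepage_switch(command: str, codepage: int = 65001) -> str:
    """Insert /CP<codepage> right after the quoted AutoHotkey.exe path.

    Assumes the command starts with a double-quoted executable path.
    Returns the command unchanged if a /CPn switch already appears
    before the script filename ("%1").
    """
    if not command.startswith('"'):
        raise ValueError("expected a command starting with a quoted exe path")
    end = command.index('"', 1) + 1      # index just past the closing quote
    exe, rest = command[:end], command[end:]
    # Switches live between the exe path and the script filename.
    switches = rest.split('"%1"', 1)[0]
    if re.search(r"/cp\d+", switches, re.IGNORECASE):
        return command
    return f"{exe} /CP{codepage}{rest}"


if __name__ == "__main__":
    cmd = r'"C:\Program Files\AutoHotkey\AutoHotkey.exe" "%1" %*'
    print(add_codepage_switch(cmd))
    # "C:\Program Files\AutoHotkey\AutoHotkey.exe" /CP65001 "%1" %*
```

Passing `codepage=0` gives the /cp0 form instead. Calling the helper twice leaves the command unchanged the second time.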

For Windows v1903+, I'm thinking that I will make UTF-8 the default for script files. I have not decided how; I could:
  • Have the AutoHotkey installer add /cp65001 by default in new installations on v1903+. I suppose this benefits new users while minimizing change for existing users.
  • Change the default within AutoHotkey.exe on v1903+. The previous behaviour could be restored by adding /cp0 to the command line. Existing scripts which are saved without a BOM might be adversely affected, although I suppose that updating to v1903 might have already caused problems if you're using Notepad. If a user updates to v1903 after installing AutoHotkey, the behaviour will change to match Notepad.
Adding /cp65001 or /cp0 to the registry does not affect scripts or shortcuts that target AutoHotkey.exe explicitly (and therefore do not utilize the .ahk file type registration), which could be good or bad.
TAC109
Posts: 1099
Joined: 02 Oct 2013, 19:41
Location: New Zealand

Re: Notepad and UTF-8

12 Sep 2019, 18:31

The Windows implementation of Unicode is a cluster-f*ck!

When a file without BOM is read, Windows 'guesses' whether it is ANSI, UTF8 or UTF16. There is an API which has been around for a while that is used for this 'guessing'. In addition, I have read that in Windows 1903 Notepad implements its own 'guessing' algorithm. These guessing algorithms involve reading the file and making a determination depending on the contents.

If the file starts with a BOM the 'guess' is almost always correct (i.e. it will be treated as the appropriate Unicode), otherwise it is a crapshoot. (I had a case recently with my work on Ahk2Exe where the file was originally saved as UTF8 without BOM, and a subsequent edit caused the script to be treated as ANSI, corrupting some copyright symbols!)

The only sure method of flagging that a script is Unicode is to save it with BOM.
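To show why content-based guessing is unreliable, here is a deliberately naive Python heuristic. It is hypothetical and is neither the Windows API nor Notepad's algorithm. It trusts a UTF-8 BOM. Otherwise, it calls the file UTF-8 if the file decodes as UTF-8, and ANSI if it does not, with `cp1252` as a stand-in. Even this heuristic gets some files wrong, because the same bytes can be valid in both encodings.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Naive content-based guess; a hypothetical heuristic, not Windows' own."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # explicit marker: no guessing needed
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "cp1252"      # not valid UTF-8, so assume ANSI (stand-in code page)
    return "utf-8"           # valid UTF-8, so assume that is what it is


if __name__ == "__main__":
    utf8_file = "© 2019".encode("utf-8")    # saved as UTF-8 without BOM
    ansi_file = "Ã© 2019".encode("cp1252")  # saved as ANSI
    for data in (utf8_file, ansi_file):
        enc = guess_encoding(data)
        print(enc, data.decode(enc))
    # utf-8 © 2019
    # utf-8 é 2019   <- wrong: the ANSI bytes C3 A9 also happen to be valid UTF-8
```

The second file was saved as ANSI, but its bytes decode cleanly as UTF-8, so the guess silently changes the text. Of the cases shown, only the one with a BOM involves no guessing at all.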

The problem with the suggested approach in the previous post is that all scripts will be treated the same, whereas the type of file should actually be a property of the individual file.

Good luck with whatever is decided!
My scripts:-
XRef - Produces Cross Reference lists for scripts
ReClip - A Text Reformatting and Clip Management utility
ScriptGuard - Protects Compiled Scripts from Decompilation
I also maintain Ahk2Exe
lexikos
Posts: 9560
Joined: 30 Sep 2013, 04:07

Re: Notepad and UTF-8

13 Sep 2019, 04:28

This is just about the default encoding for .ahk files. I am not interested in creating new ways to allow the encoding to vary, or trying to solve impossible problems, such as identifying the encoding of a file based on arbitrary text contents. These are script files, not text files read by the script, which would come from a much wider range of sources.

If the default encoding of the default script editor is UTF-8 without BOM, I believe that defaulting to ANSI in the absence of a BOM will cause more problems than defaulting to UTF-8. The proportion of users facing these problems will grow as systems are updated to 1903+.

Requiring UTF-8 for script files would eventually eliminate non-UTF-8 script files, and there would be no need to identify the encoding. But it would also generate endless repetition of "Unicode character problems" if the common script editors do not default to UTF-8.

An alternative solution would be to bundle AutoHotkey with an editor.
kczx3
Posts: 1640
Joined: 06 Oct 2015, 21:39

Re: Notepad and UTF-8

13 Sep 2019, 20:22

lexikos wrote (13 Sep 2019, 04:28):
"An alternative solution would be to bundle AutoHotkey with an editor."
That’s a bold alternative, no?
lexikos
Posts: 9560
Joined: 30 Sep 2013, 04:07

Re: Notepad and UTF-8

18 Sep 2019, 02:50

@kczx3 I don't see why you would think that.
jNizM
Posts: 3183
Joined: 30 Sep 2013, 01:33

Re: Notepad and UTF-8

18 Sep 2019, 04:47

kczx3 wrote (13 Sep 2019, 20:22):
"That’s a bold alternative, no?"
I think for many beginners it would be (very) helpful to be able to select an editor when installing AutoHotkey (like SciTE4AutoHotkey).
[AHK] v2.0.5 | [WIN] 11 Pro (Version 22H2) | [GitHub] Profile
kczx3
Posts: 1640
Joined: 06 Oct 2015, 21:39

Re: Notepad and UTF-8

18 Sep 2019, 19:39

lexikos wrote (18 Sep 2019, 02:50):
"@kczx3 I don't see why you would think that."
It just seems a bit out of character for you I guess. What editor did you have in mind?
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02

Re: Notepad and UTF-8

19 Sep 2019, 05:53

An AHK gui with an edit field from which you can save your script with the correct encoding would suffice.

Cheers.
lexikos
Posts: 9560
Joined: 30 Sep 2013, 04:07

Re: Notepad and UTF-8

20 Sep 2019, 01:35

Exactly.

It doesn't have to be as basic as Notepad, though; some basic features can be added without increasing the size much, such as shortcuts to run or reload the script.

Even if SciTE4AutoHotkey were actively maintained, I would not include it in the download, since it outweighs the main program. However, scripts to automate download and installation of common editors could be included in the installer.
iPhilip
Posts: 802
Joined: 02 Oct 2013, 12:21

Re: Notepad and UTF-8

20 Sep 2019, 14:37

lexikos wrote: "... scripts to automate download and installation of common editors could be included in the installer."
I like that idea. It would help new users get started quickly.
Windows 10 Pro (64 bit) - AutoHotkey v2.0+ (Unicode 64-bit)
john_c
Posts: 493
Joined: 05 May 2017, 13:19

Re: Notepad and UTF-8

20 Sep 2019, 19:53

I'm not sure I understand the issue.

I use Windows 7 and therefore I cannot say anything about Notepad in Win 8 or 10.

In Windows 7, Notepad is horrible and nearly useless.

* It doesn't have syntax highlighting at all
* It has only a single level of undo.
* If you are not a native English speaker, it will be necessary to select the proper encoding (UTF-8 instead of ANSI) each time you save the file. (Not sure, it was a long time ago, but I remember there were some annoying issues.)
* It has buggy line wrapping.
* It's not possible to indent/dedent multiple lines at once.

I can't imagine anyone using it on a daily basis for coding.

There are a lot of better options. My personal choices are Sublime Text and Notepad2-mod. The latter has built-in AutoHotkey syntax highlighting and an option to launch the script with Ctrl-L. The former is very powerful if you spend several hours installing packages and writing the configuration.

Other possible options:

* Notepad++
* AHK Studio, SciTE4AHK.
* Vim, Emacs.
* VS Code, Atom.
Last edited by john_c on 20 Sep 2019, 20:41, edited 3 times in total.
lexikos
Posts: 9560
Joined: 30 Sep 2013, 04:07

Re: Notepad and UTF-8

20 Sep 2019, 20:25

Not everyone installs an editor immediately after installing AutoHotkey. As long as Notepad is the default editor, any incompatibilities between Notepad and AutoHotkey are a problem that must be addressed in some way.

Notepad gets the job done, while every other editor is completely useless if it is not present on the system. If I'm installing AutoHotkey on someone else's computer to create a quick fix for a problem, I do not install an editor.

john_c wrote: "If you are not a native English speaker, it will be necessary to select the proper encoding (UTF-8 instead of ANSI) each time you save the file."
In Windows 7, Notepad's default of ANSI works just fine with AutoHotkey's default of ANSI. There are only problems if your system's ANSI code page does not include all of the characters you need, but that is a case of Notepad dropping those characters, rather than AutoHotkey interpreting them incorrectly. The ANSI code page is controlled by the "Language for non-Unicode programs" setting, which should surely be set to your native language.
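The point about the ANSI code page can be checked with a small Python sketch. If a code page lacks a character, encoding to that code page loses the character at save time, before AutoHotkey ever reads the file. `save_as_ansi` is a hypothetical helper, and `cp1252` (Western European) and `cp1251` (Cyrillic) are example code pages. The `?` substitution comes from Python's `errors="replace"` and serves only as a stand-in for whatever Notepad does with such characters.

```python
def save_as_ansi(text: str, codepage: str = "cp1252") -> bytes:
    """Encode text as an ANSI save would, replacing unrepresentable characters.

    The default code page is an assumed example; on Windows the ANSI code page
    follows the "Language for non-Unicode programs" setting.
    """
    return text.encode(codepage, errors="replace")


if __name__ == "__main__":
    print(save_as_ansi("äöü"))     # b'\xe4\xf6\xfc'  (all present in cp1252)
    print(save_as_ansi("Привет"))  # b'??????'        (lost before AutoHotkey sees it)
    # With a Cyrillic ANSI code page, the same text survives the round trip.
    print(save_as_ansi("Привет", "cp1251").decode("cp1251"))
```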
john_c wrote: "Not sure, it was a long time ago, but I remember there were some annoying issues."
When Unicode support was first added, AutoHotkey defaulted to UTF-8 when there was no BOM, while Notepad (and some other editors) defaulted to ANSI. This was one source of "annoying issues". The point of this topic is that Windows 10 users will be facing the same problem again, caused by a mismatch in the opposite direction.
boiler
Posts: 16774
Joined: 21 Dec 2014, 02:44

Re: Notepad and UTF-8

20 Sep 2019, 22:04

iPhilip wrote (20 Sep 2019, 14:37):
"lexikos wrote: '... scripts to automate download and installation of common editors could be included in the installer.'
I like that idea. It would help new users get started quickly."
Another reason it would be helpful to new users is that some are uncomfortable not seeing an AutoHotkey client window. They wonder where AutoHotkey is. "How do you run it?" They would see the default editor/IDE as "AutoHotkey" and feel more comfort, similar to Python's IDLE.
iPhilip
Posts: 802
Joined: 02 Oct 2013, 12:21

Re: Notepad and UTF-8

21 Sep 2019, 09:26

boiler wrote (20 Sep 2019, 22:04):
"Another reason it would be helpful to new users is that some are uncomfortable not seeing an AutoHotkey client window. They wonder where AutoHotkey is. 'How do you run it?' They would see the default editor/IDE as 'AutoHotkey' and feel more comfort, similar to Python's IDLE."
Well said! :)
lmstearn
Posts: 688
Joined: 11 Aug 2016, 02:32

Re: Notepad and UTF-8

26 Oct 2019, 21:48

If an existing UTF with BOM script is loaded into N++, for example, will that still work if the change goes ahead, or will the compiler emit a warning?
:arrow: itros "ylbbub eht tuO kaerB" a ni kcuts m'I pleH
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: Notepad and UTF-8

27 Oct 2019, 06:10

jNizM wrote (18 Sep 2019, 04:47):
"kczx3 wrote (13 Sep 2019, 20:22): 'That’s a bold alternative, no?'
I think for many beginners it would be (very) helpful to be able to select an editor when installing AutoHotkey (like SciTE4AutoHotkey)."
I would vote for AHK Studio and AutoGUI to be bundled. Both are written in AutoHotkey. If only one could be bundled, then just AHK Studio. The advantage of the editor being written in AutoHotkey is that someone is more likely to be able to take it over or fork it if the software is abandoned. In addition, it allows more users to customize it.
lexikos
Posts: 9560
Joined: 30 Sep 2013, 04:07

Re: Notepad and UTF-8

14 Jun 2020, 02:57

Has anyone observed any issues resulting from Notepad defaulting to UTF-8?
boiler
Posts: 16774
Joined: 21 Dec 2014, 02:44

Re: Notepad and UTF-8

14 Jun 2020, 03:18

lexikos wrote (14 Jun 2020, 02:57):
"Has anyone observed any issues resulting from Notepad defaulting to UTF-8?"
In case this is a useful datapoint, I have had issues with VS Code defaulting to UTF-8, and it’s not until I manually save my script with UTF-8 with BOM encoding that the issues with special characters are resolved.
jNizM
Posts: 3183
Joined: 30 Sep 2013, 01:33

Re: Notepad and UTF-8

30 Jun 2020, 02:36

Since I use a German keyboard layout, I have to use "UTF-8-BOM" in Notepad (and Notepad++); otherwise the umlauts (äöü) are not displayed correctly in AutoHotkey (strings, ...).

lexikos
Posts: 9560
Joined: 30 Sep 2013, 04:07

Re: Notepad and UTF-8

30 Jun 2020, 06:24

It's always been that way for UTF-8 without BOM (except for a brief time during which UTF-8 was the default for AutoHotkey, and unless you use /cp65001).

My question was specifically about Notepad's new default behaviour. For example, you were creating scripts in Notepad, and the umlauts (äöü) were not displayed correctly because Notepad automatically used UTF-8 (not because you chose UTF-8). I already knew that this would happen, in theory, but do not know how many users are affected in practice.

Anyway, the installer for v1.1.33.00 has a "Default to UTF-8" option, but it's not enabled by default unless you had /CP65001 in the default value of HKCR\AutoHotkeyScript\Shell\Open\Command.
