To BOM or not to BOM?

fade2gray
Posts: 85
Joined: 21 Apr 2015, 12:28

To BOM or not to BOM?

24 Aug 2023, 12:35

I notice that when I create a new script using Explorer's context menu, whether empty, legacy (v1) or v2, it is always created with UTF-8 with BOM encoding.

Is this now preferred for both v1 and v2?
gregster
Posts: 9085
Joined: 30 Sep 2013, 06:48

Re: To BOM or not to BOM?

24 Aug 2023, 14:53

For v1, UTF-8 with BOM is surely preferred for Unicode handling. I think for v2 it doesn't matter anymore, as long as you are using UTF-8.

https://www.autohotkey.com/docs/v1/FAQ.htm#nonascii
versus
https://www.autohotkey.com/docs/v2/FAQ.htm#nonascii
fade2gray
Posts: 85
Joined: 21 Apr 2015, 12:28

Re: To BOM or not to BOM?

24 Aug 2023, 15:14

Can't say I've ever encountered the issue outlined in the first link. All the scripts I created using the context menu before v2 were encoded as UTF-8 (no BOM), though admittedly most of my scripts are pretty simple.

I'm happy to leave it that way as a 'safety net' for any future v1 scripts I may create. But since the BOM is superfluous for v2 scripts, shouldn't there be a setting in Dash to choose whether to include it? And can it be assumed that a BOM in a v2 script won't incur a performance hit?
gregster
Posts: 9085
Joined: 30 Sep 2013, 06:48

Re: To BOM or not to BOM?

24 Aug 2023, 15:23

Good for you. But it's a classic; we have probably answered hundreds of posts dealing with this common v1 issue, one of them just a few minutes ago...
I don't see how a byte order mark could have a significant impact on performance.

Perhaps add it to the Wishlist forum, if you think that you really need such an option. I usually create new script files in my editors, so I never noticed...
kunkel321
Posts: 1136
Joined: 30 Nov 2015, 21:19

Re: To BOM or not to BOM?

25 Aug 2023, 17:09

In my own experience, UTF-8 with BOM was necessary for v2 scripts if I wanted hotstrings with accented characters...

Try pasting the hotstrings below into a script and see whether they:
1. Display correctly in your editor, and
2. Get sent correctly when you use the hotstring.

Code: Select all

::ao dai::áo dài  ; noun the traditional dress of Vietnamese women consisting of a tunic with long sleeves and panels front and back; the tunic is worn over trousers
:*:aperitif::apéritif ; noun an alcoholic drink that is taken as an appetizer before a meal
:*:applique::appliqué ; noun a decorative design made of one material sewn over another; verb sew on as a decoration
::apres::après ; French:  Too late.  After the event.
::arete::arête ; noun a sharp narrow ridge found in rugged mountains
::attache::attaché ; noun a specialist assigned to the staff of a diplomatic mission; a shallow and rectangular briefcase
::auto-da-fe::auto-da-fé ; noun the burning to death of heretics (as during the Spanish Inquisition)
::belle epoque::belle époque ; French: Fine period.   noun the period of settled and comfortable life preceding World War I
::bete noire::bête noire ; noun a detested person
::betise::bêtise ; noun a stupid mistake
::Bjorn::Bjørn ; An old norse name.  Means "Bear."
::blase::blasé ; adj. nonchalantly unconcerned; uninterested because of frequent exposure or indulgence; very sophisticated especially because of surfeit; versed in the ways of the world
:*:boite::boîte ; French: "Box."  a small restaurant or nightclub.
::boutonniere::boutonnière ; noun a flower that is worn in a buttonhole.
:*:canape::canapé  ; noun an appetizer consisting usually of a thin slice of bread or toast spread with caviar or cheese or other savory food
:*:celebre::célèbre ; Cause célèbre An incident that attracts great public attention.
ste(phen|ve) kunkel
TAC109
Posts: 1125
Joined: 02 Oct 2013, 19:41
Location: New Zealand

Re: To BOM or not to BOM?

25 Aug 2023, 19:18

fade2gray wrote:
24 Aug 2023, 15:14

I'm happy to leave it that way as a 'safety net' for any future v1 scripts I may create. But since the BOM is superfluous for v2 scripts, shouldn't there be a setting in Dash to choose whether to include it? And can it be assumed that a BOM in a v2 script won't incur a performance hit?
The purpose of saving script files as UTF-8 with BOM is to ensure that when the file is processed, it is handled as UTF-8 and not as some other encoding. Without the BOM there is no certainty about the file's encoding, and software has to guess by checking some of the file contents. Sometimes the guess is wrong. Adding the BOM removes the guesswork and ensures that the file is processed correctly.

While AutoHotkey v2 will always process scripts as UTF-8, the same can’t be said for the script editor used, or for any other utilities involved. So it is always safer to include the BOM when saving. It should process faster too, as there is no preliminary reading of the file to guess the encoding.
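
If you are unsure which kind of file you actually have, you can look at its first three bytes. Below is a minimal v2 sketch, assuming the UTF-8 BOM is the byte sequence EF BB BF. HasUtf8Bom is a name made up for this illustration, not a built-in function.

```autohotkey
#Requires AutoHotkey v2.0

; Returns true if the file starts with the UTF-8 byte order mark (EF BB BF).
; HasUtf8Bom is a hypothetical helper name, not part of AutoHotkey.
HasUtf8Bom(path) {
    f := FileOpen(path, "r")   ; throws an error if the file cannot be opened
    f.Pos := 0                 ; FileOpen positions the pointer after a BOM, so rewind
    buf := Buffer(3, 0)
    bytesRead := f.RawRead(buf, 3)
    f.Close()
    return bytesRead = 3
        && NumGet(buf, 0, "UChar") = 0xEF
        && NumGet(buf, 1, "UChar") = 0xBB
        && NumGet(buf, 2, "UChar") = 0xBF
}

; Example: check the running script itself.
MsgBox(HasUtf8Bom(A_ScriptFullPath) ? "BOM present" : "No BOM")
```

This only reads three bytes, so it does not tell you whether the rest of the file is valid UTF-8. It only tells you whether the marker is there.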

Cheers
My scripts:-
XRef - Produces Cross Reference lists for scripts
ReClip - A Text Reformatting and Clip Management utility
ScriptGuard - Protects Compiled Scripts from Decompilation
I also maintain Ahk2Exe
20170201225639
Posts: 144
Joined: 01 Feb 2017, 22:57

Re: To BOM or not to BOM?

25 Aug 2023, 21:17

I've often had to use AHK to save files that are intended to be used by other apps, and found that using FileEncoding "UTF-8" (which means UTF-8 with BOM) sometimes causes issues. For example, my Win10 won't run .bat files saved that way (using "UTF-8-RAW" is necessary).

Code: Select all

bat := "@echo off`r`npause"
FileAppend(bat, "C:\test.bat", "UTF-8-RAW") ; plain "UTF-8" would write a BOM and the batch file won't run
Run("C:\test.bat")
TAC109
Posts: 1125
Joined: 02 Oct 2013, 19:41
Location: New Zealand

Re: To BOM or not to BOM?

25 Aug 2023, 23:43

This is because cmd.exe (the batch file interpreter) doesn’t understand UTF-8. (It partially understands an early version of UTF-16.) However, it can’t be updated by Microsoft without potentially breaking the millions of batch scripts out there.

Cheers
fade2gray
Posts: 85
Joined: 21 Apr 2015, 12:28

Re: To BOM or not to BOM?

26 Aug 2023, 06:11

20170201225639 wrote:
25 Aug 2023, 21:17
"UTF-8" (which means UTF-8 with BOM)
Are you saying that encoding with UTF-8 uses 'with BOM' by default - is that implied by your editor? I use VSCode and the encoding options, amongst others, are 'UTF-8' or 'UTF-8 with BOM'
fade2gray
Posts: 85
Joined: 21 Apr 2015, 12:28

Re: To BOM or not to BOM?

26 Aug 2023, 06:35

@kunkel321 Using VSCode, I found no difference between 'UTF-8' and 'UTF-8 with BOM' when pasting the hotstrings and saving the script; both encodings displayed the characters properly. However, the editor window receiving the hotstring output had to be set to 'UTF-8 with BOM', or the accents wouldn't be displayed.
boiler
Posts: 17206
Joined: 21 Dec 2014, 02:44

Re: To BOM or not to BOM?

26 Aug 2023, 06:42

fade2gray wrote: Are you saying that encoding with UTF-8 uses 'with BOM' by default - is that implied by your editor?
No, it has nothing to do with the editor. He was referring to files written by an AHK script and showed an example of the encoding he had to specify for it not to include the BOM because, as the documentation shows, specifying UTF-8 (without the -RAW) includes the BOM.
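
To see the difference for yourself, a small sketch like the following writes the same text under both encoding names and compares the file sizes. The two file names are hypothetical scratch files in the temp folder. If the documentation is right, the "UTF-8" file should come out 3 bytes larger because of the BOM.

```autohotkey
#Requires AutoHotkey v2.0

; Hypothetical scratch files in the temp folder.
withBom := A_Temp "\bom_test_utf8.txt"
noBom   := A_Temp "\bom_test_utf8_raw.txt"

; Start from scratch: the BOM is only written when FileAppend creates a new file.
for path in [withBom, noBom] {
    if FileExist(path)
        FileDelete(path)
}

text := "abc"
FileAppend(text, withBom, "UTF-8")     ; BOM followed by the text
FileAppend(text, noBom, "UTF-8-RAW")   ; the text only

report := "UTF-8: " FileGetSize(withBom) " bytes`n"
report .= "UTF-8-RAW: " FileGetSize(noBom) " bytes"
MsgBox(report)
```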
20170201225639
Posts: 144
Joined: 01 Feb 2017, 22:57

Re: To BOM or not to BOM?

26 Aug 2023, 10:24

fade2gray wrote:
26 Aug 2023, 06:11
20170201225639 wrote:
25 Aug 2023, 21:17
"UTF-8" (which means UTF-8 with BOM)
Are you saying that encoding with UTF-8 uses 'with BOM' by default - is that implied by your editor? I use VSCode and the encoding options, amongst others, are 'UTF-8' or 'UTF-8 with BOM'
As @boiler said, FileEncoding("UTF-8") is the "with BOM" option, so it corresponds to "UTF-8 with BOM" in editors such as VSCode, sublime text, and Notepad. FileEncoding("UTF-8-RAW") corresponds to the plain "UTF-8" in those editors.

However, it wasn't always like this, at least for Notepad. In Notepad, "UTF-8" used to mean "UTF-8 with BOM" just like in AHK, but Win10 build 1903 changed the meaning (see the quoted text and discussion in viewtopic.php?t=67763) and made "without BOM" the default, for the backwards compatibility reasons @TAC109 brought up.


The hotstring problems are I believe due to this
lexikos wrote:
05 Sep 2019, 23:19
Currently if a script file lacks a BOM, AutoHotkey defaults to interpreting it as ANSI.
It's worth mentioning that setting FileEncoding in a script file has no effect on how AHK will read that script file. That is determined, I think, by only two factors: (1) whether the script file begins with a BOM, and (2) whether the script was run with the command line switch /cp65001. Setting FileEncoding only affects the encoding used by the file operations performed while that script runs.
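
For a v1 script saved without a BOM, the /CP65001 switch can be passed when launching it. This is only a sketch: both paths below are hypothetical placeholders, so adjust them to your own install and script before trying it.

```autohotkey
#Requires AutoHotkey v2.0

; Both paths are hypothetical placeholders; adjust them for your system.
v1Exe  := "C:\Program Files\AutoHotkey\AutoHotkeyU64.exe"
script := A_ScriptDir "\legacy_no_bom.ahk"

if !FileExist(v1Exe) || !FileExist(script) {
    MsgBox("Edit the placeholder paths before running this sketch.")
    ExitApp
}

; /CP65001 overrides the codepage used to read the script file,
; so a BOM-less script is read as UTF-8 instead of ANSI.
Run('"' v1Exe '" /CP65001 "' script '"')
```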
lexikos
Posts: 9635
Joined: 30 Sep 2013, 04:07
Contact:

Re: To BOM or not to BOM?

26 Aug 2023, 21:44

20170201225639 wrote:
26 Aug 2023, 10:24
The hotstring problems are I believe due to this
lexikos wrote:
05 Sep 2019, 23:19
Currently if a script file lacks a BOM, AutoHotkey defaults to interpreting it as ANSI.
That is for v1 only. v2 defaults to UTF-8 when loading script files.
