Basic question about the string terminator in ahk Topic is solved

autocart
Posts: 165
Joined: 12 May 2014, 07:42

Basic question about the string terminator in ahk

20 Jun 2019, 13:28

Hi all,

I am wondering how StrGet(addressA) figures out the string length when I don't explicitly specify it as a parameter of the function.
Does AHK use a string terminator? If so, is it the null-terminator mentioned in the help? (The help does not seem very clear to me, or I did not find the right page. For reference: https://www.autohotkey.com/docs/commands/StrPutGet.htm)

Thanks for any helpful feedback. Regards, S.
teadrinker
Posts: 1718
Joined: 29 Mar 2015, 09:41

Re: Basic question about the string terminator in ahk

20 Jun 2019, 15:49

autocart wrote: If so, is it the null-terminator mentioned in the help?
Yes. In the Unicode version, it is two null bytes.
autocart
Posts: 165
Joined: 12 May 2014, 07:42

Re: Basic question about the string terminator in ahk

20 Jun 2019, 15:52

Thx and in the ANSI version, I assume it is a one byte null character?

Besides, if I wrote the string to memory somehow without a null-terminator character at all, and I don't specify a length in StrGet, what happens then?
teadrinker
Posts: 1718
Joined: 29 Mar 2015, 09:41

Re: Basic question about the string terminator in ahk

20 Jun 2019, 15:57

autocart wrote: in the ANSI version, I assume it is a one byte null character?
Right.
autocart wrote: if I wrote the string to memory somehow without a null-terminator character at all, and I don't specify a length in StrGet, what happens then?
AHK will keep reading bytes, past the end of your data if necessary, until the first null byte appears (in ANSI).
autocart
Posts: 165
Joined: 12 May 2014, 07:42

Re: Basic question about the string terminator in ahk

20 Jun 2019, 16:00

Thank you very much, teadrinker,

do you think it is different in the Unicode version, since you wrote "(in ANSI)"?
teadrinker
Posts: 1718
Joined: 29 Mar 2015, 09:41

Re: Basic question about the string terminator in ahk  Topic is solved

20 Jun 2019, 16:05

In the unicode version AHK will read two bytes at a time until two zero bytes are encountered.
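
A minimal sketch of that behaviour for AutoHotkey v1.1 (the variable name buf and the 16-byte size are made up for illustration). The buffer is zero-filled first, so the search for a terminator is certain to stop inside memory the script owns, and an explicit Encoding is passed so the result should not depend on whether the ANSI or the Unicode build runs it:

```autohotkey
; Illustrative only: buf and the 16-byte size are arbitrary choices.
VarSetCapacity(buf, 16, 0)              ; 16 bytes, all zero

StrPut("abc", &buf, "UTF-16")           ; writes 61 00 62 00 63 00, then a two-byte null
MsgBox, % StrGet(&buf, "UTF-16")        ; no Length: reads 16-bit units up to the two-byte null -> abc
MsgBox, % StrGet(&buf, 2, "UTF-16")     ; Length = 2: reads at most two characters -> ab

VarSetCapacity(buf, 16, 0)              ; zero the buffer again
StrPut("abc", &buf, "CP0")              ; writes 61 62 63, then a one-byte null
MsgBox, % StrGet(&buf, "CP0")           ; no Length: reads bytes up to the one-byte null -> abc
```

If the data had been written without a terminator into a buffer that was not zero-filled, the calls without Length would continue into whatever follows the data, which is why passing Length is the safer choice for such buffers.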
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Basic question about the string terminator in ahk

20 Jun 2019, 20:22

- ANSI and UTF-8 use 1 null byte as the null character.
- UTF-16 uses 2 null bytes (at an even offset) as the null character.
- StrGet/StrPut should work identically on AHK v1.1 ANSI/AHK Unicode. [EDIT:] One exception is that when Encoding is omitted: it is UTF-16 in AHK Unicode, and CP0 in AHK ANSI.

- Based on some tests:
- A script should load correctly with AHK v1.1 (AHK v1.1 ANSI/AHK Unicode), as long as the script is ANSI/UTF-8 (with a BOM)/UTF-16 LE (with a BOM).
- If AHK v1.1 ANSI tries to run a script that contains literal non-ASCII characters, those literal non-ASCII characters are converted to their 'best-fit' equivalent, e.g. square root to 'v', or otherwise to a question mark.
- AHK v1.0 (AHK Basic) can only handle ANSI files. It can open a UTF-8 file with a BOM (it ignores the BOM), but it treats the file as though it's ANSI, e.g. square root (√) to 'âˆš'.
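
The terminator widths from the first two bullets can be made visible by dumping the bytes that StrPut writes. This is only a sketch for AutoHotkey v1.1: HexBytes is a hypothetical helper written for this example, not an AHK built-in, and the buffer names and sizes are arbitrary.

```autohotkey
; Hypothetical helper (not part of AHK): hex dump of the first count bytes at addr.
HexBytes(addr, count) {
    out := ""
    Loop % count
        out .= Format("{:02X} ", NumGet(addr + 0, A_Index - 1, "UChar"))
    return out
}

VarSetCapacity(b8, 8, 0xFF)             ; pre-fill with FF so the written null bytes stand out
StrPut("ab", &b8, "UTF-8")              ; one byte per character here, one-byte null
MsgBox, % HexBytes(&b8, 4)              ; 61 62 00 FF

VarSetCapacity(b16, 8, 0xFF)
StrPut("ab", &b16, "UTF-16")            ; two bytes per character, two-byte null
MsgBox, % HexBytes(&b16, 8)             ; 61 00 62 00 00 00 FF FF
```

Because Encoding is given explicitly, the bytes should be the same on the ANSI and Unicode builds.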
Last edited by jeeswg on 20 Jun 2019, 21:29, edited 1 time in total.
homepage | tutorials | wish list | fun threads | donate
WARNING: copy your posts/messages before hitting Submit as you may lose them due to CAPTCHA
teadrinker
Posts: 1718
Joined: 29 Mar 2015, 09:41

Re: Basic question about the string terminator in ahk

20 Jun 2019, 20:58

jeeswg wrote: StrGet/StrPut should work identically on AHK v1.1 ANSI/AHK Unicode
Not exactly. StrGet(&str) in the ANSI version interprets the byte array as an ANSI string, while in the Unicode version it interprets the byte array as a Unicode (UTF-16) string.
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Basic question about the string terminator in ahk

20 Jun 2019, 21:37

- Thanks teadrinker, I've added a clarification above.
- Here is a test demonstrating what happens when the Encoding parameter is omitted.
- (I've read through the StrPut/StrGet documentation multiple times, but haven't noticed any other AHK v1.1 Unicode/ANSI differences.)

Code: Select all

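q:: ;test StrGet/StrPut (AHK v1.1 Unicode/ANSI)
VarSetCapacity(vData, 10*2, 0) ;20 zero-filled bytes
Loop 4 ;write the bytes 0x61..0x64 ('a'..'d') at offsets 0..3
	NumPut(96+A_Index, &vData, A_Index-1, "UChar")
MsgBox, % StrGet(&vData) ;ANSI: abcd / Unicode: 2 UTF-16 chars (0x6261, 0x6463)

VarSetCapacity(vData, 10*2, 0)
MsgBox, % StrPut("abcd", &vData) ;no Encoding: written in the native encoding
MsgBox, % Format("0x{:08X}", NumGet(&vData, 0, "UInt")) ;ANSI: 0x64636261 / Unicode: 0x00620061
return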
- There are some issues with the documentation:
- Under 'Encoding', it half-implies that omitting the parameter means CP0, but actually omitting it means UTF-16 (on Unicode versions) or CP0 (on ANSI versions). (As omitting a parameter and using a blank string are commonly equivalent in functions.)
Specify an empty string or "CP0" to use the system default ANSI code page.
- Also, people might expect Encoding to match A_FileEncoding, if Encoding is omitted.

- Under 'Encoding', it should say something like:
- If Encoding is not specified, it is UTF-16 (on Unicode versions) or CP0 (on ANSI versions).
- The documentation does say this, but its meaning is not immediately apparent:
If no Encoding is specified, the string is simply measured or copied without any conversion taking place.
- I've mentioned the problem, here:
Suggestions on documentation improvements - Page 29 - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=13&t=1434&p=281716#p281716
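
A short sketch of the proposed wording for AutoHotkey v1.1, assuming that an omitted Encoding behaves like the build's native encoding. The variable name native is made up for this example; A_IsUnicode is the built-in variable that contains 1 on Unicode builds and is blank on ANSI builds.

```autohotkey
; native is an illustrative name for "the encoding this executable uses for strings".
native := A_IsUnicode ? "UTF-16" : "CP0"

str := "abcd"
; Read str's own buffer once without Encoding and once with the native encoding spelled out.
; If the assumption holds, both calls return the same text.
MsgBox, % (StrGet(&str) == StrGet(&str, native)) ? "same" : "different"
```

Spelling the encoding out like this makes a script's intent independent of which build runs it.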
teadrinker
Posts: 1718
Joined: 29 Mar 2015, 09:41

Re: Basic question about the string terminator in ahk

20 Jun 2019, 22:14

teadrinker wrote:
20 Jun 2019, 16:05
In the unicode version AHK will read two bytes at a time until two zero bytes are encountered.
Of course, I meant "if both parameters (length and encoding) are omitted". :)
Helgef
Posts: 4440
Joined: 17 Jul 2016, 01:02

Re: Basic question about the string terminator in ahk

21 Jun 2019, 00:43

remarks wrote:Note that the String parameter of StrPut and return value of StrGet are always in the native encoding of the current executable, whereas Encoding specifies the encoding of the string written to or read from the given Address. If no Encoding is specified, the string is simply measured or copied without any conversion taking place.
No Encoding parameter means no conversion, and the return value is in the native encoding; it is pretty clear, IMO.

omitting a parameter and using a blank string are commonly equivalent in functions.
Any example?
- Also, people might expect Encoding to match A_FileEncoding, if Encoding is omitted.
FileEncoding wrote:Sets the default encoding for FileRead, FileReadLine, Loop Read, FileAppend, and FileOpen.
Cheers.
teadrinker
Posts: 1718
Joined: 29 Mar 2015, 09:41

Re: Basic question about the string terminator in ahk

21 Jun 2019, 01:05

If no Encoding is specified, the string is simply measured or copied without any conversion taking place
IMO, that is not quite an accurate definition.

Code: Select all

str := "hello"
VarSetCapacity(buff, StrPut(str, "UTF-8"))
StrPut(str, &buff, "UTF-8")
MsgBox, % StrGet(&buff)
The conversion did take place, didn't it? :)
Helgef
Posts: 4440
Joined: 17 Jul 2016, 01:02

Re: Basic question about the string terminator in ahk

21 Jun 2019, 01:39

Hi teadrinker :).
Your example works exactly as I expected, and the definition is accurate, there is no conversion when you omit the encoding parameter. (Edit: Disregarding that the string is not guaranteed to be null terminated on Unicode build, which could cause undefined behaviour ;) )

Cheers :tea:
teadrinker
Posts: 1718
Joined: 29 Mar 2015, 09:41

Re: Basic question about the string terminator in ahk

21 Jun 2019, 02:23

Hi @Helgef
Yes, you are right, now I see. :?
Cheers 8-)
autocart
Posts: 165
Joined: 12 May 2014, 07:42

Re: Basic question about the string terminator in ahk

21 Jun 2019, 03:51

Hi all,
thx for all of your feedback.
Helgef wrote:
21 Jun 2019, 01:39
(Edit: Disregarding that the string is not guaranteed to be null terminated on Unicode build, which could cause undefined behaviour ;) )
Are you saying that StrPut will sometimes add a null-terminator and sometimes not?
Helgef
Posts: 4440
Joined: 17 Jul 2016, 01:02

Re: Basic question about the string terminator in ahk

21 Jun 2019, 04:18

How the buffer is filled in the example doesn't matter. A UTF-16 string needs a two-byte null to be terminated, while a UTF-8 string needs only one byte. This means that if you try to interpret what was meant to be a UTF-8 string as UTF-16, the double zero isn't guaranteed.

Disregarding that,
return value wrote: If Length is exactly the length of the converted string, the string is not null-terminated; otherwise the returned count includes the null-terminator.
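
A sketch of what the quoted return-value note implies for AutoHotkey v1.1 (buf and its size are arbitrary). The buffer is pre-filled with 0xFF so that a terminator, if one is written, shows up as 0x00 in the byte right after the text:

```autohotkey
VarSetCapacity(buf, 8, 0xFF)            ; pre-fill with FF to make a missing terminator visible
StrPut("abc", &buf, 3, "UTF-8")         ; Length = 3 is exactly the string length
; Per the quoted documentation no terminator is written, so this should still show 0xFF.
MsgBox, % Format("0x{:02X}", NumGet(&buf, 3, "UChar"))
MsgBox, % StrGet(&buf, 3, "UTF-8")      ; pass Length when reading an unterminated string -> abc

VarSetCapacity(buf, 8, 0xFF)
StrPut("abc", &buf, 4, "UTF-8")         ; Length = 4 leaves room for the terminator
; Here the byte after "abc" should be the null terminator, 0x00.
MsgBox, % Format("0x{:02X}", NumGet(&buf, 3, "UChar"))
```

Omitting Length in StrPut, as in the earlier examples in this thread, always writes the terminator.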
autocart
Posts: 165
Joined: 12 May 2014, 07:42

Re: Basic question about the string terminator in ahk

21 Jun 2019, 04:39

Got it. You were referring only to that particular example. I thought that was a general statement. Thx.
Helgef
Posts: 4440
Joined: 17 Jul 2016, 01:02

Re: Basic question about the string terminator in ahk

21 Jun 2019, 04:56

The general statement would be that you shouldn't specify the wrong encoding, either explicitly or by omitting the parameter.
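
A sketch of the difference in AutoHotkey v1.1, based on teadrinker's UTF-8 example above. The buffer here is deliberately larger than needed and zero-filled (an assumption added for safety), so the mismatched read stops inside the buffer instead of running into unrelated memory:

```autohotkey
str := "hello"
VarSetCapacity(buff, 16, 0)             ; zero-filled and oversized on purpose
StrPut(str, &buff, "UTF-8")             ; writes 68 65 6C 6C 6F 00

; Matching encoding: the bytes are decoded as they were written.
MsgBox, % StrGet(&buff, "UTF-8")        ; hello

; Wrong encoding: the same bytes are read as the 16-bit units 0x6568, 0x6C6C, 0x006F.
; The result is not "hello", and the read only stops because of the zero fill.
MsgBox, % StrGet(&buff, "UTF-16")
```

Omitting Encoding on a Unicode build is equivalent to the second call, which is the case Helgef's edit refers to.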
autocart
Posts: 165
Joined: 12 May 2014, 07:42

Re: Basic question about the string terminator in ahk

21 Jun 2019, 14:41

[EDIT] Never mind. Found the answer in the next post on the next page.[/EDIT]

Help,

this test example with a 20 char string does not work right:

Code: Select all

string := "12345678901234567890" ;"TEST"
address := ""
StrPut(string, &address, 21)
InputBox, OutputVar, , , , , , , , , , % """" StrGet(&address) """"
The result I get back is "12345678901234567890ݸȚ" (that is how it pastes here, although in the dialog it looked like in the picture)
[Attachment: StrPutGet problem.JPG]
and the script seems to crash as well (upon closing the dialog) since the tray icon does not go away until I hover the mouse over it (without clicking).

The returned string seems to be ok with 19 chars and less but the tray icon still hangs.
The tray icon only disappears on its own with 12 chars or less.

Ideas, anyone?
Last edited by autocart on 21 Jun 2019, 19:34, edited 1 time in total.
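
One likely cause, offered as a sketch and not necessarily the answer given on the next page: the variable address is empty, so it has no buffer large enough for the 21 characters that StrPut is asked to write, and the write overruns memory the variable does not own. A variant for AutoHotkey v1.1 that reserves the space first (buf and chars are names made up for this example):

```autohotkey
string := "12345678901234567890"
chars := StrLen(string) + 1                          ; 20 characters plus the null terminator
VarSetCapacity(buf, chars * (A_IsUnicode ? 2 : 1), 0) ; capacity is in bytes: 2 per char on Unicode builds
StrPut(string, &buf, chars)                          ; now writes only inside buf
MsgBox, % """" StrGet(&buf) """"                     ; "12345678901234567890"
```

Since no Encoding is passed, both calls use the native encoding of the running build, so the round trip involves no conversion.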
