isspace versus iswspace Topic is solved

jeeswg
Posts: 6904
Joined: 19 Dec 2016, 01:58
Location: UK

isspace versus iswspace

24 Nov 2018, 20:31

I'm looking for more information re. isspace versus iswspace, i.e. if any of the following information could be corrected/expanded.

isspace, iswspace, _isspace_l, _iswspace_l
https://msdn.microsoft.com/en-us/library/y13z34da.aspx

- The functions appear to be very different, and not Unicode/ANSI versions of the same function.
- Curiously, isspace expects an Int, and iswspace expects a UShort (if that is what wint_t is).
- I get quite different results for which characters are considered 'spaces' by each function. E.g. isspace does not consider Chr(160) as a space, and appears to consider hundreds/thousands of non-space characters as spaces.

Here is some test code, although it's conceivable that there are some errors in it.

Code: Select all

q:: ;test isspace and iswspace
vOutput1 := vOutput2 := vOutput3 := vOutput4 := ""
vCount1 := vCount2 := vCount3 := vCount4 := 0
Loop, 65535
{
	if DllCall("msvcrt\isspace", Int,A_Index, "Cdecl")
		vOutput1 .= A_Index ",", vCount1 += 1
	if DllCall("msvcrt\iswspace", UShort,A_Index, "Cdecl")
		vOutput2 .= A_Index ",", vCount2 += 1
	if DllCall("msvcrt\_isspace_l", Int,A_Index, Int,0, "Cdecl")
		vOutput3 .= A_Index ",", vCount3 += 1
	if DllCall("msvcrt\_iswspace_l", UShort,A_Index, Int,0, "Cdecl")
		vOutput4 .= A_Index ",", vCount4 += 1
}
Clipboard := Format("{}`r`n{}`r`n`r`n{}`r`n{}`r`n`r`n{}`r`n{}`r`n`r`n{}`r`n{}", vCount1, vOutput1, vCount2, vOutput2, vCount3, vOutput3, vCount4, vOutput4)
MsgBox, % "done"
return

Note: the AHK source code uses '_istspace' inside the StrToTitleCase function in util.h. I couldn't find which Winapi functions this function uses.
Helgef
Posts: 4106
Joined: 17 Jul 2016, 01:02

Re: isspace versus iswspace

25 Nov 2018, 04:30

jeeswg wrote: isspace, iswspace, _isspace_l, _iswspace_l
https://msdn.microsoft.com/en-us/library/y13z34da.aspx

Did you even read this yourself?

jeeswg wrote: E.g. isspace does not consider Chr(160) as a space

From the documentation: "isspace returns a nonzero value if c is a white-space character (0x09 – 0x0D or 0x20)." 160 is 0xA0 in hex.

Also: "The behavior of isspace and _isspace_l is undefined if c is not EOF or in the range 0 through 0xFF, inclusive." Your test script is clearly out of range for the most part.

jeeswg wrote: , and appears to consider hundreds/thousands of non-space characters as spaces.

Run this in v2 and compare to your list for isspace:

Code: Select all

msgbox 0x09 	
msgbox 0x109 
msgbox 0x209 
msgbox 0x309
; ...

iswspace is the corresponding wide-character function; your test calls it correctly as far as I can tell.

As for the _l functions, _locale_t is not an int; it is a pointer to the locale to use.

Cheers.

Edit:
jeeswg wrote: Note: the AHK source code uses '_istspace' inside the StrToTitleCase function in util.h. I couldn't find which Winapi functions this function uses.

_istspace is not a function; it is a macro, defined as iswspace in Unicode builds and as isspace otherwise.
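
A minimal sketch of how this kind of macro switch works, for readers who have not seen tchar.h. The names MY_UNICODE, my_tchar, MY_T and my_istspace are hypothetical stand-ins invented for this example; the real definitions are in tchar.h and may differ in detail. The unsigned char cast in the narrow branch is an addition for safety, not something taken from tchar.h.

```cpp
#include <cctype>
#include <cwctype>

// Hypothetical stand-ins for _UNICODE / TCHAR / _T / _istspace.
// The real definitions live in tchar.h; this only shows the mechanism.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(x) L##x
#define my_istspace(c) std::iswspace(c)
#else
typedef char my_tchar;
#define MY_T(x) x
// The cast keeps negative char values out of isspace's undefined range.
#define my_istspace(c) std::isspace(static_cast<unsigned char>(c))
#endif

// Written once; the preprocessor picks the narrow or the wide classifier.
inline bool starts_with_space(const my_tchar* s) {
    return s[0] != 0 && my_istspace(s[0]) != 0;
}
```

Compiling the same source with and without MY_UNICODE defined gives two different calls behind one name, which is why searching for a function called '_istspace' finds nothing.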
jeeswg
Posts: 6904
Joined: 19 Dec 2016, 01:58
Location: UK

Re: isspace versus iswspace

25 Nov 2018, 08:25

- Thanks Helgef. I fixed the script (0xFF limit and Ptr instead of Int); the corrected version is below. Comparing the results, the functions are almost Unicode/ANSI equivalents of each other, with Chr(160) being an exception.
- If you run the title case script below, in AHK Unicode and AHK ANSI, you will see that in AHK Unicode, a letter after a Chr(160) is capitalised, but not so in AHK ANSI.

Code: Select all

q:: ;test isspace and iswspace
vOutput1 := vOutput2 := vOutput3 := vOutput4 := ""
vCount1 := vCount2 := vCount3 := vCount4 := 0
Loop, 65535
{
	if (A_Index <= 0xFF)
	&& DllCall("msvcrt\isspace", Int,A_Index, "Cdecl")
		vOutput1 .= A_Index ",", vCount1 += 1
	if DllCall("msvcrt\iswspace", UShort,A_Index, "Cdecl")
		vOutput2 .= A_Index ",", vCount2 += 1
	if (A_Index <= 0xFF)
	&& DllCall("msvcrt\_isspace_l", Int,A_Index, Ptr,0, "Cdecl")
		vOutput3 .= A_Index ",", vCount3 += 1
	if DllCall("msvcrt\_iswspace_l", UShort,A_Index, Ptr,0, "Cdecl")
		vOutput4 .= A_Index ",", vCount4 += 1
}
Loop, 4
	vOutput%A_Index% := SubStr(vOutput%A_Index%, 1, -1)
Clipboard := Format("{}`r`n{}`r`n`r`n{}`r`n{}`r`n`r`n{}`r`n{}`r`n`r`n{}`r`n{}", vCount1, vOutput1, vCount2, vOutput2, vCount3, vOutput3, vCount4, vOutput4)
MsgBox, % "done"
return

;isspace (6 chars):
;9,10,11,12,13,32
;iswspace (25 chars):
;9,10,11,12,13,32,160,5760,6158,8192,8193,8194,8195,8196,8197,8198,8199,8200,8201,8202,8232,8233,8239,8287,12288

w:: ;test title case (slightly different results in AHK Unicode and AHK ANSI)
VarSetCapacity(vAnsi, 256*2, 0)
Loop, 256
	NumPut(Ord("a"), &vAnsi, A_Index*2-2, "UChar")
	, NumPut(A_Index, &vAnsi, A_Index*2-1, "UChar")
vText := StrGet(&vAnsi, "CP0")
Clipboard := Format("{:T}", vText)
return
- I meant to say '_istspace' 'function' (with apostrophes). There are a lot of 'functions' with 't' in their name, that are really macros, that resolve to different code based on whether the exe is Unicode or ANSI.
- Anyhow, usually I can find where these macros are defined, but in this case, I could only find one reference to '_istspace', where it was used, but not where it was defined. It would be helpful to find the code.
- Here are some tests re. the size of various types; I found the information on the Internet about them very vague.

Code: Select all

	std::cout << sizeof(wchar_t) << std::endl; //2:2 (64-bit:32-bit)
	std::cout << sizeof(wint_t) << std::endl; //2:2
	std::cout << sizeof(_locale_t) << std::endl; //8:4
- I found this unclear: 'is undefined if c is not EOF or in the range 0 through 0xFF'. Not 'end of file'? Not Chr(26)?
- It's curious that isspace uses Int (rather than UChar), whereas iswspace using UShort makes sense.
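
As a cross-check on the six-character isspace result listed earlier in this post, here is a minimal sketch in standard C++ that stays inside the defined range (0 through 0xFF). The function name narrow_space_codes is made up for this example. It assumes the program is still in the default "C" locale, where only the six standard white-space characters qualify; other locales may report more.

```cpp
#include <cctype>
#include <vector>

// Returns every value in 0..0xFF that std::isspace accepts in the current locale.
std::vector<int> narrow_space_codes() {
    std::vector<int> codes;
    for (int c = 0; c <= 0xFF; ++c) {  // never leave the defined range
        if (std::isspace(c)) codes.push_back(c);
    }
    return codes;
}
// In the default "C" locale the result is 9, 10, 11, 12, 13, 32.
```

No equivalent claim is made here for iswspace, since which wide characters count as spaces depends on the C library and locale in use.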

==================================================

- I see what you mean by your cryptic AHK v2 MsgBox reference. There is a recurring pattern for isspace, made clearer if you display the numbers as hex rather than dec. What I wanted to understand was the difference between isspace and _isspace_l in the range above 0xFF (although since that behaviour is undefined, it doesn't have to make sense).

Code: Select all

;before:
	if DllCall("msvcrt\isspace", Int,A_Index, "Cdecl")
		vOutput1 .= A_Index ",", vCount1 += 1
;after:
	if DllCall("msvcrt\isspace", Int,A_Index, "Cdecl")
		vOutput1 .= Format("0x{:X}", A_Index) ",", vCount1 += 1
- Here's some code to investigate the behaviour of _isspace_l in the undefined range, but it doesn't reveal much. isspace is quite transparent however.

Code: Select all

q:: ;test isspace and _isspace_l (0x01 to 0xFF and in undefined range above 0xFF)
vOutput1 := vOutput2 := ""
vCount1 := vCount2 := 0
Loop, 65535
{
	if DllCall("msvcrt\isspace", Int,A_Index, "Cdecl")
		vOutput1 .= Format("0x{:04X}", A_Index) ",", vCount1 += 1
	if DllCall("msvcrt\_isspace_l", Int,A_Index, Ptr,0, "Cdecl")
		;vOutput2 .= Format("0x{:04X}", A_Index) ",", vCount2 += 1
		vOutput2 .= ".", vCount2 += 1
	if !(A_Index & 0xFF)
		vOutput1 .= "`r`n", vOutput2 .= "`r`n"
}
Loop, 2
	vOutput%A_Index% := SubStr(vOutput%A_Index%, 1, -1)
Clipboard := Format("{}`r`n{}`r`n`r`n{}`r`n{}", vCount1, vOutput1, vCount2, vOutput2)
MsgBox, % "done"
return
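
The recurring pattern hinted at by 0x09, 0x109, 0x209, 0x309 can be modelled with a small sketch. This is only a model of the observed pattern, under the assumption that nothing but the low byte of the argument matters; it is not a claim about how msvcrt implements isspace, and the behaviour above 0xFF remains undefined. The names low_byte_is_space and sweep_16bit are hypothetical.

```cpp
#include <vector>

// Assumption for illustration: only the low byte of c is looked at.
bool low_byte_is_space(int c) {
    int b = c & 0xFF;
    return (b >= 0x09 && b <= 0x0D) || b == 0x20;
}

// Mirrors the AHK loop: sweep 1..65535 and collect every value that is flagged.
std::vector<int> sweep_16bit() {
    std::vector<int> hits;
    for (int c = 1; c <= 0xFFFF; ++c) {
        if (low_byte_is_space(c)) hits.push_back(c);
    }
    return hits;
}
```

Under that assumption the six real white-space codes repeat once per block of 0x100, so a 16-bit sweep flags 6 × 256 = 1536 values, which would account for 'hundreds/thousands' of apparent spaces.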
Helgef
Posts: 4106
Joined: 17 Jul 2016, 01:02

Re: isspace versus iswspace  Topic is solved

25 Nov 2018, 10:59

jeeswg wrote: I found this unclear: 'is undefined if c is not EOF or in the range 0 through 0xFF'. Not 'end of file'? Not Chr(26)?
It's curious that isspace uses Int (rather than UChar), whereas iswspace using UShort makes sense.

EOF is not a character, and hence it is not Chr(26) either. EOF is a macro, often defined as -1, but it can be some other negative value; it should be used as an int. The point is that it lies outside the range of character values (0 through 0xFF), and I/O functions use it to indicate end of file or an error. That is why isspace (and many other character functions) takes an int: you can call it directly with the result from an I/O function.

jeeswg wrote: I could only find one reference to '_istspace',

Look in tchar.h.

Cheers.
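
To make the int/EOF point concrete, here is a minimal sketch that reads characters as int and hands them straight to isspace. The function name count_spaces is invented for this example, and a string stream stands in for a real file; its get() returns either a value in 0 through 0xFF or EOF, like the C I/O functions do.

```cpp
#include <cctype>
#include <cstdio>
#include <sstream>
#include <string>

// Counts white-space characters by passing the int returned by get()
// directly to std::isspace, with no conversion in between.
int count_spaces(const std::string& text) {
    std::istringstream in(text);
    int count = 0;
    int c;  // int, not char: it must hold EOF as well as 0..0xFF
    while ((c = in.get()) != EOF) {
        if (std::isspace(c)) ++count;
    }
    return count;
}
```

If c were declared as a char, EOF could not be told apart from a legitimate character value, which is the reason the parameter type is int rather than UChar.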
jeeswg
Posts: 6904
Joined: 19 Dec 2016, 01:58
Location: UK

Re: isspace versus iswspace

25 Nov 2018, 11:27

- Thanks a lot Helgef.
- The AHK source code zip does not contain tchar.h, but it is referenced in a file in the zip: stdafx.h, '#include <tchar.h>'.
- Curiously, tchar.h is not present here either (a folder created by Visual Studio):
C:\Program Files (x86)\Windows Kits\8.1
- However, I did find it in two places:
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\tchar.h
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\tchar.h
- Indeed _istspace resolves to iswspace or isspace.
- What has surprised me is that I would have expected a char to be cast to an int at some point, for use with isspace. So I'm wondering if/where that's done.

- Great info, cheers. I've been coming across various curious little problems after having gone through much of the AHK source code. Thanks re. EOF and tchar.h, and bit fields mentioned here:
list of structs with parameters (sizes and types) - AutoHotkey Community
https://autohotkey.com/boards/viewtopic ... 32#p249732
- Did you come across tchar.h before? It seems quite a specific thing to know. Cheers.
Helgef
Posts: 4106
Joined: 17 Jul 2016, 01:02

Re: isspace versus iswspace

25 Nov 2018, 13:42

jeeswg wrote: What has surprised me is that I would have expected a char to be cast to an int at some point, for use with isspace. So I'm wondering if/where that's done.

In C++, a char is implicitly converted to int when passed to a function that expects an int.

Cheers.
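
One caveat goes beyond what was said in the thread: on platforms where plain char is signed, the implicit conversion turns a byte such as 0xA0 into a negative int, which is outside the range isspace is defined for. A common precaution is to convert through unsigned char first. A minimal sketch, with the hypothetical helper name is_space_char:

```cpp
#include <cctype>

// Converting through unsigned char keeps the value in 0..0xFF,
// even when plain char is signed and the byte is 0x80 or above.
bool is_space_char(char ch) {
    return std::isspace(static_cast<unsigned char>(ch)) != 0;
}
```

In the default "C" locale this returns true for ' ' and false for the byte 0xA0, matching the narrow isspace results reported earlier in the thread.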
