There is currently no efficient method to test strings for a prefix, suffix, or random-index character. As far as I can tell, we only have the ability to use SubStr to extract a new string containing the specific character, then compare it to our target. It would be great to have the ability to simply check a specific index character value. Something like if mystring[A_Index] == "X" .
Thanks!
Optimized access to string characters
Re: Optimized access to string characters
Is that a problem? Your suggestion is to add a shortcut (shorthand syntax) for doing exactly the same thing.As far as I can tell, we only have the ability to use SubStr to extract a new string containing the specific character, then compare it to our target.
There are other options, but SubStr-comparison is likely to be the most efficient, especially when you are only extracting one character.
You can make string[] a shortcut for SubStr by adding a __Get meta-function to the default base object (v1) or adding a parameterized __Item property to String.Prototype (v2).
How do you gauge efficiency? Is it about your efficiency, or the program's? What is the criteria for a method to be judged as not efficient?
Re: Optimized access to string characters
Like this?
Code: Select all
text:="randomtext"
needle:="ex"
MsgBox % DllCall("msvcrt\wcsncmp","PTR",(&text)+14,"Str",needle,"Int",2)
Re: Optimized access to string characters
That is less efficient.
Re: Optimized access to string characters
This is how I implemented the above for v1.lexikos wrote: ↑26 Feb 2021, 20:09You can make string[] a shortcut for SubStr by adding a __Get meta-function to the default base object (v1) or adding a parameterized __Item property to String.Prototype (v2).
Code: Select all
"".base.__Get := Func("Default_Get")
Default_Get(var, index) {
return SubStr(var, index, 1)
}
Str := "hello world", Index := 5
MsgBox % Str[Index] " == " "hello world"[Index] " == " SubStr(Str, Index, 1)
Code: Select all
DefProp := {}.GetMethod("DefineProp")
%DefProp%("".base, "__Item", {get : (var, index) => SubStr(var, index, 1)})
Str := "hello world", Index := 5
MsgBox Str[Index] " == " "hello world"[Index] " == " SubStr(Str, Index, 1)
Windows 10 Pro (64 bit) - AutoHotkey v2.0+ (Unicode 64-bit)
Re: Optimized access to string characters
Hi @lexikos,
I have a follow up question: How would I go about removing a __Get meta-function to restore the default behavior?
adds a __Get meta-function to the default base object but it's not clear how I would undo that.
I have a follow up question: How would I go about removing a __Get meta-function to restore the default behavior?
Code: Select all
"".base.__Get := Func("Default_Get")
Windows 10 Pro (64 bit) - AutoHotkey v2.0+ (Unicode 64-bit)
Re: Optimized access to string characters
All you did was assign (and thereby create) a property/key-value pair. To undo that, delete what you created.
Re: Optimized access to string characters
Hi @lexikos,
Thank you for your reply. I figured as much. The sticky issue for me was not realizing that I had to allow for the object base to be returned by the Default_Get function after assigning the __Get meta-function to the default base object.
The code below works for v1.
Code: Select all
#NoEnv
Str := "hello world", Index := 5
MsgBox % Str[Index] " == " "hello world"[Index] " != " SubStr(Str, Index, 1)
"".base.__Get := Func("Default_Get")
MsgBox % Str[Index] " == " "hello world"[Index] " == " SubStr(Str, Index, 1)
"".base.Delete("__Get")
MsgBox % Str[Index] " == " "hello world"[Index] " != " SubStr(Str, Index, 1)
Default_Get(var, key) {
if (key != "base")
return SubStr(var, key, 1)
}
Code: Select all
Str := "hello world", Index := 5
try
MsgBox Str[Index] " == " "hello world"[Index] " != " SubStr(Str, Index, 1)
catch
MsgBox 'Error: This value of type "String" has no property named "__Item".'
DefProp := {}.GetMethod("DefineProp")
%DefProp%("".base, "__Item", {get : (var, index) => SubStr(var, index, 1)})
MsgBox Str[Index] " == " "hello world"[Index] " == " SubStr(Str, Index, 1)
DelProp := {}.GetMethod("DeleteProp")
%DelProp%("".base, "__Item")
try
MsgBox Str[Index] " == " "hello world"[Index] " != " SubStr(Str, Index, 1)
catch
MsgBox 'Error: This value of type "String" has no property named "__Item".'
Windows 10 Pro (64 bit) - AutoHotkey v2.0+ (Unicode 64-bit)
Re: Optimized access to string characters
You may also use Func("SubStr").Bind(,, 1) in place of (var, index) => SubStr(var, index, 1).
With a future release (current 'x' branch), SubStr.Bind(,, 1) will work.
With a future release (current 'x' branch), SubStr.Bind(,, 1) will work.
Re: Optimized access to string characters
Thank you, @lexikos. I had forgotten about the new Bind() feature.
Windows 10 Pro (64 bit) - AutoHotkey v2.0+ (Unicode 64-bit)
Re: Optimized access to string characters
Perhaps biga.ahk can be of some use:
Code: Select all
A := new biga() ; requires https://www.npmjs.com/package/biga.ahk
msgbox, % A.startsWith("NOT REQUIRED", "NOT")
; => true
msgbox, % A.endsWith("NOT REQUIRED", "D")
; => true
; get a random character:
msgbox, % A.sample(strSplit("NOT REQUIRED"))
; => "E"
https://biga-ahk.github.io/biga.ahk/#/?id=startswith
https://biga-ahk.github.io/biga.ahk/#/?id=endswith
https://biga-ahk.github.io/biga.ahk/#/?id=sample
Re: Optimized access to string characters
I'm sorry, I didn't mean to touch a soft spot. I apologize for my poor choice of words. I wasn't implying that any of the existing functionality is inefficient. Just that SubStr() is doing a lot more work than what is necessary to achieve the desired result. If we just want to look at a single character's value, we are required to create an entirely new, separate string in memory. I assume the new string is held in heap memory, etc? Even if it uses the stack or a very efficient recycling system, its still doing a lot more than necessary to compare a single character value to another value (memory + index lookup).lexikos wrote: ↑26 Feb 2021, 20:09Is that a problem? Your suggestion is to add a shortcut (shorthand syntax) for doing exactly the same thing.As far as I can tell, we only have the ability to use SubStr to extract a new string containing the specific character, then compare it to our target.
There are other options, but SubStr-comparison is likely to be the most efficient, especially when you are only extracting one character.
You can make string[] a shortcut for SubStr by adding a __Get meta-function to the default base object (v1) or adding a parameterized __Item property to String.Prototype (v2).
How do you gauge efficiency? Is it about your efficiency, or the program's? What is the criteria for a method to be judged as not efficient?
If it would take too much time to implement, or would be too difficult to implement, then that makes sense. Otherwise, I personally think the functionality should be available in such a string-friendly language. You can do tons of amazing things with strings, but not the most basic operation.
EDIT: I personally won't be adding a shortcut to pretend this operation exists, but I do recommend faking it officially. If for no other reason, just to represent a possible future upgrade. This would allow script writers to make use of the optimized string[index] syntax, allowing things to fall into place if/when the functionality is eventually added. Or if you don't like array syntax, maybe GetCharAtIndex(string,index).
Re: Optimized access to string characters
I think you misread the tone of my post and have made some poor assumptions about how strings are implemented and what the performance would be.
What makes you think that indexing into the string and having it return a single-character string (for comparison, as in your hypothetical example) would be implemented or perform any differently to SubStr indexing into the string and having it return a single-character string? It's literally doing the exact same thing, but the indexer would be invoked through an additional layer of abstraction (invoking a virtual property vs calling a function directly) whereas SubStr has one more parameter.
What makes you think that indexing into the string and having it return a single-character string (for comparison, as in your hypothetical example) would be implemented or perform any differently to SubStr indexing into the string and having it return a single-character string? It's literally doing the exact same thing, but the indexer would be invoked through an additional layer of abstraction (invoking a virtual property vs calling a function directly) whereas SubStr has one more parameter.
Re: Optimized access to string characters
sounds like he wants std::string_view implemented
Re: Optimized access to string characters
More apologies. Apparently the tone of all of your posts come off as angry. I haven't looked at the source code, but I'm assuming the comparison operators are not required to create variables for the expressions. There are several situations where a comparison operator can "look inside" of another variable, so I guess it depends on how AHK handles those situations. Does it create new variables to represent a portion of the other variables, or does it compare the data directly? And in either case, there's no reason to create new variables - just compare the data directly. Especially in those simple situations where a<50> == b<30>. And in cases where the user is assigning the data to a new variable, use SubStr().lexikos wrote: ↑16 Mar 2021, 21:57I think you misread the tone of my post and have made some poor assumptions about how strings are implemented and what the performance would be.
What makes you think that indexing into the string and having it return a single-character string (for comparison, as in your hypothetical example) would be implemented or perform any differently to SubStr indexing into the string and having it return a single-character string? It's literally doing the exact same thing, but the indexer would be invoked through an additional layer of abstraction (invoking a virtual property vs calling a function directly) whereas SubStr has one more parameter.
I didn't realize this much depth was necessary to ask for a feature. Just ignore it if you don't think it will benefit AHK.
Re: Optimized access to string characters
If all you wanted was to ask for a feature, you achieved that with the first post. I assume you posted because you want it implemented. But if you wanted it for the reason you stated, then implementing it wouldn't actually help you since SubStr would likely still be more efficient (depending on how you measure "efficiency"). I posted not to reject your request, but to address misinformation.
if mystring[A_Index] == "X" has two variables: mystring and A_Index. The equality operator takes two values and compares them. The sub-expression mystring[A_Index] invokes the default property of the value contained by mystring, with the parameter A_Index. If the invocation does not throw an exception (e.g. because the property doesn't exist), the value it produces is then compared to "X" by ==. No variables are created by this expression at runtime.
The default property for an Object is __Item, but for COM objects it has no name.
There is no such thing as "a portion of a variable". A variable contains a value. mystring presumably contains a string. You want mystring[A_Index] to return a single-character substring of the string contained by the variable, not a portion of the variable itself.
It seems that you think mystring[A_Index] == "X" could compare the character at that index directly, without having any need for an intermediate string containing that character. That would mean resolving this to a single operation, similar to CompareCharAt(mystring, A_Index, "X"), not evaluating each side of the operator and then evaluating the operator. But why would you assume it could work that way for mystring[A_Index] == "X" and not for SubStr(mystring, A_Index, 1) == "X"?
Actually, it is not practical for mystring[A_Index] because the indexing operation must be dispatched according to the runtime type of the value, and mystring could contain anything (e.g. an Array, Map or Gui). The CompareCharAt operation would need to perform the equivalent of evaluating mystring[A_Index] itself if mystring does not contain a string. Furthermore, String.prototype.__Item might have been redefined with different behaviour, and checking for that would possibly defeat the optimization. By contrast, it's always known at load-time whether SubStr refers to the built-in function (i.e. hasn't been shadowed by an inane user-defined one) and that it will return a string, so it would be technically possible to make the comparison work this way. However, it would almost certainly be a waste of time ("premature optimization"), because "creating" a single-character string for return is a very fast operation.
With the current AutoHotkey v2 implementation, there are a number of different ways strings might be returned from a built-in function. For SubStr, only the following are relevant:
A null-terminated single-character UTF-16 string is (typically) 4 bytes including the null-terminator. Copying these values repeatedly is not a performance concern, and is almost certainly faster than allocating new memory.
At some point in the future, the code may be reviewed to remove the reliance on null-termination. Then, SubStr can just return a pointer and length, without copying any strings. (The expression evaluator may still need to make a copy, as it does now.)
In any case, mystring[A_Index] will never be more CPU-efficient than SubStr(mystring, A_Index, 1). The difference in CPU-efficiency has nothing to do with allocating or copying strings; it is the cost of dynamic dispatch. Built-in properties are implemented the same way as user-defined ones; the only difference is that a built-in function is called instead of a user-defined one.
For a very fast and simple operation like retrieving a character from a string, the difference is more pronounced percentage-wise because the overall operation is still very fast. I did some testing with a simple "custom built-in" function ChrAt, functionally identical to SubStr when its third parameter is 1, but simpler. The results were:
Shorter = faster. The UDF was ((s, i) => SubStr(s, i, 1)). The test was to iterate over mystring := A_AhkPath with Loop StrLen(mystring) and one of the above. The slowest was still under 5µs per iteration (averaged over 100k iterations), and that's iterating over the entire string, not just a single call to SubStr/ChrAt/[].
So even if you're concerned about the performance of such a trivial operation, there is very little to be gained by adding this feature as built-in over implementing it yourself. If it is added, it will not be for performance reasons, but perhaps (subjective) readability or convenience.
if mystring[A_Index] == "X" has two variables: mystring and A_Index. The equality operator takes two values and compares them. The sub-expression mystring[A_Index] invokes the default property of the value contained by mystring, with the parameter A_Index. If the invocation does not throw an exception (e.g. because the property doesn't exist), the value it produces is then compared to "X" by ==. No variables are created by this expression at runtime.
The default property for an Object is __Item, but for COM objects it has no name.
There is no such thing as "a portion of a variable". A variable contains a value. mystring presumably contains a string. You want mystring[A_Index] to return a single-character substring of the string contained by the variable, not a portion of the variable itself.
It seems that you think mystring[A_Index] == "X" could compare the character at that index directly, without having any need for an intermediate string containing that character. That would mean resolving this to a single operation, similar to CompareCharAt(mystring, A_Index, "X"), not evaluating each side of the operator and then evaluating the operator. But why would you assume it could work that way for mystring[A_Index] == "X" and not for SubStr(mystring, A_Index, 1) == "X"?
Actually, it is not practical for mystring[A_Index] because the indexing operation must be dispatched according to the runtime type of the value, and mystring could contain anything (e.g. an Array, Map or Gui). The CompareCharAt operation would need to perform the equivalent of evaluating mystring[A_Index] itself if mystring does not contain a string. Furthermore, String.prototype.__Item might have been redefined with different behaviour, and checking for that would possibly defeat the optimization. By contrast, it's always known at load-time whether SubStr refers to the built-in function (i.e. hasn't been shadowed by an inane user-defined one) and that it will return a string, so it would be technically possible to make the comparison work this way. However, it would almost certainly be a waste of time ("premature optimization"), because "creating" a single-character string for return is a very fast operation.
With the current AutoHotkey v2 implementation, there are a number of different ways strings might be returned from a built-in function. For SubStr, only the following are relevant:
- If the substring is empty, the constant "" is returned directly.
- If the substring is a suffix of the input, it is returned directly. (Otherwise, the string must be copied in order to null-terminate it.)
- If the substring is less than 256 characters, it is copied into pre-allocated stack-based memory provided by the caller.
- Otherwise, memory is allocated from the heap and the substring is copied into it.
A null-terminated single-character UTF-16 string is (typically) 4 bytes including the null-terminator. Copying these values repeatedly is not a performance concern, and is almost certainly faster than allocating new memory.
At some point in the future, the code may be reviewed to remove the reliance on null-termination. Then, SubStr can just return a pointer and length, without copying any strings. (The expression evaluator may still need to make a copy, as it does now.)
In any case, mystring[A_Index] will never be more CPU-efficient than SubStr(mystring, A_Index, 1). The difference in CPU-efficiency has nothing to do with allocating or copying strings; it is the cost of dynamic dispatch. Built-in properties are implemented the same way as user-defined ones; the only difference is that a built-in function is called instead of a user-defined one.
For a very fast and simple operation like retrieving a character from a string, the difference is more pronounced percentage-wise because the overall operation is still very fast. I did some testing with a simple "custom built-in" function ChrAt, functionally identical to SubStr when its third parameter is 1, but simpler. The results were:
Code: Select all
============ ChrAt
============= SubStr
==================================== ChrAt via []
======================================== SubStr.Bind via []
============================================================ UDF via []
So even if you're concerned about the performance of such a trivial operation, there is very little to be gained by adding this feature as built-in over implementing it yourself. If it is added, it will not be for performance reasons, but perhaps (subjective) readability or convenience.
Re: Optimized access to string characters
I was referring to a portion of memory that belongs to the variable. The variable we are referring to is a string of character bytes, not a self-contained chunk of memory. There actually is such a thing as a portion of the variable - it would be any single or any subset of those bytes. I do not want the operation to return a substring at all. I want the operation to return a single byte or character. But since AHK does not have variables to represent bytes (AFAIK), an optimized-internal-comparison would also work, if such a thing is possible. If its not possible with AHK's setup, or the changes needed to perform such a comparison are too extreme, then that makes sense.lexikos wrote: ↑20 Mar 2021, 23:31There is no such thing as "a portion of a variable". A variable contains a value. mystring presumably contains a string. You want mystring[A_Index] to return a single-character substring of the string contained by the variable, not a portion of the variable itself.
I haven't assumed that. I would be fine using the function if I knew it was (or will ever be) capable of directly comparing the characters. But according to the description of the function, that is not what it does. It reports that it generates a second string in memory for my single character, even if I intend to throw that character away immediately after the comparison.lexikos wrote: ↑20 Mar 2021, 23:31It seems that you think mystring[A_Index] == "X" could compare the character at that index directly, without having any need for an intermediate string containing that character. That would mean resolving this to a single operation, similar to CompareCharAt(mystring, A_Index, "X"), not evaluating each side of the operator and then evaluating the operator. But why would you assume it could work that way for mystring[A_Index] == "X" and not for SubStr(mystring, A_Index, 1) == "X"?
The syntax doesn't really matter. Anything that directs execution to the right place without conflicts would work: if( string@3 == "X" ) or if( string@3isX ) or if( string@3[X] ) or if( string@[3]is[X] ), etc. You guys are much better suited to know what would conflict and what would be difficult. I'm just a begger on the street asking for changelexikos wrote: ↑20 Mar 2021, 23:31Actually, it is not practical for mystring[A_Index] because the indexing operation must be dispatched according to the runtime type of the value, and mystring could contain anything (e.g. an Array, Map or Gui). The CompareCharAt operation would need to perform the equivalent of evaluating mystring[A_Index] itself if mystring does not contain a string.
That would be fantastic. You could generate substrings on the fly without allocating memory. I implemented a string class like this once in C++, but it was strictly a container class - it wasn't allowed to own memory. So there were normal strings, and container strings. Container strings worked like sort of a const reference, which could only be modified in its "containment" sense. Sort of like a window that can be opened and closed on both ends to see more or less of the outside. Something like this (two string types) might be possible for AHK using its impressive reference system. You could start strings out as containers, then upgrade them when necessary.
I appreciate the amount of time you put into it. And I respect the efficiency of AHK - it really is impressive. Especially for such a string-heavy language. Efficiency is important, but its not really the reason I want dedicated syntax to operate on a character vs character level in AHK. It's more like a promise of efficiency. Or having the right tool for the job kind of thing. Even if AHK is using SubStr in the background, having syntax that enforces a character-level operation is communicating to the programmer "I have this covered - don't worry about this". From a programmer's perspective, it feels like this tool is missing from the shelf. Maybe it is my specific background that makes it feel that way.
Who is online
Users browsing this forum: No registered users and 36 guests