Possible AHK bug - the word "base" is missed in an array for-loop

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Possible AHK bug - the word "base" is missed in an array for-loop

20 Oct 2016, 13:28

I'm having a very strange problem in a script that is processing words in Word files (using COM). I was able to narrow down the problem to the word "base" being missed in an array for-loop. I wrote a small script that shows the problem:

Code:

WordArray:={}
WordArray[1,"hello"]:=10
WordArray[2,"world"]:=20
WordArray[3,"base"]:=30
WordArray[4,"why does it miss base by itself"]:=40
WordArray[5,"base in a phrase works fine"]:=50
WordArray[6,"even other forms of the word like based"]:=60
WordArray[7,"and baseless"]:=70
WordArray[8,"and bases work fine"]:=80
For Key,AllText in WordArray
{
  For Word,Count in AllText
  {
    MsgBox % Word A_Space Count
  }
}
Bizarre as it sounds, it misses the "base" entry in the array. Is this a problem in my script or a bug in AHK? Thanks, Joe
kon
Posts: 1756
Joined: 29 Sep 2013, 17:11

Re: Possible AHK bug - the word "base" is missed in an array for-loop

20 Oct 2016, 14:03

Custom Objects wrote: Objects created by the script do not need to have any predefined structure. Instead, each object can inherit properties and methods from its base object (otherwise known as a "prototype" or "class").
...
To create an object derived from another object, scripts can assign to the base property...
...
It is possible to reassign an object's base at any time, effectively replacing all of the properties and methods that the object inherits.
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Possible AHK bug - the word "base" is missed in an array for-loop

20 Oct 2016, 14:27

Hi kon,
Any thoughts on how to handle this? The script is one that you've helped me with recently. It uses Word COM to get all the text from a Word file, then uses a RegExMatch while-loop to break up the text into individual words, then the array for-loop shown above to count the occurrences of each word. I put counters in two places to make sure that it was getting all words, and with most test docs, it worked fine — the counts were equal. Then I tested it on a very large doc — War and Peace (from Project Gutenberg). The two counters were off by 10 out of 18,768 unique words, which I found really strange. Further troubleshooting led me to the fact that the word "base" appears 10 times in War and Peace — and that's the problem. I wrote the small script above to confirm the issue. I suppose I could replace all occurrences of "base" with "basexxxyyyzzz" at the beginning and then change them back at the end, but isn't there a clean way to tell AHK that "base" in this case has nothing to do with the base property of the object? Thanks, Joe
kon
Posts: 1756
Joined: 29 Sep 2013, 17:11

Re: Possible AHK bug - the word "base" is missed in an array for-loop

20 Oct 2016, 14:54

Maybe something like this...

Code:

foo := []
foo[1, "bar"] := "A"
;~ foo[1, "base"] := "B"
ObjRawSet(foo[1], "base", "B")
for k, v in foo[1]
    MsgBox, % k "`n" v
Objects.htm#Keys wrote:...
  • By default, the string key "base" is used to retrieve or set the object's base object, so cannot be used for storing ordinary values with a normal assignment. However, if a value is stored by some other means (such as ObjRawSet(Object, "base", "") or Object.SetCapacity("base", 0)), the key "base" then acts like any other string.
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Possible AHK bug - the word "base" is missed in an array for-loop

20 Oct 2016, 15:33

By default, the string key "base" is used to retrieve or set the object's base object, so cannot be used for storing ordinary values with a normal assignment.
Wow! That can lead to some insidious bugs. I hope the language designers fix that in v2. If I'm understanding it right, this means that a normal assignment cannot be used if the key is "base", which further means that you have to test every key for the value "base" and, if found, use a special assignment, such as ObjRawSet or Object.SetCapacity. Yikes!

Anyway, thanks for the code snippet — works perfectly! Regards, Joe
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Possible AHK bug - the word "base" is missed in an array for-loop

20 Oct 2016, 16:13

kon,
One other thing. In the code I posted above, when the value of Word is an integer with leading zeroes, the second for-loop removes the leading zeroes. For example, "01" becomes "1" and "002" becomes "2". Any way to prevent that? In essence, I want all values of Word treated as a string. Thanks again, Joe
TAC109
Posts: 1111
Joined: 02 Oct 2013, 19:41
Location: New Zealand

Re: Possible AHK bug - the word "base" is missed in an array for-loop

20 Oct 2016, 17:03

Another approach to solve your problems is to add a dummy character to the start of each key and value (such as ~). When retrieving data from the array, skip the dummy character. This approach also preserves your numbers in original format.
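For anyone reading later, the prefix idea might look roughly like the sketch below (AutoHotkey v1.1). The Words list and the Counts name are made up for illustration; the real script gets its words from a RegExMatch loop.

```autohotkey
; Sketch of the prefix workaround, assuming AutoHotkey v1.1.
; "Words" and "Counts" are hypothetical names used only for this example.
Words := ["base", "hello", "01", "base"]
Counts := {}

for Index, Word in Words
{
    Key := "~" . Word            ; prefixed key can never equal "base" or a method name
    if Counts.HasKey(Key)
        Counts[Key] += 1         ; word seen before: increment its counter
    else
        Counts[Key] := 1         ; first occurrence
}

for Key, Count in Counts
    MsgBox % SubStr(Key, 2) . A_Space . Count   ; strip the "~" before display
```

Because every key starts with "~", it is always a non-numeric string, so a word such as "01" should keep its leading zero, and calling Counts.HasKey stays safe even if the text contains the word "HasKey".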
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Possible AHK bug - the word "base" is missed in an array for-loop

20 Oct 2016, 18:44

Hi TAC109,
That's a nice work-around! Solves both the "base" problem and the leading-zeroes problem. Thanks very much, Joe
guest3456
Posts: 3463
Joined: 09 Oct 2013, 10:31

Re: Possible AHK bug - the word "base" is missed in an array for-loop

20 Oct 2016, 20:53

JoeWinograd wrote:
By default, the string key "base" is used to retrieve or set the object's base object, so cannot be used for storing ordinary values with a normal assignment.
Wow! That can lead to some insidious bugs. I hope the language designers fix that in v2.
there's nothing to fix. base is a keyword. it's a special key that all objects have. it is what allows prototypical OOP inheritance
JoeWinograd wrote: If I'm understanding it right, this means that a normal assignment cannot be used if the key is "base", which further means that you have to test every key for the value "base" and, if found, use a special assignment, such as ObjRawSet or Object.SetCapacity. Yikes!
it means just don't use the word "base" as a key in any of your objects, because every object already has a built-in key called "base"

JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Possible AHK bug - the word "base" is missed in an array for-loop

20 Oct 2016, 21:15

it means just don't use the word "base" as a key in any of your objects, because every object already has a built-in key called "base"
That's impossible to do when the object is being populated with data from a source over which you have no control. In this case, any Word document with the word "base" in it will cause a failure in the script. I guess the answer is either (1) don't use objects when you can't control whether or not the data will result in a "base" key or (2) employ a cute trick, such as the one posted by TAC109, which, indeed, works fine. Regards, Joe
guest3456
Posts: 3463
Joined: 09 Oct 2013, 10:31

Re: Possible AHK bug - the word "base" is missed in an array for-loop

21 Oct 2016, 00:05

it's not impossible to do: you either 1. hack on a prefix, like TAC showed you, 2. check specifically for that word and do a similar modification, or 3. rethink your object design, since there's absolutely no reason to use a 2-dimensional array as you've shown in your OP

JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Possible AHK bug - the word "base" is missed in an array for-loop

21 Oct 2016, 01:42

What I meant is that it's impossible in some cases to avoid the word "base" as a data item. Yes, I can work around it with one of the methods that kon and you mentioned (and already have — TAC109's idea).
lexikos
Posts: 9583
Joined: 30 Sep 2013, 04:07

Re: Possible AHK bug - the word "base" is missed in an array for-loop

21 Oct 2016, 03:20

... means that you have to test every key for the value "base" and, if found, use a special assignment, such as ObjRawSet or Object.SetCapacity.
Not true. You can choose to do it that way, but it's not the only or best way.

If your object is intended to be used only for storage of arbitrary keys, you presumably would not want any specialised properties (whether built-in or defined by you). In that case, there's no need to check for "base"; just use ObjRawSet for all keys.
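As a rough illustration of "just use ObjRawSet for all keys", a word counter could be sketched like this (AutoHotkey v1.1). The sample list and the Counts name are hypothetical, and reading Counts[Word] back relies on the documented behaviour quoted earlier in the thread: once "base" has been stored by ObjRawSet, it acts like any other string key.

```autohotkey
; Sketch only, assuming AutoHotkey v1.1 with ObjRawSet available.
; "Counts" and the sample words are hypothetical.
Counts := {}

for Index, Word in ["base", "hello", "HasKey", "base"]
{
    ; Use the function ObjHasKey rather than Counts.HasKey(), because a
    ; stored key named "HasKey" would get in the way of the method call.
    if ObjHasKey(Counts, Word)
        ObjRawSet(Counts, Word, Counts[Word] + 1)
    else
        ObjRawSet(Counts, Word, 1)
}

for Word, Count in Counts
    MsgBox % Word . A_Space . Count
```

Every write goes through ObjRawSet, so there is no special case for "base". Note that this sketch does nothing about purely numeric words losing their leading zeroes.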

The stock object is a "multi-purpose tool" and as such, is not perfect for every purpose. It is designed with the flexibility for you to make an object which behaves how you want (with some restrictions, like x.y and x["y"] must always be equivalent). So if you don't want x["base"] (or x.base) to return x's base, override it.

Code:

class AArray {
    base {
        set {
            ObjRawSet(this, "base", value)
        }
        /* ; Use this to override the default value.
        get {
            ; An empty 'get' is the same as 'return ""'.
        }
        */
    }
}

a := new AArray
k := "base"
MsgBox % a.base = AArray  ; true
a[k] := "Hello."  ; overrides 'base'
MsgBox % a[k]  ; Hello.
This disables a.base and a["base"] only after an assignment is made. If you uncomment get, the override will be immediate (when the object is constructed) and permanent. A much shorter way of achieving something similar is ObjRawSet(a, "base", "") during initialisation of a. However, that method has drawbacks: 1) the value is present in the object, so ObjHasKey(a, "base") returns true and enumerators/the for-loop will include it; 2) ObjDelete(a, "base") will revert the behaviour.

On a related note, if you're storing arbitrary keys which could match method names (such as "HasKey"), you may become unable to call those methods. JavaScript and some other scripting languages also suffer from that problem. However, in AutoHotkey you can override that behaviour by redirecting the values to another object with the __Get and __Set meta-functions. (You would also need to override any built-in methods which you want to support.)

Code:

class BArray {
    __New() {
        ObjRawSet(this, "data", {})
    }
    __Get(k) {
        return this.data[k]
    }
    __Set(k, v) {
        ObjRawSet(this.data, k, v)
        return v
    }
    HasKey(k) {
        return ObjHasKey(this.data, k)
    }
}

b := new BArray
b.HasKey := 42
MsgBox % b.HasKey  ; 42
MsgBox % b.HasKey("HasKey")  ; true
While we're on this topic, I think it's worth mentioning that objects are always case-insensitive. If you don't want that behaviour, there currently isn't a built-in solution. You can use a COM object such as ComObjCreate("Scripting.Dictionary") (this is the easiest), encode string keys (e.g. to hex/base64), or roll your own complete data structure. Whatever method you use, you can hide it away in a class if you wish.
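A minimal sketch of the Scripting.Dictionary route, assuming AutoHotkey v1.1 on a system where that COM class is registered. The variable names and sample words are invented for the example.

```autohotkey
; Sketch only: count words with a Scripting.Dictionary instead of an AHK object.
; "Dict" and the sample words are hypothetical.
Dict := ComObjCreate("Scripting.Dictionary")

for Index, Word in ["base", "Base", "HasKey", "base"]
{
    if Dict.Exists(Word)
        Dict.Item(Word) := Dict.Item(Word) + 1   ; existing key: bump the count
    else
        Dict.Add(Word, 1)                        ; new key
}

; Enumerating the dictionary yields its keys.
for Word in Dict
    MsgBox % Word . A_Space . Dict.Item(Word)
```

The dictionary's default compare mode is binary, so "base" and "Base" are counted separately here, and keys like "base" or "HasKey" have no special meaning because they are data inside the COM object rather than properties of an AutoHotkey object.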
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Possible AHK bug - the word "base" is missed in an array for-loop

21 Oct 2016, 12:17

If your object is intended to be used only for storage of arbitrary keys, you presumably would not want any specialised properties (whether built-in or defined by you).
Yes, that's the case.
In that case, there's no need to check for "base"; just use ObjRawSet for all keys.
Will do.
On a related note, if you're storing arbitrary keys which could match method names (such as "HasKey"), you may become unable to call those methods.
Good point! Hadn't thought of that (my script does call "HasKey"). I have no control over the values of the keys (they all come from external sources), so "HasKey" and other method names are certainly possible.
While we're on this topic, I think it's worth mentioning that objects are always case-insensitive.
Definitely worth mentioning. In my current usage, that's what I want, but good to keep in mind for future projects.
You can use a COM object such as ComObjCreate("Scripting.Dictionary")
I tried ComObjCreate("Scripting.Dictionary") a while ago and couldn't get it to work, but I'll take another spin at it later.

Thanks for the detailed reply — very helpful! Regards, Joe
guest3456
Posts: 3463
Joined: 09 Oct 2013, 10:31

Re: Possible AHK bug - the word "base" is missed in an array for-loop

21 Oct 2016, 13:48

lexikos wrote: On a related note, if you're storing arbitrary keys which could match method names (such as "HasKey"), you may become unable to call those methods.
ah yes this is a good point. what if your source file has words like "insert" or "push" or "delete" or "pop"? all of those will conflict

JoeWinograd wrote:What I meant is that it's impossible in some cases to avoid the word "base" as a data item. Yes, I can work around it with one of the methods that kon and you mentioned (and already have — TAC109's idea).
Joe, related to my #3 above, why do you need arbitrary words as keys in your object? Why are you populating keys with the words from your source file? What exactly are you trying to accomplish?

kon
Posts: 1756
Joined: 29 Sep 2013, 17:11

Re: Possible AHK bug - the word "base" is missed in an array for-loop

21 Oct 2016, 14:31

guest3456 wrote: Joe, related to my #3 above, why do you need arbitrary words as keys in your object? Why are you populating keys with the words from your source file? What exactly are you trying to accomplish?
If you use the words as keys it is easier to do certain things like check if a word is in the array. ie:
if arr[key]

^as opposed to looping over the array.
JoeWinograd
Posts: 2198
Joined: 10 Feb 2014, 20:00
Location: U.S. Central Time Zone

Re: Possible AHK bug - the word "base" is missed in an array for-loop

21 Oct 2016, 15:12

What exactly are you trying to accomplish?
At the highest level: trying to create a spreadsheet with a count of each word that appears in a Word file.

Here's the script's approach:

(1) Uses ComObjGet to store the entire contents of the file via .Content.Text (with .ShowRevisions=False) into an array (many thanks to forum members for help with this).

(2) Uses a for-loop to process all the text in the array. Inside the for-loop is a While Pos:=RegExMatch statement that splits the text into individual words. Originally, the RegExMatch was very simple ("\b\w+\b"), but has become very complex during development to handle words with apostrophes (both standard ones and Word's "smart" right apostrophe), email addresses, web addresses, hyphenated words, numbers with commas and decimal points, monetary values, etc. (many thanks again to forum members for help with this).

(3) When a RegEx pattern match occurs, it uses a "HasKey" call to see if the word is already in the word array, in which case it increments the counter:

WordArray[WordKey,CurrentWord]:=WordArray[WordKey,CurrentWord]+1

If not already in the word array, it sets the counter to 1:

WordArray[WordKey,CurrentWord]:=1

This is where it got into trouble with "base" — didn't happen in any of my small test docs, but there are 10 occurrences of "base" in War and Peace, which is where I noticed the problem.

The fix I'm using now is simply CurrentWord:="~" . CurrentWord before the "HasKey" call and StringTrimLeft,Word,Word,1 later (thanks to TAC109 above).

(4) After fully populating WordArray, the following code creates the word list and writes it to a spreadsheet (actually, a CSV file):

Code:

For WordKey,AllText in WordArray
{
  For Word,Count in AllText
  {
    WordList .= """" . Word . """," . Count . "`n"  ; one CSV row per word (row layout is an assumption)
  }
}
FileAppend,%WordList%,%OutputFile%
Thanks for your interest and advice. If there's a better way to do it, I'm all ears! Regards, Joe
guest3456
Posts: 3463
Joined: 09 Oct 2013, 10:31

Re: Possible AHK bug - the word "base" is missed in an array for-loop

21 Oct 2016, 18:24

kon wrote: If you use the words as keys it is easier to do certain things like check if a word is in the array. ie:
if arr[key]

^as opposed to looping over the array.
JoeWinograd wrote: (3) When a RegEx pattern match occurs, it uses a "HasKey" call to see if the word is already in the word array, in which case it increments the counter
ah ok. i guess if speed is of utmost importance, then yes using HasKey will end up with this limitation. would be nice to have a HasValue builtin
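For comparison, a hand-rolled HasValue could be sketched as below (AutoHotkey v1.1). The function name and return convention are assumptions for illustration, not a built-in.

```autohotkey
; Sketch of a user-defined HasValue: a linear search over an object's values.
; Returns the key of the first match, or 0 if nothing matches
; (this assumes 1-based array indices, so 0 never collides with a real key).
HasValue(Arr, Needle) {
    for Key, Value in Arr
        if (Value = Needle)      ; "=" compares strings case-insensitively in v1
            return Key
    return 0
}

Words := ["hello", "world", "base"]
MsgBox % HasValue(Words, "base")    ; key of "base" in the array
MsgBox % HasValue(Words, "pop")     ; not present
```

Storing words as values sidesteps the "base" and method-name collisions entirely, but each lookup walks the whole array, which is the speed trade-off mentioned above when the list holds many thousands of unique words.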

