case sensitive/case insensitive arrays (maintain key order, no key autosort)

jeeswg
13 Mar 2017, 22:19

tl;dr
Looking for the best way to use a Scripting.Dictionary object for a case insensitive array that maintains key order (I already have a case sensitive version).

CSobj ('case sensitive object'), which I have made a few amendments to, is based on the Scripting.Dictionary object. Unlike standard AHK arrays, it is case sensitive, and a For loop returns keys in the order they were added rather than autosorting them.

I would like to make a similar CIobj ('case insensitive object') that would be almost the same as a standard AHK array, but would maintain key order and not autosort keys. One way of doing this would be to store all the keys in lowercase, with the original case (from the first time the key was seen) kept as the initial characters of the value. The get/set functions would wrap and hide all of this, so you wouldn't know how ugly things were behind the scenes. But I am looking for the best way of doing this. Other ideas might involve a second array.
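A possible shortcut, offered as a sketch rather than a tested replacement for CSobj: the Scripting.Dictionary object has a CompareMode property, and setting it to 1 (text compare) while the dictionary is still empty makes key lookups case insensitive. Keys should still come back in the order they were added, in the case first seen. The variable names are illustrative, and the code talks to the dictionary directly rather than through a CSobj-style wrapper.

```autohotkey
; Sketch (AutoHotkey v1.1): case insensitive keys, creation order kept.
oDict := ComObjCreate("Scripting.Dictionary")
oDict.CompareMode := 1 ; 1 = text compare; must be set before any key is added

vList := "Q,w,E,q,W,e,Q"
Loop, Parse, vList, `,
{
	vKey := "" A_LoopField ; force a string key, as CSobj does
	oDict.Item(vKey) := oDict.Exists(vKey) ? oDict.Item(vKey) + 1 : 1
}

; enumerating a Scripting.Dictionary yields its keys
vOutput := "key count: " oDict.Count "`r`n`r`nfrequency table:`r`n"
for vKey in oDict
	vOutput .= vKey "`t" oDict.Item(vKey) "`r`n"
MsgBox, % vOutput
```

A wrapper in the style of CSobj could set CompareMode once in its constructor function, which would avoid storing lowercase keys or keeping a second array.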

Some of the uses for this (which I already have code for, but for which I would like to perfect CSobj, or have a 'CIobj') would include:
- list comparison: items unique to list A / items unique to list B / items present in both
- table lookup: multiple values against a table/ini file
- table lookup: one value against a list (e.g. a spell list)
- list remove duplicates: remove duplicates, maintain order
- list sort: sort list A based on list B, items in both should be in list B's order, the remaining items in list A should be in their original order
- list line frequency: get frequency for each item, maintain order
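As an illustration of the 'remove duplicates, maintain order' use, here is a minimal sketch that uses a Scripting.Dictionary directly (not the CSobj wrapper). The sample text and variable names are made up for the example, and it assumes the default compare mode, which is case sensitive.

```autohotkey
; Sketch: drop repeated lines, keep the first occurrence of each (case sensitive).
vText := "b`na`nB`nb`na"
oSeen := ComObjCreate("Scripting.Dictionary")
vOutput := ""
Loop, Parse, vText, `n, `r
{
	vKey := "" A_LoopField ; force a string key
	if (oSeen.Exists(vKey))
		continue
	oSeen.Item(vKey) := ""
	vOutput .= A_LoopField "`n"
}
vOutput := SubStr(vOutput, 1, -1) ; trim the trailing LF
MsgBox, % vOutput ; b, a, B (one per line)
```

The same loop with a standard AHK array and HasKey would treat 'b' and 'B' as the same item.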

Notes:
- CSobj is written in function syntax rather than method syntax.
Objects
https://autohotkey.com/docs/Objects.htm
- The code uses 'self' rather than 'this'.
- To use a Scripting.Dictionary object normally and get its key count:
COM Object Reference [AutoHotkey v1.1+] - Scripts and Functions - AutoHotkey Community
https://autohotkey.com/board/topic/5698 ... ntry357748
- For a normal AHK array to get its key count:
vCount := NumGet(&oArray + 4*A_PtrSize)
- To get the key count for a CSobj object I added in this method: oArray.count().
- Originally, CSobj couldn't get/set the values for keys that were positive/negative integers; I added "" in 2 places to make this possible.
- The script mentioned here looks promising but uses the old code for COM.ahk before there was native support.
Scripting.Dictionary Object as Associative Array - Scripts and Functions - AutoHotkey Community
https://autohotkey.com/board/topic/1639 ... ive-array/
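The NumGet line in the notes reads the key count from the object's internal memory layout, which is not documented. A slower but safer alternative is to count by enumerating. The helper name below is hypothetical. As far as I know, later v1.1 releases (v1.1.29 onwards) also provide a built-in oArray.Count() method for standard arrays.

```autohotkey
; Sketch: count the keys of any enumerable object without reading internal memory.
oArray := {a: 1, b: 2, c: 3}
vCount := JEE_ObjCountSketch(oArray)
MsgBox, % vCount

JEE_ObjCountSketch(oArray) ; hypothetical helper name
{
	vCount := 0
	for vKey in oArray
		vCount++
	return vCount
}
```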

The code below shows the case sensitive 'maintain order' CSobj in action: it lists the keys in their original order, and then sorted.

Code:

q::
oArray := CSobj()
vList := "Q,w,E,-2,1,q,W,e,2,-1,Q,1.5"
Loop, Parse, vList, `,
	vKey := A_LoopField, oArray[vKey] := ((vNum := oArray[vKey]) = "") ? 1 : vNum+1

;list array (original item order)
vOutput := "key count: " oArray.count() "`r`n`r`nfrequency table:`r`n"
vList2 := ""
For vKey, vValue in oArray
	vOutput .= vKey "`t" vValue "`r`n", vList2 .= vKey "`n"
MsgBox % vOutput

;list array (alphabetical order) (assumes no key name contains a LF)
vOutput := "key count: " oArray.count() "`r`n`r`nfrequency table:`r`n"
vList2 := SubStr(vList2, 1, -1)
Sort, vList2, F JEE_SortAsIfLCase
Loop, Parse, vList2, `n
	vOutput .= A_LoopField "`t" oArray[A_LoopField] "`r`n"
MsgBox % vOutput
Return

;==================================================

;JEE_SortStableAsIfLCase
JEE_SortAsIfLCase(vTextA, vTextB, vOffset) ;for use with Sort function
{
StringLower, vTextA, vTextA
StringLower, vTextB, vTextB
Return ("" vTextA) > ("" vTextB) ? 1 : ("" vTextA) < ("" vTextB) ? -1 : -vOffset
}

;==================================================

;based on:
;Case sensitive variables possible? - Ask for Help - AutoHotkey Community
;https://autohotkey.com/board/topic/61829-case-sensitive-variables-possible/
;and:
;How make Associative array keys case sensitive - Ask for Help - AutoHotkey Community
;https://autohotkey.com/board/topic/61840-how-make-associative-array-keys-case-sensitive/
;updated by jeeswg to allow positive/negative integer keys, and a count method added

CSobj() {
   static base := object("_NewEnum","__NewEnum", "Next","__Next", "__Set","__Setter", "__Get","__Getter", "__Call","__Caller")
   return, object("__sd_obj__", ComObjCreate("Scripting.Dictionary"), "base", base)
}
   __Getter(self, key) {
      return, self.__sd_obj__.item("" key)
   }
   __Setter(self, key, value) {
      self.__sd_obj__.item("" key) := value
      return, false
   }
   __NewEnum(self) {
      return, self
   }
   __Next(self, ByRef key = "", ByRef val = "") {
      static Enum
      if not Enum
         Enum := self.__sd_obj__._NewEnum
      if Not Enum[key], val:=self[key]
         return, Enum:=false
      return, true
   }
   __Caller(self, name) {
      if (name = "count")
         return, self.__sd_obj__.count
   }

Btw, is a Scripting.Dictionary object much faster or slower than an AHK array?

lexikos
14 Mar 2017, 03:23

"Btw, is a Scripting.Dictionary object much faster or slower than an AHK array?"
Faster or slower for what?

Is the Scripting.Dictionary object on its own in a language which can utilise it directly, in AutoHotkey with a COM object wrapper, or in AutoHotkey with an additional wrapper such as yours? Each one has more overhead than the last.

Is the comparison between the two including techniques to make them behave similarly, or comparing apples to oranges?

If all you want is to preserve the order of keys returned by the for-loop, you don't need to change the way the keys are stored. You can just remember the order of keys and override the enumerator to return keys in that order. See OrderedArray().

If you want to know which technique is faster, implement both with the same behaviour and functionality, and then benchmark them.
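The 'remember the order of keys and override the enumerator' idea can be sketched as follows. This is not the OrderedArray() from the linked thread; it is a minimal class-based illustration with hypothetical names, it does not handle key removal, and because it sits on a standard AHK array it stays case insensitive.

```autohotkey
oArray := new OrderedSketch
oArray["q"] := 1
oArray["a"] := 2
oArray["m"] := 3
oArray["Q"] := 4 ; same key as "q" in a standard AHK array, so no new key is recorded
vOutput := ""
for vKey, vValue in oArray
	vOutput .= vKey "=" vValue "`r`n"
MsgBox, % vOutput ; keys in creation order: q, a, m

class OrderedSketch ; hypothetical name
{
	__New()
	{
		ObjRawSet(this, "_keys", []) ; store the key list without triggering __Set
	}
	__Set(vKey, vValue)
	{
		; __Set only fires for keys the object does not have yet
		this._keys.Push(vKey)
		; no return here, so the normal assignment goes ahead
	}
	_NewEnum()
	{
		return new OrderedSketch.Enum(this)
	}
	class Enum
	{
		__New(oObj)
		{
			this.obj := oObj
			this.i := 0
		}
		Next(ByRef vKey, ByRef vValue := "")
		{
			this.i += 1
			if (this.i > this.obj._keys.Length())
				return false
			vKey := this.obj._keys[this.i]
			vValue := this.obj[vKey]
			return true
		}
	}
}
```

Each for-loop gets its own enumerator object here, so nested loops over the same array do not interfere with each other.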
jeeswg
14 Mar 2017, 21:07

Thanks, the OrderedArray() link you provided is very useful.
[AHK_L] For Loop in order of key-value pair creation - Ask for Help - AutoHotkey Community
https://autohotkey.com/board/topic/6179 ... /?p=389662

OK, well now I'm hoping to create 4 array templates:
- case sensitive, creation order, array [based on CSobj which uses a Scripting.Dictionary object]
- case sensitive, alphabetical order, array [based on a Scripting.Dictionary object, is this difficult to do?][EDIT: For 128 keys, do you compare against 64th, 32nd, 16th etc to find position? Is that how AHK arrays work?]
- case insensitive, creation order, array [based on OrderedArray() which uses a standard AHK array]
- case insensitive, alphabetical order, array [based on a standard AHK array]

An interesting possibility, if someone is familiar enough with objects to do this, as an interesting challenge/tutorial:
- case insensitive, creation order, array [based on a Scripting.Dictionary object]
- case insensitive, alphabetical order, array [based on a Scripting.Dictionary object]

Two things I want these arrays to have:
- No special treatment for positive/negative integers, treat everything as strings.
- A count() method.

I have achieved this for the 'case insensitive, creation order' OrderedArray().

Code:

;I can achieve consistent string/number treatment for OrderedArray() by changing oaSet:
oaSet(obj, k, v)
{
    k := "" k
    if !ObjHasKey(obj, k)
    ;if !ObjHasKey(obj._keys, k) ;didn't work
        ;obj._keys.Insert(k) ;Insert is deprecated, see: https://autohotkey.com/docs/objects/Object.htm
        obj._keys.Push(k)
}

;A key count method:
;(note: add a reference to oaCall where base is defined)
;    static base := Object("__Set", "oaSet", "_NewEnum", "oaNewEnum", "__Call", "oaCall")

oaCall(obj, name)
{
    if (name = "count")
        return, NumGet(&obj._keys + 4*A_PtrSize)
}

I have been trying, but not succeeding, to create a slight modification of the standard AHK array which gets/sets integers as strings, and has a count() method.

(Despite trying to make everything look as straightforward as possible, I have only been investigating objects and methods for a number of days, and each thing here, however simple it looks to an expert, has caused hours of trouble.) [EDIT: I've used objects a lot e.g. Explorer/IE/Acc/Excel/Word, but ObjAddRef, methods (e.g. get/set/enum) and a lot about arrays are new to me.]

==================================================

Re. speed. I was asking more about the nature of the objects, and whether that would have an obvious effect on speed for certain types of usage (see the 6 examples above). Anyhow, the question was open-ended; if there is no obvious answer then so be it.

[the nature of the AHK standard array]
Just switched to AHK L.... - Ask for Help - AutoHotkey Community
https://autohotkey.com/board/topic/8772 ... /?p=556935
"Key-value pairs are returned in the order that they exist within the object, which is implementation defined. The implementation, in this case, uses binary search, which requires that the keys are sorted."
[the nature of Scripting.Dictionary]
Scripting.Dictionary Object as Associative Array - Scripts and Functions - AutoHotkey Community
https://autohotkey.com/board/topic/1639 ... ive-array/
"Thanks. I think it uses a hash table, so basically the look-up time should be independent on the number of items in the table. In reality, however, there could be collisions among them, depending on the implementations, so it could grow linearly with the number of items in the worst case, but I don't think it would ever happen."
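
The sorted-key behaviour of the standard AHK array described in the first quote is easy to see in a small test. This is only a demonstration of the observable behaviour (key order and case insensitivity), not of the internal implementation.

```autohotkey
; Standard AHK v1 array: string keys are case insensitive and enumerated in sorted order.
oArray := {}
oArray["b"] := 1
oArray["a"] := 2
oArray["B"] := 3 ; treated as the existing key "b", so this overwrites the value 1
vOutput := ""
for vKey, vValue in oArray
	vOutput .= vKey "=" vValue "`r`n"
MsgBox, % vOutput ; a=2 then b=3
```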
==================================================

Benchmark tests, where all keys are lowercase:

length (haystack): 1009987
object   time (msec)   in dict   not in dict   num. keys
AHK      171           83997     35033         164766
SD       983           84005     35025         164767

length (haystack): 1009987
object   time (msec)   in dict   not in dict   num. keys
AHK      188           83997     35033         164766
SD       1092          84005     35025         164767

Code:

q:: ;speed tests, check if word is in list (AHK standard array v. Scripting.Dictionary object)
;get spell list and text to spellcheck
;vPath := A_Desktop "\MySpellList.txt"
;FileRead, vListS, % vPath
vListS := vSpellListText
vPath := A_ScriptFullPath
FileRead, vText, % vPath
vLen := StrLen(vText)

vText := RegExReplace(vText, "[^\w]+", " ")
oArray1 := {}
oArray2 := ComObjCreate("Scripting.Dictionary")

Loop, Parse, vListS, `n, `r
{
	vTemp := A_LoopField
	if (vTemp = "")
		continue
	StringLower, vTemp, vTemp
	oArray1["" vTemp] := ""
	oArray2.item("" vTemp) := ""
}

vCount1 := NumGet(&oArray1 + 4*A_PtrSize)
vCount2 := oArray2.count()
vCount1Y := vCount2Y := 0
vCount1N := vCount2N := 0
Loop, 2
{
	vOutput := ""
	VarSetCapacity(vOutput, StrLen(vText)*2)
	oArray := oArray%A_Index%
	vTickCount1 := A_TickCount

	if (A_Index = 1) ;AHK standard array
		Loop, Parse, vText, % " "
		{
			vTemp := A_LoopField
			StringLower, vTemp, vTemp
			;for an AHK standard array, we don't need to make the text lowercase
			;but this keeps the speed tests consistent
			if oArray1.HasKey("" vTemp)
				vCount1Y++
			else
				vCount1N++
		}
	if (A_Index = 2) ;Scripting.Dictionary
		Loop, Parse, vText, % " "
		{
			vTemp := A_LoopField
			StringLower, vTemp, vTemp
			if oArray2.Exists("" vTemp)
				vCount2Y++
			else
				vCount2N++
		}
	vNum%A_Index% := A_TickCount - vTickCount1
}
vOutput := "length (haystack): " vLen "`r`n"
vOutput .= "object" "`t" "time" "`t" "in dict" "`t" "not in dict" "`t" "num. keys`r`n"
vOutput .= "AHK`t" vNum1 "`t" vCount1Y "`t" vCount1N "`t" vCount1 "`r`n"
vOutput .= "SD`t" vNum2 "`t" vCount2Y "`t" vCount2N "`t" vCount2 "`r`n"
MsgBox, % Clipboard := vOutput
return
==================================================

Regarding objects and documentation:

In the past few days I've read all the htms on objects in the documentation.

I'll be releasing a tutorial on what I've worked out so far.

The main difficulty is this page, which lacks simple examples (for classes, for instance):
Objects
https://autohotkey.com/docs/Objects.htm

In general the documentation is very good as ever. But unusually, with Objects.htm, I was lost at various points, and the simplest remedy is more concrete examples.

Another issue is simply that there isn't one page listing all the functions that start with ComObj or Obj, or all the pages relating to objects, which is something my tutorial tries to do.

Another issue is people using 'For each'; I don't think people should use the word 'each'. [EDIT: see below]

Hopefully my tutorial will address the main issues a beginner to objects faces, and will show how to make a beginner into an expert quite rapidly.

jeeswg's objects tutorial - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=7&t=29232

==================================================

[EDIT:] 'for each':

Code:

oObj := ["a","b","c"]

;generally this is how I would loop through an object:
for vKey, vValue in oObj
	vOutput .= vKey " " vValue "`r`n"

;if values are not needed:
for vKey in oObj
	vOutput .= vKey "`r`n"

;if keys are not needed (where '_' is a variable):
for _, vValue in oObj
	vOutput .= vValue "`r`n"
	
;I would not do this (where 'each' is a variable):
for each, vValue in oObj
	vOutput .= vValue "`r`n"

MsgBox, % vOutput
return