detect/Remove duplicate values in array

Posted: 09 Nov 2017, 15:04
by derz00
Hello,

I would like to remove duplicates from an array (or rather, create a new array that contains only one occurrence of each name in the list).

I have seen the function InStr, but I have an array, not a string. I have seen HasKey, but I need to check for a value, not a key. Can anyone help me out?

Re: detect/Remove duplicate values in array

Posted: 09 Nov 2017, 15:29
by Exaskryz
This is probably a rather inefficient way, but I've done this (or something like it; not sure which script of mine did this...) in the past:

Code: Select all

object:=[], secondobject:=[]
object.Push("a","b","c","c","b","d")
Loop % object.Length()
{
	value:=object.RemoveAt(1) ; otherwise object.Pop() would work from right to left
	Loop % secondobject.Length()
		If (value=secondobject[A_Index])
			Continue 2 ; jump to the top of the outer loop, we found a duplicate, discard it and move on
	secondobject.Push(value)
}

MsgBox % secondobject.Length()
Loop % secondobject.Length()
	MsgBox % secondobject[A_Index]
return

Re: detect/Remove duplicate values in array

Posted: 09 Nov 2017, 15:38
by derz00
Interesting. I also, since posting, thought of this. I would be real interested in knowing if there is a better way.

Code: Select all

RemoveDup(obj) {
	for i, value in obj
		str.=value "`n"
	nodupArray:={}
	nodup:=""
	loop parse, str, `n
		if !InStr(nodup, A_LoopField)
		{
			nodup.=A_LoopField "`n"
			nodupArray.Push(A_LoopField)
		}
	Return nodupArray
}

Re: detect/Remove duplicate values in array

Posted: 09 Nov 2017, 17:26
by Helgef
If you don't mind it being sorted, try this,

Code: Select all

uniqueArr(arr, del := ""){
	return sortArr(arr, (del!=""?"D" del : "D" chr(1)) . " U C")
}
sortArr(arr,opt:=""){
	; Sort array using sort options
	; https://autohotkey.com/docs/commands/Sort.htm
	static guess:=0
	local delimiter,k,v,str
	if guess
		VarSetCapacity(str,arr.length()*guess,0)	; Speed can be improved by making a guess on the needed size.
	RegExMatch(opt,"O)\bD(.)\b",delimiter) ? delimiter:=delimiter[1] : (delimiter:="`n", opt.=" D`n")
	for k, v in arr
		str.=v . delimiter
	str:=RTrim(str,delimiter)
	Sort,str, % opt
	return StrSplit(str,delimiter)
}
; Example,
for k, v in uniqueArr(["a","a","x","x","b",1,2,3,3,4,9,9,9,9,9,9,9,9])
	str .= v "`n" 
msgbox % str
Edit: Also if you don't mind case insensitivity,

Code: Select all

unique(arr){
	local temp := [], out := []
	local k, v
	for k, v in arr
		temp[v] := ""
	for k in temp
		out[A_Index] := k
	return out
}

Re: detect/Remove duplicate values in array

Posted: 09 Nov 2017, 17:41
by teadrinker
Or like this:

Code: Select all

arr := ["a","b","c","c","b","d"]
newArr := [], testArr := []

for k, v in arr
   if !testArr.HasKey(v)
      testArr[v] := true, newArr.Push(v)
   
for k, v in newArr
   MsgBox, % v

Re: detect/Remove duplicate values in array

Posted: 09 Nov 2017, 20:00
by jeeswg
A warning about the case where an object key is called 'HasKey':

Code: Select all

q:: ;arrays and HasKey
;if an array has a key called HasKey,
;HasKey() will fail, so use ObjHasKey instead
oArray := {hello:0}
MsgBox, % oArray.HasKey("hello") ;1
oArray := {Haskey:0}
MsgBox, % oArray.HasKey("HasKey") ;(blank)
MsgBox, % oArray.HasKey("abc") ;(blank)
MsgBox, % ObjHasKey(oArray, "HasKey") ;1
MsgBox, % ObjHasKey(oArray, "abc") ;0
return
Example scripts to remove duplicates from an array, case sensitive and case insensitive, and that handle numeric v. string keys/values.

Code: Select all

q:: ;array - remove duplicates (case insensitive)
oArray := ["a","B","c","A","B","C",1,1.0,"1","1.0"]
oArray2 := [], oTemp := {}
for vKey, vValue in oArray
{
	if (ObjGetCapacity([vValue], 1) = "") ;is numeric
	{
		if !ObjHasKey(oTemp, vValue+0)
			oArray2.Push(vValue+0), oTemp[vValue+0] := ""
	}
	else
	{
		if !ObjHasKey(oTemp, "" vValue)
			oArray2.Push("" vValue), oTemp["" vValue] := ""
	}
}
vOutput := ""
for vKey, vValue in oArray2
	vOutput .= vKey " " vValue "`r`n"
MsgBox, % vOutput
return

w:: ;array - remove duplicates (case sensitive)
oArray := ["a","B","c","A","B","C",1,1.0,"1","1.0"]
oArray2 := [], oTemp := ComObjCreate("Scripting.Dictionary")
for vKey, vValue in oArray
	if !oTemp.Exists(vValue)
		oArray2.Push(vValue), oTemp.Item(vValue) := ""
vOutput := ""
for vKey, vValue in oArray2
	vOutput .= vKey " " vValue "`r`n"
MsgBox, % vOutput
return

Re: detect/Remove duplicate values in array

Posted: 10 Nov 2017, 04:22
by Helgef
@ Exaskryz, pop is faster. Edit: Although removeat is slower, Helgef is even slower, I finally realised you use removeat instead of pop because the former maintains the original order :oops:
@ derz00, you need to delimit A_LoopField, otherwise you will find that eg, instr(nodup, "a") is true for nodup := "aa", hence something like this might fix that,

Code: Select all

RemoveDup(obj) {
	for i, value in obj
		str.=value "`n"
	nodupArray:={}
	nodup:= "`n" 									; Added delimiter
	loop parse, str, `n
		if !InStr(nodup,  "`n"  A_LoopField "`n" )	; Added delimiter
		{
			nodup.=A_LoopField "`n"
			nodupArray.Push(A_LoopField)
		}
	Return nodupArray
}
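
To make the substring problem concrete, here is a minimal sketch in AHK v1 syntax. The variable name seen is only illustrative; it stands in for nodup in the function above.

```autohotkey
; AHK v1 sketch: why the needle has to be wrapped in delimiters
seen := "aa`n"                          ; pretend "aa" was already collected
MsgBox % InStr(seen, "a")               ; non-zero: "a" looks like a duplicate, but it is only a substring of "aa"
seen := "`n" "aa`n"                     ; same list, now with a leading delimiter
MsgBox % InStr(seen, "`n" "a" "`n")     ; 0: no whole-item match, so "a" is treated as new
MsgBox % InStr(seen, "`n" "aa" "`n")    ; non-zero: "aa" is found as a whole item
```

The same idea applies to any delimiter, as long as it cannot occur inside the values themselves.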
@ teadrinker, :thumbup: You could use objhaskey, as pointed out by jeeswg.
@ jeeswg :thumbup: . Fyi, your second script behaves differently on v2. (The first one too, but that is more obvious)
Cheers.

Re: detect/Remove duplicate values in array

Posted: 10 Nov 2017, 05:02
by teadrinker
I've seen objhaskey and similar functions for the first time. Where are they described?

Re: detect/Remove duplicate values in array

Posted: 10 Nov 2017, 05:04
by Helgef
I think you will find them if you search in the help file index. I see my link went to the method.

Re: detect/Remove duplicate values in array

Posted: 10 Nov 2017, 05:13
by teadrinker
Hmm, for me the link leads to Object.HasKey(Key).

Re: detect/Remove duplicate values in array

Posted: 10 Nov 2017, 05:23
by teadrinker
Found:
Each method also has an equivalent function, which can be used to bypass any custom behaviour implemented by the object -- it is recommended that these functions only be used for that purpose.

Re: detect/Remove duplicate values in array

Posted: 10 Nov 2017, 06:15
by Helgef
Thanks teadrinker.
It may be worth noting that using the functions can improve performance, presumably because the functions don't involve any array look-ups. Also, there are a few other ObjXXX functions which have their own spot in the documentation,

Code: Select all

objAddRef()
objRelease()
objBindMethod()
objRawSet()
Cheers.
Edit:
@ derz00, I'm sorry that your topic is slightly derailing :oops:
jeeswg wrote:@Helgef: Are you serious about the approaches, both of them, not being two-way compatible? Did you try to find a fix? I'll look into it
I am serious :beard:. Two-way compatibility isn't something I consider needs to be fixed, so no, I didn't try. And I don't think you can, since, e.g., obj[1 ""] := obj[1] := value yields two key/value pairs (one integer key and one string key) in v1, while in v2 it yields only an integer key. For the first script, you can use try - catch to avoid the exception when you try to call a non-existent method.
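
The key behaviour described above can be checked with a short sketch. It uses AHK v1 syntax, and the variable names are only illustrative. On v2 the count should differ, as described in this post.

```autohotkey
; AHK v1 sketch: an integer key and a string key that look the same
obj := {}
obj[1] := "integer key"
obj[1 ""] := "string key"      ; 1 "" concatenates to the string "1"
n := 0
for k, v in obj                ; count the key/value pairs
	n++
MsgBox % n                     ; two pairs on v1, per the behaviour described above
```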

Re: detect/Remove duplicate values in array

Posted: 10 Nov 2017, 06:16
by jeeswg
@Helgef: Are you serious about the approaches, both of them, not being two-way compatible? Did you try to find a fix? I'll look into it.

@teadrinker: I try to list the ObjXXX functions at both of these links:
[note: I intend to improve the objects tutorial significantly in future.]
jeeswg's objects tutorial - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=7&t=29232
list of every command/function/variable from across all versions - AutoHotkey Community
https://autohotkey.com/boards/viewtopic ... 42#p131642

Re: detect/Remove duplicate values in array

Posted: 10 Nov 2017, 06:52
by teadrinker
Thanks, it's very informative!

Re: detect/Remove duplicate values in array

Posted: 10 Nov 2017, 08:34
by derz00
Helgef wrote: Edit:
@ derz00, I'm sorry that your topic is slightly derailing :oops:
I don't forgive you, because I am thankful for it! :thumbup: My fears quickly died that I might not get a response. :) I should bring all these suggestions together into a dependable function and post it.
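
One way the suggestions in this thread might be combined, as a sketch only (AHK v1 syntax; the name UniqueValues is made up for illustration). It keeps the original order like teadrinker's loop and uses ObjHasKey as jeeswg recommends. Because it relies on plain object keys, it is case-insensitive and follows the v1 key rules discussed in this thread.

```autohotkey
; AHK v1 sketch; UniqueValues is a hypothetical name, not a built-in
UniqueValues(arr) {
	seen := {}, out := []
	for k, v in arr
		if !ObjHasKey(seen, v)           ; function form, still works if a value is "HasKey"
			seen[v] := true, out.Push(v)
	return out
}

; Example
for k, v in UniqueValues(["a","b","c","c","b","d"])
	list .= v " "
MsgBox % list   ; shows: a b c d
```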

Re: detect/Remove duplicate values in array

Posted: 10 Nov 2017, 19:33
by jeeswg
For the HasKey warning script: AHK v2 is more strict and the script ends. As Helgef said, you can use 'try' to avoid this.

For the remove duplicates scripts, they are both working on both AHK v1 and AHK v2. But there are some differences relating to key names that look numeric.

The results I'm getting with AHK v2 here are surprising.
How can I get a numeric key name '1', and a string key name '1', in the same array?

Code: Select all

oArray := {}
oArray[1] := "a"
oArray[1.0] := "b"
oArray["1"] := "c"
oArray["1.0"] := "d"
vOutput := ""
for vKey, vValue in oArray
	vOutput .= vKey " " vValue "`r`n"
MsgBox, % vOutput
;MsgBox(vOutput)
return

;AHK v1
;1 a
;1 c
;1.0 d

;AHK v2
;1 c
;1.0 d
[EDIT:] Further tests on AHK v2:

Code: Select all

;AHK v2
MsgBox(Type(1)) ;Integer
MsgBox(Type("1")) ;String (as expected)
MsgBox(Type(1+0)) ;Integer

oArray := {}
oArray[1] := ""
for vKey, vValue in oArray
	MsgBox Type(vKey) ;Integer
oArray := ""

oArray := {}
oArray["1"] := ""
for vKey, vValue in oArray
	MsgBox Type(vKey) ;Integer (surprising)
oArray := ""
return

Re: detect/Remove duplicate values in array

Posted: 11 Nov 2017, 04:09
by Helgef
jeeswg wrote:; Integer (surprising)
Please refer to the v2 documentation, objects -> keys.

Cheers

Re: detect/Remove duplicate values in array

Posted: 11 Nov 2017, 04:16
by jeeswg
Thanks so much, it's such a relief to have an explanation for what was going on. This is exactly the sort of behaviour I thought AHK v2 was supposed to eliminate. Do you find this behaviour surprising? How can I specify a number stored as a string? Do you know? Thanks.

I'll reserve judgement for now, but presently this behaviour is very concerning.

[EDIT:] Here's a legitimate usage scenario that gets messed up by the 'string looks number' assumption. It works fine in AHK v1, but not in AHK v2.

Code: Select all

q::
oArray := {1:0,2:0,3:0,1a:0,2a:0,3a:0}
vOutput := ""
for vKey, vValue in oArray
	vOutput .= vKey " " vValue "`r`n"
MsgBox(vOutput)
;MsgBox, % vOutput

oArray := {"1":0,"2":0,"3":0,1a:0,2a:0,3a:0}
vOutput := ""
for vKey, vValue in oArray
	vOutput .= vKey " " vValue "`r`n"
MsgBox(vOutput)
;MsgBox, % vOutput
return

;for the 2nd example:
;AHK v1
;1 0
;1a 0
;2 0
;2a 0
;3 0
;3a 0

;AHK v2
;1 0
;2 0
;3 0
;1a 0
;2a 0
;3a 0
[EDIT:] And another one, re. dealing with hex strings generally:

Code: Select all

q::
oArray := {"00":0,"40":64,"80":128,"C0":192}
vOutput := ""
for vKey, vValue in oArray
	vOutput .= vKey " " vValue "`r`n"
MsgBox(vOutput)
;MsgBox, % vOutput
return

;AHK v1
;00 0
;40 64
;80 128
;C0 192

;AHK v2
;40 64
;80 128
;00 0
;C0 192

Re: detect/Remove duplicate values in array

Posted: 11 Nov 2017, 08:14
by Helgef
If I had a dollar for every second I waited for your scripts to finish before I realised I didn't hit q... :D. Shame on me for not learning, but indenting the hotkey routine makes it clearer, imho.
jeeswg wrote:This is exactly the sort of behaviour I thought AHK v2 was supposed to eliminate.
It does eliminate this sort of code:

Code: Select all

if !ObjHasKey(oTemp, "" vValue)
		oArray2.Push("" vValue), oTemp["" vValue] := ""
Although I can appreciate the fun of tinkering with script language specifics, it is probably better if the above is never necessary.
jeeswg wrote:Do you find this behaviour surprising?
I do not find it surprising that the behaviour is according to the documentation.
I do not find it surprising that v2 behaves differently from v1.
I do not find it surprising that the choice to change the behaviour was made. Not claiming the following is the (only) reason for the change, but integer keys perform better than string keys, and take less space (on average, I guess). If it looks like a number, perhaps it is a number ;)
jeeswg wrote:How can I specify a number stored as a string? Do you know? Thanks.
If you are desperate,

Code: Select all

arr[key:="01"] := "" ; type(key) = string, key+0 = "01" + 0 = 1

jeeswg wrote: Here's a legitimate usage scenario that gets messed up by the 'string looks number' assumption. It works fine in AHK v1, but not in AHK v2.
Legitimate, sure, but I think it is rare enough not to cast any doubt on whether the change in key handling causes an unacceptable loss of features compared with the improvements it brings.
jeeswg wrote:I'll reserve judgement for now, but presently this behaviour is very concerning.
That is sound, I will do that too, however, I am not presently concerned. But I would like to hear any arguments, both pros and cons. This thread is probably not the place though.

Cheers.

Re: detect/Remove duplicate values in array

Posted: 11 Nov 2017, 09:04
by jeeswg
- I want an object with obj[1] and obj["1"], can you achieve that?
- Do you have any examples of hotkey labels and indentation? I haven't seen anyone consistently do this on the forum, so I couldn't imitate the style if I wanted to. And I have some grey areas re. (a) what's considered standard, (b) rules for automating indentation.
- While writing those 2 lines of code, I was actually pleased that I could handle all of the issues to do with string v. numeric key names so succinctly and effectively, and with no ambiguity.
- Re. surprise, I think it's been changed because 'users might be stupid', and I respect the intent, but it could quite easily cause more mistakes and not fewer. Anyone who uses arrays has to understand that there are string keys and numeric keys, it's fundamental. In situations like this you improve the documentation, you don't 'improve' (dumb-down and over-complicate) the language. It will annoy the power users and it won't help the newbies.
- Re. surprise. Almost everything brought in in AHK v1.1, i.e. by lexikos, has remained consistent in AHK v2, this would be an exception.
- I like to use integer keys also, but sometimes you intend them to be numbers stored as strings for good reasons, especially when you are handling strings and doing loops and sorts. Crucially, I at least need *a* way to do it, which I haven't so far seen.

[EDIT:]
- Ultimately, the number one thing that annoys and confuses both power users and newbies is ambiguity. I.e. fiddliness and special exceptions. Things that are clear and consistent work best.
- Currently I have no complaints about how AHK v1 handles the string/numeric key issue.
- One solution when handling strings could be to use a prefix character, so what you gain with integers, you lose with strings.
- Like I said the jury is still out, it partly depends on if there are ways to directly create key names e.g. '1', '2', '3', which are numbers stored as strings.
- There are situations in Explorer where, in one object, you refer to items by name and by number at the same time, e.g. the nth file (integer key) and the file name (string key). However, you may have a folder with a name made up solely of digits, e.g. a datestamp, or simply folders named '1', '2' etc.
- You could also have issues relating to numbers with/without leading zeros.
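
The prefix-character idea from the list above could look roughly like this. It is only a sketch in AHK v1 syntax; the "k:" prefix and the helper names SetStr and GetStr are assumptions made for illustration, not built-ins.

```autohotkey
; Sketch: force string keys by adding a prefix (hypothetical helpers)
SetStr(obj, name, value) {
	obj["k:" name] := value              ; "k:1" can never be mistaken for the integer 1
}
GetStr(obj, name) {
	return obj["k:" name]
}

obj := {}
obj[1] := "first file"                   ; integer key: the nth item
SetStr(obj, "1", "folder named 1")       ; string key, stored as "k:1"
MsgBox % obj[1] "`n" GetStr(obj, "1")
```

The cost is the one noted in the list: every string access has to go through the prefix, so what is gained with integers is lost with strings.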

[EDIT:] So Helgef, you've revealed to me (1) the deref in force an expression/can be an expression issue, (2) the no assume local issue, and now (3) the string keys in AHK v2 issue. Hmm, what next?