Filtering duplicates from large list (what's the fastest way?) Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
dd900
Posts: 121
Joined: 27 Oct 2013, 16:03

Filtering duplicates from large list (what's the fastest way?)

31 Mar 2021, 08:48

I am merging DNS filter lists for use with AdGuard Home. I don't like having a lot of lists, so I am consolidating some of the ones I use. I already have a working script; my question is how I can speed it up.

So far, this is the fastest:

Code: Select all

entry := "item"
If (!InStr(outText, entry))
    outText .= entry "`n"
This outperforms both an array lookup and Loop, Parse. My issue is that when the lists get big (100000+ entries), the script gets slow. How can I speed up the filtering?
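
One caveat with a plain InStr check: it matches substrings, so an entry like "ads.com" counts as already present once "bads.com" is in the text. A minimal sketch of a whole-line check, assuming entries are separated by `n and the text starts with a leading `n (AppendUnique is a hypothetical helper name, and the host names are made up):

```autohotkey
; AutoHotkey v1 sketch; AppendUnique is a hypothetical helper, not part of the original script.
; outText must start with "`n" so every stored entry has a delimiter on both sides.
AppendUnique(ByRef outText, entry)
{
    if (InStr(outText, "`n" entry "`n"))   ; whole-line match only
        return false
    outText .= entry "`n"
    return true
}

outText := "`n"
for index, entry in ["bads.com", "ads.com", "ads.com"]
    AppendUnique(outText, entry)

; A plain InStr(outText, entry) would have skipped "ads.com" here,
; because it occurs inside "bads.com".
MsgBox % Trim(outText, "`n")
```

This does not change how the search scales; every check still scans the whole string, so it only addresses correctness, not speed.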

What I'm thinking about trying:
1.) Using CLR.ahk and writing a C# function that uses Enumerable.Union().
2.) Splitting outText into smaller sections, then performing the filtering on the smaller strings.

I would like to hear some more suggestions from you guys. Thanks
mikeyww
Posts: 26847
Joined: 09 Sep 2014, 18:38

Re: Filtering duplicates from large list (what's the fastest way?)

31 Mar 2021, 10:06

You might want to look at database options such as SQLite. Another option could be adding all entries and then removing duplicates, but I have not tested it.

Explained: Sort
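
The "add everything, then remove duplicates" idea could look roughly like this with the U option of Sort. This is only a sketch with made-up sample entries, and it assumes the merged text uses `n as the line separator:

```autohotkey
; AutoHotkey v1 sketch: merge first, then let Sort drop the duplicates.
; The entries below are placeholders for illustration.
list := "ads.example.com`ntracker.example.net`nads.example.com`ncdn.example.org`ntracker.example.net"

; U removes duplicate items; the default delimiter is the linefeed (`n).
; Sort ignores case by default, so differently cased copies count as duplicates.
Sort, list, U
removed := ErrorLevel   ; Sort sets ErrorLevel to the number of duplicates removed

MsgBox % "Removed " removed " duplicates:`n`n" list
```

Note that the result comes back sorted, so the original order of the entries is not preserved.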
jasc2v8
Posts: 59
Joined: 10 Dec 2020, 12:24

Re: Filtering duplicates from large list (what's the fastest way?)

31 Mar 2021, 11:31

If you add the entries to an object as keys, duplicates are omitted automatically, because an object holds each key only once.

Code: Select all

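MyArray := {}

; Assign each key three times; the object still ends up with one entry per key.
Loop, 3
{
  MyArray["Entry" A_Index] := A_Index
  MyArray["Entry" A_Index] := A_Index
  MyArray["Entry" A_Index] := A_Index
}

; Only Entry1, Entry2 and Entry3 are enumerated.
for key, value in MyArray
  msg .=  "key=" key "`nvalue=" value "`n`n"

MsgBox % msg

MyArray := ""  ; release the object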
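
Applied to a newline-separated list, the same idea might look like the sketch below. DedupeWithObject is a hypothetical helper name and the entries are placeholders; it assumes one entry per line.

```autohotkey
; AutoHotkey v1 sketch; DedupeWithObject is a hypothetical helper name.
DedupeWithObject(list)
{
    seen := {}   ; keys are the entries that were already emitted
    out := ""
    Loop, Parse, list, `n, `r   ; split on `n and strip any `r from line ends
    {
        if (A_LoopField = "" || seen.HasKey(A_LoopField))
            continue             ; skip blank lines and repeats
        seen[A_LoopField] := true
        out .= A_LoopField "`n"
    }
    return RTrim(out, "`n")
}

MsgBox % DedupeWithObject("a.example`nb.example`na.example`nc.example`nb.example")
```

Unlike Sort with U, this keeps the entries in first-seen order. String keys in v1 objects are case-insensitive, so differently cased copies of an entry are treated as the same key.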
dd900
Posts: 121
Joined: 27 Oct 2013, 16:03

Re: Filtering duplicates from large list (what's the fastest way?)

31 Mar 2021, 12:10

jasc2v8 wrote:
31 Mar 2021, 11:31
if you add the duplicate entry items to an object as the key, dups will be omitted.
mikeyww wrote:
31 Mar 2021, 10:06
You might want to look at database options such as SQLite. Another option could be adding all entries and then removing duplicates, but I have not tested it.

Explained: Sort
Thanks for the suggestions, guys. In my case, Sort, var, U turned out to be the fastest. The object approach wasn't much slower than Sort (and slightly faster than InStr), but it used far more memory than either of the other two methods. I will try to implement my other two ideas to see how they compare. All other suggestions are welcome. Thank you.
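
For anyone who wants to repeat the comparison, a rough timing harness could look like this. It is a sketch on synthetic data (the generated host names are placeholders, not real filter entries), and A_TickCount only resolves to roughly 10-16 ms, so treat the numbers as approximate:

```autohotkey
; AutoHotkey v1 sketch for timing two approaches on generated data.
SetBatchLines, -1   ; run at full speed so the timings are not throttled

; Build 100000 lines in which every entry appears exactly twice.
list := ""
Loop, 100000
    list .= "host" Mod(A_Index, 50000) ".example`n"
list := RTrim(list, "`n")

; Approach 1: Sort with the U (unique) option
sorted := list
start := A_TickCount
Sort, sorted, U
removed := ErrorLevel   ; number of duplicates Sort removed
sortMs := A_TickCount - start

; Approach 2: object keys
start := A_TickCount
seen := {}
Loop, Parse, list, `n
    seen[A_LoopField] := true
objMs := A_TickCount - start

MsgBox % "Sort, U: " sortMs " ms`nObject keys: " objMs " ms`nDuplicates removed by Sort: " removed
```

Results will depend on the machine and on how many duplicates the real lists contain, so it is worth running this against the actual data as well.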
dd900
Posts: 121
Joined: 27 Oct 2013, 16:03

Re: Filtering duplicates from large list (what's the fastest way?)  Topic is solved

19 Apr 2021, 01:03

If anyone is interested, I ended up using CLR with a wrapper class that keeps named HashSet<string> instances in a Dictionary<string, HashSet<string>>.
A HashSet filters out duplicates as items are added to it, and its UnionWith and ExceptWith methods are blazing fast.

Code: Select all

using System.Collections.Generic;

// Holds several named string sets. Results are returned as 1/0 instead of bool.
// Methods that take a list name throw KeyNotFoundException if the list does not exist.
class HashSetDict
{
	private readonly Dictionary<string, HashSet<string>> listContainer = new Dictionary<string, HashSet<string>>();

	public int HasList(string name)
	{
		return listContainer.ContainsKey(name) ? 1 : 0;
	}

	// Throws ArgumentException if a list with this name already exists.
	public void AddList(string name)
	{
		listContainer.Add(name, new HashSet<string>());
	}

	public int RemoveList(string name)
	{
		listContainer[name].Clear();
		return listContainer.Remove(name) ? 1 : 0;
	}

	// Returns 1 if the item was added, 0 if it was already in the set.
	public int AddToList(string name, string item)
	{
		return listContainer[name].Add(item) ? 1 : 0;
	}

	public int RemoveFromList(string name, string item)
	{
		return listContainer[name].Remove(item) ? 1 : 0;
	}

	public int ListContains(string name, string item)
	{
		return listContainer[name].Contains(item) ? 1 : 0;
	}

	// Removes from "name" every item that is also in "exceptName".
	public void ListExceptWith(string name, string exceptName)
	{
		listContainer[name].ExceptWith(listContainer[exceptName]);
	}

	// Adds to "name" every item of "unionName" that it does not already contain.
	public void ListUnionWith(string name, string unionName)
	{
		listContainer[name].UnionWith(listContainer[unionName]);
	}

	public void ClearList(string name)
	{
		listContainer[name].Clear();
	}

	// Joins the items with "\n"; the order of a HashSet is not guaranteed.
	public string ListToString(string name)
	{
		return string.Join("\n", listContainer[name]);
	}

	public string ListToSortedString(string name)
	{
		return string.Join("\n", new SortedSet<string>(listContainer[name]));
	}
}
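
Calling the class from a script might look like the sketch below. It assumes Lexikos' CLR.ahk (with its CLR_CompileC# and CLR_CreateObject functions) sits in the script folder, that the C# source above is saved next to it as HashSetDict.cs, and that .NET Framework 4 is installed; the file name, list names and host names are placeholders chosen for illustration.

```autohotkey
; AutoHotkey v1 sketch. Assumed: CLR.ahk and HashSetDict.cs (hypothetical file name)
; are in the script folder.
#Include %A_ScriptDir%\CLR.ahk

FileRead, source, %A_ScriptDir%\HashSetDict.cs

; HashSet<T> lives in System.Core.dll and SortedSet<T> in System.dll (.NET Framework 4).
asm := CLR_CompileC#(source, "System.dll | System.Core.dll")
sets := CLR_CreateObject(asm, "HashSetDict")

sets.AddList("merged")
sets.AddList("allow")

; AddToList returns 0 for an entry that is already in the set.
for index, host in ["ads.example.com", "cdn.example.org", "ads.example.com"]
    sets.AddToList("merged", host)
sets.AddToList("allow", "cdn.example.org")

sets.ListExceptWith("merged", "allow")   ; drop the allow-listed hosts
MsgBox % sets.ListToSortedString("merged")
```

For real lists it is probably cheaper to keep the number of calls across the AutoHotkey/.NET boundary low, since each AddToList call goes through COM.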
