We have a huge Courier email sent box, and we spend hours cleaning duplicate email addresses out of it.
We're new to AHK.
Can you offer any help or suggestions for adapting this code to search a Courier sent box of about 50 MB for duplicate addresses?
Here is the third part: removing identical items from a list. The script below is faster than sorting. For each item in the list, a local variable is created whose name is the hex representation of the item (the item itself could contain characters that are illegal in a variable name, like "."). If that variable is still empty, this is the first occurrence of the item and it is kept; otherwise the item is cut out of the list.
Code:
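Hexify(x) ; Convert a string to a long hex string starting with X, e.g. "a@b" -> X614062
{
    ; The result is used as a variable name, so very long items can exceed the name length limit.
    StringLen Len, x
    format = %A_FormatInteger%      ; Remember the caller's integer format
    SetFormat Integer, H            ; Integers are now stored in hex (0x...)
    hex = X
    Loop %Len%
    {
        Transform y, ASC, %x%       ; ASCII code of 1st char, 15 < y < 256
        StringTrimLeft y, y, 2      ; Remove leading 0x
        hex = %hex%%y%
        StringTrimLeft x, x, 1      ; Remove 1st char
    }
    SetFormat Integer, %format%     ; Restore the caller's integer format
    Return hex
}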
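ListUniq(ByRef list) ; Remove repeated items from a comma-separated list, keeping first occurrences
{
    list = %list%,                          ; Append a terminating ","
    c = 0                                   ; 0-based offset where the current item starts
    Loop
    {
        StringGetPos d, list, `,,, %c%      ; Position of the next ",", searching from c
        IfLess d, 0                         ; No more ","
        {
            StringTrimRight list, list, 1   ; Drop the "," appended above
            Return
        }
        StringMid item, list, % c+1, % d-c
        hex := Hexify(item)                 ; Item might not be a valid name
        if (%hex% = "")                     ; 1st occurrence
        {
            %hex% = 1                       ; Mark the item as seen
            c := d + 1                      ; Next item starts after the ","
            Continue
        }
        ; Already found: cut the item and the "," before it out of the list
        StringLeft left, list, % c-1
        StringTrimLeft right, list, %d%
        list = %left%%right%
    }
}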
Test it with
Code:
list = 0,1,2,1,3,02,4,2,1
ListUniq(list)
MsgBox %list% ; Shows 0,1,2,3,02,4 ("02" and "2" count as different items)
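
For a sent box of about 50 MB, ListUniq is probably not the best fit as it stands: it expects one comma-separated list, it rebuilds that list every time it cuts out a duplicate, and it creates one variable per distinct item. Below is a rough sketch of a different approach, not an adaptation of the script above. It assumes AutoHotkey v1.1 or later (it uses an object as a "seen" set instead of hex-named variables) and assumes the sent box can be read as plain text, line by line; how Courier actually stores its mail is not known here. CollectAddresses and WriteUniqueAddresses are made-up names, and the address pattern is deliberately simple.

```autohotkey
; Sketch for AutoHotkey v1.1 or later (objects and RegExMatch are assumed to be available).

; Returns the addresses in "line" that are not yet recorded in "seen",
; one per line, and records them. "seen" is an object used as a set.
CollectAddresses(line, seen)
{
    found := ""
    pos := 1
    ; Deliberately simple pattern; it does not cover every valid address.
    while (pos := RegExMatch(line, "[\w.+-]+@[\w-]+(?:\.[\w-]+)+", m, pos))
    {
        pos += StrLen(m)                ; Continue searching after this match
        StringLower, addr, m            ; Treat Bob@X.com and bob@x.com as the same address
        if (!seen.HasKey(addr))         ; 1st occurrence
        {
            seen[addr] := true
            found .= addr . "`n"
        }
    }
    return found
}

; Reads inFile line by line and writes each distinct address once to outFile.
; Both parameters are file paths supplied by the caller.
WriteUniqueAddresses(inFile, outFile)
{
    seen := {}
    unique := ""
    Loop, Read, %inFile%
        unique .= CollectAddresses(A_LoopReadLine, seen)
    FileDelete, %outFile%               ; Start with a fresh output file
    FileAppend, %unique%, %outFile%
}
```

It could be called as WriteUniqueAddresses("C:\Mail\sent.txt", "C:\Mail\unique.txt"), where both paths are placeholders for your own files. Reading line by line keeps each regular-expression search short, and only the distinct addresses are held in memory.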