## Damerau-Levenshtein Distance - Fuzzy searches

Post your working scripts, libraries and tools
nnnik
Posts: 4499
Joined: 30 Sep 2013, 01:01
Location: Germany

### Damerau-Levenshtein Distance - Fuzzy searches

Code: Select all

``````/*
Levenshtein Distance
Source Wikipedia.com
https://en.wikipedia.org/wiki/Levenshtein_distance
Originally written in C translated into AutoHotkey
*/

LDistance(s, t)
{
; degenerate cases
if (StrLen( s ) = 0)
return StrLen( t )
if ( StrLen( t ) = 0)
return StrLen( s )
s := StrSplit( s )
t := StrSplit( t )
v0 := []
Loop % t.Length() + 1
v0[ A_Index ] := A_Index - 1
v3 := [v0]
Loop % s.Length()
{
; calculate v1 (current row distances) from the previous row v0
i := A_Index
v1 := [i]
; use formula to fill in the rest of the row
Loop % t.Length()
{
cost := !( s[ i ] == t[ A_Index ] )
v1[ A_Index + 1 ] := _lMin( v1[ A_Index ] + 1
, v0[ A_Index + 1 ] + 1
, v0[ A_Index ] + cost )
}
v3.Push( v0 )
v0 := v1
}

return v1[ t.Length() +1 ]
}

/*
Damerau-Levenshtein Distance
Source Wikipedia.com
https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
Originally written in Python translated into AutoHotkey
Thx to Helgef for correcting it.
*/

DLDistance( a, b )
{
da := {}
d := []
a := StrSplit( a )
b := StrSplit( b )
maxdist := a.Length() + b.Length()
d[-1, -1] := maxdist
Loop % a.Length() + 1
{
i := A_Index - 1
d[i, -1] := maxdist
d[i, 0] := i
}
Loop % b.Length() + 1
{
j := A_Index - 1
d[-1, j] := maxdist
d[0, j] := j
}
Loop % a.Length()
{
db := 0
i := A_Index
Loop % b.Length()
{
j := A_Index
k :=  da.HasKey(b[j]) ?  da[b[j]] : da[b[j]] := 0
l := db
if !( cost := !(a[i] == b[j]) )
db := j
d[i, j] := _lmin( d[i-1, j-1] + cost, d[i,   j-1] + 1, d[i-1, j  ] + 1, d[k-1, l-1] + (i-k-1) + 1 + (j-l-1 ) )
}
da[a[i]] := i
}
return d[a.Length(), b.Length()] ;I cant understand why but it always reports 1 more than it should. Since I'm a fan of easy solutions I just subtract 1 here
}

_lMin( p* )
{
Ret := p.Pop()
For each,Val in p
if ( Val < Ret )
{
Ret := Val
}
return Ret
}``````
Recommends AHK Studio
Helgef
Posts: 4693
Joined: 17 Jul 2016, 01:02
Contact:

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Great stuff nnnik, thanks for sharing

Cheers.
Edit
iPhilip wrote: used the following script, which could be used for a spell checker, to test how well it compares words:
Neat!
Last edited by Helgef on 01 Nov 2017, 03:17, edited 1 time in total.
joedf
Posts: 8667
Joined: 29 Sep 2013, 17:08
GitHub: joedf
Contact:

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Nice! better than the original ahk v1.0 implementation by poly. I wanted a non global var version
iPhilip
Posts: 672
Joined: 02 Oct 2013, 12:21

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Thank you, nnnik!

I used the following script, which could be used for a spell checker, to test how well it compares words:

Code: Select all

``````Word := "tset"
for each, Word in Sort(Score(Word,StrSplit(WordList,","),"D"))
Result .= Word.score "-" Word.word "`n"
MsgBox % Substr(Result,1,-1)
ExitApp

Score(Word,WordArray,Func:="") {
ScoredWords := []
for Index, WordElement in WordArray
ScoredWords.Push({word:WordElement, score:%Func%LDistance(Word,WordElement)})
Return ScoredWords
}

Sort(ScoredWords) {
for Index, Word in ScoredWords
SortString .= Word.score " " Index "`n"
SortString := SubStr(SortString,1,-1)
Sort, SortString, N
Indexes := []
Loop, Parse, SortString, `n
{  RegExMatch(A_LoopField, "O)(\d+) (\d+)", Match)
Indexes[Match[1]] .= Match[2] "`n"
}
for Index in Indexes
{  SortString := Substr(Indexes[Index],1,-1)
Sort, SortString, N
Indexes[Index] := SortString
}
SortedWords := []
for Score, IndexList in Indexes
for each, Index in StrSplit(IndexList,"`n")
SortedWords.Push({word:ScoredWords[Index].word, score:Score})
Return SortedWords
}``````
The result using the LDistance function is:

Code: Select all

``````2-test
2-text
3-toast
6-random``````
The result using the DLDistance function is:

Code: Select all

``````1-test
2-text
3-toast
6-random``````
The results show the benefit of the Damerau–Levenshtein distance (DLDistance) over the classical Levenshtein distance (LDistance) because it includes "transpositions among its allowable operations in addition to the three classical single-character edit operations (insertions, deletions and substitutions)." [ref]

Cheers!

Edit: I updated the result for the DLDistance function based on Helgef updated function in this post.
Last edited by iPhilip on 06 Nov 2017, 13:53, edited 1 time in total.
Windows 10 Pro (64 bit) - AutoHotkey v1.1+ (Unicode 32-bit)
burque505
Posts: 1669
Joined: 22 Jan 2017, 19:37

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Thanks, nnnik and iPhilip, that's cool. I wrapped iPhilip's code in a function so it would work with a "Run Auto Hot Key Script" [sic] Activity in UiPath Studio, which I posted at https://forum.uipath.com/t/levenshtein-distance/7068/6, linking back to this post. I also modified it so it will give back Levenshtein (i.e. change "D" to "" in the script).

Code: Select all

``for each, Word in Sort(Score(Word,StrSplit(WordList,","),""))``

Code: Select all

``for each, Word in Sort(Score(Word,StrSplit(WordList,","),"D"))``
I like how you made it easy to modify, iPhilip, thanks!

burque505
iPhilip
Posts: 672
Joined: 02 Oct 2013, 12:21

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Hi burque505,

You are welcome.
Windows 10 Pro (64 bit) - AutoHotkey v1.1+ (Unicode 32-bit)
Nextron
Posts: 1387
Joined: 01 Oct 2013, 08:23
Location: Netherlands OS: Win10 AHK: Unicode x32

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Really cool nnnik! Already got a project in mind which could use this, with something like DLDistance(a, b)-Abs(StrLen(a)-StrLen(b))*(1-(Penalty:=0.15)) to allow sub-string searches/matches still give a nice score.
I cant understand why but it always reports 1 more than it should. Since I'm a fan of easy solutions I just subtract 1 here
I'm still reading through the function, but might it be due to AHK using 1-based indices, resulting in a shift within the matrix being set up?
The subtract on the return value may not be sufficient, based on the following result:

Code: Select all

``````DLDistance("test string","test string") ;returns 0
DLDistance("test string","Xest string") ;returns 0 too``````
Helgef
Posts: 4693
Joined: 17 Jul 2016, 01:02
Contact:

### Re: Damerau-Levenshtein Distance - Fuzzy searches

I'm still reading through the function, but might it be due to AHK using 1-based indices, resulting in a shift within the matrix being set up?
I was thinking 1 based index vs 0 based index too, but my very quick look at nnnik's code made me think it was handled, but now you mention matrix, which I consider should have 1 based index
Originally written in Python translated into AutoHotkey
I'm pretty sure Python uses 0 based indices for lists, but I do not recall if, scipy or numpy matrices uses 1 or 0 based index.
nnnik
Posts: 4499
Joined: 30 Sep 2013, 01:01
Location: Germany

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Yea that's why I hesitated to publish the function initially.
If someone could point out the error I would be really greatful.
Recommends AHK Studio
Helgef
Posts: 4693
Joined: 17 Jul 2016, 01:02
Contact:

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Is it based on this code? It is from the wikipage, it looks most like your code I think.

Code: Select all

``````/*
algorithm DL-distance is
input: strings a[1..length(a)], b[1..length(b)]
output: distance, integer
da := new array of |Σ| integers
for i := 1 to |Σ| inclusive do
da[i] := 0
let d[−1..length(a), −1..length(b)] be a 2-d array of integers, dimensions length(a)+2, length(b)+2
// note that d has indices starting at −1, while a, b and da are one-indexed.
maxdist := length(a) + length(b)
d[−1, −1] := maxdist
for i := 0 to length(a) inclusive do
d[i, −1] := maxdist
d[i, 0] := i
for j := 0 to length(b) inclusive do
d[−1, j] := maxdist
d[0, j] := j
for i := 1 to length(a) inclusive do
db := 0
for j := 1 to length(b) inclusive do
k := da[b[j]]
ℓ := db
if a[i] = b[j] then
cost := 0
db := j
else
cost := 1
d[i, j] := minimum(d[i−1, j−1] + cost,  //substitution
d[i,   j−1] + 1,     //insertion
d[i−1, j  ] + 1,     //deletion
d[k−1, ℓ−1] + (i−k−1) + 1 + (j-ℓ−1)) //transposition
da[a[i]] := i
return d[length(a), length(b)]
*/
``````

Code: Select all

``````msgbox % DLDistance("test string","test string") ;returns 0
msgbox % DLDistance("test string","Xest string") ;returns 1
msgbox % DLDistance("test string","XXst string")

DLDistance( a, b )
{
da := {}
d := []
a := StrSplit( a )
b := StrSplit( b )
maxdist := a.Length() + b.Length()
d[-1, -1] := maxdist
Loop % a.Length() + 1
{
i := A_Index - 1
d[i, -1] := maxdist
d[i, 0] := i
}
Loop % b.Length() + 1
{
j := A_Index - 1
d[-1, j] := maxdist
d[0, j] := j
}
Loop % a.Length()
{
db := 0
i := A_Index
Loop % b.Length()
{
j := A_Index
k :=  da.HasKey(b[j]) ?  da[b[j]] : da[b[j]] := 0
l := db
if !( cost := !(a[i] == b[j]) )
db := j
d[i, j] := _lmin( d[i-1, j-1] + cost, d[i,   j-1] + 1, d[i-1, j  ] + 1, d[k-1, l-1] + (i-k-1) + 1 + (j-l-1 ) )
}
da[a[i]] := i
}
return d[a.Length(), b.Length()] ;I cant understand why but it always reports 1 more than it should. Since I'm a fan of easy solutions I just subtract 1 here
}

_lMin( p* )
{
Ret := p.Pop()
For each,Val in p
if ( Val < Ret )
{
Ret := Val
}
return Ret
}
``````
Cheers.
nnnik
Posts: 4499
Joined: 30 Sep 2013, 01:01
Location: Germany

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Thanks for that. I have updated the code
Recommends AHK Studio
burque505
Posts: 1669
Joined: 22 Jan 2017, 19:37

### Re: Damerau-Levenshtein Distance - Fuzzy searches

nnnik, Helgef, and iPhilip, thank you all for updating your code and your posts. Once again, greatly appreciated.
Regards,
burque505
joedf
Posts: 8667
Joined: 29 Sep 2013, 17:08
GitHub: joedf
Contact:

### Re: Damerau-Levenshtein Distance - Fuzzy searches

typo: Originally wriiten in C translated into AutoHotkey -> Originally written in C translated into AutoHotkey
nnnik
Posts: 4499
Joined: 30 Sep 2013, 01:01
Location: Germany

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Thanks
Recommends AHK Studio
joedf
Posts: 8667
Joined: 29 Sep 2013, 17:08