Page 1 of 1

### Damerau-Levenshtein Distance - Fuzzy searches

Posted: 29 Oct 2017, 02:07

Code: Select all

``````/*
Levenshtein Distance
Source Wikipedia.com
https://en.wikipedia.org/wiki/Levenshtein_distance
Originally written in C translated into AutoHotkey
*/

LDistance(s, t)
{
; degenerate cases
if (StrLen( s ) = 0)
return StrLen( t )
if ( StrLen( t ) = 0)
return StrLen( s )
s := StrSplit( s )
t := StrSplit( t )
v0 := []
Loop % t.Length() + 1
v0[ A_Index ] := A_Index - 1
v3 := [v0]
Loop % s.Length()
{
; calculate v1 (current row distances) from the previous row v0
i := A_Index
v1 := [i]
; use formula to fill in the rest of the row
Loop % t.Length()
{
cost := !( s[ i ] == t[ A_Index ] )
v1[ A_Index + 1 ] := _lMin( v1[ A_Index ] + 1
, v0[ A_Index + 1 ] + 1
, v0[ A_Index ] + cost )
}
v3.Push( v0 )
v0 := v1
}

return v1[ t.Length() +1 ]
}

/*
Damerau-Levenshtein Distance
Source Wikipedia.com
https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
Originally written in Python translated into AutoHotkey
Thx to Helgef for correcting it.
*/

DLDistance( a, b )
{
da := {}
d := []
a := StrSplit( a )
b := StrSplit( b )
maxdist := a.Length() + b.Length()
d[-1, -1] := maxdist
Loop % a.Length() + 1
{
i := A_Index - 1
d[i, -1] := maxdist
d[i, 0] := i
}
Loop % b.Length() + 1
{
j := A_Index - 1
d[-1, j] := maxdist
d[0, j] := j
}
Loop % a.Length()
{
db := 0
i := A_Index
Loop % b.Length()
{
j := A_Index
k :=  da.HasKey(b[j]) ?  da[b[j]] : da[b[j]] := 0
l := db
if !( cost := !(a[i] == b[j]) )
db := j
d[i, j] := _lmin( d[i-1, j-1] + cost, d[i,   j-1] + 1, d[i-1, j  ] + 1, d[k-1, l-1] + (i-k-1) + 1 + (j-l-1 ) )
}
da[a[i]] := i
}
return d[a.Length(), b.Length()] ;I cant understand why but it always reports 1 more than it should. Since I'm a fan of easy solutions I just subtract 1 here
}

_lMin( p* )
{
Ret := p.Pop()
For each,Val in p
if ( Val < Ret )
{
Ret := Val
}
return Ret
}``````

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 29 Oct 2017, 08:28
Great stuff nnnik, thanks for sharing

Cheers.
Edit
iPhilip wrote: used the following script, which could be used for a spell checker, to test how well it compares words:
Neat!

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 29 Oct 2017, 12:27
Nice! better than the original ahk v1.0 implementation by poly. I wanted a non global var version

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 30 Oct 2017, 18:49
Thank you, nnnik!

I used the following script, which could be used for a spell checker, to test how well it compares words:

Code: Select all

``````Word := "tset"
for each, Word in Sort(Score(Word,StrSplit(WordList,","),"D"))
Result .= Word.score "-" Word.word "`n"
MsgBox % Substr(Result,1,-1)
ExitApp

Score(Word,WordArray,Func:="") {
ScoredWords := []
for Index, WordElement in WordArray
ScoredWords.Push({word:WordElement, score:%Func%LDistance(Word,WordElement)})
Return ScoredWords
}

Sort(ScoredWords) {
for Index, Word in ScoredWords
SortString .= Word.score " " Index "`n"
SortString := SubStr(SortString,1,-1)
Sort, SortString, N
Indexes := []
Loop, Parse, SortString, `n
{  RegExMatch(A_LoopField, "O)(\d+) (\d+)", Match)
Indexes[Match[1]] .= Match[2] "`n"
}
for Index in Indexes
{  SortString := Substr(Indexes[Index],1,-1)
Sort, SortString, N
Indexes[Index] := SortString
}
SortedWords := []
for Score, IndexList in Indexes
for each, Index in StrSplit(IndexList,"`n")
SortedWords.Push({word:ScoredWords[Index].word, score:Score})
Return SortedWords
}``````
The result using the LDistance function is:

Code: Select all

``````2-test
2-text
3-toast
6-random``````
The result using the DLDistance function is:

Code: Select all

``````1-test
2-text
3-toast
6-random``````
The results show the benefit of the Damerau–Levenshtein distance (DLDistance) over the classical Levenshtein distance (LDistance) because it includes "transpositions among its allowable operations in addition to the three classical single-character edit operations (insertions, deletions and substitutions)." [ref]

Cheers!

Edit: I updated the result for the DLDistance function based on Helgef updated function in this post.

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 31 Oct 2017, 18:07
Thanks, nnnik and iPhilip, that's cool. I wrapped iPhilip's code in a function so it would work with a "Run Auto Hot Key Script" [sic] Activity in UiPath Studio, which I posted at https://forum.uipath.com/t/levenshtein-distance/7068/6, linking back to this post. I also modified it so it will give back Levenshtein (i.e. change "D" to "" in the script).

Code: Select all

``for each, Word in Sort(Score(Word,StrSplit(WordList,","),""))``

Code: Select all

``for each, Word in Sort(Score(Word,StrSplit(WordList,","),"D"))``
I like how you made it easy to modify, iPhilip, thanks!

burque505

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 31 Oct 2017, 18:32
Hi burque505,

You are welcome.

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 05 Nov 2017, 11:06
Really cool nnnik! Already got a project in mind which could use this, with something like DLDistance(a, b)-Abs(StrLen(a)-StrLen(b))*(1-(Penalty:=0.15)) to allow sub-string searches/matches still give a nice score.
I cant understand why but it always reports 1 more than it should. Since I'm a fan of easy solutions I just subtract 1 here
I'm still reading through the function, but might it be due to AHK using 1-based indices, resulting in a shift within the matrix being set up?
The subtract on the return value may not be sufficient, based on the following result:

Code: Select all

``````DLDistance("test string","test string") ;returns 0
DLDistance("test string","Xest string") ;returns 0 too``````

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 05 Nov 2017, 12:49
I'm still reading through the function, but might it be due to AHK using 1-based indices, resulting in a shift within the matrix being set up?
I was thinking 1 based index vs 0 based index too, but my very quick look at nnnik's code made me think it was handled, but now you mention matrix, which I consider should have 1 based index
Originally written in Python translated into AutoHotkey
I'm pretty sure Python uses 0 based indices for lists, but I do not recall if, scipy or numpy matrices uses 1 or 0 based index.

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 06 Nov 2017, 12:07
Yea that's why I hesitated to publish the function initially.
If someone could point out the error I would be really greatful.

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 06 Nov 2017, 13:11
Is it based on this code? It is from the wikipage, it looks most like your code I think.

Code: Select all

``````/*
algorithm DL-distance is
input: strings a[1..length(a)], b[1..length(b)]
output: distance, integer
da := new array of |Σ| integers
for i := 1 to |Σ| inclusive do
da[i] := 0
let d[−1..length(a), −1..length(b)] be a 2-d array of integers, dimensions length(a)+2, length(b)+2
// note that d has indices starting at −1, while a, b and da are one-indexed.
maxdist := length(a) + length(b)
d[−1, −1] := maxdist
for i := 0 to length(a) inclusive do
d[i, −1] := maxdist
d[i, 0] := i
for j := 0 to length(b) inclusive do
d[−1, j] := maxdist
d[0, j] := j
for i := 1 to length(a) inclusive do
db := 0
for j := 1 to length(b) inclusive do
k := da[b[j]]
ℓ := db
if a[i] = b[j] then
cost := 0
db := j
else
cost := 1
d[i, j] := minimum(d[i−1, j−1] + cost,  //substitution
d[i,   j−1] + 1,     //insertion
d[i−1, j  ] + 1,     //deletion
d[k−1, ℓ−1] + (i−k−1) + 1 + (j-ℓ−1)) //transposition
da[a[i]] := i
return d[length(a), length(b)]
*/
``````

Code: Select all

``````msgbox % DLDistance("test string","test string") ;returns 0
msgbox % DLDistance("test string","Xest string") ;returns 1
msgbox % DLDistance("test string","XXst string")

DLDistance( a, b )
{
da := {}
d := []
a := StrSplit( a )
b := StrSplit( b )
maxdist := a.Length() + b.Length()
d[-1, -1] := maxdist
Loop % a.Length() + 1
{
i := A_Index - 1
d[i, -1] := maxdist
d[i, 0] := i
}
Loop % b.Length() + 1
{
j := A_Index - 1
d[-1, j] := maxdist
d[0, j] := j
}
Loop % a.Length()
{
db := 0
i := A_Index
Loop % b.Length()
{
j := A_Index
k :=  da.HasKey(b[j]) ?  da[b[j]] : da[b[j]] := 0
l := db
if !( cost := !(a[i] == b[j]) )
db := j
d[i, j] := _lmin( d[i-1, j-1] + cost, d[i,   j-1] + 1, d[i-1, j  ] + 1, d[k-1, l-1] + (i-k-1) + 1 + (j-l-1 ) )
}
da[a[i]] := i
}
return d[a.Length(), b.Length()] ;I cant understand why but it always reports 1 more than it should. Since I'm a fan of easy solutions I just subtract 1 here
}

_lMin( p* )
{
Ret := p.Pop()
For each,Val in p
if ( Val < Ret )
{
Ret := Val
}
return Ret
}
``````
Cheers.

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 07 Nov 2017, 09:37
Thanks for that. I have updated the code

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 07 Nov 2017, 19:11
nnnik, Helgef, and iPhilip, thank you all for updating your code and your posts. Once again, greatly appreciated.
Regards,
burque505

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 07 Nov 2017, 21:40
typo: Originally wriiten in C translated into AutoHotkey -> Originally written in C translated into AutoHotkey

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 14 Nov 2017, 15:28
Thanks

### Re: Damerau-Levenshtein Distance - Fuzzy searches

Posted: 14 Nov 2017, 22:00