Damerau-Levenshtein Distance - Fuzzy searches

Post your working scripts, libraries and tools for AHK v1.1 and older
User avatar
nnnik
Posts: 4500
Joined: 30 Sep 2013, 01:01
Location: Germany

Damerau-Levenshtein Distance - Fuzzy searches

29 Oct 2017, 02:07

Code: Select all

/*
Levenshtein Distance
Source Wikipedia.com
https://en.wikipedia.org/wiki/Levenshtein_distance
Originally written in C translated into AutoHotkey
*/

LDistance(s, t)
{ 
    ; degenerate cases
    if (StrLen( s ) = 0) 
		return StrLen( t )
	if ( StrLen( t ) = 0) 
		return StrLen( s )
	s := StrSplit( s )
	t := StrSplit( t )
    v0 := []
    Loop % t.Length() + 1
        v0[ A_Index ] := A_Index - 1
	v3 := [v0]
    Loop % s.Length()
    {
        ; calculate v1 (current row distances) from the previous row v0
		i := A_Index
		v1 := [i]
        ; use formula to fill in the rest of the row
        Loop % t.Length()
		{
			cost := !( s[ i ] == t[ A_Index ] )
			v1[ A_Index + 1 ] := _lMin( v1[ A_Index ] + 1
			, v0[ A_Index + 1 ] + 1
			, v0[ A_Index ] + cost )
		}
		v3.Push( v0 )
		v0 := v1
    }

    return v1[ t.Length() +1 ]
}

/*
Damerau-Levenshtein Distance
Source Wikipedia.com
https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
Originally written in Python translated into AutoHotkey
Thx to Helgef for correcting it.
*/

DLDistance( a, b )
{
    da := {}
    d := []
	a := StrSplit( a )
	b := StrSplit( b )
    maxdist := a.Length() + b.Length()
    d[-1, -1] := maxdist
    Loop % a.Length() + 1
	{
        i := A_Index - 1
		d[i, -1] := maxdist
        d[i, 0] := i
    }
	Loop % b.Length() + 1
	{
		j := A_Index - 1
		d[-1, j] := maxdist
        d[0, j] := j
	}
    Loop % a.Length()
	{
        db := 0
		i := A_Index
        Loop % b.Length()
		{
            j := A_Index
			k :=  da.HasKey(b[j]) ?  da[b[j]] : da[b[j]] := 0
            l := db
            if !( cost := !(a[i] == b[j]) )
				db := j
            d[i, j] := _lmin( d[i-1, j-1] + cost, d[i,   j-1] + 1, d[i-1, j  ] + 1, d[k-1, l-1] + (i-k-1) + 1 + (j-l-1 ) ) 
		}
        da[a[i]] := i
	}
    return d[a.Length(), b.Length()] ;I cant understand why but it always reports 1 more than it should. Since I'm a fan of easy solutions I just subtract 1 here
}

_lMin( p* )
{
	Ret := p.Pop()
	For each,Val in p
		if ( Val < Ret )
		{
			Ret := Val
		}
	return Ret
}
Recommends AHK Studio
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Damerau-Levenshtein Distance - Fuzzy searches

29 Oct 2017, 08:28

Great stuff nnnik, thanks for sharing :thumbup:

Cheers.
Edit
iPhilip wrote: used the following script, which could be used for a spell checker, to test how well it compares words:
Neat! :thumbup:
Last edited by Helgef on 01 Nov 2017, 03:17, edited 1 time in total.
User avatar
joedf
Posts: 8940
Joined: 29 Sep 2013, 17:08
Location: Canada
Contact:

Re: Damerau-Levenshtein Distance - Fuzzy searches

29 Oct 2017, 12:27

Nice! better than the original ahk v1.0 implementation by poly. I wanted a non global var version :+1:
Image Image Image Image Image
Windows 10 x64 Professional, Intel i5-8500, NVIDIA GTX 1060 6GB, 2x16GB Kingston FURY Beast - DDR4 3200 MHz | [About Me] | [About the AHK Foundation] | [Courses on AutoHotkey]
[ASPDM - StdLib Distribution] | [Qonsole - Quake-like console emulator] | [LibCon - Autohotkey Console Library]
iPhilip
Posts: 801
Joined: 02 Oct 2013, 12:21

Re: Damerau-Levenshtein Distance - Fuzzy searches

30 Oct 2017, 18:49

Thank you, nnnik!

I used the following script, which could be used for a spell checker, to test how well it compares words:

Code: Select all

Word := "tset"
WordList := "random,task,test,text,toast"
for each, Word in Sort(Score(Word,StrSplit(WordList,","),"D"))
   Result .= Word.score "-" Word.word "`n"
MsgBox % Substr(Result,1,-1)
ExitApp

Score(Word,WordArray,Func:="") {
   ScoredWords := []
   for Index, WordElement in WordArray
      ScoredWords.Push({word:WordElement, score:%Func%LDistance(Word,WordElement)})
   Return ScoredWords
}

Sort(ScoredWords) {
   for Index, Word in ScoredWords
      SortString .= Word.score " " Index "`n"
   SortString := SubStr(SortString,1,-1)
   Sort, SortString, N
   Indexes := []
   Loop, Parse, SortString, `n
   {  RegExMatch(A_LoopField, "O)(\d+) (\d+)", Match)
      Indexes[Match[1]] .= Match[2] "`n"
   }
   for Index in Indexes
   {  SortString := Substr(Indexes[Index],1,-1)
      Sort, SortString, N
      Indexes[Index] := SortString
   }
   SortedWords := []
   for Score, IndexList in Indexes
      for each, Index in StrSplit(IndexList,"`n")
         SortedWords.Push({word:ScoredWords[Index].word, score:Score})
   Return SortedWords
}
The result using the LDistance function is:

Code: Select all

2-test
2-text
3-task
3-toast
6-random
The result using the DLDistance function is:

Code: Select all

1-test
2-text
3-task
3-toast
6-random
The results show the benefit of the Damerau–Levenshtein distance (DLDistance) over the classical Levenshtein distance (LDistance) because it includes "transpositions among its allowable operations in addition to the three classical single-character edit operations (insertions, deletions and substitutions)." [ref]

Cheers!

Edit: I updated the result for the DLDistance function based on Helgef updated function in this post.
Last edited by iPhilip on 06 Nov 2017, 13:53, edited 1 time in total.
Windows 10 Pro (64 bit) - AutoHotkey v2.0+ (Unicode 64-bit)
burque505
Posts: 1731
Joined: 22 Jan 2017, 19:37

Re: Damerau-Levenshtein Distance - Fuzzy searches

31 Oct 2017, 18:07

Thanks, nnnik and iPhilip, that's cool. I wrapped iPhilip's code in a function so it would work with a "Run Auto Hot Key Script" [sic] Activity in UiPath Studio, which I posted at https://forum.uipath.com/t/levenshtein-distance/7068/6, linking back to this post. I also modified it so it will give back Levenshtein (i.e. change "D" to "" in the script).

Code: Select all

for each, Word in Sort(Score(Word,StrSplit(WordList,","),""))
instead of

Code: Select all

for each, Word in Sort(Score(Word,StrSplit(WordList,","),"D"))
I like how you made it easy to modify, iPhilip, thanks!

burque505
iPhilip
Posts: 801
Joined: 02 Oct 2013, 12:21

Re: Damerau-Levenshtein Distance - Fuzzy searches

31 Oct 2017, 18:32

Hi burque505,

You are welcome. :)
Windows 10 Pro (64 bit) - AutoHotkey v2.0+ (Unicode 64-bit)
User avatar
Nextron
Posts: 1391
Joined: 01 Oct 2013, 08:23
Location: Netherlands OS: Win10 AHK: Unicode x32

Re: Damerau-Levenshtein Distance - Fuzzy searches

05 Nov 2017, 11:06

Really cool nnnik! Already got a project in mind which could use this, with something like DLDistance(a, b)-Abs(StrLen(a)-StrLen(b))*(1-(Penalty:=0.15)) to allow sub-string searches/matches still give a nice score.
I cant understand why but it always reports 1 more than it should. Since I'm a fan of easy solutions I just subtract 1 here
I'm still reading through the function, but might it be due to AHK using 1-based indices, resulting in a shift within the matrix being set up?
The subtract on the return value may not be sufficient, based on the following result:

Code: Select all

DLDistance("test string","test string") ;returns 0
DLDistance("test string","Xest string") ;returns 0 too
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Damerau-Levenshtein Distance - Fuzzy searches

05 Nov 2017, 12:49

I'm still reading through the function, but might it be due to AHK using 1-based indices, resulting in a shift within the matrix being set up?
I was thinking 1 based index vs 0 based index too, but my very quick look at nnnik's code made me think it was handled, but now you mention matrix, which I consider should have 1 based index :)
Originally written in Python translated into AutoHotkey
I'm pretty sure Python uses 0 based indices for lists, but I do not recall if, scipy or numpy matrices uses 1 or 0 based index. :think:
User avatar
nnnik
Posts: 4500
Joined: 30 Sep 2013, 01:01
Location: Germany

Re: Damerau-Levenshtein Distance - Fuzzy searches

06 Nov 2017, 12:07

Yea that's why I hesitated to publish the function initially.
If someone could point out the error I would be really greatful.
Recommends AHK Studio
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Damerau-Levenshtein Distance - Fuzzy searches

06 Nov 2017, 13:11

Is it based on this code? It is from the wikipage, it looks most like your code I think.

Code: Select all

/*
algorithm DL-distance is
    input: strings a[1..length(a)], b[1..length(b)]
    output: distance, integer
    da := new array of |Σ| integers
    for i := 1 to |Σ| inclusive do
        da[i] := 0
    let d[−1..length(a), −1..length(b)] be a 2-d array of integers, dimensions length(a)+2, length(b)+2
    // note that d has indices starting at −1, while a, b and da are one-indexed.
    maxdist := length(a) + length(b)
    d[−1, −1] := maxdist
    for i := 0 to length(a) inclusive do
        d[i, −1] := maxdist
        d[i, 0] := i
    for j := 0 to length(b) inclusive do
        d[−1, j] := maxdist
        d[0, j] := j
    for i := 1 to length(a) inclusive do
        db := 0
        for j := 1 to length(b) inclusive do
            k := da[b[j]]
            ℓ := db
            if a[i] = b[j] then
                cost := 0
                db := j
            else
                cost := 1
            d[i, j] := minimum(d[i−1, j−1] + cost,  //substitution
                               d[i,   j−1] + 1,     //insertion
                               d[i−1, j  ] + 1,     //deletion
                               d[k−1, ℓ−1] + (i−k−1) + 1 + (j-ℓ−1)) //transposition
        da[a[i]] := i
    return d[length(a), length(b)]
*/
I adjusted your code to this,

Code: Select all

msgbox % DLDistance("test string","test string") ;returns 0
msgbox % DLDistance("test string","Xest string") ;returns 1
msgbox % DLDistance("test string","XXst string")


DLDistance( a, b )
{
    da := {}
    d := []
	a := StrSplit( a )
	b := StrSplit( b )
    maxdist := a.Length() + b.Length()
    d[-1, -1] := maxdist
    Loop % a.Length() + 1
	{
        i := A_Index - 1
		d[i, -1] := maxdist
        d[i, 0] := i
    }
	Loop % b.Length() + 1
	{
		j := A_Index - 1
		d[-1, j] := maxdist
        d[0, j] := j
	}
    Loop % a.Length()
	{
        db := 0
		i := A_Index
        Loop % b.Length()
		{
            j := A_Index
			k :=  da.HasKey(b[j]) ?  da[b[j]] : da[b[j]] := 0
            l := db
            if !( cost := !(a[i] == b[j]) )
				db := j
            d[i, j] := _lmin( d[i-1, j-1] + cost, d[i,   j-1] + 1, d[i-1, j  ] + 1, d[k-1, l-1] + (i-k-1) + 1 + (j-l-1 ) ) 
		}
        da[a[i]] := i
	}
    return d[a.Length(), b.Length()] ;I cant understand why but it always reports 1 more than it should. Since I'm a fan of easy solutions I just subtract 1 here
}

_lMin( p* )
{
	Ret := p.Pop()
	For each,Val in p
		if ( Val < Ret )
		{
			Ret := Val
		}
	return Ret
}
Cheers.
User avatar
nnnik
Posts: 4500
Joined: 30 Sep 2013, 01:01
Location: Germany

Re: Damerau-Levenshtein Distance - Fuzzy searches

07 Nov 2017, 09:37

Thanks for that. I have updated the code
Recommends AHK Studio
burque505
Posts: 1731
Joined: 22 Jan 2017, 19:37

Re: Damerau-Levenshtein Distance - Fuzzy searches

07 Nov 2017, 19:11

nnnik, Helgef, and iPhilip, thank you all for updating your code and your posts. Once again, greatly appreciated.
Regards,
burque505
User avatar
joedf
Posts: 8940
Joined: 29 Sep 2013, 17:08
Location: Canada
Contact:

Re: Damerau-Levenshtein Distance - Fuzzy searches

07 Nov 2017, 21:40

typo: Originally wriiten in C translated into AutoHotkey -> Originally written in C translated into AutoHotkey
Image Image Image Image Image
Windows 10 x64 Professional, Intel i5-8500, NVIDIA GTX 1060 6GB, 2x16GB Kingston FURY Beast - DDR4 3200 MHz | [About Me] | [About the AHK Foundation] | [Courses on AutoHotkey]
[ASPDM - StdLib Distribution] | [Qonsole - Quake-like console emulator] | [LibCon - Autohotkey Console Library]
User avatar
nnnik
Posts: 4500
Joined: 30 Sep 2013, 01:01
Location: Germany

Re: Damerau-Levenshtein Distance - Fuzzy searches

14 Nov 2017, 15:28

Thanks
Recommends AHK Studio
jsong55
Posts: 222
Joined: 30 Mar 2021, 22:02

Re: Damerau-Levenshtein Distance - Fuzzy searches

06 Mar 2024, 18:40

Can someone please do v2 version of this?

Not sure how to solve the 2d array issue I get this error Image

Return to “Scripts and Functions (v1)”

Who is online

Users browsing this forum: No registered users and 119 guests