I transcribed a C# Levenshtein distance function, the performance is 100 times slower? What is the problem? Topic is solved

Get help with using AutoHotkey (v2 or newer) and its commands and hotkeys
viv
Posts: 219
Joined: 09 Dec 2020, 17:48

I transcribed a C# Levenshtein distance function, the performance is 100 times slower? What is the problem?

24 Feb 2023, 06:09

C#
https://stackoverflow.com/a/40775015

Ahk

Code: Select all

LD(source, target) {
    if (source = target)
        return 0
    if !(lens := StrLen(source))
        return 0
    if !(lent := StrLen(target))
        return 0

    v0 := []
    v1 := []
    Loop lent + 1
        v0.Push(A_Index - 1)

    Loop lens {
        v1.Push(A_Index)
        i := A_Index
        Loop lent
        {
            cost := (SubStr(source, i, 1) = SubStr(target, A_Index, 1)) ? 0 : 1
            if v1.Length < A_Index + 1
                v1.Push("")
            v1[A_Index + 1] := Min(v1[A_Index] + 1, Min(v0[A_Index + 1] + 1, v0[A_Index] + cost))
        }
        Loop lent + 1
            v0[A_Index] := v1[A_Index]
    }
    return v1[lent + 1]
}
Simple test
20230224190749.jpeg
20230224190749.jpeg (44.67 KiB) Viewed 812 times
User avatar
boiler
Posts: 17384
Joined: 21 Dec 2014, 02:44

Re: I transcribed a C# Levenshtein distance function, the performance is 100 times slower? What is the problem?

24 Feb 2023, 07:08

C# is a compiled language and AHK is an interpreted language. What difference would you have expected to be reasonable?
viv
Posts: 219
Joined: 09 Dec 2020, 17:48

Re: I transcribed a C# Levenshtein distance function, the performance is 100 times slower? What is the problem?

24 Feb 2023, 07:35

@boiler

Because most functions don't have that much of a gap?

Code: Select all

loop 1000000
   v := s1 = s2 ; v1 := StrLen(s1) 
c# 12ms
ahk 156ms
This is almost no effect on the use of...

So I'm looking for what might be causing the problem.
The most likely part is probably the SubStr part?
C# can access a character of the current string directly by index
AHK I can only use SubStr for now

Is the overhead of this part that big?
I've split the string StrSplit and accessed it through an array and it doesn't improve much
iseahound
Posts: 1471
Joined: 13 Aug 2016, 21:04
Contact:

Re: I transcribed a C# Levenshtein distance function, the performance is 100 times slower? What is the problem?

24 Feb 2023, 17:35

It's the loops. They're faster in assembly (or the common language runtime in this case). I could use machine code to make it as fast, but it's not that slow!
iPhilip
Posts: 835
Joined: 02 Oct 2013, 12:21

Re: I transcribed a C# Levenshtein distance function, the performance is 100 times slower? What is the problem?  Topic is solved

24 Feb 2023, 23:04

@viv If you are interested in making your AutoHotkey script faster, the function below is ~30% faster than the code in your original post. It also fixes a few errors: what's returned when empty strings are passed or when the strings differ by case.

Code: Select all

; https://en.wikipedia.org/wiki/Levenshtein_distance#Iterative_with_two_matrix_rows

LD(Source, Target) {
   if Source == Target
      return 0
   Source := StrSplit(Source)
   Target := StrSplit(Target)
   if !Source.Length
      return Target.Length
   if !Target.Length
      return Source.Length
   
   v0 := [], v1 := []
   Loop Target.Length + 1
      v0.Push(A_Index - 1)
   v1.Length := v0.Length
   
   for Index, SourceChar in Source {
      v1[1] := Index
      for TargetChar in Target
         v1[A_Index + 1] := Min(v1[A_Index] + 1, v0[A_Index + 1] + 1, v0[A_Index] + (SourceChar !== TargetChar))
      Loop Target.Length + 1
         v0[A_Index] := v1[A_Index]
   }
   return v1[Target.Length + 1]
}
Cheers!
Windows 10 Pro (64 bit) - AutoHotkey v2.0+ (Unicode 64-bit)
viv
Posts: 219
Joined: 09 Dec 2020, 17:48

Re: I transcribed a C# Levenshtein distance function, the performance is 100 times slower? What is the problem?

25 Feb 2023, 00:01

@iPhilip

Thanks for the help, it's really faster
But it still doesn't meet my needs

I started by loading the CLR in AHK
It was very very fast

But some users don't like net or maybe the environment has problems to load CLR
So I want to switch to AHK native functions

It is not a core parameter
I made a renaming software
I use it to display the magnitude of the change after renaming for sorting purposes


In the test, it takes more than one second to load 100 files
This is too much consumption

It is common to rename a batch of thousands of files at a time
This would take 10+ seconds...

I decided to disable it when I couldn't load the CLR.

Finally, thanks for your help and have a nice life!
toralf
Posts: 868
Joined: 27 Apr 2014, 21:08
Location: Germany

Re: I transcribed a C# Levenshtein distance function, the performance is 100 times slower? What is the problem?

26 Feb 2023, 11:18

I tested Levenshtein a few years ago to compare meta data of music files. I had similar thoughts on speed. I switched to SIFT3, it gives different numbers, but it was about 13 times faster and did the job that was expected.
See http://www.autohotkey.com/forum/topic59407.html
ciao
toralf

Return to “Ask for Help (v2)”

Who is online

Users browsing this forum: Bing [Bot] and 40 guests