AutoHotkey Community

It is currently May 27th, 2012, 12:15 pm

All times are UTC [ DST ]




Post new topic Reply to topic  [ 23 posts ]  Go to page Previous  1, 2
Author Message
 Post subject:
PostPosted: January 12th, 2006, 2:54 pm 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
esromneb wrote:
Case sensitive would be faster for me then, right?
There is no significant speed diference in AHK. In other languages the case sensitive comparision could be faster.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: June 20th, 2010, 10:23 am 
Offline

Joined: January 31st, 2005, 9:50 am
Posts: 3910
Location: Bremen, Germany
I tried the code and found out, that is giving funny results when the string contains spaces. I changed Laszlos code in one line from
Code:
si = %A_LoopField%
to
Code:
si := A_LoopField
Now it works fine. I also added the case sensitive comparison

Full code below:
Code:
MsgBox % Levenshtein_distance( "A H K", "A H K" )
MsgBox % Levenshtein_distance( "AHK", "" )
MsgBox % Levenshtein_distance( "He", "Ben" )
MsgBox % Levenshtein_distance( "Toralf", "esromneb" )
MsgBox % Levenshtein_distance( "Toralf", "RalfLaDuce" )

Levenshtein_distance( s, t )
{
   n := StrLen(s)
   IfEqual n,0, Return m
   m := StrLen(t)
   IfEqual m,0, Return n

   d0_0 = 0
   Loop %n%
      d0_%A_Index% := A_Index
   Loop %m%
      d%A_Index%_0 := A_Index

   Loop Parse, s
   {
      i  = %A_Index%
      i1:= i-1
      si := A_LoopField
      Loop Parse, t
      {
         j1 := A_Index - 1
         d%A_Index%_%i% := min( d%A_Index%_%i1%+1, d%j1%_%i%+1, d%j1%_%i1%+!(si == A_LoopField) )
      }
   }
   Return d%m%_%n%
}

min( a, b, c )
{
   IfLess b, %a%, SetEnv a, %b%
   IfLess c, %a%, Return c
   Return a
}

_________________
Ciao
toralf
Image


Last edited by toralf on June 21st, 2010, 3:15 pm, edited 1 time in total.

Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: June 20th, 2010, 10:32 am 
Offline

Joined: January 31st, 2005, 9:50 am
Posts: 3910
Location: Bremen, Germany
On wikipedia
http://en.wikipedia.org/wiki/Levenshtein_distance
several possible improvements are mentioned:
    1) If we are only interested in the distance if it is smaller than a threshold k, then it suffices to compute a diagonal stripe of width 2k+1 in the matrix. In this way, the algorithm can be run in O(kl) time, where l is the length of the shortest string.

    2) By examining diagonals instead of rows, and by using lazy evaluation, we can find the Levenshtein distance in O(m (1 + d)) time (where d is the Levenshtein distance), which is much faster than the regular dynamic programming algorithm if the distance is small.

    3) We can adapt the algorithm to use less space, O(m) instead of O(mn), since it only requires that the previous row and current row be stored at any one time.


Could you please give me a hand to improve the above code? I do not even know where to start? I may be able to get 1) running, but need to play with it. Any help appreciated. Thanks

_________________
Ciao
toralf
Image


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: June 20th, 2010, 11:59 am 
Offline

Joined: January 31st, 2005, 9:50 am
Posts: 3910
Location: Bremen, Germany
I found a faster algorithm (about 10 times) on the web: SIFT3.
I'll work on that, since it is enough for waht I need.
I'll post the code when I have translated it.

Done. The SIFT3 code in AHK is here.

_________________
Ciao
toralf
Image


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: June 21st, 2010, 5:00 am 
Offline

Joined: February 14th, 2005, 4:05 pm
Posts: 4710
Location: Boulder, CO
toralf wrote:
funny results when the string contains spaces
have you tried AutoTrim Off? But, your expression assignment could be simpler.


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: August 24th, 2010, 4:48 pm 
Offline
User avatar

Joined: August 23rd, 2010, 6:22 pm
Posts: 781
Location: Ontario, Canada
Here is an optimised version of Decarlo110's script, as I interpreted it:

Code:
TextDifference2(String1,String2)
{
 Distance := 0, (String1Len := StrLen(String1)) < (String2Len := StrLen(String2)) ? (ShorterString := String1, LongerString := String2) : (ShorterString := String2, LongerString := String1)
 Loop, Parse, ShorterString
  SubStr(LongerString,A_Index,1) <> A_LoopField ? Distance ++
 Return, Distance + Abs(String1Len - String2Len)
}


It is an order of magnitude faster, and works in O(n) time. As originally mentioned, however, it's still using the simple scanning strategy, and so is not as accurate as any of the other examples. It's main advantage is that it is extremely fast.

Say, isn't the SIFT3 algorithm similar to this, except with an offset? It's just finding the longest common string, right?


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: August 24th, 2010, 4:53 pm 
Offline

Joined: May 27th, 2007, 9:41 am
Posts: 4999
http://www.autohotkey.com/forum/viewtop ... ight=sift3

_________________
AHK FAQ
TF : Text files & strings lib, TF Forum


Report this post
Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: August 24th, 2010, 4:54 pm 
Offline
User avatar

Joined: August 23rd, 2010, 6:22 pm
Posts: 781
Location: Ontario, Canada
Thanks for the link, checking out now.


Report this post
Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 23 posts ]  Go to page Previous  1, 2

All times are UTC [ DST ]


Who is online

Users browsing this forum: fincs and 7 guests


You can post new topics in this forum
You can reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Powered by phpBB® Forum Software © phpBB Group