[Class] stringc.ahk

Post your working scripts, libraries and tools for AHK v1.1 and older
Chunjee
Posts: 1495
Joined: 18 Apr 2014, 19:05

[Class] stringc.ahk

Post by Chunjee » 03 Jun 2022, 18:28

stringc.ahk



Finds the degree of similarity between two strings, based on Dice's coefficient, which is usually a better fit than Levenshtein distance.


Installation
In a terminal or command line navigated to your project folder:

Code: Select all

npm install stringc.ahk

In your code, only export.ahk needs to be included:

Code: Select all

#Include %A_ScriptDir%\node_modules
#Include stringc.ahk\export.ahk
ostringc := new stringc()

ostringc.compare("test", "testing")
; => 0.67
ostringc.compare("Hello", "hello")
; => 1.0
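
As a rough check on the 0.67 above, here is Dice's coefficient worked by hand. This assumes the class compares adjacent character pairs (bigrams), which the thread does not state outright: "test" gives {te, es, st}, "testing" gives {te, es, st, ti, in, ng}, and three pairs are shared.

```latex
s(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}
\qquad
s(\text{test}, \text{testing}) = \frac{2 \cdot 3}{3 + 6} = \frac{6}{9} \approx 0.67
```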

API
Including the module provides a class stringc with three methods: .compare, .compareAll, and .bestMatch
Last edited by Chunjee on 22 May 2024, 14:20, edited 8 times in total.

Re: [Class] stringc.ahk

Post by Chunjee » 03 Jun 2022, 18:29

Releases



Installation
In a terminal or command line navigated to your project folder:

Code: Select all

npm install stringc.ahk

You may also review or copy the package from ./export.ahk on GitHub; #Include it as you normally would when downloading manually.

In your code, only export.ahk needs to be included:

Code: Select all

#Include %A_ScriptDir%\node_modules\stringc.ahk\export.ahk
ostringc := new stringc()

ostringc.compare("test", "testing")
; => 0.67
ostringc.compare("Hello", "hello")
; => 1.0

API
Including the module provides a class stringc with three methods: .compare, .compareAll, and .bestMatch
Last edited by Chunjee on 11 Jun 2022, 17:40, edited 3 times in total.

Re: [Class] stringc.ahk

Post by Chunjee » 03 Jun 2022, 18:30

API
Including the module provides a class stringc with three methods: .compare, .compareAll, and .bestMatch


compare(string1, string2, [function])
Returns a fraction between 0 and 1, which indicates the degree of similarity between the two strings. 0 indicates completely different strings, 1 indicates identical strings. The comparison is case-insensitive.


Arguments
string1 (string): The first string.

string2 (string): The second string.

function (function): A function to be applied to both strings prior to comparison.

The order of string1 and string2 does not affect the result.


Returns
(Number): A fraction from 0 to 1, both inclusive. A higher number indicates more similarity.


Example

Code: Select all

stringc.compare("healed", "sealed")
; => 0.80

stringc.compare("Olive-green table for sale, in extremely good condition."
	, "For sale: table in very good  condition, olive green in colour.")
; => 0.71

stringc.compare("Olive-green table for sale, in extremely good condition."
	, "For sale: green Subaru Impreza, 210,000 miles")
; => 0.30

stringc.compare("Olive-green table for sale, in extremely good condition."
	, "Wanted: mountain bike with at least 21 gears.")
; => 0.11

compareAll(targetStrings, mainString, [function])
Compares mainString against each string in targetStrings.


Arguments
targetStrings (array): Each string in this array will be matched against the main string.

mainString (string): The string to match each target string against.

function (function): A function to be applied to each element in targetStrings prior to comparison.


Returns
(Object): An object with a ratings property, which gives a similarity rating for each target string, and a bestMatch property, which specifies which target string was most similar to the main string. The array of ratings is sorted from highest rating to lowest.


Example

Code: Select all

stringc.compareAll(["For sale: green Subaru Impreza, 210,000 miles"
	, "For sale: table in very good condition, olive green in colour."
	, "Wanted: mountain bike with at least 21 gears."]
	, "Olive-green table for sale, in extremely good condition.")
; =>
{ ratings:
	[{ target: "For sale: table in very good condition, olive green in colour.",
		rating: 0.71 },
	{ target: "For sale: green Subaru Impreza, 210,000 miles",
		rating: 0.30 },
	{ target: "Wanted: mountain bike with at least 21 gears.",
		rating: 0.11 }],
	bestMatch:
	{ target: "For sale: table in very good condition, olive green in colour.",
		rating: 0.71 } }
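
A minimal sketch of the optional function argument with an array of objects instead of plain strings. Several things here are assumptions rather than documented behaviour: that the function is passed as a Func object, that it receives the element as its first argument, and the names movies and getTitle, which are made up for illustration. The second movie title is sample data only.

```autohotkey
#Include %A_ScriptDir%\node_modules
#Include stringc.ahk\export.ahk
ostringc := new stringc()

; Hypothetical data: each element is an object rather than a plain string
movies := [{title: "Harry Potter and the Chamber of Secrets (2002)"}
	, {title: "Harry Potter and the Goblet of Fire (2005)"}]

; Assumption: the third argument accepts a Func object
result := ostringc.compareAll(movies, "harry potter chamber of secrets", Func("getTitle"))

; ratings and bestMatch are the properties described above
MsgBox % result.bestMatch.rating

; Hypothetical accessor. Assumption: the element arrives as the first argument.
; The variadic rest* absorbs any extra arguments the class might pass.
getTitle(item, rest*) {
	return item.title
}
```

If the class invokes the function differently, only getTitle needs to change.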

bestMatch(targetStrings, mainString, [function])
Compares mainString against each string in targetStrings.


Arguments
targetStrings (Array): Each string in this array will be matched against the main string.

mainString (string): The string to match each target string against.

function (function): A function to be applied to the strings prior to comparison.


Returns
(String): The target string that was most similar to mainString.


Example

Code: Select all

stringc.bestMatch([" hard to    "
	, "hard to"
	, "Hard 2"]
	, "Hard to")
; => "hard to"
Last edited by Chunjee on 21 Oct 2022, 15:58, edited 2 times in total.

Re: [Class] stringc.ahk

Post by Chunjee » 03 Jun 2022, 18:31

Q&A

This class is the spiritual successor of string-similarity.ahk. While using that class I kept running into several issues: the method names were long and hard to remember, the argument order did not follow the haystack-needle order, and so on. This class is an attempt to fix those weak points and pivot in a new direction if needed. One big change you may also notice is the optional function argument. I work with 2D arrays often and it was a hassle using those in the old class; this should allow you much more flexibility.



I was making a movie metadata thing and needed to match the user's input with the closest IMDb match. Most string comparisons seem to be designed with short single strings in mind. Longer strings like "Harry Potter and the Chamber of Secrets (2002)" proved difficult or would return huge numbers, creating additional sorting work. I really enjoy the concept of scoring via a number between 0 and 1. This class comes with some comfort methods like compareAll or bestMatch, so you can feed in an entire array and get straight to what you wanted in the first place.
Last edited by Chunjee on 22 May 2024, 12:42, edited 2 times in total.
iseahound
Posts: 1471
Joined: 13 Aug 2016, 21:04
Contact:

Re: [Class] stringc.ahk

Post by iseahound » 03 Jun 2022, 21:50

Cool! Never heard of the Sørensen-Dice coefficient, so that was interesting to learn. Taking 2x the cardinality of the intersection of the sets / total number of pairs to get a simple similarity measure is pretty smart. If you are working with large strings, you can optimize it by writing it in C.
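
For anyone curious, the calculation described above can be sketched in a few lines of AHK v1.1. This is an illustrative stand-alone version, not the code inside stringc.ahk: diceSketch is a made-up name, and it assumes pairs are adjacent characters and that whitespace is kept, which may differ from what the class does.

```autohotkey
; 2*3 shared pairs / (3 + 6) total pairs = 0.666..., displayed as 0.67
MsgBox % Round(diceSketch("test", "testing"), 2)

; Hypothetical helper: Dice's coefficient over adjacent character pairs
diceSketch(a, b) {
	; Lowercase both inputs so the comparison is case-insensitive
	StringLower, a, a
	StringLower, b, b
	if (a == b)
		return 1.0
	if (StrLen(a) < 2 || StrLen(b) < 2)
		return 0.0

	; Count how often each pair occurs in the first string
	counts := {}
	Loop % StrLen(a) - 1
	{
		pair := SubStr(a, A_Index, 2)
		counts[pair] := (counts.HasKey(pair) ? counts[pair] : 0) + 1
	}

	; Each pair of the second string uses up one matching pair of the first
	shared := 0
	Loop % StrLen(b) - 1
	{
		pair := SubStr(b, A_Index, 2)
		if (counts.HasKey(pair) && counts[pair] > 0)
		{
			counts[pair] -= 1
			shared += 1
		}
	}

	; A string of length n has n - 1 pairs, so the total is the two lengths minus 2
	return 2 * shared / (StrLen(a) + StrLen(b) - 2)
}
```

Counting pairs in an object keeps the work roughly linear in the combined string length, which is probably enough before reaching for C.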

Re: [Class] stringc.ahk

Post by Chunjee » 22 May 2024, 12:27

Encountered a bug and fixed it:

v0.2.1
.compare: the optional function argument was being called with four arguments, two of which would always be blank. This has been fixed and added to the tests.

I added some JSDoc comments, if your IDE can make use of them.