automaticman
Joined: 27 Oct 2006 Posts: 648
Posted: Wed Dec 30, 2009 1:04 pm
It would be interesting to explore further the domains in which this distance measure could be useful in practice.
In the Wikipedia article we can read:
Quote: While the original motivation was to measure distance between human misspellings to improve applications such as spell checkers, Damerau–Levenshtein distance has also seen uses in biology to measure the variation between DNA.
Let me write my opening thoughts here. First of all, we need a representation of the data in text/string form. That could be almost anything, especially if we follow the Unix methodology described in "The Art of Unix Programming" by Eric S. Raymond ("The Importance of Being Textual" and the "Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust"). More importantly, the representation has to be one in which the edit distance measures defined above have some useful meaning in that particular domain.
So the question is: how can we "design" text representations of data/information so that this distance measure becomes useful in practice? Which text representation formats are more useful, which are less useful, and why in each case?
As a simple example, imagine a sequence of symbolic musical notes, as found in .mid files, represented as .txt. How should this .txt format be designed, and which designs would make this distance measure most meaningful?
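To make the question a bit more concrete, here is a minimal Python sketch. Everything in it is my own assumption, not something from the Wikipedia article: the function names `osa_distance` and `to_intervals` are hypothetical, the distance is the "optimal string alignment" variant (the restricted form of Damerau-Levenshtein, where a swapped pair is not edited again), and a melody is simply taken to be a list of MIDI note numbers. It compares two candidate designs: one token per absolute pitch, and one token per interval between successive notes.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Works on any two sequences of comparable tokens: characters of a
    string, MIDI note numbers, interval sizes, and so on.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i leading tokens of a
    for j in range(cols):
        d[0][j] = j  # insert all j leading tokens of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent tokens counts as a single edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def to_intervals(pitches):
    """Alternative design: encode a melody as steps between notes."""
    return [q - p for p, q in zip(pitches, pitches[1:])]


# Assumed toy data: C D E F G as MIDI note numbers.
melody = [60, 62, 64, 65, 67]
transposed = [p + 2 for p in melody]  # same tune, a whole tone higher
swapped = [60, 64, 62, 65, 67]        # two adjacent notes exchanged

# Design 1: absolute pitches. Transposition looks like several edits.
print(osa_distance(melody, transposed))
# Design 2: intervals. Transposition becomes invisible to the measure.
print(osa_distance(to_intervals(melody), to_intervals(transposed)))
# A swap of neighbouring notes is one edit in the absolute-pitch design.
print(osa_distance(melody, swapped))
```

The point of the sketch is only that the same pair of melodies can be "far apart" or "identical" depending on the text design, so the representation has to be chosen according to what should count as similar in the domain (here: whether a transposed tune is the same tune or not).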