Levenshtein and Distance Between Strings in 3D
While working on string distances, or text metrics, I found the Levenshtein method insufficient. For strings that are not very similar it doesn't help at all, giving numbers close to the maximum possible, and for similar strings it does not consider how different the differing letters are. Generally speaking, I find Janet1 closer to Janet2 than to Janet9, and ABC closer to BBC than to NBC.

The notion of the number of operations in the Levenshtein method didn't quite suit me either. Thinking of the operations needed to create one string from the other, I'd rather count the smartest possible copy-and-paste moves. In other words, how many times do I have to cut one string to build the other from the slices? That is the distance in the first dimension. The other dimension is the distance between letters that replace each other: when abc becomes bbc, it's an a-to-b replacement, and the distance from a to b is 1.

The letter distance depends on the alphabet used. In some cases it's more useful to use a keyboard-layout order of characters instead of the usual alphabetic one, in order to emphasise similarities based on easily typed sequences such as asdf or qwerty. Below are a Flash demo, a calculator and a benchmark comparing the performance of Levenshtein and my method.
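The two ideas above, letter distance and slice counting, can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the actual distance3d code: the names `letter_distance`, `substitution_distance`, `count_pieces` and `QWERTY_ORDER` are hypothetical, unknown characters are assumed to be maximally far apart, and the slice counter is a simplification that allows a slice of the source to be reused.

```python
import string

# Assumed orderings: letters then digits, and a flattened QWERTY layout.
ALPHABETIC_ORDER = string.ascii_lowercase + string.digits
QWERTY_ORDER = "qwertyuiopasdfghjklzxcvbnm"


def letter_distance(a, b, alphabet=ALPHABETIC_ORDER):
    """Gap between the positions of two characters in the given ordering."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 0
    if a not in alphabet or b not in alphabet:
        # Assumption: a character outside the ordering is maximally far away.
        return len(alphabet)
    return abs(alphabet.index(a) - alphabet.index(b))


def substitution_distance(s, t, alphabet=ALPHABETIC_ORDER):
    """Sum of letter distances, position by position, for equal-length strings."""
    if len(s) != len(t):
        raise ValueError("this sketch only compares strings of equal length")
    return sum(letter_distance(x, y, alphabet) for x, y in zip(s, t))


def count_pieces(source, target):
    """Greedy count of slices of `source` needed to assemble `target`.

    Simplifications: a slice may be reused, and a character that does not
    occur in `source` at all counts as one piece of its own.
    """
    pieces = 0
    i = 0
    while i < len(target):
        longest = 0
        # Extend the slice while it still occurs somewhere in the source.
        for j in range(i + 1, len(target) + 1):
            if target[i:j] in source:
                longest = j - i
            else:
                break
        i += max(longest, 1)
        pieces += 1
    return pieces


print(substitution_distance("Janet1", "Janet2"))  # 1
print(substitution_distance("Janet1", "Janet9"))  # 8
print(substitution_distance("ABC", "BBC"))        # 1
print(substitution_distance("ABC", "NBC"))        # 13
print(letter_distance("a", "s"))                  # 18 in alphabetic order
print(letter_distance("a", "s", QWERTY_ORDER))    # 1 in keyboard order
print(count_pieces("abcdef", "defabc"))           # 2
```

With the alphabetic ordering, Janet2 and BBC come out closer than Janet9 and NBC, and switching to the keyboard ordering makes neighbouring keys such as a and s close to each other.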
3D position of the words
The position of each word depends on its similarity to the word you input
Similarity calculator
Gives the original Levenshtein and the distance3d figures
Benchmarking
Levenshtein runs in roughly quadratic time, while distance3d appears to be closer to linear, though the
difference only shows up for words longer than 15 characters.
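To see where the quadratic cost comes from, here is the textbook dynamic-programming version of Levenshtein in Python. It is a generic sketch, not the code used in the benchmark; the function name `levenshtein` is my own choice. The two nested loops visit every pair of positions in the two strings, so the work grows with the product of their lengths.

```python
def levenshtein(s, t):
    """Classic edit distance, keeping only two rows of the table in memory."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string in the inner loop
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("Janet1", "Janet2"))   # 1
print(levenshtein("Janet1", "Janet9"))   # 1
```

Note that Janet2 and Janet9 get the same score here, which is the insensitivity to letter quality described at the top of the post.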

