Permutation distance
If you have two sequences, and one is a permutation of the other, how do you measure how far apart the two sequences are?
This post was motivated by the previous post on anagram statistics. For a certain dictionary, the code used in that post found that the longest anagram pairs were the following:
certifications, rectifications
impressiveness, permissiveness
teaspoonsful, teaspoonfuls
Anagrams are more amusing when the two words are more dissimilar, at least in my opinion.
There are several ways to measure (dis)similarity. The words in the first two pairs above are dissimilar in meaning, but the words in the last pair are synonyms [1]. The pairs of words above are not that dissimilar as character strings.
It would be hard to quantify how dissimilar two words are in meaning, but easier to quantify how dissimilar they are as sequences of characters. We'll look at two approaches: Hamming distance and transposition distance.
Hamming distanceThe Hamming distance between two words is the number of positions in which they differ. We can see the Hamming distances between the words above by viewing them in a fixed-width font.
certificationsrectificationsimpressivenesspermissivenessteaspoonsfulteaspoonfuls
The Hamming distances above are 2, 5, and 4.
The anagrams with maximum Hamming distance in my dictionary are "imperfections" and "perfectionism." (There's something poetic about these words being anagrams.) The Hamming distance is 13 because these 13-letter words differ in every position.
imperfectionsperfectionismTransposition distance
Another way to measure the distance between two permutations is how many elements would have to be swapped to turn one list into the other. This is called the transposition distance. For example, the transposition distance between "certifications" and "rectifications" is 1, because you only need to swap the first and third characters to turn one into the other.
It's not so obvious what the transposition distance is between "impressiveness" and "permissiveness". We can find an upper bound on the distance by experimentation. The two words only differ in the first five letters, so the distance between "impressiveness" and "permissiveness" is no larger than the distance between "impre" and "permi". The latter pair of words are no more than 4 transpositions apart as the following sequence shows:
imprepmirepeirmperimpermi
But is there a shorter sequence of transpositions? Would it help if we used the rest of the letters "ssiveness" as working space?
Computing transposition distance is hard, as in NP-hard. This is unfortunate since transposition has more practical applications than just word games. It is used, for example, in genetic research. It may be practical to compute for short sequences, but in general one must rely on bounds and approximations.
Note that transposition distance allows swapping any two elements. If we only allowed swapping consecutive elements, the problem is much easier, but the results are not the same. When restricted to consecutive swaps, the distance between "certifications" and "rectifications" is 3 rather than 1. We can swap the "c" and the "r" to turn "cer" into "rec" so we have to do something like
cerecrercrec
I don't know a name for the distance where you allow rotations. The words "teaspoonsful" and "teaspoonfuls" differ by one rotation, turning "sful" into "fuls." While this can be done in one rotation, it would take several swaps.
Related posts[1] Although "teaspoonfuls" is far more common, I remember a schoolmarm teaching us that this is incorrect.
And I still recoil at "brother in laws" and say "brothers in law" instead; the majority is on my side on this one.