Anagram frequency
An anagram of a word is another word formed by rearranging its letters. For example, "restful" and "fluster" are anagrams.
How common are anagrams? What is the largest set of words that are anagrams of each other? What are the longest words which have anagrams?
You'll get different answers to these questions depending on what dictionary you use. I started with the american-english dictionary from the Linux package wamerican. This list contains 102,305 words. I removed all words containing an apostrophe so that possessive forms didn't count as separate words. I then converted all words to lower case and removed duplicates. This reduced the list to 72,276 words.
Next I alphabetized the letters in each word to create a signature. Signatures that appear more than once correspond to anagrams.
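As a minimal sketch of the signature idea, the following groups a small hand-picked word list by sorted letters. The sample list and the names `signature` and `anagram_classes` are assumptions made for illustration; they are not taken from the full dictionary or from the script at the end of the post.

```python
from collections import defaultdict

def signature(word):
    """Return the letters of `word` in alphabetical order."""
    return "".join(sorted(word))

def anagram_classes(words):
    """Map each signature to the set of words that share it."""
    classes = defaultdict(set)
    for w in words:
        classes[signature(w)].add(w)
    return classes

# Illustrative sample only, not the full word list
sample = ["restful", "fluster", "least", "slate", "python"]
classes = anagram_classes(sample)

# Print only the signatures shared by more than one word
for s, ws in sorted(classes.items()):
    if len(ws) > 1:
        print(s, sorted(ws))
```

Words that share a signature land in the same set, so any set with more than one member is an anagram class.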
Here are the statistics on anagram classes.
|------+-------|
| size | count |
|------+-------|
|    1 | 62093 |
|    2 |  3600 |
|    3 |   646 |
|    4 |   160 |
|    5 |    61 |
|    6 |    13 |
|    7 |     2 |
|    8 |     1 |
|------+-------|
This means that 62,093 words or about 86% are in an anagram class by themselves. So about 14% of words are an anagram of at least one other word.
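The percentages can be checked directly from the table. This short sketch copies the class-size counts into a dictionary (the name `size_counts` is just for illustration) and recomputes the totals.

```python
# Class-size counts copied from the table above: size -> number of classes
size_counts = {1: 62093, 2: 3600, 3: 646, 4: 160, 5: 61, 6: 13, 7: 2, 8: 1}

# Each class of a given size contributes that many words
total_words = sum(size * count for size, count in size_counts.items())
alone = size_counts[1]
with_anagram = total_words - alone

pct_alone = round(100 * alone / total_words, 1)
pct_with_anagram = round(100 * with_anagram / total_words, 1)

print(total_words)        # 72276, matching the cleaned word list
print(pct_alone)          # 85.9
print(pct_with_anagram)   # 14.1
```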
The largest anagram class had eight members:
least
slate
stael
stale
steal
tales
teals
tesla
Stael is a proper name. Tesla is a proper name, but it is also a unit of magnetic induction. In my opinion, tesla should count as an English word and Stael should not.
My search found two anagram classes of size seven:
pares
parse
pears
rapes
reaps
spare
spear
and
carets
caster
caters
crates
reacts
recast
traces
The longest words in this dictionary that have anagrams are the following: two pairs of 14-letter words and one pair of 12-letter words.
certifications, rectifications
impressiveness, permissiveness
teaspoonsful, teaspoonfuls
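One way to find such pairs is to keep only the anagram classes with more than one member and sort them by word length. The sketch below assumes a small sample list rather than the full dictionary, and `longest_anagram_classes` is a hypothetical helper name.

```python
from collections import defaultdict

def longest_anagram_classes(words, top=3):
    """Return the `top` anagram classes whose words are longest."""
    groups = defaultdict(set)
    for w in words:
        groups["".join(sorted(w))].add(w)
    # Keep only classes with at least two words
    multi = [sorted(g) for g in groups.values() if len(g) > 1]
    # Longest words first; ties broken alphabetically
    multi.sort(key=lambda g: (-len(g[0]), g))
    return multi[:top]

# Illustrative sample only, not the full word list
sample = [
    "certifications", "rectifications",
    "impressiveness", "permissiveness",
    "teaspoonsful", "teaspoonfuls",
    "least", "slate", "anagram",
]
result = longest_anagram_classes(sample)
for group in result:
    print(len(group[0]), ", ".join(group))
```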
I made a dictionary of anagrams here. Every word which has an anagram is listed, followed by its anagrams. Here are the first few lines:
abby: baby
abeam: ameba
abed: bade, bead
abel: able, bale, bela, elba
abet: bate, beat, beta
abets: baste, bates, beast, beats, betas
abetter: beretta
abhorred: harbored
There is some redundancy in this dictionary for convenience: every word in the list of anagrams will also appear as the first entry on a line.
Here's the Python code that produced the dictionary.
from collections import defaultdict

lines = open("american-english", "r").readlines()

words = set()
for line in lines:
    if "'" not in line:
        line = line.strip().lower()
        words.add(line)

def sig(word):
    return "".join(sorted(word))

d = defaultdict(set)
for w in words:
    d[sig(w)].add(w)

for w in sorted(words):
    anas = sorted(d[sig(w)])
    if len(anas) > 1:
        anas.remove(w)
        print("{}: {}".format(w, ", ".join(anas)))