ARPAbet and the Major mnemonic system

by John D. Cook

ARPAbet is a phonetic spelling system developed by (you guessed it) ARPA, before it became DARPA.

The ARPAbet system is less expressive than IPA, but much easier for English speakers to understand. Every sound is encoded as one or two English letters. For example, the sound denoted ʒ in IPA (the middle consonant of "measure") is ZH in ARPAbet.

In ARPAbet notation, the Major mnemonic system can be summarized as follows:

0: S or Z
1: D, DH, T, or TH
2: N or NG
3: M
4: R
5: L
6: CH, JH, SH, or ZH
7: G or K
8: F or V
9: P or B
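
The table translates directly into a lookup from ARPAbet consonants to digits. Here is a minimal sketch, assuming a pronunciation arrives as a list of ARPAbet symbols in the CMU dictionary's style, where vowels carry a stress digit (0, 1, or 2) that has to be stripped before the lookup. The names MAJOR and major_encode are hypothetical.

```python
# Hypothetical lookup table transcribing the Major mnemonic table above.
MAJOR = {
    "S": "0", "Z": "0",
    "D": "1", "DH": "1", "T": "1", "TH": "1",
    "N": "2", "NG": "2",
    "M": "3",
    "R": "4",
    "L": "5",
    "CH": "6", "JH": "6", "SH": "6", "ZH": "6",
    "G": "7", "K": "7",
    "F": "8", "V": "8",
    "P": "9", "B": "9",
}


def major_encode(phones: list[str]) -> str:
    """Concatenate the digits of the consonant phones; all other phones are skipped."""
    digits = []
    for phone in phones:
        symbol = phone.rstrip("012")  # drop the stress digit that vowels carry
        if symbol in MAJOR:
            digits.append(MAJOR[symbol])
    return "".join(digits)


print(major_encode(["SH", "EH1", "R", "AH0", "F"]))  # sheriff -> 648
print(major_encode(["W", "AA1", "F", "AH0", "L"]))   # waffle  -> 85
```

Vowels and the semivowels W and Y simply fall through the lookup, which is what lets you pad a number with any vowels you like. Like the table, this sketch treats ARPAbet's r-colored vowel ER as a vowel, so a word whose R sound is transcribed as ER (as in "bird") gets no 4 for it.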

Numbers are encoded using the consonant sounds above; the system is based on sounds, not spelling. You can insert any vowels or semivowels (e.g. w or y) you like. For example, you could encode 648 as "giraffe" (the g is soft, so it counts as 6, not 7) or 85 as "waffle."

The CMU Pronouncing Dictionary lists 134,373 words along with their ARPAbet pronunciations. The Python code below reads in the pronouncing dictionary and produces a Major mnemonic dictionary. The resulting file is available here as a zip-compressed text file.
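
Each data line of the dictionary holds a word followed by its phones, separated by whitespace. A minimal parsing sketch, assuming that the file's comment lines start with ";;;" and that alternate pronunciations are marked with a suffix such as "(1)". The function name parse_entry is hypothetical, and the sample line is typed by hand rather than copied from the file.

```python
import re
from typing import Optional


def parse_entry(line: str) -> Optional[tuple[str, list[str]]]:
    """Split a CMU-style line into (word, phones); return None for comments and blanks."""
    if line.startswith(";;;") or not line.strip():
        return None
    word, *phones = line.split()
    # Drop a variant marker such as "(1)" so alternate pronunciations share a headword.
    word = re.sub(r"\(\d+\)$", "", word)
    return word, phones


print(parse_entry("SHERIFF  SH EH1 R AH0 F"))  # ('SHERIFF', ['SH', 'EH1', 'R', 'AH0', 'F'])
print(parse_entry(";;; a comment line"))        # None
```

Stripping the variant marker is a design choice; the script at the end of this post prints the word exactly as it appears in the file, marker included.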

To find a word that encodes a number, search the code output for that number. For example,

 grep ' 648' cmu_major.txt

will find words whose Major encoding begins with 648, and

 grep ' 648$' cmu_major.txt

will find words whose Major encoding is exactly 648.

From this we learn that "sheriff" is another possible encoding for 648.
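
If you'd rather stay in Python than use grep, the two searches amount to a prefix test and an equality test on the second field of each output line. A sketch, assuming the output was saved with one "WORD digits" pair per line; search_major is a hypothetical helper, and the sample lines are illustrative.

```python
def search_major(lines, number: str, exact: bool = False) -> list[str]:
    """Return words whose Major encoding starts with (or, if exact, equals) number."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:  # words with no consonant sounds have no digits
            continue
        word, digits = fields[0], fields[1]
        if digits == number or (not exact and digits.startswith(number)):
            hits.append(word)
    return hits


sample = ["SHERIFF 648", "SHERIFFS 6480", "WAFFLE 85", "A "]
print(search_major(sample, "648"))              # ['SHERIFF', 'SHERIFFS']
print(search_major(sample, "648", exact=True))  # ['SHERIFF']

# With the real output file:
# with open("cmu_major.txt") as f:
#     print(search_major(f, "648", exact=True))
```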

Filling in the gaps

Suppose you're looking for encodings of all three-digit numbers, 000 through 999. Covering every one of them can be hard. A common compromise is to consider only the first three consonant sounds in a word. For example, you might use "ladybug" to encode 519, ignoring the final G sound.

The tradeoff is that if you adopt this rule, you can't use "ladybug" to encode 5197. But finding single words that encode 4-digit numbers can be challenging if not impossible, so you may just forgo the possibility. This is why in the example above I show both a search for encodings that begin with 648 and a search for encodings that are exactly 648.
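
The first-three-consonants rule is just truncation of the full encoding. A sketch, assuming the encodings have already been computed; the digit strings below are worked out by hand from each word's consonant sounds, and major_prefix is a hypothetical name.

```python
from typing import Optional


def major_prefix(digits: str, width: int = 3) -> Optional[str]:
    """Keep only the first `width` digits; shorter encodings can't fill the slot."""
    if len(digits) < width:
        return None
    return digits[:width]


# Encodings written out by hand from the consonant sounds (illustrative only).
encodings = {"ladybug": "5197", "sheriff": "648", "waffle": "85"}

three_digit = {}
for word, digits in encodings.items():
    prefix = major_prefix(digits)
    if prefix is not None:
        three_digit.setdefault(prefix, []).append(word)

print(three_digit)  # {'519': ['ladybug'], '648': ['sheriff']}
```

Under this rule "ladybug" lands in the 519 slot and is no longer available for 5197, while "waffle" has too few consonants to stand for any three-digit number.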

Despite the large size of the CMU dictionary, it contains no words whose Major encodings begin with the following digits: 063, 333, 444, 466, 555, 688, 833, 866, 868, 883, 898, 988.
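
A gap list like this one can be found by checking all 1,000 three-digit prefixes against the encodings. A sketch, assuming the second column of the output has been loaded into a list of digit strings; missing_prefixes is a hypothetical name, and the short list below is toy data, so its output says nothing about the real dictionary.

```python
def missing_prefixes(encodings, width: int = 3) -> list[str]:
    """Return every width-digit string that no encoding begins with."""
    covered = {digits[:width] for digits in encodings if len(digits) >= width}
    candidates = (str(n).zfill(width) for n in range(10 ** width))
    return [p for p in candidates if p not in covered]


codes = ["5197", "648", "85"]  # toy data, not the real dictionary
print(missing_prefixes(codes, width=1))  # ['0', '1', '2', '3', '4', '7', '9']
print(len(missing_prefixes(codes)))      # 998

# With the real output file (one "WORD digits" pair per line):
# with open("cmu_major.txt") as f:
#     codes = [parts[1] for parts in map(str.split, f) if len(parts) > 1]
# print(missing_prefixes(codes))
```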

I can offer suggestions for these numbers, but it's hard to use anyone else's mnemonics. You may have to make up your own, using, for example, names of people you know personally or brand names you're familiar with.

  • 063: sashimi
  • 333: 3M corporate logo
  • 444: rah, rah, rah (i.e. cheerleader)
  • 466: Irish jig
  • 555: yellow oil well
  • 688: Chevy Forester
  • 833: hive mamma (i.e. queen bee)
  • 866: vichyssoise
  • 868: chief justice
  • 883: half femur
  • 898: half beef
  • 988: beef fajitas
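
Several of these suggestions are phrases rather than single words. Since word breaks carry no digits, a phrase's encoding is just the concatenation of its words' encodings, so you can check a homemade mnemonic one word at a time. A sketch, assuming a word-to-digits lookup; the entries below are hand-derived stand-ins for dictionary lookups, and encode_phrase is a hypothetical name.

```python
def encode_phrase(phrase: str, lookup: dict[str, str]) -> str:
    """Concatenate the Major encodings of the words in a phrase."""
    return "".join(lookup[word] for word in phrase.upper().split())


# Hand-derived encodings (hypothetical stand-ins for dictionary lookups):
# HALF = F (the L is silent), BEEF = B-F, FAJITAS = F-T-S (the j sounds like h).
lookup = {"HALF": "8", "BEEF": "98", "FAJITAS": "810"}

for phrase in ["half beef", "beef fajitas"]:
    digits = encode_phrase(phrase, lookup)
    print(phrase, digits, digits[:3])
# half beef 898 898
# beef fajitas 98810 988
```

A word missing from the lookup raises a KeyError, which is a reasonable signal that you need to work out its consonant sounds yourself.
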

Python code

    # Requires Python 3.10+ for the match statement.
    # NB: File encoding is Latin-1, not UTF-8.
    with open("cmudict-0.7b", "r", encoding="latin-1") as f:
        lines = f.readlines()

    for line in lines:
        # Skip the ";;;" comment lines and any blank lines.
        if line.startswith(";;;") or not line.strip():
            continue
        pieces = line.split()
        numstr = ""
        for p in pieces[1:]:
            p = p.rstrip("012")  # remove stress notation (vowels carry 0, 1, or 2)
            match p:
                case "S" | "Z":
                    numstr += "0"
                case "D" | "DH" | "T" | "TH":
                    numstr += "1"
                case "N" | "NG":
                    numstr += "2"
                case "M":
                    numstr += "3"
                case "R":
                    numstr += "4"
                case "L":
                    numstr += "5"
                case "CH" | "JH" | "SH" | "ZH":
                    numstr += "6"
                case "G" | "K":
                    numstr += "7"
                case "F" | "V":
                    numstr += "8"
                case "P" | "B":
                    numstr += "9"
                # All other phones (vowels, W, Y, HH) encode nothing.
        # Output goes to stdout; redirect it to a file such as cmu_major.txt for grep.
        print(pieces[0], numstr)
The post ARPAbet and the Major mnemonic system first appeared on John D. Cook.