CodeSOD: Microsoft's English Pluralization Service
Despite founding The Daily WTF more than fifteen years ago, I still find myself astonished and perplexed by the curious perversions in information technology that you all send in. These days, I spend most of my time doing "CEO of Inedo stuff", which means I don't get to code that much. And when I do, it's usually working with the beautiful, completely WTF- and bug-free code that our world-class engineers create.
I mention this because when I come across TDWTF-worthy code on my own, in the wild, it's a very special occasion. And today, I'm excited to share with you one of the worst pieces of code I've seen in a very long time: EnglishPluralizationService.cs
Anyone even remotely familiar with the English language knows that pluralization is hard, and not exactly something that should be generalized in a library... let alone Microsoft's most strategic programming asset of the past two decades. And yet, despite that:
internal class EnglishPluralizationService : PluralizationService, ICustomPluralizationMapping
{
    private BidirectionalDictionary<string, string> _userDictionary;
    private StringBidirectionalDictionary _irregularPluralsPluralizationService;
    private StringBidirectionalDictionary _assimilatedClassicalInflectionPluralizationService;
    private StringBidirectionalDictionary _oSuffixPluralizationService;
    private StringBidirectionalDictionary _classicalInflectionPluralizationService;
    private StringBidirectionalDictionary _irregularVerbPluralizationService;
    private StringBidirectionalDictionary _wordsEndingWithSePluralizationService;
    private StringBidirectionalDictionary _wordsEndingWithSisPluralizationService;
    private StringBidirectionalDictionary _wordsEndingWithSusPluralizationService;
    private StringBidirectionalDictionary _wordsEndingWithInxAnxYnxPluralizationService;

    private List<string> _knownSingluarWords;
    private List<string> _knownPluralWords;

    private string[] _uninflectiveSuffixList = new string[] {
        "fish", "ois", "sheep", "deer", "pos", "itis", "ism" };

    private string[] _uninflectiveWordList = new string[] {
        "jackanapes", "species", "corps", "mackerel", "swine", "debris", "measles",
        "trout", "diabetes", "mews", "tuna", "djinn", "mumps", "whiting", "eland",
        "news", "wildebeest", "elk", "pincers", "police", "hair", "ice", "chaos",
        "milk", "cotton", "pneumonoultramicroscopicsilicovolcanoconiosis",
        "information", "aircraft", "scabies", "traffic", "corn", "millet", "rice",
        "hay", "----", "tobacco", "cabbage", "okra", "broccoli", "asparagus",
        "lettuce", "beef", "pork", "venison", "mutton", "cattle", "offspring",
        "molasses", "shambles", "shingles" };
I'm not sure what's more baffling. Is it the fact that someone knew enough about language constructs to use words like uninflective or assimilatedClassicalInflection, yet didn't know enough to realize that automatic pluralization is impossible? Or perhaps, the fact that the same engineer thought that an Entity Framework user might not only have a database table named jackanapes or wildebeest, but would care about proper, automatic pluralization?
This code should never have been written, clearly. But as bad an idea as this all is, its implementation is just plain wrong. It's been a while since I've studied my sixth-grade vocabulary words, but I'm pretty sure that most of these are not irregular plurals at all:
private Dictionary<string, string> _irregularPluralsDictionary = new Dictionary<string, string>()
{
    {"brother", "brothers"}, {"child", "children"}, {"cow", "cows"},
    {"ephemeris", "ephemerides"}, {"genie", "genies"}, {"money", "moneys"},
    {"mongoose", "mongooses"}, {"mythos", "mythoi"}, {"octopus", "octopuses"},
    {"ox", "oxen"}, {"soliloquy", "soliloquies"}, {"trilby", "trilbys"},
    {"crisis", "crises"}, {"synopsis", "synopses"}, {"rose", "roses"},
    {"gas", "gases"}, {"bus", "buses"}, {"axis", "axes"}, {"memo", "memos"},
    {"casino", "casinos"}, {"silo", "silos"}, {"stereo", "stereos"},
    {"studio", "studios"}, {"lens", "lenses"}, {"alias", "aliases"},
    {"pie", "pies"}, {"corpus", "corpora"}, {"viscus", "viscera"},
    {"hippopotamus", "hippopotami"}, {"trace", "traces"}, {"person", "people"},
    {"chili", "chilies"}, {"analysis", "analyses"}, {"basis", "bases"},
    {"neurosis", "neuroses"}, {"oasis", "oases"}, {"synthesis", "syntheses"},
    {"thesis", "theses"}, {"change", "changes"}, {"lie", "lies"},
    {"calorie", "calories"}, {"freebie", "freebies"}, {"case", "cases"},
    {"house", "houses"}, {"valve", "valves"}, {"cloth", "clothes"},
    {"tie", "ties"}, {"movie", "movies"}, {"bonus", "bonuses"},
    {"specimen", "specimens"}
};
In fact, the "just add an s to make it plural" rule is perhaps the ultimate starting point for pluralization. Words like "pie" and "cow" are not irregular at all, because you just add an "s" to make them "pies" and "cows". And this brings us to our next questions.
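To make that concrete, here is a minimal sketch that takes a handful of entries copied from _irregularPluralsDictionary and asks which of them the plain "add an s" rule would already get right. IrregularAudit, Sample, AddS and RegularEntries are names I made up for illustration; none of them exist in Microsoft's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IrregularAudit
{
    // A few entries copied from _irregularPluralsDictionary (not the full list).
    public static readonly Dictionary<string, string> Sample = new Dictionary<string, string>
    {
        { "child", "children" },
        { "cow", "cows" },
        { "ox", "oxen" },
        { "pie", "pies" },
        { "person", "people" },
        { "rose", "roses" },
        { "movie", "movies" },
        { "crisis", "crises" },
    };

    // Hypothetical helper: the naive "just add an s" rule.
    public static string AddS(string singular) => singular + "s";

    // Entries whose listed plural is exactly what the naive rule produces,
    // sorted so the result does not depend on dictionary enumeration order.
    public static List<string> RegularEntries() =>
        Sample.Where(pair => AddS(pair.Key) == pair.Value)
              .Select(pair => pair.Key)
              .OrderBy(word => word, StringComparer.Ordinal)
              .ToList();
}
```

On this sample, RegularEntries() returns cow, movie, pie and rose: half of the "irregular" entries need no special handling beyond appending an "s".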
Where exactly did this list of not-actually-irregular plurals come from? It was obviously copy/pasted from somewhere, right? An English textbook? Like, perhaps the teachers' edition? You know, where there's like a handout with pictures of a cow, then a bunch of cows, and a line below it where you write the word? Did the engineer just use that page?
Up next on our tour is the actual pluralization logic, which is handled by the InternalPluralize method. It's not nearly as simple as using those dictionaries. Consider this snippet from within that method:
// handle the word that do not inflect in the plural form
if (IsUninflective(suffixWord))
{
    return prefixWord + suffixWord;
}
Quick aside: please take a moment to appreciate the irony of improper pluralization in the comment: "the word that do not inflect". Does this mean the author is not a native English speaker? Was this thing outsourced to Kerbleckistan? Does that even make sense to do for English? Or was this just a typo?
I digress. Let's keep digging into IsUninflective.
private bool IsUninflective(string word)
{
    EDesignUtil.CheckArgumentNull(word, "word");
    if (PluralizationServiceUtil.DoesWordContainSuffix(word, _uninflectiveSuffixList, this.Culture)
        || (!word.ToLower(this.Culture).Equals(word) && word.EndsWith("ese", false, this.Culture))
        || this._uninflectiveWordList.Contains(word.ToLowerInvariant()))
    {
        return true;
    }
    else
    {
        return false;
    }
}
It seems that in addition to not knowing English, the author of this class that's used within the .NET framework doesn't really know C# very well? And certainly not how string comparison works in .NET. On the bright side, is that a test for "ese"? Did... our engineer figure out that there are some basic pluralization patterns you can apply? Maybe there's hope after all!
// handle irregular inflections for common suffixes, e.g. "mouse" -> "mice"
if (PluralizationServiceUtil.TryInflectOnSuffixInWord(suffixWord,
    new List<string>() { "louse", "mouse" },
    (s) => s.Remove(s.Length - 4, 4) + "ice", this.Culture, out newSuffixWord))
{
    return prefixWord + newSuffixWord;
}
if (PluralizationServiceUtil.TryInflectOnSuffixInWord(suffixWord,
    new List<string>() { "tooth" },
    (s) => s.Remove(s.Length - 4, 4) + "eeth", this.Culture, out newSuffixWord))
{
    return prefixWord + newSuffixWord;
}
if (PluralizationServiceUtil.TryInflectOnSuffixInWord(suffixWord,
    new List<string>() { "goose" },
    (s) => s.Remove(s.Length - 4, 4) + "eese", this.Culture, out newSuffixWord))
{
    return prefixWord + newSuffixWord;
}
if (PluralizationServiceUtil.TryInflectOnSuffixInWord(suffixWord,
    new List<string>() { "foot" },
    (s) => s.Remove(s.Length - 3, 3) + "eet", this.Culture, out newSuffixWord))
{
    return prefixWord + newSuffixWord;
}
On the plus side, this "common" suffixes logic ensures that both titmouse and snaggletooth are properly pluralized.
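If you want to see why, here is a rough sketch of what those suffix rules amount to. The body of TryInflectOnSuffixInWord isn't shown anywhere in this article, so I'm assuming it does a case-insensitive "ends with" match and then applies the lambda; SuffixRules, Rules and Pluralize are hypothetical names of my own, and the trailing "add an s" fallback is just there to make the sketch complete.

```csharp
using System;

public static class SuffixRules
{
    // Each rule: the suffix to match, how many trailing characters to drop,
    // and what to append. The numbers mirror the lambdas in the suffix snippet.
    private static readonly (string Suffix, int Drop, string Append)[] Rules =
    {
        ("louse", 4, "ice"),
        ("mouse", 4, "ice"),
        ("tooth", 4, "eeth"),
        ("goose", 4, "eese"),
        ("foot", 3, "eet"),
    };

    public static string Pluralize(string word)
    {
        foreach (var rule in Rules)
        {
            // Assumption: the real helper matches on the end of the word.
            if (word.EndsWith(rule.Suffix, StringComparison.OrdinalIgnoreCase))
            {
                return word.Remove(word.Length - rule.Drop, rule.Drop) + rule.Append;
            }
        }

        // Fallback for this sketch only: the naive "add an s" rule.
        return word + "s";
    }
}
```

Pluralize("titmouse") gives "titmice" and Pluralize("snaggletooth") gives "snaggleteeth". Under the same assumptions it also turns "mongoose" into "mongeese", which may be why mongoose shows up in the irregular dictionary, though that's a guess on my part.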