The world is lumpy

by John, from John D. Cook (#65B8H)

The Pareto principle, or the 80-20 rule, says that 80% of output comes from 20% of inputs. For example, maybe the top 20% of salesmen generate 80% of a company's revenue.

For some reason, the Pareto principle angers some people. Mention the Pareto principle and someone will explain why it can't be true, based on an overly literal understanding of the principle. It's a principle, a rule of thumb. It doesn't mean that exactly 80% of output will come from 20% of input, though this is approximately true surprisingly often.

More generally, the Pareto principle means that importance is very unevenly distributed. For example, a study of Japanese kanji used in newspapers found around 4,000 characters in use, but the 10 most frequently used characters accounted for 10% of character usage [1]. If I wanted to learn to read Japanese, I'd start by learning these 10 kanji.

[Figure: top10kanji.png]

The opposite of the Pareto principle would be the uniformitarian presumption that everything is equally important. A uniformitarian approach to reading Japanese would say "Well, there are 4,000 kanji in use, so I'm going to study a Japanese dictionary from the beginning and learn them all." Even this would not be an entirely uniformitarian approach, since it starts with the 4,000 kanji the survey found in newspapers. Japanese has over 50,000 kanji.

Nobody would do this. Nobody is completely uniformitarian. However, most of us tend to underestimate how unevenly things are distributed. Of course the most common kanji are used the most: that's what it means to be most common! But I would not have expected that just 10 characters account for 10% of character use.
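
To put a rough number on that surprise, here is a small sketch comparing the observed share with what a strictly uniform presumption would predict. The figures are the ones quoted from the study; the variable names are illustrative, and 4,000 is only the approximate count reported.

```python
# Figures quoted from the newspaper study; 4,000 is an approximate count.
kanji_in_use = 4000
top_n = 10
observed_share = 0.10  # share of character usage attributed to the top 10 kanji

# Under a uniform presumption every kanji is used equally often,
# so the top 10 would account for only 10/4000 of usage.
uniform_share = top_n / kanji_in_use

# How many times more usage the top 10 get than uniformity would predict.
concentration_ratio = observed_share / uniform_share

print(f"uniform share:  {uniform_share:.2%}")
print(f"observed share: {observed_share:.0%}")
print(f"ratio:          {concentration_ratio:.0f}x")
```

Under these assumptions, the top 10 kanji carry roughly 40 times the usage a uniform distribution would give them.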

I know about the Pareto principle, power laws, etc. I know things are unevenly distributed and have written about this many times. For example, I wrote about Twitter follower distribution a few weeks ago. I expected kanji frequency to be very uneven, but I still underestimated how uneven it is.

[1] The same study found that the top 500 kanji accounted for 80% of characters. So about 12% of kanji accounted for 80% of usage. That's relative to the 4,000 kanji found in the study. It's less than 1% of the potential kanji someone could use.
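
The percentages in this footnote can be checked with a few lines of arithmetic. This is only a sketch: the names are illustrative, and 50,000 is used as a lower bound because the post says only that Japanese has "over 50,000" kanji.

```python
top_kanji = 500          # kanji accounting for 80% of usage in the study
kanji_in_study = 4000    # approximate count found in newspapers
kanji_total = 50000      # assumed lower bound; the post says "over 50,000"

# Share of the kanji seen in the study: 500 / 4000
share_of_study = top_kanji / kanji_in_study

# Share of all kanji: 500 / 50000. Since the true total is larger,
# this is an upper bound on the real share.
share_of_total = top_kanji / kanji_total

print(f"share of kanji in the study: {share_of_study:.1%}")
print(f"share of all kanji (at most): {share_of_total:.1%}")
```

The first figure is 12.5%, which the footnote gives as about 12%. The second is 1% at exactly 50,000 kanji, so with more than 50,000 the share falls below 1%.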

The post The world is lumpy first appeared on John D. Cook.