
Chemical element frequency in writing

by John, from John D. Cook

How do the frequencies of chemical element names in English text compare to the abundance of elements in Earth's crust? Do we write most frequently about the elements that appear most frequently?

It turns out the answer is "not really." The rarest elements rarely appear in writing. We don't have much to say about dysprosium, thulium, or lutetium, for example. But overall there's only a small correlation between word frequency and chemical frequency. (The rank correlation is substantially higher than ordinary linear correlation.)
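As a rough illustration of the difference between linear and rank correlation, here is a minimal Python sketch that computes both from the 20 rows of the table below. It is not the calculation behind the remark above, which concerns the full data set; the helper names (`pearson`, `average_ranks`, `spearman`) are illustrative, and because only the top 20 elements and rounded percentages are used, the numbers will differ from those for the full data.

```python
from math import sqrt

# (element, word %, word rank, earth %, earth rank), copied from the table
# in this post. Only the 20 most-written-about elements are included.
ROWS = [
    ("lead", 15.50, 1, 0.001, 36), ("gold", 11.64, 2, 0.000, 75),
    ("iron", 11.14, 3, 5.612, 4), ("silver", 7.38, 4, 0.000, 68),
    ("carbon", 5.15, 5, 0.012, 17), ("oxygen", 5.13, 6, 45.956, 1),
    ("copper", 4.61, 7, 0.006, 26), ("hydrogen", 3.51, 8, 0.139, 10),
    ("sodium", 3.38, 9, 2.352, 6), ("calcium", 2.84, 10, 4.137, 5),
    ("nitrogen", 2.79, 11, 0.002, 34), ("mercury", 2.22, 12, 0.000, 67),
    ("tin", 2.13, 13, 0.000, 51), ("potassium", 1.94, 14, 2.083, 8),
    ("zinc", 1.70, 15, 0.007, 24), ("silicon", 1.12, 16, 28.112, 2),
    ("nickel", 1.08, 17, 0.008, 23), ("phosphorus", 1.05, 18, 0.104, 11),
    ("magnesium", 0.98, 19, 2.322, 7), ("sulfur", 0.84, 20, 0.035, 16),
]


def pearson(xs, ys):
    """Ordinary (linear) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Rank correlation: the linear correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Linear correlation uses the percentages; rank correlation uses the rank
# columns, re-ranked within these 20 rows. The rank columns avoid the ties
# that rounding to 0.000 would otherwise create in the earth % column.
linear = pearson([r[1] for r in ROWS], [r[3] for r in ROWS])
ranked = spearman([r[2] for r in ROWS], [r[4] for r in ROWS])
print(f"linear correlation (20 rows): {linear:.3f}")
print(f"rank correlation   (20 rows): {ranked:.3f}")
```

Rank correlation only looks at ordering, so a handful of extreme values, such as oxygen and silicon dominating the crust, cannot swamp it the way they can swamp linear correlation.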

We write often about things like oxygen and iron because they're such a part of the human experience. On the other hand, we care about some things like silver and gold precisely because they are rare.

Here are the most common elements according to text usage.

|------------+--------+-----------+---------+------------|
| element    | word % | word rank | earth % | earth rank |
|------------+--------+-----------+---------+------------|
| lead       |  15.50 |         1 |   0.001 |         36 |
| gold       |  11.64 |         2 |   0.000 |         75 |
| iron       |  11.14 |         3 |   5.612 |          4 |
| silver     |   7.38 |         4 |   0.000 |         68 |
| carbon     |   5.15 |         5 |   0.012 |         17 |
| oxygen     |   5.13 |         6 |  45.956 |          1 |
| copper     |   4.61 |         7 |   0.006 |         26 |
| hydrogen   |   3.51 |         8 |   0.139 |         10 |
| sodium     |   3.38 |         9 |   2.352 |          6 |
| calcium    |   2.84 |        10 |   4.137 |          5 |
| nitrogen   |   2.79 |        11 |   0.002 |         34 |
| mercury    |   2.22 |        12 |   0.000 |         67 |
| tin        |   2.13 |        13 |   0.000 |         51 |
| potassium  |   1.94 |        14 |   2.083 |          8 |
| zinc       |   1.70 |        15 |   0.007 |         24 |
| silicon    |   1.12 |        16 |  28.112 |          2 |
| nickel     |   1.08 |        17 |   0.008 |         23 |
| phosphorus |   1.05 |        18 |   0.104 |         11 |
| magnesium  |   0.98 |        19 |   2.322 |          7 |
| sulfur     |   0.84 |        20 |   0.035 |         16 |
|------------+--------+-----------+---------+------------|

This is based on the Google book corpus summarized here. There's some ambiguity; I imagine most uses of "lead" are the verb and not the element name. Some portion of the uses of "iron" refer to a device for smoothing wrinkles out of clothes.

Word percentage is relative to the set of chemical element names. Earth percentage is relative to the Earth's crust.
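To spell out what "relative to the set of chemical element names" means: each name's count is divided by the total count over all element names, not by the total number of words in the corpus. Here is a minimal sketch of that normalization. The function name `to_percentages` and the counts are made up for illustration and are not taken from the Google data.

```python
def to_percentages(counts):
    """Convert raw counts to percentages of the total over all names."""
    total = sum(counts.values())
    return {name: 100 * c / total for name, c in counts.items()}


# Hypothetical counts for illustration only; NOT taken from the Google data.
example_counts = {"iron": 300, "gold": 500, "neon": 200}
example_pct = to_percentages(example_counts)

for name, pct in example_pct.items():
    print(f"{name:5s} {pct:6.2f}")
```

Normalized this way, the word percentages over all element names add up to 100, so the 20 rows shown in the table add up to somewhat less.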

The percentages above have been truncated for presentation; obviously the abundance of gold, silver, mercury, and tin is not zero, though it is when rounded to three decimal places. The full data for the first 111 elements is available here.
