Greek letter frequency and entropy

by John D. Cook

Would the letters in an ancient Greek text carry more or less information than in modern English?

To address this question, I downloaded a copy of the Greek New Testament from Project Gutenberg and ran the word frequency script from my previous post.
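A letter count of this kind can be sketched in a few lines of Python. This is not the script from the previous post, only a minimal illustration. The function name `greek_letter_percentages` is hypothetical, and stripping accents and breathing marks and counting final sigma (ς) as σ are assumptions about how the letters were tallied.

```python
import unicodedata
from collections import Counter

# The 24 letters of the Greek alphabet, lower case, without final sigma
GREEK_LETTERS = set("αβγδεζηθικλμνξοπρστυφχψω")

def greek_letter_percentages(text):
    """Return {letter: percent of all Greek letters}, most common first."""
    # NFD splits accented characters into a base letter plus combining marks,
    # so the marks can be ignored by the membership test below.
    decomposed = unicodedata.normalize("NFD", text.lower())
    counts = Counter()
    for ch in decomposed:
        if ch == "ς":  # assumption: final sigma is counted as sigma
            ch = "σ"
        if ch in GREEK_LETTERS:
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: 100 * n / total for ch, n in counts.most_common()}

if __name__ == "__main__":
    sample = "Ἐν ἀρχῇ ἦν ὁ λόγος"
    for letter, pct in greek_letter_percentages(sample).items():
        print(f"{letter} {pct:.2f}")
```

For a full text, the same function could be applied to the contents of the downloaded file, read with `encoding="utf-8"`.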

This led to the following table of letters and percent frequencies.

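|---+-------|
| α | 13.10 |
| ο | 10.44 |
| ι |  9.76 |
| ε |  9.38 |
| σ |  7.72 |
| τ |  6.29 |
| ν |  4.82 |
| υ |  4.52 |
| κ |  3.77 |
| π |  7.90 |
| ρ |  3.42 |
| η |  3.32 |
| μ |  2.85 |
| λ |  2.47 |
| ω |  2.16 |
| γ |  1.75 |
| δ |  1.54 |
| θ |  1.48 |
| χ |  0.78 |
| φ |  0.77 |
| β |  0.75 |
| ζ |  0.41 |
| ξ |  0.40 |
| ψ |  0.20 |
|---+-------|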

From this I calculated the Shannon entropy of a Greek letter to be 4.045 bits. Using English letter frequencies I found on Wikipedia, I calculated the corresponding entropy for English to be 3.915 bits. So in this regard, the two languages are pretty similar.
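For reference, the Shannon entropy of a letter distribution with probabilities p_i is H = -Σ p_i log2(p_i), measured in bits. The sketch below applies that formula to the New Testament percentages from the table above. The helper name `shannon_entropy` is my own, and the percentages are divided by their sum so that rounding in the table does not matter.

```python
from math import log2

# Percent frequencies from the New Testament table, in the order listed there
nt_percent = [
    13.10, 10.44, 9.76, 9.38, 7.72, 6.29, 4.82, 4.52,
    3.77, 7.90, 3.42, 3.32, 2.85, 2.47, 2.16, 1.75,
    1.54, 1.48, 0.78, 0.77, 0.75, 0.41, 0.40, 0.20,
]

def shannon_entropy(weights):
    """Entropy in bits of the distribution proportional to `weights`."""
    total = sum(weights)
    # Zero weights contribute nothing to the sum, so skip them
    probs = [w / total for w in weights if w > 0]
    return -sum(p * log2(p) for p in probs)

if __name__ == "__main__":
    print(f"{shannon_entropy(nt_percent):.3f} bits")
```

The result should land very close to the 4.045 bits quoted above. For comparison, 24 equally likely letters would give log2(24), about 4.58 bits, the maximum possible for a 24-letter alphabet.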

By the way, the frequency ordering of ancient (Koine) Greek letters is the analog of the famous ETAOIN SHRDLU order for English. The most common letters in Greek line up roughly with their English counterparts.

Update: Homer and Plato

I first wrote this post just looking at the New Testament, written in Koine Greek. The table below includes the results from Homer's Iliad and Plato's Republic to get a sample of other ancient Greek sources.

|---+-------+-------+----------|
|   |    NT | Iliad | Republic |
|---+-------+-------+----------|
| α | 13.10 | 13.71 |    12.86 |
| β |  0.75 |  0.92 |     0.53 |
| γ |  1.75 |  1.82 |     1.18 |
| δ |  1.54 |  1.53 |     1.94 |
| ε |  9.38 |  8.10 |     8.34 |
| ζ |  0.41 |  0.43 |     0.36 |
| η |  3.32 |  2.94 |     4.01 |
| θ |  1.48 |  1.09 |     1.38 |
| ι |  9.76 |  8.82 |     9.86 |
| κ |  3.77 |  4.18 |     3.52 |
| λ |  2.47 |  2.75 |     2.97 |
| μ |  2.85 |  3.38 |     3.13 |
| ν |  4.82 |  5.72 |     8.95 |
| ξ |  0.40 |  0.54 |     0.36 |
| ο | 10.44 | 10.72 |    10.23 |
| π |  7.90 |  3.55 |     3.78 |
| ρ |  3.42 |  4.33 |     3.29 |
| σ |  7.72 |  7.88 |     6.61 |
| τ |  6.29 |  8.34 |     7.53 |
| υ |  4.52 |  4.27 |     4.54 |
| φ |  0.77 |  1.25 |     0.83 |
| χ |  0.78 |  1.44 |     1.02 |
| ψ |  0.20 |  0.16 |     0.13 |
| ω |  2.16 |  2.15 |     2.63 |
|---+-------+-------+----------|

The frequencies are very similar across the three texts, and they lead to very similar entropy calculations: 4.08 bits for the Iliad and 4.05 bits for the Republic.
