Article 5NB5P Index of coincidence

Index of coincidence

by
John
from John D. Cook on (#5NB5P)

Index of coincidence is a statistic developed by William Friedman for use in cryptanalysis. It measures how unevenly symbols are distributed in a message.

It's a kind of signature that could be used, for example, to infer the language of a text, even if the text has been encrypted with a simple substitution cipher. It also has more sophisticated uses, such as inferring key length in text that has been encrypted with a Vigenere cipher.

If a discrete random variable has n possible values, each occurring with probability pi, then its index of coincidence is

ioc_def.svg

In cryptanalysis, the probabilities are the letter frequencies in the language of the message. The sum above is often multiplied by some constant, but this makes no difference in application because you would compare index of coincidence values, not interpret the index in the abstract.

Often the sum above is multiplied by n. This amounts to dividing the sum above by the corresponding sum for a uniform distribution.

Index of coincidence is occasionally use in applications, though not in cryptography anymore.

Example

The letter frequencies in Google's English language corpus are given here. In this corpus the probabilities range from 0.1249 for e down to 0.0009 for z. The index of coincidence would be

0.1249^2 + 0.0928^2 + ... + 0.0009^2 = 0.06612

You're more likely to see this value multiplied by 26, which gives 1.719.

Relation to entropy

Index of coincidence seems a lot like entropy, and in fact it is a simple transformation of an entropy, though not Shannon entropy. Index of coincidence is closely related to Renyi entropy.

Renyi entropy of order is defined for a random variable X to be

renyi_discrete.svg

and so Renyi entropy of order 2 is the negative log of the index of coincidence. That is, if H is the Renyi entropy of order 2 and I is the index of coincidence, then

ioc_renyi.svg

Index of coicidence and Renyi entropy are sometimes defined with multiplicative constants that would slightly change the equations above.

Since exp(-x) is a decreasing function, increasing index of coincidence corresponds to decreasing Renyi entropy. Sorting in increasing order by index of coincidence is the same as sorting in decreasing order by Renyi entropy.

RelatedThe post Index of coincidence first appeared on John D. Cook.T5FtKX3-ilU
External Content
Source RSS or Atom Feed
Feed Location http://feeds.feedburner.com/TheEndeavour?format=xml
Feed Title John D. Cook
Feed Link https://www.johndcook.com/blog
Reply 0 comments