
Rényi Entropy

by John D. Cook

[Photo of Alfréd Rényi]

The most common way of measuring information is Shannon entropy, but there are others. Rényi entropy, developed by Hungarian mathematician Alfréd Rényi, generalizes Shannon entropy and includes other entropy measures as special cases.

Rényi entropy of order α

If a discrete random variable X has n possible values, where the ith outcome has probability p_i, then the Rényi entropy of order α is defined to be

H_\alpha(X) = \frac{1}{1 - \alpha} \log_2 \left( \sum_{i=1}^n p_i^\alpha \right)

for 0 ≤ α ≤ ∞. In the cases α = 1 and α = ∞ this expression means the limit as α approaches 1 or ∞ respectively.
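
As a concrete illustration, here is a minimal Python sketch of this definition, with the α = 1 and α = ∞ limits handled explicitly. The function name and the sample distribution are just for illustration, not from the original post.

    import numpy as np

    def renyi_entropy(p, alpha):
        # Rényi entropy (in bits) of a discrete distribution p, order alpha >= 0
        p = np.asarray(p, dtype=float)
        p = p[p > 0]                       # zero-probability outcomes don't contribute
        if alpha == 1:                     # limit as alpha -> 1: Shannon entropy
            return -np.sum(p * np.log2(p))
        if np.isinf(alpha):                # limit as alpha -> infinity: min-entropy
            return -np.log2(p.max())
        return np.log2(np.sum(p ** alpha)) / (1 - alpha)

    p = [0.5, 0.25, 0.125, 0.125]
    for a in [0, 0.5, 1, 2, np.inf]:
        print(a, renyi_entropy(p, a))

For this distribution the loop prints 2, about 1.87, 1.75, about 1.54, and 1 bits respectively.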

Rényi entropy of a continuous random variable

The definition of Ri(C)nyi entropy can be extended to continuous random variables by

H_\alpha(X) = \frac{1}{1 - \alpha} \log_2 \left( \int_{-\infty}^\infty f_X(x)^\alpha \, dx \right)

but unlike the discrete case, Rényi entropy can be negative for continuous random variables, and so it is typically only used for discrete variables. For example, let X be the random variable defined on [1, ∞) with density

f_X(x) = 3x^{-4}

Then

H_2(X) = -\log_2 \int_1^\infty 9x^{-8} \, dx = -\log_2 \frac{9}{7} \approx -0.36
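
We can double-check this numerically. The following is a minimal sketch assuming the density and order α = 2 above; scipy's quad handles the infinite upper limit of integration.

    import numpy as np
    from scipy.integrate import quad

    f = lambda x: 3 * x**-4                          # density on [1, infinity)
    alpha = 2

    # integrate f(x)^alpha over the support, then apply the definition
    integral, _ = quad(lambda x: f(x)**alpha, 1, np.inf)
    H = np.log2(integral) / (1 - alpha)
    print(H)                                         # about -0.363: negative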

Special cases

Each value of α gives a possible entropy measure. All are additive for independent random variables. And for each discrete random variable X, H_α is a monotone non-increasing function of α.
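
To see the monotonicity concretely, here is a minimal numerical sketch; the distribution is arbitrary, and α = 1.01 stands in for the α = 1 limit.

    import numpy as np

    p = np.array([0.5, 0.25, 0.125, 0.125])
    for a in [0, 0.5, 1.01, 2, 4, 8]:
        H = np.log2(np.sum(p ** a)) / (1 - a)
        print(f"alpha = {a}: H = {H:.4f}")   # the printed values never increase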

Max-entropy: α = 0

Assume all the probabilities p_i are positive. Then H_0 is known as the max-entropy, or Hartley entropy. It is simply log₂ n, the log of the number of values X takes on with positive probability.

Shannon entropy: α = 1

When α = 1 we get the more familiar Shannon entropy:

H_1(X) = -\sum_{i=1}^n p_i \log_2 p_i

Collision entropy: α = 2

When the order α is not specified, its implicit default value is 2. That is, when people speak of Rényi entropy without qualification, they often have in mind the case α = 2. This case is also called collision entropy and is used in quantum information theory.
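
The name reflects the fact that 2^(−H₂(X)) = Σ p_i² is the probability that two independent draws from the distribution of X collide, i.e. take the same value. Here is a quick Monte Carlo sketch of that interpretation; the distribution is arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([0.5, 0.25, 0.125, 0.125])

    H2 = -np.log2(np.sum(p ** 2))          # collision entropy from the definition

    # estimate the collision probability of two independent draws empirically
    n = 100_000
    x = rng.choice(len(p), size=n, p=p)
    y = rng.choice(len(p), size=n, p=p)
    print(H2, -np.log2(np.mean(x == y)))   # the two numbers should be close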

Min-entropy: α = ∞

In the limit as α goes to ∞, the Rényi entropy of X converges to the negative log of the probability of the most probable outcome. That is,

H_\infty(X) = -\log_2 \max_i \, p_i

Relation to Lebesgue norms

Let p without a subscript be the vector of all the p_i. Then for α not equal to 1,

H_\alpha(X) = \frac{\alpha}{1 - \alpha} \log_2 \| p \|_\alpha

As with Lebesgue norms, you use varying values of the parameter to emphasize various features.
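
Here is a minimal numerical check of the identity above; the distribution and the order are arbitrary.

    import numpy as np

    p = np.array([0.5, 0.25, 0.125, 0.125])
    alpha = 3

    lhs = np.log2(np.sum(p ** alpha)) / (1 - alpha)    # definition of H_alpha
    norm = np.sum(p ** alpha) ** (1 / alpha)           # the Lebesgue (l_alpha) norm of p
    rhs = alpha / (1 - alpha) * np.log2(norm)          # norm formulation above
    print(lhs, rhs)                                    # agree up to floating point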

Related posts

This post was a warm-up for the next post: Rényi differential privacy.
