Turning K-L divergence into a metric
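Kullback-Leibler divergence is defined for two random variables X and Y by

$$\mathrm{KL}(X \,\|\, Y) = \int f_X(x) \log \frac{f_X(x)}{f_Y(x)} \, dx$$

where f_X and f_Y are the density functions of X and Y. For discrete random variables the integral becomes a sum over the probability mass functions.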
K-L divergence is non-negative, and it's zero if and only if X and Y have the same distribution. But it is not a metric, for reasons explained here. For one thing, it's not symmetric.
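To see the asymmetry concretely, here is a minimal sketch for two Bernoulli distributions, the same setting used in the examples below. The helper name `kl` and the parameter values 0.1 and 0.2 are chosen for illustration; natural logarithms are assumed, and both parameters must lie strictly between 0 and 1.

```python
from math import log

def kl(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) using natural logs.

    Assumes 0 < p < 1 and 0 < q < 1 so every log argument is finite.
    """
    return p*log(p/q) + (1 - p)*log((1 - p)/(1 - q))

forward = kl(0.1, 0.2)    # KL(Bernoulli(0.1) || Bernoulli(0.2))
backward = kl(0.2, 0.1)   # arguments swapped
print(round(forward, 4), round(backward, 4))   # prints 0.0367 0.0444
```

Swapping the arguments changes the value, which is exactly the failure of symmetry.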
Jeffreys divergence

We can fix the symmetry problem by defining

$$J(X, Y) = \mathrm{KL}(X \,\|\, Y) + \mathrm{KL}(Y \,\|\, X)$$
The J above stands for Harold Jeffreys. J is called either the symmetrized K-L divergence or Jeffreys' divergence. It's still a divergence, not a distance.
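For Bernoulli distributions the two K-L terms combine into a closed form, J(p, q) = (p − q) ln[p(1 − q)/(q(1 − p))], which makes the symmetry visible: swapping p and q flips the sign of both factors, so the product is unchanged. The sketch below checks that identity numerically on a few pairs; the function names `jeffreys` and `jeffreys_closed_form` are illustrative, not from any library.

```python
from math import log, isclose

def kl(p, q):
    # KL(Bernoulli(p) || Bernoulli(q)); assumes 0 < p, q < 1
    return p*log(p/q) + (1 - p)*log((1 - p)/(1 - q))

def jeffreys(p, q):
    # Symmetrized K-L divergence: the divergences in both directions added
    return kl(p, q) + kl(q, p)

def jeffreys_closed_form(p, q):
    # (p - q) times the log of the odds ratio; algebraically equal to jeffreys(p, q)
    return (p - q)*log(p*(1 - q)/(q*(1 - p)))

pairs = [(0.1, 0.2), (0.2, 0.3), (0.1, 0.3), (0.05, 0.9)]
for p, q in pairs:
    assert isclose(jeffreys(p, q), jeffreys(q, p))               # symmetric
    assert isclose(jeffreys(p, q), jeffreys_closed_form(p, q))   # closed form agrees
    print(p, q, round(jeffreys(p, q), 4))
```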
A distance (metric) d has to have four properties:
- d(x, x) = 0
- d(x, y) > 0 if x ≠ y
- d(x, y) = d(y, x)
- d(x, z) ≤ d(x, y) + d(y, z)
K-L divergence satisfies the first two properties. Jeffreys' divergence satisfies the first three, but not the last one, the triangle inequality.
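The first three properties can be spot-checked numerically. The sketch below evaluates Jeffreys' divergence and plain K-L divergence on a grid of Bernoulli parameters. The grid spacing of 0.05 is an arbitrary choice, and a finite grid check is evidence, not a proof.

```python
from math import log

def kl(p, q):
    # KL(Bernoulli(p) || Bernoulli(q)); assumes 0 < p, q < 1
    return p*log(p/q) + (1 - p)*log((1 - p)/(1 - q))

def jeffreys(p, q):
    return kl(p, q) + kl(q, p)

grid = [k/20 for k in range(1, 20)]   # 0.05, 0.10, ..., 0.95

# Property 1: d(x, x) = 0
zero_on_diagonal = all(jeffreys(p, p) == 0 for p in grid)
# Property 2: d(x, y) > 0 when x != y
positive_off_diagonal = all(jeffreys(p, q) > 0 for p in grid for q in grid if p != q)
# Property 3: symmetry, for Jeffreys and (for contrast) for plain K-L
symmetric = all(abs(jeffreys(p, q) - jeffreys(q, p)) < 1e-12 for p in grid for q in grid)
kl_symmetric = all(abs(kl(p, q) - kl(q, p)) < 1e-12 for p in grid for q in grid)

print(zero_on_diagonal, positive_off_diagonal, symmetric, kl_symmetric)
# prints True True True False
```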
To show that J doesn't satisfy the triangle inequality, let X, Y, and Z be Bernoulli random variables with p equal to 0.1, 0.2, and 0.3 respectively. Then the following Python code shows that the divergence from X to Y, plus the divergence from Y to Z, is less than the divergence from X to Z. This would be like saying you could get from LA to NYC faster by having a layover in Denver rather than taking a direct flight.
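from math import log

# KL(Bernoulli(p) || Bernoulli(q)), natural log
kl = lambda p, q: p*log(p/q) + (1-p)*log((1-p)/(1-q))
# Jeffreys divergence: K-L divergence in both directions
j = lambda p, q: kl(p, q) + kl(q, p)

a = j(0.1, 0.2)
b = j(0.2, 0.3)
c = j(0.1, 0.3)
print(a + b, c)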
This prints 0.135 and 0.270.
Jensen-Shannon distance

Jensen-Shannon distance turns K-L divergence into a metric as follows. First, define the random variable M whose distribution is the average of the distributions of X and Y. Then average the K-L divergence from M to each of X and Y. This defines the Jensen-Shannon divergence. It's still not a metric, but its square root is, and that square root defines the Jensen-Shannon distance.
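In symbols, writing P_X and P_Y for the distributions of X and Y, the construction can be summarized as below. The names JSD and D_JS are notation chosen here for convenience. The last line is the Bernoulli special case used in the code: an equal-weight mixture of Bernoulli(p) and Bernoulli(q) is Bernoulli with parameter m = (p + q)/2.

```latex
\begin{aligned}
P_M &= \tfrac{1}{2}\,(P_X + P_Y) \\
\mathrm{JSD}(X, Y) &= \tfrac{1}{2}\,\mathrm{KL}(X \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Y \,\|\, M) \\
D_{\mathrm{JS}}(X, Y) &= \sqrt{\mathrm{JSD}(X, Y)} \\
\text{Bernoulli case: } m &= \tfrac{1}{2}\,(p + q)
\end{aligned}
```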
The following code gives an example of Jensen-Shannon distance satisfying the triangle inequality.
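def d(p, q):
    # Jensen-Shannon distance between Bernoulli(p) and Bernoulli(q),
    # reusing kl from the previous code block
    m = 0.5*(p + q)   # parameter of the equal-weight mixture M
    jsd = 0.5*kl(p, m) + 0.5*kl(q, m)
    return jsd**0.5

a = d(0.1, 0.2)
b = d(0.2, 0.3)
c = d(0.1, 0.3)
print(a + b, c)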
This prints 0.1817 and 0.1801. Now a layover makes the trip longer.
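A single triple only illustrates the difference. The sketch below scans every ordered triple drawn from a grid of Bernoulli parameters and counts triangle-inequality violations for Jeffreys' divergence and for Jensen-Shannon distance. The grid spacing and the small tolerance, which keeps floating-point rounding from being counted as a violation, are assumptions of this sketch, and a finite search is evidence rather than a proof.

```python
from itertools import product
from math import log

def kl(p, q):
    # KL(Bernoulli(p) || Bernoulli(q)); assumes 0 < p, q < 1
    return p*log(p/q) + (1 - p)*log((1 - p)/(1 - q))

def jeffreys(p, q):
    return kl(p, q) + kl(q, p)

def js_distance(p, q):
    m = 0.5*(p + q)   # parameter of the equal-weight mixture
    return (0.5*kl(p, m) + 0.5*kl(q, m))**0.5

def triangle_violations(dist, grid, tol=1e-12):
    """Count ordered triples (x, y, z) with dist(x, z) > dist(x, y) + dist(y, z)."""
    return sum(
        1
        for x, y, z in product(grid, repeat=3)
        if dist(x, z) > dist(x, y) + dist(y, z) + tol
    )

grid = [k/20 for k in range(1, 20)]   # 0.05, 0.10, ..., 0.95
print("Jeffreys:", triangle_violations(jeffreys, grid))
print("Jensen-Shannon distance:", triangle_violations(js_distance, grid))
```

Jeffreys' divergence fails on many triples, while the Jensen-Shannon distance fails on none, consistent with its square-root construction being a metric.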
The post Turning K-L divergence into a metric first appeared on John D. Cook.