
Fixing Random, part 14

by ericlippert, from Fabulous adventures in coding

[Code is here.]

Last time on FAIC we achieved two major results in our effort to build better probability tools. First, we demonstrated that the SelectMany implementation which applies a likelihood function to a prior probability is the bind operation of the probability monad. Second, we gave an implementation of a wrapper object that implements it. Its action can be summed up as:

  • sample from the prior distribution
  • use the likelihood function to get the conditional distribution
  • sample from the conditional distribution
  • run the projection on the pair of samples to get the result
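
In code, that recipe is something like this minimal sketch of the wrapper's Sample method (the member names here are my shorthand, not necessarily the exact names used last episode):

public C Sample()
{
  A a = prior.Sample();             // sample from the prior distribution
  var conditional = likelihood(a);  // apply the likelihood function
  B b = conditional.Sample();       // sample from the conditional distribution
  return projection(a, b);          // project the pair of samples to the result
}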

You probably recall though that I did not implement the Weight function. It's a little tricky to do so, for two reasons. First, I made the (now somewhat questionable!) decision to make weights integers. If the weights are fractions between 0.0 and 1.0, you can just multiply the weight of the prior sample by the weight of the conditional sample. (And if the weights are logarithms, you can just add them.) It's trickier with integers. And second, the projection at the end once again introduces the possibility of "collisions": the projection could map distinct combinations of samples to the same value, and those values then have to be weighted as the sum of the colliding weights.

That's all a little abstract, so let's work an example.

Suppose we have a population of people who have been diagnosed with Frob Syndrome, which seems to be linked with height. We'll divide the population of Frob Syndrome patients into three categories:

enum Height { Tall, Medium, Short }

and let's suppose in our study population there are five tall people, two medium-height people, and one short person in every eight:

var prior = new List<Height>() { Tall, Medium, Short }
  .ToWeighted(5, 2, 1);

(To keep the code short on the page, suppose I have using static directives for each of these enums.)
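
(Concretely, I'm assuming directives along these lines, one for each of the enums used in this episode:)

using static Height;
using static Severity;
using static Prescription;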

Now let's suppose we've done a survey of each of the tall, medium and short people to learn the severity of their symptoms:

enum Severity { Severe, Moderate, Mild }

At this point I'm going to make the numbers a bit odd to illustrate the mathematics more clearly. How likely is a member of each group to report symptoms of a given severity? Let's say that 10 out of every 21 tall people report severe symptoms, and the remaining 11 report moderate symptoms. For the medium-height people, 12 out of 17 report moderate symptoms and 5 report mild symptoms. And all the short people report mild symptoms:

var severity = new List<Severity> { Severe, Moderate, Mild };

IDiscreteDistribution<Severity> likelihood(Height h)
{
  switch(h)
  {
    case Tall: return severity.ToWeighted(10, 11, 0);
    case Medium: return severity.ToWeighted(0, 12, 5);
    default: return severity.ToWeighted(0, 0, 1);
  }
}

And now let's suppose we have a recommended prescription level:

enum Prescription { DoubleDose, NormalDose, HalfDose }

Taller people or people with more severe symptoms get a higher dose; shorter people or people with mild symptoms get a smaller dose:

Prescription projection(Height h, Severity s)
{
  switch (h)
  {
    case Tall: return s == Severe ? DoubleDose : NormalDose;
    case Medium: return s == Mild ? HalfDose : NormalDose;
    default: return HalfDose;
  }
}

The question now is: what is the probability distribution on prescriptions for this study population? That is, if we picked a random member of this population, how likely is it that they'd have a double, normal or half dose prescription?

IDiscreteDistribution<Prescription> doses =
  prior.SelectMany(likelihood, projection);

The problem is to work out the weightings of the three possible outcomes.

As I mentioned before, this is easiest to do when the weights are fractions, because we can just multiply the weights along each row and then add up the rows that produce the same prescription:

Height        Severity          Prescription
Tall (5/8)    Severe (10/21)    DoubleDose (25/84)
Tall (5/8)    Moderate (11/21)  NormalDose (55/168)
Medium (2/8)  Moderate (12/17)  NormalDose (3/17)
Medium (2/8)  Mild (5/17)       HalfDose (5/68)
Short (1/8)   Mild (1/1)        HalfDose (1/8)

(To save space I've elided the zero rows.)

So the probability of a member of this population getting a double dose is 25/84, getting a normal dose is 55/168 + 3/17 = 1439/2856, and getting a half dose is 5/68 + 1/8 = 27/136. Verifying that those add up to 1.0 is left as an exercise.
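
(If the normal-dose arithmetic isn't obvious, here it is over the common denominator 168 * 17 = 2856: 55/168 + 3/17 = 935/2856 + 504/2856 = 1439/2856; similarly, 5/68 + 1/8 = 10/136 + 17/136 = 27/136.)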

But we're going to grit our teeth here and do it all in integers! How might we do that?

Well, we know how to eliminate fractions: multiply all the weights in the first column by 8, and all the weights in the second column by 21 * 17, and none of the proportions will change:

Height      Severity        Prescription
Tall (5)    Severe (170)    DoubleDose (850)
Tall (5)    Moderate (187)  NormalDose (935)
Medium (2)  Moderate (252)  NormalDose (504)
Medium (2)  Mild (105)      HalfDose (210)
Short (1)   Mild (357)      HalfDose (357)
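
(To see where the second column comes from: 21 * 17 = 357, so the Tall rows scale to (10/21) * 357 = 170 and (11/21) * 357 = 187, the Medium rows to (12/17) * 357 = 252 and (5/17) * 357 = 105, and the Short row to (1/1) * 357 = 357.)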

So the integer weights are: double dose is 850, normal dose is 935 + 504 = 1439, and half dose is 210 + 357 = 567.

Let's implement it!

First off, oddly enough there is a Sum() extension method but no Product() extension method, so let's implement that:

public static int Product(this IEnumerable<int> items) =>
  items.Aggregate(1, (a, b) => a * b);

And I also need to know the total weight of a distribution:

public static int TotalWeight<T>(
    this IDiscreteDistribution<T> d) =>
  d.Support().Select(t => d.Weight(t)).Sum();
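
To make those concrete, here is what they compute for the running example (a quick sanity check I've added; it is not code from the post):

// Total weight of the prior distribution: 5 + 2 + 1 = 8.
Console.WriteLine(prior.TotalWeight());
// Product of the total weights of the conditional distributions:
// 21 * 17 * 1 = 357.
Console.WriteLine(prior.Support()
  .Select(h => likelihood(h).TotalWeight())
  .Product());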

And now we can implement the algorithm I just sketched out:

int product = prior.Support()
  .Select(a => likelihood(a).TotalWeight())
  .Product();
var w = from h in prior.Support()
        let ps = likelihood(h)
        from s in ps.Support()
        group prior.Weight(h) * ps.Weight(s) *
          product / ps.TotalWeight()
        by projection(h, s);
var dict = w.ToDictionary(g => g.Key, g => g.Sum());
var doses = dict.Keys.ToList();
var weights = dict.Values.ToList();

And sure enough, if we print those last two out:

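(The post doesn't show the printing code itself; something along these lines would produce it:)

Console.WriteLine(string.Join(", ", doses));
Console.WriteLine(string.Join(", ", weights));
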
DoubleDose, NormalDose, HalfDose
850, 1439, 567

Super, we can now work out the weights in our implementation of SelectMany.

But... wait a minute. Why do we have to?

That is, why do we need a Combined wrapper class for SelectMany at all?

We just worked out the weights of every member of the support, and we did so making no assumptions whatsoever about the prior or the likelihood function. We can delete our Combined wrapper class, and replace our implementation of SelectMany with:

public static IDiscreteDistribution<C> SelectMany<A, B, C>(
    this IDiscreteDistribution<A> prior,
    Func<A, IDiscreteDistribution<B>> likelihood,
    Func<A, B, C> projection)
{
  // The product of the total weights of every conditional distribution;
  // scaling by this clears the fractions, exactly as in the worked example.
  int product = prior.Support()
    .Select(a => likelihood(a).TotalWeight())
    .Product();
  // The weight of each (a, b) path over the common denominator, grouped by
  // the projected result so that collisions are summed together.
  var w = from a in prior.Support()
          let pb = likelihood(a)
          from b in pb.Support()
          group prior.Weight(a) * pb.Weight(b) *
            product / pb.TotalWeight()
          by projection(a, b);
  var dict = w.ToDictionary(g => g.Key, g => g.Sum());
  return dict.Keys.ToWeighted(dict.Values);
}
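
Applying this to the running example reproduces the weights we computed by hand; a quick check I've added, querying the doses distribution we constructed earlier (it assumes the weights are stored exactly as handed to ToWeighted):

Console.WriteLine(doses.Weight(DoubleDose)); // 850
Console.WriteLine(doses.Weight(NormalDose)); // 1439
Console.WriteLine(doses.Weight(HalfDose));   // 567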

Exercise: Do you see any potential pitfalls in this implementation of computing the new weights? Give it some thought; I'll give the answer in the next episode.

We do a small amount of math up front, and in exchange, we have computed the exact resulting probability distribution, which we can sample from efficiently. Just as we did with Where and Select in previous episodes.

Aside: Once again, if you trace through all the logic I've written so far you will quickly see that it is hugely inefficient in terms of the amount of re-computation it does and garbage it produces. If we were writing this for production code, we'd be a lot more aggressive about finding code paths that do re-computation and eliminating them. The point of this exercise is that our code produces correct, efficient distribution objects out the other end, even if it is a bit wasteful to do so in this particular pedagogic implementation.

Think about the power of this: you can write programs that treat discrete probability distributions over arbitrary types as values, the same way you'd treat integers, strings, sequences, or whatever, as values that obey a particular set of algebraic rules. We can project, condition and combine them together with the same ease that we do today with sequences, and sample from them to boot!
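
In particular, because this SelectMany has the standard monadic shape, C# query syntax works over distributions too; the doses computation above could equally be written as a query (a sketch of the equivalent form, not code from the post):

var doses2 = from h in prior
             from s in likelihood(h)
             select projection(h, s);
// doses2 is the same distribution as prior.SelectMany(likelihood, projection).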

The idea that we can describe a probabilistic workflow, and have as the output of that workflow a new distribution semantically equivalent to the effect of the workflow, but without any long-running sample-and-reject loops due to the conditions, is called inference by the probabilistic programming community.

We've seen that we can do inference on arbitrary discrete distributions provided that the supports are small and the weights are small integers; as we'll see throughout the rest of this series, the problem gets considerably harder as we abandon some of those simplifying assumptions.

Next time on FAIC: I'm going to implement a minor optimization in our weighted integer distribution. After that, we'll put it all together to show how what we've developed so far can be used for Bayesian inference.
