Fixing Random, part 22
[Code for this episode is here.]
Last time in this series I left you with several challenges for improving our DSL for imperative probabilistic workflows. But first, a puzzle:
Question One: You are walking down the street when you see a can of gasoline, some matches, a house on fire, a wrench, a fire hose and a fire hydrant. You consider the house being on fire to be problematic, so what do you do?
.
.
.
.
Of course you connect the hose to the fire hydrant with the wrench and use the hose to extinguish the house.
Question Two: You are walking down the street when you see a can of gasoline, some matches, a house not on fire, a wrench, a fire hose and a fire hydrant. What do you do?
.
.
.
Set the house on fire with the gasoline and the matches; we have now reduced the problem to a previously solved problem, so we're done.
That's how compiler writers approach problems. We know how to do lowerings in a very simplified version of probabilistic C#; to lower more complicated forms, we just keep on applying the "lower it to a previously solved problem" technique:
- goto is straightforward: you just evaluate the "statement method" that is the statement you're going to.
- Once you have goto and if, it is straightforward to implement do, while, break and continue; just reduce those to combinations of if and goto, which are previously solved problems.
- for loops are just an irritatingly complicated version of while, which is a previously solved problem.
- Implementing throw and try is tricky because it is a non-local goto. I might come back in a later episode and describe ways of taming throw and try in these sorts of workflows. For some thoughts on why this is tricky, see my past episodes on problems with iterator blocks. Recall also that use of various kinds of try blocks was originally restricted for async methods, for similar reasons.
- We have not solved the problem for try, which means that we're blocked on foreach, lock and using, all of which are syntactic sugars for try
- To allow the sample operator in more places, again, reduce the problem to previously solved problems. For example:
x = Foo(sample A(), sample B()[sample C()]) + sample D();
is equivalent to
var a = sample A();
var b = sample B();
var c = sample C();
var f = Foo(a, b[c]);
var d = sample D();
x = f + d;
And we already know how to lower all of those.
- You've got to be careful with the operators that conditionally evaluate their operands (&& || ?? ?:), but you can lower those into if statements.
- Assignments, increment and decrements used in expression locations can be handled with the same trick: turn it into a form that has only statements that we already know how to do.
- For lambdas and anonymous methods, we've already stated up front that they need to be pure, which means that they should not be closed over values which change. Given that restriction, lambdas (and therefore also query expressions) can be allowed.
- Moreover, we know how to turn lambdas into ordinary methods; if we apply the same restrictions to a lambda body as we do to probabilistic methods then we could allow probabilistic lambdas; that is, lambdas that return probability distributions and contain sample operators and condition statements. This is much the same as the way C# allows async lambdas.
Aside: If actually implementing this scheme sounds super complicated with lots of details to keep track of and performance pitfalls everywhere, congratulations, you now know what it is like to be a compiler developer!
Compiler developers have to do all of this stuff even without fancy kinds of workflows in the language; everything that I am describing is a problem that has to be solved during the codegen phase of a compiler, because IL does not have concepts like while or for or "assignment as an expression". It's all about reducing programs to simpler and simpler forms, until you get it down to the IL.
Let's assume for now that we can make all those basic enhancements to our imperative DSL, and address the question I left you with last time: is this sugar for queries just a sugar, or is there something we can do with it that we can't do with a query?
Yes and no.
As we've seen, ultimately everything gets desugared into calls to SelectMany, so no. Anything you can do with this notation, you can do with a SelectMany, and therefore you could write a query comprehension. However, this notation certainly makes it much easier to clearly and concisely represent operations that would look pretty odd in a query.
For example, we don't normally think of queries as being recursive, but recursive workflows are straightforward in this DSL, so maybe there is an argument that yes, there is some more expressiveness in the statement form.
How's that, you say? Let's look at an example of a recursive workflow.
Let's play a game in n rounds; we'll state up front what n is. You flip a coin. If it is heads, I pay you n dollars and we're done. If it is tails, we go to the next round. In the next round, you flip a coin. If it is heads, I pay you n-1 dollars, if tails, we go to the next round, and so on. If we get down to zero dollars, the game is over, and you get nothing. (The recursion has to stop somewhere!)
Aside: Obviously we do not have to implement this using recursion; we could use a loop. My point is that we easily can implement it with recursion, so recursion is a tool we have available in our toolbox when building probabilistic workflows.
Exercise: what would a fair price be for you to pay me up front so that the game is on average a net gain for neither you nor me? The answer is below.
I'm going to assume that we've made all the straightforward improvements to our DSL, including allowing probabilistic expression-bodied methods. We can implement the logic of our game like this:
probabilistic IDiscreteDistribution<int> Game(int rounds) =>
(rounds <= 0 || sample Flip()) ?
rounds :
sample Game(rounds - 1);
We can do an initial lowering to a simpler form:
probabilistic IDiscreteDistribution<int> Game(int rounds)
{
S0: if (rounds <= 0) goto S5;
S1: x = sample Flip();
S2: if (x) goto S5;
S3: y = sample Game(rounds - 1)
S4: return y;
S5: return rounds;
}
And then lower that form into actual runnable C# 7. (I'll elide the unnecessary locals and temporaries.)
static IDiscreteDistribution<int> Game(int rounds)
{
IDiscreteDistribution<int> S0() =>
rounds <= 0 ? S5() : S1();
IDiscreteDistribution<int> S1() =>
Flip().SelectMany(x => S2(x));
IDiscreteDistribution<int> S2(bool x) =>
x ? S5() : S3();
IDiscreteDistribution<int> S3() =>
Game(rounds - 1).SelectMany(y => S4(y));
IDiscreteDistribution<int> S4(int y) =>
Singleton<int>.Distribution(y);
IDiscreteDistribution<int> S5() =>
Singleton<int>.Distribution(rounds);
return S0();
}
(Inlining that into a query is left as an exercise.)
I asked above what is the fair price to pay; that's the expected value. We can work that out directly from the distribution:
public static double ExpectedValue(
this IDiscreteDistribution<int> d) =>
d.Support().Select(s =>
(double)s * d.Weight(s)).Sum() / d.TotalWeight();
And if we run it, we get the answers we seek:
Console.WriteLine(Game(5).ShowWeights());
Console.WriteLine(Game(5).ExpectedValue());
5:164:83:42:21:10:14.03125
You, who know probability theory, could have predicted the results: out of a large set of games, on average you get paid 5 half the time, 4 a quarter of the time, and so on.
If you paid me $4.00 for the chance to play this game in five rounds, on average you'd come out ahead by a few cents. Half the time you'd net a dollar, a quarter of the time you'd break even, and the rest of the time you'd lose some or all of your money.
Again the thing that is so exciting about this technique is: we humans didn't need to work out either the weights or the expected value; we wrote a program describing the game and executing that program produced the solution for the exact distribution of the results!
In a previous series I showed how re-implementing algorithms that use doubles to instead use dual numbers can make the runtime compute the exact value of the derivative of a function at a point while it is computing the value of the function at that point. We no longer have to manually do the work to compute a derivative, and nor do we have to approximate the derivative using numerical methods, and nor do we have to do symbolic differentiation; we just imperatively describe how the value is computed, and the runtime does a little extra work along the way to exactly compute the derivative.
What I've shown here is that we can do pretty much the same thing for small discrete probability distributions! We don't have to work out what the distribution of this silly game is. Nor do we have to simulate the game, collect a hundred thousand data points, and make a statistical approximation of the distribution. We just describe the flow of the game in imperative code, run the method, and the thing that pops out the other end is the exact distribution! It's almost magical.
Next time on FAIC: It can't really be that easy, can it?