Fixing Random, part 21
Last time in this series I proposed a stripped-down DSL for probabilistic workflows. Today, let's see how we could "lower" it to ordinary C# 7 code. I'll assume of course that we have all of the types and extension methods that we've developed so far. (Code for this episode is here.)
The first thing I'm going to do is describe a possible but very problematic way to do it.
We know how C# lowers methods that have yield return statements: it generates a class that implements IEnumerable, rewrites the method body as the MoveNext method of that class, and makes the method return an instance of the class. We could do the same thing here.
Recall that last time I gave three examples, the longest of which was:
probabilistic static IDiscreteDistribution<string> Workflow(int z)
{
int ii = sample TwoDSix();
if (ii == 2)
return "two";
condition ii != z;
bool b = sample Flip();
return b ? "heads" : ii.ToString();
}
We could lower this to:
static IDiscreteDistribution<string> Workflow(int z) =>
new WorkflowDistribution(z);
sealed class WorkflowDistribution :
IDiscreteDistribution<string>
{
int z;
public WorkflowDistribution(int z)
{
this.z = z;
}
public string Sample()
{
start:
int ii = TwoDSix().Sample();
if (ii == 2)
return "two";
if (!(ii != z))
goto start;
bool b = Flip().Sample();
return b ? "heads" : ii.ToString();
}
public IEnumerable<string> Support() => ???
public int Weight(string t) => ???
}
We'd need to make sure that we did the right thing if z got modified during the workflow and the condition was not met of course, but that's a small detail.
We could similarly lower the other two; I won't labour the point. It's a lot of boilerplate code.
The good news is: we get the right distribution out of this thing:
Console.WriteLine(Workflow(3).Histogram());
4|***
5|****
6|*****
7|*******
8|******
9|****
10|***
11|**
12|*
two|**
heads|****************************************
The bad news is:
- We haven't computed the support or the weights, as required.
- We are once again in a situation where a condition causes a potentially long-running loop to execute; we could do hundreds or thousands of iterations of going back to "start" for every successful sample if the condition was rarely met.
Basically, we've lost the great property that we had before: that we automatically get the right discrete distribution object inferred.
So let's state unequivocally right now what our goal here is. The goal of this feature is to take a method body that describes a probabilistic workflow that includes conditions and produce a semantically equivalent distribution that does not actually contain any loop-causing condition, the same as we did for query-based workflows.
In our exploration of creating a posterior, we've seen that we can take a query that contains Where clauses and produce a projected weighted integer distribution that matches the desired distribution. Though there is some cost to producing that distribution object, once we've paid that cost there is no additional per-sample cost. We can do the same here, by leveraging the work we've already done.
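To see concretely why building the distribution up front eliminates the per-sample loop, here is a minimal Python sketch of the idea. The names Dist, select_many, uniform and weights are my own inventions, standing in for IDiscreteDistribution<T>, SelectMany, SDU.Distribution and the weight table; this is a sketch of the technique, not the series' actual implementation. Conditioning deletes the offending branch's weight once, when the distribution object is constructed; after that, sampling is a single weighted table lookup with no rejection loop.

```python
from collections import defaultdict
from fractions import Fraction
from math import gcd
import random

class Dist:
    """A discrete distribution: a map from value to probability."""
    def __init__(self, probs):
        # Drop zero-probability values; they are not in the support.
        self.probs = {v: p for v, p in probs.items() if p > 0}

    @staticmethod
    def singleton(v):
        return Dist({v: Fraction(1)})

    @staticmethod
    def empty():
        return Dist({})

    @staticmethod
    def uniform(lo, hi):
        n = hi - lo + 1
        return Dist({v: Fraction(1, n) for v in range(lo, hi + 1)})

    def select_many(self, f):
        # Monadic bind: total probability of each result over all paths.
        out = defaultdict(Fraction)
        for v, p in self.probs.items():
            for v2, p2 in f(v).probs.items():
                out[v2] += p * p2
        return Dist(out)

    def weights(self):
        # Scale probabilities to the smallest equivalent integer weights.
        m = 1
        for p in self.probs.values():
            m = m * p.denominator // gcd(m, p.denominator)
        ws = {v: int(p * m) for v, p in self.probs.items()}
        g = 0
        for w in ws.values():
            g = gcd(g, w)
        return {v: w // max(g, 1) for v, w in ws.items()}

# Condition a d6 on "not equal to 3": routing that branch to the
# empty distribution removes its weight once, at construction time.
d6 = Dist.uniform(1, 6)
conditioned = d6.select_many(
    lambda i: Dist.empty() if i == 3 else Dist.singleton(i))

def sample(dist):
    # Per-sample cost is one weighted table lookup -- no "goto start".
    values = list(dist.probs)
    return random.choices(
        values, weights=[float(p) for p in dist.probs.values()])[0]
```

The point of the sketch is the division of labour: all the conditioning work happens in select_many, so sample never loops.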
The technique I'm going to use is as follows:
- Each statement in the method will be numbered starting from zero; statements nested inside if statements are of course themselves statements, and get their own numbers.
- Each statement will be replaced by a local method named Sn, for n the number of the statement.
- Each local method will take as its arguments all the locals of the method. (If there are any assignments to formal parameters then they will be considered locals for this purpose.)
- Each local method will return a probability distribution.
At this point I could give you the general rule for each kind of statement in my stripped down language, but let's not bother. Let's just lower the methods and it will become obvious what we're doing.
We'll start with the easy one.
probabilistic IDiscreteDistribution<bool> Flip()
{
int x = sample Bernoulli.Distribution(1, 1);
return x == 0;
}
we rewrite this as:
IDiscreteDistribution<bool> Flip()
{
IDiscreteDistribution<bool> S0(int x) =>
Bernoulli.Distribution(1, 1)
.SelectMany(_x => S1(_x));
IDiscreteDistribution<bool> S1(int x) =>
Singleton<bool>.Distribution(x == 0);
return S0(0);
}
Read that over carefully and convince yourself that this implements the correct logic, just in a much more convoluted fashion.
Aside: Notice that we are taking advantage of the monad laws here regarding the relationship between binding and the function that produces a Singleton. It might not have been clear before why this is an important monad law; hopefully it is now clear.
Aside: Recall that in the jargon of monads, the unit operation that creates a new wrapper around a single instance of the underlying type is sometimes called return. It is no coincidence that our lowering turns return statements into singletons!
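The law in play here is the right identity law: binding a distribution with the singleton-producing function gives back the same distribution, which is exactly what makes the S1 step of Flip harmless. A quick Python sketch checks it; Dist, select_many and singleton are invented stand-ins for the series' C# types.

```python
from collections import defaultdict
from fractions import Fraction

class Dist:
    """A discrete distribution: a map from value to probability."""
    def __init__(self, probs):
        self.probs = {v: p for v, p in probs.items() if p > 0}

    @staticmethod
    def singleton(v):
        return Dist({v: Fraction(1)})

    def select_many(self, f):
        # Monadic bind: total probability of each result over all paths.
        out = defaultdict(Fraction)
        for v, p in self.probs.items():
            for v2, p2 in f(v).probs.items():
                out[v2] += p * p2
        return Dist(out)

# Right identity: d.SelectMany(Singleton) is the same distribution as d.
d = Dist({1: Fraction(1, 6), 2: Fraction(2, 6), 3: Fraction(3, 6)})
bound = d.select_many(Dist.singleton)
```

Binding with singleton multiplies every probability by one and changes nothing, which is why the lowering is free to thread singletons through every return.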
Now let's lower the next one:
probabilistic IDiscreteDistribution<int> TwoDSix()
{
var d = SDU.Distribution(1, 6);
int x = sample d;
int y = sample d;
return x + y;
}
This becomes this mess:
static IDiscreteDistribution<int> TwoDSix()
{
IDiscreteDistribution<int> S0(
IDiscreteDistribution<int> d, int x, int y) =>
S1(StandardDiscreteUniform.Distribution(1, 6), x, y);
IDiscreteDistribution<int> S1(
IDiscreteDistribution<int> d, int x, int y) =>
d.SelectMany(_x => S2(d, _x, y));
IDiscreteDistribution<int> S2(
IDiscreteDistribution<int> d, int x, int y) =>
d.SelectMany(_y => S3(d, x, _y));
IDiscreteDistribution<int> S3(
IDiscreteDistribution<int> d, int x, int y) =>
Singleton<int>.Distribution(x + y);
return S0(null, 0, 0);
}
Again, walk through that and convince yourself that it's doing the right thing. This implementation is ridiculously overcomplicated for what it does, but you should have confidence that it is doing the right thing. Remember, we're trying to prove that the proposed language feature is in theory possible first, and then we'll think about ways to make it better.
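If you would rather convince yourself by running it, here is the S0 through S3 chain transcribed into Python; Dist, select_many, uniform and weights are invented stand-ins for the C# machinery, so treat this as a sketch. Computing the weights of the result recovers the familiar 1-2-3-4-5-6-5-4-3-2-1 triangle of two dice.

```python
from collections import defaultdict
from fractions import Fraction
from math import gcd

class Dist:
    """A discrete distribution: a map from value to probability."""
    def __init__(self, probs):
        self.probs = {v: p for v, p in probs.items() if p > 0}

    @staticmethod
    def singleton(v):
        return Dist({v: Fraction(1)})

    @staticmethod
    def uniform(lo, hi):
        n = hi - lo + 1
        return Dist({v: Fraction(1, n) for v in range(lo, hi + 1)})

    def select_many(self, f):
        # Monadic bind: total probability of each result over all paths.
        out = defaultdict(Fraction)
        for v, p in self.probs.items():
            for v2, p2 in f(v).probs.items():
                out[v2] += p * p2
        return Dist(out)

    def weights(self):
        # Scale probabilities to the smallest equivalent integer weights.
        m = 1
        for p in self.probs.values():
            m = m * p.denominator // gcd(m, p.denominator)
        ws = {v: int(p * m) for v, p in self.probs.items()}
        g = 0
        for w in ws.values():
            g = gcd(g, w)
        return {v: w // max(g, 1) for v, w in ws.items()}

def two_d_six():
    # A direct transcription of the lowered S0..S3 chain.
    sdu = Dist.uniform(1, 6)
    def S0(d, x, y): return S1(sdu, x, y)
    def S1(d, x, y): return d.select_many(lambda _x: S2(d, _x, y))
    def S2(d, x, y): return d.select_many(lambda _y: S3(d, x, _y))
    def S3(d, x, y): return Dist.singleton(x + y)
    return S0(None, 0, 0)
```

Note that, just as in the C# version, S0 ignores its d argument and S3 ignores everything but x and y; the mechanical rewrite does not bother to notice.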
Finally, let's do one with an if and a condition:
probabilistic IDiscreteDistribution<string> Workflow(int z)
{
int ii = sample TwoDSix();
if (ii == 2)
return "two";
condition ii != z;
bool b = sample Flip();
return b ? "heads" : ii.ToString();
}
This lowers to:
static IDiscreteDistribution<string> Workflow(int z)
{
IDiscreteDistribution<string> S0(int ii, bool b) =>
TwoDSix().SelectMany(_ii => S1(_ii, b));
IDiscreteDistribution<string> S1(int ii, bool b) =>
ii == 2 ? S2(ii, b) : S3(ii, b);
IDiscreteDistribution<string> S2(int ii, bool b) =>
Singleton<string>.Distribution("two");
IDiscreteDistribution<string> S3(int ii, bool b) =>
ii != z ? S4(ii, b) : Empty<string>.Distribution;
IDiscreteDistribution<string> S4(int ii, bool b) =>
Flip().SelectMany(_b => S5(ii, _b));
IDiscreteDistribution<string> S5(int ii, bool b) =>
Singleton<string>.Distribution(b ? "heads" : ii.ToString());
return S0(0, false);
}
And at last we can find out what the distribution of this workflow is:
Console.WriteLine(Workflow(3).ShowWeights());
gives us:
two:2
heads:33
4:3
5:4
6:5
7:6
8:5
9:4
10:3
11:2
12:1
We can mechanically rewrite our little probabilistic workflow language into a program that automatically computes the desired probability distribution.
It should now be pretty clear what the rewrite rules are for our simple DSL:
- Normal assignment and declaration statements invoke the "next statement" method with new values for the locals.
- Assignment or declaration statements with sample become SelectMany.
- The condition statement evaluates its condition and then either continues to the next statement if it is met, or produces an empty distribution if it is not.
- if statements evaluate their condition and then invoke the consequence or the alternative helper method.
- return statements evaluate an expression and return a Singleton of it.
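These rules can be checked mechanically. Here is the whole Workflow lowering transcribed into Python under an invented Dist sketch (Dist, select_many, uniform, empty and weights are my stand-ins for the C# types, not the series' real code); the computed weight table matches the ShowWeights output above.

```python
from collections import defaultdict
from fractions import Fraction
from math import gcd

class Dist:
    """A discrete distribution: a map from value to probability."""
    def __init__(self, probs):
        self.probs = {v: p for v, p in probs.items() if p > 0}

    @staticmethod
    def singleton(v):
        return Dist({v: Fraction(1)})

    @staticmethod
    def empty():
        return Dist({})

    @staticmethod
    def uniform(lo, hi):
        n = hi - lo + 1
        return Dist({v: Fraction(1, n) for v in range(lo, hi + 1)})

    def select_many(self, f):
        # Monadic bind: total probability of each result over all paths.
        out = defaultdict(Fraction)
        for v, p in self.probs.items():
            for v2, p2 in f(v).probs.items():
                out[v2] += p * p2
        return Dist(out)

    def weights(self):
        # Scale probabilities to the smallest equivalent integer weights.
        m = 1
        for p in self.probs.values():
            m = m * p.denominator // gcd(m, p.denominator)
        ws = {v: int(p * m) for v, p in self.probs.items()}
        g = 0
        for w in ws.values():
            g = gcd(g, w)
        return {v: w // max(g, 1) for v, w in ws.items()}

def workflow(z):
    # TwoDSix and Flip, built with the same monadic machinery.
    sdu = Dist.uniform(1, 6)
    tds = sdu.select_many(
        lambda x: sdu.select_many(lambda y: Dist.singleton(x + y)))
    flip = Dist({True: Fraction(1, 2), False: Fraction(1, 2)})
    # The S0..S5 chain, one local function per statement.
    def S0(ii, b): return tds.select_many(lambda _ii: S1(_ii, b))
    def S1(ii, b): return S2(ii, b) if ii == 2 else S3(ii, b)
    def S2(ii, b): return Dist.singleton("two")
    def S3(ii, b): return S4(ii, b) if ii != z else Dist.empty()
    def S4(ii, b): return flip.select_many(lambda _b: S5(ii, _b))
    def S5(ii, b): return Dist.singleton("heads" if b else str(ii))
    return S0(0, False)
```

The condition shows up only in S3, where the ii == z branch routes to the empty distribution; everything else is binds and singletons, exactly per the rules.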
Exercise: I severely restricted the kinds of statements and expressions that could appear in this language. Can you see how to implement:
- while, do, for, break, continue, switch and goto/labeled statements
- compound assignment, increment and decrement as statements
Exercise: What about assignments, increments and decrements as expressions; any issues there?
Exercise: What about lambdas? Any issues with adding them in? Again, we're assuming that all functions called in this system are pure, and that includes lambdas.
Exercise: I severely restricted where the sample operator may appear, just to make it easier on me. Can you see how we might ease that restriction? For example, it would be much nicer to write TwoDSix as:
probabilistic IDiscreteDistribution<int> TwoDSix()
{
var d = SDU.Distribution(1, 6);
return sample d + sample d;
}
How might you modify the lowering regimen I've described to handle that?
We'll discuss possible answers to all of these in future episodes.
As I've noted repeatedly in this series: of course this is not how we'd actually implement the proposed feature. Problems abound:
- I've absurdly over-complicated the code generation to the point where every statement is its own local function.
- We are passing around all the locals as parameters; probably some of them could be realized as actual locals.
- Each method is tail recursive, but C# does not guarantee that tail recursions are optimized away, so the stack depth could get quite large.
- Several of the methods take arguments that they do not ever use.
- We could have made some simple inlining optimizations that would decrease the number of helper methods considerably.
That last point bears further exploration. Imagine we applied an inlining optimization over and over again - that is, the optimization where we replace a function call with the body of the function. Since every function call here is expression-bodied, that's pretty easy to do!
Exercise: Do so; the answer follows. (Note that I asked how to write this workflow as a query last time; did you get the same answer?)
.
.
.
.
What would come out the other end of repeated inlining is:
static IDiscreteDistribution<string> Workflow(int z) =>
TwoDSix().SelectMany(ii =>
ii == 2 ?
Singleton<string>.Distribution("two") :
ii != z ?
Flip().SelectMany(b =>
Singleton<string>.Distribution(b ?
"heads" :
ii.ToString())) :
Empty<string>.Distribution);
Which is hardly any easier to read, but at least it is not quite so many functions.
Is this then how we could implement this feature for real? Lower it to functions, then inline the functions? Though it would work, that's probably not how I'd do it for real. I might get to discuss a more realistic lowering technique in a future episode, but we'll see how it goes. This series is getting pretty long and we're not done yet!
And again, my purpose here was to show that it could be done with a straightforward, rules-based transformation; I'm not looking to find the optimal solution. It can be done, which is great!
Aside: I said a few episodes back that it would become clear why having an explicitly empty distribution is a win; hopefully it is now clear. Having an explicitly empty distribution allows us to implement condition using SelectMany! It is not at all obvious how to rewrite the workflow above into a query with a Where clause, but as we've seen, implementing the condition statement as conditionally producing an empty distribution gives us the ability to stick what is logically a Where anywhere in the workflow.
Aside: Many episodes back I gave the challenge of implementing a Where that took a probabilistic predicate: Func<T, IDiscreteDistribution<bool>>. That is no challenge at all in our new workflow language:
bool b = sample predicate(whatever);
condition b;
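In the same spirit, that two-statement workflow desugars to just two nested binds. Here is a Python sketch (Dist, select_many, empty, uniform and weights are invented stand-ins for the C# types): the outer bind "samples" the predicate's bool distribution, and the inner bind conditions on it by routing False to the empty distribution.

```python
from collections import defaultdict
from fractions import Fraction
from math import gcd

class Dist:
    """A discrete distribution: a map from value to probability."""
    def __init__(self, probs):
        self.probs = {v: p for v, p in probs.items() if p > 0}

    @staticmethod
    def singleton(v):
        return Dist({v: Fraction(1)})

    @staticmethod
    def empty():
        return Dist({})

    @staticmethod
    def uniform(lo, hi):
        n = hi - lo + 1
        return Dist({v: Fraction(1, n) for v in range(lo, hi + 1)})

    def select_many(self, f):
        # Monadic bind: total probability of each result over all paths.
        out = defaultdict(Fraction)
        for v, p in self.probs.items():
            for v2, p2 in f(v).probs.items():
                out[v2] += p * p2
        return Dist(out)

    def weights(self):
        # Scale probabilities to the smallest equivalent integer weights.
        m = 1
        for p in self.probs.values():
            m = m * p.denominator // gcd(m, p.denominator)
        ws = {v: int(p * m) for v, p in self.probs.items()}
        g = 0
        for w in ws.values():
            g = gcd(g, w)
        return {v: w // max(g, 1) for v, w in ws.items()}

def where_prob(dist, pred):
    # pred maps a value to a distribution of bool: "bool b = sample
    # pred(v); condition b;" becomes two binds, with False routed to
    # the empty distribution.
    return dist.select_many(
        lambda v: pred(v).select_many(
            lambda b: Dist.singleton(v) if b else Dist.empty()))

# Example: keep a d6 roll i with probability i/6.
d6 = Dist.uniform(1, 6)
kept = where_prob(d6, lambda i: Dist({True: Fraction(i, 6),
                                      False: Fraction(6 - i, 6)}))
```

The resulting weight for each roll i is proportional to i, as the biased predicate dictates.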
I sincerely hope that you agree that my proposed "imperative style" workflow is an improvement over our use of LINQ to project, condition and combine probability distributions; our Workflow() method is a lot easier to read as an imperative workflow than its query form, and not just because it uses the jargon of probability rather than that of sequences.
Next time on FAIC: It looks like we've come up with a reasonable syntactic sugar, but is it just a sugar? Is there anything that our "imperative" workflow can do that our "LINQ" workflow cannot?