Fixing Random, part 3

ericlippert

from Fabulous adventures in coding on 2019-02-07 17:30 (#48NJB)

Last time on FAIC I described a first attempt of how I'd like to fix System.Random:

Make every method static and threadsafe
Draw a clear distinction between crypto-strength and pseudo-random methods

There were lots of good comments on ways to improve the API and implementation details further. However, even with those improvements, we're still really just implementing slightly better versions of rand() from the 1970s. We can do better than that.

Let's start by taking a step back and asking what we really want when we're calling NextInt or NextDouble or whatever on a source of randomness. What we're saying is: we have a source of random Ts for some type T, and the values produced will correspond to some probability distribution. We can express that very clearly in the C# type system:

public interface IDistribution<T>
{
T Sample();
}

For this episode I'm just going to look at continuous probability distributions, which I'll represent as IDistribution<double>. We'll look at integer distributions and other possible values for type parameter T in the upcoming episodes.

Now that we have an interface, what's the simplest possible implementation? We have a library that produces a uniform distribution of doubles over the interval [0.0, 1.0), which is called the standard continuous uniform distribution. Let's make a little singleton class for that:

using SCU = StandardContinuousUniform;
public sealed class StandardContinuousUniform :
IDistribution<double>
{
public static readonly StandardContinuousUniform
Distribution = new StandardContinuousUniform();
private StandardContinuousUniform() { }
public double Sample() => Pseudorandom.NextDouble();
}

Aside: For the purposes of this series I'm going to use pseudo-random numbers; feel free to sub in crypto-strength random numbers if you like it better.

Exercise: make this a non-singleton that takes a source of randomness as a parameter, and feel all fancy because now you're doing "dependency injection".

This implementation might seem trivial - and it is - but upon this simple foundation we can build a powerful abstraction.

Now that we have a type that represents distributions, I'd like to build a bridge between distributions and LINQ. The connection between probability distributions and sequences should be pretty obvious; a distribution is a thing that can produce an infinite sequence of values drawn from that distribution.

I'll make a static class for my helper methods:

public static class Distribution
{
public static IEnumerable<T> Samples<T>(
this IDistribution<T> d)
{
while (true)
yield return d.Sample();
}

What does this give us? We instantly have access to all the representational power of the sequence operation library. Want a sum of twelve random doubles drawn from a uniform distribution? Then say that:

SCU.Distribution.Samples().Take(12).Sum()

Aside: You might be thinking "if there is such a strong connection between distributions and infinite sequences then why not cut out the middleman by making IDistribution extend the IEnumerable interface directly, rather than requiring a Samples() helper method?" That's an excellent question; the answer will become clear in a few episodes.

For testing and pedagogic purposes I'd like to be able to very quickly visualize a distribution from the console or in the debugger, so here's another couple extension methods:

public static string Histogram(
this IDistribution<double> d, double low, double high) =>
d.Samples().Histogram(low, high);

public static string Histogram(
this IEnumerable<double> d, double low, double high)
{
const int width = 40;
const int height = 20;
const int sampleCount = 100000;
int[] buckets = new int[width];
foreach(double c in d.Take(sampleCount))
{
int bucket = (int)
(buckets.Length * (c - low) /
(high - low));
if (0 <= bucket && bucket < buckets.Length)
buckets[bucket] += 1;
}
int max = buckets.Max();
double scale =
max < height ? 1.0 : ((double)height) / max;
return string.Join("",
Enumerable.Range(0, height).Select(
r => string.Join("", buckets.Select(
b => b * scale > (height - r) ? '*' : ' ')) + "\n"))
+ new string('-', width) + "\n";
}

Exercise: Rescue the above princess in a single LINQ expression without getting Jon to do it for you. (I haven't actually tried this; I'd be interested to know if it is possible. )

We can now do this:

SCU.Distribution.Histogram(0, 1)

and get the string:

**** ********** * * * * ******* *************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************----------------------------------------

Which obviously is the desired probability distribution histogram, so I'm pretty pleased.

Extension methods and fluent programming are nice and all, but what else does this get us? Well, we can very concisely produce derived probability distributions. For example, suppose we want the normal distribution:

using static System.Math;
using SCU = StandardContinuousUniform;
public sealed class Normal : IDistribution<double>
{
public double Mean { get; }
public double Sigma { get; }
public double I1/4 => Mean;
public double If => Sigma;
public readonly static Normal Standard = Distribution(0, 1);
public static Normal Distribution(
double mean, double sigma) =>
new Normal(mean, sigma);
private Normal(double mean, double sigma)
{
this.Mean = mean;
this.Sigma = sigma;
}
// Box-Muller method
private double StandardSample() =>
Sqrt(-2.0 * Log(SCU.Distribution.Sample())) *
Cos(2.0 * PI * SCU.Distribution.Sample());
public double Sample() => I1/4 + If * StandardSample();
}

And of course we can graph it:

Normal.Distribution(1.0, 1.5).Histogram(-4, 4)

 *** ****** ******** ********** *********** ************ ************** ************** **************** ****************** ****************** ******************** ********************** ************************ ************************ **************************** ****************************** ******************************* *********************************----------------------------------------

If you want a list with a hundred thousand normally distributed values with a particular mean and standard deviation, just ask for it:

Normal.Distribution(1.0, 1.5).Samples().Take(100000).ToList()

Hopefully you can see how much more useful, readable and powerful this approach is, compared to messing around with System.Random directly. We want to move the level of abstraction of the code away of the mechanistic details of loops and lists and whatnot; the code should be at the level of the business domain: distributions and population samples.

Exercise: Try implementing the Irwin-Hall distribution (hint: it's already written in this episode!)

Exercise: Try implementing the Gamma distribution using the Ahrens-Dieter algorithm.

Exercise: Try implementing the Poisson distribution as an IDistribution<int>.

In the next few episodes of FAIC: we'll stop looking at continuous distributions and take a deeper look at the use cases for our new interface when the type argument represents a small set of discrete value, like Booleans, small ints or enums.

Source	RSS or Atom Feed
Feed Location	http://ericlippert.com/feed
Feed Title	Fabulous adventures in coding
Feed Link	https://ericlippert.com/