Fixing Random, part 4

by
ericlippert
from Fabulous adventures in coding

[Code is here.]

Last time on FAIC I showed how we could use a simple interface to cleanly implement concepts like "normal distribution" or "standard uniform distribution" in a natural style using fluent programming. For the next many episodes we'll leave continuous distributions aside and consider only discrete distributions. That is, we have some probability distribution where there are only a small, fixed number of outcomes. Flip a coin, roll three dice and add them together, that sort of thing.

Since we assume the number of possible outcomes of a discrete distribution is small, we can enumerate them; the set of possible outcomes is, oddly enough, called the support of the distribution.

Some elements might be more likely than others; the ones that happen more often we say have more weight than the ones that happen less often. I want to be able to describe the weight of each member of a distribution easily, and I do not want to get off in the weeds of representing weights as doubles because double arithmetic has many pitfalls as we well know. For the purposes of this series I'm going to assume that we can weight the members of a distribution by an integer. (If I get around to it I might discuss double-weighted distributions later on.)

public interface IDiscreteDistribution<T> : IDistribution<T>
{
  IEnumerable<T> Support();
  int Weight(T t);
}

I'm going to require that the weight function accept any value of type T. If the value is not in the support, the weight is zero, and if the value is in the support, the weight is a positive integer. I'm going to further assume that the support and weights are small enough that the total weight fits into an integer.

Aside: If you divide the weight function by the total weight (as a double) then you have the probability mass function of the discrete distribution.
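To make that aside concrete, here's a small sketch of such a helper; `ProbabilityOf` is a name I'm inventing for illustration, not part of the series' code. It assumes the `IDiscreteDistribution<T>` interface above and `System.Linq` are in scope:

```csharp
static class DiscreteDistributionExtensions
{
  // Hypothetical helper: the probability mass of t is its weight
  // divided by the total weight of the support.
  public static double ProbabilityOf<T>(
      this IDiscreteDistribution<T> d, T t) =>
    (double)d.Weight(t) / d.Support().Sum(s => d.Weight(s));
}
```

For a fair d6 this gives 1.0/6 for each face and 0.0 for any value outside the support.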

I said way back in part one of this series that System.Random has a "candy-machine interface" problem. Let's solve that problem right now:

using SCU = StandardContinuousUniform;

public sealed class StandardDiscreteUniform :
  IDiscreteDistribution<int>
{
  public static StandardDiscreteUniform Distribution(
      int min, int max)
  {
    if (min > max)
      throw new ArgumentException();
    return new StandardDiscreteUniform(min, max);
  }
  public int Min { get; }
  public int Max { get; }
  private StandardDiscreteUniform(int min, int max)
  {
    this.Min = min;
    this.Max = max;
  }
  public IEnumerable<int> Support() =>
    Enumerable.Range(Min, 1 + Max - Min);
  public int Sample() =>
    (int)((SCU.Distribution.Sample() *
      (1.0 + Max - Min)) + Min);
  public int Weight(int i) =>
    (Min <= i && i <= Max) ? 1 : 0;
  public override string ToString() =>
    $"StandardDiscreteUniform[{Min}, {Max}]";
}

Finally we can now do things like 10d6 (D&D notation for "sum ten six-sided die rolls") in a natural, fluent style. I would far prefer:

using SDU = StandardDiscreteUniform;
SDU.Distribution(1, 6).Samples().Take(10).Sum()

to

random.Next(1, 7) + random.Next(1, 7) + random.Next(1, 7) + ...

or the equivalent loop nonsense that you typically see.

A common defense of "lower bound inclusive, upper bound exclusive" is that it is natural for programmers in C-family languages to think of an array or list as indexed from 0 (inclusive) to the array length (exclusive). Apparently we want to be able to say things like

myArray[Random.Next(0, myArray.Length)]

and not get confused and have to say myArray.Length-1. Well, my answer to that (germane) critique is: once again, we're reasoning at the wrong level. If we want a probability distribution drawn from elements of an array, then don't mess around with array indexing and choosing the right random number and so on. That is writing code at the level of the mechanisms, not the business domain. Instead, make an object that represents a distribution of elements chosen from a given array and sample from it!
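As a teaser, the idea might be sketched like this; the class name and shape are my invention here, not necessarily what the series will build. It assumes the `IDistribution<T>` interface and the `StandardDiscreteUniform` class above:

```csharp
// Sketch only: a uniform distribution over the elements of an array.
// The index arithmetic lives in exactly one place, inside the constructor,
// so callers at the business-domain level never see it.
public sealed class UniformOverArray<T> : IDistribution<T>
{
  private readonly T[] items;
  private readonly StandardDiscreteUniform index;
  public static UniformOverArray<T> Distribution(params T[] items) =>
    new UniformOverArray<T>(items);
  private UniformOverArray(T[] items)
  {
    this.items = items;
    this.index = StandardDiscreteUniform.Distribution(0, items.Length - 1);
  }
  public T Sample() => items[index.Sample()];
}
```

With that, `UniformOverArray<string>.Distribution("red", "green", "blue").Sample()` reads at the level of the domain, with no `Length - 1` in sight.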

But we are getting ahead of ourselves; we'll get to that in an upcoming episode.

Previously I made a little "histogram string" helper extension method that is useful for quickly displaying results in a debugger or console; let's do the same for discrete distributions. Moreover, to make sure that we really are getting the results we expect, I'm going to implement it by sampling rather than just basing it on the support and the corresponding weights:

public static string Histogram<T>(
    this IDiscreteDistribution<T> d) =>
  d.Samples().DiscreteHistogram();

public static string DiscreteHistogram<T>(
    this IEnumerable<T> d)
{
  const int sampleCount = 100000;
  const int width = 40;
  var dict = d.Take(sampleCount)
    .GroupBy(x => x)
    .ToDictionary(g => g.Key, g => g.Count());
  int labelMax = dict.Keys
    .Select(x => x.ToString().Length)
    .Max();
  var sup = dict.Keys.OrderBy(x => x).ToList();
  int max = dict.Values.Max();
  double scale = max < width ? 1.0 : ((double)width) / max;
  return string.Join("\n",
    sup.Select(s => $"{ToLabel(s)}|{Bar(s)}"));
  string ToLabel(T t) =>
    t.ToString().PadLeft(labelMax);
  string Bar(T t) =>
    new string('*', (int)(dict[t] * scale));
}

I gotta say, I am loving local functions in C# 7.

Let's take it for a test drive and roll 1d10:

StandardDiscreteUniform.Distribution(1, 10).Histogram()

The histogram appears as we'd expect:

 1|***************************************
 2|***************************************
 3|***************************************
 4|**************************************
 5|***************************************
 6|****************************************
 7|**************************************
 8|***************************************
 9|***************************************
10|**************************************

Of course, sometimes we will want to see the weights directly:

public static string ShowWeights<T>(
    this IDiscreteDistribution<T> d)
{
  int labelMax = d.Support()
    .Select(x => x.ToString().Length)
    .Max();
  return string.Join("\n", d.Support().Select(s =>
    $"{ToLabel(s)}:{d.Weight(s)}"));
  string ToLabel(T t) =>
    t.ToString().PadLeft(labelMax);
}
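For instance, every element of a standard discrete uniform has weight one, so for a d6:

```csharp
// Each face of a fair die has weight 1, so ShowWeights prints
// "1:1" through "6:1", one line per element of the support.
Console.WriteLine(
  StandardDiscreteUniform.Distribution(1, 6).ShowWeights());
```

which prints `1:1` through `6:1`, one line per face.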

We seem to be doing pretty well here; our implementations of each probability distribution are short and sweet, and each one provides a stone in the foundation for building more complex distributions.

Next time on FAIC: we'll build more simple discrete distributions.
