The coupon collector problem and π
How far do you have to go down the decimal digits of until you've seen all the digits 0 through 9?
We can print out the first few digits of and see that there's no 0 until the 32nd decimal place.
3.14159265358979323846264338327950
It's easy to verify that the remaining digits occur before the 0, so the answer is 32.
Now suppose we want to look at pairs of digits. How far out do we have to go until we've seen all pairs of digits (or base 100 digits if you want to think of it that way)? And what about triples of digits?
We know we'll need at least 100 pairs, and at least 1000 triples, so this has gotten bigger than we want to do by hand. So here's a little Python script that will do the work for us.
from mpmath import mp mp.dps = 30_000 s = str(mp.pi)[2:] for k in [1, 2, 3]: tuples = [s[i:i+k] for i in range(0, len(s), k)] d = dict() i = 0 while len(d) < 10**k: d[tuples[i]] = 1 i += 1 print(i)
The output:
32 396 6076
This confirms that we at the 32nd decimal place we will have seen all 10 possible digits. It says we need 396 pairs of digits before we see all 100 possible digit pairs, and we'll need 6076 triples before we've seen all possible triples.
We could have used the asymptotic solution to the coupon collector problem" to approximately predict the results above.
Suppose you have an urn with n uniquely labeled balls. You randomly select one ball at a time, return the ball to the run, and select randomly again. The coupon collector problem ask how many draws you'll have to make before you've selected each ball at least once.
The expected value for the number of draws is
n Hn
where Hn is the nth harmonic number. For large n this is approximately equal to
n(log n + )
where is the Euler-Mascheroni constant. (More on the gamma constant here.)
Now assume the digits of are random. Of course they're not random, but random is as random does. We can get useful estimates by making the modeling assumption that the digits behave like a random sequence.
The solution to the coupon collector problem says we'd expect, on average, to sample 28 digits before we see each digit, 518 pairs before we see each pair, and 7485 triples before we see each triple. On average" doesn't mean much since there's only one , but you could interpret this as saying what you'd expect if you repeatedly chose real numbers at random and looked at their digits, assuming the normal number conjecture.
The variance on the number of draws needed is asymptotically ^2 n^2/6, so the number of draws with usually be an interval of the expected value 2n.
If you want the details of the coupon collector problem, not just the expected value but the probabilities for different number of draws, see Sampling with replacement until you've seen everything.
The post The coupon collector problem and first appeared on John D. Cook.