Article 6C81X Collecting a large number of coupons

Collecting a large number of coupons

by
John
from John D. Cook on (#6C81X)

This post is an addendum to the recent post Reviewing a thousand things. We're going to look again at the coupon collector problem, randomly sampling a set of N things with replacement until we've collected one of everything.

As noted before, for largeNthe expected number of draws before you've seen everything at least once is approximately

N( log(N) + )

where is Euler's constant.

A stronger result using a Chernoff bound says that for any constant c, the probability that the number of draws exceeds

N( log(N) + c )

approaches

1 - exp( - exp(-c))

as N increases. See [1].

I said in the earlier post that if you were reviewing a thousand things by sampling flash cards with replacement, there's a 96% chance that after drawing 10,000 cards you would have seen everything. We could arrive at this result using the theorem above.

If N = 1,000 and we want N( log(N) + c ) to equal 10,000, we set c = 3.092. Then we find

1 - exp( - exp(-c) ) = 0.0444

which is consistent with saying there's a 96% chance of having seen everything, 95.56% if you want to be more precise.

[1] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.

The post Collecting a large number of coupons first appeared on John D. Cook.
External Content
Source RSS or Atom Feed
Feed Location http://feeds.feedburner.com/TheEndeavour?format=xml
Feed Title John D. Cook
Feed Link https://www.johndcook.com/blog
Reply 0 comments