MCMC and the coupon collector problem
Bob Carpenter wrote today about how Markov chains cannot thoroughly cover high-dimensional spaces, and that they do not need to. That's kinda the point of Monte Carlo integration. If you want systematic coverage, you can/must sample systematically, and that's impractical in high dimensions.
Bob gives the example that if you want to get one integration point in each quadrant of a 20-dimensional space, you need a million samples. (220 to be precise.) But you may be able to get sufficiently accurate results with a lot less than a million samples.
If you wanted to be certain to have one sample in each quadrant, you could sample at (1, 1, 1, ..., 1). But if for some weird reason you wanted to sample randomly and hit each quadrant, you have a coupon collector problem. The expected number of samples to hit each of N cells with uniform [1] random sampling with replacement is
N( log(N) + )
where is Euler's constant. So if N = 220, the expected number of draws would be over 15 million.
Related posts- The coupon collector problem and
- Collecting a large number of coupons
- Consecutive coupon collector problem
[1] We're assuming each cell is sampled independently with equal probability each time. Markov chain Monte Carlo is more complicated than that because the draws are not independent.
The post MCMC and the coupon collector problem first appeared on John D. Cook.