Article 5BCND Predictive probability for large populations

Predictive probability for large populations

by
John
from John D. Cook on (#5BCND)

Suppose you want to estimate the number of patients who respond to some treatment in a large population of size N and what you have is data on a small sample of size n.

The usual way of going about this calculates the proportion of responses in the small sample, then pretends that this is the precisely known proportion, and applies it to the population as a whole. This approach underestimates the uncertainty in the final estimate because it ignores the uncertainty in the initial estimate.

A more sophisticated approach creates a distribution representing what is known about the proportion of successes based on the small sample and any prior information. Then it estimates the posterior predictive probability of outcomes in the larger population based on this distribution. For more background on predictive probability, see my post on uncertainty in a probability.

This post will look at ways to compute predictive probabilities using the asymptotic approximation presented in the previous post.

Suppose your estimate on the probability of success, based on your small sample and prior information, has a Beta(, ) distribution. Then the predictive probability of s successes and f failures is

pp_exact1.svg

The beta functions in the numerator and denominator are hard to understand and hard to compute, so we would like to replace them with something easier to understand and easier to compute.

We begin by expanding all the terms involving s and f in terms of gamma functions.

pp_exact_x2.svg

By rearranging the terms we can see three examples of the kind of gamma function ratios estimated in the previous post.

pp_exact_x3.svg

Now we can apply the simplest approximation from the previous post to get the following.

pp_short_approx2.svg

We can make this approximation more symmetric by expanding Beta(, ) in terms of factorials:

pp_short_approx3.svg

We saw in the previous post that we can make our asymptotic approximation much more accurate by shifting the arguments a bit, and the following expression, while a little more complicated, should be more accurate.

pp_long_approx2.svg

This should be accurate for sufficiently large s and f, but I haven't tested it numerically and wouldn't recommend using it without doing some testing first.

The post Predictive probability for large populations first appeared on John D. Cook.

VBs-yDjag3w
External Content
Source RSS or Atom Feed
Feed Location http://feeds.feedburner.com/TheEndeavour?format=xml
Feed Title John D. Cook
Feed Link https://www.johndcook.com/blog
Reply 0 comments