Justifiable sample size


from John D. Cook on 2023-09-08 12:01

One of the most common things a statistician is asked to do is compute a sample size. There are well known formulas for this, so why isn't calculating a sample size trivial?

As with most things in statistics, plugging numbers into a formula is not the hard part. The hard part is deciding what numbers to plug in, which in turn depends on understanding the context of the study. What are you trying to learn? What are the constraints? What do you know a priori?
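
As an illustration of how much the inputs matter, here is a minimal sketch of one textbook formula: the normal-approximation sample size for estimating a proportion p to within a margin of error E, n = z^2 p(1 - p) / E^2. The function name and the example inputs are hypothetical, chosen only to show that the arithmetic is easy while the choice of inputs drives the answer.

```python
import math
from statistics import NormalDist


def sample_size_proportion(p_guess: float, margin: float, confidence: float = 0.95) -> int:
    """Sample size needed to estimate a proportion to within +/- margin,
    using the normal-approximation formula n = z^2 p(1 - p) / E^2.
    p_guess is an a priori guess at the proportion (an assumption)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z**2 * p_guess * (1 - p_guess) / margin**2
    # Round up: a fractional unit cannot be sampled.
    return math.ceil(n)


# Same formula, different (hypothetical) inputs, very different answers.
for p, e in [(0.05, 0.02), (0.05, 0.01), (0.5, 0.02)]:
    print(f"p guess = {p}, margin = {e}: n = {sample_size_proportion(p, e)}")
```

Halving the margin of error roughly quadruples n, and a pessimistic guess of p = 0.5 inflates it further, so each input is a decision someone has to justify.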

In my experience, sample size calculation is very different in a scientific setting than in a legal setting.

In a scientific setting, sample size is often determined by budget. This isn't done explicitly. There is a negotiation between the statistician and the researcher that starts out with talk of power and error rates, but the assumptions are adjusted until the sample size works out to something the researcher can afford.
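
The sketch below shows how that negotiation can play out numerically. It assumes a two-sided, two-sample z-test comparing means, with the effect size expressed in standard deviation units, and a hypothetical budget of 50 subjects per group. Holding the error rates fixed and nudging the assumed effect size upward is enough to make the required sample size "affordable."

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-sided, two-sample z-test of means:
    n = 2 (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d = (difference in means) / (common standard deviation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # controls the Type I error rate
    z_beta = std_normal.inv_cdf(power)           # controls the Type II error rate
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size**2)


# Hypothetical budget: the researcher can afford at most 50 subjects per group.
budget = 50
for d in [0.3, 0.4, 0.5, 0.6]:
    n = n_per_group(d)
    print(f"assumed effect size {d}: n = {n} per group, affordable: {n <= budget}")
```

Because n scales with 1/d^2, a modest change in the assumed effect size moves the answer a long way, which is why this knob is so tempting to turn.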

In a legal setting, you can't get away with statistical casuistry as easily because you have to defend your choices. (In theory researchers have to defend themselves too, but that's a topic for another time.)

Opposing counsel or a judge may ask how you came up with the sample size you did. The difficulty here may be more expository than mathematical, i.e. the difficulty lies in explaining subtle concepts, not in carrying out calculations. A statistically defensible study design is no good unless there is someone there who can defend it.

One reason statistics is challenging to explain to laymen is that there are multiple levels of uncertainty. Suppose you want to determine the defect rate of some manufacturing process. You want to quantify the uncertainty in the quality of the output. But you also want to quantify your uncertainty about your estimate of uncertainty. For example, you may estimate the defect rate at 5%, but how sure are you that the defect rate is indeed 5%? How likely is it that the defect rate could be 10% or greater?
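
One way to put a number on "how likely is it that the defect rate could be 10% or greater?" is a Bayesian calculation, sketched below under stated assumptions: defects are binomially distributed, the prior on the defect rate is uniform (Beta(1, 1)), and the inspection counts are hypothetical. This is one possible approach, not the only defensible one. The posterior is then Beta(defects + 1, inspected - defects + 1), and for integer parameters its upper tail can be computed exactly as a binomial CDF using only the standard library.

```python
from math import comb


def prob_rate_at_least(threshold: float, defects: int, inspected: int) -> float:
    """Posterior probability that the true defect rate is >= threshold,
    assuming a uniform Beta(1, 1) prior and binomially distributed defects.

    The posterior is Beta(defects + 1, inspected - defects + 1). For integer
    parameters, its upper tail equals a binomial CDF:
    P(rate >= x) = P(Binomial(inspected + 1, x) <= defects).
    """
    n = inspected + 1
    return sum(
        comb(n, k) * threshold**k * (1 - threshold) ** (n - k)
        for k in range(defects + 1)
    )


# Hypothetical inspection data: 5 defective units out of 100 (a 5% estimate).
print(prob_rate_at_least(0.10, defects=5, inspected=100))
# Same 5% observed rate with 20 times as much data: 10% becomes far less plausible.
print(prob_rate_at_least(0.10, defects=100, inspected=2000))
```

Both scenarios give the same 5% point estimate, but they express very different degrees of certainty about it, which is the second level of uncertainty described above.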

When there are multiple contexts of uncertainty, these contexts get confused. For example, variations on the following dialog come up repeatedly.

"Are you saying the quality rate is 95%?"

"No, I'm saying that I'm 95% confident of my estimate of the quality rate."
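
To make the two 95% figures concrete, the sketch below computes a Wilson score confidence interval for a proportion, one standard choice among several interval methods. The data are hypothetical and deliberately chosen so that the estimated quality rate (97%) differs from the confidence level (95%), keeping the two numbers visibly separate.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical data: 970 good units out of 1000 inspected.
good, inspected = 970, 1000
low, high = wilson_interval(good, inspected)
# The quality rate estimate: a statement about the process.
print(f"estimated quality rate: {good / inspected:.1%}")
# The confidence level: a statement about the procedure that produced the interval.
print(f"95% confidence interval for the quality rate: ({low:.3f}, {high:.3f})")
```

Here the quality rate is estimated at about 97%, while "95%" describes how the interval was constructed, not the rate itself.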

Probability is subtle and there's no getting around it.

The post Justifiable sample size first appeared on John D. Cook.
