Estimating standard deviation from range
Suppose you have a small number of samples, say between 2 and 10, and you'd like to estimate the standard deviation of the population these samples came from. Of course you could compute the sample standard deviation, but there is a simple and robust alternative.
Let W be the range of our samples, i.e. the difference between the largest and smallest values. Think "w" for "width." Then
$$ W / d_n $$
is an unbiased estimator of σ, where the constants d_n can be looked up in a table [1]. The table below lists 1/d_n, so in practice you multiply the range by the tabulated value.
|  n | 1/d_n |
|----+-------|
|  2 | 0.886 |
|  3 | 0.591 |
|  4 | 0.486 |
|  5 | 0.430 |
|  6 | 0.395 |
|  7 | 0.370 |
|  8 | 0.351 |
|  9 | 0.337 |
| 10 | 0.325 |
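To make the recipe concrete, here is a minimal sketch that looks up 1/d_n from the table and multiplies it by the range. The function name `sigma_from_range` and the dictionary `INV_D` are hypothetical names chosen for illustration; the constants are copied from the table, so the sketch only covers 2 ≤ n ≤ 10.

```python
import numpy as np

# Hypothetical lookup table: 1/d_n for n = 2..10, copied from the table above
INV_D = {2: 0.886, 3: 0.591, 4: 0.486, 5: 0.430, 6: 0.395,
         7: 0.370, 8: 0.351, 9: 0.337, 10: 0.325}

def sigma_from_range(samples):
    """Estimate sigma as W / d_n, computed as W * (1/d_n)."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    if n not in INV_D:
        raise ValueError(f"table only covers 2 <= n <= 10, got n={n}")
    w = x.max() - x.min()  # the range W
    return w * INV_D[n]

# Five made-up values: W = 5.9 - 2.2 = 3.7, so the estimate is about 3.7 * 0.430 = 1.591
print(sigma_from_range([3.1, 4.7, 2.2, 5.9, 4.0]))
```

The arithmetic is the whole method: one subtraction and one multiplication, with no squares or square roots.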
The constants d_n behind the table were calculated from the expected value of W/σ for normal random variables, but the method may be used on data that do not come from a normal distribution.
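Since d_n is the expected range of n independent standard normal samples, it can be computed numerically. The sketch below uses the standard identity E[W] = ∫ (1 − F(x)^n − (1 − F(x))^n) dx over the real line, with F the standard normal CDF, and evaluates it with SciPy's `quad`. This is one way to get the constants, not necessarily how the cited source or the linked follow-up post computes them; the function name `d` is hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def d(n):
    """Expected range of n iid standard normal samples, i.e. d_n = E[W/sigma]."""
    def integrand(x):
        # 1 - P(max <= x) - P(min > x), using sf = 1 - cdf for better tail accuracy
        return 1.0 - norm.cdf(x)**n - norm.sf(x)**n
    value, _abserr = quad(integrand, -np.inf, np.inf)
    return value

for n in range(2, 11):
    print(n, round(1 / d(n), 3))
```

The rounded values of 1/d_n printed here should match the table to three decimal places; for n = 2 the integral works out exactly to d_2 = 2/√π ≈ 1.128.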
Let's try this out with a little Python code. First we'll take samples from a standard normal distribution, so the population standard deviation is 1. We'll draw five samples of size 10 and estimate the standard deviation from each sample in two ways: by the range method above and by the sample standard deviation.
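from scipy.stats import norm, gamma

for _ in range(5):
    x = norm.rvs(size=10)            # 10 draws from a standard normal (sigma = 1)
    w = x.max() - x.min()            # the range W
    print(w*0.325, x.std(ddof=1))    # W/d_10 (1/d_10 = 0.325), then sample std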
Here's the output:
| w/d_n |   std |
|-------+-------|
| 1.174 | 1.434 |
| 1.205 | 1.480 |
| 1.173 | 0.987 |
| 1.154 | 1.277 |
| 0.921 | 1.083 |
Just from this example it seems the range method does about as well as the sample standard deviation.
For a non-normal example, let's repeat our exercise using a gamma distribution with shape 4, which has standard deviation 2.
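The post doesn't show the code for this run, so here is a sketch of what it might look like, assuming the same sample size of 10 and SciPy's default scale of 1 for the gamma distribution (variance equals the shape, 4, so σ = 2). The seed is arbitrary and only makes the run reproducible; it will not reproduce the numbers in the table below.

```python
import numpy as np
from scipy.stats import gamma

dist = gamma(4)    # shape 4, default scale 1: variance 4, so sigma = 2
print(dist.std())  # 2.0

rng = np.random.default_rng(7)  # arbitrary seed, for reproducibility only
rows = []
for _ in range(5):
    x = dist.rvs(size=10, random_state=rng)  # assumed n = 10, as in the normal example
    w = x.max() - x.min()                    # the range W
    rows.append((w * 0.325, x.std(ddof=1)))  # (W/d_10, sample std)

for range_est, std_est in rows:
    print(f"{range_est:.3f}  {std_est:.3f}")
```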
| w/d_n |   std |
|-------+-------|
| 2.009 | 1.827 |
| 1.474 | 1.416 |
| 1.898 | 2.032 |
| 2.346 | 2.252 |
| 2.566 | 2.213 |
Once again, it seems both methods do about equally well. In both examples the uncertainty due to the small sample size is more important than the difference between the two methods.
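Five samples are too few to compare the methods directly, but a quick simulation can put numbers on the point about sample size. The sketch below draws many standard normal samples of size 10 and reports the mean and spread of each estimator; the replication count and seed are arbitrary choices, not anything from the original post. The spread of each estimator from sample to sample should come out much larger than the gap between their means.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2024)  # arbitrary seed
reps, n = 20_000, 10               # arbitrary number of replications; n = 10 as above

x = norm.rvs(size=(reps, n), random_state=rng)       # each row is one sample, sigma = 1
range_est = (x.max(axis=1) - x.min(axis=1)) * 0.325  # W / d_10 for each sample
std_est = x.std(axis=1, ddof=1)                      # sample standard deviation for each sample

for name, est in [("range", range_est), ("std", std_est)]:
    print(f"{name:5s}  mean {est.mean():.3f}  sd {est.std():.3f}")
```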
Update: To calculate d_n for other values of n, see this post.
[1] Source: H. A. David. Order Statistics. John Wiley and Sons, 1970.
The post Estimating standard deviation from range first appeared on John D. Cook.