Estimating standard deviation from range

from John D. Cook on 2022-03-07 19:06 (#5WVG6)

Suppose you have a small number of samples, say between 2 and 10, and you'd like to estimate the standard deviation σ of the population these samples came from. Of course you could compute the sample standard deviation, but there is a simple and robust alternative.

Let W be the range of our samples, the difference between the largest and smallest value. Think "w" for "width." Then

W / d_n

is an unbiased estimator of σ, where the constants d_n can be looked up in a table [1].

| n  | 1/d_n |
|----+-------|
| 2  | 0.886 |
| 3  | 0.591 |
| 4  | 0.486 |
| 5  | 0.430 |
| 6  | 0.395 |
| 7  | 0.370 |
| 8  | 0.351 |
| 9  | 0.337 |
| 10 | 0.325 |

The values d_n in the table were calculated from the expected value of W/σ for normal random variables, but the method may be used on data that do not come from a normal distribution.
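As a minimal sketch of the estimator, the table can be stored as a lookup and applied to a list of values. The names `INV_D` and `range_sd_estimate` are made up for illustration, and the sample values are arbitrary.

```python
# Reciprocals 1/d_n copied from the table above (n = 2 through 10 only).
INV_D = {2: 0.886, 3: 0.591, 4: 0.486, 5: 0.430, 6: 0.395,
         7: 0.370, 8: 0.351, 9: 0.337, 10: 0.325}

def range_sd_estimate(samples):
    """Estimate the population standard deviation as W / d_n."""
    n = len(samples)
    if n not in INV_D:
        raise ValueError("table only covers sample sizes 2 through 10")
    w = max(samples) - min(samples)  # the range W
    return w * INV_D[n]

# Arbitrary illustrative data: five values, so 1/d_5 = 0.430 is used.
print(range_sd_estimate([2.1, 3.4, 1.9, 2.8, 3.0]))
```

Multiplying by the tabulated 1/d_n rather than dividing by d_n simply matches how the table is laid out.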

Let's try this out with a little Python code. First we'll take samples from a standard normal distribution, so the population standard deviation is 1. We'll draw five samples of size 10, and estimate the standard deviation two ways: by the method above and by the sample standard deviation.

    from scipy.stats import norm, gamma

    for _ in range(5):
        x = norm.rvs(size=10)
        w = x.max() - x.min()
        # 0.325 is 1/d_n for n = 10; print in the column order of the table below
        print(w*0.325, x.std(ddof=1))

Here's the output:

| w/d_n | std   |
|-------+-------|
| 1.174 | 1.434 |
| 1.205 | 1.480 |
| 1.173 | 0.987 |
| 1.154 | 1.277 |
| 0.921 | 1.083 |

Just from this example it seems the range method does about as well as the sample standard deviation.

For a non-normal example, let's repeat our exercise using a gamma distribution with shape 4, which has standard deviation 2.

| w/d_n | std   |
|-------+-------|
| 2.009 | 1.827 |
| 1.474 | 1.416 |
| 1.898 | 2.032 |
| 2.346 | 2.252 |
| 2.566 | 2.213 |
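The code for the gamma run is not shown. A minimal sketch, assuming it mirrors the normal example with `gamma.rvs` in place of `norm.rvs` and the default scale of 1 (the helper name `gamma_trial` is made up for illustration):

```python
from scipy.stats import gamma

def gamma_trial(n=10, shape=4, inv_d=0.325):
    """Return (range estimate, sample standard deviation) for one gamma sample."""
    # With the default scale of 1 the population standard deviation
    # is sqrt(shape), which is 2 for shape 4.
    x = gamma.rvs(shape, size=n)
    w = x.max() - x.min()
    return w * inv_d, x.std(ddof=1)

for _ in range(5):
    est, s = gamma_trial()
    print(f"{est:.3f}  {s:.3f}")
```

The output varies from run to run, so it will not match the table digit for digit.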

Once again, it seems both methods do about equally well. In both examples the uncertainty due to the small sample size is more important than the difference between the two methods.
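Five runs are a thin basis for comparison. One way to check the impression is to repeat the normal experiment many times and look at the average and spread of each estimate. This is a sketch, not part of the original experiment. The seed and the number of repetitions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
reps, n, inv_d = 20_000, 10, 0.325  # 0.325 is 1/d_n for n = 10

# Each row is one sample of size n from a standard normal (sigma = 1).
x = rng.standard_normal((reps, n))
range_est = (x.max(axis=1) - x.min(axis=1)) * inv_d
sample_sd = x.std(axis=1, ddof=1)

for name, est in [("w/d_n", range_est), ("std", sample_sd)]:
    print(f"{name:6s} mean = {est.mean():.3f}  spread = {est.std():.3f}")
```

The spread column shows how much each estimate moves from sample to sample, which is the sample-size uncertainty mentioned above.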

Update: To calculate d_n for other values of n, see this post.
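One possible approach, not necessarily the one in the linked post, uses the standard identity for the expected range of n independent draws with CDF F, namely the integral of 1 - F(x)^n - (1 - F(x))^n over the real line. The sketch below applies it to the standard normal with numerical integration. The function name `d` is made up for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def d(n):
    """Expected range of n independent standard normal samples."""
    def integrand(x):
        # norm.sf(x) is the survival function 1 - norm.cdf(x)
        return 1 - norm.cdf(x)**n - norm.sf(x)**n
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

for n in (2, 5, 10):
    print(n, round(1 / d(n), 3))
```

The printed reciprocals should agree with the table to the three digits given there.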

[1] Source: H. A. David. Order Statistics. John Wiley and Sons, 1970.

The post Estimating standard deviation from range first appeared on John D. Cook.
