Understanding statistical error
A simple linear regression model has the form
y = α + βx + ε.
This means that the output variable y is a linear function of the input variable x, plus some error term that is randomly distributed.
There's a common misunderstanding over whose error the error term is. A naive view is that the world really is linear, that
y = α + βx
is some underlying Platonic reality, and that the only reason we don't measure exactly that linear part is that we as observers have made some sort of error, that the fault is in the real world rather than in the model.
No, reality is what it is, and it's our model that is in error. Some aspect of reality may indeed have a structure that is approximately linear (over some range, under some conditions), but when we truncate reality to only that linear approximation, we introduce some error. This error may be tolerable (and thankfully it often is), but the error is ours, not the world's.
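Here is a minimal sketch of the point in Python. It assumes a made-up "reality" that is exactly sin(x) on the range 0 to 0.5, observed with no measurement noise at all; that choice of function and range, and the helper name fit_line, are illustrative assumptions and not from any particular data set. Fitting a straight line by least squares still leaves residuals, and since nothing was mismeasured, every bit of that error comes from the linear model.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical "reality": exactly sin(x), observed without any measurement noise.
xs = [i / 100 for i in range(51)]  # x from 0 to 0.5
ys = [math.sin(x) for x in xs]

a, b = fit_line(xs, ys)

# What the linear model leaves unexplained. No observer error was simulated,
# so these residuals are entirely due to truncating reality to a line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
max_abs_residual = max(abs(r) for r in residuals)

print(f"intercept = {a:.4f}, slope = {b:.4f}")
print(f"largest residual = {max_abs_residual:.4f}")
```

The residuals are small over this range, which is the "tolerable" case. Widening the range of x in the sketch makes the linear approximation worse, and the residuals grow accordingly.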
The post Understanding statistical error first appeared on John D. Cook.