ODE to Fisher’s transform
I was calculating a correlation coefficient this afternoon and ran into something interesting.
Suppose you have two uncorrelated random variables X and Y. If you draw, say, a thousand samples each from X and Y and compute Pearson's correlation coefficient, you almost certainly will not get exactly 0, though you very likely will get a small number.
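Here's a quick simulation of the phenomenon (a sketch in Python using NumPy and SciPy; the sample size of 1,000 and the seed are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(20230801)   # arbitrary seed

# Two independent, hence uncorrelated, samples of 1,000 points each
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)

r, _ = stats.pearsonr(x, y)
print(r)   # small, but almost surely not exactly 0
```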
How do you find a confidence interval around a correlation coefficient?
Sample correlation coefficient values do not follow a normal distribution, though the distribution is approximately normal if the population correlation is small. The distribution gets further from normal as the correlation gets close to 1 or -1.
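To see the departure from normality, you could repeatedly sample from a bivariate normal with a large population correlation and look at the skewness of the resulting sample correlations. Here is a sketch; the population correlation 0.95, the sample size 30, and the number of replications are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(20230801)   # arbitrary seed
rho = 0.95                              # population correlation near 1
n, reps = 30, 10_000

cov = [[1, rho], [rho, 1]]
rs = np.empty(reps)
for i in range(reps):
    sample = rng.multivariate_normal([0, 0], cov, size=n)
    r, _ = stats.pearsonr(sample[:, 0], sample[:, 1])
    rs[i] = r

# Sample correlations pile up just below 1 and trail off to the left,
# so their skewness is clearly negative rather than near zero.
print(stats.skew(rs))
```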
Enter Fisher's transformation. If you run the sample correlation coefficient r through the function
(1/2) log((1 + r)/(1 - r)) = arctanh(r)
you get something that has a distribution closer to the normal distribution. You find a confidence interval for the transformed variable, then undo the transformation.
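In code, the procedure might look like the sketch below. It assumes the usual large-sample normal approximation for the transformed coefficient, with standard error 1/sqrt(n - 3); the function name and the numbers in the example call are made up for illustration.

```python
import numpy as np
from scipy import stats

def fisher_ci(r, n, confidence=0.95):
    """Confidence interval for a correlation coefficient via Fisher's transform."""
    z = np.arctanh(r)                    # transform the sample correlation
    se = 1 / np.sqrt(n - 3)              # approximate standard error of z
    zcrit = stats.norm.ppf(0.5 + confidence / 2)
    lo, hi = z - zcrit * se, z + zcrit * se
    return np.tanh(lo), np.tanh(hi)      # undo the transformation

print(fisher_ci(0.3, 1000))              # e.g. r = 0.3 from 1,000 pairs
```

The interval is symmetric on the transformed scale but not after mapping back, which is the point of the transform.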
Now where did the Fisher transform come from?
I don't know whether this was Fisher's derivation, but Hotelling came up with the following derivation. Assume you apply a transform G(r) to the correlation coefficient. Write an asymptotic expansion for the skewness of G(r) and set the leading term equal to zero. This leads to the ordinary differential equation
3(1 - r^2) G''(r) - 6r G'(r) = 0
which has the solution G(r) = arctanh(r).
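You can check this solution by substituting arctanh into the equation, for example with SymPy (a quick verification sketch, not part of the original derivation):

```python
from sympy import symbols, atanh, diff, simplify

r = symbols('r')
G = atanh(r)

# Left-hand side of 3(1 - r^2) G''(r) - 6 r G'(r) = 0
lhs = 3*(1 - r**2)*diff(G, r, 2) - 6*r*diff(G, r)
print(simplify(lhs))   # prints 0
```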
I found this interesting because I've worked with differential equations and with statistics, but I've rarely seen them overlap.