Bayes factors vs p-values
Bayesian analysis and Frequentist analysis often lead to the same conclusions by different routes. But sometimes the two forms of analysis lead to starkly different conclusions.
The following illustration of this difference comes from a talk by Luis Pericchi last week. He attributes the example to "Bernardo (2010)" though I have not been able to find the exact reference.
In an experiment to test the existence of extrasensory perception (ESP), researchers wanted to see whether a person could influence some process that emitted binary data. (I'm going from memory on the details here, and I have not found Bernardo's original paper. However, you could ignore the experimental setup and treat the following as hypothetical. The point here is not to investigate ESP but to show how Bayesian and Frequentist approaches could lead to opposite conclusions.)
The null hypothesis was that the individual had no influence on the stream of bits and that the true probability of any bit being a 1 is p = 0.5. The alternative hypothesis was that p is not 0.5. There were N = 104,490,000 bits emitted during the experiment, and s = 52,263,471 were 1's. The p-value, the probability of an imbalance this large or larger under the assumption that p = 0.5, is 0.0003. Such a tiny p-value would be regarded as extremely strong evidence in favor of ESP given the way p-values are commonly interpreted.
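As a rough sanity check on that p-value, here is a minimal sketch using the normal approximation to the binomial rather than the exact distribution. It assumes only the N and s quoted above; the variable names z and p_approx are mine, chosen for illustration.

```python
from math import erfc, sqrt

N = 104_490_000  # total bits emitted
s = 52_263_471   # number of bits that were 1

# Under the null hypothesis (p = 0.5), the count of 1's has
# mean N/2 and standard deviation sqrt(N)/2.
z = (s - N / 2) / (sqrt(N) / 2)

# Two-sided tail probability of a standard normal:
# 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
p_approx = erfc(abs(z) / sqrt(2))

print(f"observed proportion of 1's: {s / N:.6f}")
print(f"z-score: {z:.3f}")
print(f"approximate two-sided p-value: {p_approx:.5f}")
```

The observed proportion differs from 0.5 only in the fourth decimal place, yet with this many bits the count sits about 3.6 standard deviations above its expected value. That is why the p-value is so small.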
The Bayes factor, however, is about 18.7, meaning that the data are roughly 19 times more probable under the null hypothesis than under the alternative. The alternative in this example uses Jeffreys' prior, Beta(0.5, 0.5), on p. (A direct calculation that leaves out the prior's normalizing constant, B(0.5, 0.5) = π, gives 5.95 instead, which is too small by a factor of π.)
So given the data and assumptions in this example, the Frequentist concludes there is strong evidence for ESP while the Bayesian concludes there is substantial evidence against ESP.
The following Python code shows how one might calculate the p-value and Bayes factor.
    from math import exp, log
    from scipy.stats import binom
    from scipy.special import betaln

    N = 104490000
    s = 52263471

    # sf is the survival function, i.e. complementary cdf: sf(k) = P(X > k).
    # Use s - 1 so that the tail includes s itself.
    # ccdf multiplied by 2 because we're doing a two-sided test
    print("p-value: ", 2*binom.sf(s - 1, N, 0.5))

    # Compute the log of the Bayes factor to avoid underflow.
    # The binomial coefficient cancels between the two marginal likelihoods.
    # betaln(0.5, 0.5) = log(pi) is the normalizing constant of Jeffreys' prior.
    logbf = N*log(0.5) + betaln(0.5, 0.5) - betaln(s+0.5, N-s+0.5)
    print("Bayes factor: ", exp(logbf))
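The Bayes factor depends on the prior placed on p under the alternative. The sketch below is one way to explore that dependence. It assumes a Beta(a, b) prior, for which the Bayes factor in favor of the null is 0.5^N B(a, b) / B(s + a, N - s + b), with the prior's normalizing constant B(a, b) included. The helper names log_beta and bayes_factor_null are hypothetical, introduced only for this illustration, and the priors other than Beta(0.5, 0.5) are my own choices rather than part of the original example.

```python
from math import exp, lgamma, log


def log_beta(x, y):
    """Log of the beta function B(x, y), computed via log-gamma to avoid overflow."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def bayes_factor_null(s, n, a, b):
    """Bayes factor for H0: p = 0.5 against H1: p ~ Beta(a, b).

    Illustrative helper (not from the original example). The binomial
    coefficient appears in both marginal likelihoods and cancels.
    """
    log_m0 = n * log(0.5)                                  # log marginal likelihood under H0
    log_m1 = log_beta(s + a, n - s + b) - log_beta(a, b)   # log marginal likelihood under H1
    return exp(log_m0 - log_m1)


N = 104_490_000
s = 52_263_471

# Jeffreys' prior (a = 0.5), the uniform prior (a = 1), and a prior
# mildly concentrated around 0.5 (a = 2); the last two are assumptions
# added here for comparison.
for a in (0.5, 1.0, 2.0):
    bf = bayes_factor_null(s, N, a, a)
    print(f"Beta({a}, {a}) prior: Bayes factor = {bf:.2f}")
```

For these three priors the Bayes factor stays above 1, so the data favor the null in each case. A prior concentrated tightly enough around 0.5 could change that, which is why the choice of prior should be stated alongside any Bayes factor.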