Bayesian or frequentist test?

This note is based on a discussion with Jean-Michel Guillon.

The idea was to test a claim from this paper:
Sterne, J.A.C., Smith, G.D., 2001. Sifting the evidence-what’s wrong with significance tests? British Medical Journal 322, 226-231.

If our prior opinion about the risk ratio is vague (we consider a wide range of values to be equally likely), then the results of a frequentist analysis are similar to the results of a Bayesian analysis; both are based on what statisticians call the likelihood for the data:
• The 95% confidence interval is the same as the 95% credible interval, except that the latter has the meaning often incorrectly ascribed to a confidence interval;
• The (one-sided) P value is the same as the Bayesian posterior probability that the drug increases the risk of death (assuming that we found a protective effect of the drug).
The two approaches, however, will give different results if our prior opinion is not vague, relative to the amount of information contained in the data.
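To make this last point concrete, here is a minimal sketch (my own illustration, not code from the paper) of the conjugate Normal case with known standard deviation: with a vague prior the 95% credible interval reproduces the frequentist confidence interval, while a narrow prior centred away from the data pulls the posterior away from it.

# Conjugate Normal model with known sd: the posterior of the mean given a
# Normal(m0, s0^2) prior is Normal with precision 1/s0^2 + n/sigma^2 and
# mean equal to the precision-weighted average of m0 and the sample mean.
set.seed(1)
sigma <- 2
x <- rnorm(100, 10, sigma)
n <- length(x)

posterior <- function(m0, s0) {
  prec <- 1/s0^2 + n/sigma^2
  pm <- (m0/s0^2 + sum(x)/sigma^2) / prec
  c(mean = pm, sd = sqrt(1/prec))
}

# Vague prior: the 95% credible interval matches the frequentist 95% CI
vague <- posterior(m0 = 0, s0 = 100)
vague[["mean"]] + c(-1.96, 1.96) * vague[["sd"]]
mean(x) + c(-1.96, 1.96) * sigma/sqrt(n)

# Informative prior centred away from the data: the two intervals now differ
strong <- posterior(m0 = 5, s0 = 0.1)
strong[["mean"]] + c(-1.96, 1.96) * strong[["sd"]]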

Let's test the first idea:

set.seed(1)
val1 <- rnorm(100, 10, 2)

# 95% credibility interval
library(HelpersMG)

# minus log-likelihood of a Normal model: the parameter to estimate arrives in 'x',
# fixed parameters (here the known sd) in 'fixed'
dnormx <- function(data, x, fixed) {
  data <- unlist(data)
  x <- c(x, fixed)
  return(-sum(dnorm(data, mean=x['mean'], sd=x['sd'], log=TRUE)))
}

# one row per parameter to estimate: a vague uniform prior on [-100, 100],
# a proposal SD of 0.35, search bounds [-100, 100] and a starting value of 10
parameters_mcmc <- data.frame(Density = "dunif", 
                              Prior1 = -100, 
                              Prior2 = 100, 
                              SDProp = 0.35, 
                              Min = -100, 
                              Max = 100, 
                              Init = 10,
                              row.names = "mean", stringsAsFactors = FALSE)



set.seed(1)
mcmc_run <- MHalgoGen(n.iter=500000, parameters=parameters_mcmc, fixed=c(sd=2), 
                      data=val1, 
                      likelihood=dnormx, n.chains=1, n.adapt=100, thin=1, trace=100)

> quantile(mcmc_run$resultMCMC[[1]], probs=c(0.025, 0.975))
     2.5%     97.5% 
 9.812352 10.058331 
> # 95% confidence interval is 
> c(mean(val1)-2*sd(val1)/sqrt(length(val1)), 
+   mean(val1)+2*sd(val1)/sqrt(length(val1)))
[1]  9.807548 10.063363

This is indeed correct: the 95% credible interval (9.812 to 10.058) is essentially the same as the 95% confidence interval (9.808 to 10.063).
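Note that the confidence interval above uses the rough ±2 standard-error approximation; as a small additional check (a sketch on the same val1), the exact t-based interval should give essentially the same bounds:

# exact 95% confidence interval based on the t distribution
t.test(val1)$conf.int
# equivalently, by hand:
mean(val1) + qt(c(0.025, 0.975), df = length(val1) - 1) * sd(val1)/sqrt(length(val1))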

Let's test the second idea. We will use three approaches: a classical t-test, an MCMC posterior distribution using the HelpersMG package, and the same computation using the BayesianFirstAid package. This package is no longer maintained but is available on GitHub. See http://sumsar.net/files/academia/baath_user14_abstract.pdf
The comparison is possible because the bayes.t.test help page indicates that:
The constants M[μ], S[μ], L[σ] and H[σ] are set so that the priors on μ and σ are essentially flat.
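Before running the comparison, it helps to see why the agreement is expected: with a flat prior and a known standard deviation, the posterior of the mean is Normal(x̄, σ²/n), so the posterior probability that the mean is at least 10 equals the one-sided p-value. A quick numeric check (my own sketch, using the same simulated data as below):

# flat prior + known sd => posterior of the mean is N(xbar, sigma^2/n)
set.seed(1)
val1 <- rnorm(10000, 10, 2)
xbar <- mean(val1); n <- length(val1); sigma <- 2

z <- (xbar - 10) / (sigma/sqrt(n))
pnorm(z)                                          # one-sided p-value (z-test, alternative "less")
1 - pnorm(10, mean = xbar, sd = sigma/sqrt(n))    # posterior P(mean >= 10) with a flat prior
# both give the same value, close to the 0.256 and 0.259 obtained below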

> set.seed(1)
> val1 <- rnorm(10000, 10, 2)

> set.seed(1)
> # credibility interval
> library(HelpersMG)

> dnormx <- function(data, x, fixed) {
+     data <- unlist(data)
+     x <- c(x, fixed)
+     return(-sum(dnorm(data, mean=x['mean'], sd=x['sd'], log=TRUE)))
+ }

> parameters_mcmc <- data.frame(Density = "dunif", 
+                               Prior1 = -100, 
+                               Prior2 = 100, 
+                               SDProp = 0.35, 
+                               Min = -100, 
+                               Max = 100, 
+                               Init = 10,
+                               row.names = "mean", stringsAsFactors = FALSE)

> mcmc_run <- MHalgoGen(n.iter=500000, parameters=parameters_mcmc, fixed=c(sd=2), 
+                       data=val1, 
+                       likelihood=dnormx, n.chains=1, n.adapt=100, thin=1, trace=1)
Chain 1: [1] -21244.8863059415
Best likelihood for: 
mean = 9.98692609373803 
> plot(mcmc_run, xlim=c(-110, 110))
> hist(mcmc_run$resultMCMC[[1]])
> 1-sum(mcmc_run$resultMCMC[[1]]<10)/length(mcmc_run$resultMCMC[[1]])
[1] 0.25648
> t.test(x=val1, mu=10, alternative = c("less"))

One Sample t-test

data:  val1
t = -0.64573, df = 9999, p-value = 0.2592
alternative hypothesis: true mean is less than 10
95 percent confidence interval:
     -Inf 10.02023
sample estimates:
mean of x 
 9.986926 


> library(BayesianFirstAid)
> bayes.t.test(x=val1, mu=10, alternative = c("less"))
  |**************************************************| 100%

Bayesian estimation supersedes the t test (BEST) - one sample

data: val1, n = 10000

  Estimates [95% credible interval]
mean of val1: 10 [9.9, 10]
sd of val1: 2.0 [2.0, 2.0]

The mean is more than 10 by a probability of 0.263 
and less than 10 by a probability of 0.737 

Warning message:
In bayes.t.test.default(x = val1, mu = 10, alternative = c("less")) :
  The argument 'alternative' is ignored by bayes.binom.test
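For completeness, the object returned by bayes.t.test can be stored and inspected with the generic functions described in the BayesianFirstAid README (a sketch, assuming those helpers behave as documented):

fit <- bayes.t.test(x = val1, mu = 10)
summary(fit)      # posterior summaries and quantiles for the parameters
plot(fit)         # posterior distributions
diagnostics(fit)  # MCMC convergence diagnostics
model.code(fit)   # prints the underlying JAGS model (per the package README), ready to be modified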

This is again correct: the posterior probability that the mean is greater than 10 (0.256 with HelpersMG, 0.263 with BayesianFirstAid) closely matches the one-sided p-value of the t-test (0.259).


