Parameters standard error using ML or Bayesian analysis

This article was originally posted in my previous blog

Let data that you suppose being distributed as Gaussian distribution.

# Generate 100 values from Gaussian distribution
val=rnorm(100, mean=20, sd=5)

If you want know the mean and sd of the underlying Gaussian distribution, you can use directly mean() and sd() however you will not know the distribution of the mean and SD (ie. if the mean and SD are well known).

mean(val); sd(val)

To get the distribution of both mean and SD (and then their confidence interval), you can use maximum likelihood and the SE of the point estimates can be obtained from the square root of the inverse of the Hessian matrix.
Let do it:

# Return -ln L of values in val in Gaussian distribution with mean and sd in par
fitnorm<-function(par, val) {
  -sum(dnorm(val, par["mean"], par["sd"], log = TRUE))
}
# Initial values for search
p<-c(mean=20, sd=5)
# fit the model
result <- optim(par=p, fn=fitnorm, val=val, method="BFGS", hessian=TRUE)
# Inverse the hessian matrix to get SE for each parameters
mathessian <- result$hessian
inversemathessian <- solve(mathessian)
res <- sqrt(diag(inversemathessian))
# results
data.frame(Mean=result$par, SE=res)

Or use the function SEfromHessian in my package HelpersMG

> library("HelpersMG")
Le chargement a nécessité le package : coda
> SEfromHessian(result$hessian)
     mean        sd 

0.4607890 0.3258245 

An alternative is to use the delta method:

> library("car")
> deltaMethod(result$par, "mean", vcov.=solve(result$hessian))
     Estimate       SE    2.5 %   97.5 %
mean 19.09488 0.460789 18.19175 19.99801
> deltaMethod(result$par, "sd", vcov.=solve(result$hessian))
   Estimate        SE    2.5 %   97.5 %
sd  4.60789 0.3258245 3.969286 5.246495

The function fitdistr() of the package MASS will do the same:

library(MASS)
(r<-fitdistr(val, "normal"))

An alternative is to use Bayesian framework using MCMC and Metropolis-Hastings algorithm. For Bayesian analysis, we need to define priors. To make the analysis the most similar to the one in ML, let use uniform priors from -100 to 100 for the mean and from 0 to 10 for the sd.

Let do with JAGS:

Create a file GaussianDunif.bug with this code:
model {
    #niveau 1 = données
for (i in 1:N){
y[i] ~ dnorm(mu, tau)
}

    #priors
mu ~ dunif(-100,100)
tau <- 1/sigma^2
sigma ~ dunif(0, 10)
}

Then run this script within R:

N <- 100
y <- val
library(rjags)

jags <- jags.model('GaussianUnif.bug',
                   data = list('y' = y,
                               'N' = N),
                   n.chains = 1,
                   n.adapt = 100)

#Sampling
out1 <- coda.samples(jags, variable.names=c('mu', 'sigma'), n.iter=50000,thin=1)

You get this result:
> summary(out1)$statistics
           Mean        SD    Naive SE Time-series SE
mu    19.281745 0.5178043 0.002315691    0.003019341
sigma  5.135868 0.3693947 0.001651983    0.002053014

Take care, the SE for the parameter mu and sigma are within the SD object.
The subtlety of SE being noted SD in coda output is also described here:
To do the same with the package HelpersMG (because jags and Rjags can be difficult to install):

# define the priors
parameters_mcmc <- data.frame(Density=c('dunif', 'dunif'),
                              Prior1=c(-100, 0), Prior2=c(100, 10), SDProp=c(0.2, 0.2),
                              Min=c(-100, 0), Max=c(100, 10), Init=c(20, 5), 
                              stringsAsFactors = FALSE,
                              row.names=c('mean', 'sd'))
# Define de likelihood function
dnormx <- function(data, x) {
  data <- unlist(data)
  return(-sum(dnorm(data, mean=x['mean'], sd=x['sd'], log=TRUE)))
}

# run the analysis
library("HelpersMG")
mcmc_run <- MHalgoGen(n.iter=50000, parameters=parameters_mcmc, 
                                                data=val, likelihood=fitnorm, 
                                                n.chains=1, n.adapt=100, thin=1, trace=1)
mcmcforcoda <- as.mcmc(mcmc_run)

And the result is:
> summary(mcmcforcoda)$statistics
          Mean       SD    Naive SE Time-series SE
mean 19.278648 0.514343 0.002300212    0.013344615
sd    5.139455 0.375194 0.001677919    0.007535762

Note that to get the confidence interval, it is better to use:
> apply(mcmcforcoda[[1]], 2, quantile, probs=c(0.025, 0.975))
          mean       sd
2.5%  18.26614 4.469787
97.5% 20.27939 5.920845

Commentaires

Posts les plus consultés de ce blog

Standard error from Hessian Matrix... what can be done when problem occurs

stepAIC from package MASS with AICc

Install treemix in ubuntu 20.04