Standard errors of parameters using ML or Bayesian analysis
This article was originally posted on my previous blog.
Suppose you have data that you assume to be drawn from a Gaussian distribution.
# Generate 100 values from Gaussian distribution
val <- rnorm(100, mean=20, sd=5)
If you want to know the mean and SD of the underlying Gaussian distribution, you can directly use mean() and sd(); however, you will not know the distribution of these estimates (i.e., how precisely the mean and SD are known).
mean(val); sd(val)
To get the distribution of both the mean and SD (and hence their confidence intervals), you can use maximum likelihood; the SE of each point estimate is then the square root of the corresponding diagonal element of the inverse of the Hessian matrix of -ln L evaluated at the estimates.
Let's do it:
# Return -ln L of the values in val under a Gaussian distribution with mean and sd taken from par
fitnorm<-function(par, val) {
-sum(dnorm(val, par["mean"], par["sd"], log = TRUE))
}
# Initial values for search
p<-c(mean=20, sd=5)
# fit the model
result <- optim(par=p, fn=fitnorm, val=val, method="BFGS", hessian=TRUE)
# Invert the Hessian matrix to get the SE of each parameter
mathessian <- result$hessian
inversemathessian <- solve(mathessian)
res <- sqrt(diag(inversemathessian))
# results
data.frame(Mean=result$par, SE=res)
Or use the function SEfromHessian from my package HelpersMG:
> library("HelpersMG")
Loading required package: coda
> SEfromHessian(result$hessian)
mean sd
0.4607890 0.3258245
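As a sketch, a Wald-type 95% confidence interval can be built directly from these SEs; for the raw parameters it matches the deltaMethod() intervals shown below:
# Wald-type 95% confidence intervals: estimate +/- 1.96 * SE
se <- SEfromHessian(result$hessian)
cbind(estimate=result$par,
      `2.5 %`=result$par - qnorm(0.975) * se,
      `97.5 %`=result$par + qnorm(0.975) * se)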
An alternative is to use the delta method:
> library("car")
> deltaMethod(result$par, "mean", vcov.=solve(result$hessian))
Estimate SE 2.5 % 97.5 %
mean 19.09488 0.460789 18.19175 19.99801
> deltaMethod(result$par, "sd", vcov.=solve(result$hessian))
Estimate SE 2.5 % 97.5 %
sd 4.60789 0.3258245 3.969286 5.246495
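The real interest of the delta method is for derived quantities; for example (a sketch, output not shown), the SE of the coefficient of variation sd/mean:
> deltaMethod(result$par, "sd/mean", vcov.=solve(result$hessian))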
The function fitdistr() of the package MASS will do the same:
library(MASS)
(r<-fitdistr(val, "normal"))
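The point estimates and their standard errors can then be extracted from the returned object:
r$estimate  # point estimates of mean and sd
r$sd        # their standard errors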
An alternative is to use a Bayesian framework with MCMC and the Metropolis-Hastings algorithm. For a Bayesian analysis, we need to define priors. To make the analysis as similar as possible to the ML one, let's use uniform priors from -100 to 100 for the mean and from 0 to 10 for the SD.
Let's do it with JAGS:
Create a file GaussianUnif.bug with this code:
model {
# level 1 = data (note: JAGS parameterizes dnorm with the precision tau, not the SD)
for (i in 1:N){
y[i] ~ dnorm(mu, tau)
}
#priors
mu ~ dunif(-100,100)
tau <- 1/sigma^2
sigma ~ dunif(0, 10)
}
Then run this script within R:
N <- 100
y <- val
library(rjags)
jags <- jags.model('GaussianUnif.bug',
data = list('y' = y,
'N' = N),
n.chains = 1,
n.adapt = 100)
#Sampling
out1 <- coda.samples(jags, variable.names=c('mu', 'sigma'), n.iter=50000, thin=1)
You get this result:
> summary(out1)$statistics
Mean SD Naive SE Time-series SE
mu 19.281745 0.5178043 0.002315691 0.003019341
sigma 5.135868 0.3693947 0.001651983 0.002053014
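The 95% credible interval can be read from the posterior quantiles (a sketch):
> summary(out1)$quantiles[, c("2.5%", "97.5%")]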
Take care: the SEs of the parameters mu and sigma are reported in the SD column; the posterior standard deviation plays the role of the SE of the point estimate. This subtlety of the SE being labeled SD in coda output has been described elsewhere as well.
To do the same with the package HelpersMG (because JAGS and rjags can be difficult to install):
# define the priors
parameters_mcmc <- data.frame(Density=c('dunif', 'dunif'),
Prior1=c(-100, 0), Prior2=c(100, 10), SDProp=c(0.2, 0.2),
Min=c(-100, 0), Max=c(100, 10), Init=c(20, 5),
stringsAsFactors = FALSE,
row.names=c('mean', 'sd'))
# Define the likelihood function
dnormx <- function(data, x) {
data <- unlist(data)
return(-sum(dnorm(data, mean=x['mean'], sd=x['sd'], log=TRUE)))
}
# run the analysis
library("HelpersMG")
mcmc_run <- MHalgoGen(n.iter=50000, parameters=parameters_mcmc,
data=val, likelihood=dnormx,
n.chains=1, n.adapt=100, thin=1, trace=1)
mcmcforcoda <- as.mcmc(mcmc_run)
And the result is:
> summary(mcmcforcoda)$statistics
Mean SD Naive SE Time-series SE
mean 19.278648 0.514343 0.002300212 0.013344615
sd 5.139455 0.375194 0.001677919 0.007535762
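Note that the Time-series SE is markedly larger than the Naive SE here, which reflects autocorrelation in the Metropolis-Hastings chain. The effective sample size can be checked with coda (a sketch):
> library(coda)
> effectiveSize(mcmcforcoda)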
Note that to get the 95% credible interval, it is better to use the posterior quantiles directly rather than mean ± 1.96 SD:
> apply(mcmcforcoda[[1]], 2, quantile, probs=c(0.025, 0.975))
mean sd
2.5% 18.26614 4.469787
97.5% 20.27939 5.920845
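An alternative to the quantile-based interval is the highest posterior density interval, also available from coda (a sketch):
> HPDinterval(mcmcforcoda, prob=0.95)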