Understand mixed-models

Mixed models include both fixed effects and random effects. What are exactly mixed or random effects is rather a philosophical subject, but the mathematical definition could help to better understand what they are.

So let take an example. First we generate some fake data:
Sector <- c(rep("I", 100), rep("II", 100))
Beach <- c(
  sample(c("A", "B", "C", "D", "E"), 100, replace=TRUE),
  sample(c("F", "G"), 100, replace=TRUE)
  )
number <- rnorm(200, 10, 1)
# Sector effect
number[1:100] <- number[1:100] +0.1
number[101:200] <- number[101:200] +0.5
# beach effect
beach.value <- 1:7
names(beach.value) <- LETTERS[1:7]
number <- number + unname(beach.value[Beach])

dataF <- data.frame(number=number, effect= number/10+runif(200, 0, 2),
                    Sector=Sector, Beach=Beach)

The data.frame dataF includes number column which is the dependent variable and effect column with is the fixed effect.

Two columns are factors, Sector and Beach and Beach is a subset of Sector.

Setup hierarchical model

Let try different formula for random effect:

> out3 <- lmer(formula = number ~ effect + (Sector | Beach), data=dataF)
> ranef(out3)
$Beach
  (Intercept)   SectorII
A  -1.5369462 -1.9358614
B  -1.2781884 -1.6099428
C  -0.2292436 -0.2887440
D   0.5526984  0.6961516
E   1.5673895  1.9742063
F   0.9266239  2.2728648
G   1.1618567  2.8498542

> head(predict(out3))
       1        2        3        4        5        6 
11.85990 11.81638 11.67911 11.58098 15.02737 11.97424 
With this solution all the beaches are in all the sectors. This is clearly wrong.


> out3 <- lmer(formula = number ~ effect + (1 + Sector | Beach), data=dataF)
> ranef(out3)
$Beach
  (Intercept)   SectorII
A  -1.5369462 -1.9358614
B  -1.2781884 -1.6099428
C  -0.2292436 -0.2887440
D   0.5526984  0.6961516
E   1.5673895  1.9742063
F   0.9266239  2.2728648
G   1.1618567  2.8498542

> head(predict(out3))
       1        2        3        4        5        6 
11.85990 11.81638 11.67911 11.58098 15.02737 11.97424 
With this solution all the beaches are in all the sectors. This is clearly wrong. 

> out5 <- lmer(formula = number ~ effect + (Beach | Sector), data=dataF)
boundary (singular) fit: see help('isSingular')
> ranef(out5)
$Sector
   (Intercept)    BeachB    BeachC     BeachD    BeachE    BeachF       BeachG
I    -5.176435  1.227155  2.475337  3.2393238  4.382591  5.147488  4.706577242
II    1.549393 -0.368046 -0.740758 -0.9704698 -1.310487 -1.452310 -0.009141647

with conditional variances for “Sector” 

With this solution all the beaches are in all the sectors. This is clearly wrong.

The difference with the previous model is subtle ! Here the random effect is modeled as the slope.

 


> out3 <- lmer(formula = number ~ effect + (1 | Sector / Beach), data=dataF)
> ranef(out3)
$`Beach:Sector`
     (Intercept)
A:I   -1.4181304
B:I   -1.1604936
C:I   -0.1181560
D:I    0.6569242
E:I    1.6663175
F:II  -0.2143497
G:II   0.5878879

$Sector
   (Intercept)
I     -1.76777
II     1.76777

> head(predict(out3))
       1        2        3        4        5        6 
11.86449 11.82112 11.68454 11.58676 15.01205 11.97863 
In this solution, there is a sector effect and the beach effect is linked with the sector.


> out3 <- lmer(formula = number ~ effect + (1 | Beach) + (1 | Sector), data=dataF)
> ranef(out3)
$Beach
  (Intercept)
A  -1.4181304
B  -1.1604936
C  -0.1181560
D   0.6569242
E   1.6663175
F  -0.2143497
G   0.5878879

$Sector
   (Intercept)
I     -1.76777
II     1.76777

> head(predict(out3))
       1        2        3        4        5        6 
11.86449 11.82112 11.68454 11.58676 15.01205 11.97863 

In this solution, there is a sector effect and a beach effect. 

The correct ones are the third (1 | Sector / Beach) and fourth (1 | Beach) + (1 | Sector).

Prediction for mixed model

Let do a simple glm on these data:
out0 <- glm(formula = number ~ effect + Sector, data=dataF)
head(predict(out0))
ef0 <- out0$coefficients
head(ef0["(Intercept)"]+dataF$effect*ef0["effect"]+
       ef0["SectorII"]*ifelse(dataF$Sector=="SectorII", 1, 0)
)

ok. It works.

Now let do with Sector being a random factor:

library("lme4")
out1 <- lmer(formula = number ~ effect + (1 | Sector) , data=dataF)
head(predict(out1))
ef <- fixef(out1)
er <- ranef(out1)
head(ef["(Intercept)"]+dataF$effect*ef["effect"]+ er$Sector[dataF$Sector, "(Intercept)"] )

It works also and the predicts are a little bit different. when checking the fitted parameters, we see the difference of logic between both. For Sector being a fixed factor, we have:
> out0$coefficients
(Intercept)      effect    SectorII
  12.340869    0.424527    3.409973 
And for Sector begin a random factor:

> ranef(out1)
$Sector
   (Intercept)
I    -1.698848
II    1.698848

When Sector is included as a random factor, the average coefficient is 0. Random factor effect is based on dispersion. Fixed factors effect is based on change on intercept.

Now when the random factors are hierarchical effect (Beach within Sector):

out3 <- lmer(formula = number ~ effect + (Sector | Beach), data=dataF)head(predict(out3))
ef <- fixef(out3)

er <- ranef(out3)
fixed <- model.matrix(~effect, data = dataF) %*% ef
random <- rowSums(model.matrix(~Sector, data = dataF) *               er$Beach[dataF$Beach, ]) 
head(fixed[, 1] + random)
The formula for the predict is not really straightforward by it works!

model.matrix(~effect, data = dataF) creates a matrix with 2 columns (intercept) and effect and the content is taken from the dataF data.frame.
%*% is the matrix multiplication sign.
Then it takes each row and makes the multiplication of the (intercept) value (1 !) with the (intercept) coefficient, and same for effect.
Then a matrix with one column is returned. This is the fixed component of the result.

model.matrix(~Sector, data = dataF) creates a matrix with 200 rows and two columns, (Intercept) and SectorII.
The column (intercept) is equal to 1 and the column SectorII is 1 for Sector II and 0 for Sector I.
er$Beach[dataF$Beach, ] returns also a matrix with 200 rows and two columns, (Intercept) and SectorII.
It simply multiply the first column of the model.matrix with the first column of er$Beach[dataF$Beach, ] and same for the second one. Then it makes the sum of both.

Nothing really complicated.

Comparison between glmmPQL {MASS}, glmmadmb {glmmADMB} and lmer() {lme4}

Let fit the data with these three standard packages for mixed models:
> out0 <- glm(formula = number ~ effect + Beach, data=dataF)
> head(predict(out0))
       1        2        3        4        5        6 
11.82933 11.78686 11.64316 11.54739 15.07602 11.93119 


> library(MASS)
> out4 <- glmmPQL(fixed = number ~ effect, random=~ 1 | Sector / Beach, data=dataF, 
+                 family=gaussian())
itération 1
itération 2
> head(predict(out4))
     I/B      I/B      I/A      I/A      I/E      I/A 
11.86492 11.82117 11.68538 11.58673 15.01615 11.98211 


> library(lme4)
> out3 <- lmer(formula = number ~ effect + (1 | Sector / Beach), data=dataF, REML = FALSE)
> head(predict(out3))
       1        2        3        4        5        6 
11.86492 11.82117 11.68538 11.58673 15.01615 11.98211 


> out5 <- lmer(formula = number ~ effect + (1 | Sector / Beach), data=dataF, REML = TRUE)
> head(predict(out5))
       1        2        3        4        5        6 
11.86449 11.82112 11.68454 11.58676 15.01205 11.97863 


> library(glmmADMB)
> out6 <- glmmadmb(formula = number ~ effect, random = ~ (1 | Sector / Beach), 
+                  data=dataF, family="gaussian")
> head(predict(out6))
[1] 14.73357 14.68982 14.81196 14.71330 15.05719 15.10868

A discussion about the use of pseudo-likelihood for model comparisons can be found in the package {bbmle} in the User guide.

Conclusions: glmmPQL() and lmer(, REML = FALSE) give same results. It is normal because pseudo-likelihood are used only for correlative structures which are absent here.

predict.glmmadmb() uses only fixed effect and then the prediction can be very bad (see figure below).

Effect both within fixed and random factor

If an effect is both in fixed and random factor, the fixed effect takes all the information but the interaction can have information. The first case is correct, not the second one.
> out3 <- lmer(formula = number ~ effect + Sector + (1 | Sector / Beach), data=dataF, REML = FALSE)
> ranef(out3)
$`Beach:Sector`
     (Intercept)
A:I  -1.32442880
B:I  -1.06906910
C:I  -0.04121253
D:I   0.71914965
E:I   1.71556079
F:II -0.39907015
G:II  0.39907015
$Sector
   (Intercept)
I            0
II           0
> out3 <- lmer(formula = number ~ effect + Beach + (1 | Sector / Beach), data=dataF, REML = FALSE)
> ranef(out3)
$`Beach:Sector`
     (Intercept)
A:I            0
B:I            0
C:I            0
D:I            0
E:I            0
F:II           0
G:II           0

$Sector
     (Intercept)
I  -3.997234e-29
II  9.993085e-30


Commentaires

Posts les plus consultés de ce blog

Standard error from Hessian Matrix... what can be done when problem occurs

stepAIC from package MASS with AICc

Install treemix in ubuntu 20.04