AICc or AIC for a binomial model with split or grouped data

As shown in a previous post, the two ways to encode binomial data (grouped by independent variable, or split into one row per individual) produce two different estimates of R2 or Nagelkerke R2:

https://biostatsr.blogspot.com/2021/06/nagelkerke-r2-for-binomial-data-is.html

Let's check whether the same problem occurs with AIC and AICc:

Grouped data:

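# Number of successes out of 5 trials in each of the 5 groups
n <- c(3, 4, 5, 2, 2)
# x = successes, y = failures (one row per group)
data_grouped <- data.frame(x=n, y=5-n)
# First candidate covariate (one value per group)
cof_grouped <- c(10, 7, 2, 3, 4)
res_grouped <- glm(cbind(x,y) ~ cof_grouped, data=data_grouped, family=binomial())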

cof_grouped2 <- c(9, 7, 2, 3, 4)
res_grouped2 <- glm(cbind(x,y) ~ cof_grouped2, data=data_grouped, family=binomial())

library(AICcmodavg) # or library(MuMIn)

AIC(res_grouped2)-AIC(res_grouped)
AICc(res_grouped2)-AICc(res_grouped)

Split data:

n <- c(3, 4, 5, 2, 2)
# One row per individual: x = 1 for a success, 0 for a failure; y = 1 - x
data_splited <- data.frame(x=unlist(lapply(n, 
                                           FUN = function(x) c(rep(1, x), rep(0, 5-x)))), 
                           y=1-unlist(lapply(n, 
                                             FUN = function(x) c(rep(1, x), rep(0, 5-x)))))

cof_splited <- rep(cof_grouped, each=5)
res_splited <- glm(cbind(x,y) ~ cof_splited, data=data_splited, family=binomial())

cof_splited2 <- rep(cof_grouped2, each=5)
res_splited2 <- glm(cbind(x,y) ~ cof_splited2, data=data_splited, family=binomial())

library(AICcmodavg) # or library(MuMIn)

AIC(res_splited2)-AIC(res_splited)
AICc(res_splited2)-AICc(res_splited)

######## Results

> AIC(res_grouped2)-AIC(res_grouped)
[1] 0.003632532
> AICc(res_grouped2)-AICc(res_grouped)
[1] 0.003632532

> AIC(res_splited2)-AIC(res_splited)
[1] 0.003632532
> AICc(res_splited2)-AICc(res_splited)
[1] 0.003632532

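∆AIC and ∆AICc are exactly the same with both data layouts. This is expected: the two models being compared have the same number of parameters k, so the small-sample correction of AICc is identical for both and cancels out in the difference.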
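
For reference, the usual definition of AICc makes this cancellation explicit. This is only a sketch: it assumes the standard correction term, and that n is the number of rows that the AICc function counts as observations (5 for the grouped data, 25 for the split data). The symbols k_1 and k_2 (number of parameters of each model) are introduced here for illustration.

```latex
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}
\qquad\Longrightarrow\qquad
\Delta\mathrm{AICc} = \Delta\mathrm{AIC}
  + \left[\frac{2k_1(k_1+1)}{n-k_1-1} - \frac{2k_2(k_2+1)}{n-k_2-1}\right]
```

When k_1 = k_2 the bracket is zero whatever the value of n, so ∆AICc = ∆AIC for both layouts.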

If you compare models with different numbers of parameters, this is no longer true:

Grouped data:

n <- c(3, 4, 5, 2, 2)

data_grouped <- data.frame(x=n, y=5-n)

cof_grouped <- c(10, 7, 2, 3, 4)
cof_grouped2 <- c(9, 7, 2, 3, 4)

res_grouped <- glm(cbind(x,y) ~ cof_grouped  + cof_grouped2, 
                   data=data_grouped, family=binomial())

res_grouped2 <- glm(cbind(x,y) ~ cof_grouped, 
                    data=data_grouped, family=binomial())

library(AICcmodavg) # or library(MuMIn)

AIC(res_grouped)-AIC(res_grouped2)
AICc(res_grouped)-AICc(res_grouped2)

Split data:

n <- c(3, 4, 5, 2, 2)

data_splited <- data.frame(x=unlist(lapply(n, 
                                           FUN = function(x) c(rep(1, x), rep(0, 5-x)))), 
                           y=1-unlist(lapply(n, 
                                             FUN = function(x) c(rep(1, x), rep(0, 5-x)))))

cof_splited <- rep(cof_grouped, each=5)
cof_splited2 <- rep(cof_grouped2, each=5)

res_splited <- glm(cbind(x,y) ~ cof_splited  + cof_splited2, 
                   data=data_splited, family=binomial())

res_splited2 <- glm(cbind(x,y) ~ cof_splited, 
                    data=data_splited, family=binomial())

library(AICcmodavg) # or library(MuMIn)

AIC(res_splited)-AIC(res_splited2)
AICc(res_splited)-AICc(res_splited2)

######## Results

> AIC(res_grouped)-AIC(res_grouped2)
[1] 1.98618
> AICc(res_grouped)-AICc(res_grouped2)
[1] 19.98618

> AIC(res_splited)-AIC(res_splited2)
[1] 1.98618
> AICc(res_splited)-AICc(res_splited2)
[1] 2.583583

So, in conclusion, take care. When you split the data and use AICc, the sample size is the number of individuals (25 here), whereas when you group the data, it is the number of levels of the independent variable (5 here). This can change your conclusion a lot, even though the data are fundamentally the same.

Here, with ∆AICc equal to 19.99, you would conclude that the two models differ strongly in support, whereas with ∆AICc equal to 2.58, you would conclude that a difference exists but is not so large.
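
The gap between these two values can be traced to the AICc correction term alone. The sketch below assumes the usual correction 2k(k+1)/(n-k-1), with k = 3 and k = 2 parameters for the two models (intercept plus slopes) and n equal to the number of rows (5 grouped, 25 split). The helper aicc_correction() is a hypothetical name written for this illustration, not a function from AICcmodavg or MuMIn.

```r
# Small-sample correction added to AIC to obtain AICc (assumed standard form):
# 2k(k+1) / (n - k - 1), with k parameters and n observations.
# aicc_correction() is a hypothetical helper for this illustration.
aicc_correction <- function(k, n) 2 * k * (k + 1) / (n - k - 1)

# Delta AIC reported in the results above (identical for both layouts)
delta_aic <- 1.98618

# Grouped data: n = 5 rows; models with k = 3 and k = 2 parameters
delta_grouped <- delta_aic + aicc_correction(3, 5) - aicc_correction(2, 5)

# Split data: n = 25 rows; same two models
delta_split <- delta_aic + aicc_correction(3, 25) - aicc_correction(2, 25)

print(c(grouped = delta_grouped, split = delta_split))
```

Up to rounding, the two numbers should match the ∆AICc values reported in the results, which suggests that the whole discrepancy comes from the value of n used in the correction, not from the likelihood.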

I am not sure which way is the correct one... or even whether there is a correct way!


