Articles

Affichage des articles du juin, 2021

AICc or AIC for binomial model with splited or grouped data

As shown in a previous plot, the two ways to use binomial data (grouped by independent variable or splited for each individual) produced two different estimations of R2 or Nagelkerke R2: https://biostatsr.blogspot.com/2021/06/nagelkerke-r2-for-binomial-data-is.html Let try if the problem occurs with AIC and AICc: Grouped data: n <- c(3, 4, 5, 2, 2) data_grouped <- data.frame(x=n, y=5-n) cof_grouped <- c(10, 7, 2, 3, 4) res_grouped <- glm(cbind(x,y) ~ cof_grouped, data=data_grouped, family=binomial()) cof_grouped2 <- c(9, 7, 2, 3, 4) res_grouped2 <- glm(cbind(x,y) ~ cof_grouped2, data=data_grouped, family=binomial()) library(AICcmodavg) # or library(MuMIn) AIC(res_grouped2)-AIC(res_grouped) AICc(res_grouped2)-AICc(res_grouped) Splited data: n <- c(3, 4, 5, 2, 2) data_splited <- data.frame(x=unlist(lapply(n,                                             FUN = function(x) c(rep(1, x), rep(0, 5-x)))),                             y=1-unlist(lapply(n,                   

R2 for binomial data is sensitive on the grouping scheme

 Try to use these data to calculate Nagelkerke R2: library(fmsb) data_grouped <- data.frame(x=c(3, 4, 5), y=c(2, 1, 0)) cof_grouped <- c(10, 7, 2) res_grouped <- glm(cbind(x,y) ~ cof_grouped, data=data_grouped, family=binomial()) summary(res_grouped) or data_splited <- data.frame(x=c(1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1),                             y=1-c(1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1)) cof_splited <- rep(c(10, 7, 2), each=5) res_splited <- glm(cbind(x,y) ~ cof_splited, data=data_splited, family=binomial()) summary(res_splited) These data are exactly the same and indeed the fitted model is the same. But... > NagelkerkeR2(res_grouped)$R2 [1] 0.9538362 > NagelkerkeR2(res_splited)$R2 [1] 0.2879462 You will conclude for a very strong link and a weaker link in the second case. Note that you have the same problem with a common R2: > cor(x = data_grouped$x/(data_grouped$x+data_grouped$y), y=res_grouped$fitted.values)^2 [1] 0.964555 > cor(x =

Nagelkerke R2 with different packages gives different values

 Let try different packages to estimate Nagelkerke (1991) R^2. Nagelkerke NJD (1991) A note on a general definition of the coefficient of determination. Biometrika 78:691-192 nagelkerke() function from package companion truncates the significance digits at 6. NagelkerkeR2() from package fmsb reports a different value as compare to other functions in the first example. I don't know why. I have contacted the author of the package and he does not know also. install.packages("https://hebergement.universite-paris-saclay.fr/marcgirondot/CRAN/HelpersMG.tar.gz", repos=NULL, type="source") library(HelpersMG) library(rcompanion) library(fmsb) res <- glm(cbind(ncases,ncontrols) ~ agegp+alcgp+tobgp, data=esoph, family=binomial()) NagelkerkeR2(res)$R2 # [1] 0.9759681 nagelkerke(res)$Pseudo.R.squared.for.model.vs.null["Nagelkerke (Cragg and Uhler)", "Pseudo.R.squared"] # [1] 0.965045 NagelkerkeScaledR2(x=esoph$ncases, size = esoph$ncases+esoph$ncontrols