R2 for binomial data is sensitive to the grouping scheme

Try using these data to calculate the Nagelkerke R2, first with the counts grouped by covariate value:

library(fmsb)   # provides NagelkerkeR2()
# x = number of successes, y = number of failures (5 trials per covariate value)
data_grouped <- data.frame(x=c(3, 4, 5), y=c(2, 1, 0))
cof_grouped <- c(10, 7, 2)
res_grouped <- glm(cbind(x, y) ~ cof_grouped, data=data_grouped, family=binomial())
summary(res_grouped)
or with the same data split into individual binary observations:

# each group of 5 trials is expanded into individual 0/1 observations
data_splited <- data.frame(x=c(1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1), 
                           y=1-c(1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1))
cof_splited <- rep(c(10, 7, 2), each=5)
res_splited <- glm(cbind(x, y) ~ cof_splited, data=data_splited, family=binomial())
summary(res_splited)

These data are exactly the same, and indeed the fitted model is the same: both calls return identical coefficients and standard errors.
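
You can check this directly (a quick comparison, assuming both models above have been fitted in the same session):

# same data, two parameterizations: the estimates agree
coef(summary(res_grouped))
coef(summary(res_splited))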

But...

> NagelkerkeR2(res_grouped)$R2

[1] 0.9538362

> NagelkerkeR2(res_splited)$R2

[1] 0.2879462

From the first value you would conclude that the link is very strong, while from the second you would conclude it is much weaker, even though both models are identical.
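
The discrepancy comes from the formula itself. The Nagelkerke R2 is the Cox & Snell R2 rescaled by its maximum possible value, and both terms depend on n, the number of rows seen by glm() (3 in the grouped fit, 15 in the split one), and on the null and residual deviances, which also change with the grouping. Here is a minimal sketch of this deviance-based computation (my reading of what fmsb::NagelkerkeR2() does, not an authoritative reimplementation):

# Nagelkerke R2 computed by hand from the deviances of a fitted glm,
# assuming the usual deviance-based definition
nagelkerke_by_hand <- function(fit) {
  n      <- nrow(fit$model)                                    # 3 rows (grouped) vs 15 rows (split)
  r2_cs  <- 1 - exp((fit$deviance - fit$null.deviance) / n)    # Cox & Snell R2
  r2_max <- 1 - exp(-fit$null.deviance / n)                    # its maximum possible value
  r2_cs / r2_max
}
nagelkerke_by_hand(res_grouped)   # should be close to 0.95, as above
nagelkerke_by_hand(res_splited)   # should be close to 0.29, as above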

Note that you have the same problem with an ordinary R2, here computed as the squared correlation between the observed proportions and the fitted values:

> cor(x = data_grouped$x/(data_grouped$x+data_grouped$y), y=res_grouped$fitted.values)^2

[1] 0.964555

> cor(x = data_splited$x/(data_splited$x+data_splited$y), y=res_splited$fitted.values)^2

[1] 0.1607592


This problem is also discussed here:

https://thestatsgeek.com/2014/02/08/r-squared-in-logistic-regression/

The author's conclusion is:

The low R squared for the individual binary data model reflects the fact that the covariate x does not enable accurate prediction of the individual binary outcomes. In contrast, x can give a good prediction for the number of successes in a large group of individuals.

See also:

Mittlböck M, Heinzl H (2001) A note on R2 measures for Poisson and logistic regression models when both models are applicable. Journal of Clinical Epidemiology 54: 99-103. doi: 10.1016/S0895-4356(00)00292-4

