R2 for binomial data is sensitive to the grouping scheme
Use the following grouped data to calculate Nagelkerke's R2:
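library(fmsb)  # provides NagelkerkeR2()
# x = number of successes, y = number of failures for each covariate value
data_grouped <- data.frame(x=c(3, 4, 5), y=c(2, 1, 0))
cof_grouped <- c(10, 7, 2)
res_grouped <- glm(cbind(x,y) ~ cof_grouped, data=data_grouped, family=binomial())
summary(res_grouped)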
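The split version of the data used in the comparisons (data_splited, cof_splited, res_splited) can be rebuilt from the grouped data. This is only a minimal sketch, assuming each trial becomes its own row with x = 1, y = 0 for a success and x = 0, y = 1 for a failure; the exact construction is an assumption.

```r
# Grouped data: x = successes, y = failures for each covariate value
data_grouped <- data.frame(x = c(3, 4, 5), y = c(2, 1, 0))
cof_grouped <- c(10, 7, 2)

# Assumption: one row per trial, successes first, then failures
n_success <- data_grouped$x
n_failure <- data_grouped$y
data_splited <- data.frame(
  x = c(rep(1, sum(n_success)), rep(0, sum(n_failure))),
  y = c(rep(0, sum(n_success)), rep(1, sum(n_failure)))
)
# Repeat each covariate value once per success, then once per failure
cof_splited <- c(rep(cof_grouped, times = n_success),
                 rep(cof_grouped, times = n_failure))

res_grouped <- glm(cbind(x, y) ~ cof_grouped, data = data_grouped,
                   family = binomial())
res_splited <- glm(cbind(x, y) ~ cof_splited, data = data_splited,
                   family = binomial())

# Compare the estimated coefficients of the two fits side by side
print(rbind(grouped = unname(coef(res_grouped)),
            splited = unname(coef(res_splited))))
```

Both fits maximise the same likelihood up to a constant, so the two rows of coefficients should agree to within numerical tolerance.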
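Whether the data are grouped (res_grouped) or split into individual binary observations (res_splited), they are exactly the same data, and indeed the fitted model is the same.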
But...
> NagelkerkeR2(res_grouped)$R2
[1] 0.9538362
> NagelkerkeR2(res_splited)$R2
[1] 0.2879462
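You would conclude that there is a very strong link in the first case and a much weaker one in the second.
Note that the same problem occurs with an ordinary R2, computed as the squared correlation between the observed proportions and the fitted values: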
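The gap can be traced to the formula itself. The sketch below assumes the usual deviance-based form of Nagelkerke's R2, with n taken as the number of rows in the model frame. The helper nagelkerke_by_hand is hypothetical, and the split data are rebuilt under the assumption of one row per trial. Both fits share the same likelihood-ratio statistic, but n and the null deviance change with the grouping.

```r
data_grouped <- data.frame(x = c(3, 4, 5), y = c(2, 1, 0))
cof_grouped <- c(10, 7, 2)
res_grouped <- glm(cbind(x, y) ~ cof_grouped, data = data_grouped,
                   family = binomial())

# Assumption: the split data hold one row per trial (1/0 = success, 0/1 = failure)
data_splited <- data.frame(
  x = rep(c(1, 0), times = c(sum(data_grouped$x), sum(data_grouped$y)))
)
data_splited$y <- 1 - data_splited$x
cof_splited <- c(rep(cof_grouped, times = data_grouped$x),
                 rep(cof_grouped, times = data_grouped$y))
res_splited <- glm(cbind(x, y) ~ cof_splited, data = data_splited,
                   family = binomial())

# Hypothetical helper: deviance-based Nagelkerke R2 and its ingredients
nagelkerke_by_hand <- function(fit) {
  n <- nrow(fit$model)                    # rows, not total number of trials
  lr <- fit$null.deviance - fit$deviance  # likelihood-ratio statistic
  c(n = n,
    null_dev = fit$null.deviance,
    lr = lr,
    R2 = (1 - exp(-lr / n)) / (1 - exp(-fit$null.deviance / n)))
}

parts <- rbind(grouped = nagelkerke_by_hand(res_grouped),
               splited = nagelkerke_by_hand(res_splited))
print(parts)
```

The lr column should be identical in both rows, while n and null_dev differ. Under these assumptions the R2 column should come out close to the two values printed by NagelkerkeR2 above.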
> cor(x = data_grouped$x/(data_grouped$x+data_grouped$y), y=res_grouped$fitted.values)^2
[1] 0.964555
> cor(x = data_splited$x/(data_splited$x+data_splited$y), y=res_splited$fitted.values)^2
[1] 0.1607592
This problem is also discussed here:
https://thestatsgeek.com/2014/02/08/r-squared-in-logistic-regression/
The conclusion of the author is:
The low R squared for the individual binary data model reflects the fact that the covariate x does not enable accurate prediction of the individual binary outcomes. In contrast, x can give a good prediction for the number of successes in a large group of individuals.
See also:
Mittlböck M, Heinzl H (2001) A note on R2 measures for Poisson and logistic regression models when both models are applicable. Journal of Clinical Epidemiology 54: 99-103 doi 10.1016/S0895-4356(00)00292-4