p-hacking: Another way to be sure to have significant results

août 19, 2019

I discover recently the quantile regression: rather than doing a regression on your points, you are doing a regression on the quantile(probs) of your points. As you have an infinite possibility for probs, you can do an infinite number of p-values... be sure that you will find one significant !

Let try:

df <- data.frame(y=runif(n = 10*20), x=rep(1:10, 20))

g0 <- glm(formula = y ~ x, data=df)
k <- summary(g0)

pvalue <- NULL

for (probs in seq(from=0.01, to=0.99, by=0.01)) {
q <- as.data.frame(aggregate(df, by=list(df$x), FUN = function(x) quantile(x, probs=probs)))
g <- glm(formula = y ~ x, data=q)
pvalue <- c(pvalue, summary(g)$coefficients["x", "Pr(>|t|)"])
}

h <- hist(pvalue, breaks=seq(from=0, to=1, by=0.05), las=1)
polygon(x=c(h$breaks[1], h$breaks[2], h$breaks[2], h$breaks[1]),
y=c(0, 0, h$counts[1], h$counts[1]),
col="grey")

probs <- seq(from=0.01, to=0.99, by=0.01)[which.min(pvalue)]
q <- as.data.frame(aggregate(df, by=list(df$x), FUN = function(x) quantile(x, probs=probs)))
g <- glm(formula = y ~ x, data=q)

plot(x = df$x, y=df$y, bty="n", pch=19, las=1, xaxt="n")
axis(1, at=1:10)
points(x = q$x, y=q$y, col="red", pch=19)
lines(x = q$x, y=g$fitted.values, col="red", lwd=2)
legend("bottomleft", legend = c(paste("p value=", k$coefficients["x", "Pr(>|t|)"]),
paste("p value=", min(pvalue)))
, col=c("black", "red"), pch=19)

The regression with all the points (in black) was not significant... pity after all this work !

Let try with all the quantiles from 0.01 to 0.99 (step 0.01).

Great: 3 quantiles produced significant p-value ! The most significant is shown in red in the previous graph.

In conclusion, take care when you see quantile regression in a paper !

Rechercher dans ce blog

BiostatR Blog

p-hacking: Another way to be sure to have significant results

Commentaires

Enregistrer un commentaire

Posts les plus consultés de ce blog

Standard error from Hessian Matrix... what can be done when problem occurs

Install treemix in ubuntu 20.04

Multivariable analysis and correlation of iconography