Articles

Showing posts from August, 2019

Deviance test for binomial distribution

# Let A and B be observations from a binomial distribution:
A <- c(10, 10, 9, 7, 3, 2, 3)
B <- c(2, 5, 6, 6, 10, 15, 20)
# And let pred be the predicted proportions:
pred <- c(0.9, 0.85, 0.75, 0.4, 0.3, 0.2, 0.1)
# For example, pred could be the result of a glm or another model.
# Number of parameters estimated from the observations to obtain pred:
parameter <- 2
# Then the degrees of freedom are:
df <- length(pred) - parameter
LnL <- sum(dbinom(x=A, size=A+B, prob=pred, log=TRUE))
LnLSat <- sum(dbinom(x=A, size=A+B, prob=A/(A+B), log=TRUE))
deviance <- -2*(LnL - LnLSat)
p <- 1 - pchisq(deviance, df=df)
# The larger p is, the more plausible it is that the observations
# could have been generated by the model used to calculate pred.
# Do not use p < 0.05 as a strict criterion; p is only a proxy.
plot(x=pred, y=A/(A+B), bty="n", las=1, xlim=c(0, 1), ylim=c(0, 1))
text(x=0.8, y=0.
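As a cross-check (my own addition, not part of the original post): when pred comes from a binomial glm, the same deviance and goodness-of-fit p-value can be read directly from the fitted model. The covariate x below is an assumption, only there to have something to regress on.

# Hypothetical covariate, for illustration only:
x <- 1:7
A <- c(10, 10, 9, 7, 3, 2, 3)
B <- c(2, 5, 6, 6, 10, 15, 20)
m <- glm(cbind(A, B) ~ x, family = binomial)
# Residual deviance is -2*(LnL - LnLSat) for the fitted proportions:
deviance(m)
# Goodness-of-fit test against the saturated model (df = 7 - 2 = 5):
1 - pchisq(deviance(m), df = df.residual(m))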

p-hacking: Another way to be sure to get significant results

I recently discovered quantile regression: rather than doing a regression on your points, you do a regression on a quantile (probs) of your points. As probs can take infinitely many values, you can compute an infinite number of p-values... be sure that you will find a significant one! Let's try:

df <- data.frame(y=runif(n = 10*20), x=rep(1:10, 20))
g0 <- glm(formula = y ~ x, data=df)
k <- summary(g0)
pvalue <- NULL
for (probs in seq(from=0.01, to=0.99, by=0.01)) {
  q <- as.data.frame(aggregate(df, by=list(df$x),
                               FUN = function(x) quantile(x, probs=probs)))
  g <- glm(formula = y ~ x, data=q)
  pvalue <- c(pvalue, summary(g)$coefficients["x", "Pr(>|t|)"])
}
h <- hist(pvalue, breaks=seq(from=0, to=1, by=0.05), las=1)
polygon(x=c(h$breaks[1], h$breaks[2], h$breaks[2], h$breaks[1]),
        y=c(0, 0, h$counts[1], h$counts[1]),
        col="grey")
probs <- seq(from=0.01, to=0.99, by=0.01)[which.min(pvalue)]
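A small addition of my own: if you really do scan many quantiles like this, the raw minimum p-value means very little, and a multiplicity correction should at least be applied before claiming significance.

# The "best" raw p-value over the 99 scanned quantiles:
min(pvalue)
# The same after Holm correction for the 99 tests:
min(p.adjust(pvalue, method = "holm"))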

Another R inferno: if () {} else {} + x

if (TRUE) { 0 } else { 2 } + 3 will return 0 and not 3, because R parses it as if (TRUE) { 0 } else ({ 2 } + 3): the + 3 is attached to the else branch. This is clearly counter-intuitive. As a general rule, I propose never to use the result of an if statement directly in a computation. Note that ifelse() gives the more usual result: ifelse(TRUE, 0, 2) + 3 will return 3.
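A minimal sketch of my own, showing how to get the intuitive result if you do want to compute with an if expression:

# Without parentheses, + 3 belongs to the else branch:
if (TRUE)  { 0 } else { 2 } + 3   # 0
if (FALSE) { 0 } else { 2 } + 3   # 5, not 2

# Parenthesise the whole if expression, or assign it first:
(if (TRUE) { 0 } else { 2 }) + 3  # 3
r <- if (TRUE) { 0 } else { 2 }
r + 3                             # 3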

sample(x, length) or x[runif(length, 1, length(x)+1)]: which is the fastest?

library(microbenchmark)
# x is assumed to be a long vector defined earlier in the session
microbenchmark(
  sample(x = x, size = 1000L, replace = TRUE),
  sample(x = x, size = 1000, replace = TRUE),
  x[floor(runif(1000L, 1L, length(x) + 1L))],
  x[floor(runif(1000, 1, length(x) + 1))],
  x[runif(1000L, 1L, length(x) + 1L)],
  x[runif(1000, 1, length(x) + 1)],
  times = 10000L
)

Unit: microseconds
                                        expr    min      lq     mean  median      uq
 sample(x = x, size = 1000L, replace = TRUE) 42.202 46.0750 51.70049 50.4445 53.1295
  sample(x = x, size = 1000, replace = TRUE) 41.433 46.0935 51.52335 50.5570 53.0650
  x[floor(runif(1000L, 1L, length(x) + 1L))] 40.027 42.2450 49.22916 49.5545 52.0355
     x[floor(runif(1000, 1, length(x) + 1))] 40.112 42.1865 59.03723 49.5050 51.9370
         x[runif(1000L, 1L, length(x) + 1L)] 37.124 39.2570 46.15197 46.4555 48.9570
            x[runif(1000, 1, length(x) + 1)] 37.093 39.1935 54.89901 46.3630 48.7875

In conclusion, the fastest is x[runif(1000, 1, length(x) + 1)]. But the
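A side note of my own, not in the original post: the runif() versions work without floor() because R truncates non-integer subscripts toward zero.

x <- c(10, 20, 30, 40, 50)
x[2.7]                        # 20: the fractional subscript is truncated to 2
i <- runif(3, 1, length(x) + 1)
identical(x[i], x[floor(i)])  # TRUE: indexing already does the truncation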

Integer, L, double, numeric

In R, integers can be set up using as.integer(x) or the suffix L, as in 1000L (L stands for "long", because the 32-bit representation was longer than the old 8- or 16-bit integers!). f <- integer(length) creates a vector of size length filled with 0:

> f <- integer(1)
> f
[1] 0
> is.integer(f)
[1] TRUE
> identical(f, 0L)
[1] TRUE

double() and numeric() are the same number representation; real(), in R < 3.0.0, was also the same but is deprecated.

library(microbenchmark)
microbenchmark(
  1L:2000L,
  1:1000,
  as.integer(1):as.integer(2000),
  times = 1000000L
)

Unit: nanoseconds
                           expr min  lq     mean median  uq      max
                       1L:2000L 129 144 181.2866    147 153    64915
                         1:1000 126 147 203.7078    149 154 20499610
 as.integer(1):as.integer(2000) 299 329 453.9530    335 350 32912817
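A minimal illustration of my own of the practical difference between the two representations:

typeof(1)                  # "double": a plain 1 is stored as a double
typeof(1L)                 # "integer"
identical(1, 1L)           # FALSE: the types differ
1 == 1L                    # TRUE: the values compare equal after coercion
object.size(integer(1e6))  # about 4 MB: 4 bytes per integer
object.size(numeric(1e6))  # about 8 MB: 8 bytes per double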

Bug in knitr / Rmarkdown when producing a Word document with graphics in pdf

Let's try this minimal Rmarkdown file:

---
title: "cex in Rmarkdown"
output: word_document
---

```{r}
knitr::opts_chunk$set(dev='pdf')
```

```{r}
plot((0:160)/4, 0:160, type="n")
text(x=20, y=70, labels = expression(alpha), cex=1e-7)
```

When knitted from RStudio (with R 3.6.1 on macOS and knitr 1.24), it produces an error with the message "metric information not available for this family" [translated from French because of my system configuration]. However, text(x=20, y=70, labels = expression(alpha), cex=0) and text(x=20, y=70, labels = expression(alpha), cex=0.1) work. Also, text(x=20, y=70, labels = "a", cex=1e-7) works, but text(x=20, y=70, labels = expression("K"["0"]), cex=1e-7) also produces an error. The error does not occur when dev='png' is used, but it does occur for dev='postscript'. The only solution that I found is something like this: cexindex <- The value that y
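The post is cut off here. As a sketch of my own (not necessarily the author's fix), one way around the error is to never let cex fall below a value the pdf device can handle, for example with a small hypothetical helper:

# Hypothetical helper, not from the original post: clamp cex to a minimum
# so the pdf/postscript devices can still compute the string metrics.
safe_cex <- function(cex, minimum = 0.1) max(cex, minimum)

plot((0:160)/4, 0:160, type = "n")
text(x = 20, y = 70, labels = expression(alpha), cex = safe_cex(1e-7))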

Xcode updates

After each update of Xcode, do not forget to open it and install the command line tools. Then, in a terminal, do:

sudo xcodebuild -license

This command is not necessary if you open Xcode and install the command line tools:

sudo xcode-select --install

For older versions of Xcode, install the SDK headers for version 10.14 with:

open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg

They are installed with the 10.15 version. Check using:

cd /Library/Developer/CommandLineTools/SDKs
(base) marcgirondot@MacBook-Air-de-Marc SDKs % ls -al
total 0
drwxr-xr-x 5 root wheel 160 11 déc  2019 .
drwxr-xr-x 5 root wheel 160 15 nov  2019 ..
lrwxr-xr-x 1 root wheel  15 11 déc  2019 MacOSX.sdk -> MacOSX10.15.sdk
drwxr-xr-x 7 root wheel 224 11 déc  2019 MacOSX10.14.sdk
drwxr-xr-x 8 root wheel 256 11 déc  2019 MacOSX10.15.sdk