BiostatR Blog

Articles

Showing posts from April 2019

Inferno with binomial glm with "Wilkinson-Rogers" format and visreg package

April 30, 2019

Let's fit a binomial glm with a cbind(success, failure) response. First, I prepare the data:

    factor <- rnorm(7, 10, 2)
    dta <- data.frame(p=c(2, 7, 8, 9, 10, 18, 7), n=c(20, 20, 20, 20, 20, 20, 20))

dta cannot be used directly as the response; it must be a matrix, not a data.frame:

    g <- glm(dta ~ factor, family = binomial(link = "logit"))
    # Error in model.frame.default(formula = dta ~ factor, drop.unused.levels = TRUE) :
    #   invalid type (list) for variable 'dta'
    g <- glm(as.matrix(dta) ~ factor, family = binomial(link = "logit"))
    coef(g)
     (Intercept)       factor
    -0.767901152 -0.006062598

Great, it works. But if you try to plot the effects with the visreg package, it produces an error:

    library(visreg)
    visreg(g, xvar = "factor")
    # Error in dimnames(x) <- dn :
    #   length of 'dimnames' [1] not equal to array extent

The error
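
As a side note on the two-column response format used in this post: glm reads the first column of a matrix response as the number of successes and the second as the number of failures. The sketch below is only an illustration under assumptions. It reuses the success counts from the post, assumes that 20 is the number of trials per row (so failures are 20 minus successes), and uses hypothetical column names (x, success, failure) with a seeded covariate. It checks that the matrix form gives the same coefficients as a proportion response weighted by the number of trials.

```r
# Illustrative data: success counts taken from the post; the covariate is
# simulated with a fixed seed so that the example is reproducible.
set.seed(1)
d <- data.frame(
  x       = rnorm(7, 10, 2),
  success = c(2, 7, 8, 9, 10, 18, 7)
)
d$failure <- 20 - d$success  # assumption: 20 trials per row

# Two-column response: column 1 = successes, column 2 = failures
g_counts <- glm(cbind(success, failure) ~ x, data = d,
                family = binomial(link = "logit"))

# Same model written as a proportion, with the number of trials as weights
g_prop <- glm(success / (success + failure) ~ x, data = d,
              weights = success + failure,
              family = binomial(link = "logit"))

# Both parameterisations should give the same coefficients
same_fit <- isTRUE(all.equal(coef(g_counts), coef(g_prop), tolerance = 1e-6))
print(same_fit)
```

Passing the columns through the data argument, rather than a free-standing matrix, also keeps the variables in one place for any function that later needs to rebuild the model frame.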

FizzBuzz question

April 22, 2019

In a recent blog post, a simple problem (called “FizzBuzz”) is solved. This problem is asked by some employers in data scientist job interviews. The question seeks to ascertain the applicant’s familiarity with basic programming concepts.

https://www.r-bloggers.com/fizzbuzz-in-r-and-python/

The FizzBuzz Question

I came across the FizzBuzz question in this excellent blog post on conducting data scientist interviews and have seen it referenced elsewhere on the web. The intent of the question is to probe the job applicant’s knowledge of basic programming concepts. The prompt of the question is as follows:

In pseudo-code or whatever language you would like: write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

The Solution in R

We will first solve the problem in R. We will make use of control structure
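
For reference, the prompt quoted above can be answered with a plain loop and if/else branches. This is only a minimal sketch, not necessarily the solution given in the linked post; the function name fizzbuzz and its argument n are hypothetical.

```r
# Return the FizzBuzz sequence from 1 to n as a character vector
fizzbuzz <- function(n = 100) {
  out <- character(n)
  for (i in seq_len(n)) {
    if (i %% 15 == 0) {
      # Multiple of both three and five
      out[i] <- "FizzBuzz"
    } else if (i %% 3 == 0) {
      out[i] <- "Fizz"
    } else if (i %% 5 == 0) {
      out[i] <- "Buzz"
    } else {
      out[i] <- as.character(i)
    }
  }
  out
}

result <- fizzbuzz(100)
cat(result, sep = "\n")
```

The test for 15 comes first because a multiple of 15 would otherwise be caught by the test for 3.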

Test of different versions of parallel computing

April 14, 2019

In conclusion, if no progress bar is needed on a Unix system, mclapply with mc.preschedule = TRUE is very good. Don't forget to set mc.cores, because the default is only 2 cores. If you work on Windows, you must use parLapply. However, it is more complicated, as you must export the variables to the workers and indicate which packages must be loaded on them.

    # No parallel computing; gives a reference
    st1 <- system.time(f <- lapply(1:10000, FUN=function(x) {Sys.sleep(0.001)}))

    library(parallel)

    # Parallel computing using 4 cores with fork and mc.preschedule being TRUE
    st2 <- system.time(f <- mclapply(1:10000, FUN=function(x) {Sys.sleep(0.001)}, mc.cores = 4))

    # Parallel computing using 4 cores with fork and mc.preschedule being FALSE
    st3 <- system.time(f <- mclapply(1:10000, FUN=function(x) {Sys.sleep(0.001)}, mc.cores = 4, mc.preschedule = FALSE))

    # Parallel computing using 4 cores without fork
    st4 <- system.time({cl <- parallel::makeCluster(4); f <- parLapply
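
The extra steps that parLapply requires (exporting variables and loading packages on the workers) can be sketched as follows. This is a minimal illustration, not the benchmark code from the post: the variable offset is hypothetical, the stats package is only a stand-in for whatever packages the workers really need, and 2 workers are used to keep the example light.

```r
library(parallel)

offset <- 10  # hypothetical variable that the workers need

# Socket cluster: no fork, so it also works on Windows
cl <- makeCluster(2)

# Copy the variable to every worker
clusterExport(cl, varlist = "offset", envir = environment())

# Load the required packages on every worker (stats is only a stand-in)
invisible(clusterEvalQ(cl, library(stats)))

# Each worker can now see 'offset'
res <- parLapply(cl, 1:8, function(x) x + offset)

# Always release the workers when done
stopCluster(cl)

print(unlist(res))
```

With mclapply none of this is needed, because forked workers inherit the variables and loaded packages of the parent session.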

Build from source in Mojave

April 02, 2019

To be able to build packages from source on Mojave, run the following after installing Xcode:

    open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg