Articles

Showing posts from April, 2019

Inferno with binomial glm with "Wilkinson-Rogers" format and visreg package

Let's fit a binomial glm with a cbind(success, failure) response. First, I prepare the data:

    factor <- rnorm(7, 10, 2)
    dta <- data.frame(p = c(2, 7, 8, 9, 10, 18, 7),
                      n = c(20, 20, 20, 20, 20, 20, 20))

dta cannot be used directly as the response; it must be a matrix, not a data.frame:

    g <- glm(dta ~ factor,
             family = binomial(link = "logit"))
    # Error in model.frame.default(formula = dta ~ factor, drop.unused.levels = TRUE) :
    #     invalid type (list) for variable 'dta'

    g <- glm(as.matrix(dta) ~ factor,
             family = binomial(link = "logit"))
    coef(g)
    #  (Intercept)       factor
    # -0.767901152 -0.006062598

Great, it works. But if you try to plot the effects with the visreg package, it produces an error:

    library(visreg)
    visreg(g, xvar = "factor")
    # Error in dimnames(x) ...
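As a reminder of what the two-column response means, here is a minimal sketch on hypothetical toy data (the names d, x, success, failure and total are illustrative and not from the post). The glm documentation describes the first column as the number of successes and the second as the number of failures; the same model can also be written as a proportion with the number of trials as weights.

```r
# Hypothetical toy data: 7 groups of 20 trials each (not the post's data)
d <- data.frame(
  x       = c(7.4, 8.1, 9.5, 10.2, 10.8, 11.0, 12.3),
  success = c(2, 3, 6, 8, 9, 11, 14),
  total   = rep(20, 7)
)
d$failure <- d$total - d$success

# Two-column response: cbind(successes, failures), columns taken from `data`
g1 <- glm(cbind(success, failure) ~ x,
          family = binomial(link = "logit"), data = d)

# Equivalent form: observed proportion, weighted by the number of trials
g2 <- glm(success / total ~ x, weights = total,
          family = binomial(link = "logit"), data = d)

# The two fits should give the same coefficients
print(coef(g1))
print(coef(g2))
```

Keeping the counts as columns of a data.frame and naming them in the formula avoids passing the data.frame itself as the response.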

FizzBuzz question

In a recent blog post, a simple problem (called “FizzBuzz”) is solved. Some employers ask this question in data scientist job interviews. The question seeks to ascertain the applicant’s familiarity with basic programming concepts.

https://www.r-bloggers.com/fizzbuzz-in-r-and-python/

The FizzBuzz Question

I came across the FizzBuzz question in this excellent blog post on conducting data scientist interviews and have seen it referenced elsewhere on the web. The intent of the question is to probe the job applicant’s knowledge of basic programming concepts. The prompt of the question is as follows:

In pseudo-code or whatever language you would like: write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

The Solution in R

We will first solve the problem in R. We will make use of control struc...
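For illustration, the prompt can be answered with a simple loop and if/else tests. This is a minimal sketch, not the code from the linked post; the function name fizzbuzz and its argument n are hypothetical.

```r
# Return the FizzBuzz sequence for 1..n as a character vector
fizzbuzz <- function(n = 100) {
  out <- character(n)
  for (i in seq_len(n)) {
    if (i %% 15 == 0) {
      # Multiple of both three and five: test this case first
      out[i] <- "FizzBuzz"
    } else if (i %% 3 == 0) {
      out[i] <- "Fizz"
    } else if (i %% 5 == 0) {
      out[i] <- "Buzz"
    } else {
      out[i] <- as.character(i)
    }
  }
  out
}

res <- fizzbuzz(100)
writeLines(res)
```

The multiple-of-15 test has to come first; otherwise the "Fizz" branch would catch those numbers before "FizzBuzz" is ever reached.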

Test of different versions of parallel computing

In conclusion, if no progress bar is needed on a Unix system, mclapply with mc.preschedule = TRUE works very well. Don't forget to set mc.cores, because the default is only 2 cores. On Windows, you must use parLapply. However, it is more complicated because you must transfer the variables to the workers and indicate which packages must be loaded there.

    # No parallel computing; gives a reference
    st1 <- system.time(f <- lapply(1:10000, FUN = function(x) {Sys.sleep(0.001)}))

    library(parallel)

    # Parallel computing using 4 cores with fork and mc.preschedule being TRUE
    st2 <- system.time(f <- mclapply(1:10000, FUN = function(x) {Sys.sleep(0.001)}, mc.cores = 4))

    # Parallel computing using 4 cores with fork and mc.preschedule being FALSE
    st3 <- system.time(f <- mclapply(1:10000, FUN = function(x) {Sys.sleep(0.001)}, mc.cores = 4, mc.preschedule = FALSE))

    # Parallel computing using 4 cores without fork
    st4 <- system.time({cl <- parallel::makeCluster(4);...
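The extra work needed with parLapply can be sketched as follows. This is a minimal example under assumptions: the variable delay and the function slow_identity are hypothetical, only 2 workers are started, and library(stats) merely stands in for whatever package the workers would really need.

```r
library(parallel)

delay <- 0.001                       # hypothetical variable used by the workers
slow_identity <- function(x) {       # hypothetical task: wait, then return x
  Sys.sleep(delay)
  x
}

cl <- makeCluster(2)                 # socket cluster; also works on Windows

# Workers start as fresh R sessions: copy the variables they need ...
clusterExport(cl, varlist = "delay", envir = environment())
# ... and load the required packages on each of them (placeholder package)
invisible(clusterEvalQ(cl, library(stats)))

res <- parLapply(cl, 1:20, slow_identity)

stopCluster(cl)                      # always release the workers
```

With mclapply on a Unix system neither step is needed, because the forked processes inherit the variables and loaded packages of the parent session.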

Build from source in Mojave

To be able to build packages from source on macOS Mojave, after installing Xcode, run:

    open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg