BiostatR Blog

Articles

Affichage des articles du mai, 2017

Draw a trend arrow with confidence interval

mai 30, 2017

Let draw an arrow with its width being dependent on the confidence interval: drawTrend <- function(x, y, show.circle=TRUE, show.0=TRUE) { model <- lm(y ~ x) direction.se <- coef(summary(model))["x", "Std. Error"] direction <- model$coefficients["x"] library("car") k <- ellipse(center=c(0.5, 0), radius=c(0.5, 0.5), shape=matrix(data = c(1, 0, 0, 1), nrow=2), segments=1001, draw = FALSE) par(mar=c(0, 0, 0, 0)) plot(x = c(0, 0), y=c(0, 0), type="n", bty="n", xlim=c(-0.1,1.1), ylim=c(-0.6, 0.6), las=1, asp=1, xaxt="n", yaxt="n", xlab="", ylab = "") if (show.circle) polygon(k[, 1], k[, 2], lwd=1) maxy <- which.max(k[, 2]) maxx <- which.max(k[, 1]) miny <- which.min(k[, 2]) minx <- which.min(k[, 1]) if (show.circle) { s <- seq(from=maxy, to=maxx, leng...

Two solutions for progress bar... not equivalent

mai 12, 2017

To make a progress bar, you have two main solutions: In library utils: txtProgressBar() In library progress: progress_bar$new() The second one makes a progress bar with automatic "estimated time to arrival". This is very useful. However, it does that functionality at a cost. Let do 3 examples: without progress bar, with a txtProgressBar() and with a progress_bar$new() (see below for the codes). The code is ran in Rstudio, R GUI and in terminal under MacOSX: Rstudio: > A[3]/R[3] elapsed 8.235853 > B[3]/R[3] elapsed 1.835827 R GUI: > A[3]/R[3] elapsed 7.895495 > B[3]/R[3] elapsed 1.181143 R in terminal: > A[3]/R[3] elapsed 10.74719 > B[3]/R[3] elapsed 2.079948 In conclusion, the progress_bar$new() is between 5 to 8 time slower than the txtProgressBar(). It is nicer but it has a cost ! R <- system.time({ ...

RStudio or not RStudio... that is the question!

mai 11, 2017

RStudio is a very useful environment for development in R. However, running your script in RStudio can take much much more time than running the same in R. Let do an example with a progress bar. Surprisingly R GUI is the fastest solution, second is RStudio and slowest is R in terminal. library("progress") In RStudio: > system.time({ + Total <- 50000 + pb <- progress_bar$new( + format = " completion [:bar] :percent eta: :eta", + total = Total, clear = FALSE) + for (annee in 1:Total) { + pb$tick() + Sys.sleep(0.0001) + } + }) completion [========================================================================] 100% eta: 0s utilisateur système écoulé 49.652 1.222 117.707 In R GU...

Standard error from Hessian Matrix... what can be done when problem occurs

mai 08, 2017

An Hessian matrix is a square matrix of partial second order derivative. From the square-root of the inverse of the diagonal, it is possible to estimate the standard error of parameters. Let take an example: val=rnorm(100, mean=20, sd=5) # Return -ln L of values in val in Gaussian distribution with mean and sd in par fitnorm<-function(par, val) { -sum(dnorm(val, par["mean"], par["sd"], log = TRUE)) } # Initial values for search p<-c(mean=20, sd=5) # fit the model result <- optim(par=p, fn=fitnorm, val=val, method="BFGS", hessian=TRUE) # Inverse the hessian matrix to get SE for each parameters mathessian <- result$hessian inversemathessian <- solve(mathessian) res <- sqrt(diag(inversemathessian)) # results data.frame(Mean=result$par, SE=res) library(MASS) (r<-fitdistr(val, "normal")) It works well. However, the inverse of the matrix cannot be calculated if the second order derivative for some paramete...

Integer random number: comparison of two strategies

mai 07, 2017

If you need an integer ransom number, here are two strategies. First get it from uniform distribution and take the integer part, second generate all integer possible values and take one randomly. The first solution is two times faster. As seen in the last example, the manipulation of a vector of 50000 elements is not efficient at all. However, if you know how many random numbers you will need, it is better to estimate first a long vector of random numbers and after get one per one. 1/ Using integer of uniform random number one by one: 1.994 > system.time( + for (i in 1:1000000) + a <- floor(runif(n = 1, min=1, max=50001)) + ) utilisateur système écoulé 1.853 0.116 1.994 2/ Long vector of integer of uniform random number: 0.196 > system.time({+ ru <- floor(runif(n = 1000000, min=1, max=50001)) + for (i in 1:1000000...

Retrieve data from data.frame

mai 02, 2017

The command subset is very efficient to retrieve data from data.frame. For example, let retrieve the size for a specific stage in this data frame: > library(embryogrowth) > s <- TSP.list[[1]] > s stages metric 1 8 NA 2 9 NA 3 10 NA 4 11 NA 5 12 NA 6 13 NA 7 14 0.016 8 15 0.023 9 16 0.035 10 17 0.044 11 18 0.057 12 19 0.072 13 20 0.090 14 21 0.140 15 22 0.240 16 23 0.340 17 24 0.550 18 25 0.750 19 26 1.000 For example: > subset(x = s, subset=stages==20, sele...