Articles

Affichage des articles du mai, 2017

Draw a trend arrow with confidence interval

Image
Let draw an arrow with its width being dependent on the confidence interval: drawTrend <- function(x, y, show.circle=TRUE, show.0=TRUE) {   model <- lm(y ~ x)   direction.se <- coef(summary(model))["x", "Std. Error"]   direction <- model$coefficients["x"]    library("car") k <- ellipse(center=c(0.5, 0), radius=c(0.5, 0.5),               shape=matrix(data = c(1, 0, 0, 1), nrow=2),               segments=1001, draw = FALSE) par(mar=c(0, 0, 0, 0)) plot(x = c(0, 0), y=c(0, 0), type="n", bty="n", xlim=c(-0.1,1.1),       ylim=c(-0.6, 0.6), las=1, asp=1,      xaxt="n", yaxt="n", xlab="", ylab = "") if (show.circle) polygon(k[, 1], k[, 2], lwd=1) maxy <- which.max(k[, 2]) maxx <- which.max(k[, 1]) miny <- which.min(k[, 2]) minx <- which.min(k[, 1]) if (show.circle) { s <- seq(from=maxy, to=maxx, length.out = 11) for (pos in 0:10) {   s_

Two solutions for progress bar... not equivalent

To make a progress bar, you have two main solutions: In library utils: txtProgressBar() In library progress: progress_bar$new() The second one makes a progress bar with automatic "estimated time to arrival". This is very useful. However, it does that functionality at a cost. Let do 3 examples: without progress bar, with a txtProgressBar() and with a progress_bar$new() (see below for the codes). The code is ran in Rstudio, R GUI and in terminal under MacOSX: Rstudio: > A[3]/R[3]  elapsed  8.235853  > B[3]/R[3]  elapsed  1.835827  R GUI: > A[3]/R[3]  elapsed  7.895495  > B[3]/R[3]  elapsed  1.181143  R in terminal: > A[3]/R[3]  elapsed  10.74719  > B[3]/R[3]  elapsed  2.079948  In conclusion, the progress_bar$new() is between 5 to 8 time slower than the txtProgressBar(). It is nicer but it has a cost ! R <- system.time({   Total <- 50000   for (annee in 1:Total) {     Sys.sleep(0.0001)   } }) library("

RStudio or not RStudio... that is the question!

RStudio is a very useful environment for development in R. However, running your script in RStudio can take much much more time than running the same in R. Let do an example with a progress bar. Surprisingly R GUI is the fastest solution, second is RStudio and slowest is R in terminal. library("progress") In RStudio: > system.time({ +     Total <- 50000 +     pb <- progress_bar$new( +         format = "  completion [:bar] :percent eta: :eta", +         total = Total, clear = FALSE) +     for (annee in 1:Total) { +         pb$tick() +         Sys.sleep(0.0001) +     } + })   completion [========================================================================] 100% eta:  0s utilisateur     système      écoulé       49.652       1.222     117.707  In R GUI: > system.time({ +   Total <- 50000 +   pb <- progress_bar$new( +     format = "  completion [:bar] :percent eta: :eta", +     total = Total, clear = FALSE) +   f

Standard error from Hessian Matrix... what can be done when problem occurs

An Hessian matrix is a square matrix of partial second order derivative. From the square-root of the inverse of the diagonal, it is possible to estimate the standard error of parameters. Let take an example: val=rnorm(100, mean=20, sd=5) # Return -ln L of values in val in Gaussian distribution with mean and sd in par fitnorm<-function(par, val) {   -sum(dnorm(val, par["mean"], par["sd"], log = TRUE)) } # Initial values for search p<-c(mean=20, sd=5) # fit the model result <- optim(par=p, fn=fitnorm, val=val, method="BFGS", hessian=TRUE) # Inverse the hessian matrix to get SE for each parameters mathessian <- result$hessian inversemathessian <- solve(mathessian) res <- sqrt(diag(inversemathessian)) # results data.frame(Mean=result$par, SE=res)   library(MASS) (r<-fitdistr(val, "normal")) It works well. However, the inverse of the matrix cannot be calculated if the second order derivative for some paramete

Integer random number: comparison of two strategies

If you need an integer ransom number, here are two strategies. First get it from uniform distribution and take the integer part, second generate all integer possible values and take one randomly. The first solution is two times faster. As seen in the last example, the manipulation of a vector of 50000 elements is not efficient at all. However, if you know how many random numbers you will need, it is better to estimate first a long vector of random numbers and after get one per one. 1/ Using integer of uniform random number one by one: 1.994 > system.time( +   for (i in 1:1000000) +     a <- floor(runif(n = 1, min=1, max=50001)) + ) utilisateur     système      écoulé        1.853       0.116       1.994 2/ Long vector of integer of uniform random number: 0.196 > system.time({+   ru <- floor(runif(n = 1000000, min=1, max=50001)) +   for (i in 1:1000000) +     a <- ru[i] + } + ) utilisateur     système      écoulé        0.187       0.007       0.

Retrieve data from data.frame

The command subset is very efficient to retrieve data from data.frame. For example, let retrieve the size for a specific stage in this data frame: > library(embryogrowth) > s <- TSP.list[[1]] > s    stages metric 1       8     NA 2       9     NA 3      10     NA 4      11     NA 5      12     NA 6      13     NA 7      14  0.016 8      15  0.023 9      16  0.035 10     17  0.044 11     18  0.057 12     19  0.072 13     20  0.090 14     21  0.140 15     22  0.240 16     23  0.340 17     24  0.550 18     25  0.750 19     26  1.000 For example: > subset(x = s, subset=stages==20, select="metric", drop = TRUE) [1] 0.09 An alternative solution for this simple case is: > s$metric[s$stages == 20] [1] 0.09 Let measure the relative speed of both solutions; clearly subset is very useful for complex situation but simpler ones are faster using direct comparison. > system.time(expr = { +   for (i in 1:100000) { +     g <- sub