Articles

Affichage des articles du 2017

Optimisation: sort or order and indicating default parameter

First, it is better to use sort(x) or x[order(x)]: > x <- sample(1:100, 100) > system.time({ +   for (i in 1:100000) y <- x[order(x, decreasing = FALSE)] + }) utilisateur     système      écoulé       1.908       0.043       1.965 > system.time({ +   for (i in 1:100000) y <- sort(x, decreasing = FALSE) + }) utilisateur     système      écoulé       2.993       0.035       3.043 Clearly the syntax x[order(x)] is more rapid (1/3 more fast). Now a second test: decreasing = FALSE is the default; what is the difference if it is not indicated: > system.time({ +   for (i in 1:100000) y <- x[order(x)] + }) utilisateur     système      écoulé       1.907       0.037       1.952 > > system.time({ +   for (i in 1:100000) y <- sort(x) + }) utilisateur     système      écoulé       3.049       0.035       3.094 No big change...

Sort your data to get a accurate sum

Based on a message of Jan Motl in r-help discussion list. If the numbers are positive, to get an accurate sum, sort the data by increasing order. > x <- c(rep(1.1, 10000), 10^16) > dput(c(sum(sort(x, decreasing = TRUE)), sum(sort(x, decreasing = FALSE)))) c(10000000000010996, 10000000000011000) If the numbers are negative, to get an accurate sum, sort the data by decreasing order. > x <- c(rep(-1.1, 10000), -10^16) > dput(c(sum(sort(x, decreasing = TRUE)), sum(sort(x, decreasing = FALSE)))) c(-10000000000011000, -10000000000010996) If the numbers are both positive and negative, no change. > x <- c(rep(1.1, 5000), rep(-1.1, 5000), 10^16) > dput(c(sum(sort(x, decreasing = TRUE)), sum(sort(x, decreasing = FALSE)))) c(1e+16, 1e+16) Higham NJ (2002) Accuracy and stability of numerical algorithms. Society for Industrial and Applied Mathematics, Philadelphia, USA Chapter 4 - Summation See also  https://stackoverflow.com/questions/47847295/why

The problem of constant SD for likelihood of Gaussian distribution

Imagine that you have a Gaussian distribution N(mean, sd) and you search for a model f(xi) that explained the best your observed data ni. A common solution is to use maximum likelihood and then minimizing -Ln L using dnorm(ni, mean=f(xi), sd=SD, log=TRUE) with SD being fitted. However, doing this, you give huge weight to the large values ni. Imagine n1=1000 and n2=10. The model predicting n from the x gives a 10% error prediction. Then f(x1)=1100 and f(x2)=11. The -ln likelihood of 1 and 2 are: L1 <- -dnorm(x=n1, mean=f(x1), sd=SD, log=TRUE) and L2 <- -dnorm(x=n2, mean=f(x2), sd=SD, log=TRUE) The minimum for L1 is 6.024496 with SD=102 and for L2 is 1.418939 with SD=1. Then contribution of L1 in the total likelihood is much higher than contribution of L2... the values with higher figures drive the fit. This is normal because the absolute deviation is higher and the model suppose no heteroskedasticity. But heteroskedasticity is very very often observed. So it is bett

Distribution of Pearson coefficient of correlation

Image
library(hypergeo) pearson.p <- function(r, n, rho=0) {   # n # Number of observations   # r # Valeur for which density needs to be estimated   # rho # Observed correlation   N <- (n-2)*gamma(n-1)*(1-rho^2)^((n-1)/2)*(1-r^2)^((n-4)/2)   D <- sqrt(2*pi)*gamma(n-1/2)*(1-rho*r)^(n-3/2)   P <- hypergeo(A=1/2, B=1/2, C=(2*n-1)/2, z=(rho*r+1)/2)   return((N/D)*P) } # When the correlation is null, it simplified to : pearson.p.0 <- function(r, n) {   pofr <- ((1-r^2)^((n-4)/2))/beta(a = 1/2, b = (n-2)/2)   return(pofr) } n <- 50 # Generate 1000 series of n random variables m <- sapply(X = 1:1000, FUN = function(x) rnorm(n)) # Calculate the pearson correlation between all the series (1000x1000) cors <- cor(x = m, method = "pearson") # take only the off-diagonals, otherwise there'd be duplicates cor.data <- cors[upper.tri(cors, diag = FALSE)] par(mar=c(4, 4, 1, 1)+0.4) hist(cor.data, xlim=c(-1, 1), freq=FALSE,      las=1,

Error "failed to lock directory"

If you get this message when installing a package: ERROR: failed to lock directory ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library’ for modifying Try using: install.packages(xxx, INSTALL_opts = c('--no-lock')) or in terminal: R CMD INSTALL --no-lock xxx

Install package igraph on MacOSX or in Ubuntu

I fail to install this package from source. However the macbinary can be installed fine. > library(HelpersMG) Le chargement a nécessité le package : coda > wget("https://cran.r-project.org/bin/macosx/el-capitan/contrib/3.4/igraph_1.1.2.tgz") essai de l'URL 'https://cran.r-project.org/bin/macosx/el-capitan/contrib/3.4/igraph_1.1.2.tgz' Content type 'application/x-gzip' length 6928223 bytes (6.6 MB) ================================================== downloaded 6.6 MB > install.packages("igraph_1.1.2.tgz", repos=NULL) * installing *binary* package ‘igraph’ ... * DONE (igraph) For Ubuntu 16.04: apt - get update # refresh apt - get install software - properties - common add - apt - repository - y "ppa:marutter/rrutter" add - apt - repository - y "ppa:marutter/c2d4u" apt - get update # now with new repos apt - get install r - cran - igraph News: In ubuntu 22.04: sudo apt - get install libglpk - d

Partial coefficient of correlation

Let have 3 variables, x, y and a: x <- rnorm(100, mean=10, sd=2) y <- x+rnorm(100, mean=20, sd=5) a <- rnorm(100, mean=10, sd=2) df <- data.frame(x=x, y=y, a=a) The partial coefficient correlation between x and y relative to a is the correlation coefficient of the residuals of the linear regression of x and y over a: pcor <- function(df) {   ax <- lm(x ~ a, data=df)   ay <- lm(y ~ a, data=df)      rx <- residuals(ax)   ry <- residuals(ay)      return(cor(rx, ry)) } pcor(df) Let confirm with the package ppcor: library("ppcor") ppcor::: pcor(df)$estimate[2, 1] Note that I use ppcor:::pcor() to be sure that the version of pcor in the package is used. It works: we obtained the same result. It should be noted that ppcor:::pcor() is more rapid than my version of pcor() even if it estimate 3 partial correlations and me only one ! > system.time( +   for (i in 1:10000) ppcor:::pcor(df)$estima

Memory error when using java in R

Try: options(java.parameters = "-Xmx1000m") or options(java.parameters = "-Xmx2048m") You should make sure that you are setting the Java parameters before any JVM is initialized, i.e. before java-dependent packages are loaded.

Change login user in Rstudio-server

The logout button gives an error if the current user has checked "stay log in" button. To change a user, just use this link: http://[IP]:[port]/auth-sign-in

Where are located the packages and who is the owner

I have done a function list.packages() in my package HelpersMG. .libPaths() returns the list of locations for packages... but who is where ?: structure(lapply(.libPaths(), FUN = function(path) {list.dirs(path=path, full.names = FALSE, recursive = FALSE)}), .Names=.libPaths()) To get the owner for each location: structure(as.list(system(paste("ls -ld ", .libPaths()," | awk '{print $3}'", collapse=";"), intern=TRUE)), .Names=.libPaths()) It works both in ubuntu and MacOSx. I don't have Windows computer to do the same. The path used by .libPaths() are defined in one of the files used for R configuration at startup: in ubuntu: /usr/lib/R/etc/Renviron in MacosX: /Library/Frameworks/R.framework/Resources/etc/Renviron > file.path(Sys.getenv("R_HOME"), "etc", "Renviron") [1] "/Library/Frameworks/R.framework/Resources/etc/Renviron" and  ~/.Renviron ~/.Rprofile The files /usr/lib/R/etc/R

Run a R script from terminal

Let a.R being:  sayHello <- function(){    print('hello') } sayHello() In terminal, enter: Rscript a.R [1] "hello" Or: R CMD BATCH a.R Nothing is shown because the stdout() is store in a file: cat a.Rout to see the result. It is not very friendly: MacBook-Air-de-Marc:Documents marcgirondot$ cat a.Rout R version 3.4.2 beta (2017-09-19 r73321) -- "Short Summer" Copyright (C) 2017 The R Foundation for Statistical Computing Platform: x86_64-apple-darwin15.6.0 (64-bit) R est un logiciel libre livré sans AUCUNE GARANTIE. Vous pouvez le redistribuer sous certaines conditions. Tapez 'license()' ou 'licence()' pour plus de détails. R est un projet collaboratif avec de nombreux contributeurs. Tapez 'contributors()' pour plus d'information et 'citation()' pour la façon de le citer dans les publications. Tapez 'demo()' pour des démonstrations, 'help(

Install rJava on archlinux

First, be sure that your date system is correct. timedatectl If not setup a ntp server: sudo timedatectl set-ntp true sudo pacman -S ntp sudo vi /etc/ntp.conf Replace the server with your stp server: server ntp.u-psud.fr iburst Force the date/time update and start the daemon sudo ntpd -gq sudo ntpd start Check if everything is ok timedatectl Update the system sudo pacman -Syu sudo pacman -S jdk8-openjdk sudo R CMD javareconf R In R, do install.packages("rJava") install.packages("mailR")

rJava in ubuntu

To install rJava in ubuntu, run: sudo apt - get install r - cran - rjava Alternative to get the most recent update: sudo apt-get install openjdk-8-jdk sudo R CMD javareconf sudo R install.packages("rJava")

eval() within a function

essai <- function() {   mx <- 10   eval(parse(text="k=20"), envir= environment())   print(k) } essai() # by default the environment used is .GlobalEnv # mx is not known in this environment essai <- function() {   mx <- 10   eval(parse(text="k=mx"))   print(k) } essai() essai <- function() {   mx <- 10   eval(parse(text="k=mx"), envir= .GlobalEnv)   print(k) } essai() # Here I force mx to be defined in .GlobalEnv essai <- function() {   mx <<- 10   eval(parse(text="k=mx"), envir= .GlobalEnv)   print(k) } essai() # environment() is the environment of the function essai <- function() {   mx <- 10   eval(parse(text="k=mx"), envir= environment())   print(k) } essai() essai <- function() { env <- new.env(parent = environment()) mx <- 10 eval(parse(text="k=mx"), envir= env) print(k) } essai()

install rgeos 0.3-28 without error

When I try update rgeos 0.3-24 package in R 3.4.1, I get this error: * installing to library ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library’ * installing *source* package ‘rgeos’ ... configure: CC: clang configure: CXX: clang++ configure: rgeos: 0.3-24 checking for /usr/bin/svnversion... yes cat: inst/SVN_VERSION: No such file or directory configure: svn revision:  checking for geos-config... no no configure: error: geos-config not found or not executable. ERROR: configuration failed for package ‘rgeos’ * removing ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgeos’ * restoring previous ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgeos’ Here is the solution. First install geos and gdal complete frameworks from here: http://www.kyngchaos.com/software:frameworks At the time I write this article, the links are here: GDAL 2.2 Complete GEOS framework v3.6.1-1 Then load the last version of