Posts from January 2018

Installing and running Apache Spark (R) from scratch

I assume that you are using R 3.x.x (in my case 3.5-devel) and RStudio > 1.1.x (in my case 1.2.308). Updated on 25/3/2018 with the latest version of Spark.

# In RStudio, first install the HelpersMG package from CRAN, then update it from the author's server:
install.packages("HelpersMG")
install.packages("http://www.ese.u-psud.fr/epc/conservation/CRAN/HelpersMG.tar.gz", repos=NULL, type="source")
# Then load the HelpersMG library, and download and unpack the latest version of Spark:
library("HelpersMG")
wget("http://apache.crihan.fr/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz")
system("tar -xvzf spark-2.3.0-bin-hadoop2.7.tgz")
# Change .profile and .Rprofile for future use
SPARK_HOME <- file.path(getwd(), "spark-2.3.0-bin-hadoop2.7")
HOME <- Sys.getenv("HOME")
fileConn <- file(file.path(HOME, ".profile"))
total <- readLines(fileConn)
writeLines(c(total, paste0(…
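The post makes SPARK_HOME permanent by editing .profile and .Rprofile. If you only need the variable inside R, a simpler alternative is ~/.Renviron, which R reads at startup. The sketch below uses a hypothetical helper, add_env_var(), which is not part of base R or HelpersMG; it appends a KEY=value line only if KEY is not already defined in the file.

```r
# Hypothetical helper (not from the original post, not in base R or HelpersMG):
# append "KEY=value" to an R environment file such as ~/.Renviron,
# unless a line defining KEY already exists.
add_env_var <- function(key, value,
                        file = file.path(Sys.getenv("HOME"), ".Renviron")) {
  lines <- if (file.exists(file)) readLines(file, warn = FALSE) else character(0)
  if (any(startsWith(lines, paste0(key, "=")))) {
    return(invisible(FALSE))  # already defined: leave the file untouched
  }
  writeLines(c(lines, paste0(key, "=", value)), file)
  invisible(TRUE)
}

# Usage (commented out so that running this block does not touch ~/.Renviron):
# add_env_var("SPARK_HOME", file.path(getwd(), "spark-2.3.0-bin-hadoop2.7"))
```

After restarting R, Sys.getenv("SPARK_HOME") returns the path, and the Spark documentation loads SparkR with library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib")) followed by sparkR.session(master = "local[*]").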

get.seed()?

Using set.seed(x) you can generate a new random series... but there is no get.seed() function. This is because set.seed(x) does not keep x: it generates a vector of 626 integers (with the default Mersenne-Twister generator) that is stored in .Random.seed.

> set.seed(0)
> h0 <- .Random.seed
> set.seed(1)
> h1 <- .Random.seed
> set.seed(0)
> h2 <- .Random.seed
> identical(h0, h2)
[1] TRUE
> identical(h0, h1)
[1] FALSE

If you want to get the current seed to use it later, you must store this vector of 626 values and copy it back into .Random.seed:

> set.seed(0)
> h0 <- .Random.seed
> runif(1)
[1] 0.8966972
> runif(1)
[1] 0.2655087
> .Random.seed <- h0
> runif(1)
[1] 0.8966972
> runif(1)
[1] 0.2655087
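Writing .Random.seed <- h0 works at the console because the assignment happens in the global environment, which is where R looks for .Random.seed. Inside a function the same line would only create a local copy. A minimal sketch, assuming two hypothetical helpers (get_seed_state() and set_seed_state() are not base R functions) that always read and write the global copy:

```r
# Hypothetical helpers (not base R): save and restore the RNG state.
# R only uses .Random.seed from the global environment, so a plain
# `.Random.seed <- state` inside a function would create a useless local copy.
get_seed_state <- function() {
  # Fails if the RNG has never been used in the session (no .Random.seed yet)
  get(".Random.seed", envir = globalenv(), inherits = FALSE)
}

set_seed_state <- function(state) {
  assign(".Random.seed", state, envir = globalenv())
  invisible(NULL)
}

set.seed(0)
state <- get_seed_state()
a <- runif(2)
set_seed_state(state)
b <- runif(2)
identical(a, b)   # TRUE: the same series is replayed
length(state)     # 626 with the default Mersenne-Twister generator
```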

Multivariable analysis and iconography of correlations

Introduction

Correlation iconography (iconography of correlations) is a little-known method for studying multivariate data. It was developed a long time ago by Michel Lesty:

Lesty M (1999) Une nouvelle approche dans le choix des régresseurs de la régression multiple en présence d'intéractions et de colinearités [A new approach to choosing the regressors of multiple regression in the presence of interactions and collinearities]. La revue de Modulad 22:41-77

It is also well described on a French Wikipedia page: https://fr.wikipedia.org/wiki/Iconographie_des_corrélations

After exploring the possibilities of this method, I think that it deserves more attention. Let's take the example from the Wikipedia page (Élève = student, Poids = weight, Âge = age, Assiduité = attendance, Note = grade):

dta <- read.table(text=gsub(",", ".", "Élève Poids Âge Assiduité Note
e1 52 12 12 5
e2 59 12,5 9 5
e3 55 13 15 9
e4 58 14,5 5 5
e5 66 15,5 11 13,5
e6 62 16 15 18
e7 63 17 12 18
e8 69 18 9 18"), header=TRUE)
> dta
  Élève Poids  Âge Assiduité Note
1    e1    52 12.0        12  5.0
2    e2    59 12.5         9  5.0
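The iconography starts from the correlations between variables: a link is drawn between two variables when their correlation is judged remarkable, with a solid line for positive and a dashed line for negative correlations. The sketch below covers only that first step on the same data. Assumptions: the columns are renamed to ASCII English names (Weight, Age, Attendance, Grade) to avoid encoding issues, and the 0.5 absolute cut-off is an arbitrary illustrative choice, not the criterion prescribed by Lesty's method. Later steps of the method are not attempted.

```r
# Same values as the Wikipedia example, with ASCII column names
# (Weight = Poids, Age = Âge, Attendance = Assiduité, Grade = Note)
dta <- data.frame(
  Weight     = c(52, 59, 55, 58, 66, 62, 63, 69),
  Age        = c(12, 12.5, 13, 14.5, 15.5, 16, 17, 18),
  Attendance = c(12, 9, 15, 5, 11, 15, 12, 9),
  Grade      = c(5, 5, 9, 5, 13.5, 18, 18, 18),
  row.names  = paste0("e", 1:8)
)

cm <- cor(dta)    # Pearson correlation matrix
threshold <- 0.5  # hypothetical cut-off for a "remarkable" correlation

# Keep each pair once (upper triangle) when |r| reaches the threshold
idx <- which(upper.tri(cm) & abs(cm) >= threshold, arr.ind = TRUE)
links <- data.frame(
  from = rownames(cm)[idx[, "row"]],
  to   = colnames(cm)[idx[, "col"]],
  r    = cm[idx],
  line = ifelse(cm[idx] > 0, "solid (positive)", "dashed (negative)"),
  stringsAsFactors = FALSE
)

print(round(cm, 2))
print(links)
```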

From negative binomial to Poisson distribution

When the parameter size is +Inf, dnbinom(x, mu=mu, size=+Inf) gives the same values as dpois(x, lambda=mu):

> dnbinom(1:20, mu=5, size=+Inf)
 [1] 3.368973e-02 8.422434e-02 1.403739e-01 1.754674e-01 1.754674e-01 1.462228e-01
 [7] 1.044449e-01 6.527804e-02 3.626558e-02 1.813279e-02 8.242177e-03 3.434240e-03
[13] 1.320862e-03 4.717363e-04 1.572454e-04 4.913920e-05 1.445271e-05 4.014640e-06
[19] 1.056484e-06 2.641211e-07
> dpois(1:20, lambda=5)
 [1] 3.368973e-02 8.422434e-02 1.403739e-01 1.754674e-01 1.754674e-01 1.462228e-01
 [7] 1.044449e-01 6.527804e-02 3.626558e-02 1.813279e-02 8.242177e-03 3.434240e-03
[13] 1.320862e-03 4.717363e-04 1.572454e-04 4.913920e-05 1.445271e-05 4.014640e-06
[19] 1.056484e-06 2.641211e-07

This is logical from the variance of the negative binomial distribution in its mu parametrization:

variance = mu + mu^2 / size

When size tends to +Inf, the term mu^2 / size vanishes and the variance equals the mean mu, as for a Poisson distribution; in this limit the negative binomial distribution converges to a Poisson distribution with lambda = mu.
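The convergence can be checked numerically. This sketch compares dnbinom() and dpois() with the same mean for increasing values of size (the grid of sizes and the range 0:50 are arbitrary choices) and prints the variance given by the formula next to the largest pointwise difference between the two distributions.

```r
# Negative binomial (mu parametrization) vs Poisson with the same mean
mu <- 5
x <- 0:50
sizes <- c(1, 10, 100, 1e4, 1e6)

# Variance from the formula variance = mu + mu^2 / size
variance <- mu + mu^2 / sizes

# Largest absolute difference between the two probability mass functions
max_abs_diff <- sapply(sizes, function(s) {
  max(abs(dnbinom(x, mu = mu, size = s) - dpois(x, lambda = mu)))
})

print(data.frame(size = sizes, variance = variance, max_abs_diff = max_abs_diff))

# The limit case size = Inf reproduces dpois()
all.equal(dnbinom(x, mu = mu, size = Inf), dpois(x, lambda = mu))
```

As size grows, the variance approaches mu and the difference between the two distributions shrinks toward zero.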