Installing and running Apache Spark (R) from scratch

I assume that you are using R 3.x.x (3.5-devel in my case) and RStudio > 1.1.x (1.2.308 in my case).
Updated on 25/3/2018 for the latest version of Spark.

# In RStudio, first install the package HelpersMG from CRAN and update it:

install.packages("HelpersMG")
install.packages("", repos=NULL, type="source")

# Then load the HelpersMG library and the latest version of Spark (the archive must be in the working directory):

system("tar -xvzf spark-2.3.0-bin-hadoop2.7.tgz")
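If the archive is not already in the working directory, it has to be fetched before the tar call above can succeed. Here is a minimal sketch using only base R. The mirror URL (archive.apache.org) and the helper names spark_url and fetch_spark are my assumptions, not part of the original recipe, so substitute whichever mirror you prefer.

```r
# Assumed location of Spark releases; any mirror with the same layout should work.
spark_url <- function(version = "2.3.0", hadoop = "2.7",
                      base = "https://archive.apache.org/dist/spark") {
  tgz <- paste0("spark-", version, "-bin-hadoop", hadoop, ".tgz")
  paste0(base, "/spark-", version, "/", tgz)
}

# Hypothetical helper: download the archive once, then unpack it with untar().
fetch_spark <- function(url = spark_url(), destdir = getwd()) {
  tgz <- file.path(destdir, basename(url))
  if (!file.exists(tgz)) {
    download.file(url, destfile = tgz, mode = "wb")
  }
  untar(tgz, exdir = destdir)
  # Return the unpacked directory, i.e. the archive name without ".tgz"
  file.path(destdir, sub("\\.tgz$", "", basename(tgz)))
}

# Usage (needs network access):
# SPARK_HOME <- fetch_spark()
```

The download is skipped when the archive already exists, so rerunning the script does not fetch it again.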

# Change the .profile and .Rprofile for future use

SPARK_HOME <- file.path(getwd(), "spark-2.3.0-bin-hadoop2.7")

HOME <- system("echo $HOME", intern = TRUE)

# Append the SPARK_HOME definition to ~/.profile
fileConn <- file(file.path(HOME, ".profile"))
total <- readLines(fileConn)
writeLines(c(total, paste0('SPARK_HOME="', SPARK_HOME, '"'), "export SPARK_HOME"), con = fileConn)
close(fileConn)


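# Append a guarded Sys.setenv() call to ~/.Rprofile
fileConn <- file(file.path(HOME, ".Rprofile"))
total <- readLines(fileConn)
writeLines(c(total,
             'if (nchar(Sys.getenv("SPARK_HOME")) < 1) {',
             paste0('  Sys.setenv(SPARK_HOME = "', SPARK_HOME, '")'),
             '}'), con = fileConn)
close(fileConn)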
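Running the step above twice appends the same lines twice, and readLines fails if the file does not exist yet. A small guard avoids both problems. The sketch below uses a hypothetical helper, append_once, which is not part of HelpersMG or SparkR. It appends only the lines that are not yet in the file.

```r
# Hypothetical helper: append `lines` to `path`, skipping any already present.
append_once <- function(path, lines) {
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  to_add <- lines[!(lines %in% existing)]
  if (length(to_add) > 0) {
    writeLines(c(existing, to_add), path)
  }
  # Return (invisibly) the lines that were actually added
  invisible(to_add)
}

# Usage, assuming HOME and SPARK_HOME are defined as above:
# append_once(file.path(HOME, ".profile"),
#             c(paste0('SPARK_HOME="', SPARK_HOME, '"'), "export SPARK_HOME"))
```

Because the comparison is line by line, a multi-line block ending in a bare "}" is not a good fit. For .Rprofile, a single-line form of the if statement is safer with this helper.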

# Now install the SparkR package:

install.packages(file.path(SPARK_HOME, "R", "lib", "SparkR"), repos=NULL)

# Spark is now ready to use.
# Now start the master on your computer.
# If you are returning after a previous session, just start from here.

SPARK_HOME <- Sys.getenv("SPARK_HOME")

system(paste0(file.path(SPARK_HOME, "sbin", ""), ";", file.path(SPARK_HOME, "sbin", "")))
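The command strings passed to system() are easy to get wrong, so it can help to build and inspect them before running anything. The sketch below uses a hypothetical helper, sbin_command. The script name start-master.sh in the usage comment is my assumption about what to launch (it ships in Spark's sbin directory), not something taken from the line above.

```r
# Hypothetical helper: build a shell command for a script in SPARK_HOME/sbin.
sbin_command <- function(spark_home, script, args = character(0)) {
  path <- file.path(spark_home, "sbin", script)
  # Quote the path so a SPARK_HOME containing spaces still works in the shell
  paste(c(shQuote(path, type = "sh"), args), collapse = " ")
}

# Usage (assumed script name; check that the file exists before running):
# home <- Sys.getenv("SPARK_HOME")
# if (file.exists(file.path(home, "sbin", "start-master.sh"))) {
#   system(sbin_command(home, "start-master.sh"))
# }
```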

# And start a slave (worker) on the same computer, just to test:

x <- system("ifconfig", intern=TRUE)
IP <- rev(gsub("^(.*) ([0-9\\.]+) (.*)$", "\\2", x[grep("inet ", x)]))[1]
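The regular expression above relies on the layout of the ifconfig output, which differs between systems (for example, some Linux versions print a dotted netmask right after the address). As a hedged alternative, the sketch below takes only the token that directly follows "inet" and drops the loopback address. The helper name extract_ipv4 is hypothetical.

```r
# Hypothetical helper: pull non-loopback IPv4 addresses out of ifconfig-style output.
extract_ipv4 <- function(lines) {
  # "inet6 ..." lines are skipped because they do not contain "inet " literally
  inet <- grep("inet ", lines, fixed = TRUE, value = TRUE)
  # Match "inet 1.2.3.4" or the older "inet addr:1.2.3.4" form
  m <- regmatches(inet, regexpr("inet (addr:)?[0-9]+(\\.[0-9]+){3}", inet))
  ips <- sub("^inet (addr:)?", "", m)
  ips[ips != "127.0.0.1"]
}

# Usage, mirroring the original choice of the last address found:
# IP <- rev(extract_ipv4(system("ifconfig", intern = TRUE)))[1]
```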

system(paste0(file.path(SPARK_HOME, "sbin", ""), " spark://", IP, ":7077"))

# Let's try running a computation:

library(SparkR)

spark_link <- paste0("spark://", IP, ":7077")
sc <- sparkR.session(master = spark_link,
                     appName = "Session name",
                     sparkConfig = list(spark.driver.memory = "2g"))

output <- spark.lapply(1:100, function(x) {x*2})
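spark.lapply returns an ordinary R list with one element per input, so the result can be compared with a plain local lapply. The sketch below computes the local reference. The name double_it is hypothetical, and the commented lines assume a live session created as above.

```r
# Same function as in the spark.lapply() call, given a name for reuse
double_it <- function(x) {x * 2}

# Local reference result, computed without Spark
expected <- lapply(1:100, double_it)

# With a live session (assumption: the master and slave started above are running):
# output <- spark.lapply(1:100, double_it)
# isTRUE(all.equal(output, expected))
# sparkR.session.stop()   # release the session when done
```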

Don't expect exceptional performance from such a configuration ;) But it works.


