Installing and running Apache Spark (R) from scratch
I assume that you use R 3.x.x (in my case 3.5-devel) and RStudio > 1.1.x (in my case 1.2.308).
Updated on 25/3/2018 with the latest version of Spark.
# In RStudio, first install the HelpersMG package from CRAN, then update it:
install.packages("HelpersMG.tar.gz")
install.packages("http://www.ese.u-psud.fr/epc/conservation/CRAN/HelpersMG.tar.gz", repos=NULL, type="source")
# Then load the HelpersMG library and download the latest version of Spark:
library("HelpersMG")
wget("http://apache.crihan.fr/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz")
system("tar -xvzf spark-2.3.0-bin-hadoop2.7.tgz")
# Add SPARK_HOME to .profile and .Rprofile for future use
SPARK_HOME <- file.path(getwd(), "spark-2.3.0-bin-hadoop2.7")
HOME <- Sys.getenv("HOME")
profile_path <- file.path(HOME, ".profile")
if (!file.exists(profile_path)) file.create(profile_path)
fileConn <- file(profile_path)
total <- readLines(fileConn)
# writeLines() must target the connection, otherwise it only prints to the console
writeLines(c(total, paste0('SPARK_HOME="', SPARK_HOME, '"'), "export SPARK_HOME"),
           con = fileConn)
close(fileConn)
Sys.setenv(SPARK_HOME = SPARK_HOME)
rprofile_path <- file.path(HOME, ".Rprofile")
if (!file.exists(rprofile_path)) file.create(rprofile_path)
fileConn <- file(rprofile_path)
total <- readLines(fileConn)
writeLines(c(total,
             'if (nchar(Sys.getenv("SPARK_HOME")) < 1) {',
             paste0('  Sys.setenv(SPARK_HOME = "', SPARK_HOME, '")'),
             '}'),
           con = fileConn)
close(fileConn)
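# Quick check (optional): the variable should already be visible in
# this session, and in any new R session via the .Rprofile above.
Sys.getenv("SPARK_HOME")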
# Now install the SparkR package:
install.packages(file.path(SPARK_HOME, "R", "lib", "SparkR"), repos=NULL)
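# Optional check: the installed SparkR version should match the Spark
# distribution, here 2.3.0.
packageVersion("SparkR")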
# Spark is now ready to be used
# Now start the master on your computer
# If you are coming back from a previous use, just start again from here
SPARK_HOME <- Sys.getenv("SPARK_HOME")
system(paste0(file.path(SPARK_HOME, "sbin", "stop-master.sh"), ";", file.path(SPARK_HOME, "sbin", "start-master.sh")))
# And start a slave (worker) on your computer, just to test
# Extract the local IP address from the ifconfig output (macOS/Linux)
x <- system("ifconfig", intern = TRUE)
IP <- rev(gsub("^(.*) ([0-9\\.]+) (.*)$", "\\2", x[grep("inet ", x)]))[1]
system(paste0(file.path(SPARK_HOME, "sbin", "start-slave.sh"), " spark://", IP, ":7077"))
# Let's try to run a computation:
library("SparkR")
spark_link <- paste0("spark://", IP, ":7077")
sparkR.stop()
sc <- sparkR.session(master = spark_link,
                     appName = "Session name",
                     # sparkR.session() takes sparkConfig; sparkEnvir belonged to the old sparkR.init()
                     sparkConfig = list(spark.driver.memory = "2g"))
output <- spark.lapply(1:100, function(x) {x*2})
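# spark.lapply() returns a plain R list, so the result can be checked as usual:
head(unlist(output))   # 2 4 6 8 10 12
# When finished, stop the session and, if you want, the worker and the
# master with the scripts shipped in sbin (default script names of Spark 2.3.0):
sparkR.session.stop()
system(file.path(SPARK_HOME, "sbin", "stop-slave.sh"))
system(file.path(SPARK_HOME, "sbin", "stop-master.sh"))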
Don't expect exceptional performance from such a configuration ;) But it works.