R

R is a language and environment for statistical computing and graphics with a wide user base. Many add-on packages are available and can easily be loaded into R.

Getting started

Simply load the latest R.

module load r
R

It is best to pick one version of R and stick with it. Run module spider r to list the available versions, then load one by its full name:

module load r/3.4.3-python-2.7.14

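To find the number of cores available to your job, read the appropriate Slurm environment variable instead of relying on detectCores() alone, which reports every core on the machine. The example below falls back to detectCores() when the variable is not set (for example, outside a Slurm job):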

library(parallel)
# Use the Slurm allocation if it is set; otherwise fall back to all cores on the machine
ncores <- as.integer(Sys.getenv('SLURM_JOB_CPUS_PER_NODE', parallel::detectCores()))
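The same lookup can also be done in the batch script before R starts. The following is a minimal sketch, not a site-specific recipe: job_cores is a hypothetical helper name, and it assumes that SLURM_CPUS_PER_TASK is preferred when the job requests --cpus-per-task. It also handles the fact that SLURM_JOB_CPUS_PER_NODE can take forms such as 4(x2) on multi-node allocations, which would not convert cleanly to an integer.

```shell
# Hypothetical helper: print the number of cores the job may use.
job_cores() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        # Set by Slurm when the job requests --cpus-per-task
        echo "$SLURM_CPUS_PER_TASK"
    elif [ -n "${SLURM_JOB_CPUS_PER_NODE:-}" ]; then
        # Values such as "4(x2)" or "72(x2),36" describe several nodes;
        # keep only the leading count (cores on the first node listed)
        echo "${SLURM_JOB_CPUS_PER_NODE%%[(,]*}"
    else
        # Not inside a Slurm job: fall back to the cores visible to this process
        nproc
    fi
}
```

The result could then be stored with NCORES=$(job_cores) and passed to Rscript as a command-line argument.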

Installing packages

There are two ways to install packages.

  1. You can usually install packages yourself, which lets you keep them up to date and reinstall them as needed. For example:

    R
    > install.packages('L1pack')
    

    R should prompt you to select a download mirror and offer to install the package into a personal library in your home directory.

    Before installing packages, set a package library location: the default location in your home directory can fill up quickly, and loading packages from the home directory is very slow. For example:

    module load R
    export R_LIBS=$WRKDIR/R/$EBVERSIONR
    mkdir -p $R_LIBS
    

    In later sessions, setting

    export R_LIBS=$WRKDIR/R/$EBVERSIONR
    

    after loading the R module points R to the correct library location (you can put this line in your .bashrc file). More information on R library paths is available in the R help page for .libPaths; the help page on R startup (Startup) is also informative.

  2. You can also submit a request to the Triton issue tracker; mention which R version you are using.
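For the first option, one detail matters when the export line goes into .bashrc: the example path depends on $EBVERSIONR, which is presumably defined only once the R module has been loaded. The sketch below wraps the two steps in a function that refuses to run when either variable is missing. The function name set_r_libs is hypothetical; WRKDIR and EBVERSIONR are the variables used in the example above.

```shell
# Sketch of a ~/.bashrc helper; set_r_libs is a hypothetical name.
set_r_libs() {
    # Assumption: EBVERSIONR is defined only after the R module is loaded
    if [ -z "${EBVERSIONR:-}" ]; then
        echo "EBVERSIONR is not set; load the R module first" >&2
        return 1
    fi
    if [ -z "${WRKDIR:-}" ]; then
        echo "WRKDIR is not set" >&2
        return 1
    fi
    # One library directory per R version, outside the home directory
    export R_LIBS="$WRKDIR/R/$EBVERSIONR"
    mkdir -p "$R_LIBS"
}
```

With this in place, running module load R followed by set_r_libs sets up the library path for the loaded version, and calling it before the module is loaded prints a message instead of pointing R_LIBS at an incomplete path.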

Simple R serial job

Serial R example

serial_R.slrm:

#!/bin/bash
#SBATCH -p short
#SBATCH -t 00:05:00
#SBATCH -n 1
#SBATCH --mem=100
#SBATCH -o serial_R.out
module load R
n=3
m=2
srun Rscript --vanilla serial_R.R $n $m

serial_R.R:

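# Read the two matrix dimensions from the command line
args <- commandArgs(trailingOnly=TRUE)
n <- as.numeric(args[1])
m <- as.numeric(args[2])
print(n)
print(m)
# A and B are n x m; the 0:5 and 2:7 fills assume n*m == 6 (n=3, m=2 in the batch script)
A <- t(matrix(0:5, ncol=n, nrow=m))
print(A)
B <- t(matrix(2:7, ncol=n, nrow=m))
print(B)
# C starts as an n x n matrix filled with 0.5
C <- matrix(0.5, ncol=n, nrow=n)
print(C)
# Overwrite C with A B^T + 2C
C <- A %*% t(B) + 2*C
print(C)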

Simple R job using OpenMP for parallelization

R OpenMP Example

r_openmp.slrm:

#!/bin/bash
#SBATCH -p batch
#SBATCH -t 00:15:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=2G
#SBATCH -o r_openmp.out

module load R
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
time srun Rscript --default-packages=methods,utils,stats R-benchmark-25.R

The benchmark script, R-benchmark-25.R, is available online, along with more information about it.

Simple R parallel job using the ‘parallel’ package

Parallel R example

r_parallel.slrm:

#!/bin/bash
#SBATCH -p short
#SBATCH -t 00:20:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=2G
#SBATCH -o r_parallel.out

# Set the number of OpenMP-threads to 1,
# as we're using parallel for parallelization
export OMP_NUM_THREADS=1

# Load the version of R you want to use
module load r

# Run your R script
srun Rscript r_parallel.R

r_parallel.R:

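library(pracma)
library(parallel)
# Build a random symmetric 2000x2000 matrix, compute its pseudoinverse with
# pracma::pinv, and return the largest entry of the product
invertRandom <- function(index) {
    A <- matrix(runif(2000*2000), ncol=2000, nrow=2000)
    A <- A + t(A)
    B <- pinv(A)
    return(max(B %*% A))
}
# Sys.getenv returns a string: convert it to an integer, using 1 core if the variable is unset
ncores <- as.integer(Sys.getenv('SLURM_CPUS_PER_TASK', '1'))
ptm <- proc.time()
mclapply(1:16, invertRandom, mc.cores=ncores)
proc.time() - ptm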

When the job was constrained to the opt architecture, the run times for different core counts were:

ncores       1        2        4        8
runtime (s)  380.757  182.185  125.526  84.230
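
To read these timings as scaling figures, it helps to convert them into speedup (single-core time divided by n-core time) and parallel efficiency (speedup divided by n). The sketch below does this arithmetic with awk on the numbers from the table; the function names speedup and efficiency are illustrative only.

```shell
# Run times in seconds, copied from the table above (1, 2, 4 and 8 cores)
cores=(1 2 4 8)
runtimes=(380.757 182.185 125.526 84.230)

# speedup T1 Tn       prints T1 / Tn
# efficiency T1 Tn n  prints T1 / (n * Tn)
speedup()    { awk -v t1="$1" -v tn="$2" 'BEGIN { printf "%.2f", t1 / tn }'; }
efficiency() { awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN { printf "%.2f", t1 / (n * tn) }'; }

for i in "${!cores[@]}"; do
    printf '%s cores: speedup %s, efficiency %s\n' \
        "${cores[$i]}" \
        "$(speedup "${runtimes[0]}" "${runtimes[$i]}")" \
        "$(efficiency "${runtimes[0]}" "${runtimes[$i]}" "${cores[$i]}")"
done
```

For these measurements the speedup is roughly 2.1, 3.0 and 4.5 on 2, 4 and 8 cores, so each doubling of the core count beyond 2 yields progressively less benefit for this particular test.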