R

R is a language and environment for statistical computing and graphics with a wide user base. Many packages exist that can easily be installed and loaded into R.

Getting started

Simply load the latest R module and start R:

module load r
R

Any packages you install are specific to the version of R you installed them with, so it is best to pick one version of R and stick with it. To do this, check the available R versions with module spider r and use the full module name when loading the module:

module load r/3.6.1-python3

To detect the number of cores available to your job, use the Slurm environment variable SLURM_CPUS_PER_TASK rather than detectCores() alone, falling back to all cores when the variable is not set:

library(parallel)
as.integer(Sys.getenv('SLURM_CPUS_PER_TASK', parallel::detectCores()))

Installing packages

There are two ways to install packages.

  1. You can usually install packages yourself, which allows you to keep them up to date and reinstall them as needed. Good instructions can be found here. For example:

    R
    > install.packages('L1pack')
    

    This should guide you through selecting a download mirror and offer you the option to install into your home directory.

    If you have a lot of packages, you can run out of home quota. In this case, move your package directory to your work directory and replace the ~/R directory with a symlink that points to $WRKDIR/R.

    For example:

    mv ~/R $WRKDIR/R
    ln -s $WRKDIR/R ~/R
    

    More info on R library paths can be found here. Looking at R startup can also be informative.

  2. You can also submit a request to the Triton issue tracker. Mention which R version you are using.
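The move-and-symlink step from option 1 can be wrapped in a small helper that refuses to overwrite anything. This is a minimal sketch, not an official tool: the function name relocate_r_library is hypothetical, and it assumes that your packages live in ~/R and that $WRKDIR is set, as described above.

```shell
# Hypothetical helper: move an R package directory and leave a symlink behind.
# Arguments: $1 = current library directory, $2 = new location.
relocate_r_library() {
    src="$1"
    dst="$2"
    # Already relocated: the source is a symlink, so do nothing.
    if [ -L "$src" ]; then
        echo "$src is already a symlink, nothing to do"
        return 0
    fi
    # Nothing to move.
    if [ ! -d "$src" ]; then
        echo "$src does not exist" >&2
        return 1
    fi
    # Never overwrite an existing target.
    if [ -e "$dst" ]; then
        echo "$dst already exists, refusing to overwrite" >&2
        return 1
    fi
    mv "$src" "$dst" && ln -s "$dst" "$src"
}

# Usage on the cluster (assumes WRKDIR is set):
# relocate_r_library "$HOME/R" "$WRKDIR/R"
```

Running the helper a second time is harmless, because the first check detects the existing symlink.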

Simple R serial job

Serial R example

r_serial.sh:

#!/bin/bash -l
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --mem=100M
#SBATCH --output=r_serial.out

module load r
n=3
m=2
srun Rscript --vanilla r_serial.R $n $m

r_serial.R:

args = commandArgs(trailingOnly=TRUE)

n<-as.numeric(args[1])
m<-as.numeric(args[2])

print(n)
print(m)

A<-t(matrix(0:5,ncol=n,nrow=m))
print(A)
B<-t(matrix(2:7,ncol=n,nrow=m))
print(B)
C<-matrix(0.5,ncol=n,nrow=n)
print(C)

C<-A %*% t(B) + 2*C
print(C)
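As a check on the job output, the matrices for the values n=3 and m=2 set in r_serial.sh can be worked out by hand. This is a sketch that assumes R's default column-wise filling in matrix(); C_0 denotes the initial value of C. The final matrix is what the last print(C) should show in r_serial.out.

```latex
\[
A = \begin{pmatrix} 0 & 1 \\ 2 & 3 \\ 4 & 5 \end{pmatrix}, \qquad
B = \begin{pmatrix} 2 & 3 \\ 4 & 5 \\ 6 & 7 \end{pmatrix}, \qquad
C_0 = \begin{pmatrix} 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & 0.5 \end{pmatrix}
\]
\[
A B^{\mathsf{T}} + 2 C_0
= \begin{pmatrix} 3 & 5 & 7 \\ 13 & 23 & 33 \\ 23 & 41 & 59 \end{pmatrix}
+ \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 4 & 6 & 8 \\ 14 & 24 & 34 \\ 24 & 42 & 60 \end{pmatrix}
\]
```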

Simple R job using OpenMP for parallelization

R OpenMP Example

r_openmp.sh:

#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=2G
#SBATCH --output=r_openmp.out

module load r
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
time srun Rscript --default-packages=methods,utils,stats R-benchmark-25.R

The benchmark script is available here (more information about it is available on this page).
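r_openmp.sh takes OMP_NUM_THREADS from SLURM_CPUS_PER_TASK, which Slurm sets only when --cpus-per-task is requested. A more defensive variant might look like the sketch below. The fallback of one thread is an assumption, not part of the original script.

```shell
#!/bin/bash
# Sketch: use the Slurm allocation when it is set. Otherwise fall back to
# a single thread (assumed default), for example when the script is run
# outside a Slurm job or without --cpus-per-task.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

With this line in place of the plain export, OMP_NUM_THREADS is never left empty.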

Simple R parallel job using the ‘parallel’ package

Parallel R example

r_parallel.sh:

#!/bin/bash
#SBATCH --time=00:20:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=2G
#SBATCH --output=r_parallel.out

# Set the number of OpenMP-threads to 1,
# as we're using parallel for parallelization
export OMP_NUM_THREADS=1

# Load the version of R you want to use
module load r

# Run your R script
srun Rscript r_parallel.R

r_parallel.R:

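library(pracma)    # provides pinv()
library(parallel)  # provides mclapply() and detectCores()

# Build a random symmetric 2000x2000 matrix, compute its pseudoinverse
# and return the largest element of the product
invertRandom <- function(index) {
    A <- matrix(runif(2000*2000), ncol=2000, nrow=2000)
    A <- A + t(A)
    B <- pinv(A)
    return(max(B %*% A))
}

# Use the cores allocated by Slurm, or all cores if the variable is not set
ncores <- as.integer(Sys.getenv('SLURM_CPUS_PER_TASK', detectCores()))

ptm <- proc.time()
mclapply(1:16, invertRandom, mc.cores=ncores)
proc.time() - ptm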

When constrained to the opt architecture, the run times for different numbers of cores were:

ncores        1         2         4         8
runtime (s)   380.757   182.185   125.526   84.230
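The runtimes listed above can be turned into speedup and parallel efficiency figures. The sketch below copies the four measurements into a shell variable and uses the usual definitions: speedup is the single-core runtime divided by the n-core runtime, and efficiency is speedup divided by n. The variable names are illustrative only.

```shell
# Runtimes copied from the measurements above, as "ncores runtime" pairs.
timings="1 380.757
2 182.185
4 125.526
8 84.230"

# For each row print: ncores, speedup T(1)/T(n), efficiency speedup/n.
speedup_table=$(echo "$timings" | awk '
    NR == 1 { base = $2 }
    { printf "%d %.2f %.2f\n", $1, base / $2, base / $2 / $1 }')
echo "$speedup_table"
```

With these numbers, the 8-core run is about 4.5 times faster than the single-core run, so the efficiency drops to a little over half at 8 cores.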