R¶
R is a language and environment for statistical computing and graphics with a wide user base. Many packages are available, and they are easy to install and load in R.
Getting started¶
Simply load the latest R module and start R:
module load r
R
Packages that you install are specific to the R version they were installed with, so it is best to pick a version of R and stick with it. You can do this by listing the available R versions with module spider r and using the full module name when loading the module:
module load r/3.6.1-python3
If you want to detect the number of cores, you should use the proper Slurm environment variables (defaulting to all cores):
library(parallel)
as.integer(Sys.getenv('SLURM_CPUS_PER_TASK', parallel::detectCores()))
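The same fallback can be sketched on the shell side of a job script, for example to pass a core count to other tools. This is a minimal sketch, not part of the original instructions: the function name cores_for_job is hypothetical, and it assumes that nproc (GNU coreutils) or getconf is available on the node.

```shell
# Hypothetical helper: print the number of cores a job should use.
# Prefers the Slurm allocation and falls back to all visible cores.
cores_for_job() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        echo "$SLURM_CPUS_PER_TASK"
    else
        # Assumption: nproc exists; getconf is tried if it does not.
        nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
    fi
}

ncores=$(cores_for_job)
echo "Using $ncores cores"
```

Inside a Slurm job started with --cpus-per-task, this prints the allocated core count rather than the total number of cores on the node.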
Installing packages¶
There are two ways to install packages.
You can usually install packages yourself, which allows you to keep up to date and reinstall as needed. Good instructions can be found here, for example:
R > install.packages('L1pack')
This should guide you to selecting a download mirror and offer you the option to install in your home directory.
If you have a lot of packages, you can run out of home quota. In this case you should move your package directory to your work directory and replace the ~/R directory with a symlink that points to $WRKDIR/R. An example of doing this:
mv ~/R $WRKDIR/R
ln -s $WRKDIR/R ~/R
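The move above can also be wrapped in a small helper that refuses to overwrite an existing target and does nothing if the directory has already been relocated. This is only a sketch: the function name relocate_dir is hypothetical, and it assumes that $WRKDIR is set as in the commands above.

```shell
# Hypothetical helper: move a directory and leave a symlink at the old path.
relocate_dir() {
    local src="$1" dest="$2"
    # Already relocated: the source is a symlink, so there is nothing to do.
    if [ -L "$src" ]; then
        echo "$src is already a symlink, nothing to do"
        return 0
    fi
    # Refuse to overwrite an existing target directory.
    if [ -e "$dest" ]; then
        echo "$dest already exists, refusing to overwrite" >&2
        return 1
    fi
    # Create an empty source directory if none exists yet.
    [ -d "$src" ] || mkdir -p "$src"
    mkdir -p "$(dirname "$dest")"
    mv "$src" "$dest"
    ln -s "$dest" "$src"
}

# Usage on the cluster (assumes $WRKDIR is set):
# relocate_dir "$HOME/R" "$WRKDIR/R"
```

Calling the helper a second time is harmless, because the first check notices the symlink and returns early.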
More info on R library paths can be found here. Looking at R startup can also be informative.
Alternatively, you can submit a request to the Triton issue tracker. Mention which R version you are using.
Simple R serial job¶
Serial R example¶
#!/bin/bash -l
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --mem=100M
#SBATCH --output=r_serial.out
module load r
n=3
m=2
srun Rscript --vanilla r_serial.R $n $m
args = commandArgs(trailingOnly=TRUE)
n<-as.numeric(args[1])
m<-as.numeric(args[2])
print(n)
print(m)
A<-t(matrix(0:5,ncol=n,nrow=m))
print(A)
B<-t(matrix(2:7,ncol=n,nrow=m))
print(B)
C<-matrix(0.5,ncol=n,nrow=n)
print(C)
C<-A %*% t(B) + 2*C
print(C)
Simple R job using OpenMP for parallelization¶
R OpenMP Example¶
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=2G
#SBATCH --output=r_openmp.out
module load r
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
time srun Rscript --default-packages=methods,utils,stats R-benchmark-25.R
The benchmark script is available here (more information about it is available here).
Simple R parallel job using the ‘parallel’ package¶
Parallel R example¶
#!/bin/bash
#SBATCH --time=00:20:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=2G
#SBATCH --output=r_parallel.out
# Set the number of OpenMP-threads to 1,
# as we're using parallel for parallelization
export OMP_NUM_THREADS=1
# Load the version of R you want to use
module load r
# Run your R script
srun Rscript r_parallel.R
library(pracma)
library(parallel)
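# Build a random symmetric 2000x2000 matrix, compute its pseudoinverse
# and return the largest element of the product.
invertRandom <- function(index) {
  A <- matrix(runif(2000*2000), ncol=2000, nrow=2000)
  A <- A + t(A)
  B <- pinv(A)
  return(max(B %*% A))
}
# Use the cores allocated by Slurm, defaulting to all cores if unset
ncores <- as.integer(Sys.getenv('SLURM_CPUS_PER_TASK', parallel::detectCores()))
ptm <- proc.time()
mclapply(1:16, invertRandom, mc.cores=ncores)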
proc.time()-ptm
When constrained to the opt architecture, the run times for different core counts were:
ncores | 1 | 2 | 4 | 8
---|---|---|---|---
runtime | 380.757 | 182.185 | 125.526 | 84.230
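The run times above can be turned into speedup and parallel efficiency figures to judge how well the job scales. The sketch below uses the usual definitions (speedup is the single-core run time divided by the run time on n cores, and efficiency is the speedup divided by n). These definitions are not stated in the original text, and the function name speedup_table is hypothetical.

```shell
# Hypothetical helper: read "ncores runtime" pairs from stdin and print
# "ncores runtime speedup efficiency", relative to the first row.
speedup_table() {
    LC_ALL=C awk 'NR == 1 { base = $2 }
        { printf "%d %.3f %.2f %.2f\n", $1, $2, base / $2, base / ($2 * $1) }'
}

# Run times copied from the table above
printf '%s\n' "1 380.757" "2 182.185" "4 125.526" "8 84.230" | speedup_table
```

For these measurements the efficiency drops clearly below 1 at 4 and 8 cores, so asking for more cores shortens the run but with diminishing returns.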