Triton quick reference

This page collects the most important reference information in one place.

Modules

Command | Description
module load NAME | load a module
module avail | list all available modules
module spider NAME | search for modules
module list | list currently loaded modules
module show NAME | show details of a module
module help NAME | show help for a module
module unload NAME | unload a module
module save ALIAS | save the currently loaded module set under this alias (stored in ~/.lmod.d/)
module restore ALIAS | load a saved module set (faster than loading modules individually)
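
A typical module session, as a sketch (the module name is only an example taken from the toolchain list below; check module avail for what is actually installed):

$ module load GCC/5.4.0          # load a compiler module
$ module list                    # verify what is loaded
$ module save mytools            # save this set as "mytools" in ~/.lmod.d/
$ module restore mytools         # reload the same set later, e.g. in a job script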

Storage

Name | Path | Quota | Backup | Locality | Purpose
Home | $HOME or /home/$username/ | 1GB (hard quota) | nightly | all nodes | Small user-specific files, no calculation data.
Work | $WRKDIR or /scratch/work/$username/ | 200GB and 1 million files | no | all nodes | Personal working space for every user: calculation data etc. Quota can be increased on request.
Scratch | /scratch/$dept/$project/ | on request | no | all nodes | Department/group-specific project directories.
Local temp | /tmp/ | limited by disk size | no | single node | Primary (and usually fastest) place for single-node calculation data. Removed once the user's jobs on the node have finished.
XDG runtime directory (ramfs) | $XDG_RUNTIME_DIR | limited by memory | no | single node | Ramfs on the compute nodes; files are cached in memory. For small random-access data (https://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html).
Local persistent | /l/ | varies | no | dedicated group servers only | Persistent local-disk storage on servers purchased for a specific group. Not backed up.
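
As a sketch of how these areas are typically combined in a job: copy input from $WRKDIR to the fast node-local /tmp, compute there, and copy the results back before the job ends (the directory and program names below are placeholders):

# inside a batch job script (placeholder names)
cp $WRKDIR/myproject/input.dat /tmp/
cd /tmp
./my_program input.dat > output.dat
cp /tmp/output.dat $WRKDIR/myproject/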

Partitions

Partition | Max job size | Mem/core (GB) | Tot mem (GB) | Cores/node | Limits | Use
<default> | - | - | - | - | - | If you leave the partition off, all suitable partitions are used (chosen based on time/memory requirements)
debug | 2 nodes | 2.66-12 | 32-256 | 12, 20, 24 | 15 min | testing and debugging, short interactive work; 1 node of each arch
batch | 16 nodes | 2.66-12 | 32-256 | 12, 20, 24 | 5d | primary partition, all serial & parallel jobs
short | 8 nodes | 4-12 | 48-256 | 12, 20, 24 | 4h | short serial & parallel jobs, +96 dedicated CPU cores
hugemem | 1 node | 43 | 1024 | 24 | 3d | huge-memory jobs, 1 node only
gpu | 1 node, 2-8 GPUs | 2-10 | 24-128 | 12 | 5d | GPU computing
gpushort | 4 nodes, 2-8 GPUs | 2-10 | 24-128 | 12 | 4h | GPU computing
interactive | 2 nodes | 5 | 128 | 24 | 1d | for the sinteractive command, longer interactive work

Use slurm partitions to see more details.
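
For example, to test quickly in the debug partition or to send a job to the gpu partition (the script names are placeholders):

$ sbatch -p debug -t 00:10:00 test_job.sh       # short test run
$ sbatch -p gpu --gres=gpu:1 gpu_job.sh         # request one GPU card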

Job submission

Command | Description
sbatch | submit a job to the queue (see standard options below)
srun | within a running job script/environment: run code using the allocated resources (see options below)
srun | on the frontend: submit to the queue, wait until done, show the output (see options below)
sinteractive | submit a job, wait, and provide a shell on the node for interactive work (X forwarding works, default partition interactive). Exit the shell when done. (see options below)
srun --pty bash | (advanced) another way to run interactive jobs; no X forwarding, but simpler. Exit the shell when done.
scancel <jobid> | cancel a job in the queue
salloc | (advanced) allocate resources from the frontend node. Use srun to run with those resources; exit to close the shell when done. Read the description! (see options below)
scontrol | view/modify job and Slurm configuration
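
A minimal interactive session might look like this (the time limit and job ID are only examples):

$ sinteractive -t 1:00:00        # shell on a compute node, X forwarding works
$ srun --pty bash                # simpler alternative, no X forwarding
$ scancel 123456                 # cancel job 123456 if it is no longer needed

The standard options accepted by these commands are listed below.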
Command | Option | Description
sbatch/srun/etc. | -t, --time=hh:mm:ss | time limit
  | -t, --time=dd-hh | time limit, days-hours
  | -p, --partition=partition | job partition; usually leave this off and the partition is chosen automatically
  | --mem-per-cpu=n | request n MB of memory per core
  | --mem=n | request n MB of memory per node
  | -c, --cpus-per-task=n | allocate n CPUs for each task, for multithreaded jobs (compare --ntasks: -c is the number of cores for each process started)
  | -N, --nodes=n-m | allocate a minimum of n and a maximum of m nodes
  | -n, --ntasks=n | allocate resources for and start n tasks (one task = one process started; it is up to you to make them communicate. The main script runs only on the first node; sub-processes started with srun are run this many times.)
  | -J, --job-name=name | short job name
  | -o output | print output into the file output
  | -e error | print errors into the file error
  | --exclusive | allocate exclusive access to nodes; for large parallel jobs
  | --constraint=feature | request a feature (see slurm features for the current list of configured features)
  | --array=0-5,7,10-15 | run the job multiple times; use the variable $SLURM_ARRAY_TASK_ID to adjust parameters
  | --gres=gpu:n | request n GPU cards
  | --gres=spindle:n | request n local disk drives (works in all partitions)
  | --mail-type=type | notify of events: BEGIN, END, FAIL, REQUEUE (not on Triton) or ALL; must be used together with --mail-user=
  | --mail-user=your@email | whom to send the email to
srun | -N <N_nodes> hostname | print the allocated nodes (from within a script)
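
Putting the common options together, a serial or multithreaded batch script could look roughly like this (module and program names are placeholders):

#!/bin/bash
#SBATCH --time=01:00:00          # 1 hour time limit
#SBATCH --mem-per-cpu=500        # 500 MB of memory per core
#SBATCH --cpus-per-task=4        # 4 cores for a multithreaded program
#SBATCH --job-name=myjob
#SBATCH --output=myjob.out

module load GCC/5.4.0            # example module; load whatever your program needs
srun ./my_program                # run with the allocated resources

An array job is submitted the same way with --array=0-5 added, and the script reads $SLURM_ARRAY_TASK_ID to pick its parameters. The commands below help monitor jobs after submission.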
Command | Description
slurm q ; slurm qq | status of your queued jobs (long/short)
slurm partitions | overview of partitions (A/I/O/T = active/idle/other/total)
slurm cpus <partition> | list free CPUs in a partition
slurm history [1day,2hour,…] | show the status of recent jobs
seff <jobid> | show the percentage of memory/CPU used by a job
slurm j <jobid> | job details (only while running)
slurm s ; slurm ss <partition> | show the status of all jobs
sacct | full history information (advanced, needs arguments)
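
For example, after submitting a job (the job ID is hypothetical):

$ slurm q                # what is queued or running
$ slurm history 2hour    # jobs finished in the last two hours
$ seff 123456            # memory/CPU efficiency of a finished job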

Full slurm command help:

$ slurm

Show or watch job queue:
 slurm [watch] queue     show own jobs
 slurm [watch] q   show user's jobs
 slurm [watch] quick     show quick overview of own jobs
 slurm [watch] shorter   sort and compact entire queue by job size
 slurm [watch] short     sort and compact entire queue by priority
 slurm [watch] full      show everything
 slurm [w] [q|qq|ss|s|f] shorthands for above!
 slurm qos               show job service classes
 slurm top [queue|all]   show summary of active users
Show detailed information about jobs:
 slurm prio [all|short]  show priority components
 slurm j|job      show everything else
 slurm steps      show memory usage of running srun job steps
Show usage and fair-share values from accounting database:
 slurm h|history   show jobs finished since, e.g. "1day" (default)
 slurm shares
Show nodes and resources in the cluster:
 slurm p|partitions      all partitions
 slurm n|nodes           all cluster nodes
 slurm c|cpus            total cpu cores in use
 slurm cpus   cores available to partition, allocated and free
 slurm cpus jobs         cores/memory reserved by running jobs
 slurm cpus queue        cores/memory required by pending jobs
 slurm features          List features and GRES

Examples:
 slurm q
 slurm watch shorter
 slurm cpus batch
 slurm history 3hours

Other advanced commands (many require lots of parameters to be useful):

Command | Description
squeue | full info on queues
sinfo | advanced info on partitions
slurm nodes | list all nodes
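
Two commonly useful invocations of the plain Slurm tools:

$ squeue -u $USER        # your own jobs in the queue
$ sinfo -p batch         # node states in the batch partition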

Toolchains

Toolchain | Compiler version | MPI version | BLAS version | ScaLAPACK version | FFTW version | CUDA version
GOOLF Toolchains:
goolf/triton-2016a | GCC/4.9.3 | OpenMPI/1.10.2 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | -
goolf/triton-2016b | GCC/5.4.0 | OpenMPI/1.10.3 | OpenBLAS/0.2.18 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | -
goolfc/triton-2016a | GCC/4.9.3 | OpenMPI/1.10.2 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | 7.5.18
goolfc/triton-2017a | GCC/5.4.0 | OpenMPI/2.0.1 | OpenBLAS/0.2.19 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | 8.0.61
GMPOLF Toolchains:
gmpolf/triton-2016a | GCC/4.9.3 | MPICH/3.0.4 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | -
gmpolfc/triton-2016a | GCC/4.9.3 | MPICH/3.0.4 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | 7.5.18
GMVOLF Toolchains:
gmvolf/triton-2016a | GCC/4.9.3 | MVAPICH2/2.0.1 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | -
gmvolfc/triton-2016a | GCC/4.9.3 | MVAPICH2/2.0.1 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | 7.5.18
IOOLF Toolchains:
ioolf/triton-2016a | icc/2015.3.187 | OpenMPI/1.10.2 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | -
IOMKL Toolchains:
iomkl/triton-2016a | icc/2015.3.187 | OpenMPI/1.10.2 | imkl/11.3.1.150 | imkl/11.3.1.150 | imkl/11.3.1.150 | -
iomkl/triton-2016b | icc/2015.3.187 | OpenMPI/1.10.3 | imkl/11.3.1.150 | imkl/11.3.1.150 | imkl/11.3.1.150 | -
iompi/triton-2017a | icc/2017.1.132 | OpenMPI/2.0.1 | imkl/2017.1.132 | imkl/2017.1.132 | imkl/2017.1.132 | -
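
Toolchains are loaded like any other module and then provide a matching compiler, MPI and math libraries. A sketch for building an MPI program (the source file name is a placeholder):

$ module load goolf/triton-2016b     # GCC + OpenMPI + OpenBLAS + ScaLAPACK + FFTW
$ mpicc -o hello hello_mpi.c         # compile with the toolchain's MPI wrapper
$ srun -n 4 ./hello                  # run inside a job allocation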

Hardware

Node name | Number of nodes | Node type | Arch | CPU type | Memory Configuration | GPUs
pe[1-48,65-81] | 65 | Dell PowerEdge C4130 | hsw | 2x12-core Xeon E5 2680 v3 2.50GHz | 128GB DDR4-2133 | -
pe[49-64,82] | 17 | Dell PowerEdge C4130 | hsw | 2x12-core Xeon E5 2680 v3 2.50GHz | 256GB DDR4-2133 | -
pe[84-91] | 8 | Dell PowerEdge C4130 | bdw | 2x14-core Xeon E5 2680 v4 2.40GHz | 128GB DDR4-2400 | -
ivb[1-24] | 24 | ProLiant SL230s G8 | ivb | 2x10-core Xeon E5 2680 v2 2.80GHz | 256GB DDR3-1667 | -
ivb[25-48] | 24 | ProLiant SL230s G8 | ivb | 2x10-core Xeon E5 2680 v2 2.80GHz | 64GB DDR3-1667 | -
wsm[1-112] | 112 | ProLiant SL390s G7 | wsm | 2x6-core Intel Xeon X5650 2.67GHz | 48GB DDR3-1333 | -
wsm[113-136] | 24 | ProLiant SL390s G7 | wsm | 2x6-core Intel Xeon X5650 2.67GHz | 96GB DDR3-1333 | -
tb[007-009] | 2 | ProLiant SL390s G7 | wsm | 2x6-core Intel Xeon X5650 2.67GHz | 48GB DDR3-1333 | -
gpu[1-8] | 8 | ProLiant SL390s G7 | wsm | 2x6-core Intel Xeon X5650 2.67GHz | 24GB DDR3-1333 | 2x M2090
gpu[9-11] | 3 | ProLiant SL390s G7 | wsm | 2x6-core Intel Xeon X5650 2.67GHz | 48GB DDR3-1333 | 2x M2090
gpu[12-16] | 5 | ProLiant SL390s G7 | wsm | 2x6-core Intel Xeon X5650 2.67GHz | 24GB DDR3-1333 | 2x M2050
gpu[17-19] | 3 | ProLiant SL390s G7 | wsm | 2x6-core Intel Xeon X5650 2.67GHz | 24GB DDR3-1333 | 2x M2070
gpu[20-22] | 3 | Dell PowerEdge C4130 | hsw | 2x6-core Xeon E5 2620 v3 2.50GHz | 128GB DDR4-2133 | 4x2-GPU K80
gpu[23-27] | 5 | Dell PowerEdge C4130 | hsw | 2x12-core Xeon E5-2680 v3 2.5GHz | 256GB DDR4-2400 | 4x P100
dgx[01-02] | 2 | Nvidia DGX-1 | bdw | 2x20-core Xeon E5-2698 v4 2.2GHz | 512GB DDR4-2133 | 8x V100

Node type | CPU count
48GB Xeon Westmere (2012) | 1404
24GB Xeon Westmere + 2x GPU (2012) | 120
96GB Xeon Westmere (2012) | 288
1TB Xeon Westmere (2012) | 48
256GB Xeon Ivy Bridge (2014) | 480
64GB Xeon Ivy Bridge (2014) | 480
128GB Xeon Haswell (2016) | 1224
256GB Xeon Haswell (2016) | 360
128GB Xeon Haswell + 4x GPU (2016) | 36