Triton quick reference
This page collects the most important reference information for the Triton cluster at Aalto University; it is also useful for many other Slurm clusters. See also this printable Triton cheatsheet, as well as other cheatsheets.
Connecting
See also: Connecting to Triton.
| Method | Description | From where? |
|---|---|---|
| ssh from Aalto networks | Standard way of connecting via the command line. The hostname is `triton.aalto.fi`. Linux/Mac/Windows from the command line; see Connecting via ssh for details and options. | VPN and Aalto networks (which is VPN, most wired connections, internal servers) |
| ssh (from rest of Internet) | Use the Aalto VPN and the row above. If needed: same as above, but you must first set up an SSH key. | Whole Internet, if you first set up an SSH key and also use your password (since 2023) |
| VDI | "Virtual desktop interface", https://vdi.aalto.fi; from there, connect to Triton with ssh as above. | Whole Internet |
| Jupyter | https://jupyter.triton.aalto.fi provides the Jupyter interface directly on Triton (including the command line). Get a terminal with "New → Other → Terminal". More info. | Whole Internet |
| Open OnDemand | https://ood.triton.aalto.fi, a web-based interface to the cluster. Includes shell access and data transfer. "Triton Shell Access" gives the terminal. More info. | VPN and Aalto networks |
| VSCode | Web-based VS Code is available via Open OnDemand (row above). The desktop "Remote SSH" extension allows running on Triton (which is OK, but don't use it for large computation). More info. | Same as Open OnDemand or ssh above |
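For example, connecting with ssh from a Linux/Mac/Windows command line might look like this sketch (USERNAME is a placeholder for your Aalto username):

# From an Aalto network or the Aalto VPN; see Connecting to Triton for SSH key setup.
ssh USERNAME@triton.aalto.fi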
Modules
See also: Software modules.
| Command | Description |
|---|---|
| `module load NAME` | load a module |
| `module avail` | list all modules |
| `module spider NAME` | search modules |
| `module list` | list currently loaded modules |
| `module show NAME` | details on a module |
| `module help NAME` | details on a module |
| `module unload NAME` | unload a module |
| `module save ALIAS` | save the currently loaded modules as a collection under this alias |
| `module savelist` | list all saved collections |
| `module describe ALIAS` | details on a collection |
| `module restore ALIAS` | load a saved module collection (faster than loading modules individually) |
| `module purge` | unload all loaded modules (faster than unloading individually) |
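As a sketch of a typical session (the anaconda module is only an example; see Common software below):

module load anaconda       # load a module
module list                # see what is loaded now
module save mymodules      # save the current set as the collection "mymodules"
module purge               # unload everything
module restore mymodules   # bring the same set back in one step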
Common software
See also: Applications.
- Python: `module load anaconda` for the Anaconda distribution of Python 3, including a lot of useful packages. More info.
- R: `module load r` for a basic R installation. More info.
- Matlab: `module load matlab` for the latest Matlab version. More info.
- Julia: `module load julia` for the latest Julia version. More info.
Storage
See also: Data storage.

| Name | Path | Quota | Backup | Locality | Purpose |
|---|---|---|---|---|---|
| Home | | hard quota 10GB | Nightly | all nodes | Small user-specific files, no calculation data. |
| Work | | 200GB and 1 million files | x | all nodes | Personal working space for every user. Calculation data etc. Quota can be increased on request. |
| Scratch | | on request | x | all nodes | Department/group specific project directories. |
| Local temp | | limited by disk size | x | single-node | Primary (and usually fastest) place for single-node calculation data. Removed once the user's jobs are finished on the node. |
| Local persistent | | varies | x | dedicated group servers only | Local disk persistent storage, on servers purchased for a specific group. Not backed up. |
| ramfs (login nodes only) | | limited by memory | x | single-node | In-memory filesystem (ramfs) on the login node only. |
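As a sketch of how node-local temp space is typically used in a job on a node that has a local disk (the /tmp path, program name, and file names below are assumptions; check the Data storage page for the actual local-temp path on Triton):

#!/bin/bash -l
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=1G

# Assumption: node-local temp space is available under /tmp.
workdir=$(mktemp -d /tmp/$USER.XXXXXX)

# Copy inputs to the fast local disk, compute there, copy results back.
cp input.dat "$workdir/"
cd "$workdir"
my_program input.dat > output.dat      # my_program is a placeholder
cp output.dat "$SLURM_SUBMIT_DIR/"

# Clean up; local temp is in any case removed when your jobs on the node finish.
rm -rf "$workdir"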
Remote data access
See also: Remote access to data.
| Method | Description |
|---|---|
| rsync transfers | Transfer data back and forth via the command line. Set up ssh first. |
| SFTP transfers | Operates over SSH: sftp://triton.aalto.fi in file browsers (Linux at least), or a graphical client such as FileZilla. |
| SMB mounting | Mount (make remote data viewable locally) on your own computer. Linux: file browser; MacOS: file browser, same URL as Linux; Windows: map a network drive (see Remote access to data for the address). |
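For example, a typical rsync transfer run on your own computer might look like this (USERNAME and both paths are placeholders):

# Copy a local directory to Triton; -a preserves metadata, -z compresses,
# -P shows progress and allows resuming interrupted transfers.
rsync -azP my_results/ USERNAME@triton.aalto.fi:/path/on/triton/my_results/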
Partitions
| Partition | Max job size | Mem/core (GB) | Tot mem (GB) | Cores/node | Limits | Use |
|---|---|---|---|---|---|---|
| <default> | If you leave the partition off, all possible partitions will be used (based on time/mem) | | | | | |
| debug | 2 nodes | 2.66 - 12 | 32-256 | 12,20,24 | 15 min | testing and debugging, short interactive work. 1 node of each arch. |
| batch | 16 nodes | 2.66 - 12 | 32-510 | 12,20,24,40,128 | 5d | primary partition, all serial & parallel jobs |
| short | 8 nodes | 4 - 12 | 48-256 | 12,20,24 | 4h | short serial & parallel jobs, +96 dedicated CPU cores |
| hugemem | 1 node | 43 | 1024 | 24 | 3d | huge memory jobs, 1 node only |
| gpu | 1 node, 2-8 GPUs | 2 - 10 | 24-128 | 12 | 5d | long GPU jobs |
| gpushort | 4 nodes, 2-8 GPUs | 2 - 10 | 24-128 | 12 | 4h | short GPU jobs |
| interactive | 2 nodes | 5 | 128 | 24 | 1d | interactive work (default partition for `sinteractive`) |
Use `slurm partitions` to see more details.
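For example, to send a quick test to the debug partition (the script name is a placeholder):

# 10 minutes on the debug partition, which is meant for short tests.
sbatch --partition=debug --time=00:10:00 test_script.sh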
Job submission
See also: Serial Jobs, Array jobs: embarrassingly parallel execution, Parallel computing: different methods explained.
| Command | Description |
|---|---|
| `sbatch SCRIPT.sh` | submit a job to the queue (see standard options below) |
| `srun PROGRAM` | within a running job script/environment: run PROGRAM using the allocated resources (see options below) |
| `srun PROGRAM` | on the frontend: submit to the queue, wait until done, show the output (see options below) |
| `sinteractive` | submit a job, wait, and provide a shell on the node for interactive playing (X forwarding works, default partition interactive). Exit the shell when done. (see options below) |
| `srun --pty bash` | (advanced) another way to run interactive jobs, no X forwarding but simpler. Exit the shell when done. |
| `scancel JOBID` | cancel a job in the queue |
| `salloc` | (advanced) allocate resources from the frontend node; use `srun` to run commands inside the allocation |
| `scontrol` | view/modify job and Slurm configuration |
These options can be given on the `sbatch`/`srun`/`salloc` command line or as `#SBATCH` lines in a batch script:

| Option | Description |
|---|---|
| `--time=HH:MM:SS` | time limit |
| `--time=DD-HH` | time limit, days-hours |
| `--partition=PARTITION` | job partition. Usually leave off and things are auto-detected. |
| `--mem-per-cpu=N` | request N MB of memory per core |
| `--mem=N` | request N MB of memory per node |
| `--cpus-per-task=N` (`-c N`) | allocate N CPUs for each task. For multithreaded jobs. (Compare `--ntasks`: `-c` means the number of cores for each process started.) |
| `--nodes=N-M` | allocate a minimum of N, maximum of M nodes |
| `--ntasks=N` | allocate resources for and start N tasks (one task = one process started; it is up to you to make them communicate. However, the main script runs only on the first node; the sub-processes run with `srun` are run this many times.) |
| `--job-name=NAME` | short job name |
| `--output=OUTPUTFILE` | print output into the file OUTPUTFILE |
| `--error=ERRORFILE` | print errors into the file ERRORFILE |
| `--exclusive` | allocate exclusive access to nodes. For large parallel jobs. |
| `--constraint=FEATURE` | request a feature (see `slurm features` for the list) |
| `--array=0-5,7,14` | run the job multiple times; use the variable `$SLURM_ARRAY_TASK_ID` to tell the instances apart |
| `--gres=gpu:N` | request one GPU (N=1), or N GPUs |
| | request nodes that have local disks |
| `--mail-type=TYPE` | notify of events: TYPE can be BEGIN, END, FAIL, or ALL |
| `--mail-user=EMAIL` | whom to send the email to |
| `$SLURM_JOB_NODELIST` | contains the allocated nodes (available from within the script) |
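Putting a few of these options together, a batch script might look like the following sketch (the job name, script name, and thread-count handling are placeholders):

#!/bin/bash -l
#SBATCH --time=0-12                  # 12 hours, in days-hours format
#SBATCH --mem-per-cpu=2G
#SBATCH --cpus-per-task=4            # multithreaded program: 4 cores for one task
#SBATCH --job-name=myanalysis
#SBATCH --output=myanalysis_%j.out   # %j is replaced by the job ID

module load anaconda
# Many multithreaded programs read OMP_NUM_THREADS; whether yours does is an assumption.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun python my_script.py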
| Command | Description |
|---|---|
| `slurm queue` / `slurm q` | status of your queued jobs (long/short form) |
| `slurm partitions` | overview of partitions (A/I/O/T = active, idle, other, total) |
| `slurm cpus PARTITION` | list free CPUs in a partition |
| `slurm history` | show the status of recent jobs |
| `seff JOBID` | show the percent of mem/CPU used in a job. See Monitoring. |
| | show GPU efficiency |
| `scontrol show job JOBID` | job details (only while the job is running) |
| `squeue` | show the status of all jobs |
| `sacct` | full history information (advanced, needs arguments) |
Full slurm command help:
$ slurm
Show or watch job queue:
slurm [watch] queue show own jobs
slurm [watch] q show user's jobs
slurm [watch] quick show quick overview of own jobs
slurm [watch] shorter sort and compact entire queue by job size
slurm [watch] short sort and compact entire queue by priority
slurm [watch] full show everything
slurm [w] [q|qq|ss|s|f] shorthands for above!
slurm qos show job service classes
slurm top [queue|all] show summary of active users
Show detailed information about jobs:
slurm prio [all|short] show priority components
slurm j|job show everything else
slurm steps show memory usage of running srun job steps
Show usage and fair-share values from accounting database:
slurm h|history show jobs finished since, e.g. "1day" (default)
slurm shares
Show nodes and resources in the cluster:
slurm p|partitions all partitions
slurm n|nodes all cluster nodes
slurm c|cpus total cpu cores in use
slurm cpus cores available to partition, allocated and free
slurm cpus jobs cores/memory reserved by running jobs
slurm cpus queue cores/memory required by pending jobs
slurm features List features and GRES
Examples:
slurm q
slurm watch shorter
slurm cpus batch
slurm history 3hours
Other advanced commands (many require lots of parameters to be useful):
| Command | Description |
|---|---|
| `squeue` | full info on queues |
| `sinfo` | advanced info on partitions |
| `sinfo -N` | list all nodes |
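As a sketch of using `sacct` for job history (the start date and the list of fields are only examples):

# Jobs since the given date, with a few useful accounting fields.
sacct --starttime=2024-01-01 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS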
Slurm examples
See also: Serial Jobs, Array jobs: embarrassingly parallel execution.
Simple batch script, submit with `sbatch the_script.sh`:
#!/bin/bash -l
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=1G
module load anaconda
python my_script.py
Simple batch script with an array (can also submit with `sbatch --array=1-10 the_script.sh`):
#!/bin/bash -l
#SBATCH --array=1-10
python my_script.py --seed=$SLURM_ARRAY_TASK_ID
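A GPU job looks much the same; here is a sketch (the module and script names are placeholders, and `--gres=gpu:1` requests one GPU as in the options above):

#!/bin/bash -l
#SBATCH --time=04:00:00
#SBATCH --mem-per-cpu=4G
#SBATCH --gres=gpu:1            # one GPU of any available type

module load anaconda            # placeholder: load whatever your code needs
python my_gpu_script.py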
Toolchains
| Toolchain | Compiler version | MPI version | BLAS version | ScaLAPACK version | FFTW version | CUDA version |
|---|---|---|---|---|---|---|
| GOOLF Toolchains: | | | | | | |
| goolf/triton-2016a | GCC/4.9.3 | OpenMPI/1.10.2 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | |
| goolf/triton-2016b | GCC/5.4.0 | OpenMPI/1.10.3 | OpenBLAS/0.2.18 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | |
| goolfc/triton-2016a | GCC/4.9.3 | OpenMPI/1.10.2 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | 7.5.18 |
| goolfc/triton-2017a | GCC/5.4.0 | OpenMPI/2.0.1 | OpenBLAS/0.2.19 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | 8.0.61 |
| GMPOLF Toolchains: | | | | | | |
| gmpolf/triton-2016a | GCC/4.9.3 | MPICH/3.0.4 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | |
| gmpolfc/triton-2016a | GCC/4.9.3 | MPICH/3.0.4 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | 7.5.18 |
| GMVOLF Toolchains: | | | | | | |
| gmvolf/triton-2016a | GCC/4.9.3 | MVAPICH2/2.0.1 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | |
| gmvolfc/triton-2016a | GCC/4.9.3 | MVAPICH2/2.0.1 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | 7.5.18 |
| IOOLF Toolchains: | | | | | | |
| ioolf/triton-2016a | icc/2015.3.187 | OpenMPI/1.10.2 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | |
| IOMKL Toolchains: | | | | | | |
| iomkl/triton-2016a | icc/2015.3.187 | OpenMPI/1.10.2 | imkl/11.3.1.150 | imkl/11.3.1.150 | imkl/11.3.1.150 | |
| iomkl/triton-2016b | icc/2015.3.187 | OpenMPI/1.10.3 | imkl/11.3.1.150 | imkl/11.3.1.150 | imkl/11.3.1.150 | |
| iompi/triton-2017a | icc/2017.1.132 | OpenMPI/2.0.1 | imkl/2017.1.132 | imkl/2017.1.132 | imkl/2017.1.132 | |
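As a sketch of using one of these toolchains to build and test an MPI program (hello_mpi.c is a placeholder source file; a production run would normally go through sbatch instead):

# Load a toolchain and compile with the matching MPI wrapper.
module load goolf/triton-2016b
mpicc hello_mpi.c -o hello_mpi

# Quick test run with 4 MPI tasks.
srun --ntasks=4 --time=00:10:00 --mem-per-cpu=500M ./hello_mpi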
Hardware
See also: Cluster technical overview.
| Node name | Number of nodes | Node type | Year | Arch | CPU type | Memory configuration | Infiniband | GPUs | Disks |
|---|---|---|---|---|---|---|---|---|---|
| pe[1-48,65-81] | 65 | Dell PowerEdge C4130 | 2016 | hsw avx avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 128GB DDR4-2133 | FDR | | 900GB HDD |
| pe[49-64,82] | 17 | Dell PowerEdge C4130 | 2016 | hsw avx avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 256GB DDR4-2133 | FDR | | 900GB HDD |
| pe[83-91] | 8 | Dell PowerEdge C4130 | 2017 | bdw avx avx2 | 2x14 core Xeon E5 2680 v4 2.40GHz | 128GB DDR4-2400 | FDR | | 900GB HDD |
| c[639-647,649-653,655-656,658] | 17 | ProLiant XL230a Gen9 | 2017 | hsw avx avx2 | 2x12 core Xeon E5 2690 v3 2.60GHz | 128GB DDR4-2666 | FDR | | 450G HDD |
| skl[1-48] | 48 | Dell PowerEdge C6420 | 2019 | skl avx avx2 avx512 | 2x20 core Xeon Gold 6148 2.40GHz | 192GB DDR4-2667 | EDR | | No disk |
| csl[1-48] | 48 | Dell PowerEdge C6420 | 2020 | csl avx avx2 avx512 | 2x20 core Xeon Gold 6248 2.50GHz | 192GB DDR4-2667 | EDR | | No disk |
| milan[1-32] | 32 | Dell PowerEdge C6525 | 2023 | milan avx avx2 | 2x64 core AMD EPYC 7713 @2.0 GHz | 512GB DDR4-3200 | HDR-100 | | No disk |
| fn3 | 1 | Dell PowerEdge R940 | 2020 | avx avx2 avx512 | 4x20 core Xeon Gold 6148 2.40GHz | 2TB DDR4-2666 | EDR | | No disk |
| gpu[1-10] | 10 | Dell PowerEdge C4140 | 2020 | skl avx avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | EDR | 4x V100 32GB | 1.5 TB SSD |
| gpu[11-17,38-44] | 14 | Dell PowerEdge XE8545 | 2021, 2023 | milan avx avx2 a100 | 2x24 core AMD EPYC 7413 @ 2.65GHz | 503GB DDR4-3200 | EDR | 4x A100 80GB | 440 GB SSD |
| gpu[20-22] | 3 | Dell PowerEdge C4130 | 2016 | hsw avx avx2 kepler | 2x6 core Xeon E5 2620 v3 2.50GHz | 128GB DDR4-2133 | EDR | 4x2 GPU K80 | 440 GB SSD |
| gpu[23-27] | 5 | Dell PowerEdge C4130 | 2017 | hsw avx avx2 pascal | 2x12 core Xeon E5-2680 v3 @ 2.5GHz | 256GB DDR4-2400 | EDR | 4x P100 | 720 GB SSD |
| gpu[28-37] | 10 | Dell PowerEdge C4140 | 2019 | skl avx avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | EDR | 4x V100 32GB | 1.5 TB SSD |
| dgx[1-2] | 2 | Nvidia DGX-1 | 2018 | bdw avx avx2 volta | 2x20 core Xeon E5-2698 v4 @ 2.2GHz | 512GB DDR4-2133 | EDR | 8x V100 16GB | 7 TB SSD |
| dgx[3-7] | 5 | Nvidia DGX-1 | 2018 | bdw avx avx2 volta | 2x20 core Xeon E5-2698 v4 @ 2.2GHz | 512GB DDR4-2133 | EDR | 8x V100 32GB | 7 TB SSD |
| gpuamd1 | 1 | Dell PowerEdge R7525 | 2021 | rome avx avx2 mi100 | 2x8 core AMD EPYC 7262 @3.2GHz | 250GB DDR4-3200 | EDR | 3x MI100 | 32GB SSD |
| Node type | CPU count |
|---|---|
| 48GB Xeon Westmere (2012) | 1404 |
| 24GB Xeon Westmere + 2x GPU (2012) | 120 |
| 96GB Xeon Westmere (2012) | 288 |
| 1TB Xeon Westmere (2012) | 48 |
| 256GB Xeon Ivy Bridge (2014) | 480 |
| 64GB Xeon Ivy Bridge (2014) | 480 |
| 128GB Xeon Haswell (2016) | 1224 |
| 256GB Xeon Haswell (2016) | 360 |
| 128GB Xeon Haswell + 4x GPU (2016) | 36 |
GPUs
See also: GPU computing.
| Card | Slurm feature name (`--constraint=`) | Slurm gres name (`--gres=gpu:NAME:n`) | Total amount | Nodes | Architecture | Compute threads per GPU | Memory per card | CUDA compute capability |
|---|---|---|---|---|---|---|---|---|
| Tesla K80* | | | 12 | gpu[20-22] | Kepler | 2x2496 | 2x12GB | 3.7 |
| Tesla P100 | | | 20 | gpu[23-27] | Pascal | 3854 | 16GB | 6.0 |
| Tesla V100 | | | 40 | gpu[1-10] | Volta | 5120 | 32GB | 7.0 |
| Tesla V100 | | | 40 | gpu[28-37] | Volta | 5120 | 32GB | 7.0 |
| Tesla V100 | | | 16 | dgx[1-7] | Volta | 5120 | 16GB | 7.0 |
| Tesla A100 | | | 56 | gpu[11-17,38-44] | Ampere | 7936 | 80GB | 8.0 |
| AMD MI100 (testing) | | | | gpuamd[1] | | | | |
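A minimal sketch of grabbing one GPU for a quick interactive check (the gpushort partition and the options come from the tables above; the exact feature/gres names per card type are on the GPU computing page):

# One GPU for half an hour; nvidia-smi shows which card you received.
srun --partition=gpushort --gres=gpu:1 --time=00:30:00 --mem=4G nvidia-smi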
Conda
See also: Python Environments with Conda.

- A module provides Miniconda on Triton; loading it is the recommended way to use conda environments.
- See the link above for the six commands to run once per user account on Triton (to avoid filling up all the space in your home directory).

A minimal environment.yml file:

name: conda-example
channels:
  - conda-forge
dependencies:
  - numpy
  - pandas
Environment management:

| Command | Description |
|---|---|
| `conda env create -f environment.yml` | Create an environment from a yaml file. |
| `source activate NAME` | Activate the environment named NAME. (Note: on Triton this is used instead of `conda activate`.) |
| `source deactivate` | Deactivate conda for this session. |
| `conda env list` | List all environments. |
| `conda env remove -n NAME` | Remove the environment of that name. |

Package management (inside the activated environment):

| Command | Description |
|---|---|
| `conda list` | List packages in the currently active environment. |
| `conda install PACKAGE` | Install packages into an environment with minimal changes to what is already installed. If the package is a dependency, you usually want to add it to environment.yml instead; better: add it to environment.yml and see the next line. |
| `conda env update -f environment.yml` | Update the environment based on environment.yml. |
| `conda env export` | Export an environment.yml that describes the current environment. |
| `conda search PACKAGE` | Search for a package. The list includes name, version, build version (often including linked libraries like Python/CUDA), and channel. |
Other:

| Command | Description |
|---|---|
| `conda clean -a` | Clean up cached files to free up space (not environments or the packages in them). |
| `CONDA_OVERRIDE_CUDA=VERSION` | Used when making a CUDA environment on a login node (choose the right CUDA version for you); set it together with the environment-creation command. |
| | Package selection for tensorflow. |
| | Package selection for pytorch. |
| CUDA | In the channel conda-forge, CUDA is automatically selected based on the software you need; for manual compilation, see the Python Environments with Conda page. |
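Putting these together, a typical first-time workflow might look like the following sketch (the module name `miniconda` is an assumption; check the Python Environments with Conda page for the exact module to load on Triton):

module load miniconda                  # assumed module name
conda env create -f environment.yml    # creates the "conda-example" environment above
source activate conda-example
python -c "import numpy; print(numpy.__version__)"
source deactivate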
Command line
See also: Linux shell crash course.
General notes:
- The command line has many small programs that, when connected together, allow you to do many things. Only a little bit of this is shown here.
- Programs are generally silent if everything worked, and only print an error if something goes wrong.
- `ls [DIR]`: List the current directory (or DIR if given).
- `pwd`: Print the current directory.
- `cd DIR`: Change directory. `..` is the parent directory, `/` is the root, and `/` also chains directories, e.g. `dir1/dir2` or `../../`.
- `nano FILE`: Edit a file (there are many other editors, but nano is common, nice, and simple).
- `mkdir DIR-NAME`: Make a new directory.
- `cat FILE`: Print the entire contents of a file to standard output (the terminal).
- `less FILE`: less is a "pager" that lets you scroll through a file (up/down/pageup/pagedown). `q` to quit, `/` to search.
- `mv SOURCE DEST`: Move (= rename) a file. `mv SOURCE1 SOURCE2 DEST-DIRECTORY/` moves multiple files into a directory.
- `cp SOURCE DEST`: Copy a file. The `DEST-DIRECTORY/` syntax of mv works as well.
- `rm FILE ...`: Remove a file. Note: from the command line there is no recovery, so always pause and check before running this command! The `-i` option makes it confirm before removing each file. Add `-r` to remove whole directories recursively.
- `head [FILE]`: Print the first 10 lines (or N lines with `-n N`) of a file. Can take input from standard input instead of FILE. tail is similar but shows the end of the file.
- `tail [FILE]`: See above.
- `grep PATTERN [FILE]`: Print lines matching a pattern in a file; suitable as a primitive find feature, or for quickly searching output. Can also use standard input instead of FILE.
- `du [-ash] [DIR]`: Print the disk usage of a directory. The default unit is KiB, rounded up to block sizes (1 or 4 KiB); `-h` means "human readable" (MB, GB, etc.), `-s` means "only DIR itself, not all subdirectories", and `-a` means "all files, not only directories". A common pattern is `du -h DIR | sort -h` to print all directories and their sizes, sorted by size.
- `stat FILE`: Show detailed information on a file's properties.
- `find [DIR]`: find can do almost anything, but that means it's really hard to use well. Let's be practical: with only a directory argument, it prints all files and directories recursively, which might be useful by itself. Many of us do `find DIR | grep NAME` to grep for the name we want (even though this isn't the "right way", there are find options which do the same thing more efficiently).
- `|` (pipe): `COMMAND1 | COMMAND2`: The output of COMMAND1 is sent to the input of COMMAND2. Useful for combining simple commands together into complex operations; a core part of the Unix philosophy.
- `>` (output redirection): `COMMAND > FILE`: Write the standard output of COMMAND to FILE. Any existing content is lost.
- `>>` (appending output redirection): `COMMAND >> FILE`: Like above, but doesn't lose existing content: it appends.
- `<` (input redirection): `COMMAND < FILE`: The opposite of `>`: input to COMMAND comes from FILE.
- `type COMMAND` or `which COMMAND`: Show exactly what will be run for a given command (e.g. `type python3`).
- `man COMMAND-NAME`: Browse the on-line help for a command. `q` will exit, `/` will search (it uses less as its pager by default).
- `-h` and `--help`: Common command line options to print help on a command. But they have to be implemented by each command.
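For example, combining a few of the commands above (~/project and the pattern "results" are placeholders):

# Show disk usage inside a directory, human-readable, sorted, biggest last.
du -h ~/project | sort -h | tail

# Save the names of all files with "results" in their path to a file.
find ~/project | grep results > results_files.txt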