Triton quick reference
Quick reference guide for the Triton cluster at Aalto University, collecting the most important information in one place. Much of it is also useful for other Slurm clusters. See also the printable Triton cheatsheet, as well as other cheatsheets.
Connecting
See also: Connecting to Triton.
| Method | Description | From where? |
|---|---|---|
| ssh (command line) | Standard way of connecting via the command line. Hostname is triton.aalto.fi. Linux/Mac/Windows: run ssh from a terminal; see Connecting via ssh for details and options. (See the example below the table.) | VPN and Aalto networks (which is VPN, most wired connections, internal servers) |
| ssh (from outside Aalto) | Use the Aalto VPN and the row above. If needed: same as above, but you must first set up an SSH key. | Whole Internet, if you first set up an SSH key (passwords alone do not work since 2023) |
| Open OnDemand | https://ondemand.triton.aalto.fi, web-based interface to the cluster. Also known as OOD. Includes shell access, GUI, data transfer, Jupyter and a number of GUI applications like Matlab etc. More info. | Whole Internet |
| Jupyter | Since April 2024 Jupyter is part of Open OnDemand, see above. Use the "Jupyter" app to get the same environment as before. More info. | See Open OnDemand above |
| VS Code | With the "Remote-SSH" extension it can provide shell access and file transfer. See the VS Code page for some important usage warnings. | Same as the SSH options above. |
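For example, connecting from a terminal on an Aalto network or the VPN might look like this (USERNAME is a placeholder for your Aalto username; see Connecting to Triton for the full details):

ssh USERNAME@triton.aalto.fi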
Modules
See also: Software modules.
| Command | Description |
|---|---|
| module load NAME | load a module |
| module avail | list all modules |
| module spider NAME | search modules |
| module spider NAME/VERSION | show prerequisite modules for this one |
| module list | list currently loaded modules |
| module show NAME | details on a module |
| module help NAME | details on a module |
| module unload NAME | unload a module |
| module save ALIAS | save the current module collection under this alias |
| module savelist | list all saved collections |
| module describe ALIAS | details on a collection |
| module restore ALIAS | load a saved module collection (faster than loading modules individually) |
| module purge | unload all loaded modules (faster than unloading individually) |
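As a sketch of a typical module workflow (matlab is used here only as an example name; any module listed by module avail works the same way):

# find available versions
module spider matlab
# load one, check what is loaded, then clean up when done
module load matlab
module list
module purge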
Common software
See also: Applications.
- Python: module load scicomp-python-env for an Aalto Scientific Computing managed Python environment with common packages (More info), or module load mamba for mamba/conda, for making your own environments (see below). A short example follows this list.
- R: module load r for basic R (More info), or module load scicomp-r-env for an R module with various packages pre-installed.
- Matlab: module load matlab for the latest Matlab version (More info).
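For example, a minimal way to run a Python script with the managed environment (my_script.py stands in for your own script):

module load scicomp-python-env
python my_script.py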
Storage
See also: Data storage
| Name | Path | Quota | Backup | Sharing across | Purpose |
|---|---|---|---|---|---|
| Home |  | hard quota 10GB | Nightly | all nodes | Small user-specific files, no calculation data. |
| Work |  | 200GB and 1 million files | x | all nodes | Personal working space for every user. Calculation data etc. Quota can be increased on request. |
| Scratch |  | on request | x | all nodes | Department/group-specific project directories. |
| Local temp |  | local disk size | x | single node | (Usually fastest) place for single-node calculation data. Removed once the user's jobs on the node finish. Request with --tmp. |
| ramfs |  | limited by memory | x | single node | Very fast but small in-memory filesystem. |
Remote data access
See also: Remote access to data.
| Method | Description |
|---|---|
| rsync transfers | Transfer back and forth via the command line. Set up ssh first. |
| SFTP transfers | Operates over SSH. Use sftp://triton.aalto.fi in file browsers (Linux at least) or a client such as FileZilla. |
| SMB mounting | Mount (make remote data viewable locally) on your own computer. Linux: file browser; MacOS: file browser, same URL as Linux; Windows: see Remote access to data. |
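As a sketch of an rsync transfer (USERNAME and REMOTE_DIR are placeholders; pick the actual destination from the storage table above):

# copy a local directory to Triton; -a preserves attributes, -z compresses, -P resumes and shows progress
rsync -azP mydata/ USERNAME@triton.aalto.fi:REMOTE_DIR/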
Partitions
| Partition | Max job size | Mem/core (GB) | Tot mem (GB) | Cores/node | Limits | Use |
|---|---|---|---|---|---|---|
| <default> |  |  |  |  |  | If you leave the partition unspecified, all possible partitions are used (based on time/memory requested). |

Use slurm partitions to see more details.
Job submission
See also: Serial Jobs, Array jobs: embarrassingly parallel execution, Parallel computing: different methods explained.
| Command | Description |
|---|---|
| sbatch SCRIPT.sh | submit a job to the queue (see standard options below) |
| srun COMMAND | within a running job script/environment: run code using the allocated resources (see options below) |
| srun COMMAND | on the frontend: submit to the queue, wait until done, show output (see options below) |
| sinteractive | submit a job, wait, provide a shell on a node for interactive playing (X forwarding works, default partition interactive). Exit the shell when done. (see options below) |
| srun --pty bash | (advanced) another way to run interactive jobs, no X forwarding but simpler. Exit the shell when done. |
| scancel JOBID | cancel a job in the queue |
| salloc | (advanced) allocate resources from the frontend node; use srun to run commands inside the allocation |
| scontrol | view/modify job and Slurm configuration |
| Command | Option | Description |
|---|---|---|
| sbatch/srun | --time=HH:MM:SS | time limit |
|  | --time=DD-HH | time limit, days-hours |
| sbatch/srun | --partition=PARTITION | job partition. Usually leave off and things are auto-detected. |
| sbatch/srun | --mem-per-cpu=N | request N MB of memory per core |
| sbatch/srun | --mem=N | request N MB of memory per node |
| sbatch/srun | --cpus-per-task=N | allocate N CPUs for each task. For multithreaded jobs. (Compare --ntasks: -c means the number of cores for each process started.) |
| sbatch/srun | --nodes=N-M | allocate a minimum of N and a maximum of M nodes |
| sbatch/srun | --ntasks=N | allocate resources for and start N tasks (one task = one process started; it is up to you to make them communicate. The main script runs only on the first node, but sub-processes started with srun run this many times.) |
| sbatch/srun | --job-name=NAME | short job name |
| sbatch/srun | --output=FILE | print output into file FILE |
| sbatch/srun | --error=FILE | print errors into file FILE |
| sbatch/srun | --exclusive | allocate exclusive access to nodes. For large parallel jobs. |
| sbatch/srun | --constraint=FEATURE | request a node feature (see slurm features) |
| sbatch/srun |  | request nodes that have local disks |
| sbatch | --array=1-10 | run the job multiple times; use the variable $SLURM_ARRAY_TASK_ID to tell the instances apart |
| sbatch/srun | --gres=gpu:1 | request a GPU, or a specific GPU type (see the GPUs table below) |
| sbatch/srun | --mail-type=TYPE | notify of events: BEGIN, END, FAIL or ALL |
| sbatch/srun | --mail-user=EMAIL | whom to send the email |
|  |  | print allocated nodes (from within a script) |
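Combining a few of these options, a quick test run from the frontend might look like this (the resource values are only placeholders):

srun --time=00:10:00 --mem-per-cpu=500M --cpus-per-task=2 hostname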
| Command | Description |
|---|---|
| slurm queue / slurm q | Status of your queued jobs (long/short) |
| slurm partitions | Overview of partitions (A/I/O/T = active, idle, other, total) |
| slurm cpus PARTITION | List free CPUs in a partition |
| slurm history | Show status of recent jobs |
| seff JOBID | Show percent of mem/CPU used in a job. See Monitoring. |
|  | Show GPU efficiency |
| slurm j JOBID | Job details (only while running) |
|  | Show status of all jobs |
| sacct | Full history information (advanced, needs args) |
Full slurm command help:
$ slurm
Show or watch job queue:
slurm [watch] queue show own jobs
slurm [watch] q show user's jobs
slurm [watch] quick show quick overview of own jobs
slurm [watch] shorter sort and compact entire queue by job size
slurm [watch] short sort and compact entire queue by priority
slurm [watch] full show everything
slurm [w] [q|qq|ss|s|f] shorthands for above!
slurm qos show job service classes
slurm top [queue|all] show summary of active users
Show detailed information about jobs:
slurm prio [all|short] show priority components
slurm j|job show everything else
slurm steps show memory usage of running srun job steps
Show usage and fair-share values from accounting database:
slurm h|history show jobs finished since, e.g. "1day" (default)
slurm shares
Show nodes and resources in the cluster:
slurm p|partitions all partitions
slurm n|nodes all cluster nodes
slurm c|cpus total cpu cores in use
slurm cpus cores available to partition, allocated and free
slurm cpus jobs cores/memory reserved by running jobs
slurm cpus queue cores/memory required by pending jobs
slurm features List features and GRES
Examples:
slurm q
slurm watch shorter
slurm cpus batch
slurm history 3hours
Other advanced commands (many require lots of parameters to be useful):
| Command | Description |
|---|---|
| squeue | Full info on queues |
| sinfo | Advanced info on partitions |
| sinfo -N | List all nodes |
Slurm examples
See also: Serial Jobs, Array jobs: embarrassingly parallel execution.
Simple batch script, submit with sbatch the_script.sh:
#!/bin/bash -l
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=1G
module load scicomp-python-env
python my_script.py
Simple batch script with an array (can also submit with sbatch --array=1-10 the_script.sh):
#!/bin/bash -l
#SBATCH --array=1-10
module load scicomp-python-env
python my_script.py --seed=$SLURM_ARRAY_TASK_ID
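A multithreaded variant as a sketch (the resource numbers are placeholders; OMP_NUM_THREADS only matters if your program uses OpenMP):
#!/bin/bash -l
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=1G
#SBATCH --cpus-per-task=4
# let the thread count follow the allocation
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
module load scicomp-python-env
srun python my_script.py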
Hardware
See also: Cluster technical overview.
| Node name | Number of nodes | Node type | Year | Arch (Slurm feature) | CPU type | Memory Configuration | Infiniband | GPUs | Disks |
|---|---|---|---|---|---|---|---|---|---|
| pe[1-48,65-81] | 65 | Dell PowerEdge C4130 | 2016 | hsw avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 128GB DDR4-2133 | FDR |  | 900GB HDD |
| pe[49-64,82] | 17 | Dell PowerEdge C4130 | 2016 | hsw avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 256GB DDR4-2133 | FDR |  | 900GB HDD |
| pe[83-91] | 8 | Dell PowerEdge C4130 | 2017 | bdw avx2 | 2x14 core Xeon E5 2680 v4 2.40GHz | 128GB DDR4-2400 | FDR |  | 900GB HDD |
| skl[1-48] | 48 | Dell PowerEdge C6420 | 2019 | skl avx2 avx512 | 2x20 core Xeon Gold 6148 2.40GHz | 192GB DDR4-2667 | EDR |  | No disk |
| csl[1-48] | 48 | Dell PowerEdge C6420 | 2020 | csl avx2 avx512 | 2x20 core Xeon Gold 6248 2.50GHz | 192GB DDR4-2667 | EDR |  | No disk |
| milan[1-32] | 32 | Dell PowerEdge C6525 | 2023 | milan avx2 | 2x64 core AMD EPYC 7713 @ 2.0GHz | 512GB DDR4-3200 | HDR-100 |  | No disk |
| fn3 | 1 | Dell PowerEdge R940 | 2020 | avx2 avx512 | 4x20 core Xeon Gold 6148 2.40GHz | 2TB DDR4-2666 | EDR |  | No disk |
| gpu[1-10] | 10 | Dell PowerEdge C4140 | 2020 | skl avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | EDR | 4x V100 32GB | 1.5 TB SSD |
| gpu[11-17,38-44] | 14 | Dell PowerEdge XE8545 | 2021, 2023 | milan avx2 ampere a100 | 2x24 core AMD EPYC 7413 @ 2.65GHz | 503GB DDR4-3200 | EDR | 4x A100 80GB | 440 GB SSD |
| gpu[20-22] | 3 | Dell PowerEdge C4130 | 2016 | hsw avx2 kepler | 2x6 core Xeon E5 2620 v3 2.50GHz | 128GB DDR4-2133 | EDR | 4x2 GPU K80 | 440 GB SSD |
| gpu[23-27] | 5 | Dell PowerEdge C4130 | 2017 | hsw avx2 pascal | 2x12 core Xeon E5-2680 v3 @ 2.5GHz | 256GB DDR4-2400 | EDR | 4x P100 | 720 GB SSD |
| gpu[28-37] | 10 | Dell PowerEdge C4140 | 2019 | skl avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | EDR | 4x V100 32GB | 1.5 TB SSD |
| dgx[1-2] | 2 | Nvidia DGX-1 | 2018 | bdw avx2 volta | 2x20 core Xeon E5-2698 v4 @ 2.2GHz | 512GB DDR4-2133 | EDR | 8x V100 16GB | 7 TB SSD |
| dgx[3-7] | 5 | Nvidia DGX-1 | 2018 | bdw avx2 volta | 2x20 core Xeon E5-2698 v4 @ 2.2GHz | 512GB DDR4-2133 | EDR | 8x V100 32GB | 7 TB SSD |
| gpuamd1 | 1 | Dell PowerEdge R7525 | 2021 | rome avx2 mi100 | 2x8 core AMD EPYC 7262 @ 3.2GHz | 250GB DDR4-3200 | EDR | 3x MI100 | 32GB SSD |
| gpu[45-48] | 4 | Dell PowerEdge XE8640 | 2024 | saphr avx2 h100 hopper | 2x48 core Xeon Platinum 8468 2.1GHz | 1024GB DDR5-4800 | HDR | 4x H100 SXM 80GB | 21 TB SSD |
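The arch names above can be requested as node features; for example (a sketch, check slurm features for the exact names currently in use):

sbatch --constraint=avx512 the_script.sh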
GPUs
See also: GPU computing.
| Card | Slurm partition (--partition=) | Slurm feature name (--constraint=) | Slurm gres name (--gres=gpu:NAME:n) | total amount | nodes | architecture | compute threads per GPU | memory per card | CUDA compute capability |
|---|---|---|---|---|---|---|---|---|---|
| Tesla K80* | Not available | kepler |  | 12 | gpu[20-22] | Kepler | 2x2496 | 2x12GB | 3.7 |
| Tesla P100 |  | pascal |  | 20 | gpu[23-27] | Pascal | 3854 | 16GB | 6.0 |
| Tesla V100 |  | volta |  | 40 | gpu[1-10] | Volta | 5120 | 32GB | 7.0 |
| Tesla V100 |  | volta |  | 40 | gpu[28-37] | Volta | 5120 | 32GB | 7.0 |
| Tesla V100 |  | volta |  | 16 | dgx[1-2] | Volta | 5120 | 16GB | 7.0 |
| Tesla V100 |  | volta |  | 16 | dgx[3-7] | Volta | 5120 | 32GB | 7.0 |
| Tesla A100 |  | ampere |  | 56 | gpu[11-17,38-44] | Ampere | 7936 | 80GB | 8.0 |
| Tesla H100 |  | hopper |  | 16 | gpu[45-48] | Hopper | 16896 | 80GB | 9.0 |
| AMD MI100 (testing) | Not yet installed | mi100 |  |  | gpuamd[1] |  |  |  |  |
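A minimal GPU batch script as a sketch (the time and memory values are placeholders; --gres=gpu:1 requests any single GPU and the partition is left to the defaults):
#!/bin/bash -l
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=4G
#SBATCH --gres=gpu:1
# show which GPU was allocated
nvidia-smi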
Conda Environments (Mamba)
See also: Python Environments with Conda. Note that mamba is a drop-in replacement for conda.
| Command | Description |
|---|---|
| module load mamba | Load the module that provides conda/mamba on Triton, for use making your own environments. |
|  | See the linked page for the six commands to run once per user account on Triton (to avoid filling up all space in your home directory). |

A minimal environment.yml file:

name: conda-example
channels:
  - conda-forge
dependencies:
  - numpy
  - pandas

Environment management (creating, activating, removing):

| Command | Description |
|---|---|
| mamba env create --file environment.yml | Create an environment from a yaml file. |
| source activate NAME | Activate the environment named NAME. Note that we use this and not conda activate. |
| source deactivate | Deactivate conda from this session. HPC-cluster specific. |
| mamba env list | List all environments. |
| mamba env remove --name NAME | Remove the environment of that name. |

Package management (inside the activated environment):

| Command | Description |
|---|---|
| mamba list | List packages in the currently active environment. |
| mamba env update --file environment.yml | Update an environment based on an updated environment.yml. |
| mamba install PACKAGE | Install packages in an environment with minimal changes to what is already installed. Usually you would want to add them to environment.yml if they are dependencies: better to add them there and use the previous line. |
| mamba env export | Export an environment.yml that describes the current environment. |
| mamba search PACKAGE | Search for a package. The listing includes name, version, build version (often including linked libraries like Python/CUDA), and channel. |

Other notes:

| Command | Description |
|---|---|
| mamba clean --all | Clean up cached files to free up space (not environments or the packages in them). |
| CONDA_OVERRIDE_CUDA=VERSION | Used when making a CUDA environment on the login node (choose the right CUDA version for you), together with the environment creation command. |
|  | Package selection for tensorflow: see Python Environments with Conda for the current recommendation. |
|  | Package selection for pytorch: see Python Environments with Conda for the current recommendation. |
| CUDA | In the conda-forge channel, CUDA is selected automatically based on the software you need; for manual compilation there is a separate package. |
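Putting these together, a sketch of a first-time workflow (conda-example is the environment name from the yaml file above):

module load mamba
mamba env create --file environment.yml
source activate conda-example
python -c "import numpy; print(numpy.__version__)"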
Command line
See also: Linux shell crash course.
- General notes: The command line has many small programs that, when connected together, allow you to do many things. Only a little bit of this is shown here. Programs are generally silent if everything worked, and only print an error if something goes wrong.
- ls [DIR]: List the current directory (or DIR if given).
- pwd: Print the current directory.
- cd DIR: Change directory. .. is the parent directory, / is the root, and / also chains directories, e.g. dir1/dir2 or ../../
- nano FILE: Edit a file (there are many other editors, but nano is common, nice, and simple).
- mkdir DIR-NAME: Make a new directory.
- cat FILE: Print the entire contents of a file to standard output (the terminal).
- less FILE: Less is a "pager" that lets you scroll through a file (up/down/pageup/pagedown). q to quit, / to search.
- mv SOURCE DEST: Move (= rename) a file. mv SOURCE1 SOURCE2 DEST-DIRECTORY/ moves multiple files into a directory.
- cp SOURCE DEST: Copy a file. The DEST-DIRECTORY/ syntax of mv works as well.
- rm FILE ...: Remove a file. Note that from the command line there is no recovery, so always pause and check before running this command! The -i option makes it confirm before removing each file. Add -r to remove whole directories recursively.
- head [FILE]: Print the first 10 lines (or N lines with -n N) of a file. Can take input from standard input instead of FILE. tail is similar but prints the end of the file.
- tail [FILE]: See above.
- grep PATTERN [FILE]: Print lines matching a pattern in a file; works as a primitive find feature, or for quickly searching output. Can also use standard input instead of FILE.
- du [-ash] [DIR]: Print disk usage of a directory. The default unit is KiB, rounded up to block sizes (1 or 4 KiB); -h means "human readable" (MB, GB, etc.), -s means "only DIR itself, not all subdirectories", and -a means "all files, not only directories". A common pattern is du -h DIR | sort -h to print all directories and their sizes, sorted by size.
- stat: Show detailed information on a file's properties.
- find [DIR]: find can do almost anything, but that means it's really hard to use well. Let's be practical: with only a directory argument, it prints all files and directories recursively, which might be useful by itself. Many of us do find DIR | grep NAME to grep for the name we want (even though this isn't the "right way"; there are find options which do the same thing more efficiently).
- | (pipe): COMMAND1 | COMMAND2. The output of COMMAND1 is sent to the input of COMMAND2. Useful for combining simple commands into complex operations, a core part of the Unix philosophy.
- > (output redirection): COMMAND > FILE. Write the standard output of COMMAND to FILE. Any existing content is lost.
- >> (appending output redirection): COMMAND >> FILE. Like above, but doesn't lose content: it appends.
- < (input redirection): COMMAND < FILE. Opposite of >; the input to COMMAND comes from FILE.
- type COMMAND or which COMMAND: Show exactly what will be run for a given command (e.g. type python3).
- man COMMAND-NAME: Browse on-line help for a command. q will exit, / will search (it uses less as its pager by default).
- -h and --help: Common command line options to print help on a command. But they have to be implemented by each command.
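To show how these pieces combine, a small sketch (slurm_output.log is just a placeholder file name):

# keep only the lines mentioning errors and save them for later
grep -i error slurm_output.log > errors.txt
# count how many matching lines there were
grep -ic error slurm_output.log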