Triton quick reference¶
This page collects the most important Triton reference information in one place.
Modules¶
Command | Description
---|---
`module load <module>` | load a module
`module avail` | list all modules
`module spider <name>` | search modules
`module list` | list currently loaded modules
`module show <module>` | details on a module
`module help <module>` | details on a module
`module unload <module>` | unload a module
`module save <alias>` | save the current module collection to this alias (saved in `~/.lmod.d/`)
`module restore <alias>` | load a saved module collection (faster than loading individually)
`module purge` | unload all loaded modules (faster than unloading individually)
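A typical module workflow might look like the following sketch. The module and collection names (`gcc`, `my-tools`) are placeholders, not modules guaranteed to exist on Triton.

```bash
# Search for a module, load it, and check what is loaded
module spider gcc          # search modules matching "gcc"
module load gcc            # load the version you picked from the search
module list                # confirm what is currently loaded

# Save the current set of modules under an alias and restore it later
module save my-tools       # stored under ~/.lmod.d/
module purge               # unload everything
module restore my-tools    # reload the saved collection in one step
```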
Common software¶
Storage¶
Name | Path | Quota | Backup | Locality | Purpose
---|---|---|---|---|---
Home | | hard quota 10GB | Nightly | all nodes | Small user-specific files, no calculation data.
Work | | 200GB and 1 million files | x | all nodes | Personal working space for every user. Calculation data etc. Quota can be increased on request.
Scratch | | on request | x | all nodes | Department/group specific project directories.
Local temp | | limited by disk size | x | single-node | Primary (and usually fastest) place for single-node calculation data. Removed once the user's jobs are finished on the node.
Local persistent | | varies | x | dedicated group servers only | Local disk persistent storage. On servers purchased for a specific group. Not backed up.
ramfs (login nodes only) | | limited by memory | x | single-node | In-memory filesystem (ramfs) on the login nodes only.
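As a sketch of how the tiers are typically combined in a job: stage input onto fast node-local temporary disk, run there, and copy results back to network-visible storage before the job ends. The directory variable, program, and file names below are placeholders, not paths defined by Triton.

```bash
#!/bin/bash
# Hypothetical staging pattern: fast local disk during the run,
# results copied back to shared storage before the job finishes.
WORKDIR=$HOME/myproject        # placeholder for your own work directory
LOCALTMP=$(mktemp -d)          # node-local temporary directory

cp "$WORKDIR/input.dat" "$LOCALTMP/"
cd "$LOCALTMP"
./my_program input.dat > output.dat   # placeholder program

cp output.dat "$WORKDIR/"      # copy results back; local temp is cleaned after the job
```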
Partitions¶
Partition | Max job size | Mem/core (GB) | Tot mem (GB) | Cores/node | Limits | Use
---|---|---|---|---|---|---
&lt;default&gt; | If you leave the partition off, all possible partitions are used (based on time/mem) | | | | |
debug | 2 nodes | 2.66 - 12 | 32-256 | 12, 20, 24 | 15 min | testing and debugging, short interactive work; 1 node of each arch
batch | 16 nodes | 2.66 - 12 | 32-256 | 12, 20, 24 | 5d | primary partition, all serial & parallel jobs
short | 8 nodes | 4 - 12 | 48-256 | 12, 20, 24 | 4h | short serial & parallel jobs, +96 dedicated CPU cores
hugemem | 1 node | 43 | 1024 | 24 | 3d | huge memory jobs, 1 node only
gpu | 1 node, 2-8 GPUs | 2 - 10 | 24-128 | 12 | 5d | GPU jobs
gpushort | 4 nodes, 2-8 GPUs | 2 - 10 | 24-128 | 12 | 4h | short GPU jobs
interactive | 2 nodes | 5 | 128 | 24 | 1d | interactive work (default partition for `sinteractive`)

Use `slurm partitions` to see more details.
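For example, a job that fits the `short` limits could be steered there explicitly (normally you can leave `-p` off and let Slurm pick). The script name and resource values here are illustrative:

```bash
# Explicitly target the short partition with a 2-hour limit
sbatch -p short -t 02:00:00 --mem-per-cpu=2000 my_job.sh
```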
Job submission¶
Command | Description
---|---
`sbatch <script.sh>` | submit a job to the queue (see standard options below)
`srun <command>` | within a running job script/environment: run code using the allocated resources (see options below)
`srun <command>` | on the frontend: submit to the queue, wait until done, show output (see options below)
`sinteractive` | submit a job, wait, and get a shell on the node for interactive work (X forwarding works, default partition `interactive`); exit the shell when done (see options below)
`srun --pty bash` | (advanced) another way to run interactive jobs, no X forwarding but simpler; exit the shell when done
`scancel <jobid>` | cancel a job in the queue
`salloc` | (advanced) allocate resources from the frontend node; use `srun` to run commands on the allocation
`scontrol` | view/modify job and Slurm configuration
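A minimal batch script tying these commands together might look like this sketch; the resource values and the `hello.out` file name are illustrative only.

```bash
#!/bin/bash
#SBATCH --time=01:00:00        # 1 hour time limit
#SBATCH --mem-per-cpu=2000     # 2000 MB of memory per core
#SBATCH --ntasks=1             # one task/process
#SBATCH --output=hello.out     # stdout goes here

# srun launches the command as a job step on the allocated resources
srun hostname
```

Submit it with `sbatch hello.sh`, follow it with `slurm q`, and cancel it with `scancel <jobid>` if needed.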
Command | Option | Description
---|---|---
`sbatch`/`srun`/`sinteractive` | `-t, --time=hh:mm:ss` | time limit
 | `-t, --time=days-hours` | time limit, days-hours
 | `-p, --partition=<partition>` | job partition; usually leave this off and things are auto-detected
 | `--mem-per-cpu=<n>` | request n MB of memory per core
 | `--mem=<n>` | request n MB of memory per node
 | `-c, --cpus-per-task=<n>` | allocate n CPUs for each task; for multithreaded jobs (compare `--ntasks`: `-c` means the number of cores for each process started)
 | `-N, --nodes=<n>-<m>` | allocate a minimum of n, maximum of m nodes
 | `-n, --ntasks=<n>` | allocate resources for and start n tasks (one task = one process started; it is up to you to make them communicate; the main script runs only on the first node, but sub-processes started with `srun` run this many times)
 | `-J, --job-name=<name>` | short job name
 | `-o, --output=<file>` | print output into this file
 | `-e, --error=<file>` | print errors into this file
 | `--exclusive` | allocate exclusive access to nodes; for large parallel jobs
 | `--constraint=<feature>` | request a node feature (see `slurm features` for the list)
 | `--array=<indices>` | run the job multiple times; use the variable `$SLURM_ARRAY_TASK_ID` to adjust parameters
 | `--gres=gpu`, `--gres=gpu:<n>` | request a GPU, or n GPUs
 | | request nodes that have local disks
 | `--mail-type=<type>` | notify of events: `BEGIN`, `END`, `FAIL` or `ALL`
 | `--mail-user=<email>` | whom to send the email
 | | print the allocated nodes (from within the script)
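For instance, an array job sketch combining several of the options above; the index range, the `my_program` executable, and its `--seed` flag are just illustrations of how `$SLURM_ARRAY_TASK_ID` can be used.

```bash
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1000
#SBATCH --array=0-9              # run 10 copies of this script
#SBATCH --output=array_%a.out    # %a expands to the array task id

# Each copy sees a different $SLURM_ARRAY_TASK_ID and can use it
# to pick its own input file or parameter set.
srun ./my_program --seed "$SLURM_ARRAY_TASK_ID"
```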
Command | Description
---|---
`slurm queue` / `slurm q` | status of your queued jobs (long/short)
`slurm partitions` | overview of partitions (A/I/O/T = allocated, idle, other, total)
`slurm cpus <partition>` | list free CPUs in a partition
`slurm history` | show status of recent jobs
`seff <jobid>` | show percent of mem/CPU used in a job
`slurm j <jobid>` | job details (only while running)
`slurm full` | show status of all jobs
`sacct` | full history information (advanced, needs arguments)
Full slurm command help:
```
$ slurm
Show or watch job queue:
 slurm [watch] queue       show own jobs
 slurm [watch] q           show user's jobs
 slurm [watch] quick       show quick overview of own jobs
 slurm [watch] shorter     sort and compact entire queue by job size
 slurm [watch] short       sort and compact entire queue by priority
 slurm [watch] full        show everything
 slurm [w] [q|qq|ss|s|f]   shorthands for above!
 slurm qos                 show job service classes
 slurm top [queue|all]     show summary of active users
Show detailed information about jobs:
 slurm prio [all|short]    show priority components
 slurm j|job               show everything else
 slurm steps               show memory usage of running srun job steps
Show usage and fair-share values from accounting database:
 slurm h|history           show jobs finished since, e.g. "1day" (default)
 slurm shares
Show nodes and resources in the cluster:
 slurm p|partitions        all partitions
 slurm n|nodes             all cluster nodes
 slurm c|cpus              total cpu cores in use
 slurm cpus                cores available to partition, allocated and free
 slurm cpus jobs           cores/memory reserved by running jobs
 slurm cpus queue          cores/memory required by pending jobs
 slurm features            List features and GRES
Examples:
 slurm q
 slurm watch shorter
 slurm cpus batch
 slurm history 3hours
```
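A typical monitoring round after submitting might look like the lines below. The job id `12345678` is hypothetical, and `seff` is used here assuming it is installed alongside Slurm.

```bash
slurm q                  # quick view of your own jobs
slurm watch queue        # keep refreshing the same view
slurm history 3hours     # jobs finished in the last three hours
seff 12345678            # CPU/memory efficiency of a finished job (hypothetical job id)
```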
Other advanced commands (many require lots of parameters to be useful):
Command | Description
---|---
`squeue` | full info on queues
`sinfo` | advanced info on partitions
`sinfo -N` | list all nodes
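Quick examples of the plain Slurm equivalents; these are standard Slurm commands, and the exact output columns vary by configuration.

```bash
squeue -u "$USER"    # your jobs only, using plain squeue
sinfo -p batch       # node states in the batch partition
sinfo -N -l          # one line per node, long format
```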
Toolchains¶
Toolchain | Compiler version | MPI version | BLAS version | ScaLAPACK version | FFTW version | CUDA version
---|---|---|---|---|---|---
GOOLF Toolchains: | | | | | |
goolf/triton-2016a | GCC/4.9.3 | OpenMPI/1.10.2 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 |
goolf/triton-2016b | GCC/5.4.0 | OpenMPI/1.10.3 | OpenBLAS/0.2.18 | ScaLAPACK/2.0.2 | FFTW/3.3.4 |
goolfc/triton-2016a | GCC/4.9.3 | OpenMPI/1.10.2 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | 7.5.18
goolfc/triton-2017a | GCC/5.4.0 | OpenMPI/2.0.1 | OpenBLAS/0.2.19 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | 8.0.61
GMPOLF Toolchains: | | | | | |
gmpolf/triton-2016a | GCC/4.9.3 | MPICH/3.0.4 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 |
gmpolfc/triton-2016a | GCC/4.9.3 | MPICH/3.0.4 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | 7.5.18
GMVOLF Toolchains: | | | | | |
gmvolf/triton-2016a | GCC/4.9.3 | MVAPICH2/2.0.1 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 |
gmvolfc/triton-2016a | GCC/4.9.3 | MVAPICH2/2.0.1 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 | 7.5.18
IOOLF Toolchains: | | | | | |
ioolf/triton-2016a | icc/2015.3.187 | OpenMPI/1.10.2 | OpenBLAS/0.2.15 | ScaLAPACK/2.0.2 | FFTW/3.3.4 |
IOMKL Toolchains: | | | | | |
iomkl/triton-2016a | icc/2015.3.187 | OpenMPI/1.10.2 | imkl/11.3.1.150 | imkl/11.3.1.150 | imkl/11.3.1.150 |
iomkl/triton-2016b | icc/2015.3.187 | OpenMPI/1.10.3 | imkl/11.3.1.150 | imkl/11.3.1.150 | imkl/11.3.1.150 |
iompi/triton-2017a | icc/2017.1.132 | OpenMPI/2.0.1 | imkl/2017.1.132 | imkl/2017.1.132 | imkl/2017.1.132 |
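As a sketch, compiling and running an MPI program against one of these toolchains: the toolchain module name comes from the table above, while `hello_mpi.c` is a placeholder source file.

```bash
module load goolf/triton-2016b      # GCC 5.4.0 + OpenMPI 1.10.3 + OpenBLAS + ScaLAPACK + FFTW
mpicc -O2 -o hello_mpi hello_mpi.c  # mpicc is the MPI compiler wrapper around gcc

# Inside a batch job, launch one copy per allocated task:
srun ./hello_mpi
```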
Hardware¶
Node name | Number of nodes | Node type | Year | Arch (constraint) | CPU type | Memory Configuration | Infiniband | GPUs
---|---|---|---|---|---|---|---|---
ivb[1-24] | 24 | ProLiant SL230s G8 | 2014 | ivb avx | 2x10 core Xeon E5 2680 v2 2.80GHz | 256GB DDR3-1667 | FDR |
ivb[25-48] | 24 | ProLiant SL230s G8 | 2014 | ivb avx | 2x10 core Xeon E5 2680 v2 2.80GHz | 64GB DDR3-1667 | FDR |
pe[1-48,65-81] | 65 | Dell PowerEdge C4130 | 2016 | hsw avx avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 128GB DDR4-2133 | FDR |
pe[49-64,82] | 17 | Dell PowerEdge C4130 | 2016 | hsw avx avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 256GB DDR4-2133 | FDR |
pe[83-91] | 8 | Dell PowerEdge C4130 | 2017 | bdw avx avx2 | 2x14 core Xeon E5 2680 v4 2.40GHz | 128GB DDR4-2400 | FDR |
c[579-628,639-698] | 110 | ProLiant XL230a Gen9 | 2017 | hsw avx avx2 | 2x12 core Xeon E5 2690 v3 2.60GHz | 128GB DDR4-2666 | FDR |
c[629-638] | 10 | ProLiant XL230a Gen9 | 2017 | hsw avx avx2 | 2x12 core Xeon E5 2690 v3 2.60GHz | 256GB DDR4-2400 | FDR |
skl[1-48] | 48 | Dell PowerEdge C6420 | 2019 | skl avx avx2 avx512 | 2x20 core Xeon Gold 6148 2.40GHz | 192GB DDR4-2667 | EDR |
csl[1-48] | 48 | Dell PowerEdge C6420 | 2020 | csl avx avx2 avx512 | 2x20 core Xeon Gold 6248 2.50GHz | 192GB DDR4-2667 | EDR |
fn3 | 1 | Dell PowerEdge R940 | 2020 | avx avx2 avx512 | 4x20 core Xeon Gold 6148 2.40GHz | 2TB DDR4-2666 | EDR |
gpu[1-10] | 10 | Dell PowerEdge C4140 | 2020 | skl avx avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | EDR | 4x V100 32GB
gpu[20-22] | 3 | Dell PowerEdge C4130 | 2016 | hsw avx avx2 kepler | 2x6 core Xeon E5 2620 v3 2.50GHz | 128GB DDR4-2133 | EDR | 4x K80 (2 GPUs per card)
gpu[23-27] | 5 | Dell PowerEdge C4130 | 2017 | hsw avx avx2 pascal | 2x12 core Xeon E5-2680 v3 @ 2.5GHz | 256GB DDR4-2400 | EDR | 4x P100
dgx[01-02] | 2 | Nvidia DGX-1 | 2018 | bdw avx avx2 volta | 2x20 core Xeon E5-2698 v4 @ 2.2GHz | 512GB DDR4-2133 | EDR | 8x V100
gpu[28-37] | 10 | Dell PowerEdge C4140 | 2019 | skl avx avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | EDR | 4x V100 32GB
Node type | CPU count
---|---
48GB Xeon Westmere (2012) | 1404
24GB Xeon Westmere + 2x GPU (2012) | 120
96GB Xeon Westmere (2012) | 288
1TB Xeon Westmere (2012) | 48
256GB Xeon Ivy Bridge (2014) | 480
64GB Xeon Ivy Bridge (2014) | 480
128GB Xeon Haswell (2016) | 1224
256GB Xeon Haswell (2016) | 360
128GB Xeon Haswell + 4x GPU (2016) | 36
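The `Arch (constraint)` column can be used with `--constraint` to pin a job to a particular CPU generation. The constraint names below come from the table above; the script name and time limit are illustrative.

```bash
# Require AVX-512 capable nodes (the skl/csl rows above)
sbatch --constraint=avx512 -t 04:00:00 my_job.sh

# Require specifically the Haswell nodes
sbatch --constraint=hsw my_job.sh
```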
GPUs¶
Card | total amount | nodes | architecture | compute threads per GPU | memory per card | CUDA compute capability | Slurm feature name | Slurm gres name
---|---|---|---|---|---|---|---|---
Tesla K80* | 12 | gpu[20-22] | Kepler | 2x2496 | 2x12GB | 3.7 | kepler |
Tesla P100 | 20 | gpu[23-27] | Pascal | 3584 | 16GB | 6.0 | pascal |
Tesla V100 | 40 | gpu[1-10] | Volta | 5120 | 32GB | 7.0 | volta |
Tesla V100 | 40 | gpu[28-37] | Volta | 5120 | 32GB | 7.0 | volta |
Tesla V100 | 16 | dgx[01-02] | Volta | 5120 | 16GB | 7.0 | volta |
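A GPU job sketch using the feature names above to pick a card generation; `my_gpu_program` is a placeholder executable and the resource values are illustrative.

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1             # one GPU
#SBATCH --constraint=volta       # restrict to V100 nodes (feature name from the table)
#SBATCH --time=02:00:00
#SBATCH --mem-per-cpu=4000

srun ./my_gpu_program            # placeholder executable
```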