Submitting jobs on Triton
Ideally, before submitting a job, run enough tests to have a rough idea of how long your job takes, how much memory it needs, and how many CPUs/GPUs it needs.
Triton uses the Slurm scheduling system to allocate resources, such as compute nodes, memory on those nodes and GPUs, to submitted jobs. For more details on Slurm, have a look here. In this quickstart guide, we only introduce the most important parameters and skip over many details.
Types of jobs:
There are several types of jobs available on Triton. Here we focus on the two most commonly used ones:
Interactive jobs (commonly used to test things or to run graphical applications with cluster resources)
Batch jobs (normal jobs submitted to the cluster without direct user input)
To run an interactive job, connect to Triton and simply run
from the command line. You will then be connected to a free node where you can run your interactive session. More details can be found in the tutorial for interactive jobs. If you have a specific command that you want to run, you can also use:
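In plain Slurm, you can usually request an interactive shell on a compute node with srun --pty bash, and you can run a single command by passing it to srun directly. The sketch below only prints the srun command lines it would use (a dry run), so it can be tried on any machine. Remove the echo to actually run them on the cluster. The function name interactive_job and the resource values are hypothetical, and Triton may offer its own, more convenient command for this.

```shell
#!/bin/bash
# Hypothetical helper: print (dry run) the plain-Slurm command for an interactive job.
# Arguments: time limit, memory, then an optional command to run on the compute node.
interactive_job() {
  local time_limit="$1" mem="$2"
  shift 2
  if [ "$#" -eq 0 ]; then
    # No command given: open an interactive bash shell on the compute node
    echo srun --time="$time_limit" --mem="$mem" --pty bash
  else
    # A specific command: run it on the compute node and show its output here
    echo srun --time="$time_limit" --mem="$mem" "$@"
  fi
}

interactive_job 01:00:00 2G            # interactive shell for one hour
interactive_job 00:10:00 1G hostname   # run a single command
```
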
The most common type of job is a batch job, i.e. you submit a script that runs your code on the cluster.
To run this kind of job, you need a small script in which you set the parameters for the job, and which you then submit to the cluster.
Using a script to set the parameters has the advantage that it is easier to modify and reuse than passing the parameters on the command line.
A basic script for a Slurm batch job (e.g. in the file BatchScript.slurm) could look as follows:
#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --mem=2G
#SBATCH --output=ScriptOutput.log
module load anaconda
srun python /path/to/script.py
To submit this script, use the command
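On a standard Slurm installation, the submission command sbatch replies with a line such as "Submitted batch job 12345". The sketch below pulls the job ID out of that reply so you can use it later, for example to monitor the job. It assumes this standard message format. The function name job_id_from_sbatch is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read sbatch's reply on stdin and print only the job ID.
# Assumes the standard Slurm message "Submitted batch job <id>"; prints nothing otherwise.
job_id_from_sbatch() {
  awk '/^Submitted batch job [0-9]+/ { print $4 }'
}

# Example with a captured reply (no cluster needed)
echo "Submitted batch job 12345" | job_id_from_sbatch
```

On the cluster, this could be used as jobid=$(sbatch BatchScript.slurm | job_id_from_sbatch), followed by squeue -j "$jobid" to check the job's state. If you only need the ID, sbatch --parsable is an alternative that prints it directly, without the surrounding text.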
So, let us go through this script:
#SBATCH --time=04:00:00 asks for a 4-hour time slot, after which the job will be stopped.
#SBATCH --mem=2G asks for 2 GB of memory for your job.
#SBATCH --output=ScriptOutput.log writes the job's terminal output (standard output and, by default, standard error) to the specified file.
module load anaconda loads the anaconda module on the node your job runs on.
srun python /path/to/script.py runs the command python /path/to/script.py on the resources allocated to the job.
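The easiest way to choose the --time value is from a timed test run, as recommended at the start of this guide. This is a minimal sketch that assumes you measured the run in whole minutes. The hypothetical helper slurm_time pads that measurement by a safety margin and prints the result in the HH:MM:SS format used in the script above. The 50% default margin is an arbitrary choice for illustration, not a Triton rule.

```shell
#!/bin/bash
# Hypothetical helper: convert a measured runtime (minutes) plus a safety
# margin (percent, default 50) into an HH:MM:SS value for #SBATCH --time.
slurm_time() {
  local minutes="$1"
  local margin="${2:-50}"
  # Pad the measured time by the margin, rounding up to whole minutes
  local total=$(( (minutes * (100 + margin) + 99) / 100 ))
  printf '%02d:%02d:00\n' $(( total / 60 )) $(( total % 60 ))
}

slurm_time 160      # 160 min + 50% = 240 min, prints 04:00:00
slurm_time 45 20    # 45 min + 20% = 54 min, prints 00:54:00
```
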
Most programming languages and tools have their own modules that need to be loaded before they can be run. You can get a list of available modules by running module spider. If you need a specific version of a module, you can check the available versions by running module spider MODULENAME (e.g. module spider r for R). To load a specific version, specify it in the load command (e.g. module load matlab/r2018b for the 2018b release of MATLAB). For further details, please have a look at the instructions for the specific application.
There are many more parameters that you can set for the Slurm scheduler (a detailed list can be found here), but we will not discuss them here, since they are likely not necessary for your first job.
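For illustration only, here is a sketch of the basic script extended with a few commonly used parameters: a job name, several CPU cores, one GPU, and an output file name that contains the job ID (Slurm replaces %j with it). The job name and all values are placeholders. Request a GPU only if your code actually uses one, and check the Triton documentation for any additional GPU-related options the cluster may require. Comments are kept on their own lines rather than after the directives.

```shell
#!/bin/bash
# Placeholder job name shown in the queue
#SBATCH --job-name=my_analysis
#SBATCH --time=04:00:00
#SBATCH --mem=8G
# Number of CPU cores for the task
#SBATCH --cpus-per-task=4
# One GPU (only if your code uses it)
#SBATCH --gres=gpu:1
# %j is replaced by the job ID, so each run gets its own log file
#SBATCH --output=ScriptOutput_%j.log

module load anaconda
srun python /path/to/script.py
```
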