This assumes you have read the interactive jobs tutorial first.
Triton is a large system that combines many different individual computers. At the same time, hundreds of people are using it. Thus, we must use a batch queuing system (slurm) in order to allocate resources.
The queue system takes computation requests from everyone, figures out the optimal use of resources, and allocates code to nodes. You have to start your code in a structured way in order for this to work. Our previous tutorial showed how to run things directly from the command line, without any scripting needed.
Now let’s see how to put these into scripts. A shell script takes any commands that you might type directly into a shell and automates them. The slurm scripts that we make in this lesson do exactly this. Scripts allow jobs to run asynchronously, in batch, and without human supervision.
A basic script
Let’s say we want to run echo 'hello world'. We have to tell the system how to run it. Here is a simple submission script; put it in a file called hello.slrm (you can use the editor nano):
#!/bin/bash
#SBATCH --time=0-00:05:00    # 5 mins
#SBATCH --mem-per-cpu=500    # 500MB of memory
srun echo 'hello, world'
Whatever your application or programming language requires, you put it in the script.
Each srun command is a job step and appears as a separate row in your job history, which is useful for monitoring. Then submit the script with:
$ sbatch hello.slrm
You must use sbatch, not bash, to submit the job: sbatch is what processes the #SBATCH headers and runs the job in the background.
This sends it to the queue to wait. Since the time requested is short, it will probably run on the debug partition, which is reserved for small test jobs (see below). Let’s see if it is in the queue:
Check the job status with slurm q:
$ slurm q
JOBID     PARTITION  NAME        TIME  START_TIME  STATE    NODELIST(REASON)
13031249  debug      hello.slrm  0:00  N/A         PENDING  (None)
Keep running slurm q until you see it finish.
You can use scancel with that job ID to cancel the job before it completes.
The output is then saved to slurm-13031249.out in your current directory (the number being the job ID).
Loading modules in scripts
Need to load modules for your software? Do it in the batch scripts. In general, anything you can do from the shell, you can do here:
#!/bin/bash
#SBATCH --time=0-00:05:00    # 5 mins
#SBATCH --mem-per-cpu=500    # 500MB of memory
module load anaconda3
python -V
Exercise: Try the Python version-printing script above. Try changing to different modules (Python and others), if you can find them.
As you can see, the above script is limited to 5 minutes and 500MB of
memory. All scripts have to have limits, otherwise they can’t be
efficiently scheduled. If you exceed the limits, the jobs will be
killed. At a minimum, you need to set the time and memory limits.
The same parameters can be used in:
- The sbatch script, prefixed by #SBATCH
- The sbatch command line program directly
- srun from the command line, which lets you run programs without a batch script.
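The time limit in the scripts above is written as days-hours:minutes:seconds, so 0-00:05:00 means five minutes. As a small worked example of that arithmetic, here is a sketch of a helper that turns a number of minutes into this form. The function name to_slurm_time is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm command): convert a number of minutes
# into the days-hours:minutes:seconds form used by the --time option.
to_slurm_time() {
    local total_minutes=$1
    local days=$(( total_minutes / 1440 ))           # 1440 minutes per day
    local hours=$(( (total_minutes % 1440) / 60 ))   # whole hours left in the last day
    local minutes=$(( total_minutes % 60 ))          # minutes left in the last hour
    printf '%d-%02d:%02d:00\n' "$days" "$hours" "$minutes"
}

to_slurm_time 5      # prints 0-00:05:00
to_slurm_time 75     # prints 0-01:15:00
to_slurm_time 3000   # prints 2-02:00:00
```

The printed value can be used after --time= either in a #SBATCH line or on the sbatch or srun command line.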
It is important to note that slurm is a declarative system. You declare what you need, and slurm handles finding the resources without you having to worry about details. The more resources you request, the harder it will be to schedule and the longer you may have to wait. So, you should ask for enough to make sure your job can complete, but once you have experience with your code, reduce the request to just what is needed.
In general, you don’t want to submit very short jobs (under 5 minutes), because there is a lot of startup, accounting, and scheduling overhead. If you are testing, short jobs are fine, but once you get to bulk production, try to have each job take at least 30 minutes if possible. If you have lots of things to run, combine them into fewer jobs.
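One way to combine many short tasks into fewer jobs is to loop over them inside a single batch script. The following is a minimal sketch, not a recommended template: process_input is a hypothetical stand-in for your own short computation, and the five inputs and the 30-minute limit are illustrative assumptions.

```shell
#!/bin/bash
#SBATCH --time=0-00:30:00    # 30 minutes for the whole set of tasks
#SBATCH --mem-per-cpu=500    # 500MB of memory

# Hypothetical stand-in for one short computation; replace with your own command.
process_input() {
    echo "processed input $1"
}

# Run the short tasks one after another inside a single job,
# instead of submitting one job per task.
completed=0
for i in 1 2 3 4 5; do
    process_input "$i"
    completed=$((completed + 1))
done
echo "completed $completed tasks"
```

If the real command is started with srun inside the loop, each iteration is recorded as its own job step, which makes the individual tasks easier to monitor.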
Full slurm reference
||submit a job to queue (see standard options below)|
||Within a running job script/environment: Run code using the allocated resources (see options below)|
||On frontend: submit to queue, wait until done, show output. (see options below)|
||Submit job, wait, provide shell on node for interactive playing (X forwarding works, default partition interactive). Exit shell when done. (see options below)|
||(advanced) Another way to run interactive jobs, no X forwarding but simpler. Exit shell when done.|
||Cancel a job in queue|
||(advanced) Allocate resources from frontend node. Use
||View/modify job and slurm configuration|
||time limit, days-hours|
||job partition. Usually leave off and things are auto-detected.|
||request n MB of memory per core|
||request n MB memory per node|
||Allocate *n* CPUs for each task. For multithreaded jobs. (Compare --ntasks: -c means the number of cores for each process started.)|
||allocate minimum of n, maximum of m nodes.|
||allocate resources for and start n tasks (one task = one process started; it is up to you to make them communicate. However, the main script runs only on the first node, while the sub-processes run with “srun” are run this many times.)|
||short job name|
||print output into file output|
||print errors into file error|
||allocate exclusive access to nodes. For large parallel jobs.|
||request feature (see
||Run job multiple times, use variable
||request a GPU, or
||request nodes that have disks,
||notify of events:
||whom to send the email to|
||Print allocated nodes (from within script)|
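To show how a few of these options fit together in a script, here is a minimal sketch that sets a job name and separate output and error files. It assumes the standard Slurm long option names --job-name, --output and --error, the %j filename pattern (replaced by the job ID), and the SLURM_JOB_ID environment variable that Slurm sets inside a job. The name hello and the file names are illustrative only.

```shell
#!/bin/bash
#SBATCH --time=0-00:05:00        # 5 mins
#SBATCH --mem-per-cpu=500        # 500MB of memory
#SBATCH --job-name=hello         # short job name (illustrative)
#SBATCH --output=hello-%j.out    # standard output; %j is replaced by the job ID
#SBATCH --error=hello-%j.err     # standard error goes to a separate file

# SLURM_JOB_ID is set by Slurm inside a job; fall back to "unknown"
# if this script is run outside the queue system.
message="hello from job ${SLURM_JOB_ID:-unknown}"
echo "$message"
```

Keeping errors in their own file can make it easier to spot problems when a job produces a lot of regular output.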
Status of the jobs
Once you submit a job, it goes into a queue. You need to be able to see the status of your jobs, and there are commands to do this.
|slurm j <jobid>||Status on single job (still running)|
|slurm history [2hours|5days|…]||Info on completed jobs, including mem/cpu usage.|
||Status of your queued jobs (long/short)|
||Overview of partitions (A/I/O/T=active,idle,other,total)|
||list free CPUs in a partition|
||Show status of recent jobs|
||Show percent of mem/CPU used in job|
||Job details (only while running)|
||Show status of all jobs|
||Full history information (advanced, needs args)|
Full slurm command help:
$ slurm
Show or watch job queue:
 slurm [watch] queue       show own jobs
 slurm [watch] q           show user's jobs
 slurm [watch] quick       show quick overview of own jobs
 slurm [watch] shorter     sort and compact entire queue by job size
 slurm [watch] short       sort and compact entire queue by priority
 slurm [watch] full        show everything
 slurm [w] [q|qq|ss|s|f]   shorthands for above!
 slurm qos                 show job service classes
 slurm top [queue|all]     show summary of active users
Show detailed information about jobs:
 slurm prio [all|short]    show priority components
 slurm j|job               show everything else
 slurm steps               show memory usage of running srun job steps
Show usage and fair-share values from accounting database:
 slurm h|history           show jobs finished since, e.g. "1day" (default)
 slurm shares
Show nodes and resources in the cluster:
 slurm p|partitions        all partitions
 slurm n|nodes             all cluster nodes
 slurm c|cpus              total cpu cores in use
 slurm cpus                cores available to partition, allocated and free
 slurm cpus jobs           cores/memory reserved by running jobs
 slurm cpus queue          cores/memory required by pending jobs
 slurm features            List features and GRES
Examples:
 slurm q
 slurm watch shorter
 slurm cpus batch
 slurm history 3hours
Other advanced commands (many require lots of parameters to be useful):
||Full info on queues|
||Advanced info on partitions|
||List all nodes|
See the full list of status commands on the reference page.
There are different partitions, which have different limits. The
“debug” partition is for short debugging, so is designed to always be
available. The “batch” partition is designed for all the normal long
jobs. There are also partitions for GPUs, huge memory nodes, interactive
shells, and so on. Most of the time, you should leave the partition off,
and slurm will use all possible partitions. You can specify a partition by
adding -p PARTITION_NAME to whatever command you are running, which is
mainly needed if you want to force an interactive or a test partition.
You can see the available partitions in the quick reference.
- Submit a batch job that just runs
- Set time to 1 hour and 15 minutes, memory to 500MB.
- Change the job’s name and output file.
- Monitor the job with slurm watch queue.
- Check the output. Does it match
- Create a simple batch script using pi.py based on the pi calculation of the interactive job tutorial exercises. Create multiple job steps (separate srun lines), each of which runs pi.py with a greater and greater number of tries. How does this appear in slurm history? When would you use extra srun commands, and when not?
- Create a batch script which does nothing (or some pointless operation for a while), for example sleep 300 (waits for 300 seconds), in the debug partition. Check the queue to see when it starts running. Then, cancel the job. What output is produced?
- What happens if you submit a batch script with bash instead of sbatch? Does it appear to run? Does it use all the Slurm options?
- (Advanced) Create a batch script that runs in another language. Does it run? What are some of the advantages and problems here?