Introduction to batch scripts¶
You learned in the interactive jobs tutorial that all Triton users must do their computation by submitting jobs to the Slurm batch system, which ensures efficient resource sharing.
You also learned the interactive way to submit jobs, e.g. starting an interactive Bash session on a compute node. This is useful for tests and debugging. Normally, however, Slurm jobs are batch jobs, meaning that they run unattended and asynchronously, without human supervision.
To create a batch job, you write a job script and submit it to Slurm. A job script is simply a shell script (e.g. Bash) containing your resource requests and job steps; you will see what these two components are shortly. You have already seen how to do these interactively, and in this tutorial you will learn how to bundle them into a job script.
Please refer to the interactive jobs tutorial to learn the basics of Slurm.
Your first job script¶
A job script is simply a shell script (Bash). The first line of the script should therefore be the shebang directive (#!) followed by the full path to the interpreter's executable, which is Bash in our case. The resource requests and the job steps then follow.
Let's take a look at the following script:
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --mem-per-cpu=100
#SBATCH --output=/scratch/work/%u/hello.%j.out
#SBATCH --partition debug

srun echo "Hello $USER! You are on node $HOSTNAME"
Let's name it hello.sh (create the file with your editor of choice, e.g. nano; write the script above and save it).
Lines starting with #SBATCH are understood by Slurm as parameters determining the resource requests. Here, we have requested a time limit of 5 minutes and 100 MB of memory per CPU.
The resource requests are followed by the job steps, which are the actual tasks to be done. Each srun in the script is a job step and appears as a separate row in your job history, which is useful for monitoring.
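For example, a script with several job steps could look like the following sketch (the echo and sleep commands are only placeholders for your actual programs):

#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --mem-per-cpu=100

# Each srun line below is a separate job step and appears as its own
# row in the job history.
srun echo "step 1 running on $HOSTNAME"
srun sleep 10
srun echo "step 2 done"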
Having written the script, you submit the job to Slurm with the sbatch command:
$ sbatch hello.sh
Submitted batch job 52428672
You must use sbatch, not bash, to submit the job: sbatch processes the #SBATCH headers, requests the resources, and runs the job in the background, whereas bash would simply execute the script on the current node and ignore the resource requests.
When the job enters the queue successfully, a confirmation that the job has been submitted is printed in your terminal, along with the job ID assigned to the job.
You can check the status of your jobs using slurm q:
$ slurm q
JOBID      PARTITION  NAME      TIME  START_TIME  STATE    NODELIST(REASON)
52428672   debug      hello.sh  0:00  N/A         PENDING  (None)
Once the job completes successfully, its state changes to COMPLETED and the output is saved to hello.%j.out in your work directory ("%j" is replaced by the job ID).
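For example, with the job ID printed above you could inspect the output like this (the path follows the --output line in hello.sh; your own job ID will differ):

$ ls /scratch/work/$USER/hello.*.out
$ cat /scratch/work/$USER/hello.52428672.out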
Setting resource parameters¶
In both the above example and the tutorial on interactive jobs, you learned that resources are requested through job parameters such as --time and --mem-per-cpu.
Please keep in mind that these parameters are hard limits: if, for example, you request 5 GB of memory and your job uses substantially more, Slurm will kill your job.
In practice there is a short grace period before jobs are killed (about an hour), and you can go slightly over the memory limit. But if you go over the limit and the node runs out of memory, your job will be the first one to be killed. Don't count on this.
We recommend you be as specific as possible when setting your resource parameters, as they determine how fast your jobs will run. Therefore, try to understand how many resources your code actually needs so that you can fine-tune your requests.
In general, please do not submit very short jobs (under 5 minutes) unless you are debugging. For bulk production, try to have each job take at least 30 minutes, if possible, because each job carries a significant amount of startup, accounting, and scheduling overhead.
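As a sketch, the header of a longer production job might look like this (the values and the program name are only placeholders; adjust them to what your code actually needs):

#!/bin/bash
#SBATCH --time=02:00:00        # hard limit: the job is killed after 2 hours
#SBATCH --mem-per-cpu=2G       # 2 GB of memory per allocated CPU
#SBATCH --output=myjob.%j.out  # %j is replaced by the job ID

srun ./my_program              # placeholder for your actual program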
Monitoring your jobs¶
Once you submit a job, it goes into the queue. The two most useful commands for checking the status of your jobs are slurm q and slurm h (you have seen both in use).
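For instance, you could list your queued jobs and your recently finished jobs like this (the time span is only an example; see the full command help further below):

$ slurm q                 # jobs currently in the queue
$ slurm history 2hours    # jobs that finished within the last two hours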
The command scontrol show job <jobid> provides detailed information on a job, such as where stdout and stderr are redirected. This information can be particularly useful for troubleshooting.
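For example, using the job ID from above (the grep is only a convenience to pick out the output paths; StdOut and StdErr are fields in the scontrol output):

$ scontrol show job 52428672
$ scontrol show job 52428672 | grep -E 'StdOut|StdErr'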
Another example is the command sacct, which shows the information indicated in its --format option (job ID, elapsed time, number of occupied CPUs, etc.). You can specify any field of interest with --format.
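A minimal sketch of such a command, using the example job ID from above (the field list is only an example; man sacct lists all available fields):

$ sacct --format=JobID,Elapsed,NCPUS,State -j 52428672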
You can see more commands below.
| Command | Description |
| --- | --- |
| slurm queue / slurm q | Status of your queued jobs (long/short) |
| slurm partitions | Overview of partitions (A/I/O/T=active,idle,other,total) |
| slurm cpus <partition> | List free CPUs in a partition |
| slurm history | Show status of recent jobs |
| seff <jobid> | Show percent of mem/CPU used in a job |
| scontrol show job <jobid> | Job details (only while running) |
| squeue | Show status of all jobs |
| sacct | Full history information (advanced, needs args) |
Full slurm command help:
$ slurm
Show or watch job queue:
 slurm [watch] queue     show own jobs
 slurm [watch] q         show user's jobs
 slurm [watch] quick     show quick overview of own jobs
 slurm [watch] shorter   sort and compact entire queue by job size
 slurm [watch] short     sort and compact entire queue by priority
 slurm [watch] full      show everything
 slurm [w] [q|qq|ss|s|f] shorthands for above!
 slurm qos               show job service classes
 slurm top [queue|all]   show summary of active users
Show detailed information about jobs:
 slurm prio [all|short]  show priority components
 slurm j|job <jobid>     show everything else
 slurm steps <jobid>     show memory usage of running srun job steps
Show usage and fair-share values from accounting database:
 slurm h|history         show jobs finished since, e.g. "1day" (default)
 slurm shares
Show nodes and resources in the cluster:
 slurm p|partitions      all partitions
 slurm n|nodes           all cluster nodes
 slurm c|cpus            total cpu cores in use
 slurm cpus <partition>  cores available to partition, allocated and free
 slurm cpus jobs         cores/memory reserved by running jobs
 slurm cpus queue        cores/memory required by pending jobs
 slurm features          List features and GRES
Examples:
 slurm q
 slurm watch shorter
 slurm cpus batch
 slurm history 3hours
Other advanced commands (many require lots of parameters to be useful):
| Command | Description |
| --- | --- |
| squeue | Full info on queues |
| sinfo | Advanced info on partitions |
| sinfo -N | List all nodes |
Partitions¶
A partition is a set of compute nodes dedicated to a specific purpose. Examples include partitions assigned to debugging ("debug" partition), batch processing ("batch" partition), GPUs ("gpu" partition), and so on.
sinfo lists the available partitions. For brevity, let's look at just the first four partitions listed:
$ sinfo | head -n 5
PARTITION      AVAIL  TIMELIMIT   NODES  STATE  NODELIST
interactive    up     1-00:00:00      2  drng   pe[1-2]
jupyter-long   up     10-00:00:0      2  drng   pe[1-2]
jupyter-short  up     1-00:00:00      2  drng   pe[1-2]
grid           up     3-00:00:00      1  drain  pe76
You can restrict the listing to a single partition with the --partition option:
$ sinfo --partition=debug
PARTITION  AVAIL  TIMELIMIT  NODES  STATE   NODELIST
debug      up     1:00:00        1  drain*  wsm1
debug      up     1:00:00        1  drain   pe3
debug      up     1:00:00        1  idle    pe83
Take a look at the manpage using
man sinfo for more details.
Generally, you don't need to specify the partition; Slurm will use any possible partition. However, you can select one explicitly with the --partition (-p) option. This is mainly needed if you want to force the interactive or debug partition (Slurm usually runs short jobs on the debug partition).
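For example, to force the debug partition you could add the option to the batch script, or give it on the command line when submitting (debug is one of the partitions listed above):

#SBATCH --partition=debug

$ sbatch --partition=debug hello.sh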
You can see the partitions in the quick reference.
The quick reference of the most common commands and submission options follows:
| Command | Description |
| --- | --- |
| sbatch <script.sh> | Submit a job to the queue (see standard options below) |
| srun <command> | Within a running job script/environment: run code using the allocated resources (see options below) |
| srun <command> | On the frontend: submit to queue, wait until done, show output (see options below) |
| sinteractive | Submit job, wait, provide shell on node for interactive playing (X forwarding works, default partition interactive). Exit shell when done. (see options below) |
| srun --pty bash | (advanced) Another way to run interactive jobs, no X forwarding but simpler. Exit shell when done. |
| scancel <jobid> | Cancel a job in the queue |
| salloc | (advanced) Allocate resources from the frontend node; use srun to run with the allocated resources |
| scontrol | View/modify job and Slurm configuration |
| Option | Description |
| --- | --- |
| -t, --time=<time> | Time limit, days-hours (or hh:mm:ss) |
| -p, --partition=<partition> | Job partition. Usually leave off and things are auto-detected. |
| --mem-per-cpu=<n> | Request n MB of memory per core |
| --mem=<n> | Request n MB of memory per node |
| -c, --cpus-per-task=<n> | Allocate n CPUs for each task. For multithreaded jobs. (Compare --ntasks: -c means the number of cores for each process started.) |
| -N, --nodes=<n>-<m> | Allocate a minimum of n and a maximum of m nodes |
| -n, --ntasks=<n> | Allocate resources for and start n tasks (one task = one process started; it is up to you to make them communicate. The main script runs only on the first node, while the job steps run with srun are started this many times.) |
| -J, --job-name=<name> | Short job name |
| -o, --output=<filename> | Print output into the given file |
| -e, --error=<filename> | Print errors into the given file |
| --exclusive | Allocate exclusive access to nodes. For large parallel jobs. |
| --constraint=<feature> | Request a feature (see slurm features for the list) |
| --array=<indices> | Run the job multiple times; use the variable $SLURM_ARRAY_TASK_ID to adjust behavior |
| --gres=gpu:1 | Request a GPU (a specific number can be requested with --gres=gpu:<n>) |
| | Request nodes that have local disks |
| --mail-type=<type> | Notify of events: BEGIN, END, FAIL, or ALL |
| --mail-user=<email> | Whom to send the email to |
| srun hostname | Print allocated nodes (from within the script) |
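As a final sketch, a script combining several of the options above might look like this (names, values, and the program are placeholders, not a recommendation for your particular workload):

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00
#SBATCH --mem-per-cpu=1G
#SBATCH --cpus-per-task=4
#SBATCH --output=myjob.%j.out
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your.name@example.com

# OMP_NUM_THREADS is a common convention for multithreaded programs;
# your program may use a different mechanism.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_multithreaded_program   # placeholder for your actual program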
Exercises¶
- Submit a batch job that just runs hostname.
  - Set the time to 1 hour and 15 minutes and the memory to 500 MB.
  - Change the job's name and output file.
  - Monitor the job with slurm watch queue.
  - Check the output. Does it match what you expect?
- Create a simple batch script using pi.py, based on the pi calculation of the interactive job tutorial exercises. Create multiple job steps (separate srun lines), each of which runs pi.py with a greater and greater number of tries. How does this appear in slurm history? When would you use extra srun commands, and when not?
- Create a batch script which does nothing (or some pointless operation for a while), for example sleep 300 (waits for 300 seconds), in the debug partition. Check the queue to see when it starts running. Then cancel the job. What output is produced?
- What happens if you submit a batch script with bash instead of sbatch? Does it appear to run? Does it use all the Slurm options?
- (Advanced) Create a batch script that runs in another language. Does it run? What are some of the advantages and problems here?