Serial Jobs¶
Introduction to batch scripts¶
In the interactive jobs tutorial, you learned that all Triton users must do their computation by submitting jobs to the Slurm batch system, which ensures efficient resource sharing.
You also learned how to submit jobs interactively, e.g. by opening an interactive Bash session on a compute node. This is useful for tests and debugging. Normally, however, Slurm jobs are batch jobs, meaning that they run unattended and asynchronously, without human supervision.
To create a batch job, you need to write a job script and then submit it to Slurm. A job script is simply a shell script, e.g. Bash, in which you put your resource requests and job steps. You will see what these two components are in this tutorial. You have already seen how to specify them interactively; in this tutorial you will learn how to bundle them into your job scripts.
See also
Please refer to the interactive jobs tutorial to learn the basics of Slurm.
Your first job script¶
A job script is simply a shell script (Bash), so the first line of the script must be the shebang directive (#!) followed by the full path to the shell interpreter, /bin/bash in our case. The resource requests and the job steps follow.
Let’s take a look at the following script:
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --mem-per-cpu=100
#SBATCH --output=/scratch/work/%u/hello.%j.out
#SBATCH --partition=debug
srun echo "Hello $USER! You are on node $HOSTNAME"
Let’s name it hello.sh (create a file using your editor of choice, e.g. nano; write the script above and save it).
The symbol # starts a comment in a Bash script, but Slurm interprets the #SBATCH lines as parameters specifying the resource requests.
Here, we have requested a time limit of 5 minutes, along with 100 MB of RAM per CPU.
Resource requests are followed by job steps, which are the actual tasks to be done. Each srun within a Slurm script is a job step and appears as a separate row in your job history, which is useful for monitoring.
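For illustration, here is a minimal sketch of a script with several job steps (the commands themselves are arbitrary placeholders):

#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --mem-per-cpu=100

# Each srun line below is a separate job step and will appear
# as its own row in your job history.
srun echo "Step one"
srun hostname
srun echo "Step two"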
Having written the script, you need to submit the job to Slurm through the sbatch command:
$ sbatch hello.sh
Submitted batch job 52428672
Warning
You must use sbatch, not bash, to submit the job, since it is Slurm that understands the #SBATCH directives, not Bash.
When the job enters the queue successfully, a confirmation that the job has been submitted is printed in your terminal, along with the jobid assigned to the job.
You can check the status of your jobs using slurm q:
$ slurm q
JOBID PARTITION NAME TIME START_TIME STATE NODELIST(REASON)
52428672 debug hello.sh 0:00 N/A PENDING (None)
Once the job has completed successfully, the state changes to COMPLETED and the output is saved to hello.%j.out in your work directory (“%j” is replaced by the jobid and “%u” by your username).
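For example, assuming the jobid 52428672 from above, you could print the output with:

$ cat /scratch/work/$USER/hello.52428672.out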
Setting resource parameters¶
In both the above example and the tutorial on interactive jobs, you learned that resources are requested through job parameters such as --mem, --time, etc.
See also
See interactive jobs, the reference page or the details page for more information and advanced usage.
Please keep in mind that these parameters are hard limits. If, for example, you request 5 GB of memory and your job uses substantially more, Slurm will kill your job.
Note
Actually, there is a short grace period in killing jobs (about an hour), and you can go over your memory request a little. But if you go over the memory limit and the node runs out of memory, your job will be the first one to be killed! Don’t count on this.
We recommend being as specific as possible when setting your resource parameters, as they determine how fast your jobs will run. Therefore, try to gain an understanding of how many resources your code actually needs so you can fine-tune your requests.
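One way to check is sketched below: after a job has finished, the seff utility (available on many Slurm clusters) reports what percentage of the requested memory and CPU time the job actually used, so you can adjust your next requests accordingly. The jobid here is the illustrative one from above:

$ seff 52428672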
Note
In general, please do not submit jobs that are too short (under 5 minutes) unless you are debugging. For bulk production, try to have each job take at least 30 minutes, if possible. The reason is that there is a significant amount of startup, accounting, and scheduling overhead per job.
Monitoring your jobs¶
Once you submit your job, it goes into a queue. The two most useful commands for checking the status of your jobs are slurm q and slurm h (you have seen both in use).
For example, the command scontrol show -d jobid <jobid> provides detailed information on a running job, such as where stderr and stdout will be redirected. This information can be particularly helpful for troubleshooting.
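For instance, reusing the illustrative jobid from earlier, you could extract just the output-file locations (the StdOut and StdErr fields appear in scontrol’s output):

$ scontrol show -d jobid 52428672 | grep -E 'StdOut|StdErr'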
Another example is the command sacct --format=jobid,elapsed,ncpus,ntasks,state,MaxRSS, which shows the information indicated in the --format option (the jobid, the elapsed time, the number of occupied CPUs, etc.). You can specify any field of interest to be shown using --format.
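For example, to show those fields for a single job, you can add the -j option with the jobid from the earlier example:

$ sacct --format=jobid,elapsed,ncpus,ntasks,state,MaxRSS -j 52428672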
You can see more commands below.
Command | Description
---|---
slurm q ; slurm qq | Status of your queued jobs (long/short)
slurm partitions | Overview of partitions (A/I/O/T=active,idle,other,total)
slurm cpus <partition> | List free CPUs in a partition
slurm history [1day,2hour,...] | Show status of recent jobs
seff <jobid> | Show percent of mem/CPU used in job
slurm j <jobid> | Job details (only while running)
slurm s ; slurm ss <partition> | Show status of all jobs
sacct | Full history information (advanced, needs args)
Full slurm command help:
$ slurm
Show or watch job queue:
slurm [watch] queue show own jobs
slurm [watch] q show user's jobs
slurm [watch] quick show quick overview of own jobs
slurm [watch] shorter sort and compact entire queue by job size
slurm [watch] short sort and compact entire queue by priority
slurm [watch] full show everything
slurm [w] [q|qq|ss|s|f] shorthands for above!
slurm qos show job service classes
slurm top [queue|all] show summary of active users
Show detailed information about jobs:
slurm prio [all|short] show priority components
slurm j|job show everything else
slurm steps show memory usage of running srun job steps
Show usage and fair-share values from accounting database:
slurm h|history show jobs finished since, e.g. "1day" (default)
slurm shares
Show nodes and resources in the cluster:
slurm p|partitions all partitions
slurm n|nodes all cluster nodes
slurm c|cpus total cpu cores in use
slurm cpus cores available to partition, allocated and free
slurm cpus jobs cores/memory reserved by running jobs
slurm cpus queue cores/memory required by pending jobs
slurm features List features and GRES
Examples:
slurm q
slurm watch shorter
slurm cpus batch
slurm history 3hours
Other advanced commands (many require lots of parameters to be useful):
Command | Description
---|---
squeue | Full info on queues
sinfo | Advanced info on partitions
sinfo -N | List all nodes
Partitions¶
A Slurm partition is a set of computing nodes dedicated to a specific purpose. Examples include partitions assigned to debugging (the “debug” partition), batch processing (the “batch” partition), GPU computing (the “gpu” partition), etc.
The command sinfo lists the available partitions. For the sake of brevity, let’s look at the first four partitions listed:
$ sinfo | head -n 5
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
interactive up 1-00:00:00 2 drng pe[1-2]
jupyter-long up 10-00:00:0 2 drng pe[1-2]
jupyter-short up 1-00:00:00 2 drng pe[1-2]
grid up 3-00:00:00 1 drain pe76
You can ask sinfo to list a specific partition:
$ sinfo --partition=debug
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug up 1:00:00 1 drain* wsm1
debug up 1:00:00 1 drain pe3
debug up 1:00:00 1 idle pe83
Take a look at the manpage using man sinfo for more details.
Generally, you don’t need to specify the partition; Slurm will use any possible partition (this is Aalto-specific; other sites may have different requirements). However, you can do so with -p PARTITION_NAME. This is mainly needed if you want to force the interactive or debug partition (Slurm usually runs short jobs on the debug partition), as shown below.
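For example, to force the debug partition, you can either add the directive to your job script:

#SBATCH --partition=debug

or pass the option on the command line when submitting (command-line options override the corresponding #SBATCH directives in the script):

$ sbatch --partition=debug hello.sh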
See also
You can see the partitions in the quick reference.
Full reference¶
Command | Description
---|---
sbatch | Submit a job to the queue (see standard options below)
srun | Within a running job script/environment: run code using the allocated resources (see options below)
srun | On the frontend: submit to queue, wait until done, show output (see options below)
sinteractive | Submit job, wait, provide shell on node for interactive playing (X forwarding works, default partition interactive). Exit shell when done. (see options below)
srun --pty bash | (advanced) Another way to run interactive jobs, no X forwarding but simpler. Exit shell when done.
scancel <jobid> | Cancel a job in queue
salloc | (advanced) Allocate resources from the frontend node. Use srun to run using those resources; exit to close the shell when done. (see options below)
scontrol | View/modify job and slurm configuration
Command | Option | Description
---|---|---
sbatch/srun/sinteractive | -t, --time=hh:mm:ss | time limit
 | -t, --time=dd-hh | time limit, days-hours
 | -p, --partition=PARTITION | job partition. Usually leave off and things are auto-detected.
 | --mem-per-cpu=n | request n MB of memory per core
 | --mem=n | request n MB memory per node
 | -c, --cpus-per-task=n | Allocate n CPUs for each task. For multithreaded jobs. (compare --ntasks: -c means the number of cores for each process started.)
 | -N, --nodes=n-m | allocate minimum of n, maximum of m nodes
 | -n, --ntasks=n | allocate resources for and start n tasks (one task = one process started; it is up to you to make them communicate. However, the main script runs only on the first node; the sub-processes run with srun are run this many times.)
 | -J, --job-name=name | short job name
 | -o, --output=outputfile | print output into file outputfile
 | -e, --error=errorfile | print errors into file errorfile
 | --exclusive | allocate exclusive access to nodes. For large parallel jobs.
 | --constraint=feature | request feature (see slurm features for the list of available features)
 | --array=0-5,7,10-15 | Run job multiple times, use variable $SLURM_ARRAY_TASK_ID to adjust parameters
 | --gres=gpu:1 | request a GPU, or --gres=gpu:2 for two GPUs
 | --gres=spindle | request nodes that have disks
 | --mail-type=type | notify of events: BEGIN, END, FAIL, or ALL
 | --mail-user=your@email | whom to send the email
srun | hostname | Print allocated nodes (from within script)
See also
There is a full description of running jobs on Triton and the reference page lists many useful commands.
Exercises¶
1. Submit a batch job that just runs hostname. Set time to 1 hour and 15 minutes, memory to 500MB. Change the job’s name and output file. Monitor the job with slurm watch queue. Check the output. Does it match slurm history?
2. Create a simple batch script to run the Pi calculation script pi.py used in the previous exercises. Create multiple job steps (separate srun lines), each of which runs pi.py with a greater number of tries. How does this appear in slurm history? When would you use extra srun commands, and when not?
3. Create a batch script which does nothing (or some pointless operation for a while), for example sleep 300 (waits for 300 seconds), in the debug partition. Check the queue to see when it starts running. Then cancel the job. What output is produced?
4. What happens if you submit a batch script with bash instead of sbatch? Does it appear to run? Does it use all the Slurm options?
5. (Advanced) Create a batch script that runs in another language using a different #! line. Does it run? What are some of the advantages and problems here?
What’s next?¶
Running multiple instances of an sbatch script is easier with array jobs.