Serial Jobs

Introduction to batch scripts

In the interactive jobs tutorial, you learned that all Triton users must do their computation by submitting jobs to the Slurm batch system, which ensures efficient resource sharing.

You also learned how to submit jobs interactively, for example by opening an interactive Bash session on a compute node. This is useful for tests and debugging. Normally, however, Slurm jobs are batch jobs: they run unattended and asynchronously, without human supervision.

To create a batch job, you need to write a job script and then submit it to Slurm. A job script is simply a shell script, e.g. a Bash script, in which you put your resource requests and job steps. You will see what these two components are shortly. You have already seen how to do both interactively; in this tutorial you will learn how to bundle them into a job script.

See also

Please refer to the interactive jobs tutorial to learn the basics of Slurm.

Your first job script

A job script is simply a shell script (Bash). The first line of the script should therefore be the shebang directive (#!) followed by the full path to the interpreter executable, which is Bash in our case. The resource requests and the job steps then follow.

Let’s take a look at the following script:

#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --mem-per-cpu=100
#SBATCH --output=/scratch/work/%u/hello.%j.out
#SBATCH --partition debug

srun echo "Hello $USER! You are on node $HOSTNAME"

Let’s name it hello.sh (create a file using your editor of choice, e.g. nano; write the script above into it and save it).
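
If you prefer to stay on the command line, one way to create the file is with a here document; quoting 'EOF' keeps $USER and $HOSTNAME from being expanded while the file is written. Any editor works just as well:

$ cat > hello.sh <<'EOF'
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --mem-per-cpu=100
#SBATCH --output=/scratch/work/%u/hello.%j.out
#SBATCH --partition debug

srun echo "Hello $USER! You are on node $HOSTNAME"
EOF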

Lines starting with #SBATCH are understood by Slurm as parameters determining the resource requests. Here, we have requested a time limit of 5 minutes and 100 MB of RAM per CPU.

The resource requests are followed by the job steps, which are the actual tasks to be done. Each srun line is a job step and appears as a separate row in your job history, which is useful for monitoring.

Having written the script, you need to submit the job to Slurm with the sbatch command:

$ sbatch hello.sh
Submitted batch job 52428672

Warning

You must submit the job with sbatch, not bash, so that the #SBATCH headers are processed and the job runs through the queue in the background.

When the job enters the queue successfully, a confirmation is printed in your terminal along with the job ID assigned to the job.
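
If you want to capture the job ID for use in your own scripts, sbatch also accepts a --parsable option that prints just the ID (on some setups followed by the cluster name); a small sketch, reusing the job ID above purely for illustration:

$ jobid=$(sbatch --parsable hello.sh)
$ echo "Submitted job $jobid"     # e.g. "Submitted job 52428672"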

You can check the status of your jobs using slurm q:

$ slurm q
JOBID              PARTITION NAME                  TIME       START_TIME    STATE NODELIST(REASON)
52428672           debug     hello.sh              0:00              N/A  PENDING (None)

Once the job completes successfully, its state changes to COMPLETED and the output is saved to hello.%j.out in your work directory (%j is replaced by the job ID).
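
Using the job ID from the example above (purely illustrative), you could then locate and read the output, for example:

$ ls /scratch/work/$USER/hello.*.out           # list output files written so far
$ cat /scratch/work/$USER/hello.52428672.out   # print the output of this particular job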

Setting resource parameters

In both the above example and the tutorial on interactive jobs, you learned that resources are requested through job parameters such as --mem, --time, etc.

See also

See interactive jobs, the reference page or the details page for more information and advanced usage.

Please keep in mind that these parameters are hard limits. If, for example, you request 5 GB of memory and your job uses substantially more, Slurm will kill your job.
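
As an illustration, the 5 GB example would look like this in a job script; the time limit and program name here are only placeholders:

#!/bin/bash
#SBATCH --time=01:00:00   # the job is stopped when this time limit is reached
#SBATCH --mem=5G          # the job is killed if it uses substantially more memory than this

srun ./my_program         # placeholder for your own program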

Note

Actually, there is a little bit of a grace period before jobs are killed (about an hour), and you can go over the memory limit a little. But if you go over your memory request and the node runs out of memory, your job will be the first one to be killed! Don’t count on this.

We recommend being as specific as possible when setting your resource parameters, as they determine how fast your jobs will run. Try to learn how many resources your code actually needs so that you can fine-tune your requests.
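
One practical approach is to run a representative job once with generous limits and then check what it actually used, for example with seff (the job ID here is illustrative):

$ seff 52428672    # shows the CPU and memory efficiency of a finished job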

Note

In general, please do not submit very short jobs (under 5 minutes) unless you are debugging. For bulk production, try to have each job take at least 30 minutes, if possible. The reason is that each job incurs a significant amount of startup, accounting, and scheduling overhead.

Monitoring your jobs

Once you submit a job, it goes into the queue. The two most useful commands for checking the status of your jobs are slurm q and slurm h (you have seen both in use already).

For example, the command scontrol show -d jobid <jobid> provides detailed information about a job, such as where stderr and stdout are redirected. This information can be particularly useful for troubleshooting.
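
For instance, to see only the output paths of the job submitted earlier (the job ID is illustrative), you could filter the output like this:

$ scontrol show -d jobid 52428672 | grep -E 'StdOut|StdErr'   # show where stdout/stderr go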

Another example is the command sacct --format=jobid,elapsed,ncpus,ntasks,state,MaxRSS, which shows the fields listed in the --format option (job ID, elapsed time, number of allocated CPUs, and so on). You can select any fields of interest with --format.
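
You can also restrict sacct to a single job with -j, again using the illustrative job ID from above:

$ sacct -j 52428672 --format=jobid,elapsed,ncpus,ntasks,state,MaxRSS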

You can see more commands below.

Command                          Description
slurm q ; slurm qq               Status of your queued jobs (long/short)
slurm partitions                 Overview of partitions (A/I/O/T = active/idle/other/total)
slurm cpus <partition>           List free CPUs in a partition
slurm history [1day,2hour,…]     Show status of recent jobs
seff <jobid>                     Show percent of mem/CPU used in a job
slurm j <jobid>                  Job details (only while running)
slurm s ; slurm ss <partition>   Show status of all jobs
sacct                            Full history information (advanced, needs args)

Full slurm command help:

$ slurm

Show or watch job queue:
 slurm [watch] queue     show own jobs
 slurm [watch] q   show user's jobs
 slurm [watch] quick     show quick overview of own jobs
 slurm [watch] shorter   sort and compact entire queue by job size
 slurm [watch] short     sort and compact entire queue by priority
 slurm [watch] full      show everything
 slurm [w] [q|qq|ss|s|f] shorthands for above!
 slurm qos               show job service classes
 slurm top [queue|all]   show summary of active users
Show detailed information about jobs:
 slurm prio [all|short]  show priority components
 slurm j|job      show everything else
 slurm steps      show memory usage of running srun job steps
Show usage and fair-share values from accounting database:
 slurm h|history   show jobs finished since, e.g. "1day" (default)
 slurm shares
Show nodes and resources in the cluster:
 slurm p|partitions      all partitions
 slurm n|nodes           all cluster nodes
 slurm c|cpus            total cpu cores in use
 slurm cpus <partition>  cores available to partition, allocated and free
 slurm cpus jobs         cores/memory reserved by running jobs
 slurm cpus queue        cores/memory required by pending jobs
 slurm features          List features and GRES

Examples:
 slurm q
 slurm watch shorter
 slurm cpus batch
 slurm history 3hours

Other advanced commands (many require lots of parameters to be useful):

Command       Description
squeue        Full info on queues
sinfo         Advanced info on partitions
slurm nodes   List all nodes

Partitions

A partition is a set of computing nodes dedicated to a specific purpose. Examples include partitions for debugging (the “debug” partition), batch processing (the “batch” partition), GPU computing (the “gpu” partition), and so on.

The command sinfo lists the available partitions. For the sake of brevity, let’s look at just the first four partitions:

$ sinfo | head -n 5
PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
interactive      up 1-00:00:00      2   drng pe[1-2]
jupyter-long     up 10-00:00:0      2   drng pe[1-2]
jupyter-short    up 1-00:00:00      2   drng pe[1-2]
grid             up 3-00:00:00      1  drain pe76

You can also ask sinfo to show a specific partition:

$ sinfo --partition=debug
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug        up    1:00:00      1 drain* wsm1
debug        up    1:00:00      1  drain pe3
debug        up    1:00:00      1   idle pe83

Take a look at the manpage using man sinfo for more details.

Generally, you don’t need to specify the partition; Slurm will use any possible partition. However, you can do so with -p PARTITION_NAME. This is mainly needed if you want to force the interactive or debug partition (Slurm usually runs short jobs on the debug partition).
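
For example, to force the debug partition you could pass the option on the command line (or keep the #SBATCH --partition debug line inside the script, as hello.sh above already does):

$ sbatch --partition=debug hello.sh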

See also

You can see the partitions in the quick reference.

Full reference

Command           Description
sbatch            Submit a job to the queue (see standard options below)
srun              Within a running job script/environment: run code using the allocated resources (see options below)
srun              On the frontend: submit to the queue, wait until done, show output (see options below)
sinteractive      Submit a job, wait, provide a shell on a node for interactive playing (X forwarding works, default partition interactive). Exit the shell when done. (see options below)
srun --pty bash   (advanced) Another way to run interactive jobs; no X forwarding but simpler. Exit the shell when done.
scancel <jobid>   Cancel a job in the queue
salloc            (advanced) Allocate resources from the frontend node. Use srun to run using those resources; exit to close the shell when done. Read the description! (see options below)
scontrol          View/modify job and Slurm configuration
Command           Option                      Description
sbatch/srun/etc   -t, --time=hh:mm:ss         time limit
                  -t, --time=dd-hh            time limit, days-hours
                  -p, --partition=partition   job partition; usually leave off and things are auto-detected
                  --mem-per-cpu=n             request n MB of memory per core
                  --mem=n                     request n MB of memory per node
                  -c, --cpus-per-task=n       allocate n CPUs for each task, for multithreaded jobs (compare --ntasks: -c means the number of cores for each process started)
                  -N, --nodes=n-m             allocate a minimum of n and a maximum of m nodes
                  -n, --ntasks=n              allocate resources for and start n tasks (one task = one process started; it is up to you to make them communicate. The main script runs only on the first node; the sub-processes run with srun are run this many times.)
                  -J, --job-name=name         short job name
                  -o output                   print output into file output
                  -e error                    print errors into file error
                  --exclusive                 allocate exclusive access to nodes, for large parallel jobs
                  --constraint=feature        request a feature (see slurm features for the current list of configured features, or Arch under the hardware list); multiple with --constraint="hsw|skl"
                  --array=0-5,7,10-15         run the job multiple times, using the variable $SLURM_ARRAY_TASK_ID to adjust parameters
                  --gres=gpu                  request a GPU, or --gres=gpu:n for multiple
                  --gres=spindle              request nodes that have local disks; spindle:n for a certain number of RAID0 disks
                  --mail-type=type            notify of events: BEGIN, END, FAIL, REQUEUE (not on Triton), or ALL; must be used together with --mail-user=
                  --mail-user=your@email      whom to send the email to
srun              -N <N_nodes> hostname       print the allocated nodes (from within a script)
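
As a rough sketch of how several of these options combine in a single job script header; the job name, output file, e-mail address, and program below are placeholders:

#!/bin/bash
#SBATCH --time=02:00:00              # 2 hour time limit
#SBATCH --mem-per-cpu=500            # 500 MB of memory per core
#SBATCH --cpus-per-task=4            # 4 cores for one multithreaded process
#SBATCH --job-name=myanalysis        # short job name
#SBATCH --output=myanalysis.%j.out   # output file; %j is replaced by the job ID
#SBATCH --mail-type=END              # e-mail when the job finishes...
#SBATCH --mail-user=your@email       # ...to this address

srun ./my_program                    # placeholder for your own program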

See also

There is a full description of running jobs on Triton, and the reference page lists many useful commands.

Exercises

  1. Submit a batch job that just runs hostname.
    1. Set time to 1 hour and 15 minutes, memory to 500MB.
    2. Change the job’s name and output file.
    3. Monitor the job with slurm watch queue.
    4. Check the output. Does it match slurm history?
  2. Create a simple batch script using pi.py based on the pi calculation of the interactive job tutorial exercises. Create multiple job steps (separate srun lines), each of which runs pi.py with a greater and greater number of tries. How does this appear in slurm history? When would you use extra srun commands, and when not?
  3. Create a batch script which does nothing (or some pointless operation for a while), for example sleep 300 (waits for 300 seconds) in the debug partition. Check the queue to see when it starts running. Then, cancel the job. What output is produced?
  4. What happens if you submit a batch script with bash instead of sbatch? Does it appear to run? Does it use all the Slurm options?
  5. (Advanced) Create a batch script that runs in another language. Does it run? What are some of the advantages and problems here?

What’s next?

Running multiple instances of an sbatch script is easier with array jobs.