Serial Jobs

See also

This assumes you have read the interactive jobs tutorial first.

Introduction

Triton is a large system that combines many different individual computers. At the same time, hundreds of people are using it. Thus, we must use a batch queuing system (slurm) in order to allocate resources.

The queue system takes computation requests from everyone, figures out the optimal use of resources, and allocates jobs to nodes. You have to start your code in a structured way in order for this to work. Our previous tutorial showed how to run things directly from the command line, without any scripting needed.

Now let’s see how to put these into scripts. A shell script takes any commands that you might type directly into a shell and automates them. The slurm scripts that we make in this lesson do just this. Scripts allow jobs to run asynchronously, in batch, and without human supervision.

A basic script

Let’s say we want to run echo 'hello world'. We have to tell the system how to run it. Here is a simple submission script; put it in a file called hello.slrm (you can use the editor nano: nano hello.slrm):

#!/bin/bash
#SBATCH --time=0-00:05:00    # 5 mins
#SBATCH --mem-per-cpu=500    # 500MB of memory

srun echo 'hello, world'

Whatever your application or programming language requires, you put it in the script.

Each srun is a job step and appears as a separate row in your job history, which is useful for monitoring. Submit the script with sbatch:

$ sbatch hello.slrm

This sends it to the queue to wait. Since the time requested is short, it will probably run on the debug partition, which is reserved for small test jobs (see below). Let’s see if it is in the queue:

Checking job status with slurm q:

$ slurm q
JOBID              PARTITION NAME                  TIME       START_TIME    STATE NODELIST(REASON)
13031249           debug     hello.slrm            0:00              N/A  PENDING (None)

Keep rerunning slurm q until you see it finish.

You can use scancel with that jobid to cancel the job before it finishes.
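
For example, using the job ID shown in the listing above:

$ scancel 13031249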

The output is then saved to slurm-13031249.out in your current directory (the number being the job ID).
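
To check the result once the job has finished (again using the example job ID):

$ cat slurm-13031249.out
hello, world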

Loading modules in scripts

Need to load modules for your software? Do it in the batch scripts. In general, anything you can do from the shell, you can do here:

#!/bin/bash
#SBATCH --time=0-00:05:00    # 5 mins
#SBATCH --mem-per-cpu=500    # 500MB of memory

module load anaconda3
python -V

Exercise: Try the Python version-printing script above. Try changing to different modules: anaconda2, Python, and others if you can find them.

Job parameters

As you can see, the above script is limited to 5 minutes and 500MB of memory. All scripts have to have limits, otherwise they can’t be efficiently scheduled. If you exceed the limits, the job will be killed. At a minimum, you need to set --time and either --mem-per-cpu or --mem.

See the previous tutorial, the reference page or the details page for more information and advanced usage.

The same parameters can be used in any of the following (see the example after this list):

  • The sbatch script, prefixed by #SBATCH
  • The sbatch command line program directly (for example, sbatch -p debug hello.slrm)
  • sinteractive/srun from the command line, which lets you run programs without a batch script.
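
As a sketch, the same 5-minute time limit given in each of those three places (hello.slrm is the script from above):

#SBATCH --time=00:05:00                      # inside the batch script, as a #SBATCH line
$ sbatch --time=00:05:00 hello.slrm          # on the sbatch command line
$ srun --time=00:05:00 echo 'hello, world'   # with srun directly from the command line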

It is important to note that slurm is a declarative system. You declare what you need, and slurm handles finding the resources without you having to worry about details. The more resources you request, the harder it will be to schedule and the longer you may have to wait. So, you should ask for enough to make sure your job can complete, but once you get experience with your code reduce resources to just what is needed.

In general, you don’t want to submit jobs that are too short (under 5 minutes), because there is a lot of startup, accounting, and scheduling overhead. If you are testing, short jobs are fine, but once you get to bulk production, try to have each job take at least 30 minutes if possible. If you have lots of small things to run, combine them into fewer jobs.
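
A minimal sketch of combining many short tasks into one job (process_one.py and the input files are hypothetical placeholders for your own code and data):

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=500

module load anaconda3
# Each srun below is a separate job step within the same allocation,
# so the queuing and scheduling overhead is paid only once.
for i in $(seq 1 20); do
    srun python process_one.py input_${i}.dat
done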

Full slurm reference

Command Description
sbatch submit a job to queue (see standard options below)
srun Within a running job script/environment: Run code using the allocated resources (see options below)
srun On frontend: submit to queue, wait until done, show output. (see options below)
sinteractive Submit job, wait, provide shell on node for interactive playing (X forwarding works, default partition interactive). Exit shell when done. (see options below)
srun --pty bash (advanced) Another way to run interactive jobs, no X forwarding but simpler. Exit shell when done.
scancel <jobid> Cancel a job in queue
salloc (advanced) Allocate resources from frontend node. Use srun to run using those resources, exit to close shell when done. Read the description! (see options below)
scontrol View/modify job and slurm configuration
Command Option Description
sbatch/srun/etc -t, --time=hh:mm:ss time limit
  -t, --time=dd-hh time limit, days-hours
  -p, --partition=partition job partition. Usually leave off and things are auto-detected.
  --mem-per-cpu=n request n MB of memory per core
  --mem=n request n MB memory per node
  -c, --cpus-per-task=n Allocate n CPUs for each task. For multithreaded jobs. (Compare --ntasks: -c sets the number of cores for each started process.)
  -N, --nodes=n-m allocate minimum of n, maximum of m nodes.
  -n, --ntasks=n allocate resources for and start n tasks (one task = one started process; it is up to you to make them communicate. The main script runs only on the first node, but commands run with srun are run this many times.)
  -J, --job-name=name short job name
  -o output print output into file output
  -e error print errors into file error
  --exclusive allocate exclusive access to nodes. For large parallel jobs.
  --constraint=feature request feature (see slurm features for the current list of configured features, or Arch under the hardware list). Multiple with --constraint="hsw|skl".
  --array=0-5,7,10-15 Run job multiple times, use variable $SLURM_ARRAY_TASK_ID to adjust parameters.
  --gres=gpu request a GPU, or --gres=gpu:n for multiple
  --gres=spindle request nodes that have local disks; use spindle:n for a certain number of RAID0 disks
  --mail-type=type notify of events: BEGIN, END, FAIL, REQUEUE (not on triton), or ALL. Must be used together with --mail-user=.
  --mail-user=your@email whom to send the email to
srun -N <N_nodes> hostname Print allocated nodes (from within script)
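
As a sketch, several of these options combined in one submission script header (the values and my_program are only examples; %j in the output filename is replaced by the job ID):

#!/bin/bash
#SBATCH --time=0-02:00:00
#SBATCH --mem-per-cpu=1000
#SBATCH --cpus-per-task=4
#SBATCH --job-name=myjob
#SBATCH --output=myjob_%j.out

srun my_program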

Status of the jobs

Once you submit a job, it goes into the queue. You need to be able to see the status of your jobs; there are commands to do this.

Command Description
slurm q ; slurm qq Status of your queued jobs (long/short)
slurm j <jobid> Status of a single job (details, only while running)
slurm history [2hours|5days|…] Info on completed jobs, including memory/CPU usage
seff <jobid> Show percent of memory/CPU used in a job
slurm partitions Overview of partitions (A/I/O/T = active/idle/other/total)
slurm cpus <partition> List free CPUs in a partition
slurm s ; slurm ss <partition> Show status of all jobs
sacct Full history information (advanced, needs arguments)
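
For example, once the hello job from above has finished, you could check how it went (the job ID is the one from this tutorial; the exact output format may differ):

$ slurm history 2hours
$ seff 13031249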

Full slurm command help:

$ slurm

Show or watch job queue:
 slurm [watch] queue     show own jobs
 slurm [watch] q <user>  show user's jobs
 slurm [watch] quick     show quick overview of own jobs
 slurm [watch] shorter   sort and compact entire queue by job size
 slurm [watch] short     sort and compact entire queue by priority
 slurm [watch] full      show everything
 slurm [w] [q|qq|ss|s|f] shorthands for above!
 slurm qos               show job service classes
 slurm top [queue|all]   show summary of active users
Show detailed information about jobs:
 slurm prio [all|short]  show priority components
 slurm j|job <jobid>     show everything else
 slurm steps      show memory usage of running srun job steps
Show usage and fair-share values from accounting database:
 slurm h|history   show jobs finished since, e.g. "1day" (default)
 slurm shares
Show nodes and resources in the cluster:
 slurm p|partitions      all partitions
 slurm n|nodes           all cluster nodes
 slurm c|cpus            total cpu cores in use
 slurm cpus <partition>  cores available to partition, allocated and free
 slurm cpus jobs         cores/memory reserved by running jobs
 slurm cpus queue        cores/memory required by pending jobs
 slurm features          List features and GRES

Examples:
 slurm q
 slurm watch shorter
 slurm cpus batch
 slurm history 3hours

Other advanced commands (many require lots of parameters to be useful):

Command Description
squeue Full info on queues
sinfo Advanced info on partitions
slurm nodes List all nodes
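
For example, a couple of common direct uses of the underlying commands (standard slurm options, independent of the slurm wrapper script):

$ squeue -u $USER     # your own jobs in the queue
$ sinfo -p batch      # state of the batch partition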

See the full list of status commands on the reference page.

Partitions

There are different partitions, which have different limits. The “debug” partition is for short debugging jobs and is designed to always have resources available. The “batch” partition is designed for all the normal long jobs. There are also partitions for GPUs, huge-memory nodes, interactive shells, and so on. Most of the time you should leave the partition off, and slurm will use all possible partitions. You can specify a partition with -p PARTITION_NAME to whatever command you are running, which is mainly needed if you want to force the interactive or a test partition. The available partitions are listed on the reference page.
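
For example, to force the short test job from above onto the debug partition (normally you would leave -p off and let slurm decide):

$ sbatch -p debug hello.slrm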

You can see the partitions in the quick reference.

Exercises

  1. Basics
    1. Submit a batch job that just runs hostname.
    2. Set time to 1 hour and 15 minutes, memory to 500MB.
    3. Change the job’s name and output file.
    4. Monitor the job with slurm watch queue.
    5. Check the output. Does it match slurm history?
  2. Create a simple batch script using pi.py based on the pi calculation of the interactive job tutorial exercises. Create multiple job steps (separate srun lines), each of which runs pi.py with a greater and greater number of tries. How does this appear in slurm history? When would you use extra srun commands, and when not? (A possible starting point is sketched after this list.)
  3. Create a batch script which does nothing (or some pointless operation for a while), for example sleep 300 (waits for 300 seconds) in the debug partition. Check the queue to see when it starts running. Then, cancel the job. What output is produced?
  4. (Advanced) Create a batch script that runs in another language. Does it run? What are some of the advantages and problems here?
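
A possible starting point for exercise 2 (a sketch only; it assumes pi.py from the interactive tutorial is in the current directory and takes the number of tries as its first argument):

#!/bin/bash
#SBATCH --time=0-00:10:00
#SBATCH --mem-per-cpu=500

module load anaconda3
# Each srun is a separate job step and shows up as its own row in slurm history
srun python pi.py 100
srun python pi.py 10000
srun python pi.py 1000000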

Next steps

There is a full description of running jobs on Triton, and the reference page lists many useful commands.

Running multiple instances of a sbatch script is easier with array jobs.