Job dependencies
Abstract
Sometimes several jobs need to run in sequence, for example when different parts of a workflow require different types of resources. This can be done with job dependencies, which are set with the --dependency Slurm option.
Introduction
A job dependency tells Slurm to hold a job in the queue until a condition on one or more other jobs is met. The most common use is to launch a job only after a previous job has completed successfully, but other kinds of dependencies are also possible.
Basic example
Dependencies are specified with the --dependency=DEPENDENCY_LIST option. For example, --dependency=afterok:123:124 means that the job can start only after jobs 123 and 124 have both completed successfully.
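When the job IDs are held in shell variables, the dependency list can be assembled programmatically. The following is a minimal sketch; the helper name make_dependency is hypothetical and not part of Slurm.

```shell
# Hypothetical helper (not part of Slurm): build a --dependency value
# such as "afterok:123:124" from a dependency type and a list of job IDs.
make_dependency() {
    local type="$1"
    shift
    local dep="$type"
    local id
    for id in "$@"; do
        dep="${dep}:${id}"
    done
    printf '%s\n' "$dep"
}

# Example: prints afterok:123:124
make_dependency afterok 123 124
```

The result could then be passed on as sbatch --dependency="$(make_dependency afterok 123 124)" jobC.sh, where jobC.sh is a placeholder script name.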
Automating job dependencies
A common situation is that you want job B to start only after job A finishes successfully, but you cannot know the job ID of job A before it has been submitted. One solution is to capture the job ID of job A when submitting it, store it in a shell variable, and use the stored value when submitting job B. Since sbatch prints a line of the form "Submitted batch job 123", the job ID is the fourth field:
$ idA=$(sbatch jobA.sh | awk '{print $4}')
$ sbatch --dependency=afterok:${idA} jobB.sh
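As an alternative to parsing the output with awk, sbatch has a --parsable option that prints only the job ID (followed by ";" and the cluster name on multi-cluster setups). The sketch below wraps this in a small function; the name submit_after is hypothetical, and the script names are the ones used above.

```shell
# Hypothetical helper: submit script $2 so that it starts only after
# job $1 has completed successfully, and print the new job ID.
submit_after() {
    local prev_id="$1"
    local script="$2"
    local out
    # --parsable prints "ID" or "ID;cluster"
    out=$(sbatch --parsable --dependency="afterok:${prev_id}" "$script") || return 1
    # Strip an optional ";cluster" suffix so only the ID remains
    printf '%s\n' "${out%%;*}"
}

# Usage on a Slurm cluster:
#   idA=$(sbatch --parsable jobA.sh)
#   idB=$(submit_after "$idA" jobB.sh)
```

Returning the new job ID from the function makes it easy to feed it into the next submission.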
Exercises
Dependencies-1: read the docs
Look at man sbatch and investigate the --dependency parameter.
Dependencies-2: Chain of jobs
Create a chain of jobs A -> B -> C, each depending on the successful completion of the previous job. In each job, run e.g. sleep 60 to give you time to investigate the status of the queue.
Solution
You should see all of your jobs in the queue. Jobs that depend on a previous job will have the status pending, with a dependency given as the reason.
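One possible way to submit such a chain is sketched below. The function name submit_chain and the script names jobA.sh, jobB.sh and jobC.sh are assumptions made for illustration; each script is expected to be an ordinary batch script that runs sleep 60.

```shell
# Hypothetical helper: submit the given scripts as a chain, each one
# starting only after the previous one completed successfully (afterok).
submit_chain() {
    local prev="" script id
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # First job in the chain has no dependency
            id=$(sbatch --parsable "$script") || return 1
        else
            id=$(sbatch --parsable --dependency="afterok:${prev}" "$script") || return 1
        fi
        # Strip an optional ";cluster" suffix from the --parsable output
        id="${id%%;*}"
        echo "${script}: job ${id}"
        prev="$id"
    done
}

# Usage on a Slurm cluster:
#   submit_chain jobA.sh jobB.sh jobC.sh
#   squeue --me
```

Running squeue --me afterwards should list the later jobs as pending while the first one runs.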
Dependencies-3: First job fails
Continuing from the previous exercise, what happens if you put exit 1 at the end of the job A script? What does it mean?
Solution
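Putting exit 1 at the end of your job script means that it returns a Unix exit code indicating a failure. The next jobs in your dependency chain will then sit in the queue indefinitely, since as far as they know the previous job never completed successfully. With default settings, squeue typically reports the reason DependencyNeverSatisfied for such jobs, and they stay pending until you cancel them with scancel, unless your site is configured to remove them automatically.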
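Jobs stuck in this way can be found by filtering the squeue output on the pending reason. The following is a sketch under the assumption that your Slurm version reports the reason as DependencyNeverSatisfied and supports squeue --me; the function name stuck_jobs is hypothetical.

```shell
# Hypothetical helper: print the IDs of your pending jobs whose
# dependency can never be satisfied.
stuck_jobs() {
    # %i = job ID, %r = reason the job is pending
    squeue --me --noheader --states=PENDING --format="%i %r" |
        awk '$2 == "DependencyNeverSatisfied" {print $1}'
}

# Usage on a Slurm cluster (cancels every job that was found):
#   for id in $(stuck_jobs); do scancel "$id"; done
```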