Running serial jobs in parallel
Very often you need to run the same script with many different sets of parameters. This is commonly called an embarrassingly parallel problem, because a) there are usually many more runs than available processors, and b) the runs are independent of each other, which makes them very easy to parallelize. Here, we show how to adapt your code for this kind of situation, starting from a non-parallel version of a problem. We assume an individual script with fixed parameter values and modify it to adapt its input based on the array task ID provided by Slurm.
The unparallelized version
Let's assume you have a genetic algorithm optimization pipeline with a few fixed parameters.
import pygad
import numpy as np
function_inputs = np.array([4,-2,3.5,5,-11,-4.7])
desired_output = 44
# Note: PyGAD 2.20 and later expect fitness_func(ga_instance, solution, solution_idx)
def fitness_func(solution, solution_idx):
    output = np.sum(solution*function_inputs)
    fitness = 1.0 / np.abs(output - desired_output)
    return fitness
# define the parameters
fitness_function = fitness_func
num_generations = 200000
num_parents_mating = 4
sol_per_pop = 100
num_genes = len(function_inputs)
mutation_percent_genes = 10
stop_criteria="saturate_50"
ga_instance = pygad.GA(num_generations=num_generations,
                       num_parents_mating=num_parents_mating,
                       fitness_func=fitness_function,
                       sol_per_pop=sol_per_pop,
                       num_genes=num_genes,
                       mutation_percent_genes=mutation_percent_genes,
                       stop_criteria=stop_criteria)
ga_instance.run()
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Parameters of the best solution : {solution}".format(solution=solution))
print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))
prediction = np.sum(np.array(function_inputs)*solution)
print("Predicted output based on the best solution : {prediction}".format(prediction=prediction))
library(GA)
f <- function(solution){
  output = c(4,-2,3.5,5,-11,-4.7) %*% solution
  fitness = 1./abs(output-44)
  return(fitness)
}
# search space bounds
lowers = c(-20,-20,-20,-20,-20,-20)
uppers = c(20,20,20,20,20,20)
# maximum generations
maxiter = 200000
# set the population size
popSize = 100
# set the maximum number of generations to proceed if no improvement happens
run = 50
# mutation percentage
pmutation = 0.1
GA <- ga(type = "real-valued", fitness = f, lower = lowers, upper = uppers, maxiter= maxiter, run=run, pmutation = pmutation, popSize = popSize)
summary(GA)
GA@solution %*% c(4,-2,3.5,5,-11,-4.7)
% set the mutation rate
mutationRate = 0.1;
opts = optimoptions('ga','MutationFcn', {@mutationuniform, mutationRate});
% Set population size and end criteria
opts.PopulationSize = 100;
opts.MaxStallGenerations = 50;
opts.MaxGenerations = 200000;
%set the range for all genes
opts.InitialPopulationRange = [-20;20];
% define number of variables (genes)
numberOfVariables = 6;
[x,Fval,exitFlag,Output] = ga(@fitness,numberOfVariables,[],[],[], ...
[],[],[],[],opts);
output = [4,-2,3.5,5,-11,-4.7] * x'
exit(0)
function fit = fitness(x)
    output = [4,-2,3.5,5,-11,-4.7] * x';
    fit = abs(output - 44);
end
The parameters set in this example are the maximum number of generations, the population size, the maximum number of stalled generations (i.e. how many generations the algorithm should continue without improvement) and the mutation rate. Let's assume we want to test how different mutation rates change the outcome of the algorithm and its runtime. We could do this within each language by looping over mutation rates from 1% to 100%, but running the values one after another can take quite some time. Alternatively, we can run 100 jobs, each handling one mutation rate.
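For comparison, the serial alternative could look roughly like the sketch below. It is only an illustration: `run_experiment` is a hypothetical placeholder for one complete genetic algorithm run and is not part of the original scripts. The point is that the total runtime is the sum of all individual runs, because each value has to wait for the previous one to finish.

```python
import time


def run_experiment(mutation_percent):
    """Hypothetical stand-in for one GA run with the given mutation percentage."""
    # In the real script, this is where pygad.GA(...) would be built and run.
    return {"mutation_percent": mutation_percent}


def serial_sweep(percentages):
    """Run every parameter value one after another and time the whole sweep."""
    start = time.perf_counter()
    results = [run_experiment(p) for p in percentages]
    elapsed = time.perf_counter() - start
    return results, elapsed


# Sweep over 1..100 percent, one value at a time
results, elapsed = serial_sweep(range(1, 101))
print(f"{len(results)} runs took {elapsed:.3f} s in total")
```

Since the runs do not depend on each other, each iteration of this loop can instead become its own job.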
Running a Slurm array job
Array jobs are defined in Slurm with the parameter --array=XXX-YYY, where XXX is the lowest index and YYY the highest index. Each job has access to its own SLURM_ARRAY_TASK_ID environment variable. There are two ways to incorporate this into a job: either directly in the submission script, or by retrieving it in your code. The former allows selecting input file names based on the array ID number, e.g.:
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=500M
#SBATCH --array=1-4
srun ./my_application -input input_data_${SLURM_ARRAY_TASK_ID}
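To make the effect of the array range concrete, the following sketch lists the command each array task would end up running for the submission script above. The helper `expand_array` is a hypothetical name used only for this illustration. Slurm performs the expansion itself by setting SLURM_ARRAY_TASK_ID separately for each task.

```python
def expand_array(first, last,
                 template="./my_application -input input_data_{task_id}"):
    """Return the command line each array task would run (illustration only)."""
    return [template.format(task_id=task_id)
            for task_id in range(first, last + 1)]


# --array=1-4 creates four tasks with IDs 1, 2, 3 and 4
for command in expand_array(1, 4):
    print(command)
```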
In our case, however, we would like to use the task ID directly within the script we run, so we set up the following Slurm script:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --array=1-100
#SBATCH --mem=500M
#SBATCH --output=python_array_%a.out
module load scicomp-python-env # use the normal scicomp environment for python
srun python serial.py
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --array=1-100
#SBATCH --mem=500M
#SBATCH --output=r_array_%a.out
# Load the version of R you want to use
module load r
# Run your R script
srun Rscript serial.r
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --array=1-100
#SBATCH --mem=3G
#SBATCH --output=matlab_array_%a.out
module load matlab
srun matlab -nodisplay -r serial
Then we modify the script as follows:
import pygad
import numpy as np
import os
function_inputs = np.array([4,-2,3.5,5,-11,-4.7])
desired_output = 44
def fitness_func(solution, solution_idx):
    output = np.sum(solution*function_inputs)
    fitness = 1.0 / np.abs(output - desired_output)
    return fitness
# define the parameters
fitness_function = fitness_func
num_generations = 200000
num_parents_mating = 4
sol_per_pop = 100
num_genes = len(function_inputs)
mutation_percent_genes = int(os.getenv('SLURM_ARRAY_TASK_ID'))
stop_criteria="saturate_50"
ga_instance = pygad.GA(num_generations=num_generations,
                       num_parents_mating=num_parents_mating,
                       fitness_func=fitness_function,
                       sol_per_pop=sol_per_pop,
                       num_genes=num_genes,
                       mutation_percent_genes=mutation_percent_genes,
                       stop_criteria=stop_criteria)
ga_instance.run()
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Parameters of the best solution : {solution}".format(solution=solution))
print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))
prediction = np.sum(np.array(function_inputs)*solution)
print("Predicted output based on the best solution : {prediction}".format(prediction=prediction))
library(GA)
f <- function(solution){
  output = c(4,-2,3.5,5,-11,-4.7) %*% solution
  fitness = 1./abs(output-44)
  return(fitness)
}
# search space bounds
lowers = c(-20,-20,-20,-20,-20,-20)
uppers = c(20,20,20,20,20,20)
# maximum generations
maxiter = 200000
# set the population size
popSize = 100
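# set the maximum number of generations to proceed if no improvement happens
run = 50
# mutation probability from the array task ID (1-100 -> 0.01-1.00)
# Sys.getenv returns a string, so it has to be converted before dividing
pmutation = as.numeric(Sys.getenv('SLURM_ARRAY_TASK_ID'))/100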
GA <- ga(type = "real-valued", fitness = f, lower = lowers, upper = uppers, maxiter= maxiter, run=run, pmutation = pmutation, popSize = popSize)
summary(GA)
GA@solution %*% c(4,-2,3.5,5,-11,-4.7)
% set the mutation rate
mutationRate = str2double(getenv('SLURM_ARRAY_TASK_ID'))/100;
opts = optimoptions('ga','MutationFcn', {@mutationuniform, mutationRate});
% Set population size and end criteria
opts.PopulationSize = 100;
opts.MaxStallGenerations = 50;
opts.MaxGenerations = 200000;
%set the range for all genes
opts.InitialPopulationRange = [-20;20];
% define number of variables (genes)
numberOfVariables = 6;
[x,Fval,exitFlag,Output] = ga(@fitness,numberOfVariables,[],[],[], ...
[],[],[],[],opts);
output = [4,-2,3.5,5,-11,-4.7] * x'
save(['MutationJob' getenv('SLURM_ARRAY_TASK_ID') '.mat'], 'output');
exit(0)
function fit = fitness(x)
    output = [4,-2,3.5,5,-11,-4.7] * x';
    fit = abs(output - 44);
end
Now our mutation rate is set based on the SLURM_ARRAY_TASK_ID environment variable.
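When the script is started outside of Slurm, for example for a quick local test, SLURM_ARRAY_TASK_ID is not set and the conversion to a number fails. One possible way to guard against this is sketched below. The function name `mutation_percent_from_env`, the fallback of 10 (the fixed value from the unparallelized script) and the range check are assumptions made for this illustration, not part of the original scripts.

```python
import os


def mutation_percent_from_env(var="SLURM_ARRAY_TASK_ID", default=10):
    """Read the mutation percentage from the environment, with a fallback."""
    raw = os.environ.get(var)
    if raw is None:
        # Not running as an array job: use the fixed default instead
        return default
    value = int(raw)
    if not 1 <= value <= 100:
        raise ValueError(f"{var}={value} is not a percentage between 1 and 100")
    return value


mutation_percent_genes = mutation_percent_from_env()
print(f"Using a mutation percentage of {mutation_percent_genes}")
```

The returned value can be passed to the genetic algorithm in the same way as the directly converted environment variable.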
Best Practices
In general, you should avoid creating too many jobs at once, as this puts unnecessary stress on the scheduler. This is particularly important if your individual array jobs only take a very short time (<30 minutes). If you have a large number of very short array jobs, it is a good idea to group them into batches. In our example this would work as follows.
Grouping array jobs
To group jobs without extensive modification of your script, you can add a loop to the batch script that repeatedly calls your script and, on each iteration, either passes different input parameters or exports the loop variable so that it can be accessed within the script. For the genetic algorithm example, the code would need to be modified as follows. First, we introduce a for loop in the Slurm script that runs the program once for each value in the batch.
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --array=1-10
#SBATCH --mem=500M
#SBATCH --output=python_array_%a.out
module load scicomp-python-env # use the normal scicomp environment for python
# size of each batch
BATCHSIZE=10
n=$SLURM_ARRAY_TASK_ID
# generate the sequence of indices used by each batch (task 1 -> 1..10, task 10 -> 91..100)
indexes=$(seq $(((n - 1)*BATCHSIZE + 1)) $((n*BATCHSIZE)))
# run your program for each value
for i in $indexes
do
export i # export i so that it can be read within the Python interpreter
srun python serial_array.py
done
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --array=1-10
#SBATCH --mem=500M
#SBATCH --output=r_array_%a.out
# Load the version of R you want to use
module load r
# size of each batch
BATCHSIZE=10
n=$SLURM_ARRAY_TASK_ID
# generate the sequence of indices used by each batch (task 1 -> 1..10, task 10 -> 91..100)
indexes=$(seq $(((n - 1)*BATCHSIZE + 1)) $((n*BATCHSIZE)))
# run your program for each value
for i in $indexes
do
export i # export i so that it can be read within the R script
srun Rscript serial_array.r
done
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --array=1-10
#SBATCH --mem=3G
#SBATCH --output=matlab_array_%a.out
module load matlab
# size of each batch
BATCHSIZE=10
n=$SLURM_ARRAY_TASK_ID
# generate the sequence of indices used by each batch (task 1 -> 1..10, task 10 -> 91..100)
indexes=$(seq $(((n - 1)*BATCHSIZE + 1)) $((n*BATCHSIZE)))
# run your program for each value
for i in $indexes
do
export i # export i so that it can be read within MATLAB
srun matlab -nodisplay -r serial_array
done
Then we need to change the environment variable that is read in the script.
import pygad
import numpy as np
import os
function_inputs = np.array([4,-2,3.5,5,-11,-4.7])
desired_output = 44
def fitness_func(solution, solution_idx):
    output = np.sum(solution*function_inputs)
    fitness = 1.0 / np.abs(output - desired_output)
    return fitness
# define the parameters
fitness_function = fitness_func
num_generations = 200000
num_parents_mating = 4
sol_per_pop = 100
num_genes = len(function_inputs)
mutation_percent_genes = int(os.getenv('i'))
stop_criteria="saturate_50"
ga_instance = pygad.GA(num_generations=num_generations,
                       num_parents_mating=num_parents_mating,
                       fitness_func=fitness_function,
                       sol_per_pop=sol_per_pop,
                       num_genes=num_genes,
                       mutation_percent_genes=mutation_percent_genes,
                       stop_criteria=stop_criteria)
ga_instance.run()
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Parameters of the best solution : {solution}".format(solution=solution))
print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))
prediction = np.sum(np.array(function_inputs)*solution)
print("Predicted output based on the best solution : {prediction}".format(prediction=prediction))
library(GA)
f <- function(solution){
  output = c(4,-2,3.5,5,-11,-4.7) %*% solution
  fitness = 1./abs(output-44)
  return(fitness)
}
# search space bounds
lowers = c(-20,-20,-20,-20,-20,-20)
uppers = c(20,20,20,20,20,20)
# maximum generations
maxiter = 200000
# set the population size
popSize = 100
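# set the maximum number of generations to proceed if no improvement happens
run = 50
# mutation probability from the exported loop variable i
# Sys.getenv returns a string, so it has to be converted before dividing
pmutation = as.numeric(Sys.getenv('i'))/100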
GA <- ga(type = "real-valued", fitness = f, lower = lowers, upper = uppers, maxiter= maxiter, run=run, pmutation = pmutation, popSize = popSize)
summary(GA)
GA@solution %*% c(4,-2,3.5,5,-11,-4.7)
% set the mutation rate
mutationRate = str2double(getenv('i'))/100;
opts = optimoptions('ga','MutationFcn', {@mutationuniform, mutationRate});
% Set population size and end criteria
opts.PopulationSize = 100;
opts.MaxStallGenerations = 50;
opts.MaxGenerations = 200000;
%set the range for all genes
opts.InitialPopulationRange = [-20;20];
% define number of variables (genes)
numberOfVariables = 6;
[x,Fval,exitFlag,Output] = ga(@fitness,numberOfVariables,[],[],[], ...
[],[],[],[],opts);
output = [4,-2,3.5,5,-11,-4.7] * x'
save(['MutationJob' getenv('i') '.mat'], 'output');
exit(0)
function fit = fitness(x)
    output = [4,-2,3.5,5,-11,-4.7] * x';
    fit = abs(output - 44);
end
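As an alternative to looping in the batch script, the grouping can also be done inside the program, which avoids starting a new interpreter for every value. The sketch below assumes array task IDs starting at 1 and a batch size of 10, so that ten tasks cover the values 1 to 100. `batch_indices` and `run_one` are hypothetical names, and `run_one` only stands in for a single genetic algorithm run.

```python
import os


def batch_indices(task_id, batch_size):
    """Map a 1-based array task ID to the parameter values it should process."""
    first = (task_id - 1) * batch_size + 1
    return list(range(first, first + batch_size))


def run_one(mutation_percent):
    """Hypothetical placeholder for a single GA run; returns the mutation rate."""
    return mutation_percent / 100


# Fall back to task 1 when not running under Slurm (assumption for local tests)
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))
for percent in batch_indices(task_id, batch_size=10):
    print(f"mutation percentage {percent}: rate {run_one(percent)}")
```

With this approach the Slurm script calls the program only once per array task, and no variable has to be exported.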