Running Python with OpenMP parallelization

Various Python packages such as NumPy, SciPy and pandas can use OpenMP to run on multiple CPUs. As an example, let’s run a Python script that calculates the inverse of five symmetric 2000x2000 matrices using np.linalg.pinv (the pseudo-inverse, which equals the ordinary inverse when the matrix is invertible).

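from time import time

import numpy as np

nrounds = 5

t_start = time()

for i in range(nrounds):
    # Build a random symmetric 2000x2000 matrix
    a = np.random.random([2000,2000])
    a = a + a.T
    # Compute its (pseudo-)inverse; this is the step that can run on multiple threads
    b = np.linalg.pinv(a)

t_delta = time() - t_start

print('Seconds taken to invert %d symmetric 2000x2000 matrices: %f' % (nrounds, t_delta))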

The full code for the example is in the HPC examples repository. One can run this example with srun:

module load anaconda
export OMP_PROC_BIND=true
srun --cpus-per-task=2 --mem=2G -t 00:15:00 python

or with sbatch by submitting python_openmp.slrm:

#!/bin/bash -l
#SBATCH -t 00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=1G
#SBATCH -o python_openmp.out

module load anaconda/2020-03-tf2

export OMP_PROC_BIND=true

echo 'Running on: '$HOSTNAME

srun python
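
To check that a Python process actually sees the CPUs requested with --cpus-per-task, something like the following sketch can be run inside the job. It assumes that Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is given and that the node runs Linux (os.sched_getaffinity is Linux-only). The function names visible_cpu_count and report_cpu_settings are hypothetical helpers, not part of the example repository.

```python
import os


def visible_cpu_count():
    """Return the number of CPUs this process is allowed to run on."""
    # os.sched_getaffinity exists on Linux only; fall back to os.cpu_count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def report_cpu_settings(environ=os.environ):
    """Collect thread-related settings (hypothetical helper for illustration)."""
    return {
        "SLURM_CPUS_PER_TASK": environ.get("SLURM_CPUS_PER_TASK", "unset"),
        "OMP_NUM_THREADS": environ.get("OMP_NUM_THREADS", "unset"),
        "OMP_PROC_BIND": environ.get("OMP_PROC_BIND", "unset"),
        "visible_cpus": visible_cpu_count(),
    }


if __name__ == "__main__":
    for name, value in report_cpu_settings().items():
        print(f"{name}: {value}")
```

If the reported number of visible CPUs is smaller than the number of threads the numerical library starts, the threads will compete for the same cores and the run may be slower than expected.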


Python has a global interpreter lock (GIL), which forces some operations to be executed on only one thread; while these operations are occurring, other threads are idle. Such operations include reading files and executing print statements. One should therefore be extra careful with multithreaded code, as it is easy to write seemingly parallel code that does not actually use multiple CPUs.
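
As a minimal sketch of such seemingly parallel code, the following script runs the same pure-Python loop first serially and then in two threads. The function names (count_down, run_serial, run_threaded, timed) are made up for this illustration. On a standard CPython build with the GIL enabled, the threaded version is typically not faster than the serial one, because only one thread can execute Python bytecode at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n):
    """CPU-bound pure-Python loop; it needs the GIL for every iteration."""
    while n > 0:
        n -= 1
    return n


def run_serial(n, tasks):
    """Run the loop `tasks` times, one after another."""
    return [count_down(n) for _ in range(tasks)]


def run_threaded(n, tasks):
    """Run the loop `tasks` times, each in its own thread."""
    with ThreadPoolExecutor(max_workers=tasks) as pool:
        return list(pool.map(count_down, [n] * tasks))


def timed(func, *args):
    """Return (result, elapsed seconds) for one call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n, tasks = 5_000_000, 2
    _, t_serial = timed(run_serial, n, tasks)
    _, t_threaded = timed(run_threaded, n, tasks)
    print(f"serial:   {t_serial:.2f} s")
    print(f"threaded: {t_threaded:.2f} s")
```

Comparing the two printed times on the allocated node gives a quick indication of whether a given piece of threaded code benefits from the extra CPUs at all.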

There are ways to minimize the effects of the GIL on your Python code, and if you’re writing your own multithreaded code, we recommend that you take this into account.
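
One such way, sketched below, is to keep the heavy work inside library calls that release the GIL while they run. The example uses hashlib from the standard library, whose documentation states that the GIL is released when hashing data larger than 2047 bytes, so several threads can hash large buffers at the same time. The helper names digest and digest_all and the buffer sizes are assumptions chosen for illustration, and workers=2 simply mirrors the --cpus-per-task=2 request used above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data):
    """Hash one buffer; hashlib releases the GIL for inputs larger than 2047 bytes."""
    return hashlib.sha256(data).hexdigest()


def digest_all(buffers, workers=2):
    """Hash all buffers using a pool of threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, buffers))


if __name__ == "__main__":
    # Four 8 MiB buffers with different contents (illustrative sizes).
    buffers = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]
    for value in digest_all(buffers):
        print(value)
```

The same idea applies to the matrix example: the time-consuming part runs inside compiled library code rather than in Python bytecode, which is why it can use several CPUs despite the GIL.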