Python

Python is a widely used programming language, and we have installed all the basic packages on every node. Yet Python develops quickly, and the system-provided packages are often incomplete or outdated.

Python distributions

Use case                                                                   | Python to use                                 | How to install own packages
---------------------------------------------------------------------------|-----------------------------------------------|------------------------------
I don’t really care, I just want recent stuff and to not worry.            | Anaconda: module load anaconda                | (nothing: most packages are already included)
Simple programs with common packages, not switching between Pythons often  | Anaconda: module load anaconda                | pip install --user
Your own conda environment                                                 | Miniconda: module load miniconda              | conda environment + conda
Your own virtual environment                                               | virtualenv module: module load py-virtualenv  | virtualenv + pip + setuptools

The main version of modern Python is 3; support for the old Python 2 ended at the end of 2019. There are also different distributions: the “regular” CPython, Anaconda (a package containing CPython + a lot of other scientific software, all bundled together), and PyPy (a just-in-time compiler, which can be much faster for some use cases). Triton supports all of these.

  • For general scientific/data science use, we suggest that you use Anaconda. It comes with the most common scientific software included, and is reasonably optimized.

  • There are many other “regular” CPython versions in the module system. These are compiled and optimized for Triton, and are highly recommended. The default system Python is old and won’t be updated.

Make sure your environments are reproducible, i.e. you can recreate them from scratch. History shows you will probably have to do this eventually, and it also ensures that others can always use your code. We recommend a minimal requirements.txt (pip) or environment.yml (conda), hand-created with the minimal dependencies listed in there.
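As a sketch, a minimal hand-written environment.yml could look like this (the package names and version pin are only illustrative):

name: myproject
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy
  - pandas

Such a file recreates the environment with conda env create -f environment.yml; the pip equivalent, requirements.txt, simply lists one requirement per line.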

Quickstart

Use module load anaconda to get our Python installation.

If you have simple needs, use pip install --user to install packages. For complex needs, use Anaconda + conda environments to isolate your projects.
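In short (PACKAGE_NAME is a placeholder):

$ module load anaconda
$ pip install --user PACKAGE_NAME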

Install your own packages easily

Warning

pip install --user can result in incompatibilities

If you do this, the packages will be shared among all your projects. It is quite likely that you will eventually get incompatibilities between the Python you are using and the packages installed this way. In that case, you are on your own (the simple fix is to remove all packages from ~/.local/lib/pythonN.N and reinstall them). If you get incompatible-module errors, our first recommendation will be to remove everything installed this way and use conda/virtual environments instead. It’s not a bad idea to do this when you switch to environments anyway.

If you encounter problems, remove all your user packages:

$ rm -r ~/.local/lib/python*.*/

and reinstall everything after loading the environment you want.

Installing your own packages with plain pip install won’t work, since it tries to install globally for all users. Instead, add --user to install the package in your home directory (~/.local/lib/pythonN.N/):

$ pip install --user $package_name

This is quick and effective, but best used for leaf packages without many dependencies, and only if you don’t switch Python modules often.

Note

Example of dangers of pip install --user

Someone did pip install --user tensorflow. Some time later, they noticed that they couldn’t use TensorFlow with GPUs. We couldn’t reproduce the problem, but in the end we found this local install, which was hiding the TensorFlow in every module (forcing a CPU-only version on them).
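If you suspect this kind of shadowing, a generic diagnostic (not specific to Triton) is to check which copy of a package Python actually imports:

$ python -c 'import tensorflow; print(tensorflow.__file__)'

If the printed path starts with ~/.local/lib, the user-installed copy is shadowing the one from the module.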

Note: pip installs from the Python Package Index.

Anaconda and conda environments

Anaconda is a Python distribution by Continuum Analytics (open source, of course). It is nothing fancy: they take a lot of useful scientific packages and their dependencies, put them all together, make sure they work, and do some optimization. It includes most of the common computing and data science packages, as well as non-Python compiled software and libraries. It is all open source and packaged nicely, so it can easily be installed on any major OS.

To load anaconda, use the module system (you can also load specific versions):

$ module load anaconda     # python3
$ module load anaconda2    # python2

Note

Before 2020, Python 3 was provided via the anaconda3 module (note the 3 on the end). That module is still there, but in 2020 we completely revised our Anaconda installation system and dropped active maintenance of Python 2. In the future, all updates go into the anaconda module only.

Conda environments

See also

Watch a Research Software Hour episode on conda for an introduction + demo.

If you need to create your own environment, we recommend that you use conda environments. When you create your own environment, the packages from the base environment (the default environment installed by us) are not used; instead, you choose which packages to install.

We nowadays recommend the miniconda module for installing these environments. Miniconda is basically a minimal Anaconda installation that can be used to create your own environments.

By default, conda tries to install packages into your home folder, which can make you run out of quota. To fix this, run the following commands once:

$ module load miniconda

## Create package and environment directories in your working directory,
## which has much more quota than your home directory
$ mkdir $WRKDIR/.conda_pkgs
$ mkdir $WRKDIR/.conda_envs

## Keep the default home-directory locations as a fallback...
$ conda config --append pkgs_dirs ~/.conda/pkgs
$ conda config --append envs_dirs ~/.conda/envs
## ...but prefer the working-directory locations
$ conda config --prepend pkgs_dirs $WRKDIR/.conda_pkgs
$ conda config --prepend envs_dirs $WRKDIR/.conda_envs
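To verify that the settings took effect, you can print the relevant configuration keys:

$ conda config --show pkgs_dirs
$ conda config --show envs_dirs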

virtualenv does not work with Anaconda; use conda instead.

  • Load the miniconda module. You should look up the version and load the same version each time you use the environment:

    ## Load miniconda first.  This must always be done before activating the env!
    $ module load miniconda
    
  • Create an environment. This needs to be done once:

    ## create environment with the packages you require
    $ conda create -n ENV_NAME python pip ipython tensorflow-gpu pandas ...
    
  • Activate the environment. This needs to be done every time you load the environment:

    ## This must be run in each shell to set up the environment variables properly.
    ## make sure module is loaded first.
    $ source activate ENV_NAME
    
  • Once the environment is active, you can install more packages using either conda install or pip install:

    ## Install more packages, either conda or pip
    $ conda search PACKAGE_NAME
    $ conda install PACKAGE_NAME
    $ pip install PACKAGE_NAME
    
  • Leaving the environment when done (optional):

    ## Deactivate the environment
    $ source deactivate
    
  • To activate an environment from a Slurm script:

    #!/bin/bash
    #SBATCH --time=00:05:00
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=1G
    
    source activate ENV_NAME
    
    srun echo "This step is run inside the activated conda environment!"
    
    source deactivate
    
  • In the worst case, you have incompatibility problems: remove everything, including anything installed with pip install --user. If you’ve mixed your personal packages in with this, you will have to separate them out:

    ## Remove anything installed with pip install --user.
    $ rm -r ~/.local/lib/python*.*/
    

A few notes about conda environments:

  • Once you use a conda environment, everything goes into it. Don’t mix versions with, for example, local packages in your home directory installed with pip install --user. Things installed (even previously) with pip install --user will be visible in the conda environment and can make your life hard! Eventually you’ll get dependency problems.

  • The same often goes for other Python-based modules: many of the modules we provide use Anaconda as a backend, so mixing them with your own installations may work, but only if you know what you are doing.

conda init, conda activate, and source activate

We don’t recommend running conda init, even though many sources suggest it: it permanently modifies your .bashrc file and can cause hard-to-debug problems later. The main points of conda init are to a) automatically activate an environment (not good on a cluster: make activation explicit so that it can be more easily debugged) and b) make conda a shell function (not a command) so that conda activate works (source activate works just as well in all cases, and causes no confusion when others haven’t run conda init).

  • If you activate an environment after loading an anaconda module, do source activate ENV_NAME as shown above (a conda installation inside the environment is not needed).

  • If you make your own standalone conda environments, install the conda package in them, then activate them as described in the next point.

  • Activate a standalone environment that has conda installed in it with source PATH/TO/ENV_DIRECTORY/bin/activate (which, incidentally, activates conda for just that one session). A sketch follows this list.
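A minimal sketch of the standalone workflow (the environment path and name are illustrative):

## Create a standalone environment in your working directory, with the conda package in it
$ conda create -p $WRKDIR/envs/standalone python conda

## Activate it directly through its own activate script
$ source $WRKDIR/envs/standalone/bin/activate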

Python: virtualenv

Virtualenv is the default Python way of making environments, but it does not work with Anaconda. We generally recommend using Anaconda, since it includes a lot more by default, but virtualenv works easily on other systems, so it’s good to know about.

## Load the virtualenv module
$ module load py-virtualenv

## Create environment
$ virtualenv DIR

## activate it (in each shell that uses it)
$ source DIR/bin/activate

## install more things (e.g. ipython, etc.)
$ pip install PACKAGE_NAME

## deactivate the virtualenv
$ deactivate

Anaconda/virtual environments in Jupyter

If you make a conda environment / virtual environment, you can use it from Triton’s JupyterHub (or your own Jupyter). See Installing kernels from virtualenvs or Anaconda environments.
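The usual pattern (a generic sketch; see the linked instructions for the Triton-specific steps) is to install ipykernel inside the activated environment and register it as a kernel, where my-env is a placeholder name:

$ pip install ipykernel
$ python -m ipykernel install --user --name my-env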

IPython Parallel

ipyparallel is a tool for running embarrassingly parallel code using Python. The basic idea is that a controller coordinates a set of engines, which do the work, while a client process runs your own code and submits work to them.

Preliminary notes: ipyparallel is installed in the anaconda{2,3}/latest modules.

Let’s say that you are doing some basic interactive work:

  • Controller: this can run on the frontend node, or you can put it in a script. To start: ipcontroller --ip="*"

  • Engines: srun -N4 ipengine runs four engines via Slurm interactively. You don’t need to interact with them once they are running, but remember to stop the process when you are done, because it is using resources. You can start/stop this as needed.

  • Start your Python process and use things like normal:

    import os
    import ipyparallel

    # Connect to the running controller
    client = ipyparallel.Client()
    # Run os.getpid asynchronously on every engine
    result = client[:].apply_async(os.getpid)
    # Collect the results as an {engine_id: pid} dictionary
    pid_map = result.get_dict()
    print(pid_map)
    

This method lets you turn on/off the engines as needed. This isn’t the most advanced way to use ipyparallel, but works for interactive use.

See also: IPython parallel for a version that goes in a Slurm script.

Background: pip vs python vs anaconda vs conda vs virtualenv

Virtual environments are self-contained Python environments with all of their own modules, separate from the system packages. They are great for research, where you need to be agile and install whatever versions and packages you need. We highly recommend virtual environments or conda environments (see below):

  • Anaconda: use conda, see below

  • Normal Python: virtualenv + pip install, see below

You often need to install your own packages. Python has its own package management system that can do this for you. There are three important related concepts:

  • pip: the Python package installer. Installs Python packages globally, in a user’s directory (--user), or anywhere. Installs from the Python Package Index.

  • virtualenv: Creates a directory containing a self-contained set of packages, manageable by the user themselves. When the virtualenv is activated, the operating system’s global packages are no longer used; instead, you install only the packages you want. This is important if you need to install specific versions of software, and it also provides isolation from the rest of the system (so that your work can continue uninterrupted). It also allows different projects to have different versions of things installed. virtualenv isn’t magic; it could almost be seen as just manipulating PYTHONPATH, PATH, and the like. Docs: https://docs.python-guide.org/dev/virtualenvs/

  • conda: Sort of a combination of package manager and virtual environment. However, it only installs packages into environments, and it is not limited to Python packages. It can also install other libraries (C, Fortran, etc.) into the environment. This is extremely useful for scientific computing, and the reason it was created. Docs for envs: https://conda.io/projects/conda/en/latest/user-guide/concepts/environments.html

So, to install packages, there are pip and conda. To make virtual environments, there are virtualenv and conda.
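For comparison, here are the two workflows side by side (the environment and package names are illustrative):

## virtualenv + pip
$ virtualenv myenv
$ source myenv/bin/activate
$ pip install numpy

## conda
$ conda create -n myenv python
$ source activate myenv
$ conda install numpy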

Advanced users can see this rosetta stone for reference.

On Triton we have added some packages on top of the Anaconda installation, so cloning the entire Anaconda environment to a local conda environment will not work (it’s not a good idea in the first place, but some users try this every now and then).

Examples

Running Python with OpenMP parallelization

Various Python packages, such as NumPy, SciPy, and pandas, can utilize OpenMP to run on multiple CPUs. As an example, let’s run the Python script python_openmp.py, which calculates the multiplicative inverses of five symmetric 2000x2000 matrices:

from time import time

import numpy as np

nrounds = 5

t_start = time()

for i in range(nrounds):
    # Create a random symmetric 2000x2000 matrix
    a = np.random.random([2000, 2000])
    a = a + a.T
    # The pseudo-inverse calls threaded LAPACK/BLAS routines under the hood
    b = np.linalg.pinv(a)

t_delta = time() - t_start

print('Seconds taken to invert %d symmetric 2000x2000 matrices: %f' % (nrounds, t_delta))

The full code for the example is in the HPC examples repository. One can run this example with srun:

wget https://raw.githubusercontent.com/AaltoSciComp/hpc-examples/master/python/python_openmp/python_openmp.py
module load anaconda/2022-01
export OMP_PROC_BIND=true
srun --cpus-per-task=2 --mem=2G --time=00:15:00 python python_openmp.py

or with sbatch by submitting python_openmp.sh:

#!/bin/bash -l
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=1G
#SBATCH -o python_openmp.out

module load anaconda/2022-01

export OMP_PROC_BIND=true

echo 'Running on: '$HOSTNAME

srun python python_openmp.py

Important

Python has a global interpreter lock (GIL), which forces some operations to be executed on only one thread; while these operations are occurring, the other threads will be idle. Such operations include reading files and printing output. Thus, one should be extra careful with multithreaded code, as it is easy to create seemingly parallel code that does not actually utilize multiple CPUs.

There are ways to minimize the effects of the GIL on your Python code, and if you’re writing your own multithreaded code, we recommend that you take this into account; one common workaround is sketched below.
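A minimal sketch of one such workaround (using separate processes instead of threads, so each process has its own interpreter and GIL; the code below is illustrative and not part of the HPC examples repository):

import multiprocessing as mp

def square(x):
    # Pure-Python work like this would be serialized by the GIL in threads,
    # but separate processes run truly in parallel
    return x * x

if __name__ == '__main__':
    # Two worker processes, matching e.g. --cpus-per-task=2
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, range(10)))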

Running MPI parallelized Python with mpi4py

MPI-parallelized Python requires a valid MPI installation that supports our Slurm scheduler, so Anaconda is not the best option here. We have installed MPI-supporting Python versions in different toolchains.

Using mpi4py is quite easy. An example is provided below.

Python MPI4py

A simple script hello_mpi.py that utilizes mpi4py (don’t name the file mpi4py.py, since it would shadow the mpi4py package on import):

#!/usr/bin/env python
"""
Parallel Hello World
"""
from mpi4py import MPI
import sys
size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()
sys.stdout.write(
    "Hello, World! I am process %d of %d on %s.\n"
    % (rank, size, name))

Running hello_mpi.py interactively using only srun:

$ module load Python/2.7.11-goolf-triton-2016b
$ srun --time=00:10:00 --ntasks=4 python hello_mpi.py

Example sbatch script hello_mpi.sh for running hello_mpi.py through sbatch:

#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --ntasks=4

module load Python/2.7.11-goolf-triton-2016b
mpiexec -n $SLURM_NTASKS python hello_mpi.py