Python
Python is a widely used programming language, and we have installed the basic packages on every node. However, Python develops quite fast, and the system-provided packages are often incomplete or outdated.
Python distributions
| | What to use | How |
|---|---|---|
| I use some common libraries | The pre-built Python environment by ASC | |
| I need to select my own packages | Mamba/conda environments | |
| Own small pure Python packages | Virtual environment (for most purposes we recommend Conda though) | Normal virtualenv tools |
Modern Python means Python 3. Support for the old Python 2 ended at the end of 2019. There are also different distributions: the “regular” CPython, Anaconda (a package containing CPython plus a lot of other scientific software, all bundled together), and PyPy (an alternative implementation with a just-in-time compiler, which can be much faster for some use cases). Triton supports all of these.
Make sure your environments are reproducible - you can recreate them from scratch. History shows it’s easier to re-create when you have a problem (compared to solving dependency problems), and your code will also be installable on other systems.
We recommend a minimal environment.yml (conda) or requirements.txt (pip), hand-created with exactly what you need in there.
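As a rough illustration of what "reproducible" can mean in practice, the sketch below compares a hand-written list of pinned requirements with what is actually installed in the current environment. The function names and the example pins are hypothetical, and the parser only understands simple `name==version` lines; only `importlib.metadata` comes from the standard library.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines; skip blank lines and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"not a pinned requirement: {line!r}")
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, get_version=metadata.version):
    """Return {name: (wanted, installed_or_None)} for unsatisfied pins."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins, for illustration only.
    example = "numpy==1.26.4\nscipy==1.11.4  # pinned for the paper\n"
    for name, (wanted, installed) in find_mismatches(parse_pins(example)).items():
        print(f"{name}: want {wanted}, have {installed}")
```

An empty result means the environment matches the pins exactly. It does not replace recreating the environment from scratch, which remains the real test.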
Triton pre-built scicomp-python-env
This module contains a pre-built Conda environment with many common packages people request. It might serve your needs, and we can install other packages into it if you need it (but it might be faster to make your own env). Note that the versions in this might get updated at any time, so it’s not a stable solution.
It is loaded through the module system:
$ module load scicomp-python-env
Conda environments
See also
Virtual environments
Python’s normal virtual environment tools work on Triton. We normally recommend Conda environments instead, since they handle all the extra compiled libraries needed for scientific software. Virtual environments probably work fine for pure-Python code.
We don’t include more instructions on virtual environments here.
Conda/virtual environments in Jupyter
If you make a conda environment / virtual environment, you can use it from Triton’s JupyterHub (or your own Jupyter). See Installing kernels from virtualenvs or Anaconda environments.
Warning: pip install --user
Warning
pip install --user can result in incompatibilities
We strongly recommend not installing packages using pip install --user. If you do this, the package will be shared among all your projects, and will even take precedence over the same package installed in an environment. It is quite likely that eventually, you will get some incompatibilities between the Python you are using and the packages installed. In that case, you are on your own (the simple recommendation is to remove all packages from ~/.local/lib/pythonN.N and reinstall). If you get incompatible module errors, our first recommendation will be to remove everything installed this way and use conda/virtual environments instead. It’s not a bad idea to do this when you switch to environments anyway.
If you encounter problems, remove all your user packages:
$ rm -r ~/.local/lib/python*.*/
and reinstall everything after loading the environment you want.
Note
Example of dangers of pip install --user
Someone did pip install --user tensorflow. Some time later, they noticed that they couldn’t use Tensorflow + GPUs. We couldn’t reproduce the problem, but in the end found they had this local install that was hiding any Tensorflow in any module (forcing a CPU version on them).
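A quick way to spot this kind of shadowing is to ask Python where it would import a package from. The following is a minimal sketch, not an official Triton tool: the helper names are hypothetical, and it relies only on the standard-library calls `site.getusersitepackages()` and `importlib.util.find_spec()`.

```python
import importlib.util
import os
import site


def is_under(path, directory):
    """True if path lies inside directory (after resolving symlinks)."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    return os.path.commonpath([path, directory]) == directory


def user_site_origin(module_name, user_site=None):
    """Return the module's file path if it would be imported from the
    per-user site-packages directory (~/.local/...), else None."""
    if user_site is None:
        user_site = site.getusersitepackages()
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.origin or not os.path.isabs(spec.origin):
        return None  # not installed, or a built-in/frozen module
    return spec.origin if is_under(spec.origin, user_site) else None


if __name__ == "__main__":
    # Package names chosen only as examples.
    for name in ("numpy", "tensorflow"):
        origin = user_site_origin(name)
        print(name, "->", origin or "not imported from the user site")
```

If a path under ~/.local is printed while an environment is active, that user-level install is the one Python will actually use.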
Background: pip vs python vs anaconda vs conda vs virtualenv
Virtual environments are self-contained Python environments with all of their own modules, separate from the system packages. They are great for research where you need to be agile and install whatever versions and packages you need. We highly recommend virtual environments or conda environments (below).
Conda: use conda, see below
Normal Python: virtualenv + pip install, see below
You often need to install your own packages. Python has its own package manager system that can do this for you. There are three important related concepts:
pip: the Python package installer. Installs Python packages globally, in a user’s directory (--user), or anywhere. Installs from the Python Package Index.
virtualenv: Creates a directory that has all self-contained packages and is manageable by the users themselves. When the virtualenv is activated, all the operating-system global packages are no longer used. Instead, you install only the packages you want. This is important if you need to install specific versions of software, and also provides isolation from the rest of the system (so that your work can be uninterrupted). It also allows different projects to have different versions of things installed. virtualenv isn’t magic, it could almost be seen as just manipulating PYTHONPATH, PATH, and the like. Docs: https://docs.python-guide.org/dev/virtualenvs/
conda: Sort of a combination of package manager and virtual environment. However, it only installs packages into environments, and is not limited to Python packages. It can also install other libraries (C, Fortran, etc.) into the environment. This is extremely useful for scientific computing, and the reason it was created. Docs for envs: https://conda.io/projects/conda/en/latest/user-guide/concepts/environments.html.
So, to install packages, there is pip and conda. To make virtual environments, there is venv and conda.
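Since these tools mostly work by changing paths and environment variables, it is easy to lose track of which environment a given Python process is actually using. The sketch below prints the relevant values. The function name is hypothetical, and it assumes a `venv`-style environment (where `sys.prefix` differs from `sys.base_prefix`) or a conda environment activated in the usual way (which sets `CONDA_PREFIX`).

```python
import os
import sys


def describe_environment(environ=os.environ):
    """Summarise which Python environment this interpreter belongs to."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # In a venv, sys.prefix points at the venv and differs from base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        # Activating a conda environment sets CONDA_PREFIX to its path.
        "conda_prefix": environ.get("CONDA_PREFIX"),
        # Extra import paths injected from outside the environment, if any.
        "pythonpath": environ.get("PYTHONPATH"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:>12}: {value}")
```

Running this at the start of a job is a cheap way to confirm that the interpreter comes from the environment you meant to activate.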
Advanced users can see this rosetta stone for reference.
On Triton we have added some packages on top of the Anaconda installation, so cloning the entire Anaconda environment to a local conda environment will not work (it is not a good idea in the first place, but some users try this every now and then).
Examples
Running Python with OpenMP parallelization
Various Python packages such as NumPy, SciPy and pandas can utilize OpenMP to run on multiple CPUs. As an example, let’s run the Python script python_openmp.py, which computes the (pseudo)inverse of five symmetric matrices of size 2000x2000.
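from time import time

import numpy as np

nrounds = 5
t_start = time()
for i in range(nrounds):
    # Build a random symmetric 2000x2000 matrix
    a = np.random.random([2000, 2000])
    a = a + a.T
    # Pseudo-inverse; the underlying linear algebra library may use several threads
    b = np.linalg.pinv(a)
t_delta = time() - t_start
print('Seconds taken to invert %d symmetric 2000x2000 matrices: %f' % (nrounds, t_delta))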
The full code for the example is in the HPC examples repository.
One can run this example with srun:
wget https://raw.githubusercontent.com/AaltoSciComp/hpc-examples/master/python/python_openmp/python_openmp.py
module load scicomp-python-env
export OMP_PROC_BIND=true
srun --cpus-per-task=2 --mem=2G --time=00:15:00 python python_openmp.py
or with sbatch by submitting python_openmp.sh:
#!/bin/bash -l
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=1G
#SBATCH -o python_openmp.out
module load scicomp-python-env
export OMP_PROC_BIND=true
echo 'Running on: '$HOSTNAME
srun python python_openmp.py
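When adapting these scripts, it can help to check from inside Python how many CPUs the job really has. The following is a minimal sketch under stated assumptions: the function name is hypothetical, Slurm exports `SLURM_CPUS_PER_TASK` when `--cpus-per-task` is given, and the threading libraries behind NumPy typically honour `OMP_NUM_THREADS` if it is set before they are loaded.

```python
import os


def allocated_cpus(environ=os.environ):
    """Best guess at the number of CPUs this process may use."""
    value = environ.get("SLURM_CPUS_PER_TASK")
    if value and value.isdigit() and int(value) > 0:
        return int(value)
    if hasattr(os, "sched_getaffinity"):  # available on Linux
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    n = allocated_cpus()
    # Set this before importing NumPy so the thread pool picks it up.
    os.environ.setdefault("OMP_NUM_THREADS", str(n))
    print("CPUs available to this task:", n)
```

Using `setdefault` keeps any value that was already exported in the job script.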
Important
Python has a global interpreter lock (GIL), which forces some operations to be executed on only one thread; while these operations are occurring, other threads will be idle. These kinds of operations include reading files and doing print statements. Thus one should be extra careful with multithreaded code, as it is easy to create seemingly parallel code that does not actually utilize multiple CPUs.
There are ways to minimize effects of GIL on your Python code and if you’re creating your own multithreaded code, we recommend that you take this into account.
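The effect is easy to observe with a small experiment. The sketch below (all names are illustrative) runs the same pure-Python, CPU-bound function serially and in two threads. On a standard CPython build with the GIL enabled, the threaded run typically takes about as long as the serial one, because only one thread executes Python bytecode at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """Deliberately slow pure-Python work that holds the GIL."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_serial(limits):
    """Process every work item one after another."""
    return [count_primes(x) for x in limits]


def run_threaded(limits, workers=2):
    """Process the work items in a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    limits = [30000, 30000]
    for label, func in (("serial", run_serial), ("threads", run_threaded)):
        start = time.perf_counter()
        result = func(limits)
        print(f"{label}: {result} in {time.perf_counter() - start:.2f} s")
```

Common ways around this are to use separate processes (for example the `multiprocessing` module) or to keep the heavy work inside libraries such as NumPy, which can release the GIL while running compiled code.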
Running MPI parallelized Python with mpi4py
MPI-parallelized Python requires a valid MPI installation that supports our Slurm scheduler. We have installed MPI-supporting Python versions in different toolchains.
Using mpi4py is quite easy. An example is provided below.
Python MPI4py
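A simple script, mpi4py.py, that utilizes mpi4py. (A script with this name can shadow the mpi4py package on import when it is run from its own directory; if the import fails, save the script under another name and adjust the commands below accordingly.)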
#!/usr/bin/env python
"""
Parallel Hello World
"""
from mpi4py import MPI
import sys
size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()
sys.stdout.write(
    "Hello, World! I am process %d of %d on %s.\n"
    % (rank, size, name))
Running mpi4py.py using only srun:
module load Python/2.7.11-goolf-triton-2016b
srun --time=00:10:00 --ntasks=4 python mpi4py.py
Example sbatch script mpi4py.sh when running mpi4py.py through sbatch:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --ntasks=4
module load Python/2.7.11-goolf-triton-2016b
mpiexec -n $SLURM_NTASKS python mpi4py.py