PyTorch

Page last updated: 2022-08-08

PyTorch is a commonly used Python package for deep learning.

Basic usage

First, check the tutorials up to and including GPU computing.

If you plan on using NVIDIA’s containers to run your model, please check the page about NVIDIA’s singularity containers.

The basic way to use PyTorch is via the Python provided by the anaconda module. If you are not using TensorFlow as well, you can pick either the -tf1- or the -tf2- version of the module. If you are using TensorFlow as well, please check our Tensorflow page.

Don’t load any additional CUDA modules, anaconda includes everything.

Building your own environment with PyTorch

If you need a PyTorch version different to the one supplied with anaconda we recommend installing your own anaconda environment as detailed here.

Creating an environment with packages requiring CUDA

Many tools check whether the system has a CUDA-capable graphics card and will install non-CUDA-enabled versions by default if none is found (as is the case on the login node, where environments are normally built). This can be overcome by requesting CUDA-specific versions (as detailed below). It might, however, happen that the environment creation process aborts with a message similar to:

nothing provides __cuda needed by tensorflow-2.9.1-cuda112py310he87a039_0

In this case it might be necessary to override the CUDA settings used by conda/mamba. To do this, prefix your environment creation command with CONDA_OVERRIDE_CUDA=CUDAVERSION, where CUDAVERSION is the CUDA toolkit version you intend to use, as in:

CONDA_OVERRIDE_CUDA="11.2" mamba env create -f cuda-env.yml

This allows conda to assume that the respective CUDA libraries will be present at runtime, and to skip those requirements during installation.
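The cuda-env.yml referenced above is not shown on this page; as an illustration only, an environment file matching the error message above might look like this (the package pins are examples, not recommendations):

```yaml
# cuda-env.yml -- illustrative sketch, pins are examples only
name: cuda-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  # build-string pattern requests a CUDA 11.2 build of the package
  - tensorflow=2.9.1=*cuda112*
```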

Creating an environment with GPU enabled PyTorch

To create an environment with GPU-enabled PyTorch, you can use an environment file such as this pytorch-env.yml:

name: pytorch-env
channels:
  - pytorch
  - conda-forge
dependencies:
  - pytorch=*=*cuda*

Here we install the latest PyTorch version from the pytorch channel, with the additional requirement that the build string of the pytorch package must contain a reference to a CUDA toolkit. Additional packages required by PyTorch are installed from the conda-forge channel. For a specific version, replace =*=*cuda* with e.g. =1.12=*cuda* for version 1.12.
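After the environment has been created and activated, you can verify that the installed PyTorch is a CUDA build with a quick check like the one below (a sketch; on the login node `cuda_available` is expected to be False even for a CUDA-enabled build, since no GPU is present there):

```python
def torch_cuda_info():
    """Return (version, cuda_available), or None if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__, torch.cuda.is_available()

info = torch_cuda_info()
if info is None:
    print("PyTorch is not installed in this environment")
else:
    print("PyTorch", info[0], "- CUDA available:", info[1])
```

Run this on a GPU node (for example inside an srun session) to confirm that the GPU is actually visible.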

Examples:

Let’s run the MNIST example from PyTorch’s tutorials:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """Simple convolutional network for 28x28 MNIST images."""
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)      # flatten for the fully connected layers
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

The full code for the example is in pytorch_mnist.py. One can run this example with srun:

wget https://raw.githubusercontent.com/AaltoSciComp/scicomp-docs/master/triton/examples/pytorch/pytorch_mnist.py
module load anaconda
srun --time=00:15:00 --gres=gpu:1 python pytorch_mnist.py

or with sbatch by submitting pytorch_mnist.sh:

#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --time=00:15:00

module load anaconda

python pytorch_mnist.py

The Python script will download the MNIST dataset to a data folder.
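Inside a training script such as this, the usual pattern is to run on the GPU when one is available and fall back to the CPU otherwise. A minimal sketch of that idiom (illustrative; pytorch_mnist.py selects its device in a similar way, but the function name here is our own):

```python
def pick_device():
    """Return "cuda" when a GPU is visible to PyTorch, else "cpu"."""
    try:
        import torch
    except ImportError:   # no PyTorch installed: treat as CPU-only
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

# The model and input tensors would then be moved to the chosen
# device with e.g. `model.to(pick_device())`.
```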

The same MNIST example can also be run using NVIDIA's PyTorch singularity container, either with srun:

wget https://raw.githubusercontent.com/AaltoSciComp/scicomp-docs/master/triton/examples/pytorch/pytorch_mnist.py
module load nvidia-pytorch/20.02-py3
srun --time=00:15:00 --gres=gpu:1 singularity_wrapper exec python pytorch_mnist.py

or with sbatch by submitting pytorch_singularity_mnist.sh:

#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --time=00:15:00

module load nvidia-pytorch/20.02-py3

singularity_wrapper exec python pytorch_mnist.py
