TensorFlow is a commonly used Python package for deep learning.

Basic usage

First, check the tutorials up to and including GPU computing.

With TensorFlow, you have to decide at install time whether you want the version that runs on CPUs or the one that runs on GPUs. This means that we can’t install a single version for everyone and expect it to work everywhere: you have to load something different depending on whether you want to run on the login node or regular nodes (probably for testing) or on GPU nodes. You probably want to use GPUs.

The basic way to use it is via the Python in the anaconda3 module (or anaconda2). Note that these modules have the GPU version installed, so you can’t run or test on the login node.

If you run module spider anaconda3 (or anaconda2), you can see several versions ending in -cpu or -gpu. These have the CPU and GPU versions of TensorFlow installed, respectively. Don’t load any additional CUDA modules: anaconda includes everything.

If you use GPUs, you need --constraint='kepler|pascal|volta' in order to select a GPU new enough to run TensorFlow. (Note that as we get newer cards, this will need further updating.)
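As a rough illustration of how these options fit together, the sketch below assembles an srun command line that carries both the GPU request and the constraint. The helper name build_srun_command and its parameters are hypothetical; only the flags themselves (-t, --gres, --constraint) come from this page.

```python
import shlex


def build_srun_command(script, time="00:15:00", gpus=1,
                       constraint="kepler|pascal|volta"):
    """Return the srun argument list for running `script` on a GPU node.

    Hypothetical helper for illustration: only the flags are taken
    from this page, the function itself is an assumption.
    """
    return [
        "srun",
        "-t", time,
        f"--gres=gpu:{gpus}",
        f"--constraint={constraint}",
        "python", script,
    ]


cmd = build_srun_command("tensorflow_mnist.py")
# shlex.join quotes the constraint so the shell does not treat | as a pipe.
print(shlex.join(cmd))
```

If you launch the job from Python rather than pasting the printed line into a shell, passing the list directly to subprocess.run(cmd) avoids the quoting question altogether.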

Simple TensorFlow/Keras model

Let’s run the MNIST example from TensorFlow’s tutorials:

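import tensorflow as tf

# Flatten each 28x28 image, then one hidden layer and a 10-class softmax output
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])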

The full code for the example is in tensorflow_mnist.py. One can run this example with srun:

wget https://raw.githubusercontent.com/AaltoScienceIT/scicomp-docs/master/triton/examples/tensorflow/tensorflow_mnist.py
module load anaconda3/latest
srun -t 00:15:00 --gres=gpu:1 python tensorflow_mnist.py

or with sbatch by submitting tensorflow_mnist.sh:

#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --time=00:15:00

module load anaconda3/latest

python tensorflow_mnist.py

Do note that by default Keras downloads datasets to $HOME/.keras/datasets.
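Since these downloads land in your home directory, it can be useful to see how much space they take. The following is a minimal sketch, assuming the default location mentioned above; the function name cache_usage is hypothetical and not part of Keras.

```python
from pathlib import Path


def cache_usage(cache_dir=None):
    """Return (file_count, total_bytes) for all files under cache_dir.

    Hypothetical helper: defaults to the Keras dataset directory
    named on this page, $HOME/.keras/datasets.
    """
    if cache_dir is None:
        cache_dir = Path.home() / ".keras" / "datasets"
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        # Nothing has been downloaded yet (or the path is wrong).
        return 0, 0
    files = [p for p in cache_dir.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


count, size = cache_usage()
print(f"{count} files, {size / 1e6:.1f} MB in the Keras dataset cache")
```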

Common problems

  • ImportError: libcuda.so.1: cannot open shared object file: No such file or directory. GPU TensorFlow can only be imported on GPU nodes (even though you’d think that you can import it and just not use the GPUs). So you can only run this code in the GPU queue. One option is to use CPU TensorFlow for testing on the login node and GPU TensorFlow for running in batch.
  • Random CUDA errors: don’t load any other CUDA modules, only anaconda. Anaconda includes the necessary libraries in compatible versions.
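To make the first problem easier to recognise, a script can wrap the import and print a hint instead of a bare traceback. This is only a sketch: the function name import_or_explain is hypothetical, and it simply looks for the text "libcuda" in the error message shown above.

```python
import importlib


def import_or_explain(module_name="tensorflow"):
    """Import a module by name, or print a hint and return None.

    Hypothetical helper: it recognises the libcuda error from the
    list above purely by matching the text of the message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        if "libcuda" in str(err):
            # The GPU build was loaded on a node without GPU drivers.
            print("GPU TensorFlow can only be imported on GPU nodes; "
                  "run this through the GPU queue.")
        else:
            print(f"Could not import {module_name}: {err}")
        return None


if __name__ == "__main__":
    tf = import_or_explain("tensorflow")
    if tf is not None:
        print("TensorFlow version:", tf.__version__)
```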