Pi exercises

This series of exercises uses a simple Python code that calculates pi.

Using the cluster from a command line

In triton/tut/cluster-shell.rst:

Shell-4: Clone the hpc-examples repository

(Part of a series: pi, ngrams)

Follow the steps above to clone the hpc-examples repository. List the directory from the command line and verify that it matches what you see on the repository's GitHub page.

Is your home directory the right place to store a cloned git repository?

In triton/tut/cluster-shell.rst:

Shell-7: Try the --help option

(Part of a series: pi)

Many programs have a --help option which gives a reminder of the options of the program. (Note that this has to be explicitly programmed - it’s a convention, not magic.) Try giving this option to pi.py and see what happens.

Interactive jobs

In triton/tut/interactive.rst:

Interactive-3: Time scaling

The program hpc-examples/slurm/pi.py calculates pi using a simple stochastic algorithm. The program takes one positional argument: the number of trials.
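
To make "a simple stochastic algorithm" concrete, here is a minimal sketch of a Monte Carlo estimate of pi. It is an illustration only and not the code in pi.py: the function name `estimate_pi` and its `seed` parameter are hypothetical. The idea is that a random point in the unit square falls inside the quarter circle with probability pi/4.

```python
import random


def estimate_pi(trials, seed=None):
    """Estimate pi from `trials` random points in the unit square.

    Hypothetical helper for illustration; not taken from pi.py.
    Returns (estimate, successes).
    """
    rng = random.Random(seed)  # private generator, reproducible for a given seed
    successes = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        # The point lies inside the quarter circle of radius 1
        if x * x + y * y <= 1.0:
            successes += 1
    return 4.0 * successes / trials, successes


if __name__ == "__main__":
    for trials in (500, 5000, 50000):
        estimate, successes = estimate_pi(trials, seed=0)
        print(trials, successes, estimate)
```

The work grows linearly with the number of trials, which is why multiplying the trial count by ten should roughly multiply the run time by ten.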

The time program allows you to time any program: for example, time python x.py runs python x.py and prints how long it took.

  1. Run the program, timing it with time, a few times, increasing the number of trials, until it takes about 10 seconds: time python hpc-examples/slurm/pi.py 500, then 5000, then 50000, and so on.

  2. Add srun in front (srun python ...). Use the seff JOBID command to see how much time the program took to run. (If you’d like to use the time command, you can run srun --mem=MEM --time=TIME time python hpc-examples/slurm/pi.py ITERS)

  3. Look at the job history using slurm history - can you see how much time each process used? What’s the relation between TotalCPUTime and WallTime?
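
The difference between CPU time and wall-clock time can also be seen inside a single Python process. The sketch below is an illustration under assumptions, not how Slurm does its accounting: the helpers `timed` and `busy` are hypothetical. It measures both clocks around a CPU-bound loop and around a sleep.

```python
import time


def timed(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # wall-clock time
    cpu_start = time.process_time()   # CPU time used by this process
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def busy(n):
    """CPU-bound work: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    _, wall, cpu = timed(busy, 1_000_000)
    print(f"busy:  wall={wall:.3f}s cpu={cpu:.3f}s")
    _, wall, cpu = timed(time.sleep, 0.5)
    print(f"sleep: wall={wall:.3f}s cpu={cpu:.3f}s")
```

For a serial, CPU-bound program the two numbers are usually close; a program that mostly waits (sleeping, I/O) uses far less CPU time than wall time.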

Serial Jobs

In triton/tut/serial.rst:

Serial-3: Submitting and cancelling a job

Create a batch script which does nothing (or some pointless operation for a while), for example sleep 300 (this shell command does nothing for 300 seconds). Check the queue to see when it starts running. Then, cancel the job. What output is produced?

In triton/tut/serial.rst:

Serial-4: Modifying the Slurm script while it's running

Modifying scripts after a job has been submitted is bad practice.

Add sleep 120 into the Slurm script that runs pi.py. Submit the script and while it is running, open the Slurm script with an editor of your choice and add the following line near the end of the script.

echo 'Modified'

Use slurm q to check when the job finishes and check the output. What can you conclude from this?

Remove the added line after you have finished.

In triton/tut/serial.rst:

Serial-5: Modifying the Python script while the job is running

Modifying scripts after a job has been submitted is bad practice.

Add sleep 180 into the Slurm script that runs pi.py. Submit the script and while it is running, open pi.py with an editor of your choice and add the following line near the start of the script.

raise Exception()

Use slurm q to check when the job finishes and check the output. What can you conclude from this?

Remove the added line after you have finished. You can also use git checkout -- pi.py (remember to give a proper relative path, depending on your current working directory!)

Array jobs: embarrassingly parallel execution

In triton/tut/array.rst:

Array-2: Array jobs and different random seeds

Create a job array that uses slurm/pi.py to run each combination of iteration count and seed value, saving each result to a different file. Keep the standard output (#SBATCH --output=FILE) separate from the standard error (#SBATCH --error=FILE).
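
One way to think about the combinations is as a flat list indexed by the array task ID. The sketch below is a hedged illustration: the lists `ITERATIONS` and `SEEDS`, the helper names, and the output file naming are all hypothetical choices. Only the SLURM_ARRAY_TASK_ID environment variable, which Slurm sets for each array task, is standard. How the seed is passed to pi.py is not shown here; check its --help output.

```python
import itertools
import os

# Hypothetical parameter values; pick your own for the exercise
ITERATIONS = [500, 5000, 50000]
SEEDS = [1, 2, 3, 4]


def parameters_for_task(task_id):
    """Map an array task ID (0-based) to one (iterations, seed) pair."""
    combos = list(itertools.product(ITERATIONS, SEEDS))
    return combos[task_id]


def output_name(task_id):
    """Build a distinct output file name for each combination."""
    iterations, seed = parameters_for_task(task_id)
    return f"pi_{iterations}_{seed}.json"


if __name__ == "__main__":
    # Slurm sets this variable inside each array task; default to 0 elsewhere
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(parameters_for_task(task_id), output_name(task_id))
```

With these example lists there are 12 combinations, so the array would cover indices 0 to 11 (for example with #SBATCH --array=0-11).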

In triton/tut/array.rst:

Array-3: Combine the outputs of the previous exercise.

You can find the slurm/pi_aggregation.py program in hpc-examples. Run it, giving all the output files as arguments. It combines the statistics from all runs and gives a more accurate value of \(\pi\).
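
To see why combining runs improves the estimate, here is a minimal sketch of one possible aggregation. It is not the code in pi_aggregation.py, and the field names `successes` and `iterations` are assumptions about what each run reports. The sketch pools the raw counts, so runs with more trials carry proportionally more weight than they would in a plain average of the individual estimates.

```python
def combine(results):
    """Pool hit counts and trial counts from several runs into one estimate.

    `results` is a list of dicts with the assumed keys
    "successes" and "iterations".
    """
    total_successes = sum(r["successes"] for r in results)
    total_iterations = sum(r["iterations"] for r in results)
    return 4.0 * total_successes / total_iterations


if __name__ == "__main__":
    runs = [
        {"successes": 785, "iterations": 1000},
        {"successes": 7854, "iterations": 10000},
    ]
    print(combine(runs))
```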

Shared memory parallelism: multithreading & multiprocessing

In triton/tut/parallel-shared.rst:

Shared memory parallelism 1: Test the example’s scaling

Run the example with a bigger number of trials (100000000 or \(10^{8}\)) and with 1, 2 and 4 CPUs. Check the running time and CPU utilization for each run.

In triton/tut/parallel-shared.rst:

Shared memory parallelism 2: Test scaling for a program that has a serial part

pi.py can be called with an argument --serial=0.1 to run a fraction of the trials in a serial fashion (here, 10%).

Run the example with a bigger number of trials (100000000 or \(10^{8}\)), 4 CPUs and a varying serial fraction (0.1, 0.5, 0.8). Check the running time and CPU utilization for each run.
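
As a rough guide to what to expect, Amdahl's law gives the ideal speedup for a program in which a fraction of the work stays serial. The sketch below is an idealized model, not a prediction of pi.py's actual timings, which also include process start-up and other overhead. The function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(serial_fraction, n_cpus):
    """Ideal speedup when `serial_fraction` of the work cannot be parallelized."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


if __name__ == "__main__":
    for fraction in (0.1, 0.5, 0.8):
        speedup = amdahl_speedup(fraction, 4)
        print(f"serial={fraction}: ideal speedup on 4 CPUs = {speedup:.2f}")
```

Under this model, 4 CPUs give at most about 3.08x with a 10% serial part, 1.6x with 50%, and about 1.18x with 80%. Compare these bounds with the run times and CPU utilization you measure.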

In triton/tut/parallel-shared.rst:

Shared memory parallelism 3: More parallel \(\neq\) fastest solution

pi.py can be called with an argument --optimized to run an optimized version of the code that utilizes NumPy for vectorized calculations.

Run the example with a bigger number of trials (100000000 or \(10^{8}\)) and with 4 CPUs. Now run the optimized example with the same number of trials and with 1 CPU. Check the CPU utilization and running time for each run.