HPC Kitchen metaphor

This is a series of videos that compares HPC, and more broadly scientific computing, with cooking or other things in real life. The goal is to make approaching computing easier.

Warning

This page is under construction and not all descriptions/transcripts are here yet. Currently, YouTube has the videos and full commentary, but I hope to make that available without YouTube.

[intro]: computing vs cooking

https://www.youtube.com/watch?v=yqGtnA7CUtU&list=PLZLVmS9rf3nNDHRo1Baz_JVQWDI0mTYyB

Advanced computing resources like HPC (high-performance computing) clusters may seem intimidating, but really all the pieces are the same as in a normal computer. The difference is that you have to be able to use and coordinate all the resources. This video sets you on the path by introducing a metaphor relating computing to cooking.

[data storage]: Understanding how data is stored and moved

https://www.youtube.com/watch?v=JAR9xyy5rcE&list=PLZLVmS9rf3nNDHRo1Baz_JVQWDI0mTYyB

It’s all about data these days, but where does it all go? How do we understand what we have? We compare storage in an HPC cluster (or really, anywhere) to storage of food in a kitchen. We’ll see how the hierarchy of size vs speed works and how we access data on different servers. This doesn’t teach details about storage, but it prepares you to learn those details in a future course.

[storage-performance]: Understanding the speed of data access and transfer

https://www.youtube.com/watch?v=9siGLV8pZ5A&list=PLZLVmS9rf3nNDHRo1Baz_JVQWDI0mTYyB

When using supercomputers, data movement can actually be what slows you down surprisingly often. A modern GPU can easily pull in data faster than it can be delivered from storage, if you don’t preprocess it well. In this video, we go over the basic aspects of storage performance… with food.
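As a rough illustration of why the way data is laid out matters, the sketch below writes the same bytes once as many small files and once as a single file, then times reading each back. The function names and sizes are made up for this example. On a laptop with a warm cache the gap may be small, while on a shared cluster filesystem it is often much larger, so treat the timings as illustrative only.

```python
import tempfile
import time
from pathlib import Path


def write_small_files(folder: Path, count: int, size: int) -> None:
    """Write `count` files of `size` bytes each (many small ingredients)."""
    for i in range(count):
        (folder / f"part_{i:05d}.bin").write_bytes(bytes(size))


def read_small_files(folder: Path) -> int:
    """Read every small file; each one needs its own open and close."""
    return sum(len(p.read_bytes()) for p in sorted(folder.glob("part_*.bin")))


def read_one_file(path: Path) -> int:
    """Read the same amount of data from a single file."""
    return len(path.read_bytes())


def compare(count: int = 2000, size: int = 1024) -> dict:
    """Return bytes read and elapsed seconds for both layouts."""
    with tempfile.TemporaryDirectory() as tmp:
        small_dir = Path(tmp) / "small"
        small_dir.mkdir()
        write_small_files(small_dir, count, size)
        big_file = Path(tmp) / "big.bin"
        big_file.write_bytes(bytes(count * size))

        start = time.perf_counter()
        small_bytes = read_small_files(small_dir)
        small_seconds = time.perf_counter() - start

        start = time.perf_counter()
        big_bytes = read_one_file(big_file)
        big_seconds = time.perf_counter() - start

    return {
        "small_bytes": small_bytes,
        "small_seconds": small_seconds,
        "big_bytes": big_bytes,
        "big_seconds": big_seconds,
    }


if __name__ == "__main__":
    print(compare())
```

Both layouts hold exactly the same data; only the number of files differs, which is the kind of preprocessing choice that can decide whether storage keeps up with the processor.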

[parallel]: methods to run on multiple processors

https://www.youtube.com/watch?v=I6fBq9HN3P4&list=PLZLVmS9rf3nNDHRo1Baz_JVQWDI0mTYyB

Running programs in parallel: that’s what everyone thinks they want to use a cluster for. But there are actually different ways this can work, and you need to be able to distinguish between them, so that you can run them properly. This broadly explains what these methods are, so that you can understand later technical documentation.
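To make one of these methods concrete, here is a minimal sketch of the simplest case: work that splits into independent chunks (often called embarrassingly parallel), run here on a pool of threads inside a single process. The function names are hypothetical and chosen only for this example. The same split-then-combine structure would apply with separate processes or separate cluster jobs, while message passing between machines (for example MPI) is a different method that this sketch does not cover.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items: list, parts: int) -> list:
    """Cut the work into at most `parts` roughly equal, independent chunks."""
    step = max(1, -(-len(items) // parts))  # ceiling division, at least 1
    return [items[i:i + step] for i in range(0, len(items), step)]


def sum_of_squares(chunk: list) -> int:
    """The work done on one chunk; it needs nothing from the other chunks."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items: list, workers: int = 4) -> int:
    """Hand each chunk to a worker thread, then combine the partial results."""
    chunks = split(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    # In standard CPython, threads may not speed up pure-Python arithmetic;
    # the point here is the structure of the work, not the timing.
    print(parallel_sum_of_squares(list(range(1000)), workers=4))
```

Because no chunk depends on another, this kind of work is the easiest to spread out. Knowing whether your program fits this pattern or needs workers to talk to each other is what tells you how to request resources for it.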

[slurm]: the Slurm job scheduler spreads tasks to the cluster

https://www.youtube.com/watch?v=Y73A7lXISxU&list=PLZLVmS9rf3nNDHRo1Baz_JVQWDI0mTYyB

We have talked about running stuff in parallel, but how does it get connected to hardware (processors, memory)? If we have one cluster for everyone, how do people share? That’s what this is all about.
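The idea of sharing can be sketched with a toy first-come, first-served allocator: each job asks for some CPUs, and it is placed on the first node that still has enough free. This is only an illustration with made-up names, not how Slurm works internally; a real scheduler also weighs things like priorities, time limits, and memory.

```python
def schedule(jobs: list, nodes: dict) -> tuple:
    """Place jobs on nodes in arrival order.

    jobs:  list of (job_name, cpus_requested) tuples
    nodes: dict mapping node_name -> number of free CPUs
    Returns (placed, waiting): a dict of job_name -> node_name,
    and a list of job names that did not fit anywhere.
    """
    free = dict(nodes)  # copy so the caller's dict is not modified
    placed = {}
    waiting = []
    for name, cpus in jobs:
        for node, available in free.items():
            if cpus <= available:
                free[node] -= cpus  # reserve the CPUs on this node
                placed[name] = node
                break
        else:
            # No node had enough free CPUs, so the job stays in the queue.
            waiting.append(name)
    return placed, waiting


if __name__ == "__main__":
    jobs = [("alice", 4), ("bob", 4), ("carol", 2), ("dave", 8)]
    nodes = {"node1": 4, "node2": 6}
    print(schedule(jobs, nodes))
```

Even in this toy version you can see why stating what a job needs matters: the allocator can only pack jobs together if it knows how much of each node they will use.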

[containers/environments]: Moving code around

https://www.youtube.com/watch?v=Ag7wcAwU_Jw&list=PLZLVmS9rf3nNDHRo1Baz_JVQWDI0mTYyB

When you go to a different kitchen to cook, everything is different and it slows you down. This is a worse problem for computers, which don’t have a brain, so the recipe has to be exact. As you get farther and farther into scientific coding, you’ll see how hard this “installing software and making it work” problem is. Environments and containers solve this problem.
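One small part of being exact is knowing what the current kitchen contains. The sketch below records the Python version, the platform, and the installed packages with their versions, which is the kind of information an environment file or container image pins down. The function name is hypothetical, and this covers only Python packages, not system libraries or compilers.

```python
import platform
import sys
from importlib import metadata


def describe_environment() -> dict:
    """Collect a snapshot of the Python environment this code runs in."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    env = describe_environment()
    print("Python", env["python"], "on", env["platform"])
    for name, version in env["packages"].items():
        print(f"{name}=={version}")
```

Running this on two machines and comparing the output is a quick way to see how much two "kitchens" differ before you try to move a recipe between them.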

[how-to-learn]: Don’t give up in learning computing

https://www.youtube.com/watch?v=evJV-02poDc&list=PLZLVmS9rf3nNDHRo1Baz_JVQWDI0mTYyB

This is more of a fireside chat, to encourage you not to give up in learning computing. It can often feel there is so much to learn and it’s almost impossible. It can also feel like you can’t do it yourself. Well, most people don’t - all learning is somehow collaborative. Take the time to work together to share skills. Take the time to follow others even in practical computing skills, not just academic. And help others.

[data-management]: Data management is as important as data storage

https://www.youtube.com/watch?v=Q5A7n7mu-AI&list=PLZLVmS9rf3nNDHRo1Baz_JVQWDI0mTYyB

People may think that getting a big enough storage space for your data is all you need. There’s far more to it than that. It’s easy for data (and kitchens) to become a huge mess without some care. It’s even worse when you work together. Unfortunately, it’s far too common for people to be too rushed to organize their stuff well, leading to worse problems down the line. Take care of your data and your data will be valuable to you.

[big-vs-small-jobs]: The pitfalls of big jobs

https://www.youtube.com/watch?v=Qrql8rGfRVo&list=PLZLVmS9rf3nNDHRo1Baz_JVQWDI0mTYyB

I was eating at a restaurant and their group policy said large groups have a different menu. Why is that? Larger groups are a less efficient use of tables, because of communication and synchronization overhead. In the same way, larger jobs on the cluster can be less efficient because of overheads.
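One standard simplified model of this effect is Amdahl's law: if a fraction of the work cannot be split up, adding workers gives less and less benefit. The sketch below uses it to compute efficiency (speedup divided by the number of workers). Note that this model only captures the part of the work that stays serial; it is an assumption for illustration and does not model communication costs directly.

```python
def speedup(serial_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup when only part of the work can be split up."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def efficiency(serial_fraction: float, workers: int) -> float:
    """Fraction of the requested workers that is effectively put to use."""
    return speedup(serial_fraction, workers) / workers


if __name__ == "__main__":
    # Assume, purely for illustration, that 5% of the job cannot be split.
    for n in (1, 4, 16, 64, 256):
        print(f"{n:4d} workers: speedup {speedup(0.05, n):6.2f}, "
              f"efficiency {efficiency(0.05, n):5.2f}")
```

The speedup keeps growing, but the efficiency keeps falling, so past some point a bigger job mostly means more reserved processors waiting on each other.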

[reading-docs]: Making sense of information overload

https://www.youtube.com/watch?v=E9w-MNaXkDw&list=PLZLVmS9rf3nNDHRo1Baz_JVQWDI0mTYyB

Have you ever been faced with an overwhelming amount of information you need to go through before you can start something? Often, it’s because there really is that much information, but there are strategies for dealing with it. Let me talk about how I approach information overload…