
Intro to High Performance Computing

To watch: https://aaltoscicomp.github.io/scip/

  • To ask questions and interact (this document): https://hackmd.io/@AaltoSciComp/IntroSummer2021
    • to write on this document, click on the "pencil" (Edit) icon on the top right corner and write at the bottom, above the ending line. If you experience lags, switch back to "view mode" ("eye" icon)
  • Questions that require one-to-one chat with our helpers (e.g. an installation issue):
    • ZOOM LINK SENT TO REGISTERED PARTICIPANTS
  • Program: https://scicomp.aalto.fi/training/scip/summer-kickstart/
  • If you have just one screen (e.g. a laptop), we recommend arranging your windows like this:
╔═══════════╗ ╔═════════════╗
║   WEB     ║ ║  TERMINAL   ║
║  BROWSER  ║ ║   WINDOW    ║
║  WINDOW   ║ ╚═════════════╝
║   WITH    ║ ╔═════════════╗
║   THE     ║ ║   BROWSER   ║
║  STREAM   ║ ║   W/HACKMD  ║
╚═══════════╝ ╚═════════════╝
  • Do not put names or identifying information in here

07/06 - Intro to High Performance Computing

08/06 - Hands-on HPC part 1

Previous content moved to Archive link above

09/06 - Hands-on HPC part 2

**Archived at https://hackmd.io/@AaltoSciComp/ArchiveSummer20210609**

Monitoring

https://scicomp.aalto.fi/triton/tut/monitoring/

  • Monitoring examples use the script '/usr/bin/slurm'. The current version of the script doesn't work properly on turso; a fix is on the TODO list but not implemented yet, so Helsinki users need to do the monitoring exercises on kale, not turso. We will fix the "slurm" script as soon as possible.
  • would you rather have a smaller font, or me continually adjusting the window like I do now? Adjusting works well
  • How can I check whether my script is utilizing GPU or is it using local machine?
    • Good one! This will be covered in the stream
  • Is there a way to similarly "watch" the .out file?
    • tail -f output.file, where output.file is the name of your output file
  • Is there an equivalent to IDE "debugging" mode in batch scripts?
    • Not that I know of. I see it more like "my script is ready and does all I need it to do; now I wrap it with sbatch so that I can run it on a remote system"
    • Normally I would do that kind of debugging as interactive jobs. And not use IDE debugger but standard command line debugger.
    • Some advanced debuggers can connect to running jobs, but that is probably more than most people need.
  • How to load modules inside sbatch-script?
    • any command you can type in a shell can go inside a bash script, so in this case: module load XYZ. You don't need srun for it
  • I thought it was important to run the commands with srun?
    • You need srun if you want to track the usage of that specific line, or if you need parallelization with MPI: each srun entry ends up as its own entry in the job history.
    • but commands like "echo", "cd", and "module load" do not need srun
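To tie the answers above together, here is a minimal sketch of a batch script (the module name, directory, and script name are placeholders, not from the course material): module load runs as a plain shell command, and only the computation is wrapped in srun so it shows up as its own job step in sacct/seff.

```shell
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=500M

# Plain shell commands: no srun needed for these.
module load anaconda
cd "$HOME/my_project"   # hypothetical directory

# The actual computation: srun makes this a tracked job step
# that appears as its own entry in the Slurm history.
srun python my_script.py   # hypothetical script name
```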
  • What was the meaning of %x in the output file?
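In Slurm filename patterns, %x expands to the job name and %j to the job ID, so every run gets its own log file. A minimal sketch (the job name here is made up):

```shell
#!/bin/bash
#SBATCH --job-name=pi-test
#SBATCH --output=%x.%j.out
# With job name "pi-test" and job ID 12345, the line above sends
# output to a file called pi-test.12345.out.
echo hello
```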
  • I did not get any output from the third exercise. Does srun print the date?
    • It should. However, you do not really need srun for "date"; see the question above.
    • date is just a linux command that prints the date. srun date just means that srun runs the program and records stuff to Slurm database (for slurm history, seff etc.)
  • What is the red number in the command line? now it is 130.
    • My shell is set to include "exit status"/"exit code" of a program.
    • 'exit status' is a Unix concept: basically the return value of a program. 0 = "success"; anything else conventionally means failure. This is often used in shell scripts
    • And in my configuration, it is printed to help me when running stuff (if it is not zero)
    • and apparently Control-C exits with status=130
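The 130 makes sense once you know the convention: a process killed by a signal exits with 128 plus the signal number, and Ctrl-C sends SIGINT, which is signal 2. A quick demonstration in any shell:

```shell
# $? holds the exit status of the last command.
true
OK=$?              # 0 = success
false || FAIL=$?   # non-zero = failure (here 1)

# Convention for signals: exit status = 128 + signal number.
SIGINT=2
CTRL_C_STATUS=$((128 + SIGINT))   # 128 + 2 = 130, as seen after Ctrl-C

echo "true: $OK, false: $FAIL, Ctrl-C: $CTRL_C_STATUS"
```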
  • Can you give a job name for srun. For example if you have multiple srun date, can you give a job name 'This gives date 1', 'This gives date 2'
    • I haven't done that but probably! check srun manual page.
    • You can with srun -J name. I would recommend against names with spaces.
    • I prefer the long format and quotes: srun --job-name="niceName" date
  • is maxRSS per node/core/job?
    • per job
  • Does this dot after the ID in seff command refer to the each specific line of the script, or the single command, or the single step /like single step of the cycle?
    • I might have missed this, but seff just wants the jobID, no dots
      • They did use a dot at the end, referring to some specific step, which I did not get how to identify
      • oh yeah you see them from "sacct"
      • yes you can check the efficiency of each JOBID.0 etc
    • simo If you're monitoring individual job steps (started with srun), you can monitor them with seff <jobid>.<task_number>. Each task (srun-call) gets its own number.
      • Thanks!

Exercise until xx:10, then break until xx:20

https://scicomp.aalto.fi/triton/tut/monitoring/#exercises

  • Try both exercises.
  • When trying the pi.py example in kale, use "python pi.py" and not "python3 pi.py". There is python3 in the login node, but not in the nodes (unless you load e.g. anaconda module).
  • I'll be in Zoom breakout room 1 for Helsinki users, if you have questions.
  • Anybody else in Zoom, feel free to ask for help about anything.
  • (Helsinki kale) srun: error: Unable to create step for job xxxx: More processors requested than permitted (Not in Zoom because it is too taxing for my computer)
    • is it in a batch job? If srun is in a batch job, the batch job should also request that many threads.
    • what's the exact syntax, and where can you find that information in general? cpus-per-task? threads? cpu?
  • when running pi.py with sbatch it starts but then says it failed, can you show the example again to see what might be wrong
    • Did you run the pi.py like: python pi.py n, where n is the number of iterations? You need to use sbatch for the slurm script.
      • yes, but used python3
        • If you're in HY, use python. See answer above.
          thanks
        • If you want to use python3 you can load it by module load fgci-common and module load Python/3.8.6-GCCcore-10.2.0 (for example)
  • I tried exercise 2 by just replacing the command line starting with srun and it failed. What could be the reason for that?
    • "#!/bin/bash
      #SBATCH --mem=500M
      #SBATCH --time=1:00:00
      #SBATCH --output=%x.%j.out

      srun --cpus-per-task=2 python pi.py --threads=2 100000000"
    • Just run the srun outside of the slurm script. We'll talk about the requirements in sbatch later on.
    • I had the same problem, but just realised I had the pi.py file in a different folder. So it's fixed now.
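For reference, one way the script quoted above can fail: srun asks for --cpus-per-task=2, but the batch job itself never requested 2 CPUs, which can trigger "More processors requested than permitted" (as seen in the question further up). A sketch of a version that requests the CPUs at the sbatch level too:

```shell
#!/bin/bash
#SBATCH --mem=500M
#SBATCH --time=1:00:00
#SBATCH --output=%x.%j.out
#SBATCH --cpus-per-task=2   # request the CPUs here so the srun below fits

srun python pi.py --threads=2 100000000
```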
  • what is the relevant difference between cpu and memory usage? how to manage them?
    • CPUs are for the computation (a + b); memory is for storing the values (the value of a and the value of b). So you might need very little memory but lots of CPUs, like in this pi.py case. Other times the values you need to store are huge but the operations cannot be parallelized. I hope I do not sound silly, I just try to explain it in the simplest way :) enrico
    • how to manage them? start with some values, see "seff JOBID", look if you are using too much or too little. Change accordingly.
      • That would be one good way. Usually the case is more obvious, like "I'm running out of memory on my laptop that has XX memory". So when you are working with some code/software, you might already have a good hunch. After that, it's quite often trial and error.

Quick poll:

I did exercise:

1)oooooooooooooooooooooooooo
-1a)oooo
-1b)oo
-1c)oo
2)oooooooooooooooooooo
neither)o

(UH) the second task failed

  • make sure you're either in the same directory as the pi.py script or specify the correct path in the command
    • It worked with the first task, so yes, dir is correct.
      • what error did you get?
      • no specific error, just exit code 1 and status FAILED
  • Worked for me. Did you use Kale?
    -Yep
  • there is now a working example of the second task's sbatch file on the breakout room 1 screen share. (I have muted myself so as not to disturb the main session)


  • Oulu: I did "module load python" in the batch script and it did not transfer to the srun. I put it in srun instead and it did not recognize the command "module". Is there any way to load modules and use them with srun?
    • Aalto here, but I think it is related to default shells and things like that. You do not really need to target the "module" command with srun
      • But when I do srun it does not recognize "python". (I need to load the module.)
      • Can you try module load python and then srun python ...
      • Problem solved. I had made a difficult to find spelling error.
  • What is the bash (not python) command to print CPU time, wall clock time, CPU efficiency (especially wallclock time)?
  • If I want to run GridSeachCV on 2 nodes. Is it possible to utilize both 2 nodes and 40 cpus for each node?
    • Yes, if the program supports it. This means either that each run is independent of the others or that it uses MPI to communicate between cpus.
  • How to check how many CPUs are in the node? was it df -h?
    • df checks the disk size of the filesystem where you currently are in the terminal
    • for number of cpus on a linux machine: lscpu (and other ways)
    • (Aalto) Best place start would be to check documentation https://scicomp.aalto.fi/triton/overview/
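For the "number of CPUs" part, a small sketch; nproc and lscpu are standard Linux tools, and the sinfo format code is taken from the sinfo man page, so treat that line as a hedged pointer rather than gospel:

```shell
# Number of CPUs on the machine you are logged into:
NCPUS=$(nproc)
echo "CPUs on this machine: $NCPUS"

# For more detail (sockets, cores, threads per core), run:
#   lscpu
# On a cluster, ask Slurm from a login node instead:
#   sinfo -N -o "%N %c"   # node name and CPU count for each node
```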
  • But cpus-per-task is limited to only 40?
    • The limit is the number of CPUs on the nodes, so it depends on the machine.
    • On Triton it is usually 40.
  • seff output gave CPU Utilized: 00:04:00, does it mean it used 4 CPUs?
    • It's a time
    • CPU efficiency is the percentage of efficiency. If you have requested multiple CPUs and that percentage is at 100, then you are using them all.
  • Twitch has stopped working it seems!
    • Works here in Käpylä :) I am too scared to refresh though (flashbacks from yesterday)
    • I opened a new window and it worked. Internet is not down.
    • This worked. Thanks!
  • In Helsinki (and probably in Tampere also) one needs to do "module load fgci-common" before "module load gcc/9.2.0" works.
  • I REALLY wish there were a working, simple Python parallel example in the docs, not with OpenMP but with joblib or multiprocessing or something
    • pi.py uses Pool from the multiprocessing package
      • But it's not in the parallel section of the docs? I'm not going to find it when I need instructions on parallelisation if it's not there
        • We can add it, thanks for the feedback! (right now pi.py is in the previous pages on serial jobs)
  • the folder hello_omp is already in hpc_examples/openmp/
  • Why are the threads not in order? In Richards example.
    • threads are fun! They can run in any order. This is a lot of the complexity of making things parallel: there are so many things that can go wrong.
      • :D please organise a course on this! oh ok, answer is in next question
  • Can you recommend a course or tutorials for writing parallel programs? Or parallel programming for physics for example.
    • UH students there is a course "Tools of high performance computing" which teaches this!
    • With C and MPI we have one course run by Filippo Federici Canova. The next one will be in September
    • Matlab: we have an advanced Matlab course with parallelization examples (videos coming to YouTube; the course finished a few weeks ago)
    • Python: we have a Python course that covers parallelization. Next round in September
    • Julia: we have a Julia course in August
    • If anyone has some nice MOOC courses, let us know and we can add them at: https://scicomp.aalto.fi/training/
    • What about C/C++/Fortran using OpenMPI
      • Filippo's courses, see link above (but I am sure there is something on Coursera, etc.; let us know if you find anything nice!)
      • Also feel free to check CSC and PRACE (European supercomputing collaboration) courses: https://www.csc.fi/en/training-offering; browsing from there you will also find PRACE online courses.
  • Again, there is a working version of openmp sbatch script for turso in Zoom room 1 screen share (and I was muted)
    • can you show it please?
  • Is there a way to tell Slurm to just use as many CPUs as possible, instead of explicitly telling it how many CPUs it should use with cpus-per-task?
    • you can give it a range; the code then needs to detect what it actually has (via Slurm environment variables)
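One common pattern for the "detect what it actually has" part: read the Slurm environment variable inside the job and fall back to counting local CPUs when it is unset (the variable name is per the sbatch man page; the fallback is what makes this sketch runnable outside Slurm too):

```shell
# Inside a Slurm job, SLURM_CPUS_PER_TASK holds the number of CPUs
# granted per task (set when --cpus-per-task was requested); outside
# Slurm it is unset, so fall back to nproc.
NCPUS=${SLURM_CPUS_PER_TASK:-$(nproc)}
echo "running with $NCPUS CPUs"

# Pass it on to the program instead of hard-coding the count, e.g.:
#   python pi.py --threads="$NCPUS" 100000000
```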
  • I would like to clarify something already asked above which relates to GridSearchCV as an example. Did the response there mean that I would need to program the "parallelism" in the code? If I remember correctly, some of these ML libraries already have an underlying 'multithreading' implementation
    • simo Yes. Most of the scikit-learn algorithms have options that set the number of workers you want to use for the problem. If you reserve multiple CPUs with --cpus-per-task and then set the number of workers to the same value in your code (see the Python page for more examples), you can immediately run this part of the code in parallel.
    • Yes I thought it would work this way 'out of the box'
      • It does, but you need to tell the program what to do. It won't utilize these features if you do not tell it that. So it has 'out of the box support for parallelism', it does not necessarily do stuff in parallel 'out of the box'. :)
        -Correct
    • rkdarst: and you need to read the docs to figure out how it works! Usually it would be OpenMP, but it's annoying when it's unclear.
    • But I guess if I've wrapped those parallel-ready functions into lots of my own code, which in turn has been parallelised e.g. by subject, then I guess I don't want to also have the functions use several workers? Unless I reserve m x n number of cores?
      • Correct. Also, if some of these parallel parts do not need to "talk to each other" (i.e. no multithreading), then the independent parts can go to different nodes (see array jobs)
        • It would be good to know (hands-on) how to do that in Kale cluster -HY
          • Array jobs on stream right now :)
  • If my code has script so that output of script above creates input for following script, does it make sense to use multiple CPUs or to parallelize my task?
    • If the input for next stage depends on previous stage, then they cannot be parallelized. There are tools to "solve" these dependencies because they can be modeled as a graph. One of such tool is "snakemake" https://snakemake.readthedocs.io/en/stable/ BUT if your graph is just a chain of nodes one after the other, then parallelization will not help.
    • If you're going to run the whole pipeline with multiple different inputs / parameters, you can also use array jobs to parallelize it trivially. We'll talk about this next.
  • Love the analogies <3
    • Mee too!
  • Can I contact someone in the future if I need help with running Comsol simulations in Triton?
    • Definitely. Garage is a good way to get face to face (online) contact. Or using our ticketing system. Links are here: https://scicomp.aalto.fi/help/
  • Will this material and the hackmd discussions stay available long after the course? These are great and it would be nice to come back every so often.
    • Material is in scicomp.aalto.fi , videos will be in youtube and hackmd will be stored.

Array jobs

  • If I have two unrelated jobs, can I request separate nodes to do them simultaneously?

    • Most commonly you would just send two jobs to Slurm, e.g. two batch scripts (or otherwise two independent jobs, like the array jobs we are talking about now). They will then get processed independently, and might run simultaneously or at some point anyway.
  • What if I have to input different parameters for each of the 16 jobs?

    • Have a dictionary in your code so that number 0 maps to parameters a,b,c,d, number 1 to another set of parameters, etc. I like to use a table where each line is one set of parameters, and to keep it inside the actual code rather than in the sbatch script, so that on the sbatch side things stay simple/standard
    • You can use different parameter files or script files that have the same name except for the task number at the end. Then refer to the files as common_file_name_$SLURM_ARRAY_TASK_ID if the file names are common_file_name_0, common_file_name_1, and so on.
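A small sketch of that second approach (the file names and contents here are made up for illustration): each array task picks its own input file by suffix, and the fallback default lets the sketch run outside Slurm as well.

```shell
# Create hypothetical per-task input files: input_0.txt, input_1.txt, ...
for i in 0 1 2; do
    echo "parameters for task $i" > "input_${i}.txt"
done

# Inside an array job, Slurm sets SLURM_ARRAY_TASK_ID for each task;
# default to 0 here so the sketch also runs outside Slurm.
TASK_ID=${SLURM_ARRAY_TASK_ID:-0}
INPUT="input_${TASK_ID}.txt"
CONTENTS=$(cat "$INPUT")
echo "task $TASK_ID reads: $CONTENTS"
```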
  • How short array tasks would you recommend/allow? I've understood they shouldn't be too short > have to parallelise scripts myself

    • Rule of thumb: if something takes less than 30 minutes, it is not efficient to run it as an array job, but there are always exceptions. The best approach is to pack things together so that each array task lasts a few hours
  • Could you also add an example for how to bash-magic values from a row of csv file? If you need several variables for the runs

    • First, you are welcome to use whatever code/script/MAGIC that works for you. We have examples here for bash, but nothing prevents you from using Matlab/Python if you prefer those.
    • From a reproducibility point of view, keep this part inside the rest of the code, so that the only "input" sbatch needs to pass is the integer job array ID. This also helps with debugging strange errors, since all the code is in one place and only the integer input is the sweep parameter.
      • Fair point! This is important tacit knowledge that is hard to get elsewhere
      • Fully agree here. Not necessarily a good idea to try doing everything via bash. Keep the job submission "simple".
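Since it was asked above, here is a minimal sketch of pulling one row of a CSV into shell variables (the file name, column meanings, and values are all made up for illustration): sed picks the row by task ID, and parameter expansion splits it at the comma.

```shell
# Hypothetical parameter sweep: one CSV row per array task.
cat > params.csv <<'EOF'
0.1,32
0.01,64
0.001,128
EOF

TASK_ID=${SLURM_ARRAY_TASK_ID:-1}        # row number; defaults to 1 outside Slurm
ROW=$(sed -n "${TASK_ID}p" params.csv)   # pick the TASK_ID-th line

# Split the row on the comma (POSIX parameter expansion, no subshells):
LEARNING_RATE=${ROW%%,*}   # everything before the first comma
BATCH_SIZE=${ROW#*,}       # everything after the first comma

echo "lr=$LEARNING_RATE batch=$BATCH_SIZE"
# ...then hand them to the real program, e.g.:
#   python train.py --lr "$LEARNING_RATE" --batch "$BATCH_SIZE"
```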
  • Let's say I have input files A,B,C. I have to run same pipeline for all files but the parameters change for each input file. The input of next line is created in the previous line. Can I use array? The pipeline is serial but i want to parallelize the script for input files A, B C.

    • write the serial pipeline whose only input is the thing that changes: pipeline.py
    • have a "run" function that calls pipeline.py with the correct input given the integer
      • run(1) -> calls pipeline.py with file A (a bash script, not Python; replace .py with whatever you like :))
      • run(2) -> etc.
        • If you are at Aalto, come to garage (Helsinki)
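The run() idea above can be sketched in bash like this (pipeline.py, the input file names, and the parameters are placeholders): a case statement maps the array-task index to one input file plus its parameters, while the pipeline itself stays serial.

```shell
# Map an array-task index to one input file and its parameters.
run() {
    case "$1" in
        1) INPUT=A.dat; PARAM=0.5 ;;   # hypothetical inputs and parameters
        2) INPUT=B.dat; PARAM=1.0 ;;
        3) INPUT=C.dat; PARAM=2.0 ;;
        *) echo "unknown task $1" >&2; return 1 ;;
    esac
    # In a real job this line would start the serial pipeline, e.g.:
    #   python pipeline.py "$INPUT" --param "$PARAM"
    echo "would run pipeline on $INPUT with param $PARAM"
}

# Each array task calls run with its own ID (default 1 outside Slurm).
run "${SLURM_ARRAY_TASK_ID:-1}"
```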
  • Are the array tasks computed in parallel or in series?

    • In parallel, independent of each other.
      • Can I make a task conditional on the others? "Run the first 4 tasks and when they're done, finally run task 5"
        • Yes. This is already quite a special case. You can see the details in the Slurm manual https://slurm.schedmd.com/job_array.html under "Job Dependencies". Definitely not something covered in this course, though.
      • If one task already uses all the cores on a node, will the tasks still run in parallel?
        • The tasks will run in parallel on different nodes
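A hedged sketch of the "run task 5 only after the first 4 finish" pattern using sbatch dependencies (flag names per the sbatch man page; step1.sh and step2.sh are placeholder scripts): --parsable makes sbatch print just the job ID, and afterok delays the second job until the whole array has finished successfully.

```shell
# Submit a 4-task array, capturing its job ID.
jid=$(sbatch --parsable --array=1-4 step1.sh)

# Submit the final job; it starts only after all array tasks succeed.
sbatch --dependency=afterok:"$jid" step2.sh
```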
  • Is it possible in case of GridSearchCV start array with different parameter sets that are defined in the python code?

    • code this inside the rest of the code and keep the sbatch code simple, see questions above

GPUs

https://scicomp.aalto.fi/triton/tut/gpu/

  • When to use GPUs vs CPUS? And why?

    • Use GPUs when the tools you use are able to run on them. Not all software tools can run on a GPU; everything can run on CPUs.
    • For example, why do crypto miners use GPUs and not CPUs?
      • the hardware architecture of a GPU allows for more efficient parallelization; it is like having an HPC cluster inside a single piece of hardware.
    • GPUs are good for certain computational problems. If the problem vectorizes easily and requires a lot of computation (compared to memory accesses) it will be efficient on a GPU. Neural networks are efficient to train in a GPU.
  • Why are there so many GPU software packages which are based on python, julia etc, i.e high level languages and not software packages based on low level languages?

    • They are wrappers around the low-level languages that actually implement the thing. It's a very convenient pattern: implement the core in a low-level language, drive it from a high-level one.

Modules

https://scicomp.aalto.fi/triton/tut/modules/

  • How did you change the path from Lmod for anaconda? Specifically, what did Simo do to change the environment path to the anaconda python when using the command module load anaconda.

    • "module load anaconda". This does all the stuff in the background (paths are not something you need to worry about).
    • This module system manages all the env variables and paths dynamically. So, you can load multiple, remove, swap etc and that will handle the system env stuff in the backround.
  • on Kale load module anaconda fails.

    • try "module load fgci-common" first.
    • this gives an error as well
    • This meta-module makes the other Aalto modules available!
    • Sorry, had a typo there
  • cat and spider, what other animals can we meet?

    • Do not forget anaconda. There are plenty of funny names in the Linux world, and you can create new ones yourself.
      • and, well, python :D
    • tail
    • there's also daemons if you're into fantasy
    • Err. It's GNU/Linux after all
  • Question about Garage: Is it Aalto only?

  • Whats the best way to get in contact with you if we want to help you with teaching or materials?

Feedback

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510

Please give one good thing about today:

  • A good overview of possibilities, we can perform on clusters
  • The level of the exercises was good
  • Now I understand why my first jobs on Triton were so slow and have some ideas how to speed up things.
  • Somehow, today, I was better able to follow all these channels and even do some exercises.
  • The hands on sessions were great.
  • Today the practical session was very good
  • Good overview, which told us the basics so that we know what to learn
  • Today's sessions were most useful from practical point of view
  • Good mix of HackMD/hands-on + talking. Simo's explanations were really good, and things going wrong with terminal commands is also good because it shows how you approach a problem.
  • Good to have more exercises. adequate time for exercises
  • The conversational tone is good and clear
  • It is really good that you did it in conversation mode; 4 hours are quite heavy otherwise. For me it was also good that you did an overview of several things, so we can get an idea of what we can do.

Please give one thing to improve about today:

  • Modules section could be moved to the beginning of the day
    • I agree with this one
  • I think each of today's topics might be discussed for a whole day. I do not know if this is the goal of this workshop, but maybe it would be worthwhile to split this session into a few and pay more attention to each of the topics.
    • today could be split in two days
  • The scheduling did not work well, but so it is when coding live
  • A bit more time for the exercises
    • +1
    • I agree with this as well
  • Wasn't that big of an issue but try not to talk over each other
  • Some introductions were a bit too long, we could've moved straight to the hands-on part.

How could the course be improved overall:

  • Make it more compatible across the different clusters, so we do not need to jump from one platform to another, panicking because a command does not work out of the box.
  • Having both Twitch and Zoom open was very taxing for my university laptop; maybe this could be improved somehow
    • it's taxing on my laptop too, and it's only 4 years old
  • It would be best if everyone was on the same cluster (I'm from UH), either by splitting the course entirely by university or by otherwise arranging for everyone to have accounts/access to the same cluster
  • Zoom was not really optional for UH users for the second and third day of the course, imo
    - it was mandatory, I would say
  • Less meta topics imho

How should we answer HackMD questions (e.g. to not distract from the program (multianswer):
immediately: oo
with a delay: ooo
both (depending): oooooooo
unsure: oooo
the audience should ignore if it distracts: ooo

What areas should we cover more during this course:

  • Explain what we should expect from the cluster. Tell us what realistic expectations are, what the cluster can do, where it is useful, and what it is not suited for. I had quite unrealistic expectations at the beginning of the course.
    • +1. And this might be connected to the comment below, so that overview would include some links to the further reading.
  • I think some further info (e.g. how to start your own work on the clusters) might be useful.
  • Maybe some more information on monitoring and especially how to know how much resources (memory, CPU, GPU) I might need for a job. Might be more useful than the theoretical intro to HPC Clusters on day 1
    • Agreed. I would like to hear more about the planning of the work, how much can I ask, what should I consider, how to check different parameters, etc.
  • some software has a GUI; maybe you can mention an example of how to combine work done through the GUI with work done through terminal commands
    • Agreed. This is actually quite crucial for those of us doing a lot of plotting (hint: VDI and then ssh -X triton; not perfect but gets stuff done)
    • I agree (is there another course on related GUI stuff?)

What areas should we cover less during this course:

  • Possibilities to do the same thing in different fashions. I believe most of the audience are newcomers to this field, and presenting a lot of commands that do basically the same thing might be confusing.
    • Agreed. You could maybe think of a different structure: start with high-level working examples, then bit by bit delve into the stuff that makes them work (e.g. Unity tutorials are a great example of how to teach something actually very complex in a very approachable way)

what parts of the course were boring:

  • The theory talks on the first day were too theoretical and were not needed for the rest of the course. o
    • I liked it! somehow i agree but still!
    • for someone who likes computers and hardware the first day talks were the best
  • Troubleshooting

Highly recommended to get inspired with Scientific Computing and doing research with computers in general: ->

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510

Some of the links Richard mentioned:

There is a rather annoyingly timed service break for turso, from today 16:00 until our Slurm version is updated (ETA ca. 48 h). The UH course accounts will be valid until the 18th of June, so you'll have next week to try the exercises. Kale is up and running, though.

University of Helsinki daily HPC garage https://wiki.helsinki.fi/display/it4sci/HPC+Garage

University of Helsinki HackMD for HPC https://hackmd.io/8b4KArAzQu-h_Ejuku7zfA

There is now a Zoom group chat called "HPC school" for UH people who want to do the exercises later and get help if needed. You can find it in the Zoom search.

^^^ Please write above this line ^^^