# ARCHIVE 09/06 Intro to High Performance Computing
:::danger
## Infos and important links
* To watch: https://aaltoscicomp.github.io/scip/
* To ask questions and interact (this document): https://hackmd.io/@AaltoSciComp/IntroSummer2021
* *To write in this document, click the "pencil" (Edit) icon in the top right corner and write at the bottom, above the ending line. If you experience lag, switch back to view mode (the "eye" icon).*
* Questions that require one-to-one chat with our helpers (e.g. an installation issue):
- ZOOM LINK SENT TO REGISTERED PARTICIPANTS
* Program: https://scicomp.aalto.fi/training/scip/summer-kickstart/
* If you have just one screen (e.g. a laptop), we recommend arranging your windows like this:
```
╔═══════════╗ ╔═════════════╗
║ WEB ║ ║ TERMINAL ║
║ BROWSER ║ ║ WINDOW ║
║ WINDOW ║ ╚═════════════╝
║ WITH ║ ╔═════════════╗
║ THE ║ ║ BROWSER ║
║ STREAM ║ ║ W/HACKMD ║
╚═══════════╝ ╚═════════════╝
```
* **Do not put names or identifying information in here**
:::
# 07/06 - Intro to High Performance Computing
* Archive of past chats, questions, etc at: https://hackmd.io/@AaltoSciComp/ArchiveSummer2021
* Videos from yesterday are now posted: [playlist here](https://www.youtube.com/watch?v=GpJihwsh8i8&list=PLZLVmS9rf3nPFw29oKUj6w1QdsTCECS1S)
# 08/06 - Hands-on HPC part 1
**Previous content moved to Archive link above**
# 09/06 - Hands-on HPC part 2
## Icebreaker - Reflection on parallelisation: How can your analysis workflow be parallelised?
*Please write your answer here below*
- I'm planning to use Comsol in shared-memory parallel operations for large scale modelling.
- I process health data from multiple participants, so my analysis pipeline could work on participants' data in parallel. Every job can correspond to one participant. At the end I gather everything (group statistics).
- Similar situation with brain imaging – each subject can be their own job for every analysis step
- I have some code that takes as input parameter a location in the brain. In the end, I want the results for 4000 locations. Each location can be run in parallel. Is creating an array job of size 4000 too much?
- Geant4 particle simulations simulate millions of independent particles interacting with matter. The main task asks the computer how many logical cores are available and creates that many worker threads, which then run as many particles in parallel as possible. At the end the results from all worker threads are combined and written to a .root file. Is there anyone else here doing Geant4 simulations? I am a beginner and could use some help from a more experienced user, or we could figure out how to run Geant4 on Triton together?
- We usually run single threaded algorithms on a lot of instances, so the execution on the different problem instances can be parallelized, even if the algorithms themselves are not parallel.
- I am working with the single cell data, thus I can run in parallel different cell populations analyses or data from different sources (DNAseq, RNAseq, proteomics, etc.)
- In ML we can run multiple experiments with different parameters, parallelize the dataloading to make it faster or split the computation in multiple gpus if the models are too big for one.
- I don't understand the question.
- Imagine you are cooking dinner. Some tasks can be done in parallel (while pasta boils I can make a salad), some others cannot (pasta takes 10 minutes and cannot speed that up with parallelization). Is your research problem something that can be broken into tasks that can be done in parallel? (you can guess I am italian :D [name=enrico])
- I would use it to run multiple simulations at the same time for modelling.
- I can train multiple ML models at once with different parameters
- I run samples from preprocessing to final data. Basically same steps with sample specific parameters. So my job requires parallelization.
- I mostly need to run the same simulations with different parameters, so to start, by running many instances of the model at once...
- I will work with especially big-scale data related to maritime, at least the data wrangling could be parallelized.
Feedback on day 2:
- After all it was beneficial even if much of it was Aalto-related (I am from UH). Good presenters (talking in understandable terms; you seem to understand the beginner's viewpoint) and nice Aalto web material. Important basics by Radovan, too. (Just think of it: someone has studied a subject for years and expects that you understand it in 2 minutes.) The important thing is to be able to DO the very basic steps after a course like this.
- For exercises we do during the course we need either more time to read the docs for other uni clusters, or have a document with the simplest commands for all uni clusters in the same place. UH documentation isn't bad, but when you're seeing it for the first time and have two minutes to figure it out while listening to the stream it can get frustrating.
- I've been in a few training sessions of this sort and this one has been the best. No need to apologise, guys, for those negative comments. Some people don't get the difference between feedback and complaints.
- We fully understand that, we are just always trying to improve. And need to frame the big picture.
## Serial jobs
https://scicomp.aalto.fi/triton/tut/serial/
- I have a question: after setting up the network folder in Windows (smb://data.triton.aalto.fi/work/$username/) and creating some files there using Windows, is there a delay until you see them on Triton when connecting with PuTTY?
- There should not be a noticeable delay. The folder you see on Windows is the same remote folder that Triton sees.
- Alright, still ls command is not giving anything
- try the other way around: create a file on remote (e.g. triton login node) and see if it appears on your local mount. it might be that the mount connection died. (I assume Linux or mac)
- Which folder are you in? can you check the output of pwd on Triton?
- home/username is the output of pwd
- /home/username is different from the work folder (the one in the smb-line). I think it's /scratch/work/username.
- Why is the srun option --cpus-per-task limited to 40? I thought the whole point of using the cluster is to have thousands of CPU cores available? Why can I only use 40 at once?
- You are requesting one node with 40 cpus. It is different than requesting for 40 nodes each with one cpu. We cover this today.
- For my simulation I need as many cores (threads) as possible, and having only 40 is quite a disappointment.
- If you need the CPUs to talk with each other and if the code is written with "MPI" then you can scale for more than the CPUs on a single node. If you need CPUs to talk with each other AND the code is not done with MPI, then 40 is the maximum.
- When I use the srun option --ntasks=2 or --nodes=2, the whole simulation runs exactly the same twice and the two runs overwrite each other's results, which means it takes longer and does not produce any additional data.
- Why are you using ntasks? Because --cpus-per-task is limited to only 40 and I was trying to find another way to speed up my simulation.
- See the reply for the question above. It depends on your code if it is implemented with MPI.
- Which software do you use? Geant4
- You can use MPI with Geant4 so you can have multiple nodes with multiple CPUs
- setting these parameters will be covered in parallel. They have to be tuned to the code itself. you get weird stuff like this with codes that aren't adapted to it.
- Where did the hpc-examples folder in my directory come from? I don't think it was there yesterday. Did I do something yesterday (the git clone thing?) to copy this folder from GitHub? I can't remember, but it would be good to know.
- Yes with "git clone https://github.com/AaltoSciComp/hpc-examples.git"
- (Kale) What was the command to copy file to cluster?
- `scp filename kale.grid.helsinki.fi:` to put it in your home folder on Kale
- Is hello.out a file?
- it will be, once you run it.
- ok, I see it now.
- (turso) I got an error:
```
sbatch hello.sh
sbatch: error: invalid partition specified: long
sbatch: error: Batch job submission failed: Invalid partition name specified
```
- in turso, one needs to specify either the subcluster or the partition in the submit file. So you need to add the line
``#SBATCH --partition=test`` in the preamble of hello.sh file.
- in kale, the example works out of the box.
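- For reference, a minimal sketch of what a complete `hello.sh` could look like on turso with that line added (the resource values here are arbitrary placeholders, not recommendations):
```
#!/bin/bash
#SBATCH --partition=test       # required on turso; not needed on Triton/kale
#SBATCH --time=00:05:00
#SBATCH --mem-per-cpu=100M
#SBATCH --output=hello.out

srun echo "Hello from $(hostname)"
```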
- Why do we not need to `chmod +x` the`.sh` file?
- we are not doing `./hello.sh` to run it as an executable, but passing it as an argument to `sbatch`, which reads the file itself.
- What is the difference between these two methods? Both give the same output.
- When you run `./hello.sh` it is actually not being submitted as a batch job: it is running on the login node.
- Can we use SBATCH inside an .sh file?
- Yes: you can have a script that runs `sbatch` for you instead of typing it on the login node.
- Technically yes, but the good question is whether that is the best way of doing things. We'll cover topics like array jobs later (e.g. jobs with a looping structure).
- If we forget to specify the time or memory-per-CPU in the .sh file, will the job not run at all?
- It uses some defaults.
- Defaults on Triton: 15 minutes, 500MB, max run time 5 days: DefaultTime=0:15:00 MaxTime=5-0 DefMemPerCPU=500 (/etc/slurm/slurm.conf)
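- If you want to see what the defaults are on your own cluster, one way (assuming the standard Slurm command-line tools) is:
```
# Partition-level defaults and limits (DefaultTime, MaxTime, DefMemPerCPU)
scontrol show partition
# Cluster-wide configuration values
scontrol show config | grep -i def
```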
- For the exercises (Q3), what is the difference between putting srun before the for loop and in the for loop? I notice that the srun docs give both as examples.
- each "srun" call will produce a "stamp" on the history. You see the difference in the history
- We will see more about this in monitoring.
- What does cat hello.sh do?
- `cat` prints contents of a file to the screen. So it prints contents of the file. Good for looking at these short things.
- Why do we use srun in the batch-script?
- It will help us track resource usage better. We will see more soon.
(ask questions at bottom of document)
### Quick poll:
Did you manage to run the example?
yes:Öoöooooooooooooooooooooooooooooôoo
no:oooo
yes, but I got an error: ooooo
yes, but it didn't print anything: o
- check hello.out with nano (the printed output is there)
- Was the command srun hello.sh to run the file?
- `sbatch hello.sh`
- Is anyone using Triton also getting command not found errors?
- Which command for example?
- --time=00:05:00 ...
- --mem-per-cpu ...
- --output= ...
- these three
- When submitting a serial job you need to set these parameters like `#SBATCH --time=00:05:00`. These comments are read by Slurm but they are not executed by `bash`. If you do not include the comment, you'll get `command not found`-errors.
- These are not commands but options for the SBATCH directives. Interactively they would go after `srun`, e.g. `srun --time=xx --mem-per-cpu=xx ...`. `srun` is the command and the `--...` parts are its options (see the sketch after this thread).
- Do you know why they might be giving command not found errors then?
- We need to check what you run before getting those errors.
- I tried to just follow along. It never created the hello.out file, just these slurm-jobid.out error ones
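- To make the distinction above concrete, a minimal sketch (the resource values are arbitrary). The same options appear as `#SBATCH` comment lines in a batch script, or as arguments to `srun` interactively; typed on their own in a shell, lines like `--time=00:05:00` are options without a command, hence "command not found":
```
#!/bin/bash
#SBATCH --time=00:05:00        # read by sbatch; bash sees only a comment
#SBATCH --mem-per-cpu=500M
#SBATCH --output=hello.out

srun hostname

# The interactive equivalent, typed at the shell prompt, would be:
#   srun --time=00:05:00 --mem-per-cpu=500M hostname
```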
- Is there a bash command or something to see the amount of available memory? E.g. when running pilots with just the VDI resources, you don't really know how big those are (and also the number of CPUs)
- Good question, do you want to check it in real time? Or are you happy to check it after the job is done?
- hm, actually just to see what it generally is, I mean the total memory and CPUs, to help with estimation (Spyder shows the % of memory, but not the actual MB when running commands/scripts). The way my scripts are structured (python joblib parallel) it's hard to run just a single subject with sbatch, so I do my testing inside Spyder
- some options https://linuxhint.com/check_memory_usage_process_linux/
- `htop` is a good tool for monitoring your processes e.g. on your laptop. In cluster you need to use other tools as `htop` only shows local processes, not processes that are running on other nodes. The field `RES` will tell the memory usage of your job. Use `q` to quit it.
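- For the "total memory and CPUs" part of the question, some standard Linux commands (these report on the machine you are logged in to, e.g. the VDI node, not the whole cluster):
```
free -h     # total, used and free memory in human-readable units
nproc       # number of CPU cores available to you
lscpu       # more detailed CPU information
```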
- How can I exit the shell file?
- nano: Ctrl-X
- To edit files on the terminal you can use a terminal editor like "nano". If your work folder is mounted on a graphical system, then you can use anything graphical.
- Is there a quick way (e.g. tab completion) to get the job ID when cancelling a job?
- Not that I know of :) but if you have the job output file named with the jobID, it is handy.
- Richard's way is quicker but uses the mouse!
- Indeed! I don't have a mouse :D
- Do things like screen/tmux support copy and paste?
- yes, different shortcuts for different OSes.
- if we do not specify the memory, time etc. in the .sh file, will they be assigned by default?
- At Aalto yes; not sure about other sites.
- At Tampere we also have small defaults
- If I run a program without allocating CPU/GPU time and memory (eg. with a command python3 example.py), what will happen? Is there a default amount?
- yes it picks some default parameters
- (Turso) `slurm history 1hour` gave me this: "Warning: output will not fit." I meant to say it did not show that sleep.sh was run.
- It's just telling you that the history output is wider than your current shell window. And yes, that means your job probably didn't actually run.
- Is there a way to print out # of CPUs used in the .sh-file? When job started, the .sh file will print out that X CPUs is used for this task.
- Check the SLURM_ variables in the job you request. e.g:
`srun --cpus-per-task=4 --pty bash`
`env|grep SLURM`
- a good candidate seems to be $SLURM_CPUS_ON_NODE
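- For example, a minimal sketch of a batch script that reports its own allocation (the requested values are placeholders):
```
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --time=00:05:00

# SLURM_* variables are set inside the job by the scheduler
echo "This job got $SLURM_CPUS_PER_TASK CPUs per task ($SLURM_CPUS_ON_NODE on this node)"
```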
- drag here to widen terminal -> `JobID JobName Start ReqMem MaxRSS TotalCPUTime WallTime Tasks CPU Ns Exit State Nodes`
- I guess this is not a question?
- Btw I keep getting this message: `man: can't set the locale; make sure $LC_* and $LANG are correct`. Should I do anything about it?
- which computer?
- Triton
- Let's get back to this later, I am not able to reproduce it.
- I get this message too every time I log in to narvi
- could mention this in zoom breakout room 2, there is something wrong with your account
- Are you on a Mac? I remember there was a similar problem a while ago. OS X sets a locale option that does not work with Triton. It does not change anything, just prints the message.
- Yes, this happens for me on mac too, every time I connect to triton by jumping through kosh. I haven't noticed it really affecting anything though.
- We can debug this in garage if it really annoys you :)
- In Windows and PuTTY you might need to change the locale settings from settings -> window -> translation -> remote character set -> utf-8. See for example [this stackoverflow answer](https://serverfault.com/a/475930)
- On mac you need to do something similar, but I cannot remember right now how to do it.
- In turso, I got `sbatch: error: invalid partition specified: long` ... `sbatch: error: Batch job submission failed...`
- I ran it with `#SBATCH --partition=test` in the script and it did not show the error.
- I tried that; OK, it submitted a batch job
- perfect!
- What does the `$(seq 30)` do in the for loop?
- `seq` is a linux command that gives you a sequence of numbers. So in this case, numbers from 1 to 30 (try running `seq 10`, for example). The construct `$(something)` means that bash will run `something` and place its output there. So basically, a for loop over a sequence that ranges from 1 to 30 (see below).
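- A quick way to see this for yourself in the shell:
```
# seq prints the numbers; the for loop runs once per number
for i in $(seq 5); do
    echo "iteration $i"
done
```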
:::info
### Exercises until xx:55
### Then break until xx:05
Then we go over the exercises as a demo.
**Helsinki users, please come to Zoom breakout room 1**
Serial jobs: https://scicomp.aalto.fi/triton/tut/serial/#exercises
Try #1-3. You can try advanced if you want.
:::
(continued questions here)
- On triton, the output file only contained "slurmstepd: error: execve(): /var/spool/slurmd/job61003986/slurm_script: Not a directory"
- Let's check it on zoom
- It was a typo at the beginning: `#!/bin/bash/` (remove the last slash)
- What is hostname in exercise 1? I don't find it in hpc-examples.
- It should depend on where it runs: as long as you see something, it's OK. Ideally it will match the `slurm history` report of where it says it runs.
- hostname is the variable that contains the name of the current host.
- I think we want it to run the command `hostname` in the shell script.
- Yes. This is not on the examples. It is like doing interactively `srun hostname` but instead you submit it non-interactively
- What is the command syntax for srun to run multiple lines of bash code? Currently I have written a bash script with the for loop and am running *that* with srun.
- I prefer writing a bash script with the lines and then `srun myscript.sh`
- Exercise 3, how to run the whole script with srun? or does it matter?
- It is enough if you have some sruns in there to see why we use it.
- Is there a way to "comment" lines out from the SBATCH preferences?
- prefix it with another `#`: a line starting with `##SBATCH` is ignored by Slurm
- What happens if you give too much time or memory? What happens if you do not give enough time or memory?
- Too much? You might have to wait more in the queue
- Too little? Your jobs will be cancelled once they have used up all the resources you requested
- It is good to start small, trial and error, use "seff" (soon you will hear about this) and optimize your analysis workflow before submitting many jobs.
- Is the memory needed to run a program directly related to the size of the script? Or what is the relation? E.g `stat hostname.sh` is 116 bytes in size. Could I run this on a node with memory set to 116 bytes?
- The size of the script file is not related to the size of the memory. E.g. a python/matlab script that loads a file of 10GB size can just be one line but it needs 10GB of memory to load the file.
### Quick poll
Did you manage to run the exercises?
ex1:ooooooooooooooôôôôôôôôôôôôoo0ooooooooooooooooooooooooooooooooooo
ex2:ooooooo0oooooooooooooooo0ooooooooo
ex3:ooo0oooooooooooooooooooooooooo
ex4:oooooooooo
ex5:oooø
none:o
- What are the different languages one can use in ex5?
- http://dcjtech.info/topic/list-of-shebang-interpreter-directives/
- Is there a way to include a `#SBATCH <something>` to select a partition in a `sbatch` script?
- yes, the option is "-p" or --partition, link coming: https://slurm.schedmd.com/sbatch.html (search for "--partition")
- Might be useful to introduce the `man sbatch` command for searching for the CLI options
- I love terminal but the webpage is much nicer https://slurm.schedmd.com/sbatch.html :)
- is there a cat? do i hear correctly??
- yes! :cat:
- yes, it wants to get out. Normally it can open the door itself but I have blocked it.
- THIS IS BEST
- cute cat thank you!!! <3 <3
- if I wasn't so distracted we could have shown some more. Opening the door is always a nice trick.
- cat +1
## Monitoring
https://scicomp.aalto.fi/triton/tut/monitoring/
- Monitoring examples use the script '/usr/bin/slurm'. The current version of the script doesn't work properly on turso; a fix is on the TODO list but not implemented yet. So Helsinki users need to do the monitoring exercises on kale, not turso. We will fix the "slurm" script asap.
- would you rather have a smaller font, or me continually adjusting the window like I do now? Adjusting works well
- How can I check whether my script is utilizing GPU or is it using local machine?
- Good one! This will be covered in the stream
- Is there a way to similarly "watch" the .out file?
- `tail -f output.file` where the `output.file` is the output file name
- Is there an equivalent to IDE "debugging" mode in batch scripts?
- Not that I know of. I see it more like "my script is ready and does all I need it to do, now I wrap it with sbatch so that I can run it on a remote system"
- Normally I would do that kind of debugging as interactive jobs. And not use IDE debugger but standard command line debugger.
- Some advanced debuggers can connect to running jobs, but that is probably more than most people need.
- How to load modules inside sbatch-script?
- any command you can type in a shell can be inside a bash script, so in this case `module load XYZ`. You don't need srun for it
- I thought it was important to run the commands with srun?
- You need srun if you want to track the usage of that specific line, or if you need parallelization with MPI (they can comment live), because each srun entry ends up as its own entry in the history.
- but commands like `echo`, `cd`, `module load` do not need srun (a sketch follows below)
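- Putting that together, a sketch of a batch script that loads a module (the module name and script name here are made-up examples; resource values are placeholders):
```
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=500M

# Plain shell commands like module load do not need srun
module load anaconda

# Use srun only around the step whose resource usage you want tracked
srun python my_analysis.py
```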
- What was the meaning of %x in the output file?
- the job name. This generic form can be used for any job, so I don't have to keep editing the output filename.
- See https://slurm.schedmd.com/sbatch.html and search for "%x"
- I did not get any output from the third exercise. Does srun print the date?
- It should. However you do not really need srun for "date" see question above.
- `date` is just a linux command that prints the date. `srun date` just means that `srun` runs the program and records stuff to Slurm database (for `slurm history`, `seff` etc.)
- What is the red number in the command line? now it is 130.
- My shell is set to include "exit status"/"exit code" of a program.
- 'exit status' is a unix concept: basically the return value of a program. 0 = "success", anything else conventionally means failure. This is often used in shell scripts
- And in my configuration, it is printed to help me when running stuff (if it is not zero)
- and apparently Control-C exits with status=130
- Can you give a job name to srun? For example, if you have multiple `srun date` calls, can you name them 'This gives date 1', 'This gives date 2'?
- I haven't done that, but probably! Check the `srun` manual page.
- You can with `srun -J name`. I would recommend against names with spaces.
- I prefer the long formats and apostrophes `srun --job-name="niceName" date`
- is maxRSS per node/core/job?
- per job
- Does the dot after the ID in the seff command refer to a specific line of the script, or a single command, or a single step (like a single iteration of a loop)?
- I might have missed this, but seff just wants the jobID, no dots
- They did use a dot at the end, referring to some specific step, which I did not understand how to identify
- oh yeah you see them from "sacct"
- yes you can check the efficiency of each JOBID.0 etc...
- [name=simo] If you're monitoring individual job steps (started with `srun`), you can monitor them with `seff <jobid>.<task_number>`. Each task (`srun`-call) gets its own number.
- Thanks!
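- So, with a hypothetical job ID, the usage would be:
```
seff 123456      # efficiency summary for the whole job
seff 123456.0    # summary for the first srun step only
seff 123456.1    # ... the second step, and so on
```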
:::info
### Exercise until xx:10, then break until xx:20
https://scicomp.aalto.fi/triton/tut/monitoring/#exercises
- Try both exercises.
:::
- When trying the pi.py example on kale, use "python pi.py" and not "python3 pi.py". There is python3 on the login node, but not on the compute nodes (unless you load e.g. an anaconda module).
- I'll be in Zoom breakout room 1 for Helsinki users, if you have questions.
- Anybody else in zoom, feel free to ask for help with anything.
- (Helsinki kale) `srun: error: Unable to create step for job xxxx: More processors requested than permitted` (Not in zoom because it's too taxing for my computer)
- is it in a batch job? If srun is inside a batch job, the batch job itself should also request that many threads.
- what's the exact syntax, and where can you find that information in general? cpus-per-task? threads? cpu?
- We will talk about parallel jobs later (the tutorial with explanations is [here](https://scicomp.aalto.fi/triton/tut/parallel/))
- In general this is the best resource https://slurm.schedmd.com/sbatch.html
- when running pi.py with sbatch it starts but then says it failed, can you show the example again to see what might be wrong
- Did you run the `pi.py` like: `python pi.py n`, where `n` is the number of iterations? You need to use `sbatch` for the slurm script.
- yes, but used python3
- If you're at HY, use `python`. See the answer above.
- thanks
- If you want to use python3 you can load it by `module load fgci-common` and `module load Python/3.8.6-GCCcore-10.2.0` (for example)
- I tried exercise 2 by just replacing the command line starting with srun, and it failed. What could be the reason for that?
- ```
"#!/bin/bash
#SBATCH --mem=500M
#SBATCH --time=1:00:00
#SBATCH --output=%x.%j.out
srun --cpus-per-task=2 python pi.py --threads=2 100000000"
```
- Just run the srun outside of the slurm script. We'll talk about the requirements in sbatch later on.
- I had the same problem, but just realised I had the pi.py file in a different folder. So it's fixed now.
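- For reference, a sketch of the same script with the job allocation matching the srun request, which would avoid the "More processors requested than permitted" error mentioned earlier (resource values as in the original):
```
#!/bin/bash
#SBATCH --mem=500M
#SBATCH --time=1:00:00
#SBATCH --cpus-per-task=2      # the job itself must request the CPUs srun will use
#SBATCH --output=%x.%j.out

srun --cpus-per-task=2 python pi.py --threads=2 100000000
```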
- what is the relevant difference between cpu and memory usage? how to manage them?
- cpus are for the computation (a + b), memory for storing the values (the value of a and the value of b), so you might need very little memory but lots of CPUs like in this pi.py case. Some other times the values you need to store are huge but the operations you need to do cannot be parallelized. I hope I do not sound silly, I just try to explain it in the most simple way :) [name=enrico]
- how to manage them? start with some values, see "seff JOBID", look if you are using too much or too little. Change accordingly.
- That would be one good way. Usually the case is more obvious, like "I'm running out of memory on my laptop that has XX memory". So when you are working with some code/software you might already have a good hunch. After that it's trial and error quite often.
Quick poll:
I did exercise...:
1)0oooooooooo0ocoooöoooooooo
-1a)oooo
-1b)oo
-1c)oo
2)o0ooöooooooooooooooo
neither)o
- (UH) the second task failed
- make sure you're either in the same directory as the pi.py script or specify the correct path in the command
- It worked for the first task, so yes, the dir is correct.
- what error did you get?
- no specific error, just exit code 1 and status FAILED
- Worked for me. Did you use Kale?
- Yep
- there is now a working example of the second task's sbatch file in the breakout room 1 screen share. (I have muted myself, not to disturb the main session)
- Oulu: I did `module load python` in the batch script and it did not carry over to the srun. I put it in the srun command instead and it did not recognize the command `module`. Is there any way to load modules and use them with srun?
- Aalto here, but I think it's related to default shells and things like that. You do not really need to target the `module` command with srun
- But when I do srun it does not recognize `python`. (I need to load the module.)
- Can you try `module load python` and then `srun python ...`
- Problem solved. I had made a hard-to-find spelling error.
- :+1:
- What is the bash (not python) command to print CPU time, wall-clock time, and CPU efficiency (especially wall-clock time)?
- `time`? It doesn't really give CPU efficiency, but try the command `time`
- https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1/556411#556411 explanation of what `time` outputs
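- For example (pi.py as in the course examples; the iteration count is arbitrary):
```
# Prints real (wall-clock), user and sys (CPU) time when the command finishes
time python pi.py 1000000
```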
- If I want to run GridSearchCV on 2 nodes, is it possible to utilize both nodes with 40 CPUs each?
- Yes, if the program supports it. This means either that each run is independent of the others or that it uses MPI to communicate between cpus.
- How do I check how many CPUs are in a node? Was it df -h?
- df checks the disk size of the filesystem you are currently in
- for the number of CPUs on a linux machine: lscpu (among other ways)
- (Aalto) The best place to start is the documentation https://scicomp.aalto.fi/triton/overview/
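- On a cluster you can also ask Slurm directly, e.g. with the standard sinfo format options:
```
# List nodes with their hostname, CPU count and memory (MB)
sinfo --Node --format="%n %c %m"
```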
- But --cpus-per-task is limited to only 40?
- The limit is the number of CPUs on the node, so it depends on the machine.
- On Triton it is usually 40.
- seff output gave CPU Utilized: 00:04:00, does it mean it used 4 CPUs?
- It's a time, not a CPU count.
- CPU efficiency is the percentage: if you have requested multiple CPUs and that percentage is at 100, then you are using them all.
- Twitch has stopped working it seems!
- Works here in Käpylä :) I am too scared to refresh though (flashbacks from yesterday)
- I opened a new window and it worked. Internet is not down.
- This worked. Thanks!
- In Helsinki (and probably in Tampere also) one needs to do "module load fgci-common" before "module load gcc/9.2.0" works.
- I REALLY wish there'd be a working, simple python parallel example in the docs... not with openmp but joblib or multiprocess or something
- pi.py uses a worker `Pool` from Python's multiprocessing
- But it's not in the parallel section of the docs? I'm not going to find it when I need instructions on parallelisation if it's not there
- We can add it, thanks for the feedback! (right now pi.py is in the previous pages on serial jobs)
- the folder hello_omp is already in hpc-examples/openmp/
- Why are the threads not in order in Richard's example?
- threads are fun! They can run in any order. This is a lot of the complexity of making stuff parallel, there are so many things that can go wrong.
- :D please organise a course on this! -- oh ok, answer is in next question
- Can you recommend a course or tutorials for writing parallel programs? Or parallel programming for physics for example.
- UH students: there is a course "Tools of high performance computing" which teaches this!
- With C and MPI we have a course run by Filippo Federici Canova. The next one will be in September
- Matlab: we have an advanced Matlab course with parallelization examples (videos coming to youtube; the course finished a few weeks ago)
- Python: we have a python course that covers parallelization. Next round in September
- Is this Aalto? What's the name of the course?
- Last time it was done in collaboration with Sweden and Norway so not just Aalto: https://scicomp.aalto.fi/training/python-scicomp/
- Julia: we have a julia course in August
- If anyone has some nice MOOC courses, let us know and we can add them at: https://scicomp.aalto.fi/training/
- What about C/C++/Fortran using OpenMPI?
- Filippo's courses, see the link above (but I am sure there is something on coursera etc., let us know if you find anything nice!)
- Also feel free to check CSC courses and PRACE (European supercomputing collaboration): https://www.csc.fi/en/training-offering browsing from here you will also find Prace online courses.
- Again, there is a working version of the openmp sbatch script for turso in the Zoom room 1 screen share (and I was muted)
- can you show it please?
- Is there a way to tell Slurm to just use as many CPUs as possible instead of explicitly telling it how many with --cpus-per-task?
- you can give it a range; then the code needs to be able to detect what it actually got (via the SLURM_* environment variables)
- I would like to clarify something already asked above, relating to GridSearchCV as an example. Did the response there mean that I would need to program the parallelism in the code? If I remember correctly, some of these ML libraries already have an underlying 'multithreading' implementation
- [name=simo] Yes. Most of the scikit-learn algorithms have options that set the number of workers to use for the problem. If you reserve multiple CPUs with `--cpus-per-task` and then set the number of workers to the same value in your code (see the Python page for more examples on this), you can immediately run this part of the code in parallel. (There is a sketch of this pattern after this thread.)
- Yes I thought it would work this way 'out of the box'
- It does, but you need to tell the program what to do. It won't utilize these features if you do not tell it that. So it has 'out of the box support for parallelism', it does not necessarily do stuff in parallel 'out of the box'. :)
- Correct
- [name=rkdarst] and you need to read the docs to figure out how it works! Usually it would be OpenMP but it's annoying when it's unclear.
- But I guess if I've wrapped those parallel-ready functions into lots of my own code, which in turn has been parallelised e.g. by subject, then I guess I don't want to _also_ have the functions use several workers? Unless I reserve m x n number of cores?
- Correct. Also, if some of these parallel parts do not need to "talk to each other" (i.e. no multithreading), then the independent pieces can go to different nodes (see array jobs)
- It would be good to know (hands-on) how to do that in Kale cluster -HY
- Array jobs on stream right now :)
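- Tying the scikit-learn thread above together, a sketch of the pattern (the script name and its `--n-jobs` flag are hypothetical; use whatever option your library expects):
```
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00

# Reserve CPUs from Slurm, then tell the code to use exactly that many workers
srun python train.py --n-jobs=$SLURM_CPUS_PER_TASK
```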
- If my code has script so that output of script above creates input for following script, does it make sense to use multiple CPUs or to parallelize my task?
- If the input for the next stage depends on the previous stage, then they cannot be parallelized. There are tools to "solve" these dependencies because they can be modeled as a graph; one such tool is "snakemake" https://snakemake.readthedocs.io/en/stable/ BUT if your graph is just a chain of nodes one after the other, then parallelization will not help.
- If you're going to run the whole pipeline with multiple different inputs / parameters, you can also use array jobs to parallelize it trivially. We'll talk about this next.
- Love the analogies <3
- Mee too!
- Can I contact someone in the future if I need help with running Comsol simulations in Triton?
- Definitely. Garage is a good way to get face to face (online) contact. Or using our ticketing system. Links are here: https://scicomp.aalto.fi/help/
- Will this material and the hackmd discussions stay available long after the course? These are great and it would be nice to come back every so often.
- Material is on scicomp.aalto.fi, videos will be on youtube, and this hackmd will be stored.
## Array jobs
- If I have two unrelated jobs, can I request separate nodes to do them simultaneously?
- Most commonly you would just send two jobs to Slurm, e.g. two batch scripts (or otherwise two independent jobs, like the array jobs we are talking about now). They will then get processed independently, and might run simultaneously or at some point anyway.
- What if I have to input different parameters for each of the 16 jobs?
- Have a dictionary in your code so that number 0 maps to parameters a,b,c,d, number 1 to another set of parameters, etc. I like to use a table so that each line is one set of parameters. I keep it inside the actual code and not in the slurm sbatch script, so that on the sbatch side things stay simple/standard.
- You can use different parameter files or script files that have the same name except for the task number at the end. Then refer to the files as `common_file_name_$SLURM_ARRAY_TASK_ID` if the file names are `common_file_name_0`, `common_file_name_1` and so on (a sketch of the general pattern follows below).
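- A minimal sketch of an array job (the 16 tasks and the script name are made-up examples):
```
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=500M
#SBATCH --array=0-15
#SBATCH --output=array_%A_%a.out   # %A = job ID, %a = array task ID

# Each task sees its own SLURM_ARRAY_TASK_ID (0..15); the code maps it
# to the right parameter set
srun python my_pipeline.py --task-id=$SLURM_ARRAY_TASK_ID
```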
- How short would you recommend/allow array tasks to be? I've understood they shouldn't be too short --> I'd have to parallelise the scripts myself
- Rule of thumb: if something takes less than 30 minutes, it is not efficient to run it as an array task, but there are always exceptions. The best is to pack things together so that each array task lasts a few hours
- Could you also add an example of how to bash-magic values from a row of a csv file? If you need several variables for the runs
- First, you are welcome to use whatever code/script/MAGIC works for you. We have examples here for bash, but nothing prevents you from using Matlab/python if you prefer those (see the bash sketch after this thread).
- From a reproducibility point of view, keep this part inside the rest of the code so that the only "input" that sbatch needs to pass is the integer of job array ID. This also helps debugging strange errors as all the code is in one place and only the integer input is the sweep parameter.
- Fair point! This is important tacit knowledge it's hard to get elsewhere :thumbsup:
- Fully agree here. It's not necessarily a good idea to try doing everything via bash. Keep the job submission "simple".
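- That said, a bash sketch of picking one CSV row per array task, for use inside an array job script (the file name, variable names and the simulate command are hypothetical):
```
# sed lines are 1-based: array task 0 reads line 1, task 1 reads line 2, etc.
LINE=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" params.csv)
# Split the comma-separated fields into shell variables
IFS=',' read -r alpha beta gamma <<< "$LINE"
srun ./simulate --alpha "$alpha" --beta "$beta" --gamma "$gamma"
```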
- Let's say I have input files A, B, C. I have to run the same pipeline for all files, but the parameters change for each input file. The input of the next step is created by the previous step. Can I use an array? The pipeline is serial, but I want to parallelize the script over input files A, B, C.
- write the serial pipeline so that its only input is what changes (pipeline.py)
- have a "run" function that calls pipeline.py with the correct input given the integer
- run(1) -> calls pipeline.py with file A: a BASH script, not python (replace .py with whatever you like :))
- run(2) -> etc. etc. (see the sketch below)
- If you are at Aalto, come to garage (Helsinki)
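- As a sketch, the same idea written as an array job over the three input files (pipeline.sh is hypothetical; it runs the whole serial pipeline for one input):
```
#!/bin/bash
#SBATCH --array=0-2
#SBATCH --time=01:00:00

# Map the array index to one input file; the pipeline itself stays serial
FILES=(A B C)
INPUT=${FILES[$SLURM_ARRAY_TASK_ID]}
srun ./pipeline.sh "$INPUT"
```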
- Are the array tasks computed in parallel or in series?
- In parallel independent from each other.
- Can I make a task conditional on the others? "Run the first 4 tasks and when they're done, finally run task 5"
- Yes. This starts to be quite a special case already. You can see the details in the Slurm manual https://slurm.schedmd.com/job_array.html under "Job Dependencies". Definitely not something covered in this course, though.
- If one task already uses all the cores on a node, will the tasks still run in parallel?
- The tasks will run in parallel on different nodes
- Is it possible in case of GridSearchCV start array with different parameter sets that are defined in the python code?
- code this inside the rest of the code and keep the sbatch code simple, see questions above
## GPUs
https://scicomp.aalto.fi/triton/tut/gpu/
- When to use GPUs vs CPUS? And why?
- Only if the tools you use are able to run on GPUs. Not all software can run on a GPU; everything can run on CPUs.
- For example, why do crypto miners use GPUs and not CPUs?
- the hardware architecture of a GPU allows for more efficient parallelization. It's like having an HPC cluster inside a single piece of hardware.
- GPUs are good for certain computational problems. If the problem vectorizes easily and requires a lot of computation (compared to memory accesses) it will be efficient on a GPU. Neural networks are efficient to train in a GPU.
- Why are there so many GPU software packages which are based on python, julia etc, i.e high level languages and not software packages based on low level languages?
- They are wrappers around the low-level languages that actually implement the thing. It's a very convenient pattern: implement the core in a low-level language, drive it from a high-level one.
- https://stackoverflow.com/questions/35677724/tensorflow-why-was-python-the-chosen-language (just an example for TensorFlow)
- That is useful for users but not for programmers. How do I become a better programmer and not just a better user?
- Difficult question: maybe join an open source software project? By improving existing tools you learn from others in the community. Many python packages were ported to C/C++ etc. Many of these high-level GPU packages also allow other APIs (e.g. you can use the tensorflow core methods from C++ without python). ***Good question for a conference on Research Software Engineers ->*** https://nordic-rse.org/events/2021-online-unconference/
## Modules
https://scicomp.aalto.fi/triton/tut/modules/
- How did you change the path from Lmod for anaconda? Specifically, what did Simo do to change the environment path to the anaconda python when using the command `module load anaconda`.
- "module load anaconda". This does all the stuff in the background (paths are not something you need to worry about).
- This `module` system manages all the env variables and paths dynamically. So you can load multiple modules, remove, swap etc., and it will handle the system env stuff in the background.
- on Kale, `module load anaconda` fails.
- try "module load fgci-common" first.
- this gives an error as well...
- This meta-module makes the other Aalto modules available!
- Sorry, had a typo there
- cat and spider, what other animals can we meet?
- Do not forget anaconda. There are plenty of funny names in the linux world. And you can create new ones yourself.
- and, well, python :D
- tail
- there's also daemons if you're into fantasy
- Err. It's *GNU*/Linux after all...
- Question about Garage: Is it Aalto only?
- Aalto garage: https://scicomp.aalto.fi/help/garage/
- Also HY has a garage. https://wiki.helsinki.fi/display/it4sci/HPC+Garage
- What's the best way to get in contact with you if we want to help with teaching or materials?
## Feedback
Please give one good thing about today:
- A good overview of the possibilities we have on clusters
- The level of the exercises was good
- Now I understand why my first jobs on Triton were so slow and have some ideas how to speed up things.
- Somehow, today, I was better able to follow all these channels and even do some exercises.
- The hands on sessions were great.
- Today the practical session was very good
- Good overview, which told us the basics so that we know what to learn
- Today's sessions were most useful from practical point of view
- Good mix of HackMD/hands-on + talking. Simo's explanations were really good, and things going wrong with terminal commands is also good because it shows how you guys approach a problem.
- Good to have more exercises; adequate time for the exercises
- The conversational tone is good and clear
- It is really good you did it in conversation mode; 4 hours would be quite heavy otherwise. For me it was good that you did an overview of several things, so we can have an idea of what we can do.
Please give one thing to improve about today:
- The Modules section could be moved to the beginning of the day
- I agree with this one
- I think each of today's topics could be discussed for a whole day. I do not know if this is the goal of this workshop, but maybe it would be worthwhile to split this session into a few and pay more attention to each of the topics.
- today could be split in two days
- The scheduling did not work well, but such it is when coding live
- A bit more time for the exercises
- +1
- I agree with this as well
- Wasn't that big of an issue but try not to talk over each other
- Some introductions were a bit too long, we could've moved straight to the hands-on part.
How could the course be improved overall:
- Make it more compatible with different clusters, so we do not need to jump from one platform to another, panicking when a command does not work out of the box.
- Having both twitch and zoom open was very taxing for my university laptop; maybe this could be improved somehow
- it's taxing on my laptop too, and it's only 4 years old
- It would be best if everyone was on the same cluster (I'm from UH), either by splitting the course entirely by university or by arranging for everyone to have accounts/access to the same cluster
- Zoom was not really optional for UH users for the second and third day of the course imo
- it was mandatory I would say :smile_cat:
- Less meta topics imho
How should we answer HackMD questions (e.g. to not distract from the program) (multi-answer):
immediately: oo
with a delay: ooo
both (depending): oooooooo
unsure: ooco
the audience should ignore if it distracts: ooo
What areas should we cover more during this course:
- Explain what we should expect from the cluster. Tell us what realistic expectations are, what the cluster can do, where it is useful, and what it is not suited for. I had quite unrealistic expectations at the beginning of the course.
- +1. And this might be connected to the comment below, so that overview would include some links to the further reading.
- I think some further info (e.g. how to start your own work on the clusters) might be useful.
- Maybe some more information on monitoring, and especially how to know how many resources (memory, CPU, GPU) I might need for a job. That might be more useful than the theoretical intro to HPC clusters on day 1
- Agreed. I would like to hear more about the planning of the work, how much can I ask, what should I consider, how to check different parameters, etc.
- some software have a GUI; maybe you can mention an example of how to combine work done through the GUI with work done through terminal commands
- Agreed. This is quite crucial actually for those of us doing a lot of plotting (hint: VDI and then ssh -X triton; not perfect but gets stuff done)
- Also, check this out: https://scicomp.aalto.fi/triton/usage/workflows/
- I agree (is there another course on related GUI stuff?)
What areas should we cover less during this course:
- Different ways of doing the same thing. I believe most of the audience are newcomers to this field, and giving a lot of commands that do basically the same thing might be confusing.
- Agreed. You could maybe think of a different structure, starting with high-level working examples and then bit by bit delving into the stuff that makes them work (e.g. Unity tutorials are a great example of how to teach something very complex in a very approachable way)
What parts of the course were boring:
- The theory talks on the first day were too theoretical and were not needed for the rest of the course. o
- I liked it! somehow i agree but still!
- for someone who likes computers and hardware the first day talks were the best
- Troubleshooting :smile:
---
Highly recommended to get inspired with Scientific Computing and doing research with computers in general: ->
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510
Some of the links Richard mentioned:
- Join a code refinery workshop (or do it by yourself with online videos from past workshops) https://coderefinery.org/
- Check more learning materials at https://handsonscicomp.readthedocs.io/en/latest/
- Check some past episodes from Research Software Hour https://researchsoftwarehour.github.io/
There is a rather annoyingly timed service break for turso, from today 16:00 until our Slurm version is updated (ETA ca. 48 h). The UH course accounts will be valid until the 18th of June, so you'll have next week to try the exercises. Kale is up and running, though.
University of Helsinki daily HPC garage https://wiki.helsinki.fi/display/it4sci/HPC+Garage
University of Helsinki HackMD for HPC https://hackmd.io/8b4KArAzQu-h_Ejuku7zfA
There is now a zoom group chat called "HPC school" for UH people who want to do the exercises later and get help if needed. You can find it in the zoom search.