
<p style="text-align: center"><b><font size=5 color=blue>Julia for High-Performance Scientific Computing - Day 2</font></b></p>
:::success
**Julia for High-Performance Scientific Computing — Schedule**: https://hackmd.io/@yonglei/julia-hpc-2024-schedule
:::
## Schedule for Day 2 -- [Julia for HPC](https://enccs.github.io/julia-for-hpc/)
| Time (CET) | Time (EET) | Instructors |Contents |
| :---------: | :---------: | :---------: | :------: |
| 09:30-10:00 | 10:30-11:00 | Yonglei | Motivation (Julia HPC) |
| 10:00-11:30 | 11:00-12:30 | Yonglei | Writing performant Julia code |
| 11:30-12:30 | 12:30-13:30 | | ==Lunch Break== |
| 12:30-13:20 | 13:30-14:20 | Pedro | Multithreading |
| 13:20-13:30 | 14:20-14:30 | | Break |
| 13:30-14:20 | 14:30-15:20 | Pedro | Distributed computing |
| 14:20-14:30 | 15:20-15:30 | | Buffer time, Q&A |
---
## Exercises and Links
**Lesson material**:
- [Introduction to programming in Julia](https://enccs.github.io/julia-intro/)
- [Julia for high-performance scientific computing](https://enccs.github.io/julia-for-hpc/)
**Top 10 HPC clusters**
- https://top500.org/lists/top500/2024/11/
**GPU programming – When, Why, and How**
- [recorded video](https://www.youtube.com/watch?v=Yv87u2zQPJU&list=PL2GgjY1xUzfCwdcqcvXD17qyWhW4wM_Sy)
- [lesson material](https://enccs.github.io/gpu-programming/)
**Practical Intro to GPU programming using**
- Python
- [recorded video](https://www.youtube.com/watch?v=qydP7cOH4qc)
- [material for demo](https://github.com/ENCCS/webinar_documents)
- Julia
- [recorded video](https://www.youtube.com/watch?v=9n97yCgFCWA&t=3s)
- [material for demo](https://github.com/ENCCS/webinar_documents)
---
:::danger
You can ask questions about the workshop content at the bottom of this page. We use the Zoom chat only for reporting Zoom problems and such.
:::
## Questions, answers and information
### Motivation (for Julia HPC)
### Writing performant Julia code
- Do these benchmarking tools take into account that the first run of Julia code is slower than the consecutive ones?
    - A short answer is yes. The first time you run the code, Julia compiles it (and you may also need to install packages/libraries and their dependencies), so the first run takes extra time and the measured efficiency is low.
    - You can run the code a second and third time to see stable benchmarking results.
    - For your own projects, a better approach is to manage all the libraries/packages you use, and their dependencies, with the package manager, so that you don't need to worry about package versions.
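A minimal stdlib-only sketch of this first-run effect, timing the same (hypothetical) function twice with `@elapsed`; tools like BenchmarkTools' `@btime` run many samples precisely so that this one-time compilation cost drops out:

```julia
# First call triggers JIT compilation of sum_squares for Vector{Float64};
# the second call reuses the already-compiled code and is much faster.
function sum_squares(v)
    s = zero(eltype(v))
    for x in v
        s += x^2
    end
    return s
end

v = rand(10_000)
t1 = @elapsed sum_squares(v)   # includes compilation time
t2 = @elapsed sum_squares(v)   # pure runtime, compilation already done
println("first run: $t1 s, second run: $t2 s")
```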
:::info
**System Settings for PC**
- ==M=2048, N=2048, 2000 steps==
- value for boundary cells, 2.0
| ID | Benchmark | Column- or row-major order | `@inbounds` | other tips | Note |
| :-: | :-------: | :------------------------: | :---------: | :--------: | :--: |
| YL | 3.142 ms | 38.099 ms | 1.952 ms | | PC |
| MT | 4.118 ms | | 3.860 ms | | PC |
| YL | 3.364 ms | 45.502 ms | 2.294 ms | | PC |
| YL | 4.452 ms | xxx ns | 2.354 ms | | LUMI-Setting (2048\*2048, 2000) |
| YL | 2.492 ms | xxx ns | 2.112 ms | | LUMI-Setting (2048\*2048, 2000) |
**System Settings for LUMI script**
- ==M=256, N=256, 1000 steps==
- value for boundary cells, 2.0
- ==Time unit is `nanosecond` (ns)==
| ID | Benchmark | Column- or row-major order | `@inbounds` | other tips | Note |
| :-: | :-------: | :------------------------: | :---------: | :--------: | :--: |
| PC | 18705 ns | 128394 ns | 17193 ns | | PC |
| XX | xxxxx ns | xxx ns | xxxxx ns | | LUMI |
| XX | 29447 ns | xxx ns | 25169 ns | | LUMI |
| XX | 25950 ns | xxx ns | 22000 ns | | LUMI |
| XX | 22000 ns | xxx ns | 19949 ns | | LUMI |
| XX | 25389 ns | xxx ns | 19948 ns | | LUMI |
| YL | 25109 ns | xxx ns | 23215 ns | | LUMI-Setting (256\*256, 1000) |
:::
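As an illustration of the column- vs row-major comparison in the tables above (the function names below are ours, not from the lesson code): Julia arrays are stored column-major, so the inner loop should iterate over the first index.

```julia
# Julia arrays are column-major: the first index is contiguous in memory,
# so iterating it in the inner loop gives sequential (cache-friendly) access.
function sum_colmajor(A)
    s = zero(eltype(A))
    for j in axes(A, 2), i in axes(A, 1)   # inner loop over rows: fast
        @inbounds s += A[i, j]
    end
    return s
end

function sum_rowmajor(A)
    s = zero(eltype(A))
    for i in axes(A, 1), j in axes(A, 2)   # inner loop over columns: strided, slow
        @inbounds s += A[i, j]
    end
    return s
end
```

Benchmarking both (e.g. with `@btime`) on a 2048×2048 matrix should reproduce the kind of gap shown in the first table.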
- I see that `*.toml` files are used for some kind of versioning. Can you elaborate on how to use them properly?
    - you get two TOML files when you work in a project environment (e.g. on LUMI): `Project.toml` and `Manifest.toml`.
    - these two files are central to the `package manager` and record package names, dependencies, and exact versions.
    - if you copy the `Project.toml` (and `Manifest.toml`) to another machine, you can rebuild the same programming environment and reproduce your computational results.
    - this is super important for `reproducible research`
    - more info about these two TOML files is listed [HERE](https://pkgdocs.julialang.org/v1/toml-files/)
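A small sketch of the workflow (the project directory here is a throwaway temporary one; the `Pkg.add`/`Pkg.instantiate` calls are commented out so the snippet stays offline):

```julia
using Pkg

# Activate a project environment in a directory; every Pkg.add done while
# it is active is recorded in Project.toml (direct dependencies) and
# Manifest.toml (the full, exactly-versioned dependency graph).
dir = mktempdir()              # stand-in for your real project folder
Pkg.activate(dir)

# Pkg.add("BenchmarkTools")    # would write Project.toml + Manifest.toml here

# On another machine, with the copied Project.toml/Manifest.toml in place:
#   Pkg.activate(dir); Pkg.instantiate()   # installs the recorded versions
```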
- What are the time units when we run the code on LUMI? I see numbers in the `.out` file but I don't know whether they are ns, ms, or something else.
    - the time unit for the runs on LUMI should be `ns`
    - YL: for the code example `performant-template.jl`, I got 19898 for the setting `256*256, 1000`.
    - YL: If I update the setting to `2048*2048, 2000`, the time is 2.112e6. Comparing with the PC timing, `2.112 ms = 2.112e6 ns`, so the time unit of the code example on LUMI should be `nanosecond`.
### Multithreading
- In the spawn and fetch example, does the thread remain occupied until `fetch` is called?
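A sketch of the behavior (our understanding, not the lesson's own example): a task scheduled with `Threads.@spawn` runs to completion on a worker thread regardless of when `fetch` is called; `fetch` only blocks the *calling* task until the result is available.

```julia
using Base.Threads

# @spawn returns immediately with a Task; the task runs on some thread and
# finishes on its own. fetch blocks the caller until the task is done, then
# returns its result -- the worker thread is not held idle waiting for fetch.
t = Threads.@spawn sum(1:1_000_000)
# ... the caller can do other work here while the task runs ...
result = fetch(t)   # blocks only if the task has not finished yet
```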
- If I interface Julia with a C library that uses OpenMP for parallelization, is the execution via Julia also done in parallel? Is that only the case if OpenMP is installed?
    - If the library parallelizes with OpenMP, you need OpenMP on the C side anyway. That said, yes: if you call a C function from Julia and that function contains a `#pragma omp for`, it will be multithreaded; set the environment variable `OMP_NUM_THREADS` to control how many threads it uses.
### Distributed computing
---
:::info
*Always ask questions at the very bottom of this document, right **above** this.*
:::
---