# Hybrid CPU programming with OpenMP and MPI
This note collects up-to-date information during the course.
## Important links
- [Zoom](https://cscfi.zoom.us/j/61000457419)
- [RocketChat](https://chat.csc.fi/invite/HarWui)
- [Course github](https://github.com/csc-training/hybrid-openmp-mpi)
- [Course home page](https://events.prace-ri.eu/event/1225/)
- [Lecture slides](https://kannu.csc.fi/s/2jSLcFELwHB45JY)
## General instructions
- During the lectures, you can ask questions via microphone or Zoom chat
- During the hands-on sessions, ask questions in the RocketChat (please use Multiline formatting for error messages and code snippets).
- Complex questions with screen sharing etc. can be discussed in a private break-out room in Zoom.
## Exercises for current session
- Simple tasking
- Parallelizing Mandelbrot with tasks
- Parallel Fibonacci
- Tasks and loops
## Agenda
| Monday | |
| -------- | -------- |
|09:00 - 09:45 | Introduction to hybrid programming|
|09:45 - 10:00 | Break|
|10:00 - 10:45 | Getting started with OpenMP|
|10:45 - 11:30 | Exercises|
|11:30 - 12:10 | Library functions and data sharing|
|12:10 - 13:00 | Lunch break|
|13:00 - 13:45 | Exercises|
|13:45 - 14:30 | Reductions and execution controls|
|14:30 - 14:45 | Break|
|14:45 - 15:45 | Exercises|
|15:45 - 16:00 | Wrap-up and exercise walk-through|

| Tuesday | |
| -------- | -------- |
|09:00 - 09:50 | Using OpenMP with MPI|
|10:05 - 10:30 | Thread and process affinity|
|10:30 - 11:15 | Exercises|
|11:15 - 12:00 | OpenMP tasks|
|12:00 - 13:00 | Lunch break|
|13:00 - 13:45 | Exercises|
|13:45 - 14:30 | Task dependencies|
|14:30 - 14:45 | Break|
|14:45 - 15:45 | Exercises|
|15:45 - 16:00 | Wrap-up and exercise walk-through|
## 1st Day Morning quiz
1. My choice of programming language
A. Fortran
B. plain C
C. C++
A.xxx
B.xxxxxxxxx
C.xxxxxxxxxxxxxxxx
2. My motivation for learning OpenMP
A. I want to quickly parallelize serial application
B. I am developing a plain MPI application and want to improve scalability
C. I am using a MPI+OpenMP application and want to understand it better
D. General interest in parallel programming
A.xxxx
B.xxxxx
C.xxxxxxxxxxx
D.xxxxxxxxxxxxxxxx
3. Please describe briefly the application you are working with
- Particle-in-Cell simulations in plasma physics
- Earth System Model (climate) +1
- Solver for perturbation theory in Quantum mechanics/Solid state physics
- Phase-space density simulation
- N-body and hydrodynamic astrophysics code
- Structure optimization for atomic systems and distributing ML workloads
- Particle in cell algorithm
- Optimization programs/analyzers for GIS data
- Quantum Monte Carlo
- Krylov based time-evolution of quantum mechanical systems
- OpenFoam framework for fluid dynamics
- Discrete dislocation dynamics
## 2nd Day Morning quiz
1. How is an OpenMP program executed?
A. Parallel program is run as a set of independent processes
B. Program starts with a single thread; threads are then forked and joined
C. A special launcher program is needed for starting the program
D. Same program can be run in serial and in parallel
A.
B.xxxxxxxxxxxxxxx
C.
D.xxxxxxxxxxxxx
2. Which of the following statements are true?
A. OpenMP threads communicate by sending and receiving messages
B. OpenMP program can be run in distributed memory supercomputer
C. Number of threads is always the same as number of CPU cores
D. OpenMP programs are vulnerable to race conditions
A.
B.xxx
C.
D.xxxxxxxxxxxxxxxx
3. What does the `omp parallel` construct do?
A. Distributes the following `for` / `do` loop to threads
B. Creates N threads
C. Instructs the compiler to parallelize following code block the best it can
D. Creates a parallel region which is executed by all the threads
A.x
B.xxxxxxxxxx
C.
D.xxxxxxxxxxxxx
4. How is data visibility within a parallel region defined?
A. All the threads always share all the data
B. By set of default rules
C. By data sharing clauses
D. Via calls to OpenMP API
A.
B.xxxxxxxxxxxxx
C.xxxxxxxxxxxxxxxx
D.x
5. Where are the threads synchronized?
A. At the end of parallel region
B. At the end of `for` / `do` construct with `nowait` clause
C. At the end of `master` construct
D. At explicit `barrier` constructs
A.xxxxxxxxxxxxxx
B.
C.
D.xxxxxxxxxxxxx
6. What is the outcome of the following code snippet when run with 4 threads?
```fortran
integer :: i, prod, vec1(6), vec2(6)
vec1 = 2
vec2 = 4
prod = 0
!$omp parallel do reduction(+:prod)
do i = 1, 6
   prod = prod + vec1(i) * vec2(i)
end do
!$omp end parallel do
write(*,*) prod
```
A. Code cannot be run with 4 threads as loop is not distributed evenly
B. I do not know
C. 0
D. 48
A.
B.xxxxxxxxxxxxx
C.
D.xx
## Free discussion
Feel free to add any general remarks, tips, tricks, comments, etc. here. For questions during the exercise sessions, however, please use RocketChat, as it will be monitored more frequently.