# PHAS0100 - week 10 (25th March 2022)

:::info
## :tada: Welcome to the 10th and final live-class!

### Today

Today we will

- Recap shared memory parallelism
- Intro to the Message Passing Interface (MPI)
- Work through OpenMPI examples:
    - Hello world-type with/without catch2
    - Point-to-point communications
    - Collective operations: scatter and reduce
- Finish with a summary of the course
:::

### Timetable for today

| Start | End   | Topic                                                     | Room      |
| ----- | ----- | --------------------------------------------------------- | --------- |
| 14:05 | 14:25 | Intro shared memory + MPI                                 | Main      |
| 14:25 | 14:50 | [Install OpenMPI + hello/helloCatch](#Breakout-session-1) | Breakouts |
| 14:50 | 15:00 | Break                                                     | ---       |
| 15:00 | 15:20 | Collective communications                                 | Main      |
| 15:20 | 15:50 | [Scatter and reduce examples](#Breakout-session-2)        | Breakouts |
| 15:50 | 16:00 | Break                                                     | ---       |
| 16:00 | 16:20 | Point-to-point communications                             | Main      |
| 16:20 | 16:45 | [Ring pattern and SendRecv](#Breakout-session-3)          | Breakouts |
| 16:45 | 17:00 | Closeout and questions                                    | Main      |

# Breakout session 1

## Class Exercise 1

**OpenMPI libraries and hello.cpp**

1. First clone the repository for today's class
    ```
    git clone https://github.com/jeydobson/phas0100_week10_mpi.git
    ```
2. Open the folder in VS Code and then `Reopen in Container` to build the updated Dockerfile image with the OpenMPI libraries

:::spoiler alternative
If you prefer to install in a running container:
``` shell
sudo apt-get update
sudo apt-get install openmpi-bin libopenmpi-dev
```
:::

3. Check that the OpenMPI libraries are installed
    ``` shell
    mpiexec --version
    mpicxx --version
    ```

:::warning
As we will run the cmake steps manually from the terminal, it is useful to disable VS Code's automatic re-run of cmake every time a CMakeLists.txt file is saved: open the VS Code extensions tab -> `CMake Tools` extension settings -> Workspace -> uncheck the `Configure on Edit` setting.
:::

4. Add the relevant CMake commands to `phas0100_week10_mpi/hello/CMakeLists.txt` to compile `hello.cpp`.
    Note: so that you can get onto the later parts of this question, take a look at the hints below if this takes more than a few minutes.

:::spoiler hints
Use `add_executable` and `target_link_libraries`. The `MPI_CXX_LIBRARIES` variable is set by `find_package` and points to the MPI libraries. If that doesn't help, see `hello/solution`.
:::

5. Compile as usual
    ```
    cd phas0100_week10_mpi
    mkdir build
    cd build
    cmake ..
    make
    ```
6. Now from within `build` try running:
    ```
    mpiexec ./hello/hello
    mpiexec -n <num_processors> ./hello/hello
    mpiexec -n <num_processors> --oversubscribe ./hello/hello
    ```
    * Try out a few different values for `<num_processors>`: 1, 2, 10. The `--oversubscribe` option is necessary if `<num_processors>` is greater than the number of processors available on your system.
    * Take a look at `hello.cpp` and discuss how running with different `<num_processors>` affects the output. Make sure you understand `MPI_Comm_rank` and `MPI_Comm_size`.
    * What happens if you run the executable without `mpiexec`?
7. If you have time, look at the https://www.open-mpi.org/doc/v3.1/ docs and see if the documentation for `MPI_Comm_rank` and `MPI_Comm_size` makes sense. Try to get the communicator name with `MPI_Comm_get_name`.

## Class Exercise 2

**helloCatch.cpp**

1. Add the relevant CMake commands to `phas0100_week10_mpi/hello/CMakeLists.txt` to compile `helloCatch.cpp`
2. As with `hello.cpp`, run cmake and make
3. Try running with `mpiexec` with various numbers of processes

:::spoiler Optional if time left over
4. Take a look at the https://www.open-mpi.org/doc/v3.1/ docs and see if the documentation for `MPI_Comm_rank` and `MPI_Comm_size` makes sense. Can you use `MPI_Comm_get_name` to get the name of the communicator?
:::

# Breakout session 2

## Class Exercise 3

**Scatter a message with MPI_Scatter**

1. Uncomment `add_subdirectory(collective)` in the top-level `phas0100_week10_mpi/CMakeLists.txt` file so that cmake tries to build it, and add the relevant commands to `phas0100_week10_mpi/collective/CMakeLists.txt` to compile `scatter.cpp`
2. Use the `MPI_Scatter` method to split `"This message is going to come out in separate channels"` evenly based on the number of processes and send each chunk to a process

## Class Exercise 4

**Sum results from different processes with MPI_Reduce**

In this example the aim is to calculate pi using the Gregory-Leibniz series:

```
pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
```

We will split the series into chunks, process each chunk on a separate process, and then use `MPI_Reduce` to sum them all and pass the result to the root process (rank 0). An optional sketch of this pattern is hidden in a spoiler after the steps below.

1. Add the relevant commands to `phas0100_week10_mpi/collective/CMakeLists.txt` to compile `reduce.cpp`
2. Use the `MPI_Reduce` method to sum the results from each process to calculate pi/4. A list of the pre-defined operators can be found further down the page here: https://www.open-mpi.org/doc/v3.1/man3/MPI_Reduce.3.php
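:::spoiler Optional: a sketch of the split-and-reduce pattern (peek only if you are stuck)
This is a minimal stand-alone sketch of the idea, not the exercise solution: `reduce.cpp` in the repository may be structured differently, and the variable names and number of terms used here are just illustrative choices.
``` cpp
// Sketch: approximate pi/4 with the Gregory-Leibniz series, splitting the
// terms across ranks and combining the partial sums with MPI_Reduce.
// Compile with mpicxx and run with e.g. `mpiexec -n 4 ./pi_reduce`.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank sums every size-th term of 1 - 1/3 + 1/5 - 1/7 + ...
    const long n_terms = 1000000;
    double partial = 0.0;
    for (long i = rank; i < n_terms; i += size) {
        const double term = 1.0 / (2.0 * i + 1.0);
        partial += (i % 2 == 0) ? term : -term;
    }

    // Sum the partial results from every rank into pi_over_4 on rank 0.
    double pi_over_4 = 0.0;
    MPI_Reduce(&partial, &pi_over_4, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        std::printf("pi is approximately %.10f\n", 4.0 * pi_over_4);
    }

    MPI_Finalize();
    return 0;
}
```
Because the series sums to pi/4, rank 0 multiplies the reduced total by 4. `MPI_SUM` is one of the pre-defined operators listed in the `MPI_Reduce` documentation linked above.
:::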
# Breakout session 3

## Class Exercise 5

**ring with blocking synchronous MPI_Ssend**

0. Throughout, use https://www.open-mpi.org/doc/v3.1/ to look up the MPI method documentation
1. Uncomment `add_subdirectory(point2point)` in the top-level `phas0100_week10_mpi/CMakeLists.txt` file so that cmake tries to build it, and add the relevant commands to `phas0100_week10_mpi/point2point/CMakeLists.txt` to compile `ring.cpp`
2. For even-ranked processes: implement an `MPI_Ssend` to send `message` from the current process to the one on its left
3. For even processes: implement an `MPI_Recv` to receive the `message` from the process on the right
4. Then for odd processes: first call `MPI_Recv`
5. And then `MPI_Ssend`
6. Check that the final REQUIRE passes. What happens if you reverse the order of 4 and 5?

:::info
You can use the `-c` option to tell catch2 to only run the tests for a specific section, like this:
``` shell
mpiexec --oversubscribe -n 4 point2point/ring -c "Blocking synchronous"
```
:::

## Class/Homework Exercise 6

**ring with asynchronous MPI_Isend**

Now implement the ring using the asynchronous `MPI_Isend` in the `Asynchronous` SECTION.

1. First use `MPI_Isend` to send for all processes.
2. Then use `MPI_Recv` to receive messages for all processes. `MPI_Recv` also acts as a sync barrier.
3. Implement an `MPI_Test` to check the `MPI_Request` handle for `MPI_Isend`. Confirm that it is always set to `true`.

## Class/Homework Exercise 7

**ring with MPI_Sendrecv**

This send-receive ring pattern is so common that there is a dedicated method for it: `MPI_Sendrecv`.

1. Use `MPI_Sendrecv` to achieve the same as above, but now in a single command (an optional stand-alone sketch is hidden in a spoiler below). Once this is implemented, the final REQUIRE check should pass.
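:::spoiler Optional: a stand-alone sketch of the `MPI_Sendrecv` ring (peek only if you are stuck)
This is not the repository's catch2-based `ring.cpp`; the one-integer `message`, the `left`/`right` neighbour variables and the tag value are illustrative assumptions. It only shows the single-call send-to-the-left / receive-from-the-right structure that the exercise asks for.
``` cpp
// Sketch of the ring pattern with MPI_Sendrecv: every rank sends to its left
// neighbour and receives from its right neighbour in one call.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int left = (rank + size - 1) % size;  // destination: neighbour on the left
    const int right = (rank + 1) % size;        // source: neighbour on the right

    int message = rank;  // what this rank sends around the ring
    int received = -1;   // filled in with the right neighbour's message
    const int tag = 0;

    // Send and receive combined in a single blocking call.
    MPI_Sendrecv(&message, 1, MPI_INT, left, tag,
                 &received, 1, MPI_INT, right, tag,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    std::printf("rank %d received %d from rank %d\n", rank, received, right);

    MPI_Finalize();
    return 0;
}
```
Because the send and the receive are combined in one call, `MPI_Sendrecv` avoids the deadlock that an all-`MPI_Ssend`-first ring would hit, which is why the even/odd ordering from Exercise 5 is no longer needed.
:::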
## End of module Evaluation feedback

Please fill in [the survey to help us improve](https://moodle.ucl.ac.uk/mod/questionnaire/view.php?id=3352333) our teaching and your learning experience.

# Questions

Here you can post any question you have while we are going through this document. A summary of the questions and the answers will be posted on Moodle after each session.

Please use a new bullet point for each question and sub-bullet points for their answers. For example, writing like this:

```
- Example question
    - [name=student_a] Example answer
    - [name=TA_1] Example answer
```

produces the following result:

- Example question
    - [name=student_a] Example answer
    - [name=TA_1] Example answer

Write new questions below :point_down:

- []
    - [name=TA_1] Example answer

###### tags: `phas0100` `teaching` `class`