###### tags: CSC course, MPI

# Parallel programming with MPI

This note contains up-to-date information during the course.

## Important links

- [Zoom](https://cscfi.zoom.us/j/64679696411)
- [RocketChat](https://chat.csc.fi/invite/5WQvZ4)
- [Course github](https://github.com/csc-training/mpi-introduction)
- [Lecture slides](https://events.prace-ri.eu/event/1223/attachments/1628/3088/LECTURE_slides_Parallel%20Programming%20with%20MPI%20%40%20CSC%202.9-3.9.2021.pdf)
- [Course home page](https://events.prace-ri.eu/e/ParallelProgrammingMPI_CSC_SEP-2021)
- [MPI reference](http://mpi.deino.net/mpi_functions/index.htm)

## General instructions

- During the lectures, you can ask questions via microphone or the Zoom chat.
- During the hands-on sessions, ask questions in RocketChat (please use multiline formatting for error messages and code snippets).
- Complex questions, screen sharing etc. can be discussed in a private break-out room in Zoom.

## Exercises for current session

### Collective communication

- Collective operations
- (Bonus) Collective communication in Heat equation solver

## Agenda

| Thursday | |
| -------- | -------- |
| 09:00 - 09:15 | Welcome to the course |
| 09:15 - 09:45 | What is high-performance computing |
| 09:45 - 10:00 | *break* |
| 10:00 - 10:45 | Parallel programming concepts |
| 10:45 - 11:15 | Group work and demo |
| 11:15 - 12:00 | Getting started with MPI |
| 12:00 - 13:00 | **Lunch break** |
| 13:00 - 13:45 | Exercises |
| 13:45 - 14:30 | Point-to-point communication |
| 14:30 - 14:45 | *break* |
| 14:45 - 15:20 | Exercises |
| 15:20 - 16:00 | Debugging |
| 16:00 - 16:30 | Exercise walk-through and wrap-up |

| Friday | |
| -------- | -------- |
| 09:00 - 09:15 | Summary of day 1 |
| 09:15 - 09:35 | Programming practices and special parameters |
| 09:35 - 10:45 | Exercises |
| 10:45 - 11:15 | Performance analysis and communication patterns |
| 11:15 - 12:00 | Exercises |
| 12:00 - 13:00 | **Lunch break** |
| 13:00 - 13:45 | Collective communication |
| 13:45 - 14:30 | Exercises |
| 14:30 - 14:45 | *break* |
| 14:45 - 15:15 | Non-blocking communication |
| 15:15 - 16:00 | Exercises |
| 16:00 - 16:30 | Exercise walk-through and wrap-up |

## Quiz

1. How can boundary MPI tasks, *e.g.* in a communication chain, be treated?

A. Using `MPI_PROC_NULL`
B. Using `MPI_ANY_SOURCE`
C. By putting MPI calls into "if-else" structures
D. With the help of "modulo" operations

A.xxxxxxxxxxxxx
B.
C.xxxxxxxxx
D.xxx

2. Which of the following statements apply to `MPI_Sendrecv`?

A. It is required for correct functioning of MPI programs
B. It is syntactic sugar in MPI
C. It is useful for preventing deadlocks
D. It is useful for avoiding serialization of communication

A.
B.
C.xxxxxxxxxxxxxxxxx
D.xxxxxxxxxxxx

3. Which of the following statements about collective communication are correct?

A. All the processes participate in pairwise communication
B. Every process sends a message to a specific process
C. All the MPI tasks within a communicator communicate along the chosen pattern
D. Every process receives messages from every other process

A.xx
B.
C.xxxxxxxxxxxxxxxxxxx
D.

4. The benefits of collective communication are

A. There is no benefit
B. Code is more compact
C. MPI library can utilize special hardware
D. MPI library can utilize efficient implementations

A.
B.xxxxxxxxxxxxxxx
C.xx
D.xxxxxxxxxxxxxxxxxx

5. What is the outcome of the following code snippet when run with 4 processes?
```fortran
a(:) = my_id
call mpi_gather(a, 2, MPI_INTEGER, aloc, 2, MPI_INTEGER, 3, MPI_COMM_WORLD, rc)
if (my_id==3) print *, aloc(:)
```

A. "0 1 2 3"
B. "2 2 2 2 2 2 2 2"
C. "0 0 1 1 2 2 3 3"
D. "0 1 2 3 0 1 2 3"

A.x
B.
C.xxxxxxxxxxxxx
D.

6. What is the outcome of the following code snippet when run with 8 processes, i.e. on ranks 0, 1, 2, 3, 4, 5, 6, 7?

```c
if (rank % 2 == 0) {
    // Even processes
    MPI_Allreduce(&rank, &evensum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    if (0 == rank) printf("evensum = %d\n", evensum);
} else {
    // Odd processes
    MPI_Allreduce(&rank, &oddsum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    if (1 == rank) printf("oddsum = %d\n", oddsum);
}
```

A. evensum = 16, oddsum = 12
B. evensum = 28, oddsum = 28
C. evensum = 12, oddsum = 16
D. evensum = 6, oddsum = 2

A.
B.xxxx
C.xxxxxxxxxx
D.

7. Which of the following statements apply to non-blocking communication?

A. Communication happens in the background during computation
B. Latency is smaller and bandwidth better than with blocking routines
C. There is a possibility of overlapping communication and computation
D. Non-blocking routines can have a small performance penalty

A.xxxxxxxxxxxx
B.x
C.xxxxxxxxxxxxxxx
D.xxxx

8. What is the outcome of the following code snippet?

```c
if (0 == myid) {
    int a = 4;
    MPI_Isend(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    a = 6;
    MPI_Wait(&req, MPI_STATUS_IGNORE);
} else if (1 == myid) {
    int a;
    MPI_Irecv(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    std::cout << "a is: " << a << std::endl;
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```

A. 4
B. 6
C. Segmentation fault
D. Not well defined

A.
B.
C.xx
D.xxxxxxxxxx
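In the last snippet above, rank 1 reads `a` before its `MPI_Wait` has completed the `MPI_Irecv` (and rank 0 modifies the send buffer before its `MPI_Isend` has been completed; the receive is also posted with source 1 rather than 0), so the printed value is not well defined. Below is a minimal sketch of one way to make such an exchange well defined, assuming plain C, at least two processes, and that rank 1 is meant to receive from rank 0:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int myid, a;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (0 == myid) {
        a = 4;
        MPI_Isend(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* wait until the send has completed before reusing the buffer */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        a = 6;
    } else if (1 == myid) {
        /* receive from rank 0 */
        MPI_Irecv(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        /* wait until the receive has completed before reading the buffer */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("a is: %d\n", a);
    }

    MPI_Finalize();
    return 0;
}
```

With the waits placed before the buffers are reused or read, rank 1 reliably prints `a is: 4`.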
## Why I want to learn MPI

- I support research groups that need HPC and MPI. I also want to improve my own knowledge; I believe that MPI is just as important a "scaling" tool as OpenMP.
- My research group is developing parallel codes for astrophysical research, and I would like to be able to contribute.
- MPI is used in a program that is developed in collaboration between our group and several others. I don't intend to write a program from scratch, but it would be valuable to have some experience so that I can understand the code and correct small bugs.
- I would like to improve my scientific code to work with MPI routines.
- I have taught parallel computing with shared-memory machines before (20 years ago), and would now like to learn to use the CSC computers efficiently for larger tasks with MPI.
- I use MPI programs to run atomic simulations and would like to understand the parallelisation better.
- I want to do quantum mechanical calculations, which can be subdivided into many independent small calculations.
- I want to run parallel programs requiring execution on multiple cores (the code takes too long to run on a single core).
- Improve my skills in parallel programming.
- I am using parallel programming for satellite remote sensing algorithms (a lot of data; processing time is important for near-real-time processing).
- I am interested in running quantum mechanical calculations which could be improved with MPI.
- I'm familiar with MPI4Py, but I'd like to learn to use the proper MPI as well.
- To speed up easily parallelisable code, e.g. for computing quantum trajectories for simulating open systems.
- I want to better understand the code that I am using, and optimize it if possible.
- Improve my codes.
- I do not have a specific use case as of now, but I want to learn the usage of this library and the possibilities it offers so that I can support researchers better.
- I've come across MPI several times when looking up references on parallel programming, and it is time I looked into it.

## Fri morning quiz

1. What is MPI?

A. the Message Passing Interface
B. the Miami Police Investigators
C. the Minimal Polynomial Instantiation
D. the Millipede Podiatry Institution
E. a way of doing distributed memory parallel programming

A.xxxxxxxxxxxxxxxxxxxxxxxxxx
B.
C.
D.
E.xxxxxxxxxxxxxxxx

2. How is a parallel MPI program executed?

A. As a set of identical, independent processes
B. Program starts serially, and then spawns and closes threads
C. My MPI programs just crash :-(
D. Each MPI task runs a different program with different source code

A.xxxxxxxxxxxxxxxxxxxDxxxx
B.
C.x
D.

3. To compile and run an MPI program requires

A. special compilers
B. special libraries
C. a special parallel computer
D. a special operating system
E. a launcher program and runtime system

A.xxxxxxxx
B.xxxxxxxxxxxxxxxxxxxxxxx
C.
D.
E.xxxxxxxxxx

4. After initiating an MPI program with "mpiexec -n 4 ./my_mpi_program", what does the call to MPI_Init() do?

A. create the 4 parallel processes
B. start program execution
C. enable the 4 independent programs subsequently to communicate with each other
D. create the 4 parallel threads

A.xxxxxxxxxxxx
B.xx
C.xxxxxxXxxxxxxxx
D.x

5. If you call MPI_Recv and there is no incoming message, what happens?

A. the Recv fails with an error
B. the Recv reports that there is no incoming message
C. the Recv waits until a message arrives (potentially waiting forever)
D. the Recv times out after some system specified delay (e.g. a few minutes)

A.
B.
C.xxxxxxxxxxxxxxxxxxxxx
D.

Is D possible? Thanks!
We may implement (the timeout) outside the MPI library, if desired?
Ok. Yes, of course.
I would say yes?

6. If you call MPI_Send and there is no matching receive, which of the following are possible outcomes?

A. the message disappears
B. the send fails with an error
C. the send waits until a receive is posted (potentially waiting forever)
D. the message is stored and delivered later on (if possible)
E. the send times out after some system specified delay (e.g. a few minutes)
F. the program continues execution regardless of whether the message is received

A.x
B.x
C.xxxxxxxxxxxxxxX
D.xxxxxxxxxxxx
E.
F.xxxxx

7. The MPI receive routine has a parameter "count". What does this mean?

A. the size of the incoming message (in bytes)
B. the size of the incoming message (in items, e.g. integers)
C. the size of the buffer you have reserved for storing the message in bytes
D. the size of the buffer you have reserved for storing the message in items (e.g. integers)

A.x
B.xxxxxxxxxxxx
C.x
D.xxxxxx

Don't you have to reserve the memory space yourself in C, with malloc etc.?

8. What happens if the incoming message is larger than "count"?

A. the receive fails with an error
B. the receive reports zero data received
C. the message writes beyond the end of the available storage
D. only the first "count" items are received

A.xxxxxxx
B.
C.
D.xxxxxxxxxxxxx

I tried this on puhti and got an error

9. What happens if the incoming message (of size "n") is smaller than "count"?

A. the receive fails with an error
B. the receive reports zero data received
C. the first "n" items are received
D. the first "n" items are received and the rest of the storage is zeroed

A.x
B.
C.xxxxxxxxxxxxxxxxxxx
D.

10. How is the actual size of the incoming message reported?

A. the value of "count" in the receive is updated
B. MPI cannot tell you
C. it is stored in the Status parameter
D. via the associated tag

A.
B.
C.xxxxxxxxxxxxxx
D.

(A small sketch showing how to query the received size from the status with `MPI_Get_count` is under "Free discussion" at the end of this note.)

11. Which of the following are possible outputs from this piece of code run on 3 processes:

```
printf("Welcome from rank %d\n", rank);
printf("Goodbye from rank %d\n", rank);
```

A.
Welcome from rank 0
Welcome from rank 1
Welcome from rank 2
Goodbye from rank 0
Goodbye from rank 1
Goodbye from rank 2

B.
Welcome from rank 2
Welcome from rank 1
Goodbye from rank 0
Goodbye from rank 1
Goodbye from rank 2
Welcome from rank 0

C.
Welcome from rank 2
Goodbye from rank 2
Welcome from rank 0
Welcome from rank 1
Goodbye from rank 1
Goodbye from rank 0

D.
Welcome from rank 0
Goodbye from rank 1
Welcome from rank 2
Goodbye from rank 0
Welcome from rank 1
Goodbye from rank 2

A.xxxxxxxxxxxxxxxxx
B.xx
C.xxxxxxxxxxxxxxxxx
D.xxxx

12. Which of the following statements do you agree with regarding this code:

```
for (i=0; i < size; i++) {
    if (rank == i) {
        printf("Hello from rank %d\n", rank);
        j = 10*i;
    }
}
```

A. The for loop ensures the operations are in order: rank 0, then rank 1, ...
B. The for loop ensures the operations are done in parallel across all processes
C. The for loop is entirely redundant
D. The final value of j will be equal to 10*(size-1)

A.x
B.x
C.xxxxxxxxxxxxxxxxxx
D.xxxxx

```
printf("Hello from rank %d\n", rank);
j = 10*rank; /* Shouldn't this be 10*rank? */
```

## Free discussion

Feel free to add any general remarks, tips, tricks, comments etc. here. For questions during the exercise sessions, however, please use RocketChat, as it will be monitored more frequently.
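A small follow-up tip on questions 7-10 of the Friday quiz: the `count` argument of `MPI_Recv` gives the capacity of the receive buffer (in items), while the actual number of items that arrived is recorded in the status object and can be queried with `MPI_Get_count`. Here is a minimal sketch, assuming C and at least two processes (the buffer and message sizes are made up for illustration):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, nrecv;
    int buffer[100];              /* capacity: up to 100 integers */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int data[4] = {1, 2, 3, 4};
        MPI_Send(data, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* "count" = 100 is only the buffer capacity; the message has 4 items */
        MPI_Recv(buffer, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        /* the actual received size is stored in the status */
        MPI_Get_count(&status, MPI_INT, &nrecv);
        printf("received %d integers\n", nrecv);   /* prints 4 */
    }

    MPI_Finalize();
    return 0;
}
```

If the incoming message were instead larger than the stated capacity, the receive would typically fail with a truncation error (`MPI_ERR_TRUNCATE`), which matches what was observed on Puhti in question 8.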