Parallel Programming HW4 @NYCU, 2022 Fall
===
###### tags: `2022_PP_NYCU`

<!--
| Student ID | Name |
| -------- | -------- |
| 310552060 | 湯智惟 |
-->

## Q1

### How do you control the number of MPI processes on each node?

Set **slots** in the hostfile: append `slots=n` after each hostname, where `n` is the number of processes that node is allowed to run. `mpirun` (e.g. `mpirun -np 8 --hostfile hosts ./prog`) then fills these slots. For instance:

```
pp2 slots=4
pp3 slots=4
```

### Which functions do you use for retrieving the rank of an MPI process and the total number of processes?

Retrieve the rank of an MPI process:

```c
MPI_Comm_rank(MPI_Comm comm, int *rank)
```

Retrieve the total number of processes:

```c
MPI_Comm_size(MPI_Comm comm, int *size)
```

---

## Q2

### Why are MPI_Send and MPI_Recv called “blocking” communication?

Because they do not return until the communication has completed locally: `MPI_Send` returns only once the send buffer can safely be reused (for large messages this effectively means the matching receive has started), and `MPI_Recv` returns only once the message has actually arrived in the receive buffer. Until then, the calling process cannot continue with the rest of the program.

### Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.

`pi_block_linear.cc`

| Processes | 2 | 4 | 8 | 12 | 16 |
| -------- | -------- | -------- | -------- | -------- | -------- |
| Time (s) | 9.969061 | 5.510608 | 3.981320 | 2.941443 | 1.908585 |

![](https://i.imgur.com/s1ItFRI.png)

---

## Q3

### Measure the performance (execution time) of the code for 2, 4, 8, 16 MPI processes and plot it.

`pi_block_tree.cc`

| Processes | 2 | 4 | 8 | 16 |
| -------- | -------- | -------- | -------- | -------- |
| Time (s) | 9.631155 | 5.942813 | 3.754705 | 1.443440 |

![](https://i.imgur.com/uvobDyx.png)

### How does the performance of binary tree reduction compare to the performance of linear reduction?

![](https://i.imgur.com/3COZjNv.png)

At these process counts, linear reduction and binary tree reduction perform about the same, with binary tree reduction slightly ahead.

### Increasing the number of processes, which approach (linear/tree) is going to perform better? Why? Think about the number of messages and their costs.

With more processes, **binary tree reduction** will perform better. Both approaches send P − 1 messages in total, but binary tree reduction spreads the additions across the processes, and the sends within each tree level happen in parallel, so its critical path is only ⌈log₂ P⌉ message steps. Linear reduction instead waits for every process to finish and then has a single process receive and accumulate the P − 1 partial sums one by one, an O(P) critical path. The tree therefore scales better.

---

## Q4

### Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it. (1 point)

`pi_nonblock_linear.cc`

| Processes | 2 | 4 | 8 | 12 | 16 |
| -------- | -------- | -------- | -------- | -------- | -------- |
| Time (s) | 9.332650 | 4.995334 | 3.136756 | 2.651476 | 1.936272 |

![](https://i.imgur.com/CavsFnq.png)

### What are the MPI functions for non-blocking communication?

```c
MPI_Isend(start, count, datatype, dest, tag, comm, request)
MPI_Irecv(start, count, datatype, src, tag, comm, request)
MPI_Wait(request, status)
MPI_Test(request, flag, status)
```
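To show how these calls fit together, here is a minimal sketch of a non-blocking linear reduction for the Monte Carlo π estimate. This is an illustration of the pattern, not the submitted `pi_nonblock_linear.cc`; the helper `toss()` and the workload size are made up. Rank 0 posts all receives up front with `MPI_Irecv`, stays free to compute, and synchronizes only once in `MPI_Waitall`:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count darts that land inside the unit circle. */
static long long toss(long long n, unsigned int seed) {
    long long hits = 0;
    for (long long i = 0; i < n; i++) {
        double x = (double)rand_r(&seed) / RAND_MAX * 2.0 - 1.0;
        double y = (double)rand_r(&seed) / RAND_MAX * 2.0 - 1.0;
        if (x * x + y * y <= 1.0) hits++;
    }
    return hits;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long long tosses = 100000000LL;   /* example workload */
    long long local = toss(tosses / size, (unsigned int)rank + 1);

    if (rank == 0) {
        long long *counts = malloc((size - 1) * sizeof *counts);
        MPI_Request *reqs = malloc((size - 1) * sizeof *reqs);
        /* Post every receive first -- none of these calls blocks. */
        for (int src = 1; src < size; src++)
            MPI_Irecv(&counts[src - 1], 1, MPI_LONG_LONG, src, 0,
                      MPI_COMM_WORLD, &reqs[src - 1]);
        /* Rank 0 could overlap more work here; it blocks only once. */
        MPI_Waitall(size - 1, reqs, MPI_STATUSES_IGNORE);
        for (int i = 0; i < size - 1; i++) local += counts[i];
        printf("pi ~= %f\n",
               4.0 * (double)local / (double)((tosses / size) * size));
        free(counts);
        free(reqs);
    } else {
        MPI_Send(&local, 1, MPI_LONG_LONG, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```

Note that the workers still use a blocking `MPI_Send`; only the master's receives are non-blocking, which is where the overlap comes from.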
### How does the performance of non-blocking communication compare to the performance of blocking communication?

![](https://i.imgur.com/px6UwFV.png)

Blocking and non-blocking give similar results, with non-blocking slightly ahead: a process does not have to sit in the communication call waiting for the other side, and can do other computation while the messages are in flight.

---

## Q5

### Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.

`pi_gather.cc`

| Processes | 2 | 4 | 8 | 12 | 16 |
| -------- | -------- | -------- | -------- | -------- | -------- |
| Time (s) | 9.236979 | 5.001171 | 2.567303 | 1.785666 | 1.357843 |

![](https://i.imgur.com/oSsL2az.png)

---

## Q6

### Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.

`MPI_Reduce()`

| Processes | 2 | 4 | 8 | 12 | 16 |
| -------- | -------- | -------- | -------- | -------- | -------- |
| Time (s) | 9.233236 | 5.236464 | 2.508783 | 1.921540 | 1.327587 |

![](https://i.imgur.com/G5RM5ng.png)

---

## Q7

### Describe what approach(es) were used in your MPI matrix multiplication for each data set.

![](https://i.imgur.com/HwBzFSD.png)

The processes are first split into a **master** (rank = 0) and **workers** (rank > 0).

The **master** divides the rows of matrix A evenly among the workers, combines the returned results, and prints the final matrix. To each worker it sends:
- how many rows of matrix A that worker gets
- the dimensions of matrices B and C
- matrix B and matrix C
- the offset (the block's position within A/C)

Each **worker** computes its own part and sends the partial result matrix back to the master (see the sketch below).

All transfers use MPI's blocking `MPI_Send` and `MPI_Recv`.
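The scheme above, condensed into a sketch. This is an illustration, not the submitted code: the function name `mpi_matmul`, the tags, and exactly which pieces travel in each message are assumptions (here the master ships the relevant rows of A plus all of B, and receives the corresponding rows of C back). It assumes at least two processes and that every rank already knows the dimensions:

```c
#include <mpi.h>
#include <stdlib.h>

enum { TAG_JOB = 1, TAG_RESULT = 2 };

/* Multiply the n x m matrix A by the m x l matrix B into the n x l
 * matrix C (row-major ints). Rank 0 owns A/B/C; workers pass NULL. */
void mpi_matmul(const int *a, const int *b, int *c,
                int n, int m, int l, int rank, int size) {
    int workers = size - 1;
    if (rank == 0) {
        int offset = 0;
        for (int w = 1; w <= workers; w++) {   /* scatter rows of A */
            int rows = n / workers + (w <= n % workers ? 1 : 0);
            MPI_Send(&offset, 1, MPI_INT, w, TAG_JOB, MPI_COMM_WORLD);
            MPI_Send(&rows, 1, MPI_INT, w, TAG_JOB, MPI_COMM_WORLD);
            MPI_Send(a + offset * m, rows * m, MPI_INT, w, TAG_JOB,
                     MPI_COMM_WORLD);
            MPI_Send(b, m * l, MPI_INT, w, TAG_JOB, MPI_COMM_WORLD);
            offset += rows;
        }
        for (int w = 1; w <= workers; w++) {   /* gather partial C */
            int off, rows;
            MPI_Recv(&off, 1, MPI_INT, w, TAG_RESULT, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Recv(&rows, 1, MPI_INT, w, TAG_RESULT, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Recv(c + off * l, rows * l, MPI_INT, w, TAG_RESULT,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    } else {
        int off, rows;
        MPI_Recv(&off, 1, MPI_INT, 0, TAG_JOB, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Recv(&rows, 1, MPI_INT, 0, TAG_JOB, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        int *ablk = malloc((size_t)rows * m * sizeof(int));
        int *bbuf = malloc((size_t)m * l * sizeof(int));
        int *cblk = calloc((size_t)rows * l, sizeof(int));
        MPI_Recv(ablk, rows * m, MPI_INT, 0, TAG_JOB, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Recv(bbuf, m * l, MPI_INT, 0, TAG_JOB, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        for (int i = 0; i < rows; i++)          /* local multiply */
            for (int k = 0; k < m; k++)
                for (int j = 0; j < l; j++)
                    cblk[i * l + j] += ablk[i * m + k] * bbuf[k * l + j];
        MPI_Send(&off, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
        MPI_Send(&rows, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
        MPI_Send(cblk, rows * l, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
        free(ablk); free(bbuf); free(cblk);
    }
}
```

Sending B whole to every worker keeps the bookkeeping simple at the cost of extra traffic; the same row-partitioning pattern could also be expressed with collectives such as `MPI_Scatterv`/`MPI_Gatherv` instead of the per-worker sends.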