
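PP-HW4
===============================

> 0716078 鄒義杰

## Q1

1. Set `slots=N` for each host in the hostfile.

   ```config
   pp2 slots=1
   pp3 slots=1
   pp4 slots=1
   pp5 slots=1
   pp6 slots=1
   pp7 slots=1
   pp8 slots=1
   pp10 slots=1
   ```

2. Use `MPI_Comm_rank` for retrieving the rank of an MPI process. Use `MPI_Comm_size` for retrieving the total number of processes.

   ```c
   MPI_Comm_size(MPI_COMM_WORLD, &world_size);
   MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
   ```

## Q2

1. `MPI_Send` is blocking: it does not move on to the next instruction until the send buffer is safe to reuse, which for messages too large to be buffered means waiting until the receiver has called `MPI_Recv`. `MPI_Recv` is also blocking: it does not move on to the next instruction until the matching message from the sender's `MPI_Send` has actually been received.
2. ![](https://i.imgur.com/87kvGud.png)

## Q3

1. ![](https://i.imgur.com/SmAz2UR.png)
2. `binary tree reduction` is slightly faster than `linear reduction`, because fewer additions have to be done one after another: the root performs about log2(np) additions instead of np - 1.
3. As np grows, `binary tree reduction` actually becomes slower. I think this is because I call `MPI_Barrier(MPI_COMM_WORLD)` at every level to make sure all the other groups have finished before moving on to the next level.

## Q4

1. ![](https://i.imgur.com/9t3GbZO.png)
2. `MPI_Irecv`
3. The non-blocking and blocking versions appear to perform about the same. I think this is because the non-blocking version still has to wait until every receive has completed before doing the addition, so it ends up close to the blocking version.

## Q5

1. ![](https://i.imgur.com/A4gDo1Q.png)

## Q6

1. ![](https://i.imgur.com/QGFxdKC.png)

## Q7

1. The rows of A are divided evenly among the ranks. Each rank multiplies its share of A by the whole of B, which gives some of the rows of C. Finally, a reduce sums the partial C matrices from all ranks together.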
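
The row partitioning described in Q7 can be sketched roughly as below. This is only a serial simulation under stated assumptions, not the submitted code: it makes no MPI calls, the helper names `row_range`, `partial_matmul`, and `simulated_matmul` are hypothetical, matrices are assumed to be row-major `int` arrays, and leftover rows (when the row count is not divisible by the number of ranks) are assumed to go to the lowest ranks. In a real MPI program the final loop would presumably be replaced by a single `MPI_Reduce` with `MPI_SUM`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: half-open row range [*begin, *end) of A owned by
 * `rank`. Any remainder rows are spread over the lowest ranks. */
void row_range(int n_rows, int world_size, int rank, int *begin, int *end)
{
    assert(world_size > 0 && rank >= 0 && rank < world_size);
    int base = n_rows / world_size;
    int extra = n_rows % world_size;
    *begin = rank * base + (rank < extra ? rank : extra);
    *end = *begin + base + (rank < extra ? 1 : 0);
}

/* What a single rank would compute: its own rows of C = A * B.
 * A is n x m, B is m x l, c_part is n x l; rows owned by other
 * ranks are left at zero so the partial results can be summed. */
void partial_matmul(const int *a, const int *b, int *c_part,
                    int n, int m, int l, int world_size, int rank)
{
    int begin, end;
    row_range(n, world_size, rank, &begin, &end);
    memset(c_part, 0, (size_t)n * l * sizeof(int));
    for (int i = begin; i < end; i++)
        for (int k = 0; k < m; k++)
            for (int j = 0; j < l; j++)
                c_part[i * l + j] += a[i * m + k] * b[k * l + j];
}

/* Serial stand-in for the sum-reduce: run every "rank" in turn and
 * add its partial result into c. `scratch` must hold n * l ints. */
void simulated_matmul(const int *a, const int *b, int *c, int *scratch,
                      int n, int m, int l, int world_size)
{
    memset(c, 0, (size_t)n * l * sizeof(int));
    for (int rank = 0; rank < world_size; rank++) {
        partial_matmul(a, b, scratch, n, m, l, world_size, rank);
        for (int idx = 0; idx < n * l; idx++)
            c[idx] += scratch[idx];
    }
}
```

Because every rank leaves the rows it does not own at zero, summing the partial matrices is equivalent to gathering the rows, which is why a sum-reduce is enough to assemble C.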