# PP_HW4
### Q1: Hello-mpi
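1. How do you control the number of MPI processes on each node? (1 point)
#### ans:
#### The hostfile defines which nodes take part in the mpirun. The number of processes per node is set with slots= on each hostfile line or with the -npernode option of mpirun (as used in Q8).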
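A minimal sketch of such a hostfile, assuming Open MPI's `host slots=N` syntax. The host names `node1` and `node2` are placeholders, not the actual workstation names.

```
# hosts (hypothetical host names)
node1 slots=4
node2 slots=4
```

With this file, `mpirun -np 8 --hostfile hosts ./hello` (where `./hello` stands for the compiled program) would place at most four processes on each node.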
2. Which functions do you use for retrieving the rank of an MPI process and the total number of processes? (1 point)
#### ans:
#### rank: MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
#### total number of processes: MPI_Comm_size(MPI_COMM_WORLD, &world_size);
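3. We use Open MPI for this assignment. What other MPI implementation is commonly used? What is the difference between them? (1 point)
#### ans:
#### MPICH is another commonly used implementation. Both implement the same MPI standard, so they differ in their internals and launcher options rather than in the MPI API itself.
#### MPI is mainly used for compute-intensive applications such as image processing: the master (rank=0) distributes work to each node and collects the results. The Hello World program contains no compute-intensive work.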
### Q2: Block-linear
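1. Why are MPI_Send and MPI_Recv called “blocking” communication? (1 point)
#### ans:
#### MPI_Send returns only once the send buffer can safely be reused. Depending on the message size and the implementation, that means either the message has been copied into an internal buffer or the matching MPI_Recv has been posted and the data transferred. MPI_Recv likewise blocks until the data has arrived in its receive buffer.
2. Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it. (1 point)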


### Q3: Block-tree
1. Measure the performance (execution time) of the code for 2, 4, 8, 16 MPI processes and plot it. (1 point)
#### ans:


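2. How does the performance of binary tree reduction compare to the performance of linear reduction? (2 points)
#### ans:
#### In my measurements, linear reduction was always slightly faster than binary tree reduction.
3. When increasing the number of processes, which approach (linear/tree) is going to perform better? Why? Think about the number of messages and their costs. (3 points)
#### ans:
#### Take 8 processes as an example: linear reduction needs 7 messages to deliver the partial pi values, and the tree needs 7 as well. In the tree, however, some processes must finish a recv before they can send, so although the message count is the same, linear came out slightly faster than binary.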
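The message counts in this answer can be checked with a small cost model in plain C. This is a sketch under a deliberately simple assumption (every message costs one time step, and a process receives one message per step); `reduce_cost`, `linear_cost` and `tree_cost` are hypothetical names, and nothing here is measured.

```c
#include <assert.h>

/* Cost of a reduction to rank 0 under a simple model (an assumption, not a
 * measurement): every message costs one time step, and a process can
 * receive only one message per step. */
typedef struct {
    int messages; /* total point-to-point messages sent */
    int steps;    /* sequential time steps on the critical path */
} reduce_cost;

/* Linear: every rank 1..nprocs-1 sends straight to rank 0, which receives
 * the messages one after another. */
reduce_cost linear_cost(int nprocs) {
    assert(nprocs >= 1);
    reduce_cost c = { nprocs - 1, nprocs - 1 };
    return c;
}

/* Binary tree: in each round, half of the still-active ranks send to a
 * partner. Rounds run in order, but the sends within a round overlap. */
reduce_cost tree_cost(int nprocs) {
    assert(nprocs >= 1);
    reduce_cost c = { 0, 0 };
    int active = nprocs;
    while (active > 1) {
        int senders = active / 2; /* ranks that send in this round */
        c.messages += senders;
        c.steps += 1;
        active -= senders;        /* senders drop out after sending */
    }
    return c;
}
```

For 8 processes both functions report 7 messages, but the linear version needs 7 sequential receive steps at rank 0 while the tree needs 3 rounds. Under this model the tree should gain ground as the process count grows; the model leaves out the wait between rounds described above, which the measurements here suggest matters at small process counts.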

### Q4: Non-block-linear
1. Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it. (1 point)
#### ans:


2. What are the MPI functions for non-blocking communication? (1 point)
#### ans:
#### MPI_Isend() and MPI_Irecv(), completed with MPI_Wait() or MPI_Waitall()
3. How does the performance of non-blocking communication compare to the performance of blocking communication? (3 points)
#### ans:
#### In my measurements the two performed about the same.

### Q5: MPI_Gather
1. Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it. (1 point)


### Q6: Reduce
1. Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it. (1 point)


### Q7: One_side
1. Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it. (1 point)


2. Which approach gives the best performance among the 1.2.1-1.2.6 cases? What is the reason for that? (3 points)
#### ans:
#### In my tests, non-blocking communication was the fastest. I think the reason is that a non-blocking call returns right away and lets the process continue without waiting for the other side to receive the message. It is also possible that other users were running jobs while I was timing, which would have affected the measurements.
3. Which algorithm or algorithms do MPI implementations use for reduction operations? You can research this on the web focusing on one MPI implementation. (1 point)
#### ans:
#### MPI reduction uses rank-order-based deterministic algorithms. https://www.mcs.anl.gov/papers/P4093-0713_1.pdf
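One reason a fixed rank order is useful is that floating-point addition is not associative, so combining the same contributions in a different order can change the result. The sketch below is plain C, not MPI code and not taken from the linked paper; `sum_in_order` is a hypothetical helper, and the `order` array merely stands in for the order in which per-rank values are combined.

```c
#include <assert.h>

/* Sum vals[order[0]], vals[order[1]], ... strictly left to right.
 * `order` models the order in which per-rank contributions are combined. */
double sum_in_order(const double *vals, const int *order, int n) {
    assert(n >= 0);
    /* volatile forces every partial sum to be rounded to double,
     * even on targets that keep intermediates in wider registers */
    volatile double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc = acc + vals[order[i]];
    return acc;
}
```

With the values `{1e16, 1.0, -1e16}`, the order `{0, 1, 2}` loses the `1.0` when it is added to `1e16`, while the order `{0, 2, 1}` cancels the large values first and keeps it. Fixing the combination order by rank makes the outcome repeatable from run to run.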
### Q8:
1. Plot ping-pong time as a function of the message size for cases 1 and 2, respectively. (2 points)
#### ans:
#### case 1: mpicxx ping_pong.c -o ping_pong; mpirun -np 2 -npernode 2 --hostfile hosts ping_pong

#### case 2: mpicxx ping_pong.c -o ping_pong; mpirun -np 2 -npernode 1 --hostfile hosts ping_pong

2. Calculate the bandwidth and latency for cases 1 and 2, respectively. (3 points)
#### ans:
#### case 1:
#### bandwidth = 6780353631.749777 B/s ≈ 6.78 GB/s
#### latency = 5.375391367104946e-05 s ≈ 53.8 * 10^-6 s
#### case 2:
#### bandwidth = 116773102.89540826 B/s ≈ 0.117 GB/s
#### latency = 0.0003346260147482886 s ≈ 335 * 10^-6 s
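One common way to obtain such values is a least-squares fit of the model time = latency + size / bandwidth to the ping-pong measurements: the intercept is the latency and the inverse of the slope is the bandwidth. The sketch below assumes that model; it is not necessarily how the numbers above were computed, and `link_fit` and `fit_link` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double latency_s;     /* intercept: time of a zero-byte message */
    double bandwidth_Bps; /* 1 / slope: bytes per second */
} link_fit;

/* Ordinary least-squares fit of time = latency + size / bandwidth. */
link_fit fit_link(const double *size_bytes, const double *time_s, size_t n) {
    assert(n >= 2);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        sx  += size_bytes[i];
        sy  += time_s[i];
        sxx += size_bytes[i] * size_bytes[i];
        sxy += size_bytes[i] * time_s[i];
    }
    double denom = (double)n * sxx - sx * sx;
    assert(denom != 0.0); /* needs at least two distinct message sizes */
    double slope = ((double)n * sxy - sx * sy) / denom;
    double intercept = (sy - slope * sx) / (double)n;
    link_fit f = { intercept, 1.0 / slope };
    return f;
}
```

Here `size_bytes[i]` is a message size and `time_s[i]` the per-message time the benchmark reports for it. Fitting over many sizes smooths out the noise that a single small or large message would carry.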
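3. For case 2, how do the obtained values of bandwidth and latency compare to the nominal network bandwidth and latency of the NCTU-PP workstations? What are the differences and what could be the explanation of the differences if any? (4 points)
#### ans:
#### Latency can be affected by the following factors:
#### propagation delay: the time a packet spends travelling along the network cable, which depends on how fast the electrical signal moves through the cable. It is the distance divided by the signal speed: for a distance d and a propagation speed s, the propagation delay is d/s.
#### transmission delay: the time the network card takes to put the data onto the cable (or take it off), which depends on the speed of the network hardware (for example, Fast Ethernet transmits at 100 Mbps). For a packet of length L (bits) and a transmission rate R (bits/sec), the transmission delay is L/R.
#### nodal processing delay: the time a router spends processing the packet header, checking for bit errors and looking up the forwarding route.
#### queuing delay: the time a packet waits in a queue when the router cannot send it onto the network right away.
#### Together, these effects make the measured results worse than what we get from ping.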
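The two formulas above, d/s and L/R, can be worked through with a short C sketch. The function names are hypothetical, and the sample values in the note below (a 1500-byte packet, a 100 m cable, a signal speed of 2e8 m/s) are illustrative assumptions, not properties of the workstations.

```c
#include <assert.h>

/* Propagation delay d / s: distance in metres over signal speed in m/s. */
double propagation_delay(double distance_m, double speed_mps) {
    assert(speed_mps > 0.0);
    return distance_m / speed_mps;
}

/* Transmission delay L / R: packet length in bits over link rate in bit/s. */
double transmission_delay(double length_bits, double rate_bps) {
    assert(rate_bps > 0.0);
    return length_bits / rate_bps;
}
```

For a 1500-byte packet on a 100 Mbps link, `transmission_delay(1500 * 8, 100e6)` is about 120 * 10^-6 s, while `propagation_delay(100, 2e8)` for a 100 m cable is only about 0.5 * 10^-6 s. On a short local link the transmission term therefore outweighs the propagation term.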
### Q9:
Describe what approach(es) were used in your MPI matrix multiplication for each data set.
#### ans:
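#### dataset1: Matrix A (6x8) is multiplied by matrix B (8x4). The master (rank=0) sends matrix B and the rows to be computed to each process. For example, the process with rank=1 receives rows 2 and 3 of A together with matrix B and returns a 2x4 matrix when it finishes. The master receives the small matrix computed by each process and assembles the final answer.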
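The row split can be sketched in plain C without any MPI calls. This is an illustration, not the submitted code: it assumes 0-indexed rows, integer elements, row-major storage and 3 processes with rank 0 also computing a block, and `row_range` and `multiply_rows` are hypothetical helpers.

```c
#include <assert.h>

/* Rows [*first, *first + *count) of A assigned to `rank`. Any remainder
 * rows go to the lowest ranks, one extra row each. */
void row_range(int rank, int nprocs, int nrows, int *first, int *count) {
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && nrows >= 0);
    int base = nrows / nprocs;
    int extra = nrows % nprocs;
    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);
}

/* Multiply `count` rows of A (row-major, n columns), starting at row
 * `first`, by B (n x m, row-major). The result is a count x m block. */
void multiply_rows(const int *a, const int *b, int *c_block,
                   int first, int count, int n, int m) {
    for (int i = 0; i < count; i++) {
        for (int j = 0; j < m; j++) {
            int sum = 0;
            for (int k = 0; k < n; k++)
                sum += a[(first + i) * n + k] * b[k * m + j];
            c_block[i * m + j] = sum;
        }
    }
}
```

With 6 rows and 3 processes, `row_range(1, 3, 6, &first, &count)` gives `first = 2` and `count = 2`, matching the rank=1 example. Each rank would then call `multiply_rows` on its own block and send the resulting count x m matrix back to the master.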