# Parallel Programming HW3
###### tags: `Parallel Programming`
## Part1
### Q1
1. How do you control the number of MPI processes on each node?
**Ans:**
In the hostfile, append slots=N to a hostname; the MPI runtime will then schedule at most N processes on that node. For example:
```
pp2 slots=2
```
2. Which functions do you use for retrieving the rank of an MPI process and the total number of processes?
**Ans:**
* *MPI_Comm_rank(MPI_COMM_WORLD, &world_rank)* returns the rank of the calling process.
* *MPI_Comm_size(MPI_COMM_WORLD, &world_size)* returns the total number of processes.
### Q2
1. Why are MPI_Send and MPI_Recv called “blocking” communication?
**Ans:**
MPI_Send does not return until the send buffer can safely be reused, and MPI_Recv does not return until the full message has arrived in the receive buffer. The calling process is held at the call and only continues executing after the function returns, which is why the communication is called "blocking".
2. Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.
**Ans:**






### Q3
1. Measure the performance (execution time) of the code for 2, 4, 8, 16 MPI processes and plot it.
**Ans:**






2. How does the performance of binary tree reduction compare to the performance of linear reduction?
**Ans:**
The figure below plots the linear execution time minus the tree execution time. In my measurements, the linear reduction is faster than the tree reduction.

3. Increasing the number of processes, which approach (linear/tree) is going to perform better? Why? Think about the number of messages and their costs.
**Ans:**
In my case, linear performs better. In the tree approach, intermediate partial sums must be sent, received, and accumulated at every level of the tree, so extra rounds of message passing (and their latency) are added on top of the work the linear approach already does, raising the total cost.
### Q4
1. Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.
**Ans:**






2. What are the MPI functions for non-blocking communication?
**Ans:**
* MPI_Isend
* MPI_Irecv
* MPI_Wait
* MPI_Waitany
* MPI_Test
* MPI_Testany
3. How does the performance of non-blocking communication compare to the performance of blocking communication?
**Ans:**
The figure below plots the non-blocking execution time minus the blocking execution time. In my case, blocking communication is faster than non-blocking.

### Q5
1. Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.
**Ans:**






### Q6
1. Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.
**Ans:**






## Part2
### Q7
1. Describe what approach(es) were used in your MPI matrix multiplication for each data set.
**Ans:**
* Divide matrix A into N row blocks, where N is the number of processes.
* Broadcast matrix B to every process with MPI_Bcast.
* Each process multiplies only its assigned block of A with B.