---
tags: parallel_programming
---
# Parallel programming HW4
## Q1.1: How do you control the number of MPI processes on each node?
- `mpirun` launches processes according to the hostfile: it assigns processes to the machines listed in the hostfile, one line (slot) at a time.
```
machine_1
machine_1
machine_2
```
- If we run 8 copies of the program with the hostfile above, process 1 and process 2 are assigned to machine_1 and process 3 to machine_2; the assignment then wraps around, so process 4 goes to machine_1 again, and so on.
> https://stackoverflow.com/questions/14411763/mapping-mpi-processes-to-particular-nodes
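The wrap-around assignment described above can be sketched in plain Python (no MPI involved; the hostnames are the hypothetical ones from the hostfile):

```python
# Simulate how mpirun walks the hostfile slot by slot,
# wrapping around when it runs out of lines.
hostfile = ["machine_1", "machine_1", "machine_2"]  # one slot per line

def assign(n_procs, hosts):
    """Assign process i to hostfile line i modulo the number of slots."""
    return [hosts[i % len(hosts)] for i in range(n_procs)]

# processes 1-2 -> machine_1, process 3 -> machine_2,
# process 4 wraps back to machine_1, ...
print(assign(8, hostfile))
```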
## Q1.2: Which functions do you use for retrieving the rank of an MPI process and the total number of processes?
- `MPI_Comm_size` is used to get the total number of processes.
- `MPI_Comm_rank` is used to get the rank of the current process.
## Q2.1: Why are MPI_Send and MPI_Recv called “blocking” communication?
- They are called blocking because the call does not return until the operation is locally complete: `MPI_Send` blocks until the send buffer can be safely reused, and `MPI_Recv` blocks until the message has arrived in the receive buffer. The calling process is therefore suspended until the communication finishes.
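As an analogy (plain Python, not MPI), a blocking receive behaves like `queue.Queue.get()`, which suspends the caller until data actually arrives:

```python
import queue
import threading
import time

channel = queue.Queue()

def worker():
    time.sleep(0.1)      # pretend to compute a partial result
    channel.put(42)      # analogous to MPI_Send: hand the result over

threading.Thread(target=worker).start()
result = channel.get()   # analogous to MPI_Recv: blocks until data arrives
print(result)            # only reached after the worker has "sent"
```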
## Q2.2: Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.
![](https://i.imgur.com/3rqvkQB.png "linear")
| Number of processes | 2 | 4 | 8 | 12 | 16 |
|:-------------------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| Execution time | 9.908397 | 5.456350 | 2.688769 | 1.824668 | 1.364209 |
## Q3.1: Measure the performance (execution time) of the code for 2, 4, 8, 16 MPI processes and plot it.
![](https://i.imgur.com/IfHqj5u.png "tree")
| Number of processes | 2 | 4 | 8 | 16 |
|:-------------------:|:--------:|:--------:|:--------:|:--------:|
| Execution time | 9.839091 | 5.292457 | 2.740805 | 1.832486 |
## Q3.2: How does the performance of binary tree reduction compare to the performance of linear reduction?
- Binary tree reduction gives only a marginal improvement over linear reduction at these process counts.
## Q3.3: Increasing the number of processes, which approach (linear/tree) is going to perform better? Why? Think about the number of messages and their costs.
- The tree approach is expected to perform better as the number of processes grows.
- Both approaches transfer the same total number of messages ($P-1$), but in the linear approach the root receives them one after another, taking $P-1$ sequential steps, whereas the tree approach overlaps transfers so that only $\lceil \log_2 P \rceil$ communication rounds are needed. Since a message transfer costs much more than the CPU additions, fewer sequential rounds wins at large $P$.
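To make the message-cost comparison concrete, here is a small plain-Python sketch (no MPI) counting the sequential communication rounds each reduction approach needs for $P$ processes:

```python
import math

def linear_rounds(p):
    # The root receives the p-1 partial results one after another.
    return p - 1

def tree_rounds(p):
    # Ranks pair up each round, so the number of live ranks halves per step.
    return math.ceil(math.log2(p))

for p in (2, 4, 8, 16, 64):
    print(p, linear_rounds(p), tree_rounds(p))
```

The gap is small at 2-16 processes (which matches the measurements above), but widens quickly: at 64 processes, linear needs 63 sequential steps versus 6 rounds for the tree.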
## Q4.1: Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.
![](https://i.imgur.com/Xvsybdx.png "nonblocking")
| Number of processes | 2 | 4 | 8 | 12 | 16 |
|:-------------------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| Execution time | 9.797364 | 5.305192 | 2.683531 | 2.438963 | 1.862100 |
## Q4.2: What are the MPI functions for non-blocking communication?
- send: `MPI_Isend`
- receive: `MPI_Irecv`
- completion: `MPI_Wait` / `MPI_Test` (to wait for, or poll, a pending request)
## Q4.3: How does the performance of non-blocking communication compare to the performance of blocking communication?
- Non-blocking communication performs slightly better at small process counts, but falls behind the blocking version as the number of processes increases (in these measurements, from 12 processes on).
## Q5: Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.
![](https://i.imgur.com/oqxj6qc.png "gather")
| Number of processes | 2 | 4 | 8 | 12 | 16 |
|:-------------------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| Execution time | 10.049845 | 5.587510 | 3.200731 | 2.257737 | 2.097789 |
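Semantically, `MPI_Gather` collects one buffer from every rank and leaves the root with the buffers concatenated in rank order. A plain-Python sketch with hypothetical data:

```python
# local_buffers[r] models what rank r passes as its send buffer;
# the root ends up with the buffers concatenated in rank order.
def gather(local_buffers):
    return [x for buf in local_buffers for x in buf]

print(gather([[0, 1], [2, 3], [4, 5]]))  # [0, 1, 2, 3, 4, 5]
```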
## Q6: Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it. (reduce)
![](https://i.imgur.com/8VXuo47.png "reduce")
| Number of processes | 2 | 4 | 8 | 12 | 16 |
|:-------------------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| Execution time | 10.105532 | 5.587154 | 2.961666 | 2.046166 | 2.500450 |
## Q7: Describe what approach(es) were used in your MPI matrix multiplication for each data set.
1. Node 0 sends each node its block of rows of matrix A according to the node's MPI rank, and sends the whole matrix B to every node.
2. Each node computes its partial result.
> node `rank` computes rows [$rank*\dfrac{n}{numnodes},(rank+1)*\dfrac{n}{numnodes}$) of the product
3. All nodes send their partial results back to `node0`, which outputs the final matrix.
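The row partitioning above can be sketched in plain Python (no MPI; the loop over ranks stands in for the parallel nodes, and the partial results are concatenated in rank order, as `node0` would do after receiving them):

```python
def matmul_rows(A, B, row_start, row_end):
    """Multiply only rows [row_start, row_end) of A by the full matrix B."""
    cols = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(cols)]
            for i in range(row_start, row_end)]

def simulate(A, B, numnodes):
    n = len(A)
    step = n // numnodes          # assumes numnodes divides n evenly
    C = []
    for rank in range(numnodes):  # each "node" works on its own row block
        C += matmul_rows(A, B, rank * step, (rank + 1) * step)
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(simulate(A, B, 2))          # [[19, 22], [43, 50]], same as serial
```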