Parallel Programming HW4 @NYCU, 2022 Fall
===
###### tags: `2022_PP_NYCU`

<!--
| Student ID | Name |
| -------- | -------- |
| 310552060 | 湯智惟 |
-->

## Q1

### How do you control the number of MPI processes on each node?

Set **slots** in the hostfile: append `slots=n` after each hostname, where `n` is the number of processes that node is allowed to run. `mpirun` (e.g. `mpirun -np 8 --hostfile hosts ./prog`) then fills these slots. For instance:

```
pp2 slots=4
pp3 slots=4
```

### Which functions do you use for retrieving the rank of an MPI process and the total number of processes?

Retrieve the rank of an MPI process:

```c
MPI_Comm_rank(MPI_Comm comm, int *rank)
```

Retrieve the total number of processes:

```c
MPI_Comm_size(MPI_Comm comm, int *size)
```

---

## Q2

### Why are MPI_Send and MPI_Recv called “blocking” communication?

Because they do not return until the communication has completed locally: `MPI_Send` returns only once the send buffer can safely be reused (for large messages this effectively means the matching receive has started), and `MPI_Recv` returns only once the message has actually arrived in the receive buffer. Until then, the calling process cannot continue with the rest of the program.

### Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.

`pi_block_linear.cc`

| Processes | 2 | 4 | 8 | 12 | 16 |
| -------- | -------- | -------- | -------- | -------- | -------- |
| Time (s) | 9.969061 | 5.510608 | 3.981320 | 2.941443 | 1.908585 |

![](https://i.imgur.com/s1ItFRI.png)

---

## Q3

### Measure the performance (execution time) of the code for 2, 4, 8, 16 MPI processes and plot it.

`pi_block_tree.cc`

| Processes | 2 | 4 | 8 | 16 |
| -------- | -------- | -------- | -------- | -------- |
| Time (s) | 9.631155 | 5.942813 | 3.754705 | 1.443440 |

![](https://i.imgur.com/uvobDyx.png)

### How does the performance of binary tree reduction compare to the performance of linear reduction?

![](https://i.imgur.com/3COZjNv.png)

At these process counts, linear reduction and binary tree reduction perform about the same, with binary tree reduction slightly ahead.

### Increasing the number of processes, which approach (linear/tree) is going to perform better? Why? Think about the number of messages and their costs.

With more processes, **binary tree reduction** will perform better. Both approaches send P − 1 messages in total, but binary tree reduction spreads the additions across the processes, and the sends within each tree level happen in parallel, so its critical path is only ⌈log₂ P⌉ message steps. Linear reduction instead waits for every process to finish and then has a single process receive and accumulate the P − 1 partial sums one by one, an O(P) critical path. The tree therefore scales better.

---

## Q4

### Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it. (1 point)

`pi_nonblock_linear.cc`

| Processes | 2 | 4 | 8 | 12 | 16 |
| -------- | -------- | -------- | -------- | -------- | -------- |
| Time (s) | 9.332650 | 4.995334 | 3.136756 | 2.651476 | 1.936272 |

![](https://i.imgur.com/CavsFnq.png)

### What are the MPI functions for non-blocking communication?

```c
MPI_Isend(start, count, datatype, dest, tag, comm, request)
MPI_Irecv(start, count, datatype, src, tag, comm, request)
MPI_Wait(request, status)
MPI_Test(request, flag, status)
```
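To show how these calls fit together, here is a minimal sketch of a non-blocking linear reduction for the Monte Carlo π estimate. This is an illustration of the pattern, not the submitted `pi_nonblock_linear.cc`; the helper `toss()` and the workload size are made up. Rank 0 posts all receives up front with `MPI_Irecv`, stays free to compute, and synchronizes only once in `MPI_Waitall`:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count darts that land inside the unit circle. */
static long long toss(long long n, unsigned int seed) {
    long long hits = 0;
    for (long long i = 0; i < n; i++) {
        double x = (double)rand_r(&seed) / RAND_MAX * 2.0 - 1.0;
        double y = (double)rand_r(&seed) / RAND_MAX * 2.0 - 1.0;
        if (x * x + y * y <= 1.0) hits++;
    }
    return hits;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long long tosses = 100000000LL;   /* example workload */
    long long local = toss(tosses / size, (unsigned int)rank + 1);

    if (rank == 0) {
        long long *counts = malloc((size - 1) * sizeof *counts);
        MPI_Request *reqs = malloc((size - 1) * sizeof *reqs);
        /* Post every receive first -- none of these calls blocks. */
        for (int src = 1; src < size; src++)
            MPI_Irecv(&counts[src - 1], 1, MPI_LONG_LONG, src, 0,
                      MPI_COMM_WORLD, &reqs[src - 1]);
        /* Rank 0 could overlap more work here; it blocks only once. */
        MPI_Waitall(size - 1, reqs, MPI_STATUSES_IGNORE);
        for (int i = 0; i < size - 1; i++) local += counts[i];
        printf("pi ~= %f\n",
               4.0 * (double)local / (double)((tosses / size) * size));
        free(counts);
        free(reqs);
    } else {
        MPI_Send(&local, 1, MPI_LONG_LONG, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```

Note that the workers still use a blocking `MPI_Send`; only the master's receives are non-blocking, which is where the overlap comes from.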
### How does the performance of non-blocking communication compare to the performance of blocking communication?

![](https://i.imgur.com/px6UwFV.png)

Blocking and non-blocking give similar results, with non-blocking slightly ahead: a process does not have to sit in the communication call waiting for the other side, and can do other computation while the messages are in flight.

---

## Q5

### Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.

`pi_gather.cc`

| Processes | 2 | 4 | 8 | 12 | 16 |
| -------- | -------- | -------- | -------- | -------- | -------- |
| Time (s) | 9.236979 | 5.001171 | 2.567303 | 1.785666 | 1.357843 |

![](https://i.imgur.com/oSsL2az.png)

---

## Q6

### Measure the performance (execution time) of the code for 2, 4, 8, 12, 16 MPI processes and plot it.

`MPI_Reduce()`

| Processes | 2 | 4 | 8 | 12 | 16 |
| -------- | -------- | -------- | -------- | -------- | -------- |
| Time (s) | 9.233236 | 5.236464 | 2.508783 | 1.921540 | 1.327587 |

![](https://i.imgur.com/G5RM5ng.png)

---

## Q7

### Describe what approach(es) were used in your MPI matrix multiplication for each data set.

![](https://i.imgur.com/HwBzFSD.png)

The processes are first split into a **master** (rank = 0) and **workers** (rank > 0).

The **master** divides the rows of matrix A evenly among the workers, combines the returned results, and prints the final matrix. To each worker it sends:
- how many rows of matrix A that worker gets
- the dimensions of matrices B and C
- matrix B and matrix C
- the offset (the block's position within A/C)

Each **worker** computes its own part and sends the partial result matrix back to the master (see the sketch below).

All transfers use MPI's blocking `MPI_Send` and `MPI_Recv`.
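The scheme above, condensed into a sketch. This is an illustration, not the submitted code: the function name `mpi_matmul`, the tags, and exactly which pieces travel in each message are assumptions (here the master ships the relevant rows of A plus all of B, and receives the corresponding rows of C back). It assumes at least two processes and that every rank already knows the dimensions:

```c
#include <mpi.h>
#include <stdlib.h>

enum { TAG_JOB = 1, TAG_RESULT = 2 };

/* Multiply the n x m matrix A by the m x l matrix B into the n x l
 * matrix C (row-major ints). Rank 0 owns A/B/C; workers pass NULL. */
void mpi_matmul(const int *a, const int *b, int *c,
                int n, int m, int l, int rank, int size) {
    int workers = size - 1;
    if (rank == 0) {
        int offset = 0;
        for (int w = 1; w <= workers; w++) {   /* scatter rows of A */
            int rows = n / workers + (w <= n % workers ? 1 : 0);
            MPI_Send(&offset, 1, MPI_INT, w, TAG_JOB, MPI_COMM_WORLD);
            MPI_Send(&rows, 1, MPI_INT, w, TAG_JOB, MPI_COMM_WORLD);
            MPI_Send(a + offset * m, rows * m, MPI_INT, w, TAG_JOB,
                     MPI_COMM_WORLD);
            MPI_Send(b, m * l, MPI_INT, w, TAG_JOB, MPI_COMM_WORLD);
            offset += rows;
        }
        for (int w = 1; w <= workers; w++) {   /* gather partial C */
            int off, rows;
            MPI_Recv(&off, 1, MPI_INT, w, TAG_RESULT, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Recv(&rows, 1, MPI_INT, w, TAG_RESULT, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Recv(c + off * l, rows * l, MPI_INT, w, TAG_RESULT,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    } else {
        int off, rows;
        MPI_Recv(&off, 1, MPI_INT, 0, TAG_JOB, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Recv(&rows, 1, MPI_INT, 0, TAG_JOB, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        int *ablk = malloc((size_t)rows * m * sizeof(int));
        int *bbuf = malloc((size_t)m * l * sizeof(int));
        int *cblk = calloc((size_t)rows * l, sizeof(int));
        MPI_Recv(ablk, rows * m, MPI_INT, 0, TAG_JOB, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Recv(bbuf, m * l, MPI_INT, 0, TAG_JOB, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        for (int i = 0; i < rows; i++)          /* local multiply */
            for (int k = 0; k < m; k++)
                for (int j = 0; j < l; j++)
                    cblk[i * l + j] += ablk[i * m + k] * bbuf[k * l + j];
        MPI_Send(&off, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
        MPI_Send(&rows, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
        MPI_Send(cblk, rows * l, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
        free(ablk); free(bbuf); free(cblk);
    }
}
```

Sending B whole to every worker keeps the bookkeeping simple at the cost of extra traffic; the same row-partitioning pattern could also be expressed with collectives such as `MPI_Scatterv`/`MPI_Gatherv` instead of the per-worker sends.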