Parallel Programming HW2 @NYCU, 2022 Fall
===
###### tags: `2022_PP_NYCU`
<!-- | Student ID | Name |
| -------- | -------- |
| 310552060 |湯智惟 | -->
## Q1
### In your write-up, produce a graph of speedup compared to the reference sequential implementation as a function of the number of threads used FOR VIEW 1.
view1

```
❯ ./mandelbrot -t 1
[mandelbrot serial]: [668.115] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [668.263] ms
Wrote image file mandelbrot-thread.ppm
(1.00x speedup from 1 threads)
❯ ./mandelbrot -t 2
[mandelbrot serial]: [519.932] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [265.877] ms
Wrote image file mandelbrot-thread.ppm
(1.96x speedup from 2 threads)
❯ ./mandelbrot -t 3
[mandelbrot serial]: [798.260] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [492.891] ms
Wrote image file mandelbrot-thread.ppm
(1.62x speedup from 3 threads)
❯ ./mandelbrot -t 4
[mandelbrot serial]: [794.566] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [334.012] ms
Wrote image file mandelbrot-thread.ppm
(2.38x speedup from 4 threads)
```
view2
```
❯ ./mandelbrot -t 1 -v 2
[mandelbrot serial]: [488.544] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [489.154] ms
Wrote image file mandelbrot-thread.ppm
(1.00x speedup from 1 threads)
❯ ./mandelbrot -t 2 -v 2
[mandelbrot serial]: [486.680] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [284.448] ms
Wrote image file mandelbrot-thread.ppm
(1.71x speedup from 2 threads)
❯ ./mandelbrot -t 3 -v 2
[mandelbrot serial]: [480.144] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [217.992] ms
Wrote image file mandelbrot-thread.ppm
(2.20x speedup from 3 threads)
❯ ./mandelbrot -t 4 -v 2
[mandelbrot serial]: [486.658] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [183.308] ms
Wrote image file mandelbrot-thread.ppm
(2.65x speedup from 4 threads)
```
### Is speedup linear in the number of threads used?
:::info
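view1: speedup is **not linear** in the number of threads (1.96x with 2 threads, but only 1.62x with 3 and 2.38x with 4).
view2: speedup grows **roughly linearly**, increasing steadily with each added thread (1.71x, 2.20x, 2.65x), although it stays below the ideal.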
:::
### In your writeup hypothesize why this is (or is not) the case?
:::info
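- The view1 image has very distinct black and white regions. White pixels take many more iterations to compute than black ones, so threads assigned to different parts of the image end up with very different computation times, and the slowest thread determines the overall time.
- The view2 image has no large color contrast, so every part of the image costs roughly the same to compute. The work is spread evenly across threads, and the speedup grows nearly linearly with the thread count.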
:::
---
## Q2 How do your measurements explain the speedup graph you previously created?
:::info
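In view1 the work is unevenly distributed across threads: the white regions of the image require far more computation, and because the image is split into contiguous bands, some threads receive much more of the white region than others. The threaded run cannot finish until the slowest thread is done, so the speedup does not grow linearly with the number of threads.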
:::
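view1 experimental results (per-thread times are in seconds; each group of lines repeats because the threaded version is run five times):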
```
$ ./mandelbrot -t 2
[mandelbrot serial]: [459.212] ms
Wrote image file mandelbrot-serial.ppm
thread0 takes [0.234840]
thread1 takes [0.236007]
thread0 takes [0.236044]
thread1 takes [0.237290]
thread0 takes [0.234665]
thread1 takes [0.235717]
thread0 takes [0.235215]
thread1 takes [0.236302]
thread0 takes [0.235817]
thread1 takes [0.237080]
[mandelbrot thread]: [235.903] ms
Wrote image file mandelbrot-thread.ppm
(1.95x speedup from 2 threads)
$ ./mandelbrot -t 3
[mandelbrot serial]: [458.592] ms
Wrote image file mandelbrot-serial.ppm
thread0 takes [0.093183]
thread2 takes [0.093638]
thread1 takes [0.282862]
thread0 takes [0.093044]
thread2 takes [0.093636]
thread1 takes [0.282560]
thread0 takes [0.093067]
thread2 takes [0.093692]
thread1 takes [0.282518]
thread0 takes [0.093067]
thread2 takes [0.093701]
thread1 takes [0.282682]
thread0 takes [0.092993]
thread2 takes [0.093618]
thread1 takes [0.282442]
[mandelbrot thread]: [282.581] ms
Wrote image file mandelbrot-thread.ppm
(1.62x speedup from 3 threads)
$ ./mandelbrot -t 4
[mandelbrot serial]: [459.216] ms
Wrote image file mandelbrot-serial.ppm
thread0 takes [0.045263]
thread3 takes [0.045677]
thread1 takes [0.192079]
thread2 takes [0.192883]
thread0 takes [0.045370]
thread3 takes [0.045800]
thread1 takes [0.192164]
thread2 takes [0.192752]
thread0 takes [0.045333]
thread3 takes [0.045706]
thread1 takes [0.191870]
thread2 takes [0.192589]
thread0 takes [0.045332]
thread3 takes [0.045726]
thread1 takes [0.191945]
thread2 takes [0.192660]
thread0 takes [0.045309]
thread3 takes [0.045751]
thread1 takes [0.192286]
thread2 takes [0.192915]
[mandelbrot thread]: [192.802] ms
Wrote image file mandelbrot-thread.ppm
(2.38x speedup from 4 threads)
```
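How the image is divided among threads:

- From the measurements:
    - With 3 threads, thread 1 takes about 0.283 s, roughly three times as long as threads 0 and 2 (about 0.093 s each).
    - With 4 threads, threads 1 and 2 take about 0.192 s each, more than four times as long as threads 0 and 3 (about 0.045 s each).
- From the way the image is split:
    - With 3 threads, thread 1 handles the middle band of the image, which contains more of the white region.
    - With 4 threads, threads 1 and 2 handle the two middle bands, which contain more of the white region.
- In both cases the overall threaded time (282.581 ms with 3 threads, 192.802 ms with 4) is essentially the slowest thread's time, so the speedup is limited by the middle thread(s) while the others sit idle.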
---
## Q3
### In your write-up, describe your approach to parallelization
:::info
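My approach is to split the image into single rows and hand them out to the threads in an interleaved (round-robin) fashion: thread t computes rows t, t + T, t + 2T, ..., where T is the number of threads. Because neighboring rows cost about the same, the expensive white regions are spread across all threads, so every thread ends up with roughly the same amount of work.
The core loop of each worker thread: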
:::

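```c
// Interleaved (round-robin) row assignment:
// thread t computes rows t, t + numThreads, t + 2*numThreads, ...
int step = args->numThreads;
int height = args->height;
// Start at this thread's own ID so the threads together cover every row exactly once
for (int r = args->threadId; r < height; r += step)
{
    // Compute one row: startRow = r, numRows = 1
    mandelbrotSerial(args->x0, args->y0, args->x1, args->y1,
                     args->width, args->height, r, 1,
                     args->maxIterations, args->output);
}
```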
### Report the final 4-thread speedup obtained.
view1
```
$ ./mandelbrot -t 4
[mandelbrot serial]: [458.913] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [120.965] ms
Wrote image file mandelbrot-thread.ppm
(3.79x speedup from 4 threads)
```
view2
```
$ ./mandelbrot -t 4 -v 2
[mandelbrot serial]: [287.501] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [75.835] ms
Wrote image file mandelbrot-thread.ppm
(3.79x speedup from 4 threads)
```
---
## Q4
### Now run your improved code with eight threads. Is performance noticeably greater than when running with four threads?
:::info
In terms of performance, 8 threads are not noticeably faster than 4 threads (view1: 3.76x vs 3.79x; view2: 3.79x for both).
:::
- view1 with 8 threads
```
[mandelbrot serial]: [458.971] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [121.978] ms
Wrote image file mandelbrot-thread.ppm
(3.76x speedup from 8 threads)
```
- view2 with 8 threads
```
[mandelbrot serial]: [287.568] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [75.958] ms
Wrote image file mandelbrot-thread.ppm
(3.79x speedup from 8 threads)
```
| Threads | view1 | view2 |
| -------- | -------- | -------- |
| 4-thread | 3.79x speedup | 3.79x speedup |
| 8-thread | 3.76x speedup | 3.79x speedup |
### Why or why not? (Notice that the workstation server provides 4 cores 4 threads.)
:::info
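No. The workstation has 4 cores with 4 hardware threads, so at most 4 threads can run at the same time. With 8 threads, the extra threads only take turns on the same cores, and the additional context switches add overhead, so performance does not improve noticeably and even regresses slightly (view1: 3.79x → 3.76x).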
:::
I also ran the 8-thread experiment on a separate 6-core machine:
```
$ ./mandelbrot -t 8
[mandelbrot serial]: [394.422] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [55.181] ms
Wrote image file mandelbrot-thread.ppm
(7.15x speedup from 8 threads)
$ ./mandelbrot -t 8 -v 2
[mandelbrot serial]: [481.647] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [74.546] ms
Wrote image file mandelbrot-thread.ppm
(6.46x speedup from 8 threads)
```
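These results show that more cores do give more speedup; however, when the number of cores is much smaller than the number of threads, adding threads does not noticeably improve performance.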