Parallel Programming HW2 @NYCU, 2022 Fall
===
###### tags: `2022_PP_NYCU`

<!--
| Student ID | Name |
| -------- | -------- |
| 310552060 | 湯智惟 |
-->

## Q1
### In your write-up, produce a graph of speedup compared to the reference sequential implementation as a function of the number of threads used FOR VIEW 1.

![](https://i.imgur.com/17iq9aj.png)

View 1:
```
❯ ./mandelbrot -t 1
[mandelbrot serial]: [668.115] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [668.263] ms
Wrote image file mandelbrot-thread.ppm
(1.00x speedup from 1 threads)
❯ ./mandelbrot -t 2
[mandelbrot serial]: [519.932] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [265.877] ms
Wrote image file mandelbrot-thread.ppm
(1.96x speedup from 2 threads)
❯ ./mandelbrot -t 3
[mandelbrot serial]: [798.260] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [492.891] ms
Wrote image file mandelbrot-thread.ppm
(1.62x speedup from 3 threads)
❯ ./mandelbrot -t 4
[mandelbrot serial]: [794.566] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [334.012] ms
Wrote image file mandelbrot-thread.ppm
(2.38x speedup from 4 threads)
```

[Speedup graph for view 2 (Imgur)](https://imgur.com/JUnmiri)

View 2:
```
❯ ./mandelbrot -t 1 -v 2
[mandelbrot serial]: [488.544] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [489.154] ms
Wrote image file mandelbrot-thread.ppm
(1.00x speedup from 1 threads)
❯ ./mandelbrot -t 2 -v 2
[mandelbrot serial]: [486.680] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [284.448] ms
Wrote image file mandelbrot-thread.ppm
(1.71x speedup from 2 threads)
❯ ./mandelbrot -t 3 -v 2
[mandelbrot serial]: [480.144] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [217.992] ms
Wrote image file mandelbrot-thread.ppm
(2.20x speedup from 3 threads)
❯ ./mandelbrot -t 4 -v 2
[mandelbrot serial]: [486.658] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [183.308] ms
Wrote image file mandelbrot-thread.ppm
(2.65x speedup from 4 threads)
```

### Is speedup linear in the number of threads used?

:::info
For view 1, speedup is **not linear** in the number of threads (1.96x with 2 threads, but only 1.62x with 3 and 2.38x with 4).
For view 2, speedup is **roughly linear** in the number of threads: it grows steadily with each added thread (1.71x, 2.20x, 2.65x for 2, 3, 4 threads).
:::

### In your writeup hypothesize why this is (or is not) the case?

:::info
- The view 1 image has very distinct black and white regions, and these regions take different amounts of time to compute, so threads assigned to different parts of the image end up with different computation times.
- The view 2 image has no large changes in color, so every part of it takes about the same time to compute, and speedup grows roughly linearly with the number of threads.
:::

---

## Q2
### How do your measurements explain the speedup graph you previously created?

:::info
In view 1, work is distributed unevenly across threads: the white regions of the image need more computation, so the threads that cover them take longer. This imbalance is why speedup does not grow linearly with the number of threads.
:::

View 1 results (each thread's time is printed in seconds, once for each of the five runs of the threaded version):
```
$ ./mandelbrot -t 2
[mandelbrot serial]: [459.212] ms
Wrote image file mandelbrot-serial.ppm
thread0 takes [0.234840]
thread1 takes [0.236007]
thread0 takes [0.236044]
thread1 takes [0.237290]
thread0 takes [0.234665]
thread1 takes [0.235717]
thread0 takes [0.235215]
thread1 takes [0.236302]
thread0 takes [0.235817]
thread1 takes [0.237080]
[mandelbrot thread]: [235.903] ms
Wrote image file mandelbrot-thread.ppm
(1.95x speedup from 2 threads)
$ ./mandelbrot -t 3
[mandelbrot serial]: [458.592] ms
Wrote image file mandelbrot-serial.ppm
thread0 takes [0.093183]
thread2 takes [0.093638]
thread1 takes [0.282862]
thread0 takes [0.093044]
thread2 takes [0.093636]
thread1 takes [0.282560]
thread0 takes [0.093067]
thread2 takes [0.093692]
thread1 takes [0.282518]
thread0 takes [0.093067]
thread2 takes [0.093701]
thread1 takes [0.282682]
thread0 takes [0.092993]
thread2 takes [0.093618]
thread1 takes [0.282442]
[mandelbrot thread]: [282.581] ms
Wrote image file mandelbrot-thread.ppm
(1.62x speedup from 3 threads)
$ ./mandelbrot -t 4
[mandelbrot serial]: [459.216] ms
Wrote image file mandelbrot-serial.ppm
thread0 takes [0.045263]
thread3 takes [0.045677]
thread1 takes [0.192079]
thread2 takes [0.192883]
thread0 takes [0.045370]
thread3 takes [0.045800]
thread1 takes [0.192164]
thread2 takes [0.192752]
thread0 takes [0.045333]
thread3 takes [0.045706]
thread1 takes [0.191870]
thread2 takes [0.192589]
thread0 takes [0.045332]
thread3 takes [0.045726]
thread1 takes [0.191945]
thread2 takes [0.192660]
thread0 takes [0.045309]
thread3 takes [0.045751]
thread1 takes [0.192286]
thread2 takes [0.192915]
[mandelbrot thread]: [192.802] ms
Wrote image file mandelbrot-thread.ppm
(2.38x speedup from 4 threads)
```

How the image is divided among threads:

![](https://i.imgur.com/X700Nra.png)

- The measurements show that:
  - With 3 threads, thread 1 takes about three times as long as threads 0 and 2 (about 0.283 s vs. 0.093 s).
  - With 4 threads, threads 1 and 2 take about four times as long as threads 0 and 3 (about 0.192 s vs. 0.045 s).
  - In each case the overall thread time (282.581 ms with 3 threads, 192.802 ms with 4) matches the slowest thread, so the most heavily loaded thread limits the speedup.
- The image partitioning shows that:
  - With 3 threads, thread 1 handles the middle band of the image, which contains more white area.
  - With 4 threads, threads 1 and 2 handle the two middle bands, which contain more white area.

---

## Q3
### In your write-up, describe your approach to parallelization

:::info
My approach splits the image into many small blocks (single rows) and assigns them to threads in round-robin order: with `numThreads` threads, thread `t` computes rows `t`, `t + numThreads`, `t + 2 * numThreads`, and so on. Because each thread's rows are spread over the whole image, every thread gets a similar mix of cheap and expensive regions, and the workloads end up roughly equal. See the figure and code below:
:::

![](https://i.imgur.com/vsNVHLx.png)

```c
// Interleaved (cyclic) row assignment: each worker starts at its own
// index, args->threadId (0 .. numThreads-1), and strides by numThreads.
int step = args->numThreads;
int height = args->height;
for (int r = args->threadId; r < height; r += step) {
    // Compute exactly one row (numRows = 1) starting at row r.
    mandelbrotSerial(args->x0, args->y0, args->x1, args->y1,
                     args->width, args->height,
                     r, 1,
                     args->maxIterations, args->output);
}
```

### Report the final 4-thread speedup obtained.

View 1:
```
$ ./mandelbrot -t 4
[mandelbrot serial]: [458.913] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [120.965] ms
Wrote image file mandelbrot-thread.ppm
(3.79x speedup from 4 threads)
```

View 2:
```
$ ./mandelbrot -t 4 -v 2
[mandelbrot serial]: [287.501] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [75.835] ms
Wrote image file mandelbrot-thread.ppm
(3.79x speedup from 4 threads)
```

---

## Q4
### Now run your improved code with eight threads. Is performance noticeably greater than when running with four threads?
:::info
With 8 threads, performance is not noticeably better than with 4 threads.
:::

- View 1 with 8 threads:
```
[mandelbrot serial]: [458.971] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [121.978] ms
Wrote image file mandelbrot-thread.ppm
(3.76x speedup from 8 threads)
```
- View 2 with 8 threads:
```
[mandelbrot serial]: [287.568] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [75.958] ms
Wrote image file mandelbrot-thread.ppm
(3.79x speedup from 8 threads)
```

| | View 1 | View 2 |
| -------- | -------- | -------- |
| 4 threads | 3.79x speedup | 3.79x speedup |
| 8 threads | 3.76x speedup | 3.79x speedup |

### Why or why not? (Notice that the workstation server provides 4 cores 4 threads.)

:::info
The server has only 4 cores (4 hardware threads), so 4 threads already keep every core busy. With more threads than cores, the threads have to time-share the cores, which adds context-switch overhead; as a result, performance does not improve noticeably and can even drop slightly (view 1: 3.76x with 8 threads vs. 3.79x with 4).
:::

I also ran the 8-thread experiment on a different machine with 6 cores:
```
$ ./mandelbrot -t 8
[mandelbrot serial]: [394.422] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [55.181] ms
Wrote image file mandelbrot-thread.ppm
(7.15x speedup from 8 threads)
$ ./mandelbrot -t 8 -v 2
[mandelbrot serial]: [481.647] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [74.546] ms
Wrote image file mandelbrot-thread.ppm
(6.46x speedup from 8 threads)
```
These results show that more cores do provide more speedup, but when the number of cores is much smaller than the number of threads, adding threads does not improve performance significantly. (A speedup above 6x on a 6-core machine suggests that it exposes more than 6 hardware threads, for example through SMT.)