Parallel Programming HW2 @NYCU, 2022 Fall

tags: 2022_PP_NYCU

Q1

In your write-up, produce a graph of speedup compared to the reference sequential implementation as a function of the number of threads used FOR VIEW 1.

❯ ./mandelbrot -t 1
[mandelbrot serial]:            [668.115] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:            [668.263] ms
Wrote image file mandelbrot-thread.ppm
                                (1.00x speedup from 1 threads)
                                
❯ ./mandelbrot -t 2
[mandelbrot serial]:            [519.932] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:            [265.877] ms
Wrote image file mandelbrot-thread.ppm
                                (1.96x speedup from 2 threads)
                                
                        
❯ ./mandelbrot -t 3
[mandelbrot serial]:            [798.260] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:            [492.891] ms
Wrote image file mandelbrot-thread.ppm
                                (1.62x speedup from 3 threads)

❯ ./mandelbrot -t 4
[mandelbrot serial]:            [794.566] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:            [334.012] ms
Wrote image file mandelbrot-thread.ppm
                                (2.38x speedup from 4 threads)

❯ ./mandelbrot -t 1 -v 2
[mandelbrot serial]:            [488.544] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:            [489.154] ms
Wrote image file mandelbrot-thread.ppm
                                (1.00x speedup from 1 threads)
                                
❯ ./mandelbrot -t 2 -v 2
[mandelbrot serial]:            [486.680] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:            [284.448] ms
Wrote image file mandelbrot-thread.ppm
                                (1.71x speedup from 2 threads)
                                
❯ ./mandelbrot -t 3 -v 2
[mandelbrot serial]:            [480.144] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:            [217.992] ms
Wrote image file mandelbrot-thread.ppm
                                (2.20x speedup from 3 threads)
                                
❯ ./mandelbrot -t 4 -v 2
[mandelbrot serial]:            [486.658] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:            [183.308] ms
Wrote image file mandelbrot-thread.ppm
                                (2.65x speedup from 4 threads)

Is speedup linear in the number of threads used?

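View 1: speedup is not linear in the number of threads (3 threads are even slower than 2).
View 2: speedup grows roughly linearly with the number of threads, although it stays below the ideal (2.65x with 4 threads).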

In your writeup hypothesize why this is (or is not) the case?

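  • The view 1 image has clearly separated black and white regions. Black and white regions take different amounts of time to compute, so different threads end up with different running times.
  • The view 2 image has little variation in color, so every part takes about the same time to compute, and speedup grows roughly linearly with the number of threads.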

Q2 How do your measurements explain the speedup graph you previously created?

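In view 1 the work is distributed unevenly among the threads: the white regions of the image require more computation, which is why the speedup does not grow linearly with the number of threads.

View 1 experimental results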

$ ./mandelbrot -t 2
[mandelbrot serial]:		[459.212] ms
Wrote image file mandelbrot-serial.ppm
thread0 takes [0.234840]
thread1 takes [0.236007]
thread0 takes [0.236044]
thread1 takes [0.237290]
thread0 takes [0.234665]
thread1 takes [0.235717]
thread0 takes [0.235215]
thread1 takes [0.236302]
thread0 takes [0.235817]
thread1 takes [0.237080]
[mandelbrot thread]:		[235.903] ms
Wrote image file mandelbrot-thread.ppm
				(1.95x speedup from 2 threads)

$ ./mandelbrot -t 3
[mandelbrot serial]:		[458.592] ms
Wrote image file mandelbrot-serial.ppm
thread0 takes [0.093183]
thread2 takes [0.093638]
thread1 takes [0.282862]
thread0 takes [0.093044]
thread2 takes [0.093636]
thread1 takes [0.282560]
thread0 takes [0.093067]
thread2 takes [0.093692]
thread1 takes [0.282518]
thread0 takes [0.093067]
thread2 takes [0.093701]
thread1 takes [0.282682]
thread0 takes [0.092993]
thread2 takes [0.093618]
thread1 takes [0.282442]
[mandelbrot thread]:		[282.581] ms
Wrote image file mandelbrot-thread.ppm
				(1.62x speedup from 3 threads)

$ ./mandelbrot -t 4
[mandelbrot serial]:		[459.216] ms
Wrote image file mandelbrot-serial.ppm
thread0 takes [0.045263]
thread3 takes [0.045677]
thread1 takes [0.192079]
thread2 takes [0.192883]
thread0 takes [0.045370]
thread3 takes [0.045800]
thread1 takes [0.192164]
thread2 takes [0.192752]
thread0 takes [0.045333]
thread3 takes [0.045706]
thread1 takes [0.191870]
thread2 takes [0.192589]
thread0 takes [0.045332]
thread3 takes [0.045726]
thread1 takes [0.191945]
thread2 takes [0.192660]
thread0 takes [0.045309]
thread3 takes [0.045751]
thread1 takes [0.192286]
thread2 takes [0.192915]
[mandelbrot thread]:		[192.802] ms
Wrote image file mandelbrot-thread.ppm
				(2.38x speedup from 4 threads)

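How the image is split among the threads:

  • The measurements show that
    • with 3 threads, thread 1 takes longer than threads 0 and 2
    • with 4 threads, threads 1 and 2 take longer than threads 0 and 3
  • The way the image is partitioned shows that
    • with 3 threads, thread 1 handles the middle band of the image, which contains more white area
    • with 4 threads, threads 1 and 2 handle the two middle bands, which contain more white area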

Q3

In your write-up, describe your approach to parallelization

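My approach is to cut the image into many small pieces (individual rows) and deal them out evenly across the threads, so that every thread gets about the same amount of work.

The figure and code below illustrate this: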

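// Interleaved assignment: thread t handles rows t, t + numThreads, t + 2 * numThreads, ...
// (threadId is assumed to be the per-thread index stored in args.)
int step = args->numThreads;
int height = args->height;
for (int r = args->threadId; r < height; r += step)
{
    // Compute a single row (start row r, 1 row) with the serial kernel.
    mandelbrotSerial(args->x0, args->y0, args->x1, args->y1, args->width, args->height, r, 1, args->maxIterations, args->output);
}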

Report the final 4-thread speedup obtained.

view1

$ ./mandelbrot -t 4
[mandelbrot serial]:		[458.913] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:		[120.965] ms
Wrote image file mandelbrot-thread.ppm
				(3.79x speedup from 4 threads)

view2

$ ./mandelbrot -t 4 -v 2
[mandelbrot serial]:		[287.501] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:		[75.835] ms
Wrote image file mandelbrot-thread.ppm
				(3.79x speedup from 4 threads)

Q4

Now run your improved code with eight threads. Is performance noticeably greater than when running with four threads?

In terms of performance, 8 threads give no noticeable improvement over 4 threads.

  • view1 with 8 threads
[mandelbrot serial]:		[458.971] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:		[121.978] ms
Wrote image file mandelbrot-thread.ppm
				(3.76x speedup from 8 threads)
  • view2 with 8 threads
[mandelbrot serial]:		[287.568] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:		[75.958] ms
Wrote image file mandelbrot-thread.ppm
				(3.79x speedup from 8 threads)

threads     view1            view2
4-thread    3.79x speedup    3.79x speedup
8-thread    3.76x speedup    3.79x speedup

Why or why not? (Notice that the workstation server provides 4 cores 4 threads.)

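Because the workstation has only 4 cores, at most 4 threads can run at the same time. When the number of threads exceeds the number of cores, the extra threads only add context-switch overhead, so performance does not improve noticeably and can even get slightly worse.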

I also ran the 8-thread experiment on another machine with 6 cores:

$ ./mandelbrot -t 8
[mandelbrot serial]:            [394.422] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:            [55.181] ms
Wrote image file mandelbrot-thread.ppm
                                (7.15x speedup from 8 threads)
$ ./mandelbrot -t 8 -v 2
[mandelbrot serial]:            [481.647] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]:            [74.546] ms
Wrote image file mandelbrot-thread.ppm
                                (6.46x speedup from 8 threads)

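The results show that adding cores does speed things up, but when the number of cores is much smaller than the number of threads, performance cannot improve noticeably.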