# Parallel Programming Assignment 2
# 0816176 張辰宇

## Q1: In your write-up, produce a graph of speedup compared to the reference sequential implementation as a function of the number of threads used FOR VIEW 1. Is speedup linear in the number of threads used? In your writeup hypothesize why this is (or is not) the case? (You may also wish to produce a graph for VIEW 2 to help you come up with a good answer. Hint: take a careful look at the three-thread data-point.)

The results for different numbers of threads are listed below.

### View 1
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 2
[mandelbrot serial]: [491.160] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [249.895] ms
Wrote image file mandelbrot-thread.ppm
(1.97x speedup from 2 threads)
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 3
[mandelbrot serial]: [478.259] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [291.454] ms
Wrote image file mandelbrot-thread.ppm
(1.64x speedup from 3 threads)
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 4
[mandelbrot serial]: [482.081] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [238.224] ms
Wrote image file mandelbrot-thread.ppm
(2.02x speedup from 4 threads)
```
![](https://i.imgur.com/WBYUy3v.png)

### View 2
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 2 --view 2
[mandelbrot serial]: [299.227] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [180.306] ms
Wrote image file mandelbrot-thread.ppm
(1.66x speedup from 2 threads)
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 3 --view 2
[mandelbrot serial]: [305.834] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [148.633] ms
Wrote image file mandelbrot-thread.ppm
(2.06x speedup from 3 threads)
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 4 --view 2
[mandelbrot serial]: [300.232] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [110.677] ms
Wrote image file mandelbrot-thread.ppm
(2.72x speedup from 4 threads)
```
![](https://i.imgur.com/a8iiXLe.png)

In View 1, the speedup does not grow linearly with the number of threads: it drops from 1.97x with 2 threads to 1.64x with 3 threads, then recovers to 2.02x with 4 threads. In View 2, the speedup grows roughly linearly (1.66x, 2.06x, 2.72x), although it stays below the ideal.

![](https://i.imgur.com/XllK6kD.png)

According to the picture above, I think the middle thread is assigned far more points to calculate in View 1, which results in a longer elapsed time when using 3 threads. In View 2, the distribution is more even, so the growth is roughly linear.

My hypothesis is that the white points represent the work a thread has to do, so a band of rows with more white points takes longer to compute. This hypothesis explains the growth trend in both views.

## Q2: How do your measurements explain the speedup graph you previously created?

- Thread num = 2
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 2
[mandelbrot serial]: [461.094] ms
Wrote image file mandelbrot-serial.ppm
thread: 0 times: 237.278
thread: 1 times: 237.506
thread: 0 times: 236.003
thread: 1 times: 238.448
thread: 0 times: 235.794
thread: 1 times: 237.157
thread: 0 times: 237.164
thread: 1 times: 238.455
thread: 0 times: 236.994
thread: 1 times: 237.125
[mandelbrot thread]: [237.323] ms
Wrote image file mandelbrot-thread.ppm
(1.94x speedup from 2 threads)
```
We can observe that the two threads have almost the same amount of work in this case.

- Thread num = 3
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 3
[mandelbrot serial]: [460.616] ms
Wrote image file mandelbrot-serial.ppm
thread: 0 times: 93.146
thread: 2 times: 93.949
thread: 1 times: 284.180
thread: 0 times: 93.496
thread: 2 times: 93.618
thread: 1 times: 282.634
thread: 0 times: 93.820
thread: 2 times: 93.930
thread: 1 times: 282.910
thread: 0 times: 93.329
thread: 2 times: 93.762
thread: 1 times: 283.378
thread: 0 times: 93.131
thread: 2 times: 94.722
thread: 1 times: 287.022
[mandelbrot thread]: [282.897] ms
Wrote image file mandelbrot-thread.ppm
(1.63x speedup from 3 threads)
```
We can observe that, because of the spatial distribution of the white points, thread 1 has the most work to do (about 283 ms versus about 94 ms for threads 0 and 2), which makes it the bottleneck of the whole program.
This is why the 3-thread run is slower than the 2-thread run: the work is distributed too unevenly.

- Thread num = 4
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 4
[mandelbrot serial]: [461.560] ms
Wrote image file mandelbrot-serial.ppm
thread: 0 times: 45.426
thread: 3 times: 52.458
thread: 2 times: 196.468
thread: 1 times: 208.487
thread: 0 times: 45.738
thread: 3 times: 46.693
thread: 1 times: 194.270
thread: 2 times: 194.359
thread: 0 times: 45.266
thread: 3 times: 46.120
thread: 1 times: 193.091
thread: 2 times: 194.167
thread: 0 times: 45.288
thread: 3 times: 45.701
thread: 1 times: 192.587
thread: 2 times: 193.466
thread: 0 times: 47.374
thread: 3 times: 49.587
thread: 1 times: 192.929
thread: 2 times: 194.471
[mandelbrot thread]: [193.652] ms
Wrote image file mandelbrot-thread.ppm
(2.38x speedup from 4 threads)
```
We can observe that, compared to the 3-thread case, the bottleneck is less severe: the heavy middle region is now split between threads 1 and 2 (about 194 ms each), so the speedup starts to grow again.

## Q3: In your write-up, describe your approach to parallelization and report the final 4-thread speedup obtained.

``` cpp
void workerThreadStart(WorkerArgs *const args)
{
    const int numThreads = args->numThreads;
    const int height = args->height;

    // Interleaved assignment: thread i computes rows i, i + numThreads, ...
    for (int row = args->threadId; row < height; row += numThreads) {
        // Compute exactly one row, starting at `row`.
        mandelbrotSerial(args->x0, args->y0, args->x1, args->y1,
                         args->width, args->height,
                         row, 1,
                         args->maxIterations, args->output);
    }
}
```
Each thread computes one row at a time and then advances by numThreads rows. This guarantees that all threads process nearly the same number of rows, and it spreads the expensive rows in the middle of the image across all threads instead of giving them to one thread. For example, with a total of 6 rows and 3 threads, thread 0 computes rows 0 and 3, thread 1 computes rows 1 and 4, and thread 2 computes rows 2 and 5.

The final result is as follows.
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 4
[mandelbrot serial]: [461.797] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [121.088] ms
Wrote image file mandelbrot-thread.ppm
(3.81x speedup from 4 threads)
```

## Q4: Now run your improved code with eight threads. Is performance noticeably greater than when running with four threads? Why or why not? (Notice that the workstation server provides 4 cores 4 threads.)

```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 4
[mandelbrot serial]: [461.797] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [121.088] ms
Wrote image file mandelbrot-thread.ppm
(3.81x speedup from 4 threads)
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 8
[mandelbrot serial]: [461.287] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [127.628] ms
Wrote image file mandelbrot-thread.ppm
(3.61x speedup from 8 threads)
```
We can observe that the 8-thread run is slightly slower than the 4-thread run (3.61x versus 3.81x). The server's CPU provides 4 cores and 4 hardware threads, so at most 4 threads can execute at the same time. Any additional threads only add context switches and scheduling overhead.

:::danger
You should reason from the fact that the server's CPU has 4 cores & 4 threads. The key point is the 4 threads: on a 4-core, 8-thread CPU that supports Hyper-Threading, running 8 threads would not necessarily be worse.
by TA
:::