Programming Assignment II: Multi-thread Programming

# Programming Assignment II: Multi-thread Programming ## part2 ### Q1: > **In your write-up, produce a graph of speedup compared to the reference sequential implementation as a function of the number of threads used FOR VIEW 1. Is speedup linear in the number of threads used? In your writeup hypothesize why this is (or is not) the case?** --- ### A1: * the measurement of View 1 ``` 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 2 [mandelbrot serial]: [466.881] ms Wrote image file mandelbrot-serial.ppm [mandelbrot thread]: [237.234] ms Wrote image file mandelbrot-thread.ppm (1.97x speedup from 2 threads) 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 3 [mandelbrot serial]: [465.462] ms Wrote image file mandelbrot-serial.ppm [mandelbrot thread]: [284.350] ms Wrote image file mandelbrot-thread.ppm (1.64x speedup from 3 threads) 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 4 [mandelbrot serial]: [464.053] ms Wrote image file mandelbrot-serial.ppm [mandelbrot thread]: [193.463] ms Wrote image file mandelbrot-thread.ppm (2.40x speedup from 4 threads) ``` ![v1.png](https://hackmd.io/_uploads/SkU_oeZ7a.png =500x) --- * the measurement of View 2 ``` 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 2 --view 2 [mandelbrot serial]: [291.459] ms Wrote image file mandelbrot-serial.ppm [mandelbrot thread]: [173.639] ms Wrote image file mandelbrot-thread.ppm (1.68x speedup from 2 threads) 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 3 --view 2 [mandelbrot serial]: [291.311] ms Wrote image file mandelbrot-serial.ppm [mandelbrot thread]: [132.521] ms Wrote image file mandelbrot-thread.ppm (2.20x speedup from 3 threads) 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 4 --view 2 [mandelbrot serial]: [292.305] ms Wrote image file mandelbrot-serial.ppm [mandelbrot thread]: [111.717] ms Wrote image file mandelbrot-thread.ppm (2.62x speedup from 4 threads) ``` ![v2.png](https://hackmd.io/_uploads/B1Qcox-QT.png =500x) --- 我們發現在View1中speedup並沒有隨著thread數變多而增加(3 threads比2 threads慢)，但在View2中speedup有隨著thread數增加而增加。 **推測原因為各thread之間工作分配不平均** 我們可以觀察View1被分成3部分的情形，發現中間區域的白色部分較上下多出不少，且在mandelbrot中計算時間與白色部分成正相關，這代表Thread1的計算時間比Thread0, Thread2高出不少。 ![view1-3.PNG](https://hackmd.io/_uploads/rkiERg-Qa.png =400x) --- ### Q2: >How do your measurements explain the speedup graph you previously created? ### A2: 我們量測每個thread運行時間，結果如下。 * the measurement of View 1 ``` 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 2 [mandelbrot serial]: [463.640] ms Wrote image file mandelbrot-serial.ppm [Thread 0]: [235.079] ms [Thread 1]: [236.205] ms [mandelbrot thread]: [236.406] ms Wrote image file mandelbrot-thread.ppm (1.96x speedup from 2 threads) 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 3 [mandelbrot serial]: [464.093] ms Wrote image file mandelbrot-serial.ppm [Thread 0]: [93.209] ms [Thread 2]: [93.952] ms [Thread 1]: [282.820] ms [mandelbrot thread]: [283.092] ms Wrote image file mandelbrot-thread.ppm (1.64x speedup from 3 threads) 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 4 [mandelbrot serial]: [463.909] ms Wrote image file mandelbrot-serial.ppm [Thread 0]: [45.513] ms [Thread 3]: [45.875] ms [Thread 1]: [192.702] ms [Thread 2]: [193.426] ms [mandelbrot thread]: [193.687] ms Wrote image file mandelbrot-thread.ppm (2.40x speedup from 4 threads) ``` ![view1-3.PNG](https://hackmd.io/_uploads/B1bNlbWXa.png =400x) 量測結果與假說符合，在3 threads的版本中，thread1的運行時間遠超過其他2個threads，因此speedup與thread數不成正比的關係即為thread間工作分配不平均。 * the measurement of View 2 ``` 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 2 --view 2 [mandelbrot serial]: [291.542] ms Wrote image file mandelbrot-serial.ppm [Thread 1]: [123.346] ms [Thread 0]: [173.100] ms [mandelbrot thread]: [173.174] ms Wrote image file mandelbrot-thread.ppm (1.68x speedup from 2 threads) 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 3 --view 2 [mandelbrot serial]: [290.939] ms Wrote image file mandelbrot-serial.ppm [Thread 2]: [80.119] ms [Thread 1]: [87.366] ms [Thread 0]: [132.188] ms [mandelbrot thread]: [132.277] ms Wrote image file mandelbrot-thread.ppm (2.20x speedup from 3 threads) 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 4 --view 2 [mandelbrot serial]: [292.117] ms Wrote image file mandelbrot-serial.ppm [Thread 3]: [61.140] ms [Thread 2]: [65.269] ms [Thread 1]: [66.041] ms [Thread 0]: [111.197] ms [mandelbrot thread]: [111.301] ms Wrote image file mandelbrot-thread.ppm (2.62x speedup from 4 threads) ``` ![view2-3.PNG](https://hackmd.io/_uploads/rkeAxb-Q6.png =400x) 在View 2中，我們也能發現3 Threads版本thread0的白色部分較其他2 threads多，量測結果也為thread0時間遠超其他2 threads，再次符合Q1提出的假說。 --- ### Q3: >In your write-up, describe your approach to parallelization and report the final 4-thread speedup obtained. ### A3: 我們把切割方式從平均分成N等分改為round robin的方式，讓每個thread每次只做1列，減輕各thread可能的工作不平均(如下圖) ![part.jpg](https://hackmd.io/_uploads/Sy09XZZQa.jpg =500x) 經過改良後的pratition方式，量測結果如下： * the measurement of View 1 ``` 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 4 [mandelbrot serial]: [461.321] ms Wrote image file mandelbrot-serial.ppm [mandelbrot thread]: [121.457] ms Wrote image file mandelbrot-thread.ppm (3.80x speedup from 4 threads) ``` * the measurement of View 2 ``` 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 4 --view 2 [mandelbrot serial]: [291.237] ms Wrote image file mandelbrot-serial.ppm [mandelbrot thread]: [76.720] ms Wrote image file mandelbrot-thread.ppm (3.80x speedup from 4 threads) ``` 在View 1和View 2中，使用新的partition方式都能達到3.80x的speedup。 --- ### Q4: >Now run your improved code with eight threads. Is performance noticeably greater than when running with four threads? Why or why not? (Notice that the workstation server provides 4 cores 4 threads.) ### A4: * the measurement of 8 Threads of View 1 and View 2 ``` 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 8 [mandelbrot serial]: [473.564] ms Wrote image file mandelbrot-serial.ppm [mandelbrot thread]: [127.050] ms Wrote image file mandelbrot-thread.ppm (3.73x speedup from 8 threads) 109550104@pp7:~/HW2/part2$ ./mandelbrot -t 8 --view 2 [mandelbrot serial]: [290.533] ms Wrote image file mandelbrot-serial.ppm [mandelbrot thread]: [80.198] ms Wrote image file mandelbrot-thread.ppm (3.62x speedup from 8 threads) ``` 對比4 threads的版本，8 threads並沒有顯著提升，甚至還有下降的趨勢。 **推測原因為工作站為4 cores 4 threads，因此同時間最多只能有4個thread同時運作。** **因此，就算有8個threads也只能有4個同時運作，並且8個threads還會有context switch overhead，因此導致效果比4個threads還要差一點。**