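# Programming Assignment HW2

## Q1

> In your write-up, produce a graph of speedup compared to the reference sequential implementation as a function of the number of threads used FOR VIEW 1. Is speedup linear in the number of threads used? In your writeup hypothesize why this is (or is not) the case?

![](https://i.imgur.com/CB0hlOf.png)

### **Ans:**

I tested 2 to 6 threads. The results are shown in the figure above, and the speedup is clearly not linear in the number of threads. My hypothesis is that the content of the input image affects the speedup: some regions of the image need much more computation than others, so a thread that receives those regions finishes later than the rest (see Q2).

## Q2

> How do your measurements explain the speedup graph you previously created?

Using 2 threads:

![](https://i.imgur.com/b58GTHs.png)

Using 3 threads:

![](https://i.imgur.com/IZekTBo.png)

Using 4 threads:

![](https://i.imgur.com/rqT5o5q.png)

These measurements show that with 3 threads the speedup is actually worse than expected. The current parallel implementation partitions the image by height into contiguous horizontal bands. With 3 threads, the image is split into three equal-height bands, and the band assigned to thread 1 (the middle one) contains more white pixels, which require more iterations and therefore more computation time. In the VIEW 2 experiment, the black and white regions are distributed more evenly, so this problem does not appear.

![](https://i.imgur.com/HHeBNzU.png)

## Q3

> In your write-up, describe your approach to parallelization and report the final 4-thread speedup obtained.

```cpp
// Interleaved assignment: thread t computes rows t, t + numThreads, t + 2*numThreads, ...
for (unsigned int i = args->threadId; i < args->height; i += args->numThreads) {
    // startRow = i, totalRows = 1: compute exactly one row per call
    mandelbrotSerial(args->x0, args->y0, args->x1, args->y1, args->width, args->height,
                     i, 1, args->maxIterations, args->output);
}
```

![](https://i.imgur.com/KEIEaeG.png)

The original approach simply divides the whole image into one block per thread, which makes the computation time uneven across threads. My approach first divides the image more finely. Taking the figure above as an example, I split the image into nine blocks and assign them to the threads in turn from top to bottom, so no single thread ends up concentrated on the middle of the image and the load stays balanced. The code applies this idea at the granularity of single rows: each thread handles every `numThreads`-th row, starting from its own `threadId`.

The results are shown below; the 4-thread speedup is about 3.8× (see also Q4).

![](https://i.imgur.com/LENFs5q.png)

## Q4

> Now run your improved code with eight threads. Is performance noticeably greater than when running with four threads? Why or why not? (Notice that the workstation server provides 4 cores 4 threads.)

![](https://i.imgur.com/6myeoeX.png)

With 4 threads the speedup is 3.8×, while with 16 threads it is only 3.6×. I think this is because the workstation provides only 4 hardware threads. When we use more threads than that, the software threads have to share the 4 cores, so context switches can occur, adding overhead and leading to worse performance.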
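As a worked check on the Q4 numbers, parallel efficiency (speedup $S$ divided by thread count $p$, a standard metric that the measurements above do not report directly) can be computed from the two reported speedups:

$$
E = \frac{S}{p}, \qquad E_{4} = \frac{3.8}{4} = 0.95, \qquad E_{16} = \frac{3.6}{16} = 0.225
$$

Because the machine has only 4 hardware threads, going from 4 to 16 threads adds no compute capacity, so the efficiency per thread drops sharply even though the total speedup barely changes.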
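To illustrate the load-imbalance argument from Q2 and Q3, here is a minimal, self-contained C++ sketch (not the assignment's code) that estimates each thread's work as the total escape-time iterations of its rows, then compares contiguous bands with the interleaved row assignment used in Q3. The helper names `mandelIters`, `rowCost`, `blockCosts`, `interleavedCosts`, and `imbalance` are hypothetical. The iteration loop assumes the same `z = z^2 + c` recurrence with a `|z|^2 > 4` escape test as the reference `mandelbrotSerial`.

```cpp
#include <cassert>
#include <vector>

// Escape-time iteration count for one point (assumed to mirror the reference kernel).
int mandelIters(float cRe, float cIm, int maxIterations) {
    float zRe = cRe, zIm = cIm;
    int i = 0;
    for (; i < maxIterations; ++i) {
        if (zRe * zRe + zIm * zIm > 4.f) break;  // point escaped
        float newRe = zRe * zRe - zIm * zIm;
        float newIm = 2.f * zRe * zIm;
        zRe = cRe + newRe;
        zIm = cIm + newIm;
    }
    return i;
}

// Total iterations across one image row: a proxy for that row's compute cost.
long rowCost(float x0, float y0, float x1, float y1,
             int width, int height, int row, int maxIterations) {
    float dx = (x1 - x0) / width;
    float dy = (y1 - y0) / height;
    long cost = 0;
    for (int col = 0; col < width; ++col)
        cost += mandelIters(x0 + col * dx, y0 + row * dy, maxIterations);
    return cost;
}

// Per-thread cost when thread t owns one contiguous band of rows (original approach).
std::vector<long> blockCosts(int numThreads, const std::vector<long>& rows) {
    std::vector<long> cost(numThreads, 0);
    int height = static_cast<int>(rows.size());
    int band = (height + numThreads - 1) / numThreads;  // ceil(height / numThreads)
    for (int r = 0; r < height; ++r) cost[r / band] += rows[r];
    return cost;
}

// Per-thread cost when thread t owns rows t, t + numThreads, ... (the Q3 loop).
std::vector<long> interleavedCosts(int numThreads, const std::vector<long>& rows) {
    std::vector<long> cost(numThreads, 0);
    for (std::size_t r = 0; r < rows.size(); ++r) cost[r % numThreads] += rows[r];
    return cost;
}

// Slowest thread's cost divided by the mean cost; 1.0 means perfectly balanced.
// Assumes the total cost is positive.
double imbalance(const std::vector<long>& cost) {
    long total = 0, maxCost = 0;
    for (long c : cost) {
        total += c;
        if (c > maxCost) maxCost = c;
    }
    return static_cast<double>(maxCost) * cost.size() / total;
}
```

For example, assuming VIEW 1 spans x from -2 to 1 and y from -1 to 1, computing `rowCost` for every row of a small 160×120 image with 256 iterations and passing the result to both functions with 3 threads should show the middle band dominating under `blockCosts`, while the `interleavedCosts` values stay close to each other. Since the speedup is bounded by the slowest thread, `imbalance` gives a rough idea of how far below the ideal speedup each scheme falls.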