
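# PP-f22 Assignment 2

### Q1: In your write-up, produce a graph of speedup compared to the reference sequential implementation as a function of the number of threads used FOR VIEW 1. Is speedup linear in the number of threads used? In your write-up, hypothesize why this is (or is not) the case. (You may also wish to produce a graph for VIEW 2 to help you come up with a good answer.)

---

- View 1
  ![](https://i.imgur.com/8hH3Acg.png)
- View 2
  ![](https://i.imgur.com/x6Zq35z.png)

Speedup does not grow linearly with the number of threads. I believe this is because the rows in the middle of VIEW 1 are more expensive to compute. With a spatial (block) decomposition, Thread 1 has to do more work than Threads 0 and 2, so the overall speedup with 3 threads is worse than with only 2 threads. In VIEW 2 the complexity is spread more evenly across the image, so the speedup can be seen to grow linearly.

### Q2: How do your measurements explain the speedup graph you previously created?

- View 1 (3 threads)

| Thread 0 | Thread 1 | Thread 2 |
| -------- | -------- | -------- |
| 94.089 ms | 284.044 ms | 94.535 ms |

The middle part of View 1 is more complex, so, as argued in Q1, Thread 1 does take the most time (Thread 0 handles the top part, Thread 1 the middle part, and Thread 2 the bottom part). Because the run finishes only when the slowest thread finishes, Thread 1's time determines the overall time.

### Q3: In your write-up, describe your approach to parallelization and report the final 4-thread speedup obtained.

My approach assigns rows to the threads in round-robin order. With 4 threads, for example, Thread 0 computes rows 0, 4, 8, ..., Thread 1 computes rows 1, 5, 9, ..., and so on. Any rows left over when the row count is not divisible by the thread count go to the last thread. The advantage is that this avoids the problem seen in Q2: the regions that need more computation are also shared among all threads, so no single thread becomes a bottleneck.

- New method, View 1 (4 threads)
  ![](https://i.imgur.com/k4LgQl5.png)

### Q4: Now run your improved code with eight threads. Is performance noticeably greater than when running with four threads? Why or why not? (Notice that the workstation server provides 4 cores 4 threads.)

Performance is not noticeably better. The workstation has only 4 hardware threads, so even though the program creates 8 threads, at most 4 of them can run at the same time. The run time is therefore about the same as with the 4-thread version.

- New method, View 1 (8 threads)
  ![](https://i.imgur.com/N5DrXof.png)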
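To make the difference between the block decomposition from Q1/Q2 and the round-robin assignment from Q3 concrete, here is a minimal Python sketch. It only models which rows each thread receives and how balanced the resulting load is; it is not the assignment's actual code. The function names, the image height of 1200 rows, and the cost model in `toy_row_cost` (rows near the vertical center cost more) are assumptions made up for illustration, not measurements.

```python
def block_rows(height, num_threads):
    """Contiguous bands: thread t gets one band; the last thread takes any remainder."""
    band = height // num_threads
    rows = [list(range(t * band, (t + 1) * band)) for t in range(num_threads)]
    rows[-1].extend(range(num_threads * band, height))
    return rows


def interleaved_rows(height, num_threads):
    """Round-robin: thread t gets rows t, t+n, t+2n, ...; leftover rows go to the last thread."""
    rounds = height // num_threads
    rows = [[t + k * num_threads for k in range(rounds)] for t in range(num_threads)]
    rows[-1].extend(range(rounds * num_threads, height))
    return rows


def toy_row_cost(row, height):
    """Hypothetical cost model (an assumption): rows near the center are up to 10x more expensive."""
    center = (height - 1) / 2
    return 1.0 + 9.0 * max(0.0, 1.0 - abs(row - center) / (height / 6))


def modeled_speedup(assignment, height):
    """Total work divided by the busiest thread's work, i.e. speedup if threads run fully in parallel."""
    loads = [sum(toy_row_cost(r, height) for r in rows) for rows in assignment]
    return sum(loads) / max(loads)


if __name__ == "__main__":
    height = 1200  # hypothetical image height
    for n in (2, 3, 4):
        block = modeled_speedup(block_rows(height, n), height)
        inter = modeled_speedup(interleaved_rows(height, n), height)
        print(f"{n} threads: block {block:.2f}x, interleaved {inter:.2f}x")
```

Under this toy model, the block decomposition with 3 threads gives the middle thread most of the expensive rows, which mirrors the timing pattern in the Q2 table, while the interleaved assignment keeps the per-thread loads nearly equal. The model ignores scheduling and memory effects, so it says nothing about the 8-thread case in Q4.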