Parallel Programming @ NYCU - HW2
0716221 余忠旻
In your write-up, produce a graph of speedup compared to the reference sequential implementation as a function of the number of threads used FOR VIEW 1.
(You may also wish to produce a graph for VIEW 2 to help you come up with a good answer.)
我將 image 由上到下均分給各個 thread 計算執行,如下實作:
下圖是不同 number of threads 所產生的相對應的 speedup 比較圖
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Q1: Is speedup linear in the number of threads used? In your writeup hypothesize why this is (or is not) the case?
Hint: take a careful look at the three-thread data-point.
- 在VIEW 1中,speedup隨著number of threads上升並沒有線性成長,反而在number of threads=3下降
- 在VIEW 2中,speedup隨著number of threads上升有線性成長
在 workerThreadStart()
中,每一個 thread 都會執行 mandelbrotSerial()
而 mandelbrotSerial()
會在for迴圈重複呼叫到 mandel()
看如下 mandel()
函式中,可以發現決定for迴圈的次數是
if (z_re * z_re + z_im * z_im > 4.f) break;
這個條件
在 8 位元灰階圖片中,0 表示黑色,最大值 255 則表示白色,數值越大顏色越淺
所以我發現,當 count=256 時, i 可能為 0~255 中任一數,其中數值大小決定該像素顏色深淺
再比較VIEW 1和 VIEW 2:
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
可以明顯看出 VIEW 1 的淺色像素遠比 View2 來得多且集中(集中在中間)
接著我們分別來看3個thread計算執行的內容(如下圖)
thread1 所計算的image內容淺色像素多且集中
因此我認為是 thread1 拖垮整體執行速度,導致 speedup 下降
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Hypothesize: 每個 thread 的工作量與其負責區域所含有的淺色像素數量有關。
當 number of threads=3 時,thread1 會被分配到淺色像素多且集中的區域,
導致整體執行速度不佳。
Q2: How do your measurements explain the speedup graph you previously created?
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
在 number of thread=3 可以看出
thread1 執行時間也是比 thread0 和 thread2 久
thread1 所計算的image內容淺色像素多且集中
因此 thread1 拖垮整體執行速度,導致 speedup 下降
在 number of threads=4 也可以看出
thread1 和 thread2是主要的執行速度瓶頸
接著看 VIEW 2 , number of threads=3
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
相對來說 VIEW 2的 thread0 執行時間也是比 thread1 和 thread2 久
(VIEW 2 淺色像素主要在最上面)
但因為淺色像素在整體是相對分散許多
因此執行速度瓶頸雖是 thread0,但不至於拖垮整體執行速度,還是可以看出線性成長的 speedup
A: 在 VIEW 1,speedup隨著number of threads上升不會有線性成長,
是因每個 thread 工作量分配極度不均的關係。
Q3: In your write-up, describe your approach to parallelization and report the final 4-thread speedup obtained.
用一開始的方法,由上到下均分成n個區塊來分配給n個threads計算執行
但在 VIEW 1 由於淺色像素集中於同一區域導致工作量分配不均
因此我將 workerThreadStart()
修改成如下:
我將 workerThreadStart()
每一個thread執行mandelbrotSerial()
時,傳入的 totalRows=1
我以 number of threads=3為例:

這樣能將原本淺色像素集中區域,3個thread都會被分到部分計算執行,
如此能將工作量平均分配
- Final 4-thread speedup obtained
執行結果如下:
View 1: 3.80x speedup from 4 threads
View 2: 3.80x speedup from 4 threads
(Notice that the workstation server provides 4 cores 4 threads.)
當 number of threads=8 執行如下:
可以發現 number of threads=8 會比 number of threads=4 執行還要慢一些
我認為當number of threads=8時,speedup沒有上升反而下降,
是因為工作站提供的是 4 核心 4 執行序的 Server,
當使用 8 個 threads 時, core 被分配到的threads就可能會大於1
這時執行會多了 context switch 的 overhead
這也造成了整體執行時間比 number of threads=4 還要久的原因。