# Programming Assignment II: Multi-thread Programming
###### tags: `PP_report`
###### student: `0716301劉育源`

## Part 2
* Q1: In your write-up, produce a graph of speedup compared to the reference sequential implementation as a function of the number of threads used FOR VIEW 1. Is speedup linear in the number of threads used? In your writeup hypothesize why this is (or is not) the case?
    * I modified main.cpp to run every thread count from 1 up to the number set by `-t`, and to record the time each thread takes. Output:

    |VIEW 1|VIEW 2|
    |-|-|
    |||

    * graph

    |VIEW 1|VIEW 2|
    |-|-|
    |||

    * answer:
    ```c=40
    static inline int mandel(float c_re, float c_im, int count)
    {
      float z_re = c_re, z_im = c_im;
      int i;
      for (i = 0; i < count; ++i)
      {

        if (z_re * z_re + z_im * z_im > 4.f)
          break;

        float new_re = z_re * z_re - z_im * z_im;
        float new_im = 2.f * z_re * z_im;
        z_re = c_re + new_re;
        z_im = c_im + new_im;
      }

      return i;
    }
    ```
    * Speedup is not linear. My hypothesis is that giving each thread one contiguous block of rows balances the work poorly. Two observations support this. First, at line 48 of the code above, a pixel that hits the `break` early returns a small value and costs little work. Second, mandelbrot-serial.ppm for VIEW 1 shows that the white regions (large output values) are clustered contiguously. Assigning contiguous rows therefore gives some threads a much heavier workload, and at join time the other threads must wait for them to finish.
* Q2: How do your measurements explain the speedup graph you previously created?
    * The output above shows that in VIEW 1 the threads assigned to the large white region in the middle all take longer, for example T1 with 3 threads, and T1 and T2 with 4 threads. For VIEW 2, the thing to look at is the time taken by T0 in each run.
* Q3: In your write-up, describe your approach to parallelization and report the final 4-thread speedup obtained.
    * output:

    |VIEW 1|VIEW 2|
    |-|-|
    |||

    :::info
    The pictures are so small and blurry.
    >[name=TA]
    :::
    * graph:

    |VIEW 1|VIEW 2|
    |-|-|
    |||

    * answer:
        * My improvement interleaves the rows across threads: the thread with id `threadId` handles row `(numThread*n+threadId)` for n = 0, 1, 2, 3, ..., so that the regions with a heavier workload are shared among more threads.
        * speedup when numThread == 4: 3.79x for VIEW 1, 3.81x for VIEW 2
* Q4: Now run your improved code with eight threads. Is performance noticeably greater than when running with four threads? Why or why not? (Notice that the workstation server provides 4 cores 4 threads.)
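    * With thread num <= 4, speedup grows linearly. As the question notes, the workstation has only 4 cores and 4 threads (4C4T), so with thread num > 4 the speedup stops increasing. The same holds for VIEW 2.
    * Data for VIEW 1 on an i7-6700 (4C8T):

    |output|graph|
    |-|-|
    |||

    * On another machine with 4C8T, speedup was also roughly linear for thread num <= 8, but performance dropped noticeably once the count went above 8 threads (the workstation shows the same problem). I believe this comes from context-switch overhead when several logical threads share the same physical thread, together with cache pollution and migration between cores, so that adding more threads actually lowers performance.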
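The contrast between the contiguous split from Q1 and the interleaved split from Q3 can be sketched in a few lines. This is a minimal illustration, not the assignment's actual `mandelbrotThread.cpp`: the `View` struct and the `renderAndMeasure` and `imbalance` helpers are hypothetical names introduced here, and each worker's total iteration count is used as a deterministic stand-in for its run time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Same escape-time kernel as in the report: returns the iteration at which
// the point escapes, or `count` if it never does.
static inline int mandel(float c_re, float c_im, int count) {
  float z_re = c_re, z_im = c_im;
  int i;
  for (i = 0; i < count; ++i) {
    if (z_re * z_re + z_im * z_im > 4.f) break;
    float new_re = z_re * z_re - z_im * z_im;
    float new_im = 2.f * z_re * z_im;
    z_re = c_re + new_re;
    z_im = c_im + new_im;
  }
  return i;
}

// Hypothetical description of the region to render (not from the assignment).
struct View {
  float x0, y0, x1, y1;
  int width, height, maxIter;
};

// Computes one row into `out` and returns the iterations spent on it,
// which serves as a proxy for how long the row takes.
static long computeRow(const View& v, int row, int* out) {
  const float dx = (v.x1 - v.x0) / v.width;
  const float dy = (v.y1 - v.y0) / v.height;
  long cost = 0;
  for (int col = 0; col < v.width; ++col) {
    int it = mandel(v.x0 + col * dx, v.y0 + row * dy, v.maxIter);
    out[row * v.width + col] = it;
    cost += it;
  }
  return cost;
}

// Renders the view with numThreads workers and returns each worker's load.
// interleaved == false: each worker gets one contiguous block of rows.
// interleaved == true:  worker t gets rows t, t + numThreads, t + 2*numThreads, ...
std::vector<long> renderAndMeasure(const View& v, int numThreads,
                                   bool interleaved, std::vector<int>& out) {
  assert(numThreads > 0);
  out.assign(static_cast<std::size_t>(v.width) * v.height, 0);
  std::vector<long> load(numThreads, 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&, t] {
      long sum = 0;
      if (interleaved) {
        for (int row = t; row < v.height; row += numThreads)
          sum += computeRow(v, row, out.data());
      } else {
        int rowsPer = (v.height + numThreads - 1) / numThreads;
        int begin = t * rowsPer;
        int end = std::min(v.height, begin + rowsPer);
        for (int row = begin; row < end; ++row)
          sum += computeRow(v, row, out.data());
      }
      load[t] = sum;  // each worker writes only its own slot
    });
  }
  for (auto& w : workers) w.join();
  return load;
}

// Busiest worker's load divided by the average load; 1.0 means perfect balance.
double imbalance(const std::vector<long>& load) {
  long total = 0, worst = 0;
  for (long l : load) {
    total += l;
    worst = std::max(worst, l);
  }
  return total == 0 ? 1.0 : static_cast<double>(worst) * load.size() / total;
}
```

Calling `renderAndMeasure` twice on the same `View`, once with `interleaved` set to `false` and once to `true`, and passing both results to `imbalance` gives a rough picture of how unevenly each split distributes the work. The bounds `{-2, -1, 1, 1}` are an assumed approximation of VIEW 1. Since the join waits for the busiest worker, a larger ratio would be expected to translate into a lower speedup, though real timings also depend on the machine.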