2017q1 Homework1 (compute-pi)
contributed by <vtim9907
>
Reviewed by <rayleigh0407
>
開發環境
- OS : Ubuntu 16.04.2 LTS
- Kernel Version : 4.8.0-36-generic
- Memory : 32 GB
- CPU : Intel® Core™ i5-6600 Processor
- cache :
- L1i cache : 4*32 KB
- L1d cache : 4*32 KB
- L2 cache : 4*256 KB
- L3 cache : 6144 KB
- L1 cache imformation ( sudo dmidecode -t cache ) :
Handle 0x001E, DMI type 7, 19 bytes
Cache Information
Socket Designation: L1 Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 256 kB
Maximum Size: 256 kB
Supported SRAM Types:
Synchronous
Installed SRAM Type: Synchronous
Speed: Unknown
Error Correction Type: Parity
System Type: Unified
Associativity: 8-way Set-associative
基本實驗
25 rounds, N <= 21000
將原實驗結果視覺化呈現,先用 LibreOffice 畫看看 (25 round):
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
線實在有點粗,圖片也有點小,可能我還不太會用@@" 改用 Gnuplot 畫:
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
10000 rounds, N <= 50000
覺得取樣的間隔有點太大,決定改成 N 增加 1000 就取樣一遍,並將 N 放大到 50000,一次跑 10000 rounds :
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
從這張圖就很明顯可以看出,除了 4 threads 的方法以外,其他方法的波動都比較小,穩定遞增,但是使用了 4 threads 的 OpenMP 卻不斷的震盪,奇怪的是這張圖的前半段,就是 N 還小於兩萬左右時,結果與上面的測試不太相符…,我覺得這應該是因為 10000 rounds 中,有幾 round 因為一些不確定因素的關係,時間花的比較久,所以拉高了整體所花的時間。
95% 信賴區間
為了解決上面測不太準的問題,我用取信賴區間的方式,取 95% 信賴區間內的數據當參考,以外的就忽略掉,這樣就可以避免某幾 round 沒跑好,導致整體數據的錯誤。
首先可以簡單從高中課本或是 wiki 查到母體標準差的定義 :
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
簡單化簡一下 :
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
而根據查 Z 值的表可以知道,95% 的信賴區間落在 ±1.96 個標準差內,所以可以知道,直接從母體算出 95% 信賴區間的上下界公式為:
實做取信賴區間,我直接加了一個 function 進 benchmark_clock_gettime.c
:
double calc_ci (double sample_time[])
{
int j, count = 0;
double average = 0.0, total_time = 0.0, sigma, upper, lower, conf_interval = 0.0;
for (j = 0; j < SAMPLE_NUM; ++j) {
average += sample_time[j];
total_time += sample_time[j] * sample_time[j];
}
average /= SAMPLE_NUM;
sigma = sqrt(total_time / SAMPLE_NUM - average * average);
upper = average + 1.96 * sigma;
lower = average - 1.96 * sigma;
for (j = 0; j < SAMPLE_NUM; ++j) {
if (sample_time[j] > lower && sample_time[j] < upper) {
conf_interval += sample_time[j];
count++;
}
}
return conf_interval /= count;
}
為了不讓程式跑太久,我改成一次跑 500 rounds,全部跑 1000 次,從這 1000 次取 95% 信賴區間內的數據當參考,最後畫成圖:
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
這張圖我總共跑了五次,因為第一次跑還是有些微的波動,我怕是因為背景程式或是網頁的干擾,所以關掉了一些東西後重跑了四次,結果數據都很平滑,可見在測試效能時,還是要盡量排除不必要的不確定因素。
很明顯,這次的分析圖就相當平滑,也能看出當 N 越大時,OpenMP with 4 threads
還是比 AVX SIMD + Loop unrolling
快一點,是目前最快的優化版本。
計算誤差
參考了 廖健富學長的筆記,而選擇使用 逆推出來的 當作基準,來推算用上面幾個方法所算出來的 的誤差。
簡單寫了 function 來產生 .csv 檔,然後用 Gnuplot 畫出來:
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
由於一開始的誤差很大,所以畫成圖後,前半部會成垂直狀,很難看。
不過因為誤差會極度減小的範圍只有 N 還很小的時候,所以只要把 N 的起始值調高,數據應該就比較漂亮:
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
結果我想錯了 @@" 其實誤差值是持續穩定下降,所以區間只要選的太大,出來的圖形就會趨近直角,所以只好把區間縮小一點在畫出來;不過大略可以看出,若 趨近 ,算出來 的精確度就大約有 n 位。簡單證明
首先我們知道
又知 可以 power series 展開為
所以對等式 (1) 兩邊積分得到 :
然後因為 ,帶進 (2) 得到
實做
根據前面證明結果
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
可以簡化為:
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
有了上面的式子就很好實做。
實做方面我直接選擇用 OpenMP 使用 4 threads 去跑,然後同樣一次跑 500 rounds ,共跑 1000 次,然後取 95% 信賴區間內的結果當參考,用 Gnuplot 畫成圖:
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
新的公式比之前的版本快滿多的,幾乎是 Baseline 的三倍速。
計算 Leibniz formula for π 的誤差,與前一版的誤差一起畫成圖形做比較:
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
看出其實兩者的精度差不多,只是分別從 的上下逼近 。
Reference