2017q1 Homework1 (compute-pi)

contributed by < laochanlam >

tags: `laochanlam`

Reviewed by `yanglin5689446`

OpenMP4 的資料圖明顯跟其他方式相差很大，是否有個合理的解釋？
執行的時候是否有其他因素影響到 performance ，使得圖看起來不太規律？

Reviewed by `Daichou`

實驗數據在100~95100的資料是否有偏差資料，尤其是openmp4。

Reviewed by `Billy4195`

leibniz法可以加上平行化的後的加速，以及跟原本PI的算法做比較。
[疑問]為什麼OpenMP4的N與時間的圖會有上下起伏呢?

開發環境

Ubuntu 16.10 (64bit)

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 61
Model name:            Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
Stepping:              4
CPU MHz:               2400.878
CPU max MHz:           3000.0000
CPU min MHz:           500.0000
BogoMIPS:              4788.90
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3

Compute-pi 開發紀錄

先是按照作業要求重現一次實驗，把專案 clone 下來後，打開 computepi.c 可以清楚看到各種計算 pi 的不同優化版本。

分別為 baseline 版本, AVX SIMD 版本, AVX SIMD + Unroll looping 版本及 OpenMP ( 2 threads & 4 threads) 版本。

接著開始輸入 $ makecheck 及 $ make gencsv後，得到 result_clock_gettime.csv 這樣的數據，準備用 LibreOffice 繪圖。

在繪圖之前，先要搞懂資料的產生過程。

gencsv: default
	for i in `seq 100 5000 25000`; do \
		printf "%d," $$i;\
		./benchmark_clock_gettime $$i; \
	done > result_clock_gettime.csv

在Makefile中有這一段，看起來像是 Shell script 的東西，來研究一下。

for i in `seq 100 5000 25000`;

這一段就是i初始為 100 ,每執行多一次增加 5000 ，到 25000 就停止，所以一共會執行 5 次分別為 100 , 5100 , 10100 , 15100 , 20100 。

然後我把實驗數據改為100～95100，並用 LibreOffice 繪出圖像如下。

然後嘗試寫 gnuplot 的 script 畫圖。

因為 result_clock_gettime.csv 中用逗號作間隔，再加上各種對 gnuplot 的不熟悉，畫個圖快畫了我半個小時Orz。

本來我是用 "%lf,%lf,%lf,%lf,%lf,%lf" 來實現逗號分隔的數據讀取，但後來找到petermouse學長的共筆有更好的方法，就是 set datafile separator "," 即可實現逗號分隔。

附上程式碼供參考：

reset
set ylabel 'time(sec)'
set style data lines
set title 'Compute-pi'
set term png enhanced font 'Verdana,10'
set output 'runtime.png'
set xtics rotate by 45 right
set datafile separator ","

plot [:][:0.05]'result_clock_gettime.csv' using 1:2  title 'Baseline', \
'' using 1:3  title 'OpenMP ( 2 threads )', \
'' using 1:4  title 'OpenMP ( 4 threads )', \
'' using 1:5  title 'AVX', \
'' using 1:6  title 'AVX + Unroll looping', \
'' using 1:7  title 'Leibniz'

Leibniz formula for π 證明

參考shelly4132的共筆的証明方法，

先有以下分解：

對兩邊進行從 0 到 1 的積分，

當n趨向無限時,

得証

最後編寫程式並繪圖。

double compute_pi_leibniz(size_t N)
{
    double pi = 0.0;
    double term = 0.0;
    double sign = 1.0;
    for (size_t i = 0; i < N; i++) {
        term = (double) sign / ( 2.0 * i + 1.0 );
        sign = -sign;
        pi += term;
    }
    return pi * 4.0;
}

接下來等待有時間再進行其它優化。

補充知識

time
error rate

time

Wall-clock time

Wall-clock time 顧名思義就是掛鐘時間，也就是現實世界中實際經過的時間，是由 kernel 裡的 xtime 來紀錄，系統每次啟動時會先從設備上的 RTC 上讀入 xtime。

通常來說並不建議用來量測程式時間。

CPU time

CPU time 指的是程序在 CPU 上面運行消耗 (佔用) 的時間，clock() 就是很典型用來計算 CPU time 的時間函式，但要注意的是，如果有多條 threads，那 CPU time 算的是「每條 thread 的使用時間加總」。

當我們執行 $ time 時，會得出以下三個參數：

real time : Wall-clock time，注意這個時間會被執行中的其他程式所影響。
user time : 程式在 user mode 佔用所有的 CPU time 總和(若是多核計算時間會加總)
sys time : 程式在 kernel mode 佔用所有的 CPU time 總和。

參考及引用資料

B03: compute-pi

error rate

在廖健富學長提供的詳盡分析中計算 error rate 的函式是這樣的。

    // Error rate calculation

    #define M_PI acos(-1.0)

    double pi = compute_pi(n);

    double diff = pi - M_PI > 0 ? pi - M_PI : M_PI - pi;

    double error = diff / M_PI;

裡面是用了 arccos -1 來求 pi ，然後算它跟每個版本的誤差值來求 error rate。

參考及引用資料

廖健富學長提供的詳盡分析

進度

2017/03/01
- 重演實驗，並用 gnuplot畫圖

2017q1 Homework1 (compute-pi)

tags: laochanlam

Reviewed by yanglin5689446

Reviewed by Daichou

Reviewed by Billy4195

開發環境

相關連結

Compute-pi 開發紀錄

Leibniz formula for π 證明

補充知識

time

Wall-clock time

CPU time

參考及引用資料

error rate

參考及引用資料

進度

tags: `laochanlam`

Reviewed by `yanglin5689446`

Reviewed by `Daichou`

Reviewed by `Billy4195`