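# 2016q3 Homework1 (compute_pi)

contributed by <`abba123`>

## Development environment

OS: Ubuntu 16.04.1 LTS

## Experiment

First, we run the five original compute_pi variants once: baseline, baseline_openmp_2, baseline_openmp_4, baseline_AVX, and baseline_AVX_unrolling.

```
$ make gencsv
```

The gencsv target is defined as follows:

```
gencsv: default
	for i in `seq 1000 5000 10000000`; do \
		printf "%d " $$i;\
		./benchmark_clock_gettime $$i; \
	done > result_clock_gettime.csv
```

This is equivalent to the for loop below: start at 1000 and add 5000 each time until the value exceeds 10000000.

```c
for (int i = 1000; i <= 10000000; i += 5000) {
    /* run ./benchmark_clock_gettime with N = i */
}
```

This produces result_clock_gettime.csv, a file that records the results. We then analyze the file and plot it. The gnuplot script is shown below:

```
reset
set xlabel 'N'
set ylabel 'sec'
set style fill solid
set title 'performance comparison'
set term png enhanced font 'Verdana,10'
set output 'runtime.png'
plot "result_clock_gettime.csv" using 1:2 smooth csplines lw 2 title "Baseline", \
'' using 1:3 smooth csplines lw 2 title "Baseline with OpenMP_2", \
'' using 1:4 smooth csplines lw 2 title "Baseline with OpenMP_4", \
'' using 1:5 smooth csplines lw 2 title "AVX SIMD", \
'' using 1:6 smooth csplines lw 2 title "AVX SIMD + Loop unrolling"
```

```
$ gnuplot plot.gp
```

We use result_clock_gettime.csv as the input data and plot it as a line chart.

![](https://i.imgur.com/h30iDfn.png)

From the chart we can see the following:

* The original baseline performs poorly, especially when N is large.
* 2 threads and 4 threads perform the same. This is probably because my machine has only two cores, so going up to 4 threads brings no extra gain. From here on I run everything with 2 threads.
* AVX SIMD + loop unrolling gives the best performance.

## Computing pi with other methods

I found two other methods to compute pi.

* Another form of the Leibniz series
https://crypto.stanford.edu/pbc/notes/pi/glseries.html

```c
double compute_pi_Leibniz_2(size_t N)
{
    double pi = 0.0;
    double x = 0.0;
    for (size_t i = 0; i < N; i++) {
        /* i-th term of 1 - 1/3 + 1/5 - 1/7 + ... */
        x = 1 / ((double) i * 2 + 1);
        if (i % 2 == 0)
            pi += x;
        else
            pi -= x;
    }
    return 4 * pi;
}
```

* Wallis Product
https://crypto.stanford.edu/pbc/notes/pi/wallis.html

```c
double compute_pi_Wallis_Product(size_t N)
{
    double pi = 1.0;
    double x = 0.0;
    for (size_t i = 0; i < N; i++) {
        /* with n = i + 1, this factor is (2n / (2n - 1)) * (2n / (2n + 1)) */
        x = 2 * (double) i;
        pi *= ((x + 2) / (x + 1)) * ((x + 2) / (x + 3));
    }
    return 2 * pi;
}
```

Adding these two versions to the chart and running everything again:

![](https://i.imgur.com/l2n68pZ.png)

> The curves are smooth here, probably because I was not using the computer while the benchmark ran, so there was no outside interference.

In terms of run time, baseline >= Wallis >>>> Leibniz_2.

The first two only show a small gap when N is very large, and the situation is about the same even with OpenMP. The third one is clearly much faster. It turns out that the same method, written as a different expression, can differ this much!

## FUTURE WORK

* Compute the error rate (half done...)
* Try using confidence intervals to discard unreasonable data points
* Rewrite the newly added methods with AVX SIMD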
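
As a starting point for the first future-work item, here is a minimal sketch of how the error of the two added series could be measured. It is not the author's half-finished version. The helper names `leibniz_pi`, `wallis_pi`, and `relative_error` are hypothetical, the two series are re-stated so the block stands alone, and using `acos(-1.0)` as the reference value of pi is an assumption.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: the same alternating series as compute_pi_Leibniz_2. */
double leibniz_pi(size_t N)
{
    double sum = 0.0;
    for (size_t i = 0; i < N; i++) {
        double term = 1.0 / (2.0 * (double) i + 1.0);
        sum += (i % 2 == 0) ? term : -term;
    }
    return 4.0 * sum;
}

/* Hypothetical helper: the same product as compute_pi_Wallis_Product. */
double wallis_pi(size_t N)
{
    double prod = 1.0;
    for (size_t i = 0; i < N; i++) {
        double x = 2.0 * (double) i;
        prod *= ((x + 2.0) / (x + 1.0)) * ((x + 2.0) / (x + 3.0));
    }
    return 2.0 * prod;
}

/* Relative error of an approximation against a reference pi.
 * Assumption: acos(-1.0) is accurate enough to serve as the reference. */
double relative_error(double approx)
{
    const double ref = acos(-1.0);
    return fabs(approx - ref) / ref;
}
```

Calling `relative_error(leibniz_pi(N))` and `relative_error(wallis_pi(N))` for the same N values that gencsv iterates over would give an error column next to each timing column, so speed and accuracy can be compared in one plot. Link with `-lm` when building.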