# 2016q3 Homework1 (compute-pi) ## 開發環境 * OS:Lubuntu 16.04 LTS * Memory: 8 GB * CPU: * Name: * Intel(R) Core(TM) i5-4200H CPU @ 2.80GHz * Cache: * L1d Cache: 32K * L1i Cache: 32K * L2 Cache: 256K * L3 Cache: 3072K ## Leibniz formula for π [Leibniz formula for π的詳細證明](https://proofwiki.org/wiki/Leibniz%27s_Formula_for_Pi) 其他取得π值的方法: (1)[黎曼ζ函數](https://zh.wikipedia.org/wiki/%E9%BB%8E%E6%9B%BC%CE%B6%E5%87%BD%E6%95%B8) ζ(2)=π^2^/6; (2)[沃利斯乘積](https://zh.wikipedia.org/wiki/%E6%B2%83%E5%88%A9%E6%96%AF%E4%B9%98%E7%A7%AF) (3)[高斯積分](https://zh.wikipedia.org/wiki/%E9%AB%98%E6%96%AF%E7%A7%AF%E5%88%86) (4)... ## Makefile 一開始完全不知道程式要怎麼產生想要的資料,所以觀察了一下Makefile,發現比起phonebook,又多一些看不懂的東西。 ```= gencsv: default for i in `seq 100 5000 25000`; do \ printf "%d," $$i;\ ./benchmark_clock_gettime $$i; \ done > result_clock_gettime.csv ``` 原來Makefile也可以用迴圈,而`seq 100 5000 20000`代表起始值100,等差5000,20000代表結束,其將作benchmark_clock_gettime的argv[1],printf與benchmark_clock_gettime將匯入result_clock_gettime.csv。 ## 觀察 在`benchmark_clock_gettime.c`中,每種方式都執行了25次,並且將其紀錄於表格中,如果將表格以`.txt`開啟,或是觀察`printf`輸出格式,可以發現到格式其實是以逗號分隔: ``` 100,0.000026,0.000072,0.000061,0.000021,0.000011 5100,0.001253,0.000599,0.000637,0.000517,0.000460 10100,0.002174,0.001128,0.001122,0.001019,0.000891 15100,0.003443,0.001682,0.003035,0.001511,0.001016 20100,0.004341,0.002275,0.002194,0.002971,0.001194 ``` 因此在做圖時,script需要`set datafile separator ","`,gnuplot才能區分每一行。 花了很久時間才寫好script ``` reset set datafile separator "," set title "compute-pi(25 rounds)" set xlabel 'N' set ylabel 'time(sec)' set terminal png size 1000,800 enhanced font 'Verdana,14' set output 'plot.png' plot "result_clock_gettime.csv" u 1:2 w lp title "Baseline",\ "result_clock_gettime.csv" u 1:3 w lp title "OpenMP with 2 threads",\ "result_clock_gettime.csv" u 1:4 w lp title "OpenMP with 4 threads",\ "result_clock_gettime.csv" u 1:5 w lp title "AVX SIMD" ,\ "result_clock_gettime.csv" u 1:5 w lp title "AVX SIMD + Loop unrolling" ,\ ``` 圖片: ![](https://i.imgur.com/dhrqC0o.png) 我發現每次執行都有落差(尤其是OpenMP(4)),可能時間短得不明顯又很少次,所以我把25次改為10000次 ![](https://i.imgur.com/O8Awfth.png) 這次有了比較一致的答案: 效率上AVX+LR > AVX >>> OMP(2) > OMP(4) >>> Baseline 接著,Loop數不變,將N放大 ![](https://i.imgur.com/D967MJ5.png) ![](https://i.imgur.com/njGoCBN.png) 在N比較小的情況下,OpenMP(4)有機會比Baseline效率還低,OpenMP(4)整體來說似乎不太穩定。 OpenMP(2)在9e5~1e6間也出現了明顯的升高,但在另一張圖沒有出現,對此沒有結論。 .... ## 了解改進方法 ### OpenMP 平行化需要注意到的問題,就是變數的存取,不同執行緒對共享變數做修改/讀取,可能會產生不可預期的結果,OpenMP提供了很多方法來讓我們決定變數要怎麼運作。 參考[OpenMP變數的平行化](https://kheresy.wordpress.com/2006/09/22/%E7%B0%A1%E6%98%93%E7%9A%84%E7%A8%8B%E5%BC%8F%E5%B9%B3%E8%A1%8C%E5%8C%96%EF%BC%8Dopenmp%EF%BC%88%E4%BA%94%EF%BC%89-%E8%AE%8A%E6%95%B8%E7%9A%84%E5%B9%B3%E8%A1%8C%E5%8C%96/)歸納以下幾個常用的: private:每個執行緒有各自的變數(但不會初始化值!)。 firstprivate:同private,但變數值為執行緒前的變數值。 lastprivate:同private,但原變數值會改為執行緒結束時的值。 atomic:防止多個執行緒修改。 reduction:每個執行緒有各自的變數(值為單位元素),最後在合併。 ### AVX SIMD 在raytracing時有使用過,但不是很理想。 在這裡充分的運用,ymm0~2為係數,ymm3作為運算,ymm4儲存運算結果 ### AVX SIMD + loop unrolling 利用到更多的avx暫存器,可以減少迴圈的數量,一個迴圈處理16個double,每個都多了原本的ymm2、ymm3、ymm4,等於一個迴圈中用了5+3+3+3=14個暫存器。 ## 關於time ### xtime vs jiffies ### cpu time [作業說明](https://hackmd.io/s/rJARexQT)中有提到CPU time是每條執行緒的時間加總,為了觀察到這個結果,我把每一種方法個別執行 ``` petermouse@pm-X550JK:~/ws/compute-pi$ time ./time_test_baseline N = 400000000 , pi = 3.141593 real 0m3.360s user 0m3.356s sys 0m0.000s petermouse@pm-X550JK:~/ws/compute-pi$ time ./time_test_openmp_2 N = 400000000 , pi = 3.141593 real 0m2.287s user 0m4.556s sys 0m0.000s petermouse@pm-X550JK:~/ws/compute-pi$ time ./time_test_openmp_4SIMD register load/store N = 400000000 , pi = 3.141593 real 0m1.772s user 0m7.032s sys 0m0.000s ``` 可以發現到user time與real time的關係,與執行緒的數量密切相關 ### ## 參考資料 [Makefile 語法簡介](http://tetralet.luna.com.tw/?op=ViewArticle&articleId=185) [使用gnuplot科學作圖](http://www.phy.fju.edu.tw/~ph116j/gnuplot_tutorial.pdf) [Visual C++ OpenMP doc](https://msdn.microsoft.com/en-us/library/0ca2w8dk.aspx) [OpenMP變數的平行化](https://kheresy.wordpress.com/2006/09/22/%E7%B0%A1%E6%98%93%E7%9A%84%E7%A8%8B%E5%BC%8F%E5%B9%B3%E8%A1%8C%E5%8C%96%EF%BC%8Dopenmp%EF%BC%88%E4%BA%94%EF%BC%89-%E8%AE%8A%E6%95%B8%E7%9A%84%E5%B9%B3%E8%A1%8C%E5%8C%96/) [clock_gettime](http://man7.org/linux/man-pages/man2/clock_gettime.2.html) [knowing the current time](http://www.makelinux.net/ldd3/chp-7-sect-2)