# 2017q1 Homework1 (compute-pi) ###### tags: `embedded` contributed by <`Cayonliow`> ### Reviewed by `PeterTing` * 可以試著優化 computepi.c 中的 compute_pi_leibniz ,像是用 OpenMP 或是 SIMD 的方式來做做看,或許可以發現更多的東西。 * 可以探討為什麼 OpenMP 最不穩定? * 可以用信賴區間 95%來對資料進行篩選。 * 可以對不同的時間函式進行研究 tags : `LibreOffice ` `SIMD` `Leibniz` `time` `error rate ` ## 作業 * 題目: [B02: compute-pi](https://hackmd.io/s/HkiJpDPtx#) * github(原來的): [compute-pi](https://github.com/sysprog21/compute-pi) * 參考資料: [廖健富學長提供的詳盡分析(hackpad)](https://hackpad.com/Hw1-Extcompute_pi-geXUeYjdv1I) ### 開發環境 ```shell= cayon@cayon-X550JX:~$ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 60 Model name: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz Stepping: 3 CPU MHz: 2423.484 CPU max MHz: 3600.0000 CPU min MHz: 800.0000 BogoMIPS: 5188.45 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K NUMA node0 CPU(s): 0-7 ``` ## 工具 * LibreOffice * 文章: [教學](http://www.ylm.edu.hk/subject/com/ebook/Calc_2011.pdf) * SIMD * 文章: [SIMD技術](http://share.onlinesjtu.com/mod/tab/view.php?id=303) , [GCC中SIMD指令的應用方法](http://www.ibm.com/developerworks/cn/linux/l-gccsimd/) , [Intel MMX/SSE指令集介紹](http://aerobme.pixnet.net/blog/post/39107691-intel-mmx-sse%E6%8C%87%E4%BB%A4%E9%9B%86%E4%BB%8B%E7%B4%B9) ## 原理與概念 * 圓周率 * 文章: [積分計算圓周率π](http://book.51cto.com/art/201506/480985.htm) , [Function](http://www.csie.ntnu.edu.tw/~u91029/Function2.html) * Leibniz * 文章: [萊布尼茲的微積分](http://episte.math.ntu.edu.tw/articles/sm/sm_17_03_2/) , [Leibniz formula for π](https://en.wikipedia.org/wiki/Leibniz_formula_for_%CF%80) * 黎曼積分 * 文章: [wikipedia](https://zh.wikipedia.org/wiki/%E9%BB%8E%E6%9B%BC%E7%A7%AF%E5%88%86) * time * Wall-clock time (掛鐘時間) * elapsed time * 指一段程序从运行到终止,系统时钟走过的时间。一般来说,系统时间都是要大于CPU时间的。通常这类时间可以由系统提供,在C++/Windows中,可以由`<time.h>`提供。注意得到的时间精度是和系统有关系的 * 文章: [學長姐們的筆記 (1)](https://charles620016.hackpad.com/ep/pad/static/OfmE372fliX) , [學長姐們的筆記 (2)](https://hackmd.io/s/S1s6k2ip#) , [什麼是 wall clock time ](http://pwcrab.blog.163.com/blog/static/1699038222011107101054946/) , [wall time](http://whatis.techtarget.com/definition/wall-time-real-world-time-or-wall-clock-time) * CPU time * 電腦本身是一個sequential circuit 循序電路 而讓sequential circuit能夠正常順利的運作 最關鍵的就是需要有一個clock 去控制資料的同步以及更新 也因此電腦最基本的時間單位就是 clock cycle * CPU time 可以視為是一個程式在CPU裡面執行了多少個 clock cycle 再乘上每一個clock cycle它實際花了多少的時間 * 文章: [CPU Time](https://chi_gitbook.gitbooks.io/personal-note/content/cpu_time.html) * ![](https://i.imgur.com/hPPsi18.jpg) ## 開發記錄 * 把專案 `git clone` 下來, 研究 LibreOffice 來制圖 * 可是制出來的圖數據是累加的, 搞不懂如何將數據獨立出來 * 所以先用之前作業的 gnuplot 制圖, 看結果 ![](https://i.imgur.com/TKi4B82.png) 遇到的問題: ```shell=vim cayon@cayon-X550JX:~/embedded/hw1/compute-pi$ make plot gnuplot scripts/runtime.gp plot[:][:]'result_clock_gettime.csv' using 1:2 title 'Baseline', '' using 1:3 title 'OpenMP(2 threads)', '' using 1:4 title 'OpenMP(4 threads)', '' using 1:5 title 'AVX', '' using 1:6 title 'AVX + Unroll looping', \ ^ "scripts/runtime.gp", line 14: invalid character \ Makefile:44: recipe for target 'plot' failed make: *** [plot] Error 1 ``` >>>>原來 `\` 後面不能有任何[空格] [color=blue] * 執行 `$ make check` 得到各個方式所實做出來計算 pi 的時間 ``` time ./time_test_baseline N = 400000000 , pi = 3.141593 3.26user 0.00system 0:03.27elapsed 99%CPU (0avgtext+0avgdata 1788maxresident)k 0inputs+0outputs (0major+84minor)pagefaults 0swaps time ./time_test_openmp_2 N = 400000000 , pi = 3.141593 3.24user 0.00system 0:01.62elapsed 199%CPU (0avgtext+0avgdata 1724maxresident)k 0inputs+0outputs (0major+87minor)pagefaults 0swaps time ./time_test_openmp_4 N = 400000000 , pi = 3.141593 4.86user 0.00system 0:01.60elapsed 303%CPU (0avgtext+0avgdata 1784maxresident)k 0inputs+0outputs (0major+92minor)pagefaults 0swaps time ./time_test_avx N = 400000000 , pi = 3.141593 0.91user 0.00system 0:00.91elapsed 99%CPU (0avgtext+0avgdata 1712maxresident)k 0inputs+0outputs (0major+86minor)pagefaults 0swaps time ./time_test_avxunroll N = 400000000 , pi = 3.141593 0.87user 0.00system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 1764maxresident)k 0inputs+0outputs (0major+85minor)pagefaults 0swaps ``` * 參考 [petermouse 的共筆](https://hackmd.io/s/Hk2DSIYp#) * 在benchmark_clock_gettime.c中, 將表格以.txt開啟,或是觀察printf輸出格式,可以發現到格式其實是以逗號分隔輸出: ``` 100,0.000025,0.000072,0.000076,0.000017,0.000007 5100,0.001208,0.000596,0.001392,0.001637,0.000525 10100,0.002192,0.001114,0.001097,0.000590,0.000555 15100,0.003449,0.001743,0.000894,0.000868,0.000840 20100,0.004585,0.003216,0.002393,0.001183,0.001291 25100,0.005393,0.002688,0.002675,0.001436,0.001384 ``` 使用datafile separator ",",gnuplot才能區分每一行 的每一筆資料 * 原始版本的 `benchmark_clock_gettime.c`,每種方式都執行了25次 ```shell=vim int N = atoi(argv[1]); int i, loop = 25; // Baseline clock_gettime(CLOCK_ID, &start); for (i = 0; i < loop; i++) { compute_pi_baseline(N); ``` 將執行的次數調成1000 輸出結果:!![](https://i.imgur.com/rOt38e6.png) 雖然很明顯 Baseline 的斜率最大, 時間最長。 可是在 OpenMP(4 threads) , AVX , AVX + Unroll looping 這三個實作的結果來看, 線條太靠近,加上 OpenMP(4 threads) 跟 AVX 有不穩定的狀況出現, 數據結果的比較不太明顯 所以試着將執行的次數調成10000 輸出結果:![](https://i.imgur.com/DHJjm4Y.png) OpenMP(4 threads) 跟 AVX de1不穩定狀況消失了 可是結果線條還是太密集, 無法直接用肉眼判斷 可是我們還是能夠知道: AVX + Unroll looping > AVX >>> OpenMP(2 threads) > OpenMP(4 threads) >>> Baseline 所以試着將執行的次數調成100000 可是分別不大, 可是程式的執行時間 (`make gencsv`) 變長了很多 所以試着將執行的次數調回25 將 N 調大 輸出結果:![](https://i.imgur.com/6J5hZ4Q.png) 在幾次的輸出結果, 發現 OpenMP(4 threads) 最爲不穩定 * 參考 [laochanlam 的共筆](https://hackmd.io/s/H1TUAblqg#)的方式實作 Leibniz's formula * 參考 [shelly4132 的共筆](https://hackmd.io/s/HJrLU0dp#) 的証明方法 * ![](https://i.imgur.com/splxhim.png) ```shell=vim double compute_pi_leibniz(size_t N) { double pi = 0.0; double term = 0.0; double sign = 1.0; for (size_t i = 0; i < N; i++) { term = (double) sign / ( 2.0 * i + 1.0 ); sign = -sign; pi += term; } return pi * 4.0; } ``` 輸出結果: ![](https://i.imgur.com/1AkjRFX.png) * error rate * 引用 : [廖健富學長提供的詳盡分析](https://hackpad.com/Hw1-Extcompute_pi-geXUeYjdv1I) ```shell=vim // Error rate calculation #define M_PI acos(-1.0) double pi = compute_pi(n); double diff = pi - M_PI > 0 ? pi - M_PI : M_PI - pi; double error = diff / M_PI; ``` 利用 cos^-1^(-1) 求 pi , error rate 就是與每一個版本的誤差值