# 2017q1 Homework1 (compute-pi)
###### tags: `embedded`
contributed by <`Cayonliow`>
### Reviewed by `PeterTing`
* 可以試著優化 computepi.c 中的 compute_pi_leibniz ,像是用 OpenMP 或是 SIMD 的方式來做做看,或許可以發現更多的東西。
* 可以探討為什麼 OpenMP 最不穩定?
* 可以用信賴區間 95%來對資料進行篩選。
* 可以對不同的時間函式進行研究
tags : `LibreOffice ` `SIMD` `Leibniz` `time` `error rate `
## 作業
* 題目: [B02: compute-pi](https://hackmd.io/s/HkiJpDPtx#)
* github(原來的): [compute-pi](https://github.com/sysprog21/compute-pi)
* 參考資料: [廖健富學長提供的詳盡分析(hackpad)](https://hackpad.com/Hw1-Extcompute_pi-geXUeYjdv1I)
### 開發環境
```shell=
cayon@cayon-X550JX:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
Stepping: 3
CPU MHz: 2423.484
CPU max MHz: 3600.0000
CPU min MHz: 800.0000
BogoMIPS: 5188.45
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-7
```
## 工具
* LibreOffice
* 文章: [教學](http://www.ylm.edu.hk/subject/com/ebook/Calc_2011.pdf)
* SIMD
* 文章: [SIMD技術](http://share.onlinesjtu.com/mod/tab/view.php?id=303) , [GCC中SIMD指令的應用方法](http://www.ibm.com/developerworks/cn/linux/l-gccsimd/) , [Intel MMX/SSE指令集介紹](http://aerobme.pixnet.net/blog/post/39107691-intel-mmx-sse%E6%8C%87%E4%BB%A4%E9%9B%86%E4%BB%8B%E7%B4%B9)
## 原理與概念
* 圓周率
* 文章: [積分計算圓周率π](http://book.51cto.com/art/201506/480985.htm) , [Function](http://www.csie.ntnu.edu.tw/~u91029/Function2.html)
* Leibniz
* 文章: [萊布尼茲的微積分](http://episte.math.ntu.edu.tw/articles/sm/sm_17_03_2/) , [Leibniz formula for π](https://en.wikipedia.org/wiki/Leibniz_formula_for_%CF%80)
* 黎曼積分
* 文章: [wikipedia](https://zh.wikipedia.org/wiki/%E9%BB%8E%E6%9B%BC%E7%A7%AF%E5%88%86)
* time
* Wall-clock time (掛鐘時間)
* elapsed time
* 指一段程序从运行到终止,系统时钟走过的时间。一般来说,系统时间都是要大于CPU时间的。通常这类时间可以由系统提供,在C++/Windows中,可以由`<time.h>`提供。注意得到的时间精度是和系统有关系的
* 文章: [學長姐們的筆記 (1)](https://charles620016.hackpad.com/ep/pad/static/OfmE372fliX) , [學長姐們的筆記 (2)](https://hackmd.io/s/S1s6k2ip#) , [什麼是 wall clock time ](http://pwcrab.blog.163.com/blog/static/1699038222011107101054946/) , [wall time](http://whatis.techtarget.com/definition/wall-time-real-world-time-or-wall-clock-time)
* CPU time
* 電腦本身是一個sequential circuit 循序電路 而讓sequential circuit能夠正常順利的運作 最關鍵的就是需要有一個clock 去控制資料的同步以及更新 也因此電腦最基本的時間單位就是 clock cycle
* CPU time 可以視為是一個程式在CPU裡面執行了多少個 clock cycle 再乘上每一個clock cycle它實際花了多少的時間
* 文章: [CPU Time](https://chi_gitbook.gitbooks.io/personal-note/content/cpu_time.html)
* ![](https://i.imgur.com/hPPsi18.jpg)
## 開發記錄
* 把專案 `git clone` 下來, 研究 LibreOffice 來制圖
* 可是制出來的圖數據是累加的, 搞不懂如何將數據獨立出來
* 所以先用之前作業的 gnuplot 制圖, 看結果
![](https://i.imgur.com/TKi4B82.png)
遇到的問題:
```shell=vim
cayon@cayon-X550JX:~/embedded/hw1/compute-pi$ make plot
gnuplot scripts/runtime.gp
plot[:][:]'result_clock_gettime.csv' using 1:2 title 'Baseline', '' using 1:3 title 'OpenMP(2 threads)', '' using 1:4 title 'OpenMP(4 threads)', '' using 1:5 title 'AVX', '' using 1:6 title 'AVX + Unroll looping', \
^
"scripts/runtime.gp", line 14: invalid character \
Makefile:44: recipe for target 'plot' failed
make: *** [plot] Error 1
```
>>>>原來 `\` 後面不能有任何[空格] [color=blue]
* 執行 `$ make check` 得到各個方式所實做出來計算 pi 的時間
```
time ./time_test_baseline
N = 400000000 , pi = 3.141593
3.26user 0.00system 0:03.27elapsed 99%CPU (0avgtext+0avgdata 1788maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
time ./time_test_openmp_2
N = 400000000 , pi = 3.141593
3.24user 0.00system 0:01.62elapsed 199%CPU (0avgtext+0avgdata 1724maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
time ./time_test_openmp_4
N = 400000000 , pi = 3.141593
4.86user 0.00system 0:01.60elapsed 303%CPU (0avgtext+0avgdata 1784maxresident)k
0inputs+0outputs (0major+92minor)pagefaults 0swaps
time ./time_test_avx
N = 400000000 , pi = 3.141593
0.91user 0.00system 0:00.91elapsed 99%CPU (0avgtext+0avgdata 1712maxresident)k
0inputs+0outputs (0major+86minor)pagefaults 0swaps
time ./time_test_avxunroll
N = 400000000 , pi = 3.141593
0.87user 0.00system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 1764maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps
```
* 參考 [petermouse 的共筆](https://hackmd.io/s/Hk2DSIYp#)
* 在benchmark_clock_gettime.c中, 將表格以.txt開啟,或是觀察printf輸出格式,可以發現到格式其實是以逗號分隔輸出:
```
100,0.000025,0.000072,0.000076,0.000017,0.000007
5100,0.001208,0.000596,0.001392,0.001637,0.000525
10100,0.002192,0.001114,0.001097,0.000590,0.000555
15100,0.003449,0.001743,0.000894,0.000868,0.000840
20100,0.004585,0.003216,0.002393,0.001183,0.001291
25100,0.005393,0.002688,0.002675,0.001436,0.001384
```
使用datafile separator ",",gnuplot才能區分每一行 的每一筆資料
* 原始版本的 `benchmark_clock_gettime.c`,每種方式都執行了25次
```shell=vim
int N = atoi(argv[1]);
int i, loop = 25;
// Baseline
clock_gettime(CLOCK_ID, &start);
for (i = 0; i < loop; i++) {
compute_pi_baseline(N);
```
將執行的次數調成1000
輸出結果:!![](https://i.imgur.com/rOt38e6.png)
雖然很明顯 Baseline 的斜率最大, 時間最長。 可是在 OpenMP(4 threads) , AVX , AVX + Unroll looping 這三個實作的結果來看, 線條太靠近,加上 OpenMP(4 threads) 跟 AVX 有不穩定的狀況出現, 數據結果的比較不太明顯
所以試着將執行的次數調成10000
輸出結果:![](https://i.imgur.com/DHJjm4Y.png)
OpenMP(4 threads) 跟 AVX de1不穩定狀況消失了 可是結果線條還是太密集, 無法直接用肉眼判斷
可是我們還是能夠知道: AVX + Unroll looping > AVX >>> OpenMP(2 threads) > OpenMP(4 threads) >>> Baseline
所以試着將執行的次數調成100000
可是分別不大, 可是程式的執行時間 (`make gencsv`) 變長了很多
所以試着將執行的次數調回25 將 N 調大
輸出結果:![](https://i.imgur.com/6J5hZ4Q.png)
在幾次的輸出結果, 發現 OpenMP(4 threads) 最爲不穩定
* 參考 [laochanlam 的共筆](https://hackmd.io/s/H1TUAblqg#)的方式實作 Leibniz's formula
* 參考 [shelly4132 的共筆](https://hackmd.io/s/HJrLU0dp#) 的証明方法
* ![](https://i.imgur.com/splxhim.png)
```shell=vim
double compute_pi_leibniz(size_t N)
{
double pi = 0.0;
double term = 0.0;
double sign = 1.0;
for (size_t i = 0; i < N; i++) {
term = (double) sign / ( 2.0 * i + 1.0 );
sign = -sign;
pi += term;
}
return pi * 4.0;
}
```
輸出結果: ![](https://i.imgur.com/1AkjRFX.png)
* error rate
* 引用 : [廖健富學長提供的詳盡分析](https://hackpad.com/Hw1-Extcompute_pi-geXUeYjdv1I)
```shell=vim
// Error rate calculation
#define M_PI acos(-1.0)
double pi = compute_pi(n);
double diff = pi - M_PI > 0 ? pi - M_PI : M_PI - pi;
double error = diff / M_PI;
```
利用 cos^-1^(-1) 求 pi , error rate 就是與每一個版本的誤差值