owned this note
owned this note
Published
Linked with GitHub
# 2016q3 Homework 1 (compute-pi)
###### tags: `sarah`,`compute-pi`
contributed by <`SarahCheng`>
github:
https://github.com/SarahYuHanCheng/hw1.git
## 開發記錄
* problem:
* reference:
* [AVX筆記](https://hackpad.com/Hw1-Extcompute_pi-geXUeYjdv1I)
* [TempoJiJi](https://hackmd.io/s/Hy7tssw6)同學
* [jkrvivian](https://hackmd.io/MYZgpgTAJmCsUFpYDNiwQFnMhBOAhmAGx5gYCMY+IUuRy5QA?view)同學
* [jeff60907](https://hackmd.io/EYUwxgrAjAbAZgZgLQA4DsEZICwCYIAmSwUccSaBcYUB29wcIQA=)同學
---
`make check`實驗結果
```=
benchmark_clock_gettime
time ./time_test_baseline
N = 400000000 , pi = 3.141593
13.75user 0.00system 0:13.75elapsed 99%CPU (0avgtext+0avgdata 1672maxresident)k
0inputs+0outputs (0major+78minor)pagefaults 0swaps
time ./time_test_openmp_2
N = 400000000 , pi = 3.141593
14.51user 0.00system 0:07.28elapsed 199%CPU (0avgtext+0avgdata 1680maxresident)k
0inputs+0outputs (0major+78minor)pagefaults 0swaps
time ./time_test_openmp_4
N = 400000000 , pi = 3.141593
14.81user 0.00system 0:04.10elapsed 360%CPU (0avgtext+0avgdata 1700maxresident)k
0inputs+0outputs (0major+82minor)pagefaults 0swaps
time ./time_test_avx
N = 400000000 , pi = 3.141593
1.45user 0.00system 0:01.45elapsed 99%CPU (0avgtext+0avgdata 1652maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
time ./time_test_avxunroll
N = 400000000 , pi = 3.141593
1.49user 0.00system 0:01.49elapsed 99%CPU (0avgtext+0avgdata 1608maxresident)k
0inputs+0outputs (0major+78minor)pagefaults 0swaps
```
---
### Gnuplot
STEP:
1. orig: `make gencsv`
```
for i in `seq 100 5000 25000`; do \
printf "%d," $i;\
./benchmark_clock_gettime $i; \
done > result_clock_gettime.csv
```
2. 新增 runtime.gp
```=
set xlabel 'N'
set ylabel 'Time(sec)'
set style fill solid
set title 'Compute_pi Time by clock_gettime() '
set term png enhanced font 'Verdana,10'
set output 'runtime.png'
set term pngcairo size 1280,960
set datafile separator ","
plot "result_clock_gettime.csv" using 1:2 title 'baseline' with lines lt rgb 'red' , \
"result_clock_gettime.csv" using 1:3 title 'openmp_2' with lines lt rgb 'blue' , \
"result_clock_gettime.csv" using 1:4 title 'openmp_4' with lines lt rgb 'green' ,\
"result_clock_gettime.csv" using 1:5 title 'AVX' with lines lt rgb 'orange' ,\
"result_clock_gettime.csv" using 1:6 title 'AVX + loop untolling' with lines lt rgb 'brown'
```
要plot "*csv" 必須加上第8行,不然會有一些莫名其妙的錯誤訊息
3. Makefile 加以下程式
```
plot: gencsv
gnuplot runtime.gp

clean:
rm -f $(EXECUTABLE) *.o *.s result_clock_gettime.csv runtime.png
```
-DBASELINE待解:
```
* 編譯時有`-DBASELINE`等等,其中的`-D`指的是`#define BASELINE 1`
* 這樣程式碼就可以塞在同一個.c .h中!只要在編譯的時候加上`-D`就好了!!
* Makefile的篇幅可以大幅降低!
* 我想phonebook也可以這樣改寫,不然不同的方法就還要在多生一組.c .h
```
* [許元杰Makefile+圓周](https://embedded2015.hackpad.com/-Ext1-Week-1-5rm31U3BOmh)
4. 執行 `make plot`,結果如下圖

> 未增加圓周率計算過程中迭代的數量,時間太短就看不出差異?
> 於Makefile中修改:
> `for i in `seq 1000 1000 250000`; `

### Wall-clock time vs. CPU time
**1\. Wall-clock time** 由 kernel 裡的 xtime 來記錄,系統每次啟動時會先從設備上的 [RTC](https://zh.wikipedia.org/wiki/%E5%AF%A6%E6%99%82%E6%99%82%E9%90%98) 上讀入 xtime。值是自 1970-01-01 起經歷的秒數,Linux 核心也會決定每秒要發生幾次中斷 (透過常數 HZ) ,每次 timer interrupt,就會去更新 `xtime`。可能會和 NTP 伺服器、時區、日光節約時間同步或使用者自己調整。所以「通常」不建議拿來量測程式時間,因為它不是一個穩定的時間,Wall-clock time 不一定是單調遞增 (monotonic)。
1. xtime: 從cmos電路中取得RTC的時間
2. jiffies: 從電腦開機到目前的總共時鍾中斷(time interrupt)次數
**2\. CPU time** CPU time 算的是「每條 thread 的使用時間加總」
很多時候可以用此判斷程式是 CPU bound 還是 I/O bound。
1. `real < user`: 表示程式是 CPU bound,使用平行處理有得到好處,效能有提昇。
2. `real ≒ user`: 表示程式是 CPU bound,即使使用平行處理,效能也沒有明顯提昇,常見的原因有可能是其實計算量沒有很大,生成、處理、結束多執行緒的 overhead 吃掉所有平行處理的好處,或是程式相依性太高,不適合拿來作平行處理。
3. `real > user`: 表示程式是 I/O bound,成本大部分為 I/O操作,使得平行處理無法帶來顯著的效能提昇 ( 或沒有提昇)。
---
### Compute-pi
方法有四:
**1\. 離散積分** $\pi$ = 4 *( $\int_0^1$ $1 \over 1+x^2$ dt )
* **Baseline**
```c
double computePi_v1(size_t N)
{
double pi = 0.0;
double dt = 1.0 / N; // dt = (b-a)/N, b = 1, a = 0
for (size_t i = 0; i < N; i++) {
double x = (double) i / N; // x = ti = a+(b-a)*i/N = i/N
pi += dt / (1.0 + x * x); // integrate 1/(1+x^2), i = 0....N
}
return pi * 4.0;
}
```
* **AVX SIMD**
* AVX (Advanced Vector Extensions) 是 Intel 一套用來作 Single Instruction Multiple Data 的指令集 [source](https://hackpad.com/Hw1-Extcompute_pi-geXUeYjdv1I)
* 128 位 SIMD 暫存器 xmm0 - xmm15 擴展為 256 位的 ymm0 - ymm15 暫存器
* 支持 256 位的向量運算,由原來 128 位擴展為 256 位
* 指令可支持最多 4 個 operand,實現目標操作數無需損毀原來的內容
* 引進新的 AVX 指令,非 Legacy SSE 指令移植
* 新增 FMA 指令子集 (fused multiply-add/subtract 類運算)
* 可用`$ cat /proc/cpuinfo | grep flags` 尋找是否存在 `fma`關鍵字
* >沒有`fma`, [模擬器](https://software.intel.com/en-us/articles/intel-software-development-emulator)`sudo -i`進入root,但是`$ echo 0 > /proc/sys/kernel/yama/ptrace_scope`沒有反應..[參考](https://hackpad.com/Hw1-Extcompute_pi-geXUeYjdv1I)
```
無 'sde' 這個指令,是指這個嗎:
'sds' 指令來自於 'simh' 套件 (universe)
'cde' 指令來自於 'cde' 套件 (universe)
'sdb' 指令來自於 'sdb' 套件 (universe)
'sed' 指令來自於 'sed' 套件 (main)
'sdf' 指令來自於 'sdf' 套件 (universe)
'ide' 指令來自於 'ecere-dev' 套件 (universe)
'see' 指令來自於 'mime-support' 套件 (main)
'sdc' 指令來自於 'hpsockd' 套件 (universe)
'ode' 指令來自於 'plotutils' 套件 (universe)
'tde' 指令來自於 'devtodo' 套件 (universe)
'spe' 指令來自於 'spe' 套件 (universe)
'sd' 指令來自於 'sd' 套件 (universe)
```
* ctrl+D =Exit root
* 像是`Intel(R) Core(TM) i5-3317U CPU` 等級的電腦就沒支援,這樣的話可用
* 引進新的指令編碼模式,使用 VEX prefix 來設計指令編碼`__m256d` 並不是一種暫存器,是指可以用來載入到 AVX 暫存器的 "Data type"後面程式中用到了`__attribute__ `機制,可以用來設置函數屬性(Function Attribute)、變數屬性(Variable Attribute)和類型屬性(Type Attribute)。`aligned` 則規定變數或結構的最小對齊格式,以 Byte 為單位。
**2\. Leibniz 級數**
[數學證明](https://crypto.stanford.edu/pbc/notes/pi/glseries.html)
Step:
1. add [leibniz code],[leibniz_openmp code] : `computepi.c` `time_test.c` `benchmark_clock_gettime.c` `Makefile`
2. `Make plot`
> * ~~出現錯誤訊息~~
```
benchmark_clock_gettime.c:(.text+0x340): 未定義參考到「compute_pi_Leibniz」
benchmark_clock_gettime.c:(.text+0x3c8): 未定義參考到「compute_pi_Leibniz_openmp_4」
collect2: error: ld returned 1 exit status
Makefile:9: recipe for target 'default' failed
make: *** [default] Error 1
```
> `compute_pi_Leibniz`—L改為小寫即可(要和computepi.c裡的函式名稱一致)
> * 圖是空的...

>>Warning: empty x range [1000:1000], adjusting to [990:1010]
>>
>> 1.查看`result_clock_gettime.csv`發現1,2欄為—nan(有可能是出現分母是0)
>> 2.runtime.gp plot的順序和benchmark_clock_gettime.c不一致
```
benchmark_clock_gettime.c:80:9: warning: implicit declaration of function ‘compute_pi_leibniz_openmp’ [-Wimplicit-function-declaration]compute_pi_leibniz_openmp(N,4);
```
結果和openmp2很接近

**3\. Euler**

1.`git reset --hard HEAD^`回到上一次commit的版本(還沒加其他方法)
2.computepi.c add function
```
double compute_pi_eular_baseline(size_t N)
{
double pi = 0.0;
for (size_t i = 1; i < N; i++)
{
pi += 1 / pow(i,2);
}
return sqrt(pi * 6.0);
}
```
##### Q: computpi.c,computepi.h,time_test.c,benchmark_clock_gettime.c有加`#include <math.h>`, Makefile有加 -lm,卻還是出現:
```
computepi.o: 於函式 compute_pi_eular:
computepi.c:(.text+0xcad): 未定義參考到「pow」
```
>> 可以嘗試在編譯時加入參數 `-lm` [name=ierosodin]
>好喔!謝謝你!
##### A:



>上面三張 runtime.gp欄位打錯,出現奇怪現象
>下面這張才是正常的:
>
**4\. [MonteCarlo蒙地卡羅法](https://zh.wikipedia.org/wiki/%E8%92%99%E5%9C%B0%E5%8D%A1%E7%BE%85%E6%96%B9%E6%B3%95):**
* 邊長為2的正方形,正方形內有一個內切圓,面積為 $1 \cdot 1 \cdotπ = π$
* 圓面積和正方形面積之比為π:4 = 正方形內均勻隨機丟石頭,落在圓形內的機率
```C==
double compute_pi_montecarlo(size_t N)
{
int count = 0;
srand(time(NULL));
for (size_t i = 0; i < N; i++) {
double x = (double)rand()/RAND_MAX;
double y = (double)rand()/RAND_MAX;
if((x*x+y*y)<=1) count++;
}
return 4.0 * count / N ;
}
```
```=
time ./time_test_baseline
N = 400000000 , pi = 3.141593
13.72user 0.00system 0:13.72elapsed 100%CPU (0avgtext+0avgdata 1656maxresident)k
0inputs+0outputs (0major+78minor)pagefaults 0swaps
time ./time_test_openmp_2
N = 400000000 , pi = 3.141593
14.51user 0.00system 0:07.26elapsed 199%CPU (0avgtext+0avgdata 1540maxresident)k
0inputs+0outputs (0major+78minor)pagefaults 0swaps
time ./time_test_openmp_4
N = 400000000 , pi = 3.141593
14.82user 0.00system 0:03.78elapsed 392%CPU (0avgtext+0avgdata 1648maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps
time ./time_test_avx
N = 400000000 , pi = 3.141593
1.45user 0.00system 0:01.45elapsed 99%CPU (0avgtext+0avgdata 1676maxresident)k
0inputs+0outputs (0major+80minor)pagefaults 0swaps
time ./time_test_avxunroll
N = 400000000 , pi = 3.141593
1.48user 0.00system 0:01.48elapsed 100%CPU (0avgtext+0avgdata 1676maxresident)k
0inputs+0outputs (0major+77minor)pagefaults 0swaps
time ./time_test_leibniz
N = 400000000 , pi = 400000000.000000
7.39user 0.00system 0:07.39elapsed 99%CPU (0avgtext+0avgdata 1604maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
time ./time_test_leibniz_openmp_2
N = 400000000 , pi = 0.000000
7.79user 0.00system 0:03.90elapsed 199%CPU (0avgtext+0avgdata 1660maxresident)k
0inputs+0outputs (0major+77minor)pagefaults 0swaps
time ./time_test_leibniz_openmp_4
N = 400000000 , pi = 0.000000
8.04user 0.00system 0:02.03elapsed 395%CPU (0avgtext+0avgdata 1684maxresident)k
0inputs+0outputs (0major+80minor)pagefaults 0swaps
time ./time_test_montecarlo
N = 400000000 , pi = 400000000.000000
16.04user 0.00system 0:16.04elapsed 99%CPU (0avgtext+0avgdata 1596maxresident)k
0inputs+0outputs (0major+78minor)pagefaults 0swaps
```
### GIT problem
`! [rejected] master -> master (non-fast-forward)`
`Automatic merge failed; fix conflicts and then commit the result.`
`Updates were rejected because the tip of your current branch is behind its remote counterpart.`
```=
To squelch this message
and maintain the traditional behavior, use:
git config --global push.default matching
To squelch this message and adopt the new behavior now, use:
git config --global push.default simple
When push.default is set to 'matching', git will push local branches
to the remote branches that already exist with the same name.
```
執行`git pull origin master`後,多了不對的程式碼,想要還原`git revert HEAD`但是出現錯誤訊息:
```
error: revert is not possible because you have unmerged files.
hint: Fix them up in the work tree, and then use 'git add/rm <file>'
hint: as appropriate to mark resolution and make a commit.
fatal: revert failed
```
但是我不想merge他...

### A:`git push -f`強制push
