# 2016q3 Homework 1 (compute-pi) ###### tags: `sarah`,`compute-pi` contributed by <`SarahCheng`> github: https://github.com/SarahYuHanCheng/hw1.git ## 開發記錄 * problem: * reference: * [AVX筆記](https://hackpad.com/Hw1-Extcompute_pi-geXUeYjdv1I) * [TempoJiJi](https://hackmd.io/s/Hy7tssw6)同學 * [jkrvivian](https://hackmd.io/MYZgpgTAJmCsUFpYDNiwQFnMhBOAhmAGx5gYCMY+IUuRy5QA?view)同學 * [jeff60907](https://hackmd.io/EYUwxgrAjAbAZgZgLQA4DsEZICwCYIAmSwUccSaBcYUB29wcIQA=)同學 --- `make check`實驗結果 ```= benchmark_clock_gettime time ./time_test_baseline N = 400000000 , pi = 3.141593 13.75user 0.00system 0:13.75elapsed 99%CPU (0avgtext+0avgdata 1672maxresident)k 0inputs+0outputs (0major+78minor)pagefaults 0swaps time ./time_test_openmp_2 N = 400000000 , pi = 3.141593 14.51user 0.00system 0:07.28elapsed 199%CPU (0avgtext+0avgdata 1680maxresident)k 0inputs+0outputs (0major+78minor)pagefaults 0swaps time ./time_test_openmp_4 N = 400000000 , pi = 3.141593 14.81user 0.00system 0:04.10elapsed 360%CPU (0avgtext+0avgdata 1700maxresident)k 0inputs+0outputs (0major+82minor)pagefaults 0swaps time ./time_test_avx N = 400000000 , pi = 3.141593 1.45user 0.00system 0:01.45elapsed 99%CPU (0avgtext+0avgdata 1652maxresident)k 0inputs+0outputs (0major+76minor)pagefaults 0swaps time ./time_test_avxunroll N = 400000000 , pi = 3.141593 1.49user 0.00system 0:01.49elapsed 99%CPU (0avgtext+0avgdata 1608maxresident)k 0inputs+0outputs (0major+78minor)pagefaults 0swaps ``` --- ### Gnuplot STEP: 1. orig: `make gencsv` ``` for i in `seq 100 5000 25000`; do \ printf "%d," $i;\ ./benchmark_clock_gettime $i; \ done > result_clock_gettime.csv ``` 2. 新增 runtime.gp ```= set xlabel 'N' set ylabel 'Time(sec)' set style fill solid set title 'Compute_pi Time by clock_gettime() ' set term png enhanced font 'Verdana,10' set output 'runtime.png' set term pngcairo size 1280,960 set datafile separator "," plot "result_clock_gettime.csv" using 1:2 title 'baseline' with lines lt rgb 'red' , \ "result_clock_gettime.csv" using 1:3 title 'openmp_2' with lines lt rgb 'blue' , \ "result_clock_gettime.csv" using 1:4 title 'openmp_4' with lines lt rgb 'green' ,\ "result_clock_gettime.csv" using 1:5 title 'AVX' with lines lt rgb 'orange' ,\ "result_clock_gettime.csv" using 1:6 title 'AVX + loop untolling' with lines lt rgb 'brown' ``` 要plot "*csv" 必須加上第8行,不然會有一些莫名其妙的錯誤訊息 3. Makefile 加以下程式 ``` plot: gencsv gnuplot runtime.gp ![](https://i.imgur.com/ZRDQhYx.png) clean: rm -f $(EXECUTABLE) *.o *.s result_clock_gettime.csv runtime.png ``` -DBASELINE待解: ``` * 編譯時有`-DBASELINE`等等,其中的`-D`指的是`#define BASELINE 1` * 這樣程式碼就可以塞在同一個.c .h中!只要在編譯的時候加上`-D`就好了!! * Makefile的篇幅可以大幅降低! * 我想phonebook也可以這樣改寫,不然不同的方法就還要在多生一組.c .h ``` * [許元杰Makefile+圓周](https://embedded2015.hackpad.com/-Ext1-Week-1-5rm31U3BOmh) 4. 執行 `make plot`,結果如下圖 ![](https://i.imgur.com/kBcsLpt.png) > 未增加圓周率計算過程中迭代的數量,時間太短就看不出差異? > 於Makefile中修改: > `for i in `seq 1000 1000 250000`; ` ![](https://i.imgur.com/MCNs6CF.png) ### Wall-clock time vs. CPU time **1\. Wall-clock time** 由 kernel 裡的 xtime 來記錄,系統每次啟動時會先從設備上的 [RTC](https://zh.wikipedia.org/wiki/%E5%AF%A6%E6%99%82%E6%99%82%E9%90%98) 上讀入 xtime。值是自 1970-01-01 起經歷的秒數,Linux 核心也會決定每秒要發生幾次中斷 (透過常數 HZ) ,每次 timer interrupt,就會去更新 `xtime`。可能會和 NTP 伺服器、時區、日光節約時間同步或使用者自己調整。所以「通常」不建議拿來量測程式時間,因為它不是一個穩定的時間,Wall-clock time 不一定是單調遞增 (monotonic)。 1. xtime: 從cmos電路中取得RTC的時間 2. jiffies: 從電腦開機到目前的總共時鍾中斷(time interrupt)次數 **2\. CPU time** CPU time 算的是「每條 thread 的使用時間加總」 很多時候可以用此判斷程式是 CPU bound 還是 I/O bound。 1. `real < user`: 表示程式是 CPU bound,使用平行處理有得到好處,效能有提昇。 2. `real ≒ user`: 表示程式是 CPU bound,即使使用平行處理,效能也沒有明顯提昇,常見的原因有可能是其實計算量沒有很大,生成、處理、結束多執行緒的 overhead 吃掉所有平行處理的好處,或是程式相依性太高,不適合拿來作平行處理。 3. `real > user`: 表示程式是 I/O bound,成本大部分為 I/O操作,使得平行處理無法帶來顯著的效能提昇 ( 或沒有提昇)。 --- ### Compute-pi 方法有四: **1\. 離散積分** $\pi$ = 4 *( $\int_0^1$ $1 \over 1+x^2$ dt ) * **Baseline** ```c double computePi_v1(size_t N) { double pi = 0.0; double dt = 1.0 / N; // dt = (b-a)/N, b = 1, a = 0 for (size_t i = 0; i < N; i++) { double x = (double) i / N; // x = ti = a+(b-a)*i/N = i/N pi += dt / (1.0 + x * x); // integrate 1/(1+x^2), i = 0....N } return pi * 4.0; } ``` * **AVX SIMD** * AVX (Advanced Vector Extensions) 是 Intel 一套用來作  Single Instruction Multiple Data 的指令集 [source](https://hackpad.com/Hw1-Extcompute_pi-geXUeYjdv1I) * 128 位 SIMD 暫存器 xmm0 - xmm15 擴展為 256 位的 ymm0 - ymm15 暫存器 * 支持 256 位的向量運算,由原來 128 位擴展為 256 位 * 指令可支持最多 4 個 operand,實現目標操作數無需損毀原來的內容 * 引進新的 AVX 指令,非 Legacy SSE 指令移植 * 新增 FMA 指令子集 (fused multiply-add/subtract 類運算) * 可用`$ cat /proc/cpuinfo | grep flags` 尋找是否存在 `fma`關鍵字 * >沒有`fma`, [模擬器](https://software.intel.com/en-us/articles/intel-software-development-emulator)`sudo -i`進入root,但是`$ echo 0 > /proc/sys/kernel/yama/ptrace_scope`沒有反應..[參考](https://hackpad.com/Hw1-Extcompute_pi-geXUeYjdv1I) ``` 無 'sde' 這個指令,是指這個嗎: 'sds' 指令來自於 'simh' 套件 (universe) 'cde' 指令來自於 'cde' 套件 (universe) 'sdb' 指令來自於 'sdb' 套件 (universe) 'sed' 指令來自於 'sed' 套件 (main) 'sdf' 指令來自於 'sdf' 套件 (universe) 'ide' 指令來自於 'ecere-dev' 套件 (universe) 'see' 指令來自於 'mime-support' 套件 (main) 'sdc' 指令來自於 'hpsockd' 套件 (universe) 'ode' 指令來自於 'plotutils' 套件 (universe) 'tde' 指令來自於 'devtodo' 套件 (universe) 'spe' 指令來自於 'spe' 套件 (universe) 'sd' 指令來自於 'sd' 套件 (universe) ``` * ctrl+D =Exit root * 像是`Intel(R) Core(TM) i5-3317U CPU` 等級的電腦就沒支援,這樣的話可用 * 引進新的指令編碼模式,使用 VEX prefix 來設計指令編碼`__m256d` 並不是一種暫存器,是指可以用來載入到 AVX 暫存器的 "Data type"後面程式中用到了`__attribute__ `機制,可以用來設置函數屬性(Function Attribute)、變數屬性(Variable Attribute)和類型屬性(Type Attribute)。`aligned` 則規定變數或結構的最小對齊格式,以 Byte 為單位。 **2\. Leibniz 級數**![](https://hackpad-attachments.imgix.net/embedded2015.hackpad.com_5rm31U3BOmh_p.340787_1442759145862_undefined?fit=max&w=882) [數學證明](https://crypto.stanford.edu/pbc/notes/pi/glseries.html) Step: 1. add [leibniz code],[leibniz_openmp code] : `computepi.c` `time_test.c` `benchmark_clock_gettime.c` `Makefile` 2. `Make plot` > * ~~出現錯誤訊息~~ ``` benchmark_clock_gettime.c:(.text+0x340): 未定義參考到「compute_pi_Leibniz」 benchmark_clock_gettime.c:(.text+0x3c8): 未定義參考到「compute_pi_Leibniz_openmp_4」 collect2: error: ld returned 1 exit status Makefile:9: recipe for target 'default' failed make: *** [default] Error 1 ``` > `compute_pi_Leibniz`—L改為小寫即可(要和computepi.c裡的函式名稱一致) > * 圖是空的... ![](https://i.imgur.com/ibxXWRC.png) >>Warning: empty x range [1000:1000], adjusting to [990:1010] >> >> 1.查看`result_clock_gettime.csv`發現1,2欄為—nan(有可能是出現分母是0) >> 2.runtime.gp plot的順序和benchmark_clock_gettime.c不一致 ``` benchmark_clock_gettime.c:80:9: warning: implicit declaration of function ‘compute_pi_leibniz_openmp’ [-Wimplicit-function-declaration]compute_pi_leibniz_openmp(N,4); ``` 結果和openmp2很接近 ![](https://i.imgur.com/SnKDfUW.png) **3\. Euler** ![vic85821](https://hackmd.io/CYQwxgzAbFAcAaCAmAnARkfYB2CiauasiusGIIADBBhGrUA===) 1.`git reset --hard HEAD^`回到上一次commit的版本(還沒加其他方法) 2.computepi.c add function ``` double compute_pi_eular_baseline(size_t N) { double pi = 0.0; for (size_t i = 1; i < N; i++) { pi += 1 / pow(i,2); } return sqrt(pi * 6.0); } ``` ##### Q: computpi.c,computepi.h,time_test.c,benchmark_clock_gettime.c有加`#include <math.h>`, Makefile有加 -lm,卻還是出現: ``` computepi.o: 於函式 compute_pi_eular: computepi.c:(.text+0xcad): 未定義參考到「pow」 ``` >> 可以嘗試在編譯時加入參數 `-lm` [name=ierosodin] >好喔!謝謝你! ##### A: ![](https://i.imgur.com/ePCSvZV.png) ![](https://i.imgur.com/U9xN8dM.png) ![](https://i.imgur.com/lnMDjEv.png) >上面三張 runtime.gp欄位打錯,出現奇怪現象 >下面這張才是正常的: >![](https://i.imgur.com/NmQcup0.png) **4\. [MonteCarlo蒙地卡羅法](https://zh.wikipedia.org/wiki/%E8%92%99%E5%9C%B0%E5%8D%A1%E7%BE%85%E6%96%B9%E6%B3%95):** * 邊長為2的正方形,正方形內有一個內切圓,面積為 $1 \cdot 1 \cdotπ = π$ * 圓面積和正方形面積之比為π:4 = 正方形內均勻隨機丟石頭,落在圓形內的機率 ```C== double compute_pi_montecarlo(size_t N) { int count = 0; srand(time(NULL)); for (size_t i = 0; i < N; i++) { double x = (double)rand()/RAND_MAX; double y = (double)rand()/RAND_MAX; if((x*x+y*y)<=1) count++; } return 4.0 * count / N ; } ``` ```= time ./time_test_baseline N = 400000000 , pi = 3.141593 13.72user 0.00system 0:13.72elapsed 100%CPU (0avgtext+0avgdata 1656maxresident)k 0inputs+0outputs (0major+78minor)pagefaults 0swaps time ./time_test_openmp_2 N = 400000000 , pi = 3.141593 14.51user 0.00system 0:07.26elapsed 199%CPU (0avgtext+0avgdata 1540maxresident)k 0inputs+0outputs (0major+78minor)pagefaults 0swaps time ./time_test_openmp_4 N = 400000000 , pi = 3.141593 14.82user 0.00system 0:03.78elapsed 392%CPU (0avgtext+0avgdata 1648maxresident)k 0inputs+0outputs (0major+81minor)pagefaults 0swaps time ./time_test_avx N = 400000000 , pi = 3.141593 1.45user 0.00system 0:01.45elapsed 99%CPU (0avgtext+0avgdata 1676maxresident)k 0inputs+0outputs (0major+80minor)pagefaults 0swaps time ./time_test_avxunroll N = 400000000 , pi = 3.141593 1.48user 0.00system 0:01.48elapsed 100%CPU (0avgtext+0avgdata 1676maxresident)k 0inputs+0outputs (0major+77minor)pagefaults 0swaps time ./time_test_leibniz N = 400000000 , pi = 400000000.000000 7.39user 0.00system 0:07.39elapsed 99%CPU (0avgtext+0avgdata 1604maxresident)k 0inputs+0outputs (0major+76minor)pagefaults 0swaps time ./time_test_leibniz_openmp_2 N = 400000000 , pi = 0.000000 7.79user 0.00system 0:03.90elapsed 199%CPU (0avgtext+0avgdata 1660maxresident)k 0inputs+0outputs (0major+77minor)pagefaults 0swaps time ./time_test_leibniz_openmp_4 N = 400000000 , pi = 0.000000 8.04user 0.00system 0:02.03elapsed 395%CPU (0avgtext+0avgdata 1684maxresident)k 0inputs+0outputs (0major+80minor)pagefaults 0swaps time ./time_test_montecarlo N = 400000000 , pi = 400000000.000000 16.04user 0.00system 0:16.04elapsed 99%CPU (0avgtext+0avgdata 1596maxresident)k 0inputs+0outputs (0major+78minor)pagefaults 0swaps ``` ### GIT problem `! [rejected] master -> master (non-fast-forward)` `Automatic merge failed; fix conflicts and then commit the result.` `Updates were rejected because the tip of your current branch is behind its remote counterpart.` ```= To squelch this message and maintain the traditional behavior, use: git config --global push.default matching To squelch this message and adopt the new behavior now, use: git config --global push.default simple When push.default is set to 'matching', git will push local branches to the remote branches that already exist with the same name. ``` 執行`git pull origin master`後,多了不對的程式碼,想要還原`git revert HEAD`但是出現錯誤訊息: ``` error: revert is not possible because you have unmerged files. hint: Fix them up in the work tree, and then use 'git add/rm <file>' hint: as appropriate to mark resolution and make a commit. fatal: revert failed ``` 但是我不想merge他... ![](https://i.imgur.com/b2iSTL6.png) ### A:`git push -f`強制push ![](https://i.imgur.com/OnpDOgC.png)