Try   HackMD

2016q3 Homework 1 (compute-pi)

tags: sarah,compute-pi

contributed by <SarahCheng>
github:
https://github.com/SarahYuHanCheng/hw1.git

開發記錄


make check實驗結果

benchmark_clock_gettime time ./time_test_baseline N = 400000000 , pi = 3.141593 13.75user 0.00system 0:13.75elapsed 99%CPU (0avgtext+0avgdata 1672maxresident)k 0inputs+0outputs (0major+78minor)pagefaults 0swaps time ./time_test_openmp_2 N = 400000000 , pi = 3.141593 14.51user 0.00system 0:07.28elapsed 199%CPU (0avgtext+0avgdata 1680maxresident)k 0inputs+0outputs (0major+78minor)pagefaults 0swaps time ./time_test_openmp_4 N = 400000000 , pi = 3.141593 14.81user 0.00system 0:04.10elapsed 360%CPU (0avgtext+0avgdata 1700maxresident)k 0inputs+0outputs (0major+82minor)pagefaults 0swaps time ./time_test_avx N = 400000000 , pi = 3.141593 1.45user 0.00system 0:01.45elapsed 99%CPU (0avgtext+0avgdata 1652maxresident)k 0inputs+0outputs (0major+76minor)pagefaults 0swaps time ./time_test_avxunroll N = 400000000 , pi = 3.141593 1.49user 0.00system 0:01.49elapsed 99%CPU (0avgtext+0avgdata 1608maxresident)k 0inputs+0outputs (0major+78minor)pagefaults 0swaps

Gnuplot

STEP:

  1. orig: make gencsv
for i in `seq 100 5000 25000`; do \
	printf "%d," $i;\
	./benchmark_clock_gettime $i; \
done > result_clock_gettime.csv	
  1. 新增 runtime.gp
set xlabel 'N' set ylabel 'Time(sec)' set style fill solid set title 'Compute_pi Time by clock_gettime() ' set term png enhanced font 'Verdana,10' set output 'runtime.png' set term pngcairo size 1280,960 set datafile separator "," plot "result_clock_gettime.csv" using 1:2 title 'baseline' with lines lt rgb 'red' , \ "result_clock_gettime.csv" using 1:3 title 'openmp_2' with lines lt rgb 'blue' , \ "result_clock_gettime.csv" using 1:4 title 'openmp_4' with lines lt rgb 'green' ,\ "result_clock_gettime.csv" using 1:5 title 'AVX' with lines lt rgb 'orange' ,\ "result_clock_gettime.csv" using 1:6 title 'AVX + loop untolling' with lines lt rgb 'brown'

要plot "*csv" 必須加上第8行,不然會有一些莫名其妙的錯誤訊息

  1. Makefile 加以下程式
plot: gencsv
		gnuplot runtime.gp
![](https://i.imgur.com/ZRDQhYx.png)

clean:
	rm -f $(EXECUTABLE) *.o *.s result_clock_gettime.csv runtime.png

-DBASELINE待解:

* 編譯時有`-DBASELINE`等等,其中的`-D`指的是`#define BASELINE 1`
	* 這樣程式碼就可以塞在同一個.c .h中!只要在編譯的時候加上`-D`就好了!!
	* Makefile的篇幅可以大幅降低!
	* 我想phonebook也可以這樣改寫,不然不同的方法就還要在多生一組.c .h 
  1. 執行 make plot,結果如下圖
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

未增加圓周率計算過程中迭代的數量,時間太短就看不出差異?
於Makefile中修改:
for i in seq 1000 1000 250000;

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Wall-clock time vs. CPU time

1. Wall-clock time 由 kernel 裡的 xtime 來記錄,系統每次啟動時會先從設備上的 RTC 上讀入 xtime。值是自 1970-01-01 起經歷的秒數,Linux 核心也會決定每秒要發生幾次中斷 (透過常數 HZ) ,每次 timer interrupt,就會去更新 xtime。可能會和 NTP 伺服器、時區、日光節約時間同步或使用者自己調整。所以「通常」不建議拿來量測程式時間,因為它不是一個穩定的時間,Wall-clock time 不一定是單調遞增 (monotonic)。

  1. xtime: 從cmos電路中取得RTC的時間
  2. jiffies: 從電腦開機到目前的總共時鍾中斷(time interrupt)次數

2. CPU time CPU time 算的是「每條 thread 的使用時間加總」

很多時候可以用此判斷程式是 CPU bound 還是 I/O bound。

  1. real < user: 表示程式是 CPU bound,使用平行處理有得到好處,效能有提昇。
  2. real ≒ user: 表示程式是 CPU bound,即使使用平行處理,效能也沒有明顯提昇,常見的原因有可能是其實計算量沒有很大,生成、處理、結束多執行緒的 overhead 吃掉所有平行處理的好處,或是程式相依性太高,不適合拿來作平行處理。
  3. real > user: 表示程式是 I/O bound,成本大部分為 I/O操作,使得平行處理無法帶來顯著的效能提昇 ( 或沒有提昇)。

Compute-pi

方法有四:

1. 離散積分

π = 4 *(
01
11+x2
dt )

  • Baseline
double computePi_v1(size_t N)
{
    double pi = 0.0;
    double dt = 1.0 / N;&nbsp;// dt = (b-a)/N, b = 1, a = 0
    for (size_t i = 0; i < N; i++) {
        double x = (double) i / N; // x = ti = a+(b-a)*i/N = i/N
        pi += dt / (1.0 + x * x); // integrate 1/(1+x^2), i = 0....N
    }
    return pi * 4.0;
}
  • AVX SIMD
    • AVX (Advanced Vector Extensions) 是 Intel 一套用來作  Single Instruction Multiple Data 的指令集 source

    • 128 位 SIMD 暫存器 xmm0 - xmm15 擴展為 256 位的 ymm0 - ymm15 暫存器

    • 支持 256 位的向量運算,由原來 128 位擴展為 256 位

    • 指令可支持最多 4 個 operand,實現目標操作數無需損毀原來的內容

    • 引進新的 AVX 指令,非 Legacy SSE 指令移植

    • 新增 FMA 指令子集 (fused multiply-add/subtract 類運算)

    • 可用$ cat /proc/cpuinfo | grep flags 尋找是否存在 fma關鍵字

      • 沒有fma, 模擬器sudo -i進入root,但是$ echo 0 > /proc/sys/kernel/yama/ptrace_scope沒有反應..參考

無 'sde' 這個指令,是指這個嗎:
 'sds' 指令來自於 'simh' 套件 (universe)
 'cde' 指令來自於 'cde' 套件 (universe)
 'sdb' 指令來自於 'sdb' 套件 (universe)
 'sed' 指令來自於 'sed' 套件 (main)
 'sdf' 指令來自於 'sdf' 套件 (universe)
 'ide' 指令來自於 'ecere-dev' 套件 (universe)
 'see' 指令來自於 'mime-support' 套件 (main)
 'sdc' 指令來自於 'hpsockd' 套件 (universe)
 'ode' 指令來自於 'plotutils' 套件 (universe)
 'tde' 指令來自於 'devtodo' 套件 (universe)
 'spe' 指令來自於 'spe' 套件 (universe)
 'sd' 指令來自於 'sd' 套件 (universe)

​*  ctrl+D =Exit root
​
​* 像是`Intel(R) Core(TM) i5-3317U CPU` 等級的電腦就沒支援,這樣的話可用

​* 引進新的指令編碼模式,使用 VEX prefix 來設計指令編碼`__m256d` 並不是一種暫存器,是指可以用來載入到 AVX 暫存器的 "Data type"後面程式中用到了`__attribute__ `機制,可以用來設置函數屬性(Function Attribute)、變數屬性(Variable Attribute)和類型屬性(Type Attribute)。`aligned` 則規定變數或結構的最小對齊格式,以 Byte 為單位。

2. Leibniz 級數

數學證明

Step:

  1. add [leibniz code],[leibniz_openmp code] : computepi.c time_test.c benchmark_clock_gettime.c Makefile
  2. Make plot
  • 出現錯誤訊息
benchmark_clock_gettime.c:(.text+0x340): 未定義參考到「compute_pi_Leibniz」
benchmark_clock_gettime.c:(.text+0x3c8): 未定義參考到「compute_pi_Leibniz_openmp_4」
collect2: error: ld returned 1 exit status
Makefile:9: recipe for target 'default' failed
make: *** [default] Error 1

compute_pi_Leibniz—L改為小寫即可(要和computepi.c裡的函式名稱一致)

  • 圖是空的

Warning: empty x range [1000:1000], adjusting to [990:1010]

1.查看result_clock_gettime.csv發現1,2欄為—nan(有可能是出現分母是0)
2.runtime.gp plot的順序和benchmark_clock_gettime.c不一致

benchmark_clock_gettime.c:80:9: warning: implicit declaration of function ‘compute_pi_leibniz_openmp’ [-Wimplicit-function-declaration]compute_pi_leibniz_openmp(N,4);

結果和openmp2很接近

3. Euler

vic85821

1.git reset --hard HEAD^回到上一次commit的版本(還沒加其他方法)
2.computepi.c add function

double compute_pi_eular_baseline(size_t N)
{
    double pi = 0.0;
    for (size_t i = 1; i < N; i++)
    {
        pi += 1 / pow(i,2);
    }
    return sqrt(pi * 6.0);
}
Q: computpi.c,computepi.h,time_test.c,benchmark_clock_gettime.c有加#include <math.h>, Makefile有加 -lm,卻還是出現:
computepi.o: 於函式 compute_pi_eular:
computepi.c:(.text+0xcad): 未定義參考到「pow」

可以嘗試在編譯時加入參數 -lm ierosodin

好喔!謝謝你!

A:

上面三張 runtime.gp欄位打錯,出現奇怪現象
下面這張才是正常的:

4. MonteCarlo蒙地卡羅法

​* 邊長為2的正方形,正方形內有一個內切圓,面積為 $1 \cdot 1 \cdotπ = π$
​* 圓面積和正方形面積之比為π:4 = 正方形內均勻隨機丟石頭,落在圓形內的機率 
double compute_pi_montecarlo(size_t N) { int count = 0; srand(time(NULL)); for (size_t i = 0; i < N; i++) { double x = (double)rand()/RAND_MAX; double y = (double)rand()/RAND_MAX; if((x*x+y*y)<=1) count++; } return 4.0 * count / N ; }
time ./time_test_baseline N = 400000000 , pi = 3.141593 13.72user 0.00system 0:13.72elapsed 100%CPU (0avgtext+0avgdata 1656maxresident)k 0inputs+0outputs (0major+78minor)pagefaults 0swaps time ./time_test_openmp_2 N = 400000000 , pi = 3.141593 14.51user 0.00system 0:07.26elapsed 199%CPU (0avgtext+0avgdata 1540maxresident)k 0inputs+0outputs (0major+78minor)pagefaults 0swaps time ./time_test_openmp_4 N = 400000000 , pi = 3.141593 14.82user 0.00system 0:03.78elapsed 392%CPU (0avgtext+0avgdata 1648maxresident)k 0inputs+0outputs (0major+81minor)pagefaults 0swaps time ./time_test_avx N = 400000000 , pi = 3.141593 1.45user 0.00system 0:01.45elapsed 99%CPU (0avgtext+0avgdata 1676maxresident)k 0inputs+0outputs (0major+80minor)pagefaults 0swaps time ./time_test_avxunroll N = 400000000 , pi = 3.141593 1.48user 0.00system 0:01.48elapsed 100%CPU (0avgtext+0avgdata 1676maxresident)k 0inputs+0outputs (0major+77minor)pagefaults 0swaps time ./time_test_leibniz N = 400000000 , pi = 400000000.000000 7.39user 0.00system 0:07.39elapsed 99%CPU (0avgtext+0avgdata 1604maxresident)k 0inputs+0outputs (0major+76minor)pagefaults 0swaps time ./time_test_leibniz_openmp_2 N = 400000000 , pi = 0.000000 7.79user 0.00system 0:03.90elapsed 199%CPU (0avgtext+0avgdata 1660maxresident)k 0inputs+0outputs (0major+77minor)pagefaults 0swaps time ./time_test_leibniz_openmp_4 N = 400000000 , pi = 0.000000 8.04user 0.00system 0:02.03elapsed 395%CPU (0avgtext+0avgdata 1684maxresident)k 0inputs+0outputs (0major+80minor)pagefaults 0swaps time ./time_test_montecarlo N = 400000000 , pi = 400000000.000000 16.04user 0.00system 0:16.04elapsed 99%CPU (0avgtext+0avgdata 1596maxresident)k 0inputs+0outputs (0major+78minor)pagefaults 0swaps

GIT problem

! [rejected] master -> master (non-fast-forward)

Automatic merge failed; fix conflicts and then commit the result.

Updates were rejected because the tip of your current branch is behind its remote counterpart.

To squelch this message and maintain the traditional behavior, use: git config --global push.default matching To squelch this message and adopt the new behavior now, use: git config --global push.default simple When push.default is set to 'matching', git will push local branches to the remote branches that already exist with the same name.

執行git pull origin master後,多了不對的程式碼,想要還原git revert HEAD但是出現錯誤訊息:

error: revert is not possible because you have unmerged files.
hint: Fix them up in the work tree, and then use 'git add/rm <file>'
hint: as appropriate to mark resolution and make a commit.
fatal: revert failed

但是我不想merge他

A:git push -f強制push