Try   HackMD

2017q1 Homework1 (compute-pi)

tags: embedded

contributed by <Cayonliow>

Reviewed by PeterTing

  • 可以試著優化 computepi.c 中的 compute_pi_leibniz ,像是用 OpenMP 或是 SIMD 的方式來做做看,或許可以發現更多的東西。
  • 可以探討為什麼 OpenMP 最不穩定?
  • 可以用信賴區間 95%來對資料進行篩選。
  • 可以對不同的時間函式進行研究

tags : LibreOffice SIMD Leibniz time error rate

作業

開發環境

cayon@cayon-X550JX:~$ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 60 Model name: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz Stepping: 3 CPU MHz: 2423.484 CPU max MHz: 3600.0000 CPU min MHz: 800.0000 BogoMIPS: 5188.45 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K NUMA node0 CPU(s): 0-7

工具

原理與概念

  • 圓周率

  • Leibniz

  • 黎曼積分

  • time

    • Wall-clock time (掛鐘時間)

    • CPU time

      • 電腦本身是一個sequential circuit 循序電路 而讓sequential circuit能夠正常順利的運作 最關鍵的就是需要有一個clock 去控制資料的同步以及更新 也因此電腦最基本的時間單位就是 clock cycle

      • CPU time 可以視為是一個程式在CPU裡面執行了多少個 clock cycle 再乘上每一個clock cycle它實際花了多少的時間

      • 文章: CPU Time

      • Image Not Showing Possible Reasons
        • The image file may be corrupted
        • The server hosting the image is unavailable
        • The image path is incorrect
        • The image format is not supported
        Learn More →

開發記錄

  • 把專案 git clone 下來, 研究 LibreOffice 來制圖
    • 可是制出來的圖數據是累加的, 搞不懂如何將數據獨立出來
    • 所以先用之前作業的 gnuplot 制圖, 看結果
      Image Not Showing Possible Reasons
      • The image file may be corrupted
      • The server hosting the image is unavailable
      • The image path is incorrect
      • The image format is not supported
      Learn More →

遇到的問題:

cayon@cayon-X550JX:~/embedded/hw1/compute-pi$ make plot
gnuplot scripts/runtime.gp

plot[:][:]'result_clock_gettime.csv' using 1:2 title 'Baseline', '' using 1:3 title 'OpenMP(2 threads)', '' using 1:4 title 'OpenMP(4 threads)', '' using 1:5 title 'AVX', '' using 1:6 title 'AVX + Unroll looping', \ 
                                                                                                                                                                                                                      ^
"scripts/runtime.gp", line 14: invalid character \

Makefile:44: recipe for target 'plot' failed
make: *** [plot] Error 1

原來 \ 後面不能有任何[空格]

  • 執行 $ make check 得到各個方式所實做出來計算 pi 的時間
time ./time_test_baseline
N = 400000000 , pi = 3.141593
3.26user 0.00system 0:03.27elapsed 99%CPU (0avgtext+0avgdata 1788maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
time ./time_test_openmp_2
N = 400000000 , pi = 3.141593
3.24user 0.00system 0:01.62elapsed 199%CPU (0avgtext+0avgdata 1724maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
time ./time_test_openmp_4
N = 400000000 , pi = 3.141593
4.86user 0.00system 0:01.60elapsed 303%CPU (0avgtext+0avgdata 1784maxresident)k
0inputs+0outputs (0major+92minor)pagefaults 0swaps
time ./time_test_avx
N = 400000000 , pi = 3.141593
0.91user 0.00system 0:00.91elapsed 99%CPU (0avgtext+0avgdata 1712maxresident)k
0inputs+0outputs (0major+86minor)pagefaults 0swaps
time ./time_test_avxunroll
N = 400000000 , pi = 3.141593
0.87user 0.00system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 1764maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps
  • 參考 petermouse 的共筆

    • 在benchmark_clock_gettime.c中, 將表格以.txt開啟,或是觀察printf輸出格式,可以發現到格式其實是以逗號分隔輸出:
100,0.000025,0.000072,0.000076,0.000017,0.000007
5100,0.001208,0.000596,0.001392,0.001637,0.000525
10100,0.002192,0.001114,0.001097,0.000590,0.000555
15100,0.003449,0.001743,0.000894,0.000868,0.000840
20100,0.004585,0.003216,0.002393,0.001183,0.001291
25100,0.005393,0.002688,0.002675,0.001436,0.001384

使用datafile separator ",",gnuplot才能區分每一行 的每一筆資料

  • 原始版本的 benchmark_clock_gettime.c,每種方式都執行了25次
    int N = atoi(argv[1]);
    int i, loop = 25;

    // Baseline
    clock_gettime(CLOCK_ID, &start);
    for (i = 0; i < loop; i++) {
        compute_pi_baseline(N);

將執行的次數調成1000
輸出結果:!

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

雖然很明顯 Baseline 的斜率最大, 時間最長。 可是在 OpenMP(4 threads) , AVX , AVX + Unroll looping 這三個實作的結果來看, 線條太靠近,加上 OpenMP(4 threads) 跟 AVX 有不穩定的狀況出現, 數據結果的比較不太明顯

所以試着將執行的次數調成10000
輸出結果:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

OpenMP(4 threads) 跟 AVX de1不穩定狀況消失了 可是結果線條還是太密集, 無法直接用肉眼判斷
可是我們還是能夠知道: AVX + Unroll looping > AVX >>> OpenMP(2 threads) > OpenMP(4 threads) >>> Baseline

所以試着將執行的次數調成100000
可是分別不大, 可是程式的執行時間 (make gencsv) 變長了很多

所以試着將執行的次數調回25 將 N 調大
輸出結果:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

在幾次的輸出結果, 發現 OpenMP(4 threads) 最爲不穩定

  • 參考 laochanlam 的共筆的方式實作 Leibniz's formula
    • 參考 shelly4132 的共筆 的証明方法
    • Image Not Showing Possible Reasons
      • The image file may be corrupted
      • The server hosting the image is unavailable
      • The image path is incorrect
      • The image format is not supported
      Learn More →
double compute_pi_leibniz(size_t N)
{
    double pi = 0.0;
    double term = 0.0;
    double sign = 1.0;
    for (size_t i = 0; i < N; i++) {
        term = (double) sign / ( 2.0 * i + 1.0 );
        sign = -sign;
        pi += term;
    }
    return pi * 4.0;
}

輸出結果:

    // Error rate calculation
    #define M_PI acos(-1.0)
    double pi = compute_pi(n);
    double diff = pi - M_PI > 0 ? pi - M_PI : M_PI - pi;
    double error = diff / M_PI;

利用 cos-1(-1) 求 pi , error rate 就是與每一個版本的誤差值