2017q1 Homework1 (compute-pi)

tags: `embedded`

contributed by <Cayonliow>

Reviewed by `PeterTing`

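Try optimizing compute_pi_leibniz in computepi.c, for example with OpenMP or SIMD; you may discover more along the way.
It would be worth examining why OpenMP is the least stable.
A 95% confidence interval could be used to filter the data.
The different timing functions could also be studied.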

tags : LibreOffice SIMD Leibniz time error rate

Assignment

Problem: B02: compute-pi
github (original): compute-pi
Reference: the detailed analysis provided by senior student 廖健富 (hackpad)

Development environment

cayon@cayon-X550JX:~$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Model name:            Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
Stepping:              3
CPU MHz:               2423.484
CPU max MHz:           3600.0000
CPU min MHz:           800.0000
BogoMIPS:              5188.45
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7

Tools

LibreOffice
- Article: tutorial
SIMD
- Articles: SIMD technology, Using SIMD instructions in GCC, Introduction to the Intel MMX/SSE instruction sets

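Principles and concepts

Pi (π)
- Articles: Computing π by integration, Function
Leibniz
- Articles: Leibniz's calculus, Leibniz formula for π
Riemann integral
- Article: wikipedia
time
- Wall-clock time
  - elapsed time
  - The time that passes on the system clock from the moment a program starts running until it terminates. For a single-threaded program it is generally greater than the CPU time; a multithreaded program can accumulate more CPU time than wall-clock time. This kind of time is usually provided by the system; in C++ on Windows it can be obtained through <time.h>. Note that the precision of the result depends on the system.
  - Articles: Senior students' notes (1), Senior students' notes (2), What is wall clock time, wall time
- CPU time
  - A computer is essentially a sequential circuit, and the key to keeping a sequential circuit running correctly is a clock that controls when data is synchronized and updated. The most basic unit of time in a computer is therefore the clock cycle.
  - CPU time can be viewed as the number of clock cycles a program executes on the CPU multiplied by the duration of each clock cycle.
  - Article: CPU Time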
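
The two notions of time above can be measured side by side. This is a minimal sketch, assuming a POSIX system: `clock_seconds`, `timing_t`, `measure`, and `busy_work` are hypothetical names introduced here and are not part of the compute-pi project. `CLOCK_MONOTONIC` is used for wall-clock time and `CLOCK_PROCESS_CPUTIME_ID` for CPU time summed over all threads of the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Current reading of the given POSIX clock, in seconds. */
static double clock_seconds(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
}

/* Hypothetical result type: both timings for one run of a workload. */
typedef struct {
    double wall; /* elapsed (wall-clock) seconds */
    double cpu;  /* CPU seconds consumed by the whole process */
} timing_t;

/* Runs work() once and reports wall-clock and CPU time for that run. */
static timing_t measure(void (*work)(void))
{
    double wall0 = clock_seconds(CLOCK_MONOTONIC);
    double cpu0 = clock_seconds(CLOCK_PROCESS_CPUTIME_ID);
    work();
    timing_t t;
    t.wall = clock_seconds(CLOCK_MONOTONIC) - wall0;
    t.cpu = clock_seconds(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
    return t;
}

/* Stand-in workload (an assumption): a busy loop on a single core. */
static void busy_work(void)
{
    volatile double x = 0.0;
    for (long i = 0; i < 20000000L; i++)
        x += 1.0 / (double) (i + 1);
}
```

Calling `measure(busy_work)` on a single-threaded workload should give two values that are close to each other. For an OpenMP workload the CPU time can exceed the wall-clock time, which is consistent with the `199%CPU` figure in the `make check` output further down.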

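Development log

Cloned the project with git clone and looked into LibreOffice for plotting
- However, the data in the resulting chart is cumulative, and I could not figure out how to separate the individual series
- So for now I plotted with gnuplot, as in the previous assignment, to see the results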

Problem encountered:

cayon@cayon-X550JX:~/embedded/hw1/compute-pi$ make plot
gnuplot scripts/runtime.gp

plot[:][:]'result_clock_gettime.csv' using 1:2 title 'Baseline', '' using 1:3 title 'OpenMP(2 threads)', '' using 1:4 title 'OpenMP(4 threads)', '' using 1:5 title 'AVX', '' using 1:6 title 'AVX + Unroll looping', \ 
                                                                                                                                                                                                                      ^
"scripts/runtime.gp", line 14: invalid character \

Makefile:44: recipe for target 'plot' failed
make: *** [plot] Error 1

It turns out that no whitespace may follow the `\` at the end of a line.

Running $ make check gives the time each implementation takes to compute pi

time ./time_test_baseline
N = 400000000 , pi = 3.141593
3.26user 0.00system 0:03.27elapsed 99%CPU (0avgtext+0avgdata 1788maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
time ./time_test_openmp_2
N = 400000000 , pi = 3.141593
3.24user 0.00system 0:01.62elapsed 199%CPU (0avgtext+0avgdata 1724maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
time ./time_test_openmp_4
N = 400000000 , pi = 3.141593
4.86user 0.00system 0:01.60elapsed 303%CPU (0avgtext+0avgdata 1784maxresident)k
0inputs+0outputs (0major+92minor)pagefaults 0swaps
time ./time_test_avx
N = 400000000 , pi = 3.141593
0.91user 0.00system 0:00.91elapsed 99%CPU (0avgtext+0avgdata 1712maxresident)k
0inputs+0outputs (0major+86minor)pagefaults 0swaps
time ./time_test_avxunroll
N = 400000000 , pi = 3.141593
0.87user 0.00system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 1764maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps

Following petermouse's notes
- In benchmark_clock_gettime.c, opening the table as .txt, or looking at the printf output format, shows that the output is comma-separated:

100,0.000025,0.000072,0.000076,0.000017,0.000007
5100,0.001208,0.000596,0.001392,0.001637,0.000525
10100,0.002192,0.001114,0.001097,0.000590,0.000555
15100,0.003449,0.001743,0.000894,0.000868,0.000840
20100,0.004585,0.003216,0.002393,0.001183,0.001291
25100,0.005393,0.002688,0.002675,0.001436,0.001384

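Setting datafile separator "," (in the gnuplot script: set datafile separator ",") lets gnuplot tell apart the fields on each line

The original benchmark_clock_gettime.c runs each method 25 times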

    int N = atoi(argv[1]);
    int i, loop = 25;

    // Baseline
    clock_gettime(CLOCK_ID, &start);
    for (i = 0; i < loop; i++) {
        compute_pi_baseline(N);
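
The excerpt above is cut off; the general pattern is to time `loop` calls and divide by `loop`. The following is a rough sketch under stated assumptions: `compute_pi_baseline` here is a stand-in left Riemann sum of 4/(1+x^2) over [0, 1] (the project's actual implementation is not shown in these notes), `average_runtime` is a hypothetical helper, and `CLOCK_MONOTONIC` stands in for the project's `CLOCK_ID` macro, whose value is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Assumed baseline: left Riemann sum of 4 / (1 + x^2) over [0, 1]. */
double compute_pi_baseline(size_t N)
{
    double pi = 0.0;
    double dt = 1.0 / (double) N;
    for (size_t i = 0; i < N; i++) {
        double x = (double) i / (double) N;
        pi += dt / (1.0 + x * x);
    }
    return pi * 4.0;
}

/* Average wall-clock seconds per call over `loop` repetitions. */
double average_runtime(size_t N, int loop)
{
    struct timespec start, end;
    volatile double sink = 0.0; /* keeps the calls from being optimized away */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < loop; i++)
        sink += compute_pi_baseline(N);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (double) (end.tv_sec - start.tv_sec) +
                     (double) (end.tv_nsec - start.tv_nsec) * 1e-9;
    return elapsed / loop;
}
```

Raising `loop` averages out more noise per data point, at the cost of a proportionally longer benchmark run.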

Changed the number of runs to 1000
Output:


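Baseline clearly has the steepest slope and the longest runtime. For OpenMP(4 threads), AVX, and AVX + Unroll looping, however, the lines lie too close together, and OpenMP(4 threads) and AVX also show unstable behavior, so the comparison between them is not clear

So I tried raising the number of runs to 10000
Output: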


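The instability of OpenMP(4 threads) and AVX disappeared, but the lines are still too dense to tell apart by eye
Even so, the ranking is still visible (fastest first): AVX + Unroll looping > AVX >>> OpenMP(2 threads) > OpenMP(4 threads) >>> Baseline

So I tried raising the number of runs to 100000
The results barely changed, but the running time of the program (make gencsv) became much longer

So I set the number of runs back to 25 and increased N
Output: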


Across these runs, OpenMP(4 threads) turned out to be the least stable

Implemented Leibniz's formula following the approach in laochanlam's notes
- Proof method taken from shelly4132's notes
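
One way to dampen this kind of noise is the reviewer's suggestion of filtering with a 95% interval. The sketch below interprets that suggestion as: drop samples farther than 1.96 standard deviations from the mean (the range covering about 95% of normally distributed data) and average the rest. This interpretation and the name `filtered_mean` are assumptions, not something the project provides.

```c
#include <stddef.h>

/*
 * Mean of the samples within mean +/- 1.96 * stddev.
 * Squared deviations are compared against 1.96^2 * variance,
 * so no square root is needed. Returns 0.0 when n == 0.
 */
double filtered_mean(const double *samples, size_t n)
{
    if (n == 0)
        return 0.0;

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    double mean = sum / (double) n;

    double sq = 0.0;
    for (size_t i = 0; i < n; i++)
        sq += (samples[i] - mean) * (samples[i] - mean);
    double variance = sq / (double) n;

    double limit = 1.96 * 1.96 * variance;
    double kept_sum = 0.0;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        double d = samples[i] - mean;
        if (d * d <= limit) {
            kept_sum += samples[i];
            kept++;
        }
    }
    /* Fall back to the plain mean if nothing survived the filter. */
    return kept ? kept_sum / (double) kept : mean;
}
```

In the benchmark this would mean recording each run's time in an array instead of timing all runs at once, then reporting `filtered_mean` of that array for each N.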
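
For reference, the series summed by the function below follows from the Taylor expansion of arctan. This is the standard textbook derivation and not necessarily the proof used in the referenced notes:

```latex
\arctan x = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{2k+1}, \qquad |x| \le 1
\quad\Longrightarrow\quad
\frac{\pi}{4} = \arctan 1 = \sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1}
\quad\Longrightarrow\quad
\pi \approx 4 \sum_{i=0}^{N-1} \frac{(-1)^i}{2i+1}
```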

double compute_pi_leibniz(size_t N)
{
    double pi = 0.0;
    double term = 0.0;
    double sign = 1.0;
    for (size_t i = 0; i < N; i++) {
        term = (double) sign / ( 2.0 * i + 1.0 );
        sign = -sign;
        pi += term;
    }
    return pi * 4.0;
}
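
Following the reviewer's suggestion, the loop above could be parallelized with OpenMP. The carried `sign` variable creates a dependency between iterations, so this sketch derives the sign from the parity of `i` instead and lets a reduction combine the partial sums. The function name `compute_pi_leibniz_openmp` and its `threads` parameter are assumptions made for illustration.

```c
#include <stddef.h>

/* Leibniz series with an OpenMP reduction; sign comes from the parity of i. */
double compute_pi_leibniz_openmp(size_t N, int threads)
{
    double pi = 0.0;
    #pragma omp parallel for num_threads(threads) reduction(+:pi)
    for (size_t i = 0; i < N; i++) {
        double sign = (i & 1) ? -1.0 : 1.0;
        pi += sign / (2.0 * (double) i + 1.0);
    }
    return pi * 4.0;
}
```

When compiled without `-fopenmp` the pragma is ignored and the function runs serially with the same result, up to floating-point summation order.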

Output:

error rate
- Quoted from: the detailed analysis provided by senior student 廖健富

    // Error rate calculation
    #define M_PI acos(-1.0)
    double pi = compute_pi(n);
    double diff = pi - M_PI > 0 ? pi - M_PI : M_PI - pi;
    double error = diff / M_PI;

π is obtained as cos^-1(-1); the error rate is the relative deviation of each version's result from that reference value
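
The quoted fragment can be wrapped into a small helper and applied to the Leibniz version. This is a minimal sketch: `error_rate` is a hypothetical name, and the reference value is held in a local constant rather than redefining `M_PI`, which some `<math.h>` headers already define.

```c
#include <math.h>
#include <stddef.h>

/* Leibniz partial sum with N terms, as in the notes above. */
double compute_pi_leibniz(size_t N)
{
    double pi = 0.0;
    double sign = 1.0;
    for (size_t i = 0; i < N; i++) {
        pi += sign / (2.0 * (double) i + 1.0);
        sign = -sign;
    }
    return pi * 4.0;
}

/* Relative error of `approx` against the reference value acos(-1.0). */
double error_rate(double approx)
{
    const double ref = acos(-1.0);
    return fabs(approx - ref) / ref;
}
```

Evaluating `error_rate(compute_pi_leibniz(N))` for increasing N gives the data for an error-rate plot. On Linux, linking may require `-lm`.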
