embedded
contributed by <Cayonliow>
PeterTing
tags: LibreOffice, SIMD, Leibniz, time, error rate
cayon@cayon-X550JX:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
Stepping: 3
CPU MHz: 2423.484
CPU max MHz: 3600.0000
CPU min MHz: 800.0000
BogoMIPS: 5188.45
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-7
Pi
Leibniz
Riemann integral
time
Wall-clock time
Provided by <time.h>. Note that the precision of the measured time depends on the system.
CPU time
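A computer is fundamentally a sequential circuit. For a sequential circuit to operate correctly, the key requirement is a clock that controls when data is synchronized and updated. The most basic unit of time in a computer is therefore the clock cycle.
CPU time can be viewed as the number of clock cycles a program executes on the CPU, multiplied by the actual duration of each clock cycle.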
Article: CPU Time
After git clone, I looked into using LibreOffice to draw the charts.
Problem encountered:
cayon@cayon-X550JX:~/embedded/hw1/compute-pi$ make plot
gnuplot scripts/runtime.gp
plot[:][:]'result_clock_gettime.csv' using 1:2 title 'Baseline', '' using 1:3 title 'OpenMP(2 threads)', '' using 1:4 title 'OpenMP(4 threads)', '' using 1:5 title 'AVX', '' using 1:6 title 'AVX + Unroll looping', \
^
"scripts/runtime.gp", line 14: invalid character \
Makefile:44: recipe for target 'plot' failed
make: *** [plot] Error 1
It turns out that no whitespace of any kind may follow the trailing \ at the end of a line.
$ make check
This gives the time each implementation takes to compute pi:
time ./time_test_baseline
N = 400000000 , pi = 3.141593
3.26user 0.00system 0:03.27elapsed 99%CPU (0avgtext+0avgdata 1788maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
time ./time_test_openmp_2
N = 400000000 , pi = 3.141593
3.24user 0.00system 0:01.62elapsed 199%CPU (0avgtext+0avgdata 1724maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
time ./time_test_openmp_4
N = 400000000 , pi = 3.141593
4.86user 0.00system 0:01.60elapsed 303%CPU (0avgtext+0avgdata 1784maxresident)k
0inputs+0outputs (0major+92minor)pagefaults 0swaps
time ./time_test_avx
N = 400000000 , pi = 3.141593
0.91user 0.00system 0:00.91elapsed 99%CPU (0avgtext+0avgdata 1712maxresident)k
0inputs+0outputs (0major+86minor)pagefaults 0swaps
time ./time_test_avxunroll
N = 400000000 , pi = 3.141593
0.87user 0.00system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 1764maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps
100,0.000025,0.000072,0.000076,0.000017,0.000007
5100,0.001208,0.000596,0.001392,0.001637,0.000525
10100,0.002192,0.001114,0.001097,0.000590,0.000555
15100,0.003449,0.001743,0.000894,0.000868,0.000840
20100,0.004585,0.003216,0.002393,0.001183,0.001291
25100,0.005393,0.002688,0.002675,0.001436,0.001384
Set datafile separator "," so that gnuplot can tell apart the individual values on each line.
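In benchmark_clock_gettime.c, each method is executed 25 times:
int N = atoi(argv[1]);
int i, loop = 25;
// Baseline: timestamp, run the function `loop` times, timestamp again
clock_gettime(CLOCK_ID, &start);
for (i = 0; i < loop; i++) {
    compute_pi_baseline(N);
}
clock_gettime(CLOCK_ID, &end);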
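Set the number of runs to 1000.
Output:
Baseline clearly has the steepest slope and takes the longest. For OpenMP (4 threads), AVX, and AVX + Unroll looping, however, the curves lie too close together, and OpenMP (4 threads) and AVX both show unstable timings, so the data does not separate these three implementations clearly.
So I tried setting the number of runs to 10000.
Output:
So I tried setting the number of runs to 100000.
The difference was still small, but the running time of the program (make gencsv) became much longer.
So I set the number of runs back to 25 and increased N instead.
Output: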
double compute_pi_leibniz(size_t N)
{
    double pi = 0.0;
    double sign = 1.0;
    for (size_t i = 0; i < N; i++) {
        /* i-th term of the series: (-1)^i / (2i + 1) */
        pi += sign / (2.0 * i + 1.0);
        sign = -sign;
    }
    return pi * 4.0;
}
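For reference, the series this function sums can be written out as below, together with the standard alternating-series bound on its truncation error. The symbol \pi_N for the N-term result is notation introduced here, not taken from the code.

```latex
\frac{\pi}{4} = \sum_{i=0}^{\infty} \frac{(-1)^i}{2i+1}
             = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots,
\qquad
\pi_N = 4 \sum_{i=0}^{N-1} \frac{(-1)^i}{2i+1},
\qquad
\lvert \pi - \pi_N \rvert \le \frac{4}{2N+1}.
```

The bound shrinks only like 1/N, so each additional correct decimal digit costs roughly ten times as many terms.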
Output:
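// Error rate calculation
#include <math.h>   /* acos, fabs */
#ifndef M_PI        /* <math.h> may already define M_PI */
#define M_PI acos(-1.0)
#endif
double pi = compute_pi(n);
double diff = fabs(pi - M_PI);   /* absolute error */
double error = diff / M_PI;      /* relative error */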
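pi is obtained from cos⁻¹(−1), i.e. acos(-1.0), and used as the reference value; the error rate of each version is its deviation from that reference, divided by the reference.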