contributed by<zhanyangch
>
zhanyangch
sysprog2017
week1
Sean1127
預期執行 $ make plot 後,可透過 gnuplot 產生前述多種實做的效能分析比較圖表
clock_gettime()
得出的結果是正確的?clock_gettime()
跟其他的 function 有什麼不同?add machin-like method
不夠清楚,因為他人跟本不知道什麼是 machin-like method、新增這個 method 的目的、作法Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
每核心執行緒數:2
每通訊端核心數:2
Socket(s): 1
NUMA 節點: 1
供應商識別號: GenuineIntel
CPU 家族: 6
型號: 42
Model name: Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz
製程: 7
CPU MHz: 855.421
CPU max MHz: 3500.0000
CPU min MHz: 800.0000
BogoMIPS: 5587.06
虛擬: VT-x
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 4096K
NUMA node0 CPU(s): 0-3
$ make check
time ./time_test_baseline
N = 400000000 , pi = 3.141593
5.21user 0.00system 0:05.21elapsed 99%CPU (0avgtext+0avgdata 1872maxresident)k
0inputs+0outputs (0major+86minor)pagefaults 0swaps
time ./time_test_openmp_2
N = 400000000 , pi = 3.141593
5.42user 0.00system 0:02.72elapsed 199%CPU (0avgtext+0avgdata 1760maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
time ./time_test_openmp_4
N = 400000000 , pi = 3.141593
9.58user 0.00system 0:02.79elapsed 343%CPU (0avgtext+0avgdata 1828maxresident)k
0inputs+0outputs (0major+94minor)pagefaults 0swaps
time ./time_test_avx
N = 400000000 , pi = 3.141593
1.72user 0.00system 0:01.72elapsed 99%CPU (0avgtext+0avgdata 1792maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps
time ./time_test_avxunroll
N = 400000000 , pi = 3.141593
1.84user 0.00system 0:01.84elapsed 99%CPU (0avgtext+0avgdata 1716maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
$ make gencsv
for (int i = 0; i <= N - 16; i += 16) {
ymm14 = _mm256_set1_pd(i * dt);
...
ymm9 = _mm256_add_pd(ymm9, ymm13);
}
double arctan(double x , size_t N,int threads)
{
double atan = 0.0;
double dz = x / N;
double z;
#pragma omp parallel num_threads(threads)
{
#pragma omp for private(z) reduction(+:atan)
for(size_t i=0; i<N ;i++) {
z =(double) x * i / N;
atan += dz / (1.0 + z * z );
}
}
return atan;
}
double compute_pi_machin(size_t N, int threads)
{
double pi = 4.0 * arctan((double)0.2, N, threads) - arctan((double)1.0 / 239, N, threads);
return pi * 4.0;
}