
2016q3 Homework1 / compute-pi

tags: jserv

Date: 2016/09/30

contributed by <hankgo>


System environment

  • OS: Lubuntu 16.04 (upgraded from 15.10)

  • CPU: Intel i3-2350M (2 cores / 4 threads)

    • Queried with $ lscpu
    • L1 data cache = 32K
    • L1 instruction cache = 32K
    • L2 cache = 256K
    • L3 cache = 3072K
  • RAM: 8GB

Baseline method

Mathematical background

Why does integrating 1/(1+x^2) over [0, 1] give pi/4?

  • REFERENCE


  • I just saw other classmates prove this with the calculus formula. I had honestly forgotten it (bows head).
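The calculus step goes like this. The derivative of tan y is sec^2 y = 1 + tan^2 y, so its inverse arctan x has derivative 1/(1+x^2), and the fundamental theorem of calculus gives:

$$
\frac{d}{dx}\arctan x = \frac{1}{1+x^2}
\quad\Longrightarrow\quad
\int_0^1 \frac{dx}{1+x^2} = \arctan 1 - \arctan 0 = \frac{\pi}{4} - 0 = \frac{\pi}{4}
$$

Hence $\pi = 4\int_0^1 \frac{dx}{1+x^2}$, which is why compute_pi_baseline multiplies its sum by 4.0 before returning.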

Riemann Integral

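  • The video makes the idea clear within its first minute or two: the finer [0, 1] is sliced into pieces of width 1/N, the closer the total area of the rectangles gets to the true area under the curve. The baseline method relies on exactly this, approximating the true value of pi by making N larger and therefore 1/N smaller.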
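To see this convergence numerically, here is a small sketch that evaluates the same left Riemann sum as the baseline, $L_N = \frac{1}{N}\sum_{i=0}^{N-1} f(i/N)$ with $f(x) = 1/(1+x^2)$, and measures its error for a chosen N. The names riemann_pi, riemann_error and PI_REF are made up for this illustration.

```c
#include <stddef.h>  /* size_t */

/* Reference value of pi, written out to avoid relying on the non-standard M_PI. */
static const double PI_REF = 3.14159265358979323846;

/* 4 * (left Riemann sum of 1/(1+x^2) over [0, 1] with N slices):
   the same computation as compute_pi_baseline. */
static double riemann_pi(size_t N)
{
    double sum = 0.0;
    double dt = 1.0 / N;              /* slice width 1/N */
    for (size_t i = 0; i < N; i++) {
        double x = (double) i / N;    /* left endpoint of slice i */
        sum += dt / (1.0 + x * x);    /* area of rectangle i */
    }
    return 4.0 * sum;
}

/* Absolute error of the estimate for a given N. */
static double riemann_error(size_t N)
{
    double d = riemann_pi(N) - PI_REF;
    return d < 0.0 ? -d : d;
}
```

For example, compare riemann_error(10), riemann_error(1000) and riemann_error(100000). Because f is decreasing on [0, 1], every left-endpoint rectangle sticks out above the curve, so the estimate is always slightly too large. Its error is roughly 1/N, so each tenfold increase in N gains about one more correct decimal digit.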

Code

```c
#include <stddef.h> /* size_t */

double compute_pi_baseline(size_t N)
{
    double pi = 0.0;
    double dt = 1.0 / N;               // dt = (b - a) / N, with b = 1, a = 0
    for (size_t i = 0; i < N; i++) {
        double x = (double) i / N;     // x = t_i = a + (b - a) * i / N = i / N
        pi += dt / (1.0 + x * x);      // integrate 1/(1+x^2): add f(t_i) * dt, i = 0 .. N-1
    }
    return pi * 4.0;                   // the integral equals pi/4
}
```

First run

Output

$ make check

Define

gcc -c -O0 -std=gnu99 -Wall -fopenmp -mavx computepi.c -o computepi.o 
gcc -O0 -std=gnu99 -Wall -fopenmp -mavx computepi.o time_test.c -DBASELINE -o time_test_baseline
gcc -O0 -std=gnu99 -Wall -fopenmp -mavx computepi.o time_test.c -DOPENMP_2 -o time_test_openmp_2
gcc -O0 -std=gnu99 -Wall -fopenmp -mavx computepi.o time_test.c -DOPENMP_4 -o time_test_openmp_4
gcc -O0 -std=gnu99 -Wall -fopenmp -mavx computepi.o time_test.c -DAVX -o time_test_avx
gcc -O0 -std=gnu99 -Wall -fopenmp -mavx computepi.o time_test.c -DAVXUNROLL -o time_test_avxunroll
gcc -O0 -std=gnu99 -Wall -fopenmp -mavx computepi.o benchmark_clock_gettime.c -o benchmark_clock_gettime
time ./time_test_baseline

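Here the programs are compiled with different flags. The essential difference is the -DBASELINE, -DOPENMP_2, -DOPENMP_4, -DAVX and -DAVXUNROLL options, and each one produces a program with different behavior. Looking at the source, the selection is done by the following code in time_test.c (the -D options are only passed when compiling time_test.c, while computepi.c is compiled once without them):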

```c
#if defined(BASELINE)
    pi = compute_pi_baseline(N);
#endif

#if defined(OPENMP_2)
    pi = compute_pi_openmp(N, 2);
#endif

#if defined(OPENMP_4)
    pi = compute_pi_openmp(N, 4);
#endif

#if defined(AVX)
    pi = compute_pi_avx(N);
#endif

#if defined(AVXUNROLL)
    pi = compute_pi_avx_unroll(N);
#endif
```
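This note does not show compute_pi_openmp itself. To give a rough idea of what such a function involves, here is a minimal sketch that parallelizes the baseline loop with OpenMP's reduction clause. The name pi_openmp_sketch is hypothetical, and this is not the repository's implementation.

```c
#include <stddef.h>  /* size_t */

/* Sketch of an OpenMP version of the baseline (hypothetical, not the repo's
   compute_pi_openmp). Build with -fopenmp; without it the pragma is ignored
   and the loop runs serially with the same result. */
static double pi_openmp_sketch(size_t N, int threads)
{
    double sum = 0.0;
    double dt = 1.0 / N;
    /* The iterations are split among `threads` threads. Each thread keeps a
       private partial sum, and reduction(+:sum) adds them up at the end. */
    #pragma omp parallel for num_threads(threads) reduction(+:sum)
    for (size_t i = 0; i < N; i++) {
        double x = (double) i / N;
        sum += dt / (1.0 + x * x);
    }
    return 4.0 * sum;
}
```

Because time counts the CPU time of every thread, a multi-threaded run like this can report more user time than elapsed time. Without -fopenmp, gcc -Wall only warns about the unknown pragma.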

According to the GCC online documentation, the -D option predefines a macro:

-D name
Predefine name as a macro, with definition 1.

-D name=definition
The contents of definition are tokenized and processed as if they appeared during translation phase three in a ‘#define’ directive. In particular, the definition will be truncated by embedded newline characters.

If you are invoking the preprocessor from a shell or shell-like program you may need to use the shell's quoting syntax to protect characters such as spaces that have a meaning in the shell syntax.

If you wish to define a function-like macro on the command line, write its argument list with surrounding parentheses before the equals sign (if any). Parentheses are meaningful to most shells, so you will need to quote the option. With sh and csh, -D'name(args)=definition' works.

-D and -U options are processed in the order they are given on the command line. All -imacros file and -include file options are processed after all -D and -U options.

-U name
Cancel any previous definition of name, either built in or provided with a -D option.


run time

N = 400000000 , pi = 3.141593
7.71user 0.00system 0:07.71elapsed 99%CPU (0avgtext+0avgdata 1788maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps
time ./time_test_openmp_2


N = 400000000 , pi = 3.141593
8.62user 0.00system 0:04.31elapsed 199%CPU (0avgtext+0avgdata 1796maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps
time ./time_test_openmp_4


N = 400000000 , pi = 3.141593
15.34user 0.00system 0:03.86elapsed 397%CPU (0avgtext+0avgdata 1816maxresident)k
0inputs+0outputs (0major+94minor)pagefaults 0swaps
time ./time_test_avx


N = 400000000 , pi = 3.141593
2.40user 0.00system 0:02.40elapsed 99%CPU (0avgtext+0avgdata 1868maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
time ./time_test_avxunroll


N = 400000000 , pi = 3.141593
2.74user 0.00system 0:02.74elapsed 99%CPU (0avgtext+0avgdata 1720maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps
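Taking the baseline's elapsed time (7.71 s) as the reference, the speedup of each variant is S = T_baseline / T_variant:

$$
\begin{aligned}
S_{\text{openmp\_2}} &= 7.71 / 4.31 \approx 1.79 \\
S_{\text{openmp\_4}} &= 7.71 / 3.86 \approx 2.00 \\
S_{\text{avx}} &= 7.71 / 2.40 \approx 3.21 \\
S_{\text{avxunroll}} &= 7.71 / 2.74 \approx 2.81
\end{aligned}
$$

On this 2-core / 4-thread CPU, going from 2 to 4 OpenMP threads raises the speedup only from 1.79 to 2.00. All binaries were built with -O0 and run once, so small gaps such as AVX versus AVX-unroll may simply reflect run-to-run noise.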

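The output reports three kinds of time, which differ as follows:

  • user time: CPU time spent executing the program's own code in user mode, summed over all of its threads
  • system time: CPU time the kernel spent on behalf of the process (e.g. system calls)
  • elapsed time: wall-clock time from start to finish

%CPU is (user + system) / elapsed. For example, time_test_openmp_2 used 8.62 s of user time within 4.31 s of wall-clock time, roughly 2×, reported as 199%: two threads were busy at the same time.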
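The same distinction can be observed from inside a program. This is a minimal sketch assuming a POSIX system. CLOCK_MONOTONIC measures wall-clock (elapsed) time, while CLOCK_PROCESS_CPUTIME_ID measures the CPU time consumed by all threads of the process. The names ts_diff, demo_pi, struct timing and time_call are hypothetical, and this is not the repository's benchmark_clock_gettime.c.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime and the CLOCK_* ids */
#endif
#include <stddef.h>
#include <time.h>

/* Seconds between two timespec samples. */
static double ts_diff(struct timespec a, struct timespec b)
{
    return (double) (b.tv_sec - a.tv_sec) + (double) (b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Hypothetical copy of the baseline, used as a single-threaded workload. */
static double demo_pi(size_t N)
{
    double sum = 0.0, dt = 1.0 / N;
    for (size_t i = 0; i < N; i++) {
        double x = (double) i / N;
        sum += dt / (1.0 + x * x);
    }
    return 4.0 * sum;
}

struct timing {
    double elapsed; /* wall-clock seconds */
    double cpu;     /* CPU seconds of the whole process, summed over threads */
};

/* Runs fn(N) once, stores its result in *out, and returns both times. */
static struct timing time_call(double (*fn)(size_t), size_t N, double *out)
{
    struct timespec w0, w1, c0, c1;
    clock_gettime(CLOCK_MONOTONIC, &w0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);
    *out = fn(N);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);
    clock_gettime(CLOCK_MONOTONIC, &w1);
    struct timing t = { ts_diff(w0, w1), ts_diff(c0, c1) };
    return t;
}
```

For a single-threaded function such as demo_pi, cpu and elapsed should come out close to each other. Timing a 2-thread OpenMP function the same way should give cpu near twice elapsed, mirroring the 199%CPU line above.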

Plotting