2017q1 Homework1 (compute-pi)

contributed by < twzjwang >
作業說明: B03: compute-pi
github: twzjwang/compute-pi

Reviewed by `hunng`

實際實驗、比較各種擷取時間函數並比較
雖然只是更換 rand() 成 rand_r() 但卻能在效能上做出明顯改善

開發環境

Ubuntu Ubuntu 16.04.2 LTS
Linux 4.8.0-36-generic
cpu
version: Intel® Core™ i5-3337U CPU @ 1.80GHz
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
memory
size: 4GiB

開發前

數學相關式子可以試著使用 LaTeX 表示喔!
課程助教

時間處理
- Wall-clock time
  - 自 1970-01-01 起經歷的秒數
  - 「通常」不建議拿來量測程式時間
  - 不一定是單調遞增(monotonic)
- CPU time
  - 程序在 CPU 上面運行消耗 (佔用) 的時間
  - clock()
  - multithreads - 每條 thread 的使用時間加總

時間函數

clock()
- return number of clock ticks
- 將結果除以 CLOCKS_PER_SEC 得到時間
- 在 32-bit 系統下 CLOCKS_PER_SEC = 1000000
  
  $2^{32}$ / 1000000 = 4295(s) = 72(min)
  約72分鐘後會 wrap around
- 不能區分 kernel mode 、 user mode 占用的 CPU time

clock_gettime()

根據參數 clockid_t 決定 clock 類型
- system-wide (for all processes)
- per-process
時間單位 nanoseconds : 1/
$10^{- 9}$ (s) , 10 億分之一秒

struct timespec {
    time_t   tv_sec;        /* seconds */
    long     tv_nsec;       /* nanoseconds */
};

time()
- return 1970-01-01 00:00:00 +0000 (UTC) 至現在的秒數
- 在 32-bit 系統下，time_t 是 signed int32
  可記錄
  $2^{31}$ (s) = 68(years)，也就是到2038年會 wrap around
- 2038年問題

gettimeofday()

時間單位 microseconds :
$1 / 10^{- 6}$ (s)

struct timeval {
    time_t      tv_sec;     /* seconds */
    suseconds_t tv_usec;    /* microseconds */
};

timezone 結構

struct timezone {
    int tz_minuteswest;     /* minutes west of Greenwich */
    int tz_dsttime;         /* type of DST correction */
};

time() 和 gettimeofday() 不適合拿來作 benchmark
- 參考gettimeofday() should never be used to measure time
- Nobody sneaks in and changes your wristwatch when you’re not looking
  - 系統時間可能被改變(NTP、時區…)
- You’re not stupid, the computer is
- You are the limiting factor
  - 生活中通常也是用現實時間的時間差計時
  - 時間單位不夠精確
- 用 clock_gettime 代替
  - It’s not stable across reboots
  - goes up by one second for each second that passes in the physical world

實驗一 - Monte Carlo method

步驟
- 隨機放置一點在 A(x,y) , 0<=x<=1, 0<=y<=1
- 重複 N 次，並紀錄 M
- $p i = 4 * M / N$
C : 半徑為 1 ，原點為(0,0)的園
N : 樣本數
M : A 在 C 內的次數
- $x^{2} + y^{2} < 1$ => 在 C 內
P :
- $P = M / N$
- $P = 園面積 / 正方形面積 = (p i / 4) / 1 = p i / 4$
- =>
  $p i = 4 * M / N$
使用 rand() 產生隨機數

double montecarlo(long long int N)
{
    long long int i;
    long long int M = 0;
    double x,y;
    
    for (i = 0; i < N; i++) {
        x = (double) rand() / RAND_MAX;
        y = (double) rand() / RAND_MAX;
        if ( x * x + y * y < 1)
            M++;
    }
    return (double) M / N * 4;
}

方法 1: 實作 Monte Carlo method，分別使用 4 種時間函數計時

計時方法

clock()

clock_t start = clock();

...

clock_t finish = clock();
double  duration = (double)(finish - start) / CLOCKS_PER_SEC;

clock_gettime()

struct timespec start;
struct timespec finish;
clock_gettime(CLOCK_MONOTONIC, &start);

...

clock_gettime(CLOCK_MONOTONIC, &finish);
double  duration = (finish.tv_sec - start.tv_sec) 
    + (finish.tv_nsec - start.tv_nsec) / 1000000000.0;

time()

time_t start = time(0);

...

time_t finish = time(0);
double  duration = (double)(finish - start);

gettimeofday()

struct timeval start;
struct timeval finish;
gettimeofday(&start,NULL);

...

gettimeofday(&finish,NULL);
double  duration = (finish.tv_sec - start.tv_sec) 
    + (finish.tv_usec - start.tv_usec) / 1000000.0;

結果

$ gcc -o montecarlo montecarlo.c 
$ ./montecarlo
N = 100000000

clock()
pi = 3.141698
2.731048 (s)

clock_gettime()
pi = 3.141420
2.729236 (s)

time()
pi = 3.141591
3.000000 (s)

gettimeofday()
pi = 3.141585
2.749408 (s)

方法 2: 比較不同樣本數得到結果的誤差

使用 clock_gettime()

結果

n	$10^{0}$	$10^{1}$	$10^{2}$	$10^{3}$	$10^{4}$	$10^{5}$	$10^{6}$	$10^{7}$	$10^{8}$	$10^{9}$
time(s)	0.000003	0.000002	0.000015	0.000128	0.000943	0.009436	0.040706	0.277493	2.723461	27.253682
pi	4.000000	3.200000	3.120000	3.152000	3.167200	3.138880	3.141920	3.140899	3.141710	3.141604
error	0.858407	0.058407	0.021593	0.010407	0.025607	0.002713	0.000327	0.000694	0.000117	0.000011

X label 可以參考美圖欣賞第三章修改
課程助教

增加set xtics rotate by 45 right

$ gcc -o montecarlo montecarlo.c 
$ ./montecarlo
N = 1
0.000003 (s)
pi    = 4.000000
error = 0.858407

N = 10
0.000002 (s)
pi    = 3.200000
error = 0.058407

...

N = 100000000
2.723461 (s)
pi    = 3.141710
error = 0.000117

N = 1000000000
27.253682 (s)
pi    = 3.141604
error = 0.000011

方法 3: 用 OpenMP 平行化

#pragma omp parallel default(none) \
private(i,x,y) shared(N) reduction(+:M)
{
    #pragma omp for
    for (i = 0; i < N; i++) {
        x = (double) rand() / RAND_MAX;
        y = (double) rand() / RAND_MAX;
        if ( x * x + y * y < 1)
            M++;
    }
}

結果
- 以 N = 10000000 為例
  - serial
    - 0.277493 (s)
  - parallel
    - 2.966225 (s)
- multithread 比較慢…
  - 在 4 cores 環境下，時間增為 10.6 倍
Program is slower when using multiple threads
- rand() is not thread safe.
OpenMP program is slower than sequential one 下方留言提到
- rand() 使用 global state variables (hidden in the (g)libc implementation)
- 應使用 private 的 seed 作為 parameter 呼叫 rand_r()
- 同時推薦 erand48() 作為更好的 random number source
這位學長筆記中解釋為什麼 multi-thread 不保證比較快
- CheHsuan HackMD
- rand() is not reentrant or thread-safe
  - rand()有用到靜態變數(或全域變數)，導致該靜態變數或全域變數變得不可預期
  - multi-thread 的環境下要使用非 thread-safe 時，必須自己加上lock

方法 4: 用 OpenMP 平行化-改用 rand_r()

參考 How to generate random numbers in parallel? 產生隨機數
以 omp_get_thread_num() 作為 rand_r 的 seed

    #pragma omp parallel default(none) \
    private(i,x,y) shared(N) reduction(+:M)
    {
        unsigned int myseed = omp_get_thread_num();
        #pragma omp for
        for (i = 0; i < N; i++) {
            x = (double) rand_r(&myseed) / RAND_MAX;
            y = (double) rand_r(&myseed) / RAND_MAX;
            if ( x * x + y * y < 1)
                M++;
        }
    }

結果
- 以 N = 10000000 為例，比較平行化前後差異
  - serial
    - 0.277493 (s)
  - parallel
    - 0.079453 (s)
  - 在 4 cores 環境下，執行時間前為平行前的 29%
- 以 N = 10000000 為例，在使用OpenMp平行化下，比較 rand() 與 rand_r()
  - rand()
    - 2.966225 (s)
  - rand_r()
    - 0.079453 (s)
  - rand_r() 執行時間為 rand() 的 3%

將 Monte Carlo method 與其他方法合併

$ make plot
結果
- 暫不考慮誤差大小，僅考慮執行時間

2017q1 Homework1 (compute-pi)

Reviewed by `hunng`

開發環境

開發前

實驗一 - Monte Carlo method

方法 1: 實作 Monte Carlo method，分別使用 4 種時間函數計時

方法 2: 比較不同樣本數得到結果的誤差

方法 3: 用 OpenMP 平行化

方法 4: 用 OpenMP 平行化-改用 rand_r()

將 Monte Carlo method 與其他方法合併

參考資料

tags: `twzjwang`

2017q1 Homework1 (compute-pi)

Reviewed by hunng

開發環境

開發前

實驗一 - Monte Carlo method

方法 1: 實作 Monte Carlo method，分別使用 4 種時間函數計時

方法 2: 比較不同樣本數得到結果的誤差

方法 3: 用 OpenMP 平行化

方法 4: 用 OpenMP 平行化-改用 rand_r()

將 Monte Carlo method 與其他方法合併

參考資料

tags: twzjwang

Reviewed by `hunng`

tags: `twzjwang`