# 2017q1 Homework1 (compute-pi)
contributed by < `twzjwang` >
作業說明: [B03: compute-pi](https://hackmd.io/s/HkiJpDPtx)
github: [twzjwang/compute-pi](https://github.com/twzjwang/compute-pi)
### Reviewed by `hunng`
- 實際實驗、比較各種擷取時間函數並比較
- 雖然只是更換 rand() 成 rand_r() 但卻能在效能上做出明顯改善
# 開發環境
- Ubuntu Ubuntu 16.04.2 LTS
Linux 4.8.0-36-generic
- cpu
version: Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
- memory
size: 4GiB
# 開發前
- [黎曼積分](https://zh.wikipedia.org/wiki/%E9%BB%8E%E6%9B%BC%E7%A7%AF%E5%88%86)
- [AVX](https://zh.wikipedia.org/wiki/AVX%E6%8C%87%E4%BB%A4%E9%9B%86)
>數學相關式子可以試著使用 LaTeX 表示喔!
>[name=課程助教][color=red]
- 時間處理
- Wall-clock time
- 自 1970-01-01 起經歷的秒數
- 「通常」不建議拿來量測程式時間
- 不一定是[單調遞增(monotonic)](https://zh.wikipedia.org/wiki/%E5%8D%95%E8%B0%83%E5%87%BD%E6%95%B0)
- CPU time
- 程序在 CPU 上面運行消耗 (佔用) 的時間
- clock()
- multithreads - 每條 thread 的使用時間加總
- 時間函數
- [clock()](https://linux.die.net/man/3/clock)
- return number of clock ticks
- 將結果除以 `CLOCKS_PER_SEC` 得到時間
- 在 32-bit 系統下 `CLOCKS_PER_SEC` = 1000000
$2^{32}$ / 1000000 = 4295(s) = 72(min)
約72分鐘後會 wrap around
- 不能區分 kernel mode 、 user mode 占用的 CPU time
- [clock_gettime()](https://linux.die.net/man/2/clock_gettime)
- 根據參數 `clockid_t` 決定 clock 類型
- system-wide (for all processes)
- per-process
- 時間單位 nanoseconds : 1/$10^{-9}$ (s) , 10 億分之一秒
```C
struct timespec {
time_t tv_sec; /* seconds */
long tv_nsec; /* nanoseconds */
};
```
- [time()](https://linux.die.net/man/2/time)
- return 1970-01-01 00:00:00 +0000 (UTC) 至現在的秒數
- 在 32-bit 系統下,time_t 是 signed int32
可記錄 $2^{31}$(s) = 68(years),也就是到2038年會 wrap around
- [2038年問題](https://zh.wikipedia.org/wiki/2038%E5%B9%B4%E9%97%AE%E9%A2%98)
- [gettimeofday()](https://linux.die.net/man/2/gettimeofday)
- 時間單位 microseconds : $1/10^{-6}$ (s)
```c
struct timeval {
time_t tv_sec; /* seconds */
suseconds_t tv_usec; /* microseconds */
};
```
- `timezone` 結構
```c
struct timezone {
int tz_minuteswest; /* minutes west of Greenwich */
int tz_dsttime; /* type of DST correction */
};
```
- time() 和 gettimeofday() 不適合拿來作 benchmark
- 參考[gettimeofday() should never be used to measure time
](https://blog.habets.se/2010/09/gettimeofday-should-never-be-used-to-measure-time.html)
- Nobody sneaks in and changes your wristwatch when you’re not looking
- 系統時間可能被改變(NTP、時區...)
- You’re not stupid, the computer is
- You are the limiting factor
- 生活中通常也是用現實時間的時間差計時
- 時間單位不夠精確
- 用 clock_gettime 代替
- It’s not stable across reboots
- goes up by one second for each second that passes in the physical world
# 實驗一 - [Monte Carlo method](https://en.wikipedia.org/wiki/Monte_Carlo_method)
- 步驟
- 隨機放置一點在 A(x,y) , 0<=x<=1, 0<=y<=1
- 重複 N 次,並紀錄 M
- $pi = 4 * M / N$
- C : 半徑為 1 ,原點為(0,0)的園
- N : 樣本數
- M : A 在 C 內的次數
- $x^2 + y^2 < 1$ => 在 C 內
- P :
- $P = M / N$
- $P = 園面積 / 正方形面積 = (pi / 4) / 1 = pi / 4$
- => $pi = 4 * M / N$
- 使用 rand() 產生隨機數
```c
double montecarlo(long long int N)
{
long long int i;
long long int M = 0;
double x,y;
for (i = 0; i < N; i++) {
x = (double) rand() / RAND_MAX;
y = (double) rand() / RAND_MAX;
if ( x * x + y * y < 1)
M++;
}
return (double) M / N * 4;
}
```
## 方法 1: 實作 Monte Carlo method,分別使用 4 種時間函數計時
- 計時方法
- clock()
```C
clock_t start = clock();
...
clock_t finish = clock();
double duration = (double)(finish - start) / CLOCKS_PER_SEC;
```
- clock_gettime()
```C
struct timespec start;
struct timespec finish;
clock_gettime(CLOCK_MONOTONIC, &start);
...
clock_gettime(CLOCK_MONOTONIC, &finish);
double duration = (finish.tv_sec - start.tv_sec)
+ (finish.tv_nsec - start.tv_nsec) / 1000000000.0;
```
- time()
```C
time_t start = time(0);
...
time_t finish = time(0);
double duration = (double)(finish - start);
```
- gettimeofday()
```C
struct timeval start;
struct timeval finish;
gettimeofday(&start,NULL);
...
gettimeofday(&finish,NULL);
double duration = (finish.tv_sec - start.tv_sec)
+ (finish.tv_usec - start.tv_usec) / 1000000.0;
```
- 結果
```
$ gcc -o montecarlo montecarlo.c
$ ./montecarlo
N = 100000000
clock()
pi = 3.141698
2.731048 (s)
clock_gettime()
pi = 3.141420
2.729236 (s)
time()
pi = 3.141591
3.000000 (s)
gettimeofday()
pi = 3.141585
2.749408 (s)
```
## 方法 2: 比較不同樣本數得到結果的誤差
- 使用 clock_gettime()
- 結果
n | $10^0$ | $10^1$ | $10^2$ | $10^3$ | $10^4$ | $10^5$ | $10^6$ | $10^7$ | $10^8$ | $10^9$
-|----|----|----|----|----|----|----|----|----|----
time(s)|0.000003|0.000002|0.000015|0.000128|0.000943|0.009436|0.040706|0.277493|2.723461|27.253682
pi|4.000000|3.200000|3.120000|3.152000|3.167200|3.138880|3.141920|3.140899|3.141710|3.141604
error|0.858407|0.058407|0.021593|0.010407|0.025607|0.002713|0.000327|0.000694|0.000117|0.000011
![](https://i.imgur.com/yPHQL6X.png)
>X label 可以參考[美圖欣賞第三章](https://hackmd.io/s/Skwp-alOg#美圖欣賞) 修改
>[name=課程助教][color=red]
>增加`set xtics rotate by 45 right`
```
$ gcc -o montecarlo montecarlo.c
$ ./montecarlo
N = 1
0.000003 (s)
pi = 4.000000
error = 0.858407
N = 10
0.000002 (s)
pi = 3.200000
error = 0.058407
...
N = 100000000
2.723461 (s)
pi = 3.141710
error = 0.000117
N = 1000000000
27.253682 (s)
pi = 3.141604
error = 0.000011
```
## 方法 3: 用 OpenMP 平行化
```C
#pragma omp parallel default(none) \
private(i,x,y) shared(N) reduction(+:M)
{
#pragma omp for
for (i = 0; i < N; i++) {
x = (double) rand() / RAND_MAX;
y = (double) rand() / RAND_MAX;
if ( x * x + y * y < 1)
M++;
}
}
```
- 結果
- 以 N = 10000000 為例
- serial
- 0.277493 (s)
- parallel
- 2.966225 (s)
- multithread 比較慢...
- 在 4 cores 環境下,時間增為 10.6 倍
- [Program is slower when using multiple threads](http://stackoverflow.com/questions/16437105/program-is-slower-when-using-multiple-threads)
- rand() is not thread safe.
- [OpenMP program is slower than sequential one](http://stackoverflow.com/questions/10624755/openmp-program-is-slower-than-sequential-one) 下方留言提到
- rand() 使用 global state variables (hidden in the (g)libc implementation)
- 應使用 private 的 seed 作為 parameter 呼叫 rand_r()
- 同時推薦 [erand48()](https://linux.die.net/man/3/drand48) 作為更好的 random number source
- 這位學長筆記中解釋為什麼 multi-thread 不保證比較快
- [CheHsuan HackMD](https://hackmd.io/s/BkbydgVa)
- rand() is not reentrant or thread-safe
- rand()有用到靜態變數(或全域變數),導致該靜態變數或全域變數變得不可預期
- multi-thread 的環境下要使用非 thread-safe 時,必須自己加上lock
## 方法 4: 用 OpenMP 平行化-改用 [rand_r()](https://linux.die.net/man/3/rand_r)
- 參考 [How to generate random numbers in parallel?](http://stackoverflow.com/questions/4287531/how-to-generate-random-numbers-in-parallel) 產生隨機數
- 以 omp_get_thread_num() 作為 rand_r 的 seed
```C
#pragma omp parallel default(none) \
private(i,x,y) shared(N) reduction(+:M)
{
unsigned int myseed = omp_get_thread_num();
#pragma omp for
for (i = 0; i < N; i++) {
x = (double) rand_r(&myseed) / RAND_MAX;
y = (double) rand_r(&myseed) / RAND_MAX;
if ( x * x + y * y < 1)
M++;
}
}
```
- 結果
- 以 N = 10000000 為例,比較平行化前後差異
- serial
- 0.277493 (s)
- parallel
- 0.079453 (s)
- 在 4 cores 環境下,執行時間前為平行前的 29%
- 以 N = 10000000 為例,在使用OpenMp平行化下,比較 rand() 與 rand_r()
- rand()
- 2.966225 (s)
- rand_r()
- 0.079453 (s)
- rand_r() 執行時間為 rand() 的 3%
![](https://i.imgur.com/QHBdkoR.png)
# 將 Monte Carlo method 與其他方法合併
- `$ make plot`
- 結果
- 暫不考慮誤差大小,僅考慮執行時間
- ![](https://i.imgur.com/7rZmnhs.png)
# 參考資料
- [clock()、time()、clock_gettime()和gettimeofday()函数的用法和区别](http://www.cnblogs.com/krythur/archive/2013/02/25/2932647.html)
- [vic85821 HackMD](https://hackmd.io/s/rkgYN0P6#)
- [Program is slower when using multiple threads](http://stackoverflow.com/questions/16437105/program-is-slower-when-using-multiple-threads)
- [OpenMP program is slower than sequential one](http://stackoverflow.com/questions/10624755/openmp-program-is-slower-than-sequential-one)
- [CheHsuan HackMD](https://hackmd.io/s/BkbydgVa)
- [rand()](https://linux.die.net/man/3/rand)
- [rand_r()](https://linux.die.net/man/3/rand_r)
- [erand48()](https://linux.die.net/man/3/drand48)
- [How to generate random numbers in parallel?](http://stackoverflow.com/questions/4287531/how-to-generate-random-numbers-in-parallel)
- ['zmke' script.gp](https://github.com/zmke/compute-pi/blob/master/script.gp)
###### tags: `twzjwang`