# 2017q1 Homework1 (compute-pi) contributed by < `twzjwang` > 作業說明: [B03: compute-pi](https://hackmd.io/s/HkiJpDPtx) github: [twzjwang/compute-pi](https://github.com/twzjwang/compute-pi) ### Reviewed by `hunng` - 實際實驗、比較各種擷取時間函數並比較 - 雖然只是更換 rand() 成 rand_r() 但卻能在效能上做出明顯改善 # 開發環境 - Ubuntu Ubuntu 16.04.2 LTS Linux 4.8.0-36-generic - cpu version: Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K - memory size: 4GiB # 開發前 - [黎曼積分](https://zh.wikipedia.org/wiki/%E9%BB%8E%E6%9B%BC%E7%A7%AF%E5%88%86) - [AVX](https://zh.wikipedia.org/wiki/AVX%E6%8C%87%E4%BB%A4%E9%9B%86) >數學相關式子可以試著使用 LaTeX 表示喔! >[name=課程助教][color=red] - 時間處理 - Wall-clock time - 自 1970-01-01 起經歷的秒數 - 「通常」不建議拿來量測程式時間 - 不一定是[單調遞增(monotonic)](https://zh.wikipedia.org/wiki/%E5%8D%95%E8%B0%83%E5%87%BD%E6%95%B0) - CPU time - 程序在 CPU 上面運行消耗 (佔用) 的時間 - clock() - multithreads - 每條 thread 的使用時間加總 - 時間函數 - [clock()](https://linux.die.net/man/3/clock) - return number of clock ticks - 將結果除以 `CLOCKS_PER_SEC` 得到時間 - 在 32-bit 系統下 `CLOCKS_PER_SEC` = 1000000 $2^{32}$ / 1000000 = 4295(s) = 72(min) 約72分鐘後會 wrap around - 不能區分 kernel mode 、 user mode 占用的 CPU time - [clock_gettime()](https://linux.die.net/man/2/clock_gettime) - 根據參數 `clockid_t` 決定 clock 類型 - system-wide (for all processes) - per-process - 時間單位 nanoseconds : 1/$10^{-9}$ (s) , 10 億分之一秒 ```C struct timespec { time_t tv_sec; /* seconds */ long tv_nsec; /* nanoseconds */ }; ``` - [time()](https://linux.die.net/man/2/time) - return 1970-01-01 00:00:00 +0000 (UTC) 至現在的秒數 - 在 32-bit 系統下,time_t 是 signed int32 可記錄 $2^{31}$(s) = 68(years),也就是到2038年會 wrap around - [2038年問題](https://zh.wikipedia.org/wiki/2038%E5%B9%B4%E9%97%AE%E9%A2%98) - [gettimeofday()](https://linux.die.net/man/2/gettimeofday) - 時間單位 microseconds : $1/10^{-6}$ (s) ```c struct timeval { time_t tv_sec; /* seconds */ suseconds_t tv_usec; /* microseconds */ }; ``` - `timezone` 結構 ```c struct timezone { int tz_minuteswest; /* minutes west of Greenwich */ int tz_dsttime; /* type of DST correction */ }; ``` - time() 和 gettimeofday() 不適合拿來作 benchmark - 參考[gettimeofday() should never be used to measure time ](https://blog.habets.se/2010/09/gettimeofday-should-never-be-used-to-measure-time.html) - Nobody sneaks in and changes your wristwatch when you’re not looking - 系統時間可能被改變(NTP、時區...) - You’re not stupid, the computer is - You are the limiting factor - 生活中通常也是用現實時間的時間差計時 - 時間單位不夠精確 - 用 clock_gettime 代替 - It’s not stable across reboots - goes up by one second for each second that passes in the physical world # 實驗一 - [Monte Carlo method](https://en.wikipedia.org/wiki/Monte_Carlo_method) - 步驟 - 隨機放置一點在 A(x,y) , 0<=x<=1, 0<=y<=1 - 重複 N 次,並紀錄 M - $pi = 4 * M / N$ - C : 半徑為 1 ,原點為(0,0)的園 - N : 樣本數 - M : A 在 C 內的次數 - $x^2 + y^2 < 1$ => 在 C 內 - P : - $P = M / N$ - $P = 園面積 / 正方形面積 = (pi / 4) / 1 = pi / 4$ - => $pi = 4 * M / N$ - 使用 rand() 產生隨機數 ```c double montecarlo(long long int N) { long long int i; long long int M = 0; double x,y; for (i = 0; i < N; i++) { x = (double) rand() / RAND_MAX; y = (double) rand() / RAND_MAX; if ( x * x + y * y < 1) M++; } return (double) M / N * 4; } ``` ## 方法 1: 實作 Monte Carlo method,分別使用 4 種時間函數計時 - 計時方法 - clock() ```C clock_t start = clock(); ... clock_t finish = clock(); double duration = (double)(finish - start) / CLOCKS_PER_SEC; ``` - clock_gettime() ```C struct timespec start; struct timespec finish; clock_gettime(CLOCK_MONOTONIC, &start); ... clock_gettime(CLOCK_MONOTONIC, &finish); double duration = (finish.tv_sec - start.tv_sec) + (finish.tv_nsec - start.tv_nsec) / 1000000000.0; ``` - time() ```C time_t start = time(0); ... time_t finish = time(0); double duration = (double)(finish - start); ``` - gettimeofday() ```C struct timeval start; struct timeval finish; gettimeofday(&start,NULL); ... gettimeofday(&finish,NULL); double duration = (finish.tv_sec - start.tv_sec) + (finish.tv_usec - start.tv_usec) / 1000000.0; ``` - 結果 ``` $ gcc -o montecarlo montecarlo.c $ ./montecarlo N = 100000000 clock() pi = 3.141698 2.731048 (s) clock_gettime() pi = 3.141420 2.729236 (s) time() pi = 3.141591 3.000000 (s) gettimeofday() pi = 3.141585 2.749408 (s) ``` ## 方法 2: 比較不同樣本數得到結果的誤差 - 使用 clock_gettime() - 結果 n | $10^0$ | $10^1$ | $10^2$ | $10^3$ | $10^4$ | $10^5$ | $10^6$ | $10^7$ | $10^8$ | $10^9$ -|----|----|----|----|----|----|----|----|----|---- time(s)|0.000003|0.000002|0.000015|0.000128|0.000943|0.009436|0.040706|0.277493|2.723461|27.253682 pi|4.000000|3.200000|3.120000|3.152000|3.167200|3.138880|3.141920|3.140899|3.141710|3.141604 error|0.858407|0.058407|0.021593|0.010407|0.025607|0.002713|0.000327|0.000694|0.000117|0.000011 ![](https://i.imgur.com/yPHQL6X.png) >X label 可以參考[美圖欣賞第三章](https://hackmd.io/s/Skwp-alOg#美圖欣賞) 修改 >[name=課程助教][color=red] >增加`set xtics rotate by 45 right` ``` $ gcc -o montecarlo montecarlo.c $ ./montecarlo N = 1 0.000003 (s) pi = 4.000000 error = 0.858407 N = 10 0.000002 (s) pi = 3.200000 error = 0.058407 ... N = 100000000 2.723461 (s) pi = 3.141710 error = 0.000117 N = 1000000000 27.253682 (s) pi = 3.141604 error = 0.000011 ``` ## 方法 3: 用 OpenMP 平行化 ```C #pragma omp parallel default(none) \ private(i,x,y) shared(N) reduction(+:M) { #pragma omp for for (i = 0; i < N; i++) { x = (double) rand() / RAND_MAX; y = (double) rand() / RAND_MAX; if ( x * x + y * y < 1) M++; } } ``` - 結果 - 以 N = 10000000 為例 - serial - 0.277493 (s) - parallel - 2.966225 (s) - multithread 比較慢... - 在 4 cores 環境下,時間增為 10.6 倍 - [Program is slower when using multiple threads](http://stackoverflow.com/questions/16437105/program-is-slower-when-using-multiple-threads) - rand() is not thread safe. - [OpenMP program is slower than sequential one](http://stackoverflow.com/questions/10624755/openmp-program-is-slower-than-sequential-one) 下方留言提到 - rand() 使用 global state variables (hidden in the (g)libc implementation) - 應使用 private 的 seed 作為 parameter 呼叫 rand_r() - 同時推薦 [erand48()](https://linux.die.net/man/3/drand48) 作為更好的 random number source - 這位學長筆記中解釋為什麼 multi-thread 不保證比較快 - [CheHsuan HackMD](https://hackmd.io/s/BkbydgVa) - rand() is not reentrant or thread-safe - rand()有用到靜態變數(或全域變數),導致該靜態變數或全域變數變得不可預期 - multi-thread 的環境下要使用非 thread-safe 時,必須自己加上lock ## 方法 4: 用 OpenMP 平行化-改用 [rand_r()](https://linux.die.net/man/3/rand_r) - 參考 [How to generate random numbers in parallel?](http://stackoverflow.com/questions/4287531/how-to-generate-random-numbers-in-parallel) 產生隨機數 - 以 omp_get_thread_num() 作為 rand_r 的 seed ```C #pragma omp parallel default(none) \ private(i,x,y) shared(N) reduction(+:M) { unsigned int myseed = omp_get_thread_num(); #pragma omp for for (i = 0; i < N; i++) { x = (double) rand_r(&myseed) / RAND_MAX; y = (double) rand_r(&myseed) / RAND_MAX; if ( x * x + y * y < 1) M++; } } ``` - 結果 - 以 N = 10000000 為例,比較平行化前後差異 - serial - 0.277493 (s) - parallel - 0.079453 (s) - 在 4 cores 環境下,執行時間前為平行前的 29% - 以 N = 10000000 為例,在使用OpenMp平行化下,比較 rand() 與 rand_r() - rand() - 2.966225 (s) - rand_r() - 0.079453 (s) - rand_r() 執行時間為 rand() 的 3% ![](https://i.imgur.com/QHBdkoR.png) # 將 Monte Carlo method 與其他方法合併 - `$ make plot` - 結果 - 暫不考慮誤差大小,僅考慮執行時間 - ![](https://i.imgur.com/7rZmnhs.png) # 參考資料 - [clock()、time()、clock_gettime()和gettimeofday()函数的用法和区别](http://www.cnblogs.com/krythur/archive/2013/02/25/2932647.html) - [vic85821 HackMD](https://hackmd.io/s/rkgYN0P6#) - [Program is slower when using multiple threads](http://stackoverflow.com/questions/16437105/program-is-slower-when-using-multiple-threads) - [OpenMP program is slower than sequential one](http://stackoverflow.com/questions/10624755/openmp-program-is-slower-than-sequential-one) - [CheHsuan HackMD](https://hackmd.io/s/BkbydgVa) - [rand()](https://linux.die.net/man/3/rand) - [rand_r()](https://linux.die.net/man/3/rand_r) - [erand48()](https://linux.die.net/man/3/drand48) - [How to generate random numbers in parallel?](http://stackoverflow.com/questions/4287531/how-to-generate-random-numbers-in-parallel) - ['zmke' script.gp](https://github.com/zmke/compute-pi/blob/master/script.gp) ###### tags: `twzjwang`