Try   HackMD

2017q1 Homework1 (compute-pi)

contributed by <vtim9907>

Reviewed by <rayleigh0407>

  • 為了平滑化而引用信賴區間這點還不錯 , 但是直接說95% 的信賴區間落在 ±1.96 個標準差內有點危險 , 這個結果是建立在資料分佈為 常態分佈 的情況下 , 可參考 wiki Normal distribution 或是 wiki Confidence interval

開發環境

  • OS : Ubuntu 16.04.2 LTS
  • Kernel Version : 4.8.0-36-generic
  • Memory : 32 GB
  • CPU : Intel® Core™ i5-6600 Processor
    • cache :
      • L1i cache : 4*32 KB
      • L1d cache : 4*32 KB
      • L2 cache : 4*256 KB
      • L3 cache : 6144 KB
    • L1 cache imformation ( sudo dmidecode -t cache ) :
Handle 0x001E, DMI type 7, 19 bytes
Cache Information
	Socket Designation: L1 Cache
	Configuration: Enabled, Not Socketed, Level 1
	Operational Mode: Write Back
	Location: Internal
	Installed Size: 256 kB
	Maximum Size: 256 kB
	Supported SRAM Types:
		Synchronous
	Installed SRAM Type: Synchronous
	Speed: Unknown
	Error Correction Type: Parity
	System Type: Unified
	Associativity: 8-way Set-associative

基本實驗

25 rounds, N <= 21000

將原實驗結果視覺化呈現,先用 LibreOffice 畫看看 (25 round):

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

線實在有點粗,圖片也有點小,可能我還不太會用@@" 改用 Gnuplot 畫:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

10000 rounds, N <= 50000

覺得取樣的間隔有點太大,決定改成 N 增加 1000 就取樣一遍,並將 N 放大到 50000,一次跑 10000 rounds :

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

從這張圖就很明顯可以看出,除了 4 threads 的方法以外,其他方法的波動都比較小,穩定遞增,但是使用了 4 threads 的 OpenMP 卻不斷的震盪,奇怪的是這張圖的前半段,就是 N 還小於兩萬左右時,結果與上面的測試不太相符,我覺得這應該是因為 10000 rounds 中,有幾 round 因為一些不確定因素的關係,時間花的比較久,所以拉高了整體所花的時間。

95% 信賴區間

為了解決上面測不太準的問題,我用取信賴區間的方式,取 95% 信賴區間內的數據當參考,以外的就忽略掉,這樣就可以避免某幾 round 沒跑好,導致整體數據的錯誤。

首先可以簡單從高中課本或是 wiki 查到母體標準差的定義 :

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

簡單化簡一下 :

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

而根據查 Z 值的表可以知道,95% 的信賴區間落在 ±1.96 個標準差內,所以可以知道,直接從母體算出 95% 信賴區間的上下界公式為:

  • 上界 : μ+1.96σ
  • 下界 : μ1.96σ

實做取信賴區間,我直接加了一個 function 進 benchmark_clock_gettime.c

double calc_ci (double sample_time[]) { int j, count = 0; double average = 0.0, total_time = 0.0, sigma, upper, lower, conf_interval = 0.0; for (j = 0; j < SAMPLE_NUM; ++j) { average += sample_time[j]; total_time += sample_time[j] * sample_time[j]; } average /= SAMPLE_NUM; sigma = sqrt(total_time / SAMPLE_NUM - average * average); upper = average + 1.96 * sigma; lower = average - 1.96 * sigma; for (j = 0; j < SAMPLE_NUM; ++j) { if (sample_time[j] > lower && sample_time[j] < upper) { conf_interval += sample_time[j]; count++; } } return conf_interval /= count; }

為了不讓程式跑太久,我改成一次跑 500 rounds,全部跑 1000 次,從這 1000 次取 95% 信賴區間內的數據當參考,最後畫成圖:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

這張圖我總共跑了五次,因為第一次跑還是有些微的波動,我怕是因為背景程式或是網頁的干擾,所以關掉了一些東西後重跑了四次,結果數據都很平滑,可見在測試效能時,還是要盡量排除不必要的不確定因素。

很明顯,這次的分析圖就相當平滑,也能看出當 N 越大時,OpenMP with 4 threads 還是比 AVX SIMD + Loop unrolling 快一點,是目前最快的優化版本。

計算誤差

參考了 廖健富學長的筆記,而選擇使用 Arccos(1) 逆推出來的 π 當作基準,來推算用上面幾個方法所算出來的 π 的誤差。

簡單寫了 function 來產生 .csv 檔,然後用 Gnuplot 畫出來:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

由於一開始的誤差很大,所以畫成圖後,前半部會成垂直狀,很難看。

不過因為誤差會極度減小的範圍只有 N 還很小的時候,所以只要把 N 的起始值調高,數據應該就比較漂亮:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

結果我想錯了 @@" 其實誤差值是持續穩定下降,所以區間只要選的太大,出來的圖形就會趨近直角,所以只好把區間縮小一點在畫出來;不過大略可以看出,若 N 趨近 10n ,算出來 π 的精確度就大約有 n 位。

Leibniz formula for π

簡單證明

首先我們知道

  • d[arctan(x)]dx=11+x2 (1)

又知 11+x2 可以 power series 展開為

  • 1x2+x4...

所以對等式 (1) 兩邊積分得到 :

  • arctan(x)=0x11+x2=0x1x2+x4... (2)

然後因為 arctan(1)=π4 ,帶進 (2) 得到

  • arctan(1)=π4=011x2+x4...=113+1517...

實做

根據前面證明結果

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

可以簡化為:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

有了上面的式子就很好實做。

實做方面我直接選擇用 OpenMP 使用 4 threads 去跑,然後同樣一次跑 500 rounds ,共跑 1000 次,然後取 95% 信賴區間內的結果當參考,用 Gnuplot 畫成圖:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

新的公式比之前的版本快滿多的,幾乎是 Baseline 的三倍速。

計算 Leibniz formula for π 的誤差,與前一版的誤差一起畫成圖形做比較:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

看出其實兩者的精度差不多,只是分別從 π 的上下逼近 π

Reference