Compute-PI
GPU acceleration
Device Info
- Device name: GeForce GTX 770
- Warp size: 32
- multiProcessorCount: 8
Gnuplot command
Speedup for 12800000 slices


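- Note that the number of threads configured has a marked effect on the speedup.
- Once the thread count exceeds 6144, the speedup stops improving. See Amdahl's law: for a fixed workload, only part of the computation can run in parallel, so the speedup approaches a constant.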
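
The plateau described above can be illustrated with Amdahl's law, S(n) = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of threads. The following is only a minimal sketch: the function names are hypothetical, and the parallel fraction p was not measured in this note, so any value passed in is an assumption.

```c
#include <stdio.h>

/* Amdahl's law: predicted speedup with n parallel workers when a
 * fraction p (0 <= p <= 1) of the work can be parallelized.
 * Hypothetical helper, not taken from the Compute-PI sources. */
double amdahl_speedup(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

/* Upper bound on the speedup as n grows without limit (requires p < 1). */
double amdahl_limit(double p)
{
    return 1.0 / (1.0 - p);
}

/* Print the predicted speedup for a few thread counts.
 * p is an assumed parallel fraction, not a measured one. */
void print_amdahl_table(double p)
{
    const double threads[] = {256, 1024, 6144, 24576};
    for (size_t i = 0; i < sizeof(threads) / sizeof(threads[0]); i++)
        printf("n = %6.0f  speedup = %.3f\n",
               threads[i], amdahl_speedup(p, threads[i]));
    printf("limit      speedup = %.3f\n", amdahl_limit(p));
}
```

Calling `print_amdahl_table(0.99)` from `main`, for example, shows the predicted speedup creeping toward the bound of about 100 as the thread count grows, which is the "approaches a constant" behaviour noted above.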

Time cost for 400000000 slices
- Baseline: 2.999478(s)
- OpenMP
  - 2 threads: 1.460224(s)
  - 4 threads: 0.767469(s)
- AVX: 1.283873(s)
- AVX + Loop unroll: 1.169533(s)
- 6144 GPU threads (threads per block = 256): 0.070991(s)
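
To compare these timings more directly, each one can be turned into a speedup relative to the baseline (baseline time divided by variant time). The sketch below does that arithmetic and also derives how many blocks a launch of 6144 threads needs at 256 threads per block. The struct, array, and function names are hypothetical and chosen only for illustration; the numbers are copied from the list above.

```c
#include <stdio.h>

struct timing {
    const char *label;
    double seconds;
};

/* Timings for 400000000 slices, copied from the list above. */
static const struct timing timings[] = {
    {"Baseline",               2.999478},
    {"OpenMP, 2 threads",      1.460224},
    {"OpenMP, 4 threads",      0.767469},
    {"AVX",                    1.283873},
    {"AVX + Loop unroll",      1.169533},
    {"GPU, 6144 threads",      0.070991},
};

/* Speedup of a variant relative to the baseline run. */
double speedup(double baseline_s, double variant_s)
{
    return baseline_s / variant_s;
}

/* Number of blocks needed to cover total_threads, rounding up. */
unsigned blocks_needed(unsigned total_threads, unsigned threads_per_block)
{
    return (total_threads + threads_per_block - 1) / threads_per_block;
}

/* Print one line per variant, using the first entry as the baseline. */
void print_speedups(void)
{
    const size_t count = sizeof(timings) / sizeof(timings[0]);
    for (size_t i = 0; i < count; i++)
        printf("%-22s %.6f s  x%.2f\n", timings[i].label,
               timings[i].seconds,
               speedup(timings[0].seconds, timings[i].seconds));
    printf("blocks for 6144 threads at 256 per block: %u\n",
           blocks_needed(6144, 256));
}
```

With these figures the GPU run comes out roughly 42 times faster than the baseline, and the 6144-thread launch corresponds to 24 blocks of 256 threads.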
Source of this article: https://hackmd.io/s/SyEgyhlA