Raytracing

Gprof

Compile and link with -pg
Run the program to generate the profiling data (gmon.out)
Analyze with gprof: $ gprof executable gmon.out

Intel AVX

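AVX (Advanced Vector Extensions) extends the SSE instruction set. In 64-bit mode the hardware provides 16 256-bit YMM registers [YMM0 : YMM15], and the 128-bit XMM registers used by SSE are mapped onto the lower 128 bits of the YMM registers for compatibility.
A 256-bit register can hold 8 floats or 4 doubles…. Instructions come in scalar (suffix s) and packed (suffix p) forms; packed operations process the data in the register in SIMD fashion.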
- suffix s (scalar) : operate on the least-significant data element
- suffix p (packed) : compute all elements in parallel
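The difference between the two suffixes can be sketched with a pair of 128-bit SSE intrinsics, `_mm_add_ss` and `_mm_add_ps`, chosen here only because they make the contrast easy to observe. The helper names `add_scalar` and `add_packed` are hypothetical and do not come from the raytracing code.

```c
#include <immintrin.h>

/* Scalar (suffix s): only the least-significant element is added;
 * the upper three elements of the result are copied from a. */
void add_scalar(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ss(va, vb));
}

/* Packed (suffix p): all four elements are added in parallel. */
void add_packed(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the scalar form changes only the first element, while the packed form adds every pair.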
AVX intrinsic names : _mm256_op_suffix(data_type param1, data_type param2, data_type param3)
- _mm256: prefix for working on 256-bit registers
- _op: the operation
- _suffix: type of data to operate on
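As an illustration of the naming scheme, the sketch below uses `_mm256_add_ps` (add, packed single-precision) and `_mm256_mul_pd` (multiply, packed double-precision). The wrapper names `add8_floats` and `mul4_doubles` are hypothetical. The `__attribute__((target("avx")))` annotation is a GCC/Clang-specific assumption that lets the functions build without `-mavx`; when the whole file is compiled with `-mavx` it can be dropped.

```c
#include <immintrin.h>

/* _mm256_add_ps: 256-bit register, "add" operation, "ps" = packed single.
 * Adds 8 floats in one instruction. */
__attribute__((target("avx")))
void add8_floats(const float *a, const float *b, float *out)
{
    __m256 va = _mm256_loadu_ps(a);   /* unaligned load of 8 floats */
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_add_ps(va, vb));
}

/* _mm256_mul_pd: 256-bit register, "mul" operation, "pd" = packed double.
 * Multiplies 4 doubles in one instruction. */
__attribute__((target("avx")))
void mul4_doubles(const double *a, const double *b, double *out)
{
    __m256d va = _mm256_loadu_pd(a);  /* unaligned load of 4 doubles */
    __m256d vb = _mm256_loadu_pd(b);
    _mm256_storeu_pd(out, _mm256_mul_pd(va, vb));
}
```

The unaligned load/store variants are used so the caller does not need 32-byte aligned arrays; the aligned variants (`_mm256_load_ps`, `_mm256_store_ps`) require that alignment.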
AVX Suffix Marking

AVX Intrinsics Data Types

Execution Time

Original version

Execution time: 2.621690 sec

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 31.96      0.84     0.84 69646433     0.00     0.00  dot_product
 13.70      1.20     0.36 56956357     0.00     0.00  subtract_vector
 10.27      1.47     0.27 31410180     0.00     0.00  multiply_vector

Optimization

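dot_product with OpenMP
- #include <omp.h>
- compile and link with -fopenmp
Execution time: 44.493292 sec, roughly 17 times slower than the original version. dot_product does very little work per call, so the time spent creating and tearing down threads ends up dragging down overall performance.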
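These notes do not show how OpenMP was applied, so the following is only one plausible form. It assumes dot_product works on 3-component double vectors, which is an assumption since the signature is not given here; the name `dot_product_omp` is hypothetical. With three multiply-adds per call and close to 70 million calls in the profile above, entering a parallel region on every call costs far more than the arithmetic it distributes.

```c
/* Hypothetical OpenMP version of a 3-component dot product.
 * The pragma takes effect only when built with -fopenmp;
 * without that flag the loop simply runs serially. */
double dot_product_omp(const double *v1, const double *v2)
{
    double dp = 0.0;

    /* Each call opens a parallel region for just 3 iterations,
     * so thread management overhead dominates the useful work. */
    #pragma omp parallel for reduction(+:dp)
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];

    return dp;
}
```

A loop this short is usually a poor fit for thread-level parallelism; parallelizing an outer loop with much more work per iteration tends to amortize the overhead better.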
dot_product with AVX
- #include <immintrin.h>
- compile with -mavx
Execution time: 0.560055 sec, about 4.7 times faster than the original version.
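The AVX code itself is not included in these notes. A minimal sketch of what an AVX dot product could look like follows, again assuming 3-component double vectors; the name `dot_product_avx` is hypothetical, and the `target("avx")` attribute is a GCC/Clang-specific stand-in for compiling with `-mavx`.

```c
#include <immintrin.h>

/* Hypothetical AVX version of a 3-component dot product. */
__attribute__((target("avx")))
double dot_product_avx(const double *v1, const double *v2)
{
    /* _mm256_set_pd lists elements from the highest lane to the lowest.
     * The fourth lane is padded with 0.0 so that no memory beyond the
     * three input elements is read. */
    __m256d a = _mm256_set_pd(0.0, v1[2], v1[1], v1[0]);
    __m256d b = _mm256_set_pd(0.0, v2[2], v2[1], v2[0]);

    /* One packed multiply produces all element-wise products. */
    __m256d prod = _mm256_mul_pd(a, b);

    /* Horizontal sum: store the four lanes and add them up. */
    double lanes[4];
    _mm256_storeu_pd(lanes, prod);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Storing the lanes and summing them in scalar code keeps the sketch easy to follow; a horizontal add such as `_mm256_hadd_pd` is an alternative for the reduction step.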

Source of this article: https://hackmd.io/s/rkkb-F9a