
# Raytracing

## Gprof

1. Compile and link with `-pg`
2. Run the program to generate the profiling data (`gmon.out`)
3. Analyze the data with gprof

`$ gprof executable gmon.out`

## Intel AVX

* **AVX** (Advanced Vector Extensions) extends the **SSE** instruction set. The hardware provides **16** **256-bit** YMM registers [YMM0 : YMM15]; the 128-bit XMM registers used by SSE map onto the lower-half 128 bits of the YMM registers for compatibility.

![](https://i.imgur.com/ReZrWNo.png)

* Each 256-bit register can hold **8** **float** values or **4** **double** values, among other layouts. Instructions come in scalar (suffix **s**) and vector (suffix **p**) forms; vector operations process the data in a register in SIMD fashion.
    * **suffix s (scalar)** : operate on the least-significant data element
    * **suffix p (packed)** : compute all elements in parallel

![](https://i.imgur.com/HD4QW82.png)

* AVX intrinsic names : **_mm256_op_suffix**(data_type param1, data_type param2, data_type param3)
    * **_mm256:** prefix for working on 256-bit registers
    * **_op:** the operation
    * **_suffix:** type of data to operate on

AVX Suffix Marking

![](https://i.imgur.com/l0fk3G8.png)

AVX Intrinsics Data Types

![](https://i.imgur.com/2hjT1en.png)

## Execution Time

### Original version

Execution time: 2.621690 sec

```
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 31.96      0.84     0.84 69646433     0.00     0.00  dot_product
 13.70      1.20     0.36 56956357     0.00     0.00  subtract_vector
 10.27      1.47     0.27 31410180     0.00     0.00  multiply_vector
```

### Optimization

* Computing dot_product with OpenMP
    - `#include <omp.h>`
    - compile and link with `-fopenmp`

  Execution time: 44.493292 sec, roughly 17 times longer than the original 2.621690 sec. dot_product does very little work per call, so the time spent creating and tearing down threads outweighs any gain and drags down the overall speed.

* Computing dot_product with AVX
    - `#include <immintrin.h>`
    - compile with `-mavx`

  Execution time: 0.560055 sec, about 4.7 times faster than the original.

**Source of this article:** https://hackmd.io/s/rkkb-F9a
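
To make the `_mm256_op_suffix` naming scheme and the AVX version of dot_product more concrete, here is a minimal sketch in C. It is not the code measured above. It assumes dot_product works on 3-component `double` vectors, and the function names `dot_product_scalar` and `dot_product_avx` are hypothetical. When the compiler does not define `__AVX__` (for example when `-mavx` is not passed), the sketch falls back to the scalar loop.

```c
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Scalar reference version. Assumption: vectors have 3 double components. */
double dot_product_scalar(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}

/* Hypothetical AVX version: one packed multiply replaces three scalar ones. */
double dot_product_avx(const double *v1, const double *v2)
{
#ifdef __AVX__
    /* _mm256_set_pd takes its arguments from the highest lane to the lowest.
     * The unused fourth lane is padded with 0.0 so it does not affect the sum,
     * and no element past the end of the 3-element arrays is read. */
    __m256d a = _mm256_set_pd(0.0, v1[2], v1[1], v1[0]);
    __m256d b = _mm256_set_pd(0.0, v2[2], v2[1], v2[0]);

    /* _mm256 = 256-bit register, mul = operation, pd = packed double. */
    __m256d prod = _mm256_mul_pd(a, b);

    /* Copy the four lanes back to memory and add them up. */
    double lanes[4];
    _mm256_storeu_pd(lanes, prod);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    /* No AVX support at compile time: use the scalar loop instead. */
    return dot_product_scalar(v1, v2);
#endif
}
```

For example, calling `dot_product_avx` on the vectors {1.0, 2.0, 3.0} and {4.0, 5.0, 6.0} returns 32.0, the same value as the scalar version. Only three of the four `double` lanes carry data here, so a sketch like this will not necessarily reproduce the timing reported above.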