# Raytracing

## Gprof

1. Compile and link with `-pg`.
2. Run the program to generate the profiling data.
3. Analyze the data with gprof: `$ gprof executable gmon.out`

## Intel AVX

* **AVX** (Advanced Vector Extensions) extends the **SSE** instruction set. The hardware provides **16** **256-bit** YMM registers [YMM0 : YMM15]. The 128-bit XMM registers used by SSE occupy the lower 128 bits of the YMM registers, which keeps the two instruction sets compatible.
* Each 256-bit register can hold **8** **float** values or **4** **double** values, among other layouts. Instructions come in scalar (suffix **s**) and vector (suffix **p**) forms; vector operations process the data in a register in SIMD fashion.
    * **suffix s (scalar)**: operate on the least-significant data element
    * **suffix p (packed)**: compute all elements in parallel
* AVX intrinsic names: **_mm256_op_suffix**(data_type param1, data_type param2, data_type param3)
    * **_mm256:** prefix for working on 256-bit registers
    * **_op:** the operation
    * **_suffix:** type of data to operate on

*Figure: AVX Suffix Marking*

*Figure: AVX Intrinsics Data Types*

## Execution Time

### Original version

Execution time: 2.621690 sec

```
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 31.96      0.84     0.84 69646433     0.00     0.00  dot_product
 13.70      1.20     0.36 56956357     0.00     0.00  subtract_vector
 10.27      1.47     0.27 31410180     0.00     0.00  multiply_vector
```

### Optimization

* Computing dot_product with OpenMP
    - `#include <omp.h>`
    - compile and link with `-fopenmp`

  Execution time: 44.493292 sec, roughly 17 times slower than the original 2.621690 sec. dot_product does very little work per call, so the time spent creating and reclaiming threads drags down the overall speed.

* Computing dot_product with AVX
    - `#include <immintrin.h>`
    - compile with `-mavx`

  Execution time: 0.560055 sec, more than 4 times faster than the original.

**Source of this article:** https://hackmd.io/s/rkkb-F9a
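To make the `_mm256_op_suffix` naming from the Intel AVX section concrete, here is a minimal sketch of a dot product written with AVX intrinsics. It is not the code measured above. The function name `dot_product_avx` and its signature (two 3-component `double` vectors) are assumptions for illustration. When the file is compiled without `-mavx`, the `__AVX__` guard falls back to plain scalar code so the sketch still builds.

```c
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical helper: dot product of two 3-component double vectors.
 * The name and signature are assumptions, not taken from the original code. */
double dot_product_avx(const double *v1, const double *v2)
{
#ifdef __AVX__
    /* _mm256_set_pd lists elements from the highest lane to the lowest;
     * the unused fourth lane is padded with 0.0. */
    __m256d a = _mm256_set_pd(0.0, v1[2], v1[1], v1[0]);
    __m256d b = _mm256_set_pd(0.0, v2[2], v2[1], v2[0]);

    /* op = mul, suffix = pd (packed double): four multiplies at once. */
    __m256d prod = _mm256_mul_pd(a, b);

    /* Store the four products to memory and sum them horizontally. */
    double lanes[4];
    _mm256_storeu_pd(lanes, prod);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    /* Scalar fallback when AVX is not enabled at compile time. */
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
#endif
}
```

Calling `dot_product_avx(a, b)` with `a = {1, 2, 3}` and `b = {4, 5, 6}` returns 32. Because each vector has only three components, one of the four lanes is wasted, so any real gain depends on how the surrounding code is structured.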