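Add the -pg option to the gcc command (for both compiling and linking), then run the program once so that it writes gmon.out. After that, view the profile with: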
gprof -b raytracing gmon.out | less
Let's start with the top five entries of the flat profile:
Each sample counts as 0.01 seconds.
  %    cumulative    self                 self     total
 time    seconds    seconds     calls    s/call   s/call  name
27.65       0.76       0.76  69646433      0.00     0.00  dot_product
12.74       1.11       0.35  56956357      0.00     0.00  subtract_vector
 9.10       1.36       0.25  17836094      0.00     0.00  add_vector
 9.10       1.61       0.25  13861875      0.00     0.00  rayRectangularIntersection
 8.37       1.84       0.23  13861875      0.00     0.00  raySphereIntersection
Function inline
Execution time of raytracing() : 2.350556 sec
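As a rough sketch of what inlining a hot helper can look like: the function name and signature below are hypothetical, not taken from the raytracing source, and the attribute assumes GCC or Clang. A plain `inline` is only a hint, and GCC does not inline ordinary functions when optimization is off, so `always_inline` is used here to force it.

```c
/* Assumption: GCC or Clang; __attribute__((always_inline)) is a compiler
 * extension, not standard C. The name and signature are hypothetical. */
static inline __attribute__((always_inline))
void subtract_vector_inlined(const double *a, const double *b, double *out)
{
    /* The body is pasted into each call site, so no call/return overhead. */
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}
```

Note that once a function is inlined it no longer shows up as a separate entry in the gprof flat profile; its time is charged to the caller.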
Loop unrolling
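The idea behind loop unrolling is simple: take a for loop whose iteration count is known in advance and write out each iteration explicitly. It may sound silly, but in a function that is called roughly 60 million times, getting rid of the loop's branch checks can noticeably improve performance!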
Execution time of raytracing() : 1.633041 sec
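To illustrate the unrolling described above, here is a minimal sketch for a 3-component dot product. The function names and signatures are assumptions made for this example; the actual `dot_product` in the raytracing source may look different.

```c
#include <stddef.h>

/* Rolled version (hypothetical signature): every iteration pays for a
 * counter increment and a compare-and-branch on the loop condition. */
static double dot_product_loop(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (size_t i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}

/* Unrolled version: the trip count is always 3, so the three iterations
 * are written out by hand and the loop branch disappears. */
static double dot_product_unrolled(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}
```

Both versions add the products in the same order, so they return the same value. With optimization enabled the compiler may already unroll such a short loop by itself, so the manual version mostly matters for unoptimized builds.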
Multi-thread
CUDA