contributed by <LitSnow>
make PROFILE=1
./raytracing
gprof ./raytracing | less
# Rendering scene
Done!
Execution time of raytracing() : 5.585869 sec
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
27.02 0.91 0.91 69646433 0.00 0.00 dot_product
18.41 1.53 0.62 56956357 0.00 0.00 subtract_vector
8.91 1.83 0.30 13861875 0.00 0.00 raySphereIntersection
8.31 2.11 0.28 13861875 0.00 0.00 rayRectangularIntersection
8.02 2.38 0.27 17836094 0.00 0.00 add_vector
7.72 2.64 0.26 10598450 0.00 0.00 normalize
6.83 2.87 0.23 31410180 0.00 0.00 multiply_vector
4.16 3.01 0.14 4620625 0.00 0.00 ray_hit_object
2.38 3.09 0.08 17821809 0.00 0.00 cross_product
2.23 3.17 0.08 1048576 0.00 0.00 ray_color
0.89 3.20 0.03 1048576 0.00 0.00 rayConstruction
0.89 3.23 0.03 1 0.03 3.36 raytracing
0.59 3.25 0.02 4221152 0.00 0.00 multiply_vectors
0.59 3.27 0.02 3838091 0.00 0.00 length
0.59 3.29 0.02 2520791 0.00 0.00 idx_stack_top
0.59 3.31 0.02 2110576 0.00 0.00 localColor
0.59 3.33 0.02 1241598 0.00 0.00 refraction
0.45 3.34 0.02 1241598 0.00 0.00 protect_color_overflow
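The gprof profile of the original version shows that dot_product and subtract_vector take the most time (27.02% and 18.41% of the samples), so loop unrolling is applied to these two functions first.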
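A minimal sketch of what this unrolling might look like. It assumes both helpers work on 3-component `double` vectors and originally iterate with a `for` loop; the signatures and the `_loop` / `_unrolled` suffixes are illustrative assumptions, not copied from math-toolkit.h.

```c
/* Assumed loop form: one multiply-add per iteration over 3 components. */
static inline double dot_product_loop(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}

/* Unrolled form: the trip count is fixed at 3, so the loop counter,
 * comparison, and branch are replaced by three explicit terms. */
static inline double dot_product_unrolled(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}

/* The same transformation applied to a component-wise subtraction
 * (hypothetical signature: out = a - b). */
static inline void subtract_vector_unrolled(const double *a, const double *b,
                                            double *out)
{
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}
```

Manual unrolling like this mostly matters when the compiler is not already optimizing the loop away, for example in an unoptimized build.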
Compile
make clean
Program performance
# Rendering scene
Done!
Execution time of raytracing() : 4.773658 sec
gprof analysis
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
20.01 0.55 0.55 69646433 0.00 0.00 dot_product
15.28 0.97 0.42 31410180 0.00 0.00 multiply_vector
13.10 1.33 0.36 10598450 0.00 0.00 normalize
11.28 1.64 0.31 13861875 0.00 0.00 rayRectangularIntersection
9.46 1.90 0.26 56956357 0.00 0.00 subtract_vector
7.28 2.10 0.20 13861875 0.00 0.00 raySphereIntersection
5.82 2.26 0.16 17836094 0.00 0.00 add_vector
4.73 2.39 0.13 17821809 0.00 0.00 cross_product
4.37 2.51 0.12 4620625 0.00 0.00 ray_hit_object
1.82 2.56 0.05 1048576 0.00 0.00 rayConstruction
1.82 2.61 0.05 1 0.05 2.75 raytracing
1.09 2.64 0.03 4221152 0.00 0.00 multiply_vectors
1.09 2.67 0.03 2110576 0.00 0.00 compute_specular_diffuse
1.09 2.70 0.03 1048576 0.00 0.00 ray_color
0.73 2.72 0.02 2110576 0.00 0.00 localColor
0.36 2.73 0.01 3838091 0.00 0.00 length
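Unrolling dot_product saved about 0.4 seconds of self time (0.91 s to 0.55 s).
subtract_vector also got faster (0.62 s to 0.26 s), and its share of execution time fell from 2nd place to 5th.
How much more can be gained by loop unrolling every function in math-toolkit.h that contains a loop?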
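As a rough picture of that step, the remaining loop-based helpers could be unrolled in the same way. The function names below appear in the gprof output, but their signatures are assumptions made for illustration (3-component `double` vectors with an output parameter).

```c
/* Assumed signature: out = a + b, component by component. */
static inline void add_vector(const double *a, const double *b, double *out)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
}

/* Assumed signature: out = a * b, where b is a scalar. */
static inline void multiply_vector(const double *a, double b, double *out)
{
    out[0] = a[0] * b;
    out[1] = a[1] * b;
    out[2] = a[2] * b;
}

/* Assumed signature: out = a * b, component by component. */
static inline void multiply_vectors(const double *a, const double *b,
                                    double *out)
{
    out[0] = a[0] * b[0];
    out[1] = a[1] * b[1];
    out[2] = a[2] * b[2];
}
```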
Done!
Execution time of raytracing() : 4.350825 sec
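To put the three timings side by side, the relative reduction can be computed with a small helper. `percent_reduction` is a hypothetical name introduced here for illustration; the inputs are the execution times reported above.

```c
/* Percentage by which `after` is smaller than `before`.
 * Both arguments are wall-clock times in seconds; `before` must be non-zero. */
static double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

With the numbers from these runs, `percent_reduction(5.585869, 4.773658)` is about 14.5% (unrolling only dot_product and subtract_vector), and `percent_reduction(5.585869, 4.350825)` is about 22.1% (unrolling every loop-based function).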
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
21.23 0.49 0.49 69646433 0.00 0.00 dot_product
15.59 0.85 0.36 13861875 0.00 0.00 rayRectangularIntersection
10.83 1.10 0.25 10598450 0.00 0.00 normalize
9.96 1.33 0.23 56956357 0.00 0.00 subtract_vector
7.36 1.50 0.17 13861875 0.00 0.00 raySphereIntersection
6.71 1.66 0.16 31410180 0.00 0.00 multiply_vector
5.85 1.79 0.14 17821809 0.00 0.00 cross_product
4.33 1.89 0.10 17836094 0.00 0.00 add_vector
3.47 1.97 0.08 4221152 0.00 0.00 multiply_vectors
3.47 2.05 0.08 2110576 0.00 0.00 compute_specular_diffuse
2.60 2.11 0.06 4620625 0.00 0.00 ray_hit_object
2.60 2.17 0.06 2110576 0.00 0.00 localColor
1.73 2.21 0.04 1048576 0.00 0.00 ray_color
0.87 2.23 0.02 1048576 0.00 0.00 idx_stack_init
0.87 2.25 0.02 1048576 0.00 0.00 rayConstruction
0.43 2.26 0.01 2558386 0.00 0.00 idx_stack_empty
0.43 2.27 0.01 2520791 0.00 0.00 idx_stack_top
0.43 2.28 0.01 1241598 0.00 0.00 protect_color_overflow
After loop unrolling, multiply_vector fell from 2nd place to 6th in share of execution time.
References