heathcliffYang
contributed by <heathcliffYang
>
yachiyang01
etc276
###
,這樣讀者就可以在看到「進行優化與比較」時一目了然。gprof ./raytracing | gprof2dot | dot -T png -o output.png
指令參考 hugikun999 的共筆,將 gprof 輸出的資訊轉成圖片表示
可以發現,raytracing 要建立光影的模型會呼叫到幾種數學運算
% cumulative self self total
time seconds seconds calls s/call s/call name
22.49 0.49 0.49 69646433 0.00 0.00 dot_product
16.98 0.86 0.37 56956357 0.00 0.00 subtract_vector
10.79 1.10 0.24 13861875 0.00 0.00 rayRectangularIntersection
8.49 1.28 0.19 31410180 0.00 0.00 multiply_vector
7.34 1.44 0.16 17836094 0.00 0.00 add_vector
7.34 1.60 0.16 10598450 0.00 0.00 normalize
5.97 1.73 0.13 4620625 0.00 0.00 ray_hit_object
4.82 1.84 0.11 13861875 0.00 0.00 raySphereIntersection
4.13 1.93 0.09 17821809 0.00 0.00 cross_product
2.29 1.98 0.05 1048576 0.00 0.00 ray_color
1.84 2.02 0.04 2110576 0.00 0.00 compute_specular_diffuse
1.61 2.05 0.04 4221152 0.00 0.00 multiply_vectors
1.38 2.08 0.03 1241598 0.00 0.00 refraction
0.92 2.10 0.02 3838091 0.00 0.00 length
在官方 Manual 提到:
主要花時間的數學運算可以從 self seconds 看出。
Performance counter stats for './raytracing' (100 runs):
703,209 cache-misses # 31.249 % of all cache refs ( +- 4.67% )
2,250,333 cache-references ( +- 5.17% )
7,446,324 branch-misses ( +- 0.59% )
4.132374478 seconds time elapsed ( +- 2.20% )
Loop unrolling
Performance counter stats for './raytracing' (100 runs):
323,532 cache-misses # 23.108 % of all cache refs ( +- 5.11% )
1,400,109 cache-references ( +- 5.30% )
5,727,748 branch-misses ( +- 0.23% )
2.586578907 seconds time elapsed ( +- 1.32% )
% cumulative self self total
time seconds seconds calls s/call s/call name
24.86 0.41 0.41 69646433 0.00 0.00 dot_product
11.22 0.60 0.19 13861875 0.00 0.00 rayRectangularIntersection
11.22 0.78 0.19 31410180 0.00 0.00 multiply_vector
10.92 0.96 0.18 56956357 0.00 0.00 subtract_vector
9.70 1.12 0.16 10598450 0.00 0.00 normalize
7.58 1.25 0.13 17821809 0.00 0.00 cross_product
7.28 1.37 0.12 4620625 0.00 0.00 ray_hit_object
5.76 1.46 0.10 13861875 0.00 0.00 raySphereIntersection
2.12 1.50 0.04 3838091 0.00 0.00 length
1.82 1.53 0.03 2110576 0.00 0.00 localColor
1.52 1.55 0.03 17836094 0.00 0.00 add_vector
1.21 1.57 0.02 4221152 0.00 0.00 multiply_vectors
1.21 1.59 0.02 1048576 0.00 0.00 rayConstruction
1.21 1.61 0.02 1048576 0.00 0.00 ray_color
POSIX thread
因為 typedef double point3[3];
-> point3 是傳 address
Software pipeline
這個 loop 總共有六圈,每一圈被分為 4 個 stage (也就是 4 個 block),讓他們前後 overlap –> 可同時執行不同階段的工作,增加 throughput
Performance counter stats for './raytracing' (100 runs):
95,023 cache-misses # 26.193 % of all cache refs ( +- 8.58% )
362,784 cache-references ( +- 13.95% )
5,151,867 branch-misses ( +- 0.26% )
2.337996284 seconds time elapsed ( +- 0.89% )
圖片源自 Introduction to Software Pipelining in the IA-64 Architecture Architecture
Performance counter stats for './raytracing' (100 runs):
50,139 cache-misses # 24.425 % of all cache refs ( +- 1.03% )
205,274 cache-references ( +- 2.93% )
5,184,165 branch-misses ( +- 0.83%)
1.320096975 seconds time elapsed ( +- 1.69% )
Performance counter stats for './raytracing' (100 runs):
61,863 cache-misses # 18.482 % of all cache refs ( +- 2.03% )
334,723 cache-references ( +- 2.57% )
5,663,438 branch-misses ( +- 0.32% )
1.497532812 seconds time elapsed ( +- 2.81% )
Performance counter stats for './raytracing' (100 runs):
62,719 cache-misses # 19.101 % of all cache refs ( +- 1.46% )
328,354 cache-references ( +- 1.91% )
5,877,346 branch-misses ( +- 0.07% )
1.197949277 seconds time elapsed ( +- 2.10% )
待參照資料 :
Compiler Optimizations for Modern VLIW/EPIC Architectures