Raytracing can be parallelized, so that is worth trying. A call graph helps you understand the relationships between functions. It would also help to explain what force inline does.
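As a rough illustration of the parallelization idea, here is a minimal sketch that assumes each pixel can be computed independently of the others. `shade_pixel` and `render_rows` are hypothetical stand-ins, not functions from this project, and the OpenMP pragma only takes effect when compiling with `-fopenmp`; without that flag it is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-pixel work done inside raytracing(). */
static inline uint8_t shade_pixel(int x, int y)
{
    return (uint8_t)((x * 31 + y * 17) & 0xFF);
}

/* Each pixel depends only on (x, y), so rows can be rendered independently. */
void render_rows(uint8_t *pixels, int width, int height)
{
    assert(pixels && width > 0 && height > 0);

    /* Rows are handed out to threads dynamically, since some rows may
       take longer than others in a real scene. */
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            pixels[y * width + x] = shade_pixel(x, y);
}
```

Each thread writes to a disjoint set of rows, so no locking is needed as long as the per-pixel code keeps its state local.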
# Rendering scene
Done!
Execution time of raytracing() : 0.730467 sec
Execution time of raytracing() : 7.109102 sec
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
25.63 0.84 0.84 69646433 0.00 0.00 dot_product
16.78 1.39 0.55 56956357 0.00 0.00 subtract_vector
9.76 1.71 0.32 10598450 0.00 0.00 normalize
8.85 2.00 0.29 31410180 0.00 0.00 multiply_vector
7.93 2.26 0.26 17836094 0.00 0.00 add_vector
7.78 2.52 0.26 13861875 0.00 0.00 rayRectangularIntersection
5.49 2.70 0.18 17821809 0.00 0.00 cross_product
4.88 2.86 0.16 4620625 0.00 0.00 ray_hit_object
3.20 2.96 0.11 13861875 0.00 0.00 raySphereIntersection
2.75 3.05 0.09 4221152 0.00 0.00 multiply_vectors
1.53 3.10 0.05 1048576 0.00 0.00 ray_color
0.92 3.13 0.03 3838091 0.00 0.00 length
0.92 3.16 0.03 2110576 0.00 0.00 compute_specular_diffuse
0.92 3.19 0.03 1241598 0.00 0.00 refraction
0.92 3.22 0.03 1 0.03 3.28 raytracing
0.61 3.24 0.02 1241598 0.00 0.00 protect_color_overflow
0.31 3.25 0.01 1241598 0.00 0.00 reflection
0.31 3.26 0.01 1048576 0.00 0.00 rayConstruction
0.31 3.27 0.01 113297 0.00 0.00 fresnel
0.15 3.28 0.01 2520791 0.00 0.00 idx_stack_top
0.15 3.28 0.01 37595 0.00 0.00 idx_stack_pop
0.00 3.28 0.00 2558386 0.00 0.00 idx_stack_empty
0.00 3.28 0.00 2110576 0.00 0.00 localColor
0.00 3.28 0.00 1204003 0.00 0.00 idx_stack_push
0.00 3.28 0.00 1048576 0.00 0.00 idx_stack_init
0.00 3.28 0.00 3 0.00 0.00 append_rectangular
0.00 3.28 0.00 3 0.00 0.00 append_sphere
0.00 3.28 0.00 2 0.00 0.00 append_light
0.00 3.28 0.00 1 0.00 0.00 calculateBasisVectors
0.00 3.28 0.00 1 0.00 0.00 delete_light_list
0.00 3.28 0.00 1 0.00 0.00 delete_rectangular_list
0.00 3.28 0.00 1 0.00 0.00 delete_sphere_list
0.00 3.28 0.00 1 0.00 0.00 diff_in_second
0.00 3.28 0.00 1 0.00 0.00 write_to_ppm
Execution time of raytracing() : 7.425032 sec
Performance counter stats for './raytracing':
37,344 cache-misses # 23.133 % of all cache refs
161,432 cache-references
32,983,244,564 instructions # 1.54 insns per cycle
21,383,486,641 cycles
7.426804060 seconds time elapsed
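The annotations after `#` in the perf output are derived from the raw counters. A small sketch of that arithmetic, with hypothetical helper names:

```c
/* Safe division for counter ratios; returns 0 when the denominator is 0. */
static double ratio(unsigned long long num, unsigned long long den)
{
    return den ? (double)num / (double)den : 0.0;
}

/* Instructions retired per CPU cycle. */
double instructions_per_cycle(unsigned long long instructions,
                              unsigned long long cycles)
{
    return ratio(instructions, cycles);
}

/* Cache misses as a percentage of all cache references. */
double cache_miss_percent(unsigned long long misses,
                          unsigned long long references)
{
    return 100.0 * ratio(misses, references);
}
```

With the counts above, these give about 1.54 instructions per cycle and about 23.13 % cache misses, matching perf's annotations.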
dot_product
static inline
double dot_product(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}
subtract_vector
static inline
void subtract_vector(const double *a, const double *b, double *out)
{
    for (int i = 0; i < 3; i++)
        out[i] = a[i] - b[i];
}
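The notes show only the loop-based versions of these two functions, not the change behind the timing that follows. A common change for fixed-length loops like these is to unroll them by hand. The sketch below assumes that is what was done; the `_unrolled` names are hypothetical and used only to keep the two versions apart.

```c
/* Unrolled dot product: three multiply-adds, no loop counter or branch. */
static inline
double dot_product_unrolled(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}

/* Unrolled subtraction: one statement per component. */
static inline
void subtract_vector_unrolled(const double *a, const double *b, double *out)
{
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}
```

The vectors always have three components, so unrolling removes the loop bookkeeping from functions that are called tens of millions of times.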
Execution time
Execution time of raytracing() : 6.381563 sec
gprof results
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
13.83 0.30 0.30 10598450 0.00 0.00 normalize
12.68 0.58 0.28 31410180 0.00 0.00 multiply_vector
11.99 0.84 0.26 17836094 0.00 0.00 add_vector
10.61 1.07 0.23 56956357 0.00 0.00 subtract_vector
10.14 1.29 0.22 69646433 0.00 0.00 dot_product
8.07 1.46 0.18 13861875 0.00 0.00 rayRectangularIntersection
6.92 1.61 0.15 4620625 0.00 0.00 ray_hit_object
5.30 1.73 0.12 17821809 0.00 0.00 cross_product
The other operations (add_vector, multiply_vectors) can be improved in a similar way, so they were changed as well.
Execution time of raytracing() : 6.075147 sec
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
14.48 0.28 0.28 31410180 0.00 0.00 multiply_vector
14.22 0.55 0.27 10598450 0.00 0.00 normalize
13.17 0.80 0.25 56956357 0.00 0.00 subtract_vector
12.90 1.04 0.25 69646433 0.00 0.00 dot_product
12.11 1.27 0.23 13861875 0.00 0.00 rayRectangularIntersection
4.74 1.36 0.09 17821809 0.00 0.00 cross_product
4.74 1.45 0.09 13861875 0.00 0.00 raySphereIntersection
4.74 1.54 0.09 4620625 0.00 0.00 ray_hit_object
static inline __attribute__((always_inline))
void normalize(double *v)
{
    double d = sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    assert(d != 0.0 && "Error calculating normal");
    v[0] /= d;
    v[1] /= d;
    v[2] /= d;
}
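On what force inline does: GCC does not inline functions when optimization is off, even if they are declared `inline`. `__attribute__((always_inline))` asks it to inline them anyway. The notes do not state the compiler flags, so this sketch assumes an unoptimized build. `FORCE_INLINE` and `length_squared` are hypothetical names used only for illustration.

```c
/* Hypothetical convenience macro so the attribute is written once. */
#define FORCE_INLINE static inline __attribute__((always_inline))

FORCE_INLINE double dot_product(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}

/* Hypothetical caller: GCC expands dot_product's body here
   instead of emitting a function call. */
double length_squared(const double *v)
{
    return dot_product(v, v);
}
```

Once a helper is inlined it stops appearing as its own row in gprof, and its time is charged to the caller instead.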
Execution time
Execution time of raytracing() : 3.098384 sec
gprof results
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
33.83 0.70 0.70 13861875 0.00 0.00 rayRectangularIntersection
15.46 1.02 0.32 13861875 0.00 0.00 raySphereIntersection
13.53 1.30 0.28 2110576 0.00 0.00 localColor
12.08 1.55 0.25 2110576 0.00 0.00 compute_specular_diffuse
8.22 1.72 0.17 1048576 0.00 0.00 ray_color
5.32 1.83 0.11 4620625 0.00 0.00 ray_hit_object
3.87 1.91 0.08 1241598 0.00 0.00 reflection
3.38 1.98 0.07 1048576 0.00 0.00 rayConstruction
0.97 2.00 0.02 2558386 0.00 0.00 idx_stack_empty
0.97 2.02 0.02 1241598 0.00 0.00 refraction
0.97 2.04 0.02 1 0.02 2.07 raytracing
# Rendering scene
Done!
Execution time of raytracing() : 4.239852 sec
About one second slower than the force inline version.
TotallyWrong's shared notes
Understanding inline - function attribute
How to force inline