tundergod
hw1
2016q3
contributed by <tundergod>
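Recompile the code with make PROFILE=1 and learn gprof.
Rework the function implementations, including those in math-toolkit.h, and document the performance differences thoroughly in the shared notes.
-O0 (optimization disabled)
The math helper functions defined in math-toolkit.h are important; see Investigating SSE cross product performance, the MathFu source code, and 2015q3 Homework #1 Ext.
Linux version: Ubuntu 16.04 LTS
Hardware information: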
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 69
Model name: Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
Stepping: 1
CPU MHz: 1661.660
CPU max MHz: 2600.0000
CPU min MHz: 800.0000
BogoMIPS: 4589.54
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3
# Rendering scene
Done!
Execution time of raytracing() : 9.463558 sec
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 22.98      0.99     0.99 69646433     0.00     0.00  dot_product
 17.64      1.75     0.76 56956357     0.00     0.00  subtract_vector
  9.87      2.18     0.43 13861875     0.00     0.00  rayRectangularIntersection
  9.29      2.58     0.40 31410180     0.00     0.00  multiply_vector
  7.43      2.90     0.32 10598450     0.00     0.00  normalize
  5.92      3.15     0.26 13861875     0.00     0.00  raySphereIntersection
  5.69      3.40     0.25 17836094     0.00     0.00  add_vector
  5.57      3.64     0.24  4620625     0.00     0.00  ray_hit_object
  4.41      3.83     0.19 17821809     0.00     0.00  cross_product
  3.71      3.99     0.16  1048576     0.00     0.00  ray_color
  2.79      4.11     0.12  4221152     0.00     0.00  multiply_vectors
  1.86      4.19     0.08        1     0.08     4.31  raytracing
  0.70      4.22     0.03  2520791     0.00     0.00  idx_stack_top
  0.70      4.25     0.03  1048576     0.00     0.00  rayConstruction
  0.58      4.27     0.03  3838091     0.00     0.00  length
  0.46      4.29     0.02  2110576     0.00     0.00  compute_specular_diffuse
  0.46      4.31     0.02  1241598     0.00     0.00  refraction
  0.00      4.31     0.00  2558386     0.00     0.00  idx_stack_empty
  0.00      4.31     0.00  2110576     0.00     0.00  localColor
  0.00      4.31     0.00  1241598     0.00     0.00  protect_color_overflow
  0.00      4.31     0.00  1241598     0.00     0.00  reflection
  0.00      4.31     0.00  1204003     0.00     0.00  idx_stack_push
  0.00      4.31     0.00  1048576     0.00     0.00  idx_stack_init
  0.00      4.31     0.00   113297     0.00     0.00  fresnel
  0.00      4.31     0.00    37595     0.00     0.00  idx_stack_pop
  0.00      4.31     0.00        3     0.00     0.00  append_rectangular
  0.00      4.31     0.00        3     0.00     0.00  append_sphere
  0.00      4.31     0.00        2     0.00     0.00  append_light
  0.00      4.31     0.00        1     0.00     0.00  calculateBasisVectors
  0.00      4.31     0.00        1     0.00     0.00  delete_light_list
  0.00      4.31     0.00        1     0.00     0.00  delete_rectangular_list
  0.00      4.31     0.00        1     0.00     0.00  delete_sphere_list
  0.00      4.31     0.00        1     0.00     0.00  diff_in_second
  0.00      4.31     0.00        1     0.00     0.00  write_to_ppm
static inline
double dot_product(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}
static inline __attribute__((always_inline))
double length(const double *v)
{
    return sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
}
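Because the build stays at -O0, GCC does not inline a plain static inline function; it only does so when the function carries the always_inline attribute. The sketch below shows one way to apply that attribute to several small vector helpers at once. The FORCE_INLINE macro name and the squared_distance caller are hypothetical, and the subtract_vector signature is an assumption about math-toolkit.h rather than a copy of it.

```c
/* Sketch: force inlining of small vector helpers even under -O0.
 * __attribute__((always_inline)) is a GCC/Clang extension. */
#define FORCE_INLINE static inline __attribute__((always_inline)) /* hypothetical name */

FORCE_INLINE double dot_product(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}

/* Assumed signature: out = a - b, component by component. */
FORCE_INLINE void subtract_vector(const double *a, const double *b, double *out)
{
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}

/* Hypothetical caller: both helpers are expanded in place here,
 * so no call instructions are emitted for them. */
double squared_distance(const double *p, const double *q)
{
    double diff[3];
    subtract_vector(p, q, diff);
    return dot_product(diff, diff);
}
```

After rebuilding with make PROFILE=1, helpers that were forced inline should no longer appear as separate rows in the gprof flat profile, since their time is attributed to the callers.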
#define DOT(a,b) (((a)[0]*(b)[0])+((a)[1]*(b)[1])+((a)[2]*(b)[2]))
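One caveat of the macro approach is that arguments are substituted textually, so an argument such as base + 3 * k only indexes correctly when each parameter is wrapped in parentheses. A minimal sketch follows; dot_at and the packed-array layout are hypothetical and exist only to exercise the macro with an expression argument.

```c
/* Each parameter is parenthesized so that expression arguments
 * (pointer arithmetic, casts) are indexed as a whole. */
#define DOT(a, b) \
    (((a)[0] * (b)[0]) + ((a)[1] * (b)[1]) + ((a)[2] * (b)[2]))

/* Hypothetical helper: dot product of the k-th 3-vector stored in a
 * packed array (x0,y0,z0,x1,y1,z1,...) with the vector n. */
double dot_at(const double *packed, int k, const double *n)
{
    return DOT(packed + 3 * k, n);
}
```

Each argument is still evaluated three times, so arguments with side effects (for example p++) should not be passed to the macro.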
Set CFLAGS=-DNDEBUG to eliminate the effect of assert on performance.
7.43 2.90 0.32 10598450 0.00 0.00 normalize
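The effect of -DNDEBUG can be illustrated in isolation: when NDEBUG is defined before <assert.h> is included, assert(expr) expands to ((void)0) and expr is never evaluated. The sketch below defines NDEBUG in the file only to be self-contained, where the real build would pass it through CFLAGS. The divide_vector helper, the counted wrapper, and the checks_evaluated counter are all hypothetical and are not taken from math-toolkit.h.

```c
/* In the real build this comes from CFLAGS=-DNDEBUG; it is defined
 * in-file here only so the sketch is self-contained. */
#define NDEBUG
#include <assert.h>

int checks_evaluated = 0;   /* hypothetical counter, for illustration only */

/* Hypothetical wrapper: records that a check ran, then returns its result. */
int counted(int ok)
{
    checks_evaluated++;
    return ok;
}

/* Hypothetical helper: divide a 3-vector by a scalar in place. */
void divide_vector(double *v, double d)
{
    /* Under NDEBUG this line expands to ((void)0), so counted()
     * is never called and the check costs nothing at run time. */
    assert(counted(d != 0.0));
    v[0] /= d;
    v[1] /= d;
    v[2] /= d;
}

/* <assert.h> may be included again: assert is redefined according to
 * whether NDEBUG is defined at that point. This restores live
 * assertions for any code that follows. */
#undef NDEBUG
#include <assert.h>
```

The flip side is that a disabled assert no longer guards against a zero divisor, so this flag suits measurement builds more than debugging builds.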
#pragma omp parallel for num_threads(N) \
    private(stk), private(d), private(object_color)
for (int j = 0; j < 512; j++) {
    for (int i = 0; i < 512; i++) {
        .....
    }
}
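The role of the private clauses can be seen in a reduced, compilable form. In the sketch below every thread gets its own copy of the scratch vector d, so concurrent iterations do not overwrite each other's intermediate values. The shade function is a hypothetical stand-in for the per-pixel ray tracing work, and N_THREADS is an assumed value for the N in the pragma above.

```c
#include <stdint.h>

#define ROWS 512
#define COLS 512
#define N_THREADS 4   /* assumption: stands in for N */

/* Hypothetical stand-in for the per-pixel work: fills the scratch
 * vector, then derives a pixel value from it. */
static uint8_t shade(int i, int j, double *scratch)
{
    scratch[0] = i;
    scratch[1] = j;
    scratch[2] = 0.0;
    return (uint8_t)((int)(scratch[0] + scratch[1]) & 0xFF);
}

void render(uint8_t *pixels)
{
    double d[3];   /* scratch vector; each thread needs its own copy */

    /* Rows are divided among the threads; private(d) gives every
     * thread an independent, uninitialized copy of d. */
    #pragma omp parallel for num_threads(N_THREADS) private(d)
    for (int j = 0; j < ROWS; j++) {
        for (int i = 0; i < COLS; i++) {
            pixels[j * COLS + i] = shade(i, j, d);
        }
    }
}
```

With GCC the pragma only takes effect when the file is compiled and linked with -fopenmp; without that flag the pragma is ignored and the loop runs serially with the same result.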
typedef struct __ARG {
    uint8_t *pixels;
    color background;
    rectangular_node rectangulars;
    sphere_node spheres;
    light_node lights;
    const viewpoint *View;
    int start_j;
    point3 u, v, w, d;
} arg;
for (int j = data->start_j; j < 512; j += N) {
    for (int i = 0; i < 512; i++) {
        ...
    }
}
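The interleaved row split above can be sketched end to end with POSIX threads: thread t starts at row t and then steps by the thread count, so neighbouring rows go to different threads. In the sketch below row_arg is a hypothetical, trimmed-down version of the arg struct, N_THREADS is an assumed value for N, and the pixel value written is a placeholder for the real ray tracing work.

```c
#include <pthread.h>
#include <stdint.h>

#define ROWS 512
#define COLS 512
#define N_THREADS 4   /* assumption: stands in for N */

/* Hypothetical, trimmed-down version of the arg struct. */
typedef struct {
    uint8_t *pixels;
    int start_j;
} row_arg;

/* Thread body: handles rows start_j, start_j + N_THREADS, ... */
static void *render_rows(void *p)
{
    const row_arg *data = p;
    for (int j = data->start_j; j < ROWS; j += N_THREADS) {
        for (int i = 0; i < COLS; i++) {
            /* Placeholder for the per-pixel work: tag the pixel
             * with the id of the thread that produced it. */
            data->pixels[j * COLS + i] = (uint8_t)(data->start_j + 1);
        }
    }
    return NULL;
}

/* Returns 0 if all threads were created and joined, -1 otherwise. */
int render_parallel(uint8_t *pixels)
{
    pthread_t tid[N_THREADS];
    row_arg args[N_THREADS];
    int created = 0;

    for (int t = 0; t < N_THREADS; t++) {
        args[t].pixels = pixels;
        args[t].start_j = t;
        if (pthread_create(&tid[t], NULL, render_rows, &args[t]) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    return created == N_THREADS ? 0 : -1;
}
```

Interleaving rows, rather than giving each thread one contiguous band, tends to spread expensive regions of the image across threads. Each thread writes a disjoint set of rows, so no locking is needed on the pixel buffer.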
# Rendering scene
Done!
Execution time of raytracing() : 0.671045 sec
Chart: