contributed by <Jing Zhou>
Ubuntu 16.04 LTS
$ sudo apt-get update
$ sudo apt-get install graphviz
$ sudo apt-get install imagemagick
$ sudo apt-get install vim
$ vim ~/.vimrc
set ai
set cursorline
set enc=utf8
set number
set tabstop=4
set wrap
$ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]
Reference: 使用Gnu gprof进行Linux平台下的程序分析 (Program analysis on Linux with GNU gprof)
Test
$ gcc -pg test.c
$ gprof -b a.out gmon.out | less
Result (success)
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 1 0.00 0.00 a
0.00 0.00 0.00 1 0.00 0.00 b
0.00 0.00 0.00 1 0.00 0.00 c
Call graph
granularity: each sample hit covers 2 byte(s) no time propagated
index % time self children called name
0.00 0.00 1/1 b [2]
[1] 0.0 0.00 0.00 1 a [1]
-----------------------------------------------
0.00 0.00 1/1 main [9]
[2] 0.0 0.00 0.00 1 b [2]
0.00 0.00 1/1 a [1]
0.00 0.00 1/1 c [3]
-----------------------------------------------
0.00 0.00 1/1 b [2]
[3] 0.0 0.00 0.00 1 c [3]
-----------------------------------------------
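The test.c source is not shown in these notes. A minimal sketch consistent with the call graph above (main calls b once, and b calls a and c once each) could look like the following. The function bodies and return values are assumptions made only for illustration.

```c
/* Hypothetical test.c body matching the call graph: main -> b -> a, c.
 * The return values are placeholders, not taken from the original file. */

int a(void)
{
    return 1;
}

int c(void)
{
    return 2;
}

/* b calls a and c exactly once each, as in the gprof call graph. */
int b(void)
{
    return a() + c();
}
```

To profile it, add a main that calls b() once, build with gcc -pg, and run the program to produce gmon.out. Functions this small finish well inside one 0.01-second sampling interval, which is why gprof reports "no time accumulated".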
Using cflow (unsuccessful)
$ sudo apt install cflow
[linux /home/]$ sudo wget "http://ftp.gnu.org/gnu/cflow/cflow-1.4.tar.gz"
[linux /home/]$ sudo tar zxvf cflow-1.4.tar.gz
# Unlike version 1.1, configure is not in /cflow-1.4/src
[linux /home/cflow-1.4]$ ./configure
# The steps below produce errors
[linux /home/cflow-1.4]$ make CFLAGS=-pg LDFLAGS=-pg
[linux /home/cflow-1.4/src]$ cflow parser.c
Get the source code, build, and test:
$ git clone https://github.com/sysprog21/raytracing
$ cd raytracing
$ make
$ ./raytracing
Execution time of raytracing() : 2.338675 sec
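The gprof listings in these notes include a diff_in_second function, which suggests the "Execution time" line is the difference between two timestamps. A minimal sketch of such a helper follows. The signature and the use of struct timespec (for example filled in by clock_gettime) are assumptions, not copied from the raytracing source.

```c
#include <time.h>

/* Elapsed time from t1 to t2 in seconds; t2 is assumed to be the later
 * timestamp. Borrows one second when the nanosecond part underflows. */
double diff_in_second(struct timespec t1, struct timespec t2)
{
    struct timespec diff;
    if (t2.tv_nsec - t1.tv_nsec < 0) {
        diff.tv_sec = t2.tv_sec - t1.tv_sec - 1;
        diff.tv_nsec = t2.tv_nsec - t1.tv_nsec + 1000000000L;
    } else {
        diff.tv_sec = t2.tv_sec - t1.tv_sec;
        diff.tv_nsec = t2.tv_nsec - t1.tv_nsec;
    }
    return diff.tv_sec + diff.tv_nsec / 1000000000.0;
}
```

Taking one timestamp before and one after the call to raytracing() and passing both to this helper would give a wall-clock figure like the one printed above.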
Remove the build output and rebuild (with -pg added)
$ make clean
$ make PROFILE=1
Running the program produces gmon.out; analyze it with gprof
$ ./raytracing
$ gprof -b raytracing gmon.out | less
Execution time: longer because of the -pg instrumentation
Execution time of raytracing() : 5.208219 sec
The results below show that subtract_vector and dot_product are the performance bottlenecks
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
22.10 0.53 0.53 69646433 0.00 0.00 dot_product
15.01 0.89 0.36 56956357 0.00 0.00 subtract_vector
9.38 1.12 0.23 17821809 0.00 0.00 cross_product
9.17 1.34 0.22 13861875 0.00 0.00 rayRectangularIntersection
8.34 1.54 0.20 13861875 0.00 0.00 raySphereIntersection
7.71 1.72 0.19 31410180 0.00 0.00 multiply_vector
6.67 1.88 0.16 10598450 0.00 0.00 normalize
5.21 2.01 0.13 4620625 0.00 0.00 ray_hit_object
2.92 2.08 0.07 17836094 0.00 0.00 add_vector
2.50 2.14 0.06 2110576 0.00 0.00 compute_specular_diffuse
2.08 2.19 0.05 2110576 0.00 0.00 localColor
2.08 2.24 0.05 1048576 0.00 0.00 ray_color
2.08 2.29 0.05 1 0.05 2.39 raytracing
1.67 2.33 0.04 4221152 0.00 0.00 multiply_vectors
1.67 2.37 0.04 2520791 0.00 0.00 idx_stack_top
0.42 2.38 0.01 3838091 0.00 0.00 length
0.42 2.39 0.01 1241598 0.00 0.00 protect_color_overflow
0.42 2.40 0.01 1 0.01 0.01 delete_sphere_list
0.21 2.40 0.01 1048576 0.00 0.00 rayConstruction
0.00 2.40 0.00 2558386 0.00 0.00 idx_stack_empty
0.00 2.40 0.00 1241598 0.00 0.00 reflection
0.00 2.40 0.00 1241598 0.00 0.00 refraction
0.00 2.40 0.00 1204003 0.00 0.00 idx_stack_push
0.00 2.40 0.00 1048576 0.00 0.00 idx_stack_init
0.00 2.40 0.00 113297 0.00 0.00 fresnel
0.00 2.40 0.00 37595 0.00 0.00 idx_stack_pop
0.00 2.40 0.00 3 0.00 0.00 append_rectangular
0.00 2.40 0.00 3 0.00 0.00 append_sphere
0.00 2.40 0.00 2 0.00 0.00 append_light
0.00 2.40 0.00 1 0.00 0.00 calculateBasisVectors
0.00 2.40 0.00 1 0.00 0.00 delete_light_list
0.00 2.40 0.00 1 0.00 0.00 delete_rectangular_list
0.00 2.40 0.00 1 0.00 0.00 diff_in_second
0.00 2.40 0.00 1 0.00 0.00 write_to_ppm
Checking with perf shows the same hotspots
$ ./raytracing & sudo perf top -p $!
Unroll the for loops, for example
double dp = 0.0;
for (int i = 0; i < 3; i++)
dp += v1[i] * v2[i];
# becomes
dp = v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
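Written out as complete functions, the two forms above might look like this. The function names and the const double * signature are assumptions for illustration; only the loop body and the unrolled expression come from the notes.

```c
/* Loop version: three iterations, each with a counter update,
 * a comparison, and a branch. */
double dot_product_loop(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}

/* Unrolled version: the same three products with no loop bookkeeping. */
double dot_product_unrolled(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}
```

How much this helps depends on the optimization level: an optimizing compiler may unroll a fixed three-iteration loop on its own, so the manual version matters most in unoptimized builds.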
Execution time: down by about 0.5 s (from 2.338675 s to 1.814317 s)
$ ./raytracing
Execution time of raytracing() : 1.814317 sec
Execution time of raytracing() : 3.983447 sec
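Result: self time drops clearly for dot_product (0.53 s to 0.21 s), subtract_vector (0.36 s to 0.18 s), and multiply_vector (0.19 s to 0.05 s); add_vector (unchanged at 0.07 s) and multiply_vectors (0.04 s to 0.06 s) show no clear drop in this profile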
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
17.81 0.21 0.21 69646433 0.00 0.00 dot_product
14.84 0.39 0.18 13861875 0.00 0.00 rayRectangularIntersection
14.84 0.56 0.18 56956357 0.00 0.00 subtract_vector
6.78 0.64 0.08 4620625 0.00 0.00 ray_hit_object
5.94 0.71 0.07 17836094 0.00 0.00 add_vector
5.94 0.78 0.07 17821809 0.00 0.00 cross_product
5.09 0.84 0.06 1048576 0.00 0.00 ray_color
4.66 0.90 0.06 4221152 0.00 0.00 multiply_vectors
4.24 0.95 0.05 31410180 0.00 0.00 multiply_vector
4.24 1.00 0.05 2110576 0.00 0.00 compute_specular_diffuse
3.39 1.04 0.04 1241598 0.00 0.00 refraction
2.54 1.07 0.03 3838091 0.00 0.00 length
2.54 1.10 0.03 2110576 0.00 0.00 localColor
2.12 1.12 0.03 13861875 0.00 0.00 raySphereIntersection
1.70 1.14 0.02 10598450 0.00 0.00 normalize
0.85 1.15 0.01 2520791 0.00 0.00 idx_stack_top
0.85 1.16 0.01 1048576 0.00 0.00 rayConstruction
0.85 1.17 0.01 113297 0.00 0.00 fresnel
0.85 1.18 0.01 1 0.01 1.18 raytracing
0.00 1.18 0.00 2558386 0.00 0.00 idx_stack_empty
0.00 1.18 0.00 1241598 0.00 0.00 protect_color_overflow
0.00 1.18 0.00 1241598 0.00 0.00 reflection
0.00 1.18 0.00 1204003 0.00 0.00 idx_stack_push
0.00 1.18 0.00 1048576 0.00 0.00 idx_stack_init
0.00 1.18 0.00 37595 0.00 0.00 idx_stack_pop
0.00 1.18 0.00 3 0.00 0.00 append_rectangular
0.00 1.18 0.00 3 0.00 0.00 append_sphere
0.00 1.18 0.00 2 0.00 0.00 append_light
0.00 1.18 0.00 1 0.00 0.00 calculateBasisVectors
0.00 1.18 0.00 1 0.00 0.00 delete_light_list
0.00 1.18 0.00 1 0.00 0.00 delete_rectangular_list
0.00 1.18 0.00 1 0.00 0.00 delete_sphere_list
0.00 1.18 0.00 1 0.00 0.00 diff_in_second
0.00 1.18 0.00 1 0.00 0.00 write_to_ppm
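The same unrolling can be applied to the other three-component vector helpers named in the profile. The sketch below assumes each helper takes its inputs as const double * and writes into a caller-provided out array, that multiply_vectors is component-wise, and that multiply_vector scales by a scalar. These signatures are inferred from the function names, not copied from the raytracing source.

```c
/* out = a + b */
void add_vector(const double *a, const double *b, double *out)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
}

/* out = a - b */
void subtract_vector(const double *a, const double *b, double *out)
{
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}

/* out = a * b, component by component (assumed meaning) */
void multiply_vectors(const double *a, const double *b, double *out)
{
    out[0] = a[0] * b[0];
    out[1] = a[1] * b[1];
    out[2] = a[2] * b[2];
}

/* out = a scaled by the scalar b (assumed meaning) */
void multiply_vector(const double *a, double b, double *out)
{
    out[0] = a[0] * b;
    out[1] = a[1] * b;
    out[2] = a[2] * b;
}
```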
#pragma omp parallel for
for (int i = 0; i < 3; i++)
out[i] = a[i] + b[i];
$ ./raytracing
# Rendering scene
Done!
Execution time of raytracing() : 139.242781 sec
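A likely reason for the 139 s run: with the pragma inside a three-iteration loop, a parallel region is entered on every call (add_vector alone is called about 17.8 million times in the profiles above), so thread coordination costs far more than the three additions it splits. One possible alternative, sketched below, keeps the vector helper serial and puts the pragma on a loop with much more work per iteration. The render_rows function, its buffer layout, and its per-pixel work are hypothetical and are not taken from the raytracing source.

```c
/* Build with: gcc -std=gnu99 -fopenmp file.c
 * Without -fopenmp the pragma is ignored and the loop runs serially. */

/* Serial helper: three additions are too little work to share out. */
void add_vector(const double *a, const double *b, double *out)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
}

/* Hypothetical outer loop. pixels holds width * height RGB triples.
 * Rows are independent, so whole rows can be divided among threads. */
void render_rows(double *pixels, int width, int height,
                 const double *base, const double *tint)
{
    #pragma omp parallel for
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++) {
            /* Each iteration writes only its own pixel: no data race. */
            double *px = &pixels[3 * (j * width + i)];
            add_vector(base, tint, px);
        }
    }
}
```

Whether this actually helps in raytracing() would need to be measured the same way as the runs above; the sketch only shows where the pragma sits relative to the amount of work.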