
2016q3 Homework1 (raytracing)

contributed by <Jing Zhou>

Development environment

Ubuntu 16.04 LTS

Preparation

Install the required tools

$ sudo apt-get update
$ sudo apt-get install graphviz
$ sudo apt-get install imagemagick

vimrc settings

$ sudo apt-get install vim
$ vim ~/.vimrc

set ai
set cursorline
set enc=utf8
set number
set tabstop=4
set wrap

Formatting with astyle

$ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]

Testing gprof

Tested by following the article 使用Gnu gprof进行Linux平台下的程序分析 (Program analysis on the Linux platform with GNU gprof):

$ gcc -pg test.c
$ gprof -b a.out gmon.out | less

Result (succeeded)

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00        1     0.00     0.00  a
  0.00      0.00     0.00        1     0.00     0.00  b
  0.00      0.00     0.00        1     0.00     0.00  c

                        Call graph


granularity: each sample hit covers 2 byte(s) no time propagated

index % time    self  children    called     name
                0.00    0.00       1/1           b [2]
[1]      0.0    0.00    0.00       1         a [1]
-----------------------------------------------
                0.00    0.00       1/1           main [9]
[2]      0.0    0.00    0.00       1         b [2]
                0.00    0.00       1/1           a [1]
                0.00    0.00       1/1           c [3]
-----------------------------------------------
                0.00    0.00       1/1           b [2]
[3]      0.0    0.00    0.00       1         c [3]
-----------------------------------------------
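The test.c used above is not shown. A minimal hypothetical reconstruction that produces the same call-graph shape (main calls b, which calls a and then c) could look like this; the function bodies are placeholders and not the original file.

```c
/* Hypothetical reconstruction of test.c, not the author's file.
 * Intended call graph: main -> b, b -> a, b -> c. */
int a(int x)
{
    return x + 1;
}

int c(int x)
{
    return x * 2;
}

int b(int x)
{
    /* Separate statements fix the call order (a before c); in
     * "a(x) + c(x)" the evaluation order would be unspecified. */
    int ra = a(x);
    int rc = c(x);
    return ra + rc;
}
```

Adding a main that calls b once, compiling with gcc -pg test.c and running ./a.out writes gmon.out for gprof. Because these functions do almost no work, no samples land in them, which is consistent with the "no time accumulated" line in the flat profile.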

Using cflow (unsuccessful)

$ sudo apt install cflow
[linux /home/]$ sudo wget "http://ftp.gnu.org/gnu/cflow/cflow-1.4.tar.gz"
[linux /home/]$ sudo tar zxvf cflow-1.4.tar.gz 
# Unlike version 1.1, configure is not in /cflow-1.4/src
[linux /home/cflow-1.4]$ ./configure
# The following steps fail
[linux /home/cflow-1.4]$ make CFLAGS=-pg LDFLAGS=-pg
[linux /home/cflow-1.4/src]$ cflow parser.c

raytracing

Get the source code, then build and run it:

$ git clone https://github.com/sysprog21/raytracing 
$ cd raytracing
$ make
$ ./raytracing

Execution time of raytracing() : 2.338675 sec
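The "Execution time of raytracing()" figure is a wall-clock measurement around the rendering call, and the gprof listing below includes a diff_in_second function. A minimal sketch of such a measurement, assuming POSIX clock_gettime; the body of diff_in_second here and the time_call helper are assumptions, not the repository's code.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Assumed implementation: returns t2 - t1 in seconds. */
static double diff_in_second(struct timespec t1, struct timespec t2)
{
    struct timespec diff;
    if (t2.tv_nsec - t1.tv_nsec < 0) {
        /* Borrow one second when the nanosecond field underflows. */
        diff.tv_sec = t2.tv_sec - t1.tv_sec - 1;
        diff.tv_nsec = t2.tv_nsec - t1.tv_nsec + 1000000000L;
    } else {
        diff.tv_sec = t2.tv_sec - t1.tv_sec;
        diff.tv_nsec = t2.tv_nsec - t1.tv_nsec;
    }
    return diff.tv_sec + diff.tv_nsec / 1000000000.0;
}

/* Hypothetical helper: times one call of fn. CLOCK_MONOTONIC is used so
 * the interval is not affected by adjustments to the system clock. */
double time_call(void (*fn)(void))
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return diff_in_second(start, end);
}
```

Wall-clock timing includes everything that happens during the call, which is why the -pg build below reports a noticeably longer time than the plain build.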

Clean the build files and rebuild with -pg:

$ make clean
$ make PROFILE=1

Running the program produces gmon.out, which gprof then analyzes:

$ ./raytracing
$ gprof -b raytracing gmon.out | less

Execution time is longer because of the -pg instrumentation:

Execution time of raytracing() : 5.208219 sec

The results below show that dot_product and subtract_vector are the performance bottlenecks (together about 37% of the sampled time):

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 22.10      0.53     0.53 69646433     0.00     0.00  dot_product
 15.01      0.89     0.36 56956357     0.00     0.00  subtract_vector
  9.38      1.12     0.23 17821809     0.00     0.00  cross_product
  9.17      1.34     0.22 13861875     0.00     0.00  rayRectangularIntersection
  8.34      1.54     0.20 13861875     0.00     0.00  raySphereIntersection
  7.71      1.72     0.19 31410180     0.00     0.00  multiply_vector
  6.67      1.88     0.16 10598450     0.00     0.00  normalize
  5.21      2.01     0.13  4620625     0.00     0.00  ray_hit_object
  2.92      2.08     0.07 17836094     0.00     0.00  add_vector
  2.50      2.14     0.06  2110576     0.00     0.00  compute_specular_diffuse
  2.08      2.19     0.05  2110576     0.00     0.00  localColor
  2.08      2.24     0.05  1048576     0.00     0.00  ray_color
  2.08      2.29     0.05        1     0.05     2.39  raytracing
  1.67      2.33     0.04  4221152     0.00     0.00  multiply_vectors
  1.67      2.37     0.04  2520791     0.00     0.00  idx_stack_top
  0.42      2.38     0.01  3838091     0.00     0.00  length
  0.42      2.39     0.01  1241598     0.00     0.00  protect_color_overflow
  0.42      2.40     0.01        1     0.01     0.01  delete_sphere_list
  0.21      2.40     0.01  1048576     0.00     0.00  rayConstruction
  0.00      2.40     0.00  2558386     0.00     0.00  idx_stack_empty
  0.00      2.40     0.00  1241598     0.00     0.00  reflection
  0.00      2.40     0.00  1241598     0.00     0.00  refraction
  0.00      2.40     0.00  1204003     0.00     0.00  idx_stack_push
  0.00      2.40     0.00  1048576     0.00     0.00  idx_stack_init
  0.00      2.40     0.00   113297     0.00     0.00  fresnel
  0.00      2.40     0.00    37595     0.00     0.00  idx_stack_pop
  0.00      2.40     0.00        3     0.00     0.00  append_rectangular
  0.00      2.40     0.00        3     0.00     0.00  append_sphere
  0.00      2.40     0.00        2     0.00     0.00  append_light
  0.00      2.40     0.00        1     0.00     0.00  calculateBasisVectors
  0.00      2.40     0.00        1     0.00     0.00  delete_light_list
  0.00      2.40     0.00        1     0.00     0.00  delete_rectangular_list
  0.00      2.40     0.00        1     0.00     0.00  diff_in_second
  0.00      2.40     0.00        1     0.00     0.00  write_to_ppm

perf shows the same hotspots:

$ ./raytracing & sudo perf top -p $!

Optimization with loop unrolling

Unroll the for loops, for example:

double dp = 0.0;
for (int i = 0; i < 3; i++)
    dp += v1[i] * v2[i];

/* becomes */

dp = v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];

Execution time drops by about 0.5 s (from 2.338675 s to 1.814317 s)

$ ./raytracing

Execution time of raytracing() : 1.814317 sec
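The same unrolling applies to the other three-element helpers that appear in the profile. A minimal sketch, assuming each vector is a plain array of three doubles as in the snippet above; the signatures (in particular the out parameter and the scalar argument of multiply_vector) are assumptions and may differ from the repository's vector helpers.

```c
/* Sketch only: signatures are assumed, not copied from the repository.
 * Every vector is a 3-element double array. */
static inline double dot_product(const double *v1, const double *v2)
{
    /* Unrolled: no loop counter and no branch. */
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}

static inline void add_vector(const double *a, const double *b, double *out)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
}

static inline void subtract_vector(const double *a, const double *b,
                                   double *out)
{
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}

/* Scales vector a by scalar k (assumed signature). */
static inline void multiply_vector(const double *a, double k, double *out)
{
    out[0] = a[0] * k;
    out[1] = a[1] * k;
    out[2] = a[2] * k;
}
```

Each function now runs a fixed sequence of three operations. Whether the compiler would already unroll the original loops on its own depends on the optimization flags in use.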

  • With gprof profiling (-pg build)
    Execution time drops by about 1.2 s (from 5.208219 s to 3.983447 s)

Execution time of raytracing() : 3.983447 sec

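Result: the self time of dot_product (0.53 s to 0.21 s), subtract_vector (0.36 s to 0.18 s), multiply_vector (0.19 s to 0.05 s) and cross_product (0.23 s to 0.07 s) drops clearly, and the total sampled time falls from 2.40 s to 1.18 s. add_vector (0.07 s in both runs) and multiply_vectors (0.04 s to 0.06 s) do not improve.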

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 17.81      0.21     0.21 69646433     0.00     0.00  dot_product
 14.84      0.39     0.18 13861875     0.00     0.00  rayRectangularIntersection
 14.84      0.56     0.18 56956357     0.00     0.00  subtract_vector
  6.78      0.64     0.08  4620625     0.00     0.00  ray_hit_object
  5.94      0.71     0.07 17836094     0.00     0.00  add_vector
  5.94      0.78     0.07 17821809     0.00     0.00  cross_product
  5.09      0.84     0.06  1048576     0.00     0.00  ray_color
  4.66      0.90     0.06  4221152     0.00     0.00  multiply_vectors
  4.24      0.95     0.05 31410180     0.00     0.00  multiply_vector
  4.24      1.00     0.05  2110576     0.00     0.00  compute_specular_diffuse
  3.39      1.04     0.04  1241598     0.00     0.00  refraction
  2.54      1.07     0.03  3838091     0.00     0.00  length
  2.54      1.10     0.03  2110576     0.00     0.00  localColor
  2.12      1.12     0.03 13861875     0.00     0.00  raySphereIntersection
  1.70      1.14     0.02 10598450     0.00     0.00  normalize
  0.85      1.15     0.01  2520791     0.00     0.00  idx_stack_top
  0.85      1.16     0.01  1048576     0.00     0.00  rayConstruction
  0.85      1.17     0.01   113297     0.00     0.00  fresnel
  0.85      1.18     0.01        1     0.01     1.18  raytracing
  0.00      1.18     0.00  2558386     0.00     0.00  idx_stack_empty
  0.00      1.18     0.00  1241598     0.00     0.00  protect_color_overflow
  0.00      1.18     0.00  1241598     0.00     0.00  reflection
  0.00      1.18     0.00  1204003     0.00     0.00  idx_stack_push
  0.00      1.18     0.00  1048576     0.00     0.00  idx_stack_init
  0.00      1.18     0.00    37595     0.00     0.00  idx_stack_pop
  0.00      1.18     0.00        3     0.00     0.00  append_rectangular
  0.00      1.18     0.00        3     0.00     0.00  append_sphere
  0.00      1.18     0.00        2     0.00     0.00  append_light
  0.00      1.18     0.00        1     0.00     0.00  calculateBasisVectors
  0.00      1.18     0.00        1     0.00     0.00  delete_light_list
  0.00      1.18     0.00        1     0.00     0.00  delete_rectangular_list
  0.00      1.18     0.00        1     0.00     0.00  delete_sphere_list
  0.00      1.18     0.00        1     0.00     0.00  diff_in_second
  0.00      1.18     0.00        1     0.00     0.00  write_to_ppm

Optimization with OpenMP

  • Approach
#pragma omp parallel for
for (int i = 0; i < 3; i++)
	out[i] = a[i] + b[i];
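  • Execution time balloons to 139 s. Each loop has only 3 iterations, yet helpers such as add_vector are called about 17.8 million times, so every call pays the cost of entering a parallel region. That overhead far exceeds the actual work, and parallelizing makes the program slower instead.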
$ ./raytracing 
# Rendering scene
Done!
Execution time of raytracing() : 139.242781 sec
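The slowdown likely comes from where the pragma sits rather than from OpenMP itself. A common alternative, sketched here under assumptions, is to parallelize the outermost pixel loop so the parallel region is entered once per frame. render and shade_pixel are hypothetical stand-ins for the per-pixel work in raytracing(); the sketch assumes each pixel only reads shared scene data and writes its own output slot, so any per-pixel scratch state (such as a stack) must be local to the iteration.

```c
#include <stdint.h>

/* Hypothetical per-pixel work: a simple gradient instead of ray casting.
 * It reads only its arguments, so pixels are independent. Requires
 * width > 1 and height > 1. */
static uint8_t shade_pixel(int x, int y, int width, int height)
{
    return (uint8_t) ((x * 255 / (width - 1) + y * 255 / (height - 1)) / 2);
}

/* Parallelize across rows: one parallel region per frame instead of one
 * per 3-element vector operation. Build with -fopenmp to enable threads;
 * without it the pragma is ignored and the loop runs serially. */
void render(uint8_t *pixels, int width, int height)
{
#pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            pixels[y * width + x] = shade_pixel(x, y, width, height);
}
```

schedule(dynamic) hands out rows as threads become free, which suits scenes where some rows are much more expensive to trace than others.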