2016q3 Homework1 (raytracing)
===
contributed by <`Jing Zhou`>

## Development Environment
Ubuntu 16.04 LTS

![](https://i.imgur.com/qTb5JIg.png)

## Preparation
### Installing tools
```shell
$ sudo apt-get update
$ sudo apt-get install graphviz
$ sudo apt-get install imagemagick
```

### vimrc settings
```shell
$ sudo apt-get install vim
$ vim ~/.vimrc
# contents of ~/.vimrc:
set ai
set cursorline
set enc=utf8
set number
set tabstop=4
set wrap
```

### Formatting with astyle
`$ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]`

### Testing gprof
Following [使用Gnu gprof进行Linux平台下的程序分析](http://os.51cto.com/art/200703/41426.htm) ("Analyzing programs on Linux with GNU gprof"), a quick test. The instrumented program must be run once so that it writes `gmon.out`:
```shell
$ gcc -pg test.c
$ ./a.out
$ gprof -b a.out gmon.out | less
```
Result (success):
```
Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00        1     0.00     0.00  a
  0.00      0.00     0.00        1     0.00     0.00  b
  0.00      0.00     0.00        1     0.00     0.00  c

                        Call graph

granularity: each sample hit covers 2 byte(s) no time propagated

index % time    self  children    called     name
                0.00    0.00       1/1           b [2]
[1]      0.0    0.00    0.00       1         a [1]
-----------------------------------------------
                0.00    0.00       1/1           main [9]
[2]      0.0    0.00    0.00       1         b [2]
                0.00    0.00       1/1           a [1]
                0.00    0.00       1/1           c [3]
-----------------------------------------------
                0.00    0.00       1/1           b [2]
[3]      0.0    0.00    0.00       1         c [3]
-----------------------------------------------
```

Using cflow (did not work):
```shell
$ sudo apt install cflow
[linux /home/]$ sudo wget "http://ftp.gnu.org/gnu/cflow/cflow-1.4.tar.gz"
[linux /home/]$ sudo tar zxvf cflow-1.4.tar.gz
# unlike version 1.1, configure is not under /cflow-1.4/src
[linux /home/cflow-1.4]$ ./configure
# the following steps fail
[linux /home/cflow-1.4]$ make CFLAGS=-pg LDFLAGS=-pg
[linux /home/cflow-1.4/src]$ cflow parser.c
```

## raytracing
Fetch the source code, then build and run it:
```shell
$ git clone https://github.com/sysprog21/raytracing
$ cd raytracing
$ make
$ ./raytracing
```
> Execution time of raytracing() : 2.338675 sec

Clean the build and rebuild with profiling enabled (adds `-pg`):
```shell
$ make clean
$ make PROFILE=1
```
Running the program produces `gmon.out`, which is then analyzed with gprof:
```shell
$ ./raytracing
$ gprof -b raytracing gmon.out | less
```
Execution time with profiling enabled:
It takes more than twice as long because of the `-pg` instrumentation overhead:
> Execution time of raytracing() : 5.208219 sec

The flat profile below shows that ==dot_product== and ==subtract_vector== are the main performance bottlenecks. Together they account for about 37% of the sampled time, and each is called tens of millions of times.
```
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 22.10      0.53     0.53 69646433     0.00     0.00  dot_product
 15.01      0.89     0.36 56956357     0.00     0.00  subtract_vector
  9.38      1.12     0.23 17821809     0.00     0.00  cross_product
  9.17      1.34     0.22 13861875     0.00     0.00  rayRectangularIntersection
  8.34      1.54     0.20 13861875     0.00     0.00  raySphereIntersection
  7.71      1.72     0.19 31410180     0.00     0.00  multiply_vector
  6.67      1.88     0.16 10598450     0.00     0.00  normalize
  5.21      2.01     0.13  4620625     0.00     0.00  ray_hit_object
  2.92      2.08     0.07 17836094     0.00     0.00  add_vector
  2.50      2.14     0.06  2110576     0.00     0.00  compute_specular_diffuse
  2.08      2.19     0.05  2110576     0.00     0.00  localColor
  2.08      2.24     0.05  1048576     0.00     0.00  ray_color
  2.08      2.29     0.05        1     0.05     2.39  raytracing
  1.67      2.33     0.04  4221152     0.00     0.00  multiply_vectors
  1.67      2.37     0.04  2520791     0.00     0.00  idx_stack_top
  0.42      2.38     0.01  3838091     0.00     0.00  length
  0.42      2.39     0.01  1241598     0.00     0.00  protect_color_overflow
  0.42      2.40     0.01        1     0.01     0.01  delete_sphere_list
  0.21      2.40     0.01  1048576     0.00     0.00  rayConstruction
  0.00      2.40     0.00  2558386     0.00     0.00  idx_stack_empty
  0.00      2.40     0.00  1241598     0.00     0.00  reflection
  0.00      2.40     0.00  1241598     0.00     0.00  refraction
  0.00      2.40     0.00  1204003     0.00     0.00  idx_stack_push
  0.00      2.40     0.00  1048576     0.00     0.00  idx_stack_init
  0.00      2.40     0.00   113297     0.00     0.00  fresnel
  0.00      2.40     0.00    37595     0.00     0.00  idx_stack_pop
  0.00      2.40     0.00        3     0.00     0.00  append_rectangular
  0.00      2.40     0.00        3     0.00     0.00  append_sphere
  0.00      2.40     0.00        2     0.00     0.00  append_light
  0.00      2.40     0.00        1     0.00     0.00  calculateBasisVectors
  0.00      2.40     0.00        1     0.00     0.00  delete_light_list
  0.00      2.40     0.00        1     0.00     0.00  delete_rectangular_list
  0.00      2.40     0.00        1     0.00     0.00  diff_in_second
  0.00      2.40     0.00        1     0.00     0.00  write_to_ppm
```
Profiling with perf points to the same hotspots:

`$ ./raytracing & sudo perf top -p $!`
![](https://i.imgur.com/cJ90pqC.png)

### Optimizing with loop unrolling
Unroll the fixed-length `for` loops in the vector math functions. For example:
```c
double dp = 0.0;
for (int i = 0; i < 3; i++)
    dp += v1[i] * v2[i];
// becomes
dp = v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
```
Execution time ==drops by about 0.5 seconds==, from 2.34 s to 1.81 s:

`$ ./raytracing`
> Execution time of raytracing() : 1.814317 sec

* gprof analysis

With `-pg`, execution time ==drops by about 1.2 seconds==, from 5.21 s to 3.98 s:
> Execution time of raytracing() : 3.983447 sec

In the profile, ==self time drops clearly for these functions==:
- dot_product: 0.53 s to 0.21 s
- subtract_vector: 0.36 s to 0.18 s
- multiply_vector: 0.19 s to 0.05 s
- cross_product
- normalize

add_vector and multiply_vectors show no clear change at the 0.01 s sampling resolution.
```
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 17.81      0.21     0.21 69646433     0.00     0.00  dot_product
 14.84      0.39     0.18 13861875     0.00     0.00  rayRectangularIntersection
 14.84      0.56     0.18 56956357     0.00     0.00  subtract_vector
  6.78      0.64     0.08  4620625     0.00     0.00  ray_hit_object
  5.94      0.71     0.07 17836094     0.00     0.00  add_vector
  5.94      0.78     0.07 17821809     0.00     0.00  cross_product
  5.09      0.84     0.06  1048576     0.00     0.00  ray_color
  4.66      0.90     0.06  4221152     0.00     0.00  multiply_vectors
  4.24      0.95     0.05 31410180     0.00     0.00  multiply_vector
  4.24      1.00     0.05  2110576     0.00     0.00  compute_specular_diffuse
  3.39      1.04     0.04  1241598     0.00     0.00  refraction
  2.54      1.07     0.03  3838091     0.00     0.00  length
  2.54      1.10     0.03  2110576     0.00     0.00  localColor
  2.12      1.12     0.03 13861875     0.00     0.00  raySphereIntersection
  1.70      1.14     0.02 10598450     0.00     0.00  normalize
  0.85      1.15     0.01  2520791     0.00     0.00  idx_stack_top
  0.85      1.16     0.01  1048576     0.00     0.00  rayConstruction
  0.85      1.17     0.01   113297     0.00     0.00  fresnel
  0.85      1.18     0.01        1     0.01     1.18  raytracing
  0.00      1.18     0.00  2558386     0.00     0.00  idx_stack_empty
  0.00      1.18     0.00  1241598     0.00     0.00  protect_color_overflow
  0.00      1.18     0.00  1241598     0.00     0.00  reflection
  0.00      1.18     0.00  1204003     0.00     0.00  idx_stack_push
  0.00      1.18     0.00  1048576     0.00     0.00  idx_stack_init
  0.00      1.18     0.00    37595     0.00     0.00  idx_stack_pop
  0.00      1.18     0.00        3     0.00     0.00  append_rectangular
  0.00      1.18     0.00        3     0.00     0.00  append_sphere
  0.00      1.18     0.00        2     0.00     0.00  append_light
  0.00      1.18     0.00        1     0.00     0.00  calculateBasisVectors
  0.00      1.18     0.00        1     0.00     0.00  delete_light_list
  0.00      1.18     0.00        1     0.00     0.00  delete_rectangular_list
  0.00      1.18     0.00        1     0.00     0.00  delete_sphere_list
  0.00      1.18     0.00        1     0.00     0.00  diff_in_second
  0.00      1.18     0.00        1     0.00     0.00  write_to_ppm
```
### Optimizing with OpenMP
* Approach: add `#pragma omp parallel for` to the short vector loops, for example:
```c
#pragma omp parallel for
for (int i = 0; i < 3; i++)
    out[i] = a[i] + b[i];
```
* Execution time balloons to about 139 seconds, so parallelizing these loops makes performance much worse. Each loop runs only 3 iterations, but functions such as add_vector are called about 17.8 million times. Entering and leaving a parallel region on every call therefore costs far more in fork/join overhead than the tiny amount of work being split.
```shell
$ ./raytracing
# Rendering scene
Done!
Execution time of raytracing() : 139.242781 sec
```
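The measurement above suggests parallelizing at a much coarser grain instead. The following is a minimal sketch of that alternative, which is not something measured in this note. It puts one `#pragma omp parallel for` on the outer per-pixel loop, so the fork/join cost is paid once per frame rather than once per vector operation.

`render`, `shade_pixel` and the toy shading math are hypothetical stand-ins, not the repository's code. Compile with `gcc -fopenmp` to actually run the loop in parallel. Without that flag, the pragma is ignored and the loop runs serially with the same results.

```c
#include <assert.h>
#include <stddef.h>

/* Unrolled 3-element dot product, in the same style as the
 * loop-unrolling section above. */
static inline double dot3(const double a[3], const double b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Hypothetical stand-in for the per-pixel work (ray construction + shading).
 * It depends only on (x, y), so pixels are independent of each other.
 * Returns the squared cosine between the ray direction and a fixed light
 * axis, which stays in (0, 1] and avoids needing sqrt/libm. */
static double shade_pixel(int x, int y, int width, int height)
{
    const double dir[3] = { (x + 0.5) / width - 0.5,
                            (y + 0.5) / height - 0.5,
                            1.0 };
    const double light[3] = { 0.0, 0.0, 1.0 };
    double c = dot3(dir, light);
    return (c * c) / dot3(dir, dir);
}

/* Coarse-grained parallelism: one parallel region covers the whole frame,
 * instead of one region per 3-element vector operation. Each iteration
 * writes only its own row of pixels, so no locking is needed.
 * schedule(dynamic) balances rows whose cost differs. */
void render(double *pixels, int width, int height)
{
    assert(pixels != NULL && width > 0 && height > 0);

#pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            pixels[(size_t) y * width + x] = shade_pixel(x, y, width, height);
    }
}
```

Whether this carries over to the real `raytracing()` loop depends on its shared state. Anything written per pixel, such as the index stacks seen in the profile, would have to be private to each thread, and that would need to be checked before applying the pragma.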