2016q3 Homework1 (raytracing)

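# 2016q3 Homework1 (raytracing)
contributed by <`linachiu`>

### Reviewed by `shelly4132`
- Consider trying further optimization approaches such as OpenMP and POSIX Threads.
- **gnuplot** could be used to plot a performance comparison chart.

## Installing the required tools
- graphviz: draws diagrams
- imagemagick: converts image formats

```shell
$ sudo apt-get update
$ sudo apt-get install graphviz
$ sudo apt-get install imagemagick
```

----

### Goals
- Learn to use performance analysis tools
- Optimize the program

___

### First, follow the instructor's steps and see what we get
```
$ make
$ ./raytracing
```
Result
```
# Rendering scene
Done!
Execution time of raytracing() : 3.209233 sec
```
The program also renders an image with lighting and shadows:
![](https://i.imgur.com/1h4cNlz.jpg)

Use the following command to convert it to a PNG file:
```
$ convert out.ppm out.png
```
____
## Performance analysis tool: gprof
- A GNU tool
- Usage
    - Compile with the `-pg` flag; the compiler then inserts a call to the mcount function into every function
    - Run the program to generate gmon.out: `$ ./raytracing`
    - `$ gprof -b raytracing gmon.out | less` displays the profile recorded in gmon.out
    - `$ gprof ./raytracing | less` shows the share of time taken by each function
![](https://i.imgur.com/df3hvMM.jpg)
- gprof vs. perf
    **perf top** works by taking a sample at regular intervals and building its report from those samples, so the sampling frequency can change the results.

---
## Building the program with gprof (unoptimized)
First run `$ make clean`, then rebuild with `$ make PROFILE=1` (enables gprof).
- Run `$ ./raytracing` (the profile can then be viewed with `$ gprof -b raytracing gmon.out | less`). The program prints:
```
# Rendering scene
Done!
Execution time of raytracing() : 7.402606 sec
```
This is much longer than the 3.209233 sec measured earlier. Why? Because the compiler inserts a call to mcount into every function, and that call records profiling information at run time. The information is written to gmon.out, and gprof is then called to produce the tables from it.
- Call gprof to produce the tables
```
$ gprof ./raytracing | less
```
The output is long, so only the first entries are listed.
```shell=
Flat profile:

Each sample counts as 0.01 seconds.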
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 20.83      0.61     0.61 69646433     0.00     0.00  dot_product
 19.12      1.17     0.56 56956357     0.00     0.00  subtract_vector
  8.54      1.42     0.25 10598450     0.00     0.00  normalize
  8.37      1.67     0.25 31410180     0.00     0.00  multiply_vector
  8.03      1.90     0.24 13861875     0.00     0.00  rayRectangularIntersection
  8.03      2.14     0.24 13861875     0.00     0.00  raySphereIntersection
  6.83      2.34     0.20  4620625     0.00     0.00  ray_hit_object
  3.42      2.44     0.10 17821809     0.00     0.00  cross_product
  3.24      2.53     0.10  4221152     0.00     0.00  multiply_vectors
  2.90      2.62     0.09 17836094     0.00     0.00  add_vector
  2.05      2.68     0.06  1048576     0.00     0.00  ray_color
  1.71      2.73     0.05  2110576     0.00     0.00  compute_specular_diffuse
  1.71      2.78     0.05  2110576     0.00     0.00  localColor
  1.54      2.82     0.05  3838091     0.00     0.00  length
```
- dot_product accounts for the largest share of execution time, so let's see what dot_product looks like.

![](https://i.imgur.com/8CCl6Bh.jpg)
>> Avoid presenting source code as an image; please fix this. [name=jserv]

It looks like a perfectly ordinary for loop, but every iteration involves a branch. We can optimize it by reducing the number of instructions executed.

____
## Optimization 1: Loop unrolling
A loop like the one in dot_product, which can be expanded easily without hurting readability, is a good candidate for loop unrolling.

![](https://i.imgur.com/RFjPPub.jpg)

```
# Rendering scene
Done!
Execution time of raytracing() : 6.884073 sec
```
After loop unrolling, the execution time drops from 7.402606 sec to 6.884073 sec.

Here is the execution time of each function:
```shell=
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 20.53      0.55     0.55 56956357     0.00     0.00  subtract_vector
 13.44      0.91     0.36 13861875     0.00     0.00  rayRectangularIntersection
 11.57      1.22     0.31 69646433     0.00     0.00  dot_product
 10.83      1.51     0.29 10598450     0.00     0.00  normalize
  8.96      1.75     0.24 31410180     0.00     0.00  multiply_vector
  8.03      1.97     0.22 17836094     0.00     0.00  add_vector
  7.09      2.16     0.19 13861875     0.00     0.00  raySphereIntersection
  6.35      2.33     0.17 17821809     0.00     0.00  cross_product
  2.99      2.41     0.08  1048576     0.00     0.00  ray_color
  2.61      2.48     0.07  4620625     0.00     0.00  ray_hit_object
  1.87      2.53     0.05  2110576     0.00     0.00  compute_specular_diffuse
  1.49      2.57     0.04  4221152     0.00     0.00  multiply_vectors
  1.12      2.60     0.03  1048576     0.00     0.00  rayConstruction
  1.12      2.63     0.03        1     0.03     2.68  raytracing
```
- dot_product's share of the run time drops sharply, from about 21% to about 12%
- Loop unrolling reduces the number of branches executed
    - The number of arithmetic operations does not decrease; what is saved is the jump (`jmp`) generated for each iteration of the for loop

____
## Optimization 2: Forced inlining
In the table from before optimization, the functions from math-toolkit.h occupy almost all of the top spots. Here we force inlining by changing `static inline` to `__attribute__((always_inline))`. Because we build without compiler optimization, a plain `inline` hint is not honored, so inlining has to be forced. Compiling this way produces many warnings.
```
# Rendering scene
Done!
Execution time of raytracing() : 5.294769 sec
```
The run time drops by roughly another 1.6 seconds, from 6.884073 sec to 5.294769 sec.
```shell=
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 35.19      0.99     0.99 13861875     0.00     0.00  rayRectangularIntersection
 12.68      1.34     0.36 13861875     0.00     0.00  raySphereIntersection
  9.29      1.60     0.26 31410180     0.00     0.00  multiply_vector
  8.22      1.83     0.23  2110576     0.00     0.00  compute_specular_diffuse
  7.15      2.03     0.20 17821809     0.00     0.00  cross_product
  6.61      2.22     0.19 17836094     0.00     0.00  add_vector
  6.43      2.40     0.18  4620625     0.00     0.00  ray_hit_object
  3.57      2.50     0.10  1048576     0.00     0.00  ray_color
  2.14      2.56     0.06        1     0.06     2.78  raytracing
  1.79      2.61     0.05  4221152     0.00     0.00  multiply_vectors
  1.43      2.65     0.04  2110576     0.00     0.00  localColor
  1.07      2.68     0.03  1241598     0.00     0.00  refraction
  0.71      2.70     0.02  1241598     0.00     0.00  protect_color_overflow
  0.71      2.72     0.02  1048576     0.00     0.00  idx_stack_init
  0.71      2.74     0.02  1048576     0.00     0.00  rayConstruction
  0.71      2.76     0.02                             subtract_vector
```
The functions that originally held the top three spots no longer appear near the top of the list.

## References
- [Introduction to gprof](https://sourceware.org/binutils/docs/gprof/)
- [勃興's notes](https://hackmd.io/s/BygM4qP6)
- Teaching assistant
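
----

The two optimizations described above (loop unrolling and forced inlining) appear only as screenshots or prose, so here is a minimal C sketch of what they might look like for a 3-element dot product. The function names `dot_product_loop`, `dot_product_unrolled`, and `print_demo` are hypothetical, the loop body is an assumption about the original routine rather than a copy of it, and keeping the `inline` keyword next to the attribute is a choice made in this sketch, not necessarily what the homework code does.

```c
#include <assert.h>
#include <stdio.h>

/* Loop version: an ASSUMED shape of the original routine.
 * Each iteration pays for a compare and a conditional jump. */
static inline double dot_product_loop(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}

/* Unrolled version: same multiplications and additions, no loop branch.
 * always_inline (GCC/Clang attribute) asks the compiler to inline the
 * call even when optimization is disabled. */
static inline __attribute__((always_inline))
double dot_product_unrolled(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}

/* Hypothetical helper: prints both results and checks that they agree. */
static void print_demo(void)
{
    const double a[3] = {1.0, 2.0, 3.0};
    const double b[3] = {4.0, 5.0, 6.0};
    double looped = dot_product_loop(a, b);
    double unrolled = dot_product_unrolled(a, b);

    assert(looped == unrolled);
    printf("loop: %f, unrolled: %f\n", looped, unrolled);
}
```

Calling `print_demo()` from a `main` function is enough to try it out. Both versions perform the same arithmetic, so any timing difference comes from the removed loop control and the removed call overhead.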