# Ray tracing optimization

## Development Environment
* OS : 14.04.1-Ubuntu
* kernel version : 3.19.0-25-generic
* model name : Intel(R) Core(TM)2 Duo CPU P8400 @ 2.26GHz
* L1d cache: 32K
* L1i cache: 32K
* L2 cache: 3072K
* memory
    * description: System Memory
    * physical id: 20
    * slot: System board or motherboard
    * size: 8GiB
* bank:0
    * description: SODIMM DDR3 Synchronous 1066 MHz (0.9 ns)
    * product: HMT351S6EFR8C-PB
    * vendor: Hynix Semiconductor (Hyundai Electronics)
    * size: 4GiB
    * width: 64 bits
    * clock: 1066MHz (0.9ns)
* bank:1
    * description: SODIMM DDR3 Synchronous 1066 MHz (0.9 ns)
    * product: HMT351S6EFR8C-PB
    * vendor: Hynix Semiconductor (Hyundai Electronics)
    * size: 4GiB
    * width: 64 bits
    * clock: 1066MHz (0.9ns)

## Original
* Compiler optimization off, without gprof: running ./raytracing 5 times took 6.152786 sec on average.
* Compiler optimization off, with gprof: running ./raytracing 5 times took 9.835906 sec on average.
* Summary: building for gprof adds extra code to every function, so the run time gets longer.

Install [gprof2dot](https://github.com/jrfonseca/gprof2dot)
```bash
sudo apt-get install python graphviz
sudo apt-get install python-pip
sudo pip install gprof2dot
sudo chmod 777 /usr/local/lib/python2.7/dist-packages/gprof2dot.py
```

Modify the [Makefile](https://embedded2016.hackpad.com/2016q1-Homework-2A-GalzL151aZc)
```makefile
GPROF2DOT = \
	/usr/local/lib/python2.7/dist-packages/gprof2dot.py

# Rebuild with profiling enabled, run once, then render the call graph.
plot:
	$(MAKE) clean
	$(MAKE) PROFILE=1
	./raytracing
	gprof ./$(EXEC) | $(GPROF2DOT) | dot -Tpng -o $@.png;
```

[Draw the call graph with graphviz + gprof](https://embedded2016.hackpad.com/2016q1-Homework-2A-GalzL151aZc)
```bash
make plot
eog plot.png
```
![](https://i.imgur.com/IRM6u1c.png)

According to the profile, dot_product accounts for 23.9% of the total time, so dot_product is the first function to optimize.
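The timings above are averages over 5 runs. As a minimal sketch (not the measurement code used for these notes), the C helpers below show one way such run times could be averaged, with an optional filter that drops samples far from the mean. The function names and the 1.96 cutoff (the range that would cover roughly 95% of normally distributed samples) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timing samples (hypothetical helper). */
static double timing_mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double) n;
}

/* Sample variance with the n - 1 denominator; 0 when n < 2. */
static double timing_variance(const double *x, size_t n, double m)
{
    if (n < 2)
        return 0.0;
    double ss = 0.0;
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double) (n - 1);
}

/*
 * Mean of the samples lying within m +/- 1.96 sample standard deviations.
 * Squared distances are compared, so sqrt() and -lm are not needed.
 */
static double timing_filtered_mean(const double *x, size_t n)
{
    const double m = timing_mean(x, n);
    const double limit = 1.96 * 1.96 * timing_variance(x, n, m);
    double sum = 0.0;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        const double d = x[i] - m;
        if (d * d <= limit) {
            sum += x[i];
            kept++;
        }
    }
    /* Defensive fallback: use the plain mean if nothing was kept. */
    return kept ? sum / (double) kept : m;
}
```

With only 5 samples, no value can lie more than (n - 1) / sqrt(n), about 1.79, sample standard deviations from the mean, so this filter never removes anything at n = 5. It only becomes useful when the program is run many more times.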
## Loop Unrolling (optimization 1)
Unroll the for loop so that the three multiplications are written out explicitly.
```c
static inline
double dot_product(const double *v1, const double *v2)
{
#ifdef DOT_PRODUCT_LOOP_UNROLLING_OPTIMIZATION
    /* Unrolled: no loop counter, no branch. */
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
#else
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
#endif
}
```
```bash
CFLAGS = \
    -std=gnu99 -Wall -O0 -g -DDOT_PRODUCT_LOOP_UNROLLING_OPTIMIZATION
```
![](https://i.imgur.com/kLFLLDQ.png)

Before the optimization the run took 9.835906 sec (with gprof). After it, the average of 5 runs is 8.928204 sec.

Summary: 0.907702 sec faster (about 9.2%). Optimizing the function that takes the most time pays off: changing only a few lines of code gives a noticeable speedup.

## Removing the performance impact of assert (optimization 2)
From [a classmate's shared notes](https://embedded2016.hackpad.com/ep/pad/static/f5CCUGMQ4Kp) I learned that defining NDEBUG removes the performance cost of assert, so I traced the code to confirm it.

* step 1: The Makefile contains the following line:
```
CC ?= gcc
```
* step 2: Run gcc -v
```bash
xxx@xxx-ThinkPad-X200:~/Jserv/2016_fall_school_course/raytracing$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.4-2ubuntu1~14.04.3' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
```
* step 3: The output shows --prefix=/usr, so search under /usr for assert.h
```bash
xxx@xxx-ThinkPad-X200:~/Jserv/2016_fall_school_course/raytracing$ cd /usr
xxx@xxx-ThinkPad-X200:/usr$ find -name "assert.h" | nl
     1	./include/assert.h
     2	./lib/syslinux/com32/include/assert.h
```
* step 4: Open ./include/assert.h
```bash
xxx@xxx-ThinkPad-X200:/usr$ vim `find -name "assert.h" | sed -n 1P`
```
* step 5: The header contains the following code, which shows that assert does nothing once NDEBUG is defined:
```c
/* void assert (int expression);

   If NDEBUG is defined, do nothing.
   If not, and EXPRESSION is zero, print an error message and abort.  */

#ifdef NDEBUG

# define assert(expr) (__ASSERT_VOID_CAST (0))
```
* Add the NDEBUG definition to the Makefile and run the program:
```
CFLAGS = \
    -std=gnu99 -Wall -O0 -g -DNDEBUG
```
The average of 5 runs is 8.636865 s.
![](https://i.imgur.com/32QpaKV.png)
* Summary: the previous optimization ran in 8.928204 s. With NDEBUG defined the run takes 8.636865 s, a small improvement of 0.291339 s.

>Why does defining NDEBUG sometimes increase the run time instead? [name=王佑誌][time=Sun, Oct 9, 2016 8:44 PM]

>Is there a way to keep the measured run time from varying so much? [name=王佑誌]

>Run the program a large number of times and use a confidence interval to discard the extreme values, from Jserv

References
[shelly4132: shared notes](https://hackmd.io/s/HJrLU0dp)

## Video notes
Do not use a virtual machine.

Install the tools:
```bash
$ sudo apt-get update
$ sudo apt-get install graphviz
$ sudo apt-get install imagemagick
```
* graphviz: draws diagrams and relationship graphs
* imagemagick (convert): tools for converting between image formats
* cloc: counts the lines of source code, comments, and code
* eog, gimp: view images
* gprof: shows how much time each function takes

make PROFILE=1 => adds -pg at compile time.
The GNU compiler option -pg adds instrumentation code to each function so that per-function time and the call graph can be collected while the program runs, which reveals the program's hot spots.

* `make clean`
* `make PROFILE=1`
* `./raytracing` => the instrumented run takes longer
* `gprof ./raytracing | less`

With -Ofast the run time drops from about 6 seconds to 0.8.

Statistics is worth learning: it raises the level at which you can reason about measurements.

static takes effect on definitions, not on declarations.

inline: expands the code at the call site, which reduces function call overhead. inline is only a hint to the compiler, and the compiler expands the function only when optimization is enabled. forceinline can be used to force the expansion, but some special cases still cannot be expanded.

SIMD has its own registers, and loads and stores cost time. Switching in and out of SIMD code frequently costs even more.

Calling printf or printk has a cost: you may want to measure a piece of code worth 10 units of work, yet the printk you inserted may cost as much as 10000 units.

GDB: tests can be run without modifying the C code. There are two reasons to do this:
1. Without the source code, the only option is to write code against the API header. If the header documentation is wrong, for example a format that does not match, GDB shows it on the spot.
2. Many variations can be tested. Taking a linked list as an example, a script can build 4 nodes, 100 nodes, 199999 nodes, and so on.

GDB can also write memory contents to a file for comparison with another machine, and more importantly it works across platforms.
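The note on inline above can be illustrated with a short sketch. It assumes a GCC-compatible compiler, where forced inlining is spelled `__attribute__((always_inline))` (the notes say "forceinline"; MSVC's keyword is `__forceinline`). The names `dot_product_forced` and `length_squared` are hypothetical and do not come from the raytracing sources.

```c
#include <assert.h>

/*
 * At -O0 GCC normally leaves a plain `inline` function as a real call.
 * The always_inline attribute (GCC/Clang specific, an assumption about the
 * toolchain) asks for expansion even when optimization is disabled.
 */
static inline __attribute__((always_inline))
double dot_product_forced(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}

/* Squared length of a 3-vector, built on the forced-inline dot product. */
static double length_squared(const double *v)
{
    double r = dot_product_forced(v, v);
    assert(r >= 0.0); /* this check is compiled out when NDEBUG is defined */
    return r;
}
```

Whether this helps is something to measure rather than assume: forced inlining removes the call overhead at -O0, but it can also enlarge the code, so a gprof run before and after is the safer way to judge it.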