# 2016q3 Homework1 (raytracing) ## 案例分析:raytracing - 取得原始程式碼、編譯和測試 ```shell $ git clone https://github.com/sysprog21/raytracing $ cd raytracing $ make $ ./raytracing ``` - 透過 convert 這個工具可將 PPM 轉為 PNG 或其他格式,如: ```shell $ convert out.ppm out.png ``` - `make check` 會檢驗程式碼輸出的圖片是否符合預期,符合的話會得到 “Verified OK!” 字樣 - 使用 gprof 分析 raytracing 生成的數據 ```shell $ gprof ./raytracing | less ``` - gprof 分析中呼叫最多次的函式 ```shell Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 24.89 0.49 0.49 69646433 0.00 0.00 dot_product 22.35 0.93 0.44 56956357 0.00 0.00 subtract_vector 9.65 1.12 0.19 17836094 0.00 0.00 add_vector 8.63 1.29 0.17 13861875 0.00 0.00 rayRectangularIntersection 7.11 1.43 0.14 13861875 0.00 0.00 raySphereIntersection 6.60 1.56 0.13 31410180 0.00 0.00 multiply_vector 5.59 1.67 0.11 17821809 0.00 0.00 cross_product 4.06 1.75 0.08 10598450 0.00 0.00 normalize ``` ### 可能的效能改進方向 - POSIX Thread - OpenMP - software pipelining - loop unrolling ### 原版 - 執行時間:4.759024 sec ### 優化版 - 展開 for 迴圈 - 修改呼叫最多次的 `dot_product()` - 展開 for 迴圈以減少分支指令 ```clike= static inline double dot_product(const double *v1, const double *v2) { double dp = 0.0; dp = v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2]; return dp; } ``` - 執行時間:4.458128 sec ### 優化版 - OpenMP 標頭檔:`#include <omp.h>` 編譯參數:`-fopenmp` - 修改`raytracing()` - 用 64 個 thread 來跑 for 迴圈 ```clike= #pragma omp parallel for num_threads(64) for (int j = 0; j < height; j++) { for (int i = 0; i < width; i++) { double r = 0, g = 0, b = 0; /* MSAA */ for (int s = 0; s < SAMPLES; s++) { idx_stack_init(&stk); rayConstruction(d, u, v, w, i * factor + s / factor, j * factor + s % factor, view, width * factor, height * factor); if (ray_color(view->vrp, 0.0, d, &stk, rectangulars, spheres, lights, object_color, MAX_REFLECTION_BOUNCES)) { r += object_color[0]; g += object_color[1]; b += object_color[2]; } else { r += background_color[0]; g += background_color[1]; b += background_color[2]; } pixels[((i + (j * width)) * 3) + 0] = r * 255 / SAMPLES; pixels[((i + (j * width)) * 3) + 1] = g * 255 / SAMPLES; pixels[((i + (j * width)) * 3) + 2] = b * 255 / SAMPLES; } } } ``` - 執行時間:3.842497 sec - 問題:加上 OpenMP 後,執行了很多次,但每次的執行時間都差很多,有時候比原版快,有時候卻比原本更慢