# 2016q3 Homework1 (raytracing)
## 案例分析:raytracing
- 取得原始程式碼、編譯和測試
```shell
$ git clone https://github.com/sysprog21/raytracing
$ cd raytracing
$ make
$ ./raytracing
```
- 透過 convert 這個工具可將 PPM 轉為 PNG 或其他格式,如:
```shell
$ convert out.ppm out.png
```
- `make check` 會檢驗程式碼輸出的圖片是否符合預期,符合的話會得到 “Verified OK!” 字樣
- 使用 gprof 分析 raytracing 生成的數據
```shell
$ gprof ./raytracing | less
```
- gprof 分析中呼叫最多次的函式
```shell
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
24.89 0.49 0.49 69646433 0.00 0.00 dot_product
22.35 0.93 0.44 56956357 0.00 0.00 subtract_vector
9.65 1.12 0.19 17836094 0.00 0.00 add_vector
8.63 1.29 0.17 13861875 0.00 0.00 rayRectangularIntersection
7.11 1.43 0.14 13861875 0.00 0.00 raySphereIntersection
6.60 1.56 0.13 31410180 0.00 0.00 multiply_vector
5.59 1.67 0.11 17821809 0.00 0.00 cross_product
4.06 1.75 0.08 10598450 0.00 0.00 normalize
```
### 可能的效能改進方向
- POSIX Thread
- OpenMP
- software pipelining
- loop unrolling
### 原版
- 執行時間:4.759024 sec
### 優化版 - 展開 for 迴圈
- 修改呼叫最多次的 `dot_product()`
- 展開 for 迴圈以減少分支指令
```clike=
static inline
double dot_product(const double *v1, const double *v2)
{
double dp = 0.0;
dp = v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
return dp;
}
```
- 執行時間:4.458128 sec
### 優化版 - OpenMP
標頭檔:`#include <omp.h>`
編譯參數:`-fopenmp`
- 修改`raytracing()`
- 用 64 個 thread 來跑 for 迴圈
```clike=
#pragma omp parallel for num_threads(64)
for (int j = 0; j < height; j++) {
for (int i = 0; i < width; i++) {
double r = 0, g = 0, b = 0;
/* MSAA */
for (int s = 0; s < SAMPLES; s++) {
idx_stack_init(&stk);
rayConstruction(d, u, v, w,
i * factor + s / factor,
j * factor + s % factor,
view,
width * factor, height * factor);
if (ray_color(view->vrp, 0.0, d, &stk, rectangulars, spheres,
lights, object_color,
MAX_REFLECTION_BOUNCES)) {
r += object_color[0];
g += object_color[1];
b += object_color[2];
} else {
r += background_color[0];
g += background_color[1];
b += background_color[2];
}
pixels[((i + (j * width)) * 3) + 0] = r * 255 / SAMPLES;
pixels[((i + (j * width)) * 3) + 1] = g * 255 / SAMPLES;
pixels[((i + (j * width)) * 3) + 2] = b * 255 / SAMPLES;
}
}
}
```
- 執行時間:3.842497 sec
- 問題:加上 OpenMP 後,執行了很多次,但每次的執行時間都差很多,有時候比原版快,有時候卻比原本更慢