raytracing

contributed by <snoopy831002>

Reviewed by <`hugikun999`>

嘗試利用gprof分析函式之間的呼叫關係，找出主要的優化目標
利用graphviz可以做出較易判斷的圖表
可以將全部檔案都看過，還有些地方可以做force inline
多測試一些thread的數目，比較thread數對於效能的影響。
sample數只有4，可以將sample做loop unroll，不過會造成main主體膨脹不少。
可以嘗試利用pthread。

gprof

什麼是gprof?

gprof是一個在Linux系統上效能分析的工具。他可以量化顯示顯示每個函式的呼叫次數，每個與其消耗的CPU cycle等數據。

使用方法

make PROFILE=1

當我們在執行 make 時，需傳入 PROFILE=1 的參數。傳入參數後會在編譯選項後加上 -pg ，使 mcount 函數嵌入程式碼中。此函數在執行期間會記錄呼叫次數等相關資訊。而這些資訊會儲存於 gmon.out ，方便那發者做分析。

開始分析

make PROFILE=1
./ratracing

得到結果：Excution time of raytracing:6.036864sec
我們發現消耗效能的前1、2名都在math-toolkit.h內->也就是我們要拿來開刀的地方啦！

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

優化方法一：使用gcc預設加速：Ofast

使用方法為修改makefile編譯器最佳化選項 -O0 -> -Ofast
得到結果：Excution time of raytracing:5.315068sec
快了近0.7秒，然而本次作業著重手動效能分析，依然要嘗試其他方法

優化方法二：loop unrolling

使用方法為將原來的loop拆開
例如將：


for(int i=0;i<3;i++) out[i]=a[i]+b[i];

改為



out[0] = a[0] + b[0];
out[1] = a[1] + b[1];
out[2] = a[2] + b[2];

得到結果：
Excution time of raytracing:5.245374sec
比原直行結果快接近0.8秒，亦超越系統優化。

將 math-toolkit.h 中有用到 for-loop 的地方拆開後，可以減少因為loop所造成的branch instruction，進而提升效能。

優化方法二：loop unrolling＋強制inline

使用方法：將math-toolkit.h的static inline改為static inline attribute ((always_inline))

得到結果：
Excution time of raytracing:2.349816sec
比loop unrolling又快約2秒。

優化方法三：使用Openmp平行化

參考吳勃興的共筆
使用openmp平行化for迴圈































 #pragma omp parallel for num_threads(64) \
                private(stk), private(d), \       
                private(object_color)
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++) {
            double r = 0, g = 0, b = 0;
            /* MSAA */
            for (int s = 0; s < SAMPLES; s++) {
                idx_stack_init(&stk);
                rayConstruction(d, u, v, w,
                                i * factor + s / factor,
                                j * factor + s % factor,
                                view,
                                width * factor, height * factor);
                if (ray_color(view->vrp, 0.0, d, &stk, rectangulars, spheres,
                              lights, object_color,
                              MAX_REFLECTION_BOUNCES)) {
                    r += object_color[0];
                    g += object_color[1];
                    b += object_color[2];
                } else {
                    r += background_color[0];
                    g += background_color[1];
                    b += background_color[2];
                }
                pixels[((i + (j * width)) * 3) + 0] = r * 255 / SAMPLES;
                pixels[((i + (j * width)) * 3) + 1] = g * 255 / SAMPLES;
                pixels[((i + (j * )) * 3) + 2] = b * 255 / SAMPLES;
            }
        }
}

Excution time of raytracing : 1.284673sec
Openmp的使用可使我們的執行時間再提升1秒！！

在最內層回圈for(int s = 0; s < SAMPLES; s++)內的變數容易產生線程搶變數的問題造成race-condition。故必須將內部變數轉換為private(為了每個線程擁有獨立的變數)。而r,g,b此三變數不用設為private是因為他們每次進回圈之前都會初始化，每個線程有擁獨立的變數。

參考資料

吳勃興的共筆
 詹欣達的共筆
 Openmp中使用private的時機

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`	在筆記中貼入程式碼
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.

raytracing

Reviewed by <hugikun999>

gprof

什麼是gprof?

使用方法

開始分析

優化方法一：使用gcc預設加速：Ofast

優化方法二：loop unrolling

優化方法二：loop unrolling＋強制inline

優化方法三：使用Openmp平行化

參考資料

Reviewed by <`hugikun999`>