<style>
h2.part{color:#D92424;}
h3.part{color:#0099B0;}
h4.part{color:#005BB0;}
h5.part{color:#FD6F0A;}
h6.part{color:#4400B0;}
</style>
# 2016q3 Homework1 (raytracing)
contributed by <`f5120125`>
###### tags: `sysprog`
## Development environment
#### Ubuntu 14.04 LTS
- CPU: Intel® Core™ i5 CPU 650 @ 3.20GHz × 4
- Mem: 8 GiB
- Cache:
    - L1d cache: 32 KB
    - L1i cache: 32 KB
    - L2 cache: 256 KB
    - L3 cache: 4096 KB
## Learning goals
- **Use ==gprof== to find the hot spots of the raytracing program**
- **Analyze the program to find ways to optimize it**
### gprof
Reference: [[1]]
#### ►Result: ==6.245250== seconds
```
$ make PROFILE=1
$ ./raytracing
```
```
# Rendering scene
Done!
Execution time of raytracing() : 6.245250 sec
hua@hua-ubuntu:~/Desktop/sysprog-class/homework1/raytracing$
```
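The `Execution time of raytracing()` line is presumably a wall-clock measurement taken around the call. A minimal sketch of such a measurement, assuming `clock_gettime` is used; the helper names `diff_in_second` and `time_call` are illustrative and are not taken from the code shown in this note.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper: elapsed seconds between two timestamps. */
double diff_in_second(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical helper: time one call of `work` with a monotonic clock. */
double time_call(void (*work)(void))
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    work();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return diff_in_second(start, end);
}
```

Wall-clock time is the relevant figure once threads are involved, because summed CPU time does not shrink when work is spread over several cores.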
- Once gmon.out has been generated, run gprof to analyze it
```
$ gprof raytracing gmon.out | less
```
```
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 27.22       1.11     1.11 69646433     0.00     0.00  dot_product
 18.39       1.86     0.75 56956357     0.00     0.00  subtract_vector
 10.18       2.28     0.42 31410180     0.00     0.00  multiply_vector
  8.34       2.62     0.34 17836094     0.00     0.00  add_vector
  6.13       2.87     0.25 13861875     0.00     0.00  rayRectangularIntersection
  6.13       3.12     0.25 10598450     0.00     0.00  normalize
  6.01       3.36     0.25 17821809     0.00     0.00  cross_product
  4.41       3.54     0.18 13861875     0.00     0.00  raySphereIntersection
  3.19       3.67     0.13  4620625     0.00     0.00  ray_hit_object
  1.96       3.75     0.08  4221152     0.00     0.00  multiply_vectors
  1.23       3.80     0.05  3838091     0.00     0.00  length
  1.23       3.85     0.05  2110576     0.00     0.00  compute_specular_diffuse
  0.98       3.89     0.04  1048576     0.00     0.00  ray_color
  0.98       3.93     0.04        1     0.04     4.08  raytracing
  0.74       3.96     0.03  2520791     0.00     0.00  idx_stack_top
  0.74       3.99     0.03  2110576     0.00     0.00  localColor
  0.74       4.02     0.03  1241598     0.00     0.00  refraction
  0.61       4.05     0.03  1048576     0.00     0.00  rayConstruction
```
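#### ►Analysis
The bottleneck lies in the math routines such as ==**dot_product**== and ==**subtract_vector**==, so we can start by rewriting ==**```math-toolkit.h```**==
## Optimizing the raytracing program
### Techniques suggested by the assignment
- [x] ==**Loop unrolling**==
    - scheduling
    - pipeline
- [x] ==**Force inline**==
    - Forces the compiler to insert the function body at every call site in place of the call, which saves the overhead of each function call
- [x] ==**Pthread**==
- [x] ==**OpenMP**==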
### 1. [Loop unrolling](https://www.ptt.cc/bbs/C_and_CPP/M.1246071002.A.A54.html)
- #### Changes to dot_product:
```clike=
static inline
double dot_product(const double *v1, const double *v2)
{
    /* The three components are summed with the loop fully unrolled */
    double dp = 0.0;
    dp += v1[0] * v2[0];
    dp += v1[1] * v2[1];
    dp += v1[2] * v2[2];
    return dp;
}
```
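For comparison, here is a sketch of the rolled loop that this unrolling replaces, assuming the original iterated over the three components (the original body is not shown in this note). The same idea is then applied to a subtraction. The names `dot_product_loop` and `subtract_vector_unrolled`, and the `(a, b, out)` parameter order, are illustrative only.

```c
#include <stddef.h>

/* Rolled form: one counter update and one compare-and-branch per component. */
double dot_product_loop(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (size_t i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}

/* The same unrolling applied to a three-component subtraction: out = a - b. */
void subtract_vector_unrolled(const double *a, const double *b, double *out)
{
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}
```

Because the build uses `-O0`, the compiler does not unroll the loop itself, so removing the loop bookkeeping by hand has a visible effect.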
#### ►Result
##### 1.35 sec faster than the original
```
# Rendering scene
Done!
Execution time of raytracing() : 4.895239 sec
```
| Technique used | Execution time |
| ----------------- | --------------------- |
| *original* | **6.245250** sec |
| *Loop unrolling* | **4.895239** sec |
### 2. Force inline
#### ►Add `__forceinline`
```
static inline __forceinline
double dot_product(const double *v1, const double *v2)
{
.
.
.
}
```
#### ►Modify the Makefile
```
CFLAGS = \
-std=gnu99 -Wall -O0 -g \
-D__forceinline="__attribute__((always_inline))"
```
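A self-contained sketch of what the `-D__forceinline=...` flag expands to. The macro is defined in the source here only so the example compiles on its own, and `length_squared` is a hypothetical caller that is not part of the assignment code. The attribute syntax assumes GCC or Clang.

```c
/* Normally supplied by the Makefile through -D__forceinline=...;
 * defined here only so that this sketch builds standalone. */
#ifndef __forceinline
#define __forceinline __attribute__((always_inline))
#endif

static inline __forceinline
double dot_product(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}

/* Hypothetical caller: the body of dot_product is expanded here,
 * so no call instruction is emitted for it. */
double length_squared(const double *v)
{
    return dot_product(v, v);
}
```

With GCC, a plain `inline` function is generally not inlined when optimization is off, whereas `always_inline` is honored even at `-O0`. That is why the attribute matters with the `-O0` flags used in this Makefile.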
#### ►Result
##### 4.026964 sec faster than the original
```
# Rendering scene
Done!
Execution time of raytracing() : 2.218286 sec
```
| Technique used | Execution time |
| ----------------- | --------------------- |
| *original* | **6.245250** sec |
| *Loop unrolling* | **4.895239** sec |
| *Force inline* | **2.218286** sec |
### 3. Pthread
#### ►Preliminaries
##### Differences between program, process, and thread
- [notes](https://hackmd.io/GwBgLApgRhDMCMBaWUCGATRYDsYqIA54pZERt0IAzKk2AY2xCA==?both)
##### What is Pthread
- [POSIX Threads is the API defined by the POSIX.1c standard](https://computing.llnl.gov/tutorials/pthreads/)
- [Getting Started With POSIX Threads](http://www.csie.ntu.edu.tw/~r92094/c++/pthread.txt)
#### ►Thread Management
##### Creating and terminating threads
###### Routines
- **pthread_create (thread,attr,start_routine,arg)**
- [void pointer](http://quiz.geeksforgeeks.org/void-pointer-c/)
- [function pointer](http://www.cprogramming.com/tutorial/function-pointers.html)
- **pthread_exit (status)**
- **pthread_cancel (thread)**
- **pthread_attr_init (attr)**
- **pthread_attr_destroy (attr)**
##### Using pthread_create
- [pthread_create does not itself distinguish between kernel threads and user threads](http://stackoverflow.com/questions/26188401/create-a-user-level-thread-or-kernel-level-thread-using-pthread-create)
```clike=
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
void *(*start_routine) (void *), void *arg);
```
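###### ◆First argument
- Pointer to a `pthread_t` that receives the new thread's ==**unique thread ID**==
###### ◆Second argument
- Specifies the ==**thread's attributes**==; usually NULL is passed, which selects the default attributes
###### ◆Third argument
- **```void *```** is a ==**generic pointer**==: it can ==**hold the address of an object of any type**== and can be ==**cast to any object pointer type**==
- **```(*start_routine)```** is a ==**function pointer**==; the function it points to takes a single parameter of type **```void *```**
###### ◆Fourth argument
- The argument passed to start_routine; if start_routine needs several parameters, pass a pointer to a structure that holds them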
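The four arguments can be seen together in a small sketch. `range_arg`, `sum_range`, and `sum_on_thread` are hypothetical names chosen for illustration; only `pthread_create` and `pthread_join` come from the Pthreads API.

```c
#include <pthread.h>

/* Hypothetical argument bundle: two inputs plus a slot for the result. */
typedef struct {
    int first;
    int last;
    long sum;
} range_arg;

/* Start routine: takes one void * and returns void *. */
void *sum_range(void *arg)
{
    range_arg *r = arg;   /* void * converts back without a cast in C */

    r->sum = 0;
    for (int i = r->first; i <= r->last; i++)
        r->sum += i;
    return NULL;
}

/* Run sum_range on a new thread with default attributes and wait for it.
 * Returns 0 on success or the pthread error number. */
int sum_on_thread(int first, int last, long *out)
{
    pthread_t tid;                       /* 1st argument: receives the thread ID */
    range_arg arg = { first, last, 0 };  /* 4th argument: packed parameters */

    int err = pthread_create(&tid, NULL, sum_range, &arg);
    if (err != 0)
        return err;
    err = pthread_join(tid, NULL);
    if (err != 0)
        return err;
    *out = arg.sum;
    return 0;
}
```

`arg` lives on the caller's stack, which is safe here only because `sum_on_thread` joins the thread before returning.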
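##### Writing the argument structure passed to start_routine
- #### Looking at the function ```raytracing``` in ```raytracing.c```, I want to process the pixels in parallel
- So I store the pixel data in a ```struct``` that serves as the argument, and ```raytracing``` is rewritten as follows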
```clike=
void *raytracing( void* args )
```
```clike=
typedef struct __PTHREAD_ARGUMENT_STRUCTURE {
    uint8_t *pixels;
    double *background_color;
    rectangular_node rectangulars;
    sphere_node spheres;
    light_node lights;
    const viewpoint *view;
    int width;
    int height;
    int pthNum;
} PTH_ARGS;

typedef struct PTHREAD_NODE {
    PTH_ARGS *argPtr;
    int init_height;
} PTH_NODE;
```
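One way the `pthNum` and `init_height` fields could drive the work split is an interleaved row assignment, where worker `k` handles rows `k`, `k + pthNum`, `k + 2 * pthNum`, and so on. This is an assumption about the scheme, not the author's confirmed implementation. The sketch uses simplified stand-ins (`demo_args`, `demo_node`) without the scene fields, and writes a placeholder value where the real code would compute a color.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 64

/* Simplified stand-ins for PTH_ARGS / PTH_NODE; scene fields are omitted. */
typedef struct {
    uint8_t *pixels;   /* width * height bytes, one per pixel in this sketch */
    int width;
    int height;
    int pthNum;        /* number of worker threads */
} demo_args;

typedef struct {
    demo_args *argPtr;
    int init_height;   /* first row handled by this worker */
} demo_node;

/* Worker: handles rows init_height, init_height + pthNum, ... (assumed split). */
void *render_rows(void *p)
{
    demo_node *node = p;
    demo_args *a = node->argPtr;

    for (int j = node->init_height; j < a->height; j += a->pthNum)
        for (int i = 0; i < a->width; i++)
            a->pixels[j * a->width + i] = (uint8_t)(node->init_height + 1);
    return NULL;
}

/* Spawn pthNum workers and wait for all that started. Returns 0 on success. */
int render_parallel(demo_args *a)
{
    assert(a->pthNum > 0 && a->pthNum <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    demo_node node[MAX_THREADS];
    int started = 0, err = 0;

    for (; started < a->pthNum; started++) {
        node[started].argPtr = a;
        node[started].init_height = started;
        err = pthread_create(&tid[started], NULL, render_rows, &node[started]);
        if (err != 0)
            break;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return err;
}
```

Each worker writes a disjoint set of rows, so the shared `pixels` buffer needs no locking.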
- ### 2 threads in the pthread pool
```
# Rendering scene
Done!
Execution time of raytracing() : 1.149351 sec
```
- ### 4 threads in the pthread pool
```
# Rendering scene
Done!
Execution time of raytracing() : 0.956216 sec
```
- ### 8 threads in the pthread pool
```
# Rendering scene
Done!
Execution time of raytracing() : 0.979401 sec
```
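#### ►Result
- As the thread count keeps rising, performance no longer improves noticeably: going from 4 to 8 threads changed the time from 0.956216 sec to 0.979401 sec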
| Technique used | Execution time |
| ----------------- | --------------------- |
| *original* | **6.245250** sec |
| *Loop unrolling* | **4.895239** sec |
| *Force inline* | **2.218286** sec |
| *pthread (2 threads)*| **1.149351** sec |
| *pthread (4 threads)*| **0.956216** sec |
| *pthread (8 threads)*| **0.979401** sec |
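The plateau between 4 and 8 threads is consistent with the four logical CPUs listed under the development environment: once every CPU is busy, extra threads mostly add scheduling overhead. A sketch of capping the worker count at the CPU count, assuming Linux with glibc, where `sysconf(_SC_NPROCESSORS_ONLN)` is available (a common extension, not required by POSIX). `choose_thread_count` is a hypothetical helper.

```c
#include <unistd.h>

/* Pick a worker count: the requested value, capped at the number of
 * online logical CPUs. Falls back to 1 if the query fails. */
int choose_thread_count(int requested)
{
    long online = sysconf(_SC_NPROCESSORS_ONLN);

    if (online < 1)
        online = 1;
    if (requested < 1)
        requested = 1;
    return requested < online ? requested : (int)online;
}
```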
### 4. OpenMP
##### Include the header
==**```#include <omp.h>```**==
##### Modify the Makefile
```
CC ?= gcc
CFLAGS = \
-std=gnu99 -Wall -O0 -g -fopenmp
LDFLAGS = \
-lm -lgomp
```
##### Modify raytracing.c
==**```#pragma omp parallel for num_threads(64) private(stk), private(d), private(object_color)```**==
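The `private` clauses give each thread its own copy of variables declared outside the loop, so threads do not overwrite one another's scratch data. A minimal sketch of that pattern follows. `shade_rows` and its placeholder arithmetic are hypothetical and stand in for the real per-pixel work. No `omp.h` functions are called, so the code also builds without `-fopenmp`.

```c
/* Fill out[j * width + i] for every pixel, with rows split across threads.
 * Built without -fopenmp, the pragma is ignored and the loop runs serially. */
void shade_rows(double *out, int width, int height)
{
    double d[3];   /* scratch declared outside the loop: shared unless private */

    #pragma omp parallel for private(d)
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++) {   /* i is declared inside: private */
            d[0] = (double)i;
            d[1] = (double)j;
            d[2] = 1.0;
            out[j * width + i] = d[0] + d[1] + d[2];
        }
    }
}
```

A private copy starts uninitialized in each thread, so every element of `d` is written before it is read.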
#### ►Result
| Technique used | Execution time |
| -------------------- | ------------------ |
| *original* | **6.245250** sec |
| *Loop unrolling* | **4.895239** sec |
| *Force inline* | **2.218286** sec |
| *pthread (2 threads)*| **1.149351** sec |
| *pthread (4 threads)*| **0.956216** sec |
| *pthread (8 threads)*| **0.979401** sec |
| *OpenMP (8 threads)* | **0.956893** sec |
### To do
- [ ] Plot the results with gnuplot
## Reference
[[1]] http://os.51cto.com/art/200703/41426.htm
[[2]] https://computing.llnl.gov/tutorials/pthreads/#Thread
[[3]] https://computing.llnl.gov/tutorials/pthreads/man/pthread_create.txt
[[4]] http://www.cprogramming.com/tutorial/function-pointers.html
[1]:http://os.51cto.com/art/200703/41426.htm
[2]:https://computing.llnl.gov/tutorials/pthreads/#Thread
[3]:https://computing.llnl.gov/tutorials/pthreads/man/pthread_create.txt
[4]:http://www.cprogramming.com/tutorial/function-pointers.html