<style>
h2.part{color:#D92424;}
h3.part{color:#0099B0;}
h4.part{color:#005BB0;}
h5.part{color:#FD6F0A;}
h6.part{color:#4400B0;}
</style>

# 2016q3 Homework1 (raytracing)
contributed by <`f5120125`>

###### tags: `sysprog`

## Development environment
#### Ubuntu 14.04 LTS
- CPU: Intel® Core™ i5 CPU 650 @ 3.20GHz × 4
- Mem: 8 GiB
- Cache:
    - L1d cache: 32 KB
    - L1i cache: 32 KB
    - L2 cache: 256 KB
    - L3 cache: 4096 KB

## Learning goals
- **Use ==gprof== to find the hotspots of the raytracing program**
- **Analyze the program to find ways to optimize it**

### gprof
Reference: [[1]]

#### ►Result: ==6.245250== seconds
```
$ make PROFILE=1
$ ./raytracing
```
```
# Rendering scene
Done!
Execution time of raytracing() : 6.245250 sec
hua@hua-ubuntu:~/Desktop/sysprog-class/homework1/raytracing$
```
- Once gmon.out has been generated, run gprof to analyze it
```
$ gprof raytracing gmon.out | less
```
```
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls   s/call   s/call  name
 27.22      1.11     1.11  69646433     0.00     0.00  dot_product
 18.39      1.86     0.75  56956357     0.00     0.00  subtract_vector
 10.18      2.28     0.42  31410180     0.00     0.00  multiply_vector
  8.34      2.62     0.34  17836094     0.00     0.00  add_vector
  6.13      2.87     0.25  13861875     0.00     0.00  rayRectangularIntersection
  6.13      3.12     0.25  10598450     0.00     0.00  normalize
  6.01      3.36     0.25  17821809     0.00     0.00  cross_product
  4.41      3.54     0.18  13861875     0.00     0.00  raySphereIntersection
  3.19      3.67     0.13   4620625     0.00     0.00  ray_hit_object
  1.96      3.75     0.08   4221152     0.00     0.00  multiply_vectors
  1.23      3.80     0.05   3838091     0.00     0.00  length
  1.23      3.85     0.05   2110576     0.00     0.00  compute_specular_diffuse
  0.98      3.89     0.04   1048576     0.00     0.00  ray_color
  0.98      3.93     0.04         1     0.04     4.08  raytracing
  0.74      3.96     0.03   2520791     0.00     0.00  idx_stack_top
  0.74      3.99     0.03   2110576     0.00     0.00  localColor
  0.74      4.02     0.03   1241598     0.00     0.00  refraction
  0.61      4.05     0.03   1048576     0.00     0.00  rayConstruction
```

#### ►Analysis
The bottleneck is in the vector math routines such as ==**dot_product**== and ==**subtract_vector**==, which are called tens of millions of times, so we can start by rewriting ==**```math-toolkit.h```**==

## Optimizing the raytracing program
### Techniques suggested by the assignment
- [x] ==**Loop unrolling**==
    - scheduling
    - pipeline
- [x] ==**Force inline**==
    - Forces the compiler to insert the function body at every place the function is called, which saves the overhead of each function call
- [x] ==**Pthread**==
- [x] ==**OpenMP**==

### 1. [Loop unrolling](https://www.ptt.cc/bbs/C_and_CPP/M.1246071002.A.A54.html)
- #### Change in dot_product:
```clike=
static inline
double dot_product(const double *v1, const double *v2)
{
    double dp = 0.0;
    dp += v1[0] * v2[0];
    dp += v1[1] * v2[1];
    dp += v1[2] * v2[2];
    return dp;
}
```
#### ►Result
##### 1.35 sec faster than the original
```
# Rendering scene
Done!
Execution time of raytracing() : 4.895239 sec
```

| Technique used    | Execution time        |
| ----------------- | --------------------- |
| *original*        | **6.245250** sec      |
| *Loop unrolling*  | **4.895239** sec      |

### 2. Force inline
#### ►Add __forceinline
```
static inline __forceinline
double dot_product(const double *v1, const double *v2)
{
    . . .
}
```
#### ►Modify the Makefile
Because the build uses `-O0`, GCC does not inline functions that are only marked `inline`; the `always_inline` attribute makes it inline them even without optimization.
```
CFLAGS = \
    -std=gnu99 -Wall -O0 -g \
    -D__forceinline="__attribute__((always_inline))"
```
#### ►Result
##### 4.026964 sec faster than the original
```
# Rendering scene
Done!
Execution time of raytracing() : 2.218286 sec
```

| Technique used    | Execution time        |
| ----------------- | --------------------- |
| *original*        | **6.245250** sec      |
| *Loop unrolling*  | **4.895239** sec      |
| *Force inline*    | **2.218286** sec      |

### 3. Pthread
#### ►Preliminaries
##### Differences between program, process, and thread
- [notes](https://hackmd.io/GwBgLApgRhDMCMBaWUCGATRYDsYqIA54pZERt0IAzKk2AY2xCA==?both)

##### What is Pthread
- [POSIX Threads is the API defined by the POSIX.1c standard](https://computing.llnl.gov/tutorials/pthreads/)
- [Getting Started With POSIX Threads](http://www.csie.ntu.edu.tw/~r92094/c++/pthread.txt)

#### ►Thread Management
##### Creating and terminating threads
###### Routines
- **pthread_create (thread,attr,start_routine,arg)**
    - [void pointer](http://quiz.geeksforgeeks.org/void-pointer-c/)
    - [function pointer](http://www.cprogramming.com/tutorial/function-pointers.html)
- **pthread_exit (status)**
- **pthread_cancel (thread)**
- **pthread_attr_init (attr)**
- **pthread_attr_destroy (attr)**

##### Using pthread_create
- [Threads created by pthread_create are not distinguished as kernel threads or user threads](http://stackoverflow.com/questions/26188401/create-a-user-level-thread-or-kernel-level-thread-using-pthread-create)
```clike=
int pthread_create(pthread_t *thread,
                   const pthread_attr_t *attr,
                   void *(*start_routine) (void *),
                   void *arg);
```
###### ◆First parameter
- Pointer to a `pthread_t` that receives the ==**unique thread ID**== of the new thread

###### ◆Second parameter
- Specifies the ==**attributes of the thread**==; usually NULL is passed, which means the default attributes

###### ◆Third parameter
- **```void *```** is a ==**generic pointer**==: it can ==**hold the address of an object of any type**== and can be ==**cast back to a pointer of that type**==
- **```(*start_routine)```** is a ==**function pointer**==; the function it points to takes a single **```void *```** parameter and returns a **```void *```**

###### ◆Fourth parameter
- The argument passed to start_routine; if start_routine needs several values, pass a pointer to a structure that holds them

##### Writing the argument structure passed to start_routine
- Looking at the function ```raytracing``` in ```raytracing.c```, I want to process the pixels in parallel
- For the arguments I therefore use a ```struct``` to store the ```pixel``` data, and ```raytracing``` is rewritten with the following signature
```clike=
void *raytracing( void* args )
```
```clike=
typedef struct __PTHREAD_ARGUMENT_STRUCTURE{
    uint8_t *pixels;
    double *background_color;
    rectangular_node rectangulars;
    sphere_node spheres;
    light_node lights;
    const viewpoint *view;
    int width;
    int height;
    int pthNum;
}PTH_ARGS;

typedef struct PTHREAD_NODE{
    PTH_ARGS* argPtr;
    int init_height;
}PTH_NODE;
```
- ### 2 threads in the pthread pool
```
# Rendering scene
Done!
Execution time of raytracing() : 1.149351 sec
```
- ### 4 threads in the pthread pool
```
# Rendering scene
Done!
Execution time of raytracing() : 0.956216 sec
```
- ### 8 threads in the pthread pool
```
# Rendering scene
Done!
Execution time of raytracing() : 0.979401 sec
```
#### ►Result
- As the number of threads keeps increasing, performance stops improving noticeably: going from 4 to 8 threads does not help (0.956216 sec vs. 0.979401 sec), which is consistent with the CPU listed above exposing only 4 hardware threads

| Technique used       | Execution time     |
| -------------------- | ------------------ |
| *original*           | **6.245250** sec   |
| *Loop unrolling*     | **4.895239** sec   |
| *Force inline*       | **2.218286** sec   |
| *pthread (2 threads)*| **1.149351** sec   |
| *pthread (4 threads)*| **0.956216** sec   |
| *pthread (8 threads)*| **0.979401** sec   |

### 4. OpenMP
##### Include the header
==**```#include <omp.h>```**==
##### Modify the Makefile
```
CC ?= gcc
CFLAGS = \
    -std=gnu99 -Wall -O0 -g -fopenmp
LDFLAGS = \
    -lm -lgomp
```
##### Modify raytracing.c
==**```#pragma omp parallel for num_threads(64) private(stk), private(d), private(object_color)```**==
#### ►Result

| Technique used       | Execution time     |
| -------------------- | ------------------ |
| *original*           | **6.245250** sec   |
| *Loop unrolling*     | **4.895239** sec   |
| *Force inline*       | **2.218286** sec   |
| *pthread (2 threads)*| **1.149351** sec   |
| *pthread (4 threads)*| **0.956216** sec   |
| *pthread (8 threads)*| **0.979401** sec   |
| *OpenMP (8 threads)* | **0.956893** sec   |

### To do
- [ ] Plot the results with gnuplot

## Reference
- [[1]] http://os.51cto.com/art/200703/41426.htm
- [[2]] https://computing.llnl.gov/tutorials/pthreads/#Thread
- [[3]] https://computing.llnl.gov/tutorials/pthreads/man/pthread_create.txt
- [[4]] http://www.cprogramming.com/tutorial/function-pointers.html

[1]:http://os.51cto.com/art/200703/41426.htm
[2]:https://computing.llnl.gov/tutorials/pthreads/#Thread
[3]:https://computing.llnl.gov/tutorials/pthreads/man/pthread_create.txt
[4]:http://www.cprogramming.com/tutorial/function-pointers.html
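
## Appendix: sketch of a row-partitioned pthread renderer
The Pthread section describes packing the pixel data into a struct and handing it to `start_routine`, but the worker loop itself is not shown. The following is a minimal sketch of one way to do it, not the code used for the measurements above. The names `row_job`, `shade_pixel`, `render_rows`, `render_parallel` and `MAX_THREADS` are hypothetical, `shade_pixel` is only a stand-in for the real per-pixel ray tracing, and interleaving rows across threads is an assumption about how the work might be split.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 64  /* hypothetical upper bound for this sketch */

/* Hypothetical per-thread argument (not the PTH_ARGS/PTH_NODE shown earlier). */
typedef struct {
    uint8_t *pixels;  /* shared output buffer: width * height * 3 bytes */
    int width;
    int height;
    int first_row;    /* first row rendered by this thread */
    int row_step;     /* number of threads; rows are interleaved */
} row_job;

/* Stand-in for the real per-pixel ray tracing work. */
static void shade_pixel(uint8_t *rgb, int x, int y)
{
    rgb[0] = (uint8_t) x;
    rgb[1] = (uint8_t) y;
    rgb[2] = (uint8_t) (x + y);
}

/* start_routine: receives its row_job through the generic void pointer. */
static void *render_rows(void *arg)
{
    const row_job *job = arg;
    for (int y = job->first_row; y < job->height; y += job->row_step)
        for (int x = 0; x < job->width; x++)
            shade_pixel(job->pixels + 3 * (y * job->width + x), x, y);
    return NULL;
}

/* Renders the image with nthreads workers; returns 0 on success, -1 if a
 * thread could not be created (threads already started are still joined). */
int render_parallel(uint8_t *pixels, int width, int height, int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    pthread_t tid[MAX_THREADS];
    row_job job[MAX_THREADS];
    int started = 0;
    int status = 0;

    for (int i = 0; i < nthreads; i++) {
        /* Thread i handles rows i, i + nthreads, i + 2 * nthreads, ... */
        job[i] = (row_job){ pixels, width, height, i, nthreads };
        if (pthread_create(&tid[i], NULL, render_rows, &job[i]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* Wait for every worker before the job array goes out of scope. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return status;
}
```

Each thread writes to a disjoint set of rows, so no locking is needed on the shared buffer. A caller would allocate `width * height * 3` bytes and call, for example, `render_parallel(pixels, 512, 512, 4)`; with gcc the program is built with `-pthread`.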