
2016q3 Homework1 (raytracing)

contributed by <f5120125>

tags: sysprog

Development environment

Ubuntu 14.04 LTS

  • CPU: Intel® Core™ i5 CPU 650 @ 3.20GHz × 4
  • Mem: 8 GiB
  • Cache:
    L1d cache: 32 KB
    L1i cache: 32 KB
    L2 cache: 256 KB
    L3 cache: 4096 KB

Learning goals

  • Use gprof to find the hot spots in the raytracing program
  • Analyze the program and find ways to optimize it

gprof

Reference: [1]

►Result: 6.245250 sec

$ make PROFILE=1
$ ./raytracing
# Rendering scene
Done!
Execution time of raytracing() : 6.245250 sec
  • Once gmon.out has been generated, run gprof on it:
$ gprof raytracing gmon.out | less
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 27.22      1.11     1.11 69646433     0.00     0.00  dot_product
 18.39      1.86     0.75 56956357     0.00     0.00  subtract_vector
 10.18      2.28     0.42 31410180     0.00     0.00  multiply_vector
  8.34      2.62     0.34 17836094     0.00     0.00  add_vector
  6.13      2.87     0.25 13861875     0.00     0.00  rayRectangularIntersection
  6.13      3.12     0.25 10598450     0.00     0.00  normalize
  6.01      3.36     0.25 17821809     0.00     0.00  cross_product
  4.41      3.54     0.18 13861875     0.00     0.00  raySphereIntersection
  3.19      3.67     0.13  4620625     0.00     0.00  ray_hit_object
  1.96      3.75     0.08  4221152     0.00     0.00  multiply_vectors
  1.23      3.80     0.05  3838091     0.00     0.00  length
  1.23      3.85     0.05  2110576     0.00     0.00  compute_specular_diffuse
  0.98      3.89     0.04  1048576     0.00     0.00  ray_color
  0.98      3.93     0.04        1     0.04     4.08  raytracing
  0.74      3.96     0.03  2520791     0.00     0.00  idx_stack_top
  0.74      3.99     0.03  2110576     0.00     0.00  localColor
  0.74      4.02     0.03  1241598     0.00     0.00  refraction
  0.61      4.05     0.03  1048576     0.00     0.00  rayConstruction

►Analysis

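The bottleneck is in the vector math routines: dot_product, subtract_vector, multiply_vector and add_vector together account for about 64% of the sampled time. Because these routines live in math-toolkit.h, that file is the place to start optimizing.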

Optimizing the raytracing program

Techniques suggested by the assignment:

  • [x] Loop unrolling
  • [ ] Scheduling
  • [ ] Pipeline
  • [x] Force inline
    • Forces the compiler to insert the function body in place of every call to that function, which removes the overhead of each call
  • [x] Pthread
  • [x] OpenMP

1. Loop unrolling

  • Applied to dot_product:

static inline double dot_product(const double *v1, const double *v2)
{
    /* The three-iteration loop is written out by hand: no counter, compare or branch. */
    double dp = 0.0;
    dp += v1[0] * v2[0];
    dp += v1[1] * v2[1];
    dp += v1[2] * v2[2];
    return dp;
}
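For comparison, the sketch below puts a loop form next to the unrolled form. It assumes the version in math-toolkit.h before this change iterated over the three components with a `for` loop; both function names are hypothetical. At -O0, which this Makefile uses, GCC does not unroll loops itself, so the hand-unrolled version avoids the loop counter, compare and branch on every call.

```c
/* Loop form (hypothetical name; assumed shape of the pre-unrolling code). */
static inline double dot_product_loop(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}

/* Unrolled form: the same multiply-adds in the same order, no loop control. */
static inline double dot_product_unrolled(const double *v1, const double *v2)
{
    double dp = 0.0;
    dp += v1[0] * v2[0];
    dp += v1[1] * v2[1];
    dp += v1[2] * v2[2];
    return dp;
}
```

Because both versions perform the additions in the same order, they return identical results. For example, both give 32.0 for {1, 2, 3} · {4, 5, 6}.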

►Result

1.35 sec faster than the original
# Rendering scene
Done!
Execution time of raytracing() : 4.895239 sec
Technique used         Execution time
Original               6.245250 sec
Loop unrolling         4.895239 sec

2. Force inline

►Add __forceinline

static inline __forceinline
double dot_product(const double *v1, const double *v2)
{
	.
	.
	.
}

►Modify the Makefile

CFLAGS = \
    -std=gnu99 -Wall -O0 -g \
    -D__forceinline="__attribute__((always_inline))"
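The following self-contained sketch shows what the Makefile flag does. The `-D` option defines `__forceinline` as GCC's `always_inline` attribute. As a result, every `static inline __forceinline` function is expanded at its call sites even at -O0, where GCC otherwise does not inline plain `inline` functions. The `#ifndef` guard and the `subtract_vector` signature are assumptions for illustration and are not copied from math-toolkit.h.

```c
/* Same effect as -D__forceinline="__attribute__((always_inline))";
 * the guard lets this compile whether or not the flag is also given. */
#ifndef __forceinline
#define __forceinline __attribute__((always_inline))
#endif

/* always_inline makes GCC expand this at each call site, even at -O0. */
static inline __forceinline
double dot_product(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}

/* Signature assumed for illustration: out = a - b, component-wise. */
static inline __forceinline
void subtract_vector(const double *a, const double *b, double *out)
{
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}
```

To check that inlining actually happened, compile a caller with `gcc -S` and confirm the generated assembly contains no `call dot_product`.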

►Result

4.026964 sec faster than the original
# Rendering scene
Done!
Execution time of raytracing() : 2.218286 sec
Technique used         Execution time
Original               6.245250 sec
Loop unrolling         4.895239 sec
Force inline           2.218286 sec

3. Pthread

►Preliminaries

Differences between a program, a process and a thread
What is Pthreads?

►Thread Management

Creating and terminating threads
Routines
  • pthread_create (thread,attr,start_routine,arg)
  • pthread_exit (status)
  • pthread_cancel (thread)
  • pthread_attr_init (attr)
  • pthread_attr_destroy (attr)
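Here is a minimal sketch of the lifecycle routines listed above, in order:

1. Initialize an attribute object and mark the thread joinable.
2. Create the thread.
3. Return a status from the thread with `pthread_exit`.
4. Collect that status with `pthread_join`.
5. Destroy the attribute object.

`pthread_join` and `pthread_attr_setdetachstate` are standard POSIX calls, although they are not in the list above. The names `worker` and `run_joinable_thread` are hypothetical. Build with `-pthread`.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical worker: doubles its integer argument and hands the result
 * back as the thread's exit status through pthread_exit(). */
static void *worker(void *arg)
{
    intptr_t n = (intptr_t) arg;
    pthread_exit((void *) (n * 2));
}

/* Creates one joinable thread, waits for it and returns its exit status,
 * or -1 if a pthread call fails. */
static intptr_t run_joinable_thread(intptr_t input)
{
    pthread_attr_t attr;
    pthread_t tid;
    void *status = NULL;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    /* Joinable is already the default; set explicitly for clarity. */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    if (pthread_create(&tid, &attr, worker, (void *) input) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    /* Destroying attr does not affect a thread already created with it. */
    pthread_attr_destroy(&attr);

    if (pthread_join(tid, &status) != 0)
        return -1;
    return (intptr_t) status;
}
```

For example, `run_joinable_thread(21)` returns 42.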
Using pthread_create
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg);
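◆First argument
  • Pointer to a pthread_t that receives the new thread's unique ID
◆Second argument
  • Specifies the thread's attributes; usually NULL, which means the default attributes
◆Third argument
  • void * is a generic pointer: it can hold the address of any object type and can be converted back to that type
  • (*start_routine) is a function pointer; the function takes a single void * argument and returns void *
◆Fourth argument
  • The argument passed to start_routine; if start_routine needs several parameters, pass a pointer to a structure that holds them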
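The following sketch shows the fourth-argument pattern. Each thread receives a pointer to its own struct describing its slice of the work, and it writes its result back into that struct. The names `struct sum_arg`, `sum_worker` and `parallel_sum` are hypothetical. The fixed limit of 16 threads is an assumption that keeps the example free of dynamic allocation. The argument array must stay alive until every thread has been joined.

```c
#include <pthread.h>

#define MAX_THREADS 16

/* Hypothetical per-thread argument block: input slice plus result slot. */
struct sum_arg {
    const double *data;
    int begin, end;     /* half-open range [begin, end) */
    double partial;     /* written by the thread */
};

static void *sum_worker(void *p)
{
    struct sum_arg *a = p;  /* convert the void * back to the real type */
    double s = 0.0;
    for (int i = a->begin; i < a->end; i++)
        s += a->data[i];
    a->partial = s;
    return NULL;
}

/* Splits data[0..n) into nthreads contiguous chunks and sums them in
 * parallel; out-of-range nthreads falls back to a single thread. */
static double parallel_sum(const double *data, int n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct sum_arg args[MAX_THREADS];   /* outlives all threads */
    int created[MAX_THREADS];
    if (nthreads < 1 || nthreads > MAX_THREADS)
        nthreads = 1;

    for (int t = 0; t < nthreads; t++) {
        args[t].data = data;
        args[t].begin = (int) ((long) n * t / nthreads);
        args[t].end = (int) ((long) n * (t + 1) / nthreads);
        args[t].partial = 0.0;
        created[t] = (pthread_create(&tid[t], NULL, sum_worker, &args[t]) == 0);
        if (!created[t])
            sum_worker(&args[t]);       /* fall back to running it here */
    }

    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        if (created[t])
            pthread_join(tid[t], NULL);
        total += args[t].partial;
    }
    return total;
}
```

For example, with `data` holding 1.0 through 100.0, `parallel_sum(data, 100, 4)` returns 5050.0.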
撰寫欲傳入start_routine的參數結構
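  • Looking at the function raytracing() in raytracing.c, I want to process the pixels in parallel.

    • So I pack the parameters it needs (the pixel buffer plus the scene and view data) into a struct, and raytracing is rewritten as follows: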
void *raytracing(void *args);

typedef struct __PTHREAD_ARGUMENT_STRUCTURE {
    uint8_t *pixels;
    double *background_color;
    rectangular_node rectangulars;
    sphere_node spheres;
    light_node lights;
    const viewpoint *view;
    int width;
    int height;
    int pthNum;
} PTH_ARGS;

typedef struct PTHREAD_NODE {
    PTH_ARGS *argPtr;
    int init_height;
} PTH_NODE;
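The structs above suggest that each thread gets a starting row (`init_height`) and knows the thread count (`pthNum`). One plausible way to use them is row interleaving: thread k renders rows k, k + pthNum, k + 2·pthNum, and so on. This tends to spread expensive regions of the image across all threads.

This is an assumption about the scheme, not the author's code. The names `render_args`, `row_node`, `shade_pixel`, `render_rows` and `render_parallel` are hypothetical, and `shade_pixel` is a placeholder for the real per-pixel ray tracing.

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 16

/* Hypothetical, cut-down analogue of PTH_ARGS, shared by all threads. */
typedef struct {
    uint8_t *pixels;   /* width * height * 3 bytes, RGB */
    int width, height;
    int pthNum;        /* number of threads */
} render_args;

/* Hypothetical analogue of PTH_NODE, one per thread. */
typedef struct {
    render_args *argPtr;
    int init_height;   /* first row this thread renders */
} row_node;

/* Placeholder for the real per-pixel ray tracing. */
static void shade_pixel(uint8_t *rgb, int x, int y)
{
    rgb[0] = (uint8_t) x;
    rgb[1] = (uint8_t) y;
    rgb[2] = 255;
}

static void *render_rows(void *p)
{
    row_node *node = p;
    render_args *a = node->argPtr;
    /* Interleaved rows: init_height, init_height + pthNum, ...
     * Rows never overlap between threads, so no locking is needed. */
    for (int j = node->init_height; j < a->height; j += a->pthNum)
        for (int i = 0; i < a->width; i++)
            shade_pixel(&a->pixels[(j * a->width + i) * 3], i, j);
    return NULL;
}

/* Renders the whole image with nthreads threads (clamped to 1..MAX_THREADS). */
static void render_parallel(uint8_t *pixels, int width, int height, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    row_node nodes[MAX_THREADS];
    int created[MAX_THREADS];
    if (nthreads < 1 || nthreads > MAX_THREADS)
        nthreads = 1;
    render_args args = { pixels, width, height, nthreads };

    for (int t = 0; t < nthreads; t++) {
        nodes[t].argPtr = &args;
        nodes[t].init_height = t;
        created[t] = (pthread_create(&tid[t], NULL, render_rows, &nodes[t]) == 0);
        if (!created[t])
            render_rows(&nodes[t]);    /* fall back to the calling thread */
    }
    for (int t = 0; t < nthreads; t++)
        if (created[t])
            pthread_join(tid[t], NULL);
}
```

With a 5×7 image and 3 threads, thread 0 renders rows 0, 3 and 6, thread 1 renders rows 1 and 4, and thread 2 renders rows 2 and 5.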
  • With 2 threads in the pthread pool:

# Rendering scene
Done!
Execution time of raytracing() : 1.149351 sec

  • With 4 threads in the pthread pool:

# Rendering scene
Done!
Execution time of raytracing() : 0.956216 sec
  • With 8 threads in the pthread pool:

Done!
Execution time of raytracing() : 0.979401 sec

►Results

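  • Beyond 4 threads, adding threads no longer improves performance noticeably (0.956216 sec with 4 threads vs 0.979401 sec with 8). This is consistent with the CPU providing only 4 logical CPUs (the i5 650 has 2 physical cores with Hyper-Threading).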
Technique used         Execution time
Original               6.245250 sec
Loop unrolling         4.895239 sec
Force inline           2.218286 sec
pthread (2 threads)    1.149351 sec
pthread (4 threads)    0.956216 sec
pthread (8 threads)    0.979401 sec
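A worked version of the comparison follows. It takes the Force inline run as the single-threaded baseline, which assumes the pthread build includes the earlier optimizations. The speedup with p threads is S_p = T_1 / T_p, and the parallel efficiency is E_p = S_p / p:

```latex
\begin{aligned}
S_p &= \frac{T_1}{T_p}, \quad E_p = \frac{S_p}{p}, \quad T_1 = 2.218286\,\text{s}\\
S_2 &= \frac{2.218286}{1.149351} \approx 1.93, & E_2 &\approx 0.97\\
S_4 &= \frac{2.218286}{0.956216} \approx 2.32, & E_4 &\approx 0.58\\
S_8 &= \frac{2.218286}{0.979401} \approx 2.26, & E_8 &\approx 0.28
\end{aligned}
```

Scaling is nearly linear up to 2 threads. Efficiency drops to roughly 58% at 4 threads, and 8 threads is slightly slower than 4.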

4. OpenMP

Include the header

#include <omp.h>

Modify the Makefile
CC ?= gcc
CFLAGS = \
	-std=gnu99 -Wall -O0 -g -fopenmp 
LDFLAGS = \
	-lm -lgomp
Modify raytracing.c

#pragma omp parallel for num_threads(64) private(stk), private(d), private(object_color)
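The sketch below applies the same idea with OpenMP: `parallel for` splits the iterations of the outer row loop across threads. Any variable declared outside the loop and written inside it has to appear in `private`, as `stk`, `d` and `object_color` do above; otherwise all threads would share one copy. Variables declared inside the loop body, and the loop indices, are private automatically.

A few notes on the sketch:

- `render_omp` and the RGB buffer layout are hypothetical, and the per-pixel work is a placeholder.
- `num_threads(n)` is omitted, so the runtime default applies, which is typically the number of logical CPUs.
- `_OPENMP` is predefined when compiling with `-fopenmp`. Without that flag the guarded pragma disappears and the loop runs serially with the same result.

```c
#include <stdint.h>

/* Hypothetical renderer: rows are distributed across OpenMP threads. */
static void render_omp(uint8_t *pixels, int width, int height)
{
    double d[3];   /* declared outside the loop, so it must be private */

#ifdef _OPENMP
#pragma omp parallel for private(d)
#endif
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++) {
            /* Placeholder for the per-pixel work (ray direction etc.). */
            d[0] = (double) i;
            d[1] = (double) j;
            d[2] = d[0] + d[1];
            /* Declared inside the loop, hence private to each thread. */
            uint8_t *rgb = &pixels[(j * width + i) * 3];
            rgb[0] = (uint8_t) d[0];
            rgb[1] = (uint8_t) d[1];
            rgb[2] = (uint8_t) d[2];
        }
    }
}
```

Each row is written by exactly one thread, so the output buffer needs no synchronization.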

►Results

Technique used         Execution time
Original               6.245250 sec
Loop unrolling         4.895239 sec
Force inline           2.218286 sec
pthread (2 threads)    1.149351 sec
pthread (4 threads)    0.956216 sec
pthread (8 threads)    0.979401 sec
OpenMP (8 threads)     0.956893 sec

To do

  • Plot the results with gnuplot

Reference

[1] http://os.51cto.com/art/200703/41426.htm
[2] https://computing.llnl.gov/tutorials/pthreads/#Thread
[3] https://computing.llnl.gov/tutorials/pthreads/man/pthread_create.txt
[4] http://www.cprogramming.com/tutorial/function-pointers.html