2017q1 Homework1 (raytracing)

contributed by < refleex >

Please catch up on your progress soon.
Course TA

Development environment

OS: Ubuntu 16.04 LTS (64-bit)
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel® Core™ i5-4210M CPU @ 2.60GHz
Stepping: 3
CPU MHz: 1694.671
CPU max MHz: 3200.0000
CPU min MHz: 800.0000
BogoMIPS: 5187.92
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3

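Preparation

Techniques such as POSIX Threads, OpenMP, software pipelining, and loop unrolling can be used to speed up the program.

The first step is to understand what parallelization is and to learn some basic ways of doing it, such as OpenMP directives and the pthread API, until they are comfortable to use.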
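As a small illustration of the loop unrolling mentioned above, here is a minimal sketch for a three-component dot product. The function names and signatures (`dot_product_loop`, `dot_product_unrolled`) are assumptions made for this example and are not taken from the project source.

```c
/* Rolled version: a loop over the three components. */
static double dot_product_loop(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}

/* Unrolled version: the trip count is a known constant (3), so the loop
 * counter, comparison, and branch can be written out of existence. */
static double dot_product_unrolled(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}
```

Both versions compute the same value. The unrolled one only removes loop bookkeeping, which matters mainly when the compiler is not already doing this itself (for example at -O0).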

Unoptimized version

First, build and run the original version:

$ make
$ ./raytracing

# Rendering scene
Done!
Execution time of raytracing() : 2.666284 sec

Next, rebuild with profiling enabled so that gprof can measure performance:

$ make clean
$ make PROFILE=1
$ ./raytracing

# Rendering scene
Done!
Execution time of raytracing() : 5.567427 sec

Inspect the profile with gprof:

$ gprof -b raytracing gmon.out | less

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 26.46      0.55     0.55 56956357     0.00     0.00  subtract_vector
 19.24      0.95     0.40 69646433     0.00     0.00  dot_product
  9.86      1.16     0.21 13861875     0.00     0.00  rayRectangularIntersection
  8.90      1.34     0.19 31410180     0.00     0.00  multiply_vector
  6.73      1.48     0.14 17836094     0.00     0.00  add_vector
  6.49      1.62     0.14 13861875     0.00     0.00  raySphereIntersection
  4.81      1.72     0.10 10598450     0.00     0.00  normalize
  4.33      1.81     0.09  4620625     0.00     0.00  ray_hit_object
  3.37      1.88     0.07 17821809     0.00     0.00  cross_product
  2.41      1.93     0.05  2110576     0.00     0.00  compute_specular_diffuse
  1.92      1.97     0.04        1     0.04     2.08  raytracing
  1.44      2.00     0.03  1048576     0.00     0.00  ray_color
  0.96      2.02     0.02  2110576     0.00     0.00  localColor
  0.96      2.04     0.02  1048576     0.00     0.00  rayConstruction
  0.72      2.05     0.02  4221152     0.00     0.00  multiply_vectors
  0.48      2.06     0.01  3838091     0.00     0.00  length
  0.48      2.07     0.01  1204003     0.00     0.00  idx_stack_push
  0.48      2.08     0.01   113297     0.00     0.00  fresnel

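The profile shows that subtract_vector() and dot_product() are called most often (about 57 million and 70 million calls).
They account for 0.55 s and 0.40 s of self time, respectively.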

Force inline

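About inline: the inline keyword is only a "suggestion" to the compiler to expand the function body at each call site, which avoids function-call overhead. The compiler is free to ignore it, and GCC does not inline ordinary inline functions at -O0.
So __attribute__((always_inline)) is used to force the functions to be inlined.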
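A minimal sketch of what forcing inlining can look like with GCC or Clang. The function name `subtract_vector_inl` is hypothetical and chosen only for this example; the body mirrors the three-component subtraction used in this note.

```c
/* GCC/Clang extension: always_inline asks the compiler to inline this
 * function at every call site, even when optimization is disabled (-O0). */
static inline __attribute__((always_inline))
void subtract_vector_inl(const double *a, const double *b, double *out)
{
    for (int i = 0; i < 3; i++)
        out[i] = a[i] - b[i];
}
```

The attribute is not part of standard C, so code that must build with other compilers would need a fallback, for example a macro that expands to plain `inline`.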

Result (compared with 2.666284 sec for the original version):

# Rendering scene
Done!
Execution time of raytracing() : 2.478609 sec

OpenMP

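Start with OpenMP.
To use OpenMP, the -fopenmp option must be passed when compiling and linking (it is a compiler flag, not a library).
The Makefile therefore has to be changed: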

CFLAGS = \
        -std=gnu99 -Wall -O0 -g -fopenmp
LDFLAGS = \
        -lm -fopenmp

Add #pragma omp parallel for to subtract_vector():

void subtract_vector(const double *a, const double *b, double *out)
{
    #pragma omp parallel for
    for (int i = 0; i < 3; i++)
        out[i] = a[i] - b[i];
}

Run the program again:

$ ./raytracing

# Rendering scene
Done!
Execution time of raytracing() : 5.449012 sec

It does not seem much faster (the earlier profiling run took 5.567427 sec), so check gprof:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 22.99      0.54     0.54 69646433     0.00     0.00  dot_product
 16.61      0.93     0.39 56956357     0.00     0.00  subtract_vector
 13.20      1.24     0.31 17836094     0.00     0.00  add_vector
  9.79      1.47     0.23 31410180     0.00     0.00  multiply_vector
  9.58      1.70     0.23 13861875     0.00     0.00  rayRectangularIntersection
  6.39      1.85     0.15 10598450     0.00     0.00  normalize
  4.26      1.95     0.10 17821809     0.00     0.00  cross_product
  3.83      2.04     0.09  4620625     0.00     0.00  ray_hit_object
  3.19      2.11     0.08 13861875     0.00     0.00  raySphereIntersection
  2.55      2.17     0.06  4221152     0.00     0.00  multiply_vectors
  2.55      2.23     0.06  1048576     0.00     0.00  ray_color
  1.70      2.27     0.04        1     0.04     2.35  raytracing
  1.28      2.30     0.03  2110576     0.00     0.00  localColor
  0.43      2.31     0.01  2520791     0.00     0.00  idx_stack_top
  0.43      2.32     0.01  2110576     0.00     0.00  compute_specular_diffuse
  0.43      2.33     0.01  1241598     0.00     0.00  reflection
  0.43      2.34     0.01  1048576     0.00     0.00  idx_stack_init
  0.43      2.35     0.01    37595     0.00     0.00  idx_stack_pop

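The time spent in subtract_vector() drops noticeably, to 0.39 s.
However, the time spent in dot_product() goes up, from 0.40 s to 0.54 s.
A likely reason there is no overall gain is that a three-iteration loop is too little work to pay for starting a parallel region on each of the roughly 57 million calls.
Built without gprof instrumentation, the execution time is as follows, which is no faster than the original 2.666284 sec: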

# Rendering scene
Done!
Execution time of raytracing() : 2.723372 sec

Next, parallelize the for loop of the raytracing() function in raytracing.c instead:

# Rendering scene
Done!
Execution time of raytracing() : 1.393815 sec

This is clearly much faster.
But the rendered image turns out to be wrong.

The OpenMP directive was then changed to mark some variables as private, so that the threads do not share them:

#pragma omp parallel for private(object_color,stk,d)

After running it again, the image is correct:

# Rendering scene
Done!
Execution time of raytracing() : 1.305993 sec
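The fix above can be sketched on a toy loop. This is only an illustration under stated assumptions, not the actual raytracing() code: `shade`, `render`, `WIDTH`, and `HEIGHT` are hypothetical names, and only `object_color` is borrowed from the directive above. Variables declared outside a parallel loop are shared between threads by default, so a scratch buffer like this one would be overwritten by several threads at once unless it is listed in private().

```c
#define WIDTH  8
#define HEIGHT 8

/* Hypothetical stand-in for the per-pixel work; not the real ray tracer. */
static void shade(int i, int j, double *color)
{
    color[0] = i;
    color[1] = j;
    color[2] = i + j;
}

/* Fill a WIDTH x HEIGHT RGB buffer, parallelizing over rows. */
void render(unsigned char *pixels)
{
    /* Scratch buffer declared outside the loop: shared by default. */
    double object_color[3];

    /* private() gives every thread its own (uninitialized) copy, so the
     * threads no longer overwrite each other's intermediate results.
     * The loop counters are declared inside the loops and are therefore
     * already private to each thread. */
    #pragma omp parallel for private(object_color)
    for (int j = 0; j < HEIGHT; j++) {
        for (int i = 0; i < WIDTH; i++) {
            shade(i, j, object_color);
            for (int c = 0; c < 3; c++)
                pixels[(j * WIDTH + i) * 3 + c] =
                    (unsigned char) object_color[c];
        }
    }
}
```

An equivalent alternative is to declare the scratch variables inside the loop body, which makes them private to each iteration without needing the clause. Compiled without -fopenmp, the pragma is ignored and the loop simply runs serially.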