2016q3 Homework1 (raytracing)

contributed by <HuangWenChen>

Development environment

Description : Ubuntu 16.04.1 LTS
linux kernel version : 4.4.0-38-generic
CPU : AMD A6-4455M APU with Radeon™ HD Graphics
Cache :
- L1d cache : 16K
- L1i cache : 64K
- L2 cache : 2048K

Run $ lscpu and $ cat /etc/os-release to view these specifications.

Prerequisites

$ sudo apt-get update
$ sudo apt-get install graphviz
$ sudo apt-get install imagemagick

$ make
$ ./raytracing

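Learning gprof

References

gprof is the GNU profiler. It shows how much time each function consumes while the program runs.
Basic usage:

- Compile and link the application with gcc using the -pg option.
- Run the application. When it finishes, it writes a data file for gprof to analyze (gmon.out by default).
- Run gprof to analyze the data the application generated.

Edit the Makefile to add the -pg flag, then compile and run. This produces a gmon.out file.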

$ make PROFILE=1
PROFILE=1 makes gcc add the -pg flag.

Unmodified version

$ make
$ ./raytracing

# Rendering scene
Done!
Execution time of raytracing() : 7.455105 sec

$ make PROFILE=1
$ ./raytracing   # writes gmon.out when it finishes

# Rendering scene
Done!
Execution time of raytracing() : 14.407905 sec

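The execution time nearly doubles (7.455105 s to 14.407905 s) because -pg inserts extra profiling code into the program.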

Usage: gprof options executable-file gmon.out
If you already understand every field of the profile table, add -b to omit the verbose descriptions.
$ gprof raytracing gmon.out

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 18.54      1.06     1.06 56956357     0.00     0.00  subtract_vector
 16.97      2.03     0.97 69646433     0.00     0.00  dot_product
 14.34      2.85     0.82 13861875     0.00     0.00  rayRectangularIntersection
  8.75      3.35     0.50 13861875     0.00     0.00  raySphereIntersection
  7.61      3.79     0.44 17836094     0.00     0.00  add_vector
  6.30      4.15     0.36 17821809     0.00     0.00  cross_product
  5.60      4.47     0.32 31410180     0.00     0.00  multiply_vector
  5.60      4.79     0.32 10598450     0.00     0.00  normalize
  3.67      5.00     0.21  1048576     0.00     0.00  ray_color
  3.15      5.18     0.18  4620625     0.00     0.00  ray_hit_object
  2.27      5.31     0.13  2110576     0.00     0.00  compute_specular_diffuse
  1.40      5.39     0.08  2110576     0.00     0.00  localColor
  1.40      5.47     0.08  1241598     0.00     0.00  refraction
  0.96      5.52     0.06  3838091     0.00     0.00  length
  0.70      5.56     0.04  4221152     0.00     0.00  multiply_vectors
  0.52      5.59     0.03  2520791     0.00     0.00  idx_stack_top
  0.52      5.62     0.03  1241598     0.00     0.00  reflection
  0.52      5.65     0.03        1     0.03     5.72  raytracing
  0.35      5.67     0.02  1048576     0.00     0.00  rayConstruction
  0.26      5.69     0.02  2558386     0.00     0.00  idx_stack_empty
  0.26      5.70     0.02  1204003     0.00     0.00  idx_stack_push
  0.17      5.71     0.01  1241598     0.00     0.00  protect_color_overflow
  0.17      5.72     0.01   113297     0.00     0.00  fresnel
  0.00      5.72     0.00  1048576     0.00     0.00  idx_stack_init
  0.00      5.72     0.00    37595     0.00     0.00  idx_stack_pop
  0.00      5.72     0.00        3     0.00     0.00  append_rectangular
  0.00      5.72     0.00        3     0.00     0.00  append_sphere
  0.00      5.72     0.00        2     0.00     0.00  append_light
  0.00      5.72     0.00        1     0.00     0.00  calculateBasisVectors
  0.00      5.72     0.00        1     0.00     0.00  delete_light_list
  0.00      5.72     0.00        1     0.00     0.00  delete_rectangular_list
  0.00      5.72     0.00        1     0.00     0.00  delete_sphere_list
  0.00      5.72     0.00        1     0.00     0.00  diff_in_second
  0.00      5.72     0.00        1     0.00     0.00  write_to_ppm

The profile shows that the two most time-consuming functions are
subtract_vector and dot_product,
so these two functions are rewritten first.

Modified version

References
- System Programming, Spring 2016
- Shared notes from previous students

Original version of dot_product

static inline
double dot_product(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}

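Using loop unrolling

References
- wiki

Unrolling the loop removes the loop-condition comparisons. Once the three iterations are written out in full, no comparison is performed at all.
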
static inline
double dot_product(const double *v1, const double *v2)
{
    double dp = v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
    return dp;
}

Clean and rebuild:
$ ./raytracing

# Rendering scene
Done!
Execution time of raytracing() : 13.070825 sec

The time drops from 14.407905 s to 13.070825 s, a reduction of 1.337080 s.
Next, modify subtract_vector in the same way.
Clean and rebuild:
$ ./raytracing

# Rendering scene
Done!
Execution time of raytracing() : 12.072921 sec

The time drops again, from 13.070825 s to 12.072921 s, a reduction of 0.997904 s.
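The modified subtract_vector is not shown in these notes. A minimal sketch of what the unrolled version might look like follows; the signature (two input vectors and an output pointer) and the original loop form are assumptions, not copied from the project source.

```c
/* Assumed original form (not shown in the notes):
 *     for (int i = 0; i < 3; i++)
 *         out[i] = a[i] - b[i];
 * Unrolled: three plain statements, so no loop counter
 * and no loop-condition comparison remain. */
static inline
void subtract_vector(const double *a, const double *b, double *out)
{
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}
```

The unrolling is only practical because the vector length is fixed at 3. For a length known only at run time, the loop would have to stay.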

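Using force inline

References
- force inline

The code declares these functions static inline. inline is a hint asking the compiler to replace a function call with the function body, which removes the call overhead. However, the Makefile disables optimization from the start, so GCC does not act on the hint.

Adding __attribute__((always_inline)) after static inline makes GCC inline the function even though optimization is disabled. It does not turn optimization on.

Applying it to dot_product and subtract_vector reduces the execution time further.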

# Rendering scene
Done!
Execution time of raytracing() : 10.380690 sec
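For illustration, the attribute can be placed on the unrolled dot_product as sketched below. The exact placement used in the project is an assumption; __attribute__((always_inline)) itself is a GCC extension (also accepted by Clang).

```c
/* GCC/Clang extension: inline this function at every call site,
 * even when the build is compiled without optimization. */
static inline __attribute__((always_inline))
double dot_product(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}
```

An inlined function has no separate call for the profiler to count. So part of the improvement measured under -pg probably comes from removing per-call profiling overhead, and the function may no longer appear as its own row in the gprof table.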

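Using OpenMP

References

OpenMP (Open Multi-Processing) is a cross-platform API for shared-memory multithreaded programming.

Modify raytracing() in raytracing.c, following the shared notes and tutorials.
To parallelize the for loop, add the following lines directly before it: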

#pragma omp parallel for num_threads(16)   \
    private(stk), private(d),   \
    private(object_color)

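num_threads(): the number of threads to create
private(): variables for which each thread needs its own copy
The shared notes mention that #include <omp.h> is required, and that -fopenmp must be added to the compile options in the Makefile.
For convenience, -pg profiling is not enabled for this build.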

# Rendering scene
Done!
Execution time of raytracing() : 4.241044 sec

With 16 threads, the execution time improves markedly.
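The role of private() can be seen in a small, self-contained sketch. It is not the project's raytracing() loop: shade(), render(), WIDTH and HEIGHT are hypothetical stand-ins. Only the scratch buffer name d and the directive's clauses follow the notes above.

```c
#define WIDTH  64
#define HEIGHT 64

/* Hypothetical per-pixel work standing in for the real ray computation.
 * It writes intermediate values into a caller-supplied scratch buffer. */
static double shade(int i, int j, double *scratch)
{
    scratch[0] = (double) i / WIDTH;
    scratch[1] = (double) j / HEIGHT;
    scratch[2] = 1.0;
    return scratch[0] * scratch[1] * scratch[2];
}

/* Fill pixels[] (WIDTH * HEIGHT entries), one row per loop iteration. */
void render(double *pixels)
{
    double d[3];  /* scratch buffer declared outside the loop */

    /* private(d) gives every thread its own copy of d. If d were shared,
     * threads would overwrite each other's intermediate values. */
    #pragma omp parallel for num_threads(4) private(d)
    for (int j = 0; j < HEIGHT; j++) {
        for (int i = 0; i < WIDTH; i++)
            pixels[j * WIDTH + i] = shade(i, j, d);
    }
}
```

Built with gcc -fopenmp, the rows are distributed across the threads. Without -fopenmp the pragma is ignored and the loop runs serially with the same result, which makes it easy to compare output between the two builds.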