2017q1 Homework1 (raytracing)

contributed by < 0xff07 >

The source code is here

The assignment requirements are here

Development environment

Ubuntu 16.04.2
Linux version 4.8.0-36-generic

gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)

L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K

Please separate Chinese and English text with a space.

Please list information about your development environment.
Course TA

Tools

gprof

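Basic usage:

When compiling, add the -pg option.
Run the compiled program. When it finishes, a new file named gmon.out appears.
Then run:
```
$ gprof <program name> gmon.out
```
and the profiling statistics are printed.

You can also use gprof <program name> gmon.out | less, or redirect the output to another file, to make it easier to read.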

Profiling with gprof

Run:

$ make PROFILE=1
$ ./raytracing
$ gprof raytracing gmon.out

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 23.22      0.55     0.55 69646433     0.00     0.00  dot_product
 20.69      1.04     0.49 56956357     0.00     0.00  subtract_vector
  9.08      1.26     0.22 31410180     0.00     0.00  multiply_vector
  7.60      1.44     0.18 10598450     0.00     0.00  normalize
  6.76      1.60     0.16 17836094     0.00     0.00  add_vector
  6.33      1.75     0.15 13861875     0.00     0.00  rayRectangularIntersection
  5.07      1.87     0.12  4620625     0.00     0.00  ray_hit_object
  4.22      1.97     0.10 13861875     0.00     0.00  raySphereIntersection
  4.22      2.07     0.10  1048576     0.00     0.00  ray_color
  3.80      2.16     0.09 17821809     0.00     0.00  cross_product
  1.69      2.20     0.04  2110576     0.00     0.00  compute_specular_diffuse
  1.69      2.24     0.04        1     0.04     2.36  raytracing
  1.48      2.27     0.04  4221152     0.00     0.00  multiply_vectors
  0.84      2.29     0.02  1241598     0.00     0.00  protect_color_overflow
  0.84      2.31     0.02  1241598     0.00     0.00  refraction
  0.42      2.32     0.01  3838091     0.00     0.00  length
  0.42      2.33     0.01  2520791     0.00     0.00  idx_stack_top
  0.42      2.34     0.01  2110576     0.00     0.00  localColor
  0.42      2.35     0.01  1241598     0.00     0.00  reflection 
  0.21      2.36     0.01  2558386     0.00     0.00  idx_stack_empty
  0.21      2.36     0.01  1204003     0.00     0.00  idx_stack_push

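Note: only the functions that take the largest share of execution time are listed.

dot_product accounts for 23.22% of the execution time and subtract_vector for another 20.69%. Together they add up to 43.91%!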
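The s/call columns show 0.00 only because the time per call is far too small to appear at two decimal places. Dividing self seconds by the call count from the table gives a rough per-call cost. gprof timings are sampled, so these are indicative figures only.

For dot_product:

$$
\frac{0.55\ \text{s}}{69\,646\,433\ \text{calls}} \approx 7.9\ \text{ns per call}
$$

For subtract_vector:

$$
\frac{0.49\ \text{s}}{56\,956\,357\ \text{calls}} \approx 8.6\ \text{ns per call}
$$

Each call is cheap; the total is large because these functions are called tens of millions of times.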

Visualizing the call graph

If the textual call graph is hard to read, see this article. Taking KCachegrind, one of the tools it covers, as an example, first install it:

$ sudo apt-get install -y kcachegrind valgrind

Next, build the raytracing program:

$ make

Then:

$ valgrind --tool=callgrind ./raytracing

The following text appears:

==27518== Callgrind, a call-graph generating cache profiler
==27518== Copyright (C) 2002-2015, and GNU GPL'd, by Josef Weidendorfer et al.
==27518== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==27518== Command: ./raytracing
==27518== 
==27518== For interactive control, run 'callgrind_control -h'.
# Rendering scene

Then wait…

When it finishes, the following appears:

Execution time of raytracing() : 96.692830 sec
==27518== 
==27518== Events    : Ir
==27518== Collected : 19459508299
==27518== 
==27518== I   refs:      19,459,508,299

At this point a file named like callgrind.out.<pid> has been created. Next, run:

$ kcachegrind callgrind.out.<pid>

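and a window like the following opens:

I was also a little curious what the call relationships would look like when compiling with -pg.

Experiment 1. Loop unrolling

Change the original subtract_vector from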

static inline
void subtract_vector(const double *a, const double *b, double *out)
{
    for (int i = 0; i < 3; i++)
        out[i] = a[i] - b[i];
}

to:

static inline
void subtract_vector(const double *a, const double *b, double *out)
{
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}

and unroll dot_product in the same way. Its original version is:

static inline
double dot_product(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}
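The notes do not show the rewritten dot_product. Assuming it was unrolled the same way as subtract_vector, it might look like the sketch below. The name `dot_product_unrolled` is made up here so that it can sit next to the original; it is not taken from the author's repository.

```c
/* Hypothetical unrolled counterpart of dot_product (an assumption, not the
 * author's actual code). The three products are summed left to right, the
 * same order in which the loop version accumulates them. */
static inline
double dot_product_unrolled(const double *v1, const double *v2)
{
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}
```

For example, with v1 = {1, 2, 3} and v2 = {4, -5, 6} both versions compute 4 - 10 + 18 = 12.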

gprof output after unrolling:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 17.08      0.28     0.28 69646433     0.00     0.00  dot_product
 14.34      0.52     0.24 31410180     0.00     0.00  multiply_vector
 12.20      0.72     0.20 10598450     0.00     0.00  normalize
 11.59      0.91     0.19 17836094     0.00     0.00  add_vector
  8.54      1.05     0.14 56956357     0.00     0.00  subtract_vector
  8.24      1.18     0.14 17821809     0.00     0.00  cross_product
  6.71      1.29     0.11 13861875     0.00     0.00  rayRectangularIntersection
  4.88      1.37     0.08 13861875     0.00     0.00  raySphereIntersection
  3.66      1.43     0.06  4620625     0.00     0.00  ray_hit_object
  3.66      1.49     0.06  2110576     0.00     0.00  localColor
  2.44      1.53     0.04  4221152     0.00     0.00  multiply_vectors
  1.83      1.56     0.03  2110576     0.00     0.00  compute_specular_diffuse
  1.83      1.59     0.03  1048576     0.00     0.00  rayConstruction
  1.22      1.61     0.02  3838091     0.00     0.00  length
  0.61      1.62     0.01  1241598     0.00     0.00  protect_color_overflow
  0.61      1.63     0.01  1241598     0.00     0.00  reflection
  0.61      1.64     0.01  1048576     0.00     0.00  ray_color

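(Figure to be added)

Preliminary result: dot_product dropped from 0.55 s to 0.28 s, and subtract_vector from 0.49 s to 0.14 s (!). However, this is a single run, so the measurement has to be repeated several times before it means anything.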
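One way to act on "repeat it several times" is to time the same call repeatedly and look at the spread instead of a single number. The sketch below is an assumption about how that could be done, not the course's harness: `run_stats`, `time_runs` and the example workload `busy_sum` are hypothetical names, and the clock used is the POSIX `clock_gettime(CLOCK_MONOTONIC)`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime under strict -std= modes */
#endif
#include <stddef.h>
#include <time.h>

/* Summary of repeated timings, in seconds. */
struct run_stats {
    double min, max, mean;
};

/* Example workload (hypothetical): stores the sum 1 + 2 + ... + 100000 in *arg. */
static void busy_sum(void *arg)
{
    double *out = arg;
    double acc = 0.0;
    for (int i = 1; i <= 100000; i++)
        acc += (double) i;
    *out = acc;
}

/* Times work(arg) `runs` times with the monotonic clock and returns the
 * minimum, maximum and mean wall-clock duration. `runs` must be >= 1. */
static struct run_stats time_runs(void (*work)(void *), void *arg, size_t runs)
{
    struct run_stats s = { 0.0, 0.0, 0.0 };
    double total = 0.0;

    for (size_t i = 0; i < runs; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        work(arg);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double dt = (double) (t1.tv_sec - t0.tv_sec)
                    + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
        if (i == 0 || dt < s.min)
            s.min = dt;
        if (i == 0 || dt > s.max)
            s.max = dt;
        total += dt;
    }
    s.mean = total / (double) runs;
    return s;
}
```

A call such as `time_runs(busy_sum, &result, 10)` then gives a minimum, maximum and mean to compare before and after a change. If the two ranges overlap heavily, the difference is probably noise.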

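Experiment 2. OpenMP

Parallelize the innermost loop of the raytracing() function. However, I wrote the code incorrectly, so make check reports an error. Surprisingly, the output still looks close to the original, as shown below: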

Left: the output of this program. Right: the original file.
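The notes do not say what the bug was, so the following is not a reconstruction of it. It is a generic sketch of parallelizing a pixel loop with OpenMP, built around one common pitfall that is not confirmed for this code: scratch variables written inside the loop body must be private to each thread. `shade` and `render` are hypothetical stand-ins for the real per-pixel work, and the sketch parallelizes over rows rather than the innermost loop mentioned above.

```c
#include <stddef.h>

/* Hypothetical per-pixel computation standing in for the real ray tracing. */
static unsigned char shade(int i, int j)
{
    return (unsigned char) ((i * 31 + j * 17) & 0xFF);
}

/* Fills a width*height grayscale buffer, one row per parallel iteration.
 * Build with `gcc -fopenmp`; without that flag the pragma is ignored and
 * the loop simply runs serially. */
static void render(unsigned char *pixels, int width, int height)
{
    /* j is the parallel loop variable, so OpenMP makes it private.
     * i and value are declared inside the loop body, so every thread has
     * its own copy. Declaring them before the pragma without a private()
     * clause would make them shared, and the threads would race on them. */
    #pragma omp parallel for schedule(dynamic)
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++) {
            unsigned char value = shade(i, j);
            pixels[(size_t) j * (size_t) width + (size_t) i] = value;
        }
    }
}
```

Each pixel is written by exactly one iteration, so the parallel result should be byte-for-byte identical to the serial one. That is the property a check like make check relies on.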

The execution time of this program is:

# Rendering scene
Done!
Execution time of raytracing() : 1.692834 sec

while the original execution time is:

# Rendering scene
Done!
Execution time of raytracing() : 2.973419 sec

That is about 57% of the original time, which is baffling. With loop unrolling added on top, the running time is:

# Rendering scene
Done!
Execution time of raytracing() : 1.264572 sec

That is roughly 42.5% of the original time.
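For reference, both ratios follow directly from the three timings above. Each timing comes from a single run, so these are rough figures.

$$
\frac{1.692834}{2.973419} \approx 0.569, \qquad \frac{1.264572}{2.973419} \approx 0.425
$$

Expressed as speedups, that is about 1.76x with OpenMP alone and about 2.35x with OpenMP plus loop unrolling.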

Because the second commit inexplicably cut the time by nearly half, I got carried away and, rather sillily, named the branch superfast…

This is only the beginning, okay? - jserv

Forgive my stupidity…