# 2016q3 Homework (raytracing)
contributed by <`kaizsv`>

## prof

```
# Rendering scene
Done!
Execution time of raytracing() : 6.473611 sec
```

```
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds     calls  s/call   s/call  name
 24.22      0.53     0.53  56956357    0.00     0.00  subtract_vector
 23.30      1.04     0.51  69646433    0.00     0.00  dot_product
 13.94      1.35     0.31  31410180    0.00     0.00  multiply_vector
  6.85      1.50     0.15  17836094    0.00     0.00  add_vector
  5.48      1.62     0.12  10598450    0.00     0.00  normalize
  4.57      1.72     0.10  17821809    0.00     0.00  cross_product
  4.57      1.82     0.10  13861875    0.00     0.00  rayRectangularIntersection
  4.57      1.92     0.10  13861875    0.00     0.00  raySphereIntersection
```

The output above is my gprof result. The most frequently called functions are `subtract_vector`, `dot_product`, `multiply_vector`, and `add_vector`, so I start by applying loop unrolling to them (a sketch of an unrolled helper is given at the end of this note).

#### loop unrolling

```
# Rendering scene
Done!
Execution time of raytracing() : 5.899568 sec
```

Roughly 0.6 seconds faster than before (6.47 s → 5.90 s).

#### OpenMP

```
# Rendering scene
Done!
Execution time of raytracing() : 0.983607 sec
Verified OK
```

OpenMP runs parallel code on shared-memory machines; enable it by adding `-fopenmp` when compiling.

```C=
#include <omp.h>

#pragma omp parallel for num_threads(16) \
            schedule(guided, 4) \
            private(d) \
            private(stk) \
            firstprivate(object_color)
for (int j = 0; j < height; j++) {
    for (int i = 0; i < width; i++) {
        double r = 0, g = 0, b = 0;
        /* MSAA */
        for (int s = 0; s < SAMPLES; s++) {
            idx_stack_init(&stk);
            rayConstruction(d, u, v, w,
                            i * factor + s / factor,
                            j * factor + s % factor,
                            view, width * factor, height * factor);
            if (ray_color(view->vrp, 0.0, d, &stk, rectangulars, spheres,
                          lights, object_color,
                          MAX_REFLECTION_BOUNCES)) {
                r += object_color[0];
                g += object_color[1];
                b += object_color[2];
            } else {
                r += background_color[0];
                g += background_color[1];
                b += background_color[2];
            }
            pixels[((i + (j * width)) * 3) + 0] = r * 255 / SAMPLES;
            pixels[((i + (j * width)) * 3) + 1] = g * 255 / SAMPLES;
            pixels[((i + (j * width)) * 3) + 2] = b * 255 / SAMPLES;
        }
    }
}
```

`#pragma omp parallel for` : an OpenMP compiler directive telling the compiler to parallelize the `for` loop that follows.

`num_threads(16)` : how many threads to run with; `num_threads(omp_get_max_threads())` can be used to run with the maximum number of threads.

`schedule(guided, 4)` : the `schedule` clause tells the compiler how to distribute loop iterations among the threads. (A small runnable demonstration of the schedule kinds appears at the end of this note.)

`static` : the iteration space is split into chunks that are handed to the threads in order. `schedule(static)` and `schedule(static, 4)` behave as in the following example:

```
#pragma omp parallel for num_threads(4) schedule(static)
for (int i = 0; i < 1000; i++) {}

thread 1: i = 0 ~ 249
thread 2: i = 250 ~ 499
thread 3: i = 500 ~ 749
thread 4: i = 750 ~ 999

#pragma omp parallel for num_threads(4) schedule(static, 4)
for (int i = 0; i < 1000; i++) {}

thread 1: i = 0, 1, 2, 3, 16, 17...
thread 2: i = 4, 5, 6, 7, 20, 21...
thread 3: i = 8, 9, 10, 11, 24, 25...
thread 4: i = 12, 13, 14, 15, 28, 29...
```

`dynamic` : a thread is assigned the next chunk only after it finishes its current one.

`guided` : similar to `dynamic`, but the chunk size decreases exponentially.

`auto` : the compiler/runtime decides.

`runtime` : the user decides via the `OMP_SCHEDULE` environment variable.

`private` and `shared` : a `private` variable gives each thread its own copy, while a `shared` variable is shared by all threads.

`firstprivate` : also a private variable, but if the variable had an initial value before entering the loop, `firstprivate` preserves it; with plain `private` the initial value is indeterminate. Similarly, `lastprivate` controls whether the value from the last iteration is copied back to the original variable after the loop. (See the `private`/`firstprivate` example at the end of this note.)

Inside the `raytracing` loop, `stk` is reset at the start of every iteration and `d` is written (normalized) by `rayConstruction` before it is used to compute the color, so both are declared `private`; `object_color` already has an initial value before entering the loop, so it is declared `firstprivate`.

[多核心高效能程式開發](http://weblis.lib.ncku.edu.tw/search~S1*cht?/X{u591A}{u6838}{u5FC3}&searchscope=1&SORT=D/X{u591A}{u6838}{u5FC3}&searchscope=1&SORT=D&SUBKEY=%E5%A4%9A%E6%A0%B8%E5%BF%83/1%2C159%2C159%2CB/frameset&FF=X{u591A}{u6838}{u5FC3}&searchscope=1&SORT=D&11%2C11%2C)

###### tags: `assigment_2` `raytracing`
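Supplementary sketch for the loop unrolling step above. The note does not show the modified math-toolkit helpers, so the following is only a minimal sketch of what the change could look like; the function names come from the assignment's `math-toolkit.h`, but the exact original bodies and signatures are an assumption:

```c
/* A sketch (not necessarily the author's exact code): the vector helpers
 * originally loop over the 3 components; loop unrolling replaces the loop
 * with explicit statements, removing the loop counter and branch. */
static inline void subtract_vector(const double *a, const double *b,
                                   double *out)
{
    /* before: for (int i = 0; i < 3; i++) out[i] = a[i] - b[i]; */
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}

static inline double dot_product(const double *v1, const double *v2)
{
    /* before: accumulate v1[i] * v2[i] in a 3-iteration loop */
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}
```

`multiply_vector` and `add_vector` can be unrolled the same way.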
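A small runnable check of the `schedule` behaviour described earlier (the iteration count and chunk size here are arbitrary, chosen only for illustration). Compiling with `-fopenmp` and running it prints which thread received which iteration, so the round-robin chunks of `schedule(static, 4)` and the shrinking chunks of `schedule(guided, 4)` can be observed directly:

```c
#include <omp.h>
#include <stdio.h>

int main(void)
{
    /* static, chunk 4: chunks of 4 iterations are dealt to threads
     * round-robin, fixed at loop entry. */
    #pragma omp parallel for num_threads(4) schedule(static, 4)
    for (int i = 0; i < 32; i++)
        printf("static,4 : thread %d got i = %d\n", omp_get_thread_num(), i);

    /* guided, minimum chunk 4: threads grab chunks on demand and the
     * chunk size shrinks as the remaining work decreases. */
    #pragma omp parallel for num_threads(4) schedule(guided, 4)
    for (int i = 0; i < 32; i++)
        printf("guided,4 : thread %d got i = %d\n", omp_get_thread_num(), i);

    return 0;
}
```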
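To make the `private` / `firstprivate` distinction concrete, here is a small standalone example (the variable `x` is made up for illustration): with `private`, each thread gets an uninitialized copy that must be written before it is read; with `firstprivate`, each copy starts from the value `x` had before the parallel region:

```c
#include <omp.h>
#include <stdio.h>

int main(void)
{
    int x = 42;

    /* private: every thread has its own x, but its initial value inside
     * the region is indeterminate, so write it before reading it. */
    #pragma omp parallel for num_threads(4) private(x)
    for (int i = 0; i < 4; i++) {
        x = i;
        printf("private     : thread %d, x = %d\n", omp_get_thread_num(), x);
    }

    /* firstprivate: every thread's copy starts from 42. */
    #pragma omp parallel for num_threads(4) firstprivate(x)
    for (int i = 0; i < 4; i++)
        printf("firstprivate: thread %d, x = %d\n", omp_get_thread_num(), x);

    /* Without lastprivate, the per-thread copies are discarded, so the
     * original x is still 42 here. */
    printf("after loops : x = %d\n", x);
    return 0;
}
```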