
2016q3 Homework 1 (raytracing)

tags: sarah,raytracing

contributed by <SarahCheng>
github:
https://github.com/SarahYuHanCheng/raytracing.git

Development log

  • problem:

    • Running vi Makefile hit the problem shown in the screenshot; typing D restored it
    • Did not know how to open .vimrc to adjust the colors

      sudo apt-get install vim

  • reference:

STEP:

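  1. orig: Execution time of raytracing() : 6.607877 sec
     after -Ofast: Execution time of raytracing() : 1.440246 sec

  2. Use convert to turn out.ppm into other file formats, then compare the file sizes with ls -lh

  3. make PROFILE=1

  4. gprof: run gprof ./raytracing | less to see how many times each function is called and how long it takes; less pages the output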

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 18.39      0.57     0.57  1241598     0.00     0.00  refraction
 13.87      1.00     0.43 56956357     0.00     0.00  subtract_vector
 11.94      1.37     0.37 13861875     0.00     0.00  rayRectangularIntersection
 10.97      1.71     0.34 69646433     0.00     0.00  dot_product
  8.55      1.98     0.27 31410184     0.00     0.00  multiply_vector
  7.42      2.21     0.23 10598450     0.00     0.00  normalize
  5.48      2.38     0.17 17836094     0.00     0.00  add_vector
  5.16      2.54     0.16 13861875     0.00     0.00  raySphereIntersection
  4.52      2.67     0.14  4620625     0.00     0.00  ray_hit_object
  3.23      2.77     0.10 17821809     0.00     0.00  cross_product
  3.06      2.87     0.10  1048576     0.00     0.00  ray_color
  2.74      2.96     0.09  4221152     0.00     0.00  multiply_vectors
  0.97      2.98     0.03  2110576     0.00     0.00  compute_specular_diffuse
  0.65      3.00     0.02  1048576     0.00     0.00  rayConstruction
  0.65      3.02     0.02        1     0.02     3.10  raytracing
  0.48      3.04     0.01  1241598     0.00     0.00  protect_color_overflow
  0.32      3.05     0.01  3838091     0.00     0.00  length
  0.32      3.06     0.01  2558386     0.00     0.00  idx_stack_empty
  1. loop unrolling
    Unroll the loop in dot_product (math-toolkit.h). Original:

double dp = 0.0;
for (int i = 0; i < 3; i++)
    dp += v1[i] * v2[i];
return dp;

Changed to:

/* Add the three products; joining them with commas would assign only the first one. */
double dp = v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
return dp;

Result: Execution time of raytracing() : 1.009193 sec

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 20.24      0.09     0.09 11534336     0.00     0.00  subtract_vector
 19.05      0.17     0.08  4194304     0.00     0.00  multiply_vector
 14.29      0.23     0.06  1048576     0.00     0.00  rayConstruction
  9.52      0.27     0.04  1048576     0.00     0.00  ray_color
  9.52      0.30     0.04        1    40.00   410.00  raytracing
  7.14      0.34     0.03  3145728     0.00     0.00  raySphereIntersection
  5.95      0.36     0.03  4194304     0.00     0.00  add_vector
  4.76      0.38     0.02  3145728     0.00     0.00  rayRectangularIntersection
  2.38      0.39     0.01  3145730     0.00     0.00  cross_product
  2.38      0.40     0.01  1048579     0.00     0.00  normalize
  2.38      0.41     0.01                             multiply_vectors
  1.19      0.41     0.01 10485760     0.00     0.00  dot_product
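As a sanity check on the unrolling step, the sketch below puts a loop-based dot product next to an unrolled one so they can be compared on the same input. The names dot_product_loop and dot_product_unrolled are hypothetical and chosen for illustration; math-toolkit.h itself only has dot_product.

```c
#include <assert.h>

/* Loop version, mirroring the original dot_product. */
static double dot_product_loop(const double *v1, const double *v2)
{
    assert(v1 && v2);
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}

/* Unrolled version: the three products are written out and added. */
static double dot_product_unrolled(const double *v1, const double *v2)
{
    assert(v1 && v2);
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}
```

Calling both on the same pair of vectors and comparing the return values is a quick way to confirm that an unrolled rewrite still computes the full sum before trusting its timing.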

  1. force inline
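    • reduce function call overhead (inlining expands the function body at the call site when the code is compiled, instead of spending extra memory and instructions on a separate call to do the computation)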
      Execution time of raytracing() : 0.592862 sec
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 34.15      0.14     0.14  1048576     0.00     0.00  rayConstruction
 29.27      0.26     0.12  3145728     0.00     0.00  rayRectangularIntersection
 24.39      0.36     0.10  3145728     0.00     0.00  raySphereIntersection
  9.76      0.40     0.04  1048576     0.00     0.00  ray_hit_object
  2.44      0.41     0.01        1    10.00   410.00  raytracing
  0.00      0.41     0.00  1048579     0.00     0.00  normalize
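Forced inlining can be sketched with the always_inline attribute. This assumes GCC or Clang; the notes do not say which attribute or macro was actually used, and the names add_vector_inline and sum_components are hypothetical.

```c
#include <assert.h>

/* GCC/Clang extension: request inlining even when optimization is off.
 * Assumption: the build uses GCC or Clang. */
static inline __attribute__((always_inline))
void add_vector_inline(const double *a, const double *b, double *out)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
}

/* Caller: the body of add_vector_inline is expanded in place here,
 * so no separate call is made for it. */
static double sum_components(const double *a, const double *b)
{
    assert(a && b);
    double out[3];
    add_vector_inline(a, b, out);
    return out[0] + out[1] + out[2];
}
```

Once helpers are inlined they no longer exist as separate functions for gprof to count, which is consistent with the profile above listing only the larger functions.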

  1. OpenMP
  • test, get the number of threads
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Every thread in the default team runs this block once. */
    #pragma omp parallel
    {
        printf("Hello!!\n");
    }
    return 0;
}
Hello!!
Hello!!
Hello!!
Hello!!

This shows that the machine runs 4 threads.
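Counting printed lines works, but the team size can also be read directly. The sketch below uses omp_get_num_threads() inside a parallel region. The function name team_size is hypothetical, and the _OPENMP guard is an added assumption so the file still builds without -fopenmp.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the number of threads OpenMP uses for a parallel region,
 * or 1 when the file is compiled without -fopenmp. */
static int team_size(void)
{
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        /* One thread records the team size; the others skip this block. */
        #pragma omp single
        n = omp_get_num_threads();
    }
#endif
    assert(n >= 1);
    return n;
}
```

Printing the return value with printf("%d\n", team_size()) should report the same count as the number of Hello!! lines on the same machine.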

#pragma omp parallel for schedule(dynamic)
    for (int j = 0; j < height; j++) {
        #pragma omp parallel for schedule(dynamic) private(d,stk,object_color) 
        for (int i = 0; i < width; i++) 
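A self-contained sketch of the same row-parallel pattern follows, with a single pragma on the outer loop. The per-pixel temporary is declared inside the loop, so each thread gets its own copy without a private clause. The names render_rows and shade_pixel and the one-byte-per-pixel buffer are hypothetical stand-ins, not the layout used in raytracing.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-pixel work done in raytracing.c. */
static unsigned char shade_pixel(int i, int j)
{
    return (unsigned char)((i + j) & 0xFF);
}

/* Fill a width x height buffer, one row per outer iteration.
 * Rows are handed out dynamically to the threads; without -fopenmp
 * the pragma is ignored and the loop runs serially with the same result. */
static void render_rows(unsigned char *pixels, int width, int height)
{
    assert(pixels != NULL && width > 0 && height > 0);
    #pragma omp parallel for schedule(dynamic)
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++) {
            /* Declared inside the loop body, so it is private per thread. */
            unsigned char c = shade_pixel(i, j);
            pixels[j * width + i] = c;
        }
    }
}
```

Each row writes to a disjoint slice of the buffer, so the threads do not need any locking.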
  • When compiling, remember to add -fopenmp to CFLAGS and -lgomp to LDFLAGS in the Makefile

  • Results:

    • schedule(dynamic):Execution time of raytracing() : 0.425498 sec
    • num_threads(2):Execution time of raytracing() : 0.594750 sec
    • num_threads(4):Execution time of raytracing() : 0.408811 sec
    • num_threads(16):Execution time of raytracing() : 0.490429 sec
    • num_threads(32):Execution time of raytracing() : 0.469159 sec
    • num_threads(64):Execution time of raytracing() : 0.526471 sec
    • num_threads(128):Execution time of raytracing() : 0.608506 sec
  • Adding parallel execution inside dot_product (math-toolkit.h) as well:

 double dp0, dp1, dp2;

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            dp0 = v1[0] * v2[0];
        }
        #pragma omp section
        {
            dp1 = v1[1] * v2[1];
        }
        #pragma omp section
        {
            dp2 = v1[2] * v2[2];
        }        
    }
    return dp0 + dp1 + dp2;
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 42.70      2.69     2.69 12052835     0.00     0.00  rayRectangularIntersection
 18.73      3.87     1.18  1802340     0.00     0.00  compute_specular_diffuse
  7.94      4.37     0.50 11756078     0.00     0.00  raySphereIntersection
  7.94      4.87     0.50        1     0.50     6.30  raytracing
  6.35      5.27     0.40  1127490     0.00     0.00  refraction
  3.97      5.52     0.25  1844751     0.00     0.00  localColor
  3.81      5.76     0.24  4125111     0.00     0.00  ray_hit_object
  3.49      5.98     0.22   931033     0.00     0.00  ray_color
  2.70      6.15     0.17  9374771     0.00     0.00  normalize
  1.43      6.24     0.09   870632     0.00     0.00  rayConstruction
  0.32      6.26     0.02  1117590     0.00     0.00  reflection

Result: Execution time of raytracing() : 19.676675 sec

Not sure why yet; to be investigated.
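One plausible cause, offered as a hypothesis rather than a confirmed diagnosis: every call to dot_product opens its own parallel region, and the first profile shows dot_product being called 69,646,433 times. Starting and synchronizing a region typically costs far more than three multiplications. The sketch below puts the two variants side by side so they can be timed in a loop; the names dot_serial and dot_sections are hypothetical.

```c
#include <assert.h>

/* Serial version: three multiplications and two additions. */
static double dot_serial(const double *v1, const double *v2)
{
    assert(v1 && v2);
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
}

/* Sections version, following the experiment above: each call opens a
 * parallel region and waits at the barrier that closes it. */
static double dot_sections(const double *v1, const double *v2)
{
    assert(v1 && v2);
    double dp0 = 0.0, dp1 = 0.0, dp2 = 0.0;
    #pragma omp parallel sections
    {
        #pragma omp section
        dp0 = v1[0] * v2[0];
        #pragma omp section
        dp1 = v1[1] * v2[1];
        #pragma omp section
        dp2 = v1[2] * v2[2];
    }
    return dp0 + dp1 + dp2;
}
```

Both functions return the same value, so any difference in running time between them comes from the parallel region itself. If that is the cause, keeping dot_product serial and parallelizing only the pixel loops would be the natural fix.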

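  1. Look at the complexity and plot it (to do)
  2. SIMD: has its own registers and instructions; to be fast, the data being processed must be contiguous (to do)
  3. raytracing.c: each pixel can be computed independently, so split the work up and run it in parallel with multiple threads (to do) TempoJiJi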