contributed by <`f5120125`>
sysprog
Reference: [1]
$ make PROFILE=1
$ ./raytracing
# Rendering scene
Done!
Execution time of raytracing() : 6.245250 sec
$ gprof raytracing gmon.out | less
Flat profile:

Each sample counts as 0.01 seconds.

% time | cumulative seconds | self seconds | calls | self s/call | total s/call | name |
---|---|---|---|---|---|---|
27.22 | 1.11 | 1.11 | 69646433 | 0.00 | 0.00 | dot_product |
18.39 | 1.86 | 0.75 | 56956357 | 0.00 | 0.00 | subtract_vector |
10.18 | 2.28 | 0.42 | 31410180 | 0.00 | 0.00 | multiply_vector |
8.34 | 2.62 | 0.34 | 17836094 | 0.00 | 0.00 | add_vector |
6.13 | 2.87 | 0.25 | 13861875 | 0.00 | 0.00 | rayRectangularIntersection |
6.13 | 3.12 | 0.25 | 10598450 | 0.00 | 0.00 | normalize |
6.01 | 3.36 | 0.25 | 17821809 | 0.00 | 0.00 | cross_product |
4.41 | 3.54 | 0.18 | 13861875 | 0.00 | 0.00 | raySphereIntersection |
3.19 | 3.67 | 0.13 | 4620625 | 0.00 | 0.00 | ray_hit_object |
1.96 | 3.75 | 0.08 | 4221152 | 0.00 | 0.00 | multiply_vectors |
1.23 | 3.80 | 0.05 | 3838091 | 0.00 | 0.00 | length |
1.23 | 3.85 | 0.05 | 2110576 | 0.00 | 0.00 | compute_specular_diffuse |
0.98 | 3.89 | 0.04 | 1048576 | 0.00 | 0.00 | ray_color |
0.98 | 3.93 | 0.04 | 1 | 0.04 | 4.08 | raytracing |
0.74 | 3.96 | 0.03 | 2520791 | 0.00 | 0.00 | idx_stack_top |
0.74 | 3.99 | 0.03 | 2110576 | 0.00 | 0.00 | localColor |
0.74 | 4.02 | 0.03 | 1241598 | 0.00 | 0.00 | refraction |
0.61 | 4.05 | 0.03 | 1048576 | 0.00 | 0.00 | rayConstruction |
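The bottleneck is in the math routines such as dot_product and subtract_vector, which are called tens of millions of times, so we start by rewriting math-toolkit.h. The first technique is loop unrolling: the three-iteration loop in dot_product is written out by hand.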
static inline
double dot_product(const double *v1, const double *v2)
{
    double dp = 0.0;
    dp += v1[0] * v2[0];
    dp += v1[1] * v2[1];
    dp += v1[2] * v2[2];
    return dp;
}
# Rendering scene
Done!
Execution time of raytracing() : 4.895239 sec
Technique used | Execution time |
---|---|
original | 6.245250 sec |
Loop unrolling | 4.895239 sec |
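Next, force inlining. At -O0, GCC does not inline functions even when they are declared inline, so every call to the math routines still pays the function-call overhead. Adding __forceinline to the math functions such as dot_product, and defining it in the Makefile as GCC's always_inline attribute, forces them to be inlined:

static inline __forceinline
double dot_product(const double *v1, const double *v2)
{
    /* ... */
}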
CFLAGS = \
-std=gnu99 -Wall -O0 -g \
-D__forceinline="__attribute__((always_inline))"
# Rendering scene
Done!
Execution time of raytracing() : 2.218286 sec
Technique used | Execution time |
---|---|
original | 6.245250 sec |
Loop unrolling | 4.895239 sec |
Force inline | 2.218286 sec |
Next, parallelize with pthreads. The prototype of pthread_create is [3]:

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine) (void *), void *arg);
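`void *` is a generic pointer: it can hold the address of any object type and can be cast back to any object pointer type. `(*start_routine)` is a function pointer; the function it points to takes a single `void *` argument and returns `void *` [2][4].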
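For the function raytracing in raytracing.c, I want to process the pixels in parallel. Since start_routine accepts only one `void *`, I use a struct to hold the pixel buffer and the other arguments, and raytracing is rewritten as `void *raytracing(void *args)`: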
typedef struct __PTHREAD_ARGUMENT_STRUCTURE {
    uint8_t *pixels;
    double *background_color;
    rectangular_node rectangulars;
    sphere_node spheres;
    light_node lights;
    const viewpoint *view;
    int width;
    int height;
    int pthNum;            /* number of threads */
} PTH_ARGS;

typedef struct PTHREAD_NODE {
    PTH_ARGS *argPtr;      /* arguments shared by all threads */
    int init_height;       /* row this thread starts from */
} PTH_NODE;
With 2 threads:
# Rendering scene
Done!
Execution time of raytracing() : 1.149351 sec
With 4 threads:
# Rendering scene
Done!
Execution time of raytracing() : 0.956216 sec
With 8 threads:
Done!
Execution time of raytracing() : 0.979401 sec
Technique used | Execution time |
---|---|
original | 6.245250 sec |
Loop unrolling | 4.895239 sec |
Force inline | 2.218286 sec |
pthread (2 threads) | 1.149351 sec |
pthread (4 threads) | 0.956216 sec |
pthread (8 threads) | 0.979401 sec |
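Taking the single-threaded force-inline build as the baseline, the speedup of each pthread run computed from the table above is:

$$
S_n = \frac{T_{\text{force inline}}}{T_{n\ \text{threads}}},\qquad
S_2 = \frac{2.218286}{1.149351} \approx 1.93,\quad
S_4 = \frac{2.218286}{0.956216} \approx 2.32,\quad
S_8 = \frac{2.218286}{0.979401} \approx 2.26
$$

Going from 4 to 8 threads brings no further gain here; 8 threads is slightly slower than 4.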
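Finally, OpenMP. Include the header in raytracing.c:
#include <omp.h>
and enable OpenMP in the Makefile:
CC ?= gcc
CFLAGS = \
-std=gnu99 -Wall -O0 -g -fopenmp
LDFLAGS = \
-lm -lgomp
Then parallelize the pixel loop in raytracing() with the directive below. stk, d and object_color are per-pixel temporaries, so each thread needs its own private copy:
#pragma omp parallel for num_threads(64) private(stk), private(d), private(object_color)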
Technique used | Execution time |
---|---|
original | 6.245250 sec |
Loop unrolling | 4.895239 sec |
Force inline | 2.218286 sec |
pthread (2 threads) | 1.149351 sec |
pthread (4 threads) | 0.956216 sec |
pthread (8 threads) | 0.979401 sec |
OpenMP (8 threads) | 0.956893 sec |
[1] http://os.51cto.com/art/200703/41426.htm
[2] https://computing.llnl.gov/tutorials/pthreads/#Thread
[3] https://computing.llnl.gov/tutorials/pthreads/man/pthread_create.txt
[4] http://www.cprogramming.com/tutorial/function-pointers.html