andy19950
>Flat profile:
Each sample counts as 0.01 seconds.
    %  cumulative     self                self     total
 time     seconds  seconds     calls    s/call    s/call  name
18.06        0.48     0.48  69646433      0.00      0.00  dot_product
16.55        0.92     0.44  56956357      0.00      0.00  subtract_vector
11.66        1.23     0.31  31410180      0.00      0.00  multiply_vector
 8.46        1.46     0.23  17836094      0.00      0.00  add_vector
 8.46        1.68     0.23  13861875      0.00      0.00  rayRectangularIntersection
Performance counter stats for './raytracing' (100 runs):
62,894 cache-misses #27.256 % of all cache refs ( +- 1.64% )
230,751 cache-references ( +- 3.17% )
262,129,289 branch ( +- 0.00% )
3,417,337,773 instructions #1.79 insns per cycle ( +- 0.00% )
1,905,665,597 cycles ( +- 0.12% )
0.646348227 seconds time elapsed ( +- 0.40% )
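The two derived figures that perf prints after `#` can be recomputed from the raw counters. A minimal sketch, assuming the miss rate is misses divided by references and the IPC is instructions divided by cycles; the helper names `miss_rate_percent` and `insns_per_cycle` are hypothetical and are not part of perf:

```c
#include <assert.h>

/* Cache-miss rate in percent: misses as a share of all cache references. */
static double miss_rate_percent(double misses, double references)
{
    assert(references > 0.0);
    return 100.0 * misses / references;
}

/* Instructions retired per cycle (IPC). */
static double insns_per_cycle(double instructions, double cycles)
{
    assert(cycles > 0.0);
    return instructions / cycles;
}
```

With the counters above, `miss_rate_percent(62894, 230751)` comes out at about 27.26 and `insns_per_cycle(3417337773, 1905665597)` at about 1.79, matching the values perf reports.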
void add_vector(const double *a, const double *b, double *out)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
}
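`add_vector` above is written with three explicit statements rather than a loop. The same style could be applied to the other hot functions in the profile. A minimal sketch, assuming 3-component vectors stored as `double[3]`; the signatures below are assumptions modeled on `add_vector`, not copied from the original source:

```c
#include <assert.h>

/* Assumed signature: returns the dot product of two 3-component vectors. */
static inline double dot_product(const double *a, const double *b)
{
    /* Three explicit terms instead of a loop over i = 0..2. */
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Assumed signature, mirroring add_vector: out = a - b. */
static inline void subtract_vector(const double *a, const double *b, double *out)
{
    assert(a && b && out);
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}
```

Marking the helpers `static inline` is a further assumption; it only gives the compiler the option to remove the call overhead, which matters for functions called tens of millions of times.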
Each sample counts as 0.01 seconds.
    %  cumulative     self                self     total
 time     seconds  seconds     calls    s/call    s/call  name
16.19        0.28     0.28  69646433      0.00      0.00  dot_product
15.89        0.55     0.27  13861875      0.00      0.00  rayRectangularIntersection
13.83        0.78     0.24  56956357      0.00      0.00  subtract_vector
10.59        0.96     0.18  10598450      0.00      0.00  normalize
 7.06        1.08     0.12  31410180      0.00      0.00  multiply_vector
Performance counter stats for './raytracing' (100 runs):
106,186 cache-misses #24.499 % of all cache refs ( +- 2.82% )
433,426 cache-references ( +- 2.71% )
969,948,464 branch ( +- 0.00% )
12,314,790,448 instructions #1.89 insns per cycle ( +- 0.00% )
6,506,120,704 cycles ( +- 0.07% )
2.152467875 seconds time elapsed ( +- 0.14% )
Each sample counts as 0.01 seconds.
    %  cumulative     self                self     total
 time     seconds  seconds     calls    s/call    s/call  name
 1.83        1.50     0.03   3838091      0.00      0.00  length
 1.83        1.53     0.03         1      0.03      1.64  raytracing
 1.22        1.55     0.02   2110576      0.00      0.00  localColor
* Before optimizing raytracing(), let's first look at what pthread is.
platform             |       fork        |  pthread_create()
                     | real  user   sys  | real  user   sys
=====================|===================|==================
Intel 2.6 GHz Xeon   |  8.1   0.1   2.9  |  0.9   0.2   0.3
Intel 2.8 GHz Xeon   |  4.4   0.4   4.3  |  0.7   0.2   0.5
AMD 2.3 GHz Opteron  | 12.5   1.0  12.5  |  1.2   0.2   1.3
AMD 2.4 GHz Opteron  | 17.6   2.2  15.7  |  1.4   0.3   1.3
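The table compares the cost of creating processes with `fork` against creating threads with `pthread_create`. As a minimal sketch of the thread lifecycle itself (create, run, join), assuming a toy workload; `NUM_WORKERS`, `worker_arg`, `square_worker`, and `run_workers` are hypothetical names for illustration:

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4   /* hypothetical thread count for this sketch */

/* Hypothetical per-thread argument: an input value and a slot for the result. */
typedef struct {
    int value;
    int result;
} worker_arg;

/* Thread entry point: must take and return void *. */
static void *square_worker(void *p)
{
    worker_arg *arg = p;
    arg->result = arg->value * arg->value;
    return NULL;
}

/* Starts NUM_WORKERS threads, waits for all of them, and sums their results. */
static int run_workers(void)
{
    pthread_t tid[NUM_WORKERS];
    worker_arg args[NUM_WORKERS];
    int sum = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        args[i].value = i + 1;
        int rc = pthread_create(&tid[i], NULL, square_worker, &args[i]);
        assert(rc == 0);
        (void) rc;
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(tid[i], NULL);   /* results are safe to read after join */
        sum += args[i].result;
    }
    return sum;
}
```

Each thread writes only to its own `worker_arg`, so no locking is needed; the main thread reads the results only after `pthread_join` returns. On older toolchains the program may need to be built with `-pthread`.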
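for (j = 0; j < THREAD_NUM; j++) {
    /* Omitted here: packing the original raytracing() parameters into a struct
       that is passed to the thread; see my GitHub for details. */
    input *box = malloc(sizeof(input));
    /* Pass the struct pointer itself; &box would be the address of a loop-local variable. */
    rc = pthread_create(&tid[j], &attr, raytracing, (void *) box);
}
/* Wait for every worker thread to finish. */
for (j = 0; j < THREAD_NUM; j++)
    rc = pthread_join(tid[j], NULL);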
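A self-contained sketch of the create/join pattern above, assuming an `input` struct with the `j`, `width`, and `height` fields that the thread body reads; `rows_done`, `worker`, and `render_rows` are hypothetical names added for illustration, and the real struct in the author's repository likely carries more fields. Each thread gets its own heap-allocated struct, which stays valid for the thread's lifetime and is freed only after `pthread_join`:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define THREAD_NUM 4

/* j, width, height mirror the fields used by the thread body;
 * rows_done is a hypothetical output field added for this sketch. */
typedef struct {
    int j;          /* first row handled by this thread */
    int width;
    int height;
    int rows_done;
} input;

static void *worker(void *p)
{
    input *box = p;              /* p is the struct pointer itself */
    box->rows_done = 0;
    for (int j = box->j; j < box->j + box->height / THREAD_NUM; j++)
        box->rows_done++;        /* stand-in for rendering one row */
    return NULL;
}

/* Splits `height` rows across THREAD_NUM threads and returns the rows processed. */
static int render_rows(int width, int height)
{
    pthread_t tid[THREAD_NUM];
    input *boxes[THREAD_NUM];
    int total = 0;

    for (int t = 0; t < THREAD_NUM; t++) {
        boxes[t] = malloc(sizeof *boxes[t]);
        assert(boxes[t] != NULL);
        boxes[t]->j = t * (height / THREAD_NUM);
        boxes[t]->width = width;
        boxes[t]->height = height;
        int rc = pthread_create(&tid[t], NULL, worker, boxes[t]);
        assert(rc == 0);
        (void) rc;
    }
    for (int t = 0; t < THREAD_NUM; t++) {
        pthread_join(tid[t], NULL);
        total += boxes[t]->rows_done;
        free(boxes[t]);          /* safe: the thread has finished */
    }
    return total;
}
```

Keeping the pointers in an array lets the main thread read results and release memory after the join, instead of leaking one allocation per thread.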
for (int j = box->j; j < (box->j + box->height / THREAD_NUM); j++) {
    for (int i = 0; i < box->width; i++) {
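The bound `box->height / THREAD_NUM` uses integer division, so when the height is not a multiple of the thread count the leftover rows belong to no thread. One possible way to cover every row is sketched below, assuming contiguous row bands; `row_range` is a hypothetical helper, not a function from the original code:

```c
#include <assert.h>

/* Hypothetical helper: computes the half-open row range [*start, *end)
 * for thread `t` out of `nthreads`, so that all `height` rows are covered. */
static void row_range(int height, int nthreads, int t, int *start, int *end)
{
    assert(nthreads > 0 && t >= 0 && t < nthreads);
    int base = height / nthreads;   /* rows every thread gets */
    int extra = height % nthreads;  /* leftover rows, one each to the first threads */
    *start = t * base + (t < extra ? t : extra);
    *end = *start + base + (t < extra ? 1 : 0);
}
```

For example, 10 rows over 4 threads yields the bands [0, 3), [3, 6), [6, 8), and [8, 10). When the height divides evenly, the result is the same split as the loop above.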
THREAD_NUM EXEC_TIME || THREAD_NUM EXEC_TIME
512 0.996285 sec || 256 1.016958 sec
128 1.003348 sec || 64 1.022087 sec
32 0.957354 sec || 16 0.965541 sec
8 0.968322 sec || 4 1.089226 sec
2 1.217084 sec || 1 2.160865 sec
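The table can be read as speedup relative to the single-thread run. A minimal sketch, assuming speedup is defined as the 1-thread time divided by the N-thread time; `speedup` is a hypothetical helper name:

```c
#include <assert.h>

/* Speedup of a run taking `t_n` seconds relative to a baseline of `t_1` seconds. */
static double speedup(double t_1, double t_n)
{
    assert(t_n > 0.0);
    return t_1 / t_n;
}
```

With the measurements above, `speedup(2.160865, 1.217084)` is about 1.78 for 2 threads and `speedup(2.160865, 0.957354)` is about 2.26 for 32 threads, the fastest row in the table. Beyond 8 threads the times stay within a few hundredths of a second of each other.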
Performance counter stats for './raytracing' (100 runs):
98,242 cache-misses #11.745 % of all cache refs ( +- 0.80% )
836,439 cache-references ( +- 0.80% )
545,838,551 branch ( +- 0.00% )
10,597,581,108 instructions #1.19 insns per cycle ( +- 0.00% )
8,893,931,480 cycles ( +- 0.13% )
0.860754705 seconds time elapsed ( +- 0.26% )