owned this note
owned this note
Published
Linked with GitHub
# 2016q3 Homework 1 (raytracing)
###### tags: `sysprog21` `JongSyuGithub`
contributed by <`JonSyuGithub`>
github: https://github.com/JonSyuGithub/raytracing
## Environment
* OS: Ubuntu 16.04 LTS
* CPU: [i5-457](http://ark.intel.com/products/75043/Intel-Core-i5-4570-Processor-6M-Cache-up-to-3_60-GHz)
* Memory 12GB ($ cat /proc/meminfo)
* Update & Install
```
$ sudo apt-get update
$ sudo apt-get install graphviz
$ sudo apt-get install imagemagick
```
## Objective
* try reducing execution time
## Original Version
* make & run
```
$ make
$ ./raytracing
```
```
# Rendering scene
Done!
Execution time of raytracing() : 2.403777 sec
```
* make(+gopro) & run
```
$ make clean
$ make PROFILE=1
$ ./raytracing
```
```
# Rendering scene
Done!
Execution time of raytracing() : 5.059268 sec
```
>> gopro計算函數執行指令數量,造成程式執行時間比原先程式未計算指令數量多了約2.6s [name=JonSyuGithub]
```
$ gprof ./raytracing | less
```
```
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
26.50 0.49 0.49 69646433 0.00 0.00 dot_product
21.09 0.88 0.39 56956357 0.00 0.00 subtract_vector
9.74 1.06 0.18 13861875 0.00 0.00 rayRectangularIntersection
8.11 1.21 0.15 17836094 0.00 0.00 add_vector
6.49 1.33 0.12 13861875 0.00 0.00 raySphereIntersection
5.41 1.43 0.10 10598450 0.00 0.00 normalize
4.87 1.52 0.09 31410180 0.00 0.00 multiply_vector
4.87 1.61 0.09 17821809 0.00 0.00 cross_product
3.79 1.68 0.07 4620625 0.00 0.00 ray_hit_object
2.16 1.72 0.04 4221152 0.00 0.00 multiply_vectors
2.16 1.76 0.04 1048576 0.00 0.00 ray_color
1.62 1.79 0.03 1 0.03 1.85 raytracing
1.08 1.81 0.02 2520791 0.00 0.00 idx_stack_top
1.08 1.83 0.02 2110576 0.00 0.00 compute_specular_diffuse
```
>> dot_product、subtract_vecotr呼叫次數最多 [name=JonSyuGithub]
* make(+gopro+**modified Makefile**) & run
```
# Rendering scene
Done!
Execution time of raytracing() : 0.623926 sec
```
>> 修改Makefile的CFLAGS = -std=gnu99 -Wall **-Ofast** -g經由compiler最佳化,發現執行時間從4.956461 sec降到0.623926 sec,接著嘗試不透過compiler最佳化自己修改程式碼,執行時間可以降到多低! [name=JonSyuGithub]
```clike=
static inline
double dot_product(const double *v1, const double *v2)
{
double dp = 0.0;
for (int i = 0; i < 3; i++)
dp += v1[i] * v2[i];
return dp;
}
static inline
void subtract_vector(const double *a, const double *b, double *out)
{
for (int i = 0; i < 3; i++)
out[i] = a[i] - b[i];
}
```
## Optimal Version
### Loop unrolling
```clike=
static inline
double dot_product(const double *v1, const double *v2)
{
double dp = 0.0;
dp = v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2];
return dp;
}
static inline
void subtract_vector(const double *a, const double *b, double *out)
{
out[0] = a[0] - b[0];
out[1] = a[1] - b[1];
out[2] = a[2] - b[2];
}
```
```
# Rendering scene
Done!
Execution time of raytracing() : 4.652799 sec
```
>> 程式執行時間5.059268 sec降到4.639686 sec,僅減少約0.4 sec [name=JonSyuGithub]
```
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
15.29 0.22 0.22 56956357 0.00 0.00 subtract_vector
14.94 0.44 0.22 69646433 0.00 0.00 dot_product
13.90 0.64 0.20 31410180 0.00 0.00 multiply_vector
11.12 0.80 0.16 10598450 0.00 0.00 normalize
9.73 0.94 0.14 13861875 0.00 0.00 rayRectangularIntersection
7.64 1.05 0.11 17836094 0.00 0.00 add_vector
7.30 1.15 0.11 17821809 0.00 0.00 cross_product
```
>> 可以發現dot_product執行次數比例從26.5%降到14.94%、substract_vecotr執行次數比例從21.09%降到15.25%,因為沒有for loop不需要進行predict,參考下方unrolling後的組合語言並無jmp
[name=JonSyuGithub]
```
```
$ objdump -d ~/2016_Fall_Embedded/raytracing/raytracing.o
```
00000000000001eb <subtract_vector>:
1eb: 55 push %rbp
1ec: 48 89 e5 mov %rsp,%rbp
1ef: 48 83 ec 18 sub $0x18,%rsp
1f3: e8 00 00 00 00 callq 1f8 <subtract_vector+0xd>
1f8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
1fc: 48 89 75 f0 mov %rsi,-0x10(%rbp)
200: 48 89 55 e8 mov %rdx,-0x18(%rbp)
204: 48 8b 45 f8 mov -0x8(%rbp),%rax
208: f2 0f 10 00 movsd (%rax),%xmm0
20c: 48 8b 45 f0 mov -0x10(%rbp),%rax
210: f2 0f 10 08 movsd (%rax),%xmm1
214: f2 0f 5c c1 subsd %xmm1,%xmm0
218: 48 8b 45 e8 mov -0x18(%rbp),%rax
21c: f2 0f 11 00 movsd %xmm0,(%rax)
220: 48 8b 45 e8 mov -0x18(%rbp),%rax
224: 48 83 c0 08 add $0x8,%rax
228: 48 8b 55 f8 mov -0x8(%rbp),%rdx
22c: 48 83 c2 08 add $0x8,%rdx
230: f2 0f 10 02 movsd (%rdx),%xmm0
234: 48 8b 55 f0 mov -0x10(%rbp),%rdx
238: 48 83 c2 08 add $0x8,%rdx
23c: f2 0f 10 0a movsd (%rdx),%xmm1
240: f2 0f 5c c1 subsd %xmm1,%xmm0
244: f2 0f 11 00 movsd %xmm0,(%rax)
248: 48 8b 45 e8 mov -0x18(%rbp),%rax
24c: 48 83 c0 10 add $0x10,%rax
250: 48 8b 55 f8 mov -0x8(%rbp),%rdx
254: 48 83 c2 10 add $0x10,%rdx
258: f2 0f 10 02 movsd (%rdx),%xmm0
25c: 48 8b 55 f0 mov -0x10(%rbp),%rdx
260: 48 83 c2 10 add $0x10,%rdx
264: f2 0f 10 0a movsd (%rdx),%xmm1
268: f2 0f 5c c1 subsd %xmm1,%xmm0
26c: f2 0f 11 00 movsd %xmm0,(%rax)
270: 90 nop
271: c9 leaveq
272: c3 retq
```
```
0000000000000436 <dot_product>:
436: 55 push %rbp
437: 48 89 e5 mov %rsp,%rbp
43a: 48 83 ec 20 sub $0x20,%rsp
43e: e8 00 00 00 00 callq 443 <dot_product+0xd>
443: 48 89 7d e8 mov %rdi,-0x18(%rbp)
447: 48 89 75 e0 mov %rsi,-0x20(%rbp)
44b: 66 0f ef c0 pxor %xmm0,%xmm0
44f: f2 0f 11 45 f8 movsd %xmm0,-0x8(%rbp)
454: 48 8b 45 e8 mov -0x18(%rbp),%rax
458: f2 0f 10 08 movsd (%rax),%xmm1
45c: 48 8b 45 e0 mov -0x20(%rbp),%rax
460: f2 0f 10 00 movsd (%rax),%xmm0
464: f2 0f 59 c8 mulsd %xmm0,%xmm1
468: 48 8b 45 e8 mov -0x18(%rbp),%rax
46c: 48 83 c0 08 add $0x8,%rax
470: f2 0f 10 10 movsd (%rax),%xmm2
474: 48 8b 45 e0 mov -0x20(%rbp),%rax
478: 48 83 c0 08 add $0x8,%rax
47c: f2 0f 10 00 movsd (%rax),%xmm0
480: f2 0f 59 c2 mulsd %xmm2,%xmm0
484: f2 0f 58 c8 addsd %xmm0,%xmm1
488: 48 8b 45 e8 mov -0x18(%rbp),%rax
48c: 48 83 c0 10 add $0x10,%rax
490: f2 0f 10 10 movsd (%rax),%xmm2
494: 48 8b 45 e0 mov -0x20(%rbp),%rax
498: 48 83 c0 10 add $0x10,%rax
49c: f2 0f 10 00 movsd (%rax),%xmm0
4a0: f2 0f 59 c2 mulsd %xmm2,%xmm0
4a4: f2 0f 58 c1 addsd %xmm1,%xmm0
4a8: f2 0f 11 45 f8 movsd %xmm0,-0x8(%rbp)
4ad: f2 0f 10 45 f8 movsd -0x8(%rbp),%xmm0
4b2: c9 leaveq
4b3: c3 retq
```
## References
* (A02: raytracing) [https://hackmd.io/KYdgDAJgjArFAsBaARgJmATkfKYBmiAHKgGwmICGF8ZeAxvBQMyF1A==?both]
* [2016q3 Homework1 (raytracing)](https://hackmd.io/s/Hka0DiLT)
>> 謝謝同學提供disassemble該關鍵字,能夠查詢編譯後的組合語言 [name=JobSyuGithub]
* [Disassemble one function using objdump](http://stackoverflow.com/questions/5125896/how-to-disassemble-a-binary-executable-in-linux-to-get-the-assembly-code)
* [2016 年春季系統程式課程作業 2 + 作業 3 解說](https://www.youtube.com/watch?v=m1RmfOfSwno)