# 2016q3 Homework1 (raytracing)
contributed by <`TempoJiJi`>
###### tags: `TempoJiJi` `raytracing` `sysprog21`
### Reviewed by `SarahYuHanCheng`
* Suggestion: name the expressions with abbreviations instead of stand-in names

## Expected goals
I have already studied raytracing from several angles, but I will do one overall review here and focus on the goals I did not reach last time.
Past notes: [group notes](https://embedded2016.hackpad.com/ep/pad/static/f5CCUGMQ4Kp), [personal notes](https://embedded2016.hackpad.com/ep/pad/static/s8ujtGxBML2)
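* Learn gprof
* Use gprof to locate the performance bottlenecks, then rewrite the function implementations, including those in math-toolkit.h
* Techniques such as POSIX Thread, OpenMP, software pipelining, and loop unrolling may be used to speed up the program
* Implement SIMD
## Development environment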
* Ubuntu 16.04 LTS
* L1d cache: 32K
* L1i cache: 32K
* L2 cache: 256K
* L3 cache: 3072K
* Architecture: x86_64
* CPU op-mode(s): 32-bit, 64-bit
* Byte Order: Little Endian
* CPU(s): 4
* Model name: Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
---
Performance of the unoptimized program:
```
# Rendering scene
Done!
Execution time of raytracing() : 3.338328 sec
```
Use gprof to observe the program's performance and behavior: add ==-pg== to the compile options in the Makefile, then rebuild and run the program, which produces a gmon.out file:
```clike
$ gprof raytracing gmon.out
% cumulative self self total
time seconds seconds calls s/call s/call name
24.91 0.60 0.60 69646433 0.00 0.00 dot_product
14.12 0.94 0.34 56956357 0.00 0.00 subtract_vector
10.38 1.19 0.25 13861875 0.00 0.00 rayRectangularIntersection
9.34 1.42 0.23 31410180 0.00 0.00 multiply_vector
8.72 1.63 0.21 10598450 0.00 0.00 normalize
7.06 1.80 0.17 4620625 0.00 0.00 ray_hit_object
4.98 1.92 0.12 17836094 0.00 0.00 add_vector
4.57 2.03 0.11 13861875 0.00 0.00 raySphereIntersection
4.15 2.13 0.10 17821809 0.00 0.00 cross_product
2.49 2.19 0.06 1048576 0.00 0.00 ray_color
2.08 2.24 0.05 1048576 0.00 0.00 rayConstruction
```
Here dot_product and subtract_vector are where most of the time goes; dot_product alone is called nearly 70 million times during the run.
## First step
Start by looking at math-toolkit.h:
### Loop Unrolling
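Advantages:
* Fewer branch mispredictions, since there are fewer branches for the branch predictor to get wrong.
* If the statements in the loop body have no data dependences on each other, there are more opportunities for parallel execution.
* Loop unrolling can also be applied dynamically at run time.

Disadvantages:
* Reduced readability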
```
static inline
double dot_product(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}
```
The original code computes dot_product with a loop, and every iteration of the loop involves a branch, which hurts performance, so the loop is unrolled (loop unrolling):
```clike=
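static inline
double dot_product(const double *v1, const double *v2)
{
    /* Loop unrolled by hand: no counter and no loop branch */
    double dp = 0.0;
    dp += v1[0] * v2[0];
    dp += v1[1] * v2[1];
    dp += v1[2] * v2[2];
    return dp;
}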
```
```
# Rendering scene
Done!
Execution time of raytracing() : 2.999242 sec
```
Changing dot_product alone already makes the program about 0.34 seconds faster (3.338 s to 2.999 s).
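
The same manual unrolling carries over to the other three-component helpers that rank high in the profile. The sketch below assumes that those helpers take `double` arrays of length 3 and write into an output array; the `_unrolled` names and the exact signatures are assumptions made for illustration, not copied from math-toolkit.h.

```c
/* Hypothetical unrolled variants of three-component vector helpers.
 * Names and signatures are assumptions made for this sketch. */

/* out = a - b, written out per component instead of looping */
static inline
void subtract_vector_unrolled(const double *a, const double *b, double *out)
{
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}

/* out = a + b */
static inline
void add_vector_unrolled(const double *a, const double *b, double *out)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
}

/* out = a * s, where s is a scalar */
static inline
void multiply_vector_unrolled(const double *a, double s, double *out)
{
    out[0] = a[0] * s;
    out[1] = a[1] * s;
    out[2] = a[2] * s;
}
```

Because the trip count is fixed at three, nothing is lost by dropping the loop. An optimizing compiler will often do the same unrolling by itself at higher optimization levels, so the gain is most visible in unoptimized builds.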
---
### Force Inline
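inline is only a "suggestion" to the compiler to expand the function body directly at the call site, which removes the call overhead. The compiler weighs the estimated cost of each option and decides whether to inline.
Advantages:
* Fewer function calls

Add __forceinline to the function declaration: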
```clike=
static inline __forceinline
double dot_product(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}
```
Compile with -D__forceinline="__attribute__((always_inline))" added to the compile options:
```
# Rendering scene
Done!
Execution time of raytracing() : 3.059393 sec
```
This is about 0.28 seconds faster than the original (3.338 s to 3.059 s). Next, use gprof to look at the function calls:
```
% cumulative self self total
time seconds seconds calls s/call s/call name
34.98 1.00 1.00 13861875 0.00 0.00 rayRectangularIntersection
22.38 1.64 0.64 13861875 0.00 0.00 raySphereIntersection
10.84 1.95 0.31 2110576 0.00 0.00 compute_specular_diffuse
9.44 2.22 0.27 2110576 0.00 0.00 localColor
7.17 2.43 0.21 1048576 0.00 0.00 ray_color
5.42 2.58 0.16 4620625 0.00 0.00 ray_hit_object
4.20 2.70 0.12 1048576 0.00 0.00 rayConstruction
```
The functions from math-toolkit.h no longer appear in the profile, because their bodies are now inlined into their callers.
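
Instead of injecting the definition through `-D`, the attribute can also be wrapped in a macro inside the source. The sketch below shows one way to do that, assuming GCC or Clang; `FORCE_INLINE` is a hypothetical name and not something the project defines.

```c
/* FORCE_INLINE is a hypothetical helper macro (an assumption for this sketch).
 * GCC and Clang honour __attribute__((always_inline)) even when optimization
 * is off; other compilers fall back to a plain inline hint. */
#if defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

FORCE_INLINE
double dot_product(const double *v1, const double *v2)
{
    double dp = 0.0;
    for (int i = 0; i < 3; i++)
        dp += v1[i] * v2[i];
    return dp;
}
```

Once a function is always inlined, gprof has no separate call to count, so its time is attributed to the callers. That matches the profile above, where rayRectangularIntersection and raySphereIntersection now lead the list.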
---
### Macro
Advantages:
* Fast execution: no stack push and pop is needed, so less time is lost

Disadvantages:
* A macro is expanded at every use, so heavy use increases code size and memory consumption

Rewrite the functions in math-toolkit.h as macros, which removes the time cost of function calls (pushing and popping the stack frame, and so on):
```clike=
#define DOT(a,b) (((a)[0]*(b)[0])+((a)[1]*(b)[1])+((a)[2]*(b)[2]))
```
```
# Rendering scene
Done!
Execution time of raytracing() : 2.741537 sec
```
This is about 0.26 seconds faster than loop unrolling (2.999 s to 2.742 s), so I decided to convert the more expensive functions in math-toolkit.h into macros as well.
The gprof results show which functions cost the most, so those are the ones to convert:
```clike
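/* Arguments are expanded textually: pass plain vectors without side effects */
#define DOT(a,b) (((a)[0]*(b)[0])+((a)[1]*(b)[1])+((a)[2]*(b)[2]))
#define SUB_VEC(a,b,c) ((c)[0]=(a)[0]-(b)[0], (c)[1]=(a)[1]-(b)[1], (c)[2]=(a)[2]-(b)[2])
#define MVEC(a,b,out) ((out)[0]=(a)[0]*(b), (out)[1]=(a)[1]*(b), (out)[2]=(a)[2]*(b))
#define ADD(a,b,c) ((c)[0]=(a)[0]+(b)[0], (c)[1]=(a)[1]+(b)[1], (c)[2]=(a)[2]+(b)[2])
/* Cross product; out must not alias a or b */
#define CDOT(a,b,out) ((out)[0]=((a)[1]*(b)[2])-((a)[2]*(b)[1]),(out)[1]=((a)[2]*(b)[0])-((a)[0]*(b)[2]),(out)[2]=((a)[0]*(b)[1])-((a)[1]*(b)[0]))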
```
```
# Rendering scene
Done!
Execution time of raytracing() : 1.634365 sec
```
Changing only math-toolkit.h made the whole program roughly twice as fast (3.338 s to 1.634 s), which shows that the bottleneck really was in math-toolkit.h.
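
A short usage sketch of such macros follows. Macro arguments are pasted in textually and evaluated more than once, so they should be plain array names or pointers. Two of the macros are repeated here with parenthesized arguments so that the block stands alone; `squared_distance` is a hypothetical helper, not part of the project.

```c
#define DOT(a, b) (((a)[0] * (b)[0]) + ((a)[1] * (b)[1]) + ((a)[2] * (b)[2]))
#define SUB_VEC(a, b, c) \
    ((c)[0] = (a)[0] - (b)[0], (c)[1] = (a)[1] - (b)[1], (c)[2] = (a)[2] - (b)[2])

/* Hypothetical helper: squared Euclidean distance between two 3D points. */
static double squared_distance(const double *p, const double *q)
{
    double diff[3];
    SUB_VEC(p, q, diff);     /* diff = p - q, expanded in place */
    return DOT(diff, diff);  /* |diff|^2, computed without a function call */
}
```

A call such as `DOT(p++, q)` would expand `p++` three times inside one expression, which is why arguments with side effects have to be avoided with this approach.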
---
## Implementing PThread and OpenMP
### OpenMP
Start with OpenMP by adding the following above the for loop:
```
#pragma omp parallel for num_threads(4) \
private(x), private(y)
```
num_threads is the number of threads to create. private ensures that each thread has its own independent copy of the listed variables, so a change made by one thread does not affect the others.
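
To illustrate the two clauses, here is a minimal sketch that is independent of the raytracer. `fill_gradient`, its parameters, and the scratch array `d` are hypothetical. The point is only that each thread receives its own uninitialized copy of a `private` variable, so the variable has to be written before it is read.

```c
#include <stdint.h>

/* Hypothetical example: fill a width x height grayscale image in parallel. */
void fill_gradient(uint8_t *pixels, int width, int height)
{
    double d[2]; /* scratch space; every thread gets its own copy below */

    /* Rows are divided among 4 threads. If OpenMP is not enabled at compile
     * time, the pragma is ignored and the loop runs serially with the same
     * result. */
    #pragma omp parallel for num_threads(4) private(d)
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++) {
            d[0] = (double) i / width;   /* written before being read */
            d[1] = (double) j / height;
            pixels[j * width + i] = (uint8_t) (255.0 * 0.5 * (d[0] + d[1]));
        }
    }
}
```

Variables declared inside the loop body are automatically private to each thread, so moving the declaration of `d` into the inner loop would make the `private` clause unnecessary.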
Add the following above the two nested for loops in raytracing:
```
#pragma omp parallel for num_threads(4) \
    private(stk), private(d), \
    private(object_color)
for (int j = 0 ; j < 512; j++) {
    for (int i = 0; i < 512; i++) {
        double r = 0, g = 0, b = 0;
        /* MSAA */
```
stk, d, and object_color are marked private here because each thread modifies their values.
Finally, remember to ==#include <omp.h>== and to add ==-fopenmp== to the compile options.
Result:
```
num_thread(4)
# Rendering scene
Done!
Execution time of raytracing() : 1.738263 sec
```
Comparison chart for different num_threads values:

---
### POSIX Thread
When the thread function needs more than one argument, all the arguments have to be packed into a struct when creating the pthread, so that they can be passed to the function:
```clike=
typedef struct __ARG {
    uint8_t *pixels;
    color background;
    rectangular_node rectangulars;
    sphere_node spheres;
    light_node lights;
    const viewpoint *View;
    int start_j;
    point3 u, v, w, d;
} arg;
```
Split the image into shares and hand each share to a different thread. Here I divide the rows into 4 shares and run them on 4 threads:
```clike=
/* (*data).start_j is the starting row of each thread */
for (int j = (*data).start_j ; j < 512; j+=MAXX) {
    for (int i = 0; i < 512; i++) {
        double r = 0, g = 0, b = 0;
        /* MSAA */
        ...
```
Result:
```
# Rendering scene
Done!
Execution time of raytracing() : 1.603646 sec
```
Comparison chart for different thread counts:

As the chart above shows, more threads do not necessarily mean better performance.
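
The argument packing and row interleaving described in this section can be sketched as a small standalone program fragment. Everything here is hypothetical (`row_arg`, `fill_rows`, `fill_parallel`, and `NUM_THREADS`, which is assumed to play the role of MAXX); the per-pixel work is replaced by a trivial store so that only the threading structure remains.

```c
#include <pthread.h>
#include <stdint.h>

#define NUM_THREADS 4 /* assumed thread count; stands in for MAXX */

/* Hypothetical argument bundle: a pthread start routine takes one void *. */
typedef struct {
    uint8_t *pixels;
    int width, height;
    int start_j; /* first row handled by this thread */
} row_arg;

/* Each thread handles rows start_j, start_j + NUM_THREADS, and so on. */
static void *fill_rows(void *p)
{
    const row_arg *data = p;
    for (int j = data->start_j; j < data->height; j += NUM_THREADS)
        for (int i = 0; i < data->width; i++)
            data->pixels[j * data->width + i] = (uint8_t) (j + 1);
    return NULL;
}

/* Returns 0 on success, -1 if a thread could not be created. */
int fill_parallel(uint8_t *pixels, int width, int height)
{
    pthread_t tid[NUM_THREADS];
    row_arg args[NUM_THREADS];
    int created = 0, status = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        args[t] = (row_arg){ pixels, width, height, t };
        if (pthread_create(&tid[t], NULL, fill_rows, &args[t]) != 0) {
            status = -1;
            break;
        }
        created++;
    }
    /* Join every started thread before args goes out of scope. */
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    return status;
}
```

Each thread gets its own `row_arg`, so no locking is needed as long as the threads write to disjoint rows. Interleaving rows rather than giving each thread one contiguous band tends to spread expensive regions of the image across threads, though whether that helps depends on the scene. Build with `-pthread`.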
---
## Best optimization: combining all methods
Combine all the methods, then compare against different compiler optimization levels:
