# 2017q1 Homework3(software-pipelining)
contributed by <`yanang`>
###### tags: `yanang`
## Development environment
* OS: Ubuntu 16.04 LTS
* L1d cache: 32K
* L1i cache: 32K
* L2 cache: 256K
* L3 cache: 3072K
* Architecture: x86_64
* CPU op-mode(s): 32-bit, 64-bit
* Byte Order: Little Endian
* CPU(s): 4
* Model name: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
* Kernel: Linux yanang 4.8.0-36-generic #36~16.04.1-Ubuntu SMP
## Development log
* Test run
```
yanang@yanang:~/prefetcher$ ./main
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
0 4 8 12
1 5 9 13
2 6 10 14
3 7 11 15
sse prefetch: 53820 us
sse: 153584 us
naive: 261000 us
```
* Reading through the code, intrinsics such as `_mm_unpacklo_epi32(...)` were all new to me, so I consulted [Programming trivia](https://www.randombit.net/bitbashing/2009/10/08/integer_matrix_transpose_in_sse2.html)
* * Code
```c=
__m128i T0 = _mm_unpacklo_epi32(I0, I1);
__m128i T1 = _mm_unpacklo_epi32(I2, I3);
__m128i T2 = _mm_unpackhi_epi32(I0, I1);
__m128i T3 = _mm_unpackhi_epi32(I2, I3);
I0 = _mm_unpacklo_epi64(T0, T1);
I1 = _mm_unpackhi_epi64(T0, T1);
I2 = _mm_unpacklo_epi64(T2, T3);
I3 = _mm_unpackhi_epi64(T2, T3);
```
* * How it works
![](https://www.randombit.net/bitbashing/_images/sse2_transpose.png)
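To make the two unpack stages concrete, here is a minimal sketch that wraps the eight intrinsics above in a function with loads and stores around them. The function name `transpose4x4_sse2` and the use of unaligned loads and stores are assumptions made for this sketch, not taken from the assignment code; the lane comments label the rows as a, b, c, d.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper for illustration: transpose a 4x4 matrix of
 * 32-bit ints stored row-major in src and write the result to dst. */
static void transpose4x4_sse2(const int *src, int *dst)
{
    /* Load the four rows; the unaligned variant accepts any pointer. */
    __m128i I0 = _mm_loadu_si128((const __m128i *)(src + 0));  /* a0 a1 a2 a3 */
    __m128i I1 = _mm_loadu_si128((const __m128i *)(src + 4));  /* b0 b1 b2 b3 */
    __m128i I2 = _mm_loadu_si128((const __m128i *)(src + 8));  /* c0 c1 c2 c3 */
    __m128i I3 = _mm_loadu_si128((const __m128i *)(src + 12)); /* d0 d1 d2 d3 */

    /* Stage 1: interleave 32-bit lanes of each pair of rows. */
    __m128i T0 = _mm_unpacklo_epi32(I0, I1); /* a0 b0 a1 b1 */
    __m128i T1 = _mm_unpacklo_epi32(I2, I3); /* c0 d0 c1 d1 */
    __m128i T2 = _mm_unpackhi_epi32(I0, I1); /* a2 b2 a3 b3 */
    __m128i T3 = _mm_unpackhi_epi32(I2, I3); /* c2 d2 c3 d3 */

    /* Stage 2: interleave 64-bit halves to finish each output row. */
    I0 = _mm_unpacklo_epi64(T0, T1); /* a0 b0 c0 d0 */
    I1 = _mm_unpackhi_epi64(T0, T1); /* a1 b1 c1 d1 */
    I2 = _mm_unpacklo_epi64(T2, T3); /* a2 b2 c2 d2 */
    I3 = _mm_unpackhi_epi64(T2, T3); /* a3 b3 c3 d3 */

    /* Store the four transposed rows. */
    _mm_storeu_si128((__m128i *)(dst + 0), I0);
    _mm_storeu_si128((__m128i *)(dst + 4), I1);
    _mm_storeu_si128((__m128i *)(dst + 8), I2);
    _mm_storeu_si128((__m128i *)(dst + 12), I3);
}
```

Feeding it the values 0 to 15 should reproduce the 4x4 matrix printed in the test run above, with rows and columns swapped.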
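## Data encapsulation
* First, restructure impl.c using the data encapsulation technique described in [你所不知道的c語言:物件導向設計篇](https://hackmd.io/s/HJLyQaQMl) (Things you didn't know about C: object-oriented design)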
```c=
#include <stdlib.h> /* malloc */

typedef struct object Object;
typedef void (*func_t)(Object *);

struct object {
    int *src;
    int *dst;
    int w;
    int h;
    func_t naive_transpose, sse_transpose, sse_prefetch_transpose;
};

static void naive_transpose_impl(Object *self){...}
static void sse_transpose_impl(Object *self){...}
static void sse_prefetch_transpose_impl(Object *self){...}

/* Allocate an Object and bind the three implementations.
 * Returns 0 on success, -1 if the allocation fails. */
int init_object(Object **self)
{
    if (NULL == (*self = malloc(sizeof(Object)))) return -1;
    (*self)->src = NULL; (*self)->dst = NULL;
    (*self)->w = 0; (*self)->h = 0;
    (*self)->naive_transpose = naive_transpose_impl;
    (*self)->sse_transpose = sse_transpose_impl;
    (*self)->sse_prefetch_transpose = sse_prefetch_transpose_impl;
    return 0;
}
```
* The calls in main.c change accordingly
```c=
Object *o = NULL;
init_object(&o);
o->src = testin;
o->dst = testout;
o->w = 4;
o->h = 4;
o->sse_transpose(o);
```
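Since the snippets above elide the method bodies, here is a reduced, self-contained sketch of the same pattern that can be compiled on its own. It keeps only the `naive_transpose` slot. The loop body of `naive_transpose_impl` is an assumption (a plain element-by-element transpose), and `free_object` is a hypothetical helper that does not appear in the original code.

```c
#include <stdlib.h>

typedef struct object Object;
typedef void (*func_t)(Object *);

struct object {
    int *src; /* input matrix, h rows by w columns, row-major */
    int *dst; /* output matrix, w rows by h columns, row-major */
    int w;
    int h;
    func_t naive_transpose; /* the full struct also has two SSE slots */
};

/* Assumed body (elided in the original): copy element (y, x) to (x, y). */
static void naive_transpose_impl(Object *self)
{
    for (int x = 0; x < self->w; x++)
        for (int y = 0; y < self->h; y++)
            self->dst[x * self->h + y] = self->src[y * self->w + x];
}

/* Allocate the object and bind its method; returns -1 if malloc fails. */
int init_object(Object **self)
{
    if (NULL == (*self = malloc(sizeof(Object)))) return -1;
    (*self)->src = NULL; (*self)->dst = NULL;
    (*self)->w = 0; (*self)->h = 0;
    (*self)->naive_transpose = naive_transpose_impl;
    return 0;
}

/* Hypothetical helper: release what init_object allocated. */
void free_object(Object *self)
{
    free(self);
}
```

A caller fills in `src`, `dst`, `w` and `h`, invokes `o->naive_transpose(o)`, and releases the object afterwards. Checking the return value of `init_object` before touching `o` avoids dereferencing a null pointer when the allocation fails.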
* Re-run
```
yanang@yanang:~/prefetcher$ ./main
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
0 4 8 12
1 5 9 13
2 6 10 14
3 7 11 15
sse prefetch: 59770 us
sse: 159602 us
naive: 298968 us
```
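## Modifying the Makefile to build separate executables
* Modify the Makefile to build one executable per implementation, following [phonebook/ Makefile](https://github.com/yanang/phonebook/blob/master/Makefile)
* Pass a macro with `gcc -D` and test it with `#ifdef` in main.c, so each build runs a different implementation; see [gcc -D option](http://blog.csdn.net/hnrainll/article/details/6330498)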
```makefile=
EXEC = naive sse sse_prefetch
all: $(GIT_HOOKS) $(EXEC)
naive: $(SRCS_common)
	$(CC) $(CFLAGS) -D NAIVE -o naive main.c
```
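The `#ifdef` side in main.c is not shown in this note, so the following is only a sketch of how the selection could look. Only `NAIVE` appears in the Makefile excerpt; the macro names `SSE` and `SSE_PREFETCH`, the helper names `impl_label` and `format_result`, and the fallback to the naive label when no macro is defined are all assumptions made for illustration.

```c
#include <stdio.h>

/* Pick a label at compile time from the macro passed with gcc -D.
 * SSE and SSE_PREFETCH are assumed names; only NAIVE is in the Makefile. */
#if defined(SSE_PREFETCH)
#define IMPL_LABEL "sse prefetch"
#elif defined(SSE)
#define IMPL_LABEL "sse"
#else
/* Covers -D NAIVE, and also acts as the sketch's default. */
#define IMPL_LABEL "naive"
#endif

/* Hypothetical helper: name of the implementation this build selected. */
static const char *impl_label(void)
{
    return IMPL_LABEL;
}

/* Hypothetical helper: format a timing line such as "naive: 292217 us". */
static int format_result(char *buf, size_t n, long usec)
{
    return snprintf(buf, n, "%s: %ld us", impl_label(), usec);
}
```

Building the same source three times with a different `-D` flag each time yields three binaries that differ only in which branch the preprocessor kept.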
* Re-run `$ make`
```
yanang@yanang:~/prefetcher$ ls
AUTHORS LICENSE Makefile README.md sse
impl.c main.c naive scripts sse_prefetch
```
* Output of `$ ./naive`
```
yanang@yanang:~/prefetcher$ ./naive
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
0 4 8 12
1 5 9 13
2 6 10 14
3 7 11 15
naive: 292217 us
```
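> The 4x4 transpose check shown above is also run by each executable using its own implementation (the printed values are simply identical)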
## cache miss test
```
Performance counter stats for './naive':
50,162,690 cache-misses # 90.244 % of all cache refs
55,585,510 cache-references
1,549,137,957 instructions # 1.41 insn per cycle
1,095,202,211 cycles
0.435941554 seconds time elapsed
naive: 289430 us
```
```
Performance counter stats for './sse':
14,958,038 cache-misses # 78.597 % of all cache refs
19,031,290 cache-references
1,277,425,751 instructions # 1.77 insn per cycle
723,107,078 cycles
0.302469477 seconds time elapsed
sse: 153426 us
```
```
Performance counter stats for './sse_prefetch':
12,584,878 cache-misses # 76.055 % of all cache refs
16,547,114 cache-references
1,336,032,213 instructions # 2.40 insn per cycle
556,822,904 cycles
0.206013703 seconds time elapsed
sse prefetch: 60255 us
```
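The derived figures in the three reports can be checked by hand. The sketch below uses hypothetical helper names and simply recomputes the miss rate, instructions per cycle, and the speedup relative to the naive run from the raw counters. Taking the timings printed in this section (289430 us, 153426 us, 60255 us), the SSE version comes out roughly 1.89x faster than naive and the SSE prefetch version roughly 4.80x faster.

```c
/* Hypothetical helpers that recompute the derived perf figures. */

/* Percentage of cache references that missed. */
static double miss_rate_percent(long long misses, long long refs)
{
    return 100.0 * (double)misses / (double)refs;
}

/* Instructions retired per cycle. */
static double insn_per_cycle(long long insns, long long cycles)
{
    return (double)insns / (double)cycles;
}

/* How many times faster the new timing is than the baseline. */
static double speedup(long base_us, long new_us)
{
    return (double)base_us / (double)new_us;
}
```

For example, `miss_rate_percent(50162690, 55585510)` gives about 90.244, matching the figure perf prints for `./naive`.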