# 2017q1 Homework3 (software-pipelining)
contributed by <`yanang`>
###### tags: `yanang`

## Development environment
* OS: Ubuntu 16.04 LTS
* L1d cache: 32K
* L1i cache: 32K
* L2 cache: 256K
* L3 cache: 3072K
* Architecture: x86_64
* CPU op-mode(s): 32-bit, 64-bit
* Byte Order: Little Endian
* CPU(s): 4
* Model name: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
* Linux yanang 4.8.0-36-generic #36~16.04.1-Ubuntu SMP

## Development log
* Test run
```
yanang@yanang:~/prefetcher$ ./main
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
0 4 8 12
1 5 9 13
2 6 10 14
3 7 11 15
sse prefetch: 53820 us
sse: 153584 us
naive: 261000 us
```
* Reading the code: intrinsics such as `_mm_unpacklo_epi32(...)` were new to me, so I consulted [Programming trivia](https://www.randombit.net/bitbashing/2009/10/08/integer_matrix_transpose_in_sse2.html)
    * Code
```c=
/* Interleave 32-bit elements of row pairs */
__m128i T0 = _mm_unpacklo_epi32(I0, I1);
__m128i T1 = _mm_unpacklo_epi32(I2, I3);
__m128i T2 = _mm_unpackhi_epi32(I0, I1);
__m128i T3 = _mm_unpackhi_epi32(I2, I3);
/* Combine 64-bit halves to form the transposed rows */
I0 = _mm_unpacklo_epi64(T0, T1);
I1 = _mm_unpackhi_epi64(T0, T1);
I2 = _mm_unpacklo_epi64(T2, T3);
I3 = _mm_unpackhi_epi64(T2, T3);
```
    * How it works
![](https://www.randombit.net/bitbashing/_images/sse2_transpose.png)

## Data encapsulation
* First, restructure impl.c using the Data Encapsulation approach described in [你所不知道的c語言:物件導向設計篇](https://hackmd.io/s/HJLyQaQMl) (a lecture on object-oriented design in C)
```c=
typedef struct object Object;
typedef void (*func_t)(Object *);

struct object {
    int *src;   /* input matrix */
    int *dst;   /* output (transposed) matrix */
    int w;
    int h;
    func_t naive_transpose, sse_transpose, sse_prefetch_transpose;
};

static void naive_transpose_impl(Object *self){...}
static void sse_transpose_impl(Object *self){...}
static void sse_prefetch_transpose_impl(Object *self){...}

/* Allocate an Object and bind the three implementations; returns -1 on allocation failure */
int init_object(Object **self){
    if (NULL == (*self = malloc(sizeof(Object))))
        return -1;
    (*self)->src = 0;
    (*self)->dst = 0;
    (*self)->w = 0;
    (*self)->h = 0;
    (*self)->naive_transpose = naive_transpose_impl;
    (*self)->sse_transpose = sse_transpose_impl;
    (*self)->sse_prefetch_transpose = sse_prefetch_transpose_impl;
    return 0;
}
```
* The calls in main.c change accordingly
```c=
Object *o = NULL;
init_object(&o);
o->src = testin;
o->dst = testout;
o->w = 4;
o->h = 4;
o->sse_transpose(o);
```
* Re-run
```
yanang@yanang:~/prefetcher$ ./main
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
0 4 8 12
1 5 9 13
2 6 10 14
3 7 11 15
sse prefetch: 59770 us
sse: 159602 us
naive: 298968 us
```

## Modify the Makefile to build separate executables
* Modify the Makefile to produce one executable per implementation, following [phonebook/ Makefile](https://github.com/yanang/phonebook/blob/master/Makefile)
* With gcc's `-D` option, `#ifdef` in main.c selects a different code path for each build; see [gcc -D option](http://blog.csdn.net/hnrainll/article/details/6330498)
```makefile=
EXEC = naive sse sse_prefetch
all: $(GIT_HOOKS) $(EXEC)

naive: $(SRCS_common)
	$(CC) $(CFLAGS) -D NAIVE -o naive main.c
```
* Re-run `$ make`
```
yanang@yanang:~/prefetcher$ ls
AUTHORS  LICENSE  Makefile  README.md  sse
impl.c   main.c   naive     scripts    sse_prefetch
```
* Result of `$ ./naive`
```
yanang@yanang:~/prefetcher$ ./naive
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
0 4 8 12
1 5 9 13
2 6 10 14
3 7 11 15
naive: 292217 us
```
> The 4x4 transpose check shown above is now run by each executable using its own implementation (the printed values are simply the same).

## cache miss test
```
Performance counter stats for './naive':

     50,162,690   cache-misses       # 90.244 % of all cache refs
     55,585,510   cache-references
  1,549,137,957   instructions       # 1.41 insn per cycle
  1,095,202,211   cycles

    0.435941554 seconds time elapsed
naive: 289430 us
```
```
Performance counter stats for './sse':

     14,958,038   cache-misses       # 78.597 % of all cache refs
     19,031,290   cache-references
  1,277,425,751   instructions       # 1.77 insn per cycle
    723,107,078   cycles

    0.302469477 seconds time elapsed
sse: 153426 us
```
```
Performance counter stats for './sse_prefetch':

     12,584,878   cache-misses       # 76.055 % of all cache refs
     16,547,114   cache-references
  1,336,032,213   instructions       # 2.40 insn per cycle
    556,822,904   cycles

    0.206013703 seconds time elapsed
sse prefetch: 60255 us
```
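
The unpack sequence from the development log can also be tried in isolation. The following is a minimal sketch, assuming an x86/x86_64 target with SSE2 and a contiguous, row-major 4x4 `int` matrix. The names `transpose4x4_naive`, `transpose4x4_sse2` and `transpose4x4_selfcheck` are hypothetical and do not come from impl.c; unaligned loads and stores are used so the sketch does not depend on buffer alignment.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; assumes an x86/x86_64 target */

/* Reference version: plain nested-loop transpose of a row-major 4x4 matrix. */
void transpose4x4_naive(const int src[16], int dst[16])
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            dst[c * 4 + r] = src[r * 4 + c];
}

/* SSE2 version (hypothetical helper, not taken from impl.c).
 * Rows are written a, b, c, d; a0 means element 0 of row a. */
void transpose4x4_sse2(const int src[16], int dst[16])
{
    /* Load the four rows; loadu tolerates unaligned addresses. */
    __m128i I0 = _mm_loadu_si128((const __m128i *) (src + 0));  /* a0 a1 a2 a3 */
    __m128i I1 = _mm_loadu_si128((const __m128i *) (src + 4));  /* b0 b1 b2 b3 */
    __m128i I2 = _mm_loadu_si128((const __m128i *) (src + 8));  /* c0 c1 c2 c3 */
    __m128i I3 = _mm_loadu_si128((const __m128i *) (src + 12)); /* d0 d1 d2 d3 */

    /* Step 1: interleave 32-bit elements of row pairs. */
    __m128i T0 = _mm_unpacklo_epi32(I0, I1); /* a0 b0 a1 b1 */
    __m128i T1 = _mm_unpacklo_epi32(I2, I3); /* c0 d0 c1 d1 */
    __m128i T2 = _mm_unpackhi_epi32(I0, I1); /* a2 b2 a3 b3 */
    __m128i T3 = _mm_unpackhi_epi32(I2, I3); /* c2 d2 c3 d3 */

    /* Step 2: combine 64-bit halves to obtain the transposed rows. */
    I0 = _mm_unpacklo_epi64(T0, T1); /* a0 b0 c0 d0 */
    I1 = _mm_unpackhi_epi64(T0, T1); /* a1 b1 c1 d1 */
    I2 = _mm_unpacklo_epi64(T2, T3); /* a2 b2 c2 d2 */
    I3 = _mm_unpackhi_epi64(T2, T3); /* a3 b3 c3 d3 */

    _mm_storeu_si128((__m128i *) (dst + 0), I0);
    _mm_storeu_si128((__m128i *) (dst + 4), I1);
    _mm_storeu_si128((__m128i *) (dst + 8), I2);
    _mm_storeu_si128((__m128i *) (dst + 12), I3);
}

/* Compare both versions on the matrix 0..15; aborts via assert on mismatch. */
void transpose4x4_selfcheck(void)
{
    int in[16], ref[16], out[16];
    for (int i = 0; i < 16; i++)
        in[i] = i;
    transpose4x4_naive(in, ref);
    transpose4x4_sse2(in, out);
    for (int i = 0; i < 16; i++)
        assert(ref[i] == out[i]);
}
```

Calling `transpose4x4_selfcheck()` from a small `main` is enough to try it. For the input 0..15 the first transposed row is 0 4 8 12, matching the test run in the development log.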