# 2017q3 Homework1 (phonebook) contributed by <`kevin550029`> ## Perf 工具練習 ### 安裝&初步測試 1. 按照 [perf 原理和實務](https://hackmd.io/s/B11109rdg) 中的步驟安裝perf 2. 嘗試執行 `perf top` 會出現錯誤訊息, 表示只能在root權限下使用 ![error message](https://i.imgur.com/rpdDV6L.png) 3. 嘗試輸入下列指令, 但仍無法更改 restrict ``` $ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict" ``` 4. 解決方法是在另外開一個 Terminal , 並在 Kernal 模式下執行 5. 利用 `perf top -p $pid` 測試 perf_top_example, 結果如下 ![](https://i.imgur.com/GJcH37s.png) #### 練習使用perf stat * 測試用程式碼 `stat_test.c` ```clike= #include <stdio.h> #include <stdint.h> static char array[10000][10000]; int main (void){ int i, j; for (i = 0; i < 10000; i++) for (j = 0; j < 10000; j++) array[j][i]++; return 0; } ``` * 使用 perf stat 分析 `stat_test.c` cache miss 情形 ``` g++ -c stat_test.c g++ stat_test.o -o stat perf stat ./stat ``` 得到結果如下 ``` Performance counter stats for './stat' (5 runs): 493,7852 cache-misses # 2.241 % of all cache refs ( +- 1.18% ) 2,2034,9691 cache-references ( +- 0.15% ) 22,0592,2702 instructions # 2.04 insns per cycle ( +- 0.00% ) 10,8220,2494 cycles ( +- 0.19% ) 0.344483546 seconds time elapsed ( +- 1.39% ) ``` 考慮存取的局部性,將 i,j對調改成array[i][j]++ ```clike= #include <stdio.h> #include <stdint.h> static char array[10000][10000]; int main (void){ int i, j; for (i = 0; i < 10000; i++) for (j = 0; j < 10000; j++) array[i][j]++; return 0; } ``` cache-references 從 2,2034,9691 下降到 519,4173 ``` 201,6420 cache-misses # 38.821 % of all cache refs ( +- 1.78% ) 519,4173 cache-references ( +- 0.75% ) 22,0580,7841 instructions # 2.74 insns per cycle ( +- 0.00% ) 8,0382,8158 cycles ( +- 0.04% ) 0.257051135 seconds time elapsed ( +- 2.12% ) ``` --- ## Phonebook ### 測試環境 ``` Linux version 4.4.0-96-generic Ubuntu 16.04.3 LTS L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 94 Model name: Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz ``` ### 測試原始程式碼 ``` size of entry : 136 bytes execution time of append() : 0.038564 sec execution time of findName() : 0.005541 sec Performance counter stats for './phonebook_orig' (100 runs): 463,5331 cache-misses # 93.928 % of all cache refs ( +- 0.86% ) 493,4994 cache-references ( +- 0.83% ) 2,5927,9283 instructions # 1.39 insns per cycle ( +- 0.53% ) 1,8636,6232 cycles ( +- 0.66% ) 0.061023413 seconds time elapsed ( +- 1.04% ) ``` > cache-misses 為 93.928 % * phonebook_orig.c ```clike= #include <stdio.h> #include <stdlib.h> #include <string.h> #include <ctype.h> #include "phonebook_orig.h" /* original version */ entry *findName(char lastName[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastName, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } return NULL; } entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; } ``` > findname 和 append 皆使用 Linklist 的結構 > Linklist 插入或刪除節點較為容易 > 但在搜尋存取上必須花時間去取出鏈結指標, 以便讀取下一個節點 > 若資料量大, 則須花較多時間搜尋 * phonebook_orig.h ```clike= #ifndef _PHONEBOOK_H #define _PHONEBOOK_H #define MAX_LAST_NAME_SIZE 16 /* original version */ typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; entry *findName(char lastName[], entry *pHead); entry *append(char lastName[], entry *e); #endif ``` ## 修改Struct內容 * 從減少 cache-miss 角度出發 * 先嘗試將struct中尚未使用到的內容註解掉 ```clike= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; /* char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; */ struct __PHONE_BOOK_ENTRY *pNext; } entry; entry *findName(char lastName[], entry *pHead); entry *append(char lastName[], entry *e); ``` > 可以發現 cache-misses 的情況下降到 68.699%, > append 和 findname 的 performance 都有增加 ``` Performance counter stats for './phonebook_opt' (100 runs): 152,6120 cache-misses # 68.699 % of all cache refs ( +- 0.46% ) 222,1462 cache-references ( +- 0.12% ) 2,4388,2112 instructions # 2.03 insns per cycle ( +- 0.02% ) 1,1994,1058 cycles ( +- 0.22% ) 0.039064779 seconds time elapsed ( +- 0.33% ) ``` ![](https://i.imgur.com/bM07eho.png) * 縮小 Struct 內容 ```clike= typedef struct __PHONE_BOOK_USER_INFO{ char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } UserInfoEntry; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; UserInfoEntry *UserInfo; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` > cache-misses 的情況下降到 70.090%, ``` Performance counter stats for './phonebook_opt' (100 runs): 154,9857 cache-misses # 70.090 % of all cache refs ( +- 0.20% ) 221,1253 cache-references ( +- 0.11% ) 2,4384,1941 instructions # 2.02 insns per cycle ( +- 0.02% ) 1,2076,2749 cycles ( +- 0.13% ) 0.039567825 seconds time elapsed ( +- 0.65% ) ``` ![](https://i.imgur.com/XDawH7F.png) ## 參考資料 [共筆 ryanwang522](https://hackmd.io/s/Sk-qaWiYl) [共筆 ZixinYang](https://hackmd.io/s/B1qEBZFoW) [共筆 tina0405](https://hackmd.io/GYFgpghgrAHDEFoAmUBsSHgMwCMERAGMBOBQgdlQEYAGGKnVYiGIA===?view) [perf 原理和實務](https://hackmd.io/s/B11109rdg) [hash function 觀念和實務](https://hackmd.io/s/HJln3jU_e) [Git 的基本使用](https://gogojimmy.net/2012/01/17/how-to-use-git-1-git-basic/)