# 2017q3 Homework1 (phonebook) contributed by `<jackyhobingo>` ## 系統環境 硬體規格 ```shell $ lscpu Architecture: x86_64 CPU 作業模式: 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 每核心執行緒數:2 每通訊端核心數:4 Socket(s): 1 NUMA 節點: 1 供應商識別號: GenuineIntel CPU 家族: 6 型號: 158 Model name: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz 製程: 9 CPU MHz: 4022.314 CPU max MHz: 4200.0000 CPU min MHz: 800.0000 BogoMIPS: 7200.00 虛擬: VT-x L1d 快取: 32K L1i 快取: 32K L2 快取: 256K L3 快取: 8192K NUMA node0 CPU(s): 0-7 ``` 作業系統 ```shell $ uname -a Linux jackyhobingo-D830MT 4.10.0-35-generic #39~16.04.1-Ubuntu SMP Wed Sep 13 09:02:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux ``` ## 分析原版程式 電話簿資料結構是 linked list function findName 的時間複雜度為 O(n) function append 的時間複雜度為 O(1) 而且在搜尋及添加時只需要用到lastName的資料 程式會將過多不必要的資料 (firstName, email, and other unimportant data) 存取到 cache,程式執行時 cache 可以存取的資料筆數過少,造成容易 cache miss ```C /* phonebook_orig.h */ typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` ```c /* phonebook_orig.c */ entry *findName(char lastName[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastName, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } return NULL; } entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; } ``` ## 測試原版程式 ```shell $ sudo make cache-test perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ ./phonebook_orig size of entry : 136 bytes execution time of append() : 0.041267 sec execution time of findName() : 0.004651 sec ... Performance counter stats for './phonebook_orig' (100 runs): 4,428,668 cache-misses # 89.994 % of all cache refs ( +- 0.21% ) 4,921,091 cache-references ( +- 0.14% ) 261,751,981 instructions # 1.22 insn per cycle ( +- 0.02% ) 214,970,704 cycles ( +- 0.57% ) 0.053889418 seconds time elapsed ( +- 0.60% ) ``` ## 最佳化策略 先不修改 phonebook_opt.c 先使用 phonebook_orig.c 的寫法,只先修改資料結構,將phonebook_opt.h 中冗餘的資料另外存放。 ```C /* phonebook_opt.h */ typedef struct __PHONE_BOOK_OTHER_INFO { // new data structure char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } otherInfo; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; otherInfo * info; // new pointer struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 產生圖表,新的資料結構明顯改善效能,新的進入點 Size 只剩下 32 bytes,在 append() 時 malloc() 所配置的空間變小,因此加快append() 的速度,此外在 L1-cache 可以存放的資料筆數多上 4.25 倍,cache-misses降低,findName() 效能提高。 ```sh $ make plot ``` ![](https://i.imgur.com/R4RgjuT.png) ```sh $ sudo make cache-test ... Performance counter stats for './phonebook_opt' (100 runs): 1,186,638 cache-misses # 54.303 % of all cache refs ( +- 0.40% ) 2,185,228 cache-references ( +- 0.14% ) 244,863,348 instructions # 2.11 insn per cycle ( +- 0.02% ) 116,021,194 cycles ( +- 0.39% ) 0.028874041 seconds time elapsed ( +- 0.43% ) ``` ## 參考資料 [ktvexe開發紀錄(phonebook)](https://hackmd.io/s/rk5ayZDKx#)