2017q3 Homework1 (phonebook)

# 2017q3 Homework1 (phonebook) contributed by <`FATESAIKOU`> ### 環境說明 ![](https://i.imgur.com/Xn3Dv2r.png) * CPU: Intel® Core™ i7-6700 CPU @ 3.40GHz * Memory: 40GB * Cache: * L1d/i: 32KB * L2: 256KB * L3: 8192KB ### 優化前效能測試效能前，首先查看範例程式碼當中，是如何進行效能測試，這邊我們看到其中的 main.c。 #### __buildin_clear_cache 首先引起我們的興趣的是從 main.c 56 行開始的： ```clike= #if defined(__GNUC__) __builtin___clear_cache((char *) pHead, (char *) pHead + sizeof(entry)); #endif ``` 參考了[共筆](https://embedded2016.hackpad.com/ep/pad/static/SS3c9W4KM1S)才發現這是 [gnugcc](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) 所提供的 build-in function，目的在於清除之前程式執行時可能留下的指令快取，避免效能測定的不確定性。 #### clock_gettime 接著是 main.c 59 行： ```clike= clock_gettime(CLOCK_REALTIME, &start) ``` 參考[此處](https://starforcefield.wordpress.com/2012/08/06/c%E8%AA%9E%E8%A8%80%EF%BC%9A%E4%BD%BF%E7%94%A8clock_gettime%E8%A8%88%E7%AE%97%E7%A8%8B%E5%BC%8F%E7%A2%BC%E7%9A%84%E6%99%82%E9%96%93%E9%9C%80%E6%B1%82/)的說明，此函式用於計時時十分精準，有機會到達柰秒的精準度等級，我使用參考處提供的程式碼實際測量一次自己電腦的精準度等級： timer_test.c ```clike= #include <stdio.h> #include <stdlib.h> #include <time.h> int main() { struct timespec tt; clock_getres(CLOCK_REAL_TIME, &tt); printf("Resolution: %ld\n", tt.tv_nsec); return 0; } ``` 結果： ``` Resolution: 1 ``` 很幸運的，電腦的精細度可以達到 tv_nsec 的最小值，也就是柰秒。 #### append 接著我們在 main.c 第 65 行的位置發現了 append，也就是插入資料進電話簿的部份，接著往深處看發現到其資料結構定義於 phonebook_orig.h 並且實做 append 方法於 phonebook_orig.c： phonebook_orig.h: entry ```clike= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } ``` 可以發現電話簿中每一筆紀錄都十分龐大，並且本身只包含一個指標。 phonebook_orig.c: append ```clike= enrty *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; } ``` append 很明顯只是將傳進來的名字放入新分配的空間，接著接上傳進來的節點，並且傳出這個新節點。 main.c: ```clike= while (fgets(line, sizeof(line), fp)) { while (line[i] != '\0') i ++; line[i - 1] = '\0'; i = 0; e = append(line, e); } ``` 至此很明顯可以發現此處使用的是單向的 Linked List 作為儲存的資料結構，並且透過這種 append，可以省去插入時搜尋結尾的時間消耗，但同時也必須維護一份 pHead。 #### fineName 接著我們看到了一個實際進行搜尋的部份，位於 main.c 的 88 行，其對 "zyxel" 呼叫了 findName 並且同時為其計時。 ```clike= char input[MAX_LAST_NAME_SIZE] = "zyxel"; ... clock_gettime(CLOCK_REALTIME, &start); findName(input, e); clock_gettime(CLOCK_REALTIME, &end); ``` 而我們仍然可以在 phonebook_orig.c 當中找到 findName 的實作。 phonebook_orig.c: findName ```clike= entry *findName(char lastName[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastName, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } return NULL; } ``` 上面的 findName 採用 linear search 的方式，搜尋整條 pHead 所指向的 Linked List。 #### Report 在 main.c 的第 93 與 96 以及 97 行，程式最後輸出了==建立電話簿資料庫==以及==搜尋字串zyxel==所耗費的時間。 #### make plot 接著可以看看實際產生報表的地方(Makefile plot)，循著相依性，發現到其是執行 phonebook_orig 以及將來要優化的 phonebook_opt 各100次，寫入各自的 orig.txt 以及 opt.txt，接著透過 calculate.c 轉換為 output.txt 交給 gnuplot script runtime .gp 進行繪圖，產生報表，但由於 phonebook_opt 的部份因還未開始撰寫，因此去除 phonebook_opt 的報表。以下便是範例程式產生的報表。 ![](https://i.imgur.com/2fYjxDq.png) 讀入的資料庫為 words.txt，總共 349900 筆資料。 #### Bug 另外 main.c 當中還有個值得注意的 bug 位於 99 行 ```clike= if (pHead->pNext) free(pHead->pNext); free(pHead); ``` 很容易可以注意到這樣是記憶體釋放不全。 ### 尋找問題透過上面的 report 我們可以看到這個未優化的版本對於 words.txt 的資料以及搜尋 zyxel 時的效能表現，但這樣的資訊實在太過片面，我們希望能透過同時調整```資料庫大小```、```資料庫內字串長度分佈```以及```搜尋的目標字串長度```，加上觀察 append 以及 findName 的效能表現變化，以及對照 perf 產生的資料，嘗試找出效能低落的原因並改善。 #### make more plot 首先我們看看輸入的 words.txt 上的資料分佈情形： ![](https://i.imgur.com/19uDGsS.png) 上圖是以字串長度配合計數進行繪圖，可以發現其中長度為 8 的字串最常出現，接著開始向兩側遞減。接著以下針對調整載入的電話簿大小，同時使用常態分佈並調變 $\mu$ 控制載入的字串長度，其中 $\sigma$ 為求作圖方便固定為 $\mu/2$，最後將獲得的資料進行繪圖整理。 * append ![]() ![]() * findName ![]() ![]() * cache-misses ![]() ![]() * cache-references ![]() ![]() * cache-miss-rate ![]() ![]() ### 優化實驗接著我們希望能透過一些優化，加速本程式的```載入```以及```查詢```速度。 #### 快取優化 ##### structure simplize ##### memory pool y = (append, findName, alloc speed => cache miss) x = (db entry num), y = (average string length) #### 查詢加速 ##### hash ##### hash + linked list ##### hash(Soundex) + BST x = (db entry num), y = (average string length), z = (append, findName, encode speed => collision, cache miss) (branch predict?) #### hash table 大小影響 x = (db entry num), y = (hash table size), z = (append, findName => collision, cache miss) (上述三方法之綜合繪圖) ### dataset 更換 #### Sorted / Not sorted ##### 分布圖 (orig + opt\*3) y = (accumulation) x = (db string len) ##### 效能比較 y = (append, findName, encode speed => collision, cache miss) x = (db entry num) #### 原生資料 / 現實資料 ##### 分布圖 y = (accumulation) x = (db string len) ##### 效能比較 y = (append, findName, encode speed => collision, cache miss) x = (db entry num) ### 結果討論 ### 參考資料 * [gnuplot 基礎教學系列](https://www.youtube.com/playlist?list=PLgtzOcizoxeuMeNUu_WaL6P2wKlrRSgsj) * [clock_gettime / clock_getres](https://starforcefield.wordpress.com/2012/08/06/c%E8%AA%9E%E8%A8%80%EF%BC%9A%E4%BD%BF%E7%94%A8clock_gettime%E8%A8%88%E7%AE%97%E7%A8%8B%E5%BC%8F%E7%A2%BC%E7%9A%84%E6%99%82%E9%96%93%E9%9C%80%E6%B1%82/) * [過去 hackpad 共筆](https://embedded2016.hackpad.com/ep/pad/static/SS3c9W4KM1S) * [gnugcc](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html)