# 2017q1 Homework1(phonebook) contributed by < `potinglee` > ### Reviewed by `yachiyang01` -git commit message需用命令語氣,內文與標題需一個空白行 ### Reviewed by `harryoooooooooo` * C/C++標準中,free(), delete等函式不須再另外判斷是否為NULL,傳入NULL並不會發生錯誤 >> 注意一致的格式 [name="jserv"] >進度嚴重落後,請趕工![name=課程助教][color=red] ## 開發環境 * OS: Ubuntu 16.04.2 LTS * L1d cache: 32K * L1i cache: 32K * L2 cache: 256K * L3 cache: 6144K * Architecture: x86_64 * CPU 作業模式: 32-bit, 64-bit * Byte Order: Little Endian * CPU(s): 4 * Model name: Intel(R) Core(TM) i5-6300HQ CPU @ 2.30GHz * **memory** description: System memory physical id: 0 size: 7872MiB ## 未優化版本 * 執行: `$ ./phonebook_orig` * append() 及 findName() 時間如下 ``` size of entry : 136 bytes execution time of append() : 0.083434 sec execution time of findName() : 0.005348 sec ``` * catch test: `$ make cache-test` * cache-misses 為 92% ``` Performance counter stats for './phonebook_orig' (100 runs): 457,4257 cache-misses # 92.288 % of all cache refs ( +- 0.18% ) 495,6511 cache-references ( +- 0.10% ) 2,6246,7397 instructions # 1.50 insn per cycle ( +- 0.02% ) 1,7458,4735 cycles ( +- 0.19% ) 0.066569148 seconds time elapsed ( +- 0.26% ) ``` ## 優化版本1 - 減少資料結構大小 原本的資料結構為: ```c= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 因為只搜尋 lastName,所以把其他不會用到的成員移除。結果如下: ```c= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` * 執行: `$ ./phonebook_opt` * append() 及 findName() 時間如下 ``` size of entry : 24 bytes execution time of append() : 0.039963 sec execution time of findName() : 0.001991 sec ``` * catch test: `$ make cache-test` * cache-misses 減少為 63% ``` Performance counter stats for './phonebook_opt' (100 runs): 102,7372 cache-misses # 63.629 % of all cache refs ( +- 0.76% ) 161,4624 cache-references ( +- 0.13% ) 2,4126,7291 instructions # 2.16 insn per cycle ( +- 0.02% ) 1,1166,5508 cycles ( +- 0.19% ) 0.041329077 seconds time elapsed ( +- 0.30% ) ``` * 和未優化版本的比較圖: ![](https://i.imgur.com/5puco8J.png) ## 優化版本2 - 使用二元搜尋樹 將原本的用 linked list 建立的資料,改用二元搜尋樹建立。 原本建立 linked list 的程式: ```c= entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; } ``` 建立二元搜尋樹的程式: ```c= entry *append(char lastName[], entry *e) { if(e == NULL) { strcpy(e->lastName, lastName); return e; } else { //create a new node entry* newnode = (entry *) malloc(sizeof(entry)); strcpy(newnode->lastName, lastName); newnode->pRight = newnode->pLeft = NULL; entry *tmp = e; entry *prev = NULL; while(tmp != NULL) { prev = tmp; if(strcasecmp(lastName, tmp->lastName) > 0) tmp = tmp->pRight; else if(strcasecmp(lastName, tmp->lastName) < 0) tmp = tmp->pLeft; } if(prev->pRight == tmp) prev->pRight = newnode; else prev->pLeft = newnode; return newnode; } } ``` 搜尋二元搜尋樹的 findName(): ```c= entry *findName(char lastName[], entry *pHead) { if(pHead == NULL) { /* Element is not found */ return NULL; } while(pHead != NULL) { if(strcasecmp(lastName, pHead->lastName) == 0) { return pHead; } else { if(strcasecmp(lastName, pHead->lastName) > 0) { pHead = pHead->pRight; } else { pHead = pHead->pLeft; } } } return pHead; } ``` * 執行: `$ ./phonebook_opt` * append() 及 findName() 時間如下 ``` size of entry : 32 bytes execution time of append() : 0.309874 sec execution time of findName() : 0.003658 sec ``` * catch test: `$ make cache-test` * cache-misses 減少為 68%,但因資料結構較大的緣故,cache-miss較優化版本1增加 5%。 ``` Performance counter stats for './phonebook_opt' (100 runs): 160,2956 cache-misses # 68.397 % of all cache refs ( +- 0.50% ) 234,3613 cache-references ( +- 0.11% ) 3,2665,4220 instructions # 2.38 insn per cycle ( +- 0.01% ) 1,3712,2147 cycles ( +- 0.16% ) 0.050254760 seconds time elapsed ( +- 0.32% ) ``` * 和未優化版本的比較圖: ![](https://i.imgur.com/I0d3JTA.png) 雖然使用了二元搜尋樹,但 findNmae()的時間卻沒減少,反而增加。 推測是由於每次加新資料都從 tree 的尾端,因此會形成這種狀況: ```graphviz digraph bst { aaaa -> aaaaa -> aaaaaa -> aaaaaaa [label = pRight]; aaaa -> NULL [label = pLeft]; aaaaa -> NULL [label = pLeft]; aaaaaa -> NULL [label = pLeft]; aaaaaaa -> NULL [label = pLeft]; } ``` 每一個 node 雖有兩個指標,實際上卻只用到一個(另一個指向 NULL),因此整個二元搜尋樹等同於普通的 linked list,而無法發揮二元搜尋樹的優勢。 >> 請及早調整輸入的 data set [name="jserv"] ## 優化版本3 - 用樹旋轉改進以上的問題 (working) >> 可以先將 words.txt 透過 -R 打亂排序 >> [name=課程助教][color=red]