# 2016q3 Homework1 (phonebook) contributed by <`a530788140`> ### Reviewed by `Louie Lu` * git log is too simplify for reading * calcuate.c 33:40 寫的很奇怪。 * 沒有推上 Hashtable 的 code. * coding style * consist if else block (you got two style, i prefer this one) ```c if (expr) { stmt } else { stmt } ``` ## 實作二元搜尋樹 * `append` ```c entry *append(char lastName[], entry *root) { /* allocate memory for the new entry and put lastName */ if (strcasecmp("\0", root->lastName) == 0) { strcpy(root->lastName, lastName); } else { entry *tmp, *current, *parent; tmp = (entry *) malloc(sizeof(entry)); strcpy(tmp->lastName, lastName); tmp->lNext = NULL; tmp->rNext = NULL; current = root; while (current != NULL) { parent = current; if (hash(lastName) > hash(current->lastName)) current = current->rNext; else current = current->lNext; } if (hash(lastName) > hash(parent->lastName)) parent->rNext = tmp; else parent->lNext = tmp; } return root; } ``` >> 請保持原始程式碼一致的縮排 (4 個空白,而非 tab),注意看作業要求中 astyle 的設定方式 [name=jserv] >感謝老師提醒,已修改[name=儂偉] * `findName` ```c entry *findName(char lastname[], entry *root) { while (root != NULL) { if (strcasecmp(lastname, root->lastName) == 0) { return root; } else { if (hash(lastname) > hash(root->lastName)) { root = root->rNext; } else root = root->lNext; } } return NULL; } ``` --- ### 發現問題 >程式碼停在main 43行無法編譯[name=儂偉][time=Sat, Sep 24, 2016 3:30 PM] ### 使用GDB ``` Temporary breakpoint 1, main (argc=1, argv=0x7fffffffdee8) at main.c:25 25 { (gdb) n 27 int i = 0; (gdb) n 33 fp = fopen(DICT_FILE, "r"); (gdb) n 34 if (fp == NULL) { (gdb) n 41 pHead = (entry *) malloc(sizeof(entry)); (gdb) n 42 printf("start"); (gdb) n 43 printf("size of entry : %lu bytes\n", sizeof(entry)); (gdb) n startsize of entry : 32 bytes 44 printf("end"); (gdb) n 45 e = pHead; (gdb) n 48 e->rNext = e->lNext = NULL; (gdb) ``` >>請避免用圖片來呈現文字,請多用文字方塊,這樣才能方便複製和搜尋[name=jserv] >>發現44行後無法執行的問題,待解決[name=儂偉][time=Sat, Sep 24, 2016 3:32 PM] >已改善,但目前還找不出問題[name=儂偉][time=Sat, Sep 24, 2016 4:51 PM] >發現降低字典檔數量就可以執行,還在思考原因[name=儂偉][time=Sat, Sep 24, 2016 5:05 PM] ``` startsize of entry : 32 bytes endexecution time of append() : 257.225241 sec execution time of findName() : 0.003507 sec ``` >發現原因,由於append用字串長度,字串長度太接近,樹的深度太深[name=儂偉][time=Sat, Sep 24, 2016 5:13 PM] --- ## 降低字典檔數量為5千 ![](https://i.imgur.com/yfrJt5f.png) 用二元樹會讓append的時間大幅增加,由於每次都要從頭開始判斷,也有可能是因為判斷大小的不夠均勻,導致樹歪掉,用數學來看log~2~50000~=15,照理說應該不會差到超過100倍,還在思考。 ### 使用hash(djb2)處理字串 改用5W字典檔 ```c unsigned long hash(char* str) { unsigned long hash = 5381; int c; while ((c = *str++) != 0) hash = ((hash << 5) + hash) + c; /* hash * 33 + c */ return hash; } ``` ![](https://i.imgur.com/SJKGWgi.png) 發現 `findName` 的速度快了5倍,但append時間還是太長,應該可以再進一步優化。 #### 未優化 ``` startsize of entry : 136 bytes endexecution time of append() : 0.008545 sec execution time of findName() : 0.000757 sec Performance counter stats for './phonebook_orig' (100 runs): 298,739 cache-misses # 90.767 refs ( +- 0.55% ) 329,128 cache-references ( +- 0.45% ) 38,030,533 instructions # 0.91 insns per ( +- 0.04% ) 41,899,470 cycles ( +- 2.70% ) 0.013604583 seconds time elapsed ( +- 2.61% ) ``` #### 優化後 ``` Performance counter stats for './phonebook_opt' (100 runs): 2,874,890 cache-misses # 3.110 % of all cache refs ( +- 3.77% ) 92,449,220 cache-references ( +- 0.22% ) 51,082,295,945 instructions # 1.94 insns per cycle ( +- 0.00% ) 26,326,936,041 cycles ( +- 0.80% ) 9.239935448 seconds time elapsed ( +- 0.89% ) ``` 發現優化後cache-misses差了30倍 > 對照不同資料結構對 phonebook [效能影響的實驗](https://embedded2015.hackpad.com/note-hw2-phonebook-3mUOSk5J9cu) by [Jingfei](https://github.com/jingfei/phonebook). [name=jserv] ### 優化hash方式來處理字串 ``` unsigned long hash(char *key) { unsigned int hashVal = 0; while (*key != '\0') { hashVal = (hashVal << 5) + *key++; } return hashVal % 42737; } ``` ### 大量減少append時間和findname時間 ``` size of entry : 32 bytes execution time of append() : 0.832717 sec execution time of findName() : 0.000000996 sec ``` ![](https://i.imgur.com/CN3wGEv.png) ### 但cache-misses增加了 ``` Performance counter stats for './phonebook_opt' (100 runs): 5,932,784 cache-misses # 36.397 % of all cache refs( +- 0.26% ) 16,300,312 cache-references ( +- 0.07% ) 3,345,379,637 instructions # 1.23 insns per cycle ( +- 0.00% ) 2,711,734,441 cycles ( +- 0.13% ) 0.829867464 seconds time elapsed ( +- 0.99% ) ``` --- ## 實作HashTable * findName ```c entry *findName(char lastname[], Hashtable *ht) { entry *tmp; unsigned int index = hash2(lastname,ht); for(tmp = ht->list[index]; tmp != NULL ; tmp = tmp->pNext) if(strcasecmp(tmp->lastName,lastname) == 0) return tmp; printf("%12s can not be find! \n",lastname); return NULL; } ``` *append ```c Hashtable *append(char lastName[], Hashtable *ht) { unsigned int index = hash2(lastName, ht); entry *new_entry; new_entry = (entry*) malloc(sizeof(entry)); if (new_entry == NULL) return NULL; strcpy(new_entry->lastName,lastName); new_entry->pNext = ht->list[index]; ht->list[index] = new_entry; return ht; } ``` ### 遇到問題 ``` Hashtable size = 42737 size of entry : 24 bytes execution time of append() : 0.090755 sec execution time of findName() : 0.000000 sec ``` >findname一直都是0秒 待解決[name=儂偉] ### 使用GDB觀察findname時間 ``` Breakpoint 1, main (argc=1, argv=0x7fffffffdee8) at main.c:100 100 clock_gettime(CLOCK_REALTIME, &start); (gdb) p start $1 = {tv_sec = 1474797779, tv_nsec = 612819655} (gdb) n 101 findName(input, e); (gdb) n 102 clock_gettime(CLOCK_REALTIME, &end); (gdb) p end $2 = {tv_sec = 1474797779, tv_nsec = 698358342} (gdb) ``` 計算出findname時間差約為85538687ns ![](https://i.imgur.com/RUdlTwm.png)