# 2017q1 Homework1 (phonebook) contributed by <`jeff60907`> >>請加快進度!! >>[name=課程助教][color=red] ### Reviewed by `refleex` - hash function 有許多種,除了比較 hash table 改變所造成的差異外,也可以比較看看不同 hash function 間的差別 ## 課程說明連結 * [B01: phonebook作業要求](https://hackmd.io/s/rJYD4UPKe#) * [gnuplot 語法解說和示範](https://hackmd.io/s/Skwp-alOg#) * [cache 原理和實驗](https://hackmd.io/MwMwRgHGDsIIYFoAm04DYEBZMEYcIhwCYBTBNABgFYlgSRYBOUoA#) * [hash function 觀念和實務](https://hackmd.io/GwRg7AzAJgxgDATgLQCMERkgLFgZr1ADhGRilwEMz9CAmFQoA===?view) ## 開發環境 * lscpu 查看CPU、快取內容 ``` Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 58 Model name: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz Stepping: 9 CPU MHz: 1605.039 CPU max MHz: 3900.0000 CPU min MHz: 1600.0000 BogoMIPS: 6807.01 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 8192K NUMA node0 CPU(s): 0-7 ``` ### 使用 astyle + Git 實做自動程式碼排版檢查 ``` $ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch] ``` ## Phonebook 分析 ### 未優化前測試 ``` $ make run $ make cache-test ``` ``` size of entry : 136 bytes execution time of append() : 0.065398 sec execution time of findName() : 0.006174 sec ``` ``` Performance counter stats for './phonebook_orig' (100 runs): 199,1215 cache-misses # 95.216 % of all cache refs ( +- 0.53% ) 209,1256 cache-references ( +- 0.52% ) 2,6079,3262 instructions # 1.40 insns per cycle ( +- 0.02% ) 1,8677,7025 cycles ( +- 0.47% ) 0.063488818 seconds time elapsed ( +- 2.66% ) ``` 最初版本 cache-misses 高達 95.216%,代表大部份的存取資料都是 miss的 接著使用 `perf` 找熱點 了解 程式中哪個資源佔最多 ``` $ ./phonebook_orig & sudo perf top -p $! ``` ``` 14.85% phonebook_orig [.] findName ``` perf 出來 findName 為 phonebook中的 熱點,但是在上面 append()花的時間卻比findName()多 ### Memory Leak [參考0140454同學共筆](https://hackmd.io/s/Bkx3cYX6#)發現有 memory leak 問題,上學期做作業時沒有想過有這個問題, 當程式沒有將記憶體正確釋放將可能導致 memory leak,使用 `valgrind` 觀察 ``` $ sudo apt-get install valgrind $ valgrind --leak-check=full ./phonebook_orig ``` ``` ==26473== HEAP SUMMARY: ==26473== in use at exit: 47,586,264 bytes in 349,899 blocks ==26473== total heap usage: 349,906 allocs, 7 frees, 47,596,856 bytes allocated ==26473== ==26473== 816,000 bytes in 6,000 blocks are possibly lost in loss record 1 of 3 ==26473== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==26473== by 0x400D5C: append (in /home/jeff60907/spring/phonebook/phonebook_orig) ==26473== by 0x400ACB: main (in /home/jeff60907/spring/phonebook/phonebook_orig) ==26473== ==26473== 46,770,264 (136 direct, 46,770,128 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3 ==26473== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==26473== by 0x400D5C: append (in /home/jeff60907/spring/phonebook/phonebook_orig) ==26473== by 0x400ACB: main (in /home/jeff60907/spring/phonebook/phonebook_orig) ==26473== ==26473== LEAK SUMMARY: ==26473== definitely lost: 136 bytes in 1 blocks ==26473== indirectly lost: 46,770,128 bytes in 343,898 blocks ==26473== possibly lost: 816,000 bytes in 6,000 blocks ==26473== still reachable: 0 bytes in 0 blocks ==26473== suppressed: 0 bytes in 0 blocks ==26473== ==26473== For counts of detected and suppressed errors, rerun with: -v ==26473== Use --track-origins=yes to see where uninitialised values come from ==26473== ERROR SUMMARY: 8 errors from 8 contexts (suppressed: 0 from 0) ``` > 可以看到 `Heap summary` 總共程式使用了`47,586,264 bytes in 349,899 blocks` > `Leak summary`中 `definitely lost` 代表確定有`memory leak`問題必需要解決,而`indirectly or possibly` 是被懷疑的需要再針對程式中分析。 * 修改 main.c 記憶體釋放 ```c= if (pHead->pNext) free(pHead->pNext); ``` 把 free(pHead->pNext)一個一個 free ```c= while(pHead->pNext) { entry *next = pHead->pNext; pHead->pNext = next->pNext; free(next); } ``` ``` ==26705== LEAK SUMMARY: ==26705== definitely lost: 136 bytes in 1 blocks ==26705== indirectly lost: 0 bytes in 0 blocks ==26705== possibly lost: 0 bytes in 0 blocks ==26705== still reachable: 0 bytes in 0 blocks ==26705== suppressed: 0 bytes in 0 blocks ``` > free(pHead) 移除後發現,有`LEAK SUMMARY`問題,但是其他懷疑的都OK了 > 補回 free(pHead),解決 `memory leak` 問題 ``` ==26738== All heap blocks were freed -- no leaks are possible ``` ### 優化版本測試1 - 調整資料結構 * 首先 phonebook_opt.c/h 編輯 find()的時候,只尋找 lastName,但是整個結構包含了其他的資料,把用不到的資料成員自己存放在entry_detail,縮小搜尋的結構。 ```c= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry_detail; typedef struct _LASTNAME_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; entry_detail *detail; struct _LASTNAME_ENTRY *pNext; } entry; ``` ``` size of entry : 32 bytes execution time of append() : 0.028760 sec execution time of findName() : 0.002783 sec ``` size 從 136 bytes 降為 32 bytes ``` Performance counter stats for './phonebook_opt' (100 runs): 35,6617 cache-misses # 74.485 % of all cache refs ( +- 0.19% ) 47,8774 cache-references ( +- 0.52% ) 2,4427,3902 instructions # 1.82 insns per cycle ( +- 0.02% ) 1,3421,9984 cycles ( +- 0.44% ) 0.040133182 seconds time elapsed ( +- 2.18% ) ``` >單純更改資料結構 cache-miss 從 95% 降到75% ,改善一點結構就可以讓 cache 搬運成本降低。 ![](https://i.imgur.com/3kvWsFF.png) ### 優化版本測試2 - Hash Function * 閱讀 hash 相關資料 * 建立 phonebook_hash.c/h * [參考hash函式衝突分析](http://blog.csdn.net/qq7366020/article/details/8730425) 使用 BKDR-Hash 進行優化 hash table size = 1024 ```c= unsigned int BKDRHash(char *str) { unsigned int hash = 0; while (*str) { hash = hash * 131 + (*str++); } return (hash & 0x7FFFFFFF); } ``` ``` size of entry : 32 bytes execution time of append() : 0.036309 sec execution time of findName() : 0.000034 sec ``` ``` Performance counter stats for './phonebook_hash' (100 runs): 32,2706 cache-misses # 50.147 % of all cache refs ( +- 0.31% ) 64,3514 cache-references ( +- 0.62% ) 2,2868,9908 instructions # 1.69 insns per cycle ( +- 0.02% ) 1,3552,0183 cycles ( +- 0.62% ) 0.045378226 seconds time elapsed ( +- 2.95% ) ``` > append()時間上升蠻多的,是因為建立 hash table 需要額外的時間,但是建立完後只要找到hash value 後需要搜尋的範圍就縮小很多了,findName()效能就大幅得上升。 > 而且 cahce-miss 也從 74% 變成 50% ![](https://i.imgur.com/TmvExVU.png) #### 測試加大 hash table size 讓 hash table 碰撞機率下降是否可以提高效能? hash table size = 2048 ``` Performance counter stats for './phonebook_hash' (100 runs): 32,6460 cache-misses # 46.424 % of all cache refs ( +- 0.22% ) 70,3208 cache-references ( +- 0.73% ) 2,2888,2470 instructions # 1.68 insns per cycle ( +- 0.02% ) 1,3649,5806 cycles ( +- 0.50% ) 0.046204817 seconds time elapsed ( +- 2.65% ) ``` ![](https://i.imgur.com/gr0Ygcb.png) hash table size = 8192 ``` Performance counter stats for './phonebook_hash' (100 runs): 33,9796 cache-misses # 39.912 % of all cache refs ( +- 0.10% ) 85,1367 cache-references ( +- 0.16% ) 2,2994,7215 instructions # 1.66 insns per cycle ( +- 0.02% ) 1,3835,4609 cycles ( +- 0.11% ) 0.052061784 seconds time elapsed ( +- 2.74% ) ``` ![](https://i.imgur.com/QNVATEd.png) hash table size = 32768 ``` Performance counter stats for './phonebook_hash' (100 runs): 46,1318 cache-misses # 40.176 % of all cache refs ( +- 0.81% ) 114,8230 cache-references ( +- 0.19% ) 2,3405,9699 instructions # 1.49 insns per cycle ( +- 0.02% ) 1,5712,5488 cycles ( +- 0.50% ) 0.058200217 seconds time elapsed ( +- 2.32% ) ``` > 提高 hash table size 可以提高執行速率但是一定程度後就不再改變,另外也可以降低 cache-miss 但是 size 高於某一個值 cache-miss 反而開始上升,並不是愈高愈好。 >>可以嘗試 "質數" hash table size 實驗 >>參考閱讀: [StackOverflow - Why should hash functions use a prime number modulus?](http://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus) >>[name=課程助教][color=red] ### 優化版本測試3 - 嘗試使用 tree 優化 * Trie 字典樹 存字串的方式,類似於編排字典的方式,減低了檢索單字的困難度,Trie 可以當成是多層次的索引表。 * 優點是速度快速,缺點是耗費記憶體 [參考ntnu演算法筆記](http://www.csie.ntnu.edu.tw/~u91029/String.html) [參考Vivian Lin助教共筆](https://hackmd.io/s/BJRMImUPg#) [參考0140454同學共筆](https://hackmd.io/s/Bkx3cYX6#) 建立 `pChild[27]` 紀錄字母索引表 ```c= typedef struct _LASTNAME_ENTRY { char ch; entry_detail *detail; struct _LASTNAME_ENTRY *pChild[27]; } entry; ``` ``` size of entry : 232 bytes execution time of append() : 0.192304 sec execution time of findName() : 0.000000 sec ``` ``` Performance counter stats for './phonebook_trie' (100 runs): 565,2061 cache-misses # 92.578 % of all cache refs ( +- 0.05% ) 610,5195 cache-references ( +- 0.04% ) 16,1597,4402 instructions # 1.53 insns per cycle ( +- 0.01% ) 10,5371,1599 cycles ( +- 0.07% ) 0.332670737 seconds time elapsed ( +- 0.83% ) ``` ![](https://i.imgur.com/kuMU0hf.png) > cache-misses 暴增 和 append 時間也比未優化更久,因為再建立樹結構的時候就花費很多時間成本, 但是 findName()執行時間就變成是最快的了 ### 優化版本測試4 - 建立 memeory pool memory pool 為事先建立一大塊的記憶體先放著,可以取代傳統的 malloc,因為每一次呼叫 malloc 都會有系統成本存在,之後 append() 需要記憶體空間就可以從 pool 存取即可。 ``` memory pool 的大小 trie 2 * 1024 * 1024 others 512 * 1024 ``` > trie 需要的記憶體空間大很多,給太小就會造成記憶體錯誤,可以驗證 trie 上述優點是速度快速,缺點是耗費記憶體 ![](https://i.imgur.com/ppq2jem.png) > 使用了memory pool 大幅地改善 append()的效能,trie 的時間減少了一半多,表示 trie 建立索引表時,呼叫 malloc時間成本很高。 ###### tags: `jeff60907`