Try   HackMD

2017q1 Homework1 (phonebook)

contributed by <jeff60907>

請加快進度!!
課程助教

Reviewed by refleex

  • hash function 有許多種,除了比較 hash table 改變所造成的差異外,也可以比較看看不同 hash function 間的差別

課程說明連結

開發環境

  • lscpu 查看CPU、快取內容
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Model name:            Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Stepping:              9
CPU MHz:               1605.039
CPU max MHz:           3900.0000
CPU min MHz:           1600.0000
BogoMIPS:              6807.01
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

使用 astyle + Git 實做自動程式碼排版檢查

$ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]

Phonebook 分析

未優化前測試

$ make run
$ make cache-test
size of entry : 136 bytes
execution time of append() : 0.065398 sec
execution time of findName() : 0.006174 sec
 Performance counter stats for './phonebook_orig' (100 runs):

          199,1215      cache-misses              #   95.216 % of all cache refs      ( +-  0.53% )
          209,1256      cache-references                                              ( +-  0.52% )
       2,6079,3262      instructions              #    1.40  insns per cycle          ( +-  0.02% )
       1,8677,7025      cycles                                                        ( +-  0.47% )

       0.063488818 seconds time elapsed                                          ( +-  2.66% )

最初版本 cache-misses 高達 95.216%,代表大部份的存取資料都是 miss的
接著使用 perf 找熱點 了解 程式中哪個資源佔最多

$ ./phonebook_orig & sudo perf top -p $!
14.85%  phonebook_orig    [.] findName

perf 出來 findName 為 phonebook中的 熱點,但是在上面 append()花的時間卻比findName()多

Memory Leak

參考0140454同學共筆發現有 memory leak 問題,上學期做作業時沒有想過有這個問題,
當程式沒有將記憶體正確釋放將可能導致 memory leak,使用 valgrind 觀察

$ sudo apt-get install valgrind 
$ valgrind --leak-check=full ./phonebook_orig 
==26473== HEAP SUMMARY:
==26473==     in use at exit: 47,586,264 bytes in 349,899 blocks
==26473==   total heap usage: 349,906 allocs, 7 frees, 47,596,856 bytes allocated
==26473== 
==26473== 816,000 bytes in 6,000 blocks are possibly lost in loss record 1 of 3
==26473==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26473==    by 0x400D5C: append (in /home/jeff60907/spring/phonebook/phonebook_orig)
==26473==    by 0x400ACB: main (in /home/jeff60907/spring/phonebook/phonebook_orig)
==26473== 
==26473== 46,770,264 (136 direct, 46,770,128 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3
==26473==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26473==    by 0x400D5C: append (in /home/jeff60907/spring/phonebook/phonebook_orig)
==26473==    by 0x400ACB: main (in /home/jeff60907/spring/phonebook/phonebook_orig)
==26473== 

==26473== LEAK SUMMARY:
==26473==    definitely lost: 136 bytes in 1 blocks
==26473==    indirectly lost: 46,770,128 bytes in 343,898 blocks
==26473==      possibly lost: 816,000 bytes in 6,000 blocks
==26473==    still reachable: 0 bytes in 0 blocks
==26473==         suppressed: 0 bytes in 0 blocks
==26473== 
==26473== For counts of detected and suppressed errors, rerun with: -v
==26473== Use --track-origins=yes to see where uninitialised values come from
==26473== ERROR SUMMARY: 8 errors from 8 contexts (suppressed: 0 from 0)

可以看到 Heap summary 總共程式使用了47,586,264 bytes in 349,899 blocks
Leak summarydefinitely lost 代表確定有memory leak問題必需要解決,而indirectly or possibly 是被懷疑的需要再針對程式中分析。

  • 修改 main.c 記憶體釋放
if (pHead->pNext) free(pHead->pNext);

把 free(pHead->pNext)一個一個 free

while(pHead->pNext) { entry *next = pHead->pNext; pHead->pNext = next->pNext; free(next); }
==26705== LEAK SUMMARY:
==26705==    definitely lost: 136 bytes in 1 blocks
==26705==    indirectly lost: 0 bytes in 0 blocks
==26705==      possibly lost: 0 bytes in 0 blocks
==26705==    still reachable: 0 bytes in 0 blocks
==26705==         suppressed: 0 bytes in 0 blocks

free(pHead) 移除後發現,有LEAK SUMMARY問題,但是其他懷疑的都OK了
補回 free(pHead),解決 memory leak 問題

==26738== All heap blocks were freed -- no leaks are possible

優化版本測試1 - 調整資料結構

  • 首先 phonebook_opt.c/h 編輯

find()的時候,只尋找 lastName,但是整個結構包含了其他的資料,把用不到的資料成員自己存放在entry_detail,縮小搜尋的結構。

typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry_detail; typedef struct _LASTNAME_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; entry_detail *detail; struct _LASTNAME_ENTRY *pNext; } entry;
size of entry : 32 bytes
execution time of append() : 0.028760 sec
execution time of findName() : 0.002783 sec

size 從 136 bytes 降為 32 bytes

Performance counter stats for './phonebook_opt' (100 runs):

           35,6617      cache-misses              #   74.485 % of all cache refs      ( +-  0.19% )
           47,8774      cache-references                                              ( +-  0.52% )
       2,4427,3902      instructions              #    1.82  insns per cycle          ( +-  0.02% )
       1,3421,9984      cycles                                                        ( +-  0.44% )

       0.040133182 seconds time elapsed                                          ( +-  2.18% )

單純更改資料結構 cache-miss 從 95% 降到75% ,改善一點結構就可以讓 cache 搬運成本降低。

優化版本測試2 - Hash Function

使用 BKDR-Hash 進行優化
hash table size = 1024

unsigned int BKDRHash(char *str) { unsigned int hash = 0; while (*str) { hash = hash * 131 + (*str++); } return (hash & 0x7FFFFFFF); }
size of entry : 32 bytes
execution time of append() : 0.036309 sec
execution time of findName() : 0.000034 sec
 Performance counter stats for './phonebook_hash' (100 runs):

           32,2706      cache-misses              #   50.147 % of all cache refs      ( +-  0.31% )
           64,3514      cache-references                                              ( +-  0.62% )
       2,2868,9908      instructions              #    1.69  insns per cycle          ( +-  0.02% )
       1,3552,0183      cycles                                                        ( +-  0.62% )

       0.045378226 seconds time elapsed                                          ( +-  2.95% )

append()時間上升蠻多的,是因為建立 hash table 需要額外的時間,但是建立完後只要找到hash value 後需要搜尋的範圍就縮小很多了,findName()效能就大幅得上升。
而且 cahce-miss 也從 74% 變成 50%

測試加大 hash table size

讓 hash table 碰撞機率下降是否可以提高效能?
hash table size = 2048

Performance counter stats for './phonebook_hash' (100 runs):

           32,6460      cache-misses              #   46.424 % of all cache refs      ( +-  0.22% )
           70,3208      cache-references                                              ( +-  0.73% )
       2,2888,2470      instructions              #    1.68  insns per cycle          ( +-  0.02% )
       1,3649,5806      cycles                                                        ( +-  0.50% )

       0.046204817 seconds time elapsed                                          ( +-  2.65% )

hash table size = 8192

Performance counter stats for './phonebook_hash' (100 runs):

           33,9796      cache-misses              #   39.912 % of all cache refs      ( +-  0.10% )
           85,1367      cache-references                                              ( +-  0.16% )
       2,2994,7215      instructions              #    1.66  insns per cycle          ( +-  0.02% )
       1,3835,4609      cycles                                                        ( +-  0.11% )

       0.052061784 seconds time elapsed                                          ( +-  2.74% )

hash table size = 32768

 Performance counter stats for './phonebook_hash' (100 runs):

           46,1318      cache-misses              #   40.176 % of all cache refs      ( +-  0.81% )
          114,8230      cache-references                                              ( +-  0.19% )
       2,3405,9699      instructions              #    1.49  insns per cycle          ( +-  0.02% )
       1,5712,5488      cycles                                                        ( +-  0.50% )

       0.058200217 seconds time elapsed                                          ( +-  2.32% )

提高 hash table size 可以提高執行速率但是一定程度後就不再改變,另外也可以降低 cache-miss 但是 size 高於某一個值 cache-miss 反而開始上升,並不是愈高愈好。

可以嘗試 "質數" hash table size 實驗
參考閱讀: StackOverflow - Why should hash functions use a prime number modulus?
課程助教

優化版本測試3 - 嘗試使用 tree 優化

建立 pChild[27] 紀錄字母索引表

typedef struct _LASTNAME_ENTRY { char ch; entry_detail *detail; struct _LASTNAME_ENTRY *pChild[27]; } entry;
size of entry : 232 bytes
execution time of append() : 0.192304 sec
execution time of findName() : 0.000000 sec
 Performance counter stats for './phonebook_trie' (100 runs):

          565,2061      cache-misses              #   92.578 % of all cache refs      ( +-  0.05% )
          610,5195      cache-references                                              ( +-  0.04% )
      16,1597,4402      instructions              #    1.53  insns per cycle          ( +-  0.01% )
      10,5371,1599      cycles                                                        ( +-  0.07% )

       0.332670737 seconds time elapsed                                          ( +-  0.83% )

cache-misses 暴增 和 append 時間也比未優化更久,因為再建立樹結構的時候就花費很多時間成本, 但是 findName()執行時間就變成是最快的了

優化版本測試4 - 建立 memeory pool

memory pool 為事先建立一大塊的記憶體先放著,可以取代傳統的 malloc,因為每一次呼叫 malloc 都會有系統成本存在,之後 append() 需要記憶體空間就可以從 pool 存取即可。

memory pool 的大小
trie     2 * 1024 * 1024
others   512 * 1024 

trie 需要的記憶體空間大很多,給太小就會造成記憶體錯誤,可以驗證 trie 上述優點是速度快速,缺點是耗費記憶體

使用了memory pool 大幅地改善 append()的效能,trie 的時間減少了一半多,表示 trie 建立索引表時,呼叫 malloc時間成本很高。

tags: jeff60907