2017q1 Homework1 (phonebook)

contributed by <jeff60907>

請加快進度!!
課程助教

Reviewed by `refleex`

hash function 有許多種，除了比較 hash table 改變所造成的差異外，也可以比較看看不同 hash function 間的差別

課程說明連結

開發環境

lscpu 查看CPU、快取內容

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Model name:            Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Stepping:              9
CPU MHz:               1605.039
CPU max MHz:           3900.0000
CPU min MHz:           1600.0000
BogoMIPS:              6807.01
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

使用 astyle + Git 實做自動程式碼排版檢查

$ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]

Phonebook 分析

未優化前測試

$ make run
$ make cache-test

size of entry : 136 bytes
execution time of append() : 0.065398 sec
execution time of findName() : 0.006174 sec

 Performance counter stats for './phonebook_orig' (100 runs):

          199,1215      cache-misses              #   95.216 % of all cache refs      ( +-  0.53% )
          209,1256      cache-references                                              ( +-  0.52% )
       2,6079,3262      instructions              #    1.40  insns per cycle          ( +-  0.02% )
       1,8677,7025      cycles                                                        ( +-  0.47% )

       0.063488818 seconds time elapsed                                          ( +-  2.66% )

最初版本 cache-misses 高達 95.216%，代表大部份的存取資料都是 miss的
接著使用 perf 找熱點了解程式中哪個資源佔最多

$ ./phonebook_orig & sudo perf top -p $!

14.85%  phonebook_orig    [.] findName

perf 出來 findName 為 phonebook中的熱點，但是在上面 append()花的時間卻比findName()多

Memory Leak

參考0140454同學共筆發現有 memory leak 問題，上學期做作業時沒有想過有這個問題，
當程式沒有將記憶體正確釋放將可能導致 memory leak，使用 valgrind 觀察

$ sudo apt-get install valgrind 
$ valgrind --leak-check=full ./phonebook_orig

==26473== HEAP SUMMARY:
==26473==     in use at exit: 47,586,264 bytes in 349,899 blocks
==26473==   total heap usage: 349,906 allocs, 7 frees, 47,596,856 bytes allocated
==26473== 
==26473== 816,000 bytes in 6,000 blocks are possibly lost in loss record 1 of 3
==26473==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26473==    by 0x400D5C: append (in /home/jeff60907/spring/phonebook/phonebook_orig)
==26473==    by 0x400ACB: main (in /home/jeff60907/spring/phonebook/phonebook_orig)
==26473== 
==26473== 46,770,264 (136 direct, 46,770,128 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3
==26473==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26473==    by 0x400D5C: append (in /home/jeff60907/spring/phonebook/phonebook_orig)
==26473==    by 0x400ACB: main (in /home/jeff60907/spring/phonebook/phonebook_orig)
==26473== 

==26473== LEAK SUMMARY:
==26473==    definitely lost: 136 bytes in 1 blocks
==26473==    indirectly lost: 46,770,128 bytes in 343,898 blocks
==26473==      possibly lost: 816,000 bytes in 6,000 blocks
==26473==    still reachable: 0 bytes in 0 blocks
==26473==         suppressed: 0 bytes in 0 blocks
==26473== 
==26473== For counts of detected and suppressed errors, rerun with: -v
==26473== Use --track-origins=yes to see where uninitialised values come from
==26473== ERROR SUMMARY: 8 errors from 8 contexts (suppressed: 0 from 0)

可以看到 Heap summary 總共程式使用了47,586,264 bytes in 349,899 blocks
Leak summary中 definitely lost 代表確定有memory leak問題必需要解決，而indirectly or possibly 是被懷疑的需要再針對程式中分析。

修改 main.c 記憶體釋放


if (pHead->pNext) free(pHead->pNext);

把 free(pHead->pNext)一個一個 free





while(pHead->pNext) {
    entry *next = pHead->pNext;
    pHead->pNext = next->pNext;
    free(next);
}

==26705== LEAK SUMMARY:
==26705==    definitely lost: 136 bytes in 1 blocks
==26705==    indirectly lost: 0 bytes in 0 blocks
==26705==      possibly lost: 0 bytes in 0 blocks
==26705==    still reachable: 0 bytes in 0 blocks
==26705==         suppressed: 0 bytes in 0 blocks

free(pHead) 移除後發現，有LEAK SUMMARY問題，但是其他懷疑的都OK了
補回 free(pHead)，解決 memory leak 問題

==26738== All heap blocks were freed -- no leaks are possible

優化版本測試1 - 調整資料結構

首先 phonebook_opt.c/h 編輯

find()的時候，只尋找 lastName，但是整個結構包含了其他的資料，把用不到的資料成員自己存放在entry_detail，縮小搜尋的結構。



















typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    struct __PHONE_BOOK_ENTRY *pNext;
} entry_detail;

typedef struct _LASTNAME_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    entry_detail *detail;
    struct _LASTNAME_ENTRY *pNext;
} entry;

size of entry : 32 bytes
execution time of append() : 0.028760 sec
execution time of findName() : 0.002783 sec

size 從 136 bytes 降為 32 bytes

Performance counter stats for './phonebook_opt' (100 runs):

           35,6617      cache-misses              #   74.485 % of all cache refs      ( +-  0.19% )
           47,8774      cache-references                                              ( +-  0.52% )
       2,4427,3902      instructions              #    1.82  insns per cycle          ( +-  0.02% )
       1,3421,9984      cycles                                                        ( +-  0.44% )

       0.040133182 seconds time elapsed                                          ( +-  2.18% )

單純更改資料結構 cache-miss 從 95% 降到75% ，改善一點結構就可以讓 cache 搬運成本降低。

優化版本測試2 - Hash Function

閱讀 hash 相關資料
建立 phonebook_hash.c/h
參考hash函式衝突分析

使用 BKDR-Hash 進行優化
hash table size = 1024








unsigned int BKDRHash(char *str)
{
    unsigned int hash = 0;
    while (*str) {
        hash = hash * 131 + (*str++);
    }
    return (hash & 0x7FFFFFFF);
}

size of entry : 32 bytes
execution time of append() : 0.036309 sec
execution time of findName() : 0.000034 sec

 Performance counter stats for './phonebook_hash' (100 runs):

           32,2706      cache-misses              #   50.147 % of all cache refs      ( +-  0.31% )
           64,3514      cache-references                                              ( +-  0.62% )
       2,2868,9908      instructions              #    1.69  insns per cycle          ( +-  0.02% )
       1,3552,0183      cycles                                                        ( +-  0.62% )

       0.045378226 seconds time elapsed                                          ( +-  2.95% )

append()時間上升蠻多的，是因為建立 hash table 需要額外的時間，但是建立完後只要找到hash value 後需要搜尋的範圍就縮小很多了，findName()效能就大幅得上升。
而且 cahce-miss 也從 74% 變成 50%

測試加大 hash table size

讓 hash table 碰撞機率下降是否可以提高效能？
hash table size = 2048

Performance counter stats for './phonebook_hash' (100 runs):

           32,6460      cache-misses              #   46.424 % of all cache refs      ( +-  0.22% )
           70,3208      cache-references                                              ( +-  0.73% )
       2,2888,2470      instructions              #    1.68  insns per cycle          ( +-  0.02% )
       1,3649,5806      cycles                                                        ( +-  0.50% )

       0.046204817 seconds time elapsed                                          ( +-  2.65% )

hash table size = 8192

Performance counter stats for './phonebook_hash' (100 runs):

           33,9796      cache-misses              #   39.912 % of all cache refs      ( +-  0.10% )
           85,1367      cache-references                                              ( +-  0.16% )
       2,2994,7215      instructions              #    1.66  insns per cycle          ( +-  0.02% )
       1,3835,4609      cycles                                                        ( +-  0.11% )

       0.052061784 seconds time elapsed                                          ( +-  2.74% )

hash table size = 32768

 Performance counter stats for './phonebook_hash' (100 runs):

           46,1318      cache-misses              #   40.176 % of all cache refs      ( +-  0.81% )
          114,8230      cache-references                                              ( +-  0.19% )
       2,3405,9699      instructions              #    1.49  insns per cycle          ( +-  0.02% )
       1,5712,5488      cycles                                                        ( +-  0.50% )

       0.058200217 seconds time elapsed                                          ( +-  2.32% )

提高 hash table size 可以提高執行速率但是一定程度後就不再改變，另外也可以降低 cache-miss 但是 size 高於某一個值 cache-miss 反而開始上升，並不是愈高愈好。

可以嘗試 "質數" hash table size 實驗
參考閱讀: StackOverflow - Why should hash functions use a prime number modulus?
課程助教

優化版本測試3 - 嘗試使用 tree 優化

Trie 字典樹存字串的方式，類似於編排字典的方式，減低了檢索單字的困難度，Trie 可以當成是多層次的索引表。
- 優點是速度快速，缺點是耗費記憶體
  參考ntnu演算法筆記
   參考Vivian Lin助教共筆
   參考0140454同學共筆

建立 pChild[27] 紀錄字母索引表





typedef struct _LASTNAME_ENTRY {
    char ch;
    entry_detail *detail;
    struct _LASTNAME_ENTRY *pChild[27];
} entry;

size of entry : 232 bytes
execution time of append() : 0.192304 sec
execution time of findName() : 0.000000 sec

 Performance counter stats for './phonebook_trie' (100 runs):

          565,2061      cache-misses              #   92.578 % of all cache refs      ( +-  0.05% )
          610,5195      cache-references                                              ( +-  0.04% )
      16,1597,4402      instructions              #    1.53  insns per cycle          ( +-  0.01% )
      10,5371,1599      cycles                                                        ( +-  0.07% )

       0.332670737 seconds time elapsed                                          ( +-  0.83% )

cache-misses 暴增和 append 時間也比未優化更久，因為再建立樹結構的時候就花費很多時間成本，但是 findName()執行時間就變成是最快的了

優化版本測試4 - 建立 memeory pool

memory pool 為事先建立一大塊的記憶體先放著，可以取代傳統的 malloc，因為每一次呼叫 malloc 都會有系統成本存在，之後 append() 需要記憶體空間就可以從 pool 存取即可。

memory pool 的大小
trie     2 * 1024 * 1024
others   512 * 1024

trie 需要的記憶體空間大很多，給太小就會造成記憶體錯誤，可以驗證 trie 上述優點是速度快速，缺點是耗費記憶體

使用了memory pool 大幅地改善 append()的效能，trie 的時間減少了一半多，表示 trie 建立索引表時，呼叫 malloc時間成本很高。