2016q3 Homework1 (phonebook)

contributed by <`HaoTse`>

Reviewed by <`jkrvivian`>

可以嘗試實作不同hash function進行效能比較
沒有進一步分析使用hash function造成append()時間提高的原因
還未實作文末提及到的字串壓縮法及binary search tree
91f657b 和315f489的commit可以參考 How to Write a Git Commit Message改善
64c7ccf 沒有說明程式碼修改的原因

開發環境

在筆電上灌了 Ubuntu 16.04 LTS
.vimrc參考成大資工wiki
安裝相關開發工具
- 基本編譯器
- 效能分析工具 (perf)
- astyle
- colordiff
- gnuplot

$ sudo apt-get update
$ sudo apt-get install build-essential
$ sudo apt-get install linux-tools-common linux-tools-generic
$ sudo apt-get install astyle colordiff gnuplot

$uname -a

Linux tse-Aspire-V5-573G 4.4.0-38-generic

查看電腦 cache 大小 $ lscpu | grep cache

L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K

案例分析：Phone Book

未優化版本

$ ./phonebook_orig

size of entry : 136 bytes
execution time of append() : 0.047622 sec
execution time of findName() : 0.005108 sec

一個entry(PhoneBook size)為 136bytes，L1 cache的size為 32KB = 32 * 1024bytes，所以(32 * 1024) / (136 * 8) = 30.12，只能存30筆左右。

$ make cache-test

Performance counter stats for './phonebook_orig' (100 runs):

           96,4439      cache-misses              #   87.936 % of all cache refs      ( +-  0.43% )
          109,6754      cache-references                                              ( +-  0.28% )
       2,6120,5049      instructions              #    1.50  insns per cycle          ( +-  0.02% )
       1,7371,1341      cycles                                                        ( +-  0.39% )

       0.070385042 seconds time elapsed                                          ( +-  0.57% )

$ ./phonebook_orig & sudo perf top -p $!

使用perf尋找熱點鄭皓澤































  23.06%  libc-2.23.so      [.] __strcasecmp_l_avx     
  22.35%  phonebook_orig    [.] findName            
   8.66%  phonebook_orig    [.] main           
   5.41%  libc-2.23.so      [.] _int_malloc     
   5.21%  [kernel]          [k] clear_page_c_e  
   4.01%  phonebook_orig    [.] append         
   3.96%  libc-2.23.so      [.] __memcpy_sse2     
   3.65%  libc-2.23.so      [.] _IO_fgets
   3.20%  [unknown]         [k] 0x00007ff9db93e02e      
   2.73%  [kernel]          [k] release_pages      
   2.17%  libc-2.23.so      [.] memchr          
   1.40%  libc-2.23.so      [.] __strcpy_sse2_unaligned     
   1.40%  libc-2.23.so      [.] _IO_getline_info   
   1.37%  [kernel]          [k] uncharge_list  
   1.36%  [kernel]          [k] page_remove_rmap 
   1.28%  [kernel]          [k] alloc_pages_vma 
   0.94%  [kernel]          [k] lru_cache_add_active_or_unevictable
   0.81%  [kernel]          [k] __rmqueue.isra.79      
   0.71%  [kernel]          [k] get_page_from_freelist   
   0.71%  libc-2.23.so      [.] _IO_getline             
   0.71%  [kernel]          [k] pagevec_lru_move_fn  
   0.71%  libc-2.23.so      [.] malloc           
   0.70%  libc-2.23.so      [.] __strcasecmp_avx
   0.70%  phonebook_orig    [.] strcasecmp@plt   
   0.70%  [kernel]          [k] __pagevec_lru_add_fn  
   0.69%  [kernel]          [k] free_hot_cold_page 
   0.68%  [kernel]          [k] unmap_page_range  
   0.67%  [kernel]          [k] free_pcppages_bulk
   0.05%  [kernel]          [k] __perf_event_enable
   0.00%  [kernel]          [k] nmi_handle 
   0.00%  [kernel]          [k] native_write_msr_safe

使用 perf 的結果中的第2行我們可以看出 findName() 佔了 22.35%，第4行 append() 則佔了4.01%，findName()為熱點，著手改善findName()的速度。

優化方法一：減少struct的大小

前面提到我的電腦的L1 cache最多只能放30個entry，但總共有35萬個單字，因此一定會發生頻繁的cache miss。
根據題目要求「有沒有找到 last name」，我們只需要搜尋last name，因此其他資訊是可以忽略不看的。所以在這裡我就另外宣告了一個struct儲存last name，將他取代原本的entry，另外將原本的entry改為detail。





typedef struct __LAST_NAME_ENTRY{
    char lastName[MAX_LAST_NAME_SIZE];
    detail *detail;
    struct __LAST_NAME_ENTRY *pNext;
} entry;

效能分析

$ ./phonebook_opt

size of entry : 32 bytes
execution time of append() : 0.053768 sec
execution time of findName() : 0.002357 sec

entry的大小很明顯的從 136bytes下降至 32bytes，cache就可以放入(32 * 1024) / (32 * 8) = 128筆資料，大大的增加cache hit的機會。
findName()的執行時間也明顯下降了。

$ make cache-test

 Performance counter stats for './phonebook_opt' (100 runs):

           14,9402      cache-misses              #   28.386 % of all cache refs      ( +-  1.18% )
           52,6332      cache-references                                              ( +-  1.64% )
       2,4425,4756      instructions              #    1.49  insns per cycle          ( +-  0.02% )
       1,6354,4468      cycles                                                        ( +-  1.61% )

       0.071050083 seconds time elapsed                                          ( +-  1.84% )

cache miss 的機率明顯的從 87.936%下降至 23.386%。

$ ./phonebook_opt & sudo perf top -p $!




















  28.64%  libc-2.23.so      [.] __strcasecmp_l_avx
  20.16%  phonebook_opt     [.] main
   8.76%  phonebook_opt     [.] findName
   8.33%  libc-2.23.so      [.] malloc
   7.41%  libc-2.23.so      [.] __memcpy_sse2
   4.36%  libc-2.23.so      [.] _IO_getline_info
   2.83%  libc-2.23.so      [.] _IO_fgets
   2.76%  libc-2.23.so      [.] _int_malloc
   2.60%  libc-2.23.so      [.] memchr
   2.59%  phonebook_opt     [.] append
   2.50%  libc-2.23.so      [.] __strcasecmp_avx
   1.43%  [kernel]          [.] native_irq_return_iret
   1.34%  libc-2.23.so      [.] __strcpy_sse2_unaligned
   1.29%  [unknown]         [k] 0x00007f9a8187d02e
   1.26%  [kernel]          [k] do_exit
   1.25%  [kernel]          [k] unmap_page_range
   1.24%  [kernel]          [k] free_pages_prepare
   1.24%  [kernel]          [k] free_pcppages_bulk
   0.01%  [kernel]          [k] native_sched_clock
   0.00%  [kernel]          [k] native_write_msr_safe

findName()所佔的效能也從 22.35%降至 8.76%。

plot

優化方法二：使用hash function

從方法一改變資料結構後已經可以發現findName()明顯的效能進步，不過考慮到單字量的大小，就想到可以從hash function著手，利用last name當作key，對hash table做搜尋一定比每次都要從頭搜尋還要快。
根據參考資料看出BKDRHash的平均表現是最好的，因此在這裡選擇使用它。














// BKDR Hash Function
unsigned int BKDRHash(char *str)
{
    unsigned int seed = 131; // 31 131 1313 13131 131313 etc..
    unsigned int hash = 0;

    while (*str)
    {
        hash = hash * seed + (*str++);
    }

    //HASH_TABLE_SIZE define in phonebook_hash.h
    return (hash % HASH_TABLE_SIZE);
}

並先將table size設為42737，宣告方式變為


entry *pHead[HASH_TABLE_SIZE], *e[HASH_TABLE_SIZE];

這裡出現了小bug，花了不少時間學習使用gdb。鄭皓澤

效能分析

$ ./phonebook_hash

size of entry : 32 bytes
execution time of append() : 0.079218 sec053768
execution time of findName() : 0.000000 sec

findName()的時間直接降至顯示0秒，但append()卻上升了0.02545秒。

$ make cache-test

Performance counter stats for './phonebook_hash' (100 runs):

          105,9511      cache-misses              #   64.586 % of all cache refs      ( +-  0.85% )
          164,0460      cache-references                                              ( +-  0.31% )
       2,9978,3164      instructions              #    1.13  insns per cycle          ( +-  0.02% )
       2,6638,1339      cycles                                                        ( +-  0.45% )

       0.111340353 seconds time elapsed                                          ( +-  0.52% )

cache miss 的機率從 23.386%上升至 64.586%。

plot

其他方法(待實驗)

字串壓縮法
BST

2016q3 Homework1 (phonebook)

contributed by <HaoTse>

Reviewed by <jkrvivian>

開發環境

案例分析：Phone Book

未優化版本

優化方法一：減少struct的大小

效能分析

優化方法二：使用hash function

效能分析

其他方法(待實驗)

參考資料

tags: HaoTse sysprog21 phonebook hash function

contributed by <`HaoTse`>

Reviewed by <`jkrvivian`>

tags: `HaoTse` `sysprog21` `phonebook` `hash function`