# 2016q3 Homework1 (phonebook) contributed by <`HaoTse`> --- ### Reviewed by <`jkrvivian`> * 可以嘗試實作不同hash function進行效能比較 * 沒有進一步分析使用hash function造成append()時間提高的原因 * 還未實作文末提及到的字串壓縮法及binary search tree * [91f657b](https://github.com/HaoTse/phonebook/commit/91f657bf5d02dde3e3862d432e2b272ce3b539eb) 和[315f489](https://github.com/HaoTse/phonebook/commit/315f489aa5d5caffb6fa42da252d5ea886a388fe)的commit可以參考[ How to Write a Git Commit Message](http://chris.beams.io/posts/git-commit/)改善 * [64c7ccf](https://github.com/HaoTse/phonebook/commit/64c7ccf186efc20100da1bfdd76700ce3c139757) 沒有說明程式碼修改的原因 ## 開發環境 - 在筆電上灌了 Ubuntu 16.04 LTS - .vimrc參考[成大資工wiki]( http://wiki.csie.ncku.edu.tw/vim/vimrc) - 安裝相關開發工具 - 基本編譯器 - 效能分析工具 (perf) - astyle - colordiff - gnuplot ``` $ sudo apt-get update $ sudo apt-get install build-essential $ sudo apt-get install linux-tools-common linux-tools-generic $ sudo apt-get install astyle colordiff gnuplot ``` ---- - `$uname -a` ```shell Linux tse-Aspire-V5-573G 4.4.0-38-generic ``` - 查看電腦 cache 大小 `$ lscpu | grep cache` ```shell L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K ``` --- ## 案例分析:Phone Book ---- ### 未優化版本 - `$ ./phonebook_orig` ``` size of entry : 136 bytes execution time of append() : 0.047622 sec execution time of findName() : 0.005108 sec ``` 一個entry(PhoneBook size)為 136bytes,L1 cache的size為 32KB = 32 * 1024bytes,所以(32 * 1024) / (136 * 8) = 30.12,只能存30筆左右。 ---- - `$ make cache-test` ``` Performance counter stats for './phonebook_orig' (100 runs): 96,4439 cache-misses # 87.936 % of all cache refs ( +- 0.43% ) 109,6754 cache-references ( +- 0.28% ) 2,6120,5049 instructions # 1.50 insns per cycle ( +- 0.02% ) 1,7371,1341 cycles ( +- 0.39% ) 0.070385042 seconds time elapsed ( +- 0.57% ) ``` ---- - `$ ./phonebook_orig & sudo perf top -p $!` > 使用perf尋找熱點[name=鄭皓澤] ```shell= 23.06% libc-2.23.so [.] __strcasecmp_l_avx 22.35% phonebook_orig [.] findName 8.66% phonebook_orig [.] main 5.41% libc-2.23.so [.] _int_malloc 5.21% [kernel] [k] clear_page_c_e 4.01% phonebook_orig [.] append 3.96% libc-2.23.so [.] __memcpy_sse2 3.65% libc-2.23.so [.] _IO_fgets 3.20% [unknown] [k] 0x00007ff9db93e02e 2.73% [kernel] [k] release_pages 2.17% libc-2.23.so [.] memchr 1.40% libc-2.23.so [.] __strcpy_sse2_unaligned 1.40% libc-2.23.so [.] _IO_getline_info 1.37% [kernel] [k] uncharge_list 1.36% [kernel] [k] page_remove_rmap 1.28% [kernel] [k] alloc_pages_vma 0.94% [kernel] [k] lru_cache_add_active_or_unevictable 0.81% [kernel] [k] __rmqueue.isra.79 0.71% [kernel] [k] get_page_from_freelist 0.71% libc-2.23.so [.] _IO_getline 0.71% [kernel] [k] pagevec_lru_move_fn 0.71% libc-2.23.so [.] malloc 0.70% libc-2.23.so [.] __strcasecmp_avx 0.70% phonebook_orig [.] strcasecmp@plt 0.70% [kernel] [k] __pagevec_lru_add_fn 0.69% [kernel] [k] free_hot_cold_page 0.68% [kernel] [k] unmap_page_range 0.67% [kernel] [k] free_pcppages_bulk 0.05% [kernel] [k] __perf_event_enable 0.00% [kernel] [k] nmi_handle 0.00% [kernel] [k] native_write_msr_safe ``` 使用 perf 的結果中的第2行我們可以看出 findName() 佔了 22.35%,第4行 append() 則佔了4.01%,findName()為熱點,著手改善findName()的速度。 --- ### 優化方法一:減少struct的大小 前面提到我的電腦的L1 cache最多只能放30個entry,但總共有35萬個單字,因此一定會發生頻繁的cache miss。 根據題目要求「有沒有找到 last name」,我們只需要搜尋last name,因此其他資訊是可以忽略不看的。所以在這裡我就另外宣告了一個struct儲存last name,將他取代原本的entry,另外將原本的entry改為detail。 ```clike= typedef struct __LAST_NAME_ENTRY{ char lastName[MAX_LAST_NAME_SIZE]; detail *detail; struct __LAST_NAME_ENTRY *pNext; } entry; ``` ---- #### 效能分析 - `$ ./phonebook_opt` ``` size of entry : 32 bytes execution time of append() : 0.053768 sec execution time of findName() : 0.002357 sec ``` entry的大小很明顯的從 136bytes下降至 32bytes,cache就可以放入(32 * 1024) / (32 * 8) = 128筆資料,大大的增加cache hit的機會。 findName()的執行時間也明顯下降了。 ---- - `$ make cache-test` ``` Performance counter stats for './phonebook_opt' (100 runs): 14,9402 cache-misses # 28.386 % of all cache refs ( +- 1.18% ) 52,6332 cache-references ( +- 1.64% ) 2,4425,4756 instructions # 1.49 insns per cycle ( +- 0.02% ) 1,6354,4468 cycles ( +- 1.61% ) 0.071050083 seconds time elapsed ( +- 1.84% ) ``` cache miss 的機率明顯的從 87.936%下降至 23.386%。 ---- - `$ ./phonebook_opt & sudo perf top -p $!` ```shell= 28.64% libc-2.23.so [.] __strcasecmp_l_avx 20.16% phonebook_opt [.] main 8.76% phonebook_opt [.] findName 8.33% libc-2.23.so [.] malloc 7.41% libc-2.23.so [.] __memcpy_sse2 4.36% libc-2.23.so [.] _IO_getline_info 2.83% libc-2.23.so [.] _IO_fgets 2.76% libc-2.23.so [.] _int_malloc 2.60% libc-2.23.so [.] memchr 2.59% phonebook_opt [.] append 2.50% libc-2.23.so [.] __strcasecmp_avx 1.43% [kernel] [.] native_irq_return_iret 1.34% libc-2.23.so [.] __strcpy_sse2_unaligned 1.29% [unknown] [k] 0x00007f9a8187d02e 1.26% [kernel] [k] do_exit 1.25% [kernel] [k] unmap_page_range 1.24% [kernel] [k] free_pages_prepare 1.24% [kernel] [k] free_pcppages_bulk 0.01% [kernel] [k] native_sched_clock 0.00% [kernel] [k] native_write_msr_safe ``` findName()所佔的效能也從 22.35%降至 8.76%。 ---- - plot ![](https://i.imgur.com/hpWWzFi.png) --- ### 優化方法二:使用hash function 從方法一改變資料結構後已經可以發現findName()明顯的效能進步,不過考慮到單字量的大小,就想到可以從hash function著手,利用last name當作key,對hash table做搜尋一定比每次都要從頭搜尋還要快。 根據[參考資料](https://www.byvoid.com/blog/string-hash-compare)看出BKDRHash的平均表現是最好的,因此在這裡選擇使用它。 ```clike= // BKDR Hash Function unsigned int BKDRHash(char *str) { unsigned int seed = 131; // 31 131 1313 13131 131313 etc.. unsigned int hash = 0; while (*str) { hash = hash * seed + (*str++); } //HASH_TABLE_SIZE define in phonebook_hash.h return (hash % HASH_TABLE_SIZE); } ``` 並先將table size設為42737,宣告方式變為 ```clike= entry *pHead[HASH_TABLE_SIZE], *e[HASH_TABLE_SIZE]; ``` > 這裡出現了小bug,花了不少時間學習使用gdb。[name=鄭皓澤] ---- #### 效能分析 - `$ ./phonebook_hash` ``` size of entry : 32 bytes execution time of append() : 0.079218 sec053768 execution time of findName() : 0.000000 sec ``` findName()的時間直接降至顯示0秒,但append()卻上升了0.02545秒。 ---- - `$ make cache-test` ``` Performance counter stats for './phonebook_hash' (100 runs): 105,9511 cache-misses # 64.586 % of all cache refs ( +- 0.85% ) 164,0460 cache-references ( +- 0.31% ) 2,9978,3164 instructions # 1.13 insns per cycle ( +- 0.02% ) 2,6638,1339 cycles ( +- 0.45% ) 0.111340353 seconds time elapsed ( +- 0.52% ) ``` cache miss 的機率從 23.386%上升至 64.586%。 ---- - plot ![](https://i.imgur.com/JVc7vg3.png) --- ## 其他方法(待實驗) - [ ] 字串壓縮法 - [ ] BST --- ## 參考資料 - [Linux 效能分析工具: Perf](http://wiki.csie.ncku.edu.tw/embedded/perf-tutorial) - [Makefile 簡介](http://tetralet.luna.com.tw/?op=ViewArticle&articleId=185) - [各種hash function介紹](https://www.byvoid.com/blog/string-hash-compare) - [BKDR hash介紹](http://blog.csdn.net/wanglx_/article/details/40400693) - [GDB簡介](http://tetralet.luna.com.tw/?op=ViewArticle&articleId=187&blogId=1) - [DADA筆記](https://embedded2016.hackpad.com/ep/pad/static/YkqjhwgnQcA) - [勃興筆記](https://hackmd.io/s/Bkx3cYX6) - [VVN筆記](https://hackmd.io/s/SyOQgOQa#優化) ###### tags: `HaoTse` `sysprog21` `phonebook` `hash function`