HW1-Phonebook - HackMD

# HW1-Phonebook ###### tags: `Miyavi-Chen` ## 開發環境 ### Reviewed by `natetang` * 尚未回答作業所提出的問題 * Commit message 放圖片那裏應該要跟你實際做優化的 message 不同。EX: [Use hash function to optimize functions](https://github.com/Miyavi-Chen/phonebook/commit/6100b29a3b729b60fbc52f89a04ddf641cd6d6d8) * 方法都有列出來了，就只差實作，加油 ### Reviewed by `cdq7` * Commit message [Reduce entry to reduce time](https://github.com/Miyavi-Chen/phonebook/commit/486b21634548303cc7ee9a09a88da4427317b319) seems a little vague. * The cache-miss using hash function shows a significant improvement compared to our peers. What is the key reason? * Commit message [Add phonebook_opt_hash](https://github.com/sysprog21/phonebook/commit/4b6e58f6ab983601f471bf1ca77f9e9dd3dea0b0) has a spelling error. "for decrising cache miss. Do you mean "decreasing"? * Two commit messages, [Add phonebook_opt_hash](https://github.com/sysprog21/phonebook/commit/4b6e58f6ab983601f471bf1ca77f9e9dd3dea0b0) and [Add phonebook_opt_hash for optimization](https://github.com/sysprog21/phonebook/commit/dcc165a9a509ace961064a4c86daa041e9a2ddb2) need to be more explicit. cpu: ```clike= $ lscpu Architecture: x86_64 CPU 作業模式： 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 每核心執行緒數：2 每通訊端核心數：4 Socket(s): 1 NUMA 節點： 1 供應商識別號： GenuineIntel CPU 家族： 6 型號： 58 Model name: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz 製程： 9 CPU MHz： 1824.829 CPU max MHz: 3700.0000 CPU min MHz: 1600.0000 BogoMIPS: 6584.46 虛擬： VT-x L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 8192K NUMA node0 CPU(s): 0-7 ``` kernel: ``` $ uname -a Linux miyavi-desktop 4.8.0-39-generic #42~16.04.1-Ubuntu SMP Mon Feb 20 15:06:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux ``` ## Requirement 1. cache miss 降低 2. findName() 的時間必須原本的時間更短 3. 回答以下問題：本例選用的 dataset (也就是輸入電話簿的姓名) 是否存在問題？ * 有代表性嗎？跟真實英文姓氏差異大嗎？ * 資料難道「數大就是美」嗎？ * 如果我們要建立符合真實電話簿情境的輸入，該怎麼作？ ## 課前準備 ### perf * 看支援的event $ perf list * 目前系統熱點 $ perf top * 監測cache miss的話 Root 須權限變更 . 由下面指令確認目前權限 ```clike= $ cat /proc/sys/kernel/perf_event_paranoid ``` * 由下面指令可更改權限 ```clike= $ sudo su echo 0 > /proc/sys/kernel/kptr_restrict echo 0 > /proc/sys/kernel/perf_event_paranoid ``` * Find hot point through perf ```clike= ./phonebook_orig & sudo perf top -e cycles:ppp -p $! ``` * 觀看特定event及sample 取樣頻率設定(5000) ```clike= $ perf top -e cache-misses -c 5000 ``` * 分析特定檔案events(重複跑5次). :u是user space,:k is kernel space ```clike= $ perf stat --repeat 5 -e cache-misses:u,cache-references,instructions,cycles ./perf_stat_cache_miss ``` * 觀測function level 改用 $ perf record 及$ perf report ```clike= $ perf record -F 5000 -e cache-misses ./phonebook_orig && perf report $ perf report ``` ### gnuplot * 繪圖 $ make plot * 檢圖 $ eog [file name] * .gp file 可設定圖片輸出格式 * 只修改圖可用 $ gnuplot scripts/runtime.gp **參考資料** [gnuplot 語法解說和示範](https://hackmd.io/s/Skwp-alOg) ### 程式碼格式化工具: astyle ``` $ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch] ``` * `--style=kr` 用K&R的風格進行編程 * `--indent=spaces=4` 四個空格 * `--indent-switches` 關於Case的縮排 - Indent ‘switch’ blocks, so that the inner 'case XXX:'headers are indented in relation to the switch block. * `--suffix=none` 不保存原始文件 * `*.[ch]` 所有的.c或.h檔 [Astyle official doc](http://astyle.sourceforge.net/astyle.html#_Quick_Start) ## 未優化版本 * 執行: `$ ./phonebook_orig` * append() 及 findName() 時間如下 ``` size of entry : 136 bytes execution time of append() : 0.069621 sec execution time of findName() : 0.005714 sec ``` * cache test: `$ make cache-test` * cache-misses 高達 92% ``` Performance counter stats for './phonebook_orig' (100 runs): 2,110,653 cache-misses # 92.831 % of all cache refs ( +- 0.15% ) 2,273,646 cache-references ( +- 0.19% ) 262,132,700 instructions # 1.33 insn per cycle ( +- 0.02% ) 196,377,615 cycles ( +- 0.38% ) 0.061147295 seconds time elapsed ( +- 1.48% ) ``` ## Phonebook 開發紀錄 * 先試著重現前人成果 ### 優化版本 1 - 減少資料結構大小 * 理解：因為find 只用到lastname,故建立新的struct把詳細資料搬移到entry_detail,在 entry 裡放一個pointer 指向entry_detail . ```c= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry_detail; typedef struct __LAST_NAME_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; entry_detail *detail; struct __LAST_NAME_ENTRY *pNext; } entry; ``` * 執行時間: ``` size of entry : 32 bytes execution time of append() : 0.057417 sec execution time of findName() : 0.003356 sec ``` * cache-misses: 縮小了許多 (92% to 62%) * append/findName 時間也降低許多 ``` Performance counter stats for './phonebook_opt' (100 runs): 375,561 cache-misses # 62.049 % of all cache refs ( +- 0.58% ) 605,269 cache-references ( +- 0.95% ) 244,491,770 instructions # 1.83 insn per cycle ( +- 0.02% ) 133,387,297 cycles ( +- 0.36% ) 0.042542070 seconds time elapsed ( +- 1.67% ) ``` * plot ![](https://i.imgur.com/oXn2KMw.png) ## 優化版本 2 - 使用 Hash Function 儲存資料參考了 [laochanlam](https://github.com/laochanlam/phonebook/blob/master/phonebook_opt_hash.h) [ktvexe ](https://github.com/ktvexe/phonebook-1/blob/master/phonebook_opt.h) 後採用比較看得懂的laochanlam作法.所以選用 BKDR-Hash 進行優化，下面是 BKDR-Hash 的程式碼片段： [BKDR-Hash 參考](http://www.jianshu.com/p/0be67bf8887e) * seed 應為質數減少collision * 跟0x7FFFFFFF 做and確保hash value仍在unsigned int 內 * hash table size 應選用大的質數避免collision ```c= unsigned int BKDRHash(char *str){ unsigned int seed = 131; unsigned int hash = 0; while (*str){ hash = hash * seed + (*str++); } return (hash & 0x7FFFFFFF)% 42737 ; } ``` * 執行時間: * append()平均時間從0.061933->0.026846 大幅下降 * findName()平均時間從0.020846->0.000003 s大幅下降 ![](https://i.imgur.com/dYhGWCg.png) * cache-misses 下降到51.79% ``` Performance counter stats for './phonebook_opt_hash' (100 runs): 325,024 cache-misses # 51.790 % of all cache refs ( +- 0.14% ) 627,575 cache-references ( +- 0.47% ) 241,368,273 instructions # 1.56 insn per cycle ( +- 0.02% ) 154,418,301 cycles ( +- 0.38% ) 0.050084548 seconds time elapsed ( +- 1.92% ) ``` ## 優化版本 3 - BST TBD...進度落後中,之後再補 ## 優化版本 4 - trie and array TBD...進度落後中,之後再補 ## 優化版本 5 - 利用 memory pool TBD...進度落後中,之後再補 ## 進階版本 - Fuzzy Search TBD...進度落後中,之後再補