2018q1 Homework1 (phonebook)

# 2018q1 Homework1 (phonebook) contributed by <`bauuuu1021`> ###### tags: `bauuuu1021`, `sysprog` ## Code review ### Reviewed by `ryanpatiency` * commit message [Distributed by prefix](https://github.com/bauuuu1021/phonebook/commit/d70cbc8030d917ba618269645b263f50a40de76e "Distributed by prefix") is not clear, and lack subject * commit message [Increase decimal bit to 8 bits](https://github.com/bauuuu1021/phonebook/commit/bddd484026ef1b24be7cbc3a3b3de1aed4dcc371 "Increase decimal bit to 8 bits") could change to "Express the execution time more precisely". The former is what, the latter is why. We can know what by git diff, but we don't know why unless you say it. ### Reviewed by `shaimejump520` * [d70cbc8](https://github.com/bauuuu1021/phonebook/commit/d70cbc8030d917ba618269645b263f50a40de76e) 在 append 方法做了修改，然而該 function 的參數及回傳值都變得沒有意義，容易讓人對於該 function 的作用產生誤會 >這是因為我想到的辦法其實用不到傳入參數，又覺得不要改 main 比較好。不過就像學長說的可能會讓別人誤會 function 意義，要再想一下怎麼辦比較好，感謝提醒 [name=bauuuu1021] * fuzzy search 的部分，因為是根據相同字串長度的預設下去坐搜尋的，按照原方法搜尋 "justino"，若檔案中有兩筆資料分別為 "jxustino" 和 "justixx"，一般而言，應認為前者與搜尋的字串較相近，然而因為程式行為卻會認為後者較為接近。提供淺見，可使用解決 Longest Common Subsequence 的演算法，應可以提供準確率較高的搜尋。(但是由於該問題尚未有高效率的演算法，可能大幅增加 find time) ### Reviewed by `rwe0214` * distributed by prefix 其實也可以看成是一種 hash 的方式，所以 `findName()` 的時間會取決於每個 bucket 的 collision 數量。本文提到的 `findName()` 時間大幅降低的原因是因為在 `main.c` 裡搜尋的 target 為 `zyxel` ，會被分在 prefix 為 " z " 這個 bucket，而在 `words.txt` 檔中以 " z " 為 prefix 的數量只有 2203 個，故此 bucket 的 collision 的數量是 2202 個，比起搜尋全部將近 35 萬筆資料來說，時間相較之下便會大幅降低。 * 可以試著比較看看數量最多的 " s " 和數量最少 " x " 看看時間的差別。 ## 安裝Ubuntu 17.10 >中英文字間記得用空白隔開喔！ >[color=red][name=課程助教] >>[color=blue]感謝提醒！[name=bauuuu1021] * 助教的 [教學影片](https://www.youtube.com/watch?v=rjpBTT6AmeU) 蠻清楚的，有遇到的問題是剛開始在 BIOS 中無法選取 Boot->Launch CSM；把後來先把 Security->Secure Boot Control 給 Disable 後存檔重新開機再進入即解決。 * 我將 win10 裝在 ssd， ubuntu 裝在 hard disk ；發現要把 ubuntu 放在第一個開機選項否則無法選擇 OS。 ``` bauuuu1021@x555lb:~$ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 17.10 Release: 17.10 Codename: artful ``` ``` bauuuu1021@x555lb:~/phonebook$ lscpu Architecture: x86_64 CPU 作業模式： 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數： 2 每通訊端核心數： 2 Socket(s): 1 NUMA 節點： 1 供應商識別號： GenuineIntel CPU 家族： 6 型號： 61 Model name: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz 製程： 4 CPU MHz： 2196.645 CPU max MHz: 2700.0000 CPU min MHz: 500.0000 BogoMIPS: 4393.29 虛擬： VT-x L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 3072K NUMA node0 CPU(s): 0-3 ``` * 我剛剛嘗試從16.04升級到17.10，感覺系統有點不穩嗎（？），軟體中心跑蠻慢的還一直閃退，最重要的是 terminal 開不起來，但是 vistual studio code 裡的可以，希望高手解惑QQ > 抱歉多嘴一下，前面提到開機選項其實並不一定要將Ubuntu設為優先才找得到，只不過Windows原生的開機選單有點難用，有種直觀的方法是找Windows的製作開機選單的小程式，用關鍵字下去蒐應該有很多可以使用，然後不知道terminal開不起來是指怎麼樣的開不起來呢？如果是指沒有快捷鍵的話，可以去系統選項新增。然後如果你問題解決了可以把我的話刪掉 >[color=red] [name=林靜儀] > >> 好的，開機部份我有空會試試！ terminal 就是按他之後會開始跑小圈圈，可是過一下子就不見了， terminal 也沒有開起來... >>[color=blue] [name=bauuuu1021] >> * 安裝雙系統可能會遇到時間跑掉的問題，這是其中一種 [解決方法](https://www.ptt.cc/bbs/Linux/M.1504346730.A.784.html)。 ## 實驗紀錄先執行以下指令以清空 cache ``` echo 1 | sudo tee /proc/sys/vm/drop_caches ``` 輸入以下指令以更改權限 ``` sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid' ``` ### orig 版本 ``` Performance counter stats for './phonebook_orig' (100 runs): 3,424,094 cache-misses 88.878 % of all cache refs ( +- 0.08% ) ``` ![](https://i.imgur.com/u0cFltI.png) ### 降低 entry size * 這次實驗的 [Github link](https://github.com/bauuuu1021/phonebook/commit/ecd216d2631906e965807aa23ba06944e1c98bed ) * 發現其實只用的到 lastName 所以先將 struct 中其他 element 刪除， entry size 降低到 24 bytes。 * function 都仍使用 phonebook_orig.c 中的。 * ![](https://i.imgur.com/jezhZph.png) * 從上圖可發現如果降低 entry size ，可大幅降低 append 及 findName 的時間。 * miss rate 約65% ``` Performance counter stats for './phonebook_opt' (100 runs): 869,400 cache-misses 65.164 % of all cache refs ( +- 0.41% ) ``` ### 先以字首分群，再進行搜尋 * 這次實驗的 [Github link](https://github.com/bauuuu1021/phonebook/commit/bddd484026ef1b24be7cbc3a3b3de1aed4dcc371) * append 時依照字首字母，分到 26 個子 list； findName 時依照字首字母，到所在的 list 尋找。 * 在 struct 中加上一個 pointer 指到新增的 detail data struct，因為實際情況不可能沒有紀錄 last name 以外資訊 * ![](https://i.imgur.com/DmMRQDZ.png) * append() 相較昨天的更花時間，推論應該是要先判斷要分到哪個 prefix 群組；而 findName 真是太神奇了，不知道為什麼會0。有用不在 words.txt 中的字確認過了，在 assert 會被擋下來。 * 另外測試 print 出 find 的下一個字結果也正確；也確定 tv_nsec > 0。 ``` Performance counter stats for './phonebook_opt' (100 runs): 463,084 cache-misses 64.347 % of all cache refs ( +- 0.34% ) 719,670 cache-references ( +- 0.82% ) 203,835,346 instructions 1.51 insn per cycle ( +- 0.02% ) 135,375,872 cycles ( +- 0.32% ) ``` * miss rate 仍不理想，約64%。 > 後來發現 findName 結果為 0 是因為小數點 6 位不足以表達，稍微更改了 output.txt 、 main.c 等與 make plot 有關之檔案，將精度提升到 8 位，得到新的圖如下： > ![](https://i.imgur.com/ld3XBlN.png) ### Hash table * 參考了一下 [各種Hash用法](https://www.byvoid.com/zht/blog/string-hash-compare "title") ，決定使用 BKDR 。 * 有先參考過 [Cache line](http://cenalulu.github.io/linux/all-about-cpu-cache/) ，了解 Cache 的運作方式。 * 這次實驗的 [Github link](https://github.com/bauuuu1021/phonebook/commit/d8f5f3ebbb07968d577bb36809af5f1b200a275d) ![](https://i.imgur.com/E7O3gir.png) * 相較字首分群 findName 又快一點點，但 append 增加了許多，推測是每筆資料都要跑一次 hash() 以求得 index number 的緣故。 ``` Performance counter stats for './phonebook_opt' (100 runs): 474,408 cache-misses 57.816 % of all cache refs ( +- 0.34% ) 820,546 cache-references ( +- 0.79% ) 251,405,888 instructions 1.61 insn per cycle ( +- 0.02% ) 156,114,570 cycles ( +- 0.27% ) ``` * miss rate 有些許下降，但跟其他同學相比還是太高。我會再研究一下 [Cache line](http://cenalulu.github.io/linux/all-about-cpu-cache/) 以找出原因。 * 輸入以下指令可得知 cache line 大小為 64 Bytes ``` bauuuu1021@x555lb:~$ cat /sys/devices/system/cpu/cpu1/cache/index0/coherency\_line\_size 64 ``` > * 將 index number 改為 8192 ， append time 大幅增加![](https://i.imgur.com/edFlOlK.png) > * miss rate 降到 46% 左右 > * 這次實驗的 [Github link](https://github.com/bauuuu1021/phonebook/commit/601fb9a3c7df64277ff1e66c0f62c6d3b09121af) ``` Performance counter stats for './phonebook_opt' (100 runs): 580,993 cache-misses 46.150 % of all cache refs ( +- 1.17% ) 1,258,927 cache-references ( +- 1.25% ) 251,430,128 instructions 1.48 insn per cycle ( +- 0.02% ) 170,397,584 cycles ( +- 0.88% ) ``` ### Fuzzy Search * 另外創了一個 branch 來時做這個功能，這是新 branch 的 [Github link](https://github.com/bauuuu1021/phonebook/commit/5c193dfd64575f46c980e6488d95121278ca41ed) * 順便整理一下管理 branch 會用到的指令 * Create local branch `git branch <new_branch>` * List branch `git branch` * Switch branch `git branch <target_branch>` * Push `git push origin <target_branch>` * 以字首分群為基礎做延伸，如果找不到索引字，則在相同字首中找出最多字母相同的。 * 以下為實驗結果 ``` bauuuu1021@X555LB:~/phonebook$ sudo ./phonebook_opt size of entry : 32 bytes Couldn't find 'justinm', did you mean 'justino'?? Couldn't find 'justinm', did you mean 'justino'?? Couldn't find 'justinm', did you mean 'justino'?? execution time of append() : 0.05303118 sec execution time of findName() : 0.00026880 sec ``` ## 問題討論本例選用的 dataset (也就是輸入電話簿的姓名) 是否存在問題？ - 有代表性嗎？跟真實英文姓氏差異大嗎？ * dataset 中有些姓氏是明顯不存在的，例如連續超過 3 個母音相連的 aaaa 、完全沒有母音的 bbcc 等。 - 資料難道「數大就是美」嗎？ * 大量資料要是沒有透過有效率、有系統的儲存方式，會造成搜尋時需要花許多時間；就不算是一個好的 "Phonebook" 。 - 如果我們要建立符合真實電話簿情境的輸入，該怎麼作？ * 真實的電話簿不可能只儲存 last name ，一定還會有 detail data 如電話號碼、 first name 等等。在搜尋時也不一定要使用 last name ，像我的手機還可以透過 first name、公司名稱、電話號碼來搜尋；但勢必也會需要更多空間來當索引資料，不能像我的實驗都只能搜尋 last name，會造成所需時間增加（透過縮減 struct size 的實驗已證實）。 ## 環境對 Cache miss 之影響 * 因為我做出來的 Cache miss 跟其他使用類似方法的同學相比還是高了不少，因此想要找出可能的原因。 * 參考[`BroLeaf`](https://hackmd.io/s/SkHgvUU_f#)同學的實驗，這是他的電腦上跑出的結果。 ``` Performance counter stats for './phonebook_opt' (100 runs): 72,442 cache-misses # 17.652 % of all cache refs ( +- 0.31% ) 410,390 cache-references ( +- 0.70% ) 264,714,706 instructions # 1.85 insn per cycle ( +- 0.02% ) 143,473,401 cycles ( +- 0.41% ) 0.042652042 seconds time elapsed ( +- 0.55% ) ``` * 這是在我的電腦上跑出的結果。 ``` Performance counter stats for './phonebook_opt' (100 runs): 432,661 cache-misses # 44.000 % of all cache refs ( +- 0.72% ) 983,330 cache-references ( +- 0.32% ) 264,727,627 instructions # 1.95 insn per cycle ( +- 0.02% ) 136,104,259 cycles ( +- 0.20% ) 0.052446045 seconds time elapsed ( +- 0.46% ) ``` * 可以發現 instructions 數大致相同，但 miss rate 差超多。 * 整理出幾點我們開發環境明顯相異處 * 我的是 i5-5200U CPU，他的是 i7-4720HQ CPU * 我的 CPU 4 核，他的 8 核 * 他的 L3 cache 約是我的兩倍 * 推論可能是 `BroLeaf` 同學的 Cache size 比我大蠻多的（CPU 數兩倍，表示比我多 2 倍的 L1及L2 空間），或是 i5 跟 i7 存取 cache 的行為模式不一樣 (我的 cache-references 約為兩倍)。 * 因此我認為 cache miss 只適用於自身環境下演算法改變造成的增加或減少的參考，不適用與他人直接用 cache miss 數值做比較。 ## 參考資料 * [GNU/Linux 開發工具](https://hackmd.io/c/rJKbX1pFZ/https%3A%2F%2Fhackmd.io%2Fs%2Fr1Psrf0KW) * [Git 工具線上練習](https://try.github.io/levels/1/challenges/20)