---
tags: sysprog2017
---
# 2017q3 Homework2 (prefix-search)
contributed by <`Yuessiah`, `HexRabbit`[^1]>
###### Hyperlinks are used throughout this document to supplement the report; following them is recommended.
## Development Environment
```
OS: elementary OS 0.3.2 Freya
Kernel: x86_64 Linux 4.4.0-93-generic
Shell: bash 4.3.11
Virtualization: VT-x
CPU: Intel Core i7-7700 @ 4.2GHz
CPU(s): 8
RAM: 1691MiB / 15934MiB
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
BogoMIPS: 7200.65
```
# Ternary Search Tree
## Introduction
To search for a specific string in a collection of strings, we can use a [hash table](https://en.wikipedia.org/wiki/Hash_table), but as the amount of data (strings) grows, the table must be resized to keep its expected performance. A [Binary Search Tree](https://en.wikipedia.org/wiki/Binary_search_tree) (BST) gives stable performance, but it is still much slower than a hash table. A [Trie](https://en.wikipedia.org/wiki/Trie) makes lookups very fast, but it requires too much storage.
A [Ternary Search Tree](https://en.wikipedia.org/wiki/Ternary_search_tree) (TST) combines the strengths of the Trie and the BST: lookups are fast, and it uses little storage.
### Data Structure
> Indent the code below according to the given coding style. Don't forget: to "fight as a team", we need discipline and a firm grasp of the details.
> [name="jserv"][color=red]
>> Got it, I'll fix it right away!
>> [name="Yuessiah"][color=blue]
```clike
struct Node {
    char data;
    unsigned isEndOfString: 1;
    struct Node *left, *eq, *right;
};
```
### Insertion
```clike
void insert(struct Node **root, char *word)
{
    /* Create a node for the current character if none exists yet */
    if (!(*root))
        *root = newNode(*word);

    if ((*word) < (*root)->data)        /* smaller: go left */
        insert(&((*root)->left), word);
    else if ((*word) > (*root)->data)   /* larger: go right */
        insert(&((*root)->right), word);
    else {
        if (*(word + 1))                /* same character: advance the string */
            insert(&((*root)->eq), word + 1);
        else                            /* end of word: mark the terminator */
            (*root)->isEndOfString = 1;
    }
}
```
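`insert()` relies on a `newNode()` helper that is not shown here; a minimal version consistent with the struct above (my own sketch, not code from the original source) would be:
```clike
#include <stdlib.h>

struct Node *newNode(char data)
{
    struct Node *node = malloc(sizeof(struct Node));
    node->data = data;
    node->isEndOfString = 0;
    node->left = node->eq = node->right = NULL;
    return node;
}
```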
### Search
```clike
int search(struct Node *root, char *word)
{
    if (!root)
        return 0;

    if (*word < root->data)             /* smaller: go left */
        return search(root->left, word);
    else if (*word > root->data)        /* larger: go right */
        return search(root->right, word);
    else {
        if (*(word + 1) == '\0')        /* last character: is a word stored here? */
            return root->isEndOfString;
        return search(root->eq, word + 1);
    }
}
```
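A minimal usage sketch tying the three pieces together (the word list and expected output are only illustrative):
```clike
#include <stdio.h>

/* struct Node, newNode(), insert() and search() as defined above */

int main(void)
{
    struct Node *root = NULL;
    char *words[] = {"cat", "cats", "up", "bug"};

    for (int i = 0; i < 4; i++)
        insert(&root, words[i]);

    printf("%d\n", search(root, "cats"));  /* 1: stored          */
    printf("%d\n", search(root, "ca"));    /* 0: only a prefix   */
    printf("%d\n", search(root, "bug"));   /* 1: stored          */
    return 0;
}
```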
---
## [prefix-search](https://github.com/sysprog21/prefix-search) research and observations
### Test input commands:
```
a
d
a
de
a
del
a
h
a
ai
```
### Traversal
```from = l``` means the node was reached from the previous node's ```lokid```
```from = e``` means the node was reached from the previous node's ```eqkid```
```from = h``` means the node was reached from the previous node's ```hikid```
```prev_key``` is the parent node's ```key``` value
A line without a ```key``` or ```str``` value means that node is ```NULL```
```
depth = 0, from = Ø, prev_key = Ø, key = d
depth = 1, from = l, prev_key = d, key = a
depth = 2, from = l, prev_key = a
depth = 2, from = e, prev_key = a, key = i
depth = 3, from = l, prev_key = i
depth = 3, from = e, prev_key = i, str = ai
depth = 4, from = l, prev_key = 0
depth = 4, from = h, prev_key = 0
depth = 3, from = h, prev_key = i
depth = 2, from = h, prev_key = a
depth = 1, from = e, prev_key = d, str = d
depth = 2, from = l, prev_key = 0
depth = 2, from = h, prev_key = 0, key = e
depth = 3, from = l, prev_key = e
depth = 3, from = e, prev_key = e, str = de
depth = 4, from = l, prev_key = 0
depth = 4, from = h, prev_key = 0, key = l
depth = 5, from = l, prev_key = l
depth = 5, from = e, prev_key = l, str = del
depth = 6, from = l, prev_key = 0
depth = 6, from = h, prev_key = 0
depth = 5, from = h, prev_key = l
depth = 3, from = h, prev_key = e
depth = 1, from = h, prev_key = d, key = h
depth = 2, from = l, prev_key = h
depth = 2, from = e, prev_key = h, str = h
depth = 3, from = l, prev_key = 0
depth = 3, from = h, prev_key = 0
depth = 2, from = h, prev_key = h
```
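A hedged sketch of a recursive routine that could produce a trace in the format above. The node layout follows the `lokid`/`eqkid`/`hikid` fields described earlier; the `dump()` name and the assumption that a terminating node stores its string in `eqkid` are mine, not code taken from the repository.
```clike
#include <stdio.h>

typedef struct tst_node {
    char key;                              /* 0 at a terminating node          */
    unsigned refcnt;
    struct tst_node *lokid, *eqkid, *hikid;
} tst_node;                                /* assumed to mirror prefix-search  */

static void dump(const tst_node *node, int depth, char from, char prev_key)
{
    char pk = prev_key ? prev_key : '0';   /* a zero key is printed as '0'     */

    if (!node) {                           /* NULL child: no key, no str       */
        printf("depth = %d, from = %c, prev_key = %c\n", depth, from, pk);
        return;
    }
    if (node->key)
        printf("depth = %d, from = %c, prev_key = %c, key = %c\n",
               depth, from, pk, node->key);
    else                                   /* terminating node: eqkid holds the string */
        printf("depth = %d, from = %c, prev_key = %c, str = %s\n",
               depth, from, pk, (const char *) node->eqkid);

    dump(node->lokid, depth + 1, 'l', node->key);
    if (node->key)                         /* a terminating node has no eqkid subtree */
        dump(node->eqkid, depth + 1, 'e', node->key);
    dump(node->hikid, depth + 1, 'h', node->key);
}
```
Calling `dump(root, 0, '-', '-')` (placeholders where the trace above shows Ø) reproduces the same visiting order: lokid, then eqkid, then hikid.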
### Structure Diagram
> HackMD supports GraphViz; please redraw the figure below with the corresponding syntax.
> [name="jserv"][color=red]
Circles are labeled with the ```key``` value
Rectangles are labeled with the stored strings
![](https://i.imgur.com/b64TgM8.png)
---
## [REF version](https://github.com/Yuessiah/prefix-search/commit/71b5e340f2d5b12d7a98544459a61335bf09ab89) implementation
Strings passed to add are kept in one large block of memory, avoiding strdup (i.e. copying), which would scatter the strings across small, non-contiguous allocations.
:::danger
1. A memory region is not automatically better just because it is "large"
2. Try out other memory pools
:notes: jserv
:::
> Thanks for the suggestions; I'll add this when I have the capacity.
> [name="Yuessiah"][color=blue]
```clike
int ord = -1, reflen = REFMAX;   /* force an allocation on the first request */
char *refer[ORDMAX];

/* Return a pointer to a copy of s kept inside the current pool block,
 * allocating a fresh block whenever the current one cannot hold s. */
char *pool_request(char *s)
{
    size_t len = strlen(s) + 1;

    if (reflen + len >= REFMAX) {
        refer[++ord] = malloc(REFMAX);
        reflen = 0;
    }

    char *fp = strcpy(refer[ord] + reflen, s);
    reflen += len;
    return fp;
}
```
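A minimal, self-contained sketch of how the pool behaves; the `REFMAX`/`ORDMAX` values are placeholders of mine, and in test_ref the caller is the TST insertion path rather than `main()`:
```clike
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define REFMAX (1 << 20)   /* size of one pool block (placeholder value)   */
#define ORDMAX 1024        /* maximum number of pool blocks (placeholder)  */

/* pool_request() and its globals as defined above */

int main(void)
{
    char *a = pool_request("apple");
    char *b = pool_request("app");

    /* Both copies sit back to back in the same block, so walking them
     * during a search touches contiguous memory. */
    printf("%s %s (gap = %ld bytes)\n", a, b, (long) (b - a));
    return 0;
}
```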
---
## Performance Testing
### Randomly generated test input
I wrote a small program, [generate_input.py](https://github.com/Yuessiah/prefix-search/blob/master/generate_input.py), that randomly generates test data.
Given num, it produces two test input files (each of ```case 'f', 's', 'd'``` is exercised num times):
```match_input.txt```, a set of strings that are guaranteed to be found,
and ```bad_input.txt```, a set of strings that are guaranteed not to be found.
### perf stat for test_cpy
num = 10000
```
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_cpy < match_input.txt
Performance counter stats for './test_cpy':
802,400,118 cache-misses # 85.252 % of all cache refs (71.37%)
941,205,107 cache-references (71.45%)
24,044,167,994 instructions # 1.22 insns per cycle (71.46%)
19,664,808,946 cycles (71.46%)
3,873,694,250 branches (71.46%)
50,142,052 branch-misses # 1.29% of all branches (71.45%)
24,009,233,921 instructions # 1.22 insns per cycle (71.37%)
4.765697782 seconds time elapsed
---
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_cpy < bad_input.txt
Performance counter stats for './test_cpy':
7,694,804 cache-misses # 65.799 % of all cache refs (70.78%)
11,694,377 cache-references (70.74%)
944,629,368 instructions # 1.68 insns per cycle (70.73%)
563,366,677 cycles (70.73%)
190,461,459 branches (73.41%)
1,864,072 branch-misses # 0.98% of all branches (73.82%)
972,387,637 instructions # 1.73 insns per cycle (71.17%)
0.136838377 seconds time elapsed
```
### perf stat for test_ref
num = 10000
```
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_ref < match_input.txt
Performance counter stats for './test_ref':
819,170,625 cache-misses # 83.692 % of all cache refs (71.44%)
978,793,649 cache-references (71.45%)
23,978,277,446 instructions # 1.09 insns per cycle (71.45%)
21,969,185,090 cycles (71.45%)
3,869,006,562 branches (71.45%)
50,637,809 branch-misses # 1.31% of all branches (71.39%)
24,013,397,881 instructions # 1.09 insns per cycle (71.39%)
5.295836590 seconds time elapsed
---
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_ref < bad_input.txt
Performance counter stats for './test_ref':
7,570,185 cache-misses # 64.458 % of all cache refs (70.43%)
11,744,444 cache-references (70.46%)
901,386,228 instructions # 1.61 insns per cycle (70.43%)
558,437,763 cycles (70.42%)
179,952,823 branches (72.74%)
1,762,884 branch-misses # 0.98% of all branches (75.71%)
934,098,197 instructions # 1.67 insns per cycle (73.32%)
0.135439831 seconds time elapsed
```
The execution time of the REF version is about the same as that of the CPY version.
:::danger
1. Don't casually say "about the same"; escape the liberal-arts mindset^TM^ early and stop working by gut feeling
2. This needs to be explained with a computer-science background
:notes: jserv
:::
> Thanks for the reminder; I'll work on the explanation when I have time.
> [name="Yuessiah"][color=blue]
---
## Adding a bench make target
```makefile
TESTS = \
    test_cpy \
    test_ref

bench: $(TESTS)
	-rm -f output.txt
	echo 10000 | ./generate_input.py
	for method in $(TESTS); \
	do \
	    echo "match-input:" >> output.txt; \
	    ./$$method < match_input.txt > /dev/null; \
	    echo "bad-input:" >> output.txt; \
	    ./$$method < bad_input.txt > /dev/null; \
	done
```
In both the REF and CPY versions, the execution time of each ```case 'a', 'f', 's', 'd'``` operation is summed per category, and a new [command](https://github.com/Yuessiah/prefix-search/blob/master/test_cpy.c#L155) prints the totals. The output is shown below (generated with my own [script](https://github.com/Yuessiah/prefix-search/blob/master/generate_input.py), num = 10000):
```
./test_cpy
match-input:
add, total time 0.089029 sec.
find, total time 0.005313 sec.
search, total time 4.882656 sec.
delete, total time 0.008483 sec.
bad-input:
add, total time 0.085424 sec.
find, total time 0.000572 sec.
search, total time 0.002984 sec.
delete, total time 0.001085 sec.
./test_ref
match-input:
add, total time 0.083307 sec.
find, total time 0.005261 sec.
search, total time 5.422224 sec.
delete, total time 0.008679 sec.
bad-input:
add, total time 0.083800 sec.
find, total time 0.000609 sec.
search, total time 0.003103 sec.
delete, total time 0.001162 sec.
```
---
## Speeding up the CPY version
>The cost of strdup has to be considered; it hurts cache locality. Don't compare only the prefix-search time; the time and space overhead of building the TST should also be considered. ---jserv [^2]
Considering the cost of [strdup](http://man7.org/linux/man-pages/man3/strdup.3.html), we want to reduce how often it is called. Two approaches come to mind:
1. [lazy evaluation](https://zh.wikipedia.org/wiki/惰性求值)
2. during prefix search (```case: 's'```), call strdup only on leaf nodes
### 1. lazy evaluation
When a string is added (```case 'a'```), do not call strdup immediately; call it only when the copy is actually needed, that is, when doing the prefix search.
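A conceptual sketch of idea 1 (hypothetical names, not the repository's patch), assuming the referenced string stays valid until the copy is made, for example because it lives in a memory pool like the one in the REF section:
```clike
#include <string.h>

/* Per-word bookkeeping: the copy is deferred until a search actually
 * needs to hand the string back (lazy evaluation / memoization). */
typedef struct lazy_str {
    const char *ref;    /* recorded at add time: no strdup yet            */
    char *copy;         /* produced on the first search hit, then reused  */
} lazy_str;

static const char *lazy_str_get(lazy_str *s)
{
    if (!s->copy)
        s->copy = strdup(s->ref);   /* the only place the copy cost is paid */
    return s->copy;
}
```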
### 2. strdup() only at leaf nodes
Lazy evaluation alone is not enough: we can store the whole string only at the leaf node, and whether a given prefix of that string was itself added can be decided simply by checking `refcnt > 0`.
### pros and cons
Although the total time drops substantially, the prefix-search operation itself takes longer. In some situations users only want the search to be fast and do not care about loading the data quickly.
---
## Problems encountered during the assignment
Shared purely for reference; I have already understood or resolved them.
- >Why did you specify -std twice? gnu99 overrides c99 in gcc's manner. ---jserv [^3]
- Different environments tolerate different input sizes (e.g. someone else's machine may hit a segmentation fault after 30000 operations while mine does not) [^4][^5]
- Deleting the same string, one that was never added to the TST, twice in a row produces an unexpected response [^6]
---
## Practical applications of TST
- [auto-completion](http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/) (sketched below)
- spell-checking
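As a rough illustration of the auto-completion use case, here is a sketch built on the simple Node structure from the first section; the function names and buffer size are mine, not taken from the linked article:
```clike
#include <stdio.h>

/* Uses struct Node, insert() and isEndOfString from the first section. */

/* Walk down to the node that terminates 'prefix'; NULL if the prefix
 * does not occur in the tree. */
struct Node *find_prefix(struct Node *root, const char *prefix)
{
    while (root) {
        if (*prefix < root->data)
            root = root->left;
        else if (*prefix > root->data)
            root = root->right;
        else if (*(prefix + 1) == '\0')
            return root;
        else {
            root = root->eq;
            prefix++;
        }
    }
    return NULL;
}

/* Print every stored word below 'node', prepending the matched prefix. */
void complete(struct Node *node, const char *prefix, char *buf, int depth)
{
    if (!node)
        return;
    complete(node->left, prefix, buf, depth);
    buf[depth] = node->data;
    if (node->isEndOfString) {
        buf[depth + 1] = '\0';
        printf("%s%s\n", prefix, buf);
    }
    complete(node->eq, prefix, buf, depth + 1);
    complete(node->right, prefix, buf, depth);
}

/* List every word starting with 'prefix'. */
void autocomplete(struct Node *root, const char *prefix)
{
    struct Node *p = find_prefix(root, prefix);
    char buf[128];

    if (!p)
        return;
    if (p->isEndOfString)            /* the prefix itself is a stored word */
        printf("%s\n", prefix);
    complete(p->eq, prefix, buf, 0);
}
```
For example, after inserting "d", "de", "del", "h" and "ai" with the earlier `insert()`, `autocomplete(root, "de")` prints `de` and `del`.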
---
## Reference
[geeksforgeeks.org / Ternary Search Tree](http://www.geeksforgeeks.org/ternary-search-tree)
[HackMD / notes by classmate `HMKRL`](https://hackmd.io/s/BkdddNs3-)
[^1]: This write-up was produced through [joint research](https://hackmd.io/MwBgLArATAnAjAUwLTDgYwOxLAEzjJAQwmCiQwykwgA4IYwqg===?view) with `HexRabbit`
[^2]: [HackMD / notes by classmate `st9007a`, section 修正-test-refc](https://hackmd.io/s/SkeKQEq3Z#修正-test-refc)
[^3]: GitHub / comment on [`9095633`](https://github.com/Yuessiah/prefix-search/commit/909563351240dadcd50125c7bd9159d5ca30b216#comments) by `jserv`
[^4]: GitHub / commit [`528a510`](https://github.com/HexRabbit/prefix-search/commit/528a510e144a8e6dd9d2a1cdb2da6614a1a1abaa) by `HexRabbit`
[^5]: [HackMD / notes by classmate `HexRabbit`, section 研究紀錄](https://hackmd.io/s/S1bax6MR2b#研究紀錄)
[^6]: GitHub / commit [`1cb6adc`](https://github.com/Yuessiah/prefix-search/commit/1cb6adc843f92c9eea21413f732b3acffbffd9c6) by `Yuessiah`