2019q3 Homework5 (dict)

# 2019q3 Homework5 (dict) contributed by < `xl86305955` > ###### tags:`sysprog2019` `Homework5` `dict` ## 預期目標 * 學習 [ternary search tree](https://en.wikipedia.org/wiki/Ternary_search_tree) 作為 [auto-complete](https://en.wikipedia.org/wiki/Autocomplete) 和 prefix search 的實作機制 * 學習 [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) / [Quotient filter](https://en.wikipedia.org/wiki/Quotient_filter) 和 [ternary search tree](https://en.wikipedia.org/wiki/Ternary_search_tree) 的整合議題 * bit-wise operation 的實踐 * 設計效能評比 (benchmark) 的程式框架 * 學習 GNU make 和 `Makefile` 撰寫 * 複習機率統計和對應的數學基礎 ## Trie v.s. Tenary Search Tree ### Trie 在 Trie 中，每個節點為陣列型式儲存對應字串的每一個字元，透過前序特性能夠快速新增或搜詢到想要的字串 ![](https://i.imgur.com/GZwIJZ4.png) * 時間複雜度 * 新增字串： $O(S)$，其中 $S$ 為字串長度 * 尋找字串： $O(S)$，其中 $S$ 為字串長度 * 優點：與 `Binary Search Tree` 相比，新增及尋找速度非常快。因為節點儲存的內容不同， Trie 所儲存的是字串中的每個字元，搜尋時只需依照字串內容不停往下走，若存在則走完整個字串，不在的話再過程終究可得知;而 BST 式將整個字串儲存成一個節點，因此一定要確定節點是否存在在樹中 * 缺點：若是資料量大且沒有相同前序，則相當浪費記憶體空間，因位每個 node 階以陣列型式儲存 ### Tenary Search Tree Tenary Search Tree 是一種 Trie 的延伸，又結合了 Binary Tree 的特性，每個節點儲存字串中的字元，兄弟節點之間具有 `>`, `=`, `<` 之關係，依據字串中字元順序長下去 ![](https://i.imgur.com/bvPCtrz.png) * Time Complexity | Time Complexity | Avg | Worst Case| | -------- | -------- | -------- | | Search | $O(logN)$ | $O(N)$ | | Insert | $O(logN)$ | $O(N)$ | | Delete | $O(logN)$ | $O(N)$ | ## 程式碼修正欲搜尋如 `Al Khāniq, Yemen` 之類程式： ``` $ ./test_cpy ternary_tree, loaded 259112 words in 0.106885 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: s find words matching prefix (at least 1 char): Al Al - not found ``` [dict 改善 & 效能檢討](https://hackmd.io/@j4ywJU0OTzS2BlwpJo6THA/r1ZdcbFpm)中提到欲改善 `fsacanf` 在遇到 `space` 會終止輸入，因此在此加入 `buffer` 處理 ```clike= char buf[WRDMAX]; while (fgets(buf, WORDMAX, fp)) { char *token = ","; char *s = strtok(buf, token); char *p = s; if (!tst_ins_del(&root, p, INS, CPY)) { fprintf(stderr, "error: memory exhausted, tst_insert.\n"); fclose(fp); return 1; } idx++; } ``` 修改過後： ``` $ ./test_cpy ternary_tree, loaded 93827 words in 0.056013 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: s find words matching prefix (at least 1 char): Al Al - searched prefix in 0.000290 sec suggest[0] : Al Ḩāmūl suggest[1] : Al Ḩīlah suggest[2] : Al Ḩadd suggest[3] : Al Ḩamdī suggest[4] : Al Ḩammāmāt ... ``` 在修改過後，發現 loaded words 從原先的 `259112` 變成 `93827`。從 `cities.txt` 觀察，裡面總共是 `93827` 個欄位修改過後，在 find 的選項會出現問題： ``` Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: f find word in tree: Tainan Tainan not found by bloom filter. ``` 因此，要在 `line 11` 加入： ```clike bloom_add(bloom, p); ``` 加入過後，結果正確 ``` Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: f find word in tree: Tainan Bloomfilter found Tainan in 0.000002 sec. Probability of false positives:0.001357 ---------- Tree found Tainan in 0.000002 sec. ``` ## Makefile 修改 ### 根據 Bench 繪圖之修改在 Makefile 中加入 `bench_file` 選項，將 bench_test 計算之搜尋 prefix 時間紀錄於下面兩個 `.txt` 檔中 ``` bench_file: $(TESTS) ./test_cpy --bench > bench_cpy.txt ./test_ref --bench > bench_ref.txt ``` 如此一來，即可在 Makefile 中加根據 `bench_cpy.txt` 與 `bench_ref.txt` 繪出 `runtimept.gp` ``` BENCH_FILE = bench_cpy.txt bench_ref.txt ... plot_pt: $(BENCH_FILE) gnuplot scripts/runtimept.gp eog runtime2.png ``` 另外也可加入 `runtime3` 與 `runtime4` 將其 `cpy`及`ref`分別輸出，以便觀察分佈情形 ``` plot_3: $(BENCH_FILE) gnuplot scripts/runtime3.gp eog runtime3.png plot_4: $(BENCH_FILE) gnuplot scripts/runtime4.gp eog runtime4.png ``` ### 根據 perf 繪圖之修改在每一次 test 中， `perf` 都會計算對應的 `cache-misses`, `cache-references`, `instruction counts`, `cycles`等數值。為了更好閱讀數值，我們可以將其圖像化。首先，加入 `-o` 將其輸出儲存 ``` test: $(TESTS) echo 3 | sudo tee /proc/sys/vm/drop_caches; perf stat --repeat 100 -o _perf_cpy.txt \ -e cache-misses,cache-references,instructions,cycles \ ./test_cpy --bench $(TEST_DATA) perf stat --repeat 100 -o _perf_ref.txt \ -e cache-misses,cache-references,instructions,cycles \ ./test_ref --bench $(TEST_DATA) ``` 其中內容大致如下，以 `_perf_cpy.txt` 為例： ``` $ cat _perf_cpy.txt # started on Wed Nov 20 13:38:06 2019 Performance counter stats for './test_cpy --bench s Tai' (100 runs): 419,7907 cache-misses # 46.395 % of all cache refs ( +- 0.36% ) 904,8119 cache-references ( +- 0.13% ) 3,9375,0057 instructions # 1.08 insn per cycle ( +- 0.01% ) 3,6526,3327 cycles ( +- 0.31% ) 0.085919477 seconds time elapsed ( +- 0.76% ) ``` 接著，準備繪圖所需之資料。利用 `grep` 指令將所需要數值提出，再利用 `sed` 指令將逗號去除，寫入新的檔案 ``` PERF_PREFILE = _perf_cpy.txt _perf_ref.txt ... perf: $(PERF_PREFILE) grep -Eo '[0-9]+\,+[0-9]+\,*[0-9]+' _perf_cpy.txt \ | sed 's/,//g' \ > perf_cpy.txt grep -Eo '[0-9]+\,+[0-9]+\,*[0-9]+' _perf_ref.txt \ | sed 's/,//g' \ > perf_ref.txt ``` 在 `/scripts` 下加入新的繪圖腳本 ``` reset set style fill solid set key center top set title 'perf stat' set term png enhanced font 'Verdana,10' set output 'perf_stat.png' plot 'perf_cpy.txt' using 1:xtic(1) with histogram title 'cpy' , \ 'perf_ref.txt' using 1:xtic(1) with histogram title 'ref' ``` 最後，在 Makefile 中加入選項 ``` plot_perf: $(PERF_FILE) gnuplot scripts/perf.gp eog perf_stat.png ``` 解決個別繪圖選項後，可加入選項加圖片一次全部匯出 ``` plot_all: output.txt $(BENCH_FILE) $(PERF_FILE) gnuplot scripts/runtime.gp gnuplot scripts/runtimept.gp gnuplot scripts/runtime3.gp gnuplot scripts/runtime4.gp gnuplot scripts/perf.gp eog runtime.png& eog runtime2.png& eog runtime3.png& eog runtime4.png& eog perf_stat.png& ``` ## 圖片輸出測資為 `s Tai`: ![](https://i.imgur.com/SuGG9yL.png) ![](https://i.imgur.com/LVcrDWe.png) ![](https://i.imgur.com/hdhYaut.png) ![](https://i.imgur.com/zZaR4Sj.png) 測資改為 `s New`: 原先分佈很燥動的 `cpy` 分佈變得較為集中，而 `ref` 則相反 ![](https://i.imgur.com/s2CSSsA.png) ![](https://i.imgur.com/IqecycP.png) ![](https://i.imgur.com/rzhTJoU.png) ## perf 觀察 ![](https://i.imgur.com/USOg0q0.png) ![](https://i.imgur.com/Nz3abJH.png) 利用 perf 觀察 cycle 總數，發現 `test_cpy` 中呼叫 `tst_ins_del` 花費最多指令周期。在 `line 2` `diff = *p - curr->key` 中，涉及大量 memory 操作 ```clike= while ((curr = *pcurr)) { /* iterate to insertion node */ diff = *p - curr->key; /* get ASCII diff for >, <, = */ if (diff == 0) { /* if char equal to node->key */ if (*p++ == 0) { /* check if word is duplicate */ if (del) { /* delete instead of insert */ (curr->refcnt)--; /* decrement reference count */ /* chk refcnt, del 's', return NULL on successful del */ return tst_del_word(root, curr, &stk, cpy); } else curr->refcnt++; /* increment refcnt if word exists */ return (void *) curr->eqkid; /* pointer to word / NULL on del */ } pcurr = &(curr->eqkid); /* get next eqkid pointer address */ } else if (diff < 0) { /* if char less than node->key */ pcurr = &(curr->lokid); /* get next lokid pointer address */ } else { /* if char greater than node->key */ pcurr = &(curr->hikid); /* get next hikid pointer address */ } if (del) tst_stack_push(&stk, curr); /* push node on stack for del */ } ``` * MOVSBL and MOVZBL * MOVSBL sign-extends a single byte, and copies it into a double-word destination * MOVZBL expands a single byte to 32 bits with 24 leading zeros, and copies it into a double-word destination ref: [what does movsbl instruction do? [duplicate]](https://stackoverflow.com/questions/7861095/what-does-movsbl-instruction-do) 在 tst_free_all 中會牽涉大量 cache-misses ![](https://i.imgur.com/JWSV2Wp.png) 在 test_cpy 中佔最多指令的是 malloc ![](https://i.imgur.com/1iIn9Mk.png)