# 2019q3 Homework5 (dict)
contributed by < `xl86305955` >
###### tags:`sysprog2019` `Homework5` `dict`
## 預期目標
* 學習 [ternary search tree](https://en.wikipedia.org/wiki/Ternary_search_tree) 作為 [auto-complete](https://en.wikipedia.org/wiki/Autocomplete) 和 prefix search 的實作機制
* 學習 [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) / [Quotient filter](https://en.wikipedia.org/wiki/Quotient_filter) 和 [ternary search tree](https://en.wikipedia.org/wiki/Ternary_search_tree) 的整合議題
* bit-wise operation 的實踐
* 設計效能評比 (benchmark) 的程式框架
* 學習 GNU make 和 `Makefile` 撰寫
* 複習機率統計和對應的數學基礎
## Trie v.s. Tenary Search Tree
### Trie
在 Trie 中,每個節點為陣列型式儲存對應字串的每一個字元,透過前序特性能夠快速新增或搜詢到想要的字串
![](https://i.imgur.com/GZwIJZ4.png)
* 時間複雜度
* 新增字串: $O(S)$,其中 $S$ 為字串長度
* 尋找字串: $O(S)$,其中 $S$ 為字串長度
* 優點: 與 `Binary Search Tree` 相比,新增及尋找速度非常快。因為節點儲存的內容不同, Trie 所儲存的是字串中的每個字元,搜尋時只需依照字串內容不停往下走,若存在則走完整個字串,不在的話再過程終究可得知;而 BST 式將整個字串儲存成一個節點,因此一定要確定節點是否存在在樹中
* 缺點: 若是資料量大且沒有相同前序,則相當浪費記憶體空間,因位每個 node 階以陣列型式儲存
### Tenary Search Tree
Tenary Search Tree 是一種 Trie 的延伸,又結合了 Binary Tree 的特性,每個節點儲存字串中的字元,兄弟節點之間具有 `>`, `=`, `<` 之關係,依據字串中字元順序長下去
![](https://i.imgur.com/bvPCtrz.png)
* Time Complexity
| Time Complexity | Avg | Worst Case|
| -------- | -------- | -------- |
| Search | $O(logN)$ | $O(N)$ |
| Insert | $O(logN)$ | $O(N)$ |
| Delete | $O(logN)$ | $O(N)$ |
## 程式碼修正
欲搜尋如 `Al Khāniq, Yemen` 之類程式:
```
$ ./test_cpy
ternary_tree, loaded 259112 words in 0.106885 sec
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice: s
find words matching prefix (at least 1 char): Al
Al - not found
```
[dict 改善 & 效能檢討](https://hackmd.io/@j4ywJU0OTzS2BlwpJo6THA/r1ZdcbFpm)中提到欲改善 `fsacanf` 在遇到 `space` 會終止輸入,因此在此加入 `buffer` 處理
```clike=
char buf[WRDMAX];
while (fgets(buf, WORDMAX, fp)) {
char *token = ",";
char *s = strtok(buf, token);
char *p = s;
if (!tst_ins_del(&root, p, INS, CPY)) {
fprintf(stderr, "error: memory exhausted, tst_insert.\n");
fclose(fp);
return 1;
}
idx++;
}
```
修改過後:
```
$ ./test_cpy
ternary_tree, loaded 93827 words in 0.056013 sec
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice: s
find words matching prefix (at least 1 char): Al
Al - searched prefix in 0.000290 sec
suggest[0] : Al Ḩāmūl
suggest[1] : Al Ḩīlah
suggest[2] : Al Ḩadd
suggest[3] : Al Ḩamdī
suggest[4] : Al Ḩammāmāt
...
```
在修改過後,發現 loaded words 從原先的 `259112` 變成 `93827`。從 `cities.txt` 觀察,裡面總共是 `93827` 個欄位
修改過後,在 find 的選項會出現問題:
```
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice: f
find word in tree: Tainan
Tainan not found by bloom filter.
```
因此,要在 `line 11` 加入:
```clike
bloom_add(bloom, p);
```
加入過後,結果正確
```
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice: f
find word in tree: Tainan
Bloomfilter found Tainan in 0.000002 sec.
Probability of false positives:0.001357
----------
Tree found Tainan in 0.000002 sec.
```
## Makefile 修改
### 根據 Bench 繪圖之修改
在 Makefile 中加入 `bench_file` 選項,將 bench_test 計算之搜尋 prefix 時間紀錄於下面兩個 `.txt` 檔中
```
bench_file: $(TESTS)
./test_cpy --bench > bench_cpy.txt
./test_ref --bench > bench_ref.txt
```
如此一來,即可在 Makefile 中加根據 `bench_cpy.txt` 與 `bench_ref.txt` 繪出 `runtimept.gp`
```
BENCH_FILE = bench_cpy.txt bench_ref.txt
...
plot_pt: $(BENCH_FILE)
gnuplot scripts/runtimept.gp
eog runtime2.png
```
另外也可加入 `runtime3` 與 `runtime4` 將其 `cpy`及`ref`分別輸出,以便觀察分佈情形
```
plot_3: $(BENCH_FILE)
gnuplot scripts/runtime3.gp
eog runtime3.png
plot_4: $(BENCH_FILE)
gnuplot scripts/runtime4.gp
eog runtime4.png
```
### 根據 perf 繪圖之修改
在每一次 test 中, `perf` 都會計算對應的 `cache-misses`, `cache-references`, `instruction counts`, `cycles`等數值。為了更好閱讀數值,我們可以將其圖像化。首先,加入 `-o` 將其輸出儲存
```
test: $(TESTS)
echo 3 | sudo tee /proc/sys/vm/drop_caches;
perf stat --repeat 100 -o _perf_cpy.txt \
-e cache-misses,cache-references,instructions,cycles \
./test_cpy --bench $(TEST_DATA)
perf stat --repeat 100 -o _perf_ref.txt \
-e cache-misses,cache-references,instructions,cycles \
./test_ref --bench $(TEST_DATA)
```
其中內容大致如下,以 `_perf_cpy.txt` 為例:
```
$ cat _perf_cpy.txt
# started on Wed Nov 20 13:38:06 2019
Performance counter stats for './test_cpy --bench s Tai' (100 runs):
419,7907 cache-misses # 46.395 % of all cache refs ( +- 0.36% )
904,8119 cache-references ( +- 0.13% )
3,9375,0057 instructions # 1.08 insn per cycle ( +- 0.01% )
3,6526,3327 cycles ( +- 0.31% )
0.085919477 seconds time elapsed ( +- 0.76% )
```
接著,準備繪圖所需之資料。利用 `grep` 指令將所需要數值提出,再利用 `sed` 指令將逗號去除,寫入新的檔案
```
PERF_PREFILE = _perf_cpy.txt _perf_ref.txt
...
perf: $(PERF_PREFILE)
grep -Eo '[0-9]+\,+[0-9]+\,*[0-9]+' _perf_cpy.txt \
| sed 's/,//g' \
> perf_cpy.txt
grep -Eo '[0-9]+\,+[0-9]+\,*[0-9]+' _perf_ref.txt \
| sed 's/,//g' \
> perf_ref.txt
```
在 `/scripts` 下加入新的繪圖腳本
```
reset
set style fill solid
set key center top
set title 'perf stat'
set term png enhanced font 'Verdana,10'
set output 'perf_stat.png'
plot 'perf_cpy.txt' using 1:xtic(1) with histogram title 'cpy' , \
'perf_ref.txt' using 1:xtic(1) with histogram title 'ref'
```
最後,在 Makefile 中加入選項
```
plot_perf: $(PERF_FILE)
gnuplot scripts/perf.gp
eog perf_stat.png
```
解決個別繪圖選項後,可加入選項加圖片一次全部匯出
```
plot_all: output.txt $(BENCH_FILE) $(PERF_FILE)
gnuplot scripts/runtime.gp
gnuplot scripts/runtimept.gp
gnuplot scripts/runtime3.gp
gnuplot scripts/runtime4.gp
gnuplot scripts/perf.gp
eog runtime.png&
eog runtime2.png&
eog runtime3.png&
eog runtime4.png&
eog perf_stat.png&
```
## 圖片輸出
測資為 `s Tai`:
![](https://i.imgur.com/SuGG9yL.png)
![](https://i.imgur.com/LVcrDWe.png)
![](https://i.imgur.com/hdhYaut.png)
![](https://i.imgur.com/zZaR4Sj.png)
測資改為 `s New`:
原先分佈很燥動的 `cpy` 分佈變得較為集中,而 `ref` 則相反
![](https://i.imgur.com/s2CSSsA.png)
![](https://i.imgur.com/IqecycP.png)
![](https://i.imgur.com/rzhTJoU.png)
## perf 觀察
![](https://i.imgur.com/USOg0q0.png)
![](https://i.imgur.com/Nz3abJH.png)
利用 perf 觀察 cycle 總數,發現 `test_cpy` 中呼叫 `tst_ins_del` 花費最多指令周期。在 `line 2` `diff = *p - curr->key` 中,涉及大量 memory 操作
```clike=
while ((curr = *pcurr)) { /* iterate to insertion node */
diff = *p - curr->key; /* get ASCII diff for >, <, = */
if (diff == 0) { /* if char equal to node->key */
if (*p++ == 0) { /* check if word is duplicate */
if (del) { /* delete instead of insert */
(curr->refcnt)--; /* decrement reference count */
/* chk refcnt, del 's', return NULL on successful del */
return tst_del_word(root, curr, &stk, cpy);
} else
curr->refcnt++; /* increment refcnt if word exists */
return (void *) curr->eqkid; /* pointer to word / NULL on del */
}
pcurr = &(curr->eqkid); /* get next eqkid pointer address */
} else if (diff < 0) { /* if char less than node->key */
pcurr = &(curr->lokid); /* get next lokid pointer address */
} else { /* if char greater than node->key */
pcurr = &(curr->hikid); /* get next hikid pointer address */
}
if (del)
tst_stack_push(&stk, curr); /* push node on stack for del */
}
```
* MOVSBL and MOVZBL
* MOVSBL sign-extends a single byte, and copies it into a double-word destination
* MOVZBL expands a single byte to 32 bits with 24 leading zeros, and copies it into a double-word destination
ref: [what does movsbl instruction do? [duplicate]](https://stackoverflow.com/questions/7861095/what-does-movsbl-instruction-do)
在 tst_free_all 中會牽涉大量 cache-misses
![](https://i.imgur.com/JWSV2Wp.png)
在 test_cpy 中佔最多指令的是 malloc
![](https://i.imgur.com/1iIn9Mk.png)