---
tags: sysprog2017
---
# 2017q3 Homework2 (prefix-search)
contributed by <`Yuessiah`, `HexRabbit`[^1]>
###### Hyperlinks are used throughout this document to supplement the report; following them is recommended.
## Development Environment
```
OS: elementary OS 0.3.2 Freya
Kernel: x86_64 Linux 4.4.0-93-generic
Shell: bash 4.3.11
Virtualization: VT-x
CPU: Intel Core i7-7700 @ 4.2GHz
CPU(s): 8
RAM: 1691MiB / 15934MiB
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
BogoMIPS: 7200.65
```
# Ternary Search Tree
## Introduction
To search for a specific string in a collection of strings, we can use a [hash table](https://en.wikipedia.org/wiki/Hash_table), but as the amount of data (strings) grows, the table must be resized to keep its expected performance. A [Binary Search Tree](https://en.wikipedia.org/wiki/Binary_search_tree) (BST) gives stable performance, but it is still much slower than a hash table. A [Trie](https://en.wikipedia.org/wiki/Trie) makes lookups very fast, but it requires too much storage.
A [Ternary Search Tree](https://en.wikipedia.org/wiki/Ternary_search_tree) (TST) combines the strengths of the Trie and the BST: lookups are fast, and it uses little storage.
### Data Structure
> Indent the code below according to the given coding style. Don't forget: to "fight as a team", we need discipline and a firm grasp of the details.
> [name="jserv"][color=red]
>> Got it, I'll fix it right away!
>> [name="Yuessiah"][color=blue]
```clike
struct Node {
    char data;
    unsigned isEndOfString: 1;
    struct Node *left, *eq, *right;
};
```
### Insertion
```clike
void insert(struct Node **root, char *word)
{
    /* Create a node for the current character if none exists yet */
    if (!(*root))
        *root = newNode(*word);

    if ((*word) < (*root)->data)        /* smaller: go left */
        insert(&((*root)->left), word);
    else if ((*word) > (*root)->data)   /* larger: go right */
        insert(&((*root)->right), word);
    else {
        if (*(word + 1))                /* same character: advance the string */
            insert(&((*root)->eq), word + 1);
        else                            /* end of word: mark the terminator */
            (*root)->isEndOfString = 1;
    }
}
```
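`insert()` relies on a `newNode()` helper that is not shown here; a minimal version consistent with the struct above (my own sketch, not code from the original source) would be:
```clike
#include <stdlib.h>

struct Node *newNode(char data)
{
    struct Node *node = malloc(sizeof(struct Node));
    node->data = data;
    node->isEndOfString = 0;
    node->left = node->eq = node->right = NULL;
    return node;
}
```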
### Search
```clike
int search(struct Node *root, char *word)
{
    if (!root)
        return 0;

    if (*word < root->data)             /* smaller: go left */
        return search(root->left, word);
    else if (*word > root->data)        /* larger: go right */
        return search(root->right, word);
    else {
        if (*(word + 1) == '\0')        /* last character: is a word stored here? */
            return root->isEndOfString;
        return search(root->eq, word + 1);
    }
}
```
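A minimal usage sketch tying the three pieces together (the word list and expected output are only illustrative):
```clike
#include <stdio.h>

/* struct Node, newNode(), insert() and search() as defined above */

int main(void)
{
    struct Node *root = NULL;
    char *words[] = {"cat", "cats", "up", "bug"};

    for (int i = 0; i < 4; i++)
        insert(&root, words[i]);

    printf("%d\n", search(root, "cats"));  /* 1: stored          */
    printf("%d\n", search(root, "ca"));    /* 0: only a prefix   */
    printf("%d\n", search(root, "bug"));   /* 1: stored          */
    return 0;
}
```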
---
## [prefix-search](https://github.com/sysprog21/prefix-search) research and observations
### Test input commands:
```
a
d
a
de
a
del
a
h
a
ai
```
### Traversal
```from = l``` means the node was reached from the previous node's ```lokid```
```from = e``` means the node was reached from the previous node's ```eqkid```
```from = h``` means the node was reached from the previous node's ```hikid```
```prev_key``` is the parent node's ```key``` value
A line without a ```key``` or ```str``` value means that node is ```NULL```
```
depth = 0, from = Ø, prev_key = Ø, key = d
depth = 1, from = l, prev_key = d, key = a
depth = 2, from = l, prev_key = a
depth = 2, from = e, prev_key = a, key = i
depth = 3, from = l, prev_key = i
depth = 3, from = e, prev_key = i, str = ai
depth = 4, from = l, prev_key = 0
depth = 4, from = h, prev_key = 0
depth = 3, from = h, prev_key = i
depth = 2, from = h, prev_key = a
depth = 1, from = e, prev_key = d, str = d
depth = 2, from = l, prev_key = 0
depth = 2, from = h, prev_key = 0, key = e
depth = 3, from = l, prev_key = e
depth = 3, from = e, prev_key = e, str = de
depth = 4, from = l, prev_key = 0
depth = 4, from = h, prev_key = 0, key = l
depth = 5, from = l, prev_key = l
depth = 5, from = e, prev_key = l, str = del
depth = 6, from = l, prev_key = 0
depth = 6, from = h, prev_key = 0
depth = 5, from = h, prev_key = l
depth = 3, from = h, prev_key = e
depth = 1, from = h, prev_key = d, key = h
depth = 2, from = l, prev_key = h
depth = 2, from = e, prev_key = h, str = h
depth = 3, from = l, prev_key = 0
depth = 3, from = h, prev_key = 0
depth = 2, from = h, prev_key = h
```
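A hedged sketch of a recursive routine that could produce a trace in the format above. The node layout follows the `lokid`/`eqkid`/`hikid` fields described earlier; the `dump()` name and the assumption that a terminating node stores its string in `eqkid` are mine, not code taken from the repository.
```clike
#include <stdio.h>

typedef struct tst_node {
    char key;                              /* 0 at a terminating node          */
    unsigned refcnt;
    struct tst_node *lokid, *eqkid, *hikid;
} tst_node;                                /* assumed to mirror prefix-search  */

static void dump(const tst_node *node, int depth, char from, char prev_key)
{
    char pk = prev_key ? prev_key : '0';   /* a zero key is printed as '0'     */

    if (!node) {                           /* NULL child: no key, no str       */
        printf("depth = %d, from = %c, prev_key = %c\n", depth, from, pk);
        return;
    }
    if (node->key)
        printf("depth = %d, from = %c, prev_key = %c, key = %c\n",
               depth, from, pk, node->key);
    else                                   /* terminating node: eqkid holds the string */
        printf("depth = %d, from = %c, prev_key = %c, str = %s\n",
               depth, from, pk, (const char *) node->eqkid);

    dump(node->lokid, depth + 1, 'l', node->key);
    if (node->key)                         /* a terminating node has no eqkid subtree */
        dump(node->eqkid, depth + 1, 'e', node->key);
    dump(node->hikid, depth + 1, 'h', node->key);
}
```
Calling `dump(root, 0, '-', '-')` (placeholders where the trace above shows Ø) reproduces the same visiting order: lokid, then eqkid, then hikid.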
### Structure Diagram
> HackMD supports GraphViz; please redraw the figure below with the corresponding syntax.
> [name="jserv"][color=red]
Circles are labeled with the ```key``` value
Rectangles are labeled with the stored strings
![](https://i.imgur.com/b64TgM8.png)
---
## [REF version](https://github.com/Yuessiah/prefix-search/commit/71b5e340f2d5b12d7a98544459a61335bf09ab89) implementation
Strings passed to add are kept in one large block of memory, avoiding strdup (i.e. copying), which would scatter the strings across small, non-contiguous allocations.
:::danger
1. A memory region is not automatically better just because it is "large"
2. Try out other memory pools
:notes: jserv
:::
> Thanks for the suggestions; I'll add this when I have the capacity.
> [name="Yuessiah"][color=blue]
```clike
int ord = -1, reflen = REFMAX;   /* force an allocation on the first request */
char *refer[ORDMAX];

/* Return a pointer to a copy of s kept inside the current pool block,
 * allocating a fresh block whenever the current one cannot hold s. */
char *pool_request(char *s)
{
    size_t len = strlen(s) + 1;

    if (reflen + len >= REFMAX) {
        refer[++ord] = malloc(REFMAX);
        reflen = 0;
    }

    char *fp = strcpy(refer[ord] + reflen, s);
    reflen += len;
    return fp;
}
```
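A minimal, self-contained sketch of how the pool behaves; the `REFMAX`/`ORDMAX` values are placeholders of mine, and in test_ref the caller is the TST insertion path rather than `main()`:
```clike
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define REFMAX (1 << 20)   /* size of one pool block (placeholder value)   */
#define ORDMAX 1024        /* maximum number of pool blocks (placeholder)  */

/* pool_request() and its globals as defined above */

int main(void)
{
    char *a = pool_request("apple");
    char *b = pool_request("app");

    /* Both copies sit back to back in the same block, so walking them
     * during a search touches contiguous memory. */
    printf("%s %s (gap = %ld bytes)\n", a, b, (long) (b - a));
    return 0;
}
```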
---
## Performance Testing
### Randomly generated test input
I wrote a small program, [generate_input.py](https://github.com/Yuessiah/prefix-search/blob/master/generate_input.py), that randomly generates test data.
Given num, it produces two test input files (each of ```case 'f', 's', 'd'``` is exercised num times):
```match_input.txt```, a set of strings that are guaranteed to be found,
and ```bad_input.txt```, a set of strings that are guaranteed not to be found.
### perf stat for test_cpy
num = 10000
```
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_cpy < match_input.txt
Performance counter stats for './test_cpy':
802,400,118 cache-misses # 85.252 % of all cache refs (71.37%)
941,205,107 cache-references (71.45%)
24,044,167,994 instructions # 1.22 insns per cycle (71.46%)
19,664,808,946 cycles (71.46%)
3,873,694,250 branches (71.46%)
50,142,052 branch-misses # 1.29% of all branches (71.45%)
24,009,233,921 instructions # 1.22 insns per cycle (71.37%)
4.765697782 seconds time elapsed
---
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_cpy < bad_input.txt
Performance counter stats for './test_cpy':
7,694,804 cache-misses # 65.799 % of all cache refs (70.78%)
11,694,377 cache-references (70.74%)
944,629,368 instructions # 1.68 insns per cycle (70.73%)
563,366,677 cycles (70.73%)
190,461,459 branches (73.41%)
1,864,072 branch-misses # 0.98% of all branches (73.82%)
972,387,637 instructions # 1.73 insns per cycle (71.17%)
0.136838377 seconds time elapsed
```
### perf stat for test_ref
num = 10000
```
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_ref < match_input.txt
Performance counter stats for './test_ref':
819,170,625 cache-misses # 83.692 % of all cache refs (71.44%)
978,793,649 cache-references (71.45%)
23,978,277,446 instructions # 1.09 insns per cycle (71.45%)
21,969,185,090 cycles (71.45%)
3,869,006,562 branches (71.45%)
50,637,809 branch-misses # 1.31% of all branches (71.39%)
24,013,397,881 instructions # 1.09 insns per cycle (71.39%)
5.295836590 seconds time elapsed
---
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_ref < bad_input.txt
Performance counter stats for './test_ref':
7,570,185 cache-misses # 64.458 % of all cache refs (70.43%)
11,744,444 cache-references (70.46%)
901,386,228 instructions # 1.61 insns per cycle (70.43%)
558,437,763 cycles (70.42%)
179,952,823 branches (72.74%)
1,762,884 branch-misses # 0.98% of all branches (75.71%)
934,098,197 instructions # 1.67 insns per cycle (73.32%)
0.135439831 seconds time elapsed
```
The execution time of the REF version is about the same as that of the CPY version.
:::danger
1. Don't casually say "about the same"; escape the liberal-arts mindset^TM^ early and stop working by gut feeling
2. This needs to be explained with a computer-science background
:notes: jserv
:::
> Thanks for the reminder; I'll work on the explanation when I have time.
> [name="Yuessiah"][color=blue]
---
## Adding a bench make target
```makefile
TESTS = \
    test_cpy \
    test_ref

bench: $(TESTS)
	-rm -f output.txt
	echo 10000 | ./generate_input.py
	for method in $(TESTS); \
	do \
	    echo "match-input:" >> output.txt; \
	    ./$$method < match_input.txt > /dev/null; \
	    echo "bad-input:" >> output.txt; \
	    ./$$method < bad_input.txt > /dev/null; \
	done
```
In both the REF and CPY versions, the execution time of each ```case 'a', 'f', 's', 'd'``` operation is summed per category, and a new [command](https://github.com/Yuessiah/prefix-search/blob/master/test_cpy.c#L155) prints the totals. The output is shown below (generated with my own [script](https://github.com/Yuessiah/prefix-search/blob/master/generate_input.py), num = 10000):
```
./test_cpy
match-input:
add, total time 0.089029 sec.
find, total time 0.005313 sec.
search, total time 4.882656 sec.
delete, total time 0.008483 sec.
bad-input:
add, total time 0.085424 sec.
find, total time 0.000572 sec.
search, total time 0.002984 sec.
delete, total time 0.001085 sec.
./test_ref
match-input:
add, total time 0.083307 sec.
find, total time 0.005261 sec.
search, total time 5.422224 sec.
delete, total time 0.008679 sec.
bad-input:
add, total time 0.083800 sec.
find, total time 0.000609 sec.
search, total time 0.003103 sec.
delete, total time 0.001162 sec.
```
---
## Speeding up the CPY version
>The cost of strdup has to be considered; it hurts cache locality. Don't compare only the prefix-search time; the time and space overhead of building the TST should also be considered. ---jserv [^2]
Considering the cost of [strdup](http://man7.org/linux/man-pages/man3/strdup.3.html), we want to reduce how often it is called. Two approaches come to mind:
1. [lazy evaluation](https://zh.wikipedia.org/wiki/惰性求值)
2. during prefix search (```case: 's'```), call strdup only on leaf nodes
### 1. lazy evaluation
When a string is added (```case 'a'```), do not call strdup immediately; call it only when the copy is actually needed, that is, when doing the prefix search.
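A conceptual sketch of idea 1 (hypothetical names, not the repository's patch), assuming the referenced string stays valid until the copy is made, for example because it lives in a memory pool like the one in the REF section:
```clike
#include <string.h>

/* Per-word bookkeeping: the copy is deferred until a search actually
 * needs to hand the string back (lazy evaluation / memoization). */
typedef struct lazy_str {
    const char *ref;    /* recorded at add time: no strdup yet            */
    char *copy;         /* produced on the first search hit, then reused  */
} lazy_str;

static const char *lazy_str_get(lazy_str *s)
{
    if (!s->copy)
        s->copy = strdup(s->ref);   /* the only place the copy cost is paid */
    return s->copy;
}
```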
### 2. strdup() only at leaf nodes
Lazy evaluation alone is not enough: we can store the whole string only at the leaf node, and whether a given prefix of that string was itself added can be decided simply by checking `refcnt > 0`.
### pros and cons
Although the total time drops substantially, the prefix-search operation itself takes longer. In some situations users only want the search to be fast and do not care about loading the data quickly.
---
## Problems encountered during the assignment
Shared purely for reference; I have already understood or resolved them.
- >Why did you specify -std twice? gnu99 overrides c99 in gcc's manner. ---jserv [^3]
- Different environments tolerate different input sizes (e.g. someone else's machine may hit a segmentation fault after 30000 operations while mine does not) [^4][^5]
- Deleting the same string, one that was never added to the TST, twice in a row produces an unexpected response [^6]
---
## Practical applications of TST
- [auto-completion](http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/) (sketched below)
- spell-checking
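As a rough illustration of the auto-completion use case, here is a sketch built on the simple Node structure from the first section; the function names and buffer size are mine, not taken from the linked article:
```clike
#include <stdio.h>

/* Uses struct Node, insert() and isEndOfString from the first section. */

/* Walk down to the node that terminates 'prefix'; NULL if the prefix
 * does not occur in the tree. */
struct Node *find_prefix(struct Node *root, const char *prefix)
{
    while (root) {
        if (*prefix < root->data)
            root = root->left;
        else if (*prefix > root->data)
            root = root->right;
        else if (*(prefix + 1) == '\0')
            return root;
        else {
            root = root->eq;
            prefix++;
        }
    }
    return NULL;
}

/* Print every stored word below 'node', prepending the matched prefix. */
void complete(struct Node *node, const char *prefix, char *buf, int depth)
{
    if (!node)
        return;
    complete(node->left, prefix, buf, depth);
    buf[depth] = node->data;
    if (node->isEndOfString) {
        buf[depth + 1] = '\0';
        printf("%s%s\n", prefix, buf);
    }
    complete(node->eq, prefix, buf, depth + 1);
    complete(node->right, prefix, buf, depth);
}

/* List every word starting with 'prefix'. */
void autocomplete(struct Node *root, const char *prefix)
{
    struct Node *p = find_prefix(root, prefix);
    char buf[128];

    if (!p)
        return;
    if (p->isEndOfString)            /* the prefix itself is a stored word */
        printf("%s\n", prefix);
    complete(p->eq, prefix, buf, 0);
}
```
For example, after inserting "d", "de", "del", "h" and "ai" with the earlier `insert()`, `autocomplete(root, "de")` prints `de` and `del`.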
---
## Reference
[geeksforgeeks.org / Ternary Search Tree](http://www.geeksforgeeks.org/ternary-search-tree)
[HackMD / notes by classmate `HMKRL`](https://hackmd.io/s/BkdddNs3-)
[^1]: This write-up was produced through [joint research](https://hackmd.io/MwBgLArATAnAjAUwLTDgYwOxLAEzjJAQwmCiQwykwgA4IYwqg===?view) with `HexRabbit`
[^2]: [HackMD / notes by classmate `st9007a`, section 修正-test-refc](https://hackmd.io/s/SkeKQEq3Z#修正-test-refc)
[^3]: GitHub / comment on [`9095633`](https://github.com/Yuessiah/prefix-search/commit/909563351240dadcd50125c7bd9159d5ca30b216#comments) by `jserv`
[^4]: GitHub / commit [`528a510`](https://github.com/HexRabbit/prefix-search/commit/528a510e144a8e6dd9d2a1cdb2da6614a1a1abaa) by `HexRabbit`
[^5]: [HackMD / notes by classmate `HexRabbit`, section 研究紀錄](https://hackmd.io/s/S1bax6MR2b#研究紀錄)
[^6]: GitHub / commit [`1cb6adc`](https://github.com/Yuessiah/prefix-search/commit/1cb6adc843f92c9eea21413f732b3acffbffd9c6) by `Yuessiah`