2017q3 Homework2 (prefix-search)

contributed by <Yuessiah, HexRabbit^[1]>

本文件多處地方採用超連結來補完報告，建議點開來看

開發環境

                                     
         eeeeeeeeeeeeeeeee           
      eeeeeeeeeeeeeeeeeeeeeee        
    eeeee  eeeeeeeeeeee   eeeee      OS: elementary os elementary OS 0.3.2 freya
  eeee   eeeee       eee     eeee    Kernel: x86_64 Linux 4.4.0-93-generic
 eeee   eeee          eee     eeee   Shell: bash 4.3.11
eee    eee            eee       eee  Virtualization:        VT-x
eee   eee            eee        eee  CPU: Intel Core i7-7700 CPU @ 4.2GHz
ee    eee           eeee       eeee  CPU(s):                8
ee    eee         eeeee      eeeeee  
ee    eee       eeeee      eeeee ee  RAM: 1691MiB / 15934MiB
eee   eeee   eeeeee      eeeee  eee  L1d cache:             32K
eee    eeeeeeeeee     eeeeee    eee  L1i cache:             32K
 eeeeeeeeeeeeeeeeeeeeeeee    eeeee   L2 cache:              256K
  eeeeeeee eeeeeeeeeeee      eeee    L3 cache:              8192K
    eeeee                 eeeee      
      eeeeeee         eeeeeee        BogoMIPS:              7200.65
         eeeeeeeeeeeeeeeee

Ternary Search Tree

簡介

在一堆字串的集合當中要搜尋一個特定字串，我們能用 hash table 實現，但 hash table 隨著資料(字串)量的增加，要調整 hash table 的 size 才能保持 hash table 應有的效能；那麼改用 Binary Search Tree (BST)，雖然效能穩定下來，但相較 hash table 還是慢太多了；如果用 Trie，雖然查詢非常快，但是需要的儲存空間太大。
而 Ternary Search Tree (TST) 結合了 Trie 和 BST 兩者的優點，不但查詢快，且儲存空間用得少。

資料結構

用給定的 coding style 來縮排下述程式碼。不要忘了，為了「打群架」，我們需要建立紀律和對細節的高度掌握。
"jserv"

好哦，我馬上修正!
"Yuessiah"

struct Node {
    char data;
    unsigned isEndOfString: 1;
    struct Node *left, *eq, *right;
};

插入

void insert(struct Node** root, char *word)
{
    if (!(*root)) *root = newNode(*word);
    if ((*word) < (*root)->data) insert(&( (*root)->left ), word);
    else if ((*word) > (*root)->data) insert(&( (*root)->right ), word);
    else {
        if (*(word+1)) insert(&( (*root)->eq ), word+1);
        else (*root)->isEndOfString = 1;
    }
}

查詢

int search(struct Node *root, char *word)
{
    if (!root) return 0;
    if (*word < (root)->data) return search(root->left, word);
    else if (*word > (root)->data) return search(root->right, word);
    else {
        if (*(word+1) == '\0') return root->isEndOfString;
        return search(root->eq, word+1);
    }
}

prefix-search 研究與觀察

輸入測試操作指令：

a
d
a
de
a
del
a
h
a
ai

走訪

from = l , 表示從上個節點的 lokid 來的
from = e , 表示從上個節點的 eqkid 來的
from = h , 表示從上個節點的 hikid 來的
prev_key 為父節點的 key 值
若無 key 值或 str 值，此節點為 NULL

depth =  0, from = Ø, prev_key = Ø, key = d
depth =  1, from = l, prev_key = d, key = a
depth =  2, from = l, prev_key = a
depth =  2, from = e, prev_key = a, key = i
depth =  3, from = l, prev_key = i
depth =  3, from = e, prev_key = i, str = ai
depth =  4, from = l, prev_key = 0
depth =  4, from = h, prev_key = 0
depth =  3, from = h, prev_key = i
depth =  2, from = h, prev_key = a
depth =  1, from = e, prev_key = d, str = d
depth =  2, from = l, prev_key = 0
depth =  2, from = h, prev_key = 0, key = e
depth =  3, from = l, prev_key = e
depth =  3, from = e, prev_key = e, str = de
depth =  4, from = l, prev_key = 0
depth =  4, from = h, prev_key = 0, key = l
depth =  5, from = l, prev_key = l
depth =  5, from = e, prev_key = l, str = del
depth =  6, from = l, prev_key = 0
depth =  6, from = h, prev_key = 0
depth =  5, from = h, prev_key = l
depth =  3, from = h, prev_key = e
depth =  1, from = h, prev_key = d, key = h
depth =  2, from = l, prev_key = h
depth =  2, from = e, prev_key = h, str = h
depth =  3, from = l, prev_key = 0
depth =  3, from = h, prev_key = 0
depth =  2, from = h, prev_key = h

結構圖

HackMD 支援 GraphViz，請重新用對應語法改製下圖
"jserv"

圓圈上是 key 值
矩形上是字串

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

REF 版本實作

用一塊很大的記憶體空間維護 add 後的字串，避開使用 strdup (i.d. copy) 產生零碎非連續的記憶體空間存放字串。

記憶體空間並非「大」就是好
測試其他 memory pool

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

感謝建議，有餘力會補上。
"Yuessiah"

int ord, reflen;
char *refer[ORDMAX];

char *pool_request(char *s)
{
    if(reflen + strlen(s) + 1 >= REFMAX) {
        refer[++ord] = (char*)malloc(sizeof(char)*REFMAX);
        reflen = 0;
    }
    char *fp = strcpy(refer[ord] + reflen, s);
    reflen += strlen(s) + 1;

    return fp;
}

效能測試

隨機產出輸入測試資料

寫個可以隨機生成測試資料的小程式 generate_input.py
輸入 num 會分別產生兩個測試輸入檔案 (case 'f', 's', 'd' 分別操作 num 次)
一個是一定找得到的字串集 match_input.txt
另一個保證找不到的字串集 bad_input.txt

perf stat for test_cpy

輸入 num 為 10000

$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_cpy < match_input.txt
 Performance counter stats for './test_cpy':

       802,400,118      cache-misses              #   85.252 % of all cache refs      (71.37%)
       941,205,107      cache-references                                              (71.45%)
    24,044,167,994      instructions              #    1.22  insns per cycle          (71.46%)
    19,664,808,946      cycles                                                        (71.46%)
     3,873,694,250      branches                                                      (71.46%)
        50,142,052      branch-misses             #    1.29% of all branches          (71.45%)
    24,009,233,921      instructions              #    1.22  insns per cycle          (71.37%)

       4.765697782 seconds time elapsed

---
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_cpy < bad_input.txt
 Performance counter stats for './test_cpy':

         7,694,804      cache-misses              #   65.799 % of all cache refs      (70.78%)
        11,694,377      cache-references                                              (70.74%)
       944,629,368      instructions              #    1.68  insns per cycle          (70.73%)
       563,366,677      cycles                                                        (70.73%)
       190,461,459      branches                                                      (73.41%)
         1,864,072      branch-misses             #    0.98% of all branches          (73.82%)
       972,387,637      instructions              #    1.73  insns per cycle          (71.17%)

       0.136838377 seconds time elapsed

perf stat for test_ref

輸入 num 為 10000

$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_ref < match_input.txt
 Performance counter stats for './test_ref':

       819,170,625      cache-misses              #   83.692 % of all cache refs      (71.44%)
       978,793,649      cache-references                                              (71.45%)
    23,978,277,446      instructions              #    1.09  insns per cycle          (71.45%)
    21,969,185,090      cycles                                                        (71.45%)
     3,869,006,562      branches                                                      (71.45%)
        50,637,809      branch-misses             #    1.31% of all branches          (71.39%)
    24,013,397,881      instructions              #    1.09  insns per cycle          (71.39%)

       5.295836590 seconds time elapsed

---
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_ref < bad_input.txt
 Performance counter stats for './test_ref':

         7,570,185      cache-misses              #   64.458 % of all cache refs      (70.43%)
        11,744,444      cache-references                                              (70.46%)
       901,386,228      instructions              #    1.61  insns per cycle          (70.43%)
       558,437,763      cycles                                                        (70.42%)
       179,952,823      branches                                                      (72.74%)
         1,762,884      branch-misses             #    0.98% of all branches          (75.71%)
       934,098,197      instructions              #    1.67  insns per cycle          (73.32%)

       0.135439831 seconds time elapsed

REF 版本的執行時間與 CPY 版本差不多。

不要隨便說「差不多」，及早脫離文組^TM，不要憑感覺做事
需要用電腦科學的背景來解釋

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

感謝老師提醒，有空會去做解釋
"Yuessiah"

加入 bench 執行參數

TESTS = \
    test_cpy \
    test_ref

bench: $(TESTS)
	-rm -f output.txt
	echo 10000 | ./generate_input.py
	for method in $(TESTS); \
	do \
		echo "match-input:" >> output.txt; \
		./$$method < match_input.txt > /dev/null; \
		echo "bad-input:" >> output.txt; \
		./$$method < bad_input.txt > /dev/null; \
	done

在 REF 及 CPY 版本程式中，將 case 'a', 'f', 's', 'd'的每次執行時間分別加總起來，並新增一項指令，輸出如下 (使用自己寫的 script, num = 10000)

./test_cpy
match-input:
  add,    total time 0.089029 sec.
  find,   total time 0.005313 sec.
  search, total time 4.882656 sec.
  delete, total time 0.008483 sec.
bad-input:
  add,    total time 0.085424 sec.
  find,   total time 0.000572 sec.
  search, total time 0.002984 sec.
  delete, total time 0.001085 sec.

./test_ref
match-input:
  add,    total time 0.083307 sec.
  find,   total time 0.005261 sec.
  search, total time 5.422224 sec.
  delete, total time 0.008679 sec.
bad-input:
  add,    total time 0.083800 sec.
  find,   total time 0.000609 sec.
  search, total time 0.003103 sec.
  delete, total time 0.001162 sec.

CPY 版本的加速

strdup 的代價要考慮，對 cache locality 有衝擊。不能只比較 prefix search 時間，建立 TST 的時間和空間開銷也該考慮。–-jserv ^[2]

考量到 strdup 的成本，我們希望減少 strdup 的使用，目前可以想到兩種作法

lazy evaluation
在做 prefix search (case: 's')時，只對葉子節點使用 strdup

1. lazy evaluation

在 add (case 'a') 字串時，不要馬上就使用 strdup，而是在"需要"的時候才用，換句話說在做 prefix search 才使用 strdup

2. 葉子節點 strdup()

只有 lazy evaluation 還不夠，我們可以只在葉子節點存整個字串，而此字串的 prefix 是不是 add 過的字串，只要判定是否 refcnt > 0 就行

pros and cons

雖然總體時間大幅下降，但只看 prefix search，他的操作執行時間變大了。有些情況人們只希望搜索的快一點，而非載入資料。

作業遇到的問題

純粹分享，本人已理解或解決。

Why did you specify -std twice? gnu99 overrides c99 in gcc's manner. –-jserv ^[3]
不同的作業環境所能容許的輸入規模不同 (e.g. 別人電腦可能輸入 30000 次操作會 segmentation fault, 我的不會) ^[4]^[5]
連續 delete 兩次從未 add 進 TST 中的相同字串，會出現非預期的回應 ^[6]

TST 實際應用

auto-completion
spell-checking

Reference

geeksforgeeks.org/ ternary search tree
Hackmd/ HMKRL 同學的共筆

本共筆是與HexRabbit共同研究得以撰寫而成 ↩︎
Hackmd/ st9007a 同學的共筆/ 修正-test-refc ↩︎
Github/ comment 9095633 by jserv ↩︎
Github/ commit 528a510 by HexRabbit ↩︎
Hackmd/ HexRabbit 同學的共筆/ 研究紀錄 ↩︎
Github/ commit 1cb6adc by Yuessiah ↩︎