contributed by <Yuessiah
, HexRabbit
[1]>
eeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeee
eeeee eeeeeeeeeeee eeeee OS: elementary os elementary OS 0.3.2 freya
eeee eeeee eee eeee Kernel: x86_64 Linux 4.4.0-93-generic
eeee eeee eee eeee Shell: bash 4.3.11
eee eee eee eee Virtualization: VT-x
eee eee eee eee CPU: Intel Core i7-7700 CPU @ 4.2GHz
ee eee eeee eeee CPU(s): 8
ee eee eeeee eeeeee
ee eee eeeee eeeee ee RAM: 1691MiB / 15934MiB
eee eeee eeeeee eeeee eee L1d cache: 32K
eee eeeeeeeeee eeeeee eee L1i cache: 32K
eeeeeeeeeeeeeeeeeeeeeeee eeeee L2 cache: 256K
eeeeeeee eeeeeeeeeeee eeee L3 cache: 8192K
eeeee eeeee
eeeeeee eeeeeee BogoMIPS: 7200.65
eeeeeeeeeeeeeeeee
在一堆字串的集合當中要搜尋一個特定字串,我們能用 hash table 實現,但 hash table 隨著資料(字串)量的增加,要調整 hash table 的 size 才能保持 hash table 應有的效能;那麼改用 Binary Search Tree (BST),雖然效能穩定下來,但相較 hash table 還是慢太多了;如果用 Trie,雖然查詢非常快,但是需要的儲存空間太大。
而 Ternary Search Tree (TST) 結合了 Trie 和 BST 兩者的優點,不但查詢快,且儲存空間用得少。
用給定的 coding style 來縮排下述程式碼。不要忘了,為了「打群架」,我們需要建立紀律和對細節的高度掌握。
"jserv"好哦,我馬上修正!
"Yuessiah"
struct Node {
char data;
unsigned isEndOfString: 1;
struct Node *left, *eq, *right;
};
void insert(struct Node** root, char *word)
{
if (!(*root)) *root = newNode(*word);
if ((*word) < (*root)->data) insert(&( (*root)->left ), word);
else if ((*word) > (*root)->data) insert(&( (*root)->right ), word);
else {
if (*(word+1)) insert(&( (*root)->eq ), word+1);
else (*root)->isEndOfString = 1;
}
}
int search(struct Node *root, char *word)
{
if (!root) return 0;
if (*word < (root)->data) return search(root->left, word);
else if (*word > (root)->data) return search(root->right, word);
else {
if (*(word+1) == '\0') return root->isEndOfString;
return search(root->eq, word+1);
}
}
a
d
a
de
a
del
a
h
a
ai
from = l
, 表示從上個節點的 lokid
來的
from = e
, 表示從上個節點的 eqkid
來的
from = h
, 表示從上個節點的 hikid
來的
prev_key
為父節點的 key
值
若無 key
值或 str
值,此節點為 NULL
depth = 0, from = Ø, prev_key = Ø, key = d
depth = 1, from = l, prev_key = d, key = a
depth = 2, from = l, prev_key = a
depth = 2, from = e, prev_key = a, key = i
depth = 3, from = l, prev_key = i
depth = 3, from = e, prev_key = i, str = ai
depth = 4, from = l, prev_key = 0
depth = 4, from = h, prev_key = 0
depth = 3, from = h, prev_key = i
depth = 2, from = h, prev_key = a
depth = 1, from = e, prev_key = d, str = d
depth = 2, from = l, prev_key = 0
depth = 2, from = h, prev_key = 0, key = e
depth = 3, from = l, prev_key = e
depth = 3, from = e, prev_key = e, str = de
depth = 4, from = l, prev_key = 0
depth = 4, from = h, prev_key = 0, key = l
depth = 5, from = l, prev_key = l
depth = 5, from = e, prev_key = l, str = del
depth = 6, from = l, prev_key = 0
depth = 6, from = h, prev_key = 0
depth = 5, from = h, prev_key = l
depth = 3, from = h, prev_key = e
depth = 1, from = h, prev_key = d, key = h
depth = 2, from = l, prev_key = h
depth = 2, from = e, prev_key = h, str = h
depth = 3, from = l, prev_key = 0
depth = 3, from = h, prev_key = 0
depth = 2, from = h, prev_key = h
HackMD 支援 GraphViz,請重新用對應語法改製下圖
"jserv"
圓圈上是 key
值
矩形上是字串
用一塊很大的記憶體空間維護 add 後的字串,避開使用 strdup (i.d. copy) 產生零碎非連續的記憶體空間存放字串。
感謝建議,有餘力會補上。
"Yuessiah"
int ord, reflen;
char *refer[ORDMAX];
char *pool_request(char *s)
{
if(reflen + strlen(s) + 1 >= REFMAX) {
refer[++ord] = (char*)malloc(sizeof(char)*REFMAX);
reflen = 0;
}
char *fp = strcpy(refer[ord] + reflen, s);
reflen += strlen(s) + 1;
return fp;
}
寫個可以隨機生成測試資料的小程式 generate_input.py
輸入 num 會分別產生兩個測試輸入檔案 (case 'f', 's', 'd'
分別操作 num 次)
一個是一定找得到的字串集 match_input.txt
另一個保證找不到的字串集 bad_input.txt
輸入 num 為 10000
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_cpy < match_input.txt
Performance counter stats for './test_cpy':
802,400,118 cache-misses # 85.252 % of all cache refs (71.37%)
941,205,107 cache-references (71.45%)
24,044,167,994 instructions # 1.22 insns per cycle (71.46%)
19,664,808,946 cycles (71.46%)
3,873,694,250 branches (71.46%)
50,142,052 branch-misses # 1.29% of all branches (71.45%)
24,009,233,921 instructions # 1.22 insns per cycle (71.37%)
4.765697782 seconds time elapsed
---
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_cpy < bad_input.txt
Performance counter stats for './test_cpy':
7,694,804 cache-misses # 65.799 % of all cache refs (70.78%)
11,694,377 cache-references (70.74%)
944,629,368 instructions # 1.68 insns per cycle (70.73%)
563,366,677 cycles (70.73%)
190,461,459 branches (73.41%)
1,864,072 branch-misses # 0.98% of all branches (73.82%)
972,387,637 instructions # 1.73 insns per cycle (71.17%)
0.136838377 seconds time elapsed
輸入 num 為 10000
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_ref < match_input.txt
Performance counter stats for './test_ref':
819,170,625 cache-misses # 83.692 % of all cache refs (71.44%)
978,793,649 cache-references (71.45%)
23,978,277,446 instructions # 1.09 insns per cycle (71.45%)
21,969,185,090 cycles (71.45%)
3,869,006,562 branches (71.45%)
50,637,809 branch-misses # 1.31% of all branches (71.39%)
24,013,397,881 instructions # 1.09 insns per cycle (71.39%)
5.295836590 seconds time elapsed
---
$ perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses,instructions ./test_ref < bad_input.txt
Performance counter stats for './test_ref':
7,570,185 cache-misses # 64.458 % of all cache refs (70.43%)
11,744,444 cache-references (70.46%)
901,386,228 instructions # 1.61 insns per cycle (70.43%)
558,437,763 cycles (70.42%)
179,952,823 branches (72.74%)
1,762,884 branch-misses # 0.98% of all branches (75.71%)
934,098,197 instructions # 1.67 insns per cycle (73.32%)
0.135439831 seconds time elapsed
REF 版本的執行時間與 CPY 版本差不多。
感謝老師提醒,有空會去做解釋
"Yuessiah"
TESTS = \
test_cpy \
test_ref
bench: $(TESTS)
-rm -f output.txt
echo 10000 | ./generate_input.py
for method in $(TESTS); \
do \
echo "match-input:" >> output.txt; \
./$$method < match_input.txt > /dev/null; \
echo "bad-input:" >> output.txt; \
./$$method < bad_input.txt > /dev/null; \
done
在 REF 及 CPY 版本程式中,將 case 'a', 'f', 's', 'd'
的每次執行時間分別加總起來,並新增一項指令,輸出如下 (使用自己寫的 script, num = 10000)
./test_cpy
match-input:
add, total time 0.089029 sec.
find, total time 0.005313 sec.
search, total time 4.882656 sec.
delete, total time 0.008483 sec.
bad-input:
add, total time 0.085424 sec.
find, total time 0.000572 sec.
search, total time 0.002984 sec.
delete, total time 0.001085 sec.
./test_ref
match-input:
add, total time 0.083307 sec.
find, total time 0.005261 sec.
search, total time 5.422224 sec.
delete, total time 0.008679 sec.
bad-input:
add, total time 0.083800 sec.
find, total time 0.000609 sec.
search, total time 0.003103 sec.
delete, total time 0.001162 sec.
strdup 的代價要考慮,對 cache locality 有衝擊。不能只比較 prefix search 時間,建立 TST 的時間和空間開銷也該考慮。–-jserv [2]
考量到 strdup 的成本,我們希望減少 strdup 的使用,目前可以想到兩種作法
case: 's'
)時,只對葉子節點使用 strdup在 add (case 'a'
) 字串時,不要馬上就使用 strdup,而是在"需要"的時候才用,換句話說在做 prefix search 才使用 strdup
只有 lazy evaluation 還不夠,我們可以只在葉子節點存整個字串,而此字串的 prefix 是不是 add 過的字串,只要判定是否 refcnt > 0
就行
雖然總體時間大幅下降,但只看 prefix search,他的操作執行時間變大了。有些情況人們只希望搜索的快一點,而非載入資料。
純粹分享,本人已理解或解決。
Why did you specify -std twice? gnu99 overrides c99 in gcc's manner. –-jserv [3]
geeksforgeeks.org/ ternary search tree
Hackmd/ HMKRL
同學的共筆