# 2018q1 Homework2 (prefix-search)
contributed by < `f74034067` >
## Development environment
```
$ uname -a
Linux suchihhan-S551LN 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
```
```
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 69
Model name: Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz
Stepping: 1
CPU MHz: 2394.417
CPU max MHz: 3000.0000
CPU min MHz: 800.0000
BogoMIPS: 4788.83
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti retpoline tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
```
## Understanding the ternary search tree
* Reading
* [LinYunWen](https://hackmd.io/s/SJPItbTtM#)
* [Ternary search tree](https://en.wikipedia.org/wiki/Ternary_search_tree)
* [Trie](https://zh.wikipedia.org/wiki/Trie)
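### Trie
* Introduction
* Also called a prefix tree
* A string is not stored directly in a node; it is determined by the node's position in the tree
* Except for the root, which corresponds to the empty string, all descendants of a node share the same prefix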
![](https://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Trie_example.svg/400px-Trie_example.svg.png)
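* Properties (taking the alphabet = { lowercase English letters } as an example)
* Every node, when created, must also reserve room for up to 26 child pointers, all initially NULL
* If the stored strings share many prefixes, a trie can save a lot of storage space
* Conversely, if few prefixes are shared, a trie wastes a lot of space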
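The space trade-off described above can be seen in a minimal array-based trie sketch. The names `trie_node`, `trie_new`, `trie_insert` and `trie_contains` are hypothetical and are not part of the homework code, and the sketch assumes keys contain only the letters `a` to `z`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET 26 /* assumption: keys use only 'a'..'z' */

/* Hypothetical node type: every node reserves 26 child slots. */
typedef struct trie_node {
    struct trie_node *child[ALPHABET]; /* all NULL right after creation */
    bool is_end;                       /* true if a key ends at this node */
} trie_node;

static trie_node *trie_new(void)
{
    /* calloc zero-fills; on mainstream platforms that means NULL children */
    return calloc(1, sizeof(trie_node));
}

/* Insert key; returns false if an allocation fails. */
static bool trie_insert(trie_node *root, const char *key)
{
    assert(root && key);
    trie_node *n = root;
    for (; *key; key++) {
        int i = *key - 'a';
        assert(i >= 0 && i < ALPHABET);
        if (!n->child[i]) {
            n->child[i] = trie_new();
            if (!n->child[i])
                return false;
        }
        n = n->child[i]; /* shared prefixes reuse the same nodes */
    }
    n->is_end = true;
    return true;
}

/* Return true only if key was inserted as a whole word. */
static bool trie_contains(const trie_node *root, const char *key)
{
    const trie_node *n = root;
    for (; n && *key; key++) {
        int i = *key - 'a';
        if (i < 0 || i >= ALPHABET)
            return false;
        n = n->child[i];
    }
    return n && n->is_end;
}
```

Inserting `tea` and `ten` shares the nodes for `t` and `e`, but each of the four nodes below the root still carries 26 pointers, most of them unused.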
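### Ternary search tree
* Introduction
* A kind of trie
* Each node has at most 3 children, for characters less than, equal to, and greater than the node's character
![Ternary search tree example](https://www.geeksforgeeks.org/wp-content/uploads/Ternary-Search-Tree.png)
* Properties
* Uses less space than an ordinary prefix tree
* Lookups are slower, because several nodes may have to be compared for a single character
* Inserting the same strings in a different order produces a different tree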
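The three-way branching and the order dependence listed above can be illustrated with a small sketch. The type `tnode` and the functions `tnode_insert` and `tnode_search` are hypothetical names for illustration; the homework's `tst_node` and `tst_ins_del()` are organised differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical node type for illustration only. */
typedef struct tnode {
    char key;                   /* character stored at this node */
    bool is_end;                /* true if a word ends here */
    struct tnode *lo, *eq, *hi; /* less than, equal to, greater than */
} tnode;

/* Insert a non-empty word below *slot; returns false if allocation fails. */
static bool tnode_insert(tnode **slot, const char *word)
{
    assert(slot && word && *word);
    for (;;) {
        tnode *n = *slot;
        if (!n) { /* empty slot: create a node for the current character */
            n = calloc(1, sizeof *n);
            if (!n)
                return false;
            n->key = *word;
            *slot = n;
        }
        if (*word < n->key) {
            slot = &n->lo;
        } else if (*word > n->key) {
            slot = &n->hi;
        } else if (*++word == '\0') { /* matched the last character */
            n->is_end = true;
            return true;
        } else {
            slot = &n->eq; /* matched: continue with the next character */
        }
    }
}

/* Return true only if word was inserted as a whole word. */
static bool tnode_search(const tnode *n, const char *word)
{
    if (!*word)
        return false;
    while (n) {
        if (*word < n->key)
            n = n->lo;
        else if (*word > n->key)
            n = n->hi;
        else if (*++word == '\0')
            return n->is_end;
        else
            n = n->eq;
    }
    return false;
}
```

Inserting `cat` then `bat` puts `c` at the root, while inserting `bat` then `cat` puts `b` at the root, so the two trees differ in shape even though they hold the same words.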
## Modifying the Makefile
* Reading
* [raygoah](https://hackmd.io/dwDOR7H2Q8qyxGxSOsc-vA?view)
* `$ make astyle`
```clike=
astyle:
	astyle --style=kr --indent=spaces=4 --suffix=none *.[ch]
```
* `$ make bench`
```clike=
TEST_INPUT = f Taiwan
bench: $(TESTS)
	echo 3 | sudo tee /proc/sys/vm/drop_caches
	perf stat --repeat 100 \
		-e cache-misses,cache-references,instructions,cycles \
		./test_cpy --bench $(TEST_INPUT)
	echo 3 | sudo tee /proc/sys/vm/drop_caches
	perf stat --repeat 100 \
		-e cache-misses,cache-references,instructions,cycles \
		./test_ref --bench $(TEST_INPUT)
```
Besides the Makefile, `test_cpy.c` and `test_ref.c` also need a check around each `fgets()` call to decide whether the program is running in `--bench` mode.
```clike=
- fgets(word, sizeof word, stdin);
+ if (bench_flag == 0) {
+     fgets(word, sizeof word, stdin);
+ } else {
+     strcpy(word, argv[2]);
+ }
```
```clike=
- if (!fgets(word, sizeof word, stdin)) {
-     fprintf(stderr, "error: insufficient input.\n");
-     break;
- }
- rmcrlf(word);
+ if (bench_flag == 0) {
+     if (!fgets(word, sizeof word, stdin)) {
+         fprintf(stderr, "error: insufficient input.\n");
+         break;
+     }
+     rmcrlf(word);
+ } else {
+     strcpy(word, argv[3]);
+ }
```
To keep the program from getting stuck in `for(;;)`, add the following before `break;`:
```clike=
if (bench_flag == 1) {
    return 0;
}
```
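The snippets above test `bench_flag` without showing how it is set. One possible way, sketched under the assumption that the program is always invoked as `./test_cpy --bench <command> <word>` as in the Makefile rule, is a small helper. The name `parse_bench_flag` is hypothetical and does not come from the original sources.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 when invoked as
 *     prog --bench <command> <word>
 * and 0 otherwise (interactive mode). */
static int parse_bench_flag(int argc, char *const argv[])
{
    assert(argc >= 1 && argv);
    /* argv[2] (command) and argv[3] (word) must both exist,
     * because the bench branches copy from them */
    return argc >= 4 && strcmp(argv[1], "--bench") == 0;
}
```

Requiring `argc >= 4` is a design choice of this sketch: it keeps the later `strcpy(word, argv[3])` from reading a missing argument.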
* Result
```
Performance counter stats for './test_cpy --bench f Taiwan' (100 runs):
488,395 cache-misses # 29.139 % of all cache refs ( +- 1.26% )
1,676,097 cache-references ( +- 0.47% )
449,033,851 instructions # 1.08 insn per cycle ( +- 0.01% )
417,234,828 cycles ( +- 0.48% )
0.148296779 seconds time elapsed ( +- 0.56% )
```
## Fixing the FIXME
* Reading
* [rex662624](https://hackmd.io/s/SJbHD5sYM#)
* First, change the places marked `FIXME` in `test_ref.c`
```clike=
- res = tst_ins_del(&root, &p, INS, CPY);
+ res = tst_ins_del(&root, &p, INS, REF);
```
* Running `test_ref` shows that the search results are wrong
```
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice: s
find words matching prefix (at least 1 char): Tain
Tain - searched prefix in 0.000010 sec
suggest[0] : Tain
suggest[1] : Tain
suggest[2] : Tain
suggest[3] : Tain
suggest[4] : Tain
suggest[5] : Tain
suggest[6] : Tain
suggest[7] : Tain
suggest[8] : Tain
suggest[9] : Tain
suggest[10] : Tain
suggest[11] : Tain
```
* The code in `tst.c` that deals with CPY
```clike=
if (*p++ == 0) {
    if (cpy) { /* allocate storage for 's' */
        const char *eqdata = strdup(*s);
        if (!eqdata)
            return NULL;
        curr->eqkid = (tst_node *) eqdata;
        return (void *) eqdata;
    } else { /* save pointer to 's' (allocated elsewhere) */
        curr->eqkid = (tst_node *) *s;
        return (void *) *s;
    }
}
```
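When `REF` is passed as the last argument of `tst_ins_del()`, the `else` branch runs. I therefore shrank the test data and printed the address returned by each of the two versions.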
* $ ./test_ref
In the REF version, the `*p` passed to `tst_ins_del()` is the same on every call, so every `curr->eqkid` naturally points to the same address.
```
0x7ffc14f0a4d0
0x7ffc14f0a4d0
0x7ffc14f0a4d0
0x7ffc14f0a4d0
0x7ffc14f0a4d0
ternary_tree, loaded 5 words in 0.000190 sec
```
* $ ./test_cpy
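In the CPY version, `strdup(*s)` first uses `malloc()` to allocate space the size of the string `s` and then copies the contents of `s` into it, so a different address is returned on every call.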
```
0x1d2a430
0x1d2a980
0x1d2aaf0
0x1d2ac60
0x1d2ae30
ternary_tree, loaded 5 words in 0.000124 sec
```
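The two address lists can be reproduced without the tree at all. The sketch below reads three words through one reused buffer and records a pointer for each, once by reference and once by copy. The function `load_words` and the sizes used are hypothetical and only illustrate the aliasing.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NWORDS 3
#define BUFSZ 32

/* Fill refs[] with pointers into the shared buffer (REF behaviour) and
 * copies[] with strdup() results (CPY behaviour).
 * Returns 0 on success, -1 if strdup() fails. */
static int load_words(const char *in[NWORDS], char buf[BUFSZ],
                      const char *refs[NWORDS], char *copies[NWORDS])
{
    for (int i = 0; i < NWORDS; i++) {
        assert(strlen(in[i]) < BUFSZ);
        strcpy(buf, in[i]);      /* the buffer is overwritten on every read */
        refs[i] = buf;           /* REF: always the same address */
        copies[i] = strdup(buf); /* CPY: a fresh heap block each time */
        if (!copies[i])
            return -1;
    }
    return 0;
}
```

After the loop every entry of `refs` points at the last word read, which matches the repeated `Tain` suggestions seen earlier, while each entry of `copies` keeps its own word.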
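* Following the approach of [tina0405](https://hackmd.io/s/SkEFpwh2-#), I first create a two-dimensional array called `word_arr`. Each newly read city name is stored in the next row of `word_arr[][WRDMAX]`, so the `*p` passed to `tst_ins_del()` is different on every call.
Declaring `word_arr[259112][WRDMAX]` inside `main()` makes the program crash (core dumped), most likely because an array of that size exceeds the stack size limit, so I declared it as a global variable instead.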
```clike=
char word_arr[259112][WRDMAX];
int main()
{
    ...
    while ((rtn = fscanf(fp, "%s", word_arr[idx])) != EOF) {
        char *p = word_arr[idx];
        /* FIXME: insert reference to each string */
        if (!tst_ins_del(&root, &p, INS, REF)) {
            fprintf(stderr, "error: memory exhausted, tst_insert.\n");
            fclose(fp);
            return 1;
        }
        idx++;
    }
    ...
}
```
* $ ./test_ref
```
ternary_tree, loaded 259112 words in 0.155890 sec
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice: s
find words matching prefix (at least 1 char): Tain
Tain - searched prefix in 0.000018 sec
suggest[0] : Tain,
suggest[1] : Tainan,
suggest[2] : Taino,
suggest[3] : Tainter
suggest[4] : Taintrux,
```
* Although search now works correctly, the delete and quit commands still fail
```
ternary_tree, loaded 259112 words in 0.166198 sec
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice: d
enter word to del: China
deleting China
China (refcnt: 1099) not removed.
delete failed.
```
```
ternary_tree, loaded 259112 words in 0.153593 sec
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice: q
*** Error in `./test_ref': free(): invalid pointer: 0x0000000000fb8000 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f2bbcc487e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f2bbcc5137a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f2bbcc5553c]
./test_ref[0x4021e4]
./test_ref[0x4021b9]
./test_ref[0x4021b9]
./test_ref[0x4021b9]
./test_ref[0x4021b9]
./test_ref[0x4021b9]
./test_ref[0x4021b9]
./test_ref[0x40219e]
./test_ref[0x40219e]
./test_ref[0x40219e]
./test_ref[0x4021b9]
./test_ref[0x40219e]
./test_ref[0x40219e]
./test_ref[0x4021b9]
./test_ref[0x40219e]
./test_ref[0x40219e]
./test_ref[0x40219e]
./test_ref[0x40219e]
./test_ref[0x40219e]
./test_ref[0x401348]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f2bbcbf1830]
./test_ref[0x400979]
======= Memory map: ========
00400000-00403000 r-xp 00000000 08:08 261623 /home/suchihhan/prefix-search/test_ref
00602000-00603000 r--p 00002000 08:08 261623 /home/suchihhan/prefix-search/test_ref
00603000-00604000 rw-p 00003000 08:08 261623 /home/suchihhan/prefix-search/test_ref
00604000-04546000 rw-p 00000000 00:00 0
0511e000-06831000 rw-p 00000000 00:00 0 [heap]
7f2bb8000000-7f2bb8021000 rw-p 00000000 00:00 0
7f2bb8021000-7f2bbc000000 ---p 00000000 00:00 0
7f2bbc9bb000-7f2bbc9d1000 r-xp 00000000 08:07 136362 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f2bbc9d1000-7f2bbcbd0000 ---p 00016000 08:07 136362 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f2bbcbd0000-7f2bbcbd1000 rw-p 00015000 08:07 136362 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f2bbcbd1000-7f2bbcd91000 r-xp 00000000 08:07 135915 /lib/x86_64-linux-gnu/libc-2.23.so
7f2bbcd91000-7f2bbcf91000 ---p 001c0000 08:07 135915 /lib/x86_64-linux-gnu/libc-2.23.so
7f2bbcf91000-7f2bbcf95000 r--p 001c0000 08:07 135915 /lib/x86_64-linux-gnu/libc-2.23.so
7f2bbcf95000-7f2bbcf97000 rw-p 001c4000 08:07 135915 /lib/x86_64-linux-gnu/libc-2.23.so
7f2bbcf97000-7f2bbcf9b000 rw-p 00000000 00:00 0
7f2bbcf9b000-7f2bbcfc1000 r-xp 00000000 08:07 135871 /lib/x86_64-linux-gnu/ld-2.23.so
7f2bbd1a3000-7f2bbd1a6000 rw-p 00000000 00:00 0
7f2bbd1bf000-7f2bbd1c0000 rw-p 00000000 00:00 0
7f2bbd1c0000-7f2bbd1c1000 r--p 00025000 08:07 135871 /lib/x86_64-linux-gnu/ld-2.23.so
7f2bbd1c1000-7f2bbd1c2000 rw-p 00026000 08:07 135871 /lib/x86_64-linux-gnu/ld-2.23.so
7f2bbd1c2000-7f2bbd1c3000 rw-p 00000000 00:00 0
7ffc4547e000-7ffc4549f000 rw-p 00000000 00:00 0 [stack]
7ffc455c9000-7ffc455cc000 r--p 00000000 00:00 0 [vvar]
7ffc455cc000-7ffc455ce000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted (core dumped)
```
* To fix delete, change the last argument of the `tst_del_word()` call in `tst.c` from the hard-coded `1` to `cpy`
```clike=
-tst_del_word(root, curr, &stk, 1);
+tst_del_word(root, curr, &stk, cpy);
```
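* quit fails because `tst_free_all(root)` in `test_ref.c` ends up calling `free()` on addresses inside the two-dimensional array declared for the REF version; switching to `tst_free(root)` avoids the error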
```clike=
-tst_free_all(root);
+tst_free(root);
```
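Both fixes come down to the same rule: only memory returned by `malloc()` (or `strdup()`) may be passed to `free()`. The sketch below shows that rule with a hypothetical container, `word_set`, standing in for the tree's leaf pointers. It does not reproduce `tst_free()` or `tst_free_all()`, whose internals are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the strings referenced by the tree. */
typedef struct {
    char *items[4];
    int count;
    bool owns_strings; /* true for CPY (heap copies), false for REF (borrowed) */
} word_set;

/* Empty the set; returns how many strings were passed to free(). */
static int word_set_clear(word_set *s)
{
    assert(s && s->count >= 0 && s->count <= 4);
    int freed = 0;
    for (int i = 0; i < s->count; i++) {
        if (s->owns_strings) { /* never free() memory malloc did not return */
            free(s->items[i]);
            freed++;
        }
        s->items[i] = NULL;
    }
    s->count = 0;
    return freed;
}
```

With `owns_strings` set to `false`, pointers into a global array such as `word_arr` are simply dropped, which is the behaviour the REF version needs on delete and quit.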
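## Comparing the performance of REF and CPY
After fixing the errors above, I first measured the performance of both versions, using 100 runs of `s Tai` as the example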
* $ make bench
```
Performance counter stats for './test_cpy --bench s Tai' (100 runs):
405,229 cache-misses # 25.640 % of all cache refs ( +- 1.07% )
1,580,480 cache-references ( +- 0.32% )
449,637,901 instructions # 1.17 insn per cycle ( +- 0.00% )
385,322,883 cycles ( +- 0.22% )
0.132403198 seconds time elapsed ( +- 0.32% )
```
```
Performance counter stats for './test_ref --bench s Tai' (100 runs):
715,608 cache-misses # 35.563 % of all cache refs ( +- 1.47% )
2,012,240 cache-references ( +- 0.42% )
469,495,339 instructions # 1.00 insn per cycle ( +- 0.00% )
470,839,058 cycles ( +- 0.49% )
0.162691614 seconds time elapsed ( +- 0.62% )
```
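In the REF version, the strings referenced by the ternary search tree are stored in an array whose rows all have the fixed size `WRDMAX`. Because of this, the cache's spatial locality can work against the search: each fetch may pull in a lot of data that is not needed yet, which would explain the higher cache-misses. Since every row is reserved at a fixed size regardless of the word's length, the array also wastes memory (internal fragmentation).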
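The amount of space lost to fixed-size rows can be estimated with a few lines of C. This is only a sketch: `wasted_bytes` is a hypothetical helper, and the row size is passed as a parameter because the value of `WRDMAX` is not shown in this note.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes reserved but unused when each of n words gets a row of
 * row_size bytes. */
static size_t wasted_bytes(const char *words[], size_t n, size_t row_size)
{
    size_t wasted = 0;
    for (size_t i = 0; i < n; i++) {
        size_t used = strlen(words[i]) + 1; /* characters plus '\0' */
        assert(used <= row_size);
        wasted += row_size - used;
    }
    return wasted;
}
```

For example, with 16-byte rows the words `Tain` and `Tainan` use 5 and 7 bytes, so 20 of the 32 reserved bytes are unused.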
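## Implementing a memory pool
Following the approach of [ChiuYiTang](https://hackmd.io/s/ByeocUhnZ#)
* Only two pointers are kept: `pPool` points to the start of the whole memory pool, and `pTop` points to the address that will be handed out next.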
```clike=
int main()
{
    ...
    char *pPool = (char *) malloc(poolSize * sizeof(char));
    char *pTop = pPool;
    while ((rtn = fscanf(fp, "%s", pTop)) != EOF) {
        t1 = tvgetf();
        char *p = pTop;
        /* FIXME: insert reference to each string */
        if (!tst_ins_del(&root, &p, INS, REF)) {
            fprintf(stderr, "error: memory exhausted, tst_insert.\n");
            fclose(fp);
            return 1;
        }
        idx++;
        pTop += (strlen(pTop) + 1); /* skip past the word and its '\0' */
    }
    ...
        case 'a':
            printf("enter word to add: ");
            if (bench_flag == 0) {
                if (!fgets(pTop, sizeof word, stdin)) {
                    fprintf(stderr, "error: insufficient input.\n");
                    break;
                }
                rmcrlf(pTop);
            } else {
                strcpy(pTop, argv[3]);
            }
            p = pTop;
            t1 = tvgetf();
            /* FIXME: insert reference to each string */
            res = tst_ins_del(&root, &p, INS, REF);
            t2 = tvgetf();
            if (res) {
                idx++;
                pTop += (strlen(pTop) + 1); /* keep the word in the pool */
                printf(" %s - inserted in %.6f sec. (%d words in tree)\n",
                       (char *) res, t2 - t1, idx);
            } else {
                /* res is NULL here, so print the word that was typed */
                printf(" %s - already exists in list.\n", pTop);
            }
    ...
        case 'q':
            tst_free(root);
            free(pPool); /* one free() releases every stored word */
            return 0;
    ...
```
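The two-pointer idea can also be isolated from the tree code. The sketch below wraps it in a hypothetical `str_pool` type with a capacity field added for bounds checking. That field and the function names are assumptions of this sketch, not part of the code above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool type: the two pointers from the text plus a capacity. */
typedef struct {
    char *base; /* start of the pool (pPool in the text) */
    char *top;  /* next free byte (pTop in the text) */
    size_t cap; /* total bytes; added here for bounds checking */
} str_pool;

/* Allocate the pool; returns 0 on success, -1 if malloc() fails. */
static int pool_init(str_pool *p, size_t cap)
{
    assert(p && cap > 0);
    p->base = p->top = malloc(cap);
    p->cap = cap;
    return p->base ? 0 : -1;
}

/* Copy s into the pool; returns the stored copy, or NULL if it does not fit. */
static char *pool_add(str_pool *p, const char *s)
{
    assert(p && p->base && s);
    size_t need = strlen(s) + 1; /* include the terminating '\0' */
    size_t used = (size_t) (p->top - p->base);
    if (need > p->cap - used)
        return NULL;
    char *dst = memcpy(p->top, s, need);
    p->top += need; /* strings are packed back to back */
    return dst;
}

/* Release every stored string with a single free(). */
static void pool_destroy(str_pool *p)
{
    assert(p);
    free(p->base);
    p->base = p->top = NULL;
    p->cap = 0;
}
```

Because the strings are packed back to back, neighbouring words share cache lines instead of each occupying a fixed-size row.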
* $ make bench
```
Performance counter stats for './test_cpy --bench s Tai' (100 runs):
444,998 cache-misses # 27.656 % of all cache refs ( +- 1.78% )
1,609,058 cache-references ( +- 0.60% )
480,816,403 instructions # 1.09 insn per cycle ( +- 0.02% )
442,471,864 cycles ( +- 0.50% )
0.152579698 seconds time elapsed ( +- 0.61% )
```
```
Performance counter stats for './test_ref --bench s Tai' (100 runs):
404,904 cache-misses # 25.170 % of all cache refs ( +- 1.48% )
1,608,690 cache-references ( +- 0.46% )
464,059,245 instructions # 1.10 insn per cycle ( +- 0.00% )
422,942,275 cycles ( +- 0.47% )
0.145401520 seconds time elapsed ( +- 0.56% )
```