owned this note
owned this note
Published
Linked with GitHub
# 2018q3 E02
先來看看 test_ref 的表現
```
choice: f
find word in tree: Taiwan
Bloomfilter found Taiwan in 0.000001 sec.
Probability of false positives:0.009693
----------
Tree found Taiwan in 0.000001 sec.
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice: f
find word in tree: University
Bloomfilter found University in 0.000001 sec.
Probability of false positives:0.009693
----------
Tree found University in 0.000002 sec.
```
REF 版本相比 CPY 版本是有加 Bloom Filter。可以看到當字串比較簡單時,兩者搜尋的時間近乎相同。當字串開始複雜起來時,就可以發現樹的搜尋時間明顯變長,相比只要 hash 的 BloomFiler
## 內建的 runtime_gp3
剛開輸出來的圖長這樣
![](https://i.imgur.com/kbWjmSR.png)
在放大來看
![](https://i.imgur.com/YgVJI3R.png)
會發現 CPY 的 benchmark 怪怪的,搜尋時間竟然都是同樣的時間,根據 [Jyun-Neng](https://github.com/Jyun-Neng) 同學的共筆,可以把程式碼中 prefix 的部份修正。並且有正確數值輸出。
![](https://i.imgur.com/YCNa6ww.png)
並且在橫軸的時間明顯不對
``` clike=
double tvgetf()
{
struct timespec ts;
double sec;
clock_gettime(CLOCK_REALTIME, &ts);
sec = ts.tv_nsec;
sec /= 1e9;
sec += ts.tv_sec;
return sec;
}
```
原來是把秒數除以 1e9 也就是奈秒等級。因此把 runtime_gp3 修正成
並且把 xtic 修正為 25000,也修正檔案涵蓋範圍容納整個 benchmark 的資料。
```
reset
set xlabel 'prefix'
set ylabel 'time(nano sec)'
set title 'perfomance comparison'
set term png enhanced font 'Verdana,10'
set output 'runtime3.png'
set format x "%10.0f"
set xtics rotate by 45 right
plot [:][:1000]'bench_cpy.txt' using 1:2 with points title 'cpy',\
```
![](https://i.imgur.com/PKChnNk.png)
這樣好看多了。請注意,這些是利用 cities.txt 所作的 prefix search 得到的數據。僅能代表 tst 的效能。可以觀察到主要都落在 700 奈秒以內。
## 自製 benchmark
我們知道 ref 是有加 bloomfilter 的版本,所以用 test_ref.c 來實驗應該很適合。要考慮到字彙的涵蓋最好的方法就是全部 dictionary 全都找一次。Find 可以用 tst 的方法也可以用 bloomfilter 的方法。既然我的資料集是用原本的測站,那每次搜尋理論上都可以找到,也好比較兩個資料結構的效能。
這裡將模仿 bench.c 裡面的 benchmark 作法
```clike=
int find_bench_test(
const tst_node *root,
char *out_file,
bloom_t bloom) //傳入樹的跟節點與 bloomfiler
{
FILE *fp = fopen(out_file, "w");
FILE *dict = fopen(DICT_FILE, "r");
char word[WORDMAX] = "";
int idx = 0;
double t1, t2, t3, t4;
// clock_t start_1,start_2,end_1,end_2;
if (!fp || !dict) {
if (fp) {
fprintf(stderr, "error: file open failed in '%s'.\n", DICT_FILE);
fclose(fp);
}
if (dict) {
fprintf(stderr, "error: file open failed in '%s'.\n", out_file);
fclose(dict);
}
return 1;
}
//開始統計兩者搜尋的速度
while (fscanf(dict, "%s", word) != EOF) {
// printf("%s\n",word);
// size_t len = strlen(word);
// word[len] = '\0';
// if (len && word[len - 1] == '\n')
// word[--len] = 0;
t1 = tvgetf();
tst_search(root, word);
t2 = tvgetf();
t3 = tvgetf();
bloom_test(bloom, word);
t4 = tvgetf();
fprintf(fp, "%d %f %f nsec\n", idx, (t2 - t1) * 1000000,
(t4 - t3) * 1000000);
idx++;
}
fclose(fp);
fclose(dict);
return 0;
}
```
![](https://i.imgur.com/yp2gIDf.png)
綠色的部份是用 bloomfilter 來搜尋的時間,紫色的部份則是用 tst 走訪來搜尋字串。可以看到綠色的部份幾乎都在一奈秒以內,紫色則變化很大。
## 以 perf 分析 TST 程式碼的執行時期行為並且充分解讀
```
Performance counter stats for './test_cpy --bench s Tai' (100 runs):
1,367,998 cache-misses # 48.513 % of all cache refs ( +- 0.46% )
2,819,860 cache-references ( +- 0.19% )
527,225,080 instructions # 1.07 insn per cycle ( +- 0.00% )
491,207,582 cycles ( +- 0.23% )
0.139361246 seconds time elapsed ( +- 0.34% )
Performance counter stats for './test_ref --bench s Tai' (100 runs):
1,533,475 cache-misses # 45.929 % of all cache refs ( +- 0.35% )
3,338,797 cache-references ( +- 0.15% )
588,156,637 instructions # 1.02 insn per cycle ( +- 0.00% )
576,276,589 cycles ( +- 0.14% )
0.161919829 seconds time elapsed ( +- 0.24% )
```
## CPY 和 REF 的差別
可以看到在 tst.c 裡面,```tst_ins_del```這個函數接收不同的變數。
其中在這段判斷了是否為 CPY 與 REF
```clike=
if (*p++ == 0) {
if (cpy) { /* allocate storage for 's' */
const char *eqdata = strdup(*s);
if (!eqdata)
return NULL;
curr->eqkid = (tst_node *) eqdata;
return (void *) eqdata;
} else { /* save pointer to 's' (allocated elsewhere) */
curr->eqkid = (tst_node *) *s;
return (void *) *s;
}
}
```
可以觀察到 CPY 每次都呼叫 strdup 來複製字串而 REF 只有變更指標的地址。
CPY 版本的插入
```clike=
while ((rtn = fscanf(fp, "%s", word)) != EOF) {
char *p = word;
if (!tst_ins_del(&root, &p, INS, CPY)) {
fprintf(stderr, "error: memory exhausted, tst_insert.\n");
fclose(fp);
return 1;
}
idx++;
}
```
REF 版本的插入
```clike=
/* memory pool */
char *pool = (char *) malloc(poolsize * sizeof(char));
char *Top = pool;
while ((rtn = fscanf(fp, "%s", Top)) != EOF) {
char *p = Top;
/* insert reference to each string */
if (!tst_ins_del(&root, &p, INS, REF)) { /* fail to insert */
fprintf(stderr, "error: memory exhausted, tst_insert.\n");
fclose(fp);
return 1;
} else { /* update bloom filter */
bloom_add(bloom, Top);
}
idx++;
Top += (strlen(Top) + 1);
}
```
那就來動手執行看看吧
CPY 版本
```
ternary_tree, loaded 259112 words in 0.104011 sec
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice:
```
REF 版本
```
ternary_tree, loaded 259112 words in 0.135317 sec
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice:
```
把兩種版本都加上 bloom filter 會發生什麼事情呢?
CPY 版本
```
ternary_tree, loaded 259112 words in 0.146454 sec
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice: ^C
```
REF 版本
```
ternary_tree, loaded 259112 words in 0.142133 sec
Commands:
a add word to the tree
f find word in tree
s search words matching prefix
d delete word from the tree
q quit, freeing all data
choice: ^C
```
這樣的時間合理多了,
單純比較
`s Tai`這個指令效能的話
```
Performance counter stats for './test_cpy --bench s Tai' (100 runs):
1,406,773 cache-misses # 49.697 % of all cache refs ( +- 0.32% ) (48.81%)
2,830,689 cache-references ( +- 0.30% ) (51.07%)
533,090,688 instructions # 1.12 insn per cycle ( +- 0.09% ) (63.87%)
476,275,620 cycles ( +- 0.19% ) (64.32%)
0 mem-loads (64.44%)
85,402,392 mem-stores ( +- 0.10% ) (63.05%)
2,160,191 branch-misses # 2.08% of all branches ( +- 0.32% ) (60.78%)
103,795,816 branch-instructions ( +- 0.13% ) (47.53%)
0.135494542 seconds time elapsed ( +- 0.34% )
Performance counter stats for './test_ref --bench s Tai' (100 runs):
1,239,453 cache-misses # 46.502 % of all cache refs ( +- 0.52% ) (49.90%)
2,665,362 cache-references ( +- 0.23% ) (50.01%)
507,050,976 instructions # 1.13 insn per cycle ( +- 0.05% ) (62.51%)
448,161,640 cycles ( +- 0.24% ) (62.51%)
0 mem-loads (62.60%)
81,214,401 mem-stores ( +- 0.08% ) (62.59%)
2,160,993 branch-misses # 2.19% of all branches ( +- 0.29% ) (62.49%)
98,827,656 branch-instructions ( +- 0.11% ) (49.90%)
0.128512164 seconds time elapsed ( +- 0.23% )
```
REF 的表現的確比 CPY 要好。而且因為 REF 是使用連續的記憶體空間,也可以觀察到在所有 cache reference 中, REF 的表現要比 CPY 更佳。