2018q3 E02 - HackMD

# 2018q3 E02 先來看看 test_ref 的表現 ``` choice: f find word in tree: Taiwan Bloomfilter found Taiwan in 0.000001 sec. Probability of false positives:0.009693 ---------- Tree found Taiwan in 0.000001 sec. Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: f find word in tree: University Bloomfilter found University in 0.000001 sec. Probability of false positives:0.009693 ---------- Tree found University in 0.000002 sec. ``` REF 版本相比 CPY 版本是有加 Bloom Filter。可以看到當字串比較簡單時，兩者搜尋的時間近乎相同。當字串開始複雜起來時，就可以發現樹的搜尋時間明顯變長，相比只要 hash 的 BloomFiler ## 內建的 runtime_gp3 剛開輸出來的圖長這樣 ![](https://i.imgur.com/kbWjmSR.png) 在放大來看 ![](https://i.imgur.com/YgVJI3R.png) 會發現 CPY 的 benchmark 怪怪的，搜尋時間竟然都是同樣的時間，根據 [Jyun-Neng](https://github.com/Jyun-Neng) 同學的共筆，可以把程式碼中 prefix 的部份修正。並且有正確數值輸出。 ![](https://i.imgur.com/YCNa6ww.png) 並且在橫軸的時間明顯不對 ``` clike= double tvgetf() { struct timespec ts; double sec; clock_gettime(CLOCK_REALTIME, &ts); sec = ts.tv_nsec; sec /= 1e9; sec += ts.tv_sec; return sec; } ``` 原來是把秒數除以 1e9 也就是奈秒等級。因此把 runtime_gp3 修正成並且把 xtic 修正為 25000，也修正檔案涵蓋範圍容納整個 benchmark 的資料。 ``` reset set xlabel 'prefix' set ylabel 'time(nano sec)' set title 'perfomance comparison' set term png enhanced font 'Verdana,10' set output 'runtime3.png' set format x "%10.0f" set xtics rotate by 45 right plot [:][:1000]'bench_cpy.txt' using 1:2 with points title 'cpy',\ ``` ![](https://i.imgur.com/PKChnNk.png) 這樣好看多了。請注意，這些是利用 cities.txt 所作的 prefix search 得到的數據。僅能代表 tst 的效能。可以觀察到主要都落在 700 奈秒以內。 ## 自製 benchmark 我們知道 ref 是有加 bloomfilter 的版本，所以用 test_ref.c 來實驗應該很適合。要考慮到字彙的涵蓋最好的方法就是全部 dictionary 全都找一次。Find 可以用 tst 的方法也可以用 bloomfilter 的方法。既然我的資料集是用原本的測站，那每次搜尋理論上都可以找到，也好比較兩個資料結構的效能。這裡將模仿 bench.c 裡面的 benchmark 作法 ```clike= int find_bench_test( const tst_node *root, char *out_file, bloom_t bloom) //傳入樹的跟節點與 bloomfiler { FILE *fp = fopen(out_file, "w"); FILE *dict = fopen(DICT_FILE, "r"); char word[WORDMAX] = ""; int idx = 0; double t1, t2, t3, t4; // clock_t start_1,start_2,end_1,end_2; if (!fp || !dict) { if (fp) { fprintf(stderr, "error: file open failed in '%s'.\n", DICT_FILE); fclose(fp); } if (dict) { fprintf(stderr, "error: file open failed in '%s'.\n", out_file); fclose(dict); } return 1; } //開始統計兩者搜尋的速度 while (fscanf(dict, "%s", word) != EOF) { // printf("%s\n",word); // size_t len = strlen(word); // word[len] = '\0'; // if (len && word[len - 1] == '\n') // word[--len] = 0; t1 = tvgetf(); tst_search(root, word); t2 = tvgetf(); t3 = tvgetf(); bloom_test(bloom, word); t4 = tvgetf(); fprintf(fp, "%d %f %f nsec\n", idx, (t2 - t1) * 1000000, (t4 - t3) * 1000000); idx++; } fclose(fp); fclose(dict); return 0; } ``` ![](https://i.imgur.com/yp2gIDf.png) 綠色的部份是用 bloomfilter 來搜尋的時間，紫色的部份則是用 tst 走訪來搜尋字串。可以看到綠色的部份幾乎都在一奈秒以內，紫色則變化很大。 ## 以 perf 分析 TST 程式碼的執行時期行為並且充分解讀 ``` Performance counter stats for './test_cpy --bench s Tai' (100 runs): 1,367,998 cache-misses # 48.513 % of all cache refs ( +- 0.46% ) 2,819,860 cache-references ( +- 0.19% ) 527,225,080 instructions # 1.07 insn per cycle ( +- 0.00% ) 491,207,582 cycles ( +- 0.23% ) 0.139361246 seconds time elapsed ( +- 0.34% ) Performance counter stats for './test_ref --bench s Tai' (100 runs): 1,533,475 cache-misses # 45.929 % of all cache refs ( +- 0.35% ) 3,338,797 cache-references ( +- 0.15% ) 588,156,637 instructions # 1.02 insn per cycle ( +- 0.00% ) 576,276,589 cycles ( +- 0.14% ) 0.161919829 seconds time elapsed ( +- 0.24% ) ``` ## CPY 和 REF 的差別可以看到在 tst.c 裡面，```tst_ins_del```這個函數接收不同的變數。其中在這段判斷了是否為 CPY 與 REF ```clike= if (*p++ == 0) { if (cpy) { /* allocate storage for 's' */ const char *eqdata = strdup(*s); if (!eqdata) return NULL; curr->eqkid = (tst_node *) eqdata; return (void *) eqdata; } else { /* save pointer to 's' (allocated elsewhere) */ curr->eqkid = (tst_node *) *s; return (void *) *s; } } ``` 可以觀察到 CPY 每次都呼叫 strdup 來複製字串而 REF 只有變更指標的地址。 CPY 版本的插入 ```clike= while ((rtn = fscanf(fp, "%s", word)) != EOF) { char *p = word; if (!tst_ins_del(&root, &p, INS, CPY)) { fprintf(stderr, "error: memory exhausted, tst_insert.\n"); fclose(fp); return 1; } idx++; } ``` REF 版本的插入 ```clike= /* memory pool */ char *pool = (char *) malloc(poolsize * sizeof(char)); char *Top = pool; while ((rtn = fscanf(fp, "%s", Top)) != EOF) { char *p = Top; /* insert reference to each string */ if (!tst_ins_del(&root, &p, INS, REF)) { /* fail to insert */ fprintf(stderr, "error: memory exhausted, tst_insert.\n"); fclose(fp); return 1; } else { /* update bloom filter */ bloom_add(bloom, Top); } idx++; Top += (strlen(Top) + 1); } ``` 那就來動手執行看看吧 CPY 版本 ``` ternary_tree, loaded 259112 words in 0.104011 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: ``` REF 版本 ``` ternary_tree, loaded 259112 words in 0.135317 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: ``` 把兩種版本都加上 bloom filter 會發生什麼事情呢？ CPY 版本 ``` ternary_tree, loaded 259112 words in 0.146454 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: ^C ``` REF 版本 ``` ternary_tree, loaded 259112 words in 0.142133 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: ^C ``` 這樣的時間合理多了，單純比較 `s Tai`這個指令效能的話 ``` Performance counter stats for './test_cpy --bench s Tai' (100 runs): 1,406,773 cache-misses # 49.697 % of all cache refs ( +- 0.32% ) (48.81%) 2,830,689 cache-references ( +- 0.30% ) (51.07%) 533,090,688 instructions # 1.12 insn per cycle ( +- 0.09% ) (63.87%) 476,275,620 cycles ( +- 0.19% ) (64.32%) 0 mem-loads (64.44%) 85,402,392 mem-stores ( +- 0.10% ) (63.05%) 2,160,191 branch-misses # 2.08% of all branches ( +- 0.32% ) (60.78%) 103,795,816 branch-instructions ( +- 0.13% ) (47.53%) 0.135494542 seconds time elapsed ( +- 0.34% ) Performance counter stats for './test_ref --bench s Tai' (100 runs): 1,239,453 cache-misses # 46.502 % of all cache refs ( +- 0.52% ) (49.90%) 2,665,362 cache-references ( +- 0.23% ) (50.01%) 507,050,976 instructions # 1.13 insn per cycle ( +- 0.05% ) (62.51%) 448,161,640 cycles ( +- 0.24% ) (62.51%) 0 mem-loads (62.60%) 81,214,401 mem-stores ( +- 0.08% ) (62.59%) 2,160,993 branch-misses # 2.19% of all branches ( +- 0.29% ) (62.49%) 98,827,656 branch-instructions ( +- 0.11% ) (49.90%) 0.128512164 seconds time elapsed ( +- 0.23% ) ``` REF 的表現的確比 CPY 要好。而且因為 REF 是使用連續的記憶體空間，也可以觀察到在所有 cache reference 中， REF 的表現要比 CPY 更佳。

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.