# 2020q3 Homework3 (dict) contributed by < `nelsonlai1` > ## 視覺化 ternary search tree + bloom filter 的效能表現並分析 在 `test_common.c` 中可以發現預設的輸入檔案為 `cities.txt` 以及使用 REF 機制。 而測試用的指令只有 `-- bench`,也就是並沒有使用 bloom filter。為了測試使用 bloom filter 帶來的差異,在 `test_common.c` 中加入 ```c= if (argc == 3 && strcmp(argv[1], "--bloom_bench") == 0) { int stat = bloom_bench_test(root, BENCH_TEST_FILE, LMAX, bloom); tst_free(root); free(pool); bloom_free(bloom); return stat; } ``` 並在 `bench.c` 加入 `bloom_bench_test` 這個 function。 而這個 function 與 `bench_test` 大致相同,只差在會先用 `bloom_test` 測試是否通過 bloom filter,再做 `tst_search_prefix`。 另外在 bench.c 中讀取資料使用 fscanf,會在空白處截斷,但 `test_common.c` 中建檔時使用 fgets,不會在空白處中斷,因此這裡也改用 fgets 讀取,讓 `bloom_test` 可以通過。 **bench input file : cities.txt** ![](https://i.imgur.com/mEvj1mu.png) **bench input file : password.txt (常用密碼字典,與 cities.txt 完全不同)** ![](https://i.imgur.com/Tp2jz1A.png) ## 以 perf 分析 TST 程式碼的執行時期行為並且充分解讀 **bench input file : cities.txt** `--bench CPY` ``` CPY mechanism ternary_tree, loaded 206849 words in 0.097356 sec Performance counter stats for './test_common --bench CPY': 205.15 msec task-clock # 0.999 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 7350 page-faults # 0.036 M/sec 8,4884,4762 cycles # 4.138 GHz 11,0056,7614 instructions # 1.30 insn per cycle 1,7759,3844 branches # 865.671 M/sec 436,0298 branch-misses # 2.46% of all branches 0.205407753 seconds time elapsed 0.205419000 seconds user 0.000000000 seconds sys ``` `--bench REF` ``` REF mechanism ternary_tree, loaded 206849 words in 0.096256 sec Performance counter stats for './test_common --bench REF': 209.29 msec task-clock # 0.999 CPUs utilized 1 context-switches # 0.005 K/sec 0 cpu-migrations # 0.000 K/sec 7169 page-faults # 0.034 M/sec 8,6639,8287 cycles # 4.140 GHz 10,7603,0564 instructions # 1.24 insn per cycle 1,7275,8733 branches # 825.442 M/sec 456,7932 branch-misses # 2.64% of all branches 0.209602805 seconds time elapsed 0.193478000 seconds user 0.016123000 seconds sys ``` `--bloom_bench CPY` ``` CPY mechanism ternary_tree, loaded 206849 words in 0.097569 sec Performance counter stats for './test_common --bloom_bench CPY': 214.67 msec task-clock # 0.999 CPUs utilized 1 context-switches # 0.005 K/sec 0 cpu-migrations # 0.000 K/sec 7349 page-faults # 0.034 M/sec 8,8872,2265 cycles # 4.140 GHz 11,4129,9229 instructions # 1.28 insn per cycle 1,8121,4640 branches # 844.150 M/sec 436,6378 branch-misses # 2.41% of all branches 0.214975275 seconds time elapsed 0.210907000 seconds user 0.004055000 seconds sys ``` `--bloom_bench REF` ``` REF mechanism ternary_tree, loaded 206849 words in 0.095650 sec Performance counter stats for './test_common --bloom_bench REF': 218.26 msec task-clock # 0.999 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 7168 page-faults # 0.033 M/sec 9,0654,7267 cycles # 4.154 GHz 11,1553,2711 instructions # 1.23 insn per cycle 1,7629,4874 branches # 807.732 M/sec 458,9250 branch-misses # 2.60% of all branches 0.218490144 seconds time elapsed 0.218502000 seconds user 0.000000000 seconds sys ``` 雖然在圖上看不太出來,但透過 perf 指令依然可看出在每筆資料都會命中的情況下使用 bloom filter ,會比不使用還花時間。而 REF 也會比 CPY 來的久。 **bench input file : password.txt** `--bench CPY` ``` CPY mechanism ternary_tree, loaded 206849 words in 0.096618 sec Performance counter stats for './test_common --bench CPY': 149.09 msec task-clock # 0.998 CPUs utilized 1 context-switches # 0.007 K/sec 0 cpu-migrations # 0.000 K/sec 7346 page-faults # 0.049 M/sec 6,1920,8149 cycles # 4.153 GHz 9,3016,3852 instructions # 1.50 insn per cycle 1,5460,1854 branches # 1036.972 M/sec 257,6981 branch-misses # 1.67% of all branches 0.149381557 seconds time elapsed 0.141297000 seconds user 0.008074000 seconds sys ``` `--bench REF` ``` REF mechanism ternary_tree, loaded 206849 words in 0.096548 sec Performance counter stats for './test_common --bench REF': 148.93 msec task-clock # 0.998 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 7171 page-faults # 0.048 M/sec 6,1528,0654 cycles # 4.131 GHz 9,0672,2462 instructions # 1.47 insn per cycle 1,4985,9561 branches # 1006.229 M/sec 282,7426 branch-misses # 1.89% of all branches 0.149161319 seconds time elapsed 0.145145000 seconds user 0.004031000 seconds sys ``` `--bloom_bench CPY` ``` CPY mechanism ternary_tree, loaded 206849 words in 0.096723 sec Performance counter stats for './test_common --bloom_bench CPY': 147.76 msec task-clock # 0.998 CPUs utilized 2 context-switches # 0.014 K/sec 0 cpu-migrations # 0.000 K/sec 7345 page-faults # 0.050 M/sec 6,0905,3410 cycles # 4.122 GHz 9,2034,7182 instructions # 1.51 insn per cycle 1,5179,2614 branches # 1027.282 M/sec 245,7345 branch-misses # 1.62% of all branches 0.148022654 seconds time elapsed 0.140008000 seconds user 0.008000000 seconds sys ``` `--bloom_bench REF` ``` REF mechanism ternary_tree, loaded 206849 words in 0.096837 sec Performance counter stats for './test_common --bloom_bench REF': 147.60 msec task-clock # 0.998 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 7168 page-faults # 0.049 M/sec 6,0931,8807 cycles # 4.128 GHz 8,9705,9714 instructions # 1.47 insn per cycle 1,4704,2041 branches # 996.217 M/sec 273,8430 branch-misses # 1.86% of all branches 0.147825302 seconds time elapsed 0.143846000 seconds user 0.003995000 seconds sys ``` 時間上的差距比使用 `cities.txt` 還小很多,但在都會 miss 時,使用 bloom filter 有稍微快了一點。