# 2020q3 Homework3 (dict) contributed by < `Rsysz` > ###### tags: `sysprog` > [I04: dict](https://hackmd.io/@sysprog/2020-dict) ## 原始碼解析 ### CPY與REF 首先看到 `tst_ins` 中透過 `const int cpy` 判斷模式,CPY模式下則透過 `strdup` 調用 `malloc` 取得一塊記憶體空間並複製字串,回傳位址 #### tst.c ```cpp if (cpy) { /* allocate storage for 's' */ const char *eqdata = strdup(s); if (!eqdata) return NULL; curr->eqkid = (tst_node *) eqdata; return (void *) eqdata; } else { /* save pointer to 's' (allocated elsewhere) */ curr->eqkid = (tst_node *) s; return (void *) s; } ``` C語言中列舉內部是以 `int` 儲存,因此原始碼中透過 `int REF` 儲存列舉 `enum{ INS }` ,以待 `tst_ins` 用於判斷插入時的模式使用 `char word[WRDMAX] = ""` 使用陣列宣告一段連續的記憶體空間並初始化內部數值,再以 `char *Top = word` 保留記憶體位址以待 `tst_ins` 使用 #### test_common.c ```cpp enum { INS, DEL, WRDMAX = 256, STKMAX = 512, LMAX = 1024 }; int REF = INS; ... char word[WRDMAX] = ""; int CPYmask = -1; ... if (!strcmp(argv[1], "CPY") || (argc > 2 && !strcmp(argv[2], "CPY"))) { CPYmask = 0; REF = DEL; printf("CPY mechanism\n"); } else printf("REF mechanism\n"); char *Top = word; char *pool = NULL; ``` 透過 `CPYmask` 確保只有 `REF` 模式下才 allocate memory pool ,並使用 `Top` 指向 `pool` ```cpp if (CPYmask) { // Only allacte pool in REF mechanism pool = malloc(poolsize); if (!pool) { fprintf(stderr, "Failed to allocate memory pool.\n"); return 1; } Top = pool; } ``` 此處將先前初始化的 `Top` 傳入,過濾讀取的文檔並呼叫 `tst_ins` 以指定模式插入,此外 * `REF` 模式下因 `Top` 已透過 `pool` 取得大量記憶體空間,無須回到 `pool` 頂端 * `CPY` 模式下因 `tst_ins` 內部 `strdup` 已複製字串,透過 `Top -= offset & ~CPYmask` 將 `Top` 重新回到陣列頂端,並清空陣列 ```cpp bloom_t bloom = bloom_create(TableSize); char buf[WORDMAX]; while (fgets(buf, WORDMAX, fp)) { int offset = 0; for (int i = 0, j = 0; buf[i + offset]; i++) { Top[i] = (buf[i + j] == ',' || buf[i + j] == '\n') ? '\0' : buf[i + j]; j += (buf[i + j] == ','); } while (*Top) { if (!tst_ins(&root, Top, REF)) { /* fail to insert */ fprintf(stderr, "error: memory exhausted, tst_insert.\n"); fclose(fp); return 1; } bloom_add(bloom, Top); idx++; int len = strlen(Top); offset += len + 1; Top += len + 1; } Top -= offset & ~CPYmask; memset(Top, '\0', WORDMAX); } ``` ## Ternary search tree + Bloom filter 的效能表現 **Ex1** 以 `cities.txt` 驗證 `Bloom filter` 因搜尋的 `TST` 就是基於 `cities.txt` 建立的,因此加入 `Bloom filter` 反而導致搜尋較慢 ![](https://i.imgur.com/ZFrzRrM.png) ## Perf 分析 TST 程式碼的執行時期行為 可以發現加上 `bloom filter` 後執行時間確實有變長,但看不出 `CPY` 或 `REF` 的顯著差異。 ```bash $ sudo perf stat ./test_common --bench REF REF mechanism ternary_tree, loaded 206849 words in 0.095285 sec Performance counter stats for './test_common --bench REF': 211.64 msec task-clock # 0.999 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 7168 page-faults # 0.034 M/sec 8,7738,3896 cycles # 4.146 GHz 10,7785,9667 instructions # 1.23 insn per cycle 1,7339,8419 branches # 819.322 M/sec 449,2610 branch-misses # 2.59% of all branches 0.211871540 seconds time elapsed 0.195903000 seconds user 0.015992000 seconds sys ``` ```bash $ sudo perf stat ./test_common --bloom REF REF mechanism ternary_tree, loaded 206849 words in 0.098016 sec Performance counter stats for './test_common --bloom REF': 225.99 msec task-clock # 0.998 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 7168 page-faults # 0.032 M/sec 9,2318,5920 cycles # 4.085 GHz 11,1833,7852 instructions # 1.21 insn per cycle 1,7695,2681 branches # 783.023 M/sec 455,4214 branch-misses # 2.57% of all branches 0.226365967 seconds time elapsed 0.214253000 seconds user 0.012127000 seconds sys ``` ```bash $ sudo perf stat ./test_common --bench CPY CPY mechanism ternary_tree, loaded 206849 words in 0.095965 sec Performance counter stats for './test_common --bench CPY': 206.50 msec task-clock # 0.999 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 7346 page-faults # 0.036 M/sec 8,5594,8946 cycles # 4.145 GHz 11,0229,5289 instructions # 1.29 insn per cycle 1,7819,6522 branches # 862.943 M/sec 422,1781 branch-misses # 2.37% of all branches 0.206725199 seconds time elapsed 0.194586000 seconds user 0.012161000 seconds sys ``` ```bash $ sudo perf stat ./test_common --bloom CPY CPY mechanism ternary_tree, loaded 206849 words in 0.096215 sec Performance counter stats for './test_common --bloom CPY': 217.11 msec task-clock # 0.999 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 7347 page-faults # 0.034 M/sec 9,0145,4753 cycles # 4.152 GHz 11,4143,7609 instructions # 1.27 insn per cycle 1,8166,6222 branches # 836.730 M/sec 427,0604 branch-misses # 2.35% of all branches 0.217390310 seconds time elapsed 0.205303000 seconds user 0.012076000 seconds sys ``` ## CPY 和 REF 兩種實作機制 考慮到 CPY 和 REF 這兩種實作機制,思考效能落差的緣故,並且提出後續效能改善機制 充分量化並以現代處理器架構的行為解讀 如何兼顧執行時間和空間運用效率?