# 2020q3 Homework3 (dict)
contributed by < `Rsysz` >
###### tags: `sysprog`
> [I04: dict](https://hackmd.io/@sysprog/2020-dict)
## 原始碼解析
### CPY與REF
首先看到 `tst_ins` 中透過 `const int cpy` 判斷模式,CPY模式下則透過 `strdup` 調用 `malloc` 取得一塊記憶體空間並複製字串,回傳位址
#### tst.c
```cpp
if (cpy) { /* allocate storage for 's' */
const char *eqdata = strdup(s);
if (!eqdata)
return NULL;
curr->eqkid = (tst_node *) eqdata;
return (void *) eqdata;
} else { /* save pointer to 's' (allocated elsewhere) */
curr->eqkid = (tst_node *) s;
return (void *) s;
}
```
C語言中列舉內部是以 `int` 儲存,因此原始碼中透過 `int REF` 儲存列舉 `enum{ INS }` ,以待 `tst_ins` 用於判斷插入時的模式使用
`char word[WRDMAX] = ""` 使用陣列宣告一段連續的記憶體空間並初始化內部數值,再以 `char *Top = word` 保留記憶體位址以待 `tst_ins` 使用
#### test_common.c
```cpp
enum { INS, DEL, WRDMAX = 256, STKMAX = 512, LMAX = 1024 };
int REF = INS;
...
char word[WRDMAX] = "";
int CPYmask = -1;
...
if (!strcmp(argv[1], "CPY") || (argc > 2 && !strcmp(argv[2], "CPY"))) {
CPYmask = 0;
REF = DEL;
printf("CPY mechanism\n");
} else
printf("REF mechanism\n");
char *Top = word;
char *pool = NULL;
```
透過 `CPYmask` 確保只有 `REF` 模式下才 allocate memory pool ,並使用 `Top` 指向 `pool`
```cpp
if (CPYmask) { // Only allacte pool in REF mechanism
pool = malloc(poolsize);
if (!pool) {
fprintf(stderr, "Failed to allocate memory pool.\n");
return 1;
}
Top = pool;
}
```
此處將先前初始化的 `Top` 傳入,過濾讀取的文檔並呼叫 `tst_ins` 以指定模式插入,此外
* `REF` 模式下因 `Top` 已透過 `pool` 取得大量記憶體空間,無須回到 `pool` 頂端
* `CPY` 模式下因 `tst_ins` 內部 `strdup` 已複製字串,透過 `Top -= offset & ~CPYmask` 將 `Top` 重新回到陣列頂端,並清空陣列
```cpp
bloom_t bloom = bloom_create(TableSize);
char buf[WORDMAX];
while (fgets(buf, WORDMAX, fp)) {
int offset = 0;
for (int i = 0, j = 0; buf[i + offset]; i++) {
Top[i] =
(buf[i + j] == ',' || buf[i + j] == '\n') ? '\0' : buf[i + j];
j += (buf[i + j] == ',');
}
while (*Top) {
if (!tst_ins(&root, Top, REF)) { /* fail to insert */
fprintf(stderr, "error: memory exhausted, tst_insert.\n");
fclose(fp);
return 1;
}
bloom_add(bloom, Top);
idx++;
int len = strlen(Top);
offset += len + 1;
Top += len + 1;
}
Top -= offset & ~CPYmask;
memset(Top, '\0', WORDMAX);
}
```
## Ternary search tree + Bloom filter 的效能表現
**Ex1** 以 `cities.txt` 驗證 `Bloom filter` 因搜尋的 `TST` 就是基於 `cities.txt` 建立的,因此加入 `Bloom filter` 反而導致搜尋較慢
![](https://i.imgur.com/ZFrzRrM.png)
## Perf 分析 TST 程式碼的執行時期行為
可以發現加上 `bloom filter` 後執行時間確實有變長,但看不出 `CPY` 或 `REF` 的顯著差異。
```bash
$ sudo perf stat ./test_common --bench REF
REF mechanism
ternary_tree, loaded 206849 words in 0.095285 sec
Performance counter stats for './test_common --bench REF':
211.64 msec task-clock # 0.999 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
7168 page-faults # 0.034 M/sec
8,7738,3896 cycles # 4.146 GHz
10,7785,9667 instructions # 1.23 insn per cycle
1,7339,8419 branches # 819.322 M/sec
449,2610 branch-misses # 2.59% of all branches
0.211871540 seconds time elapsed
0.195903000 seconds user
0.015992000 seconds sys
```
```bash
$ sudo perf stat ./test_common --bloom REF
REF mechanism
ternary_tree, loaded 206849 words in 0.098016 sec
Performance counter stats for './test_common --bloom REF':
225.99 msec task-clock # 0.998 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
7168 page-faults # 0.032 M/sec
9,2318,5920 cycles # 4.085 GHz
11,1833,7852 instructions # 1.21 insn per cycle
1,7695,2681 branches # 783.023 M/sec
455,4214 branch-misses # 2.57% of all branches
0.226365967 seconds time elapsed
0.214253000 seconds user
0.012127000 seconds sys
```
```bash
$ sudo perf stat ./test_common --bench CPY
CPY mechanism
ternary_tree, loaded 206849 words in 0.095965 sec
Performance counter stats for './test_common --bench CPY':
206.50 msec task-clock # 0.999 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
7346 page-faults # 0.036 M/sec
8,5594,8946 cycles # 4.145 GHz
11,0229,5289 instructions # 1.29 insn per cycle
1,7819,6522 branches # 862.943 M/sec
422,1781 branch-misses # 2.37% of all branches
0.206725199 seconds time elapsed
0.194586000 seconds user
0.012161000 seconds sys
```
```bash
$ sudo perf stat ./test_common --bloom CPY
CPY mechanism
ternary_tree, loaded 206849 words in 0.096215 sec
Performance counter stats for './test_common --bloom CPY':
217.11 msec task-clock # 0.999 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
7347 page-faults # 0.034 M/sec
9,0145,4753 cycles # 4.152 GHz
11,4143,7609 instructions # 1.27 insn per cycle
1,8166,6222 branches # 836.730 M/sec
427,0604 branch-misses # 2.35% of all branches
0.217390310 seconds time elapsed
0.205303000 seconds user
0.012076000 seconds sys
```
## CPY 和 REF 兩種實作機制
考慮到 CPY 和 REF 這兩種實作機制,思考效能落差的緣故,並且提出後續效能改善機制
充分量化並以現代處理器架構的行為解讀
如何兼顧執行時間和空間運用效率?