2020q3 Homework3 (dict)

# 2020q3 Homework3 (dict) contributed by < ``yceugene`` > ###### tags: `linux2020` `sysprog2020` ## :memo: 目錄 [TOC] ### Outline 1. Trie ＆ Ternary tree 2. ternary search tree + bloom filter 的效能表現並分析 3. CPY & REF 機制差異 4. Bloom filter 空間浪費改善 5. 引入 test-common.c 統合 test_cpy.c 和 test_ref.c 6. 紀錄多餘逗號修正 7. 安裝 perf 8. Perf 分析 CPY & REF 機制效能 9. perf record 找 cache-misses 熱點 10. memory leaks 的偵測及修正 ### Environment ``` $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 6 On-line CPU(s) list: 0-5 Thread(s) per core: 1 Core(s) per socket: 6 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 158 Model name: Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz Stepping: 10 CPU MHz: 800.036 CPU max MHz: 4400.0000 CPU min MHz: 800.0000 BogoMIPS: 6000.00 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 9216K NUMA node0 CPU(s): 0-5 ``` ### Trie Trie 又叫做 ``digital tree`` 或 ``prefix-tree``(或稱字典樹)，是一種 ``search tree``，通常用來存放字串。其特性是適合用來做 prefix search 但礙於各節點存有大量空指標，容易占用記憶體空間。 ### Ternary search tree(TST) TST 是一種 trie 的資料結構，其優點是結合了 trie 的時間效率和 BST 的空間效率。TST 的節點會包含一個 key(用來記錄 Keys 中的其中一個字元)，以及至多三個 child ，大於、等於、小於。在 ``dict`` 中 TST 結構定義如下: ``` c typedef struct tst_node { char key; /* char key for node (null for node with string) */ unsigned refcnt; /* refcnt tracks occurrence of word (for delete) */ struct tst_node *lokid; /* ternary low child pointer */ struct tst_node *eqkid; /* ternary equal child pointer */ struct tst_node *hikid; /* ternary high child pointer */ } tst_node; ``` 特性: * 因為各個節點至多只會有 3 個 child，因此使用起來更節省記憶體。 ### 初步執行 dict 專案 * 取得 dict 程式碼，編譯並測試 ``` $ make $ ./test_common CPY ``` * 執行結果 ``` CPY mechanism ternary_tree, loaded 206849 words in 0.295581 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: f find word in tree: Taiwan Bloomfilter found Taiwan in 0.000002 sec. Probability of false positives:0.006306 ---------- Tree found Taiwan in 0.000006 sec. choice: s find words matching prefix (at least 1 char): Tain Tain - searched prefix in 0.000009 sec suggest[0] : Tain suggest[1] : Tainan suggest[2] : Taino suggest[3] : Tainter Lake suggest[4] : Taintrux ``` ### copy 機制 (CPY) & reference 機制 (REF) * 比較 copy 與 reference 機制的效率 * make bench ``` $ make bench CC test_common.o CC tst.o CC bloom.o LD test_common COPY mechanism test_common => choice: find words matching prefix (at least 1 char): Tai - searched prefix in 0.000305 sec REFERENCE mechanism test_common => choice: find words matching prefix (at least 1 char): Tai - searched prefix in 0.000452 sec (base) ``` * make bench TEST_DATA="s T" ``` $ make bench TEST_DATA="s T" COPY mechanism test_common => choice: find words matching prefix (at least 1 char): T - searched prefix in 0.000330 sec REFERENCE mechanism test_common => choice: find words matching prefix (at least 1 char): T - searched prefix in 0.000370 sec ``` ## 程式碼理解 ### main() 先針對 CPY 與 REF 兩種機制做不同的記憶體配置，主要差別是 Top 指標指向的空間不同。 ``` c if (!strcmp(argv[1], "CPY") || (argc > 2 && !strcmp(argv[2], "CPY"))) { CPYmask = 0; REF = DEL; printf("CPY mechanism\n"); } else printf("REF mechanism\n"); ``` CPY 下的配置: Top 指向一塊 char array ```c enum{ WRDMAX = 256}; char word[WRDMAX] char *Top = word; char *pool = NULL; ``` REF 下的配置: Top 指向一塊 heap 中的空間(大小為 2000000 * WRDMAX) ```c pool = malloc(poolsize); Top = pool; ``` 對 ``cities.txt`` 每個 row 做 parsing，再將字串 assign 給 pool: ```c= int offset = 0; for (int i = 0, j = 0; buf[i + offset]; i++) { Top[i] = (buf[i + j] == ',' || buf[i + j] == '\n') ? '\0' : buf[i + j]; j += (buf[i + j] == ','); } ``` 對 bloom filter 做新增: ```c= while (*Top) { if (!tst_ins(&root, Top, REF)) { /* fail to insert */ fprintf(stderr, "error: memory exhausted, tst_insert.\n"); fclose(fp); return 1; } bloom_add(bloom, Top); idx++; int len = strlen(Top); offset += len + 1; Top += len + 1; } ``` CPY 與 REF 兩種機制，在做 TST Insert 時會有不同，CPY 機制是在插入新字串時，malloc 一塊記憶體存放新字串；REF 機制由於在讀取檔案時，就已將全部字串放進 heap 中了，因次在樹中插入字串時，只需要存放對應的 heap address就可以了。 ```c= if (*p++ == 0) { if (cpy) { /* allocate storage for 's' */ const char *eqdata = strdup(s); if (!eqdata) return NULL; curr->eqkid = (tst_node *) eqdata; return (void *) eqdata; } else { /* save pointer to 's' (allocated elsewhere) */ curr->eqkid = (tst_node *) s; return (void *) s; } } ``` 在 CPY 機制下，每處理完一列資料時，需要把 Top 還原回 word 的起始位址；REF 下則不用: ``Top -= offset & ~CPYmask`` ## bloom filter Bloom filter 是利用 hash function，在不用走訪全部元素的前提，預測特定字串是否存於資料結構中。但只能做預測，有一定的錯誤率。建立 bloom filter: ```c bloom_t bloom = bloom_create(TableSize); ``` ### Bloom filter 主體宣告: ```c typedef unsigned int (*hash_function)(const void *data); typedef struct bloom_filter *bloom_t; ``` 用一個 linked list 存放用到的所有 hash function ``` c struct bloom_hash { hash_function func; struct bloom_hash *next; }; struct bloom_filter { struct bloom_hash *func; void *bits; // 即 table size_t size; // 為 table 大小 }; ``` ### tst_ins(&root, Top, REF) ## ternary search tree + bloom filter 的效能表現 ## 以 perf 分析 TST 程式碼的執行時期行為 ## 考慮到 CPY 和 REF 這兩種實作機制