2018q3 Homework3 (dict)

# 2018q3 Homework3 (dict) contributed by < [`siahuat0727`](https://github.com/siahuat0727) > [作業說明 E02: dict](https://hackmd.io/s/B1Bq5Jat7#%E4%BD%9C%E6%A5%AD%E8%A6%81%E6%B1%82) ## 開發環境 ``` $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 158 Model name: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz Stepping: 9 CPU MHz: 826.723 CPU max MHz: 3800.0000 CPU min MHz: 800.0000 BogoMIPS: 5616.00 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K NUMA node0 CPU(s): 0-7 ``` ## Ternary Search Tree 實作細節 ### 分飾多角的 eqkid 資料結構 ```C typedef struct tst_node { char key; /* char key for node (null for node with string) */ unsigned refcnt; /* refcnt tracks occurrence of word (for delete) */ struct tst_node *lokid; /* ternary low child pointer */ struct tst_node *eqkid; /* ternary equal child pointer */ /** 其實它分飾多角 **/ struct tst_node *hikid; /* ternary high child pointer */ } tst_node; ``` 閱讀 `tst_ins_del` 時遇到很多困難，比如： ```C void *tst_ins_del(tst_node **root, const char *s, const int del, const int cpy) { ... while ((curr = *pcurr)) { /* iterate to insertion node */ ... return (void *) curr->eqkid; /* pointer to word */ ... pcurr = &(curr->eqkid); /* get next eqkid pointer address */ ... } for (;;) { ... curr->eqkid = (tst_node *) s; return (void *) s; ... } } ``` 乍看之下完全一頭霧水，怎麼有時候將 `eqkid` 傳出去當 word 起始位置，有時候又用它來取得下一個 node，有時候將傳進來的 `const char *s` 轉型成 `tst_node *` ~~WTF?!~~ 指派給它？（程式碼跟原始版本有些差異，[詳見這裡](https://hackmd.io/ClIwCm59Ttya3_FPwez2IQ?view#tst_ins_del-%E5%A5%87%E6%80%AA%E7%9A%84%E5%8F%83%E6%95%B8)）如果沒辦法直接讀懂，其實從 `tst_suggest` 可以看出一些端倪： ```C void tst_suggest( ... ) { ... if (p->key) tst_suggest(p->eqkid, c, nchr, a, n, max); else if (*(((char *) p->eqkid) + nchr - 1) == c) a[(*n)++] = (char *) p->eqkid; ... } ``` 若 `key` 非 0（`key` 爲 0 的 node 的存在是爲了表示 parent node 的結尾），則將 `eqkid` 當 node 指標進行 recursive call，否則當作字串使用。這裡需要記錄字串是因爲走訪到 node 之後要得知該 node 代表的字串是有點麻煩的事情（若有記錄 parent，可以往上透過一些規則找回），但這裡很巧妙的利用了 `eqkid` 當作 `char *` 指向該 node 表示的字串。 ### CPY 機制與 REF 機制的差別而作業中所謂的 copy 與 reference 機制，前者是傳入 `tst_ins_del` 的字串地址所表示的值**隨時會被覆寫**，因此 `eqkid` 需要 `malloc` 新空間將其 **copy** 存起來，而後者是傳進來的字串地址所表示的值**一直屬於該字串**，以此可將其當作 **reference** 直接存起來，不用 `malloc` 新空間。 ### 與 [Ternary Search Tree 視覺化](https://www.cs.usfca.edu/~galles/visualization/TST.html) 的差異承接上面，`eqkid` 分飾多角，沒有需要重複使用的時候嗎？ |Insert XY|Insert XYZ|Insert XY and XYZ |--|--|--| | ![](https://i.imgur.com/Sfb5JEz.png) | ![](https://i.imgur.com/CgpvAKU.png) | ![](https://i.imgur.com/QbPSMk8.png) | 如上面第三種情況，node Z 的 `eqkid` 該指向 ``"XY"`` 表示該字串結束，還是指向下一個 `key` 爲 0 的 node 表示 `"XYZ"` 的結束呢？再仔細觀察及思考會發現 node Z 的 key 爲 `'z'`，已無法表示字串 `"XY"` 的結束，`tst_node` 中也沒有類似綠色的屬性，所以 tst.c 的實作實際上是這樣的： |Insert XY and XYZ in order|Insert XYZ and XY in order| |--|--| |![](https://i.imgur.com/KU6JPaA.png)|![](https://i.imgur.com/vTpNRJd.png)| **注意：上圖（兩張都）請將 N 當作小綠空圓（表示字串 `"XY"` 的結束）並無視它下面的小綠空圓，而左圖 `>N` 爲 `>0`** > 有空再 p 圖 QQ，進度趕不上了可見用來表示結束的 node 永遠不會有 ternary equal child，因此 `eqkid` 就不會分身乏術啦。 ## 前期修改 ### test_CPY.c 補上 bloom filter 參考 test_ref.c，統一[加入 bloom filter](https://github.com/siahuat0727/dict/commit/33b57bcb6973b1e66cc6e306f383258f99dfd65d)，才能比較兩種實作機制的效能差異。 ### test_ref.c 補上 bench_test 參考 test_cpy.c，將缺失的[程式碼補齊](https://github.com/siahuat0727/dict/commit/a9f2e89935538b00c3777246bfa5c371fb0e92bb)。 ### bench.c 中 bug 修復程式將 `cities.txt` 中每個名稱（包括城市與國家名）在已建立好的 ternary tree 中進行搜索，照理來說每次搜索都要成功（建 tree 和 search 用的是相同資料），但若對程式碼進行以下修改： bench.c 的[這裡](https://github.com/siahuat0727/dict/blob/a9f2e89935538b00c3777246bfa5c371fb0e92bb/bench.c#L51) ```C if (tst_search_prefix(root, prefix, sgl, &sidx, max) != NULL) puts("Found"); else puts("Not found"); ``` 執行 `make && ./test_ref --bench` 會發現 Output： ``` ... Not found Not found ... ``` 原因是 `char prefix[3] = "" ` 的大小只有 `3`，`strncpy` 時沒預留 `\0` 的位置： ```C int bench_test(const tst_node *root, char *out_file, const int max) { char prefix[3] = ""; ... while (fscanf(dict, "%s", word) != EOF) { ... strncpy(prefix, word, 3); ... tst_search_prefix(root, prefix, sgl, &sidx, max) ... } ... ``` 而 `tst_search_prefix` 中 `s` 的使用上是以找到 `'\0'` 作爲結束： ```C void *tst_search_prefix(const tst_node *root, const char *s, char **a, int *n, const int max) { ... const char *start = s; ... /* get length of s */ for (; *start; start++) ; /* wait */ const size_t nchr = start - s; ... } ``` 因此 `nchr` 絕大多數情況下不是預想中的 3，而是較大的值，導致尋找非預期的 prefix，搜索失敗。 [修改方式](https://github.com/siahuat0727/dict/compare/a9f2e89...6833c18#diff-a9937b3765788bcf4c80db22f17c4032)：將 `prefix` 大小改爲 4 ，`strncpy` 限制 `n` 爲 `sizeof(prefix)-1` 可確保最後的 `'\0'` 不會被覆蓋。 ### 修改 bench.c 及繪圖程式時間單位錯誤 ```C int bench_test(const tst_node *root, char *out_file, const int max) { ... fprintf(fp, "%d %f sec\n", idx, (t2 - t1) * 1000000); // should be microseconds ... } ``` `t2 - t1` 的單位爲**秒**，乘上 `1e6` 後單位應爲**微秒** [統一單位爲 msec 並新增 `Makefile` 畫圖功能](https://github.com/siahuat0727/dict/commit/e7306791992a6082d0b4e8c9b9a5cd656741626f)後，執行 `make clean && make search-all && make plot` 並等待數十秒…… ``` search-all: $(TESTS) @for test in $(TESTS); do\ ./$$test --bench ; \ done ``` ![](https://i.imgur.com/xA7keNA.png) 這裡是給定每個已存在的字串 prefix 進行搜索，測試找到所有符合 prefix 的字串所需的時間，因此與建樹過程中 `malloc` 所花的時間等等**沒有關係**。很明顯的，以 **REF 機制**建立的 tree 在進行**搜索**時速度是比較**慢**的。 ## Perf 分析爲什麼會出現上面 CPY 機制明顯優於 REF 機制的結果呢？使用 perf 對上述實驗進行分析： ``` perf-search: $(TESTS) @for test in $(TESTS); do\ echo 3 | sudo tee /proc/sys/vm/drop_caches; \ perf stat \ -e cache-misses,cache-references,instructions,cycles \ ./$$test --bench; \ done ``` 執行 `make perf-search` ``` Performance counter stats for './test_cpy --bench': 1,681,368,653 cache-misses # 58.179 % of all cache refs 2,890,009,712 cache-references 49,637,166,894 instructions # 0.90 insn per cycle 54,932,228,832 cycles 15.492978290 seconds time elapsed ``` ``` Performance counter stats for './test_ref --bench': 1,967,405,626 cache-misses # 62.223 % of all cache refs 3,161,862,019 cache-references 49,705,117,369 instructions # 0.71 insn per cycle 70,053,549,114 cycles 19.957402867 seconds time elapsed ``` 明明**搜索過程一模一樣**，但兩者的 cache-misses 次數卻相差甚遠。仔細思考兩者的差異，其實**只有** ternimal node 的 `eqkid` 所指向的**字串地址來源不同**，CPY 機制是單獨 `malloc` 的小空間，REF 機制是 `malloc` 大空間的一小部分。所以過程中唯一稱得上不同的，大概只有最後對字串地址進行 dereference ，到對應的地址讀出字串的地方。於是嘗試刪掉最後的 dereference 相關的部分（原本用來檢查是否找到符合的 prefix，刪掉會導致結果不正確，但這裡的重點是尋找 REF 相對耗時的地方），發現這樣一來兩者的分析資料就差不多了！（不一定哪種機制比較快） ```C void tst_suggest(const tst_node *p, const char c, const size_t nchr, char **a, int *n, const int max) { if (!p || *n == max) return; tst_suggest(p->lokid, c, nchr, a, n, max); if (p->key) tst_suggest(p->eqkid, c, nchr, a, n, max); else // if (*(((char *) p->eqkid) + nchr - 1) == c) <= delete this only a[(*n)++] = (char *) p->eqkid; tst_suggest(p->hikid, c, nchr, a, n, max); } ``` ``` Performance counter stats for './test_cpy --bench': 1,390,728,865 cache-misses # 52.515 % of all cache refs 2,648,254,194 cache-references 45,808,349,880 instructions # 0.97 insn per cycle 47,022,265,491 cycles 12.974203801 seconds time elapsed ``` ``` Performance counter stats for './test_ref --bench': 1,340,926,158 cache-misses # 50.404 % of all cache refs 2,660,373,578 cache-references 45,869,337,319 instructions # 0.98 insn per cycle 46,894,627,000 cycles 12.866257905 seconds time elapsed ``` ![](https://i.imgur.com/9DbfzA3.png) 也就是說，造成 REF 相對慢的原因，與 `*(((char *) p->eqkid) + nchr - 1)` 背後發生的事情相關！ > 這世界怎麼越來越奇怪…… 待研究進一步用 perf 分析每個 intruction 發生 cache miss 的次數： `$ perf record -e cache-misses ./test_ref --bench && perf report` ``` 98.89% test_ref test_ref [.] tst_suggest 0.22% test_ref test_ref [.] tst_search_prefix 0.08% test_ref test_ref [.] tst_ins_del ``` 由於測試的是 prefix-search，時間都花在 `tst_suggest` 上不意外，先附上 [`tst_suggest`]() 前半段中每一行程式碼對應的組語指令 ``` tst_suggest(): const tst_node *p, // -0x8 相對於 %rbp 的地址 const char c, // -0xc const size_t nchr, // -0x18 char **a, // -0x20 int *n, // -0x28 const int max) // -0x10 { push %rbp mov %rsp,%rbp sub $0x30,%rsp mov %rdi,-0x8(%rbp) mov %esi,%eax mov %rdx,-0x18(%rbp) mov %rcx,-0x20(%rbp) mov %r8,-0x28(%rbp) mov %r9d,-0x10(%rbp) mov %al,-0xc(%rbp) if (!p || *n == max) cmpq $0x0,-0x8(%rbp) ↓ je 10e mov -0x28(%rbp),%rax mov (%rax),%eax cmp %eax,-0x10(%rbp) ↓ je 10e return; tst_suggest(p->lokid, c, nchr, a, n, max); movsbl -0xc(%rbp),%esi mov -0x8(%rbp),%rax mov 0x8(%rax),%rax mov -0x10(%rbp),%r8d mov -0x28(%rbp),%rdi mov -0x20(%rbp),%rcx mov -0x18(%rbp),%rdx mov %r8d,%r9d mov %rdi,%r8 mov %rax,%rdi → callq tst_suggest if (p->key) mov -0x8(%rbp),%rax movzbl (%rax),%eax test %al,%al ↓ je 9c tst_suggest(p->eqkid, c, nchr, a, n, max); movsbl -0xc(%rbp),%esi mov -0x8(%rbp),%rax mov 0x10(%rax),%rax mov -0x10(%rbp),%r8d mov -0x28(%rbp),%rdi mov -0x20(%rbp),%rcx mov -0x18(%rbp),%rdx mov %r8d,%r9d mov %rdi,%r8 mov %rax,%rdi → callq tst_suggest ↓ jmp e2 else if (*(((char *) p->eqkid) + nchr - 1) == c) 9c: mov -0x8(%rbp),%rax mov 0x10(%rax),%rax mov -0x18(%rbp),%rdx sub $0x1,%rdx add %rdx,%rax movzbl (%rax),%eax cmp %al,-0xc(%rbp) ↓ jne e2 a[(*n)++] = (char *) p->eqkid; mov -0x28(%rbp),%rax ... 10e: nop } ``` 看懂我們需要的指令不難，只需了解 `mov` 時 `0x8(%rax)` 會對 `%rax + 0x8` 這地址做 dereference，而 `(%rax)` 可理解爲 `0x0(%rax)` 的縮寫。另外還需知道 [x86 General registers ](http://www.eecg.toronto.edu/~amza/www.mindsec.com/files/x86regs.html)。 ``` 32 bits : EAX EBX ECX EDX 16 bits : AX BX CX DX 8 bits : AH AL BH BL CH CL DH DL ``` 進一步比較兩種機制的差異，左邊的數值是 cache miss 發生在該指令的 period。 `test_ref`：兩處熱點 ![](https://i.imgur.com/h18IW1X.png) ![](https://i.imgur.com/im84PJi.png) `test_cpy`：一處熱點 ![](https://i.imgur.com/2VRQofI.png) ![](https://i.imgur.com/BOUJuwD.png) 一開始很困惑到底爲什麼第一處是發生在 `mov -0x10(%rbp),%r8d`（把 `max` 的值給 `%r8d` register），原來 [perf 在回報時會有偏差](https://stackoverflow.com/questions/43794510/linux-perf-reporting-cache-misses-for-unexpected-instruction)，這裡看起來實際發生 cache miss 是在上一個 instruction， `mov 0x8(%rax),%rax`，即對 p 做 dereference（這裡是 function 裡的第一次，之後這 function 再對 `p` 做 dereference 時就很大機率 cache hit 了） `p->lokid` =>`(*p).lokid`，其實以 compiler 的角度（也是實際過程）應該是 `*(tst_node*)((char*)p + offsetof(tst_node, lokid))`，這裡的 `0x8` 即是所謂的 `offsetof(tst_node, lokid)`（在編譯時期可以算出來，所以指令中才是赤裸裸的 `0x8`），`rax` 在這裡是 `p`。第二處便是之前提到的字串 dereference `*(((char *) p->eqkid) + nchr - 1)`（這裡一樣有回報偏差，實際上也是上一條 instruction `movzbl (%rax),%eax`），原本以爲 REF 機制在這裡發生 cache miss 的機率只是比 CPY 機制稍高，沒想到竟然高了**十幾倍**！應該說 CPY 機制在這裡完全沒有特別容易 cache miss 的傾向。於是決定把地址印出來看看： ```C void tst_suggest （...） { ... if (p->key) tst_suggest(p->eqkid, c, nchr, a, n, max); else { if (*(((char *) p->eqkid) + nchr - 1) == c) a[(*n)++] = (char *) p->eqkid; printf("%p\n", (((char *) p->eqkid) + nchr - 1)); } ... } ``` `./test_cpy --bench`： ``` 0x556d88e35022 0x556d88d31862 0x556d88cedf92 0x556d8870d522 ... ``` `./test_ref --bench`： ``` 0x7f76374d48d7 0x7f763735fcf8 0x7f76374b9321 0x7f763743ac33 ... ``` `malloc` 小空間的 `0x556d88e35022` 等等大概是 heap 空間，那麼 `0x7f76374d48d7` 呢？ google 搜了好久後來還是在 man pages 中找到： `$ man malloc` >**NOTES** Normally, malloc() allocates memory from the heap, and adjusts the size of the heap as required, using sbrk(2). When allocating blocks of memory larger than MMAP_THRESHOLD bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2). MMAP_THRESHOLD is 128 kB by default, but is adjustable using mallopt(3). Prior to Linux 4.7 allocations performed using mmap(2) were unaffected by the RLIMIT_DATA resource limit; since Linux 4.7, this limit is also enforced for allocations performed using mmap(2). 原來在 `malloc` 大於 `MMAP_THRESHOLD` bytes 的空間時，會呼叫 `mmap`，而 `mmap` 會（在 heap 和 stack 之間）創建一個 `vm_area_struct`，將檔案映射到虛擬記憶體空間。詳見[认真分析mmap：是什么为什么怎么用](https://www.cnblogs.com/huxiao-tee/p/4660352.html)。 ![](https://i.imgur.com/SxKZPKk.png) [圖片來源](https://jin-yang.github.io/post/kernel-memory-mmap-introduce.html) Standard segment layout in a Linux process. (32-bit) ![](https://i.imgur.com/syPRnCp.png) [圖片來源](https://manybutfinite.com/post/anatomy-of-a-program-in-memory/) 但這樣還是無法解釋實驗結果的差異，再對程式進行以下觀察： ```C #include <stdint.h> void tst_suggest （...） { ... if (p->key) tst_suggest(p->eqkid, c, nchr, a, n, max); else { if (*(((char *) p->eqkid) + nchr - 1) == c) a[(*n)++] = (char *) p->eqkid; printf("%ld\n", (intptr_t)(void*)(((char *) p->eqkid) + nchr - 1) - (intptr_t)(void*)&p->lokid); } ... } ``` `./test_ref --bench`： ``` 45833574649567 45833579989413 45833586770473 45833588397029 ... ``` `./test_cpy --bench`： ``` 42 42 42 42 ... ``` ### 終於找到 CPY 機制在 prefix-search 時效率比較好的原因回想 tst.c 的 `tst_ins_del`： ```C void *tst_ins_del(tst_node **root, const char *s, const int del, const int cpy) { ... if (*p++ == 0) { if (cpy) { /* allocate storage for 's' */ const char *eqdata = strdup(s); if (!eqdata) return NULL; curr->eqkid = (tst_node *) eqdata; return (void *) eqdata; } else { /* save pointer to 's' (allocated elsewhere) */ curr->eqkid = (tst_node *) s; return (void *) s; } } ... } ``` CPY 機制在 insert 時，創出 leaf 之後會間接呼叫 `malloc` 來存字串，在這之前的最後一次 `malloc` 是爲了分配該 leaf 的空間。一直 `malloc` 時，heap 也會类似 stack 一样增長，所以像作業在建 tree 這種很規律的情況下，字串地址與 leaf 的地址很可能都很靠近。也就是說，CPY 機制在一開始 `mov 0x8(%rax),%rax` （`p->eqkid`）後，字串就很大可能在 cache 裡面了，因此 cache miss 機率較低，這就是它相對來說在 prefix-search 上效率比較好的原因！ ## CPY 機制與 REF 機制在建 tree 的效率比較兩者建 tree 差異： + CPY 機制 + 讀檔時每次將地名暫時存入同一字串 `word` :+1: 只要 cache 沒被覆蓋就不會發生 cache miss + ternary search tree 的 terminal node 用 `strdup` 將地名存起來 :-1: 需要複製字串（影響小） :-1: 間接呼叫 `malloc`，可能發生 system call（影響大） + REF 機制 + 讀檔時每次將地名永久（直到 memory pool 被 free）依序存入 memory pool :-1: cache miss 次數 $\ge$ 使用的 memory pool 大小 / cache 一次載入的大小 + ternary search tree 的 terminal node 直接存 reference（指向 memory pool 的某個位置） :+1: 沒有明顯額外 overhead 還在嘗試用 perf 來分析 ## 缺陷補充 ### `tst_ins_del` 奇怪的參數在閱讀 test_cpy.c 時發現以下程式碼： ```C int main(int argc, char **argv) { char word[WRDMAX] = ""; ... char *p = word; if (!tst_ins_del(&root, &p, INS, CPY)) { .... } ``` `p` 指向 `word` 的開頭後將 `p` 的地址傳入 `tst_ins_del` ？有什麼用處呢？發現 `void *tst_ins_del(tst_node **root, char *const *s, const int del, const int cpy)` 沒直接操作 `s`，也不會改到 `*s` 的值。且 test_cpy.c 中有使用到 `char *p` 的其他所有地方也無意義。同樣的，test_ref.c 也有這情況。因此刪掉多餘的程式碼並將 `tst_ins_del` 參數中的 `char *cosnt *s` 改爲 `const char *s`。[git commit](https://github.com/siahuat0727/dict/commit/a51d66981456f61cc4b484e7a9c5c3551a0a495a) ### test_cpy.c 和 test_ref.c 的其他 bug 修復 ```C FILE *fp = fopen(IN_FILE, "r"); if (!fp) { /* prompt, open, validate file for reading */ fprintf(stderr, "error: file open failed '%s'.\n", argv[1]); return 1; } ``` 這裡的 `argv[1]` 應改爲 `IN_FILE` ```C int main() { ... bloom_t bloom = bloom_create(TableSize); ... for(;;) { ... switch(*word) { ... quit: case 'q': tst_free_all(root); return 0; ... } } bloom_free(bloom); return 0; } ``` 這裡 `case 'q'` 會直接 `return 0`，bloom filter 被 create 卻沒 free，造成 memory leak。 [修改後](https://github.com/siahuat0727/dict/commit/feb8eaa76b2ae66f2f0c248febf647184abd347d)： ```C int main() { ... case 'q': goto quit; ... quit: tst_free_all(root); bloom_free(bloom); return 0; } ``` >後來發現還有其他地方，待修改 ### Delete 不存在的資料會導致插入以下爲 `tst_ins_del` 簡化版的 pseudo code： ```C void *tst_ins_del(tst_node **root, const char *s, const int del, const int cpy) { 在樹中尋找字串 s，若發現已存在，執行對應的插入（增加 ref 數）或刪除後返回 // 這裡應加上：若動作爲刪除時直接返回表示刪除失敗，不要進行插入插入字串 s } ``` 由於原程式中 return 值爲 `NULL` 表示刪除成功，我想這裡可以學習 `sbrk()` 使用 `(void*) -1` 表示刪除失敗。(剛好 `test_cpy.c` 和 `test_ref.c` 中用 return 值 `res` => `if (res)` 來判斷是否刪除失敗，因此不用特別做修改。） [git commit](https://github.com/siahuat0727/dict/commit/194d134952dae16b2df02c7ef97284a213b86968) ### Search A 會發生 Segmentation fault [問題定位可參考這位同學](https://hackmd.io/nLSb34t0SE2V-56Gxi7ErA?view#%E5%B0%8B%E6%89%BE-search-%E2%80%9CA%E2%80%9D-%E7%99%BC%E7%94%9F-Segfault-%E7%9A%84%E5%8E%9F%E5%9B%A0) `tst_sugggest` 多處 call 到自己，因此應做以下[修改](https://github.com/siahuat0727/dict/commit/f7c31f754e6edd555c70afd49dfa2512de14267a)： ```C= void tst_suggest(const tst_node *p, const char c, const size_t nchr, char **a, int *n, const int max) { if (!p || *n == max) // 改 *n >= max return; tst_suggest(p->lokid, c, nchr, a, n, max); if (p->key) tst_suggest(p->eqkid, c, nchr, a, n, max); else if (*(((char *) p->eqkid) + nchr - 1) == c) // 需要再次確保 *n < max，因爲 *n 可能改變了 a[(*n)++] = (char *) p->eqkid; tst_suggest(p->hikid, c, nchr, a, n, max); } ``` 第 13 行較不影響效能的解決方法是把第 10 行的搜索改到第 14 行與第 15 行之間（這能確保 `*n` 值與進入 function 時一樣），但這會造成結果不會按字母順序排列，在這裡大概得不償失。 ### 城市名後面跟着逗號將參考[< `ChiuYiTang` >同學的解決思路](https://hackmd.io/s/ByeocUhnZ#%E9%87%9D%E5%B0%8D%E5%90%84%E5%9C%8B%E5%9F%8E%E5%B8%82%E7%9A%84-prefix-search-%E6%9C%89%E7%84%A1%E7%BC%BA%E9%99%B7%E5%91%A2%EF%BC%9F%E6%AF%94%E6%96%B9%E8%AA%AA%E7%84%A1%E6%B3%95%E5%8D%80%E5%88%86%E5%9F%8E%E5%B8%82%E5%92%8C%E5%9C%8B%E5%AE%B6%EF%BC%8C%E4%B8%A6%E6%8F%90%E5%87%BA%E5%85%B7%E9%AB%94%E7%A8%8B%E5%BC%8F%E7%A2%BC%E7%9A%84%E4%BF%AE%E6%AD%A3)，以逗號爲提示，區分城市名和國家名。另外，記錄城市名時不應該把逗號也存下來。 ### Bloom filter 空間浪費整個 table 都把 byte 當 bit 來用，浪費 1/8 的 table 空間。 ```C void bloom_add(bloom_t filter, const void *item) { struct bloom_hash *h = filter->func; uint8_t *bits = filter->bits; while (h) { unsigned int hash = h->func(item); hash %= filter->size; bits[hash] = 1; // 每個 uint8_t 不是 1 就是 0 h = h->next; } } ``` ### 潛在危機 ```C if (argc > 1 && strcmp(argv[1], "--bench") == 0) // a for auto strcpy(word, argv[2]); ``` 直接 `strcpy` 有可能發生 buffer overflow ## Reference [prefix-search by `rex662624`](https://hackmd.io/s/SJbHD5sYM) [prefix-search by `ChiuYiTang`](https://hackmd.io/s/ByeocUhnZ#) [Perf -- Linux下的系统性能调优工具](https://www.ibm.com/developerworks/cn/linux/l-cn-perf1/) [x86 General registers ](http://www.eecg.toronto.edu/~amza/www.mindsec.com/files/x86regs.html) [Linux内存分配小结--malloc、brk、mmap](https://blog.csdn.net/gfgdsg/article/details/42709943) [认真分析mmap：是什么为什么怎么用](https://www.cnblogs.com/huxiao-tee/p/4660352.html) [glibc内存管理ptmalloc源代码分析](https://paper.seebug.org/papers/Archive/refs/heap/glibc%E5%86%85%E5%AD%98%E7%AE%A1%E7%90%86ptmalloc%E6%BA%90%E4%BB%A3%E7%A0%81%E5%88%86%E6%9E%90.pdf)