2017q3 Homework2 (prefix-search)

# 2017q3 Homework2 (prefix-search) contributed by <`yusihlin`> ### Reviewed by `zhanyangch` * 建立 TST 的時間的圖中，REF 有過大的離群值，像是[此圖](https://i.imgur.com/LFUumy4.png)，有一次的執行時間高達0.5左右，希望同學能夠解釋原因，如果是誤差，應在畫圖時將其去除，可以使圖片要呈現的資訊更明確。 * 有提到 Circular buffer 部份功能未使用，可以針對 Circular buffer 與 memory pool 作比較，提出其分別適合哪些應用場景。 ### Ternary search tries * 每個節點儲存一個 character 、一個 value 和三個指向 character 的指針 * 每個節點有三個子樹：low (left), equal (middle), high (right) ![](https://i.imgur.com/CF7lQpT.png) #### **搜尋** * 從根節點開始檢查，若資料中第一個 character 小於根節點，則調用 low link，大於根節點調用 high link，相等則為 equal link 並繼續檢查下一個 character * 找到最後會有兩種情況 * Search hit : 找到一個 value * Search miss : 找不到 value 或找到一半就沒有 link ![](https://i.imgur.com/CWq6ilS.png) #### **插入** * 先檢查要新增的資料是否已在 TST 上，若無則新增節點，查找過程中如有 equal link 時刪除資料的第一個 character ### test_cpy 測試 * [zhanyangch](https://hackmd.io/s/Syc8yOhhW#) 發現當進行 prefix-search 時產生 Segmentation fault，使用 GDB debugging ```C $ gdb test_cpy ... (gdb) run ... choice: s find words matching prefix (at least 1 char): A Program received signal SIGSEGV, Segmentation fault. 0x0000000000401d81 in tst_suggest (p=0x12e1a30, c=65 'A', nchr=1, a=0x7fffffffbc80, n=0x7fffffffbc40, max=1024) at tst.c:330 330 a[(*n)++] = (char *) p->eqkid; ``` * 到 tst.c 查看，當 tst_suggest 執行時多次進行遞迴呼叫，但當 *n = 1024 時執行到 a[(*n)++] = (char *) p->eqkid 對 a[1024] 存取，導致寫入不合法的記憶體位置，*n 又會繼續增加並寫入 * 修正 (!p || *n == max) 為 (!p || *n >= max) 進行 1024 次查找 * a[(*n)++] = (char *) p->eqkid 添加 *n ！= max 的判斷不寫入 a[1024] ### tst_ref 修正 * insert reference to each string : 進行 prefix searching 時使用 REF，將 tst_ins_del(&root, &p, INS, CPY) 改為 tst_ins_del(&root, &p, INS, REF) #### **測試 'q'** ```C choice: q *** Error in `.../test_ref': free(): invalid pointer: 0x00007fffffffdd10 *** ``` * 執行 quit 會執行 tst_free_all，觀察內容發現它會 free tst_node 所儲存的字串 * 從 tst_ins_del 觀察 CPY 和 REF 的差異 * CPY : 使用 [strdup()](http://c.biancheng.net/cpp/html/166.html)，功能為先用 maolloc() 配置與參數 s 字串相同的空間大小，然後將 s 字串的內容複製到分配好的位址並回傳位址，此地址可以被 free * REF : 直接將 *s 的位置 assign 給 curr->eqkid，變數的位址沒有另外配置空間，可能在外部（e.g. stack），不能被 free * tst_free_all 改為 tst_free ```C case 'q': tst_free(root); return 0; break; ``` * 測試 'a' 新增 word 後用 'q' 刪除同樣會有 invalid pointer 的問題，概念同上，修改 tst_ins_del 內呼叫 tst_del_word 傳入的參數，如此當傳入 REF 時字串存在外部不會被 free ```C ... if (del) { /* delete instead of insert */ (curr->refcnt)--; /* decrement reference count */ /* chk refcnt, del 's', return NULL on successful del */ return tst_del_word(root, curr, &stk, cpy); } else curr->refcnt++; /* increment refcnt if word exists */ return (void *) curr->eqkid; /* pointer to word / NULL on del */ ... ``` #### **測試 's'** ``` choice: s find words matching prefix (at least 1 char): Tain Tain - searched prefix in 0.000017 sec suggest[0] : Tain suggest[1] : Tain suggest[2] : Tain suggest[3] : Tain suggest[4] : Tain suggest[5] : Tain suggest[6] : Tain suggest[7] : Tain suggest[8] : Tain suggest[9] : Tain suggest[10] : Tain suggest[11] : Tain ``` * 參考 [st9007a](https://hackmd.io/s/SkeKQEq3Z#修正-test-refc-part-2) 的共筆在 loading words 前預先配置一塊空間供存放，優點是一次分配所需的記憶體，但由於不知道 word 的大小給每個 word 都配置 WRDMAX=256 的空間，所以有些空間被浪費掉了 :::info 1. 可考慮用 [circular buffer](https://en.wikipedia.org/wiki/Circular_buffer) 2. [Swoole](https://github.com/swoole/swoole-src) 的 memory pool 實作探討: [Swoole源码学习记录](https://github.com/LinkedDestiny/swoole-src-analysis/blob/master/03.%E4%B8%89%E7%A7%8DMemoryPool(%E4%B8%8B).md) :notes: jserv ::: ```C enum { INS, DEL, WRDMAX = 256, STKMAX = 512, LMAX = 1024, CITYMAX = 260000 }; char (*city_space)[WRDMAX] = malloc(sizeof *city_space * CITYMAX); ``` * loading words 時改存入剛剛配置好的空間 ```C while ((rtn = fscanf(fp, "%s", city_space[idx])) != EOF) { char *p = city_space[idx]; /* FIXME: insert reference to each string */ if (!tst_ins_del(&root, &p, INS, REF)) { fprintf(stderr, "error: memory exhausted, tst_insert.\n"); fclose(fp); return 1; } idx++; } ``` * 修正 add word 時改存到 city_space ```C case 'a': printf("enter word to add: "); if (!fgets(city_space[idx], sizeof city_space[idx], stdin)) { fprintf(stderr, "error: insufficient input.\n"); break; } rmcrlf(city_space[idx]); p = city_space[idx]; ... ``` * 離開時釋放剛剛配置的空間 ```C case 'q': tst_free(root); free(city_space); return 0; break; ``` ### 效能評比 * 修改 Makefile 以執行 bench 測試，參考 [HMKRL](https://hackmd.io/s/BkdddNs3-#效能測試) 搭配 chrt, taskset 做測試 ``` bench: echo 3 | sudo tee /proc/sys/vm/drop_caches perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ sudo chrt -f 99 taskset -c 0 ./test_cpy --bench echo 3 | sudo tee /proc/sys/vm/drop_caches perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ sudo chrt -f 99 taskset -c 0 ./test_ref --bench ``` * 在 test_cpy 和 test_ref 加入接受 --bench 的執行參數，並將loading words 和 prefix searching 測試的結果分別存進 txt 中 ```C if (argc == 2 && strcmp(argv[1], "--bench") == 0) { /* build loading time txt */ fp = fopen (BENCH_LOAD_OUT,"a+"); if (!fp) { fprintf(stderr, "error: file open failed in '%s'.\n", BENCH_LOAD_OUT); return 1; } fprintf(fp, "%.6f\n",t2 - t1); fclose(fp); /* build searching time txt */ int r = bench_test(root, BENCH_SEARCH_OUT, sgl, &sidx, LMAX); tst_free_all(root); return r; } ``` * 直接將原本 loading words 的時間存入，然後進到 bench.c 中做 prefix searching，搜尋的方式從 A-Z, a-z ```C int bench_test(const tst_node *root, char *search_out, char **a, int *n, const int max) { char word[] = "A"; int i; double t1, t2, search_time; FILE *fp = fopen (search_out, "a+"); if (!fp) { fprintf(stderr, "error: file open failed in '%s'.\n", search_out); return 1; } for (i = 0; i < 25; i++) { t1 = tvgetf(); tst_search_prefix(root, word, a, n, max); t2 = tvgetf(); search_time += t2 - t1; word[0]++; } word[0] = 'a'; for (i = 0; i < 25; i++) { t1 = tvgetf(); tst_search_prefix(root, word, a, n, max); t2 = tvgetf(); search_time += t2 - t1; word[0]++; } fprintf(fp, "%.6f\n",search_time); fclose(fp); return 0; } ``` * 搜尋 word 的方式參考 [zhanyangch](https://hackmd.io/s/Syc8yOhhW) ，只不過我把要寫入的 search.txt 傳入 bench.c 與 loading 的做區隔 ``` Performance counter stats for 'sudo chrt -f 99 taskset -c 0 ./test_cpy --bench' (100 runs): 137 cache-misses # 24.244 % of all cache refs ( +- 11.26% ) 567 cache-references ( +- 7.32% ) 1,852 instructions # 0.40 insn per cycle ( +- 8.18% ) 4,614 cycles ( +- 10.57% ) 0.207701012 seconds time elapsed ( +- 3.43% ) Performance counter stats for 'sudo chrt -f 99 taskset -c 0 ./test_ref --bench' (100 runs): 110 cache-misses # 21.932 % of all cache refs ( +- 2.52% ) 500 cache-references ( +- 0.76% ) 1,676 instructions # 0.45 insn per cycle 3,743 cycles ( +- 2.19% ) 0.225630593 seconds time elapsed ( +- 3.05% ) ``` * 測試結果看到執行時間 CPY 比 REF 短，但 cache miss 較 REF 多，但這包含了建立 TST 和搜尋的時間，應該要分別獨立來看 * 建立 TST 的時間 ![](https://i.imgur.com/CKu6opR.png) * Prefix searching 的時間 ![](https://i.imgur.com/AGkLtAA.png) * 求平均和標準差我用了 gnuplot 裡的 [fit command](http://gnuplot.sourceforge.net/docs_4.2/node82.html) 利用原本的資料建立 data set，利用 fit 回傳 mean，使用 fit 時同時會設置兩個變數，一個為離均差平方和 FIT_WSSR，另一個是自由度 (degrees of freedom) FIT_NDF，所以可以用 sqrt(FIT_WSSR / (FIT_NDF + 1 )) 求標準差，因為自由度比樣本數少一 * 建立 TST 時 CPY 會因字串長度分配記憶體，而記憶體通常是連續的，REF 先分配一塊足夠大的空間，每次分配的時候要去找位址，每個字的位置是不連續的，會多花點時間 * 搜尋的時候差距就更大了，同樣是因為空間分配的原因，記憶體配置的問題之後可以試著使用 Circular Buffer 改善 ### forward declaration * 使用 forward declaration 的好處是可以減少不必要的 #include，減少重新編譯的時間，但是後面的宣告需採用指標的方式，還有個限制是如果需用到 struct 的大小、成員便不可用此方法 * 應用在 phone 上，可以將 entry 使用 forward declaration 建立一個共同的 *.h，在實作上可以分別對不同的方法做 declaration ### Circular buffer 測試 * 參考[環形緩衝區](https://zh.wikipedia.org/wiki/%E7%92%B0%E5%BD%A2%E7%B7%A9%E8%A1%9D%E5%8D%80)自己建立了一個有部份功能的 circular buffer * Circular buffer 是一種用於表示一個固定尺寸、投尾相連的緩衝區的資料結構，適合快取資料流 * 一般含 4 個指標，長度、開始位置、結尾位置以及實際的 buffer * 先進先出，當一個資料元素被用掉後，其餘的資料元素不需要移動其儲存位置，假如 buffer滿了，從最開始的地方開始覆蓋 ```C // Circular buffer object typedef struct { int size; // maximum number of elements int start; // index of oldest element unsigned int end; // index at which to write new element char *elems; // vector of elements } CircularBuffer; void cbInit(CircularBuffer *cb, int size) { cb->size = size; cb->start = 0; cb->end = 0; cb->elems = (char *)calloc(cb->size, sizeof(char)); } ``` * 應用在 REF 之中也只是拿來做資料儲存用，跟 memory pool 很像，如果在此案例分配不足的空間 load words，將會覆蓋原本的儲存空間在後續 prefix searching 的處理上會產生錯誤 ( prefix searching 讀到後面覆蓋的資料 )，所以我還是讓空間比要寫入的資料大 * 測試 Circular buffer 的性質我在分配空間後指向 CircularBufferSize 一半的位置，start, end 也移至相應的位置開始存放資料 ```C /* start store from the middle */ buffer.start = CircularBufferSize >> 1; buffer.end = CircularBufferSize >> 1; buffer_head += (CircularBufferSize >> 1); while ((rtn = fscanf(fp, "%s", buffer_head)) != EOF) { char *p = buffer_head; /* FIXME: insert reference to each string */ if (!tst_ins_del(&root, &p, INS, REF)) { fprintf(stderr, "error: memory exhausted, tst_insert.\n"); fclose(fp); return 1; } idx++; int word_len = strlen(buffer_head) + 1; if( WRDMAX + buffer.end >= CircularBufferSize) { buffer_head -= buffer.end; buffer.end = 0; } else { buffer.end = word_len + buffer.end; buffer_head += word_len; } } ``` * 當空間存到不足以放 WRDMAX 時，則跳至空間起始開始存放，由 GDB 觀看 loading 完後的 buffer 狀況 ```C Breakpoint 1,... (gdb) p buffer.start $1 = 1500000 (gdb) p buffer.end $2 = 679425 (gdb) p buffer.elems+1500000 $3 = 0x7ffff789e370 "Shanghai," (gdb) p buffer.elems+2999750 $4 = 0x7ffff7a0c5d6 "Belaya" (gdb) p buffer.elems $5 = 0x7ffff7730010 "Gora," ``` * 我設置的 CircularBufferSize = 3000000，所以 cities 的第一筆 Shanghai 存在中間 buffer.elems+1500000，而 Belaya 之後的空間不足以放 WRDMAX 所以下一筆 Gora, 存在空間起始位置 * 因為在此案例只是做儲存，所以 Circular buffer 部份功能沒使用到，也應該說在此案只需使用 memory pool 即可達到連續分配空間的效果 #### **效能比較** ```C Performance counter stats for 'sudo chrt -f 99 taskset -c 0 ./test_cpy --bench' (100 runs): 116 cache-misses # 22.543 % of all cache refs ( +- 3.40% ) 513 cache-references ( +- 1.56% ) 1,688 instructions # 0.42 insn per cycle ( +- 0.72% ) 4,036 cycles ( +- 3.10% ) 0.211468978 seconds time elapsed ( +- 3.76% ) Performance counter stats for 'sudo chrt -f 99 taskset -c 0 ./test_ref --bench' (100 runs): 126 cache-misses # 23.132 % of all cache refs ( +- 8.87% ) 543 cache-references ( +- 5.52% ) 1,830 instructions # 0.41 insn per cycle ( +- 8.40% ) 4,471 cycles ( +- 8.84% ) 0.221327696 seconds time elapsed ( +- 5.16% ) ``` * 跟原本使用 2D array 分別與 CPY 的差異在 cache miss 反而上升了，執行時間差異不大 ![](https://i.imgur.com/LFUumy4.png) ![](https://i.imgur.com/g8Aaxmz.png) * 在 loading words 時 REF 的速度比 CPY 快了，因從不連續的空間儲存改成使用連續的空間會比 strdup 快一些， prefix searching 的差異跟原本的方法差不多，時間略為下降 ### 參考資料 * [5.2 TRIES-ternary search tries](http://algs4.cs.princeton.edu/lectures/52Tries.pdf) > 不要只貼網址，要把參照的文件/論文標題寫出來 > [name="jserv"][color=red] > 已修正 > [name="yusihlin"] * [fopen - C++ Reference - Cplusplus.com](http://www.cplusplus.com/reference/cstdio/fopen/) * [Basic statistics with gnuplot](http://www.phyast.pitt.edu/~zov1/gnuplot/html/statistics.html) * [Statistical overview - Gnuplot](http://gnuplot.sourceforge.net/docs_4.2/node86.html) * [Forward Declarations](http://coherence.pixnet.net/blog/post/47717126-forward-declarations)