2018q3 Homework3 (dict)

# 2018q3 Homework3 (dict) ###### tags: `System-Software-2018` contributed by < `DyslexiaS` > [E02: dict](https://hackmd.io/s/B1Bq5Jat7) ## 開發環境 ``` Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 61 Model name: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz Stepping: 4 CPU MHz: 1536.107 CPU max MHz: 2700.0000 CPU min MHz: 500.0000 BogoMIPS: 4389.96 Virtualisation: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K NUMA node0 CPU(s): 0-3 ``` ## 程式碼分析差異：`test_ref.c` 和 `test_cpy.c` 一開始在執行 `tst_ins_del` 之前： CPY使用 memory pool 的方式，再一開始就先 malloc 一大區塊的記憶體空間，之後就切一小塊使用 ### `tst_ins_del` 一開始看到 `tst_cpy.c` 發現第二個參數為 `char *const *s` ，由於 `const` 修飾前面的 `char*`，也就是對 `s` 所指向的內容，`*s` (檔案讀近來的字串)進行限定。而對 `*s` 來說，由於不能通過` *s= ...` 來進行另外賦值，因此可將 type 改成`char *const s` -> 參數也要從 `&p` 改成 `p` ```clike while ((rtn = fscanf(fp, "%s", word)) != EOF) { char *p = word; if (!tst_ins_del(&root, &p, INS, CPY)) ... } ``` 但後來發現傳進 function 的 `p` 在 `tst.c` 中僅使用到該位置的值，因此根本不需要在一開始將 word assign 給 `p` ，再傳入 `p`，可以將 `test_cpy.c` 和 `test_ref.c` 中多餘的 code 刪除 ### test_ref segmentation fault 執行 ```shell ./test_ref --bench Segmentation fault (core dumped) ``` 尋找原因：發現 `test_ref.c` 根本沒有考慮 `argc==2` 情況！！還在那邊腦補好久 :crying_cat_face: 補上遺失的 [code](https://github.com/DyslexiaS/dict/commit/f28fb6254df58e2b42403b3d52ef0a4af192355f#diff-6bcd8b219b110a587a128ba93c146482)：(記得要使用 `tst_free(root)`) ```clike if (argc == 2 && strcmp(argv[1], "--bench") == 0) { int stat = bench_test(root, BENCH_TEST_FILE, LMAX); tst_free(root); return stat; } ``` ### prefix cpy ```clike /* bench.c */ strncpy(prefix, word, 3); tst_search_prefix(root, prefix, sgl, &sidx, max); ``` prefix copy word 前三個 char，但卻沒有在後面加上 `\0` 的符號，在做搜尋的時候，`*start` 會一直往下找下去，造成 `nchr` 也不是 3 ```clike /* tst.c */ void *tst_search_prefix(const tst_node *root, const char *s, char **a, int *n, const int max) { const char *start = s; ... /* get length of s */ for (; *start; start++) ; /* wait */ const size_t nchr = start - s; ... } ``` 因此，把 `prefix` 的大小設成 4 ,用前 3 位，第 4 為自然就會是 '\0' ### 兩者都加入 Bloom filter 將 `test_cpy.c` 加上 [Bloom filter](https://github.com/DyslexiaS/dict/commit/ce158e6477b1f6fab687b5fdcbd8924a8708da7a) 的實做，以便做兩種方法的效能比較 ### 效能比較 **修改Makefile** ```shell /* Makefile */ bench_all: $(TESTS) @for test in $(TESTS); do\ ./$$test --bench ; \ done ``` **make plot:** ```shell plot: $(TESTS) gnuplot scripts/runtimept.gp eog runtime2.png & ``` ![](https://i.imgur.com/ait0XLX.png) `CPY` 明顯在搜尋部份比 `REF` 效能更好，和一開始猜測的結果有出路。以下用 perf 來實證 ```c /* Makefile */ bench_perf: $(TESTS) @for test in $(TESTS); do\ perf stat \ -e cache-misses,cache-references,instructions,cycles \ ./$$test --bench ; \ done ``` ```shell Performance counter stats for './test_cpy --bench': 1,359,539,493 cache-misses # 76.724 % of all cache refs 1,771,994,791 cache-references 49,822,032,050 instructions # 0.83 insn per cycle 60,056,929,114 cycles 24.275522922 seconds time elapsed ternary_tree, loaded 259112 words in 0.246672 sec ``` ```shell Performance counter stats for './test_ref --bench': 1,536,667,544 cache-misses # 79.805 % of all cache refs 1,925,526,536 cache-references 49,893,918,811 instructions # 0.68 insn per cycle 72,910,864,842 cycles 29.475014232 seconds time elapsed ``` `REF` 的 cache miss 和 cache-references 都比 `CPY` 來的多，原以為 `REF` 以 memory pool 的方式實做建出 tree，會減少 cache miss ，顯然跟想像中的完全相反。詳細的原因，可以看 [siahuat0727](https://hackmd.io/s/HJElcrA9Q#Perf-%E5%88%86%E6%9E%90) 的開發紀錄，裡面有很詳細的分析過程 ### `calculate.c` 與 `runtime.gp` ## [Linux 效能分析工具: Perf](https://hackmd.io/s/B11109rdg) ### 使用權限全開 ```shell $ sudo sh -c " echo -1 >/proc/sys/kernel/perf_event_paranoid" ``` 背景執行 ```shell $ ./example & ``` 取得一個 pid 值，查看 ```shell $ perf top -p $pid ``` ### 效能分析 ## Reference [Makefile 語法和示範](https://hackmd.io/s/SySTMXPvl) [Linux 製圖工具: gnuplot](https://hackmd.io/s/Skwp-alOg) [siahuat0727 github](https://hackmd.io/ClIwCm59Ttya3_FPwez2IQ?view#2018q3-Homework3-dict)

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.