2018q3 Homework3 (dict)

# 2018q3 Homework3 (dict) ## 作業目標 * 在 GitHub 上 fork [dict](https://github.com/sysprog21/dict)，研讀並修改 `Makefile` 以允許 `$ make plot` 時，視覺化 ternary search tree + bloom filter 的效能表現。注意，需要設計實驗，允許不同的字串輸入並探討 TST 回應的統計分佈 * 要考慮到 prefix search 的特性還有詞彙的涵蓋率 * 限用 gnuplot 來製圖 * 參閱 [Hash Table Benchmarks](http://incise.org/hash-table-benchmarks.html) * 以 perf 分析 TST 程式碼的執行時期行為並且充分解讀 * 考慮到 CPY 和 REF 這兩種實作機制，思考效能落差的緣故，並且提出後續效能改善機制 * 充分量化並以現代處理器架構的行為解讀 * 注意：這兩者都該加入 Bloom filter，而且分析具體效益 (原有程式存在若干缺陷，也該指出並修正) ## 作業筆記 * 只要在系統輸入Key，便能找到相對應的Value，這就是Dictionary的基本概念。 * Ternary Search Tree 概念上等於 binary tree + trie tree ## 輸出有些問題 ``` allen@Mac:~/dict$ ./test_cpy ternary_tree, loaded 259112 words in 0.149099 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: f find word in tree: Taiwan Taiwan not found. Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: q allen@Mac:~/dict$ ls bench.c bloom.h cities.txt Makefile test_cpy test_ref tst.c bench.h bloom.o cpy.txt README.md test_cpy.c test_ref.c tst.h bloom.c calculate.c LICENSE scripts test_cpy.o test_ref.o tst.o allen@Mac:~/dict$ ./test_ref ternary_tree, loaded 259112 words in 0.190092 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: s find words matching prefix (at least 1 char): Tai Tai - not found Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: ``` ## Bloom Filter [Bloom Filter 影片解說](https://www.youtube.com/watch?v=bEmBh1HtYrw) ### False Positive 機率 False Positive 直白翻譯"假正確"。舉例:程式搜尋資料庫完畢後，告訴我們搜尋的東西存在資料庫裡面，但事實上資料庫裡面不存在搜尋的東西。下面這張圖片有比較好的解釋。 ![](https://i.imgur.com/4nbkY1b.png) * Table 中某個位址經過 all of elements 設定後還是 false 的機率是 ==$(1-\dfrac{1}{m})^{kn}$== * m : Table 大小 * n : elements 數量 * k : hash funtions 數量 * False Positive = all k bits of new element are already set = ==$(1-e^{-\frac{kn}{m}})^k$== * 就是丟進去新的元素發現經過 hash functions 後產生出的位置在 table 中都被設定為 True ，會讓我誤以為原來新的元素已經存在在資料庫，明明不在，卻說存在 -> False Positive ## 20181008 進度 :::info test_ref.c : tst + bloom filter test_cpy.c : tst ::: * test_ref.c 裡面的 bloom filter 是用來新增字串與查詢字串在不在，但不知道實際字串是甚麼( 實際字串就是自己搜尋的，哈哈)。主要還是透過 tst.h 來做 prefix search 與，應該還有更好的結合。 ![](https://i.imgur.com/xwamm9n.jpg) ## 20181009 進度 ### 作業問題 * 考慮到 CPY 和 REF 這兩種實作機制，思考效能落差的緣故，並且提出後續效能改善機制 * 充分量化並以現代處理器架構的行為解讀 * 注意：這兩者都該加入 Bloom filter，而且分析具體效益 (原有程式存在若干缺陷，也該指出並修正) :::warning 作業的第三點如上， REF 是 CPY 加進 bloom filter 功能。不明白的是 Bloom filter 要在加進去 CPY ？這樣不就是 REF ？還是意思是希望我們想如何讓 Bloom filter 有更好的結合與 tst ::: * intel 7700k ``` Architecture: x86_64 CPU 作業模式： 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 每核心執行緒數：2 每通訊端核心數：4 Socket(s): 1 NUMA 節點： 1 供應商識別號： GenuineIntel CPU 家族： 6 型號： 158 Model name: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz 製程： 9 CPU MHz： 800.027 CPU max MHz: 4500.0000 CPU min MHz: 800.0000 BogoMIPS: 8400.00 虛擬： VT-x L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 8192K NUMA node0 CPU(s): 0-7 ``` L1d + L1i + L2 + L3 = 8512 8512/1024 = 8.3125 MB 與 [Intel® Core™ i7-7700K spec 描述一樣](https://ark.intel.com/zh-tw/products/97129/Intel-Core-i7-7700K-Processor-8M-Cache-up-to-4-50-GHz-) `$sudo make test` * `./test_ref performance` ``` Performance counter stats for './test_ref --bench s Tai' (100 runs): 458,0888 cache-misses # 41.055 % of all cache refs ( +- 0.17% ) 1115,7977 cache-references ( +- 0.12% ) 5,9634,0103 instructions # 1.17 insn per cycle ( +- 0.00% ) 5,0979,4107 cycles ( +- 0.11% ) 0.115807915 seconds time elapsed ( +- 0.12% ) ``` * `./test_cpy performance` ``` Performance counter stats for './test_cpy --bench s Tai' (100 runs): 415,1947 cache-misses # 43.241 % of all cache refs ( +- 0.71% ) 960,1796 cache-references ( +- 0.33% ) 5,3562,9405 instructions # 1.22 insn per cycle ( +- 0.00% ) 4,3817,3754 cycles ( +- 0.33% ) 0.100212966 seconds time elapsed ( +- 0.35% ) ``` * cache-misses : 代表 cache 沒辦法提供給 memory 多餘空間的次數，換個想法就是， cpu 在 cache 中找不到想要的東西 * cache-references : 代表 cache 命中 * 藉由 insn per cycle 與 of all cache refs 這兩個指標可以了解到 cpu 執行的指令的速度與 cache 有沒有符合 cpu 的預期給資料 * [GitHub reference](https://stackoverflow.com/questions/12601474/what-are-perf-cache-events-meaning) * [CSDN reference](https://blog.csdn.net/witsmakemen/article/details/17715775) * CSDN 應該已經 ban 掉 140.116.xxx.xxx，學校網路都進不去，可以透過家用網路，或手機都還看的到連結。 ___ * MacBook air ``` Architecture: x86_64 CPU 作業模式： 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數：2 每通訊端核心數：2 Socket(s): 1 NUMA 節點： 1 供應商識別號： GenuineIntel CPU 家族： 6 型號： 61 Model name: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz 製程： 4 CPU MHz： 1385.380 CPU max MHz: 2700.0000 CPU min MHz: 500.0000 BogoMIPS: 3199.98 虛擬： VT-x L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 3072K ``` * `./test_ref performance` ``` Performance counter stats for './test_ref --bench s Tai' (100 runs): 3,426,304 cache-misses # 51.351 % of all cache refs ( +- 0.79% ) 6,672,330 cache-references ( +- 0.55% ) 596,753,595 instructions # 1.09 insn per cycle ( +- 0.00% ) 549,840,002 cycles ( +- 0.59% ) 0.208506766 seconds time elapsed ( +- 0.66% ) ``` * `./test_cpy performance` ``` Performance counter stats for './test_cpy --bench s Tai' (100 runs): 3,210,788 cache-misses # 52.980 % of all cache refs ( +- 0.87% ) 6,060,422 cache-references ( +- 0.65% ) 535,947,375 instructions # 1.10 insn per cycle ( +- 0.01% ) 487,229,023 cycles ( +- 0.64% ) 0.187340189 seconds time elapsed ( +- 0.76% ) ``` 可以發現到在跑這兩個程式時，這兩台電腦的確在 cpu 及 cache 效能有反應出來都是高時脈及大快取的 i7-7700k 跑出來效能較好 ## 作業筆記紀錄 * 有些程式慢是因為計算量太大，其多數時間都應該在使用 CPU 進行計算，這叫做 CPU bound 型；有些程式慢是因為過多的 IO ，這種時候其 CPU 利用率應該不高，這叫做 IO bound 型；對于 CPU bound 程式的調校和 IO bound 的調校是不同的，出問題的方式不一樣的。 * [reference](https://www.ibm.com/developerworks/cn/linux/l-cn-perf1/index.html) * cpy.txt 與 ref.txt 紀錄 load cities.txt 時間理論上兩個時間應該要差不多 * `fopen()` 以附加 `a` 的方式打開只寫文件。若文件不存在，則會創造該文件。如果文件存在，寫入的資料會延續上次的繼續寫，即文件原先的内容會被保留。 20181024 作業紀錄 * 一個 ref 與 cpy 差在 mempry pool 這個，之前在寫其他程式時有看過，但不了解是什麼，路慢慢，路漫漫。 * plot 可以針對者兩個在 prefix 搜尋比較效能，當初想的很簡單，以為只差別在 bloom filter 。但後來應該是兩總實做都要加上去，可好好比較 ref 與 cpy 差別 * Bloom filter 常用再檢查這個垃圾郵件，檢查這個地址是不是有出現在之前的黑名單，等等...。之前有想到一個很笨的困擾我很久的問題，就是 Bloom filter 儲存資料並非實際把資料存進去，有點像是利用編碼，結合 hash table 來找尋是否存在，所以我很好奇那就算知道在，那資料呢？突然恍然大悟，資料就是自己查詢的資料，只是有機率會是錯的，取決你用 table size 還有 hash functin 數量。如果是要做bloom filter 結合 prefix search 就不能。 Reference: * [CPU Cache 原理探討](https://hackmd.io/s/H1U6NgK3Z#) * [Perf 使用](https://hk.saowen.com/a/b6b4110cddc0a8d10a9a5a14a44e9ab3317f88324c133bca50d1fc0ce73afed5) * [Perf 中文說明](https://wizardforcel.gitbooks.io/100-gdb-tips/modify-pc-register.html) ___ `static void rmcrlf(char *s)` ```c= static void rmcrlf(char *s) { size_t len = strlen(s); if (len && s[len - 1] == '\n'){ s[--len] = 0; } ``` `static void rmcrlf(char *s)` 是因為在原本使用 fgets 時候會多包含一個 '\n' ，在這邊做去除。實際為什麼會多一個需要再查詢 `fgets()` 實際怎麼用。 ___ ```= allen@Mac:~/dict$ ./test_ref --bench ternary_tree, loaded 259112 words in 0.185805 sec --bench-success-- 程式記憶體區段錯誤 (core dumped) ``` :::danger 修改 ./test_ref ，讓他可以使用 --bench ，會出現 core dumped 。在 search 與 bench 時，都沒有去新增或減少記憶體，只有查詢而已。而在 ./test_ref 做 prefix search 時，也沒發生這個問題。 ::: ##### 找出問題點開啟 Core Dump 功能 ```= mec@mec-MS-7A63:~/JN_work/embededHW/dict$ ulimit -c unlimited mec@mec-MS-7A63:~/JN_work/embededHW/dict$ sudo ./test_ref --bench bloom filter build success ternary_tree, loaded 259112 words in 0.100380 sec 程式記憶體區段錯誤 (core dumped) mec@mec-MS-7A63:~/JN_work/embededHW/dict$ sudo gdb ./test_ref core ... Core was generated by `./test_ref --bench'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007ffa9a299532 in __GI___libc_free (mem=0x7ffa7b555eea) at malloc.c:2967 2967 malloc.c: 沒有此一檔案或目錄. ``` 不知道錯誤是什麼意思，覺得是應該有存取到不該存取的記憶體位置， free() 到不應該 free() 的地方。 * [Reference](https://hk.saowen.com/a/f8959115092480021e7adf7b9c23f9e6a40e8b611735e06f8a06ddbbd022413d) ___ ![](https://i.imgur.com/0YCatfS.png) * prefix search compare

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.