2019q3 Homework5 (dict)

# 2019q3 Homework5 (dict) contributed by < `YenHengLin` > ## 預備知識 ### bloom filter 的 hash function (djb2) djb2 function 其形式跟 [LCG(線性同餘方法)](https://zh.wikipedia.org/wiki/%E7%B7%9A%E6%80%A7%E5%90%8C%E9%A4%98%E6%96%B9%E6%B3%95)相同都為 $N_{j+1} \equiv ( N_j*A + B ) mod\ M$ ，將 `hash = ((hash << 5) + hash) + c` 轉換一下就能得到 $hash = hash * 33 + c$ ，帶入上一個函式 A 為 33， B 為 character ，至於 M 由於是在 2 進位系統， unsigned int 只能儲存到 32位的數字，所以 M 就為 $2^{32}$ 。 LCG 的最大週期為 M 但是大部分情況都會小於 M，為了要達到最大週期，A B M 就必須要特殊選取過 * B，M 互質 * M 的所有質因數都能整除 A-1 * 若 M 是 4 的倍數， A-1 也是 * A，B 是正整數回到 djb2 現在我們可以知道 5381 和 33 這兩個數字是怎麼選出來的 * 5381 為質數符合 B，M互質，所以將 5381 換成其他質數也能有一樣的效果 * $33-1=32$ ，而 $2^{32}$ 的質因數為 2，符合 M 的所有質因數都能整除 A-1，另外 M 是 4 的倍數，A-1也是。 djb2 透過合理的選擇數字使得 $0\sim2^{32}-1$，都有 1 對 1 對應的數字使得碰撞率不會過高，另外 *33 改成用 left shift 操作，來增加運算的速度 ### LCG 特性證明 * B，M 互質 LCG 的週期 T 等於 M 的時候，B 跟 M 互質，所以利用若 P 則 Q ，非 Q 則非 P 來證明。一開始假設 B 跟 M 不互質，則 B 跟 M 會有共同的因數 d，我們可以將 B 跟 M 寫成 $B= b*d$ ，$M=m*d$ ，則 $N_{j+1} \equiv ( N_j*A + b*d ) mod\ m*d$ 可以改寫成 $N_{j+1} \equiv ( N_j*A + B ) mod\ M$ 將 Nj 初始值帶 0 進去就會發現上式可改寫成 $N_{j+1} \equiv ( N_j*A + b*d ) mod\ m*d$ ### bloom filter 的 hash function (jenkins) * [雪崩性](https://zh.wikipedia.org/wiki/%E9%9B%AA%E5%B4%A9%E6%95%88%E5%BA%94):當輸入的資料發生最微小的變化，一個 bit 發生改變時，輸出的 bit 要有 50% 都發生變化 * jenkins 就符合雪崩性，當一個 bit 改變則輸出會有 16 個bit 都發生改變，符合 50% 都變化的原則(輸出為 unsigned int 有 32 bit) ## 對[隱藏在字典裡的數學](https://hackmd.io/@sysprog/BkE3uSvdN)依樣畫葫蘆 ### 在 makefile 新增 perf-search * code ```cmake perf-search: $(TESTS) @for test in $(TESTS);do \ echo 3 | sudo tee /proc/sys/vm/drop_caches; \ perf stat \ -e cache-misses,cache-references,instructions,cycles \ ./$$test --bench; \ done ``` 執行 `$make perf-search` 顯示 `You may not have permission to collect stats.`，去查 perf_event_paranoid 發現值為 3，屬於不允任何量測也不允許任何 cpu events data 的權限，透過指令 ` $sudo sh -c " echo -1 > /proc/sys/kernel/perf_event_paranoid"` 將權限修該成 -1，也就是權限全開的狀態 * 從新執行 `$make perf-search`，得到了 ``` 3 ternary_tree, loaded 259112 words in 0.211776 sec Performance counter stats for './test_cpy --bench': 383,077,458 cache-misses # 55.469 % of all cache refs 690,617,085 cache-references 49,869,063,023 instructions # 0.56 insn per cycle 89,164,787,279 cycles 26.868986327 seconds time elapsed 26.628128000 seconds user 0.111832000 seconds sys 3 ternary_tree, loaded 259112 words in 0.188451 sec Performance counter stats for './test_ref --bench': 447,661,956 cache-misses # 56.466 % of all cache refs 792,799,520 cache-references 49,846,958,921 instructions # 0.47 insn per cycle 105,727,595,852 cycles 31.934900738 seconds time elapsed 31.548879000 seconds user 0.111847000 seconds sys ``` 發現 ref 的 cache-misses 比 cpy 還要高，所以在一次用 perf 指令 `$perf record -e cache-misses ./test_ref --bench && perf report`，檢查哪個 instruction 導致 miss 率最高 ``` 98.32% test_ref test_ref [.] tst_suggest ◆ 0.22% test_ref test_ref [.] tst_search_prefix ▒ 0.12% test_ref test_ref [.] tst_ins_del ▒ 0.11% test_ref test_ref [.] tst_free ``` 發現 **tst_suggest** 是我們要找的 bottle neck 修改 tst.c 的 suggest 函式 ```c void tst_suggest(...) { ... ... else //if (*(((char *) p->eqkid) + nchr - 1) == c && *n < max) a[(*n)++] = (char *) p->eqkid; tst_suggest(p->hikid, c, nchr, a, n, max); } ``` 將 dereference 的部分註解掉，再次執行 `$make perf-search` ``` 3 ternary_tree, loaded 259112 words in 0.225895 sec Performance counter stats for './test_cpy --bench': 336,505,388 cache-misses # 52.624 % of all cache refs 639,455,417 cache-references 45,931,523,827 instructions # 0.59 insn per cycle 78,327,966,745 cycles 23.668885649 seconds time elapsed 23.472659000 seconds user 0.083859000 seconds sys 3 ternary_tree, loaded 259112 words in 0.289854 sec Performance counter stats for './test_ref --bench': 327,947,273 cache-misses # 51.669 % of all cache refs 634,713,384 cache-references 45,905,355,900 instructions # 0.59 insn per cycle 77,171,338,083 cycles 23.404041016 seconds time elapsed 23.171297000 seconds user 0.071873000 seconds sys ``` 發現 cache-misses 數有明顯的下降，利用 gnuplot 可以清楚的看出執行效能 ref 比 cpy 還要好 ## 參考資料 * [LCG(線性同餘方法)](https://zh.wikipedia.org/wiki/%E7%B7%9A%E6%80%A7%E5%90%8C%E9%A4%98%E6%96%B9%E6%B3%95) * [雪崩性](https://zh.wikipedia.org/wiki/%E9%9B%AA%E5%B4%A9%E6%95%88%E5%BA%94) * [隨機算法線性同餘法的理解](https://zhuanlan.zhihu.com/p/36301602)