# 2020q3 Homework3 (dict)
contributed by < `guaneec` >
###### tags: `sysprog2020`

Quick links:
* [Problem description](https://hackmd.io/@sysprog/2020-dict)
* [Submissions](https://hackmd.io/@sysprog/2020-homework3)
* [2019q3](https://hackmd.io/@sysprog/rkw3RV0YS)
* [2018q3](https://hackmd.io/@sysprog/SkJbKd1c7?type=view)
* [GitHub repo](https://github.com/guaneec/sysprog-2020q3-dict)

## Visualizing the performance of TST with BF

The input is taken from `cities.txt`, with some slight modifications:
* Full match: no change (e.g. `Taiwan`)
* Almost match: last character set to `'#'` (e.g. `Taiwa#`)
* No match: first character set to `'#'` (e.g. `#aiwan`)

:::danger
`E.g.` stands for exempli gratia and means "for example."
:notes: jserv
:::

### Effect of query string length

![](https://i.imgur.com/iFoGXMN.png)
`make img/l-full.png`

![](https://i.imgur.com/x8MbL9a.png)
`make img/l-almost.png`

![](https://i.imgur.com/Y9cr8HM.png)
`make img/l-none.png`

Comment: the Bloom filter helps most when the query string has a long matching prefix.

### Effect of number of total items

![](https://i.imgur.com/1PSeNeV.png)
`make img/n-full.png`

![](https://i.imgur.com/HNsqeD0.png)
`make img/n-almost.png`

![](https://i.imgur.com/CAGf412.png)
`make img/n-none.png`

Comment: the curve looks roughly exponential.

:::warning
:question: The spike at 44k items is present in all three graphs. I tried shuffling the input dictionary and the spike remained. Why?
:::

## Analysis with `perf`

```shell
$ make test > /dev/null

 Performance counter stats for './test_common --bench CPY s Tai' (100 runs):

         94,2328      cache-misses       #  36.565 % of all cache refs  ( +- 0.57% )
        257,7141      cache-references                                  ( +- 0.22% )
     5,8474,0029      instructions       #   1.05  insn per cycle      ( +- 0.02% )
     5,5861,7286      cycles                                            ( +- 0.16% )

        0.143788 +- 0.000449 seconds time elapsed  ( +- 0.31% )

 Performance counter stats for './test_common --bench REF s Tai' (100 runs):

         96,3867      cache-misses       #  37.434 % of all cache refs  ( +- 0.77% )
        257,4858      cache-references                                  ( +- 0.23% )
     5,5054,2701      instructions       #   0.99  insn per cycle      ( +- 0.00% )
     5,5686,3078      cycles                                            ( +- 0.23% )

        0.143091 +- 0.000341 seconds time elapsed  ( +- 0.24% )
```

```shell
$ make bench
COPY mechanism
test_common => choice: find words matching prefix (at least 1 char): Tai - searched prefix in 0.000202 sec
REFERENCE mechanism
test_common => choice: find words matching prefix (at least 1 char): Tai - searched prefix in 0.000235 sec
```

REF is slightly slower than CPY; the higher cache-miss count may be the reason.

![](https://i.imgur.com/4SGdo4V.png)
![](https://i.imgur.com/1jHAhbu.png)

REF is faster for loading.

### CPY vs REF

#### f: finding words

The timings given by `CLOCK_REALTIME` are very noisy on my machine:

![](https://i.imgur.com/hzRegxp.png)

`CLOCK_MONOTONIC` gives a smoother result:

![](https://i.imgur.com/zXqCZ2a.png)

This shows that the measurements provided by `test_common` are too coarse to be meaningful. The difference between CPY and REF above is most likely random fluctuation; here's a figure of 16 repeated trials:

![](https://i.imgur.com/o749A7L.png)

CPY is faster for queries, while REF is faster for insertion. For our use case (looking up words in a dictionary), most operations are queries, so CPY is the better choice here.
```
$ echo 'q' | /usr/bin/time -v ./test_common CPY 2>&1 | grep 'Maximum resident'
	Maximum resident set size (kbytes): 30120
$ echo 'q' | /usr/bin/time -v ./test_common REF 2>&1 | grep 'Maximum resident'
	Maximum resident set size (kbytes): 29360
```

:::warning
**Aside**
Q: Why does `./test_common REF` use ~30MB of memory when `pool = (char *) malloc(poolsize * sizeof(char));` allocates ~50MB?
* [Does mmap or malloc allocate RAM?](https://stackoverflow.com/questions/21119617/does-mmap-or-malloc-allocate-ram)
:::

#### `s Tai`: REF is slightly slower than CPY

I logged the pointers dereferenced in `tst_suggest`.

REF:
```
...
ptr: 0x56408c23c550: tst_node*
ptr: 0x56408c23c580: tst_node*
ptr: 0x56408c23c5b0: tst_node*
ptr: 0x56408c23c5e0: tst_node*
ptr: 0x7f65c7a15229: char*
ptr: 0x56408c24d980: tst_node*
ptr: 0x56408c25ab20: tst_node*
ptr: 0x56408d5c95a0: tst_node*
...
```

CPY:
```
ptr: 0x55a4bec5ac50: tst_node*
ptr: 0x55a4bec5ac80: tst_node*
ptr: 0x55a4bec5acb0: tst_node*
ptr: 0x55a4bec5ace0: char*
ptr: 0x55a4bec6dab0: tst_node*
ptr: 0x55a4bec7c020: tst_node*
ptr: 0x55a4c0202850: tst_node*
ptr: 0x55a4c0202880: tst_node*
```

With REF, the string's address is far away from the nodes (it lives in the memory pool). Dereferencing it causes cache misses and degrades performance. No such difference shows up with `f` (`tst_search`), because `tst_search` only looks at the key stored inside each `tst_node` and never dereferences a pointer to a string.

### `tst_search_prefix`, improved

But why do we need to dereference the string at all? It seems to be a mistake in the implementation. In [this commit](https://github.com/guaneec/sysprog-2020q3-dict/commit/7b6d34e90ce645de6c0943bd2691b66e58e848ca), I removed the check that dereferences the string. It still works, and it's faster.

Original:
![](https://i.imgur.com/2ahM4xQ.png)

New:
![](https://i.imgur.com/gI7DVJ0.png)
`make img/stai.png`

The cache problem is gone: there is no longer a difference between CPY and REF, and the overall run time is also lower.
## The Xor Filter

I used [the implementation provided by the original authors](https://github.com/FastFilter/xor_singleheader).

The xor filter requires that all hashed values from the original set be unique. It turns out that the two hash functions found in `bloom.c`, `djb2` and `jenkins`, produce hash collisions on this data set, so I switched to an alternative hash function, MurmurHash.

![](https://i.imgur.com/YJbV9iT.png)

Xor is slightly slower than Bloom when finding.

![](https://i.imgur.com/9DVoLyy.png)

Xor is slower when initializing.