# 2020q3 Homework3 (dict)
contributed by < `guaneec` >
###### tags: `sysprog2020`
Quick links:
* [Problem description](https://hackmd.io/@sysprog/2020-dict)
* [Submissions](https://hackmd.io/@sysprog/2020-homework3)
* [2019q3](https://hackmd.io/@sysprog/rkw3RV0YS)
* [2018q3](https://hackmd.io/@sysprog/SkJbKd1c7?type=view)
* [Github repo](https://github.com/guaneec/sysprog-2020q3-dict)
## Visualizing the performance of TST with BF
The input is taken from `cities.txt`, with some slight modifications:
* Full match: no change (e.g. `Taiwan`)
* Almost match: last character set to `'#'` (e.g. `Taiwa#`)
* No match: first character set to `'#'` (e.g. `#aiwan`)
:::danger
`E.g.` stands for exempli gratia and means "for example."
:notes: jserv
:::
### Effect of query string length
![](https://i.imgur.com/iFoGXMN.png)
`make img/l-full.png`
![](https://i.imgur.com/x8MbL9a.png)
`make img/l-almost.png`
![](https://i.imgur.com/Y9cr8HM.png)
`make img/l-none.png`
Comment:
The Bloom filter helps most when the query string has a long matching prefix. An almost-match like `Taiwa#` forces the TST to walk the whole shared prefix before failing, while the filter rejects it up front; a first-character mismatch like `#aiwan` already fails after one TST comparison, so the filter saves little there.
### Effect of number of total items
![](https://i.imgur.com/1PSeNeV.png)
`make img/n-full.png`
![](https://i.imgur.com/HNsqeD0.png)
`make img/n-almost.png`
![](https://i.imgur.com/CAGf412.png)
`make img/n-none.png`
Comment: the search time grows with the number of items in an exponential-like curve.
:::warning
:question: The spike at 44k items is present in all three graphs. I tried shuffling the input dictionary and the spike remained. Why?
:::
## Analysis with `perf`
```shell
$ make test > /dev/null
Performance counter stats for './test_common --bench CPY s Tai' (100 runs):
942,328 cache-misses # 36.565 % of all cache refs ( +- 0.57% )
2,577,141 cache-references ( +- 0.22% )
584,740,029 instructions # 1.05 insn per cycle ( +- 0.02% )
558,617,286 cycles ( +- 0.16% )
0.143788 +- 0.000449 seconds time elapsed ( +- 0.31% )
Performance counter stats for './test_common --bench REF s Tai' (100 runs):
963,867 cache-misses # 37.434 % of all cache refs ( +- 0.77% )
2,574,858 cache-references ( +- 0.23% )
550,542,701 instructions # 0.99 insn per cycle ( +- 0.00% )
556,863,078 cycles ( +- 0.23% )
0.143091 +- 0.000341 seconds time elapsed ( +- 0.24% )
```
```shell
$ make bench
COPY mechanism
test_common => choice: find words matching prefix (at least 1 char): Tai - searched prefix in 0.000202 sec
REFERENCE mechanism
test_common => choice: find words matching prefix (at least 1 char): Tai - searched prefix in 0.000235 sec
```
REF is slightly slower than CPY; the higher cache-miss count is a likely cause.
![](https://i.imgur.com/4SGdo4V.png)
![](https://i.imgur.com/1jHAhbu.png)
REF is faster for loading.
### CPY vs REF
#### f: finding words
The timings given by `CLOCK_REALTIME` are very noisy on my machine:
![](https://i.imgur.com/hzRegxp.png)
`CLOCK_MONOTONIC` gives a smoother result
![](https://i.imgur.com/zXqCZ2a.png)
This shows that the measurements provided by `test_common` are too coarse to be meaningful.
Incidentally, the difference above is most likely random fluctuation. Here's a figure of 16 repeated trials:
![](https://i.imgur.com/o749A7L.png)
CPY is faster for query, REF is faster for insertion.
For our use case (looking up words in a dictionary), most of the operations would be querying. Therefore in this case, CPY is a better choice.
```
$ echo 'q' | /usr/bin/time -v ./test_common CPY 2>&1 | grep 'Maximum resident'
Maximum resident set size (kbytes): 30120
$ echo 'q' | /usr/bin/time -v ./test_common REF 2>&1 | grep 'Maximum resident'
Maximum resident set size (kbytes): 29360
```
:::warning
**Aside**
Q: Why does `./test_common REF` use ~30MB of memory when `pool = (char *) malloc(poolsize * sizeof(char));` allocates ~50MB?
* [Does mmap or malloc allocate RAM?](https://stackoverflow.com/questions/21119617/does-mmap-or-malloc-allocate-ram)
:::
`s Tai`: REF is slightly slower than CPY
I logged the pointers dereferenced in `tst_suggest`.
REF:
```
...
ptr: 0x56408c23c550: tst_node*
ptr: 0x56408c23c580: tst_node*
ptr: 0x56408c23c5b0: tst_node*
ptr: 0x56408c23c5e0: tst_node*
ptr: 0x7f65c7a15229: char*
ptr: 0x56408c24d980: tst_node*
ptr: 0x56408c25ab20: tst_node*
ptr: 0x56408d5c95a0: tst_node*
...
```
CPY:
```
ptr: 0x55a4bec5ac50: tst_node*
ptr: 0x55a4bec5ac80: tst_node*
ptr: 0x55a4bec5acb0: tst_node*
ptr: 0x55a4bec5ace0: char*
ptr: 0x55a4bec6dab0: tst_node*
ptr: 0x55a4bec7c020: tst_node*
ptr: 0x55a4c0202850: tst_node*
ptr: 0x55a4c0202880: tst_node*
```
With REF, the string lives far from the nodes (it's in the memory pool), so dereferencing it causes extra cache misses and degrades performance.
No such difference shows up with `f` (`tst_search`), because `tst_search` only compares the one-character key stored inside each `tst_node` and never dereferences a pointer to a string.
### `tst_search_prefix`, improved
But why do we need to dereference the string anyway? It seems like a mistake in the implementation.
In [this commit](https://github.com/guaneec/sysprog-2020q3-dict/commit/7b6d34e90ce645de6c0943bd2691b66e58e848ca), I removed the check that dereferences the string. The results are unchanged, and the search is faster.
Original:
![](https://i.imgur.com/2ahM4xQ.png)
New:
![](https://i.imgur.com/gI7DVJ0.png)
`make img/stai.png`
The cache problem is gone. No more difference between CPY and REF. The overall run time is also lower.
## The Xor Filter
I used [the implementation provided by the original authors](https://github.com/FastFilter/xor_singleheader).
The xor filter requires that all hashed values from the original set be unique. It turns out that the two hash functions `djb2` and `jenkins` found in `bloom.c` produce hash collisions, so I switched to an alternative hash function, murmur.
![](https://i.imgur.com/YJbV9iT.png)
Xor is slightly slower than Bloom when finding.
![](https://i.imgur.com/9DVoLyy.png)
Xor is slower when initializing.