# 2020q3 Homework3 (dict)
contributed by < `kuouu` >
###### tags: `2020進階電腦系統理論與實作`
## HW requirement
1. Fork dict on GitHub, and visualize ternary search tree + bloom filter's performance and analyze it.
- you need to design experiments to allow different string input, and study the TST response statistic distribution
- consider prefix search property and vocabulary coverage
- use `gnuplot` to visualize the result
2. use `perf` analyze TST program runtime and discuss your insight
3. consider CPY and REF implementation, give the explaination of the performance gap, and propose the possible improve method
4. study ...
## Enviornment Setup
- macOS connot install `perf` and `gnuplot`
- but maybe can use built in application **instrument**
- linux image in docker can use `linux-tool-generic`, but some infomation cannot access due to [access problem](https://www.swift.org/server/guides/linux-perf.html)
- use [lima](https://github.com/lima-vm/lima) setup linux vm env
### tool installation
`perf`:
```
sudo apt install linux-tools-common linux-tools-generic
sudo reboot
```
`gnuplot`:
```
sudo apt install gnuplot
```
## Analyze `dict` codebase
### common usage
**interactive mode (argc == 2)**
```shell
./test_common CPY // copy mechanism
./test_common REF // reference mechanism
./test_common anyRandomWord // default is set to REF
```
**bench mode (argc == 3)**
this mode is use for ploting the result, and the data will be set to `bench_ref.txt`, which is defined by `BENCH_TEST_FILE`
```shell
./test_common --bench CPY
```
**direct mode (argc > 3)**
```shell
./test_common --bench CPY s Tai // search Tai and terminate program
```
### make
**make test**
1. clean cache
2. run direct mode for 100 times and use perf to measure the result
**make bench**
run direct mode and grep only result text
**make plot**
same as `make test` but extract data and write to csv file
### script
**`runtime.gp`**: output.txt -> runtime.png
**`runtime3.gp`**: bench_cpy.txt -> runtime3.png
**`runtimebox.gp`**: bench_ref.txt bench_cpy.txt
**`runtimept.gp`**: bench_cpy.txt -> runtime2.png
## Analyze and visualize the performance of a TST + bloom filter
### Experiment Design
- (find) random string
- (find) length: short, medium, long
- (search) length: short, medium, long
- ...
### Result Visualization
#### 1. Ternary Tree Loaded Time Comparison
![default-test](https://hackmd.io/_uploads/HJQYOBEUp.png)
#### 2. Search Time Comparison
![search-time-1](https://hackmd.io/_uploads/SkjR5S4Ua.png)
command: `s Tai`
![search-time-2](https://hackmd.io/_uploads/rJPVsrVUT.png)
command: `s Ta`
![search-time-3](https://hackmd.io/_uploads/HJjOoHVLa.png)
command: `s T`
#### 3. Find Time Comparison
![find-time](https://hackmd.io/_uploads/B1TLfU4LT.png)
command: `s Taiwan`
I found it very difficult to test this functionality because the search time is to short. So I switch to a larger dataset [english-words](https://github.com/dwyl/english-words) with 479k English words.
### Performance