2020q3 Homework3 (dict)

# 2020q3 Homework3 (dict) contributed by < `kuouu` > ###### tags: `2020進階電腦系統理論與實作` ## HW requirement 1. Fork dict on GitHub, and visualize ternary search tree + bloom filter's performance and analyze it. - you need to design experiments to allow different string input, and study the TST response statistic distribution - consider prefix search property and vocabulary coverage - use `gnuplot` to visualize the result 2. use `perf` analyze TST program runtime and discuss your insight 3. consider CPY and REF implementation, give the explaination of the performance gap, and propose the possible improve method 4. study ... ## Enviornment Setup - macOS connot install `perf` and `gnuplot` - but maybe can use built in application **instrument** - linux image in docker can use `linux-tool-generic`, but some infomation cannot access due to [access problem](https://www.swift.org/server/guides/linux-perf.html) - use [lima](https://github.com/lima-vm/lima) setup linux vm env ### tool installation `perf`: ``` sudo apt install linux-tools-common linux-tools-generic sudo reboot ``` `gnuplot`: ``` sudo apt install gnuplot ``` ## Analyze `dict` codebase ### common usage **interactive mode (argc == 2)** ```shell ./test_common CPY // copy mechanism ./test_common REF // reference mechanism ./test_common anyRandomWord // default is set to REF ``` **bench mode (argc == 3)** this mode is use for ploting the result, and the data will be set to `bench_ref.txt`, which is defined by `BENCH_TEST_FILE` ```shell ./test_common --bench CPY ``` **direct mode (argc > 3)** ```shell ./test_common --bench CPY s Tai // search Tai and terminate program ``` ### make **make test** 1. clean cache 2. run direct mode for 100 times and use perf to measure the result **make bench** run direct mode and grep only result text **make plot** same as `make test` but extract data and write to csv file ### script **`runtime.gp`**: output.txt -> runtime.png **`runtime3.gp`**: bench_cpy.txt -> runtime3.png **`runtimebox.gp`**: bench_ref.txt bench_cpy.txt **`runtimept.gp`**: bench_cpy.txt -> runtime2.png ## Analyze and visualize the performance of a TST + bloom filter ### Experiment Design - (find) random string - (find) length: short, medium, long - (search) length: short, medium, long - ... ### Result Visualization #### 1. Ternary Tree Loaded Time Comparison ![default-test](https://hackmd.io/_uploads/HJQYOBEUp.png) #### 2. Search Time Comparison ![search-time-1](https://hackmd.io/_uploads/SkjR5S4Ua.png) command: `s Tai` ![search-time-2](https://hackmd.io/_uploads/rJPVsrVUT.png) command: `s Ta` ![search-time-3](https://hackmd.io/_uploads/HJjOoHVLa.png) command: `s T` #### 3. Find Time Comparison ![find-time](https://hackmd.io/_uploads/B1TLfU4LT.png) command: `s Taiwan` I found it very difficult to test this functionality because the search time is to short. So I switch to a larger dataset [english-words](https://github.com/dwyl/english-words) with 479k English words. ### Performance