2018q3 Homework3 (dict)

# 2018q3 Homework3 (dict) contributed by < [osamchiu](https://github.com/osamchiu) > ## Install perf and Testing ```sh $ sudo apt-get install linux-tools-common [sudo] password for samchiu: 正在讀取套件清單... 完成正在重建相依關係正在讀取狀態資料... 完成下列【新】套件將會被安裝： linux-tools-common 升級 0 個，新安裝 1 個，移除 0 個，有 237 個未被升級。需要下載 160 kB 的套件檔。此操作完成之後，會多佔用 303 kB 的磁碟空間。下載:1 http://tw.archive.ubuntu.com/ubuntu bionic-updates/main amd64 linux-tools-common all 4.15.0-36.39 [160 kB] 取得 160 kB 用了 0秒 (1,720 kB/s) 選取了原先未選的套件 linux-tools-common。（讀取資料庫 ... 目前共安裝了 178258 個檔案和目錄。）準備解開 .../linux-tools-common_4.15.0-36.39_all.deb ... 解開 linux-tools-common (4.15.0-36.39) 中... 設定 linux-tools-common (4.15.0-36.39) ... Processing triggers for man-db (2.8.3-2) ... $ perf WARNING: perf not found for kernel 4.15.0-34 You may need to install the following packages for this specific kernel: linux-tools-4.15.0-34-generic linux-cloud-tools-4.15.0-34-generic You may also want to install one of the following packages to keep up to date: linux-tools-generic linux-cloud-tools-generic ``` 補安裝一下上面提到的`linux-tools-4.15.0-34-generic` ```sh sudo apt-get install linux-tools-4.15.0-34-generic 正在讀取套件清單... 完成正在重建相依關係正在讀取狀態資料... 完成下列的額外套件將被安裝： linux-tools-4.15.0-34 下列【新】套件將會被安裝： linux-tools-4.15.0-34 linux-tools-4.15.0-34-generic 升級 0 個，新安裝 2 個，移除 0 個，有 237 個未被升級。需要下載 1,471 kB 的套件檔。此操作完成之後，會多佔用 7,009 kB 的磁碟空間。是否繼續進行 [Y/n]？ [Y/n] Y 下載:1 http://tw.archive.ubuntu.com/ubuntu bionic-updates/main amd64 linux-tools-4.15.0-34 amd64 4.15.0-34.37 [1,469 kB] 下載:2 http://tw.archive.ubuntu.com/ubuntu bionic-updates/main amd64 linux-tools-4.15.0-34-generic amd64 4.15.0-34.37 [1,960 B] 取得 1,471 kB 用了 0秒 (3,099 kB/s) 選取了原先未選的套件 linux-tools-4.15.0-34。（讀取資料庫 ... 目前共安裝了 178312 個檔案和目錄。）準備解開 .../linux-tools-4.15.0-34_4.15.0-34.37_amd64.deb ... 解開 linux-tools-4.15.0-34 (4.15.0-34.37) 中... 選取了原先未選的套件 linux-tools-4.15.0-34-generic。準備解開 .../linux-tools-4.15.0-34-generic_4.15.0-34.37_amd64.deb ... 解開 linux-tools-4.15.0-34-generic (4.15.0-34.37) 中... 設定 linux-tools-4.15.0-34 (4.15.0-34.37) ... 設定 linux-tools-4.15.0-34-generic (4.15.0-34.37) ... ``` 安裝成功用老師的code測試 ``` C #include <stdio.h> #include <unistd.h> double compute_pi_baseline(size_t N) { double pi = 0.0; double dt = 1.0 / N; for (size_t i = 0; i < N; i++) { double x = (double) i / N; pi += dt / (1.0 + x * x); } return pi * 4.0; } int main() { printf("pid: %d\n", getpid()); sleep(10); compute_pi_baseline(50000000); return 0; } ``` 然後發現忘了改權限(==直接sudo vim 還不給改為什麼呢？==) ```sh sudo sh -c " echo 0 > /proc/sys/kernel/perf_event_paranoid" ``` 然後就可以用了 ``` Samples: 2K of event 'cycles:uppp', Event count (approx.): 1260096297 Overhead Shared Object Symbol 100.00% perf_top_example [.] compute_pi_baseline 0.00% libc-2.27.so [.] __nanosleep 0.00% [kernel] [k] native_irq_return_iret ``` ## dict Fork and Testing 拉下來後看一下MAKEFILE ``` TESTS = test_cpy test_ref TEST_DATA = s Tai CFLAGS = -O0 -Wall -Werror -g # Control the build verbosity ifeq ("$(VERBOSE)","1") Q := VECHO = @true else Q := @ VECHO = @printf endif GIT_HOOKS := .git/hooks/applied .PHONY: all clean all: $(GIT_HOOKS) $(TESTS) $(GIT_HOOKS): @scripts/install-git-hooks @echo OBJS_LIB = \ tst.o bloom.o OBJS := \ $(OBJS_LIB) \ test_cpy.o \ test_ref.o deps := $(OBJS:%.o=.%.o.d) test_%: test_%.o $(OBJS_LIB) $(VECHO) " LD\t$@\n" $(Q)$(CC) $(LDFLAGS) -o $@ $^ -lm %.o: %.c $(VECHO) " CC\t$@\n" $(Q)$(CC) -o $@ $(CFLAGS) -c -MMD -MF .$@.d $< test: $(TESTS) echo 3 | sudo tee /proc/sys/vm/drop_caches; perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ ./test_cpy --bench $(TEST_DATA) perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ ./test_ref --bench $(TEST_DATA) bench: $(TESTS) @for test in $(TESTS); do\ ./$$test --bench $(TEST_DATA); \ done clean: $(RM) $(TESTS) $(OBJS) $(RM) $(deps) rm -f bench_cpy.txt bench_ref.txt ref.txt cpy.txt caculate -include $(deps) ``` 大體有test bench clean這三種先make過後執行看看 ``` $ ./test_cpy ternary_tree, loaded 259112 words in 0.134990 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: q $ ./test_ref ternary_tree, loaded 259112 words in 0.162607 sec Commands: a add word to the tree f find word in tree s search words matching prefix d delete word from the tree q quit, freeing all data choice: ``` 看起來沒啥問題開始著手看題目要求 ## 作業本體 ### 作業要求 * 在 GitHub 上 fork [dict](https://github.com/sysprog21/dict)，研讀並修改 `Makefile` 以允許 `$ make plot` 時，視覺化 ternary search tree + bloom filter 的效能表現。注意，需要設計實驗，允許不同的字串輸入並探討 TST 回應的統計分佈 * 要考慮到 prefix search 的特性還有詞彙的涵蓋率 * 限用 gnuplot 來製圖 * 參閱 [Hash Table Benchmarks](http://incise.org/hash-table-benchmarks.html) * 以 perf 分析 TST 程式碼的執行時期行為並且充分解讀 * 考慮到 CPY 和 REF 這兩種實作機制，思考效能落差的緣故，並且提出後續效能改善機制 * 充分量化並以現代處理器架構的行為解讀 * 注意：這兩者都該加入 Bloom filter，而且分析具體效益 (原有程式存在若干缺陷，也該指出並修正) ### 總結上面 - [x] 讓make可以吃plot - [x] 視覺化tst+bf by gnuplot(應該可以參考benchmarks) - [x] 用perf看run time狀況 - [ ] 提出改善（這讓我在想想...） ### 改make 這邊我其實先做了下面那個步驟（設計實驗＋plot）做完之後其實就是照著流程 1. clean 2. make test 3. ./test_cpy --beach 執行產出 beanch_cpy.txt 4. ./test_ref --beach 執行產出 beanch_ref.txt 5. gnuplot scirpt/*.gp 6. eog step.5 images. 把上面這些寫在make plot裡面不過實驗設計我覺得自己是沒作到什麼因為畢竟到這裡只是拿舊有的去處理跟展示 ``` makefile plot: test gcc -o caculate calculate.c ./caculate gnuplot scripts/runtime.gp gnuplot scripts/runtime3.gp gnuplot scripts/runtimept.gp eog runtime.png eog runtime2.png eog runtime3.png ``` ### 設計實驗＋plot 先看看make前面發現 ```makefile TESTS = test_cpy test_ref TEST_DATA = s Tai test: $(TESTS) echo 3 | sudo tee /proc/sys/vm/drop_caches; perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ ./test_cpy --bench $(TEST_DATA) perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ ./test_ref --bench $(TEST_DATA) ``` 測試檔案test_cpy 跟 test_ref 測試資料是s(搜尋) Tai 測試用perf測一百次然後看cache-misses,cache-references,instructions,cycles這四個數值結果： ```shell= Performance counter stats for './test_cpy --bench s Tai' (100 runs): 4,819,658 cache-misses # 51.457 % of all cache refs ( +- 0.36% ) 9,366,445 cache-references ( +- 0.21% ) 527,255,345 instructions # 1.22 insn per cycle ( +- 0.01% ) 432,344,571 cycles ( +- 0.19% ) 0.125924906 seconds time elapsed ( +- 0.29% ) Performance counter stats for './test_ref --bench s Tai' (100 runs): 5,942,214 cache-misses # 51.929 % of all cache refs ( +- 0.74% ) 11,443,010 cache-references ( +- 0.47% ) 588,650,231 instructions # 1.08 insn per cycle ( +- 0.00% ) 545,966,793 cycles ( +- 0.59% ) 0.158983632 seconds time elapsed ( +- 0.68% ) ``` 跑完發現多了一個檔案 cpy.txt 來去看一下 cpy 的 code 然後發現在兩邊的程式碼 cpy 相對完整一些有輸出 cpy.txt 然後在 calculate.c 裡面發現 ```C #include <stdio.h> #include <stdlib.h> int main(void) { FILE *fp = fopen("cpy.txt", "r"); FILE *output = fopen("output.txt", "w"); if (!fp || !output) { printf("ERROR opening input file orig.txt\n"); exit(0); } double sum = 0; double temp; while (fscanf(fp, "%lf\n", &temp) != EOF) sum += temp; FILE *fp2 = fopen("ref.txt", "r"); if (!fp2 || !output) { printf("ERROR opening input file orig.txt\n"); exit(0); } double sum2 = 0; while (fscanf(fp2, "%lf", &temp) != EOF) sum2 += temp; printf("insert %lf %lf\n", sum, sum2); sum /= 100; sum2 /= 100; fprintf(output, "insert %lf %lf\n", sum, sum2); fclose(fp); fclose(fp2); } ``` 這個程式主要是在計算平均時間但是輸入這邊是有 cpy.txt & ref.txt 所以我就先把 ref 的部份參考 cpy的地方補齊比對程式碼後發現差了這段 ```C if (argc == 2 && strcmp(argv[1], "--bench") == 0) { int stat = bench_test(root, BENCH_TEST_FILE, LMAX); tst_free_all(root); return stat; } FILE *output; output = fopen("cpy.txt", "a"); if (output != NULL) { fprintf(output, "%.6f\n", t2 - t1); fclose(output); } else printf("open file error\n"); ``` 把裡面的部份改成 ref 的再次 make test 就有 ref.txt 了這邊要小心的是因為他的 file open type 是 a 也就是接上去的所以cpy因為多跑了一次所以多出一百項為求公平我就把他們都砍了重跑一次本來想說在 makefile 裡面應該會有用到 calculate 後來發現根本沒有所以這邊猜測大概就是把這個東西寫進去看了一下 code 內容就只是把時間 sum 起來/100 也就是算平均然後產生一個 output.txt 試著單獨編譯並執行 ``` $ gcc -o calculate calculate.c $ ls bench.c bloom.o cpy.txt ref.txt test_cpy.o tst.c bench.h calculate LICENSE scripts test_ref tst.h bloom.c calculate.c Makefile test_cpy test_ref.c tst.o bloom.h cities.txt README.md test_cpy.c test_ref.o $ ./calculate insert 12.107501 14.594090 ``` 這邊就拿到平均數值了 --- 做到這邊發現自己跑題了我是來作圖的R 重新看了一次整包檔案發現有個資料夾 script 我在裡面找到了 gp 檔 google 後發現原來就是 gnuplot 的檔案一個感動的痛哭流涕還以為要自己寫於是我就研究了一下這些 gp 檔 - runtime.gp title就告訴我他要來比較 perfomance 用的檔案就是我上面產的 output.txt ```gp reset set ylabel 'time(sec)' set style fill solid set key center top set title 'perfomance comparison' set term png enhanced font 'Verdana,10' set output 'runtime.png' plot [:][y=0:0.250]'output.txt' using 2:xtic(1) with histogram title 'cpy', \ '' using 3:xtic(1) with histogram title 'ref' , \ '' using ($0-0.500):(0.110):2 with labels title ' ' textcolor lt 1, \ '' using ($0-0.500):(0.120):3 with labels title ' ' textcolor lt 2, ``` - runtime3.gp 看到 xlabel 這個應該是跟字的測試畫圖有關 ```gp reset set xlabel 'prefix' set ylabel 'time(sec)' set title 'perfomance comparison' set term png enhanced font 'Verdana,10' set output 'runtime3.png' set format x "%10.0f" set xtic 1200 set xtics rotate by 45 right plot [:12500][:1000]'bench_cpy.txt' using 1:2 with points title 'cpy',\ ``` - runtimebox.gp ```gp stats "bench_ref.txt" using 2 stats "bench_cpy.txt" using 2 ``` - runtimept.gp 這檔案擺明跟 runtime3 有 87%像一臉就是多了一個ref的 ```gp reset set xlabel 'prefix' set ylabel 'time(sec)' set title 'perfomance comparison' set term png enhanced font 'Verdana,10' set output 'runtime2.png' set format x "%10.0f" set xtic 1200 set xtics rotate by 45 right plot [:12500][:1000]'bench_cpy.txt' using 1:2 with points title 'cpy',\ 'bench_ref.txt' using 1:2 with points title 'ref',\ ``` 先從我已經有的 output.txt 著手也就是 runtime 這個檔案 ``` $ gnuplot scripts/runtime.gp $ ls bench.c bloom.o cpy.txt README.md test_cpy test_ref.c tst.o bench.h calculate LICENSE ref.txt test_cpy.c test_ref.o bloom.c calculate.c Makefile runtime.png test_cpy.o tst.c bloom.h cities.txt output.txt scripts test_ref tst.h $ eog runtime.png ``` ![](https://i.imgur.com/GbuTiJe.png) 看來這個沒啥問題單純比較兩者的執行時間這個最後把上面流程丟進去 makefile 就行了 --- 發現少了 bench_ref.txt 跟 bench_cpy.txt 在程式碼裡面的話就是 ```C if (argc == 2 && strcmp(argv[1], "--bench") == 0) { int stat = bench_test(root, BENCH_TEST_FILE, LMAX); tst_free_all(root); return stat; } ``` 條件就是 arg 的 count = 2 第一個參數還要是 "--bench" 看了一下 makefile test 是用 ``` makefile TEST_DATA = s Tai ./test_cpy --bench $(TEST_DATA) ./test_ref --bench $(TEST_DATA) ``` 所以當然不符合喇～因為參數變成4個（程式本體, --bench, s, Tai）所以要符合讓它產出檔案的話就是 ```bash= $ ./test_cpy --bench ternary_tree, loaded 259112 words in 0.112566 sec $ ./test_ref --bench ternary_tree, loaded 259112 words in 0.159782 sec ``` 這樣就有檔案了然後來試試看跑圖出來 ![](https://i.imgur.com/rke0c7x.png) ![](https://i.imgur.com/D7QtyF3.png) 麻...就是太貼了點把y軸拉小一點然後發現它只取12500個點我就把它都改成全部 all in 最終 ![](https://i.imgur.com/NLqNH5M.png) ![](https://i.imgur.com/Js2YKUI.png) (那個x軸我懶的改了... 順帶一提我的 x 大小是 247613) ### perf看run time 這個我覺得就是 make test 裡面做的事情了觀察的有 cache-misses,cache-references,instructions,cycles 這四項分別是 ``` Performance counter stats for './test_cpy --bench s Tai' (100 runs): 4,819,658 cache-misses # 51.457 % of all cache refs ( +- 0.36% ) 9,366,445 cache-references ( +- 0.21% ) 527,255,345 instructions # 1.22 insn per cycle ( +- 0.01% ) 432,344,571 cycles ( +- 0.19% ) 0.125924906 seconds time elapsed ( +- 0.29% ) Performance counter stats for './test_ref --bench s Tai' (100 runs): 5,942,214 cache-misses # 51.929 % of all cache refs ( +- 0.74% ) 11,443,010 cache-references ( +- 0.47% ) 588,650,231 instructions # 1.08 insn per cycle ( +- 0.00% ) 545,966,793 cycles ( +- 0.59% ) 0.158983632 seconds time elapsed ( +- 0.68% ) ``` 當然 test 只有做搜尋的所以可能之後再補 a f d ### 改善機制沒想到待補...