# 2018q3 Homework3 (dict)
contributed by <[krimson8](https://github.com/krimson8/dict)>

## Development environment
```
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              6
On-line CPU(s) list: 0-5
Thread(s) per core:  1
Core(s) per socket:  6
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz
Stepping:            10
CPU MHz:             800.012
CPU max MHz:         4000.0000
CPU min MHz:         800.0000
BogoMIPS:            5616.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            9216K
NUMA node0 CPU(s):   0-5
```

## Modifying scripts/runtime3.gp
First, modify `bench.c`:
```clike=
while (fscanf(dict, "%s", word) != EOF) {
    if (strlen(word) < 4)
        continue;
    strncpy(prefix, word, 3);
    t1 = tvgetf();
    tst_search_prefix(root, prefix, sgl, &sidx, max);
    t2 = tvgetf();
    t = (t2 - t1) * 1000000;
    if (t > largest) {
        largest = t;
        largest_idx = idx;
    }
    fprintf(fp, "%d %f msec\n", idx, (t2 - t1) * 1000);
    idx++;
}
printf("Largest time consumed :%d--%f\n", largest_idx, largest);
```
~~After several runs, the largest value was consistently around 20 (units),~~
I also noticed that the number produced by `(t2 - t1) * 1000000` looked wrong, so I checked `tvgetf()`:
```clike=
double tvgetf()
{
    struct timespec ts;
    double sec;

    clock_gettime(CLOCK_REALTIME, &ts);
    sec = ts.tv_nsec;
    sec /= 1e9;
    sec += ts.tv_sec;

    return sec;
}
```
According to the [man page](https://linux.die.net/man/3/clock_gettime), `clock_gettime()` fills `ts` with the `CLOCK_REALTIME` time, and both fields of the `timespec` struct (`tv_sec` and `tv_nsec`) have to be combined into `sec` to get the correct time. Because `tv_nsec` is divided by 1e9 before `tv_sec` is added, `sec` holds seconds, so `(t2 - t1) * 1000` is in milliseconds while `(t2 - t1) * 1000000` is in microseconds. `runtime3.gp` is therefore modified to plot milliseconds:

`runtime3.gp`
```clike=
reset
set xlabel 'prefix'
set ylabel 'time(msec)'
set title 'performance comparison'
set term png enhanced font 'Verdana,10'
set output 'runtime3.png'
set format x "%10.0f"
set xtic 1200
set xtics rotate by 45 right

plot [:12500][:1]'bench_cpy.txt' using 1:2 with points title 'cpy'
```
Almost every search takes less than 1 msec, so the upper bound of the y range is set to 1. See the plot below.
![](https://i.imgur.com/Txbdg6c.png)

## Trying make test
`perf` fails because of insufficient permissions, with the following error message:
```
Error:
You may not have permission to collect stats.

Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).

The current value is 3:

  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
      Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
>= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN

To make this setting permanent, edit /etc/sysctl.conf too, e.g.:

    kernel.perf_event_paranoid = -1

Makefile:45: recipe for target 'test' failed
make: *** [test] Error 255
```
So I changed the value in `/proc/sys/kernel/perf_event_paranoid`:
```
sudo sh -c " echo -1 > /proc/sys/kernel/perf_event_paranoid"
```

## Modifying test_ref.c
`make test` still fails. Comparing the sources shows that `test_ref.c` has no branch for the `--bench` option, so I added the code below, modelled on `test_cpy.c`:
```clike=
if (argc == 2 && strcmp(argv[1], "--bench") == 0) {
    int stat = bench_test(root, BENCH_TEST_FILE, LMAX);
    tst_free_all(root);
    return stat;
}

FILE *output;
output = fopen("ref.txt", "a");
if (output != NULL) {
    fprintf(output, "%.6f\n", t2 - t1);
    fclose(output);
} else
    printf("open file error\n");
```

## Running make test
According to the `Makefile`, the `test` target runs the following two commands:
```
perf stat --repeat 100 \
    -e cache-misses,cache-references,instructions,cycles \
    ./test_cpy --bench s Tai
perf stat --repeat 100 \
    -e cache-misses,cache-references,instructions,cycles \
    ./test_ref --bench s Tai
```
The results are shown below. Redirecting with `> output` sends the programs' own standard output to a separate file, so only the `perf` output (which `perf stat` writes to stderr) is displayed.
```
~/D/dict ❯❯❯ make test > output
[sudo] password for xiongjj:

 Performance counter stats for './test_cpy --bench s Tai' (100 runs):

          554,3498      cache-misses              #   59.239 % of all cache refs      ( +-  0.91% )
          935,7871      cache-references                                              ( +-  0.12% )
       5,2699,3847      instructions              #    1.05  insn per cycle           ( +-  0.01% )
       5,0183,2877      cycles                                                        ( +-  0.59% )

       0.132189464 seconds time elapsed                                          ( +-  0.64% )


 Performance counter stats for './test_ref --bench s Tai' (100 runs):

          681,3224      cache-misses              #   61.210 % of all cache refs      ( +-  0.29% )
         1113,0890      cache-references                                              ( +-  0.09% )
       5,8868,4620      instructions              #    0.97  insn per cycle           ( +-  0.00% )
       6,0984,1686      cycles                                                        ( +-  0.35% )

       0.161650066 seconds time elapsed                                          ( +-  0.39% )
```

## Fixing the bench.c bug
After reading [Jyun-Neng](https://hackmd.io/s/ByoHx_go7)'s notes, I found that all of the earlier results were wrong. `prefix[3]` in `bench.c` is too small: `strncpy(prefix, word, 3)` fills all three bytes and leaves no room for the terminating NUL, so the array size has to be increased by 1. Only then does the search find anything; otherwise `tst_search_prefix()` in `tst.c` always returns NULL.

## Comparing `test_ref` and `test_cpy`
First, compare the `--bench` results of the two. As before, the Y maximum in `runtimept.gp` is adjusted; the result is shown below.
![](https://i.imgur.com/7J0nqHT.png)

Next, compare the build time of the two under `make test`.

`runtime.gp`
```clike
reset
set xlabel 'test_number'
set ylabel 'time(sec)'
set title 'performance comparison'
set term png enhanced font 'Verdana,10'
set output 'runtime.png'
set format x "%10.0f"
set xtic 1200
set xtics rotate by 45 right

plot [:105][:1]'cpy.txt' using 1:2 with points title 'cpy',\
'ref.txt' using 1:2 with points title 'ref'
```
![](https://i.imgur.com/nNCNyaN.png)