# 2018q3 Homework3 (dict)
contributed by <[krimson8](https://github.com/krimson8/dict)>
## Development environment
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz
Stepping: 10
CPU MHz: 800.012
CPU max MHz: 4000.0000
CPU min MHz: 800.0000
BogoMIPS: 5616.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 9216K
NUMA node0 CPU(s): 0-5
```
## Modifying scripts/runtime3.gp
First, modify `bench.c`:
```Clike=
while (fscanf(dict, "%s", word) != EOF) {
    if (strlen(word) < 4)
        continue;
    strncpy(prefix, word, 3);
    t1 = tvgetf();
    tst_search_prefix(root, prefix, sgl, &sidx, max);
    t2 = tvgetf();
    t = (t2 - t1) * 1000000;
    if (t > largest) {
        largest = t;
        largest_idx = idx;
    }
    fprintf(fp, "%d %f msec\n", idx, (t2 - t1) * 1000);
    idx++;
}
printf("Largest time consumed :%d--%f\n", largest_idx, largest);
```
~~After several runs, the largest value was consistently around 20 (units).~~
I also noticed that the value produced by `(t2 - t1) * 1000000` looked wrong.
```clike=
double tvgetf()
{
    struct timespec ts;
    double sec;
    clock_gettime(CLOCK_REALTIME, &ts);
    sec = ts.tv_nsec;
    sec /= 1e9;
    sec += ts.tv_sec;
    return sec;
}
```
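To make the units explicit, here is a minimal sketch of how the two fields of `struct timespec` combine into seconds, and how a difference of two readings becomes milliseconds. The helper names `timespec_to_sec` and `elapsed_msec` are hypothetical and do not appear in `bench.c`.

```c
#include <time.h>

/* Hypothetical helper: convert a timespec to seconds as a double.
 * tv_sec holds whole seconds; tv_nsec holds the nanosecond remainder. */
static double timespec_to_sec(struct timespec ts)
{
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* Hypothetical helper: elapsed time between two readings, in milliseconds. */
static double elapsed_msec(struct timespec start, struct timespec end)
{
    return (timespec_to_sec(end) - timespec_to_sec(start)) * 1e3;
}
```

With readings of 1.5 s and 2.0 s, `elapsed_msec` gives 500. For interval timing, `CLOCK_MONOTONIC` is generally a safer clock than `CLOCK_REALTIME`, because it is not affected by wall-clock adjustments.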
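According to the [man page](https://linux.die.net/man/3/clock_gettime), after `clock_gettime()` writes the `CLOCK_REALTIME` reading into `ts`, both fields of `struct timespec` must be added into `sec` to get the correct time. `tv_nsec` holds nanoseconds, so it is divided by `1e9` before `tv_sec` is added, and the returned `sec` is in seconds. `(t2 - t1) * 1000` is therefore in milliseconds and `(t2 - t1) * 1000000` in microseconds. Since the benchmark file records milliseconds, I modified `runtime3.gp` as follows: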
`runtime3.gp`
```clike=
reset
set xlabel 'prefix'
set ylabel 'time(msec)'
set title 'performance comparison'
set term png enhanced font 'Verdana,10'
set output 'runtime3.png'
set format x "%10.0f"
set xtic 1200
set xtics rotate by 45 right
plot [:12500][:1] 'bench_cpy.txt' using 1:2 with points title 'cpy'
```
Almost all search times are below 1 msec, so the y range is set to 0 to 1; see the attached figure.

## Trying make test
perf fails because of insufficient permissions. The error message is:
```
Error:
You may not have permission to collect stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).
The current value is 3:
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
>= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
kernel.perf_event_paranoid = -1
Makefile:45: recipe for target 'test' failed
make: *** [test] Error 255
```
So I changed the value in `/proc/sys/kernel/perf_event_paranoid`:
```
sudo sh -c " echo -1 > /proc/sys/kernel/perf_event_paranoid"
```
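A benchmark harness could check this setting itself before invoking perf. The following is only a sketch under that assumption: `read_int_file` is a hypothetical helper, not part of the dict repository, and it simply reads one integer from a text file such as `/proc/sys/kernel/perf_event_paranoid`.

```c
#include <stdio.h>

/* Hypothetical helper: read a single integer from a text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int read_int_file(const char *path, int *value)
{
    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;
    int ok = fscanf(fp, "%d", value) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

Calling `read_int_file("/proc/sys/kernel/perf_event_paranoid", &level)` would store the current level in `level`, which can then be compared against the thresholds listed in the perf message above.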
## Modifying test_ref.c
`make test` still fails. Comparing the code shows that `test_ref.c` lacks the `--bench` case,
so I added the code below, following `test_cpy.c`:
```clike=
if (argc == 2 && strcmp(argv[1], "--bench") == 0) {
    int stat = bench_test(root, BENCH_TEST_FILE, LMAX);
    tst_free_all(root);
    return stat;
}

FILE *output;
output = fopen("ref.txt", "a");
if (output != NULL) {
    fprintf(output, "%.6f\n", t2 - t1);
    fclose(output);
} else
    printf("open file error\n");
```
## Running make test
According to the `Makefile`, the `test` target runs the following two commands:
```
perf stat --repeat 100 \
-e cache-misses,cache-references,instructions,cycles \
./test_cpy --bench s Tai
perf stat --repeat 100 \
-e cache-misses,cache-references,instructions,cycles \
./test_ref --bench s Tai
```
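The results are shown below. `> output` redirects the programs' own standard output to a separate file, so only the perf output (which goes to stderr) is displayed: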
```
~/D/dict ❯❯❯ make test > output
[sudo] password for xiongjj:
Performance counter stats for './test_cpy --bench s Tai' (100 runs):
554,3498 cache-misses # 59.239 % of all cache refs ( +- 0.91% )
935,7871 cache-references ( +- 0.12% )
5,2699,3847 instructions # 1.05 insn per cycle ( +- 0.01% )
5,0183,2877 cycles ( +- 0.59% )
0.132189464 seconds time elapsed ( +- 0.64% )
Performance counter stats for './test_ref --bench s Tai' (100 runs):
681,3224 cache-misses # 61.210 % of all cache refs ( +- 0.29% )
1113,0890 cache-references ( +- 0.09% )
5,8868,4620 instructions # 0.97 insn per cycle ( +- 0.00% )
6,0984,1686 cycles ( +- 0.35% )
0.161650066 seconds time elapsed ( +- 0.39% )
```
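The derived columns in the perf output can be reproduced from the raw counts. Note that the counts above are grouped in four digits by the locale, so `554,3498` is 5543498. The sketch below uses two hypothetical helpers, `percent` and `ratio`, to recompute the cache-miss rate and the instructions per cycle.

```c
/* Hypothetical helper: part / whole as a percentage (0 if whole is 0). */
static double percent(unsigned long long part, unsigned long long whole)
{
    return whole ? 100.0 * (double) part / (double) whole : 0.0;
}

/* Hypothetical helper: plain ratio, e.g. instructions per cycle. */
static double ratio(unsigned long long num, unsigned long long den)
{
    return den ? (double) num / (double) den : 0.0;
}
```

For `test_cpy`, `percent(5543498, 9357871)` is about 59.24 and `ratio(526993847, 501832877)` is about 1.05; for `test_ref`, `percent(6813224, 11130890)` is about 61.21 and `ratio(588684620, 609841686)` is about 0.97, matching the figures perf reports.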
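## Fixing the bench.c bug
Following classmate [Jyun-Neng](https://hackmd.io/s/ByoHx_go7)'s notes, I found that all the earlier results were wrong. `prefix[3]` in `bench.c` is too small to hold three characters plus the terminating `'\0'`, so its size must be increased by 1. Only then does the search find results; otherwise `tst_search_prefix()` in `tst.c` always returns NULL.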
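The underlying issue is that `strncpy()` does not append a terminator when the source is at least as long as the count. A minimal sketch of a safe prefix copy follows; `PREFIX_LEN` and `make_prefix` are hypothetical names used for illustration, not identifiers from `bench.c`.

```c
#include <string.h>

#define PREFIX_LEN 3 /* hypothetical constant: number of prefix characters */

/* Copy the first PREFIX_LEN characters of word into out and terminate it.
 * out must provide room for PREFIX_LEN + 1 bytes. */
static void make_prefix(char out[PREFIX_LEN + 1], const char *word)
{
    strncpy(out, word, PREFIX_LEN); /* pads with '\0' if word is shorter */
    out[PREFIX_LEN] = '\0';         /* strncpy does not terminate long input */
}
```

With `char prefix[PREFIX_LEN + 1];`, calling `make_prefix(prefix, "Taipei")` leaves the string `"Tai"` in `prefix`, which can then be passed to the prefix search as a properly terminated string.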
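## Comparing `test_ref` and `test_cpy` together
First, compare the `--bench` results of the two programs. As before, I changed the Y maximum in `runtimept.gp`; the result is shown in the figure.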

Next, compare the build time of the two under the `make test` command.
`runtime.gp`
```clike
reset
set xlabel 'test_number'
set ylabel 'time(sec)'
set title 'performance comparison'
set term png enhanced font 'Verdana,10'
set output 'runtime.png'
set format x "%10.0f"
set xtic 1200
set xtics rotate by 45 right
plot [:105][:1]'cpy.txt' using 1:2 with points title 'cpy',\
'ref.txt' using 1:2 with points title 'ref'
```
