# 2019q1 Homework5 (daemon)
contributed by < `st9540808` >
* [F09: daemon](https://hackmd.io/s/r14Qu9hYV)
## 實作缺失的發現與改進
### buffer overflow
按照作業說明測試:
```
$ telnet localhost 12345
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
world
world
/"
```
dmesg 輸出可以看到 get request 印出輸入字串以外的東西
```shell
[245936.938694] fastecho: socket create ok....
[245936.938697] fastecho: setsockopt ok....
[245936.938700] fastecho: socket bind ok....
[245936.938702] fastecho: socket listen ok....
[245945.997407] fastecho: start get response
[245948.201059] fastecho: get request = world
\xf0\xf0/\x8c"\xa5U
[245948.201061] fastecho: start send request.
[245948.201088] fastecho: send request = world
\xf0\xf0/\x8c"\xa5U
```
對照 echo_server.c 的程式碼,buf 是由 kmalloc() 分配,並沒有對它做初始化,因此必須在 buf 最後寫入空字元。
```c=
static int get_request(struct socket *sock, unsigned char *buf, size_t size)
{
...
length = kernel_recvmsg(sock, &msg, &vec, size, size, msg.msg_flags);
printk(MODULE_NAME ": get request = %s\n", buf);
return length;
}
```
在 kernel_recvmsg() 之後在 length 寫入空字元:
```c=
length = kernel_recvmsg(sock, &msg, &vec, size, size, msg.msg_flags);
buf[length] = '\0';
```
更改完後發現在結束 telnet 時 dmesg 印出以下錯誤,目前還不知道是哪裡出錯。
```shell
[288301.831582] BUG: unable to handle kernel NULL pointer dereference at 0000000000000587
[288301.831593] IP: memcpy_orig+0x54/0x110
[288301.831595] PGD 0 P4D 0
[288301.831601] Oops: 0000 [#7] SMP PTI
[288301.831706] CPU: 3 PID: 10427 Comm: fastecho Tainted: G D OE 4.15.0-48-generic #51-U
buntu
[288301.831709] Hardware name: LENOVO 4291KV4/4291KV4, BIOS 8DET67WW (1.37 ) 12/05/2012
[288301.831714] RIP: 0010:memcpy_orig+0x54/0x110
[288301.831716] RSP: 0018:ffffbadb889b7cc0 EFLAGS: 00010206
[288301.831720] RAX: ffffa0efa97aa930 RBX: 0000000000000590 RCX: ffffa0ec40000000
[288301.831722] RDX: 0000000000000550 RSI: 000000000000058f RDI: ffffa0efa97aaec0
[288301.831725] RBP: ffffbadb889b7d08 R08: ffffa0efa97aa930 R09: 0000000000000000
[288301.831727] R10: 0000000000000800 R11: 0000000000000000 R12: 0000000000000590
[288301.831729] R13: ffffbadb889b7e80 R14: 0000000000000590 R15: ffffbadb889b7e90
[288301.831733] FS: 0000000000000000(0000) GS:ffffa0f05e2c0000(0000) knlGS:0000000000000000
[288301.831736] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[288301.831738] CR2: 0000000000000587 CR3: 00000001a040a004 CR4: 00000000000606e0
[288301.831741] Call Trace:
[288301.831749] ? _copy_from_iter_full+0x231/0x250
[288301.831756] tcp_sendmsg_locked+0x93f/0xe70
[288301.831762] tcp_sendmsg+0x2c/0x50
[288301.831768] inet_sendmsg+0x2e/0xb0
[288301.831773] sock_sendmsg+0x3e/0x50
[288301.831777] kernel_sendmsg+0x2b/0x30
[288301.831783] send_request.isra.0+0xee/0x120 [fastecho]
[288301.831788] echo_server_worker+0x8f/0xf0 [fastecho]
[288301.831794] kthread+0x121/0x140
[288301.831798] ? get_request.constprop.1+0xc0/0xc0 [fastecho]
[288301.831803] ? kthread_create_worker_on_cpu+0x70/0x70
[288301.831807] ret_from_fork+0x35/0x40
[288301.831810] Code: 4c 89 07 4c 89 4f 08 4c 89 57 10 4c 89 5f 18 48 8d 7f 20 73 d4 83 c2 20 eb 44 48 01 d6 48 01 d7 48 83 ea 20 0f 1f 00 48 83 ea 20 <4c> 8b 46 f8 4c 8b 4e f0 4c 8b 56 e8 4c 8b 5e e0 48 8d 76 e0 4c
[288301.831860] RIP: memcpy_orig+0x54/0x110 RSP: ffffbadb889b7cc0
[288301.831862] CR2: 0000000000000587
[288301.831866] ---[ end trace 196ae849f6e9e5e5 ]---
```
## 驗證效能
### SystemTap
使用 SystemTap 對 kernel module 做 profiling。安裝 SystemTap 同時也需要 debuginfo 才能做 probing。
```shell
$ sudo apt-get install -y systemtap
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C8CAB6595FDFF622
$ sudo apt-get update
$ sudo apt-get install linux-image-$(uname -r)-dbgsym
```
安裝完確認是否能正常執行
```shell
stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'
```
測試完後寫一個腳本來 probe echo_server.c 的 get_request 函式。注意在執行之前必須把 fastecho.ko 放到 ``/lib/modules/`uname -r`/`` 的目錄下,第一次在 function 帶入 get_request 會得到以下錯誤訊息:
```shell
semantic error: while resolving probe point: identifier 'module' at ./profile.stp:3:7
source: probe module("fastecho").function("get_request") {
^
semantic error: no match (similar functions: init_module, send_request.isra.0, cleanup_module, echo_server_worker, get_request.constprop.1)
Pass 2: analysis failed. [man error::pass2]
Tip: /usr/share/doc/systemtap/README.Debian should help you get started.
```
查了一下相關資料發現 gcc 對 get_request 做 constant propagation 的最佳化才得到 .constprop.1 的後綴。因此 function 必須帶入 get_request.constprop.1。
systemtap 測試腳本如下:
```clike=
#!/usr/bin/stap
global stats
probe module("fastecho").function("get_request.constprop.1").return {
a = gettimeofday_us() - @entry(gettimeofday_us());
printf("tid=%d name=%s ", tid(), execname());
printf("time=%dus\n", a);
stats <<< a;
}
probe end {
printf("\n");
printf("get_request execution time (us)\n");
print(@hist_log(stats));
}
```
借用[其他同學](https://hackmd.io/s/SJzmAarsN)的 bash script 來操作 telnet 進行測試,執行 systemtap 腳本後同時執行以下 bash script
```bash
#!/bin/bash
END=1000
for i in $(seq 1 $END);
do
echo "test" | telnet 127.0.0.1 12345
done
```
可以觀察到以下輸出結果,紀錄執行時間並印出直方圖。
> 至於為何每次 `echo "test" | telnet 127.0.0.1 12345` 都會印出三行還在搜尋相關資料
```shell
$ sudo stap ./profile.stp
tid=391 name=fastecho time=14us [1836/1836]
tid=391 name=fastecho time=3us
tid=391 name=fastecho time=3us
tid=394 name=fastecho time=31us
tid=394 name=fastecho time=9us
tid=394 name=fastecho time=3us
tid=397 name=fastecho time=14us
tid=397 name=fastecho time=3us
tid=397 name=fastecho time=2us
....
^C [15/1833]
get_request execution time (us)
value |-------------------------------------------------- count
0 | 0
1 | 0
2 |@@@@@@@@@@@@@@@@@@@@@@@@@ 672
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1265
8 |@@@@@@@@@@@@@@@@@ 444
16 |@@@@@@@@@@@@@@@@@@ 473
32 |@@@ 95
64 | 21
128 | 11
256 | 13
512 | 5
1024 | 2
2048 | 0
4096 | 0
~
524288 | 0
1048576 | 0
2097152 | 1
4194304 | 0
8388608 | 0
```
### [ftrace](https://jvns.ca/blog/2017/03/19/getting-started-with-ftrace/)
[ftrace - Function Tracer](https://www.kernel.org/doc/html/latest/trace/ftrace.html)
使用另外一種 dynamic tracing 的工具做 profiling,ftrace 已經寫入 Linux kernel,並以 sysfs 虛擬檔案系統為界面提供使用者相關功能,先確認自己的 kernel 是否開啟 ftrace 功能:
```shell
cat /boot/config-4.15.0-48-generic | grep CONFIG_DYNAMIC_FTRACE
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
```
輸入第一道命令後在另外一個終端機執行 `echo "test" | telnet 127.0.0.1 12345`,可以看到以下輸出
```shell
$ sudo trace-cmd record -p function_graph -l get_request.constprop.1
plugin 'function_graph'
Hit Ctrl^C to stop recording
^CCPU0 data recorded at offset=0x5f3000
4096 bytes in size
CPU1 data recorded at offset=0x5f4000
0 bytes in size
CPU2 data recorded at offset=0x5f4000
0 bytes in size
CPU3 data recorded at offset=0x5f4000
0 bytes in size
$ sudo trace-cmd report
CPU 1 is empty
CPU 2 is empty
CPU 3 is empty
cpus=4
fastecho-16227 [000] 417473.036319: funcgraph_entry: + 11.441 us | get_request.constprop.1();
fastecho-16227 [000] 417473.036510: funcgraph_entry: 3.805 us | get_request.constprop.1();
fastecho-16227 [000] 417473.036515: funcgraph_entry: 2.572 us | get_request.constprop.1();
```