# 2019q1 Homework5 (daemon)
contributed by < `bauuuu1021` >
* [作業要求](https://hackmd.io/s/r14Qu9hYV)
* [GitHub](https://github.com/bauuuu1021/kecho)
下載並編譯 `kecho` ,置入 kernel module
```shell
$ git clone https://github.com/bauuuu1021/kecho
$ cd kecho
$ make
$ sudo insmod fastecho.ko
```
承前幾周作業的操作,輸入 `modinfo fastecho.ko`
```shell
filename: /home/bauuuu1021/kecho/fastecho.ko
version: 0.1
description: Fast echo server in kernel
author: National Cheng Kung University, Taiwan
license: Dual MIT/GPL
srcversion: B6A15BC54464DFDE106869E
depends:
retpoline: Y
name: fastecho
vermagic: 4.18.0-18-generic SMP mod_unload
parm: port:ushort
parm: backlog:ushort
```
:::danger
注意!在 `rmmod` 前一定要將 telnet 程式結束,否則移除 module 後會無法再度載入
:::
## kecho 缺失
>指出 [kecho 程式碼](https://github.com/sysprog21/kecho) 實作的缺失 (注意像是 buffer overflow 一類的問題),並予以改進
>
### Buffer overflow
:::warning
使用 `telnet` 時,程式預期回傳輸入的字串(元),但輸入 `a` 除了回傳 `a` 還伴隨著亂碼
```shell
bauuuu1021@X555:~/kecho$ telnet localhost 12345
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
a
a
��r
```
:::
使用 `dmesg` 查看
```shell
[16867.766947] fastecho: socket create ok....
[16867.766950] fastecho: setsockopt ok....
[16867.766953] fastecho: socket bind ok....
[16867.766956] fastecho: socket listen ok....
[17574.265477] fastecho: start get response
[17575.465972] fastecho: get request = a
\xfe\x8fr\xa1
[17575.465975] fastecho: start send request.
[17575.466002] fastecho: send request = a
\xfe\x8fr\xa1
[17575.466004] fastecho: start get response
[17578.028711] fastecho: get request = a
\xfe\x8fr\xa1
[17578.028713] fastecho: start get response
[17578.028715] fastecho: get request = a
\xfe\x8fr\xa1
[17578.028716] fastecho: start send request.
[17578.028759] fastecho: send request = a
\xfe\x8fr\xa1
```
從以上訊息可得知,雖然只輸入 `a` ,但 kernel module 實際收到的是 `a (\n)
\xfe\x8fr\xa1`
>發現依序 get -> send -> get 兩次(?!) -> send
>有點不太明白,先紀錄著
>[name=bauuuu1021][color=#0abab5]
>
以上顯示的字串 `fastecho: start get response` 及 `fastecho: get request` 出現在 [echo_server.c](https://github.com/sysprog21/kecho/blob/master/echo_server.c) 的 `get_request()` 中
```cpp=10
static int get_request(struct socket *sock, unsigned char *buf, size_t size)
{
struct msghdr msg;
struct kvec vec;
int length;
/* kvec setting */
vec.iov_len = size;
vec.iov_base = buf;
/* msghdr setting */
msg.msg_name = 0;
msg.msg_namelen = 0;
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_flags = 0;
printk(MODULE_NAME ": start get response\n");
/* get msg */
length = kernel_recvmsg(sock, &msg, &vec, size, size, msg.msg_flags);
printk(MODULE_NAME ": get request = %s\n", buf);
return length;
}
```
而根據 `#define BUF_SIZE 4096` 及呼叫 `get_request()` 的 `echo_server_worker()` ,第 29 行的 `size` 為 4095
```cpp=91
res = get_request(sock, buf, BUF_SIZE - 1);
res = send_request(sock, buf, strlen(buf));
```
參考 [kernel_recvmsg](https://www.kernel.org/doc/htmldocs/networking/API-kernel-recvmsg.html)
* struct socket * sock : The socket to receive the message from
* struct msghdr * msg : Received message
* struct kvec * vec : Input [scatter/gather(s/g) array](https://en.wikipedia.org/wiki/Vectored_I/O) for message data
* size_t num : Size of input s/g array
* size_t size : Number of bytes to read
* int flags : Message flags (MSG_DONTWAIT, etc...)
:::warning
顯然 **Number of bytes to read** 在多數情況下不會是 4095 Bytes,或是 read 後必須處理多讀進來的字元
:::
#### 嘗試修改 buf
在 receive 後直接修改 buf 的值
```cpp
length = kernel_recvmsg(sock, &msg, &vec, size, size, msg.msg_flags);
strncpy(buf, "test", 5);
printk(MODULE_NAME ": get request = %s\n", buf);
```
結果
```shell
[ 1270.079206] fastecho: socket create ok....
[ 1270.079209] fastecho: setsockopt ok....
[ 1270.079213] fastecho: socket bind ok....
[ 1270.079215] fastecho: socket listen ok....
[ 1275.178793] fastecho: start get response
[ 1276.731929] fastecho: get request = test
[ 1276.731933] fastecho: start send request.
[ 1276.731982] fastecho: send request = test
[ 1276.731984] fastecho: start get response
[ 1278.591491] fastecho: get request = test
[ 1278.591493] fastecho: start send request.
[ 1278.591530] fastecho: send request = test
[ 1278.591531] fastecho: start get response
[ 1285.641779] fastecho: get request = test
[ 1285.641782] fastecho: start get response
[ 1285.641785] fastecho: get request = test
[ 1285.641786] fastecho: start send request.
[ 1285.641846] fastecho: send request = test
```
故猜測原本的問題與 `buf` 的修改有關
#### 配置記憶體後使用 memset
將配置 `buf` 記憶體
```cpp
buf = kmalloc(BUF_SIZE, GFP_KERNEL);
if (!buf) {
printk(KERN_ERR MODULE_NAME ": kmalloc error....\n");
return -1;
}
```
後面加上
```cpp
memset(buf, 0, BUF_SIZE);
```
即成功解決 telnet echo 第一次回傳的 buffer overflow 問題。
#### get_request 前使用 memset
然而第二次開始有時字尾會附加之前幾次 request 的部份字串,因此判斷在進行下一次 `get_request` 前也該加上 `memset`,實做如下
```cpp
static int send_request(struct socket *sock, unsigned char *buf, size_t size)
{
/* ship */
printk(MODULE_NAME ": send request = %s\n", buf);
memset(buf, 0, BUF_SIZE);
return length;
}
```
雖然這樣就解決了亂碼的問題,但結束程式後在 `dmesg` 中會看到大量的 error message
```shell
[ 463.976845] BUG: unable to handle kernel paging request at ffffffffffffffff
[ 463.976856] PGD 11440e067 P4D 11440e067 PUD 114410067 PMD 0
[ 463.976874] Oops: 0000 [#1] SMP PTI
[ 463.976886] CPU: 0 PID: 2154 Comm: fastecho Tainted: G OE 4.18.0-18-generic #19~18.04.1-Ubuntu
[ 463.976892] Hardware name: ASUSTeK COMPUTER INC. X555LB/X555LB, BIOS X555LB.603 05/06/2016
[ 463.976908] RIP: 0010:memcpy_erms+0x6/0x10
[ 463.976913] Code: ff ff ff 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[ 463.977048] RSP: 0018:ffffb4e64311bcc0 EFLAGS: 00010206
[ 463.977069] RAX: ffff8c7710255930 RBX: 0000000000000590 RCX: 0000000000000590
[ 463.977076] RDX: 0000000000000590 RSI: ffffffffffffffff RDI: ffff8c7710255930
[ 463.977083] RBP: ffffb4e64311bd08 R08: ffff8c7710255930 R09: 0000000000000000
[ 463.977090] R10: 0000000000000130 R11: 0000000040000000 R12: 0000000000000590
[ 463.977098] R13: ffffb4e64311be80 R14: 0000000000000590 R15: ffffb4e64311be90
[ 463.977107] FS: 0000000000000000(0000) GS:ffff8c7766c00000(0000) knlGS:0000000000000000
[ 463.977114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 463.977121] CR2: ffffffffffffffff CR3: 000000011440a004 CR4: 00000000003606f0
[ 463.977128] Call Trace:
[ 463.977144] ? _copy_from_iter_full+0x231/0x250
[ 463.977156] tcp_sendmsg_locked+0x9e1/0xd50
[ 463.977168] tcp_sendmsg+0x2c/0x50
[ 463.977178] inet_sendmsg+0x2e/0xb0
[ 463.977188] sock_sendmsg+0x3e/0x50
[ 463.977198] kernel_sendmsg+0x2b/0x30
[ 463.977212] send_request.isra.0+0xee/0x130 [fastecho]
[ 463.977225] echo_server_worker+0x9c/0x100 [fastecho]
[ 463.977236] kthread+0x121/0x140
[ 463.977246] ? get_request.constprop.1+0xc0/0xc0 [fastecho]
[ 463.977256] ? kthread_create_worker_on_cpu+0x70/0x70
[ 463.977267] ret_from_fork+0x35/0x40
[ 463.977275] Modules linked in: fastecho(OE) rfcomm cmac bnep binfmt_misc nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc rtsx_usb_ms uvcvideo videobuf2_vmalloc videobuf2_memops aesni_intel videobuf2_v4l2 aes_x86_64 crypto_simd videobuf2_common cryptd memstick ath3k arc4 videodev media glue_helper intel_cstate ath9k ath9k_common btusb intel_rapl_perf btrtl btbcm btintel joydev ath9k_hw ath snd_hda_codec_realtek bluetooth asus_nb_wmi snd_hda_codec_hdmi snd_hda_codec_generic mac80211 ecdh_generic asus_wmi snd_hda_intel snd_hda_codec cfg80211 input_leds snd_seq_midi serio_raw snd_seq_midi_event sparse_keymap snd_hda_core mac_hid snd_hwdep snd_rawmidi snd_pcm snd_seq snd_seq_device acpi_pad snd_timer
[ 463.977429] processor_thermal_device int340x_thermal_zone int3400_thermal intel_soc_dts_iosf mei_me mei lpc_ich snd intel_pch_thermal acpi_thermal_rel asus_wireless soundcore sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic rtsx_usb_sdmmc usbhid rtsx_usb hid nouveau i915 mxm_wmi ttm i2c_algo_bit drm_kms_helper ahci syscopyarea sysfillrect psmouse libahci sysimgblt fb_sys_fops drm r8169 mii wmi video
[ 463.977524] CR2: ffffffffffffffff
[ 463.977533] ---[ end trace 2ffdde6bdedecd09 ]---
[ 463.977544] RIP: 0010:memcpy_erms+0x6/0x10
[ 463.977549] Code: ff ff ff 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[ 463.977674] RSP: 0018:ffffb4e64311bcc0 EFLAGS: 00010206
[ 463.977683] RAX: ffff8c7710255930 RBX: 0000000000000590 RCX: 0000000000000590
[ 463.977689] RDX: 0000000000000590 RSI: ffffffffffffffff RDI: ffff8c7710255930
[ 463.977696] RBP: ffffb4e64311bd08 R08: ffff8c7710255930 R09: 0000000000000000
[ 463.977702] R10: 0000000000000130 R11: 0000000040000000 R12: 0000000000000590
[ 463.977709] R13: ffffb4e64311be80 R14: 0000000000000590 R15: ffffb4e64311be90
[ 463.977717] FS: 0000000000000000(0000) GS:ffff8c7766c00000(0000) knlGS:0000000000000000
[ 463.977724] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 463.977731] CR2: ffffffffffffffff CR3: 000000011440a004 CR4: 00000000003606f0
```
且移除 module 再重新載入會失敗並出現以下訊息
```shell
insmod: ERROR: could not insert module fastecho.ko: Address already in use
```
:::warning
逐步使用 `dmesg` 發現,從第一次 request 到結束 telnet 時使用 `^]` 皆未出現錯誤訊息,是到了
```shell
telnet> q
Connection closed.
```
才出現
:::
#### Error 排除
後來逐行檢查,發現
```cpp
static int echo_server_worker(void *arg)
{
/* skip */
res = get_request(sock, buf, BUF_SIZE - 1);
res = send_request(sock, buf, strlen(buf));
kernel_sock_shutdown(sock, SHUT_RDWR);
sock_release(sock);
kfree(buf);
return 0;
}
```
`get_request()` 有成功執行,但執行下一行 `send_request()` 卻導致程式不預期的終止。故嘗試將 `res = send_request(sock, buf, strlen(buf));` 移除,結果成功解決了錯誤!!!
### 輸入覆蓋輸出
輸入測試字串 `echo_testing...` 發現字串回傳後游標會在回傳的字串之首,導致再度輸入時會覆蓋掉之前回傳的字串。如下所示,此時游標在回傳的 (第二個) `echo_testing...` 的 `e` 上
```shell
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
echo_testing...
echo_testing...
```
接著輸入下一個字串 `coverQQ` 會變成
```shell
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
echo_testing...
coverQQsting...
```
(但其實不影響實際使用功能)
```shell
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
echo_testing...
coverQQsting...
coverQQ
```
:::success
目前利用每次 request 結束到下次 request 間傳送兩個 newline 字元來解決 (但應有較優雅的方法)
```shell
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
echo_testing...
echo_testing...
coverQQ
coverQQ
```
:::
### 認證失敗
執行 `sudo insmod fastecho.ko` 後,以 `dmesg` 查看發現以下訊息
```shell
[15549.339162] fastecho: module verification failed: signature and/or required key missing - tainting kernel
```
>發現只有開機後的第一次載入會出現這樣的錯誤訊息,移除再載入時不會發生
>[color=#0abab5]
>[read](https://blog.51cto.com/weiguozhihui/1585297)[color=red]
:::info
比較 [fibdev](https://github.com/sysprog21/fibdrv/blob/master/Makefile#L1) 的 Makefile 第 1 行 `CONFIG_MODULE_SIG=n`, 思考其作用
:notes: jserv
:::
## 效能分析
使用
```shell
echo "testing" | telnet localhost 12345
```
相當於送出 `testing` 字串,用 `dmesg` 檢驗是否有成功送出、收到、回傳
```shell
[13712.118442] fastecho: start get response
[13712.118472] fastecho: get request = testing
[13712.118473] fastecho: start send request.
[13712.118492] fastecho: send request = testing
```
因此想寫一個 script 來量測這樣的過程所需的時間
```script
#!/bin/bash
END=1000
for i in $(seq 1 $END);
do
echo "test" | telnet 127.0.0.1 12345
done
```
在執行時加上關鍵字 `time` 即可得到所需的時間
```shell
time sh test2.sh
```
結果
```shell
real 0m1.828s
user 0m1.405s
sys 0m0.523s
```
:::danger
如此僅能算是壓力測試,跟效能分析沒有直接關聯。重複的 spawn (fork+exec) 出 telnet 行程然後建立 TCP 連線的成本,其實是不固定的,你應該先釐清。Ftrace 和 eBPF 是你的好朋友。
:notes: jserv
:::
## 參考資料
* [tiny web server](https://github.com/bauuuu1021/tiny-webServer)
* [nstack 開發紀錄](https://hackmd.io/s/ryfvFmZ0f)
* [鳥哥 script](http://linux.vbird.org/linux_basic/0340bashshell-scripts.php#script_1st)
* [loop in script](https://blog.longwin.com.tw/2014/07/bash-shell-for-loop-script-2014/)
###### tags: `bauuuu1021` `sysprog` `2019spring`