# 分組報告 : daeman
contributed by <`bauuuu1021`, `warrenanson`, `BroLeaf`>
## Goal
重做 [daemon](https://hackmd.io/@sysprog/r14Qu9hYV?type=view) 作業,符合所有指定要求並彙整 [其他同學](https://hackmd.io/@sysprog/SkLygcAiV?type=view) 的成果
## kecho 缺失
>指出 [kecho 程式碼](https://github.com/sysprog21/kecho) 實作的缺失 (注意像是 buffer overflow 一類的問題),並予以改進
>
### Buffer overflow
使用 `telnet` 時,程式預期回傳輸入的字串(元),但輸入 `a` 除了回傳 `a` 還伴隨著亂碼
```shell
bauuuu1021@X555:~/kecho$ telnet localhost 12345
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
a
a
��r
```
使用 `dmesg` 查看
```shell
[16867.766947] fastecho: socket create ok....
[16867.766950] fastecho: setsockopt ok....
[16867.766953] fastecho: socket bind ok....
[16867.766956] fastecho: socket listen ok....
[17574.265477] fastecho: start get response
[17575.465972] fastecho: get request = a
\xfe\x8fr\xa1
[17575.465975] fastecho: start send request.
[17575.466002] fastecho: send request = a
\xfe\x8fr\xa1
[17575.466004] fastecho: start get response
[17578.028711] fastecho: get request = a
\xfe\x8fr\xa1
[17578.028713] fastecho: start get response
[17578.028715] fastecho: get request = a
\xfe\x8fr\xa1
[17578.028716] fastecho: start send request.
[17578.028759] fastecho: send request = a
\xfe\x8fr\xa1
```
從以上訊息可得知,雖然只輸入 `a` ,但 kernel module 實際收到的是 `a (\n)
\xfe\x8fr\xa1`
而字串 `fastecho: start get response` 及 `fastecho: get request` 出現在 [echo_server.c](https://github.com/sysprog21/kecho/blob/master/echo_server.c) 的 `get_request()` 中
```cpp=10
static int get_request(struct socket *sock, unsigned char *buf, size_t size)
{
struct msghdr msg;
struct kvec vec;
int length;
/* kvec setting */
vec.iov_len = size;
vec.iov_base = buf;
/* msghdr setting */
msg.msg_name = 0;
msg.msg_namelen = 0;
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_flags = 0;
printk(MODULE_NAME ": start get response\n");
/* get msg */
length = kernel_recvmsg(sock, &msg, &vec, size, size, msg.msg_flags);
printk(MODULE_NAME ": get request = %s\n", buf);
return length;
}
```
而根據 `#define BUF_SIZE 4096` 及呼叫 `get_request()` 的 `echo_server_worker()` ,第 29 行的 `size` 為 4095
```cpp=91
res = get_request(sock, buf, BUF_SIZE - 1);
res = send_request(sock, buf, strlen(buf));
```
參考 [kernel_recvmsg](https://www.kernel.org/doc/htmldocs/networking/API-kernel-recvmsg.html)
* struct socket * sock : The socket to receive the message from
* struct msghdr * msg : Received message
* struct kvec * vec : Input [scatter/gather(s/g) array](https://en.wikipedia.org/wiki/Vectored_I/O) for message data
* size_t num : Size of input s/g array
* size_t size : Number of bytes to read
* int flags : Message flags (MSG_DONTWAIT, etc...)
:::warning
顯然 **Number of bytes to read** 在多數情況下不會是 4095 Bytes,或是 read 後必須處理多讀進來的字元
:::
#### 嘗試修改 buf
在 receive 後直接修改 buf 的值
```cpp
length = kernel_recvmsg(sock, &msg, &vec, size, size, msg.msg_flags);
strncpy(buf, "test", 5);
printk(MODULE_NAME ": get request = %s\n", buf);
```
結果
```shell
[ 1270.079206] fastecho: socket create ok....
[ 1270.079209] fastecho: setsockopt ok....
[ 1270.079213] fastecho: socket bind ok....
[ 1270.079215] fastecho: socket listen ok....
[ 1275.178793] fastecho: start get response
[ 1276.731929] fastecho: get request = test
[ 1276.731933] fastecho: start send request.
[ 1276.731982] fastecho: send request = test
[ 1276.731984] fastecho: start get response
[ 1278.591491] fastecho: get request = test
[ 1278.591493] fastecho: start send request.
[ 1278.591530] fastecho: send request = test
[ 1278.591531] fastecho: start get response
[ 1285.641779] fastecho: get request = test
[ 1285.641782] fastecho: start get response
[ 1285.641785] fastecho: get request = test
[ 1285.641786] fastecho: start send request.
[ 1285.641846] fastecho: send request = test
```
故猜測原本的問題與 `buf` 的修改有關
#### 配置記憶體後使用 memset
將配置 `buf` 記憶體
```cpp
buf = kmalloc(BUF_SIZE, GFP_KERNEL);
if (!buf) {
printk(KERN_ERR MODULE_NAME ": kmalloc error....\n");
return -1;
}
```
後面加上
```cpp
memset(buf, 0, BUF_SIZE);
```
即成功解決 telnet echo 第一次回傳的 buffer overflow 問題。
#### get_request 前使用 memset
然而第二次開始有時字尾會附加之前幾次 request 的部份字串,因此判斷在進行下一次 `get_request` 前也該加上 `memset`,實做如下
```cpp
static int send_request(struct socket *sock, unsigned char *buf, size_t size)
{
/* ship */
printk(MODULE_NAME ": send request = %s\n", buf);
memset(buf, 0, BUF_SIZE);
return length;
}
```
雖然這樣就解決了亂碼的問題,但結束程式後在 `dmesg` 中會看到大量的 error message
```shell
[ 463.976845] BUG: unable to handle kernel paging request at ffffffffffffffff
[ 463.976856] PGD 11440e067 P4D 11440e067 PUD 114410067 PMD 0
[ 463.976874] Oops: 0000 [#1] SMP PTI
[ 463.976886] CPU: 0 PID: 2154 Comm: fastecho Tainted: G OE 4.18.0-18-generic #19~18.04.1-Ubuntu
[ 463.976892] Hardware name: ASUSTeK COMPUTER INC. X555LB/X555LB, BIOS X555LB.603 05/06/2016
[ 463.976908] RIP: 0010:memcpy_erms+0x6/0x10
[ 463.976913] Code: ff ff ff 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[ 463.977048] RSP: 0018:ffffb4e64311bcc0 EFLAGS: 00010206
[ 463.977069] RAX: ffff8c7710255930 RBX: 0000000000000590 RCX: 0000000000000590
[ 463.977076] RDX: 0000000000000590 RSI: ffffffffffffffff RDI: ffff8c7710255930
[ 463.977083] RBP: ffffb4e64311bd08 R08: ffff8c7710255930 R09: 0000000000000000
[ 463.977090] R10: 0000000000000130 R11: 0000000040000000 R12: 0000000000000590
[ 463.977098] R13: ffffb4e64311be80 R14: 0000000000000590 R15: ffffb4e64311be90
[ 463.977107] FS: 0000000000000000(0000) GS:ffff8c7766c00000(0000) knlGS:0000000000000000
[ 463.977114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 463.977121] CR2: ffffffffffffffff CR3: 000000011440a004 CR4: 00000000003606f0
[ 463.977128] Call Trace:
[ 463.977144] ? _copy_from_iter_full+0x231/0x250
[ 463.977156] tcp_sendmsg_locked+0x9e1/0xd50
[ 463.977168] tcp_sendmsg+0x2c/0x50
[ 463.977178] inet_sendmsg+0x2e/0xb0
[ 463.977188] sock_sendmsg+0x3e/0x50
[ 463.977198] kernel_sendmsg+0x2b/0x30
[ 463.977212] send_request.isra.0+0xee/0x130 [fastecho]
[ 463.977225] echo_server_worker+0x9c/0x100 [fastecho]
[ 463.977236] kthread+0x121/0x140
[ 463.977246] ? get_request.constprop.1+0xc0/0xc0 [fastecho]
[ 463.977256] ? kthread_create_worker_on_cpu+0x70/0x70
[ 463.977267] ret_from_fork+0x35/0x40
[ 463.977275] Modules linked in: fastecho(OE) rfcomm cmac bnep binfmt_misc nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc rtsx_usb_ms uvcvideo videobuf2_vmalloc videobuf2_memops aesni_intel videobuf2_v4l2 aes_x86_64 crypto_simd videobuf2_common cryptd memstick ath3k arc4 videodev media glue_helper intel_cstate ath9k ath9k_common btusb intel_rapl_perf btrtl btbcm btintel joydev ath9k_hw ath snd_hda_codec_realtek bluetooth asus_nb_wmi snd_hda_codec_hdmi snd_hda_codec_generic mac80211 ecdh_generic asus_wmi snd_hda_intel snd_hda_codec cfg80211 input_leds snd_seq_midi serio_raw snd_seq_midi_event sparse_keymap snd_hda_core mac_hid snd_hwdep snd_rawmidi snd_pcm snd_seq snd_seq_device acpi_pad snd_timer
[ 463.977429] processor_thermal_device int340x_thermal_zone int3400_thermal intel_soc_dts_iosf mei_me mei lpc_ich snd intel_pch_thermal acpi_thermal_rel asus_wireless soundcore sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic rtsx_usb_sdmmc usbhid rtsx_usb hid nouveau i915 mxm_wmi ttm i2c_algo_bit drm_kms_helper ahci syscopyarea sysfillrect psmouse libahci sysimgblt fb_sys_fops drm r8169 mii wmi video
[ 463.977524] CR2: ffffffffffffffff
[ 463.977533] ---[ end trace 2ffdde6bdedecd09 ]---
[ 463.977544] RIP: 0010:memcpy_erms+0x6/0x10
[ 463.977549] Code: ff ff ff 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[ 463.977674] RSP: 0018:ffffb4e64311bcc0 EFLAGS: 00010206
[ 463.977683] RAX: ffff8c7710255930 RBX: 0000000000000590 RCX: 0000000000000590
[ 463.977689] RDX: 0000000000000590 RSI: ffffffffffffffff RDI: ffff8c7710255930
[ 463.977696] RBP: ffffb4e64311bd08 R08: ffff8c7710255930 R09: 0000000000000000
[ 463.977702] R10: 0000000000000130 R11: 0000000040000000 R12: 0000000000000590
[ 463.977709] R13: ffffb4e64311be80 R14: 0000000000000590 R15: ffffb4e64311be90
[ 463.977717] FS: 0000000000000000(0000) GS:ffff8c7766c00000(0000) knlGS:0000000000000000
[ 463.977724] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 463.977731] CR2: ffffffffffffffff CR3: 000000011440a004 CR4: 00000000003606f0
```
且移除 module 再重新載入會失敗並出現以下訊息
```shell
insmod: ERROR: could not insert module fastecho.ko: Address already in use
```
:::warning
逐步使用 `dmesg` 發現,從第一次 request 到結束 telnet 時使用 `^]` 皆未出現錯誤訊息,是到了
```shell
telnet> q
Connection closed.
```
才出現
:::
#### Error 排除
後來逐行檢查,發現
```cpp
static int echo_server_worker(void *arg)
{
/* skip */
res = get_request(sock, buf, BUF_SIZE - 1);
res = send_request(sock, buf, strlen(buf));
kernel_sock_shutdown(sock, SHUT_RDWR);
sock_release(sock);
kfree(buf);
return 0;
}
```
`get_request()` 有成功執行,但執行下一行 `send_request()` 卻導致程式不預期的終止。故嘗試將 `res = send_request(sock, buf, strlen(buf));` 移除,結果成功解決了錯誤!!!
### 輸入覆蓋輸出
輸入測試字串 `echo_testing...` 發現字串回傳後游標會在回傳的字串之首,導致再度輸入時會覆蓋掉之前回傳的字串。如下所示,此時游標在回傳的 (第二個) `echo_testing...` 的 `e` 上
```shell
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
echo_testing...
echo_testing...
```
接著輸入下一個字串 `coverQQ` 會變成
```shell
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
echo_testing...
coverQQsting...
```
(但其實不影響實際使用功能)
```shell
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
echo_testing...
coverQQsting...
coverQQ
```
:::success
目前利用每次 request 結束到下次 request 間傳送兩個 newline 字元來解決 (但應有較優雅的方法)
```shell
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
echo_testing...
echo_testing...
coverQQ
coverQQ
```
:::
### 認證失敗
執行 `sudo insmod fastecho.ko` 後,以 `dmesg` 查看發現以下訊息
```shell
[15549.339162] fastecho: module verification failed: signature and/or required key missing - tainting kernel
```
>發現只有開機後的第一次載入會出現這樣的錯誤訊息,移除再載入時不會發生
>[color=#0abab5]
:::info
比較 [fibdev](https://github.com/sysprog21/fibdrv/blob/master/Makefile#L1) 的 Makefile 第 1 行 `CONFIG_MODULE_SIG=n`, 思考其作用
:notes: jserv
:::
>目前在 Makefile 最前面加上
>```
>CONFIG_MODULE_SIG=n
>CONFIG_MODULE_SIG_ALL=n
>```
>但仍無法解決問題
## 效能分析1
>1. 參閱相關材料,設計足以驗證 kecho 效能的實驗,過程中留意到 concurrency
>2. 在 kecho 的程式碼基礎上,引入 Concurrency Managed Workqueue (cmwq),配合 1. 的實驗,提出效能改善計畫和落實
>
根據 [afcidk ](https://hackmd.io/@afcidk/Hyqa6tLoE?type=view)同學的實驗,以下分為 Original, Workqueue, CMWQ 探討。每個類別分別紀錄其平均及最差回應時間,還有 CPU 使用率
### Original
![](https://i.imgur.com/2YoM2sN.png =330x300) ![](https://i.imgur.com/DiXkX9U.png =330x300)
![](https://i.imgur.com/NkYykXm.png)
### Workqueue
![](https://i.imgur.com/3UHiUDZ.png =330x300) ![](https://i.imgur.com/2zeUPTT.png =330x300)
![](https://i.imgur.com/U5mISwz.png)
### CMWQ
![](https://i.imgur.com/4A0Lfac.png =330x300) ![](https://i.imgur.com/ssSaSdm.png =330x300)
![](https://i.imgur.com/xND9mfe.png)
### 小結
可以發現,從 Original -> Workqueue -> CMWQ,回應時間大幅降低(平均及最差皆是),且 CPU 使用率大幅提昇
>不太確定是否是筆者指令錯誤,但在重現實驗的過程中跑到一半就會錯誤終止;故以上圖片是直接擷取原作者的共筆[color=#0abab5][name=bauuuu1021]
## 效能分析2
根據[JulianATA](https://hackmd.io/@hPMCWajOS-ORQdEEAQ04-w/B1-endN3N?type=view)同學的實驗以 thread 配置 telnet 連線,調整 Client 數量以及 Request 數量來探討(重複次數及 Timeout 皆固定)。
參數
- Client 數量
- Request 數量
- 重複次數(最後秒數為平均)
- Timeout
```python
def connect_and_send(Host,Port, number, Request_num, timeout):
tn = telnetlib.Telnet(Host,Port,timeout = 10)
for i in range (Request_num):
mesg = ("Telnet No."+str(number)+",Write No"+str(i)+"\n").encode('ascii')
tn.write(mesg)
tn.read_until(mesg)
tn.close()
```
```python
def concurrent_test(Host, Port, Client_num , Request_num,timeout):
threads = []
for i in range(Client_num):
threads.append(threading.Thread(target = connect_and_send, args = (Host,Port,i,Request_num,timeout,)))
for i in range(Clinent_num):
threads[i].start()
for i in range(Client_num):
threads[i].join()
return
```
### Client = 20 / Request = 20 / Repeat = 5 / Timeout = 10
![](https://i.imgur.com/sXHNVHZ.png)
### Client = 40 / Request = 5 / Repeat = 5 / Timeout = 10
![](https://i.imgur.com/jJY860p.png)
### Client = 50 / Request = 20 / Repeat = 5 / Timeout = 10
![](https://i.imgur.com/RLEx5eK.png)
![](https://i.imgur.com/CYTvsZp.png)
### 小結
明顯看到在 Client 大於 20 之後,回應時間明顯的增加。
可以得知 Client 和回應時間有正相關。
## Reference
* [daemon 作業區](https://hackmd.io/@sysprog/SkLygcAiV?type=view)
* [bauuuu1021 共筆](https://hackmd.io/@jD9XFdyQS0iyAaFMPYi5Ww/SJzmAarsN)
* [afcidk 共筆](https://hackmd.io/@afcidk/Hyqa6tLoE?type=view)
* [JulianATA 共筆](https://hackmd.io/@hPMCWajOS-ORQdEEAQ04-w/B1-endN3N?type=view)
###### tags: `linux2019`, `2019spring`