# 分組報告 : daeman contributed by <`bauuuu1021`, `warrenanson`, `BroLeaf`> ## Goal 重做 [daemon](https://hackmd.io/@sysprog/r14Qu9hYV?type=view) 作業,符合所有指定要求並彙整 [其他同學](https://hackmd.io/@sysprog/SkLygcAiV?type=view) 的成果 ## kecho 缺失 >指出 [kecho 程式碼](https://github.com/sysprog21/kecho) 實作的缺失 (注意像是 buffer overflow 一類的問題),並予以改進 > ### Buffer overflow 使用 `telnet` 時,程式預期回傳輸入的字串(元),但輸入 `a` 除了回傳 `a` 還伴隨著亂碼 ```shell bauuuu1021@X555:~/kecho$ telnet localhost 12345 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. a a ��r ``` 使用 `dmesg` 查看 ```shell [16867.766947] fastecho: socket create ok.... [16867.766950] fastecho: setsockopt ok.... [16867.766953] fastecho: socket bind ok.... [16867.766956] fastecho: socket listen ok.... [17574.265477] fastecho: start get response [17575.465972] fastecho: get request = a \xfe\x8fr\xa1 [17575.465975] fastecho: start send request. [17575.466002] fastecho: send request = a \xfe\x8fr\xa1 [17575.466004] fastecho: start get response [17578.028711] fastecho: get request = a \xfe\x8fr\xa1 [17578.028713] fastecho: start get response [17578.028715] fastecho: get request = a \xfe\x8fr\xa1 [17578.028716] fastecho: start send request. [17578.028759] fastecho: send request = a \xfe\x8fr\xa1 ``` 從以上訊息可得知,雖然只輸入 `a` ,但 kernel module 實際收到的是 `a (\n) \xfe\x8fr\xa1` 而字串 `fastecho: start get response` 及 `fastecho: get request` 出現在 [echo_server.c](https://github.com/sysprog21/kecho/blob/master/echo_server.c) 的 `get_request()` 中 ```cpp=10 static int get_request(struct socket *sock, unsigned char *buf, size_t size) { struct msghdr msg; struct kvec vec; int length; /* kvec setting */ vec.iov_len = size; vec.iov_base = buf; /* msghdr setting */ msg.msg_name = 0; msg.msg_namelen = 0; msg.msg_control = NULL; msg.msg_controllen = 0; msg.msg_flags = 0; printk(MODULE_NAME ": start get response\n"); /* get msg */ length = kernel_recvmsg(sock, &msg, &vec, size, size, msg.msg_flags); printk(MODULE_NAME ": get request = %s\n", buf); return length; } ``` 而根據 `#define BUF_SIZE 4096` 及呼叫 `get_request()` 的 `echo_server_worker()` ,第 29 行的 `size` 為 4095 ```cpp=91 res = get_request(sock, buf, BUF_SIZE - 1); res = send_request(sock, buf, strlen(buf)); ``` 參考 [kernel_recvmsg](https://www.kernel.org/doc/htmldocs/networking/API-kernel-recvmsg.html) * struct socket * sock : The socket to receive the message from * struct msghdr * msg : Received message * struct kvec * vec : Input [scatter/gather(s/g) array](https://en.wikipedia.org/wiki/Vectored_I/O) for message data * size_t num : Size of input s/g array * size_t size : Number of bytes to read * int flags : Message flags (MSG_DONTWAIT, etc...) :::warning 顯然 **Number of bytes to read** 在多數情況下不會是 4095 Bytes,或是 read 後必須處理多讀進來的字元 ::: #### 嘗試修改 buf 在 receive 後直接修改 buf 的值 ```cpp length = kernel_recvmsg(sock, &msg, &vec, size, size, msg.msg_flags); strncpy(buf, "test", 5); printk(MODULE_NAME ": get request = %s\n", buf); ``` 結果 ```shell [ 1270.079206] fastecho: socket create ok.... [ 1270.079209] fastecho: setsockopt ok.... [ 1270.079213] fastecho: socket bind ok.... [ 1270.079215] fastecho: socket listen ok.... [ 1275.178793] fastecho: start get response [ 1276.731929] fastecho: get request = test [ 1276.731933] fastecho: start send request. [ 1276.731982] fastecho: send request = test [ 1276.731984] fastecho: start get response [ 1278.591491] fastecho: get request = test [ 1278.591493] fastecho: start send request. [ 1278.591530] fastecho: send request = test [ 1278.591531] fastecho: start get response [ 1285.641779] fastecho: get request = test [ 1285.641782] fastecho: start get response [ 1285.641785] fastecho: get request = test [ 1285.641786] fastecho: start send request. [ 1285.641846] fastecho: send request = test ``` 故猜測原本的問題與 `buf` 的修改有關 #### 配置記憶體後使用 memset 將配置 `buf` 記憶體 ```cpp buf = kmalloc(BUF_SIZE, GFP_KERNEL); if (!buf) { printk(KERN_ERR MODULE_NAME ": kmalloc error....\n"); return -1; } ``` 後面加上 ```cpp memset(buf, 0, BUF_SIZE); ``` 即成功解決 telnet echo 第一次回傳的 buffer overflow 問題。 #### get_request 前使用 memset 然而第二次開始有時字尾會附加之前幾次 request 的部份字串,因此判斷在進行下一次 `get_request` 前也該加上 `memset`,實做如下 ```cpp static int send_request(struct socket *sock, unsigned char *buf, size_t size) { /* ship */ printk(MODULE_NAME ": send request = %s\n", buf); memset(buf, 0, BUF_SIZE); return length; } ``` 雖然這樣就解決了亂碼的問題,但結束程式後在 `dmesg` 中會看到大量的 error message ```shell [ 463.976845] BUG: unable to handle kernel paging request at ffffffffffffffff [ 463.976856] PGD 11440e067 P4D 11440e067 PUD 114410067 PMD 0 [ 463.976874] Oops: 0000 [#1] SMP PTI [ 463.976886] CPU: 0 PID: 2154 Comm: fastecho Tainted: G OE 4.18.0-18-generic #19~18.04.1-Ubuntu [ 463.976892] Hardware name: ASUSTeK COMPUTER INC. X555LB/X555LB, BIOS X555LB.603 05/06/2016 [ 463.976908] RIP: 0010:memcpy_erms+0x6/0x10 [ 463.976913] Code: ff ff ff 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe [ 463.977048] RSP: 0018:ffffb4e64311bcc0 EFLAGS: 00010206 [ 463.977069] RAX: ffff8c7710255930 RBX: 0000000000000590 RCX: 0000000000000590 [ 463.977076] RDX: 0000000000000590 RSI: ffffffffffffffff RDI: ffff8c7710255930 [ 463.977083] RBP: ffffb4e64311bd08 R08: ffff8c7710255930 R09: 0000000000000000 [ 463.977090] R10: 0000000000000130 R11: 0000000040000000 R12: 0000000000000590 [ 463.977098] R13: ffffb4e64311be80 R14: 0000000000000590 R15: ffffb4e64311be90 [ 463.977107] FS: 0000000000000000(0000) GS:ffff8c7766c00000(0000) knlGS:0000000000000000 [ 463.977114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 463.977121] CR2: ffffffffffffffff CR3: 000000011440a004 CR4: 00000000003606f0 [ 463.977128] Call Trace: [ 463.977144] ? _copy_from_iter_full+0x231/0x250 [ 463.977156] tcp_sendmsg_locked+0x9e1/0xd50 [ 463.977168] tcp_sendmsg+0x2c/0x50 [ 463.977178] inet_sendmsg+0x2e/0xb0 [ 463.977188] sock_sendmsg+0x3e/0x50 [ 463.977198] kernel_sendmsg+0x2b/0x30 [ 463.977212] send_request.isra.0+0xee/0x130 [fastecho] [ 463.977225] echo_server_worker+0x9c/0x100 [fastecho] [ 463.977236] kthread+0x121/0x140 [ 463.977246] ? get_request.constprop.1+0xc0/0xc0 [fastecho] [ 463.977256] ? kthread_create_worker_on_cpu+0x70/0x70 [ 463.977267] ret_from_fork+0x35/0x40 [ 463.977275] Modules linked in: fastecho(OE) rfcomm cmac bnep binfmt_misc nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc rtsx_usb_ms uvcvideo videobuf2_vmalloc videobuf2_memops aesni_intel videobuf2_v4l2 aes_x86_64 crypto_simd videobuf2_common cryptd memstick ath3k arc4 videodev media glue_helper intel_cstate ath9k ath9k_common btusb intel_rapl_perf btrtl btbcm btintel joydev ath9k_hw ath snd_hda_codec_realtek bluetooth asus_nb_wmi snd_hda_codec_hdmi snd_hda_codec_generic mac80211 ecdh_generic asus_wmi snd_hda_intel snd_hda_codec cfg80211 input_leds snd_seq_midi serio_raw snd_seq_midi_event sparse_keymap snd_hda_core mac_hid snd_hwdep snd_rawmidi snd_pcm snd_seq snd_seq_device acpi_pad snd_timer [ 463.977429] processor_thermal_device int340x_thermal_zone int3400_thermal intel_soc_dts_iosf mei_me mei lpc_ich snd intel_pch_thermal acpi_thermal_rel asus_wireless soundcore sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic rtsx_usb_sdmmc usbhid rtsx_usb hid nouveau i915 mxm_wmi ttm i2c_algo_bit drm_kms_helper ahci syscopyarea sysfillrect psmouse libahci sysimgblt fb_sys_fops drm r8169 mii wmi video [ 463.977524] CR2: ffffffffffffffff [ 463.977533] ---[ end trace 2ffdde6bdedecd09 ]--- [ 463.977544] RIP: 0010:memcpy_erms+0x6/0x10 [ 463.977549] Code: ff ff ff 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe [ 463.977674] RSP: 0018:ffffb4e64311bcc0 EFLAGS: 00010206 [ 463.977683] RAX: ffff8c7710255930 RBX: 0000000000000590 RCX: 0000000000000590 [ 463.977689] RDX: 0000000000000590 RSI: ffffffffffffffff RDI: ffff8c7710255930 [ 463.977696] RBP: ffffb4e64311bd08 R08: ffff8c7710255930 R09: 0000000000000000 [ 463.977702] R10: 0000000000000130 R11: 0000000040000000 R12: 0000000000000590 [ 463.977709] R13: ffffb4e64311be80 R14: 0000000000000590 R15: ffffb4e64311be90 [ 463.977717] FS: 0000000000000000(0000) GS:ffff8c7766c00000(0000) knlGS:0000000000000000 [ 463.977724] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 463.977731] CR2: ffffffffffffffff CR3: 000000011440a004 CR4: 00000000003606f0 ``` 且移除 module 再重新載入會失敗並出現以下訊息 ```shell insmod: ERROR: could not insert module fastecho.ko: Address already in use ``` :::warning 逐步使用 `dmesg` 發現,從第一次 request 到結束 telnet 時使用 `^]` 皆未出現錯誤訊息,是到了 ```shell telnet> q Connection closed. ``` 才出現 ::: #### Error 排除 後來逐行檢查,發現 ```cpp static int echo_server_worker(void *arg) { /* skip */ res = get_request(sock, buf, BUF_SIZE - 1); res = send_request(sock, buf, strlen(buf)); kernel_sock_shutdown(sock, SHUT_RDWR); sock_release(sock); kfree(buf); return 0; } ``` `get_request()` 有成功執行,但執行下一行 `send_request()` 卻導致程式不預期的終止。故嘗試將 `res = send_request(sock, buf, strlen(buf));` 移除,結果成功解決了錯誤!!! ### 輸入覆蓋輸出 輸入測試字串 `echo_testing...` 發現字串回傳後游標會在回傳的字串之首,導致再度輸入時會覆蓋掉之前回傳的字串。如下所示,此時游標在回傳的 (第二個) `echo_testing...` 的 `e` 上 ```shell Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. echo_testing... echo_testing... ``` 接著輸入下一個字串 `coverQQ` 會變成 ```shell Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. echo_testing... coverQQsting... ``` (但其實不影響實際使用功能) ```shell Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. echo_testing... coverQQsting... coverQQ ``` :::success 目前利用每次 request 結束到下次 request 間傳送兩個 newline 字元來解決 (但應有較優雅的方法) ```shell Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. echo_testing... echo_testing... coverQQ coverQQ ``` ::: ### 認證失敗 執行 `sudo insmod fastecho.ko` 後,以 `dmesg` 查看發現以下訊息 ```shell [15549.339162] fastecho: module verification failed: signature and/or required key missing - tainting kernel ``` >發現只有開機後的第一次載入會出現這樣的錯誤訊息,移除再載入時不會發生 >[color=#0abab5] :::info 比較 [fibdev](https://github.com/sysprog21/fibdrv/blob/master/Makefile#L1) 的 Makefile 第 1 行 `CONFIG_MODULE_SIG=n`, 思考其作用 :notes: jserv ::: >目前在 Makefile 最前面加上 >``` >CONFIG_MODULE_SIG=n >CONFIG_MODULE_SIG_ALL=n >``` >但仍無法解決問題 ## 效能分析1 >1. 參閱相關材料,設計足以驗證 kecho 效能的實驗,過程中留意到 concurrency >2. 在 kecho 的程式碼基礎上,引入 Concurrency Managed Workqueue (cmwq),配合 1. 的實驗,提出效能改善計畫和落實 > 根據 [afcidk ](https://hackmd.io/@afcidk/Hyqa6tLoE?type=view)同學的實驗,以下分為 Original, Workqueue, CMWQ 探討。每個類別分別紀錄其平均及最差回應時間,還有 CPU 使用率 ### Original ![](https://i.imgur.com/2YoM2sN.png =330x300) ![](https://i.imgur.com/DiXkX9U.png =330x300) ![](https://i.imgur.com/NkYykXm.png) ### Workqueue ![](https://i.imgur.com/3UHiUDZ.png =330x300) ![](https://i.imgur.com/2zeUPTT.png =330x300) ![](https://i.imgur.com/U5mISwz.png) ### CMWQ ![](https://i.imgur.com/4A0Lfac.png =330x300) ![](https://i.imgur.com/ssSaSdm.png =330x300) ![](https://i.imgur.com/xND9mfe.png) ### 小結 可以發現,從 Original -> Workqueue -> CMWQ,回應時間大幅降低(平均及最差皆是),且 CPU 使用率大幅提昇 >不太確定是否是筆者指令錯誤,但在重現實驗的過程中跑到一半就會錯誤終止;故以上圖片是直接擷取原作者的共筆[color=#0abab5][name=bauuuu1021] ## 效能分析2 根據[JulianATA](https://hackmd.io/@hPMCWajOS-ORQdEEAQ04-w/B1-endN3N?type=view)同學的實驗以 thread 配置 telnet 連線,調整 Client 數量以及 Request 數量來探討(重複次數及 Timeout 皆固定)。 參數 - Client 數量 - Request 數量 - 重複次數(最後秒數為平均) - Timeout ```python def connect_and_send(Host,Port, number, Request_num, timeout): tn = telnetlib.Telnet(Host,Port,timeout = 10) for i in range (Request_num): mesg = ("Telnet No."+str(number)+",Write No"+str(i)+"\n").encode('ascii') tn.write(mesg) tn.read_until(mesg) tn.close() ``` ```python def concurrent_test(Host, Port, Client_num , Request_num,timeout): threads = [] for i in range(Client_num): threads.append(threading.Thread(target = connect_and_send, args = (Host,Port,i,Request_num,timeout,))) for i in range(Clinent_num): threads[i].start() for i in range(Client_num): threads[i].join() return ``` ### Client = 20 / Request = 20 / Repeat = 5 / Timeout = 10 ![](https://i.imgur.com/sXHNVHZ.png) ### Client = 40 / Request = 5 / Repeat = 5 / Timeout = 10 ![](https://i.imgur.com/jJY860p.png) ### Client = 50 / Request = 20 / Repeat = 5 / Timeout = 10 ![](https://i.imgur.com/RLEx5eK.png) ![](https://i.imgur.com/CYTvsZp.png) ### 小結 明顯看到在 Client 大於 20 之後,回應時間明顯的增加。 可以得知 Client 和回應時間有正相關。 ## Reference * [daemon 作業區](https://hackmd.io/@sysprog/SkLygcAiV?type=view) * [bauuuu1021 共筆](https://hackmd.io/@jD9XFdyQS0iyAaFMPYi5Ww/SJzmAarsN) * [afcidk 共筆](https://hackmd.io/@afcidk/Hyqa6tLoE?type=view) * [JulianATA 共筆](https://hackmd.io/@hPMCWajOS-ORQdEEAQ04-w/B1-endN3N?type=view) ###### tags: `linux2019`, `2019spring`