# 2019q1 Homework5 (daemon) contributed by < `bauuuu1021` > * [作業要求](https://hackmd.io/s/r14Qu9hYV) * [GitHub](https://github.com/bauuuu1021/kecho) 下載並編譯 `kecho` ,置入 kernel module ```shell $ git clone https://github.com/bauuuu1021/kecho $ cd kecho $ make $ sudo insmod fastecho.ko ``` 承前幾周作業的操作,輸入 `modinfo fastecho.ko` ```shell filename: /home/bauuuu1021/kecho/fastecho.ko version: 0.1 description: Fast echo server in kernel author: National Cheng Kung University, Taiwan license: Dual MIT/GPL srcversion: B6A15BC54464DFDE106869E depends: retpoline: Y name: fastecho vermagic: 4.18.0-18-generic SMP mod_unload parm: port:ushort parm: backlog:ushort ``` :::danger 注意!在 `rmmod` 前一定要將 telnet 程式結束,否則移除 module 後會無法再度載入 ::: ## kecho 缺失 >指出 [kecho 程式碼](https://github.com/sysprog21/kecho) 實作的缺失 (注意像是 buffer overflow 一類的問題),並予以改進 > ### Buffer overflow :::warning 使用 `telnet` 時,程式預期回傳輸入的字串(元),但輸入 `a` 除了回傳 `a` 還伴隨著亂碼 ```shell bauuuu1021@X555:~/kecho$ telnet localhost 12345 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. a a ��r ``` ::: 使用 `dmesg` 查看 ```shell [16867.766947] fastecho: socket create ok.... [16867.766950] fastecho: setsockopt ok.... [16867.766953] fastecho: socket bind ok.... [16867.766956] fastecho: socket listen ok.... [17574.265477] fastecho: start get response [17575.465972] fastecho: get request = a \xfe\x8fr\xa1 [17575.465975] fastecho: start send request. [17575.466002] fastecho: send request = a \xfe\x8fr\xa1 [17575.466004] fastecho: start get response [17578.028711] fastecho: get request = a \xfe\x8fr\xa1 [17578.028713] fastecho: start get response [17578.028715] fastecho: get request = a \xfe\x8fr\xa1 [17578.028716] fastecho: start send request. [17578.028759] fastecho: send request = a \xfe\x8fr\xa1 ``` 從以上訊息可得知,雖然只輸入 `a` ,但 kernel module 實際收到的是 `a (\n) \xfe\x8fr\xa1` >發現依序 get -> send -> get 兩次(?!) -> send >有點不太明白,先紀錄著 >[name=bauuuu1021][color=#0abab5] > 以上顯示的字串 `fastecho: start get response` 及 `fastecho: get request` 出現在 [echo_server.c](https://github.com/sysprog21/kecho/blob/master/echo_server.c) 的 `get_request()` 中 ```cpp=10 static int get_request(struct socket *sock, unsigned char *buf, size_t size) { struct msghdr msg; struct kvec vec; int length; /* kvec setting */ vec.iov_len = size; vec.iov_base = buf; /* msghdr setting */ msg.msg_name = 0; msg.msg_namelen = 0; msg.msg_control = NULL; msg.msg_controllen = 0; msg.msg_flags = 0; printk(MODULE_NAME ": start get response\n"); /* get msg */ length = kernel_recvmsg(sock, &msg, &vec, size, size, msg.msg_flags); printk(MODULE_NAME ": get request = %s\n", buf); return length; } ``` 而根據 `#define BUF_SIZE 4096` 及呼叫 `get_request()` 的 `echo_server_worker()` ,第 29 行的 `size` 為 4095 ```cpp=91 res = get_request(sock, buf, BUF_SIZE - 1); res = send_request(sock, buf, strlen(buf)); ``` 參考 [kernel_recvmsg](https://www.kernel.org/doc/htmldocs/networking/API-kernel-recvmsg.html) * struct socket * sock : The socket to receive the message from * struct msghdr * msg : Received message * struct kvec * vec : Input [scatter/gather(s/g) array](https://en.wikipedia.org/wiki/Vectored_I/O) for message data * size_t num : Size of input s/g array * size_t size : Number of bytes to read * int flags : Message flags (MSG_DONTWAIT, etc...) :::warning 顯然 **Number of bytes to read** 在多數情況下不會是 4095 Bytes,或是 read 後必須處理多讀進來的字元 ::: #### 嘗試修改 buf 在 receive 後直接修改 buf 的值 ```cpp length = kernel_recvmsg(sock, &msg, &vec, size, size, msg.msg_flags); strncpy(buf, "test", 5); printk(MODULE_NAME ": get request = %s\n", buf); ``` 結果 ```shell [ 1270.079206] fastecho: socket create ok.... [ 1270.079209] fastecho: setsockopt ok.... [ 1270.079213] fastecho: socket bind ok.... [ 1270.079215] fastecho: socket listen ok.... [ 1275.178793] fastecho: start get response [ 1276.731929] fastecho: get request = test [ 1276.731933] fastecho: start send request. [ 1276.731982] fastecho: send request = test [ 1276.731984] fastecho: start get response [ 1278.591491] fastecho: get request = test [ 1278.591493] fastecho: start send request. [ 1278.591530] fastecho: send request = test [ 1278.591531] fastecho: start get response [ 1285.641779] fastecho: get request = test [ 1285.641782] fastecho: start get response [ 1285.641785] fastecho: get request = test [ 1285.641786] fastecho: start send request. [ 1285.641846] fastecho: send request = test ``` 故猜測原本的問題與 `buf` 的修改有關 #### 配置記憶體後使用 memset 將配置 `buf` 記憶體 ```cpp buf = kmalloc(BUF_SIZE, GFP_KERNEL); if (!buf) { printk(KERN_ERR MODULE_NAME ": kmalloc error....\n"); return -1; } ``` 後面加上 ```cpp memset(buf, 0, BUF_SIZE); ``` 即成功解決 telnet echo 第一次回傳的 buffer overflow 問題。 #### get_request 前使用 memset 然而第二次開始有時字尾會附加之前幾次 request 的部份字串,因此判斷在進行下一次 `get_request` 前也該加上 `memset`,實做如下 ```cpp static int send_request(struct socket *sock, unsigned char *buf, size_t size) { /* ship */ printk(MODULE_NAME ": send request = %s\n", buf); memset(buf, 0, BUF_SIZE); return length; } ``` 雖然這樣就解決了亂碼的問題,但結束程式後在 `dmesg` 中會看到大量的 error message ```shell [ 463.976845] BUG: unable to handle kernel paging request at ffffffffffffffff [ 463.976856] PGD 11440e067 P4D 11440e067 PUD 114410067 PMD 0 [ 463.976874] Oops: 0000 [#1] SMP PTI [ 463.976886] CPU: 0 PID: 2154 Comm: fastecho Tainted: G OE 4.18.0-18-generic #19~18.04.1-Ubuntu [ 463.976892] Hardware name: ASUSTeK COMPUTER INC. X555LB/X555LB, BIOS X555LB.603 05/06/2016 [ 463.976908] RIP: 0010:memcpy_erms+0x6/0x10 [ 463.976913] Code: ff ff ff 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe [ 463.977048] RSP: 0018:ffffb4e64311bcc0 EFLAGS: 00010206 [ 463.977069] RAX: ffff8c7710255930 RBX: 0000000000000590 RCX: 0000000000000590 [ 463.977076] RDX: 0000000000000590 RSI: ffffffffffffffff RDI: ffff8c7710255930 [ 463.977083] RBP: ffffb4e64311bd08 R08: ffff8c7710255930 R09: 0000000000000000 [ 463.977090] R10: 0000000000000130 R11: 0000000040000000 R12: 0000000000000590 [ 463.977098] R13: ffffb4e64311be80 R14: 0000000000000590 R15: ffffb4e64311be90 [ 463.977107] FS: 0000000000000000(0000) GS:ffff8c7766c00000(0000) knlGS:0000000000000000 [ 463.977114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 463.977121] CR2: ffffffffffffffff CR3: 000000011440a004 CR4: 00000000003606f0 [ 463.977128] Call Trace: [ 463.977144] ? _copy_from_iter_full+0x231/0x250 [ 463.977156] tcp_sendmsg_locked+0x9e1/0xd50 [ 463.977168] tcp_sendmsg+0x2c/0x50 [ 463.977178] inet_sendmsg+0x2e/0xb0 [ 463.977188] sock_sendmsg+0x3e/0x50 [ 463.977198] kernel_sendmsg+0x2b/0x30 [ 463.977212] send_request.isra.0+0xee/0x130 [fastecho] [ 463.977225] echo_server_worker+0x9c/0x100 [fastecho] [ 463.977236] kthread+0x121/0x140 [ 463.977246] ? get_request.constprop.1+0xc0/0xc0 [fastecho] [ 463.977256] ? kthread_create_worker_on_cpu+0x70/0x70 [ 463.977267] ret_from_fork+0x35/0x40 [ 463.977275] Modules linked in: fastecho(OE) rfcomm cmac bnep binfmt_misc nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc rtsx_usb_ms uvcvideo videobuf2_vmalloc videobuf2_memops aesni_intel videobuf2_v4l2 aes_x86_64 crypto_simd videobuf2_common cryptd memstick ath3k arc4 videodev media glue_helper intel_cstate ath9k ath9k_common btusb intel_rapl_perf btrtl btbcm btintel joydev ath9k_hw ath snd_hda_codec_realtek bluetooth asus_nb_wmi snd_hda_codec_hdmi snd_hda_codec_generic mac80211 ecdh_generic asus_wmi snd_hda_intel snd_hda_codec cfg80211 input_leds snd_seq_midi serio_raw snd_seq_midi_event sparse_keymap snd_hda_core mac_hid snd_hwdep snd_rawmidi snd_pcm snd_seq snd_seq_device acpi_pad snd_timer [ 463.977429] processor_thermal_device int340x_thermal_zone int3400_thermal intel_soc_dts_iosf mei_me mei lpc_ich snd intel_pch_thermal acpi_thermal_rel asus_wireless soundcore sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic rtsx_usb_sdmmc usbhid rtsx_usb hid nouveau i915 mxm_wmi ttm i2c_algo_bit drm_kms_helper ahci syscopyarea sysfillrect psmouse libahci sysimgblt fb_sys_fops drm r8169 mii wmi video [ 463.977524] CR2: ffffffffffffffff [ 463.977533] ---[ end trace 2ffdde6bdedecd09 ]--- [ 463.977544] RIP: 0010:memcpy_erms+0x6/0x10 [ 463.977549] Code: ff ff ff 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe [ 463.977674] RSP: 0018:ffffb4e64311bcc0 EFLAGS: 00010206 [ 463.977683] RAX: ffff8c7710255930 RBX: 0000000000000590 RCX: 0000000000000590 [ 463.977689] RDX: 0000000000000590 RSI: ffffffffffffffff RDI: ffff8c7710255930 [ 463.977696] RBP: ffffb4e64311bd08 R08: ffff8c7710255930 R09: 0000000000000000 [ 463.977702] R10: 0000000000000130 R11: 0000000040000000 R12: 0000000000000590 [ 463.977709] R13: ffffb4e64311be80 R14: 0000000000000590 R15: ffffb4e64311be90 [ 463.977717] FS: 0000000000000000(0000) GS:ffff8c7766c00000(0000) knlGS:0000000000000000 [ 463.977724] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 463.977731] CR2: ffffffffffffffff CR3: 000000011440a004 CR4: 00000000003606f0 ``` 且移除 module 再重新載入會失敗並出現以下訊息 ```shell insmod: ERROR: could not insert module fastecho.ko: Address already in use ``` :::warning 逐步使用 `dmesg` 發現,從第一次 request 到結束 telnet 時使用 `^]` 皆未出現錯誤訊息,是到了 ```shell telnet> q Connection closed. ``` 才出現 ::: #### Error 排除 後來逐行檢查,發現 ```cpp static int echo_server_worker(void *arg) { /* skip */ res = get_request(sock, buf, BUF_SIZE - 1); res = send_request(sock, buf, strlen(buf)); kernel_sock_shutdown(sock, SHUT_RDWR); sock_release(sock); kfree(buf); return 0; } ``` `get_request()` 有成功執行,但執行下一行 `send_request()` 卻導致程式不預期的終止。故嘗試將 `res = send_request(sock, buf, strlen(buf));` 移除,結果成功解決了錯誤!!! ### 輸入覆蓋輸出 輸入測試字串 `echo_testing...` 發現字串回傳後游標會在回傳的字串之首,導致再度輸入時會覆蓋掉之前回傳的字串。如下所示,此時游標在回傳的 (第二個) `echo_testing...` 的 `e` 上 ```shell Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. echo_testing... echo_testing... ``` 接著輸入下一個字串 `coverQQ` 會變成 ```shell Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. echo_testing... coverQQsting... ``` (但其實不影響實際使用功能) ```shell Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. echo_testing... coverQQsting... coverQQ ``` :::success 目前利用每次 request 結束到下次 request 間傳送兩個 newline 字元來解決 (但應有較優雅的方法) ```shell Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. echo_testing... echo_testing... coverQQ coverQQ ``` ::: ### 認證失敗 執行 `sudo insmod fastecho.ko` 後,以 `dmesg` 查看發現以下訊息 ```shell [15549.339162] fastecho: module verification failed: signature and/or required key missing - tainting kernel ``` >發現只有開機後的第一次載入會出現這樣的錯誤訊息,移除再載入時不會發生 >[color=#0abab5] >[read](https://blog.51cto.com/weiguozhihui/1585297)[color=red] :::info 比較 [fibdev](https://github.com/sysprog21/fibdrv/blob/master/Makefile#L1) 的 Makefile 第 1 行 `CONFIG_MODULE_SIG=n`, 思考其作用 :notes: jserv ::: ## 效能分析 使用 ```shell echo "testing" | telnet localhost 12345 ``` 相當於送出 `testing` 字串,用 `dmesg` 檢驗是否有成功送出、收到、回傳 ```shell [13712.118442] fastecho: start get response [13712.118472] fastecho: get request = testing [13712.118473] fastecho: start send request. [13712.118492] fastecho: send request = testing ``` 因此想寫一個 script 來量測這樣的過程所需的時間 ```script #!/bin/bash END=1000 for i in $(seq 1 $END); do echo "test" | telnet 127.0.0.1 12345 done ``` 在執行時加上關鍵字 `time` 即可得到所需的時間 ```shell time sh test2.sh ``` 結果 ```shell real 0m1.828s user 0m1.405s sys 0m0.523s ``` :::danger 如此僅能算是壓力測試,跟效能分析沒有直接關聯。重複的 spawn (fork+exec) 出 telnet 行程然後建立 TCP 連線的成本,其實是不固定的,你應該先釐清。Ftrace 和 eBPF 是你的好朋友。 :notes: jserv ::: ## 參考資料 * [tiny web server](https://github.com/bauuuu1021/tiny-webServer) * [nstack 開發紀錄](https://hackmd.io/s/ryfvFmZ0f) * [鳥哥 script](http://linux.vbird.org/linux_basic/0340bashshell-scripts.php#script_1st) * [loop in script](https://blog.longwin.com.tw/2014/07/bash-shell-for-loop-script-2014/) ###### tags: `bauuuu1021` `sysprog` `2019spring`