# 2019q1 Homework5 (daemon) contributed by < `st9540808` > * [F09: daemon](https://hackmd.io/s/r14Qu9hYV) ## 實作缺失的發現與改進 ### buffer overflow 按照作業說明測試: ``` $ telnet localhost 12345 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. world world /" ``` dmesg 輸出可以看到 get request 印出輸入字串以外的東西 ```shell [245936.938694] fastecho: socket create ok.... [245936.938697] fastecho: setsockopt ok.... [245936.938700] fastecho: socket bind ok.... [245936.938702] fastecho: socket listen ok.... [245945.997407] fastecho: start get response [245948.201059] fastecho: get request = world \xf0\xf0/\x8c"\xa5U [245948.201061] fastecho: start send request. [245948.201088] fastecho: send request = world \xf0\xf0/\x8c"\xa5U ``` 對照 echo_server.c 的程式碼,buf 是由 kmalloc() 分配,並沒有對它做初始化,因此必須在 buf 最後寫入空字元。 ```c= static int get_request(struct socket *sock, unsigned char *buf, size_t size) { ... length = kernel_recvmsg(sock, &msg, &vec, size, size, msg.msg_flags); printk(MODULE_NAME ": get request = %s\n", buf); return length; } ``` 在 kernel_recvmsg() 之後在 length 寫入空字元: ```c= length = kernel_recvmsg(sock, &msg, &vec, size, size, msg.msg_flags); buf[length] = '\0'; ``` 更改完後發現在結束 telnet 時 dmesg 印出以下錯誤,目前還不知道是哪裡出錯。 ```shell [288301.831582] BUG: unable to handle kernel NULL pointer dereference at 0000000000000587 [288301.831593] IP: memcpy_orig+0x54/0x110 [288301.831595] PGD 0 P4D 0 [288301.831601] Oops: 0000 [#7] SMP PTI [288301.831706] CPU: 3 PID: 10427 Comm: fastecho Tainted: G D OE 4.15.0-48-generic #51-U buntu [288301.831709] Hardware name: LENOVO 4291KV4/4291KV4, BIOS 8DET67WW (1.37 ) 12/05/2012 [288301.831714] RIP: 0010:memcpy_orig+0x54/0x110 [288301.831716] RSP: 0018:ffffbadb889b7cc0 EFLAGS: 00010206 [288301.831720] RAX: ffffa0efa97aa930 RBX: 0000000000000590 RCX: ffffa0ec40000000 [288301.831722] RDX: 0000000000000550 RSI: 000000000000058f RDI: ffffa0efa97aaec0 [288301.831725] RBP: ffffbadb889b7d08 R08: ffffa0efa97aa930 R09: 0000000000000000 [288301.831727] R10: 0000000000000800 R11: 0000000000000000 R12: 0000000000000590 [288301.831729] R13: ffffbadb889b7e80 R14: 0000000000000590 R15: ffffbadb889b7e90 [288301.831733] FS: 0000000000000000(0000) GS:ffffa0f05e2c0000(0000) knlGS:0000000000000000 [288301.831736] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [288301.831738] CR2: 0000000000000587 CR3: 00000001a040a004 CR4: 00000000000606e0 [288301.831741] Call Trace: [288301.831749] ? _copy_from_iter_full+0x231/0x250 [288301.831756] tcp_sendmsg_locked+0x93f/0xe70 [288301.831762] tcp_sendmsg+0x2c/0x50 [288301.831768] inet_sendmsg+0x2e/0xb0 [288301.831773] sock_sendmsg+0x3e/0x50 [288301.831777] kernel_sendmsg+0x2b/0x30 [288301.831783] send_request.isra.0+0xee/0x120 [fastecho] [288301.831788] echo_server_worker+0x8f/0xf0 [fastecho] [288301.831794] kthread+0x121/0x140 [288301.831798] ? get_request.constprop.1+0xc0/0xc0 [fastecho] [288301.831803] ? kthread_create_worker_on_cpu+0x70/0x70 [288301.831807] ret_from_fork+0x35/0x40 [288301.831810] Code: 4c 89 07 4c 89 4f 08 4c 89 57 10 4c 89 5f 18 48 8d 7f 20 73 d4 83 c2 20 eb 44 48 01 d6 48 01 d7 48 83 ea 20 0f 1f 00 48 83 ea 20 <4c> 8b 46 f8 4c 8b 4e f0 4c 8b 56 e8 4c 8b 5e e0 48 8d 76 e0 4c [288301.831860] RIP: memcpy_orig+0x54/0x110 RSP: ffffbadb889b7cc0 [288301.831862] CR2: 0000000000000587 [288301.831866] ---[ end trace 196ae849f6e9e5e5 ]--- ``` ## 驗證效能 ### SystemTap 使用 SystemTap 對 kernel module 做 profiling。安裝 SystemTap 同時也需要 debuginfo 才能做 probing。 ```shell $ sudo apt-get install -y systemtap $ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C8CAB6595FDFF622 $ sudo apt-get update $ sudo apt-get install linux-image-$(uname -r)-dbgsym ``` 安裝完確認是否能正常執行 ```shell stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}' ``` 測試完後寫一個腳本來 probe echo_server.c 的 get_request 函式。注意在執行之前必須把 fastecho.ko 放到 ``/lib/modules/`uname -r`/`` 的目錄下,第一次在 function 帶入 get_request 會得到以下錯誤訊息: ```shell semantic error: while resolving probe point: identifier 'module' at ./profile.stp:3:7 source: probe module("fastecho").function("get_request") { ^ semantic error: no match (similar functions: init_module, send_request.isra.0, cleanup_module, echo_server_worker, get_request.constprop.1) Pass 2: analysis failed. [man error::pass2] Tip: /usr/share/doc/systemtap/README.Debian should help you get started. ``` 查了一下相關資料發現 gcc 對 get_request 做 constant propagation 的最佳化才得到 .constprop.1 的後綴。因此 function 必須帶入 get_request.constprop.1。 systemtap 測試腳本如下: ```clike= #!/usr/bin/stap global stats probe module("fastecho").function("get_request.constprop.1").return { a = gettimeofday_us() - @entry(gettimeofday_us()); printf("tid=%d name=%s ", tid(), execname()); printf("time=%dus\n", a); stats <<< a; } probe end { printf("\n"); printf("get_request execution time (us)\n"); print(@hist_log(stats)); } ``` 借用[其他同學](https://hackmd.io/s/SJzmAarsN)的 bash script 來操作 telnet 進行測試,執行 systemtap 腳本後同時執行以下 bash script ```bash #!/bin/bash END=1000 for i in $(seq 1 $END); do echo "test" | telnet 127.0.0.1 12345 done ``` 可以觀察到以下輸出結果,紀錄執行時間並印出直方圖。 > 至於為何每次 `echo "test" | telnet 127.0.0.1 12345` 都會印出三行還在搜尋相關資料 ```shell $ sudo stap ./profile.stp tid=391 name=fastecho time=14us [1836/1836] tid=391 name=fastecho time=3us tid=391 name=fastecho time=3us tid=394 name=fastecho time=31us tid=394 name=fastecho time=9us tid=394 name=fastecho time=3us tid=397 name=fastecho time=14us tid=397 name=fastecho time=3us tid=397 name=fastecho time=2us .... ^C [15/1833] get_request execution time (us) value |-------------------------------------------------- count 0 | 0 1 | 0 2 |@@@@@@@@@@@@@@@@@@@@@@@@@ 672 4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1265 8 |@@@@@@@@@@@@@@@@@ 444 16 |@@@@@@@@@@@@@@@@@@ 473 32 |@@@ 95 64 | 21 128 | 11 256 | 13 512 | 5 1024 | 2 2048 | 0 4096 | 0 ~ 524288 | 0 1048576 | 0 2097152 | 1 4194304 | 0 8388608 | 0 ``` ### [ftrace](https://jvns.ca/blog/2017/03/19/getting-started-with-ftrace/) [ftrace - Function Tracer](https://www.kernel.org/doc/html/latest/trace/ftrace.html) 使用另外一種 dynamic tracing 的工具做 profiling,ftrace 已經寫入 Linux kernel,並以 sysfs 虛擬檔案系統為界面提供使用者相關功能,先確認自己的 kernel 是否開啟 ftrace 功能: ```shell cat /boot/config-4.15.0-48-generic | grep CONFIG_DYNAMIC_FTRACE CONFIG_DYNAMIC_FTRACE=y CONFIG_DYNAMIC_FTRACE_WITH_REGS=y ``` 輸入第一道命令後在另外一個終端機執行 `echo "test" | telnet 127.0.0.1 12345`,可以看到以下輸出 ```shell $ sudo trace-cmd record -p function_graph -l get_request.constprop.1 plugin 'function_graph' Hit Ctrl^C to stop recording ^CCPU0 data recorded at offset=0x5f3000 4096 bytes in size CPU1 data recorded at offset=0x5f4000 0 bytes in size CPU2 data recorded at offset=0x5f4000 0 bytes in size CPU3 data recorded at offset=0x5f4000 0 bytes in size $ sudo trace-cmd report CPU 1 is empty CPU 2 is empty CPU 3 is empty cpus=4 fastecho-16227 [000] 417473.036319: funcgraph_entry: + 11.441 us | get_request.constprop.1(); fastecho-16227 [000] 417473.036510: funcgraph_entry: 3.805 us | get_request.constprop.1(); fastecho-16227 [000] 417473.036515: funcgraph_entry: 2.572 us | get_request.constprop.1(); ```