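# 2023q1 Homework7 (ktcp)
###### tags: `linux2023`
contributed by < [xueyang0312](https://github.com/xueyang0312) >

## :checkered_flag: Self-check list
- [x] The provided `kecho` already uses CMWQ. Describe its advantages and usage, and reproduce the related experiments
- [ ] Read chapters 1 and 2 of *Demystifying the Linux CPU Scheduler*, describe how the CPU scheduler interacts with workqueue/CMWQ, and study the relevant Linux kernel source code
- [ ] Study "[Linux 核心設計: RCU 同步機制](https://hackmd.io/@sysprog/linux-rcu)" and test the related Linux kernel modules to understand how RCU is used
- [ ] How to test the performance of a web server and tune it for multi-core processors
- [x] How to use [Ftrace](https://docs.kernel.org/trace/ftrace.html) to find the performance bottleneck of the `khttpd` kernel module, and how to design experiments for learning it. Read alongside chapter 6 of *Demystifying the Linux CPU Scheduler*
- [ ] Explain how the `drop-tcp-socket` kernel module works. What are `TIME-WAIT` sockets?
- [x] Study "[透過 eBPF 觀察作業系統行為](https://hackmd.io/@sysprog/linux-ebpf)": how can eBPF measure the execution cost of key kthread / CMWQ operations?
  > See [eBPF 教程](https://github.com/eunomia-bpf/bpf-developer-tutorial)
- [x] Following "[測試 Linux 核心的虛擬化環境](https://hackmd.io/@sysprog/linux-virtme)" and "[建構 User-Mode Linux 的實驗環境](https://hackmd.io/@sysprog/user-mode-linux-env)", build a virtualized environment with UML or `virtme` on a native Linux system and trace the `khttpd` kernel module with GDB

## :penguin: Assignment requirements
* Answer every question in the self-check list above, with the corresponding references and the necessary code; first-hand material (including experiments of your own design) is preferred

:::warning
:warning: If you forked [khttpd](https://github.com/sysprog21/khttpd) on GitHub before April 18, 2023, deal with the old repository accordingly and then fork again
:warning: The 2023 requirements differ from those of 2022, so read them carefully!
:::

* Fork [khttpd](https://github.com/sysprog21/khttpd) on GitHub. The goals are:
    1. Introduce [Concurrency Managed Workqueue](https://www.kernel.org/doc/html/v4.15/core-api/workqueue.html) (cmwq) to rewrite kHTTPd, analyze its performance, and propose improvements; see [kecho](https://github.com/sysprog21/kecho)
    2. Provide directory and file access, with a basic [directory listing](https://cwiki.apache.org/confluence/display/httpd/DirectoryListings) feature
    3. kHTTPd currently has a preliminary implementation of [HTTP 1.1 keep-alive](https://en.wikipedia.org/wiki/HTTP_persistent_connection), but it is inefficient; use a tool such as ftrace to pinpoint the problem and fix it
    4. Introduce a timer so that kHTTPd actively closes expired connections
    5. Use RCU together with a lock-free data structure of your own design so that system resources can be released in a concurrent environment
       > See [http-server-rcu](https://github.com/frextrite/http-server-rcu)
    6.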
       Study the memcache of [cserv](https://github.com/sysprog21/cserv) and reimplement it in kHTTPd
* Along the way, also complete the following:
    * Fix the runtime defects of kHTTPd
    * Point out defects in the kHTTPd implementation (especially security concerns) and correct them
    * Benchmark your improved kHTTPd against [cserv](https://github.com/sysprog21/cserv) and explain the differences in behavior

## HTTP Protocol
![](https://hackmd.io/_uploads/Bk9cmzBBh.png)
![](https://hackmd.io/_uploads/BJUWmGBS3.png)

**HTTP packet example:**

## Studying khttpd
### Loading the `khttpd` kernel module
When the khttpd kernel module is loaded, `khttpd_init` runs and does two main things:
* `open_listen_socket`: creates the listening server socket on `DEFAULT_PORT=8081`.
* `kthread_run`: creates a kernel thread that starts running immediately, waits for client connections, and serves the clients.

```c
static int __init khttpd_init(void)
{
    int err = open_listen_socket(port, backlog, &listen_socket);
    if (err < 0) {
        pr_err("can't open listen socket\n");
        return err;
    }
    param.listen_socket = listen_socket;
    http_server = kthread_run(http_server_daemon, &param, KBUILD_MODNAME);
    if (IS_ERR(http_server)) {
        pr_err("can't start http server daemon\n");
        close_listen_socket(listen_socket);
        return PTR_ERR(http_server);
    }
    return 0;
}
```

***open_listen_socket*** creates the TCP listening socket. The local *setsockopt* helper wraps `kernel_setsockopt` so that it can be called in a form similar to the [`setsockopt`](https://man7.org/linux/man-pages/man3/setsockopt.3p.html) system call.

The [socket man page](https://linux.die.net/man/7/socket#) notes that, at the `SOL_SOCKET` level, options can be set and read for all sockets.
> These socket options can be set by using setsockopt(2) and read with getsockopt(2) with the socket level set to SOL_SOCKET for **all** sockets:

Options that are specific to the TCP level, such as `TCP_NODELAY` and `TCP_CORK`, are set separately with `SOL_TCP`.

```c
static inline int setsockopt(struct socket *sock,
                             int level,
                             int optname,
                             int optval)
{
    int opt = optval;
    return kernel_setsockopt(sock, level, optname, (char *) &opt, sizeof(opt));
}

static int open_listen_socket(ushort port, ushort backlog, struct socket **res)
{
    struct socket *sock;
    struct sockaddr_in s;

    int err = sock_create(PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
    if (err < 0) {
        pr_err("sock_create() failure, err=%d\n", err);
        return err;
    }

    err = setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, 1);
    if (err < 0)
        goto bail_setsockopt;

    err = setsockopt(sock, SOL_TCP, TCP_NODELAY, 1);
    if (err < 0)
        goto bail_setsockopt;

    err = setsockopt(sock, SOL_TCP, TCP_CORK, 0);
    if (err < 0)
        goto bail_setsockopt;

    err = setsockopt(sock, SOL_SOCKET, SO_RCVBUF, 1024 * 1024);
    if (err < 0)
        goto bail_setsockopt;

    err = setsockopt(sock, SOL_SOCKET, SO_SNDBUF, 1024 * 1024);
    if (err < 0)
        goto bail_setsockopt;

    memset(&s, 0, sizeof(s));
    s.sin_family = AF_INET;
    s.sin_addr.s_addr = htonl(INADDR_ANY);
    s.sin_port = htons(port);
    err = kernel_bind(sock, (struct sockaddr *) &s, sizeof(s));
    if (err < 0) {
        pr_err("kernel_bind() failure, err=%d\n", err);
        goto bail_sock;
    }

    err = kernel_listen(sock, backlog);
    if (err < 0) {
        pr_err("kernel_listen() failure, err=%d\n", err);
        goto bail_sock;
    }
    *res = sock;
    return 0;

bail_setsockopt:
    pr_err("kernel_setsockopt() failure, err=%d\n", err);
bail_sock:
    sock_release(sock);
    return err;
}
```

- [ ] `SOL_SOCKET` Level

| Setting | Description |
| -------- | -------- |
| SO_REUSEADDR | Allows an address and port that have been used before to be bound again, even if they are still in the `TIME_WAIT` state. This lets a program restart quickly and keep using the same address and port after it terminated abnormally or a network problem left the port improperly closed. |
| SO_RCVBUF | Sets or gets the maximum socket receive buffer in bytes. |
| SO_SNDBUF | Sets or gets the maximum socket send buffer in bytes. |

| SO_REUSEADDR | socketA | socketB | Result |
| -------- | -------- | -------- | -------- |
| ON / OFF | 192.168.1.1:21 | 192.168.1.1:21 | ERROR (EADDRINUSE) |
| ON / OFF | 192.168.1.1:21 | 10.0.1.1:21 | OK |
| ON / OFF | 10.0.1.1:21 | 192.168.1.1:21 | OK |
| OFF | 192.168.1.1:21 | 0.0.0.0:21 | ERROR (EADDRINUSE) |
| OFF | 0.0.0.0:21 | 192.168.1.1:21 | ERROR (EADDRINUSE) |

- [ ] `SOL_TCP` Level

| Setting | Description |
| -------- | -------- |
| TCP_NODELAY | Turn off [Nagle’s algorithm](https://en.wikipedia.org/wiki/Nagle%27s_algorithm) |
| TCP_CORK | Often used together with TCP_NODELAY. To avoid repeatedly sending packets that carry little data (less than the MSS), TCP_CORK accumulates the data and sends it at once in larger packets. |

***http_server_daemon***

Once the socket has been created, a kernel thread is created to run `http_server_daemon`, which waits for client connections.

The function allows two signals, `SIGKILL` and `SIGTERM`, and uses a *while* loop on `kthread_should_stop()` to decide whether the `http_server_daemon` thread has to stop; the loop exits once `kthread_stop` has been called on the thread. Inside the *while* loop, `kernel_accept` accepts connection requests from *clients*. After a connection is established successfully, `kthread_run` creates a new thread that runs `http_server_worker`.

```c
int http_server_daemon(void *arg)
{
    struct socket *socket;
    struct task_struct *worker;
    struct http_server_param *param = (struct http_server_param *) arg;

    allow_signal(SIGKILL);
    allow_signal(SIGTERM);

    while (!kthread_should_stop()) {
        int err = kernel_accept(param->listen_socket, &socket, 0);
        if (err < 0) {
            if (signal_pending(current))
                break;
            pr_err("kernel_accept() error: %d\n", err);
            continue;
        }
        worker = kthread_run(http_server_worker, socket, KBUILD_MODNAME);
        if (IS_ERR(worker)) {
            pr_err("can't create more worker process\n");
            continue;
        }
    }
    return 0;
}
```

`kthread_should_stop` is declared in `<linux/kthread.h>`; its kernel-doc comment and definition (in `kernel/kthread.c`) read:

```c
/**
 * kthread_should_stop - should this kthread return now?
 *
 * When someone calls kthread_stop() on your kthread, it will be woken
 * and this will return true.  You should then return, and your return
 * value will be passed through to kthread_stop().
 */
bool kthread_should_stop(void)
{
    return test_bit(KTHREAD_SHOULD_STOP, &to_kthread(current)->flags);
}
```

### http_server_worker
Every accepted *client* connection is handled by the worker thread function `http_server_worker`, which does the following:
1. Sets up the *callback functions*, allows the signals, and initializes the *parser*.
2. Receives data.
3. Runs `http_parser_execute` to parse the data.
4. Releases all the memory it used once the connection is closed.

```c
static int http_server_worker(void *arg)
{
    char *buf;
    struct http_parser parser;
    struct http_parser_settings setting = {
        .on_message_begin = http_parser_callback_message_begin,
        .on_url = http_parser_callback_request_url,
        .on_header_field = http_parser_callback_header_field,
        .on_header_value = http_parser_callback_header_value,
        .on_headers_complete = http_parser_callback_headers_complete,
        .on_body = http_parser_callback_body,
        .on_message_complete = http_parser_callback_message_complete};
    struct http_request request;
    struct socket *socket = (struct socket *) arg;

    allow_signal(SIGKILL);
    allow_signal(SIGTERM);

    buf = kzalloc(RECV_BUFFER_SIZE, GFP_KERNEL);
    if (!buf) {
        pr_err("can't allocate memory!\n");
        return -1;
    }

    request.socket = socket;
    http_parser_init(&parser, HTTP_REQUEST);
    parser.data = &request;
    while (!kthread_should_stop()) {
        int ret = http_server_recv(socket, buf, RECV_BUFFER_SIZE - 1);
        if (ret <= 0) {
            if (ret)
                pr_err("recv error: %d\n", ret);
            break;
        }
        http_parser_execute(&parser, &setting, buf, ret);
        if (request.complete && !http_should_keep_alive(&parser))
            break;
        memset(buf, 0, RECV_BUFFER_SIZE);
    }
    kernel_sock_shutdown(socket, SHUT_RDWR);
    sock_release(socket);
    kfree(buf);
    return 0;
}
```

#### `http_parser` structure
The `http_parser` structure uses **bit fields** to save space.
[Bit-fields](https://en.cppreference.com/w/c/language/bit_field) gives these examples:
* `unsigned int b : 3` has the range `0..7`
* `signed int b : 3` has the range `-4..3`
* `int b : 3`: note that the signedness of a plain `int` bit-field is **implementation-defined**, that is, it depends on the **compiler**, so the range may be either `0..7` or `-4..3`

```c
struct http_parser {
    /** PRIVATE **/
    unsigned int type : 2;         /* enum http_parser_type */
    unsigned int flags : 8;        /* F_* values from 'flags' enum; semi-public */
    unsigned int state : 7;        /* enum state from http_parser.c */
    unsigned int header_state : 7; /* enum header_state from http_parser.c */
    unsigned int index : 5;        /* index into current matcher */
    unsigned int uses_transfer_encoding : 1; /* Transfer-Encoding header is present */
    unsigned int allow_chunked_length : 1;   /* Allow headers with both
                                              * `Content-Length` and
                                              * `Transfer-Encoding: chunked` set */
    unsigned int lenient_http_headers : 1;

    uint32_t nread;          /* # bytes read in various scenarios */
    uint64_t content_length; /* # bytes in body. `(uint64_t) -1` (all bits one)
                              * if no Content-Length header.
                              */

    /** READ-ONLY **/
    unsigned short http_major;
    unsigned short http_minor;
    unsigned int status_code : 16; /* responses only */
    unsigned int method : 8;       /* requests only */
    unsigned int http_errno : 7;

    /* 1 = Upgrade header was present and the parser has exited because of that.
     * 0 = No upgrade header present.
     * Should be checked when http_parser_execute() returns in addition to
     * error checking.
     */
    unsigned int upgrade : 1;

    /** PUBLIC **/
    void *data; /* A pointer to get hook to the "connection" or "socket" object */
};
```

#### Setting the *callback functions*
`http_parser.h` declares the function pointer types and defines the structure that holds the parser callbacks:

```c
typedef int (*http_data_cb) (http_parser*, const char *at, size_t length);
typedef int (*http_cb) (http_parser*);

struct http_parser_settings {
    http_cb      on_message_begin;
    http_data_cb on_url;
    http_data_cb on_status;
    http_data_cb on_header_field;
    http_data_cb on_header_value;
    http_cb      on_headers_complete;
    http_data_cb on_body;
    http_cb      on_message_complete;
    /* When on_chunk_header is called, the current chunk length is stored
     * in parser->content_length.
     */
    http_cb      on_chunk_header;
    http_cb      on_chunk_complete;
};
```

[Cprogramming.com](https://www.cprogramming.com/tutorial/function-pointers.html) says:
> A function pointer is a variable that stores the address of a function that can later be called through that function pointer.

It also gives the following examples:

Declare a function pointer as though you were declaring a function, except with a name like *foo instead of just foo:
```c
void (*foo)(int);
```

**Initializing**

You can get the address of a function simply by naming it:
```c
void foo();
func_pointer = foo;
```
or by prefixing the name of the function with an ampersand:
```c
void foo();
func_pointer = &foo;
```

When the callback functions are assigned, the function name works with or without `&`:

```c
struct http_parser_settings setting = {
    .on_message_begin = http_parser_callback_message_begin,
    .on_url = http_parser_callback_request_url,
    .on_header_field = http_parser_callback_header_field,
    .on_header_value = http_parser_callback_header_value,
    .on_headers_complete = http_parser_callback_headers_complete,
    .on_body = http_parser_callback_body,
    .on_message_complete = http_parser_callback_message_complete};
```

#### `http_server_recv`: receiving data
```c
static int http_server_recv(struct socket *sock, char *buf, size_t size)
{
    struct kvec iov = {.iov_base = (void *) buf, .iov_len = size};
    struct msghdr msg = {.msg_name = 0,
                         .msg_namelen = 0,
                         .msg_control = NULL,
                         .msg_controllen = 0,
                         .msg_flags = 0};
    return kernel_recvmsg(sock, &msg, &iov, 1, size, msg.msg_flags);
}
```

#### `http_parser_execute`: parsing data
[ftrace](https://docs.kernel.org/trace/ftrace.html) is used here to see what `http_parser_execute` does. The relevant control files are:
* `tracing_on`: turns tracing on or off. Writing 1 enables tracing and ftrace starts recording trace events; writing 0 disables tracing and ftrace stops recording.
* `set_graph_function`: restricts function-graph tracing to the listed functions and the functions they call.
* `current_tracer`: when set to `function_graph`, the output shows the call relationships between functions, which is easier to follow.
* `max_graph_depth`: This is the max depth it will trace into a function.
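
The control files listed above are ordinary files, so they can also be written from a user-space C program instead of a shell. The following is only a minimal sketch under stated assumptions: tracefs is mounted at `/sys/kernel/debug/tracing`, the program runs as root, and `trace_write` and `trace_setup_graph` are hypothetical helper names that are not part of khttpd or ftrace.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `value` into the control file `dir`/`name`.
 * Opening with "w" truncates the file, like `echo value > file` in a shell.
 * Returns 0 on success and -1 on failure. */
int trace_write(const char *dir, const char *name, const char *value)
{
    char path[256];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;

    FILE *fp = fopen(path, "w");
    if (!fp)
        return -1;

    int ok = fputs(value, fp) >= 0;
    /* Buffered data reaches the kernel on fclose(), so check its result. */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: configure function_graph tracing for one function.
 * `dir` is the tracefs directory; `func` and `depth` are the values written
 * to set_graph_function and max_graph_depth. Tracing is left switched off. */
int trace_setup_graph(const char *dir, const char *func, const char *depth)
{
    if (trace_write(dir, "tracing_on", "0") < 0)
        return -1;
    if (trace_write(dir, "current_tracer", "function_graph") < 0)
        return -1;
    if (trace_write(dir, "max_graph_depth", depth) < 0)
        return -1;
    if (trace_write(dir, "set_graph_function", func) < 0)
        return -1;
    return 0;
}
```

Under those assumptions, calling `trace_setup_graph("/sys/kernel/debug/tracing", "http_server_worker", "3")` and then `trace_write("/sys/kernel/debug/tracing", "tracing_on", "1")` would arm the tracer before a request is sent. Keeping the directory as a parameter also lets the helpers be exercised against a scratch directory without root.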

Run the script below. It uses `wget` to download the page, which sends an HTTP **GET** request.

GET request:
```shell
GET / HTTP/1.1
User-Agent: wget/1.20.3 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: localhost:8081
Connection: Keep-Alive
```

1. `GET / HTTP/1.1`: the request line. It asks for the resource at the root path (/) using HTTP/1.1.
2. `User-Agent: wget/1.20.3 (linux-gnu)`: the User-Agent header field identifies the client software and version that issued the request. Here it is wget 1.20.3 running on a Linux system (linux-gnu).
3. `Accept: */*`: the Accept header field tells the server which response content types the client can accept. `*/*` means any content type is accepted.
4. `Accept-Encoding: identity`: the Accept-Encoding header field tells the server which content encodings the client supports. Here only "identity" is accepted, which means no encoding is applied.
5. `Host: localhost:8081`: the Host header field names the target host and port of the request, here host localhost and port 8081.
6. `Connection: Keep-Alive`: the Connection header field controls how the connection is handled. Keep-Alive means the client wants to keep the connection to the server open so that it can be reused for later requests.

```shell
#!/bin/bash
TRACE_DIR=/sys/kernel/debug/tracing

# clear
echo 0 > $TRACE_DIR/tracing_on
echo > $TRACE_DIR/set_graph_function
echo > $TRACE_DIR/set_ftrace_filter
echo nop > $TRACE_DIR/current_tracer

# setting
echo function_graph > $TRACE_DIR/current_tracer
echo 3 > $TRACE_DIR/max_graph_depth
echo http_server_worker > $TRACE_DIR/set_graph_function

# execute
echo 1 > $TRACE_DIR/tracing_on
wget localhost:8081
echo 0 > $TRACE_DIR/tracing_on
```

Output:
```shell
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 15)               |  http_server_worker [khttpd]() {
 15)               |    kernel_sigaction() {
 15)   0.127 us    |      _raw_spin_lock_irq();
 15)   0.492 us    |    }
 15)               |    kernel_sigaction() {
 15)   0.317 us    |      _raw_spin_lock_irq();
 15)   0.509 us    |    }
 15)               |    kmem_cache_alloc_trace() {
 15)   0.146 us    |      __cond_resched();
 15)   0.117 us    |      should_failslab();
 15)   0.878 us    |    }
 15)   0.109 us    |    http_parser_init [khttpd]();
 15)   0.112 us    |    kthread_should_stop();
 15)               |    http_server_recv.constprop.0 [khttpd]() {
 15)   3.096 us    |      kernel_recvmsg();
 15)   3.353 us    |    }
 15)               |    http_parser_execute [khttpd]() {
 15)   0.112 us    |      http_parser_callback_message_begin [khttpd]();
 15)   0.194 us    |      parse_url_char [khttpd]();
 15)   0.164 us    |      http_parser_callback_request_url [khttpd]();
 15)   0.160 us    |      http_parser_callback_header_field [khttpd]();
 15)   0.117 us    |      http_parser_callback_header_value [khttpd]();
 15)   0.122 us    |      http_parser_callback_header_field [khttpd]();
 15)   0.095 us    |      http_parser_callback_header_value [khttpd]();
 15)   0.096 us    |      http_parser_callback_header_field [khttpd]();
 15)   0.097 us    |      http_parser_callback_header_value [khttpd]();
 15)   0.106 us    |      http_parser_callback_header_field [khttpd]();
 15)   0.117 us    |      http_parser_callback_header_value [khttpd]();
 15)   0.095 us    |      http_parser_callback_header_field [khttpd]();
 15)   0.098 us    |      http_parser_callback_header_value [khttpd]();
 15)   0.098 us    |      http_parser_callback_headers_complete [khttpd]();
 15)   0.099 us    |      http_should_keep_alive [khttpd]();
 15) + 33.231 us   |      http_parser_callback_message_complete [khttpd]();
 15) + 41.377 us   |    }
 15)   0.142 us    |    http_should_keep_alive [khttpd]();
 15)   0.133 us    |    kthread_should_stop();
 15)               |    http_server_recv.constprop.0 [khttpd]() {
 15) ! 897.951 us  |      kernel_recvmsg();
 15) ! 898.491 us  |    }
 15)               |    _printk() {
 15)   3.394 us    |      vprintk();
 15)   3.598 us    |    }
 15)               |    kernel_sock_shutdown() {
 15)   0.455 us    |      inet_shutdown();
 15)   0.629 us    |    }
 15)               |    sock_release() {
 15)   5.292 us    |      __sock_release();
 15)   5.449 us    |    }
 15)   0.152 us    |    kfree();
 15) ! 957.790 us  |  }
```

The `parser` is initialized first. Its type is **HTTP_REQUEST**, so `parser->state` starts as **s_start_req**:

```c
void
http_parser_init (http_parser *parser, enum http_parser_type t)
{
  void *data = parser->data; /* preserve application data */
  memset(parser, 0, sizeof(*parser));
  parser->data = data;
  parser->type = t;
  parser->state = (t == HTTP_REQUEST ? s_start_req : (t == HTTP_RESPONSE ? s_start_res : s_start_req_or_res));
  parser->http_errno = HPE_OK;
}
```

Next, `http_server_recv` waits for the client to send data. Once data has arrived, `http_parser_execute()` runs.
`http_parser.c` defines a macro `CURRENT_STATE` that stands for **p_state** (a local copy of `parser->state` kept inside `http_parser_execute`), which makes the code easier to read:

```c
#define CURRENT_STATE() p_state
```

A for loop then parses the data byte by byte. The case for `s_start_req` is shown below:

```c
for (p=data; p != data + len; p++) {
  ch = *p;
  ...
  switch (CURRENT_STATE()) {
    ...
    case s_start_req:
    {
      if (ch == CR || ch == LF)
        break;
      parser->flags = 0;
      parser->uses_transfer_encoding = 0;
      parser->content_length = ULLONG_MAX;

      if (UNLIKELY(!IS_ALPHA(ch))) {
        SET_ERRNO(HPE_INVALID_METHOD);
        goto error;
      }

      parser->method = (enum http_method) 0;
      parser->index = 1;
      switch (ch) {
        case 'A': parser->method = HTTP_ACL; break;
        case 'B': parser->method = HTTP_BIND; break;
        case 'C': parser->method = HTTP_CONNECT; /* or COPY, CHECKOUT */ break;
        case 'D': parser->method = HTTP_DELETE; break;
        case 'G': parser->method = HTTP_GET; break;
        case 'H': parser->method = HTTP_HEAD; break;
        case 'L': parser->method = HTTP_LOCK; /* or LINK */ break;
        case 'M': parser->method = HTTP_MKCOL; /* or MOVE, MKACTIVITY, MERGE, M-SEARCH, MKCALENDAR */ break;
        case 'N': parser->method = HTTP_NOTIFY; break;
        case 'O': parser->method = HTTP_OPTIONS; break;
        case 'P': parser->method = HTTP_POST;
          /* or PROPFIND|PROPPATCH|PUT|PATCH|PURGE */
          break;
        case 'R': parser->method = HTTP_REPORT; /* or REBIND */ break;
        case 'S': parser->method = HTTP_SUBSCRIBE; /* or SEARCH, SOURCE */ break;
        case 'T': parser->method = HTTP_TRACE; break;
        case 'U': parser->method = HTTP_UNLOCK; /* or UNSUBSCRIBE, UNBIND, UNLINK */ break;
        default:
          SET_ERRNO(HPE_INVALID_METHOD);
          goto error;
      }
      UPDATE_STATE(s_req_method);

      CALLBACK_NOTIFY(message_begin);

      break;
    }
    ...
  }
}
```

`parser->method` is assigned according to `ch`, then the state `p_state` is updated to **s_req_method**, and finally the callback function **http_parser_callback_message_begin** is run.

**How is the callback function executed?**

```c
/* Run the notify callback FOR, returning ER if it fails */
#define CALLBACK_NOTIFY_(FOR, ER)                                    \
do {                                                                 \
  assert(HTTP_PARSER_ERRNO(parser) == HPE_OK);                       \
                                                                     \
  if (LIKELY(settings->on_##FOR)) {                                  \
    parser->state = CURRENT_STATE();                                 \
    if (UNLIKELY(0 != settings->on_##FOR(parser))) {                 \
      SET_ERRNO(HPE_CB_##FOR);                                       \
    }                                                                \
    UPDATE_STATE(parser->state);                                     \
                                                                     \
    /* We either errored above or got paused; get out */             \
    if (UNLIKELY(HTTP_PARSER_ERRNO(parser) != HPE_OK)) {             \
      return (ER);                                                   \
    }                                                                \
  }                                                                  \
} while (0)

/* Run the notify callback FOR and consume the current byte */
#define CALLBACK_NOTIFY(FOR)            CALLBACK_NOTIFY_(FOR, p - data + 1)
```

Substituting `message_begin` for `FOR`, `CALLBACK_NOTIFY(message_begin)` expands to:

```c
do {
  assert(HTTP_PARSER_ERRNO(parser) == HPE_OK);

  if (LIKELY(settings->on_message_begin)) {
    parser->state = CURRENT_STATE();
    if (UNLIKELY(0 != settings->on_message_begin(parser))) {
      SET_ERRNO(HPE_CB_message_begin);
    }
    UPDATE_STATE(parser->state);

    /* We either errored above or got paused; get out */
    if (UNLIKELY(HTTP_PARSER_ERRNO(parser) != HPE_OK)) {
      return (p - data + 1);
    }
  }
} while (0)
```

Because `settings` is passed to `http_parser_execute` as one of its arguments, the call `settings->on_message_begin(parser)` in the expansion invokes the callback function defined in `http_server.c`:

```c
static int http_parser_callback_message_begin(http_parser *parser)
{
    struct http_request *request = parser->data;
    struct socket *socket = request->socket;
    memset(request, 0x00, sizeof(struct http_request));
    request->socket = socket;
    return 0;
}
```

At this point `p_state` has been changed to **s_req_method** and the **http_parser_callback_message_begin** callback has run, so the next iteration enters the **s_req_method** case:

```c
for (p=data; p != data + len; p++) {
  ch = *p;
  ...
  switch (CURRENT_STATE()) {
    ...
    case s_req_method:
    {
      const char *matcher;
      if (UNLIKELY(ch == '\0')) {
        SET_ERRNO(HPE_INVALID_METHOD);
        goto error;
      }

      matcher = method_strings[parser->method];
      if (ch == ' ' && matcher[parser->index] == '\0') {
        UPDATE_STATE(s_req_spaces_before_url);
      } else if (ch == matcher[parser->index]) {
        ; /* nada */
      } else if ((ch >= 'A' && ch <= 'Z') || ch == '-') {

        switch (parser->method << 16 | parser->index << 8 | ch) {
#define XX(meth, pos, ch, new_meth) \
          case (HTTP_##meth << 16 | pos << 8 | ch): \
            parser->method = HTTP_##new_meth; break;

          XX(POST,      1, 'U', PUT)
          XX(POST,      1, 'A', PATCH)
          XX(POST,      1, 'R', PROPFIND)
          XX(PUT,       2, 'R', PURGE)
          XX(CONNECT,   1, 'H', CHECKOUT)
          XX(CONNECT,   2, 'P', COPY)
          XX(MKCOL,     1, 'O', MOVE)
          XX(MKCOL,     1, 'E', MERGE)
          XX(MKCOL,     1, '-', MSEARCH)
          XX(MKCOL,     2, 'A', MKACTIVITY)
          XX(MKCOL,     3, 'A', MKCALENDAR)
          XX(SUBSCRIBE, 1, 'E', SEARCH)
          XX(SUBSCRIBE, 1, 'O', SOURCE)
          XX(REPORT,    2, 'B', REBIND)
          XX(PROPFIND,  4, 'P', PROPPATCH)
          XX(LOCK,      1, 'I', LINK)
          XX(UNLOCK,    2, 'S', UNSUBSCRIBE)
          XX(UNLOCK,    2, 'B', UNBIND)
          XX(UNLOCK,    3, 'I', UNLINK)
#undef XX
          default:
            SET_ERRNO(HPE_INVALID_METHOD);
            goto error;
        }
      } else {
        SET_ERRNO(HPE_INVALID_METHOD);
        goto error;
      }

      ++parser->index;
      break;
    }
    ...
  }
}
```

**method_strings[parser->method]**

```c
static const char *method_strings[] =
  {
#define XX(num, name, string) #string,
  HTTP_METHOD_MAP(XX)
#undef XX
  };
```

```c
/* Request Methods */
#define HTTP_METHOD_MAP(XX)         \
  XX(0,  DELETE,      DELETE)       \
  XX(1,  GET,         GET)          \
  XX(2,  HEAD,        HEAD)         \
  XX(3,  POST,        POST)         \
  XX(4,  PUT,         PUT)          \
  /* pathological */                \
  XX(5,  CONNECT,     CONNECT)      \
  XX(6,  OPTIONS,     OPTIONS)      \
  XX(7,  TRACE,       TRACE)        \
  /* WebDAV */                      \
  XX(8,  COPY,        COPY)         \
  XX(9,  LOCK,        LOCK)         \
  XX(10, MKCOL,       MKCOL)        \
  XX(11, MOVE,        MOVE)         \
  XX(12, PROPFIND,    PROPFIND)     \
  XX(13, PROPPATCH,   PROPPATCH)    \
  XX(14, SEARCH,      SEARCH)       \
  XX(15, UNLOCK,      UNLOCK)       \
  XX(16, BIND,        BIND)         \
  XX(17, REBIND,      REBIND)       \
  XX(18, UNBIND,      UNBIND)       \
  XX(19, ACL,         ACL)          \
  /* subversion */                  \
  XX(20, REPORT,      REPORT)       \
  XX(21, MKACTIVITY,  MKACTIVITY)   \
  XX(22, CHECKOUT,    CHECKOUT)     \
  XX(23, MERGE,       MERGE)        \
  /* upnp */                        \
  XX(24, MSEARCH,     M-SEARCH)     \
  XX(25, NOTIFY,      NOTIFY)       \
  XX(26, SUBSCRIBE,   SUBSCRIBE)    \
  XX(27, UNSUBSCRIBE, UNSUBSCRIBE)  \
  /* RFC-5789 */                    \
  XX(28, PATCH,       PATCH)        \
  XX(29, PURGE,       PURGE)        \
  /* CalDAV */                      \
  XX(30, MKCALENDAR,  MKCALENDAR)   \
  /* RFC-2068, section 19.6.1.2 */  \
  XX(31, LINK,        LINK)         \
  XX(32, UNLINK,      UNLINK)       \
  /* icecast */                     \
  XX(33, SOURCE,      SOURCE)       \
```

* `XX` is the macro name
* `(num, name, string)` are the macro's parameters
* `#string,` is the macro's replacement text; the `#` operator turns the `string` argument into a string literal, and the trailing comma separates the array elements.
* Overall, the purpose of this macro is to turn the `string` argument into a string literal.
* `HTTP_METHOD_MAP(XX)` is another macro
* `(XX)` passes the `XX` macro as an argument to the `HTTP_METHOD_MAP` macro.
* `XX` is then expanded once for every entry while `HTTP_METHOD_MAP` itself is being expanded.

As a result, the expansion of `HTTP_METHOD_MAP(XX)` applies `XX` to every entry and turns each `string` argument into a string literal, producing a sequence of string constants.

The `method_strings` array therefore expands to:

```c
static const char *method_strings[] =
  {
  "DELETE",
  "GET",
  "HEAD",
  ...
  };
```

So `method_strings[parser->method]` is **"GET"**. At this point `ch` is `E` and `parser->index` is `1`. The method is matched character by character: each iteration advances `p` by one and increments `parser->index`, until `ch` is `' '` and `matcher[parser->index]` is `'\0'`, at which point `p_state` is changed to `s_req_spaces_before_url`.

Now `CURRENT_STATE()` is `s_req_spaces_before_url`, and `parse_url_char` is called with the following arguments:
* `CURRENT_STATE()`: **`s_req_spaces_before_url`**
* `ch`: **`/`**

```c
for (p=data; p != data + len; p++) {
  ch = *p;
  ...
  switch (CURRENT_STATE()) {
    ...
    case s_req_spaces_before_url:
    {
      if (ch == ' ') break;

      MARK(url);
      if (parser->method == HTTP_CONNECT) {
        UPDATE_STATE(s_req_server_start);
      }

      UPDATE_STATE(parse_url_char(CURRENT_STATE(), ch));
      if (UNLIKELY(CURRENT_STATE() == s_dead)) {
        SET_ERRNO(HPE_INVALID_URL);
        goto error;
      }

      break;
    }
    ...
  }
}
```

```c
static enum state
parse_url_char(enum state s, const char ch)
{
  if (ch == ' ' || ch == '\r' || ch == '\n') {
    return s_dead;
  }

#if HTTP_PARSER_STRICT
  if (ch == '\t' || ch == '\f') {
    return s_dead;
  }
#endif

  switch (s) {
    case s_req_spaces_before_url:
      /* Proxied requests are followed by scheme of an absolute URI (alpha).
       * All methods except CONNECT are followed by '/' or '*'.
       */

      if (ch == '/' || ch == '*') {
        return s_req_path;
      }

      if (IS_ALPHA(ch)) {
        return s_req_schema;
      }

      break;
    ...
  }
  return s_dead;
}
```

`p_state` is updated to `s_req_path`, so the next character is handled by the `s_req_path` case. When the following `' '` is read, `CURRENT_STATE` is updated to `s_req_http_start` and the **http_parser_callback_request_url** callback function is run:

```c
for (p=data; p != data + len; p++) {
  ch = *p;
  ...
  switch (CURRENT_STATE()) {
    ...
    case s_req_server:
    case s_req_server_with_at:
    case s_req_path:
    case s_req_query_string_start:
    case s_req_query_string:
    case s_req_fragment_start:
    case s_req_fragment:
    {
      switch (ch) {
        case ' ':
          UPDATE_STATE(s_req_http_start);
          CALLBACK_DATA(url);
          break;
        case CR:
        case LF:
          parser->http_major = 0;
          parser->http_minor = 9;
          UPDATE_STATE((ch == CR) ?
            s_req_line_almost_done :
            s_header_field_start);
          CALLBACK_DATA(url);
          break;
        default:
          UPDATE_STATE(parse_url_char(CURRENT_STATE(), ch));
          if (UNLIKELY(CURRENT_STATE() == s_dead)) {
            SET_ERRNO(HPE_INVALID_URL);
            goto error;
          }
      }
      break;
    }
    ...
  }
}
```

The `CALLBACK_DATA_` macro is shown below with `FOR` already replaced by `url` in its body:

```c
/* Run data callback FOR with LEN bytes, returning ER if it fails */
#define CALLBACK_DATA_(FOR, LEN, ER)                                 \
do {                                                                 \
  assert(HTTP_PARSER_ERRNO(parser) == HPE_OK);                       \
                                                                     \
  if (url_mark) {                                                    \
    if (LIKELY(settings->on_url)) {                                  \
      parser->state = CURRENT_STATE();                               \
      if (UNLIKELY(0 !=                                              \
                   settings->on_url(parser, url_mark, (LEN)))) {     \
        SET_ERRNO(HPE_CB_url);                                       \
      }                                                              \
      UPDATE_STATE(parser->state);                                   \
                                                                     \
      /* We either errored above or got paused; get out */           \
      if (UNLIKELY(HTTP_PARSER_ERRNO(parser) != HPE_OK)) {           \
        return (ER);                                                 \
      }                                                              \
    }                                                                \
    url_mark = NULL;                                                 \
  }                                                                  \
} while (0)

/* Run the data callback FOR and consume the current byte */
#define CALLBACK_DATA(FOR)                                           \
    CALLBACK_DATA_(FOR, p - FOR##_mark, p - data + 1)
```

```c
static int http_parser_callback_request_url(http_parser *parser,
                                            const char *p,
                                            size_t len)
{
    struct http_request *request = parser->data;
    strncat(request->request_url, p, len);
    return 0;
}
```

At this point `ch` is `' '`. The next `ch` is `H` and `CURRENT_STATE` is `s_req_http_start`, so **`HTTP/1.1\r\n`** is analyzed step by step:

```c
for (p=data; p != data + len; p++) {
  ch = *p;
  ...
  switch (CURRENT_STATE()) {
    ...
    case s_req_http_start:
      switch (ch) {
        case ' ':
          break;
        case 'H':
          UPDATE_STATE(s_req_http_H);
          break;
        case 'I':
          if (parser->method == HTTP_SOURCE) {
            UPDATE_STATE(s_req_http_I);
            break;
          }
          /* fall through */
        default:
          SET_ERRNO(HPE_INVALID_CONSTANT);
          goto error;
      }
      break;

    case s_req_http_H:
      STRICT_CHECK(ch != 'T');
      UPDATE_STATE(s_req_http_HT);
      break;

    case s_req_http_HT:
      STRICT_CHECK(ch != 'T');
      UPDATE_STATE(s_req_http_HTT);
      break;

    case s_req_http_HTT:
      STRICT_CHECK(ch != 'P');
      UPDATE_STATE(s_req_http_HTTP);
      break;

    case s_req_http_HTTP:
      STRICT_CHECK(ch != '/');
      UPDATE_STATE(s_req_http_major);
      break;

    case s_req_http_major:
      if (UNLIKELY(!IS_NUM(ch))) {
        SET_ERRNO(HPE_INVALID_VERSION);
        goto error;
      }

      parser->http_major = ch - '0';
      UPDATE_STATE(s_req_http_dot);
      break;

    case s_req_http_dot:
    {
      if (UNLIKELY(ch != '.')) {
        SET_ERRNO(HPE_INVALID_VERSION);
        goto error;
      }

      UPDATE_STATE(s_req_http_minor);
      break;
    }

    case s_req_http_minor:
      if (UNLIKELY(!IS_NUM(ch))) {
        SET_ERRNO(HPE_INVALID_VERSION);
        goto error;
      }

      parser->http_minor = ch - '0';
      UPDATE_STATE(s_req_http_end);
      break;

    case s_req_http_end:
    {
      if (ch == CR) {
        UPDATE_STATE(s_req_line_almost_done);
        break;
      }

      if (ch == LF) {
        UPDATE_STATE(s_header_field_start);
        break;
      }

      SET_ERRNO(HPE_INVALID_VERSION);
      goto error;
      break;
    }

    /* end of request line */
    case s_req_line_almost_done:
    {
      if (UNLIKELY(ch != LF)) {
        SET_ERRNO(HPE_LF_EXPECTED);
        goto error;
      }

      UPDATE_STATE(s_header_field_start);
      break;
    }
    ...
  }
}
```

The request line **`GET / HTTP/1.1\r\n`** has now been parsed completely. Next come the header fields:

```shell
GET / HTTP/1.1
User-Agent: wget/1.20.3 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: localhost:8081
Connection: Keep-Alive
```

```c
for (p=data; p != data + len; p++) {
  ch = *p;
  ...
  switch (CURRENT_STATE()) {
    ...
    case s_header_field_start:
    {
      if (ch == CR) {
        UPDATE_STATE(s_headers_almost_done);
        break;
      }

      if (ch == LF) {
        /* they might be just sending \n instead of \r\n so this would be
         * the second \n to denote the end of headers*/
        UPDATE_STATE(s_headers_almost_done);
        REEXECUTE();
      }

      c = TOKEN(ch);

      if (UNLIKELY(!c)) {
        SET_ERRNO(HPE_INVALID_HEADER_TOKEN);
        goto error;
      }

      MARK(header_field);

      parser->index = 0;
      UPDATE_STATE(s_header_field);

      switch (c) {
        case 'c':
          parser->header_state = h_C;
          break;

        case 'p':
          parser->header_state = h_matching_proxy_connection;
          break;

        case 't':
          parser->header_state = h_matching_transfer_encoding;
          break;

        case 'u':
          parser->header_state = h_matching_upgrade;
          break;

        default:
          parser->header_state = h_general;
          break;
      }
      break;
    }
    ...
  }
}
```

There are 5 header fields in total:
* `User-Agent: wget/1.20.3 (linux-gnu)`
* `Accept: */*`
* `Accept-Encoding: identity`
* `Host: localhost:8081`
* `Connection: Keep-Alive`

In the `s_header_field_start` case, `parser->header_state` is chosen from the first character of each field, namely:
* `ch` = `'U'`, `c` = `'u'`
* `ch` = `'A'`, `c` = `'a'`
* `ch` = `'A'`, `c` = `'a'`
* `ch` = `'H'`, `c` = `'h'`
* `ch` = `'C'`, `c` = `'c'`

The following characters are then parsed one by one, which is why `CURRENT_STATE()` is updated to `s_header_field`.

Take the last field, **Connection: Keep-Alive**, as an example. By the time `ch` is `:`, `parser->header_state` has already been determined to be `h_connection`, so `CURRENT_STATE()` is updated to `s_header_value_discard_ws`:

```c
    case s_header_value_discard_ws:
      if (ch == ' ' || ch == '\t') break;

      if (ch == CR) {
        UPDATE_STATE(s_header_value_discard_ws_almost_done);
        break;
      }

      if (ch == LF) {
        UPDATE_STATE(s_header_value_discard_lws);
        break;
      }
```

The current `ch` is `' '`, so the first `if` of the `s_header_value_discard_ws` case holds and the character is skipped. The next `ch` is `K`, which falls through directly to the **s_header_value_start** case. That case begins by updating `CURRENT_STATE()` to `s_header_value` so that the rest of the value can be parsed:

```c
    case s_header_value_start:
    {
      MARK(header_value);

      UPDATE_STATE(s_header_value);
      parser->index = 0;

      c = LOWER(ch);

      switch (parser->header_state) {
        case h_upgrade:
          parser->flags |= F_UPGRADE;
          parser->header_state = h_general;
          break;

        case h_transfer_encoding:
          /* looking for 'Transfer-Encoding: chunked' */
          if ('c' == c) {
            parser->header_state = h_matching_transfer_encoding_chunked;
          } else {
            parser->header_state = h_matching_transfer_encoding_token;
          }
          break;

        /* Multi-value `Transfer-Encoding` header */
        case h_matching_transfer_encoding_token_start:
          break;

        case h_content_length:
          if (UNLIKELY(!IS_NUM(ch))) {
            SET_ERRNO(HPE_INVALID_CONTENT_LENGTH);
            goto error;
          }

          if (parser->flags & F_CONTENTLENGTH) {
            SET_ERRNO(HPE_UNEXPECTED_CONTENT_LENGTH);
            goto error;
          }

          parser->flags |= F_CONTENTLENGTH;
          parser->content_length = ch - '0';
          parser->header_state = h_content_length_num;
          break;

        /* when obsolete line folding is encountered for content length
         * continue to the s_header_value state */
        case h_content_length_ws:
          break;

        case h_connection:
          /* looking for 'Connection: keep-alive' */
          if (c == 'k') {
            parser->header_state = h_matching_connection_keep_alive;
          /* looking for 'Connection: close' */
          } else if (c == 'c') {
            parser->header_state = h_matching_connection_close;
          } else if (c == 'u') {
            parser->header_state = h_matching_connection_upgrade;
          } else {
            parser->header_state = h_matching_connection_token;
          }
          break;

        /* Multi-value `Connection` header */
        case h_matching_connection_token_start:
          break;

        default:
          parser->header_state = h_general;
          break;
      }
      break;
    }
```

In the `h_connection` branch above, `c` is `'k'`, so `parser->header_state` is updated to `h_matching_connection_keep_alive`. Once the whole value has been matched, `parser->header_state` becomes `h_connection_keep_alive`, and the `s_header_value_lws` case adds `F_CONNECTION_KEEP_ALIVE` to `parser->flags`:

```c
    case s_header_value_lws:
    {
      if (ch == ' ' || ch == '\t') {
        if (parser->header_state == h_content_length_num) {
          /* treat obsolete line folding as space */
          parser->header_state = h_content_length_ws;
        }
        UPDATE_STATE(s_header_value_start);
        REEXECUTE();
      }

      /* finished the header */
      switch (parser->header_state) {
        case h_connection_keep_alive:
          parser->flags |= F_CONNECTION_KEEP_ALIVE;
          break;
        case h_connection_close:
          parser->flags |= F_CONNECTION_CLOSE;
          break;
        case h_transfer_encoding_chunked:
          parser->flags |= F_CHUNKED;
          break;
        case h_connection_upgrade:
          parser->flags |= F_CONNECTION_UPGRADE;
          break;
        default:
          break;
      }

      UPDATE_STATE(s_header_field_start);
      REEXECUTE();
    }
```

When the whole message has been parsed, two final cases are reached: `s_headers_almost_done` and `s_headers_done`. The `s_headers_done` case evaluates `http_should_keep_alive(parser)` to pick the next value of `CURRENT_STATE()` (it becomes `s_dead` only when the connection should not be kept alive) and then runs the **message_complete** callback function:

```c
static int http_parser_callback_message_complete(http_parser *parser)
{
    struct http_request *request = parser->data;
    http_server_response(request, http_should_keep_alive(parser));
    request->complete = 1;
    return 0;
}
```

Back in the worker, the check `request.complete && !http_should_keep_alive(&parser)` leaves the while loop when the client did not ask for keep-alive. In the trace above the request carries `Connection: Keep-Alive`, so the worker calls `http_server_recv` once more and leaves the loop through the `ret <= 0` branch after `wget` closes the connection.

```c
static void http_server_worker(struct work_struct *work)
{
    ...
    while (!daemon.is_stopped) {
        int ret;
        ret = http_server_recv(socket, buf, RECV_BUFFER_SIZE - 1);
        if (ret <= 0) {
            pr_err("read error: %d\n", ret);
            break;
        }
        http_parser_execute(&parser, &setting, buf, ret);
        if (request.complete && !http_should_keep_alive(&parser))
            break;
        memset(buf, 0, RECV_BUFFER_SIZE);
    }
    ...
} ```

## Rewriting khttpd with [CMWQ](https://www.kernel.org/doc/html/v4.15/core-api/workqueue.html)

The overall flow is: create the workqueue → create a work item once a connection is established → let the workqueue run it → release all memory.

In `main.c`, create a dedicated workqueue of type `struct workqueue_struct *`:

```diff=
  static struct socket *listen_socket;
  static struct http_server_param param;
  static struct task_struct *http_server;
+ struct workqueue_struct *http_server_wq;
```

The workqueue is created when the module is loaded, in `khttpd_init`. The modified part is shown below:

```diff=
  static int __init khttpd_init(void)
  {
      int err = open_listen_socket(port, backlog, &listen_socket);
      if (err < 0) {
          pr_err("can't open listen socket\n");
          return err;
      }
      param.listen_socket = listen_socket;
+     http_server_wq = alloc_workqueue(MODULE_NAME, 0, 0);
      http_server = kthread_run(http_server_daemon, &param, KBUILD_MODNAME);
      if (IS_ERR(http_server)) {
          pr_err("can't start http server daemon\n");
          close_listen_socket(listen_socket);
          return PTR_ERR(http_server);
      }
      return 0;
  }
```

To manage work items effectively, every work item is added to a linked list. Add the following structure to `http_server.h`:

```c=
struct httpd_service {
    bool is_stopped;
    struct list_head worker_list;
};
```

This structure serves as the head of the linked list. Its `is_stopped` member indicates whether a signal to stop serving has been received.

Next, add the `khttpd` structure:

```c=
struct khttpd {
    struct socket *sock;
    struct list_head list;
    struct work_struct khttpd_work;
};
```

In `http_server_daemon` in `http_server.c`, `create_work()` creates a work item once a connection between server and client has been established. `queue_work()` then adds the work item to `http_server_wq`, so that a workqueue worker handles the request. On shutdown, `free_work()` releases all allocated memory.

To make comparison with the original `kthread_run` version easy, a `CMWQ_MODE` compile-time switch is added. ```diff= + #define CMWQ_MODE 1 + extern struct workqueue_struct *http_server_wq; + struct httpd_service daemon = {.is_stopped = false}; int http_server_daemon(void *arg) { struct socket *client_socket; struct http_server_param *param = (struct http_server_param *) arg; + #if CMWQ_MODE > 0 + struct work_struct *work; + #else struct task_struct *worker; + #endif allow_signal(SIGKILL); allow_signal(SIGTERM); + INIT_LIST_HEAD(&daemon.worker_list); while (!kthread_should_stop()) { int err =
kernel_accept(param->listen_socket, &client_socket, 0); if (err < 0) { if (signal_pending(current)) break; pr_err("kernel_accept() error: %d\n", err); continue; } + #if CMWQ_MODE > 0 + + if (unlikely(!(work = create_work(client_socket)))) { + printk(KERN_ERR MODULE_NAME + ": create work error, connection closed\n"); + kernel_sock_shutdown(client_socket, SHUT_RDWR); + sock_release(client_socket); + continue; + } + + /* start server worker */ + queue_work(http_server_wq, work); + #else worker = kthread_run(http_server_worker, client_socket, KBUILD_MODNAME); if (IS_ERR(worker)) { pr_err("can't create more worker process\n"); continue; } + #endif } + #if CMWQ_MODE > 0 + printk(MODULE_NAME ": daemon shutdown in progress...\n"); + daemon.is_stopped = true; + free_work(); + #endif return 0; } ```

`create_work` allocates the memory a work item needs → initializes the work item → adds it to the linked list. When the work item is scheduled, `http_server_worker` runs.

```c=
static struct work_struct *create_work(struct socket *socket)
{
    struct khttpd *work = kmalloc(sizeof(struct khttpd), GFP_KERNEL);
    if (!work)
        return NULL;

    work->sock = socket;
    INIT_WORK(&work->khttpd_work, http_server_worker);
    list_add(&work->list, &daemon.worker_list);
    return &work->khttpd_work;
}
```

```c=
static void free_work(void)
{
    struct khttpd *safe, *next;
    list_for_each_entry_safe (safe, next, &daemon.worker_list, list) {
        kernel_sock_shutdown(safe->sock, SHUT_RDWR);
        flush_work(&safe->khttpd_work);
        sock_release(safe->sock);
        kfree(safe);
    }
}
```

`http_server_worker` itself changes only in the following places: ```diff= + static void http_server_worker(struct work_struct *work) - static int http_server_worker(void *arg) { char *buf; + struct khttpd *worker = container_of(work, struct khttpd, khttpd_work); + struct socket *socket = worker->sock; - struct socket *socket = (struct socket *) arg; struct http_request request; struct http_parser parser; struct http_parser_settings setting = { .on_message_begin = http_parser_callback_message_begin, .on_url = http_parser_callback_request_url,
.on_header_field = http_parser_callback_header_field, .on_header_value = http_parser_callback_header_value, .on_headers_complete = http_parser_callback_headers_complete, .on_body = http_parser_callback_body, .on_message_complete = http_parser_callback_message_complete}; allow_signal(SIGKILL); allow_signal(SIGTERM); buf = kzalloc(RECV_BUFFER_SIZE, GFP_KERNEL); if (!buf) { pr_err("can't allocate memory!\n"); return; } request.socket = socket; http_parser_init(&parser, HTTP_REQUEST); parser.data = &request; - while(!kthread_should_stop()) { + while (!daemon.is_stopped) { int ret; ret = http_server_recv(socket, buf, RECV_BUFFER_SIZE - 1); if (ret <= 0) { pr_err("read error: %d\n", ret); break; } http_parser_execute(&parser, &setting, buf, ret); if (request.complete && !http_should_keep_alive(&parser)) break; memset(buf, 0, RECV_BUFFER_SIZE); } kernel_sock_shutdown(socket, SHUT_RDWR); sock_release(socket); kfree(buf); pr_info("http_server_worker exit\n"); } ```

Running `./htstress http://localhost:8081 -t 3 -c 20 -n 200000` gives the following result:

```shell=
0 requests
20000 requests
40000 requests
60000 requests
80000 requests
100000 requests
120000 requests
140000 requests
160000 requests
180000 requests

requests:      200000
good requests: 200000 [100%]
bad requests:  0 [0%]
socket errors: 0 [0%]
seconds:       1.535
requests/sec:  130255.568
```

| kthread (requests/sec) | CMWQ (requests/sec) |
| -------- | -------- |
| 50521.661 | 130255.568 |

Switching to CMWQ raises overall throughput (requests/sec) to roughly 2.6 times that of the kthread version (130255.568 / 50521.661 ≈ 2.58).

## Implementing directory listing

The response macros and `handle_directory` are implemented as follows: ```c= #define CRLF "\r\n" #define PATH "/home/xueyang/linux2023/khttpd" #define HTTP_RESPONSE_200_DUMMY \ "" \ "HTTP/1.1 200 OK" CRLF "Server: " KBUILD_MODNAME CRLF \ "Content-Type: text/html" CRLF "Connection: Close" CRLF \ "Transfer-Encoding: chunked" CRLF CRLF #define HTTP_RESPONSE_200_KEEPALIVE_DUMMY \ "" \ "HTTP/1.1 200 OK" CRLF "Server: " KBUILD_MODNAME CRLF \ "Content-Type: text/html" CRLF "Connection: Keep-Alive" CRLF \ "Transfer-Encoding: chunked" CRLF CRLF #define HTTP_RESPONSE_501 \ "" \ "HTTP/1.1 501
Not Implemented" CRLF "Server: " KBUILD_MODNAME CRLF \ "Content-Type: text/plain" CRLF "Content-Length: 21" CRLF \ "Connection: Close" CRLF CRLF "501 Not Implemented" CRLF #define HTTP_RESPONSE_501_KEEPALIVE \ "" \ "HTTP/1.1 501 Not Implemented" CRLF "Server: " KBUILD_MODNAME CRLF \ "Content-Type: text/plain" CRLF "Content-Length: 21" CRLF \ "Connection: Keep-Alive" CRLF CRLF "501 Not Implemented" CRLF /** * * If CRLF must be included in chunk data, the len of CRLF is 2. * * The len of chunk data is len of char array + 2 * the number of CRLF. */ #define HTTP_RESPONSE_DIRECTORY_LIST_BEGIN \ "" \ "7B" CRLF "<html><head><style>" CRLF \ "body{font-family: monospace; font-size: 15px;}" CRLF \ "td {padding: 1.5px 6px;}" CRLF "</style></head><body><table>" CRLF #define HTTP_RESPONSE_DIRECTORY_LIST_END \ "" \ "16" CRLF "</table></body></html>" CRLF "0" CRLF CRLF static void handle_directory(struct http_request *request, int keep_alive) { char *response; /* large enough for PATH plus the longest request_url, including the NUL */ char absolute_path[sizeof(PATH) + sizeof(request->request_url)]; struct file *fp; request->dir_context.actor = http_server_trace_dir; if (request->method != HTTP_GET) { response = keep_alive ? HTTP_RESPONSE_501_KEEPALIVE : HTTP_RESPONSE_501; http_server_send(request->socket, response, strlen(response)); return; } /* extern struct file *filp_open(const char *, int, umode_t); */ /* bounded concatenation instead of unchecked memcpy into a fixed buffer */ snprintf(absolute_path, sizeof(absolute_path), "%s%s", PATH, request->request_url); fp = filp_open(absolute_path, O_RDONLY, 0); if (IS_ERR(fp)) { pr_err("open error: %s %ld\n", absolute_path, PTR_ERR(fp)); return; } else { printk("open success: %s\n", absolute_path); } if (S_ISDIR(fp->f_inode->i_mode)) { response = keep_alive ?
HTTP_RESPONSE_200_KEEPALIVE_DUMMY : HTTP_RESPONSE_200_DUMMY; http_server_send(request->socket, response, strlen(response)); response = HTTP_RESPONSE_DIRECTORY_LIST_BEGIN; http_server_send(request->socket, response, strlen(response)); iterate_dir(fp, &request->dir_context); response = HTTP_RESPONSE_DIRECTORY_LIST_END; http_server_send(request->socket, response, strlen(response)); } else { /* is a file */ char *read_data = kmalloc(fp->f_inode->i_size, GFP_KERNEL); int ret = read_file(fp, read_data); if (ret < 0) { pr_err("read file error: %d\n", ret); /* release the buffer and the file before bailing out */ kfree(read_data); filp_close(fp, NULL); return; } http_server_send_header(request->socket, 200, "OK", "text/plain", keep_alive, ret); http_server_send(request->socket, read_data, ret); kfree(read_data); } filp_close(fp, NULL); } static int http_server_response(struct http_request *request, int keep_alive) { handle_directory(request, keep_alive); return 0; } ```

`handle_directory` does the following:

1. Checks whether the request method is `GET`.
2. Opens the directory or file the user selected with `filp_open`.
3. Determines whether the target is a directory or a file:
    1. Directory: walks every entry with `iterate_dir` and sends the listing using **chunked transfer encoding**.
    2. File: reads the contents with `kernel_read`, then sends a header via `http_server_send_header` followed by the file data.

### iterate_dir

`iterate_dir` is directed to a user-defined callback function through the `actor` member of `struct dir_context`: ```c= static void handle_directory(struct http_request *request, int keep_alive) { ... request->dir_context.actor = http_server_trace_dir; ...
} ```

Callback function:

```c=
static int http_server_trace_dir(struct dir_context *dir_context,
                                 const char *name,
                                 int namelen,
                                 loff_t offset,
                                 u64 ino,
                                 unsigned int d_type)
{
    if (strcmp(name, ".") && strcmp(name, "..")) {
        struct http_request *request =
            container_of(dir_context, struct http_request, dir_context);
        char buf[256] = {0};

        snprintf(buf, sizeof(buf),
                 "%x\r\n<tr><td><a href=\"%s\">%s</a></td></tr>\r\n",
                 33 + (namelen << 1), name, name);
        http_server_send(request->socket, buf, strlen(buf));
    }
    return 0;
}
```

The callback has to send data back through the socket, but the parameters `iterate_dir` passes to it are fixed, so `struct dir_context` is embedded in `struct http_request`:

```diff=
  struct http_request {
      struct socket *socket;
      enum http_method method;
      char request_url[128];
      int complete;
+     struct dir_context dir_context;
  };
```

Inside the callback, `container_of` recovers the address of the enclosing `struct http_request`, which gives access to that request's socket.

### Reading file data

File attributes: in the Linux kernel, a file's attributes are kept in `struct inode`, defined in [include/linux/fs.h](https://elixir.bootlin.com/linux/latest/source/include/linux/fs.h#L613). Two members are used here: `i_mode`, which encodes the file type, and `i_size`, which holds the file size.

```c=
struct inode {
    umode_t i_mode;
    ...
    loff_t i_size;
    ...
};
```

The `S_ISDIR` macro then tells whether the file is a directory: ```c= if (S_ISDIR(fp->f_inode->i_mode)) { response = keep_alive ?
HTTP_RESPONSE_200_KEEPALIVE_DUMMY : HTTP_RESPONSE_200_DUMMY; http_server_send(request->socket, response, strlen(response)); response = HTTP_RESPONSE_DIRECTORY_LIST_BEGIN; http_server_send(request->socket, response, strlen(response)); iterate_dir(fp, &request->dir_context); response = HTTP_RESPONSE_DIRECTORY_LIST_END; http_server_send(request->socket, response, strlen(response)); } else { /* is a file */ char *read_data = kmalloc(fp->f_inode->i_size, GFP_KERNEL); int ret = read_file(fp, read_data); if (ret < 0) { pr_err("read file error: %d\n", ret); /* release the buffer and the file before bailing out */ kfree(read_data); filp_close(fp, NULL); return; } http_server_send_header(request->socket, 200, "OK", "text/plain", keep_alive, ret); http_server_send(request->socket, read_data, ret); kfree(read_data); } ```

For a regular file, `kernel_read` reads the contents into a buffer:

```c=
static inline int read_file(struct file *fp, char *buf)
{
    return kernel_read(fp, buf, fp->f_inode->i_size, 0);
}
```

### Sending directory data with chunked transfer encoding

Without chunked transfer encoding, the content length has to be declared up front in the header:

```c=
#define HTTP_RESPONSE_501                                              \
    ""                                                                 \
    "HTTP/1.1 501 Not Implemented" CRLF "Server: " KBUILD_MODNAME CRLF \
    "Content-Type: text/plain" CRLF "Content-Length: 21" CRLF          \
    "Connection: Close" CRLF CRLF "501 Not Implemented" CRLF
```

HTTP/1.1 provides chunked transfer encoding, which splits the body into chunks that are sent one after another, so the HTTP header no longer needs to carry `Content-Length: xx`. This suits a directory listing, whose total size is not known until `iterate_dir` has visited every entry.

[HTTP headers | Transfer-Encoding](https://www.geeksforgeeks.org/http-headers-transfer-encoding/) gives an example of the following form, where each size line matches the length of the data that follows it:

```c=
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

7\r\n
Mozilla\r\n
9\r\n
Developer\r\n
7\r\n
Network\r\n
0\r\n
\r\n
```

1. Each piece of data is preceded by its length.
2. The length is written in hexadecimal.
3. The length and the data are separated by `\r\n`.
4.
To end the transfer, send a chunk of length 0.

Accordingly, the `HTTP_RESPONSE_200_XXX` macros now use `Transfer-Encoding: chunked`:

```c=
#define HTTP_RESPONSE_200_DUMMY                               \
    ""                                                        \
    "HTTP/1.1 200 OK" CRLF "Server: " KBUILD_MODNAME CRLF     \
    "Content-Type: text/html" CRLF "Connection: Close" CRLF   \
    "Transfer-Encoding: chunked" CRLF CRLF
#define HTTP_RESPONSE_200_KEEPALIVE_DUMMY                        \
    ""                                                           \
    "HTTP/1.1 200 OK" CRLF "Server: " KBUILD_MODNAME CRLF        \
    "Content-Type: text/html" CRLF "Connection: Keep-Alive" CRLF \
    "Transfer-Encoding: chunked" CRLF CRLF
```

The callback and the directory branch of `handle_directory` then send every part of the listing as a chunk: ```c= static int http_server_trace_dir(struct dir_context *dir_context, const char *name, int namelen, loff_t offset, u64 ino, unsigned int d_type) { if (strcmp(name, ".") && strcmp(name, "..")) { struct http_request *request = container_of(dir_context, struct http_request, dir_context); char buf[256] = {0}; snprintf(buf, sizeof(buf), "%x\r\n<tr><td><a href=\"%s\">%s</a></td></tr>\r\n", 33 + (namelen << 1), name, name); http_server_send(request->socket, buf, strlen(buf)); } return 0; } static void handle_directory(struct http_request *request, int keep_alive) { ... if (S_ISDIR(fp->f_inode->i_mode)) { response = keep_alive ? HTTP_RESPONSE_200_KEEPALIVE_DUMMY : HTTP_RESPONSE_200_DUMMY; http_server_send(request->socket, response, strlen(response)); response = HTTP_RESPONSE_DIRECTORY_LIST_BEGIN; http_server_send(request->socket, response, strlen(response)); iterate_dir(fp, &request->dir_context); response = HTTP_RESPONSE_DIRECTORY_LIST_END; http_server_send(request->socket, response, strlen(response)); } ...
} ```

On line 14, the length is sent first as `%x\r\n`, followed by the content:

```c=
snprintf(buf, sizeof(buf),
         "%x\r\n<tr><td><a href=\"%s\">%s</a></td></tr>\r\n",
         33 + (namelen << 1), name, name);
```

The constant 33 is the number of bytes in the fixed markup `<tr><td><a href=""></a></td></tr>`, and `namelen << 1` accounts for the name appearing twice. The trailing `\r\n` terminates the chunk and is not counted in its length.

## Building a Linux kernel test environment with [virtme](https://git.kernel.org/pub/scm/utils/kernel/virtme/virtme.git) and debugging the kernel with [crash](https://github.com/crash-utility/crash)

To install virtme and crash, see [測試 Linux 核心的虛擬化環境](https://hackmd.io/@sysprog/linux-virtme#%E6%B8%AC%E8%A9%A6-Linux-%E6%A0%B8%E5%BF%83%E7%9A%84%E8%99%9B%E6%93%AC%E5%8C%96%E7%92%B0%E5%A2%83).

### Loading the khttpd kernel module

Build the kernel module on the host beforehand, with the kernel path in the Makefile changed as follows:

```makefile=
KDIR=/home/xueyang/linux
...
```

Start the virtme environment, adding `--qemu-opts -qmp tcp:localhost:4444,server,nowait` to the launch command:

```shell=
$ virtme-run --kimg arch/x86/boot/bzImage \
             --qemu-opts -qmp tcp:localhost:4444,server,nowait
```

In the khttpd directory, load the module:

```shell=
# insmod khttpd.ko
```

### If the kernel crashes, crash can analyze the kernel dump

With the virtual environment still running, go back to the host and use QMP to talk to the running QEMU instance and capture a kernel dump of the guest.

Follow these steps to produce the kernel dump:

1. Connect to QEMU with telnet. The QMP greeting should appear:

    ```shell=
    $ telnet localhost 4444
    {"QMP": {"version": {"qemu": {"micro": 0, "minor": 6, "major": 1}, "package": ""}, "capabilities": []}}
    ```

2. Enter the following command to switch to QMP command mode:

    ```shell=
    telnet> { "execute": "qmp_capabilities" }
    ```

3. Dump the guest's memory to the specified file:

    ```shell=
    telnet> { "execute": "dump-guest-memory", "arguments": {"paging": false, "protocol": "file:vmcore.img" }}
    ```

### Running crash and analyzing the kernel dump

With a vmlinux that contains debug symbols and the kernel dump at hand, debugging with crash can begin: ```shell= $ crash /home/xueyang/linux/vmlinux /home/xueyang/linux/vmcore.img Type "apropos word" to search for commands related to "word"...
KERNEL: vmlinux DUMPFILE: vmcore.img CPUS: 1 DATE: Sat Jun 10 21:28:50 CST 2023 UPTIME: 00:00:15 LOAD AVERAGE: 0.00, 0.00, 0.00 TASKS: 45 NODENAME: (none) RELEASE: 5.15.0-rc7 VERSION: #4 SMP Sat Jun 10 21:26:45 CST 2023 MACHINE: x86_64 (2903 Mhz) MEMORY: 127.5 MB PANIC: "Kernel panic - not syncing: sysrq triggered crash" PID: 156 COMMAND: "bash" TASK: ffff9a5502323e00 [THREAD_INFO: ffff9a5502323e00] CPU: 0 STATE: TASK_RUNNING (PANIC) crash> ```

`dmesg` shows what caused the crash. Once the pid has been found, GDB can be used for debugging.

## References

* [SO_REUSEPORT解决了什么问题](https://www.cnblogs.com/schips/p/12553321.html)
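
## Appendix: checking the chunk sizes in user space

The chunk sizes in the directory listing (`7B` for the opening markup and `33 + (namelen << 1)` for each row) are easy to get wrong by one, and a wrong size makes the client misparse the stream. The following is a minimal user-space sketch, not part of khttpd, that recomputes them. `format_dir_chunk`, `chunk_size_matches`, `LIST_BEGIN` and `ROW_FIXED_LEN` are hypothetical names introduced here for illustration; the format string and the opening markup are copied from the listings in this note.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CRLF "\r\n"

/* Same bytes as HTTP_RESPONSE_DIRECTORY_LIST_BEGIN: declared size 0x7B. */
#define LIST_BEGIN                                        \
    "7B" CRLF "<html><head><style>" CRLF                  \
    "body{font-family: monospace; font-size: 15px;}" CRLF \
    "td {padding: 1.5px 6px;}" CRLF "</style></head><body><table>" CRLF

/* Bytes of fixed markup in one row: <tr><td><a href=""></a></td></tr> */
#define ROW_FIXED_LEN 33

/* Hypothetical helper: format one directory entry as a single chunk.
 * Returns the number of bytes written, or -1 if `out` is too small. */
static int format_dir_chunk(char *out, size_t cap, const char *name)
{
    assert(out && name);
    size_t namelen = strlen(name);
    int n = snprintf(out, cap,
                     "%zx" CRLF "<tr><td><a href=\"%s\">%s</a></td></tr>" CRLF,
                     ROW_FIXED_LEN + 2 * namelen, name, name);
    return (n < 0 || (size_t) n >= cap) ? -1 : n;
}

/* Hypothetical helper: returns 1 when the hex size line of a single chunk
 * equals the number of payload bytes before the terminating CRLF. */
static int chunk_size_matches(const char *chunk)
{
    const char *eol = strstr(chunk, CRLF); /* end of the size line */
    if (!eol)
        return 0;
    unsigned long declared = strtoul(chunk, NULL, 16);
    const char *payload = eol + 2;
    size_t total = strlen(payload);
    /* the payload must be followed by exactly one terminating CRLF */
    if (total < 2 || strcmp(payload + total - 2, CRLF) != 0)
        return 0;
    return declared == total - 2;
}
```

For an entry named `Makefile`, `format_dir_chunk` writes the size line `31`, which is hexadecimal for 33 + 2 × 8 = 49, and `chunk_size_matches` accepts both that chunk and `LIST_BEGIN`. The check handles one chunk at a time, so it does not apply to `HTTP_RESPONSE_DIRECTORY_LIST_END`, which holds the final data chunk and the zero-length terminator together.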