# ktcp

contributed by < `chiacyu` >

## Understanding [CMWQ](https://www.kernel.org/doc/html/latest/core-api/workqueue.html)

According to the document, the original workqueue implementation had several problems:
- Work items could not be migrated between different CPU cores
- A multi-threaded workqueue had to keep as many workers as there are CPU cores, which could waste resources
- Work items had to compete with one another, which could add latency

CMWQ stays compatible with the original workqueue API while making several changes:
- Worker pools are shared by all workqueues
- The number of workers in a pool is kept at a baseline level so that excess workers do not tie up system resources
- When a work item takes too long, the scheduler steps in so that another work item can be served

The `bench` tool in `kecho` can be used for a quick test. The result for `user-echo-server.c`:

![](https://i.imgur.com/pzo5U80.png)

The result for `kecho`:

![](https://i.imgur.com/2YvfPtQ.png)

Besides the benefit of running in kernel space, CMWQ also brings a very noticeable improvement.

---

## CPU scheduler and workqueue/CMWQ

---

## Introducing CMWQ into ktcp

First, here is the performance before `cmwq` is introduced:

```text
0 requests
10000 requests
20000 requests
30000 requests
40000 requests
50000 requests
60000 requests
70000 requests
80000 requests
90000 requests

requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
socket errors: 0 [0%]
seconds:       2.232
requests/sec:  44807.565

Complete
```

### Introducing cmwq

First, a few data structures are needed for the later steps.

```c
struct khttp {
    struct socket *sock;
    struct list_head list;
    struct work_struct khttp_work;
};
```

The `khttp` structure records:
- `sock`: the connected socket
- `list`: the linked-list node
- `khttp_work`: the work item (`work_struct`) queued on the workqueue for this connection

```c
struct khttp_server_service {
    bool is_stopped;
    struct list_head worker;
};
```

The `khttp_server_service` structure records:
- `is_stopped`: whether the server as a whole has been stopped
- `worker`: the head node of the linked list of workers

```c
static struct work_struct *create_work(struct socket *sk)
{
    struct khttp *work;

    if (!(work = kmalloc(sizeof(struct khttp), GFP_KERNEL)))
        return NULL;

    work->sock = sk;

    INIT_WORK(&work->khttp_work, http_server_worker);

    list_add(&work->list, &daemon.worker);

    return &work->khttp_work;
}
```

When a client connects, `create_work()` allocates a new `struct khttp`, initializes its work item, and uses `list_add()` to link it right after the `daemon.worker` head node.
- [INIT_WORK](https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/workqueues.h#L76)

```c
static void free_work(void)
{
    struct khttp *l,
        *tar;
    /* cppcheck-suppress uninitvar */

    list_for_each_entry_safe (tar, l, &daemon.worker, list) {
        kernel_sock_shutdown(tar->sock, SHUT_RDWR);
        flush_work(&tar->khttp_work);
        sock_release(tar->sock);
        kfree(tar);
    }
}
```

`free_work()` then releases all resources. It walks every `struct khttp` with `list_for_each_entry_safe`:
- `kernel_sock_shutdown` shuts down the socket used by that work item
- `flush_work` waits until the work item has finished executing
- `sock_release` releases the closed socket

Next, here is how each client connection is handled:

```c
static void http_server_worker(struct work_struct *work)
{
    struct khttp *worker = container_of(work, struct khttp, khttp_work);
    char *buf;
    struct http_parser parser;
    struct http_parser_settings setting = {
        .on_message_begin = http_parser_callback_message_begin,
        .on_url = http_parser_callback_request_url,
        .on_header_field = http_parser_callback_header_field,
        .on_header_value = http_parser_callback_header_value,
        .on_headers_complete = http_parser_callback_headers_complete,
        .on_body = http_parser_callback_body,
        .on_message_complete = http_parser_callback_message_complete};
    struct http_request request;
    struct socket *socket = worker->sock;

    allow_signal(SIGKILL);
    allow_signal(SIGTERM);

    buf = kzalloc(RECV_BUFFER_SIZE, GFP_KERNEL);
    if (!buf) {
        pr_err("can't allocate memory!\n");
        return; /* without a buffer the loop below must not run */
    }

    request.socket = socket;
    http_parser_init(&parser, HTTP_REQUEST);
    parser.data = &request;

    while (!daemon.is_stopped) {
        int ret;
        memset(buf, 0, RECV_BUFFER_SIZE - 1);
        ret = http_server_recv(socket, buf, RECV_BUFFER_SIZE - 1);
        if (ret <= 0) {
            if (ret)
                pr_err("recv error: %d\n", ret);
            break;
        }
        http_parser_execute(&parser, &setting, buf, ret);
        if (request.complete && !http_should_keep_alive(&parser))
            break;
        memset(buf, 0, RECV_BUFFER_SIZE);
    }
    kernel_sock_shutdown(socket, SHUT_RDWR);
    sock_release(socket);
    kfree(buf);
}
```

`http_server_worker()` handles each client connection:
- `container_of(work, struct khttp, khttp_work)` recovers the enclosing `struct khttp` from `work`
- `struct socket *socket = worker->sock` obtains the socket the client is connected to
- The loop checks `daemon.is_stopped`; once the service has stopped, the socket is closed and `buf` is freed

```c
int
http_server_daemon(void *arg)
{
    struct socket *socket;
    struct work_struct *work;
    struct http_server_param *param = (struct http_server_param *) arg;

    allow_signal(SIGKILL);
    allow_signal(SIGTERM);

    INIT_LIST_HEAD(&daemon.worker);

    while (!kthread_should_stop()) {
        int err = kernel_accept(param->listen_socket, &socket, 0);
        if (err < 0) {
            if (signal_pending(current))
                break;
            pr_err("kernel_accept() error: %d\n", err);
            continue;
        }

        if (unlikely(!(work = create_work(socket)))) {
            printk(KERN_ERR
                   "khttp : create work error, connection closed\n");
            kernel_sock_shutdown(socket, SHUT_RDWR);
            sock_release(socket);
            continue;
        }

        /* start server worker */
        queue_work(khttp_wq, work);
    }
    printk("khttp : daemon shutdown in progress...\n");

    daemon.is_stopped = true;
    free_work();
    return 0;
}
```

`http_server_daemon()` runs the server daemon:
- `INIT_LIST_HEAD` initializes the `daemon.worker` head node so that later clients can be added to the list with `list_add()`
- `kthread_should_stop()` tells whether the thread has been asked to stop; as long as it has not, `kernel_accept` accepts new connections
- Once a new socket is accepted, `create_work` creates a work item to handle the connection
- Finally, `queue_work()` queues the work item on the workqueue

Lastly, the server needs to be registered as a Linux kernel module.

```c
static int __init khttpd_init(void)
{
    int err = open_listen_socket(port, backlog, &listen_socket);
    if (err < 0) {
        pr_err("can't open listen socket\n");
        return err;
    }
    param.listen_socket = listen_socket;
    khttp_wq = alloc_workqueue("khttp_wq", WQ_UNBOUND, 0);
    http_server = kthread_run(http_server_daemon, &param, KBUILD_MODNAME);
    if (IS_ERR(http_server)) {
        pr_err("can't start http server daemon\n");
        close_listen_socket(listen_socket);
        return PTR_ERR(http_server);
    }
    return 0;
}
```

In `khttpd_init()`:
- `open_listen_socket()` opens the listening socket through which clients establish connections
- `alloc_workqueue()` creates the workqueue

After these changes the performance is as follows:

```text
0 requests
10000 requests
20000 requests
30000 requests
40000 requests
50000 requests
60000 requests
70000 requests
80000 requests
90000 requests

requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
socket errors: 0 [0%]
seconds:       1.377
requests/sec:  72631.558

Complete
```

---

## Using RCU to manage clients

For background on RCU, see [Linux 核心設計: RCU 同步機制](https://hackmd.io/@sysprog/linux-rcu) and [What is RCU, Fundamentally?](https://lwn.net/Articles/262464/)

RCU fits best when reads are frequent, writes are rare, and readers can tolerate briefly seeing stale data (strict consistency is not required). As a first attempt, RCU is used here to manage the linked list of clients.

In `create_work()`, `list_add_rcu()` adds each new client:

```c
static struct work_struct *create_work(struct socket *sk)
{
    struct khttp *work;

    if (!(work = kmalloc(sizeof(struct khttp), GFP_KERNEL)))
        return NULL;

    work->sock = sk;

    INIT_WORK(&work->khttp_work, http_server_worker);

    list_add_rcu(&work->list, &daemon.worker);

    return &work->khttp_work;
}
```

In `free_work()`, `list_for_each_entry_rcu()` walks the list and releases the entries one by one:

```c
static void free_work(void)
{
    struct khttp *l, *tar;
    /* cppcheck-suppress uninitvar */

    /*
     * NOTE: flush_work() may sleep, which is not allowed inside an RCU
     * read-side critical section, and entries are freed here without
     * list_del_rcu() or waiting for a grace period. This is the first
     * attempt as measured below, kept as is.
     */
    rcu_read_lock();
    list_for_each_entry_rcu (tar, &daemon.worker, list) {
        kernel_sock_shutdown(tar->sock, SHUT_RDWR);
        flush_work(&tar->khttp_work);
        sock_release(tar->sock);
        kfree(tar);
    }
    rcu_read_unlock();
}
```

The result, however, is worse than expected, so tools such as `ftrace` are needed to analyze it properly.

```text
0 requests
10000 requests
20000 requests
30000 requests
40000 requests
50000 requests
60000 requests
70000 requests
80000 requests
90000 requests

requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
socket errors: 0 [0%]
seconds:       1.668
requests/sec:  59951.823

Complete
```

## Tracing execution with ftrace

`ftrace` is the tracing mechanism provided by the Linux kernel. See [Debugging the kernel using Ftrace - part 1](https://lwn.net/Articles/365835/) and [Debugging the kernel using Ftrace - part 2](https://lwn.net/Articles/366796/); chapter 6 of "Demystifying the Linux CPU Scheduler" also covers it.

First, check whether the running kernel provides `ftrace`:

```shell
cat /boot/config-`uname -r` | grep CONFIG_HAVE_FUNCTION_TRACER
```

If the output contains the following line, `ftrace` is available in this kernel:

```
CONFIG_HAVE_FUNCTION_TRACER=y
```

Listing `/sys/kernel/debug/tracing` then shows the following:

```text
root@chiacyu-msi:/sys/kernel/debug/tracing# ls
available_events            max_graph_depth         stack_max_size
available_filter_functions  options                 stack_trace
available_tracers           per_cpu                 stack_trace_filter
buffer_percent              printk_formats          synthetic_events
buffer_size_kb              README                  timestamp_mode
buffer_total_size_kb        saved_cmdlines          trace
current_tracer              saved_cmdlines_size     trace_clock
dynamic_events              saved_tgids             trace_marker
dyn_ftrace_total_info       set_event               trace_marker_raw
enabled_functions           set_event_notrace_pid   trace_options
error_log                   set_event_pid           trace_pipe
events                      set_ftrace_filter       trace_stat
free_buffer                 set_ftrace_notrace      tracing_cpumask
function_profile_enabled    set_ftrace_notrace_pid  tracing_max_latency
hwlat_detector              set_ftrace_pid          tracing_on
instances                   set_graph_function      tracing_thresh
kprobe_events               set_graph_notrace       uprobe_events
kprobe_profile              snapshot                uprobe_profile
```

ftrace is controlled by writing to these files with `echo`. Start with `available_filter_functions`, which lists the functions `ftrace` can currently trace. The `khttp.ko` module has to be loaded into the kernel first; after that its functions show up:

```text
root@chiacyu-msi:/sys/kernel/debug/tracing# cat available_filter_functions | grep khttp
parse_url_char.part.0 [khttpd]
http_message_needs_eof [khttpd]
http_should_keep_alive [khttpd]
http_parser_execute [khttpd]
http_method_str [khttpd]
http_status_str [khttpd]
http_parser_init [khttpd]
http_parser_settings_init [khttpd]
http_errno_name [khttpd]
http_errno_description [khttpd]
http_parser_url_init [khttpd]
http_parser_parse_url [khttpd]
http_parser_pause [khttpd]
http_body_is_final [khttpd]
http_parser_version [khttpd]
http_parser_set_max_header_size [khttpd]
http_parser_callback_header_field [khttpd]
http_parser_callback_headers_complete [khttpd]
http_parser_callback_request_url [khttpd]
http_parser_callback_message_begin [khttpd]
http_parser_callback_body [khttpd]
http_server_recv.constprop.0 [khttpd]
http_server_worker [khttpd]
http_parser_callback_header_value [khttpd]
http_server_daemon [khttpd]
http_server_send.isra.0 [khttpd]
http_parser_callback_message_complete [khttpd]
```

A shell script can be written to configure `ftrace`:
- `max_graph_depth` sets how deep function calls are traced
- `current_tracer` selects the tracer in use, here `function_graph`
- `set_graph_function` selects the function to observe, here `http_server_worker`

```shell
#!/bin/bash
TRACE_DIR=/sys/kernel/debug/tracing

echo > $TRACE_DIR/set_ftrace_filter
echo > $TRACE_DIR/current_tracer
echo nop > $TRACE_DIR/current_tracer
echo function_graph > $TRACE_DIR/current_tracer

# depth of the function calls
echo 1 > $TRACE_DIR/max_graph_depth

echo http_server_worker > $TRACE_DIR/set_graph_function

echo 1 > $TRACE_DIR/tracing_on

./htstress -n 100 -c 1 -t 4 http://localhost:8081/

echo 0 > $TRACE_DIR/tracing_on
```

After the script finishes, look at the contents of `trace`:

```text
root@chiacyu-msi:/sys/kernel/debug/tracing# cat trace | head -20
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 10)               |  http_server_worker [khttpd]() {
 10)               |    kernel_sigaction() {
 10)   0.140 us    |      _raw_spin_lock_irq();
 10)   0.110 us    |      _raw_spin_unlock_irq();
 10)   1.130 us    |    }
 10)               |    kernel_sigaction() {
 10)   0.110 us    |      _raw_spin_lock_irq();
 10)   0.120 us    |      _raw_spin_unlock_irq();
 10)   0.620 us    |    }
 10)               |    kmem_cache_alloc_trace() {
 10)   0.110 us    |      __cond_resched();
 10)   0.100 us    |      should_failslab();
 10)   1.190 us    |    }
 10)   0.130 us    |    http_parser_init [khttpd]();
 10)               |    http_server_recv.constprop.0 [khttpd]() {
 10)               |      kernel_recvmsg() {
```

Next, increase `max_graph_depth` and look at the result again:

```text
root@chiacyu-msi:/sys/kernel/debug/tracing# cat trace | head -300
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
  5)               |  http_server_worker [khttpd]() {
  5)               |    kernel_sigaction() {
  5)   0.220 us    |      _raw_spin_lock_irq();
  5)   0.150 us    |      _raw_spin_unlock_irq();
  5)   0.951 us    |    }
  5)               |    kernel_sigaction() {
  5)   0.100 us    |      _raw_spin_lock_irq();
  5)   0.170 us    |      _raw_spin_unlock_irq();
  5)   0.540 us    |    }
  5)               |    kmem_cache_alloc_trace() {
  5)   0.090 us    |      __cond_resched();
  5)   0.090 us    |      should_failslab();
  5)   0.990 us    |    }
  5)   0.100 us    |    http_parser_init [khttpd]();
  5)               |    http_server_recv.constprop.0 [khttpd]() {
  5)               |      kernel_recvmsg() {
  5)               |        sock_recvmsg() {
  5)               |          security_socket_recvmsg() {
  5)   0.700 us    |            apparmor_socket_recvmsg();
  5)   0.890 us    |          }
  5)               |          inet_recvmsg() {
  5)   1.550 us    |            tcp_recvmsg();
  5)   1.821 us    |          }
  5)   3.081 us    |        }
  5)   3.251 us    |      }
  5)   3.441 us    |    }
  5)               |    kernel_sock_shutdown() {
  5)               |      inet_shutdown() {
  5)               |        lock_sock_nested() {
  5)   0.090 us    |          __cond_resched();
  5)   0.100 us    |          _raw_spin_lock_bh();
  5)               |          _raw_spin_unlock_bh() {
  5)   0.100 us    |            __local_bh_enable_ip();
  5)   0.260 us    |          }
  5)   0.770 us    |        }
  5)               |        tcp_shutdown() {
  5)               |          tcp_set_state() {
  5)   0.100 us    |            inet_sk_state_store();
  5)   0.290 us    |          }
  5)               |          tcp_send_fin() {
  5)   1.840 us    |            __alloc_skb();
  5)   0.100 us    |            sk_forced_mem_schedule();
  5)   0.400 us    |            tcp_current_mss();
  5) + 43.448 us   |            __tcp_push_pending_frames();
  5) + 46.338 us   |          }
  5) + 46.968 us   |        }
  5)               |        sock_def_wakeup() {
  5)   0.100 us    |          __rcu_read_lock();
  5)   0.100 us    |          __rcu_read_unlock();
  5)   0.490 us    |        }
  5)               |        release_sock() {
  5)   0.090 us    |          _raw_spin_lock_bh();
  5)               |          __release_sock() {
  5)   0.130 us    |            _raw_spin_unlock_bh();
  5)   4.471 us    |            tcp_v4_do_rcv();
  5)   0.090 us    |            __cond_resched();
  5)   0.090 us    |            _raw_spin_lock_bh();
  5)   5.271 us    |          }
  5)   0.100 us    |          tcp_release_cb();
  5)               |          _raw_spin_unlock_bh() {
  5)   0.090 us    |            __local_bh_enable_ip();
  5)   0.260 us    |          }
  5)   6.171 us    |        }
  5) + 54.929 us   |      }
  5) + 55.199 us   |    }
  5)               |    sock_release() {
  5)               |      inet_release() {
  5)   0.110 us    |        ip_mc_drop_socket();
  5)               |        tcp_close() {
  5)               |          lock_sock_nested() {
  5)   0.080 us    |            __cond_resched();
  5)   0.090 us    |            _raw_spin_lock_bh();
  5)   0.130 us    |            _raw_spin_unlock_bh();
  5)   0.670 us    |          }
  5)               |          __tcp_close() {
  5)   0.150 us    |            __sk_mem_reclaim();
  5)   0.090 us    |            _raw_write_lock_bh();
  5)   0.120 us    |            _raw_write_unlock_bh();
  5)   0.100 us    |            _raw_spin_lock();
  5)   0.100 us    |            __release_sock();
  5)   1.141 us    |            inet_csk_destroy_sock();
  5)   0.090 us    |            _raw_spin_unlock();
  5)   0.090 us    |            __local_bh_enable_ip();
  5)   2.761 us    |          }
  5)               |          release_sock() {
  5)   0.090 us    |            _raw_spin_lock_bh();
  5)   0.120 us    |            tcp_release_cb();
  5)   0.130 us    |            _raw_spin_unlock_bh();
  5)   0.740 us    |          }
  5)               |          sk_free() {
  5)   1.270 us    |            __sk_free();
  5)   1.470 us    |          }
  5)   6.561 us    |        }
  5)   7.011 us    |      }
  5)   0.110 us    |      module_put();
  5)               |      iput() {
  5)   0.080 us    |        _raw_spin_lock();
  5)   0.110 us    |        _raw_spin_unlock();
  5)               |        evict() {
  5)               |          inode_wait_for_writeback() {
  5)   0.120 us    |            _raw_spin_lock();
  5)   0.171 us    |            __inode_wait_for_writeback();
  5)   0.100 us    |            _raw_spin_unlock();
  5)   0.741 us    |          }
  5)               |          truncate_inode_pages_final() {
  5)   0.100 us    |            truncate_inode_pages_range();
  5)   0.300 us    |          }
  5)               |          clear_inode() {
  5)   0.090 us    |            _raw_spin_lock_irq();
  5)   0.090 us    |            _raw_spin_unlock_irq();
  5)   0.450 us    |          }
  5)   0.090 us    |          _raw_spin_lock();
  5)   0.180 us    |          wake_up_bit();
  5)   0.090 us    |          _raw_spin_unlock();
  5)               |          destroy_inode() {
  5)   1.000 us    |            __destroy_inode();
  5)   0.140 us    |            call_rcu();
  5)   1.480 us    |          }
  5)   4.121 us    |        }
  5)   4.981 us    |      }
  5) + 12.492 us   |    }
  5)   0.280 us    |    kfree();
  5) + 76.673 us   |  }
```

`__tcp_push_pending_frames` takes the most time: about 43 us out of the roughly 77 us spent in `http_server_worker`.

## Checking for `Keep-Alive` support

Before testing, it helps to understand the format of an HTTP request; see [HTTP Messages](https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages) for details. The start line of an HTTP request has three parts:
- Method: the kind of request, such as `GET` or `POST`
- Request target: the location of the requested resource, usually a `URL`
- HTTP version: the version of HTTP in use

With `khttp` loaded, run `telnet localhost 8081` and enter `GET / HTTP/1.0` and `GET / HTTP/1.1` in turn:

```text
(base) chiacyu@chiacyu-msi:~$ telnet localhost 8081
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 200 OK
Server: khttpd
Content-Type: text/plain
Content-Length: 12
Connection: Close

Hello World!
Connection closed by foreign host.
```

```text
(base) chiacyu@chiacyu-msi:~$ telnet localhost 8081
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1

HTTP/1.1 200 OK
Server: khttpd
Content-Type: text/plain
Content-Length: 12
Connection: Keep-Alive

Hello World!
```

So `khttp` already supports `Keep-Alive`: the HTTP/1.0 connection is closed after the response, while the HTTP/1.1 connection stays open.

## Using a `timer` to close timed-out connections

`khttp` currently has no `timer` mechanism for closing idle connections. The implementation in [sehttpd](https://github.com/sysprog21/sehttpd) serves as a reference here. `sehttpd` manages all connections with a `priority queue` implemented as a `min heap`, and `prio_queue_min()` returns the connection closest to its `deadline`.

```c
static inline void *prio_queue_min(prio_queue_t *ptr)
{
    return prio_queue_is_empty(ptr) ?
           NULL : ptr->priv[1];
}
```

`sehttpd` originally updates its clock by calling `gettimeofday()` to get the current system time and converting it to `ms`.

```c
static void time_update()
{
    struct timeval tv;
    int rc UNUSED = gettimeofday(&tv, NULL);
    assert(rc == 0 && "time_update: gettimeofday error");
    current_msec = tv.tv_sec * 1000 + tv.tv_usec / 1000;
}
```

Unfortunately `gettimeofday()` cannot be called directly in kernel space, so the current system time has to be obtained another way. Here [ktime_get_real()](https://docs.kernel.org/core-api/timekeeping.html) is used, and `ktime_to_ms()` converts the result to `ms`.

```c
static void time_update(void)
{
    ktime_t kt = ktime_get_real();
    current_msec = ktime_to_ms(kt);
}
```

Next, `http_server_daemon()` calls `timer_init()` to initialize the `timer`, then calls `handle_expired_timers()` to find every connection past its deadline and release them one by one.

```c
int http_server_daemon(void *arg)
{
    struct socket *socket;
    struct work_struct *work;
    struct http_server_param *param = (struct http_server_param *) arg;

    allow_signal(SIGKILL);
    allow_signal(SIGTERM);

    timer_init();

    INIT_LIST_HEAD(&daemon.worker);

    while (!kthread_should_stop()) {
        int time = find_timer();
        pr_info("wait time = %d\n", time);
        handle_expired_timers();
        int err = kernel_accept(param->listen_socket, &socket, 0);
        ...
        ...
```

Testing revealed a problem: `http_server_daemon()` blocks inside `kernel_accept()` and never returns to the top of the loop. The symptom was that `dmesg` did not show `"wait time = %d\n"` being printed repeatedly, and timed-out client connections were not closed.

The report by [Risheng1128](https://hackmd.io/@Risheng/linux2022-ktcp/https%3A%2F%2Fhackmd.io%2F%40Risheng%2Flinux2022-khttpd) pointed out that the `socket` has to be switched to non-blocking mode. See this [commit](https://github.com/chiacyu/khttpd/commit/6c22e405c250058298224c023ce84d85dc5d9335) for details.

```diff
@@ -247,7 +248,8 @@ int http_server_daemon(void *arg)
         pr_info("wait time = %d\n", time);
         handle_expired_timers();
-        int err = kernel_accept(param->listen_socket, &socket, 0);
+        // int err = kernel_accept(param->listen_socket, &socket, 0);
+        int err = kernel_accept(param->listen_socket, &socket, SOCK_NONBLOCK);
         if (err < 0) {
             if (signal_pending(current))
                 break;
...
...
```

The [kernel_accept](https://www.kernel.org/doc/html/v5.6/networking/kapi.html) documentation shows that `int kernel_accept(struct socket * sock, struct socket ** newsock, int flags)` takes three arguments: the listening `socket`, the `socket` for the newly established connection, and `flags`, which sets properties of that `socket`.

> flags must be SOCK_CLOEXEC, SOCK_NONBLOCK or 0. If it fails, newsock is guaranteed to be NULL. Returns 0 or an error.

So the third argument needs to be changed to `SOCK_NONBLOCK`.

Testing with `./htstress -n 10000 http://localhost:8081/` then shows in `dmesg` that the `timer` works as expected.

```text
[26107.946917] khttpd: handle_expired_timers() node->deleted: free node of socket 637491968
[26107.946968] khttpd: add_timer: prio_queue_insert successfully
[26107.946972] khttpd: requested_url = /
[26107.947031] khttpd: add_timer: prio_queue_insert successfully
[26107.947035] khttpd: requested_url = /
[26107.947044] khttpd: handle_expired_timers() node->deleted: free node of socket 1661477824
[26107.947094] khttpd: add_timer: prio_queue_insert successfully
[26107.947098] khttpd: requested_url = /
[26107.947108] khttpd: handle_expired_timers() node->deleted: free node of socket 1660977984
[26107.947154] khttpd: add_timer: prio_queue_insert successfully
[26107.947158] khttpd: requested_url = /
[26107.947168] khttpd: handle_expired_timers() node->deleted: free node of socket 1226516224
```

## Implementing directory listing

Kernel space provides `int iterate_dir(struct file *file, struct dir_context *ctx)` for this purpose. According to its [definition](https://elixir.bootlin.com/linux/latest/source/fs/readdir.c#L40), `iterate_dir()` takes two arguments: `struct file *file` and `struct dir_context *ctx`.

Opening a file in kernel space requires a different function than in user space. Here [filp_open(const char *filename, int flags, umode_t mode)](https://elixir.bootlin.com/linux/latest/source/fs/open.c#L1315) returns a pointer to a `struct file`. For now the path to open is fixed to the root directory `"/"`.

Next, look at [`struct dir_context *`](https://elixir.bootlin.com/linux/v4.8/source/include/linux/fs.h#L1644). The callback type is defined by `typedef int (*filldir_t)(struct dir_context *, const char *, int, loff_t, u64, unsigned);`. Here `printdir()` is defined as the
`callback function`: when `iterate_dir()` runs, it calls `printdir()` for each entry.

```c
static int printdir(struct dir_context *ctx,
                    const char *name,
                    int namlen,
                    loff_t offset,
                    u64 ino,
                    unsigned int d_type)
{
    /* skip the current and parent directory entries */
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
        return 0;
    }
    pr_info("Filename : %s\n", name);
    return 0;
}

void list_directory(void)
{
    char *path = "/";
    struct dir_context ctx = {.actor = &printdir};
    struct file *fp = filp_open(path, O_DIRECTORY, S_IRWXU | S_IRWXG | S_IRWXO);
    if (IS_ERR(fp)) {
        printk("Open file error\n");
        return; /* fp is an error pointer and must not be used */
    }
    iterate_dir(fp, &ctx);
    filp_close(fp, NULL);
    return;
}
```

The output is shown below: the entries of the root directory are printed successfully. The next step is to convert them into `http` format.

```text
[ 2662.325454] khttpd: Filename : dev
[ 2662.325455] khttpd: Filename : cdrom
[ 2662.325455] khttpd: Filename : boot
[ 2662.325456] khttpd: Filename : proc
[ 2662.325456] khttpd: Filename : lib32
[ 2662.325457] khttpd: Filename : var
[ 2662.325457] khttpd: Filename : snap
[ 2662.325457] khttpd: Filename : mnt
[ 2662.325458] khttpd: Filename : etc
[ 2662.325458] khttpd: Filename : sbin
[ 2662.325458] khttpd: Filename : opt
[ 2662.325459] khttpd: Filename : lib64
[ 2662.325459] khttpd: Filename : sys
[ 2662.325459] khttpd: Filename : media
[ 2662.325460] khttpd: Filename : lib
[ 2662.325460] khttpd: Filename : tmp
[ 2662.325460] khttpd: Filename : libx32
[ 2662.325461] khttpd: Filename : root
[ 2662.325461] khttpd: Filename : swapfile
[ 2662.325461] khttpd: Filename : run
[ 2662.325462] khttpd: Filename : bin
[ 2662.325462] khttpd: Filename : home
[ 2662.325462] khttpd: Filename : srv
[ 2662.325463] khttpd: Filename : lost+found
[ 2662.325463] khttpd: Filename : usr
```

For the format of an `HTTP` response, see [http response](https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages). Once the code is modified, it can be tested from a browser.

```c
static int printdir(struct dir_context *ctx,
                    const char *name,
                    int namlen,
                    loff_t offset,
                    u64 ino,
                    unsigned int d_type)
{
    char *buf = kmalloc(BUFFER_SIZE, GFP_KERNEL);
    struct http_request *request = container_of(ctx,
                                                struct http_request, ctx);
    if (!buf)
        return -ENOMEM;
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
        kfree(buf);
        return 0;
    }
    snprintf(buf, BUFFER_SIZE, "<li><a href=/%s/>%s</a></li>", name, name);
    /* send only the formatted text, not the whole buffer */
    http_server_send(request->socket, buf, strlen(buf));
    kfree(buf);
    return 0;
}

static void list_directory_info(struct http_request *request)
{
    pr_info("Into : list_directory_info()\n");
    char *response = kmalloc(BUFFER_SIZE, GFP_KERNEL);
    if (!response)
        return;
    if (request->method != HTTP_GET) {
        /* HTTP_RESPONSE_501 is a constant string: send it directly
         * rather than overwriting (and then freeing) `response` */
        http_server_send(request->socket, HTTP_RESPONSE_501,
                         strlen(HTTP_RESPONSE_501));
        kfree(response);
        return;
    }
    char *path = "/";
    request->ctx.actor = &printdir;
    struct file *fp = filp_open(path, O_RDONLY, 0);
    if (IS_ERR(fp)) {
        pr_err("Open file error\n");
        kfree(response);
        return;
    }
    snprintf(response, BUFFER_SIZE, "HTTP/1.1 200 OK \r\n%s%s%s",
             "Server: localhost\r\n", "Content-Type: text/html\r\n",
             "Keep-Alive: timeout=5, max=999\r\n\r\n");
    http_server_send(request->socket, response, strlen(response));
    memset(response, '\0', BUFFER_SIZE);
    snprintf(response, BUFFER_SIZE,
             "<!DOCTYPE html><html><head><title>Page "
             "Title</title></head><body><ul>");
    http_server_send(request->socket, response, strlen(response));
    memset(response, '\0', BUFFER_SIZE);
    iterate_dir(fp, &(request->ctx));
    snprintf(response, BUFFER_SIZE, "</ul></body></html>");
    http_server_send(request->socket, response, strlen(response));
    filp_close(fp, NULL);
    kfree(response);
    return;
}
```

Open a browser and enter `http://localhost:8081` as the `URL`. If everything works, the page looks like this:

![](https://hackmd.io/_uploads/SJyjTQOYh.png)

At this point the server cannot yet respond to requests for other paths, so a `WWWROOT` setting is introduced as a step toward that. `#define DEFAULT_ROOT "/"` defines the default location, and the `module_param` macro lets the `WWWROOT` variable be set at `insmod` time. For details on its use, see [The Linux Kernel Module Programming Guide : 4.5 Passing Command Line Arguments to a Module](https://sysprog21.github.io/lkmpg/#passing-command-line-arguments-to-a-module)

```c
#define DEFAULT_ROOT "/"
...
/* a definition with an initializer, so it is not declared `extern` */
char *WWWROOT = DEFAULT_ROOT;
module_param(WWWROOT, charp, 0000);
...
```

A `char *root` member is added to `khttp_server_service` to store the value of `WWWROOT`, and `struct khttp_server_service daemon` is declared `extern`. `khttpd_init()` then assigns `WWWROOT` to `daemon.root`, after which `list_directory_info()` can read the value of `WWWROOT`.

```c
struct khttp_server_service {
    bool is_stopped;
    struct list_head worker;
    char *root;
};

extern struct khttp_server_service daemon;
```

```c
static int __init khttpd_init(void)
{
    int err = open_listen_socket(port, backlog, &listen_socket);
    if (err < 0) {
        pr_err("can't open listen socket\n");
        return err;
    }
    param.listen_socket = listen_socket;
    daemon.root = WWWROOT;
    khttp_wq = alloc_workqueue("khttp_wq", WQ_UNBOUND, 0);
    http_server = kthread_run(http_server_daemon, &param, KBUILD_MODNAME);
    if (IS_ERR(http_server)) {
        pr_err("can't start http server daemon\n");
        close_listen_socket(listen_socket);
        return PTR_ERR(http_server);
    }
    return 0;
}
```

```c
static void list_directory_info(struct http_request *request)
{
    pr_info("Into : list_directory_info()\n");
    char *response = kmalloc(BUFFER_SIZE, GFP_KERNEL);
    if (!response)
        return;
    if (request->method != HTTP_GET) {
        /* HTTP_RESPONSE_501 is a constant string: send it directly
         * rather than overwriting (and then freeing) `response` */
        http_server_send(request->socket, HTTP_RESPONSE_501,
                         strlen(HTTP_RESPONSE_501));
        kfree(response);
        return;
    }
    char *path = daemon.root;
    ...
    ...
```

When the user clicks a directory, the target location changes through `request_url`. The default `request_url` is `/`; clicking the `home` directory turns `request_url` into `/home`.

It is also necessary to tell whether the opened file is a directory or a regular file. The file's attributes can be determined from its `inode`, whose structure is shown in [fs.h](https://elixir.bootlin.com/linux/latest/source/include/linux/fs.h#L603)

```c
struct inode {
    umode_t i_mode;
    unsigned short i_opflags;
    kuid_t i_uid;
    kgid_t i_gid;
    unsigned int i_flags;
    ...
```

The macros `S_ISREG(m)` and `S_ISDIR(m)` determine the file type. Their argument is `i_mode`, so when `S_ISDIR(m)` is true the opened file is a directory.

```c
#define S_ISREG(m) (((m) & S_IFMT) == S_IFREG)
#define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR)
```

First take the `inode` of `struct file *fp`, then test its `i_mode` member.

```c
struct inode *inode = fp->f_inode;

if (S_ISDIR(inode->i_mode)) {
    snprintf(response, BUFFER_SIZE,
             "<!DOCTYPE html><html><head><title>Directory"
             "</title></head><body><ul>");
    http_server_send(request->socket, response, strlen(response));
    memset(response, '\0', BUFFER_SIZE);
    iterate_dir(fp, &(request->ctx));
    ...
    ...
} else if (S_ISREG(inode->i_mode)) {
    snprintf(response, BUFFER_SIZE,
             "<!DOCTYPE html><html><head>"
             ...
```

If the opened file is a `regular file`, its contents have to be read into a `buffer` and sent back. Reading a file in kernel space is done with `kernel_read`; see [fs.h](https://elixir.bootlin.com/linux/latest/source/include/linux/fs.h#L2605) for its declaration.

```c
...
} else if (S_ISREG(inode->i_mode)) {
    snprintf(response, BUFFER_SIZE,
             "<!DOCTYPE html><html><head><title>Regular"
             " File</title></head><body><p>");
    http_server_send(request->socket, response, strlen(response));
    memset(response, '\0', BUFFER_SIZE);
    loff_t pos = 0;
    /* read at most BUFFER_SIZE bytes so a large file cannot overflow `response` */
    ssize_t ret = kernel_read(fp, response,
                              min_t(size_t, fp->f_inode->i_size, BUFFER_SIZE),
                              &pos);
    if (ret > 0)
        http_server_send(request->socket, response, ret);
    ...
```

After that, the directories can be browsed interactively by clicking them in a web browser, and when a text file is opened its contents are displayed in the browser.

![](https://hackmd.io/_uploads/B1r7rLH93.png)

## Handling MIME types
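
As a starting point for this section, the following is a minimal user-space sketch of one common approach: choosing a `Content-Type` from the file extension of the requested path. It is not the project's implementation. The names `mime_entry`, `mime_table`, and `mime_lookup`, as well as the table entries and the `application/octet-stream` fallback, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extension-to-MIME mapping; the entries are illustrative only. */
struct mime_entry {
    const char *ext;
    const char *type;
};

static const struct mime_entry mime_table[] = {
    {".html", "text/html"},
    {".css", "text/css"},
    {".js", "text/javascript"},
    {".txt", "text/plain"},
    {".png", "image/png"},
    {".jpg", "image/jpeg"},
    {".pdf", "application/pdf"},
};

/* Return a MIME type for `path`, based on the text after its last '.'. */
static const char *mime_lookup(const char *path)
{
    const char *dot, *slash;
    size_t i;

    assert(path != NULL);
    dot = strrchr(path, '.');
    slash = strrchr(path, '/');

    /* No extension, or the last dot belongs to a directory component. */
    if (!dot || (slash && dot < slash))
        return "application/octet-stream";

    for (i = 0; i < sizeof(mime_table) / sizeof(mime_table[0]); i++) {
        if (strcmp(dot, mime_table[i].ext) == 0)
            return mime_table[i].type;
    }
    /* Unknown extension: fall back to a generic binary type. */
    return "application/octet-stream";
}
```

With this table, `mime_lookup("/home/notes.txt")` yields `text/plain`, while a path without an extension falls back to `application/octet-stream`. Matching is case-sensitive in this sketch. `strrchr()` and `strcmp()` are also available in the kernel, so a lookup of this shape could in principle be used to fill in the `Content-Type` header for regular files, with `assert()` replaced by an ordinary NULL check.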