# 2020q1 Homework4 (khttpd)

contributed by <_`ire33164`_>

###### tags: `Linux Kernel`

[Assignment description](https://hackmd.io/@sysprog/linux2020-khttpd)

## Development environment

```
$ uname -a
Linux chia-GL72-6QF 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
```

```
$ gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

## Self-check list

### Passing port=1999 to the kernel at load time as a module initialization parameter

Initialization parameters are passed in through `module_param`. In `main.c`, `port` is not the only one: `backlog` can also be initialized to a chosen value when the module is loaded.

```c
static ushort port = DEFAULT_PORT;
module_param(port, ushort, S_IRUGO);
static ushort backlog = DEFAULT_BACKLOG;
module_param(backlog, ushort, S_IRUGO);
```

Next, how `module_param` works.

The following is excerpted from [include/linux/moduleparam.h](https://elixir.bootlin.com/linux/v4.18/source/include/linux/moduleparam.h#L128):

```c
/**
 * module_param - typesafe helper for a module/cmdline parameter
 * @value: the variable to alter, and exposed parameter name.
 * @type: the type of the parameter
 * @perm: visibility in sysfs.
 *
 * @value becomes the module parameter, or (prefixed by KBUILD_MODNAME and a
 * ".") the kernel commandline parameter.  Note that - is changed to _, so
 * the user can use "foo-bar=1" even for variable "foo_bar".
 *
 * @perm is 0 if the the variable is not to appear in sysfs, or 0444
 * for world-readable, 0644 for root-writable, etc.  Note that if it
 * is writable, you may need to use kernel_param_lock() around
 * accesses (esp. charp, which can be kfreed when it changes).
 *
 * The @type is simply pasted to refer to a param_ops_##type and a
 * param_check_##type: for convenience many standard types are provided but
 * you can create your own by defining those variables.
 *
 * Standard types are:
 *	byte, short, ushort, int, uint, long, ulong
 *	charp: a character pointer
 *	bool: a bool, values 0/1, y/n, Y/N.
 *	invbool: the above, only sense-reversed (N = true).
 */
#define module_param(name, type, perm) \
	module_param_named(name, name, type, perm)
```

The comment above describes the three parameters of `module_param`:

* `value` (called `name` in the macro definition): the variable that becomes the module parameter, or the parameter set on the command line. As in the example, `$ sudo insmod khttpd.ko port=1999` initializes the parameter `port` to 1999.
* `type`: the data type of the variable. It can be byte, short, ushort, int, uint, long, ulong, charp, bool, or invbool.
* `perm`: the access permission of the variable. `0` means the parameter does not appear in sysfs, and `0444` means world-readable. In the example, `perm` is set to `S_IRUGO`, which is defined in [linux/include/linux/stat.h ](http://lxr.linux.no/linux+v2.6.35/include/linux/stat.h#L52):

```c
#define S_IRUSR 00400
#define S_IRGRP 00040
#define S_IROTH 00004
#define S_IRUGO (S_IRUSR|S_IRGRP|S_IROTH)
```

So in the example, `port` is world-readable and its data type is ushort.

Expanding `module_param(port, ushort, S_IRUGO)` one step further gives `module_param_named(port, port, ushort, S_IRUGO)`.

The following is excerpted from [/include/linux/moduleparam.h](https://elixir.bootlin.com/linux/v4.18/source/include/linux/moduleparam.h#L148):

```c
/**
 * module_param_named - typesafe helper for a renamed module/cmdline parameter
 * @name: a valid C identifier which is the parameter name.
 * @value: the actual lvalue to alter.
 * @type: the type of the parameter
 * @perm: visibility in sysfs.
 *
 * Usually it's a good idea to have variable names and user-exposed names the
 * same, but that's harder if the variable must be non-static or is inside a
 * structure.  This allows exposure under a different name.
 */
#define module_param_named(name, value, type, perm) \
	param_check_##type(name, &(value)); \
	module_param_cb(name, &param_ops_##type, &value, perm); \
	__MODULE_PARM_TYPE(name, #type)
```

This shows that `module_param_named` lets a module/command-line parameter be exposed under a name different from the variable's.

Expanding `module_param_cb`:

```c
/**
 * module_param_cb - general callback for a module/cmdline parameter
 * @name: a valid C identifier which is the parameter name.
 * @ops: the set & get operations for this parameter.
 * @perm: visibility in sysfs.
 *
 * The ops can have NULL set or get functions.
 */
#define module_param_cb(name, ops, arg, perm) \
	__module_param_call(MODULE_PARAM_PREFIX, name, ops, arg, perm, -1, 0)
```

Expanding `__module_param_call` in turn:

```c
/* This is the fundamental function for registering boot/module
   parameters. */
#define __module_param_call(prefix, name, ops, arg, perm, level, flags) \
	/* Default value instead of permissions? */ \
	static const char __param_str_##name[] = prefix #name; \
	static struct kernel_param __moduleparam_const __param_##name \
	__used \
	__attribute__ ((unused,__section__ ("__param"),aligned(sizeof(void *)))) \
	= { __param_str_##name, THIS_MODULE, ops, \
	    VERIFY_OCTAL_PERMISSIONS(perm), level, flags, { arg } }
```

Here each field of a `struct kernel_param` is filled in:

```c
struct kernel_param {
	const char *name;
	struct module *mod;
	const struct kernel_param_ops *ops;
	const u16 perm;
	s8 level;
	u8 flags;
	union {
		void *arg;
		const struct kparam_string *str;
		const struct kparam_array *arr;
	};
};
```

`struct kernel_param` is precisely the data structure that represents a module parameter.

### What the epoll system calls do and how the HTTP benchmarking tool works

#### epoll explained

According to [epoll(7)](http://man7.org/linux/man-pages/man7/epoll.7.html), epoll monitors multiple file descriptors to see whether I/O is possible on any of them. Everything goes through an `epoll instance`, an in-kernel data structure that contains an `interest list` and a `ready list`. The `interest list` holds every registered file descriptor, while the `ready list` holds the registered file descriptors that are currently ready for I/O, so the latter is a subset of the former.

Three system calls are provided:

* `epoll_create`: creates an `epoll instance` and returns an fd that refers to it. The `size` argument was originally a hint for how many fds the instance would monitor; since Linux 2.6.8 it is ignored but must still be greater than zero.
* `epoll_ctl`: adds, modifies, or removes an fd in the interest list of epfd.
* `epoll_wait`: waits for events until the timeout expires and returns the number of ready fds (0 if the timeout expires with none ready).

#### How the HTTP benchmarking tool works

All of the code below is taken from [htstress.c](https://github.com/sysprog21/khttpd/blob/master/htstress.c):

```c
start_time();
```

Records the time before the following code runs.

```c
for (int n = 0; n < num_threads - 1; ++n)
    pthread_create(&useless_thread, 0, &worker, 0);

worker(0);
```

The created threads plus the calling thread add up to exactly `num_threads`, and every one of them runs `worker`. Next, what `worker` does:

```c
...
int efd = epoll_create(concurrency);
...
for (int n = 0; n < concurrency; ++n)
    init_conn(efd, ecs + n);
...
for (;;) {
    do {
        nevts = epoll_wait(efd, evts, sizeof(evts) / sizeof(evts[0]), -1);
    } while (!exit_i && nevts < 0 && errno == EINTR);
    ...
```

Each worker first creates one epoll instance and registers `concurrency` connections with it, which means it monitors `concurrency` fds at once. Each fd is used for a socket connection, so this amounts to keeping `concurrency` requests in flight at a time. The worker then waits for I/O events on those fds and, depending on the event reported for each ready fd (`EPOLLOUT` or `EPOLLIN`), calls `send` or `recv`, that is, it sends a request to the server or receives the response.

```c
double delta =
    tve.tv_sec - tv.tv_sec + ((double) (tve.tv_usec - tv.tv_usec)) / 1e6;
```

Computes the time taken by the code above.

In short, [htstress.c](https://github.com/sysprog21/khttpd/blob/master/htstress.c) uses threads to keep up to `num_threads` * `concurrency` requests in flight against the server at once. After `max_requests` requests have been sent and all of their responses received, it computes the time elapsed between sending the requests and receiving the server's responses.

## Integrating the fibdrv results into kHTTPd

Because the server has to accept `/fib/N` requests from clients, the first step is to locate the code that handles `http_request` and `http_response`. In `http_server.c`, `http_server_response` is the function that replies to the client. Start with the default response macros:

```c
#define HTTP_RESPONSE_200_DUMMY                               \
    ""                                                        \
    "HTTP/1.1 200 OK" CRLF "Server: " KBUILD_MODNAME CRLF     \
    "Content-Type: text/plain" CRLF "Content-Length: 12" CRLF \
    "Connection: Close" CRLF CRLF "Hello World!" CRLF
#define HTTP_RESPONSE_200_KEEPALIVE_DUMMY                     \
    ""                                                        \
    "HTTP/1.1 200 OK" CRLF "Server: " KBUILD_MODNAME CRLF     \
    "Content-Type: text/plain" CRLF "Content-Length: 12" CRLF \
    "Connection: Keep-Alive" CRLF CRLF "Hello World!" CRLF
```

Both macros hard-code the body "Hello World!" and the length "12", so I turned these two parts into format placeholders:

```c
#define HTTP_RESPONSE_200_DUMMY                               \
    ""                                                        \
    "HTTP/1.1 200 OK" CRLF "Server: " KBUILD_MODNAME CRLF     \
    "Content-Type: text/plain" CRLF "Content-Length: %d" CRLF \
    "Connection: Close" CRLF CRLF "%s" CRLF
#define HTTP_RESPONSE_200_KEEPALIVE_DUMMY                     \
    ""                                                        \
    "HTTP/1.1 200 OK" CRLF "Server: " KBUILD_MODNAME CRLF     \
    "Content-Type: text/plain" CRLF "Content-Length: %d" CRLF \
    "Connection: Keep-Alive" CRLF CRLF "%s" CRLF
```

All that remains is to substitute the Fibonacci result into those two placeholders, so the next step is the Fibonacci computation itself. Because the assignment requires big-number arithmetic, I imported my earlier [bignum_operation.[ch]](https://github.com/ire33164/khttpd/blob/master/bignum_operation.h) and rewrote `fib_eval()` so that `http_server.c` can call it.

Next, `http_server_response` is rewritten so that it can handle `/fib/N` requests. The part that parses the request URL is implemented in `http_get_parse_url`:

```c
static char *http_get_parse_url(struct http_request *request, int keep_alive)
{
    const char *delim = "/";
    char *token, *cur = request->request_url;
    char *kbuf;
    char *response = keep_alive ? HTTP_RESPONSE_200_KEEPALIVE_DUMMY
                                : HTTP_RESPONSE_200_DUMMY;
    /* format string without "%d" and "%s", plus the body and the NUL */
    int len = MAX_DIGIT + strlen(response) - 4 + 1;

    kbuf = kmalloc(len, GFP_KERNEL);
    if (!kbuf)
        return NULL;

    cur++; /* skip the leading '/' */
    token = strsep(&cur, delim);
    if (strcmp(token, "fib") == 0) {
        /* evaluate fib(N) */
        long N;
        char fib_result[MAX_DIGIT];
        bignum fib_val;

        /* strsep() returns NULL when there is no component after "fib" */
        token = strsep(&cur, delim);
        if (token && kstrtol(token, 10, &N) == 0 && N >= 0) {
            fib_val = fib_eval(N);
            bignum2str(&fib_val, fib_result);
            snprintf(kbuf, len, response, (int) strlen(fib_result),
                     fib_result);
            return kbuf;
        }
    }
    /* anything else, including a missing or invalid N, gets the default body */
    snprintf(kbuf, len, response, 12, "Hello World!");
    return kbuf;
}
```

`strsep` splits the request URL at `/`. If the component after the first `/` is `fib`, the component after the next `/` should be the number N, so the URL is split once more and `kstrtol` converts N from a string to a long. `fib_eval(N)` then computes fibonacci(N), and `snprintf` substitutes the result into the HTTP response format. If the request URL does not start with `/fib`, or N is missing or not a valid non-negative integer, the response is always "Hello World!".

Finally, the rewritten `http_server_response`:

```c
static int http_server_response(struct http_request *request, int keep_alive)
{
    char *response;
    char *kbuf = NULL; /* heap buffer returned by http_get_parse_url() */

    pr_info("requested_url = %s\n", request->request_url);
    if (request->method != HTTP_GET) {
        response = keep_alive ? HTTP_RESPONSE_501_KEEPALIVE : HTTP_RESPONSE_501;
    } else {
        response = kbuf = http_get_parse_url(request, keep_alive);
        if (!response)
            return -ENOMEM;
    }
    http_server_send(request->socket, response, strlen(response));
    kfree(kbuf); /* kfree(NULL) is a no-op for the 501 case */
    return 0;
}
```

### The recv error : -104 problem

Unlike requests sent from a browser, sending a request with `wget` makes the kernel log the error message `recv error : -104` (errno 104 is `ECONNRESET`, connection reset by peer).

:::danger
Unfinished
:::

## Verifying the Fibonacci computation

Here I use the Python implementation provided by [黃鈺盛](https://www.facebook.com/wilson.d.huang?comment_id=Y29tbWVudDozNDUxMjQ3Mjk3NTUwNjZfMzQ1NTcwNjE2Mzc3MTQ0) in the [Facebook discussion thread](https://www.facebook.com/groups/system.software2020/permalink/345124729755066/). It fetches the value of fibonacci(N) from `http://www.protocol5.com/Fibonacci/{number}.htm`, runs the fibonacci executable, compares the two results, and records the computation time.

I made a few changes to the verification program: the executable path is replaced with `http://localhost:PORT/fib/N`, running the executable is replaced with sending a request to that URL, and the recorded time runs from sending the HTTP request to receiving khttpd's response.

```python
FIB_URL = "http://localhost:" + args.port + "/fib/" + args.index
```

```python
...
res = requests.get(url)
if res.status_code == requests.codes.ok:
    duration = time.time() - start_time
    print(f"fibonacci({index}) = {res.text}")
    print(f"time cost {duration} (s)")
    expect = fetch_fib_number(args.index)[base]
    if res.text == expect:
        print(Fore.GREEN + "-----------------------Pass----------------------------")
    else:
        print(Fore.RED + "------------------------No pass--------------------------")
    exit(0)
else:
    print(Fore.RED + "-----------------------Fail-----------------------")
    exit(1)
...
```

Then add a `verify` target to the Makefile:

```
verify:
	python3 ./fib_verify.py ${PORT} ${N}
```

After that, verification only requires:

```
$ make verify
python3 ./fib_verify.py 8081 10
fibonacci(10) = 55
time cost 0.0032465457916259766 (s)
-----------------------Pass----------------------------
```

## Concurrency Managed Workqueue

After introducing cmwq, following the approach used in [kecho](https://github.com/sysprog21/kecho):

```
$ ./htstress -n 100000 -c 4 -t 4 http://localhost:8081/
...
requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
socker errors: 0 [0%]
seconds:       2.082
requests/sec:  48038.931
```

Compared with the previous version:

```
$ ./htstress -n 100000 -c 4 -t 4 http://localhost:8081/
....
requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
socker errors: 0 [0%]
seconds:       5.186
requests/sec:  19282.472
```

`requests/sec` rose from 19282.472 to 48038.931, about 2.5 times the previous figure (an increase of roughly 150%).

* Requesting Fibonacci values instead

Old version:

```
$ ./htstress -n 100000 -c 4 -t 4 http://localhost:8081/fib/136
...
requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
socker errors: 0 [0%]
seconds:       6.231
requests/sec:  16047.714
```

New version:

```
$ ./htstress -n 100000 -c 4 -t 4 http://localhost:8081/fib/13
...
requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
socker errors: 0 [0%]
seconds:       2.282
requests/sec:  43827.125
```

The gap is even larger here: 43827.125 versus 16047.714 requests/sec, about 2.7 times. Note that the two commands shown request different indices (`/fib/136` and `/fib/13`).

## References

1. [Facebook discussion thread](https://www.facebook.com/groups/system.software2020/permalink/345124729755066/)
2. [fibdrv assignment description](https://hackmd.io/@sysprog/linux2020-fibdrv)
3. [epoll(7)](http://man7.org/linux/man-pages/man7/epoll.7.html)
4. [ip(7)](http://man7.org/linux/man-pages/man7/ip.7.htm)
5. [When exactly are EPOLLIN and EPOLLOUT triggered?](https://blog.csdn.net/hintonic/article/details/16882989)
6. [ERRNO(3)](http://man7.org/linux/man-pages/man3/errno.3.html)
7. [The method to epoll’s madness](https://medium.com/@copyconstruct/the-method-to-epolls-madness-d9d2d6378642)
8. [Concurrency Managed Workqueue (2): CMWQ overview](http://www.wowotech.net/irq_subsystem/cmwq-intro.html)
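
## Appendix: a defensive `/fib/N` parsing sketch

The `/fib/N` handling described earlier depends on splitting the URL at `/` and converting the second component to a number. As a minimal user-space sketch of that one step (not the module's actual code), the helper below validates a `/fib/N` URL before converting it. The name `parse_fib_index` and its 0 / -1 return convention are assumptions made for illustration, and the C library's `strtol` stands in for the kernel's `kstrtol`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of khttpd): parse a request URL of the
 * form "/fib/N" and store N in *out.
 * Returns 0 on success, -1 if the URL is not a well-formed /fib/N request.
 */
static int parse_fib_index(const char *url, long *out)
{
    static const char prefix[] = "/fib/";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *digits;
    char *end;
    long n;

    assert(url != NULL && out != NULL);

    if (strncmp(url, prefix, prefix_len) != 0)
        return -1; /* not a /fib/ request */

    digits = url + prefix_len;
    if (*digits < '0' || *digits > '9')
        return -1; /* index missing, negative, or not a number */

    errno = 0;
    n = strtol(digits, &end, 10);
    if (errno == ERANGE)
        return -1; /* index does not fit in a long */
    if (*end != '\0' && *end != '/')
        return -1; /* trailing garbage such as "/fib/12abc" */

    *out = n;
    return 0;
}
```

With this helper, `parse_fib_index("/fib/10", &n)` returns 0 and stores 10 in `n`, while `/`, `/fib/`, and `/fib/12abc` are rejected with -1, so a caller can fall back to the default "Hello World!" body instead of working on a missing token.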