2022q1 Homework6 (khttpd)

contributed by < Risheng1128 >

Assignment description
 Assignment area
 2022 Linux Kernel Design/Implementation course final project

Experimental environment (laptop)

$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              2
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           142
Model name:                      Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
Stepping:                        9
CPU MHz:                         2700.000
CPU max MHz:                     3100.0000
CPU min MHz:                     400.0000
BogoMIPS:                        5399.81
Virtualization:                  VT-x
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        512 KiB
L3 cache:                        3 MiB
NUMA node0 CPU(s):               0-3

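Self-check list

With reference to the Linux kernel module loading mechanism, explain how the command $ sudo insmod khttpd.ko port=1999 passes port=1999 into the kernel as an initialization parameter of the module.
With reference to Chapter 11 of CS:APP, which parts of the flow are the same in the given kHTTPd and in the web server from the book? Which parts of kHTTPd do you think could be improved?
htstress.c uses the epoll system call. What is its role? How does an HTTP benchmarking tool like this work?

How data is passed to a kernel module

See How the Linux kernel handles parameters passed to kernel modules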

Loading the `khttpd` module

When khttpd is loaded, the function khttpd_init runs. The actual code is shown below.

static int __init khttpd_init(void)
{
    int err = open_listen_socket(port, backlog, &listen_socket);
    if (err < 0) {
        pr_err("can't open listen socket\n");
        return err;
    }
    param.listen_socket = listen_socket;
    http_server = kthread_run(http_server_daemon, &param, KBUILD_MODNAME);
    if (IS_ERR(http_server)) {
        pr_err("can't start http server daemon\n");
        close_listen_socket(listen_socket);
        return PTR_ERR(http_server);
    }
    return 0;
}

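The initialization of the khttpd module closely resembles that of the kecho module, but one difference stands out: khttpd does not call alloc_workqueue to create the CMWQ used in kecho. As the code shows, it starts kernel threads with kthread_run instead, so the performance difference between the two approaches can be examined later. Initialization can be split into two parts:

open_listen_socket: creates the server socket and waits for connections
kthread_run: creates a kernel thread that starts running immediately

Starting with open_listen_socket: the steps for setting up the socket are the usual ones, but the function also relies on a helper named setsockopt. Excerpts of open_listen_socket and setsockopt follow.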

static int open_listen_socket(ushort port, ushort backlog, struct socket **res)
{
    ...
    err = setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, 1);
    if (err < 0)
        goto bail_setsockopt;

    err = setsockopt(sock, SOL_TCP, TCP_NODELAY, 1);
    if (err < 0)
        goto bail_setsockopt;

    err = setsockopt(sock, SOL_TCP, TCP_CORK, 0);
    if (err < 0)
        goto bail_setsockopt;

    err = setsockopt(sock, SOL_SOCKET, SO_RCVBUF, 1024 * 1024);
    if (err < 0)
        goto bail_setsockopt;

    err = setsockopt(sock, SOL_SOCKET, SO_SNDBUF, 1024 * 1024);
    if (err < 0)
        goto bail_setsockopt;
    ...
}

static inline int setsockopt(struct socket *sock,
                             int level,
                             int optname,
                             int optval)
{
    int opt = optval;
    return kernel_setsockopt(sock, level, optname, (char *) &opt, sizeof(opt));
}

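One detail here is a check on the Linux kernel version. According to Support Linux v5.8+ (#5) and net: remove kernel_setsockopt, the function kernel_setsockopt was removed in Linux v5.8, so the khttpd module provides separate implementations selected by kernel version.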

#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0)

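Next, consider what levels such as SOL_SOCKET and SOL_TCP and their options mean, based on socket(7) - Linux man page and tcp(7) - Linux manual page. The options khttpd uses are summarized below. The description of SO_REUSEADDR is somewhat hard to follow, so What is the meaning of SO_REUSEADDR (setsockopt option) - Linux? was also consulted.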

SOL_SOCKET

Setting	Description
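SO_REUSEADDR	Allows a socket to bind to the same IP address and port again right after the previous connection has ended, for example while old connections are still in TIME_WAIT
SO_RCVBUF	Sets the maximum size, in bytes, of the socket receive buffer
SO_SNDBUF	Sets the maximum size, in bytes, of the socket send buffer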

SOL_TCP

Setting	Description
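TCP_NODELAY	Disables Nagle's algorithm (see Best Practices for TCP Optimization in 2019)
TCP_CORK	Often used together with TCP_NODELAY. To avoid sending a stream of small packets (smaller than the MSS), TCP_CORK holds data back and sends it in larger packets (see Is there any significant difference between TCP_CORK and TCP_NODELAY in this use-case?). khttpd passes 0 here, which leaves corking disabled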
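The same options can be tried from user space, where setsockopt takes a pointer and a length just like the kernel helper above. The following is a minimal sketch, not khttpd code: the function names set_int_opt, get_int_opt, and make_tuned_socket are made up for this illustration, and it assumes a Linux system (TCP_CORK is Linux-specific).

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set an integer socket option; returns 0 on success, -1 on error. */
int set_int_opt(int fd, int level, int name, int value)
{
    return setsockopt(fd, level, name, &value, sizeof(value));
}

/* Read an integer socket option back; returns -1 on error. */
int get_int_opt(int fd, int level, int name)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, level, name, &value, &len) < 0)
        return -1;
    return value;
}

/* Create a TCP socket with the same options khttpd applies; fd or -1. */
int make_tuned_socket(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (set_int_opt(fd, SOL_SOCKET, SO_REUSEADDR, 1) < 0 ||
        set_int_opt(fd, IPPROTO_TCP, TCP_NODELAY, 1) < 0 ||
        set_int_opt(fd, IPPROTO_TCP, TCP_CORK, 0) < 0 ||
        set_int_opt(fd, SOL_SOCKET, SO_RCVBUF, 1024 * 1024) < 0 ||
        set_int_opt(fd, SOL_SOCKET, SO_SNDBUF, 1024 * 1024) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Reading SO_RCVBUF back with get_int_opt usually does not return the requested value, because the kernel adjusts and caps buffer sizes, so the sketch only relies on the boolean options.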

After the socket is set up, kthread_run creates a thread that runs the function http_server_daemon.

int http_server_daemon(void *arg)
{
    struct socket *socket;
    struct task_struct *worker;
    struct http_server_param *param = (struct http_server_param *) arg;

    // Register the signals this thread accepts
    allow_signal(SIGKILL);
    allow_signal(SIGTERM);

    // Loop until the thread is asked to stop
    while (!kthread_should_stop()) {
        int err = kernel_accept(param->listen_socket, &socket, 0);
        if (err < 0) {
            // Check whether a signal is pending for the current thread
            if (signal_pending(current))
                break;
            pr_err("kernel_accept() error: %d\n", err);
            continue;
        }
        worker = kthread_run(http_server_worker, socket, KBUILD_MODNAME);
        if (IS_ERR(worker)) {
            pr_err("can't create more worker process\n");
            continue;
        }
    }
    return 0;
}

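The overall logic matches the kecho module. The daemon first registers SIGKILL and SIGTERM, then uses kthread_should_stop to decide whether the thread running http_server_daemon should stop. It calls kernel_accept to accept a client connection and, once a connection is established, uses kthread_run to create a new thread that runs http_server_worker.

Running `http_server_worker`

Every per-connection thread runs the function http_server_worker, which does the following:

Set up the callback functions, which khttpd mainly uses to send the response back to the client
Enter a loop, using kthread_should_stop to decide whether the thread should stop
Receive data
Parse the received data with http_parser_execute
After the connection closes, release all memory that was used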

static int http_server_worker(void *arg)
{
    char *buf;
    struct http_parser parser;
    // Set up the callback functions
    struct http_parser_settings setting = {
        .on_message_begin = http_parser_callback_message_begin,
        .on_url = http_parser_callback_request_url,
        .on_header_field = http_parser_callback_header_field,
        .on_header_value = http_parser_callback_header_value,
        .on_headers_complete = http_parser_callback_headers_complete,
        .on_body = http_parser_callback_body,
        .on_message_complete = http_parser_callback_message_complete};
    struct http_request request;
    struct socket *socket = (struct socket *) arg;

    allow_signal(SIGKILL);
    allow_signal(SIGTERM);

    buf = kmalloc(RECV_BUFFER_SIZE, GFP_KERNEL);
    if (!buf) {
        pr_err("can't allocate memory!\n");
        return -1;
    }

    request.socket = socket;
    // Initialize the parser
    http_parser_init(&parser, HTTP_REQUEST);
    parser.data = &request;

    // Loop until the thread is asked to stop
    while (!kthread_should_stop()) {
        // Receive data
        int ret = http_server_recv(socket, buf, RECV_BUFFER_SIZE - 1);
        if (ret <= 0) {
            if (ret)
                pr_err("recv error: %d\n", ret);
            break;
        }
        // Parse the received data
        http_parser_execute(&parser, &setting, buf, ret);
        if (request.complete && !http_should_keep_alive(&parser))
            break;
    }
    kernel_sock_shutdown(socket, SHUT_RDWR);
    sock_release(socket);
    kfree(buf);
    return 0;
}

The callback functions are mainly used to send the response to the client. The relevant functions are shown below.

static int http_parser_callback_message_complete(http_parser *parser)
{
    struct http_request *request = parser->data;
    http_server_response(request, http_should_keep_alive(parser));
    request->complete = 1;
    return 0;
}

static int http_server_response(struct http_request *request, int keep_alive)
{
    char *response;

    pr_info("requested_url = %s\n", request->request_url);
    if (request->method != HTTP_GET)
        response = keep_alive ? HTTP_RESPONSE_501_KEEPALIVE : HTTP_RESPONSE_501;
    else
        response = keep_alive ? HTTP_RESPONSE_200_KEEPALIVE_DUMMY
                              : HTTP_RESPONSE_200_DUMMY;
    http_server_send(request->socket, response, strlen(response));
    return 0;
}

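These functions are called once a whole request has been parsed. The corresponding code can be found in http_parser_execute.

Next comes http_parser_execute, a central function in khttpd. It parses the received data and, through the callbacks, triggers the response to the client.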

size_t http_parser_execute (http_parser *parser,
                            const http_parser_settings *settings,
                            const char *data,
                            size_t len)
{
    ...
    for (p=data; p != data + len; p++) {
        ch = *p;

        if (PARSING_HEADER(CURRENT_STATE()))
            COUNT_HEADER_SIZE(1);
reexecute:
        switch (CURRENT_STATE()) {
            ...
            case s_start_req:
            {
                if (ch == CR || ch == LF)
                    break;
                parser->flags = 0;
                parser->uses_transfer_encoding = 0;
                parser->content_length = ULLONG_MAX;

                if (UNLIKELY(!IS_ALPHA(ch))) {
                    SET_ERRNO(HPE_INVALID_METHOD);
                    goto error;
                }

                parser->method = (enum http_method) 0;
                parser->index = 1;
                switch (ch) {
                case 'A': parser->method = HTTP_ACL; break;
                case 'B': parser->method = HTTP_BIND; break;
                case 'C': parser->method = HTTP_CONNECT; /* or COPY, CHECKOUT */ break;
                case 'D': parser->method = HTTP_DELETE; break;
                case 'G': parser->method = HTTP_GET; break;
                case 'H': parser->method = HTTP_HEAD; break;
                case 'L': parser->method = HTTP_LOCK; /* or LINK */ break;
                case 'M': parser->method = HTTP_MKCOL; /* or MOVE, MKACTIVITY, MERGE, M-SEARCH, MKCALENDAR */ break;
                case 'N': parser->method = HTTP_NOTIFY; break;
                case 'O': parser->method = HTTP_OPTIONS; break;
                case 'P': parser->method = HTTP_POST;
                    /* or PROPFIND|PROPPATCH|PUT|PATCH|PURGE */
                    break;
                case 'R': parser->method = HTTP_REPORT; /* or REBIND */ break;
                case 'S': parser->method = HTTP_SUBSCRIBE; /* or SEARCH, SOURCE */ break;
                case 'T': parser->method = HTTP_TRACE; break;
                case 'U': parser->method = HTTP_UNLOCK; /* or UNSUBSCRIBE, UNBIND, UNLINK */ break;
                default:
                    SET_ERRNO(HPE_INVALID_METHOD);
                    goto error;
                }
                UPDATE_STATE(s_req_method);
                CALLBACK_NOTIFY(message_begin);
                break;
            }
            ...
            case s_message_done:
                UPDATE_STATE(NEW_MESSAGE());
                CALLBACK_NOTIFY(message_complete);
                if (parser->upgrade) {
                    /* Exit, the rest of the message is in a different protocol. */
                    RETURN((p - data) + 1);
                }
                break;
            ...
            }
        ...
    }
    ...
}

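The function http_parser_execute is essentially one large loop that interprets the received data one character at a time. Two states are highlighted here: s_start_req and s_message_done.

The for loop at the top of the excerpt walks through the whole buffer. The case s_start_req handles the start of parsing: it uses the first character to make an initial guess at the request method, and the inner switch (ch) lists the mapping for each letter.

The case s_message_done runs once a message has been fully parsed and the client needs a response. It invokes the callback described earlier through the following macro: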

CALLBACK_NOTIFY(message_complete);
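The structure described above (a per-character loop, a state variable, and a callback fired when the message is done) can be shown with a much smaller parser. This is a toy sketch only: toy_parser, toy_execute, and count_message are invented names, it recognizes nothing beyond the blank line that ends the header, and it is not how http_parser itself is implemented.

```c
#include <stddef.h>

enum toy_state { TOY_START, TOY_IN_LINE, TOY_LINE_START };

struct toy_parser {
    enum toy_state state;
    char first;    /* first byte of the current request, e.g. 'G' */
    int completed; /* number of complete messages seen so far */
    void (*on_message_complete)(struct toy_parser *);
};

/* Callback: in khttpd this is the point where the response is sent. */
void count_message(struct toy_parser *p)
{
    p->completed++;
}

/* Feed len bytes to the parser; state survives between calls. */
size_t toy_execute(struct toy_parser *p, const char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        char ch = data[i];

        switch (p->state) {
        case TOY_START: /* like s_start_req: look at the first byte */
            if (ch == '\r' || ch == '\n')
                break;
            p->first = ch;
            p->state = TOY_IN_LINE;
            break;
        case TOY_IN_LINE: /* skip to the end of the current line */
            if (ch == '\n')
                p->state = TOY_LINE_START;
            break;
        case TOY_LINE_START:
            if (ch == '\r')
                break;
            if (ch == '\n') { /* empty line: the message is done */
                p->state = TOY_START;
                if (p->on_message_complete)
                    p->on_message_complete(p);
            } else {
                p->state = TOY_IN_LINE;
            }
            break;
        }
    }
    return len;
}
```

Because the state lives in the parser object, a request that arrives in two pieces, for example the request line in one receive and the blank line in the next, still completes exactly once.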

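Comparing `khttpd` with the web server in CS:APP

With a rough understanding of how khttpd works, it can now be compared with the TINY web server from CS:APP.

The figure above shows the server flow given in CS:APP. Several points follow from it:

Both set up the socket in essentially the same order, socket → bind → listen → accept, and then start transferring data. The difference lies in the APIs used
For I/O, khttpd uses Linux kernel APIs, while TINY web uses its own RIO package

There are some further differences:

khttpd runs in kernel space, while TINY web runs in user space
khttpd handles connections concurrently with multiple threads, while TINY web handles them one at a time in a single thread

As for what could be improved in khttpd, the idea so far is to rewrite it with the CMWQ discussed in kecho.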

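How `htstress.c` works

htstress is a client that sends requests to a server according to the user's command-line options and finally reports the elapsed time and the number of requests per second. The available options are listed below.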

Usage: htstress [options] [http://]hostname[:port]/path
Options:
   -n, --number       total number of requests (0 for inifinite, Ctrl-C to abort)
   -c, --concurrency  number of concurrent connections
   -t, --threads      number of threads (set this to the number of CPU cores)
   -u, --udaddr       path to unix domain socket
   -h, --host         host to use for http request
   -d, --debug        debug HTTP response
   --help             display this message

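The options used in the experiments are:

-n: total number of requests sent to the server
-c: number of connections each worker thread opens to the server
-t: number of threads (set according to the number of CPU cores)

htstress.c can be roughly divided into two main functions, main and worker.

The function main reads the user's input, parses the URL, creates the threads, and measures the time.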

do {
	next_option =
		getopt_long(argc, argv, short_options, long_options, NULL);

	switch (next_option) {
	case 'n':
		// Convert a string to an unsigned quadword integer
		max_requests = strtoull(optarg, 0, 10);
		break;
	case 'c':
		concurrency = atoi(optarg);
		break;
	case 't':
		num_threads = atoi(optarg);
		break;
	case 'u':
		udaddr = optarg;
		break;
	case 'd':
		debug = 0x03;
		break;
	case 'h':
		host = optarg;
		break;
	case '4':
		hints.ai_family = PF_INET;
		break;
	case '6':
		hints.ai_family = PF_INET6;
		break;
	case '%':
		print_usage();
	case -1:
		break;
	default:
		printf("Unexpected argument: '%c'\n", next_option);
		return 1;
	}
} while (next_option != -1);

The code above reads the user's input: getopt_long finds each option the user passed, and the switch dispatches on it.
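A reduced version of the same pattern, limited to the three options used in the experiments, might look like the sketch below. The names bench_opts and parse_bench_opts are invented for this example, and the long option names are taken from the usage text above.

```c
#include <getopt.h>
#include <stdlib.h>
#include <unistd.h>

struct bench_opts {
    unsigned long long max_requests;
    int concurrency;
    int num_threads;
};

/* Parse -n/-c/-t (or the long forms); returns 0 on success, -1 on error. */
int parse_bench_opts(int argc, char **argv, struct bench_opts *o)
{
    static const struct option long_options[] = {
        {"number", required_argument, NULL, 'n'},
        {"concurrency", required_argument, NULL, 'c'},
        {"threads", required_argument, NULL, 't'},
        {NULL, 0, NULL, 0},
    };
    int opt;

    optind = 1; /* restart scanning so the function can be called again */
    while ((opt = getopt_long(argc, argv, "n:c:t:", long_options, NULL)) !=
           -1) {
        switch (opt) {
        case 'n':
            o->max_requests = strtoull(optarg, NULL, 10);
            break;
        case 'c':
            o->concurrency = atoi(optarg);
            break;
        case 't':
            o->num_threads = atoi(optarg);
            break;
        default: /* unknown option or missing argument */
            return -1;
        }
    }
    return 0;
}
```

getopt_long returns the short option character for both spellings, which is why -t 3 and --threads 3 reach the same case.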

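The next notable part is shown below. getaddrinfo stores the candidate socket addresses in a linked list. The code then walks the list and tries to connect to each entry, stopping at the first one that succeeds. In effect this is a connectivity check before the real test starts.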

int j = getaddrinfo(node, port, &hints, &result);
if (j) {
	fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(j));
	exit(EXIT_FAILURE);
}

for (rp = result; rp; rp = rp->ai_next) {
	int testfd =
		socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
	if (testfd == -1)
		continue;

	if (connect(testfd, rp->ai_addr, rp->ai_addrlen) == 0) {
		close(testfd);
		break;
	}

	close(testfd);
}

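Next is the function worker, the entry point of each created thread. It sends requests and receives responses, and it uses the epoll system call. The main part is shown below.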

int efd = epoll_create(concurrency);
if (efd == -1) {
	perror("epoll");
	exit(1);
}

for (int n = 0; n < concurrency; ++n)
	init_conn(efd, ecs + n);

for (;;) {
	do {
		nevts = epoll_wait(efd, evts, sizeof(evts) / sizeof(evts[0]), -1);
	} while (!exit_i && nevts < 0 && errno == EINTR);
	...
}

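epoll_create creates an epoll instance, sized here for concurrency file descriptors (since Linux 2.6.8 the size argument is only a hint and merely has to be greater than zero), and epoll_wait then waits for events on the registered descriptors.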
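The create, register, and wait sequence can be exercised on its own with any descriptor. The sketch below is an illustration rather than htstress code: wait_for_input is an invented helper, and it uses epoll_create1, the newer variant without the size hint.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Wait until fd becomes readable or timeout_ms elapses.
 * Returns the number of ready events (0 on timeout, -1 on error).
 */
int wait_for_input(int fd, int timeout_ms)
{
    struct epoll_event ev = {.events = EPOLLIN, .data.fd = fd};
    struct epoll_event ready;
    int n;
    int efd = epoll_create1(0);

    if (efd == -1)
        return -1;
    /* Register interest in "fd is readable". */
    if (epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(efd);
        return -1;
    }
    /* Block until an event arrives or the timeout expires. */
    n = epoll_wait(efd, &ready, 1, timeout_ms);
    close(efd);
    return n;
}
```

With a pipe, for example, the call returns 0 while the pipe is empty and 1 once a byte has been written to the other end.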

The next notable part is shown below.

if (evts[n].events & EPOLLOUT) {
	ret = send(ec->fd, outbuf + ec->offs, outbufsize - ec->offs, 0);

	if (ret == -1 && errno != EAGAIN) {
		/* TODO: something better than this */
		perror("send");
		exit(1);
	}

	if (ret > 0) {
		if (debug & HTTP_REQUEST_DEBUG)
			write(2, outbuf + ec->offs, outbufsize - ec->offs);

		ec->offs += ret;

		/* write done? schedule read */
		if (ec->offs == outbufsize) {
			evts[n].events = EPOLLIN;
			evts[n].data.ptr = ec;

			ec->offs = 0;

			if (epoll_ctl(efd, EPOLL_CTL_MOD, ec->fd, evts + n)) {
				perror("epoll_ctl");
				exit(1);
			}
		}
	}

} else if (evts[n].events & EPOLLIN) {
	for (;;) {
		ret = recv(ec->fd, inbuf, sizeof(inbuf), 0);

		if (ret == -1 && errno != EAGAIN) {
			perror("recv");
			exit(1);
		}

		if (ret <= 0)
			break;

		if (ec->offs <= 9 && ec->offs + ret > 10) {
			char c = inbuf[9 - ec->offs];
			if (c == '4' || c == '5')
				ec->flags |= BAD_REQUEST;
		}

		if (debug & HTTP_RESPONSE_DEBUG)
			write(2, inbuf, ret);

		ec->offs += ret;
	}
	...
}

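This shows how the client writes and then reads. The client first has to send a request to the server, so every newly prepared connection is registered for EPOLLOUT. The first branch, if (evts[n].events & EPOLLOUT), is therefore taken and send starts transmitting the request.

Once the request has been sent completely, the connection's event is changed to EPOLLIN (the assignment evts[n].events = EPOLLIN followed by epoll_ctl with EPOLL_CTL_MOD), and recv reads the response when data becomes available.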
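The switch from EPOLLOUT to EPOLLIN on the same descriptor can be reproduced in a few lines. In this sketch the function ping_pong and the "ping"/"pong" payloads are invented, and the second descriptor plays the server so that the example needs no network. It is meant to be called with the two ends of a socketpair.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send "ping" on client once it is writable, then switch the same
 * descriptor to EPOLLIN and read the reply written by peer.
 * Returns the number of reply bytes read, or -1 on error.
 */
int ping_pong(int client, int peer)
{
    char buf[16];
    struct epoll_event ev = {.events = EPOLLOUT, .data.fd = client};
    struct epoll_event ready;
    ssize_t n = -1;
    int efd = epoll_create1(0);

    if (efd == -1)
        return -1;
    if (epoll_ctl(efd, EPOLL_CTL_ADD, client, &ev) == -1)
        goto out;

    /* Phase 1: wait until the socket is writable, then send the request. */
    if (epoll_wait(efd, &ready, 1, 1000) != 1 || !(ready.events & EPOLLOUT))
        goto out;
    if (send(client, "ping", 4, 0) != 4)
        goto out;

    /* Phase 2: the request is out, so watch for readability instead. */
    ev.events = EPOLLIN;
    if (epoll_ctl(efd, EPOLL_CTL_MOD, client, &ev) == -1)
        goto out;

    /* The peer acts as the server: read the request and answer it. */
    if (recv(peer, buf, sizeof(buf), 0) != 4 || send(peer, "pong", 4, 0) != 4)
        goto out;

    if (epoll_wait(efd, &ready, 1, 1000) != 1 || !(ready.events & EPOLLIN))
        goto out;
    n = recv(client, buf, sizeof(buf), 0);
out:
    close(efd);
    return (int) n;
}
```

Registering for only one direction at a time keeps the event loop from waking up for EPOLLOUT while it is merely waiting for the response.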

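Finally, when the connection is closed, the request counter is incremented and, depending on the total number of requests the user asked for, the worker either finishes or opens a new connection.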

if (!ret) {
	close(ec->fd);

	int m = atomic_fetch_add(&num_requests, 1);

	if (max_requests && (m + 1 > (int) max_requests))
		atomic_fetch_sub(&num_requests, 1);
	else if (ec->flags & BAD_REQUEST)
		atomic_fetch_add(&bad_requests, 1);
	else
		atomic_fetch_add(&good_requests, 1);

	if (max_requests && (m + 1 >= (int) max_requests)) {
		end_time();
		return NULL;
	}

	if (ticks && m % ticks == 0)
		printf("%d requests\n", m);

	init_conn(efd, ec);
}
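The counting logic depends on atomic_fetch_add returning the value from before the addition, which is why the code compares m + 1 against the limit. A stripped-down sketch of that logic follows. record_request is an invented name, and the good/bad bookkeeping is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Record one finished request. Returns true when the limit has been
 * reached and the caller should stop (max == 0 means "no limit").
 */
bool record_request(atomic_int *num_requests, int max)
{
    /* atomic_fetch_add returns the value *before* the addition. */
    int before = atomic_fetch_add(num_requests, 1);

    /* Another thread already reached the limit: undo our increment. */
    if (max && before + 1 > max)
        atomic_fetch_sub(num_requests, 1);

    return max && before + 1 >= max;
}
```

Several worker threads can call this concurrently. The counter may briefly overshoot the limit, but each overshoot is rolled back by the thread that caused it.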

The epoll system call

This was already discussed in kecho: epoll system call, mainly with reference to Linux Kernel Design: Evolution of event-driven I/O models.

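Development log

Assignment requirements

Fork khttpd on GitHub. The goal is to provide file access and to fix khttpd's runtime defects. The following should be completed along the way:
- Point out the defects in the kHTTPd implementation (especially security concerns) and correct them
- Introduce Concurrency Managed Workqueue (cmwq), rewrite kHTTPd, analyze its performance, and propose improvements; see kecho
- Implement HTTP 1.1 keep-alive and provide basic directory listing
  - WWWROOT can be specified through a Linux kernel module parameter

Extend kHTTPd so that it has the classic features of a modern web server, and make use of Linux kernel mechanisms, for example managing HTTP connections with RCU

Related information:
- Tempesta FW: a web accelerator built on the existing Linux TCP/IP stack
- http-server-rcu: manages connections with RCU
- kws: see its to-do list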

Defects in the `khttpd` implementation

Consider the loop in http_server_worker, shown below.

while (!kthread_should_stop()) {
	// Receive data
	int ret = http_server_recv(socket, buf, RECV_BUFFER_SIZE - 1);
	if (ret <= 0) {
		if (ret)
			pr_err("recv error: %d\n", ret);
		break;
	}
	// Parse the received data
	http_parser_execute(&parser, &setting, buf, ret);
	if (request.complete && !http_should_keep_alive(&parser))
		break;
}

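The buffer buf used for receiving is never cleared at the end of an iteration, so each receive overwrites only the first ret bytes and leaves the older bytes behind them. This may lead to unexpected results.

To see whether this causes a problem, a small experiment was run: connect with telnet localhost 8081 and send two different requests, GET /12345 HTTP/1.1 and then GET / HTTP/1.1.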

GET /12345 HTTP/1.1

HTTP/1.1 200 OK
Server: khttpd
Content-Type: text/plain
Content-Length: 12
Connection: Keep-Alive

Hello World!
GET / HTTP/1.1

HTTP/1.1 200 OK
Server: khttpd
Content-Type: text/plain
Content-Length: 12
Connection: Keep-Alive

Hello World!

The server appears to respond normally, but the messages printed by the module tell a different story:

[186673.227429] khttpd: buf = GET /12345 HTTP/1.1
[186673.338073] khttpd: buf = 
                T /12345 HTTP/1.1
[186673.338083] khttpd: requested_url = /12345
[186733.791423] khttpd: buf = GET / HTTP/1.1
                1.1
[186733.918134] khttpd: buf = 
                T / HTTP/1.1
                1.1
[186733.918155] khttpd: requested_url = /

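buf is indeed affected by earlier input. Nothing went wrong in this example, since http_parser_execute is given the length ret, but any code that treats buf as a NUL-terminated string, such as the debugging pr_info used here, sees the stale bytes, so it is hard to guarantee that this never causes trouble.

Each request shows two buf = lines because an HTTP request header ends with two consecutive \r\n sequences, so Enter has to be pressed twice to produce the output above.

The source is then modified in a simple way, using memset to clear buf before each receive.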

while (!kthread_should_stop()) {
+	int ret;
+	memset(buf, 0, RECV_BUFFER_SIZE);
+	ret = http_server_recv(socket, buf, RECV_BUFFER_SIZE - 1);
	if (ret <= 0) {
		if (ret)
			pr_err("recv error: %d\n", ret);
		break;
	}
	pr_info("buf = %s", buf);
	// Parse the received data
	http_parser_execute(&parser, &setting, buf, ret);
	if (request.complete && !http_should_keep_alive(&parser))
		break;
}

The experiment above can now be repeated. The module output is:

[187284.736753] khttpd: buf = GET /12345 HTTP/1.1
[187284.849034] khttpd: buf = 
[187284.849045] khttpd: requested_url = /12345
[187300.646245] khttpd: buf = GET / HTTP/1.1
[187300.784082] khttpd: buf = 
[187300.784103] khttpd: requested_url = /

buf is clearly no longer affected by earlier input.
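The effect is easy to reproduce in user space. The sketch below replaces the socket with a plain copy (fake_recv, recv_dirty, and recv_terminated are invented names) and also shows a cheaper alternative to memset that is not part of the change above: writing a single NUL right after the received bytes, which is possible because the receive call leaves one byte of headroom.

```c
#include <string.h>

#define DEMO_BUF_SIZE 64

/* Stand-in for recv(): copies msg into buf and returns the byte count. */
size_t fake_recv(char *buf, size_t cap, const char *msg)
{
    size_t n = strlen(msg);

    if (n > cap)
        n = cap;
    memcpy(buf, msg, n);
    return n;
}

/* Receive without clearing: bytes beyond the new data survive. */
size_t recv_dirty(char *buf, const char *msg)
{
    fake_recv(buf, DEMO_BUF_SIZE - 1, msg);
    return strlen(buf); /* length as a string-based consumer sees it */
}

/* Receive and terminate the string right after the new data. */
size_t recv_terminated(char *buf, const char *msg)
{
    size_t n = fake_recv(buf, DEMO_BUF_SIZE - 1, msg);

    buf[n] = '\0';
    return strlen(buf);
}
```

Starting from a zeroed buffer, receiving "GET /12345 HTTP/1.1\r\n" and then "\r\n" with recv_dirty leaves a 21-byte string that begins with the new "\r\n" and continues with "T /12345 HTTP/1.1", which is the pattern seen in the kernel log. recv_terminated yields a 2-byte string.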

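Reducing the use of `printk`

In kecho: rewriting benchmarking, redundant function calls were removed from kecho and performance improved, so the same change is applied to khttpd here.

Before making the change, the original server is measured with htstress.c, using the command ./htstress http://localhost:8081 -t 3 -c 20 -n 200000.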

requests:      200000
good requests: 200000 [100%]
bad requests:  0 [0%]
socker errors: 0 [0%]
seconds:       8.246
requests/sec:  24252.937

The following definitions are added to http_server.h.

enum {
    TRACE_accept_err = 1,  // number of failed accepts
    TRACE_cthread_err,     // number of failed thread creations
    TRACE_kmalloc_err,     // number of failed kmalloc calls
    TRACE_recvmsg,         // number of recvmsg calls
    TRACE_sendmsg,         // number of sendmsg calls
    TRACE_send_err,        // number of failed sends
    TRACE_recv_err,        // number of failed receives
};

struct runtime_state {
    atomic_t accept_err, cthread_err;
    atomic_t kmalloc_err, recvmsg;
    atomic_t sendmsg, send_err;
    atomic_t recv_err;
};
extern struct runtime_state states;

In khttpd, the most frequently called pr_info is in http_server_response. The change is shown below.

static int http_server_response(struct http_request *request, int keep_alive)
{
    char *response;
+   int ret;

-   pr_info("requested_url = %s\n", request->request_url);
    if (request->method != HTTP_GET)
        response = keep_alive ? HTTP_RESPONSE_501_KEEPALIVE : HTTP_RESPONSE_501;
    else
        response = keep_alive ? HTTP_RESPONSE_200_KEEPALIVE_DUMMY
                              : HTTP_RESPONSE_200_DUMMY;
-   http_server_send(request->socket, response, strlen(response));
+   ret = http_server_send(request->socket, response, strlen(response));
+   if (ret > 0)
+       TRACE(sendmsg);
    return 0;
}

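The pr_info call is removed and replaced by a counter of successful sends, which avoids printing a message before every response. The other call sites are changed in the same way.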
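The TRACE macro itself is not shown in this note. A user-space sketch of the idea, assuming the macro does nothing more than increment the counter named by its argument, could look like this. The reduced runtime_state, the macro body, and account_send (including its error branch) are assumptions made for illustration, with C11 atomics standing in for the kernel's atomic_t.

```c
#include <stdatomic.h>

/* Reduced stand-in for struct runtime_state. */
struct runtime_state {
    atomic_int sendmsg, send_err;
};

struct runtime_state states;

/* Hypothetical TRACE: bump the counter named by the argument. */
#define TRACE(ops) atomic_fetch_add(&states.ops, 1)

/* Count the outcome of a send instead of logging it. */
void account_send(int ret)
{
    if (ret > 0)
        TRACE(sendmsg);
    else
        TRACE(send_err); /* assumed handling of the failure case */
}
```

An atomic increment touches one memory location, whereas each pr_info formats a string and appends it to the kernel log, which is the cost the change is trying to avoid.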

Finally, the same command ./htstress http://localhost:8081 -t 3 -c 20 -n 200000 is run again.

requests:      200000
good requests: 200000 [100%]
bad requests:  0 [0%]
socker errors: 0 [0%]
seconds:       6.606
requests/sec:  30274.801

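Throughput rises noticeably, from about 24253 to about 30275 requests per second (roughly 25%). dmesg then shows the counters recorded during the run: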

[164105.005808] khttpd: recvmsg : 200046
[164105.005815] khttpd: sendmsg : 200046
[164105.005817] khttpd: kmalloc_err : 0
[164105.005819] khttpd: cthread_err : 0
[164105.005821] khttpd: send_err : 0
[164105.005823] khttpd: recv_err : 0
[164105.005824] khttpd: accept_err : 0

The same test was run on a lab desktop with the command ./htstress http://localhost:8081 -t 8 -c 20 -n 200000. The numbers before the change:

requests:      200000
good requests: 200000 [100%]
bad requests:  0 [0%]
socker errors: 0 [0%]
seconds:       3.148
requests/sec:  63539.607

After the change:

requests:      200000
good requests: 200000 [100%]
bad requests:  0 [0%]
socker errors: 0 [0%]
seconds:       2.784
requests/sec:  71841.893

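Introducing CMWQ into `khttpd`

In kecho's implementation, every work item is added to a linked list so that the work items can be managed. The following structure is therefore added to http_server.h.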

struct httpd_service {
    bool is_stopped;
    struct list_head head;
};
extern struct httpd_service daemon_list;

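This structure serves as the head node of the linked list, and the member is_stopped indicates whether a signal to stop serving connections has arrived.

Next, the existing struct http_request is extended with a linked-list node and a work structure.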

struct http_request {
    struct socket *socket;
    enum http_method method;
    char request_url[128];
    int complete;
+   struct list_head node;
+   struct work_struct khttpd_work;
};

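The overall flow is then: create the CMWQ → create a work item when a connection is established → the workqueue runs the work → release all memory.

The CMWQ is created when the module is loaded, in khttpd_init. The change is shown below.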

static int __init khttpd_init(void)
{
    int err = open_listen_socket(port, backlog, &listen_socket);
    if (err < 0) {
        pr_err("can't open listen socket\n");
        return err;
    }
    param.listen_socket = listen_socket;

+   // create CMWQ
+   khttpd_wq = alloc_workqueue(MODULE_NAME, 0, 0);
    http_server = kthread_run(http_server_daemon, &param, KBUILD_MODNAME);
    if (IS_ERR(http_server)) {
        pr_err("can't start http server daemon\n");
        close_listen_socket(listen_socket);
        return PTR_ERR(http_server);
    }
    return 0;
}

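alloc_workqueue creates the CMWQ. One point to note is that the flags argument depends on the use case. According to the comments in kecho, WQ_UNBOUND is suitable for long-lived connections, such as a telnet session; otherwise 0 is enough.

Both settings were tried. With WQ_UNBOUND the throughput was indeed not very good, possibly because work items get delayed, and the machine also froze during some test runs.

Next is creating the work item, which happens after the server and a client establish a connection. A new function create_work is added for this.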

static struct work_struct *create_work(struct socket *sk)
{
    struct http_request *work;

    // Allocate space for one struct http_request
    // GFP_KERNEL: normal kernel allocation, may sleep
    if (!(work = kmalloc(sizeof(struct http_request), GFP_KERNEL)))
        return NULL;

    work->socket = sk;

    // Initialize the work item so that it runs http_server_worker
    INIT_WORK(&work->khttpd_work, http_server_worker);

    list_add(&work->node, &daemon_list.head);

    return &work->khttpd_work;
}

create_work allocates the space the work item needs → initializes the work item → adds it to the linked list.

Releasing memory is much simpler: walk the whole linked list and free each entry.

static void free_work(void)
{
    struct http_request *l, *tar;
    /* cppcheck-suppress uninitvar */

    list_for_each_entry_safe (tar, l, &daemon_list.head, node) {
        kernel_sock_shutdown(tar->socket, SHUT_RDWR);
        flush_work(&tar->khttpd_work);
        sock_release(tar->socket);
        kfree(tar);
    }
}

Finally, the command ./htstress http://localhost:8081 -t 3 -c 20 -n 200000 is used for testing. The result:

requests:      200000
good requests: 200000 [100%]
bad requests:  0 [0%]
socker errors: 0 [0%]
seconds:       3.861
requests/sec:  51801.192

On the lab desktop, the command ./htstress http://localhost:8081 -t 8 -c 20 -n 200000 gives the following result:

requests:      200000
good requests: 200000 [100%]
bad requests:  0 [0%]
socker errors: 0 [0%]
seconds:       1.306
requests/sec:  153082.390

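The server's throughput grows substantially: about 1.7 times on the laptop and about 2.1 times on the lab desktop.

Machine	Before CMWQ (requests/sec)	With CMWQ (requests/sec)
Laptop	30274.801	51801.192
Lab desktop	71841.893	153082.390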

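HTTP keep-alive mode

As the figure below shows, HTTP transfers can be divided into two connection modes, multiple connections and persistent connection. In the former the connection is closed after the server answers a request; in the latter the connection stays open. According to the description in HTTP, the following holds:

In HTTP 1.0 the default mode is multiple connections. To use a persistent connection, the following header has to be added

Connection: keep-alive
In HTTP 1.1 a persistent connection is the default, which allows several requests to be handled over a single connection

This can be checked with khttpd: connect with telnet localhost 8081, send GET / HTTP/1.0 and GET / HTTP/1.1 in turn, and observe what the server returns in each case.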

GET / HTTP/1.0

HTTP/1.1 200 OK
Server: khttpd
Content-Type: text/plain
Content-Length: 12
Connection: Close

GET / HTTP/1.1

HTTP/1.1 200 OK
Server: khttpd
Content-Type: text/plain
Content-Length: 12
Connection: Keep-Alive

The Connection: xxxxx field in each response matches the description above, so khttpd already supports keep-alive.
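The rule stated above fits in a few lines. The sketch below is a simplification written for this note, not the logic of http_should_keep_alive: the function keep_alive is invented, and it looks only at the protocol version and the value of the Connection header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/*
 * Decide whether to keep the connection open.
 * connection is the value of the Connection header, or NULL if absent.
 */
bool keep_alive(int http_major, int http_minor, const char *connection)
{
    bool is_1_1_or_later =
        http_major > 1 || (http_major == 1 && http_minor >= 1);

    /* HTTP/1.1: persistent unless the client opts out. */
    if (is_1_1_or_later)
        return !(connection && strcasecmp(connection, "close") == 0);

    /* HTTP/1.0: closed unless the client opts in. */
    return connection && strcasecmp(connection, "keep-alive") == 0;
}
```

The comparison is case-insensitive because header values such as Keep-Alive and keep-alive are both seen in practice.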

Implementing directory listing

The first step toward directory listing is reading the names of the files in the current directory. A new function handle_directory is added for this; the complete change is in Add the function of directory list.

static bool handle_directory(struct http_request *request)
{
    struct file *fp;
    char buf[SEND_BUFFER_SIZE] = {0};

    request->dir_context.actor = tracedir;
    if (request->method != HTTP_GET) {
        snprintf(buf, SEND_BUFFER_SIZE,
                 "HTTP/1.1 501 Not Implemented\r\n%s%s%s%s",
                 "Content-Type: text/plain\r\n", "Content-Length: 19\r\n",
                 "Connection: Close\r\n", "501 Not Implemented\r\n");
        http_server_send(request->socket, buf, strlen(buf));
        return false;
    }

    snprintf(buf, SEND_BUFFER_SIZE, "HTTP/1.1 200 OK\r\n%s%s%s",
             "Connection: Keep-Alive\r\n", "Content-Type: text/html\r\n",
             "Keep-Alive: timeout=5, max=1000\r\n\r\n");
    http_server_send(request->socket, buf, strlen(buf));


    snprintf(buf, SEND_BUFFER_SIZE, "%s%s%s%s", "<html><head><style>\r\n",
             "body{font-family: monospace; font-size: 15px;}\r\n",
             "td {padding: 1.5px 6px;}\r\n",
             "</style></head><body><table>\r\n");
    http_server_send(request->socket, buf, strlen(buf));

    fp = filp_open("/home/benson/khttpd/", O_RDONLY | O_DIRECTORY, 0);
    if (IS_ERR(fp)) {
        pr_info("Open file failed");
        return false;
    }

    iterate_dir(fp, &request->dir_context);
    snprintf(buf, SEND_BUFFER_SIZE, "</table></body></html>\r\n");
    http_server_send(request->socket, buf, strlen(buf));
    filp_close(fp, NULL);
    return true;
}

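handle_directory does the following:

Checks whether the client's request is a GET and sends the matching HTTP header (the method check and the first two snprintf/http_server_send pairs)
Opens the directory and walks its entries with iterate_dir (from filp_open to filp_close)
Finishes the response

The assignment request->dir_context.actor = tracedir at the top of the function points iterate_dir at tracedir. In other words, iterate_dir calls tracedir for each directory entry. The implementation of tracedir is shown below.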

// callback for 'iterate_dir', trace entry.
static int tracedir(struct dir_context *dir_context,
                    const char *name,
                    int namelen,
                    loff_t offset,
                    u64 ino,
                    unsigned int d_type)
{
    if (strcmp(name, ".") && strcmp(name, "..")) {
        struct http_request *request =
            container_of(dir_context, struct http_request, dir_context);
        char buf[SEND_BUFFER_SIZE] = {0};

        snprintf(buf, SEND_BUFFER_SIZE,
                 "<tr><td><a href=\"%s\">%s</a></td></tr>\r\n", name, name);
        http_server_send(request->socket, buf, strlen(buf));
    }
    return 0;
}

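tracedir is called once per directory entry, and each call sends that entry to the client.

One notable detail is the use of the macro container_of. The signature of tracedir is fixed, yet the function needs the socket in order to send data. struct dir_context is therefore embedded in struct http_request, so container_of can recover the enclosing request, and with it the socket, without passing the socket as an argument.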

struct http_request {
    struct socket *socket;
    enum http_method method;
    char request_url[128];
    int complete;
+   struct dir_context dir_context;
    struct list_head node;
    struct work_struct khttpd_work;
};
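The pointer arithmetic behind container_of can be tried in user space. In the sketch below the macro is a simplified version without the kernel's type checking, and demo_ctx, demo_request, and callback_socket are invented stand-ins for dir_context, http_request, and tracedir.

```c
#include <stddef.h>

/* Userspace version of the kernel macro (without its type checking). */
#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

/* Stands in for struct dir_context. */
struct demo_ctx {
    int pos;
};

/* Stands in for struct http_request. */
struct demo_request {
    int socket_fd;       /* stands in for struct socket * */
    struct demo_ctx ctx; /* embedded, like dir_context */
};

/* A callback with a fixed signature: it only receives the inner struct. */
int callback_socket(struct demo_ctx *ctx)
{
    /* Step back from the member to the start of the enclosing struct. */
    struct demo_request *req = container_of(ctx, struct demo_request, ctx);

    return req->socket_fd;
}
```

This only works because ctx is embedded by value. If demo_request held a pointer to a separately allocated demo_ctx, there would be no fixed offset to subtract.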

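Finally, here is the current result (excerpt)

Getting the current directory

The original plan was to implement something like the pwd command, so that the files of the current directory could be listed, but both approaches that were tried ran into obstacles.

The main test code is excerpted first. The functions it uses are in fs/d_path.c and fs/namei.c.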

struct path pwd;
char *cwd;
char current_path[100] = {0}, buf[SEND_BUFFER_SIZE] = {0};

pwd = current->fs->pwd;
path_get(&pwd);
cwd = d_path(&pwd, current_path, 100);
pr_info("path = %s\n", cwd);

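After running sudo insmod khttpd.ko and testing with Chrome, the actual result is as follows. It shows only the root directory rather than the expected absolute path: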

path = /

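The second attempt used the function d_absolute_path, found in fs/d_path.c. However, d_absolute_path is not exported with the EXPORT_SYMBOL macro, so a kernel module cannot call it directly.

While looking for a way around this, an interesting function turned up in include/linux/kallsyms.h: kallsyms_lookup_name, which returns the address of an existing symbol. With it, a function could be called even though it is not exported with EXPORT_SYMBOL.

However, according to kallsyms: Unexport kallsyms_lookup_name() and kallsyms_on_each_symbol(), kallsyms_lookup_name can no longer be called directly from modules since Linux kernel v5.7. The main reason is that kallsyms_lookup_name undermines a basic rule of kernel modules, namely that they may only call exported functions.

Since neither approach worked, a simpler solution was chosen: add a kernel module parameter WWWROOT and specify the directory to serve when the module is loaded. The complete change is in Add the parameter to assign a initial directory path.

Following The Linux Kernel Module Programming Guide, the macro module_param_string is used to add the parameter WWWROOT.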

#define PATH_SIZE   100
static char WWWROOT[PATH_SIZE] = {0};
module_param_string(WWWROOT, WWWROOT, PATH_SIZE, 0);

To make WWWROOT available in other source files, a member dir_path is added to struct httpd_service and used to pass the path across files.

struct httpd_service {
    bool is_stopped;
+   char *dir_path;
    struct list_head head;
};
extern struct httpd_service daemon_list;

The following code is then added to khttpd_init. It checks whether WWWROOT is an empty string and, if so, falls back to a default path, here "/".

// check whether WWWROOT is an empty string
if (!*WWWROOT)
    WWWROOT[0] = '/';
daemon_list.dir_path = WWWROOT;

Finally, the module is tested by loading it with sudo insmod khttpd.ko and with sudo insmod khttpd.ko WWWROOT='"home/benson/khttpd"', giving the following results (excerpts)

sudo insmod khttpd.ko

sudo insmod khttpd.ko WWWROOT='"home/benson/khttpd"'

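The directory served can now be chosen through the parameter WWWROOT.

Reading file data

Reading a file requires knowing its attributes first, such as its size and type. In the Linux kernel these attributes are kept in struct inode, defined in include/linux/fs.h. The members used here are i_mode, which encodes the file type, and i_size, which stores the file size.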

/*
 * Keep mostly read-only and often accessed (especially for
 * the RCU path lookup and 'stat' data) fields at the beginning
 * of the 'struct inode'
 */
struct inode {
	umode_t			i_mode;
	...
	loff_t			i_size;
	...
}

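The file type constants are likewise in include/linux/fs.h, with a different value for each type. Note that these DT_* values describe directory entries (they are what the d_type argument of a callback such as tracedir carries), whereas i_mode is tested with the S_IS* macros.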

/* these are defined by POSIX and also present in glibc's dirent.h */
#define DT_UNKNOWN  0
#define DT_FIFO     1
#define DT_CHR      2
#define DT_DIR      4
#define DT_BLK      6
#define DT_REG      8
#define DT_LNK      10
#define DT_SOCK     12
#define DT_WHT      14

To test the file type, include/uapi/linux/stat.h provides suitable macros. S_ISDIR and S_ISREG are used here: the former tests for a directory, the latter for a regular file.

#define S_ISLNK(m)	(((m) & S_IFMT) == S_IFLNK)
#define S_ISREG(m)	(((m) & S_IFMT) == S_IFREG)
#define S_ISDIR(m)	(((m) & S_IFMT) == S_IFDIR)
#define S_ISCHR(m)	(((m) & S_IFMT) == S_IFCHR)
#define S_ISBLK(m)	(((m) & S_IFMT) == S_IFBLK)
#define S_ISFIFO(m)	(((m) & S_IFMT) == S_IFIFO)
#define S_ISSOCK(m)	(((m) & S_IFMT) == S_IFSOCK)
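The same macros are available in user space through <sys/stat.h>, where they are applied to st_mode from stat() instead of i_mode. A small sketch, with classify_path as an invented helper:

```c
#include <sys/stat.h>

/*
 * Returns 'd' for a directory, 'f' for a regular file, '?' for any
 * other file type, and 0 if the path cannot be examined.
 */
char classify_path(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;
    if (S_ISDIR(st.st_mode))
        return 'd';
    if (S_ISREG(st.st_mode))
        return 'f';
    return '?';
}
```

Each macro masks the mode with S_IFMT before comparing, so the permission bits in the low part of the mode do not affect the result.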

The code is then modified; the complete changes are in Add the function of read file and Fix bug on reading file in deeper directory. The main change is to handle_directory.

static bool handle_directory(struct http_request *request)
{
    struct file *fp;
    char pwd[BUFFER_SIZE] = {0};

    ...

    catstr(pwd, daemon_list.dir_path, request->request_url);
    fp = filp_open(pwd, O_RDONLY, 0);

    if (IS_ERR(fp)) {
        send_http_header(request->socket, HTTP_STATUS_NOT_FOUND,
                         http_status_str(HTTP_STATUS_NOT_FOUND), "text/plain",
                         13, "Close");
        send_http_content(request->socket, "404 Not Found");
        return false;
    }

    if (S_ISDIR(fp->f_inode->i_mode)) {
        char buf[SEND_BUFFER_SIZE] = {0};
        snprintf(buf, SEND_BUFFER_SIZE, "HTTP/1.1 200 OK\r\n%s%s%s",
                 "Connection: Keep-Alive\r\n", "Content-Type: text/html\r\n",
                 "Keep-Alive: timeout=5, max=1000\r\n\r\n");
        http_server_send(request->socket, buf, strlen(buf));

        snprintf(buf, SEND_BUFFER_SIZE, "%s%s%s%s", "<html><head><style>\r\n",
                 "body{font-family: monospace; font-size: 15px;}\r\n",
                 "td {padding: 1.5px 6px;}\r\n",
                 "</style></head><body><table>\r\n");
        http_server_send(request->socket, buf, strlen(buf));

        iterate_dir(fp, &request->dir_context);

        snprintf(buf, SEND_BUFFER_SIZE, "</table></body></html>\r\n");
        http_server_send(request->socket, buf, strlen(buf));
        kernel_sock_shutdown(request->socket, SHUT_RDWR);

    } else if (S_ISREG(fp->f_inode->i_mode)) {
        char *read_data = kmalloc(fp->f_inode->i_size, GFP_KERNEL);
        int ret = read_file(fp, read_data);

        send_http_header(request->socket, HTTP_STATUS_OK,
                         http_status_str(HTTP_STATUS_OK), "text/plain", ret,
                         "Close");
        http_server_send(request->socket, read_data, ret);
        kfree(read_data);
    }
    filp_close(fp, NULL);
    return true;
}

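The modified handle_directory does the following:

Line 8 (the catstr call) joins the WWWROOT path and the client's request URL into pwd, which filp_open then opens.
Line 11 (the IS_ERR check): if the file cannot be opened, a 404 Not Found response is returned to the client.
Line 19 (the S_ISDIR branch): if the path is a directory, the names of all its entries are sent to the client.
Line 38 (the S_ISREG branch): if the path is a regular file, its contents are read and sent to the client.

Next, the implementation is adjusted slightly so that the server can handle requests for "..". The complete change is in the commit Consider request ".." to go back previous page; the main parts are excerpted below.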

// callback for 'iterate_dir', trace entry.
static int tracedir(struct dir_context *dir_context,
                    const char *name,
                    int namelen,
                    loff_t offset,
                    u64 ino,
                    unsigned int d_type)
{
-   if (strcmp(name, ".") && strcmp(name, "..")) {
+   if (strcmp(name, ".")) {
        struct http_request *request =
            container_of(dir_context, struct http_request, dir_context);
        char buf[SEND_BUFFER_SIZE] = {0};
-       char *url =
-           !strcmp(request->request_url, "/") ? "" : request->request_url;

        SEND_HTTP_MSG(request->socket, buf,
                      "%lx\r\n<tr><td><a href=\"%s/%s\">%s</a></td></tr>\r\n",
-                     34 + strlen(url) + (namelen << 1), url, name, name);
+                     34 + strlen(request->request_url) + (namelen << 1),
+                     request->request_url, name, name);
    }
    return 0;
}

static int http_parser_callback_request_url(http_parser *parser,
                                            const char *p,
                                            size_t len)
{
    struct http_request *request = parser->data;
+   // a ".." request resolves to a path ending in '/'; drop that last character
+   if (p[len - 1] == '/')
+       len--;
    strncat(request->request_url, p, len);
    return 0;
}

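The change to tracedir lets the ".." entry through and removes code that is no longer needed. The change to http_parser_callback_request_url fixes a problem where, after descending several directory levels, there was no way back to the earlier directories. An example follows.

Suppose the current directory is /ab/cd and ".." is requested. Originally the resulting URL was /ab/. Because every link is built as request_url/name, the next ".." link was /ab//.., which resolves to /ab/ again. So after descending two or more levels it was impossible to return to the earlier directories.

The change above addresses this: if the last character of the path is '/', it is simply removed. With the same example the sequence becomes /ab/cd → /ab → (empty string).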
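The trimming step can be tried in isolation. The following user-space sketch mirrors the patched callback; append_url is a hypothetical name, and the len > 0 guard is an addition of this sketch (the original indexes p[len - 1] directly).

```c
#include <stddef.h>
#include <string.h>

/* Append at most `len` bytes of `p` to `dst`, dropping one trailing '/'
 * first. `dst` must already hold a NUL-terminated string and have room. */
static void append_url(char *dst, const char *p, size_t len)
{
    if (len > 0 && p[len - 1] == '/')
        len--;
    strncat(dst, p, len);
}
```

With this helper, "/ab/" is stored as "/ab" and "/" is stored as the empty string, matching the sequence described above.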

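Sending directory data with chunked transfer encoding

In the previous implementation the total size of a directory listing was not known in advance, so the server simply closed the connection once the data had been sent. HTTP/1.1 provides chunked transfer encoding, which splits the body into chunks that are sent one at a time, so the response no longer needs a Content-Length: xx header.

From Transfer-Encoding: Chunked encoding and the example below, a few rules can be read off:

Each piece of data is preceded by its length.
The length is written in hexadecimal.
The length and the data are separated by \r\n.
To end the transfer, send a chunk of length 0.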

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

7\r\n
Mozilla\r\n
9\r\n
Developer\r\n
7\r\n
Network\r\n
0\r\n
\r\n
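These rules can be captured in a few lines. The helper below is a user-space sketch for illustration only; format_chunk is a hypothetical name and is not part of khttpd.

```c
#include <stdio.h>
#include <string.h>

/* Encode `data` as one HTTP/1.1 chunk: hexadecimal length, CRLF, the data,
 * CRLF. Returns the number of bytes written, or -1 if `out` is too small. */
static int format_chunk(char *out, size_t cap, const char *data)
{
    size_t len = strlen(data);
    int n = snprintf(out, cap, "%zx\r\n%s\r\n", len, data);

    if (n < 0 || (size_t) n >= cap)
        return -1;
    return n;
}
```

Passing "Mozilla" produces the first chunk of the example above, and passing an empty string produces the terminating chunk 0\r\n\r\n.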

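With this information the implementation can begin; the complete change is in the commit Use chunk to send directory and rewrite send_http_header.

One thing first: the functions send_http_header and send_http_content written earlier were too verbose, so both are replaced by a more flexible function-like macro, SEND_HTTP_MSG, shown below.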

#define SEND_HTTP_MSG(socket, buf, format, ...)           \
    snprintf(buf, SEND_BUFFER_SIZE, format, __VA_ARGS__); \
    http_server_send(socket, buf, strlen(buf))

This way the caller can send arbitrary formatted data, and the code becomes more concise.
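One caveat about a macro of this shape: it expands to two statements, so after an unbraced if only the snprintf call would be guarded. A common remedy is to wrap the body in do { ... } while (0). The user-space sketch below illustrates the idea; SEND_MSG, fake_send, sink and send_if are hypothetical stand-ins, and this is not the code in the commit.

```c
#include <stdio.h>
#include <string.h>

#define SEND_BUFFER_SIZE 256

/* Stand-in for http_server_send(): appends to an in-memory sink. */
static char sink[1024];

static void fake_send(const char *buf, size_t len)
{
    size_t used = strlen(sink);

    if (len > sizeof(sink) - 1 - used)
        len = sizeof(sink) - 1 - used;
    strncat(sink, buf, len);
}

/* Format, then send; do { } while (0) keeps both statements together. */
#define SEND_MSG(buf, format, ...)                            \
    do {                                                      \
        snprintf(buf, SEND_BUFFER_SIZE, format, __VA_ARGS__); \
        fake_send(buf, strlen(buf));                          \
    } while (0)

/* Unbraced `if`: with the wrapper, nothing is sent when disabled. */
static void send_if(int enabled)
{
    char buf[SEND_BUFFER_SIZE];

    if (enabled)
        SEND_MSG(buf, "%s", "hello");
}
```

Without the wrapper, send_if(0) would still call fake_send on an uninitialized buffer.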

The parts that use chunked encoding are listed below, namely the functions handle_directory and tracedir.

// callback for 'iterate_dir', trace entry.
static int tracedir(struct dir_context *dir_context,
                    const char *name,
                    int namelen,
                    loff_t offset,
                    u64 ino,
                    unsigned int d_type)
{
    if (strcmp(name, ".") && strcmp(name, "..")) {
        struct http_request *request =
            container_of(dir_context, struct http_request, dir_context);
        char buf[SEND_BUFFER_SIZE] = {0};
        char *url =
            !strcmp(request->request_url, "/") ? "" : request->request_url;

        SEND_HTTP_MSG(request->socket, buf,
                      "%lx\r\n<tr><td><a href=\"%s/%s\">%s</a></td></tr>\r\n",
                      34 + strlen(url) + (namelen << 1), url, name, name);
    }
    return 0;
}

static bool handle_directory(struct http_request *request)
{
	...
	if (S_ISDIR(fp->f_inode->i_mode)) {
		SEND_HTTP_MSG(request->socket, buf, "%s%s%s", "HTTP/1.1 200 OK\r\n",
		              "Content-Type: text/html\r\n",
		              "Transfer-Encoding: chunked\r\n\r\n");
		SEND_HTTP_MSG(
		    request->socket, buf, "7B\r\n%s%s%s%s", "<html><head><style>\r\n",
		    "body{font-family: monospace; font-size: 15px;}\r\n",
		    "td {padding: 1.5px 6px;}\r\n", "</style></head><body><table>\r\n");

		iterate_dir(fp, &request->dir_context);

		SEND_HTTP_MSG(request->socket, buf, "%s",
		              "16\r\n</table></body></html>\r\n");
		SEND_HTTP_MSG(request->socket, buf, "%s", "0\r\n\r\n");
	}
	...
}

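The main changes are that the HTTP header now carries Transfer-Encoding: chunked, that every piece of data is preceded by its length, and that a zero-length chunk is sent at the end. The hard-coded lengths (7B and 16 here, and the constant 34 in tracedir) count the chunk data without the final \r\n, which serves as the chunk terminator.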
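Hard-coded chunk lengths are easy to get wrong when the markup changes. One way to double-check them is to let snprintf measure the formatted text. The following user-space sketch does this for a table row in the format tracedir uses; row_len is a hypothetical helper, not part of khttpd.

```c
#include <stdio.h>
#include <string.h>

/* Length of one table row as tracedir formats it, without the final CRLF.
 * snprintf with a NULL buffer and size 0 only reports the needed length. */
static size_t row_len(const char *url, const char *name)
{
    int n = snprintf(NULL, 0, "<tr><td><a href=\"%s/%s\">%s</a></td></tr>",
                     url, name, name);

    return n < 0 ? 0 : (size_t) n;
}
```

For url "/ab" and name "cd" the measured length equals 34 + strlen(url) + 2 * strlen(name), which agrees with the expression passed to SEND_HTTP_MSG in tracedir.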

After this change the server can send responses whose size is not known in advance.

Finally, the result of running the program:

(Screenshot not available.)

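Handling different file types with MIME

See MIME types for an introduction. A MIME type is a standard way of indicating the nature and format of a document, a file, or an assortment of bytes, and is defined in RFC 6838. To make use of it, the server must supply the correct type in the Content-Type field of the HTTP response header.

As for which type to return, Common MIME types lists the type that corresponds to each file extension.

With that, the code can be modified; the complete change is in the commit Add MIME to deal with different kind of files. A new file, mime_type.h, stores the common MIME types.

A new function, get_mime_str, looks up the Content-Type string that corresponds to the requested file.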

// return mime type string
const char *get_mime_str(char *request_url)
{
    char *request_type = strchr(request_url, '.');
    int index = 0;
    if (!request_type)
        return "text/plain";

    while (mime_types[index].type) {
        if (!strcmp(mime_types[index].type, request_type))
            return mime_types[index].string;
        index++;
    }
    return "text/plain";
}
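Note that strchr finds the first '.' in the URL, so a path such as /docs/v1.2/a.pdf would be compared against ".2/a.pdf" and fall back to text/plain. A possible variant, sketched here in user space, looks at the last '.' of the last path component instead. The table contents and the names mime_entry, mime_table and mime_for are assumptions for illustration; the real mime_type.h is not shown in this note.

```c
#include <string.h>

/* Hypothetical table in the spirit of mime_type.h. */
struct mime_entry {
    const char *type;   /* file extension, including the dot */
    const char *string; /* value for the Content-Type header */
};

static const struct mime_entry mime_table[] = {
    {".html", "text/html"},
    {".pdf", "application/pdf"},
    {".png", "image/png"},
    {NULL, NULL},
};

/* Return the MIME string for the extension of the last path component. */
static const char *mime_for(const char *url)
{
    const char *slash = strrchr(url, '/');
    const char *base = slash ? slash + 1 : url;
    const char *ext = strrchr(base, '.');

    if (!ext)
        return "text/plain";
    for (int i = 0; mime_table[i].type; i++) {
        if (!strcmp(mime_table[i].type, ext))
            return mime_table[i].string;
    }
    return "text/plain";
}
```

Restricting the search to the last path component also keeps a dot in a directory name, as in /a.b/readme, from being mistaken for an extension.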

Next, the part of handle_directory that handles regular files is modified to use get_mime_str to obtain the matching Content-Type.

static bool handle_directory(struct http_request *request)
{
	...
	else if (S_ISREG(fp->f_inode->i_mode)) {
		char *read_data = kmalloc(fp->f_inode->i_size, GFP_KERNEL);
		int ret = read_file(fp, read_data);

		SEND_HTTP_MSG(
			request->socket, buf, "%s%s%s%s%d%s", "HTTP/1.1 200 OK\r\n",
+			"Content-Type: ", get_mime_str(request->request_url),
			"\r\nContent-Length: ", ret, "\r\nConnection: Close\r\n\r\n");
		http_server_send(request->socket, read_data, ret);
		kfree(read_data);
	}
	...
}

Finally, the result: kernel-scheduler-internals.pdf is opened through the server.

(Screenshot not available.)

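Error when unloading the module

The current implementation has a problem: once any request has been made to the server, unloading the module produces the following error message.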

[ 3721.905941] ------------[ cut here ]------------
[ 3721.905958] kernel BUG at fs/inode.c:1676!
[ 3721.905974] invalid opcode: 0000 [#6] SMP PTI
[ 3721.905987] CPU: 0 PID: 10434 Comm: khttpd Tainted: G      D W  OE     5.13.0-41-generic #46~20.04.1-Ubuntu
[ 3721.905999] Hardware name: Acer Aspire F5-573G/Captain_SK  , BIOS V1.18 10/21/2016
[ 3721.906005] RIP: 0010:iput+0x1ac/0x200
[ 3721.906022] Code: 00 0f 1f 40 00 4c 89 e7 e8 01 fb ff ff
                     5b 41 5c 41 5d 5d c3 c3 85 d2 74 a4 49
                     83 bc 24 e0 00 00 00 00 0f 85 3a ff ff
                     ff eb 93 <0f> 0b 0f 0b e9 0e ff ff ff
                     a9 b7 08 00 00 75 17 41 8b 84 24 58 01
...

Looking up line 1676 of fs/inode.c leads to the corresponding function, iput.

void iput(struct inode *inode)
{
	if (!inode)
		return;
	BUG_ON(inode->i_state & I_CLEAR);
retry:
	if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) {
		if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
			atomic_inc(&inode->i_count);
			spin_unlock(&inode->i_lock);
			trace_writeback_lazytime_iput(inode);
			mark_inode_dirty_sync(inode);
			goto retry;
		}
		iput_final(inode);
	}
}

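The BUG is triggered on line 5 of this function, the BUG_ON(inode->i_state & I_CLEAR) check, so a first guess is that the error is related to the file system.

Eventually I found that after a request is sent, the server opens and reads the file, but the close does not complete: execution stalls at filp_close, and the file is closed only when the next request arrives. The relevant code is shown below.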

static bool handle_directory(struct http_request *request, int keep_alive)
{
	...
	fp = filp_open(pwd, O_RDONLY, 0);
	...
	if (S_ISDIR(fp->f_inode->i_mode)) {
		...    
	} else if (S_ISREG(fp->f_inode->i_mode)) {
		...
		int ret = read_file(fp, read_data);
		...
	}
	filp_close(fp, NULL);
	return true;
}

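As a result, when the client closes the connection from its side, the file opened for the last request has not been closed, and unloading the module at that point triggers the error above.

To solve this, the current idea is to manage connections with a timer so that the server can actively close connections that have timed out. The detailed steps are explained in the following sections.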

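Adding a timer to actively close connections

高效 Web 伺服器開發 - 實作考量點 (high-performance web server development: implementation considerations) raises the following point:

When a web server keeps a long-lived connection open with a client browser and the client's network goes offline, how should the server handle it?

Ans: The protocol cannot detect this immediately, so the only remedy is for the server to introduce timer timeout events.

The current khttpd implementation does not use a timer to close idle connections, so some resources stay occupied.

The implementation follows the timer in sehttpd, which manages timers with a min-heap; see 二元堆積 (binary heap) for background.

I expected this to be straightforward, but ended up watching the sunrise several days in a row. One reason is that sehttpd runs in user space while khttpd runs in kernel space, so implementing the same feature in khttpd meant looking up a large number of kernel APIs. The main reason, though, is that sehttpd is single-threaded while khttpd is multi-threaded, so contention for shared resources has to be considered, and I ran into deadlocks several times.

To make this manageable, the problem is split into the following smaller problems, solved one at a time:

Set the socket to non-blocking
Read the current time
Implement a priority queue and manage every connection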

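Setting the socket to non-blocking

The socket needs to be non-blocking because in the original implementation it is blocking by default, so the thread stalls in kernel_accept and has no chance to check whether any connection has expired. Making the socket non-blocking keeps the thread from stalling in kernel_accept. The complete change is in the commit Set socket non-blocking and remove accept_err.

According to the documentation of kernel_accept, its flags parameter can be set to SOCK_NONBLOCK, as shown below.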

int err = kernel_accept(param->listen_socket, &socket, SOCK_NONBLOCK);

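With this, kernel_accept returns immediately instead of blocking when no connection is pending.

Reading the current time

sehttpd reads the current time with the gettimeofday system call; the corresponding code is as follows.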

static void time_update()
{
    struct timeval tv;
    int rc UNUSED = gettimeofday(&tv, NULL);
    assert(rc == 0 && "time_update: gettimeofday error");
    current_msec = tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

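System calls cannot be used inside khttpd. Instead, consider the structure timespec64 in include/linux/time64.h, defined as follows, where the member tv_sec holds seconds and tv_nsec holds nanoseconds.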

struct timespec64 {
	time64_t	tv_sec;			/* seconds */
	long		tv_nsec;		/* nanoseconds */
};

The function ktime_get_ts64, declared in include/linux/timekeeping.h, returns the current monotonic time in the timespec64 form described above. An excerpt follows.

/**
 * ktime_get_ts64 - get the monotonic clock in timespec64 format
 * @ts:		pointer to timespec variable
 *
 * The function calculates the monotonic clock from the realtime
 * clock and the wall_to_monotonic offset and stores the result
 * in normalized timespec64 format in the variable pointed to by @ts.
 */
void ktime_get_ts64(struct timespec64 *ts)
{
    struct timekeeper *tk = &tk_core.timekeeper;
    struct timespec64 tomono;
    unsigned int seq;
    u64 nsec;
    ...
}

With this background, the khttpd version can be written; the function time_update is as follows.

static void time_update(void)
{
    struct timespec64 tv;
    ktime_get_ts64(&tv);
    current_msec = tv.tv_sec * 1000 + tv.tv_nsec / 1000000;
}

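This yields the current time in milliseconds. Because ktime_get_ts64 reads the monotonic clock, the value is not wall-clock time, but that is sufficient for measuring timeouts.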

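Implementing a priority queue to manage every connection

Before implementing anything, the problem should be defined. There is only one consumer removing entries, namely the thread running the daemon, while the producers are the multiple threads that handle connections. The problem can therefore be framed as MPSC (multiple producers, single consumer).

Implementing the multi-threaded version directly is too complicated, so a version that works with a single thread comes first; the complete change is in the commit Create timer to close http connection (only single thread). This change already resolves the module-unloading error described above.

After many revisions there is now a lock-free version that "works". The complete changes are in the commits Rewrite timer from single thread to multiple thread and Update the key when connection resend the request. The timer and priority queue structures are defined as follows.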

typedef int (*timer_callback)(struct socket *, enum sock_shutdown_cmd);
typedef struct {
    size_t key;
    size_t pos; // the position of timer in queue
    timer_callback callback;
    struct socket *socket;
} timer_node_t;

typedef int (*prio_queue_comparator)(void *pi, void *pj);
typedef struct {
    void **priv;
    atomic_t nalloc;  // number of items in queue
    atomic_t size;
    prio_queue_comparator comp;
} prio_queue_t;

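The overall flow of the priority queue is:

Create the priority queue and start waiting for connections.
Whenever a new connection arrives, create a timer and add it to the priority queue with prio_queue_insert.
Use handle_expired_timers to detect whether any timer has expired.
When the module is unloaded, use http_free_timer to free all timers and the priority queue.
Whenever a connection sends another request, update its key.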
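The flow above can be sketched in user space with a small single-threaded min-heap. This is a simplified illustration, not the khttpd or sehttpd code: struct timer, struct heap, heap_insert, heap_delmin and expire_timers, the fixed capacity, and the integer id standing in for a socket are all assumptions, and the key update on a repeated request is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_CAP 16 /* assumed fixed capacity for this sketch */

struct timer {
    uint64_t key; /* expiry time in milliseconds */
    int id;       /* stands in for the socket */
};

struct heap {
    struct timer *item[HEAP_CAP + 1]; /* 1-based: item[1] is the minimum */
    size_t n;
};

static void heap_swap(struct heap *h, size_t i, size_t j)
{
    struct timer *t = h->item[i];
    h->item[i] = h->item[j];
    h->item[j] = t;
}

/* Insert, then "swim" the new item up while it is smaller than its parent. */
static int heap_insert(struct heap *h, struct timer *t)
{
    if (h->n == HEAP_CAP)
        return -1;
    h->item[++h->n] = t;
    for (size_t k = h->n; k > 1 && h->item[k]->key < h->item[k / 2]->key;
         k /= 2)
        heap_swap(h, k, k / 2);
    return 0;
}

/* Remove the minimum, then "sink" the item moved to the root. */
static struct timer *heap_delmin(struct heap *h)
{
    if (h->n == 0)
        return NULL;
    struct timer *min = h->item[1];
    h->item[1] = h->item[h->n--];
    for (size_t k = 1; 2 * k <= h->n;) {
        size_t c = 2 * k;
        if (c < h->n && h->item[c + 1]->key < h->item[c]->key)
            c++; /* pick the smaller child */
        if (h->item[k]->key <= h->item[c]->key)
            break;
        heap_swap(h, k, c);
        k = c;
    }
    return min;
}

/* Pop every timer whose key is <= now; returns how many expired. */
static int expire_timers(struct heap *h, uint64_t now)
{
    int closed = 0;

    while (h->n > 0 && h->item[1]->key <= now) {
        heap_delmin(h); /* a real server would shut the socket down here */
        closed++;
    }
    return closed;
}
```

Since the root always holds the earliest expiry time, expire_timers only inspects item[1] and can stop at the first timer that has not expired yet.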

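Inserting a timer into the priority queue

The function prio_queue_insert inserts a timer into the priority queue. As noted earlier, this implementation can be viewed as MPSC, so the problem to solve here is multiple producers inserting at the same time.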

/* add a new item to the heap */
static bool prio_queue_insert(prio_queue_t *ptr, void *item)
{
    timer_node_t **slot;  // get the address we want to store item
    size_t old_nalloc, old_size;
    long long old;

restart:
    old_nalloc = atomic_read(&ptr->nalloc);
    old_size = atomic_read(&ptr->size);

    // get the address want to store
    slot = (timer_node_t **) &ptr->priv[old_nalloc + 1];
    old = (long long) *slot;

    do {
        if (old_nalloc != atomic_read(&ptr->nalloc))
            goto restart;
    } while (!prio_queue_cmpxchg(slot, &old, (long long) item));

    atomic_inc(&ptr->nalloc);
    return true;
}

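The approach here is to compare the old and current item counts to decide whether another producer has written in the meantime; this is line 17 of the listing above, the if (old_nalloc != atomic_read(&ptr->nalloc)) check. The function prio_queue_cmpxchg then performs the CAS (compare-and-swap) operation. Its code, shown below, is based on 2022q1 第 8 週測驗題 - 測驗 2 (2022q1 week 8 quiz, problem 2).

Originally, following Semantics and Behavior of Atomic and Bitmask Operations, I wanted to implement CAS with the atomic_cmpxchg provided by the Linux kernel. However, I found that the kernel's atomic API operates on the value of the atomic variable itself, not on data that a variable points to, so the CAS is implemented with inline assembly instead.

The logic of the function is described in 2022q1 Homework5 (quiz8) - lf_compare_exchange; the main difference is that the operand width changes from 128 bits to 64 bits.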

static inline bool prio_queue_cmpxchg(timer_node_t **var,
                                      long long *old,
                                      long long neu)
{
    bool ret;
    union u64 {
        struct {
            int low, high;
        } s;
        long long ui;
    } cmp = {.ui = *old}, with = {.ui = neu};

    /**
     * 1. compare cmp.s.high:cmp.s.low with *var
     * 2. if equal, set ZF and copy with.s.high:with.s.low to *var
     * 3. if not equal, clear ZF and copy *var to cmp.s.high:cmp.s.low
     */
    __asm__ __volatile__("lock cmpxchg8b %1\n\tsetz %0"
                         : "=q"(ret), "+m"(*var), "+d"(cmp.s.high),
                           "+a"(cmp.s.low)
                         : "c"(with.s.high), "b"(with.s.low)
                         : "cc", "memory");
    if (!ret)
        *old = cmp.ui;
    return ret;
}
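For comparison, in user space the step of claiming a pointer-sized slot can be written with C11 <stdatomic.h> instead of inline assembly. This is only an analogy to the kernel code above: slot_claim is a hypothetical helper, and unlike prio_queue_insert it always expects the slot to be empty (NULL).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Try to store `item` in an empty slot with a single compare-and-swap.
 * Returns false, leaving the slot untouched, if it is already occupied. */
static bool slot_claim(_Atomic(void *) *slot, void *item)
{
    void *expected = NULL;

    return atomic_compare_exchange_strong(slot, &expected, item);
}
```

If two producers race for the same slot, exactly one call returns true; the loser can then retry with the next slot.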

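In addition, a min-heap normally moves a newly inserted item to its correct position by swimming it up. In this case the key records the expiry time, and every timer is inserted with an expiry time no earlier than that of the timers before it, so a later item is never smaller than an earlier one and the swim step can be omitted. Insertion then behaves like appending to a ring buffer.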
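The claim that appending non-decreasing keys keeps the heap valid can be checked with a small helper. This user-space sketch uses a 1-based array of keys, as the heap does; is_min_heap is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Check the min-heap property on key[1..n]: no item is smaller than its
 * parent at index k / 2. key[0] is unused. */
static bool is_min_heap(const uint64_t *key, size_t n)
{
    for (size_t k = 2; k <= n; k++) {
        if (key[k] < key[k / 2])
            return false;
    }
    return true;
}
```

A sequence appended in non-decreasing order always passes, because the parent index k / 2 is smaller than k and therefore holds an earlier, not larger, key.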

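Removing a timer from the priority queue

The function prio_queue_delmin removes the timer with the smallest key from the priority queue. Because the setting is MPSC, the main concern is a producer adding a new item while the root is being swapped with the last item (line 14 of the listing below, the if (nalloc == atomic_read(&ptr->nalloc)) check). As with insertion, the old and current item counts are compared to tell whether a producer has interfered.

After that, the item count is updated and the sink step is performed; finally the timer's connection is shut down and its memory is freed.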

/* remove the item with minimum key value from the heap */
static bool prio_queue_delmin(prio_queue_t *ptr)
{
    size_t nalloc;
    timer_node_t *node;

    do {
        if (prio_queue_is_empty(ptr))
            return true;

        nalloc = atomic_read(&ptr->nalloc);
        prio_queue_swap(ptr, 1, nalloc);

        if (nalloc == atomic_read(&ptr->nalloc)) {
            node = ptr->priv[nalloc--];
            break;
        }
        // change again
        prio_queue_swap(ptr, 1, nalloc);
    } while (1);

    atomic_set(&ptr->nalloc, nalloc);
    prio_queue_sink(ptr, 1);

    if (node->callback)
        node->callback(node->socket, SHUT_RDWR);

    kfree(node);
    return true;
}

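Testing the code

The program can now be tested under load with the command ./htstress localhost:8081 -n 20000. An excerpt of the log is shown below; it indicates that the timers behave correctly when multiple threads are running.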

remove node 00000000c8603c50 key 10712635 nalloc 17198
remove node 000000006d6d7424 key 10712635 nalloc 17197
remove node 00000000a7621098 key 10712635 nalloc 17196
add node 0000000047dc72c9 key 10720635 nalloc 17197
add node 00000000a80a0d9c key 10720635 nalloc 17198
add node 000000009c494072 key 10720635 nalloc 17199
add node 00000000972001db key 10720635 nalloc 17200
add node 00000000835c9d0f key 10720635 nalloc 17201
remove node 000000005c21a0ec key 10712636 nalloc 17200
remove node 000000004ed780e8 key 10712636 nalloc 17199
remove node 000000000be7bfbb key 10712636 nalloc 17198
remove node 00000000ad509208 key 10712636 nalloc 17197
remove node 000000001bf92ea9 key 10712636 nalloc 17196
remove node 00000000697caab7 key 10712636 nalloc 17195
add node 000000002fd797a7 key 10720636 nalloc 17196
add node 000000009502d850 key 10720636 nalloc 17197
add node 000000007d125979 key 10720636 nalloc 17198
add node 000000000b6728c7 key 10720636 nalloc 17199
add node 00000000a524b323 key 10720636 nalloc 17200
add node 00000000b827ea2c key 10720636 nalloc 17201
remove node 000000008791c3cb key 10712637 nalloc 17200

Next, a demonstration that the server updates the expiry time of each connection. Each connection currently waits about 8 seconds. In the first test the connection is closed automatically after about 8 seconds; in the second, sending a request makes the connection wait a fresh 8 seconds.

(Screenshot not available.)

The khttpd implementation has gained quite a few mechanisms by now. It is still far from production web servers, but most of the basic functionality is in place.

Finally, the current throughput, again measured with ./htstress localhost:8081 -n 20000:

requests:      20000
good requests: 20000 [100%]
bad requests:  0 [0%]
socket errors: 0 [0%]
seconds:       9.244
requests/sec:  2163.532

Out of curiosity I also measured Google's server for comparison, using ./htstress www.google.com:80 -n 200. The results are as follows.

requests:      200
good requests: 200 [100%]
bad requests:  0 [0%]
socket errors: 0 [0%]
seconds:       99.095
requests/sec:  2.018

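Next, a rough comparison of the amount of data each server returns per connection. Far more factors would have to be considered for a fair comparison (for one, the Google measurement goes over the network while khttpd is measured on localhost), but khttpd at least handles connections reasonably well.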

	khttpd	google
bytes	2701	772

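Observing khttpd with ftrace

ftrace - Function Tracer and chapter 5 of Demystifying the Linux CPU Scheduler explain the concepts and usage of ftrace.

ftrace is a tracing tool built into the Linux kernel. It can trace functions and events, measure context-switch times, find where interrupts are disabled, and more.

First check whether the running system supports ftrace with the following command.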

cat /boot/config-`uname -r` | grep CONFIG_HAVE_FUNCTION_TRACER

The expected output is:

CONFIG_HAVE_FUNCTION_TRACER=y

How is ftrace used? A nice property of ftrace is that it is configured by writing to files under /sys/kernel/debug/tracing/. Some of these files are listed below; the full list can be viewed with sudo ls /sys/kernel/debug/tracing.

available_events            max_graph_depth   stack_max_size
available_filter_functions  options           stack_trace
available_tracers           per_cpu           stack_trace_filter
buffer_percent              printk_formats    synthetic_events
...

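The files used in this experiment are described below; the rest are documented in ftrace - Function Tracer.

current_tracer: sets or shows the tracer currently in use, such as function or function_graph
tracing_on: sets or shows whether the tracer writes to the ring buffer; 0 disables writing and 1 enables it
trace: holds the tracer's output, that is, the messages recorded during tracing
available_filter_functions: lists all kernel functions that can be traced
set_ftrace_filter: selects the functions to trace; each must appear in available_filter_functions
set_graph_function: selects the functions whose call graph is shown; the output resembles the source code with every called function expanded
max_graph_depth: the maximum depth to which the function graph tracer follows calls

With this background, khttpd can be traced; the complete change is in the commit Use ftrace to trace khttpd server. The target is http_server_worker, the function that runs for every connection. The first step is to load the kernel module and confirm through available_filter_functions that khttpd's functions can be traced. Running cat available_filter_functions | grep khttpd lists every traceable function in khttpd.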

parse_url_char [khttpd]
http_message_needs_eof.part.0 [khttpd]
http_message_needs_eof [khttpd]
http_should_keep_alive [khttpd]
http_parser_execute [khttpd]
http_method_str [khttpd]
http_status_str [khttpd]
http_parser_init [khttpd]
http_parser_settings_init [khttpd]
http_errno_name [khttpd]
http_errno_description [khttpd]
http_parser_url_init [khttpd]
...

Next, a shell script is written to trace http_server_worker:

#!/bin/bash
TRACE_DIR=/sys/kernel/debug/tracing

# clear
echo 0 > $TRACE_DIR/tracing_on
echo > $TRACE_DIR/set_graph_function
echo > $TRACE_DIR/set_ftrace_filter
echo nop > $TRACE_DIR/current_tracer

# setting
echo function_graph > $TRACE_DIR/current_tracer
echo 3 > $TRACE_DIR/max_graph_depth
echo http_server_worker > $TRACE_DIR/set_graph_function

# execute
echo 1 > $TRACE_DIR/tracing_on
./htstress localhost:8081 -n 2000
echo 0 > $TRACE_DIR/tracing_on

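The script first clears the ftrace settings, then selects http_server_worker as the function to trace, and finally enables the tracer only while the test runs. Because it writes to files under /sys/kernel/debug, it has to be run as root.

After running the script, the output can be read from ftrace's trace file; an excerpt follows.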

# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)               |  http_server_worker [khttpd]() {
 0)               |    kernel_sigaction() {
 0)   0.296 us    |      _raw_spin_lock_irq();
 0)   0.937 us    |    }
 0)   0.329 us    |    }
 0)               |    kmem_cache_alloc_trace() {
 0)   0.165 us    |      __cond_resched();
 0)   0.114 us    |      should_failslab();
 0)   0.913 us    |    }
 0)   0.111 us    |    http_parser_init [khttpd]();
 0)               |    http_add_timer [khttpd]() {
 0)   0.433 us    |      kmem_cache_alloc_trace();
 0)   0.174 us    |      ktime_get_ts64();
 0)   1.052 us    |    }
 0)               |    http_server_recv.constprop.0 [khttpd]() {
 0)   3.134 us    |      kernel_recvmsg();
 0)   3.367 us    |    }
 0)               |    kernel_sock_shutdown() {
 0) + 40.992 us   |      inet_shutdown();
 0) + 41.407 us   |    }
 0)   0.433 us    |    kfree();
 0) + 50.869 us   |  }

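The output shows the time spent in http_server_worker as a whole and in each function it calls. With this experiment in place, the causes of khttpd's poor performance can be analyzed.

Finding the performance bottleneck of khttpd

After increasing the maximum tracing depth, http_server_worker is traced again. The result for a single connection is shown below.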

- echo 3 > $TRACE_DIR/max_graph_depth
+ echo 5 > $TRACE_DIR/max_graph_depth

# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 3)               |  http_server_worker [khttpd]() {
 3)               |    kernel_sigaction() {
 3)   0.082 us    |      _raw_spin_lock_irq();
 3)   0.238 us    |    }
 3)               |    kernel_sigaction() {
 3)   0.079 us    |      _raw_spin_lock_irq();
 3)   0.222 us    |    }
 3)               |    kmem_cache_alloc_trace() {
 3)               |      __cond_resched() {
 3)   0.070 us    |        rcu_all_qs();
 3)   0.209 us    |      }
 3)   0.068 us    |      should_failslab();
 3)   0.567 us    |    }
 3)   0.076 us    |    http_parser_init [khttpd]();
 3)               |    http_add_timer [khttpd]() {
 3)               |      kmem_cache_alloc_trace() {
 3)               |        __cond_resched() {
 3)   0.070 us    |          rcu_all_qs();
 3)   0.201 us    |        }
 3)   0.070 us    |        should_failslab();
 3)   0.490 us    |      }
 3)   0.084 us    |      ktime_get_ts64();
 3)   0.792 us    |    }
 3)               |    http_server_recv.constprop.0 [khttpd]() {
 3)               |      kernel_recvmsg() {
 3)               |        sock_recvmsg() {
 3)   0.260 us    |          security_socket_recvmsg();
 3) + 13.319 us   |          inet_recvmsg();
 3) + 13.813 us   |        }
 3) + 13.957 us   |      }
 3) + 14.118 us   |    }
 3)               |    http_parser_execute [khttpd]() {
 3)   0.085 us    |      http_parser_callback_message_begin [khttpd]();
 3)   0.350 us    |      parse_url_char [khttpd]();
 3)   0.114 us    |      http_parser_callback_request_url [khttpd]();
 3)   0.077 us    |      http_parser_callback_header_field [khttpd]();
 3)   0.068 us    |      http_parser_callback_header_value [khttpd]();
 3)   0.070 us    |      http_parser_callback_headers_complete [khttpd]();
 3)   0.073 us    |      http_should_keep_alive [khttpd]();
 3)               |      http_parser_callback_message_complete [khttpd]() {
 3)   0.069 us    |        http_should_keep_alive [khttpd]();
 3)               |        handle_directory [khttpd]() {
 3)   6.950 us    |          filp_open();
 3) + 16.874 us   |          http_server_send [khttpd]();
 3) + 12.320 us   |          http_server_send [khttpd]();
 3) ! 478.209 us  |          iterate_dir();
 3)   8.984 us    |          http_server_send [khttpd]();
 3)   9.507 us    |          http_server_send [khttpd]();
 3)   1.422 us    |          filp_close();
 3) ! 536.623 us  |        }
 3) ! 536.907 us  |      }
 3) ! 540.598 us  |    }
 3)   0.078 us    |    http_should_keep_alive [khttpd]();
 3)               |    kernel_sock_shutdown() {
 3)               |      inet_shutdown() {
 3)               |        lock_sock_nested() {
 3)   0.106 us    |          __cond_resched();
 3)   0.074 us    |          _raw_spin_lock_bh();
 3)   0.069 us    |          __local_bh_enable_ip();
 3)   0.524 us    |        }
 3)               |        tcp_shutdown() {
 3)   0.127 us    |          tcp_set_state();
 3)   5.040 us    |          tcp_send_fin();
 3)   5.393 us    |        }
 3)               |        sock_def_wakeup() {
 3)   0.077 us    |          rcu_read_unlock_strict();
 3)   0.230 us    |        }
 3)               |        release_sock() {
 3)   0.087 us    |          _raw_spin_lock_bh();
 3)   2.299 us    |          __release_sock();
 3)   0.078 us    |          tcp_release_cb();
 3)   0.106 us    |          _raw_spin_unlock_bh();
 3)   2.940 us    |        }
 3)   9.504 us    |      }
 3)   9.667 us    |    }
 3)   0.132 us    |    kfree();
 3) ! 567.482 us  |  }

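The result shows clearly that the largest cost is iterate_dir, the function that walks the directory: it accounts for 478.209 us of the 567.482 us total, roughly 84%. Next come the functions that receive and send data, kernel_recvmsg and http_server_send.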

Jim Huang

2022/05/24 02:09:52

Submit a pull request to khttpd

Risheng

2022/05/24 06:39:57

PR submitted!

2022/05/24 03:41:34

TODO: Following seHTTPd, actively close connections that have timed out

2022/05/24 03:42:38

The structure dir_context

ctx is not a good name; please improve it

2022/05/24 08:36:51

Fixed!

2022/06/03 20:39:11

The modified handle_directory does the following: 1.

Handling of ".." should be considered

2022/06/05 07:45:56

Fixed, and the note has been updated as well

2022/06/15 16:14:58

Move the content of 「解釋如何傳遞資料到核心模組」 (explaining how data is passed to a kernel module) to a newly created HackMD page

2022/06/19 03:33:21

not on data that a variable points to

Casting the pointer type to long also makes CAS usable

2022/06/19 11:07:41

Finally, the current throughput, again measured with ./htstress localhost:8081 -n 2000

TODO: Use ftrace (chapter 5 of Demystifying the Linux CPU Scheduler shows how) to find the performance bottleneck
