2024 Linux Kernel Design/Implementation Course Assignment - ktcp

---
title: 2024 Linux Kernel Design/Implementation Course Assignment - ktcp
image: https://repository-images.githubusercontent.com/181623502/5a221200-560c-11ea-8a63-53e08f8c367c
description: Tests students' understanding of the kthread and workqueue mechanisms in the Linux kernel
tags: linux2024
---

# L11: ktcp
> Speaker: [jserv](https://wiki.csie.ncku.edu.tw/User/jserv) / Course forum: [2024 System Software Course](https://www.facebook.com/groups/system.software2024/)

:mega: Back to the "[Linux Kernel Design/Implementation](https://wiki.csie.ncku.edu.tw/linux/schedule)" course schedule

==[Lecture recording](https://youtu.be/fXPfIkiG-Go)== (2024)
==[Lecture recording](https://youtu.be/XMKNPiIYVCE)== (2022)

## :memo: Objectives
* Study "[Linux Kernel Design: Evolution of Event-Driven I/O Models](https://hackmd.io/@sysprog/linux-io-model)"
* Explore the issues involved in developing a TCP server
* Learn the kernel thread and workqueue mechanisms of the Linux kernel
* Learn [Concurrency Managed Workqueue](https://www.kernel.org/doc/html/latest/core-api/workqueue.html) (cmwq), together with chapters 1 and 2 of *Demystifying the Linux CPU Scheduler*, to see how the CPU scheduler interacts with workqueue/CMWQ
* Preview the principles of computer networking
* Learn [Ftrace](https://docs.kernel.org/trace/ftrace.html), together with chapter 6 of *Demystifying the Linux CPU Scheduler*

## :rocket: `kecho`: a TCP server running in Linux kernel mode
Get the kecho source code and build it:
```shell
$ git clone https://github.com/sysprog21/kecho
$ cd kecho
$ make
```

You should see the following:
* Executables: `bench` and `user-echo-server`
* Kernel modules: `kecho.ko` and `drop-tcp-socket.ko`

Then run the tests:
```shell
$ make check
```

Reference output:
```
Preparing...
Send message via telnet
Progress : [########################################] 100%
Complete
```

This operation consists of the following actions:
```shell
$ sudo insmod kecho.ko
$ telnet localhost 12345
```

The following output appears:
```
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
```

You can type any characters (remember to press Enter), and telnet will echo back what you just typed.

Press `Ctrl` and `]` together, then press `q`, to leave the telnet session.

Next, run `$ telnet localhost 12345` again but type nothing and simply wait; you will see the following kernel messages (use `$ dmesg` to observe them):
```
cope le: 4404 kB RssShmem: 0 kB VmData: 880 kB VmStk: 132 kB VmExe: 136 kB VmLib: 6336 kB VmPTE: 212 kB VmSwap: 0 kB HugetlbPages: 0 kB CoreDumping: 0 Threads: 1 SigQ: 0/31543
```

A port number can be given when loading kecho (the default is `port=12345`):
```shell
$ sudo insmod kecho.ko port=1999
```

While modifying or testing kecho, `rmmod` may fail because sockets in the `TIME-WAIT` state are still being held. In that case, the provided `drop-tcp-socket` kernel module can be used to drop specific TCP connections. Read [kecho](https://github.com/sysprog21/kecho) carefully to learn the required setup and preparation.

## :house: `user-echo-server`: a TCP server running in user space
`user-echo-server` is the user-space counterpart of `kecho`, which makes it possible to compare functionality and performance. It uses the [epoll](http://man7.org/linux/man-pages/man7/epoll.7.html) system calls and listens on port 12345. Both `user-echo-server` and `kecho` can be analyzed with the provided `bench` program. Read [kecho](https://github.com/sysprog21/kecho) carefully to learn the required setup and preparation.

## :shark: seHTTPd
[seHTTPd](https://github.com/sysprog21/sehttpd) is an efficient web server that covers concurrency, I/O event models, epoll, the [Reactor pattern](http://en.wikipedia.org/wiki/Reactor_pattern), and the design considerations of a web server in an event-driven architecture. See [High-Performance Web Server Development](https://hackmd.io/@sysprog/fast-web-server).

Packages to install beforehand (eBPF is used for the later analysis):
```shell
$ sudo apt install wget
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 4052245BD4284CDD
$ echo "deb https://repo.iovisor.org/apt/$(lsb_release -cs) $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/iovisor.list
$ sudo apt-get update
$ sudo apt-get install bcc-tools linux-headers-$(uname -r)
$ sudo apt install apache2-utils
```

Get the source code and build it:
```shell
$ git clone https://github.com/sysprog21/sehttpd
$ cd sehttpd
$ make
```

You should see the `sehttpd` executable. Then run the built-in test suite:
```shell
$ make check
```

### Stress-testing seHTTPd
First, we can take the "classic" approach and stress-test seHTTPd with the [Apache Benching tool](https://httpd.apache.org/docs/current/programs/ab.html).

Run the following command in one terminal window:
```shell
$ ./sehttpd
```

Switch to a web browser and open `http://127.0.0.1:8081/`. You should see the following in the browser:
:::info
Welcome!
If you see this page, the seHTTPd web server is successfully working.
:::

Then run the following command in another terminal window:
```shell
$ ab -n 10000 -c 500 -k http://127.0.0.1:8081/
```

Reference output (it is perfectly normal if the numbers differ noticeably from your own results):
```shell
Server Software:        seHTTPd
Server Hostname:        127.0.0.1
Server Port:            8081

Document Path:          /
Document Length:        241 bytes

Concurrency Level:      500
Time taken for tests:   0.927 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    10000
Total transferred:      4180000 bytes
HTML transferred:       2410000 bytes
Requests per second:    10784.81 [#/sec] (mean)
Time per request:       46.361 [ms] (mean)
Time per request:       0.093 [ms] (mean, across all concurrent requests)
Transfer rate:          4402.39 [Kbytes/sec] received
```

Note the following options:
* `-k`: "Enable the HTTP KeepAlive feature", that is, perform multiple requests within one HTTP session
* `-c`: concurrency, the number of requests issued at the same time
* `-n`: the total number of requests to issue during the stress test

For the meaning of the output, read [ab - Apache HTTP server benchmarking tool](https://httpd.apache.org/docs/current/programs/ab.html).

:notebook: Note that `ab` cannot effectively reflect multithreaded behavior (`ab` itself consumes 100% of a single core). This is why [khttpd](https://github.com/sysprog21/khttpd) provides [htstress.c](https://github.com/sysprog21/khttpd/blob/master/htstress.c), which offers a `-t` option to spread the load according to the number of CPUs available in the test environment.

By the standards of today's GNU/Linux or FreeBSD, the implementation of [ab - Apache HTTP server benchmarking tool](https://httpd.apache.org/docs/current/programs/ab.html) is dated and does not reflect the characteristics of the system. Besides `htstress`, you can also use [wrk](https://github.com/wg/wrk), whose stated goal is:
> wrk is a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.

Another tool for load-testing HTTP servers is [httperf](https://github.com/httperf/httperf).

### Exception handling
If you leave seHTTPd running instead of shutting it down, wait for a longer while, and then rerun the `ab` test above several times (with different values for `-n` and `-c`), you may run into error conditions such as the following (partial list):
1. Segmentation fault
2. The message `[ERROR] (src/http.c:253: errno: Resource temporarily unavailable) rc != 0`

Use `$ grep -r log_err` to search the source code and learn what exception handling currently exists (note: it has several flaws, so do not over-interpret it or treat it as a model).

### :microscope: Tracing HTTP packets with [eBPF](https://en.wikipedia.org/wiki/Berkeley_Packet_Filter)
Study "[Linux Kernel Design: Observing Operating System Behavior through eBPF](https://hackmd.io/@sysprog/linux-ebpf)" to understand the kernel's dynamic tracing mechanism. It is non-intrusive: without modifying kernel internals, applications, business logic, or any system configuration, we can quickly and precisely obtain the information we want.

The [ebpf directory](https://github.com/sysprog21/sehttpd/tree/master/ebpf) of the [seHTTPd](https://github.com/sysprog21/sehttpd) source tree provides a simple HTTP packet analysis tool. It is built on eBPF and runs through the tools provided by [IO Visor](https://github.com/iovisor). Concept diagram:

![](https://i.imgur.com/3cgRcmI.png)

Usage (run `$ ./sehttpd` in another terminal window first):
```shell
$ cd ebpf
$ sudo python http-parse-sample.py
```

Note: by default this tool monitors the `eth0` network interface. If your default network interface is not `eth0`, decide which interface to monitor based on the output of the `ip` tool. For example, run the following command on a GNU/Linux machine:
```shell
$ ip link
```

You will see several entries. If [Docker](https://www.docker.com/) is running in your environment, there may be a lot of them, but there is no need to worry: ignore the entries whose names start with `lo`, `tun`, `virbr`, `docker`, `br-`, or `veth`, and what remains is a network interface such as `enp5s0` (the name depends on your network hardware). The command above then becomes:
```shell
$ sudo python http-parse-sample.py -i enp5s0
```

Now open a web browser, then visit and refresh `http://127.0.0.1:8081/` several times. The terminal running the Python program should show output similar to the following:
```
TCP src port '51670'
TCP dst port '8081'
¢GET / HTTP/1.1
IP hdr length '20'
IP src '192.168.50.97'
IP dst '61.70.212.51'
TCP src port '8081'
TCP dst port '51670'
ÌHTTP/1.1 304 Not Modified
```

For an overview of how this program works, see [Appendix C](https://hackmd.io/@0xff07/r1f4B8aGI).

For tracing the [fibdrv](https://github.com/sysprog21/fibdrv) kernel module with eBPF, see [0xff07's notes](https://hackmd.io/@0xff07/S1vfNHWB8)
* [Corresponding source code](https://github.com/0xff07/fibdrv/)

## :icecream: Computer networking overview
Preview [CS:APP Chapter 11](https://hackmd.io/s/ByPlLNaTG): Network Programming, together with:
* [nstack development notes (1)](https://hackmd.io/s/ryfvFmZ0f)
* [nstack development notes (2)](https://hackmd.io/s/r1PUn3KGV)

For the flow of a web server, see the figure below:

![](https://hackmd.io/_uploads/B1_un3szn.png)

* Basic client-server operation:
    1. The client sends a connection request
    2. After receiving the request, the server handles it according to the resources it has
    3. The server responds (returns the relevant information)
    4. The client receives the result returned by the server

![](https://hackmd.io/_uploads/rk0Y3hif3.png)

* Now the detailed flow, with the client on the left and the server on the right
    1. Both client and server start by calling [`getaddrinfo()`](https://man7.org/linux/man-pages/man3/getaddrinfo.3.html), which yields (through its result argument) a list of `struct addrinfo` entries holding the data needed for the connection, such as the IP address, the port ([通訊埠](https://zh.m.wikipedia.org/zh-tw/%E9%80%9A%E8%A8%8A%E5%9F%A0)), the server name, and so on
    2. Client and server then call `socket()` to create a communication endpoint. The return value is a file descriptor. Note that this only creates the endpoint; nothing is sent over the network yet
        > [socket man page](https://man7.org/linux/man-pages/man2/socket.2.html)
        > socket() creates an endpoint for communication and returns a file descriptor that refers to that endpoint.
    3. The server uses [`bind()`](https://man7.org/linux/man-pages/man2/bind.2.html) to associate the socket with a specific IP address and port (this happens in kernel space)
    4. The server calls [`listen()`](https://man7.org/linux/man-pages/man2/listen.2.html) to enter the listening state, ready to accept requests from clients
    5. The server can then use [`accept()`](https://man7.org/linux/man-pages/man2/accept.2.html) to accept a client connection
    6. The client uses [`connect()`](https://man7.org/linux/man-pages/man2/connect.2.html) to send a connection request and waits for the server to `accept` it
    7. Once the connection is established, client and server communicate by reading and writing (for example with RIO, the Robust I/O package from CS:APP, which is built on Unix I/O and takes care of low-level I/O details)
    8. When the client ends the connection, the server observes the end of the stream (EOF, end of file) and then closes its side of the connection to that client
    9. After finishing with one client, the server can go on to serve the next client or shut down entirely

==This flow handles only a single server/client connection at a time, so it suits small systems such as the internal configuration interface of a router==

* Chapter 12 of CS:APP (Concurrent Programming) describes how to accept connections from multiple clients, in a way similar to khttpd. Architecture diagram:

![](https://hackmd.io/_uploads/SyMih2jMn.png)

* While the server is listening, whenever a client requests a connection, the server `fork`s a process to handle that client. Each child process only has to serve its own client and does not interfere with the other child processes (their address spaces are independent), so connections from multiple clients can be served.
    > From the server's point of view, it simply keeps accepting connection requests

## :rocket: kHTTPd: a web server running in Linux kernel mode
Get the kHTTPd source code and build it:
```shell
$ git clone https://github.com/sysprog21/khttpd
$ cd khttpd
$ make
```

You should see the `htstress` executable and the `khttpd.ko` kernel module. Then run the tests:
```shell
$ make check
```

Reference output:
```
0 requests
10000 requests
20000 requests
30000 requests
40000 requests
50000 requests
60000 requests
70000 requests
80000 requests
90000 requests

requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
socker errors: 0 [0%]
seconds:       4.722
requests/sec:  21177.090
```

On this machine, the experiment shows that kHTTPd can handle more than 20K HTTP requests per second.

The experiment above uses a modified version of the [htstress](https://github.com/arut/htstress) tool, which we can also run against `http://www.google.com`:
```shell
$ ./htstress -n 1000 -c 1 -t 4 http://www.google.com/
```

Reference output:
```
requests:      1000
good requests: 1000 [100%]
bad requests:  0 [0%]
socker errors: 0 [0%]
seconds:       17.539
requests/sec:  57.015
```

A port number can be given when loading kHTTPd (the default is `port=8081`):
```shell
$ sudo insmod khttpd.ko port=1999
```

Besides opening the page in a web browser, you can also use `wget`:
```shell
$ wget localhost:1999
```

Reference `wget` output:
```
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:1999... failed: Connection refused.
Connecting to localhost (localhost)|127.0.0.1|:1999... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12 [text/plain]
Saving to: 'index.html'
```

The resulting `index.html` contains the string `Hello World!`.

The command below shows the state of the port kHTTPd listens on:
```shell
$ sudo netstat -apn | grep 8081
```

Note that after a web browser has opened many connections to kHTTPd, you may see the following `dmesg` output when the module is unloaded:
```
CPU: 37 PID: 78277 Comm: khttpd Tainted: G D WC OE K 4.15.0-91-generic #92-Ubuntu
Hardware name: System manufacturer System Product Name/ROG STRIX X399-E GAMING, BIOS 0808 10/12/2018
RIP: 0010:0xffffffffc121b845
RSP: 0018:ffffabfc9cf47d68 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8f1bc898a000 RCX: 0000000000000218
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff84c5107f
RBP: ffffabfc9cf47dd8 R08: ffff8f14c7e34e00 R09: 0000000180400024
R10: ffff8f152697cfc0 R11: 000499a097a8e100 R12: ffff8f152697cfc0
R13: ffffabfc9a6afe10 R14: ffff8f152697cfc0 R15: ffff8f1d4a5f8000
FS:  0000000000000000(0000) GS:ffff8f154f540000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc121b845 CR3: 000000138b00a000 CR4: 00000000003406e0
Call Trace:
 ? __schedule+0x256/0x880
 ? kthread+0x121/0x140
 ? kthread_create_worker_on_cpu+0x70/0x70
 ? ret_from_fork+0x22/0x40
Code: Bad RIP value.
RIP: 0xffffffffc121b845 RSP: ffffabfc9cf47d68
CR2: ffffffffc121b845
---[ end trace 76d6d2ce81c97c71 ]---
```

### [`htstress.c`](https://github.com/sysprog21/khttpd/blob/master/htstress.c) flow
`htstress.c` is the client, used to send test requests to the server. Running it without arguments prints its usage, as follows:
```shell
$ ./htstress
Usage: htstress [options] [http://]hostname[:port]/path
Options:
   -n, --number       total number of requests (0 for inifinite, Ctrl-C to abort)
   -c, --concurrency  number of concurrent connections
   -t, --threads      number of threads (set this to the number of CPU cores)
   -u, --udaddr       path to unix domain socket
   -h, --host         host to use for http request
   -d, --debug        debug HTTP response
   --help             display this message
```

The corresponding statement in `script/test.sh`:
```shell
./htstress -n 100000 -c 1 -t 4 http://localhost:8081/
```
* `-n`: the total number of requests sent to the server
* `-c`: the number of concurrent connections to the server
* `-t`: the number of threads to use

`main` is mainly responsible for setting up the connection to the server:
1. Parse the options: [`getopt_long()`](https://linux.die.net/man/3/getopt_long) retrieves the command-line arguments, and a `switch` statement sets the corresponding variables
2. Prepare the connection information: [`getaddrinfo`](https://man7.org/linux/man-pages/man3/getaddrinfo.3.html) yields one or more `addrinfo` structures containing the server's IP address
3. Measure time: `start_time()` records the start time, and [`gettimeofday()`](https://man7.org/linux/man-pages/man2/gettimeofday.2.html) is used to compute the elapsed time
4. Test the server: [`pthread_create`](https://man7.org/linux/man-pages/man3/pthread_create.3.html) creates the number of threads given by the option; each thread runs `worker()`, which acts as a client and sends connection requests to the server
5. Print the test results

Next, look at `worker()`, which carries out the connection to the server. It has to set up both the client connections to the server and the epoll instance that monitors them.
* An array of `epoll_event` structures named `evts[MAX_EVENT]` holds the monitored events (`MAX_EVENT` is the maximum number of events handled at once)
    ```c
    struct epoll_event {
        uint32_t events;   /* Epoll events */
        epoll_data_t data; /* User data variable */
    } __EPOLL_PACKED;
    ```
* [`epoll_create`](https://man7.org/linux/man-pages/man2/epoll_create.2.html) (its result is stored in `efd`) creates the epoll instance for the `concurrency` (1 here) connections to the server
    > Since Linux 2.6.8, however, the size argument of epoll_create is ignored. Once created, the instance occupies one fd, which must be released with close() after use, otherwise resources are wasted
* The socket connection is set up in `init_conn()`, which also configures epoll
    * The connections are kept in `struct econn ecs[concurrency], *ec`; for initialization, efd (the epoll fd) and an entry of `ecs` are passed to `init_conn()`
    * `socket()` first creates the socket used to reach the server and returns an fd, which is stored in `ec->fd`
    * [fcntl()](https://man7.org/linux/man-pages/man2/fcntl.2.html): file control, which changes the properties of an fd. `fcntl(ec->fd, F_SETFL, O_NONBLOCK)` switches the socket fd to non-blocking mode, so that, unlike in blocking mode, a call does not stall when no data can be read
    * [connect()](https://man7.org/linux/man-pages/man2/connect.2.html): a system call that connects the socket fd (`ec->fd`) to the server's IP address. Because the socket is non-blocking, the call returns without waiting for the connection to complete. The code retries `connect()` in a loop for as long as it fails with `errno == EAGAIN` (for a TCP socket, an unfinished non-blocking connect is normally reported as `EINPROGRESS`, and completion is then detected through `EPOLLOUT`)
    * [epoll_ctl()](https://man7.org/linux/man-pages/man2/epoll_ctl.2.html): adds the socket (`ec->fd`) to the epoll instance (`efd`) with `EPOLL_CTL_ADD`, registering interest in the writable state with `EPOLLOUT`

```c
static void init_conn(int efd, struct econn *ec)
{
    int ret;
    // create the socket
    ec->fd = socket(sss.ss_family, SOCK_STREAM, 0);
    ...
    // switch the fd to non-blocking mode
    fcntl(ec->fd, F_SETFL, O_NONBLOCK);
    // connect system call
    do {
        ret = connect(ec->fd, (struct sockaddr *) &sss, sssln);
    } while (ret && errno == EAGAIN);
    ...
    // set up the epoll event and point it at this connection
    struct epoll_event evt = {
        .events = EPOLLOUT,
        .data.ptr = ec,
    };
    // add the socket to the set monitored by epoll
    if (epoll_ctl(efd, EPOLL_CTL_ADD, ec->fd, &evt)) {
        ...
    }
}
```

With the connections initialized, `worker()` continues into an infinite for-loop that handles I/O events.
* epoll monitoring:
    * Enter an infinite for-loop that handles all connection requests
    * Use [`epoll_wait`](https://hackmd.io/-YilHq7jQgS3S9LdUgqJmA#%E8%AC%9B%E8%A7%A3-htstressc-%E7%94%A8%E5%88%B0-epoll-%E7%B3%BB%E7%B5%B1%E5%91%BC%E5%8F%AB) to wait for ready fds, whose events are stored in the `evts` array
    * The event flags checked through `evts[n].events` in `htstress.c`:
        * EPOLLIN: the fd is readable
        * EPOLLOUT: the fd is writable
        * EPOLLERR: an error occurred on the fd
        * EPOLLHUP: the connection on the fd was hung up
* epoll error handling: `if (evts[n].events & EPOLLERR){ ... }` checks whether the event reports an error
    * [getsockopt()](https://man7.org/linux/man-pages/man2/getsockopt.2.html): retrieves the state of a socket; the `SO_ERROR` option returns the pending error (0 means no error). In the call `if (getsockopt(efd, SOL_SOCKET, SO_ERROR, (void *) &error, &errlen) == 0)`, the result is written to the `error` variable. Note that the code passes `efd`, the epoll fd, whereas `SO_ERROR` is only meaningful on a socket fd such as `ec->fd`
    * [atomic_fetch_add()](https://en.cppreference.com/w/c/atomic/atomic_fetch_add): an atomic operation that counts the errors, so that the counter stays correct even though several worker threads may update it concurrently
    * [close()](https://man7.org/linux/man-pages/man2/close.2.html): closes the socket whose connection failed, so that it does not keep holding system resources

    > [ISO/IEC 9899:2011 (P.283) : atomic_fetch function](https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1548.pdf)
    > These operations are atomic read-modify-write operations.

```c
if (evts[n].events & EPOLLERR) {
    /* normally this should not happen */
    ...
    if (getsockopt(efd, SOL_SOCKET, SO_ERROR, (void *) &error,
                   &errlen) == 0) {...}
    ...
    // count the error
    atomic_fetch_add(&socket_errors, 1);
    // close the fd of the failed socket
    close(ec->fd);
    ...
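    // re-initialize the connection
    init_conn(efd, ec);
}
```

* The client sends data to the server:
    * Event state EPOLLOUT (writable): after confirming that the event is writable and the connection is usable, [`send()`](https://man7.org/linux/man-pages/man2/send.2.html) transmits the data on the socket fd (the arguments give the data and its length, with `ec->offs` as the offset into the output buffer); on success it returns the number of bytes sent
    * When request debugging is enabled (`debug & HTTP_REQUEST_DEBUG`), the outgoing data is also dumped with [`write()`](https://man7.org/linux/man-pages/man2/write.2.html). The first argument of `write` is an fd, and `2` is used here. As the [reference article](http://codewiki.wikidot.com/c:system-calls:write) explains, 0 is `STDIN` (standard input, the keyboard), 1 is `STDOUT` (standard output, the terminal window), and 2 is `STDERR` (standard error, which also goes to the terminal)
    * Once the data has been sent to the server in full (`ec->offs == outbufsize`), the event is switched to `EPOLLIN` (readable) to wait for data from the server
* The server sends data to the client:
    * When the event state is `EPOLLIN`, [`recv()`](https://man7.org/linux/man-pages/man2/recv.2.html) reads the data sent by the server from the socket fd and stores it in the buffer `inbuf`.
* Closing the connection between client and server:
    * When all data has been handled (that is, `recv()` returns `ret = 0`), [`close()`](https://man7.org/linux/man-pages/man2/close.2.html) closes the client's fd (otherwise it keeps holding resources). Note that both the epoll instance and each socket need a matching `close()` for their fd, yet `htstress.c` contains no `close()` on the epoll fd.

    > [epoll_create(2)](https://man7.org/linux/man-pages/man2/epoll_create.2.html) says:
    > When no longer required, the file descriptor returned by epoll_create() should be closed by using close(2). When all file descriptors referring to an epoll instance have been closed, the kernel destroys the instance and releases the associated resources for reuse.
    > (The fd returned by `epoll_create()` should be closed with `close()`; the kernel destroys the epoll instance and releases its resources once every fd referring to that instance has been closed)

```c
// the client sends the request; check that the event is writable
if (evts[n].events & EPOLLOUT) {
    ret = send(ec->fd, outbuf + ec->offs, outbufsize - ec->offs, 0);
    ...
    // in request-debug mode, dump the outgoing data to stderr (fd 2)
    if (debug & HTTP_REQUEST_DEBUG)
        write(2, outbuf + ec->offs, outbufsize - ec->offs);
    ...
    /* write done? schedule read */
    if (ec->offs == outbufsize) {
        evts[n].events = EPOLLIN;
        evts[n].data.ptr = ec;
        ...
// the event is readable
if (evts[n].events & EPOLLIN) {
    ...
    // receive the data sent by the server
    ret = recv(ec->fd, inbuf, sizeof(inbuf), 0);
    ...
    // everything has been handled
    if (!ret) {
        // close the socket
        close(ec->fd);
        ...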
}
```

### Kernel API
Many Linux device drivers and subsystems use kernel threads (`kthread` for short) to provide a specific service in the background while waiting for specific events. While waiting, the kthread sleeps; when the events occur, the kthread is woken up to carry out time-consuming work, which keeps the main thread from being blocked. Usage example: [kernel-threads.c](https://github.com/muratdemirtas/Linux-Kernel-Examples/blob/master/kernel-threads.c)

The definition of the `kthread_run` macro in Linux v5.5, from [include/linux/kthread.h](https://elixir.bootlin.com/linux/v5.5/source/include/linux/kthread.h#L43):
```cpp
#define kthread_run(threadfn, data, namefmt, ...)                          \
({                                                                         \
    struct task_struct *__k                                                \
        = kthread_create(threadfn, data, namefmt, ## __VA_ARGS__);         \
    if (!IS_ERR(__k))                                                      \
        wake_up_process(__k);                                              \
    __k;                                                                   \
})
```

As shown, when `kthread_create` succeeds the macro immediately calls `wake_up_process`. The value of the macro is a pointer to the `task_struct` (or the error pointer when creation fails).

The command below lists the kthreads on the system:
```shell
$ ps -ef
```

Expected output:
```
root         2     0  0 Feb17 ?        00:00:01 [kthreadd]
```

Every task whose PPID is `2` is a kthread, and `$ ps auxf` shows the tree structure.

Reference output:
```
 0:01 /usr/sbin/sshd -D
 0:00  \_ sshd: jserv [priv]
 0:05  |   \_ sshd: jserv@pts/11
 0:03  |       \_ -bash
 0:00  |           \_ ps auxf
 0:00  |           \_ less
```

To find the khttpd kernel module that was just loaded:
```shell
$ ps -ef | grep khttpd
```

Expected output:
```
root     18147     2  0 14:13 ?        00:00:00 [khttpd]
```