2022q1 seHTTPd 改進

# 2022q1 seHTTPd 改進 contributed by < `haoyu0970624763` > > [作業說明](https://hackmd.io/@sysprog/linux2022-sehttpd) > [GitHub](https://github.com/haoyu0970624763/sehttpd) ### 壓力測試 * [apache bench](https://httpd.apache.org/docs/current/programs/ab.html) * 用法: `ab -n 100000 -c 500 -k http://127.0.0.1:8081/` * 無法有效反映出多執行緒的特性 * [htstress](https://github.com/sysprog21/khttpd/blob/master/htstress.c) * 用法: * 編譯: `gcc -std=gnu11 -Wall -Wextra -Werror -o htstress htstress.c -lpthread` * 使用: `./htstress -n 10000 -c 100 -t 8 localhost:8081` * 使用 epoll 觸發、可反映多執行緒的特性 * [wrk](https://github.com/wg/wrk) * 只支援 HTTP 通訊協定的連線請求（如 get, post 等），若要執行 get 之外的 HTTP 請求，需要使用者自行撰寫 lua 腳本 * 尚有 [httperf](https://github.com/httperf/httperf) , [weighttp](https://github.com/lighttpd/weighttp) 等壓力測試工具等 ### 使用 computed goto 改寫 switch-case 在 [Computed goto for efficient dispatch tables](https://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables) 一文中提及, goto 在執行效率上比 switch-case 要更好，主要可以從 2 個層面探討: * switch-case 會做 bounds checking , 檢查是否有 default case 的狀況，即使程式中的 switch 沒有寫到 default case，編譯期間仍會產生強制檢查的程式，這是為了符合 C99 的標準 > If no converted case constant expression matches and there is no default label, no part of the switch body is executed. * 另一部份則是 branch prediction branch prediction 的部份可參考老師之前撰寫的 [switch-背後的-goto-和實作考量](https://hackmd.io/@sysprog/c-control-flow?type=view#switch-%E8%83%8C%E5%BE%8C%E7%9A%84-goto-%E5%92%8C%E5%AF%A6%E4%BD%9C%E8%80%83%E9%87%8F) 我們透過 [Labels as Values](https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html)，在 http_parser.c 的兩個函式中加入 conditions 作為 label address table 讓 state 進行查表，並透過 goto 跳躍到對應的程式碼。以 `http_parse_request_body` conditions 為範例 ``` const void *conditions[] = { &&c_start, &&c_key, &&c_spaces_before_colon, &&c_spaces_after_colon, &&c_value, &&c_cr, &&c_crlf, &&c_crlfcr, }; ``` 接著用 goto 取代 switch ``` goto *conditions[state]; ``` 我用 perf 觀察原本的程式碼以及使用 computed goto 改寫後的效率 switch 版本的觀測紀錄如下 ``` $ sudo perf stat ./sehttpd -p 8081 $ ./htstress -n 100000 -c 1 -t 4 http:/127.0.0.1:8081/ requests: 100000 good requests: 100000 [100%] bad requests: 0 [0%] socket errors: 0 [0%] seconds: 4.626 requests/sec: 21617.794 Performance counter stats for './sehttpd -p 8081': 4,551.69 msec task-clock # 0.498 CPUs utilized 458 context-switches # 100.622 /sec 36 cpu-migrations # 7.909 /sec 100,070 page-faults # 21.985 K/sec 14,861,152,979 cycles # 3.265 GHz 10,629,920,230 instructions # 0.72 insn per cycle 2,043,158,393 branches # 448.879 M/sec 25,004,139 branch-misses # 1.22% of all branches ``` goto 版本的觀測紀錄如下 ``` $ sudo perf stat ./sehttpd -p 8081 $ ./htstress -n 100000 -c 1 -t 4 http://127.0.0.1:8081/ requests: 100000 good requests: 100000 [100%] bad requests: 0 [0%] socket errors: 0 [0%] seconds: 4.150 requests/sec: 24094.580 Performance counter stats for './sehttpd -p 8081': 4,093.64 msec task-clock # 0.207 CPUs utilized 359 context-switches # 87.697 /sec 34 cpu-migrations # 8.306 /sec 100,071 page-faults # 24.445 K/sec 13,730,441,736 cycles # 3.354 GHz 10,613,940,571 instructions # 0.77 insn per cycle 2,038,720,358 branches # 498.021 M/sec 24,397,318 branch-misses # 1.20% of all branches ``` 可以看出在 computed goto 版本中的執行的 cycles 少於 switch 版本的且 branch miss rate 也略少於 switch 版本在改寫的過程中有參照 [oscarshiang](https://hackmd.io/@oscarshiang/linux_sehttpd#2020q1-Homework6-sehttpd) 的共筆，不過在測量的過程中我沒辦法觀測出 branch miss rate 有明顯的改善，我更傾向在這個實驗中 goto 之所以效能更好的原因在於減少了`bounds checking` #### 程式碼更動更改 buffer size 從 8124 更改為 8192 , 變成 2 的冪 , 我們就可以透過跟 (MAX_BUF-1) 做 bitwise-AND , 可取得 0~8191的範圍來取代 % 的運算 , 因為 & 運算比 % 來的更有效率原本 ```c p = (uint8_t *) &r->buf[pi % MAX_BUF]; ``` 更改成 ```c p = (uint8_t *) &r->buf[pi & (MAX_BUF - 1)]; ``` :::warning 可將 `pi % MAX_BUF` 改為 `pi & (MAX_BUF - 1)`，當 `MAX_BUF` 為 2 的冪詳見: [你所不知道的 C 語言: bitwise 操作](https://hackmd.io/@sysprog/c-bitwise) :notes: jserv ::: ### epoll 工作模式節錄自 [epoll](https://man7.org/linux/man-pages/man7/epoll.7.html) 中 >monitoring multiple file descriptors to see if I/O is possible on any of them epoll 用來監聽 I/O 事件是否準備好了 , 而 epoll 又可分成 * Level-triggered * 當 file descriptors 準備就緒時 , kernel 透過 epoll 發送通知告知我們已就緒,接著就可以進行IO操作, 當我們長時間沒有進行操作時 , kernel 會繼續發送通知給我們 * socket 在 readable/writable 狀態時，epoll_wait會返回該socket * Edge-triggered * 當 file descriptors 從還沒準備好轉變成準備好時,kernel 通過 epoll 發送通知告知我們,並且不再發送第二次通知以 [test_epoll_lt_and_et](https://github.com/Manistein/test_epoll_lt_and_et) 舉例說明可能會講的更清楚在客户端输入aabbcc，客户端發送包含 aabbcc 内容的數據包到服務端，在服務端一次可讀取2個字元的情況下 * lt 模式下, 服務端先讀取 aa 並輸出 , epoll_wait 可返回 socket 繼續讀取資料 , 讀取 bb並輸出 , 依此類推 * et 模式下, 服務端先讀取 aa 並輸出 , epoll_wait 只有等到新的數據到達時才會被喚醒 , bbcc 會遺留在服務端的 buffer 我們可以看出 1. lt 模式下的操作更為直覺與簡單 2. et 模式下可能隱含較多的 bug 此外, et 模式的效能會比 lt 來的更好 , 原因在於當 kernel 發送通知後 , 會把該事件從 ready list 中刪除 , 而 lt 模式則不會從 ready list 中刪除該事件 , 當 list中有越來越多事件時 , epoll 下次要通知用戶時 , 搜尋 list 的時間會增長兩種模式各有其優缺點 , 在高效的 web server 中 , 我認為採用 edge-triggered 的方式可能會更合適 ### timer web server 需要同時間處理多項請求 , 要優先處理哪個請求就變成一個重要的課題 , 而常見的作法之一是先處理最快到期的請求在 [mainloop.c](https://github.com/haoyu0970624763/sehttpd/blob/master/src/mainloop.c) 中 server 接收請求後會先呼叫 [timer.c](https://github.com/haoyu0970624763/sehttpd/blob/master/src/timer.c) 中的`handle_expired_timers` 函式處理已經超出期限但還沒處理的請求 , 接著會呼叫 `add_timer` 函式將新的請求加入 queue 中而 `add_timer` 中根據 `node->key = current_msec + timeout` (`current_msec` 表示當前的時間 , `timeout` 代表處理請求的最大延遲時間),顯而易見的可以發現 node->key 值代表的是此請求處理的期限我們想取得最快到期的請求就可以透過 priority queue(MIN_HEAP)的方式獲取 , 也就是 `timer.c` 中的 `prio_queue_min` 函式

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.