# [2021 年暑期 Linux 核心](https://hackmd.io/@sysprog/linux2021-summer/) 第 4 週測驗題 ###### tags: `linux2021` :::info 目的: 檢驗學員對 ==[Atomics 操作](https://hackmd.io/@sysprog/concurrency-atomics)==, ==[Lock-Free Programming](https://hackmd.io/@sysprog/concurrency-lockfree)==, ==[事件驅動伺服器:原理和實例](https://hackmd.io/@sysprog/event-driven-server)==, ==[The Linux Kernel Module Programming Guide](https://sysprog21.github.io/lkmpg/)== 的認知 ::: ==[作答表單](https://docs.google.com/forms/d/e/1FAIpQLSfNPZf1s08LbrofIsvA7QPWPAVPjQ2Psfd4U9B8UuZ4GkJz3w/viewform)== ### 測驗 `1` 給定 `vpoll` 核心模組可產生/合成 ([synthesize](https://www.dictionary.com/browse/synthesize)) select / poll / epoll 的事件,一旦 `vpoll` 掛載進 Linux 核心後,將可藉由 `ioctl` 接受來自使用者層級的命令要求。以下是參考的測試程式碼: (檔名: `user.c`) ```cpp #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <sys/epoll.h> #include <sys/ioctl.h> #include <unistd.h> #define handle_error(msg) \ do { \ perror(msg); \ exit(EXIT_FAILURE); \ } while (0) #define VPOLL_IOC_MAGIC '^' #define VPOLL_IO_SETEVENTS _IO(VPOLL_IOC_MAGIC, 3) #define VPOLL_IO_DELEVENTS _IO(VPOLL_IOC_MAGIC, 2) #define VPOLL_IO_ADDEVENTS _IO(VPOLL_IOC_MAGIC, 1) int main(int argc, char *argv[]) { struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | EPOLLERR | EPOLLOUT | EPOLLHUP | EPOLLPRI, .data.u64 = 0, }; int efd = open("/dev/vpoll", O_RDWR | O_CLOEXEC); if (efd == -1) handle_error("/dev/vpoll"); int epollfd = epoll_create1(EPOLL_CLOEXEC); if (efd == -1) handle_error("epoll_create1"); if (epoll_ctl(epollfd, EPOLL_CTL_ADD, efd, &ev) == -1) handle_error("epoll_ctl"); switch (fork()) { case 0: sleep(1); ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLIN); sleep(1); ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLIN); sleep(1); ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLIN | EPOLLPRI); sleep(1); ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLPRI); sleep(1); ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLOUT); sleep(1); ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLHUP); exit(EXIT_SUCCESS); default: while (1) { int nfds = epoll_wait(epollfd, &ev, 1, 1000); if (nfds < 0) handle_error("epoll_wait"); else if (nfds == 0) printf("timeout...\n"); else { printf("GOT event %x\n", ev.events); ioctl(efd, VPOLL_IO_DELEVENTS, ev.events); if (ev.events & EPOLLHUP) break; } } break; case -1: /* should not happen */ handle_error("fork"); } close(epollfd); close(efd); return 0; } ``` 參考執行輸出: ``` timeout... GOT event 1 timeout... GOT event 1 timeout... GOT event 3 timeout... GOT event 2 timeout... GOT event 4 timeout... GOT event ZZZ ``` > 最後一行 `GOT event ` 訊息不完整,留給學員推測 > `ZZZ` 為 16 進位數值 `vpoll` 原始程式碼可參見: [gist](https://gist.github.com/jserv/d4df5902686333198ec2c033b8c19af9) (部分程式碼隱蔽) `MMM` 和 `NNN` 為數值,`WWW` 為 Linux 核心 API 函式名稱,後者可參見 [memory-barriers.txt](https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt),以下摘錄: > `wake_up_process()` always executes a general memory barrier. The barrier again occurs before the task state is accessed. In particular, if the wake_up() in the previous snippet were replaced by a call to `wake_up_process()` then one of the two loads would be guaranteed to see 1. > > The available waker functions include: > * `complete()`; > * `wake_up()`; > * `wake_up_all()`; > * `wake_up_bit()`; > * `wake_up_interruptible()`; > * `wake_up_interruptible_all()`; > * `wake_up_interruptible_nr()`; > * `wake_up_interruptible_poll()`; > * `wake_up_interruptible_sync()`; > * `wake_up_interruptible_sync_poll()`; > * `wake_up_locked()`; > * `wake_up_locked_poll()`; > * `wake_up_nr()`; > * `wake_up_poll()`; > * `wake_up_process()`; > > In terms of memory ordering, these functions all provide the same guarantees of a `wake_up()` (or stronger). 搭配對照 [include/linux/wait.h](https://github.com/torvalds/linux/blob/master/include/linux/wait.h) 和 [include/uapi/linux/eventpoll.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/eventpoll.h) 請補完程式碼,使得執行符合預期。 ==作答區== MMM = ? NNN = ? WWW = ? ZZZ = ? :::success 延伸問題: 1. 解釋上述程式碼運作原理,並探討對應的 memory order 議題 2. 修改 `vpoll` 核心模組,實作效能評比的特別模式,從而分析 `epoll` 效能 * 參閱 [Epoll Kernel Performance Improvements](https://events19.linuxfoundation.org/wp-content/uploads/2018/07/dbueso-oss-japan19.pdf) 和 [linux-ipc-benchmarks](https://github.com/kamalmarhubi/linux-ipc-benchmarks) ::: --- ### 測驗 `2` 考慮 `lfring` 是個 lock-free ring buffer 實作,並支援 multiple-producer/multiple-consumer (MPMC) 的情境。測試程式的參考輸出: ```shell $ ./lfring testing MPMC lock-free ring testing MPSC lock-free ring testing SPMC lock-free ring testing SPSC lock-free ring ``` 執行過程不會觸發任何 `assert` 失敗。 `lfring` 目前只支援 x86-64 架構,可在 Linux 和 macOS 執行,程式碼可見 [gist](https://gist.github.com/jserv/f810c45ad4423f406f9e0dbe9dabadc9) (部分程式碼隱蔽) 請補完程式碼,使得執行符合預期。 延伸閱讀: * [An introduction to lockless algorithms](https://lwn.net/Articles/844224/) * [Lockless algorithms for mere mortals](https://lwn.net/Articles/827180/) * [Lockless patterns: an introduction to compare-and-swap](https://lwn.net/Articles/847973/) ==作答區== DDD = ? KKK = ? TTT = ? HHH = ? :::success 延伸問題: 1. 解釋上述程式碼運作原理,搭配閱讀 [DPDK: Ring Library](https://doc.dpdk.org/guides/prog_guide/ring_lib.html) 解說,儘量引入圖解和強調 MPMC 的考量 2. 提出改進策略並著手實作 3. 研讀 Linux 核心 [kfifo](https://www.kernel.org/doc/htmldocs/kernel-api/kfifo.html) 文件,搭配 [kfifo-examples](https://github.com/sysprog21/kfifo-examples) 測試,以 git 取得 [linux](https://github.com/torvalds/linux) 程式碼工作副本,觀察 `git log include/linux/kfifo.h lib/kfifo.c` 並觀察修改記錄 * 留意 `spin_unlock_irqrestore` 的使用 * 解釋 commit 7e10505 的 race condition 成因 4. 將 `lfring` 移植到 Linux 核心,並提供對應的測試及效能評比程式 :::