# [2021 年暑期 Linux 核心](https://hackmd.io/@sysprog/linux2021-summer/) 第 4 週測驗題
###### tags: `linux2021`
:::info
目的: 檢驗學員對 ==[Atomics 操作](https://hackmd.io/@sysprog/concurrency-atomics)==, ==[Lock-Free Programming](https://hackmd.io/@sysprog/concurrency-lockfree)==, ==[事件驅動伺服器:原理和實例](https://hackmd.io/@sysprog/event-driven-server)==, ==[The Linux Kernel Module Programming Guide](https://sysprog21.github.io/lkmpg/)== 的認知
:::
==[作答表單](https://docs.google.com/forms/d/e/1FAIpQLSfNPZf1s08LbrofIsvA7QPWPAVPjQ2Psfd4U9B8UuZ4GkJz3w/viewform)==
### 測驗 `1`
給定 `vpoll` 核心模組可產生/合成 ([synthesize](https://www.dictionary.com/browse/synthesize)) select / poll / epoll 的事件,一旦 `vpoll` 掛載進 Linux 核心後,將可藉由 `ioctl` 接受來自使用者層級的命令要求。以下是參考的測試程式碼: (檔名: `user.c`)
```cpp
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>
#include <unistd.h>
#define handle_error(msg) \
do { \
perror(msg); \
exit(EXIT_FAILURE); \
} while (0)
#define VPOLL_IOC_MAGIC '^'
#define VPOLL_IO_SETEVENTS _IO(VPOLL_IOC_MAGIC, 3)
#define VPOLL_IO_DELEVENTS _IO(VPOLL_IOC_MAGIC, 2)
#define VPOLL_IO_ADDEVENTS _IO(VPOLL_IOC_MAGIC, 1)
int main(int argc, char *argv[])
{
struct epoll_event ev = {
.events =
EPOLLIN | EPOLLRDHUP | EPOLLERR | EPOLLOUT | EPOLLHUP | EPOLLPRI,
.data.u64 = 0,
};
int efd = open("/dev/vpoll", O_RDWR | O_CLOEXEC);
if (efd == -1)
handle_error("/dev/vpoll");
int epollfd = epoll_create1(EPOLL_CLOEXEC);
if (efd == -1)
handle_error("epoll_create1");
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, efd, &ev) == -1)
handle_error("epoll_ctl");
switch (fork()) {
case 0:
sleep(1);
ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLIN);
sleep(1);
ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLIN);
sleep(1);
ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLIN | EPOLLPRI);
sleep(1);
ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLPRI);
sleep(1);
ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLOUT);
sleep(1);
ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLHUP);
exit(EXIT_SUCCESS);
default:
while (1) {
int nfds = epoll_wait(epollfd, &ev, 1, 1000);
if (nfds < 0)
handle_error("epoll_wait");
else if (nfds == 0)
printf("timeout...\n");
else {
printf("GOT event %x\n", ev.events);
ioctl(efd, VPOLL_IO_DELEVENTS, ev.events);
if (ev.events & EPOLLHUP)
break;
}
}
break;
case -1: /* should not happen */
handle_error("fork");
}
close(epollfd);
close(efd);
return 0;
}
```
參考執行輸出:
```
timeout...
GOT event 1
timeout...
GOT event 1
timeout...
GOT event 3
timeout...
GOT event 2
timeout...
GOT event 4
timeout...
GOT event ZZZ
```
> 最後一行 `GOT event ` 訊息不完整,留給學員推測
> `ZZZ` 為 16 進位數值
`vpoll` 原始程式碼可參見: [gist](https://gist.github.com/jserv/d4df5902686333198ec2c033b8c19af9) (部分程式碼隱蔽)
`MMM` 和 `NNN` 為數值,`WWW` 為 Linux 核心 API 函式名稱,後者可參見 [memory-barriers.txt](https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt),以下摘錄:
> `wake_up_process()` always executes a general memory barrier. The barrier again occurs before the task state is accessed. In particular, if the wake_up() in the previous snippet were replaced by a call to `wake_up_process()` then one of
the two loads would be guaranteed to see 1.
>
> The available waker functions include:
> * `complete()`;
> * `wake_up()`;
> * `wake_up_all()`;
> * `wake_up_bit()`;
> * `wake_up_interruptible()`;
> * `wake_up_interruptible_all()`;
> * `wake_up_interruptible_nr()`;
> * `wake_up_interruptible_poll()`;
> * `wake_up_interruptible_sync()`;
> * `wake_up_interruptible_sync_poll()`;
> * `wake_up_locked()`;
> * `wake_up_locked_poll()`;
> * `wake_up_nr()`;
> * `wake_up_poll()`;
> * `wake_up_process()`;
>
> In terms of memory ordering, these functions all provide the same guarantees of a `wake_up()` (or stronger).
搭配對照 [include/linux/wait.h](https://github.com/torvalds/linux/blob/master/include/linux/wait.h) 和 [include/uapi/linux/eventpoll.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/eventpoll.h)
請補完程式碼,使得執行符合預期。
==作答區==
MMM = ?
NNN = ?
WWW = ?
ZZZ = ?
:::success
延伸問題:
1. 解釋上述程式碼運作原理,並探討對應的 memory order 議題
2. 修改 `vpoll` 核心模組,實作效能評比的特別模式,從而分析 `epoll` 效能
* 參閱 [Epoll Kernel Performance Improvements](https://events19.linuxfoundation.org/wp-content/uploads/2018/07/dbueso-oss-japan19.pdf) 和 [linux-ipc-benchmarks](https://github.com/kamalmarhubi/linux-ipc-benchmarks)
:::
---
### 測驗 `2`
考慮 `lfring` 是個 lock-free ring buffer 實作,並支援 multiple-producer/multiple-consumer (MPMC) 的情境。測試程式的參考輸出:
```shell
$ ./lfring
testing MPMC lock-free ring
testing MPSC lock-free ring
testing SPMC lock-free ring
testing SPSC lock-free ring
```
執行過程不會觸發任何 `assert` 失敗。
`lfring` 目前只支援 x86-64 架構,可在 Linux 和 macOS 執行,程式碼可見 [gist](https://gist.github.com/jserv/f810c45ad4423f406f9e0dbe9dabadc9) (部分程式碼隱蔽)
請補完程式碼,使得執行符合預期。
延伸閱讀:
* [An introduction to lockless algorithms](https://lwn.net/Articles/844224/)
* [Lockless algorithms for mere mortals](https://lwn.net/Articles/827180/)
* [Lockless patterns: an introduction to compare-and-swap](https://lwn.net/Articles/847973/)
==作答區==
DDD = ?
KKK = ?
TTT = ?
HHH = ?
:::success
延伸問題:
1. 解釋上述程式碼運作原理,搭配閱讀 [DPDK: Ring Library](https://doc.dpdk.org/guides/prog_guide/ring_lib.html) 解說,儘量引入圖解和強調 MPMC 的考量
2. 提出改進策略並著手實作
3. 研讀 Linux 核心 [kfifo](https://www.kernel.org/doc/htmldocs/kernel-api/kfifo.html) 文件,搭配 [kfifo-examples](https://github.com/sysprog21/kfifo-examples) 測試,以 git 取得 [linux](https://github.com/torvalds/linux) 程式碼工作副本,觀察 `git log include/linux/kfifo.h lib/kfifo.c` 並觀察修改記錄
* 留意 `spin_unlock_irqrestore` 的使用
* 解釋 commit 7e10505 的 race condition 成因
4. 將 `lfring` 移植到 Linux 核心,並提供對應的測試及效能評比程式
:::