Try   HackMD

2021 年暑期 Linux 核心 第 4 週測驗題

tags: linux2021

作答表單

測驗 1

給定 vpoll 核心模組可產生/合成 (synthesize) select / poll / epoll 的事件,一旦 vpoll 掛載進 Linux 核心後,將可藉由 ioctl 接受來自使用者層級的命令要求。以下是參考的測試程式碼: (檔名: user.c)

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define handle_error(msg)   \
    do {                    \
        perror(msg);        \
        exit(EXIT_FAILURE); \
    } while (0)

#define VPOLL_IOC_MAGIC '^'
#define VPOLL_IO_SETEVENTS _IO(VPOLL_IOC_MAGIC, 3)
#define VPOLL_IO_DELEVENTS _IO(VPOLL_IOC_MAGIC, 2)
#define VPOLL_IO_ADDEVENTS _IO(VPOLL_IOC_MAGIC, 1)

int main(int argc, char *argv[])
{
    struct epoll_event ev = {
        .events =
            EPOLLIN | EPOLLRDHUP | EPOLLERR | EPOLLOUT | EPOLLHUP | EPOLLPRI,
        .data.u64 = 0,
    };

    int efd = open("/dev/vpoll", O_RDWR | O_CLOEXEC);
    if (efd == -1)
        handle_error("/dev/vpoll");
    int epollfd = epoll_create1(EPOLL_CLOEXEC);
    if (efd == -1)
        handle_error("epoll_create1");
    if (epoll_ctl(epollfd, EPOLL_CTL_ADD, efd, &ev) == -1)
        handle_error("epoll_ctl");

    switch (fork()) {
    case 0:
        sleep(1);
        ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLIN);
        sleep(1);
        ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLIN);
        sleep(1);
        ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLIN | EPOLLPRI);
        sleep(1);
        ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLPRI);
        sleep(1);
        ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLOUT);
        sleep(1);
        ioctl(efd, VPOLL_IO_ADDEVENTS, EPOLLHUP);
        exit(EXIT_SUCCESS);
    default:
        while (1) {
            int nfds = epoll_wait(epollfd, &ev, 1, 1000);
            if (nfds < 0)
                handle_error("epoll_wait");
            else if (nfds == 0)
                printf("timeout...\n");
            else {
                printf("GOT event %x\n", ev.events);
                ioctl(efd, VPOLL_IO_DELEVENTS, ev.events);
                if (ev.events & EPOLLHUP)
                    break;
            }
        }
        break;
    case -1: /* should not happen */
        handle_error("fork");
    }

    close(epollfd);
    close(efd);
    return 0;
}

參考執行輸出:

timeout...
GOT event 1
timeout...
GOT event 1
timeout...
GOT event 3
timeout...
GOT event 2
timeout...
GOT event 4
timeout...
GOT event ZZZ

最後一行 GOT event 訊息不完整,留給學員推測
ZZZ 為 16 進位數值

vpoll 原始程式碼可參見: gist (部分程式碼隱蔽)

MMMNNN 為數值,WWW 為 Linux 核心 API 函式名稱,後者可參見 memory-barriers.txt,以下摘錄:

wake_up_process() always executes a general memory barrier. The barrier again occurs before the task state is accessed. In particular, if the wake_up() in the previous snippet were replaced by a call to wake_up_process() then one of
the two loads would be guaranteed to see 1.

The available waker functions include:

  • complete();
  • wake_up();
  • wake_up_all();
  • wake_up_bit();
  • wake_up_interruptible();
  • wake_up_interruptible_all();
  • wake_up_interruptible_nr();
  • wake_up_interruptible_poll();
  • wake_up_interruptible_sync();
  • wake_up_interruptible_sync_poll();
  • wake_up_locked();
  • wake_up_locked_poll();
  • wake_up_nr();
  • wake_up_poll();
  • wake_up_process();

In terms of memory ordering, these functions all provide the same guarantees of a wake_up() (or stronger).

搭配對照 include/linux/wait.hinclude/uapi/linux/eventpoll.h

請補完程式碼,使得執行符合預期。

作答區

MMM = ?
NNN = ?
WWW = ?
ZZZ = ?

延伸問題:

  1. 解釋上述程式碼運作原理,並探討對應的 memory order 議題
  2. 修改 vpoll 核心模組,實作效能評比的特別模式,從而分析 epoll 效能

測驗 2

考慮 lfring 是個 lock-free ring buffer 實作,並支援 multiple-producer/multiple-consumer (MPMC) 的情境。測試程式的參考輸出:

$ ./lfring 
testing MPMC lock-free ring
testing MPSC lock-free ring
testing SPMC lock-free ring
testing SPSC lock-free ring

執行過程不會觸發任何 assert 失敗。

lfring 目前只支援 x86-64 架構,可在 Linux 和 macOS 執行,程式碼可見 gist (部分程式碼隱蔽)

請補完程式碼,使得執行符合預期。

延伸閱讀:

作答區

DDD = ?
KKK = ?
TTT = ?
HHH = ?

延伸問題:

  1. 解釋上述程式碼運作原理,搭配閱讀 DPDK: Ring Library 解說,儘量引入圖解和強調 MPMC 的考量
  2. 提出改進策略並著手實作
  3. 研讀 Linux 核心 kfifo 文件,搭配 kfifo-examples 測試,以 git 取得 linux 程式碼工作副本,觀察 git log include/linux/kfifo.h lib/kfifo.c 並觀察修改記錄
    • 留意 spin_unlock_irqrestore 的使用
    • 解釋 commit 7e10505 的 race condition 成因
  4. lfring 移植到 Linux 核心,並提供對應的測試及效能評比程式