Try  HackMD Logo HackMD

2021 年暑期 Linux 核心 第 5 週測驗題

目的: 檢驗學員對 Atomics 操作, Lock-Free Programming, The Linux Kernel Module Programming Guide, 《Linux Kernel Scheduler Internals》的認知

測驗 1

ptrace 系統呼叫允許一個追蹤者 (tracer) 行程得以觀察或控制另外一個受追蹤者 (tracee) 行程的內容(如記憶體或暫存器)或執行流程。這主要用來實作除錯器 (debugger) 與系統呼叫的追蹤。使用這個系統呼叫之後,被追蹤者必須依附 (attach) 於追蹤者行程。該 attach 行為是以執行緒為單位,也就是說,在一個具備多個執行緒的行程中,其中每個執行緒都可個別 attach 在不同的追蹤者行程,或在其他執行緒被追蹤時,維持原本的執行流程。

延伸閱讀:

dont_trace 這個 Linux 核心模組利用 task_struct 內部成員 ptrace 來偵測給定的行程是否被除錯器或其他使用 ptrace 系統呼叫的程式所 attach,一旦偵測到其他行程嘗試追蹤該行程,dont_trace 就會主動 kill 追蹤者行程。流程:

  • Once any process starts "ptracing" another, the tracee gets added into ptraced that's located in task_struct, which is simply a linked list that contains all the tracees that the process is "ptracing".
  • Once a tracer is found, the module lists all its tracees and sends a SIGKILL signal to each of them including the tracer. This results in killing both the tracer and its tracees.
  • Once the module is attached to the kernel, the module's "core" function will run periodically through the advantage of workqueues. Specifically, the module runs every JIFFIES_DELAY, which is set to 1. That is, the module will run every one jiffy. This can be changed through modifying the macro JIFFIES_DELAY defined in the module.

執行以下命令可得知目前 Linux 核心的 HZ 設定:

$ getconf CLK_TCK

例如輸出 100 就表示 100 HZ,亦即週期為

0.01 秒。

GDB 使用示範:

$ yes >/dev/null &
$ sudo gdb -q --pid=`pidof yes`

預期可見類似以下輸出:

Attaching to process 15718
Reading symbols from /usr/bin/yes...
(No debugging symbols found in /usr/bin/yes)
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
0x00007f1113ebf1e7 in __GI___libc_write (fd=1, buf=0x563b5271c0f0, nbytes=8192) at ../sysdeps/unix/sysv/linux/write.c:26

可在 GDB 執行以下命令,要求 yes 命令不再執行: ((gdb) 開頭表示 GDB 命令提示字元,不用輸入 (gdb) 字樣,只要輸入 signal 開始的 GDB 命令)

(gdb) signal SIGINT

預期可見以下輸出:

Continuing with signal SIGINT.

Program terminated with signal SIGINT, Interrupt.
The program no longer exists.

一旦 dont_trace 掛載到 Linux 核心,重複上述操作時,會遇到截然不同的狀況:

$ sudo gdb -q --pid=`pidof yes`
Attaching to process 15736
Reading symbols from /usr/bin/yes...
[1]+  Killed                  yes > /dev/null
Killed

也就是 GDB 無法去追蹤給定的 yes 行程,除非執行 sudo rmmod dont_trace 命令,GDB 一類使用 ptrace 系統呼叫的程式才可正確運作。

以下是 dont_trace.c 程式碼列表: (遮蔽部分程式碼)

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/list.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/sched/signal.h>
#include <linux/workqueue.h>

MODULE_AUTHOR("National Cheng Kung University, Taiwan");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_DESCRIPTION("A kernel module that kills ptrace tracer and its tracees");

#define JIFFIES_DELAY 1
#define DONT_TRACE_WQ_NAME "dont_trace_worker"

static void periodic_routine(struct work_struct *);
static DECLARE_DELAYED_WORK(dont_trace_task, periodic_routine);
static struct workqueue_struct *wq;
static bool loaded;

/* Send SIGKILL from kernel space */
static void kill_task(struct task_struct *task)
{
    send_sig(SIGKILL, task, 1);
}

/* @return true if the process has tracees */
static bool is_tracer(struct list_head *children)
{
    struct list_head *list;
    list_for_each (list, children) {
        struct task_struct *task =
            list_entry(list, struct task_struct, ptrace_entry);
        if (task)
            return true;
    }
    return false;
}

/* Traverse the element in the linked list of the ptraced proccesses and
 * finally kills them.
 */
static void kill_tracee(struct list_head *children)
{
    struct list_head *list;
    list_for_each (list, children) {
        struct task_struct *task_ptraced =
            list_entry(list, struct task_struct, ptrace_entry);
        pr_info("ptracee -> comm: %s, pid: %d, gid: %d, ptrace: %d\n",
                task_ptraced->comm, task_ptraced->pid, task_ptraced->tgid,
                task_ptraced->ptrace);
        kill_task(task_ptraced);
    }
}

static void check(void)
{
    struct task_struct *task;
    for_each_process (task) {
        if (!is_tracer(&task->ptraced))
            continue;

        kill_tracee(&task->ptraced);
        kill_task(task); /* Kill the tracer once all tracees are killed */
    }
}

static void periodic_routine(struct work_struct *ws)
{
    if (likely(loaded))
        check();
    PPPP;
}

static int __init dont_trace_init(void)
{
    wq = create_workqueue(DONT_TRACE_WQ_NAME);
    queue_delayed_work(wq, &dont_trace_task, JIFFIES_DELAY);

    loaded = true;
    pr_info("Loaded!\n");
    return 0;
}

static void __exit dont_trace_exit(void)
{
    loaded = false;

    /* No new routines will be queued */
    cancel_delayed_work(&dont_trace_task);

    /* Wait for the completion of all routines */
    flush_workqueue(wq);
    destroy_workqueue(wq);

    pr_info("Unloaded.\n");
}

module_init(dont_trace_init);
module_exit(dont_trace_exit);

對應的 Makefile:

obj-m += dont_trace.o

KERNELDIR ?= /lib/modules/`uname -r`/build
PWD := $(shell pwd)

all:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules

clean:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) clean

請補完程式碼,使得執行符合預期。

作答區

  • PPPP = ?

延伸問題:

  1. 解釋上述程式碼運作原理,應探討 Linux 核心內部 ptrace 系統呼叫和 signal 的實作方式
  2. 擴充第 1 週測驗題的程式碼 (當然是指你擴充過的實作),允許特定的行程得以隱藏或拒絕被追蹤,並且提供對應的 ioctl 介面,讓使用者層級的工具進行操作
  3. 研讀 The race to limit ptrace 一類的材料,探討行程避免被追蹤的手法

測驗 2

我們打算利用 Linux 核心模組,建立一個「後門」,得以讓使用者層級的行程自己建立新的排程策略,使用方式:

$ sudo insmod proc_queue.ko
$ sudo insmod proc_sched.ko time_quantum=5
$ sudo insmod proc_set.ko

其中 proc_sched 實作 round-robin 排程策略,並可接受指定的 time quantum 或 timeslice。

程式碼列表可見: gist (遮蔽部分程式碼)

測試程式碼: (檔名: user/test_thread.c)

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#define N_THREADS 2

void *test_pthread(void *ptr)
{
#ifdef SYS_gettid
    pid_t tid = syscall(SYS_gettid);
#else
#error "SYS_gettid unavailable on this system"
#endif
    FILE *fp = fopen("/proc/process_sched_add", "w");
    fprintf(fp, "%d", tid);
    fclose(fp);

    while (1) {
        printf("TID: %d\n", tid);
        sleep(1);
    }
    pthread_exit(NULL);
}

int main()
{
    pthread_t threads[N_THREADS];
    for (long t = 0; t < N_THREADS; t++) {
        printf("In main: creating thread %ld\n", t);
        int rc = pthread_create(&threads[t], NULL, test_pthread, NULL);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    /* Last thing that main() should do */
    pthread_exit(NULL);
    return 0;
}

參考執行輸出:

$ user/test_thread
In main: creating thread 0
In main: creating thread 1

[1]+  Stopped
$ fg
TID: 27793
TID: 27794
TID: 27793
TID: 27794
TID: 27793
TID: 27794
...

請補完程式碼,使程式碼執行符合預期。

延伸問題:

  1. 解釋上述程式碼運作原理,並重寫為單一 Linux 核心模組
  2. 原本的程式碼充斥重複的程式碼,請改寫為更精簡的實作
  3. 嘗試將 linux-cfs-sim 移植到上述 Linux 核心模組,也就是重新實作 CFS 演算法,但不更動原本 Linux 核心實作 (即「架空」Linux 核心)

測驗 3

考慮一個 lock-free hash table 實作,程式碼可見 gist

作答區

  • KKKK = ?
  • QQQQ = ?
  • ZZZZ = ?

延伸問題:

  1. 解釋上述程式碼運作原理,並指出其設計和實作的缺失 (如原本只接受 uin32_t 型態的輸入,無法處理字串作為 key)
  2. 研讀 lfht (Lock-free wait-free hash table using C11 atomics) 程式碼及對應的論文 Lock-Free Linked Lists and Skip Lists
  3. 參照 atomic_hash 及其內建的測試程式,用更大規模的來測試