--- tags: linux2021 --- # [2021 年暑期 Linux 核心](https://hackmd.io/@sysprog/linux2021-summer/) 第 5 週測驗題 :::info 目的: 檢驗學員對 ==[Atomics 操作](https://hackmd.io/@sysprog/concurrency-atomics)==, ==[Lock-Free Programming](https://hackmd.io/@sysprog/concurrency-lockfree)==, ==[The Linux Kernel Module Programming Guide](https://sysprog21.github.io/lkmpg/)==, 《Linux Kernel Scheduler Internals》的認知 ::: ### 測驗 `1` [ptrace](https://man7.org/linux/man-pages/man2/ptrace.2.html) 系統呼叫允許一個追蹤者 (tracer) 行程得以觀察或控制另外一個受追蹤者 (tracee) 行程的內容(如記憶體或暫存器)或執行流程。這主要用來實作除錯器 (debugger) 與系統呼叫的追蹤。使用這個系統呼叫之後,被追蹤者必須依附 (attach) 於追蹤者行程。該 attach 行為是以執行緒為單位,也就是說,在一個具備多個執行緒的行程中,其中每個執行緒都可個別 attach 在不同的追蹤者行程,或在其他執行緒被追蹤時,維持原本的執行流程。 > 延伸閱讀: > * [Ptrace, Utrace, Uprobes: Lightweight, Dynamic Tracing of User Apps](https://landley.net/kdocs/ols/2007/ols2007v1-pages-215-224.pdf) > * [進階 GDB](http://www.study-area.org/cyril/opentools/opentools/x1265.html) `dont_trace` 這個 Linux 核心模組利用 `task_struct` 內部成員 [ptrace](https://elixir.bootlin.com/linux/v5.10.60/source/include/linux/sched.h#L669) 來偵測給定的行程是否被除錯器或其他使用 [ptrace](https://man7.org/linux/man-pages/man2/ptrace.2.html) 系統呼叫的程式所 attach,一旦偵測到其他行程嘗試追蹤該行程,`dont_trace` 就會主動 `kill` 追蹤者行程。流程: * Once any process starts "ptracing" another, the [tracee](https://man7.org/linux/man-pages/man2/ptrace.2.html) gets added into [ptraced](https://elixir.bootlin.com/linux/latest/source/include/linux/sched.h#L867) that's located in `task_struct`, which is simply a linked list that contains all the tracees that the process is "ptracing". * Once a tracer is found, the module lists all its tracees and sends a `SIGKILL` signal to each of them including the tracer. This results in killing both the tracer and its tracees. * Once the module is attached to the kernel, the module's "core" function will run periodically through the advantage of workqueues. Specifically, the module runs every `JIFFIES_DELAY`, which is set to 1. That is, the module will run every one [jiffy](https://www.oreilly.com/library/view/linux-device-drivers/9781785280009/4041820a-bbe4-4502-8ef9-d1913e133332.xhtml). This can be changed through modifying the macro `JIFFIES_DELAY` defined in the module. 執行以下命令可得知目前 Linux 核心的 `HZ` 設定: ```shell $ getconf CLK_TCK ``` 例如輸出 `100` 就表示 100 HZ,亦即週期為 $0.01$ 秒。 GDB 使用示範: ```shell $ yes >/dev/null & $ sudo gdb -q --pid=`pidof yes` ``` 預期可見類似以下輸出: ``` Attaching to process 15718 Reading symbols from /usr/bin/yes... (No debugging symbols found in /usr/bin/yes) Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... Reading symbols from /lib64/ld-linux-x86-64.so.2... (No debugging symbols found in /lib64/ld-linux-x86-64.so.2) 0x00007f1113ebf1e7 in __GI___libc_write (fd=1, buf=0x563b5271c0f0, nbytes=8192) at ../sysdeps/unix/sysv/linux/write.c:26 ``` 可在 GDB 執行以下命令,要求 `yes` 命令不再執行: (`(gdb) ` 開頭表示 GDB 命令提示字元,不用輸入 ==`(gdb) `== 字樣,只要輸入 `signal` 開始的 GDB 命令) ``` (gdb) signal SIGINT ``` 預期可見以下輸出: ``` Continuing with signal SIGINT. Program terminated with signal SIGINT, Interrupt. The program no longer exists. ``` 一旦 `dont_trace` 掛載到 Linux 核心,重複上述操作時,會遇到截然不同的狀況: ``` $ sudo gdb -q --pid=`pidof yes` Attaching to process 15736 Reading symbols from /usr/bin/yes... [1]+ Killed yes > /dev/null Killed ``` 也就是 GDB 無法去追蹤給定的 `yes` 行程,除非執行 `sudo rmmod dont_trace` 命令,GDB 一類使用 [ptrace](https://man7.org/linux/man-pages/man2/ptrace.2.html) 系統呼叫的程式才可正確運作。 以下是 `dont_trace.c` 程式碼列表: (遮蔽部分程式碼) ```cpp #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/list.h> #include <linux/module.h> #include <linux/sched.h> #include <linux/sched/signal.h> #include <linux/workqueue.h> MODULE_AUTHOR("National Cheng Kung University, Taiwan"); MODULE_LICENSE("Dual BSD/GPL"); MODULE_DESCRIPTION("A kernel module that kills ptrace tracer and its tracees"); #define JIFFIES_DELAY 1 #define DONT_TRACE_WQ_NAME "dont_trace_worker" static void periodic_routine(struct work_struct *); static DECLARE_DELAYED_WORK(dont_trace_task, periodic_routine); static struct workqueue_struct *wq; static bool loaded; /* Send SIGKILL from kernel space */ static void kill_task(struct task_struct *task) { send_sig(SIGKILL, task, 1); } /* @return true if the process has tracees */ static bool is_tracer(struct list_head *children) { struct list_head *list; list_for_each (list, children) { struct task_struct *task = list_entry(list, struct task_struct, ptrace_entry); if (task) return true; } return false; } /* Traverse the element in the linked list of the ptraced proccesses and * finally kills them. */ static void kill_tracee(struct list_head *children) { struct list_head *list; list_for_each (list, children) { struct task_struct *task_ptraced = list_entry(list, struct task_struct, ptrace_entry); pr_info("ptracee -> comm: %s, pid: %d, gid: %d, ptrace: %d\n", task_ptraced->comm, task_ptraced->pid, task_ptraced->tgid, task_ptraced->ptrace); kill_task(task_ptraced); } } static void check(void) { struct task_struct *task; for_each_process (task) { if (!is_tracer(&task->ptraced)) continue; kill_tracee(&task->ptraced); kill_task(task); /* Kill the tracer once all tracees are killed */ } } static void periodic_routine(struct work_struct *ws) { if (likely(loaded)) check(); PPPP; } static int __init dont_trace_init(void) { wq = create_workqueue(DONT_TRACE_WQ_NAME); queue_delayed_work(wq, &dont_trace_task, JIFFIES_DELAY); loaded = true; pr_info("Loaded!\n"); return 0; } static void __exit dont_trace_exit(void) { loaded = false; /* No new routines will be queued */ cancel_delayed_work(&dont_trace_task); /* Wait for the completion of all routines */ flush_workqueue(wq); destroy_workqueue(wq); pr_info("Unloaded.\n"); } module_init(dont_trace_init); module_exit(dont_trace_exit); ``` 對應的 `Makefile`: ```shell obj-m += dont_trace.o KERNELDIR ?= /lib/modules/`uname -r`/build PWD := $(shell pwd) all: $(MAKE) -C $(KERNELDIR) M=$(PWD) modules clean: $(MAKE) -C $(KERNELDIR) M=$(PWD) clean ``` 請補完程式碼,使得執行符合預期。 ==作答區== * PPPP = ? :::success 延伸問題: 1. 解釋上述程式碼運作原理,應探討 Linux 核心內部 [ptrace](https://man7.org/linux/man-pages/man2/ptrace.2.html) 系統呼叫和 [signal](https://man7.org/linux/man-pages/man7/signal.7.html) 的實作方式 2. 擴充[第 1 週測驗題](https://hackmd.io/@sysprog/linux2021-summer-quiz1)的程式碼 (當然是指你擴充過的實作),允許特定的行程得以隱藏或拒絕被追蹤,並且提供對應的 `ioctl` 介面,讓使用者層級的工具進行操作 3. 研讀 [The race to limit ptrace](https://www.rezilion.com/blog/the-race-to-limit-ptrace/) 一類的材料,探討行程避免被追蹤的手法 ::: --- ### 測驗 `2` 我們打算利用 Linux 核心模組,建立一個「後門」,得以讓使用者層級的行程自己建立新的排程策略,使用方式: ```shell $ sudo insmod proc_queue.ko $ sudo insmod proc_sched.ko time_quantum=5 $ sudo insmod proc_set.ko ``` 其中 `proc_sched` 實作 [round-robin 排程策略](https://en.wikipedia.org/wiki/Round-robin_scheduling),並可接受指定的 time quantum 或 timeslice。 程式碼列表可見: [gist](https://gist.github.com/jserv/4f4973fab342f50c5b106997dfa7c3b3) (遮蔽部分程式碼) 測試程式碼: (檔名: `user/test_thread.c`) ```cpp #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <sys/syscall.h> #include <unistd.h> #define N_THREADS 2 void *test_pthread(void *ptr) { #ifdef SYS_gettid pid_t tid = syscall(SYS_gettid); #else #error "SYS_gettid unavailable on this system" #endif FILE *fp = fopen("/proc/process_sched_add", "w"); fprintf(fp, "%d", tid); fclose(fp); while (1) { printf("TID: %d\n", tid); sleep(1); } pthread_exit(NULL); } int main() { pthread_t threads[N_THREADS]; for (long t = 0; t < N_THREADS; t++) { printf("In main: creating thread %ld\n", t); int rc = pthread_create(&threads[t], NULL, test_pthread, NULL); if (rc) { printf("ERROR; return code from pthread_create() is %d\n", rc); exit(-1); } } /* Last thing that main() should do */ pthread_exit(NULL); return 0; } ``` 參考執行輸出: ```shell $ user/test_thread In main: creating thread 0 In main: creating thread 1 [1]+ Stopped $ fg TID: 27793 TID: 27794 TID: 27793 TID: 27794 TID: 27793 TID: 27794 ... ``` 請補完程式碼,使程式碼執行符合預期。 :::success 延伸問題: 1. 解釋上述程式碼運作原理,並重寫為單一 Linux 核心模組 2. 原本的程式碼充斥重複的程式碼,請改寫為更精簡的實作 3. 嘗試將 [linux-cfs-sim](https://github.com/sysprog21/linux-cfs-sim) 移植到上述 Linux 核心模組,也就是重新實作 CFS 演算法,但不更動原本 Linux 核心實作 (即「架空」Linux 核心) ::: --- ### 測驗 `3` 考慮一個 lock-free hash table 實作,程式碼可見 [gist](https://gist.github.com/jserv/c3823ea893e08607b432827a11ec4b69) ==作答區== * KKKK = ? * QQQQ = ? * ZZZZ = ? :::success 延伸問題: 1. 解釋上述程式碼運作原理,並指出其設計和實作的缺失 (如原本只接受 uin32_t 型態的輸入,無法處理字串作為 key) 2. 研讀 [lfht](https://github.com/ksandstr/lfht) (Lock-free wait-free hash table using C11 atomics) 程式碼及對應的論文 [Lock-Free Linked Lists and Skip Lists](http://www.cse.yorku.ca/~ruppert/papers/lfll.pdf) 3. 參照 [atomic_hash](https://github.com/divfor/atomic_hash) 及其內建的測試程式,用更大規模的來測試 :::