---
tags: linux2021
---
# [2021 年暑期 Linux 核心](https://hackmd.io/@sysprog/linux2021-summer/) 第 5 週測驗題
:::info
目的: 檢驗學員對 ==[Atomics 操作](https://hackmd.io/@sysprog/concurrency-atomics)==, ==[Lock-Free Programming](https://hackmd.io/@sysprog/concurrency-lockfree)==, ==[The Linux Kernel Module Programming Guide](https://sysprog21.github.io/lkmpg/)==, 《Linux Kernel Scheduler Internals》的認知
:::
### 測驗 `1`
[ptrace](https://man7.org/linux/man-pages/man2/ptrace.2.html) 系統呼叫允許一個追蹤者 (tracer) 行程得以觀察或控制另外一個受追蹤者 (tracee) 行程的內容(如記憶體或暫存器)或執行流程。這主要用來實作除錯器 (debugger) 與系統呼叫的追蹤。使用這個系統呼叫之後,被追蹤者必須依附 (attach) 於追蹤者行程。該 attach 行為是以執行緒為單位,也就是說,在一個具備多個執行緒的行程中,其中每個執行緒都可個別 attach 在不同的追蹤者行程,或在其他執行緒被追蹤時,維持原本的執行流程。
> 延伸閱讀:
> * [Ptrace, Utrace, Uprobes: Lightweight, Dynamic Tracing of User Apps](https://landley.net/kdocs/ols/2007/ols2007v1-pages-215-224.pdf)
> * [進階 GDB](http://www.study-area.org/cyril/opentools/opentools/x1265.html)
`dont_trace` 這個 Linux 核心模組利用 `task_struct` 內部成員 [ptrace](https://elixir.bootlin.com/linux/v5.10.60/source/include/linux/sched.h#L669) 來偵測給定的行程是否被除錯器或其他使用 [ptrace](https://man7.org/linux/man-pages/man2/ptrace.2.html) 系統呼叫的程式所 attach,一旦偵測到其他行程嘗試追蹤該行程,`dont_trace` 就會主動 `kill` 追蹤者行程。流程:
* Once any process starts "ptracing" another, the [tracee](https://man7.org/linux/man-pages/man2/ptrace.2.html) gets added into [ptraced](https://elixir.bootlin.com/linux/latest/source/include/linux/sched.h#L867) that's located in `task_struct`, which is simply a linked list that contains all the tracees that the process is "ptracing".
* Once a tracer is found, the module lists all its tracees and sends a `SIGKILL` signal to each of them including the tracer. This results in killing both the tracer and its tracees.
* Once the module is attached to the kernel, the module's "core" function will run periodically through the advantage of workqueues. Specifically, the module runs every `JIFFIES_DELAY`, which is set to 1. That is, the module will run every one [jiffy](https://www.oreilly.com/library/view/linux-device-drivers/9781785280009/4041820a-bbe4-4502-8ef9-d1913e133332.xhtml). This can be changed through modifying the macro `JIFFIES_DELAY` defined in the module.
執行以下命令可得知目前 Linux 核心的 `HZ` 設定:
```shell
$ getconf CLK_TCK
```
例如輸出 `100` 就表示 100 HZ,亦即週期為 $0.01$ 秒。
GDB 使用示範:
```shell
$ yes >/dev/null &
$ sudo gdb -q --pid=`pidof yes`
```
預期可見類似以下輸出:
```
Attaching to process 15718
Reading symbols from /usr/bin/yes...
(No debugging symbols found in /usr/bin/yes)
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
0x00007f1113ebf1e7 in __GI___libc_write (fd=1, buf=0x563b5271c0f0, nbytes=8192) at ../sysdeps/unix/sysv/linux/write.c:26
```
可在 GDB 執行以下命令,要求 `yes` 命令不再執行: (`(gdb) ` 開頭表示 GDB 命令提示字元,不用輸入 ==`(gdb) `== 字樣,只要輸入 `signal` 開始的 GDB 命令)
```
(gdb) signal SIGINT
```
預期可見以下輸出:
```
Continuing with signal SIGINT.
Program terminated with signal SIGINT, Interrupt.
The program no longer exists.
```
一旦 `dont_trace` 掛載到 Linux 核心,重複上述操作時,會遇到截然不同的狀況:
```
$ sudo gdb -q --pid=`pidof yes`
Attaching to process 15736
Reading symbols from /usr/bin/yes...
[1]+ Killed yes > /dev/null
Killed
```
也就是 GDB 無法去追蹤給定的 `yes` 行程,除非執行 `sudo rmmod dont_trace` 命令,GDB 一類使用 [ptrace](https://man7.org/linux/man-pages/man2/ptrace.2.html) 系統呼叫的程式才可正確運作。
以下是 `dont_trace.c` 程式碼列表: (遮蔽部分程式碼)
```cpp
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/list.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/sched/signal.h>
#include <linux/workqueue.h>
MODULE_AUTHOR("National Cheng Kung University, Taiwan");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_DESCRIPTION("A kernel module that kills ptrace tracer and its tracees");
#define JIFFIES_DELAY 1
#define DONT_TRACE_WQ_NAME "dont_trace_worker"
static void periodic_routine(struct work_struct *);
static DECLARE_DELAYED_WORK(dont_trace_task, periodic_routine);
static struct workqueue_struct *wq;
static bool loaded;
/* Send SIGKILL from kernel space */
static void kill_task(struct task_struct *task)
{
send_sig(SIGKILL, task, 1);
}
/* @return true if the process has tracees */
static bool is_tracer(struct list_head *children)
{
struct list_head *list;
list_for_each (list, children) {
struct task_struct *task =
list_entry(list, struct task_struct, ptrace_entry);
if (task)
return true;
}
return false;
}
/* Traverse the element in the linked list of the ptraced proccesses and
* finally kills them.
*/
static void kill_tracee(struct list_head *children)
{
struct list_head *list;
list_for_each (list, children) {
struct task_struct *task_ptraced =
list_entry(list, struct task_struct, ptrace_entry);
pr_info("ptracee -> comm: %s, pid: %d, gid: %d, ptrace: %d\n",
task_ptraced->comm, task_ptraced->pid, task_ptraced->tgid,
task_ptraced->ptrace);
kill_task(task_ptraced);
}
}
static void check(void)
{
struct task_struct *task;
for_each_process (task) {
if (!is_tracer(&task->ptraced))
continue;
kill_tracee(&task->ptraced);
kill_task(task); /* Kill the tracer once all tracees are killed */
}
}
static void periodic_routine(struct work_struct *ws)
{
if (likely(loaded))
check();
PPPP;
}
static int __init dont_trace_init(void)
{
wq = create_workqueue(DONT_TRACE_WQ_NAME);
queue_delayed_work(wq, &dont_trace_task, JIFFIES_DELAY);
loaded = true;
pr_info("Loaded!\n");
return 0;
}
static void __exit dont_trace_exit(void)
{
loaded = false;
/* No new routines will be queued */
cancel_delayed_work(&dont_trace_task);
/* Wait for the completion of all routines */
flush_workqueue(wq);
destroy_workqueue(wq);
pr_info("Unloaded.\n");
}
module_init(dont_trace_init);
module_exit(dont_trace_exit);
```
對應的 `Makefile`:
```shell
obj-m += dont_trace.o
KERNELDIR ?= /lib/modules/`uname -r`/build
PWD := $(shell pwd)
all:
$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
clean:
$(MAKE) -C $(KERNELDIR) M=$(PWD) clean
```
請補完程式碼,使得執行符合預期。
==作答區==
* PPPP = ?
:::success
延伸問題:
1. 解釋上述程式碼運作原理,應探討 Linux 核心內部 [ptrace](https://man7.org/linux/man-pages/man2/ptrace.2.html) 系統呼叫和 [signal](https://man7.org/linux/man-pages/man7/signal.7.html) 的實作方式
2. 擴充[第 1 週測驗題](https://hackmd.io/@sysprog/linux2021-summer-quiz1)的程式碼 (當然是指你擴充過的實作),允許特定的行程得以隱藏或拒絕被追蹤,並且提供對應的 `ioctl` 介面,讓使用者層級的工具進行操作
3. 研讀 [The race to limit ptrace](https://www.rezilion.com/blog/the-race-to-limit-ptrace/) 一類的材料,探討行程避免被追蹤的手法
:::
---
### 測驗 `2`
我們打算利用 Linux 核心模組,建立一個「後門」,得以讓使用者層級的行程自己建立新的排程策略,使用方式:
```shell
$ sudo insmod proc_queue.ko
$ sudo insmod proc_sched.ko time_quantum=5
$ sudo insmod proc_set.ko
```
其中 `proc_sched` 實作 [round-robin 排程策略](https://en.wikipedia.org/wiki/Round-robin_scheduling),並可接受指定的 time quantum 或 timeslice。
程式碼列表可見: [gist](https://gist.github.com/jserv/4f4973fab342f50c5b106997dfa7c3b3) (遮蔽部分程式碼)
測試程式碼: (檔名: `user/test_thread.c`)
```cpp
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>
#define N_THREADS 2
void *test_pthread(void *ptr)
{
#ifdef SYS_gettid
pid_t tid = syscall(SYS_gettid);
#else
#error "SYS_gettid unavailable on this system"
#endif
FILE *fp = fopen("/proc/process_sched_add", "w");
fprintf(fp, "%d", tid);
fclose(fp);
while (1) {
printf("TID: %d\n", tid);
sleep(1);
}
pthread_exit(NULL);
}
int main()
{
pthread_t threads[N_THREADS];
for (long t = 0; t < N_THREADS; t++) {
printf("In main: creating thread %ld\n", t);
int rc = pthread_create(&threads[t], NULL, test_pthread, NULL);
if (rc) {
printf("ERROR; return code from pthread_create() is %d\n", rc);
exit(-1);
}
}
/* Last thing that main() should do */
pthread_exit(NULL);
return 0;
}
```
參考執行輸出:
```shell
$ user/test_thread
In main: creating thread 0
In main: creating thread 1
[1]+ Stopped
$ fg
TID: 27793
TID: 27794
TID: 27793
TID: 27794
TID: 27793
TID: 27794
...
```
請補完程式碼,使程式碼執行符合預期。
:::success
延伸問題:
1. 解釋上述程式碼運作原理,並重寫為單一 Linux 核心模組
2. 原本的程式碼充斥重複的程式碼,請改寫為更精簡的實作
3. 嘗試將 [linux-cfs-sim](https://github.com/sysprog21/linux-cfs-sim) 移植到上述 Linux 核心模組,也就是重新實作 CFS 演算法,但不更動原本 Linux 核心實作 (即「架空」Linux 核心)
:::
---
### 測驗 `3`
考慮一個 lock-free hash table 實作,程式碼可見 [gist](https://gist.github.com/jserv/c3823ea893e08607b432827a11ec4b69)
==作答區==
* KKKK = ?
* QQQQ = ?
* ZZZZ = ?
:::success
延伸問題:
1. 解釋上述程式碼運作原理,並指出其設計和實作的缺失 (如原本只接受 uin32_t 型態的輸入,無法處理字串作為 key)
2. 研讀 [lfht](https://github.com/ksandstr/lfht) (Lock-free wait-free hash table using C11 atomics) 程式碼及對應的論文 [Lock-Free Linked Lists and Skip Lists](http://www.cse.yorku.ca/~ruppert/papers/lfll.pdf)
3. 參照 [atomic_hash](https://github.com/divfor/atomic_hash) 及其內建的測試程式,用更大規模的來測試
:::