2019q1 Homework1 (linux-list)

# 2019q1 Homework1 (linux-list) contributed by < `0xff07` > ## 開發環境 ```shell $ uname -a Linux OxB16BOOB5 4.18.0-15-generic #16-Ubuntu SMP Thu Feb 7 10:56:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux ``` ```shell $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 70 Model name: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz Stepping: 1 CPU MHz: 850.328 CPU max MHz: 3700.0000 CPU min MHz: 800.0000 BogoMIPS: 4988.84 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K L4 cache: 131072K NUMA node0 CPU(s): 0-7 ``` ```shell $ cat /etc/lsb-release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=18.10 DISTRIB_CODENAME=cosmic DISTRIB_DESCRIPTION="Ubuntu 18.10" ``` # `container_of()` ```c= #ifndef container_of #ifdef __LIST_HAVE_TYPEOF #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) #else #define container_of(ptr, type, member) \ ((type *) ((char *) (ptr) -offsetof(type, member))) #endif #endif ``` 該程式原理類似 `qtest` 中，使用位址反推 HEADER / FOOTER 的機制。關於該巨集各部份的說明如下： ## `offsetof()` 首先發現一個 `offsetof` 巨集。查詢 `man offsetof` 後可以發現關於該巨集的說明： ```man DESCRIPTION The macro offsetof() returns the offset of the field member from the start of the structure type. This macro is useful because the sizes of the fields that compose a structure can vary across implementations, and compilers may insert different numbers of padding bytes between fields. Consequently, an element's offset is not necessarily given by the sum of the sizes of the previous ele‐ ments. A compiler error will result if member is not aligned to a byte boundary (i.e., it is a bit field). ``` 並且提供範例程式： ```C= EXAMPLE On a Linux/i386 system, when compiled using the default gcc(1) options, the program below produces the following output: $ ./a.out offsets: i=0; c=4; d=8 a=16 sizeof(struct s)=16 Program source #include <stddef.h> #include <stdio.h> #include <stdlib.h> int main(void) { struct s { int i; char c; double d; char a[]; }; /* Output is compiler dependent */ printf("offsets: i=%zd; c=%zd; d=%zd a=%zd\n", offsetof(struct s, i), offsetof(struct s, c), offsetof(struct s, d), offsetof(struct s, a)); printf("sizeof(struct s)=%zd\n", sizeof(struct s)); exit(EXIT_SUCCESS); } ``` 由 man 的說明可知該巨集的功能為：**給定結構體，以及成員的名稱，`offsetof` 將給出該成員之位址，與結構體起始位址的差異** ## `({...})` `container_of` 這個巨集至乍看之下沒有回傳值，但可發現它是一個被小括號包住的復合敘述。這是一個 GCC 的延伸功能。參閱 [6.1 Statements and Declarations in Expressions](https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html) 部份內容可以知道：小括號包住形成複合敘述(coumpound statement)的大括號，在 GCC 中是一個合法的 expression ，該 expression 中最後一個敘述的值，視為該復合敘述計算之後的值。這個延伸功能的其中一個好處是 **增加巨集的安全**。舉例而言，考慮以下找出兩個 `int` 中最大值的巨集： ```c #define maxint(a,b) ((a) > (b) ? (a) : (b)) ``` 假定想要先將 `a` 增加 1 ，並找出該結果與 `b` 較大者，可能會寫出以下程式： ```c maxint(++a, b); ``` 但該程式將會被展開成： ```c ((++a) > (b) ? (++a) : (b)); ``` 因此在這個巨集當中 `a` 將會增加兩次，與原先預期不同。但如果使用該延伸功能，將該巨集改寫成： ```c #define maxint(a,b) \ ({int _a = (a), _b = (b); _a > _b ? _a : _b; }) ``` `a` 與 `b` 中的 expression 都只會被計算一次。而大括號形成的複合敘述，可以使 `_a` `_b` 變數的可視範圍限制在大括號中。 ## 整體功能從巨集後者的： ```c= ... #else #define container_of(ptr, type, member) \ ((type *) ((char *) (ptr) -offsetof(type, member))) #endif ... ``` 或許較好理解。該巨集將 `member` 所在位址，扣除該成員在結構體起始位址的差異，反推回結構體的起始位址。而前面的： ```c= ... #ifdef __LIST_HAVE_TYPEOF #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ... ``` 可注意到復合敘述當中的第二行 `(type *) ((char *) __pmember - offsetof(type, member));` 與前述的 `((type *) ((char *) (ptr) -offsetof(type, member)))` 幾乎相同， ## LIST_POISONING 首先查到這個技巧能夠用來[除錯](https://wiki.litmus-rt.org/litmus/KernelDebugging)。在程式發生未預期的結束時，可以使用 core dump 進行除錯。將刪除的節點設置成某些特定的記憶體位址，若該位址在 core dump 中觀察到被寫入，就可推測問題出在某個 `list_head` 結構的存取中。另外查到一個 "memory poisoning"。在 kernel 的文件中，Kernel Self-Protection 的章節提到： > Memory poisoning > > When releasing memory, it is best to poison the contents (clear stack on syscall return, wipe heap memory on a free), to avoid reuse attacks that rely on the old contents of memory. This frustrates many uninitialized variable attacks, stack content exposures, heap content exposures, and use-after-free attacks. 所以推測有一部份可能是基於安全考量。 :::danger 調閱 git log，得知 Linux 核心開發者對 `list.h` 進行哪些相關修改 :notes: jserv ::: ## `list_head` 的應用 ### `struct task_struct` 第一個想到的應用是用於作業系統排程相關。因此第一個想到的就是去查 process control block 相關的結構。因此查到 [sched.h](https://github.com/torvalds/linux/blob/master/include/linux/sched.h) 中的 `struct task_struct` 結構。在這個結構中紀錄許多與行程有關的資料，如 `pid`, signal handler 等等。當中有許多資訊使用 `list_head` 儲存。舉例來說： ```C= /* * 'ptraced' is the list of tasks this task is using ptrace() on. * * This includes both natural children and PTRACE_ATTACH targets. * 'ptrace_entry' is this task's link on the p->parent->ptraced list. */ struct list_head ptraced; struct list_head ptrace_entry; ``` `ptraced` 紀錄該程序正在使用 `ptrace` 觀察的程序們。而當中也有與 `perf` 相關的資料結構，仍然是用 `list_head` 建立： ```C= #ifdef CONFIG_PERF_EVENTS struct perf_event_context *perf_event_ctxp[perf_nr_task_contexts]; struct mutex perf_event_mutex; struct list_head perf_event_list; #endif ``` # 巨集的考量可發現 `list_head` 的實作方式，易於擴充，但又可以將所有擴展後的新資料結構，使用同一套巨集進行操作。舉例來說，在 `bash` 中執行 `locate list.h` ，試著尋找一些 Linux 核心中使用的相關結構： ```bash ... /usr/src/linux-headers-4.18.0-11/arch/ia64/include/asm/native/patchlist.h /usr/src/linux-headers-4.18.0-11/include/linux/klist.h /usr/src/linux-headers-4.18.0-11/include/linux/list.h /usr/src/linux-headers-4.18.0-11/include/linux/llist.h /usr/src/linux-headers-4.18.0-11/include/linux/plist.h /usr/src/linux-headers-4.18.0-11/include/linux/quicklist.h /usr/src/linux-headers-4.18.0-11/include/linux/rcu_segcblist.h /usr/src/linux-headers-4.18.0-11/include/linux/rculist.h /usr/src/linux-headers-4.18.0-11/include/linux/scatterlist.h /usr/src/linux-headers-4.18.0-11/include/linux/ceph/pagelist.h /usr/src/linux-headers-4.18.0-11/include/linux/netfilter/ipset/ip_set_list.h /usr/src/linux-headers-4.18.0-11/include/uapi/linux/netfilter/ipset/ip_set_list.h ... ``` 以 `klist.h` 及 `plist.h` 為例，當中資料結構宣告如下： ```C= sstruct klist { spinlock_t k_lock; struct list_head k_list; void (*get)(struct klist_node *); void (*put)(struct klist_node *); } __attribute__ ((aligned (sizeof(void *)))); ``` ```C= struct plist_head { struct list_head node_list; }; struct plist_node { int prio; struct list_head prio_list; struct list_head node_list; }; ``` # 文件生成文件當中註解的符號，原理類似 Doxygen 由特定註解格式自動生成說明文件。但該格式的規範與 Doxygen 不同。在[核心的文件](https://www.kernel.org/doc/html/v4.9/kernel-documentation.html)中有提到： > The Linux kernel uses Sphinx to generate pretty documentation from reStructuredText files under Documentation. To build the documentation in HTML or PDF formats, use make htmldocs or make pdfdocs. The generated documentation is placed in Documentation/output. 可知該文件生成是使用 `sphinx`，而非 Doxygen。試著依照[指示]()安裝相依套件： ```bash $ ./scripts/sphinx-pre-install ``` 將會出現需要生成文件所需安裝的相關套件(如指示的連結中說明所提到的)，複製貼上執行即可。若相依套件均已安裝，則會顯示： ``` Detected OS: Ubuntu 18.04.1 LTS. All optional dependenties are met. Needed package dependencies are met. ``` 安裝完之後，可以考慮安裝文件中指定的佈景主題。但這並非必要(我反而覺得 `sphinx` 預設主題產生的網頁搜尋功能比較好用)： ``` $ pip install sphinx_rtd_theme ``` 接著在核心原始碼的資料夾中執行 ``` $ make htmldocs ``` 待執行完成後，將可以在 `Documentation/output` 之下，找到文件的 html 輸出結果。 ![](https://i.imgur.com/4z0lvjN.png) 如需搜尋，可以點進 `Search.html` 中，該網頁提供關鍵字搜尋功能。舉例來說，若搜尋 `list_add_tail`，則可以找到相關條目： ![](https://i.imgur.com/YhOyv2f.png) # 加入 qtest > 這方面的程式放在 `lab0` 裡面 [linux-list-test](https://github.com/0xff07/lab0-c/tree/linux-list-test) 這個 branch 中。因為 `struct list_head` 僅是作為使用者撰寫的 linked list 中的其中一個成員，所以這邊就使用一個包含 `struct list_head` 的結構體來測試。這邊參考 `/private/common.h` 中的測試結構體，定義: ```c= struct listitem { uint16_t i; struct list_head list; }; ``` 因為 `list.h` 中關鍵的部份是 `struct list_head` 中的指標操作, 幫物件配置記憶體並不是 `list.h` 的工作。所以測試的重點在於有沒有正確連接各個指標。接著， `qtest` 中使用一個全域的 linked list `q` 作為待測試的資料結構。這邊也仿照其中的作法，設置一個全域變數 `p` 作為待測試的結構： ```c= /* list being tested */ struct listitem p = {.i = 0, .list = {0, 0}}; /* number of elements in list */ size_t pcnt = 0; ``` 因為 `list.h` 中並沒有包含如 `q_new` 之類幫物件配置記憶體空間的操作，但有包含初始化物件的操作，所以這邊不像 `qtest` 本來的測試宣告一個準備指向 linked list 的指標，而是直接宣告一個 `struct listitem`。而相較於 `queue_t` 之前要先 `q_new`，在測試 `listitem` 之前要先將其中的 `listitem.list` 使用 `INIT_LIST_HEAD` 來初始化。然後宣告一個 `listitem` 的陣列作為記憶體池: ```c= #define POOL_NUM 100 static struct listitem pool_listitem[POOL_NUM]; static bool pool_valid[POOL_NUM]; static bool list_init = false; ``` 再這當中，`pool_listitem` 陣列是真正存放節點的記憶體空間，`pool_valid` 表示該對應元素有沒有被使用，如果有的話，就會被設成 `true`; 而如果要執行類似 `free` 整個串列連結的操作，只要把所有 `pool_valid` 設成 `false` ，並且重新執行 `LIST_INIT_HEAD` 就好。 ## 觀察：測試 wrapper 的設計首先觀察一下 `qtest` 當中的測試函數是怎麼運作的。可以觀察到裡面的函數大多遵循下面這種模式： ```c= bool do_<operation name> (int argc, char *argv[]) { /* check if argc and argv are correct, q NULL, etc */ if (argc != 1) { report(1, "%s takes no arguments", argv[0]); return false; } if (q == NULL) { /* ommit */ } ... error_check(); /* risky test that may invoke SIGSEGV and other exceptions */ /* is wrapped inside if (exception_setup(true)){} */ if (exception_setup(true)) { } exception_cancel(); show_queue(3); return !error_check(); } ``` 幾乎每一個測試函數，最後都會進行 `show_queue`。因此也先實作對應版本的 `show_queue`。 ## show_list 首先實作 `show_queue` 的對應版本 `show_list`。首先看一下 show_queue 的程式： ```c= static bool show_queue(int vlevel) { bool ok = true; if (verblevel < vlevel) return true; int cnt = 0; if (q == NULL) { report(vlevel, "q = NULL"); return true; } report_noreturn(vlevel, "q = ["); list_ele_t *e = q->head; if (exception_setup(true)) { while (ok && e && cnt < qcnt) { if (cnt < big_queue_size) report_noreturn(vlevel, cnt == 0 ? "%s" : " %s", e->value); e = e->next; cnt++; ok = ok && !error_check(); } } exception_cancel(); if (!ok) { report(vlevel, " ... ]"); return false; } if (e == NULL) { if (cnt <= big_queue_size) report(vlevel, "]"); else report(vlevel, " ... ]"); } else { report(vlevel, " ... ]"); report( vlevel, "ERROR: Either list has cycle, or queue has more than %d elements", qcnt); ok = false; } return ok; } ``` 可以發現在遍歷整個 `q` 時，是用維護好的 `qcnt` 進行便歷，而不是直接使用 `q` 的指標操作作為依據。這樣可以檢測出比如「插入通通成功，但刪除通通什麼都沒做」的狀況。如果僅透過指標操作，仍然可以順利遍歷整個 list，也不會發生非法記憶體存取，但這樣並不表示結果正確。而透過維護 `qcnt` ，遍歷結束之後立刻會發現 `cnt` 數字對不起來（或是說遍歷完之後發現沒有回到頭等等）。這邊仿照上述測試，實作 `show_list` 以及 `do_show_list` ```c= static bool show_list(int vlevel) { int cnt = 0; bool ok = true; if (list_init == false) { report(vlevel, "p = NULL.\n"); return true; } struct listitem *cur_node = container_of(p.list.next, struct listitem, list); struct list_head *iter = p.list.next; report_noreturn(vlevel, "p = [ "); if (exception_setup(true)) { while (ok && iter != &(p.list) && cnt < pcnt) { if (cnt < big_queue_size) report_noreturn(vlevel, cnt ? "%d " : "%d ", cur_node->i); cnt++; iter = iter->next; cur_node = container_of(iter, struct listitem, list); } } exception_cancel(); if (!ok) { report(vlevel, "... ]"); return false; } if (iter == &(p.list)) { if (cnt <= big_queue_size) report(vlevel, "]"); else report(vlevel, " ... ]"); } else { report(vlevel, " ... ]"); report(vlevel, "ERROR : length of list doesn't match the number of insert and " "delete operations."); ok = false; } return ok; } ``` 這邊的邏輯大致與 `show_queue` 相同，但不一樣之處是：在 `queue.h` 系列的操作中，會時時刻刻檢查動態配置的記憶體是否正確(即 `error_check()` 函數)，但這邊因為已經開好陣列了，所以不進行檢查。 ```c= bool do_list_show(int argc, char *argv[]) { if (argc != 1) { report(1, "%s takes no arguments", argv[0]); return false; } return show_list(0); } ``` ## do_init_list_head() 接著寫 `INIT_LIST_HEAD` 的測試： ```c= bool do_list_init_head(int argc, char *argv[]) { if (argc != 1) { report(3, "%s takes exactly 1 argument."); return false; } if (list_init == true) { report(3, "Warning : list is not empty"); for (int i = 0; i < POOL_NUM; i++) pool_valid[i] = 0; } bool ok = true; if (exception_setup(true)) { INIT_LIST_HEAD(&(p.list)); if (p.list.next != &(p.list)) { report(3, "next field is not properly linked."); ok = false; } if (p.list.prev != &(p.list)) { report(3, "prev field is not properly linked."); ok = false; } } exception_cancel(); if (ok) list_init = true; return ok; } ``` 這邊是用 `list_init` 作為有沒有初始化的依據，並且在執行結束之後，驗證是否頭尾都連到自己。 ## do_list_add 這邊的邏輯類似 `q_insert` 。雖然理論上數字範圍與重複次數應該要是任意的，但在無論限制多寬，本質上還是從 `pool_listitem` 中的某個元素連接到裡面另外一個元素。所以這邊先規定插入數字 `i` 時，必須使用 `pool_listitem[i]` 的記憶體，而範圍只能在 `0 ~ POOL_NUM - 1`，且一個數字只能插入一次。 ```c= bool do_list_add(int argc, char *argv[]) { int insert; bool ok = true; if (argc != 2) { report(1, "%s takes 2 arguments."); return false; } if (!get_int(argv[1], &insert)) { report(1, "'%s' is not a valid number.", argv[1]); return false; } if (insert >= POOL_NUM || insert < 0) { report(1, "'%s' out of range. Should between 0 and %d.", argv[1], POOL_NUM - 1); return false; } if (pool_valid[insert] == true) { report(1, "'%s' is already in list.", argv[1]); return false; } if (false == list_init) { report(1, "Calling list_add on null list"); return false; } error_check(); if (exception_setup(true)) { struct list_head *original_next = p.list.next; pool_listitem[insert].i = insert; struct list_head *node = &(pool_listitem[insert].list); struct list_head *head = &(p.list); list_add(node, head); if ((p.list.next) != node) { report(1, "ERROR : New node isn't correctly insert to head."); ok = false; } if ((node->prev) != head) { report(1, "ERROR : Head isn't correctly connected to new node."); ok = false; } if ((original_next->prev) != node) { report(1, "ERROR : The second node before insertion isn't correctly " "connected to newly added node."); ok = false; } if ((node->next) != original_next) { report(1, "ERROR : new node isn't correctly connected to second node " "before insertion."); ok = false; } if (ok) { pcnt++; pool_valid[insert] = true; } } exception_cancel(); show_list(3); return ok; } ``` 大致上來說，架構是檢測輸入, 並在連接之後，對相應的節點做出位置檢查。