2019q1 Homework1 (list)

# 2019q1 Homework1 (list) contributed by <`MetalheadKen`> :::info :mega: 題目出處 * [F02: list](https://hackmd.io/s/S12jCWKHN) ::: ### 自我檢查清單 [TOC] ### 為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？ * macro * [用法](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256#page=158)：`#define macro_name replacement-list new-line` * 用 `#define` 來定義一個常數或指令 * 在編譯前，preprocessor 會把 `macro_name` 替換為先前定義好的常數或指令 * 沒有 function call 在執行時期的成本，但會增加 code size，且比起 function 的寫法較不易 debug 和維護 * function * [用法](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256#page=153) * 由於在執行時期呼叫 function 時，需要將 return address 和 function parameter 藉由 `push` 和 `pop` 進 stack 裡，因此在執行時期會耗費大量時間成本 * TODO * 比較 macro、function 和 inline function 的 code size 和 execution time ### Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量 ### GNU extension 的 [typeof](https://gcc.gnu.org/onlinedocs/gcc/Typeof.html) 有何作用？在程式碼中扮演什麼角色？ * `typeof` 會回傳參數內的資料型態 * 例如： ```C int x[10]; typeof(x) y; // 相當於 int y[10]; typeof(x[0]) z; // 相當於 int z; ``` * 在 GCC manual 中的範例可以看到使用了 `typeof` 可以依參數內的變數來動態的去調整其資料型態，可以讓程式變得更有彈性以及更容易維護 ```C #define max(a, b) \ ({ typeof(a) _a = (a); \ typeof(b) _b = (b); \ _a > _b ? _a : _b; )}) ``` ### 解釋 `container_of` 巨集的原理 ```C #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ``` * `container_of` * 為藉由結構體中的成員的記憶體位址，找到該結構體最一開始的記憶體位址 * `__extension__` * 根據 [GCC manual](https://gcc.gnu.org/onlinedocs/gcc/Alternate-Keywords.html#Alternate-Keywords) 中提到，在編譯的參數裡添加 `-pedantic` 或 `-ansi` 的話，編譯器會對所有使用到 GNU extension 的地方提出警告，但是並不會對包含在 `__extension__` 的運算式產生警告 > You can prevent such warnings within one expression by writing `__extension__` before the expression. `__extension__` has no effect aside from this. * `__typeof__` * 根據 [GCC manual](https://gcc.gnu.org/onlinedocs/gcc/Alternate-Keywords.html#Alternate-Keywords) 中提到，若想要在 ISO C 程式中可以相容的話，可以把 `typeof` 改寫為 `__typeof__` > If you are writing a header file that must work when included in ISO C programs, write `__typeof__` instead of `typeof`. * `__typeof__(((type *) 0)->member)` * 一般來說對 0 作 dereference 應該會造成 segmentation fault，但是在 [GCC manual](https://gcc.gnu.org/onlinedocs/gcc/Typeof.html) 中提到在使用 `typeof` 時只有在執行時期才能確定的 expression 才會去求值 > The operand of typeof is evaluated for its side effects if and only if it is an expression of variably modified type or the name of such a type. * `const __typeof__(((type *) 0)->member) *__pmember = (ptr);` * 宣告一個指向結構體中的成員 `member` 的資料型態的指標 `__pmember`，並指派一個變數 `ptr` * 若程式設計師在使用 `container_of` 時傳進去的參數其 `ptr` 的資料型態與 `member` 的資料型態不相符時，在編譯期間會輸出 `error: initialization from incompatible pointer type` 的錯誤訊息，可藉此在 compile time 檢查出人為上的疏失 * `offsetof` * `size_t offsetof(type, member)` * 依據 [man pages](http://man7.org/linux/man-pages/man3/offsetof.3.html) 說明，`offsetof` 巨集會回傳在結構體 `type` 中，成員 `member` 的位址偏移量 * `(type *) ((char *) __pmember - offsetof(type, member));` * 先把 `__pmember` 轉型成 pointer to char 的資料型態，藉此讓之後的 pointer arithmetics 時的 offset 位移量可以為 n * `sizeof(char)` > Ref: [cplusplus.com](http://www.cplusplus.com/doc/tutorial/pointers) * 之後把 `__pmember` 與 `member` 在結構體 `type` 中的位址偏移量相減即可得到該結構體最一開始的記憶體位址 * `({...})` * 在 [GCC manual](https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html) 的第 6.1 章說明此寫法為一個 GNU extension，並提及最終會回傳在 compound statement (大括號) 內最後一個 expression 的計算後的結果 ### 除了你熟悉的 add 和 delete 操作，`list.h` 還定義一系列操作，為什麼呢？這些有什麼益處？ * `list_add` 和 `list_add_tail` * 若有 LIFO 的需要可以使用 `list_add`，而若有 FIFO 的需求則可以使用 `list_add_tail` * `list_del_init` * 與一般的 `list_del` 不同的是，`list_del_init` 在呼叫 `list_del` 來把節點從 list 移除後，還會呼叫 `INIT_LIST_HEAD` 來對已經移除的節點再作一次初始化 * 用意為因為在 `list_del` 中並未對已移除的節點其 `prev` 和 `next` 做初始化，有可能會對已移除的節點造成誤用的情況發生 * `list_empty` * 檢查 list 中的 `head` 是否沒有任何節點連接 * `list_is_singular` * 檢查 list 中的 `head` 是否只有一個節點連接 * `list_splice` 系列 * 把兩條 list 連接在一起，不需一直呼叫 `list_add` 來把一個個的節點連接上去，增加效率 * `list_cut_position` * 已某個節點為基準點來切割成兩個 list * `list_move` 系列 * 把 list 中的某一個節點移到最前頭或最後頭 * `list_entry` 系列 * 在 `list.h` 可以看到 doubly linked list 的定義為 ```clike struct list_head { struct list_head *prev; struct list_head *next; }; ``` 而在使用時可以定義如下 ```clike struct listitem { int val; struct list_head list; }; struct listitem *item; ``` 並在走訪整個 doubly linked list 時可撰寫為 ```clike struct list_head *node; struct list_head *curr = item->list; for (node = curr->next; node != curr; node = node->next) { ... } ``` 從上述程式碼中可以發現到，在走訪時我們只能取得到 `struct listitem` 中的 `struct list_head` 也就是其節點，但無法取得到節點中的資料 `val`，但利用 `list_entry` 和其巨集內使用的 `container_of` 可以取得到 `struct listitem` 的最一開始的記憶體位址也跟著可以取得到節點中的資料 `val` 了總體來說，藉由定義了一系列操作，並將所有常用的操作用巨集或函式包裝起來，可以 * 不必再重新造輪子，加速開發時程 * 使其抽象化，讓程式可讀性增加 * [Linux kernel coding style](https://www.kernel.org/doc/html/v4.18/process/coding-style.html#functions) 中提到函式的寫法要又短又甜而且只做一件事，並且內容要可以在一兩頁內顯示完畢 > Functions should be short and sweet, and do just one thing. They should fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, as we all know), and do one thing and do that well. * Robert C. Martin, *Clean Code: A Handbook of Agile Software Craftsmanship* > The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that. *[LIFO]: Last-In-First-Out *[FIFO]: First-In-First-Out ### `LIST_POISONING` 這樣的設計有何意義？在 `list.h` 的函式 `list_del` 的註解中提到在節點移除後，會將該節點的 `prev` 和 `next` 指向一個非法存取的記憶體位置，若之後有任何程式想要去存取已經移除掉的節點的話，即造成 segmentation fault，可藉此用來在除錯階段得知哪裡有誤 ```c /** * list_del() - Remove a list node from the list * * ... * * LIST_POISONING can be enabled during build-time to provoke an invalid memory * access when the memory behind the next/prev pointer is used after a list_del. * This only works on systems which prohibit access to the predefined memory * addresses. */ static inline void list_del(struct list_head *node) { struct list_head *next = node->next; struct list_head *prev = node->prev; next->prev = prev; prev->next = next; #ifdef LIST_POISONING node->prev = (struct list_head *) (0x00100100); node->next = (struct list_head *) (0x00200200); #endif } ``` 根據 [LITMUS wiki](https://wiki.litmus-rt.org/litmus/KernelDebugging) 中可以得知 `0x00100100` 和 `0x00200200` 為 Well-known addresses，目的為可以在 kernel oops 的時候發現是何種錯誤，因而挑選的一個特殊的位址 ### linked list 採用環狀基於哪些考量？ * 從任何一個節點皆可以走訪整個 list * 若想存取該節點的前一個節點無須從頭開始走訪 * 在刪除某一節點時，其時間複雜度可為 $O(1)$ 而不是 singly linked list 的 $O(n)$ * [Linux Device Driver 3/e](https://static.lwn.net/images/pdf/LDD3/ch11.pdf#page=10) 提到若 linked list 是環狀的，那麼每一個節點的操作都是一樣的，因此不用像 singly linked list 一樣要去維護 `head` 和 `tail` > Since Linux lists are circular, the head of the list is not generally differnet from any other entry. ### 什麼情境會需要對 linked list 做排序呢？舉出真實世界的應用，最好在作業系統核心出現 ### 什麼情境會需要找到 [第 k 大/小的元素](https://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array) 呢？又，你打算如何實作？ ### `list_for_each_safe` 和 `list_for_each` 的差異在哪？"safe" 在執行時期的影響為何？ ```clike /** * list_for_each - iterate over list nodes * @node: list_head pointer used as iterator * @head: pointer to the head of the list * * The nodes and the head of the list must must be kept unmodified while * iterating through it. Any modifications to the the list will cause undefined * behavior. */ #define list_for_each(node, head) \ for (node = (head)->next; node != (head); node = node->next) /** * list_for_each_safe - iterate over list nodes and allow deletes * @node: list_head pointer used as iterator * @safe: list_head pointer used to store info for next entry in list * @head: pointer to the head of the list * * The current node (iterator) is allowed to be removed from the list. Any * other modifications to the the list will cause undefined behavior. */ #define list_for_each_safe(node, safe, head) \ for (node = (head)->next, safe = node->next; node != (head); \ node = safe, safe = node->next) ``` * 差別在於多了一個變數 `safe` 來預先儲存下一個節點 * 當使用 `list_del_init` 或是使用 `LIST_POISONING` 來對移除的節點作初始化時，需要多一個變數來預先儲存下一個的節點，否則在移除節點後，因已經移除的節點的 `prev` 和 `next` 已經被改變了，故在執行 `node = node->next` 時會造成不預期的結果 ### for_each 風格的開發方式對程式開發者的影響為何？ * 提示︰對照其他程式語言，如 Perl 和 Python ### 程式註解裡頭大量存在 `@` 符號，這有何意義？你能否應用在後續的程式開發呢？ * 提示︰對照看 Doxygen ### `tests/` 目錄底下的 unit test 的作用為何？就軟體工程來說的精神為何？ ### `tests/` 目錄的 unit test 可如何持續精進和改善呢？