2019q3 Homework3 (list)

# 2019q3 Homework3 (list) contributied by <`nckuoverload`> ###### tags: `sysprog2019` ## 自我檢查清單 - [x] 為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？ - [x] Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量 - [x] GNU extension 的 [typeof](https://gcc.gnu.org/onlinedocs/gcc/Typeof.html) 有何作用？在程式碼中扮演什麼角色？ - [x] 解釋以下巨集的原理 ```cpp #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ``` - [ ] 除了你熟悉的 add 和 delete 操作，`list.h` 還定義一系列操作，為什麼呢？這些有什麼益處？ - [ ] `LIST_POISONING` 這樣的設計有何意義？提示: 和 [Linux 核心記憶體管理](https://hackmd.io/@sysprog/rJBXOchtE)有關 - [ ] linked list 採用環狀是基於哪些考量？ - [ ] 什麼情境會需要對 linked list 做排序呢？舉出真實世界的應用，最好在作業系統核心出現 - [ ] 什麼情境會需要需要找到 [第 k 大/小元素](https://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array/) 呢？又，你打算如何實作？ - [ ] `list_for_each_safe` 和 `list_for_each` 的差異在哪？"safe" 在執行時期的影響為何？ - [x] for_each 風格的開發方式對程式開發者的影響為何？ * 提示：對照其他程式語言，如 Perl 和 Python - [ ] 程式註解裡頭大量存在 `@` 符號，這有何意義？你能否應用在後續的程式開發呢？ * 提示: 對照看 Doxygen - [ ] `tests/` 目錄底下的 unit test 的作用為何？就軟體工程來說的精神為何？ - [ ] `tests/` 目錄的 unit test 可如何持續精進和改善呢？ --- ### 1. 為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？首先可以參考 [macro vs linked list](https://stackoverflow.com/questions/9104568/macro-vs-function-in-c) 。總結來說，這篇的第一個回覆者覺得各有好壞， macro 並沒有型態檢查 (type checking) ，因此可以當作多型 (polymorphic type) 來使用。第二個回覆者，針對這兩個方式做了比較，可以整理成下表: | macro | function | |-|-| |Macro is Preprocessed|Function is Compiled| |No Type Checking|Type Checking is Done| |Code Length Increases | Code Length remains Same |Use of macro can lead to side effect | No side Effect |Speed of Execution is Faster | Speed of Execution is Slower |Before Compilation macro name is replaced by macro value | During function call, Transfer of Control takes place |Useful where small code appears many time | Useful where large code appears many time |Macro does not Check Compile Errors | Function Checks Compile Errors 第二個部分可以討論 **sied effect**: 首先可以參考[這篇問答](https://stackoverflow.com/questions/32284073/what-are-expressions-with-side-effects-and-why-should-they-be-not-passed-to-a-ma)第一個回覆者有舉些例子，例子主要在說明有些情況下，使用 macro 會造成的副作用，例如: `#define MAX(a, b) ((a) > (b) ? (a) : (b))` 這個 macro 程式碼在 `C99 §6.10.3.5` 也有出現，如果將它寫成 ```cpp #define MAX(a, b) ((a) > (b) ? (a) : (b)) int x = 5; int y = 7; int z = MAX(x++, y++); ``` y 的部分會累加兩次，解構成 `int z = ((x++) > (y++) ? (x++) : (y++));` ，在前半比較部分就會先累加一次，在三元運算子後又會再累加一次。第三部分可以探討執行速度 (Speed of Execution): 在編譯階段，編譯器就會將程式碼中有 macro 的部分替換成 macro 的程式碼，因此在執行階段，程式不需要透過 function call 去跳到函式的所在位址，也就少了呼叫函式的 overhead。 #### 小結: 在 Linux 中會大量使用到 linked-list，如果使用 macro 可以在執行速度上表現較佳。另外在多型方面也較符合 reusable 的精神。 :::info 不確定這邊是否有用到多型的概念? ::: #### 比較: 如果執行 10000 次作業中的 `merge-sort.c` ，分別引入有 `inline` 和沒有的版本，會如圖所示。 `inline` 平均為 1167 (ticks) `noinline` 平均為 1208 (ticks) ![](https://i.imgur.com/O3IhLTC.png) 詳細上來看其實差異並不大，整體沒有使用 `inline` 會較大， ### 2. Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量 1. [`linux/net/netfilter/nf_conncount.c`](https://github.com/torvalds/linux/blob/9c7db5004280767566e91a33445bf93aa479ef02/net/netfilter/nf_conncount.c) netfilter 可以參考自 [Wikipedia](https://en.wikipedia.org/wiki/Netfilter) 的解釋，是一個網路相關操作的框架。這份程式碼主要是用來計算配對(連結)到一個點所需要的連結數。 `count the number of connections matching an arbitrary key.` ```cpp=40 /* we will save the tuples of all connections we care about */ struct nf_conncount_tuple { struct list_head node; struct nf_conntrack_tuple tuple; struct nf_conntrack_zone zone; int cpu; u32 jiffies32; }; ``` 在網路中，因為連結點會動態的增減，適合使用 linked-list 這種不需要事先宣告大小的資料型態。 2. [`linux/drivers/watchdog/watchdog_pretimeout.c`](https://github.com/torvalds/linux/blob/2f4c53349961c8ca480193e47da4d44fdb8335a8/drivers/watchdog/watchdog_pretimeout.c) 3. [`linux/fs/dcookies.c`](https://github.com/torvalds/linux/blob/9c7db5004280767566e91a33445bf93aa479ef02/fs/dcookies.c#L3) ### 3. for_each 風格的開發方式對程式開發者的影響為何？ Java 中有個 [Iterator Interface](https://docs.oracle.com/javase/8/docs/api/java/util/Iterator.html) 用法和這邊的 foreach 有類似的行為。當一個開發者需要去探訪每一個節點時方便使用，`for_each` 行為上類似 `hasNext()`、`Next()`等。當一個開發者需要探訪某一個集合或是資料結構底層時，不需要了解對應的結構，只要會善用這種函式就可以輕易的走訪每個節點。 ### 4. 解釋 container_of 巨集的原理首先參考這篇[Linux Kernel: container_of 巨集](http://adrianhuang.blogspot.com/2010/01/linux-kernel-containerof.html)。主要是用來方便程式取得某一個結構的起始位址。 `__extension__`: 參考自[Alternate Keywords](https://gcc.gnu.org/onlinedocs/gcc/Alternate-Keywords.html#Alternate-Keywords)，GCC 在ANSI C 上做一些擴展，使用 `__` 這類型的就是避免編譯器對這些擴展拋出警告。 `__typeof__`: 依照 [Wikipedia](https://en.wikipedia.org/wiki/Typeof) 的敘述，主要是用來指定某個變數的型別，透過傳入的 1. 表示式 2. 類別來決定。在 macro 中，因為 macro 無法指定型別，透過這個函式可以用來將型別指定為傳入的參數。 `offset()`: 計算該變數在這個結構上的偏移值。下列程式碼中，`test` 的偏移量為 0 ，`a` 的偏移量為10 byte。 ```cpp struct node { char test[10]; int a; }; ``` `container_of` 搭配 linked-list 的 `head` 可以很輕鬆地得到整個 list 的起始位址。它會用 `list_entry` 包裝，並且搭配 `list_first_entry` 和 `list_last_entry` 使用。 `head` 為這個串列的 head，它的 *next 即為這個串列的第一個節點的起始位址。 ```cpp #define list_first_entry(head, type, member) \ list_entry((head)->next, type, member) ``` 反之，可以得到串列的最後一個節點的起始位址。 ```cpp #define list_last_entry(head, type, member) \ list_entry((head)->prev, type, member) ``` ## implement merge sort ### 1. 切割程式碼第 26 ~ 45 行，主要是使用遞迴將程式碼切割成左和右。 36~37: 為了避免弄亂排序，左邊的部分從頭開始依序加入，右邊的部分則反向從尾巴開始依序加入。 38~41: 如果串列剩下一個值，無法切割左右，則加入到左邊。 44~45: 遞迴處理。 ```cpp=26 struct list_head list_left, list_right; INIT_LIST_HEAD(&list_left); INIT_LIST_HEAD(&list_right); struct listitem *left_item = NULL, *right_item = NULL; while (!list_empty(head)) { left_item = list_first_entry(head, struct listitem, list); right_item = list_last_entry(head, struct listitem, list); list_del(&left_item->list); list_del(&right_item->list); list_add(&right_item->list, &list_right); list_add_tail(&left_item->list, &list_left); if (list_is_singular(head)) { left_item = list_first_entry(head, struct listitem, list); list_del(&left_item->list); list_add_tail(&left_item->list, &list_left); } } list_mergesort(&list_left); list_mergesort(&list_right); ``` ### 2. 比較合併遞迴完成後，將左邊和右邊比較並且插入插入串列中。 48~54: 當左或右其中一個串列已經為空時，將未空的串列剩下的節點全數直接貼到目標串列中。 59~64: 先取出左右串列中第一個值，並且比較後將較小的貼到目標串列中。 ```cpp=47 while (!list_empty(&list_left) || !list_empty(&list_right)) { if (list_empty(&list_left)) { list_splice_tail(&list_right, head); break; } else if (list_empty(&list_right)) { list_splice_tail(&list_left, head); break; } struct listitem *l = list_first_entry(&list_left, struct listitem, list), *r = list_first_entry(&list_right, struct listitem, list); if (cmpint(&l->i, &r->i) < 0) { list_del(&l->list); list_add_tail(&l->list, head); } else { list_del(&r->list); list_add_tail(&r->list, head); } } ``` ###