2018q3 Homework3(list)

# 2018q3 Homework3(list) ## 自我檢查事項 ### 為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？ ### Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量 * job scheduler: * [linux kernel sched/fair.c](https://elixir.bootlin.com/linux/latest/source/kernel/sched/fair.c#L780) * [CFS document](https://www.mjmwired.net/kernel/Documentation/scheduler/sched-design-CFS.txt) * Linux kernel 在 2.6.23 版本後採用的排成器，使用的是 Completely Fair Scheduler ，內部實作 queue 來儲存任務的資料結構就是以 linked list 實作 * file system: * [fs.h](http://lxr.linux.no/linux-bk+v2.6.10/include/linux/fs.h) * 在 linux kernel 內管理硬碟閒置空間的一種解法就是以 linked list 彼此串連相對小塊的空間，稱之為 block ### 解釋以下巨集的原理 ```C= #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ``` 乍看之下不太明白使用的場景以及確切功能，第一步直覺上是以 preprocessor 的方式 `gcc -E -P` 輸出， [containerof.c](https://github.com/sysprog21/linux-list/blob/master/tests/containerof.c) 經過預處理器後輸出如下： ```C= // the output of linux-list/tests/containerof.c void __assert_rtn(const char *, const char *, int, const char *) __attribute__((noreturn)) __attribute__((__disable_tail_calls__)); typedef long int ptrdiff_t; typedef long unsigned int size_t; typedef long unsigned int rsize_t; typedef int wchar_t; typedef long double max_align_t; struct teststruct { int a; int b; }; int main(void) { struct teststruct item; (__builtin_expect(!(&item == __extension__({ const __typeof__(((struct teststruct *) 0)->b) *__pmember = (&item.b); (struct teststruct *) ((char *) __pmember - __builtin_offsetof(struct teststruct, b)); })), 0) ? __assert_rtn(__func__, "test.c", 17, "&item == container_of(&item.b, struct teststruct, b)") : (void)0); return 0; } ``` 從上述預處理過後的程式碼依然沒有獲得新的資訊，進一步發現這個函數在 linux kernel 內有被定義，從 offsetof 開始逐步拆解。 ```C= /* offsetof 定義在 <linux/stddef.h>中，把 0 * 轉型成 TYPE* 再存取其中一個 member，並取出其 * 位址，最後轉型為 long int 才能做之後的處理 * * 0 member * v v * [= member_A 5 bit =][= member_B 6 bit =][=== other member ====] * */ #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) ``` 從上方程式碼可以看出，`offsetof(test_struct, member_B)` 會輸出指向 member_A 末端的位址，並且因為是從 0 開始也就代表著整個 structure 內 member_B 之前所有 member 佔有的空間。在 containerof macro function 內，我們先宣告一個指標，指向其中一個 member 的起始位置，減去 offsetof 之後就會回到整個 structure 的開頭點，可知 containerof 的目的在於，給定一個 structure 內的 member 輸出這個 member 對應的整個 structure 的起始點。 ### GNU extension 的 typeof 有何作用？在程式碼中扮演什麼角色？最直觀的觀點是 typeof 提供我們一種方法去存取某個變數的類型，考慮以下程式碼 ```C= #define typecheck(type,x) \ ({ type __dummy; \ typeof(x) __dummy2; \ (void)(&__dummy == &__dummy2); \ 1; \ }) ``` 以上巨集函數設計目的在於檢驗 x 是否真為 type 參數所描述的類型，實際使用上會是 `typecheck(int, x)` 這個樣子，以傳入的參數先宣告了一個 __dummy 再用 x 的類型宣告了一個 __dummy2 並比較兩者地址，若 __dummy 和 __dummy2 不是同一個類型，在編譯階段就會跳出警告。同樣可以存取類型的方式，對應到 C++ 有一個關鍵字 `decltype` 也可以對變數類型作存取。 ### 除了你熟悉的 add 和 delete 操作，list.h 還定義一系列操作，為什麼呢？這些有什麼益處？ * list.h 內包含許多 list_splice 系列對多 node 的操作函數，如下方舉例以 list_splice 為例 ```C= static inline void list_splice(struct list_head *list, struct list_head *head) { struct list_head *head_first = head->next; struct list_head *list_first = list->next; struct list_head *list_last = list->prev; if (list_empty(list)) return; head->next = list_first; list_first->prev = head; list_last->next = head_first; head_first->prev = list_last; } ``` 專案規模逐漸擴大時，很直覺得反應會是以 OOP 的形式撰寫專案，而物件導向自然也就包含 constructor 和 distructor，直覺上看到這樣的函式介面會讓我想到 C++ 的 copy-constructor，有了這樣的一個操作介面，縱使我們一樣也可以用一個 for-loop 達到一樣的效果，但大粒度的操作提高了靈活性。 * list.h 內也提供了一些 move 函數，如 [list_move](https://github.com/sysprog21/linux-list/blob/master/include/list.h#L318)方便移動節點位置，以下附上相關程式碼。 ```C= static inline void list_move(struct list_head *node, struct list_head *head) { list_del(node); list_add(node, head); } ``` 在我看來這樣的一個 linked list 資料結構基本上就可以視為一個簡單的 database 也就是必須起碼要有 CRUD 這類的功能，所以我想這樣的 mv 函數也是之前在 single linked list 內沒做到的地方。 * entry 系列的 macro function，list.h 內提供多個搭配 containerof 這個 macro function 直接取得不同節點位址的函式，老師在上課中總是強調希望一個操作能在常數時間複雜度的情況下完成，有了這樣各個不同節點的起始位置，甚至可以做到建置成一個 hash table 在常數時間內知道不同節點內匹配的內容，而不是每次需要搜尋節點就從頭遍歷，以下附上相關的巨集函數實作細節。 ```C= #define list_entry(node, type, member) container_of(node, type, member) ``` ### LIST_POISONING 這樣的設計有何意義？以下是有關 LIST_POISONING 的相關程式碼 ```C= static inline void list_del(struct list_head *node) { struct list_head *next = node->next; struct list_head *prev = node->prev; next->prev = prev; prev->next = next; #ifdef LIST_POISONING node->prev = (struct list_head *) (0x00100100); node->next = (struct list_head *) (0x00200200); #endif } ``` 從上方程式碼中可以看到，若是我們打開 LIST_POISONING 他會對我們從 list 內抽出的節點之 prev 和 next 指標 assign `0x00100100` 和`0x00200200` 兩個地址，而這是無效的地址，在 Hw2 的 lab0c 中也有相關的處理，他的做法是 assign 進一個 `0xdeadbeef` 通過這個做法，若是在之後的程式碼對這個地址填值，因為不是合法地址， process 會 crash 掉，由此達到在 debug 階段很清楚知道是哪裡的問題，此類技巧尤其適合用在多執行緒。 ### linked list 採用環狀是基於哪些考量？在網路上其實有很多篇文章在探討為何 kernel 內的 list 大多都是 circular list ，雖然 circular & non-circular 都有實作，不過 circular list 在實際用途上顯得更有彈性，在實際上需要遍歷一個 list 的時候，終止條件就不會是 `if (present_node -> next == NULL)` 而是 `while(present_node -> next != list_head)`且所有節點皆可以做為 list 的 head.就我所知 linux kernel 內管理 process 的表格是以 linked list 實作的，如果想殺死現在執行中的所有 process 下 `kill(-1, sig)` 指令勢必得要遍歷整個 list，而環狀的設計就會比較好使用。 ![image alt](https://cdncontribute.geeksforgeeks.org/wp-content/uploads/CircularLinkeList.png) ### `list_for_each_safe` 和 `list_for_each` 的差異在哪？“safe” 在執行時期的影響為何？ ```C= /** * list_for_each - iterate over a list * @pos: the &struct list_head to use as a loop cursor. * @head: the head for your list. */ #define list_for_each(pos, head) \ for (pos = (head)->next; pos != (head); pos = pos->next /** * list_for_each_safe - iterate over a list safe against removal of list entry * @pos: the &struct list_head to use as a loop cursor. * @n: another &struct list_head to use as temporary storage * @head: the head for your list. */ #define list_for_each_safe(pos, n, head) \ for (pos = (head)->next, n = pos->next; pos != (head); \ pos = n, n = pos->next) ``` * list_for_each: * 以 macro function 封裝了遍歷整個 list 的過程 ```C= // 實際使用場景應該如下 list_ele_t *pos = malloc(struct list_ele_t); list_for_each(pos, q->head) { // do something on el } ``` * 相對於 `list_for_each_safe` 這個函數只吃兩個輸入 * list_for_each_safe: * list_for_each_safe一樣實作了整個封裝遍歷的過程: ```C= list_ele_t *pos = malloc(struct list_ele_t); list_ele_t *tmp = malloc(struct list_ele_t); list_for_each ``` * 最大的差別之處在於在對 `pos` 進入該次 iteration 操作之前會先以 temp 把下一節點位址記錄下來，這樣做的好處是防止可能在下面的操作對 pos 做出了不可預期的錯誤，像是直接 free 掉或是把 pos 只向別處。 ### for_each 風格的開發方式對程式開發者的影響為何？ > 提示：對照其他程式語言，如 Perl 和 Python ```Python= my_list = [1, 2, 3] # 我對 Perl 並不熟悉以下以 Python 舉例 # 在 Python 內想做到 for_each 等價的操作，可以有下列實作方式 # 1. simple for-loop for i, el in enumerate(my_list): do_something(i, el) # 2. map, reduce 和 filter # Python 提供這類的內建函數可以方便地對每個元素操作，以下以 map 作為舉例 map(lambda x: x**2, my_list) # 3. 以 Function 的形式 # 和第一類最大的差異在於多了一層 wrapper 使用者看到的只是把想對每個元素做 # 的動作包在一起丟進去，我想 map, reduce 和 filter 的內部實作也是這樣 def ForEach(list, function): for i, el in enumerate(list): function(i, el) ``` 主要最大的影響在於從最直觀的，自己實作遍歷整個 list 改為把函數作為參數引入。 ### 程式註解裡頭大量存在 @ 符號，這有何意義？你能否應用在後續的程式開發呢？在軟體開發的過程中一份好的文件敘述，讓使用者不僅了解設計者的巧思，也可以讓使用者在使用過程中，利用得更好，在不同語言存在許多自動化的文件產生器，而 doxygen 提供了在 C, Objective-C, C#, PHP, Java, Python, IDL (Corba, Microsoft, and UNO/OpenOffice flavors), Fortran, VHDL, Tcl 這類語言的文件自動化，從 doxygen 的文件可以看出 `@`表示的是一個函數的輸入以及其代表的相對意義。 ### tests/ 目錄底下的 unit test 的作用為何？就軟體工程來說的精神為何？ ```C= #include <assert.h> #include "list.h" #include "common.h" int main(void) { struct list_head testlist; struct listitem item1; struct listitem item2; struct listitem item3; struct listitem item4; struct listitem *item = NULL; size_t i; item1.i = 1; item2.i = 2; item3.i = 3; item4.i = 4; INIT_LIST_HEAD(&testlist); assert(list_empty(&testlist)); list_add(&item1.list, &testlist); list_add(&item3.list, &item1.list); list_add(&item2.list, &item1.list); list_add(&item4.list, &item3.list); i = 1; list_for_each_entry (item, &testlist, list) { assert(item->i == i); i++; } assert(i == 5); return 0; } ``` 如老師在上課過程中一在強調，在工業上一個軟體的安全性以及正確性是必須要被要求的，而這樣的單元測試就是以一個自動化的手段測試我們實作的函式，是否真如我們希望有那樣的輸入輸出，另一方面也測試寫出來的程式在面對意想不到的輸入會有怎樣的行為。 ### tests/ 目錄的 unit test 可如何持續精進和改善呢？對我來說，我會想要改進下列幾點： * 我會想要有一個像是 lab0-c 那樣的 CLI 介面可以和使用者互動 * 仔細觀察每一個測試程式碼會發現在每個測試檔案中有多個部分是重複的，若是以像 google-test 那樣進行封裝，想必可以以更簡短的形式呈現