2018q3 Homework3 (list)

# 2018q3 Homework3 (list) contributed by < [`butastur-rtos`](https://github.com/butastur-rtos) > ###### tags: `homework3` `list` `sysprog2018` * 預備知識 * 作業要求 ## 預備知識 * \_\_extension__ * [typeof](https://gcc.gnu.org/onlinedocs/gcc/Typeof.html) * doubly-linked list ### \_\_extension__ ### typeof * What's the different between `__typeof__` and `typeof`? * [__builtin_types_compatible_p](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) * [BUILD_BUG_ON_ZERO](https://github.com/torvalds/linux/blob/master/include/linux/build_bug.h#L23) * [__must_be_array](https://github.com/torvalds/linux/blob/master/tools/include/linux/compiler-gcc.h#L21) * [__same_type](https://github.com/torvalds/linux/blob/master/tools/include/linux/compiler.h#L25) ### [doubly-linked list ](https://www.cs.cmu.edu/~guna/15-123S11/Lectures/Lecture11.pdf) ## [作業要求](https://hackmd.io/s/SyVPDd1cX) * 為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？ * Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量 * GNU extension 的 typeof 有何作用？在程式碼中扮演什麼角色？ * 解釋以下巨集的原理 ```clike #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ``` * 除了 add 和 delete 操作，[list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) 還定義一系列操作，為什麼呢？這些有什麼益處？ * LIST_POISONING 這樣的設計有何意義？ * linked list 採用環狀是基於哪些考量？ * list_for_each_safe 和 list_for_each 的差異在哪？“safe” 在執行時期的影響為何？ * for_each 風格的開發方式對程式開發者的影響為何？ * 程式註解裡頭大量存在 @ 符號，這有何意義？能否應用在後續的程式開發呢？ * tests/ 目錄底下的 unit test 的作用為何？就軟體工程來說的精神為何？ * tests/ 目錄的 unit test 可如何持續精進和改善呢？ * 改寫 Homework1: lab0 使其成為 doubly-linked list 且充分考慮到 circular * 附上修改過程，也需要讓 qtest 得以支援 * 將 dict 裡頭的測試資料拿來作效能分析的輸入 ### 為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？ macro 是在 compile 的時候作 replace ### Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量 [ [source] ](https://github.com/torvalds/linux/blob/master/net/ipv4/fib_lookup.h#L9) ```clike struct fib_alias { struct hlist_node fa_list; ... }; ``` [ [source] ](https://github.com/torvalds/linux/blob/master/net/ipv4/fib_trie.c#L2661) ```clike struct key_vector { ... union { /* This list pointer if valid if (pos | bits) == 0 (LEAF) */ struct hlist_head leaf; ... }; }; ... static int fib_route_seq_show(struct seq_file *seq, void *v) { struct fib_alias *fa; struct key_vector *l = v; ... hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) { ... } ... } ``` 以上的 code 有幾個地方要考慮一下: * hlist_for_each_entry_rcu 會透過 l->leaf 的 address 取得什麼? * fa 作為 loop 的 cursor，是屬於哪一個 collection 的 cursor? ### 解釋以下巨集的原理 ```clike #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ``` 解釋巨集 container_of 之前先解釋以下兩件事 * typeof * [offsetof](https://github.com/torvalds/linux/blob/master/scripts/kconfig/list.h#L10) 從 typeof 說起，舉 2 個例子宣告 ptr_x 為一個指標，這個指標所指向的 type 和 x 一樣 ```clike int x; typeof(&x) ptr_x; ``` __builtin_types_compatible_p 是一個 built-in function，如果 2 個 type 一樣就返回 1 ```clike __builtin_types_compatible_p(typeof(x), int) ``` 接著解釋 offsetof ```clike #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) ``` ```clike struct node { int a; char b; }; void main(int argc, char **argv) { int offset = offsetof(struct node, b); printf("%d\r\n", offset); // this will be 4 printf("sizeof(int):%d\r\n", sizeof(int)); } ``` 為什麼上面的輸出會是 4 呢? 因為 sizeof(int) 是 4 個 bytes, offsetof 這個 macro 是計算 TYPE 裡要 offset 幾個 byte [ [offsetof] ](https://zh.wikipedia.org/wiki/Offsetof) But, 這樣理解 offsetof 是不夠的, 要從 [C99規格書](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) 來看 offsetof 的規範 [7.17 Common definitions](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf): :::info offsetof(type, member-designator) which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type) ::: 以 [C99規格書](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) 重新審視一下 offsetof，就是取得一個 size_t，代表這一個 member 在 structure 裡，距離這個 structure 的 beginning 有幾個 bytes。 [offsetof macro](http://blog.linux.org.tw/~jserv/archives/001399.html) 對 offsetof 有更仔細的說明，透過這篇文章，再來重新看一下這個巨集 ```clike #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) ``` :::info Nigel Jones 對 offsetof macro 作了一步一步的說明，重新整理一下如下： * ((TYPE *)0) 把 0 當成是一個指向 TYPE 的 pointer 的 value * ((TYPE *)0)->MEMBER ((TYPE *)0)是一個 pointer，指向0 這個 address，((TYPE *)0)->MEMBER 就是透過這個 pointer 指向 structure member (在本例就是MEMBER) * &((TYPE *)0)->MEMBER) 就是 structure member 的 address * ((size_t) &((TYPE *)0)->MEMBER) 對 structure member 作 cast ::: :::warning :question: ((<b>size_t</b>) &((TYPE *)0)->MEMBER) 把 structure member cast 成 <b>size_t</b> 為什麼可以表示 member 到 beginning of its structure 有幾個 bytes ? ::: 先看一下沒有 \_\_extension__ 的情況 [ [source] ](https://github.com/torvalds/linux/blob/master/scripts/kconfig/list.h#L12) ```clike /** * container_of - cast a member of a structure out to the containing structure * @ptr: the pointer to the member. * @type: the type of the container struct this is embedded in. * @member: the name of the member within the struct. * */ #define container_of(ptr, type, member) ({ \ const typeof( ((type *)0)->member ) *__mptr = (ptr); \ (type *)( (char *)__mptr - offsetof(type,member) );}) ``` :::info container_of 的目的是用三個參數來取得一個 struct 的address * pointer to the member * 這個 member 所位於的 struct 是什麼 type * 這個 member 的名字 ::: 再講一件事就可以開始解釋 container_of 的原理了，就是==對 pointer 減 1== 實際上發生了什麼事? 看一下以下的範例 ```clike int a = 111; char b = 'a'; int *ptr_int = &a; char *ptr_char = &b; printf("ptr_int: %p\r\n", ptr_int); ptr_int++; printf("ptr_int: %p\r\n\r\n", ptr_int); printf("ptr_char: %p\r\n", ptr_char); ptr_char++; printf("ptr_char: %p\r\n", ptr_char); ``` ```shell ptr_int: 0x7ffd38371d14 ptr_int: 0x7ffd38371d18 ptr_char: 0x7ffd38371d13 ptr_char: 0x7ffd38371d14 ``` :::info * 隨著 pointer 所指向的 object 的 sizeof 不一樣，減 1 所造成的 offset 也不一樣。 * 直接對 pointer to member 減掉 offsetof 得到的偏移量 (offsetof 的單位是 byte)，不會得到一定正確的結果 ::: 以下解釋 container_of 的原理 * member 的名字 * 是給 ==typeof== 使用，用來宣告 pointer to the member。這個新的 pointer 之後會被 cast to char *，因為先前提到的 offsetof 計算出的偏移量，==單位是 byte==。 * 給 ==offsetof== 使用，取得 member 到 structure 的開頭有幾個 bytes，是一個以 byte 為單位的偏移量 * 有了 offsetof 這個 macro，就可以用 pointer to the member 來取得 member 到 structure 的開頭有幾個 bytes，把這個 pointer to the member 的 value (是 value 而不是 address) ==cast to char *==(理由先前有提到)，然後減掉這個 offset 就可以取得這個 structure 的 address。 * 簡單來說，就是==把一個指向 structure member 的 char * 減掉 offset 就可以取得 address of structure==。 ### LIST_POISONING 這樣的設計有何意義？ * non-initialized list entries * [LIST_POISON1](https://github.com/torvalds/linux/blob/master/include/linux/poison.h#L23) #### non-initialized list entries #### [LIST_POISON1](https://github.com/torvalds/linux/blob/master/include/linux/poison.h#L23) 對 0x100 這個 address 作 dereference 會發生什麼事? 會[產生一個 page fault exception](https://github.com/torvalds/linux/blob/master/include/linux/poison.h#L19) [source](https://github.com/torvalds/linux/blob/master/include/linux/list.h#L123) ```clike static inline void list_del(struct list_head *entry) { __list_del_entry(entry); entry->next = LIST_POISON1; entry->prev = LIST_POISON2; } ``` 但是這樣的設計是為了什麼? [source](https://github.com/torvalds/linux/blob/master/include/linux/poison.h#L19) ```shell These are non-NULL pointers that will result in page faults under normal circumstances, used to verify that nobody uses non-initialized list entries. ``` 底下用一段code來實驗 uses non-initialized list entries 會發生什麼事 ```clike #include <stdio.h> void main(int argc, char **argv) { char * ptr = ((void *) 0x100); printf("%c\r\n", *ptr); } ``` 以上這段的執行結果會是 ```shell $ gcc test1.c -o test1 $ ./test1 程式記憶體區段錯誤 (核心已傾印) ``` 至此，可以說 LIST_POISONING 設計的意義就如同 [註解](https://github.com/torvalds/linux/blob/master/include/linux/poison.h#L19) 說的那樣，是用來==驗證未初始化的 list entry 不會被使用到==，如果用到就產生一個 page fault exception ### linked list 採用環狀是基於哪些考量？ * 什麼是環狀? ### list_for_each_safe 和 list_for_each 的差異在哪？“safe” 在執行時期的影響為何？ ### 程式註解裡頭大量存在 @ 符號，這有何意義？能否應用在後續的程式開發呢？底下示範 @ 如何應用在文件以及如何在開發的過程中發生 @ 的意義 ### 改寫 Homework1: lab0 使其成為 doubly-linked list 且充分考慮到 circular * [doubly-linked list ](https://www.cs.cmu.edu/~guna/15-123S11/Lectures/Lecture11.pdf) * 參考 [list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) * 增加 prev 這個 node * 考慮 insert / remove node 時 prev 要另外作什麼事? * 充分考慮到 circular 先增加一個 prev，指向 list 裡==每一個 node 的前一個 node== ```clike typedef struct ELE { char *value; struct ELE *next; struct ELE *prev; } list_ele_t; ``` 先改寫 q_insert_tail，這個 function 是 insert 到 list 的 tail，所以新增的 node 的 prev 要指向 tail 所指向的 node ```clike bool q_insert_tail(queue_t *q, char *s) { if (q == NULL) { return false; } /* You need to write the complete code for this function */ /* Remember: It should operate in O(1) time */ /* newt means new tail */ list_ele_t *newt; newt = malloc(sizeof(list_ele_t)); if (newt != NULL) { newt->next = NULL; newt->value = strdup(s); newt->prev = q->tail; if (q->size == 0) q->head = newt; else q->tail->next = newt; q->tail = newt; q->size++; } return newt != NULL; } ``` 但 `make test` 的結果是 ```shell +++ TESTING trace trace-06-string: # Test of truncated strings ERROR: Segmentation fault occurred. You dereferenced a NULL or invalid pointer --- trace-06-string 0/7 ``` q_reverse 在 size > 1 的時候才有意義 ```clike void q_reverse(queue_t *q) { if (q != NULL && q->size > 1) { ... } ... } ``` 接著改寫 q_insert_head，第一個 element (就是 newh) 的 prev 先指向 NULL，等一下再考慮如果不指向 NULL 的情況 ```clike bool q_insert_head(queue_t *q, char *s) { ... if (newh != NULL) { ... newh->prev = NULL; ... } ... } ``` `make test` 的結果 ```shell --- trace-15-perf 7/7 --- TOTAL 100/100 ``` :::info 還剩下兩個 function 要改寫 * q_remove_head * q_reverse ::: 接下來改寫 q_remove_head remove head 之後，把 head 的 prev 指向 NULL ```clike bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { ... if(q->head != NULL) q->head->prev = NULL; return true; } ``` `make test` 的結果 ```shell --- trace-15-perf 7/7 --- TOTAL 100/100 ``` 接下來改寫 q_reverse circular 的情況 :::danger 持續更新中... :::