2018q3 Homework3 (list)

# 2018q3 Homework3 (list) contributed by <[amikai](https://github.com/amikai/lab0-c/tree/list)> --- # 自我檢查表 * 為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？ * Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量 * GNU extension 的 [typeof](https://gcc.gnu.org/onlinedocs/gcc/Typeof.html) 有何作用？在程式碼中扮演什麼角色？ * 解釋以下巨集的原理 ```Clike #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ``` * 除了你熟悉的 add 和 delete 操作，`list.h` 還定義一系列操作，為什麼呢？這些有什麼益處？ * `LIST_POISONING` 這樣的設計有何意義？ * linked list 採用環狀是基於哪些考量？ * `list_for_each_safe` 和 `list_for_each` 的差異在哪？"safe" 在執行時期的影響為何？ * for_each 風格的開發方式對程式開發者的影響為何？ * 提示：對照其他程式語言，如 Perl 和 Python * 程式註解裡頭大量存在 `@` 符號，這有何意義？你能否應用在後續的程式開發呢？ * 提示: 對照看 Doxygen * `tests/` 目錄底下的 unit test 的作用為何？就軟體工程來說的精神為何？ * `tests/` 目錄的 unit test 可如何持續精進和改善呢？ * 從 [linux-list](https://github.com/sysprog21/linux-list) 學習裡頭的技巧，包含裡頭 unit test 的設計，之後改寫 [Homework1: lab0](https://hackmd.io/s/BJp_jq-tm) 的程式碼，需要重新 fork，命名為 ==lab-list==，使其成為 doubly-linked list 且充分考慮到 circular * 附上你的修改過程，也需要讓 `qtest` 得以支援 * 可將 [dict](https://hackmd.io/s/B1Bq5Jat7) 裡頭的測試資料拿來作效能分析的輸入 # 為何 Linux 採用 macro 來實作 linked list? function call 為了要記住跳回來的位址, 會將 return address 推入堆疊, 在參數的傳遞上有可能也會使用到 stack. 就算使用 `inline` 關鍵字, 也是建議編譯器展開, 最終決定權還是在編譯器, 所以使用 macro 可以省下這些成本. :::info 問題: 雖然有些操作使用 macro 來實作, 有些使用 function call, 為何不全部統一用 macro 就好 ::: # linked list 使用場景 TODO # typeof 和 container of 意義 ## typeof typeof 可以讓你在編譯時期就把某些變數的型態展開, 所以可以寫出以下程式: ```clike int a = 1; typeof(a) b = 2; // int b = 2 ``` linux 也使用它來做編譯時期的型態檢查, 可以看到[程式碼](https://elixir.bootlin.com/linux/latest/source/include/linux/typecheck.h#L9) ```clike= #define typecheck(type,x) \ ({ type __dummy; \ typeof(x) __dummy2; \ (void)(&__dummy == &__dummy2); \ 1; \ }) ``` 使用方法範例如下, 檢查 x 是不是 double 型態: ```clike int x = 1; typecheck(dobule, x); ``` 編譯之後馬上得到警告訊息: ``` test.c: In function main : test.c:5:22: warning: comparison of distinct pointer types lacks a cast (void)(&__dummy == &__dummy2); \ ^ test.c:12:5: note: in expansion of macro typecheck typecheck(double , x); ^~~~~~~~~ ``` 可以看到 typecheck 可以抓出 x 的型態, 並且填入你想檢查的型態, 因為兩個不同的指標型態不能比較, 若是此兩者型態不同則會發生編譯警告, 而且在 compile time 就可以得知 ## offsetof 要解釋 `container_of` 原理還要先懂 `offsetof`, offsetof 的用法就是給定成員的型態 , 以及成員的名稱, 則會回傳: 成員的位址 - struct 的起始位址 ![](https://i.imgur.com/DYiZ1sd.jpg) [offsetof source code](https://elixir.bootlin.com/linux/latest/source/include/linux/stddef.h#L19) ```clike #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER) ``` 這段程式碼我覺得最神奇的就是, 為什麼可以對 0 地址做一些操作, 並不會被 mmu 所阻擋: 所以我寫了一隻程式 ``` struct employee { int age; int salary; }; int x = offsetof(struct employee, salary); ``` 對應到的反組譯 ``` 0x00000000000005fe <+4>: movl $0x4,-0x4(%rbp) ``` 看來邊一起都算好了是 4 就直接放進 x 了, 所以程式根本就沒有去讀 0 的位址, 所以也不會被 mmu 阻擋 :::info 問題: 為什麼 `(size_t)&((TYPE *)0)->MEMBER` 不會真的讀到 0 的記憶體位址? ::: 如果要我寫 `offsetof` 我只能寫出這種程式: ```clike= #define offsetof(TYPE, MEMBER) ({ \ TYPE __dummy; \ (size_t)((char *)&__dummy.MEMBER - (char *)&__dummy);}) ``` :::info 問題: 我的 offsetof 和 linux 的實做差異在哪裡 ::: ## cotainer_of `container_of` 用途: 給定成員的位址, struct 的型態, 成員的名稱, `container_of` 則會回傳此 struct 的位址, 以下是示意圖 ![](https://i.imgur.com/IgayoN9.jpg) `container_of` 在 linux linked list 是怎麼被使用的 ![](https://i.imgur.com/6h0Bgax.jpg) 我們自己常常在寫 list 的時候都是寫成上面那種型態, 但是 linux 是用下面的方式達成的, 你拿著 struct list_head 指標要怎得到自己的 data? 所以 linux 就撰寫了 `container_of` macro 讓你只要給定成員位址和 node 的型態, 就可以拿到指向 node 的指標, 再使用此指標指向 data 舉例: 你想要寫入下一個 node 的 data, 如果是上面那一個做法只要這樣就可以拿出資料: ```clike node->next->data = 1; ``` 如果是下面那個方式你就得向 `container_of` 求救了 ```clike container_of(list->next, struct ele, list)->data = 1; ``` 稍微做個總結 - `container_of`: 給定成員的位址, struct 的型態, 成員的名稱, `container_of` 則會回傳此 struct 的位址 - `offsetof`: 給定成員的型態 , 以及成員的名稱, 則會回傳: 成員的位址 - struct 的起始位址 - `typeof`: 給定一個變數, 將會在編譯時期展開成此變數的型態 `contain_of` 程式碼 ```Clike #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ``` 一步一步展開: - `__typeof__(((type *) 0)->member)` 是為了要得到這個 member 的型態並用來宣告- - - `__pmember` - `((char *) __pmember - offsetof(type, member))` 先用 char * 作轉型是為了在相減時拿到真正的記憶體真正相減的數字, 而不是偏移量 - 拿到此結構的位址之後, 在用結構的型態 (type *) 強制轉型 # 除了你熟悉的 add 和 delete 操作，`list.h` 還定義一系列操作，為什麼呢？這些有什麼益處？ TODO # `LIST_POISONING` 這樣的設計有何意義？可以看到[程式碼以及註解](https://github.com/ecsv/linux-like-list/blob/126c6625758476cf0095e9aa46bebf26eafb2729/list.h#L186) ```clike= /** * list_del() - Remove a list node from the list * @node: pointer to the node * * The node is only removed from the list. Neither the memory of the removed * node nor the memory of the entry containing the node is free'd. The node * has to be handled like an uninitialized node. Accessing the next or prev * pointer of the node is not safe. * * Unlinked, initialized nodes are also uninitialized after list_del. * * LIST_POISONING can be enabled during build-time to provoke an invalid memory * access when the memory behind the next/prev pointer is used after a list_del. * This only works on systems which prohibit access to the predefined memory * addresses. */ static __inline__ void list_del(struct list_head *node) { struct list_head *next = node->next; struct list_head *prev = node->prev; next->prev = prev; prev->next = next; #ifdef LIST_POISONING node->prev = (struct list_head *)(0x00100100); node->next = (struct list_head *)(0x00200200); #endif } ``` 可以知道此函數沒有真正的將這個 node 的記憶體釋放掉, 而是直接將這個 node 的 prev 及 next 成員分別指向 0x00100100 和 0x00200200 :::warning 為什麼是指向 0x00100100 和 0x00200200呢 ::: # linked list 採用環狀是基於哪些考量？在這則 stackoverflow 文章: [Why does the Linux kernel use circular doubly linked lists to store the lists of processes? ](https://stackoverflow.com/questions/46089965/why-does-the-linux-kernel-use-circular-doubly-linked-lists-to-store-the-lists-of) 提到: 如果每次都從頭搜尋是一個很耗時間, circular linked list 允許你以任何一個結點開使進行搜尋, 假設你手上有某個節點的位址, 你已經知道你需要搜尋的目標節點在附近但是又不太確定, 你可以從你手上的指標開始搜尋, 假設在附近就很快速找到, 不在附近最後也一定找得到只是花了較多的時間. 相比來說若不是使用環狀串列就無法這樣做 # list_for_each_safe 和 list_for_each差異以下是 `list_for_each_safe` , `list_for_each` 和 `list_del` 的程式碼 ```clike= #define list_for_each(node, head) \ for (node = (head)->next; node != (head); node = node->next) #define list_for_each_safe(node, safe, head) \ for (node = (head)->next, safe = node->next; node != (head); \ node = safe, safe = node->next) ``` 為什麼要使用 `list_for_each_safe` 是防止你在 `list_for_each` 裡使用 `list_del` 或是釋放節點資源, 舉例還說: 在迭代 list 時, 如果刪除了 A 節點, A 的下一個節點的位址就拿不到了 (因為需要使用 A->next 才能拿到), 所以使用了一個 `safe` 變數記住下一個節點的位址, 這樣就可以安全的刪除了 # lab0 改寫 ## background 在 linux 裡 circular double linked list 的設計會長這樣, 會有一個頭部將所有的 `entry` 串起來 ![](https://i.imgur.com/h17wwP9.jpg) 第一個元素不是 `head` 而是 `head->next` `list_first_entry` 才會這樣寫(從 head->next 取值): ```clike= #define list_first_entry(head, type, member) \ list_entry((head)->next, type, member) ``` 在之後的改寫都是能盡量使用 linux 的 marco 實作就使用, 以達到練習感受 linux list 的目的 ## data struture 將原本的資料結構改為 linux liked 的資料結構, 用 queue_t 去維護 list 的頭部, 並在 entry 嵌入 `list_head` ```clike= typedef struct ELE { char *value; struct list_head list; } list_ele_t; /* Queue structure */ typedef struct { struct list_head head; int size; } queue_t; ``` ## q_new 使用 linux 提供的 `INIT_LIST_HEAD` 初始化 list ```clike= queue_t *q_new() { queue_t *q = malloc(sizeof(queue_t)); if (q) { INIT_LIST_HEAD(&q->head); q->size = 0; } return q; } ``` ## q_free linux 除了提供 `list_for_each_safe`, 走訪 list 的變數, 每次只能拿到 list 的位址, 並不能直接拿到 entry 的位址, 需要在使用 `container_of` 或是 `list_entry` 才能得到. 其實 linux 額外提供了 `list_for_each_entry_safe` 讓你不需要每次在使用 `container_of` 拿到 entry 的起始位址, 在走訪時就直接把 entry 位址給你 ```clike= void q_free(queue_t *q) { if (!q) return; list_ele_t *visit; list_ele_t *next; list_for_each_entry_safe(visit, next, &q->head, list) { free(visit->value); free(visit); } free(q); } ``` ## q_insert_head 這裡使用了 list_add 直接將 entry 放在 list 最前方 ```clike= bool q_insert_head(queue_t *q, char *s) { if (!q) return false; list_ele_t *newh = (list_ele_t *) malloc(sizeof(list_ele_t)); if (!newh) return false; newh->value = (char *) malloc(strlen(s) + 1); if (!newh->value) { free(newh); return false; } strcpy(newh->value, s); list_add(&newh->list, &q->head); q->size++; return true; } ``` ## q_insert_tail 這裡使用了 list_add 直接將 entry 放在 list 最後方 ```clike= bool q_insert_tail(queue_t *q, char *s) { if (!q) return false; list_ele_t *newh = (list_ele_t *) malloc(sizeof(list_ele_t)); if (!newh) return false; newh->value = (char *) malloc(strlen(s) + 1); if (!newh->value) { free(newh); return false; } strcpy(newh->value, s); list_add_tail(&newh->list, &q->head); q->size++; return true; } ``` ## q_remove_head linux 提供了 list_del 幫助你刪除節點以及 list_first_entry 幫助你拿到第一個 entry 的位址, 所以刪除第一個節點只要上述兩者函數搭配使用即可 ```clike= bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { // if queue is NULL or empty if (!q || q->size == 0) return false; int size; list_ele_t *target_node = list_first_entry(&q->head, list_ele_t, list ); // if sp is non-NULL if (sp) { size = strlen(target_node->value); size = size > bufsize - 1 ? bufsize - 1 : size; strncpy(sp, target_node->value, size); sp[size] = '\0'; } list_del(&target_node->list); q->size--; free(target_node->value); free(target_node); return true; } ``` ## q_reverse 在 circular double linked list 要反轉其實特別簡單, 只要將 prev 和 next 裡的位址進行互換即可, 記住頭部 (head) 裡的位址也要替換 (最後 3 行) ```clike= void q_reverse(queue_t *q) { if (!q || q->size == 0) return; struct list_head *visit, *temp, *safe; list_for_each_safe(visit, safe,&q->head) { temp = visit->next; visit->next = visit->prev; visit->prev = temp; } temp = q->head.next; q->head.next = q->head.prev; q->head.prev = temp; } ``` ## 測試程式的修改目前只是將不相容的程式碼, 用語意相同的方式去替換, qtest 互動介面在使用 new 時會偵測到 list 有 loop, 或有錯誤警告, 但是這是屬於正常狀況, 還在修改中