# 2018q3 Homework3 (list) contributed by < [`p61402`](https://github.com/p61402) > ###### tags: `sysprog` ## Linux 風格的 linked list - [ ] 自我檢查事項: ### 為何 Linux 採用 macro 來實作 linked list?一般的 function call 有何成本? 使用 macro 相對於 function 有幾個優點以及缺點: - 優點 - 較有效率 - 不需傳遞參數 (可能是優點也是缺點) - compiler 會預先處理 - 缺點 - Debug 較為困難 - 可能會使用較多的記憶體 --- ### Linux 應用 linked list 在哪些場合?舉三個案例並附上對應程式碼,需要解說,以及揣摩對應的考量 1. [linux/include/linux/mm_types.h](https://github.com/torvalds/linux/blob/master/include/linux/mm_types.h):**LRU** 在此 header file 中定義了`page`的結構,使用`union`來節省記憶體的使用。 其中一個`struct`宣告了 page cache 以及 anonymous page 的使用,在第 17 行宣告一個`lru`的 linked list 用來實現記憶體管理機制的 least recently used 的演算法。 ```clike= struct page { unsigned long flags; /* Atomic flags, some possibly * updated asynchronously */ /* * Five words (20/40 bytes) are available in this union. * WARNING: bit 0 of the first word is used for PageTail(). That * means the other users of this union MUST NOT use the bit to * avoid collision and false-positive PageTail(). */ union { struct { /* Page cache and anonymous pages */ /** * @lru: Pageout list, eg. active_list protected by * zone_lru_lock. Sometimes used as a generic list * by the page owner. */ struct list_head lru; /* See page-flags.h for PAGE_MAPPING_FLAGS */ struct address_space *mapping; pgoff_t index; /* Our offset within mapping. */ /** * @private: Mapping-private opaque data. * Usually used for buffer_heads if PagePrivate. * Used for swp_entry_t if PageSwapCache. * Indicates order in the buddy system if PageBuddy. */ unsigned long private; }; // (以下略) ``` 名詞解釋 (參考資料:[Linux Memory Managment FAQ](https://landley.net/writing/memory-faq.txt)): - **page cache** 又稱 disk cache,基本上就是用來作為硬碟的快取,在進行 disk I/O 時會先檢查是否存在 page cache,若沒找到才會對 disk 進行讀寫。 - **anonymous page** 這些分頁沒有對應到實際的檔案或裝置,像是程式運行時使用到的 stack 及 heap 就是屬於此種類型。 RLU 在 Linux 使用 2 種 list 實作,分別是`active_list`以及`inactive list`,其中`active_list`用來保存最近存取過的 page,而`inactive_list`則是用來保存最近沒有被存取過的 page。(參考資料:[Page Frame Reclamation](https://www.kernel.org/doc/gorman/html/understand/understand013.html)) 可以查看`/proc/meminfo`檔案來得知記憶體使用狀況。 ```bash= $ cat /proc/meminfo MemTotal: 8049140 kB MemFree: 5242392 kB MemAvailable: 6854304 kB Buffers: 565792 kB Cached: 1030720 kB SwapCached: 0 kB Active: 1583232 kB Inactive: 683464 kB Active(anon): 671484 kB Inactive(anon): 17672 kB Active(file): 911748 kB Inactive(file): 665792 kB Unevictable: 228 kB # 以下略 ``` 2. [linux/kernel/event/core.c](https://github.com/torvalds/linux/blob/master/kernel/events/core.c) 將 event 依照 cgroup 重新排程。 ```clike= static DEFINE_PER_CPU(struct list_head, cgrp_cpuctx_list); #define PERF_CGROUP_SWOUT 0x1 /* cgroup switch out every event */ #define PERF_CGROUP_SWIN 0x2 /* cgroup switch in events based on task */ /* * reschedule events based on the cgroup constraint of task. * * mode SWOUT : schedule out everything * mode SWIN : schedule in based on cgroup for next */ static void perf_cgroup_switch(struct task_struct *task, int mode) { struct perf_cpu_context *cpuctx; struct list_head *list; unsigned long flags; /* * Disable interrupts and preemption to avoid this CPU's * cgrp_cpuctx_entry to change under us. */ local_irq_save(flags); list = this_cpu_ptr(&cgrp_cpuctx_list); list_for_each_entry(cpuctx, list, cgrp_cpuctx_entry) { WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0); perf_ctx_lock(cpuctx, cpuctx->task_ctx); perf_pmu_disable(cpuctx->ctx.pmu); if (mode & PERF_CGROUP_SWOUT) { cpu_ctx_sched_out(cpuctx, EVENT_ALL); /* * must not be done before ctxswout due * to event_filter_match() in event_sched_out() */ cpuctx->cgrp = NULL; } if (mode & PERF_CGROUP_SWIN) { WARN_ON_ONCE(cpuctx->cgrp); /* * set cgrp before ctxsw in to allow * event_filter_match() to not have to pass * task around * we pass the cpuctx->ctx to perf_cgroup_from_task() * because cgorup events are only per-cpu */ cpuctx->cgrp = perf_cgroup_from_task(task, &cpuctx->ctx); cpu_ctx_sched_in(cpuctx, EVENT_ALL, task); } perf_pmu_enable(cpuctx->ctx.pmu); perf_ctx_unlock(cpuctx, cpuctx->task_ctx); } local_irq_restore(flags); } ``` 3. [linux/include/linux/sched.h](https://github.com/torvalds/linux/blob/master/include/linux/sched.h#L446) linux kernel 定義了 process 在排程時的單位`sched_entity`,其中在第 6 行宣告了一個`linked_head`叫做`group_node`。 ```clike= struct sched_entity { /* For load-balancing: */ struct load_weight load; unsigned long runnable_weight; struct rb_node run_node; struct list_head group_node; unsigned int on_rq; u64 exec_start; u64 sum_exec_runtime; u64 vruntime; u64 prev_sum_exec_runtime; u64 nr_migrations; struct sched_statistics statistics; #ifdef CONFIG_FAIR_GROUP_SCHED int depth; struct sched_entity *parent; /* rq on which this entity is (to be) queued: */ struct cfs_rq *cfs_rq; /* rq "owned" by this entity/group: */ struct cfs_rq *my_q; #endif #ifdef CONFIG_SMP /* * Per entity load average tracking. * * Put into separate cache line so it does not * collide with read-mostly values above. */ struct sched_avg avg; #endif }; ``` linux kernel 目前對於 process 的排程採用 [Completely Fair Scheduler (CFS)](https://en.wikipedia.org/wiki/Completely_Fair_Scheduler) 演算法,透過計算每個 process 在理想處理器下的 **maximum execution time** 來決定執行順序,此演算法使用紅黑樹實作。 排程器決定執行哪個 process 將被執行的流程如下: 1. The leftmost node of the scheduling tree is chosen (as it will have the lowest spent execution time), and sent for execution. 2. If the process simply completes execution, it is removed from the system and scheduling tree. 3. If the process reaches its maximum execution time or is otherwise stopped (voluntarily or via interrupt) it is reinserted into the scheduling tree based on its new spent execution time. 4. The new leftmost node will then be selected from the tree, repeating the iteration. --- ### GNU extension 的 [typeof](https://gcc.gnu.org/onlinedocs/gcc/Typeof.html) 有何作用?在程式碼中扮演什麼角色? `typeof`是 C 語言的一個 extension,`typeof`的參數兩種形式:expression 或 type。 若使用 expression,則該 expression 不會被執行,只會得到該 expression 的類型,例如: ```clike= int func() { return 5; } int main() { type(func()) var; } ``` 使用 gdb 檢查變數`var`的型別: ```bash (gdb) b main Breakpoint 1 at 0x4004ea: file typeof_test.c, line 19. (gdb) r Starting program: /home/qoo/Documents/jserv/typeof_test Breakpoint 1, main () at typeof_test.c:19 19 } (gdb) ptype var type = int ``` 若使用 type 當作參數: ```clike #define pointer(T) typeof(T*) ``` `pointer(T)`會回傳`T`所指向的記憶體位址。 ```clike= double a = 2.0; pointer(double) p1; p1 = &a; printf("%f\n", *p1); int b = 4; pointer(int) p2; p2 = &b; printf("%d\n", *p2); ``` ```bash # output 2.000000 4 ``` 甚至可以用在 macro 時宣告變數,自定義參數的類型,達到 overloading 的效果: ```clike= #define max(a,b) \ ({ typeof (a) _a = (a); \ typeof (b) _b = (b); \ _a > _b ? _a : _b; }) ``` ```clike= int x1 = 5, y1 = 3; printf("%d\n", max(x1, y1)); char x2 = 'a', y2 = 'b'; printf("%c\n", max(x2, y2)); ``` ```bash # output 5 b ``` --- ### 解釋以下巨集的原理 ```Clike= #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ``` 首先在第 3 行,`((type *) 0)`將`0`強制轉為`type`這個 structure 的型別,而`(((type *) 0)->member)`指向`member`這個成員的位址,再利用`typeof`取得此成員的型別,宣告一個此成員型別的指標`_pmember_`,指派為`ptr`的記憶體位址。 根據 man page 中對[`offsetof`](http://man7.org/linux/man-pages/man3/offsetof.3.html)的定義: ```clike= #include <stddef.h> size_t offsetof(type, member); ``` - The macro offsetof() returns the offset of the field member from the start of the structure type. 因此`offsetof`會回傳struct `type`與其成員`member`的 offset。 以下例子: ```clike= #include <stddef.h> #include <stdio.h> #include <stdlib.h> int main(void) { struct s { int i; char c; double d; char a[]; }; /* Output is compiler dependent */ printf("offsets: i=%zd; c=%zd; d=%zd a=%zd\n", offsetof(struct s, i), offsetof(struct s, c), offsetof(struct s, d), offsetof(struct s, a)); printf("sizeof(struct s)=%zd\n", sizeof(struct s)); exit(EXIT_SUCCESS); } ``` ```bash offsets: i=0; c=4; d=8 a=16 sizeof(struct s)=16 ``` 可以看到`int i`佔了 4 bytes,而由於 alignment 的關係,`char c`雖然只有 1 byte 的大小,卻佔了 4 bytes的空間,`double d`佔了 8 bytes,`char a[]`則不佔任何空間,整個`struct s`共佔 16 bytes。 因此在第 4 行,`offsetof(type, member)`可以取得`member`這個成員對於`type`這個結構的 offset。 再將剛剛得到的`member`這個成員的記憶體位址`__pmember`,減去與此 structure 的 offset `offsetof(type, member)`,就可以得到此結構的起始位址,因此`container_of`的用處是在給定 structure 中某個成員的位址時,可以計算出此 structure 的起始位址。 舉個例子: ```clike= struct s { int i; char c; double d; char a[]; } example; double *d_ptr = &example.d; struct s *start_address = container_of(d_ptr, struct s, d); ``` 使用 gdb 查看: ```bash (gdb) p start_address $3 = (struct s *) 0x7fffffffdce0 (gdb) p &example $4 = (struct s *) 0x7fffffffdce0 ``` --- ### 除了你熟悉的 add 和 delete 操作,`list.h` 還定義一系列操作,為什麼呢?這些有什麼益處? 在`list.h`中除了常用的`__list_add`與`__list_del`函式之外,還有`list_replace`、`list_empty`、`list_cut_position`等函式,況且同樣性質的函式也定義好幾種,例如:`list_empty`及`list_empty_careful`等等。 在開頭的注釋提到,使用 doubly linked list 在於能夠善加利用 next/prev 兩個 entry,因此才會需要實作更多的 function 讓 doubly linked list 發揮出應有的實力。 ``` Some of the internal functions ("__xxx") are useful when manipulating whole lists rather than single entries, as sometimes we already know the next/prev entries and we can generate better code by using them directly rather than using the generic single-entry routines. ``` --- ### `LIST_POISONING` 這樣的設計有何意義? 在`list_del`函式中,將已刪除的 node 的`prev`以及`next`指標都設為預先配置的記憶體,當`list_del`執行完成,往後若嘗試存取此段記憶體,都會導致存取錯誤。 ```c= static inline void list_del(struct list_head *node) { struct list_head *next = node->next; struct list_head *prev = node->prev; next->prev = prev; prev->next = next; #ifdef LIST_POISONING node->prev = (struct list_head *) (0x00100100); node->next = (struct list_head *) (0x00200200); #endif } ``` --- ### linked list 採用環狀是基於哪些考量? 1. 任何一個 node 都可以是起始點,可以從任何一個 node 開始 traverse 整個 list,只要在遇到起始點時停下即可。 2. 實作 queue 比較方便,不需要 maintan 兩個 pointer (front 跟 rear)。 3. 實作某些需要重複尋訪整個 list 的應用,例如:Round Robin 演算法 (RR)。 4. 用來實作高等資料結構,例如:Fibonacci Heap。 --- ### `list_for_each_safe` 和 `list_for_each` 的差異在哪?"safe" 在執行時期的影響為何? 先說結論,在 delete element 時`list_for_each`會出問題,而`list_for_each_safe`則不會。 list 的 delete element 主要由三個函式組成: ```c= /* * Delete a list entry by making the prev/next entries * point to each other. * * This is only for internal list manipulation where we know * the prev/next entries already! */ static inline void __list_del(struct list_head * prev, struct list_head * next) { next->prev = prev; WRITE_ONCE(prev->next, next); } /** * list_del - deletes entry from list. * @entry: the element to delete from the list. * Note: list_empty() on entry does not return true after this, the entry is * in an undefined state. */ static inline void __list_del_entry(struct list_head *entry) { if (!__list_del_entry_valid(entry)) return; __list_del(entry->prev, entry->next); } static inline void list_del(struct list_head *entry) { __list_del_entry(entry); entry->next = LIST_POISON1; entry->prev = LIST_POISON2; } ``` 大致流程如下: `list_del`呼叫`__list_del_entry`,`__list_del_entry`再呼叫`__list_del`,傳入的參數是`entry`的前一個及後一個 node,而`__list_del`再透過將兩個 node 連接將`entry`刪除,其中的`Writee_Once`函式是為了確保此操作為 atomic。 最後會將`entry`的`prev`及`next`指標都指向`LIST_POISON`,當往後存取此兩個指標時會產生 page fault 的錯誤,避免任何人再對`entry`進行操作。 其定義可在 [linux/include/linux/poison.h](https://github.com/torvalds/linux/blob/master/include/linux/poison.h) 中找到: ```c= /* * These are non-NULL pointers that will result in page faults * under normal circumstances, used to verify that nobody uses * non-initialized list entries. */ #define LIST_POISON1 ((void *) 0x100 + POISON_POINTER_DELTA) #define LIST_POISON2 ((void *) 0x200 + POISON_POINTER_DELTA) ``` 再來看`list_for_each`以及`list_for_each_safe`: ```c= /** * list_for_each - iterate over a list * @pos: the &struct list_head to use as a loop cursor. * @head: the head for your list. */ #define list_for_each(pos, head) \ for (pos = (head)->next; pos != (head); pos = pos->next) ``` ```c= /** * list_for_each_safe - iterate over a list safe against removal of list entry * @pos: the &struct list_head to use as a loop cursor. * @n: another &struct list_head to use as temporary storage * @head: the head for your list. */ #define list_for_each_safe(pos, n, head) \ for (pos = (head)->next, n = pos->next; pos != (head); \ pos = n, n = pos->next) ``` 在`list_for_each`中在迴圈結束後,由於已經將`pos`的前後都指向 POISON ,`pos = pos->next`就會成為一個非法的操作。 而在`list_for_each_safe`中,新增了一個參數`n`用來保存指向的下一個元素,因此迴圈結束後,`pos`會指向`n`,不會導致找不到後續元素的狀況發生。 --- ### for_each 風格的開發方式對程式開發者的影響為何? 提示:對照其他程式語言,如 Perl 和 Python 在許多程式語言都有 [for each loop](https://en.wikipedia.org/wiki/Foreach_loop) 可以用,但可惜的是 C 語言本身並沒有提供,但可以使用 macro 來宣告,例如迭代`string`中的每個`char`: ```c= #define foreach( ptrvar, strvar ) char* ptrvar; for( ptrvar=strvar ; (*ptrvar) != '\0' ; *ptrvar++) ``` 自己本身也有寫過 python,在 python 當中也有類似 for each 的語法能夠使用,例如迭代一個 list 中的所有元素: ```python im_a_list = ['apple', 666, False] for val in im_a_list: print(val) ``` 要更改 list 中的元素時也可以順便取出索引值: ```python im_a_list = ['apple', "pineapple", "pen pineapple apple"] for i, val in enumerate(im_a_list): im_a_list[i] += "pen" ``` 在開發過程中要遍歷整個 list 的機會要比只查看其中某一範圍還要來得多 (至少以我目前的經驗是如此),即使要取得某一範圍的值,python 甚至提供方便的語法來取得 list 中某一範圍的值: ```python the_list = ["a", "b", "c", "d", "e"] for val in the_list[:3]: # 取出 list 中 index=0~2 的值 print(val) ``` 因此我認為 for each 系列的語法能夠使得開發者更容易撰寫程式碼,也可使程式碼閱讀起來更加簡潔。 --- ### 程式註解裡頭大量存在 `@` 符號,這有何意義?你能否應用在後續的程式開發呢? 提示: 對照看 Doxygen [Doxygen](https://www.stack.nl/~dimitri/doxygen/manual/docblocks.html) 是一個 document generator,能夠將程式碼的註解轉換成說明文件,若註解依照特定的格式撰寫,就可以透過 Doxygen 自動產生文件。 在 list.h 中,`@`在註解中被大量用來標記參數名稱,參考官方文件中的說明: For functions one can use the @param command to document the parameters and then use [in], [out], [in,out] to document the direction. For inline documentation this is also possible by starting with the direction attribute. 例如在[`container_of`](https://github.com/sysprog21/linux-list/blob/master/include/list.h#L19)的函式定義: ```c /** * container_of() - Calculate address of object that contains address ptr * @ptr: pointer to member variable * @type: type of the structure containing ptr * @member: name of the member variable in struct @type * * Return: @type pointer of object containing ptr */ ``` --- ### `tests/` 目錄底下的 unit test 的作用為何?就軟體工程來說的精神為何? 在軟體工程中,[unit test](https://en.wikipedia.org/wiki/Unit_testing) 是非常重要的一件事,每個龐大的專案一定是由許多小元件所組成,因此確保每個小零件的正確性才能避免更大環節的錯誤。因此 unit test 就是針對每個 module 進行正確性的檢驗,簡單來說就是針對單一功能進行測試。 --- ### `tests/` 目錄的 unit test 可如何持續精進和改善呢? 我認為這個 unit test 沒有針對一些特殊的 case 進行測試,例如在 [linux-list/tests/list_del.c](https://github.com/sysprog21/linux-list/blob/master/tests/list_del.c) 當中沒有測試當刪除空的 list 時會發生什麼事,這樣有可能會造成測試的結果不夠 robust。 --- ## 改寫作業一:Doubly Linked List ### queue.h ```c= typedef struct ELE { /* Pointer to array holding string. This array needs to be explicitly allocated and freed */ char *value; struct ELE *prev; struct ELE *next; } list_ele_t; ``` ### queue.c #### q_insert_head ```c= bool q_insert_head(queue_t *q, char *s) { if (!q) return false; list_ele_t *newh = malloc(sizeof(list_ele_t)); if (!newh) return false; newh->value = malloc(strlen(s) + 1); if (!newh->value) { free(newh); return false; } strcpy(newh->value, s); newh->next = q->head; if (q->size == 0) { newh->prev = NULL; q->tail = newh; } else { q->head->prev = newh; newh->prev = NULL; } q->head = newh; q->size++; return true; } ``` #### q_insert_tail ```c= bool q_insert_tail(queue_t *q, char *s) { /* You need to write the complete code for this function */ /* Remember: It should operate in O(1) time */ if (!q) return false; list_ele_t *newh = malloc(sizeof(list_ele_t)); if (!newh) return false; newh->value = malloc(strlen(s) + 1); if (!newh->value) { free(newh); return false; } strcpy(newh->value, s); newh->next = NULL; if (q->size == 0) { newh->prev = NULL; q->head = newh; } else { q->tail->next = newh; newh->prev = q->tail; } q->tail = newh; q->size++; return true; } ``` #### q_remove_head ```c= bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { /* You need to fix up this code. */ if (!q || q->size == 0) return false; if (sp) { strncpy(sp, q->head->value, bufsize); sp[bufsize - 1] = '\0'; } list_ele_t *e = q->head; q->head = q->head->next; free(e->value); free(e); q->size--; if (q->size) q->head->prev = NULL; else q->tail = NULL; return true; } ``` #### q_reverse ```c= void q_reverse(queue_t *q) { if (!q || q->size == 0) return; list_ele_t *temp, *cur = q->head; while (cur) { temp = cur->next; cur->next = cur->prev; cur->prev = temp; cur = temp; } temp = q->head; q->head = q->tail; q->tail = temp; } ``` 測試: ```bash qoo@qoo-S551LN:~/Documents/lab0-c$ make test gcc -O0 -g -Wall -Werror -c queue.c gcc -O0 -g -Wall -Werror -o qtest qtest.c report.c console.c harness.c queue.o scripts/driver.py --- Trace Points +++ TESTING trace trace-01-ops: # Test of insert_head and remove_head --- trace-01-ops 6/6 +++ TESTING trace trace-02-ops: # Test of insert_head, insert_tail, and remove_head --- trace-02-ops 6/6 +++ TESTING trace trace-03-ops: # Test of insert_head, insert_tail, reverse, and remove_head --- trace-03-ops 6/6 +++ TESTING trace trace-04-ops: # Test of insert_head, insert_tail, and size --- trace-04-ops 6/6 +++ TESTING trace trace-05-ops: # Test of insert_head, insert_tail, remove_head reverse, and size --- trace-05-ops 6/6 +++ TESTING trace trace-06-string: # Test of truncated strings --- trace-06-string 7/7 +++ TESTING trace trace-07-robust: # Test operations on NULL queue --- trace-07-robust 7/7 +++ TESTING trace trace-08-robust: # Test operations on empty queue --- trace-08-robust 7/7 +++ TESTING trace trace-09-robust: # Test remove_head with NULL argument --- trace-09-robust 7/7 +++ TESTING trace trace-10-malloc: # Test of malloc failure on new --- trace-10-malloc 7/7 +++ TESTING trace trace-11-malloc: # Test of malloc failure on insert_head --- trace-11-malloc 7/7 +++ TESTING trace trace-12-malloc: # Test of malloc failure on insert_tail --- trace-12-malloc 7/7 +++ TESTING trace trace-13-perf: # Test performance of insert_tail --- trace-13-perf 7/7 +++ TESTING trace trace-14-perf: # Test performance of size --- trace-14-perf 7/7 +++ TESTING trace trace-15-perf: # Test performance of insert_tail, size, and reverse --- trace-15-perf 7/7 --- TOTAL 100/100 ```