2023q1 Homework1 (lab0)

# 2023q1 Homework1 (lab0) contributed by < `koty6069` > ### Reviewed by `yanjiew1` - `q_insert_head` 和 `q_insert_tail` 及 `q_remove_head` 和 `q_remove_tail` 均有重複的程式碼出現，可以思考如何重用這些程式，避免重複的程式碼。 - 可以再思考如何利用原有佇列是已排序的特性來實作 `q_merge` 。 - Coding Style 和 commit message 均符合規範。 - 函式的實作過程和說明寫得清楚有條理。 - 仍有些作業要求未完成： - 新增洗牌的指令 `shuffle` 及分析亂度 - 研究亂數產生器 - 研讀論文〈[Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf)〉 - 觀察與分析 `entropy` - `web` 指令改善 ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Vendor ID: GenuineIntel Model name: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz CPU family: 6 Model: 142 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 Stepping: 10 CPU max MHz: 3400.0000 CPU min MHz: 400.0000 BogoMIPS: 3600.00 Caches (sum of all): L1d: 128 KiB (4 instances) L1i: 128 KiB (4 instances) L2: 1 MiB (4 instances) L3: 6 MiB (1 instance) ``` ## 開發過程 ### `q_new` 使用 `malloc` 配置新的記憶體空間，若空間配置成功，使用 `INIT_LIST_HEAD` 初始化 `head`，使其 `next` 和 `prev` 皆指向自身。當 `malloc` 配置空間失敗時會回傳 `NULL`，因此當配置失敗時無需額外處理，直接回傳 `NULL`。 ```c /* Create an empty queue */ struct list_head *q_new() { struct list_head *head = malloc(sizeof(struct list_head)); if (head) INIT_LIST_HEAD(head); return head; } ``` ### `q_free`  `list_for_each_entry_safe` 可允許我們在走訪佇列時，對當前節點進行處理而不會導致不可預期行為<s>安全的對當前節點進行處理</s> ???，使用 `q_release_element` 將每個節點的空間釋放，最後使用 `free(l)` 將 `l` 的空間釋放。 :::warning 改進漢語表達。 :notes: jserv ::: ```c /* Free all storage used by queue */ void q_free(struct list_head *l) { if (!l) return; element_t *entry, *safe; list_for_each_entry_safe (entry, safe, l, list) q_release_element(entry); free(l); } ``` ### `q_insert_head` 和 `q_insert_tail` 1. 檢查 `head` 是否為 `NULL` ，若為 `NULL` 則直接回傳 `false`。 2. 配置空間給 `node`，若失敗回傳 `false`。 3. 配置字串長度加一（`strlen` 之回傳值不包含 `'\0'` ）之空間給 `node` 的成員 `value` 用來儲存字串 `s`，若配置失敗回傳 `false`。 4. 使用 `strncpy` 複製字串並於最後補上 `'\0'`。 5. 使用 `list_add` / `list_add_tail` 將 `node` 新增到佇列的開頭/結尾。 ```c /* Insert an element at head of queue */ bool q_insert_head(struct list_head *head, char *s) { if (!head) return false; element_t *node = malloc(sizeof(element_t)); if (!node) return false; size_t len = strlen(s); node->value = malloc(sizeof(char) * (len + 1)); if (!node->value) { free(node); return false; } /* Need to free the space of node */ strncpy(node->value, s, len); node->value[len] = '\0'; list_add(&node->list, head); return true; } ``` ```c /* Insert an element at tail of queue */ bool q_insert_tail(struct list_head *head, char *s) { if (!head) return false; element_t *node = malloc(sizeof(element_t)); if (!node) return false; size_t len = strlen(s); node->value = malloc(sizeof(char) * (len + 1)); if (!node->value) { free(node); return false; } /* Need to free the space of node */ strncpy(node->value, s, len); node->value[len] = '\0'; list_add_tail(&node->list, head); return true; } ``` :::warning 這二段程式高度相似，可以思考如何簡化。提示：`q_insert_tail` 可以直接呼叫已經實作好的 `q_insert_head` 來達成相同目的。 > [name=yanjiew1] ::: ### `q_remove_head` 和 `q_remove_tail` * 使用 `list_first_entry` / `list_last_entry` 找到開頭/結尾的 entry。 * 使用 `list_del_init` 將其從 list 中移除，若 `sp` 不為 `NULL`，使用 `strncpy` 將該節點的 `value` 複製到 `sp`。 ```c /* Remove an element from head of queue */ element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize) { if (!head || list_empty(head)) return NULL; element_t *node = list_first_entry(head, element_t, list); list_del_init(&node->list); if (sp) { strncpy(sp, node->value, bufsize - 1); sp[bufsize - 1] = '\0'; } return node; } ``` ```c /* Remove an element from tail of queue */ element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize) { if (!head || list_empty(head)) return NULL; element_t *node = list_last_entry(head, element_t, list); list_del_init(&node->list); if (sp) { strncpy(sp, node->value, bufsize - 1); sp[bufsize - 1] = '\0'; } return node; } ``` :::warning 思考如何精簡程式碼。 :notes: jserv ::: :::warning * 若傳入的 `bufsize` 為 0 ，則 `bufsize - 1` 會變成 $2^{64} - 1$ ，可能導致一些問題。 * 這二段程式碼高度相似，應可再精簡。 > [name=yanjiew1] ::: ### `q_size` 按照[作業指示](https://hackmd.io/@sysprog/linux2023-lab0)進行實作，使用 `list_for_each` 走訪整個佇列，每經過一個節點就遞增`len`。 ```c /* Return number of elements in queue */ int q_size(struct list_head *head) { if (!head) return 0; int len = 0; struct list_head *li; list_for_each (li, head) len++; return len; } ``` ### `q_delete_mid` 老師上課提過要善用 doubly-linked list 的特性，因此使用兩個指標，`ahead` 從佇列開頭正向前進，`back` 從佇列結尾反向前進，有兩種情況代表走到中間節點： 1. `ahead` 和 `back` 走到同一個節點。 2. 當 `ahead` 與 `back` 相鄰，此時 `back` 即為中間節點。 ~~`ahead` 超過 `back` ，此時 `ahead` 走到的節點即為中間節點。~~ 找到目標節點後先將其從佇列中移除，再將其空間釋放。 ```c /* Delete the middle node in queue */ bool q_delete_mid(struct list_head *head) { if (!head || list_empty(head)) return false; struct list_head *ahead = head->next, *back = head->prev; while (ahead != back && ahead->next != back) { ahead = ahead->next; back = back->prev; } list_del_init(back); q_release_element(list_entry(back, element_t, list)); // https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/ return true; } ``` :::warning 撰寫出更精簡的程式碼。 :notes: jserv > * 將 `ahead->prev != back` 改成 `ahead-next != back` 並將後續處理節點由 `ahead` 改成 `back` 減少一次的迴圈。 > * 將 `list_entry` 放入 `q_release_element` 的參數中，避免多使用一個變數去儲存。 ::: ### `q_delete_dup` * 題目要求為傳入已排序的佇列，因此只考慮相鄰的節點是否重複。 * 佇列中所有重複的節點都應被刪除，因此使用 `bool del` 來紀錄節點是否需要刪除。 * 若當前節點與下一節點重複，即刪除當前節點並將 `del` 設為 `true` 。（比較字串時若 `next` 為 `head` 則不進行比較。） * 若當前節點不與下一節點重複，但 `del` 為 `true` ，代表當前節點與被刪除的前一節點重複，仍必須刪除當前節點。 ```c /* Delete all nodes that have duplicate string */ bool q_delete_dup(struct list_head *head) { if (!head) return false; element_t *cur, *safe; bool del = false; list_for_each_entry_safe (cur, safe, head, list) { char *s = list_entry(cur->list.next, element_t, list)->value; if (cur->list.next != head && strcmp(cur->value, s) == 0) { list_del(&cur->list); q_release_element(cur); del = true; } else if (del) { list_del(&cur->list); q_release_element(cur); del = false; } } // https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/ return true; } ``` ### `q_swap` 節點兩兩一組互換，因此使用 `first` 和 `second` 分別記錄。 ```c /* Swap every two adjacent nodes */ void q_swap(struct list_head *head) { if (!head || list_empty(head)) return; struct list_head *first = head->next, *second = first->next; for (; first != head && second != head;) { first->next = second->next; second->prev = first->prev; first->prev->next = second; second->next->prev = first; first->prev = second; second->next = first; first = first->next; second = first->next; } // https://leetcode.com/problems/swap-nodes-in-pairs/ } ``` :::warning 可以思考如何利用 `list.h` 提供的巨集和函式來操作，讓程式更簡潔。 > [name=yanjiew1] ::: ### `q_reverse` * 執行 `q_reverse` 前： `node2->next = node3` / `node2->prev = node1` ```graphviz digraph G { node[shape=record]; graph[pencolor=transparent]; rankdir=LR; p0[label="<h> Head | {<pr> prev|<ne> next}"]; p1[label="<h> Node1|{<pr> prev|<ne> next}"]; p2[label="<h> Node2|{<pr> prev|<ne> next}"]; p3[label="<h> Node3|{<pr> prev|<ne> next}"]; p0:ne -> p1:h; p1:ne -> p2:h; p2:ne -> p3:h; //p3:ne -> p0:h; //p0:pr -> p3:h; p1:pr -> p0:h; p2:pr -> p1:h; p3:pr -> p2:h; } ``` * 執行 `q_reverse` 後： `node2->next = node1` / `node2->prev = node3` ```graphviz digraph G { node[shape=record]; graph[pencolor=transparent]; rankdir=LR; p0[label="<h> Head | {<pr> prev|<ne> next}"]; p3[label="<h> Node3|{<pr> prev|<ne> next}"]; p2[label="<h> Node2|{<pr> prev|<ne> next}"]; p1[label="<h> Node1|{<pr> prev|<ne> next}"]; p0:ne -> p3:h; p3:ne -> p2:h; p2:ne -> p1:h; //p1:ne -> p0:h; //p0:pr -> p1:h; p3:pr -> p0:h; p2:pr -> p3:h; p1:pr -> p2:h; } ``` * 從執行 `q_reverse` 前後的圖可以觀察到，將每個節點的 `next` 和 `prev` 互換即可達成目標。 * 最後將 `head` 的 `next` 和 `prev` 互換。 ```c /* Reverse elements in queue */ void q_reverse(struct list_head *head) { if (!head || list_empty(head)) return; struct list_head *node, *safe, *tmp; list_for_each_safe (node, safe, head) { tmp = node->next; node->next = node->prev; node->prev = tmp; } tmp = head->next; head->next = head->prev; head->prev = tmp; } ``` :::success 好的觀察，透過圖片解說很棒！ > [name=yanjiew1] ::: ### `q_reverseK` 起初想到使用 `list_cut_position` 將需要翻轉的 sublist 切割出來，再使用 `q_reverse` 將 sublist 進行翻轉，最後透過 `list_splice` 將其加回原本的佇列。在實作過程中發現需要紀錄一個 head 當作 `list_splice` 的插入點，因此參考其他同學的作業後，選擇紀錄 sublist 起始節點的前一個節點當作分割點（`cut`）。 `cnt` 用來記錄目前的 sublist 長度，若長度到達 `k` 則進行以下步驟： 1. 將 `cut->next` （真正的起始點）到 `node` （當前節點）移動到 `tmp`。 2. 使用 `q_reverse` 進行翻轉。 3. 將翻轉完的 sublist 加入到分割點 `cut` 後。 4. 將分割點設為當前節點，並將 `cnt` 設回 0。  ```c /* Reverse the nodes of the list k at a time */ void q_reverseK(struct list_head *head, int k) { if (!head || list_empty(head) || list_is_singular(head)) return; struct list_head *node, *safe, *cut = head; int cnt = 0; list_for_each_safe (node, safe, head) { if (++cnt == k) { LIST_HEAD(tmp); list_cut_position(&tmp, cut->next, node); q_reverse(&tmp); list_splice(&tmp, cut); cut = safe->prev; cnt = 0; } } // https://leetcode.com/problems/reverse-nodes-in-k-group/ } ``` ### q_sort 參考 [Linked List Sort](https://npes87184.github.io/2015-09-12-linkedListSort/) 實作 `merge_sort_list` 和 `merge` 兩個函式進行排序。 ```c struct list_head *merge(struct list_head *l1, struct list_head *l2) ``` ```c struct list_head *merge_sort_list(struct list_head *head) ``` [Linked List Sort](https://npes87184.github.io/2015-09-12-linkedListSort/) 當中使用的結構為單向鏈結串列，且傳入 `merge_sort_list` 之參數為指向第一個節點的指標，因此先在 `q_sort` 中將最後一個節點指向 `NULL` 使其結構變為單向鏈結串列，並將指向第一個節點的指標傳入 `merge_sort_list`。 ```c head->prev->next = NULL; head->next = merge_sort_list(head->next); ``` `merger_sort_list` 中使用快慢指標找出中間節點並將原本的 list 切成兩個部份，並持續以遞迴的方式進行切割，最後再透過 `merge` 將所有的 list 合併。 ```c while (fast && fast->next) { slow = slow->next; fast = fast->next->next; } fast = slow->next; slow->next = NULL; ``` 在實作 `merge` 時，因為不能配置新的記憶體，因此參考 [linked list 和非連續記憶體操作](https://hackmd.io/@sysprog/c-linked-list#%E6%A1%88%E4%BE%8B%E6%8E%A2%E8%A8%8E-LeetCode-21-Merge-Two-Sorted-Lists) 中 [LeetCode 21. Merge Two Sorted Lists ](https://leetcode.com/problems/merge-two-sorted-lists/) 之作法，使用間接指標將 list 合併。 ```c for(node = NULL; l1 && l2; *node = (*node)->next) { if (strcmp(list_entry(l1, element_t, list)->value, list_entry(l2, element_t, list)->value) < 0) { node = &l1; } else { node = &l2; } *ptr = *node; ptr = &(*ptr)->next; } *ptr = (struct list_head *) ((u_int64_t) l1 | (u_int64_t) l2); ``` 最後，由於前面將雙向鏈結串列改為單向，在參閱 [seasonwang0905](https://hackmd.io/@seasonwang/B1B0lVqpo) 的作法後，要將每一節點的 `prev` 重新指回前一個節點，以保持 doubly-linked list 的結構。 ```c struct list_head *cur = head, *nt = head->next; while (nt) { nt->prev = cur; cur = nt; nt = nt->next; } cur->next = head; head->prev = cur; ``` ### q_descend 從 `queue` 的結尾向前走訪，若該節點的字串比目前紀錄的 `str` 小，就將該節點移除，反之將 `str` 設成該節點的 `cur->value`。 ```c while (node != head) { element_t *cur = list_entry(node, element_t, list); node = node->prev; if (strcmp(cur->value, str) < 0) { list_del_init(&cur->list); q_release_element(cur); } else { str = cur->value; } } ``` ### q_merge 在此系統中，使用 `queue.c` 中定義的 `queue_contex_t` 來維護所有目前存在的佇列，在 `queue_contex_t` 中有包含 `struct list_head chain` 以 doubly-linked list 的結構來紀錄系統中的佇列，因此同樣可使用之前用在單一佇列上的巨集，來操作整個佇列。 ```c typedef struct { struct list_head *q; struct list_head chain; int size; int id; } queue_contex_t; ``` 首先紀錄 `head` 中的第一個佇列，用來將其他佇列都串接到此佇列上面，同時紀錄長度，用於最後回傳。 ```c struct list_head *list = list_first_entry(head, queue_contex_t, chain)->q; size_t total = list_first_entry(head, queue_contex_t, chain)->size; ``` 之後使用 `list_for_each_entry_safe` 走訪所有佇列，將除了第一個 queue 以外的所有成員都接到 `list` 上，同時要將每個佇列的長度都加上 `total`。 ```c list_for_each_entry_safe (node, safe, head, chain) { if (node->id == 0) continue; list_splice_init(node->q, list); total += node->size; } ``` 最後使用 `q_sort` 將串接完的 queue 進行重新排序。 :::warning 在 `queue.h` 中有說明傳入的各個佇列都是已排序的。可以思考如何利用已排序的特性來實作更高效率的程式。 > [name=yanjiew1] ::: ### 自動測試結果 ``` +++ TESTING trace trace-17-complexity: # Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant ERROR: Probably not constant time or wrong implementation ERROR: Probably not constant time or wrong implementation ERROR: Probably not constant time or wrong implementation Probably constant time --- trace-17-complexity 0/5 --- TOTAL 95/100 ``` :::warning 這是因為內建的 `dudect` 有瑕疵，改善 `dudect` 後即可通過測試。 > [name=yanjiew1] ::: --- ## 以 [Valgrind](https://valgrind.org/) 分析記憶體問題執行 `$ make valgrind` 可看到輸出錯誤訊息，節錄如下： ``` +++ TESTING trace trace-01-ops: # Test of insert_head and remove_head ==3785== 32 bytes in 1 blocks are still reachable in loss record 1 of 2 ==3785== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==3785== by 0x10CBE0: do_new (qtest.c:146) ==3785== by 0x10DE0C: interpret_cmda (console.c:181) ==3785== by 0x10E3C1: interpret_cmd (console.c:201) ==3785== by 0x10E7C2: cmd_select (console.c:610) ==3785== by 0x10F0AE: run_console (console.c:705) ==3785== by 0x10D1FE: main (qtest.c:1224) ``` 觀察後可發現每一次測試的錯誤訊息皆為相同格式： ```clike XX bytes in X blocks are still reachable... ``` 從 [Valgrind + 自動測試程式](https://hackmd.io/@sysprog/linux2023-lab0/%2F%40sysprog%2Flinux2023-lab0-b#Valgrind-%E4%BD%BF%E7%94%A8%E6%A1%88%E4%BE%8B) 和 [Valgrind Frequently Asked Questions](https://valgrind.org/docs/manual/faq.html) 的 5.2 中可得知，代表程式結束時有記憶體尚未釋放。 > "still reachable" means your program is probably ok -- it didn't free some memory it could have. This is quite common and often reasonable. 從錯誤訊息可以發現程式的進入點 main 在 qtest.c 中，並且發現當中會呼叫一些初始化的函式。 ```c=1202 q_init(); init_cmd(); console_init(); ``` 往後繼續檢視，可發現與離開程式有關的函式。 ```c=1221 add_quit_helper(q_quit); bool ok = true; ok = ok && run_console(infile_name); /* Do finish_cmd() before check whether ok is true or false */ ok = finish_cmd() && ok; ``` 檢查 `q_quit` 按照前面作業各函式的命名模式，可得知此函式功能應為在結束程式時對佇列作處理，觀察內容，可發現會將目前有紀錄的所有佇列透過 `q_free` 進行釋放。 ```c=1063 if (exception_setup(true)) { struct list_head *cur = chain.head.next; while (chain.size > 0) { queue_contex_t *qctx, *tmp; tmp = qctx = list_entry(cur, queue_contex_t, chain); cur = cur->next; q_free(qctx->q); free(tmp); chain.size--; } } ``` 但在此函式的第一行直接 `return true`，導致不會執行到後面釋放空間的行為，因此這邊應該將此行移除，移除後再次執行 `$ make valgrind` 可發現前述之錯誤訊息不再出現，代表程式結束前有正確釋放掉已配置的記憶體空間。 ```c=1056 static bool q_quit(int argc, char *argv[]) { return true; ``` 開啟 [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) 後執行自動測試也無出現錯誤訊息。 ``` $ make clean $ make SANITIZER=1 $ make test ``` ## Linux 核心原始程式碼的 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) ### 研讀 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) 初步可發現其中分為三個函式，並透過註解大致了解其功用： * `merge`：用來將兩個 list 合併，透過此函式合併的 list 為單向鏈結串列，最後一個節點指向 `NULL`。 * `merge_final`：最後一次合併所使用的函式，會將合併完的 list 恢復成 doubly-linked list。 * `list_sort`：進行排序時呼叫的主要函式。以下會針對上述三個函式進行解析。 #### `list_sort` `list_sort` 需要三個參數，根據註解，其定義如下： * `void *priv`：用來傳遞給 `cmp` 當作參數使用，`list_sort`本身不會用到。 * `struct list_head *head`：要被排序的 list。 * `list_cmp_func_t cmp`：用來比較元素大小的比較函式，支援回傳（<0 / =0 / >0）和（bool 0/1）兩種方式來決定要排序的元素大小。 ```c __attribute__((nonnull(2,3))) void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp) ``` 在排序前會先將原本的雙向鏈結串列改為單向（將最後一個節點指向 `NULL`），`pending` 用來暫存需要合併的 list，並用 `count` 紀錄當中 list 的數量。 ```c struct list_head *list = head->next, *pending = NULL; size_t count = 0; /* Count of pending */ if (list == head->prev) /* Zero or one elements */ return; /* Convert to a null-terminated singly-linked list. */ head->prev->next = NULL; ```