linux2024-homework1

# 2024q1 Homework1 (lab0) contributed by < `iLeafy11` > :::danger 終於生病生完了，從今天開始，每天都來寫!!!!!! ::: ## 開發環境新電腦安裝 Ubuntu 22.04 遇到 Failed to start Ubuntu live CD Installer，嘗試過各種方法，對於我來說，目前暫時無解，所以先使用 WSL。安裝 WSL for Microsoft Windows 11 遇到問題時，可以參考: [Troubleshooting Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/troubleshooting)，解決問題。 ``` $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 12 On-line CPU(s) list: 0-11 Vendor ID: GenuineIntel Model name: Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz CPU family: 6 Model: 165 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 1 Stepping: 3 BogoMIPS: 5807.99 ``` ### 在 WSL 使用 perf 如果在 WSL 使用 `perf` 會得到以下結果: ``` WARNING: perf not found for kernel 5.15.79.1-microsoft You may need to install the following packages for this specific kernel: linux-tools-5.15.79.1-microsoft-standard-WSL2 linux-cloud-tools-5.15.79.1-microsoft-standard-WSL2 You may also want to install one of the following packages to keep up to date: linux-tools-standard-WSL2 linux-cloud-tools-standard-WSL2 ``` 如果我們想要進一步 `sudo apt install linux-tools-standard-WSL2` 會得到以下結果: ``` E: Unable to locate package linux-tools-standard-WSL2 ``` 不過幸好可以在 GitHub 上，找到有討論串有解決方法 [Install perf on WSL 2](https://gist.github.com/abel0b/b1881e41b9e1c4b16d84e5e083c38a13)，`install-linux-perf-on-wsl2.sh`： ```shell apt install flex bison git clone https://github.com/microsoft/WSL2-Linux-Kernel --depth 1 cd WSL2-Linux-Kernel/tools/perf make -j8 sudo cp perf /usr/local/bin ``` 之後在 ~/.bashrc 設置路徑: ```shell LINUX_TOOLS_DIR=/usr/lib/linux-tools LINUX_TOOLS_MAXVER_DIR=`find $LINUX_TOOLS_DIR -maxdepth 1 -mindepth 1 -type d | sort --reverse | head --lines=1` if [ -d $LINUX_TOOLS_MAXVER_DIR ]; then for TOOL in `ls -r $LINUX_TOOLS_MAXVER_DIR | grep -v ".so$"`; do alias $TOOL=$LINUX_TOOLS_MAXVER_DIR/$TOOL done fi ``` 重啟 WSL 之後就可以順利使用 perf。 ## 開發紀錄不同於 2021 年的 lab0，這次 2024 年的 lab0，已經大幅改變，除了提早引入了參考 [include/linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) 所改寫的 list.h，並且也額外加入了若干 LeetCode 題。參考我 2021 年當時在 linux 核心設計課堂上的實作，此次 lab0 光是 queue.[ch] 的內容至少就包含了 [2021q1 Homework2 (quiz2) 測驗1](https://hackmd.io/MGK7E7F4QlOAXX9kOEyY7w#%E6%B8%AC%E9%A9%97-1) 以及 [2021q1 Homework1 (lab0)](https://hackmd.io/28XwxjCOQj6PS766LQyAPQ?view)。 ### `q_delete_mid` (LeetCode 2095. Delete the Middle Node of a Linked List) 關於 `q_delete_mid` 函式，如果點進去 [LeetCode 2095. Delete the Middle Node of a Linked List](https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/) 超連結可以發現底下有兩個 Hint: :::info * Hint 1 > If a point with a speed s moves n units in a given time, a point with speed 2 * s will move 2 * n units at the same time. Can you use this to find the middle node of a linked list? * Hint 2 > If you are given the middle node, the node before it, and the node after it, how can you modify the linked list? ::: 可以發現 Hint 1 即是 [2021q1 Homework2 (quiz2) 測驗1](https://hackmd.io/MGK7E7F4QlOAXX9kOEyY7w#%E6%B8%AC%E9%A9%97-1) 中，Merge Sort 實作找中點的方式(快慢指標)。實作概念類似如下: ```c struct list_head *q_middle(struct list_head *head) { struct list_head *fast = head->next, *slow; list_for_each(slow, head) { if(fast->next == head || fast->next->next == head) break; fast = fast->next->next; } return slow; } ``` 找到中點後，就可以很輕易的移除它，只需要 `list_del` 之後再釋放其記憶體即可。 ### `q_delete_dup` (LeetCode 82. Remove Duplicates from Sorted List II) [LeetCode 82. Remove Duplicates from Sorted List II](https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/description/) 的 Discussion 中，有討論到使用 Dictionary 資料結構的方式讓這題解法達到 $O(1)$ 時間複雜度，概念類似於以下[使用 Hash Table 實作眾數問題](https://gist.github.com/iLeafy11/4ac1d4d8929aef09d5c8166dd6ab3edc) 但是如果貿然修改 lab0 的資料結構，後續勢必會面臨許多開發問題，所以或許只需要考量到 $O(N)$ 時間複雜度的解法，以走訪整個 list 的概念來實作，不過由於是 Sorted List，所以我們可以單純的做類似於以下的事: ```c list_for_each_safe (node, safe, head) { if (node, node->next's str value is the same) list_del(node); else if (...) list_del(node); ... } ``` 留意 [LeetCode 82. Remove Duplicates from Sorted List II](https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/description/) 是要將全部的 Duplicates 刪除，並非刪除到只剩下單獨一個。所以實作上可以將 Duplicates 的 `value` 先暫存起來: ```c bool q_delete_dup(struct list_head *head) { ... list_for_each_entry_safe(entry, safe, head, list) { if (&safe->list != head && !strcmp(entry->value, safe->value)) { dup_str = safe->value; ... } else if (dup_str && !strcmp(entry->value, dup_str)) { ... dup_str = NULL; } } ... } ``` 有趣的是，題目為 "Remove" Duplicates，但是 [LeetCode 82. Remove Duplicates from Sorted List II](https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/description/) 題目網站上的敘述卻是: >Given the head of a sorted linked list, ***delete*** all nodes that have duplicate numbers, leaving only distinct numbers from the original list. Return the linked list sorted as well. ### `q_swap` (LeetCode 24. Swap Nodes in Pairs) [LeetCode 24. Swap Nodes in Pairs](https://leetcode.com/problems/swap-nodes-in-pairs/description/) 的介紹是 Singly Linked List 的 Swap 操作，但是在 lab0，顯然並非 Singly Linked list。要操作的是 Doubly Linked List。 `q_swap` 函式在 queue.h 上註解的解釋是: > Swap every two adjacent nodes 我們可以發現，`q_swap` 即是 `q_reverseK` 當 k = 2 的時候。但是這題最好不要將 `q_swap` 當成是 `q_reverseK` 的 subset 來實作，畢竟 `q_swap` 實作上比起 `q_reverseK` 可以較為簡單許多: ```c void q_swap(struct list_head *head) { struct list_head *node; list_for_each(node, head) { if (node->next == head) break; list_move_tail(node->next, node); } } ``` ### `q_reverseK` (LeetCode 25. Reverse Nodes in k-Group) [LeetCode 25. Reverse Nodes in k-Group](https://leetcode.com/problems/reverse-nodes-in-k-group/description/) 這題的討論區有許多有關於測試資料的抱怨。不過在這裡的實作我就暫時不去考慮情景(k > list size)，直接用最天真的方式來去思考問題實作。可以簡單透過設立一個變數 `i` 記錄是否經過 `k` 個 node，之後再進行這個區間的 reverse 即可。要注意 reverse 的動作必須在一個暫存的 `LIST_HEAD(tmp)` 進行。 ```c void q_reverseK(struct list_head *head, int k) { int i = 0; LIST_HEAD(tmp); struct list_head *node, *safe; struct list_head *from = head; list_for_each_safe(node, safe, head) { i++; if (i == k) { i = 0; list_cut_position(&tmp, from, node); q_reverse(&tmp); list_splice(&tmp, from); INIT_LIST_HEAD(&tmp); from = safe->prev; } } } ``` ### `q_sort` 有一個稍微惱人的地方是，如果先前沒有實作 `q_size`，直接實作 `q_sort`，在測試 `q_sort` 的過程會一直得到 segmentation fault 的報錯，但是問題並非出在 `q_sort` 本身，而是 qtest.c 的 `do_sort` 測試會呼叫 `q_size` 這個函式。所以要小心，實作 `q_sort` 前要先將 `q_size` 實作出來，否則可能會出現一整個下午除錯除到不知道到底發生什麼事的狀況。實作上採用 Merge Sort，首先遞迴找出中點切割，最後再進行 Merge。 ```c void q_sort(struct list_head *head, bool descend) { if (!head || list_empty(head) || list_is_singular(head)) return; /* Merge Sort Implementation */ LIST_HEAD(left); LIST_HEAD(sorted); list_cut_position(&left, head, q_middle(head)); q_sort(&left, descend); q_sort(head, descend); q_sort_merge(&left, head, descend); } ``` Merge 則使用迭代實作: ```c void q_sorted_merge(struct list_head *lhs, struct list_head *rhs, bool descend) { LIST_HEAD(sorted); while (!list_empty(lhs) && !list_empty(rhs)) { char *lv = list_entry(lhs->next, element_t, list)->value; char *rv = list_entry(rhs->next, element_t, list)->value; struct list_head *tmp; if(descend) tmp = strcmp(lv, rv) >= 0 ? lhs->next : rhs->next; else tmp = strcmp(lv, rv) <= 0 ? lhs->next : rhs->next; list_move_tail(tmp, &sorted); } list_splice_tail((struct list_head *) ((uintptr_t) lhs | (uintptr_t) rhs), &sorted); INIT_LIST_HEAD(rhs); list_splice_tail(&sorted, rhs); } ``` 這個函式做的事情其實就是將兩個 Sorted Lists 進行 Merge，也就是 Merge k Sorted Lists 當 k = 2 時的情景，所以這個函式就可以在實作 `q_merge` 的時候再次使用到。當我們遞迴找中點切割，切到已經不能再切割的時候(切到每個 "Sub List" 只含有一個 element)，這時就可以將每個只含有一個 "element" 的 "Sub List" 視為是 Sorted List，再逐一 Merge 在一起。 ### `q_descend` & `q_ascend` (LeetCode 2487. Remove Nodes From Linked List) 我們從 LeetCode 網站 [2487. Remove Nodes From Linked List](https://leetcode.com/problems/remove-nodes-from-linked-list/) 給的兩個 hint: :::info * Hint 1 > Iterate on nodes in reversed order. * Hint 2 > When iterating in reversed order, save the maximum value that was passed before. ::: 幾乎等於就是給了我們答案。由於 queue.[ch] 的資料結構是 Doubly Linked List，所以可以直接做到 Hint 1: > Iterate on nodes in reversed order. 為了保持 `list_for_each` 這類的 function-like macro 的實作風格，所以直接從 kernel 的 [include/linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) 複製下來以下兩個巨集: ```c #define list_for_each_reverse(pos, head) \ for (pos = (head)->prev; pos != (head); pos = pos->prev) ``` ```c #define list_for_each_entry_safe_reverse(pos, n, head, member) \ for (pos = list_last_entry(head, typeof(*pos), member), \ n = list_prev_entry(pos, member); \ !list_entry_is_head(pos, head, member); \ pos = n, n = list_prev_entry(n, member)) ``` 這樣就可以直接利用這兩個巨集達到反向走訪整個 queue 的作用。再加上 Hint 2 的提示，我們只需要在反向走訪 queue 的時候暫存 element value 的最大值，這樣就可以很直接的實作出 `q_descend`: ```c int q_descend(struct list_head *head) { int counter = 0; element_t *entry, *safe; char *s = NULL; list_for_each_entry_safe_reverse(entry, safe, head, list) { if(!s || strcmp(entry->value, s) > 0) { s = entry->value; counter++; } else { list_del(&entry->list); q_release_element(entry); } } return counter; } ``` `q_ascend` 的實作邏輯與 `q_descend` 一致，只需要更改 element 的 value 的 compare function 即可， ```diff int q_ascend(struct list_head *head) { ... - if(!s || strcmp(entry->value, s) > 0) { + if(!s || strcmp(entry->value, s) < 0) { s = entry->value; counter++; } ... ... } ``` ### `q_merge` (LeetCode 23. Merge k Sorted Lists) LeetCode 題目網站: [LeetCode 23. Merge k Sorted Lists](https://leetcode.com/problems/merge-k-sorted-lists/) 有了 `q_sorted_merge` 這個函式，`q_merge` 實作起來就變得相對直觀: ```diff int q_merge(struct list_head *head, bool descend) { queue_contex_t *first = list_first_entry(head, queue_contex_t, chain); queue_contex_t *entry, *safe; list_for_each_entry_safe(entry, safe, head, chain) { if(entry == first) continue; q_sorted_merge(entry->q, first->q, descend); + INIT_LIST_HEAD(entry->q); } return 0; } ``` 將除了第一個佇列以外，其餘所有的佇列都與第一個佇列進行 `q_sorted_merge`，這樣我們就可以將所有以排序的佇列合併，成為一個新的以排序佇列存放在第一個佇列的位置。 :::warning 額外加了一行 `INIT_LIST_HEAD(entry->q)` ，原因如下: 如果我們查看 `q_merge` 函式的註解: >... There is no need to free the 'qcontext_t' and its member 'q' since they will be released externally. **However, q_merge() is responsible for making the queues to be NULL-queue, except the first one.** 這裡的 NULL-queue 我認為有些誤導人，因為這些經由 Merge 過後剩餘的空的 queues 並不是一定是要指向 `NULL`。而對於這些 queues 的處理，應取決於 `q_free` 的實作。原因我們可以查看 qtest.c 的 `do_merge` 函式，裡面對於經由 Merge 過後剩餘的空的 queues 的處理是: ```diff static bool do_merge( ... ) { ... ... stuct list_head *cur = ...; while (((uintptr_t) cur != (uintptr_t)&chain.head) { queue_contex_t *ctx = list_entry(...); cur = cur->next; + q_free(ctx->q); free(ctx); } ... ... } ``` 顯然是利用我們實作的 `q_free` 來釋放記憶體。因此舉例來說，假設我的 `q_free` 的實作如下: ```c /* Free all storage used by queue */ void q_free(struct list_head *head) { if (!head) return; if (!list_empty(head)) { element_t *entry, *safe; list_for_each_entry_safe(entry, safe, head, list) q_release_element(entry); } free(head); } ``` 如果我們在 `q_merge` 不去額外增加一行 `INIT_LIST_HEAD(entry->q)`，讓 `lhs` 的指標經由 `q_sorted_merge` 過後不去做額外處理使其剩餘骯髒值，那麼經由參數 `lhs` 傳遞的指標值，也就是那些經由 Merge 過後剩餘的空的 queues 的指標 (`entry->q`)，在 `q_free` 操作之後，會進入到 `if (!list_empty(head))` 這個判斷式裡面，進而呼叫 `q_release_element(entry)`，導致存取到非法記憶體，造成 segmentation fault。 ::: 到這裡，基本的 queue.c 實作暫時告一個段落，是時候開始真正 lab0 的重點了。 ### qtest shuffle https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle https://golangprojectstructure.com/the-knuth-shuffle/ ### qtest option entropy 1 研讀: https://hackmd.io/@8dSak6oVTweMeAe9fXWCPA/ryoegPgw_ ### deduct 研讀: https://github.com/oreparaz/dudect