2024q1 Homework2 (quiz1+2)

# 2024q1 Homework2 (quiz1+2) contribute by <`Hualing-Chiu`> ## [第一週測驗題](https://hackmd.io/@sysprog/linux2024-quiz1) ### 測驗一 > 延伸問題: > 1. 解釋上述程式碼的運作原理，提出改進方案並予以實作。 > 2. 使用 Linux 核心風格的 List API 改寫上述程式碼，並針對鏈結串列，提出可避免最差狀況的快速排序實作，應設計效能評比的測試程式。 #### 運作原理 ##### `list_add` 此函式的目的為在鏈結串列前端加入一個新節點，`node_t` 是要被加入的新節點，當節點加入完成後，將 `*list` 指回串列的開頭。 ```c void list_add(node_t **list, node_t *node_t) { node_t->next = *list; *list = node_t; } ``` ##### `list_tail` 此函式目的在於找到串列尾端，這裡使用間接指標來操作，不斷向後更新，因此 `AAAA` 應為 `(*left)->next`。 ```c node_t *list_tail(node_t **left) { while ((*left) && (*left)->next) left = &((*left)->next); return *left; } ``` ##### `list_length` 此函式是在計算整個串列的長度，當判斷 `*left` 不為 `null` 時，不斷將指標向後更新，因此 `BBBB` 應為 `(*left)->next`。 ```c int list_length(node_t **left) { int n = 0; while (*left) { ++n; left = &((*left)->next); } return n; } ``` ##### `quick_sort` 假設原本的鏈結串列為下: 每次都挑選最左邊的節點為 pivot，在進入 `while` 迴圈之前，將 `begin[0]` 及 `end[0]`分別指向串列左端節點及右端節點，進入迴圈後，`L` 指向 `begin[0]`的節點，`R` 指向 `end[0]` 的節點，`p` 則是會向後更新去走訪每個節點。 ```c while (p) { node_t *n = p; p = p->next; list_add(n->value > value ? &right : &left, n); } ``` ```graphviz digraph structs { node[shape=box]; struct0 [label= "3"]; struct1 [label= "5"]; struct2 [label= "4"]; struct3 [label= "1"]; struct4 [label= "2"]; node[shape=plaintext]; pivot; L [fontcolor="blue"]; R [fontcolor="red"]; p { rank="same"; struct0 -> struct1 struct1 -> struct2 struct2 -> struct3 struct3 -> struct4 } pivot -> struct0; L -> struct0; R -> struct4; p -> struct1; } ``` 當`n->value > value` 條件成立時，會將當前指向的節點加到 `right` 的開頭，反之則加到 `left`的開頭。待 `p` 走訪完整個串列後，會將串列分為 `left`、`right`、`pivot`三個部分，且 `begin[0]` 與 `end[0]` 分別指向 `left` 串列的頭跟尾，`begin[1]` 與 `end[1]` 指向 `pivot`，`begin[2]` 與 `begin[2]` 分別指向 `right` 串列的頭跟尾。 ```c begin[i] = left; end[i] = list_tail(left); begin[i + 1] = pivot; end[i + 1] = pivot; begin[i + 2] = right; end[i + 2] = list_tail(right); ``` ```graphviz digraph structs { node[shape=box]; struct0 [label= "2"]; struct1 [label= "1"]; struct2 [label= "3"]; struct3 [label= "4"]; struct4 [label= "5"]; node[shape=plaintext]; pivot; left; right; begin0 [fontcolor="blue"]; end0 [fontcolor="red"]; begin1 [fontcolor="blue"]; end1 [fontcolor="red"]; begin2 [fontcolor="blue"]; end2 [fontcolor="red"]; { rank="same"; struct0->struct1 left->struct0 pivot->struct2 struct3->struct4 right->struct3 } begin0->struct0 end0->struct1 begin1->struct2 end1->struct2 begin2->struct3 end2->struct4 } ``` 當 `L` 與 `R` 從堆疊中取出相同值的情況下，代表子串列只剩下單一元素，即可加入到 `result` 的開頭，就可得到排序好的串列。 ```c } else { if (L) list_add(&result, L); i--; } ``` ### 測驗二 > 延伸問題 > 1. 解釋上述程式碼的運作原理，提出改進方案並予以實作。 > 2. 將改進過的 Timsort 實作整合進 [sysprog21/lab0-c](https://github.com/sysprog21/lab0-c)，作為另一個有效的排序命令，應設計效能評比的測試程式，確保測試案例符合 Timsort 設想的資料分布樣式。 #### [Timsort](https://en.wikipedia.org/wiki/Timsort) 運作原理 Timsort 是結合合併排序和插入排序的特點，對排序進行優化，尤其是在已經部分排序的序列，先識別出資料中已排序的子序列( run )，透過不斷合併這些 run 來達成全資料範圍的排序。接著對 Timsort 程式碼進行探討，以圖示為例: 原始的鏈結串列為雙向鏈結串列 ```graphviz digraph { node [shape=box] graph [rankdir = "LR"]; head -> 1 -> 2 -> 3 -> 8 -> 7 -> 5 -> 6 -> head; head -> 6 -> 5 -> 7 -> 8 -> 3 -> 2 -> 1 -> head; } ``` 在 `timsort` 函式中，首先先將鏈結串列斷開改成單向串列 ```c head->prev->next = NULL; ``` ```graphviz digraph { node [shape=box] graph [rankdir = "LR"]; head->1->2->3->8->7->5->6; 6->5->7->8->3->2->1->head; node[shape=plaintext]; 6->NULL } ``` 接著是 `find_run` 函式，尋找串列中已經排序好的 run，使用 `len` 計算 run 的長度，以下用圖示說明 `find_run` 函式的作法： * 串列是**升序( ascending )** * 首先將 `head` 指向開頭，`next` 指向 `list` 的下一個節點 ```graphviz digraph { node [shape=box] // graph [rankdir = "LR"]; { rank = "same" 1->2->3->8->7->5->6; } node[shape=plaintext]; head->1 list->1 next->2 { rank = "same" 6->NULL } } ``` * 若 `next` 不為空且 `cmp(priv, list, next) <= 0` 成立，則指標持續向後更新，直到找到非升序的串列為止。 ```graphviz digraph { node [shape=box] // graph [rankdir = "LR"]; { rank = "same" 1->2->3->8->7->5->6; } node[shape=plaintext]; head->1 list->3 next->8 { rank = "same" 6->NULL } } ``` * 當找到非升序的節點後，斷開它與前一個節點後回傳 `result.head` 與 `result.next`。 ```graphviz digraph { node [shape=box] // graph [rankdir = "LR"]; { rank = "same" 1->2->3; 8->7->5->6 } node[shape=plaintext]; result_head->1 list->3 result_next->8 { rank = "same" 3->NULL } } ``` * 串列是**降序( descending )** * 新增一個 `prev` 指標實作反轉序列 ```graphviz digraph { node [shape=box] // graph [rankdir = "LR"]; { rank = "same" 8->7->5->6 } node[shape=plaintext]; head->8 list->8 next->7 prev { rank = "same" 6->NULL } } ``` * 把需被反轉的節點給 `prev` ```graphviz digraph { node [shape=box] // graph [rankdir = "LR"]; { rank = "same" 7->5->6 8 } node[shape=plaintext]; head->7 prev->8 list->7 next->5 { rank = "same" // 6->NULL 8->NULL } } ``` * 若 `next` 不為空且 `cmp(priv, list, next) <= 0` 則繼續走訪，否則離開迴圈。 ```graphviz digraph { node [shape=box] // graph [rankdir = "LR"]; { rank = "same" 5->6 7->8 } node[shape=plaintext]; head->5 prev->7 list->5 next->6 { rank = "same" // 6->NULL 8->NULL } } ``` * 離開迴圈後回傳 `result.head` 與 `result.next`，可以發現此段子序列已排序完成。 ```graphviz digraph { node [shape=box] // graph [rankdir = "LR"]; { rank = "same" 5->7->8 6 } node[shape=plaintext]; result_head->5 // prev->7 // list->5 result_next->6 { rank = "same" 8->NULL } } ``` 再來執行 `merge_collapse` 函式，此函式目的為確保堆疊上的 run 長度保持平衡，此判斷條件就是測驗二題目所描述的 $A<=B+C$ ```c n >= 3 && run_size(tp->prev->prev) <= run_size(tp->prev) + run_size(tp) ``` 此外，還多了一個判斷條件為假設尚未合併的 run 有四個以上，若 $A<=B+C$ 或 $B<=C+D$，則 $C$ 和 $B$ 或 $D$ 選長度小的兩者合併。 ```c n >= 4 && run_size(tp->prev->prev->prev) <= run_size(tp->prev->prev) + run_size(tp->prev) ``` 最後，確保堆疊內的 run 長度都達到平衡後，執行 `merge_force_collapse` 函式將剩餘的 run 全部合併，接著把雙向鏈結接上變為雙向串列，即完成 Timsort 排序。 #### 執行輸出 ``` Creating sample ==== Testing timesort ==== Comparisons: 9490 List is sorted ``` ## [第二週測驗題](https://hackmd.io/@sysprog/linux2024-quiz2) ### 測驗一首先需先了解 preorder 與 postorder 的定義 * preorder ### 測驗二 ### 測驗三