2023 年 Linux 核心設計/實作課程作業 —— lab0 (E)

--- title: 2023 年 Linux 核心設計/實作課程作業 —— lab0 (E) image: https://i.imgur.com/robmeiQ.png description: 改寫自 CMU Introduction to Computer Systems 課程第一份作業，擴充 GNU/Linux 開發工具的使用並強化自動分析機制。 tags: linux2023 --- # L01: [lab0](https://hackmd.io/@sysprog/linux2023-lab0) > 主講人: [jserv](http://wiki.csie.ncku.edu.tw/User/jserv) / 課程討論區: [2023 年系統軟體課程](https://www.facebook.com/groups/system.software2023/) :mega: 返回「[Linux 核心設計/實作](http://wiki.csie.ncku.edu.tw/linux/schedule)」課程進度表 ## 研讀 Linux 核心原始程式碼的 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) + 用 bottom up 實作 merge sort 對 cache 較友善，因為過程中就是一直合併，cache 被參照到的機會更大。 + 而 top down 是會先做 partition 再來 merge，但 partition 本身對 cache 不友善，在 cache 移進移出（內容不斷更新），導致 [cache thrashing](https://en.wikipedia.org/wiki/Thrashing_(computer_science))。 ### 合併方式合併方式是當有 $3 \times 2^k$ 個節點時，合併前兩個 $2^k$ 變成 $2^{k + 1}$，並留下一個 $2^k$ 不動，維持著合併：不合併為 2 : 1 的比例，因為只要 $3 * 2^k$ 可以放得進 L1 cache，可以避免 cache thrashing。 `count` 為 pending list 中節點的數量，在 `count` 變 `count + 1` 後，可以觀察到第 `k` 個 bit 會改為 1，0 ~ `k - 1` 個 bit 會變 0，此時會將 2 個 $2^k$ 合併，並留下一個 $2^k$。 ### 何時合併每當 `count + 1`，可能會有兩種情況： #### 1. 當 `count + 1` 後為 $2^k$，就不合併（只有 leading 1 是 1，其餘都為 0） + 例子： `count` = 1（$0001_2$） `count + 1` = 2（$0010_2$）因為 `count + 1` 為 2 是 2 的冪，所以此種情況不合併。 #### 2. 當 count + 1 後不為 $2^k$，就合併 + 例子： `count` = 2（$0010_2$） `count + 1` = 3（$0011_2$）因為 `count + 1` 為 3 不是 2 的冪，所以此種情況要合併。 + 可以觀察到在 `count` 變 `count + 1` 後，第 k 個 bit 會改為 1，0 ~ `k - 1` 個 bit 會變 0。而這裡的 k 為 0，所以會將 2 個 $2^0$ 合併，並留下一個 $2^0$，也就是合併 2 個 1 為 2，留一個 1 不合併。以下是 count 從 0 一直加到 16 merge 的狀況：（括號內是當次被合併的節點加起來的節點數量，用加號來分隔尚未合併的節點，黃色底則是告知此次合併的是 1 + 1, 2 + 2 或 4 + 4 等。） | count 變化 | count 二進位 | merge | pending 上的節點| | -------- | -------- | -------- | -------- | | 0 $\to$ 1 | 0000 $\to$ 0001 | no（$2^0$） | 1 | | 1 $\to$ 2 | 0001 $\to$ 0010 | no（$2^1$） | 1 + 1 | | 2 $\to$ 3 | 0010 $\to$ 001==1== | yes | (2) + 1 | | 3 $\to$ 4 | 0011 $\to$ 0100 | no（$2^2$） | 2 + 1 + 1 | | 4 $\to$ 5 | 0100 $\to$ 010==1== | yes | 2 + (2) + 1| | 5 $\to$ 6 | 0101 $\to$ 01==1==0 | yes | (4) + 1 + 1| | 6 $\to$ 7 | 0110 $\to$ 011==1== | yes | 4 + (2) + 1| | 7 $\to$ 8 | 0111 $\to$ 1000 | no（$2^3$） | 4 + 2 + 1 + 1| | 8 $\to$ 9 | 1000 $\to$ 100==1== | yes | 4 + 2 + (2) + 1| | 9 $\to$ 10 | 1001 $\to$ 10==1==0 | yes | 4 + (4) + 1 + 1| | 10 $\to$ 11 | 1010 $\to$ 101==1== | yes | 4 + 4 + (2) + 1| | 11 $\to$ 12 | 1011 $\to$ 1==1==00 | yes | (8) + 2 + 1 + 1| | 12 $\to$ 13 | 1100 $\to$ 110==1== | yes | 8 + 2 + (2) + 1| | 13 $\to$ 14 | 1101 $\to$ 11==1==0 | yes | 8 + (4) + 1 + 1| | 14 $\to$ 15 | 1110 $\to$ 111==1== | yes | 8 + 4 + (2) + 1| | 15 $\to$ 16 | 1111 $\to$ 10000 | no（$2^4$） | 8 + 4 + 2 + 1 + 1 | ### list_sort ```c __attribute__((nonnull(2,3))) void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp) { struct list_head *list = head->next, *pending = NULL; size_t count = 0; /* Count of pending */ if (list == head->prev) /* Zero or one elements */ return; /* Convert to a null-terminated singly-linked list. */ head->prev->next = NULL; ``` + priv: 從 `merge` 函式可以看到 priv 會被當作 cmp 的參數傳入，在其他地方不會用到。 + head: 傳入指向 struct list_head 的指標，和原本自己寫的 `q_sort` 傳入的參數一樣 + cmp: compare function，以 function pointer 的型別傳入 cmp 參數有考慮到通用性，但會增加 function call 的成本。 list 指向 head 的第一個節點，pending 指向 NULL，先檢查是否沒有任何元素或只有一個元素，然後將 head 前一個節點指向的下一個位置指向 NULL，將雙向 linked list 變成單向。 ```c do { size_t bits; struct list_head **tail = &pending; /* Find the least-significant clear bit in count */ for (bits = count; bits & 1; bits >>= 1) tail = &(*tail)->prev; /* Do the indicated merge */ if (likely(bits)) { struct list_head *a = *tail, *b = a->prev; a = merge(priv, cmp, b, a); /* Install the merged result in place of the inputs */ a->prev = b->prev; *tail = a; } /* Move one element from input list to pending */ list->prev = pending; pending = list; list = list->next; pending->next = NULL; count++; } while (list); ``` 在 while 迴圈中，會先讓 tail 往前移動到待 merge 的節點，然後在 if 判斷是否需要 merge，最後移動 pending 和 list 的位置，直到 list 沒有節點。pending 從沒有節點，然後一次次將節點從 list 中搬到 pending，等到 list 沒有節點就代表現階段結束了。 ```c ... /* End of input; merge together all the pending lists. */ list = pending; pending = pending->prev; for (;;) { struct list_head *next = pending->prev; if (!next) break; list = merge(priv, cmp, pending, list); pending = next; } /* The final merge, rebuilding prev links */ merge_final(priv, cmp, head, pending, list); } EXPORT_SYMBOL(list_sort); ``` ### merge ```c __attribute__((nonnull(2, 3, 4))) static struct list_head * merge(void *priv, list_cmp_func_t cmp, struct list_head *a, struct list_head *b) { // cppcheck-suppress unassignedVariable struct list_head *head, **tail = &head; for (;;) { /* if equal, take 'a' -- important for sort stability */ if (cmp(priv, a, b) <= 0) { *tail = a; tail = &a->next; a = a->next; if (!a) { *tail = b; break; } } else { *tail = b; tail = &b->next; b = b->next; if (!b) { *tail = a; break; } } } return head; } ``` 當 `cmp(priv, a, b) <= 0` 表示 a 的值小於 b，因為由小排序到大，所以先接 a 再接 b，`cmp(priv, a, b) > 0` 表示 a 的值大於 b，則是先接 b 再接 a。其中 `head` 會在排序過 list 的最前面，並回傳回去。 ### 排序過程圖示以 4, 3, 2, 1 的 list 為例做 list_sort 排序，隨著 `count` 增加，`pending` 內節點數量漸漸增加，而 `list` 內的節點數量漸漸減少： #### count = 0 -> count = 1 + list 指向 head->next： ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] list [label="list", style="normal", color="white"] head:right:e -> node4:w node4:left:w -> head:e node4:right:e -> node3:w node3:left:w -> node4:e node3:right:e -> node2:w node2:left:w -> node3:e node2:right:e -> node1:w node1:left:w -> node2:e list -> node4:n } ``` + 因為 bits 為 0，不需 merge。`list->prev = pending` 因為 pending 為 NULL，所以 list->prev 也指向 NULL： ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] list [label="list", style="normal", color="white"] head:right:e -> node4:w node4:right:e -> node3:w node3:left:w -> node4:e node3:right:e -> node2:w node2:left:w -> node3:e node2:right:e -> node1:w node1:left:w -> node2:e list -> node4:n } ``` + pending 節點數量 + 1，list 節點數量 - 1，count + 1，`pending->next = NULL`： ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] tail [label="tail", style="normal", color="white"] head:right:e -> node4:w node4:right:e -> node3:w[color="white"] node3:left:w -> node4:e node3:right:e -> node2:w node2:left:w -> node3:e node2:right:e -> node1:w node1:left:w -> node2:e list -> node3:n pending -> node4:n tail -> pending:n } ``` #### count = 1 -> count = 2 ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] tail [label="tail", style="normal", color="white"] head:right:e -> node4:w node4:right:e -> node3:w[color="white"] node3:left:w -> node4:e node3:right:e -> node2:w node2:left:w -> node3:e node2:right:e -> node1:w node1:left:w -> node2:e list -> node3:n pending -> node4:n tail -> node4:left:s } ``` ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] head:right:e -> node4:w node4:right:e -> node3:w[color="white"] node3:left:w -> node4:e node3:right:e -> node2:w[color="white"] node2:left:w -> node3:e node2:right:e -> node1:w node1:left:w -> node2:e list -> node2:n pending -> node3:n } ``` #### count = 2 -> count = 3 ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] tail [label="tail", style="normal", color="white"] a [label="a", style="normal", color="white"] b [label="b", style="normal", color="white"] head:right:e -> node4:w node4:right:e -> node3:w[color="white"] node3:left:w -> node4:e node3:right:e -> node2:w[color="white"] node3:right:e -> node4:e node2:left:w -> node3:e node2:right:e -> node1:w node1:left:w -> node2:e list -> node2:n pending -> node3:n tail -> pending a -> node3:n b -> node4:n } ``` ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] tail [label="tail", style="normal", color="white"] a [label="a", style="normal", color="white"] b [label="b", style="normal", color="white"] head:right:e -> node4:w node4:right:e -> node3:w[color="white"] node3:right:e -> node2:w[color="white"] node3:right:e -> node4:e node2:left:w -> node3:e node2:right:e -> node1:w node1:left:w -> node2:e list -> node2:n pending -> node3:n tail -> pending a -> node3:n b -> node4:n } ``` ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] tail [label="tail", style="normal", color="white"] head:right:e -> node4:w node4:right:e -> node3:w[color="white"] node3:right:e -> node2:w[color="white"] node3:right:s -> node4:e node2:left:w -> node3:e node2:right:e -> node1:w[color="white"] node1:left:w -> node2:e list -> node1:n pending -> node2:n tail -> pending } ``` #### count = 3 -> count = 4 ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] tail [label="tail", style="normal", color="white"] head:right:e -> node4:w node4:right:e -> node3:w[color="white"] node3:right:e -> node2:w[color="white"] node3:right:s -> node4:e node2:left:w -> node3:e node2:right:e -> node1:w[color="white"] node1:left:w -> node2:e list -> node1:n pending -> node2:n tail -> node3:left:s } ``` ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] pending [label="pending", style="normal", color="white"] tail [label="tail", style="normal", color="white"] head:right:e -> node4:w node4:right:e -> node3:w[color="white"] node3:right:e -> node2:w[color="white"] node3:right:s -> node4:e node2:left:w -> node3:e node2:right:e -> node1:w[color="white"] node1:left:w -> node2:e pending -> node1:n tail -> node3:left:s } ``` #### count = 4 ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] tail [label="tail", style="normal", color="white"] next [label="next", style="normal", color="white"] a [label="a", style="normal", color="white"] b [label="b", style="normal", color="white"] head:right:e -> node1:w node4:right:e -> node3:w[color="white"] node3:right:e -> node2:w[color="white"] node3:right:s -> node4:e node2:left:w -> node3:e node1:left:w -> head:e pending -> node2:n next -> node3:n list -> node1:n a -> node2:n b -> node1:n } ``` ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] tail [label="tail", style="normal", color="white"] next [label="next", style="normal", color="white"] a [label="a", style="normal", color="white"] b [label="b", style="normal", color="white"] head:right:e -> node1:w node4:right:e -> node3:w[color="white"] node3:right:e -> node2:w[color="white"] node3:right:s -> node4:e node2:left:w -> node3:e node1:left:w -> head:e pending -> node2:n next -> node3:n list -> node1:n tail -> node1:n a -> node2:n b -> node2:n } ``` ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head|{<left>prev|<right>next}", style="bold"] node4 [label="4|{<left>prev|<right>next}", style="bold"] node3 [label="3|{<left>prev|<right>next}", style="bold"] node2 [label="2|{<left>prev|<right>next}", style="bold"] node1 [label="1|{<left>prev|<right>next}", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] tail [label="tail", style="normal", color="white"] next [label="next", style="normal", color="white"] a [label="a", style="normal", color="white"] b [label="b", style="normal", color="white"] head:right:e -> node1:w node1:left:w -> head:e node1:right:e -> node2:w node2:left:w -> node1:e node2:right:e -> node3:w node3:left:w -> node2:e node3:right:e -> node4:w node4:left:w -> node3:e list -> node1:n pending -> node2:n a -> node2:n b -> node4:n tail -> node4:n } ``` ## Linux 核心排序實作的演進探討 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) 相關實作時，除了觀察程式碼，，也該理解為何 Linux 核心開發者採用這段程式碼，也就是推敲開發是如何思考及進行取捨。我們可參見 github commit history，[lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) 最近一次演算法上的改動是 May 15, 2019, [commit b5c56e0c](https://github.com/torvalds/linux/commit/b5c56e0cdd62979dd538e5363b06be5bdf735a09), **lib/list_sort: optimize number of calls to comparison function** 引入，其中引用三篇論文： 1. Bottom-up Mergesort: A Detailed Analysis Wolfgang Panny, Helmut Prodinger Algorithmica 14(4):340--354, October 1995 https://doi.org/10.1007/BF01294131 https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.5260 2. The cost distribution of queue-mergesort, optimal mergesorts, and power-of-two rules Wei-Mei Chen, Hsien-Kuei Hwang, Gen-Huey Chen Journal of Algorithms 30(2); Pages 423--448, February 1999 https://doi.org/10.1006/jagm.1998.0986 https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380 3. Queue-Mergesort Mordecai J. Golin, Robert Sedgewick Information Processing Letters, 48(5):253--259, 10 December 1993 https://doi.org/10.1016/0020-0190(93)90088-q https://sci-hub.tw/10.1016/0020-0190(93)90088-Q 對於比較次數的探討，我們可寫成以下形式： \begin{gather*} nlog_2 n - Kn + O(1) \end{gather*} 其中，以下 merge sort 的變形 (variants)，領導係數 (leading coefficient) 皆為 $log_2 (n!)$，探討的著重點在於一次項係數 K。開發者探討了 merge sort 的三種實作方式，分別為 **top-down mergesort**、**bottom-up mergesort** 和 **queue-mergesort**，以及開發者提出的方式，以下簡述不同的實作方式： **1. Top-down mergesort** * 有最少的 average case、worst case 的 comparison 次數。 * 需要使用遞迴或是額外空間作為 user stack。 * 需要事先知道整個 list 的大小。下圖例子是 balanced power-of-2 rule dividing strategy: ```graphviz digraph G { splines=line node[shape=box, height=.1];sorted_1 sorted_2 sorted_3 sorted_4 sorted_5 sorted_6 sorted_7 sorted_8; node[shape=record, height=.1];input result divide_41 divide_42 divide_21 divide_22 divide_23 divide_24 merge_21 merge_22 merge_23 merge_24 merge_41 merge_42; node[shape=none];merge_pad input[label="<f0>2|<f1>5|<f2>4|<f3>6|<f4>8|<f5>1|<f6>7|<f7>3"] result[label="<f0>1|<f1>2|<f2>3|<f3>4|<f4>5|<f5>6|<f6>7|<f7>8"] divide_41[label="<f1>2|<f2>5|<f3>4|<f4>6"] divide_42[label="<f1>8|<f2>1|<f3>7|<f4>3"] divide_21[label="<f0>2|<f1>5"] divide_22[label="<f0>4|<f1>6"] divide_23[label="<f0>8|<f1>1"] divide_24[label="<f0>7|<f1>3"] sorted_1[label="1"] sorted_2[label="2"] sorted_3[label="3"] sorted_4[label="4"] sorted_5[label="5"] sorted_6[label="6"] sorted_7[label="7"] sorted_8[label="8"] merge_21[label="<f0>2|<f1>5"] merge_22[label="<f0>4|<f1>6"] merge_23[label="<f0>1|<f1>8"] merge_24[label="<f0>3|<f1>7"] merge_41[label="<f1>2|<f2>4|<f3>5|<f4>6"] merge_42[label="<f1>1|<f2>3|<f3>7|<f4>8"] merge_pad[label=""] input -> divide_41 input -> divide_42 divide_41 -> divide_21 divide_41 -> divide_22 divide_42 -> divide_23 divide_42 -> divide_24 divide_21:f0 -> sorted_2 divide_21:f1 -> sorted_5 divide_22:f0 -> sorted_4 divide_22:f1 -> sorted_6 divide_23:f0 -> sorted_8 divide_23:f1 -> sorted_1 divide_24:f0 -> sorted_7 divide_24:f1 -> sorted_3 sorted_1; sorted_2; sorted_3; sorted_4; sorted_5; sorted_6; sorted_7; sorted_8; sorted_2 -> merge_21:f0 sorted_5 -> merge_21:f1 sorted_4 -> merge_22:f0 sorted_6 -> merge_22:f1 sorted_8 -> merge_23:f1 sorted_1 -> merge_23:f0 sorted_7 -> merge_24:f1 sorted_3 -> merge_24:f0 merge_22 -> merge_pad[style=invis] merge_23 -> merge_pad[style=invis] merge_pad -> result[style=invis] merge_21 -> merge_41 merge_22 -> merge_41 merge_23 -> merge_42 merge_24 -> merge_42 merge_41 -> result merge_42 -> result } ``` **2. Bottom-up mergesort** * 在這幾種變形中需要最多的 comparison 次數。 ```graphviz digraph G { splines=line node[shape=record, height=.1];input result merge_21 merge_22 merge_23 merge_24 merge_41 merge_42; input[label="<f0>2|<f1>5|<f2>4|<f3>6|<f4>8|<f5>1|<f6>7|<f7>3"] result[label="<f0>1|<f1>2|<f2>3|<f3>4|<f4>5|<f5>6|<f6>7|<f7>8"] merge_21[label="<f0>2|<f1>5"] merge_22[label="<f0>4|<f1>6"] merge_23[label="<f0>1|<f1>8"] merge_24[label="<f0>3|<f1>7"] merge_41[label="<f1>2|<f2>4|<f3>5|<f4>6"] merge_42[label="<f1>1|<f2>3|<f3>7|<f4>8"] input:f0 -> merge_21:f0 input:f1 -> merge_21:f1:n input:f2 -> merge_22:f0 input:f3 -> merge_22:f1 input:f4 -> merge_23:f1 input:f5 -> merge_23:f0 input:f6 -> merge_24:f1 input:f7 -> merge_24:f0:n merge_21:f0 -> merge_41:f1 merge_21:f1 -> merge_41:f3 merge_22:f0 -> merge_41:f2 merge_22:f1 -> merge_41:f4 merge_23:f0 -> merge_42:f1 merge_23:f1 -> merge_42:f4 merge_24:f0 -> merge_42:f2 merge_24:f1 -> merge_42:f3 merge_41:f1 -> result:f1 merge_41:f2 -> result:f3 merge_41:f3 -> result:f4 merge_41:f4 -> result:f5 merge_42:f1 -> result:f0 merge_42:f2 -> result:f2 merge_42:f3 -> result:f6 merge_42:f4 -> result:f7 } ``` **3. Queue-mergesort** * 特別適合用於鏈結串列的排序。 * queue-mergesort comparison 的次數少於 bottom-up mergesort，略高於 top-down mergesort。 * 可以以 top-down 或是 bottom-up 的方式實作。 * 透過 get front、put back 操作，因此排序完的結果會是 unstable。根據 [3] 的演算法，虛擬碼如下 ```c queue-mergesort(Q): while (Q.size != 1) do Q.put(Merge(Q.get, Q.get)) ``` **4. lib/list_sort.c** 如果查看更之前的版本 [commit 043b3f7b](https://github.com/torvalds/linux/commit/043b3f7b6388fca6be86ca82979f66c5723a0d10) 會發現是用 bottom-up mergesort 實作。 ## 證明前提：所有子串列其長度皆為 2 的冪，所有合併皆為同大小串列。設 $t_i$ 為目前長度小於 $2^i$ 的所有子串列的節點數總和。$i$ 為正整數。第一點：證明當 $t_i$ 從 $3\times2^{i-1}-1$ 加一變成 $3\times2^{i-1}$ 時，不存在正整數 $j \neq i$ 使得 $t_j$ = $3\times2^{j-1}$ 。假設存在 $j>i$ 且 $t_j$ = $3\times2^{j-1}$ 。則有一個正整數 $l$ 使得 $3\times2^{i-1} + l = 3\times2^{j-1}$ 。因為長度小於 $2^{i}$ 的所有子串列的節點數總和都包含在 $t_i$ 裡面了，所以 $l$ 都在長度大於等於 $2^i$ 的串列裡。因此可以把 $l$ 寫成 $A\times2^i$ 。其中 A 等於任意正整數。 $3\times2^{i-1} + A\times2^i = 3\times2^{j-1}$ $(3+2A)\times2^{i-1} = 3\times2^{j-1}$ $\dfrac{(3+2A)}{3} = 2^{j-i}$ $1+\dfrac{2A}{3} = 2^{j-i}$ 令 $A = 3B$ 代入，$B$ 為任意正整數。 $1+2B = 2^{j-i}$ 因為 $j>i$ ，所以 $2^{j-i}$ 必為偶數，不存在任何正整數 $B$ 使得 $1+2B$ 等於偶數，假設不成立，因此不存在 $j>i$ 且 $t_j$ = $3\times2^j$ 。假設存在 $j<i$ 且 $t_j$ = $3\times2^{j-1}$ 。則有一個正整數 $l$ 使得 $3\times2^{i-1} = 3\times2^{j-1}+l$ 。因為長度小於 $2^{j}$ 的所有子串列的節點數總和都包含在 $t_j$ 裡面了，所以 $l$ 都在長度大於等於 $2^j$ 的串列裡。因此可以把 $l$ 寫成 $A\times2^j$ 。其中 A 等於任意正整數。 $3\times2^{i-1} = 3\times2^{j-1} + A\times2^j$ $3\times2^{i-1} = (3+2A)\times2^{j-1}$ $\dfrac{(3+2A)}{3} = 2^{i-j}$ $1+\dfrac{2A}{3} = 2^{i-j}$ 令 $A = 3B$ 代入，$B$ 為任意正整數。 $1+2B = 2^{i-j}$ 因為 $j<i$ ，所以 $2^{i-j}$ 必為偶數，不存在任正整數 $B$ 使得 $1+2B$ 等於偶數，假設不成立，因此不存在 $j<i$ 且 $t_j$ = $3\times2^j$ 。總和上述兩點，可得證。第二點：證明在 "能保持差異最大 $2:1$ 的情況下，能合併則合併。" 的條件下，當$t_{n+1}$ 從 $3\times2^n-1$ 增加到 $3\times2^n$ 時，一定存在兩個長度為 $2^n$ 的子串列可供合併。當 $n = 0$，因為長度最短為 1， $3\times2^0$ = 3，長度分別為 1,1,1 ，因此可以合併，成立。設當 $n = k$ 時成立，則當 n = k+1 時，必須證明當 $t_{k+2}$ 增加到 $3\times2^{k+1}$ 時，一定存在兩個長度為 $2^{k+1}$ 的子串列可合併。第一個 $2^{k+1}$ 子串列在 $t_{k+1}$來到 $3\times2^k$ 時根據假設誕生。此時 $t_{k+2}$ 也等於 $3\times2^k$ ，合併後 $t_{k+2}$ = $3\times2^k$，$t_{k+1}$ = $2^k$ 。第二個 $2^{k+1}$ 子串列在 $t_{k+1}$再次來到 $3\times2^k$ 時根據假設誕生，此時 $t_{k+2}$ 來到 $5\times2^k$ 。由上述兩點可知，在 $t_{k+2}$ 從 0 增加到 $3\times2^{k+1}$ 的過程中，一定會經過 $3\times2^k$ 以及 $5\times2^k$ ，並在這兩時刻產生出長度為 $2^{k+1}$ 的串列。所以當 $t_{k+2}$ 增加到 $6\times2^{k} = 3\times2^{k+1}$ 時，一定存在兩個長度為 $2^{k+1}$ 的子串列可供合併，根據數學歸納法得證。第三點：證明 $2^i$ 長度串列的合併(兩個 $2^{i-1}$ 合併成 $2^i$)只會發生在 $t_i$ 變成 $3\times2^{i-1}$ 。由第二點可知，對所有自然數 $i$ ，$t_i$ 不會超過 $3\times 2^{i-1}$ ，因為只要等於 $3\times2^{i-1}$ 就會合併串列並變回 $2^{i-1}$ 。所以合併不會發生在 $t_i > 3\times2^{i-1}$ 。當 $t_i < 3\times 2^{i-1}$ ，此時合併成 $2^i$ 串列無法保證兩條串列比例差異不會超過 $2:1$ ，所以不會合併。故合併只會發生在 $t_i = 3\times 2^{i-1}$ 得證。綜合上述三點，可得出當 count 增加到 count+1 ，最多只會存在一正整數 $i$ 使得 $t_i$ 為 $3\times 2^{i-1}$ 。若此 $i$ 存在，則一定可合併成長度為 $2^i$ 的串列，且此串列為此時合併的唯一串列。每次 count 遞增所合併的串列也不會造成更長串列的連鎖合併，因為合併成 $2^i$ 只會改變 $t_i$ 的值，不會改變其他的 $t$ 值，所以不會有其他 $t_j$ 增加成 $3\times 2^{j-1}$ 而造成合併。 ### 為什麼 list_sort 中串列大小的比例差距限制是二比一因為目的是為了避免合併差距過大，因此使用二比一而不是更大三比一、四比一很合理。需要考慮的是為什麼不用三比二或四比三這種比例。先簡述一下 list_sort 的演算法：第一階段：將節點一個一個讀取，如果確定合併的大小不會超過整體的特定比例，就將之前的串列合併。因為本質還是 bottom up merge sort ，所以此階段的合併都是同大小，所有子串列也都會是二的冪。第二階段：當所有節點都讀進並盡可能的同大小合併後，就來到必須合併不同大小的階段。這個階段會將子串列由小大到一個接一個合併。從以上兩階段可以看出，因為第一階段都是二的冪，在維持 $2m : 2m+1$ 比例的情況下，第一階段結束時最大的子串列就是 $2^{\lfloor log_22mn/{(2m+1)} \rfloor}$ 。論文〈[The cost distribution of queue-mergesort, optimal mergesorts, and power-of-two rules](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380)〉提及: > Note that $2^{\lfloor log_22n/3 \rfloor}$ is the unique power of two lying between $n/3$ and $2n/3$ and that the choice of rationals other than $2/3$ will not be more balanced. For example, if $2^{\lfloor log_2n/2 \rfloor}$ or $2^{\lfloor log_25n/9 \rfloor}$ then the sizes of two subproblems are $(2,5)$ for $n = 7$ while the balanced power-of-two gives $(3,4)$. 這段話其實不精準，因為當 $n = 3\times 2^k$ 時，在包括的情況下介於 $n/3$ 以及 $2n/3$ 的二的冪會有兩個，再不包括的情況下卻又會不存在。如果是寫成 $n/3 < 2^{\lfloor log_22n/3 \rfloor} \leq 2n/3$ 就符合這段話的描述。而這段敘述講述了一件事， $2^{\lfloor log_22n/3 \rfloor}$ 可以找到最接近 $n/2$ 的二的冪。如果今天是用 $2^{\lfloor log_23n/5 \rfloor}$ 去求 $2^k$ ，則當 $n = 13$ 時，$2^{\lfloor log_23n/5 \rfloor} = 4$，而最接近 $13/2$ 的二的冪卻是 $8$ 。這代表如果在第一階段用 $3:2$ 的比例去限制，在第二階段合併中，最後一次合併會是 $9:4$ ，這個比例差距已經超過第一階段的 $3:2$ ，而同樣的反例在 $2m:2m+1 ,m>2$ 中也都會發生。因此這樣的演算法只能用在 $2:1$ 的情況。 ### 證明 list_sort 的演算法是 optimal mergesort optimal merge sort 定義：在最差的情況下，比較次數小於等於所有其他 mergesort 。對所有 mergesort ，都可以繪製出一個 merge tree 來表示其合併的順序及長度，每一個節點的權重代表當下的串列長度。方形的是 external node (leaf) ，圓形的是 internal node 。如下圖所示： ```graphviz digraph BST { node [fontname="Arial" ]; l1 [ label = "5" ]; l21 [ label = "3" ]; l22 [ label = "2" ]; l31 [ label = "2" ]; l32 [shape=rect label = "1" ]; l33 [shape=rect label = "1" ]; l34 [shape=rect label = "1" ]; l41 [shape=rect label = "1" ]; l42 [shape=rect label = "1" ]; l1 -> { l21 l22 }; l21 -> { l31 l32 }; l22 -> { l33 l34 }; l31 -> { l41 l42 }; } ``` >[graphviz參考連結](https://stackoverflow.com/questions/41194837/how-to-get-graphviz-dot-to-represent-binary-tree-correctly) internal node 的數量為 leaf 數減一。在最差的情況下，合併 (x,y) 的次數為 x+y-1 ，因此最差情況下總共的比較次數就是 $\sum\limits_{v}^{T}({w(v)} - 1) = \sum\limits_{v}^{T}{w(v)} - (n-1)$ 其中 $n$ 為這次排序的節點總數，也是 leaf 的總數。而 $v$ 為所有的 internal node 。$w(v)$ 為該 internal node 的權重（也就是長度）。在這邊因為對所有合併排序 n - 1 都是固定的，所以只要看 $\sum\limits_{v}^{T}{w(v)}$ 就好。論文〈[Queue-Mergesort](https://www.sciencedirect.com/science/article/pii/002001909390088Q?via%3Dihub)〉提及: > We use the fact that the weight of a merge tree is equal to its external path length. The height h(f) of a node I in a tree is the distance of a path from 1 to the root. The external path length of a tree T’ is the sum E(T’) = $\sum\limits_{l\ a\ leaf\ of\ T'}{h(l)}$ 及以下： > Thus a merge tree T describes an optimal merge-sort on n items if and only if T has minimum external path length $\sum\limits_{l\ a\ leaf}{h(l)}$. It is known that this occurs if and only if the following condition is satisfied: all of T’s leaves are located on its bottom two levels. 因此可知，只要證明 list_sort 的 merge tree 符合所有 leaf 都在最下面兩層這個條件，就可以證明它是 optimal merge sort 。分析 list_sort 的演算法，可以得出以下兩點： 1. 在合併的過程中，最多只有一個子串列不是 2 的冪（第二階段排最後的子串列）。 2. 在合併的過程中，兩者差異不大於兩倍。 **證明合併過程中對所有長度 n 的子串列，其 merge tree 的所有 leaf 都在最下面兩層。** **證明過程：** 第一階段所有合併皆為 2 的冪，符合命題。 **用數學歸納法證明第二階段：** 最小端的子串列只可能是 (1,1) 或 (1,2)，兩者合併都符合命題。對第二階段過程中的任意合併，假設其兩個子串列都符合命題。則合併的過程中，由第一點可知，其中一個子串列一定為二的冪，因此其 merge tree 為 full binary tree ，所有 leaf 都在最底層。另其長度為 $2^{k_1}$ ，則 merge tree 的高度為 $k_1$ 。當另一個子串列如果也是二的冪，因為第二點中兩者差異不大於兩倍，因此其大小只可能是 $2^{k_1-1}$ 或 $2^{k_1+1}$ 。 merge tree 的高度 $k_2 = k_1\pm1$，符合命題。當第二的子串列不為二的冪，那根據假設，此串列的 merge tree 所有 leaf 都在最下面兩層，而其中一層就是 $k_1$ ，否則會違反第二點兩者差異不大於兩倍的描述，因此也符合命題。根據數學歸納法，對第二階段過程中的任意合併，其 merge tree 的所有 leaf 都在最下面兩層。所以可以得出最後一次合併所產生的 merge tree 也符合命題。根據論文中所述，可得知 `list_sort` 中的演算法也是 optimal merge sort 。