2025q1 Homework1 (lab0)

contributed by < kdnvt >

作業書寫規範:

無論標題和內文中，中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)
文字訊息 (尤其是程式執行結果) 請避免用圖片來表示，否則不好搜尋和分類
共筆書寫請考慮到日後協作，避免過多的個人色彩，用詞儘量中性
不要在筆記內加入 [TOC] : 筆記左上方已有 Table of Contents (TOC) 功能，不需要畫蛇添足
不要變更預設的 CSS 也不要加入任何佈景主題: 這是「開發紀錄」，用於評分和接受同儕的檢閱
當在筆記中貼入程式碼時，避免非必要的行號，也就是該手動將 c= 或 cpp= 變更為 c 或 cpp。行號只在後續討論明確需要行號時，才要出現，否則維持精簡的展現。可留意「你所不知道的 C 語言: linked list 和非連續記憶體」裡頭程式碼展現的方式
HackMD 不是讓你張貼完整程式碼的地方，GitHub 才是！因此你在開發紀錄只該列出關鍵程式碼 (善用 diff 標示)，可附上對應 GitHub commit 的超連結，列出程式碼是為了「檢討」和「便於他人參與討論」
留意科技詞彙的使用，請參見「資訊科技詞彙翻譯」及「詞彙對照表」
不要濫用 :::info, :::success, :::warning 等標示，儘量用清晰的文字書寫。:::danger 則僅限授課教師作為批注使用
避免過多的中英文混用，已有明確翻譯詞彙者，例如「鏈結串列」(linked list) 和「佇列」(queue)，就使用該中文詞彙，英文則留給變數名稱、人名，或者缺乏通用翻譯詞彙的場景
在中文敘述中，使用全形標點符號，例如該用「，」，而非 ","。注意書名號的使用，即 〈 和 〉，非「小於」和「大於」符號
避免使用不必要的 emoji 字元

開發環境

$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           126
Model name:                      Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz
Stepping:                        5
CPU MHz:                         1200.000
CPU max MHz:                     3600.0000
CPU min MHz:                     400.0000
BogoMIPS:                        2380.80
Virtualization:                  VT-x
L1d cache:                       192 KiB
L1i cache:                       128 KiB
L2 cache:                        2 MiB
L3 cache:                        6 MiB
NUMA node0 CPU(s):               0-7

針對佇列操作的程式碼實作

q_free

一開始的程式為以下

void q_free(struct list_head *l)
{
    element_t *entry, *safe;
    list_for_each_entry_safe (entry, safe, l, list) {
        if (entry->value)
            free(entry->value);
        free(entry);
    }
    free(l);

    return;
}

但當我完成 insert , remove 等程式，發現有關 malloc failure on new 的測試有時會發生錯誤，而且是 segmentation fault 。我決定用 make valgrind 來檢查出錯的地方，於是有了以下訊息：

# Test of malloc failure on new
==26221== Invalid read of size 8
==26221==    at 0x10E6C9: q_free (queue.c:45)
==26221==    by 0x10ADC4: queue_quit (qtest.c:838)
==26221==    by 0x10D54E: do_quit (console.c:253)
==26221==    by 0x10DF9F: finish_cmd (console.c:593)
==26221==    by 0x10C728: main (qtest.c:963)
==26221==  Address 0x8 is not stack'd, malloc'd or (recently) free'd

根據以上訊息，可以知道問題發生在 q_free 中，而 q_free 中第一個遇到的就是 list_for_each_entry_safe 這個巨集，於是我去看 list.h 中此巨集的實作方法：

#define list_for_each_entry_safe(entry, safe, head, member)                \
    for (entry = list_entry((head)->next, __typeof__(*entry), member),     \
        safe = list_entry(entry->member.next, __typeof__(*entry), member); \
         &entry->member != (head); entry = safe,                           \
        safe = list_entry(safe->member.next, __typeof__(*entry), member))

可以看到這個巨集一開始就直接用

entry = list_entry((head)->next, typeof(*entry), member)

將 list_entry 的內容指派給 entry 。而且也直接使用(head)->next ，所以我很快發現我忘記檢查傳入的引數是否為 NULL 或 empty queue，於是我將程式碼改成以下：

 void q_free(struct list_head *l)
 {
+    if (!l)
+        return;
+    if (list_empty(l)) {
+        free(l);
+        return;
+    }
     element_t *entry, *safe;
     list_for_each_entry_safe (entry, safe, l, list) {
         if (entry->value)

其他想法：

上面那段程式中

    if (!l)
        return;
    if (list_empty(l)) {
        free(l);
        return;
    }

有點冗長，於是我嘗試將程式碼改成以下：

    if (!l || (list_empty(l) && (free(l),true)))
        return;

其中用到了 || 及 && 在前者條件已經確定下，後者不會被執行的技巧。而後面的 (free(l),true) 則是我參閱規格書中 comma operator ：

The left operand of a comma operator is evaluated as a void expression; there is a sequence point after its evaluation. Then the right operand is evaluated; the result has its type and value. 97) If an attempt is made to modify the result of a comma operator or to access it after the next sequence point, the behavior is undefined.

因此我想它的行為應該會是執行完前面的 free(l) 後直接以 true 來取代這段 expression 。

最後發現根本不用這麼麻煩，因為如果傳入 free 的指標本身就指向 NULL ，free 是不會有任何行為的，所以根本不用多做一個檢查。

簡化後的版本：

 void q_free(struct list_head *l)
 {
-    if (!l)
-        return;
-    if (list_empty(l)) {
+    if (!l || list_empty(l)) {
         free(l);
         return;
     }
     element_t *entry, *safe;
-    list_for_each_entry_safe (entry, safe, l, list) {
-        if (entry->value)
-            free(entry->value);
-        free(entry);
-    }
+    list_for_each_entry_safe (entry, safe, l, list)
+        q_release_element(entry);
     free(l);
 
     return;

q_sort

一開始決定以疊代迭代的方式實作 merge sort 。

依據《國家教育研究院雙語詞彙、學術名詞暨辭書資訊網》，"iterative" 一詞翻譯為「迭代」。
注意「疊」和「迭」語意不同:

疊: 堆聚、累積
迭: 輪流、更替，如：「更迭」。《詩經．邶風．柏風》：「日居月諸，胡迭而微。」

依本處情境來，乃逐一節點走訪，存取特定的記憶體內容，因此該用「迭」
:notes: jserv

初始版本：

＃define STACKSIZE 100000

void q_sort(struct list_head *head)
{
    if (!head || list_empty(head) || list_is_singular(head))
        return;

    int count = 0, n = q_size(head);
    struct list_head *sorted[STACKSIZE];

    struct list_head *cur, *safe;
    list_for_each_safe (cur, safe, head)
        INIT_LIST_HEAD(sorted[count++] = cur);

    for (int size_each_list = 1; size_each_list < n; size_each_list *= 2) {
        for (int i = 0; i + size_each_list < n; i += size_each_list * 2) {
            struct list_head *left = sorted[i];
            struct list_head *right = sorted[i + size_each_list];
            sorted[i] = merge(left, right);
        }
    }
    list_add_tail(head, sorted[0]);
}

/*
 * The list in this function is doubly circular linked_list without
 * the Head node.
 * Move each node in list right to list left at corresponding position.
 */
struct list_head *merge(struct list_head *left, struct list_head *right)
{
    struct list_head *head = left;
 
    if (strcmp(list_entry(left, element_t, list)->value,
               list_entry(right, element_t, list)->value) > 0) 
        list_move_tail(head = (right = right->next)->prev, left);
    
    struct list_head *tail = head->prev;
    while (left != tail && right->next != left) {
        int cmp = strcmp(list_entry(left, element_t, list)->value,
                         list_entry(right, element_t, list)->value);
        if (cmp <= 0)  // to keep sorting stable, split condition as <= , >
            left = left->next;
        else
            list_move_tail((right = right->next)->prev, left);
    }
    while (right->next != left && right->next != head) {
        int cmp = strcmp(list_entry(left, element_t, list)->value,
                         list_entry(right, element_t, list)->value);

        list_move_tail((right = right->next)->prev, cmp < 0 ? head : left);
        // Shoud be cmp <= 0 because of stability
    }
    return head;
}

上面的 merge 的實作本來是想把右列的 sorted list 一個一個判斷後插入左列，但越寫越不直覺，還要想一些例外狀況，可讀性差也不易理解，於是寫完之後決定重寫一份：

 struct list_head *merge(struct list_head *left, struct list_head *right)
 {
-    struct list_head *head = left;
- 
-    if (strcmp(list_entry(left, element_t, list)->value,
-               list_entry(right, element_t, list)->value) > 0) 
-        list_move_tail(head = (right = right->next)->prev, left);
-    
-    struct list_head *tail = head->prev;
-    while (left != tail && right->next != left) {
-        int cmp = strcmp(list_entry(left, element_t, list)->value,
-                         list_entry(right, element_t, list)->value);
-        if (cmp <= 0)  // to keep sorting stable, split condition as <= , >
-            left = left->next;
-        else
-            list_move_tail((right = right->next)->prev, left);
-    }
-    while (right->next != left && right->next != head) {
-        int cmp = strcmp(list_entry(left, element_t, list)->value,
-                         list_entry(right, element_t, list)->value);
+    struct list_head *head;
+
+    int cmp = strcmp(list_entry(left, element_t, list)->value,
+                     list_entry(right, element_t, list)->value);
+    struct list_head **chosen =
+        cmp <= 0 ? &left : &right;  // cmp <= 0 for stability
+    head = *chosen;
+    *chosen = (*chosen)->next;
+
+    list_del_init(head);
 
-        list_move_tail((right = right->next)->prev, cmp < 0 ? head : left);
-        // Shoud be cmp <= 0 because of stability
+    while (left->next != head && right->next != head) {
+        cmp = strcmp(list_entry(left, element_t, list)->value,
+                     list_entry(right, element_t, list)->value);
+        chosen = cmp <= 0 ? &left : &right;  // cmp <= 0 for stability
+        list_move_tail((*chosen = (*chosen)->next)->prev, head);
     }
+    struct list_head *remain = left->next != head ? left : right;
+    struct list_head *tail = head->prev;
+
+    head->prev = remain->prev;
+    head->prev->next = head;
+    remain->prev = tail;
+    remain->prev->next = remain;
+
     return head;
 }

上面這段程式碼是先判斷找出 value 最小的 element 當成第一個 node ，然後比較左右列最小的 node 一個一個插入進去。
執行起來較為對稱，也沒有太多例外狀況，較好理解可讀性也較佳。

需要特別注意的是為了保持排序的穩定性，條件為 cmp <= 0 而不是 cmp < 0 。

不過 q_sort 中有一個致命的缺點，就是陣列 sorted 是固定長度，若佇列太小會很浪費空間，太大則會無法執行。另外，這次作業似乎禁止在 q_sort 中使用 malloc ，不過就算能用，對於記憶體也可能會是很大的負擔。於是我想了另一套實作方法來解決這個問題：

@@ -3,19 +3,34 @@ void q_sort(struct list_head *head)
     if (!head || list_empty(head) || list_is_singular(head))
         return;
 
-    int count = 0, n = q_size(head);
-    struct list_head *sorted[STACKSIZE];
-
     struct list_head *cur, *safe;
     list_for_each_safe (cur, safe, head)
-        INIT_LIST_HEAD(sorted[count++] = cur);
+        cur->prev = cur;
+
+    /* pointer first points to first sorted list */
+    struct list_head *first = head->next;
+    INIT_LIST_HEAD(head);
+    while (first->prev->next != head) {
+        struct list_head **last = &first;
+        struct list_head *next_list = (*last)->prev->next;
+        struct list_head *next_next_list = next_list->prev->next;
+
+        while ((*last) != head && next_list != head) {
+            /* Make each sorted list doubly circular list
+             * to make use of function merge.
+             */
+            (*last)->prev->next = (*last);
+            next_list->prev->next = next_list;
+            (*last) = merge((*last), next_list);
 
-    for (int size_each_list = 1; size_each_list < n; size_each_list *= 2) {
-        for (int i = 0; i + size_each_list < n; i += size_each_list * 2) {
-            struct list_head *left = sorted[i];
-            struct list_head *right = sorted[i + size_each_list];
-            sorted[i] = merge(left, right);
+            /* The result of function merge is a doubly circular list,
+             * so make tail of list point it next to next sorted list.
+             */
+            last = &((*last)->prev->next);
+            *last = next_next_list;
+            next_list = (*last)->prev->next;
+            next_next_list = next_list->prev->next;
         }
     }
-    list_add_tail(head, sorted[0]);
+    list_add_tail(head, first);
 }

這個方法主要使用每個 sorted list 尾端節點的 next 指向下一個 sorted list ，如此一來我就可以用 linked_list 來走訪各個子 sorted list ，不需要額外的空間來記錄這些 sorted list 的位置。
結構大致上如下：

圖中 1 5 7 為一個 sorted list ， 2 3 6 為另一個 sorted list ，以一個 first 來記錄整個 linked list 的位置，最後一個節點則指向 head ，而 head 指向自己。
註：此圖僅為說明，若是圖中的佇列，以程式碼執行的結果應該不會出現三個三個一組的情形。

範例說明：
假如一個佇列順序為 1 6 7 2 5 3 。

程式一開始先以

    struct list_head *cur, *safe;
    list_for_each_safe (cur, safe, head)
        cur->prev = cur;

    /* pointer first points to first sorted list */
    struct list_head *first = head->next;
    INIT_LIST_HEAD(head);

將佇列切割成單個 sorted list ，圖如下：

接著對每兩個相鄰的 sorted list 做合併：

最後再將 head 插回排序好的佇列中，再回傳就完成了。

研讀 lib/list_sort.c

除了程式碼及註解外，我參考 DokiDokiPB，才發現 commit message 也是很重要的資訊來源。如果實作上較為複雜或有其他考量，開發者會把重要的資訊及考量的點也寫在 commit message 中。

延伸閱讀：
lib/list_sort: optimize number of calls to comparison function

研讀 lib/list_sort.c 時我發現 list_sort.c 實作方法跟我的有一些相似之處，例如用 prev 來串起排序好的子串列，以及用 bottom up 的方法實作排序。不過也有許多不同的地方， list_sort.c 在合併的過程中，直接把子串列當成單向鏈結串列，因此操作上省去了許多 prev 的指標操作，只要在最後一次 merge 時再串起即可。

而選取合併的子串列時，雖然是用 bottom up 的方法，但不同於一般的 level order merge ——即每一層相同大小的子串列都合併完後，才合併更大的子串列—— list_sort.c 的實作上採用了 depth-first order ，目的是為了避免 thrashing 。

level order merge 在每一層都會幾乎讀取所有子串列(有些實作上最後一條子串列可能不對讀到)，當要排序的雙向鏈結串列超過 L1 cache 的大小，同一層前面排序好的子串列必須從 cache 中搬出，讓給後面要排序的子串列。而到了下一層，剛被搬出的子串列又要被搬進 cache ，造成資料不斷被搬進搬出 cache ，頻繁讀取卻沒有利用到 cache ，對於效能是很大的傷害。如果採用 depth-first order，就可以趁相近的子串列都還在 L1 cache 時，盡量將大小相同的兩者合併，進而打造出 cache friendly 的實作方法。

不過只用 depth-first order 還有一個很大的問題，就是會造成 unbalanced merge ，即兩個大小差異極大的子串列做合併。（我的實作中沒有考慮到這一點，這是一個可以再改進的部份。）

因此，這份 commit ：lib/list_sort: optimize number of calls to comparison function 的作者 George Spelvin 提出了另一種實作方式來改善這個缺點。

首先，他分析了不同實作下所需的 compares 次數。

n \times \log_{2} (n) - K \times n + O (1)

其中使用一般 Top down merge sort 因為可以將子串列平均分割，因此平均情況下 K = 1.248 ，而一般的 bottom up merge sort 平均情況下 K = 0.965 。（George Spelvin 做的實驗顯示 K = 1.01）

但是使用 Top down merge sort 或其他改良版的 merge sort (如佇列merge sort)時，都會用到 breadth-first multi-pass structure ，與我們想要以 depth-first order 善用 cache 的需求不符。

George Spelvin 透過確保每次合併時，兩個子串列的大小不會大於

2 : 1

，成功的將 K 值增加到 1.207，省下了

0.2 \times n

次的 compares 操作。

George Spelvin 在 commit message 最後附上了一些分析合併排序的論文：

Bottom-up Mergesort: A Detailed Analysis
Wolfgang Panny, Helmut Prodinger Algorithmica 14(4):340–354, October 1995

The cost distribution of queue-mergesort, optimal mergesorts, and power-of-two rules
Wei-Mei Chen, Hsien-Kuei Hwang, Gen-Huey Chen Journal of Algorithms 30(2); Pages 423–448, February 1999

Queue-Mergesort
Mordecai J. Golin, Robert Sedgewick
Information Processing Letters, 48(5):253–259, 10 December 1993

確保 2 : 1 的實作方法

這段操作十分精簡且高效，。
George Spelvin 甚至在 commit message 的最後寫上了：

I confess to being more than a little bit proud of how clean this code turned out. It took a lot of thinking, but the resultant inner loop is very simple and efficient.

先引入一個變數 count 來記錄目前讀進的元素個數，並根據其 bit pattern 來判斷每次的合併。
節錄 list_sort.c 中的註解：

/*
 * The number of pending lists of size 2^k is determined by the
 * state of bit k of "count" plus two extra pieces of information:
 *
 * - The state of bit k-1 (when k == 0, consider bit -1 always set), and
 * - Whether the higher-order bits are zero or non-zero (i.e.
 *   is count >= 2^(k+1)).
 *
 * There are six states we distinguish.  "x" represents some arbitrary
 * bits, and "y" represents some arbitrary non-zero bits:
 * 0:  00x: 0 pending of size 2^k;           x pending of sizes < 2^k
 * 1:  01x: 0 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
 * 2: x10x: 0 pending of size 2^k; 2^k     + x pending of sizes < 2^k
 * 3: x11x: 1 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
 * 4: y00x: 1 pending of size 2^k; 2^k     + x pending of sizes < 2^k
 * 5: y01x: 2 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
 * (merge and loop back to state 2)
 */

其中

/*      k k-1
 *      v v   
 * 0:   0 0 x: 0 pending of size 2^k;           x pending of sizes < 2^k
 * 1:   0 1 x: 0 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
 * 2: x 1 0 x: 0 pending of size 2^k; 2^k     + x pending of sizes < 2^k
 * 3: x 1 1 x: 1 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
 * 4: y 0 0 x: 1 pending of size 2^k; 2^k     + x pending of sizes < 2^k
 * 5: y 0 1 x: 2 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
 * (merge and loop back to state 2)
 */

左邊的 0/1 代表第 k 個 bit ，右邊的 0/1 代表第 k-1 個 bit 。
一開始不知道那一個才是 bit k ，看了好久才看懂。
x 最多為 k-2 bits ，因此最大為

2^{k - 1} - 1

。

對於第 k bit ，可以將其分為以下六種狀態，並由其所在的狀態來判斷目前長度為

2^{k}

的串列數量：

0: 00x: 0 pending of size 2^k; x pending of sizes < 2^k

代表目前有 x 數量個元素在長度小於

2^{k}

的子串列中。

1: 01x: 0 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k

代表有 (

2^{k - 1}

+ x) 個元素在長度小於

2^{k}

的子串列中。

2: x10x: 0 pending of size 2^k; 2^k + x pending of sizes < 2^k

代表有 (

2^{k}

+ x) 個元素在長度小於

2^{k}

的子串列中，但為了確保之後的合併不會超過 2 : 1 ，因此不會將那

2^{k}

個合併成一個長度為

2^{k}

的串列。

3: x11x: 1 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k

代表有一個長度為

2^{k}

的串列，另外有 (

2^{k}

+ x) 個元素在長度小於

2^{k}

的子串列中。
因為至少有

2^{k}

2^{k - 1}

個元素，所以可以將那

2^{k}

個元素合併。如果之後都沒有其他元素了，剩下長度差異最多也就是

2^{k} : 2^{k - 1} = 2 : 1

。

4: y00x: 1 pending of size 2^k; 2^k + x pending of sizes < 2^k

代表有一個長度為

2^{k}

的串列，另外有 (

2^{k}

+ x) 個元素在長度小於

2^{k}

的子串列中。跟狀態 2 一樣，雖然有可合併成

2^{k}

的子串列，但不會立即合併成

2^{k}

。

5: y01x: 2 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k

代表有兩個長度為

2^{k}

的串列，另外有 (

2^{k - 1}

+ x) 個元素在長度小於

2^{k}

的子串列中。兩個長度為

2^{k}

的子串列一樣不會立即合併成

2^{k + 1}

，唯有確定後面的元素個數達到

2^{k}

，足以確保差距不會超過

2^{k + 1} : 2^{k} = 2 : 1

，才會進行合併並且回到狀態 2 。

參考 DokiDokiPB 及hankluo6

count decimal	count binary	merge	before merge	after merge
0	0000	No	NULL	X
1	0001	No	1	X
2	0010	Yes	1-1	2
3	0011	No	1-2	X
4	0100	Yes	1-1-2	2-2
5	0101	Yes	1-2-2	1-4
6	0110	Yes	1-1-4	2-4
7	0111	No	1-2-4	X
8	1000	Yes	1-1-2-4	2-2-4
9	1001	Yes	1-2-2-4	1-4-4
10	1010	Yes	1-1-4-4	2-4-4
11	1011	Yes	1-2-4-4	1-2-8
12	1100	Yes	1-1-2-8	2-2-8
13	1101	Yes	1-2-2-8	1-4-8
14	1110	Yes	1-1-4-8	2-4-8
15	1111	No	1-2-4-8	X
16	10000	Yes	1-1-2-4-8	2-2-4-8

有了以上的初步理解，就可以來看 list_sort.c 程式實作：

do {
	size_t bits;
	struct list_head **tail = &pending;

	/* Find the least-significant clear bit in count */
	for (bits = count; bits & 1; bits >>= 1)
		tail = &(*tail)->prev;
	/* Do the indicated merge */
	if (likely(bits)) {
		struct list_head *a = *tail, *b = a->prev;

		a = merge(priv, cmp, b, a);
		/* Install the merged result in place of the inputs */
		a->prev = b->prev;
		*tail = a;
	}

	/* Move one element from input list to pending */
	list->prev = pending;
	pending = list;
	list = list->next;
	pending->next = NULL;
	count++;
} while (list);

先分析每一次讀取時 count 的變化，有可能是：

0 \overset{k 個}{\overset{⏞}{11. .11}} = 01 x

(狀態 1)

這樣的話會被下面的程式排除，不給合併。

    for (bits = count; bits & 1; bits >>= 1)
        tail = &(*tail)->prev;
	/* Do the indicated merge */
    if (likely(bits)) {
        /*....*/
    }

另一種則是

y 0 \overset{k 個}{\overset{⏞}{11. .11}} = y 01 x

(狀態 5)

每一次 count + 1 ，都會有一個 bit k 從 0 進位成 1 ，這個 bit 為 least signifcant clear bit 。而這個 bit 處於狀態 5 ，也就是目前有兩個長度為

2^{k}

的子串列。

在它之前的所有 bits 皆為狀態 3 。

x 111. .11 = x 11 x

(狀態 3)

包括第 0 個 bit

The state of bit k-1 (when k == 0, consider bit -1 always set)

根據狀態 3 的敘述，可以說目前

\forall i \in Z, 0 \leq i \leq k - 1

都有剛好一條長度為

2^{i}

的子串列。

也就是目前的子串列大小排列是：

\overset{p e n d i n g}{⟹} \overset{k 個 prev}{\overset{⏞}{2^{0} \overset{p r e v}{⟹} 2^{1} \overset{p r e v}{⟹} . . . . . \overset{p r e v}{⟹} 2^{k}}} \overset{p r e v}{⟹} 2^{k} \overset{p r e v}{⟹} . .

目前在長度小於

2^{k}

的子串列中的元素個數為：

(2^{0} + 2^{1} + 2^{2} + . . . + 2^{k - 1}) + 1 = (2^{k} - 1) + 1 = 2^{k}

數學式中最後的

1

並不在各個子串列中，他是這個迴圈的結尾才會被加進的元素。這個數學式讓我們得知這個迴圈結束時長度小於

2^{k}

的子串列中的元素個數已經到了

2^{k}

，因此可以合併兩條大小為

2^{k}

的子串列。

所以在一開始的迴圈只要往前 k 次就可以到達要合併子串列：

for (bits = count; bits & 1; bits >>= 1)
    tail = &(*tail)->prev;

到達後進行合併：

if (likely(bits)) {
    struct list_head *a = *tail, *b = a->prev;

    a = merge(priv, cmp, b, a);
    /* Install the merged result in place of the inputs */
    a->prev = b->prev;
    *tail = a;
}

然後輸入下一個元素：

    /* Move one element from input list to pending */
    list->prev = pending;
    pending = list;
    list = list->next;
    pending->next = NULL;
    count++;

沒有待輸入的元素時，將剩餘的子串列從小到大合併：

    /* End of input; merge together all the pending lists. */
    list = pending;
    pending = pending->prev;
    for (;;) {
        struct list_head *next = pending->prev;

        if (!next)
            break;
        list = merge(priv, cmp, pending, list);
        pending = next;
    }

最後在用 merge_final 合併最後兩條子串列，並串起 prev ：

    merge_final(priv, cmp, head, pending, list);

用這種方式就可以達成 depth-first merge sort 同時保證兩者差異不會大於

2 : 1

。

一些證明

前提：所有子串列其長度皆為 2 的冪，所有合併皆為同大小串列。

設

t_{i}

為目前長度小於

2^{i}

的所有子串列的節點數總和。

i

為正整數。

第一點：
證明當

t_{i}

從

3 \times 2^{i - 1} - 1

加一變成

3 \times 2^{i - 1}

時，不存在正整數

j \neq i

使得

t_{j}

3 \times 2^{j - 1}

。

假設存在

j > i

且

t_{j}

3 \times 2^{j - 1}

。則有一個正整數

l

使得

3 \times 2^{i - 1} + l = 3 \times 2^{j - 1}

。因為長度小於

2^{i}

的所有子串列的節點數總和都包含在

t_{i}

裡面了，所以

l

都在長度大於等於

2^{i}

的串列裡。因此可以把

l

寫成

A \times 2^{i}

。其中 A 等於任意正整數。

3 \times 2^{i - 1} + A \times 2^{i} = 3 \times 2^{j - 1}

(3 + 2 A) \times 2^{i - 1} = 3 \times 2^{j - 1}

\frac{(3 + 2 A)}{3} = 2^{j - i}

1 + \frac{2 A}{3} = 2^{j - i}

令

A = 3 B

代入，

B

為任意正整數。

1 + 2 B = 2^{j - i}

因為

j > i

，所以

2^{j - i}

必為偶數，不存在任何正整數

B

使得

1 + 2 B

等於偶數，假設不成立，因此不存在

j > i

且

t_{j}

3 \times 2^{j}

。

假設存在

j < i

且

t_{j}

3 \times 2^{j - 1}

。則有一個正整數

l

使得

3 \times 2^{i - 1} = 3 \times 2^{j - 1} + l

。因為長度小於

2^{j}

的所有子串列的節點數總和都包含在

t_{j}

裡面了，所以

l

都在長度大於等於

2^{j}

的串列裡。因此可以把

l

寫成

A \times 2^{j}

。其中 A 等於任意正整數。

3 \times 2^{i - 1} = 3 \times 2^{j - 1} + A \times 2^{j}

3 \times 2^{i - 1} = (3 + 2 A) \times 2^{j - 1}

\frac{(3 + 2 A)}{3} = 2^{i - j}

1 + \frac{2 A}{3} = 2^{i - j}

令

A = 3 B

代入，

B

為任意正整數。

1 + 2 B = 2^{i - j}

因為

j < i

，所以

2^{i - j}

必為偶數，不存在任正整數

B

使得

1 + 2 B

等於偶數，假設不成立，因此不存在

j < i

且

t_{j}

3 \times 2^{j}

。

總和上述兩點，可得證。

第二點：
證明在 "能保持差異最大

2 : 1

的情況下，能合併則合併。" 的條件下，當

t_{n + 1}

從

3 \times 2^{n} - 1

增加到

3 \times 2^{n}

時，一定存在兩個長度為

2^{n}

的子串列可供合併。

當

n = 0

，因為長度最短為 1，

3 \times 2^{0}

= 3，長度分別為 1,1,1 ，因此可以合併，成立。

設當

n = k

時成立，則當 n = k+1 時，必須證明當

t_{k + 2}

增加到

3 \times 2^{k + 1}

時，一定存在兩個長度為

2^{k + 1}

的子串列可合併。
第一個

2^{k + 1}

子串列在

t_{k + 1}

來到

3 \times 2^{k}

時根據假設誕生。此時

t_{k + 2}

也等於

3 \times 2^{k}

，合併後

t_{k + 2}

3 \times 2^{k}

，

t_{k + 1}

2^{k}

。
第二個

2^{k + 1}

子串列在

t_{k + 1}

再次來到

3 \times 2^{k}

時根據假設誕生，此時

t_{k + 2}

來到

5 \times 2^{k}

。
由上述兩點可知，在

t_{k + 2}

從 0 增加到

3 \times 2^{k + 1}

的過程中，一定會經過

3 \times 2^{k}

以及

5 \times 2^{k}

，並在這兩時刻產生出長度為

2^{k + 1}

的串列。所以當

t_{k + 2}

增加到

6 \times 2^{k} = 3 \times 2^{k + 1}

時，一定存在兩個長度為

2^{k + 1}

的子串列可供合併，根據數學歸納法得證。

第三點：
證明

2^{i}

長度串列的合併(兩個

2^{i - 1}

合併成

2^{i}

)只會發生在

t_{i}

變成

3 \times 2^{i - 1}

。

由第二點可知，對所有自然數

i

，

t_{i}

不會超過

3 \times 2^{i - 1}

，因為只要等於

3 \times 2^{i - 1}

就會合併串列並變回

2^{i - 1}

。所以合併不會發生在

t_{i} > 3 \times 2^{i - 1}

。

當

t_{i} < 3 \times 2^{i - 1}

，此時合併成

2^{i}

串列無法保證兩條串列比例差異不會超過

2 : 1

，所以不會合併。

故合併只會發生在

t_{i} = 3 \times 2^{i - 1}

得證。

綜合上述三點，可得出當 count 增加到 count+1 ，最多只會存在一正整數

i

使得

t_{i}

為

3 \times 2^{i - 1}

。若此

i

存在，則一定可合併成長度為

2^{i}

的串列，且此串列為此時合併的唯一串列。

每次 count 遞增所合併的串列也不會造成更長串列的連鎖合併，因為合併成

2^{i}

只會改變

t_{i}

的值，不會改變其他的

t

值，所以不會有其他

t_{j}

增加成

3 \times 2^{j - 1}

而造成合併。

為什麼 list_sort 中串列大小的比例差距限制是二比一

因為目的是為了避免合併差距過大，因此使用二比一而不是更大三比一、四比一很合理。需要考慮的是為什麼不用三比二或四比三這種比例。

先簡述一下 list_sort 的演算法：

第一階段：
將節點一個一個讀取，如果確定合併的大小不會超過整體的特定比例，就將之前的串列合併。因為本質還是 bottom up merge sort ，所以此階段的合併都是同大小，所有子串列也都會是二的冪。

第二階段：
當所有節點都讀進並盡可能的同大小合併後，就來到必須合併不同大小的階段。這個階段會將子串列由小大到一個接一個合併。

從以上兩階段可以看出，因為第一階段都是二的冪，在維持

2 m : 2 m + 1

比例的情況下，第一階段結束時最大的子串列就是

2^{⌊ l o g_{2} 2 m n / (2 m + 1) ⌋}

。

在第二篇論文 The cost distribution of queue-mergesort, optimal mergesorts, and
power-of-two rules 中看到以下這段話：

Note that
$2^{⌊ l o g_{2} 2 n / 3 ⌋}$ is the unique power of two lying between
$n / 3$ and
$2 n / 3$ and that the choice of rationals other than
$2 / 3$ will not be more balanced. For example, if
$2^{⌊ l o g_{2} n / 2 ⌋}$ or
$2^{⌊ l o g_{2} 5 n / 9 ⌋}$ then the sizes of two subproblems are
$(2, 5)$ for
$n = 7$ while the balanced power-of-two gives
$(3, 4)$ .

這段話其實沒有很精準，因為當

n = 3 \times 2^{k}

時，在包括的情況下介於

n / 3

以及

2 n / 3

的二的冪會有兩個，再不包括的情況下卻又會不存在。如果是寫成

n / 3 < 2^{⌊ l o g_{2} 2 n / 3 ⌋} \leq 2 n / 3

就符合這段話的描述。
而這段敘述講述了一件事，

2^{⌊ l o g_{2} 2 n / 3 ⌋}

可以找到最接近

n / 2

的二的冪。

如果今天是用

2^{⌊ l o g_{2} 3 n / 5 ⌋}

去求

2^{k}

，則當

n = 13

時，

2^{⌊ l o g_{2} 3 n / 5 ⌋} = 4

，而最接近

13 / 2

的二的冪卻是

8

。
這代表如果在第一階段用

3 : 2

的比例去限制，在第二階段合併中，最後一次合併會是

9 : 4

，這個比例差距已經超過第一階段的

3 : 2

，而同樣的反例在

2 m : 2 m + 1, m > 2

中也都會發生。因此這樣的演算法只能用在

2 : 1

的情況。

證明 list_sort 的演算法是 optimal mergesort

optimal merge sort 定義：在最差的情況下，比較次數小於等於所有其他 mergesort 。

對所有 mergesort ，都可以繪製出一個 merge tree 來表示其合併的順序及長度，每一個節點的權重代表當下的串列長度。方形的是 external node (leaf) ，圓形的是 internal node 。如下圖所示：

參考資訊: graphviz

internal node 的數量為 leaf 數減一。在最差的情況下，合併 (x,y) 的次數為 x+y-1 ，因此最差情況下總共的比較次數就是

\sum_{v}^{T} (w (v) - 1) = \sum_{v}^{T} w (v) - (n - 1)

其中

n

為這次排序的節點總數，也是 leaf 的總數。而

v

為所有的 internal node 。

w (v)

為該 internal node 的權重（也就是長度）。在這邊因為對所有合併排序 n - 1 都是固定的，所以只要看

\sum_{v}^{T} w (v)

就好。

引述論文 Queue-Mergesort 中以下段落：

We use the fact that the weight of a merge tree is equal to its external path length. The height h(f) of a node I in a tree is the distance of a path from 1 to the root. The external path length of a tree T’ is the sum E(T’) =
$\sum_{l a l e a f o f T^{'}} h (l)$

以及以下段落：

Thus a merge tree T describes an optimal merge-sort on n items if and only if T has minimum external path length
$\sum_{l a l e a f} h (l)$ . It is known that this occurs if and only if the following condition is satisfied: all of T’s leaves are located on its bottom two levels.

因此我們可以得知，只要證明 list_sort 的 merge tree 符合所有 leaf 都在最下面兩層這個條件，就可以證明它是 optimal merge sort 。

分析 list_sort 的演算法，可以得出以下兩點：

第一點：在合併的過程中，最多只有一個子串列不是二的冪（第二階段排最後的子串列）。

第二點：在合併的過程中，兩者差異不大於兩倍。

證明合併過程中對所有長度 n 的子串列，其 merge tree 的所有 leaf 都在最下面兩層。

證明過程：

第一階段所有合併皆為二的冪，符合命題。

用數學歸納法證明第二階段：

最小端的子串列只可能是 (1,1) 或 (1,2)，兩者合併都符合命題。

對第二階段過程中的任意合併，假設其兩個子串列都符合命題。

則合併的過程中，由第一點可知，其中一個子串列一定為二的冪，因此其 merge tree 為 full binary tree ，所有 leaf 都在最底層。另其長度為

2^{k_{1}}

，則 merge tree 的高度為

k_{1}

。

當另一個子串列如果也是二的冪，因為第二點中兩者差異不大於兩倍，因此其大小只可能是

2^{k_{1} - 1}

或

2^{k_{1} + 1}

。 merge tree 的高度

k_{2} = k_{1} \pm 1

，符合命題。

當第二的子串列不為二的冪，那根據假設，此串列的 merge tree 所有 leaf 都在最下面兩層，而其中一層就是

k_{1}

，否則會違反第二點兩者差異不大於兩倍的描述，因此也符合命題。

根據數學歸納法，對第二階段過程中的任意合併，其 merge tree 的所有 leaf 都在最下面兩層。所以可以得出最後一次合併所產生的 merge tree 也符合命題。

根據論文中所述，可得知 list_sort 中的演算法也是 optimal merge sort 。