2022q1 Homework1 (lab0)

contributed by < SmallHanley >

作業說明
 GitHub

開發環境

$ uname -r
5.13.0-28-generic

$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           142
Model name:                      Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
Stepping:                        10
CPU MHz:                         1800.000
CPU max MHz:                     3400.0000
CPU min MHz:                     400.0000
BogoMIPS:                        3600.00
Virtualisation:                  VT-x
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        1 MiB
L3 cache:                        6 MiB
NUMA node0 CPU(s):               0-7

實作 queue.c

q_new()

malloe 一個 struct list_head 大小的空間，代表這個新串列的 head 節點，並用 head 指標指向此空間。
檢查 head 是否為 NULL，如果 malloc 失敗回傳 NULL。
呼叫 INIT_LIST_HEAD(head) 使得 head 節點的 prev 和 next 指標皆指向自己，完成雙向環狀串列初始化。

struct list_head *q_new()
{
    struct list_head *head = malloc(sizeof(struct list_head));
    if (!head) {
        return NULL;
    }
    INIT_LIST_HEAD(head);
    return head;
}

q_free()

檢查 l 是否為 NULL。
呼叫 list_for_each_entry_safe (elm, tmp, l, list) 走訪串列 l 中的每個結構體 element_t，每次迭代中依序執行移除節點、釋放 value 空間、釋放結構體 element_t。
最後要釋放串列的 head 節點。
這裡使用 list_for_each_entry_safe() 而不是 list_for_each_entry() 是因為操作會牽涉到移除 current node，前者會先暫存下個節點的位址。

void q_free(struct list_head *l)
{
    element_t *elm, *tmp;
    if (l) {
        list_for_each_entry_safe (elm, tmp, l, list) {
            list_del(&elm->list);
            free(elm->value);
            free(elm);
        }
        free(l);
    }
}

不要只張貼程式碼，你應該說明並探討後續的改進空間。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

TODO

q_insert_head()

檢查 head 是否為 NULL，如果是則回傳 false。
malloc 一個空間給欲新插入的結構體 elm，如果 elm 為 NULL 代表 malloc 失敗，回傳 false。
malloc 一個 strlen(s) + 1 位元組的空間給 val，此空間會作為新結構體 elm 的 value 指向的空間。如果 val 為 NULL，釋放 elm 並回傳 false。
接著使用 memcpy 將 s 的資料複製至 val，並執行 elm->value = val。
最後再將新的結構體的 list 插入至 head 串列中，並回傳 true。

bool q_insert_head(struct list_head *head, char *s)
{
    if (!head) {
        return false;
    }
    element_t *elm = malloc(sizeof(element_t));
    if (!elm) {
        return false;
    }
    size_t len = strlen(s) + 1;
    char *val = malloc(sizeof(char) * len);
    if (!val) {
        free(elm);
        return false;
    }
    memcpy(val, s, len);
    elm->value = val;
    list_add(&elm->list, head);
    return true;
}

q_insert_tail()

作法和 q_insert_head() 類似，不過最後是呼叫 list_add_tail() 插入。

bool q_insert_tail(struct list_head *head, char *s)
{
    if (!head) {
        return false;
    }
    element_t *elm = malloc(sizeof(element_t));
    if (!elm) {
        return false;
    }
    size_t len = strlen(s) + 1;
    char *val = malloc(sizeof(char) * len);
    if (!val) {
        free(elm);
        return false;
    }
    memcpy(val, s, len);
    elm->value = val;
    list_add_tail(&elm->list, head);
    return true;
}

q_remove_head()

檢查 queue 是否為 NULL 或是為空，是則回傳 NULL。
呼叫 list_first_entry(head, element_t, list) 取得 head 結構體的位址。
呼叫 list_del_init(&elm->list) 移除該節點。
將該結構體 value 的內容複製到 sp。
原本是使用 memcpy() 複製，不過使用 valgrind 發現此處會發生 Invalid read of size，原因是當 bufsize 大於 elm->value 的 size 會複製到不該存取的空間，於是改成 strncpy，會有額外檢查機制。q_remove_tail() 同理。

element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
    if (!head || list_empty(head)) {
        return NULL;
    }
    element_t *elm = list_first_entry(head, element_t, list);
    list_del_init(&elm->list);
    if (sp) {
        strncpy(sp, elm->value, bufsize);
        sp[bufsize - 1] = '\0';
    }
    return elm;
}

q_remove_tail()

與 q_remove_head() 類似，不過使用 list_last_entry(head, element_t, list) 取得 tail 結構體的位址。

element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize)
{
    if (!head || list_empty(head)) {
        return NULL;
    }
    element_t *elm = list_last_entry(head, element_t, list);
    list_del_init(&elm->list);
    if (sp) {
        strncpy(sp, elm->value, bufsize);
        sp[bufsize - 1] = '\0';
    }
    return elm;
}

q_size()

採用作業說明的方式。
後續改進：
- 可以用一個空間來存放串列的 size，每當串列有更動時就更新這個空間，需要討論的是這個空間要放在哪裡。 TODO
- 除此之外，如果是用整數來儲存，就需要額外討論 overflow 的問題，串列的 size 必須設限。

int q_size(struct list_head *head)
{
    if (!head)
        return 0;

    int len = 0;
    struct list_head *li;

    list_for_each (li, head)
        len++;
    return len;
}

q_delete_mid()

檢查 queue 是否為 NULL 或是為空，是則回傳 false。
找尋中間節點的部分是參考你所不知道的 C 語言: linked list 和非連續記憶體 - 案例探討: Leetcode 2095. Delete the Middle Node of a Linked List 同學更新的共筆，比較不一樣的地方是，這裡使用的是雙向串列，不用額外的空間去暫存上一個節點或是使用指標的指標。
最後再將相關空間釋放。

bool q_delete_mid(struct list_head *head)
{
    // https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/
    if (!head || list_empty(head)) {
        return false;
    }
    struct list_head *slow = head->next;
    for (struct list_head *fast = head->next;
         fast != head && fast->next != head; fast = fast->next->next) {
        slow = slow->next;
    }
    element_t *del = list_entry(slow, element_t, list);
    list_del(slow);
    free(del->value);
    free(del);
    return true;
}

q_delete_dup()

檢查 head 是否為 NULL，是則回傳 false。
用一個 bool 值 has_dup 代表有沒有發現 duplicates。
呼叫 list_for_each_safe (node, tmp, head) 走訪每一個節點，並判斷此節點和下一個節點的 value 是否相同，是則將 has_dup 設為 true，然後從串列中移除此節點並釋放相關空間。
若 value 不同，而 has_dup 依然是 true，代表此節點是連續相同值的節點中的最後一個，將 has_dup 設回 false 並釋放相關空間。

bool q_delete_dup(struct list_head *head)
{
    // https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/
    if (!head) {
        return false;
    }
    struct list_head *node, *tmp;
    bool has_dup = false;
    list_for_each_safe (node, tmp, head) {
        if (node->next != head &&
            !strcmp(list_entry(node, element_t, list)->value,
                    list_entry(node->next, element_t, list)->value)) {
            has_dup = true;
            element_t *del = list_entry(node, element_t, list);
            list_del(node);
            free(del->value);
            free(del);
        } else if (has_dup) {
            has_dup = false;
            element_t *del = list_entry(node, element_t, list);
            list_del(node);
            free(del->value);
            free(del);
        }
    }
    return true;
}

q_swap()

我採用的方法是，先用 list_for_each (node, head) 走訪節點 (注意：這裡不會走訪每一個節點)，雖然會牽涉到刪除節點，不過我還是使用 list_for_each，以下會有說明。
每次迭代中會先檢查 node->next 是否等於 head，考慮到串列節點個數若是奇數，最後一個節點落單不需要再做交換。
接著會用 tmp 指標暫存下一個節點的位址，然後先後執行 list_del(node) 和 list_add(node, tmp) 來完成交換。
最後 list_for_each() 的 node = node->next 會將 node 指標指向下一組的第一個節點 (假設兩兩一組)，所以只會走訪奇數編號的節點 (假設從 head->next 的節點從 1 向上編號)。

void q_swap(struct list_head *head)
{
    // https://leetcode.com/problems/swap-nodes-in-pairs/
    struct list_head *node;
    list_for_each (node, head) {
        if (node->next == head) {
            break;
        }
        struct list_head *tmp = node->next;
        list_del(node);
        list_add(node, tmp);
    }
}

q_reverse()

一開始是用 list_for_each_safe 走訪每個節點，並將該節點的 next 和 prev 指標指向的位址互換，程式碼如下：

void q_reverse(struct list_head *head)
{
    struct list_head *node, *tmp;
    list_for_each_safe (node, tmp, head) {
        struct list_head *prev = node->prev;
        node->prev = node->next;
        node->next = prev;
    }
}

執行 mack check 後對應的輸出結果如下：

l = [gerbil bear dolphin meerkat bear]
cmd> # Reverse it
cmd> reverse
l = [gerbil]

會發現串列中只剩下 gerbil 這個節點，於是我追蹤程式，繪製成圖。
為方便講解，假設串列中只有三個節點，value 分別為 gerbil、bear 和 dolphin，l = [gerbil bear dolphin]，如下圖：

按照上述程式碼，reverse 結果如下：

不得不說這張圖真難畫，不知道有沒有比較好的畫法？

請愛用 edotor

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

jserv

可以發現 head 節點的 prev 和 next 指標沒有修改到，next 指標依舊指到 gerbil 結構體的節點，而 gerbil 節點的 next 指標因為 reverse，指回 head 節點。

list_for_each_safe 走訪完後，需要再將 head 節點的 prev 和 next 指標指向的位址交換。除此之外，一開始需要檢查 queue 是否為 NULL 或是 empty，對應程式碼修改如下：

void q_reverse(struct list_head *head)
{
    if (!head || list_empty(head)) {
        return;
    }
    struct list_head *node, *tmp;
    list_for_each_safe (node, tmp, head) {
        struct list_head *prev = node->prev;
        node->prev = node->next;
        node->next = prev;
    }
    tmp = head->next;
    head->next = head->prev;
    head->prev = tmp;
}

q_sort()

我採用遞迴版本的 merge sort，參考 linux2021-quiz2：

void merge_sort_list(struct list_head *head)
{
    if (list_empty(head) || list_is_singular(head)) {
        return;
    }
    struct list_head *slow = head->next;
    for (struct list_head *fast = head->next;
         fast != head && fast->next != head; fast = fast->next->next) {
        slow = slow->next;
    }
    LIST_HEAD(sorted);
    LIST_HEAD(left);
    list_cut_position(&left, head, slow);
    merge_sort_list(&left);
    merge_sort_list(head);
    merge(&left, head, &sorted);
    INIT_LIST_HEAD(head);
    list_splice_tail(&sorted, head);
}

其中，找尋中間節點方式是採用與 q_delete_mid() 一樣的方式：

struct list_head *slow = head->next;
for (struct list_head *fast = head->next;
     fast != head && fast->next != head; fast = fast->next->next) {
    slow = slow->next;
}

執行 trace-04 的命令：

./qtest -v 3 -f traces/trace-04-ops.cmd

會發現在 merge_sort_list() 的地方出現 Segmentation fault：

Program received signal SIGSEGV, Segmentation fault.
0x000055555555ab54 in merge_sort_list (head=head@entry=0x7fffff7ff040) at queue.c:280
280	    LIST_HEAD(sorted);

用 GDB 的 backtrace (bt) 查看各個 frame，發現遞迴深度異常，造成堆疊溢出：

(gdb) bt
#0  0x000055555555ab54 in merge_sort_list (head=head@entry=0x7fffff7ff040) at queue.c:280
#1  0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff090) at queue.c:283
#2  0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff0e0) at queue.c:283
#3  0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff130) at queue.c:283
#4  0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff180) at queue.c:283
#5  0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff1d0) at queue.c:283
#6  0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff220) at queue.c:283
#7  0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff270) at queue.c:283
#8  0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff2c0) at queue.c:283
#9  0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff310) at queue.c:283
#10 0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff360) at queue.c:283
#11 0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff3b0) at queue.c:283
#12 0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff400) at queue.c:283
#13 0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff450) at queue.c:283
#14 0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff4a0) at queue.c:283
#15 0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff4f0) at queue.c:283
#16 0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff540) at queue.c:283
#17 0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff590) at queue.c:283
#18 0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff5e0) at queue.c:283
#19 0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffff7ff630) at queue.c:283
.
.
.

進一步查看各個 frame 的 local variables，發現 slow 的結果為 <optimised out>：

(gdb) bt -full -20
.
.
.
#104783 0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffffffd8f0) at queue.c:283
        slow = <optimised out>
        sorted = {prev = 0x7fffffffd890, next = 0x7fffffffd890}
        left = {prev = 0x7fffffffd8a0, next = 0x7fffffffd8a0}
#104784 0x000055555555ab9f in merge_sort_list (head=head@entry=0x7fffffffd940) at queue.c:283
        slow = <optimised out>
        sorted = {prev = 0x7fffffffd8e0, next = 0x7fffffffd8e0}
        left = {prev = 0x555555566e48, next = 0x555555566e48}
#104785 0x000055555555ab9f in merge_sort_list (head=head@entry=0x555555566fb0) at queue.c:283
        slow = <optimised out>
        sorted = {prev = 0x7fffffffd930, next = 0x7fffffffd930}
        left = {prev = 0x555555567108, next = 0x555555567108}
#104786 0x000055555555ac18 in q_sort (head=0x555555566fb0) at queue.c:300

-full
Print the values of the local variables also. This can be combined with the optional count to limit the number of frames shown.

-n
Print only the outermost n frames, where n is a positive number.

查閱 GDB 文件 Debugging with GDB，其中有提到關於 optimized-out 的變數可以關閉編譯器最佳化重新編譯：

If you need to display the values of such optimized-out arguments, either deduce that from other variables whose values depend on the one you are interested in, or recompile without optimizations.

(gdb) bt -full -20
.
.
.
#87320 0x000055555555c2b8 in merge_sort_list (head=0x7fffffffd910) at queue.c:283
        slow = 0x555555568df8
        sorted = {prev = 0x7fffffffd8a0, next = 0x7fffffffd8a0}
        left = {prev = 0x7fffffffd8b0, next = 0x7fffffffd8b0}
#87321 0x000055555555c2b8 in merge_sort_list (head=0x7fffffffd970) at queue.c:283
        slow = 0x555555568df8
        sorted = {prev = 0x7fffffffd900, next = 0x7fffffffd900}
        left = {prev = 0x7fffffffd910, next = 0x7fffffffd910}
#87322 0x000055555555c2b8 in merge_sort_list (head=0x7fffffffd9d0) at queue.c:283
        slow = 0x555555568e48
        sorted = {prev = 0x7fffffffd960, next = 0x7fffffffd960}
        left = {prev = 0x555555568e48, next = 0x555555568e48}
#87323 0x000055555555c2b8 in merge_sort_list (head=0x555555568fb0) at queue.c:283
        slow = 0x555555569108
        sorted = {prev = 0x7fffffffd9c0, next = 0x7fffffffd9c0}
        left = {prev = 0x555555569108, next = 0x555555569108}
#87324 0x000055555555c347 in q_sort (head=0x555555568fb0) at queue.c:300

會發現 slow 指標在遞迴到 frame #87321 以後 (frame number

\leq

87321) 都維持指向 0x555555568df8 這個位址，表示尋找中間節點有問題。

回頭分析尋找中間節點的過程，發現當串列的 size = 2 時，會發生問題，假設 sub_list 的兩個節點 value 分別為 "1" 和 "2" (以 node₁、node₂ 表示)，示意圖如下，一開始 fast 和 slow 指標都指向 head->next (即 node₁)：

進行一次迭代以後，fast 指標指向 head 節點， slow 指標指向 node₂：

此時 fast == head，跳出迴圈，中間節點為 slow 指標指向的節點 (即 node₂)。接著，執行以下程式碼：

LIST_HEAD(sorted);
LIST_HEAD(left);
list_cut_position(&left, head, slow);
merge_sort_list(&left);

其中，list_cut_position() 的實作如下：

static inline void list_cut_position(struct list_head *head_to,
                                     struct list_head *head_from,
                                     struct list_head *node)
{
    struct list_head *head_from_first = head_from->next;

    if (list_empty(head_from))
        return;

    if (head_from == node) {
        INIT_LIST_HEAD(head_to);
        return;
    }

    head_from->next = node->next;
    head_from->next->prev = head_from;

    head_to->prev = node;
    node->next = head_to;
    head_to->next = head_from_first;
    head_to->next->prev = head_to;
}

list_cut_position() 會將 head_from->next 節點到 node 節點 (包含) 接至 head_to 串列，原串列中剩餘的節點會接到 head from 串列，執行後結果如下：

分割位置後 left 串列的 size 依然是 2，呼叫 merge_sort_list() 後，會回到之前的循環，造成遞迴不會中止，堆疊溢出。

要解決此問題，需要重新思考找尋中間節點的方法，使得 slow 指標指向 node₁，以下是修改後的程式碼：

struct list_head *fast = head->next, *slow;
list_for_each (slow, head) {
    if (fast->next == head || fast->next->next == head) {
        break;
    }
    fast = fast->next->next;
}

完整程式碼如下：

void merge(struct list_head *head_a,
           struct list_head *head_b,
           struct list_head *sorted)
{
    if (list_empty(head_a)) {
        list_splice_tail(head_b, sorted);
        return;
    }
    if (list_empty(head_b)) {
        list_splice_tail(head_a, sorted);
        return;
    }

    while (!list_empty(head_a) && !list_empty(head_b)) {
        char *lv = list_entry(head_a->next, element_t, list)->value;
        char *rv = list_entry(head_b->next, element_t, list)->value;
        struct list_head *tmp =
            strcmp(lv, rv) <= 0 ? head_a->next : head_b->next;
        list_del(tmp);
        list_add_tail(tmp, sorted);
    }
    list_splice_tail(list_empty(head_a) ? head_b : head_a, sorted);
}

void merge_sort_list(struct list_head *head)
{
    if (list_empty(head) || list_is_singular(head)) {
        return;
    }
    struct list_head *fast = head->next, *slow;
    list_for_each (slow, head) {
        if (fast->next == head || fast->next->next == head) {
            break;
        }
        fast = fast->next->next;
    }
    LIST_HEAD(sorted);
    LIST_HEAD(left);
    list_cut_position(&left, head, slow);
    merge_sort_list(&left);
    merge_sort_list(head);
    merge(&left, head, &sorted);
    INIT_LIST_HEAD(head);
    list_splice_tail(&sorted, head);
}

void q_sort(struct list_head *head)
{
    if (!head || list_empty(head) || list_is_singular(head)) {
        return;
    }
    merge_sort_list(head);
}

將 q_sort 原本的 q_size(head) < 2 敘述改為 list_is_singular，善用現有的 API

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

還有 list_empty(head)

引入 lib/list_sort.c

整合至 lab0-c

在 lib/list_sort.c 第 222 行使用到 likely 這個巨集，是被定義在 include/linux/compiler.h，需要對其進行改。同理，unlikely 也要一併修改：

# define likely(x)	__builtin_expect(!!(x), 1)
# define unlikely(x)	__builtin_expect(!!(x), 0)

關於這個 built-in function long __builtin_expect (long exp, long c) 可以參見 Other Built-in Functions Provided by GCC。
我使用 Compiler Explorer 簡單測試 likely() 和 unlikely() 是否有不同的編譯結果，前者在 RISC-V rv32gc 會被編譯成 beq a5, zero, <label>，後者會被編譯成 bne a5, zero, <label>，測試程式如下：

#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
 
void test_likely(volatile int x)
{
    if (likely(x)) {
        x = 1;
    } else {
        x = 0;
    }
}
 
void test_unlikely(volatile int x)
{
    if (unlikely(x)) {
        x = 1;
    } else {
        x = 0;
    }
}

接著，在第 56 行宣告一個 count 變數，type 為 u8，被定義在 tools/include/linux/types，用來計算 pending list 的個數，這裡我更改為 uint8_t。

而在 lib/list_sort.c 中使用到 __attribute__ 這個關鍵字，如下：

__attribute__((nonnull(2, 3))) void list_sort(void *priv,
                                              struct list_head *head,
                                              list_cmp_func_t cmp)

在 GNU C and C++ 中，可以用來描述函式的特定性質 (參見 Declaring Attributes of Functions)。而這裡使用到的是 nonnull 這個 attribute (參見 Function-Attributes-nonnull)

The nonnull attribute specifies that some function parameters should be non-null pointers.

可以根據指引在 Makefile 新增 -Wnonnull flag。

在 queue.c 中增加有關 linux list_sort 的程式碼：

static int value_cmp(void *priv,
                     const struct list_head *a,
                     const struct list_head *b)
{
    char *av = list_entry(a, element_t, list)->value;
    char *bv = list_entry(b, element_t, list)->value;
    return strcmp(av, bv);
}

void q_linux_sort(struct list_head *head)
{
    if (!head || list_empty(head) || list_is_singular(head)) {
        return;
    }
    list_sort(NULL, head, value_cmp);
}

另外，我參考 laneser 的作法，增加 linuxsort option 來切換 sort 的實作，當 linuxsort = 0 時採用 q_sort() 的方式，當 linuxsort = 1 時採用 q_linux_sort() 的方式。

我嘗試引入 lib/list_sort.c 到專案，想請問相關的程式碼是不是同樣要以 GPLv2 釋出？

是的，你要在 README.md 標注授權條款，這二者不衝突

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

jserv

lib/list_sort.c 實作分析

在探討 lib/list_sort.c 相關實作時，除了單純 trace code 以外，也需要了解為什麼 Linux 核心會採用這段程式碼，開發者是如何思考以及做一些取捨。我們可以看 github commit history，lib/list_sort.c 最近一次演算法上的改動是 May 15, 2019, commit b5c56e0c, lib/list_sort: optimize number of calls to comparison function 引入，其中引用了三篇論文：

[1] Bottom-up Mergesort: A Detailed Analysis
Wolfgang Panny, Helmut Prodinger
Algorithmica 14(4):340–354, October 1995
https://doi.org/10.1007/BF01294131
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.5260

[2] The cost distribution of queue-mergesort, optimal mergesorts, and
power-of-two rules
Wei-Mei Chen, Hsien-Kuei Hwang, Gen-Huey Chen
Journal of Algorithms 30(2); Pages 423–448, February 1999
https://doi.org/10.1006/jagm.1998.0986
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380

[3] Queue-Mergesort
Mordecai J. Golin, Robert Sedgewick
Information Processing Letters, 48(5):253–259, 10 December 1993
https://doi.org/10.1016/0020-0190(93)90088-q
https://sci-hub.tw/10.1016/0020-0190(93)90088-Q

對於 comparison 次數的探討，我們可以寫成以下形式：

\begin{matrix} n l o g_{2} n - K n + O (1) \end{matrix}

其中，以下 merge sort 的變形 (variants)，領導係數 (leading coefficient) 皆為

l o g_{2} (n!)

，探討的著重點在於一次項係數 K。

開發者探討了 merge sort 的三種實作方式，分別為 top-down mergesort、bottom-up mergesort 和 queue-mergesort，以及開發者提出的方式，以下簡述不同的實作方式：

1. Top-down mergesort

有最少的 average case、worst case 的 comparison 次數。
需要使用遞迴或是額外空間作為 user stack。
需要事先知道整個 list 的大小。

下圖例子是 balanced power-of-2 rule dividing strategy：

2. Bottom-up mergesort

在這幾種變形中需要最多的 comparison 次數。

3. Queue-mergesort

特別適合用於鏈結串列的排序。
queue-mergesort comparison 的次數少於 bottom-up mergesort，略高於 top-down mergesort。
可以以 top-down 或是 bottom-up 的方式實作。
透過 get front、put back 操作，因此排序完的結果會是 unstable。

根據 [3] 的演算法，pseudo code 如下

queue-mergesort(Q):
    while (Q.size != 1) do
        Q.put(Merge(Q.get, Q.get))

4. lib/list_sort.c

如果查看更之前的版本 commit 043b3f7b 會發現是用 bottom-up mergesort 實作。TODO

效能比較

以下分別對 My sort 和 Linux sort 進行 100 萬筆資料的排序，共 5 次，考慮到整個排序的瓶頸是在呼叫比較函式的次數，以下分別測量時間和比較次數，發現我使用的 top-down merge 的確比較次數較少：

其實這裡說瓶頸是在呼叫比較函式的次數不太正確，後續實驗也發現 compare 次數少，不見得執行時間短，也需要考慮 cache locality (以下三者都是深度優先，可能對於 cache 的友善程度不同？待實驗)，以及 function call 的成本，像是 top-down mergesort 我使用遞迴法，雖然 compare 次數少，但是執行時間最久。

My sort

test	time	compare
1	1.681	18674748
2	1.688	18674652
3	1.666	18674308
4	1.667	18673658
5	1.670	18673899

Linux sort

test	time	compare
1	1.394	18686507
2	1.415	18687977
3	1.432	18686862
4	1.397	18686900
5	1.397	18686296

後來我又加測更之前的 bottom-up mergesort 版本：

Linux sort (commit 043b3f7b)

test	time	compare
1	1.357	18716830
2	1.375	18716517
3	1.365	18715850
4	1.357	18715780
5	1.351	18716069

發現確實如開發者說的，bottom-up mergesort 的 compare 次數最高，不過所花的時間是三者最少的。看到這裡可能還看不到開發者的考量點，於是我接著進行以下實驗，分別測試兩個 linux sort 的版本 (即 2:1 balanced merge 和 bottom-up mergesort)，以及兩種不同大小的測資，非別為 1048580，略大於 2 的冪，以及 1048576，2 的冪：

2:1 balanced merge (commit b5c56e0c)

test size 1048580	time	compare	test size 1048576	time	compare
1	1.473	19645491	1	1.491	19644674
2	1.489	19645794	2	1.475	19646095
3	1.520	19644566	3	1.471	19645472
4	1.506	19645249	4	1.511	19644989
5	1.490	19646700	5	1.494	19645982

Bottom-up mergesort (commit 043b3f7b)

test size 1048580	time	compare	test size 1048576	time	compare
1	1.546	20250025	1	1.419	19645530
2	1.570	20676398	2	1.416	19645425
3	1.537	20414941	3	1.417	19645713
4	1.586	20556979	4	1.435	19646198
5	1.544	20606780	5	1.442	19645194

可以觀察到 bottom-up mergesort 在 best case (即 2 的冪) 表現確實比較優異，不過一旦排序的大小略高於 2 的冪 (這裡只多 4)，compare 的次數會增加近百萬，執行時間也會大幅增加，這就是開發者要這樣修改的原因，並且可以增加程式的可預測性。

至於為什不用 top-down mergesort，前面有討論過缺點，需要事先知道整個鏈結串列的大小，且對於空間使用的可預測性低。而 Queue-mergesort 則是 unstable sort，lib/list_sort.c 是 linux list 排序的界面，無法預期使用者的用途，若使用者的用途要求排序好的串列必須要是 stable，則會有非預期結果。

測試 script

#!/bin/bash
make clean
make GCOV=1
./qtest -f traces/trace-test-sort.cmd
gcov queue.c
sed -n 326p queue.c.gcov

在 qtest 提供新的命令 shuffle

這裡根據 Fisher–Yates shuffle 的第二種演算法實作，pseudo code 如下：

-- To shuffle an array a of n elements (indices 0..n-1):
for i from 0 to n−2 do
     j ← random integer such that i ≤ j < n
     exchange a[i] and a[j]

在 queue.c 增加 q_shuffle() 的實作，這裡先用沒有優化的版本，有待後續改進：

void q_shuffle(struct list_head *head)
{
    if (!head || list_empty(head) || list_is_singular(head)) {
        return;
    }
    int size = q_size(head);
    srand(time(NULL));
    struct list_head *left, *right;
    for (int i = 0; i < size; i++) {
        int j = rand() % (size - i) + i;
        if (i != j) {
            left = right = head->next;
            int t = i;
            while (t--) {
                left = left->next;
            }
            while (j--) {
                right = right->next;
            }
            struct list_head *posl = left->prev, *posr = right->prev;
            list_del(right);
            list_add(right, posl);
            if (j - i > 1) {
                list_del(left);
                list_add(left, posr);
            }
        }
    }
}

運用 Valgrind 排除 `qtest` 實作的記憶體錯誤

嘗試先觀察 traces/trace-eg.cmd 的命令，並得到以下對應輸出，發現 linenoise.c:1224 呼叫 malloc() 以及 linenoise.c:1236 呼叫 strdup() ，沒有對應的記憶體回收機制：

$ valgrind -q --leak-check=full ./qtest -f traces/trace-eg.cmd
.
.
.
==17974== 5 bytes in 1 blocks are still reachable in loss record 1 of 3
==17974==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==17974==    by 0x4A4E50E: strdup (strdup.c:42)
==17974==    by 0x110C1E: linenoiseHistoryAdd (linenoise.c:1236)
==17974==    by 0x1117B1: linenoiseHistoryLoad (linenoise.c:1325)
==17974==    by 0x10C79F: main (qtest.c:979)
==17974== 
==17974== 140 bytes in 19 blocks are still reachable in loss record 2 of 3
==17974==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==17974==    by 0x4A4E50E: strdup (strdup.c:42)
==17974==    by 0x110B92: linenoiseHistoryAdd (linenoise.c:1236)
==17974==    by 0x1117B1: linenoiseHistoryLoad (linenoise.c:1325)
==17974==    by 0x10C79F: main (qtest.c:979)
==17974== 
==17974== 160 bytes in 1 blocks are still reachable in loss record 3 of 3
==17974==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==17974==    by 0x110BDE: linenoiseHistoryAdd (linenoise.c:1224)
==17974==    by 0x1117B1: linenoiseHistoryLoad (linenoise.c:1325)
==17974==    by 0x10C79F: main (qtest.c:979)
==17974==

不過睡一覺起來做一些操作後再重現原本測試，發現一樣是執行 valgrind -q --leak-check=full ./qtest -f traces/trace-eg.cmd 這段命令，卻沒有產生 memory leak 資訊。

後來我再重新檢視後者的操作中與前者有什麼不同之處，發現我在後者的操作中有刪除掉 .cmd_history 這個檔案。經過測試，的確在有 .cmd_history 這個檔案並且該檔案有內容時會產生 memory leakage。

.cmd_history 這的檔案在 console.h:6 用 HISTORY_FILE 定義：

#define HISTORY_FILE ".cmd_history"

並且在 qtest.c:979 main 函式有呼叫 linenoise.c:1309 linenoiseHistoryLoad(HISTORY_FILE)，並且在該函式中，如果 history file (即 .cmd_history) 有命令時，會呼叫 linenoise.c:1215 linenoiseHistoryAdd(buf)，並且有呼叫 malloc 配置一空間給 char **history (用來暫存字元陣列的陣列)，以及呼叫 strdup 將 .cmd_history 檔案讀取到的命令放進 history 陣列中，這與 valgrind 產生出的資訊相符，的確有配置的空間需要釋放。

進一步找尋適當的位置將所配置的空間釋放，以解決 memory leak 的問題，發現的確有一函式 freeHistory() 負責處理上述提到的 malloc 以及 strdup 的記憶體釋放。而在 linenoiseAtExit() 有呼叫到 freeHistory() (該函式還會呼叫 disableRawMode()，跟 terminal interface 有關，尚未研究)，其中與 disableRawMode() 對應的是 enableRawMode()，會發現該函式有將 linenoiseAtExit() 註冊 atexit(3)，被註冊的函式會在 normal process termination 時以逆序被呼叫。如果追蹤哪些函式會呼叫 enableRawMode()，會發現有 linenoiseRaw() 跟 linenoisePrintKeyCodes() 兩個函式。前者在 linenoise() 會呼叫，後者沒有被任何函式呼叫。

接著，查看 console.c run_console() 會發現 has_infile true 或是 false (即有無 -f flag) 有不同的行為，前者直接透過 cmd_select() 進行相關直譯動作，並未呼叫與 linenoise 相關的函式，但是不管有無 -f flag，都會在 qtest 的 main 函式呼叫 linenoiseHistoryLoad()，因此當有 infile 時，會產生 memory leak。透過實驗也證實相同的命令在有 infile 會產生 memory leak，沒有 infile 時則不會。

對應測試命令檔案如下：

traces/trace-memleak1.cmd

new
ih hello
quit

綜合上述，當有 .cmd_history 這個檔案且非空，並且有 infile 時會產生 memory leak。對應的解決方式是將 qtest 的 main 函式中的 linenoiseHistoryLoad() 移至 run_console()，當有 infile 時才呼叫該函式。對應修改參見 commit 6db5d4b0。

Massif 視覺化

以下為 traces/trace-eg.cmd 命令檔案測試結果，比較修改前後的記憶體使用情形，發現修改前記憶體使用沒有歸零，修改完後有歸零。

Before
After