2024q1 Homework1 (lab0)

contributed by < wu81177 >

開發環境

$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
  Model name:            11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
    CPU family:          6
    Model:               140
    Thread(s) per core:  2
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            1
    CPU max MHz:         4700.0000
    CPU min MHz:         400.0000
    BogoMIPS:            5606.40

指定的佇列操作

`q_new`

實作 q_new 過程中，我學習到 list.h 中 list_head 和 INIT_LIST_HEAD 的部份

`LIST_HEAD`

從結構成員可以發現 list_head 組成的佇列是雙向佇列，同時我也理解到 list_head 除了獨立當作佇列的頭，也會嵌入佇列成員結構體中，佇列成員在這份作業中為 element_t

`INIT_LIST_HEAD`

此函數函式是用來初始化已存在的 list_head ，如果要同時宣告並初始化可用 LIST_HEAD（）

無論標題和內文中，中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)

q_free

list_for_each_safe 和 list_for_each_entry_safe 都能確保釋放當前節點不會遺失下一個節點，然而後者有使用 list_entry ，選用它就能輕鬆獲得元素的指標並釋放成員
迴圈中使用 q_release_element 來釋放空間，最後再釋放 head

改進你的漢語表達。

而在運行 q_test 的時候發現有問題因此使用 Valgrind 檢查，以下為檢察結果 :

ERROR: Attempted to free unallocated block.  Address = 0x4b8e850
==31792== Invalid read of size 8
==31792==    at 0x10F43C: find_header (harness.c:98)
==31792==    by 0x10F43C: test_free (harness.c:181)
==31792==    by 0x10F72E: q_release_element (queue.h:120)
==31792==    by 0x10F72E: q_free (queue.c:30)
...

發現少檢查 head 為空的情況

void q_free(struct list_head *head)
{   
+   if (!head)
+       return;
    element_t *entry, *safe;
    list_for_each_entry_safe (entry, safe, head, list)
        q_release_element(entry);
    free(head);
}

q_insert_head & q_insert_tail

原先 value 是直接使用 s 的空間，後來參考別人的發現應該使用 strdup ，因為 s 可能會被改動

bool q_insert_head(struct list_head *head, char *s)
{
    if (!head)
        return false;
    element_t *e = malloc(sizeof(element_t));
    if (!e)
        return false;
-    e->value = s;
+    e->value = strdup(s);

    list_add(&e->list, head);
    return true;
}

使用 list_add 可以將新元素放置在 head 後面， list_add_tail 則可以放到 head 前面
發現應該要檢查 strdup 是否成功，如果失敗應該要釋放當前元素，如下

bool q_insert_head(struct list_head *head, char *s)
{
    if (!head || !s)
        return false;
    element_t *e = malloc(sizeof(element_t));
    if (!e)
        return false;
    e->value = strdup(s);
+   if(!e->value){
+       free(e);
+       return false;
+   }
    list_add(&e->list, head);
    return true;
}

q_remove_head & q_remove_tail

從 list.h 中可以發現 container_of 和 list_entry 的功能完全相同，我選擇了前者來獲取 element_t 的指標
移除元素有 list_del 和 list_del_init 可用，我選用後者，否則再次使用已移除的元素時可能會出錯

發現沒有排除佇列為空的情況，原先的程式在下列第四行會出錯

-   if (!head)
+   if (!head || list_empty(head))
        return NULL;
    element_t *e = container_of(head->next, element_t, list);

q_size

使用 list_for_each 走訪佇列並且計數

q_delete_mid

我使用的方法是從鏈結串列頭 head 出發，同時使用兩個指標進行走訪，一個指標順行，另一個指標則逆行，直到它們相遇（奇數個元素情況）或者在交會前一步（偶數個元素情況）再刪除順行指標所指向的元素

發現使用 q_release_element 後再用 free 釋放元素內容是多餘的

bool q_delete_mid(struct list_head *head)
{
    if (!head || list_empty(head))
        return false;
    struct list_head *forward = head->next;
    struct list_head *backward = head->prev;
    while (forward != backward && forward->next != backward) {
        forward = forward->next;
        backward = backward->prev;
    }
    element_t *e = container_of(forward, element_t, list);
    list_del(forward);
    q_release_element(e);
-    free(forward)
    return true;
}

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

留意詞彙的使用：

q_delete_dup

使用兩層迴圈比對，原本都想用 list_for_each ，後來發現會重複比對，改成以下這樣

    list_for_each (outer, head) {
-        list_for_each(inner, head){
+        for (inner = (outer)->next; inner != (head); inner = inner->next) {
            ...
        }
    }

發現當沒有元素或單元素應該 return true

-   if (!head || head->prev == head->next) {
+   if (!head)
        return false;
+   if (head->prev == head->next)
+       return true;

原先的寫法一直無法順利運作

bool q_delete_dup(struct list_head *head)

// https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/
if (!head | head->prev == head->next) {
return false;

struct list_head *outer, *inner, *tmp;
element_t *outer_ele, *inner_ele;
for (outer = (head)->next; outer != (head)->prev; outer = outer->next) {
for (inner = (outer)->next; inner != (head);) {
outer_ele = container_of(outer, element_t, list);
inner_ele = container_of(inner, element_t, list);
if (strcmp(outer_ele->value, inner_ele->value) == 0) {
tmp = inner->next;
list_del(inner);
q_release_element(inner_ele);
inner = tmp;
} else {
inner = inner->next;
}
}
}
return true;
}

明確標注同學的 GitHub 帳號名稱

於是換了寫法，經過同學 kkkkk1109 提醒發現輸入的佇列是已排序的，因此改用他的寫法

你的洞見和檢討呢？

原先針對為排序的串列要使用兩層迴圈檢查有沒有重複，但由於輸入串列已排序，相同的鍵值一定會相鄰，因此只需要檢查相鄰的節點，如果和前者相同即刪除，如此一來可以將時間複雜度降低 N 倍

q_reverse

使用 list_for_each_safe 走訪佇列，沿途使用 list_move 將元素移至首位，如此一來就能倒轉順序
原先想將 q_reverse 視為 q_reverseK 的特例，後來覺得在 q_reverseK 中呼叫 q_reverse 比較好實作

多檢查空元素和單元素的情況以提早退出

-   if (!head)
+   if (!head || head->prev == head->next)
        return;

q_reverseK

使用 list_cut_position 剪下子佇列，用 q_reverse 倒序後再用 list_splice 接回去
list_cut_position(head_to, head_from, node) 剪下的部份並不包含 head_from 和 node 兩個節點

q_swap

視為 q_reverseK k 為2的特例

q_ascent

從 head 逆向走訪佇列，如果下一個元素大於當前元素則使用 list_del 刪除下個元素，如此一來就能留下遞增元素
學會 strcmp () 怎麼用，此函數會開始比較每一個字串的第一個字元。如果它們彼此相等，會繼續往下進行配對，直到字元不同或到達較短字串的結尾

q_descent

從 head 逆向走訪佇列，如果下一個元素小於當前元素則使用 list_del 刪除下個元素，如此一來就能留下遞減元素

q_sort

由於我在 q_merge 中會使用這個函式，在使用的前一步它已經是分段排序的佇列，這時候如果用 merge sort 會有不錯的效果，所以我選擇用 merge sort實作之
用來合併兩以排序佇列的函式我定義為 merge_two ，在裡面原先判斷是否遞減和兩元素比較大小的部份使用兩層 if ，後來改為以下較緊湊的寫法

 while (!list_empty(l1) && !list_empty(l2)) {
        element_t *ele_l1 = list_first_entry(l1, element_t, list);
        element_t *ele_l2 = list_first_entry(l2, element_t, list);
-       if(descend){
-           if(strcmp(ele_l1->value, ele_l2->value) < 0)
-                list_move_tail(l2->next, &tmp_head);
-           else
-                list_move_tail(l1->next, &tmp_head);
-       } else {
-            if(strcmp(ele_l1->value, ele_l2->value) < 0)
-                list_move_tail(l1->next, &tmp_head);
-            else
-                list_move_tail(l2->next, &tmp_head);
-        }
+        if (descend == (strcmp(ele_l1->value, ele_l2->value) < 0)) {
+            list_move_tail(l2->next, &tmp_head);
+        } else {
+            list_move_tail(l1->next, &tmp_head);
+        }
 }

而在 q_sort 中，使用先前 q_delete_mid 函式中的方法找到中點，再使用 list_cut_position 剪下後半段，用遞迴的方式分別排序前後兩段再用 merge_two 合併

q_merge

我首先去理解 queue_contex_t 的運作原理
每個 queue_contex_t 代表了一個佇列，其中 q 會指向佇列的頭，而 chain 則是用來連接不同的 queue_contex_t
在這個函式中，我將所有的佇列使用 list_splice 拼接到第一個佇列，並且使用 list_del_init 移除已合併的佇列，最後再使用 q_sort 排序

取第一個佇列時誤用 list_entry，後來使用 list_first_entry

-queue_contex_t *first_q = list_entry(&head->next, queue_contex_t, chain);
+queue_contex_t *first_q = list_first_entry(head, queue_contex_t, chain);

有時會出現 merge 完佇列消失的情況

l = [c r z]
cmd> merge
l = []

後來發現在清空佇列後不移除當前 queue_contex_t 就好了

list_for_each_entry_safe (curr, safe, head, chain) {
        if (curr == first_q)
            continue;
        first_q->size += curr->size;
        list_splice_init(curr->q, first_q->q);
-       list_del_init(&curr->chain);
}

結果

使用 make test 測試，只剩 trace-17 無法通過

+++ TESTING trace trace-17-complexity:
# Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant
ERROR: Probably not constant time or wrong implementation
ERROR: Probably not constant time or wrong implementation
Probably constant time
Probably constant time
---     trace-17-complexity     0/5
---	TOTAL		95/100
make: *** [Makefile:60: test] Error 1

可以看到唯 it 和 ih 無法通過，有時候 it 會通過，分數為95
最後經由底下修改 lab0/dudect 後可以得到100分

使用 Valgrind 檢查錯誤

先前使用 Valgrind 發現了實作 q_free 的暇疵，而最後我也嘗試使用它來找出 trace-17 中 it 和 ih 無法通過的原因，使用過程如下

$ valgrind -q --leak-check=full ./qtest 
cmd> option simulation 1
cmd> it
Probably constant time
cmd> ih
Probably constant time
cmd> rh
Probably constant time
cmd> rt
Probably constant time
cmd> option simulation 0
cmd> quit
Freeing queue

Valgrind 沒有找到記憶體使用問題，同時也發現可能在通過邊緣，上面單獨測試it 和 ih 沒問題

實作 Shuffle

在 queue.c 中加上 q_shuffle 函式， q_test 中加上 do_shuffle 函式檢查是否為空或單節點，還有 console_init 中加上 ADD_COMMAND(shuffle, "Do Fisher-Yates shuffle", "");

* void q_shuffle(struct list_head *head) {
    if (!head || head->next == head->prev)
        return;
    for (int len = q_size(head)-1; len > 0; len--) {
        int random = 1 + rand() % len;
        struct list_head *old = NULL, *new = NULL;
        int j = 1;
        struct list_head *pos;
        list_for_each(pos, head) {
            if (j == random) {
                old = pos;
            }
            if (j == len + 1) {
                new = pos;
                break;
            }
            j++;
        }
        struct list_head *pre_new = new->prev;
        struct list_head *pre_old = old->prev;
        if(old->next == new){
            list_del(old);
            list_add(old,new);
+       } else {
+           list_del(new);
+           list_del(old);
+           list_add(new, pre_old);
+           list_add(old, pre_new);
        }
    }
}

使用 git diff 或 diff -u 來產生程式碼之間的變異清單，不要憑感覺填入，注意位移量。

首先在外層迴圈把 len 減少，裡面抽出要交換的 index 後用 list_for_each 找到要交換的 old 和 new ，最後交換節點
，交換的部份原先的寫法沒考慮到兩節點相鄰的狀況

實驗結果

測試 [1, 2, 3, 4] 做 shuffle 1,000,000 次的直方圖，可以看到結果很平均

改進 lab0-c 的 constant time test

閱讀〈Dude, is my code constant time?〉

避免非必要的項目縮排 (即 * 和 - )，以清晰、明確，且流暢的漢語書寫。

檢查程式是否為常數運行時間很重要是因為如果不是，有機會受到時序攻擊，資安上存在疑慮
而作者提出了 dudect 工具來測試程式是否為 constant time ，且相較於其他常見工具， dudect 不需要對硬體行為進行建模，也使用了 cropping 預處理，捨棄掉大於 p percentile 的資料，避免離群值過分影響統計結果，而 t value 的計算是使用 Welch’s t 檢定法。

修改 lab0/dudect

檢查 fixture.c 發現沒有進行 cropping ，因此從作者的原始碼以 percentile 為關鍵字找到 percentile 和 prepare_percentiles 函式，接著將前述兩個函式和他們會使用到的函式複製到 fixture.c 裡面，再修改 data type 就可以通過 trace-17-complexity，得到滿分

+++ TESTING trace trace-17-complexity:
# Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant
Probably constant time
Probably constant time
Probably constant time
Probably constant time
---     trace-17-complexity     5/5
---	TOTAL		100/100

Linux 核心鏈結串列排序

研讀 Linux 核心原始程式碼的 lib/list_sort.c (commit b5c56e) ，首先覺得陌生的就是__attribute__((nonnull()))，後來了解到__attribute__ 是 gcc 中用來附加屬性給函式或結構的用法，提供編譯器額外的訊息，而 nonnull(2,3,4,5) 是用來表示函式內的第 2,3,4,5 個參數不能為空，如此一來就不用在函式內檢查，接下來逐一記錄研讀 list_sort.c 中函式的心得。

不要「舉燭」，你該質疑 Linux 核心開發者「如果不這樣做，會怎樣？」

`merge`

此函式是用來合併已排序鏈結串列，過程中走訪兩個傳入的串列並且使用 cmp(priv, a, b) 來比較 a, b 節點中 priv 的大小，決定誰要排在前面，最後回傳合併的串列
值得注意的是，根據註解的說法，此函式回傳的是單向並且非環狀的串列，這樣比較適合多次合併使用。

`merge_final`




















for (;;) {
		/* if equal, take 'a' -- important for sort stability */
		if (cmp(priv, a, b) <= 0) {
			tail->next = a;
			a->prev = tail;
			tail = a;
			a = a->next;
			if (!a)
				break;
		} else {
			tail->next = b;
			b->prev = tail;
			tail = b;
			b = b->next;
			if (!b) {
				b = a;
				break;
			}
		}
	}

前面的 for 迴圈中，除了將兩個串列合併之外，也建立了雙向的連接，然而就算是 b 先結束， a 剩下部份的開頭也統一改名為 b




















tail->next = b;
	do {
		/*
		 * If the merge is highly unbalanced (e.g. the input is
		 * already sorted), this loop may run many iterations.
		 * Continue callbacks to the client even though no
		 * element comparison is needed, so the client's cmp()
		 * routine can invoke cond_resched() periodically.
		 */
		if (unlikely(!++count))
			cmp(priv, b, b);
		b->prev = tail;
		tail = b;
		b = b->next;
	} while (b);

	/* And the final links to make a circular doubly-linked list */
	tail->next = head;
	head->prev = tail;
}

這段主要是將 b 剩下的部份改成雙向串列，並且在最後將頭尾相接使其形成環狀，但 if (unlikely(!++count)) 這段看不懂，經過查找資料後了解 unlikely 是一個巨集被定義為 # define unlikely(x) __builtin_expect(!!(x), 0) ，其中 !!(x) 會確保輸入值轉變為布林值，而 __builtin_expect 是 gcc 的內建函式，查看手冊中 Built-in Function: long __builtin_expect (long exp, long c) 舉例：

if (__builtin_expect (x, 0))
  foo ();

當 foo () 被執行的情況很罕見的時候可以這樣寫，幫助編譯器最佳化
而在 merge_final 中 count 增加，就會變非 0 ，執行 cmp(priv, b, b) ，但我還是不知道為什麼要執行 cmp(priv, b, b) 比較 b 自己，而且由於 ++ 是在前面，所以在執行第一次的時候 count 就會是非 0

`list_sort`












/* Find the least-significant clear bit in count */
for (bits = count; bits & 1; bits >>= 1)
    tail = &(*tail)->prev;
/* Do the indicated merge */
if (likely(bits)) {
    struct list_head *a = *tail, *b = a->prev;

    a = merge(priv, cmp, b, a);
    /* Install the merged result in place of the inputs */
    a->prev = b->prev;
    *tail = a;

for 迴圈會逐位尋找二進位 count ，直到遇到第一個 0 結束迴圈，接下來判斷如果剩餘的位數存在 1 就進行合併，這樣做相當於當 count+1 不為

2^{k}

時合併，參考何時合併的表格：

count 變化	count 二進位	merge	pending 上的節點
0 $\to$ 1	0000 $\to$ 0001	no（ $2^{0}$ ）	1
1 $\to$ 2	0001 $\to$ 0010	no（ $2^{1}$ ）	1 $\leftarrow$ 1
2 $\to$ 3	0010 $\to$ 0011	yes	(2) $\leftarrow$ 1
3 $\to$ 4	0011 $\to$ 0100	no（ $2^{2}$ ）	2 $\leftarrow$ 1 $\leftarrow$ 1
4 $\to$ 5	0100 $\to$ 0101	yes	2 $\leftarrow$ (2) $\leftarrow$ 1
5 $\to$ 6	0101 $\to$ 0110	yes	(4) $\leftarrow$ 1 $\leftarrow$ 1
6 $\to$ 7	0110 $\to$ 0111	yes	4 $\leftarrow$ (2) $\leftarrow$ 1
7 $\to$ 8	0111 $\to$ 1000	no（ $2^{3}$ ）	4 $\leftarrow$ 2 $\leftarrow$ 1 $\leftarrow$ 1
8 $\to$ 9	1000 $\to$ 1001	yes	4 $\leftarrow$ 2 $\leftarrow$ (2) $\leftarrow$ 1
9 $\to$ 10	1001 $\to$ 1010	yes	4 $\leftarrow$ (4) $\leftarrow$ 1 $\leftarrow$ 1
10 $\to$ 11	1010 $\to$ 1011	yes	4 $\leftarrow$ 4 $\leftarrow$ (2) $\leftarrow$ 1
11 $\to$ 12	1011 $\to$ 1100	yes	(8) $\leftarrow$ 2 $\leftarrow$ 1 $\leftarrow$ 1
12 $\to$ 13	1100 $\to$ 1101	yes	8 $\leftarrow$ 2 $\leftarrow$ (2) $\leftarrow$ 1
13 $\to$ 14	1101 $\to$ 1110	yes	8 $\leftarrow$ (4) $\leftarrow$ 1 $\leftarrow$ 1
14 $\to$ 15	1110 $\to$ 1111	yes	8 $\leftarrow$ 4 $\leftarrow$ (2) $\leftarrow$ 1
15 $\to$ 16	1111 $\to$ 10000	no（ $2^{4}$ ）	8 $\leftarrow$ 4 $\leftarrow$ 2 $\leftarrow$ 1 $\leftarrow$ 1

而整個函式會逐一把原始串列 list 的節點移到待處理串列 pending ，而 pending 中已排序的子串列會以前段提到的規則合併，直到原始 list 輸入完，接著在最後的 for 迴圈中會繼續合併 pending 中子串列，在這個階段是取最後一個和倒數第二個子串列合併成一個，直到只剩兩個子串列就結束迴圈，執行 final_merge 完成全部的合併

閱讀 < BOTTOM-UP MERGESORT A Detailed Analysis >

Merge sort 可分成 bottom-up 和 top-down ， top-down 為課本典型的演算法，將待排序序列分成左右兩個子序列，左右又繼續分裂直到子序列長度是一再合併，適合遞迴處理; bottum-up 則是將待排序序列元素逐一處理，在適當的時機合併，適合非遞迴實作，commit b5c56e 屬於 bottom-up 的一種變形，也是逐一將元素讀入，差別在於合併的時機不同。
下表為 bottom-up 和 top-down　的性能較比較：

Case	Best Case	Average Case	Worst Case
Top-Down	1	2N lg N - 0.146N	N lg N - 1.248N
Bottom-Up	1	2N lg N - 0.146N	N lg N - 0.965N

雖然 bottom-up 的最差情況效能略輸 top-down ，但是 bottom-up 不需要預先知道序列長度，適合 link-list 實作的序列，這點使得它的實用性大於 top-down。

閱讀 Queue-Mergesort

這篇文章主要介紹一種合併排序的變體稱為 Queue Mergesort ，使用佇列存放待合併序列，先將每個元素單獨作為一個待排序序列存放於佇列，之後不斷從頭部取出兩個進行合併後放到尾部，直到佇列中剩下一個序列即完成排序。
作者也證明這種方法是 optiml mergesort ，也就是對任何待排序元素個數 N ，最差情況下比較次數不比其他 mergesort 算法演算法多

algorithm 是「演算法」，而「算法」是 the way you calculate，也就是你在國中小學數學測驗卷上的「計算過程」，二者不同。

閱讀 The cost distribution of queue-mergesort, optimal mergesorts, and power-of-two rules

當兩個要合併的序列長度比值是2的冪時，也就是使用 power-of-2 rules ，有機會使得整體的時間複雜度隨著 N 變大時線性成長，既使合併兩序列的複雜度是超線性，因此相較於 half-half rules ，使用 power-of-2 rules 可以在 N 很大的時候顯著降低運算時間，而在 commit b5c56e 中選用合併序列長度比值上限為 2 是因為不論 N 多少每一步合併都能限制在這個比值以下。

引入 lib/list_sort.c 到 lab0-c

經過了上面的理解，我意識到自己之前實作 q_sort 的方法是多麼天真且愚蠢，因此我引入 lib/list_sort.c 的算法演算法到 lab0-c (commit 9837691)。
過程中做了一些改寫，為了簡化實作，刪去了編譯器優化的部份__attribute__((nonnull()), likely 和 unlikely ，還有以 strcmp 取代 cmp 函式，因此也刪減了有關函式的傳入參數 prev, cmp 。