2024q1 Homework1 (lab0)

contributed by < jimmylu890303>

Reviewed by `jujuegg`

~~- 建議統一將函式或巨集的名稱加上 `` - 英文與中文之間需有空格 - 感覺可以加入 graphviz 來更詳細的解釋程式碼~~

你的洞見呢？與其談「心法」，不如設想「如果我是你」，我會怎麼做，示範一次給同學看，而且檢討自己哪裡沒做好，從而呼應原本學員的不足處。
清楚標示學員的 git commits 和敘述的不足處，予以檢討。
軟體工程師要學會說故事，本作業要求學員從良性詳盡的批評開始。繼續檢討！
工程人員說話要精準，不要說「感覺」。

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

jserv

開發環境

$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  12
  On-line CPU(s) list:   0-11
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 5 5600H with Radeon Graphics
    CPU family:          25
    Model:               80
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           1
    Stepping:            0
    CPU max MHz:         4280.0000
    CPU min MHz:         400.0000
    BogoMIPS:            6587.90

課程教材閱讀

連結

佇列實作及改進過程

`q_new`

commit 379b392。

根據作業要求，可以知道 NULL 佇列和 Empty 佇列的定義不同，而 Empty 佇列代表的是該佇列指向有效的資料結構(head)，所以我在實做該方法時，需要 malloc 一個記憶體區塊給 head ，並且在將 head 中的 prev 及 next 指向自己。

對照 Dictionary.com 對 implement 一詞的解釋:

to fulfill; perform; carry out:
to put into effect according to or by means of a definite plan or procedure.
Computers. to realize or instantiate (an element in a program), often under certain conditions as specified by the software involved.
to fill out or supplement

「實作」會是符合上述 "implement" 語意的譯詞，取「起」以「創作」的意思，而「製造」本質上是固定輸入並通常產生固定輸出的行為。

\to

資訊科技詞彙翻譯

$ ./qtest 
cmd> new
l = []

commit ac33283 調整為，只要有使用 malloc 就必須檢查 malloc 所回傳的指標，以確保申請記憶體區塊是否成功，並且在將原本將指標初始化的動作以 INIT_LIST_HEAD 取代以保持程式碼更加簡潔。

struct list_head *head =
    (struct list_head *) malloc(sizeof(struct list_head));
- head->next = head;
- head->prev = head;
+ if (!head)
+    return NULL;
+ INIT_LIST_HEAD(head);

`q_free`

commit 8d62891

避免過多的中英文混用，已有明確翻譯詞彙者，例如「鏈結串列」(linked list) 和「佇列」(queue)，就使用該中文詞彙，英文則留給變數名稱、人名，或者缺乏通用翻譯詞彙的場景。

在佇列的實作中，鏈結串列是透過 element_t 中的 list_head 結構來做鏈結，所以可以透過定義在 list.h 中的 list_for_each_entry_safe 來走訪所有 queue 中的 element_t ，並且取得該 element_t 的位址以釋放該記憶體區塊，另外因為 element_t 中指向字串的記憶體區塊也需要一併釋放掉，最後也需要將 head 的 list_head 記憶體區塊也釋放。

對照 Dictionary.com 對 implement 一詞的解釋:

to fulfill; perform; carry out:
to put into effect according to or by means of a definite plan or procedure.
Computers. to realize or instantiate (an element in a program), often under certain conditions as specified by the software involved.
to fill out or supplement

「實作」會是符合上述 "implement" 語意的譯詞，取「起」以「創作」的意思，而「製造」本質上是固定輸入並通常產生固定輸出的行為。

\to

資訊科技詞彙翻譯

$./qtest 
cmd> new
l = []
cmd> ih a
l = [a]
cmd> it b
l = [a b]
cmd> free
l = NULL

`q_insert_head()` & `q_insert_tail()`

根據 list.h 中定義的 list_add 和 list_add_tail ，我可以使用這兩個API來實做 q_insert_head() 和 q_insert_tail() 。

q_insert_head() 直覺上的作法是使用 list.h 中的 list_add 將要新增的 new_node 新增至 head->next ，值得注意的是在 element_t 中的 value 要指向插入的字串時，需要使用 strdup(*src) 來複製一份副本，不然在測試階段它會有 error message ERROR: Need to allocate and copy string for new queue element 。

詳細實作在 commit 25d9e99 。

q_insert_tail() 直覺上的想法是使用 list.h 中的 list_add_tail 將要新增的 new_node 新增至 head->prev ，值得注意的是在 element_t 中的 value 要指向插入的字串時，需要使用 strdup(*src) 來複製一份副本，不然在測試階段它會有 error message ERROR: Need to allocate and copy string for new queue element 。

詳細實作在 commit bdf083b 。

後續修正

commit a93b88f

針對 strdup 的行為進行程式碼修正，在使用 strdup時它會 malloc 一個記憶體空間，並將目標字串複製到該記憶體空間，所以因為有使用 malloc ，所以就必須要檢查申請記憶體空間是否成功。
diff new_node->value = strdup(s); + if (!new_node->value) { + free(new_node); + return false; + }

$./qtest 
cmd> new
l = []
cmd> ih a
l = [a]
cmd> it b
l = [a b]
cmd> ih 0
l = [0 a b]
cmd> it 1
l = [0 a b 1]

但是我在使用commit -a來commit這兩個變更時，會發生Memory leak的錯誤，目前還沒有找出原因為何。但根據~~維基百科~~，主因是計算機程式的記憶體管理失當，因而失去對一段已分配記憶體空間的控制

詳閱作業說明，使用合適的工具來排除問題。

為何不是查閱指定教材和作業系統教科書，以理解 memory leak 呢？

$ git commit -a
Following files need to be cleaned up:
queue.c
queue.c:47:5: error: Memory leak: new_node [memleak]
    return true;
    ^
queue.c:59:5: error: Memory leak: new_node [memleak]
    return true;
    ^

Fail to pass static analysis.

後來檢查後發現是以下寫法差異導致錯誤原因。

- list_add(&(new_node->list), head);
+ list_add(&new_node->list, head);

`q_remove_head`, `q_remove_tail`

q_remove_head() 的作法是 head->next 為佇列中的第一個元素，所以只需要使用 list_del() 將第一個元素從佇列中移除，之後透過 list_entry 將此 list_head 的 entry 回傳。

詳細實作在 commit 617635d。

q_remove_tail() 的作法是 head->prev 為佇列中的最後一個元素，所以只需要使用 list_del() 將最後一個元素從佇列中移除，之後透過 list_entry 將此 list_head 的 entry 回傳。

詳細實作在 commit b2bd2de。

後續修正

commit 7136ee0

針對 strncpy 的行為進行修正以通過 trace-07-string 的測試，若傳入的參數 bufsize 小於原字串的長度，則 strncp 只會複製原字串的前 bufsize 個字元，因此並不會複製到 null terminator ，這會導致在讀取該字串時會造成錯誤。

element_t *entry = list_entry(first_element, __typeof__(*entry), list);
strncpy(sp, entry->value, bufsize);
+ sp[bufsize - 1] = '\0';

commit d147dfd
使用 list.h 的函數增加程式碼可讀性

l = [a b c]
cmd> rh
Removed a from queue
l = [b c]
cmd> rh 
Removed b from queue
l = [c]
cmd>

`q_delete_mid()`

commit ebc0534

在實作 q_delete_mid 時，可以使用快慢指標的概念去找出中間的點，head 的節點為可以視為 dummy node 可當作 slow_node ，而 head->next 為佇列中的第一個節點並當作 fast_node，每次迴圈 fast_node 移動兩個節點的距離，而 slow_node 移動一個節點的距離，當 fast_node == head 或者是 fast_node->next == head 時，即可找到 mid_node = slow->next。

是否充分彰環狀雙向鏈結串列的特性？

l = [a b c]
cmd> dm
l = [a c]
cmd> dm
l = [a]
cmd>

`q_delete_dup()`

commit d49a7ed

在實作 q_delete_dup 時，我是用直觀的方式來完成功能，每個 entry 都和在自己後面的 entry 去做比較，如果 strcmp(s1,s2)==0 代表兩值相同需要刪除該 entry ，等到走訪完成該 entry 後方所有的 entry 後，在刪除該 entry 。

l = [a a b b b c d]
cmd> dedup
l = [c d]

`q_swap()`

commit 7840fa4

在實作 q_swap時，我是以 first 指標為第一個節點和 second 指標為第二個節點兩兩互相交換，在每次迴圈結束時兩個指標皆移動兩個節點的距離，若其中一個指標指回 head ，則迴圈結束。

l = [1 2 3]
cmd> swap
l = [2 1 3]
cmd> it 4
l = [2 1 3 4]
cmd> swap
l = [1 2 4 3]

後續在測試階段時，發現使用 q_swap 後在做其他操作，會造成指標錯誤，

commit afbaf0d

  first->next = second->next;
- second->next = first;
  second->prev = first->prev;
+ second->next = first;
  first->prev = second;
  cur->next = second;
+ first->next->prev = first;
  cur = cur->next->next;
  first = cur->next;

`q_sort()`

避免非必要的項目縮排 (即 * 和 - )，以清晰、明確，且流暢的漢語書寫。

實作 Bubble Sort

起初 q_sort 我是使用 bubble sort 來進行排序， bubble sort 的時間複雜度為

O (N^{2})

, 在進行 trace-14 的測試時，會跳出 sorting algorithms with O(n^2) time complexity are expected failed 的訊息。

詳細實作在 commit 1001c3c

實作 Merge Sort

為了通過 trace-14 的測試，所以必須使用時間複雜度為 O(

N l o g N

) 的排序演算法，所以我選了 Merge sort 作為我要實作的演算法，具體演算法參考此處。

詳細實作在 commit 70bb1e3 。
使用 list.h 中的函數來改善程式碼可讀性在 commit d51a37c 。

但是上面的 Merge sort 在 trace-14 的測試中會無法通過，會造成 Segmentaion Fault ，我對 trace-14 中的測試項目中進行了以下更改。

- ih dolphin 1000000
- it gerbil 1000000
+ ih dolphin 100000
+ it gerbil 100000

之後 trace-14 的測試即可以正常通過，我推測是因為因為原先測試的資料量太大，導致在使用遞迴的方式去合併兩個串列時，會有記憶體區塊重疊的情形。

if (l1_entry->list.next != &l1_entry->list)
    result = merge(l1_entry->list.next, l2);
else
    result = merge(NULL, l2);

if (l2_entry->list.next != &l2_entry->list)
            result = merge(l2_entry->list.next, l1);
else
    result = merge(l1, NULL);

避免非必要的項目縮排 (即 * 和 - )，以清晰、明確，且流暢的漢語書寫。

改善 Merge Sort

commit 26b8f04

以上的錯誤問題我是透過將 merge fucntion 從原先的遞迴方式改成迭代的方式，並且可順利通過 trace-14 。。

+ while (l1_len != 0 && l2_len != 0) {
+     element_t *l1_entry = list_entry(l1, __typeof__(*l1_entry), list);
+     element_t *l2_entry = list_entry(l2, __typeof__(*l2_entry), list);
+     if (strcmp(l1_entry->value, l2_entry->value) <= 0) {
+         cur = l1;
+         list_del(l1);
+         l1 = l1->next;
+         l1_len--;
+         list_add_tail(cur, &result);
+     } else {
+         cur = l2;
+         list_del(l2);
+         l2 = l2->next;
+         l2_len--;
+         list_add_tail(cur, &result);
+     }
+ }

你如何確保目前的測試已涵蓋排序演算法的最差狀況？

`q_merge()`

詳細實作在 commit 2904144 。

在實作 q_merge 之前，需要先了解多個佇列是如何透過 list_head 鏈結的及 q_merge 中參數 head 所鏈結的串列結構，舉例來說 head->next 是指向到 queue.h 中 q_contex_t 的 list_head chain ，透過這個 chain 可以走訪到下一個 q_contex_t ，並且透過 q_contex_t->q 可以存取該佇列的 queue list ，也就是可以走訪此佇列上的各個 element_t 。

q_merge 的直覺作法就是讓已排序好的串列兩兩進行合併，若干輪後，就只會剩下一個已排序的串列。

沒必要參照這種缺乏充分驗證過的程式碼，你可以做更好。
數學分析呢？

// q_merge()
// 每個串列皆與第一條串列進行合併 
for (int i = 1; i < q_size(head); i++) {
    queue_contex_t *next_queue =
        list_entry(next, __typeof__(*next_queue), chain);
    mergeTwoLists(first_queue->q, next_queue->q);
    next = next->next;
}
// mergeTwoLists()
while (l1_len != 0 && l2_len != 0) {
    element_t *l1_entry = list_entry(l1_ele, __typeof__(*l1_entry), list);
    element_t *l2_entry = list_entry(l2_ele, __typeof__(*l2_entry), list);

    if (strcmp(l1_entry->value, l2_entry->value) <= 0) {
        cur = l1_ele;
        list_del(l1_ele);
        l1_ele = l1_ele->next;
        l1_len--;
        list_add_tail(cur, &result_head);
    } else {
        cur = l2_ele;
        list_del(l2_ele);
        l2_ele = l2_ele->next;
        l2_len--;
        list_add_tail(cur, &result_head);
    }
}

實作佇列結果

$ make test
...
---     trace-17-complexity     0/5
---     TOTAL           95/100
make: *** [Makefile:60: test] Error 1

在 trace-17 的測試中，測試了 q_insert_head , q_insert_tail , q_remove_head , q_remove_tail 四個函數是否時間複雜度為 Constant time，但目前還無法通過，因為 dutect 還有缺陷。

在修正完 dudect 缺失的功能後， trace-17 成功通過。

commit f08de9c

+++ TESTING trace trace-17-complexity:
# Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant
Probably constant time
Probably constant time
Probably constant time
Probably constant time
---     trace-17-complexity     5/5
---     TOTAL           100/100

在 qtest 提供新的命令 `shuffle`

在作業要求中，需要對一個佇列中所有的節點進行洗牌 (shuffle) ，也就是將串列的元素順序打亂，參照老師所提及的 Fisher–Yates shuffle 的洗牌演算法完成實現在 qtest 中新增 q_shuffle 的命令。

在給定一個長度為

N

的串列，經過洗牌 (shuffle) 會將串列的順序亂數打亂，而這樣可能的排列數就會是

N!

個情況，洗牌演算法的條件是所有出現的排列結果機率相等，洗牌演算法必須可以產生出

N!

中的每一種結果，不可以多或少，否則每種結果出現的機率不相等。

Fisher–Yates shuffle

透過一個迴圈去迭代串列，但是它的方式是從最後一個元素開始，每次都跟前面一個隨機的元素交換位置。這樣的話，可以保證隨機過的位置不會再被交換，就不會出現重複的情況。

// n is the length of array
for (int i = n - 1; i > 0; i--) {
    int randomIndex = rand() % (i + 1);
    int temp = arr[i];
    arr[i] = arr[randomIndex];
    arr[randomIndex] = temp;
}

以上方程式碼為例，它是從陣列中最後一個元素與該元素的前面元素作交換的動作。

假設一定有一個長度為 5 的陣列，

第一次迭代 : i = 4 , randomIndex 的範圍會介於 0 ~ 4 之間，所以可能交換的其況為 5 種(包含自己)。
第二次迭代 : i = 3 , randomIndex 的範圍會介於 0 ~ 3 之間。
第三次迭代 : i = 2 , randomIndex 的範圍會介於 0 ~ 2 之間。
第四次迭代 : i = 1 , randomIndex 的範圍會介於 0 ~ 1 之間。
第五次迭代 : i = 0 , randomIndex 的範圍會為 0 。

整個過程產生出來可能的結果為

5 \times 4 \times 3 \times 2 \times 1 = 5!

，所以可以由上述可知， Fisher–Yates shuffle 可以產生出

N!

種可能。

Fisher–Yates shuffle 在 lab0 的實作
- Fisher–Yates shuffle 在 lab0 中實作方法基本與上方的程式碼相同，但上方的程式碼要操作的資料型別為陣列，可以透過 randomIndex 直接存取到要交換的元素並作交換，而在 lab0 中要操作的資料型別為環狀佇列，所以需要透過指標去指到特定 index 的元素來完成交換，並且需要考量兩節點是否相鄰的情況。

詳細實作在 commit a54ea55。

思考 Fisher–Yates 演算法是否適合鏈結串列？

// Find the nodes
cur = head->prev;
for (int j = i; j < (len - 1); j++)
    cur = cur->prev;
cmp = cur;
for (int j = 0; j < rand_num; j++)
    cmp = cmp->prev;

// Swap two node
if (rand_num > 1) {
    struct list_head *tmp = cmp->prev;
    list_move(cmp, cur->prev);
    list_move(cur, tmp);
} else {
    list_del(cmp);
    list_add(cmp, cur);
}

l = [a b c d]
cmd> shuffle
l = [c a d b]
cmd> shuffle
l = [c b a d]
cmd> shuffle
l = [b a c d]
cmd> shuffle
l = [b c d a]

分析Fisher–Yates shuffle

若以長度為 4 的陣列為例，全部可能的可能為

4!

個，在進行

10^{6}

次洗牌 (shuffle) ，得到以下各個組合數出現的次數及數據直方圖，可以大致看資料呈現均勻分佈。

參閱作業說明，可以使用 Pearson's chi-squared test 來作於類別變數的檢定，來驗證一組觀察值的次數分配是否異於理論上的分配。

各個排列結果機率相等，且為

p = \frac{100}{24} = 4.17 %

期望值公式為

E = \sum_{i = 1}^{K} X \times P (X_{i})

，

k = n! = 24

Chi-squared test statistic 為

X^{2} = \sum_{i = 1}^{k} \frac{(O_{i} - E_{i})^{2}}{E_{i}}

，

O

為觀察次數

$ ./scripts/compute.py 
Expectation:  41666
Observation:  {
    '1234': 41786, '1243': 41534, '1324': 41652,
    '1342': 41614,'1423': 41678, '1432': 41603,
    '2134': 41851, '2143': 41566, '2314': 42071,
    '2341': 41363, '2413': 41564, '2431': 41643,
    '3124': 41423, '3142': 41935, '3214': 41502,
    '3241': 41731, '3412': 41743, '3421': 41694,
    '4123': 41688, '4132': 41634, '4213': 41758,
    '4231': 41770, '4312': 41552, '4321': 41645
}
chi square sum:  13.279364469831519

使用教材中的 Python腳本來計算 Chi-squared sum，可得

X^{2} = 13.2794

此次分析

k = 24

，所以自由度為

24 - 1 = 23

，在選擇顯著水準為

a l p h a = 0.05

，根據 Chi-Square 分佈表，可看到 P value 介於 0.05 和 0.1 之間，因為

P (0.05 -0.1) > a l p h a (0.05)

，統計檢定的結果不拒絕虛無假說（

H_{0}

），也就是樣本資料不拒絕 shuffle 的結果遵循 Uniform distribution，因為沒有足夠證據推翻。

研讀論文〈Dude, is my code constant time?〉

此論文主要探討的是檢測程式是否為 Constant Time ，在這裡我將 Constant Time 想成是一個固定的時間，而非像時間複雜度所代表的 Constant Time ，為什麼會這樣想是因為作者說每種看似是 Constant Time 的演算法實作其實都存在 Time leakage 的問題，那何謂 Time leakage ？

Time leakage 是指通過測量加密演算法或者其他演算法的執行時間來判斷其內部狀態或敏感資訊的攻擊方法。攻擊者可以透過觀察演算法的執行時間，推測出關鍵的訊息，例如密碼的私鑰，這種攻擊通常利用演算法在處理不同輸入的執行時間差異，從而推測出演算法內部的工作原理或者敏感資訊。

具體 dudect 的運作流程如下

初始化設置 : 需要先定義兩個不同類別的測試資料，一個為固定的測資(全部的資料為某個固定的值)，另一個為隨機的測資。
測量執行時間 : 使用特定的輸入資料(固定或隨機)多次測試演算法，然後測量每次執行的時間。
進行統計分析 : 對於每種輸入的資料類別，分析收集到的執行時間數據，來檢測兩個類別的執行時間分佈是否存在顯著的差異，若有明顯的差異，則代表存在 Time leakage 的問題。

simulation 運作

# Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant
option simulation 1
it
ih
rh
rt
option simulation 0

以 trace-17 來說，在執行 ./qtest 可以透過 option 的方式將 simulation 打開，而在 qtest.c 中若 simulation == 1 ，則就會呼叫如 is_remove_tail_const() 的函數。

static inline bool do_rh(int argc, char *argv[])
{
    return queue_remove(POS_HEAD, argc, argv);
}

static bool queue_remove(position_t pos, int argc, char *argv[])
{
    #if !(defined(__aarch64__) && defined(__APPLE__))
    if (simulation) {
        bool ok =
            pos == POS_TAIL ? is_remove_tail_const() : is_remove_head_const();
    }
    #endif
}

is_remove_tail_const() 被定義在 lab0-c/dudect/fixture.h 內

/* Interface to test if function is constant */
#define _(x) bool is_##x##_const(void);
DUT_FUNCS
#undef _

DUT_FUNCS 被定義在 lab0-c/dudect/constant.h 內

#define DUT_FUNCS  \
    _(insert_head) \
    _(insert_tail) \
    _(remove_head) \
    _(remove_tail)

#define DUT(x) DUT_##x

enum {
#define _(x) DUT(x),
    DUT_FUNCS
#undef _
};

在 lab0-c/dudect/fixture.c 看 is_remove_tail_const() 的實作

#define DUT_FUNC_IMPL(op) \
    bool is_##op##_const(void) { return test_const(#op, DUT(op)); }

#define _(x) DUT_FUNC_IMPL(x)
DUT_FUNCS
#undef _

之後會呼叫 test_const(#op, DUT(op)) ，再呼叫 doit

static bool test_const(char *text, int mode)
{
    bool result = false;
    t = malloc(sizeof(t_context_t));

    for (int cnt = 0; cnt < TEST_TRIES; ++cnt) {
        printf("Testing %s...(%d/%d)\n\n", text, cnt, TEST_TRIES);
        init_once();
        for (int i = 0; i < ENOUGH_MEASURE / (N_MEASURES - DROP_SIZE * 2) + 1;
             ++i)
            result = doit(mode);
        printf("\033[A\033[2K\033[A\033[2K");
        if (result)
            break;
    }
    free(t);
    return result;
}

doit 跟 dudect.h 中的 dudect_main 的函數功能差不多，只是在 lab0 中的 doit 少了實作 prepare_percentiles 來設定用於裁剪測量值的不同閥值，所以在 labo 中只需要將缺失的補齊。

論文和實作程式碼是否有出入？

在 doit 的程式碼中，做了以下變更，在呼叫 update_statistics 之前，需要先呼叫 prepare_percentiles 來計算

static bool doit(int mode)
{
+   int64_t *percentiles =  calloc(NUMBER_PERCENTILES, sizeof(int64_t));

    prepare_inputs(input_data, classes);

    bool ret = measure(before_ticks, after_ticks, input_data, mode);
    differentiate(exec_times, before_ticks, after_ticks);
+   prepare_percentiles(exec_times,N_MEASURES,percentiles);
+   update_statistics(exec_times, classes, percentiles);
    ret &= report();
}

update_statistics 作了以下變更

 for (size_t i = 0; i < N_MEASURES; i++) {
        int64_t difference = exec_times[i];
        /* CPU cycle counter overflowed or dropped measurement */
        if (difference <= 0)
            continue;

        /* do a t-test on the execution time */
+       t_push(t, exec_times[0], classes[i]);

+        // t-test on cropped execution times, for several cropping thresholds.
+       for (size_t crop_index = 1; crop_index < NUMBER_PERCENTILES;
+             crop_index++) {
+           if (difference < percentiles[crop_index]) {
+                t_push(t, difference, classes[i]);
+           }
+        }
    }

commit f08de9c

最後成功使 trace-17-complexity 的測試成功通過。

+++ TESTING trace trace-17-complexity:
# Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant
Probably constant time
Probably constant time
Probably constant time
Probably constant time
---     trace-17-complexity     5/5

研讀 Linux 核心原始程式碼的 lib/list_sort.c

鏈結串列在 linux 核心中是環狀雙向鏈結串列的結構，但是在 list_sort 中會將串列視為單向串列，雖然每個節點中都有 prev 指標，但是 prev pointer 在 list_sort 時會將 prev 指標拿去指向 pending 的串列，並且在以下程式碼可以觀察出將最後一個節點的 next 指標改指向 NULL 作為中止條件。

/* Convert to a null-terminated singly-linked list. */
head->prev->next = NULL;

而在以下程式碼中會將原本的串列分成若干個已排序的子串列，在這裡有使用到 pointer to pointer 的技巧來串接各個串列，並且使用 count 來紀錄 Pending 串列有幾個節點和控制是否要進行 merge

	do {
		size_t bits;
		struct list_head **tail = &pending;

		/* Find the least-significant clear bit in count */
		for (bits = count; bits & 1; bits >>= 1)
			tail = &(*tail)->prev;
		/* Do the indicated merge */
		if (likely(bits)) {
			struct list_head *a = *tail, *b = a->prev;

			a = merge(priv, cmp, b, a);
			/* Install the merged result in place of the inputs */
			a->prev = b->prev;
			*tail = a;
		}

		/* Move one element from input list to pending */
		list->prev = pending;
		pending = list;
		list = list->next;
		pending->next = NULL;
		count++;
	} while (list);

在這個迴圈內，鏈結串列會被分成兩個部份：

list 指向尚未排序的鏈結串列的開頭
pending 則是指向最新已排序好的子串列

在這段程式碼中他的操作較難理解，在閱讀追蹤時，將它畫成圖後，可以比較理解它是如何作用的， { 1,2 } , { 4,5 } , { 6,7 } 分別為 3 個 pending 的串列。

在此處有使用到 likely 巨集，使用巨集告訴編譯器說此條件為真的機率較高，所以在編譯時就會將該條件後面的程式碼區段緊接在該條件之後。

if (likely(bits))

而在 linux kernel 內，可以發現以下巨集， likely 和 unlikely() 會被轉成 GCC 所提供的 __builtin_expect() ，此函數用於針對 CPU 分支的最佳化。

#define likely(x)    __builtin_expect(!!(x), 1)
#define unlikely(x)  __builtin_expect(!!(x), 0)

而根據 GCC Document 所說，雖然一般來說，應該要使用真實實際的分支反饋來進行預測優化，但是有些情況下，收集實際分支的資訊可能會很困難，這時就可以使用 __builtin_expect 來提供分支預測信息。雖然這種靜態的方法可能不如動態反饋準確，但它仍然可以幫助編譯器更好地優化程式

You may use __builtin_expect to provide the compiler with branch prediction
information. In general, you should prefer to use actual profile feedback for this
(-fprofile-arcs), as programmers are notoriously bad at predicting how their pro-
grams actually perform. However, there are applications in which this data is hard
to collect.

而根據 GCC 規格書所提供的範例，編譯器期待 ptr != NULL 為 1 (即 expect exp == c )，所以我們期望執行 foo()

// define
long __builtin_expect (long exp, long c)

if (__builtin_expect (ptr != NULL, 1))
    foo (*ptr);

而在 linux kernel 內為何需使用 !!(x) ，因為這麼做的原因是 x 本身的數值不一定為 1 或 0，使用兩次 NOT 操作可以保證結果一定為 1 或 0

#define likely(x)    __builtin_expect(!!(x), 1)

而分支預測器( branch predictor )會試圖預測條件分支的結果，從而提前執行對應的分支指令，__builtin_expect() 提供對於分支指令執行方向的預測，編譯器可以利用這個預測資訊來生成更好的程式碼。而分支預測器則是在實際執行時期根據硬件機制進行分支指令的預測。因此，__builtin_expect()提供的預測資訊可能會影響分支預測器的工作。

而在隨後的程式碼中則是從最新的子串列開始進行兩兩合併，並且剩下兩個子串列時在進行最後的合併。

for (;;) {
    struct list_head *next = pending->prev;

    if (!next)
        break;
    list = merge(priv, cmp, pending, list);
    pending = next;
}
/* The final merge, rebuilding prev links */
merge_final(priv, cmp, head, pending, list);

引入 list_sort 到 lab0-c 當中

commit 8a861c6。

而在引入時我做了以下改動：

實作自己的 list_cmp_func_t

cmp_function 中因為字串的值是存放在 element_t 內，所以需要先存取該 element_t 的結構再去進行字串比較，而 __attribute__((nonnull(1, 2)) 代表的是該函數參數的第一個和第二個指標不能為空指標。

// in linux_sort.h
typedef int
    __attribute__((nonnull(1, 2))) (*list_cmp_func_t)(const struct list_head *,
                                                      const struct list_head *);
// in queue.c
int cmp_function(void *priv, const struct list_head *a, const struct list_head *b)
{
    element_t *a_entry = list_entry(a, element_t, list);
    element_t *b_entry = list_entry(b, element_t, list);

    if (strcmp(a_entry->value, b_entry->value) <= 0)
        return 0;
    else
        return 1;
}

引入 tim_sort 到 lab0-c 當中

commit - 45a04b7

演算法參考測驗二。

比較 3 種排序演算法

使用 test.cmd 測試三種排序演算法所花費的時間及計算排序比較次數

Merge sort :

$ ./qtest -v 3 -f traces/trace-test_merge_sort.cmd 
l = []
l = [ichkvh taplje nkqstlqys lvtvursyq jjfdv yzazaupvf dcanggjaj cuakam myzvfcpdq zgvgpbmf spbfj kbwuk mgugom ydsoed aebgwv lpshb fxfxclq fccxi fniozikbm nabsc ssajt rsrma cqtlnu tfjys hytfvz wbwtvf uhilajgbd qomzazzu asfohfqn ivhoaoyr ... ]
  Comparisons:    18673849
l = [aaaaswgm aaaavh aaabajyfy aaabjpe aaackep aaaclw aaacqza aaaddy aaadiy aaaebzz aaaedb aaaedp aaaevj aaafagzlc aaafb aaafdaw aaagm aaahpbm aaaitkck aaaiys aaajfmra aaajm aaajw aaakdnmeo aaaksahub aaalbzskz aaalfvsnw aaalod aaamalzeh aaamgqxzy ... ]
Delta time = 1.346
Freeing queue

Linux list sort :

$ ./qtest -v 3 -f traces/trace-test_linux_sort.cmd 
l = []
l = [ienlzvnc ccanftt msrumsk yvmjvmcyb ldpkwhka bawkju kayuwy feglorpuc mgelovjnz nurlmgv iqfiq rvivoew wjjnk fedwtamr lhwqtay ygmgkfjm xeivjuh axdbjl bdijo alfjp bwgoflx uuffrzv neiddwc ijcfaxhe gkxvd xqkjsc zzidxcb uotrwlkva wzvpqq djndref ... ]
  Comparisons:    18686885
l = [aaaamaft aaaav aaabda aaabo aaabpawwg aaacgifui aaadenjy aaadhnk aaaduxoz aaadys aaaech aaaefcco aaaeteor aaafa aaafcxmcv aaagmjqgl aaagrzt aaaiimap aaaitsjld aaaiup aaajjar aaajq aaajyywn aaakmvxc aaallpat aaalodl aaalu aaamykmr aaaoawg aaaongmlh ... ]
Delta time = 0.931
Freeing queue

Tim sort :

$ ./qtest -v 3 -f traces/trace-test_tim_sort.cmd 
l = []
l = [uyevbvrx sizyrlkeh dxrutoo alstfe iwcjbbunl tceuppe rfshingi ziisgw xfaqhsjb nhomdu klzvpxk lzknovzi xntgfhlwn mukmk xyowupt oezbdgcp slpjtvfn shzxdax adquzrz zewcv klwuv ezlknn zbbvx oycaefmjg sfnlllnx eblwdnp bhcatlt xjfomk sqwqhxc uowxcxfmy ... ]
  Comparisons:    20608584
l = [aaaacm aaaarpsjt aaabzoxvj aaaclgzf aaacuz aaacw aaadewbj aaaew aaafh aaafoblh aaagcxopp aaagyoovg aaaifroz aaaigw aaaiifyjr aaajee aaakeilop aaakk aaakwdfl aaalbx aaalczqd aaalwalo aaamnnk aaamomwl aaamtr aaaneg aaaoayggy aaapagk aaapdlew aaapgpuvz ... ]
Delta time = 0.980
Freeing queue

	Merge Sort	List Sort	Tim Sort
Time	1.346	0.931	0.980
Comparision	18673849	18686885	20608584

由上方表格可知， Merge Sort 的執行時間高於其他兩種演算法，但比較次數 Tim Sort 為最高。

使用 valgrind cachegrind 分析 cache miss ratio :

Merge sort

----------------------------------------------------------------------------------------------------------------------------------------------------------------
Ir               I1mr        ILmr        Dr              D1mr           DLmr         Dw              D1mw           DLmw            file:function
----------------------------------------------------------------------------------------------------------------------------------------------------------------
64,014,838 (10.56%)  3 ( 0.15%)  3 ( 0.15%)  7,107,192 ( 5.15%)     5,962 ( 0.05%)      43 ( 0.00%)  6,566,543 ( 7.90%)         0                0           /home/jimmylu/linux2024/lab0-c/queue.c:merge
28,732,961 ( 4.74%)  2 ( 0.10%)  2 ( 0.10%) 10,047,327 ( 7.28%) 3,895,919 (35.92%) 470,927 (33.68%)  3,514,305 ( 4.23%)    73,153 ( 2.46%)  56,529 ( 8.18%)  /home/jimmylu/linux2024/lab0-c/queue.c:mergeSortList

Linux 的 list_srot

----------------------------------------------------------------------------------------------------------------------------------------------------------------
Ir               I1mr        ILmr        Dr              D1mr           DLmr         Dw              D1mw           DLmw            file:function
----------------------------------------------------------------------------------------------------------------------------------------------------------------
60,706,044 ( 9.48%)  3 ( 0.15%)  3 ( 0.15%)  6,977,888 ( 4.81%)   751,295 ( 6.95%)  57,513 ( 1.73%) 10,709,872 (14.91%)       1 ( 0.00%)       1 ( 0.00%)  /home/jimmylu/linux2024/lab0-c/linux_sort.c:merge
11,627,055 ( 1.82%)  7 ( 0.35%)  7 ( 0.35%)  1,893,426 ( 1.31%)   139,300 ( 1.29%) 115,824 ( 3.47%)  2,434,431 ( 3.39%) 273,519 (18.52%) 221,882 (19.31%)  /home/jimmylu/linux2024/lab0-c/linux_sort.c:list_sort

Tim_srot

----------------------------------------------------------------------------------------------------------------------------------------------------------------
Ir               I1mr        ILmr        Dr              D1mr           DLmr         Dw              D1mw           DLmw            file:function
----------------------------------------------------------------------------------------------------------------------------------------------------------------
60,500,631 ( 9.11%)  4 ( 0.20%)  4 ( 0.20%)  6,620,349 ( 4.35%)   712,301 ( 6.63%)  34,360 ( 1.02%) 10,116,234 (14.23%)   3,148 ( 0.26%)       2 ( 0.00%)  /home/jimmylu/linux2024/lab0-c/tim_sort.c:merge_at
21,051,737 ( 3.17%) 19 ( 0.95%) 19 ( 0.96%)  5,485,264 ( 3.61%)   147,095 ( 1.37%) 116,236 ( 3.46%)  1,933,558 ( 2.72%)      13 ( 0.00%)      11 ( 0.00%)  /home/jimmylu/linux2024/lab0-c/tim_sort.c:timsort

Type	Merge Sort	List Sort	Tim Sort
I1mr	5	10	23
D1mr	3901881	890595	859396

三種排序演算法中， Merge Sort 的 cache miss ratio為最高，而 Tim Sort 為最低，因為 Tim Sort 相較於傳統的 Merge Sort ，為了降低 cache miss 而作改善。

2024q1 Homework1 (lab0)

Reviewed by jujuegg

開發環境

課程教材閱讀

佇列實作及改進過程

q_new

q_free

q_insert_head() & q_insert_tail()

後續修正

q_remove_head, q_remove_tail

後續修正

q_delete_mid()

q_delete_dup()

q_swap()

q_sort()

實作 Bubble Sort

實作 Merge Sort

改善 Merge Sort

q_merge()

實作佇列結果

在 qtest 提供新的命令 shuffle

Fisher–Yates shuffle

分析Fisher–Yates shuffle

研讀論文 〈Dude, is my code constant time?〉

simulation 運作

研讀 Linux 核心原始程式碼的 lib/list_sort.c

引入 list_sort 到 lab0-c 當中

引入 tim_sort 到 lab0-c 當中

比較 3 種排序演算法

使用 test.cmd 測試三種排序演算法所花費的時間及計算排序比較次數

使用 valgrind cachegrind 分析 cache miss ratio :

Read more

Linux 核心期末專題 : KVM-host 實作 Virtio-Net

2024q1 Homework5 (assessment)

2024q1 Homework2 (quiz1+2)

2024q1 Homework6 (integration)

Reviewed by `jujuegg`

`q_new`

`q_free`

`q_insert_head()` & `q_insert_tail()`

`q_remove_head`, `q_remove_tail`

`q_delete_mid()`

`q_delete_dup()`

`q_swap()`

`q_sort()`

`q_merge()`

在 qtest 提供新的命令 `shuffle`

研讀論文〈Dude, is my code constant time?〉