2025q1 Homework1 (lab0)

contributed by < davidshiao55 >

作業書寫規範:

無論標題和內文中，中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)
文字訊息 (尤其是程式執行結果) 請避免用圖片來表示，否則不好搜尋和分類
共筆書寫請考慮到日後協作，避免過多的個人色彩，用詞儘量中性
不要在筆記內加入 [TOC] : 筆記左上方已有 Table of Contents (TOC) 功能，不需要畫蛇添足
不要變更預設的 CSS 也不要加入任何佈景主題: 這是「開發紀錄」，用於評分和接受同儕的檢閱
當在筆記中貼入程式碼時，避免非必要的行號，也就是該手動將 c= 或 cpp= 變更為 c 或 cpp。行號只在後續討論明確需要行號時，才要出現，否則維持精簡的展現。可留意「你所不知道的 C 語言: linked list 和非連續記憶體」裡頭程式碼展現的方式
HackMD 不是讓你張貼完整程式碼的地方，GitHub 才是！因此你在開發紀錄只該列出關鍵程式碼 (善用 diff 標示)，可附上對應 GitHub commit 的超連結，列出程式碼是為了「檢討」和「便於他人參與討論」
留意科技詞彙的使用，請參見「資訊科技詞彙翻譯」及「詞彙對照表」
不要濫用 :::info, :::success, :::warning 等標示，儘量用清晰的文字書寫。:::danger 則僅限授課教師作為批注使用
避免過多的中英文混用，已有明確翻譯詞彙者，例如「鏈結串列」(linked list) 和「佇列」(queue)，就使用該中文詞彙，英文則留給變數名稱、人名，或者缺乏通用翻譯詞彙的場景
在中文敘述中，使用全形標點符號，例如該用「，」，而非 ","。注意書名號的使用，即 〈 和 〉，非「小於」和「大於」符號
避免使用不必要的 emoji 字元

開發環境

$ gcc --version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   12
  On-line CPU(s) list:    0-11
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
    CPU family:           6
    Model:                158
    Thread(s) per core:   2
    Core(s) per socket:   6
    Socket(s):            1
    Stepping:             10
    CPU(s) scaling MHz:   19%
    CPU max MHz:          4500.0000
    CPU min MHz:          800.0000
    BogoMIPS:             5199.98
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge m
                          ca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 s
                          s ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc 
                          art arch_perfmon pebs bts rep_good nopl xtopology nons
                          top_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
                          ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid 
                          sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer a
                          es xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpu
                          id_fault epb pti ssbd ibrs ibpb stibp tpr_shadow flexp
                          riority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 
                          smep bmi2 erms invpcid mpx rdseed adx smap clflushopt 
                          intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida ara
                          t pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi m
                          d_clear flush_l1d arch_capabilities
Virtualization features:  
  Virtualization:         VT-x
Caches (sum of all):      
  L1d:                    192 KiB (6 instances)
  L1i:                    192 KiB (6 instances)
  L2:                     1.5 MiB (6 instances)
  L3:                     12 MiB (1 instance)
NUMA:                     
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-11
Vulnerabilities:          
  Gather data sampling:   Mitigation; Microcode
  Itlb multihit:          KVM: Mitigation: VMX disabled
  L1tf:                   Mitigation; PTE Inversion; VMX conditional cache flush
                          es, SMT vulnerable
  Mds:                    Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT vulnerable
  Reg file data sampling: Not affected
  Retbleed:               Mitigation; IBRS
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prct
                          l
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointe
                          r sanitization
  Spectre v2:             Mitigation; IBRS; IBPB conditional; STIBP conditional;
                           RSB filling; PBRSB-eIBRS Not affected; BHI Not affect
                          ed
  Srbds:                  Mitigation; Microcode
  Tsx async abort:        Not affected

針對佇列操作的程式碼實作

q_free

commit: 81544d6

實作 q_free 函式，以正確地釋放佇列所佔用的所有記憶體資源。透過使用 list_for_each_entry_safe 巨集，我們可以安全地遍歷鏈結串列，並在迴圈中呼叫 q_release_element 來釋放每個元素的記憶體。迴圈結束後，使用 free 函式釋放 list_head 結構體本身，確保沒有記憶體洩漏。

commit: f25c599

void q_free(struct list_head *head)
 {
+    if (!head)
+        return;
+
     element_t *entry = NULL, *safe = NULL

為 q_free 函式新增了對空指標的檢查，以避免對空指標執行 free 操作。具體修改如下：

q_insert_head

commit: 54f1e93

首先，對 head 進行空指標檢查，以確保傳入的佇列有效。接著，分配記憶體空間給新的 element，若記憶體分配失敗則直接回傳 false。隨後，使用 test_strdup 分配記憶體並複製 s 到 element 結構體中的字串，確保新節點的內容正確。最後，使用 list_add 將新節點加入佇列的開頭，並回傳 true，表示插入成功。

commit: aec8cf5

 void q_free(struct list_head *head)
 /* Insert an element at head of queue */
 bool q_insert_head(struct list_head *head, char *s)
 {
-    if (!head)
+    if (!head || !s)
         return false;
 
     element_t *e = malloc(sizeof(element_t));
     if (!e)
         return false;
 
-    e->value = test_strdup(s);
+    size_t len = strlen(s) + 1;
+    e->value = malloc(len);
+    if (!e->value) {
+        free(e);
+        return false;
+    }
+    strlcpy(e->value, s, len);
     list_add(&e->list, head);
 
     return true;

對 q_insert_head 函式進行了以下改進：

加入對 s 的空指標檢查：在插入新元素時，首先檢查字串指標 s 是否為空，以避免對空指標進行操作，確保程式的穩定性。
改用 strlcpy 取代 test_strdup：在 harness.c 中發現註解標註該檔案屬於測試支援程式碼（test support code）。在閱讀 CERN 相關文獻後，決定改用 strlcpy 來實作字串複製，以提高記憶體管理的安全性。
避免記憶體洩漏：在檢測到 e->value 的記憶體分配失敗時，新增對 e 的記憶體釋放操作，以避免記憶體洩漏問題的發生。

q_insert_tail

commit: 7629631
commit: aec8cf5

同 q_insert_head 的實作，但將 list_add 換成 list_add_tail。

q_remove_head

commit: d2f60aa

首先，檢查 head 是否為空指標，或佇列是否為空，若是則直接回傳 NULL。接著，使用 list_first_entry 取得佇列的第一個元素 e，若提供了字串指標 sp，則使用 strlcpy 將元素的值複製到 sp，確保不超過 bufsize。然後，使用 list_del 將元素從佇列中移除，最後回傳指向已移除元素的指標 e。

q_remove_tail

commit: 7629631

同 q_remove_head 的實作，但將 list_first_entry 換成 list_last_entry 。

q_size

commit: 7bad176

首先，檢查 head 是否為空指標或佇列是否為空，若是則回傳 0。接著，使用 list_for_each 巨集遍歷佇列，計算元素數量並回傳。

q_delete_mid

commit: 30c06e1

在閱讀完分析「快慢指標」後，使用快慢指標的方式實作，以更有效地利用時間局部性。首先，對 head 做空指標及空佇列檢查，若為空則回傳 false。接著，利用快慢指標法遍歷佇列，直到快指標到達佇列末尾，慢指標即指向中間節點。最後，使用 list_del 刪除該中間節點並釋放記憶體。

q_delete_dup

commit: db08bab

此題跟 leetcode 最大的差異是鏈結串列並非排序好的，在理解題目後考慮了兩種實作方法：

使用雜湊表實作，時間複雜度：
$O (n)$ ，空間複雜度：
$O (n)$
使用巢狀迴圈實作，時間複雜度：
$O (n^{2})$ ，空間複雜度：
$O (1)$

考慮到 C 語言沒有原生雜湊表實作，選擇使用巢狀迴圈。考慮到刪除重複節點可能會刪除到下一個節點不適合使用 list_for_each_safe 巨集，改使用 list_for_each 巨集，並在內層迴圈中使用一個 runner 指標去刪除重複節點並檢測當前節點是否重複，若當前節點是重複節點則在外層迴圈再刪除此節點。

commit: bec3e6a

$./qtest 
cmd> new
l = []
cmd> it a
l = [a]
cmd> it b
l = [a b]
cmd> it c
l = [a b c]
cmd> it a
l = [a b c a]
cmd> dedup
ERROR: Duplicate strings are in queue or distinct strings are not in queue
l = [b c]
cmd> size
ERROR: Computed queue size as 2, but correct value is 4
l = [b c]

在發現以上測資會出錯後，閱讀 qtest.c 中 do_dedup 的實作，發現我錯誤理解題意，此題只需移除相鄰重複節點。時間複雜度：

O (n)

，空間複雜度

O (1)

bool q_delete_dup(struct list_head *head)
     if (!head || list_empty(head))
         return false;
 
-    struct list_head *node = head->next;
-
-    while (node != head) {
+    struct list_head *node;
+    list_for_each (node, head) {
         element_t *e = list_entry(node, element_t, list);
         bool dup = false;
 
-        struct list_head *safe = node->next;
-        // safe find the first non-duplicated value
-        while (safe != head) {
-            struct list_head *next = safe->next;
-            element_t *s = list_entry(safe, element_t, list);
-            if (!strcmp(s->value, e->value)) {
+        struct list_head *runner = node->next;
+        while (runner != head) {
+            struct list_head *next = runner->next;
+
+            element_t *r = list_entry(runner, element_t, list);
+            if (!strcmp(e->value, r->value)) {
                 dup = true;
-                list_del(&s->list);
-                q_release_element(s);
-            } else {
-                break;
+                list_del(&r->list);
+                q_release_element(r);
             }
-            safe = next;
+            runner = next;
         }
         if (dup) {
+            struct list_head *next = node->next;
             list_del(&e->list);
             q_release_element(e);
+            node = next;
         }
-        node = safe;
     }
     return true;
 }

新的實作使用一個 safe 指標指向第一個與此節點結構體中字串相異的節點，若有找到相同節點則刪除相同節點和此節點。

q_reverse

commit: 9bbf5f2

首先，檢查 head 是否為空指標，或佇列是否為空，接著從 head 遍歷各節點，將個解點中 next 和 prev 指向的節點交換。

q_reverseK

commit: c84a603

首先，檢查 head 是否為空指標，佇列是否為空或佇列是否只有一個元素，若是則直接返回。然後，遍歷佇列，對每組 k 個節點進行反轉操作。對於每組節點，首先找到第 k 個節點，若不足 k 個則不進行反轉。接著，反轉這 k 個節點的指標方向。最後，調整前一組的最後一個節點與當前組的第一個和第 k 個節點，以及下一組的第一個節點之間的指標關係，以維持雙向鏈結串列的結構。

q_sort

commit: 0e31de7

首先，將環狀鏈結串列轉換為非環狀的雙向鏈結串列。接著，使用快慢指標將串列分割為兩部分，並遞迴地對每部分進行合併排序。最後，將排序後的串列重新連接到頭節點。此實作中包含兩個輔助函式：merge_sort_dlist 使用快慢指標將串列分割為兩部分；merge_two_sorted 則為遞迴函式，用於合併兩個已排序的雙向鏈結串列。

merge_sort_dlist 中快慢指標偏移 fast 指標使得在偶數節點數的時候會選擇第一個中間節點，使得在節點數為二的時候叫容易處理分割。

static struct list_head *merge_sort_dlist(struct list_head *head, bool descend)
{
    ...
    struct list_head *slow = head, *fast = head->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    ...
}

commit : 309a9ba

參照你所不知道的 C 語言: linked list 和非連續記憶體將 merge_two_sorted 函式從遞迴實作改為使用 indirect pointer ，類似於合併單向鏈結串列的方法，並添加額外的指標以維持雙向鏈結串列的結構。此更改有助於提高函式的效率，避免遞迴帶來的額外開銷。

/*  merges two sorted (non-circular) doubly linked lists */
static struct list_head *merge_two_sorted(struct list_head *left,
                                          struct list_head *right,
                                          bool descend)
{
    struct list_head *head = NULL, **ptr = &head, **node, *prev = NULL;

    for (node = NULL; left && right; *node = (*node)->next) {
        const element_t *l = list_entry(left, element_t, list);
        const element_t *r = list_entry(right, element_t, list);
        bool cmp = (descend) ? (strcmp(l->value, r->value) >= 0)
                             : (strcmp(l->value, r->value) <= 0);
        node = (cmp) ? &left : &right;
        *ptr = *node;
        (*node)->prev = prev;
        prev = *node;
        ptr = &(*ptr)->next;
    }
    *ptr = (struct list_head *) ((uintptr_t) left | (uintptr_t) right);
    (*ptr)->prev = prev;
    return head;
}

commit: 4dde27f

在實作 q_merge 後新增了兩個輔助函式，因此更精簡化了程式碼。

void q_sort(struct list_head *head, bool descend)
         return;
 
     // Break the the list into normal doubly-linked list
-    struct list_head *first = head->next;
-    struct list_head *last = head->prev;
-    first->prev = NULL;
-    last->next = NULL;
+    struct list_head *first = break_ring(head);
 
     struct list_head *sorted = merge_sort_dlist(first, descend);
-    struct list_head *tail = sorted;
-    while (tail->next)
-        tail = tail->next;
-
-    // Link the head back
-    head->next = sorted;
-    head->prev = tail;
-    sorted->prev = head;
-    tail->next = head;
+    recirc(head, sorted);
 }

q_ascend

commit: 8d19670

首先，檢查 head 是否為空指標、佇列是否為空或僅有一個元素，若是則回傳佇列大小。接著，從佇列尾部開始向前遍歷，追蹤目前的最小值，移除所有大於該最小值的節點，並在找到新的最小值時更新。此方法確保了剩餘的節點按遞增順序排列。

q_descend

commit: 83f6e33

同 q_ascend ，但將最小值換成最大值。

q_merge

commit: 526c2da

包含三個輔助函式：

break_ring：將具有哨兵頭節點的環狀鏈結串列轉換為標準的雙向鏈結串列。
recirc：將標準的雙向鏈結串列重新構造成環狀鏈結串列，並加入 head 節點。
merge_two_sorted：從 q_sort 中提取出來的函式，用於合併兩個已排序的雙向鏈結串列。

在 q_merge 函式中遍歷節點，首先使用 break_ring 轉換環狀鏈結串列，接著透過 merge_two_sorted 合併該個和第一個已排序的佇列，最後使用 recirc 重新構造環狀鏈結串列。

研讀 Linux 核心原始程式碼的 lib/list_sort.c

list_sort.c 有別於傳統 top-down 的 merge sort，採取了 bottom-up 的策略，更有效率的利用快取。

將 list_sort.c 引入 lab0

commit: 824efeb

直接將 list_sort.c 引入 queue.c 中，再多加入 cmp_func 和 sort_arg，尚未比較效能。

+#define likely(x) __builtin_expect(!!(x), 1)
+#define unlikely(x) __builtin_expect(!!(x), 0)
+
+struct sort_arg {
+    bool descend;
+};
+
+static int cmp_func(void *priv,
+                    const struct list_head *a,
+                    const struct list_head *b)
+{
+    const struct sort_arg *arg = priv;
+    const element_t *ea = list_entry(a, element_t, list);
+    const element_t *eb = list_entry(b, element_t, list);
+
+    int cmp = strcmp(ea->value, eb->value);
+    if (!arg->descend) {
+        return cmp;  // normal ascending
+    } else {
+        // just flip the sign
+        return -cmp;
+    }
+}
+
+typedef int (*list_cmp_func_t)(void *,
+                               const struct list_head *,
+                               const struct list_head *);
@@ -333,11 +591,16 @@  void q_sort(struct list_head *head, bool descend)
     if (!head || list_empty(head) || list_is_singular(head))
         return;
 
-    // Break the the list into normal doubly-linked list
-    struct list_head *first = break_ring(head);
+    // Build an 'arg' struct with the chosen ordering
+    struct sort_arg arg = {
+        .descend = descend,
+    };
 
-    struct list_head *sorted = merge_sort_dlist(first, descend);
-    recirc(head, sorted);
+    // Then just call list_sort. It will:
+    // 1) break the ring,
+    // 2) do a mergesort in place,
+    // 3) re-circularize the list.
+    list_sort(&arg, head, cmp_func);
 }

實作q_shuffle

實驗1,000,000次 shuffle 實驗結果如下：

$python3 scripts/shuffle.py 
Expectation:  41666
Observation:  {'1234': 41450, '1243': 41577, '1324': 41709, '1342': 41571, '1423': 41283, '1432': 41835, '2134': 41623, '2143': 41725, '2314': 41279, '2341': 41439, '2413': 41830, '2431': 41689, '3124': 41762, '3142': 41821, '3214': 41754, '3241': 41692, '3412': 41610, '3421': 41708, '4123': 41979, '4132': 41775, '4213': 41825, '4231': 41522, '4312': 41699, '4321': 41843}
chi square sum:  17.030672490759855

自由度為 4! - 1 = 23，P value = 0.7951，P value > apla (0.05)，統計檢定的結果不拒絕虛無解說，不拒絕 shuffle 的結果遵循 Uniform distribution。

dudect 論文

這篇論文主要描述透過 fix-vs-random leakage detection 的方式不用透過硬體，依靠執行時間的統計分析判斷程式是否是常數執行時間。

wurrrrrrrrrr的開發紀錄提到他對在 constant.c 中對 fix class 的 constant 做變動得到的不同結果，在我自己實驗後也復現了同樣結果，在 constant 為 0 時會過不了 trace-17 但在 0xAA, 0xCC, 0xFF 都過得了。
我對這個實驗結果的解釋是當我們把 fix class 固定填入 0 時，strlen 這類遇到第一個 \0 就會停止的函式的的執行時間就會非常短，造成 False positive 的非常數執行時間，是 fix input 單方面造成的並沒有任何 secret-dependency，巫同學的實驗結果也看出這個現象。

         if (classes[i] == 0)
-            memset(input_data + (size_t) i * CHUNK_SIZE, 0, CHUNK_SIZE);
+            memset(input_data + (size_t) i * CHUNK_SIZE, 0xff, CHUNK_SIZE);