2023q1 Homework1 (lab0)

contributed by <DokiDokiPB >

開發環境

$ gcc --version                                    
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz

使用主機： Macbook Air 2014 A1466 (Ubuntu Linux)

開發過程

在 queue.c 中，使用 list.h 的 list_head 串接資料，list.h 中具有幾個好用函式與巨集

初始化佇列開頭：INIT_LIST_HEAD
新增單一節點：list_add, list_add_tail
轉移多個節點至新的 head：list_splice
node：表示具有元素( element_t )的節點
list：具有多個節點的佇列開頭，或是~~複數串接的節點~~ ???
head：表示佇列開頭

改進漢語表達。
jserv

q_new

struct list_head *q_new()
{
    struct list_head *head = malloc(sizeof(struct list_head));
    if (!head) {
        return NULL;
    }
    INIT_LIST_HEAD(head);
    return head;
}

透過 malloc 配置 head，依據 list_head，皆為 head->next, head->prev 指向 heaed

q_free

void q_free(struct list_head *l)
{
    if (!l) {
        return;
    }
    element_t *e, *safe;
    list_for_each_entry_safe (e, safe, l, list) {
        free(e->value);
        free(e);
    }
    free(l);
}

在 list_for_each_entry_safe 中，利用 *safe 指標保留下個節點的位址，就能將現在操作的節點釋放

q_insert_head, q_insert_tail

將節點加入至 head ， q_insert_head 透過 list_add 加為首個節點；q_insert_tail 透過 list_add_tail 加為末個節點

bool q_insert_head(struct list_head *head, char *s)
{
    if (!head || !s) {
        return false;
    }

    element_t *e = malloc(sizeof(element_t));
    char *dup = strdup(s);
    if (!e) {
        return false;
    }
    if (!dup) {
        free(e);
        return false;
    }
    e->value = dup;
    
    list_add(&e->list, head);
    return true;
}

bool q_insert_tail(struct list_head *head, char *s)
{
    // ...
    list_add_tail(&e->list, head);
    return true;
}

q_remove_head, q_remove_tail

將指定節點的元素從 head 移除

element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
    if (!head || !bufsize) {
        return NULL;
    }
    if (list_empty(head)) {
        return NULL;
    }

    struct list_head *node = head->next;
    element_t *e = container_of(node, element_t, list);
    size_t len = strlen(e->value);
    memset(sp, 0, bufsize);
    if (len > bufsize) {
        strncpy(sp, e->value, bufsize);
        sp[bufsize - 1] = '\n';
    } else {
        strncpy(sp, e->value, len);
    }
    list_del(node);
    return e;
}

element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize)
{
    // ...
    struct list_head *node = head->prev;
    // ...
    return e;
}

依據在 queue.h 中函式的說明，bufsize 為 *sp 的陣列長度

若 bufsize 小於元素中的字串長度(變數 len )，則擷取部分元素字串。
若 bufsize 大於 len，則整段複製到 *sp

思考如何精簡程式碼，善用前置處理器。
jserv

q_size

int q_size(struct list_head *head)
{
    if (!head) {
        return 0;
    }
    int c = 0;
    struct list_head *node;
    list_for_each (node, head) {
        c++;
    }
    return c;
}

利用 list_for_each 走過整個佇列，取得元素數量

q_delete_mid

























bool q_delete_mid(struct list_head *head)
{
    // 
    if (!head) {
        return false;
    }

    if (list_empty(head)) {
        return false;
    }

    struct list_head *slow, *fast;
    slow = head->next;
    fast = head->next;
    while (fast != head && fast->next != head) {
        slow = slow->next;
        fast = fast->next->next;
    }

    list_del(slow);
    element_t *e = container_of(slow, element_t, list);
    free(e->value);
    free(e);
    return true;
}

依據 LeetCode 2095. Delete the Middle Node of a Linked List 以及其圖片的說明

當節點數量為奇數時，中間節點為正中間（若有 7 個節點，取第 4 個)
當節點為偶數時，中間節點為中間取後一個(若有 4 個節點，取第 3 個)

在第 15 ~ 17 行中，透過一次移動一個節點的 slow 指標與一次移動兩個節點的 fast 指標，當 fast 回到 head 指標時，slow 指標即在中間節點的位址上。

改進程式碼，使其更精簡的程式碼。
jserv

q_reverse




















void q_reverse(struct list_head *head)
{
    if (!head) {
        return;
    }
    head->prev->next = NULL;
    struct list_head *node = head->next;
    struct list_head *cur = head;
    while (node) {
        struct list_head *tmp = node->next;
        cur->prev = node;
        node->next = cur;
        node->prev = tmp;

        cur = node;
        node = tmp;
    }
    cur->prev = head;
    head->next = cur;
}

反向 = reverse

需要詞彙精準的定義。
jserv

在第 6 行中，先將 doubly-linked list 尾端元素 head->prev->next 指派為 NULL，使得在 9 行中能夠脫離 while 迴圈，表示到尾端處理完畢

使用 node 變數走過每個節點，並將每個節點反向。
使用 cur 表示 node 的前一個節點；在第 18 行中表示反向後的第一個節點，串接至 head

q_reverseK

void q_reverseK(struct list_head *head, int k)
{
    // https://leetcode.com/problems/reverse-nodes-in-k-group/
    if (k <= 1)
        return;

    struct list_head *safe, *node, *start = head->next;
    int c = 0;
    list_for_each_safe (node, safe, head) {
        c++;
        if (c == k) {
            c = 0;
            struct list_head *prev = start->prev;

            LIST_HEAD(sub_head);
-            sub_head.next = start;
-            start->prev = &sub_head;
-            sub_head.prev = node;
-            node->next = &sub_head;
+            list_cut_position(&sub_head, start, node);
            q_reverse(&sub_head);
+            list_splice(&sub_head, prev);
-            prev->next = sub_head.next;
-            sub_head.next->prev = prev;
-            sub_head.prev->next = safe;
-            safe->prev = sub_head.prev;

            start = safe;
        }
    }
}

每 k 個節點，將該複數節點加入到 sub_head 中，透過既有的函式 q_reverse 處理；再將處理好的 sub_head 中的節點加回 head 中

看~~其他同學的實作~~，傾向直接操作節點。我的實作還是以一個 sub_head 包裝想要操作的節點。
TODO: 撰寫更精簡的程式碼。參照他人的程式碼時，應標註出處。
jserv

原本的實作考量 list_cut_postition 的函式參數名稱都為 head_from, head_to，可能僅適用完成的佇列而非節點。之後，觀摩 yanjiew 後，我發現可以使用 list_cut_position 去切割 head 的多個節點，再使用 list_splice 接回原本的 head

q_swap

void q_swap(struct list_head *head)
{
    // https://leetcode.com/problems/swap-nodes-in-pairs/
    q_reverseK(head, 2);
}

在作業說明中， q_swap 是一種 q_reverseK(head, 2); 的特例。

q_sort

避免張貼完整程式碼，你應該闡述自己的想法、考量因素，還有相關的策略。程式碼列表僅為輔助性質。
jserv

void q_sort(struct list_head *head)
{
//    ...

    // split with middle.
    struct list_head *slow = head->next, *fast = head->next->next;
    while (fast != head && fast->next != head) {
        slow = slow->next;
        fast = fast->next->next;
    }
    LIST_HEAD(left_head);
    LIST_HEAD(right_head);
-    struct list_head *right_node = slow->next;
-    left_head.next = head->next;
-    left_head.next->prev = &left_head;
-    left_head.prev = slow;
-    left_head.prev->next = &left_head;

-    right_head.next = right_node;
-    right_head.next->prev = &right_head;
-    right_head.prev = head->prev;
-    right_head.prev->next = &right_head;
-    INIT_LIST_HEAD(head);
+    list_cut_position(&left_head, head, slow);
+    list_splice_init(head, &right_head);
    q_sort(&left_head);
    q_sort(&right_head);
    merge_two_lists(&left_head, &right_head, head);
}

這裡採用 merge sort，程式碼主體可分為 divide, conquer and merge

Divide: 將 head 切為左半邊 left_head 與右半邊 right_head ，在程式碼中，切割的中間的節點為 slow
Conquer: 將 left_head 與 right_head 分別用 q_sort 函式處理
Merge: 合併兩個佇列(以 merge_two_lists 函式實作)
將較小元素先加入至 head 中，若其中一方沒有元素，即把另一方整個加入至 head 尾端中

在實作的過程中，可以利用 list_cut_position 切割左半與右半佇列

q_descend

int q_descend(struct list_head *head)
{
    int ret = 0;
    if (!head)
        return ret;
    if (list_empty(head))
        return ret;

    ret = 1;
    LIST_HEAD(less);
    struct list_head *cur = head->prev, *node = head->prev->prev;

    while (node != head) {
        struct list_head *prev = node->prev;
        element_t *e1 = list_entry(cur, element_t, list);
        element_t *e2 = list_entry(node, element_t, list);
        int c = strcmp(e2->value, e1->value);
        if (c >= 0) {
            cur = node;
            ret++;
        } else {
            list_move(node, &less);
        }
        node = prev;
    }

    element_t *e, *safe;
    list_for_each_entry_safe (e, safe, &less, list) {
        free(e->value);
        free(e);
    }
    return ret;
}

在 LeetCode 2487. Remove Nodes From Linked List 以及圖片的說明中

5 -> 2 -> 13 -> 3 -> 8
-->>
13 -> 8

可以視為由後往前來數的嚴格遞增的佇列。在實作上，也是從後往前的方式移除較小的元素。

less 表示要被刪除的節點，先將欲刪除的節點預先加入至 less 佇列中
cur 表示當前最大數值的節點
node 表示當前比較的節點，若大於 cur 的字串，則 cur 變為 node，ret 累計加 1

q_delete_dup

bool q_delete_dup(struct list_head *head)
{
    // https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/
    if (!head)
        return false;
    if (list_is_singular(head))
        return true;

    LIST_HEAD(delete_head);
    int move_count = 1;
    struct list_head *cur = head->next;
    while (cur != head) {
        struct list_head *node = cur->next;
        bool isdup = false;
        while (node != head) {
            struct list_head *next = node->next;
            element_t *e1 = list_entry(node, element_t, list);
            element_t *e2 = list_entry(cur, element_t, list);
            int c = strcmp(e1->value, e2->value);
            if (c == 0) {
                isdup = true;
                list_move(node, &delete_head);
            }
            node = next;
        }

        if (isdup) {
            list_move(cur, &delete_head);
        } else {
            move_count += 1;
        }
        cur = head;
        for (int i = 0; i < move_count; i++)
            cur = cur->next;
    }

    element_t *e, *safe;
    list_for_each_entry_safe (e, safe, &delete_head, list) {
        free(e->value);
        free(e);
    }

    return true;
}

雖然有提供 LeetCode 23. Merge k Sorted Lists，根據函式註解說明，並無特別強調排序。

在沒有排序的佇列中，每次比較一個節點是否有重複，皆必須走過所有節點確認

cur 表示當前的節點
node 表示被比較的節點，如果與 cur 的元素相同，則加入到 delete_head 中
透過 delete_head 重複的節點刪除。

此程式碼移除重複的節點，憑肉眼觀察也是，不過卻有 ERROR: Duplicate strings are in queue or distinct strings are not in queue，即使使用 q_sort 也會出現。

🤔🤔🤔 問題待排除

$ ./qtest 
cmd> new
l = []
cmd> it 5
l = [5]
cmd> it 4
l = [5 4]
cmd> it 3
l = [5 4 3]
cmd> it 2
l = [5 4 3 2]
cmd> it 1
l = [5 4 3 2 1]
cmd> it 5
l = [5 4 3 2 1 5]
cmd> it 4
l = [5 4 3 2 1 5 4]
cmd> dedup
ERROR: Duplicate strings are in queue or distinct strings are not in queue
l = [3 2 1]
cmd> quit

q_merge

int q_merge(struct list_head *head)
{
    // https://leetcode.com/problems/merge-k-sorted-lists/
    return 0;
}

~~如果是兩個佇列合併，則只需要 merge_two_lists 函式，若是多個佇列合併，則需要考量使用 priority queue 紀錄最小值~~

排除錯誤以及使用 Valgrind 排除錯誤

錯誤一：在兩個以上的佇列，在命令列使用 `free` 會造成程式進入無窮迴圈

避免使用「卡住」這樣不精準的詞彙。
jserv

遇到了一個錯誤，步驟如下

./qtest 啟動程式
new 兩次，會有兩個不同 ID 的 queue
free 一次會造成程式碼卡住

經過測試之後，並不是 q_free 造成的無限迴圈
推測是在某個地方造成無限迴圈

解決思路：

使用 GDB 命令 gdb qtest 重現錯誤
使用 Ctrl+C 中斷執行，查看無限迴圈的行數
使用 where 查看 call stack
使用 s 逐行檢查卡住的部份
- 發現卡在 qtest.c 的第 851 ~ 853 行
使用 p cur, p currnet->q, p currnet->q->next 檢查記憶體位址

Reading symbols from qtest...
(gdb) run
Starting program: /home/g111056119/Documents/lab0-c_2023/qtest 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
cmd> new
l = []
cmd> new
l = []
cmd> free
^C
Program received signal SIGINT, Interrupt.
is_circular () at qtest.c:853
853	        cur = cur->next;
(gdb) where
#0  is_circular () at qtest.c:853
#1  q_show (vlevel=vlevel@entry=3) at qtest.c:877
#2  0x0000555555558ab5 in do_free (argc=<optimized out>, argv=<optimized out>) at qtest.c:123
#3  0x0000555555559dfb in interpret_cmda (argc=argc@entry=1, argv=argv@entry=0x555555569070)
    at console.c:181
#4  0x000055555555a3b0 in interpret_cmd (cmdline=cmdline@entry=0x555555568ae0 "free")
    at console.c:201
#5  0x000055555555b01a in run_console (infile_name=infile_name@entry=0x0) at console.c:691
#6  0x00005555555591ed in main (argc=1, argv=<optimized out>) at qtest.c:1223

(gdb) p current->q
$18 = (struct list_head *) 0x5555555654b8 <current>
(gdb) p current->q->next
$19 = (struct list_head *) 0x555555568f78
(gdb) p current->q->next->next
$20 = (struct list_head *) 0x5555555654c0 <chain>
(gdb) p current->q->next->next->next
$21 = (struct list_head *) 0x555555568f78
(gdb) p current->q->next->next->next->next
$22 = (struct list_head *) 0x5555555654c0 <chain>










static bool is_circular()
{
    struct list_head *cur = current->q->next;
    while (cur != current->q) {
        if (!cur)
            return false;
        cur = cur->next;
    }
//     ...
}

透過比較 do_free 與 do_prev 函式內容，發現多打 & 符號

do_free 函式

if (chain.size > 1) {
        qnext = ((uintptr_t) &current->chain.next == (uintptr_t) &chain.head)
                    ? chain.head.next
                    : current->chain.next;
    }

do_prev 函式

if (chain.size > 1) {
        next = ((uintptr_t) chain.head.prev == (uintptr_t) &current->chain)
                   ? chain.head.next
                   : current->chain.next;
        current = next ? list_entry(next, queue_contex_t, chain) : NULL;
    }

因此在 do_free 函式中移除 & 即可

if (chain.size > 1) {
-        qnext = ((uintptr_t) &current->chain.next == (uintptr_t) &chain.head)
+        qnext = ((uintptr_t) current->chain.next == (uintptr_t) &chain.head)
                    ? chain.head.next
                    : current->chain.next;
    }

後來發現 github/lab0-c 已經修正該錯誤，因此只要 git 操作即可

git remote add lab0-c https://github.com/sysprog21/lab0-c.git
gir remote update
git merge lab0-c/master

錯誤二

使用指令測試單獨的檔案

valgrind ./qtest -f ./traces/trace-01-ops.cmd

valgrind 會回報以下問題：

# Test of insert_head and remove_head
l = []
l = [gerbil]
l = [bear gerbil]
l = [dolphin bear gerbil]
Removed dolphin from queue
l = [bear gerbil]
Removed bear from queue
l = [gerbil]
Removed gerbil from queue
l = []
==8952== 32 bytes in 1 blocks are still reachable in loss record 1 of 2
==8952==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==8952==    by 0x10CBCE: do_new (qtest.c:145)
==8952==    by 0x10DDFA: interpret_cmda (console.c:181)
==8952==    by 0x10E3AF: interpret_cmd (console.c:201)
==8952==    by 0x10E7B0: cmd_select (console.c:610)
==8952==    by 0x10F09C: run_console (console.c:705)
==8952==    by 0x10D1EC: main (qtest.c:1223)
==8952== 
==8952== 56 bytes in 1 blocks are still reachable in loss record 2 of 2
==8952==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==8952==    by 0x10F110: test_malloc (harness.c:133)
==8952==    by 0x10F515: q_new (queue.c:17)
==8952==    by 0x10CC07: do_new (qtest.c:149)
==8952==    by 0x10DDFA: interpret_cmda (console.c:181)
==8952==    by 0x10E3AF: interpret_cmd (console.c:201)
==8952==    by 0x10E7B0: cmd_select (console.c:610)
==8952==    by 0x10F09C: run_console (console.c:705)
==8952==    by 0x10D1EC: main (qtest.c:1223)
==8952==

後來發現 git commit c7d31497c5845bc0af308bf4601e7636c18f0153 修正此問題

command 的翻譯是「命令」，instruction 的翻譯是「指令」
jserv

在 qtest 提供新的命令 shuffle

參考其他巨集的實作，新增一個 shuffle 命令

ADD_COMMAND(new, "Create new queue", "");
ADD_COMMAND(free, "Delete queue", "");
+ ADD_COMMAND(shuffle, "Shuffle the current queue.", "");

由於在 git 設定中，不能變更 queue.h 與 list.h，因此實作只能在 q_test.c 進行

static bool do_shuffle(int argc, char *argv[])
{
    if (argc != 1) {
        report(1, "%s takes no arguments", argv[0]);
        return false;
    }

    if (!current) {
        report(3, "Warning: Try to operate null queue");
        return false;
    }
    struct list_head *q = current->q;
    if (!q)
        return false;
    if (list_empty(q) || list_is_singular(q))
        return true;

    const int n = q_size(q);
    for (int i = n; i > 1; i--) {
        int r = rand() % i;
        struct list_head *tmp = q->next;
        while (r--)
            tmp = tmp->next;
        list_move_tail(tmp, q);
    }
    return q_show(3);
}

研讀論文〈Dude, is my code constant time?〉

台北大學歐士田教授的假設檢定的介紹，閱讀之後對於論文的理解非常有幫助，使用了基本的統計學在驗證

H_{0}

論文摘要中，闡述一種資訊安全的攻擊方式：Timing attacks，一種 side-channel attacks (旁路攻擊)，例如函式驗證敏感資訊時，若是錯誤的資訊就提早回傳結果，導致攻擊者可以利用處理時間，來破解驗證機制。
- 論文在不知道硬體實作的情況下，透過統計方式檢驗是否程式碼為固定時間執行 (constant time)
論文介紹中，主要闡述其貢獻：TIMING LEAKAGE DETECTION
- 使用兩組資料測試，利用統計的方式去比較這兩者的執行時間
  - 第一組為固定的常態分佈的測試資料(固定的資料有固定的結果)
  - 第二組為隨機的資料
- 由於沒學過統計，這裡我理解為使用兩個統計結果，使用其標準差跟平均數去計算兩者統計結果差異，利用 Welcch's t-test 跟 two-tails test 計算出某個數值，以驗證或推翻虛無假說。

修正 Probably not constant time or wrong implementation

使用測試案例 trace-17-complexity.cmd 會獲得 Probably not constant time or wrong implementation，觀察其測試內容

# Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant
option simulation 1
it
ih
rh
rt
option simulation 0

根據作業說明提示：亂數＋論文閱讀

計算 approximate of df(degree of freedom)公式，參考 Welch (Unpooled Variance) t Tests and Confidence Intervals: Introduction 提及的公式：

$S_{1}$ 與
$S_{2}$ 表示變異數
$n$ 表示樣本數量

$d f = \frac{(\frac{S_{1}^{2}}{n_{1}} + \frac{S_{2}^{2}}{n_{2}})^{2}}{\frac{1}{n_{1} - 1} (\frac{S_{1}^{2}}{n_{1}})^{2} + \frac{1}{n_{2} - 1} (\frac{S_{2}^{2}}{n_{2}})^{2}}$

參考 RinHizakura 的筆記中的描述，在 fixture.c的report函式中

static bool report(void)
{
    double max_t = fabs(t_compute(t));
    double number_traces_max_t = t->n[0] + t->n[1];
    double max_tau = max_t / sqrt(number_traces_max_t);
//     ...
//     ...
    /* Definitely not constant time */
    if (max_t > t_threshold_bananas)
        return false;

    /* Probably not constant time. */
    if (max_t > t_threshold_moderate)
        return false;

    /* For the moment, maybe constant time. */
    return true;
}

max_t 表示 t value
利用 t value 查看在 df 產生的分佈中位置
計算區域下的積分面積，此為 p value
如果 p value 超過小於某個指定數值
$α$ ，即表示可以推翻
$H_{0}$ 並支持
$H_{1}$ ，在本作業中
- $H_{0}$ 表示 Probably Not Constant Time 的虛無假設
- $H_{1}$ 表示對立假設

新增以下程式碼：

double s1 = t->m2[0] / (t->n[0] - 1);
s1 = s1 * s1;
double s2 = t->m2[1] / (t->n[1] - 1);
s2 = s2 * s2;

double num = s1 / t->n[0] + s2 / t->n[1];
num = num * num;
double d1 = s1 / t->n[0];
d1 = (d1 * d1) / (t->n[0] - 1);
double d2 = s2 / t->n[1];
d2 = (d2 * d2) / (t->n[1] - 1);
double den = d1 + d2;
double df = num / den;

附註：

目前的程式碼計算僅僅算到 df。尚未有想法去計算積分下的數值