2024q1 Homework1 (lab0)

contributed by < yuyuan0625 >

Reviewed by `56han`

q_delete_mid 針對環狀且雙向的佇列，是否有更快的方法？

考慮雙向環狀的條件可以發現，用兩個指標從頭和尾向對方移動，
只需要存取

n

個節點就好，然而快慢指標需要存取

\frac{3}{2} n

個節點。

關於快慢指標需要存取
$\frac{3}{2} n$ 個節點
範例：假設有一個由 5 個節點組成的鏈結串列。對於 fast 指針：
在第一次迭代中，fast 指向第 3 個節點（直接存取第 1 個和第 3 個節點，間接存取第 2 個節點）。
在第二次迭代中，由於每次移動都是兩步，fast 指針會嘗試指向第 5 個節點（直接存取第 3 個和第 5 個節點，間接存取第4個節點）。
在這個過程中，第 1 個和第 3 個節點被直接走訪了 2 次（一次為 fast 指向的節點，另一次為 fast->next），而第 2 個、第 4 個和第 5 個節點至少被存取了一次。

$\to$ 不該只看開發紀錄，審查學員的 git commits 了嗎？

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

jserv

Reviewed by `164253`

q_remove_head 和 q_remove_tail 有許多重複部分

q_merge 中，將每個佇列都和第一個合併
會造成最差複雜度達到

\sum_{i = 1}^{n} i \log_{2} i \approx n^{2} \log_{2} n

開發環境

$ gcc -v
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  12
  On-line CPU(s) list:   0-11
Vendor ID:               GenuineIntel
  Model name:            12th Gen Intel(R) Core(TM) i5-12400
    CPU family:          6
    Model:               151
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           1
    Stepping:            2
    CPU max MHz:         4400.0000
    CPU min MHz:         800.0000
    BogoMIPS:            4992.00

指定的佇列操作

`q_free`

commit db3a900

發現程式無法通過 trace-13 (Test of malloc failure on new)，最後發現是 q_free 漏掉了檢查 head 是否存在就直接對 head 進行 element_t 的相關操作。

修改程式碼如下：

@@ -25,6 +25,8 @@ struct list_head *q_new()
 /* Free all storage used by queue */
 void q_free(struct list_head *head)
 {
+    if (!head)
+        return;
     element_t *current, *safe;
     list_for_each_entry_safe (current, safe, head, list) {
         q_release_element(current);

`q_insert_head`

利用 strdup(s) 將字串複製到 element_t 的 value ，並使用 list_add 將新增的節點加到鏈結串列的前端。

/* Insert an element at head of queue */
bool q_insert_head(struct list_head *head, char *s)
{
    if (!head)
        return false;

    element_t *new = malloc(sizeof(element_t));
    if (!new)
        return false;

    new->value = strdup(s);
    if (!new->value) {
        free(new);
        return false;
    }

    list_add(&new->list, head);
    return true;
}

`q_insert_tail`

原本是將 q_insert_head 的程式碼的 list_add 修改成 list_add_tail。後續想到可以透過更改傳入的指標就可以將整個 q_insert_head 程式碼重用，維持程式的簡潔。

@@ -61,18 +60,7 @@ bool q_insert_tail(struct list_head *head, char *s)
     if (!head)
         return false;
 
-    element_t *new = malloc(sizeof(element_t));
-    if (!new)
-        return false;
-
-    new->value = strdup(s);
-    if (!new->value) {
-        free(new);
-        return false;
-    }
-
-    // INIT_LIST_HEAD(&new->list);
-    list_add_tail(&new->list, head);
+    q_insert_head(head->prev, s);
     return true;
 }

`q_remove_*`

先前漏掉了以下條件

@sp: string would be inserted
@bufsize: size of the string
If sp is non-NULL and an element is removed, copy the removed string to *sp (up to a maximum of bufsize-1 characters, plus a null terminator.)

因此我對兩個 remove 函式進行進行以下變更：
只複製到 bufsize - 1 的資料

commit cde6823

element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
     element_t *removed_element = list_first_entry(head, element_t, list);
{
    if (list_empty(head))
        return NULL;
     list_del(&removed_element->list);
     if (sp != NULL && bufsize > 0) {
-        int len = strlen(removed_element->value);
-        strncpy(sp, removed_element->value, len);
-        sp[len] = '\0';
+        strncpy(sp, removed_element->value, bufsize - 1);
+        sp[bufsize - 1] = '\0';
     }
     return removed_element;
 }

`q_delete_mid`

利用快慢指標尋找鏈結串列的中點，並使用 q_release_elemen 將其 entry 刪除。

bool q_delete_mid(struct list_head *head)
{
    // https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/
    if (head == NULL || list_empty(head)) {
        return false;
    }

    struct list_head **indir = &head, *fast;
    for (fast = head->next; fast != head && fast->next != head;
         fast = fast->next->next) {
        indir = &(*indir)->next;
    }
    indir = &(*indir)->next;
    element_t *mid_element = list_entry(*indir, element_t, list);
    list_del(*indir);
    q_release_element(mid_element);
    return true;
}

`q_delete_dup`

利用 list_for_each_entry_safe 比較兩相鄰節點的 value 值，若相同則刪除當前節點，並將 is_dup = true ，到下一步時就會將該節點也刪除，並重製is_dup = false ，直到走訪完所有節點。

bool q_delete_dup(struct list_head *head)
{
    // https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/
    if (!head)
        return false;

    element_t *curr, *safe;
    bool is_dup = false;
    list_for_each_entry_safe (curr, safe, head, list) {
        if (&safe->list != head && !strcmp(curr->value, safe->value)) {
            list_del(&curr->list);
            q_release_element(curr);
            is_dup = true;
        }

        else if (is_dup) {
            list_del(&curr->list);
            q_release_element(curr);
            is_dup = false;
        }
    }
    return true;
}

`q_reverse`

實做方法：利用 list_for_each_safe 走訪每個節點，並使用 list_move 將節點移至鏈結串列的前端，最後就可以獲得和原本順序相反的鏈結串列。

void q_reverse(struct list_head *head)
{
    if (!head || list_empty(head))
        return;

    struct list_head *curr, *safe;
    list_for_each_safe (curr, safe, head)
        list_move(curr, head);
}

`q_reverseK`

將鏈結串列分割為長度為 k 的段落，並利用 q_reverse() 將此段落反轉，接著再合併回原始的鏈結串列中，重複此操作直到所剩的鏈結串列長度小於 k 。

原先我忽略 curr 已被更改過，最後還使用 tmp_head = curr 紀錄鏈結串列的切割點，導致存取到錯誤的位置。卡了很久以後看到 komark06 的實作和精美的圖例才發現要改用 safe->prev 才能指向正確的位置。
這次的經驗也讓我後續在更改鏈結串列後會更小心的檢查是否有誤用到已更改的指標，後續也會利用時間學習使用 Graphviz

void q_reverseK(struct list_head *head, int k)
{
    // https://leetcode.com/problems/reverse-nodes-in-k-group/
    if (!head || list_empty(head))
        return;

    struct list_head *curr, *safe, *tmp_head = head;
    LIST_HEAD(splited_list_head);
    int len = 0;
    list_for_each_safe (curr, safe, head) {
        len++;
        if (len == k) {
            list_cut_position(&splited_list_head, tmp_head, curr);
            q_reverse(&splited_list_head);
            list_splice(&splited_list_head, tmp_head);
            len = 0;
-           tmp_head = curr;
+           tmp_head = safe->prev;
        }
    }
}

使用 git diff 或 diff 命令來產生程式碼之間的變更，不要憑感覺填入，留意位移量。

好的已修正！

`q_ascend`

一開始的想法是從鏈結串列前端往後檢查，但後來發現從尾端往前檢查會更有效率，可以單純用兩兩 element_t 的 value 做比較就好，若不符合就刪除。

int q_ascend(struct list_head *head)
{
    // https://leetcode.com/problems/remove-nodes-from-linked-list/
    if (!head || list_empty(head))
        return 0;

    int len = 1;
    element_t *cur = list_last_entry(head, element_t, list);
    while (cur->list.prev != head) {
        element_t *prev = list_entry(cur->list.prev, element_t, list);
        if (strcmp(prev->value, cur->value) > 0) {
            list_del(&prev->list);
            q_release_element(prev);
        } else {
            len++;
            cur = prev;
        }
    }
    return len;
}

`q_descend`

只要將q_ascend中的判斷式由 > 改成 < 即可

- if (strcmp(prev->value, cur->value) > 0)
+ if (strcmp(prev->value, cur->value) < 0)

`q_swap`

q_swap 即為鏈結串列中兩兩一組進行 reverse，即為reverseK, k=2 的情況，因此可以重用q_reverseK

void q_swap(struct list_head *head)
{
    // https://leetcode.com/problems/swap-nodes-in-pairs/
    q_reverseK(head, 2);
}

`q_sort`

q_sort
參考你所不知道的 C 語言: linked list 和非連續記憶體的快慢指標，找出鏈結串列的中點，將未排序的佇列分為左、右兩個子佇列，然後遞歸地對這兩個子佇列進行排序，最後將這兩個有序的子佇列合併成一個有序的佇列。

void q_sort(struct list_head *head, bool descend)
{
    if (!head || list_empty(head) || list_is_singular(head))
        return;

    struct list_head **indir = &head, *fast;
    for (fast = head->next; fast != head && fast->next != head;
         fast = fast->next->next) {
        indir = &(*indir)->next;
    }

    LIST_HEAD(tmp);
    list_cut_position(&tmp, *indir, head->prev);
    q_sort(head, descend);
    q_sort(&tmp, descend);
    mergeTwoList(head, &tmp, descend);
}

mergeTwoList
實作方法：利用 cmp 紀錄 L1 第一個節點 E1 和 L2 第一個節點 E2 的大小關係（大於零表示 E1 較大，反之亦然），若 descend＝true 就將 cmp 的值變號。

假設descend ＝ false，若 cmp > 0 表示 E1 較大，因此將 E2 加入暫時的鏈結串列中，並將 count++。一直重複此比較動作直到 L1 或 L2 其中一個鏈結串列長度為 0 ，並將暫存的鏈結串列移回 L1 的前端，接著再將 L2 移至 L1 的前端，如此一來不論是 L1 或 L2 還有剩餘節點都可以保持正確排序。

int mergeTwoList(struct list_head *L1, struct list_head *L2, bool descend)
{
    if (!L1 || !L2)
        return 0;

    LIST_HEAD(head);
    int count = 0;
    while (!list_empty(L1) && !list_empty(L2)) {
        element_t *E1 = list_first_entry(L1, element_t, list);
        element_t *E2 = list_first_entry(L2, element_t, list);
        int cmp = strcmp(E1->value, E2->value);
        if (descend)
            cmp = -cmp;
        if (cmp > 0) {
            list_move_tail(&E2->list, &head);
        } else
            list_move_tail(&E1->list, &head);
        count++;
    }
    count += q_size(L1) + q_size(L2);
    list_splice(&head, L1);
    list_splice_tail_init(L2, L1);
    return count;
}

`q_merge`

透過走訪 chain 將每個佇列都串到第一個佇列，同時更新 size。在連接完成後，對合併後的佇列使用 q_sort 函數進行排序。

int q_merge(struct list_head *head, bool descend)
{
    if (!head || list_empty(head))
        return 0;

    if (list_is_singular(head))
        return list_first_entry(head, queue_contex_t, chain)->size;

    queue_contex_t *first = list_first_entry(head, queue_contex_t, chain);

    for (struct list_head *cur = head->next->next; cur != head;
         cur = cur->next) {
        queue_contex_t *que = container_of(cur, queue_contex_t, chain);
        list_splice_init(que->q, first->q);
        que->size = 0;
    }
    q_sort(first->q, descend);
    first->size = q_size(first->q);
    return first->size;
}

Valgrind 與 Address Sanitizer 記憶體檢查

使用Valgrind

執行 make valgrind，結果顯示無記憶體問題

執行 massif 分析:

Heap blocks
Heap administration blocks
Stack sizes

並獲得 massif.out.<pid> 檔案

$ valgrind --tool=massif ./qtest -f <trace file>

使用 massif-visualizer 將數據圖形化

$ massif-visualizer massif.out.<pid>

研讀論文〈Dude, is my code constant time?〉

論文內容

simulation 解釋

在 qtest.c 的 queue_insert 和 queue_remove 函式中會判斷 simulation 是否等於 1 ，若為 1 ， queue_insert 會使用is_insert_tail_const() 、is_insert_head_const() ; queue_remove 會使用 is_remove_tail_const() 、 is_remove_head_const() 來檢查函式是否能在常數時間內執行完畢。
這四個函式都是由 fixture.h 中 is_##op##_const 定義的。

fixture.h

#ifndef DUDECT_FIXTURE_H
#define DUDECT_FIXTURE_H
#define DUDECT_NUMBER_PERCENTILES  (100)

#include <stdbool.h>
#include "constant.h"

/* Interface to test if function is constant */
#define _(x) bool is_##x##_const(void);
DUT_FUNCS
#undef _

#endif

再進一步看 fixture.c ，發現 is_##op##_const 會回傳 test_const ，而test_const 最後會呼叫 doit 函式。

fixture.c

static bool test_const(char *text, int mode)
{
    bool result = false;
    t = malloc(sizeof(t_context_t));

    for (int cnt = 0; cnt < TEST_TRIES; ++cnt) {
        printf("Testing %s...(%d/%d)\n\n", text, cnt, TEST_TRIES);
        init_once();
        for (int i = 0; i < ENOUGH_MEASURE / (N_MEASURES - DROP_SIZE * 2) + 1;
             ++i)
            result = doit(mode);
        printf("\033[A\033[2K\033[A\033[2K");
        if (result)
            break;
    }
    free(t);
    return result;
}

#define DUT_FUNC_IMPL(op) \
    bool is_##op##_const(void) { return test_const(#op, DUT(op)); }

比對 dudect.h 與 fixture.c ，可以發現 dudect_main 對應於 doit 。在 dudect 中作者將第一批次的測試丟棄，並執行 prepare_percentiles 作為暖身，而doit 中沒有判斷是否為第一筆。另外論文作者在 update_statistics 中，從第十筆後才開始進行統計，故更改程式碼。

update_statistics

-    for (size_t i = 0; i < N_MEASURES; i++) {
+    for (size_t i = 10; i < N_MEASURES; i++) {

doit

static bool doit(int mode)
 {
     int64_t *before_ticks = calloc(N_MEASURES + 1, sizeof(int64_t));
@@ -133,8 +150,19 @@ static bool doit(int mode)
 
     bool ret = measure(before_ticks, after_ticks, input_data, mode);
     differentiate(exec_times, before_ticks, after_ticks);
-    update_statistics(exec_times, classes);
-    ret &= report();
+   int64_t *percentiles = (int64_t*) calloc(DUDECT_NUMBER_PERCENTILES, sizeof(int64_t));
+   bool first_time = percentiles[DUDECT_NUMBER_PERCENTILES - 1] == 0;
+   if (first_time) {
+       // throw away the first batch of measurements.
+       // this helps warming things up.
+       prepare_percentiles(exec_times, percentiles);
+   } else {
+       update_statistics(exec_times, classes);
+       ret &= report();
+   }

後續加入 fixture.c 中缺少的一些函式才能正確執行

+static int cmp(const int64_t *a, const int64_t *b) { return (int)(*a - *b); }
+
+static int64_t percentile(int64_t *a_sorted, double which, size_t size) {
+  size_t array_position = (size_t)((double)size * (double)which);
+  assert(array_position < size);
+  return a_sorted[array_position];
+}
+
+static void prepare_percentiles(int64_t *exec_times, int64_t *percentiles) {
+  qsort(exec_times, N_MEASURES, sizeof(int64_t), (int (*)(const void *, const void *))cmp);
+  for (size_t i = 0; i < DUDECT_NUMBER_PERCENTILES; i++) {
+    percentiles[i] = percentile(
+        exec_times, 1 - (pow(0.5, 10 * (double)(i + 1) / DUDECT_NUMBER_PERCENTILES)),
+        N_MEASURES);
+  }
+}
+

fixture.h

+    #define DUDECT_NUMBER_PERCENTILES  (100)

更改後即可通過 trace-17-complexity 檢測。

研讀 Linux `list_sort.c` 並引入專案

將 list_sort.c 和 list_sort.h 加入專案。並且刪除不必要的 include 。

將 list_sort.c 中的 u8 改成 uint8_t，而 uint8_t 是由 stdint.h 提供的。因此在 list_sort.h 中加入 #include <stdint.h> 。

list_sort.h

+    #include "stdint.h"

list_sort.c
利用兩次NOT op來確保值一定是 0 或 1 ，因為 if 內邏輯敘述的值可以是 0 或是非 0 整數，所以如果不做 !!(x) 就無法確保值一定是 0 或 1

-    u8 count = 0;
+    uint8_t count = 0;

接著編譯後發現 likely 和 unlikely 未被定義，參考 [Linux Kernel慢慢學]likely and unlikely macro

在Linux kernel 2.6之後提供了likely, unlikely 這兩個macro，他們被定義在[/include/linux/compiler.h]。list_sort.h(https://github.com/torvalds/linux/blob/f6cef5f8c37f58a3bc95b3754c3ae98e086631ca/include/linux/compiler.h#L19)中，幫助compiler做最佳化。

將 likely, unlikely 這兩個巨集的定義加入 list_sort.h

+    #define likely(x) __builtin_expect(!!(x), 1)
+    #define unlikely(x) __builtin_expect(!!(x), 0)

另外，移除 list_sort.c 的最後一行

-    EXPORT_SYMBOL(list_sort);

最後，因為 list_sort 需要傳入比較函式 cmp ，因此利用字串比較的方式在 qtest.c 中實作。

參考 chiangkd同學：研讀 Linux 核心原始程式碼的 list_sort.c

新增 shuffle 指令
在 qtest.c 中加入

do_lsort()
console_init() 中新增命令

ADD_COMMAND(lsort, "Sort queue in ascending order provided by linux kernel","");

效能測試

參考 chiangk 的比較實驗，利用 perf 分析執行時間

perf stat --repeat 5 -e instructions,cycles ./qtest -f traces/trace-sort.cmd

trace-sort.cmd

option fail 0
option malloc 0
new
ih RAND 500000
time
sort
time

本專案實作的 q_sort:

Instructions	Cycles
2,469,368,053	2,577,816,050

linux 核心的 list_sort:

Instructions	Cycles
2,338,003,862	2,508,507,272

可以發現 linux 核心的排序演算法比我自行實作的更有效率。

實作 `shuffle` 指令

實作 Fisher-Yates shuffle Algorithm

首先，使用 q_size 函式獲取佇列的大小 len。
從範圍 0 到 (len - 1) 中隨機選擇一個索引 random 。然後，將 old 指向從佇列前端開始數來第 random 個節點，將 new 指向尚未被選取的最後一個節點。將 old 和 new 指向的節點的值交換，然後將 len 減 1。
隨著 len 減小，已經被選取並交換值到佇列後面的節點會逐漸增多，直到所有節點都已經被選取過，隨機洗牌操作結束。

演算法範例

Fisher–Yates shuffle

實作

commit 3d1fd05

因為無法更改 queue.h 的內容，因此需要另外新增 shuffle.c，另外要加入註解 // cppcheck-suppress nullPointer 才能忽略警告。

element_t *oldnode =
            list_entry(old, element_t, list);  // cppcheck-suppress nullPointer
element_t *newnode =
            list_entry(new, element_t, list);  // cppcheck-suppress nullPointer

新增 shuffle 指令
在 qtest.c 加入

do_shuffle()
console_init() 中新增命令

ADD_COMMAND(shuffle, "Shuffle queue", "");

結果呈現

cmd> new
l = []
cmd> it RAND 10
l = [qgddxcgbz nbwne rnbxkhrj laggviiki btposwd lewpbxnu uqqre volpds bnqaqk emtjpwfzs]
cmd> sort
l = [bnqaqk btposwd emtjpwfzs laggviiki lewpbxnu nbwne qgddxcgbz rnbxkhrj uqqre volpds]
cmd> shuffle
l = [emtjpwfzs nbwne uqqre bnqaqk qgddxcgbz rnbxkhrj laggviiki lewpbxnu volpds btposwd]
cmd> sort
l = [bnqaqk btposwd emtjpwfzs laggviiki lewpbxnu nbwne qgddxcgbz rnbxkhrj uqqre volpds]

統計方法驗證

使用 lab0-d 提供的程式碼測試亂度

Expectation:  41666
Observation:  {'1234': 41670, '1243': 41248, '1324': 41440, '1342': 41939, '1423': 41832, '1432': 41197, '2134': 41628, '2143': 41902, '2314': 41715, '2341': 41731, '2413': 41998, '2431': 41715, '3124': 41594, '3142': 41703, '3214': 41738, '3241': 41592, '3412': 41683, '3421': 41692, '4123': 41371, '4132': 41466, '4213': 42206, '4231': 41722, '4312': 41529, '4321': 41689}
chi square sum:  28.404214467431473

圖表顯示 shuffle 的結果大致符合 uniform distribution

2024q1 Homework1 (lab0)

Reviewed by 56han

Reviewed by 164253

開發環境

指定的佇列操作

q_free

q_insert_head

q_insert_tail

q_remove_*

q_delete_mid

q_delete_dup

q_reverse

q_reverseK

q_ascend

q_descend

q_swap

q_sort

q_merge