Try   HackMD

2024q1 Homework1 (lab0)

contributed by < yu-hisennn >

Reveiwed by yy214123

你的 q_delete_mid 以快慢指標的策略進行實作,為找出串列的中間節點:

快指標須走訪

n
慢指標須走訪
n2

總共
3n2

考慮到使用的資料結構為雙向環狀鍊結串列,可改以下方方式實作:







G



head

head



1

1



head->1






2

2



1->2






3

3



2->3






4

4



3->4






5

5



4->5






5->head






p1

*p1



p1->1





p2

*p2



p2->5











G



head

head



1

1



head->1






2

2



1->2






3

3



2->3






4

4



3->4






5

5



4->5






5->head






p1

*p1



p1->2





p2

*p2



p2->4











G



head

head



1

1



head->1






2

2



1->2






3

3



2->3






4

4



3->4






5

5



4->5






5->head






p1

*p1



p1->3





p2

*p2



p2->3





僅走訪

n 次即可找到串列之中間節點。

開發環境

$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

$ lscpu
架構:                   x86_64
CPU 作業模式:            32-bit, 64-bit
Address sizes:          39 bits physical, 48 bits virtual
Byte Order:             Little Endian
CPU(s):                 8
On-line CPU(s) list:    0-7
供應商識別號:             GenuineIntel
Model name:             Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
CPU 家族:               6
型號:                   60
每核心執行緒數:           2
每通訊端核心數:           4
Socket(s):              1
製程:                   3
CPU max MHz:            4400.0000
CPU min MHz:            800.0000
BogoMIPS:               7981.55

針對佇列的實作

無論標題和內文中,中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)

在實作過程中,發現有些程式碼已經在 list.h 檔中實作出來了,所以直接呼叫即可,這樣一來可以讓程式碼更精簡,可讀性也會有所提昇,像是︰

  • q_new
struct list_head *q_new()
{
    return NULL;
    struct list_head *head = malloc(sizeof(struct list_head));
    if (!head) {
        return NULL;  // Allocation failed
    }
-   head->prev = head->next = head;
+   INIT_LIST_HEAD(head);
    return head;
}

上述 diff 的內容,應該以 git diffdiff 命令產生,注意修改行數的位移量。

  • q_insert_head
bool q_insert_head(struct list_head *head, char *s)
{
    if (!head || !s) {
        return false;
    }
    element_t *new_elem = malloc(sizeof(element_t));
    if (!new_elem) {
        return false;  // Allocation failed
    }
    new_elem->value = strdup(s);  // Copy string
    if (!new_elem->value) {
        free(new_elem);
        return false;  // String copy failed
    }
-   new_elem->list.next = head->next;
-   new_elem->list.prev = head;
-   head->next->prev = &new_elem->list;
-   head->next = &new_elem->list;
+   list_add(&new_elem->list, head);
    return true;
}
  • q_insert_tail
bool q_insert_head(struct list_head *head, char *s)
{
    if (!head || !s) {
        return false;
    }
    element_t *new_elem = malloc(sizeof(element_t));
    if (!new_elem) {
        return false;  // Allocation failed
    }
    new_elem->value = strdup(s);  // Copy string
    if (!new_elem->value) {
        free(new_elem);
        return false;  // String copy failed
    }
-   new_elem->list.prev = head->prev;
-   new_elem->list.next = head;
-   head->prev->next = &new_elem->list;
-   head->prev = &new_elem->list;
+   list_add_tail(&new_elem->list, head);
    return true;
}

或是直接呼叫之前寫好的 q_insert_head

bool q_insert_tail(struct list_head *head, char *s) {
    if (!head || !s) {
        return false;
    }
-   if (!head || !s) {
-       return false;
-   }
-   element_t *new_elem = malloc(sizeof(element_t));
-   if (!new_elem) {
-       return false; // Allocation failed
-   }
-   new_elem->value = strdup(s); // Copy string
-   if (!new_elem->value) {
-       free(new_elem);
-       return false; // String copy failed
-   }
-   list_add_tail(&new_elem->list, head);
-   return true;
+   return q_insert_head(head->prev, s);
}

同理,q_remove_tail也可以呼叫之前寫好的 q_remove_head

element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize) {
    if (!head || head->prev == head) {
        return NULL; // Queue is empty or NULL
    }
-   struct list_head *last = head->prev;
-   element_t *elem = list_entry(last, element_t, list);
-   if (sp && bufsize > 0) {
-       strncpy(sp, elem->value, bufsize - 1);
-       sp[bufsize - 1] = '\0';
-   }
-   list_del(last);
-   q_release_element(elem);
-   return elem;
+   return q_remove_head(head->prev->prev, sp, bufsize);
}

q_insert_head

commit 7ab86a7

避免非必要的項目縮排 (即 * - ),以清晰、明確,且流暢的漢語書寫。

原本利用 strdup 的方式來實作,測試時發現會報出 ERROR: Probably not constant time or wrong implementation 的錯誤,於似乎就去確認 strdup 的實作方式,發現可能是因為多次呼叫需要跑遍整條字串長度的函式而造成的,於是就先把它改成自行宣告的作法。

第一次修改時,沒有使用一個變數來紀錄 strlen(n) 的回傳值,而是要用時才作呼叫,結果也會發生 ERROR: Probably not constant time or wrong implementation ,最後利用一個變數來紀錄避免多次呼叫 strlen(n) 來降低整體的時間複雜度。

bool q_insert_head(struct list_head *head, char *s)
{
    if (!head || !s) {
        return false;
    }
    element_t *new_elem = malloc(sizeof(element_t));
+   int s_len = strlen(s);

    if (!new_elem) {
        return false;  // Allocation failed
    }
-   new_elem->value = strdup(s);
+   new_elem->value = (char *) malloc((s_len + 1) * sizeof(char));
    if (!new_elem->value) {
        free(new_elem);
        return false;  // String copy failed
    }

+   strncpy(new_elem->value, s, (s_len + 1));
    list_add(&new_elem->list, head);
    return true;
}

q_delete_mid

  • 採取快慢指標的方式,來找到中間要刪掉的節點,其中迴圈終止時,slow 即為整條鏈結串列的中點。
struct list_head *slow = head->next, *fast = head->next->next;
while (fast != head && fast->next != head) {
    slow = slow->next;
    fast = fast->next->next;
}

commit 488930d

q_delete_dup

commit db89b47

先判斷目前的值與下一個的值是不是重複的,重複的話就將下一個值給刪除,並且用一個旗標來紀錄目前的值有出現重複的情形,最後再將其刪除即可。

bool found = false;
while (current != head && current->next != head) {
    element_t *current_entry = list_entry(current, element_t, list);
    element_t *next_entry = list_entry(current->next, element_t, list);

    if (strcmp(current_entry->value, next_entry->value) == 0) {
        list_del(current->next);
        q_release_element(next_entry);
        found = true;
    } else {
        current = current->next;
        if (found) {
            list_del(current->prev);
            q_release_element(current_entry);
            found = false;
        }
    }
}

q_swap

  • 利用兩個節點來暫存下一個及下下一個節點,按照順序來修改節點指向
struct list_head *current = head->next;
while (current != head && current->next != head) {
    struct list_head *next = current->next;
    struct list_head *nextNext = next->next;

    next->next = current;
    next->prev = current->prev;
    current->prev->next = next;
    current->next = nextNext;
    nextNext->prev = next;
    if (current->next != head) {
        current->next->prev = current;
    }

    current = nextNext;
}

commit 504c744

  • 改用 list_move 的方式來實作
while (current != head && current->next != head) {
-   struct list_head *next = current->next;
-   struct list_head *nextNext = next->next;
-   next->next = current;
-   next->prev = current->prev;
-   current->prev->next = next;
-   current->next = nextNext;
-   nextNext->prev = next;
-   if (current->next != head) {
-       current->next->prev = current;
-   current = nextNext;
+   struct list_head *prev = current->prev;
+   list_move(current->next, prev);
+   current = current->next;
}

commit 47abac9

q_reverseq_reverseK

commit a30283b

避免濫用詞彙,此「核心」非彼「核心」(還記得課名嗎?),可改說「二者實作手法相似,唯 ___ 不同」

這兩個實作核心概念差不多,都是利用到巨集裡的 list_move ,利用一個指標定在第一個值上(最後會變成最後一個值),把它後面的都移到head/start後面即可達成反轉的效果

struct list_head *start = head, *current = NULL;
while (times--) {
    int count = k;
    for (current = start->next; --count > 0;)
        list_move(current->next, start);
    start = current;
}

q_ascend 及 q_descend

這兩個實作手法也是大同小異,只差在裡面的判斷是 >= 或是 <= ,這邊利用一個變數 result 來儲存佇列沒有被變動幾次,最後直接回傳即可(不需要在呼叫 q_size )

element_t *min_entry = list_entry(head->prev, element_t, list);
int result = 1;

for (struct list_head *current = head->prev->prev; current != head;) {
    element_t *current_entry = list_entry(current, element_t, list);
    struct list_head *prev = current->prev;
    if (strcmp(current_entry->value, min_entry->value) >= 0) {
        list_del_init(current);
        q_release_element(current_entry);
    } else {
        result++;
        min_entry = current_entry;
    }
    current = prev;
}

commit 32571b7

將中心的實作內容拉出來成一個函式,呼叫時再把要遞增或是遞減傳入,減少兩個函式重複的程式碼

int process_list(struct list_head *head, bool ascend)
{
    // 雙方重複的程式碼
    bool condition =
            ascend ? (strcmp(current_entry->value, extreme_entry->value) >= 0)
                   : (strcmp(current_entry->value, extreme_entry->value) <= 0);
}
int q_ascend(struct list_head *head)
{
    return process_list(head, true);
}

int q_descend(struct list_head *head)
{
    return process_list(head, false);
}

commit f7a6c27

q_sort

這邊的實作是利用 merge sort 的方式去進行,一開始先利用快慢指標的方式去找到中點,找到中點後切一刀來分成左半邊及右半邊,在利用 merge 函式來完成合併的動作。

list_move 來串接節點,當有一邊串列為空且另一邊不為空時,再把不為空的串列接回 current 後方。

避免非必要的項目縮排 (即 * - ),以清晰、明確,且流暢的漢語書寫。

void merge(struct list_head *head,
           struct list_head *left,
           struct list_head *right,
           bool descend)
{
    struct list_head *current = head;
    while (!list_empty(left) && !list_empty(right)) {
        element_t *left_entry = list_entry(left->next, element_t, list);
        element_t *right_entry = list_entry(right->next, element_t, list);

        if (descend ? strcmp(left_entry->value, right_entry->value) >= 0
                    : strcmp(left_entry->value, right_entry->value) <= 0) {
            list_move(left->next, current);
        } else {
            list_move(right->next, current);
        }
        current = current->next;
    }
    struct list_head *remaining = list_empty(left) ? right : left;
    list_splice_tail(remaining, current->next);
}

/* Sort elements of queue in ascending/descending order */
void q_sort(struct list_head *head, bool descend)
{
    if (!head || list_empty(head) || head->next->next == head) {
        return;
    }

    struct list_head *slow = head->next, *fast = head->next->next;
    while (fast != head && fast->next != head) {
        slow = slow->next;
        fast = fast->next->next;
    }

    LIST_HEAD(left);   // left half
    LIST_HEAD(right);  // right half
    list_cut_position(&left, head, slow);
    list_splice_init(head, &right);

    q_sort(&left, descend);
    q_sort(&right, descend);

    merge(head, &left, &right, descend);
}

commit 23062f2

q_merge

不理解就說不理解,不要說「不太理解」,理工人說話要精準。
你若有疑慮,就嘗試提交修改,從而確認對整個專案的衝擊。

了解 descend 參數的用意,因為已經拿到 k 個 sorted 的佇列了,合併時還有需要管是不是 descend 嗎?還是說有可能給的 sorted 佇列跟 descend 兩邊是不一樣的排序方式
先將所有的內佇列串起來,再執行 q_sort 來進行排序。

避免非必要的項目縮排 (即 * - ),以清晰、明確,且流暢的漢語書寫。

queue_contex_t *target = list_entry(head->next, queue_contex_t, chain);
queue_contex_t *current = NULL;
list_for_each_entry (current, head, chain) {
    if (current == target)
        continue;
    list_splice_init(current->q, target->q);
    target->size = target->size + current->size;
    current->size = 0;
}
q_sort(target->q, descend);

commit 6d8e3db

實作疑問

詳見作業說明。

了解!

目前分數還是卡在 95/100 ,因 q_insert_headq_insert_tail ,有機會變成 ERROR: Probably not constant time

新增 percentile

在原先的 lab0-c 中,percentile 被拔掉了,於是先研讀論文看 percentile 會對測量造成什影響,發現到

  • percentile 是來避免不正確的外部資料來拖長整體的時間,造成時間分析上會不準確
  • 利用給定的閥值去篩選哪些資料要留下來,哪些資料要捨去
  • 圖為
    1(12)10(i+1)NPERCENTILES
    ,也就是選取的值經過排序後,範圍要在裡面才會被採用
  • img
  • 參照 dudect ,以及觀摩 willwillhi 的作法
static int cmp(const int64_t *a, const int64_t *b)
{
    return (int) (*a - *b);
}

static int64_t percentile(int64_t *a, double which, size_t size)
{
    size_t array_position = (size_t) ((double) size * (double) which);
    assert(array_position >= 0);
    assert(array_position < size);
    return a[array_position];
}

void prepare_percentiles(int64_t *exec_times, int64_t *percentiles)
{
    qsort(exec_times, N_MEASURES, sizeof(int64_t),
          (int (*)(const void *, const void *)) cmp);
    for (size_t i = 0; i < N_PERCENTILES; i++) {
        percentiles[i] = percentile(
            exec_times, 1 - (pow(0.5, 10 * (double) (i + 1) / N_PERCENTILES)),
            N_MEASURES);
    }
}

commit 4f98446

分數紀錄

kirby

  • 利用 API 重構

commit 6d3cad3

無論標題和內文中,中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)


研讀論文 <Dude, is my code constant time?>

作者開發了一套工具 dudect ,使其可以檢驗程式是否是常數時間複雜度。

方法

  • 步驟一: 在兩個不同類別的資料上,個別測量其執行時間
    • 類別定義: 利用 fix-vs-random 來做測試。其中,第一種類別是常數,第二種則為隨機選取
    • 迴圈計數器: 用來精準地測量執行時間
    • 環境變數: 去最小化對於環境改變,造成的結果的影響
  • 步驟二: 後處理
    • 裁減: 由於系統或其他外部活動可能會造成測量的偏差,在這種情況下,需要丟棄那些與類別無關或是超過的數據
    • 典型時間分布是正偏態(positively skew),示意圖如下
    • img
  • 步驟三: 統計測試
    • t-test: 利用 Welch's t-test. 來檢驗平均值的等價性
    • 非參數測試: 利用 Kolmogorov-Smirnov 等方式來測試

實作

  • 利用
    p
    當作閥值,丟棄掉那些大於百分位數
    p
    的數值
  • 使用 Welch's t-testWelford's 的變異數去改善數值的穩定性

Student's t-distribution

圖片

定義

是一種機率分佈的類型,與常態分佈相似,常應用在估計樣本少或是不知道變異數的情況下

與常態分佈比較

  • 兩者都屬於對稱分佈
  • 相較於常態分佈,前端以及尾端有較多分佈機率
  • 常態分佈會被標準差以及變異數兩個所影響,T-分佈則是被 degree of freedom(df)所控制

為何此篇論文使用t-distribution?

  • 在測試的資料集中,樣本數量,故不適用需要龐大樣本數量的常態分佈,T-分佈會較合適

指出論文和程式碼實作出入之處。


引入 lib/list_sort.c

流程

  • 新增 list_sort.clist_sort.h
  • 修改 #include ,以及 list_sort.h 的內容
- #include <linux/kernel.h>
- #include <linux/bug.h>
- #include <linux/compiler.h>
- #include <linux/export.h>
- #include <linux/string.h>
- #include <linux/list.h>
+ #include "queue.h"
+ #include "list_sort.h"
  • list_sort.o 加入至 MakeFileOBJS
  • qtest.c 中,新增 do_sort_list 的函式以及指令
  • 打 make -> help 確認有沒有出現,即是否正確執行
cmd> help
Commands:
  #           ...          | Display comment
  ascend                   | Remove every node which has a node with a strictly less value anywhere to the right side of it
  dedup                    | Delete all nodes that have duplicate string
  descend                  | Remove every node which has a node with a strictly greater value anywhere to the right side of it
  dm                       | Delete middle node in queue
  free                     | Delete queue
  help                     | Show summary
  ih          str [n]      | Insert string str at head of queue n times. Generate random string(s) if str equals RAND. (default: n == 1)
  it          str [n]      | Insert string str at tail of queue n times. Generate random string(s) if str equals RAND. (default: n == 1)
  list_sort                | Sort queue in ascending/descening order by linux list sort
  log         file         | Copy output to file
  merge                    | Merge all the queues into one sorted queue
  new                      | Create new queue
  next                     | Switch to next queue
  option      [name val]   | Display or set options. See 'Options' section for details
  prev                     | Switch to previous queue
  quit                     | Exit program
  reverse                  | Reverse queue
  reverseK    [K]          | Reverse the nodes of the queue 'K' at a time
  rh          [str]        | Remove from head of queue. Optionally compare to expected value str
  rt          [str]        | Remove from tail of queue. Optionally compare to expected value str
  show                     | Show queue contents
  size        [n]          | Compute queue size n times (default: n == 1)
  sort                     | Sort queue in ascending/descening order
  source                   | Read commands from source file
  swap                     | Swap every two adjacent nodes in queue
  time        cmd arg ...  | Time command execution
  web         [port]       | Read commands from builtin web server
Options:
  descend     0            | Sort and merge queue in ascending/descending order
  echo        1            | Do/don't echo commands
  entropy     0            | Show/Hide Shannon entropy
  error       5            | Number of errors until exit
  fail        30           | Number of times allow queue operations to return false
  length      1024         | Maximum length of displayed string
  malloc      0            | Malloc failure probability percent
  simulation  0            | Start/Stop simulation mode
  verbose     4            | Verbosity level

commit d16586c

比較

為防止只使用一次比較可能會有不公平的情形發生,我採用每個大小個測3次來取其平均。

# script for compare q_sort and list_sort performance.
new
it RAND 100000
time
sort # list_sort
time
free

q_sort

使用 Metric prefix,而非漢語的「萬」。

資料數 round 1 round 2 round 3 平均
106
0.067 0.067 0.068 0.0673
2106
0.151 0.143 0.149 0.1476
3106
0.239 0.233 0.240 0.2373
4106
0.331 0.334 0.334 0.333
5106
0.425 0.432 0.428 0.4283

list_sort

資料數 round 1 round 2 round 3 平均
106
0.059 0.061 0.063 0.061
2106
0.136 0.127 0.130 0.131
3106
0.219 0.207 0.223 0.2163
4106
0.297 0.296 0.311 0.3013
5106
0.385 0.382 0.384 0.3836

image

可以發現資料數量 100000 ~ 500000 時間差從 0.0063 -> 0.0447

提供更多解讀,闡述你的啟發。


在 qtest 提供新的命令 shuffle

此函式用於打亂串列的順序
利用 Fisher-Yates shuffle 演算法來實作

struct list_head *last = head->prev;
int size = q_size(head);
while (size-- > 1) {
    int index = rand() % size;
    struct list_head *current = head->next, *temp;
    while (index--) {
        current = current->next;
    }
    temp = last->prev;
    list_move(last, current->prev);
    if (temp != current)
        list_move(current, temp);
    last = current->prev;
}

流程如下:

  1. 找到整條串列長度,並設為 size
  2. 隨機挑選範圍 0 ~ size - 1 的節點與 last 作交換
  3. size 數減 1,並將 last 更新為 current 的前一位
  4. 重複 2. ~ 3. ,直到 size == 1 時,即完成亂序

圖示(為了圖片簡潔,這邊不畫最後一個節點連到 head 的線):







q_shuffle
Origin


h0

prev

head

next



p1

prev

1

next



h0->p1





p1->h0





p2

prev

2

next



p1->p2





p2->p1





p3

prev

3(last)

next



p2->p3





p3->p2





Round 1:

  • Random Index 0 ~ 2, ex: Index = 0






q_shuffle
Round 1 (first move)


h0

prev

head

next



p3

prev

3(last)

next



h0->p3





p3->h0





p1

prev

1(current)

next



p3->p1





p1->p3





p2

prev

2(temp)

next



p1->p2





p2->p1











q_shuffle
Round 1 (Second move)


h0

prev

head

next



p3

prev

3(last)

next



h0->p3





p3->h0





p2

prev

2(temp)

next



p3->p2





p2->p3





p1

prev

1(current)

next



p2->p1





p1->p2











q_shuffle
Round 1 (Update last node)


h0

prev

head

next



p3

prev

3

next



h0->p3





p3->h0





p2

prev

2(last)

next



p3->p2





p2->p3





p1

prev

1

next



p2->p1





p1->p2





Round 2:

  • Random Index 0 ~ 1, ex: Index = 0






q_shuffle
Round 2 (first move)


h0

prev

head

next



p2

prev

2(last)

next



h0->p2





p2->h0





p3

prev

3(current, temp)

next



p2->p3





p3->p2





p1

prev

1

next



p3->p1





p1->p3





  • current == temp






q_shuffle
Round 2 (Update last node)


h0

prev

head

next



p2

prev

2(last)

next



h0->p2





p2->h0





p3

prev

3

next



p2->p3





p3->p2





p1

prev

1

next



p3->p1





p1->p3





size == 1, 函式結束

測試隨機度

採用 python script 來對此隨機函式作測試,總共執行 1000000
統計如下:

順序 觀察到的頻率 期待的頻率
(OiEi)2Ei
123 166653 166666 0.00101
132 166775 166666 0.07128
213 167066 166666 0.96000
231 166651 166666 0.00135
312 166440 166666 0.30645
321 166415 166666 0.37800
Total 0.71812
  • X2
    = 1.7181188724754899

亂序 1234 的如圖所示

shuffle_res