Try   HackMD

2024q1 Homework1 (lab0)

contributed by < TSAICHUNGHAN >

Reviewed by jimmy01240397

  1. 善用巨集減少重複的程式碼
    • q_insert_head
    • q_insert_tail
    • q_remove_head
    • q_remove_tail
  2. q_delete_mid:沒必要使用 NULL,可以直接判斷是否回到 head 即可。
  3. q_delete_dup:可使用變數儲存比較結果來減少多餘的分支
-   bool flag = false;
+   bool now_del = false;
    element_t *entry, *safe;
    list_for_each_entry_safe (entry, safe, head, list) {
+       now_del = entry->list.next != head &&
+                 strcmp(entry->value, safe->value) == 0;
+       if (now_del || flag) {
-       if (entry->list.next != head &&
-           strcmp(entry->value, safe->value) == 0) {
            list_del(&entry->list);
            q_release_element(entry);
+           flag = now_del;
-           flag = true;
-       } else if (flag) {
-           list_del(&entry->list);
-           q_release_element(entry);
-           flag = false;
        }
    }
  1. 已修改沒必要的 NULL ,我只需將我的迴圈改成判斷是否回到 head 即可,因此也不須將環狀鏈結串列斷開。
-    while (fast != NULL && fast->next != NULL)
+    while (fast != head && fast->next != head)
  1. q_delete_dup 裡我確實寫的過於冗長,我已改進我的程式碼,下次 coding 會避免沒必要的分支以精簡程式碼。

修改後 commit:579a2ef
謝謝 jimmy01240397 同學抽空 Review 。
jeremy

Reviewed by jujuegg

- 不須列出完整程式碼,列出重點部分討論即可 - 有列出實作函式的設計流程,讚 - 可以在程式碼中加入適當的空行,增加可讀性

你的洞見呢?與其談「心法」,不如設想「如果我是你」,我會怎麼做,示範一次給同學看,而且檢討自己哪裡沒做好,從而呼應原本學員的不足處。清楚標註學員的不足處 (git commits 和描述)。
軟體工程師要學會說故事,本作業要求學員從良性詳盡的批評開始。繼續檢討!

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

實驗環境

$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
$ lscpu | less
架構:                              x86_64
CPU 作業模式:                       32-bit, 64-bit
Address sizes:                     39 bits physical, 48 bits virtual
Byte Order:                        Little Endian
CPU(s):                            8
On-line CPU(s) list:               0-7
供應商識別號:                       GenuineIntel
Model name:                        Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
CPU 家族:                          6
型號:                              158
每核心執行緒數:                      1
每通訊端核心數:                      8
Socket(s):                         1
製程:                              12
CPU max MHz:                       4900.0000
CPU min MHz:                       800.0000
BogoMIPS:                          7200.00

對照 Dictionary.comimplement 一詞的解釋:

  • to fulfill; perform; carry out:
  • to put into effect according to or by means of a definite plan or procedure.
  • Computers. to realize or instantiate (an element in a program), often under certain conditions as specified by the software involved.
  • to fill out or supplement

「實作」會是符合上述 "implement" 語意的譯詞,取「起」以「創作」的意思,而「製造」本質上是固定輸入並通常產生固定輸出的行為。
資訊科技詞彙翻譯

佇列實作

  • 參考資料:
  1. The Linux 核心API - List Management Functions
  2. 你所不知道的 C 語言: linked list 和非連續記憶體
  • 導入作業規範的程式碼縮排風格

clang-format的使用範例

$ clang-format -i queue.c

q_new

建立一個空佇列,初始化 struct list_head ,須先將 nextprev 都指向自身。

避免非必要的項目縮排 (即 * - ),以清晰、明確,且流暢的漢語書寫。

設計流程:

用 malloc 分配記憶體空間給 head
若 head 分配空間無效,則回傳 NULL
使用 list.h 中的 INIT_LIST_HEAD() 來初始化 head

程式碼

struct list_head *q_new()
{
    struct list_head *new =
        (struct list_head *) malloc(sizeof(struct list_head));
    if (!new)
        return NULL;
    INIT_LIST_HEAD(new);
    return new;
}

使用作業規範的程式碼縮排風格。

q_free

避免非必要的項目縮排 (即 * - ),以清晰、明確,且流暢的漢語書寫。

若 head 為無效,回傳
新增兩個 element 指標
使用 list_for_each_entry_safe 可以訪問每個鏈節,且儲存下個連結的鏈節 ???
使用 list_del 移除鏈節後,此時還未釋放節點的記憶體空間
使用 free() 使放鏈節記憶體空間

/* Free all storage used by queue */
void q_free(struct list_head *head)
{
    if (!head)
        return;
    element_t *node, *temp;
    list_for_each_entry_safe (node, temp, head, list) {
        list_del(&node->list);
        q_release_element(node);
    }
    free(head);
}

q_insert_head, q_insert_tail

避免非必要的項目列表 (即 * ),使用清晰明確的漢語書寫。

  • 設計流程:
  1. 若 head 為無效,則回傳 false
  2. 分配記憶體空間給新的結構體,若無效則回傳 false
  3. 使用 strdup(s)char *s 裡的字串複製給 node->value 指定的位置
  4. 檢查 strdup 返回的值是否無效,若無效則釋放記憶體空間,並回傳 false
  5. 將新結構體用 list.h 中的 list_add 插到鏈結頭部

中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)

q_insert_head 程式碼

bool q_insert_head(struct list_head *head, char *s)
{
    if (!head)
        return false;
    element_t *node = malloc(sizeof(element_t));
    if (!node)
        return false;
    node->value = strdup(s);
    if (!node->value) {
        free(node);
        return false;
    }
    list_add(&node->list, head);
    return true;
}

q_insert_tail 程式碼

/* Insert an element at tail of queue */
bool q_insert_tail(struct list_head *head, char *s)
{
    if (!head)
        return false;
    element_t *node = malloc(sizeof(element_t));
    if (!node)
        return false;
    node->value = strdup(s);
    if (!node->value) {
+       free(node);
        return false;
    }
    list_add_tail(&node->list, head);
    return true;
}

q_remove_head,q_remove_tail

  • 設計流程:
  1. 檢查 head 是否無效,若無效則傳 NULL
  2. list_first_entry 來取得 list 的第一個 element
  3. sp==NULL && bufsize <= 0 則回傳 NULL
  4. strncpy()node->value 所指向的字串符複製到 sp ,最多複製bufsize個
  5. list_del 移除list中的節點

q_remove_head 程式碼

/* Remove an element from head of queue */
element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
    if (!head || list_empty(head))
        return NULL;
    element_t *node = list_first_entry(head, element_t, list);
    if (!sp && bufsize <= 0)
        return NULL;
-   strncpy(sp, node->value, bufsize);
+   strncpy(sp, node->value, bufsize - 1);
+   sp[bufsize - 1] = 0;
    list_del(&node->list);
    return node;
}

q_remove_tail 程式碼

/* Remove an element from tail of queue */
element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize)
{
    if (!head || list_empty(head))
        return NULL;
    element_t *node = list_last_entry(head, element_t, list);
    if (!sp && bufsize <= 0)
        return NULL;
-   strncpy(sp, node->value, bufsize);
+   strncpy(sp, node->value, bufsize - 1);
+   sp[bufsize - 1] = 0;
    list_del(&node->list);
    return node;
}

更動:
在字串的結尾補上 \0 表示結束

q_size

  • 設計流程:
  1. 檢查head是否無效,若無效則回傳0
  2. 建立新的結構體,使用 list_for_each_entry 從 head 開始訪問每個鏈節,每經過一個總數加一

程式碼

/* Return number of elements in queue */
int q_size(struct list_head *head)
{
    int sum = 0;
-   if (!head)
+   if (!head || list_empty(head));
        return 0;
-   element_t *node = malloc(sizeof(element_t));
+   element_t *node;
    list_for_each_entry (node, head, list) {
        sum++;
    }
    return sum;
}

更動:
此部份更動已寫在後續章節「使用 Valgrind 分析記憶體問題」中

q_delete_mid

  • 設計流程:
  1. 先判斷若鏈節為空或只有單一個,則回傳 false
  2. 使用快慢指標,當快指標到尾時,慢指標會剛好到中間
  3. 此時使用 list_entry 由慢指標得知結構體,再用 q_release_element 進行記憶體的釋放

程式碼

/* Delete the middle node in queue */
    if (!head || list_empty(head))
        return false;
-   struct list_head *prev = head->prev;
-   head->prev->next = NULL;
    struct list_head *fast = head->next;
    struct list_head *slow = head->next;
-   while (fast != NULL && fast->next != NULL) {
+   while (fast != head && fast->next != head) {
        fast = fast->next->next;
        slow = slow->next;
    }
    list_del(slow);  // prev->next = slow->next;
    q_release_element(list_entry(slow, element_t, list));
-   head->prev = prev;
-   prev->next = head;
    return true;
}

更動 1:
當初在設計時忘記是個循環的雙向鏈結串列,因此將鏈結串列的循環給斷開,再最後才把循環接起來。
更動 2:
修改自 jimmy01240397 的建議,只須將迴圈條件設置成當 fast 指標指向 head 即可。

q_delete_dup

  • 設計流程
  1. 若 head 為無效或空的鏈節串列,回傳 false
  2. list_for_each_entry_safe 訪問每個節點,並紀錄下個節點跟當下的節點的字串進行比較
  3. 若重複則用 flag 作記號來判斷是否刪除

程式碼

/* Delete all nodes that have duplicate string */
bool q_delete_dup(struct list_head *head)
{
    // https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/
    if (!head || list_empty(head))
        return false;
    bool flag = false;
+   bool now_del = false;
    element_t *entry, *safe;
    list_for_each_entry_safe (entry, safe, head, list) {
-       if (entry->list.next != head &&
-           strcmp(entry->value, safe->value) == 0) {
+       now_del = entry->list.next != head && strcmp(entry->value, safe->value) == 0;
+       if(now_del || flag)
+       {
            list_del(&entry->list);
            q_release_element(entry);
-           flag = true;
-       } else if (flag) {
-           list_del(&entry->list);
-           q_release_element(entry);
-           flag = false;
+           flag = now_del;
        }
    }
    return true;
}

修改自 jimmy01240397 的建議,減少沒必要的分支以精簡程式碼。

q_swap

  • 設計流程:
  1. 若 head 為無效或為空的鏈節串列,則回傳
  2. 使用 list_movecur 的下個節點移至 newhead 的下個節點來進行 swap ,再將 newhead 設為 cur 來進行下組的 swap

程式碼

/* Swap every two adjacent nodes */
void q_swap(struct list_head *head)
{
    // https://leetcode.com/problems/swap-nodes-in-pairs/
-   if (head == NULL)
+   if (!head || list_empty(head))
        return;
-   struct list_head *cur = head->next;
-   struct list_head *newhead = cur;
+   struct list_head *newhead = head, *cur = newhead->next;
-   for (; cur != head && cur->next != head;
-        cur = cur->next, newhead = newhead->next->next) {
-       list_move(cur, newhead->next);
-   }
+   for (; cur != head && cur->next != head; newhead = cur, cur = cur->next) {
+       list_move(cur->next, newhead);
+   }
}

更動:
參考 chiangkd,更改swap實做使程式更直觀。

q_reverse

  • 設計流程:
  1. 若head為無效或鍊結串列為空則回傳
  2. list_for_each_safe 可以訪問每個節點的同時保留下個節點並進行 reverse
  3. 最後因為 list_for_each_safe 定義上的關係 node != head ,因此必須另外定義head的 nextprev

程式碼

void q_reverse(struct list_head *head)
{
    if (!head || head->next == NULL)
        return;
    struct list_head *node, *safe;
    list_for_each_safe (node, safe, head) {
        node->next = node->prev;
        node->prev = safe;
    };
    safe = head->next;
    head->next = head->prev;
    head->prev;
}

待改進:
使用 list_move 使程式碼更簡潔易讀

中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)

q_reversek

q_swap 實作相似,利用 list_move 可以用更簡潔的方式表示。以 k 個節點為一組進行 reverse ,在將 newhead 變更為前一組的最後一個節點。

  • 設計流程:
  1. 若 head 為無效或為空的鏈結串列,則回傳
  2. 宣告 list_head 的指標 currnewhead 指向 head
  3. 以 k 個節點一組,計算總共會進行幾組的 reverse
  4. 在每組 reverse 中, curr->next每次會被移除到newhead的下個節點,經過(k-1)次的循環後curr會間接的變成尾巴的節點。
  5. curr 變為新的頭節點

程式碼:

/* Reverse the nodes of the list k at a time */
void q_reverseK(struct list_head *head, int k)
{
    // https://leetcode.com/problems/reverse-nodes-in-k-group/
    if (!head || list_empty(head))
        return;
    struct list_head *newhead = head, *curr = newhead->next;
    int times = q_size(head) / k;
    for (int i = 0; i < times; i++) {
        for (int j = 1; j < k; j++) {
            list_move(curr->next, newhead);
        }
        newhead = curr;
+       curr = newhead->next;
    }
}

q_sort

參考你所不知道的 C 語言: linked list 和非連續記憶體和學長 wanghanchi 中合併兩個鏈節串列並作排序的方法。

避免非必要的項目縮排 (即 * - ),以清晰、明確,且流暢的漢語書寫。

  • 設計流程:
    • mergesort
      1.檢查鏈節串列是否為空或無效
      2.利用快慢指標找到中間的節點並且切成兩個獨立的 list_head
      3.呼叫自己進行迭代,最後傳入 q_merge_two
    • q_merge_two
      1.使用 indirect pointer 來訪問每個節點
      2.逐一比較list1list2字串大小
      3.若 list1list2 走到盡頭,剩餘的一方將會接續排序在尾巴
    • q_sort
      1.檢查鏈節串列是否為空或無效
      2.將環狀結構斷開
      3.將 head->next 的值帶入 mergesort 函式
      4.最後因為雙向鏈結的 prev 已亂掉,因此重新定義,並把環狀結構接回

mergesort

struct list_head *mergesort(struct list_head *l)
{
    if (!l || l->next == NULL)
        return l;
    struct list_head *fast = l, *slow = l;
    while (fast->next != NULL && fast->next->next != NULL) {
        fast = fast->next->next;
        slow = slow->next;
    }
    struct list_head *mid = slow->next;
    slow->next = NULL;
    return (q_merge_two(mergesort(l), mergesort(mid)));
}

q_merge_two

/*Merge the two lists in a one sorted list*/
struct list_head *q_merge_two(struct list_head *list1, struct list_head *list2)
{   
    struct list_head *new_head = NULL;
    struct list_head **ptr = &new_head;
-   element_t *list1_entry = list_entry(list1, element_t, list);
-   element_t *list2_entry = list_entry(list2, element_t, list);
    for (struct list_head **node = NULL; list1 && list2;
+       *node = (*node)->next) {
+       node = strcmp(list_entry(list1, element_t, list)->value,
+                     list_entry(list2, element_t, list)->value) >= 0
+                  ? &list2
+                  : &list1;
        *ptr = *node;
        ptr = &(*ptr)->next;
    }
    *ptr = (struct list_head *) ((uint64_t) list1 | (uint64_t) list2);
    return new_head;
}

更動:
在我執行trace-04-ops時 sort 總是無法順利進行,輸出如下:

$ ./qtest -v 3 -f traces/trace-04-ops.cmd
# Test of insert_head, insert_tail, size, swap, and sort
l = []
l = [gerbil]
l = [bear gerbil]
l = [dolphin bear gerbil]
l = [bear dolphin gerbil]
Removed bear from queue
l = [dolphin gerbil]
l = [bear dolphin gerbil]
Queue size = 3
l = [bear dolphin gerbil]
l = [bear dolphin gerbil meerkat]
l = [bear dolphin gerbil meerkat bear]
l = [bear dolphin gerbil meerkat bear gerbil]
Queue size = 6
l = [bear dolphin gerbil meerkat bear gerbil]
ERROR: Not sorted in ascending order
l = [bear meerkat gerbil bear dolphin gerbil]
Removed bear from queue
l = [meerkat gerbil bear dolphin gerbil]
Removed meerkat from queue
l = [gerbil bear dolphin gerbil]
Removed gerbil from queue
l = [bear dolphin gerbil]
Removed bear from queue
l = [dolphin gerbil]
Queue size = 2
l = [dolphin gerbil]
Freeing queue

此時我才發現佇列並不會跟著鏈結串列一起訪問節點,因此把 list_entry 帶入迴圈跟著存取每個節點。

access 的翻譯是「存取」,而非「訪問」(visit)

q_sort

/* Sort elements of queue in ascending/descending order */
void q_sort(struct list_head *head, bool descend)
{
    if (head == NULL || list_empty(head))
        return;
    head->prev->next = NULL;
    head->next = mergesort(head->next);
    struct list_head *cur = head;
    struct list_head *next = head->next;
    while (next) {
        next->prev = cur;
        cur = next;
        next = next->next;
    }
    cur->next = head;
    head->prev = cur;
}

q_descend

題目說明 : 2487. Remove Nodes From Linked List

參考 wanghanchi 的解法,在單向鏈節串列中可以使用reverse來更加簡潔的表達,在環狀雙向鏈節串列中使用prev就能更輕易的解決。

  • 設計流程:
  1. head 為無效或為空的鏈節串列,回傳零
  2. 使用 list_head 的指標 curr 指向 head->prevprev 指向 curr->prev
  3. 使用 list_entry 得到佇列裡的資訊
  4. 在迴圈裡使用 strcmp() 進行比較,若 curr_entry 的字串大於 prev_entry ,則刪除 prev 節點,反之使 curr 移動到 prev 所在節點

程式碼

int q_descend(struct list_head *head)
{
    // https://leetcode.com/problems/remove-nodes-from-linked-list/
    if (!head || head->next == NULL)
        return 0;
-   struct list_head *curr = head->prev, *prev = curr->prev;
+   struct list_head *curr = head->prev;
-   element_t *curr_entry = list_entry(curr, element_t, list);
-   element_t *prev_entry = list_entry(prev, element_t, list);
-   while (prev != head) {
+   while (curr != head && curr->prev != head) {
+   element_t *curr_entry = list_entry(curr, element_t, list);
+   element_t *prev_entry = list_entry(prev, element_t, list);
        if (strcmp(curr_entry->value, prev_entry->value) > 0) {
-           list_del(prev);
+           list_del(&prev_entry->list);
            q_release_element(prev_entry);
        } else {
-           curr = prev;
+           curr = curr->prev;
        }
    }
    return q_size(head);
}

更動:

  1. 減少指向 list_head 的指標使用,使程式碼更精簡
  2. 變更 list_entry 的位置,使每次迴圈都能訪問到佇列位置

q_merge

參考 chiacyu

避免非必要的項目縮排 (即 * - ),以清晰、明確,且流暢的漢語書寫。

檢查 head 是否為空或無效
新增一個 list_head 名為 temp
使用 list_for_each_entry_safe 訪問每個佇列,
list_splice_init 將每個節點移動到 temp 的鏈結串列,並將佇列 curr->q 的頭指標初始化為空佇列
q_sort 進行排序
使用 list_splice_init 將排序後的 temp 鏈結串列合併到 head 鏈結串列的第一個隊列中
回傳合併後的節點數量

/* Merge all the queues into one sorted queue, which is in ascending/descending
 * order */
int q_merge(struct list_head *head, bool descend)
{
    // https://leetcode.com/problems/merge-k-sorted-lists/
    if(!head || list_empty(head)){
        return 0;
    }
    struct list_head temp;
    INIT_LIST_HEAD(&temp);
    queue_contex_t *curr, *safe;
    list_for_each_entry_safe(curr, safe, head, chain){
        list_splice_init(curr->q, &temp);
    }
    q_sort(&temp,false);
    list_splice_init(&temp, list_first_entry(head, queue_contex_t, chain)->q);
    return q_size(head);
}
  1. 移除非必要的 :::info:::spoiler,讓行文清晰且流暢。
  2. 中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)
  3. HackMD 不是讓你張貼完整程式碼的地方,GitHub 才是!因此你在開發紀錄只該列出關鍵程式碼 (善用 diff 標示),可附上對應 GitHub commit 的超連結,列出程式碼是為了「檢討」和「便於他人參與討論」,不是用來「假裝自己有付出」
  4. 改進漢語表達。

目前跑分

$ make test
scripts/driver.py -c
...
+++ TESTING trace trace-17-complexity:
# Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant
ERROR: Probably not constant time or wrong implementation
ERROR: Probably not constant time or wrong implementation
ERROR: Probably not constant time or wrong implementation
ERROR: Probably not constant time or wrong implementation
---     trace-17-complexity     0/5
---	TOTAL		95/100

研讀論文並改進 dudect

作業要求:論文〈Dude, is my code constant time?〉重點提示

論文相關筆記紀錄 : 筆記

解析 LAB0-C/dudect

dudect/constant.c

measure 主要測試 insertremove 功能是否正常。

其測試的方法為用佇列的大小判斷,下方程式碼從 case DUT(insert_head) 中擷取。

    int before_size = q_size(l);
    before_ticks[i] = cpucycles();
    dut_insert_head(s, 1);
    after_ticks[i] = cpucycles();
    int after_size = q_size(l);
    dut_free();
    if (before_size != after_size - 1)
        return false;

dudect/ttest.c
先看結構體

typedef struct {
    double mean[2];
    double m2[2];
    double n[2];
} t_context_t;

t_init 主要用作初始化 t_context_t這個結構體

void t_init(t_context_t *ctx)
{
    for (int class = 0; class < 2; class ++) {
        ctx->mean[class] = 0.0;
        ctx->m2[class] = 0.0;
        ctx->n[class] = 0.0;
    }
    return;
}

t_push 看似用來確認穩定狀態

void t_push(t_context_t *ctx, double x, uint8_t class)
{
    assert(class == 0 || class == 1);
    ctx->n[class]++;

    /* Welford method for computing online variance
     * in a numerically stable way.
     */
    double delta = x - ctx->mean[class];
    ctx->mean[class] = ctx->mean[class] + delta / ctx->n[class];
    ctx->m2[class] = ctx->m2[class] + delta * (x - ctx->mean[class]);
}

t_compute

下方程式碼為 Welch's test 公式 : t=X0X1s12N1+s22N2

double t_compute(t_context_t *ctx)
{
    double var[2] = {0.0, 0.0};
    var[0] = ctx->m2[0] / (ctx->n[0] - 1);
    var[1] = ctx->m2[1] / (ctx->n[1] - 1);
    double num = (ctx->mean[0] - ctx->mean[1]);
    double den = sqrt(var[0] / ctx->n[0] + var[1] / ctx->n[1]);
    double t_value = num / den;
    return t_value;
}

dudect/fixture.c

/* Interface to test if function is constant */
#define _(x) bool is_##x##_const(void);
DUT_FUNCS
#undef _

is_##x##_const 函式由前置處理器 concatenation 處理,當中 DUT_FUNCS 包含 insert_headinsert_tailremove_headremove_tail

相關知識可以參考 你所不知道的 C 語言:前置處理器應用篇

differentiate()

主要用作計算執行時間

    for (size_t i = 0; i < N_MEASURES; i++)
        exec_times[i] = after_ticks[i] - before_ticks[i];

report()

用作判斷是否為 constant time

    /* Definitely not constant time */
    if (max_t > t_threshold_bananas)
        return false;

    /* Probably not constant time. */
    if (max_t > t_threshold_moderate)
        return false;

    /* For the moment, maybe constant time. */
    return true;

doit()

ret 接收 measure 測試 insertremove 是否都正常的布林值。

update_statistics 接收來自 differentiate 的執行時間結果和 classes。

最後 ret 檢查是否為 constant time 。

    bool ret = measure(before_ticks, after_ticks, input_data, mode);
    differentiate(exec_times, before_ticks, after_ticks);
    update_statistics(exec_times, classes);
    ret &= report();
    ...
    return ret;

dudect實作原理

traces/trace-17-complexity.cmd 中看到第一行為 option simulation 1 ,可見 simulation 為檢驗時間複雜度的開關,因此先來了解 simulation 與 dudect 的關聯。

qtest.c 中看到這段

if (simulation) {
    if (argc != 1) {
        report(1, "%s does not need arguments in simulation mode", argv[0]);
        return false;
    }
    bool ok =
        pos == POS_TAIL ? is_insert_tail_const() : is_insert_head_const();
    if (!ok) {
        report(1,
        "ERROR: Probably not constant time or wrong implementation");
        return false;
    }
    report(1, "Probably constant time");
    return ok;

程式碼更新於 commit : bdcfae3
pos == POS_TAIL ? is_insert_tail_const() 可以用 pos 選擇要插入的位置。

從上述程式碼中看到似乎是用 is_insert_head(tail)_const 去回傳是否為 constant time 。

static bool test_const(char *text, int mode)
{
    bool result = false;
    t = malloc(sizeof(t_context_t));

    for (int cnt = 0; cnt < TEST_TRIES; ++cnt) {
        printf("Testing %s...(%d/%d)\n\n", text, cnt, TEST_TRIES);
        init_once();
        for (int i = 0; i < ENOUGH_MEASURE / (N_MEASURES - DROP_SIZE * 2) + 1;
             ++i)
            result = doit(mode);
        printf("\033[A\033[2K\033[A\033[2K");
        if (result)
            break;
    }
    free(t);
    return result;
}

預設測試 TEST_TRIES 次,每次測試時先呼叫 init_once()t_context_t 結構體進行初始化,接著進入 doit ,由 prepare_inputs 輸出提供的值 ,經由 measure 測試功能是否正常及 differentiate 得到執行時間,再將計算得到的執行時間、 classes 輸入給 update_statistics ,若函式都成功執行且為 constant time ( report() 回傳 true ) 則 ret 回傳 true

修改 LAB0-C/dudect 達到時間常數

LAB0-C/dudect 中相較原程式碼 dudect.h 少了 percentile() 功能,即少了裁切測量值。

因此將原始碼的 percentile 加入 LAB0-C/dudect 中並修改,最後終於讓我看到了卡比之星。

修改內容 commit d04d313
image

檢討

讀了一遍這篇論文,對於完全沒有統計基礎的人來說十分挫折,我並不能完全理解裡面分析及理論想表達的內容,反而是透過看同學們的筆記一點一點紀錄才大致了解實作原理,後續有時間再回來複習。

在 qtest 提供新的命令 shuffle

詳細內容可以參考我的 commit:d0493b2commit:c091ea3

設計邏輯遵循 Fisher–Yates shuffle 裡的 The modern algorithm

下方表格為實作概念:

Times Range Roll Scratch Result
0 1 2 3 4 5 6 7
1 0~6 2 1 2 7 4 5 6 3
2 0~5 5 1 2 7 4 5 6 3
3 0~4 3 1 2 7 5 4 6 3
4 0~3 1 1 5 7 2 4 6 3
5 0~2 1 1 7 5 2 4 6 3
6 0~1 1 1 7 5 2 4 6 3
7 0~0 0 1 7 5 2 4 6 3

qtest 中輸出結果

cmd> new
l = []
cmd> it 1
l = [1]
cmd> it 2
l = [1 2]
cmd> it 3
l = [1 2 3]
cmd> it 4
l = [1 2 3 4]
cmd> it 5
l = [1 2 3 4 5]
cmd> it 6
l = [1 2 3 4 5 6]
cmd> it 7
l = [1 2 3 4 5 6 7]
cmd> shuffle
l = [5 6 3 2 1 7 4]

接著導入 M01: lab0 所提供的驗證程式輸出

Expectation:  41666
Observation:  {'1234': 41792, '1243': 41748, '1324': 41661, '1342': 41497, '1423': 41519, '1432': 41770, '2134': 41175, '2143': 41992, '2314': 41550, '2341': 41643, '2413': 41633, '2431': 41949, '3124': 41780, '3142': 41597, '3214': 41703, '3241': 41754, '3412': 41483, '3421': 41258, '4123': 41648, '4132': 41651, '4213': 41983, '4231': 41864, '4312': 41753, '4321': 41597}
chi square sum:  21.73297172754764

Figure_1
shuffle 的頻率是大致符合 Uniform distribution

使用 Valgrind 分析記憶體問題

使用案例

由於我在 $make check 總是出現有 ERROR: Freed queue, but 2 blocks are still allocated

$ make check 
./qtest -v 3 -f traces/trace-eg.cmd
# Demonstration of queue testing framework
# Use help command to see list of commands and options
# Initial queue is NULL.
l = NULL
# Create empty queue
l = []
# See how long it is
Queue size = 0
l = []
# Fill it with some values. First at the head
l = [dolphin]
l = [bear dolphin]
l = [gerbil bear dolphin]
# Now at the tail
l = [gerbil bear dolphin meerkat]
l = [gerbil bear dolphin meerkat bear]
# Reverse it
l = [bear meerkat dolphin bear gerbil]
# See how long it is
Queue size = 5
l = [bear meerkat dolphin bear gerbil]
# Delete queue. Goes back to a NULL queue.
l = NULL
ERROR: There is no queue, but 2 blocks are still allocated
# Exit program
Freeing queue
ERROR: Freed queue, but 2 blocks are still allocated
make: *** [Makefile:57:check] 錯誤 1

一開始看完以為問題出在 q_free 上,因此花費了大量時間反覆修改測試,直到我看了 以 Valgrind 分析記憶體問題 才開始使用到工具分析計體資訊,以下為輸出結果。

$ valgrind ./qtest < traces/trace-eg.cmd
l = NULL
l = []
Queue size = 0
l = []
==26769== Conditional jump or move depends on uninitialised value(s)
==26769==    at 0x484ED28: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==26769==    by 0x10DC6B: strsave_or_fail (report.c:254)
==26769==    by 0x10E551: parse_args (console.c:152)
==26769==    by 0x10E551: interpret_cmd (console.c:200)
==26769==    by 0x10F1F0: run_console (console.c:691)
==26769==    by 0x10D3C3: main (qtest.c:1258)
==26769== 
==26769== Conditional jump or move depends on uninitialised value(s)
==26769==    at 0x484F02A: strncpy (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==26769==    by 0x10DCE5: strncpy (string_fortified.h:95)
==26769==    by 0x10DCE5: strsave_or_fail (report.c:266)
==26769==    by 0x10E551: parse_args (console.c:152)
==26769==    by 0x10E551: interpret_cmd (console.c:200)
==26769==    by 0x10F1F0: run_console (console.c:691)
==26769==    by 0x10D3C3: main (qtest.c:1258)
==26769== 
l = [dolphin]
l = [bear dolphin]
l = [gerbil bear dolphin]
l = [gerbil bear dolphin meerkat]
l = [gerbil bear dolphin meerkat bear]
l = [bear meerkat dolphin bear gerbil]
Queue size = 5
l = [bear meerkat dolphin bear gerbil]
l = NULL
ERROR: There is no queue, but 2 blocks are still allocated
Freeing queue
ERROR: Freed queue, but 2 blocks are still allocated
==26769== 128 bytes in 2 blocks are still reachable in loss record 1 of 1
==26769==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==26769==    by 0x10F2E7: test_malloc (harness.c:133)
==26769==    by 0x10F90F: q_size (queue.c:104)
==26769==    by 0x10B60F: do_size (qtest.c:560)
==26769==    by 0x10DFD1: interpret_cmda (console.c:181)
==26769==    by 0x10E586: interpret_cmd (console.c:201)
==26769==    by 0x10F1F0: run_console (console.c:691)
==26769==    by 0x10D3C3: main (qtest.c:1258)
==26769== 

這時我才發現可能是我的 q_size 造成的問題,以下是我當時的 q_size 及更改後:

/* Return number of elements in queue */
int q_size(struct list_head *head)
{
    int sum = 0;
-   if (!head)
+   if (!head || list_empty(head));
        return 0;
-   element_t *node = malloc(sizeof(element_t));
+   element_t *node;
    list_for_each_entry (node, head, list) {
        sum++;
    }
    return sum;
}

原來我這時在設計 q_size 時自作聰明的多幫 node 設置 malloc ,下次在設計時更應該思考配置記憶體的必要。
這問題現在看似很簡單,但在當時 $ make check 的情況看來,我只會認為一定是我的 q_free 在某個地方出錯,我更應該善用工具來分析問題。


實作 list_sort.c

研讀 list_sort.c

感謝 RinHizakura, chiangkd, Risheng

不該只有致謝,你的洞見呢?

__attribute

可以設置函式屬性(Function Attribute)、變量屬性(Variable Attribute)和類型屬性(Type Attribute)。

其中函式屬性可以幫助開發者把一些特性添加到函數聲明中,從而可以使編譯器在錯誤檢查方面的功能更強大。__attribute__ 機制也很容易同非GNU應用程序> 做到兼容 之功效。

研讀資訊科技詞彙翻譯詞彙對照表,調整用語,並改進你的漢語表達。

參考自 [GNU C `__attribute__` 機制簡介](https://huenlil.pixnet.net/blog/post/26078382)

為何不查閱 GCC 手冊呢?

__attribute__((nonnull(2,3)))

其中 2、3 代表地二、三個引數不能為空

參考 __attribute__((nonnull)) function attribute

This function attribute specifies function parameters that are not supposed to be null pointers. This enables the compiler to generate a warning on encountering such a parameter

merge

__attribute__((nonnull(2,3,4)))
static struct list_head *merge(void *priv, list_cmp_func_t cmp,
				struct list_head *a, struct list_head *b)
{
	struct list_head *head, **tail = &head;

	for (;;) {
		/* if equal, take 'a' -- important for sort stability */
		if (cmp(priv, a, b) <= 0) {
			*tail = a;
			tail = &a->next;
			a = a->next;
			if (!a) {
				*tail = b;
				break;
			}
		} else {
			*tail = b;
			tail = &b->next;
			b = b->next;
			if (!b) {
				*tail = a;
				break;
			}
		}
	}
	return head;
}

先定義函式第 2、3、4 不能為 NULL

linux/include/linux/list_sort.h 中找到 cmp 的 typedef

typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
		const struct list_head *, const struct list_head *);

可以確定 cmp 是函數指標,且回傳 int ,參數分別為 void *、兩個 struct list_head *

  • cmp > 0 ,即為 ab 之後
  • cmp <= 0 ,即為 aa 之前

因 list_sort 為 stable sort ,故不需特別探討小於和等於 0 的區別。

stable sort : Stable sorting algorithms maintain the relative order of records with equal keys (i.e. values). That is, a sorting algorithm is stable if whenever there are two records R and S with the same key and with R appearing before S in the original list, R will appear before S in the sorted list. See here for a more complete description.
ex :
排序前 : 3,5,19,1,3*,10
排序後 : 1,3,3*,5,10,19
兩個3、3*排序前後順序相同

merge_final

 * Combine final list merge with restoration of standard doubly-linked
 * list structure.  This approach duplicates code from merge(), but
 * runs faster than the tidier alternatives of either a separate final
 * prev-link restoration pass, or maintaining the prev links
 * throughout.

merge 作法相似,差別在於最終連結 prev 回上一個節點,形成雙向環狀鏈結串列,在速度上比起每次 merge 後再接 prev 來的更快。

__attribute__((nonnull(2,3,4,5)))
static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
			struct list_head *a, struct list_head *b)
{
	struct list_head *tail = head;
	u8 count = 0;

	for (;;) {
		/* if equal, take 'a' -- important for sort stability */
		if (cmp(priv, a, b) <= 0) {
			tail->next = a;
			a->prev = tail;
			tail = a;
			a = a->next;
			if (!a)
				break;
		} else {
			tail->next = b;
			b->prev = tail;
			tail = b;
			b = b->next;
			if (!b) {
				b = a;
				break;
			}
		}
	}

	/* Finish linking remainder of list b on to tail */
	tail->next = b;
	do {
		/*
		 * If the merge is highly unbalanced (e.g. the input is
		 * already sorted), this loop may run many iterations.
		 * Continue callbacks to the client even though no
		 * element comparison is needed, so the client's cmp()
		 * routine can invoke cond_resched() periodically.
		 */
		if (unlikely(!++count))
			cmp(priv, b, b);
		b->prev = tail;
		tail = b;
		b = b->next;
	} while (b);

	/* And the final links to make a circular doubly-linked list */
	tail->next = head;
	head->prev = tail;
}

程式碼的註釋提到若合併是極度不平衡(像是輸入已經排序),則會經過多次的迭代,儘管這些迭代不需要。因此 cmp 可以週期性的呼叫 cond_resched()

`cond_resched()` 的介紹參考 [cond_resched的使用](https://www.twblogs.net/a/5e533156bd9eee2117c39e55) > 在可搶佔內核中,在內核態有很多搶佔點,在有高優先級的進程需要運行時,就會在搶佔點到來時執行搶佔;而在內核不可搶佔系統中(如centos系統),在內核態運行的程序可調用cond_resched主動讓出cpu,防止其在內核態執行時間過長導致可能發生的soft lockup或者造成較大的調度延遲。

查閱 Linux 核心原始程式碼裡頭的文件和程式碼註解,上述描述不精確。

接著討論 likelyunlikely ,在 linux/include/linux/compiler.h 可以找到

#define likely_notrace(x)	__builtin_expect(!!(x), 1)
#define unlikely_notrace(x)	__builtin_expect(!!(x), 0)

__built_expect() 是 gcc 的內建 function ,用來將 branch 的相關資訊提供給 compiler ,告訴 compiler 設計者期望的比較結果,以便對程式碼改變分支順序來進行優化。

  • !!(x) 用來確保值一定是 0 或 1
  • 因為 if 內邏輯敘述的值可以是 0 或是非 0 的整數的,所以如果不做 !!(x) 就無法確保值一定是0或1
if (likely(x)) {
    /* execute when x is true */ 
}
else {
    /* execute when x is false */
}
  • 使用 likely macro 表示這段敘述(x)為 true 的機率比較大(比較常發生),告訴 compiler 將 x==ture 的對應執行程式碼接在判斷後面
  • 使用 unlikely macro 表示這段敘述(x)為 true 的機率比較小(不常發生),告訴 compiler 將 x==false 的對應執行程式碼接在判斷後面

參考自 chiangkd 提供的參考資料 likely / unlikely macro introduction

list_sort

在看程式碼之前先來看註解的說明 :

 * The comparison function @cmp must return > 0 if @a should sort after
 * @b ("@a > @b" if you want an ascending sort), and <= 0 if @a should
 * sort before @b *or* their original order should be preserved.  It is
 * always called with the element that came first in the input in @a,
 * and list_sort is a stable sort, so it is not necessary to distinguish
 * the @a < @b and @a == @b cases.
 *
 * This is compatible with two styles of @cmp function:
 * - The traditional style which returns <0 / =0 / >0, or
 * - Returning a boolean 0/1.
 * The latter offers a chance to save a few cycles in the comparison
 * (which is used by e.g. plug_ctx_cmp() in block/blk-mq.c).
 *
 * A good way to write a multi-word comparison is::
 *
 *	if (a->high != b->high)
 *		return a->high > b->high;
 *	if (a->middle != b->middle)
 *		return a->middle > b->middle;
 *	return a->low > b->low;

a 需被排序在 b 之後,則 @cmp > 0 (@a > @b),反之若 a 需被排序在 b 之前,則 @cmp <= 0 。list_sort 是 stable sort 因此在這裡我們不需特別探討小於和等於 0 的區別。

 * This mergesort is as eager as possible while always performing at least
 * 2:1 balanced merges.  Given two pending sublists of size 2^k, they are
 * merged to a size-2^(k+1) list as soon as we have 2^k following elements.

從這裡開始說明 list_sort 的實作方法,兩個要被 merge 的 list 始終保持 2 : 1 的大小,這可以降低比較次數。
相較 fully-eager bottom-up mergesort 只要出現兩個 2k 大小的 list 就會立刻被合併。而 list_sort 是在出現兩個 2k 大小的 list 的時候不立即合併,而是等到下一個 2k 的 list 被蒐集起來時,前者的兩個 linked list 才會被合併。

避免過多的中英文混用,已有明確翻譯詞彙者,例如「鏈結串列」(linked list) 和「佇列」(queue),就使用該中文詞彙,英文則留給變數名稱、人名,或者缺乏通用翻譯詞彙的場景。

 * Thus, it will avoid cache thrashing as long as 3*2^k elements can
 * fit into the cache.  Not quite as good as a fully-eager bottom-up
 * mergesort, but it does use 0.2*n fewer comparisons, so is faster in
 * the common case that everything fits into L1.

只要這 2 : 1 的 list (即 32k 的 list )可以完全被放到 cache 裡,就可以避免 cache thrashing。

 * The merging is controlled by "count", the number of elements in the
 * pending lists.  This is beautifully simple code, but rather subtle.
  • count 用來計算在 pending lists 的 element 數量
 * Each time we increment "count", we set one bit (bit k) and clear
 * bits k-1 .. 0.  Each time this happens (except the very first time
 * for each bit, when count increments to 2^k), we merge two lists of
 * size 2^k into one list of size 2^(k+1).
  • 每當 count 增加時,第 k 位的 bit 設為 1 ,並清除 k-1 位的 bit 為 0

不懂就說不懂,沒有「不太明白」這回事。

後續的註解說明看不懂,因此直接參考 kdnvt確保 2 : 1 的實作方法中有非常清楚的說明

引述其的表格 :

count decimal count binary merge before merge after merge
0 0000 No NULL X
1 0001 No 1 X
2 0010 Yes 1-1 2
3 0011 No 1-2 X
4 0100 Yes 1-1-2 2-2
5 0101 Yes 1-2-2 1-4
6 0110 Yes 1-1-4 2-4
7 0111 No 1-2-4 X
8 1000 Yes 1-1-2-4 2-2-4
9 1001 Yes 1-2-2-4 1-4-4
10 1010 Yes 1-1-4-4 2-4-4
11 1011 Yes 1-2-4-4 1-2-8
12 1100 Yes 1-1-2-8 2-2-8
13 1101 Yes 1-2-2-8 1-4-8
14 1110 Yes 1-1-4-8 2-4-8
15 1111 No 1-2-4-8 X
16 10000 Yes 1-1-2-4-8 2-2-4-8

list_sort 實作 :

do {
	size_t bits;
	struct list_head **tail = &pending;
		/* Find the least-significant clear bit in count */
	for (bits = count; bits & 1; bits >>= 1)
		tail = &(*tail)->prev;
	/* Do the indicated merge */
	if (likely(bits)) {
		struct list_head *a = *tail, *b = a->prev;
			a = merge(priv, cmp, b, a);
		/* Install the merged result in place of the inputs */
		a->prev = b->prev;
		*tail = a;
	}
		/* Move one element from input list to pending */
	list->prev = pending;
	pending = list;
	list = list->next;
	pending->next = NULL;
	count++;
} while (list);

count = 0
初始狀態如下圖所示,這邊假設一共有 5 個節點,其中 node1head
此時 bit 為 0 因此不進入 for() 迴圈也不進行 merge







list



node1

node1

prev

next



node2

node2

prev

next



node1:n->node2:m





node3

node3

prev

next



node2:n->node3:m





node4

node4

prev

next



node3:n->node4:m





node5

node5

prev

next



node4:n->node5:m





NULL
NULL



node5:n->NULL





head
head



head->node1:m





pending
pending



pending->NULL





tail
tail



tail->pending





list
list



list->node2:m











list



node1

node1

prev

next



node2

node2

prev

next



node1:n->node2:m





NULL
NULL



node2:n->NULL





node2:p->NULL





node3

node3

prev

next



node3:p->node2:m





node4

node4

prev

next



node3:n->node4:m





node5

node5

prev

next



node4:n->node5:m





node5:n->NULL





head
head



head->node1:m





pending
pending



pending->node2:m





tail
tail



tail->pending





list
list



list->node3:m





count = 1
此時 bits 為 1 經過一次 for 迴圈 tail 指向 node2prev ,不進行 merge







list



node1

node1

prev

next



node2

node2

prev

next



node1:n->node2:m





NULL
NULL



node2:n->NULL





node2:p->NULL





node3

node3

prev

next



node3:p->node2:m





node4

node4

prev

next



node3:n->node4:m





node5

node5

prev

next



node4:n->node5:m





node5:n->NULL





head
head



head->node1:m





pending
pending



pending->node2:m





tail
tail



tail->node2:p





list
list



list->node3:m











list



node1

node1

prev

next



node2

node2

prev

next



node1:n->node2:m





NULL
NULL



node2:n->NULL





node2:p->NULL





node3

node3

prev

next



node3:p->node2:m





node3:n->NULL





node4

node4

prev

next



node4:p->node3:m





node5

node5

prev

next



node4:n->node5:m





node5:n->NULL





head
head



head->node1:m





pending
pending



pending->node3:m





tail
tail



tail->node2:p





list
list



list->node4:m





count = 2
此時 bits102 ,不進入 for 迴圈,但進行 merge







list



node1

node1

prev

next



node2

node2

prev

next



node1:n->node2:m





NULL
NULL



node2:n->NULL





node2:p->NULL





node3

node3

prev

next



node3:p->node2:m





node3:n->NULL





node4

node4

prev

next



node4:p->node3:m





node5

node5

prev

next



node4:n->node5:m





node5:n->NULL





head
head



head->node1:m





pending
pending



pending->node3:m





tail
tail



tail->pending





list
list



list->node4:m





此時 a(node3)b(node2) 已經合併完成,合併後用 node2,3 來表示







list



node1

node1

prev

next



node23

node2,3

prev

next



node1:n->node23:m





NULL
NULL



node23:p->NULL





node23:n->NULL





node4

node4

prev

next



node4:p->node23:m





node4:n->NULL





node5

node5

prev

next



node5:n->NULL





head
head



head->node1:m





pending
pending



pending->node4:m





tail
tail



tail->pending





list
list



list->node5:m





count = 3
count = 3 = 112 因此會經過 2 次 for 迴圈,此時 tail 指向 node2,3prev ,因 bit 為 0 故此階段不進行合併。







list



node1

node1

prev

next



node23

node2,3

prev

next



node1:n->node23:m





NULL
NULL



node23:p->NULL





node23:n->NULL





node4

node4

prev

next



node4:p->node23:m





node4:n->NULL





node5

node5

prev

next



node5:p->node4:m





NULL1
NULL



node5:n->NULL1





head
head



head->node1:m





pending
pending



pending->node4:m





tail
tail



tail->node23:p





list
list



list->node5:m











list



node1

node1

prev

next



node23

node2,3

prev

next



node1:n->node23:m





NULL
NULL



node23:p->NULL





node23:n->NULL





node4

node4

prev

next



node4:p->node23:m





node4:n->NULL





node5

node5

prev

next



node5:p->node4:m





node5:n->NULL





head
head



head->node1:m





pending
pending



pending->node5:m





tail
tail



tail->node23:p





list
list



list->NULL





count = 4
count1002 bits 不為 0 進行合併







list



node1

node1

prev

next



node23

node2,3

prev

next



node1:n->node23:m





NULL
NULL



node23:p->NULL





node23:n->NULL





node4

node4

prev

next



node4:p->node23:m





node4:n->NULL





node5

node5

prev

next



node5:p->node4:m





node5:n->NULL





head
head



head->node1:m





pending
pending



pending->node5:m





tail
tail



tail->pending





list
list



list->NULL





此時 a(node5)b(node4) 已經合併完成,合併後用 node4,5 來表示







list



node1

node1

prev

next



node23

node2,3

prev

next



node1:n->node23:m





NULL
NULL



node23:p->NULL





node23:n->NULL





node45

node4,5

prev

next



node45:p->node23:m





node45:n->NULL





head
head



head->node1:m





pending
pending



pending->node45:m





tail
tail



tail->pending





list
list



list->NULL





上述節點都已被加到 pending list ,接著將所有的 pending list 合併在一起

/* End of input; merge together all the pending lists. */
list = pending;
pending = pending->prev;
for (;;) {
	struct list_head *next = pending->prev;
	if (!next)
		break;
	list = merge(priv, cmp, pending, list);
	pending = next;
}

最後將 prev 重新接回變成雙向鏈結串列

/* The final merge, rebuilding prev links */
merge_final(priv, cmp, head, pending, list);

list_sort 加入到專案中

  1. list_sort.clist_sort.h 加入到 LAB0_C 專案中
  2. list_sort.c 中用到的巨集加入到 list_sort.h
#define likely_notrace(x)	__builtin_expect(!!(x), 1)
#define unlikely_notrace(x)	__builtin_expect(!!(x), 0)
  • 定義 list_sort.c 中的 u8 型別
#include <stdint.h>
typedef uint8_t u8;
  1. Makefile 中的 OBJS 中新增 list_sort.o
OBJS := qtest.o report.o console.o harness.o queue.o \
        random.o dudect/constant.o dudect/fixture.o dudect/ttest.o \
        shannon_entropy.o \
        linenoise.o web.o \
+   	list_sort.o
  1. 修改 qtest.c
  • 加入 include "list_sort.h"
  • 設定參數來決定是否使用 list_sort
static int use_list_sort = 0;
  • 加入比較函式 cmp
static int cmp(void *priv, const struct list_head *a, const struct list_head *b)
{
    return strcmp(list_entry(a, element_t, list)->value,
                list_entry(b, element_t, list)->value);
}

此時參考到 Linux 核心的比較函式撰寫

/* `priv` is the test pointer so check() can fail the test if the list is invalid. */
static int cmp(void *priv, const struct list_head *a, const struct list_head *b)
{
	struct debug_el *ela, *elb;

	ela = container_of(a, struct debug_el, list);
	elb = container_of(b, struct debug_el, list);

	check(priv, ela, elb);
	return ela->value - elb->value;
}

得知 priv 是個測試用的指標

  • list_sort 加入到 help 中,可以透過 option list_sort [true/false] 來啟用 sortlist_sort
add_param("list_sort", &use_list_sort,
              "The list_sort function used within the Linux kernel.", NULL);
Options:
  descend     0            | Sort and merge queue in ascending/descending order
  echo        1            | Do/don't echo commands
  entropy     0            | Show/Hide Shannon entropy
  error       5            | Number of errors until exit
  fail        30           | Number of times allow queue operations to return false
  length      1024         | Maximum length of displayed string
  list_sort   0            | The list_sort function used within the Linux kernel.
  malloc      0            | Malloc failure probability percent
  simulation  0            | Start/Stop simulation mode
  verbose     4            | Verbosity level
cmd> 
  1. 編寫一個測試內容 trace-sort.cmd 將其放入 trace 目錄中
# Testing the efficiency of list_sort and sort.
# You can modify the option listsort and the RAND to meet your need

option list_sort 1
new
it RAND 1000000
time
sort
time

輸入

$ ./qtest -f traces/trace-sort.cmd 

即可得到 list_sortsort 的處理時間

使用 perf 分析工具

參考 perf 安裝

輸入:

$ cat /proc/sys/kernel/perf_event_paranoid

kernel.perf_event_paranoid 的會是 4

輸入:

$ sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid"

-1 可以將權限全開

list_sortq_sort 效率分析

使用 Metric prefix,而非「萬」

資料數 q_sort list_sort
100k 0.066 0.039
200k 0.167 0.128
300k 0.290 0.202
400k 0.414 0.290
500k 0.547 0.387
600k 0.687 0.479
700k 0.816 0.579
800k 0.951 0.682
900k 1.102 0.753
1000k 1.255 0.856

使用 gnuplot 製圖工具畫成折線圖方便觀察
image
從圖看出當資料量越大 list_sort 的處理效率明顯更好

接著使用 perf 工具分析更詳細的內容

輸入

$ perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./qtest -f traces/trace-sort.cmd

不要急著比較程式效能,你該思考:

  1. 目前的測試方法是否涵蓋排序演算法的最差狀況?
  2. 是否考慮到 stable sorting 的判定
  3. 資料分布要從隨機字串改為足以反映真實情況,要做哪些修改?
  4. 是否引入統計模型來處理效能比較?
  5. 你的數學分析呢?

謝謝老師的指教,我會在更進一步的分析
jeremy

重複 5 次該程序 ,並顯示每個 event 的變化區間,接著分析對一百萬的資料做 sort ,並分析 cache-missescache-referencesinstructionscycles

得到以下結果

  • q_sort
 Performance counter stats for './qtest -f traces/trace-sort.cmd' (5 runs):

        60,203,429      cache-misses                     #   65.85% of all cache refs           ( +-  1.23% )
        91,423,384      cache-references                                                        ( +-  0.20% )
     4,732,222,486      instructions                     #    0.64  insn per cycle              ( +-  0.01% )
     7,353,673,626      cycles                                                                  ( +-  0.71% )

            1.7575 +- 0.0174 seconds time elapsed  ( +-  0.99% )

  • list_sort
 Performance counter stats for './qtest -f traces/trace-sort.cmd' (5 runs):

        47,561,816      cache-misses                     #   66.22% of all cache refs           ( +-  0.17% )
        71,827,278      cache-references                                                        ( +-  0.16% )
     4,693,241,118      instructions                     #    0.84  insn per cycle              ( +-  0.03% )
     5,615,783,740      cycles                                                                  ( +-  0.42% )

           1.32896 +- 0.00571 seconds time elapsed  ( +-  0.43% )
測試內容 q_sort list_sort
cache-misses 60,203,429 47,561,816
cache-references 91,423,384 71,827,278
instructions 4,732,222,486 4,693,241,118
cycles 7,353,673,626 5,615,783,740

由以上資料分析得知:

  • list_sort 相比 q_sort 少了 21% 的 cache-misses
  • 由 IPC(insns per cycle) 得知,執行速度也比 q_sort 快了約 31%

中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)

留意各式細節,唯有重視小處並步步為營,方可挑戰原始程式碼規模超越三千七百萬行的 Linux 核心

收到
jeremy


Tic Tac Toe

M03: ttt

Add the option of 'ttt' in the terminal

目前只是原封不動的加入到 lab0 後續再將其修改與解讀

console.c 中加入

ADD_COMMAND(ttt, "Play Tic Tac Toe", "");

do_ttt 函式

static bool do_ttt(int argc, char *argv[])
{
    /* .... */
}

更多合併過程可以參考我的

commit 538e145

在合併的過程中遇到的問題:Cppcheck: Null pointer dereference

zobrist.c:63:21: error: Null pointer dereference: (struct zobrist_entry_t*)0 [nullPointer]
            entry = hlist_entry(hash_table[i].first, zobrist_entry_t, ht_list);
                    ^

Fail to pass static analysis.

探討:

  1. 呼叫 hlist_entry
  2. hlist_entrycontainer_of 的包裝
  3. container_of 中有這段
__typeof__(((type *) 0)->member)

(type *)0 將 0 轉型成 null pointer ,是為了取得 member 的 type 。
因此修改 pre-commit.hook ,在 CPPCHECK_suppresses 中加入這段

+    --suppress=nullPointer:zobrist.c \

jasperlin1996 的筆記中也有遇到相同問題,且探討的更深入。

ai vs ai

新增電腦對決電腦的功能至 qtest 中

commit:d7cd873

./qtest 
cmd> option ai_vs_ai 1
cmd> ttt
 1 |     ×     × 
 2 |     ×  ○    
 3 |     ○  ×  ○ 
 4 |  ○          
---+-------------
      A  B  C  D
O won!
Moves: B2 -> C2 -> C3 -> D3 -> B1 -> B3 -> D1 -> A4

問題:
每次執行的結果都相同,應增加更多隨機性使其結果有更多變化

TODO:
在對弈的過程中,要在螢幕顯示當下的時間 (含秒數),並持續更新。

參考:Monte Carlo Tree Search Wiki / Monte Carlo Tree Search 解說

四步驟:

  1. Selection : Start from root R and select successive child nodes until a leaf node L is reached. The root is the current game state and a leaf is any node that has a potential child from which no simulation (playout) has yet been initiated.

從根節點開始,由 UCB 判斷該往那的子節點進行移動,其中 UCB 將在最佳策略與探索新路徑做出平衡。
UCB=wini+cln(Ni)ni2
wi : 該 node 贏得的次數
ni : 該 node 總共進行次數
Ni : Parent 的模擬次數
c : 權重 (可以自行調整 c:2 )

  1. Expansion : Unless L ends the game decisively (e.g. win/loss/draw) for either player, create one (or more) child nodes and choose node C from one of them. Child nodes are any valid moves from the game position defined by L.
  2. Simulation : Complete one random playout from node C. This step is sometimes also called playout or rollout. A playout may be as simple as choosing uniform random moves until the game is decided (for example in chess, the game is won, lost, or drawn).
  3. Backpropagation : Use the result of the playout to update information in the nodes on the path from C to R.

image

總結:

wini => exploitation (選擇已知最好策略,即勝率最高)
cln(Ni)ni2 => exploration (選擇探索已知路線)

此方程式不完全只走目前已知勝率最高的路線,而是在探索與最佳策略之間切換來達到平衡。

Fixed point implementation

為何不建議在 Linux 核心裡使用浮點數運算 :
拖慢執行速度,還需要手動儲存/取回相關的 FPU 暫存器。

其中 FPU 造成之問題:Lazy FP state restore

fixed_power_int

來自 Linux 核心原始程式碼的 Load Average 處理機制 => linux/kernel/sched/loadavg.c

理解這定點數的乘冪花了我些時間了解,先從其註釋看起

 * By exploiting the relation between the definition of the natural power
 * function: x^n := x*x*...*x (x multiplied by itself for n times), and
 * the binary encoding of numbers used by computers: n := \Sum n_i * 2^i,
 * (where: n_i \elem {0, 1}, the binary vector representing n),
 * we find: x^n := x^(\Sum n_i * 2^i) := \Prod x^(n_i * 2^i), which is
 * of course trivially computable in O(log_2 n), the length of our binary
 * vector.

寫成數學式:xn=xi=0kni2i

例如 (n=610=1102) :x6=x4x2

下面程式碼的目的為決定是否要將 x0,x2,x4 乘入,以 n=1102 為例,在第一次迴圈中 n & 1false ,故 x0 不會乘入其中。

    if (n & 1) {
        result *= x;
        result += 1UL << (frac_bits - 1);
        result >>= frac_bits;
    }
    n >>= 1;                               

到進行下次迴圈時 x 變為 x2 再下次為 x2x2=x4 以此類推

    x *= x;
    x += 1UL << (frac_bits - 1);              //carry
    x >>= frac_bits;

scaling factor

較大的縮放係數,可以使其數值範圍較大,但反之精度會較差,在這裡縮放係數表示為 1frac_bits
在老師提供的 mctsscore 使用的是 double ,因此要將其改成定點數運算,使用到 stdint.h 中的 uint64_t 來取代 double

implementation

commit:322411f

在 UCB 的計算中用到 log2 與 sqrt 的運算,需先將其改成使用定點數的運算。

log

參考< jujuegg >的實作使用泰勒級數展開來取得近似 ln(x) 的值 :
ln(1+x)=xx22!+x33!....

sqrt

接個引入第三週測驗中的開方根函式

static uint64_t fixed_sqrt(uint64_t N)
{
    uint64_t msb = 0;
    uint64_t n = N;
    while (n > 1) {
        n >>= 1;
        msb++;
    }
    uint64_t a = 1 << msb;
    uint64_t result = 0;
    while (a != 0) {
        if ((result + a) * (result + a) <= N)
            result += a;
        a >>= 1;
    }
    return result;
}

參照〈並行程式設計:排程器原理〉,引入若干 coroutine