2022q1 Homework1 (lab0)

contributed by < laneser >

作業要求

實驗環境

$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           94
Model name:                      Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
Stepping:                        3
CPU MHz:                         799.929
CPU max MHz:                     4000.0000
CPU min MHz:                     800.0000
BogoMIPS:                        6816.61
Virtualization:                  VT-x
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        1 MiB
L3 cache:                        8 MiB
NUMA node0 CPU(s):               0-7

針對佇列的操作

q_new

Create empty queue.
Return NULL if could not allocate space.

struct list_head *q_new()
{
    struct list_head *q = malloc(sizeof(struct list_head));
    if (q != NULL)
    {
        q->next = q->prev = q;
    }
    return q;
}

但發現其實 list.h 已經有提供 INIT_LIST_HEAD 可以用，所以應該改用

INIT_LIST_HEAD(q);

q_free

Free all storage used by queue.

void q_free(struct list_head *l)
{
    if (l == NULL)
        return;
    while (list_empty(l) == false) {
        struct list_head *n = l->next;
        list_del(n);
        free(n);
    }
    free(l);
}

q_size

Return number of elements in queue.
Return 0 if q is NULL or empty

int q_size(struct list_head *head)
{
    if (head == NULL)
        return 0;

    int len = 0;
    struct list_head *li;

    list_for_each (li, head)
        len++;
    return len;
}

q_insert_head

Attempt to insert element at head of queue.
Return true if successful.
Return false if q is NULL or could not allocate space.

bool q_insert_head(struct list_head *head, char *s)
{
    if (head == NULL)
        return false;
    element_t *new_ele = malloc(sizeof(element_t));
    if (new_ele == NULL)
        return false;
    new_ele->value = strdup(s);
    if (new_ele->value == NULL) {
        free(new_ele);
        return false;
    }
    list_add(&new_ele->list, head);
    return true;
}

然後才發現原來 node 是 element_t, 我之前的 q_free 函式寫錯了，應該改為

    while (list_empty(l) == false) {
        element_t *n = q_remove_head(l, NULL, 0);
        // failed to remove? stop it or it might lead to dead loop.
        if (n == NULL)
            return;
        q_release_element(n);
    }

q_insert_tail

Attempt to insert element at tail of queue.
Return true if successful.
Return false if q is NULL or could not allocate space.

此時發現 q_insert_tail 跟 q_insert_head 有重複的程式碼，所以抽出來做 new_element

/*
* New an element for s,
* It will allocate memory for s
* Return null if allocation failed.
*/
element_t *new_element(char *s)
{
    element_t *new_ele = malloc(sizeof(element_t));
    if (new_ele == NULL)
        return NULL;
    new_ele->value = strdup(s);
    if (new_ele->value == NULL) {
        free(new_ele);
        return NULL;
    }
    return new_ele;
}

bool q_insert_head(struct list_head *head, char *s)
{
    if (head == NULL)
        return false;
    element_t *new_ele = new_element(s);
    if (new_ele == NULL)
        return false;
    list_add(&new_ele->list, head);
    return true;
}

bool q_insert_tail(struct list_head *head, char *s)
{
    if (head == NULL)
        return false;
    element_t *new_ele = new_element(s);
    if (new_ele == NULL)
        return false;
    list_add_tail(&new_ele->list, head);
    return true;
}

q_remove_head

Attempt to remove element from head of queue.
Return target element.
Return NULL if queue is NULL or empty.

element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
    element_t *n = container_of(head->next, element_t, list);
    list_del(head->next);
    if (sp != NULL) {
        strncpy(sp, n->value, bufsize - 1);
        sp[bufsize - 1] = '\0';
    }
    return n;
}

需要花點時間讀懂你所不知道的 C 語言: linked list 和非連續記憶體
搞到兩點才看到皮卡丘 :sob:

(早上起床發現老師會貼心幫你修改作業…)

多花點時間再讀 list.h 有發現 list_for_each_entry_safe , 使用這個函式會讓 q_free 更簡潔也更快速，不用 remove 也就不用檢查 remove 失敗。

void q_free(struct list_head *l)
{
    if (l == NULL)
        return;
    element_t *n, *s;
    list_for_each_entry_safe (n, s, l, list)
        q_release_element(n);
    free(l);
}

list_first_entry 也可以用在 q_remove_head 函式當中

element_t *n = list_first_entry(head, element_t, list);

q_remove_tail

Attempt to remove element from tail of queue.

element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize)
{
    if (!head || !head->prev)
        return NULL;
    return q_remove_head(head->prev->prev, sp, bufsize);
}

感謝同學 at0mCe11 提醒 head 要做檢查。

可以盡量使用 q_remove_head, 才發現 q_remove_head 沒有檢查 list empty.
需要針對 q_remove_head 開頭補上

    if (list_empty(head)) {
        return NULL;
    }

q_delete_mid

Delete the middle node in list

bool q_delete_mid(struct list_head *head)
{
    // https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/
    if (!head || list_empty(head)) {
        return false;
    }
    struct list_head *h = head->next;
    struct list_head *t = head->prev;
    while (true) {
        if (h == t) {
            // length is odd number, delete the one.
            list_del(h);
            q_release_element(list_entry(h, element_t, list));
            break;
        } else if (h->next == t) {
            // length is even number, delete the bigger one.
            list_del(t);
            q_release_element(list_entry(t, element_t, list));
            break;
        }
        h = h->next;
        t = t->prev;
    }
    return true;
}

既然是 doubly-linked list, 那取中間最簡單的方法就是分別從頭跟尾一起搜尋，直到雙方碰頭為止. 不過, 如果 head 不是一個 doubly-linked list, 可能會陷入無窮迴圈。
然後又發現 q_remove_head 沒檢查 NULL。

q_delete_dup

Delete all nodes that have duplicate string,
leaving only distinct strings from the original list.
Return true if successful.
Return false if list is NULL.

Note: this function always be called after sorting

bool q_delete_dup(struct list_head *head)
{
    // https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/
    if (!head)
        return false;
    struct list_head *node;
    for (node = head->next; node->next != head;) {
        element_t *current = list_entry(node, element_t, list);
        element_t *next = list_entry(node->next, element_t, list);
        if (strcmp(current->value, next->value) == 0) {
            list_del(&next->list);
            q_release_element(next);
        } else {
            node = node->next;
        }
    }
    return true;
}

既然 list 已經是排序，所以重複的字串節點一定是在相鄰的節點上，所以只要依序移除重複節點就可以留下不重複的字串了。

延伸思考：若要針對 unsorted list 移除掉重複內容的節點，該怎麼做？如何避免沈重的排序操作？
:notes: jserv

對於 unsorted list, 考慮記憶體充足的狀況，最簡單就是建立 hash，這樣也是只要 O(n) 的時間就可以做完刪除重複的節點。

如果想要減低記憶體使用，可以考慮利用 bloom filter or cuckoo filter 來過濾重複字串，可是遇到 false postive 或者真的重複的情境就無可避免需要回頭掃描重複節點。

一般而言 bloom filter 使用的記憶體會遠小於 hash, 如果 false postive 機率夠低就有優勢。而且 filter 可以依照 list size 建立不同的大小，這樣可以應對的情境就比較廣。

Google 發現還有 trie 這種做法，一邊走訪 list, 一邊建立字典樹，也是一種方法。

可以結合 bloom filer and hash, 第一次走訪 list 建立 bloom filter, 並且僅針對可能重複的節點建立 hash, 第二次走訪則根據 hash 刪除重複的節點。

目前結論：

Sort + delete duplication scan cost O(nlogn)+O(n) = O(nlogn)
Build hash/trie and scan at the same time cost O(n)
Use bloom filter + hash cost O(2n), but we can trade time with memory.

感謝其他同學的回饋，我對於 delete duplicate 的意義弄錯了，所以再實作一次。

bool q_delete_dup(struct list_head *head)
{
    // https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/
    if (!head)
        return false;
    element_t *n, *s;
    bool isdup = false;
    list_for_each_entry_safe (n, s, head, list) {
        if ((n->list.next != head) &&
            (strcmp(n->value,
                    list_entry(n->list.next, element_t, list)->value) == 0)) {
            list_del(&n->list);
            q_release_element(n);
            isdup = true;
        } else if (isdup) {
            list_del(&n->list);
            q_release_element(n);
            isdup = false;
        }
    }
    return true;
}

還是基於相鄰的節點是才會有重複的字串，只是利用 isdup 紀錄目前這個節點是否與前一個節點為重複。

q_swap

Attempt to swap every two adjacent nodes.

void q_swap(struct list_head *head)
{
    // https://leetcode.com/problems/swap-nodes-in-pairs/
    if (!head)
        return;
    struct list_head *node;
    for (node = head->next; (node->next != head) && (node != head);
         node = node->next) {
        struct list_head *next = node->next;
        list_del(node);
        list_add(node, next);
    }
}

依序倆倆節點互相交換，先移除一個節點，再加入在交換的位置上，就完成了。
判斷迴圈繼續的條件需要至少兩個節點以上。

q_reverse

Reverse elements in queue

void q_reverse(struct list_head *head)
{
    if (!head || list_empty(head)) {
        return;
    }
    struct list_head *h = head->next;
    struct list_head *t = head->prev;
    while (true) {
        if (h == t) {
            // length is odd number, no need to move the last one.
            break;
        }
        if (h->next == t) {
            // length is even number, swap the last two element.
            list_del(h);
            list_add(h, t);
            break;
        }
        // swap two element.
        struct list_head *tmp = h->prev;
        list_del(h);
        list_add(h, t);
        list_del(t);
        list_add(t, tmp);
        tmp = h->prev;
        h = t->next;
        t = tmp;
    }
}

對於反向排列，簡單的思考就很像 delete mid 的架構，將頭尾交換，一直到中間剩下一對或者單一個節點為止。如果有好用的 swap function, 應該會讓程式碼更簡單。

想到如果 swap element point, 連 list 的指標通通不用改，這樣程式碼果然更乾淨。

/* Swap two elements' data */
void swap_element_value(element_t *a, element_t *b)
{
    char *tmp = a->value;
    a->value = b->value;
    b->value = tmp;
}

void q_reverse(struct list_head *head)
{
    if (!head || list_empty(head)) {
        return;
    }
    struct list_head *h = head->next;
    struct list_head *t = head->prev;
    while (h != t) {
        // swap two element.
        swap_element_value(list_entry(h, element_t, list),
                           list_entry(t, element_t, list));
        if (h->next == t) {
            // swaped latest two element, over.
            break;
        }
        h = h->next;
        t = t->prev;
    }
}

q_sort

Sort elements of queue in ascending order

void quickSort(struct list_head *low, struct list_head *high)
{
    if ((low != high) && (high->next != low)) {
        // partition elements.
        // using high element as pivot
        char *pivot_value = list_entry(high, element_t, list)->value;
        struct list_head *idx = low->prev;
        for (struct list_head *node = low; node != high; node = node->next) {
            element_t *current = list_entry(node, element_t, list);
            if (strcmp(current->value, pivot_value) <= 0) {
                idx = idx->next;
                swap_element_value(list_entry(idx, element_t, list), current);
            }
        }
        idx = idx->next;
        swap_element_value(list_entry(idx, element_t, list),
                           list_entry(high, element_t, list));
        quickSort(low, idx->prev);
        quickSort(idx->next, high);
    }
}

/*
 * Sort elements of queue in ascending order
 * No effect if q is NULL or empty. In addition, if q has only one
 * element, do nothing.
 */
void q_sort(struct list_head *head)
{
    if (head && !list_empty(head))
        quickSort(head->next, head->prev);
}

有了 swap, 就可以做簡單的 quick sort, 但做完看起來效能有待加強。
看來是要花時間研讀老師指定的 merge sort lib/list_sort.c 才能通過。

看了 make test 用的測試範例，發現 quicksort 之所以效能不佳，與老師上課不斷地提醒的 不夠確定性 有關係，因為 quicksort 的 worst case 很容易發生在已經完成排序的 data, 此時花費的時間會落入 O(\(n^2\)), 而不再是 O(nlogn).

換句話說，我簡單地實現一版 mergesort, 也會落入相同的問題。效能已經跟 inserstion sort 一樣慢了。

struct list_head *my_merge(struct list_head *l1, struct list_head *l2)
{
    // merge with recursive
    if (!l2)
        return l1;
    if (!l1)
        return l2;

    if (strcmp(list_entry(l1, element_t, list)->value,
               list_entry(l2, element_t, list)->value) < 0) {
        l1->next = my_merge(l1->next, l2);
        return l1;
    } else {
        l2->next = my_merge(l1, l2->next);
        return l2;
    }
}

struct list_head *my_mergeSortList(struct list_head *head)
{
    if (!head || !head->next)
        return head;

    // split list
    struct list_head *i = head->next;
    struct list_head *j = head;
    j->next = NULL;

    return my_merge(my_mergeSortList(i), my_mergeSortList(j));
}

void my_mergesort(struct list_head *head)
{
    if (q_size(head) <= 1) {
        return;
    }
    struct list_head *list = head->next;
    head->prev->next = NULL;
    list = my_mergeSortList(list);
    head->next = list;

    // rebuild prev link
    struct list_head *i = head;
    while (i->next != NULL) {
        i->next->prev = i;
        i = i->next;
    }
    head->prev = i;
    i->next = head;
}

老師提的 sort list 範例在 split list 的地方本來看不懂:

    // split list
    while(fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    fast = slow->next;
    slow->next = NULL;

但看到隔壁同學的心得 https://hackmd.io/@ppodds/HJ20QXakq, 老師提到 Robert W. Floy 提出的循環偵測演算法, 原來這裡的 split 是要找到中間的節點。
所以 split 用這個方法就比原來好。不過還是有部分測試沒過。

要使用 lib_sort.c, 只需把 lib/list_sort.c 直接複製貼上，在這段程式碼之前補上一些定義就可以使用了。

typedef unsigned char u8;
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
typedef int
    __attribute__((nonnull(2, 3))) (*list_cmp_func_t)(void *,
                                                      const struct list_head *,
                                                      const struct list_head *);
int sort_comp(void *p, const struct list_head *a, const struct list_head *b)
{
    return strcmp(list_entry(a, element_t, list)->value,
                  list_entry(b, element_t, list)->value);
}

likely 跟 unlikely 這兩個機制蠻有趣的， https://meetonfriday.com/posts/cecba4ef 說這個是要優化 compiler 產生的指令。如果我們可以提供 if 條件發生的機率，那麼 compiler 可以幫我們產生更快的指令碼。說不定將來的 compiler 搭配靜態分析跟亂數產生的資料也可以自己判斷發生的機率而做到一樣的事情吧？

這樣搭配

void q_sort(struct list_head *head)
{
    if (head && !list_empty(head))
        list_sort(NULL, head, sort_comp);
}

雖然可以在自己的環境看到 100 分，但過不了 cppcheck 這一關。
原來 linux kernel 的 code 這行:

    struct list_head *head, **tail = &head;

會讓 cppcheck 抱怨 head 指標並沒有初始化就拿來用，其實後面會利用 *tail 塞值，但是靜態分析在這邊力有未逮，只好先安撫 cppcheck:

    struct list_head *head = NULL, **tail = &head;

這樣終於在 GitHub 看到抄來的滿分皮卡丘！

而原本的 mergesort 只剩下 trace-14-perf.cmd 這個測試過不了。

cmd> new
l = []
cmd> ih dolphin 1000000
l = [dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin ... ]
cmd> it gerbil 1000000
l = [dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin ... ]
cmd> reverse
l = [gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil ... ]
cmd> sort
Segmentation fault (core dumped)

對於這種數量級的才發生錯誤的狀況，大概猜是 stackoverflow.
因為呼叫太多層遞迴函式才會導致 Segmentation fault.
所以優化一下 my_merge

struct list_head *my_merge(struct list_head *a, struct list_head *b)
{
    struct list_head *head = NULL, **tail = &head;
    for (;;) {
        /* if equal, take 'a' -- important for sort stability */
        if (strcmp(list_entry(a, element_t, list)->value,
               list_entry(b, element_t, list)->value) <= 0) {
            *tail = a;
            tail = &a->next;
            a = a->next;
            if (!a) {
                *tail = b;
                break;
            }
        } else {
            *tail = b;
            tail = &b->next;
            b = b->next;
            if (!b) {
                *tail = a;
                break;
            }
        }
    }
    return head;
}

這邊就幾乎按照 lib_sort.c 裡的，用 for 取代遞迴。
終於有一個比較是自己寫的滿分版本。

比較自己實作的 merge sort 和 Linux 核心程式碼之間效能落差

由於沒辦法新增 queue.h 的定義，所以就把 lib_sort.c 的程式放進 qtest.c,
然後新增一個 option 來切換 sort 的實作，這樣很快就可以看看自己實作的效能跟 Linux 核心程式碼的差距。
在 qtest.c 當中把 do_sort 函式的部分改為這樣:

    if (exception_setup(true)) {
        if (is_enable_linux_sort()) {
            q_sort_linux(l_meta.l);
        } else {
            q_sort(l_meta.l);
        }
    }

於是利用這樣的 commands 來呈現差距:

# Test performance of sort compare to linux queue sort
option fail 0
option malloc 0
new
ih dolphin 1000000
it gerbil 1000000
reverse
time sort
option linux_qsort 1
new
ih dolphin 1000000
it gerbil 1000000
reverse
time sort

執行的結果顯示 lib_sort.c 效能約比自己的實作還要快 15%, 甚至用不同的資料測試還可以差到 35% 的時間。
顯然自己實作的效能輸很大。

lib_sort.c 針對不少 mergesort 的情境做了優化，而且架構上先走訪一次 list, 做了一輪可以盡早合併的 list, 切割預期數量 (power of two) 的 pending list, 然後才再把 pending list 做 mergesort. 而且會考量硬體層面，像是註解提到可以避免 cache thrashing, 但恐怕我還需要更多知識跟理解才能知道為什麼。

覺得 mergesort 一開始切割 list 這邊有優化的空間，如果一個 list 有發現已經排序好的，那就不用排了，把切割點切在沒有排序的 list 上面，若無法發現排序，才使用 cycle detection.

struct list_head *my_mergeSortList(struct list_head *head)
{
    if (!head || !head->next)
        return head;
    struct list_head *sorted_tail = head;
    // scan list for not sorted node
    while (sorted_tail->next) {
        if (strcmp(list_entry(sorted_tail, element_t, list)->value,
                   list_entry(sorted_tail->next, element_t, list)->value) > 0) {
            break;
        }
        sorted_tail = sorted_tail->next;
    }
    if (!sorted_tail->next) {
        // all the list is sorted, just return
        return head;
    }
    if (sorted_tail != head) {
        struct list_head *not_sorted = sorted_tail->next;
        sorted_tail->next = NULL;
        return my_merge(head, my_mergeSortList(not_sorted));
    }
    // the first two node is not sorted,
    // use cycle detection to split list at middle
    struct list_head *slow = head;
    struct list_head *fast = head->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    fast = slow->next;
    slow->next = NULL;
    return my_merge(my_mergeSortList(head), my_mergeSortList(fast));
}

這樣就不太會輸那麼多了 (會贏很多是因為資料太順了，自己的實作有 overfitting 的狀況)。

cmd> # Test performance of sort compare to linux queue sort
cmd> new
l = []
cmd> ih dolphin 1000000
l = [dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin ... ]
cmd> it gerbil 1000000
l = [dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin ... ]
cmd> time sort
l = [dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin ... ]
Delta time = 0.310 (lib_sort cost 0.639)
cmd> reverse
l = [gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil ... ]
cmd> time sort
l = [dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin ... ]
Delta time = 0.367 (lib_sort cost 0.807)
cmd> time sort
l = [dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin ... ]
Delta time = 0.343 (lib_sort cost 0.779)
cmd> reverse
l = [gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil ... ]
cmd> time sort
l = [dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin ... ]
Delta time = 0.332 (lib_sort cost 0.779)
cmd> new
Freeing old queue
l = NULL
l = []
cmd> ih RAND 100000
l = [yxyzgble nzhmn ctwmt bjbkw oyoih xhkaiboyz syjluc yvozduje ynucbro bfekbkja vlnemh eolretmn bkohtezz mohtjoqzm oqeww rukfmfe wgtevoy iwigvd bwolqvx haouyiptr fjhra ncglrksf ldnwrt ecmooht qcypcgoz drutbjmg xeiuteq oebry qgmrpk vpwwo ... ]
cmd> time sort
l = [aaada aaafyckx aaaiwafhn aaajrigh aaajsfo aaaln aaatq aaaxjv aabagcvw aabbnnnyf aabdv aabftyoo aabgxte aabkq aablehbsv aabxtztiw aacebvvc aacetjszx aacmlowk aactvk aacyuzz aadptyps aadzfnmc aaeeqwxl aaelfqbd aaevbihv aaewkftdq aafbyha aaffqybq aafki ... ]
Delta time = 0.074 (lib_sort cost 0.086)
cmd> ih RAND 100000
l = [splnj txgmcz mskaos cewvucpzq huwererb yjntn twrpgtd xwgkih vehlho foibyawhc bfmqwwxu apwolcdli qbhuj kwmpihw ntenygd fhaoccghy daojzpaor vbsfurbwn naexdat vbycwgjn ytihi omlnzni qmlwfwet cbqvl rhtmzlztc svhxcds hqbuo gfwvr fhwtgf lxiwte ... ]
cmd> time sort
l = [aaada aaafyckx aaahezhv aaaiwafhn aaajrigh aaajsfo aaaln aaatq aaaxjv aabagcvw aabaghlk aabbnnnyf aabdv aabftyoo aabgxte aabkq aablehbsv aabnx aabpjn aabptkcjj aabrmx aabxtztiw aacebvvc aacetjszx aacgg aacmcdffj aacmlowk aacppul aactvk aacyuzz ... ]
Delta time = 0.173 (lib_sort cost 0.188)

在讀了同學 bakudr18 的 sort 以及 kdnvt 對於排序的比較次數的作業後，在這裡其實可以花更多時間鑽研問題，尤其是 bakudr18 對 linux kernel 的研究方法很棒，要去挖 git log! 而不是只看 source code.

TODO: 對此問題再多花點時間。

運用 Valgrind 排除 qtest 實作的記憶體錯誤

透過 valgrind 呼叫 qtest 執行，看來沒甚麼問題。

valgrind ./qtest
cmd> new
l = []
cmd> ih gerbil
l = [gerbil]
cmd> ih bear
l = [bear gerbil]
cmd>
Freeing queue

但如果透過 -f 讓 qtest 執行命令，就會看到 memory leak 的訊息。











































valgrind ./qtest -f ./traces/trace-01-ops.cmd 
cmd> # Test of insert_head and remove_head
cmd> option fail 0
cmd> option malloc 0
cmd> new
l = []
cmd> ih gerbil
l = [gerbil]
cmd> ih bear
l = [bear gerbil]
cmd> ih dolphin
l = [dolphin bear gerbil]
cmd> rh dolphin
Removed dolphin from queue
l = [bear gerbil]
cmd> rh bear
Removed bear from queue
l = [gerbil]
cmd> rh gerbil
Removed gerbil from queue
l = []
cmd> 
Freeing queue
==13465== 4 bytes in 1 blocks are still reachable in loss record 1 of 3
==13465==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==13465==    by 0x4A3F50E: strdup (strdup.c:42)
==13465==    by 0x110CDE: linenoiseHistoryAdd (linenoise.c:1236)
==13465==    by 0x111871: linenoiseHistoryLoad (linenoise.c:1325)
==13465==    by 0x10CAC3: main (qtest.c:1282)
==13465== 
==13465== 145 bytes in 19 blocks are still reachable in loss record 2 of 3
==13465==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==13465==    by 0x4A3F50E: strdup (strdup.c:42)
==13465==    by 0x110C52: linenoiseHistoryAdd (linenoise.c:1236)
==13465==    by 0x111871: linenoiseHistoryLoad (linenoise.c:1325)
==13465==    by 0x10CAC3: main (qtest.c:1282)
==13465== 
==13465== 160 bytes in 1 blocks are still reachable in loss record 3 of 3
==13465==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==13465==    by 0x110C9E: linenoiseHistoryAdd (linenoise.c:1224)
==13465==    by 0x111871: linenoiseHistoryLoad (linenoise.c:1325)
==13465==    by 0x10CAC3: main (qtest.c:1282)
==13465==

所以問題應該就是 -f 這個選項開啟的時候，沒有把 history 釋放掉。
也就是沒有呼叫 linenoise.c 裡面的 freehistory 這個函式，或者它的 caller linenoiseAtExit 函式。
觀察 linenoise.h 會發現， freehistory 或者 linenoiseAtExit 這些函式都沒有開放給外部呼叫，而 linenoiseAtExit 是在 enableRawMode 的時候，透過 atexit 註冊到程序結束時被自動呼叫。
一路追查 enableRawMode，直到 linenoise 這個函式，終於找到 -f 選項差異。

    if (!has_infile) {
        char *cmdline;
        while ((cmdline = linenoise(prompt)) != NULL) {
            interpret_cmd(cmdline);
            linenoiseHistoryAdd(cmdline);       /* Add to the history. */
            linenoiseHistorySave(HISTORY_FILE); /* Save the history on disk. */
            linenoiseFree(cmdline);
        }
    } else {
        while (!cmd_done())
            cmd_select(0, NULL, NULL, NULL, NULL);
    }

如果沒有 -f , 則會透過 linenoise 讀取命令，否則是透過檔案內容。
而透過檔案內容因為沒有呼叫 linenoise 因而沒有機會註冊 linenoiseAtExit，自然就沒有 free history 資料了。

我的想法是，既然沒有 -f 選項的時候也用不到 history, 那就不需要那麼早呼叫 linenoiseHistoryLoad, 而是在呼叫 linenoise 之前讀取 history 就好了。

diff --git a/console.c b/console.c
index 2616acd..89d5b37 100644
--- a/console.c
+++ b/console.c
@@ -652,6 +652,7 @@ bool run_console(char *infile_name)
     }
 
     if (!has_infile) {
+        linenoiseHistoryLoad(HISTORY_FILE); /* Load the history */
         char *cmdline;
         while ((cmdline = linenoise(prompt)) != NULL) {
             interpret_cmd(cmdline);
diff --git a/qtest.c b/qtest.c
index cd7f3e1..8c1c02c 100644
--- a/qtest.c
+++ b/qtest.c
@@ -1279,7 +1279,6 @@ int main(int argc, char *argv[])
     linenoiseSetCompletionCallback(completion);
 
     linenoiseHistorySetMaxLen(HISTORY_LEN);
-    linenoiseHistoryLoad(HISTORY_FILE); /* Load the history at startup */
     set_verblevel(level);
     if (level > 1) {
         set_echo(true);

第二個發現的問題是 make valgrind 若失敗的時候，也會有錯誤:

+++ TESTING trace trace-17-complexity:
[0m---	trace-17-complexity	0/5[0m
[0m---	TOTAL		95/100[0m
==26274== 32 bytes in 1 blocks are still reachable in loss record 1 of 30
==26274==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==26274==    by 0x10D15C: malloc_or_fail (report.c:189)
==26274==    by 0x10DBE6: add_cmd (console.c:90)
==26274==    by 0x10DF70: init_cmd (console.c:409)
==26274==    by 0x10C8BE: main (qtest.c:1275)

追查 init_cmd 所建立的 cmd_list 等變數應該是在 do_quit() 的時候被 free, 所以若有遺留，就是 do_quit() 沒有被呼叫，再一路追查到 finish_cmd() 應該也是沒被呼叫，因為 qtest.c 是這樣呼叫 finish_cmd():

ok = ok && finish_cmd();

若失敗的時候, ok 的內容已經是 false, 則 finish_cmd() 就不會被呼叫了。可以改為

ok = finish_cmd() && ok;

就可以避免 ok 是 false 的時候沒有呼叫 finish_cmd().

Valgrind 還可以透過 massif 得到視覺化的資料，但我的 linux 只有 console mode, 沒有 GUI, 好在還有好心人士有提供 massif.js 可以直接視覺化資料。

valgrind --tool=massif ./qtest -f traces/trace-sort-perf.cmd

會得到一個 massif.out.PID 的檔案，把內容丟到該 webpage 就有圖片, 但沒有 Massif-Visualizer 漂亮。

而因為 Valgrind 會大幅降低速度，跑分析之類的都會 timeout, 可以把 harness.c 裡面的 time_limit 暫時調高用來跑獲取分析數據

static int time_limit = 10;

在 qtest 提供新的命令 `shuffle`

理解 Fisher–Yates shuffle 演算法，實作如下:

void swap_value(element_t *a, element_t *b)
{
    char *tmp = a->value;
    a->value = b->value;
    b->value = tmp;
}

/* Get the nth node from list */
struct list_head *list_idx(struct list_head *head, int idx)
{
    while (idx-- >= 0) {
        head = head->next;
    }
    return head;
}

static void q_shuffle(struct list_head *head)
{
    int len = q_size(head);
    if (len <= 1) {
        return;
    }

    srand(time(NULL));
    struct list_head *tail = head->prev;
    for (int i = len; i > 1; i--) {
        int rand_idx = rand() % i;
        swap_value(list_entry(list_idx(head, rand_idx), element_t, list),
                   list_entry(tail, element_t, list));
        tail = tail->prev;
    }
}

要在 qtest 加命令，只需要在 console_init() 其中加入一行:

    ADD_COMMAND(
        shuffle,
        "                | Shuffle the queue using Fisher–Yates shuffle");

並且提供 do_shuffle 函式的實作:

static bool do_shuffle(int argc, char *argv[])
{
    if (argc != 1) {
        report(1, "%s takes no arguments", argv[0]);
        return false;
    }

    if (!l_meta.l)
        report(3, "Warning: Try to access null queue");
    error_check();

    set_noallocate_mode(true);
    if (exception_setup(true))
        q_shuffle(l_meta.l);
    exception_cancel();

    set_noallocate_mode(false);

    show_queue(3);
    return !error_check();
}

就可以透過 qtest 使用 shuffle 命令:

cmd> new
l = []
cmd> ih RAND 5
l = [tcuvqhs kfczncdek nmxysnawp minhyu woyshni]
cmd> shuffle
l = [woyshni kfczncdek nmxysnawp tcuvqhs minhyu]
cmd> shuffle
l = [woyshni tcuvqhs minhyu kfczncdek nmxysnawp]

在 qtest 提供新的命令 `web`

既然需要引入 tiny-web-server , 那就是把當中的 tiny.c 放進專案，然後想辦法讓 Makefile 接受它。

OBJS := qtest.o report.o console.o harness.o queue.o \
        random.o dudect/constant.o dudect/fixture.o dudect/ttest.o \
        linenoise.o tiny.o

看來只要在 Makefile 的 OBJS 後面加上 tiny.o, Makefile 會自動編譯 tiny.c.

接著修改 tiny.c 的 Main.

// Launch tinyweb server
// Return accepted fd
// Return 0 if error or failed.
int tinyweb_main(int argc, char **argv)
{
    if (argc > 1 && (!strcmp(argv[1], "-h") || !strcmp(argv[1], "--help"))) {
        print_help();
        return 0;
    }

    int default_port = DEFAULT_PORT, listenfd;
    char buf[256];
    char *path = getcwd(buf, 256);
    if (argc == 2) {
        if (argv[1][0] >= '0' && argv[1][0] <= '9') {
            default_port = atoi(argv[1]);
        } else {
            path = argv[1];
            if (chdir(path) != 0) {
                perror(path);
                exit(1);
            }
        }
    } else if (argc == 3) {
        default_port = atoi(argv[2]);
        path = argv[1];
        if (chdir(path) != 0) {
            perror(path);
            exit(1);
        }
    }
    printf("serve directory '%s'\n", path);

    listenfd = open_listenfd(default_port);
    if (listenfd > 0) {
        printf("listen on port %d, fd is %d\n", default_port, listenfd);
    } else {
        perror("ERROR");
        return 0;
    }
    // Ignore SIGPIPE signal, so if browser cancels the request, it
    // won't kill the whole process.
    signal(SIGPIPE, SIG_IGN);
    return listenfd;
}

把 Main 改為只要啟動 web server 就回傳 listenfd, 之後就能整合進 qtest.
還有把每次接收連線進來的 accept 函式也寫出來。

// Tinyweb accept connection
// Return connection fd
int tinyweb_accept(int listenfd)
{
    struct sockaddr_in clientaddr;
    socklen_t clientlen = sizeof clientaddr;
    return accept(listenfd, (SA *) &clientaddr, &clientlen);
}

而且因為其他 qtest 需要呼叫 tiny.c 裡面的函式跟定義, 所以抽出 tiny.h.

#ifndef __TINY_H
#define __TINY_H

#include <sys/types.h>

typedef struct {
    char filename[512];
    off_t offset; /* for support Range */
    size_t end;
} http_request;

// Launch tinyweb server
int tinyweb_main(int argc, char **argv);

// Tinyweb accept connection
int tinyweb_accept(int listenfd);

// Process http request
void parse_request(int fd, http_request *req);

#endif  /* __TINY_H */

把需要提供給 qtest 使用的三個函式作為 tiny-web-server 的介面, http_request 就照著搬過來。

就可以先從提供 web 命令開始做起。
在 console_init 裡面補上：

ADD_COMMAND(web, "                | Launch tinyweb server");

而且實作 do_web 函式：

/* tinyweb server fd */
int tinyweb_fd = 0;
/* tinyweb connection fd */
int tinyweb_conn_fd = 0;

static bool do_web(int argc, char *argv[])
{
    if (tinyweb_fd) {
        report(3, "Warning: Already launched tinyweb server");
    } else {
        tinyweb_fd = tinyweb_main(argc, argv);
        report(3, "Launched tinyweb server, press enter to make web accept input");
    }
    return true;
}

利用 tinyweb_fd 作為 server listen socket id, 而一次只能起一個 web, 也沒有打算要關掉的意思。 :laughing:
現在就應該有 web 命令了。

 ./qtest
cmd> web
serve directory '/root/lab0-c'
listen on port 9999, fd is 3
Launched tinyweb server
cmd>

但是，後續還有好多事情要做。
因為作業的要求是能夠在輸入指令的同時，處理 web 的輸入。現在的狀態會讓 qtest 停止在等待鍵盤輸入的狀態，因而無法接受來自網路的連線需求。所以有必要把 tinyweb_fd 這個接收連線的 socket id 傳遞給等待鍵盤出入的函式。
老師已經提示等待鍵盤輸入的函式就在 linenoise.c 的 linenoiseEdit 當中。
所以可以一路追查到 qtest 的

ok = ok && run_console(infile_name);

這行指令就是 qtest 等待鍵盤輸入的進入點。
基於執行 web 命令的時候根本沒有離開 run_console 的函式，所以 tinyweb_fd 值的變化難以透過 run_console 傳遞，但 tinyweb_fd 的位址可不會變化，所以就是給它位址。
tinyweb_conn_fd 是之後我們要跟網路輸出溝通使用的，這裡也先給它。

ok = ok && run_console(infile_name, &tinyweb_fd, &tinyweb_conn_fd);

於是， run_console 裡面主要的接收鍵盤指令並且執行的部分就變成

while ((cmdline = linenoise(prompt, *tinyweb_fd, tinyweb_conn_fd)) != NULL) {
    if (*tinyweb_conn_fd)
    {
        char header[] = "HTTP/1.1 200 OK\r\nContent-Type: text/plain; charset=UTF-8\r\n\r\n";
        // write http response header
        if (write(*tinyweb_conn_fd, header, strlen(header)) < 0)
            printf("write web output error.");
    }
    interpret_cmd(cmdline);
    if (*tinyweb_conn_fd)
    {
        fsync(*tinyweb_conn_fd);
        close(*tinyweb_conn_fd);
        *tinyweb_conn_fd = 0;
    }
    linenoiseHistoryAdd(cmdline);       /* Add to the history. */
    linenoiseHistorySave(HISTORY_FILE); /* Save the history on disk. */
    linenoiseFree(cmdline);
    }

在這邊考慮把 linenoise 的實作成為當 web 跟 stdio 二者任何一個有輸入的時候，就會把該輸入的字串回傳成為 cmdline.
如果 *tinyweb_conn_fd 有值，表示 tinyweb 有收到 http 連線，於是實作 http protocol, 回傳 http header 並且在執行完命令後關閉連線。

接著就是比較難的 linenoise 實作，不過老師已經有給了範例，稍微修改就可以用。

signed char c;
int nread;
char seq[3];

if (tinyweb_fd) {
    // we have both web and console.
    fd_set set;
    FD_ZERO(&set);
    FD_SET(tinyweb_fd, &set);
    FD_SET(stdin_fd, &set);
    int rv = select(tinyweb_fd + 1, &set, NULL, NULL, NULL);
    switch (rv) {
    case -1:
        perror("select"); /* an error occurred */
        continue;
    case 0:
        printf("timeout occurred\n"); /* a timeout occurred */
        continue;
    default:
        if (FD_ISSET(tinyweb_fd, &set)) {
            // web requested.
            *tinyweb_conn_fd = tinyweb_accept(tinyweb_fd);
            http_request req;
            parse_request(*tinyweb_conn_fd, &req);
            if (strlen(req.filename) >= buflen)
            {
                // request is too large!
                printf("web input too large to process\n");
                close(*tinyweb_conn_fd);
                continue;
            }
            printf("web input: %s", req.filename);
            strncpy(buf, req.filename, buflen);
            // replace / to ' ', simulate command line input
            char *replace = buf;
            while ((replace = strchr(replace, '/')) != NULL)
                *replace++ = ' ';
            return strlen(buf);
        } else if (FD_ISSET(stdin_fd, &set)) {
            nread = read(l.ifd, &c, 1);
            if (nread <= 0)
                return l.len;
        }
        break;
    }
} else {
    nread = read(l.ifd, &c, 1);
    if (nread <= 0)
        return l.len;
}

在 while(1) 的等待迴圈中，如果同時有 web 要一起接收輸入的話，就用 select 來查看到底是誰有輸入，一旦 tinyweb_fd 有輸入，就是有連線進來了，可用 tinyweb_accept 接收，用 parse_request 處理 http, 取得 url 後面的 filename 當作命令的輸入。
不過，需要一點小轉換，把 / 換成空白，這樣命令格式就跟 stdio 的格式一樣了。

最後，處理命令的輸出，修改 report.c 當中的 report 以及 report_noreturn 也一併輸出給 http, 所以 curl 命令的輸出就會漂亮不少。

extern int tinyweb_conn_fd;

// report string to verb/logfile/tinyweb
void reports(char *msg)
{
    if (!verbfile)
        init_files(stdout, stdout);
    fputs(msg, verbfile);
    fflush(verbfile);
    if (logfile) {
        fputs(msg, logfile);
        fflush(logfile);
    }
    if (tinyweb_conn_fd) {
        if (write(tinyweb_conn_fd, msg, strlen(msg)) < 0)
            printf("Write web error.");
        fsync(tinyweb_conn_fd);
    }
}

void report(int level, char *fmt, ...)
{
    if (level > verblevel)
        return;
    va_list ap;
    int bufsize = 1024;
    char buf[bufsize];
    va_start(ap, fmt);
    vsnprintf(buf, bufsize, fmt, ap);
    va_end(ap);
    reports(buf);
    reports("\n");
}

void report_noreturn(int level, char *fmt, ...)
{
    if (level > verblevel)
        return;
    va_list ap;
    int bufsize = 1024;
    char buf[bufsize];
    va_start(ap, fmt);
    vsnprintf(buf, bufsize, fmt, ap);
    va_end(ap);
    reports(buf);
}

這樣就有支援 curl 的 qtest 了。

./qtest
cmd> web
serve directory '/root/lab0-c'
listen on port 9999, fd is 3
Launched tinyweb server
cmd> web input: new
l = []
cmd> web input: it/a
l = [a]
cmd> web input: it/b
l = [a b]
cmd> web input: ih/c
l = [c a b]
cmd> web input: sort
l = [a b c]
cmd>

這裡只有透過 web 命令啟動 tiny-web-server, 接著在另一個 shell 執行

root@k8sdev:~/lab0-c# curl http://localhost:9999/new
l = []
root@k8sdev:~/lab0-c# curl http://localhost:9999/it/a
l = [a]
root@k8sdev:~/lab0-c# curl http://localhost:9999/it/b
l = [a b]
root@k8sdev:~/lab0-c# curl http://localhost:9999/ih/c
l = [c a b]
root@k8sdev:~/lab0-c# curl http://localhost:9999/sort
l = [a b c]

有了完整的 http 實作， curl 指令的回應就會比較正式。

而且可以觀察自己實作的 http 細節:

root@k8sdev:~/lab0-c# curl http://localhost:9999/sort -v
*   Trying 127.0.0.1:9999...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 9999 (#0)
> GET /sort HTTP/1.1
> Host: localhost:9999
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: text/plain; charset=UTF-8
* no chunk, no close, no size. Assume close to signal end
< 
l = [a b c]
* Closing connection 0

研讀論文〈Dude, is my code constant time?〉

這篇論文提出了 dudect 這個程式來判定某個測試程式是否是 constant time , 因為基於 Student’s t-test , 利用 Timing leakage detection 而做到不侷限某特定 CPU 硬體平台。

Introduction 先從 side-channel attacks 講述了 constant time 方法的重要性，還有偵測 constant time 的困難度。多數的方法都需要針對硬體設計參數，但是硬體廠商偏偏又很少洩露許多相關資訊。所以這篇論文就是貢獻了無須硬體資訊的檢測方法。

接下來陳述利用 Timing leakage detection 這個方法，先利用兩種類型的輸入資料來量測目標函式耗費的時間。一種為固定的模式，一種用亂數模式。重複執行而取得許多的量測結果。
在做統計分析之前需要先過濾極端值，以及做 centered product 這種 second-order 處理。
最後用 Welch's_t-test 來判定是否有通過檢驗, 但也提到其他 Non-parametric tests 的判定方法。

接著論文展示了這套方法用在 Memory comparison, AES, Curve-25519 等判定。
但蠻有趣的是，在 Curve-25519 的判定上，相同的程式用在不同的硬體上卻會有不同的判定結果。

接著討論一些改進的方向，諸如編譯器的問題，硬體方面的假設幾乎為零，需要多少重複執行測試才能確認？偽陽性的問題等等。

最後結論提到論文的方法概念簡單，而且有擴充的空間，並提出了下一步的方向跟感謝。

指出現有程式的缺陷

提示: 和 RIO 套件有關

既然老師都已經提示跟 RIO 的實作有關，可是對著這幾行程式也看不太出問題。

但 tiny-web-server 剛好裡面也有一套 RIO，拿來一比就發現 qtest 的 RIO 是有 stack 的，追查一下為甚麼要用 stack, 可以發現原來是因為需要支援 source 這個命令。

執行 source 命令需要開啟另一個檔案匯入命令，並且執行完畢後又回歸原本執行中的檔案位置。
所以就需要類似 stack 的方式來操作許多開啟的檔案。

stack 就會讓人聯想 stackoverflow …因為大家都去那裏查問題。 :laughing:
所以我只會想到，如果 source 命令後面接的檔案名稱正是自身檔案，就會爆掉。

做個簡單的小實驗提供一個名為 test.cmd 的檔案內容:

new
it a
show
source test.cmd
show

然後進行

./qtest -f ./test.cmd
...
...
...
l = [a]
cmd> source test.cmd
*** buffer overflow detected ***: terminated
Aborted (core dumped)

只是總覺得這算使用者操作錯誤，不像是程式有問題。
但如果要防呆，可以考慮驗證 source 命令後面的檔案是否存在目前已經使用中的檔案名稱，若有就判斷是無窮迴圈。但這樣是基於目前還沒有設計出分支命令的關係。所以也許給個上限是最簡單的。

另外 buffer 也讓人會聯想超過 buffer size 會如何?
而程式碼 readline 函式內部也有處理萬一超過就只好直接砍斷的作法。而我也很難想出更好的做法了。畢竟資源也沒辦法無限上綱。

參考資訊

共筆格式參考：共筆示範
作業參考：
- bakudr18 的 sort
- kdnvt 對於排序的比較次數