2021q1 Homework1 (lab0)

contributed by < YOYOPIG >

tags: `linux2021`

Environment

Linux Kubuntu 20.04 LTS

TODO

Update queue.h
Implement queue.c
Optimize sorting algorithm
透過 Massif 視覺化 “simulation” 過程中的記憶體使用量
Address sanitizer bug fix
coroutine
解釋 select 系統呼叫在本程式的使用方式
說明 antirez/linenoise 的運作原理，注意到 termios 的運用
研讀論文 Dude, is my code constant time?

環境安裝

$ sudo apt install build-essential git clang-format cppcheck aspell colordiff valgrind

Could not connect to tw.archive.ubuntu.com:80 (2001:e10:2000:240:e643:4bff:fee8:a63c). - connect (111: Connection refused) Could not connect to tw.archive.ubuntu.com:80 (140.110.240.80). - connect (111: Connection refused)

之前也曾遇過一次，但是忘記怎麼解決的了。這次重新爬文，發現是因為最近 Ubuntu 在台灣的 server 不太穩定，可以嘗試更換 mirror 即可。

Queue implementation

仔細研究要求，我們需要用單向的 linked list 實作一個 queue。同時，要可以支援

插入至頭、尾
取出頭
查詢 queue 的大小
Reverse
排序

queue.h

目前 queue 結構只包含了一個指向 head 的 pointer。雖然只靠這個 pointer 就足以處理上面的需求，但在插入尾部、查詢大小等操作時，都需要 traverse 過一次，需要 O(n) 的時間。為了達到 constant time，我們可以加入一個指向最後一個 node 的 pointer，以及一個紀錄大小的變數。

typedef struct {
    list_ele_t *head; /* Linked list of elements */
    list_ele_t *tail;
    int size;
} queue_t;

queue.c

New a queue

queue_t *q_new()
{
    queue_t *q = malloc(sizeof(queue_t));
    if (!q)
        return NULL;
    q->head = NULL;
    q->tail = NULL;
    q->size = 0;
    return q;
}

Delete a queue

這裡要注意，和一些用 class 包住 data structure 的語言不同，如果我們只 free 掉 queue，並不會把整個 queue 裡的所有資源釋出。仔細看我們定義的 queue，其實只是由兩個 pointer 及一個 int 組成，並不包含 new 的過程中 malloc 出來的 node。因此，對 queue 使用 free 只會 free 掉紀錄頭尾的兩個 pointer 及記錄大小的 size，同時也無法再存取中間的 nodes，造成 memory leak。這裡最適合的作法是先一個一個的 free 掉中間的 node，最後再 free 掉我們的 queue。

void q_free(queue_t *q)
{
    if (!q)
        return;
    /* Freeing the list elements and the strings */
    list_ele_t *cur = q->head;
    list_ele_t *next = NULL;
    while (cur) {
        next = cur->next;
        free(cur->value);
        free(cur);
        cur = next;
    }
    /* Free queue structure */
    free(q);
}

Insert data

插入資料基本上就是 malloc 出一個 list element，然後插入頭/尾。由於我們都有指標指向 queue 原本的頭尾了，因此 insertion 的時間複雜度都是 O(n)

bool q_insert_head(queue_t *q, char *s)
{
    if (!q)
        return false;
    list_ele_t *newh;
    newh = malloc(sizeof(list_ele_t));
    if (!newh)
        return false;
    newh->value = malloc(sizeof(char) * (strlen(s) + 1));
    if (!newh->value) {
        free(newh);
        return false;
    }
    strncpy(newh->value, s, strlen(s) + 1);
    newh->next = q->head;
    q->head = newh;
    if (!q->tail)
        q->tail = newh;
    q->size++;
    return true;
}

bool q_insert_tail(queue_t *q, char *s)
{
    if (!q)
        return false;
    list_ele_t *node;
    node = malloc(sizeof(list_ele_t));
    if (!node)
        return false;
    node->value = malloc(sizeof(char) * (strlen(s) + 1));
    if (!node->value) {
        free(node);
        return false;
    }

    strncpy(node->value, s, strlen(s) + 1);
    node->next = NULL;
    if (q->size == 0) {
        q->head = node;
        q->tail = node;
    } else {
        q->tail->next = node;
        q->tail = node;
    }
    q->size++;
    return true;
}

pop

bool q_remove_head(queue_t *q, char *sp, size_t bufsize)
{
    if (!q || !q->size)
        return false;
    if (sp) {
        if (strlen(q->head->value) > bufsize - 1) {
            strncpy(sp, q->head->value, bufsize - 1);
            sp[bufsize - 1] = '\0';
        } else {
            strncpy(sp, q->head->value, strlen(q->head->value) + 1);
        }
    }
    list_ele_t *old = q->head;
    q->head = q->head->next;
    free(old->value);
    free(old);
    q->size--;
    return true;
}

小問題
在第一行的判斷式中，原本是寫成if (q == NULL || q->size==0)，經老師建議改成if (!q || !q->size)。可以理解在改完之後效率較高，!q也不難理解是q不存在的意思。然而，這裡不太確定!q->size的寫法，是否會較容易造成後續維護的問題? 似乎沒有q->size==0來的直觀，如果是在其他有 bool 型態的語言，變數名稱又不像 size 這麼直觀的話，或許可能會搞混 data type?
之後可以查一下是否有能維持原寫法，但透過 compiler 在 compile time 自動把 == 0判斷式優化成!的方式 (to be done…)

size

int q_size(queue_t *q)
{
    if (!q)
        return 0;
    return q->size;
}

Reverse

void q_reverse(queue_t *q)
{
    if (!q || q->size <= 1)
        return;
    list_ele_t *cur = q->head;
    list_ele_t *next = cur->next;
    cur->next = NULL;
    while (next) {
        list_ele_t *tmp = next->next;
        next->next = cur;
        cur = next;
        next = tmp;
    }
    list_ele_t *tmp = q->head;
    q->head = q->tail;
    q->tail = tmp;
    return;
}

Sort

這裡我們實作 merge sort，細節寫在 sort.c 中，或許未來可以根據資料類型支援其他 sorting algorithm

void q_sort(queue_t *q)
{
    if (!q || q->size <= 1)
        return;
    q->head = merge_sort(q->head);
    // Update the position of the tail
    q->tail = q->head;
    while (q->tail->next)
        q->tail = q->tail->next;
}

sort.c

這裡我們透過快慢雙指針找到中點，分割後做 merge sort

list_ele_t *merge_sort(list_ele_t *head)
{
    if (!head || !head->next)
        return head;

    // Find and split the list in the middle
    list_ele_t *middle = NULL;
    list_ele_t *slow = head;
    list_ele_t *fast = head->next;

    while (fast) {
        fast = fast->next;
        if (fast) {
            slow = slow->next;
            fast = fast->next;
        }
    }

    middle = slow->next;
    slow->next = NULL;
    // Sort the two parts and then merge
    list_ele_t *first = merge_sort(head);
    list_ele_t *second = merge_sort(middle);
    return merge(first, second);
}

merge

old version Reference

list_ele_t *merge(list_ele_t *first, list_ele_t *second)
{
    // Edge case : only one part remaining
    if (!first)
        return second;
    if (!second)
        return first;
    list_ele_t *ordered = NULL;
    if (strcmp(first->value, second->value) < 0) {
        ordered = first;
        ordered->next = merge(first->next, second);
    } else {
        ordered = second;
        ordered->next = merge(first, second->next);
    }
    return ordered;
}

發現用 Recursive 的型式，面對大測資 trace-15 (1000000 * 2 nodes)無法跑過
考慮的解法如下：

改成 iterative 應該可以稍微改善效能
考慮使用其他 O(n) sorting algorithm

Iterative form

list_ele_t *merge(list_ele_t *first, list_ele_t *second)
{
    list_ele_t *ordered = NULL;
    list_ele_t *cur = NULL;

    while (first && second) {
        list_ele_t *tmp;

        // Acquire the smallest element
        if (strcmp(first->value, second->value) < 0) {
            tmp = first;
            first = first->next;
            tmp->next = NULL;
        } else {
            tmp = second;
            second = second->next;
            tmp->next = NULL;
        }

        // Append it to ordered list
        if (!ordered) {
            ordered = tmp;
            cur = tmp;
        } else {
            cur->next = tmp;
            cur = cur->next;
        }
    }
    // Append the only remaining list to back
    cur->next = (first) ? first : second;
    return ordered;
}

雖然 time complexity 不變，但透過減少 recursive call，成功通過所有測試

Radix sort optimization

既然 merge sort 已經是 comparison sort 系列演算法時間複雜度最低的幾種之一了，要進一步優化，顯然得用其他的方法。由於我們資料的 value 都是字串，而字串的比對是屬於一位一位分別比較的，因此這裡嘗試使用概念相似的 radix sort + counting sort 降低時間複雜度至 O(d * n)。
Reference - MikeCAT's c++ implementation

首先，我們先找出 queue 中所有元素最長字串的長度。接著，根據 radix sort 的精神，一位一位的進行排序

void radix_sort(queue_t *q)
{
    int max = get_max_length(q->head, q->size);

    for (int digit = max-1; digit >= 0; digit--) { 
        counting_sort(q->head, q->size, digit, max);
    }
}

既然只比較每個字串的其中一個 character ，我們可以用時間複雜度最低的 counting sort 來達成。

void counting_sort(list_ele_t *head, int size, int digit, int max)
{
    char **tmp = NULL;
    list_ele_t **nodes = NULL;
    list_ele_t *cur = head;
    tmp = malloc(sizeof(char *) * size);
    nodes = malloc(sizeof(list_ele_t *) * size);
    int count[257];  // Support 256 ASCII code. Change this according to the
                     // data

    // Count the occurence of each letter
    for (int i = 0; i < 257; i++) {
        count[i] = 0;
    }
    for (int i = 0; i < size; i++) {
        nodes[i] = cur;
        if (digit < strlen(cur->value))
            count[(int) (unsigned char) cur->value[digit] + 1]++;
        else
            count[0]++;
        cur = cur->next;
    }

    // Sum up the counts
    for (int i = 1; i < 257; i++) {
        count[i] += count[i - 1];
    }

    for (int i = size - 1; i >= 0; i--) {
        if (digit < strlen(nodes[i]->value)) {
            tmp[count[(int) (unsigned char) nodes[i]->value[digit] + 1] - 1] =
                nodes[i]->value;
            count[(int) (unsigned char) nodes[i]->value[digit] + 1]--;
        } else {
            tmp[count[0] - 1] = nodes[i]->value;
            count[0]--;
        }
    }

    for (int i = 0; i < size; ++i) {
        nodes[i]->value = tmp[i];
    }
    free(tmp);
    free(nodes);
}

遺憾的是，最後 Radix sort 並未帶來更好的效能，面對最大的測試資料 trace-15 依然不能在時間內跑完。為了檢測執行時間的實際情況，這裡使用 trace-16 來當作 benchmark，分別執行 merge sort 以及 radix sort。

#include <time.h>

clock_t begin = clock();

SORTING_ALGORITHM

clock_t end = clock();
double time_spent = (double) (end - begin) / CLOCKS_PER_SEC;
printf("%f\n", time_spent);

在 trace-16 中，總共執行了六次 q_sort()，最後結果如下：

Merge sort

Radix sort

可以看到，雖然 Radix sort 理論上時間複雜度較低，但是實際執行起來花了更多的時間。另外，運用相似的想法也試著實作了利用 trie 結構優化的版本，但依然無法通過大測資 trace-15。根據這篇 stackoverflow 討論，猜想可能是因為資料中的 prefix 無法有效分開待排序資料，才造成理論實際上的差異。關於這個推論可能得再設計實驗驗證。

Bug report

radix is not in words list, but should be recognized as an often-used algorithm

git commit -m "Implement radix sort"
Following files need to be cleaned up:
queue.c
sort.c
sort.h
Implement radix sort                                                       [line 1]
 - Possible misspelled word(s): radix

How to Write a Git Commit Message: https://chris.beams.io/posts/git-commit/
Proceed with commit? [e/n/?] n

Visualization with massif

這裡我們使用 Massif 來視覺化 Valgrind 的分析結果。為了方便觀察，另外安裝網站上提到的圖形介面工具 massif-visualizer。

$ sudo apt-get install massif-visualizer

接著執行

$ make clean
$ make valgrind

編譯成功後，就可以監控程式執行時的記憶體使用狀況了。

$ valgrind --tool=massif ./qtest

程式執行之後會出現名為 massif.out.<pid> 的檔案。開啟後即可看到這次執行期間的記憶體使用量分佈狀況。

Address sanitizer bug fix

嘗試使用 Address Sanitizer 命令如下：

$ make clean
$ make SANITIZER=1
$ ./qtest
cmd> help

如作業說明所示，~~被噴了滿臉~~出現了以下 error

「噴」這個動詞適合放在這裡嗎？工程人員說話要精準
:notes: jserv

好的，下次會注意，謝謝老師指正

==8881==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55f6bfb063c0 at pc 0x55f6bfaef7fd bp 0x7fffd81a6a90 sp 0x7fffd81a6a80
READ of size 4 at 0x55f6bfb063c0 thread T0
    #0 0x55f6bfaef7fc in do_help_cmd /home/yoyopig/hdd/gitReady/lab0-c/console.c:307
    #1 0x55f6bfaef910 in interpret_cmda /home/yoyopig/hdd/gitReady/lab0-c/console.c:221
    #2 0x55f6bfaf00f5 in interpret_cmd /home/yoyopig/hdd/gitReady/lab0-c/console.c:244
    #3 0x55f6bfaf1838 in run_console /home/yoyopig/hdd/gitReady/lab0-c/console.c:660
    #4 0x55f6bfaee425 in main /home/yoyopig/hdd/gitReady/lab0-c/qtest.c:780
    #5 0x7fcf722460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #6 0x55f6bfaebbcd in _start (/home/yoyopig/hdd/gitReady/lab0-c/qtest+0x8bcd)

0x55f6bfb063c1 is located 0 bytes to the right of global variable 'echo' defined in 'console.c:59:13' (0x55f6bfb063c0) of size 1
SUMMARY: AddressSanitizer: global-buffer-overflow /home/yoyopig/hdd/gitReady/lab0-c/console.c:307 in do_help_cmd

看起來是 echo 出現了 global-buffer-overflow 的問題。在 console.c 檔案搜尋 echo 變數，可以發現 echo 出現在十個地方。其中可以找到 echo 定義


static bool echo = 0;

但是在 add_param 函式中，為了符合 param_ele 的定義，使用了 int* 存取。


add_param("echo", (int *) &echo, "Do/don't echo commands", NULL);

使用一個指向 4 bytes 大小的 int* 指向僅有 1 byte 的 bool 很明顯是一件危險的事情。這裡一開始有兩種解決的想法：

直接更改 echo 的 type
改用 void* ，同時在 add_param() 中處理第二個參數的型態，選擇 int* or bool*

第 1 種方法效率較差，每一個 echo 變數都得多花費不必要的 3 bytes，因此優先考慮第 2 種。開始 trace add_param





















/* Add a new parameter */
void add_param(char *name,
               int *valp,
               char *documentation,
               setter_function setter)
{
    param_ptr next_param = param_list;
    param_ptr *last_loc = &param_list;
    while (next_param && strcmp(name, next_param->name) > 0) {
        last_loc = &next_param->next;
        next_param = next_param->next;
    }

    param_ptr ele = (param_ptr) malloc_or_fail(sizeof(param_ele), "add_param");
    ele->name = name;
    ele->valp = valp;
    ele->documentation = documentation;
    ele->setter = setter;
    ele->next = next_param;
    *last_loc = ele;
}

可以看到，新增的 parameter 會被存起來。進一步查詢 param_ptr 的定義發現：

/* Integer-valued parameters */
typedef struct PELE param_ele, *param_ptr;
struct PELE {
    char *name;
    int *valp;
    char *documentation;
    /* Function that gets called whenever parameter changes */
    setter_function setter;
    param_ptr next;
};

要支援 bool 型態的 valp，肯定得改動這個資料結構。但這時問題就出現了，以一個多人合作的專案來說， header file 中定義的資料結構常常會有大量的 dependency。擅自改動可能會造成程式其他地方的問題。當然這份程式尚且不是那麼龐大，但為了共同維護的容易，這裡就暫時先不考慮為了單一變數改動預定義的資料結構，退而求其次的改變 echo 的 type。把 echo 改成 int 後，再次執行，發生以下錯誤

==12468==ERROR: AddressSanitizer: global-buffer-overflow on address 0x56159ab355e0 at pc 0x56159ab1c7f7 bp 0x7fff458ea950 sp 0x7fff458ea940
READ of size 4 at 0x56159ab355e0 thread T0
    #0 0x56159ab1c7f6 in do_help_cmd /home/yoyopig/hdd/gitReady/lab0-c/console.c:307
    #1 0x56159ab1c90a in interpret_cmda /home/yoyopig/hdd/gitReady/lab0-c/console.c:221
    #2 0x56159ab1d0ef in interpret_cmd /home/yoyopig/hdd/gitReady/lab0-c/console.c:244
    #3 0x56159ab1e835 in run_console /home/yoyopig/hdd/gitReady/lab0-c/console.c:660
    #4 0x56159ab1b425 in main /home/yoyopig/hdd/gitReady/lab0-c/qtest.c:780
    #5 0x7f37395430b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #6 0x56159ab18bcd in _start (/home/yoyopig/hdd/gitReady/lab0-c/qtest+0x8bcd)

0x56159ab355e1 is located 0 bytes to the right of global variable 'simulation' defined in 'console.c:21:6' (0x56159ab355e0) of size 1
  'simulation' is ascii string ''

可以看到，simulation 變數也有同樣的問題。這裡一樣改成 int，之後確定可以正常執行。結果如下：

cmd> help
Commands:
        #        ...            | Display comment
        free                    | Delete queue
        help                    | Show documentation
        ih       str [n]        | Insert string str at head of queue n times. Generate random string(s) if str equals RAND. (default: n == 1)
        it       str [n]        | Insert string str at tail of queue n times. Generate random string(s) if str equals RAND. (default: n == 1)
        log      file           | Copy output to file
        new                     | Create new queue
        option   [name val]     | Display or set options
        quit                    | Exit program
        reverse                 | Reverse queue
        rh       [str]          | Remove from head of queue.  Optionally compare to expected value str
        rhq                     | Remove from head of queue without reporting value.
        show                    | Show queue contents
        size     [n]            | Compute queue size n times (default: n == 1)
        sort                    | Sort queue in ascending order
        source   file           | Read commands from source file
        time     cmd arg ...    | Time command execution
Options:
        echo    1       Do/don't echo commands
        error   5       Number of errors until exit
        fail    30      Number of times allow queue operations to return false
        length  1024    Maximum length of displayed string
        malloc  0       Malloc failure probability percent
        simulation      0       Start/Stop simulation mode
        verbose 4       Verbosity level
cmd>

事後發想：
在處理完以上兩個變數後，發現 bool 型態的變數似乎不是特例，在 simulation, verbose, error, echo 4 個中就佔了兩個。因此，是否有需要另外定義 bool_param_ele 並採用先前提到的方法二成為值得考慮的方案。

研讀論文 Dude, is my code constant time?

這次作業使用到 dudect 套件檢查時間複雜度。試著閱讀原論文
筆記
TODO: Trace github code