Try   HackMD

2021q1 Homework1 (lab0)

contributed by < 93i7xo2 >

Source: lab-0

To-Do List

  • 研讀論文 Dude, is my code constant time?,解釋本程式的 "simulation" 模式是如何透過以實驗而非理論分析,達到驗證時間複雜度,需要解釋 Student's t-distribution 及程式實作的原理。注意:現有實作存在若干致命缺陷,請討論並提出解決方案;
  • 開啟 Address Sanitizer,修正 qtest 執行過程中的錯誤
    • 先執行 qtest 再於命令提示列輸入 help 命令,會使 開啟 Address Sanitizer 觸發錯誤,應予以排除
  • 解釋 select 系統呼叫在本程式的使用方式,並分析 console.c 的實作,說明其中運用 CS:APP RIO 套件 的原理和考量點。可對照參考 CS:APP 第 10 章重點提示 :arrow_forward: 為避免「舉燭」,請比照 qtest 實作,撰寫獨立的終端機命令解譯器 (可建立新的 GitHub repository)
  • 運用 Valgrind 排除 qtest 實作的記憶體錯誤,並透過 Massif 視覺化 "simulation" 過程中的記憶體使用量,需要設計對應的實驗
  • qtest 中實作 coroutine,並提供新的命令 web,提供 web 伺服器功能,注意: web 伺服器運作過程中,qtest 仍可接受其他命令
  • 說明 antirez/linenoise 的運作原理,注意到 termios 的運用
  • 指出現有程式的缺陷 (提示: 和 RIO 套件 有關),嘗試強化並提交 pull request

系統環境

CPU: i5-6200U
RAM: 20G
OS: Ubuntu 20.04.2 LTS

完成qtest.c實作

q_new

為了實現

O(1) 時間複雜度,加上 countlast_element 紀錄。

 queue_t *q_new()
 {
     queue_t *q = malloc(sizeof(queue_t));
-    /* TODO: What if malloc returned NULL? */
-    q->head = NULL;
+    if (q) {
+        q->head = NULL;
+        q->count = 0;
+        q->last_element = q->head;
+    }
     return q;
 }

q_free

 void q_free(queue_t *q) 
 {
-    /* TODO: How about freeing the list elements and the strings? */
     /* Free queue structure */
+    if (!q)
+        return;
+
+    while (q->head) {
+        list_ele_t *tmp = q->head;
+        q->head = q->head->next;
+        free(tmp->value);
+        free(tmp);
+    }
     free(q);
 }

q_insert_head

依據 CERN Common vulnerabilities guide for C programmers 的建議使用 strncpy,以 strlen 取的字串大小作為 strncpy 參數。由於 strncpy 並不會加上 null character,需自行補上或以 calloc 替代 malloc

 bool q_insert_head(queue_t *q, char *s)
 {
-    list_ele_t *newh;
-    /* TODO: What should you do if the q is NULL? */
-    newh = malloc(sizeof(list_ele_t));
-    /* Don't forget to allocate space for the string and copy it */
-    /* What if either call to malloc returns NULL? */
+    if (!q)
+        return false;
+
+    /* Create a new element. */
+    list_ele_t *newh = malloc(sizeof(list_ele_t));
+    int len = strlen(s);
+    if (!newh)
+        return false;
+    newh->value = (char *) malloc(len + 1);
+    if (!newh->value) {
+        free(newh);
+        return false;
+    }
+    strncpy(newh->value, (const char *) s, len);
+    newh->value[len] = '\0';
+
+    /* Attach new element at the head of linked list. */                                               
     newh->next = q->head;
     q->head = newh;
+    q->count++;
+
+    /* Mark it as last element if it's a first element. */
+    if (!newh->next)
+        q->last_element = newh;
+
     return true;
 }

q_insert_tail

一開始實作 q_insert_tail 忽視掉佇列可能是空的可能性,透過 valgrind 發現無效寫入

==127495== Invalid write of size 8
==127495==    at 0x10DF1E: q_insert_tail (queue.c:105)
==127495==    by 0x10E3D2: measure (constant.c:69)
==127495==    by 0x10E5D7: doit (fixture.c:136)
==127495==    by 0x10E82F: is_insert_tail_const (fixture.c:168)
==127495==    by 0x10B572: do_insert_tail (qtest.c:259)
==127495==    by 0x10CB67: interpret_cmda (console.c:220)
==127495==    by 0x10D0EF: interpret_cmd (console.c:243)
==127495==    by 0x10D6CC: cmd_select (console.c:569)
==127495==    by 0x10D903: run_console (console.c:628)
==127495==    by 0x10BFC1: main (qtest.c:772)
==127495==  Address 0x8 is not stack'd, malloc'd or (recently) free'd
==127495== 
Segmentation fault occurred.  You dereferenced a NULL or invalid pointer

於是加入空佇列的判斷

-q->last_element->next = newh;
-q->last_element = newh;
+if (!q->last_element) {
+    q->head = q->last_element = newh;
+} else {
+    q->last_element->next = newh;
+    q->last_element = newh;
+}
q->count++;

最終實作如下

 bool q_insert_tail(queue_t *q, char *s)
 {
-    /* TODO: You need to write the complete code for this function */
-    /* Remember: It should operate in O(1) time */
-    /* TODO: Remove the above comment when you are about to implement. */
-    return false;
+    if (!q)
+        return false;
+
+    /* Create a new element. */
+    list_ele_t *newh = malloc(sizeof(list_ele_t));
+    int len = strlen(s);
+    if (!newh)
+        return false;
+    newh->value = (char *) malloc(len + 1);
+    if (!newh->value) {
+        free(newh);
+        return false;
+    }
+    strncpy(newh->value, (const char *) s, len);
+    newh->value[len] = '\0';
+    newh->next = NULL;
+
+    /* Insert element at tail of queue, with time complexity of O(1) */
+    if (!q->last_element) {
+        q->head = q->last_element = newh;
+    } else {
+        q->last_element->next = newh;
+        q->last_element = newh;
+    }
+    q->count++;
+
+    return true;
 }

q_remove_head

 bool q_remove_head(queue_t *q, char *sp, size_t bufsize)
 {
-    /* TODO: You need to fix up this code. */
-    /* TODO: Remove the above comment when you are about to implement. */
+    if (!q || !q->head)
+        return false;
+    if (sp) {
+        int realBufsize = strlen(q->head->value) < bufsize - 1
+                              ? strlen(q->head->value)
+                              : bufsize - 1;
+        memcpy((void *) sp, (const void *) q->head->value, realBufsize);
+        *(sp + realBufsize) = '\0';
+    }
+    list_ele_t *target = q->head;
+    /* May be bufsize is greater than actual size of value */
     q->head = q->head->next;
+    q->count--;
+    free(target->value);
+    free(target);
+
+    /* Mark last element as NULL if queue is empty */
+    if (!q->head)
+        q->last_element = NULL;
+
     return true;
 }

q_size

 int q_size(queue_t *q)
 {
-    /* TODO: You need to write the code for this function */
-    /* Remember: It should operate in O(1) time */
-    /* TODO: Remove the above comment when you are about to implement. */
-    return 0;
+    /* With time complexity of O(1) */
+    return q ? q->count : 0;
 }

q_reverse

 void q_reverse(queue_t *q)
 {
-    /* TODO: You need to write the code for this function */
-    /* TODO: Remove the above comment when you are about to implement. */
+    if (q == NULL)
+        return;
+
+    list_ele_t *prev = NULL;
+    list_ele_t *temp;
+    list_ele_t *current = q->last_element = q->head;
+    while (current != NULL) {
+        temp = current->next;
+        current->next = prev;
+        prev = current;
+        current = temp;
+    }
+    q->head = prev;
 }

q_sort

  • 以 mergesort 來實作 qsort,分割 list 時使用 slow/faster pointer 的方法
  • 參考 gpwork4u 的建議將 comparator 獨立出來方便替換成其他函式
 void q_sort(queue_t *q)
 {
-    /* TODO: You need to write the code for this function */
-    /* TODO: Remove the above comment when you are about to implement. */
+    if (q == NULL || q->head == NULL || q->count == 1)
+        return;
+    q->head = mergesort(q->head, comparator);
+
+    list_ele_t *end = q->head;
+    while (end->next) {
+        end = end->next;
+    }
+    q->last_element = end;
 }
+
+/*
+ * Sort elements in linked list
+ * This function is called by `q_sort`
+ */
+list_ele_t *mergesort(list_ele_t *start,
+                      int (*compar)(const void *, const void *))
+{
+    if (!start || !start->next) {
+        return start;
+    }
+
+    // Split the list into two lists
+    list_ele_t *slow = start, *fast = start;
+    while (fast->next && fast->next->next) {
+        slow = slow->next;
+        fast = fast->next->next;
+    }
+    list_ele_t *left = start, *right = slow->next;
+    slow->next = NULL;
+
+    left = mergesort(left, compar);
+    right = mergesort(right, compar);
+
+    for (list_ele_t *merge = NULL; left || right;) {
+        if (!right || (left && (*compar)(left, right) < 0)) {
+            list_ele_t *next = left->next;
+            if (!merge) {
+                start = merge = left;
+            } else {                                                                                   
+                merge->next = left;
+                merge = left;
+            left = next;
+        } else {
+            list_ele_t *next = right->next;
+            if (!merge) {
+                start = merge = right;
+            } else {
+                merge->next = right;
+                merge = right;
+            }
+            right = next;
+        }
+        merge->next = NULL;
+    }
+    return start;
+}

memleak in qtest.c

使用 make valgrind 或是 valgrind ./qtest -f traces/trace-XX-ops.cmd 會出現 memory leak 的訊息:

cmd> 
Freeing queue
==364164== 5 bytes in 1 blocks are still reachable in loss record 1 of 3
==364164==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==364164==    by 0x4A6150E: strdup (strdup.c:42)
==364164==    by 0x110394: linenoiseHistoryAdd (linenoise.c:1236)
==364164==    by 0x110F27: linenoiseHistoryLoad (linenoise.c:1325)
==364164==    by 0x10C2BE: main (qtest.c:781)
==364164== 

trace 後發現這段配置的記憶體應該是經由 linenoise 掛上 exit handlers 並於程式正常結束時清除

linenoise
    └linenoiseRaw
        └enableRawMode
            └atexit(linenoiseAtExit);

按照 run_console 現有寫法,只有在 interactive mode 下才會呼叫到 linenoise

if (!has_infile) {
    char *cmdline;

    while ((cmdline = linenoise(prompt)) != NULL) {
        interpret_cmd(cmdline);
        linenoiseHistoryAdd(cmdline);       /* Add to the history. */
        linenoiseHistorySave(HISTORY_FILE); /* Save the history on disk. */
        linenoiseFree(cmdline);
    }
} else {
    while (!cmd_done())
        cmd_select(0, NULL, NULL, NULL, NULL);
}

並在 runconsole 前設定 linenoise

/* Trigger call back function(auto completion) */
linenoiseSetCompletionCallback(completion);

linenoiseHistorySetMaxLen(HISTORY_LEN);
linenoiseHistoryLoad(HISTORY_FILE); /* Load the history at startup */

...

bool ok = true;
ok = ok && run_console(infile_name);

想到的解法有兩個,在 qtest 結束前自行呼叫 linenoiseAtExit,或是將 linenoiseHistoryLoad 搬移到接下來要執行 linenoise 的前面,確保記憶體一定被釋放,例如原始專案範例 example.c

選擇用前者,畢竟在 non-interactive mode 去載入 linenoise 的設定就是一件匪夷所思的事情,修改後如下:

if (!has_infile) {
    char *cmdline;

+   /* Trigger call back function(auto completion) */
+   linenoiseSetCompletionCallback(completion);
+
+   linenoiseHistorySetMaxLen(HISTORY_LEN);
+   linenoiseHistoryLoad(HISTORY_FILE); /* Load the history at startup */
    while ((cmdline = linenoise(prompt)) != NULL) {
        interpret_cmd(cmdline);
        linenoiseHistoryAdd(cmdline);       /* Add to the history. */
        linenoiseHistorySave(HISTORY_FILE); /* Save the history on disk. */
        linenoiseFree(cmdline);
    }
} else {
    while (!cmd_done())
        cmd_select(0, NULL, NULL, NULL, NULL);
}

嵌入 natural sort

natsort 的行為是先以數字做排序,其次是字母,有別於 strcmp 逐一字元比較;數字的界定是以左右字母或特殊字元(-)做區隔,不計中間的空格或 0。

pic02a
pic02000
pic05
pic2
pic3
pic4
pic 4 else
pic 5

測試腳本提供數種 testdata,然而實際用到的只有 test-dates、test-fractions、test-words。

來看 natsort 的程式碼,strnatcasecmp.c 提供兩個比較函式

  • strnatcmp
  • strnatcasecmp: 功能同上,但字母不分大小寫,官方測試資料也沒測到大小寫字母。
    ​​​​if (fold_case) {
    ​​​​    ca = nat_toupper(ca);
    ​​​​    cb = nat_toupper(cb);
    ​​​​}
    

基於考量採用 strnatcasecmp 當作比較函數,並在 qtest.c 引入 strnatcmp.hstrnatcmp.h,同時在console.csimulation後方新增一個 option - natsort,用以切換 natsort/normal sort mode。

int comparator(const void *a, const void *b)
{
    int (*sort_alg)() = strcmp;
    if (natsort)
        sort_alg = strnatcasecmp;
    return sort_alg(((list_ele_t *) a)->value, ((list_ele_t *) b)->value);
}

可惜在 console 輸入時是以空白做運算元分隔,下面字詞插入時會被誤判,故不列入測試。

pic 4 else
pic 5 something
Fractional release numbers

run-test.bash 有這麼一段

diff -u "$1" <( ./natsort < "$2" ) && return

根據 3.5.6 Process Substitution 的說明,diff 必須要將 <() 內的程式輸出讀進來,作用等同

./natsort < "$2" | diff -u "$1" - && return

研讀論文 Dude, is my code constant time?