2024q1 Homework1 (lab0)

contributed by < johnny69527 >

開發環境

$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         36 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
    CPU family:          6
    Model:               42
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            7
    CPU max MHz:         2900.0000
    CPU min MHz:         800.0000
    BogoMIPS:            4589.81

指定的佇列操作

不知道為何在 trace-17-complexity 中的幾個單純的操作被認為可能不是 constant time?

follow up: 研究 dudect 相關程式

+++ TESTING trace trace-17-complexity:
# Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant
ERROR: Probably not constant time or wrong implementation
ERROR: Probably not constant time or wrong implementation
Probably constant time
ERROR: Probably not constant time or wrong implementation
---     trace-17-complexity     0/5
---     TOTAL           95/100
make: *** [Makefile:60: test] Error 1

引入 lib/list_sort.c

加入 linux list sort 實作程式碼

list_sort.h和list_sort.c

加入 linux_sort 命令

使用 git diff 或 diff -u 命令來產生程式碼之間的變更列表，不要憑感覺填入，注意位移量。

修改 qtest.c :

+#include "list_sort.h"
 #include "queue.h"

...
..
.
    
-bool do_sort(int argc, char *argv[])
+static int cmp(void *priv, struct list_head *a, struct list_head *b)
+{
+    element_t *l = list_entry(a, element_t, list);
+    element_t *r = list_entry(b, element_t, list);
+    return strcmp(l->value, r->value);
+}
+
+static int cmp_descend(void *priv, struct list_head *a, struct list_head *b)
+{
+    return -1 * cmp(priv, a, b);
+}
+
+bool _do_sort(int argc, char *argv[], int mode)
{

...
..
.
    set_noallocate_mode(true);
-    if (current && exception_setup(true))
-        q_sort(current->q, descend);
+    if (current && exception_setup(true)) {
+        if (mode == 0)
+            q_sort(current->q, descend);
+        else if (mode == 1)
+            list_sort(NULL, current->q, descend ? cmp_descend : cmp);
+    }
    exception_cancel();
    set_noallocate_mode(false);

...
..
.
    

+bool do_sort(int argc, char *argv[])
+{
+    return _do_sort(argc, argv, 0);
+}
+
+bool do_linux_sort(int argc, char *argv[])
+{
+    return _do_sort(argc, argv, 1);
+}
+


...
..
.
     ADD_COMMAND(sort, "Sort queue in ascending/descening order", "");
+    ADD_COMMAND(linux_sort, "Sort queue with linux lib/list_sort.c", "");
     ADD_COMMAND(size, "Compute queue size n times (default: n == 1)", "[n]");
    
...
..

不知道為何？上述修改，若把 #include "list_sort.h 放在 ＃include "harness.h" 之前，會得到如下編譯錯誤。但放在之後，便能正確編譯。

不要說「不知道為何？」陳述事實即可，找到根本原因。

$ make
  CC    qtest.o
qtest.c: In function ‘do_free’:
qtest.c:104:5: error: implicit declaration of function ‘error_check’ [-Werror=implicit-function-declaration]
  104 |     error_check();
      |     ^~~~~~~~~~~
qtest.c:107:9: error: implicit declaration of function ‘set_cautious_mode’ [-Werror=implicit-function-declaration]
  107 |         set_cautious_mode(false);
      |         ^~~~~~~~~~~~~~~~~
qtest.c:118:13: error: implicit declaration of function ‘exception_setup’ [-Werror=implicit-function-declaration]
  118 |         if (exception_setup(true))
      |             ^~~~~~~~~~~~~~~
qtest.c:120:9: error: implicit declaration of function ‘exception_cancel’ [-Werror=implicit-function-declaration]
  120 |         exception_cancel();
      |         ^~~~~~~~~~~~~~~~
qtest.c:132:19: error: implicit declaration of function ‘allocation_check’ [-Werror=implicit-function-declaration]
  132 |     size_t bcnt = allocation_check();
      |                   ^~~~~~~~~~~~~~~~
qtest.c: In function ‘do_reverse’:
qtest.c:526:5: error: implicit declaration of function ‘set_noallocate_mode’ [-Werror=implicit-function-declaration]
  526 |     set_noallocate_mode(true);
      |     ^~~~~~~~~~~~~~~~~~~
qtest.c: In function ‘console_init’:
qtest.c:1087:26: error: ‘fail_probability’ undeclared (first use in this function)
 1087 |     add_param("malloc", &fail_probability, "Malloc failure probability percent",
      |                          ^~~~~~~~~~~~~~~~
qtest.c:1087:26: note: each undeclared identifier is reported only once for each function it appears in
qtest.c: In function ‘sigalrm_handler’:
qtest.c:1109:5: error: implicit declaration of function ‘trigger_exception’ [-Werror=implicit-function-declaration]
 1109 |     trigger_exception(
      |     ^~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make: *** [Makefile:54: qtest.o] Error 1

修改 Makefile:

 OBJS := qtest.o report.o console.o harness.o queue.o \
         random.o dudect/constant.o dudect/fixture.o dudect/ttest.o \
         shannon_entropy.o \
-        linenoise.o web.o
+        linenoise.o web.o list_sort.o

效能比較

修改 harness.c 關掉命令的執行時限：

-static int time_limit = 1;
+static int time_limit = 0;

建立 trace-perf-merge-sort.cmd 和 trace-perf-linux-sort.cmd 命令序列，用來給 qtest 産生我自己實作的 merge sort 和 linux list sort 的執行時間數據。

修改 Makefile，加入 sort_perf make target:

sort_perf: 
	./qtest -v 2 -f traces/trace-perf-linux-sort.cmd -l linux_sort.dat
	cat linux_sort.dat | grep -v 'Delta time =' | sed 's/# //g' > linux_sort.dat01
	cat linux_sort.dat | grep 'Delta time =' | sed 's/Delta time = //g' > linux_sort.dat02
	pr -m -T linux_sort.dat01 linux_sort.dat02 > linux_sort.dat
	rm linux_sort.dat01 linux_sort.dat02
	./qtest -v 2 -f traces/trace-perf-merge-sort.cmd -l merge_sort.dat
	cat merge_sort.dat | grep -v 'Delta time =' | sed 's/# //g' > merge_sort.dat01
	cat merge_sort.dat | grep 'Delta time =' | sed 's/Delta time = //g' > merge_sort.dat02
	pr -m -T merge_sort.dat01 merge_sort.dat02 > merge_sort.dat
	rm merge_sort.dat01 merge_sort.dat02
	gnuplot scripts/sort_perf.gp
	rm linux_sort.dat merge_sort.dat
	xdg-open sort_perf.png

執行 make sort_perf 産生以下執行時間比較圖，可以看出我實作的 merge sort 和 linux list sort 有很大的效能落差。

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

你如何設計實驗？用什麼資料當作排序程式的輸入？

整合 web 命令和 linenoise 套件

解決 favicon.ico 的問題

按照作業說明的做法解決 favicon.ico 的問題。

char *p = web_recv(web_connfd, &clientaddr);
char *buffer =
    "HTTP/1.1 200 OK\r\n%s%s%s%s%s%s"
    "Content-Type: text/html\r\n\r\n"
    "<html><head><style>"
    "body{font-family: monospace; font-size: 13px;}"
    "td {padding: 1.5px 6px;}"
    "</style><link rel=\"shortcut icon\" href=\"#\">"
    "</head><body><table>\n";
web_send(web_connfd, buffer);

但是引申了另一個問題－ page loads twice in Google chrome，一些瀏覽器（如 chrome）會每次發出重複多次請求：

cmd> web
listen on port 9999, fd is 3
cmd> l = []
cmd> l = []
cmd> l = []
cmd> l = [a]
cmd> l = [a a]
cmd> l = [a a a]
cmd> Current queue ID: 2
l = [a a a]
cmd> Current queue ID: 2
l = [a a a]
cmd> Current queue ID: 2
l = [a a a]
cmd>

在 console.c 加入一個不做任何事的 favicon.ico 命令迴避這個問題。

static bool do_nothing(int arc, char *argv[])
{
    printf("favicon.ico work around\n");
    return true;
}

...
..
.    
    ADD_COMMAND(web, "Read commands from builtin web server", "[port]");
    add_cmd("favicon.ico", do_nothing,
            "Do nothing command for favicon.ico request from browser", "");

cmd> web
listen on port 9999, fd is 3
cmd> l = []
cmd> favicon.ico work around
cmd> l = [a]
cmd> favicon.ico work around
cmd> Current queue ID: 0
l = [a]
cmd> favicon.ico work around
cmd>

用 select 系統呼叫同時處理 stdin 及 socket

在 linenoise.c 中的 line_edit 的 while(1) 迴圈裡採用 select 系統呼叫查詢哪個 File Descriptor 有內容可以讀取，便不會造成 web 和 stdio 相互阻塞的情況。

使用 git diff 或 diff -u 命令來產生程式碼之間的變更列表，不要憑感覺填入，注意位移量。

+extern int web_fd;
+extern int web_connfd;
 /* This function is the core of the line editing capability of linenoise.
  * It expects 'fd' to be already in "raw mode" so that every key pressed
  * will be returned ASAP to read().
  */
 static int line_edit(int stdin_fd,
                      int stdout_fd,
                      char *buf,
                      size_t buflen,
                      const char *prompt)
 { 
     ...
     ..
     .
     while (1) {
+        fd_set set;
+        FD_ZERO(&set);
+        if (web_fd)
+            FD_SET(web_fd, &set);
+        FD_SET(l.ifd, &set);
+
+        int rv = web_fd ? select(web_fd + 1, &set, NULL, NULL, NULL)
+                        : select(l.ifd + 1, &set, NULL, NULL, NULL);
+
+        switch (rv) {
+        case -1:
+            perror("select"); /* an error occurred */
+            continue;
+        case 0:
+            printf("timeout occurred\n"); /* a timeout occurred */
+            continue;
+        default:
+            if (web_fd && FD_ISSET(web_fd, &set)) {
+                struct sockaddr_in clientaddr;
+                socklen_t clientlen = sizeof clientaddr;
+                web_connfd =
+                    accept(web_fd, (struct sockaddr *) &clientaddr, &clientlen);
+                char *p = web_recv(web_connfd, &clientaddr);
+                strncpy(buf, p, strlen(p) + 1);
+
+                char *buffer =
+                    "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n";
+                web_send(web_connfd, buffer);
+
+                free(p);
+                return strlen(p) + 1;
+            } else if (FD_ISSET(l.ifd, &set)) {
                 signed char c;
                 int nread;
                 char seq[5];
 
                 nread = read(l.ifd, &c, 1);
                 if (nread <= 0)
                     return l.len; 
                 ...
                 ..
                 .
-            return l.len;
+        }
+    }    
+    return l.len;
 }

Fisher-Yates shuffle

q_shuffle 實作：

/* Shuffle elements in queue */
void q_shuffle(struct list_head *head)
{
    if (!head || list_empty(head) || list_is_singular(head))
        return;

    /* srand(time(0)); */

    int len = q_size(head);
    struct list_head *i_th = head->prev;

    for (int i = len; i > 0; i--) {
        struct list_head *r_th = head;
        int r = rand() % i + 1;
        if (r != i) {
            // find r_th
            for (int j = 0; j < r; j++)
                r_th = r_th->next;

            // swap r_th and i_th element
            element_t *e_rth = list_entry(r_th, element_t, list);
            element_t *e_ith = list_entry(i_th, element_t, list);
            char *temp = e_rth->value;
            e_rth->value = e_ith->value;
            e_ith->value = temp;
        }
        i_th = i_th->prev;
    }
}

統計分析

使用作業說明提供的 script 進行分析：

$ python3 scripts/chi_squared_sum.py 
Expectation:  41666
Observation:  {'1234': 41399, '1243': 41753, '1324': 41752, '1342': 41786, '1423': 41823, '1432': 41857, '2134': 41372, '2143': 41883, '2314': 41408, '2341': 41436, '2413': 42025, '2431': 41837, '3124': 41544, '3142': 41452, '3214': 41737, '3241': 41646, '3412': 41844, '3421': 41626, '4123': 41845, '4132': 41684, '4213': 41543, '4231': 41539, '4312': 41529, '4321': 41680}
chi square sum:  18.117601881630108

用查表的方式查出自由度 23，\(X^2\)=18.117601881630108 時，p=0.75108=75.108%。\(H_0\) 不顯著，無證據說此分佈不均勻。

若以時間作為亂數種子 srand(time(0))，進行分析：

$ python3 scripts/chi_squared_sum.py 
Expectation:  41666
Observation:  {'1234': 35691, '1243': 55973, '1324': 44561, '1342': 13536, '1423': 48781, '1432': 72626, '2134': 67690, '2143': 33906, '2314': 28418, '2341': 22146, '2413': 25936, '2431': 38575, '3124': 60054, '3142': 20356, '3214': 34389, '3241': 33245, '3412': 41248, '3421': 38249, '4123': 58762, '4132': 56140, '4213': 41892, '4231': 49423, '4312': 35990, '4321': 42413}
chi square sum:  122952.21797148754

自由度 23，\(X^2\)=122952.21797148754 時，p=0=0%
經由卡方檢定計算出的 p 值小於設定的信心水準 (如0.05) 拒絕虛無假說，不符合平均分佈。

為甚麼採用不同的亂數種子－ os_random(getpid() ^ getppid()) 和 time(0)，會産生相對的結果呢？

The srand() function sets its argument as the seed for a new sequence of pseudo-random integers to be returned by rand(). These sequences are repeatable bycalling srand() with the same seed value

man srand

根據手冊所說，種子代表一組 pseudo-random integer，同一個 pseudo-random number generator 不應該是一樣亂嗎？

複習離散數學。