# 2024q1 Homework1 (lab0)
contributed by < [`johnny69527`](https://github.com/johnny69527/lab0-c) >
## 開發環境
```bash
$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 36 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
CPU family: 6
Model: 42
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Stepping: 7
CPU max MHz: 2900.0000
CPU min MHz: 800.0000
BogoMIPS: 4589.81
```
## 指定的佇列操作
不知道為何在 trace-17-complexity 中的幾個單純的操作被認為可能不是 constant time?
> follow up: 研究 dudect 相關程式
```shell
+++ TESTING trace trace-17-complexity:
# Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant
ERROR: Probably not constant time or wrong implementation
ERROR: Probably not constant time or wrong implementation
Probably constant time
ERROR: Probably not constant time or wrong implementation
--- trace-17-complexity 0/5
--- TOTAL 95/100
make: *** [Makefile:60: test] Error 1
```
## 引入 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c)
### 加入 linux list sort 實作程式碼
[`list_sort.h`](https://github.com/johnny69527/lab0-c/blob/master/list_sort.h)和[`list_sort.c`](https://github.com/johnny69527/lab0-c/blob/master/list_sort.c)
### 加入 linux_sort 命令
:::danger
使用 `git diff` 或 `diff -u` 命令來產生程式碼之間的變更列表,不要憑感覺填入,注意位移量。
:::
修改 `qtest.c` :
```c
+#include "list_sort.h"
#include "queue.h"
...
..
.
-bool do_sort(int argc, char *argv[])
+static int cmp(void *priv, struct list_head *a, struct list_head *b)
+{
+ element_t *l = list_entry(a, element_t, list);
+ element_t *r = list_entry(b, element_t, list);
+ return strcmp(l->value, r->value);
+}
+
+static int cmp_descend(void *priv, struct list_head *a, struct list_head *b)
+{
+ return -1 * cmp(priv, a, b);
+}
+
+bool _do_sort(int argc, char *argv[], int mode)
{
...
..
.
set_noallocate_mode(true);
- if (current && exception_setup(true))
- q_sort(current->q, descend);
+ if (current && exception_setup(true)) {
+ if (mode == 0)
+ q_sort(current->q, descend);
+ else if (mode == 1)
+ list_sort(NULL, current->q, descend ? cmp_descend : cmp);
+ }
exception_cancel();
set_noallocate_mode(false);
...
..
.
+bool do_sort(int argc, char *argv[])
+{
+ return _do_sort(argc, argv, 0);
+}
+
+bool do_linux_sort(int argc, char *argv[])
+{
+ return _do_sort(argc, argv, 1);
+}
+
...
..
.
ADD_COMMAND(sort, "Sort queue in ascending/descening order", "");
+ ADD_COMMAND(linux_sort, "Sort queue with linux lib/list_sort.c", "");
ADD_COMMAND(size, "Compute queue size n times (default: n == 1)", "[n]");
...
..
```
不知道為何?上述修改,若把 `#include "list_sort.h` 放在 `#include "harness.h"` 之前,會得到如下編譯錯誤。但放在之後,便能正確編譯。
:::danger
不要說「不知道為何?」陳述事實即可,找到根本原因。
:::
```bash
$ make
CC qtest.o
qtest.c: In function ‘do_free’:
qtest.c:104:5: error: implicit declaration of function ‘error_check’ [-Werror=implicit-function-declaration]
104 | error_check();
| ^~~~~~~~~~~
qtest.c:107:9: error: implicit declaration of function ‘set_cautious_mode’ [-Werror=implicit-function-declaration]
107 | set_cautious_mode(false);
| ^~~~~~~~~~~~~~~~~
qtest.c:118:13: error: implicit declaration of function ‘exception_setup’ [-Werror=implicit-function-declaration]
118 | if (exception_setup(true))
| ^~~~~~~~~~~~~~~
qtest.c:120:9: error: implicit declaration of function ‘exception_cancel’ [-Werror=implicit-function-declaration]
120 | exception_cancel();
| ^~~~~~~~~~~~~~~~
qtest.c:132:19: error: implicit declaration of function ‘allocation_check’ [-Werror=implicit-function-declaration]
132 | size_t bcnt = allocation_check();
| ^~~~~~~~~~~~~~~~
qtest.c: In function ‘do_reverse’:
qtest.c:526:5: error: implicit declaration of function ‘set_noallocate_mode’ [-Werror=implicit-function-declaration]
526 | set_noallocate_mode(true);
| ^~~~~~~~~~~~~~~~~~~
qtest.c: In function ‘console_init’:
qtest.c:1087:26: error: ‘fail_probability’ undeclared (first use in this function)
1087 | add_param("malloc", &fail_probability, "Malloc failure probability percent",
| ^~~~~~~~~~~~~~~~
qtest.c:1087:26: note: each undeclared identifier is reported only once for each function it appears in
qtest.c: In function ‘sigalrm_handler’:
qtest.c:1109:5: error: implicit declaration of function ‘trigger_exception’ [-Werror=implicit-function-declaration]
1109 | trigger_exception(
| ^~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make: *** [Makefile:54: qtest.o] Error 1
```
修改 `Makefile`:
```
OBJS := qtest.o report.o console.o harness.o queue.o \
random.o dudect/constant.o dudect/fixture.o dudect/ttest.o \
shannon_entropy.o \
- linenoise.o web.o
+ linenoise.o web.o list_sort.o
```
### 效能比較
修改 `harness.c` 關掉命令的執行時限:
```c
-static int time_limit = 1;
+static int time_limit = 0;
```
建立 [trace-perf-merge-sort.cmd](https://github.com/johnny69527/lab0-c/blob/master/traces/trace-perf-merge-sort.cmd) 和 [trace-perf-linux-sort.cmd](https://github.com/johnny69527/lab0-c/blob/master/traces/trace-perf-linux-sort.cmd) 命令序列,用來給 qtest 産生我自己實作的 merge sort 和 linux list sort 的執行時間數據。
修改 `Makefile`,加入 `sort_perf` make target:
```
sort_perf:
./qtest -v 2 -f traces/trace-perf-linux-sort.cmd -l linux_sort.dat
cat linux_sort.dat | grep -v 'Delta time =' | sed 's/# //g' > linux_sort.dat01
cat linux_sort.dat | grep 'Delta time =' | sed 's/Delta time = //g' > linux_sort.dat02
pr -m -T linux_sort.dat01 linux_sort.dat02 > linux_sort.dat
rm linux_sort.dat01 linux_sort.dat02
./qtest -v 2 -f traces/trace-perf-merge-sort.cmd -l merge_sort.dat
cat merge_sort.dat | grep -v 'Delta time =' | sed 's/# //g' > merge_sort.dat01
cat merge_sort.dat | grep 'Delta time =' | sed 's/Delta time = //g' > merge_sort.dat02
pr -m -T merge_sort.dat01 merge_sort.dat02 > merge_sort.dat
rm merge_sort.dat01 merge_sort.dat02
gnuplot scripts/sort_perf.gp
rm linux_sort.dat merge_sort.dat
xdg-open sort_perf.png
```
執行 `make sort_perf` 産生以下執行時間比較圖,可以看出我實作的 merge sort 和 linux list sort 有很大的效能落差。

:::danger
你如何設計實驗?用什麼資料當作排序程式的輸入?
:::
## 整合 web 命令和 linenoise 套件
### 解決 favicon.ico 的問題
按照作業說明的做法解決 favicon.ico 的問題。
```c
char *p = web_recv(web_connfd, &clientaddr);
char *buffer =
"HTTP/1.1 200 OK\r\n%s%s%s%s%s%s"
"Content-Type: text/html\r\n\r\n"
"<html><head><style>"
"body{font-family: monospace; font-size: 13px;}"
"td {padding: 1.5px 6px;}"
"</style><link rel=\"shortcut icon\" href=\"#\">"
"</head><body><table>\n";
web_send(web_connfd, buffer);
```
但是引申了另一個問題 - [page loads twice in Google chrome](https://stackoverflow.com/questions/2009092/page-loads-twice-in-google-chrome),一些瀏覽器(如 chrome)會每次發出重複多次請求:
```shell
cmd> web
listen on port 9999, fd is 3
cmd> l = []
cmd> l = []
cmd> l = []
cmd> l = [a]
cmd> l = [a a]
cmd> l = [a a a]
cmd> Current queue ID: 2
l = [a a a]
cmd> Current queue ID: 2
l = [a a a]
cmd> Current queue ID: 2
l = [a a a]
cmd>
```
在 `console.c` 加入一個不做任何事的 favicon.ico 命令迴避這個問題。
```c
static bool do_nothing(int arc, char *argv[])
{
printf("favicon.ico work around\n");
return true;
}
...
..
.
ADD_COMMAND(web, "Read commands from builtin web server", "[port]");
add_cmd("favicon.ico", do_nothing,
"Do nothing command for favicon.ico request from browser", "");
```
```bash
cmd> web
listen on port 9999, fd is 3
cmd> l = []
cmd> favicon.ico work around
cmd> l = [a]
cmd> favicon.ico work around
cmd> Current queue ID: 0
l = [a]
cmd> favicon.ico work around
cmd>
```
### 用 select 系統呼叫同時處理 stdin 及 socket
在 `linenoise.c` 中的 `line_edit` 的 `while(1)` 迴圈裡採用 `select` 系統呼叫查詢哪個 File Descriptor 有內容可以讀取,便不會造成 web 和 stdio 相互阻塞的情況。
:::danger
使用 `git diff` 或 `diff -u` 命令來產生程式碼之間的變更列表,不要憑感覺填入,注意位移量。
:::
```c
+extern int web_fd;
+extern int web_connfd;
/* This function is the core of the line editing capability of linenoise.
* It expects 'fd' to be already in "raw mode" so that every key pressed
* will be returned ASAP to read().
*/
static int line_edit(int stdin_fd,
int stdout_fd,
char *buf,
size_t buflen,
const char *prompt)
{
...
..
.
while (1) {
+ fd_set set;
+ FD_ZERO(&set);
+ if (web_fd)
+ FD_SET(web_fd, &set);
+ FD_SET(l.ifd, &set);
+
+ int rv = web_fd ? select(web_fd + 1, &set, NULL, NULL, NULL)
+ : select(l.ifd + 1, &set, NULL, NULL, NULL);
+
+ switch (rv) {
+ case -1:
+ perror("select"); /* an error occurred */
+ continue;
+ case 0:
+ printf("timeout occurred\n"); /* a timeout occurred */
+ continue;
+ default:
+ if (web_fd && FD_ISSET(web_fd, &set)) {
+ struct sockaddr_in clientaddr;
+ socklen_t clientlen = sizeof clientaddr;
+ web_connfd =
+ accept(web_fd, (struct sockaddr *) &clientaddr, &clientlen);
+ char *p = web_recv(web_connfd, &clientaddr);
+ strncpy(buf, p, strlen(p) + 1);
+
+ char *buffer =
+ "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n";
+ web_send(web_connfd, buffer);
+
+ free(p);
+ return strlen(p) + 1;
+ } else if (FD_ISSET(l.ifd, &set)) {
signed char c;
int nread;
char seq[5];
nread = read(l.ifd, &c, 1);
if (nread <= 0)
return l.len;
...
..
.
- return l.len;
+ }
+ }
+ return l.len;
}
```
## Fisher-Yates shuffle
`q_shuffle` 實作:
```c
/* Shuffle elements in queue */
void q_shuffle(struct list_head *head)
{
if (!head || list_empty(head) || list_is_singular(head))
return;
/* srand(time(0)); */
int len = q_size(head);
struct list_head *i_th = head->prev;
for (int i = len; i > 0; i--) {
struct list_head *r_th = head;
int r = rand() % i + 1;
if (r != i) {
// find r_th
for (int j = 0; j < r; j++)
r_th = r_th->next;
// swap r_th and i_th element
element_t *e_rth = list_entry(r_th, element_t, list);
element_t *e_ith = list_entry(i_th, element_t, list);
char *temp = e_rth->value;
e_rth->value = e_ith->value;
e_ith->value = temp;
}
i_th = i_th->prev;
}
}
```
### 統計分析
使用作業說明提供的 script 進行分析:
```
$ python3 scripts/chi_squared_sum.py
Expectation: 41666
Observation: {'1234': 41399, '1243': 41753, '1324': 41752, '1342': 41786, '1423': 41823, '1432': 41857, '2134': 41372, '2143': 41883, '2314': 41408, '2341': 41436, '2413': 42025, '2431': 41837, '3124': 41544, '3142': 41452, '3214': 41737, '3241': 41646, '3412': 41844, '3421': 41626, '4123': 41845, '4132': 41684, '4213': 41543, '4231': 41539, '4312': 41529, '4321': 41680}
chi square sum: 18.117601881630108
```

用[查表](https://homepage.divms.uiowa.edu/~mbognar/applets/chisq.html)的方式查出自由度 23,$X^2$=18.117601881630108 時,p=0.75108=75.108%。$H_0$ 不顯著,無證據說此分佈不均勻。
若以時間作為亂數種子 `srand(time(0))`,進行分析:
```
$ python3 scripts/chi_squared_sum.py
Expectation: 41666
Observation: {'1234': 35691, '1243': 55973, '1324': 44561, '1342': 13536, '1423': 48781, '1432': 72626, '2134': 67690, '2143': 33906, '2314': 28418, '2341': 22146, '2413': 25936, '2431': 38575, '3124': 60054, '3142': 20356, '3214': 34389, '3241': 33245, '3412': 41248, '3421': 38249, '4123': 58762, '4132': 56140, '4213': 41892, '4231': 49423, '4312': 35990, '4321': 42413}
chi square sum: 122952.21797148754
```

自由度 23,$X^2$=122952.21797148754 時,p=0=0%
經由卡方檢定計算出的 p 值小於設定的信心水準 (如0.05) 拒絕虛無假說,不符合平均分佈。
為甚麼採用不同的亂數種子 - `os_random(getpid() ^ getppid())` 和 `time(0)`,會産生相對的結果呢?
>
> The srand() function sets its argument as the seed for a new sequence of pseudo-random integers to be returned by rand(). These sequences are repeatable bycalling srand() with the same seed value
>
>man srand
根據手冊所說,種子代表一組 pseudo-random integer,同一個 pseudo-random number generator 不應該是一樣亂嗎?
:::danger
複習離散數學。
:::