# 2021q1 Homework1 (lab0)
contributed by < `XDEv11` >
> [GitHub](https://github.com/XDEv11/lab0-c)
> [lab0](https://hackmd.io/@sysprog/linux2021-lab0)
## `queue.[ch]` 實作
### `queue_t` 結構定義
為了能實作出 O(1) 的 `q_size` 與 `q_insert_tail`,這邊直接紀錄佇列的大小與尾端。
並且定義常數 `QUEUE_STRLEN_MAX` 來限制字串最大長度,方便後面使用 `*n*` 的函式,避免 `<string.h>` 中某些不安全的函式。
[CERN Common vulnerabilities guide for C programmers](https://security.web.cern.ch/recommendations/en/codetools/c.shtml)。
```cpp
#define QUEUE_STRLEN_MAX 255
typedef struct {
list_ele_t *head; /* Linked list of elements */
int size;
list_ele_t **tail;
} queue_t;
```
`tail` 是指標的指標,指向 `head`
```graphviz
digraph SLL {
rankdir=LR
head [shape="cds", label="head"]
pp [shape="box3d", label="pp"]
nn [shape="none", label=""]
pp:n -> head:w [color="red"];
head :e -> nn :w [color="red"];
}
```
或是尾端節點的 `next`。
```graphviz
digraph SLL {
rankdir=LR
head [shape="cds", label="head"]
nodes [shape="record", label="..."];
nodeZ [shape="record", label="{Z | {<value>value | <next>next}}"];
head :e -> nodes :w;
nodes :es -> nodeZ :w;
pp [shape="box3d", label="pp"];
nn [shape="none", label=""];
pp:n -> nodeZ:next:w [color="red"];
nodeZ:next:s -> nn [color="red"];
}
```
因此要在尾端放入新節點,只需要指定新節點的指標到 `*tail` 即可。
### 例外處理
若傳入參數為 `NULL` 佇列,則依據函式要求做特殊處理。(e.g., No effect or Return false)
### `q_new`
```cpp
queue_t *q_new()
{
queue_t *q = malloc(sizeof(queue_t));
if (q) {
q->head = NULL;
q->size = 0;
q->tail = &(q->head);
}
return q;
}
```
### `q_free`
```cpp
void q_free(queue_t *q)
{
if (!q)
return;
while (q->head) {
list_ele_t *tmp = q->head;
q->head = q->head->next;
free(tmp->value);
free(tmp);
}
free(q);
}
```
### 輔助函數
這邊先寫一個輔助函數,建立新的節點,並把 `s` 中的字串複製過去。這邊使用 `*n*` 的函式,避免造成緩衝區溢位 ([buffer overflow](https://en.wikipedia.org/wiki/Buffer_overflow)) 的問題。
```cpp
static inline list_ele_t *ele_new(char *s)
{
if (!s)
return NULL;
list_ele_t *newe = malloc(sizeof(list_ele_t));
if (!newe)
return NULL;
size_t slen = strnlen(s, QUEUE_STRLEN_MAX);
char *news = malloc((slen + 1) * sizeof(char));
if (!news) {
free(newe);
return NULL;
}
strncpy(news, s, slen);
news[slen] = '\0';
newe->value = news;
return newe;
}
```
### `q_insert_head`
對空佇列加入元素時,由上圖例可知,需要特別去修改 `q->tail`。
```cpp
bool q_insert_head(queue_t *q, char *s)
{
if (!q)
return false;
list_ele_t *newh = ele_new(s);
if (!newh)
return false;
newh->next = q->head;
if (!q->head)
q->tail = &(newh->next);
q->head = newh;
q->size += 1;
return true;
}
```
### `q_insert_tail`
`q->tail` 改為指向新元素的 `next`。
```cpp
bool q_insert_tail(queue_t *q, char *s)
{
if (!q)
return false;
list_ele_t *newt = ele_new(s);
if (!newt)
return false;
newt->next = NULL;
*(q->tail) = newt;
q->tail = &(newt->next);
q->size += 1;
return true;
}
```
### `q_remove_head`
這邊一樣使用 `*n*` 的函式來進行字串複製。
跟 `q_insert_head` 一樣需要注意的是,當它變為空佇列時,需要特別去修改 `q->tail`。
```cpp
bool q_remove_head(queue_t *q, char *sp, size_t bufsize)
{
if (!q || !q->head)
return false;
if (sp && bufsize) {
strncpy(sp, q->head->value, bufsize - 1);
sp[bufsize - 1] = '\0';
}
list_ele_t *tmp = q->head;
q->head = q->head->next;
if (!q->head)
q->tail = &(q->head);
q->size -= 1;
free(tmp->value);
free(tmp);
return true;
}
```
### `q_size`
```cpp
int q_size(queue_t *q)
{
return q ? q->size : 0;
}
```
### `q_reverse`
逐步地把每個元素加入 `newh` 的前端即可,最後要先把 `q->tail` 指向 `q->head` (即新的尾端) 的 `next`。
```cpp
void q_reverse(queue_t *q)
{
if (!q || !q->head)
return;
list_ele_t *newh = NULL;
list_ele_t *p = q->head;
while (p) {
list_ele_t *tmp = p;
p = p->next;
tmp->next = newh;
newh = tmp;
}
q->tail = &(q->head->next);
q->head = newh;
}
```
### `q_sort`
這邊使用 [merge sort](https://en.wikipedia.org/wiki/Merge_sort) 演算法來實作 `q_sort`,時間複雜度是 $\theta(n\log n$),較為穩定。
這邊的 strcmp 就沒有使用 `*n*` 的函式,因為在程式中可以確保節點裡面的字串會是 null-terminated。
```cpp
static list_ele_t *merge(list_ele_t *p1, list_ele_t *p2)
{
if (!p1 && !p2)
return NULL;
else if (!p1)
return p2;
else if (!p2)
return p1;
list_ele_t *res = NULL, **pp = &res;
while (p1 || p2) {
if (!p2 || (p1 && strcmp(p1->value, p2->value) <= 0)) {
*pp = p1;
p1 = p1->next;
} else {
*pp = p2;
p2 = p2->next;
}
pp = &((*pp)->next);
}
return res;
}
```
透過 `mid` 每次移動一步以及 `tail` 每次移動二步來尋找鏈結串列的中心, `mid` 及 `tail` 皆為指標的指標,不過這邊的意義比較不同,代表的位置不是指向的節點 (`*mid` / `*tail`),而是當指向某個節點的 `next` (或是 `head`),實際上就是代表在那個節點 (或是在 `head`),因此 `tail` 可以透過 `*tail` 判斷是否有下一個節點,也就是是否到達尾端。
一開始 `mid` 跟 `tail` 都在 `head` 上,
```graphviz
digraph SLL {
rankdir=LR
head [shape="cds", label="head"]
nodes [shape="record", label="..."]
head :e -> nodes :w;
mid [shape="box3d", label="mid"]
tail [shape="box3d", label="tail "]
mid :ne -> head :w [color="red"];
tail :ne -> head :w [color="red"];
}
```
移動二次後,
```graphviz
digraph SLL {
rankdir=LR
head [shape="cds", label="head"]
nodeA [shape="record", label="{A | {<value>value | <next>next}}"]
nodeB [shape="record", label="{B | {<value>value | <next>next}}"]
nodeC [shape="record", label="{C | {<value>value | <next>next}}"]
nodeD [shape="record", label="{D | {<value>value | <next>next}}"]
head :e -> nodeA :w;
nodeA:next :s -> nodeB :w;
nodeB:next :s -> nodeC :w;
nodeC:next :s -> nodeD :w;
mid [shape="box3d", label="mid"]
tail [shape="box3d", label="tail "]
mid :ne -> nodeB:next :sw [color="red"];
tail :ne -> nodeD:next :sw [color="red"];
}
```
可以看到此時 `mid` 剛好在前半部的尾端,指標的指標在這裡除了可以得到後半部的前端 (`*mid`),也是為了方便把二部分分開 (`*mid = NULL;`)。
`merge_sort` 的參數 `list` 也使用指標的指標,直接修改真正的指標,一方面因為經過排序後,舊的指標沒有什麼代表意義及用途,另一方面也避免新的指標未被記錄而造成記憶體流失 (Memory leak)。
```cpp
static void merge_sort(list_ele_t **pp)
{
if (!(*list) || !(*list)->next)
return;
// Split into two halves
list_ele_t **mid = list, **tail = list;
while (*tail) {
tail = &((*tail)->next); // Move tail twice
if (*tail)
tail = &((*tail)->next);
mid = &((*mid)->next); // Move mid once
}
list_ele_t *L = *list, *R = *mid;
*mid = NULL;
merge_sort(&L);
merge_sort(&R);
*list = merge(L, R);
}
```
呼叫 `merge_sort` 進行排序後,需重新找一次尾端節點,並把 `q->tail` 指向它的 `next`。
```cpp
void q_sort(queue_t *q)
{
if (!q)
return;
merge_sort(&(q->head));
q->tail = &(q->head);
while (*(q->tail))
q->tail = &((*(q->tail))->next);
}
```
## SANITIZER
透過 `$ make SANITIZER=1` 開啟 [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) 後,
執行 `$ make test`,得到以下錯誤,
```
==1968==ERROR: AddressSanitizer: global-buffer-overflow on address 0x561845b335e0 at pc 0x561845b1bb08 bp 0x7ffd6eb9cc40 sp 0x7ffd6eb9cc30
READ of size 4 at 0x561845b335e0 thread T0
#0 0x561845b1bb07 in do_option_cmd /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/console.c:369
#1 0x561845b1a8f0 in interpret_cmda /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/console.c:221
#2 0x561845b1b0d5 in interpret_cmd /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/console.c:244
#3 0x561845b1c301 in cmd_select /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/console.c:594
#4 0x561845b1c87b in run_console /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/console.c:667
#5 0x561845b19405 in main /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/qtest.c:780
#6 0x7f386810c0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
#7 0x561845b16bad in _start (/mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/qtest+0x8bad)
0x561845b335e1 is located 0 bytes to the right of global variable 'simulation' defined in 'console.c:21:6' (0x561845b335e0) of size 1
'simulation' is ascii string ''
SUMMARY: AddressSanitizer: global-buffer-overflow /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/console.c:369 in do_option_cmd
```
執行 `qtest` 再於命令提示列輸入 `help` 命令,得到以下錯誤,
```
==1954==ERROR: AddressSanitizer: global-buffer-overflow on address 0x56493755b3c0 at pc 0x5649375447dd bp 0x7ffded617170 sp 0x7ffded617160
READ of size 4 at 0x56493755b3c0 thread T0
#0 0x5649375447dc in do_help_cmd /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/console.c:307
#1 0x5649375448f0 in interpret_cmda /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/console.c:221
#2 0x5649375450d5 in interpret_cmd /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/console.c:244
#3 0x564937546818 in run_console /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/console.c:660
#4 0x564937543405 in main /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/qtest.c:780
#5 0x7fd4c631e0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
#6 0x564937540bad in _start (/mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/qtest+0x8bad)
0x56493755b3c1 is located 0 bytes to the right of global variable 'echo' defined in 'console.c:59:13' (0x56493755b3c0) of size 1
SUMMARY: AddressSanitizer: global-buffer-overflow /mnt/e/SF/Linux_Kernel_Internals/homework1/lab0-c/console.c:307 in do_help_cmd
```
可以看到程式在 `console.c:369` 以及 `console.c:307` 出問題,程式碼如下,
```cpp=369
int oldval = *plist->valp;
```
```cpp=307
report(1, "\t%s\t%d\t%s", plist->name, *plist->valp,
plist->documentation);
```
可以發現都與 `plist` 有關,型態是 `param_ptr`,到 `console.h` 中可找到相關型態定義,
```cpp=30
/* Integer-valued parameters */
typedef struct PELE param_ele, *param_ptr;
struct PELE {
char *name;
int *valp;
char *documentation;
/* Function that gets called whenever parameter changes */
setter_function setter;
param_ptr next;
};
```
而錯誤訊息中也發現是跟 `simulation` 以及 `echo` 二個變數有關,來看看 `console.c` 中相關的部分。
```cpp=21
bool simulation = false;
```
```cpp=59
static bool echo = 0;
```
```cpp=104
add_param("simulation", (int *) &simulation, "Start/Stop simulation mode",
NULL);
```
```cpp=108
add_param("echo", (int *) &echo, "Do/don't echo commands", NULL);
```
那再來看看 `add_param` 到底在幹嗎?
```cpp=133
/* Add a new parameter */
void add_param(char *name,
int *valp,
char *documentation,
setter_function setter)
{
param_ptr next_param = param_list;
param_ptr *last_loc = ¶m_list;
while (next_param && strcmp(name, 412 next_param->name) > 0) {
last_loc = &next_param->next;
next_param = next_param->next;
}
param_ptr ele = (param_ptr) malloc_or_fail(sizeof(param_ele), "add_param");
ele->name = name;
ele->valp = valp;
ele->documentation = documentation;
ele->setter = setter;
ele->next = next_param;
*last_loc = ele;
}
```
原來它就是在 `param_list` 的尾端加入一個新的參數,但是 `valp` 的型態為 `int *`,而呼叫時被直接轉型,導致後續透過 `int` 的指標 dereference 時發生記憶體錯誤。
為了不改變其它結構定義與函數原型,最簡單的解決辦法就是把二個 `bool` 變數改為 `int` 型態。
## VALGRIND
執行 `$ make valgrind`,會得到以下訊息。
```
--- Trace Points
+++ TESTING trace trace-01-ops:
# Test of insert_head and remove_head
==3151== 5 bytes in 1 blocks are still reachable in loss record 1 of 3
==3151== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3151== by 0x4A4A50E: strdup (strdup.c:42)
==3151== by 0x110092: linenoiseHistoryAdd (linenoise.c:1236)
==3151== by 0x110C25: linenoiseHistoryLoad (linenoise.c:1325)
==3151== by 0x10C24C: main (qtest.c:769)
==3151==
==3151== 5 bytes in 1 blocks are still reachable in loss record 2 of 3
==3151== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3151== by 0x4A4A50E: strdup (strdup.c:42)
==3151== by 0x110006: linenoiseHistoryAdd (linenoise.c:1236)
==3151== by 0x110C25: linenoiseHistoryLoad (linenoise.c:1325)
==3151== by 0x10C24C: main (qtest.c:769)
==3151==
==3151== 160 bytes in 1 blocks are still reachable in loss record 3 of 3
==3151== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3151== by 0x110052: linenoiseHistoryAdd (linenoise.c:1224)
==3151== by 0x110C25: linenoiseHistoryLoad (linenoise.c:1325)
==3151== by 0x10C24C: main (qtest.c:769)
==3151==
--- trace-01-ops 6/6
--- TOTAL 6/6
```
`qtest.c:769` 呼叫 `linenoiseHistoryLoad()`,又會呼叫 `linenoiseHistoryAdd()`,在 `linenoise.c:1224` 附近可以看到以下的程式碼,
```cpp=1222
/* Initialization on first call. */
if (history == NULL) {
history = malloc(sizeof(char *) * history_max_len);
if (history == NULL)
return 0;
memset(history, 0, (sizeof(char *) * history_max_len));
}
```
配置了記憶體給 `history`。
而在 `linenoise.h` 中,我們可以看到 `freeHistory()` 這個函式, `linenoise.c` 中的程式碼如下,
```cpp=1186
/* ================================ History ================================= */
/* Free the history, but does not reset it. Only used when we have to
* exit() to avoid memory leaks are reported by valgrind & co. */
static void freeHistory(void)
{
if (history) {
int j;
for (j = 0; j < history_len; j++)
free(history[j]);
free(history);
}
}
```
可以得知應透過這個函式來釋放 `history` 的記憶體。
先看一下 `freeHistory()` 會在哪邊被呼叫,追蹤過程中可以看到 [`atexit()`](https://man7.org/linux/man-pages/man3/atexit.3.html)。
> The atexit() function registers the given function to be called at normal process termination, either via exit() or via return from the program's main(). Functions so registered are called in the reverse order of their registration; no arguments are passed.
```graphviz
digraph {
free [shape="plaintext", label="freeHistory()"];
exit [shape="plaintext", label="linenoiseAtExit()"];
enable [shape="plaintext", label="enableRawMode()"];
raw [shape="plaintext", label="linenoiseRaw()"];
linenoise [shape="plaintext", label="linenoise()"];
run [shape="plaintext", label="run_console()"];
run -> linenoise;
linenoise -> raw;
raw -> enable;
enable -> exit[label="atexit()"];
exit -> free;
}
```
在 `console.c` 中,
```cpp=650
bool run_console(char *infile_name)
{
if (!push_file(infile_name)) {
report(1, "ERROR: Could not open source file '%s'", infile_name);
return false;
}
if (!has_infile) {
char *cmdline;
while ((cmdline = linenoise(prompt)) != NULL) {
interpret_cmd(cmdline);
linenoiseHistoryAdd(cmdline); /* Add to the history. */
linenoiseHistorySave(HISTORY_FILE); /* Save the history on disk. */
linenoiseFree(cmdline);
}
} else {
while (!cmd_done())
cmd_select(0, NULL, NULL, NULL, NULL);
}
return err_cnt == 0;
}
```
只有 Raw mode 的時候會呼叫到 `linenoise()`,也只有這時候需要用到 `history` 的相關函式,因此把 ` qtest.c:769` 附近程式碼改成,
```cpp=768
if (!infile_name) {
linenoiseHistorySetMaxLen(HISTORY_LEN);
linenoiseHistoryLoad(HISTORY_FILE); /* Load the history at startup */
}
```
即可解決問題。
###### tags: `linux2021`