2021q1 Homework1 (lab0)

# 2021q1 Homework1 (lab0) contributed by < [`bakudr18`](https://github.com/bakudr18/lab0-c)> ###### tags: `linux2021` * 作業要求: [lab0](https://hackmd.io/@sysprog/linux2021-lab0) ## queue[.ch] 實作 ### queue_t ```cpp typedef struct { list_ele_t *head; /* Linked list of elements */ list_ele_t *tail; size_t size; } queue_t; ``` 宣告 `queue_t` 結構成員如上，定義 `tail`與 `size` 使後續實作達到時間複雜度 $O(1)$ 的要求。 ### q_new & q_free `q_new` 除了要記得對所有成員初始化，也須考慮 malloc failed 。`q_free` 僅須記得照順序釋放記憶體即可。 ### q_insert_head, q_insert_tail & q_remove_head ```cpp bool q_insert_head(queue_t *q, char *s) { list_ele_t *newh; size_t length; if (!q) return false; newh = malloc(sizeof(list_ele_t)); if (!newh) return false; length = strlen(s) + 1; newh->value = malloc(sizeof(char) * length); if (!newh->value) { free(newh); return false; } strncpy(newh->value, s, length); newh->next = q->head; q->head = newh; if (!q->tail) q->tail = q->head; q->size++; return true; } bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { list_ele_t *tmp; if (!q || !q->head) return false; if (sp) { size_t length = strlen(q->head->value) + 1; length = bufsize < length ? bufsize : length; strncpy(sp, q->head->value, length - 1); sp[length - 1] = '\0'; } tmp = q->head; q->head = q->head->next; free(tmp->value); free(tmp); if (!q->head) q->tail = NULL; q->size--; return true; } ``` `q_insert_head` 與 `q_insert_tail` 在前半部配置記憶體部分是一樣的，僅須注意 `free(newh)` 這行即可，`q_insert_tail` 請參考 [ commit](https://github.com/bakudr18/lab0-c/commit/bd910101fbeea4a752d0c819dc9402cdf87bb285)。而 `q_remove_head` 須注意 `bufsize` 是由使用者傳進來的參數， size 很可能小於 `q->head->value` ，因此需做適當的裁切 :::danger 避免在超連結指向頻繁變動的檔案 (此例為 `queue.c`) 的行數，改用指向特定 git commit 的超連結。 :notes: jserv ::: ::: success 已修正連結至 git commit :notes: Bakud ::: ### q_size & q_reverse 因為 `queue_t` 有儲存 `size` 成員，因此能做到 $O(1)$ 時間複雜度。 `q_reverse` 就透過 `curr` , `next` , `q->head` 三個 node 實作走訪節點及反轉即可。 ### q_sort 考慮在 worse case 也能有 $O(nlogn)$ 時間複雜度，因此使用 merge sort 實作 ```cpp static list_ele_t *merge_sort(list_ele_t **head, size_t size) { list_ele_t *mid; size_t ll, rl; if (size < 2) return *head; rl = size / 2; ll = size - rl; mid = q_split(*head, ll); merge_sort(head, ll); merge_sort(&mid, rl); return q_merge_sorted_list(head, *head, mid); } void q_sort(queue_t *q) { if (!q) return; q->tail = merge_sort(&q->head, q->size); } ``` `merge_sort` 函式以遞迴方式實作，透過傳入 `size` 來找到中間的 node ，並且回傳更新後的 `tail` 來更新 `q->tail` 。 ```cpp static list_ele_t *q_split(list_ele_t *walk, size_t step) { list_ele_t *remain; while (--step) walk = walk->next; remain = walk->next; walk->next = NULL; return remain; } static list_ele_t *q_merge_sorted_list(list_ele_t **head, list_ele_t *first, list_ele_t *second) { while (first && second) { if (strcasecmp(first->value, second->value) < 0) { *head = first; first = first->next; } else { *head = second; second = second->next; } head = &(*head)->next; } if (first) *head = first; else *head = second; while ((*head)->next) head = &(*head)->next; return *head; } ``` 輔助函式 `q_split` 會回傳 split 後的 linked list head。 `q_merge_sorted_list` 用來合併兩個 sorted list，並且需回傳 `tail`。但是目前的實作方法在測資 trace-15-perf.cmd 會超過 time limit。 ## 改進 queue library ### Flexible array member [Flexible array member](https://frankchang0125.blogspot.com/2013/01/c-struct-hack-structure-with-variable.html) (以下簡稱 FAM) 在 C99 以後被納入標準，根據 C99 §6.7.2.1 ¶16 > As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. With two exceptions, the flexible array member is ignored. First, the size of the structure shall be equal to the offset of the last element of an otherwise identical structure that replaces the flexible array member with an array of unspecified length.106) Second, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it. 透過宣告 incomplete array type 在 struct 的最後一個成員，可以透過 `.` 或 `->` operator 來操作 FAM ，這樣做的好處大致有兩個，其一是 FAM 本身不占空間，因此以下宣告 ```cpp typedef struct ELE { struct ELE *next; char value[]; } list_ele_t; ``` `sizeof(list_ele_t)` 只佔 8 bytes ，而 FAM 也無需考慮 struct memory alignment 的問題。第二個好處是在配置記憶體時只需配置一次，可以避免記憶體變得比較零散，能夠增加 cache hit 的機率。 ```diff bool q_insert_head(queue_t *q, char *s) { ... - newh = malloc(sizeof(list_ele_t)); - if (!newh) - return false; - length = strlen(s) + 1; - newh->value = malloc(sizeof(char) * length); - if (!newh->value) { - free(newh); + newh = malloc(sizeof(list_ele_t) + sizeof(char) * length); + if (!newh) return false; - } strncpy(newh->value, s, length); newh->next = q->head; ... } ``` ### Optimizing merge sort * 根據 [Optimizing merge sort](https://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sort) ，目前做法的 merge sort 不利於 memory locality。而 tiled merge sort 在 split subarray 到 $S$ 時(約為L1 cache size)，就透過 in-place sort alogrithm 如 insertion sort 進行排序來減少 cache swap 的機率，然後才繼續 merge subarray。這部分的實作會等到其他題目做完後再補上。 ## 以 Address Sanitizer 偵測記憶體錯誤 ### 分析 qtest 首先執行 `make SANITIER=1` 開啟 [Address Sanitier](https://github.com/google/sanitizers/wiki/AddressSanitizer) 功能，編譯完成後執行 `qtest` 的 `help` 指令，會得到以下輸出 ``` ! ================================================================= ==31814==ERROR: AddressSanitizer: global-buffer-overflow on address 0x561e33cf5380 at pc 0x561e33cdf637 bp 0x7fff1b2c09e0 sp 0x7fff1b2c09d0 READ of size 4 at 0x561e33cf5380 thread T0 #0 0x561e33cdf636 in do_help_cmd /home/ycyao/linux2021/lab0-c/console.c:307 #1 0x561e33cdf74a in interpret_cmda /home/ycyao/linux2021/lab0-c/console.c:221 #2 0x561e33cdff2f in interpret_cmd /home/ycyao/linux2021/lab0-c/console.c:244 #3 0x561e33ce1672 in run_console /home/ycyao/linux2021/lab0-c/console.c:660 #4 0x561e33cde25f in main /home/ycyao/linux2021/lab0-c/qtest.c:780 #5 0x7fd621bba0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) #6 0x561e33cdbb8d in _start (/home/ycyao/linux2021/lab0-c/qtest+0x8b8d) ``` 可以看到程式在 `console.c` 在第 307 行發生 [global buffer overflow](https://github.com/google/sanitizers/wiki/AddressSanitizerExampleGlobalOutOfBounds)，也就是存取到不該使用的用來存全域變數的記憶體位置。 ```c=304 param_ptr plist = param_list; report(1, "Options:"); while (plist) { report(1, "\t%s\t%d\t%s", plist->name, *plist->valp, plist->documentation); plist = plist->next; } ``` 而 `param_list` 的 `name`, `valp` 與 `documentation` 三個參數是透過 `add_param` 函式初始化的，而 `add_param` 則是在 `init_cmd` 函式中被呼叫到 ```c void add_param(char *name, int *valp, char *documentation, setter_function setter) { ... ele->name = name; ele->valp = valp; ele->documentation = documentation; ... } ``` ```c add_param("simulation", (int *) &simulation, "Start/Stop simulation mode", NULL); add_param("verbose", &verblevel, "Verbosity level", NULL); add_param("error", &err_limit, "Number of errors until exit", NULL); add_param("echo", (int *) &echo, "Do/don't echo commands", NULL); ``` 估計是這附近造成 Error。接著繼續觀察 Address Sanitizer 的輸出結果 ```! ==31814==ERROR: AddressSanitizer: global-buffer-overflow on address 0x561e33cf5380 at pc 0x561e33cdf637 bp 0x7fff1b2c09e0 sp 0x7fff1b2c09d0 ``` ```! 0x561e33cf5381 is located 0 bytes to the right of global variable 'echo' defined in 'console.c:59:13' (0x561e33cf5380) of size 1 SUMMARY: AddressSanitizer: global-buffer-overflow /home/ycyao/linux2021/lab0-c/console.c:307 in do_help_cmd Shadow bytes around the buggy address: 0x0ac446796a20: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 0x0ac446796a30: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 0x0ac446796a40: f9 f9 f9 f9 00 00 00 00 04 f9 f9 f9 f9 f9 f9 f9 0x0ac446796a50: 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 0x0ac446796a60: 01 f9 f9 f9 f9 f9 f9 f9 01 f9 f9 f9 f9 f9 f9 f9 =>0x0ac446796a70:[01]f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 0x0ac446796a80: 04 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00 ``` 可以發現程式在記憶體位置 `0x561e33cf5380` 的地方出錯，並且很貼心地告訴你這個位置存放的變數是 `echo` ，並且 size 為 1 byte。而 `add_param` 使用到 `echo` 的地方就是 `console.c` 的 108 行 ```c=108 add_param("echo", (int *) &echo, "Do/don't echo commands", NULL); ``` 這裡對 `echo` 做了強制轉型，由 `bool*` 轉成 `int*` ，以滿足函式參數條件，推測是這裡出錯了，接著我們以 gdb 來實驗 ~~(順便練習使用 gdb :stuck_out_tongue_winking_eye:)~~。 ```bash= $ gdb ./qtest # 設定 break point 在程式出錯的 307 行 (gdb) b console.c:307 (gdb) r Breakpoint 1, do_help_cmd (argc=<optimized out>, argv=<optimized out>) at console.c:307 307 report(1, "\t%s\t%d\t%s", plist->name, *plist->valp, # print plist->valp (gdb) p plist->valp $1 = (int *) 0x555555576380 <echo> # print &echo (gdb) p &echo $2 = (_Bool *) 0x555555576380 <echo> # sizeof echo (gdb) p sizeof(echo) $3 = 1 # size of *plist->valp (gdb) p sizeof(*plist->valp) $4 = 4 ``` **可以看到 plist->valp 的型態為 `int*` 且 size 為 4，但其指向的變數 `echo` 型態為 `_Bool*` 而 size 為 1，因此在 `printf` 時會操作到錯誤的記憶體空間 `0x555555576380` ~ `0x555555576383`。** 再繼續觀察 Address Sanitizer， ``` Shadow bytes around the buggy address: =>0x0ac446796a70:[01]f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 ``` 注意到這裡的 `0x0ac446796a70` 與上面 `echo` 的記憶體位置並不一樣，這是因為這裡的位置是由 Address Sanitizer memory mapping 而來的，根據 [AddressSanitizerAlgorithm](https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#memory-mapping-and-instrumentation) > The virtual address space is divided into 2 disjoint classes: > * Main application memory (Mem): this memory is used by the regular application code. > * Shadow memory (Shadow): this memory contains the shadow values (or metadata). There is a correspondence between the shadow and the main application memory. Poisoning a byte in the main memory means writing some special value into the corresponding shadow memory. > AddressSanitizer maps 8 bytes of the application memory into 1 byte of the shadow memory. 也就是說，上面的 `0x555555576380` 為程式實際使用的記憶體，而 `0x0ac446796a70` 是 Address Sanitizer 映射出來的 shadow memory， Address Sanitizer 會將 application 中的 8 bytes 映射的 1 bytes 的 shadow memory ，而其中的 `[01]` 表示這個位址只有 1 byte 可以使用，`f9` 表示保留給全域變數但尚未宣告不可使用的位址。 ``` Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Global redzone: f9 ``` ### 修正 Address Sanitizer 偵測到的錯誤一開始的方向是想把 `int` 和 `bool` 型態的 `valp` 用 `union` 加到 `struct PELE` 內，然後利用 enum 和 C11 standard 中的 `_Generic` macro (詳見[你所不知道的C語言:前置處理器篇](https://hackmd.io/@sysprog/c-preprocessor?type=view#_Generic-C11)) 等技巧達到傳入 `valp` type 到 `add_param` 的目的，大致程式邏輯如下所示。 ```c= #define VALP_TYPE(x) _Generic((x), int : valp_type_int, bool : valp_type_bool) typedef enum { valp_type_int, valp_type_bool } valp_type_t; struct PELE { char *name; union { int *valp_int; bool *valp_bool; }; int valp_type; ... }; void add_param(char *name, void *valp, int valp_type, char *doccumentation, setter_function setter); ``` 雖然這樣做可以保留在未來要擴充新型態的 `valp` 需要，但是也有明顯的缺點。當未來需要新增新型態時，除了上述的 `struct PELE`, `VALP_TYPE`, `enum` 需要修改外，還需要在所有用到 `valp` 的函式內(包含 `do_option_cmd`, `*setter_function` 等)都新增 `valp` type 的判斷，因此勢必會造成後續維護上的困難，考量過後我認為直接把 `simulation` 和 `echo` 改成 `int` 反而是比較好的做法(或許有更好的做法但我暫時想不到)，即使這個做法會需要改到 `console.[ch]` 以外的程式碼(因為 `simulation` 是全域變數)，但是這樣做在後續的維護上也會更方便，詳細細節請見 [git commit](https://github.com/bakudr18/lab0-c/commit/fd8bec73e2949f96db3d81af3703c3b3c604f2a9)。 ## 運用 Valgrind 排除 qtest 的記憶體錯誤 ### 錯誤1 執行 `valgrind ./qtest -f traces/trace-01-ops.cmd` 會得到以下輸出，可知 memory leak 是來自 `linenoiseHistory` 沒被釋放。 ```bash ==11922== 5 bytes in 1 blocks are still reachable in loss record 1 of 3 ==11922== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==11922== by 0x4A4C50E: strdup (strdup.c:42) ==11922== by 0x10FF60: linenoiseHistoryAdd (linenoise.c:1236) ==11922== by 0x110AF3: linenoiseHistoryLoad (linenoise.c:1325) ==11922== by 0x10C1B5: main (qtest.c:769) ==11922== ==11922== 79 bytes in 19 blocks are still reachable in loss record 2 of 3 ==11922== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==11922== by 0x4A4C50E: strdup (strdup.c:42) ==11922== by 0x10FED4: linenoiseHistoryAdd (linenoise.c:1236) ==11922== by 0x110AF3: linenoiseHistoryLoad (linenoise.c:1325) ==11922== by 0x10C1B5: main (qtest.c:769) ==11922== ==11922== 160 bytes in 1 blocks are still reachable in loss record 3 of 3 ==11922== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==11922== by 0x10FF20: linenoiseHistoryAdd (linenoise.c:1224) ==11922== by 0x110AF3: linenoiseHistoryLoad (linenoise.c:1325) ==11922== by 0x10C1B5: main (qtest.c:769) ==11922== ``` 我一開始的作法是在 main function 的結尾執行函式 `freeHistory` ([commit 330ba3](https://github.com/bakudr18/lab0-c/commit/330ba35d85c18c5e58ac6eebaae49c0df9fcb233))，但這個做法會導致重複釋放同一段記憶體。後來參考了 [akamayu-ouo](https://hackmd.io/@akamayu-ouo/sysprog21-lab0#Valgrind) 同學的作法，分析以下函式 ```c static int enableRawMode(int fd) { struct termios raw; if (!isatty(STDIN_FILENO)) goto fatal; if (!atexit_registered) { atexit(linenoiseAtExit); atexit_registered = 1; } ... } ``` `isatty` 可以判斷 file descripter 是否來自 `stdin` ，而 `atexit` 則是當程式碼正常結束時(透過 `main return` 或 `exit(3)` )，會去執行註冊的 callback function 因此，當以讀檔作為輸入時，就不需要存 `.cmd_history` ，當然也就不需要釋放記憶體，這樣也能讓程式碼更為簡潔。 ### 錯誤2 當讀檔失敗或是輸入錯誤 `cmd` 次數超過 `err_limit` 時就會引發 memory leak，從 `add_cmd` 和 `add_param` 所配置的記憶體未被釋放。 ```bash= $ valgrind ./qtest -f dummy ERROR: Could not open source file 'dummy' ==12176== 32 bytes in 1 blocks are still reachable in loss record 1 of 24 ==12176== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==12176== by 0x10C869: malloc_or_fail (report.c:189) ==12176== by 0x10D2F3: add_cmd (console.c:124) ==12176== by 0x10D3DC: init_cmd (console.c:95) ==12176== by 0x10C035: main (qtest.c:762) ==12176== ``` 解決方法是無論 `run_console` 執行結果如何，都必須執行 `do_quit_cmd` 來釋放記憶體。 ```diff= static void record_error() { err_cnt++; if (err_cnt >= err_limit) { report(1, "Error limit exceeded. Stopping command execution"); - quit_flag = true; + do_quit_cmd(0, NULL); } } int main(int argc, char *argv[]) { ... bool ok = true; ok = ok && run_console(infile_name); - ok = ok && finish_cmd(); + ok = finish_cmd() && ok; return ok ? 0 : 1; } ``` ## Tiny web server with coroutine ### 嘗試在 [tiny-web-server](https://github.com/7890/tiny-web-server) 加入 coroutine 首先參考 [Coroutines in one page of C](https://www.embeddedrelated.com/showarticle/455.php) 一文，分析以下函式 ```c= #define get_sp(p) asm volatile("movq %%rsp, %0" : "=r"(p)) #define get_fp(p) asm volatile("movq %%rbp, %0" : "=r"(p)) #define set_sp(p) asm volatile("movq %0, %%rsp" : : "r"(p)) #define set_fp(p) asm volatile("movq %0, %%rbp" : : "r"(p)) enum { FRAME_SZ = 5 }; // fairly arbitrary void start(coroutine *c, func f, void *arg, void *sp) { start_params *p = ((start_params *) sp) - 1; // save params before stack switching p->c = c; p->f = f; p->arg = arg; get_sp(p->old_sp); get_fp(p->old_fp); set_sp(p - FRAME_SZ); set_fp(p); // effectively clobbers p //(and all other locals) get_fp(p); // we read p back from $fp // and now we read our params from p if (!setjmp(p->c->callee_context)) { set_sp(p->old_sp); set_fp(p->old_fp); return; } (*p->f)(p->arg); longjmp(p->c->caller_context, DONE); } ``` `sp` 指向自己預先宣告的空間來存放目前執行 function 的 context ，這裡透過 x86_64 的暫存器 `rbp` 與 `rsp` 直接修改 frame pointer 和 stack pointer 的位置以保留目前的 stack context ，然後 return。 ```c= void yield(coroutine *c) { if (!setjmp(c->callee_context)) { longjmp(c->caller_context, WORKING); } } int next(coroutine *c) { int ret = setjmp(c->caller_context); if (!ret) { longjmp(c->callee_context, 1); } else { return ret == WORKING; } } ``` 接著呼叫 `next` 函式即可透過 `longjmp` 跳至 `(*p->f)(p->arg)` 去執行預先註冊的 `func f` 。若 `func f` 中呼叫到 `yield` 函式，會跳回函式 `next` 並回傳 `ret == WORKING` ，即可回到上層呼叫 `next` 函式的地方，等到下次再呼叫 `next` ，`func f` 便會從上次 `yield` 的地方繼續執行。因此，為了整合進 [tiny-web-server](https://github.com/7890/tiny-web-server) ，需先將 `listenfd` 改為 non-blocking socket，然後再修改 server 連線 client 的方式如下 ```diff= void acceptor(void *arg) { + listener *lnr = (listener *) arg; while (1) { int connfd = accept(lnr->fd, (SA *) &lnr->clientaddr, &lnr->clientlen); + if (connfd < 0) { + if (errno == EAGAIN) + printf("No connecting request, yield now\n"); + yield(lnr->c); + } else { process(connfd, &lnr->clientaddr); close(connfd); + } } } ``` ```c= #define N 1000 void wait_client(int fd, struct sockaddr_in addr, socklen_t len) { coroutine c; int stack[N]; listener lnr; lnr.c = &c; lnr.fd = fd; lnr.clientaddr = addr; lnr.clientlen = len; start(&c, &acceptor, &lnr, stack + N - 1); while (next(&c)) { printf("No client, do other things\n"); } } ``` 接著寫個簡單的測試，可得到以下輸出，驗證 coroutine 方法正確。 ```bash= No connecting request, yield now no client, do other things No connecting request, yield now no client, do other things No connecting request, yield now no client, do other things No connecting request, yield now no client, do other things No connecting request, yield now no client, do other things ... ``` 然而，目前做法有兩個很大的缺點。 * `get_sp` , `get_fp` , `set_sp` , `set_fp` 只能在 x86_64 架構下執行。 * 一但開啟編譯器優化，就會導致以下錯誤，因為無法保證執行順序 (例如 `wait_client` 的第 13 行 `while (next(&c))` 可能在第 8 行 `lnr.fd = fd;` 之前就優先被執行，因為編譯器判斷兩段程式碼沒有相依性。 ```bash *** longjmp causes uninitialized stack frame ***: terminated Segmentation fault (core dumped) ``` ### 待改進之處儲存 stack context 的方法有很多種，除了如上修改 stack pointer外，也有透過 [static variable](https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html) 、[variable length arrays or `alloca`](https://fanf.livejournal.com/105413.html)、或 [setcontext and getcontext](http://dotat.at/cgi/git/picoro.git) 等方式，但目前我還無法評估哪個方式適用於那些情況，我想這也是為什麼 `getcontext` 已經被 POSIX 棄用的原因之一吧，因為移植性問題很難做出一個通用形式。