# 2021q1 Homework1 (lab0) contributed by < `Edwardz43` > ## 開發環境 ```shell $ uname -a Linux edlo-Ryzen-3600 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux $ gcc --version gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ``` ## 作業要求 - [x] 依據指示著手修改 `queue.[ch]` 和連帶的檔案: - [X] `q_new` : Create empty queue - [X] `q_free` : Free ALL storage used by queue - [X] `q_insert_head` : Attempt to insert element at ==head== of queue(==LIFO==) - [X] `q_insert_tail` : Attempt to insert element at ==tail== of queue(==FIFO==) - [X] `q_remove_head` : Attempt to remove element from ==head== of queue - [X] `q_size` : Return number of elements in queue - [X] `q_reverse` : Reverse elements in queue - [X] `q_sort` : Sort elements of queue in ==ascending== order - [X] 開啟 [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer),修正 `qtest` 執行過程中的錯誤 - [X] 運用 `Valgrind` 排除 `qtest` 實作的記憶體錯誤,並透過 `Massif` 視覺化 “simulation” 過程中的記憶體使用量,需要設計對應的實驗 - [ ] 在 `qtest` 中實作 [coroutine](https://en.wikipedia.org/wiki/Coroutine),並提供新的命令 web,提供 web 伺服器功能 - [ ] 解釋 [select](https://man7.org/linux/man-pages/man2/select.2.html) 系統呼叫在本程式的使用方式,並分析 `console.c` 的實作,說明其中運用 CS:APP [RIO](http://csapp.cs.cmu.edu/2e/ch10-preview.pdf) 套件 的原理和考量點 - [ ] 說明 [antirez/linenoise](https://github.com/antirez/linenoise) 的運作原理,注意到 [termios](https://man7.org/linux/man-pages/man3/termios.3.html) 的運用 - [ ] 研讀論文 [Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf),解釋本程式的 “simulation” 模式是如何透過以實驗而非理論分析,達到驗證時間複雜度,需要解釋 [Student’s t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution) 及程式實作的原理 ## 實作 ### ==queue.h== 新增兩個成員: * `tail` : 最後一個元素的索引,之後在 `q_insert_tail` 會需要 * `size` : 佇列的容量, `q_size` 會用到 ```c= typedef struct { ... int tail; /* index of the last element */ int size; /* number of elements */ } queue_t ``` ### ==q_new== 初始化成員,若 allocate 失敗,回傳 NULL ```c= queue_t *q_new() { queue_t *q = malloc(sizeof(queue_t)); if (!q || sizeof(queue_t) == 0) return NULL; q->head = q->tail = NULL; q->size = 0; return q; } ``` ### ==q_free== 先逐一尋訪每個節點,釋放其內容以及節點本身,最後在釋放 `queue` ```c= void q_free(queue_t *q) { if (!q) { return; } for (list_ele_t *cur = q->head; cur;) { list_ele_t *tmp = cur; cur = cur->next; free(tmp->value); free(tmp); } free(q); } ``` ### ==q_insert_head== 在分配 `value of element` 時需要注意,為了儲存長度 `l` 的字串,需要至少 `l + 1` 個 char 元素,並在最後設定為 `\0` 字元 ```c= ... newh->value = malloc(sizeof(char) * strlen(s) + 1); ... ``` 在 allocate `newh->value` 的過程中若失敗,要記得一併釋放之前已經 allocate 成功的部份 ```c= ... if (!newh->value) { free(newh->value); free(newh); return false; } ... ``` 首次 insert 記得將 `element` 同時指定為 `tail` ```c= ... if (q->size == 0) q->tail = newh; ... } ``` ### ==q_insert_tail== 和 `q_insert_head` 類似的流程,差別在若是首次輸入,記得同時指派 `queueu->head` ```c= ... if (q->size == 0) { q->head = newt; } ... ``` :::danger 這邊犯了一個錯,在新增元素後,忘記將 `tail->next` 初始化為 `NULL`,導致測試過程會誤判佇列的長度 ::: ### ==q_remove_head== 根據題目要求,在移除元素時,須一並將移除的內容複製到 `sp` 引數中,過程中需要檢查 `buffer size` 的大小,指派適當的複製長度 ```c= ... size_t cpySize = bufsize > strlen(q->head->value) ? strlen(q->head->value) : bufsize - 1; strncpy(sp, q->head->value, cpySize) sp[cpySize] = '\0'; ... ``` 若移除的元素為最後一個,記得處理 `tail` ```c= ... if (!q->head) q->tail = NULL; ... ``` ### ==q_size== 若佇列存在就回傳 `size` ,否則回傳 0 ### ==q_reverse== 因應題目要求,不做額外的記憶體配置,純粹利用改變指標的標記來實做 ```graphviz digraph ER{ node[shape=box]; previous; current; next; {rank=same;previous,current,next} previous->current[dir="forward",arrowhead="normal",arrowtail="normal"]; current->next[dir="forward",arrowhead="normal",arrowtail="normal"]; } ``` 首先宣告三個指標:前一個 (`previous`) 、 當前 (`current`) 、 下一個 (`next`) ```graphviz digraph ER{ node[shape=box]; previous; current; next; {rank=same;previous,current,next} previous->current[dir="forward",arrowhead="crow",arrowtail="normal"]; } ``` 每次操作就是把 `current` 的 `next` 指向 `previous` , 然後將三個指標依序向後位移,繼續下一次的操作,直到 `next` 為 `NULL` ```c= void q_reverse(queue_t *q) { if (!q || q->size == 0 || q->size == 1) return; list_ele_t *pre, *cur, *next; pre = NULL; cur = q->head; next = q->head->next; while (next) { cur->next = pre; pre = cur; cur = next; next = next->next; } cur->next = pre; q->head = cur; } ``` ### ==q_sort== 一開始用的是 `Bubble Sort`,雖然功能正常,但是[效能](https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_sorts)太差,測試過不了 ```c= if (!q || q->size == 0 || q->size == 1) return; list_ele_t *pre, *cur; for (int i = q->size; i > 0; i--) { pre = q->head; cur = q->head; for (int j = 0; j < i - 1 && cur->next; j++) { if (strcmp(cur->value, cur->next->value) > 0) { list_ele_t *tmp = cur->next; cur->next = tmp->next; tmp->next = cur; if (cur == q->head) { q->head = tmp; pre = tmp; } else { pre->next = tmp; pre = pre->next; } } else { cur = cur->next; if (j != 0) pre = pre->next; } } } list_ele_t *tmph = q->head; while (tmph->next) tmph = tmph->next; q->tail = tmph; ``` :::info 改用平均表現較好的 `Merge Sort` 來實作 ::: ```c= static void merge_sort(list_ele_t **headRef) { list_ele_t *head, *slow, *fast; head = *headRef; if (head && head->next) { front_back_split(head, &slow, &fast); merge_sort(&slow); merge_sort(&fast); *headRef = sorted_merge(slow, fast); } } ``` 將串列透過遞迴不斷分割,分割的過程是採取龜兔賽跑的方式,利用 `slow` 與 `fast` 兩個指標來切割 ```c= static void front_back_split(list_ele_t *source, list_ele_t **frontRef, list_ele_t **backRef) { list_ele_t *slow = source; list_ele_t *fast = source->next; while (fast) { fast = fast->next; if (fast) { slow = slow->next; fast = fast->next; } } *frontRef = source; *backRef = slow->next; slow->next = NULL; } ``` 之後在利用 `strcmp` 來將 list 排序 ```c=186 static list_ele_t *sorted_merge(list_ele_t *l1, list_ele_t *l2) { list_ele_t *result; if (!l1) return l2; if (!l2) return l1; if (strcmp(l1->value, l2->value) < 0) { result = l1; result->next = sorted_merge(l1->next, l2); } else { result = l2; result->next = sorted_merge(l1, l2->next); } return result; } ``` ## 除錯與改進 改成 `merge sort` 之後,雖然效能比之前的 `bubble sort` 好,但是在 `trace-15-perf.cmd` 會發生 `segmentation fault` 研究了一下,發現在處理數量大到一定規模的元素時,就會觸發 ```shell= cmd> new q = [] cmd> ih abc 1000000 q = [abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc ... ] cmd> ih def 1000000 q = [def def def def def def def def def def def def def def def def def def def def def def def def def def def def def def ... ] cmd> sort [1] 78426 segmentation fault (core dumped) ./qtest ``` 利用 `scripts/debug.py` 除錯,印出 `core dump` ```shell= Program terminated with signal SIGSEGV, Segmentation fault. #0 0x000055555555a222 in sorted_merge (l1=l1@entry=0x5555592711c0, l2=0x55555af7f540) at queue.c:187 187 { #0 0x000055555555a222 in sorted_merge (l1=l1@entry=0x5555592711c0, l2=0x55555af7f540) at queue.c:187 #1 0x000055555555a252 in sorted_merge (l1=l1@entry=0x5555592711c0, l2=0x55555af7f5c0) at queue.c:198 #2 0x000055555555a252 in sorted_merge (l1=l1@entry=0x5555592711c0, l2=0x55555af7f640) at queue.c:198 ... ``` 在執行 `sorted_merge` 的遞迴呼叫時,出了狀況,目前尚在釐清原因 :::info ~~研究 `gdb` 中~~ 重新閱讀作業需求,修正方向為利用下列工具幫助除錯 ::: >* 運用 Valgrind 排除 qtest 實作的記憶體錯誤,並透過 Massif 視覺化 “simulation” 過程中的記憶體使用量,需要設計對應的實驗 透過 `make valgrind` 進行測試,發現問題出在持續的遞迴造成 stack overflow ```shell= +++ TESTING trace trace-15-perf: # Test performance of insert_tail, size, reverse, and sort ==115294== Stack overflow in thread #1: can't grow stack to 0x1ffe801000 ==115294== Stack overflow in thread #1: can't grow stack to 0x1ffe801000 ==115294== Can't extend stack to 0x1ffe8010a8 during signal delivery for thread 1: ==115294== no stack segment ==115294== ==115294== Process terminating with default action of signal 11 (SIGSEGV) ==115294== Access not within mapped region at address 0x1FFE8010A8 ==115294== Stack overflow in thread #1: can't grow stack to 0x1ffe801000 ==115294== at 0x10E222: sorted_merge (queue.c:187) ==115294== If you believe this happened as a result of a stack ==115294== overflow in your program's main thread (unlikely but ==115294== possible), you can try to increase the size of the ==115294== main thread stack using the --main-stacksize= flag. The main thread stack size used in this run was 8388608. ... ``` 原來 valgrind 在運行時預設的 `main stack size` 是 8kb,可以透過`.valgrindrc` 中的 `--main-stacksize` 修改 加到32kb,再跑一遍測試 ```shell= ... ==120727== by 0x4A5950E: strdup (strdup.c:42) ==120727== by 0x11007D: linenoiseHistoryAdd (linenoise.c:1236) ==120727== by 0x110C9C: linenoiseHistoryLoad (linenoise.c:1325) ==120727== by 0x10C234: main (qtest.c:769) ==120727== --- trace-15-perf 6/6 +++ TESTING trace trace-16-perf: ... ``` 過關了...,不過新的問題出現了,演算法明顯有缺點,佔用過多記憶體,勢必要優化 --- ### Valgrind with Massif 利用 `valgrind` 搭配 `Massif` 幫排序演算法做健檢 ```shell= $ valgrind --tool=massif ./qtest ... ==129972== possible), you can try to increase the size of the ==129972== main thread stack using the --main-stacksize= flag. ==129972== The main thread stack size used in this run was 8388608. [1] 129972 segmentation fault (core dumped) valgrind --tool=massif ./qtest $ ms_print ./massif.out.129972 -------------------------------------------------------------------------------- Command: ./qtest Massif arguments: (none) ms_print arguments: ./massif.out.129972 -------------------------------------------------------------------------------- MB 181.5^ # | ::::::# | ::@:::: # | :::::@:::: # | :::@:::::@:::: # | ::: :@:::::@:::: # | ::::::: :@:::::@:::: # | :::::::::: :@:::::@:::: # | ::::::::::::: :@:::::@:::: # | ::::: ::::::::::: :@:::::@:::: # | @:::: : : ::::::::::: :@:::::@:::: # | ::::@:: : : : ::::::::::: :@:::::@:::: # | ::@: ::@:: : : : ::::::::::: :@:::::@:::: # | @::::@: ::@:: : : : ::::::::::: :@:::::@:::: # | ::::@::::@: ::@:: : : : ::::::::::: :@:::::@:::: # | :::::: @::::@: ::@:: : : : ::::::::::: :@:::::@:::: # | :::::: ::: @::::@: ::@:: : : : ::::::::::: :@:::::@:::: # | @::: : :: ::: @::::@: ::@:: : : : ::::::::::: :@:::::@:::: # | :@@@@: : : :: ::: @::::@: ::@:: : : : ::::::::::: :@:::::@:::: # | :::::@@ @: : : :: ::: @::::@: ::@:: : : : ::::::::::: :@:::::@:::: # 0 +----------------------------------------------------------------------->Mi 0 697.1 Number of snapshots: 59 Detailed snapshots: [7, 8, 9, 18, 23, 27, 46, 52, 58 (peak)] ... ``` 用`ms_print` 可以在 commandline 將 `Massif` 的資訊簡易的視覺化,但是結果很陽春 參考了以前同學做的筆記,猜想可能的 bottleneck 是過多的遞迴造成記憶體耗盡,~~或是 `strcmp` function call 造成的效能問題?~~ :::info 減少或捨棄遞迴的做法來降低記憶體開銷 ::: 將原先的 `sorted_merge` function inline 回 `merge_sort` ,減少遞迴的呼叫與回傳,改以直接操作 pointer to pointer 替代 ```c= static void merge_sort(list_ele_t **headRef) { ... *headRef = NULL; list_ele_t **tmp = headRef; while (slow && fast) { if (strcmp(slow->value, fast->value) < 0) { *tmp = slow; slow = slow->next; } else { *tmp = fast; fast = fast->next; } tmp = &(*tmp)->next; } *tmp = slow ? slow : fast; ... } ``` --- ### [Massif-Visualizer](https://apps.kde.org/en/massif-visualizer) — 資料視覺化 ```shell= $ valgrind --tool=massif --heap=no --stacks=yes -v ./qtest ``` 因為主要是測試 stack overflow , 所以關閉 heap ##### test case: ```shell= cmd> new cmd> it abc 500000 cmd> ih def 500000 cmd> reverse cmd> sort cmd> free ``` ![遞迴版本](https://i.imgur.com/zBHqXIK.png) #### 遞迴版本 ![修改版](https://i.imgur.com/vy50Cq5.png) #### 修改版 可以看出修改前後的記憶體消費相差近千倍,相當恐怖的差距 遞迴版本的記憶體消費約在 ==1 MB==,峰值跑到 ==7.6 MB==,修改後平均花費在 ==2~3 kB==,峰值只有 ==7.5 kB== >一開始沒仔細看圖例的單位,以為修改過效能更差... ## Address Sanitizer Debug and Fix 執行以下操作: ```shell= $ make SANITIZER=1 ... $ cmd> help ``` 出現錯誤: ```shell= ================================================================= ==21080==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000104886260 at pc 0x00010486dd6f bp 0x7ffeeb397220 sp 0x7ffeeb397218 READ of size 4 at 0x000104886260 thread T0 #0 0x10486dd6e in do_help_cmd console.c:307 #1 0x10486f891 in interpret_cmda console.c:221 #2 0x10486efb6 in interpret_cmd console.c:244 #3 0x10486f5c7 in run_console console.c:660 #4 0x10486a735 in main qtest.c:780 #5 0x7fff67bdfcc8 in start+0x0 (libdyld.dylib:x86_64+0x1acc8) 0x000104886261 is located 0 bytes to the right of global variable 'echo' defined in 'console.c:59:13' (0x104886260) of size 1 SUMMARY: AddressSanitizer: global-buffer-overflow console.c:307 in do_help_cmd Shadow bytes around the buggy address: 0x100020910bf0: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 0x100020910c00: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 0x100020910c10: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 0x100020910c20: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 0x100020910c30: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 =>0x100020910c40: f9 f9 f9 f9 01 f9 f9 f9 f9 f9 f9 f9[01]f9 f9 f9 0x100020910c50: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 0x100020910c60: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 0x100020910c70: 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 0x100020910c80: f9 f9 f9 f9 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 0x100020910c90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==21080==ABORTING [1] 21080 abort ./qtest ``` 由訊息得知,錯誤的位置在 `console.c` ```c=307 while (plist) { report(1, "\t%s\t%d\t%s", plist->name, *plist->valp, plist->documentation); plist = plist->next; } ``` 繼續往下追,發現問題出在 `PELE` 的 `valp` 型態為 pointer to an intege ```c= struct PELE { char *name; int *valp; char *documentation; /* Function that gets called whenever parameter changes */ setter_function setter; param_ptr next; }; ``` 但是 `echo` 這個 parameter 的宣告型態為 `bool` , 在 `add_param` 時被轉型為 pointer to integer ```c=108 add_param("echo", (int *) &echo, "Do/don't echo commands", NULL); ``` 當 1 byte 的 bool 轉型為 ~~8~~ 4 byte 的 int,結果會如何? ```c= #include <stdio.h> #include <stdbool.h> void test() { bool a = true; printf(" %x\n", a); printf(" %x\n", *(int *)&a); } int main(void) { test(); return 0; } ``` 根據執行的結果,每次印出的值都不一樣 ```shell= > ./test 1 6488001 > ./test 1 e5088001 > ./test 1 9a988001 ``` 相同的問題也發生在另一個參數 `simulation`,比較簡的解決方法就是把型別改為 int ,不過 這也意味著會浪費額外的記憶體空間,所以思考方向轉為:利用類似 ==泛型(generic)== 的方法,將 `add_param` 改寫為接受不定型別的參數,在 function 內部針對不同型別處理 ```c= void add_param(char *name, void *valp, char *documentation, setter_function setter) ``` 但是泛型寫法常都是由外部將型別傳入,對現行的程式碼來說,或許需要大量的改動(?) >TODO: 研究中 :::info C99 §6. 3. 1. 1 Boolean Characters And Integers ::: >6.3.1.1 Boolean, characters, and integers 1 Every integer type has an integer conversion rank defined as follows: — No two signed integer types shall have the same rank, even if they hav e the same representation. — The rank of a signed integer type shall be greater than the rank of any signed integer type with less precision. — The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char. — The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type, if any. — The rank of any standard integer type shall be greater than the rank of any extended integer type with the same width. — The rank of char shall equal the rank of signed char and unsigned char. — The rank of _Bool shall be less than the rank of all other standard integer types. — The rank of any enumerated type shall equal the rank of the compatible integer type (see 6.7.2.2). — The rank of any extended signed integer type relative to another extended signed integer type with the same precision is implementation-defined, but still subject to the other rules for determining the integer conversion rank. — For all integer types T1, T2, and T3, if T1 has greater rank than T2 and T2 has greater rank than T3, then T1 has greater rank than T3. --- ## Coroutine Implemetation in qtest :::info [Coroutines in one page of C](https://www.embeddedrelated.com/showarticle/455.php) 研讀中 ::: ###### tags: `week1` `sysprog2021q1`