2022q1 Homework1 (lab0)

--- tags: linux2022 --- # 2022q1 Homework1 (lab0) contributed by < `KatherineHuangg` > ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 39 bits physical, 48 bits virtual CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 142 Model name: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz Stepping: 10 CPU MHz: 1800.000 CPU max MHz: 3400.0000 CPU min MHz: 400.0000 BogoMIPS: 3600.00 Virtualization: VT-x L1d cache: 128 KiB L1i cache: 128 KiB L2 cache: 1 MiB L3 cache: 6 MiB NUMA node0 CPU(s): 0-7 ``` ## 作業要求 * `q_new`: 建立新的「空」佇列; * `q_free`: 釋放佇列所佔用的記憶體; * `q_insert_head`: 在佇列開頭 (head) 插入 (insert) 給定的新節點 (以 LIFO 準則); * `q_insert_tail`: 在佇列尾端 (tail) 插入 (insert) 給定的新節點 (以 FIFO 準則); * `q_remove_head`: 在佇列開頭 (head) 移去 (remove) 給定的節點; * `q_release_element`: 釋放特定節點的記憶體空間; * `q_size`: 計算目前佇列的節點總量; * `q_delete_mid`: 移走佇列中間的節點，詳見 [LeetCode 2095. Delete the Middle Node of a Linked List](https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/) * `q_delete_dup`: 在已經排序的狀況，移走佇列中具備重複內容的節點，詳見 [LeetCode 82. Remove Duplicates from Sorted List II](https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/) * `q_swap`: 交換佇列中鄰近的節點，詳見 [LeetCode 24. Swap Nodes in Pairs](https://leetcode.com/problems/swap-nodes-in-pairs/) * `q_reverse`: 以反向順序重新排列鏈結串列，該函式不該配置或釋放任何鏈結串列節點，換言之，它只能重排已經存在的節點; * `q_sort`: 以==遞增順序==來排序鏈結串列的節點，可參閱 [Linked List Sort](https://npes87184.github.io/2015-09-12-linkedListSort/) 得知實作手法; ## 開發過程 ### q_new: * 建立新的「空」佇列; ```c struct list_head *q_new() { struct list_head *q = malloc(sizeof(struct list_head)); if (q == NULL) return NULL; INIT_LIST_HEAD(q); return q; } ``` ### q_free: * 釋放佇列所佔用的記憶體; * 依序釋放掉 list 第一個 entry，但在`list.h`中，對`list_del()`的說明為： `The node is only removed from the list. Neither the memory of the removed node nor the memory of the entry containing the node is free'd.`，所以需要再利用 q_release_element 來刪除在 memory 裡面的位置。 :::danger 注意共筆書寫規範：中英文間用一個空白字元區隔 :notes: jserv ::: :::info 已修改 ::: ```c void q_free(struct list_head *l) { if (l == NULL) return; while (list_empty(l) == 0) { element_t *target = list_first_entry(l, element_t, list); list_del(l->next); q_release_element(target); } free(l); } ``` ### q_insert_head: * 在佇列開頭 (head) 插入 (insert) 給定的新節點 (以 LIFO 準則); ```c bool q_insert_head(struct list_head *head, char *s) { if (head == NULL) return false; element_t *newnode = malloc(sizeof(element_t)); if (newnode == NULL) return false; int length = strlen(s) + 1; newnode->value = malloc(sizeof(char) * (length)); if (newnode->value == NULL) { // Cannot allocate space for value; free(newnode); return false; } strncpy(newnode->value, s, length); list_add(&newnode->list, head); // why &?? return true; } ``` ### q_insert_tail: * 在佇列尾端 (tail) 插入 (insert) 給定的新節點 (以 FIFO 準則); ```c bool q_insert_tail(struct list_head *head, char *s) { if (head == NULL) return false; element_t *newnode = malloc(sizeof(element_t)); if (newnode == NULL) return false; int length = strlen(s) + 1; newnode->value = malloc(sizeof(char) * (length)); if (newnode->value == NULL) { // Cannot allocate space for value; free(newnode); return false; } strncpy(newnode->value, s, length); list_add_tail(&newnode->list, head); // why &?? return true; } ``` ### q_remove_head: * 在佇列開頭 (head) 移去 (remove) 給定的節點; * 原版本： ```c element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize) { if (head == NULL || list_empty(head) || sp == NULL) return NULL; element_t *target = list_first_entry(head, element_t, list); list_del(head->next); int length = strlen(target->value); if (length <= bufsize - 1) { strncpy(sp, target->value, length); sp[length] = '\0'; } else { strncpy(sp, target->value, bufsize - 1); sp[bufsize - 1] = '\0'; } return target; } ``` * ==後來發現中間有類似的code，於是決定將8~15行改寫為：== ```c int length = strlen(target->value) <= bufsize - 1 ? strlen(target->value) : bufsize - 1; strncpy(sp, target->value, length); sp[length] = '\0'; ``` :::warning 不確定可讀性是否降低 > `strlen(target->value)` 不需要反覆執行 (evaluate)，可改為更簡潔的程式碼 > :notes: jserv ::: :::info 已修改，感謝教授提點 ::: * 經由教授的提點後將 `strlen(target->value)` 提出來獨立成一個變數，但是當我把所有的程式碼寫完後發現 `trace-17-complexity` 一直過不了，且錯誤訊息為當執行 `q_remove_tail()` 時會報： **`ERROR: Probably not constant time`** ，於是我回頭來看，認為造成這個的原因或許是因為使用了 `strlen()` ，他的 time complexity 為 O(n) * * 程式碼有些冗長，於是==最終==修改為以下版本： ```c element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize) { if (head == NULL || list_empty(head) || sp == NULL) return NULL; element_t *target = list_first_entry(head, element_t, list); list_del(head->next); strncpy(sp, target->value, bufsize); sp[bufsize - 1] = '\0'; return target; } ``` ### q_remove_tail: ```c element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize) { if (head == NULL || list_empty(head) || sp == NULL) return NULL; element_t *target = list_last_entry(head, element_t, list); list_del(head->prev); strncpy(sp, target->value, bufsize); sp[bufsize - 1] = '\0'; return target; } ``` ### q_size: * 計算目前佇列的節點總量; ```c int q_size(struct list_head *head) { if (head == NULL || list_empty(head)) return 0; unsigned int count = 0; struct list_head *node = NULL; list_for_each (node, head) count++; return count; } ``` ### q_delete_mid: * 移走佇列中間的節點 * fast 一次跑兩個 node , slow 一次只跑一個 node ，所以當 fast 到達最後一個 node 或是回到 head 時，表示 slow 已經到達整個 list 的中間了，此時就可以 delete 掉 slow 所處的 entry 。 ```c bool q_delete_mid(struct list_head *head) { // https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/ if (head == NULL || list_empty(head)) return false; struct list_head *slow = NULL, *fast = NULL; for (slow = fast = head->next; fast != head && fast->next != head; slow = slow->next, fast = fast->next->next) { } list_del(slow); q_release_element(list_entry(slow, element_t, list)); return true; } ``` ### q_delete_dup: * 在已經排序的狀況，移走佇列中具備重複內容的節點，詳見 [LeetCode 82. Remove Duplicates from Sorted List II](https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/) ```c bool q_delete_dup(struct list_head *head) { // https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/ if (head == NULL) return false; element_t *entry = NULL, *safe = NULL; char *pre_value = ""; list_for_each_entry_safe (entry, safe, head, list) { if (strcmp(pre_value, entry->value) == 0) { list_del(&entry->list); q_release_element(entry); } else { pre_value = entry->value; } } return true; } ``` :::warning * 原本剛寫完的時候覺得過了就 ok ，但當我回來要寫這段 code 的說明時，發現了我的作法會殘留一個 entry 在 list 裡面且竟然不會報錯！？於是我回頭去看 `qtest.c` ，發現在`static bool do_dedup()`程式碼是有錯誤的，510~524 行的 code 似乎在不全刪的情況下依然能通過。 * 雖然沒報錯但不符合題意，於是我將我的程式碼修改為以下： ::: * 新增了 `last_dup` 判斷是否有 duplicate 的存在，若有則將 `same` 所指向的 node 從 list 中 delete 掉(因為 same 會指向第一個 duplicate entry ) ```c bool q_delete_dup(struct list_head *head) { // https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/ if (head == NULL) return false; element_t *entry = NULL, *safe = NULL; char *pre_value = ""; struct list_head *same = NULL; bool last_dup = false; list_for_each_entry_safe (entry, safe, head, list) { if (strcmp(pre_value, entry->value) == 0) { last_dup = true; list_del(&entry->list); q_release_element(entry); } else { if(last_dup == true) { list_del(same); q_release_element(list_entry(same, element_t, list)); last_dup = false; } pre_value = entry->value; same = &entry->list; } } return true; } ``` ### q_swap: * 交換佇列中鄰近的節點，詳見[LeetCode 24. Swap Nodes in Pairs](https://leetcode.com/problems/swap-nodes-in-pairs/) * `list_move_tail` 的作法是合併 `list_del(tmp)` 和 `list_add_tail(tmp, node)` ，經由查看 `list.h` 可發現： `list_add_tail()` 是透過 *prev 將 entry 插入 list 最後面，而 `list_del()` 將 entry 從 list 移除不會將該 entry 從記憶體 free掉，因此可以藉由此方法將 `node->next` 移到 node 前面 ```c void q_swap(struct list_head *head) { // https://leetcode.com/problems/swap-nodes-in-pairs/ if (head == NULL) return; struct list_head *node = NULL; for (node = head->next; node != head && node->next != head; node = node->next) { struct list_head *tmp = node->next; list_move_tail(tmp, node); } } ``` ### q_reverse: * 以反向順序重新排列鏈結串列，該函式不該配置或釋放任何鏈結串列節點，換言之，它只能重排已經存在的節點; * `list_for_each_safe` 可以保證依舊可以讀到下一個 entry ，利用此特性將 node 的 `*prev` 和 `*next`內容透過 `*tmp` 做交換 ```c void q_reverse(struct list_head *head) { if (head == NULL || list_empty(head)) return; struct list_head *node = NULL, *safe = NULL, *tmp = NULL; list_for_each_safe (node, safe, head) { tmp = node->next; node->next = node->prev; node->prev = tmp; } tmp = head->next; head->next = head->prev; head->prev = tmp; } ``` ### q_sort * 以==遞增順序==來排序鏈結串列的節點，可參閱 [Linked List Sort](https://npes87184.github.io/2015-09-12-linkedListSort/) 得知實作手法; * 將 merge sort 的實做分成 `merge()` 、`mergeSortList` 以及 `q_sort ` * **`q_sort()`:** * 首先先將 doubly linked list 利用 `head->prev->next = NULL` 的方式變成 singly linked list，然後再呼叫 `mergeSortList` 進行 merge sort 。 * 以 singly linked list 排序完的 node->prev 都會亂掉，所以最後再將他們整理回來。 * **`mergeSortList()`:** * 利用跟 `q_delete_mid` 一樣的手法找到中間的 node 後，將 list 拆成 `l1` 、`l2` ，然後再呼叫 `merge()` 合併 `l1` 、`l2` * **`merge()`:** * `merge() 初始作法：利用 recursive 完成，但未通過 performance 的測試` :::spoiler 初始 merge() 的程式碼： ```c struct list_head *merge(struct list_head *list1, struct list_head *list2) { // merge with recursive if (list1 == NULL) return list2; if (list2 == NULL) return list1; element_t *list1_entry = list_entry(list1, element_t, list); element_t *list2_entry = list_entry(list2, element_t, list); if (strcmp(list1_entry->value, list2_entry->value) < 0) { list1->next = merge(list1->next, list2); return list1; } else { list2->next = merge(list1, list2->next); return list2; } } ``` ::: * 後來參考[你所不知道的 C 語言: linked list 和非連續記憶體：案例探討: LeetCode 21. Merge Two Sorted Lists](https://hackmd.io/@sysprog/c-linked-list?fbclid=IwAR2ta34m_jdXBIsegt04JYa5iW1bVpMe_JO7d7vxSuE0qrxD5DUzRkeroQs#%E6%A1%88%E4%BE%8B%E6%8E%A2%E8%A8%8E-LeetCode-21-Merge-Two-Sorted-Lists) 的 pointer to pointer 作法，透過 `*ptr` 的存取去修改 `head` 的內容，最後回傳的 `head` 即為合併後的 linked list ```c struct list_head *merge(struct list_head *list1, struct list_head *list2) { struct list_head *head = NULL; struct list_head **ptr = &head; for (; list1 && list2; ptr = &(*ptr)->next) { element_t *list1_entry = list_entry(list1, element_t, list); element_t *list2_entry = list_entry(list2, element_t, list); if (strcmp(list1_entry->value, list2_entry->value) <= 0) { *ptr = list1; list1 = list1->next; } else { *ptr = list2; list2 = list2->next; } } if(list1 == NULL) *ptr = list2; if(list2 == NULL) *ptr = list1; return head; } struct list_head *mergeSortList(struct list_head *list) { if (!list || !list->next) return list; // split list struct list_head *fast = list->next; struct list_head *slow = list; while (fast != NULL && fast->next != NULL) { slow = slow->next; fast = fast->next->next; } fast = slow->next; // Right hand side slow->next = NULL; // Left hand side's tail // Sort each list struct list_head *l1 = mergeSortList(list); struct list_head *l2 = mergeSortList(fast); // merge sorted l1 and sorted l2 return merge(l1, l2); } void q_sort(struct list_head *head) { if (head == NULL || list_empty(head) || list_is_singular(head)) return; struct list_head *list = head->next; head->prev->next = NULL; head->next = mergeSortList(list); struct list_head *curr = head, *next = head->next; while (next != NULL) { next->prev = curr; curr = next; next = next->next; } curr->next = head; head->prev = curr; } ``` ### 在 qtest 提供新的命令 shuffle * 允許藉由 Fisher–Yates shuffle 演算法，對佇列中所有節點進行洗牌 (shuffle) 操作 * 為了了解如何在 qtest 新增新的指令 shuffle ，我參考了 [Risheng1128](https://hackmd.io/@Risheng/linux2022-lab0#%E5%9C%A8-qtest-%E6%8F%90%E4%BE%9B%E6%96%B0%E7%9A%84%E5%91%BD%E4%BB%A4-shuffle) 的步驟，分析 qtest 是如何從得到命令一直到使用 `console.c` 來添加以及執行命令的運作流程。而在後面該作者得出以下的結論： :::info 結論: 1. 一開始使用函式 `getopt()` 取得使用者對 qtest 的輸入 2. 接著在函式 `console_init()` 裡一路追蹤，得知了所有的命令一開始都會被儲存在一個 linked list 上 3. 執行命令一共有兩種模式，等待使用者輸入以及讀取檔案 4. 利用函式指標執行每種不同的指令 ::: * 因此可以得知 qtest 從執行、初始化命令到執行命令的過程，而以下是我的新增指令的實做步驟： 1. 在 `qtest.c` 中的 `console_init()` 新增 shuffle 命令 2. 仿照其他指令新增函式 `do_shuffle()` ，作為命令 `shuffle` 的執行函式 3. 根據 Fisher–Yates shuffle 演算法來實做 `q_shuffle()` **STEP 1.** 在 `qtest.c` 中的 `console_init()` 新增 shuffle 命令 ```c ADD_COMMAND(shuffle, " | shuffle the queue"); ``` **Step 2.** 仿照其他指令新增函式 `do_shuffle()` ，作為命令 `shuffle` 的執行函式 ```c static bool do_shuffle(int argc, char *argv[]) { if (argc != 1) { report(1, "%s takes no arguments", argv[0]); return false; } if (!l_meta.l) report(3, "Warning: Try to access null queue"); error_check(); set_noallocate_mode(true); if (exception_setup(true)) q_shuffle(l_meta.l); exception_cancel(); set_noallocate_mode(false); show_queue(3); return !error_check(); } ``` **Step 3.** 根據 Fisher–Yates shuffle 演算法來實做 `q_shuffle()` * 一開始實做的時候本想將 `q_shuffle()` 和其他的 q_* function 一樣定義在 `queue.h` 並實做在 `queue.c` ，但當我 `$ git commit -a` 時會因為無法修改 `queue.h` 而報錯，因此我將 `q_shuffle()` 放在 `qtest.c` 的 `do_shuffle()` 上面。 * 而這邊的操作方式是參考了[Fisher–Yates shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Fisher_and_Yates'_original_method)裡提到的 **Pencil-and-paper method** ，第 i 輪都從前 q_size - i 個數中隨機取第 num 個數，並將該數移到 list 的最後面，因此當離開迴圈的時候，第一個 node 就會是當初第一個移動的 node。 ```c void q_shuffle(struct list_head *head) { if (!head || list_empty(head)) return; srand(time(NULL)); int count = q_size(head); if (count == 1) return; for (; count > 0; count--) { struct list_head *curr = head->next; int num = rand() % count + 1; count--; for (int i = 1; i < num; i++) { curr = curr->next; } list_move_tail(curr, head); } return; } ``` ### 運用 Valgrind 排除 qtest 實作的記憶體錯誤輸入以下指令檢測由 command line 輸入的情形： ```shell $ valgrind -q --leak-check=full ./qtest ``` 而後在 `cmd>` 依序輸入和 `traces/trace-01-ops.cmd` 相同的指令並以 ctrl+c 終止，得到以下結果： ```shell cmd> new l = [] cmd> ih gerbil l = [gerbil] cmd> ih bear l = [bear gerbil] cmd> ih dolphin l = [dolphin bear gerbil] cmd> rh dolphin Removed dolphin from queue l = [bear gerbil] cmd> rh bear Removed bear from queue l = [gerbil] cmd> rh gerbil Removed gerbil from queue l = [] cmd> Freeing queue ``` 這樣看下來目前沒什麼問題，但若在 terminal 輸入： `$ valgrind -q --leak-check=full ./qtest -f traces/trace-01-ops.cmd` 讀取檔案 `trace-01-ops.cmd`： > 根據網路參考資源 [使用 VALGRIND 側寫記憶體使用量](https://access.redhat.com/documentation/zh-tw/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-valgrind) 對 `--leak-check` 的解釋： -> 啟用這選項時，memcheck 會在用戶端程式完成執行時，搜尋記憶體漏洞。預設值為 summary（摘要），會列出漏洞的數目。其它選項包括 yes（是）與 full（完整），兩者都會列出每個漏洞的詳細資料；而選項 no（否）會停用檢查記憶體漏洞的功能。 :::spoiler 執行結果 ```shell $ valgrind -q --leak-check=full ./qtest -f traces/trace-01-ops.cmd cmd> # Test of insert_head and remove_head cmd> option fail 0 cmd> option malloc 0 cmd> new l = [] cmd> ih gerbil l = [gerbil] cmd> ih bear l = [bear gerbil] cmd> ih dolphin l = [dolphin bear gerbil] cmd> rh dolphin Removed dolphin from queue l = [bear gerbil] cmd> rh bear Removed bear from queue l = [gerbil] cmd> rh gerbil Removed gerbil from queue l = [] cmd> Freeing queue ==154524== 8 bytes in 1 blocks are still reachable in loss record 1 of 3 ==154524== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==154524== by 0x4A7A50E: strdup (strdup.c:42) ==154524== by 0x110A44: linenoiseHistoryAdd (linenoise.c:1236) ==154524== by 0x1115D7: linenoiseHistoryLoad (linenoise.c:1325) ==154524== by 0x10C80B: main (qtest.c:998) ==154524== ==154524== 159 bytes in 19 blocks are still reachable in loss record 2 of 3 ==154524== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==154524== by 0x4A7A50E: strdup (strdup.c:42) ==154524== by 0x1109B8: linenoiseHistoryAdd (linenoise.c:1236) ==154524== by 0x1115D7: linenoiseHistoryLoad (linenoise.c:1325) ==154524== by 0x10C80B: main (qtest.c:998) ==154524== ==154524== 160 bytes in 1 blocks are still reachable in loss record 3 of 3 ==154524== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==154524== by 0x110A04: linenoiseHistoryAdd (linenoise.c:1224) ==154524== by 0x1115D7: linenoiseHistoryLoad (linenoise.c:1325) ==154524== by 0x10C80B: main (qtest.c:998) ==154524== ``` ::: ->就會出現 **still reachable** 的 memory leak： > 看了網路上的文章 [valgrind报告5种内存泄露的研究](https://blog.csdn.net/louObaichu/article/details/45507365) 以及 [Still Reachable Leak detected by Valgrind](https://stackoverflow.com/questions/3840582/still-reachable-leak-detected-by-valgrind) 了解，**still reachable** 是 memory leak 的其中一種，而 still reachable 官方解釋為： "still reachable"： means your program is probably ok -- it didn't free some memory it could have. This is quite common and often reasonable. > 若程序是正常結束的，那他可能不會造成程序崩潰，但長時間運行的話有可能會耗盡系統資源而後參考了 [RiSheng](https://hackmd.io/@Risheng/linux2022-lab0#%E9%81%8B%E7%94%A8-Valgrind-%E6%8E%92%E9%99%A4-qtest-%E5%AF%A6%E4%BD%9C%E7%9A%84%E8%A8%98%E6%86%B6%E9%AB%94%E9%8C%AF%E8%AA%A4) 的開發紀錄，跟著錯誤訊息找到在 `linenoise.c` 中的 `linenoiseHistoryAdd()` ，而錯誤訊息又分兩種： malloc 和 strdup ，因此我們可以先看以下的 `linenoiseHistoryAdd()` 程式碼： ```c=1 int linenoiseHistoryAdd(const char *line) { char *linecopy; if (history_max_len == 0) return 0; /* Initialization on first call. */ if (history == NULL) { history = malloc(sizeof(char *) * history_max_len); if (history == NULL) return 0; memset(history, 0, (sizeof(char *) * history_max_len)); } /* Don't add duplicated lines. */ if (history_len && !strcmp(history[history_len - 1], line)) return 0; /* Add an heap allocated copy of the line in the history. * If we reached the max length, remove the older line. */ linecopy = strdup(line); if (!linecopy) return 0; if (history_len == history_max_len) { free(history[0]); memmove(history, history + 1, sizeof(char *) * (history_max_len - 1)); history_len--; } history[history_len] = linecopy; history_len++; return 1; } ``` * 錯誤訊息1: malloc * 可以在第10行找到： ```c history = malloc(sizeof(char *) * history_max_len); ``` * 錯誤訊息2: strdup * 可以在第22行找到： ```c linecopy = strdup(line); ``` :::warning 未完待續 :::