--- tags: linux2022, linux --- # 2022q1 Homework1 (lab0) contributed by < `AmyLin0210` > > [作業要求](https://hackmd.io/@sysprog/linux2022-lab0) ## 基本實做 首先我們要先找到 linked list 的資料型態 在 `list.h` 裡面定義了 `struct list_head` ```c struct list_head { struct list_head *prev; struct list_head *next; }; ``` 在 `queue.h` 裡面定義了 `element_t` ```c typedef struct { char *value; struct list_head list; } element_t; ``` 在這邊需要使用到 [container_of](https://hackmd.io/@sysprog/linux-macro-containerof) 來拿 `element_t` 內的字串,在 `list.h` 中同樣可以看到取得 entry containing node 的指標的程式碼 ```cpp #define list_entry(node, type, member) container_of(node, type, member) ``` 在這裡要實作的 circular doubly-linked list 架構示意圖如下: ```graphviz digraph list_add_tail { nodesep=0.3 rankdir="LR" node[shape=record] head [ label = " {<p> prev | <n> next} " xlabel = "head" style="filled" fillcolor="lightblue" ] L1 [ label = " value | {<p> prev | <n> next} " xlabel = "L1" style="filled" fillcolor="pink" ] L2 [ label = " value | {<p> prev | <n> next} " xlabel = "L2" style="filled" fillcolor="pink" ] head:n->L1 head:p->L2 L1:n->L2 L1:p->head L2:n->head L2:p->L1 } ``` ### q_insert_head 由於在 `q_insert_head` 與 `q_insert_tail` 中同樣都有需要增加一個 element 的需求,因此把新增一個 element 這件事情獨立出來,實做一個 `new_element` 的函式。 ```c struct list_head *new_element(char *s) { element_t *tmp = malloc(sizeof(element_t)); if (!tmp) return NULL; tmp->list.prev = NULL; tmp->list.next = NULL; if (s) { tmp->value = strdup(s); if (!tmp->value) { free(tmp); return NULL; } } else { tmp->value = NULL; } return &(tmp->list); } ``` 以下是 `q_insert_head` 的實做。 ```c bool q_insert_head(struct list_head *head, char *s) { if (!head) return false; struct list_head *tmp = new_element(s); if (!tmp) return false; list_add(tmp, head); return true; } ``` ### q_insert_tail ```c bool q_insert_tail(struct list_head *head, char *s) { if (!head) return false; struct list_head *tmp = new_element(s); if (!tmp) return false; list_add_tail(tmp, head); return true; } ``` ### q_remove_head ```c element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize) { if (!head || list_empty(head)) return NULL; element_t *tmp = list_entry(head->next, element_t, list); if (sp) { strncpy(sp, tmp->value, bufsize); strcpy(sp + bufsize - 1, "\0"); } list_del(head->next); return tmp; } ``` ### q_remove_tail ```c element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize) { if (!head || list_empty(head)) return false; element_t *tmp = list_entry(head->prev, element_t, list); if (sp){ strncpy(sp, tmp->value, bufsize); strcpy(sp + bufsize - 1, "\0"); } list_del(head->prev); return tmp; } ``` ### q_size ```clike int q_size(struct list_head *head) { if (!head) return 0; int len = 0; struct list_head *li; list_for_each (li, head) len++; return len; } ``` ### q_delete_mid 在這邊使用到了快慢指標的技巧,每一次快指標( `fast` ) 往下走兩個位置,慢指標 ( `slow` )就往下走一個位置,當快指標走回 `head` 的時候,慢指標會正好走到整條 linked list 的中央。 ```c bool q_delete_mid(struct list_head *head) { if (!head || list_empty(head)) return false; struct list_head *slow = head->next; for (struct list_head *fast = head->next; fast != head && fast->next != head; fast = fast->next->next) { slow = slow->next; } list_del(slow); q_release_element(list_entry(slow, element_t, list)); return true; } ``` ### q_reverse 在這裡的 reverse 想法是,從頭開始迭代一條 linked list,每一次都將走訪到的節點挪置 head,當走訪完整條 linked list,那這條 linked list 就同時被 reverse 了。 示意圖: ``` 。 head -> node1 -> node2 -> node3 ``` ``` 。 head -> node1 -> node2 -> node3 // 將 node2 挪置最前方 head -> node2 -> node1 -> node3 ``` ``` 。 head -> node2 -> node1 -> node3 // 將 node3 挪置最前方,迭代完成 head -> node3 -> node2 -> node1 ``` 程式碼: ```c void q_reverse(struct list_head *head) { if (!head || list_empty(head)) return; struct list_head *iter, *next; list_for_each_safe(iter, next, head) { list_move(iter, head); } } ``` :::warning TODO: 一併討論針對 singly-linked list vs. doubly-linked list,這類 reverse 操作的實作考量。有些科技公司面試會出這樣的題目。 :notes: jserv ::: :::info 在 doubly-linked list 中,有許多作法都可以達成 reverse 這項操作,除了上述在遍歷節點後將節點挪至最前端外,也可以直接將節點中的 `prev` 與 `next` 做 swap,這樣的優點是可以減少一些變數的使用。 以下圖範例程式碼為例,我只需要儲存 `head` 與 `iter` 就能夠完成整個 reverse 的操作。 ```c void q_reverse(struct list_head *head) { if (!head || list_empty(head)) return; struct list_head *iter = head; do { iter->next = (void *)((uintptr_t)(iter->prev) ^ (uintptr_t)(iter->next)); iter->prev = (void *)((uintptr_t)(iter->prev) ^ (uintptr_t)(iter->next)); iter->next = (void *)((uintptr_t)(iter->prev) ^ (uintptr_t)(iter->next)); iter = iter->prev; } while (iter != head); } ``` 在上方程式碼不使用 `list_h` 內提供的 API 的原因是,`list_for_each_safe` 不會遍歷到 `head` 這個節點,但我希望可以將所有的 `prev` 與 `next` 做 swap (含 `head`) 而不要處理太多例外,因此使用 while 迴圈來實做。 而在 singly-linked list 中,想要 reverse 需要得到下面的資訊:正在處理的節點、節點的前一個節點、節點的後一個節點。而且也沒辦法像上面的 doubly-linked list 一樣使用對指標 swap 的方式處理,必然是將節點拆開重接。 下面是 singly-linked list 的測試程式碼 ```c struct node_s { int val; struct node_s *next; }; void reverse (struct node_s *head) { struct node_s *iter = head; struct node_s *prev = NULL; struct node_s *next = head->next; do { iter->next = prev; prev = iter; iter = next; next = next->next; } while (iter != head); head->next = prev; } ``` ::: ### q_sort 在這邊我使用的演算法為 merge sort ```c void q_sort(struct list_head *head) { if (!head || list_empty(head)) return; head->prev->next = NULL; struct list_head *sorted = mergesort(head->next); head->next = sorted; sorted->prev = head; while(sorted->next) { sorted = sorted->next; } sorted->next = head; head->prev = sorted; } struct list_head *mergesort(struct list_head *head) { if (!head->next) { return head; } struct list_head *mid = head; for (struct list_head *fast = head->next; fast && fast->next; fast = fast->next->next) { mid = mid->next; } struct list_head *right = mergesort(mid->next); mid->next = NULL; struct list_head *left = mergesort(head); head = merge(left, right); return head; } struct list_head *merge(struct list_head *l1, struct list_head *l2) { struct list_head *head = l1; struct list_head *prev = NULL; struct list_head **ptr = &head; struct list_head **node; for (node = NULL; l1 && l2; *node = (*node)->next) { node = (strcmp(list_entry(l1, element_t, list)->value, list_entry(l2, element_t, list)->value) < 0) ? &l1 : &l2; (*node)->prev = prev; prev = *node; *ptr = *node; ptr = &(*ptr)->next; } *ptr = (l1)? l1 : l2; (*ptr)->prev = prev; return head; } ``` :::warning TODO: 避免使用遞迴呼叫,並從 Linux 核心原始程式碼的 list_sort.c 探討後續改進。 :notes: jserv ::: ### q_swap ```c void q_swap(struct list_head *head) { if (!head || list_empty(head)) return; struct list_head *first = head->next, *second = head->next->next; for (;first != head && second != head; first = first->next, second = first->next) { first->prev->next = second; first->next = second->next; second->next->prev = first; second->next = first; second->prev = first->prev; first->prev = second; } } ``` ### q_delete_dup ```c bool q_delete_dup(struct list_head *head) { if (!head || list_empty(head)) return false; struct list_head *cur = head->next; struct list_head *tmp = cur; while (cur != head && cur->next != head) { char *c1 = list_entry(cur, element_t, list)->value; char *c2 = list_entry(cur->next, element_t, list)->value; tmp = cur->next; if (strcmp(c1, c2) == 0) { while (tmp != head && strcmp(c1, c2) == 0) { cur->next = tmp->next; tmp->next->prev = cur; q_release_element(list_entry(tmp, element_t, list)); tmp = cur->next; c2 = list_entry(tmp, element_t, list)->value; } cur->prev->next = tmp; tmp->prev = cur->prev; q_release_element(list_entry(cur, element_t, list)); cur = tmp; } else { cur = cur->next; } } return true; } ``` ## 運用 Valgrind 排除 qtest 實作的記憶體錯誤 在單純使用 valgrind 去觀察執行 `./qtest` 的時候沒有問題 ```bash $ valgrind ./qtest cmd> new l = [] cmd> ih aaa l = [aaa] cmd> ih bbb l = [bbb aaa] cmd> reverse l = [aaa bbb] cmd> quit Freeing queue ``` 但發現若在 `./qtest` 加上參數 `-f` ,從檔案來測試時,就會出現 memory leak。 ```bash $ valgrind ./qtest -f ./traces-01-ops.cmd ... ==71277== 41 bytes in 1 blocks are still reachable in loss record 1 of 3 ==71277== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==71277== by 0x4A4F50E: strdup (strdup.c:42) ==71277== by 0x1108BA: linenoiseHistoryAdd (linenoise.c:1236) ==71277== by 0x11144D: linenoiseHistoryLoad (linenoise.c:1325) ==71277== by 0x10C6B0: main (qtest.c:951) ==71277== ==71277== 160 bytes in 1 blocks are still reachable in loss record 2 of 3 ==71277== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==71277== by 0x11087A: linenoiseHistoryAdd (linenoise.c:1224) ==71277== by 0x11144D: linenoiseHistoryLoad (linenoise.c:1325) ==71277== by 0x10C6B0: main (qtest.c:951) ==71277== ==71277== 223 bytes in 19 blocks are still reachable in loss record 3 of 3 ==71277== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==71277== by 0x4A4F50E: strdup (strdup.c:42) ==71277== by 0x11082E: linenoiseHistoryAdd (linenoise.c:1236) ==71277== by 0x11144D: linenoiseHistoryLoad (linenoise.c:1325) ==71277== by 0x10C6B0: main (qtest.c:951) ==71277== ``` 從上面發現的特性,可以去找找看從 file 讀取資料與從 command line 輸入資料兩種模式會有哪些不同的行為。 在 `qtest.c` 裡的第 910 行,可以找到它去判斷 option 的程式碼,發現如果有給 `-f` 這個參數,會將檔案名稱丟入 `infile_name`。 ```c case 'f': strncpy(buf, optarg, BUFSIZE); buf[BUFSIZE - 1] = '\0'; infile_name = buf; break; ``` 再來找到 `infile_name` 這個參數會在哪裡被使用到,可以發現同樣在 `qtest.c` 裡的 962 行呼叫了 `run_console(file_name)`,這個函式被定義在 `console.c` 的 639 行。 從以下程式碼可以發現,若有 `infile_name`,那檔案的內容會在第三行的 `push_file` 函式處理,由於不會執行到第 8 至 15 行,所以不會執行到 `linenoise.c` 的內容。 ```c= bool run_console(char *infile_name) { if (!push_file(infile_name)) { report(1, "ERROR: Could not open source file '%s'", infile_name); return false; } if (!has_infile) { char *cmdline; while ((cmdline = linenoise(prompt)) != NULL) { interpret_cmd(cmdline); linenoiseHistoryAdd(cmdline); /* Add to the history. */ linenoiseHistorySave(HISTORY_FILE); /* Save the history on disk. */ linenoiseFree(cmdline); } } else { while (!cmd_done()) cmd_select(0, NULL, NULL, NULL, NULL); } return err_cnt == 0; } ``` 回到 `qtest.c` 這個檔案,在第 951 行的地方呼叫了 `linenoiseHistoryLoad(HISTORY_FILE)`,在這邊 `HISTORY_FILE` 的內容是 `.cmd_history`,所以它會讀檔案的內容並存起來,但由於後續不會去釋放掉相關的記憶體,所以就造成了 memory leak。 :::warning 準備提交 pull request? :notes: jserv ::: :::info 在提交 pull request 之前,已經有對這份 project 做過一些變更 (e.g. queue.c) 並 commit,但不希望在這個檔案內的內容同樣被 pr 上去,所以做了以下處理: 1. 當前進度完成後,在本地端 commit 2. 開一條新的 branch ,取名作 `patch-qtest` 3. 將其內容回朔到變更 `queue.c` 之前 4. 將已經在[原專案](https://github.com/sysprog21/lab0-c)的變更 pull 回來 5. 修改想要變更的內容並 commit 經過以下操作,在 `patch-qtest` 這條分支內的變更內容就會只剩下想要對 `qtest.c` 的變更。 ::: 觀察 `linenoiseHistoryLoad(HISTORY_FILE)` ,它所做的事情是將過去 command line 所執行的指令個重新讀入再寫進檔案,如果是想要留下所有的歷史紀錄的話,在沒有下 `-f` 這個參數時執行就好,因此變更 `qtest.c` 的程式碼為: ```c if (!infile_name) { /* Trigger call back function(auto completion) */ linenoiseSetCompletionCallback(completion); linenoiseHistorySetMaxLen(HISTORY_LEN); linenoiseHistoryLoad(HISTORY_FILE); /* Load the history at startup */ } ``` 發現在給定錯誤的測試檔案路徑時,也會造成 memory leak ```bash $ valgrind ./qtest -f ./traces/trace-01-ops.cm ERROR: Could not open source file './traces/trace-01-ops.cm' ==136232== 32 bytes in 1 blocks are still reachable in loss record 1 of 28 ==136232== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==136232== by 0x10CD61: malloc_or_fail (report.c:189) ==136232== by 0x10D7EB: add_cmd (console.c:89) ==136232== by 0x10DB75: init_cmd (console.c:408) ==136232== by 0x10C4C1: main (qtest.c:944) ==136232== ==136232== 32 bytes in 1 blocks are still reachable in loss record 2 of 28 ==136232== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==136232== by 0x10CD61: malloc_or_fail (report.c:189) ==136232== by 0x10D7EB: add_cmd (console.c:89) ==136232== by 0x10DB8F: init_cmd (console.c:409) ==136232== by 0x10C4C1: main (qtest.c:944) ==136232== ==136232== 32 bytes in 1 blocks are still reachable in loss record 3 of 28 ==136232== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==136232== by 0x10CD61: malloc_or_fail (report.c:189) ==136232== by 0x10D7EB: add_cmd (console.c:89) ==136232== by 0x10DBA9: init_cmd (console.c:410) ==136232== by 0x10C4C1: main (qtest.c:944) ... ``` 找到 `console.c` 的 `run_console` 函式,可以發現當他找不到對應的檔案時,會執行以下的程式碼: ```c if (!push_file(infile_name)) { report(1, "ERROR: Could not open source file '%s'", infile_name); return false; } ``` 再回到 `qtest.c` ,觀察以下程式碼,會發現若檔案路徑錯誤,會回傳 `false`,然後在第 3 行的地方,由於使用了 `&&` ,所以當 `ok` 為 `false` 時,會跳過 `finish_cmd()` 這個函式呼叫,故更改成以下程式碼,讓 `finish_cmd()` 先執行。 ```diff bool ok = true; ok = ok && run_console(infile_name); + ok = finish_cmd() && ok; - ok = ok && finish_cmd(); ``` 在還沒有對程式碼做變更的階段,使用 massif 來看 ```bash $ valgrind --tool=massif ./qtest -f ./traces/trace-01-ops.cmd $ ms_print massif.out.101289 ms_print massif.out.101289 -------------------------------------------------------------------------------- Command: ./qtest -f ./traces/trace-01-ops.cmd Massif arguments: (none) ms_print arguments: massif.out.101289 -------------------------------------------------------------------------------- KB 14.59^ ### | # ::: :: | # : :: | # : :: | # : :: : | :@:::@@@# : @: : | :@:::@@@# : @: : | :@:::@@@# : @: : | :@:::@@@# : @: : | :@:::@@@# : @: : | :@:::@@@# : @: : | :@:::@@@# : @: : | :@@@:::@@@# : @: : | :@@@:::@@@# : @: : | :@@@:::@@@# : @: : | :@@@:::@@@# : @: : | :@@@:::@@@# : @: @ | :@@@:::@@@# : @: @ | @@@@:::@@@# : @: @: | @@@@@:::@@@# : @: @: 0 +----------------------------------------------------------------------->ki 0 305.6 ``` 變更 `qtest.c` 後印出的東西為 ```bash $ ms_print massif.out.133040 -------------------------------------------------------------------------------- Command: ./qtest -f ./traces/trace-01-ops.cmd Massif arguments: (none) ms_print arguments: massif.out.133040 -------------------------------------------------------------------------------- KB 13.81^ # | #: :::::: | # : : | # : : | # :: : : | :::::@@@# :: : : | :::::@@@# :: : @ | @::::@@@# :: : @ | @::::@@@# :: : @ | @::::@@@# :: : @ | @::::@@@# :: : @ | @::::@@@# :: : @ | @::::@@@# :: : @ | @::::@@@# :: : @ | @::::@@@# :: : @ | @::::@@@# :: : @ | @::::@@@# :: : @ | @::::@@@# :: : @ | @::::@@@# :: : @ | :@@::::@@@# :: : @: 0 +----------------------------------------------------------------------->ki 0 295.8 ``` ## 在 qtest 提供新的命令 shuffle 在這邊會使用到 [Fisher–Yates shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle) 的演算法。 流程大致如下: ``` 原始 array: [0, 1, 2, 3, 4] 隨機數 原陣列內容 變更後陣列內容 0-4 3 [0, 1, 2, 3, 4] [0, 1, 2, 4, 3] 0-3 1 [0, 1, 2, 4, 3] [0, 2, 4, 3, 1] 0-2. 1 [0, 2, 4, 3, 1] [0, 4, 3, 1, 2] 0-1 0. [0, 4, 3, 1, 2] [4, 3, 1, 2, 0] ``` 在資料型態為 array 的狀態中,時間複雜度為 $O(n)$,但是若資料型態為 linked list,由於會有把 node 從特定的位置挪至最後面,時間複雜度會增加到 $O(n^2)$ 由於不能夠修改 `queue.h` 這份檔案,因此只好將 `q_shuffle` 這個函式實作在 `qtest.c` 內 ```c static void q_shuffle(struct list_head *head) { if (!head || list_empty(head)) return; srand(time(NULL)); int size = q_size(head); for (int i = 0; i < size - 1; ++i) { struct list_head *tmp = head->next; int n = rand() % (size - i - 1); for (int j = 0; j < n; ++j) { tmp = tmp->next; } list_move_tail(tmp, head); } } static bool do_shuffle(int argc, char *argv[]) { if (argc != 1) { report(1, "%s takes no arguments", argv[0]); return false; } if (!l_meta.l) report(3, "Warning: Calling shuffle on null queue"); error_check(); set_noallocate_mode(true); if (exception_setup(true)) q_shuffle(l_meta.l); exception_cancel(); set_noallocate_mode(false); show_queue(3); return !error_check(); } ``` 並同時新增下面的程式碼至 `test.c` 內,使其可以執行 shuffle。 ```c ADD_COMMAND(shuffle, " | Shuffle queue"); ``` ## 參考資料 - [Linux 核心原始程式碼巨集: container_of](https://hackmd.io/@sysprog/linux-macro-containerof) - [你所不知道的 C 語言: linked list 和非連續記憶體](https://hackmd.io/@sysprog/c-linked-list)