2020q3 Homework1 (lab0)

# 2020q3 Homework1 (lab0) contributed by < `Weiting0114` > ## 簡介 lab0 中主要是透過許多工具，讓我們養成良好的寫程式習慣，包含： * Git pre-commit hook * [git-good-commit](https://github.com/tommarshall/git-good-commit): 學習撰寫良好的 Git commit message * [cppcheck](http://cppcheck.sourceforge.net/): 靜態程式碼分析工具 * [clang-format](https://clang.llvm.org/docs/ClangFormat.html): 確保一致的程式風格 * [GNU Aspell](http://aspell.net/): 拼字檢查 * [valgrind](https://valgrind.org/): 動態程式分析工具 * [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer): 檢查記憶體的使用詳細的環境設定以及lab的相關說明請參閱 [I01: lab0](https://hackmd.io/@sysprog/2020-lab0?fbclid=IwAR2YOjtMYzE_kB5vqvBUOSM9OlR5hgvz7NHUGg21NOcN9Hb8vccxEV-yC3o) [Github 程式碼](https://github.com/Weiting0114/lab0-c) ## 環境 ```lang=shell $ uname -a Linux eric 5.4.0-47-generic #51~18.04.1-Ubuntu SMP Sat Sep 5 14:35:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux $ gcc --version gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ``` ## 作業要求於 `queue.[c/h]` 中完成下列功能 * `q_new`: 建立新的「空」佇列; * `q_free`: 釋放佇列所佔用的記憶體; * `q_insert_head`: 在佇列開頭 (head) 加入 (insert) 給定的新元素 (以 LIFO 準則); * `q_insert_tail`: 在佇列尾端 (tail) 加入 (insert) 給定的新元素 (以 FIFO 準則); * `q_remove_head`: 在佇列開頭 (head) 移除 (remove) 給定的元素; * `q_size`: 計算佇列中的元素數量; * `q_reverse`: 以反向順序重新排列鏈結串列，只能重排已經存在的元素; * `q_sort`: 以==遞增順序==來排序鏈結串列的元素 * 功能需要在 $O(1)$ 時間內執行 ## 開發過程 #### **`queue.h`** * 將 `queue_t` 結構體增加兩位成員，`tail`、`size`。增加 `list_ele_t *tail` 提供Queue快速尋訪，達成執行時間需在 $O(1)$ 內執行完畢。 ```c= /* Queue structure */ typedef struct { list_ele_t *head; /* Linked list of elements */ list_ele_t *tail; int size; } queue_t; ``` #### **`q_new`** * 透過判斷式確認記憶體分配是否成功，避免空指標的操作。 ```c= queue_t *q_new() { queue_t *q = malloc(sizeof(queue_t)); if (q){ q->head = NULL; q->tail = NULL; q->size = 0; } return q; } ``` #### **`q_free`** * 利用新節點 `tmp` 儲存要釋放的資料 * 透過 `q->head` 的移動，將整個 `queue` 清空 >在清除節點時須先釋放資料再釋放節點，否則記憶體中會殘留資料 ```c= void q_free(queue_t *q) { if (q == NULL) return; while (q->head) { list_ele_t *temp; temp = q->head; q->head = q->head->next; free(temp->value); free(temp); } free(q); } ``` #### **`q_insert_head`** * 將新元素 `newh` 插入 `queue` 的最前端 * 判斷 `queue` 是否為 `NULL` ，避免在之後的計算上造成 memory leak * 分配記憶體給新的元素(`newh`)，判斷是否分配成功，若失敗，則將之前分配的 `newh` 清除 * 利用 C library <string.h> 的 strncpy 複製 s，並存入 `newh->value` * 在新增第一個元素時，須將 `q->head`、 `q->tail` 指向新插入的元素。之後隨著插入元素的增加，將 `q->head` 逐漸向後移 :::info :exclamation: 下列是未判斷 `queue` 是否為 `NULL`，會在 `trace-07-robust` 出錯，利用 valgrid 查看發現是 memory leak。 ```lang = shell $ valgrind --memcheck:show-leak-kinds=all ./qtest < traces/trace-07-robust.cmd ==5649== Address 0x10 is not stack'd, malloc'd or (recently) free'd ==5649== Segmentation fault occurred. You dereferenced a NULL or invalid pointer ... ==5649== HEAP SUMMARY: ==5649== in use at exit: 9,165 bytes in 30 blocks ==5649== total heap usage: 53 allocs, 23 frees, 10,442 bytes allocated ==5649== ==5649== LEAK SUMMARY: ==5649== definitely lost: 0 bytes in 0 blocks ==5649== indirectly lost: 0 bytes in 0 blocks ==5649== possibly lost: 0 bytes in 0 blocks ==5649== still reachable: 9,165 bytes in 30 blocks ==5649== suppressed: 0 bytes in 0 blocks ``` ::: ```cpp= bool q_insert_head(queue_t *q, char *s) { if (q == NULL) return false; list_ele_t *newh; newh = malloc(sizeof(list_ele_t)); if (newh == NULL) return false; newh->value = malloc(sizeof(char) * strlen(s) + 1); if (newh->value == NULL) { free(newh); return false; } strncpy(newh->value, s, strlen(s) + 1); if (q->size == 0) { newh->next = NULL; q->head = q->tail = newh; } else { newh->next = q->head; q->head = newh; } q->size += 1; return true; } ``` #### **`q_insert_tail`** * 作法與 `q_insert_head` 差不多 * 將新元素 (`newt`) 放入 `queue` 的尾端 * 若插入為第一個元素，則將 head 和 tail 同時指向新元素 (`newt`) ```c= bool q_insert_tail(queue_t *q, char *s) { if (q == NULL) return false; list_ele_t *newt; newt = malloc(sizeof(list_ele_t)); if (newt == NULL) return false; newt->value = malloc(sizeof(char) * strlen(s) + 1); if (newt->value == NULL) { free(newt); return false; } strncpy(newt->value, s, strlen(s) + 1); if (q->size == 0) { q->head = q->tail = newt; } else { q->tail->next = newt; q->tail = newt; } newt->next = NULL; q->size += 1; return true; } ``` #### **`q_remove_head`** * 藉由確認 `q->head` 是否為 `NULL`，確認 `queue` 是否為空 * 若 `sp` 不為 `NULL`，將要刪除後的字串複製進去 * 新增一個指標 `rm_node` 以供釋放 (`free`) * 將 `rm_node` 的成員都清空後，再釋放 `rm_node` ```c= bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { if (q == NULL || q->head == NULL || q->size == 0) return false; list_ele_t *rm_node = q->head; if (sp) { memset(sp, '\0', bufsize); strncpy(sp, rm_node->value, bufsize - 1); } /* *Method as same as free function *Free node's value and then free value */ q->head = q->head->next; q->size -= 1; free(rm_node->value); free(rm_node); return true; } ``` :::info :exclamation: 在 `trace-06-string` 測試中出現了error，原因是因為沒有設定字串的結束 ( `'\0'`)，因此利用 memset 將 `sp` 的內容全部寫為零。 ```cpp=8 memset(sp, '\0', bufsize); ```  > ERROR: Removed value aardvark_bear_dolphin_gerbil_jXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX != expected value meerkat_panda_squirrel_vulture ::: #### **`q_size`** * 若 `q` 不存在，則返回 `0` * 藉由直接取得 `q->size`，達成在 $O(1)$ 時間內完成 `q_size()` * 在 `q_new` 時，便已將 `q->size` 設為 `0` ，因此就算 queue 為空，還是會返回 `0` ```c= int q_size(queue_t *q) { if (q == NULL) return 0; return q->size; } ``` #### **`q_reverse`** * 利用 `q->size` 檢查 queue 是否有足夠的資料可以做反轉 * 將 `q->tail` 設定為 `q->head`，利用 `curnxt` 來儲存 `q->head->next` ，避免待會將資料反向後，無法取得下一筆資料 ```cpp= void q_reverse(queue_t *q) { if (q == NULL || q->size <= 1) return; list_ele_t *cur, *curnxt; q->tail = q->head; cur = q->head; curnxt = q->head->next; while (curnxt != NULL) { q->head = curnxt; curnxt = q->head->next; q->head->next = cur; cur = q->head; } q->tail->next = NULL; } ``` * 原始 queue 的狀況 ```graphviz digraph linked_list{ rankdir=LR; node [shape=record]; a [label="{ <data> a | <ref> }"] b [label="{ <data> b | <ref> }"]; c [label="{ <data> c | <ref> }"]; d [label="{ <data> d | <ref> }"]; a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; c:ref:c -> d :data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; d:ref:d -> NULL :data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; head [shape=box]; head -> a; tail [shape=box]; tail -> d; } ``` * 進入迴圈之前 queue 的狀況 ```graphviz digraph linked_list{ rankdir=LR; node [shape=record]; a [label="{ <data> a | <ref> }"] b [label="{ <data> b | <ref> }"]; c [label="{ <data> c | <ref> }"]; d [label="{ <data> d | <ref> }"]; a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; c:ref:c -> d:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; d:ref:d -> NULL:data[arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; cur -> a; curnxt -> b; head [shape=box]; head -> a; tail [shape=box]; tail -> a; } ``` * 進入迴圈之後將 `q->head` 、 `curnxt` 向後移動 * 並將兩兩元素做交換 <pre> while (curnxt != NULL) { q->head = curnxt; curnxt = q->head->next; q->head->next = cur; cur = q->head; } q->tail->next = NULL; </pre> 下圖藍色部份是為了將整個 `queue` 走過一遍，紅色的部份是將 `node` 做反轉： ```graphviz digraph linked_list{ rankdir=LR; node [shape=record]; a [label="{ <data> a | <ref> }"] b [label="{ <data> b | <ref> }"]; c [label="{ <data> c | <ref> }"]; d [label="{ <data> d | <ref> }"]; a:ref:a -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; b:ref:b -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; c:ref:c -> d:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; d:ref:d -> NULL[arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; cur -> a; curnxt [fontcolor=blue, shape=box, color=blue] curnxt -> c [color=blue]; head [fontcolor=blue, shape=box, color=blue]; head -> b [color=blue]; tail [shape=box]; tail -> a; } ``` ```graphviz digraph linked_list{ rankdir=LR; node [shape=record]; a [label="{ <data> a | <ref> }"]; b [label="{ <data> b | <ref> }"]; c [label="{ <data> c | <ref> }"]; d [label="{ <data> d | <ref> }"]; head [fontcolor=blue, color=blue]; cur [label="{cur}"][fontcolor=red, color=red]; curnxt [label="{curnxt}"][fontcolor=blue, color=blue]; a:ref:c -> b [arrowtail=dot, dir=both, tailclip=false]; b:ref:c -> a:data [arrowtail=dot, dir=both, tailclip=false] [color=red]; cur -> b [color=red]; head -> b [color=blue]; tail -> a; curnxt -> c:data [color=blue]; c:ref:c -> d [arrowtail=dot, dir=both, tailclip=false]; d:ref:d -> NULL [arrowtail=dot, dir=both, tailclip=false]; } ``` #### **`q_sort`** * 參考 [Linked LIst Sort](https://npes87184.github.io/2015-09-12-linkedListSort/)，使用 Merge sort 排序來完成 * `merge_sort()` * 將一個 Linked list 拆成兩個 * 藉由兩個指標 (`fast`, `slow`) 不同的前進速度 (`fast` : `slow` = 1 : 2)，進而找出位於中間的節點。演算法是利用[龜兔賽跑 (Floyd’s algorithm)](https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm) * `fast` 會指向最後的 `element` * `slow` 會指向中間的 `element` * 將 `fast` 指向 `slow->next` 後，便可以得到兩個等長的 `linked list` (`head` 和 `fast`) * 將兩個已排序的 Linked list 合成一個 Linked list，並回傳 * 參考 [jwang0306](https://hackmd.io/@jwang0306/lab0-c) 的 `merge`，將比較字串的部分縮進 `while` * 遍歷 `l1` 和 `l2`，並比較兩者元素，來建立新 Linked list ```cpp= void q_sort(queue_t *q) { if (q == NULL || q->size <= 1) return; q->head = split(q->head); while (q->tail) { q->tail = q->tail->next; } } ``` **`split`** ``` cpp= list_ele_t *split(list_ele_t *head) { if (head == NULL || head->next == NULL) return head; /*Split list*/ list_ele_t *slow = head, *fast; for (fast = head->next; fast && fast->next; fast = fast->next->next) { slow = slow->next; } fast = slow->next; slow->next = NULL; // sort each list list_ele_t *left = split(head); list_ele_t *right = split(fast); // merge sorted left and sorted right return merge(left, right); } ``` **`merge`** ``` cpp= list_ele_t *merge(list_ele_t *left, list_ele_t *right) { list_ele_t *head = NULL; list_ele_t **tmp = &head; while (left && right) { if (strcmp(left->value, right->value) < 0) { *tmp = left; left = left->next; } else { *tmp = right; right = right->next; } tmp = &((*tmp)->next); } if (left) *tmp = left; if (right) *tmp = right; return head; } ``` :::info [你所不知道的C語言: linked list 和非連續記憶體操作](https://hackmd.io/@sysprog/c-linked-list?type=view) Linus Torvalds 的解說: > People who understand pointers just use a "pointer to the entry pointer", and initialize that with the address of the list_head. And then as they traverse the list, they can remove the entry without using any conditionals, by just doing`*pp = entry->next`. 因此我們先做一個指向 head 的指標: `Node **indirect = &list->head;` ```mermaid graph LR subgraph linked list head==>node1==>node2 end subgraph pointer to pointer indirect==>head end ``` 在走訪 list 的時候，先`*indirect` (dereference `indirect` ，找出它所指向的 node ，此刻為 head)，接著```(*tmp)->next```(找出它的 `->next` ，這邊為 head 的 next ，也就是 node1)，然後 `indirect = &((*indirect)->next)` (將其 reference 存回 `indirect` ，此刻 `indirect` 就變為指向 node1 的指標)： ```mermaid graph LR subgraph linked list head==>node1==>node2 end subgraph pointer to pointer indirect==>node1 end ``` ::: ## 指標的指標 * 擷取自[你所不知道的 C 語言：指標篇](https://reurl.cc/EzmAVa) * 利用下列兩段程式碼解說為何要使用指標的指標考慮以下程式碼: ```cpp=1 int B = 2; void func(int *p) { p = &B; } int main() { int A = 1, C = 3; int *ptrA = &A; func(ptrA); printf("%d\n", *ptrA); return 0; } ``` 在第 5 行 (含) 之前的記憶體示意: ```graphviz digraph structs { node[shape=record] {rank=same; structa,structb,structc} structptr [label="ptrA|<ptr> &A"]; structb [label=" B|2"]; structa [label="<A> A|1"]; structc [label="<C> C|3"]; structptr:ptr -> structa:A:nw } ``` 第 6 行 `ptrA` 數值傳入 `func` 後的記憶體示意: ```graphviz digraph structs { node[shape=record] {rank=same; structa,structb,structc} structp [label="p|&A"] structptr [label="<name_ptr> ptrA|<ptr> &A"]; structb [label=" B|2"]; structa [label="<A> A|1"]; structc [label="<C> C|3"]; structptr:ptr -> structa:A:nw structp:p -> structa:A:nw } ``` `func` 將變數 `p` 的值指定為 `&B` ```graphviz digraph structs { node[shape=record] {rank=same; structa,structb,structc} structp [label="p|&B"] structptr [label="<name_ptr> ptrA|<ptr> &A"]; structb [label=" B|2"]; structa [label="<A> A|1"]; structc [label="<C> C|3"]; structptr:ptr -> structa:A:nw structp:p -> structb:B:nw } ``` 由上圖可見，原本在 main 中的 `ptrA` 內含值沒有改變。這不是我們期望的結果，該如何克服呢？可透過「指標的指標」來改寫，如下： ```cpp=1 int B = 2; void func(int **p) { *p = &B; } int main() { int A = 1, C = 3; int *ptrA = &A; func(&ptrA); printf("%d\n", *ptrA); return 0; } ``` 在第 5 行 (含) 之前的記憶體示意: ```graphviz digraph structs { node[shape=record] {rank=same; structa,structb,structc} structptr [label="ptrA|<ptr> &A"]; structb [label=" B|2"]; structa [label="<A> A|1"]; structc [label="<C> C|3"]; structptr:ptr -> structa:A:nw } ``` 第 6 行時，傳入 `func` 的參數是 `&ptrA`，也就是下圖中的 `&ptrA(temp)`: ```graphviz digraph structs { node[shape=record] {rank=same; structa,structb,structc} structadptr [label="&ptrA(temp)|<adptr> &ptrA"] structptr [label="<name_ptr> ptrA|<ptr> &A"]; structb [label=" B|2"]; structa [label="<A> A|1"]; structc [label="<C> C|3"]; structptr:ptr -> structa:A:nw structadptr:adptr -> structptr:name_ptr:nw } ``` 進入 func 執行後，在編譯器產生對應參數傳遞的程式碼中，會複製一份剛傳入的 `&ptrA`，產生一個自動變數 `p`，將 `&ptrA` 內的值存在其中，示意如下: ```graphviz digraph structs { node[shape=record] {rank=same; structa,structb,structc} structp [label="p(in func)| &ptrA"] structadptr [label="&ptrA(temp)|<adptr> &ptrA"] structptr [label="<name_ptr> ptrA|<ptr> &A"]; structb [label=" B|2"]; structa [label="<A> A|1"]; structc [label="<C> C|3"]; structptr:ptr -> structa:A:nw structadptr:adptr -> structptr:name_ptr:nw structp:p -> structptr:name_ptr:nw } ``` 在 `func` 中把 `p` 指向到的值換成 `&B`: ```graphviz digraph structs { node[shape=record] {rank=same; structa,structb,structc} structp [label="p(in func)| &ptrA"] structadptr [label="&ptrA(temp)|<adptr> &ptrA"] structptr [label="<name_ptr> ptrA|<ptr> &B"]; structb [label=" B|2"]; structa [label="<A> A|1"]; structc [label="<C> C|3"]; structptr:ptr -> structb:B:nw structadptr:adptr -> structptr:name_ptr:nw structp:p -> structptr:name_ptr:nw } ``` 經過上述「指標的指標」，`*ptrA` 的數值從 1 變成 2，而且 `ptrA` 指向的物件也改變了。 ## Valgrind - Massif * 環境設定: 安裝 [Massif](https://reurl.cc/GrAN13) `$ sudo apt install massif-visualizer` * 首先執行以下命令，這裡使用 `trace-06-string.cmd`作為範例 ```shell $ valgrind --tool=massif ./qtest < traces/trace-06-string.cmd ``` * 在該資料夾可得到記憶體資料的文字檔 `massif.out.<pid>` ，執行下列指令 ```shell $ ms_print massif.out.16246 ``` :::info :exclamation: `pid` 會根據執行 valgrind 時的執行緒改變，因此要注意其對應的編號 ::: * 執行完 `ms_print massif.out.16246` 即可得到下列結果 ![](https://i.imgur.com/0m24yrL.png)