2020q3 Homework1 (lab0)

# 2020q3 Homework1 (lab0) contributed by <`howish`> ## 開發環境 ## 作業要求 :penguin: * 在 GitHub 上 fork [lab0-c](https://github.com/sysprog21/lab0-c) * 參閱 [Git 教學和 GitHub 設定指引](https://hackmd.io/@sysprog/git-with-github) ^附教學影片^ * ==詳細閱讀 [C Programming Lab](https://www.cs.cmu.edu/afs/cs/academic/class/15213-s20/www/labs/cprogramminglab.pdf)== ，依據指示著手修改 `queue.[ch]` 和連帶的檔案，測試後用 Git 管理各項修改，記得也該實作 `q_sort` 函式。 * 在提交程式變更前，務必詳閱 [如何寫好 Git Commit Message](https://blog.louie.lu/2017/03/21/%E5%A6%82%E4%BD%95%E5%AF%AB%E4%B8%80%E5%80%8B-git-commit-message/) * 不用理會 [Autolab](http://www.autolabproject.com/) 和檔案下載的描述，這兩者都是 CMU 專屬 * 除了修改程式，也要編輯「[作業區](https://hackmd.io/@sysprog/2020-homework1)」，增添開發紀錄和 GitHub 連結，除了提及你如何逐步達到自動評分程式的要求外，共筆也要涵蓋以下: * 運用 Valgrind 排除 `qtest` 實作的記憶體錯誤，並透過 Massif 視覺化 "simulation" 過程中的記憶體使用量，需要設計對應的實驗 * 研讀論文 [Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf)，解釋本程式的 "simulation" 模式是如何透過以實驗而非理論分析，達到驗證時間複雜度，需要解釋 [Student's t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution) 及程式實作的原理。注意：現有實作存在若干致命缺陷，請討論並提出解決方案; * 挑戰題 * 參照 [antirez/linenoise](https://github.com/antirez/linenoise)，強化 `qtest` 命令列功能，使其支援單行或多行編輯、歷史命令處理、命令自動補完 (例如輸入 `h` 之後按下 Tab 按鍵，自動接續補上 `elp`，成為完整的 `help` 命令)。除了整合程式碼外，應當說明 [antirez/linenoise](https://github.com/antirez/linenoise) 的運作原理，注意到 [termios](http://man7.org/linux/man-pages/man3/termios.3.html) 的運用 * 思考如何預設開啟 AddressSanitizer，卻又能滿足效能測試所需 (可能需要兩種以上的 build target) * 指出現有程式的缺陷 (提示: 和 [RIO 套件](http://csapp.cs.cmu.edu/2e/ch10-preview.pdf) 有關)，嘗試強化並提交 pull request * 截止日期： * Sep 20, 2020 (含) 之前 * 越早在 GitHub 上有動態、越早接受 code review，評分越高 ## 實作queue_t ### 實作目標 * `q_new`: 建立新的「空」佇列; * `q_free`: 釋放佇列所佔用的記憶體; * `q_insert_head`: 在佇列開頭 (head) 加入 (insert) 給定的新元素 (以 LIFO 準則); * `q_insert_tail`: 在佇列尾端 (tail) 加入 (insert) 給定的新元素 (以 FIFO 準則); * `q_remove_head`: 在佇列開頭 (head) 移除 (remove) 給定的元素。 * `q_size`: 計算佇列中的元素數量。 * `q_reverse`: 以反向順序重新排列鏈結串列，該函式不該配置或釋放任何鏈結串列元素，換言之，它只能重排已經存在的元素; * ==`q_sort`==: 「[進階電腦系統理論與實作](http://wiki.csie.ncku.edu.tw/sysprog/schedule)」課程所新增，以==遞增順序==來排序鏈結串列的元素，可參閱 [Linked List Sort](https://npes87184.github.io/2015-09-12-linkedListSort/) 得知實作手法; ### `queue_t` 注意到在實作目標中 `q_insert_tail` 和 `q_size` 需要常數時間的實作，故在 `queue_t` 的成員中加上 `tail` 與 `size` 兩個成員，並在實作成員函式時記得要處理它們。 ```c= typedef struct { list_ele_t *head; /* Linked list of elements */ list_ele_t *tail; /* Last element of the linked list */ size_t size; } queue_t; ``` ### `q_new` 需要處理的部份為當 `malloc` 失敗的時候， `malloc` 函式會回傳 `NULL` 表示記憶體配置失敗，偵測到此情形時對應回傳 `NULL` 給使用者。 ```c= queue_t *q_new() { queue_t *q = malloc(sizeof(queue_t)); if (q == NULL) return NULL; q->head = NULL; q->tail = NULL; q->size = 0; return q; } ``` ### `q_free` 釋放整個 queue 的方法為從 linked list 的 `head` 節點開始往後，一個一個釋放它本身以及它的成員。 ```c= void q_free(queue_t *q) { list_ele_t *ele_to_free = q->head; list_ele_t *next_to_free; while(ele_to_free) { next_to_free = ele_to_free->next; _list_ele_free(ele_to_free); ele_to_free = next_to_free; } free(q); } ``` 其中因為處理釋放節點的部份只針對一個 `list_ele_t` 做處理，所以將它包成 `_list_ele_free` 函式來增加可讀性與程式碼重用性。 ```c= static void _list_ele_free(list_ele_t* ele) { free(ele->value); free(ele); } ``` ### `q_insert_head` 首先來實做一個靜態函式。它需要在給定字串下，回傳一個配置好的 `list_ele_t` 結構，並且在失敗十回傳 `NULL`。 ```c= static list_ele_t *_list_ele_new(char const *s) { list_ele_t *new_ele; size_t ssize = strlen(s) + 1; new_ele = malloc(sizeof(list_ele_t)); if (new_ele == NULL) return NULL; new_ele->value = malloc(ssize * sizeof(char)); if (new_ele->value == NULL) { free(new_ele); return NULL; } strncpy(new_ele->value, s, ssize); new_ele->next = NULL; return new_ele; } ``` 需要處理的例外狀況有兩個： 1. 配置節點記憶體失敗 2. 配置節點中字串的記憶體失敗兩個情況下都需要回傳 `NULL` 給使用者，且第二個情況在回傳 `NULL` 值之前要先把先前配置好的記憶體釋放掉，才不會造成 memory leak 。最初實作時第13行使用的是 strcpy 函式，但在 commit 時會被警告有潛在危險需要被清除： ```bash Following files need to be cleaned up: queue.c Dangerous function detected in /home/howard/C_CPP_Project/lab0-c/queue.c 21: strcpy(new_ele->value, s); ``` 原因是 [`strcpy`](https://linux.die.net/man/3/strcpy) 有潛在的違法記憶體存取問題，後來查詢了相關議題才將它改成 [`strncpy`](https://linux.die.net/man/3/strncpy) ，它排除問題的方式為讓使用者在使用時一併告訴函式字串的複製長度。有了產生新節點的函式，實作 `insert_head` 部份就很容易了： ```c= bool q_insert_head(queue_t *q, char *s) { if (q == NULL) return false; list_ele_t *newh; newh = _list_ele_new(s); if (newh == NULL) return false; if (q->size) { newh->next = q->head; q->head = newh; } else q->head = q->tail = newh; q->size++; return true; } ``` 比較需要注意的只有當 queue 還為空時，需要用新節點初始化 `head` 和 `tail`，其他情況把新的節點接在head之前即可。 ### `q_insert_tail` ```c= bool q_insert_tail(queue_t *q, char *s) { if (q == NULL) return false; list_ele_t *newt; newt = _list_ele_new(s); if (newt == NULL) return false; if (q->size) { q->tail->next = newt; q->tail = q->tail->next; } else q->head = q->tail = newt; q->size++; return true; } ``` 與前一部分相同，只是這次是將新節點接在tail後面。 ### `q_remove_head` ```c= bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { if (q == NULL || q->size == 0) return false; // Copy Head content if (sp) { strncpy(sp, q->head->value, bufsize); sp[bufsize - 1] = '\0'; } // Remove head list_ele_t *next = q->head->next; _list_ele_free(q->head); q->head = next; // Handle tail if (q->size == 1) q->tail = NULL; q->size--; return true; } ``` 這個函式要做兩件事情： 1. 把第一個節點（如果有的話）的字串複製到給定的 buffer。 2. 刪除節點。第一部份使用前述的 [`strncpy`](https://linux.die.net/man/3/strncpy) 函式來複製字串，為了讓思緒清晰。參考 manual 中的敘述 > The strncpy() function is similar, except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated. > If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written. 釐清一下使用時會發生的狀況： 1. 若 bufsize > length_of_value，則 `strncpy` 會複製整個 `value` 字串，並且把 `sp` 的尾巴全部用`\0`填滿。 3. 若 bufsize = length_of_value，則 `strncpy` 會複製整個 `value` 字串，但尾巴沒有`\0`，這時需要手動加上`\0`。 4. 若 bufsize < length_of_value，則 `strncpy` 會複製 `value` 的前 `bufsize` 個字元，且尾巴沒有`\0`，需要手動加上`\0`。 \ 第二部份需要注意的是當 queue 的大小剛好為1時， `head` 會被自動指定成 `NULL` (因為尾巴的下一個元素是 `NULL`)，但 `tail` 不會，需要手動將它設成 `NULL`。 ### `q_size` ```c= int q_size(queue_t *q) { return q == NULL ? 0 : q->size; } ``` 因為 `size` 成員在其他函式中都被適當的處理，故這個函式只要簡單的拋回值就好。 ### `q_reverse` ```c= void q_reverse(queue_t *q) { if (q == NULL || q->size == 0) return; // Handle tail q->tail = q->head; // Reverse linked list list_ele_t *pre = NULL; list_ele_t *cur = q->head; list_ele_t *nxt; while (cur != NULL) { nxt = cur->next; cur->next = pre; pre = cur; cur = nxt; } q->head = pre; } ``` 使用三個節點指標來做循序的串列反轉。 ### `q_sort` ```c= void q_sort(queue_t *q) { if (q == NULL || !q->size) return; // Implement sorting by merge sort _merge_sort(q->head, q->size, &q->head, &q->tail); } ``` 考慮連結串列的結構， merge sort 是比較好在它上面實作的排序方法。因為考慮到 merge_sort 操作時同時會產生串列的結尾，順便讓它回傳結尾的指標值給 `tail` 成員，就不用在事後重新計算。 ```c= static void _merge_sort(list_ele_t *head, size_t len, list_ele_t **newhead, list_ele_t **newtail) { if (len <= 1) { *newhead = *newtail = head; return; } list_ele_t *prevmid, *mid = head, *dummy; for (size_t i = 0; i < len / 2 - 1; i++) mid = mid->next; prevmid = mid; mid = mid->next; prevmid->next = NULL; _merge_sort(head, len / 2, &head, &dummy); _merge_sort(mid, len - len / 2, &mid, &dummy); _merge_two_list(head, mid, newhead, newtail); } ``` merge_sort 的主要程式碼，以下摘要： 1. 使用遞迴的方式實作。 2. 因為事先知道長度，所以不使用快慢指標的方式找中間節點，而是直接移動固定長度。 3. 找到中間節點之前要先停一下，把前一個節點與中間節點的連接移除，才能得到兩個獨立且約相等長的串列。可以注意到因 size >= 2 所以所謂的「中間節點的前一個節點」一定存在。 ```c= static void _merge_two_list(list_ele_t *lst1, list_ele_t *lst2, list_ele_t **newhead, list_ele_t **newtail) { *newhead = *newtail = NULL; while (lst1 != NULL || lst2 != NULL) { list_ele_t **ref; ref = !lst1 || (lst2 && strcmp(lst1->value, lst2->value) > 0) ? &lst2 : &lst1; if (*newhead == NULL) *newhead = *newtail = *ref; else { (*newtail)->next = *ref; (*newtail) = (*newtail)->next; } *ref = (*ref)->next; } } ``` 合併兩個串列的程式碼，以下摘要： 1. 利用一個指標的指標來紀錄當個 iteration 下要選擇哪一個串列來操作。 2. 要注意初始狀態下 `newhead` 還是 `NULL` 時需作對應的初始化。 ### 紀錄：NULL 指標開發時一度疑惑於究竟可不可以使用以下表示法來判斷指標是否是 `NULL`： ```c if (ptr) { // Content } if (!ptr) { // Content } ``` 在 [C99-7.17p3][C99_STD] 中 *NULL* 的敘述為： > The macros are NULL which expands to an ==implementation-defined null pointer constant==; 而 [C99-6.3.2.3p3][C99_STD] 中對 *null pointer constant* 的敘述為 > An integer constant expression ==with the value 0, or such an expression cast to type void *==, is called a null pointer constant. > ==If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer==, is guaranteed to compare unequal to a pointer to any object or function. 以及 [C99-6.3.2.3p4][C99_STD] 中對 *null pointer* 的敘述 > Conversion of a null pointer to another pointer type yields a null pointer of that type. ==Any two null pointers shall compare equal==. 根據這些定義，若 `ptr` 是 null pointer，則： 1. `if (ptr != NULL)` 的寫法是在比較兩個 *null pointer*是否**不**相等，答案為否。 2. `if (ptr)`的寫法是在判斷 ptr 是否 unequal to 0 expression，但因為 0 expression 轉成 pointer type 正是一個 null pointer 的定義，所以答案也是否。可以看出兩個寫法的行為是一致的，因此解答了我的疑惑。 ## qtest ## Reference [C99_STD]: http://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf