linux2024-homework1

--- tags: linux kernel class --- # 2024q1 Homework1 (lab0) contributed by < [`ChenFuhuangKye`](https://github.com/kyehuang) > ### Reviewed by `Chloexyw` 1. 剛開始的`q_insert_head` 和 `q_insert_tail` 的實作手法類似，差異只有一行程式碼，不需要將兩者的程式碼都貼上來，可以選擇附上對應的 GitHub commit 的連結即可 > 了解，我改用 git diff 比較前後差異。 ### Reviewed by `chishuo9810` 1. 在 `q_remove_head` 中的敘述，remove 應改成移除可以降低中英夾雜的問題。 > 由於這邊是要 remove ，不是要刪除，因此利用 list_del 將開頭移除。最後如果 sp 不是空的要把 value 存入。 2. 看完你的 HackMD，函式方法沒什麼問題，但在閱讀你的敘述時有點困難。撇除如上一點提到少數中英夾雜，在描述函式實做方法時出現大量用變數來描述的語句，例如 `q_delete_dup`： ``` 先確認 head 是否是空的及 head 是否有指向一個 list_head ，接著找到 e1 跟 e2 ，如果 e1 跟 e2 相等，把 e2 刪除並且找下一個 e2，如果 e1 跟 e2 還是相同，繼續刪除 e2 ，反之停止刪除 e2 再刪除 e1 。 ``` 這樣的敘述只有熟悉程式碼的人才能直接看懂，如果要解釋實作方法的話應該先用中文描述一次在解釋程式執行流程，否則閱讀到一半還需要上下對照變數名稱代表意義挺讓人費神。 > 以作更新。 ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 12 On-line CPU(s) list: 0-11 Vendor ID: GenuineIntel Model name: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz CPU family: 6 Model: 165 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 1 Stepping: 2 BogoMIPS: 5184.01 ``` ## 實作指定佇列的操作 <s>在實作每一個 function 之前，需要閱讀 `queue.h` 對於每一個 function 的要求。</s> 在實作每一個函式之前，需要閱讀 `queue.h` 檔案對於每一個函式的定義。 :::danger 避免過多的中英文混用，已有明確翻譯詞彙者，例如「鏈結串列」(linked list) 和「佇列」(queue)，就使用該中文詞彙，英文則留給變數名稱、人名，或者缺乏通用翻譯詞彙的場景。 $\to$ [資訊科技詞彙翻譯](https://hackmd.io/@sysprog/it-vocabulary) > 了解，我會避免中英文混用。 ::: ### `q_new` 建立一個空佇列，其 next 及 prev 指標指向自身。 ```c struct list_head *new_node = malloc(sizeof(struct list_head)); INIT_LIST_HEAD(new_node); ``` > commit [cc0c3e3](https://github.com/kyehuang/lab0-c/commit/cc0c3e3de471d6718f480d3ba9b5ea81fab0fed2) :::danger 改進漢語表達。不只是書寫程式碼註解的方式，應該描述你的考量因素、做法，還有探討作法的優劣 (如果有的話)。養成撰寫專業文件的習慣，即使這只是一份課堂作業。 > 收到，我會更改漢語表達的方式。 ::: ### `q_free` 先確認 `l` 不是 `NULL` ，走訪所有節點並釋放空間，最後再釋放 `l`。 ```c if (!l) return; struct list_head *next = l->next; while (next != l) { element_t *node = list_entry(next, element_t, list); q_release_element(node); next = next->next; } free(l); ``` > commit [7665473](https://github.com/kyehuang/lab0-c/commit/7665473d73abfb13f584c07d4c48b384d6a51127) 在閱讀 [你所不知道的 C 語言: linked list 和非連續記憶體](https://hackmd.io/@sysprog/c-linked-list)時，發現若要移除中間的節點，需要先移動到下一個節點，因此將 `q_release_element` 移動到 `next = next->next;` 下方。 ```diff struct list_head *next = l->next; while (next != l) { element_t *node = list_entry(next, element_t, list); - q_release_element(node); next = next->next; + q_release_element(node); } free(l); } ``` > commit [73ea69c](https://github.com/kyehuang/lab0-c/commit/73ea69c366463d88e62537c5f471435567d52342) ### `q_insert_head` 和 `q_insert_tail` > 由於 `q_insert_head` 和 `q_insert_tail` 實作手法類似，因此放在一起。 `q_insert_head` 插入一個元素在佇列的頭部。 `q_insert_tail` 插入一個元素在佇列的尾端。首先建立一個 `element_t` 的新節點，然後利用 `list_add` 將它插入到佇列的頭部。如果是 `q_insert_tail` 改用 `list_add_tail` 。 ```c if (!head) return false; element_t *new_node = malloc(sizeof(element_t)); char *tmp = strdup(s); if (!new_node || !tmp) { free(new_node); return false; } new_node->value = tmp; list_add(&new_node->list, head); return true; ``` `q_insert_tail` 改用 `list_add_tail` ```diff - list_add(&new_node->list, head); + list_add_tail(&new_node->list, head); ``` > commit [9b2d114](https://github.com/ChenFuhuangKye/lab0-c/commit/9b2d1141aa25bcad4413af2b5e3efad5d0b2a4f2) 發現 `q_insert_head` 和 `q_insert_tail` 沒有考慮周全，有做更改。 ```diff - if (!head) + if (!head || !s) return false; element_t *new_node = malloc(sizeof(element_t)); + if (!new_node) + return false; + char *tmp = strdup(s); - if (!new_node || !tmp) { + if (!tmp) { free(new_node); return false; ``` > commit [424e629](https://github.com/ChenFuhuangKye/lab0-c/commit/424e629a1e08638b1440971199bc1ebb74f6acd6) :::warning 如何重用更多程式碼？ > `q_insert_tail` 可以先利用 `q_insert_head` 將新節點插到頭部，再利用 `list_move_tail` 移到尾端。 ::: ```diff - if (!head || !s) - return false; - - element_t *new_node = malloc(sizeof(element_t)); - if (!new_node) - return false; - - char *tmp = strdup(s); - if (!tmp) { - free(new_node); - return false; - } - - new_node->value = tmp; - list_add_tail(&new_node->list, head); - + q_insert_head(head, s); + list_move_tail(head->next, head); ``` > commit [aa09579](https://github.com/ChenFuhuangKye/lab0-c/commit/aa09579fbbecd964f84a3439b837bef254f6464f) ### `q_remove_head` 利用 `list_head` 找到開頭。接著利用 `list_first_entry` 找到 `element_t` 的位置。由於這邊是要 remove ，不是要刪除，因此利用 `list_del` 將開頭移除。最後如果 `sp` 不是空的要把 `value` 存入。 ```c element_t *node = list_first_entry(head, element_t, list); list_del(&node->list); if (sp) { memcpy(sp, node->value, bufsize); sp[bufsize - 1] = '\0'; } return node; ``` ### `q_remove_tail` 跟 `q_remove_head` 作法類似，只是把 `list_first_entry` 改成 `list_last_entry` 找到結尾。 ``` c element_t *node = list_last_entry(head, element_t, list); list_del(&node->list); if (sp) { memcpy(sp, node->value, bufsize); sp[bufsize - 1] = '\0'; } return node; ``` > commit [a3dcdb8](https://github.com/ChenFuhuangKye/lab0-c/commit/a3dcdb8c8bdb9959771074b9caccd962301e42e3) 如同 `q_insert_tail` 一樣，`q_remove_tail` 可以再簡化。 ```diff + list_move(head->prev, head); - element_t *node = list_last_entry(head, element_t, list); - list_del(&node->list); - - if (sp) { - memcpy(sp, node->value, bufsize); - sp[bufsize - 1] = '\0'; - } - return node; + return q_remove_head(head, sp, bufsize); ``` > commit [bb92932](https://github.com/ChenFuhuangKye/lab0-c/commit/bb92932e0eef0c845deae696f4bfdd638c8c504b) ### `q_size` 利用 `list_for_each` 逐一走訪節點從而計算次數。 ```c int count = 0; struct list_head *node; list_for_each (node, head) count++; return count; ``` > commit [b97774f](https://github.com/ChenFuhuangKye/lab0-c/commit/b97774f0799800c42d640b21c7b22bb9a6b1096d) ### `q_delete_mid` 利用快慢指標找到中間的節點位置，再利用 `list_entry` 找到要釋放的 `element_t` 位置，先將節點移出佇列再釋放 `element_t`。 ```c struct list_head **indir = &head->next; for (struct list_head *fast = head->next; fast != head && fast->next != head; fast = fast->next->next) indir = &(*indir)->next; element_t *node = list_entry(*indir, element_t, list); list_del(*indir); q_release_element(node); return true; ``` > commit [2f9284a](https://github.com/ChenFuhuangKye/lab0-c/commit/2f9284a0c5edb51eb74b0690febbbc0e16946b7b) ### `q_delete_dup` :::danger - [ ] traverse (動詞) 和 traversal (名詞) 根據 Dictionary.com 的[解釋](https://www.dictionary.com/browse/traverse ): (作為及物動詞和不及物動詞都有類似的意思，以下列出作為及物動詞的寓意) * to pass or move over, along, or through. * to go to and fro over or along. 其實這意思很好懂，就像我們「走過」/「穿越」校園一般，於是 traverse a linked list 就會是「(用某種手段) 存取多個鏈結串列的節點」，但這裡卻沒有必要「所有」的範圍：英語的 "move over/through" 用於某個區域時，根本沒有這樣的隱含意義。如果將 traverse 翻譯為「遍歷」，就會導致「超譯」，也就是跳脫「直譯」和「意譯」。當我們回頭看 "traverse" 所在的技術描述內容，例如 "traverse every node"，若翻譯為「遍歷每個節點」，那麼既然都「遍」(意即「全面」、「到處」)，又何來「每個」節點呢？於是，合理的翻譯應改為「逐一走訪每個節點」 —— 差異在哪？在 "traverse every node" 的應用場景中，可能是我們嘗試在鏈結串列尋找特定的節點內含的資料，一旦找到就停止，或者我們要偵測給定的鏈結串列是否包含環狀 (circular) 結構，並沒有真的要「遍」(到處/全面)「歷」(意即「經過」) 每個節點。在我們的用語中，要區分「意圖」(intention) 和「實際作用」(reaction)，濫用「遍歷」會使得語意不清，從而難以推測英語原文的訴求。還有個更重要的原因是，「遍歷」這詞已在理工領域存在，且廣泛使用，即「遍歷理論」(Ergodic theory)，後者是研究具有不變測度 (invariant measure) 的動力系統及其相關問題的一個數學分支。遍歷理論研究遍歷變換，由試圖證明統計物理中的遍歷假設 (Ergodic hypothesis) 演進而來。在統計學中，若單一個物理系統在不同時間內重複相同的實驗 (如丟擲一枚硬幣)，其取樣數據所得的統計結果 (如硬幣出現正面的機率) 和極多個完全相同的物理系統的系集 (如丟極多個相同的硬幣) 在同時作相同的實驗取樣數據的統計結果假設為相同時，此種假設即稱為「遍歷性假設」或「遍歷假設」。基於這個假設，對時間平均的統計方式及結果，便可由對系集的平均及統計方式得到。在一般物理系統中，尤其在統計力學範圖中，均採用此遍歷性假設為基本的假設。在流體力學中對亂流的實驗分析，亦是採用此假設。遍歷 (Ergodic) 源於以下二個希臘詞： * ἔργον (érgon) (對應英語的 work) * ὁδός (hodós) (對應英語的 path 或 way) 最初這是由奧地利物理學家波茲曼 ([Ludwig Boltzmann](https://en.wikipedia.org/wiki/Ludwig_Boltzmann)) 於統計力學領域 (statistical mechanics) 提出的概念，其一廣為人知的結果是：在經過長時間後，時間平均將會趨近空間平均。此事實在動力系統扮演極重要的角色，在隨機分析領域亦然。因此，若貿然將 traverse 翻譯為「遍歷」，一來語意不清，二來增加和理工多項領域專業人員的溝通成本，實在得不償失。 $\to$ [資訊科技詞彙翻譯](https://hackmd.io/@sysprog/it-vocabulary) ::: 刪除相同節點直到有新的節點出現。<s>遍歷</s> 逐一走過每個節點，當遇到具有相同值節點時，會刪除節點，直到遇見不同值得節點為止。 ```c struct list_head *node = head->next; element_t *e1; struct list_head *safe; while (node && node != head && node->next != head) { e1 = list_entry(node, element_t, list); element_t *e2 = list_entry(node->next, element_t, list); safe = node->next; if (!strcmp(e1->value, e2->value)) { do { list_del(&(e2->list)); safe = safe->next; q_release_element(e2); if (safe == head) break; e2 = list_entry(safe, element_t, list); } while (!strcmp(e1->value, e2->value)); list_del(&e1->list); q_release_element(e1); } node = safe; } ``` > commit [7e16a81](https://github.com/ChenFuhuangKye/lab0-c/commit/7e16a81ccb91ad96d1fd0e6802bce2f2b042b772) ### `quick_sort` 實作 `q_sort` 之前，因為第一周上課時有提到 `quick_sort` ，我先決定使用 `quick_sort` 當作排序演算法，沒有考慮其時間複雜度問題。判斷新的 `node` 跟 `pivot` 的關係，分別插入 `pivot` 前後，接著重複呼叫 `quick_sort` 。 ```c void quick_sort(struct list_head *head, struct list_head *end) { struct list_head *safe; struct list_head *pivot = head->next; char *value = list_entry(pivot, element_t, list)->value; struct list_head *node = pivot->next; while (node != end) { element_t *elem = list_entry(node, element_t, list); safe = node->next; if (strcmp(elem->value, value) < 0) { list_move_tail(node, pivot); } else { list_move(node, pivot); } node = safe; } if (head->next != pivot && head->next->next != pivot) quick_sort(head, pivot); if (pivot->next != end && pivot->next->next != end) quick_sort(pivot, end); } ``` > commit [ef03a10](https://github.com/ChenFuhuangKye/lab0-c/commit/ef03a106bb8d813087f303f082a024854bc54f4f) 我可以成功使用 sort 功能，但是複雜度接近 $O(n^2)$ 太高了，改用 `merge_sort`。 :::danger 補上對應的數學分析。 ::: #### 時間複雜度分析 1. 最差狀況: $O(n^2)$: 最差的分割序列狀況發生在挑選的 pivot 總是最大或最小值，每一個節點都需要作一次分割，才能排序成功，因此最尾端節點需要經過 $n$ 次、次尾端節點要經過 $(n-1)$ 次 ... 第一個節點需要經過 $1$ 次，總共需要 $[n+(n-1)+…+1] = [n*(n+1)]/2 = (n^2+n)/2$ ， $O((n^2+n)/2) = O(n^2)$ 2. 最佳狀況: $O(n*log(n))$: 最佳情況則發生在每次 pivot 都可以順利選到序列的中位數，每次的分割長度都會減半，經過 $log(n)$ 次長度就會變為 1 ，每次分割後排序 $n$ 筆資料，因此最佳狀況為 $O(n*log(n))$ 。在 git commit 時，pre-commit hook 出現以下錯誤。看似 `pivot` 沒有移動，但是我利用 `list_move` 調整了 `list_head` 結構，導致錯誤訊息，因此我使用 `git commit -a --no-verify` 暫停檢查。 ```bash $ git commit -a Following files need to be cleaned up: queue.c queue.c:204:20: style: Condition 'head->next!=pivot' is always false [knownConditionTrueFalse] if (head->next != pivot && head->next->next != pivot) ^ queue.c:191:29: note: pivot is assigned 'head->next' here. struct list_head *pivot = head->next; ^ queue.c:204:20: note: Condition 'head->next!=pivot' is always false if (head->next != pivot && head->next->next != pivot) ^ Fail to pass static analysis. ``` ### `merge_sort` 分成兩步驟 :::danger recursive 的翻譯是「遞迴」 ::: 第一步驟將佇列等分為左右兩個長度相等的佇列。使用<s>遞歸</s> 遞迴方法，將原始佇列分成兩半，直到每個子佇列都只包含一個元素或沒有元素。第二步驟根據大小合併左右兩個佇列。直到所有佇列都被完全合併為止。 ```c list_cut_position(&left, head, mid); list_splice_init(head, &right); merge_sort(&left, descend); merge_sort(&right, descend); ``` ```c while (!list_empty(left) && !list_empty(right)) { element_t *left_entry = list_entry(left->next, element_t, list); element_t *right_entry = list_entry(right->next, element_t, list); if (descend ? strcmp(left_entry->value, right_entry->value) > 0 : strcmp(left_entry->value, right_entry->value) < 0) { list_move_tail(left->next, &tmp); } else { list_move_tail(right->next, &tmp); } } list_splice_tail_init(list_empty(left) ? right : left, &tmp); ``` > commit [569eced](https://github.com/ChenFuhuangKye/lab0-c/commit/569eced55ffff0b2045118f3a73f24e1002ceaff) ### `q_swap` 利用 `list_move` 移動兩個相鄰節點。 ```c if (!head || list_empty(head)) return; struct list_head *cur = head->next; while (cur != head && cur->next != head) { list_move(cur, cur->next); cur = cur->next; } ``` > commit [af248c0](https://github.com/ChenFuhuangKye/lab0-c/commit/af248c0af05ea843156ebb7838926f1064205771) 改用 List API ，簡化程式碼。 ```diff - struct list_head *cur = head->next; - while (cur != head && cur->next != head) { - list_move(cur, cur->next); - cur = cur->next; + struct list_head *node; + struct list_head *safe; + + list_for_each_safe (node, safe, head) { + if (safe == head) + break; + list_move(node, safe); + safe = node->next; ``` > commit [821ed78](https://github.com/ChenFuhuangKye/lab0-c/commit/821ed78e98637a26a78b1caf9556ad36adbabbc1) ### q_reverse 逐一走過佇列上所有節點，從開頭的每個節點移動到尾端的前一個位置上，並且在每次移動後更新尾端節點，使得每個後續移動的節點都插入到前一個移動節點前面。 :::danger 改進你的漢語表達。 ::: ```c struct list_head *end = head; struct list_head *node = head->next; struct list_head *safe = node->next; while (node != end && safe != end) { list_move_tail(node, end); end = end->prev; node = safe; safe = safe->next; } ``` > commit [2ce8e94](https://github.com/ChenFuhuangKye/lab0-c/commit/2ce8e94b7e6bf990e47862c2bab1da61a7441764) 後來思考 `q_reverse` 的方法跟 `q_swap` 類似，只要將每個 `node` 移到 `head` 後面就好了。 ```diff - struct list_head *end = head; - struct list_head *node = head->next; - struct list_head *safe = node->next; + struct list_head *node; + struct list_head *safe; - while (node != end && safe != end) { - list_move_tail(node, end); - end = end->prev; - node = safe; - safe = safe->next; + list_for_each_safe (node, safe, head) { + list_move(node, head); } ``` > commit [fbcf2c9](https://github.com/ChenFuhuangKye/lab0-c/commit/fbcf2c937565ee5d05809d43fec54fb2b6530747) ### `q_reverseK` 跟 `q_reverse` 一樣，只是利用 Sliding Window 限制 reverse 範圍。 ```c struct list_head *node; struct list_head *safe; struct list_head *start = head; int count = 0; list_for_each_safe (node, safe, head) { if (count == k) { start = node->prev; count = 0; } list_move(node, start); count++; } ``` > commit [31ec1dd](https://github.com/ChenFuhuangKye/lab0-c/commit/31ec1ddd76cbcacc3b13c63f44f1474245df83ab) ### `q_ascend` 從 `list_head` 後端往前面檢查，只要其值比前面小就刪除該節點。 > 實作時發現要求是 `remove` node ，然而實際上要的是 `delete` node。 ```c struct list_head *node = head->prev; struct list_head *safe = node->prev; while (node != head && safe != head) { element_t *cur = list_entry(node, element_t, list); element_t *front = list_entry(safe, element_t, list); if (strcmp(front->value, cur->value) > 0) { list_del(node); q_release_element(cur); } node = safe; safe = safe->prev; } ``` > commit [b9474e3](https://github.com/ChenFuhuangKye/lab0-c/commit/b9474e30ee076943354d7fcd1bd6c9f23e653a55) > ### `q_descend` 跟 `q_ascend` 作法類似，從後端檢查，但是因為改成要刪除前面的節點，因此 `node` 的位置沒有改變，而是要將 `safe` 指到 `node` 前方。 ```c struct list_head *node = head->prev; struct list_head *safe = node->prev; while (node != head && safe != head) { element_t *cur = list_entry(node, element_t, list); element_t *front = list_entry(safe, element_t, list); if (strcmp(front->value, cur->value) < 0) { list_del(safe); q_release_element(front); safe = node->prev; } else { node = safe; safe = safe->prev; } } ``` > commit [baa34a8](https://github.com/ChenFuhuangKye/lab0-c/commit/baa34a86f648de40b791d267b79fb9691be4981c) ### `q_merge` 先將所有佇列串在一起，再利用 `q_sort` 排序。 ```c list_for_each_safe (node, safe, head) { if (node == head->next) continue; queue_contex_t *cur_q = list_entry(node, queue_contex_t, chain); list_splice_init(cur_q->q, first_q->q); } q_sort(first_q->q, descend); ``` > commit [e159def](https://github.com/ChenFuhuangKye/lab0-c/commit/e159def004c906d88870685444509a60a5e3ebd6) 完成實作指定佇列的操作，拿到 95 分。 ``` --- TOTAL 95/100 ``` 根據作業提示，下一步驟閱讀 [Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf) --- ## 閱讀 `Dude, is my code constant time?` 論文論文介紹了一個名為 `dudect` 的工具，用於評估程式碼是否在同一平台上以恆定時間執行。首先量測兩組不同輸入資料的執行時間，然後透過統計測試來檢查這兩種時間分佈是否存在統計上的差異。 :::danger 列出論文和實作程式碼的出入。 ::: ### 理解 `lab0` 如何判定恆定時間在 `fixture.c` 中有一個 `doit` ，是用來判斷是否為恆定時間。首先呼叫 `prepare_inputs` 準備輸入資料。第二步呼叫 `measure` ， `measure` 中有一個 `mode` 參數用來決定要判定哪個 function 為恆定時間，並得到執行前後的 `cpucycle` 。第三步呼叫 `differentiate` ，用於計算執行時間。第四步呼叫 `update_statistics` ，將 cpu cycle counter 非正常資料去除。最後呼叫 `report` ，計算出 `Welford method` 的 t 值， t 值用於統計數據確定數據之間的平均直是否存在顯著差異。如果 t 值大於 threshold 及判定不是恆定時間。相比於 [dudect](https://github.com/oreparaz/dudect/tree/master) 少了 `prepare_percentiles` ，將測量結果進行預處理，丟棄超過特定百分比的測量結果。將 `prepare_percentiles` 加入 `fixture.c` ，就可以拿到 100 分。 ![image](https://hackmd.io/_uploads/HkbIe9lpT.png) > commit [1e1f081](https://github.com/ChenFuhuangKye/lab0-c/commit/1e1f081a52a70f62abfb0c89725514687832237c) --- ## 研讀 Linux 核心原始程式碼的 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) 先來看到 list_sort 的參數的意義。 * `priv` : `list_sort` 函式不會去使用，而是傳遞給 `cmp` 比較函式，可以用於傳遞需要在比較函式中使用的額外訊息。 * `head` : 為佇列的開頭。 * `cmp` : 為一個比較函式，用於比較兩個元素之間的排列順序，並回傳一個整數來表示兩個元素的比較結果。由於 `head` 跟 `cmp` 不能為 NULL ，加上 `__attribute__((nonnull(2, 3)))` 讓 compiler 可以發出警告。 ```c __attribute__((nonnull(2, 3))) void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp) ``` list_sort 初始值 * `list` : 這是一個指向佇列中第一個節點的指標，讓我們可以訪問佇列中的第一個節點，進而<s>遍歷</s> ??? 整個佇列。 * `pending` : 這是一個指向待處理節點的指標，其初始值為 NULL 。 * `count` : 這是一個整數值，用來紀錄待處理節點的數量。 list_sort 有分成三個區塊。 1. 遍歷所有節點，並把節點逐一加入 `pending` ，當 `pending` 的數量 + 1 後為 2 的冪就不合併，反之合併。 2. 當 `list` 中沒有節點時，將 `pending` 中所有節點合併。 3. 最後整理成環狀佇列。 ### 執行步驟 #### `在 do while` 之前 ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head", style="bold"] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] head -> node6 node6 -> node5 node5 -> node4 node4 -> node3 node3 -> node2 node2 -> node1 list -> node6:n } ``` #### count = 1 加入 6 到 `pending` ```graphviz digraph ele_list { rankdir=LR node[shape=record] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] node5 -> node4 node4 -> node3 node3 -> node2 node2 -> node1 list -> node5:w pending -> node6:w } ``` #### count = 2 加入 5 到 `pending` ```graphviz digraph ele_list { rankdir=LR node[shape=record] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] node5 -> node6 node4 -> node3 node3 -> node2 node2 -> node1 list -> node4:w pending -> node5:w } ``` #### count = 3 加入 4 到 `pending` ```graphviz digraph ele_list { rankdir=LR node[shape=record] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] node5 -> node6 node4 -> node5 node3 -> node2 node2 -> node1 list -> node3:w pending -> node4:w } ``` #### count = 4 加入 3 到 `pending` ，5 節點的分支與 6 節點的分支合併 ```graphviz digraph ele_list { rankdir=LR node[shape=record] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] node5:s -> node6:n node4 -> node5 node3 -> node4 node2 -> node1 list -> node2:w pending -> node3:w } ``` #### count = 5 加入 2 到 `pending` ， 3 節點的分支與 4 節點的分支合併。 ```graphviz digraph ele_list { rankdir=LR node[shape=record] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] node5:s -> node6:n node3 -> node5 node3:s -> node4:n node2 -> node3 list -> node1:w pending -> node2:w } ``` #### count = 6 加入 1 到 `pending` ， 3 節點的分支與 5 節點的分支合併。 ```graphviz digraph ele_list { rankdir=LR node[shape=record] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] node5:s -> node6:n node4:s -> node5:n node3:s -> node4:n node2 -> node3 node1 -> node2 list pending -> node1:w } ``` #### 當 `list` 沒有節點將 1 節點的分支、 2 節點的分支與 3 節點的分支合併。 #### 最後將頭尾接上 ### 實作比較用的函式我在 `qtest.c` 新增比較用的函式，可以透過 `descend` 調整排序方式。 ```c __attribute__((nonnull(2, 3))) int list_cmp(void *priv, const struct list_head *a, const struct list_head *b) { element_t *left = list_entry(a, element_t, list); element_t *right = list_entry(b, element_t, list); return descend ? strcmp(right->value, left->value) : strcmp(left->value, right->value); } ``` ### 比較執行時間測試 sort (merge sort)、 list_sort 以及 tim_sort 執行時間，分別測試 100K 筆到 1M 筆，每筆測試 30 次，並計算他們平均的執行時間。 > commit [abb9365](https://github.com/ChenFuhuangKye/lab0-c/commit/abb93657fec67cbdffe205ac19617a7641dcc5e7) | 排序實作 | 100K 筆執行時間 | 200K 筆執行時間 | 300K 筆執行時間 | 400K 筆執行時間 | 500K 筆執行時間 | 600K 筆執行時間 | 700K 筆執行時間 | 800K 筆執行時間 | 900K 筆執行時間 | 1M 筆執行時間 | | -------- | -------- | -------- |-------- |-------- |-------- |-------- |-------- |-------- |-------- |-------- | | sort | 0.0537 | 0.13203333333333334|0.21410000000000007|0.30246666666666666| 0.3918666666666668| 0.4839| 0.5791666666666665| 0.6766333333333333| 0.7736333333333333|0.8722333333333332 | |list_sort | 0.05236666666666668| 0.1303333333333333| 0.21126666666666674| 0.2986666666666667| 0.38563333333333344|0.4787666666666667| 0.5730666666666665| 0.6686666666666666|0.7642666666666665| 0.8581333333333333| |tim_sort | 0.05010000000000001| 0.12553333333333336| 0.2069| 0.28646666666666665| 0.38649999999999995| 0.4716999999999999| 0.5545000000000002| 0.6417000000000002|0.7665| 0.8676333333333334| ![image](https://hackmd.io/_uploads/SJ44PRURp.png) list_sort 的執行時間略快於我寫的 sort ，推測是因為 list_sort 不會等到全部元素分割成單一節點才進行合併，可以減少合併的數量進而提升執行效率， tim_sort 較擅長處理部分排序的狀況，因此其執行時間易受到資料排序狀況影響。 :::danger 你準備的測試資料是否涵蓋合併排序演算法的最差狀況？ ::: --- ## 在 qtest 提供新的命令 `shuffle` 1. 定義 `swap` 函式 : 交換鏈結中兩個節點。 2. 實作 `shuffle` 函式 : 根據 Fisher-Yates 洗牌算法，從鏈結串列的尾端開始，隨機選擇一個元素與之交換，直到達到鏈結的開頭。 ```c int size = q_size(head)-1; struct list_head *tail = head->prev; while (size) { int i = rand() % size; int count = 0; struct list_head *tmp; list_for_each(tmp, head) { if(count == i) break; count++; } swap(tmp, tail); size--; tail = tmp->prev; } ``` 3. 新增 `do_shuffle` 函式：這個函式將會先檢查鏈結中的元素數量。如果鏈結是空的或者只有一個元素，則不需要進行洗牌。如果鏈結串列中有兩個或更多的節點，則呼叫 `shuffle` 函式進行洗牌。 4. 使用 `ADD_COMMAND` 新增 `shuffle` 命令，這將允許使用者透過 `shuffle` 命令觸發 `do_shuffle` 函式。 > commit [2c56b97](https://github.com/ChenFuhuangKye/lab0-c/commit/2c56b97663572e1e9f35014d88f16804d01b6b87) ### 觀察 shuffle 1M 次結果 :::danger 以數學分析來討論，需要樣本空間應該要多大，才能討論是否能代表資料分布？ ::: ![image](https://hackmd.io/_uploads/BJ09wHORT.png) ``` Expectation: 41666 Observation: {'1234': 41392, '1243': 41907, '1324': 41598, '1342': 41563, '1423': 42164, '1432': 41460, '2134': 41834, '2143': 41677, '2314': 41740, '2341': 41727, '2413': 41207, '2431': 41311, '3124': 41722, '3142': 41696, '3214': 41826, '3241': 41624, '3412': 41788, '3421': 41507, '4123': 41514, '4132': 41613, '4213': 41921, '4231': 41857, '4312': 41867, '4321': 41485} chi square sum: 26.045792732683726 ``` 每種狀況出現的次數，幾乎相同。 ## rebase 專案首先加入上游的專案 ```bash $ git add remote sysprog https://github.com/sysprog21/lab0-c $ git remote origin sysprog ``` 更新上游專案的更改，並 rebase 。 ```bash $ git fetch sysprog21 $ git rebase sysprog/master ``` 此時在 q_free 有衝突，處理完衝突之後再繼續 rebase ，最後用 `git push --force` 。 ```bash $ git rebase --continue $ git push --force ``` 在 [commit](https://github.com/ChenFuhuangKye/lab0-c/commits/master/) 紀錄上可以看到我的分支在 [Merge pull request sysprog21#159 from komark06/master](https://github.com/ChenFuhuangKye/lab0-c/commit/bbd4243526e0aa29fcb704bca164006329f8064d) 分支之後