2024q1 Homework1 (lab0)

# 2024q1 Homework1 (lab0) contributed by <`jychen0611` > ### Reviewed by `jimmy01240397` 1. 善用巨集展開以減少重複的程式碼，例如： ```c #define q_remove_base(head, sp, bufsize, from) \ if (!head || list_empty(head)) \ return NULL; \ element_t *tmp = list_first_entry(head, element_t, list); \ strncpy(sp, tmp->value, bufsize); \ sp[bufsize - 1] = '\0'; \ list_del(&tmp->list); \ return tmp; element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize) { q_remove_base(head, sp, bufsize, first); } element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize) { q_remove_base(head, sp, bufsize, last); } ``` - q_insert_head - q_insert_tail - q_remove_head - q_remove_tail - q_ascend - q_descend :::success :monkey: ***[commit e674e7b](https://github.com/sysprog21/lab0-c/commit/e674e7bf903c947336ec0789ef3e5b0209aa3428)*** Add multiple sets of function macros [What?] 採納 `jimmy01240397` 的建議將同類型的函式寫成巨集 [Why?] 原程式碼存在重複性極高的函式，可用巨集整合之 [How?] 以下依函式類型兩兩說明 (1) q_insert_head 和 q_insert_tail 使用巨集 `q_insert_base` 利用 `to` 判別是 q_insert_head 還是 q_insert_tail 使用，若傳入 head ，則表 q_insert_head ，需使用 `list_add`。反之則使用 `list_add_tail`。 ```diff +#define q_insert_base(head, sp, to) \ . . + !strcmp(to, "head") ? list_add(&tmp->list, head) : list_add_tail(&tmp->list, head); \ ``` (2) q_remove_head 和 q_remove_tail 使用巨集 `q_remove_base` 利用 `from` 判別是 q_remove_head 還是 q_remove_tail 使用，若傳入 first ，則表 q_remove_head ，需使用 `list_first_entry`。反之則使用 `list_last_entry`。 ```diff #define q_remove_base(head, sp, bufsize, from) \ . . + element_t *tmp = !strcmp(from, "first") ? list_first_entry(head, element_t, list) : list_last_entry(head, element_t, list); \ ``` (3) q_ascend 和 q_descend 使用巨集 `q_scend_base` 利用 `dir` 判別是 q_ascend 還是 q_descend 使用，若傳入 ascend ，則表 q_ascend ，需使用 `strcmp(cur, t->value) >= 0`。反之則使用 `strcmp(cur, t->value) <= 0`。 ```diff +#define q_scend_base(head, dir) \ . . + if (strcmp(dir, "ascend") ? strcmp(cur, t->value) >= 0 : strcmp(cur, t->value) <= 0) {\ ``` ::: 2. 刪除部分非必要的分支 - [PR](https://github.com/jychen0611/lab0-c/pull/1) :::success :monkey: ***[commit b003cdc](https://github.com/sysprog21/lab0-c/commit/b003cdc9b30a7801e85ed5d51ab6f906b7b08729)*** Merge pull request #1 from chumytest/master [What?] 接受 `jimmy01240397` 的 pull request [Why?] 原程式碼不夠精簡，且有多餘判斷式 [How?] 以下依不同函式更動逐一說明 (1) q_delete_mid 在此無須檢查節點數量是否只有一個。若只有一個節點，則 slow 會指到這個節點，且不會進入 while 迴圈，最終會刪除 slow 所指的這個唯一的節點。 ```diff! - if (list_is_singular(head)) { - struct list_head *del = head->next; - list_del(del); - q_release_element(list_entry(del, element_t, list)); - } ``` (2) q_delete_dup 新增變數 `nowdel` 表當前是否需刪除該節點，以簡化判斷式敘述。 ```diff - bool del = false; + bool del = false, nowdel = false; ``` 使用 `del` 紀錄 `nowdel` 的先前狀態，以刪除最後一個重複節點。 ```diff list_for_each_safe (l, r, head) { element_t *nl = list_entry(l, element_t, list); element_t *nr = list_entry(r, element_t, list); - if (l->next != head && !strcmp(nl->value, nr->value)) { - list_del(l); - q_release_element(nl); - del = true; - } else if (del) { + nowdel = r != head && !strcmp(nl->value, nr->value); + if (nowdel || del) { list_del(l); q_release_element(nl); - del = false; + del = nowdel; } } ``` (3) merge (用於 q_sort) 以 `((descend * 2) - 1) * strcmp(l->value, r->value) > 0` 判斷要優先將 left 的 element 放入佇列尾端，還是將 right 的 element 放入佇列尾端。 `((descend * 2) - 1)` 此處理方式可使 descend 值從原本的 0/1 表示轉成 -1/+1 表示。因此 * `((descend * 2) - 1)`為 +1 表 descend * `((descend * 2) - 1)`為 -1 表 ascend eg. 若 `strcmp(l->value, r->value) > 0` (l>r) 且 `((descend * 2) - 1)`為 +1 (descend)，則優先放入 left 的 element。 ```diff while (!list_empty(left) && !list_empty(right)) { element_t *l = list_entry(left->next, element_t, list); element_t *r = list_entry(right->next, element_t, list); - if (descend) { - if (strcmp(l->value, r->value) > 0) { - list_move_tail(left->next, head); - } else { - list_move_tail(right->next, head); - } + if (((descend * 2) - 1) * strcmp(l->value, r->value) > 0) { + list_move_tail(left->next, head); } else { - if (strcmp(l->value, r->value) < 0) { - list_move_tail(left->next, head); - } else { - list_move_tail(right->next, head); - } + list_move_tail(right->next, head); } } ``` 最後是微幅調整，將`!list_empty(left)` 代換成 `list_empty(left)` ，`list_splice_tail(left, head)` 也隨之替換成 `list_splice_tail(right, head)` ```diff - if (!list_empty(left)) - list_splice_tail(left, head); - else + if (list_empty(left)) list_splice_tail(right, head); + else + list_splice_tail(left, head); ``` (4) q_merge 新增變數 `tmphead` 作為 `head->next` 的替代放入 `list_for_each` 的第二個輸入，待進入迴圈後，再將 `tmphead` 改為 `head` 。其用意為使 node 初始化為 `head->next->next` ，這樣便能跳過第一個佇列，不必額外去檢查當前是否為第一個佇列。刪除 `if (list_empty(another_q->q)) continue;` 不會影響程式運作，不需要做此額外判斷。 ```diff - struct list_head *tmp; - list_for_each (tmp, head) { - if (tmp == head->next) { - continue; - } + struct list_head *tmp, *tmphead = head->next; + list_for_each (tmp, tmphead) { + tmphead = head; queue_contex_t *another_q = list_entry(tmp, queue_contex_t, chain); - if (list_empty(another_q->q)) - continue; len += another_q->size; list_splice_init(another_q->q, first_q->q); } ``` ::: 然而我在修改 commit message 時遭遇以下問題新增的 `nowdel` 被系統指出範圍可以縮減，且初始化 `nowdel` 為 false 並無實質意義 ``` elian@elian-System-Product-Name:~/linux2024/lab0-c$ git rebase -i 2808dbedc22d4577bd0ecbd28862b6fb28fdefaa queue.c:156:23: style: The scope of the variable 'nowdel' can be reduced. [variableScope] bool del = false, nowdel = false; ^ queue.c:156:30: style: Variable 'nowdel' is assigned a value that is never used. [unreadVariable] bool del = false, nowdel = false; ^ Fail to pass static analysis. ``` 於是我做了以下修改 :::success :monkey: ***[commit 7d17d45](https://github.com/sysprog21/lab0-c/commit/7d17d450bc58b8c8b1ad5421216335d175404bb9)*** Correct the variable [What?] 修正變數 `nowdel` 的宣告方式 [Why?] `nowdel` 被系統指出範圍可以縮減，且初始化 `nowdel` 為 false 並無實質意義 [How?] 把 `nowdel` 宣告在 `list_for_each_safe` 迴圈內 ```diff - bool nowdel = false; list_for_each_safe (l, r, head) { element_t *nl = list_entry(l, element_t, list); element_t *nr = list_entry(r, element_t, list); - nowdel = r != head && !strcmp(nl->value, nr->value); + bool nowdel = r != head && !strcmp(nl->value, nr->value); if (nowdel|| del) { list_del(l); q_release_element(nl); del = nowdel; } } ``` ::: 經修改後，程式可正常 push 上去，但唯獨 pull request 的 commit message 無法做修改，仍會有上述問題。 ***待解*** 3. q_merge：直接將所有已經排序完成的鏈結串列接上再重新排序<s>有點不太</s> 優雅，應該有效能更好的解決方式。 :::danger 用字務必精準，既然說他人不好，那就提供好的做法給同學看。 :notes: jserv ::: :::success :monkey: ***[commit 0db6def](https://github.com/sysprog21/lab0-c/commit/0db6defde03ea52bbb01a9274a195c70a830557b)*** Improve the efficiency of q_merge [What?] 撰寫 `merge_with_first_q` 函式，使其他佇列能和第一個佇列兩兩合併，以改善 q_merge 效能。 [Why?] 比起直接將所有已經排序完成的鏈結串列接上再重新排序，有效能更好的解決方式。 [How?] 在 `merge_with_first_q` 函式中，使用佇列標頭變數 `tmp_head` 存放已排序的節點，其排序過程似 `merge` 函式，最終將排序好的節點接回第一個佇列。 ```diff list_for_each (tmp, tmphead) { tmphead = head; queue_contex_t *another_q = list_entry(tmp, queue_contex_t, chain); len += another_q->size; - list_splice_init(another_q->q, first_q->q); + merge_with_first_q(another_q->q, first_q->q, descend); } - q_sort(first_q->q, descend); ``` ::: ### Reviewed by `Kuanch` * 未完成 `lib/list_sort.c` 的效能測量，這部分實作不困難，使用 `perf` 及準備小規模獨立的測試即可，可參考 [q_sort 與 list_sort 效能比較](https://hackmd.io/QDxYHCXIQ7SQ-k7cM34Vcw?both#q_sort-%E8%88%87-list_sort-%E6%95%88%E8%83%BD%E6%AF%94%E8%BC%83)，亦可透過 `dudect/cpucycles.h`僅計算 cycle 數 :::success :monkey: ***[commit 2a8d392](https://github.com/jychen0611/lab0-c/commit/2a8d392d8803efaaac0451ba00ef46dbd149b9f8)*** [比較自己實作的 merge sort 和 Linux 核心程式碼之間效能落差](https://hackmd.io/NCV2VQAWQe60GtwtIjypjA?view#%E6%AF%94%E8%BC%83%E8%87%AA%E5%B7%B1%E5%AF%A6%E4%BD%9C%E7%9A%84-merge-sort-%E5%92%8C-Linux-%E6%A0%B8%E5%BF%83%E7%A8%8B%E5%BC%8F%E7%A2%BC%E4%B9%8B%E9%96%93%E6%95%88%E8%83%BD%E8%90%BD%E5%B7%AE) ::: * 研讀論文〈Dude, is my code constant time?〉[我](https://hackmd.io/QDxYHCXIQ7SQ-k7cM34Vcw?both#%E7%A0%94%E8%AE%80%E8%AB%96%E6%96%87%E3%80%88Dude-is-my-code-constant-time%E3%80%89)傾向至少將程式碼一一和文章對應，譬如它是如何計算執行時間、怎麼做後處理、如何計算統計量等等，再針對其中有興趣的程式碼深入探討；或參考 [vax-r](https://hackmd.io/@vax-r/linux2024-homework1#%E7%A0%94%E8%AE%80%E8%AB%96%E6%96%87%E3%80%88Dude-is-my-code-constant-time%E3%80%89)，他以記錄自身的問題以及解答撰寫本節 :::success :monkey: ***[commit e9eab10](https://github.com/jychen0611/lab0-c/commit/e9eab108d1e653618024676c81eaecec5473df87)*** [論文〈Dude, is my code constant time?〉](https://hackmd.io/NCV2VQAWQe60GtwtIjypjA?both#%E8%AB%96%E6%96%87%E3%80%88Dude-is-my-code-constant-time%E3%80%89) ::: * Git 部分，commit message 有落實 Why?What?How? 很難得，向你看齊 ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 46 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 32 On-line CPU(s) list: 0-31 Vendor ID: GenuineIntel Model name: 13th Gen Intel(R) Core(TM) i9-13900K CPU family: 6 Model: 183 Thread(s) per core: 2 Core(s) per socket: 24 Socket(s): 1 Stepping: 1 CPU max MHz: 5800.0000 CPU min MHz: 800.0000 BogoMIPS: 5990.40 ``` ## 針對佇列操作的程式碼實作 >依據上述指示著手修改 queue.[ch] 和連帶的檔案，測試後用 Git 管理各項修改，要能滿足 $ make test 自動評分系統的所有項目。 >* :warning: 在提交程式變更前，務必詳閱如何寫好 Git Commit Message >* 詳閱「你所不知道的 C 語言: linked list 和非連續記憶體」並應用裡頭提及的手法，例如 pointer to pointer (指向某個指標的指標，或說 indirect pointer)，讓程式碼更精簡且有效 >* 參閱「2023 年 Linux 核心設計/實作課程第一次作業檢討」，留意各式細節，唯有重視小處並步步為營，方可挑戰原始程式碼規模超越三千七百萬行的 Linux 核心 ### 結構體 ```c struct list_head { struct list_head *prev; struct list_head *next; }; typedef struct { char *value; struct list_head list; } element_t; ``` ### `q_new` :::success :monkey: ***[commit 0bf802e](https://github.com/sysprog21/lab0-c/commit/0bf802ef7337d9aa71d6ebb3d0cae4049b58d3f0)*** Implement q_new [What?] 初始化一個雙向環狀佇列 [Why?] 此功能可初始化一個標頭指標 h 以串接之後新增的節點 [How?] 宣告一個 struct list_head *h 並使用 malloc 函式分配記憶體給指標 h ，此指標可用來串街接之後新增的節點使用巨集 INIT_LIST_HEAD 初始化指標 h ::: 根據 C99 standard 的 malloc 描述如下： >`Synopsis` ```c #include <stdlib.h> void *malloc(size_t size); ``` >`Description` The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate. `Returns` The malloc function returns either a null pointer or a pointer to the allocated space. :::success :monkey: ***[commit bf419a8](https://github.com/sysprog21/lab0-c/commit/bf419a8694dd5a967f304513433cdf5ef6207bc4)*** Check null pointer [What?] 新增檢查 h 是否有成功被分配到記憶體空間 [Why?] malloc 之回傳可能為 null pointer [How?] 在呼叫 INIT_LIST_HEAD 前先檢查 h 是否為 NULL ::: 可知 malloc 之回傳可能為 null pointer 故在呼叫 `INIT_LIST_HEAD` 前須先檢查是否為 NULL ```diff struct list_head *h = (struct list_head *) malloc(sizeof(struct list_head)); - INIT_LIST_HEAD(h); + if (h) + INIT_LIST_HEAD(h); return h; ``` ### `q_free` :::success :monkey: ***[commit c3ed425](https://github.com/sysprog21/lab0-c/commit/c3ed42510dc4d7dec3cc48a60673a9a5e58f1821)*** Implement q_free [What?] 實作佇列刪除功能，並回收分配給該佇列的空間 [Why?] 使系統之後能用一個函式就刪除佇列 [How?] 遍歷佇列中所有節點釋放該空間，最後再釋放佇列標頭指標所佔空間 ::: 根據 C99 standard 的 free 描述如下： >`Synopsis` ```c #include <stdlib.h> void free(void *ptr); ``` >`Description` The free function causes the space pointed to by ptr to be deallocated, that is, made available for further allocation. If ptr is a null pointer, no action occurs. Otherwise, if the argument does not match a pointer earlier returned by the calloc, malloc, or realloc function, or if the space has been deallocated by a call to free or realloc, the behavior is undefined. `Returns` The free function returns no value. 使用 `list_for_each_entry_safe` 巨集以逐一釋放 entry 空間 ```c /** * list_for_each_entry_safe - Iterate over list entries and allow deletes * @entry: pointer used as iterator * @safe: @type pointer used to store info for next entry in list * @head: pointer to the head of the list * @member: name of the list_head member variable in struct type of @entry * * The current node (iterator) is allowed to be removed from the list. Any * other modifications to the the list will cause undefined behavior. * * FIXME: remove dependency of __typeof__ extension */ #define list_for_each_entry_safe(entry, safe, head, member) \ for (entry = list_entry((head)->next, __typeof__(*entry), member), \ safe = list_entry(entry->member.next, __typeof__(*entry), member); \ &entry->member != (head); entry = safe, \ safe = list_entry(safe->member.next, __typeof__(*entry), member)) ``` 其中，`list_entry` 表 `container_of` ```c /** * list_entry() - Get the entry for this node * @node: pointer to list node * @type: type of the entry containing the list node * @member: name of the list_head member variable in struct @type * * Return: @type pointer of entry containing node */ #define list_entry(node, type, member) container_of(node, type, member) ``` 參考自　[Linux 核心原始程式碼巨集: container_of](https://hackmd.io/@sysprog/linux-macro-containerof) 巨集 container_of 在 offsetof 的基礎上，擴充為「給定成員的位址、struct 的型態，及成員的名稱，傳回此 struct 的位址」 ![image](https://hackmd.io/_uploads/H1nGa2YhT.png) ```diff! if (!l) return; + element_t *entry, *safe; + list_for_each_entry_safe (entry, safe, l, list) { + free(entry); + } free(l); ``` :::success :monkey: ***[commit c5ff2aa](https://github.com/sysprog21/lab0-c/commit/c5ff2aafd16f03f8b03e883d0582d5b17345aa44)*** Replace free() with q_release_element() on q_free [What?] 釋放節點中 value 空間 [Why?] 原程式並無釋放節點中的字元指標 value 所佔用的空間，因此做以下修正 [How?] 以 q_release_element() 取代 free() ::: 原程式並無釋放節點中的字元指標 `value` 所佔用的空間，因此做以下修正 ```diff! - free(entry); + q_release_element(entry); ``` ### `q_insert_head` :::success :monkey: ***[commit a6d38b5](https://github.com/sysprog21/lab0-c/commit/a6d38b51d0a2b52a4cc608a13ba212f13a91804d)*** Implement q_insert_head, q_insert_tail [What?] 實作新增節點到佇列的頭或尾之功能 [Why?] 使佇列能新增帶有指定字串的節點到頭尾 [How?] 先確認佇列是否存在，再嘗試分配空間給新的節點 tmp ，若分配成功則將指定字串 s 放入節點，最後利用`list_add` 和`list_add_tail`將節點加入佇列 ::: 根據 C99 standard 的 strcpy 描述如下： >`Synopsis` ```c #include <string.h> char *strcpy(char * restrict s1, const char * restrict s2); ``` >`Description` The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined. `Returns` The strcpy function returns the value of s1. 使用 strcpy 後在 commit 時遭遇以下問題： ``` Following files need to be cleaned up: queue.c Dangerous function detected in /home/elian/linux2024/lab0-c/queue.c 44: strcpy(tmp->value, s); 57: strcpy(tmp->value, s); Read 'Common vulnerabilities guide for C programmers' carefully. https://security.web.cern.ch/security/recommendations/en/codetools/c.shtml ``` 根據訊息提供的 [Common vulnerabilities guide for C programmers](https://security.web.cern.ch/recommendations/en/codetools/c.shtml) >The strcpy built-in function does not check buffer lengths and may very well overwrite memory zone contiguous to the intended destination. In fact, the whole family of functions is similarly vulnerable: strcpy, strcat and strcmp. `strcpy` `strcat` `strcmp` 由於這些函式不會檢查緩衝區的長度，皆有可能發生相鄰記憶體區域覆寫的情形於是改用 `strncpy` 來實現，更改完就能 commit 了根據 C99 standard 的 strncpy 描述如下： >`Synopsis` ```c #include <string.h> char *strncpy(char * restrict s1, const char * restrict s2, size_t n); ``` >`Description` The strncpy function copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined. If the array pointed to by s2 is a string that is shorter than n characters, null characters are appended to the copy in the array pointed to by s1, until n characters in all have been written. `Returns` The strncpy function returns the value of s1. ```diff if (!head) return false; element_t *tmp = malloc(sizeof(element_t)); if (!tmp) return false; - strcpy(tmp->value, s); + strncpy(tmp->value, s, (strlen(s) + 1)); list_add(&tmp->list, head); ``` :::success :monkey: ***[commit 15790b1](https://github.com/sysprog21/lab0-c/commit/15790b1723dc617ba1eae60bd16bedc96efe5f06)*** Add allocation for new element's value [What?] 為新增節點的value配置空間 [Why?] 由於節點中的字元指標 value 也需配置空間，才能存放字串 s ，故新增此功能 [How?] 配置 strlen(s) + 1 的空間給節點的 value，並檢查是否配置成功，若失敗則是放節點空間，並回傳 false ::: 參考 [millaker](https://hackmd.io/@millaker/HyXNtXmna) 為新增節點的value配置空間 ```diff! + tmp->value = (char *) malloc(sizeof(char) * (strlen(s) + 1)); + if (!tmp->value) { + q_release_element(tmp); + return false; + } ``` ### `q_insert_tail` :::success :monkey: ***[commit a6d38b5](https://github.com/sysprog21/lab0-c/commit/a6d38b51d0a2b52a4cc608a13ba212f13a91804d)*** Implement q_insert_head, q_insert_tail [What?] 實作新增節點到佇列的頭或尾之功能 [Why?] 使佇列能新增帶有指定字串的節點到頭尾 [How?] 先確認佇列是否存在，再嘗試分配空間給新的節點 tmp ，若分配成功則將指定字串 s 放入節點，最後利用`list_add` 和`list_add_tail`將節點加入佇列 ::: ```diff if (!head) return false; element_t *tmp = malloc(sizeof(element_t)); if (!tmp) return false; - strcpy(tmp->value, s); + strncpy(tmp->value, s, (strlen(s) + 1)); list_add_tail(&tmp->list, head); ``` ### `q_remove_head` :::success :monkey: ***[commit ef95866](https://github.com/sysprog21/lab0-c/commit/ef95866b3fe34f24d5888f67b8e5cbccc0ffaaf0)*** Implement q_remove_head, q_remove_tail [What?] 實作從佇列頭或尾移除節點並回傳的函式 [Why?] 使節點可以從佇列頭尾移除 [How?] 先檢查佇列是否存在或是否為空，接著利用`list_first_entry` 或 `list_last_entry`取得佇列頭尾的節點，並將節點中的字串複製給 sp，最後利用`list_del` 將節點從佇列移除，回傳節點 ::: 使用`list_empty` 檢查佇列是否為空使用`list_first_entry` 取得佇列第一個節點 `strncpy`不會複製最後一個字元`\0` 使用`list_del` 將該節點從佇列中移除 ```diff if (!head || list_empty(head)) return NULL; element_t *tmp = list_first_entry(head, element_t, list); strncpy(sp, tmp->value, bufsize); + sp[bufsize - 1] = '\0'; list_del(&tmp->list); return tmp; ``` [改成巨集後](https://github.com/sysprog21/lab0-c/commit/e674e7bf903c947336ec0789ef3e5b0209aa3428)發現有以下問題 ``` Testing remove_head...(0/10) Segmentation fault occurred. You dereferenced a NULL or invalid pointer ``` :::success :monkey: ***[commit 0656c4c](https://github.com/sysprog21/lab0-c/commit/0656c4cff0d42436cdda8fefdf82dc7cbb4ef4f8)*** Modify the macro q_remove_base [What?]新增 `if (tmp && sp)` 判別式到巨集 `q_remove_base` [Why?]原先方式會存取到 NULL [How?]在使用 `tmp` 和 `sp` 前先檢查是否為空 ::: ### `q_remove_tail` :::success :monkey: ***[commit ef95866](https://github.com/sysprog21/lab0-c/commit/ef95866b3fe34f24d5888f67b8e5cbccc0ffaaf0)*** Implement q_remove_head, q_remove_tail [What?] 實作從佇列頭或尾移除節點並回傳的函式 [Why?] 使節點可以從佇列頭尾移除 [How?] 先檢查佇列是否存在或是否為空，接著利用`list_first_entry` 或 `list_last_entry`取得佇列頭尾的節點，並將節點中的字串複製給 sp，最後利用`list_del` 將節點從佇列移除，回傳節點 ::: 使用`list_last_entry` 取得佇列最後一個節點 ```diff if (!head || list_empty(head)) return NULL; + element_t *tmp = list_last_entry(head, element_t, list); strncpy(sp, tmp->value, bufsize); sp[bufsize - 1] = '\0'; list_del(&tmp->list); return tmp; ``` ### `q_size` :::success :monkey: ***[commit 42858f3](https://github.com/sysprog21/lab0-c/commit/42858f39d5bc538fea9411b7207cd91e1b1a5149)*** Implement function : q_size [What?] 實作取得佇列大小的函式 [Why?] 此函數可取得佇列大小 [How?] 使用了巨集 list_for_each 走訪所有節點，並利用 len 變數紀錄節點個數 ::: 使用了巨集 `list_for_each` 走訪所有節點 ```c /** * list_for_each - Iterate over list nodes * @node: list_head pointer used as iterator * @head: pointer to the head of the list * * The nodes and the head of the list must be kept unmodified while * iterating through it. Any modifications to the the list will cause undefined * behavior. */ #define list_for_each(node, head) \ for (node = (head)->next; node != (head); node = node->next) ``` ```diff if (!head) return 0; int len = 0; struct list_head *li; + list_for_each (li, head) + len++; return len; ``` :::warning HackMD 不是讓你張貼完整程式碼的地方，GitHub 才是！因此你在開發紀錄只該列出關鍵程式碼 (善用 diff 標示)，可附上對應 GitHub commit 的超連結，列出程式碼是為了「檢討」和「便於他人參與討論」，不是用來「假裝自己有付出」 ::: ### `q_delete_mid` #### <approach 1> :::success :monkey: ***[commit 093aebb](https://github.com/sysprog21/lab0-c/commit/093aebbc6650dd5d2977ae73ccadb8d9232901a8)*** Implement q_delete_mid (first approach) [What?] 實作刪除佇列中間節點之函式 [Why?] 使佇列能從中間移除節點 [How?] 第一個版本嘗試最簡單的方式，利用 `q_size` 取得佇列中間的位置，將指標移至中間刪除該節點 ::: 先嘗試最簡單的方式，利用`q_size`取得佇列中間的位置，將指標移至中間刪除該節點 ```diff! + int counter = q_size(head) / 2; + struct list_head *tmp = head->next; + while (counter--) { + tmp = tmp->next; + } list_del(tmp); q_release_element(list_entry(tmp, element_t, list)); ``` #### <approach 2> :::success :monkey: ***[commit 2808dbe](https://github.com/sysprog21/lab0-c/commit/2808dbedc22d4577bd0ecbd28862b6fb28fdefaa)*** Implement approach 2 of the q_delete_mid [What?] 提出另一個版本的方法 [Why?] 原方法需倚賴 q_size 函式來取得佇列大小，需要花一次遍歷佇列的時間，再走到佇列中間刪除節點。而新方法只需遍歷佇列一次即可。 [How?] 利用快慢指針的搭配找出中間節點 fast一次移動兩步，slow一次移動一步當 fast 走訪完整個佇列後，slow會剛好落在佇列中間的位置 ::: 方法二利用快慢指針的搭配找出中間節點 `fast`一次移動兩步，`slow`一次移動一步當 `fast` 走訪完整個佇列後，`slow`會剛好落在佇列中間的位置 ```diff! + struct list_head *fast = head->next->next; + struct list_head *slow = head->next; + while (fast != head && fast->next != head) { + fast = fast->next->next; + slow = slow->next; + } ``` ### `q_delete_dup` C99 Standard >`Synopsis` ```c #include <string.h> int strcmp(const char *s1, const char *s2); ``` >`Description` The strcmp function compares the string pointed to by s1 to the string pointed to by s2. `Returns` The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2. #### <approach 1.0> :::success :monkey: ***[commit fc79a87](https://github.com/sysprog21/lab0-c/commit/fc79a8790abf97518a0ed2b95e2ba4966b510907)*** Implement q_delete_dup [What?] 實作刪除佇列中連續相同節點之函式 [Why?] 使佇列可移除所有連續相同的節點 [How?] 利用 `list_for_each_safe` 走訪每個節點，發現有重複的節點便刪除之 ::: :::success :monkey: ***[commit 036b0e1](https://github.com/sysprog21/lab0-c/commit/036b0e104d7e4eaefa4ce6d3e33d42807da80ebf)*** Add flag on q_delete_dup [What?] 新增旗標`del_flag` 以刪除所有重複相鄰節點 [Why?] 原先我誤會題目要求，以為刪除重複相鄰節點到剩下一個即可。而題目真正的要求是要刪除所有重複相鄰節點，故作此修改 [How?] 走訪佇列節點的過程中，若發現有重複相鄰節點，則刪除其中一個，並將旗標 `del_flag` 設為true，以利刪除最後一個多餘節點 ::: 利用 `list_for_each_safe` 走訪每個節點，發現有重複的節點便刪除之 :::success :monkey: ***[commit c9cc2dc](https://github.com/sysprog21/lab0-c/commit/c9cc2dc1a6ca7668f30b0285f1ff18cae2733f92)*** Revise the q_delete_dup function [What?] 修改 q_delete_dup 函式，解決對 null pointer 取值之錯誤 [Why?] 執行 `list_for_each_entry_safe` 可以走訪整個 list 但是需要注意在走訪完時 n 會指向 head，且 head 是無法透過 `container_of` 找到節點的，因為它沒有嵌入到結構體中，這時如果對 n 取 `contain_of` 會拿到 NULL 如果再做取值 n->value 會報錯成為無效的記憶體操作。 [How?] 新增 l->next != head 條件，避免對 null pointer 取值 ::: 參考 [chiangkd](https://hackmd.io/@chiangkd/2023spring-lab0-c#q_delete_dup) 新增 `l->next != head` 條件，避免對 null pointer 取值 > 執行 list_for_each_entry_safe 可以走訪整個 list 但是需要注意在走訪完時 n 會指向 head，且 head 是無法透過 container_of 找到節點的，因為它沒有嵌入到結構體中，這時如果對 n 取 contain_of 會拿到 NULL 如果再做取值 n->value 會報錯成為無效的記憶體操作。 ```diff! struct list_head *l, *r; bool del = false; list_for_each_safe (l, r, head) { element_t *nl = list_entry(l, element_t, list); element_t *nr = list_entry(r, element_t, list); + if (l->next != head && strcmp(nl->value, nr->value) == 0) { list_del(l); q_release_element(nl); del = true; } else if (del) { if (strcmp(nl->value, nr->value) == 0) del = true; else del = false; list_del(l); q_release_element(nl); ``` :::warning TODO: 撰寫為更精簡的程式碼 ::: #### <approach 1.1> :::success :monkey: ***[commit 5ce9dc5](https://github.com/sysprog21/lab0-c/commit/5ce9dc50596fdcfba91563512524beddd000366e)*** Revise the q_delete_dup function [What?] 移除 q_delete_dup 中多餘程式碼 [Why?] 後來發現先前寫法在跑某些測資時會存取到 NULL 原因在於下面這段程式碼完全是多餘的 ```diff! - if (strcmp(nl->value, nr->value) == 0) - del = true; - else - del = false; + del = false; ``` 遭遇到重複節點只需多刪除最後一個節點即可，前面不管重複幾次都滿足條件 !strcmp(nl->value, nr->value) 。進行多餘的判斷且沒使用 l->next != head 確保程式不會使用到空指標是危險的行為！ [How?] 刪除上述多餘程式碼，使功能正常運作 ::: :::danger access 的翻譯是「存取」，而非「拜訪」(visit) ::: 後來發現上述寫法在跑某些測資時會存取到 `NULL`。原因在於下面這段程式碼完全是多餘的，遭遇到重複節點只需多刪除最後一個節點即可，前面不管重複幾次都滿足條件 `!strcmp(nl->value, nr->value)` 。進行多餘的判斷且沒使用 `l->next != head` 確保程式不會使用到空指標是危險的行為！ ```diff! - if (strcmp(nl->value, nr->value) == 0) - del = true; - else - del = false; + del = false; ``` 更改就能穩定通過測試項目 06 了 :::danger 此處是「測試項目」(item)，而非「資料」，查閱辭典以理解二者的差異。 ::: ### `q_swap` #### <approach 0> :::success :monkey: ***[commit 6ccf009](https://github.com/sysprog21/lab0-c/commit/6ccf0098458b98be56b94d9325d2f176bbbc8e4d)*** Implement q_swap [What?] 實作交換佇列中兩兩相鄰節點之功能 [Why?] 使佇列能兩兩交換相鄰節點之位置 [How?] 利用list_for_each_safe走訪佇列，每兩步進行一次左右節點交換 ::: :::success :monkey: ***[commit 8003196](https://github.com/sysprog21/lab0-c/commit/80031966bd41f1ff85e1e07c1b4a83ae11ee44f0)*** Consider q_size is odd [What?] 考量節點個數是奇數時，第一個節點會和最後一個節點交換的情形 [Why?] 若節點個數是奇數，則原程式碼會讓第一個節點和最後一個節點交換 [How?] 在迴圈中加入 if (r == head) 之判斷，即時中止迴圈 ::: :::success :monkey: ***[commit f2bea72](https://github.com/sysprog21/lab0-c/commit/f2bea720bc11f6c5a23c9fa6207ea7329eecb1a9)*** Add counter on q_swap [What?] 加入counter紀錄當前迴圈走了幾步，偶次步才進行節點交換 [Why?] 因為題目要求第1個節點和第2個節點交換，第3和第4交換...以此類推 [How?] 迴圈中加入判斷式 if (counter % 2 == 0) 若滿足才進行交換 ::: 利用`list_for_each_safe`走訪佇列，每兩步進行一次左右節點交換並參考 [vax-r](https://hackmd.io/@vax-r/linux2024-homework1)，考量節點個數是奇數時，head 也會被交換的情形 ```diff! struct list_head *r, *l; int counter = 0; list_for_each_safe (l, r, head) { + if (r == head) + break; if (counter % 2 == 0) { l->next = r->next; r->prev = l->prev; l->prev->next = r; r->next->prev = l; l->prev = r; r->next = l; } ++counter; } ``` #### <approach １> :::success :monkey: ***[commit 5c6caea](https://github.com/sysprog21/lab0-c/commit/5c6caea796514eb09e8b8a037c4f2049b9998be6)*** Revise the q_swap function [What?] 寫個簡單的 for 迴圈取代 `list_for_each_safe` [Why?] 先前的寫法是錯的，原因是交換後 l 會指向右邊的節點，r 會指向左邊的節點 [How?] 以 for 迴圈將 l 指標推進，並將 l 和 r 交換位置。交換完後 l 會在右邊，因此 l 前進後可直接進行下一次的交換。無須使用 if (counter % 2 == 0) 判斷式 ::: 上面的寫法是錯的，原因是交換後 `l` 會指向右邊的節點，`r` 會指向左邊的節點於是我決定寫個簡單的 for 迴圈取代 `list_for_each_safe` ```diff! int counter = 0; - list_for_each_safe (l, r, head) { + for (l = (head)->next; l != (head); l = l->next) { r = l->next; if (r == head) break; - if (counter % 2 == 0) { l->next = r->next; r->prev = l->prev; l->prev->next = r; r->next->prev = l; l->prev = r; r->next = l; - } - ++counter; ``` #### <approach ２> :::success :monkey: ***[commit 5d87af3](https://github.com/sysprog21/lab0-c/commit/5d87af30ec20c1df4937251d455a2687b8246d8c)*** Implement second version of q_swap [What?] 提出第二個方法，並盡量使用 List API [Why?] 善用 List API，可精簡程式碼，提昇程式可讀性 [How?] 利用list_del和list_add_tail 將佇列前兩個節點移出佇列，並以相反的順序放回佇列尾端。 ::: 為了善用 List API，我寫了第二個版本的 `q_swap` 利用`list_del`和`list_add_tail` 將佇列前兩個節點移出佇列，並以相反的順序放回佇列尾端。 ```diff! + struct list_head *first = l, *second = r; + l = r->next; + list_del(first); + list_del(second); + list_add_tail(second, head); + list_add_tail(first, head); ``` ### `q_reverse` :::success :monkey: ***[commit ce7affb](https://github.com/sysprog21/lab0-c/commit/ce7affb4b06e7ee440e8fe6e3f7a5f7fdca3abb2)*** Implement q_reverse [What?] 實作將佇列節點全部調換順序的函式 [Why?] 使佇列能更改方向 [How?] 依序將佇列所有 link 指向反方向。並發現指標 l 一旦指到最後一個節點程式便會跳出迴圈，應當再補上一次迴圈內的動作 ::: 依序將佇列所有 link 指向反方向，並發現指標`l`一旦指到最後一個節點程式便會跳出迴圈，應當再補上一次迴圈內的動作 ```diff! if (!head || list_empty(head)) return; struct list_head *r, *l; list_for_each_safe (l, r, head) { l->next = l->prev; l->prev = r; } + l->next = l->prev; + l->prev = r; ``` :::warning TODO: 善用 List API ::: ### `q_reverseK` #### <approach 0> :::success :monkey: ***[commit 5b9c697](https://github.com/sysprog21/lab0-c/commit/5b9c697418584debe54aad9915e9a795c90d6ed2)*** Implement q_reverseK [What?] 實作將佇列中每 k 個節點調換順序之函式 [Why?] 使佇列能以每 k 個節點來調換順序 [How?] 利用 counter 判別當前指標l r是否在 group 的交界處若是，則將左方 group 反向串接到右方 group 否則執行單純的指標反轉 ::: 利用 `counter` 判別當前指標`l` `r`是否在 group 的交界處若是，則將左方 group 反向串接到右方 group 否則執行單純的指標反轉 ```diff! struct list_head *l, *r, *leader; int counter = k; list_for_each_safe (l, r, head) { if (counter == k) leader = l; if (counter == 0) { r->prev = leader; l->prev = leader->next; leader->next->next = l; leader->next = r; counter = k; } else { l->next = l->prev; l->prev = r; } --counter; } ``` 上述寫法是錯的 #### <approach 1> :::success :monkey: ***[commit 13b8552](https://github.com/sysprog21/lab0-c/commit/13b85527938868d152e8d882c25ea7519dbf811c)*** Revise the q_reverseK function [What?] 修正 q_reverseK 函式，使功能正常。使用 `list_cut_position`, `list_splice_init` 完成實作，修正程式的功能，同時也精簡化程式碼 [Why?] 原先寫法有誤 [How?] 首先利用 counter 判斷當前是否需要調換前面 k 個節點順序。接著利用`list_cut_position`分出前面 k 個節點。使用 q_reverse 函式調換 k 個節點的順序。最後再用 `list_splice_init` 將 k 個節點放回佇列 ::: 於是參考 [rkhuncle](https://hackmd.io/@rkhuncle/linux2024-homework1) 使用 `list_cut_position`, `list_splice_init` 完成實作下圖是我對 `list_cut_position` 的理解 `leader`為`head_from`的輸入，本身並不會放入`head_to list` 中 ```graphviz digraph G { rankdir = LR node[shape=record]; { rank = same head_to [label="head_to", color=blue]; head_from [label="head_from", style=dashed, color=grey]; head_from_first [label="head_from_first", style=dashed, color=grey]; "node" [label="node", style=dashed, color=grey]; } 1->2->3->4->5->"..." [color=blue]; "..."->5->4->3->2->1 [color=orange]; head_from->1 [color=grey]; head_from_first->2 [color=grey]; "node"->4 [color=grey]; } ``` 使用 `list_cut_position` 後 ```graphviz digraph G { rankdir = LR node[shape=record]; { rank = same head_to [label="head_to", color=blue]; head_from [label="head_from", style=dashed, color=grey]; head_from_first [label="head_from_first", style=dashed, color=grey]; "node" [label="node", style=dashed, color=grey]; } 5->"..." [color=blue]; "..."->5 [color=orange]; 1->5 [color=blue]; 5->1 [color=orange]; 2->3->4 [color=blue]; 4->3->2 [color=orange]; head_from->1 [color=grey]; head_from_first->2 [color=grey]; "node"->4 [color=grey]; head_to->2 [color=blue]; 2->head_to [color=orange]; head_to->4 [color=orange]; 4->head_to [color=blue]; } ``` ```diff! struct list_head *l, *r, *leader; + struct list_head group; int counter = k - 1; for (l = (head)->next, r = l->next, leader = head; l != (head); l = r, r = l->next, --counter) { if (counter == 0) { + list_cut_position(&group, leader, l); q_reverse(&group); + list_splice_init(&group, leader); + leader = r->prev; counter = k; } } ``` ### `q_sort` :::success :monkey: ***[commit b621694](https://github.com/sysprog21/lab0-c/commit/b621694727dd5d59e6d316ea258e205eaed72e2c)*** Implement q_sort [What?] 實作將佇列排序之函式，並盡量善用 List API 精簡化程式碼 [Why?] 此函式較為複雜，多使用 List API 可增加程式可讀性 [How?] 先宣告 `list_head`:`l` `r` 分別存放由`list_cut_position` 所分割的左右兩半佇列，接著再分別對左右兩半佇列進行 q_sort ，排序完畢後再將左右兩半進行 merge 。以下是 merge 過程，先判別是否 `descend`，再依序比較左右兩個佇列的第一個 node.value ，將較 `大/小` 的 node 塞入`head` 的尾端，當有任何佇列為空則結束迴圈，再將非空佇列加入 `head`。 ::: 參考 [Merge Sort – Data Structure and Algorithms Tutorials](https://www.geeksforgeeks.org/merge-sort/) 和 [linkedListSort](https://npes87184.github.io/2015-09-12-linkedListSort/) 以了解 merge sort 的基本原理與實作方式。 [subsidium](https://hackmd.io/@subsidium/linux2024-homework1#q_sort) 完善利用 List API，程式碼寫得精簡易讀，值得借鏡！於是我花了30分鐘認真將 List API 的所有功能和實作細節看懂看熟，並開始撰寫 q_sort 程式碼。先宣告 `list_head`:`l` `r` 分別存放由`list_cut_position` 所分割的左右兩半佇列，接著再分別對左右兩半佇列進行 q_sort ，排序完畢後再將左右兩半進行 merge 。以下是 merge 過程，先判別是否 `descend`，再依序比較左右兩個佇列的第一個 node.value ，將較 `大/小` 的 node 塞入`head` 的尾端，當有任何佇列為空則結束迴圈，再將非空佇列加入 `head`。 ```diff! while (!list_empty(left) && !list_empty(right)) { element_t *l = list_entry(left->next, element_t, list); element_t *r = list_entry(right->next, element_t, list); if (descend) { if (strcmp(l->value, r->value) > 0) { + list_move_tail(left->next, head); } else { + list_move_tail(right->next, head); } } else { if (strcmp(l->value, r->value) < 0) { + list_move_tail(left->next, head); } else { + list_move_tail(right->next, head); } } } if (!list_empty(left)) + list_splice_tail(left, head); else + list_splice_tail(right, head); ``` :::danger 你如何確保目前的測試已涵蓋排序演算法的最差狀況？ ::: 參考 [2024/03/19,21 課堂簡記 164253](https://hackmd.io/3e8LI8_0SLOo_UTDAnPKYQ#164253) > c 可以自己寫一個 `next_permutation` ，實作方式類似 [leetcode 第 31 題](https://leetcode.com/problems/next-permutation/description/)，過程中需要排序，考慮每次改變的區間可以發現，長度為 $k$ 的區間有 $f(k)$ 個 $f(n)=0,f(n-1)=n-1,f(k)=k(1+\sum_{i=k+1}^nf(i))$ $\left\{\begin{aligned}\frac{f(k+1)}{k+1}-1=f(k+2)+\cdots+f(n) \\ \frac{f(k)}k-1=f(k+1)+\cdots+f(n)\end{aligned}\right.$ $f(k)=f(k+1)\frac{k(k+2)}{k+1}=(n-1)\frac{(n-2)n}{n-1}\frac{(n-3)(n-1)}{n-2}\cdots\frac{k(k+2)}{k+1}=\frac{n!}{(k+1)!}k$ $f(n-1)=n-1$ $eg.$ 以數字1~8為例(n=8)，18765432 ⇨ 21345678 `2`移動了 n-1 格，可知每當首位數字變動時，才會有數字恰移動 n-1 格，因此長度為 n-1 的變動區間有 n-1 個。 $f(k)=k(1+\sum_{i=k+1}^nf(i))$ 下圖是我的理解 ![411FBDDB-2660-4AAC-A9A7-50B1B7274376](https://hackmd.io/_uploads/S14vX3906.jpg) 可將變動區間的發生頻率定義成： * 變動範圍 n-1 的區間有 n-1 個 * *每個變動範圍區間大於等於 k 的上下間隔必存在恰 k-1 個變動範圍為 k-1 的區間。* 相當於 * 變動範圍為 1 的區間有 n!/2 個 * *每經歷 k-1 次區間範圍為 k-1 的變動，後續必會遭遇區間範圍大於等於 k 的變動，除了最後 k-1 次。* $<pf>$ [1] 由 $f(n-1)=n-1$ 首項成立 [2] 假設 $f(k)=k(1+\sum_{i=k+1}^nf(i))$ 成立，則 $f(k-1) =$ ### `q_ascend` :::success :monkey: ***[commit 7145a8b](https://github.com/sysprog21/lab0-c/commit/7145a8b7a0aff5ee9fcb66f0403b4ffbd323d203)*** Implement q_ascend and q_descend [What?] 實作移除不滿足 ascend 或 descend 排列順序的節點之函式 [Why?] 依作業需求實作此功能 [How?] 由佇列尾端沿著 prev 走訪整個佇列，並利用 curMin 紀錄先前遇過的最小字串，若當前存取的節點字串大於等於curMin 則刪除該節點。 ::: 由佇列尾端沿著 `prev` 走訪整個佇列，並利用 `curMin` 紀錄先前遇過的最小字串，若當前存取的節點字串大於等於`curMin` 則刪除該節點。 ```diff! while (tmp != head) { element_t *na = list_entry(tmp, element_t, list); if (strcmp(curMin, na->value) <= 0) { struct list_head *del = tmp; tmp = tmp->prev; list_del(del); q_release_element(na); } else { curMin = na->value; tmp = tmp->prev; } } ``` ### `q_descend` :::success :monkey: ***[commit 7145a8b](https://github.com/sysprog21/lab0-c/commit/7145a8b7a0aff5ee9fcb66f0403b4ffbd323d203)*** Implement q_ascend and q_descend [What?] 實作移除不滿足 ascend 或 descend 排列順序的節點之函式 [Why?] 依作業需求實作此功能 [How?] 由佇列尾端沿著 prev 走訪整個佇列，並利用 curMax 紀錄先前遇過的最大字串，若當前拜訪的節點字串小於等於curMax 則刪除該節點。 ::: 由佇列尾端沿著 `prev` 走訪整個佇列，並利用 `curMax` 紀錄先前遇過的最大字串，若當前拜訪的節點字串小於等於`curMax` 則刪除該節點。 ```diff! while (tmp != head) { element_t *t = list_entry(tmp, element_t, list); if (strcmp(curMax, t->value) >= 0) { struct list_head *del = tmp; tmp = tmp->prev; list_del(del); q_release_element(t); } else { curMax = t->value; tmp = tmp->prev; } } ``` ### `q_merge` :::success :monkey: ***[commit dc78edf](https://github.com/sysprog21/lab0-c/commit/dc78edfe7117d173bcfdc7b7d786215a4edd5e20)*** Implement q_merge [What?] 實作將多個佇列合併並排序之函式 [Why?] 依作業需求實作此功能 [How?] 此函式中傳入的 head 指向此 chain 的第一個節點（queue_contex_t）。 head->next為 first_q，用來整合其他 queue 中的節點。只需造訪整條 chain 上的節點，利用list_splice_init 將所有佇列串接到 first_q ，最後再用 q_sort 將整個佇列排序即可。 ::: 參考 [25077667](https://hackmd.io/nfQwmNi0SUey9DVs-8uvow?view) 畫的圖，可知每個`queue_contex_t` 裝有一個 queue 的 `list_head` ，這些`queue_contex_t` 串起來就形成一個 chain 。 ```graphviz digraph QueueRelationships { node [shape=record, fontname="Arial"]; // Define the queue context, which manages individual queues queue_context_1 [label="{queue_contex_t|+ q: list_head*\l+ chain: list_head\l+ size: int\l+ id: int\l}", style=filled, fillcolor=lightblue]; queue_context_2 [label="{queue_contex_t|+ q: list_head*\l+ chain: list_head\l+ size: int\l+ id: int\l}", style=filled, fillcolor=lightblue]; queue_context_3 [label="{queue_contex_t|+ q: list_head*\l+ chain: list_head\l+ size: int\l+ id: int\l}", style=filled, fillcolor=lightblue]; // Define the elements as entries of a queue, now explicitly shown element_A [label="{element_t|+ value: char*\l+ list: list_head\l}", style=filled, fillcolor=lightgray]; element_B [label="{element_t|+ value: char*\l+ list: list_head\l}", style=filled, fillcolor=lightgray]; element_C [label="{element_t|+ value: char*\l+ list: list_head\l}", style=filled, fillcolor=lightgray]; element_1 [label="{element_t|+ value: char*\l+ list: list_head\l}", style=filled, fillcolor=lightgray]; element_2 [label="{element_t|+ value: char*\l+ list: list_head\l}", style=filled, fillcolor=lightgray]; element_3 [label="{element_t|+ value: char*\l+ list: list_head\l}", style=filled, fillcolor=lightgray]; // Define the conceptual queue through element_t chain, now removed for direct visualization queue1 [label="Conceptual Queue (chain of element_t)", shape=ellipse, style=dashed]; queue2 [label="Conceptual Queue (chain of element_t)", shape=ellipse, style=dashed]; queue3 [label="Conceptual Queue (chain of element_t)", shape=ellipse, style=dashed]; // Define the global chain of queues managed by queue_context // global_chain [label="Global Chain of queue_context", shape=ellipse, style=dashed]; // Relationships queue_context_1 -> queue1 [label="manages (q points to first element_t)"]; queue_context_2 -> queue2 [label="manages (q points to first element_t)"]; queue_context_3 -> queue3 [label="manages (q points to first element_t)"]; queue1 -> element_A; queue2 -> element_1; element_A -> element_B; element_B -> element_A; element_B -> element_C; element_C -> element_B; element_C -> element_A; element_A -> element_C; // global_chain -> queue_context_1 [label="part of", dir=none]; // global_chain -> queue_context_2 [label="part of", dir=none]; element_1 -> element_2; element_2 -> element_1; element_2 -> element_3; element_3 -> element_2; element_3 -> element_1; element_1 -> element_3; queue_context_1 -> queue_context_2; queue_context_2 -> queue_context_1; queue_context_2 -> queue_context_3; queue_context_3 -> queue_context_2; queue_context_3 -> queue_context_1; queue_context_1 -> queue_context_3; } ``` 在此函式中傳入的 head 指向此 chain 的第一個節點（`queue_contex_t`） `head->next`為 `first_q`，用來整合其他 queue 中的節點只需造訪整條 chain 上的節點，利用`list_splice_init` 將所有佇列串接到 `first_q` ，最後再用 q_sort 將整個佇列排序即可。 ```diff int len = 0; + queue_contex_t *first_q = list_entry(head->next, queue_contex_t, chain); struct list_head *tmp; list_for_each (tmp, head) { if (tmp == head->next) { continue; } + queue_contex_t *another_q = list_entry(tmp, queue_contex_t, chain); if (list_empty(another_q->q)) continue; len += another_q->size; + list_splice_init(another_q->q, first_q->q); } + q_sort(first_q->q, descend); ``` 目前 95 分，但奇怪的是我在本地端用 `make test` 無法通過測試項目 06，上傳到 git hub 卻通過了？原來`RAND`會產生隨機字串，程式在跑某些測資時會發生 `Segmentation fault occurred. You dereferenced a NULL or invalid pointermake` 的狀況目前發現問題出在 `q_delete_dup` ！已修正 :cactus: :cactus: :cactus: 詳見 `q_delete_dup <approach 1.1>` [commit 8f13b09](https://github.com/sysprog21/lab0-c/commit/8f13b0955b324cf617e14ee7b74d455e4b025049) ![image](https://hackmd.io/_uploads/rJcqDVgpT.png) --- ## Linux 核心原始程式碼的 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) >研讀 Linux 核心原始程式碼的 lib/list_sort.c 並嘗試引入到 lab0-c 專案，比較你自己實作的 merge sort 和 Linux 核心程式碼之間效能落差，也該嘗試改進針對鏈結串列排序實作的效能 >* 詳閱[改進 lib/list_sort.c含解說錄影](https://hackmd.io/@sysprog/Hy5hmaKBh) --- ### 解釋 Linux 核心的 lib/list_sort. #### `likely, unlikely` 在list_sort.c 中，經常使用 ```c! if (likely(bits)) ``` 之敘述，於是我參考 [likely and unlikely macro](https://meetonfriday.com/posts/cecba4ef/) 了解在 `/include/linux/compiler.h` 中存在巨集`likely` `unlikely` ```c # define likely(x) __builtin_expect(!!(x), 1) # define unlikely(x) __builtin_expect(!!(x), 0) ``` :cactus:`__built_expect()`是 gcc 的內建 function，用來將 branch 的相關資訊提供給 compiler，告訴 compiler 設計者期望的比較結果，以便對程式碼改變分支順序來進行優化。 :cactus:!!(x)的用意是為了確保出來的值不是0就是1 * 因為x可以是任意數 > 根據c99 standard : ```c if ( expression ) statement ``` > The controlling expression of an if statement shall have scalar type. :cactus:使用`likely` 表示這段敘述(x)為 true 的機率比較大(比較常發生)，告訴 compiler 將 x==ture 的對應執行程式碼接在判斷後面 :cactus: 使用`unlikely` 表示這段敘述(x)為 true 的機率比較小(不常發生)，告訴 compiler 將 x==false 的對應執行程式碼接在判斷後面 #### `cmp` 另一個引起我注意的是 cmp 函式 ```c if (cmp(priv, a, b) <= 0) ``` 參考 [The Linux Kernel API](https://www.kernel.org/doc/html/next/core-api/kernel-api.html) ```c void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp) ``` * Parameters >`void *priv` private data, opaque to list_sort(), passed to cmp `struct list_head *head` the list to sort `list_cmp_func_t cmp` the elements comparison function * Description >The comparison function cmp must >- `return > 0` if a should sort after b ("a > b" if you want an ascending sort), and >- `return <= 0` if a should sort before b or their original order should be preserved. >- It is always called with the element that came first in the input in a, and list_sort is a stable sort, so it is not necessary to distinguish the a < b and a == b cases. >This is compatible with two styles of cmp function: >- The traditional style which returns <0 / =0 / >0, or >- Returning a boolean 0/1. >- The latter offers a chance to save a few cycles in the comparison (which is used by e.g. plug_ctx_cmp() in block/blk-mq.c). 由上敘述可知，當 cmp 回傳大於0的值表 a>b，a 應排在 b 的後面，因此這段存在 `merge` 中的程式可這樣解讀 : ```c if (cmp(priv, a, b) <= 0) { *tail = a; tail = &a->next; a = a->next; if (!a) { *tail = b; break; } } else { *tail = b; tail = &b->next; b = b->next; if (!b) { *tail = a; break; } } ``` :cactus: 若 `a<=b` ，則 a 先加入 tail。若 a 為空，則將整個 b 佇列接到 tail 後方 :cactus: 若 `a>b` ，則 b 先加入 tail。若 b 為空，則將整個 a 佇列接到 tail 後方 ### 將 Linux 核心的 lib/list_sort 引入 lab0-c :::success :monkey: ***[commit ee5503b](https://github.com/sysprog21/lab0-c/commit/ee5503b51f9d3772c2668afe4f618b504034b7df)*** Integrate the Linux kernel's lib/list_sort into lab0-c [What?] 將 Linux 核心的 lib/list_sort 引入 lab0-c [Why?] 為了比較自己寫的 q_sort 和 Linux kernel 的 list_sort 之效能 [How?] 以下依修改檔案逐項說明 (1) 在 queue.c 中新增 list_sort 中三個函式 `merge` `merge_final` `list_sort` ，由於我先前已將某函式命名為 merge ，在此以 `merge_list_sort` 替代原 `merge` 函式。 ```c static struct list_head *merge_list_sort(struct list_head *a, struct list_head *b) static void merge_final(struct list_head *head, struct list_head *a, struct list_head *b) void list_sort(struct list_head *head) ``` 函式內進行些微調整： * 去除參數`priv` * 以 `strcmp` 取代 `cmp` * 參考 [Kuanch](https://hackmd.io/QDxYHCXIQ7SQ-k7cM34Vcw?view) 去除 `islikely` `unlikely` (2) 參考 [kart81604](https://hackmd.io/@kart81604/HkGHY1cTs#%E5%B0%87-liblist_sort-%E5%BC%95%E5%85%A5%E8%87%B3-lab0-c-%E5%B0%88%E6%A1%88) 修改 qtest.c * 在函式 `console_init` 加入新命令 `ADD_COMMAND(list_sort, "Sort queue in ascending order by list_sort", "");` ```c ADD_COMMAND(list_sort, "Sort queue in ascending order by list_sort", ""); ``` * 新增 `do_list_sort` 函式 ```c bool do_list_sort(int argc, char *argv[]) ``` * 宣告 `list_sort` 函式 ```c void list_sort(struct list_head *head); ``` ::: ### 閱讀 *George Spelvin* 在 [commit b5c56e0](https://github.com/torvalds/linux/commit/b5c56e0cdd62979dd538e5363b06be5bdf735a09) 提及的三篇論文 #### Bottom-up Mergesort: A Detailed Analysis >Wolfgang Panny, Helmut Prodinger Algorithmica 14(4):340--354, October 1995 --- #### The cost distribution of queue-mergesort, optimal mergesorts, andpower-of-two rules >Wei-Mei Chen, Hsien-Kuei Hwang, Gen-Huey Chen Journal of Algorithms 30(2); Pages 423--448, February 1999 --- #### Queue-Mergesort >Mordecai J. Golin, Robert Sedgewick Information Processing Letters, 48(5):253--259, 10 December 1993 --- 這篇作者提出了 `queue-mergesort` 的方法，並將其方法類比為 `Huffman encoding` 來證明其方法的 merge tree 的 external path length 為最小，推得其權重為最小，因此該 merge tree 為最佳，`queue-mergesort` 為最佳 mergesort 之一。 * 若一個 mergesort 為最佳，表其在 worst case 的比較次數為最少 <---> 其 merge tree 的 leaves 只分佈在樹的最底下兩層 * top-down 必最佳，bottom-up 經些微調整後也為最佳 ***虛擬碼*** ```c queue-mergesort(Q) { while(Q.size!=1)do Q.put(Merge(Q.get, Q.get)) } ``` * 每次取最左方兩個佇列合併排序，並將排序完的佇列放置最右方，直到只剩一個佇列排序即完成。 ***示意圖*** ```graphviz digraph G { node[shape=plaintext]; head [fontcolor="brown"]; tail [fontcolor="brown"]; node[shape=box]; n0 [label= "1",color=lightblue]; n1 [label= "1",color=lightblue]; n2 [label= "1"]; n3 [label= "1"]; n4 [label= "1"]; n5 [label= "1"]; A [label= "A"]; B [label= "B"]; C [label= "C",style=filled,color=lightblue]; D [label= "D"]; E [label= "E"]; F [label= "F",style=filled,color=lightblue]; { rank="same"; n0->n1->n2->n3->n4->n5 } head->n0; tail->n5; n0->C; n1->F; n2->D; n3->A; n4->E; n5->B } ``` ```graphviz digraph G { node[shape=plaintext]; head [fontcolor="brown"]; tail [fontcolor="brown"]; node[shape=box]; n0 [label= "1",color=yellow]; n1 [label= "1",color=yellow]; n2 [label= "1"]; n3 [label= "1"]; n4 [label= "2",color=lightblue]; A [label= "A",style=filled,color=yellow]; B [label= "B"]; C [label= "C",style=filled,color=lightblue]; D [label= "D",style=filled,color=yellow]; E [label= "E"]; F [label= "F",style=filled,color=lightblue]; { rank="same"; n0->n1->n2->n3->n4 } head->n0; tail->n4; n0->D; n1->A; n2->E; n3->B; n4->C->F; } ``` ```graphviz digraph G { node[shape=plaintext]; head [fontcolor="brown"]; tail [fontcolor="brown"]; node[shape=box]; n0 [label= "1", color=pink]; n1 [label= "1", color=pink]; n2 [label= "2"]; n3 [label= "2",color=yellow]; A [label= "A",style=filled,color=yellow]; B [label= "B",style=filled,color=pink]; C [label= "C"]; D [label= "D",style=filled,color=yellow]; E [label= "E",style=filled,color=pink]; F [label= "F"]; { rank="same"; n0->n1->n2->n3 } head->n0; tail->n3; n0->E; n1->B; n2->C->F; n3->A->D; } ``` ```graphviz digraph G { node[shape=plaintext]; head [fontcolor="brown"]; tail [fontcolor="brown"]; node[shape=box]; n0 [label= "2", color=purple]; n1 [label= "2", color=purple]; n2 [label= "2", color=pink]; A [label= "A",style=filled,color=purple]; B [label= "B",style=filled,color=pink]; C [label= "C",style=filled,color=purple]; D [label= "D",style=filled,color=purple]; E [label= "E",style=filled,color=pink]; F [label= "F",style=filled,color=purple]; { rank="same"; n0->n1->n2 } head->n0; tail->n2; n0->C->F; n1->A->D; n2->B->E; } ``` ```graphviz digraph G { node[shape=plaintext]; head [fontcolor="brown"]; tail [fontcolor="brown"]; node[shape=box]; n0 [label= "2", color=lightgray]; n1 [label= "4", color=lightgray]; A [label= "A",style=filled,color=purple]; B [label= "B",style=filled,color=lightgray]; C [label= "C",style=filled,color=purple]; D [label= "D",style=filled,color=purple]; E [label= "E",style=filled,color=lightgray]; F [label= "F",style=filled,color=purple]; { rank="same"; n0->n1 } head->n0; tail->n1; n0->B->E; n1->A->C->D->F; } ``` ```graphviz digraph G { node[shape=plaintext]; head [fontcolor="brown"]; tail [fontcolor="brown"]; node[shape=box]; n0 [label= "6", color=lightgray]; A [label= "A",style=filled,color=lightgray]; B [label= "B",style=filled,color=lightgray]; C [label= "C",style=filled,color=lightgray]; D [label= "D",style=filled,color=lightgray]; E [label= "E",style=filled,color=lightgray]; F [label= "F",style=filled,color=lightgray]; { rank="same"; } head->n0; tail->n0; n0->A->B->C->D->E->F; } ``` ### 比較自己實作的 merge sort 和 Linux 核心程式碼之間效能落差使用 [perf](https://wiki.csie.ncku.edu.tw/embedded/perf-tutorial) 工具檢測效能一開始`perf_event_paranoid`權限被設為 4 :::danger command 是「命令」，而非「指令」(instruction) ::: 可使用以下命令修改權限(-1 為權限全開) ```shell $ sudo sysctl kernel.perf_event_paranoid=-1 ``` #### <方法零> 最粗糙的實驗方式 (實驗樣本不同) 分別對 sort / list_sort 執行相同命令，看跑完結果如何 ```shell $ sudo perf stat ./qtest cmd> new cmd> ih RAND 999999 cmd> sort / list_sort cmd> quit ``` ***sort*** ``` Performance counter stats for './qtest': 1,173.50 msec task-clock # 0.028 CPUs utilized 37 context-switches # 31.530 /sec 6 cpu-migrations # 5.113 /sec 35,235 page-faults # 30.026 K/sec 6,588,902,076 cpu_core/cycles/ # 5.615 GHz (99.66%) 5,033,629,278 cpu_atom/cycles/ # 4.289 GHz (0.33%) 5,084,456,532 cpu_core/instructions/ # 0.77 insn per cycle (99.66%) 563,426,814 cpu_atom/instructions/ # 0.09 insn per cycle (0.34%) 725,644,366 cpu_core/branches/ # 618.359 M/sec (99.66%) 91,778,539 cpu_atom/branches/ # 78.209 M/sec (0.34%) 19,228,387 cpu_core/branch-misses/ # 2.65% of all branches (99.66%) 62,259 cpu_atom/branch-misses/ # 0.01% of all branches (0.34%) TopdownL1 (cpu_core) # 66.4 % tma_backend_bound # 11.0 % tma_bad_speculation # 8.9 % tma_frontend_bound # 13.7 % tma_retiring (99.66%) TopdownL1 (cpu_atom) # 97.5 % tma_bad_speculation (0.34%) # 0.2 % tma_frontend_bound (0.34%) # 106.7 % tma_backend_bound # 106.7 % tma_backend_bound_aux (0.34%) # 0.0 % tma_retiring (0.01%) 42.189814609 seconds time elapsed 1.005834000 seconds user 0.168980000 seconds sys ``` ***list_sort*** ``` Performance counter stats for './qtest': 1,108.08 msec task-clock # 0.022 CPUs utilized 43 context-switches # 38.806 /sec 6 cpu-migrations # 5.415 /sec 35,233 page-faults # 31.796 K/sec 6,297,501,839 cpu_core/cycles/ # 5.683 GHz (99.17%) 808,320,907 cpu_atom/cycles/ # 0.729 GHz (0.37%) 4,719,015,350 cpu_core/instructions/ # 0.75 insn per cycle (99.17%) 2,629,617,057 cpu_atom/instructions/ # 0.42 insn per cycle (0.73%) 674,906,876 cpu_core/branches/ # 609.076 M/sec (99.17%) 384,454,428 cpu_atom/branches/ # 346.954 M/sec (0.83%) 19,209,980 cpu_core/branch-misses/ # 2.85% of all branches (99.17%) 2,627,089 cpu_atom/branch-misses/ # 0.39% of all branches (0.83%) TopdownL1 (cpu_core) # 64.6 % tma_backend_bound # 8.6 % tma_bad_speculation # 13.1 % tma_frontend_bound # 13.8 % tma_retiring (99.17%) TopdownL1 (cpu_atom) # 67.9 % tma_bad_speculation (0.83%) # 10.8 % tma_frontend_bound (0.83%) # 24.3 % tma_backend_bound # 24.3 % tma_backend_bound_aux (0.83%) # 0.0 % tma_retiring (0.47%) 49.449310632 seconds time elapsed 0.939995000 seconds user 0.169441000 seconds sys ``` 整理成表格，可發現我寫的 q_sort 表現與 list_sort 相差不大，但這是不合理的實驗。 | | my sort | list sort | | -------- | -------- | -------- | | task clock | 1,173.50 msec | 1,108.08 msec | | context switches | 37 | 43 | | page faults | 35,235 | 35,233 | | branches misses (core) | 19,228,387 | 19,209,980 | | branches misses (atom) | 62,259 | 2,627,089 | #### <方法一> 使用同樣本 (仍不公平，因為不同演算法擅長的樣本不同) 參考 [kart81604](https://hackmd.io/@kart81604/HkGHY1cTs#%E5%B0%87-liblist_sort-%E5%BC%95%E5%85%A5%E8%87%B3-lab0-c-%E5%B0%88%E6%A1%88) 將先隨機產生資料，並寫入 data.txt 中，如此兩次實驗即可共用相同輸入。在 `qtest` 中新增 `write` `read` 兩個指令，分別代表寫入 data.txt 和從中讀取資料。 :::success :monkey: ***[commit 2a8d392](https://github.com/jychen0611/lab0-c/commit/2a8d392d8803efaaac0451ba00ef46dbd149b9f8)*** Add the "write" and "read" commands to qtest [what?] 在 qtest.c 中加入 `do_write` `do_read` 兩個函式，分別代表將當前佇列節點資料寫入 data.txt 和從 data.txt 中讀取資料放入佇列 [why?] 由於實驗需以相同基準點衡量性能，故將隨機產生的資料寫入 data.txt，以利不同實驗對象讀取相同輸入 [how?] 在 qtest.c 中新增 `do_write` 函式，逐一將當前佇列節點資料寫進 data.txt。新增 `do_read` 函式，從 data.txt 中讀取指定數量的字串，並插入當前佇列( buffer 大小設為 10)。 ::: ***產生資料並寫檔*** ```shell $./qtest cmd> new cmd> ih RAND 999999 cmd> write cmd> quit ``` ***讀檔進行效能測試*** ```shell $sudo perf stat ./qtest cmd> new cmd> read 999999 cmd> sort / list_sort cmd> quit ``` ***q_sort*** ``` Performance counter stats for './qtest': 1,010.43 msec task-clock # 0.027 CPUs utilized 37 context-switches # 36.618 /sec 8 cpu-migrations # 7.917 /sec 35,236 page-faults # 34.872 K/sec 5,322,818,753 cpu_core/cycles/ # 5.268 GHz (99.65%) 689,256,019 cpu_atom/cycles/ # 0.682 GHz (0.32%) 3,193,111,361 cpu_core/instructions/ # 0.60 insn per cycle (99.65%) 1,354,510,793 cpu_atom/instructions/ # 0.25 insn per cycle (0.34%) 651,937,624 cpu_core/branches/ # 645.209 M/sec (99.65%) 269,959,937 cpu_atom/branches/ # 267.174 M/sec (0.35%) 18,212,741 cpu_core/branch-misses/ # 2.79% of all branches (99.65%) 2,721,285 cpu_atom/branch-misses/ # 0.42% of all branches (0.35%) TopdownL1 (cpu_core) # 70.7 % tma_backend_bound # 13.4 % tma_bad_speculation # 6.1 % tma_frontend_bound # 9.8 % tma_retiring (99.65%) TopdownL1 (cpu_atom) # 90.3 % tma_bad_speculation (0.35%) # 6.4 % tma_frontend_bound (0.35%) # 10.6 % tma_backend_bound # 10.6 % tma_backend_bound_aux (0.35%) # 0.0 % tma_retiring (0.03%) 37.838074239 seconds time elapsed 0.995614000 seconds user 0.016123000 seconds sys ``` ***list_sort*** ``` Performance counter stats for './qtest': 1,040.56 msec task-clock # 0.035 CPUs utilized 38 context-switches # 36.519 /sec 11 cpu-migrations # 10.571 /sec 35,235 page-faults # 33.861 K/sec 5,619,021,193 cpu_core/cycles/ # 5.400 GHz (99.78%) 709,118,105 cpu_atom/cycles/ # 0.681 GHz (0.20%) 2,848,500,928 cpu_core/instructions/ # 0.51 insn per cycle (99.78%) 130,223,207 cpu_atom/instructions/ # 0.02 insn per cycle (0.22%) 602,747,086 cpu_core/branches/ # 579.251 M/sec (99.78%) 28,606,170 cpu_atom/branches/ # 27.491 M/sec (0.22%) 18,072,038 cpu_core/branch-misses/ # 3.00% of all branches (99.78%) 1,106,266 cpu_atom/branch-misses/ # 0.18% of all branches (0.22%) TopdownL1 (cpu_core) # 73.1 % tma_backend_bound # 9.8 % tma_bad_speculation # 8.5 % tma_frontend_bound # 8.6 % tma_retiring (99.78%) TopdownL1 (cpu_atom) # 96.1 % tma_bad_speculation (0.22%) # 10.0 % tma_frontend_bound (0.22%) # 79.5 % tma_backend_bound # 79.5 % tma_backend_bound_aux (0.22%) # 0.0 % tma_retiring (0.02%) 29.854585006 seconds time elapsed 0.994007000 seconds user 0.047903000 seconds sys ``` ***表格*** | | my sort | list sort | | -------- | -------- | -------- | | task clock | 1,010.43 msec | 1,040.56 msec | | context switches | 37 | 38 | | page faults | 35,236 | 35,235 | | branches misses (core) | 18,212,741 | 18,072,038 | | branches misses (atom) | 2,721,285 | 1,106,266 | #### <方法二> 參考 [CPython 的 listsort 的說明文件](https://github.com/python/cpython/blob/main/Objects/listsort.txt) 盡量涵蓋到不同樣本 ***random data*** 見方法一 ***descending data*** ```shell $./qtest cmd> new cmd> ih RAND 999999 cmd> sort cmd> reverse cmd> write cmd> quit ``` | | my sort | list sort | | -------- | -------- | -------- | | task clock | 394.22 msec | 232.44 msec | | context switches | 30 | 32 | | page faults | 35,235 | 35,236 | | branches misses (core) | 2,759,462 | 1,794,584 | | branches misses (atom) | 715,985 | <not counted> | ***ascending data*** ```shell $./qtest cmd> new cmd> ih RAND 999999 cmd> sort cmd> write cmd> quit ``` | | my sort | list sort | | -------- | -------- | -------- | | task clock | 370.47 msec | 268.25 msec | | context switches | 35 | 37 | | page faults | 35,236 | 35,239 | | branches misses (core) | 2,931,585 | 1,621,214 | | branches misses (atom) | 2,768,175 | 1,246,388 | ***ascending, then 99999 random at the end*** ```shell $./qtest cmd> new cmd> ih RAND 999999 cmd> sort cmd> it RAND 99999 cmd> write cmd> quit ``` | | my sort | list sort | | -------- | -------- | -------- | | task clock | 368.02 msec | 421.53 msec | | context switches | 31 | 38 | | page faults | 38,752 | 38,753 | | branches misses (core) | 5,185,188 | 3,837,989 | | branches misses (atom) | 47,711 | 62,375 | ***ascending, then reverseK 7*** ```shell $./qtest cmd> new cmd> ih RAND 100000 cmd> sort cmd> reverseK 7 cmd> write cmd> quit ``` | | my sort | list sort | | -------- | -------- | -------- | | task clock | 65.35 msec | 64.16 msec | | context switches | 32 | 34 | | page faults | 3596 | 3596 | | branches misses (core) | 436,034 | 323,387 | | branches misses (atom) | 185,322 | 227,083 | ***many duplicates*** ```shell $./qtest cmd> new cmd> ih RAND 10000 cmd> ih b 10000 cmd> ih RAND 10000 cmd> ih d 10000 cmd> ih RAND 10000 cmd> ih a 10000 cmd> ih RAND 10000 cmd> ih c 10000 cmd> ih RAND 10000 cmd> ih e 10000 cmd> write cmd> quit ``` | | my sort | list sort | | -------- | -------- | -------- | | task clock | 83.52 msec | 64.73 msec | | context switches | 30 | 33 | | page faults | 3,595 | 38,753 | | branches misses (core) | 880,806 | 937,747 | | branches misses (atom) | 113,522 | 137,047 | ***all equal*** ```shell $./qtest cmd> new cmd> ih a 100000 cmd> write cmd> quit ``` | | my sort | list sort | | -------- | -------- | -------- | | task clock | 49.54 msec | 60.67 msec | | context switches | 28 | 36 | | page faults | 3,595 | 3596 | | branches misses (core) | 166,327 | 57,640 | | branches misses (atom) | <not counted> | 62,937 | :::danger 翻閱統計學教科書，探討要什麼量體，才能代表樣本特徵？ ::: #### 研讀 [Probability & Statistics for Engineers & Scientists, 9/e (GE-Paperback)](https://www.tenlong.com.tw/products/9781292161365) *中央極限定理* #### <方法三> 重新撰寫腳本進行實驗 --- ## 論文〈[Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf)〉 >研讀論文〈Dude, is my code constant time?〉，解釋本程式的 "simulation" 模式是如何透過以實驗而非理論分析，達到驗證時間複雜度，需要解釋 Student's t-distribution 及程式實作的原理。 >* 注意：現有實作存在若干致命缺陷，請討論並提出解決方案 >* 在 oreparaz/dudect 的程式碼具備 percentile 的處理，但在 lab0-c 則無。 --- ### step1 測量執行時間 * 以兩種不同類型(fix vs random)資料作為輸入，統計其執行時間之分佈是否相同 * 使用 CPU cycle 數計算時間(cycle counter) * 每次測量會隨機選一種類型的資料，以最小化環境變因 ### step2 後製處理 * 將極端值裁除 * 高階預處理 eg. `centered product` ### step3 進行統計測試 * 嘗試推翻虛無假說 `the two timing distributions are equql.` * t 測試 eg. `Welch's test` 比較兩者樣本分佈差異 $$ t = \frac{\bar{X_0} - \bar{X_1}}{\sqrt{\frac{s^2_1}{N_1} + \frac{s^2_2}{N_2}}} $$ * 無參數測試 eg. `Kolmogorov-Smirnov` ### lab0-c/dudect 中的 `fixture.c` 和 `constant.c` #### 函式 `test_const` 測試常數時間的主要函式。會進行 10 次測試，每次測試會呼叫 `ENOUGH_MEASURE / (N_MEASURES - DROP_SIZE * 2) + 1` 次 `doit` 並將回傳值指派給變數 `result`。 * 其中 `ENOUGH_MEASURE` = 10000，`N_MEASURES` = 150，`DROP_SIZE` = 20。 * 10次測試中若 `result` 為真，則跳出迴圈並回傳 True。 #### 函式 `doit` 與 `dudect_main` 之比較 |doit (lab0-c)|dudect_main (dudect.h)| |-|-| |輸入僅有 `mode`，其他參數以`N_MEASURES`初始化 |以結構體 `dudect_ctx_t` 做為輸入| |執行 `prepare_inputs` ，初始化 `input_data`|執行 `prepare_inputs`，初始化 `input_data`| |回傳值 `ret` 由 `measure` 取得|執行 `measure`，測量 `exec_times`| |執行 `differentiate` 和 `update_statistics`|回傳值 `ret` 被指派為 `DUDECT_NO_LEAKAGE_EVIDENCE_YET`| |`ret` 和 `report` 的回傳值取 `&` 後再回傳|若為第一次執行 `dudect_main`，則執行 `prepare_percentiles`。否則執行`update_statistics` 並更新 `ret` 為 `report` 之回傳值| #### 結構體比較 lab0-c ```c typedef struct { double mean[2];// double m2[2]; // double n[2]; // } t_context_t; ``` dudect.h ```c typedef struct { double mean[2];// double m2[2]; // double n[2]; // } t_context_t; ``` ```c typedef struct { int64_t *ticks; int64_t *exec_times; uint8_t *input_data; uint8_t *classes; dudect_config_t *config; ttest_ctx_t *ttest_ctxs[DUDECT_TESTS]; int64_t *percentiles; } dudect_ctx_t; ``` #### 函式 `measure` 之比較 |lab0-c|dudect.h| |-|-| | 利用 `mode` 判別當前是否為 `insert_head` `insert_tail` `remove_head` `remove_tail`|單一模式| | || #### 函式 `differentiate` 利用`measure` 取得的紀錄，計算執行時間 #### 函式 `update_statistics` 之比較 |lab0-c|dudect.h| |-|-| |若執行時間非負，直接執行 t 測試 |若執行時間非負，直接執行 t 測試| |無 |對利用 `percentiles` 刪減極端值後的執行時間進行 t 測試| |無 |當測量值超過10000次則進行二次測試| #### 函式 `report` 之比較 |lab0-c|dudect.h| |-|-| |無輸入 |`ctx` 為輸入 | ||| #### 函式 `prepare_percentiles` 產生 `percentiles` 用以篩除極端值 #### 函式對應關係 |[lab0-c](https://github.com/sysprog21/lab0-c/tree/master/dudect) | [dudect.h](https://github.com/oreparaz/dudect/blob/master/src/dudect.h) | | - | - | |doit |dudect_main| |prepare_inputs |prepare_inputs | |measure |measure | |無 |prepare_percentiles | |update_statistics |update_statistics | |report |report | ### 修改 lab0-c 中的 dudect 程式碼 #### 新增 prepare_percentiles 相關函式以篩除極端值對函式輸入參數格式進行些微修改，由於原程式碼使用結構體 `dudect_ctx_t` 存放 exec_times 等資訊，而 lab0-c 中沒有，因此需將其作為參數輸入。 ```c static int cmp(const int64_t *a, const int64_t *b) static int64_t percentile(int64_t *a_sorted, double which, size_t size) static void prepare_percentiles(int64_t *exec_times,int64_t *percentiles) ``` ```diff +#define DUDECT_NUMBER_PERCENTILES 100 static bool doit(int mode) { + int64_t *percentiles = calloc(DUDECT_NUMBER_PERCENTILES, sizeof(int64_t)); + prepare_percentiles(exec_times, percentiles); ``` :::success :monkey: ***[commit e9eab10](https://github.com/jychen0611/lab0-c/commit/e9eab108d1e653618024676c81eaecec5473df87)*** Add the prepare_percentiles function to filter out extreme values [What?] 針對檢測程式執行時間的程式碼進行修改，使測試程式近似原 dudect 程式 [Why?] 原測試程式無實作 `prepare_percentiles` 功能，可能導致極端值影響判斷 [How?] 將相關程式碼加入 fixture.c ，並對函式輸入參數格式進行些微修改。由於原程式碼使用結構體 `dudect_ctx_t` 存放 exec_times 等資訊，而 lab0-c 中沒有，因此需將其作為參數輸入。 ::: 修改完後發現可以通過測試項目 17 ![image](https://hackmd.io/_uploads/BJWoBbekA.png) --- ## 在 qtest 提供新的命令 shuffle >在 qtest 提供新的命令 shuffle，允許藉由 Fisher–Yates shuffle 演算法，對佇列中所有節點進行洗牌 (shuffle) 操作，需要以統計的原理來分析你的實作，探究洗牌的「亂度」 ### 在 `queue.c` 新增 `q_shuffle` 函式 >根據 [Fisher–Yates shuffle 演算法來實作洗牌](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm) [1] 先用 q_size 取得 queue 的大小 len。 [2] 隨機從 0 ~ (len - 1) 中抽出一個數字 random，然後 old 將指向從前面數來第random 個節點，new 會指向最後一個未被抽到的節點，將 old 和 new 指向的節點的值交換，再將 len - 1。 [3] 隨著 len 大小變小，已經被抽到過，並交換值到 queue 後面的會愈來愈多，直到所有的節點都已經被抽到過，shuffle 就結束。在 qtest.c 新增函式 q_shuffle，並新增 shuffle 命令以便呼叫 q_shuffle。 ```diff + void q_shuffle(struct list_head *head) + static bool do_shuffle(int argc, char *argv[]) + ADD_COMMAND(shuffle, "Using the Fisher-Yates shuffle algorithm to shuffle", ""); ``` :::success :monkey: ***[commit 5816f1b](https://github.com/jychen0611/lab0-c/commit/5816f1bc1b05c9f8c90b2079bd96a2f3bfbde4e5)*** Add "shuffle" command [What?] 在 qtest.c 新增函式 q_shuffle，並新增 shuffle 命令以便呼叫 q_shuffle。 [Why?] 依作業要求在 qtest 提供新的命令 shuffle ，以探究洗牌亂度。 [How?] 參考 Fisher–Yates shuffle 演算法，對佇列中所有節點進行洗牌操作。 ::: ### 利用[測試程式](https://hackmd.io/@sysprog/linux2024-lab0/%2F%40sysprog%2Flinux2024-lab0-d#%E6%B8%AC%E8%A9%A6%E7%A8%8B%E5%BC%8F)統計 shuffle 結果發現我一開始的做法有誤，結果並不平均 ![image](https://hackmd.io/_uploads/Hy6lF0O1R.png) 原因出在我把 `srand(time(NULL))` 寫在 q_shuffle 函式中，導致`srand(time(NULL))`會被重複呼叫。由於電腦指令跑很快，幾乎同時間的亂數會很多，使得重複相同亂數會很多。當我把 `srand(time(NULL))` 移至主函式後，雖然分佈遍均勻了，卻又發現我寫的程式有死角，只有一半的情況會發生。 ![image](https://hackmd.io/_uploads/rkNcORd1C.png) 後來發現原因出在指派 new 指標時少走一次 `next`，修改後問題已改善。 ![image](https://hackmd.io/_uploads/ry-v3AOkC.png) :::success :monkey: ***[commit 66d9a67](https://github.com/jychen0611/lab0-c/commit/66d9a67f484ee1ed2f039d226d645ea547444fdd)*** Revise "shuffle" command [What?] 修正 q_shuffle 函式使其能正確運作，達到均勻亂數洗牌 [Why?] 原因出在我把 `srand(time(NULL))` 寫在 q_shuffle 函式中，導致`srand(time(NULL))`會被重複呼叫。由於電腦指令跑很快，幾乎同時間的亂數會很多，使得重複相同亂數會很多。當我把 `srand(time(NULL))` 移至主函式後，雖然分佈遍均勻了，卻又發現我寫的程式有死角，只有一半的情況會發生。後來發現原因出在指派 new 指標時少走一次 `next`，修改後問題已改善。 [How?] 修正 `srand(time(NULL))` 位置，修正 new 指派位置。 ::: --- ## 比較亂數產生器的品質 >在 qtest 中執行 option entropy 1 並搭配 ih RAND 10 一類的命令，觀察亂數字串的分布，並運用「資訊與熵互補，資訊就是負熵」的觀念，設計一組新的命令或選項，得以在 qtest 切換不同的 PRNG 的實作 (可選定 Xorshift)，搭配統計分析工具，比較亂數產生器 (如 Xorshift vs. 內建) 的品質 >* 參見: 由大數法則理解訊息熵 (entropy) 的數學意義 >* 參見: The Linux Pseudorandom Number Generator Revisited ## web >確保 qtest 提供的 web 命令得以搭配上述佇列實作使用，目前一旦網頁伺服器啟動後，原本處理命令列操作的 linenoise 套件就無法使用，請提出改善機制 >* 提示: 使用 select 系統呼叫 # 2024q1 Homework3 (Tic-Tac-Toe) >* 將 Linux 核心的 [linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) 標頭檔與 `hlist` 相關的程式碼抽出，成為 `hlist.h`，並確保後者可與 [lab0-c](https://github.com/sysprog21/lab0-c) 既有的 `list.h` 並存，[jserv/ttt](https://github.com/jserv/ttt) 專案的 `zobrist.[ch]` 會用到 `hlist.h` >* 修改 `qtest` 程式，使其新增 `ttt` 命令，其作用是執行與人類對弈的井字遊戲，並使用 [Monte Carlo tree search](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search) (MCTS) 演算法，確保使用定點數來取代原本 [jserv/ttt](https://github.com/jserv/ttt) 的浮點數運算。 >* 擴充上述 `ttt` 命令，在 `qtest` 引入新的 `option` (參照 `cmd>` 命令提示的 ` help` 說明文字)，允許使用者切換「人 vs.電腦」或「電腦 vs. 電腦」的對弈，注意後者應該採取 MCTS 以外的演算法 (也可納入你發展的演算法)。對弈的過程中，要在螢幕顯示當下的時間 (含秒數)，並持續更新。 >* 針對 `ttt` 命令的「電腦 vs. 電腦」的對弈模式，參照〈[並行程式設計：排程器原理](https://hackmd.io/@sysprog/concurrent-sched)〉，引入若干 coroutine (注意：不得使用 POSIX Thread，採取教材提到的實作方式)，分別對應到 AI~1~, AI~2~ (具備玩井字遊戲的人工智慧程式，採取不同的演算法) 和鍵盤事件處理，意味著使用者可連續觀看**多場**「電腦 vs. 電腦」的對弈，當按下 Ctrl-P 時暫停或回復棋盤畫面更新 (但運算仍持續)、當按下 Ctrl-Q 時離開「電腦 vs. 電腦」的對弈，注意不該使用 (n)curses 函式庫。當離開「電腦 vs. 電腦」的對弈時，`ttt` 應顯示多場對弈的過程，亦即類似 `Moves: B2 -> C2 -> C3 -> D3 -> D4` 的輸出，儘量重用現有的結構體和程式碼。 > 關於鍵盤事件和終端機畫面的處理機制，可參見 [mazu-editor](https://github.com/jserv/mazu-editor) 原始程式碼和〈[Build Your Own Text Editor](https://viewsourcecode.org/snaptoken/kilo/)〉。 >* 亂數產生器對於 MCTS 相當重要，可參照 [Improving performance and efficiency of PRNG](https://github.com/jserv/ttt/issues/32)，嘗試引入其他快速的 PRNG 實作並比較 MCTS 實作獲得的效益。 >* 在「電腦 vs. 電腦」的對弈模式，比照前述 load average 的計算方式，規範針對下棋 AI 的 load 機制，並運用定點數來統計不同 AI 實作機制對系統負載的使用率，整合到 `qtest` 輸出，開發過程中應有對應的數學分析和羅列實作考量因素。 >* 考慮到畫面更新不該混雜其他 `printf` 的訊息，可運用 [GDB 進行遠端除錯](https://hackmd.io/@sysprog/gdb-example)。 ## 將 jserv/ttt 專案的程式碼部分整合進 lab0-c ### 引入 hash list 標頭檔 :::success :monkey: ***[commit 979d357](https://github.com/sysprog21/lab0-c/commit/979d357d68f856ba3d21c4333ada4f388c1f2997)*** Add header file of hash list [What?] 新增 hlist.h 標頭檔 [Why?] 依作業要求整合 ttt 專案部份程式碼到 lab0-c [How?] 將 Linux 核心的 linux/list.h 標頭檔與 hlist 相關的程式碼抽出，成為 hlist.h，並確保後者可與 lab0-c 既有的 list.h 並存，jserv/ttt 專案的 zobrist.[ch] 會用到 hlist.h ::: ### 新增 ttt 命令執行與人類對弈的井字遊戲 (使用 Monte Carlo tree search) :::success :monkey: ***[commit 2c249ec](https://github.com/sysprog21/lab0-c/commit/2c249ecce2dffb2a13c3c59522c7c21bb1595905)*** Add Noughts and Crosses game command [What?] 修改 qtest 程式，使其新增 ttt 命令 [Why?] 為了實現 AI 與人類對弈的井字遊戲 [How?] 將 jserv/ttt 專案的程式碼部分整合進 lab0-c，，使用 Monte Carlo tree search (MCTS) 演算法作為 AI 玩家，演算法仍使用浮點數運算。 ::: ![image](https://hackmd.io/_uploads/BJbtD3Ql0.png) #### Monte Carlo tree search 原理 **UCT value** $$UCT = \frac{w_i}{n_i} + c\sqrt{\frac{\ln{t}}{n_i}}$$ * $w_i$ : the number of simulated wins from that node after the i-th move. * $n_i$ : the number of simulations that have occured for that node after the i-th move. * $c$ : the exploration parameter * $t$ : the total number of simulations that have occured after i moves. 其中$\frac{w_i}{n_i}$ 代表利用，$c\sqrt{\frac{\ln{t}}{n_i}}$代表探索，其作用在於探詢可能的路徑，避免有更好的路徑沒被開發(可由式子中發現使用次數越少的節點 $\sqrt{\frac{\ln{t}}{n_i}}$ 會越大，且當模擬次數越多時探索力度會稍微加大，但其加大幅度會遞減)。調整 $c$ 值可改變探索與利用的比例。 **four basic phases to MCTS:** * selection : 選擇 USB 最高的節點形成一條路徑 * expansion : 生成 selection 所選路徑的下一步所有狀態，並隨機挑選一種狀態 * simulation : 從上步所選狀態開始模擬試玩，直到分出勝負或平手 * back-propagation : 依模擬結果打分數，並反向傳播給整條路徑上的節點，以更新 UCB 值 ### 設計以定點數運算取代浮點數運算的蒙地卡羅搜尋樹演算法 > Robert Love 在 Linux Kernel Development 一書論及浮點運算: >>No (Easy) Use of Floating Point When using floating-point instructions kernel normally catches a trap and then initiates the transition from integer to floating point mode. Unlike user-space, the kernel does not have the luxury of seamless support for floating point because it cannot easily trap itself. Using a floating point inside the kernel requires manually saving and restoring the floating point registers. Except in the rare cases, no floating-point operations are in the kernel. ***以下實驗使用 Ubuntu 24.04 LTS 進行開發*** ```shell! $ uname -r 6.8.0-31-generic $ gcc --version gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 $ lscpu 架構： x86_64 CPU 作業模式： 32-bit, 64-bit Address sizes: 46 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 22 On-line CPU(s) list: 0-21 供應商識別號： GenuineIntel Model name: Intel(R) Core(TM) Ultra 7 155H CPU 家族： 6 型號： 170 每核心執行緒數： 2 每通訊端核心數： 16 Socket(s): 1 製程： 4 CPU(s) scaling MHz: 25% CPU max MHz: 4800.0000 CPU min MHz: 400.0000 BogoMIPS: 5990.40 ``` 原 MCTS 程式碼在計算分數時會使用到浮點數運算，改以定點數表示之 ```diff +typedef long fixed_point; +#define shift 10 //frac = 2^10 +#define EXPLORATION_FACTOR fixed_sqrt(2) struct node { - double score; + fixed_point score; }; ``` * 定點數縮放因子 * $frac=2^{10}$ * 1 $\leftrightarrow$ $2^{10}$ * 0.5 $\leftrightarrow$ $2^{9}$ * 0 $\leftrightarrow$ 0 * -1 $\leftrightarrow$ $-2^{10}$ * DBL_MAX $\leftrightarrow$ LONG_MAX * 加法減法 : 可直接進行 * 乘法 : 結果要多除 ${frac}$ * 除法 : 結果要多乘 ${frac}$ * 自然對數 $ln$ 根據泰勒展開式可得 $$lnx = (\frac{x-1}{x}) + \frac{1}{2}(\frac{x-1}{x})^2 + \frac{1}{3}(\frac{x-1}{x})^3 +\ ...$$ $$= \sum_{n=1}^{\infty} \frac{1}{n}(\frac{x-1}{x})^n$$ 使用 frac = 1024 實作 (利於使用 bitwise) * 其在 1<x<100 之表現 : ![log](https://hackmd.io/_uploads/ByckCHyM0.png) * 如圖可知 `UCT value` 的分佈多位於 0~3.5 之間，但 ln的輸入為 t (the total number of simulations that have occured after i moves)必為整數，因此不須特別考量 0<t<1 之情況。 ![uct](https://hackmd.io/_uploads/SyzN6LJG0.png) * 平方根 * 使用十分逼近法求平方根 ![squa](https://hackmd.io/_uploads/SJBBBo1z0.png) * 可發現當 x<4 會求出 0 ![squa](https://hackmd.io/_uploads/SkIiriJzC.png) * 發現問題出在逼近值的上限設成$\frac{x}{2}$，調整後可正常擬合 ![squa](https://hackmd.io/_uploads/BkFRDsyMR.png) * 0<x<1 的表現也正常 ![squa](https://hackmd.io/_uploads/HJf7uiJzA.png) ![image](https://hackmd.io/_uploads/HyjVMqlzA.png) 當我把定點數 ln 和平方根運算寫入 ttt 遊戲中才發現，如果要追求極高的精度只會導致演算法下棋速率緩慢，該如何權衡精度與運算速率是一大問題。 ### UCT value 分佈比較 #### 依序輸入命令 c3 c2 c1 * 可發現使用定點數的 UCT value 主要集中在特定數值上，相較浮點數很不均勻 ![uct](https://hackmd.io/_uploads/SyXKgqeMR.png) ![uct](https://hackmd.io/_uploads/S1zLnogGR.png) ![uct](https://hackmd.io/_uploads/S1UG6oeMC.png) ## 參考資料 * [C99 standard](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) * [Linux 核心專題: 改進 lib/list_sort.c](https://hackmd.io/@sysprog/Hy5hmaKBh) * [Tic Tac Toe at the Monte Carlo](https://medium.com/swlh/tic-tac-toe-at-the-monte-carlo-a5e0394c7bc2)