# 2023q1 Homework1 (lab0) contributed by < [`Cycatz`](https://blog.staque.xyz/about/about.html) > ## 開發環境 ```shell $ gcc --version gcc (GCC) 12.2.1 20230201 Copyright (C) 2022 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 48 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 16 On-line CPU(s) list: 0-15 Vendor ID: AuthenticAMD Model name: AMD Ryzen 7 PRO 6850U with Radeon Graphics CPU family: 25 Model: 68 Thread(s) per core: 2 Core(s) per socket: 8 Socket(s): 1 Stepping: 1 Frequency boost: enabled CPU(s) scaling MHz: 39% CPU max MHz: 4767.1870 CPU min MHz: 1600.0000 BogoMIPS: 5391.33 ``` ## queue.c 開發過程 ### Overview 使用 `make test` 測試程式完成狀況: ``` shell scripts/driver.py -c --- Trace Points +++ TESTING trace trace-01-ops: # Test of insert_head and remove_head --- trace-01-ops 5/5 +++ TESTING trace trace-02-ops: # Test of insert_head, insert_tail, remove_head, remove_tail, and delete_mid --- trace-02-ops 6/6 +++ TESTING trace trace-03-ops: # Test of insert_head, insert_tail, remove_head, reverse and merge --- trace-03-ops 6/6 +++ TESTING trace trace-04-ops: # Test of insert_head, insert_tail, size, swap, and sort --- trace-04-ops 6/6 +++ TESTING trace trace-05-ops: # Test of insert_head, insert_tail, remove_head, reverse, size, swap, and sort --- trace-05-ops 6/6 +++ TESTING trace trace-06-ops: # Test of insert_head, insert_tail, delete duplicate, sort, descend and reverseK --- trace-06-ops 6/6 +++ TESTING trace trace-07-string: # Test of truncated strings --- trace-07-string 6/6 +++ TESTING trace trace-08-robust: # Test operations on empty queue --- trace-08-robust 6/6 +++ TESTING trace trace-09-robust: # Test remove_head with NULL argument --- trace-09-robust 6/6 +++ TESTING trace trace-10-robust: # Test operations on NULL queue --- trace-10-robust 6/6 +++ TESTING trace trace-11-malloc: # Test of malloc failure on insert_head --- trace-11-malloc 6/6 +++ TESTING trace trace-12-malloc: # Test of malloc failure on insert_tail --- trace-12-malloc 6/6 +++ TESTING trace trace-13-malloc: # Test of malloc failure on new --- trace-13-malloc 6/6 +++ TESTING trace trace-14-perf: # Test performance of insert_tail, reverse, and sort --- trace-14-perf 6/6 +++ TESTING trace trace-15-perf: # Test performance of sort with random and descending orders # 10000: all correct sorting algorithms are expected pass # 50000: sorting algorithms with O(n^2) time complexity are expected failed # 100000: sorting algorithms with O(nlogn) time complexity are expected pass --- trace-15-perf 6/6 +++ TESTING trace trace-16-perf: # Test performance of insert_tail --- trace-16-perf 6/6 +++ TESTING trace trace-17-complexity: # Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant ERROR: Probably not constant time or wrong implementation ERROR: Probably not constant time or wrong implementation ERROR: Probably not constant time or wrong implementation Probably constant time --- trace-17-complexity 0/5 --- TOTAL 95/100 make: *** [Makefile:60: test] Error 1 ``` 目前只差時間複雜度的測試沒有通過,但暫時沒有想到進一步提升效率的做法,因此先擱置。 ### `q_new` 呼叫 `INIT_LIST_HEAD` 以初始化串列開頭: ``` c struct list_head *q_new() { element_t *q = malloc(sizeof(element_t)); if (q == NULL) return NULL; q->value = NULL; INIT_LIST_HEAD(&q->list); return &q->list; } ``` ### `q_free` `q_free` 會先檢查傳入的 `list_head` 是否為 `NULL`,避免對空指標進行引用,導致未定義行為。 檢查之後會使用 `delete_node` 函式從串列中移除元素並進行記憶體釋放,最後則是釋放串列的開頭。 ``` c void q_free(struct list_head *l) { if (l == NULL) return; element_t *cur, *next; list_for_each_entry_safe (cur, next, l, list) { delete_node(cur); } /* free the head */ cur = list_entry(l, element_t, list); free(cur); } ``` 而 `delete_node` 是我自己定義的函式,可供 `q_delete_mid` 與 `q_delete_dup` 重複使用,減少程式碼行數: ``` c static void delete_node(element_t *e) { list_del(&e->list); if (e->value) free(e->value); free(e); } ``` :::warning `if (e->value)` 敘述非必要,請閱讀 free(3) :notes: jserv > 已於 [386909](https://github.com/Cycatz/lab0-c/commit/386909b34c53d89864a6cd65bb8e2f18811a42f9) 修改 > 根據 free(3): > > If ptr is NULL, no operation is performed. > > 因此不需要進行多餘 `e->value` 的判斷。 ::: ```diff @@ -13,8 +13,7 @@ static void delete_node(element_t *e) { list_del(&e->list); - if (e->value) - free(e->value); + free(e->value); free(e); } ``` ### `q_insert_(head|tail)` `q_insert_head` 除了檢查 `head` 是否為空指標以外,還會檢查字串是否為空指標。 接著元素結構 `element_t` 會被分配,該元素的資料 `node->value` 也會被分配空間並複製傳入的字串。最後再使用 `list_add` 將元素插入到串列的開頭。 ``` c /* Insert an element at head of queue */ bool q_insert_head(struct list_head *head, char *s) { if (head == NULL || s == NULL) return false; element_t *node = malloc(sizeof(element_t)); if (node == NULL) return false; size_t len = strlen(s); node->value = malloc(len + 1); if (node->value == NULL) { free(node); return false; } strncpy(node->value, s, len); node->value[len] = '\0'; list_add(&node->list, head); return true; } ``` :::warning 撰寫更精簡的程式碼。 > 已於 [695175](https://github.com/Cycatz/lab0-c/commit/695175785e676a6627b087f75cfd470479b79de5) 改善,使用 `strdup` 函式減少程式碼行數。 ::: `q_insert_tail` 跟 `q_insert_head` 的實作大同小異,唯一差別只在於使用 `list_add_tail` 來插入元素到最尾端: ``` c /* Insert an element at tail of queue */ bool q_insert_tail(struct list_head *head, char *s) { if (head == NULL || s == NULL) return false; element_t *node = malloc(sizeof(element_t)); if (node == NULL) return false; size_t len = strlen(s); node->value = malloc(len + 1); if (node->value == NULL) { free(node); return false; } strncpy(node->value, s, len); node->value[len] = '\0'; list_add_tail(&node->list, head); return true; } ``` 更新,使用 `strdup` 來進行字串的複製操作: ```diff @@ -70,13 +70,18 @@ bool q_insert_head(struct list_head *head, char *s) if (node == NULL) return false; + node->value = strdup(s); - size_t len = strlen(s); - node->value = malloc(len + 1); if (node->value == NULL) { free(node); return false; } - strncpy(node->value, s, len); - node->value[len] = '\0'; - list_add(&node->list, head); - return true; } @@ -90,12 +95,16 @@ bool q_insert_tail(struct list_head *head, char *s) if (node == NULL) return false; + node->value = strdup(s); - size_t len = strlen(s); - node->value = malloc(len + 1); if (node->value == NULL) { free(node); return false; } - strncpy(node->value, s, len); - node->value[len] = '\0'; - list_add_tail(&node->list, head); return true; } ``` ### `q_remove_(head|tail)` `q_remove_head` 額外使用 `list_empty()` 檢查串列是否為空。 接著移除掉開頭的元素並複製該字串到傳入的空間中。 這裡要注意可能傳入的空間 `sp` 可能為空指標,或者是 `bufsize` 為 0 。 ``` c /* Remove an element from head of queue */ element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize) { if (head == NULL || list_empty(head)) return NULL; struct list_head *first = head->next; list_del(first); element_t *entry = list_entry(first, element_t, list); size_t slen, dlen; if (!entry->value || !sp || bufsize == 0) return entry; slen = strlen(entry->value); dlen = slen < (bufsize - 1) ? slen : (bufsize - 1); strncpy(sp, entry->value, dlen); sp[dlen] = '\0'; return entry; } ``` `q_remove_tail` 邏輯也差不多,只不過刪除的是最後一個元素 ``` c /* Remove an element from tail of queue */ element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize) { if (head == NULL || list_empty(head)) return NULL; struct list_head *last = head->prev; list_del(last); element_t *entry = list_entry(last, element_t, list); size_t slen, dlen; if (!entry->value || !sp || bufsize == 0) return entry; slen = strlen(entry->value); dlen = slen < (bufsize - 1) ? slen : (bufsize - 1); strncpy(sp, entry->value, dlen); sp[dlen] = '\0'; return entry; } ``` ### `q_size` `q_size` 逐一走訪所有串列的元素以得到串列的長度: :::danger 注意 [資訊科技詞彙翻譯](https://hackmd.io/@sysprog/it-vocabulary) :notes: jserv ::: :::success 已修改:「遍歷」 -> 「逐一走訪」 ::: ``` c /* Return number of elements in queue */ int q_size(struct list_head *head) { int len = 0; const struct list_head *node; if (!head) return 0; list_for_each (node, head) len++; return len; } ``` ### `q_delete_mid` `q_delete_mid` 先使用 `find_mid` 來找到串列中間的元素,然後再用上面提到過的 `delete_node` 進行刪除: ``` c /* Delete the middle node in queue */ bool q_delete_mid(struct list_head *head) { if (head == NULL || list_empty(head)) return false; size_t size = q_size(head); struct list_head *mid = find_mid(head->next, size); element_t *entry = list_entry(mid, element_t, list); delete_node(entry); return true; } ``` ``` c static struct list_head *find_mid(struct list_head *head, size_t len) { size_t mid = len / 2; struct list_head *entry = head; while (1) { if (mid-- == 0) break; entry = entry->next; } return entry; } ``` ### `q_delete_dup` 在 `q_delete_dup` 中有定義兩個變數: - `bool dup`: 記錄目前元素是否為重複 - `element_t *ori`: 記錄新遇到的元素 函式的流程是: 1. 逐一走訪串列的每個元素 2. 當發現目前的元素的值與其一個元素值相等時,會刪除目前的元素,並設定 `dup = true` ,代表有重複情形發生 3. 若不相等,檢查 `dup` 判斷是否有重複狀況發生,如果有,則會刪除 `ori`。最後再更新 `dup = false` 與設定 `ori` 為目前元素 ``` c /* Delete all nodes that have duplicate string */ bool q_delete_dup(struct list_head *head) { // https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/ if (head == NULL || list_empty(head)) return false; bool dup = false; element_t *entry, *safe, *ori = NULL; list_for_each_entry_safe (entry, safe, head, list) { if (ori && ori->value && entry->value && strcmp(ori->value, entry->value) == 0) { delete_node(entry); dup = true; } else { if (dup) delete_node(ori); ori = entry; dup = false; } } if (dup) delete_node(ori); return true; } ``` ### `q_swap` 針對兩個元素交換,要處理兩元素之間的關係共 2 條連結: - `former->next` - `latter->prev` 我們還需要處理 `former` 的前一個元素與 `latter` 的後一個元素的連結: - `former->prev` - `former->prev->next` - `latter->next` - `latter->next->prev` 因此要處理 2 + 4 共 6 條連結 實作上我使用了 `list_for_each` 來遍歷了所有元素並修改它們之間的關係。 ``` c /* Swap every two adjacent nodes */ void q_swap(struct list_head *head) { if (head == NULL) return; struct list_head *former, *latter; list_for_each (former, head) { latter = former->next; if (latter != head) { latter->prev = former->prev; former->prev->next = latter; former->next = latter->next; latter->next->prev = former; latter->next = former; former->prev = latter; } } } ``` 這時你可能會問為什麼我用的不是 `list_for_each_safe`,難道不怕改變連結後出現錯誤例如出現無窮迴圈?答案是不會,因為改變連結後,目前的元素指標 `former` 恰好會指向下下個元素,這正好是我們要的。 實作了 `q_reverseK` 之後,我們也可以將 `q_swap` 改寫成: ``` c void q_swap(struct list_head *head) { q_reverseK(head, 2); } ``` ### `q_reverse` `q_reverse` 可以分成三個步驟: 1. 將所有元素的 `prev` 指向下一個元素 `next` 2. 將所有元素的 `next` 指向前一個元素,走訪時需要額外記錄前一個元素(因為 `prev` 已經被覆寫了) 3. 最後再交換 `head->prev` 與 `head->next` 的值即可 ``` c void q_reverse(struct list_head *head) { if (head == NULL) return; struct list_head *entry, *former = head, *safe; list_for_each (entry, head) { entry->prev = entry->next; } entry = head->next; safe = entry->next; while (1) { /* Suppress unreadVariable */ (void) safe; if (entry == head) break; safe = entry->next; entry->next = former; former = entry; entry = safe; } /* Swap head's next and prev */ struct list_head *tmp; tmp = head->next; head->next = head->prev; head->prev = tmp; } ``` 實作了 `q_reverseK` 之後,我們也可以將 `q_reverse` 改寫成: ``` c void q_reverse(struct list_head *head) { q_reverseK(head, q_size(head)); } ``` ### `q_reverseK` 方法是集滿 `k` 個元素就往回串,重複此過程,最後再重建往前的 path。但缺點是要處理 `cnt % k != 0` 的狀況,變得比較不優雅: ``` c /* Reverse the nodes of the list k at a time */ void q_reverseK(struct list_head *head, int k) { // https://leetcode.com/problems/reverse-nodes-in-k-group/ if (k <= 1 || head == NULL || list_empty(head)) return; LIST_HEAD(new_head); struct list_head *nhead = &new_head; struct list_head *cur, *next; cur = head->next; next = cur->next; int cnt = 0; while (1) { if (cur == head) break; if (++cnt % k == 0) { /* insert to tail */ struct list_head *pos = cur; for (int i = 0; i < k; i++) { nhead->next = pos; nhead = nhead->next; pos = pos->prev; } } cur = next; next = next->next; } /* Handle cnt % k != 0 condition */ size_t left = cnt % k; struct list_head *left_begin = head; while (1) { if (left-- == 0) break; left_begin = left_begin->prev; } if (cnt % k) { nhead->next = left_begin; nhead = head->prev; } /* Rebuild reverse path */ head->prev = nhead; head->next = new_head.next; nhead->next = head; struct list_head *prev = head; cur = head->next; while (1) { if (cur == head) break; cur->prev = prev; prev = cur; cur = cur->next; } } ``` 下面是我重寫的版本,只有一個主要的迴圈,改成每次迴圈都使用 `rcur` 串起來並同時設定好往前的邊,在 `cnt % k == 0` 時再接上去: ``` c /* Reverse the nodes of the list k at a time */ void q_reverseK(struct list_head *head, int k) { // https://leetcode.com/problems/reverse-nodes-in-k-group/ if (k <= 1 || head == NULL || list_empty(head)) return; LIST_HEAD(last_head); LIST_HEAD(rcur_head); struct list_head *last = &last_head, *rcur = &rcur_head; struct list_head *cur, *next, *first = NULL; int cnt = 0; size_t size = q_size(head); size_t maximum = (size / k) * k; cur = head->next; next = cur->next; while (maximum-- > 0) { /* Insert to head */ cur->next = rcur; cur->prev = rcur->prev; rcur->prev = cur; rcur = rcur->prev; if (++cnt == k) { struct list_head *last_new = rcur->prev->prev; rcur->prev = last; last->next = rcur; last = last_new; if (first == NULL) first = rcur; /* Reset rcur */ rcur = &rcur_head; INIT_LIST_HEAD(rcur); /* Reset cnt */ cnt = 0; } cur = next; next = next->next; } /* Directly connect to the rest elements*/ if (size != maximum) { last->next = cur; cur->prev = last; last = head->prev; } /* Occur when q_size < k */ if (first == NULL) first = head->next; /* Connect to head */ head->prev = last; last->next = head; head->next = first; first->prev = head; } ``` ### `q_sort` 首先我定義了一個靜態函式叫做 `merge_sort`,有兩個參數,分別是要排序的起始元素 `head` 已經要排序的元素數量 `len`。執行步驟如下: 1. 若長度小於等於 `1` ,即可直接回傳起始元素 2. 使用 `find_mid` 來切割左半邊與右半邊,並分別對兩邊呼叫 `merge_sort` 3. 兩邊都成功排序好之後,進行合併 ``` c static struct list_head *merge_sort(struct list_head *head, size_t len) { if (len == 0 || len == 1) { return head; } struct list_head *mid = find_mid(head, len); size_t l, r; l = len / 2; r = len - l; /* 2. sort each half */ struct list_head *l_head = merge_sort(head, l); struct list_head *r_head = merge_sort(mid, r); struct list_head *l_cur = l_head; struct list_head *r_cur = r_head; /* 3. merge two halves */ LIST_HEAD(new); struct list_head *tmp = &new; while (l > 0 && r > 0) { element_t *l_ele, *r_ele; l_ele = list_entry(l_cur, element_t, list); r_ele = list_entry(r_cur, element_t, list); if (strcmp(l_ele->value, r_ele->value) <= 0) { tmp->next = l_cur; tmp = tmp->next; l_cur = l_cur->next; l--; } else { tmp->next = r_cur; tmp = tmp->next; r_cur = r_cur->next; r--; } } if (l) tmp->next = l_cur; if (r) tmp->next = r_cur; return new.next; } ``` :::warning 簡化上述程式碼,例如最後幾行可改為 `tmp->next = (struct list_head *) ((uintptr_t) l_cur | (uintptr_t) r_cur);` :notes: jserv > 因為 `l_cur` 與 `r_cur` 都不為 NULL ,因此不能使用以上寫法。但可將 `tmp = tmp->next` 移至共同執行區域,並移除不需要變數宣告 `l_head` 與 `r_head`,詳細修改在 [9c6d7c6](https://github.com/Cycatz/lab0-c/commit/9c6d7c64e8ec22cd15dad83308edf69d736dd18e) > > 不懂 `uintptr_t` ,因此查閱[規格書](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) (ch7.18.1.4): > > The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer > > 因此指標可以轉換為 `uintptr_t` 以進行 bitwise 操作 ::: `q_sort` 直接呼叫 `merge_sort(head->next, len)`,排序好之後,我們需要復原每個元素的 `prev` 連結。 ``` c /* Sort elements of queue in ascending order */ void q_sort(struct list_head *head) { if (head == NULL || list_empty(head)) return; size_t len = q_size(head); struct list_head *new = merge_sort(head->next, len); struct list_head *cur = new, *prev = head; while (len-- > 0) { cur->prev = prev; prev = cur; cur = cur->next; } head->next = new; head->prev = prev; prev->next = head; } ``` ### `q_descend` 從串列尾部開始逐一走訪每個元素, `max` 儲存目前字典序最大值的字串,每次比較目前元素的字串與 `max` 的字典序大小,如果小於 `max` 即刪除: :::warning 留意 [資訊科技詞彙翻譯](https://hackmd.io/@sysprog/it-vocabulary) :notes: jserv > 已修改:「遍歷」 -> 「逐一走訪每個元素」 ::: ``` c int q_descend(struct list_head *head) { struct list_head *entry = head->prev, *pprev; char *max = NULL; size_t len = 0; pprev = entry->prev; while (1) { element_t *e; if (entry == head) break; e = list_entry(entry, element_t, list); if (!max || strcmp(e->value, max) >= 0) { max = e->value; len++; } else { delete_node(e); } entry = pprev; pprev = pprev->prev; } return len; } ``` ### `q_merge` 把所有的串列裡面的元素都串起來,然後呼叫 `q_sort` 一次排序好: ``` c /* Merge all the queues into one sorted queue, which is in ascending order */ int q_merge(struct list_head *head) { LIST_HEAD(mq); struct list_head *mqc = &mq; struct list_head *q_entry, *entry, *safe; /* Merge elements in each queue into one queue */ list_for_each (q_entry, head) { queue_contex_t *q_ctx = list_entry(q_entry, queue_contex_t, chain); struct list_head *q_head = q_ctx->q; list_for_each_safe (entry, safe, q_head) { mqc->next = entry; mqc = mqc->next; list_del(entry); } } /* Replace the head pointer */ struct list_head *first_queue = list_entry(head->next, queue_contex_t, chain)->q; first_queue->next = mq.next; first_queue->prev = mqc; mq.next->prev = first_queue; mqc->next = first_queue; q_sort(first_queue); return q_size(first_queue); } ``` 對於單個串列,其實我們不必將元素一個一個加入到新的串列裡面,我們可以只加該串列的開頭並跳到結尾就可以 (可以呼叫 `list_splice_init`) 。 並且呼叫 `q_sort` 會導致複雜度超級爛,是 $O(NlogN)$,$N$ 為所有元素的數量,待改善。 --- ## 研讀 Linux 核心原始程式碼的 [lib/list.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) ### 研讀 `list_sort` 的核心概念就是 merge sort ,只不過是從左到右,並採用 bottom-up 的方式來 merge 元素,直到排序完畢。 在 `list_sort` 中,元素會被分成兩群 `list` 和 `pending`: - `list` 指向未被處理的元素的開頭,一開始會指向第一個元素 (`head->next`) - 而 `pending` 存放的是 sorted sublists , sublist 會遵守長度遞減排序,使用 `prev` 存取上一個 sublist - 例如 pending 元素數量 `count = 7` 時, sublist 依序為 `4 -> 2 -> 1` 而 `list_sort` 過程主要可以分成兩個階段: 1. `list` 不為空,每回合 `pending` 按照目前總元素數量 `count` 決定需要合併的 sublists 並進行合併。合併結束後 `list` 移動一個元素到 `pending` 2. `list` 為空, 由後至前,由小到大合併 `pending` 裡面的 sublists :::danger 不需要完整張貼程式碼 (GitHub 更適合做這件事),程式碼在筆記中,只該是輔助解說,因此你只需要列出關鍵的程式碼。 :notes: jserv > 已刪除程式碼片段 ::: 例如 `list = [4, 3, 2, 1]` ,在排序的第一階段的過程會像這樣: === Count: 1 === List: [ 3 2 1 ] Pending -> [ 4 ] === Count: 2 === List: [ 2 1 ] Pending -> [ 3 ] -> [ 4 ] === Count: 3 === List: [ 1 ] Pending -> [ 2 ] -> [ 3 4 ] === Count: 4 === List: [ ] Pending -> [ 1 ] -> [ 2 ] -> [ 3 4 ] 可以發現 `list` 裡面的元素每回合移動到 `pending` 裡面,並且 `pending` 會在 `count = 2 ~ 3` 時合併 (`[3] + [4] = [3 4]`) 。 下面是 `list = [8, 7, 6, 5, 4, 3, 2, 1]` 的例子: === Count: 1 === List: [ 7 6 5 4 3 2 1 ] Pending -> [ 8 ] === Count: 2 === List: [ 6 5 4 3 2 1 ] Pending -> [ 7 ] -> [ 8 ] === Count: 3 === List: [ 5 4 3 2 1 ] Pending -> [ 6 ] -> [ 7 8 ] === Count: 4 === List: [ 4 3 2 1 ] Pending -> [ 5 ] -> [ 6 ] -> [ 7 8 ] === Count: 5 === List: [ 3 2 1 ] Pending -> [ 4 ] -> [ 5 6 ] -> [ 7 8 ] === Count: 6 === List: [ 2 1 ] Pending -> [ 3 ] -> [ 4 ] -> [ 5 6 7 8 ] === Count: 7 === List: [ 1 ] Pending -> [ 2 ] -> [ 3 4 ] -> [ 5 6 7 8 ] === Count: 8 === List: [ ] Pending -> [ 1 ] -> [ 2 ] -> [ 3 4 ] -> [ 5 6 7 8 ] 至於合併的方法是根據 `count` 從右邊數來第一個 bit '1' 的位置: - 例如 3 的二進位表示是 `0b0011` , 第一個遇到的 bit '1' 在 `k = 0` 的位置,因此合併兩個 size 為 $2^0 = 1$ 的 sublist ,生成 $2 ^ 1 = 2$ 大小的 sublist - 例如 6 的二進位表示是 `0b0110` , 第一個遇到的 bit '1' 在 `k = 1` 的位置 ,因此合併兩個 size 為 $2^1 = 2$ 的 sublist , 生成 $2 ^ 2 = 4$ 大小的 sublist 例外是 `count` 為 `2^k` 則不進行合併 :::warning **TODO**: 閱讀證明 ::: ### 引入程式碼與進行效能分析 我的實作放在 [list-sort-impl](https://github.com/Cycatz/lab0-c/tree/list-sort-impl) 分支,主要有兩個改動: 1. 使用巨集 `USE_LIST_SORT` 切換我自己的 `q_sort` 實作與 Linux 核心的 `list_sort` 實作 2. 使用巨集 `LIST_SORT_DETAILS` 切換是否要印出 `list` 與 `pending` 的元素內容,如前面範例所示,方便理解演算法過程 使用以下命令進行編譯: ``` bash make LOCAL_CFLAGS="-DUSE_LIST_SORT -DLIST_SORT_DETAILS" ``` 因為使用巨集來切換 sort 模式有點不方便,每次都需要重新編譯。因此我新增一個子命令叫做 `list_sort`,使用方法跟 `sort` 一樣: ```shell $ ./qtest cmd> help Commands: ... list_sort | Sort queue in ascending order with list sort ``` 然後我們撰寫測試檔案,名稱為`traces/(trace-test-sort|trace-test-list_sort).cmd`,其內容為排序 1000000 個隨機元素: ``` option fail 0 option malloc 0 new it RAND 1000000 list_sort free ``` 使用 [hyperfine](https://github.com/sharkdp/hyperfine) 進行初步的時間分析: ```shell $ hyperfine -- "./qtest -f traces/trace-test-sort.cmd 2>&1 >/dev/null" Benchmark 1: ./qtest -f traces/trace-test-sort.cmd 2>&1 >/dev/null Time (mean ± σ): 1.722 s ± 0.019 s [User: 1.498 s, System: 0.223 s] Range (min … max): 1.699 s … 1.768 s 10 runs ``` ```shell $ hyperfine -- "./qtest -f traces/trace-test-list_sort.cmd 2>&1 >/dev/null" Benchmark 1: ./qtest -f traces/trace-test-list_sort.cmd 2>&1 >/dev/null Time (mean ± σ): 1.613 s ± 0.017 s [User: 1.393 s, System: 0.220 s] Range (min … max): 1.593 s … 1.635 s 10 runs ``` 可以看到 list sort 比我實作的 sort 快大約 6%。 :::warning 注意用語:此處若用「略快」,則會在後續分析時,缺乏精準的比較標的。 :notes: jserv > 已修改: 「略快」 -> 6% ::: 再來我們可以使用 perf 命令進行更一步的效能分析: list sort: ```shell $ perf stat -r 10 -e cycles,instructions,branches,faults,migrations,cache-misses,cache-references,L1-dcache-load-misses,L1-icache-load-misses ./qtest -f traces/trace-test-list_sort.cmd 2>&1 >/dev/null Performance counter stats for './qtest -f traces/trace-test-list_sort.cmd' (10 runs): 6,247,634,709 cycles:u ( +- 0.26% ) (85.64%) 2,126,374,515 instructions:u # 0.34 insn per cycle ( +- 0.11% ) (85.64%) 495,365,631 branches:u ( +- 0.16% ) (85.64%) 35,230 faults:u ( +- 0.00% ) 0 migrations:u 45,040,410 cache-misses:u # 42.331 % of all cache refs ( +- 0.22% ) (85.76%) 106,261,207 cache-references:u ( +- 0.12% ) (85.83%) 62,674,722 L1-dcache-load-misses:u ( +- 0.32% ) (85.83%) 40,349 L1-icache-load-misses:u ( +- 8.90% ) (85.65%) 1.61549 +- 0.00577 seconds time elapsed ( +- 0.36% ) ``` my sort: ```shell $ perf stat -r 10 -e cycles,instructions,branches,faults,migrations,cache-misses,cache-references,L1-dcache-load-misses,L1-icache-load-misses ./qtest -f traces/trace-test-sort.cmd 2>&1 >/dev/null Performance counter stats for './qtest -f traces/trace-test-sort.cmd' (10 runs): 6,745,005,859 cycles:u 2,235,144,329 instructions:u # 0.33 insn per cycle 529,523,395 branches:u 35,228 faults:u 0 migrations:u 50,563,551 cache-misses:u # 44.978 % of all cache refs 112,671,718 cache-references:u 65,363,837 L1-dcache-load-misses:u 45,277 L1-icache-load-misses:u 1.73245 +- 0.00838 seconds time elapsed ( +- 0.48% ) ``` 可以發現 list sort 的 cache misses 比例比 my sort 的比例只低了約 2.6% ,因此 my sort 的效能比 list sort 低落並不完全是 cache misses 的緣故。 但可以發現 my sort 執行的 instructions 的數量比 list sort 多了約 5 % ,因此整體執行的 cycles 數增加,造成執行時間較長。我們也可以從各自的 L1 icache misses 數量印證。 --- ## 在 qtest 提供新的命令 `shuffle` 這邊我實作 [Fisher–Yates shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm) 的方法是從開頭開始,而不是從尾部。 每次使用 `randombytes` 挑選一個 index $j$, $idx \leq j \leq len$ ,並使用 `target` 變數記錄該索引對應的元素。 ```c while (idx < len) { size_t tidx = 0; struct list_head *target = ent; randombytes((uint8_t *) &rand, sizeof(rand)); rand = rand % (len - idx); while (tidx++ != rand) { target = target->next; } /* Swap two ent and target */ ent = target->next; idx++; } ``` 接著交換目前的元素與隨機選取的元素。可以分成兩種情況:兩元素相鄰或是不相鄰,主要是避免有自己指向自己的情況發生 (`node->next = node`) ```c /* Swap two elements */ if (ent->next == target) { /* Avoid node->next = node */ target->prev = ent->prev; ent->prev->next = target; ent->next = target->next; target->next->prev = ent; target->next = ent; ent->prev = target; } else { tmp.next = ent->next; tmp.prev = ent->prev; ent->next = target->next; ent->prev = target->prev; target->next = tmp.next; target->prev = tmp.prev; ent->next->prev = ent; ent->prev->next = ent; target->next->prev = target; target->prev->next = target; } ``` :::warning **TODO**: 精簡程式碼 ::: ### 統計方法驗證 `shuffle` 我們使用以下程式對 `[1 2 3 4]` 進行 1000000 次的 shuffle: ```python #!/usr/bin/env python import subprocess import re import random from collections import Counter from itertools import permutations def gen_permutation(l): table = list(permutations(l, len(l))) return table def run_test(input_command, regex): command = './qtest -v 3' output = subprocess.run(command.split(), input=input_command, capture_output=True, encoding='utf-8').stdout m = regex.findall(output, re.M) return [tuple(l.split()) for l in m[1:]] def pearson_chi_square(e, o): return (o - e) * (o - e) / e def pearson_chi_squares(N, E, O, table): chi_square = sum([pearson_chi_square(E[i], O[table[i]]) for i in range(N)]) return chi_square def test_shuffle(arr=['1', '2', '3', '4'], times=1000000): command = "new\n" command += "".join([f"it {e}\n" for e in arr]) command += "shuffle\n" * times command += "free\n" command += "quit\n" regex_str = r"l = \[(\S+" regex_str += "".join([r" \S+" for _ in range(len(arr) - 1)]) regex_str += r")\]" regex = re.compile(regex_str) table = gen_permutation(arr) res = run_test(command, regex) c = Counter(res) N = len(table) e = times / N E = [e] * N chi_square = pearson_chi_squares(N, E, c, table) print(c) print(c.total()) print(E) print("Chi square is:", chi_square) print("df (degree of freedom) is:", N - 1) test_shuffle() ``` 結果如下: ``` Counter({('3', '4', '2', '1'): 42060, ('3', '2', '1', '4'): 42026, ('1', '4', '2', '3'): 42017, ('4', '3', '1', '2'): 41950, ('3', '4', '1', '2'): 41886, ('2', '4', '1', '3'): 41815, ('1', '2', '3', '4'): 41812, ('2', '3', '1', '4'): 41725, ('2', '1', '4', '3'): 41720, ('2', '1', '3', '4'): 41697, ('4', '1', '3', '2'): 41642, ('2', '4', '3', '1'): 41639, ('1', '4', '3', '2'): 41639, ('4', '1', '2', '3'): 41615, ('1', '3', '4', '2'): 41588, ('3', '1', '4', '2'): 41582, ('1', '3', '2', '4'): 41541, ('4', '2', '1', '3'): 41520, ('2', '3', '4', '1'): 41494, ('1', '2', '4', '3'): 41491, ('4', '2', '3', '1'): 41440, ('3', '1', '2', '4'): 41439, ('4', '3', '2', '1'): 41362, ('3', '2', '4', '1'): 41300}) 1000000 [41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664, 41666.666666666664] Chi square is: 24.764623999999998 df (degree of freedom) is: 23 ``` 我們假設虛無假說 $H_0$ 為 shuffle 的結果發生的機率相同,遵守 Uniform distribution ,而對應的對立假說 $H_1$ 即為 shuffle 的結果有至少有一個不一樣。 我們設定信心水準 (significant level) $\alpha$ 為 0.05 ,根據 chi square 的值 $24.76$ 與自由度 $23$ 查[卡方分佈表](https://people.richland.edu/james/lecture/m170/tbl-chi.html)找出 p 值,發現 p 介於 0.9 ~ 0.1 之間 ($14.848 < 24.76 < 32.007$ ),大於 0.05。 因此我們可以說統計檢定的結果不拒絕虛無假說 $H_0$ ,沒有直接證據推翻 shuffle 結果為 uniform distribution。