--- tags: linux2023 --- # [2023q1](http://wiki.csie.ncku.edu.tw/linux/schedule) 第 1 週測驗題 :::info 目的: 檢驗學員對 linked list 的認知 ::: ==[作答表單: 測驗 1](https://docs.google.com/forms/d/e/1FAIpQLSdOhxEanV_DrnM4or0rkU0Rbv-r0Sqy2iTkhjqEK3vRawWD0Q/viewform)== (針對 Linux 核心「設計」課程) ==[作答表單: 測驗 2-3](https://docs.google.com/forms/d/e/1FAIpQLScneFLpKZ6j2k1WpN6ALwQMjc4vAtE4qxQOI5LPVNKUjew-Ag/viewform)== (針對 Linux 核心「實作」課程) ### 測驗 `1` 給定 [list.h](https://github.com/sysprog21/lab0-c/blob/master/list.h) 作為〈[linked list 和非連續記憶體操作](https://hackmd.io/@sysprog/c-linked-list)〉提及的 Linux 核心風格 circular doubly-linked list 實作,欲處理的節點採用以下結構體: ```c #include <stdint.h> #include "list.h" struct item { uint16_t i; struct list_head list; }; ``` 用以對節點內含值比較的函式: ```c static inline int cmpint(const void *p1, const void *p2) { const uint16_t *i1 = (const uint16_t *) p1; const uint16_t *i2 = (const uint16_t *) p2; return *i1 - *i2; } ``` 以下程式碼實作快速排序法,數值由小到大排列,並確保可達到 [stable sorting](https://en.wikipedia.org/wiki/Sorting_algorithm) 的目標。 ```c static void list_sort(struct list_head *head) { if (list_empty(head) || list_is_singular(head)) return; struct list_head list_less, list_greater; INIT_LIST_HEAD(&list_less); INIT_LIST_HEAD(&list_greater); struct item *pivot = list_first_entry(head, AAA, BBB); list_del(&pivot->list); struct item *itm = NULL, *is = NULL; CCC (itm, is, head, list) { if (cmpint(&itm->i, &pivot->i) < 0) DDD (&itm->list, &list_less); else EEE (&itm->list, &list_greater); } list_sort(&list_less); list_sort(&list_greater); list_add(&pivot->list, head); list_splice(&list_less, head); FFF(&list_greater, head); } ``` 請補完,使其行為符合預期。作答規範: * `AAA`, `BBB`, `CCC`, `DDD`, `EEE`, `FFF` 應以最簡潔的形式撰寫,且符合[作業一](https://hackmd.io/@sysprog/linux2023-lab0)排版規範 (近似 Linux 核心程式碼排版風格),均不包含小括號 (即 `(` 和 `)`),且不可出現 `,` 和 `;` 這樣的分隔符號 (separator) * `BBB` 為 identifier * `CCC`, `DDD`, `EEE`, `FFF` 均為以 `list_` 開頭的巨集 (macro) - [ ] Hybrid sorting 除了在教科書出現的經典排序演算法,如 merge sort 和 quick sort,在諸多軟體會採用截長補短的策略,融合兩項或更多排序演算法,這樣的案例包含 [Timsort](https://en.wikipedia.org/wiki/Timsort) 和 [Introsort](https://en.wikipedia.org/wiki/Introsort)。 [Introsort](https://en.wikipedia.org/wiki/Introsort) 可看做 quick sort 的強化,遞迴分割陣列,區間越來越短,數字也幾乎排序好。對於幾乎已經排序好的短區間,quick sort 表現不佳,於是切換為 heapsort。分水嶺通常設定成 $\log{N^2} = 2 \times log{N}$,`N` 是輸入資料的長度。之前學員已嘗試實作 [Introsort](https://en.wikipedia.org/wiki/Introsort),可見: * [ccs100203](https://hackmd.io/@cccccs100203/linux2021-quiz1), [Part II](https://hackmd.io/@cccccs100203/linux2021-sort) * [hankluo6](https://hackmd.io/@hankluo6/2021q1quiz1), [Part II](https://hackmd.io/@hankluo6/2021q5sort) [Introsort](https://en.wikipedia.org/wiki/Introsort) 又可進一步擴充為 [Pattern Defeating Quicksort](https://github.com/orlp/pdqsort),可簡稱為 pdqsort,縮減比較的次數,進行細部改良,可參見論文 [Pattern-defeating Quicksort](https://drive.google.com/file/d/0B1-vl-dPgKm_T0Fxeno1a0lGT0E/view)。 ![](https://i.imgur.com/Ol3xr7j.png) [swenson/sort](https://github.com/swenson/sort) 彙整多項排序實作,並提供多種情境的效能評估,參考執行輸出: ``` Qsort 100000 x86_64 109 9216431.2 ns/op Binary_insertion_sort 100000 x86_64 1 1020912000.0 ns/op Bitonic_sort 100000 x86_64 2 994466000.0 ns/op Quick_sort 100000 x86_64 185 5419470.3 ns/op Merge_sort 100000 x86_64 151 6638079.5 ns/op Heap_sort 100000 x86_64 164 6099676.8 ns/op Shell_sort 100000 x86_64 113 8874380.5 ns/op Tim_sort 100000 x86_64 135 7445088.9 ns/op Merge_sort_in_place 100000 x86_64 158 6372259.5 ns/op Grail_sort 100000 x86_64 122 8244967.2 ns/op Sqrt_sort 100000 x86_64 144 6978180.6 ns/op Rec_stable_sort 100000 x86_64 36 27847444.4 ns/op Grail_sort_dyn_buffer 100000 x86_64 139 7201316.5 ns/op ``` 延伸閱讀: * [fluxsort](https://github.com/scandum/fluxsort) - A branchless stable quicksort / mergesort hybrid * [Wolfsort](https://github.com/scandum/wolfsort) is a stable adaptive hybrid radix / merge sort * [Pattern-defeating Quicksort 閱讀筆記](https://hackmd.io/@Chang-Chia-Chi/pdqsort) * CppCon 2019: Andrei Alexandrescu "[Speed Is Found In The Minds of People](https://youtu.be/FJJTYQYB1JQ)" - [簡報檔案](https://github.com/CppCon/CppCon2019/blob/master/Presentations/speed_is_found_in_the_minds_of_people/speed_is_found_in_the_minds_of_people__andrei_alexandrescu__cppcon_2019.pdf) > In all likelihood, sorting is one of the most researched classes of algorithms. It is a fundamental task in Computer Science, both on its own and as a step in other algorithms. Efficient algorithms for sorting and searching are now taught in core undergraduate classes. Are they at their best, or is there more blood to squeeze from that stone? This talk will explore a few less known – but more allegro! – variants of classic sorting algorithms. And as they say, the road matters more than the destination. Along the way, we'll encounter many wondrous surprises and we'll learn how to cope with the puzzling behavior of modern complex architectures. :::success 延伸問題: 1. 解釋上述程式碼運作原理 2. 針對 [Quick sort](https://en.wikipedia.org/wiki/Quicksort) 提出上述程式碼的改進方案並實作 3. 引入上述的 hybrid sort 策略並在 Linux 核心風格的 circular doubly-linked list 實作,需要量化性能表現並探討不同資料集 (data set) 的影響 4. ==BONUS==: 嘗試對 Linux 核心提出排序程式碼改進的貢獻 ::: --- ### 測驗 `2` 延續上方測驗 `1`,參考 [Optimized QuickSort — C Implementation (Non-Recursive)](https://alienryderflex.com/quicksort/),將以下程式碼改寫為非遞迴 (non-recursive; iterative) 的實作: ```c #define MAX_DEPTH 512 static void list_sort_nr(struct list_head *head) { if (list_empty(head) || list_is_singular(head)) return; struct list_head stack[MAX_DEPTH]; for (int i = 0; i < MAX_DEPTH; i++) INIT_LIST_HEAD(&stack[i]); int top = -1; list_splice_init(head, &stack[++top]); struct list_head partition; INIT_LIST_HEAD(&partition); while (top >= 0) { INIT_LIST_HEAD(&partition); list_splice_init(&stack[top--], &partition); if (!list_empty(&partition) && !list_is_singular(&partition)) { struct list_head list_less, list_greater; INIT_LIST_HEAD(&list_less); INIT_LIST_HEAD(&list_greater); struct item *pivot = list_first_entry(&partition, struct item, list); list_del(&pivot->list); INIT_LIST_HEAD(&pivot->list); struct item *itm = NULL, *is = NULL; GGGG (itm, is, &partition, list) { list_del(&itm->list); if (cmpint(&itm->i, &pivot->i) < 0) list_move(&itm->list, &list_less); else list_move(&itm->list, &list_greater); } HHHH (&pivot->list, &list_less); if (!list_empty(&list_greater)) list_splice_tail(&list_greater, IIII); if (!list_empty(&list_less)) list_splice_tail(&list_less, JJJJ); } else { top++; list_splice_tail(&partition, &stack[top]); while (top >= 0 && list_is_singular(&stack[top])) { struct item *tmp = list_first_entry(&stack[top], struct item, list); list_del(&tmp->list); INIT_LIST_HEAD(KKKK); list_add_tail(&tmp->list, head); } } } } ``` [Optimized QuickSort](https://alienryderflex.com/quicksort/) 提到的方法是以 swap 為主體,利用 `L` 與 `R` 去紀錄需交換的數量,再用 `begin[]` 與 `end[]` 作為 stack,用來紀錄比較的範圍。 - 假定這是一開始的 array。會拿 head 當作 pivot,並且 L 會從左邊開始掃,紀錄有多少值小於 pivot,R 從右邊開始,紀錄有多少值大於 pivot。(R 一開始會定義為陣列的 length,以減法的方式紀錄) - 這裡的 L 就會是 3,而 R 就會是 5 ```graphviz digraph graphname{ node[shape=box] head[shape=plaintext,fontcolor=red] pivot[shape=plaintext,fontcolor=red] L[shape=plaintext,fontcolor=green] R[shape=plaintext,fontcolor=green] rankdir=LR { rankdir = LR P[label=4] A[label=4] B[label=1] C[label=3] D[label=5] E[label=2] F[label=7] NULL1[label=NULL,shape=plaintext] } { rank="same" pivot->P head->A } { rank="same" L->D } { rank="same" R->E } A->B->C->D->E->F->NULL1 } ``` - 於是會把 array[4] 放到 array[0],array[3] 放到 array[4],再把 pivot 放入 array[3] ```graphviz digraph graphname{ node[shape=box] head[shape=plaintext,fontcolor=red] pivot[shape=plaintext,fontcolor=red] rankdir=LR { rankdir = LR P[label=4] A[label=2] B[label=1] C[label=3] D[label=4] E[label=5] F[label=7] NULL1[label=NULL,shape=plaintext] } { rank="same" head->A pivot->P } A->B->C->D->E->F->NULL1 } ``` - 再來就是利用 begin 與 end 作為 stack 去儲存尚未排列好的序列,由於 pivot 目前在 array[3] 的位置,所以就會有一組 beg 跟 end 為 0~2、一組為 4~5。 - 之後會重複上述的行為,一次會以 5 為 pivot,另一次則會以 2 為 pivot,就可以得到排序好的 array。 ```graphviz digraph graphname{ node[shape=box] head[shape=plaintext,fontcolor=red] rankdir=LR { rankdir = LR A[label=1] B[label=2] C[label=3] D[label=4] E[label=5] F[label=7] NULL1[label=NULL,shape=plaintext] } { rank="same" head->A } A->B->C->D->E->F->NULL1 } ``` 用變數 `i` 紀錄現在的 `stack` 在什麼位置,每次迴圈的開始都進行 pop 的操作: ```cpp node_t *top = stk[i--]; ``` 依照相對於 `pivot` 大小整理節點的步驟沒有不同。而分類完 `left` 和 `right` 之後,本來應該各自遞迴計算的部分換成「放進 stack 中」。 這份程式碼嘗試用堆疊 (stack) 模擬原本的遞迴呼叫,請補完程式碼,使其如同測驗 `1` 提供的程式碼一般運作。作答規範: * 全部應以最簡潔的形式撰寫,避免空白字元 * `GGGG` 和 `HHHH` 都是以 `list_` 開頭的巨集 * `IIII`, `JJJJ`, `KKKK` 都包含 `stack`,並搭配必要的 C 運算子 :::success 延伸問題: 1. 解釋上方程式碼運作原理,指出設計和實作的缺失 2. 比較測驗 `1` 和測驗 `2` 的程式碼,設計實驗來分析二者效能,解釋為何非遞迴的實作未能總是比遞迴的實作快 3. 提出效能改進方案,特別是避免依賴 `MAX_DEPTH` 4. 引入 [Introsort](https://en.wikipedia.org/wiki/Introsort),也就是 quick sort 和 heap sort (或其他可避免前者 $O(n^2)$ 最差時間複雜度的演算法)。加入 heapsort 的原因為,當 quicksort 迭代 $2 \times \log n$ 次後排序仍未完成,那就可能遭遇 $O(n^2)$ 時間複雜度。此時切換為 heap sort 能將其時間複雜度控制在 $O(n \log n)$ ::: --- ### 測驗 `3` XOR 運算特性: * $A \oplus A = 0$ * $A \oplus 0 = A$ * $A \oplus 1 = \neg A$ * $(A \oplus B) \oplus B = A$ 以下程式碼是個 [XOR linked list](https://en.wikipedia.org/wiki/XOR_linked_list) 的實作,在這樣的資料結構可表示為: ``` 1 2 3 4 data HAT CAT EAT BAT link 2 2 6 3 ``` 要往下一格移動時,我們需要前一格在哪和這一格的位置。例如我們現在在 `HAT` (1), 已知前一格是 (0),那麼下一格就是 `link(1) XOR 0` = `2 XOR 0` = 2,也就是 `CAT`。繼續,現在位於 `CAT` (2),前一格在 (1),於是下一格就是 `link(2) XOR 1` = `2 XOR 1` = 3,亦即 `EAT`,依此類推。只要當找出來的下一格是 (0) 就結束。 [XOR linked list](https://en.wikipedia.org/wiki/XOR_linked_list) 的優勢在於空間的佔用更精簡,與尋常作法相比,佔用的儲存空間比值會趨近 $67\%$。 以下是學習 Linux 核心 `list.h` 風格的 XOR linked list 實作,其中 `##` 為 C 語言前置處理器,參見 [你所不知道的 C 語言:前置處理器應用篇](https://hackmd.io/@sysprog/c-preprocessor) 參考執行輸出: ``` xorlist_for_each test node [0] 999 node [1] 998 node [2] 997 ... node [996] 2 node [997] 1 node [998] 0 ``` 程式碼列表: ```c #include <assert.h> #include <errno.h> #include <stddef.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> typedef struct _xorlist_node { struct _xorlist_node *cmp; } xor_node_t; typedef struct _xor_list_struct { xor_node_t head, tail; uint32_t count; } xor_list_t; #define XOR_COMP(a, b) ((xor_node_t *) (LLLL)) static inline xor_node_t *address_of(xor_node_t *n1, xor_node_t *n2) { assert(n1 && n2); return XOR_COMP(n1, n2); } #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) #define xorlist_for_each(node, rp, rn, list) \ for (rp = &(list)->head, node = rp->cmp; node != &(list)->tail; \ rn = address_of(rp, node->cmp), rp = node, node = rn) #define xorlist_for_each_prev(node, rp, rn, list) \ for (rp = &(list)->tail, node = rp->cmp; node != &(list)->head; \ rn = address_of(rp, node->cmp), rp = node, node = rn) #define xorlist_for_each_from(node, pos1, pos2, rp, rn, list) \ for (rp = pos2, node = pos1; node != &(list)->tail; \ rn = address_of(rp, node->cmp), rp = node, node = rn) #define xorlist_for_each_from_prev(node, pos1, pos2, rp, rn, list) \ for (rp = pos1, node = pos2; node != &(list)->head; \ rn = address_of(rp, node->cmp), rp = node, node = rn) /* Note that when the delete function success is must return 0. */ #define xorlist_delete_prototype(name, node) \ int _xorlist_delete_##name(xor_node_t *node) #define xorlist_delete_call(name) _xorlist_delete_##name static inline xor_node_t *xornode_init(xor_node_t *n) { assert(n); n->cmp = NULL; return n; } #define XORNODE_INIT(n) \ do { \ (n).cmp = NULL; \ } while (0) #define XORLIST_INIT(h) \ do { \ (h).head.cmp = &(h).tail; \ (h).tail.cmp = &(h).head; \ (h).count = 0; \ } while (0) int xorlist_add(xor_list_t *list, xor_node_t *n) { xor_node_t *real_next; if (!n) return ENOMEM; xor_node_t *real_prev = &list->head; xor_node_t *node = real_prev->cmp; if (node == &list->tail) real_next = MMMM; else real_next = node; real_prev->cmp = n; n->cmp = XOR_COMP(real_prev, real_next); real_next->cmp = XOR_COMP(n, XOR_COMP(real_prev, PPPP)); list->count++; return 0; } int xorlist_del(xor_list_t *list, xor_node_t *n, xor_node_t *target, int (*delete_func)(xor_node_t *)) { assert(list && n && target && delete_func); assert(&list->head != target && &list->tail != target); xor_node_t *nn = address_of(target, n->cmp); xor_node_t *an = address_of(n, target->cmp); xor_node_t *ana = address_of(target, an->cmp); n->cmp = XOR_COMP(nn, an); an->cmp = XOR_COMP(n, ana); delete_func(target); list->count--; return 0; } int xorlist_destroy(xor_list_t *list, int (*delete_func)(xor_node_t *)) { assert(delete_func); xor_node_t *real_prev = &list->head; xor_node_t *node = real_prev->cmp; xor_node_t *real_next = address_of(real_prev, node->cmp); xor_node_t *tmp = real_prev; real_prev = node; node = real_next; while (node != &list->tail) { real_next = address_of(real_prev, node->cmp); tmp = real_prev; real_prev = node; node = real_next; if (delete_func(tmp) != 0) perror("delete function failed"); } return 0; } struct test_node { int value; xor_node_t xornode; }; xorlist_delete_prototype(test_delete, _node) { struct test_node *node = container_of(_node, struct test_node, xornode); free(node); return 0; } int main(void) { xor_list_t list; xor_node_t *p1, *p2; XORLIST_INIT(list); for (int i = 0; i < 1000; i++) { struct test_node *new = malloc(sizeof(struct test_node)); xornode_init(&new->xornode); new->value = i; xorlist_add(&list, &new->xornode); if (i == 5) p1 = &new->xornode; if (i == 6) p2 = &new->xornode; } xor_node_t *real_prev, *real_next, *node; int i = 0; printf("xorlist_for_each test\n"); xorlist_for_each(node, real_prev, real_next, &list) { printf("node [%d] %d\n", i++, container_of(node, struct test_node, xornode)->value); } i = 0; printf("xorlist_for_from test\n"); xorlist_for_each_from(node, p1, p2, real_prev, real_next, &list) { printf("node %d\n", container_of(node, struct test_node, xornode)->value); } i = 0; printf("xorlist_for_each_from_prev test\n"); xorlist_for_each_from_prev(node, p1, p2, real_prev, real_next, &list) { printf("node [%d] %d\n", i++, container_of(node, struct test_node, xornode)->value); } i = 0; printf("xorlist_for_each_prev test\n"); xorlist_for_each_prev(node, real_prev, real_next, &list) { printf("node [%d] %d\n", i++, container_of(node, struct test_node, xornode)->value); } printf("xorlist_del test\n"); xorlist_del(&list, p2, p1, xorlist_delete_call(test_delete)); i = 0; xorlist_for_each(node, real_prev, real_next, &list) { printf("node [%d] %d\n", i++, container_of(node, struct test_node, xornode)->value); } xorlist_destroy(&list, xorlist_delete_call(test_delete)); return 0; } ``` 請補完程式碼,使其運作符合預期。作答規範: * `LLLL` 為表示式,在運算子前後要一個空白字元區隔,應包含 `uintptr_t`,符合 [lab0-c]() 預期的排版風格 * `MMMM` 和 `PPPP` 為表示式 :::success 延伸問題: 1. 解釋上述程式碼運作原理,指出其實作缺陷並改進 2. 在上述 XOR linked list 的基礎,實作測驗 `1` 和 `2` 提及的快速排序演算法,注意要引入你改進效能的版本 3. 修改 `xorlist_destroy`,使其不急著釋放記憶體,而搭配 [atexit(3)](https://man7.org/linux/man-pages/man3/atexit.3.html) 並在程式即將結束執行時,才批次處理資源釋放 :::