2021q1 Homework1 (quiz1)

# 2021q1 Homework1 (quiz1) contributed by < `tigger12613` > > [2021q1 第 1 週測驗題](https://hackmd.io/@sysprog/linux2021-quiz1) > [GitHub](https://github.com/tigger12613/quiz1) ## 作業要求 - [x] 1. 解釋程式運作原理，搭配 Graphviz，比照 Linked List 練習題在 HackMD 筆記上視覺化展現 - [x] 1.1 測試程式使用到 random，多執行幾次可發現輸出結果相仿，請修正，並引入其他 Pseudorandom number generator - [x] 2. 參考 [Optimized QuickSort — C Implementation (Non-Recursive)](https://alienryderflex.com/quicksort/) 並重寫上述 quick sort 程式碼，避免使用遞迴呼叫 - [ ] 3. Linux 核心內部也有 linked list 實作，但是 circulr doubly-linked list，[linux-list](https://github.com/sysprog21/linux-list) 仿效 Linux 核心的實作並予以簡化，在 examples/ 目錄提供 quick sort 實作，請探討 Linux 的 linked list 和上述程式碼的落差，並改寫 linux-list 的 quick sort 範例，避免使用遞迴呼叫 - [ ] 4. 研讀 Ching-Kuang Shene (冼鏡光) 教授撰寫的 A Comparative Study of Linked List Sorting Algorithms，思考高效率的 linked list 排序演算法，並落實於上述 (3) 的程式碼中 ## 在電腦上重現程式 quiz 缺少兩個函式，予以實作 ```cpp static node_t *list_make_node_t(node_t *list, long num) { node_t *tmp = malloc(sizeof(node_t)); if (!tmp) { return list; } tmp->value = num; tmp->next = list; return tmp; } static void list_free(node_t **list) { node_t *tmp; while (*list) { tmp = *list; *list = (*list)->next; free(tmp); } } ``` 運作正常 ``` shell $ valgrind --leak-check=full ./link_list ==45637== Memcheck, a memory error detector ==45637== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==45637== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info ==45637== Command: ./link_list ==45637== NOT IN ORDER : 1016 84 706 124 326 483 763 498 939 186 205 809 236 74 255 81 115 105 966 359 IN ORDER : 74 81 84 105 115 124 186 205 236 255 326 359 483 498 706 763 809 939 966 1016 ==45637== ==45637== HEAP SUMMARY: ==45637== in use at exit: 0 bytes in 0 blocks ==45637== total heap usage: 21 allocs, 21 frees, 1,344 bytes allocated ==45637== ==45637== All heap blocks were freed -- no leaks are possible ==45637== ==45637== For lists of detected and suppressed errors, rerun with: -s ==45637== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ``` ## 程式運作過程 ### Main 產生長度為 20 的 linked list 並隨機給予 0 ~ 1023 的數值，列印 linked list ，排序，再列印 linked list ，確認排序是否正確，free linked list ```cpp int main(int argc, char **argv) { size_t count = 20; node_t *list = NULL; while (count--) list = list_make_node_t(list, random() % 1024); list_display(list); quicksort(&list); list_display(list); if (!list_is_ordered(list)) return EXIT_FAILURE; list_free(&list); return EXIT_SUCCESS; } ``` ### List add node 將節點插入 list 的頭部 ```cpp static inline void list_add_node_t(node_t **list, node_t *node_t) { node_t->next = *list; *list = node_t; } ``` ```graphviz digraph a { rankdir=LR; node[shape=box]; "node"->"head" "head"->"..." "..."->"tail" } ``` ### List concat 將 right list 鏈結在 left list 後面 ```cpp static inline void list_concat(node_t **left, node_t *right) { while (*left) left = &((*left)->next); *left = right; } ``` ```graphviz digraph a { rankdir=LR; node[shape=box]; node1 [label = "..."] "left"->"..." "..."->"left tail" "left tail" -> "right"[color = red label="concat"] "right"->node1 node1->"right tail" } ``` ### Quick sort ```cpp= void quicksort(node_t **list) { if (!*list) return; node_t *pivot = *list; int value = pivot->value; node_t *p = pivot->next; pivot->next = NULL; node_t *left = NULL, *right = NULL; while (p) { node_t *n = p; p = p->next; list_add_node_t(n->value > value ? &right : &left, n); } quicksort(&left); quicksort(&right); node_t *result = NULL; list_concat(&result, left); list_concat(&result, pivot); list_concat(&result, right); *list = result; } ``` 這段用於把 pivot 拆出來 ```cpp=6 node_t *pivot = *list; int value = pivot->value; node_t *p = pivot->next; pivot->next = NULL; ``` while loop 把比 pivot 大的放到 right list 上面，小或等於的放到 list list 上面 ```cpp=12 while (p) { node_t *n = p; p = p->next; list_add_node_t(n->value > value ? &right : &left, n); } ``` ```graphviz digraph sort{ rankdir=LR; node[shape=box]; right left pivot n1 [label = "> pivot"] n2 [label = " <= pivot"] right-> n1 left->n2 } ``` 接下來排序 left list and right list ，然後接起來 ```cpp=18 quicksort(&left); quicksort(&right); node_t *result = NULL; list_concat(&result, left); list_concat(&result, pivot); list_concat(&result, right); *list = result; ``` ```graphviz digraph sort{ rankdir=LR; node[shape=box]; right result pivot { rank = same left -> lleft } lright [label = " sorted and > pivot"] lleft [label = " sorted and <= pivot"] { rank = same right-> lright } lleft->pivot [label = "concat" color = "red"] pivot->lright [label = "concat" color = "red"] result->lleft } ``` ### 隨機種子加上隨機種子在使用 random() 之前就可以每一次都產生不同的隨機數 ```cpp srand(time(NULL)); ``` PRNG 使用初始值產生一串帶有隨機性質的數字序列，如果初始值相等，序列也會相等，因此不是真隨機。在使用時使用 time 作為初始值可以讓初始值看似隨機數，產生的數字序列也會看起來隨機。 :::warning 倘若 time 函式得到的時間數值是可預測的話，PRNG 的輸出也可預測，該如何改進？留意到 entropy 相關討論 :notes: jserv ::: ## Iterative quicksort 新增一個 stack 用於紀錄還未排序的 linked list sorted 紀錄已經排序好的範圍每次 loop 從 stack 中最右邊的 linked list 進行切分，如果只有一個節點，代表這個節點是 stack 上數值最大的節點，並且小於等於 sorted 鏈上的每一個節點，所以將其放到 sorted 鏈的第一個上，然後從 stack 上移除($i--$)，持續重複這個過程直到 stack 為空。 ```cpp void quicksort_iterative(node_t **list) { #define MAX_LEVELS 1000 if (!*list) return; int i = 0, j = 0; node_t *stack[MAX_LEVELS]; node_t *sorted; sorted = NULL; stack[0] = *list; while(i>=0 && i<MAX_LEVELS){ if(!stack[i]){ i--; continue; } if(!stack[i]->next){ stack[i]->next = sorted; sorted = stack[i]; i--; continue; } node_t *pivot = stack[i]; int value = pivot->value; node_t *p = pivot->next; pivot->next = NULL; node_t *left = NULL, *right = NULL; while (p) { node_t *n = p; p = p->next; list_add_node_t(n->value > value ? &right : &left, n); } stack[i] = left; stack[i+1] = pivot; stack[i+2] = right; i +=2; } *list = sorted; } ``` 接著對效能測試，重複 5000 次對長度 1000 的 list 做排序 iterative 版本快了46% 的速度 | recursive version | iterative version | | ----------------- | ----------------- | | 0.832864 sec | 0.567636 sec | `MAX_LEVELS`的數值選擇應該考慮到 worst case 的狀況，如果每一次的 pivot 都選到最小的那一個也就是說 left 都為 NULL ，也就是需要 $2 \times (n-1)$ 個位置去儲存 linked list node ```graphviz digraph { node [color=black shape=none margin=0 height=.3 ] values [label=< <table border="0" cellborder="1" cellspacing="0"> <tr> <td port="f0" bgcolor = "lightyellow"> NULL </td> <td port="f1" bgcolor = "lightyellow"> pivot1 </td> <td port="f2" bgcolor = "lightyellow"> NULL </td> <td port="f3" bgcolor = "lightyellow"> pivot2 </td> <td port="f4" bgcolor = "lightyellow"> ... </td> <td port="f5" bgcolor = "lightyellow"> NULL </td> <td port="f6" bgcolor = "lightyellow"> pivot_end </td> <td port="f7" bgcolor = "lightyellow"> right_end </td> </tr> </table> >]; "stack"->values:f0 { rank=same; "stack"; values } } ``` :::warning `MAX_LEVELS` 數值設定的依據為何？做數學分析來指定更適切的數值。 :notes: jserv :::