# 2024q1 Homework2 (quiz1+2)

contributed by < `aa860630` >

## quiz1

### 測驗1 :

Data type: `node_t`

```c
typedef struct __node {
    struct __node *left, *right;
    struct __node *next;
    long value;
} node_t;
```

```graphviz
digraph "__node" {
    node [shape="record"];
    rankdir = "LR"
    subgraph cluster_a {
        label = "node_t";
        node1 [label="{<left> left |<right> right}|next|value"];
    }
}
```

The `while` loops in `list_tail()` and `list_length()` keep walking as long as the current node (and, for `list_tail()`, its successor) exists. The walk advances by pointing `left` at the next node's `next` field; because `left` is an indirect pointer (a pointer to a pointer), the address-of operator is required, i.e. `&((*left)->next)`.

```c
node_t *list_tail(node_t **left)
{
    while ((*left) && (*left)->next)
        left = &((*left)->next);
    return *left;
}

int list_length(node_t **left)
{
    int n = 0;
    while (*left) {
        ++n;
        left = &((*left)->next);
    }
    return n;
}
```

The more intuitive way to write quick sort is to pick a pivot each round and recursively sort the sublists smaller and larger than the pivot. Here, instead, two pointer arrays record the head and tail of every pending sublist; the higher the index in the arrays, the closer the sublist is to the maximum. The benefit is that no recursive function calls are needed (the call stack is not used for bookkeeping), which is fast and saves space. A fuller sketch is given at the end of this section.

```c
while (i >= 0) {
    node_t *L = begin[i], *R = end[i];
    ...
    begin[i] = left;
    end[i] = list_tail(&left);
    begin[i + 1] = pivot;
    end[i + 1] = pivot;
    begin[i + 2] = right;
    end[i + 2] = list_tail(&right);

    left = right = NULL;
    i += 2;
    ...
    i--;
}
```

#### Time complexity

If every chosen pivot happens to be the median, the next round needs only about half the comparisons. This is the best case:

$T(n) = 2T(n/2) + cn$, giving $T(n) = O(n \log n)$

When quick sort runs on an already-sorted list, the pivot chosen each round is the minimum or maximum, so the partition never splits the list into two halves. This is the worst case:

$T(n) = T(n-1) + T(0) + cn$, giving $T(n) = O(n^2)$

The diagrams below show the first three rounds on the already-sorted list 1–5, where the pivot is always the smallest remaining element.

> Round 1
```graphviz
digraph {
    node[shape=record];
    graph[pencolor=transparent];
    rankdir=LR;

    p1[label="{<data> 1|<next>}"];
    p2[label="{<data> 2|<next>}"];
    p3[label="{<data> 3|<next>}"];
    p4[label="{<data> 4|<next>}"];
    p5[label="{<data> 5|<next>}"];
    L[shape=none];
    pivot[shape=none];
    R[shape=none];

    L -> p1
    pivot -> p1
    R -> p5
    // {rank=same; L -> p1;}
    edge[tailclip=false,arrowtail=dot,dir=both];
    p1:next:c -> p2:data;
    p2:next:c -> p3:data;
    p3:next:c -> p4:data;
    p4:next:c -> p5:data;
    p5:next:c -> NULL;
}
```

> Round 2
```graphviz
digraph {
    node[shape=record];
    graph[pencolor=transparent];
    rankdir=LR;

    p1[label="{<data> 1|<next>}"];
    p2[label="{<data> 2|<next>}"];
    p3[label="{<data> 3|<next>}"];
    p4[label="{<data> 4|<next>}"];
    p5[label="{<data> 5|<next>}"];
    L[shape=none];
    pivot[shape=none];
    R[shape=none];

    L -> p2
    pivot -> p2
    R -> p5
    // {rank=same; L -> p1;}
    edge[tailclip=false,arrowtail=dot,dir=both];
    p1:next:c -> p2:data;
    p2:next:c -> p3:data;
    p3:next:c -> p4:data;
    p4:next:c -> p5:data;
    p5:next:c -> NULL;
}
```

> Round 3
```graphviz
digraph {
    node[shape=record];
    graph[pencolor=transparent];
    rankdir=LR;

    p1[label="{<data> 1|<next>}"];
    p2[label="{<data> 2|<next>}"];
    p3[label="{<data> 3|<next>}"];
    p4[label="{<data> 4|<next>}"];
    p5[label="{<data> 5|<next>}"];
    L[shape=none];
    pivot[shape=none];
    R[shape=none];

    L -> p3
    pivot -> p3
    R -> p5
    // {rank=same; L -> p1;}
    edge[tailclip=false,arrowtail=dot,dir=both];
    p1:next:c -> p2:data;
    p2:next:c -> p3:data;
    p3:next:c -> p4:data;
    p4:next:c -> p5:data;
    p5:next:c -> NULL;
}
```
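To make the excerpted loop easier to follow, here is a minimal sketch of the whole iterative quick sort, written against the `node_t`, `list_tail()` and `list_length()` shown above. The prepend helper `list_add()` is my own naming and the details may differ from the quiz's reference code; treat this as an illustration of the explicit `begin[]`/`end[]` stack, not as the official solution.

```c
static void list_add(node_t **list, node_t *node_to_add)
{
    /* Prepend a node to a NULL-terminated singly-linked list. */
    node_to_add->next = *list;
    *list = node_to_add;
}

void quick_sort(node_t **list)
{
    if (!*list)
        return;

    int n = list_length(list);
    int i = 0;
    int max_level = 2 * n; /* enough slots in the array "stack" even for the worst case */
    node_t *begin[max_level], *end[max_level];
    node_t *result = NULL, *left = NULL, *right = NULL;

    begin[0] = *list;
    end[0] = list_tail(list);

    while (i >= 0) {
        node_t *L = begin[i], *R = end[i];
        if (L != R) {
            /* Take the head of the current sublist as the pivot and detach it. */
            node_t *pivot = L;
            long value = pivot->value;
            node_t *p = pivot->next;
            pivot->next = NULL;

            /* Partition the rest: nodes greater than the pivot go right, the others left. */
            while (p) {
                node_t *next = p->next;
                list_add(p->value > value ? &right : &left, p);
                p = next;
            }

            /* Push "left | pivot | right" back onto the array stack. */
            begin[i] = left;
            end[i] = list_tail(&left);
            begin[i + 1] = pivot;
            end[i + 1] = pivot;
            begin[i + 2] = right;
            end[i + 2] = list_tail(&right);

            left = right = NULL;
            i += 2;
        } else {
            /* A sublist of length 0 or 1 is already sorted: emit it and pop. */
            if (L)
                list_add(&result, L);
            i--;
        }
    }
    *list = result;
}
```

Because larger sublists sit at higher indices and are processed first, single nodes are prepended into `result` from largest to smallest, so the final list comes out in ascending order without any recursion.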
### 測驗2 :

* `find_run()` : locates a run, i.e. a consecutive non-descending or descending subsequence. It returns a `pair` structure holding the head of the run that was found and a pointer to the element right after it.
* `merge_collapse()` : merges adjacent runs until a balancing condition holds. It uses `stk_size` to decide how many runs to examine and the run sizes to decide where to merge.
* `merge_force_collapse()` : forcibly merges adjacent runs until only one or two runs remain on the stack, again choosing the merge position by run size and checking `stk_size` to decide whether to keep merging.
* `merge_final()` : merges the remaining runs into one fully sorted list. It walks through the runs, merging them according to the return value of `cmp` while keeping the sort stable, and finally rebuilds the `prev` links through `build_prev_link()` so the result becomes a circular doubly-linked list again.

The timsort code is built mainly out of the functions above; their order of execution and roles are as follows:

1. First, some state is initialized, including the stack size `stk_size`; `head` is set to the head of the list, and if the list holds at most one element no sorting is needed, so the function returns immediately.
2. Next, `find_run()` is used to locate each run. It scans the list for a non-descending or descending subsequence and splits it off. Taking the descending case: according to `cmp(priv, list, next)`, as long as the current node is greater than the next one, `len++` and the scan continues until the sequence stops descending, while the descending run is reversed at the same time.

```c
static struct pair find_run(void *priv, struct list_head *list,
                            list_cmp_func_t cmp)
{
    size_t len = 1;
    struct list_head *next = list->next, *head = list;
    struct pair result;
    ...
    if (cmp(priv, list, next) > 0) {
        /* descending run, also reverse the list */
        struct list_head *prev = NULL;
        do {
            len++;
            list->next = prev;
            prev = list;
            list = next;
            next = list->next;
            head = list;
        } while (next && cmp(priv, list, next) > 0);
        list->next = prev;
    ...
    }

    head->prev = NULL;
    head->next->prev = (struct list_head *) len;
    result.head = head, result.next = next;
    return result;
}
```

Note how the run's length is stashed at the end: it is cast into `head->next->prev`. How it is read back is sketched after step 5 below.

3. After the runs are found, `merge_collapse()` merges adjacent runs until the balancing condition is met. It looks at the sizes of the runs on the stack to decide whether a merge is needed and, based on those sizes, where to merge.
With this function, not only are the run lengths on the stack kept balanced; because merges may only happen between two adjacent runs, stability is preserved as well, so a merge can only take the form A+(B+C) or (A+B)+C. This is what lets Timsort stay both efficient and stable; see [2024q1](https://hackmd.io/@sysprog/linux2024-quiz1#%E6%B8%AC%E9%A9%97-2) for the detailed explanation.

```c
static struct list_head *merge_collapse(void *priv,
                                        list_cmp_func_t cmp,
                                        struct list_head *tp)
{
    int n;
    while ((n = stk_size) >= 2) {
        if ((n >= 3 &&
             run_size(tp->prev->prev) <= run_size(tp->prev) + run_size(tp)) ||
            (n >= 4 && run_size(tp->prev->prev->prev) <=
                           run_size(tp->prev->prev) + run_size(tp->prev))) {
            if (run_size(tp->prev->prev) < run_size(tp)) {
                tp->prev = merge_at(priv, cmp, tp->prev);
            } else {
                tp = merge_at(priv, cmp, tp);
            }
        } else if (run_size(tp->prev) <= run_size(tp)) {
            tp = merge_at(priv, cmp, tp);
        } else {
            break;
        }
    }

    return tp;
}
```

4. Then `merge_force_collapse()` forcibly merges adjacent runs, walking the stack until only one or two runs remain on it.
5. Finally, `merge_final()` merges the remaining runs into one fully sorted list. It traverses the runs and merges them according to the return value of the `cmp` function while preserving stability, and at the end rebuilds the `prev` links so the list becomes circular again.

What is stability? A sort is stable when elements that compare equal keep their original relative order. Swapping two equal values may look harmless because the result seems unchanged, but it breaks that guarantee, and the unnecessary swaps can also lengthen execution time and even distort measurements. The `=` in `cmp(priv, a, b) <= 0` makes sure that when the values are equal, the earlier element `a` is taken first, which keeps the sort stable.

```c
for (;;) {
    /* if equal, take 'a' -- important for sort stability */
    if (cmp(priv, a, b) <= 0) {
        tail->next = a;
        a->prev = tail;
        tail = a;
        a = a->next;
        if (!a)
            break;
    }
```
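The excerpts above lean on a helper that is never shown: `merge_collapse()` calls `run_size()`, and `find_run()` stores each run's length by casting it into `head->next->prev`. A minimal sketch of how that length can be read back, consistent with the encoding in `find_run()` (the helper in the actual quiz code may differ in detail), looks like this:

```c
static inline size_t run_size(struct list_head *head)
{
    if (!head)
        return 0;   /* no run in this stack slot */
    if (!head->next)
        return 1;   /* a single-node run has no second node to hold a length */
    /* find_run() stashed the length in the second node's otherwise unused prev field */
    return (size_t) (head->next->prev);
}
```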
With the combination of the functions above, the timsort implementation is complete.

```c
void timsort(void *priv, struct list_head *head, list_cmp_func_t cmp)
{
    stk_size = 0;

    struct list_head *list = head->next, *tp = NULL;
    if (head == head->prev)
        return;

    /* Convert to a null-terminated singly-linked list. */
    head->prev->next = NULL;

    do {
        /* Find next run */
        struct pair result = find_run(priv, list, cmp);
        result.head->prev = tp;
        tp = result.head;
        list = result.next;
        stk_size++;
        tp = merge_collapse(priv, cmp, tp);
    } while (list);

    /* End of input; merge together all the runs. */
    tp = merge_force_collapse(priv, cmp, tp);

    /* The final merge; rebuild prev links */
    struct list_head *stk0 = tp, *stk1 = stk0->prev;
    while (stk1 && stk1->prev)
        stk0 = stk0->prev, stk1 = stk1->prev;
    if (stk_size <= 1) {
        build_prev_link(head, head, stk0);
        return;
    }
    merge_final(priv, cmp, head, stk1, stk0);
}
```

## quiz2

### 測驗1 :

#### Hash table data structure [(reference)](https://hackmd.io/rpIi8FeqQ9eBJ-UTFgb--A?view)

An illustration:

```graphviz
digraph G {
    rankdir=LR;
    node[shape=record];
    map [label="hlist_head.first |<ht0> |<ht1> |<ht2> |<ht3> |<ht4> |<ht5> |<ht7> |<ht8> "];
    node[shape=none]
    null1 [label=NULL]
    null2 [label=NULL]

    subgraph cluster_1 {
        style=filled;
        color=lightgrey;
        node [shape=record];
        hn1 [label="hlist_node | {<prev>pprev | <next>next}"];
        label="hash_key 1"
    }
    subgraph cluster_2 {
        style=filled;
        color=lightgrey;
        node [shape=record];
        hn2 [label="hlist_node | {<prev>pprev | <next>next}"];
        label="hash_key 2"
    }
    subgraph cluster_3 {
        style=filled;
        color=lightgrey;
        node [shape=record];
        hn3 [label="hlist_node | {<prev>pprev | <next>next}"];
        label="hash_key 3"
    }

    map:ht1 -> hn1
    hn1:next -> hn2
    hn2:next -> null1
    hn2:prev:s -> hn1:next:s
    map:ht5 -> hn3
    hn3:next -> null2
    hn1:prev:s -> map:ht1
    hn3:prev:s -> map:ht5
}
```

In the Linux kernel's hash table implementation, `hlist_node` is what handles hash value collisions:

```c
struct hlist_node {
    struct hlist_node *next, **pprev;
};
struct hlist_head {
    struct hlist_node *first;
};
```

Using a pointer to a pointer here instead of an ordinary doubly-linked list has two benefits:

* Only the pointer at the very end of the chain holds `NULL`.
* `next` points to the neighbouring node itself, while `pprev` points to the pointer that points at the node, so a node can be unlinked without knowing which bucket head it hangs from.

```graphviz
digraph G {
    rankdir = LR;
    splines = false;
    node[shape = "record"]
    list_head[label = "<m>list_head | <n>first"]
    node_1[label = "<m>hlist_node_1 | {<p>pprev | <n>next}", group = list];
    node_2[label = "<m>hlist_node_2 | {<p>pprev | <n>next}", group = list];
    node_3[label = "<m>hlist_node_3 | {<p>pprev | <n>next}", group = list];
    NULL[shape = plaintext, label = "NULL", group = list]
    {rank = "min" list_head}
    list_head -> node_1 -> node_2 -> node_3[
        weight = 10, style = invis
    ]
    list_head:n -> node_1:m;
    node_1:p -> list_head:n;
    node_1:n -> node_2:m;
    node_2:p -> node_1:n;
    node_2:n -> node_3:m;
    node_3:p -> node_2:n;
    node_3:n -> NULL;
}
```

#### Hash function

`int hash = (num < 0 ? -num : num) % size;` expressed as a formula:

$hash = |num| \bmod size$

The function first takes the absolute value to guarantee a non-negative number, then takes the remainder so that the resulting hash value falls within the range of the hash table.

`find()` uses the hash table to look up a node, which is much faster than an ordinary binary search tree lookup (time complexity $O(\log n)$), but finding a suitable hash function is comparatively hard: a balance has to be struck between the hash range and the number of collisions.

```c
static int find(int num, int size, const struct hlist_head *heads)
{
    struct hlist_node *p;
    int hash = (num < 0 ? -num : num) % size;
    hlist_for_each (p, &heads[hash]) {
        struct order_node *on = list_entry(p, struct order_node, node);
        if (num == on->val)
            return on->idx;
    }
    return -1;
}
```
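`find()` only covers the lookup side; the excerpt never shows how the `inorder[]` values get into the table in the first place. Below is a minimal sketch of the insertion side, assuming an `order_node` layout consistent with the `list_entry()` call in `find()` and the usual `hlist_add_head()` from the course's `list.h`; the names follow the spirit of the quiz but may not match its code exactly.

```c
#include <stdlib.h>

struct order_node {
    struct hlist_node node; /* chains collisions within a bucket */
    int val;                /* value taken from inorder[] */
    int idx;                /* index of that value in inorder[] */
};

static inline void node_add(int val, int idx, int size,
                            struct hlist_head *heads)
{
    struct order_node *on = malloc(sizeof(*on));
    on->val = val;
    on->idx = idx;
    /* Same hashing rule as find(): absolute value modulo the table size. */
    int hash = (val < 0 ? -val : val) % size;
    hlist_add_head(&on->node, &heads[hash]);
}
```

A `buildTree()` driver would presumably call `node_add(inorder[i], i, size, in_heads)` for every index before the first call to `dfs()`, so that `find()` can answer in expected $O(1)$ time.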
#### DFS

The first entry of the preorder sequence is the root, while in the inorder sequence the data of the left and right subtrees lie to the left and right of the root. So after using `find()` to obtain the root's index in the inorder sequence, the left and right subtrees can be split off. Because in the preorder sequence the left subtree's data always comes before the right subtree's, `dfs()` is applied recursively to the left subtree first and then to the right subtree, which rebuilds the whole tree.

```c
static struct TreeNode *dfs(int *preorder, int pre_low, int pre_high,
                            int *inorder, int in_low, int in_high,
                            struct hlist_head *in_heads, int size)
{
    if (in_low > in_high || pre_low > pre_high)
        return NULL;

    struct TreeNode *tn = malloc(sizeof(*tn));
    tn->val = preorder[pre_low];
    int idx = find(preorder[pre_low], size, in_heads);
    tn->left = dfs(preorder, pre_low + 1, pre_low + (idx - in_low), inorder,
                   in_low, idx - 1, in_heads, size);
    tn->right = dfs(preorder, pre_high - (in_high - idx - 1), pre_high,
                    inorder, idx + 1, in_high, in_heads, size);
    return tn;
}
```

From the calls below we can see that after the first round of searching, the root's index in the inorder sequence is 1: the left subtree consists of the inorder entries with index smaller than 1 ($inorder[i],\ i < 1$), and the right subtree consists of the inorder entries with index greater than 1 ($inorder[i],\ 1 < i \le 4$).

```c
tn->left = dfs(preorder, 1, 1, inorder, 0, 0, in_heads, 5);
tn->right = dfs(preorder, 2, 4, inorder, 2, 4, in_heads, 5);
```

preorder: 3 [9] [20, 15, 7]
inorder: [9] 3 [15, 20, 7]

Pre-order traversal also shows up in the Linux kernel; `cgroup_for_each_descendant_pre()` walks a cgroup's descendants in pre-order:

```c
/**
 * cgroup_for_each_descendant_pre - pre-order walk of a cgroup's descendants
 * @pos: the cgroup * to use as the loop cursor
 * @cgroup: cgroup whose descendants to walk
 *
 * Walk @cgroup's descendants. Must be called under rcu_read_lock(). A
 * descendant cgroup which hasn't finished ->css_online() or already has
 * finished ->css_offline() may show up during traversal and it's each
 * subsystem's responsibility to verify that each @pos is alive.
 *
 * If a subsystem synchronizes against the parent in its ->css_online() and
 * before starting iterating, and synchronizes against @pos on each
 * iteration, any descendant cgroup which finished ->css_online() is
 * guaranteed to be visible in the future iterations.
 *
 * In other words, the following guarantees that a descendant can't escape
 * state updates of its ancestors.
 *
 * my_online(@cgrp)
 * {
 *	Lock @cgrp->parent and @cgrp;
 *	Inherit state from @cgrp->parent;
 *	Unlock both.
 * }
 *
 * my_update_state(@cgrp)
 * {
 *	Lock @cgrp;
 *	Update @cgrp's state;
 *	Unlock @cgrp;
 *
 *	cgroup_for_each_descendant_pre(@pos, @cgrp) {
 *		Lock @pos;
 *		Verify @pos is alive and inherit state from @pos->parent;
 *		Unlock @pos;
 *	}
 * }
 *
 * As long as the inheriting step, including checking the parent state, is
 * enclosed inside @pos locking, double-locking the parent isn't necessary
 * while inheriting. The state update to the parent is guaranteed to be
 * visible by walking order and, as long as inheriting operations to the
 * same @pos are atomic to each other, multiple updates racing each other
 * still result in the correct state. It's guaranateed that at least one
 * inheritance happens for any cgroup after the latest update to its
 * parent.
 *
 * If checking parent's state requires locking the parent, each inheriting
 * iteration should lock and unlock both @pos->parent and @pos.
 *
 * Alternatively, a subsystem may choose to use a single global lock to
 * synchronize ->css_online() and ->css_offline() against tree-walking
 * operations.
 */
#define cgroup_for_each_descendant_pre(pos, cgroup)                    \
    for (pos = cgroup_next_descendant_pre(NULL, (cgroup)); (pos);      \
         pos = cgroup_next_descendant_pre((pos), (cgroup)))
```

### 測驗2 :