# 2024q1 Homework2 (quiz1+2)

contributed by < `aa860630` >

## quiz1

### 測驗1 :

Data type: `node_t`

```c
typedef struct __node {
    struct __node *left, *right;
    struct __node *next;
    long value;
} node_t;
```

```graphviz
digraph "__node" {
    node [shape="record"];
    rankdir = "LR"
    subgraph cluster_a {
        label = "node_t";
        node1 [label="{<left> left |<right> right}|next|value"];
    }
}
```

The `while` loops in `list_tail()` and `list_length()` keep walking as long as the current node (and, for `list_tail()`, its successor) exists. The walk advances by pointing `left` at the next node's `next` field; because `left` is an indirect pointer (a pointer to a pointer), the address-of operator is required, i.e. `&((*left)->next)`.

```c
node_t *list_tail(node_t **left)
{
    while ((*left) && (*left)->next)
        left = &((*left)->next);
    return *left;
}

int list_length(node_t **left)
{
    int n = 0;
    while (*left) {
        ++n;
        left = &((*left)->next);
    }
    return n;
}
```

The more intuitive way to write quick sort is to pick a pivot each round and recursively sort the sublists smaller and larger than the pivot. Here, instead, two pointer arrays record the head and tail of every pending sublist; the higher the index in the arrays, the closer the sublist is to the maximum. The benefit is that no recursive function calls are needed (the call stack is not used for bookkeeping), which is fast and saves space. A fuller sketch is given at the end of this section.

```c
while (i >= 0) {
    node_t *L = begin[i], *R = end[i];
    ...
    begin[i] = left;
    end[i] = list_tail(&left);
    begin[i + 1] = pivot;
    end[i + 1] = pivot;
    begin[i + 2] = right;
    end[i + 2] = list_tail(&right);

    left = right = NULL;
    i += 2;
    ...
    i--;
}
```

#### Time complexity

If every chosen pivot happens to be the median, the next round needs only about half the comparisons. This is the best case:

$T(n) = 2T(n/2) + cn$, giving $T(n) = O(n \log n)$

When quick sort runs on an already-sorted list, the pivot chosen each round is the minimum or maximum, so the partition never splits the list into two halves. This is the worst case:

$T(n) = T(n-1) + T(0) + cn$, giving $T(n) = O(n^2)$

The diagrams below show the first three rounds on the already-sorted list 1–5, where the pivot is always the smallest remaining element.

> Round 1
```graphviz
digraph {
    node[shape=record];
    graph[pencolor=transparent];
    rankdir=LR;

    p1[label="{<data> 1|<next>}"];
    p2[label="{<data> 2|<next>}"];
    p3[label="{<data> 3|<next>}"];
    p4[label="{<data> 4|<next>}"];
    p5[label="{<data> 5|<next>}"];
    L[shape=none];
    pivot[shape=none];
    R[shape=none];

    L -> p1
    pivot -> p1
    R -> p5
    // {rank=same; L -> p1;}
    edge[tailclip=false,arrowtail=dot,dir=both];
    p1:next:c -> p2:data;
    p2:next:c -> p3:data;
    p3:next:c -> p4:data;
    p4:next:c -> p5:data;
    p5:next:c -> NULL;
}
```

> Round 2
```graphviz
digraph {
    node[shape=record];
    graph[pencolor=transparent];
    rankdir=LR;

    p1[label="{<data> 1|<next>}"];
    p2[label="{<data> 2|<next>}"];
    p3[label="{<data> 3|<next>}"];
    p4[label="{<data> 4|<next>}"];
    p5[label="{<data> 5|<next>}"];
    L[shape=none];
    pivot[shape=none];
    R[shape=none];

    L -> p2
    pivot -> p2
    R -> p5
    // {rank=same; L -> p1;}
    edge[tailclip=false,arrowtail=dot,dir=both];
    p1:next:c -> p2:data;
    p2:next:c -> p3:data;
    p3:next:c -> p4:data;
    p4:next:c -> p5:data;
    p5:next:c -> NULL;
}
```

> Round 3
```graphviz
digraph {
    node[shape=record];
    graph[pencolor=transparent];
    rankdir=LR;

    p1[label="{<data> 1|<next>}"];
    p2[label="{<data> 2|<next>}"];
    p3[label="{<data> 3|<next>}"];
    p4[label="{<data> 4|<next>}"];
    p5[label="{<data> 5|<next>}"];
    L[shape=none];
    pivot[shape=none];
    R[shape=none];

    L -> p3
    pivot -> p3
    R -> p5
    // {rank=same; L -> p1;}
    edge[tailclip=false,arrowtail=dot,dir=both];
    p1:next:c -> p2:data;
    p2:next:c -> p3:data;
    p3:next:c -> p4:data;
    p4:next:c -> p5:data;
    p5:next:c -> NULL;
}
```
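To make the excerpted loop easier to follow, here is a minimal sketch of the whole iterative quick sort, written against the `node_t`, `list_tail()` and `list_length()` shown above. The prepend helper `list_add()` is my own naming and the details may differ from the quiz's reference code; treat this as an illustration of the explicit `begin[]`/`end[]` stack, not as the official solution.

```c
static void list_add(node_t **list, node_t *node_to_add)
{
    /* Prepend a node to a NULL-terminated singly-linked list. */
    node_to_add->next = *list;
    *list = node_to_add;
}

void quick_sort(node_t **list)
{
    if (!*list)
        return;

    int n = list_length(list);
    int i = 0;
    int max_level = 2 * n; /* enough slots in the array "stack" even for the worst case */
    node_t *begin[max_level], *end[max_level];
    node_t *result = NULL, *left = NULL, *right = NULL;

    begin[0] = *list;
    end[0] = list_tail(list);

    while (i >= 0) {
        node_t *L = begin[i], *R = end[i];
        if (L != R) {
            /* Take the head of the current sublist as the pivot and detach it. */
            node_t *pivot = L;
            long value = pivot->value;
            node_t *p = pivot->next;
            pivot->next = NULL;

            /* Partition the rest: nodes greater than the pivot go right, the others left. */
            while (p) {
                node_t *next = p->next;
                list_add(p->value > value ? &right : &left, p);
                p = next;
            }

            /* Push "left | pivot | right" back onto the array stack. */
            begin[i] = left;
            end[i] = list_tail(&left);
            begin[i + 1] = pivot;
            end[i + 1] = pivot;
            begin[i + 2] = right;
            end[i + 2] = list_tail(&right);

            left = right = NULL;
            i += 2;
        } else {
            /* A sublist of length 0 or 1 is already sorted: emit it and pop. */
            if (L)
                list_add(&result, L);
            i--;
        }
    }
    *list = result;
}
```

Because larger sublists sit at higher indices and are processed first, single nodes are prepended into `result` from largest to smallest, so the final list comes out in ascending order without any recursion.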
### 測驗2 :

* `find_run()` : locates a run, i.e. a consecutive non-descending or descending subsequence. It returns a `pair` structure holding the head of the run that was found and a pointer to the element right after it.
* `merge_collapse()` : merges adjacent runs until a balancing condition holds. It uses `stk_size` to decide how many runs to examine and the run sizes to decide where to merge.
* `merge_force_collapse()` : forcibly merges adjacent runs until only one or two runs remain on the stack, again choosing the merge position by run size and checking `stk_size` to decide whether to keep merging.
* `merge_final()` : merges the remaining runs into one fully sorted list. It walks through the runs, merging them according to the return value of `cmp` while keeping the sort stable, and finally rebuilds the `prev` links through `build_prev_link()` so the result becomes a circular doubly-linked list again.

The timsort code is built mainly out of the functions above; their order of execution and roles are as follows:

1. First, some state is initialized, including the stack size `stk_size`; `head` is set to the head of the list, and if the list holds at most one element no sorting is needed, so the function returns immediately.
2. Next, `find_run()` is used to locate each run. It scans the list for a non-descending or descending subsequence and splits it off. Taking the descending case: according to `cmp(priv, list, next)`, as long as the current node is greater than the next one, `len++` and the scan continues until the sequence stops descending, while the descending run is reversed at the same time.

```c
static struct pair find_run(void *priv, struct list_head *list,
                            list_cmp_func_t cmp)
{
    size_t len = 1;
    struct list_head *next = list->next, *head = list;
    struct pair result;
    ...
    if (cmp(priv, list, next) > 0) {
        /* descending run, also reverse the list */
        struct list_head *prev = NULL;
        do {
            len++;
            list->next = prev;
            prev = list;
            list = next;
            next = list->next;
            head = list;
        } while (next && cmp(priv, list, next) > 0);
        list->next = prev;
    ...
    }

    head->prev = NULL;
    head->next->prev = (struct list_head *) len;
    result.head = head, result.next = next;
    return result;
}
```

Note how the run's length is stashed at the end: it is cast into `head->next->prev`. How it is read back is sketched after step 5 below.

3. After the runs are found, `merge_collapse()` merges adjacent runs until the balancing condition is met. It looks at the sizes of the runs on the stack to decide whether a merge is needed and, based on those sizes, where to merge.
With this function, not only are the run lengths on the stack kept balanced; because merges may only happen between two adjacent runs, stability is preserved as well, so a merge can only take the form A+(B+C) or (A+B)+C. This is what lets Timsort stay both efficient and stable; see [2024q1](https://hackmd.io/@sysprog/linux2024-quiz1#%E6%B8%AC%E9%A9%97-2) for the detailed explanation.

```c
static struct list_head *merge_collapse(void *priv,
                                        list_cmp_func_t cmp,
                                        struct list_head *tp)
{
    int n;
    while ((n = stk_size) >= 2) {
        if ((n >= 3 &&
             run_size(tp->prev->prev) <= run_size(tp->prev) + run_size(tp)) ||
            (n >= 4 && run_size(tp->prev->prev->prev) <=
                           run_size(tp->prev->prev) + run_size(tp->prev))) {
            if (run_size(tp->prev->prev) < run_size(tp)) {
                tp->prev = merge_at(priv, cmp, tp->prev);
            } else {
                tp = merge_at(priv, cmp, tp);
            }
        } else if (run_size(tp->prev) <= run_size(tp)) {
            tp = merge_at(priv, cmp, tp);
        } else {
            break;
        }
    }

    return tp;
}
```

4. Then `merge_force_collapse()` forcibly merges adjacent runs, walking the stack until only one or two runs remain on it.
5. Finally, `merge_final()` merges the remaining runs into one fully sorted list. It traverses the runs and merges them according to the return value of the `cmp` function while preserving stability, and at the end rebuilds the `prev` links so the list becomes circular again.

What is stability? A sort is stable when elements that compare equal keep their original relative order. Swapping two equal values may look harmless because the result seems unchanged, but it breaks that guarantee, and the unnecessary swaps can also lengthen execution time and even distort measurements. The `=` in `cmp(priv, a, b) <= 0` makes sure that when the values are equal, the earlier element `a` is taken first, which keeps the sort stable.

```c
for (;;) {
    /* if equal, take 'a' -- important for sort stability */
    if (cmp(priv, a, b) <= 0) {
        tail->next = a;
        a->prev = tail;
        tail = a;
        a = a->next;
        if (!a)
            break;
    }
```
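The excerpts above lean on a helper that is never shown: `merge_collapse()` calls `run_size()`, and `find_run()` stores each run's length by casting it into `head->next->prev`. A minimal sketch of how that length can be read back, consistent with the encoding in `find_run()` (the helper in the actual quiz code may differ in detail), looks like this:

```c
static inline size_t run_size(struct list_head *head)
{
    if (!head)
        return 0;   /* no run in this stack slot */
    if (!head->next)
        return 1;   /* a single-node run has no second node to hold a length */
    /* find_run() stashed the length in the second node's otherwise unused prev field */
    return (size_t) (head->next->prev);
}
```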
With the combination of the functions above, the timsort implementation is complete.

```c
void timsort(void *priv, struct list_head *head, list_cmp_func_t cmp)
{
    stk_size = 0;

    struct list_head *list = head->next, *tp = NULL;
    if (head == head->prev)
        return;

    /* Convert to a null-terminated singly-linked list. */
    head->prev->next = NULL;

    do {
        /* Find next run */
        struct pair result = find_run(priv, list, cmp);
        result.head->prev = tp;
        tp = result.head;
        list = result.next;
        stk_size++;
        tp = merge_collapse(priv, cmp, tp);
    } while (list);

    /* End of input; merge together all the runs. */
    tp = merge_force_collapse(priv, cmp, tp);

    /* The final merge; rebuild prev links */
    struct list_head *stk0 = tp, *stk1 = stk0->prev;
    while (stk1 && stk1->prev)
        stk0 = stk0->prev, stk1 = stk1->prev;
    if (stk_size <= 1) {
        build_prev_link(head, head, stk0);
        return;
    }
    merge_final(priv, cmp, head, stk1, stk0);
}
```

## quiz2

### 測驗1 :

#### Hash table data structure [(reference)](https://hackmd.io/rpIi8FeqQ9eBJ-UTFgb--A?view)

An illustration:

```graphviz
digraph G {
    rankdir=LR;
    node[shape=record];
    map [label="hlist_head.first |<ht0> |<ht1> |<ht2> |<ht3> |<ht4> |<ht5> |<ht7> |<ht8> "];
    node[shape=none]
    null1 [label=NULL]
    null2 [label=NULL]

    subgraph cluster_1 {
        style=filled;
        color=lightgrey;
        node [shape=record];
        hn1 [label="hlist_node | {<prev>pprev | <next>next}"];
        label="hash_key 1"
    }
    subgraph cluster_2 {
        style=filled;
        color=lightgrey;
        node [shape=record];
        hn2 [label="hlist_node | {<prev>pprev | <next>next}"];
        label="hash_key 2"
    }
    subgraph cluster_3 {
        style=filled;
        color=lightgrey;
        node [shape=record];
        hn3 [label="hlist_node | {<prev>pprev | <next>next}"];
        label="hash_key 3"
    }

    map:ht1 -> hn1
    hn1:next -> hn2
    hn2:next -> null1
    hn2:prev:s -> hn1:next:s
    map:ht5 -> hn3
    hn3:next -> null2
    hn1:prev:s -> map:ht1
    hn3:prev:s -> map:ht5
}
```

In the Linux kernel's hash table implementation, `hlist_node` is what handles hash value collisions:

```c
struct hlist_node {
    struct hlist_node *next, **pprev;
};
struct hlist_head {
    struct hlist_node *first;
};
```

Using a pointer to a pointer here instead of an ordinary doubly-linked list has two benefits:

* Only the pointer at the very end of the chain holds `NULL`.
* `next` points to the neighbouring node itself, while `pprev` points to the pointer that points at the node, so a node can be unlinked without knowing which bucket head it hangs from.

```graphviz
digraph G {
    rankdir = LR;
    splines = false;
    node[shape = "record"]
    list_head[label = "<m>list_head | <n>first"]
    node_1[label = "<m>hlist_node_1 | {<p>pprev | <n>next}", group = list];
    node_2[label = "<m>hlist_node_2 | {<p>pprev | <n>next}", group = list];
    node_3[label = "<m>hlist_node_3 | {<p>pprev | <n>next}", group = list];
    NULL[shape = plaintext, label = "NULL", group = list]
    {rank = "min" list_head}
    list_head -> node_1 -> node_2 -> node_3[
        weight = 10, style = invis
    ]
    list_head:n -> node_1:m;
    node_1:p -> list_head:n;
    node_1:n -> node_2:m;
    node_2:p -> node_1:n;
    node_2:n -> node_3:m;
    node_3:p -> node_2:n;
    node_3:n -> NULL;
}
```

#### Hash function

`int hash = (num < 0 ? -num : num) % size;` expressed as a formula:

$hash = |num| \bmod size$

The function first takes the absolute value to guarantee a non-negative number, then takes the remainder so that the resulting hash value falls within the range of the hash table.

`find()` uses the hash table to look up a node, which is much faster than an ordinary binary search tree lookup (time complexity $O(\log n)$), but finding a suitable hash function is comparatively hard: a balance has to be struck between the hash range and the number of collisions.

```c
static int find(int num, int size, const struct hlist_head *heads)
{
    struct hlist_node *p;
    int hash = (num < 0 ? -num : num) % size;
    hlist_for_each (p, &heads[hash]) {
        struct order_node *on = list_entry(p, struct order_node, node);
        if (num == on->val)
            return on->idx;
    }
    return -1;
}
```
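`find()` only covers the lookup side; the excerpt never shows how the `inorder[]` values get into the table in the first place. Below is a minimal sketch of the insertion side, assuming an `order_node` layout consistent with the `list_entry()` call in `find()` and the usual `hlist_add_head()` from the course's `list.h`; the names follow the spirit of the quiz but may not match its code exactly.

```c
#include <stdlib.h>

struct order_node {
    struct hlist_node node; /* chains collisions within a bucket */
    int val;                /* value taken from inorder[] */
    int idx;                /* index of that value in inorder[] */
};

static inline void node_add(int val, int idx, int size,
                            struct hlist_head *heads)
{
    struct order_node *on = malloc(sizeof(*on));
    on->val = val;
    on->idx = idx;
    /* Same hashing rule as find(): absolute value modulo the table size. */
    int hash = (val < 0 ? -val : val) % size;
    hlist_add_head(&on->node, &heads[hash]);
}
```

A `buildTree()` driver would presumably call `node_add(inorder[i], i, size, in_heads)` for every index before the first call to `dfs()`, so that `find()` can answer in expected $O(1)$ time.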
#### DFS

The first entry of the preorder sequence is the root, while in the inorder sequence the data of the left and right subtrees lie to the left and right of the root. So after using `find()` to obtain the root's index in the inorder sequence, the left and right subtrees can be split off. Because in the preorder sequence the left subtree's data always comes before the right subtree's, `dfs()` is applied recursively to the left subtree first and then to the right subtree, which rebuilds the whole tree.

```c
static struct TreeNode *dfs(int *preorder, int pre_low, int pre_high,
                            int *inorder, int in_low, int in_high,
                            struct hlist_head *in_heads, int size)
{
    if (in_low > in_high || pre_low > pre_high)
        return NULL;

    struct TreeNode *tn = malloc(sizeof(*tn));
    tn->val = preorder[pre_low];
    int idx = find(preorder[pre_low], size, in_heads);
    tn->left = dfs(preorder, pre_low + 1, pre_low + (idx - in_low), inorder,
                   in_low, idx - 1, in_heads, size);
    tn->right = dfs(preorder, pre_high - (in_high - idx - 1), pre_high,
                    inorder, idx + 1, in_high, in_heads, size);
    return tn;
}
```

From the calls below we can see that after the first round of searching, the root's index in the inorder sequence is 1: the left subtree consists of the inorder entries with index smaller than 1 ($inorder[i],\ i < 1$), and the right subtree consists of the inorder entries with index greater than 1 ($inorder[i],\ 1 < i \le 4$).

```c
tn->left = dfs(preorder, 1, 1, inorder, 0, 0, in_heads, 5);
tn->right = dfs(preorder, 2, 4, inorder, 2, 4, in_heads, 5);
```

preorder: 3 [9] [20, 15, 7]
inorder: [9] 3 [15, 20, 7]

Pre-order traversal also shows up in the Linux kernel; `cgroup_for_each_descendant_pre()` walks a cgroup's descendants in pre-order:

```c
/**
 * cgroup_for_each_descendant_pre - pre-order walk of a cgroup's descendants
 * @pos: the cgroup * to use as the loop cursor
 * @cgroup: cgroup whose descendants to walk
 *
 * Walk @cgroup's descendants. Must be called under rcu_read_lock(). A
 * descendant cgroup which hasn't finished ->css_online() or already has
 * finished ->css_offline() may show up during traversal and it's each
 * subsystem's responsibility to verify that each @pos is alive.
 *
 * If a subsystem synchronizes against the parent in its ->css_online() and
 * before starting iterating, and synchronizes against @pos on each
 * iteration, any descendant cgroup which finished ->css_online() is
 * guaranteed to be visible in the future iterations.
 *
 * In other words, the following guarantees that a descendant can't escape
 * state updates of its ancestors.
 *
 * my_online(@cgrp)
 * {
 *	Lock @cgrp->parent and @cgrp;
 *	Inherit state from @cgrp->parent;
 *	Unlock both.
 * }
 *
 * my_update_state(@cgrp)
 * {
 *	Lock @cgrp;
 *	Update @cgrp's state;
 *	Unlock @cgrp;
 *
 *	cgroup_for_each_descendant_pre(@pos, @cgrp) {
 *		Lock @pos;
 *		Verify @pos is alive and inherit state from @pos->parent;
 *		Unlock @pos;
 *	}
 * }
 *
 * As long as the inheriting step, including checking the parent state, is
 * enclosed inside @pos locking, double-locking the parent isn't necessary
 * while inheriting. The state update to the parent is guaranteed to be
 * visible by walking order and, as long as inheriting operations to the
 * same @pos are atomic to each other, multiple updates racing each other
 * still result in the correct state. It's guaranateed that at least one
 * inheritance happens for any cgroup after the latest update to its
 * parent.
 *
 * If checking parent's state requires locking the parent, each inheriting
 * iteration should lock and unlock both @pos->parent and @pos.
 *
 * Alternatively, a subsystem may choose to use a single global lock to
 * synchronize ->css_online() and ->css_offline() against tree-walking
 * operations.
 */
#define cgroup_for_each_descendant_pre(pos, cgroup)                    \
    for (pos = cgroup_next_descendant_pre(NULL, (cgroup)); (pos);      \
         pos = cgroup_next_descendant_pre((pos), (cgroup)))
```

### 測驗2 :