# 2024q1 Homework2 (quiz1+2)
contributed by < ```aa860630``` >
## quiz1
### Problem 1
Data type: ```node_t```
```c
typedef struct __node {
    struct __node *left, *right;
    struct __node *next;
    long value;
} node_t;
```
```graphviz
digraph "__node"
{
node [shape="record"];
rankdir = "LR"
subgraph cluster_a
{
label = "node_t";
node1 [label="{<left> left |<right> right}|next|value"];
}
}
```
The while loops in ```list_tail()``` and ```list_length()``` walk the list to its end by repeatedly advancing ```left``` to the next node. Because ```left``` is an indirect pointer (a pointer to a pointer), each step takes the address of the next field, i.e. ```left = &((*left)->next)``` — the ```&``` in front of ```(*left)->next``` is required.
```c
node_t *list_tail(node_t **left)
{
    while ((*left) && (*left)->next)
        left = &((*left)->next);
    return *left;
}

int list_length(node_t **left)
{
    int n = 0;
    while (*left) {
        ++n;
        left = &((*left)->next);
    }
    return n;
}
```
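A minimal usage sketch (an illustration only, not part of the quiz code), assuming the ```node_t```, ```list_tail()``` and ```list_length()``` definitions above are in the same file:
```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* build a small NULL-terminated list by prepending: 3 -> 1 -> 2 */
    node_t *head = NULL;
    long values[] = {2, 1, 3};
    for (int i = 0; i < 3; i++) {
        node_t *n = malloc(sizeof(*n));
        n->value = values[i];
        n->left = n->right = NULL;
        n->next = head;
        head = n;
    }

    printf("tail = %ld, length = %d\n",
           list_tail(&head)->value, list_length(&head)); /* tail = 2, length = 3 */
    return 0;
}
```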
The intuitive way to implement quicksort is recursion: once a pivot is chosen in each round, recursively sort the sublists smaller and larger than the pivot. Here the pointer arrays ```begin[]```/```end[]``` are used instead to record the sublists still waiting to be processed; a higher index corresponds to a sublist that lies closer to the maximum. The benefit is that no recursive function calls are needed (the sublists live in the arrays rather than on the call stack), which is fast and avoids the space cost of deep recursion.
```c
while (i >= 0) {
    node_t *L = begin[i], *R = end[i];
    ...
        /* L != R: the smaller partition, the pivot and the larger partition
         * are pushed back into three consecutive slots of begin[]/end[] */
        begin[i] = left;
        end[i] = list_tail(&left);
        begin[i + 1] = pivot;
        end[i + 1] = pivot;
        begin[i + 2] = right;
        end[i + 2] = list_tail(&right);

        left = right = NULL;
        i += 2;
    ...
        /* L == R: the sublist holds at most one node; pop it */
        i--;
}
```
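The elided parts (```...```) pick the pivot and split the current sublist. Below is only a sketch consistent with the surrounding excerpt; ```list_add()``` (a helper that prepends a node to a list) and ```result``` (the accumulator holding nodes already in final sorted order) are assumed names that do not appear in the excerpt above:
```c
if (L != R) {
    /* take the leftmost node as pivot and detach it */
    node_t *pivot = L;
    long value = pivot->value;
    node_t *p = pivot->next;
    pivot->next = NULL;

    /* distribute the remaining nodes: <= pivot goes to 'left', > pivot to 'right' */
    while (p) {
        node_t *n = p;
        p = p->next;
        list_add(n->value > value ? &right : &left, n);
    }

    /* ... then push left / pivot / right back onto begin[]/end[] as shown above ... */
} else {
    /* at most one node: it is already in place, prepend it to 'result' and pop */
    if (L)
        list_add(&result, L);
    i--;
}
```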
#### Time complexity
If the pivot chosen in each round happens to be the median, each of the two resulting sublists holds only about half of the nodes; this is the best case:
$T(n) = 2\,T(n/2) + cn \Rightarrow O(n \log n)$
When quicksort runs on an already sorted list, the pivot chosen in each round is the smallest (or largest) element, so the partition never splits the list into two halves; this is the worst case:
$T(n) = T(n-1) + T(0) + cn \Rightarrow O(n^2)$
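Expanding the worst-case recurrence makes the quadratic bound explicit:
$$
T(n) = T(n-1) + cn = T(n-2) + c(n-1) + cn = \cdots = T(0) + c\sum_{k=1}^{n} k = O(n^2)
$$
The rounds below illustrate this worst case on the already sorted list 1→2→3→4→5: in every round the pivot is the leftmost (smallest) remaining node, so each round finalizes only one element.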
> Round 1
```graphviz
digraph {
node[shape=record];
graph[pencolor=transparent];
rankdir=LR;
p1[label="{<data> 1|<next>}"];
p2[label="{<data> 2|<next>}"];
p3[label="{<data> 3|<next>}"];
p4[label="{<data> 4|<next>}"];
p5[label="{<data> 5|<next>}"];
L[shape=none];
pivot[shape=none];
R[shape=none];
L -> p1
pivot -> p1
R -> p5
// {rank=same; L -> p1;}
edge[tailclip=false,arrowtail=dot,dir=both];
p1:next:c -> p2:data;
p2:next:c -> p3:data;
p3:next:c -> p4:data;
p4:next:c -> p5:data;
p5:next:c -> NULL;
}
```
> Round 2
```graphviz
digraph {
node[shape=record];
graph[pencolor=transparent];
rankdir=LR;
p1[label="{<data> 1|<next>}"];
p2[label="{<data> 2|<next>}"];
p3[label="{<data> 3|<next>}"];
p4[label="{<data> 4|<next>}"];
p5[label="{<data> 5|<next>}"];
L[shape=none];
pivot[shape=none];
R[shape=none];
L -> p2
pivot -> p2
R -> p5
// {rank=same; L -> p1;}
edge[tailclip=false,arrowtail=dot,dir=both];
p1:next:c -> p2:data;
p2:next:c -> p3:data;
p3:next:c -> p4:data;
p4:next:c -> p5:data;
p5:next:c -> NULL;
}
```
> Round 3
```graphviz
digraph {
node[shape=record];
graph[pencolor=transparent];
rankdir=LR;
p1[label="{<data> 1|<next>}"];
p2[label="{<data> 2|<next>}"];
p3[label="{<data> 3|<next>}"];
p4[label="{<data> 4|<next>}"];
p5[label="{<data> 5|<next>}"];
L[shape=none];
pivot[shape=none];
R[shape=none];
L -> p3
pivot -> p3
R -> p5
// {rank=same; L -> p1;}
edge[tailclip=false,arrowtail=dot,dir=both];
p1:next:c -> p2:data;
p2:next:c -> p3:data;
p3:next:c -> p4:data;
p4:next:c -> p5:data;
p5:next:c -> NULL;
}
```
### Problem 2
* ```find_run()```: finds the next run, i.e. a maximal ascending or descending subsequence. It returns a ```pair``` structure holding the head of the run and a pointer to the element right after it.
* ```merge_collapse()```: merges adjacent runs until the stack invariant holds again. It uses ```stk_size``` to know how many runs are on the stack and uses the run sizes to decide which adjacent pair to merge.
* ```merge_force_collapse()```: forcibly merges adjacent runs until at most two runs remain on the stack; the run sizes decide which pair is merged, and ```stk_size``` decides whether to keep going.
* ```merge_final()```: merges the remaining runs into one fully sorted list, consuming elements according to the return value of ```cmp``` while preserving stability. Finally it rebuilds the ```prev``` links through ```build_prev_link()```, turning the result back into a circular doubly linked list.
The timsort code is built from the functions above; they are executed in the following order:
1. First, ```timsort()``` resets the run stack size (```stk_size```) to 0 and takes ```head->next``` as the first element. If the list is empty (```head == head->prev```) there is nothing to sort and it returns immediately; otherwise the circular list is converted into a NULL-terminated singly linked list.
2. Next, ```find_run()``` extracts the runs one at a time. It walks the list and peels off a maximal ascending or descending subsequence.
Taking a descending run as an example: as long as ```cmp(priv, list, next)``` reports that the current node is greater than the next one, ```len``` is incremented; this continues until the sequence stops descending, and the run is reversed into ascending order along the way.
```c
static struct pair find_run(void *priv,
                            struct list_head *list,
                            list_cmp_func_t cmp)
{
    size_t len = 1;
    struct list_head *next = list->next, *head = list;
    struct pair result;
    ...
    if (cmp(priv, list, next) > 0) {
        /* descending run: reverse the list while walking it */
        struct list_head *prev = NULL;
        do {
            len++;
            list->next = prev;
            prev = list;
            list = next;
            next = list->next;
            head = list;
        } while (next && cmp(priv, list, next) > 0);
        list->next = prev;
        ...
    }
    /* the run length is stashed in the second node's prev field,
     * so it can be read back later when comparing run sizes */
    head->prev = NULL;
    head->next->prev = (struct list_head *) len;
    result.head = head, result.next = next;
    return result;
}
```
3. After each run is found, ```merge_collapse()``` merges adjacent runs until the stack invariant holds again. Looking at the sizes of the runs on the stack, it decides whether a merge is needed and which pair of adjacent runs to merge.
This keeps the run lengths on the stack balanced, and because a merge may only involve adjacent runs (to preserve stability), it can only take the form A+(B+C) or (A+B)+C. This is what lets Timsort stay both efficient and stable; see [2024q1](https://hackmd.io/@sysprog/linux2024-quiz1#%E6%B8%AC%E9%A9%97-2) for the detailed discussion.
```c
static struct list_head *merge_collapse(void *priv,
                                        list_cmp_func_t cmp,
                                        struct list_head *tp)
{
    int n;
    while ((n = stk_size) >= 2) {
        if ((n >= 3 &&
             run_size(tp->prev->prev) <= run_size(tp->prev) + run_size(tp)) ||
            (n >= 4 && run_size(tp->prev->prev->prev) <=
                           run_size(tp->prev->prev) + run_size(tp->prev))) {
            if (run_size(tp->prev->prev) < run_size(tp)) {
                tp->prev = merge_at(priv, cmp, tp->prev);
            } else {
                tp = merge_at(priv, cmp, tp);
            }
        } else if (run_size(tp->prev) <= run_size(tp)) {
            tp = merge_at(priv, cmp, tp);
        } else {
            break;
        }
    }
    return tp;
}
```
4. Then ```merge_force_collapse()``` forcibly merges adjacent runs, and keeps doing so while the stack still holds three or more runs, so that at most two runs remain for the final merge. A sketch of this helper is shown below.
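```merge_force_collapse()``` and ```run_size()``` are not shown in the excerpts above; the following is only a sketch of what they may look like, consistent with ```find_run()``` and ```merge_collapse()``` (```merge_at()``` is the same pairwise-merge helper already used by ```merge_collapse()```):
```c
static inline size_t run_size(struct list_head *head)
{
    if (!head)
        return 0;
    if (!head->next)
        return 1;
    /* find_run() stashed the run length in the second node's prev field */
    return (size_t) (head->next->prev);
}

static struct list_head *merge_force_collapse(void *priv,
                                              list_cmp_func_t cmp,
                                              struct list_head *tp)
{
    /* keep merging among the top runs until at most two runs are left */
    while (stk_size >= 3) {
        if (run_size(tp->prev->prev) < run_size(tp))
            tp->prev = merge_at(priv, cmp, tp->prev);
        else
            tp = merge_at(priv, cmp, tp);
    }
    return tp;
}
```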
5. Finally, ```merge_final()``` merges the remaining runs into one complete sorted list, again consuming elements according to the return value of ```cmp``` while preserving stability, and then rebuilds the ```prev``` links so the result becomes a circular doubly linked list again.
What is stability? A stable sort keeps elements that compare equal in their original relative order. The ``` = ``` in ```cmp(priv, a, b) <= 0``` guarantees that on a tie the element from ```a```, which comes earlier, is taken first, so equal elements are never reordered unnecessarily and stability is preserved, as the excerpt below shows:
```c
for (;;) {
    /* if equal, take 'a' -- important for sort stability */
    if (cmp(priv, a, b) <= 0) {
        tail->next = a;
        a->prev = tail;
        tail = a;
        a = a->next;
        if (!a)
            break;
    }
```
Putting the functions above together yields the complete ```timsort()```:
```c
void timsort(void *priv, struct list_head *head, list_cmp_func_t cmp)
{
    stk_size = 0;

    struct list_head *list = head->next, *tp = NULL;
    if (head == head->prev)
        return;

    /* Convert to a null-terminated singly-linked list. */
    head->prev->next = NULL;

    do {
        /* Find next run */
        struct pair result = find_run(priv, list, cmp);
        result.head->prev = tp;
        tp = result.head;
        list = result.next;
        stk_size++;
        tp = merge_collapse(priv, cmp, tp);
    } while (list);

    /* End of input; merge together all the runs. */
    tp = merge_force_collapse(priv, cmp, tp);

    /* The final merge; rebuild prev links */
    struct list_head *stk0 = tp, *stk1 = stk0->prev;
    while (stk1 && stk1->prev)
        stk0 = stk0->prev, stk1 = stk1->prev;
    if (stk_size <= 1) {
        build_prev_link(head, head, stk0);
        return;
    }
    merge_final(priv, cmp, head, stk1, stk0);
}
```
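Both ```timsort()``` and ```merge_final()``` end by calling ```build_prev_link()``` to restore the ```prev``` pointers. Its body is not quoted above; the following is a sketch of what it may look like, consistent with how it is called (```list``` is a NULL-terminated chain appended after ```tail```):
```c
static void build_prev_link(struct list_head *head,
                            struct list_head *tail,
                            struct list_head *list)
{
    /* append the NULL-terminated 'list' after 'tail', filling in prev */
    tail->next = list;
    do {
        list->prev = tail;
        tail = list;
        list = list->next;
    } while (list);

    /* close the ring so the result is a circular doubly linked list again */
    tail->next = head;
    head->prev = tail;
}
```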
## quiz2
### Problem 1
#### Hash table data structure [(reference)](https://hackmd.io/rpIi8FeqQ9eBJ-UTFgb--A?view)
The structure is illustrated below:
```graphviz
digraph G {
rankdir=LR;
node[shape=record];
map [label="hlist_head.first |<ht0> |<ht1> |<ht2> |<ht3> |<ht4> |<ht5> |<ht7> |<ht8> "];
node[shape=none]
null1 [label=NULL]
null2 [label=NULL]
subgraph cluster_1 {
style=filled;
color=lightgrey;
node [shape=record];
hn1 [label="hlist_node | {<prev>pprev | <next>next}"];
label="hash_key 1"
}
subgraph cluster_2 {
style=filled;
color=lightgrey;
node [shape=record];
hn2 [label="hlist_node | {<prev>pprev | <next>next}"];
label="hash_key 2"
}
subgraph cluster_3 {
style=filled;
color=lightgrey;
node [shape=record];
hn3 [label="hlist_node | {<prev>pprev | <next>next}"];
label="hash_key 3"
}
map:ht1 -> hn1
hn1:next -> hn2
hn2:next -> null1
hn2:prev:s -> hn1:next:s
map:ht5 -> hn3
hn3:next -> null2
hn1:prev:s -> map:ht1
hn3:prev:s -> map:ht5
}
```
In the Linux kernel's hash table implementation, hash collisions are chained with ```hlist_node```:
```c
struct hlist_node {
    struct hlist_node *next, **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};
```
Rather than an ordinary circular doubly linked list, a pointer to a pointer (```pprev```) is used here, which brings two benefits:
* The list is not circular: only the ```next``` pointer at the very end is ```NULL```.
* ```next``` points to the next node itself, while ```pprev``` points to the pointer that points at this node (either the bucket's ```first``` or the previous node's ```next```).
```graphviz
digraph G {
rankdir = LR;
splines = false;
node[shape = "record"]
list_head[label = "<m>list_head | <n>first"]
node_1[label = "<m>hlist_node_1 | {<p>pprev | <n>next}", group = list];
node_2[label = "<m>hlist_node_2 | {<p>pprev | <n>next}", group = list];
node_3[label = "<m>hlist_node_3 | {<p>pprev | <n>next}", group = list];
NULL[shape = plaintext, label = "NULL", group = list]
{rank = "min" list_head}
list_head -> node_1 -> node_2 -> node_3[
weight = 10, style = invis
]
list_head:n -> node_1:m;
node_1:p -> list_head:n;
node_1:n -> node_2:m;
node_2:p -> node_1:n;
node_2:n -> node_3:m;
node_3:p -> node_2:n;
node_3:n -> NULL;
}
```
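Because ```pprev``` points at whichever pointer currently references the node (the bucket's ```first``` or the previous node's ```next```), removal never has to special-case the first node of a bucket. The following is a simplified sketch modeled on the kernel's hlist API, not verbatim kernel code:
```c
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;
    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

static inline void hlist_del(struct hlist_node *n)
{
    /* *pprev is whichever pointer currently points at n:
     * either head->first or the previous node's next field */
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
}
```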
#### Hash function
```int hash = (num < 0 ? -num : num) % size;``` written as a formula is:
$hash = |num| \bmod size$
It first takes the absolute value to guarantee a non-negative operand, then takes the remainder so the resulting hash value always lies within the table's range.
```find()``` looks a node up through the hash table in $O(1)$ on average, which is much faster than searching a binary search tree ($O(\log n)$). The difficulty is choosing a suitable hash function: it must strike a balance between the size of the hash range and the number of collisions.
```c
static int find(int num, int size, const struct hlist_head *heads)
{
    struct hlist_node *p;
    int hash = (num < 0 ? -num : num) % size;
    hlist_for_each (p, &heads[hash]) {
        struct order_node *on = list_entry(p, struct order_node, node);
        if (num == on->val)
            return on->idx;
    }
    return -1;
}
```
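```find()``` only performs lookups; the table still has to be filled from the inorder array. Below is a sketch of a matching insertion helper — ```node_add()``` and the ```struct order_node``` layout are assumptions consistent with the ```val```, ```idx``` and ```node``` members referenced in ```find()```, and ```hlist_add_head()``` is the insertion sketched in the previous section:
```c
struct order_node {
    struct hlist_node node;
    int val;
    int idx;
};

static void node_add(int val, int idx, int size, struct hlist_head *heads)
{
    struct order_node *on = malloc(sizeof(*on));
    on->val = val;
    on->idx = idx;
    int hash = (val < 0 ? -val : val) % size;   /* same hash as find() */
    hlist_add_head(&on->node, &heads[hash]);
}
```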
#### DFS
The first entry of the preorder sequence is the root, while in the inorder sequence the left and right subtrees sit to the left and right of the root. So once ```find()``` returns the root's index in the inorder array, the left and right subtrees can be split off. Because in preorder the left subtree's elements always come before the right subtree's, ```dfs()``` first recurses on the left subtree and then on the right subtree, which reconstructs the whole tree.
```C
static struct TreeNode *dfs(int *preorder,
                            int pre_low,
                            int pre_high,
                            int *inorder,
                            int in_low,
                            int in_high,
                            struct hlist_head *in_heads,
                            int size)
{
    if (in_low > in_high || pre_low > pre_high)
        return NULL;

    struct TreeNode *tn = malloc(sizeof(*tn));
    tn->val = preorder[pre_low];
    /* locate the root inside the inorder sequence via the hash table */
    int idx = find(preorder[pre_low], size, in_heads);
    /* left subtree: the (idx - in_low) elements before the root */
    tn->left = dfs(preorder, pre_low + 1, pre_low + (idx - in_low), inorder,
                   in_low, idx - 1, in_heads, size);
    /* right subtree: the (in_high - idx) elements after the root */
    tn->right = dfs(preorder, pre_high - (in_high - idx - 1), pre_high, inorder,
                    idx + 1, in_high, in_heads, size);
    return tn;
}
```
From the calls below, after the first lookup the root's index in the inorder array is 1; the left subtree consists of the inorder entries with index < 1 ($inorder[i], i < 1$) and the right subtree of those with index > 1 ($inorder[i], 1 < i \le 4$).
```c
tn->left = dfs(preorder, 1, 1, inorder, 0, 0, in_heads, 5);
tn->right = dfs(preorder, 2, 4, inorder, 2, 4, in_heads, 5);
```
* Preorder: 3 [9] [20, 15, 7]
* Inorder: [9] 3 [15, 20, 7]
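Putting the pieces together, the following is a sketch of a top-level builder that allocates the buckets, fills them with the hypothetical ```node_add()``` sketched earlier, and starts the recursion; the ```struct TreeNode``` layout (```val```, ```left```, ```right```) is the one assumed by ```dfs()``` above:
```c
static struct TreeNode *buildTree(int *preorder,
                                  int preorderSize,
                                  int *inorder,
                                  int inorderSize)
{
    struct hlist_head *in_heads = malloc(inorderSize * sizeof(*in_heads));
    for (int i = 0; i < inorderSize; i++)
        in_heads[i].first = NULL;                 /* empty buckets */
    for (int i = 0; i < inorderSize; i++)
        node_add(inorder[i], i, inorderSize, in_heads);

    /* the whole preorder / inorder range reconstructs the whole tree */
    return dfs(preorder, 0, preorderSize - 1, inorder, 0, inorderSize - 1,
               in_heads, inorderSize);
}
```
Pre-order traversal also appears in the Linux kernel itself; for example, the cgroup subsystem provides ```cgroup_for_each_descendant_pre``` for a pre-order walk of a cgroup's descendants: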
```c
/**
* cgroup_for_each_descendant_pre - pre-order walk of a cgroup's descendants
* @pos: the cgroup * to use as the loop cursor
* @cgroup: cgroup whose descendants to walk
*
* Walk @cgroup's descendants. Must be called under rcu_read_lock(). A
* descendant cgroup which hasn't finished ->css_online() or already has
* finished ->css_offline() may show up during traversal and it's each
* subsystem's responsibility to verify that each @pos is alive.
*
* If a subsystem synchronizes against the parent in its ->css_online() and
* before starting iterating, and synchronizes against @pos on each
* iteration, any descendant cgroup which finished ->css_online() is
* guaranteed to be visible in the future iterations.
*
* In other words, the following guarantees that a descendant can't escape
* state updates of its ancestors.
*
* my_online(@cgrp)
* {
* Lock @cgrp->parent and @cgrp;
* Inherit state from @cgrp->parent;
* Unlock both.
* }
*
* my_update_state(@cgrp)
* {
* Lock @cgrp;
* Update @cgrp's state;
* Unlock @cgrp;
*
* cgroup_for_each_descendant_pre(@pos, @cgrp) {
* Lock @pos;
* Verify @pos is alive and inherit state from @pos->parent;
* Unlock @pos;
* }
* }
*
* As long as the inheriting step, including checking the parent state, is
* enclosed inside @pos locking, double-locking the parent isn't necessary
* while inheriting. The state update to the parent is guaranteed to be
* visible by walking order and, as long as inheriting operations to the
* same @pos are atomic to each other, multiple updates racing each other
* still result in the correct state. It's guaranateed that at least one
* inheritance happens for any cgroup after the latest update to its
* parent.
*
* If checking parent's state requires locking the parent, each inheriting
* iteration should lock and unlock both @pos->parent and @pos.
*
* Alternatively, a subsystem may choose to use a single global lock to
* synchronize ->css_online() and ->css_offline() against tree-walking
* operations.
*/
#define cgroup_for_each_descendant_pre(pos, cgroup) \
for (pos = cgroup_next_descendant_pre(NULL, (cgroup)); (pos); \
pos = cgroup_next_descendant_pre((pos), (cgroup)))
```
### Problem 2