2023q1 Homework4 (quiz4)

# 2023q1 Homework4 (quiz4) contributed by < `Joshualee0321` > ## 開發環境 ```bash $ gcc --version gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 43 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 12 On-line CPU(s) list: 0-11 Vendor ID: AuthenticAMD Model name: AMD Ryzen 5 2600X Six-Core Processor CPU family: 23 Model: 8 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 1 Stepping: 2 Frequency boost: enabled CPU max MHz: 3600.0000 CPU min MHz: 2200.0000 BogoMIPS: 7199.10 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht ``` ## 測驗一 [`rb.h`](https://gist.github.com/jserv/6610eb56bf2486979c8bf6ee8061f71c) * 打開這個標頭檔後，可發現許多巨集以下為自己列出的筆記，<s>歡迎各位幫我指正我哪裡解釋錯誤。</s> :::danger 筆記和程式碼本來就要讓各方指教，不用特別提及。 :notes: jserv ::: ### 理解 rb.h 的實作展開該巨集，利用 `rb_gen(static, ex_, ex_t, ex_node_t, ex_link, ex_cmp)` 把以下的巨集展開之後可以得到 ```c static void ex_new(ex_t *tree); static ex_node_t *ex_search(ex_t *tree, ex_node_t *key); static void ex_insert(ex_t *tree, ex_node_t *node); static void ex_remove(ex_t *tree, ex_node_t *node); static void ex_destroy(ex_t *tree, void (*cb)(ex_node_t *, void*), void *arg); ``` :::danger 避免非必要的項目標示 (亦即 `* ` 開頭的描述)，並改進你的漢語表達。 :notes: jserv ::: 接下來我們可以觀察 `ex_insert` 的實作方法，其中大致可以分成幾個部分 ```c static void ex_insert(ex_t *rbtree, ex_node_t *node) { ex_path_entry_t path[RB_MAX_DEPTH]; ex_path_entry_t *pathp; rbt_node_new(ex_node_t, ex_link, rbtree, node); /* Wind. */ path->node = rbtree->root; for (pathp = path; pathp->node; pathp++) { int cmp = pathp->cmp = x_cmp(node, pathp->node); assert(cmp != 0); if (cmp < 0) { pathp[1].node = rbtn_left_get(ex_node_t, ex_link, pathp->node); } else { pathp[1].node = rbtn_right_get(ex_node_t, ex_link, pathp->node); } } pathp->node = node; ``` * 首先宣告 `ex_path_entry_t path[RB_MAX]` 以及 `ex_path_entry_t *pathp` 作為 iterator * 其作用為從樹根開始走訪節點（尋找插入的地方），並且把往左往右記錄在 `ex_path_entry_t` 的陣列當中 * 假設一紅黑樹如下 ```graphviz digraph BST { graph[ordering="out"] node [shape=circle, fixedsize=true, style=filled,width=.8] "B:35"[fillcolor=red] "C:60"[fillcolor=red] "A:50" "A:50" -> "B:35" "A:50" -> "C:60" } ``` * 假設現在要插入 `15`，則會先以 `BST` 的特性找到該插入的地方並且插入 :::info 注意這邊先行設定愈插入的 node 為紅色 ::: * 則接下來會得到以下樹 ```graphviz digraph BST{ graph[ordering="out"] node [shape=circle, fixedsize=true, style=filled, width=.8] "B:35"[fillcolor=red] "C:60"[fillcolor=red] NULL[color=transparent] "D:15"[color=red] "A:50" -> "B:35" "A:50" -> "C:60" "B:35" -> "D:15" "B:35" -> NULL } ``` 此時的 `path` 會長這樣 ```graphviz digraph { node [shape=plaintext, fontcolor=black, fontsize=16]; "Iter:" -> "Values:" [color=white]; node [shape=record, fontcolor=black, fontsize=14, width=4.75, fixedsize=true]; pointers [label="<f0> pathp | <f2> pathp[1]", color=white]; values [label="<f0> 50, -1| <f1> 35, -1| <f2> NULL| <f3> | <f4> | <f5> "]; { rank=same; "Iter:"; pointers } { rank=same; "Values:"; values } edge [color=blue]; pointers:f0 -> values:f1; pointers:f2 -> values:f2; } ``` 依照 `rb_insert` 中的 `pathp->node = node` ，把 `D:15` 插入在 `NULL` 中，接下來就進入到程式碼的第二個片段 ```c for (pathp--; (uintptr_t) pathp >= (uintptr_t) path; pathp--) { x_type *cnode = pathp->node; if(pathp->cmp < 0) { ... } else { ... } } ``` 由於在上一個迴圈中 `pathp` 的位置如下 ```graphviz digraph { node [shape=plaintext, fontcolor=black, fontsize=16]; "Iter:" -> "Values:" [color=white]; node [shape=record, fontcolor=black, fontsize=14, width=4.75, fixedsize=true]; pointers [label="<f0> pathp | <f2> ", color=white]; values [label="<f0> 50, -1| <f1> 35, -1| <f2> NULL| <f3> | <f4> | <f5> "]; { rank=same; "Iter:"; pointers } { rank=same; "Values:"; values } edge [color=blue]; pointers:f0 -> values:f2; } ``` 故要把原本的指針往前調整如下 ```graphviz digraph { node [shape=plaintext, fontcolor=black, fontsize=16]; "Iter:" -> "Values:" [color=white]; node [shape=record, fontcolor=black, fontsize=14, width=4.75, fixedsize=true]; pointers [label="<f0> pathp | <f2> ", color=white]; values [label="<f0> 50, -1| <f1> 35, -1| <f2> NULL| <f3> | <f4> | <f5> "]; { rank=same; "Iter:"; pointers } { rank=same; "Values:"; values } edge [color=blue]; pointers:f0 -> values:f1; } ``` 所以才會在這個迴圈先做一次 `pathp--` 往前走一格。再來，把以上 `if` 的部分展開之後可以看到 `else` 在下一個部分再看 ```c // cnode = 當前位置 if(pathp->cmp < 0) { ex_node_t *left = pathp[1].node; rbtn_left_set(ex_node_t, ex_link, cnode, left); if (!rbtn_red_get(ex_node_t, ex_link, left)) return; ex_node_t *leftleft = rbtn_left_get(ex_node_t, ex_link, left); if(leftleft && rbtn_red_get(ex_node_t, ex_link, leftleft)) { ex_node_t *tnode; // 暫存 rbtn_black_set(ex_node_t, ex_link, leftleft); rbtn_rotate_right(ex_node_t, ex_link, cnode, tnode); cnode = tnode; } else{...} } ``` * 由於這樣設計只需要考慮往左邊或往右邊，所以這邊用格子中的值去判斷自己的親節點是左邊還是右邊 `if(pathp->cmp < 0){...} else{...}`，並且由於是左傾斜的樹，他只需要對右邊的特殊狀況處理即可 * 如果有兩個顏色都為紅色的節點，那就代表這棵 `RB_t` 並不平均，那他就會旋轉，圖示如下 ```graphviz digraph BST { graph[ordering="out"] node [shape=circle, fixedsize=true, style=filled,width=.8] left,leftleft[fillcolor=red] null,null1[fontsize=0,color=transparent] cnode -> left cnode -> null[color=transparent] left -> leftleft left -> null1[color=transparent] fix_left[fillcolor=red] fix_left -> fix_leftleft fix_left -> fix_cnode } ``` 當然，這邊只考慮了往左的狀況，那如果往右邊呢？ ```c else { ex_node_t *right = path[1].node; rbtn_right_set(ex_node_t, ex_link, cnode, right); // 把 current node 的右邊設定為當前的下一個點 if (!rbtn_red_get(ex_node_t, ex_link, right)) // 如果沒有連續兩個紅色 return; ex_node_t *left = rbtn_left_get(ex_node_t, ex_link, cnode); // 這邊對右邊的狀況作特殊處理 if (left && rbtn_red_get(ex_node_t, ex_link, left)) { rbtn_black_set(ex_node_t, ex_link, left); rbtn_black_set(ex_node_t, ex_link, right); rbtn_red_set(ex_node_t, ex_link, cnode); } else { /* 因為並不是兩個連續的紅色 node，又是右邊，所以必須回復左傾斜的特性*/ ex_node_t *tnode; bool tred = rbtn_red_get(ex_node_t, ex_link, cnode); // 看當前的點是否為紅 rbtn_rotate_left(ex_node_t, ex_link, cnode, tnode); rbtn_color_set(ex_node_t, ex_link, tnode, tred); rbtn_red_set(ex_node_t, ex_link, cnode); // 旋轉後根據顏色設定 cnode = tnode; } } pathp->node = cnode; ``` 最後再根據紅黑樹的特性，把樹根設定為黑色即可 `rbtree-root = path->node; rbtn_black_set(ex_node_t, ex_link, rbtree->root);` :::info 以上為 linux 左傾斜紅黑樹插入的實作 > 確認 Linux 核心的 rbtree 是否為 LLRBT，若不是，考慮因素在哪？ :notes: jserv <s> `AAAA` = `rbtn_black_set` `BBBB` = `tred` `CCCC` = `rbtn_red_set` </s> ::: ### 以下為 `rb_tree` 的刪除部分，由於太長，這邊只列出想法以及答案此程式碼也可以分成幾段來觀看，並且於先前 `insert` 一樣，都會創建一個 `iteration_list` 來記錄當前走過的路。第一段 ```c for(pathp = path; pathp->node; pathp++) { int cmp = pathp->cmp = x_cmp(node, pathp->node); if(cmp < 0) {...} // 代表往左走 else { // 代表往右走或是找到了 if(cmp == 0) { // 找到要刪除的點 pathp->cmp = 0; // 這邊可以視為用 0 紀錄要刪除的點 for(pathp++; pathp->node; pathp++) { ... } // 往左找替換點 } } } ``` * 接下來為下一個段落，下一個段落就是從先確認是否在 0 和 1 層做事，如果是的話就不需要和後繼者 (`successor`) 互相交換，並且把要刪除的點放在 `path[-1]`，反之，則調整顏色 ```c pathp--; // 因為會跑到 NULL 節點要往回走一格 if (pathp->node != node) { ... } // left successor else { ex_node_t *left = rbtn_left_get(ex_node_t, ex_link, node); rbtn_black_set(ex_node_t, ex_link, left); if (pathp->node != node) { ... } else { ex_node_t *left = rbtn_left_get(ex_node_t, ex_link, node); if(left) { ... } else if(!rbtn_right_get(ex_node_t, ex_link, node)) {} } } ``` * 經剛剛所述都調整過後 (經過交換 `successor` 以及設定顏色後) 開始從當前位置往前調整紅黑樹 * <s>根據此[網誌](http://alrightchiu.github.io/SecondRound/red-black-tree-deleteshan-chu-zi-liao-yu-fixupxiu-zheng.html)得知紅黑樹的刪除過程</s> :::warning 參照更權威的書籍和文獻，例如大學教科書，避免從語焉不詳的網誌學習。 :notes: jserv ::: * 最後就是根據當前的點往前調整紅黑樹使其恢復平衡首先先來看到程式碼片段的 FFFF ```c if (leftrightleft && \ rbtn_red_get(x_type, x_field, leftrightleft)) { \ /* || */ \ /* pathp(b) */ \ /* / \\ */ \ /* (r) (b) */ \ /* \ */ \ /* (b) */ \ /* / */ \ /* (r) */ \ x_type *unode; \ rbtn_black_set(x_type, x_field, leftrightleft); \ rbtn_rotate_right(x_type, x_field, pathp->node, \ unode); \ rbtn_rotate_right(x_type, x_field, pathp->node, \ tnode); \ rbtn_right_set(x_type, x_field, unode, tnode); \ FFFF(x_type, x_field, unode, tnode); ``` 欲刪除的節點為目前節點的右節點，為了要使刪除時維持左傾斜的特性，作者選擇右旋轉兩次，以下為圖示 ```graphviz digraph BST { graph[ordering="out"] node [shape=circle, fixedsize=true, style=filled,width=.8] n1, n2, n3[color=transparent, fontsize=0] "A" "B:unode"[fillcolor=red] "C" "D"[fillcolor=red] "A" -> "B:unode" "A" -> "E" "B:unode" -> n1[color=transparent] "B:unode" -> "C" "C" -> "D" "C" -> n2[color=transparent] } ``` 在最後一步之前會轉化為這樣 ```graphviz digraph BST { graph[ordering="out"] node [shape=circle, fixedsize=true, style=filled,width=.8] n1, n2, n3, n4[color=transparent, fontsize=0] "A" "B:unode"[fillcolor=red] "C" "D"[fillcolor=red] tnode -> n4[color=transparent] tnode -> "B:unode" "B:unode" -> n1[color=transparent] "B:unode" -> "A" "A" -> "C" "A" -> "E" "C" -> "D" "C" -> n2[color=transparent] } ``` 在經過最後一次左旋轉即可得到左傾斜的紅黑樹 ```graphviz digraph BST { graph[ordering="out"] node [shape=circle, fixedsize=true, style=filled,width=.8] n1, n2, n3, n4[color=transparent, fontsize=0] "A" "B:unode"[fillcolor=red] "C" "D"[fillcolor=red] "B:unode" -> tnode "B:unode" -> "A" "A" -> "C" "A" -> "E" "C" -> "D" "C" -> n2[color=transparent] } ``` 如此一來即恢復平衡在 `GGGG` 則是可以直接以 `rbtn_rotate_right` 填入來恢復平衡 :::info 以上為 linux 左傾斜紅黑樹刪除的想法 `DDDD` = `0` : 為了要記錄刪除的點 `EEEE` = `!rbtn_right_get(x_type, x_field, node)` : 為了看是否右邊沒有東西 `FFFF` = `rbtn_rotate_left` `GGGG` = `rbtn_rotate_right` ::: --- ## 測驗二 ### `TimSort` * 是一種排序演算法，幾乎所有的情況都可以有 `O(nlogn)` (comparison based sort 的最佳時間複雜度)，其中利用了 `insertion_sort` 跟 `merge_sort` 來分別處理比較少的資料以及比較多的資料。 ### `__inplace_merge` 為了要合併兩個 `run` 並且保持 `inplace`，這邊採用的方法為反轉裡面的資料來達到排序的效果 * 由於我們可以確定每一個 run 都經由 `insertion_sort` 排序，所以為升冪排序，以下給予圖片解說來解釋裡面程式碼的作用 * 假設兩個run 分別為 `[1, 2, 4, 5]` 跟 `[3, 6, 10]` 流程如下： 1. 首先將 `cur_1` 指到比 `first_2` 更小的位置利用 `while(cur_1 < last_2 && !cmp(cur_2, cur_1)) cur_1 += size;` 2. 再來把 `cur_2` 指到比　`cur_1` 更小的位置利用 `while(cur_2 < last_2 && cmp(cur_2, cur_1)) cur_2 += size;` 即可得到目前的狀況 ```graphviz digraph { node [shape=plaintext, fontcolor=black, fontsize=16]; "" -> "Values:" [color=white]; node [shape=record, fontcolor=black, fontsize=14, width=4.75, fixedsize=true]; pointers [label="<f0> first_1 |<cur> cur_1| <f1> last_1 | <f2> first_2| <f5> cur_2| <f3> last_2", color=white]; values [label="<f0> 1| <f1> 2| <f2> 4| <f3> 5 | <f4> 3 | <f5> 6 | <f6> 10"]; { rank=same; ""; pointers } { rank=same; "Values:"; values } edge [color=blue]; // pointers:f0 -> values:f2; pointers:f0 -> values:f0; pointers:f2 -> values:f4; pointers:f1 -> values:f3; pointers:f3 -> values:f6; pointers:cur -> values:f2; pointers:f5 -> values:f4; } ``` 接下來把 `cur_1` 到 `first_2` 之前相反過來 * `__reverse(cur_1, first_2, size)` 再來將 `first_2` 與 `cur_2` 之前相反過來 * `__reverse(first_2, cur_2, size)` 即可得到當前的陣列 ```graphviz digraph { node [shape=plaintext, fontcolor=black, fontsize=16]; "" -> "Values:" [color=white]; node [shape=record, fontcolor=black, fontsize=14, width=4.75, fixedsize=true]; pointers [label="<f0> first_1 |<cur> cur_1| <f1> last_1 | <f2> first_2| <f5> cur_2| <f3> last_2", color=white]; values [label="<f0> 1| <f1> 2| <f2> 5| <f3> 4 | <f4> 3 | <f5> 6 | <f6> 10"]; { rank=same; ""; pointers } { rank=same; "Values:"; values } edge [color=blue]; // pointers:f0 -> values:f2; pointers:f0 -> values:f0; pointers:f2 -> values:f4; pointers:f1 -> values:f3; pointers:f3 -> values:f6; pointers:cur -> values:f2; pointers:f5 -> values:f4; } ``` 可以注意到，根據剛剛的翻轉，中間已經呈現了一個降冪的關係，此時只需要重新反轉一次即可得到最後排序好的兩個 `run` `__reverse(cur_1, cur_2, size)` ### 為避免單個 run 太長，導致 `merge` 速度變差 * 設計這個 `tim_sort` 的人有稍微考慮到，當兩個 `run` 的長度不一致，達到非常懸殊的等級的時候，假設兩個 `run` 分別為 `A_run, (len(A_run) == 10)` 以及 `B_run, (len(B_run) == 100)`，此時的 `merge` 會在兩個已經sort好的陣列中做出相當多不需要的操作，而當兩個 `run` 的長度差不多時，此時的 `merge` 才算比較有效率。 * 為了達到此效用，作者設計了一個 `merge_rule` 如下 ```c static inline bool merge_rule(run_t *stack, size_t top) { if (top < 1) return false; if (top < 2) return run_length(stack[top]) > run_length(stack[top - 1]); return run_length(stack[top]) > run_length(stack[top - 1]) || run_length(stack[top]) + run_length(stack[top - 1]) > run_length(stack[top - 2]); } ``` 意即若當前這個 `run` 加上前一個 `run` 沒有比前兩個 `run` 大時，不會進入 `merge` 的片段 :::info <s> 各答案如下： `AAAA` : `__reverse(cur_1, cur_2, size)` `BBBB` : `top - 1` `CCCC` : `top - 2` `DDDD` : `first + MIN_RUN * size` 我想這邊應該不需要太過解釋，就只是沒有超過時取最小 `EEEE` : `top--` 為了要合併 top 的內容 `FFFF` : `stack[top - 1].last = stack[top].last` 一個一個合併 </s> 不用抄寫答案，只要專注程式碼本身和你的理解。 :notes: jserv ::: --- ## 測驗三 ### `RB_LOG2_MAX_NODES` ```c #define RB_LOG2_MAX_MEM_BYTES (sizeof(void *) << 3) /* clang-format off */ #define RB_LOG2_MAX_NODES \ (RB_LOG2_MAX_MEM_BYTES - \ (sizeof(void *) >= 8 ? 4 \ : sizeof(void *) >= 4 ? 3 \ : 1) - 1) /* clang-format on */ ``` 可以根據 `rb_tree` 的定義，樹高為 `2 * log(n + 1)` 以及不超出記憶體限制的規則得到以下結論。 ### `sum_tree` :::danger 沒必要將程式碼註解改為中文，直接解讀程式碼的作用，避免逐行解說，而該揣摩設計者的思維。 :notes: jserv ::: ```c static int sum_subtree(node_t *node) { int result = 0; while (node) { node_t *pre; if (!rbtn_left_get(node_t, link, node)) goto do_print; for (pre = rbtn_left_get(node_t, link, node); rbtn_right_get(node_t, link, pre) && rbtn_right_get(node_t, link, pre) != node; pre = rbtn_right_get(node_t, link, pre)) { } /* 根據每一個點往右走*/ if (!rbtn_right_get(node_t, link, pre)) { rbtn_right_get(node_t, link, pre) = node; node = rbtn_left_get(node_t, link, node); /* 右邊沒有找左邊 */ } else { rbtn_right_get(node_t, link, pre) = NULL; do_print: result += node->value; printf("value: %d\n", node->value); node = rbtn_right_get(node_t, link, node); /* cache */ } } return result; } ```