2023q1 Homework4 (quiz4)

contributed by < Joshualee0321 >

開發環境

$ gcc --version
    gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
    Copyright (C) 2021 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ lscpu
    Architecture:            x86_64
      CPU op-mode(s):        32-bit, 64-bit
      Address sizes:         43 bits physical, 48 bits virtual
      Byte Order:            Little Endian
    CPU(s):                  12
      On-line CPU(s) list:   0-11
    Vendor ID:               AuthenticAMD
      Model name:            AMD Ryzen 5 2600X Six-Core Processor
        CPU family:          23
        Model:               8
        Thread(s) per core:  2
        Core(s) per socket:  6
        Socket(s):           1
        Stepping:            2
        Frequency boost:     enabled
        CPU max MHz:         3600.0000
        CPU min MHz:         2200.0000
        BogoMIPS:            7199.10
        Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht

測驗一 `rb.h`

打開這個標頭檔後，可發現許多巨集以下為自己列出的筆記，~~歡迎各位幫我指正我哪裡解釋錯誤。~~

筆記和程式碼本來就要讓各方指教，不用特別提及。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

理解 rb.h 的實作

展開該巨集，利用 rb_gen(static, ex_, ex_t, ex_node_t, ex_link, ex_cmp) 把以下的巨集展開之後可以得到

static void ex_new(ex_t *tree);
static ex_node_t *ex_search(ex_t *tree, ex_node_t *key);
static void ex_insert(ex_t *tree, ex_node_t *node);
static void ex_remove(ex_t *tree, ex_node_t *node);
static void ex_destroy(ex_t *tree, void (*cb)(ex_node_t *, void*), void *arg);

避免非必要的項目標示 (亦即 * 開頭的描述)，並改進你的漢語表達。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

接下來我們可以觀察 ex_insert 的實作方法，其中大致可以分成幾個部分

    static void ex_insert(ex_t *rbtree, ex_node_t *node)             
    {                                                                          
        ex_path_entry_t path[RB_MAX_DEPTH];                             
        ex_path_entry_t *pathp;                                         
        rbt_node_new(ex_node_t, ex_link, rbtree, node);                           
        /* Wind. */                                                            
        path->node = rbtree->root;                                             
        for (pathp = path; pathp->node; pathp++) {                             
            int cmp = pathp->cmp = x_cmp(node, pathp->node);                   
            assert(cmp != 0);                                                  
            if (cmp < 0) {                                                     
                pathp[1].node = rbtn_left_get(ex_node_t, ex_link, pathp->node);   
            } else {                                                          
                pathp[1].node = rbtn_right_get(ex_node_t, ex_link, pathp->node);  
            }                                                                 
        }                                                                      
        pathp->node = node;

首先宣告 ex_path_entry_t path[RB_MAX] 以及 ex_path_entry_t *pathp 作為 iterator
其作用為從樹根開始走訪節點（尋找插入的地方），並且把往左往右記錄在 ex_path_entry_t 的陣列當中
假設一紅黑樹如下

digraph BST {
    graph[ordering="out"]
    node [shape=circle, fixedsize=true, style=filled,width=.8]
    "B:35"[fillcolor=red]
    "C:60"[fillcolor=red]
    "A:50"
    
    "A:50" -> "B:35"
    "A:50" -> "C:60"
}

假設現在要插入 15，則會先以 BST 的特性找到該插入的地方並且插入

注意這邊先行設定愈插入的 node 為紅色

則接下來會得到以下樹

digraph BST{ 
    graph[ordering="out"]
    node [shape=circle, fixedsize=true, style=filled, width=.8]

    "B:35"[fillcolor=red]
    "C:60"[fillcolor=red]
    NULL[color=transparent]
    
    "D:15"[color=red]
    
    "A:50" -> "B:35"
    "A:50" -> "C:60"
    "B:35" -> "D:15"
    "B:35" -> NULL
    
    
}

此時的 path 會長這樣

digraph {
    node [shape=plaintext, fontcolor=black, fontsize=16];
    "Iter:" -> "Values:" [color=white];

    node [shape=record, fontcolor=black, fontsize=14, width=4.75, fixedsize=true];
    pointers [label="<f0> pathp | <f2> pathp[1]", color=white];
    values [label="<f0> 50, -1| <f1> 35, -1| <f2> NULL| <f3>  | <f4>  | <f5> "];


    { rank=same; "Iter:"; pointers }
    { rank=same; "Values:"; values }

    edge [color=blue];
    pointers:f0 -> values:f1;
    pointers:f2 -> values:f2;
}

依照 rb_insert 中的 pathp->node = node ，把 D:15 插入在 NULL 中，接下來就進入到程式碼的第二個片段

    for (pathp--; (uintptr_t) pathp >= (uintptr_t) path; pathp--) {
        x_type *cnode = pathp->node;
        if(pathp->cmp < 0) { ... }
        else { ... }
    }

由於在上一個迴圈中 pathp 的位置如下

digraph {
    node [shape=plaintext, fontcolor=black, fontsize=16];
    "Iter:" -> "Values:" [color=white];

    node [shape=record, fontcolor=black, fontsize=14, width=4.75, fixedsize=true];
    pointers [label="<f0> pathp | <f2> ", color=white];
    values [label="<f0> 50, -1| <f1> 35, -1| <f2> NULL| <f3>  | <f4>  | <f5> "];


    { rank=same; "Iter:"; pointers }
    { rank=same; "Values:"; values }

    edge [color=blue];
    pointers:f0 -> values:f2;
}

故要把原本的指針往前調整如下

digraph {
    node [shape=plaintext, fontcolor=black, fontsize=16];
    "Iter:" -> "Values:" [color=white];

    node [shape=record, fontcolor=black, fontsize=14, width=4.75, fixedsize=true];
    pointers [label="<f0> pathp | <f2> ", color=white];
    values [label="<f0> 50, -1| <f1> 35, -1| <f2> NULL| <f3>  | <f4>  | <f5> "];


    { rank=same; "Iter:"; pointers }
    { rank=same; "Values:"; values }

    edge [color=blue];
    pointers:f0 -> values:f1;
}

所以才會在這個迴圈先做一次 pathp-- 往前走一格。
再來，把以上 if 的部分展開之後可以看到 else 在下一個部分再看

// cnode = 當前位置
if(pathp->cmp < 0) {
    ex_node_t *left = pathp[1].node;
    rbtn_left_set(ex_node_t, ex_link, cnode, left);
    if (!rbtn_red_get(ex_node_t, ex_link, left))
        return;
    
    ex_node_t *leftleft = rbtn_left_get(ex_node_t, ex_link, left);
    if(leftleft && rbtn_red_get(ex_node_t, ex_link, leftleft)) {
        ex_node_t *tnode; // 暫存
        rbtn_black_set(ex_node_t, ex_link, leftleft);
        rbtn_rotate_right(ex_node_t, ex_link, cnode, tnode);
        cnode = tnode;
    } else{...}
}

由於這樣設計只需要考慮往左邊或往右邊，所以這邊用格子中的值去判斷自己的親節點是左邊還是右邊 if(pathp->cmp < 0){...} else{...}，並且由於是左傾斜的樹，他只需要對右邊的特殊狀況處理即可
如果有兩個顏色都為紅色的節點，那就代表這棵 RB_t 並不平均，那他就會旋轉，圖示如下

digraph BST {
    graph[ordering="out"]
    node [shape=circle, fixedsize=true, style=filled,width=.8]
    
    left,leftleft[fillcolor=red]
    
    null,null1[fontsize=0,color=transparent] 
    cnode -> left
    cnode -> null[color=transparent]
    left -> leftleft
    left -> null1[color=transparent]
    
    fix_left[fillcolor=red]
    
    fix_left -> fix_leftleft
    fix_left -> fix_cnode
    
}

當然，這邊只考慮了往左的狀況，那如果往右邊呢？

else {
    ex_node_t *right = path[1].node;
    rbtn_right_set(ex_node_t, ex_link, cnode, right);
    // 把 current node 的右邊設定為當前的下一個點
    
    if (!rbtn_red_get(ex_node_t, ex_link, right))
        // 如果沒有連續兩個紅色
        return;
    ex_node_t *left = rbtn_left_get(ex_node_t, ex_link, cnode);
    // 這邊對右邊的狀況作特殊處理
    
    if (left && rbtn_red_get(ex_node_t, ex_link, left)) {
        rbtn_black_set(ex_node_t, ex_link, left);
        rbtn_black_set(ex_node_t, ex_link, right);
        rbtn_red_set(ex_node_t, ex_link, cnode);
    } else {
        /* 因為並不是兩個連續的紅色 node，又是右邊，所以必須回復左傾斜的特性*/
        ex_node_t *tnode;
        bool tred = rbtn_red_get(ex_node_t, ex_link, cnode); 
        // 看當前的點是否為紅
        rbtn_rotate_left(ex_node_t, ex_link, cnode, tnode);
        rbtn_color_set(ex_node_t, ex_link, tnode, tred);
        rbtn_red_set(ex_node_t, ex_link, cnode);
        // 旋轉後根據顏色設定
        cnode = tnode;
    } 
}
pathp->node = cnode;

最後再根據紅黑樹的特性，把樹根設定為黑色即可 rbtree-root = path->node; rbtn_black_set(ex_node_t, ex_link, rbtree->root);

以上為 linux 左傾斜紅黑樹插入的實作

確認 Linux 核心的 rbtree 是否為 LLRBT，若不是，考慮因素在哪？

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

jserv

~~`AAAA` = `rbtn_black_set` `BBBB` = `tred` `CCCC` = `rbtn_red_set`~~

以下為 `rb_tree` 的刪除部分，由於太長，這邊只列出想法以及答案

此程式碼也可以分成幾段來觀看，並且於先前 insert 一樣，都會創建一個 iteration_list 來記錄當前走過的路。第一段

for(pathp = path; pathp->node; pathp++) {
    int cmp = pathp->cmp = x_cmp(node, pathp->node);
    if(cmp < 0) {...} // 代表往左走
    else { // 代表往右走 或是找到了
        if(cmp == 0) {
            // 找到要刪除的點
            pathp->cmp = 0; // 這邊可以視為用 0 紀錄要刪除的點
            for(pathp++; pathp->node; pathp++) { ... } // 往左找替換點
        }
    }
}

接下來為下一個段落，下一個段落就是從先確認是否在 0 和 1 層做事，如果是的話就不需要和後繼者 (successor) 互相交換，並且把要刪除的點放在 path[-1]，反之，則調整顏色

pathp--; // 因為會跑到 NULL 節點 要往回走一格
if (pathp->node != node) { ... } // left successor
else {
    ex_node_t *left = rbtn_left_get(ex_node_t, ex_link, node);
    rbtn_black_set(ex_node_t, ex_link, left);
    
    if (pathp->node != node) { ... }
    else {
        ex_node_t *left = rbtn_left_get(ex_node_t, ex_link, node);
        if(left) { ... }
        else if(!rbtn_right_get(ex_node_t, ex_link, node)) {}
    }
}

經剛剛所述都調整過後 (經過交換 successor 以及設定顏色後) 開始從當前位置往前調整紅黑樹
~~根據此網誌得知紅黑樹的刪除過程~~
參照更權威的書籍和文獻，例如大學教科書，避免從語焉不詳的網誌學習。
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
jserv
最後就是根據當前的點往前調整紅黑樹使其恢復平衡

首先先來看到程式碼片段的 FFFF

                    if (leftrightleft &&                                       \
                        rbtn_red_get(x_type, x_field, leftrightleft)) {        \
                        /*      ||                                        */   \
                        /*    pathp(b)                                    */   \
                        /*   /        \\                                  */   \
                        /* (r)        (b)                                 */   \
                        /*   \                                            */   \
                        /*   (b)                                          */   \
                        /*   /                                            */   \
                        /* (r)                                            */   \
                        x_type *unode;                                         \
                        rbtn_black_set(x_type, x_field, leftrightleft);        \
                        rbtn_rotate_right(x_type, x_field, pathp->node,        \
                                          unode);                              \
                        rbtn_rotate_right(x_type, x_field, pathp->node,        \
                                          tnode);                              \
                        rbtn_right_set(x_type, x_field, unode, tnode);         \
                        FFFF(x_type, x_field, unode, tnode);

欲刪除的節點為目前節點的右節點，為了要使刪除時維持左傾斜的特性，作者選擇右旋轉兩次，以下為圖示

digraph BST {
    graph[ordering="out"]
    node [shape=circle, fixedsize=true, style=filled,width=.8]
    
    n1, n2, n3[color=transparent, fontsize=0]
    
    "A"
    "B:unode"[fillcolor=red]
    "C"
    "D"[fillcolor=red]
    
    "A" -> "B:unode"
    "A" -> "E"
    "B:unode" -> n1[color=transparent]
    "B:unode" -> "C"
    "C" -> "D"
    "C" -> n2[color=transparent]
}

在最後一步之前會轉化為這樣

digraph BST {
    graph[ordering="out"]
    node [shape=circle, fixedsize=true, style=filled,width=.8]
    
    n1, n2, n3, n4[color=transparent, fontsize=0]
    
    "A"
    "B:unode"[fillcolor=red]
    "C"
    "D"[fillcolor=red]
    
    tnode -> n4[color=transparent]
    tnode -> "B:unode"
    "B:unode" -> n1[color=transparent]
    "B:unode" -> "A"
    "A" -> "C"
    "A" -> "E"


    "C" -> "D"
    "C" -> n2[color=transparent]
}

在經過最後一次左旋轉即可得到左傾斜的紅黑樹

digraph BST {
    graph[ordering="out"]
    node [shape=circle, fixedsize=true, style=filled,width=.8]
    
    n1, n2, n3, n4[color=transparent, fontsize=0]
    
    "A"
    "B:unode"[fillcolor=red]
    "C"
    "D"[fillcolor=red]
    
    "B:unode" -> tnode
    "B:unode" -> "A"
    
    "A" -> "C"
    "A" -> "E"
    "C" -> "D"
    "C" -> n2[color=transparent]
}

如此一來即恢復平衡
在 GGGG 則是可以直接以 rbtn_rotate_right 填入來恢復平衡

以上為 linux 左傾斜紅黑樹刪除的想法
DDDD = 0 : 為了要記錄刪除的點
EEEE = !rbtn_right_get(x_type, x_field, node) : 為了看是否右邊沒有東西
FFFF = rbtn_rotate_left
GGGG = rbtn_rotate_right

測驗二

`TimSort`

是一種排序演算法，幾乎所有的情況都可以有 O(nlogn) (comparison based sort 的最佳時間複雜度)，其中利用了 insertion_sort 跟 merge_sort 來分別處理比較少的資料以及比較多的資料。

`__inplace_merge`

為了要合併兩個 run 並且保持 inplace，這邊採用的方法為反轉裡面的資料來達到排序的效果

由於我們可以確定每一個 run 都經由 insertion_sort 排序，所以為升冪排序，以下給予圖片解說來解釋裡面程式碼的作用
假設兩個run 分別為 [1, 2, 4, 5] 跟 [3, 6, 10]

流程如下：

首先將 cur_1 指到比 first_2 更小的位置利用
while(cur_1 < last_2 && !cmp(cur_2, cur_1)) cur_1 += size;
再來把 cur_2 指到比　cur_1 更小的位置利用
while(cur_2 < last_2 && cmp(cur_2, cur_1)) cur_2 += size;

即可得到目前的狀況

digraph {
    node [shape=plaintext, fontcolor=black, fontsize=16];
    "" -> "Values:" [color=white];

    node [shape=record, fontcolor=black, fontsize=14, width=4.75, fixedsize=true];
    pointers [label="<f0> first_1 |<cur> cur_1| <f1> last_1 | <f2> first_2| <f5> cur_2| <f3> last_2", color=white];
    values [label="<f0> 1| <f1> 2| <f2> 4| <f3> 5 | <f4> 3 | <f5> 6 | <f6> 10"];


    { rank=same; ""; pointers }
    { rank=same; "Values:"; values }

    edge [color=blue];
    // pointers:f0 -> values:f2;
    pointers:f0 -> values:f0;
    pointers:f2 -> values:f4;
    pointers:f1 -> values:f3;
    pointers:f3 -> values:f6;
    pointers:cur -> values:f2;
    pointers:f5 -> values:f4;
}

接下來把 cur_1 到 first_2 之前相反過來

__reverse(cur_1, first_2, size)

再來將 first_2 與 cur_2 之前相反過來

__reverse(first_2, cur_2, size)

即可得到當前的陣列

digraph {
    node [shape=plaintext, fontcolor=black, fontsize=16];
    "" -> "Values:" [color=white];

    node [shape=record, fontcolor=black, fontsize=14, width=4.75, fixedsize=true];
    pointers [label="<f0> first_1 |<cur> cur_1| <f1> last_1 | <f2> first_2| <f5> cur_2| <f3> last_2", color=white];
    values [label="<f0> 1| <f1> 2| <f2> 5| <f3> 4 | <f4> 3 | <f5> 6 | <f6> 10"];

    { rank=same; ""; pointers }
    { rank=same; "Values:"; values }

    edge [color=blue];
    // pointers:f0 -> values:f2;
    pointers:f0 -> values:f0;
    pointers:f2 -> values:f4;
    pointers:f1 -> values:f3;
    pointers:f3 -> values:f6;
    pointers:cur -> values:f2;
    pointers:f5 -> values:f4;
}

可以注意到，根據剛剛的翻轉，中間已經呈現了一個降冪的關係，此時只需要重新反轉一次即可得到最後排序好的兩個 run

__reverse(cur_1, cur_2, size)

為避免單個 run 太長，導致 `merge` 速度變差

設計這個 tim_sort 的人有稍微考慮到，當兩個 run 的長度不一致，達到非常懸殊的等級的時候，假設兩個 run 分別為 A_run, (len(A_run) == 10) 以及 B_run, (len(B_run) == 100)，此時的 merge 會在兩個已經sort好的陣列中做出相當多不需要的操作，而當兩個 run 的長度差不多時，此時的 merge 才算比較有效率。
為了達到此效用，作者設計了一個 merge_rule 如下

static inline bool merge_rule(run_t *stack, size_t top)
{
    if (top < 1)
        return false;
    if (top < 2)
        return run_length(stack[top]) > run_length(stack[top - 1]);
    return run_length(stack[top]) > run_length(stack[top - 1]) ||
           run_length(stack[top]) + run_length(stack[top - 1]) >
               run_length(stack[top - 2]);
}

意即若當前這個 run 加上前一個 run 沒有比前兩個 run 大時，不會進入 merge 的片段

各答案如下： `AAAA` : `__reverse(cur_1, cur_2, size)` `BBBB` : `top - 1` `CCCC` : `top - 2` `DDDD` : `first + MIN_RUN * size` 我想這邊應該不需要太過解釋，就只是沒有超過時取最小 `EEEE` : `top--` 為了要合併 top 的內容 `FFFF` : `stack[top - 1].last = stack[top].last` 一個一個合併

不用抄寫答案，只要專注程式碼本身和你的理解。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

測驗三

`RB_LOG2_MAX_NODES`

#define RB_LOG2_MAX_MEM_BYTES (sizeof(void *) << 3)

/* clang-format off */
#define RB_LOG2_MAX_NODES            \
    (RB_LOG2_MAX_MEM_BYTES -         \
     (sizeof(void *) >= 8   ? 4      \
      : sizeof(void *) >= 4 ? 3   \
                            : 1) - 1)
/* clang-format on */

可以根據 rb_tree 的定義，樹高為 2 * log(n + 1) 以及不超出記憶體限制的規則得到以下結論。

`sum_tree`

沒必要將程式碼註解改為中文，直接解讀程式碼的作用，避免逐行解說，而該揣摩設計者的思維。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

static int sum_subtree(node_t *node)
{
    int result = 0;
    while (node) {
        node_t *pre;
        if (!rbtn_left_get(node_t, link, node))
            goto do_print;

        for (pre = rbtn_left_get(node_t, link, node);
             rbtn_right_get(node_t, link, pre) &&
             rbtn_right_get(node_t, link, pre) != node;
             pre = rbtn_right_get(node_t, link, pre)) {
        }
        /* 根據每一個點往右走*/
        if (!rbtn_right_get(node_t, link, pre)) {
            rbtn_right_get(node_t, link, pre) = node;
            node = rbtn_left_get(node_t, link, node);
            /* 右邊沒有找左邊 */
        } else {
            rbtn_right_get(node_t, link, pre) = NULL;

        do_print:
            result += node->value;
            printf("value: %d\n", node->value);
            node = rbtn_right_get(node_t, link, node);
            /* cache */
        }
    }
    return result;
}

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`	在筆記中貼入程式碼
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.

2023q1 Homework4 (quiz4)

開發環境

測驗一 rb.h

理解 rb.h 的實作

以下為 rb_tree 的刪除部分，由於太長，這邊只列出想法以及答案

測驗二

TimSort

__inplace_merge

為避免單個 run 太長，導致 merge 速度變差

測驗三

RB_LOG2_MAX_NODES

sum_tree

測驗一 `rb.h`

以下為 `rb_tree` 的刪除部分，由於太長，這邊只列出想法以及答案

`TimSort`

`__inplace_merge`

為避免單個 run 太長，導致 `merge` 速度變差

`RB_LOG2_MAX_NODES`

`sum_tree`