Linux 紅黑樹探討

--- tags: Linux Kernel --- # Linux 紅黑樹探討 contributed by < [steven1lung](https://github.com/steven1lung) > [Documentation](https://www.kernel.org/doc/Documentation/rbtree.txt) 在 Linux 核心中，有許多地方都有使用到紅黑樹。但是我們可能會想說為什麼要使用紅黑樹，而不是其他的二元樹像是：AVL Tree 之類。雖然二者平均時間複雜度都是 $O(\log n)$，但因行為和特性的落差，有不同的應用場景。以下簡述 AVL Tree 和紅黑樹的差異: * 平衡 - AVL Tree 比紅黑樹還要平衡（左右子樹的高度不能差超過 2），但是會在平衡自己的時候花費比紅黑樹多的時間。 - 如果考慮到要更快速地去 search 一個節點，那 AVL tree 會比較適合。 - 紅黑樹的優點在於雖然沒有到完全平衡（最長路徑 $\le$ 2 倍最短路徑），但是紅黑樹會在平衡自己時達到 $O(1)$ （最多 3 個 rotation）的複雜度。 * 空間 - AVL 的節點會需要宣告 factor (height) 的變數給每個節點來作為平衡的參考，而紅黑樹只需要 1 位元的變數來區分顏色（紅、黑）。 > 下方介紹 Linux 如何運用 alignment 將這個額外的 1 位元也省略總而言之我們可以分兩種場景來選擇要使用的樹： 1. 插入跟刪除較多：紅黑樹 2. 查詢節點較多：AVL Tree 在 Linux Kernel 中有許多地方都有使用到紅黑樹，例如：`hr_timer` 使用紅黑樹來記錄 timer requests、`ext3` 檔案系統使用紅黑樹來追蹤目錄、VMA（Virtual Memory Area）也是用紅黑樹來紀錄。而在 CFS 中，需要頻繁地插入跟刪除節點（任務），所以我覺得這是當初開發者會選擇使用紅黑樹的考量。 ## Linux 紅黑樹介紹 Linux 紅黑數跟一般紅黑樹不一樣得地方在有對速度進行優化，少了一層的 indirection 並且有更好的 cache locality。並且使用的方法是將 `rb_node` 嵌入要使用的結構中，透過 `container_of` 存取 data，而不是在 `rb_node` 裡宣告 data 指標。 **Linux 紅黑樹需要使用者去定義 tree search 跟 insert 函式。** ### 紅黑樹節點 ```c struct rb_node { unsigned long __rb_parent_color; struct rb_node *rb_right; struct rb_node *rb_left; } __attribute__((aligned(sizeof(long)))); ``` > 看了 `rb_node` 跟 `list_head` 之後，發現 Linux 風格的資料結構都不會包含 data，而是會將 `rb_node` 或是 `list_head` 這些資料結構包在一個更大的結構體裡面。一個紅黑樹節點包含：親代節點、左子節點、右子節點、顏色。而在 Linux 宣告的程式碼中，很巧妙地將親代節點跟顏色一起宣告，使用 `unsigned long __rb_parent_color` 將指向親代節點的指標跟自身的顏色合併。透過 `__attribute__((aligned(sizeof(long))))` 讓編譯器知道 `struct rb_node` 會對齊 `sizeof(long)`，這樣就會讓指標最低 2 個位元是沒有使用到的，就可以把顏色的資料放進去其中一個位元中。舉例來說，一個節點指標如果指向 `0xF724315C` ，這個位址轉成二進制會變成 `...1100`，最低 2 個位元會是 `0`，Linux 核心開發者利用這特性，將其中一個沒用到的位元拿來標注紅和黑這二種顏色。 ### container_of Linux Kernel 風格資料結構都會用一個大的 `struct` 包住我們使用的資料結構（`list_head`、`rb_node`），再透過巨集 `container_of` 從節點去存取外面的 container（也就是包住節點的結構體）。 ```c /** * container_of - cast a member of a structure out to the containing structure * @ptr: the pointer to the member. * @type: the type of the container struct this is embedded in. * @member: the name of the member within the struct. * */ #define container_of(ptr, type, member) ({ \ void *__mptr = (void *)(ptr); \ BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \ !__same_type(*(ptr), void), \ "pointer type mismatch in container_of()"); \ ((type *)(__mptr - offsetof(type, member))); }) ``` ```c #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER) ``` `container_of` 的實作就是先將位址 `0` 轉成結構體指標，再取得 member（節點）的位址。因為是從 `0` 開始，所以就會得出這個 member 在結構體裡的偏移量。最後再將原本 member 記憶體位址扣掉這個偏移量，就可以得到 container struct 的位址了！ #### `container_of` or `rb_entry` 關於什麼時候要使用 `container_of` 還是 `rb_entry`（或是其他的 `XXXX_entry`，依照你正在使用的資料結構），作者的說法是當你是要存取節點的 container structure 時，會使用 `container_of`。如果要存取節點 container structure 裡的 member，就會使用 `XXXX_entry`。雖然 `XXXX_entry` 是一個巨集，且定義跟 `container_of` 一樣，透過這樣來區分是要存取大的 container 還是裡頭的 member 我覺得對其他人閱讀你的程式碼時會更好被理解、或是自己在看時更容易懂當初的寫法，是一個小但是有幫助的技巧。 ### rb_link_node 我們先看一個最常被使用到的函式： ```c static inline void rb_link_node(struct rb_node *node, struct rb_node *parent, struct rb_node **rb_link) { node->__rb_parent_color = (unsigned long)parent; node->rb_left = node->rb_right = NULL; *rb_link = node; } ``` 這個函式就是把 `node` 的親代節點設為 `parent`，並且將 `rb_link` 指向 `node`。放入 `parent` 的左或是右子樹則是依照 `rb_link`。 ### rb_insert_color ```c void rb_insert_color(struct rb_node *node, struct rb_root *root) { __rb_insert(node, root, dummy_rotate); } ``` `__rb_insert` 可以到 [lib/rbtree.c](https://elixir.bootlin.com/linux/v5.10/source/lib/rbtree.c#L85) 看，程式碼太長就不放上來了。`__rb_insert` 做的事情就是插入節點並且進行平衡（rotate）。 ### 親代節點當要存取親代節點時，可以透過巨集來存取： ```c #define rb_parent(r) ((struct rb_node *)((r)->__rb_parent_color & ~3)) ``` 這個巨集會捨棄最右邊的兩個位元，並且將結果轉成節點指標，就可以得到親代節點的位址了。 ### 顏色 ```c= #define RB_RED 0 #define RB_BLACK 1 #define __rb_color(pc) ((pc) & 1) #define __rb_is_black(pc) __rb_color(pc) #define __rb_is_red(pc) (!__rb_color(pc)) #define rb_color(rb) __rb_color((rb)->__rb_parent_color) #define rb_is_red(rb) __rb_is_red((rb)->__rb_parent_color) #define rb_is_black(rb) __rb_is_black((rb)->__rb_parent_color) ``` 在 `rbtree_augmented` 中定義了紅色為 0，黑色為 1。由此可知，如果一個節點是紅色的，那就可以直接存取 `__rb_parent_color`，但還是透過巨集方式統一比較好。第七行的 `rb_color(rb)` 會存取這個節點的親代節點 `__rb_parent_color` 的顏色，因為知道黑色是 1，所以透過 `(pc) & 1` 就可以知道顏色了。我們先看設定親代節點所使用的函式： ```c static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p) { rb->__rb_parent_color = rb_color(rb) | (unsigned long)p; } ``` 這個函式所做的事情就是將 `p` 設定為 `rb` 的親代節點，`rb_color(rb)` 是 `p` 的顏色跟 `p` 的位址去做 `|` 就一起設定好親代節點的位址跟顏色了。也有可以針對親代節點去設定顏色的函式： ```c static inline void rb_set_parent_color(struct rb_node *rb, struct rb_node *p, int color) { rb->__rb_parent_color = (unsigned long)p | color; } ``` ## 簡單的紅黑樹例子 ### 決定架構首先要定義好紅黑樹節點外的 container 結構： ```c struct mytype { struct rb_node node; char *keystring; }; ``` 我們就可以透過 `container_of(ptr, type, member)` 從 `rb_node` 存取 `struct mytype`。 ### Search 實作接著就需要定義 search 的函式，想法也很直覺：比對現在節點的 `keystring` 如果小於就往左子節點找，大於就往右子節點找。範例如下： ```c struct mytype *my_search(struct rb_root *root, char *string) { struct rb_node *node = root->rb_node; while (node) { struct mytype *data = container_of(node, struct mytype, node); int result; result = strcmp(string, data->keystring); if (result < 0) node = node->rb_left; else if (result > 0) node = node->rb_right; else return data; } return NULL; } ``` ### Insert 實作插入的做法就需要先使用 `search` 找到要插入的位置，新增節點，再進行平衡跟重新著色。 ```c= int my_insert(struct rb_root *root, struct mytype *data) { struct rb_node **new = &(root->rb_node), *parent = NULL; /* Figure out where to put new node */ while (*new) { struct mytype *this = container_of(*new, struct mytype, node); int result = strcmp(data->keystring, this->keystring); parent = *new; if (result < 0) new = &((*new)->rb_left); else if (result > 0) new = &((*new)->rb_right); else return FALSE; } /* Add new node and rebalance tree. */ rb_link_node(&data->node, parent, new); rb_insert_color(&data->node, root); return TRUE; } ``` 第 6 到 17 行是找出要插入的位置，作法是用一個指向**節點位址**的指標（指標的指標）來儲存新增的節點要在的位址。第 20 行就是將找到的 `*new` 位址插入一個新的節點，將 `*new` 位址放入節點並連接 `parent`。 > 從 11 或是 14 可以知道 `new` 是左、右子節點指標的指標，可以透過 `*new` 存取左、右子節點 ### 迭代紅黑樹里的節點 ```c struct rb_node *rb_first(struct rb_root *tree); struct rb_node *rb_last(struct rb_root *tree); struct rb_node *rb_next(struct rb_node *node); struct rb_node *rb_prev(struct rb_node *node); ``` 要迭代過每個節點的方式是透過先呼叫 `rb_first` 或是 `rb_last`，就會回傳一個紅黑樹節點的指標。之後再用這個指標去呼叫 `rb_next` 或是 `rb_prev` 就可以存取下一個或是上一個節點了。 > 回傳 NULL 就代表沒有下一個了將所有節點的 `keystring` 輸出的範例（由小到大）： ```c struct rb_node *node; for (node = rb_first(&mytree); node; node = rb_next(node)) printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring); ``` ## Source code [linux/rbtree.h](https://elixir.bootlin.com/linux/v5.10/source/include/linux/rbtree.h) [linux/rbtree_augmented.h](https://elixir.bootlin.com/linux/v5.10/source/include/linux/rbtree_augmented.h#L146) [lib/rbtree.c](https://elixir.bootlin.com/linux/v5.10/source/lib/rbtree.c)

Read more

Git 基礎教學

提交第一份 Patch 到 Linux Kernel

2022q1 Homework6 (ktcp)

在乙太鍊上實作 NFT 鑄造網站