Try   HackMD

linux2023: ahbji

α
− 1 :ST-Tree 運作原理分析

ST-tree 主要參考 AVL tree 的平衡操作,并引入了 rbtree 的結構特點,相對於 AVL tree ,這一特點有助於對代碼運作原理的解讀。

ST-tree 的結構定義

tree 節點由 st_node 結構體定義,結合了 AVL tree 和 rbtree 的特點:

  • parent、left 、right 指針來自 rbtree
  • hint 來自 AVL tree ,根據注釋内容可知,hint 記錄的是一種近似值,不像 AVL tree 那樣需要準確記錄當前節點的 depth 。
struct st_node {
    short hint;
    struct st_node *parent;
    struct st_node *left, *right;
};

st_*(struct st_node *n) 系列巨集和函式定義了獲取當前 st_node 的親代、左右子代、前驅和後繼的指針

#define st_root(r) (r->root)
#define st_left(n) (n->left)
#define st_right(n) (n->right)
#define st_rparent(n) (st_right(n)->parent)
#define st_lparent(n) (st_left(n)->parent)
#define st_parent(n) (n->parent)

struct st_node *st_first(struct st_node *n)
{
    if (!st_left(n))
        return n;

    return st_first(st_left(n));
}

struct st_node *st_last(struct st_node *n)
{
    if (!st_right(n))
        return n;

    return st_last(st_right(n));
}

root 節點和 ST-tree 的插入操作

st_root 結構體可以用來找到 ST-tree 的 root 節點

struct st_root {
    struct st_node *root;
};

st_roottreeint_init 中被初始化,在 treeint_insert 處理第一個節點時將之作爲 root 節點引用到 st_root

struct treeint *treeint_insert(int a)
{
    struct st_node *p = NULL;
    enum st_dir d;
    for (struct st_node *n = st_root(tree); n;) {
        struct treeint *t = container_of(n, struct treeint, st_n);
        if (a == t->value)
            return t;

        p = n;

        if (a < t->value) {
            n = st_left(n);
            d = LEFT;
        } else if (a > t->value) {
            n = st_right(n);
            d = RIGHT;
        }
    }

    struct treeint *i = calloc(sizeof(struct treeint), 1);
    if (st_root(tree))
        // 其他節點
        st_insert(&st_root(tree), p, &i->st_n, d);
    else
        // root 節點
        st_root(tree) = &i->st_n;

    i->value = a;
    return i;
}

st_insert 負責將新節點 n 插入到 p 所在的位置,方向由 d 決定,插入后需要對 tree 進行更新

enum st_dir { LEFT, RIGHT };
/* The process of insertion is straightforward and follows the standard approach
 * used in any BST. After inserting a new node into the tree using conventional
 * BST insertion techniques, an update operation is invoked on the newly
 * inserted node.
 */
void st_insert(struct st_node **root,
               struct st_node *p,
               struct st_node *n,
               enum st_dir d)
{
    if (d == LEFT)
        st_left(p) = n;
    else
        st_right(p) = n;

    st_parent(n) = p;
    st_update(root, n);
}

更新處理方法涉及到當前節點的選裝和 hint 更新,然後遞歸更新 parent。
和 AVL tree 不同的是:

  • ST tree 是先旋轉再更新當前節點的 hint 。
  • 這裏通過判斷是否無法更新當前平衡因子來決定是否繼續遞歸走訪 parent。
  • 與之相對,AVL tree 是先更新 height 再旋轉,旋轉時會對當前節點及其 child 的 height 再做更新,另外,無論是左傾還是右傾處理,都有著更複雜的旋轉選擇。這為 ST tree 出現 worst case 埋下了伏筆。
static inline void st_update(struct st_node **root, struct st_node *n)
{
    if (!n)
        return;
    // 讀取平衡因子
    int b = st_balance(n);
    int prev_hint = n->hint;
    struct st_node *p = st_parent(n);

	// 處理右傾
    if (b < -1) {
        /* leaning to the right */
        if (n == *root)
            *root = st_right(n);
        st_rotate_right(n);
    }
	// 處理左傾
    else if (b > 1) {
        /* leaning to the left */
        if (n == *root)
            *root = st_left(n);
        st_rotate_left(n);
    }
    // 更新當前節點的平衡因子
    n->hint = st_max_hint(n);
    if (n->hint == 0 || n->hint != prev_hint)
        st_update(root, p);
}

static inline int st_balance(struct st_node *n)
{
    int l = 0, r = 0;

    if (st_left(n))
        l = st_left(n)->hint + 1;

    if (st_right(n))
        r = st_right(n)->hint + 1;

    return l - r;
}

static inline int st_max_hint(struct st_node *n)
{
    int l = 0, r = 0;

    if (st_left(n))
        l = st_left(n)->hint + 1;

    if (st_right(n))
        r = st_right(n)->hint + 1;

    return l > r ? l : r;
}

子樹的旋轉采用了 AVL tree 的處理方式,需要注意的是,在實作中,左傾時右旋,右傾時左旋:

/**
  * 左傾的處理方式是對 n 右旋
  *        5 <-n             3
  *      /   \             /   \
  *     3 <-l 8           1     5
  *    / \         =>   /      /  \
  *   1   4            0      4    8
  *  /
  * 0
  */
static inline void st_rotate_left(struct st_node *n)
{
    struct st_node *l = st_left(n), *p = st_parent(n);

    st_parent(l) = st_parent(n);
    st_left(n) = st_right(l);
    st_parent(n) = l;
    st_right(l) = n;

    if (p && st_left(p) == n)
        st_left(p) = l;
    else if (p)
        st_right(p) = l;

    if (st_left(n))
        st_lparent(n) = n;
}

/**
  * 右傾的處理方式是對 n 左旋
  *      1 <-n                   4
  *    /   \                   /   \
  *   0     4 <-r             1     5
  *       /   \        =>   /   \    \
  *      3     5           0     3    6
  *             \                 
  *              6                                                
  */
static inline void st_rotate_right(struct st_node *n)
{
    struct st_node *r = st_right(n), *p = st_parent(n);

    st_parent(r) = st_parent(n);
    st_right(n) = st_left(r);
    st_parent(n) = r;
    st_left(r) = n;

    if (p && st_left(p) == n)
        st_left(p) = r;
    else if (p)
        st_right(p) = r;

    if (st_right(n))
        st_rparent(n) = n;
}

遺留不解之處

如何迅速找到 worst case ?

ST-tree 的節點移除操作

使用繁體中文書寫,留意資訊科技詞彙翻譯

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

ST-tree 的節點移除操作的設計思路來自 AVL tree ,根據注釋内容得知其詳情:

  • The process of deletion in this tree structure is relatively more intricate
    這部分注釋介紹了刪除操作在這棵樹結構中相對複雜。刪除操作和其他 BST 中的刪除操作類似。當然要刪除一個節點時,會有不同情況需要考慮。
  • When removing a node, if the node to be deleted has a right child, the deletion process entails replacing the node to be removed with the first node encountered in the right subtree:會在新插入節點的右子樹上繼續更新操作
    如果要删除的節點有一個右子節點,刪除操作會將要刪除的節點替換爲後繼節點。這個替換過程涉及到節點的旋轉操作,以便在刪除后保持書的平衡。隨後,會在新插入節點的右子樹上繼續更新操作。
  • Similarly, if the node to be deleted does not have a right child, the replacement process involves utilizing the first node found in the left subtree
    類似地,如果要刪除的節點沒有右子節點,則會在左子樹中找到前驅節點。隨後会對新插入節點的左子樹做更新處理。
  • In scenarios where the node to be deleted has no children (neither left nor right), it can be directly removed from the tree, and an update operation is invoked on the parent node of the deleted node.:
    要刪除的節點沒有子節點(既没有左子節點也没有右子節點)時,可以直接從樹中刪除該節點,並在被刪除節點的親代節點上呼叫更新操作。
/* The process of deletion in this tree structure is relatively more intricate,
 * although it shares similarities with deletion methods employed in other BST.
 * When removing a node, if the node to be deleted has a right child, the
 * deletion process entails replacing the node to be removed with the first node
 * encountered in the right subtree. Following this replacement, an update
 * operation is invoked on the right child of the newly inserted node.
 *
 * Similarly, if the node to be deleted does not have a right child, the
 * replacement process involves utilizing the first node found in the left
 * subtree. Subsequently, an update operation is called on the left child of th
 * replacement node.
 *
 * In scenarios where the node to be deleted has no children (neither left nor
 * right), it can be directly removed from the tree, and an update operation is
 * invoked on the parent node of the deleted node.
 */
void st_remove(struct st_node **root, struct st_node *del)
{
    if (st_right(del)) {
        // 找到後繼節點 least
        struct st_node *least = st_first(st_right(del));
        if (del == *root)
            *root = least;

        // AAAA; 替換 least 到 del
        st_replace_right(del, least);
        // BBBB; 從 least->right 開始更新
        st_update(root, st_right(least));

        return;
    }

    if (st_left(del)) {
        // 找到前驅節點 most
        struct st_node *most = st_last(st_left(del));
        if (del == *root)
            *root = most;

        // CCCC; 替換 most 到 del
        st_replace_left(del, most);
        // DDDD; 從 most->left 開始更新子樹
        st_update(root, st_left(most));
        return;
    }

    /* remove root */
    if (del == *root) {
        *root = 0;
        return;
    }

    /* remove leaf*/
    struct st_node *parent = st_parent(del);

    if (st_left(parent) == del)
        st_left(parent) = 0;
    else
        st_right(parent) = 0;

    // EEEE; 在 parent 上做 update
    st_update(root, parent);
}

用前驅和後繼節點替換當前子樹的 root ,實作用的是 BST tree 的標準做法,詳情如注釋中的圖示:

/**
 * 將當前子樹的 root 替換爲後繼節點
 *           8                             9
 *     /          \                  /          \   
 *    4           12                4           12
 *  /  \        /    \     =>     /  \        /    \
 * 3    6     10     14          3    6     10     14
 *    /  \   /  \   /  \            /  \      \   /  \
 *   5   7  9   11 13  15          5   7      11 13  15
*/
static inline void st_replace_right(struct st_node *n, struct st_node *r)
{
    struct st_node *p = st_parent(n), *rp = st_parent(r);

    if (st_left(rp) == r) {
        st_left(rp) = st_right(r);
        if (st_right(r))
            st_rparent(r) = rp;
    }

    if (st_parent(rp) == n)
        st_parent(rp) = r;

    st_parent(r) = p;
    st_left(r) = st_left(n);

    if (st_right(n) != r) {
        st_right(r) = st_right(n);
        st_rparent(n) = r;
    }

    if (p && st_left(p) == n)
        st_left(p) = r;
    else if (p)
        st_right(p) = r;

    if (st_left(n))
        st_lparent(n) = r;
}
/**
 * 將當前子樹的 root 替換爲前驅節點
 *           8                             7
 *     /          \                  /          \   
 *    4           12                4           12
 *  /  \        /    \     =>     /  \        /    \
 * 3    6     10     14          3    6     10     14
 *    /  \   /  \   /  \            /      /  \   /  \
 *   5   7  9   11 13  15          5      9   11 13  15
*/
static inline void st_replace_left(struct st_node *n, struct st_node *l)
{
    struct st_node *p = st_parent(n), *lp = st_parent(l);

    if (st_right(lp) == l) {
        st_right(lp) = st_left(l);
        if (st_left(l))
            st_lparent(l) = lp;
    }

    if (st_parent(lp) == n)
        st_parent(lp) = l;

    st_parent(l) = p;
    st_right(l) = st_right(n);

    if (st_left(n) != l) {
        st_left(l) = st_left(n);
        st_lparent(n) = l;
    }

    if (p && st_left(p) == n)
        st_left(p) = l;
    else if (p)
        st_right(p) = l;

    if (st_right(n))
        st_rparent(n) = l;
}

需要注意的是,因爲 st_update 的缺陷,刪除時也會因爲 hint 更新不及時導致出現 worst case 的情況。

α
− 2 :指出上述程式碼可改進之處,特別跟 AVL tree 和 red-black tree 相比,並予以實作

關於和 AVL tree 和 rbtree 的對比,以及對於 ST tree 的 worst case 論述,fourcolor 的 linux 2023 Quiz01 作業做了很好的總結。

改進思路是全面使用 AVL tree 的方式來做旋轉和 hint 更新。因爲時間關係,實作尚未完成。

β
- 1 align_up 運作原理分析

題目要求,align_up 函式需要返回對齊到 alignment 的地址。

根據題目中樣例代碼中的注釋和 alignment & (alignment - 1) 的理解,我需要用加、減運算和 bitwise 操作實作儅 alignment 為 2 的冪時 sz 對齊到 alignment 的地址。

總結樣例輸出,可知:

  • 儅 sz 能被 alignment 整除時,直接返回當前 sz
  • 儅 sz 不能被 alignment 整除時,返回 sz - (sz % alignment) + alignment

因爲不能使用條件判斷,所以只能推導出公式 (sz + mask) & ~mask來解決,其中 mask = alignment - 1

公式的推導有些運氣成分:我只是寫下這些内容,觀察了幾個小時,用排除法找到了這個公式

(120 with 4) & 11111100 = 120
(121 with 4) & 11111100 = 124
(122 with 4) & 11111100 = 124
(123 with 4) & 11111100 = 124
(124 with 4) & 11111100 = 124
(125 with 4) & 11111100 = 128

代碼如下:

#include <stdio.h>
#include <stdint.h>

static inline uintptr_t align_up(uintptr_t sz, size_t alignment)
{
    uintptr_t mask = alignment - 1;
    if ((alignment & mask) == 0) {  /* power of two? */
        // return MMMM;
        return (sz + mask) & ~mask;
    }
    return (((sz + mask) / alignment) * alignment);
}

int main(int argc, char const *argv[]) {
    printf("%ld\n", align_up(120, 4));
    printf("%ld\n", align_up(121, 4));
    printf("%ld\n", align_up(122, 4));
    printf("%ld\n", align_up(123, 4));
    printf("%ld\n", align_up(124, 4));
    printf("%ld\n", align_up(125, 4));
}

關於數學推導過程,StevenChen8759 的作業太精彩了

β
- 2 align_up 在 Linux 核心原始程式碼找出類似 align_up 的程式碼,並舉例說明其用法

内核中明確使用這種方式處理地址對其的是 Network drive 文檔 qlge.rst ,其中 Dump kernel data structures in drgn 章節提到了相同的地址對其方式:

def align(x, a):
    """the alignment a should be a power of 2
    """
    mask = a - 1
    return (x+ mask) & ~mask

def struct_size(struct_type):
    struct_str = "struct {}".format(struct_type)
    return sizeof(Object(prog, struct_str, address=0x0))

def netdev_priv(netdevice):
    NETDEV_ALIGN = 32
    return netdevice.value_() + align(struct_size("net_device"), NETDEV_ALIGN)

name = 'xxx'
qlge_device = None
netdevices = prog['init_net'].dev_base_head.address_of_()
for netdevice in list_for_each_entry("struct net_device", netdevices, "dev_list"):
    if netdevice.name.string_().decode('ascii') == name:
        print(netdevice.name)

ql_adapter = Object(prog, "struct ql_adapter", address=netdev_priv(qlge_device))

這種方式也可以處理符号位,避免將值設置爲負數,例如 powerpc kernel 的 vecemu.c

/* Round to floating integer, towards +/- Inf */
static unsigned int rfii(unsigned int x)
{
	int exp, mask;

	exp = ((x >> 23) & 0xff) - 127;
	if (exp == 128 && (x & 0x7fffff) != 0)
		return x | 0x400000;	/* NaN -> make it a QNaN */
	if (exp >= 23)
		return x;		/* it's an integer already (or Inf) */
	if ((x & 0x7fffffff) == 0)
		return x;		/* +/-0 -> +/-0 */
	if (exp < 0)
		/* 0 < |x| < 1.0 rounds to +/- 1.0 */
		return (x & 0x80000000) | 0x3f800000;
	mask = 0x7fffff >> exp;
	/* mantissa overflows into exponent - that's OK,
	   it can't overflow into the sign bit */
	return (x + mask) & ~mask;
}

γ
- 1 qsort-mt 運作原理分析

qsort-mt 使用的是基於 fork-join 模型的多執行緒快速排序方法:

  • main 執行緒負責初始化 thread pool ,並從 pool 中取出第一個 qsort 執行緒對指定序列排序
  • 第一個 qsort 執行緒找到 pivot 並分割好子序列后,則在 pool 中取出處於 idle 狀態 qsort 執行緒,以遞歸的方式對子序列進行快速排序。
  • 所有子序列都排序完成後,則在第一個 qsort 執行緒中將所有 qsort 執行緒設置為 terminate 狀態,然後返回到 main 執行緒打印程序運行時間以及分別在 user space 和 kernel space 使用的 cpu 時間數。

遺留問題

爲何在 do-while 回圈中處理 pthread_xxx call 錯誤?

#define verify(x)                                                      \
    do {                                                               \
        int e;                                                         \
        if ((e = x) != 0) {                                            \
            fprintf(stderr, "%s(%d) %s: %s\n", __FILE__, __LINE__, #x, \
                    strerror(e));                                      \
            exit(1);                                                   \
        }                                                              \
    } while (0)

γ
- 2 以 Thread Sanitizer 找出上述程式碼的 data race 並著手修正

使用 TSan (ThreadSanitizer) 編譯,可在執行時檢查是否有 data race 發生。

$ gcc -std=gnu11 -Wall -g -fsanitize=thread -o qsort_mt qsort_mt.c -lpthread

目前 Ubuntu 只在 22.04 以上版本支持 Tsan,低於 22.04 版本的 Ubuntu 在使用 TSan 編譯會報錯 cannot open libtsan_preinit.o: No such file or directory

$./qsort-mt -tv -f2 -h2 -n100
==================
WARNING: ThreadSanitizer: data race (pid=24034)
  Write of size 4 at 0x7b4000000080 by thread T1 (mutexes: write M0):
    #0 allocate_thread /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:211 (qsort-mt+0x3476)
    #1 qsort_algo /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:314 (qsort-mt+0x4193)
    #2 qsort_thread /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:352 (qsort-mt+0x45b1)

  Previous read of size 4 at 0x7b4000000080 by thread T2 (mutexes: write M2):
    #0 qsort_thread /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:343 (qsort-mt+0x44c7)

  Location is heap block of size 256 at 0x7b4000000000 allocated by main thread:
    #0 calloc ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:672 (libtsan.so.0+0x31edc)
    #1 qsort_mt /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:132 (qsort-mt+0x28fa)
    #2 main /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:505 (qsort-mt+0x500e)

  Mutex M0 (0x7fffffffd988) created at:
    #0 pthread_mutex_init ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1227 (libtsan.so.0+0x4bee1)
    #1 qsort_mt /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:130 (qsort-mt+0x28dd)
    #2 main /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:505 (qsort-mt+0x500e)

  Mutex M2 (0x7b40000000a8) created at:
    #0 pthread_mutex_init ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1227 (libtsan.so.0+0x4bee1)
    #1 qsort_mt /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:136 (qsort-mt+0x297f)
    #2 main /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:505 (qsort-mt+0x500e)

  Thread T1 (tid=24039, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x605b8)
    #1 qsort_mt /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:144 (qsort-mt+0x2a8e)
    #2 main /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:505 (qsort-mt+0x500e)

  Thread T2 (tid=24040, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x605b8)
    #1 qsort_mt /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:144 (qsort-mt+0x2a8e)
    #2 main /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:505 (qsort-mt+0x500e)

SUMMARY: ThreadSanitizer: data race /mnt/c/Users/andre/Desktop/LinuxKernelInternals/linux2023-summer/Quiz01/01_03/qsort/qsort-mt.c:220 in allocate_thread
==================
0.103 0.0895 0.0199
ThreadSanitizer: reported 1 warnings

分析上述 log ,可以得出以下信息:

  • Thread T1 和 T2 分別是由 main Thread 創建的 qsort 執行緒
  • T1 需要從 pool 中取出 T2 對子序列執行 qsort,因而修改 T2 的 st 狀態
    ​​​​static struct qsort *allocate_thread(struct common *c) ​​​​{ ​​​​ verify(pthread_mutex_lock(&c->mtx_al)); ​​​​ for (int i = 0; i < c->nthreads; i++) ​​​​ if (c->pool[i].st == ts_idle) { ​​​​ c->idlethreads--; ​​​​ c->pool[i].st = ts_work; ​​​​ verify(pthread_mutex_lock(&c->pool[i].mtx_st)); ​​​​ verify(pthread_mutex_unlock(&c->mtx_al)); ​​​​ return (&c->pool[i]); ​​​​ } ​​​​ verify(pthread_mutex_unlock(&c->mtx_al)); ​​​​ return (NULL); ​​​​}
    然而此時,T2 正在 mtx_st critical section 中輪詢其 st 狀態
    ​​​​verify(pthread_mutex_lock(&qs->mtx_st)); ​​​​while (qs->st == ts_idle) ​​​​ verify(pthread_cond_wait(&qs->cond_st, &qs->mtx_st)); ​​​​verify(pthread_mutex_unlock(&qs->mtx_st));

可以看出,在 allocate_thread 函式中修改 pool[i].st 時并未在 mtx_st critical section 中,這是導致 data race 的原因。

修正很簡單,只需要在 mtx_st crtical section 中修改 pool[i].st 狀態便得到解決:

--- a/Quiz01/qsort_mt.c
+++ b/Quiz01/qsort_mt.c
@@ -208,8 +208,8 @@ static struct qsort *allocate_thread(struct common *c)
     for (int i = 0; i < c->nthreads; i++)
         if (c->pool[i].st == ts_idle) {
             c->idlethreads--;
-            c->pool[i].st = ts_work;
             verify(pthread_mutex_lock(&c->pool[i].mtx_st));
+            c->pool[i].st = ts_work;
             verify(pthread_mutex_unlock(&c->mtx_al));
             return (&c->pool[i]);
         }

繼續探討現有實作的效能疑慮,為何平行程度不高?

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv