2021q1 Homework1 (quiz1)

contributed by < ccs100203 >

tags: `linux2021`

第 1 週測驗題

測驗 1

考慮一個單向 linked list，其結構定義為:

typedef struct __node {                   
    int value;
    struct __node *next;
} node_t;

對該單向 linked list 進行 Quick sort，由小排序到大。





































static inline void list_add_node_t(node_t **list, node_t *node_t) {
    node_t->next = *list;
    *list = node_t;
}

static inline void list_concat(node_t **left, node_t *right) {
    while (*left)
        left = &((*left)->next);
    *left = right;
}

void quicksort(node_t **list)
{
    if (!*list)
        return;

    node_t *pivot = *list;
    int value = pivot->value;
    node_t *p = pivot->next;
    pivot->next = NULL;

    node_t *left = NULL, *right = NULL;
    while (p) {
        node_t *n = p;
        p = p->next;
        list_add_node_t(n->value > value ? &right : &left, n);
    }

    quicksort(&left);
    quicksort(&right);

    node_t *result = NULL;
    list_concat(&result, left);
    list_concat(&result, pivot); 
    list_concat(&result, right);
    *list = result;
}

Quicksort 原理

基於 Divide and Conquer 的方式進行，在每次的迭代中選定一個 pivot，將其餘數字與 pivot 比較，比 pivot 小的會放入 pivot 左邊，反之則在右邊。而兩段數列會進入下一次的迭代，一直到最後就能得到排序好的數列。

程式碼解析

quicksort

每一次選出 pivot 的動作，這裡是將 list 的 head 作為 pivot。
這邊將 pivot 與其餘節點斷開，用 p 來保存後續的 list。




node_t *pivot = *list;
int value = pivot->value;
node_t *p = pivot->next;
pivot->next = NULL;

執行過後

利用接下來的 while 迴圈，分別將比較小的節點放入 left，大的放入 right，如此我們可以得到一個 pivot 以及兩條 list。由於 list_add_node_t 的特性，節點會新增在 list 的最前方，下面會說明。

再來以 recursion 的形式讓 left 與 right 進入下一次迭代，一直到切出所有 pivot。
最後的部份就是利用 list_concat 接上所有已排序好的節點。

假設 recursion 已經回到最前面，要將 1 2 3 4 全部接上之前

在全部 concatenate 完成之後就會變成

list_add_node_t

static inline void list_add_node_t(node_t **list, node_t *node_t) {
    node_t->next = *list;
    *list = node_t;
}

此函式的作用為將一個 node 加入 list 的前方。

假設進入函式的狀況為下圖

經由 node_t->next = *list;

在通過 *list = node_t; 轉移 list 指向的位置，即可完成新增

list_concat

static inline void list_concat(node_t **left, node_t *right) {
    while (*left)
        left = &((*left)->next);
    *left = right;
}

此函式的作用在於串接兩條 list。會先藉由一個 while 迴圈找到 *left 尾端的 NULL，將 right 放在其位置，就可以串接兩條 list，若 *left 一開始就是 NULL，則 right 會直接取代他。

假設一開始進入函式的情況為下圖

while 迴圈過後

將 right 放進 *left (也就是 tail->next) 的位置，就可以完成串接

Random 修正

根據 random(3)

The random() function uses a nonlinear additive feedback random number generator employing a default table of size 31 long integers to return successive pseudo-random numbers in the range from 0 to RAND_MAX. The period of this random number generator is very large, approximately 16 * ((2^31) - 1).
又根據 Linear congruential generator
可以知道 random(3) 是透過此方式實做的，而 LCG 的公式為:

$X_{n + 1} = (a X_{n} + c) \mod m$
C 所使用的參數為下表

modulus m multiplier a increment c output bits of seed in rand()

$2^{31}$ 1103515245 12345 bits 30..0

所以程式執行結果每次都相同的原因是因為每次都是從相同位置開始計算。所以最直觀的方式就是改變 seed 讓其計算的行為改變。所以可以在這邊引入 srand() 與 time
這可以將時間當作 random 的 seed，用來改變其亂數的規律。
```
srand(time(NULL));
```
但這會遇到一個問題，短時間內快速重複執行還是會得到相同的亂數，所以要設法解決這個問題。

modulus m	multiplier a	increment c	output bits of seed in rand()
$2^{31}$	1103515245	12345	bits 30..0

後來找到 linux 中使用的 getrandom(2)，會藉由 urandom 的 entropy 去產生亂數，使用這個方法解決了上面遇到的問題。

unsigned int buf[1];
while (count--) {
    getrandom(buf, sizeof(int), 0);
    list = list_make_node_t(list, *buf % 1024);
}

Non-recursive Quicksort

參考 Optimized QuickSort

文章提供的方法解析

內文所提到的方法是以 swap 為主體，利用 L 與 R 去紀錄需交換的數量，再用 beg[] 與 end[] 作為 stack，用來紀錄比較的範圍。

假定這是一開始的 array。會拿 head 當作 pivot，並且 L 會從左邊開始掃，紀錄有多少值小於 pivot，R 從右邊開始，紀錄有多少值大於 pivot。(R 一開始會定義為陣列的 length，以減法的方式紀錄)
這裡的 L 就會是 3，而 R 就會是 5

於是他會把 array[4] 放到 array[0]，array[3] 放到 array[4]，再把 pivot 放入 array[3]

再來就是利用 beg 與 end 作為 stack 去儲存尚未排列好的序列，由於 pivot 目前在 array[3] 的位置，所以就會有一組 beg 跟 end 為 0~2、一組為 4~5。
之後會重複上述的行為，一次會以 5 為 pivot，另一次則會以 2 為 pivot，就可以得到排序好的 array。

自行改寫的 non-recursion 版本

由於原先給的程式已經能有效的劃分出:

小於等於 pivot 的 left
pivot
大於 pivot 的 right

所以要做的就是如何進行後續第二層的 partition，以及要如何把 list 串接回來，這邊引入上述的 stack，得以儲存迭代進行的狀態。

stack 會用來儲存每次做完的 left 與 right，這可以用來取代原先需要遞迴的情況

由於 stack 的特性與存放順序的關聯，運作中的 stack 會如同下圖

|              |                        |              |
|              |                        |    right_3   |
|              |                        |______________|
|    right_2   |                        |    left_3    |
|______________|                        |______________|
|    left_2    |        ------>         |    left_2    |
|______________|                        |______________|
|    left_1    |                        |    left_1    |
|______________|                        |______________|
|    left_0    |                        |    left_0    |
|______________|                        |______________|

通過這張圖可以看出，會先將 right 的部份優先做完，再去做 left 的部份，如同 DFS。那麼我就可以很直覺的在 node 走到 leaf 時 (也就是沒有連接下一個節點)，開始將 node 放入 result，要注意到的是每次都要放在 result 的最前方，因為程式會從最大的值開始放，而我們需要將 list 從小排到大。

因為 pivot 加入 result 的時間點變了，所以必須要想辦法儲存，否則就會從 list 中消失
- 我的作法是利用 list_concat 將 pivot 接在 left 的最後面，將其保留下來。
  但這個作法如果在 list 內有兩個相同的數字，會遇到一個嚴重的問題，程式會進入無窮迴圈無法結束。
- 在上面第 2 點可以看到，我是在節點是 leaf 的情形，也就是沒有銜接後面的節點時才會將他放入 result，但因為原先的作法會將與 pivot 一樣大的節點放入 left，所以最後就會產生 left 裡面永遠有兩個相同大小的
  $X$ ，程式就無法結束。
- 而解決辦法就是，把與 pivot 一樣大的另一個
  $X$ 放入 right，就能夠將兩個
  $X$ 有效的分開。所以將 > 改成 >= 很重要。
```
list_add_node_t(n->value >= value ? &right : &left, n);
```

以下是完整實作:

#define MAX_LEVELS 600
void quicksort_non_recursion(node_t **list) {
    if (!*list || !((*list)->next))
        return;

    int sp = 0;
    node_t *stack[MAX_LEVELS] = {0};
    node_t *result = NULL;

    stack[sp] = *list;
    while (sp >= 0) {
        node_t *pivot = stack[sp--];
        // If `pivot` doesn't have `next`, add it into the head of `result`.
        if (pivot->next) {
            int value = pivot->value;
            node_t *p = pivot->next;
            pivot->next = NULL;
            node_t *left = NULL, *right = NULL;
            while (p) {
                node_t *n = p;
                p = p->next;
                // It is important to use `large than` to prevent the program from falling into an infinite loop.
                list_add_node_t(n->value >= value ? &right : &left, n);
            }
            // prevent `pivot` vanish
            list_concat(&left, pivot);
            if (left)
                stack[++sp] = left;
            if (right)
                stack[++sp] = right;
        } else {
            list_add_node_t(&result, pivot);
        }
    }
    *list = result;
}

TODO: 程式碼的 node_t *stack[MAX_LEVELS] 可換為類似 C++ STL vector 的實作，例如用 C 語言開發的 Vector，基本想法是:

建立 vector 物件時，指定某個容量，使其恰好可應付大部分排序的需求
倘若在壓力測試或效能分析的情境，可運用 Vector 去動態延伸所需的記憶體空間

這樣的話，可兼顧空間和執行效率。

另外，quick sort 本身尚可改進，Introsort 提出混合演算法，在資料量少的時候 (如

n < 20

)，排序演算法改用插入排序，否則，使用快速排序。參考資訊:

:notes: jserv

`MAX_LEVELS` 開銷

引用 Optimized QuickSort — C Implementation (Non-Recursive) 作者的說法

Since
$2^{300}$ is greater than
$10^{90}$ , and there are only about
$10^{80}$ , fundamental particles in this universe from which a human-made computer can be built, no larger limit will ever be needed.

他所設立的

2^{300}

已經大於人造電腦可以建立出來的資料量，所以不會有觸碰到極限的問題。

我目前尚未能驗證他的說詞

也因為我是用一個 stack 去取代原先作者所使用的兩個 stack，所以我程式碼的 Max_LEVELS 在他的說法裡應設為 600。

引入類 qsort 函式風格的 `compar`

透過 man qsort 得知其屬於 stdlib.h

TODO

將 stack 改為類 vector 實作

引用 vector

將 stack 替代為下方的版本，在需要記憶體時執行 push, 並且使用完後就會 erase。

void quicksort_non_recursion(node_t **list) {
    if (!*list || !((*list)->next))
        return;

    int sp = 0;
    node_t *result = NULL;

    vector(node_t *) vec_stack;
    vector_inititialize(&vec_stack);

    vector_push_back(&vec_stack, *list);
    while (sp >= 0) {
        node_t *pivot = vector_at(&vec_stack, sp--);
        vector_erase_back(&vec_stack);
        // If `pivot` doesn't have `next`, add it into the head of `result`.
        if (pivot->next) {
            int value = pivot->value;
            node_t *p = pivot->next;
            pivot->next = NULL;
            node_t *left = NULL, *right = NULL;
            while (p) {
                node_t *n = p;
                p = p->next;
                // It is important to use `large than` to prevent the program from falling into an infinite loop.
                list_add_node_t(n->value >= value ? &right : &left, n);
            }
            // prevent `pivot` vanish
            list_concat(&left, pivot);
            if (left) {
                vector_push_back(&vec_stack, left);
                sp++;
            }
            if (right) {
                vector_push_back(&vec_stack, right);
                sp++;
            }
        } else {
            list_add_node_t(&result, pivot);
        }
    }
    *list = result;
}

引入 Introsort

參考 IntroSort，顯示 C++ STL 的 sort 是基於 quicksort，並由 heapsort 和 insertion sort 組成。

加入 heapsort 的原因為，當 quicksort 迭代了
$2 * l o g (n)$ 次之後還沒得到解，那就有高機率會得到
$O (n^{2})$ 的時間複雜度。此時切換為 heapsort 能將其時間複雜度控制在
$O (n l o g n)$ 。

The reason behind this “fallback” is that if Quicksort was not able to get the solution after
$2 * l o g (n)$ recursions for a branch, probably it hit its worst case and it is degrading to complexity
$O (n^{2})$ .
加入 insertion sort 是為了優化此演算法，而作者表示這是經由經驗法則所得出的結論

The number 20 is an empiric number obtained observing the behaviour of InsertionSort with lists of this size.

看懂 introsort 後，在 quiz2 可挑戰實作 Timsort，說不定還可改進 Linux 核心的 list_sort.c
:notes: jserv

下面是以遞迴版本的 quicksort 結合 insertion sort 所形成的 introsort

非遞迴版本的目前想不到方法把答案串起來，所以就變成實作遞迴版本的。
而 heapsort 還未實作。

2021/03/06
正在思考如何以最小的成本去實作 heapsort，單純的 linked list 效率很差，也不可能在不改動結構的情況下轉換為 Tree，目前想以新增一個 tree node struct 的方式進行。

2021/03/07
在想是不是改成混用 tree sort 會比較好，兩難。

void insertionsort(node_t **list) {
    if (!*list || !((*list)->next))
        return;
    
    node_t *result = *list;
    node_t *remain = result->next;
    result->next = NULL;
    
    while (remain) {
        node_t *head = remain;
        remain = head->next;
        node_t **cursor = &result;

        while (*cursor && head->value > (*cursor)->value) {
            cursor = &(*cursor)->next;
        }
        
        head->next = *cursor;
        *cursor = head;
    }
    *list = result;
}

void introsort(node_t **list) {
    if (!*list || !((*list)->next))
        return;

    node_t *pivot = *list;
    int value = pivot->value;
    node_t *p = pivot->next;
    pivot->next = NULL;

    node_t *left = NULL, *right = NULL;
    int l = 0, r = 0, threshold = 20;
    while (p) {
        node_t *n = p;
        p = p->next;
        if (n->value > value) {
            list_add_node_t(&right, n);
        } else {
            list_add_node_t(&left, n);
        }
    }

    if (l < threshold)
        insertionsort(&left);
    else
        quicksort(&left);

    if (r < threshold)
        insertionsort(&right);
    else
        quicksort(&right);

    node_t *result = NULL;
    list_concat(&result, left);
    list_concat(&result, pivot);
    list_concat(&result, right);
    *list = result;
}

探討 Linux 的 linked list 和 sysprog21/linux-list 的落差

Linux 的 linked list

參照 list.h，可以看到 linux 使用的 linked list，但是卻沒有定義所使用的 struct list_head，於是在 types.h 內找到了他的定義。

可以看出 Linux 內部是使用 Doubly-linked list.



struct list_head {
	struct list_head *next, *prev;
};

那如何知道他是 circular doubly-linked list 呢 ?
從以下的程式可以看出，當使用 INIT_LIST_HEAD 時，會建立出一個 HEAD，並且會將 HEAD 的 next 與 prev 都指向自己，配合後面插入節點的操作，就可以知道他屬於 circular doubly-linked list.













/**
 * INIT_LIST_HEAD - Initialize a list_head structure
 * @list: list_head structure to be initialized.
 *
 * Initializes the list_head to point to itself.  If it is a list header,
 * the result is an empty list.
 */
static inline void INIT_LIST_HEAD(struct list_head *list)
{
	WRITE_ONCE(list->next, list);
	list->prev = list;
}

而繼續看下去，可以發現很大的差異為包裝以及驗證的部份
以 void list_add() 為例:
可以看出他並不是一開始拿到資料與節點就直接進行新增，而是多包裝了一層 __list_add()












/**
 * list_add - add a new entry
 * @new: new entry to be added
 * @head: list head to add it after
 *
 * Insert a new entry after the specified head.
 * This is good for implementing stacks.
 */
static inline void list_add(struct list_head *new, struct list_head *head)
{
	__list_add(new, head, head->next);
}

再來去尋找他的實作，可以知道他會先藉由 __list_add_valid() 做一層驗證，再去進行 add 的操作。可以看出他是將新的節點插入在 prev 與 next 之間。
並且可以看出，這樣子的操作會維持其 Circular doubly linked list 的特性。


















/*
 * Insert a new entry between two known consecutive entries.
 *
 * This is only for internal list manipulation where we know
 * the prev/next entries already!
 */
static inline void __list_add(struct list_head *new,
			      struct list_head *prev,
			      struct list_head *next)
{
	if (!__list_add_valid(new, prev, next))
		return;

	next->prev = new;
	new->next = next;
	new->prev = prev;
	WRITE_ONCE(prev->next, new);
}

值得一提的是，如果 CONFIG_DEBUG_LIST 為 False， __list_add_valid 會直接回傳 true
程式中所使用的 WRITE_ONCE()，其定義在 compiler.h 內

/*
 * Prevent the compiler from merging or refetching reads or writes. The
 * compiler is also forbidden from reordering successive instances of
 * READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
 * particular ordering. One way to make the compiler aware of ordering is to
 * put the two invocations of READ_ONCE or WRITE_ONCE in different C
 * statements.
 *
 * These two macros will also work on aggregate data types like structs or
 * unions. If the size of the accessed data type exceeds the word size of
 * the machine (e.g., 32 bits or 64 bits) READ_ONCE() and WRITE_ONCE() will
 * fall back to memcpy and print a compile-time warning.
 *
 * Their two major use cases are: (1) Mediating communication between
 * process-level code and irq/NMI handlers, all running on the same CPU,
 * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
 * mutilate accesses that either do not require ordering or that interact
 * with an explicit memory barrier or atomic instruction that provides the
 * required ordering.
 */

#define WRITE_ONCE(x, val)				\
({							\
	union { typeof(x) __val; char __c[1]; } __u =	\
		{ .__val = (val) }; 			\
	__write_once_size(&(x), __u.__c, sizeof(x));	\
	__u.__val;					\
})

這巨集存在的原因，主要是 memory (re)ordering，你應該參照 Kernel documentation，摘錄對應的考量。
:notes: jserv

好的，研究中

static __always_inline void __write_once_size(volatile void *p, void *res, int size)
{
	switch (size) {
	case 1: *(volatile  __u8_alias_t *) p = *(__u8_alias_t  *) res; break;
	case 2: *(volatile __u16_alias_t *) p = *(__u16_alias_t *) res; break;
	case 4: *(volatile __u32_alias_t *) p = *(__u32_alias_t *) res; break;
	case 8: *(volatile __u64_alias_t *) p = *(__u64_alias_t *) res; break;
	default:
		barrier();
		__builtin_memcpy((void *)p, (const void *)res, size);
		barrier();
	}
}

sysprog21/linux-list

根據 list.h

直接參照 list_add()
可以發現他並不像上面的版本有做一層封裝，並對其操作進行驗證














/**
 * list_add() - Add a list node to the beginning of the list
 * @node: pointer to the new node
 * @head: pointer to the head of the list
 */
static inline void list_add(struct list_head *node, struct list_head *head)
{
    struct list_head *next = head->next;

    next->prev = node;
    node->next = next;
    node->prev = head;
    head->next = node;
}

將 quick sort 改寫為非遞迴版本

TODO

高效率的 linked list 排序演算法

研讀 Ching-Kuang Shene (冼鏡光) 教授撰寫的 A Comparative Study of Linked List Sorting Algorithms，思考高效率的 linked list 排序演算法。

TODO

2021q1 Homework1 (quiz1)

tags: linux2021

測驗 1

Quicksort 原理

程式碼解析

quicksort

list_add_node_t

list_concat

Random 修正

Non-recursive Quicksort

文章提供的方法解析

自行改寫的 non-recursion 版本

MAX_LEVELS 開銷

引入類 qsort 函式風格的 compar

將 stack 改為類 vector 實作

引入 Introsort

探討 Linux 的 linked list 和 sysprog21/linux-list 的落差

Linux 的 linked list

sysprog21/linux-list

將 quick sort 改寫為非遞迴版本

高效率的 linked list 排序演算法

Read more

2022/08/23~24 OSSLA Records

So What Has RCU Done Lately? (2023)

2021q1 Homework5 (sort)

RCU in the Linux Kernel

tags: `linux2021`

`MAX_LEVELS` 開銷

引入類 qsort 函式風格的 `compar`