2021q1 Homework (quiz1)

contributed by < hankluo6 >

第 1 週測驗題

解題思路

`LLL`

(c) left = &((*left)->next)

從函式名稱 list_concat 可知應為將兩個 list 做 concatenation，LLL 為在迴圈內尋找 list 的尾端，且 left 的型態為 node_t **，為 node pointer 的 "address"，故答案為 &((left)->next。

`AAA`

(e) &right

AAA 為 n->value > value 時要將 node n 加入到的 list，此時 n 為目前 while 迴圈中操作的 node，value 為 pivot 節點上的值 (此例為第一個節點的值)，但這樣並不能判斷 left list 跟 right list 兩者間的大小關係，觀察下方程式碼中的 list_concat(&result, left); 先出現可知 left list 的值較小，故 AAA 應為 right 表示 value 比 pivot 的 value 大時加入到 right 的 list 中。

`BBB`

(c) &left

反之，value 小於等於 pivot 的 value 時加入到 left 的 list 中。

`CCC`

(b) list_concat(&result, pivot); list_concat(&result, right);

再依照大小順序將 list 連接即可。

程式原理




static inline void list_add_node_t(node_t **list, node_t *node_t) {
    node_t->next = *list;
    *list = node_t;
}

將 node_t 加入到 list 的前端，並將之設定為新的 head。因為傳入的 list type 為 pointer to pointer，所以可以很優雅地處理 head 為 NULL 時的問題。

第 2 行 : node_t->next = *list;

第 3 行*list = node_t





static inline void list_concat(node_t **left, node_t *right) {
    while (*left)
        left = &((*left)->next);
    *left = right;
}

取得 left 中最後的節點的 address，並把 right 的 list 移動到該位置中。因 right 本身的結構沒有更改，故原先節點在 right 中的順序在串聯後的 list 依然保持不變。

滿足 while 迴圈條件: 第 1 次

\to

left = &((*left)->next)

滿足 while 迴圈條件: 第 2 次

\to

left = &((*left)->next)

滿足 while 迴圈條件: 第 3 次

\to

left = &((*left)->next)

滿足 while 迴圈條件: 第 4 次

\to

left = &((*left)->next)

一旦 *left == NULL 成立，離開上述 while 迴圈

第 4 行 : *left = right

void quicksort(node_t **list)
{
    if (!*list)
        return;

    node_t *pivot = *list;
    int value = pivot->value;
    node_t *p = pivot->next;
    pivot->next = NULL;

    node_t *left = NULL, *right = NULL;
    while (p) {
        node_t *n = p;
        p = p->next;
        list_add_node_t(n->value > value ? &right : &left, n);
    }

    quicksort(&left);
    quicksort(&right);

    node_t *result = NULL;
    list_concat(&result, left);
    list_concat(&result, pivot); 
    list_concat(&result, right);;
    *list = result;
}

quicksort 與一般常見陣列中的 quick sort 類似，在 quicksort 中會先選擇第一個 node 當作 pivot，接著依照每個 node 的值跟 pivot 間的大小關係，將每個 node 分配到比 pivot 大的 list 與比 pivot 小的 list 當中，再遞迴對兩個 sublist 做 quicksort，最後再透過 list_concat 以 left_head ~ left_tail -> pivot -> right_head ~ right_tail 的順序接起來，並把 *list 設定為最後 result list 的開頭 (node_t **list 本身的值不變，改變的是其指向的記憶體位置 *list)。

`random`

random(3) 提到：

The srandom() function sets its argument as the seed for a new sequence of pseudo-random integers to be returned by random(). These sequences are repeatable by calling srandom() with the same seed value. If no seed value is provided, the random() function is automatically seeded with a value of 1.

所以要讓每次程式結果不同，必須設置 seed 值，可在程式內加入 srand(time(NULL)) 以目前時間設定 seed。

使用 getrandom(2) 從 /dev/urandom 取得目前 entropy pool 中的 noise，就能在不透過 seed 的情況下保證有一定的隨機性：

int number;
while (count--) {
    getrandom(&number, 2, GRND_NONBLOCK);
    list = list_make_node_t(list, number % 1024);
}

Optimized QuickSort

GitHub

//  quickSort
//
//  This public-domain C implementation by Darel Rex Finley.
//
//  * Returns YES if sort was successful, or NO if the nested
//    pivots went too deep, in which case your array will have
//    been re-ordered, but probably not sorted correctly.
//
//  * This function assumes it is called with valid parameters.
//
//  * Example calls:
//    quickSort(&myArray[0],5); // sorts elements 0, 1, 2, 3, and 4
//    quickSort(&myArray[3],5); // sorts elements 3, 4, 5, 6, and 7

bool quickSort(int *arr, int elements) {
  #define MAX_LEVELS  1000

  int piv, beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R ;

  beg[0]=0; end[0]=elements;
  while (i>=0) {
    L=beg[i]; R=end[i]-1;
    if (L<R) {
      piv=arr[L]; if (i==MAX_LEVELS-1) return NO;
      while (L<R) {
        while (arr[R]>=piv && L<R) R--; if (L<R) arr[L++]=arr[R];
        while (arr[L]<=piv && L<R) L++; if (L<R) arr[R--]=arr[L]; }
      arr[L]=piv; beg[i+1]=L+1; end[i+1]=end[i]; end[i++]=L; }
    else {
      i--; }}
  return YES; }

先觀察原本 Optimized QuickSort — C Implementation (Non-Recursive) 內的程式，可以發現 while 迴圈用來當作遞迴的每次呼叫，並分成兩部份：

if 的部份為 quicksort 中的 split
else 部份則為 quicksort 中的 merge

另外使用 beg 及 end 紀錄範圍，變數 i 用來模擬 stack 的行為，用其隨著時間改變 beg 及 end 的範圍。

陣列初始內容

滿足 while 迴圈條件: 第 1 次

滿足 while 迴圈條件: 第 2 次

滿足 while 迴圈條件: 第 3 次

你好，我驗算過發現，滿足 while 迴圈條件: 第 3 次的時候
end[2] 會指向 31， beg[3] 和 end[3] 會指向最尾端空的地方。Chia-Lun Hsu
感謝更正！hankluo6

由上面例子可以觀察到此程式會把 array 一直分成 sub-array，並優先選擇右方的 array 來處理，亦即每次排序皆為選擇目前尚未排序的陣列中最大值，並放至右側，如此便能讓右側的陣列滿足 Loop invariant 的特性。可以看成把 Array 以二等分持續分開，形成 Full Binary Tree，再將 tree 以右子樹開始走訪各節點，每次碰到 leaf 時，該數字就是目前要排序的值。

將上述概念套用到 linked list 也符合，持續往樹的右方 search，透過變數 i 模擬 stack 的操作，當找到要排序的值後，在透過 list_add_node_t 插入到最終 list 的前方。為了程式簡潔，我將 pivot 也當作一次迴圈考量，所以此時對應的 tree 應為 Ternary Tree。

void quicksort_nonrecursive(node_t **list)
{
    int value;
    int i = 0;
    int max_level = 100;
    node_t *node_beg[max_level], *node_end[max_level];
    node_t *result = NULL, *left = NULL, *right = NULL;
    node_t *pivot, *p, *L, *R;
    
    node_beg[0] = *list;
    node_end[0] = get_list_tail(list);
    
    while (i >= 0) {
        L = node_beg[i]; 
        R = node_end[i];
        
        if (L != R) {
            pivot = L;
            value = pivot->value;
            p = pivot->next;
            pivot->next = NULL;
            
            while (p) {
                node_t *n = p;
                p = p->next;
                list_add_node_t(n->value > value ? &right : &left, n);
            }

            node_beg[i] = left;
            node_end[i] = get_list_tail(&left);
            node_beg[i + 1] = pivot;
            node_end[i + 1] = pivot;
            node_beg[i + 2] = right;
            node_end[i + 2] = get_list_tail(&right);
            
            left = NULL;
            right = NULL;
            i += 2;
        }
        else {
            if (L)
                list_add_node_t(&result, L);
            i--;
        }
    }
    *list = result;
}

效能比較

輸入為長度 2000 的 linked list
- 使用 -O0 編譯：
  
  可以發現沒有使用 recursion 的效能平均比使用 recursion 的還高。
- 使用 -O3 編譯：
  
  兩者間的差距明顯縮小，說明編譯器對遞迴函式有很好的最佳化。

可改進的空間:

max_level 與 quick sort 時 partition 的位置有關，當 worst case 時，Tree 退化成 linked list，樹高為 n，而每次迴圈 i 需要加 2，會 max_level = 2n，best case 時，Tree 為平衡樹， max_level = 2log(n)。
node_beg 以及 node_end 可以用 dynamic array 來實作，起始容量為
$l o g_{2} (n)$ ，並隨著迴圈越深加倍其容量。

以 introsort 改進原先 quick sort：

實做 insertion sort

void insert_sorted(node_t *entry, node_t **list)
{
    while (*list && (*list)->value < entry->value)
        list = &(*list)->next;
    entry->next = *list;
    *list = entry;
}

void insertsort(node_t **list) 
{
    node_t *sorted = NULL;
    node_t *cur = *list;
    while (cur) {
        node_t *node = cur;
        cur = cur->next;
        insert_sorted(node, &sorted);
    }
    *list = sorted;
}

在做 partition 時紀錄 left list 及 right list 的長度

while (p) {
    node_t *n = p;
    p = p->next;
    if (n->value > value) {
        list_add_node_t(&right, n);
        r++;
    }
    else {
        list_add_node_t(&left, n);
        l++;
    }
}

當長度小於 20 時，以 insertion sort 取代 quick sort

if (l < 20)
    insertsort(&left);
else 
    quicksort_recursion(&left);
if (r < 20)
    insertsort(&right);
else
    quicksort_recursion(&right);

程式碼

下一步就是引入 tree-based sort 到上方程式碼中，改善 quick sort 原本 worst case 的表現，從而落實 introsort

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

jserv

Intro Sort

當遞迴的次數太多時，改以 tree sort 來排序，在 introSort 內新增 treesort 函式：

void introsort(node_t **list, int max_level, int insert)
{
    if (!*list)
        return;

    if (max_level == 0) {
        treesort(list);
        return;
    }

    node_t *pivot = *list;
    int value = pivot->value;
    int l = 0, r = 0;
    node_t *p = pivot->next;
    pivot->next = NULL;

    node_t *left = NULL, *right = NULL;
    while (p) {
        node_t *n = p;
        p = p->next;
        if (n->value > value) {
            list_add_node_t(&right, n);
            r++;
        }
        else {
            list_add_node_t(&left, n);
            l++;
        }
    }

    if (l < insert) {
        insertsort(&left);
    }
    else { 
        introsort(&left, max_level - 1, insert);
    }
    if (r < insert) {
        insertsort(&right);
    }
    else {
        introsort(&right, max_level - 1, insert);
    }
    node_t *result = NULL;
    list_concat(&result, left);
    list_concat(&result, pivot); 
    list_concat(&result, right);;
    *list = result;
}

而 treesort 的實作是以紅黑樹來當作平衡樹，便能在 worst case 也有

O (n l o g n)

的複雜度，紅黑樹以 rv32emu-next/c_map.h 及 rv32emu-next/c_map.c 改寫來實現。並將原本的 node_t 結構擴增為同時存有 linked list 及 tree 的欄位：

typedef struct __node {                   
    int value;
    struct __node *next; // for linked list
    struct __node *left, *right, *up; // for red-black tree
    c_map_color_t color; // for red-black tree
} node_t;

改用 union 可以更節省空間。

treesort 會將 linked list 內的 node 加入到 red-black tree 中，在以 inorder 的方式取出：

void treesort(node_t **list) {
    node_t **record = list;
    c_map_t map = c_map_new(sizeof(int), sizeof(NULL), c_map_cmp_int);
    while (*list) {
        c_map_insert(map, *list, NULL);
        list = &(*list)->next;
    }
    node_t *node = c_map_first(map), *first = node;
    for ( ;node; node = c_map_next(node)) {
        *list = node;
        list = &(*list)->next;
    }
    *list = NULL;
    *record = first;
    free(map);
}

要特別注意此紅黑樹不能插入同樣數值的節點。

效能分析

找尋適當的參數值，測試資料為 100000 筆隨機不重複數字：

for (int max_level = 1; max_level < (1 << 15); max_level <<= 1) {
    for (int insert = 10; insert < 40; ++insert) {
        size_t times = 100;
        time = 0;
        while (times--) {
            size_t count = 100000;
            int *test_arr = malloc(sizeof(int) * count);

            for (int i = 0; i < count; ++i) {
                test_arr[i] = i;
            }
            shuffle(test_arr, count);

            while (count--) {
                list1 = list_make_node_t(list1, test_arr[count]);
            }
            clock_gettime(CLOCK_MONOTONIC, &tt1);
            introsort(&list1, max_level, insert);
            clock_gettime(CLOCK_MONOTONIC, &tt2);
            time += diff_in_ns(tt1, tt2);
            list_free(&list1);
            free(test_arr);
        }
        fprintf(fd, "%f ", time / 100.0);
    }
    fprintf(fd, "\n");
}

max_level	insert	time(ns)
16	21	43538009
32	21	38184258
64	21	49350176
128	21	49877200

詳細輸出結果
max_level 參數決定到第幾層時使用 tree sort
insert 參數決定少於幾個 node 時使用 insertion sort

測量出來最佳的 max_level 為 32，大約為

2 \times l o g_{2} (100000)

，與論文 Introspective Sorting and Selection Algorithms 中的假設相同。

摘錄自 Introspective Sorting and Selection Algorithms 第 5 頁：

depth limit is
$2 ⌊ l o g_{2} N ⌋$ , since this value empirically produces good results.

每種 sorting 方式比較，測試資料為 100000 筆隨機不重複數字：
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
針對圖例標注的修正: with 可簡稱 w/; without 可簡稱 w/o。參見 wiktionary: w/o
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
jserv
可以發現 tree sort 的效能比其他都還要快上許多，這與 A Comparative Study of Linked List Sorting Algorithms 內的統計結果一樣。問題可能是出在遞迴的成本太高，導致 quick sort 無法超越 tree sort。

與 linux/lib/list_sort.c 比較，測試資料為 100000 筆隨機不重複數字：

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

嘗試用 perf 分析 tree sort 與 list_sort：

list_sort

 Performance counter stats for './list_sort.out':

     4,416,717      cache-misses              #    0.531 % of all cache refs      (66.65%)
   832,301,750      cache-references                                              (66.68%)
48,222,171,264      cycles                                                        (66.69%)
16,549,168,666      branch-instructions                                           (66.69%)
    26,319,955      branch-misses             #    0.16% of all branches          (66.67%)
   564,284,840      L1-dcache-load-misses                                         (66.63%)
   
  12.103174289 seconds time elapsed

  11.414579000 seconds user
   0.688155000 seconds sys

tree_sort

 Performance counter stats for './tree_sort.out':

  1,347,843,845      cache-misses              #   20.133 % of all cache refs      (66.65%)
  6,694,777,854      cache-references                                              (66.66%)
226,565,049,469      cycles                                                        (66.68%)
 28,693,147,859      branch-instructions                                           (66.68%)
  1,123,005,940      branch-misses             #    3.91% of all branches          (66.67%)
  3,115,288,209      L1-dcache-load-misses                                         (66.66%)
  
   58.205776700 seconds time elapsed

   58.170830000 seconds user
    0.007998000 seconds sys

list_sort.c 的註解寫道：

Thus, it will avoid cache thrashing as long as
$3 \times 2^{k}$ elements can fit into the cache. Not quite as good as a fully-eager bottom-up mergesort, but it does use
$0.2 \times n$ fewer comparisons, so is faster in the common case that everything fits into L1.

所以 list_sort 中的 merge sort 對 cache 比較友善，從 perf 的分析也能看出 list_sort 的各項數值皆比 tree sort 還低，這也能說明 linux 核心中使用 merge sort 而不使用 tree sort 的原因。

tree sort 的效能瓶頸推測是在 red-black tree 的部份，透過 perf 分析，可以發現 L1-cache-misses 和 branch-misses 最高的部份皆為 c_map_insert 中的比較函式 obj->comparator(&node->value, &cur->value) 內的 Conditional Expression ，而 cache-misses 最高的部份則是 c_map_insert 中建立樹的指派 cur->left = node 及 cur->right = node。故可以推測出造成效能較低的原因為建立樹所需要的成本太高，需要走訪及比較樹中的節點，而當節點的數量越多，此成本就越高。

tags: `linux2021`

Jim Huang

2021/03/10 23:31:32

改用 `union` 可以更節省空間。 ::: `

歡迎提交 pull request 到 rv32emu-next 專案 (Edited)

WayneLin1992

2021/03/14 10:44:38

[linux/lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) 比較，測試資料為 100000 筆隨機不重複數字：

你怎麼有辦法叫出 kernel中的sort 進行比較? (Edited)

hankluo6

2021/03/15 11:15:44

我將 list_sort 抽出為獨立可運行的程式，沒有直接從 kernel 內比較

2021/03/14 10:51:43

顯然需要改寫 list_sort.c 的原始程式碼 (Edited)

2021q1 Homework (quiz1)

解題思路

LLL

AAA

BBB

CCC

程式原理

random

Optimized QuickSort

Intro Sort

效能分析

tags: linux2021

Read more

Linked List Sort

2021q1 Homework5 (sort)

2022q1 Homework5 (quiz5)

2022q1 Homework5 (quiz6)

`LLL`

`AAA`

`BBB`

`CCC`

`random`

tags: `linux2021`