2024q1 Homework2 (quiz1+2)

contributed by < max890808 >

第一周測驗題

測驗一

參考 Optimized QuickSort — C Implementation (Non-Recursive) 實作非遞迴 (non-recursive; iterative) 的快速排序法

解釋程式碼運作原理

node_t *begin[max_level], *end[max_level];
node_t *left = NULL, *right = NULL;

begin 和 end 用來紀錄比較的範圍
left 存放比 pivot 的 value 小的 node_t
right 存放比 pivot 的 value 大的 node_t

圖解
假設有一串列 [2, 0, 4, 1, 3] ，此時 i 為0 ，left 和 right 皆為NULL

遍歷整個串列，比 pivot 的 value 小的會被放入 left 串列，反之放入 right 串列

while (p) {
    node_t *n = p;
-   p = CCCC;
+   p = p->next;
    list_add(n->value > value ? &right : &left, n);
}

更新 begin 和 end

    begin[i] = left;
-   end[i] = DDDD;
+   end[i] = list_tail(&left);
    begin[i + 1] = pivot;
    end[i + 1] = pivot;
    begin[i + 2] = right;
-   end[i + 2] = EEEE;
+   end[i + 2] = list_tail(&right);

進入下一次迭代，此時 i = 2

此時 begin[4] = 4 和 end[4] = 4, 不滿足 if 的條件，因此進行 else 的動作,透過 list_add 將 begin[4] 放入 result ，並將 i 減一

if (L)
    list_add(&result, L);
i--;

當索引 i 扣到小於 0 時，while 條件不成立跳出迴圈，完成排序，將 result 指派給 list

使用Linux 核心風格的 List API 改寫上述程式碼

commit 7ff3f48

將原本的結構體改為使用 list_head，詳細修改可以看 commit 紀錄

typedef struct __node {
-   struct __node *left, *right;
-   struct __node *next;
+   struct list_head list;
    long value;
} node_t;

改進方案

max_level 大小
原本的 max_level 為 2 * list_legnth(list) ，但最差的情況下只需要 list_legnth(list) + 1 ，因此作以下修正：

- int max_level = 2 * n;
+ int max_level = n + 1;

移除 end[]
end[i] 可由 begin[i]->prev 取得，因此不需要使用額外的儲存空間

- node_t *L = begin[i], *R = end[i];
+ struct list_head *L = begin[i]->next, *R = begin[i]->prev;

避免最差狀況的快速排序

commit 4644839

最差的狀況會發生在 Pivot 恰好是該資料陣列的最小值或最大值，這裡的解決方式為 Randomized Quick Sort ，用亂數選取的方式，隨機挑一個值作為 Pivot ，並使用 <time.h> 比較最差狀況的結果

資料總數	`quick_sort` 第一次 (s)	`quick_sort` 第二次 (s)	`quick_sort` 第三次 (s)	`quick_sort` 平均 (s)	`random_quick_sort` 第一次 (s)	`random_quick_sort` 第二次 (s)	`random_quick_sort` 第三次 (s)	`random_quick_sort` 平均 (s)
1000	0.009	0.008	0.009	0.009	0.001	0.001	0.001	0.001
10000	0.600	0.621	0.578	0.600	0.005	0.006	0.006	0.006
100000	55.475	55.576	57.474	56.175	0.057	0.057	0.070	0.061

測驗二

Timsort 定義一個 minrun 控制每個 run 的長度，使 run 的數量等於或者略小於 2 的冪，這樣可以提高合併的效率。並且採用一組堆疊 (stack) 來暫存 run，避免在一開始就掃描整個資料範圍來產生 run，從而降低記憶體開銷。此外，透過 merge_collapse 函式確保 run 的長度，函式策略如下：

A > B + C
B > C

當不符合上述條件時，則會判斷 A 、C 大小，小的跟 B 合併。在合併兩個 run 時，Timsort 會有一個機制決定是否要啟動 Galloping mode 。

解釋程式碼運作原理

do {
    /* Find next run */
    struct pair result = find_run(priv, list, cmp);
    result.head->prev = tp;
    tp = result.head;
    list = result.next;
    stk_size++;
    tp = merge_collapse(priv, cmp, tp);
} while (list);

不斷呼叫 find_run 函式找出下一個 run，並根據 run 的長度和目前的堆疊狀態，呼叫 merge_collapse() 合併不滿足條件的 run。

struct list_head *stk0 = tp, *stk1 = stk0->prev;
while (stk1 && stk1->prev)
    stk0 = stk0->prev, stk1 = stk1->prev;
if (stk_size <= FFFF) {
    build_prev_link(head, head, stk0);
    return;
}
merge_final(priv, cmp, head, stk1, stk0);

使用 build_prev_link() 使原本單向串列成為環狀雙向鏈結串列，並透過 marge_final() 將最後兩段 run 進行合併。

改進方案

實作 galloping mode 並加入 MIN_GALLOP 判斷 galloping mode 和 one-pair-at-a-time mode的使用時機

TODO :

實作改進方案
將改進過的 Timsort 實作整合進 sysprog21/lab0-c，作為另一個有效的排序命令，應設計效能評比的測試程式，確保測試案例符合 Timsort 設想的資料分布樣式。

第二周測驗題

測驗一

Linux 核心的 hash table 與典型的雙向環狀鏈結串列不同，hash table 所使用的 pprev 宣告為指標的指標，這樣的好處為 hlist_head 僅需要保存一個指標，當 bucket 很大時，就能減少記憶體開銷。示意圖如下：

解釋程式碼的運作

find

static int find(int num, int size, const struct hlist_head *heads)
{
    struct hlist_node *p;
    int hash = (num < 0 ? -num : num) % size;
-   hlist_for_each (p, BBBB) {
+   hlist_for_each (p, &heads[hash]) {
-       struct order_node *on = CCCC(p, struct order_node, node);
+       struct order_node *on = list_entry(p, struct order_node, node);
        if (num == on->val)
            return on->idx;
    }
    return -1;
}

透過雜湊值和 list_entry 找出目標結構體的位置，回傳相對應 idx 值

node_add

static inline void node_add(int val,
                            int idx,
                            int size,
                            struct hlist_head *heads)
{
    struct order_node *on = malloc(sizeof(*on));
    on->val = val;
    on->idx = idx;
    int hash = (val < 0 ? -val : val) % size;
-   hlist_add_head(&on->node, DDDD);
+   hlist_add_head(&on->node, &heads[hash]);
}

計算雜湊值判斷要將節點加入到雜湊表的哪一位置

dfs

static struct TreeNode *dfs(int *preorder,
                            int pre_low,
                            int pre_high,
                            int *inorder,
                            int in_low,
                            int in_high,
                            struct hlist_head *in_heads,
                            int size)
{
    if (in_low > in_high || pre_low > pre_high)
        return NULL;

    struct TreeNode *tn = malloc(sizeof(*tn));
    tn->val = preorder[pre_low];
    int idx = find(preorder[pre_low], size, in_heads);
    tn->left = dfs(preorder, pre_low + 1, pre_low + (idx - in_low), inorder,
                   in_low, idx - 1, in_heads, size);
    tn->right = dfs(preorder, pre_high - (in_high - idx - 1), pre_high, inorder,
                    idx + 1, in_high, in_heads, size);
    return tn;
}

透過 find 函式找到目前 root 的 idx 值，分別將 preorder 和 inorder 切成 left, root 和 right 三部分，不斷的遞迴重複上述步驟直到 preorder 或 inorder 陣列長度為一

測試程式

commit 781cdee

修正版

commit 30b0614

- preOrderTraversal(tree, check_pre);
+ preOrderTraversal(check_tree, check_pre);
    
- inOrderTraversal(tree, check_in);
+ inOrderTraversal(check_tree, check_in);

透過 generateRandomTree() 隨機生成一顆樹

struct TreeNode* generateRandomTree(int height, int *number) {
    if (height <= 0) {
        return NULL;
    }
    struct TreeNode* root = createNode(rand() % 100);
    number += 1;
    root->left = generateRandomTree(height - 1, number);
    root->right = generateRandomTree(height - 1, number);
    return root;
}

preOrderTraversal() 和 inOrderTraversal() 負責找出前序和中序的陣列

void preOrderTraversal(struct TreeNode* root, int* pre) {
    if (root != NULL) {
        *pre = root->val;
        preOrderTraversal(root->left, pre+1);
        preOrderTraversal(root->right, pre+1);
    }
}

void inOrderTraversal(struct TreeNode* root, int* in) {
    if (root != NULL) {
        inOrderTraversal(root->left, in+1);
        *in = root->val;
        inOrderTraversal(root->right, in+1);
    }
}

之後將前序和中序傳入 buildTree() 得到所要測試的樹，再透過 preOrderTraversal() 和 inOrderTraversal() 得到所要測試的樹的前序和中序陣列，最後比較一開始的前序和中序陣列是否相等就能完成測試

TODO:

指出其中可改進之處並實作
在 Linux 核心的 cgroups 相關原始程式碼，找出 pre-order walk 程式碼並探討

測驗二

LRU(Least Recently Used Cache) 是一種快取的實做方式，概念是會儲存最近用過的內容，會透過 Hash Map 與 Double Linked List 來搭配實做，如果資料愈常被使用，資料會被擺在 List愈前方的位置，如果快取滿了，則會從 List 最末端元素開始移除。

因為 List 會從第一筆逐一查詢，所以查詢的時間複雜度為 O(N)，為了降低查詢時間，所以才搭配HashMap，在 HashMap中設置 key，讓 key 的內容對應到 List 中的元素，就可以讓查詢時間降低到 O(1)

解釋程式碼的運作

LRUCache

typedef struct {
    int capacity;
    int count;
    struct list_head dhead;
    struct hlist_head hhead[];
} LRUCache;

capacity: 紀錄 cache 能存放的最大資料數
count: cache 當前的資料數
dhead: 資料的頭節點
hhead[]: 雜湊表

lRUCacheGet

int lRUCacheGet(LRUCache *obj, int key)
{
    int hash = key % obj->capacity;
    struct hlist_node *pos;
    hlist_for_each (pos, &obj->hhead[hash]) {
-       LRUNode *cache = list_entry(pos, LRUNode, HHHH);
+       LRUNode *cache = list_entry(pos, LRUNode, node);
        if (cache->key == key) {
-           list_move(IIII, &obj->dhead);
+           list_move(&cache->link, &obj->dhead);
            return cache->value;
        }
    }
    return -1;
}

計算 hash 值後，在對應的 hhead[hash] 所指向的鍵結串列進行搜索，若找到該節點回傳對應的值，並且透過 list_move 更新 dhead

lRUCachePut

void lRUCachePut(LRUCache *obj, int key, int value)
{
    LRUNode *cache = NULL;
    int hash = key % obj->capacity;
    struct hlist_node *pos;
    hlist_for_each (pos, &obj->hhead[hash]) {
-       LRUNode *c = list_entry(pos, LRUNode, JJJJ);
+       LRUNode *c = list_entry(pos, LRUNode, node);
        if (c->key == key) {
-           list_move(KKKK, &obj->dhead);
+           list_move(&c->link, &obj->dhead);
            cache = c;
        }
    }

    if (!cache) {
        if (obj->count == obj->capacity) {
            cache = list_last_entry(&obj->dhead, LRUNode, link);
            list_move(&cache->link, &obj->dhead);
            hlist_del(&cache->node);
            hlist_add_head(&cache->node, &obj->hhead[hash]);
        } else {
            cache = malloc(sizeof(LRUNode));
            hlist_add_head(&cache->node, &obj->hhead[hash]);
            list_add(&cache->link, &obj->dhead);
            obj->count++;
        }
        cache->key = key;
    }
    cache->value = value;
}

將新的資料放入快取當中，若資料已經在快取中，則使用 list_move 更新 dhead。若資料不在快取中且快取空間已滿，則會刪除最久沒被使用的資料，加入新的節點。若資料不在快取中且快取空間還沒滿，則配置新的記憶體空間並加入新節點。

TODO:

撰寫完整的測試程式，指出其中可改進之處並實作
在 Linux 核心找出 LRU 相關程式碼並探討

測驗三

解釋程式碼的運作

hweight_long

#define __const_hweight8(w)                                              \
    ((unsigned int) ((!!((w) & (1ULL << 0))) + (!!((w) & (1ULL << 1))) + \
                     (!!((w) & (1ULL << 2))) + (!!((w) & (1ULL << 3))) + \
                     (!!((w) & (1ULL << 4))) + (!!((w) & (1ULL << 5))) + \
                     (!!((w) & (1ULL << 6))) + (!!((w) & (1ULL << 7)))))

#define __const_hweight16(w) (__const_hweight8(w) + __const_hweight8((w) >> 8))
#define __const_hweight32(w) \
    (__const_hweight16(w) + __const_hweight16((w) >> 16))
#define __const_hweight64(w) \
    (__const_hweight32(w) + __const_hweight32((w) >> 32))

static inline unsigned long hweight_long(unsigned long w)
{
    return __const_hweight64(w);
}

計算 w 當中有幾個位元被設定

__ffs

static inline unsigned long __ffs(unsigned long word)
{
    int num = 0;

#if BITS_PER_LONG == 64
-   if ((word & AAAA) == 0) {
+   if ((word & 0xffffffff) == 0) {
        num += 32;
        word >>= 32;
    }
#endif
    if ((word & 0xffff) == 0) {
        num += 16;
        word >>= 16;
    }
    if ((word & 0xff) == 0) {
        num += 8;
        word >>= 8;
    }
    if ((word & 0xf) == 0) {
        num += 4;
        word >>= 4;
    }
    if ((word & 0x3) == 0) {
        num += 2;
        word >>= 2;
    }
    if ((word & 0x1) == 0)
        num += 1;
    return num;
}

找 word 第一個被設定的 bit 的位置

fns

static inline unsigned long fns(unsigned long word, unsigned int n)
{
    while (word) {
        unsigned int bit = __ffs(word);
        if (n-- == 0)
            return bit;
        __clear_bit(bit, &word);
    }

    return BITS_PER_LONG;
}

透過 __ffs 函式得到第 N 個被設定的 bit 的位置

find_nth_bit

static inline unsigned long find_nth_bit(const unsigned long *addr,
                                           unsigned long size,
                                           unsigned long n)
{
    if (n >= size)
        return size;

    if (small_const_nbits(size)) {
        unsigned long val = *addr & GENMASK(size - 1, 0);

        return val ? fns(val, n) : size;
    }

    return FIND_NTH_BIT(addr[idx], size, n);
}

透過 small_const_nbits 巨集判斷 size 的大小，若 size 小於 BITS_PER_LONG ，則運用 fns 函式找出第 N 個被設定的 bit 的位置，反之則會進入到 FIND_NTH_BIT 巨集

FIND_NTH_BIT

#define FIND_NTH_BIT(FETCH, size, num)                          \
    ({                                                          \
        unsigned long sz = (size), nr = (num), idx, w, tmp;     \
                                                                \
        for (idx = 0; (idx + 1) * BITS_PER_LONG <= sz; idx++) { \
            if (idx * BITS_PER_LONG + nr >= sz)                 \
                goto out;                                       \
                                                                \
            tmp = (FETCH);                                      \
            w = hweight_long(tmp);                              \
            if (w > nr)                                         \
                goto found;                                     \
                                                                \
            nr -= w;                                            \
        }                                                       \
                                                                \
-       if (sz CCCC BITS_PER_LONG)                              \
+       if (sz % BITS_PER_LONG)                                 \
            tmp = (FETCH) & BITMAP_LAST_WORD_MASK(sz);          \
    found:                                                      \
        sz = min(idx * BITS_PER_LONG + fns(tmp, nr), sz);       \
    out:                                                        \
        sz;                                                     \
    })

每次迭代都計算 FETCH 的 BITS_PER_LONG 長度有幾個位元被設定，透過 found 找出目標答案

TODO:

在 Linux 核心原始程式碼找出 find_nth_bit 的應用案例，並予以解說和分析，需要涵蓋 CPU affinity 相關的原始程式碼。

2024q1 Homework2 (quiz1+2)

第一周測驗題

測驗一

解釋程式碼運作原理

使用Linux 核心風格的 List API 改寫上述程式碼

改進方案

避免最差狀況的快速排序

測驗二

解釋程式碼運作原理

改進方案

第二周測驗題

測驗一

解釋程式碼的運作

測試程式

測驗二

解釋程式碼的運作

測驗三

解釋程式碼的運作

Read more

2024q1 Homework6 (integration)

2024q1 Homework4 (quiz3+4)

2024q1 Homework5 (assessment)

2024q1 Homework1 (lab0)