2025q1 Homework2 (quiz1+2)

contributed by < Potassium-chromate >

第一周

測驗一

程式碼運作原理

list_insert_before 的具體用途為將 item 插入到 before 前面。具體方式為持續遍歷鍊結串列直到遇到 before ，接著把 before 前一個節點的 next 指向 item。

static inline void list_insert_before(list_t *l,
                                      list_item_t *before,
                                      list_item_t *item)
{
    list_item_t **p;
    for (p = &l->head; *p != before; p = &(*p)->next)
        ;
    *p = item;
    item->next = before;
}

合併排序操作

本次實作為 Top-Down 的方式，其中尋找中間節點的作法為利用快慢指標 (slow and fast pointers)。

尋找中間節點 - get_middle()
在 get_middle() 函數中，我們使用快慢指標 (slow and fast pointers) 來找到鏈結串列的中間節點：

slow 指標每次移動一步。
fast 指標每次移動兩步。
當 fast 到達鏈結串列的末端時，slow 剛好位於中間位置。

合併兩個已排序的鏈結串列 - merge()
其實做方式為利用遞迴函數來合併兩個已排序的鏈結串列，並返回排序後的新鏈結串列的第一個節點。

若 left 的值較小，則 left->next 指向遞迴合併後的結果，返回 left。
若 right 的值較小，則 right->next 指向遞迴合併後的結果，返回 right。3

static inline list_item_t *merge(list_item_t *left, list_item_t *right) {
    if (!left) return right;
    if (!right) return left;
    
    if (left->value <= right->value) {
        left->next = merge(left->next, right);
        return left;
    } else {
        right->next = merge(left, right->next);
        return right;
    }
}

實作 Merge Sort - merge_sort()
本次是以遞迴式的方式來實做 Merge Sort 主函數：

當 head 為空或僅有一個節點時，直接返回 head。
使用 get_middle() 找到中間節點。
將鏈結串列分成兩部分，並遞迴地對左右兩部分進行 Merge Sort。
使用 merge() 合併已排序的左右兩部分，並返回排序後的鏈結串列頭部。

list_item_t* middle = get_middle(head);
list_item_t* next_to_middle = middle->next;
middle->next = NULL;
    
list_item_t* left = merge_sort(head);
list_item_t* right = merge_sort(next_to_middle);

測驗二

程式碼運作原理

由於目的是要找到左子樹的最右節點，因此需要持續遍歷 (*pred_ptr)->r 直到遇到 NULL。因此 EEEE 為 (*pred_ptr)->r， FFFF 為 (*pred_ptr)->r。

while ((*pred_ptr)->r) // EEEE
    *pred_ptr = (*pred_ptr)->r; // FFFF

其刪除節點的邏輯如下：
情況 1：節點有兩個子節點

需找到該節點 左子樹中最大值的節點（前驅節點，Predecessor） 來替換它。
若前驅節點為該節點的左子節點，可直接用它來替換目標節點。
若前驅節點位於左子樹深處，則需先移除前驅節點，並妥善接回其左子樹，最後才用前驅節點替換目標節點。

具體過程如下：
步驟 1：尋找前驅節點

50 的左子樹存在最大值節點 40（即前驅節點）。

       50
      /  \
    30    70
   /  \   /  \
  10  40 60  80
      /
     35

步驟 2：移除 40，並將其左子節點 35 接回

由於 40 尚有左子節點 35，若直接用 40 替換 50，則 35 可能會丟失，因此須先將 40 替換為 35。

       50
      /  \
    30    70
   /  \   /  \
  10  35 60  80

步驟 3：使用 40 替換 50

最後，將 40 取代 50 的位置，並保留其原本的左右子樹結構。

       40
      /  \
    30    70
   /  \   /  \
  10   35 60  80

情況 2：節點只有一個子節點

如果該節點有一個子節點，我們用其子節點取代目標節點。

以下為用 graviz 視覺化刪除過程的範例：

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

移除節點 654

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

移除節點 284

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

程式碼細節可以參考 Commit 3c697d6

參閱 memory-allocators

參考 memory-allocators，可以得知 allocators 分為以下四種：
1. Linear Allocator:

記憶體管理策略
- 以連續區塊的方式管理記憶體的最簡單的分配器。
- 透過增加指標（bump pointer）來線性分配記憶體
- 不允許釋放特定位置，只能一次釋放全部記憶體。
工作原理
- 將一個指標保留在記憶體區塊的第一個記憶體位址上開始。
- 每次分配都會使指針向前移動來表示最後分配的位置。

2. Stack Allocator

記憶體管理策略
- 工作方式類似於堆疊資料結構（LIFO—後進先出）。
- 分配從堆疊頂部進行。
- 釋放必須按照分配的相反順序進行。
工作原理
- 堆疊指標追蹤分配記憶體的頂部。
- 每次新的分配都會將指針向堆的頂端。
- 僅允許最近的分配的記憶體進行釋放。

3. Pool Allocator

記憶體管理策略
- 管理固定大小的記憶體區塊。
- 使用鍊結串列來追蹤可用的區塊。
工作原理
- 預先分配一大塊內存，並將其分成固定大小的區塊。
- 維護一個儲存空閒區塊的鍊結串列。
- 每當請求新的記憶體空間時，就會從空閒區塊的鍊結串列分配一塊出來。

實做方式
以下為memory-allocators 的實做方式。
1. 初始化 - 分配一個大記憶體區域

void PoolAllocator::Init() {
    m_start_ptr = malloc(m_totalSize);
    this->Reset();
}

2. 建立自由列表 - 初始化可用區塊

void PoolAllocator::Reset() {
    m_used = 0;
    m_peak = 0;
    // Create a linked-list with all free positions
    const int nChunks = m_totalSize / m_chunkSize;
    for (int i = 0; i < nChunks; ++i) {
        std::size_t address = (std::size_t) m_start_ptr + i * m_chunkSize;
        m_freeList.push((Node *) address);
    }
}

4. Free list allocator

記憶體管理策略
- 有兩種常見的實作：一種使用連結列表，一種使用紅黑樹。
- 當請求分配時，它會在連結列表中搜尋可以容納資料的區塊。
- 它從鍊結串列(或是紅黑數)中刪除元素，並在資料之前放置一個 allocation header 以紀錄該區塊的大小。
- 在鍊結串列(或是紅黑數)中也會持續合併連續的記憶體區塊以創建更大的區塊。
工作原理
- 預先以鍊結串列(或是紅黑數)來儲存記憶體中每個空閒連續區塊的起始位址及其大小
- 在紅黑數的結構中，可以將複雜度降低到
  $O (l o g (N))$

效能評比

在作業提供的資料結構中，二元搜尋樹 (BST) 沒有自平衡機制，這可能導致插入與刪除節點時出現樹的高度不均問題，影響搜尋與操作效率。

因此，我本次實作紅黑樹 (Red-Black Tree, RBT)，並比較：

BST vs. RBT 在插入的效能
BST vs. RBT 在刪除的效能

參考 Linux 核心的紅黑樹，可以得知紅黑數的基本性質為：

每個節點都是紅色或黑色。
根節點必為黑色。
紅色節點的子節點必為黑色，即不能有連續的紅色節點。
從根節點到所有葉節點的黑色節點數相同。
新插入的節點預設為紅色，並透過旋轉與 顏色調整 來維持紅黑樹性質。

紅黑數的實做細節可以參考 Commit 50e822e

下圖是隨機插入 1 到 30 的結果並使用 Graphviz 視覺化的結果：

刪除 10 個節點 (13, 16, 24, 11, 25, 27, 10, 12, 19, 21) 後，結果如下：

下方為 RBT 與 BST 於插入 3000 個節點與刪除 1500 節點的時間：
測量方式為使用 Linux 內建效能分析工具： Perf。

紅黑數：

eason@eason-System-Product-Name:~/2025_lab_1/memory_allocators$ sudo perf stat -r 100 ./RBtree

 Performance counter stats for './RBtree' (100 runs):

             16.65 msec task-clock                       #    0.987 CPUs utilized               ( +-  0.99% )
                 0      context-switches                 #    0.000 /sec                      
                 0      cpu-migrations                   #    0.000 /sec                      
                91      page-faults                      #    5.466 K/sec                       ( +-  0.09% )
     <not counted>      cpu_atom/cycles/                                                        ( +-100.00% )  (0.00%)
         6743,6385      cpu_core/cycles/                 #    4.050 GHz                         ( +-  0.29% )
     <not counted>      cpu_atom/instructions/                                                  ( +-100.00% )  (0.00%)
         9391,3135      cpu_core/instructions/           #  128.96  insn per cycle              ( +-  0.01% )
     <not counted>      cpu_atom/branches/                                                      ( +-100.00% )  (0.00%)
         2030,7443      cpu_core/branches/               #    1.220 G/sec                       ( +-  0.01% )
     <not counted>      cpu_atom/branch-misses/                                                 ( +-100.02% )  (0.00%)
            7,8665      cpu_core/branch-misses/          #   37.70% of all branches             ( +-  0.16% )
             TopdownL1 (cpu_core)                 #     65.4 %  tma_backend_bound      
                                                  #      2.0 %  tma_bad_speculation    
                                                  #      3.2 %  tma_frontend_bound     
                                                  #     29.4 %  tma_retiring             ( +-  0.29% )

          0.016874 +- 0.000169 seconds time elapsed  ( +-  1.00% )

二元搜尋樹：

eason@eason-System-Product-Name:~/2025_lab_1/memory_allocators$ sudo perf stat -r 100 ./BST

 Performance counter stats for './BST' (100 runs):

             58.54 msec task-clock                       #    0.995 CPUs utilized               ( +-  0.49% )
                 0      context-switches                 #    0.000 /sec                      
                 0      cpu-migrations                   #    0.000 /sec                      
               106      page-faults                      #    1.811 K/sec                       ( +-  0.06% )
     <not counted>      cpu_atom/cycles/                                                        ( +- 32.80% )  (0.00%)
       2,5534,4994      cpu_core/cycles/                 #    4.362 GHz                         ( +-  0.28% )
     <not counted>      cpu_atom/instructions/                                                  ( +- 32.91% )  (0.00%)
       5,2419,1360      cpu_core/instructions/           #   29.97  insn per cycle              ( +-  0.11% )
     <not counted>      cpu_atom/branches/                                                      ( +- 32.96% )  (0.00%)
       1,0172,8554      cpu_core/branches/               #    1.738 G/sec                       ( +-  0.10% )
     <not counted>      cpu_atom/branch-misses/                                                 ( +- 34.71% )  (0.00%)
           79,0093      cpu_core/branch-misses/          #   14.99% of all branches             ( +-  1.56% )
             TopdownL1 (cpu_core)                 #     23.0 %  tma_backend_bound      
                                                  #     10.0 %  tma_bad_speculation    
                                                  #     29.5 %  tma_frontend_bound     
                                                  #     37.6 %  tma_retiring             ( +-  0.30% )

          0.058839 +- 0.000291 seconds time elapsed  ( +-  0.49% )

可以發現 RBT 快了 BST 約 4 倍。並且理論上 BST 的刪除節點的時間複雜度為

O (n)

， BRT 刪除節點的時間複雜度為

O (l o n (n))

TODO: 比較在不同數量節點的情況下，BST 與 RBT 的效能表現。

測驗三

程式碼運作原理

判斷式 L != R 代表當左右兩邊的linked list頭尾不相等時，表示該linked list不只包含一個節點；而在 else 內，則代表該linked list只剩一個節點，此時可以直接將該節點插入最終的要回傳linked list。

然後，還可以發現在else中會將i減1，再搭配上面提到的部分。可以發現當right只剩一個節點並執行i--後，begin[i]就會退回pivot，然而pivot必定只有一個節點，因此又會再做一次i--，然後begin[i]又會退回left的部分，最後便可以持續分割原本的left並對其做排序。

 while (i >= 0) {
    node_t *L = begin[i], *R = end[i];
    if (L != R) {
        ...
        i += 2;
    } else {
        if (L)
            list_add(&result, L);
        i--;
    }
}

最後，我們可以用Graphviz來觀察一下quick_sort中right, left與pivot的變化。
這是最一開始的狀態。共10個節點。

一開始一定是取第一個節點作為pivot，也就是8。因此我們可以發現right, left與pivot會變成以下狀態。

之後，9和8就會被併入最終的要回傳linked list。
之後再對原本的那條left做排序即可。可以發現，整結構其實就是一個stack。

實做細節可以參考 Commit c9912d7

研讀論文

For quick sort, Hoare's original algorithm [3] cannot
be used since this algorithm "bums a candle from
both ends".

在論文中，對於在雙向鍊結串列上使用 Hoare's 快速排序的主要擔憂是，霍爾原始的分區演算法 "bums a candle from both ends" ，同時從前面和後面進行遍歷。雖然這在陣列上很簡單（透過索引隨機存取很容易），但在鍊錶上卻很麻煩，特別是在單向的鍊結串列上。

However, since quick sort is not stable (Sedgewick
[6]), it is not included. Instead, an algorithm which
was originally designed to make quick sort stable
and to handle equal keys is selected for this study.
This algorithm was first proposed by Motzkin [5]
and then analyzed by Wegner [7].

隨後論文指出了第二個缺點：傳統的快速排序不穩定，這代表具有相同鍵值的節點在排序後可能不會保持其原始順序。由於這種不穩定性，作者選擇放棄傳統的快速排序，而採用一種變體（由 Motzkin 提出並由 Wegner 分析），結合以上兩點，就成為作業利用堆疊來實做快速排序演算法的方式。

第二周

測驗一

程式碼運作原理

首先來看 getnum 函式，其實作上是利用了線性同餘生成器（Linear Congruential Generator, LCG）的原理來產生偽隨機數。

參考： Linear congruential generator – Wikipedia

LCG 的基本公式如下：

N_{j + i} = (A \times N_{j} + B) m o d m

其中

A

B

為常數，且

B

與

M

必須互質。這種方式能有效產生統計上看似隨機的數字，但因為內部狀態容易從輸出推導，因此它屬於偽隨機數產生器（PRNG）。

static inline uint8_t getnum(void)
{
    static uint16_t s1 = UINT16_C(2);
    static uint16_t s2 = UINT16_C(1);
    static uint16_t s3 = UINT16_C(1);

    s1 *= UINT16_C(171);
    s1 %= UINT16_C(30269);

    s2 *= UINT16_C(172);
    s2 %= UINT16_C(30307);

    s3 *= UINT16_C(170);
    s3 %= UINT16_C(30323);

    return s1 ^ s2 ^ s3;
}

使用了三個不同參數的 LCG (s1, s2, s3)。
每次呼叫 getnum() 時，這三個狀態會根據各自的乘數與模數進行更新。
因為變數是 static，它們只在第一次呼叫時初始化，之後會保留上次的狀態，形成一個持續變化的序列。
最後透過 XOR (^) 結合三個數值，以增加隨機性與分佈品質。

static inline void random_shuffle_array(uint16_t *operations, uint16_t len)
{
    for (uint16_t i = 0; i < len; i++) {
        /* WARNING biased shuffling */
        uint16_t j = get_unsigned16() % (i + 1);
        operations[i] = operations[j];
        operations[j] = i;
    }
}

可以發現以上程式碼有以下問題：

無效洗牌

當 i==j 時，可能會造成無效的洗牌。

偏向分佈

使用 % (i + 1) 可能會產生模數偏差，除非 get_unsigned16() 能均勻的生成 1 ~ i+1 的變數。

改進 random_shuffle_array:
可以採用 Fisher-Yates Shuffle 的方式來洗牌，如下:

#include <stdint.h>
#include <stdlib.h>

static inline void random_shuffle_array(uint16_t *operations, uint16_t len)
{
    for (uint16_t i = len - 1; i > 0; i--) {
        uint16_t j = rand() % (i + 1);
        
        // Swap operations[i] and operations[j]
        uint16_t temp = operations[i];
        operations[i] = operations[j];
        operations[j] = temp;
    }
}

並且以熵的方式來衡量兩個洗牌算法之間的亂度：

length of array	Original	Fisher Yate	Ideal entropy
7	12.2954	12.2956	12.2922
8	15.2696	15.2700	15.2992
9	18.1794	18.1802	18.4691

可以發現 Fisher-Yates Shuffle 的效果其實比原本的演算法略好一點。

詳細的 python 腳本實做可以參考：Commit bf5b69f

為何將 `list_move_tail` 換為 `list_move`，會無法滿足 stable sorting?

當兩個元素具有相同的值時相當於 cmpint() == 0，item->list 會被放入 list_greaterelse 中。

如果使用 list_move_tail()，則兩個節點中的第一個會更早出現在 list_greater，從而保持順序。
如果我們使用 list_move()，則第二個節點反而會被插入到前面，這會反轉相等鍵值節點的原始順序。

2025q1 Homework2 (quiz1+2)

第一周

測驗一

程式碼運作原理

合併排序操作

測驗二

程式碼運作原理

參閱 memory-allocators

效能評比

測驗三

程式碼運作原理

研讀論文

第二周

測驗一

程式碼運作原理

為何將 list_move_tail 換為 list_move，會無法滿足 stable sorting?

Read more

2025q1 Homework1 (lab0)

2025q1 Homework1 (ideas)

為何將 `list_move_tail` 換為 `list_move`，會無法滿足 stable sorting?