2025q1 Homework2 (quiz1+2)

contributed by <Dennis40816>

第一週

測驗 1

Solution

AAAA : &l->head
BBBB : before
CCCC : &(*p)->next
DDDD : item->next

Description

本題中使用單向鏈結串列。

先從 list_insert_before 的解釋著手:

/**
 * Inserts a new item into the list before a specified item.
 *
 * This function traverses the list to locate the position immediately before
 * the item pointed to by @before and inserts @item in that position.
 * The time complexity is O(n), where n is the number of steps needed to
 * reach @before from the head of the list.
 *
 * Parameters:
 * @l      : Pointer to the list.
 * @before : Pointer to the item before which the new item should be inserted.
 *           - If @before is the head of the list, the new item is inserted
 *             at the front.
 *           - If @before is NULL, the new item is appended to the end of
 *             the list.
 *           - In all other cases, behavior is undefined if @before does not
 *             belong to @l.
 * @item   : The new list item to be inserted.
 */
static inline void list_insert_before(list_t *l,
                                      list_item_t *before,
                                      list_item_t *item);

得知 before 指向的，是當插入新節點 item 後， item 的下一個元素。

當 before 是頭部節點，item 將被插入在最前方，如果是 NULL 則被插入在最尾端。





list_item_t **p;
for (p = AAAA; *p != BBBB; p = CCCC)
    ;
*p = item;
DDDD = before;

list_insert_before 做的事情是:

找到 before 目前所在的位置 (第 2 - 3 行)
將 before 用 item 替換 (第 4 行)
將 item 的下個元素指向 before (第 5 行)

因此，AAAA - DDDD 的內容推論如下:

尋找 before 應當從頭部節點開始，struct list_t 中僅有成員 head，故 AAAA 為 &l->head。
停止尋找條件為 *p 指向的節點與 before 相同時，故 BBBB 為 before。應注意，list_insert_before 的註釋提到當 before 不存在於 l 中時，可能產生未定義行為，推測將因為對尾部節點的下一個，即 NULL 解引用導致 core dump。
對於每次循環，p 應該往下移動，故 CCCC 為 &(*p)->next。應注意運算子的優先級別，在此為 -> > * > &，詳見 C Operator Precedence，故括弧應僅包含 *p。
將 item 的下一個節點換作 before，故 DDDD 為 item->next。

以 Graphviz 呈現:

當循環開始時，*p 指向 head

當 *p 指向的節點為 before 時，插入 item

將 item->next 指向 before

注意 p 指向的是節點 next 的地址，從圖中看便是指向 next，而 next 再指向下一個節點，可以描述出 p 是 indirect pointer。之所以說優雅是因為頭部節點的地址實際存儲在 &head 記憶體中，「像是」一個 next 的存在，這使的操作得以一致化。

延伸 : 運作原理

測試程式 test_list 的流程如下:

透過函式 list_reset 初始化全域串列與節點陣列，將每個節點的值設定為索引值並清除所有鏈結，對於所有測試前都會先執行。
測試第一種情況，傳入 before 為 l.head 以逆序方式將陣列中的內容插入到鏈結串列中，並確保測試後的鏈結串列數值是反向排列的。
測試第二種情況，傳入 before 為 NULL，以順序方式將陣列中的內容插入到鏈結串列中，並確保數值是順序的。
測試第三種情況，同上一條。

延伸 : 合併排序

可以透過以下方式進行遞迴式單向鏈結串列排序:

透過快慢指標將鏈結串列分成兩段，第一段的頭部由 head 指標指向，第二段頭部則是 slow->next 指向的節點，注意要透過另一個指標(e.g., mid 儲存)。
切斷第一部分和第二部分的鏈結( slow->next == NULL )
將 head 和 mid 傳入 merge_sort_list (遞迴呼叫)
當 head 和 mid 已經排序完並返回後，呼叫 merge_lists 合併兩者為單個鏈結串列。

合併中，當兩個鏈結串列非空時，取出值較小者，放入要返回的鏈結串列尾部，並將被取出值的鏈結串列往下一個移動

實作如下:

/**
 * @brief Merges two sorted linked lists into one sorted list.
 * @param a pointer to the first sorted list.
 * @param b pointer to the second sorted list.
 * @return pointer to the head of the merged sorted list.
 */
static list_item_t *merge_lists(list_item_t *a, list_item_t *b)
{
    list_item_t dummy;             // dummy node to simplify merge operation
    list_item_t *tail = &dummy;     // tail pointer for the merged list
    dummy.next = NULL;
    
    while (a && b) {
        if (a->value <= b->value) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = (a) ? a : b;
    return dummy.next;
}

/**
 * @brief Recursively sorts the linked list using merge sort.
 * @param head pointer to the head of the list.
 * @return pointer to the head of the sorted list.
 */
static list_item_t *merge_sort_list(list_item_t *head)
{
    if (head == NULL || head->next == NULL)
        return head;
    
    // Find the middle of the list using slow and fast pointers
    list_item_t *slow = head;
    list_item_t *fast = head->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    
    // Split the list into two halves
    list_item_t *mid = slow->next;
    slow->next = NULL;
    
    // Recursively sort each half
    list_item_t *left = merge_sort_list(head);
    list_item_t *right = merge_sort_list(mid);
    
    // Merge the two sorted halves
    return merge_lists(left, right);
}

/**
 * @brief Sorts the entire linked list in ascending order.
 * @param l pointer to the list structure.
 */
void list_sort(list_t *l)
{
    if (l == NULL)
        return;
    l->head = merge_sort_list(l->head);
}

藉由以下修改使 merge_lists 消除 if - else:

static list_item_t *merge_lists(list_item_t *a, list_item_t *b)
{
    list_item_t dummy;
    list_item_t **last = &dummy.next;
    dummy.next = NULL;
    
    while (a && b) {
        list_item_t **min = (a->value <= b->value) ? &a : &b;
        *last = *min;
        last = &(*last)->next;
        *min = (*min)->next;
    }
    *last = a ? a : b;
    return dummy.next;
}

測驗 2

Solution

EEEE : (*pred_ptr)->r
FFFF : &(*pred_ptr)->r

Description

當要快速解題時，著重在以下內容:

/* Find the in-order predecessor:
 * This is the rightmost node in the left subtree.
 */
block_t **pred_ptr = &(*node_ptr)->l;
while (EEEE)
    pred_ptr = FFFF;

註釋提到要找左子樹中最右下角的節點，起始位置已經設定成左子樹，因此我們只需要確認每個節點是否具有右子節點。如果有則更新成該節點並繼續，反之結束。

EEEE 要確認是否具有右子節點(非 NULL )，故用 (*pred_ptr)->r
FFFF 是更新節點，pred_ptr 是 indirect pointer 故為 &(*pred_ptr)->r。
注意 * 和 -> 的優先級。

延伸 : 分析 & 解讀

先探討試題中的內容。

首先看到 block_t 定義:

typedef struct block {
    size_t size;
    struct block *l, *r;
} block_t;

其中 size 用於紀錄該 block 的大小 (bytes)。

接著有兩個未提供的實作的函式，我們稍後探討這部分的實作:

block_t **find_free_tree(block_t **root, block_t *target);
block_t *find_predecessor_free_tree(block_t **root, block_t *node);

關鍵函式 remove_free_tree:

void remove_free_tree(block_t **root, block_t *target)

裡面用到幾個關鍵變數，如下表:

Variable	Type	Description
`target`	`block_t*`	A pointer to the target node, i.e., the node that is intended to be removed from the tree.
`pred_ptr`	`block_t**`	An indirect pointer to the field in the left subtree of the target node, used to locate its in-order predecessor (i.e. the rightmost node in the left subtree).
`node_ptr`	`block_t**`	An indirect pointer pointing to the field (in the parent or root) that holds the target node's address, enabling direct modification of the tree structure during removal.

remove_free_tree 可以分成三個階段:

在樹中找到 target ，並用其親代的左或右成員地址表達，並紀錄在 node_ptr 中。
根據 target 的左節點和右節點，在移除前更新親代的子節點，避免直接移除破壞樹結構。
移除 target 的子節點，避免被刪除的節點還能存取樹中的其他節點。

針對第二點，可以分為三種情況:

target 同時有左和右子節點。
target 僅具有單邊節點。
target 不具有任何子節點。

第二種情況只要將 node_ptr 指向唯一的子節點即可:

*node_ptr = child;

第三種情況則更新成 NULL :

*node_ptr = NULL;

Note

我認為第二種情況可以和第三種合併，因為第三種情況不論取左右子節點，都是 NULL。

針對第一種情況，要先尋找到目標節點的中序前驅節點。中序前驅中，中序是走訪方式(左 ➞ 中 ➞ 右)；前驅則指最靠近目標節點，但比目標節點小的節點。因此中序前驅可以理解為前一個。注意靠近並非路徑靠近，而是值靠近，例如此處應為 size 接近。

對於二元搜尋樹 (Binary Search Tree, BST) 而言，中序前驅節點位於左子樹最右下角的節點，反之中序後繼則在右子樹最左下角。

當找到中序前驅節點後，分兩種情況:

左子樹中沒有任何右節點 ( *pred_ptr == (*node_ptr)->l )
其他

對於前者:

紀錄 target 右節點 ( (*node_ptr)->r )。
- 不用紀錄 target 左節點的左子樹原因是，即將用 target 左節點替換掉 target，其本身的 l 成員已有該資訊，缺乏的僅是右側節點的資訊。
將原本指向 target 的 *node_ptr 改指向中序前驅節點 *pred_ptr，注意不是 node_ptr = pred_ptr 而是 *node_ptr = *pred_ptr，後者才會改變樹結構。
確保沒有顯而易見的環 (指向自身)。

對於後者:

紀錄 target 左右節點資訊。
用另一個指標保存中繼前驅節點，用於解決下一步從樹中移除中繼前驅後，其 l 成員會被置 NULL 導致遺失左節點的問題，用 pred_node 紀錄。
將中繼前驅從原位置移除(再次呼叫 remove_free_tree )
變更 *node_ptr 指向 pred_node 指向的節點。注意不可用 node_ptr = &pred_node 因為 pred_node 是區域變數。
檢查是否有顯而易見的環。

延伸 : 實作 free tree

實作前可以思考以下問題:

為什麼 find_free_tree 需要返回 block_t** ?

如果只是要修改 block_t 內容，block_t* 就足夠了。但若要更新其親代節點的子節點，則應返回 block_t**，並透過以下方式進行更新:

block_t **ptr = find_free_tree(root, target);
if (ptr) {
    // With an indirect pointer, we can directly modify the parent's pointer.
    
    // Get size
    size_t size = (*ptr)->size;
    
    // For example, remove target by updating the parent's pointer to point to target->l.
    *ptr = target->r; 
}

我們不用再特別尋找親代節點的指標再透過該指標去修改，而是直接返回親代的左子樹或右子樹 field 的地址 ( &parent->l or &parent->r )，如此便能直接藉由一次解引用更新裡面指向的節點。 remove_free_tree 中的使用方法便是如此 ( *node_ptr = *pred_ptr ) 。

下圖描述返回值 block_t** ( ptr ) 所指向的位置。

我們需要實作以下的函式:

create_block
insert_free_tree
remove_free_tree (同上方實作，移除檢查部分)
find_free_tree
free_free_tree
find_predecessor_free_tree (optional)
print_free_tree (optional)

以下是實現方法，運行範例請見 Compiler Explorer:

create_block

block_t *create_block(size_t size) {
    block_t *node = malloc(sizeof(block_t));
    if (!node) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    node->size = size;
    node->l = node->r = NULL;
    return node;
}

insert_free_tree

當找到 NULL 處即可插入，以下是迭代寫法:

void insert_free_tree(block_t **root, block_t *node) {
    if (root == NULL || node == NULL) return;

    while (*root != NULL) {
        if (node->size < (*root)->size)
            root = &(*root)->l;
        else
            root = &(*root)->r;
    }
    *root = node;
}

遞迴寫法:

void insert_free_tree(block_t **root, block_t *node) {
    if (root == NULL) return;
    if (*root == NULL) {
        *root = node;
        return;
    }
    if (node->size < (*root)->size) {
        insert_free_tree(&((*root)->l), node);
    } else {
        insert_free_tree(&((*root)->r), node);
    }
}

find_free_tree

採用迭代走訪子樹，可以善用 BST 的特性，以 size 為判斷條件決定繼續走訪左子節點(小於)還是右子節點(大等於)。當 *root 為 target 時就返回。

block_t **find_free_tree(block_t **root, block_t *target) {
    while (root && *root) {
        if (*root == target) {
            return root;
        }
        if (target->size < (*root)->size) {
            root = &(*root)->l;
        } else {
            root = &(*root)->r;
        }
    }
    return NULL;
}

當然也能透過遞迴作法實現:

block_t **find_free_tree(block_t **root, block_t *target) {
    if (root == NULL || *root == NULL)
        return NULL;
    if (*root == target)
        return root;
    block_t **found = find_free_tree(&(*root)->l, target);
    if (found != NULL)
        return found;
    return find_free_tree(&(*root)->r, target);
}

free_free_tree

void free_free_tree(block_t *root) {
    if (root == NULL)
        return;
    free_free_tree(root->l);
    free_free_tree(root->r);
    free(root);
}

延伸 : 參考 memory-allocators 的效能比較與討論

要使當前程式為真正的記憶體配置器，要針對 block_t 結構體進行修改，例如需要紀錄真正可以讀寫的 buffer 地址。

可改進

BST 非常依賴 root，其決定該樹的平衡狀況，如果 root 為極端值，則操作速度會退化，是否能使用其他平衡樹?

Reference

charliechiu : linux2025-homework2

測驗 3

Solution

GGGG : head->prev=prev
HHHH : list_entry(pivot,node_t,list)->value
IIII : list_entry(n,node_t,list)->value
JJJJ : pivot
KKKK : right

Description

先看到 GGGG 所在的輔助函式:

static void rebuild_list_link(struct list_head *head)

只做以下流程:

用 prev 紀錄目前 head。
找到 head->next 用 node 紀錄。
進入循環，由於目前是單向鏈結串列，故停止條件為 node 不是 NULL (走訪到尾)。
循環內透過 prev, node 等操作，建立沿途每個元素的 prev 關係 ( next 已經被建立)。
離開循環後，使鏈結串列構成環，即將最後一個元素 prev 的 next 指向 head ，而 head->prev 則指向最後一個元素 prev。故得 GGGG 為 head->prev=prev。

接著看到 HHHH 的關鍵程式:

if (L != R) {
    struct list_head *pivot = L;
    value = /* HHHH */
    // ...
    if (n_value > value) {
        n->next = right;

可知 value 是比較的基準值，在快速排序中，我們以 pivot 的值作為基準，因此 value 是類似 to_node_t_func(pivot)->value，這可以透過 list_entry 巨集達成。由於 list_head 儲存在成員 list 中。故 HHHH 為 list_entry(pivot,node_t,list)->value。

接著移動到 IIII :

struct list_head *n = p;
p = p->next;
int n_value = /* IIII */;

我們能夠推斷 n_value 實際上就是 to_node_t(n)->value，也就是 list_entry(n,node_t,list)->value 即 IIII。

最後的 JJJJ 和 KKKK 可以放在一起看。快速排序中，合併後的順序是 left ➜ pivot ➜ right。題目中的作法是從取出第 begin[i]，而 i 是遞減的 ( i-- )，也就是實際合併時，會先將 begin[i+2] 處理後的放到 result 前面，之後是 begin[i+1] 處理後的，…。所以 right 應該要放置在第一個被拿取的地方，即:

begin[i] = left;
begin[i + 1] = pivot; /* JJJJ */
begin[i + 2] = right; /* KKKK */

延伸 : 解讀程式

在〈Optimized QuickSort — C Implementation (Non-Recursive)〉提到的陣列快速排序流程分成作者自己提出的和本來就存在於 Wiki 百科上的，可以用以下文字解釋:

Darel Rex Finley's Quick Sort

將 head 指向的元素複製到 pivot 中，L 從最左側開始，R 從最右側開始

R 往左側移動直到遇到比 pivot 小的值，此處為第七個元素 1，因此將該值複製到 L 指向的空間

R 已經置換過，換成 L 往右邊移動直到遇到比 pivot 大的值，此處為第二個元素 5，將該值複製到 R 指向的地方

繼續往左移動 R，最終指到第五個元素 2

接下來又換到 L，停在第四個元素 6

此時，換 R，但此時 R 碰到 L 故結束，將 pivot 的值複製進 L 指向的地方，並更新 begin 和 end

第一輪開始前，begin[i] 是 0、end[i] 是 8
第一輪結束後，begin[i+1] 是 L+1 (4)、end[i+1] 是 end[i] (8)、end[i] 被更新為 L (3)，且 i 往上遞增 1 為 1

繼續從 i = 1 開始，L 的新值為指向 index 為 4 的 6，R 指向 index 為 7 的 7，pivot 值改為 6

接著 R 往左移動到第七個元素 5，並將該值複製到 L 處

L 往右移動到第六個元素 8 時，複製到 R

換 R 但碰到 L，將 pivot 複製到 L 後結束第二輪

第二輪開始前，begin[i] 是 4、end[i] 是 8
第二輪結束後，begin[i+1] 是 L+1 (6)、end[i+1] 是 end[i] (8)、end[i] 被更新為 L (5)，且 i 往上遞增 1 為 2

第三輪，L 指向第七個元素 8，R 指向第八個元素 7。由於 R 當前指向的已經小於新的 pivot 值 8，因此將該值複製到 L。

而此時顯而易見的是，L 將會碰到 R，所以 pivot 的值將被放入最後一個元素中

達到該步驟後，剩下的就是更新 begin、end 並重走訪先前排序好的子陣列。剩下的過程中不會有任何除了 L 的值被替換(被 pivot 替換)，直到 i < 0

分析完陣列的作法後，結合原本題目的說明來看看鏈結串列的快速排序吧!
題目中一開始舉的例子省略中間步驟，導致有點難快速理解，其流程如下:

L 一開始在 array[0]，R 在 array[5]，當 R 往左邊走一格就碰到了小於 pivot 的值，因此發生 array[0] = array[4]。
換 L 移動，到 array[3] 發現大於 pivot 的值，所以 array[3] = array[4]
L 和 R 碰到，把 pivot 值給 array[3]

接著看到非 Linux Kernel list 的 quick_sort 實現，有幾點可以注意:

max_level 設定成 2n 是確保在極端情況下，不發生數組越界。
透過 list_tail 獲得最後一個元素 (相當於 R = end[i] - 1 )
L 從 pivot 的下一個開始(和陣列不一樣)
鏈結串列不使用 L 和 R 的複製，因為交換成本高，而是拆分成兩個子鏈結串列，小於等於的放 left，大於的放 right
只有 begin 沒有 end，放置到 begin 的順序按照索引順序為 left、 pivot、right

這些部分也會在用 list 實作的 quick_sort 中。

實作中有三處遮罩:

CCCC : p->next
DDDD : list_tail(&left) (先放左邊)
EEEE : list_tail(&right) (再放右邊)

最後看回使用 list 實作的 quick_sort，其測試的流程是透過 shuffle 先打亂有 100000 的鏈結串列，之後使用 quick_sort 排序並透過 list_is_ordered 驗證。

shuffle 使用 rand 實作，這能透過先前實作 PRNG 進行改進。
list_tail 在這邊沒有使用 prev 的作法是因為 quick_sort 過程中，會只維護單邊 ( netx 方向) 的鏈結，到最後才從新建立另一邊鏈結。

延伸 : 雙向鏈結的排序演算法考量

研讀〈A comparative study of linked list sorting algorithms〉

第二週

測驗 1

Solution

AAAA : list_first_entry
BBBB : list_del
CCCC : list_move_tail
DDDD : list_add
EEEE : list_splice
FFFF : list_splice_tail

Description

該測驗延續上週的快速排序，差別是更改了 API 讓使用者能夠傳入自定義的比較函式指標。

觀察核心程式碼，在開始時檢查邊緣情況確認是否能提早返回。接著進行 list_head 的初始化，再往下碰到了 AAAA。快速排序通常取頭部節點作為 pivot，因此 AAAA 對應的操作是取出成員 list 為 ad 的節點地址操作，如此後續才能執行如 pivot->list 等操作。故 AAAA 為 container_of。而由於 pivot 要移出鏈結串列外，因此 BBBB 為 list_del。

接著來到比較大小的部分，比 pivot 小的放到 list_less 中，其它放到 list_greater，應注意題目要求符合 stable sorting，意謂要符合原鏈結串列順序。因此插入時應插入在尾端。故 CCCC 和另一分支都使用 list_move_tail。

分區完成後，透過遞迴方式呼叫 list_quicksort，並於遞迴結束後，按序將 list_less、pivot、list_greater，題目中首先操作的是 pivot，因此可以使用 list_add 將其先加入 head 中，同時也是 head 的唯一節點。接著透過 EEEE 和 FFFF 分別操作 list_less、list_greater，如前述 EEEE 要讓 list_less 在 head 唯一的節點前，因此是從頭部插入，相反 FFFF 則要從尾部插入。故 EEEE 為 list_splice 而 FFFF 為 list_splice_tail。

在此，我們記錄 list_splice 和 list_cut_position 的功能。前者用於拼接鏈結串列，注意拼接後的哨兵節點 (例如 list_less) 是不能直接承接其他節點的!需要進行初始化後才能繼續使用。或者使用 list_splice_init 來簡化該流程。

list_cut_position 是輔助使用者從 head 的第一個節點到指定節點(含)移動到另一個鏈結串列中，剩下的留在原鏈結串列中。

延伸 : 改進 random_shuffle_array

在 random_shuffle_array 註釋提到:

/* WARNING biased shuffling */

所謂 biased shuffling 是指取樣時，由於使用取餘操作而結果非 0 時，會導致小於取餘結果的值有更高的機會被取樣。儘管機率很小，依然會降低隨機性。

可以藉由移除多出的尾端，使每種取餘結果出現的次數是相同的。

static uint16_t uniform_rand(uint16_t n)
{
    uint32_t range = (uint32_t)UINT16_MAX + 1;
    uint32_t bound = range - (range % n);          /* upper bound */
    uint16_t x;
    do {
        x = get_unsigned16();                      
    } while ((uint32_t)x >= bound);                /* reject if in the “extra” tail */
    return x % n;
}

static inline void random_shuffle_array(uint16_t *ops, uint16_t len)
{
    for (uint16_t i = 0; i < len; i++) {
        /* unbiased shuffling */
        uint16_t j = uniform_rand(i + 1);
        operations[i] = operations[j];
        operations[j] = i;
    }
}

但要注意，目前的作法是拒絕取樣(rejection sampling)，極端情況下可能會經歷多論迴圈一直重新取樣，這種非常數的運行時間有被惡意偵測去猜測(只要猜測尾端)的可能性!

延伸 : list_move_tail 換為 list_move 造成的影響

若將 list_move_tail 置換成 list_move 會使得原先在後的節點移動到新鏈結串列時被插入到頭部，最終得到的鏈結串列就會和原本的順序顛倒，也會使排序過程變成非穩定的。

穩定排序對於多重排序非常重要，例如 PCI 匯流排上的裝置被掃描時，會依照 bus number、device number、function number 等進行排序，若在排序第三種鍵時，發現兩個目標值相同而改變了兩者順序，可能導致依照第一、二種鍵的排序被破壞! 因此，應 list_move_tail 不能被 list_move 替換。

測驗 2

Solution

GGGG：14
HHHH：2
IIII：0
JJJJ：3
KKKK：2
LLLL：1
MMMM：~1
NNNN：
PPPP： 2

Description

首先觀察 clz2:














static const int mask[] = {0, 8, 12, GGGG};
static const int magic[] = {HHHH, 1, 0, IIII};

unsigned clz2(uint32_t x, int c)
{
    if (!x && !c)
        return 32;

    uint32_t upper = (x >> (16 >> c));
    uint32_t lower = (x & (0xFFFF >> mask[c]));
    if (c == JJJJ)
        return upper ? magic[upper] : KKKK + magic[lower];
    return upper ? clz2(upper, c + 1) : (16 >> (c)) + clz2(lower, c + LLLL);
}

對照題目給的 step 1 - 3，其中:

step 1 : line 9 - 10
step 2 : line 13
step 3 : line 11 - 12

step 1 中，根據 #define clz32(x) clz2(x, 0) 可知道 c 的初始值是 0。只要上半部非 0，c 每次遞增 1。因此 upper 的右移位數將會是 16、8、4、2 (根據 step 3，會在 2 bits 的情況時結束遞迴)。
對於 lower，遮罩 0xFFFF >> mask[c] 中，低位為 1 的數量與當前的 upper 位數 (也就是 upper 右移位數) 相同。也就是低為是 1 的數量也是 16、8、4、2。mask 陣列和上述數量的總和固定會是 16，所以 mask 如下:

static const int mask[] = {0, 8, 12, 14}; // GGGG == 14

接著讓我們跳到 step 3，對於停止條件是 2 bits，也就是 c 等於 3 時。故 JJJJ 為 3。line 12 告訴我們，當 upper 非 0，返回 magic[upper]，否則返回 KKKK + magic[lower]。變相告訴我們:

magic[1:3] 跟高半段有關 ( upper 為 0 會走另一個分支)。
magic[0] 只跟低半段有關。

magic 陣列是針對 0b00、0b01、0b10、0b11 而言，各提供了幾個 leading zero，因此 magic 陣列為

static const int magic[] = {2, 1, 0, 0}; // HHHH == 2, IIII == 0

對於 KKKK，當 2 bits 下，upper 是 0，代表 upper 固定提供兩個 leading zero，故 KKKK 為 2。

最終回到 step 2，重點在處理完 upper 後處理低半段的地方:

(16 >> (c)) + clz2(lower, c + LLLL)

其中:

16 >> c : 代表當前 upper 全是 0 會提供的 leading zero 數量。例如 c 是 0 也就是第一輪 upper 就都是 0，就會提供 16 >> 0 個 leading zero。
clz2(lower, c + LLLL) : 代表繼續處理下半部，每次遞迴 c 都遞增，故 LLLL 為 1。

探討完 clz2 和 clz32，可以繼續擴展至 clz64，後者依照上半段是否全 0 分為兩種情況

上半段全 0，使用 clz32 處理下半段 + 32 (上半段全 0)。
上半段不是全 0，使用 clz32 處理上半段。

至此，輔助函式探討完畢，進入 sqrti，參數是 x。根據註解，我們要計算 m，該值由 1 << shift 表達，且 m 要是 4 的冪。而 shift 為 fls(x) & MMMM。首先，fls 代表 find last set，即最高為是 1 的位元索引。該部分可以由 clz64 改寫，即 line 2:





int total_bits = 64;
int fls = total_bits - 1 - clz64(x); /* here */
int shift = fls & MMMM;

int m = 1 << shift;

m 要求是 4 的冪代表了 shift 必須偶數，如此 m 就會是

2^{s h i f t} = 4^{s h i f t / 2}

，確保其為 4 的冪。進而推斷 bitwise and MMMM 會使 fls 向下取偶，例如 29 會變成 28。該過程可以透過將第 0 位放 0 其他放 1 達成。故 MMMM 為 ~1，注意 ~ 的優先級比 bitwise and 高，因此不需要加上括號。

NNNN 和 PPPP 放在下節說明。

延伸 : sqrti 原理 (手算平方根)

延伸我們手算平方根的原理，可以推導出以下方式:

流程

X：待開平方的整數，例：十進制的 12312 或二進制的 0b100101
n：進制基底，十進制時 (
$n = 10$ )，二進制時 (
$n = 2$ )
分組（group）：將 (
$X$ ) 的數位從最高位起，每次「兩位」作為一組；十進制即「兩個十進位位元」，二進制即「兩個 bit」
P：已經確定的高位部分的商（partial root）
rem：剩餘前綴（remainder），即還沒開平方的那部分
d：本輪要嘗試的下一位商，範圍 (
$0 \leq d < n$ )

逐位試商方法可在任意進制

n

下高效求整數平方根。首先將整數

X

從最高位起，每次成組（兩位）帶下作為前綴。假設已得高位商為

P

，剩餘前綴為

rem

，我們在

0 \leq d < n

中找最大

d

，使得新商

P n + d

的平方不超過前綴。

透過 「帶下 → 試商 → 扣除 → 更新商」 的迭代，直至所有組處理完畢，即可得到

⌊ \sqrt{X} ⌋

。

根據平方和公式:

(P n + d)^{2} = P^{2} n^{2} + 2 P n d + d^{2} = P^{2} n^{2} + (2 P n + d) d

由於每次計算都已經會從

rem

扣除

P^{2} n^{2}

，因此我們在試商時使用的判斷式便是:

(2 P n + d) d \leq rem

P

的更新則是

P = P n + d

範例

讓我們以 10 進制開始示範，以 12312 為例子，此處

2 P n

可以寫成

20 P

。

分組

$12312 ⟶ (1) (23) (12)$
處理第一組
$(1)$
- 帶下：
  $rem = 0 \times 100 + 1 = 1$
- 除數底數：
  $P = 0$ ,
  $20 P = 0$
- 試商：最大
  $d = 1$ ，因
  $(0 + 1) \times 1 = 1 \leq 1$
- 更新：
  $rem = 1 - 1 = 0, P = 0 \times 10 + 1 = 1$
處理第二組
$(23)$
- 帶下：
  $rem = 0 \times 100 + 23 = 23$
- 除數底數：
  $P = 1$ ,
  $20 P = 20$
- 試商：
  
  $d = 1 : (20 + 1) \times 1 = 21 \leq 23, d = 2 : (20 + 2) \times 2 = 44 > 23$
  取
  $d = 1$
- 更新：
  $rem = 23 - 21 = 2, P = 1 \times 10 + 1 = 11$
處理第三組
$(12)$
- 帶下：
  $rem = 2 \times 100 + 12 = 212$
- 除數底數：
  $P = 11$ ,
  $20 P = 220$
- 試商：
  
  $d = 0 : (220 + 0) \times 0 = 0 \leq 212, d = 1 : (220 + 1) \times 1 = 221 > 212$
  取
  $d = 0$
- 更新：
  $rem = 212 - 0 = 212, P = 11 \times 10 + 0 = 110$
結果驗證

$110^{2} = 12100 \leq 12312 < 111^{2} = 12321$

因此答案是 110。

以 2 進制寫成 C code，

d \in {0, 1}

，

(4 P + d) d

作為試除數，

P = 2 P + d

uint64_t isqrt_direct(uint64_t X) {
    uint64_t P   = 0; // partial root (accumulated result)
    uint64_t rem = 0; // remainder after bringing down groups

    // Compute highest group index k (each group = two bits)
    // (63 - clz64(X)) gives the index of the top bit; >>1 to get group count
    int k = (63 - clz64(X)) >> 1;

    // Process each group from most significant (k) down to 0
    for (int i = k; i >= 0; --i) {
        // 1. Bring down the next group of two bits into rem:
        //    rem = rem * 4 + G_i, where G_i = (X >> (2*i)) & 0x3
        rem = (rem << 2) | ((X >> (2 * i)) & 0x3);

        // 2. Form the trial divisor t = 4*P + 1
        uint64_t t = (P << 2) | 1;

        if (rem >= t) {
            // d = 1 case: subtract t and set next root bit to 1
            rem -= t;           // rem -= (4*P + 1)
            P   = (P << 1) | 1; // P = 2*P + 1
        } else {
            // d = 0 case: set next root bit to 0
            P <<= 1;            // P = 2*P
        }
    }

    return P;
}

其中迴圈對

P

的操作能夠更簡化些:

for (int i = k; i >= 0; --i) {
    rem = (rem << 2) | ((X >> (2 * i)) & 0x3);

    uint64_t t = (P << 2) | 1;
    P <<= 1;
    if (rem >= t) {
        rem -= t;
        P |= 1;
    }
}

延伸 : sqrti 原理 (多元平方和)

題目中的作法可以參考從 √2 的存在談開平方根的快速運算 - Digit-by-digit Method、 Hacker's Delight Edition 2 中 11-2 Hardware Algorithm 和 2025-02-25 問答簡記。

$N^{2}$ ：要開平方的整數。
$n$ ：最高位的索引。將
$N$ 在所選進制下展開為

$N = a_{n} + a_{n - 1} + \dots + a_{0}$
後，最大的次方指標即為
$n$ 。
$m$ ：當前迴圈的位元指標，從
$n$ 依序遞減到
$0$ ，用以決定第
$m$ 位的貢獻。
$a_{m}$ ：第
$m$ 位的貢獻值。
- 二進制：
  $a_{m} \in {0, 2^{m}}$
- 十進制：
  $a_{m} \in {0, 1 \cdot 10^{m}, \dots, 9 \cdot 10^{m}}$
$P_{m + 1}$ ：已經「放入」的高位和，

$P_{m + 1} = \sum_{i = m + 1}^{n} a_{i} .$
注意，我們計算時會從高位往低位計算，因此
$P_{m + 1}$ 會比
$P_{m}$ 更早被計算。
$P_{m}$ ：加入第
$m$ 位後的部分和

$P_{m} = P_{m + 1} + a_{m} .$
$X_{m + 1}$ ：第
$m + 1$ 輪結束後剩餘的餘量，初始為

$X_{n + 1} = N^{2} .$
$Y_{m}$ ：若放入第
$m$ 位所產生的增量，

$Y_{m} = (2 P_{m + 1} + a_{m}) a_{m} .$
$X_{m}$ ：第
$m$ 輪結束後的新餘量，

$X_{m} = X_{m + 1} - Y_{m} .$

老師推導中的核心被開根號數 (

N^{2}

) 可以被拆解成多項和

(a_{n} + a_{n - 1} + . . . + a_{0})^{2}

，且多元平方和中，如果多增加一項 (

a_{m}

)，則增量 (

Y_{m}

) 可以由下式表達:

Y_{m} = (2 P_{m + 1} + a_{m}) a_{m} .

此處

P_{m + 1}

是指從最前面的項往下加到

a_{m}

前一項，例如對於

N = a_{n} + . . . + a_{m + 1} + a_{m}

，注意此處

m + 1

代表的是

m

的前一項而不是後一項:

P_{m + 1} = a_{n} + a_{n - 1} + . . . + a_{m + 2} + a_{m + 1}

這是很好的特性，因為能知道增加下一項

a_{m}

會不會超過

N^{2}

。將此概念套用到套用到 2 進制整數系統後，數字

x

可表示成 :

x = (\underset{leading zeros}{\underset{⏟}{00000}} \underset{value}{\underset{⏟}{b_{n} b_{n - 1} \dots b_{0}}})_{2}^{2} = ⌊ N^{2} ⌋

此時，

⌊ N^{2} ⌋

可表示成:

\begin{aligned} ⌊ N^{2} ⌋ & = (b_{n} \times 2^{n} + b_{n - 1} \times 2^{n - 1} + \dots + b_{1} \times 2^{1} + b_{0} \times 2^{0})^{2} \\ = (a_{n} + a_{n - 1} + \dots + a_{0})^{2} \end{aligned}

此時，誤差

X_{m}

可表達為 :

X_{m} = N^{2} - P_{m}^{2} = X_{m + 1} - Y_{m}

而

Y_{m}

可進一步表達成

\begin{aligned} Y_{m} & = (2 P_{m + 1} + a_{m}) a_{m} = (2 P_{m + 1} + b_{m} 2^{m}) (b_{m} 2^{m}) \\ = 2 b_{m} P_{m + 1} 2^{m} + b_{m}^{2} 2^{2 m} = {\begin{cases} 0, & b_{m} = 0, \\ P_{m + 1} 2^{m + 1} + 2^{2 m}, & b_{m} = 1. \end{cases} \end{aligned}

我們能將其拆解成

C_{m} = P_{m + 1} 2^{m + 1}

和

D m = 2^{2 m}

其中

C_{m}

可寫成:

\begin{aligned} C_{m} & = P_{m + 1} 2^{m + 1} \\ = (P_{m} - a_{m}) 2^{m} 2 \\ = 2 P_{m} 2^{m} - 2 a_{m} 2^{m} \\ = 2 c_{m - 1} - 2 a_{m} 2^{m} . \end{aligned}

移向後得:

C_{m - 1} = \frac{C_{m}}{2} + 2 a_{m} 2^{m} = \frac{C_{m}}{2} + D_{m}

D_{m}

的關係則如下:

D_{m} = a_{m} 2^{2 m} = a_{m} 2^{2 (m - 1) + 2} = a_{m} D_{m - 1} \times 4

移向後得

D_{m - 1} = {\begin{cases} \frac{D_{m}}{4}, & a_{m} = 1 \\ 0, & a_{m} = 0 \end{cases}

對應到題目中的實作可知:

b :
$Y_{m} = C_{m} + D_{m}$
y :
$C_{m} = \frac{C_{m + 1}}{2} + D_{m}$
m :
$D_{m}$ ，不論如何持續右移 2 bits

故 NNNN 對應 1 (

C_{m - 1} = C_{m} / 2

，右移 1)，PPPP 對應 2 (

D_{m - 1} = D_{m} / 4

，右移 2)。

另外，若不希望分支存在，可以使用:

t = (int64_t)(x | ~(x - b)) >> 63; // -1 (all set) if x >= b, else 0 (all unset).
x = x - (b & t);
y = y + (m & t);

延伸 : sqrti 向上取整

向上取整的實作可以基於前述向下取整的實作，只需要判斷 x 殘留值是否大於 0。因此返回值更改為以下即可:

return y + (x > 0);

根據 C11 §6.5.8 - 6

Each of the operators < (less than), > (greater than), <= (less than or equal to), and >= (greater than or equal to) shall yield 1 if the specified relation is true and 0 if it is
false.
$^{107)}$ The result has type int.

判斷大小的運算子只可能返回 0 或 1，因此可用來進行向上取整。

延伸 : sqrtf

測驗 3

延伸 : 證明 Theorem S

When we successively insert the points

{θ}, {2 θ}, \dots

into the interval

[0, 1]

, Theorem S asserts that each new point always splits one of the largest remaining subintervals.

If the interval

[a, c]

is thereby broken into two parts

[a, b]

and

[b, c]

, we call it a bad break if one of these parts is more than twice as long as the other, namely if

$b - a > 2 (c - b)$ , or
$c - b > 2 (b - a)$ .

Exercise 8 (Section 6.4): Prove that bad breaks will occur for some

{n θ}

unless

θ mod 1 = φ^{- 1} or θ mod 1 = φ^{- 2},

and that for these latter values of

θ

, no bad breaks ever occur.

2025q1 Homework2 (quiz1+2)

第一週

測驗 1

Solution

Description

延伸 : 運作原理

延伸 : 合併排序

測驗 2

Solution

Description

延伸 : 分析 & 解讀

延伸 : 實作 free tree

延伸 : 參考 memory-allocators 的效能比較與討論

可改進

Reference

測驗 3

Solution

Description

延伸 : 解讀程式

延伸 : 雙向鏈結的排序演算法考量

研讀 〈A comparative study of linked list sorting algorithms〉

第二週

測驗 1

Solution

Description

延伸 : 改進 random_shuffle_array

延伸 : list_move_tail 換為 list_move 造成的影響

測驗 2

Solution

Description

延伸 : sqrti 原理 (手算平方根)

延伸 : sqrti 原理 (多元平方和)

延伸 : sqrti 向上取整

延伸 : sqrtf

測驗 3

延伸 : 證明 Theorem S

Read more

2025q1 Homework1 (lab0)

2025q1 Homework3 (kxo)

ORB-SLAM3 for ROS2 on Ubuntu 24.04

kxo Code Write-Up

研讀〈A comparative study of linked list sorting algorithms〉