2025q1 Homework2 (quiz1+2)

contributed by < BigMickey69 >

quiz1

測驗一

填空格 - list_insert_before 函式：

list_item_t **p;
for (p = &l; *p != before; p = &(*p)->next)
    ;
*p = item;
item->next = before;

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

測驗一的程式碼目的是將 items 這個 array 中的所有元素插入一 list l 之中，先是插到 l.head 前，再檢查結束後的清單大小是否為N。

/* Test inserting at the beginning */

Image Not Showing Possible Reasons
The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted
Learn More →

再來是插在 NULL 前（清單最後），並檢查節點 value 是否正確。

/* Test inserting at the end */

Image Not Showing Possible Reasons
The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted
Learn More →

最後一樣是插到最後，但插入結束後只會檢查清單大小是否為N。

#define my_assert(test, message) \
    do {                         \
        if (!(test))             \
            return message;      \
    } while (0)

這邊使用 do-while 可使巨集變得好幾行，行為更貼近函式。

為什麼使用巨集而不使用函式？

my_run_test 與 my_assert 都使用於回傳 char* 的函式中，若使用函式的話就得多一行偵測它回傳的值，才能再決定是否 return。

e.g.
使用 function :

const char* check_value(int x) {
    const char* error = my_assert(x > 0, "x must be positive");
    if (error) return error;
    return "OK";
}

使用 macro :

const char* check_value(int x) {
    my_assert(x > 0, "x must be positive");
    return "OK";
}

耗更多記憶體
速度更慢
多一行程式碼

測驗二

我認為提供的程式碼試圖模擬 allocater 的工作，在控制的環境中模擬出配置記憶體與分頁等。

結構為樹狀、每個 block 都有兩個子節點，而函式 remove_free_tree 的目標是利用 Binary Search Tree 的概念去尋找 target 並釋放記憶體。

若沒有子節點 -> 直接刪除
若有一子節點 -> 子節點取代自己
若有兩子節點 -> 找出 in-order predecessor 來取代自己
Image Not Showing Possible Reasons
- The image was uploaded to a note which you don't have access to
- The note which the image was originally uploaded to has been deleted
Learn More →

填空：

if ((*node_ptr)->l && (*node_ptr)->r) {
        /* Find the in-order predecessor:
         * This is the rightmost node in the left subtree.
         */
        block_t **pred_ptr = &(*node_ptr)->l;
        while ((*pred_ptr)->r)
            pred_ptr = &(*pred_ptr)->r;

要求：

解釋上方程式碼運作的原理，並嘗試補完程式碼，使其可獨立執行，應提供相關的測試程式碼
參閱 memory-allocators，針對你補完 (及後續改進) 的記憶體配置器，進行效能評比，並解讀其效能表現

補上缺的兩函式：

block_t **find_free_tree(block_t **root, block_t *target) {
    if (!root || !*root) return NULL;

    if (target->size < (*root)->size) {
        return find_free_tree(&(*root)->l, target);
    }
    else if (target->size > (*root)->size) {
        return find_free_tree(&(*root)->r, target);
    } 
    else {
        if (*root == target)
            return root;
        
        block_t **returner = find_free_tree(&(*root)->l, target);
        if (!returner) 
            returner = find_free_tree(&(*root)->r, target);
        return returner;
    }
}



block_t *find_predecessor_free_tree(block_t **root, block_t *node) {
    if (!root || !*root || !node->l)
        return NULL;

    block_t *cur = node->l;
    while (cur->r)
        cur = cur->r;

    return cur;
}

我認為若是要記憶體配置，則不太可能只有每個 size 指出現過一次。因此先用 BST 去找正確的位置，再從那邊出發找完全相符的目標元素。

測試程式碼：

int main() {
    block_t t_root = {50, NULL, NULL};
    block_t n1 = {30, NULL, NULL};
    block_t n2 = {70, NULL, NULL};
    block_t n3 = {20, NULL, NULL};
    block_t n4 = {40, NULL, NULL};
    block_t n5 = {60, NULL, NULL};
    block_t n6 = {80, NULL, NULL};

    /* Manually constructing BST */
    t_root.l = &n1;
    t_root.r = &n2;
    n1.l = &n3;
    n1.r = &n4;
    n2.l = &n5;
    n2.r = &n6;

    block_t *root = &t_root;

    printf("Before removing 30:\n");
    print_tree(root, 0);

    remove_free_tree(&root, &n1);  // Remove 30
    printf("\nAfter removing 30:\n");
    print_tree(root, 0);

    remove_free_tree(&root, &n3);  // Remove 20 (leaf)
    printf("\nAfter removing 20:\n");
    print_tree(root, 0);

    return 0;
}

特性	BST-分配器	傳統 malloc
分配	O(log n)	O(n)、 O(1)（快取）
釋放	O(log n)	O(n)
碎片化處理	較好	較差
搜尋 free block	O(log n)（BST 搜尋）	O(n)（無快取）
Metadata	需要額外指標 & 樹平衡	簡單 linked list

測驗三

在 linked list 中實現 quicksort ，因 linked list 不像是矩陣那樣，可自由抓每一部份，要操作資料就得走到節點處才行，所以用傳統的方式來 quicksort 效率上很差，而且又不希望使用遞迴，因此活用 stack 來完成快速排序。

運作原理：

創兩個 stack ，一為 begin 左邊界、一為 end 右邊界
從頭開始，pivot ＆ L 為頭節點、R 為尾節點（寫一 helper function 來方便找尾節點）
假設 cur 為目前節點，cur 開始走過每個節點，從 begin.pop 到 end.pop ，將所有小於 pivot 的點推入 left 的 sublist 中，而所有大於的丟入 right。再來在 begin 與 end 中新增新的節點
再來先處理 right sublist 再 left，重複操作。

要求：

解釋上述程式碼運作原理
研讀〈A comparative study of linked list sorting algorithms〉，分析針對雙向鏈結串列的排序演算法考量，並運用 Linux 核心風格的 List API 予以實作

Linux 核心風格的 List API 予以實作

教授有提供 Linux 核心風格 List API 的簡化標頭檔 list.h範例程式碼，但我決定用 lab0 專案中與使用的程式碼來操作，未來可方便直接用於 q_sort 函式之中，也可思考現階段的 mergesort 與此 quicksort 在各樣本數下速度的差異（Github連結）

Mock-up 的完整程式碼

邏輯與上面大同小異，只是細節上有些變化。首先會先斷掉 list 的尾巴，一律將所有 list 視為 singly linked list 來方便處理。

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

一切分割跟排序都完成後，再將走過一遍連結串列，將 singly linked list 的 prev 指標/修正，變回完整的 doubly linked list。

寫的過程中也是遇到非常多問題，一次又一次的 segmentation fault ，debug 也是花了不少的時間。

示意圖（來源）:

坦承

這邊非常誠實的澄清，我發現我最近幾個月開始有些過度依賴chatgpt，以至於自己寫的程式在執行前都要給它驗證一遍才敢執行。在寫這個 quicksort 寫到一半時碰壁，好幾小時在與 gpt 反覆辯論，但 gpt 就是抓不到錯誤，而我也沒有非常花心思去看自己的程式碼。我知道身邊許多資工人在不知不覺中也養成了這種壞習慣，過度依賴個很有瑕疵的工具。

掙扎許久後我把 gpt 關掉，打開 yt music 決定自己慢慢 debug，中間不斷插入 printf 看輸出，用小畫家來畫圖思考，幾小時候完成了程式。有時在想，如果從一開始就不這麼仰賴 gpt 來解題，如果不要那麼懶、更相信自己，是不是能夠更快寫出來?

本該是理所當然的事，現在才領悟。從今之後會警惕自己不要落入 gpt 的陷阱，多閱讀文獻、使用google，盡可能靠自己的力量去完成，成為一位有價值的資訊人。

quiz2

測驗一

要求：

解釋上方程式碼運作原理，改進 random_shuffle_array 也探討 list_for_each_entry_safe 建構的迴圈中，若將 list_move_tail 換為 list_move，為何會無法滿足 stable sorting
將程式碼整合進 lab0 提及的 qtest，並探討環狀雙向鏈結串列的排序效能，並善用 perf 一類的效能分析工具，探討包含 Linux 核心 list_sort 在內排序實作的效能表現並充分解釋

一開始將 AAAA, BBBB, CCCC, DDDD, EEEE, FFFF 填入後將完整程式碼建成一 .c 檔，發現程式會卡在一迴圈中出不去。
Output:

qsort 後執行 linkedlist quicksort，但遲遲無法完成

問題出在 list_for_each_entry_safe 迴圈中，AAAA格的答案猜錯，應為 list_first_entry 才對：

// list_quicksort 函式中
    pivot = container_of(head, struct listitem, list);
    list_del(&pivot->list);
    
    list_for_each_entry_safe (item, is, head, list) {
        //printf("going");
        if (cmpint(&item->i, &pivot->i) < 0)
            list_move_tail(&item->list, &list_less);
        else
            list_move_tail(&item->list, &list_greater);
    }
    printf("checkpoint\n");

改善 random_shuffle_array

回想起 lab0 的其中一個要求 - 實做亂數產生器，其中就有提到 Fisher–Yates shuffle演算法，這邊也將使用此演算法

演算法邏輯：

從最後一個元素 n 開始（標為i），此為元素一
利用 rand() % i + 1 取餘數找出元素二的位置，簡稱 j（rand 使用系統時間生成亂數）
將 i 與 j 兩元素進行交換
i - 1 ，以此類推直到 i 走到第一格

graph (6)

graph (7)

graph (8)

static inline void random_shuffle_array(uint16_t *operations, uint16_t len)
{   
    // Set every num to their respective number first
    for (uint16_t i = 0; i < len; i++) 
        operations[i] = i + 1;
    
    // Fisher–Yates shuffle
    for (uint16_t i = len-1; i > 0; i--) {
        uint16_t j = rand() % (i + 1);
        uint16_t temp = operations[i];
        operations[i] = operations[j];
        operations[j] = temp;
    }
}

為何將 list_move_tail 換為 list_move 時，會無法滿足 stable sorting

stable sorting -> 當兩元素數值相同時，排序前後的順序兩元素的順序會維持不變

list_move_tail -> 將節點丟到 list 尾端

list_move -> 將節點丟到 list 頭端（dummy head 後）

假設一 list = [3, 1, 2, 2, 4]，2用不同顏色來方便區分

pivot 為 3，接下來得將元素們分為兩 list，Less 與 Greater

list_move_tail：

Less: [1, 2, 2]

Great: [4]

list_move：

Less: [2, 2, 1]

Great: [4]

因為元素是依序丟入，所以藍色的2會先被處理而先加入清單，紅色的2則會比較晚加入，因此最後在藍色2的前方。

兩相同元素在排序前後改變了。

quicksort 融入 lab0 與新增 qtest 指令

將 listitem 改為 element_t，cmpint 改用 strcmp 基本上就能放入 lab0 的 queue.c 之中了。

在 queue.h 之中也得定義一行

void list_quicksort(struct list_head *head);

接下來進入 qtest.c ，新增一函式 do_quicksort ，程式碼這邊完全取自 do_sort 。來到 console_init 函式，新增一行：

ADD_COMMAND(quicksort, "**Assignment from N02** Sort queue in ascending order", "");

完成後重新執行 make 指令，測試 qtest ：

help 畫面有 quicksort 指令

沒問題！

問題：這些改動 commit 到 lab0 中時會被擋下，

解決辦法：改把函式直接丟到 qtests 之中，並刪掉在 queue.h 裡的定義

利用 perf 分析

可看見這個 mov 指令旁邊寫了 100％，但這不代表這個指令花費了這部份全部的處理器時間，而是這邊是熱點，每次檢查時都在這邊。

這意味著時間都花在這個迴圈之中，也就是走過連結串列的迴圈中，算是在預料之內。

如何簡短時間？

既然時間都花在走過各連結，那麼解決改善的方向也很簡單，減少需要在串列上走的時間。

我的想法：
每次呼叫 list_move_tail 都得走遍整個連結串列找到尾巴，浪費時間。

使用 lise_move：
- 優：省時
- 缺：破壞 stable sort
改用雙向連結串列：
- 優：省時，尾巴為 head->prev
- 缺：節點結構得花更多記憶體

Linux 核心內的 list_sort

為一種改良過的 merge sort ，與自己在 lab0 實做的合併排序最大的不同是 bits 這邊：

do {
    size_t bits;
    struct list_head **tail = &pending;

    /* Find the least-significant clear bit in count */
    for (bits = count; bits & 1; bits >>= 1)
        tail = &(*tail)->prev;
    /* Do the indicated merge */
    if (likely(bits)) {
        struct list_head *a = *tail, *b = a->prev;

        a = merge(priv, cmp, b, a);
        /* Install the merged result in place of the inputs */
        a->prev = b->prev;
        *tail = a;
    }

    /* Move one element from input list to pending */
    list->prev = pending;
    pending = list;
    list = list->next;
    pending->next = NULL;
    count++;
} while (list);

清單從後往前走，每次往 prev 指標前進，接下來將一部份一部份慢慢分析

for (bits = count; bits & 1; bits >>= 1)

count 的二進位表示紀錄著當前大小為 2^k 的 list 在等待合併，k 為第 K 位元

bits ＆ 1 檢查要不要繼續前進

#define likely(x)   __builtin_expect(!!(x), 1)

if(likely(bits)) 與 if(bits) 邏輯上是一樣的，但是使用巨集可告知編譯器預期結果為 true ，從而進一步去優化此程式

接下來把下一個該處理的節點存入 pending，並斷掉它的 next 指標，最後 count + 1

簡單卻很有效的用 count 跟 bit minipulation 去紀錄是否有待合併清單，在有相同大小清單時迅速做出合併，減少整體比較次數、也降低 cpu & 記憶體的開銷。

測驗二

要求：

解釋上述程式碼，並探討擴充為
$⌈ \sqrt{x} ⌉$ (向上取整數) 實作，該如何調整並保持只用整數加減及位元操作
參照計算機結構測驗題 C 及其注釋，以 C 語言 (搭配 GNU extension) 撰寫不依賴除法指令的 sqrtf，確保符合 IEEE 754 規格。對照 sqrtf, rsqrt, reciprocal implementations in Arm assembly
參照從 √2 的存在談開平方根的快速運算提到的「藉由查表的平方根運算」，以 C 語言實作，並比較不同手法的整數開平方根效能
- 一旦發現效能改進的機會，可準備提交改進 int_sqrt 程式碼到 Linux 核心

clz(), clz32(), clz64()

unsigned clz2(uint32_t x, int c)
{
    if (!x && !c)
        return 32;

    uint32_t upper = (x >> (16 >> c));
    uint32_t lower = (x & (0xFFFF >> mask[c]));
    if (c == 3)
        return upper ? magic[upper] : 2 + magic[lower];
    return upper ? clz2(upper, c + 1) : (16 >> (c)) + clz2(lower, c + 1);
}

函式 clz2 不外乎目的就是 count leading zeros ，利用遞迴不斷的分割成上下部份，直到上部份與下部份剩 2 位元後再利用 magic[] 查表計算出結果

呼叫前都得傳入個 0 初始話，利用巨集省麻煩 -> clz32

x >> 16 分離出上半部份，利用 AND 跟查表分離出下半部份。在 c == 3 之前不斷分割。若要計算 64 位元的整數，則手動分割成上下部份再傳入 clz32 兩次
graph (9)

整數平方根 sqrti()

uint64_t sqrti(uint64_t x)
{
    uint64_t m, y = 0;
    if (x <= 1)
        return x;

    int total_bits = 64;

    /* clz64(x) returns the count of leading zeros in x.
     * (total_bits - 1 - clz64(x)) gives the index of the highest set bit.
     * Rounding that index down to an even number ensures our starting m is a
     * power of 4.
     */
    int shift = (total_bits - 1 - clz64(x)) & ~1;
    m = 1ULL << shift;

    while (m) {
        uint64_t b = y + m;
        y >>= 1;
        if (x >= b) {
            x -= b;
            y += m;
        }
        m >>= 2;
    }
    return y;
}

~1 = 0xFFFFFFFFFFFFFFFE，除了 LSB 其餘位元均為1 ，可確保 shift 為4的倍數，m 為最接近但小於 x 的4的倍數

從最大的候選位元開始，若加入後仍小於 x ，
，則存入 y 之中，反之不存，m 再/4。每一次迴圈中y會往右一格，因而依序存下每個位元

$⌈ \sqrt{x} ⌉$ 向上取整數

在return y 前加這一小段就可以實現：

if(x)
    return y+1;

看起來簡單但也是想了一陣子，我是假設兩數字 144 與 143 並紀錄 sqrti 過程中的每一步驟中的x, y 與 m 數值並去尋找規律，而發現當剛好為平方數時最後會 x = 0

e.g. 144:

iteration 1: x = 144, y = 0, m = 64

b = 0 + 64 = 64
y >>= 1 = 0
x >= b is true:
- x = 144 - 64 = 80
- y = 0 + 64 = 64
m >>= 2 = 16

iteration 2: x = 80, y = 64, m = 16

b = 64 + 16 = 80
y >>= 1 = 32
x >= b is true:
- x = 80 - 80 = 0
- y = 32 + 16 = 48

m >>= 2 = 4

iteration 3: x = 0, y = 48, m = 4

b = 48 + 4 = 52
y >>= 1 = 24
x >= b is false
m >>= 2 = 16

iteration 4: x = 0, y = 24, m = 1

b = y + m = 24 + 1
y >>= 1 = 12
x >= b is false
m >>= 2 = 0

x is 0 when the loop ends!

e.g. 143:

iteration 1: x = 143, y = 0, m = 64

b = 0 + 64 = 64
y >>= 1 = 0
x >= b is true:
- x = 143 - 64 = 79
- y = 0 + 64 = 64
m >>= 2 = 16

iteration 2: x = 79, y = 64, m = 16

b = 64 + 16 = 80
y >>= 1 = 32
x >= b is false
m >>= 2 = 4

iteration 3: x = 79, y = 32, m = 4

b = 32 + 4 = 36
y >>= 1 = 16
x >= b is true:
- x = 79 - 36 = 43
- y = 16 + 4 = 20
m >>= 2 = 1

iteration 4: x = 43, y = 20, m = 1

b = 20 + 1 = 21
y >>= 1 = 10
x >= b is true:
- x = 43 - 21 = 22
- y = 10 + 1 = 11
m >>= 2 = 0

x is 22 when the loop ends!

撰寫不依賴除法指令的 sqrtf

這邊提供 sqrtf1, sqrtf2, sqrtf3 三個函式
- 大同小異，只有微小的差異

sqrtf1:

float sqrtf1(float number)
{	
	float y = number;
	uint32_t i = *(uint32_t*)&y;
	i = (i>>1) + 0x1fc00000;
	y = *(float*)&i;
	
	return y;
}

參照 quake3 中的 fast_inverse_sqrt，我想用差不多的作法來尋找平方根！

首先，原想起浮點數的定義，1 位元為sign、 8位元為 Exponent、23位元為Mantissa：

平方根恆正，所以 sign bit 為0

因此 bit representaion 為 E ＆ M或是 2^23*E + M ，而浮點數的實際數值為(1+M/2^23 )*2^(E-127)

接下來，我們它取log2：

變成 log2(1+M/2^23) + E - 127後，數字乍看之下變得比較難處理，但回想起大一微積分：

當 x 趨近 0，log(1+x) ~= x

話一張圖表示曲線與直線，若平移直線一咪咪可以讓0到1之間的預測值誤差控制一些，使0到1之間的預測值平均誤差控制到最小，μ = 0.043

帶進去後移項我們可以發現 M+2^23*E又出現了，沒錯這就是前面寫道的bit representaion，得到結論：

某種程度上，浮點數的log2 約略為它的 bit representaion 移位後的數值！

前置作業完成，可以開始計算了：

這邊的目標是尋找 magic number，其實就只是計算式子＋移項後的結果。

**測試用函式： **

#define N_TESTS 100000

float testValues[N_TESTS];
for (int i = 0; i < N_TESTS; i++) {
    testValues[i] = (float)(i + 1);
}

double error_tot = 0.0;
float max_error = 0.0;

//printf("Value\tFastSqrt\tSqrt\t\tError (%%)\n");
//printf("-----------------------------------------------------\n");
for (int i = 0; i < N_TESTS; i++) {
    float value = testValues[i];
    float approx = sqrtf1(value);
    float correct = sqrt(value);

    float errorPercent = fabs(approx - correct) / correct * 100.0f;
    //printf("%-6.2f\t%-9.4f\t%-9.4f\t%-8.2f\n", value, approx, correct, errorPercent);
    error_tot += errorPercent;
    if(errorPercent > max_error)
        max_error = errorPercent;
}

printf("Error avg: %.6f%%\n", error_tot/N_TESTS);
printf("Error max: %.6f%%\n", max_error);

誤差不能說小，但在一些不要求精準度的情況下這樣或許就夠了，但若能再準一點就好了。

sqrtf1 耗時: 0.306210 seconds

sqrtf2:

引入牛頓法，設 f(y) = 0 = y^2 - x ，則新 y = y - f(y)/f'(y)，推導後得出 y/2 + x/2y

加上:

    // Newton's method to get better approx
    y = (y + number/y)*0.5;

誤差減少了許多，但因為使用了除法，耗時也增加了不少。

sqrtf2 耗時: 0.414774 seconds

sqrtf3:

回想起 quake3 中的 fast_inverse_square，其解 y = x^-1/2 乘上x不就是x^1/2了嗎？

float sqrtf3( float number )
{
	long i;
	float x2, y;
	const float threehalfs = 1.5F;

	x2 = number * 0.5F;
	y  = number;
	i  = * ( long * ) &y;
	i  = 0x5f3759df - ( i >> 1 );
	y  = * ( float * ) &i;
	y  = y * ( threehalfs - ( x2 * y * y ) );

	return y*number;
}

中間邏輯與1 2 一樣，還沒有使用到除法，應為最快的吧？
但實際上 sqrt3 卻是三者之間最慢的函式，而且還慢不少。

sqrtf3 耗時: 1.453131 seconds

我的猜測：

y = y * ( threehalfs - ( x2 * y * y ) );

與

return y*number;

須花太多時間運算，多次浮點數乘法運算會比一次的除法還要花上更多時間

測時間函式：

#define NUM_TESTS 100000000
double get_time() {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec + t.tv_nsec * 1e-9;
}
// time = start - end times

結論： sqrt2 準確且快速，但還是會輸 math.h 的 sqrt函式😢。
具體原因不確定，猜測是因為 math.h 的 sqrt 會動用處理器專門運算平方根的電路，程式不可能比單一功能的邏輯閘矩陣還快…

測驗三

要求：

解釋上述程式碼運作原理，應提供測試程式碼

針對 Linux 核心的 hash table 實作，提供更多圖例並揣摩 Linux 核心開發者
進行《The Art of Computer Programming》(vol 3) 一書section 6.4 的 exercise 8，也就是證明 Theorem S

注意: Linux 核心內部大量使用 hash table，但並非每處的雜湊函數都滿足減少碰撞及避免 clustering 現象的原則，也就是存在若干改進空間
探討 Linux 核心中使用 hash table 的應用案例，至少二項，應當列出原始程式碼並分析，斟酌搭配 git log 以得知 Linux 核心開發者變更程式碼的考量因素

這邊的 hashtable 實作沒有 bucket 大小上的限制，因此也沒有 collision (碰撞)的問題。每個bucket 會有個 hlist_head ，而 hlist_head 再指向第一個鍊結串列節點

Hash function 利用黃金比例生出 hash：

#define GOLDEN_RATIO_32 0x61C88647
static inline unsigned int hash(unsigned int val, unsigned int bits)
{
    /* High bits are more random, so use them. */
    return (val * GOLDEN_RATIO_32) >> (32 - bits);
}

黃金比例中的數字（尤其是高位元的）分佈還算隨機，簡單的乘上val再平移可以還算均勻的分佈每個元素，是個常見的 hash function

pprev

剛開始腦袋有點轉不過來，不理解 pprev 是拿來做什麼的，就好比 prev 是上一個節點的指標，我以為 pprev 上一個會儲存上一個節點指標的地址

實際上 pprev 要儲存著上一個指標的 next 指標的地址，為的是在刪除節點時能夠更方便且迅速的操作

// 若要刪除一節點 node ，兩行就能解決
*node->pprev = node->next;
free(node);

什麼是 map_t 中的 bits?

結構 map_t 中有個叫 bits 的整數：

typedef struct {
    int bits;
    struct hlist_head *ht;
} map_t;

hashtable 中的 bucket count 就是由 bits 推出來，bucket_count = 1 << bits 。這麼做在操作時可依賴 bit operation 從而加速 hash 等函式的執行速度，也因為為2的倍數而在記憶體配置上有一咪咪的優勢

map_init配置記憶體並回傳一初始化的 map_t

hash 尋輸入值對應的 hash 值

find_key 到 hash table 中對應的 bucket，走遍連結串列尋找對應 key 的 hash_key

map_add 先檢查此 key 是否已存在，若無則配置記憶體並將它插入 bucket 中 first 的位置，若已被佔據則將舊節點往後推

最後 map_deinit 負責刪除整個結構並釋放所有被配置的記憶體

Linux 核心中使用 hash table 的應用案例

Page Tables:

Paging 或是分頁是記憶體管理的一部分，將空間分科成一塊一塊固定大小，再做記憶體與虛擬記憶體的管理與交換。需要時將資源帶到記憶體中，待機時資源送到虛擬記憶/硬碟中。

Linux 核心中共有4種等級的 Page Tables

Page Global Directory (PGD) - pgd_t/pgdval_t.
Page Upper Directory (PUD) - pud_t/pudval_t.
Page Middle Directory (PMD) - pmd_t/pmdval_t.
Page Table Entry directory (PTE) - pte_t/pteval_t.

// 定義
typedef unsigned long   pgdval_t;
typedef unsigned long   pudval_t;
typedef unsigned long   pmdval_t;
typedef unsigned long   pteval_t;

typedef struct { pgdval_t pgd; } pgd_t;
typedef struct { pudval_t pud; } pud_t;
typedef struct { pmdval_t pmd; } pmd_t;
typedef struct { pteval_t pte; } pte_t;

pgdval_t 為32位元，因此使用 unsigned long，再用 pgd_t 將其包住，這麼做雖然看起來沒意義，但卻是有些好處。

型別安全：一個期待傳入 pgd_t 的函式無法不小心被傳入 pmd_t

抽離：若哪天得改變核心中的 page global directory，則改變定義就行，不必大幅更動程式碼

直觀：一眼就能知道程式在操作哪個部份的 paging ，而不會將四個稿混

注意到了嗎？分頁就是很多個 hash table 綁在一起，一層一層的排列，好比千層麵

多層 Hash Table 結構的好處

節省空間：
如果直接用一個巨大的單級頁表，則會產生大量的空洞（sparse mapping）、浪費記憶體。分層的結構使在虛擬地址空間的某個區段被實際使用時，才會被分配相應的下層頁表，從而更有效地利用內存

搜尋速度：
CPU 將虛擬地址轉為實際地址時，會依次通過各層，從 PGD 層開始，根據虛擬地址的位元找到對應的 entry。再利用這個 entry 指向下一層（PUD）的頁表，重複同樣的查找過程。一直到 PTE 層，最終獲得具體的實際頁面地址。

因為只是在多個小型 hash table 中尋找，因此理論上每步驟都是O(1)常數時間，所以整體搜尋效率還是很高的。

Linux 核心 Github

Kobject

Linux 核心 Github

kobject 為核心中用來表示在 user space 的核心物件的一種概念（?）：

每個 kobject 的週期都有個管理者，當操作完畢或是 scope 改變就會釋放記憶體
kobject 像樹狀結構那樣整理，且每個物件都有自己類似ID 的獨特辨認串與指向 parent kobject 的指標

切入主題，在 lib/kobject.c 中，kobject 主要是以鏈結串列加入到對應的 kset 中，但搜尋時時，為了加速搜尋，sysfs 的基層 kernfs 會用 hash table 來尋找這些物件

e.g.

struct kobject *kset_find_obj(struct kset *kset, const char *name)
{
    struct kobject *k;
    struct kobject *ret = NULL;

    spin_lock(&kset->list_lock);
    list_for_each_entry(k, &kset->list, entry) {
        if (kobject_name(k) && !strcmp(kobject_name(k), name)) {
            ret = kobject_get_unless_zero(k);
            break;
        }
    }
    spin_unlock(&kset->list_lock);
    return ret;
}

EXPORT_SYMBOL_GPL(kset_find_obj);

為的就是 hash table 那快速 O(1) 的查搜尋速度

kobject namespace 操作與固定陣列
另一個和 hash table 概念相關的例子在於 kobject 的命名空間管理。內核中有一個固定大小的陣列，用來存放不同命名空間類型的操作函式指標：

static DEFINE_SPINLOCK(kobj_ns_type_lock);
static const struct kobj_ns_type_operations *kobj_ns_ops_tbl[KOBJ_NS_TYPES];

開發者透過陣列，可以根據 enum kobj_ns_type 操作在O(1)內查詢到對應的命名空間。這就類似於一個簡化的 hash table，其鍵值也是固定，提供了非常快的查找速度。因 kobj_ns_type 的大小有限，所以可直接使用陣列省麻煩