2022q1 Homework1 (lab0)

contributed by < rickywu0421 >

開發環境

$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          1
On-line CPU(s) list:             0
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           142
Model name:                      Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
Stepping:                        10
CPU MHz:                         1391.999
BogoMIPS:                        2783.99
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       32 KiB
L1i cache:                       32 KiB
L2 cache:                        256 KiB
L3 cache:                        6 MiB

lab0-c 開發過程

參考你所不知道的 C 語言：linked list 和非連續記憶體實作

以下開發過程以 Head 表示佇列的頭，以 Node 表示 element_t 結構體。

所有以 list 作為開頭的 inline function 及 macro 均定義自 list.h。

q_new

首先透過 malloc 配置空間~~生成一個 allocated memory~~ 給 head, 接著透過 inline function INIT_LIST_HEAD() 初始化 head, 使 head 的 prev 與 next 指向自身。

注意用語，配置 (allocate) 和生成 (generation) 不同，請查閱 Cambridge Dictionary
:notes: jserv

struct list_head *q_new()
{
    struct list_head *head;

    head = (struct list_head *) malloc(sizeof(struct list_head));
    if (!head)
        return NULL;
    INIT_LIST_HEAD(head);

    return head;
}

q_free

透過 list_for_each_safe() 走訪整個 linked list, 並使用 list_entry() 得到 Node 的指標, 對 Node 的 value 及 Node 本身進行 free() 操作。待清完所有 queue element, 對 Head 進行 free() 操作。

void q_free(struct list_head *l)
{
    struct list_head *pos, *safe;

    if (!l)
        return;

    list_for_each_safe (pos, safe, l) {
        element_t *ele;

        ele = list_entry(pos, element_t, list);
        free(ele->value);
        free(ele);
    }

    free(l);
}

`__q_insert`

該函式僅作為此檔案內部使用 (即 helper function), 其目的為實作 q_insert_head() 及 q_insert_tail() 中的共同部分, 即初始化 Node, 並將字串透過 strncpy() 複製到 Node 的 value 中。

static bool __q_insert(element_t **ele, char *s)
{
    size_t len;

    if (!s)
        return false;

    *ele = (element_t *) malloc(sizeof(element_t));
    if (!(*ele))
        return false;

    len = strlen(s);
    (*ele)->value = (char *) malloc(len + 1);
    if (!(*ele)->value) {
        free(*ele);
        return false;
    }

    /* Copy string into queue element */
    strncpy((*ele)->value, s, len + 1);

    return true;
}

q_insert_head

透過 __q_insert() 初始化節點, 若初始化成功則透過 list_add() 將節點的 list 鏈結到 Head 的 next。

bool q_insert_head(struct list_head *head, char *s)
{
    element_t *ele;

    if (!head)
        return false;

    /* Insert queue element into head of @head */
    if (__q_insert(&ele, s)) {
        list_add(&ele->list, head);

        return true;
    }

    return false;
}

q_insert_tail

透過 __q_insert() 初始化 Node, 若初始化成功則透過 list_add_tail() 將 Node 的 list 鏈結到 Head 的 prev (即 list 的最後)。

bool q_insert_tail(struct list_head *head, char *s)
{
    element_t *ele;

    if (!head)
        return false;

    /* Insert queue element into tail of @head */
    if (__q_insert(&ele, s)) {
        list_add_tail(&ele->list, head);

        return true;
    }

    return false;
}

q_remove_head

使用 list_entry() 得到 queue 中的第一個 Node, 並透過 list_del() 將其 list 從 linked list 中 unlink。最後將此 Node 的 value 透過 strncpy 複製到 sp 中。

element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
    element_t *first;

    if (!head || list_empty(head))
        return NULL;

    first = list_entry(head->next, element_t, list);
    list_del(&first->list);

    if (sp && bufsize) {
        strncpy(sp, first->value, bufsize - 1);
        sp[bufsize - 1] = '\0';
    }

    return first;
}

q_remove_tail

使用 list_entry() 得到 queue 中的最後一個 Node, 並透過 list_del() 將其 list 從 linked list 中 unlink。最後將此 Node 的 value 透過 strncpy 複製到 sp 中。

element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize)
{
    element_t *last;

    if (!head || list_empty(head))
        return NULL;

    last = list_entry(head->prev, element_t, list);
    list_del(&last->list);

    if (sp && bufsize) {
        strncpy(sp, last->value, bufsize - 1);
        sp[bufsize - 1] = '\0';
    }

    return last;
}

q_size

若 Head 為 NULL, 則直接回傳 0, 否則走訪整個 linked list, 每走訪一個 list node 便將 size++, 最後回傳 size。

int q_size(struct list_head *head)
{
    struct list_head *pos;
    int size;

    if (!head)
        return 0;

    size = 0;
    list_for_each (pos, head)
        size++;

    return size;
}

q_delete_mid

此函式利用快(fast)慢(slow)指標的技巧。

fast 與 slow 的初始值均為 head->next, for loop 的最後 slow = slow->next, fast = fast->next->next, 則當 fast == head 或 fast->next == head 時, slow 即為整個 linked list 的中心點, 此時使用 list_del() 將 slow 從 linked list 中 unlink, 並將其 entry 透過 free() deallocate memory。

bool q_delete_mid(struct list_head *head)
{
    element_t *ele;
    struct list_head *slow, *fast;

    if (!head || list_empty(head))
        return false;

    for (slow = head->next, fast = head->next;
         fast != head && fast->next != head;
         slow = slow->next, fast = fast->next->next)
        ;

    list_del(slow);

    ele = list_entry(slow, element_t, list);
    free(ele->value);
    free(ele);

    return true;
}

q_delete_dup

使用 list_for_each_entry_safe() 走訪整個 linked list, 比較 prev_value 是否等於 pos->value, 若是, 則表示此 Node 為 duplicate node, 此時呼叫 list_del() 將該 Node 的 list unlink, 並透過 free() deallocate memory。除此之外, 因為第一個重複的 Node 也要被清掉, 因此需要記錄其為 start, 並在造訪下一種 value 的節點或造訪到 Head 時清掉它的 memory。

參考並修改自 blueskyson。

bool q_delete_dup(struct list_head *head)
{
    element_t *pos, *safe, *start = NULL;
    char *prev_value = "";

    if (!head)
        return false;

    list_for_each_entry_safe (pos, safe, head, list) {
        if (!strcmp(pos->value, prev_value)) {
            /* Record the start queue element of the duplicate set,
               which will be delete later */
            if (!start)
                start = list_entry(pos->list.prev, element_t, list);

            list_del(&pos->list);
            free(pos->value);
            free(pos);
        } else {
            prev_value = pos->value;

            /* Defered deletion of the start of the duplicate set */
            if (start) {
                list_del(&start->list);
                free(start->value);
                free(start);
                start = NULL;
            }
        }
    }

    /* Defered deletion of the start of the duplicate set */
    if (start) {
        list_del(&start->list);
        free(start->value);
        free(start);
    }

    return true;
}

以下是 refactor 後的程式碼, 我透過比較現節點與 next節點之字串, 取代 prev_value, 並用 dup 確定是否在重複節點的 set 中, 此變數的目的為辨別重複節點的 set 中的最後一個節點。

bool q_delete_dup(struct list_head *head)
{
    element_t *pos, *next;
    int dup = 0;

    if (!head)
        return false;

    list_for_each_entry_safe (pos, next, head, list) {
        if (&next->list != head && 
                !strcmp(pos->value, next->value)) {
            list_del(&pos->list);
            free(pos->value);
            free(pos);
            dup = 1;
        }
        else if (dup) {
            list_del(&pos->list);
            free(pos->value);
            free(pos);
            dup = 0;
        }
    }

    return true;
}

q_swap

透過 list_for_each() 走訪整個 linked list, 期間將 pos 使用 list_del() unlink, 再將之通過 list_add() 將其鏈結到 pos->next 的後面。

參考自 SmallHanley。

void q_swap(struct list_head *head)
{
    struct list_head *pos;

    if (!head || list_empty(head) || list_is_singular(head))
        return;

    list_for_each (pos, head) {
        if (pos->next == head)
            break;

        list_del(pos);
        list_add(pos, pos->next);
    }
}

q_reverse

從 Head 開始走訪整個 linked list, 將 pos 的 prev 與 next 成員對調, 持續進行此操作直到 pos 回到 Head。

參考自 blueskyson。

void q_reverse(struct list_head *head)
{
    struct list_head *pos, *next;

    if (!head || list_empty(head) || list_is_singular(head))
        return;

    /* Start from the real head, not the first queue element */
    pos = head, next = head->next;
    do {
        pos->next = pos->prev;
        pos->prev = next;
        pos = next;
        next = next->next;
    } while (pos != head);
}

q_sort

此函式使用 naive merge sort 排序 list。

在 call 真正實作 merge sort 的 function (__q_sort()) 之前, 先將 list 的最後一個成員 (head->prev) 的 next 指向 NULL, 以防止之後排序時誤把 Head 當成最後一個成員的下一位成員。

在 call 完 __q_sort() 之後, 此 list 即被單向的排序, 亦即此 list 不再是一個 circular doubly-linked list。因此在函式的末端透過 for loop 走訪 list, 將 prev 指向正確的節點。

void q_sort(struct list_head *head)
{
    struct list_head *list, *pos, *prev;

    if (!head || list_empty(head) || list_is_singular(head))
        return;

    /* Make the last node's next to NULL, avoiding the result of
       __q_split_into_two() is wrong */
    head->prev->next = NULL;

    list = head->next;
    list = __q_sort(list);

    /* Now the list is a sorted singly-linked list,
       it's time to recover the list into doubly-linked list */
    head->next = list;

    for (prev = head, pos = head->next; pos; prev = pos, pos = pos->next)
        pos->prev = prev;

    prev->next = head;
    head->prev = prev;
}

__q_sort

Merge sort 之遞迴呼叫函式。

首先先檢查 list 是否為 singular list, 若否, 則透過 __q_split_into_two() 函式將 list 切割為 left 與 right, 接著對其二進行遞迴呼叫, 函式最後使用 __q_merge_two_lists() 將已排序好的 left 與 right 合併成一個 list。

static struct list_head *__q_sort(struct list_head *left)
{
    if (left->next) {
        struct list_head *right;

        right = __q_split_into_two(left);

        left = __q_sort(left);
        right = __q_sort(right);

        return __q_merge_two_lists(left, right);
    }
    return left;
}

__q_split_into_two

切割 list 為兩個部分。參數為左半部份, 回傳值為右半部份。

與 q_delete_mid() 的手法類似, 使用快慢指標找到 list 中的中間點, 再將中間點的前一個節點之 next 指向 NULL, 最後回傳中間點, 即完成切割。

static struct list_head *__q_split_into_two(struct list_head *head)
{
    struct list_head *fast, *slow;

    /* Find the middle node of the list */
    for (fast = head, slow = head; fast && fast->next;
         fast = fast->next->next, slow = slow->next)
        ;

    slow->prev->next = NULL;

    return slow;
}

__q_merge_two_lists

將兩個 sorted list 合併為一個 sorted list。

參考自你所不知道的 C 語言: linked list 和非連續記憶體。

透過 indirect pointer (ptr) 的技巧, 避免對 head 使用 malloc()。

static struct list_head *__q_merge_two_lists(struct list_head *left,
                                             struct list_head *right)
{
    struct list_head *head, **ptr;

    head = NULL;
    ptr = &head;

    for (; left && right; ptr = &(*ptr)->next) {
        element_t *e_left, *e_right;

        e_left = list_entry(left, element_t, list);
        e_right = list_entry(right, element_t, list);

        if (strcmp(e_left->value, e_right->value) <= 0) {
            *ptr = left;
            left = left->next;
        } else {
            *ptr = right;
            right = right->next;
        }
    }

    *ptr = (struct list_head *) ((uintptr_t) left | (uintptr_t) right);

    return head;
}

研讀 linux list_sort.c

首先觀察 list_sort() 的 prototype, 其定義在 include/linux/list_sort.h, 為了方便閱讀, 加上了 lib/list_sort.c中 list_sort() 實作的註解。

/**
 * list_sort - sort a list
 * @priv: private data, opaque to list_sort(), passed to @cmp
 * @head: the list to sort
 * @cmp: the elements comparison function
 */
__attribute__((nonnull(2,3)))
void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp);

參數

該函式有三個參數：

priv: 用來傳遞到 cmp 的資料
head: 要被 sort 的 circular doubly-linked list
cmp: 比較兩個元素的值並決定排列順序
其中 list_cmp_func_t 的 typedef 如下：

typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
		const struct list_head *, const struct list_head *);

使用者透過自定義的 cmp 決定 list_sort() 的排列方式, 假設 a, b 為傳入的兩個型態為 list_head 的參數, 若 cmp 函式回傳小於或等於 0 (list_sort() 為 stable sort, 不用分別小於與等於的差別) 的 integer, 則 a, b 不進行交換, 反之則進行交換。

attribute((nonnull(2,3))

我們可以觀察到 list_sort() 的宣告中帶有 __attribute__((nonnull(2,3))) 的前輟。

參見 gcc 9.4 的 online doc, 第 6 章的 Extensions to the C Language Family 中的 6.33 Declaring Attributes of Functions 提到, GNU C 提供 function attribute 用來修飾 function, 以提供編譯器進行最佳化或確保程式的正確性。Function attribute 的用法如下:

__attribute__((<some attribute>)) <function prototype>

要注意的是, 根據上述的文件, __attribute__ 後面的 attribute 必須以兩個小括號包起來(原因為何還有待確認)。

而 nonnull 為其中一種 function attribute, 其目的為確保指定位置的 pointer 不為 NULL , 若 compiler 檢查到指定位置的 pointer reference 到 NULL, 且 compiler option -Wnonnull 有 eable, 則 compiler 會發出 warning。

list_sort 設計理念

以下參考之 gcc 文件均取自 gcc 9.4 online doc。

lib/list_sort.c

/*
 * This mergesort is as eager as possible while always performing at least
 * 2:1 balanced merges.  Given two pending sublists of size 2^k, they are
 * merged to a size-2^(k+1) list as soon as we have 2^k following elements.
 *
 * Thus, it will avoid cache thrashing as long as 3*2^k elements can
 * fit into the cache.  Not quite as good as a fully-eager bottom-up
 * mergesort, but it does use 0.2*n fewer comparisons, so is faster in
 * the common case that everything fits into L1.
 */

根據以上程式碼註解可得知, 此 merge sort 總是保持 list 的大小比為 2:1。假設有兩個大小為

2^{k}

的 list, 此時若為 fully-eager bottom-up mergesort 則會馬上進行合併, 但本實作不會, 而是等到有的三個大小為

2^{k}

的 list 時, 才會開始合併上面的兩個 list 為大小

2^{k + 1}

的一個 list, 此時 list 的大小比保持在 2:1 (

2^{k + 1}

2^{k}

)。

根據程式碼註解, 若前面

3 * 2^{k}

的 list 都可以被放進 cache 裡, 則可以避免 cache thrasing。

而要如何保持 2:1 的 list 呢, 以下註解解釋其中的玄機：

/*
 * The merging is controlled by "count", the number of elements in the
 * pending lists.  This is beautifully simple code, but rather subtle.
 *
 * Each time we increment "count", we set one bit (bit k) and clear
 * bits k-1 .. 0.  Each time this happens (except the very first time
 * for each bit, when count increments to 2^k), we merge two lists of
 * size 2^k into one list of size 2^(k+1).
 */

方法中提到用變數 count 來記錄 pending list 中的節點個數, 每當 count = count + 1 時使得第 k 個 bit 變為 1 且 bits k-1 .. 0 都為 0 時 (除了第 k 個 bit 第一次變成 1), 此時合併 2 個大小為

2^{k}

的 list 為大小為

2^{k + 1}

的 list。

以下舉例 count 為 0~7 時的 merge 操作:

count(10 進位)	count(2 進位)	count+1(2 進位)	k	merge 操作
0	0000	0001	0	no merge, 因 bit `k = 0` 為初次 set
1	0001	0010	1	no merge, 因 bit `k = 1` 為初次 set
2	0010	0011	0	merge two $2^{0}$ lists to $2^{1}$
3	0011	0100	2	no merge, 因 bit `k = 2` 為初次 set
4	0100	0101	0	merge two $2^{0}$ lists to $2^{1}$
5	0101	0110	1	merge two $2^{1}$ lists to $2^{2}$
6	0110	0111	0	merge two $2^{0}$ lists to $2^{1}$
7	0111	1000	3	no merge, 因 bit `k = 3` 為初次 set

但具體來說程式要怎麼做到上述的判斷呢？根據上表我們可以觀察到, merge 發生在 "set k bit" 同時 "k 又不是第一次被 set", 這件事其實可以翻譯成 "從 LSB 往 MSB, 看到第一個 0 時, 更高位元是否存在 1"。

list_sort() 透過以下程式碼做到這件事：

/* Find the least-significant clear bit in count */
for (bits = count; bits & 1; bits >>= 1)
    tail = &(*tail)->prev;

For loop 結束時, 若 bits 不為 0, 則表示更高位元存在 1, 此時準備進行 merge。

同時 tail 走訪了 k 值以下的 sublist, 走到準備進行 merge 的位置。

/* Do the indicated merge */
if (likely(bits)) {
    struct list_head *a = *tail, *b = a->prev;

    a = merge(priv, cmp, b, a);
    /* Install the merged result in place of the inputs */
    a->prev = b->prev;
    *tail = a;
}

我們不難猜到 likely 的功能為判斷 bits 是否不為 0, 但為何不直接寫成 if (bits) 呢？

以下探討 likely 巨集的奧祕。

likely

likely 的定義在 include/linux/compiler.h 中：

#if defined(CONFIG_TRACE_BRANCH_PROFILING) \
    && !defined(DISABLE_BRANCH_PROFILING) && !defined(__CHECKER__)

#define likely(x)    (__branch_check__(x, 1, __builtin_constant_p(x)))

#else
#define likely(x)	__builtin_expect(!!(x), 1)

#endif

可以看到 likely 的實作有兩種可能, 取決於 macro CONFIG_TRACE_BRANCH_PROFILING, DISABLE_BRANCH_PROFILING 與 __CHECKER__ 是否 define。

CONFIG_TRACE_BRANCH_PROFILING 與 DISABLE_BRANCH_PROFILING 這兩個 macro, 推測應該設定在 config file 以在 compile kernel 時啟用。透過以下指令也許可以查看目前的核心是否有啟用：

查看 CONFIG_TRACE_BRANCH_PROFILING 是否啟用：

cat "/boot/config-`uname -r`" | grep "CONFIG_TRACE_BRANCH_PROFILING"

查看 DISABLE_BRANCH_PROFILING 是否啟用：

cat "/boot/config-`uname -r`" | grep "DISABLE_BRANCH_PROFILING"

__CHECKER__ 用來作為 sparse (linux kernel 中用來除錯的靜態工具) 使用。

likely (第二種實作)

我們先看到第二個實作：

#define likely(x)	__builtin_expect(!!(x), 1)

其使用到另一個 gcc 的 built-in function __builtin_expect(), 以下為此函式之介紹。

__builtin_expect

Function prototype:

long __builtin_expect (long exp, long c);

參考自 gcc 9.4 第 6.59 章 Other Built-in Functions Provided by GCC, __builtin_expect 提供 compiler 有關 branch prediction 的資訊, 其回傳值為 x。

舉例來說：

if (__builtin_expect(x, 0)) {
    foo();
    ...
} else {
    bar();
    ...
}

這段程式碼提供 compiler 以下資訊：x 很高的機率為 0, 故我們不期望 if 判斷會通過, 故較高的機率為進入 else 執行 bar() 函式。

至於 compiler 要怎麼透過此資訊優化呢, What is the advantage of GCC's __builtin_expect in if else statements? 文章中的回答給出了可能的優化後的 assembly code：

    cmp   $x, 0
    jne   _foo
_bar:
    call  bar
    ...
    jmp   after_if
_foo:
    call  foo
    ...
after_if:

我們可以發現 assembly 中 bar label 出現在 foo label 前面, 也就是說, 若 jne 判斷為否時 (等同上方 C code 中 if 判斷為否), 程式將繼續往下執行到 bar label 的區塊, 因此免除了 jmp 指令 (jmp 指令將浪費 cpu 從 memory prefetch 到 cache 的 instruction, 而使得執行 cycle 數增加)。

回到 __branch_checker__ ,

______r = __builtin_expect(!!(x), expect);

為何這邊出現了 !!(x), 理由為 x 有可能是 0 或非 0 的整數, 透過 !!(x) 可以將值限制在 0 與 1。

試想一個情況, 我們預測 x 為非 0 的情況發生機率較大, 此時就可以將 expect 填入 1, x 只要是非 0, 則 !!(x) 的值就會是 1。

回到 likely,

#define likely(x)	__builtin_expect(!!(x), 1)

原來此 macro 就是提供 compiler 以下資訊：x 較高機率為非 0 的整數。這解釋了為什麼 list_sort() 中不直接使用 if (bits) 而要使用 if (likely(bits)) 的寫法。

likely 有一個兄弟 unlikely, 其定義如下：

# define unlikely(x)	__builtin_expect(!!(x), 0)

為 likely 的反例, 即告訴 compiler x 較高機率為 0。

我們回頭看 likely 的第一個定義：

#define likely(x)    (__branch_check__(x, 1, __builtin_constant_p(x)))

在搞懂這個定義之前, 我們需要先介紹幾個 gcc built-in function, macro 與變數：

__builtin_constant_p

Function prototype:

int __builtin_constant_p (exp);

__builtin_constant_p 定義自 gcc 9.4 的 6.59 小節 Other Built-in Functions Provided by GCC, 用來判斷傳入的參數在 compile time 是否為一常數, 若是, 則回傳 1, 同時 compiler 可以對含此參數的 expression 進行 constant folding 優化, 否則回傳 0。

__branch_check__

再來我們看到另一個巨集 __branch_check__：

#define __branch_check__(x, expect, is_constant) ({         \
    long ______r;                   \
    static struct ftrace_likely_data        \
        __aligned(4)                \
        __section("_ftrace_annotated_branch")   \
        ______f = {             \
        .data.func = __func__,          \
        .data.file = __FILE__,          \
        .data.line = __LINE__,          \
    };                      \
    ______r = __builtin_expect(!!(x), expect);  \
    ftrace_likely_update(&______f, ______r,     \
        expect, is_constant);  \
    ______r;                    \
})

`func`, `FILE`, `LINE`

首先, 我們先研究 designated initializers 中的 __func__, __FILE__ 與 __LINE__, 其中除了 __func__ 屬於 gcc extension 的 pre-defined 變數, __FILE__ 與 __LINE__ 均屬於 C standard 的 macro。

__func__：取代 __FUNCTION__, 此變數儲存當前的函式名稱, 其中的原理為 gcc 偷偷將 static const char __func__[] = "function-name"; 加到 function 的開頭。
__FILE__：表示目前 input file 的名稱。
__LINE__：表示目前 input file 執行到第幾行。

gcc 9.4 另外提到 __FILE__ 與 __LINE__ 在產生 error message 時很有幫助, 如以下用法：

fprintf (stderr, "Internal error: "
                 "negative string length "
                 "%d at %s, line %d.",
         length, __FILE__, __LINE__);

`struct ftrace_likely_data` & `struct ftrace_branch_data`

__branch_check__, 裡頭的 struct ftrace_likely_data 是什麼呢？根據 ftrace wikipedia, ftrace 即 function tracer, 是 linux kernel 中用來追蹤函式呼叫的工具。

struct ftrace_likely_data 的定義如下, 取自 include/linux/compiler_types.h：

struct ftrace_likely_data {
	struct ftrace_branch_data	data;
	unsigned long			constant;
};

其中的第一個成員的 type struct ftrace_branch_data 之定義如下：

struct ftrace_branch_data {
    const char *func;
    const char *file;
    unsigned line;
    union {
        struct {
            unsigned long correct;
            unsigned long incorrect;
        };
        struct {
            unsigned long miss;
            unsigned long hit;
        };
        unsigned long miss_hit[2];
    };
};

根據欄位名稱與 __branch_check__ 中的 designated initializers 區塊不難猜到, func, file 與 line 對應到上面提到的 __func__, __FILE__ 與 __LINE__。

而 union 包起來的部分在原始碼中缺乏註解, 故只能透過名稱猜測 miss, hit 即表示 branch miss 與 branch hit 的次數, 至於 correct, incorrect 則有待考察。

__aligned

__aligned 的定義 "猜測" 如下(在 include/linux/compiler_types.h 中可以看到不少以雙底線為開頭, 不以雙底線結尾的變數, 其都是為了簡化冗長的 __attribute__(()) macro)。

#define __aligned(alignment) __attribute__((aligned(alignment)))

參考自 gcc 9.4 第 6.34.1 章 Common Variable Attributes aligned 指定了變數或 struct field 的最小 alignment。

Data alignment 的重要性參考你所不知道的 C 語言：記憶體管理、對齊及硬體特性。

以下作個簡單的實驗：

struct s {
    char a;    // 1 byte, will be add 3 byte padding by compiler
    int b;     // 4 byte
} t;

以上的 struct s 因為 a 只佔一個 1 byte, 以 32 位元 cpu 來說, 這會導致下一個變數 b 的位置沒有對齊 4 byte, 於是 compiler 通常都會自動加上 padding 以符合 alignment。

我們透過印出 sizeof(struct s) 與 a 及 b 的 address 可以驗證上面的說法：

$ ./test
sizeof(struct s) = 8
address of a = 0x6827010
address of b = 0x6827014

我們修改一下 struct s, 在 b 的定義加上__attribute__((aligned(8))), 使得 b 對齊 8 byte：

struct s {
    char a;
    int b __attribute__((aligned(8)));
} t;

執行程式結果如下：

sizeof(struct t) = 16
address of a = 0x7dd2c020
address of b = 0x7dd2c028

可以發現, a 與 b 的 address 相差了 8 個 byte, 故 b 的確對齊了 8 byte。

這邊有個疑問, 為何 b 的 size 從 4 byte 提升到 8 byte, 也就是為何 sizeof(struct t) = 8 + 8 = 16 而不是 sizeof(struct t) = 8 + 4 = 12？

回到 __branch_check__ 的定義,

    ...               
    static struct ftrace_likely_data    \
        __aligned(4)                    \
        __section("_ftrace_annotated_branch")   \
        ______f = {                        \
        ...
        }
    ...

其中的 __aligned(4) 即是指定此結構體的變數 ______f 的 address 要對齊 4 byte。

__section

如同 __aligned , __section 的定義 "猜測" 如下

#define __section(section_name) __attribute__((section(section_name)))

參考自 gcc 9.4 第 6.34.1 章 Common Variable Attributes, compiler 通常將 global object 放置在 section bss(未初始化) 或 data(初始化), 但某些情況可能希望將 object 放置到特定或自定的 section 中, 此時透過 section attribute 即可達到此目的。

__branch_check__ 中使用到 __section 指定該 object 要使用到 "_ftrace_annotated_branch" section, 但程式碼註解中沒有提到有關訊息, 故此 section 還有待查證。

__section("_ftrace_annotated_branch")

ftrace_likely_update

此函式定義在 kernel/trace/trace_branch.c。

此部份需 trace 完整的程式碼, 待以後實力更堅強時再完成此部份。

likely (第一種實作)

回到 likely 的第一種實作：

#define likely(x)    (__branch_check__(x, 1, __builtin_constant_p(x)))

#define __branch_check__(x, expect, is_constant) ({         \
    long ______r;                   \
    static struct ftrace_likely_data        \
        __aligned(4)                \
        __section("_ftrace_annotated_branch")   \
        ______f = {             \
        .data.func = __func__,          \
        .data.file = __FILE__,          \
        .data.line = __LINE__,          \
    };                      \
    ______r = __builtin_expect(!!(x), expect);  \
    ftrace_likely_update(&______f, ______r,     \
        expect, is_constant);  \
    ______r;                    \
})

其回傳的結果與第一種實作相同, 均為

__builtin_expect(!!(x), expect);

但透過定義 struct ftrace_likely_data 記錄一些額外的資訊如 __func__, __FILE__ 與 __LINE__, 並透過 ftrace_likely_update() 傳入 ftrace 的 code 中 (此部分尚需研究), 如此變可以使 ftrace 提供更多資訊給 kernel 開發者。

但第一種實作增加了記錄 branch 資訊的 overhead, 我猜測為了效能考量, linux kernel 預設的 likely 應該是第二種實作, 第一種實作應是作為開發者版本使用。

list_sort()

回到 list_sort() 程式碼：

if (likely(bits)) {
    struct list_head *a = *tail, *b = a->prev;

    a = merge(priv, cmp, b, a);
    /* Install the merged result in place of the inputs */
    a->prev = b->prev;
    *tail = a;
}

此時 a, b 指向準備進行 merge 的大小為

2^{k}

的兩個sublists, 傳入 merge 函式並得到合併後的 sublist。

merge 完之後透過以下程式碼將新的 node 加入 pending 中。

/* Move one element from input list to pending */
list->prev = pending;
pending = list;
list = list->next;
pending->next = NULL;
count++;

list_sort() 將持續作以上的所有操作, 直到 list == NULL (list 的所有節點都已進到 pending), 最後透過 merge_final() 將所有 sublists 合併, 並把原本被破壞掉的 prev 加回去, 使 head 恢復成 circular doubly-linked list。