Intrusive Linked List in Linux Kernel

本文適合讀者：熟悉基本 Linked List 原理、熟悉 C 語法與指標概念。

這一切源自於研究所考試時，OS 科出了一題：請解釋 instrusive linked list 相比於 traditional linked list 的好處，考試時是我第一次聽到這東西，答案也就隨意亂掰，可想而知那一題大概 15 分我八成是 1 分都沒拿到 XD。

兩個半月後，OS 課出了關於 CPU Scheduling 的作業，因為作業環境是採用 Unix-like 的 xv6，作業實作中就碰巧需要使用到 instrusive linked list API。剛開始看到的時候我還認不出這就是 Intrusive Linked List，認真研究一波後才發現原來這又是一次我跟 Intrusive Linked List 命運的相遇哈哈哈，所以決定來徹底搞懂它，順便寫成一篇文件。

Intrusive Linked List 起手式：它長什麼樣子?

linux/list.h in GitHub 實作了 Intrsuive Linked List 的大部分內容，但生啃 kernel code 對菜鳥如我來說實在是吃不消，所以以下我會循序漸進地拆解各個部分講解，再組合起來一次理解整個觀念。

先來看看你我都熟悉的 Traditional Linked List 常見的 Implementation:


















/* C code: Traditional Linked List Example */
// 這是某個你自己需要用到的 Structure
typedef struct student {
    int id;
    char *name;
} Student;

// 然後才會有一個 Node Structure，讓每個 Node 代表一個 Student
typedef struct student_node {
    Student *stu;
    struct student_node *prev;
    struct student_node *next;
} StudentNode;

// 當你想宣告一個 student node，你就會
Student stu = { .id=1, .name="MyName" };
StudentNode *student = { .stu=&stu, .prev=NULL, .next=NULL };
// 因此你就可以用 student->prev->stu 之類的來找到該 student node 前後的 student。

示意圖如下：每個 Node 包著一個 object pointer pointing to 你真正需要的 object。

(Source: https://www.data-structures-in-practice.com/intrusive-linked-lists/)

Intrusive Linked List 就完全不一樣了，以下我從 xv6/list.h 拼貼出一段 Intrusive Linked List 的例子：














/* C code: Intrusive Linked List Example */
// 先定義一個 struct list_head，這個 list_head 將會被用來埋在實際的 object struct 中
struct list_head {
    struct list_head *next, *prev;
};

// 這是主要的 object struct，這裡以 thread 為例
struct thread {
    void (*fp)(void *arg);  // thread function
    void *arg;              // function arguments
    ...
    struct list_head thread_list; // Intrusive Linked List
    ...
};

如此一來，每個 struct thread 之間就會有一條 pointer，而所有 struct thread 則會形成一條 linked list。

示意圖如下：每個 object 自帶一個 struct list_head，而該 list_head 有個 pointer pointing to 下一個 object 的 list_head。
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
(Source: https://www.data-structures-in-practice.com/intrusive-linked-lists/)

那你可能會想：欸這樣的話，假設現在我有一個 linked list of struct thread，而 struct list_head head 是指向這條 linked list 的頭的指標，那我要怎麼拿到這條 linked list 的第一支 thread（也就是 head 指向的 thread）呢？

head->next 似乎行不通，因為 struct list_head 的 member 只有 prev 跟 next，並沒有你想要的 thread 資料。這時候就要讀讀下面關於 container_of（或 list_entry）的講解啦！

`container_of`/`list_entry` 說明

延續上面的問題，要拿到 head 的下一個 thread，正確作法是：

struct thread *th = list_entry(head->next, typeof(*th), thread_list);

其中，list_entry(...) 是一個 macro，相關定義如下：

#define list_entry(ptr, type, member) container_of(ptr, type, member)

由此可知 list_entry 只是 container_of 的 alias。container_of 定義如下：

#define container_of(ptr, type, member) ({                  \
    const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
    (type *)( (char *)__mptr - offsetof(type,member) );     \
})

先不要被這串妖魔鬼怪嚇到，我們逐一來分解它。

首先是最外面的 ({...})，這是 statement expression 語法，最後一行 evaluate 後的值會回傳並賦值給 th。
typeof 是 GNU C 的 extension 語法，如果是在 standard C 下的話需要改為 __typeof__。typeof(a) 會替換成 a 的型別，通常用在宣告變數時，如：假設 a 是一個 int ，則 typeof(a) b = 0; 基本上等價於 int b = 0;。
注意觀察 ((type *)0)->member，這應該是整段 #define 最難理解的地方。這句話的意思是我先把 0 型別轉換成指向 type 的指標，再從這個指標把 member 這個 member 提取出來。這整句話並沒有違法，因為我們沒有 dereference 它，也沒有 access 它，我們只是想知道這個東東的型別是什麼（外面包的一層 typeof()）。

到目前為止你如果用心的 trace 一下 code，可以發現第一句的 typeof( ((type *)0)->member )，其實就等於你的 Intrusive Linked List 的型別 struct list_head。之所以那麼拐彎抹角，其實只是在檢查型別有沒有正確而已。因此第一句執行完，我們相當於宣告了一個型別為 struct list_head 的 *__mptr，而且它的值跟 ptr 一樣。

接下來，只要把 __mptr 減掉適當的 offset，就可以獲得 ptr 所在的那個 thread 並賦值給 th，也就是我們的所求 head 的下一個 thread。什麼意思呢？請參考示意圖：

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

(Source: https://www.data-structures-in-practice.com/intrusive-linked-lists/)

將整個大長方形視為一個 object，這個 object 的 memory 從 0x10 開始，而從 0x18 開始紀錄的則是 struct list_head 的部分，對應到上面的 __mptr。因此，這之間的差距 offset = 8 ，只要把 __mptr 減掉 8 就可以拿到這個 object 的起始位址（i.e. 指標）。注意我們在相減之前還有先把 __mptr 轉成 char*，這樣 __mptr - 8 才會是減掉 8 bytes（一個 char 剛好 1 byte）。

原本的 container_of macro 中還有另一個 macro offsetof 就是拿來計算這個 offset 是多少的，這個 macro 被定義在 stddef.h 中，基本上只要知道它的用處是什麼即可，這邊就不再深究下去，有興趣的人可以到 wiki 跟這篇 stackoverflow 瞧瞧。

讀到這邊，理論上應該可以對整個 intrusive linked list 還有操作的語法有一定的掌握度了，而事實上 linux/list.h 裡還有許多依照此語法衍伸的各種實用 macro，如在 linked list 後方加入 object 的 list_add、用於 traverse 整條 linked list 的 list_for_each 等等，在理解這篇文說明的基本語法後，應該就可以更容易地看懂這些 Intrusive Linked List 操作 API 了吧！

Intrusive Linked List in Linux Kernel

Intrusive Linked List 起手式：它長什麼樣子?

container_of/list_entry 說明

Reference

Read more

113年文組跨考資工所 台清交正取/央榜首兼找教授心得

Virtualbox Ubuntu 無法開啟相機 - 解決方式

`container_of`/`list_entry` 說明

113年文組跨考資工所台清交正取/央榜首兼找教授心得