# <font color = "orange">Intro</font>

<font size = 4>**2024 Fall NCU Linux OS Project 2**</font>

* Implement a custom wait-queue-like mechanism in kernel space and let user applications use it through a system call.
* A thread can be added to the wait queue to wait; threads are then removed from the wait queue and allowed to exit in **FIFO** order.

**Link :** [Project2.pdf](https://github.com/gary7102/Linux-Wait-Queue/blob/main/Project_2.pdf) , [github](https://github.com/gary7102/Linux-Wait-Queue/tree/main) , [hackmd](https://hackmd.io/@gary7102/SyBMtkdHkx) , [DEMO](https://hackmd.io/@gary7102/SJrFJ2181l)

<font size = 4>**Environment**</font>

```
OS: Ubuntu 22.04 on VMware Workstation 17
ARCH: x86_64
Kernel Version: 5.15.137
```

# <font color = "orange">Wait Queue Implementation</font>

<font size = 4>**The following is the kernel-space code**</font>

<font size = 4>**wait queue node**</font>

```c
typedef struct wait_queue_node {
    struct list_head list;
    struct task_struct *task;
} wait_queue_node_t;
```

* Each node represents one entry in `my_wait_queue`, and each entry corresponds to one process. It stores:
    * `list_head list`: a `list_head` (circular doubly linked list) that chains all waiting processes together.
    * `task_struct *task`: a pointer to the waiting process's own process descriptor.

<font size = 4>**global var.**</font>

```c
static LIST_HEAD(my_wait_queue);    // self-defined FIFO queue
static spinlock_t queue_lock;       // protects list_add, list_del
```

* `my_wait_queue`: `LIST_HEAD` defines an empty circular doubly linked list that serves as the head node of the wait queue.
* `queue_lock`: a spin lock (`spinlock_t`) that protects `my_wait_queue` against concurrent access from multiple threads and prevents race conditions.
* Only one thread can hold the lock at a time, so only one thread can operate on `my_wait_queue` at any moment; add and delete can never run concurrently.

<font size = 4>**enter_wait_queue**</font>

```c
static int enter_wait_queue(void)
{
    wait_queue_node_t *node;

    // allocate a new node
    node = kmalloc(sizeof(wait_queue_node_t), GFP_KERNEL);
    if (!node)
        return 0;

    node->task = current;

    spin_lock(&queue_lock);
    ////////// Critical Region ///////////////////
    list_add_tail(&node->list, &my_wait_queue);   // append to the FIFO queue
    printk(KERN_INFO "Thread ID %d added to wait queue\n", current->pid);
    ////////// Critical Region End ///////////////
    spin_unlock(&queue_lock);

    set_current_state(TASK_INTERRUPTIBLE);   // mark as interruptible sleep
    schedule();                              // go to sleep

    return 1;   // joined successfully
}
```

* `spin_lock` protects `list_add_tail`, so only one node is added to `my_wait_queue` at a time and no race condition can occur.
* `list_add_tail` appends the node to the tail of the queue.
* The process state is set to `TASK_INTERRUPTIBLE` and `schedule()` puts the current process to sleep; it stays in the wait queue until `wake_up_process` is called on it.

:::warning
Remember to call `schedule();` outside the spinlock.
If `schedule();` is called inside the critical region, the process goes to sleep while still holding the lock, which leads to a deadlock.
:::

<font size = 4>**clean_wait_queue**</font>

```c
static int clean_wait_queue(void)
{
    wait_queue_node_t *node, *temp;

    spin_lock(&queue_lock);
    ////////// Critical Region ///////////////////
    list_for_each_entry_safe(node, temp, &my_wait_queue, list) {
        list_del(&node->list);   // remove from the queue
        printk(KERN_INFO "Thread ID %d removed from wait queue\n", node->task->pid);
        wake_up_process(node->task);   // wake up the process
        kfree(node);                   // free the memory
        msleep(100);                   // sleep 100 ms
    }
    ////////// Critical Region End ///////////////
    spin_unlock(&queue_lock);

    return 1;   // cleaned up successfully
}
```

* `list_for_each_entry_safe` iterates over the wait queue.
* `list_del` removes the node from the queue.
* `wake_up_process` wakes the process up.
* The node's memory is then freed.

:::success
`msleep(100)` is added here because the print order in user space can otherwise get scrambled by multi-threading.
Even though the kernel-side wait queue removes entries strictly in FIFO order, user-space I/O gives no guarantee that the messages are printed in that order.
So a 100 ms sleep is added after `wake_up_process` to give user space time to print its message before the next thread is woken.
Strictly speaking, sleeping inside the spinlock-protected region runs into the same problem described in the warning above; a safer variant would collect the nodes first and drop the lock before calling `wake_up_process`/`msleep`.
:::
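Both helpers are reached from user space through system call number 452 (see the user-space code further below, which passes `1` to enter the queue and `2` to clean it). The dispatcher itself is not shown above; below is a minimal sketch of what it might look like, following the same `SYSCALL_DEFINE1(call_my_wait_queue, int, id)` pattern used by the alternative implementation at the end of this post. The `-EINVAL` fallback is my own addition, not part of the project spec.

```c
#include <linux/syscalls.h>   // SYSCALL_DEFINE1

/*
 * Hypothetical dispatcher for the two helpers above:
 * command 1 enqueues the calling thread and puts it to sleep,
 * command 2 drains the queue, waking the sleepers in FIFO order.
 */
SYSCALL_DEFINE1(call_my_wait_queue, int, id)
{
    switch (id) {
    case 1:
        return enter_wait_queue();
    case 2:
        return clean_wait_queue();
    default:
        return -EINVAL;   // unknown command (assumption)
    }
}
```

With a dispatcher like this in place, the user-space test program only needs to pass `1` or `2` as the syscall argument, which is exactly what `wait_queue.c` below does.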
# <font color = "orange">Function explanation</font>

<font size = 4>**list_head**</font>

```c
/*
 * Circular doubly linked list implementation.
 *
 * Some of the internal functions ("__xxx") are useful when
 * manipulating whole lists rather than single entries, as
 * sometimes we already know the next/prev entries and we can
 * generate better code by using them directly rather than
 * using the generic single-entry routines.
 */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define LIST_HEAD(name) \
	struct list_head name = LIST_HEAD_INIT(name)
```

As the comment says, this is a circular doubly linked list, the data structure used to store the wait queue.
Both `next` and `prev` of the list head point to the variable `name` itself, forming an empty circular doubly linked list.
In other words, `LIST_HEAD` defines an empty doubly linked list that serves as the head node of the wait queue.

<font size = 4>**list_add_tail**</font>

```c
/*
 * Insert a new entry between two known consecutive entries.
 *
 * This is only for internal list manipulation where we know
 * the prev/next entries already!
 */
static inline void __list_add(struct list_head *new,
			      struct list_head *prev,
			      struct list_head *next)
{
	if (!__list_add_valid(new, prev, next))
		return;

	next->prev = new;
	new->next = next;
	new->prev = prev;
	WRITE_ONCE(prev->next, new);
}

// ...

/**
 * list_add_tail - add a new entry
 * @new: new entry to be added
 * @head: list head to add it before
 *
 * Insert a new entry before the specified head.
 * This is useful for implementing queues.
 */
static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
	__list_add(new, head->prev, head);   // = add(new, tail-node, head)
}
```

* `head` is the head node of the circular linked list.
* `head->prev` is the current tail node, so the new node is inserted between the tail node and the head node, i.e. appended to the end of the linked list.
* Inside `__list_add`, `new` is simply inserted between `prev` and `next` with four pointer updates:
    * `next->prev = new` and `new->next = next`
    * `new->prev = prev` and `prev->next = new`

**Diagram:**

```markdown
# circularly linked list before add:
next <-> A <-> B <-> C <-> prev

# append the new node at the end:
after add : next <-> A <-> B <-> C <-> prev <-> new
```

<font size = 4>**list_del**</font>

```c
/*
 * Delete a list entry by making the prev/next entries point to each other.
 * This is only for internal list manipulation where we know the prev/next entries already!
 */
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
	next->prev = prev;
	WRITE_ONCE(prev->next, next);
}

/*
 * Check validity of deletion
 */
static inline void __list_del_entry(struct list_head *entry)
{
	if (!__list_del_entry_valid(entry))
		return;

	__list_del(entry->prev, entry->next);
}

/**
 * list_del - deletes entry from list.
 * @entry: the element to delete from the list.
 * Note: list_empty() on entry does not return true after this, the entry is
 * in an undefined state.
 */
static inline void list_del(struct list_head *entry)
{
	__list_del_entry(entry);
	entry->next = LIST_POISON1;   // (void *) 0x00100100, invalid addr.
	entry->prev = LIST_POISON2;   // (void *) 0x00200200, invalid addr.
}
```

* Deletes the given node from the doubly linked list (`list_head`).
* The node is unlinked by `next->prev = prev` and `prev->next = next`.
* `LIST_POISON1` and `LIST_POISON2` explicitly mark the deleted node as invalid, so accidental reuse can be detected.

**Diagram:**

```markdown
# circularly linked list before delete:
head <-> A <-> prev <-> dnode <-> next

# Delete dnode:
after delete : head <-> A <-> prev <-> next
```
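These two operations are all the FIFO queue needs: append at the tail, remove from the front. Before moving on to the iteration macro, here is a minimal user-space sketch (plain C, not kernel code) that mimics the pointer updates described above; the simplified helpers drop `WRITE_ONCE` and the validity checks, and every name in it is a local re-implementation for illustration only.

```c
#include <stdio.h>

/* Simplified user-space re-implementation of the list primitives above. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_add_tail(struct list_head *new, struct list_head *head)
{
    struct list_head *prev = head->prev;   /* current tail node */
    head->prev = new;
    new->next = head;
    new->prev = prev;
    prev->next = new;
}

static void list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
}

struct node {
    struct list_head list;   /* list is the first member, so a cast recovers the node */
    int id;
};

int main(void)
{
    struct list_head queue = { &queue, &queue };   /* empty circular list, like LIST_HEAD() */
    struct node nodes[3] = { { .id = 0 }, { .id = 1 }, { .id = 2 } };

    for (int i = 0; i < 3; i++)
        list_add_tail(&nodes[i].list, &queue);     /* enqueue 0, 1, 2 at the tail */

    while (queue.next != &queue) {                 /* dequeue from the front: FIFO */
        struct node *n = (struct node *)queue.next;
        list_del(&n->list);
        printf("dequeued %d\n", n->id);            /* prints 0, 1, 2 */
    }
    return 0;
}
```

Compiled with any C compiler, this prints `dequeued 0`, `1`, `2` in that order — the same FIFO behaviour the kernel code relies on.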
<font size = 4>**list_for_each_entry_safe**</font>

```c
/**
 * list_for_each_entry_safe - iterate over list of given type safe against removal of list entry
 * @pos:	the type * to use as a loop cursor.
 * @n:		another type * to use as temporary storage
 * @head:	the head for your list.
 * @member:	the name of the list_head within the struct.
 */
#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = list_first_entry(head, typeof(*pos), member),	\
		n = list_next_entry(pos, member);			\
	     !list_entry_is_head(pos, head, member);			\
	     pos = n, n = list_next_entry(n, member))
```

* `pos` holds the current node and `n` holds the node after `pos` (often kept in a variable named `tmp`).
* `n` guarantees that the address of the next node is still available after the current node (`pos`) has been deleted.
* After `pos` is deleted, the loop advances to the next node via `pos = n` and `n = list_next_entry()`, so deleting entries inside the loop is safe.

# <font color = "orange">**User space code**</font>

:::spoiler `wait_queue.c`
```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/syscall.h>

#define NUM_THREADS 10
#define SYS_CALL_MY_WAIT_QUEUE 452

void *enter_wait_queue(void *thread_id)
{
    pid_t tid = syscall(SYS_gettid);   // get thread id
    fprintf(stderr, "enter wait queue thread_id: %d, current thread id = %d\n",
            *(int *)thread_id, tid);

    int result;
    result = syscall(SYS_CALL_MY_WAIT_QUEUE, 1);

    fprintf(stderr, "exit wait queue thread_id: %d, current thread id = %d\n",
            *(int *)thread_id, tid);
    return NULL;
}

void *clean_wait_queue()
{
    int result;
    result = syscall(SYS_CALL_MY_WAIT_QUEUE, 2);
    return NULL;
}

int main()
{
    void *ret;
    pthread_t id[NUM_THREADS];
    int thread_args[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++) {
        thread_args[i] = i;
        pthread_create(&id[i], NULL, enter_wait_queue, (void *)&thread_args[i]);
    }

    sleep(1);
    fprintf(stderr, "start clean queue ...\n");
    clean_wait_queue();

    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(id[i], &ret);
    }

    return 0;
}
```
:::

# Execution results

<font size = 4>__User space result:__</font>

![image](https://hackmd.io/_uploads/rJpGeocSJe.png)

Checking both the `i` value from the for loop in the original user-space code and the real thread IDs shows that the order of entering and leaving the wait queue is kept FIFO.

<font size = 4>__Kernel space result:__</font>

![image](https://hackmd.io/_uploads/Syjrmj9Hkl.png)

# Another implementation

:::spoiler my_wait_queue.c
```c
#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/syscalls.h>
#include <linux/delay.h>

static int condition = 0;
DECLARE_WAIT_QUEUE_HEAD(my_wait_queue);
static DEFINE_MUTEX(my_wait_queue_lock);

static int enter_wait_queue(void) {
    condition = 0;
    DECLARE_WAITQUEUE(wait_entry, current);
    mutex_lock(&my_wait_queue_lock);   // lock
    add_wait_queue_exclusive(&my_wait_queue, &wait_entry);
    printk("before join wait queue, %d", current->pid);

    wait_event_exclusive_cmd(my_wait_queue, condition, mutex_unlock(&my_wait_queue_lock);, mutex_lock(&my_wait_queue_lock););
    printk("after waiting event, %d", current->pid);
    remove_wait_queue(&my_wait_queue, &wait_entry);
    if (waitqueue_active(&my_wait_queue) != 0) {
        msleep(100);
        wake_up(&my_wait_queue);
    }
    mutex_unlock(&my_wait_queue_lock);   // unlock
    return 0;
}

static int clean_wait_queue(void) {
    printk("start waking up process.");
    mutex_lock(&my_wait_queue_lock);   // lock
    condition = 1;
    if (waitqueue_active(&my_wait_queue) != 0)
        wake_up(&my_wait_queue);
    mutex_unlock(&my_wait_queue_lock);   // unlock
    return 0;
}

SYSCALL_DEFINE1(call_my_wait_queue, int, id)
{
    switch (id) {
    case 1:
        enter_wait_queue();
        break;
    case 2:
        clean_wait_queue();
        break;
    }
    return 0;
}
```
:::

## Global and static variable initialization

```c
static int condition = 0;
DECLARE_WAIT_QUEUE_HEAD(my_wait_queue);
static DEFINE_MUTEX(my_wait_queue_lock);
```

This block initializes the variables.

`6:` First, the `condition` variable, which `wait_event()` uses to decide whether the process already has the resource it needs, is set to 0 (telling `wait_event` that the process cannot get the resource yet and has to wait).

`7:` `DECLARE_WAIT_QUEUE_HEAD()` creates a wait queue head instance as a global variable: `my_wait_queue`.

`8:` Defines a static mutex lock instance: `my_wait_queue_lock`.
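Before walking through the two functions line by line, it may help to recall the generic waiter/waker pattern these three variables exist for. The sketch below is purely illustrative (the `demo_*` names are made up and are not part of this project's code); the key point is that the waker sets the condition *before* calling `wake_up()`, so a woken process sees it as true when it re-checks.

```c
#include <linux/wait.h>
#include <linux/kernel.h>

/* Illustrative only: the canonical kernel waiter/waker pattern. */
static int demo_condition;
static DECLARE_WAIT_QUEUE_HEAD(demo_wq);

static void demo_waiter(void)
{
	/* sleep until demo_condition becomes non-zero (returns early on a signal) */
	if (wait_event_interruptible(demo_wq, demo_condition != 0))
		printk(KERN_INFO "demo_waiter interrupted by a signal\n");
}

static void demo_waker(void)
{
	demo_condition = 1;	/* publish the condition first ...               */
	wake_up(&demo_wq);	/* ... then wake the sleepers, which re-check it */
}
```

This project uses `wait_event_exclusive_cmd()` instead of plain `wait_event_interruptible()`, so the mutex can be dropped while sleeping and each `wake_up()` wakes exactly one exclusive waiter, as explained below.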
## enter_wait_queue()

```c
static int enter_wait_queue(void) {
    condition = 0;
    DECLARE_WAITQUEUE(wait_entry, current);
    mutex_lock(&my_wait_queue_lock);   // lock
    add_wait_queue_exclusive(&my_wait_queue, &wait_entry);
    printk("before join wait queue, %d", current->pid);

    wait_event_exclusive_cmd(my_wait_queue, condition, mutex_unlock(&my_wait_queue_lock);, mutex_lock(&my_wait_queue_lock););
    printk("after waiting event, %d", current->pid);
    remove_wait_queue(&my_wait_queue, &wait_entry);
    if (waitqueue_active(&my_wait_queue) != 0) {
        msleep(100);
        wake_up(&my_wait_queue);
    }
    mutex_unlock(&my_wait_queue_lock);   // unlock
    return 0;
}
```

`11:` Resets `condition` to 0 so that a process that wants to join the wait queue really enters the wait state.

`12:` Uses the `DECLARE_WAITQUEUE()` macro from `wait.h` to create a new wait queue entry.

`13:` Takes the lock so that no race condition can occur during the `add_wait_queue()` operation.

`14:` Uses the `wait.h` function `add_wait_queue_exclusive()` to add the process to the wait queue. The exclusive variant is chosen for two reasons: first, a plain `wake_up()` will then wake only one process at a time (which reduces the chance of the printf order getting scrambled in the user-space code that the assignment does not allow us to modify); second, `add_wait_queue_exclusive()` appends the process to the tail of the wait queue, which is exactly the FIFO behaviour this project requires.

`17:` Uses `wait_event_exclusive_cmd()` (also from `wait.h`) to put the process into the wait state (it has to wait because the required resource is not available, i.e. `condition = 0`). The last two `cmd` arguments are executed right before the process starts waiting and right after it is woken up, respectively; here they unlock the mutex before waiting and re-acquire it after being woken. <font color=red> **Once the process enters the wait state, it is effectively suspended at this line.** </font>

`18:` <font color=green>After the process is woken up (and has re-acquired the mutex lock), execution resumes from this line.</font>

`19:` Uses the `wait.h` function `remove_wait_queue()` to remove the process from the wait queue.

`20~23:` The process checks whether there are still other processes waiting in the wait queue and, if so, wakes the next one (again to cope with the printf ordering problem in the user-space code that must not be modified).

`24:` Before the woken process returns to user space to continue running, it unlocks the mutex so that the process it just woke with `wake_up()` can start executing.

## clean_wait_queue()

```c
static int clean_wait_queue(void) {
    printk("start waking up process.");
    mutex_lock(&my_wait_queue_lock);   // lock
    condition = 1;
    if (waitqueue_active(&my_wait_queue) != 0)
        wake_up(&my_wait_queue);
    mutex_unlock(&my_wait_queue_lock);   // unlock
    return 0;
}
```

The implementation of this function is essentially the same as lines `19~24` of `enter_wait_queue()` above, which are responsible for waking other processes; its main job is to wake up the first process.

## Execution results:

### UserSpace output:

![image](https://hackmd.io/_uploads/ry9hJ21Iye.png)

### KernelSpace output:

![image](https://hackmd.io/_uploads/BJzXlnkUJl.png)

# Reference

* [Waitqueue in Linux](https://embetronicx.com/tutorials/linux/device-drivers/waitqueue-in-linux-device-driver-tutorial/)
* [bootlin](https://elixir.bootlin.com/linux/v5.15.137/source/include/linux/syscalls.h)