# Linux Project 3
###### tags: `Linux`

> version : Linux kernel 5.14.16, ubuntu 20.04

#### get_number_of_context_switches & get_number_of_entering_a_wait_queue

Path : **include/linux/sched.h**
* Fields in task_struct that record the number of context switches
    * [nvcsw](https://elixir.bootlin.com/linux/v5.14.16/source/include/linux/sched.h#L939) (voluntary)
    * [nivcsw](https://elixir.bootlin.com/linux/v5.14.16/source/include/linux/sched.h#L940) (involuntary)
    * [last_switch_count](https://elixir.bootlin.com/linux/v5.14.16/source/include/linux/sched.h#L991) (nvcsw+nivcsw)

### Trace

[schedule()](https://elixir.free-electrons.com/linux/v5.14.16/source/kernel/sched/core.c#L6111)

Goal: understand how a context switch works, where the switches are counted, and so on.

* `$ cat /proc/[pid]/status` shows the nvcsw and nivcsw counts (the `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` fields)

![](https://i.imgur.com/Re7Bhak.png)

* last_switch_count is assigned in copy_mm as the sum of the two counters, nvcsw and nivcsw. The field exists only under CONFIG_DETECT_HUNG_TASK and holds a snapshot for the hung-task detector, so it is not a live total.
* In schedule, `if (likely(prev != next))` checks whether a different task was picked. If the task really changes, the context switch counter is incremented.

```
static void __sched __schedule(void)
{
	...
	switch_count = &prev->nivcsw; // default: count as an involuntary switch
	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) { // task is not runnable and is not being preempted
		...
		switch_count = &prev->nvcsw; // count as a voluntary switch
	}
	...
	if (likely(prev != next)) { // we are switching between different tasks
		rq->nr_switches++;
		rq->curr = next;
		++*switch_count; // increment nvcsw or nivcsw via the pointer

		context_switch(rq, prev, next); /* unlocks the rq */
		/*
		 * The context switch have flipped the stack from under us
		 * and restored the local variables which were saved when
		 * this task called schedule() in the past. prev == current
		 * is still correct, but it can be moved to another cpu/rq.
		 */
		cpu = smp_processor_id();
		rq = cpu_rq(cpu);
	} else
		raw_spin_unlock_irq(&rq->lock);
	...
}
/* not running && not being preempted => voluntary */
```

This excerpt follows an older kernel's `__schedule()`. In 5.14.16 the same decision is written as `if (!preempt && prev_state)`, where `preempt` is the function's argument and `prev_state` is read from `prev->__state`.

* add new field in task_struct
    * [task_struct](https://elixir.bootlin.com/linux/v5.14.16/source/include/linux/sched.h#L1403)

```c=
struct task_struct {
	......
	unsigned long wait_q_times; /* number_of_entering_a_wait_queue */

	/*
	 * New fields for task_struct should be added above here, so that
	 * they are included in the randomized portion of task_struct.
	 */
	......
};
```

---

Path : **arch/x86/kernel/process.c**
* initialize the value in copy_thread
    * [copy_thread](https://elixir.free-electrons.com/linux/v5.14.16/source/arch/x86/kernel/process.c#L119)

```c=
int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
		struct task_struct *p, unsigned long tls)
{
	......
	p->wait_q_times = 0;
	......
}
```

---

* increment the value in [try_to_wake_up](https://elixir.free-electrons.com/linux/v5.14.16/source/kernel/sched/core.c#L3712), which is called by [default_wake_function](https://elixir.free-electrons.com/linux/v5.14.16/source/kernel/sched/core.c#L6435)

```c=
static int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
{
	......
	p->wait_q_times++;
	......
}
```

### wait queue trace:

Path : **include/linux/wait.h**

```c=
....
typedef int (*wait_queue_func_t)(struct wait_queue_entry *wq_entry, unsigned mode, int flags, void *key);
int default_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int flags, void *key);
....
/*
 * A single wait-queue entry structure:
 */
struct wait_queue_entry {
	unsigned int		flags;
	void			*private;
	wait_queue_func_t	func;
	struct list_head	entry;
};
...
#define __WAITQUEUE_INITIALIZER(name, tsk) {		\
	.private	= tsk,				\
	.func		= default_wake_function,	\
	.entry		= { NULL, NULL } }

#define DECLARE_WAITQUEUE(name, tsk)			\
	struct wait_queue_entry name = __WAITQUEUE_INITIALIZER(name, tsk)
...
```

Declaring an element of type wait_queue_entry with DECLARE_WAITQUEUE shows that the element's default func is default_wake_function.

A process that needs to enter a wait queue calls wait_event.

![](https://i.imgur.com/RLxrXGa.png)

Inside the for loop of [__wait_event](https://elixir.free-electrons.com/linux/v5.14.16/source/include/linux/wait.h#L274), `prepare_to_wait_event` checks whether the process has a pending signal. If it does, the function returns a nonzero value; if not, it adds the process to the wait queue. If the condition tested inside the for loop is still not satisfied, the process runs cmd ( schedule() ).

![](https://i.imgur.com/pOSR2G8.png)

Path : **kernel/sched/wait.c**

[prepare_to_wait_event](https://elixir.free-electrons.com/linux/v5.14.16/source/kernel/sched/wait.c#L295)

![](https://i.imgur.com/y9h4zlo.png)

* wake_up() mainly calls __wake_up_common(), which walks the wait_queue_entry_t entries on the wait queue and calls each entry's wake-up function in turn to wake the process
    * `curr->func`, which defaults to default_wake_function

```c=
static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
			int nr_exclusive, int wake_flags, void *key,
			wait_queue_entry_t *bookmark)
{
	wait_queue_entry_t *curr, *next;
	int cnt = 0;

	lockdep_assert_held(&wq_head->lock);

	if (bookmark && (bookmark->flags & WQ_FLAG_BOOKMARK)) {
		curr = list_next_entry(bookmark, entry);

		list_del(&bookmark->entry);
		bookmark->flags = 0;
	} else
		curr = list_first_entry(&wq_head->head, wait_queue_entry_t, entry);

	if (&curr->entry == &wq_head->head)
		return nr_exclusive;

	list_for_each_entry_safe_from(curr, next, &wq_head->head, entry) {
		unsigned flags = curr->flags;
		int ret;

		if (flags & WQ_FLAG_BOOKMARK)
			continue;

		ret = curr->func(curr, mode, wake_flags, key);
		if (ret < 0)
			break;
		if (ret && (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
			break;

		if (bookmark && (++cnt > WAITQUEUE_WALK_BREAK_CNT) &&
				(&next->entry != &wq_head->head)) {
			bookmark->flags = WQ_FLAG_BOOKMARK;
			list_add_tail(&bookmark->entry, &next->entry);
			break;
		}
	}

	return nr_exclusive;
}
```

#### System call

1.
   get_number_of_context_switches

```c=
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/syscalls.h>
#include <asm/current.h>

SYSCALL_DEFINE1(get_number_of_context_switches, unsigned long __user *, result)
{
	struct task_struct *task = current;
	/* total = voluntary + involuntary switches of the calling task */
	unsigned long times = task->nvcsw + task->nivcsw;

	if (copy_to_user(result, &times, sizeof(times)) != 0) {
		pr_err("get_number_of_context_switches copy fail!\n");
		return -EFAULT; /* report the failure instead of returning 0 */
	}
	pr_info("get_number_of_context_switches copy success!\n");
	return 0;
}
```

2. get_number_of_entering_a_wait_queue

```c=
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/syscalls.h>
#include <asm/current.h>

SYSCALL_DEFINE1(get_number_of_entering_a_wait_queue, unsigned long __user *, result)
{
	struct task_struct *task = current;
	/* counter added to task_struct and incremented in try_to_wake_up */
	unsigned long times = task->wait_q_times;

	if (copy_to_user(result, &times, sizeof(times)) != 0) {
		pr_err("get_number_of_entering_a_wait_queue copy fail!\n");
		return -EFAULT; /* report the failure instead of returning 0 */
	}
	pr_info("get_number_of_entering_a_wait_queue copy success!\n");
	return 0;
}
```

#### User call

```c=
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

#define NUMBER_OF_ITERATIONS 99999999
#define NUMBER_OF_IO_ITERATIONS 6

int main(void)
{
	unsigned long last_switch_count, wait_q_times;
	/* volatile keeps the busy loop from being optimized away;
	 * unsigned arithmetic avoids signed-overflow undefined behavior */
	volatile unsigned long v;
	unsigned long t = 2, u = 3;
	long a;
	int i, b;

	/* blocking input: the process may sleep while waiting for each character */
	for (i = 0; i < NUMBER_OF_IO_ITERATIONS; i++) {
		v = 1;
		(void)getchar();
	}

	/* CPU-bound loop */
	for (i = 0; i < NUMBER_OF_ITERATIONS; i++)
		v = (++t) * (u++);

	/* 452: get_number_of_entering_a_wait_queue */
	a = syscall(452, (void *)&wait_q_times);
	printf("System call wait_q return %ld\n", a);
	printf("This process enters wait_q %lu times.\n", wait_q_times);

	for (i = 0; i < NUMBER_OF_IO_ITERATIONS; i++) {
		v = 1;
		printf("I love my home.\n");
	}

	/* 451: get_number_of_context_switches */
	a = syscall(451, (void *)&last_switch_count);
	printf("System call csw return %ld\n", a);
	printf("This process encounters %lu times context switches.\n", last_switch_count);

	/* keep the process alive until input arrives, e.g. so that
	 * /proc/[pid]/status can still be inspected */
	scanf("%d", &b);
	return 0;
}
```

* A context switch happens in one of the following three situations:
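1. The process cannot continue running, for example because it has to wait for I/O to complete, or for some resource or event:

   ```
   /* put the process on the wait queue and set its state to TASK_UNINTERRUPTIBLE */
   prepare_to_wait(waitq, wait, TASK_UNINTERRUPTIBLE);
   /* switch to another process */
   schedule();
   ```

2. The process is still running, but the kernel no longer lets it use the CPU: for example, its time slice is used up or a higher-priority process has arrived, so it must give up the CPU.
3. The process could keep running, but its own logic decides to hand the CPU to another process: a user program can give up the CPU through the sched_yield() system call, and kernel code can do so with cond_resched() or yield().

Context switches are either voluntary or involuntary. Situation 1 above is a voluntary switch; situations 2 and 3 are involuntary switches.

* When a voluntary switch happens, the process is no longer in the running state. For example, it is blocked waiting for I/O (TASK_UNINTERRUPTIBLE), sleeping while waiting for a resource or a specific event (TASK_INTERRUPTIBLE), or put into the TASK_STOPPED/TASK_TRACED state by debug/trace.
* When an involuntary switch happens, the process is still in the running state (TASK_RUNNING). Usually it has been preempted by a higher-priority process, or its time slice has run out.

Note: the real situation is somewhat more complicated. Because the Linux kernel supports preemption, kernel preemption can happen in the middle of a voluntary switch. Suppose a process is going to sleep; if this completed normally it would be a voluntary switch. Going to sleep is not atomic, though: the process state is first set to TASK_INTERRUPTIBLE, and only then does the switch take place. If kernel preemption happens between those two steps, the sleep is interrupted, the voluntary switch is left unfinished, and an involuntary switch takes place instead (even though the process state is no longer running at that point). The next time the process resumes, it continues and completes the sleep. This is why the logic that decides whether a switch is voluntary or involuntary must also consider whether the process is being preempted at the time of the switch; see the `__schedule()` excerpt in the Trace section.

Finally, a few cases that are easy to misunderstand:

* A process can give up the CPU by calling sched_yield(). This is not a voluntary switch but an involuntary one, because the process is still in the running state.
* Kernel code sometimes calls cond_resched() or yield() inside a long-running loop to give up the CPU, so that kernel code does not hold the CPU for too long and other processes get a chance to run. This is also an involuntary switch, because the process is still in the running state.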
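
The classification rule described above can be sketched as a small user-space model. This is only an illustration, not kernel code: `model_state`, `model_task`, `classify_switch`, and `account_switch` are hypothetical names made up for this sketch, and the state values are placeholders rather than the kernel's constants. It assumes the rule "not running and not being preempted means voluntary, everything else means involuntary".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for task states; values are illustrative only. */
enum model_state {
    MODEL_TASK_RUNNING = 0,
    MODEL_TASK_INTERRUPTIBLE = 1,
    MODEL_TASK_UNINTERRUPTIBLE = 2
};

enum switch_kind { SWITCH_INVOLUNTARY, SWITCH_VOLUNTARY };

/* Hypothetical miniature of the two per-task counters. */
struct model_task {
    enum model_state state;
    unsigned long nvcsw;   /* voluntary switches */
    unsigned long nivcsw;  /* involuntary switches */
};

/*
 * A switch counts as voluntary only when the task is no longer running
 * AND the switch was not caused by preemption. A task that was preempted
 * halfway through going to sleep is therefore counted as involuntary.
 */
enum switch_kind classify_switch(enum model_state state, bool preempted)
{
    if (state != MODEL_TASK_RUNNING && !preempted)
        return SWITCH_VOLUNTARY;
    return SWITCH_INVOLUNTARY;
}

/* Increment the counter that matches the kind of switch. */
void account_switch(struct model_task *t, bool preempted)
{
    assert(t != NULL);
    if (classify_switch(t->state, preempted) == SWITCH_VOLUNTARY)
        t->nvcsw++;
    else
        t->nivcsw++;
}
```

In this model, a sched_yield()-style switch is `account_switch(&t, false)` with `t.state == MODEL_TASK_RUNNING`, which increments `nivcsw`. A task preempted after setting its state to `MODEL_TASK_INTERRUPTIBLE` is `account_switch(&t, true)`, which also increments `nivcsw`. Only a non-running task that is not being preempted increments `nvcsw`.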