# FALL2023 Linux Operating System Project 2
###### tags: `Linux` `project`

112522105 曾尚群
112522121 馬維欣
112522028 李軒豪

## Problem
[Project 2](https://staff.csie.ncu.edu.tw/hsufh/COURSES/FALL2023/linux_project_2.html)

### Problem Overview
Write a new system call, `int my_set_process_priority(int x)`, so that every time a context switch hands the CPU to process P, the priority of P is set to x. The system call returns 0 or 1: 0 means an error occurred while executing the system call, and 1 means it completed successfully. The argument x must be between 101 and 139. Any other value is an error, so the system call returns 0 when it receives an invalid argument.

### Source code of `main.c`
We assign system call number 451 to `int my_set_process_priority(int x)`.
```c=
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>

#define TOTAL_ITERATION_NUM  100000000
#define NUM_OF_PRIORITIES_TESTED  40

int my_set_process_priority(int x)
{
    return (int)syscall(451, x);
}

int main()
{
    int index=0;
    int priority,i;
    struct timeval start[NUM_OF_PRIORITIES_TESTED], end[NUM_OF_PRIORITIES_TESTED];

    gettimeofday(&start[index], NULL);           //begin
    for(i=1;i<=TOTAL_ITERATION_NUM;i++)
        rand();
    gettimeofday(&end[index], NULL);             //end

    /*================================================================================*/

    for(index=1, priority=101;priority<=139;++priority,++index)
    {
        if(!my_set_process_priority(priority))
            printf("Cannot set priority %d\n", priority);
        gettimeofday(&start[index], NULL);           //begin
        for(i=1;i<=TOTAL_ITERATION_NUM;i++)
            rand();
        gettimeofday(&end[index], NULL);             //end
    }

    /*================================================================================*/

    printf("The process spent %ld uses to execute when priority is not adjusted.\n", ((end[0].tv_sec * 1000000 + end[0].tv_usec) - (start[0].tv_sec * 1000000 + start[0].tv_usec)));
    for(i=1;i<=NUM_OF_PRIORITIES_TESTED-1;i++)
        printf("The process spent %ld uses to execute when priority is %d.\n", ((end[i].tv_sec * 1000000 + end[i].tv_usec) - (start[i].tv_sec * 1000000 + start[i].tv_usec)), i+100);

    return 0;
}
```
The program works roughly as follows:
* First, without adjusting the priority, run a loop that generates random numbers (only to create CPU work) and record the start and end times.
* Then raise the priority value from 101 to 139, running the same loop each time and recording the start and end times.
* Finally, compute the execution time for each priority and print it to standard output.

## References
[Project 2](https://staff.csie.ncu.edu.tw/hsufh/COURSES/FALL2023/linux_project_2.html)
[linux - What is the concept of vruntime in CFS - Stack Overflow](https://stackoverflow.com/questions/19181834/what-is-the-concept-of-vruntime-in-cfs)

## Environment
> Linux kernel version: 5.15.137
> Distribution: Ubuntu 22.04.3
> gcc: 11.4.0
> gdb: 12.1

## Kernel Space Code
### Modification of Kernel Code
Add `int my_fixed_priority` to `task_struct` in /include/linux/sched.h (Line #1491; line 10 in the code snippet below).
```c=
#ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH
	/*
	 * If L1D flush is supported on mm context switch
	 * then we use this callback head to queue kill work
	 * to kill tasks that are not running on SMT disabled
	 * cores
	 */
	struct callback_head		l1d_flush_kill;
#endif
	int my_fixed_priority; // A new field
	/*
	 * New fields for task_struct should be added above here, so that
	 * they are included in the randomized portion of task_struct.
	 */
	randomized_struct_fields_end

	/* CPU-specific state of this task: */
	struct thread_struct		thread;

	/*
	 * WARNING: on x86, 'thread_struct' contains a variable-sized
	 * structure.  It *MUST* be at the end of 'task_struct'.
	 *
	 * Do not put anything below here!
	 */
};
```
In `copy_process()` in /kernel/fork.c, initialize the new `my_fixed_priority` field to `0` (just before `return p;`, Line #2441).
```c=
static __latent_entropy struct task_struct *copy_process(
					struct pid *pid,
					int trace,
					int node,
					struct kernel_clone_args *args)
{
	int pidfd = -1, retval;
	struct task_struct *p;
	......
	p->my_fixed_priority = 0;
	return p;
```
Inside `context_switch()` there is a call to `switch_to()`.
```c=
	/* Here we just switch the register state and the stack. */
	switch_to(prev, next, prev);
```
The following comment appears above `__switch_to()` in /arch/x86/kernel/process_64.c:
> switch_to(x,y) should switch tasks from x to y.
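
The pieces described so far (a new per-task field, zero-initialization when a task is created, and a priority that takes effect when the task is switched in) can be modeled in user space. The sketch below is only an illustration under assumptions: `struct model_task` and the `model_*` functions are hypothetical names rather than kernel APIs, and the validation and switch-time behavior follow the problem statement, not any particular kernel code path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the two task_struct fields involved. */
struct model_task {
	int static_prio;       /* priority value the scheduler would consult */
	int my_fixed_priority; /* 0 means "no fixed priority requested" */
};

/* Models the copy_process() change: a child copies its parent's state
 * but always starts with my_fixed_priority cleared to 0. */
void model_copy_process(struct model_task *child, const struct model_task *parent)
{
	assert(child != NULL && parent != NULL);
	*child = *parent;
	child->my_fixed_priority = 0;
}

/* Models the system call contract from the problem statement:
 * returns 1 on success, 0 when x is outside 101..139. */
int model_set_process_priority(struct model_task *task, int x)
{
	assert(task != NULL);
	if (x < 101 || x > 139)
		return 0;
	task->my_fixed_priority = x;
	return 1;
}

/* Models the switch-time hook: when the task is switched in and a fixed
 * priority was requested, copy it into static_prio. */
void model_switch_to(struct model_task *next)
{
	assert(next != NULL);
	if (next->my_fixed_priority)
		next->static_prio = next->my_fixed_priority;
}
```

Using 0 as the "unset" marker works in this model because valid requests are limited to 101..139, so a single integer can carry both the flag and the value.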
### Relationship between vruntime and static_prio
> Essentially, vruntime is a measure of the "runtime" of the thread - **the amount of time it has spent on the processor.**
> <cite>Source: https://stackoverflow.com/questions/19181834/what-is-the-concept-of-vruntime-in-cfs</cite>

Following that Stack Overflow answer and tracing the source code, we can see in [update_curr()](https://elixir.bootlin.com/linux/v5.15.137/source/kernel/sched/fair.c#L825) that `vruntime` is an accumulation of execution time:
```c
static void update_curr(struct cfs_rq *cfs_rq)
{
	struct sched_entity *curr = cfs_rq->curr;
	u64 now = rq_clock_task(rq_of(cfs_rq));
	u64 delta_exec;

	if (unlikely(!curr))
		return;

	delta_exec = now - curr->exec_start;
	if (unlikely((s64)delta_exec <= 0))
		return;

	curr->exec_start = now;

	if (schedstat_enabled()) {
		struct sched_statistics *stats;

		stats = __schedstats_from_se(curr);
		__schedstat_set(stats->exec_max,
				max(delta_exec, stats->exec_max));
	}

	curr->sum_exec_runtime += delta_exec;
	schedstat_add(cfs_rq->exec_clock, delta_exec);

	curr->vruntime += calc_delta_fair(delta_exec, curr);
	update_min_vruntime(cfs_rq);

	if (entity_is_task(curr)) {
		struct task_struct *curtask = task_of(curr);

		trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime);
		cgroup_account_cputime(curtask, delta_exec);
		account_group_exec_runtime(curtask, delta_exec);
	}

	account_cfs_rq_runtime(cfs_rq, delta_exec);
}
```
Each increment is scaled by the entity's load weight in [calc_delta_fair()](https://elixir.bootlin.com/linux/v5.15.137/source/kernel/sched/fair.c#L644):
```c
static inline u64 calc_delta_fair(u64 delta, struct sched_entity *se)
{
	if (unlikely(se->load.weight != NICE_0_LOAD))
		delta = __calc_delta(delta, NICE_0_LOAD, &se->load);

	return delta;
}
```
The weight is derived from `static_prio` in [set_load_weight()](https://elixir.bootlin.com/linux/v5.15.137/source/kernel/sched/core.c#L1203) by looking it up in the [sched_prio_to_weight](https://elixir.bootlin.com/linux/v5.15.137/source/kernel/sched/core.c#L10923) table.
```c
static void set_load_weight(struct task_struct *p, bool update_load)
{
	int prio = p->static_prio - MAX_RT_PRIO;
	struct load_weight *load = &p->se.load;

	/*
	 * SCHED_IDLE tasks get minimal weight:
	 */
	if (task_has_idle_policy(p)) {
		load->weight = scale_load(WEIGHT_IDLEPRIO);
		load->inv_weight = WMULT_IDLEPRIO;
		return;
	}

	/*
	 * SCHED_OTHER tasks have to update their load when changing their
	 * weight
	 */
	if (update_load && p->sched_class == &fair_sched_class) {
		reweight_task(p, prio);
	} else {
		load->weight = scale_load(sched_prio_to_weight[prio]);
		load->inv_weight = sched_prio_to_wmult[prio];
	}
}
```
```c
const int sched_prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};
```

## Testing static_prio
### Test: modifying `static_prio`
To change the priority according to the value of `my_fixed_priority`, we check `my_fixed_priority` at the top of the `__switch_to` function in /arch/x86/kernel/process_64.c (Line #565) and, if it is not 0, overwrite `static_prio` with it.
```c
/*
 *	switch_to(x,y) should switch tasks from x to y.
 *
 * This could still be optimized:
 * - fold all the options into a flag word and test it with a single test.
 * - could test fs/gs bitsliced
 *
 * Kprobes not supported here. Set the probe on schedule instead.
 * Function graph tracer not supported too.
 */
__visible __notrace_funcgraph struct task_struct *
__switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
	struct thread_struct *prev = &prev_p->thread;
	struct thread_struct *next = &next_p->thread;
	struct fpu *prev_fpu = &prev->fpu;
	struct fpu *next_fpu = &next->fpu;
	int cpu = smp_processor_id();

	// If my_fixed_priority is not 0, set static_prio to my_fixed_priority
	if (next_p->my_fixed_priority)
		next_p->static_prio = next_p->my_fixed_priority;
```

### New system call
Our new system call, `my_set_process_priority`, sets `my_fixed_priority` from its argument.
```c
#include <linux/kernel.h>
#include <linux/syscalls.h>
#include <linux/sched.h>

SYSCALL_DEFINE1(my_set_process_priority, int, x)
{
	if (x < 101 || x > 139) {
		return 0;
	} else {
		current->my_fixed_priority = x;
	}
	printk("current's value of vruntime: %llu\n", current->se.vruntime);
	return 1;
}
```
Next we run main.c to test the effect of `static_prio`. The results below show that the value makes no noticeable difference to the execution time of the program.
```
The process spent 2020706 uses to execute when priority is not adjusted.
The process spent 1975008 uses to execute when priority is 101.
The process spent 1963755 uses to execute when priority is 102.
The process spent 1944268 uses to execute when priority is 103.
The process spent 1924135 uses to execute when priority is 104.
The process spent 1837485 uses to execute when priority is 105.
The process spent 1893226 uses to execute when priority is 106.
The process spent 2045678 uses to execute when priority is 107.
The process spent 2015171 uses to execute when priority is 108.
The process spent 2154645 uses to execute when priority is 109.
The process spent 2201007 uses to execute when priority is 110.
The process spent 2111720 uses to execute when priority is 111.
The process spent 2019423 uses to execute when priority is 112.
The process spent 2134753 uses to execute when priority is 113.
The process spent 1863293 uses to execute when priority is 114.
The process spent 2023724 uses to execute when priority is 115.
The process spent 1868610 uses to execute when priority is 116.
The process spent 1937802 uses to execute when priority is 117.
The process spent 1950290 uses to execute when priority is 118.
The process spent 1871952 uses to execute when priority is 119.
The process spent 1910462 uses to execute when priority is 120.
The process spent 1872151 uses to execute when priority is 121.
The process spent 1845879 uses to execute when priority is 122.
The process spent 1848309 uses to execute when priority is 123.
The process spent 1789066 uses to execute when priority is 124.
The process spent 1971255 uses to execute when priority is 125.
The process spent 1996993 uses to execute when priority is 126.
The process spent 1923429 uses to execute when priority is 127.
The process spent 1916814 uses to execute when priority is 128.
The process spent 1906097 uses to execute when priority is 129.
The process spent 1789882 uses to execute when priority is 130.
The process spent 1996162 uses to execute when priority is 131.
The process spent 1926518 uses to execute when priority is 132.
The process spent 1858048 uses to execute when priority is 133.
The process spent 1827906 uses to execute when priority is 134.
The process spent 2014355 uses to execute when priority is 135.
The process spent 1948083 uses to execute when priority is 136.
The process spent 1916628 uses to execute when priority is 137.
The process spent 2111792 uses to execute when priority is 138.
The process spent 1879825 uses to execute when priority is 139.
```
![image](https://hackmd.io/_uploads/H1_fXuHO6.png)

The dmesg output shows `vruntime` growing as the program keeps running.
```
[ 133.473369] current's value of vruntime: 2015297951
[ 135.448398] current's value of vruntime: 3991124176
[ 137.412166] current's value of vruntime: 5954672764
[ 139.356449] current's value of vruntime: 7898152496
[ 141.280566] current's value of vruntime: 9821767247
[ 143.117863] current's value of vruntime: 11660934287
[ 145.010905] current's value of vruntime: 13552129161
[ 147.056392] current's value of vruntime: 15595476653
[ 149.071388] current's value of vruntime: 17610884951
[ 151.225856] current's value of vruntime: 19766068103
[ 153.426694] current's value of vruntime: 21965456052
[ 155.538263] current's value of vruntime: 24076726066
[ 157.557546] current's value of vruntime: 26095860628
[ 159.692160] current's value of vruntime: 28231288597
[ 161.555339] current's value of vruntime: 30090651812
[ 163.578946] current's value of vruntime: 32114091629
[ 165.447454] current's value of vruntime: 33981647586
[ 167.385156] current's value of vruntime: 35920833109
[ 169.335349] current's value of vruntime: 37868132400
[ 171.207212] current's value of vruntime: 39739458897
[ 173.117587] current's value of vruntime: 41650838518
[ 174.989657] current's value of vruntime: 43522025417
[ 176.835460] current's value of vruntime: 45369596021
[ 178.683697] current's value of vruntime: 47217050156
[ 180.472695] current's value of vruntime: 49004608572
[ 182.443880] current's value of vruntime: 50975894494
[ 184.440806] current's value of vruntime: 52971512367
[ 186.364172] current's value of vruntime: 54895015717
[ 188.280928] current's value of vruntime: 56810523890
[ 190.186966] current's value of vruntime: 58717904114
[ 191.976797] current's value of vruntime: 60505509153
[ 193.972903] current's value of vruntime: 62500912503
[ 195.899370] current's value of vruntime: 64428409867
[ 197.757372] current's value of vruntime: 66283826929
[ 199.585232] current's value of vruntime: 68111273310
[ 201.599540] current's value of vruntime: 70126522858
[ 203.547580] current's value of vruntime: 72073977437
[ 205.464168] current's value of vruntime: 73989345083
[ 207.575916] current's value of vruntime: 76100627813
```
![image](https://hackmd.io/_uploads/SJ_U2_B_T.png)

No correlation is visible, possibly because the OS has little other work to do, so the effect is small. Next we test under a higher CPU load.

### Test: modifying `static_prio` with the CPU fully loaded
First run this bash script to saturate the CPU, which makes it easier to observe how `static_prio` affects execution time when the CPU load is high.
```bash
#!/bin/bash
# This program will use all CPU cores on a Linux system

# Get the number of CPU cores
cores=$(nproc --all)

# Define a function to kill all child processes
kill_child_processes() {
  echo "Killing all child processes..."
  pkill -P $$
  exit
}

# Set a trap to call the function when SIGINT or SIGTERM is received
trap kill_child_processes SIGINT SIGTERM

# Run an infinite loop on each core
for i in $(seq 1 $cores); do
  while :; do :; done &
done

# Wait for user to press Ctrl+C
echo "Press Ctrl+C to stop the program"
wait
```
The results below suggest that the lower `static_prio` is, the faster the program runs, as if a lower value were allotted more CPU time. But is that really the case?
```
The process spent 2026030 uses to execute when priority is not adjusted.
The process spent 1835775 uses to execute when priority is 101.
The process spent 1827980 uses to execute when priority is 102.
The process spent 1849097 uses to execute when priority is 103.
The process spent 1854544 uses to execute when priority is 104.
The process spent 1853041 uses to execute when priority is 105.
The process spent 2069556 uses to execute when priority is 106.
The process spent 2500159 uses to execute when priority is 107.
The process spent 2339809 uses to execute when priority is 108.
The process spent 2485997 uses to execute when priority is 109.
The process spent 2358995 uses to execute when priority is 110.
The process spent 2402551 uses to execute when priority is 111.
The process spent 2436133 uses to execute when priority is 112.
The process spent 2420266 uses to execute when priority is 113.
The process spent 2332006 uses to execute when priority is 114.
The process spent 2274887 uses to execute when priority is 115.
The process spent 2260765 uses to execute when priority is 116.
The process spent 2337443 uses to execute when priority is 117.
The process spent 2241903 uses to execute when priority is 118.
The process spent 2377825 uses to execute when priority is 119.
The process spent 2297037 uses to execute when priority is 120.
The process spent 2294993 uses to execute when priority is 121.
The process spent 2344030 uses to execute when priority is 122.
The process spent 2218725 uses to execute when priority is 123.
The process spent 2298536 uses to execute when priority is 124.
The process spent 2370103 uses to execute when priority is 125.
The process spent 2231213 uses to execute when priority is 126.
The process spent 2237669 uses to execute when priority is 127.
The process spent 2331380 uses to execute when priority is 128.
The process spent 2306331 uses to execute when priority is 129.
The process spent 2290908 uses to execute when priority is 130.
The process spent 2254764 uses to execute when priority is 131.
The process spent 2242983 uses to execute when priority is 132.
The process spent 2174773 uses to execute when priority is 133.
The process spent 2415935 uses to execute when priority is 134.
The process spent 2444864 uses to execute when priority is 135.
The process spent 2394191 uses to execute when priority is 136.
The process spent 2367370 uses to execute when priority is 137.
The process spent 2471920 uses to execute when priority is 138.
The process spent 2255032 uses to execute when priority is 139.
```
![image](https://hackmd.io/_uploads/r1yqv_ruT.png)

The dmesg output is below:
```
[ 1982.095933] current's value of vruntime: 2019642048
[ 1983.931709] current's value of vruntime: 3855322565
[ 1985.759687] current's value of vruntime: 5683186791
[ 1987.608784] current's value of vruntime: 7530882710
[ 1989.463327] current's value of vruntime: 9386186965
[ 1991.316367] current's value of vruntime: 11237935727
[ 1993.385922] current's value of vruntime: 13309737974
[ 1995.886078] current's value of vruntime: 15809298578
[ 1998.225884] current's value of vruntime: 18149081145
[ 2000.711879] current's value of vruntime: 20629958330
[ 2003.070871] current's value of vruntime: 22989263015
[ 2005.473420] current's value of vruntime: 25392981402
[ 2007.909552] current's value of vruntime: 27828622788
[ 2010.329815] current's value of vruntime: 30248254911
[ 2012.661819] current's value of vruntime: 32579890465
[ 2014.936704] current's value of vruntime: 34851477891
[ 2017.197466] current's value of vruntime: 37115186643
[ 2019.534905] current's value of vruntime: 39450689305
[ 2021.776805] current's value of vruntime: 41690425252
[ 2024.154627] current's value of vruntime: 44069931325
[ 2026.451662] current's value of vruntime: 46365597079
[ 2028.746652] current's value of vruntime: 48661184693
[ 2031.090679] current's value of vruntime: 51004707730
[ 2033.309402] current's value of vruntime: 53224416634
[ 2035.607936] current's value of vruntime: 55520062433
[ 2037.978036] current's value of vruntime: 57891679851
[ 2040.209248] current's value of vruntime: 60123271568
[ 2042.446915] current's value of vruntime: 62359033540
[ 2044.778293] current's value of vruntime: 64690620152
[ 2047.084622] current's value of vruntime: 66994243933
[ 2049.375528] current's value of vruntime: 69285728404
[ 2051.630291] current's value of vruntime: 71541420034
[ 2053.873271] current's value of vruntime: 73785114791
[ 2056.048044] current's value of vruntime: 75956852055
[ 2058.463978] current's value of vruntime: 78372548154
[ 2060.908838] current's value of vruntime: 80814546438
[ 2063.302953] current's value of vruntime: 83209856590
[ 2065.670192] current's value of vruntime: 85577497735
[ 2068.141973] current's value of vruntime: 88048980733
```

### Test: modifying `static_prio` with the CPU fully loaded, in reverse order
Here we modify the test program so that `static_prio` decreases from 139 to 101 instead of increasing.
```c
    // for (index = 1, priority = 101; priority <= 139; ++priority, ++index)
    for (index = 1, priority = 139; priority >= 101; --priority, ++index)
    {
        if (!my_set_process_priority(priority))
            printf("Cannot set priority %d.\n", priority);
        gettimeofday(&start[index], NULL); // begin
        for (i = 1; i <= TOTAL_ITERATION_NUM; i++)
            rand();
        gettimeofday(&end[index], NULL); // end
    }

    /*================================================================================*/

    printf("The process spent %ld uses to execute when priority is not adjusted.\n", ((end[0].tv_sec * 1000000 + end[0].tv_usec) - (start[0].tv_sec * 1000000 + start[0].tv_usec)));
    for (i = 1; i <= NUM_OF_PRIORITIES_TESTED - 1; i++)
        printf("The process spent %ld uses to execute when priority is %d.\n",
               // ((end[i].tv_sec * 1000000 + end[i].tv_usec) - (start[i].tv_sec * 1000000 + start[i].tv_usec)), i + 100);
               ((end[i].tv_sec * 1000000 + end[i].tv_usec) - (start[i].tv_sec * 1000000 + start[i].tv_usec)), 140 - i);
}
```
The results are below. Reversing the order of `static_prio` also reverses the result: now the larger `static_prio` values finish sooner.
```
The process spent 1853794 uses to execute when priority is not adjusted.
The process spent 1864695 uses to execute when priority is 139.
The process spent 1889550 uses to execute when priority is 138.
The process spent 1931517 uses to execute when priority is 137.
The process spent 1919280 uses to execute when priority is 136.
The process spent 1919862 uses to execute when priority is 135.
The process spent 1927162 uses to execute when priority is 134.
The process spent 1959397 uses to execute when priority is 133.
The process spent 2223555 uses to execute when priority is 132.
The process spent 2675564 uses to execute when priority is 131.
The process spent 2596256 uses to execute when priority is 130.
The process spent 2617641 uses to execute when priority is 129.
The process spent 2551892 uses to execute when priority is 128.
The process spent 2505582 uses to execute when priority is 127.
The process spent 2616207 uses to execute when priority is 126.
The process spent 2524626 uses to execute when priority is 125.
The process spent 2484606 uses to execute when priority is 124.
The process spent 2476003 uses to execute when priority is 123.
The process spent 2442095 uses to execute when priority is 122.
The process spent 2546463 uses to execute when priority is 121.
The process spent 2520236 uses to execute when priority is 120.
The process spent 2409671 uses to execute when priority is 119.
The process spent 2322118 uses to execute when priority is 118.
The process spent 2472866 uses to execute when priority is 117.
The process spent 2404122 uses to execute when priority is 116.
The process spent 2524165 uses to execute when priority is 115.
The process spent 2441315 uses to execute when priority is 114.
The process spent 2416174 uses to execute when priority is 113.
The process spent 2399548 uses to execute when priority is 112.
The process spent 2495979 uses to execute when priority is 111.
The process spent 2424141 uses to execute when priority is 110.
The process spent 2438499 uses to execute when priority is 109.
The process spent 2455793 uses to execute when priority is 108.
The process spent 2281964 uses to execute when priority is 107.
The process spent 2396571 uses to execute when priority is 106.
The process spent 2390729 uses to execute when priority is 105.
The process spent 2561988 uses to execute when priority is 104.
The process spent 2553584 uses to execute when priority is 103.
The process spent 2656366 uses to execute when priority is 102.
The process spent 2422569 uses to execute when priority is 101.
```
![image](https://hackmd.io/_uploads/SktcPdS_T.png)

The dmesg output is below:
```
[ 2722.342247] current's value of vruntime: 1798630766
[ 2724.206905] current's value of vruntime: 3658730526
[ 2726.096417] current's value of vruntime: 5548649481
[ 2728.027896] current's value of vruntime: 7478764338
[ 2729.947139] current's value of vruntime: 9395493538
[ 2731.866963] current's value of vruntime: 11309653915
[ 2733.794085] current's value of vruntime: 13233859891
[ 2735.753442] current's value of vruntime: 15191697466
[ 2737.976954] current's value of vruntime: 17402913876
[ 2740.652463] current's value of vruntime: 20072916194
[ 2743.248666] current's value of vruntime: 22665006514
[ 2745.866254] current's value of vruntime: 25274534399
[ 2748.418097] current's value of vruntime: 27823149102
[ 2750.923629] current's value of vruntime: 30327796355
[ 2753.539784] current's value of vruntime: 32940527634
[ 2756.064361] current's value of vruntime: 35461047808
[ 2758.548917] current's value of vruntime: 37942095181
[ 2761.024871] current's value of vruntime: 40414327404
[ 2763.466919] current's value of vruntime: 42855026778
[ 2766.013331] current's value of vruntime: 45395180674
[ 2768.533525] current's value of vruntime: 47912254986
[ 2770.943149] current's value of vruntime: 50321172743
[ 2773.265223] current's value of vruntime: 52638472944
[ 2775.738041] current's value of vruntime: 55104998871
[ 2778.142118] current's value of vruntime: 57504256692
[ 2780.666234] current's value of vruntime: 60019576081
[ 2783.107501] current's value of vruntime: 62455424994
[ 2785.523628] current's value of vruntime: 64868627991
[ 2787.923129] current's value of vruntime: 67247036193
[ 2790.419061] current's value of vruntime: 69739672045
[ 2792.843156] current's value of vruntime: 72159093417
[ 2795.281608] current's value of vruntime: 74592391448
[ 2797.737356] current's value of vruntime: 77044972845
[ 2800.019278] current's value of vruntime: 79326152605
[ 2802.415804] current's value of vruntime: 81718955667
[ 2804.806488] current's value of vruntime: 84107437731
[ 2807.368428] current's value of vruntime: 86661874870
[ 2809.921966] current's value of vruntime: 89214835298
[ 2812.578282] current's value of vruntime: 91867289589
```

### Conclusion
The experiments above show that, in this setup, `static_prio` has no direct relationship with the priority the process actually receives; `vruntime` matters more. In both the ascending and the descending run, the first few iterations are the fastest regardless of which value was set. Although `static_prio` can influence the value of `vruntime`, it has little effect overall. One possible contributing factor is that assigning `static_prio` directly in `__switch_to()` does not go through `set_load_weight()`, so the load weight used by `calc_delta_fair()` may be left unchanged. This would be consistent with the dmesg logs, where `vruntime` advances by roughly the elapsed wall-clock time (in nanoseconds) at every setting.

|                 In Ascending Order                 |                In Descending Order                 |
|:--------------------------------------------------:|:--------------------------------------------------:|
|![image](https://hackmd.io/_uploads/r1RhVOB_a.png)| ![image](https://hackmd.io/_uploads/ry3PV_HOT.png)|

## Testing vruntime
The following tests focus on `vruntime`.
`vruntime` may affect execution time because the CFS scheduler, in [pick_next_task_fair()](https://elixir.bootlin.com/linux/v5.15.137/source/kernel/sched/fair.c#L7502), calls [pick_next_entity()](https://elixir.bootlin.com/linux/v5.15.137/source/kernel/sched/fair.c#L4729) to choose the process with the smallest `vruntime`, which is the leftmost node of the red-black tree.
```c
/*
 * Pick the next process, keeping these things in mind, in this order:
 * 1) keep things fair between processes/task groups
 * 2) pick the "next" process, since someone really wants that to run
 * 3) pick the "last" process, for cache locality
 * 4) do not run the "skip" process, if something else is available
 */
static struct sched_entity *
pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
	struct sched_entity *left = __pick_first_entity(cfs_rq);
	struct sched_entity *se;

	/*
	 * If curr is set we have to see if its left of the leftmost entity
	 * still in the tree, provided there was anything in the tree at all.
	 */
	if (!left || (curr && entity_before(curr, left)))
		left = curr;

	se = left; /* ideally we run the leftmost entity */

	/*
	 * Avoid running the skip buddy, if running something else can
	 * be done without getting too unfair.
	 */
	if (cfs_rq->skip && cfs_rq->skip == se) {
		struct sched_entity *second;

		if (se == curr) {
			second = __pick_first_entity(cfs_rq);
		} else {
			second = __pick_next_entity(se);
			if (!second || (curr && entity_before(curr, second)))
				second = curr;
		}

		if (second && wakeup_preempt_entity(second, left) < 1)
			se = second;
	}

	if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1) {
		/*
		 * Someone really wants this to run. If it's not unfair, run it.
		 */
		se = cfs_rq->next;
	} else if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1) {
		/*
		 * Prefer last buddy, try to return the CPU to a preempted task.
		 */
		se = cfs_rq->last;
	}

	return se;
}
```

### Test 1
First, at the top of `__switch_to()`, we assign the variable shown below in an attempt to pin `vruntime`, to test whether holding it constant changes the result. After this change `vruntime` still grows with execution time and is not constant, which means it is modified again somewhere later.
```c=
__visible __notrace_funcgraph struct task_struct *
__switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
	struct thread_struct *prev = &prev_p->thread;
	struct thread_struct *next = &next_p->thread;
	struct fpu *prev_fpu = &prev->fpu;
	struct fpu *next_fpu = &next->fpu;
	int cpu = smp_processor_id();

	next_p->se.vruntime = 1;
```

### Test 2
Continuing from Test 1, we use [this script](https://hackmd.io/fYBthNyNTAG85N--g9QNAw?view#%E6%B8%AC%E8%A9%A6%E4%BF%AE%E6%94%B9-static_prio-%E4%B8%A6%E4%BD%BF%E5%BE%97-CPU-%E6%BB%BF%E8%BC%89) to keep the CPU fully loaded during the test.
We revert the Test 1 change to `__switch_to()` in /arch/x86/kernel/process_64.c back to [this code](https://hackmd.io/fYBthNyNTAG85N--g9QNAw#%E6%B8%AC%E8%A9%A6%E4%BF%AE%E6%94%B9-static_prio), and then modify `update_curr()` in kernel/sched/fair.c.
```c=
static void update_curr(struct cfs_rq *cfs_rq)
{
	struct sched_entity *curr = cfs_rq->curr;
	u64 now = rq_clock_task(rq_of(cfs_rq));
	u64 delta_exec;

	if (unlikely(!curr))
		return;

	delta_exec = now - curr->exec_start;
	if (unlikely((s64)delta_exec <= 0))
		return;

	curr->exec_start = now;

	if (schedstat_enabled()) {
		struct sched_statistics *stats;

		stats = __schedstats_from_se(curr);
		__schedstat_set(stats->exec_max,
				max(delta_exec, stats->exec_max));
	}
curr->sum_exec_runtime += delta_exec; schedstat_add(cfs_rq->exec_clock, delta_exec); if (current->my_fixed_priority == 0) curr->vruntime += calc_delta_fair(delta_exec, curr); else curr->vruntime = 1; update_min_vruntime(cfs_rq); ``` 在回傳前(Line #30)檢查 `my_fixed_priority`,不為 0 的話就將 `vruntime` 設為 1。 ``` [ 1057.522208] current's value of vruntime: 4112847254 [ 1061.380293] current's value of vruntime: 1 [ 1065.321616] current's value of vruntime: 1 [ 1069.490959] current's value of vruntime: 1 [ 1073.621199] current's value of vruntime: 1 [ 1077.712692] current's value of vruntime: 1 [ 1081.616864] current's value of vruntime: 1 [ 1085.701423] current's value of vruntime: 1 [ 1089.707695] current's value of vruntime: 1 [ 1093.332316] current's value of vruntime: 1 [ 1097.484884] current's value of vruntime: 1 [ 1101.783684] current's value of vruntime: 1 [ 1105.841539] current's value of vruntime: 1 [ 1110.036392] current's value of vruntime: 1 [ 1114.255475] current's value of vruntime: 1 [ 1118.125474] current's value of vruntime: 1 [ 1122.243642] current's value of vruntime: 1 [ 1126.242381] current's value of vruntime: 1 [ 1130.326394] current's value of vruntime: 1 [ 1134.523725] current's value of vruntime: 1 [ 1138.548194] current's value of vruntime: 1 [ 1142.625472] current's value of vruntime: 1 [ 1146.840046] current's value of vruntime: 1 [ 1151.008095] current's value of vruntime: 1 [ 1155.193122] current's value of vruntime: 1 [ 1159.166703] current's value of vruntime: 1 [ 1163.473526] current's value of vruntime: 1 [ 1167.748382] current's value of vruntime: 1 [ 1171.550823] current's value of vruntime: 1 [ 1175.539722] current's value of vruntime: 1 [ 1179.858346] current's value of vruntime: 1 [ 1183.971569] current's value of vruntime: 1 [ 1188.115526] current's value of vruntime: 1 [ 1191.976397] current's value of vruntime: 1 [ 1196.334119] current's value of vruntime: 1 [ 1200.415309] current's value of vruntime: 1 [ 1204.392703] current's value of 
vruntime: 1 [ 1208.164983] current's value of vruntime: 1 [ 1212.265262] current's value of vruntime: 1 ``` 可以發現成功修改 `vruntime` 之值成 1 了,因為每次迭代時都為 1,所以執行時間依舊非常接近。 ### Test 3 接續 Test 2,一樣修改 kernel/sched/fair.c 的 `upate_curr()`。 ```c= if (current->my_fixed_priority == 0) curr->vruntime += calc_delta_fair(delta_exec, curr); else curr->vruntime += 100 * calc_delta_fair(delta_exec, curr); ``` 此測試直接將 `vruntime` 放大來比較 `vruntime` 間的差異,看看結果如何。 下面為 `dmesg` 之 ouput: ``` [ 82.258570] current's value of vruntime: 4167667200 [ 86.412317] current's value of vruntime: 401889783242 [ 90.276094] current's value of vruntime: 787724288742 [ 94.240814] current's value of vruntime: 1182739258842 [ 98.346853] current's value of vruntime: 1593281633442 [ 102.261997] current's value of vruntime: 1984366706542 [ 106.420192] current's value of vruntime: 2400316786342 [ 110.414818] current's value of vruntime: 2799564976242 [ 114.412626] current's value of vruntime: 3199190669342 [ 118.589289] current's value of vruntime: 3616796012242 [ 122.738574] current's value of vruntime: 4031986806242 [ 126.812221] current's value of vruntime: 4439180138542 [ 131.108366] current's value of vruntime: 4868746121742 [ 135.148863] current's value of vruntime: 5272718363742 [ 139.281099] current's value of vruntime: 5685899755042 [ 143.207411] current's value of vruntime: 71355422624 [ 147.290400] current's value of vruntime: 475535074724 [ 151.400704] current's value of vruntime: 885962972024 [ 155.450062] current's value of vruntime: 1281176514824 [ 159.538908] current's value of vruntime: 1685971689624 [ 163.553414] current's value of vruntime: 2087510002824 [ 167.668953] current's value of vruntime: 2499158134924 [ 171.406966] current's value of vruntime: 2872993077524 [ 175.274291] current's value of vruntime: 3259637485824 [ 179.454013] current's value of vruntime: 3677485072924 [ 183.398058] current's value of vruntime: 4071750764024 [ 187.275616] current's value of vruntime: 4459627189924 [ 
191.248067] current's value of vruntime: 4857111687124
[ 195.207792] current's value of vruntime: 5253003848524
[ 199.199976] current's value of vruntime: 5652104893124
[ 203.209231] current's value of vruntime: 6053213008024
[ 207.086406] current's value of vruntime: 6440741355424
[ 211.124081] current's value of vruntime: 6844648342524
[ 215.281185] current's value of vruntime: 7260168834424
[ 219.140188] current's value of vruntime: 7646098790124
[ 223.317078] current's value of vruntime: 8063627749024
[ 227.378516] current's value of vruntime: 8469961686424
[ 231.320079] current's value of vruntime: 8863897928924
[ 235.156955] current's value of vruntime: 9247838848824
```

Partway through the log, `vruntime` suddenly drops. This is because [`place_entity()`](https://elixir.bootlin.com/linux/v5.15.137/source/kernel/sched/fair.c#L4374) resets the `vruntime` of a task waking from sleep, using `min_vruntime` as the base.

```c
static void
place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
{
	u64 vruntime = cfs_rq->min_vruntime;

	/*
	 * The 'current' period is already promised to the current tasks,
	 * however the extra weight of the new task will slow them down a
	 * little, place the new task so that it fits in the slot that
	 * stays open at the end.
	 */
	if (initial && sched_feat(START_DEBIT))
		vruntime += sched_vslice(cfs_rq, se);

	/* sleeps up to a single latency don't count. */
	if (!initial) {
		unsigned long thresh = sysctl_sched_latency;

		/*
		 * Halve their sleep time's effect, to allow
		 * for a gentler effect of sleepers:
		 */
		if (sched_feat(GENTLE_FAIR_SLEEPERS))
			thresh >>= 1;

		vruntime -= thresh;
	}

	/*
	 * Pull vruntime of the entity being placed to the base level of
	 * cfs_rq, to prevent boosting it if placed backwards.
	 * However, min_vruntime can advance much faster than real time, with
	 * the extreme being when an entity with the minimal weight always runs
	 * on the cfs_rq.
	 * If the waking entity slept for a long time, its
	 * vruntime difference from min_vruntime may overflow s64 and their
	 * comparison may get inversed, so ignore the entity's original
	 * vruntime in that case.
	 * The maximal vruntime speedup is given by the ratio of normal to
	 * minimal weight: scale_load_down(NICE_0_LOAD) / MIN_SHARES.
	 * When placing a migrated waking entity, its exec_start has been set
	 * from a different rq. In order to take into account a possible
	 * divergence between new and prev rq's clocks task because of irq and
	 * stolen time, we take an additional margin.
	 * So, cutting off on the sleep time of
	 *     2^63 / scale_load_down(NICE_0_LOAD) ~ 104 days
	 * should be safe.
	 */
	if (entity_is_long_sleeper(se))
		se->vruntime = vruntime;
	else
		se->vruntime = max_vruntime(se->vruntime, vruntime);
}
```

Output of the main program. There is still no visible trend:

```
The process spent 4139554 uses to execute when priority is not adjusted.
The process spent 4149707 uses to execute when priority is 101.
The process spent 3860881 uses to execute when priority is 102.
The process spent 3962407 uses to execute when priority is 103.
The process spent 4104190 uses to execute when priority is 104.
The process spent 3913783 uses to execute when priority is 105.
The process spent 4157078 uses to execute when priority is 106.
The process spent 3993799 uses to execute when priority is 107.
The process spent 3997167 uses to execute when priority is 108.
The process spent 4176147 uses to execute when priority is 109.
The process spent 4148891 uses to execute when priority is 110.
The process spent 4073348 uses to execute when priority is 111.
The process spent 4295903 uses to execute when priority is 112.
The process spent 4040320 uses to execute when priority is 113.
The process spent 4132097 uses to execute when priority is 114.
The process spent 3926268 uses to execute when priority is 115.
The process spent 4086059 uses to execute when priority is 116.
The process spent 4113263 uses to execute when priority is 117.
The process spent 4051963 uses to execute when priority is 118.
The process spent 4091202 uses to execute when priority is 119.
The process spent 4016585 uses to execute when priority is 120.
The process spent 4117454 uses to execute when priority is 121.
The process spent 3739591 uses to execute when priority is 122.
The process spent 3868812 uses to execute when priority is 123.
The process spent 4181181 uses to execute when priority is 124.
The process spent 3945296 uses to execute when priority is 125.
The process spent 3878687 uses to execute when priority is 126.
The process spent 3973515 uses to execute when priority is 127.
The process spent 3960702 uses to execute when priority is 128.
The process spent 3993096 uses to execute when priority is 129.
The process spent 4010106 uses to execute when priority is 130.
The process spent 3877943 uses to execute when priority is 131.
The process spent 4038424 uses to execute when priority is 132.
The process spent 4157825 uses to execute when priority is 133.
The process spent 3859633 uses to execute when priority is 134.
The process spent 4177540 uses to execute when priority is 135.
The process spent 4062036 uses to execute when priority is 136.
The process spent 3942117 uses to execute when priority is 137.
The process spent 3837394 uses to execute when priority is 138.
The process spent 3668431 uses to execute when priority is 139.
```

### Test 4

Continuing from Test 3, we again modify `update_curr()` in kernel/sched/fair.c. This time, to widen the gap between `vruntime` values, `vruntime` is set to 2 raised to the power (priority - 100), so that it grows rapidly from one priority to the next.

```c=
	if (current->my_fixed_priority == 0)
		curr->vruntime += calc_delta_fair(delta_exec, curr);
	else
		curr->vruntime = 1 << (current->my_fixed_priority - 100);
```

```
The process spent 4428744 uses to execute when priority is not adjusted.
The process spent 3372224 uses to execute when priority is 101.
The process spent 3286566 uses to execute when priority is 102.
The process spent 3197117 uses to execute when priority is 103.
The process spent 3163331 uses to execute when priority is 104. The process spent 3172321 uses to execute when priority is 105. The process spent 3262332 uses to execute when priority is 106. The process spent 3212122 uses to execute when priority is 107. The process spent 3283453 uses to execute when priority is 108. The process spent 3306081 uses to execute when priority is 109. The process spent 3363687 uses to execute when priority is 110. The process spent 3817252 uses to execute when priority is 111. The process spent 3535661 uses to execute when priority is 112. The process spent 3660268 uses to execute when priority is 113. The process spent 3609197 uses to execute when priority is 114. The process spent 3815558 uses to execute when priority is 115. The process spent 3629498 uses to execute when priority is 116. The process spent 3594311 uses to execute when priority is 117. The process spent 3571539 uses to execute when priority is 118. The process spent 3769295 uses to execute when priority is 119. The process spent 3699368 uses to execute when priority is 120. The process spent 3607773 uses to execute when priority is 121. The process spent 3589997 uses to execute when priority is 122. The process spent 3769907 uses to execute when priority is 123. The process spent 3703860 uses to execute when priority is 124. The process spent 3640303 uses to execute when priority is 125. The process spent 3825626 uses to execute when priority is 126. The process spent 3798506 uses to execute when priority is 127. The process spent 3662586 uses to execute when priority is 128. The process spent 3709986 uses to execute when priority is 129. The process spent 3686079 uses to execute when priority is 130. The process spent 3540194 uses to execute when priority is 131. The process spent 3582423 uses to execute when priority is 132. The process spent 3614944 uses to execute when priority is 133. The process spent 3613389 uses to execute when priority is 134. 
The process spent 3665369 uses to execute when priority is 135.
The process spent 3661545 uses to execute when priority is 136.
The process spent 3514923 uses to execute when priority is 137.
The process spent 3677330 uses to execute when priority is 138.
The process spent 3718200 uses to execute when priority is 139.
```

An error occurred here. The code above left-shifts the `int` constant `1`, which is only 32 bits wide, so the shift overflows. At priority 131 the shift count is 31 and the result is sign-extended into a huge `vruntime` (18446744071562067968); from priority 132 on, the shift count is 32 or more and UBSAN reports a shift-out-of-bounds error.

```
[ 51.720972] current's value of vruntime: 4298572641
[ 55.093205] current's value of vruntime: 2
[ 58.379780] current's value of vruntime: 4
[ 61.576906] current's value of vruntime: 8
[ 64.740245] current's value of vruntime: 16
[ 67.912574] current's value of vruntime: 32
[ 71.174913] current's value of vruntime: 64
[ 74.387044] current's value of vruntime: 128
[ 77.670505] current's value of vruntime: 256
[ 80.975553] current's value of vruntime: 512
[ 84.337816] current's value of vruntime: 1024
[ 88.153782] current's value of vruntime: 2048
[ 91.688506] current's value of vruntime: 4096
[ 95.348008] current's value of vruntime: 8192
[ 98.956608] current's value of vruntime: 16384
[ 102.771673] current's value of vruntime: 32768
[ 106.400803] current's value of vruntime: 65536
[ 109.994827] current's value of vruntime: 131072
[ 113.566141] current's value of vruntime: 262144
[ 117.335250] current's value of vruntime: 524288
[ 121.034475] current's value of vruntime: 1048576
[ 124.642140] current's value of vruntime: 2097152
[ 128.232053] current's value of vruntime: 4194304
[ 132.001892] current's value of vruntime: 8388608
[ 135.705702] current's value of vruntime: 16777216
[ 139.345968] current's value of vruntime: 33554432
[ 143.171568] current's value of vruntime: 67108864
[ 146.970803] current's value of vruntime: 134217728
[ 150.634168] current's value of vruntime: 268435456
[ 154.344866] current's value of vruntime: 536870912
[ 158.031589] current's value of vruntime: 1073741824
[ 161.572343] current's value of vruntime: 18446744071562067968
[ 161.573047]
================================================================================ [ 161.573052] UBSAN: shift-out-of-bounds in kernel/sched/fair.c:854:22 [ 161.573055] shift exponent 32 is too large for 32-bit type 'int' [ 161.573057] CPU: 5 PID: 3004 Comm: a.out Not tainted 5.15.137 #12 [ 161.573059] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 [ 161.573061] Call Trace: [ 161.573063] <TASK> [ 161.573066] dump_stack_lvl+0x4a/0x6b [ 161.573072] dump_stack+0x10/0x18 [ 161.573074] ubsan_epilogue+0x9/0x3a [ 161.573076] __ubsan_handle_shift_out_of_bounds.cold+0x61/0xef [ 161.573078] update_curr.cold+0x2c/0x34 [ 161.573081] task_tick_fair+0x50/0x830 [ 161.573083] ? sched_clock+0x9/0x10 [ 161.573086] ? sched_clock_cpu+0x12/0x120 [ 161.573088] scheduler_tick+0xcf/0x310 [ 161.573090] update_process_times+0xc7/0xe0 [ 161.573092] tick_sched_handle+0x29/0x70 [ 161.573095] tick_sched_timer+0x75/0xa0 [ 161.573097] ? tick_sched_do_timer+0xa0/0xa0 [ 161.573099] __hrtimer_run_queues+0x104/0x230 [ 161.573101] hrtimer_interrupt+0x101/0x240 [ 161.573103] __sysvec_apic_timer_interrupt+0x5e/0xe0 [ 161.573105] sysvec_apic_timer_interrupt+0x36/0xa0 [ 161.573107] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 161.573109] RIP: 0033:0x7fbc3c84c230 [ 161.573112] Code: 00 e8 24 b0 04 00 eb b0 66 90 48 8d 3d 31 55 1d 00 48 89 44 24 08 e8 df b0 04 00 48 8b 44 24 08 48 83 c4 18 5b 5d c3 0f 1f 00 <f3> 0f 1e fa 48 83 ec 18 ba 01 00 00 00 64 48 8b 04 25 28 00 00 00 [ 161.573114] RSP: 002b:00007ffc23d240e8 EFLAGS: 00000206 [ 161.573116] RAX: 00000000560caf73 RBX: 0000000000000000 RCX: 00007fbc3ca1f238 [ 161.573117] RDX: 0000000000000000 RSI: 00007ffc23d240d4 RDI: 00007fbc3ca1f860 [ 161.573118] RBP: 00007ffc23d24620 R08: 0000000000000000 R09: 00007fbc3ca1f280 [ 161.573119] R10: 0000000000000000 R11: 0000000000000000 R12: 00007ffc23d24738 [ 161.573120] R13: 00005653ffdfb1ee R14: 00005653ffdfdda0 R15: 00007fbc3ca79040 [ 161.573123] </TASK> [ 
161.573123] ================================================================================
[ 165.155285] current's value of vruntime: 1
[ 168.770705] current's value of vruntime: 2
[ 172.384530] current's value of vruntime: 4
[ 176.050304] current's value of vruntime: 8
[ 179.712221] current's value of vruntime: 16
[ 183.227475] current's value of vruntime: 32
[ 186.905123] current's value of vruntime: 64
```

### Test 5

Continuing from Test 4, the `int` constant `1` is changed to a `long`.

```c=
	if (current->my_fixed_priority == 0)
		curr->vruntime += calc_delta_fair(delta_exec, curr);
	else
		curr->vruntime = (long)1 << (current->my_fixed_priority - 100);
```

Compared with the fixed and 100x `vruntime` experiments above (Tests 2 and 3), a difference seems to emerge in the results below, although the times are still close. It appears that once `vruntime` exceeds a certain value, the process is less likely to be picked by CFS, so it takes longer to finish.

```
The process spent 3987992 uses to execute when priority is not adjusted.
The process spent 3445567 uses to execute when priority is 101.
The process spent 3770163 uses to execute when priority is 102.
The process spent 3617444 uses to execute when priority is 103.
The process spent 3644045 uses to execute when priority is 104.
The process spent 3755615 uses to execute when priority is 105.
The process spent 3803837 uses to execute when priority is 106.
The process spent 3580712 uses to execute when priority is 107.
The process spent 3582123 uses to execute when priority is 108.
The process spent 3830283 uses to execute when priority is 109.
The process spent 3776663 uses to execute when priority is 110.
The process spent 3549516 uses to execute when priority is 111.
The process spent 3802537 uses to execute when priority is 112.
The process spent 3778204 uses to execute when priority is 113.
The process spent 3533065 uses to execute when priority is 114.
The process spent 3799272 uses to execute when priority is 115.
The process spent 3736201 uses to execute when priority is 116.
The process spent 3662759 uses to execute when priority is 117.
The process spent 3870829 uses to execute when priority is 118.
The process spent 3739904 uses to execute when priority is 119. The process spent 3736811 uses to execute when priority is 120. The process spent 3607730 uses to execute when priority is 121. The process spent 3586165 uses to execute when priority is 122. The process spent 3559252 uses to execute when priority is 123. The process spent 3630071 uses to execute when priority is 124. The process spent 3499468 uses to execute when priority is 125. The process spent 3758180 uses to execute when priority is 126. The process spent 3685475 uses to execute when priority is 127. The process spent 3647000 uses to execute when priority is 128. The process spent 3649907 uses to execute when priority is 129. The process spent 3683761 uses to execute when priority is 130. The process spent 3615835 uses to execute when priority is 131. The process spent 3689355 uses to execute when priority is 132. The process spent 3692036 uses to execute when priority is 133. The process spent 3790853 uses to execute when priority is 134. The process spent 4011269 uses to execute when priority is 135. The process spent 3677206 uses to execute when priority is 136. The process spent 3881988 uses to execute when priority is 137. The process spent 3818184 uses to execute when priority is 138. The process spent 3752251 uses to execute when priority is 139. 
```

This time `vruntime` is correctly set to powers of 2.

```
[ 101.822757] current's value of vruntime: 3944324083
[ 105.268327] current's value of vruntime: 2
[ 109.038494] current's value of vruntime: 4
[ 112.655942] current's value of vruntime: 8
[ 116.299992] current's value of vruntime: 16
[ 120.055612] current's value of vruntime: 32
[ 123.859453] current's value of vruntime: 64
[ 127.440171] current's value of vruntime: 128
[ 131.022298] current's value of vruntime: 256
[ 134.852589] current's value of vruntime: 512
[ 138.629258] current's value of vruntime: 1024
[ 142.178780] current's value of vruntime: 2048
[ 145.982111] current's value of vruntime: 4096
[ 149.761767] current's value of vruntime: 8192
[ 153.296061] current's value of vruntime: 16384
[ 157.096528] current's value of vruntime: 32768
[ 160.833791] current's value of vruntime: 65536
[ 164.497498] current's value of vruntime: 131072
[ 168.369237] current's value of vruntime: 262144
[ 172.109940] current's value of vruntime: 524288
[ 175.847480] current's value of vruntime: 1048576
[ 179.455856] current's value of vruntime: 2097152
[ 183.042612] current's value of vruntime: 4194304
[ 186.602405] current's value of vruntime: 8388608
[ 190.232989] current's value of vruntime: 16777216
[ 193.732915] current's value of vruntime: 33554432
[ 197.491553] current's value of vruntime: 67108864
[ 201.177447] current's value of vruntime: 134217728
[ 204.824835] current's value of vruntime: 268435456
[ 208.475109] current's value of vruntime: 536870912
[ 212.159217] current's value of vruntime: 1073741824
[ 215.775377] current's value of vruntime: 2147483648
[ 219.465044] current's value of vruntime: 4294967296
[ 223.157379] current's value of vruntime: 8589934592
[ 226.948525] current's value of vruntime: 17179869184
[ 230.960089] current's value of vruntime: 34359738368
[ 234.637556] current's value of vruntime: 68719476736
[ 238.519809] current's value of vruntime: 137438953472
[ 242.338245] current's value of vruntime:
274877906944
```

New results:

```
The process spent 8079862 uses to execute when priority is not adjusted.
The process spent 5779832 uses to execute when priority is 101.
The process spent 6818705 uses to execute when priority is 102.
The process spent 7234621 uses to execute when priority is 103.
The process spent 3215762 uses to execute when priority is 104.
The process spent 4064804 uses to execute when priority is 105.
The process spent 3806057 uses to execute when priority is 106.
The process spent 5991658 uses to execute when priority is 107.
The process spent 3046623 uses to execute when priority is 108.
The process spent 3210019 uses to execute when priority is 109.
The process spent 3184245 uses to execute when priority is 110.
The process spent 3150085 uses to execute when priority is 111.
The process spent 4007876 uses to execute when priority is 112.
The process spent 2569941 uses to execute when priority is 113.
The process spent 2755365 uses to execute when priority is 114.
The process spent 2541993 uses to execute when priority is 115.
The process spent 2515570 uses to execute when priority is 116.
The process spent 2717884 uses to execute when priority is 117.
The process spent 2840883 uses to execute when priority is 118.
The process spent 4846713 uses to execute when priority is 119.
The process spent 4033845 uses to execute when priority is 120.
The process spent 4091932 uses to execute when priority is 121.
The process spent 3407736 uses to execute when priority is 122.
The process spent 4292261 uses to execute when priority is 123.
The process spent 4097759 uses to execute when priority is 124.
The process spent 3427645 uses to execute when priority is 125.
The process spent 6647513 uses to execute when priority is 126.
The process spent 4491951 uses to execute when priority is 127.
The process spent 3770880 uses to execute when priority is 128.
The process spent 4356292 uses to execute when priority is 129.
The process spent 3820705 uses to execute when priority is 130.
The process spent 4810189 uses to execute when priority is 131.
The process spent 5704320 uses to execute when priority is 132.
The process spent 3759986 uses to execute when priority is 133.
The process spent 4443290 uses to execute when priority is 134.
The process spent 3034053 uses to execute when priority is 135.
The process spent 4011569 uses to execute when priority is 136.
The process spent 8203326 uses to execute when priority is 137.
The process spent 14792166 uses to execute when priority is 138.
The process spent 12428543 uses to execute when priority is 139.
```

```
[91363.933971] current's value of vruntime: 2
[91392.001838] current's value of vruntime: 4
[91399.784766] current's value of vruntime: 8
[91405.510675] current's value of vruntime: 16
[91415.718312] current's value of vruntime: 32
[91425.307306] current's value of vruntime: 64
[91429.736608] current's value of vruntime: 128
[91435.091146] current's value of vruntime: 256
[91438.707134] current's value of vruntime: 512
[91444.188028] current's value of vruntime: 1024
[91452.324652] current's value of vruntime: 2048
[91458.640939] current's value of vruntime: 4096
[91465.336580] current's value of vruntime: 8192
[91472.357091] current's value of vruntime: 16384
[91481.586793] current's value of vruntime: 32768
[91485.540576] current's value of vruntime: 65536
[91490.750686] current's value of vruntime: 131072
[91495.901628] current's value of vruntime: 262144
[91504.647470] current's value of vruntime: 524288
[91510.837274] current's value of vruntime: 1048576
[91516.241505] current's value of vruntime: 2097152
[91520.090351] current's value of vruntime: 4194304
[91524.199431] current's value of vruntime: 8388608
[91528.218994] current's value of vruntime: 16777216
[91535.101881] current's value of vruntime: 33554432
[91539.155982] current's value of vruntime: 67108864
[91544.824273] current's value of vruntime: 134217728
[91550.522904] current's value of vruntime: 268435456
[91554.346312] current's value of vruntime: 536870912
[91557.662758] current's value of vruntime: 1073741824
[91563.049835] current's value of vruntime: 2147483648
[91567.698232] current's value of vruntime: 4294967296
[91573.894896] current's value of vruntime: 8589934592
[91578.070975] current's value of vruntime: 17179869184
[91582.523592] current's value of vruntime: 34359738368
[91586.512184] current's value of vruntime: 68719476736
[91592.142344] current's value of vruntime: 137438953472
[91595.970235] current's value of vruntime: 274877906944
```

### Test 6

Continuing from Test 5, this time the main program decrements the priority, so the `vruntime` values are tested in reverse order to confirm whether `vruntime` itself has an effect.

```c=
	//for(index=1, priority=101;priority<=139;++priority,++index)
	for (index = 1, priority = 139; priority >= 101; --priority, ++index)
	{
		if(!my_set_process_priority(priority))
			printf("Cannot set priority %d\n", priority);
		gettimeofday(&start[index], NULL); //begin
		for(i=1;i<=TOTAL_ITERATION_NUM;i++)
			rand();
		gettimeofday(&end[index], NULL); //end
	}
	/*================================================================================*/
	printf("The process spent %ld uses to execute when priority is not adjusted.\n", ((end[0].tv_sec * 1000000 + end[0].tv_usec) - (start[0].tv_sec * 1000000 + start[0].tv_usec)));
	for(i=1;i<=NUM_OF_PRIORITIES_TESTED-1;i++)
		printf("The process spent %ld uses to execute when priority is %d.\n",
			//((end[i].tv_sec * 1000000 + end[i].tv_usec) - (start[i].tv_sec * 1000000 + start[i].tv_usec)), i+100);
			((end[i].tv_sec * 1000000 + end[i].tv_usec) - (start[i].tv_sec * 1000000 + start[i].tv_usec)), 140 - i);
	return 0;
```

```
The process spent 9261591 uses to execute when priority is not adjusted.
The process spent 4248620 uses to execute when priority is 139.
The process spent 3424541 uses to execute when priority is 138.
The process spent 5097628 uses to execute when priority is 137.
The process spent 3631228 uses to execute when priority is 136.
The process spent 5901888 uses to execute when priority is 135. The process spent 3929649 uses to execute when priority is 134. The process spent 5940852 uses to execute when priority is 133. The process spent 3245011 uses to execute when priority is 132. The process spent 4245613 uses to execute when priority is 131. The process spent 4266686 uses to execute when priority is 130. The process spent 4953714 uses to execute when priority is 129. The process spent 10084803 uses to execute when priority is 128. The process spent 15185233 uses to execute when priority is 127. The process spent 8695799 uses to execute when priority is 126. The process spent 4909407 uses to execute when priority is 125. The process spent 11414812 uses to execute when priority is 124. The process spent 9502933 uses to execute when priority is 123. The process spent 4838308 uses to execute when priority is 122. The process spent 2398544 uses to execute when priority is 121. The process spent 2336625 uses to execute when priority is 120. The process spent 3761715 uses to execute when priority is 119. The process spent 4141443 uses to execute when priority is 118. The process spent 10151837 uses to execute when priority is 117. The process spent 6793835 uses to execute when priority is 116. The process spent 2385918 uses to execute when priority is 115. The process spent 2671247 uses to execute when priority is 114. The process spent 2503366 uses to execute when priority is 113. The process spent 2532150 uses to execute when priority is 112. The process spent 2401123 uses to execute when priority is 111. The process spent 2498817 uses to execute when priority is 110. The process spent 3161778 uses to execute when priority is 109. The process spent 2239706 uses to execute when priority is 108. The process spent 2790137 uses to execute when priority is 107. The process spent 2376690 uses to execute when priority is 106. The process spent 2917025 uses to execute when priority is 105. 
The process spent 2702912 uses to execute when priority is 104.
The process spent 2609807 uses to execute when priority is 103.
The process spent 2633144 uses to execute when priority is 102.
The process spent 2770795 uses to execute when priority is 101.
```

```
[93768.084071] current's value of vruntime: 38956960869
[93772.332655] current's value of vruntime: 549755813888
[93775.757167] current's value of vruntime: 274877906944
[93780.854748] current's value of vruntime: 137438953472
[93784.485944] current's value of vruntime: 68719476736
[93790.387777] current's value of vruntime: 34359738368
[93794.317391] current's value of vruntime: 17179869184
[93800.258187] current's value of vruntime: 8589934592
[93803.503169] current's value of vruntime: 4294967296
[93807.748744] current's value of vruntime: 2147483648
[93812.015393] current's value of vruntime: 1073741824
[93816.969063] current's value of vruntime: 536870912
[93827.053765] current's value of vruntime: 268435456
[93842.238841] current's value of vruntime: 134217728
[93850.934555] current's value of vruntime: 67108864
[93855.843916] current's value of vruntime: 33554432
[93867.258611] current's value of vruntime: 16777216
[93876.761449] current's value of vruntime: 8388608
[93881.599710] current's value of vruntime: 4194304
[93883.998288] current's value of vruntime: 2097152
[93886.334895] current's value of vruntime: 1048576
[93890.096577] current's value of vruntime: 524288
[93894.237983] current's value of vruntime: 262144
[93904.389716] current's value of vruntime: 131072
[93911.183483] current's value of vruntime: 65536
[93913.569383] current's value of vruntime: 32768
[93916.240607] current's value of vruntime: 16384
[93918.743954] current's value of vruntime: 8192
[93921.276083] current's value of vruntime: 4096
[93923.677187] current's value of vruntime: 2048
[93926.175982] current's value of vruntime: 1024
[93929.337734] current's value of vruntime: 512
[93931.577422] current's value of vruntime:
256
[93934.367536] current's value of vruntime: 128
[93936.744208] current's value of vruntime: 64
[93939.661208] current's value of vruntime: 32
[93942.364097] current's value of vruntime: 16
[93944.973883] current's value of vruntime: 8
[93947.607004] current's value of vruntime: 4
```

As in the previous experiment, a larger `vruntime` goes with a longer execution time.

### Conclusion

The experiments above show that `vruntime` can affect execution time, but the effect is only visible when the `vruntime` values are spaced far apart: an iteration takes noticeably longer only once its `vruntime` rises above a certain range.

## Other Tests

### Modifying enqueue_task_fair

The idea is to change `static_prio` and `vruntime` before CFS uses `vruntime` (Lines 20-22).

```c=
static void
enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
	struct cfs_rq *cfs_rq;
	struct sched_entity *se = &p->se;

	/*
	 * If in_iowait is set, the code below may not trigger any cpufreq
	 * utilization updates, so do it here explicitly with the IOWAIT flag
	 * passed.
	 */
	if (p->in_iowait)
		cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);

	for_each_sched_entity(se) {
		if (se->on_rq)
			break;
		cfs_rq = cfs_rq_of(se);
		if (p->my_fixed_priority) {
			p->static_prio = p->my_fixed_priority;
			p->se.vruntime = 1;
		}
		enqueue_entity(cfs_rq, se, flags);

		/*
		 * end evaluation on encountering a throttled cfs_rq
		 *
		 * note: in the case of encountering a throttled cfs_rq we will
		 * post the final h_nr_running increment below.
		 */
		if (cfs_rq_throttled(cfs_rq))
			break;
		cfs_rq->h_nr_running++;
		flags = ENQUEUE_WAKEUP;
	}

	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);
		cfs_rq->h_nr_running++;
		if (cfs_rq_throttled(cfs_rq))
			break;
		update_load_avg(cfs_rq, se, UPDATE_TG);
		update_cfs_group(se);
	}
	if (!se)
		add_nr_running(rq, 1);
	hrtick_update(rq);
}
```

When a process is about to be added to the CFS red-black tree, `enqueue_task_fair()` is called, and the process is inserted into the runqueue in `enqueue_entity()`. Changing the values just before that call is meant to influence the process's priority within the runqueue.

https://hackmd.io/@RinHizakura/BJ9m_qs-5

> Next comes the part most relevant to scheduling. First, for_each_sched_entity iterates over the sched_entity of the given task_struct and its parents, moving upward. This is because CFS is organized in multiple levels, so the enqueue operation has to be performed on the sched entity at every level (which may belong to a task_struct or a task_group).
>
> For each sched_entity visited, if it is not already on a runqueue, it is added (enqueue_entity) to the runqueue it belongs to (cfs_rq_of).

### System call

In the system call we set `my_fixed_priority` and print the value of `policy`. The result is about the same as before: the execution time does not increase as the priority value increases. We can also see that `unsigned int policy` in `task_struct` is 0. Looking this value up in `include/uapi/linux/sched.h`:

> #define SCHED_NORMAL 0
> #define SCHED_FIFO 1
> #define SCHED_RR 2
> #define SCHED_BATCH 3
> #define SCHED_IDLE 5
> #define SCHED_DEADLINE 6

This means the policy in use is `SCHED_NORMAL`, so the scheduler is the Completely Fair Scheduler (CFS).

> 5. Scheduling policies
> CFS implements three scheduling policies:
> - SCHED_NORMAL (traditionally called SCHED_OTHER): The scheduling policy that is used for regular tasks.
>
> <cite>Source: https://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt</cite>

Because CFS is in use, and CFS schedules by picking the process with the smallest `vruntime`, changing the `static_prio` or `prio` field of `task_struct` does not directly affect the process's priority.

#### System call

```c=
#include <linux/kernel.h>
#include <linux/syscalls.h>
#include <linux/sched.h>
#include <linux/sched/rt.h>

SYSCALL_DEFINE1(my_set_process_priority, int, x)
{
	/* Only priorities 101..139 are valid; anything else is an error. */
	if (x < 101 || x > 139)
		return 0;

	current->my_fixed_priority = x;
	printk("current's my_fixed_priority: %d\n", current->my_fixed_priority);
	printk("current's static priority: %d\n", current->static_prio);
	/*
	 * load.weight is unsigned long and vruntime is u64, so print them
	 * with %lu and %llu. The dmesg log below was captured with %d, which
	 * is why vruntime is truncated to 32 bits and sometimes negative there.
	 */
	printk("load_weight: %lu\n", current->se.load.weight);
	printk("current's vruntime: %llu\n", current->se.vruntime);
	printk("\n");

	return 1;
}
```

### Testing Result

#### Output of main.c

```
The process spent 700235 uses to execute when priority is not adjusted.
The process spent 741797 uses to execute when priority is 101.
The process spent 690870 uses to execute when priority is 102.
The process spent 682518 uses to execute when priority is 103.
The process spent 693510 uses to execute when priority is 104.
The process spent 689205 uses to execute when priority is 105.
The process spent 700151 uses to execute when priority is 106.
The process spent 697124 uses to execute when priority is 107.
The process spent 693079 uses to execute when priority is 108.
The process spent 727996 uses to execute when priority is 109.
The process spent 703636 uses to execute when priority is 110.
The process spent 707687 uses to execute when priority is 111.
The process spent 727939 uses to execute when priority is 112.
The process spent 688285 uses to execute when priority is 113.
The process spent 684848 uses to execute when priority is 114.
The process spent 689580 uses to execute when priority is 115.
The process spent 680084 uses to execute when priority is 116.
The process spent 689089 uses to execute when priority is 117.
The process spent 687513 uses to execute when priority is 118.
The process spent 679362 uses to execute when priority is 119. The process spent 679481 uses to execute when priority is 120. The process spent 684154 uses to execute when priority is 121. The process spent 673355 uses to execute when priority is 122. The process spent 715016 uses to execute when priority is 123. The process spent 687629 uses to execute when priority is 124. The process spent 678225 uses to execute when priority is 125. The process spent 676437 uses to execute when priority is 126. The process spent 686919 uses to execute when priority is 127. The process spent 707980 uses to execute when priority is 128. The process spent 677194 uses to execute when priority is 129. The process spent 720130 uses to execute when priority is 130. The process spent 681196 uses to execute when priority is 131. The process spent 699227 uses to execute when priority is 132. The process spent 702344 uses to execute when priority is 133. The process spent 691622 uses to execute when priority is 134. The process spent 678081 uses to execute when priority is 135. The process spent 708347 uses to execute when priority is 136. The process spent 707730 uses to execute when priority is 137. The process spent 695181 uses to execute when priority is 138. The process spent 675494 uses to execute when priority is 139. 
```

#### Result of dmesg

```
[87731.010113] current's my_fixed_priority: 101
[87731.010114] current's static priority: 120
[87731.010115] load_weight: 1048576
[87731.010115] current's vruntime: 768281953
[87731.717893] current's my_fixed_priority: 102
[87731.717894] current's static priority: 120
[87731.717895] load_weight: 1048576
[87731.717895] current's vruntime: 1476238097
[87732.412224] current's my_fixed_priority: 103
[87732.412225] current's static priority: 120
[87732.412226] load_weight: 1048576
[87732.412226] current's vruntime: -2126623326
[87733.107687] current's my_fixed_priority: 104
[87733.107688] current's static priority: 120
[87733.107688] load_weight: 1048576
[87733.107689] current's vruntime: -1430727888
[87733.793453] current's my_fixed_priority: 105
[87733.793454] current's static priority: 120
[87733.793455] load_weight: 1048576
[87733.793455] current's vruntime: -746387378
[87734.480151] current's my_fixed_priority: 106
[87734.480152] current's static priority: 120
[87734.480153] load_weight: 1048576
[87734.480153] current's vruntime: -58670743
[87735.209074] current's my_fixed_priority: 107
[87735.209075] current's static priority: 120
[87735.209076] load_weight: 1048576
[87735.209077] current's vruntime: 669377261
[87735.965893] current's my_fixed_priority: 108
[87735.965894] current's static priority: 120
[87735.965895] load_weight: 1048576
[87735.965895] current's vruntime: 1425748240
[87736.753787] current's my_fixed_priority: 109
[87736.753788] current's static priority: 120
[87736.753789] load_weight: 1048576
[87736.753789] current's vruntime: -2080968826
[87737.536793] current's my_fixed_priority: 110
[87737.536795] current's static priority: 120
[87737.536796] load_weight: 1048576
[87737.536796] current's vruntime: -1297695755
[87738.262545] current's my_fixed_priority: 111
[87738.262546] current's static priority: 120
[87738.262547] load_weight: 1048576
[87738.262548] current's vruntime: -569563845
[87738.987134] current's my_fixed_priority: 112
[87738.987135] current's static priority: 120
[87738.987136] load_weight: 1048576
[87738.987136] current's vruntime: 154362076
[87739.731173] current's my_fixed_priority: 113
[87739.731174] current's static priority: 120
[87739.731175] load_weight: 1048576
[87739.731175] current's vruntime: 898634213
```

The results show that neither `static_prio` nor `vruntime` ended up modified, which suggests that other functions change these two values afterwards. The negative `vruntime` values are most likely an artifact of logging the 64-bit field with `%d`, which shows only its low 32 bits as a signed number.

## Summary

Based on the tests in the [Testing static_prio](https://hackmd.io/fYBthNyNTAG85N--g9QNAw?view#%E6%B8%AC%E8%A9%A6-static_prio) and [Testing vruntime](https://hackmd.io/fYBthNyNTAG85N--g9QNAw?view#%E6%B8%AC%E8%A9%A6-vruntime) sections, `static_prio` does affect the value of `vruntime`, but it has little effect on the execution time of the process. Modifying `vruntime` directly has a larger effect, though still not a dramatic one. This may be because CFS schedules by picking the leftmost node of the red-black tree (the process with the smallest vruntime): when the system runs few processes the effect is small, so processes whose values are set too close together are chosen to run with similar probability. In our tests we wrote a script that starts as many processes as there are cores in order to saturate the CPU, so the scheduler probably becomes less likely to pick the test program only when its `vruntime` is larger than that of the script's processes, which lengthens the execution time.