# Project2
> 第 7 組
> 112522099 許俊偉
> 112522115 林語潔
> 112526001 許仁覺
## Introduction
[linux_project_2 作業說明](https://staff.csie.ncu.edu.tw/hsufh/COURSES/FALL2023/linux_project_2.html)
Write a new system call int my_set_process_priority(int x) so that a process P can use this new system call my_set_process_priority(int x) to set the priority of the process as x every time when a context switch (i.e. process switch) transfers CPU to process P
## Outline
[TOC]
## Enviroment
* VMware 16.2.4
* Ubuntu 16.04 , 64位元
* Kernel 4.15.1
## Compile Kernel
1. 下載 linux kernel 並解壓縮
`tar -xvf ~/linux-4.15.1.tar.gz`
`cd linux-4.15.1/`
下載網址: https://cdn.kernel.org/pub/linux/kernel/v4.x/
2. 在目錄 linux-4.15.1/ 下
`mkdir mycall`
`cd mycall`
3. 新增一個檔案 my_set_priority.c,後面有system call 的程式
4. 在目錄 linux-4.15.1/mycall/ 下
`gedit Makefile`
`obj-y := my_set_priority.o`
5. 在目錄 linux-4.15.1/ 下修改檔案 Makefile
這是為了告訴它,新增的 system call 的 source files 是在 mycall 目錄下
`gedit Makefile`
```
core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/
找到這行,在最後面新增 mycall/
core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/ mycall/
```
6. 新增 system call 到 system call table 中
`gedit arch/x86/entry/syscalls/syscall_64.tbl`
```
在最後面新增一行
333 common my_set_priority sys_my_set_priority
```
7. 修改 system call header
`gedit include/linux/syscalls.h`
```
在最下面 (#endif前) 新增一行
asmlinkage long sys_my_set_priority(int number) ;
這定義了我們 system call 的 prototype
asmlinkage 代表我們的參數都可以在 stack 裡取用
```
8. 安裝編譯 kernel 所需的套件
`sudo apt install build-essential libncurses-dev libssl-dev libelf-dev bison flex -y`
9. 設定檔
`sudo make menuconfig`
10. 開始編譯
`sudo make -j4`
`sudo make modules_install install -j4`
`sudo make install -j4`
11. 修改 grub
`sudo gedit /etc/default/grub`
修改成:
```
GRUB_TIMEOUT_STYLE=menu
GRUB_TIMEOUT=3
```
更新 grub:
`sudo grub-mkconfig -o /boot/grub/grub.cfg`
12. 重開機
`reboot`
## Add Syscall
### 設定 my_fixed_priority
```c=
#include <linux/sched.h>
#include <asm/current.h>
#include <linux/printk.h>
#include <linux/kernel.h>
#include <linux/syscalls.h>
SYSCALL_DEFINE1(my_set_priority, int, number){
printk("SYSCALL333 | number: %ld\n", number);
current -> my_fixed_priority = number ;
if (current -> my_fixed_priority == number) {
return 1;
}
return 0;
}
```
### 查看 static_prio
```c=
#include <linux/sched.h>
#include <asm/current.h>
#include <linux/printk.h>
#include <linux/kernel.h>
#include <linux/syscalls.h>
SYSCALL_DEFINE0(check_static_prio){
printk("SYSCALL334 | static_prio: %ld\n", current -> static_prio);
}
```
## Kernel Space
1. **在 task_struct 新增欄位**
> 路徑: include/linux/sched.h

2. **在 copy_process() 初始化新增的欄位**
> 路徑: kernel/fork.c

3. **在 __schedule() 中更改 static->prio**
> 路徑: kernel/sched/core.c
> 
## User Space
```c=
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <syscall.h>
#define TOTAL_ITERATION_NUM 100000000
#define NUM_OF_PRIORITIES_TESTED 40
int main(){
int index=0;
int priority,i;
struct timeval start[NUM_OF_PRIORITIES_TESTED], end[NUM_OF_PRIORITIES_TESTED];
gettimeofday(&start[index], NULL); //begin
for(i=1;i<=TOTAL_ITERATION_NUM;i++)
rand();
gettimeofday(&end[index], NULL); //end
/*================================================================================*/
for(index=1, priority=101;priority<=139;++priority,++index) {
if(!syscall(333, priority))
printf("Cannot set priority %d.\n", priority);
gettimeofday(&start[index], NULL); //begin
for(i=1;i<=TOTAL_ITERATION_NUM;i++)
rand();
gettimeofday(&end[index], NULL); //end
syscall(334); // 查看 static_prio是否有成功修改
}
/*================================================================================*/
printf("The process spent %ld uses to execute when priority is not adjusted.\n", ((end[0].tv_sec * 1000000 + end[0].tv_usec) - (start[0].tv_sec * 1000000 + start[0].tv_usec)));
for(i=1;i<=NUM_OF_PRIORITIES_TESTED-1;i++)
printf("The process spent %ld uses to execute when priority is %d.\n", ((end[i].tv_sec * 1000000 + end[i].tv_usec) - (start[i].tv_sec * 1000000 + start[i].tv_usec)), i+100);
return 0;
}
```
* 使用 dmesg 查看

> line1: SYSCALL333 設定 my_fixed_priority 為 130
> line2: 在 core.c __schedule() 中,更改 static_prio 之前
> line3: 在 core.c __schedule() 中,更改 static_prio 之後
> line4: SYSCALL334 使用 system call 查看 static_prio 是否有成功修改
* 程式輸出

> 為了防止新process佔用cpu時間,place_entity會對新process的vruntime進行處罰,故雖然改變的static_prior,但因為對新process權重改變的vruntime處罰,實際運行的時間並無隨prior有趨勢上的變化。
## 補充
### 四種Scheduling class及priority
透過Next 把各Scheduling classes 串起來

### task_struct 四種優先級成員變數
[/include/linux/sched.h](https://elixir.bootlin.com/linux/v4.15.1/source/include/linux/sched.h#L520)
在與task有關的task_struct 裡,因應不同的scheuling定義了四種優先級的成員變數
```c=
struct task_struct {
...
int prio;
int static_prio;
int normal_prio;
unsigned int rt_priority;
...
```
> prio
> 動態優先級,系統會針對此prio值去選擇scheduling class,為調度器真正使用的優先度
> 初始值由sched_fork()或由effective_prio()決定
```c=
static int effective_prio(struct task_struct *p)
{
p->normal_prio = normal_prio(p);
/*
* If we are RT tasks or we were boosted to RT priority,
* keep the priority unchanged. Otherwise, update priority
* to the normal priority:
*/
if (!rt_prio(p->prio))
return p->normal_prio;
return p->prio;
}
```
> static_prio
> 静態優先級,為process 被創造時sched_fork()產生
> 可以透過nice、renice或者setpriority()來做修改
> 用戶透過輸入nice到NICE_TO_PRIO修改static_prio值
>normal_prio
>是基於static_prio和調度策略計算出來的優先權,在建立process時會繼承父行程normal_prio。
>normal tasks:normal_prio = static_prio;
>real time task:normal_prio = 99 - rt_priority;
```c=
static inline int __normal_prio(struct task_struct *p)
{
return p->static_prio;
}
static inline int normal_prio(struct task_struct *p)
{
int prio;
if (task_has_dl_policy(p))
prio = MAX_DL_PRIO-1; //Deadline: 固定為-1
else if (task_has_rt_policy(p))
prio = MAX_RT_PRIO-1 - p->rt_priority; //Real time: 固定為100-1-rt_priority
else
prio = __normal_prio(p); //Normal: prior=static_prio
return prio;
}
```
> rt_priority
> Realtime優先級,在被rl調度器調度的process中才會有意義, 範圍0-99
### Prior(Kernel)與Nice(User)轉換
> 主要透過 NICE_TO_PRIO 和 PRIO_TO_NICE 進行轉換

```c=
#define MAX_USER_RT_PRIO 100
#define DEFAULT_PRIO (MAX_RT_PRIO + NICE_WIDTH / 2)
#define NICE_TO_PRIO(nice) ((nice) + DEFAULT_PRIO)
#define PRIO_TO_NICE(prio) ((prio) - DEFAULT_PRIO)
```
### 新建process 流程
從 do_fork-->copy_process 進行 process 調度的初始化
```c=
copy_process()
sched_fork()
__sched_fork()
fair_sched_class->task_fork()->task_fork_fair()
__set_task_cpu()
update_curr()
place_entity()
wake_up_new_task()
activate_task()
enqueue_task
fair_sched_class->enqueue_task-->enqueue_task_fair()
```
sched_fork()
>呼叫 __sched_fork() 對 task_struct 進行 default 值的初始化(ex: se.vruntime=0)
>將父 process 的 normal_prio 設目前 process 的 prio
>透過 process 的 prio 分配 sched_class
```c=
int sched_fork(unsigned long clone_flags, struct task_struct *p)
{
unsigned long flags;
int cpu = get_cpu();
__sched_fork(clone_flags, p);
p->state = TASK_NEW; /* TASK_NEW 表示新prcoess */
p->prio = current->normal_prio; /* 將父process的normal_prio設子process 的prio */
if (unlikely(p->sched_reset_on_fork)) { /* 如果重置設定=True,重置policy、static_prio、prio、weight、inv_weight等。 */
if (task_has_dl_policy(p) || task_has_rt_policy(p)) {
p->policy = SCHED_NORMAL;
p->static_prio = NICE_TO_PRIO(0);
p->rt_priority = 0;
} else if (PRIO_TO_NICE(p->static_prio) < 0)
p->static_prio = NICE_TO_PRIO(0);
p->prio = p->normal_prio = __normal_prio(p);
set_load_weight(p, false);
p->sched_reset_on_fork = 0;
}
if (dl_prio(p->prio)) {
put_cpu();
return -EAGAIN;
} else if (rt_prio(p->prio)) {
p->sched_class = &rt_sched_class;
} else {
p->sched_class = &fair_sched_class; /* 設定子process的調度器為CFS */
}
init_entity_runnable_average(&p->se);
raw_spin_lock_irqsave(&p->pi_lock, flags);
...
if (p->sched_class->task_fork)
p->sched_class->task_fork(p); /*使用task_fork方法,CFS 對應task_fork_fair() */
...
}
```
task_fork_fair
> 更新 curr 資訊
> 設定目前 process 的 vuntime 為新 process vruntime
> 呼叫 place_entity 對 process vruntime 進行懲罰(因为新 process導致 CFS run queue 權重發生變化)
```c=
static void task_fork_fair(struct task_struct *p)
{
struct cfs_rq *cfs_rq;
struct sched_entity *se = &p->se, *curr;
struct rq *rq = this_rq();
struct rq_flags rf;
rq_lock(rq, &rf);
update_rq_clock(rq);
cfs_rq = task_cfs_rq(current); /* 取得目前process所在cpu的cfs_rq */
curr = cfs_rq->curr;
if (curr) {
update_curr(cfs_rq); /*更新目前调度實體cfs_rq->curr資訊 ex: 運行時間vruntime */
se->vruntime = curr->vruntime; /*新process的vrumtime使用目前process的vruntime */
}
place_entity(cfs_rq, se, 1); /*對新process的vruntime 進行懲罰 */
if (sysctl_sched_child_runs_first && curr && entity_before(curr, se)) {
swap(curr->vruntime, se->vruntime);
resched_curr(rq);
}
se->vruntime -= cfs_rq->min_vruntime;
rq_unlock(rq, &rf);
}
```
### 計算vruntime
> CFS 中所謂的 fair 是 vruntime 的,而不是實際時間的平等。
> CFS 調度器拋棄先前固定時間片和固定調度週期的演算法,而採用 process 權重值的比重來量化和計算實際運行時間。
> 引入虛擬時鐘概念,每個 process 虛擬時間是實際運行時間相對 Nice 值為0的權重比例值。
> Nice 值小的 process,優先順序高且權重大,其虛擬時鐘比真實時鐘跑得慢,所以可以獲得更多的實際運行時間。
> 反之,Nice 值大的 process,優先權低權重小,獲得的實際運行時間也更少。
>CFS選擇 vruntime 跑得慢的進程(Nice 低),而不是實際運行時間運行的少的進程。
>vruntime=delta_exec*nice_0_weight/weight
>vruntime表示進程的虛擬運行時間,delta_exec表示進程實際運行時間,nice_0_weight表示nice為0權重值,weight表示該進程的權重值,可以透過prio_to_weight[]取得。
### Process 調度流程
```c=
__schedule()
->pick_next_task()
->pick_next_task_fair()
->context_switch()
->switch_mm()
->cpu_v7_switch_mm()
->switch_to()
->__switch_to
```
__schedule
> 呼叫pick_next_task()讓process調度器從run_queue中選擇一個最適合的next process,然後context_switch()切換到next process運行。
```c=
static void __sched notrace __schedule(bool preempt)
{
struct task_struct *prev, *next;
unsigned long *switch_count;
struct rq_flags rf;
struct rq *rq;
int cpu;
cpu = smp_processor_id();
rq = cpu_rq(cpu);
prev = rq->curr;
...
/* 呼叫pick_next_task_fair()從rq上選擇適當的process傳回next。*/
next = pick_next_task(rq, prev, &rf);
clear_tsk_need_resched(prev);
clear_preempt_need_resched();
/* 如果目前process和next不同,那麼呼叫context_switch()進行上下文切換。 */
if (likely(prev != next)) {
rq->nr_switches++;
rq->curr = next;
++*switch_count;
trace_sched_switch(preempt, prev, next);
/* Also unlocks the rq: */
rq = context_switch(rq, prev, next, &rf);
} else {
rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
rq_unlock_irq(rq, &rf);
}
balance_callback(rq);
}
```
## Problem
1. 編譯時缺少 gelf.h

* solution: `sudo apt-get install libelf-dev`
* reference: [Ubuntu编译出现:gelf.h: No such file or directory](https://blog.csdn.net/weixin_44260459/article/details/123035958)
2. 在寫copy_process時放錯位置

* solution: 放在最後一行

## Reference
1. [Source code: /include/linux/sched.h](https://elixir.bootlin.com/linux/v4.15.1/source/include/linux/sched.h#L520)
2. [Source code: /kernel/fork.c](https://elixir.bootlin.com/linux/v4.15.1/source/kernel/fork.c#L1534)
3. [Source code: /kernel/sched/core.c](https://elixir.bootlin.com/linux/v4.15.1/source/kernel/sched/core.c)
4. [四種任務優先度 (prio、static_prio、rt_priority 與 normal_prio) 之間的差異](https://hackmd.io/@kurokuo/r1naFPjRV)
5. [__schedule __setscheduler_params __sched_setscheduler](https://hackmd.io/@LJP/rJVFJsi1E#__setscheduler)
6. [sched_fork过程分析](https://blog.csdn.net/lizhijun_buaa/article/details/129346719)
7. [Linux进程管理 (2)CFS调度器](https://www.cnblogs.com/arnoldlu/p/8465034.html)