Chap. 05 - Process scheduling

# Chap. 05 - Process scheduling > 課程內容 : 清華大學開放式課程周志遠教授 > 參考書目 : Operating System Concepts (9th), Abraham Silberschatz, Peter Baer, Galvin, Greg Gagne > > 其他科目內容請參見 [[Here]](https://hackmd.io/@ChenZE/By4SOO6Jyl) **Interview:** [#Preemptive/Non-preemptive](#Q-請說明-CPU-排程中-preemptive-與-non-preemptive-的差別) [#Scheduling algorithm](#Q-試舉出幾種常見經典的排程演算法scheduling-algorithm) ## Content > 本章重點聚焦在 CPU scheduling * [Basic concepts](#Basic-concepts) * [1. Preemptive v.s. non-preemptive](#1-Preemptive-vs-non-preemptive) * [1.1 CPU scheduler](#11-CPU-scheduler) * [1.2 Non-preemptive scheduling](#12-Non-preemptive-scheduling) * [1.3 Preemptive](#13-Preemptive) * [Scheduling algorithm](#Scheduling-algorithm) * [1. Scheduling criteria](#1-Scheduling-criteria) * [2. FCFS scheduling](#2-FCFS-scheduling) * [3. SJF scheduling](#3-SJF-scheduling) * [3.1 Non-preemptive](#31-Non-preemptive) * [3.2 Preemptive](#32-Preemptive) * [3.3 Approximate SJF](#33-Approximate-SJF) * [4. Priority scheduling](#4-Priority-scheduling) * [5. RR scheduling](#5-RR-scheduling) * [6. Multi-level queue scheduling](#6-Multi-level-queue-scheduling) * [7. Multi-level feedback queue scheduling](#7-Multi-level-feedback-queue-scheduling) * [Special scheduling issues](#Special-scheduling-issues) * [1. Multi-processor scheduling](#1-Multi-processor-scheduling) * [2. Multi-core processor scheduling](#2-Multi-core-processor-scheduling) * [3. Real-time scheduling](#3-Real-time-scheduling) ## Basic concepts > The idea of multi-prorgamming: > Keep several processes **in memory**. Every time one process has to **wait** and another process **takes over** the use of the CPU. Scheduler 存在的主要目的是**因為 CPU cores 的數量有限**，每次只能執行一個 process，所有必須**透過 scheduler 的管理讓 CPU 不斷計算**。 Process 在執行的過程中是由一連串的 CPU cycle execution 與 I/O wait 交替組成，這些一連串的 CPU cycle 與 I/O wait 稱為 CPU burst 與 I/O burst。 * CPU burst: a large number of short CPU burst * 多數的 CPU burst 時間都很短(速度快) * 但因為 process 大部分都是使用 CPU 計算，所有又會有**大量的 CPU brust** * I/O burst: a small number of long CPU burst * 多數的 I/O burst 時間都很長(速度慢) * 但因為需要處理 I/O 的機會較少，所以 **I/O burst 的數量也比較少** Program 執行時有分為 I/O-bound program 與 CPU-bound program。CPU-bound 的 program 會有比較長的 CPU burst，反之 I/O-bound 都是較短的 CPU burst。 ### 1. Preemptive v.s. non-preemptive > [!Tip] **Interview Ques.** > ##### Q: 請說明 CPU 排程中 preemptive 與 non-preemptive 的差別 > * Non-preemptive 是指 CPU 一旦被某個 process 佔用，就會持續執行該 process 直到它主動主動釋放資源，中途不會被其他 process 搶佔 > * Preemptive 是指在某些情況下(Ex. time slice/priority)會中斷目前正在執行的 process，將 CPU 分配給其他 process 執行 #### 2.1 CPU scheduler > **CPU scheduler** > > CPU scheduler(排程器)的作用是**從 ready queue 之中選擇一個 process 開始執行。** ![image](https://hackmd.io/_uploads/HyNnphuvkx.png) CPU scheduler 會在以下 4 種狀況作動，並判斷是否要在 CPU 與 memory 之間進行 process 的切換(switch) | No. | Status | Intro. | |:-: | :-: | :-: | | 1 | I/O event | switch **from running to waiting** state | | 2 | Time-sharing | switch **from running to ready** state | | 3 | I/O finish | switch **from waiting to ready** state | | 4 | Terminate | process finish | (註：第三種是因為執行時間(timer)抵達上限，在 CPU burst 的過程中直接打斷並將控制權轉給其他的 process) 此外，CPU scheduler 有兩種執行模式: (1) preemptive 與 (2) non-preemptive，兩者差別在於要不要在 process 執行 CPU burst 的過程中**強制切換**。 #### 1.2 Non-preemptive scheduling Non-preemptive scheduling 是屬於**不會打斷 process CPU burst 的 scheduler**。只會在下面兩種情況進行排程(scheduling)與切換(switch) * CPU is **terminated** * switch to the **waiting state**(I/O event) (只會在 No. 1 與 No. 4 的情況下作動) #### 1.3 Preemptive Preemptive 會**強制打斷 process 的 CPU burst**，並且**根據 priority 決定要不要進行 process 的切換**，上述四種狀況都有可能發生 preemptive scheduling。現今的 OS 大多是採用 preemptive scheduling，因為效率比較好，但因為 process 隨時會被中斷執行，所以需要注意**同步化(synchronization)的問題**。 Preemptive scheduler 的作用範圍**不僅限於 user space**。對於 OS kernel 來說其本身也是一隻程式，也有可能會受到 preemptive scheduler 的影響而被迫中斷導致問題出現。解決方式(Unix solution)：如果是 kernel program 接收 scheduler 發出的 interrupt 要求切換 process 的時候，會自動忽略(disable interrupt)直到當前的 process 完成，在這之間也不會做 time sharing。 :::info * Scheduler: 決定下一個要執行的 process * Dispatcher: 決定怎麼執行切換的過程，Ex: context switch 就屬於一種 dispatcher ::: ## Scheduling algorithm > [!Tip] **Interview Ques.** > ##### Q: 試舉出幾種常見/經典的排程演算法(scheduling algorithm) > * **F**irst **C**ome **F**irst **S**erved(FCFS) > * **S**hortest **J**ob **F**irst(SJF) > * **R**ound-**R**obin(RR) ### 1. Scheduling criteria 判斷一個 scheduling algorithm 的好壞有以下幾種標準： (1) 以系統角度而言 * CPU utilization * CPU 使用率 * 理論上介於 0% ~100% 之間 * 但實際上會介在 40% ~ 90% 之間 * **Throughput** * 單位時間內完成的工作量(process) (2) Scheduler 角度而言 * Turnaround time * 一個工作的執行所需的時間 * 從 submission 到 completion * Waiting time * 在 ready queue 中的總計等待時間 * I/O burst 的時間不算在內 * Response time * 第一個 CPU burst 的時間 * 從 submission 到第一次 CPU burst 的時間 > [!Note] > 以下 scheduling algorithm 中舉例的 process 寫法為 process No.(CPU burst time)，例如 P1(24) 代表第一個 process 需要 24 個 CPU cycle burst time，但實際上可能有多個不同的 burst time 切換執行，Ex: CPU -> I/O -> CPU -> I/O -> ... ### 2. FCFS scheduling **F**irst-**C**ome-**F**irst-**S**erved scheduling(先到先執行)，先進入 ready queue 的 process 會先被執行。假設 P$_1$(24)，P$_2$(3)，P$_3$(3)，則執行順序如下 :  ![image](https://hackmd.io/_uploads/B1BcAtySxx.png) * Each process waiting time: P1 = 0, P2 = 24, P3 = 27 * Average waiting time(AWT): $\cfrac{0 + 24 + 27}{3} = 17$ * Improvement: 把 burst time 較短的放前面先執行可以縮短等待時間 (Improvement ver.) ![image](https://hackmd.io/_uploads/SyZ5D1gHeg.png)  * Each waiting time: P$_1$ = 6, P$_2$ = 0, P$_3$ = 3 * AWT: $\cfrac{6 + 0 + 3}{3} = 3$ ### 3. SJF scheduling **S**hortest-**J**ob-**F**irst scheduling(最短的 burst time 先執行)。目的是 minimize 等待時間(waiting time)，因為 FCFS scheduling 的 process 是隨機出現的，哪個先進入 ready queue 就先執行。 SJF scheduling 能夠得到**最小的 waiting time**，但因為無法開全知視角(友哈巴赫?)知道哪些 process 有最短的 burst time，所以是**理論上的最佳解(optimal solution)** 根據 scheduler 分成兩種模式 * Non-preemptive * Preemptive #### 3.1 Non-preemptive CPU 每次執行一個最短 burst time 的 process，直到該 process 完成後才會換下一個 process。以下表 process 為例： | Process No. | Arrival time | Burst time | | :-: | :-: | :-: | | P$_1$ | 0 | 7 | | P$_2$ | 2 | 4 | | P$_3$ | 4 | 1 | | P$_4$ | 5 | 4 | (arrival time 表示進入 ready queue 的時間) ##### (1) T = 0 時：P$_1$ 執行 * Ready queue: P$_1$ * Schedule: P$_1$(7) 只有 P$_1$ 在 ready queue 中等待，所以先執行 P$_1$。 ![image](https://hackmd.io/_uploads/rksvA3dD1x.png) ##### (2) T = 7 時：P$_1$ 完成，P$_3$ 執行** * Ready queue: P$_2$(4), P$_3$(1), P$_4$(4) * Schedule: P$_3$(1) T = 2，4，5 時 P$_2$，P$_3$，P$_4$ 依序進入 ready queue 中等待。且因為是 non-preemptive scheduling 所以會等待 P$_1$ 完成(T=7)才會繼續下一個 process。選擇最短 burst time 的 P$_3$ 執行。 ![image](https://hackmd.io/_uploads/HJSYAn_DJe.png) ##### (3) T = 8 時：P$_3$ 完成，P$_2$ 執行 * Ready queue: P$_2$(4), P$_4$(4) * Schedule: P$_2$(4) T = 8 時 P$_3$ 完成，選擇下一個最短 burst time 的 P$_2$ 執行。 ![image](https://hackmd.io/_uploads/H1jqAhOPyx.png) ##### (4) T = 12 時：P$_2$ 完成，P$_4$ 執行 * Read queue: P$_4$(4) * Schedule: P$_4$(4) T = 12 時 P$_2$ 完成，選擇下一個最短 burst time 的 P$_4$ 執行。 ![image](https://hackmd.io/_uploads/SJTsR2Ov1l.png) ##### (5) T = 16 時：P$_4$ 完成 * waiting time = completion time - arrival time - run time * P$_1$ = 7 - 0 - 7 = 0 * P$_2$ = 12 - 2 - 4 = 6 * P$_3$ = 8 - 4 - 1 = 3 * P$_4$ = 16 - 5 - 4 = 7 * AWT = $\cfrac{0 + 6 + 3 + 7}{4} = 4$ #### 3.2 Preemptive CPU 每次執行一個最短 burst time 的 process，直到該 process 完成後才會換下一個 process，且每個 process 執行**過程中隨時會被打斷**。以下表 process 為例： | Process No. | Arrival time | Burst time | | :-: | :-: | :-: | | P$_1$ | 0 | 7 | | P$_2$ | 2 | 4 | | P$_3$ | 4 | 1 | | P$_4$ | 5 | 4 | ##### (1) T = 0 時：P$_1$(7) 執行 * Ready queue: P$_1$(7) * Schedule: P$_1$(7) 只有 P$_1$ 在 ready queue 中等待，所以先執行 P$_1$。 ![image](https://hackmd.io/_uploads/ByATChuvJx.png) ##### (2) T = 2 時：P$_1$(5) 暫停，P$_2$(4) 執行 * Ready queue: P$_1$(5), P$_2$(4) * Schedule: P$_2$(4) T = 2 時 P$_2$ 進入 ready queue，此時前一個 P$_1$ 還有 5 個 burst time 尚未完成，所以選擇較小的 P$_2$(4) 執行。($P_1(5) > P_2(4)$) ![image](https://hackmd.io/_uploads/HJQyyTuPJx.png) ##### (3) T = 4 時：P$_1$(5) 暫停，P$_2$(2) 暫停，P$_3$(1) 執行 * Ready queue: P$_1$(5), P$_2$(2), P$_3$(1) * Schedule: P$_3$(1) T = 4 時 P$_3$ 進入 ready queue，此時前一個 P$_2$ 還有 2 個 burst time 尚未完成，所以選擇較小的 P$_3$(1) 執行。($P_1(5) > P_2(2) > P_3(1)$) ![image](https://hackmd.io/_uploads/SkYBkpODJe.png) ##### (4) T = 5 時：P$_1$(5) 暫停，P$_2$(2) 執行，P$_3$(0) 完成，P$_4$(4) 等待 * Ready queue: P$_1$(5), P$_2$(2), P$_4$(4) * Schedule: P$_2$(2) T = 5 時 P$_4$ 進入 ready queue，此時前一個 P$_3$ 完成，所以選擇最小 burst time 的 P$_2$ 完成。($P_1(5) > P_4(4) > P_2(2)$) ![image](https://hackmd.io/_uploads/HJzu1puDke.png) ##### (5) T = 7 時：P$_1$(5) 暫停，P$_2$(0) 完成，P$_3$(0) 完成，P$_4$(4) 執行 * Ready queue: P$_1$(5), P$_4$(4) * Schedule: P$_4$(4) T = 7 時 P$_2$ 完成，所以選擇最小 burst time 的 P$_4$ 執行。($P_1(5) > P_4(4)$) ![image](https://hackmd.io/_uploads/rkUFJa_Dyl.png) ##### (6) T = 11 時：P$_1$(5) 暫停，P$_2$(0) 完成，P$_3$(0) 完成，P$_4$(0) 完成 * Ready queue: P$_1$(5) * Schedule: P$_1$(5) T = 11 時 P$_4$ 完成，所以選擇最後的 P$_1$ 執行。 ![image](https://hackmd.io/_uploads/Syoqyp_P1l.png) ##### (7) T = 16 時：P$_1$(0) 完成，P$_2$(0) 完成，P$_3$(0) 完成，P$_4$(0) 完成 * Waiting time = completion time - arrival time - run time * P$_1$ = 16 - 0 - 7 = 9 * P$_2$ = 7 - 2 - 4 = 1 * P$_3$ = 5 - 4 - 1 = 0 * P$_4$ = 11 - 5 - 4 = 2 * AWT = $\cfrac{9 + 1 + 0 + 2}{4} = 3$ #### 3.3 Approximate SJF 因為不可能開啟全知視角知道未來 process 的 burst time，但可以**用過去的 burst time 來預測**未來可能的 burst time。預測方式有很多種，以指數移動平均(exponential movement average)為例 $$ \tau_{n+1} = \alpha \cdot t_n + (1 - \alpha) \cdot \tau_n $$ $t_n$ 表前一個 burst time，$\tau_n$ 表過去的 burst time 平均。 ### 4. Priority scheduling 每個 process 給定一個優先度(priority)，此優先順序可以是固定的，也可以在 run time 期間改變。 OS 會**根據每個 process 的優先度做排程**並分配 CPU 的計算資源。 * Scheme * Non-preemptive * Preemptive * Problem：starvation * 優先度較低的 process 可能**永遠不會被執行** * Solution：aging * 一段時間沒有被執行的 process 就自動提升它的 priority ### 5. Round-Robin scheduling 每個 process **輪流執行**一段時間(time quantum, TQ)後再交換。在 TQ 時間後 process 會被**打斷(preempted)並加到 ready queue 的尾端等待**再次執行。RR scheduling 的效能與 TQ 參數有關： * TQ 太大，效果**類似 FCFS**，每個 process 做完才離開 * TQ 太小，**context seitch 次數太過頻繁** 以下表的 process 為例，假設 TQ = 20 | Process No. | Burst time | | :-: | :-: | | P1 | 53 | | P2 | 17 | | P3 | 68 | | P4 | 24 | ![image](https://hackmd.io/_uploads/BkDpypOw1l.png) ### 6. Multi-level queue scheduling 將 ready queue 分割成多個 queue，**每個 queue 有自己的 scheduling algorithm** 並對相同類型的 process 做排程。但因為每此只允許一個 process 執行，所以每個 **queue 之間也需要做 scheduling** 決定要從哪個 queue 挑選 process 執行。 ![image](https://hackmd.io/_uploads/rJrAJpdvke.png) ### 7. Multi-level feedback queue scheduling queue 之間的順序是有意義的，且 process 會在不同的 queue 之間做移動，如下圖所示。 ![image](https://hackmd.io/_uploads/r1NJx6uPkx.png) * 某個 process 進入第一個 queue 並以 FCFS 搭配 TQ = 8 進行。如果在 8 ms 內無法完成就會移動到第二個 queue * 進入第二個 queue 並以 FCFS 搭配 TQ = 16 進行。如果在 16 ms 內無法完成就會移到第三個 queue ## Special scheduling issues 前面的內容都是針對 user space 的 scheduler，後續的內容是針對 kernel space 的 scheduler。 > [!Note] > > * Multi-processor(多處理器) > * 一台計算機有多個(物理)處理器(CPU)，每個處理器能夠獨立執行指令 > * Multi-cores(多核心) > * 一個處理器有內含多個處理核心(cores)，每個核心都能夠獨立執行指令 > * 是 multi-processor 的延伸概念，把多個 CPU 的核心集中到一個 CPU 之中 > * Ex: 雙核心/四核心 CPU > * Multi-tasks(多工) > * 一個計算機中能夠同時執行多個任務(tasks)或程式(program) > * 透過 timie-sharing 的技術實現，一次執行一個程式，但每次執行的時間都很短 > * Ex: 現在作業系統多數都有支援 multi-tasks，例如能夠同時編輯檔案與聽音樂 > * Multi-threads > * 同一個 process 內有多個 threads 同時執行 > * 提高執行的平行效率 > * Ex: 瀏覽網頁時，下載任務使用一個 thread，網頁畫面的呈現使用另一個 thread > * Multi-programming > * 能夠在記憶體中同時容納多個程式 > * 與 multi-tasks 的差別在於 multi-programming 是聚焦在記憶體的管理，增加 CPU 的使用率 > * 涉及到 process 在不同 state queue 中的等待與管理 ### 1. Multi-processor scheduling * Asymmetric multi-processing(AMP)：有一個 **master CPU** 來管理其他的 CPU * Symmetric multi-processing(SMP)：沒有 master CPU，每個 CPU 都是**獨立**的 * 每個 CPU 都是獨立的，能夠自我排程(self-scheduling)，自行決定何時要挑選下一個 process * 所有 process 會放在一個共用的 ready queue 之中，或是每個 CPU 會有自己的 ready queue * 需要處理同步化的問題 ![image](https://hackmd.io/_uploads/HyEzxadP1e.png) (1) Process affinity 把某個 process 綁定在某個指定的 CPU core 執行，增加 cache reuse 的機會。有兩種方式 * Soft affinity：如果 process 等待太久都還沒執行，則綁定到其他的 CPU 上 * Hard affinity：process 不會移動，會持續等到直到被執行 (2) NUMA(Non-Uniform Memory Access) 一種分散式的記憶體架構，每個 CPU 會有自己的 local memory 與其他 CPU 的 remote memory。每個 CPU 可以存取自己的 local moemory(速度快)，也可以存取其他 CPU 的 remote memory(速度慢)，存取的速度會依照距離不同而改變。 ![image](https://hackmd.io/_uploads/Hy6-eT_wJl.png) (3) Load-balancing 盡量讓每個 CPU 都有相同的 loading * Push migration：loading 高的 CPU 丟一些工作(process)給 loading 低的 CPU * Pull migration：loading 低的 CPU 從 loading 高的 CPU 拉一些工作(process) :::info 整體 CPUs 負載較低的話使用 push migration；整體 CPUs 負載較高的話使用 pull migration。 ::: ### 2. Multi-core processor scheduling 與 multi-processor 相比速度較快，且電力需求較低。 ### 3. Real-time scheduling real-time 指的是在 deadline 之前完成 * Soft real-time：盡量避免 miss deadline，但如果錯過 deadline 也沒關係。Ex: midel streaming * Hard real-time：保證避免 miss deadline。Ex: 自動駕駛系統、武器系統 (1) Rate-Monotonic algorithm 一種 fix-priority 的方法，根據 process 的週期(period)決定優先順序。週期越短的優先性越高，如下圖所示，T1 代表每 4 個週期執行 1 次，其餘依此推類。 ![image](https://hackmd.io/_uploads/HkGVea_Dyl.png) (2) Earliest-Deadline-First(DFS) algorithm 一種 nonfix-priority 的方法，優先性會根據 deadline 做調整 ![image](https://hackmd.io/_uploads/ByWLgpdvyl.png)