# OS Introduction + CPU Management

## Ch. 1 Introduction + I/O

> **The kernel is built around interrupts**

- What is an OS
    - No single definition
    - A program that acts as an intermediary between a user of a computer and the computer hardware
    - A resource allocator
    - A control program
    - *Everything a vendor ships when you order an operating system*
    - The one program running at all times on the computer is the kernel
- UNIX OS
    - POSIX: a standard for operating systems
    - A kernel + some system programs (coreutils + binutils)
    - Examples that follow this standard: UNIX, Linux
- Typical PC Organization
    - North bridge (memory hub): CPU, memory, PCIe devices (GPUs, NVMe SSDs, ...)
    - South bridge (I/O hub): PCI, USB, SCSI, SATA
- I/O operation
    - Polling
        - CPU is involved the whole time
        - Busy-wait cycle to wait for I/O from the device
        - CPU performs the data movement
        - e.g. GPIO
    - Interrupt
        - The CPU comes back to continue only after the I/O is done
        - Interrupt vector to dispatch the interrupt to the correct handler
        - Interrupt handler (ISR) receives interrupts
- Synchronous I/O vs. Asynchronous I/O
    - Sync I/O: blocking call, e.g. `read()`, `fread()`
    - Async I/O: non-blocking call, e.g. `write()`, `fwrite()`
    - e.g.
        - `fsync()`: makes file writes synchronous, guaranteeing the data has actually been written -> sync I/O
        - `aio_read()`: makes `read` asynchronous; used for prefetching — if the application predicts it will need some data soon, it can ask the OS to fetch it ahead of time (see the sketch at the end of this chapter)

### Interrupt

![image](https://hackmd.io/_uploads/rk8bOLB20.png)

- Common sources
    - Hardware: interrupt request (IRQ)
        - I/O completion, timer
        - Asynchronous interrupt
    - Software: trap, exception
        - Divide by 0, access violation, system call
        - Synchronous interrupt
- Interrupt handling
    - Suppose an I/O device raises an interrupt; the signal goes to the PIC (programmable interrupt controller)
    - The PIC signals the CPU
    - The CPU saves the PC and flags
    - The CPU looks up the IVT (interrupt vector table) to find the address of the ISR to jump to
    - The CPU jumps to that address and runs the ISR (interrupt service routine) to handle the event
    - Reaching `IRET` means the ISR is done; the PC and flags are restored and execution returns to the user program
- Some terms
    - ISR (interrupt service routine): the code that handles an interrupt
    - Interrupt vector: maps one interrupt to the address of one ISR
    - IVT (interrupt vector table): the collection of interrupt vectors; after receiving an interrupt, the CPU looks up the ISR address here and jumps to it
    - IRQ (interrupt request): an interrupt generated by hardware
    - PIC (programmable interrupt controller): receives interrupts from I/O devices first and forwards them to the CPU

### Direct Memory Access (DMA)

- Data movement of an I/O operation
    - Option 1: CPU moves data between main memory and I/O local buffers
        - Slow
        - Wastes CPU cycles
    - Option 2: CPU offloads data movement to DMA
- DMA
    - A piece of hardware
    - Used to transfer I/O data to memory quickly
    - ![image](https://hackmd.io/_uploads/B1VPBuphA.png)
    - The CPU tells the DMA controller the source, the destination, and how much to transfer
    - The DMA controller then performs the transfer directly, without CPU involvement
    - When the transfer finishes, the DMA controller sends an interrupt to the CPU so it can do the follow-up processing

### Multiprogramming

- Multiprogramming organizes processes so the CPU always has one to execute
- Timesharing (single-core case)
    - Switch the process running on the CPU frequently (≥ 100 Hz)
    - A process is switched out when
        - It issues a synchronous I/O
        - It has run too long
    - Because switching is so frequent, processes appear to run simultaneously
- Terminology
    - Multiprogramming: multiple processes are loaded into memory for execution
    - Multitasking: multiprogramming with overlapped task execution
    - Timesharing: multitasking + periodic switching among processes (adds the notion of time)

### Booting

- Bootstrap program
    - Typically stored in ROM or EEPROM, generally known as firmware
    - Initializes all aspects of the system
    - Also called the BIOS
- Boot-loading procedure
    - FFFF:0000 is the BIOS entry point; the BIOS then reads sector 0 of the disk (the MBR)
    - MBR (master boot record): loads the OS from disk
    - Boot Manager
    - Pre-OS
    - OS
    - ![image](https://hackmd.io/_uploads/BJ7AUCEl1e.png)
    - BIOS+MBR has been replaced by UEFI+GPT, a much more extensive specification
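Before moving on, here is a minimal sketch of the synchronous vs. asynchronous I/O distinction above, using the POSIX AIO interface. The file name `data.bin`, the buffer size, and the offsets are arbitrary assumptions; on Linux, link with `-lrt`.

```c
/* Minimal sketch: blocking read() vs. POSIX aio_read() prefetch. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    char buf[4096];
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Synchronous I/O: read() blocks until the data is available. */
    ssize_t n = read(fd, buf, sizeof(buf));
    printf("sync read returned %zd bytes\n", n);

    /* Asynchronous I/O: aio_read() returns immediately; the kernel
     * fetches the next block while this process keeps computing. */
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = sizeof(buf);           /* prefetch the next block */
    if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }

    /* ... do other useful work here ... */

    while (aio_error(&cb) == EINPROGRESS)
        ;                                  /* or block with aio_suspend() */
    printf("async read returned %zd bytes\n", aio_return(&cb));
    close(fd);
    return 0;
}
```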
## Ch. 2 System Structures

![image](https://hackmd.io/_uploads/ByyeAOR3A.png)

### System Program

- e.g. `cp`, `mv`, `ls`, ...
- UNIX = system programs + kernel
    - GNU coreutils + binutils
    - ![image](https://hackmd.io/_uploads/BkY1kKC2R.png)
- Programming-language support
    - Compilers, assemblers, debuggers and interpreters are sometimes provided

### User Interface

- Shell: the interface program between users and the kernel
    - CLI, GUI (yes, a GUI also counts as a shell)

### System Call

![image](https://hackmd.io/_uploads/S19oWt0nA.png)

|          | system call   | system API         | system program    |
| -------- | ------------- | ------------------ | ----------------- |
| **e.g.** | some assembly | `fork()`, `read()` | `cat`, `ls`, `cp` |

- The kernel can touch everything (e.g. memory), and a system call is executed via an interrupt, so we rarely invoke system calls directly; `fork()`, for example, is a system API
- Three most common system APIs
    - Win32 API for Windows
    - POSIX API for UNIX
    - Java API for the Java VM
    - ![image](https://hackmd.io/_uploads/rkp3zFChC.png)
        - `int` is an x86 instruction, used here to trigger interrupt number `2e`
- Dual-mode operation
    - Kernel mode & user mode
    - Supported by a hardware mode bit; allows the OS to protect itself against untrusted code
    - Some instructions are designated as privileged
    - An application invokes a system call by using a software interrupt to enter kernel mode
    - ![image](https://hackmd.io/_uploads/ByD1mkSgJe.png)
- System call parameter passing
    - Store them in registers
    - Store them in a block (table) in memory and pass the block's address in a register
    - Push them onto the program's stack
    - e.g.
        - ![image](https://hackmd.io/_uploads/BJlFvKR3R.png)
        - `mov eax, 4`: system call number (4: sys_write)
        - `int 0x80`: trigger the interrupt to enter the kernel
        - `mov ebx, 0`: the return value (exit status) argument
        - The write and the exit are each one system call
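To make the distinction between a system call and a system API concrete, the following minimal sketch (assuming Linux and glibc) writes to standard output once through the raw `syscall()` entry point and once through the `write()` wrapper.

```c
/* Minimal sketch (Linux/glibc assumed): a raw system call vs. its API wrapper. */
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    const char *msg = "hello via raw syscall\n";

    /* Raw system call: pass the syscall number and arguments directly. */
    syscall(SYS_write, 1, msg, strlen(msg));

    /* System API: the libc wrapper sets up the same trap for us. */
    const char *msg2 = "hello via write()\n";
    write(1, msg2, strlen(msg2));

    return 0;
}
```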
### OS Design and Implementation

- Mechanism & policy
    - Mechanism: how to do it
    - Policy: what will be done (or how to choose)
    - e.g. disk I/O
        - Mechanism: how to read from and write to the disk?
        - Policy: which disk I/O operation should be performed first?

### OS Structure

- MS-DOS
    - A very early OS
    - Applications can touch memory and control hardware directly -> dangerous
    - Extremely fast
    - Still used today for motherboard firmware upgrades
    - ![image](https://hackmd.io/_uploads/ryWSx9Cn0.png)
- FreeRTOS
    - A real-time, embedded operating system
    - The OS and the program are compiled together, and new programs cannot be downloaded
- UNIX
    - System programs + kernel
    - ![image](https://hackmd.io/_uploads/SJkXbcC2C.png)
    - Modules
        - Originally, new functionality could not be added without recompiling the kernel
        - Loadable kernel modules later made this possible: new features can be added by inserting a kernel module
- Microkernel
    - Moves most things into user space and keeps only the essentials (CPU scheduling, memory management, IPC)
    - Easy to extend: each function is a user process, so adding a user process extends the system
    - Easy to port the operating system to new architectures: the kernel itself is small and easy to port
    - Reliable and secure
    - Very slow: processes must communicate by message passing, plus context switches
    - ![image](https://hackmd.io/_uploads/HyXcVq03R.png)
- Mac OS
    - Mach (μ-kernel): memory management, RPC, IPC, message passing, thread scheduling
    - BSD: networking, file systems, POSIX APIs
    - ![image](https://hackmd.io/_uploads/B1wSLqAh0.png)
- Google Fuchsia
    - Based on the Zircon microkernel
    - Designed for IoT devices

### Virtual Machine

- Provides an interface identical to the underlying bare hardware
- Now popular because of cloud-based computation
- Different levels of virtualization exist
    - Virtualize the hardware, e.g. VMware
    - Virtualize the system calls, e.g. running WIN32 system calls on a Linux kernel
        - Approach: a WIN32 call is translated into POSIX and passed to the kernel
    - Namespaces
        - Virtualize the execution environment
        - e.g. Docker, virtualized file systems
- ![image](https://hackmd.io/_uploads/HJtQLLbAR.png)
- Each virtual machine is isolated from all other virtual machines
- VMware Architecture
    - ![image](https://hackmd.io/_uploads/BJw1PLWRC.png)
- Processor Management
    - ![image](https://hackmd.io/_uploads/SkyeuU-CA.png)
    - Only the VMM runs in kernel mode; everything else runs in user mode
    - When an application issues a system call, a trap is sent to the VMM
    - The VMM inspects it, chooses the appropriate guest OS, and forwards the instruction to it
    - When the guest OS finishes, the `IRET` goes back to the VMM
    - The VMM then returns to the application to let it know the call is done
    - Sensitive / privileged instructions
        - Instructions that can only be executed in kernel mode
        - If an application wants to execute one, it must first go through the VMM
- Java Virtual Machine (JVM)
    - The JVM's ISA differs from the underlying machine and kernel, so an interpreter is used to translate
    - ![image](https://hackmd.io/_uploads/SylH5IW0C.png)
    - e.g. Android

## Ch. 3 Process Concept

- A process uses the following context
    - Text section: executable binaries (e.g. functions)
    - Stack section: function arguments + local variables
    - Data section: global variables + heap
        - A global variable is treated differently depending on whether it has an initial value
        - Uninitialized global variables form the BSS
        - Heap: the source of dynamically allocated memory
    - Program counter and other CPU registers
    - e.g.
```cpp=
int i, j = 2;
int foo(int x) {
    int *y;
    char c;
    static char d = 'x';
    i = 0;
    y = (int*)malloc(100);
}
```
    - stack: `x`, `y`, `c`
    - data: `i`, `j`, `d`
    - text: `foo`
    - `y` points to a block of memory allocated from the heap
- Process state diagram
    - ![image](https://hackmd.io/_uploads/H1OhChPpR.png)
    - running -> ready
        - Case 1: the process runs out of its time quantum
        - Case 2: an I/O interrupt moves a higher-priority process from waiting to ready; it immediately enters running and preempts the process that was running
- Process Control Block (PCB)
    - Every process has one; it stores the process's information
    - Contents
        - PID
        - Process state
        - Saved CPU register values (the data it had while running on the CPU)
        - CPU scheduling info (e.g., priority)
        - Memory-management information (e.g. segment table and page-table base register)
        - I/O status info (e.g., opened files)
- Context Switch
    - The act of switching between ready and running
    - During a context switch the CPU cannot run any process
        - About 2000 ns on an Intel 5150
        - Where the time goes: saving & restoring data to/from the CPU, pipeline stalls, and cache effects
    - Steps
        1. Execute P0
        2. P0 is interrupted and gives up the CPU; the CPU is idle for the moment
        3. Perform the context switch: save P0's state into its PCB and load P1's PCB
        4. Execute P1
    - ![image](https://hackmd.io/_uploads/SJjptTPpA.png)
        - First save the PC (CS + IP) and FLAGS (PSW)
        - Then save the general registers
        - Finally store the stack pointer into the PCB (process control block)

### Process scheduling

- ![image](https://hackmd.io/_uploads/BJIMh6PTR.png)
    - Everything in the lower queues is waiting
- Scheduler
    - Short-term scheduler (or CPU scheduler)
        - Selects which process should be executed next and allocates the CPU
        - Invoked frequently (ms) -> must be fast
    - Long-term scheduler (or job scheduler)
        - Selects which processes should be brought into the ready queue
        - Controls the degree of multiprogramming
        - Invoked in seconds or minutes
        - Process character: where the process spends most of its time
            - I/O-bound: spends more time on I/O
            - CPU-bound: spends more time on computation
        - A long-term scheduler must mix the two kinds of processes effectively to fully utilize resources
        - Most PCs actually have no long-term scheduler, because the user decides which programs run
        - e.g. MapReduce
    - Mid-term scheduler
        - When memory is full, decides which processes' memory should be swapped out
        - ![image](https://hackmd.io/_uploads/rJzSApPpC.png)
        - Nowadays replaced by virtual memory (a process's memory is split into pages, and pages are evicted according to usage)

### Operation on Process

#### Process creation

- `fork` system call creates a new process. The child is an exact copy of the parent
- `exec` system call is used after a fork to replace the process's memory space with a new program
- ![image](https://hackmd.io/_uploads/ryvNk0D60.png)
- ![image](https://hackmd.io/_uploads/r13TyS300.png)
    - kthreadd: parent of the kernel daemons
    - pdflush: `write()` is asynchronous I/O; pdflush is the daemon that flushes this buffered data to disk in the background
- `fork()` -> `exec()` convention (see the sketch at the end of this subsection)
    - `fork` duplicates the current process; `exec` then replaces the memory image with a new program
    - `fork` does not really copy everything; it copies efficiently via memory mapping, handled by the MMU (Memory Management Unit, a piece of hardware)
    - Details: right after the copy, the child only maps to the parent's memory; only when the child modifies some block is a new block allocated for it. This technique is called Copy-On-Write (CoW)
    - If the CPU has no MMU, `vfork()` is used instead of `fork()`
        - It suspends the parent and runs the child directly in the parent's memory (on its stack); if the child then calls `exec`, a fresh memory image is allocated for it, the child's data is removed from the stack, and the stack is handed back to the parent, which resumes
```cpp=
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int x = 0;

int main() {
    pid_t pid;
    /* fork another process */
    pid = vfork();
    if (pid < 0) {          /* error occurred */
        fprintf(stderr, "Fork Failed");
        exit(-1);
    } else if (pid == 0) {  /* child process */
        x++;                /* often, here calls exec() */
        _exit(0);
    } else {                /* parent process */
        /* parent will wait for the child to complete */
        wait(NULL);
        printf("%d", x);    /* Output will be 1 */
        exit(0);
    }
}
```
- Multiprocess architecture example: web browser (Chrome)
    - Each tab is a process, so one tab crashing does not affect the others
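The following is a minimal sketch of the `fork()` -> `exec()` -> `wait()` convention described above: the parent forks, the child replaces itself with a new program via `execvp()`, and the parent collects the exit status. The program being run (`ls -l`) is just an arbitrary example.

```c
/* Minimal sketch of the fork() -> exec() -> wait() convention. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid < 0) {                       /* fork failed */
        perror("fork");
        exit(1);
    } else if (pid == 0) {               /* child: replace the memory image */
        char *argv[] = {"ls", "-l", NULL};
        execvp("ls", argv);
        perror("execvp");                /* only reached if exec fails */
        _exit(127);
    } else {                             /* parent: reap the child */
        int status;
        waitpid(pid, &status, 0);
        if (WIFEXITED(status))
            printf("child exited with %d\n", WEXITSTATUS(status));
    }
    return 0;
}
```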
#### Process termination

- Two ways
    1. The process terminates itself with `exit()`; its return value must be collected by the parent
    2. The parent forcibly terminates the child by sending it a signal (SIGKILL)
- Zombie (defunct) process
    - A process has terminated but nobody has collected its return value
    - All of its data has been cleaned up, but an entry remains in the process table
- Orphan process
    - The parent terminates before the child does
    - On Linux the orphan is adopted by init (pid = 1), which waits for it and collects its return value
    - Application: using a double fork to avoid zombie processes
        - Fork twice and let the middle process exit, so the child is automatically adopted by init

#### Inter-Process Communication

![image](https://hackmd.io/_uploads/HJLdpSnA0.png)

##### Shared Memory

- Linux system calls
    - `shmget()`: create a block of shared memory
    - `shmat()`: attach shared memory to the current process's address space
    - `shmdt()`: detach shared memory
    - `shmctl()`: control shared memory (including deletion)
- Producer-consumer problem
    - The producer keeps adding items and the consumer keeps taking them; how do we use a buffer (shared memory) to synchronize the two?
    - Constraints
        - The buffer size is finite
        - Overwriting and null reading are not allowed
    - Solution: circular (ring) buffer
        - Two indices mark the `in` and `out` positions
        - `in == out`: empty
        - `(in + 1) % BUFFER_SIZE == out`: full
    - Still a problem: CPU cycles are wasted on busy-waiting -> requires proper synchronization (in Ch. 6)

##### Message Passing

- Processes cannot see each other's data
- A channel must be established
- Has a buffer
- Empty or full buffers are handled automatically (built-in synchronization)
- e.g. pipe (see the sketch at the end of this chapter)
    - The pipe is created and configured by the parent process
    - Each process should close the end of the pipe it does not use

##### Signal

- Another way for processes to communicate (but usually used to handle special conditions)
- A signal is to a process what an interrupt is to a CPU
- Sync vs. async
    - Sync
        - Triggered by the process itself
        - Divide overflow, memory-access violations, ...
        - e.g. SIGSEGV (memory protection fault), SIGFPE (arithmetic fault)
    - Async
        - e.g. SIGKILL, SIGSTOP, SIGCHLD (child terminated)
- Signal handler: a function that specifies how to handle a given signal when it arrives
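A minimal sketch of parent-child communication over a pipe, illustrating the convention above that each side closes the end it does not use; the message text is arbitrary.

```c
/* Minimal sketch: parent writes into a pipe, child reads from it. */
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];                     /* fd[0]: read end, fd[1]: write end */
    if (pipe(fd) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                /* child: reader */
        close(fd[1]);              /* close the unused write end */
        char buf[64];
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        buf[n > 0 ? n : 0] = '\0';
        printf("child got: %s\n", buf);
        close(fd[0]);
        _exit(0);
    } else {                       /* parent: writer */
        close(fd[0]);              /* close the unused read end */
        const char *msg = "hello through the pipe";
        write(fd[1], msg, strlen(msg));
        close(fd[1]);              /* reader sees EOF after this */
        wait(NULL);
    }
    return 0;
}
```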
## Ch. 4 Thread Programming

![image](https://hackmd.io/_uploads/r1k0E9pR0.png)

- A thread is also called a lightweight process: creating a thread is faster, and context switching between threads is also faster

### Multithreading Models

- Thread Library
    - Pthreads (P for POSIX), Win32 threads, Java threads
    - POSIX also standardizes the thread-related APIs of the operating system (a small Pthreads sketch appears at the end of this chapter)
    - However, the Pthreads spec does not define the threading model (for flexibility), meaning Pthreads can be implemented as 1-1, M-1, or M-M
- User thread vs. kernel thread
    - User thread: a thread created through the API
    - Kernel thread: not an object that literally carries this name; the term is only for understanding. It refers to the unit that can receive CPU time from the CPU scheduler
- Mapping of user threads to kernel threads
    - Many-to-One
    - One-to-One
    - Many-to-Many
    - ![image](https://hackmd.io/_uploads/HyWlEayJJe.png)
- Many-to-One
    - Many user threads map to one kernel thread
    - When the CPU scheduler schedules this process, only one kernel thread is available; which user thread runs inside the process is decided by the thread library
    - Pros & cons
        - Portability: works even if the kernel does not support multithreading, and is not limited to POSIX systems
        - If one thread blocks (e.g. calling `read()`), all threads block, because from the kernel's point of view there is only one thread and it is in the middle of the read
            - In practice some wrapper functions can work around this, e.g.
```cpp
ssize_t pth_read(int fd, void *buf, size_t nbytes)
```
        - No parallelism
    - e.g.
        - Solaris Green Threads (Java threads)
        - GNU Portable Threads: an implementation of the Pthreads specification
- One-to-One
    - Pros & cons
        - Requires kernel support; less portable
        - Parallelism is possible
        - A blocking (synchronous) call in one thread does not block the others
    - e.g. Windows NT/XP/2000, Linux Pthreads, Solaris 9 and later
- Many-to-Many
    - Rare nowadays
    - ![image](https://hackmd.io/_uploads/HyhHFayJ1e.png)
- Amdahl's Law (see Computer Organization for details)

### Threading issue

- `fork()` and `exec()` in a thread: undefined
- Signals
    - Synchronous signal: delivered to the thread that caused the signal, e.g. access violation
    - Asynchronous signal: typically delivered to the first thread that does not block the signal, e.g. process termination
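A minimal Pthreads sketch (One-to-One on Linux): the main thread creates two workers with `pthread_create()` and waits for them with `pthread_join()`; the worker function and its return values are arbitrary.

```c
/* Minimal Pthreads sketch: create two threads and join them.
 * Compile with: gcc demo.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    long id = (long)arg;
    printf("worker %ld running\n", id);
    return (void *)(id * 10);          /* value collected by pthread_join */
}

int main(void) {
    pthread_t tid[2];
    for (long i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);

    for (long i = 0; i < 2; i++) {
        void *ret;
        pthread_join(tid[i], &ret);    /* blocks until thread i finishes */
        printf("worker %ld returned %ld\n", i, (long)ret);
    }
    return 0;
}
```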
## Ch. 5 CPU Scheduling

![image](https://hackmd.io/_uploads/Byelm0ykyx.png)

- I/O wait: the fraction of time spent waiting for I/O
    - $p$: I/O wait; $n$: degree of multiprogramming (CPU utilization $= 1 - p^n$)
- CPU scheduling decisions may take place when a process:
    1. Switches from running to waiting state (sync I/O, `sleep()`, `yield()`)
    2. Switches from running to ready state (context switch)
    3. Switches from waiting to ready (when a high-priority process becomes ready, the scheduler runs and dispatches it to the CPU)
    4. Terminates
- Scheduling under only 1 and 4 is non-preemptive or cooperative
    - Easy to implement
    - e.g. Windows 3.1, old versions of Mac OS, sensor OSes (e.g., TinyOS 1.0)
- All other scheduling is preemptive
    - Higher responsiveness
    - Higher context-switch overheads
    - Must deal with race conditions (in Ch. 6)
- Dispatcher
    - The scheduler decides who runs; the dispatcher performs the actual switch, which involves
        - Switching context
        - Switching from kernel mode to user mode
        - After the process changes, loading the new state: page-table pointer, program counter, ...
    - Dispatch latency: the time the dispatcher needs to stop one process and start another
        - Scheduling time
        - Interrupt re-enabling time
        - Context switch time

### Scheduling Algorithms

#### Criteria

- (+) CPU utilization – keep the CPU as busy as possible
- (+) Throughput – # of processes that complete their execution per time unit
- (-) Turnaround time – time to execute a particular process
- (-) Waiting time – time waiting in the ready queue
- (-) Responsiveness (interactivity) – time from the user sending input until the system delivers feedback

#### Algorithms

- **First-Come, First-Served (FCFS)**
    - ![image](https://hackmd.io/_uploads/BJ7mqQeJkx.png)
    - Absolutely fair
    - Convoy effect: short process behind long process
        - A quick job still has to wait a long time
        - e.g. an I/O-bound process cannot finish its short CPU burst quickly and hand the work back to the I/O device
            - Poor I/O utilization
            - Slow I/O responsiveness
- **Shortest-Job-First (SJF)**
    - Two schemes:
        - Non-preemptive: look at the burst time directly
        - Preemptive: look at the remaining time; if a newly arrived process is shorter than the current remaining time, it takes over
        - Starvation: a long process, even if it arrived early, may never complete if short processes keep arriving
    - Optimal average waiting time (compare the small sketch after this subsection)
    - Large waiting-time variation
    - In practice the actual burst time of a process cannot be known -> use prediction
        - e.g. exponential moving average
            - Predict the next burst from the previous burst times
            - $\tau_{n+1}=\alpha t_n + (1-\alpha)\tau_n$
    - The long-standing dilemma: efficiency vs. fairness
        - Solution: aging (raise priority as waiting time grows)
- **Priority Scheduling**
    - Convention: the smaller the number, the higher the priority
    - Problem: starvation
    - Solution: aging (the longer a process waits, the higher its priority)
- **Round Robin (RR)**
    - Time sharing: each process gets one time quantum $q$ (usually 10~100 ms); if it has not finished when the quantum expires, it is forced back to the ready queue
    - $q$ large = FIFO
    - $q$ small: high context-switch overhead
    - ![image](https://hackmd.io/_uploads/ryjnA8GJkg.png)
        - Large time quantum: better average turnaround (the time from a process's arrival to its completion)
        - Small time quantum: better interactivity
- **Multilevel Queue**
    - Split the ready queue into two (or more) queues, each managed with a different policy
        - Foreground (interactive): RR
        - Background (batch): FIFO
    - Scheduling is also needed between the queues, e.g.
        - Fixed priority: serve the foreground queue before the background queue -> starvation
        - Allocate different shares of CPU time to each queue
    - ![image](https://hackmd.io/_uploads/B1Z8ewf1Jx.png)
- **Multilevel Feedback Queue**
    - Promotion / demotion: move processes between queues
    - e.g.
        - ![image](https://hackmd.io/_uploads/HkGwNvzyyl.png)
        - In the top two queues, a process that uses up its quantum is demoted to the next queue; otherwise it is promoted
        - Lets I/O-bound processes get higher priority, improving I/O utilization and interactivity
        - Lets CPU-bound processes get a larger time quantum, lowering turnaround time and improving throughput
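The small sketch below compares the average waiting time of FCFS and non-preemptive SJF on a made-up set of CPU bursts (all arriving at time 0), showing the convoy effect and why SJF minimizes average waiting time.

```c
/* Small sketch: average waiting time of FCFS vs. SJF (non-preemptive),
 * all jobs arriving at time 0. The burst times are made-up example values. */
#include <stdio.h>
#include <stdlib.h>

static double avg_waiting(const int *burst, int n) {
    double wait = 0, total = 0;
    for (int i = 0; i < n; i++) {    /* waiting time of job i = sum of earlier bursts */
        total += wait;
        wait += burst[i];
    }
    return total / n;
}

static int cmp(const void *a, const void *b) { return *(const int *)a - *(const int *)b; }

int main(void) {
    int fcfs[] = {24, 3, 3};         /* long job arrives first: convoy effect */
    int sjf[]  = {24, 3, 3};
    int n = 3;
    qsort(sjf, n, sizeof(int), cmp); /* SJF = run the shortest bursts first */

    printf("FCFS average waiting time: %.2f\n", avg_waiting(fcfs, n)); /* 17.00 */
    printf("SJF  average waiting time: %.2f\n", avg_waiting(sjf, n));  /* 3.00  */
    return 0;
}
```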
### Multiple Processor Scheduling

#### Multiprocessor Architecture

- Homogeneous architecture: all CPUs share global memory (**SMP**, Symmetric MultiProcessing) and contend for memory cycles via the common bus
    - Easy
    - Low process migration cost
    - Bad scalability: with many CPUs, contention for memory becomes severe
    - Cache coherence: every CPU has its own cache; with too many CPUs, keeping the caches consistent becomes expensive
- Heterogeneous architecture: CPUs have local memory, and a memory reference is either local or remote (**NUMA**, Non-Uniform Memory Access)
    - ![image](https://hackmd.io/_uploads/rkJlDPGJyx.png)
    - More common nowadays
    - Good scalability
    - High process migration cost
    - Common topologies: ring, mesh, hypercube*, ad hoc

#### Multiple Processor Scheduling Algorithm

![image](https://hackmd.io/_uploads/SkAoDDMy1g.png)

- Partitioned scheduling
    - Better suited to NUMA: no need to keep moving data around
    - More common
    - e.g. Linux
- Goals:
    - Processor affinity
        - Process migration cost (for NUMA): cache re-population, pipeline re-start, process data transfer
        - Ideally a process stays on one fixed CPU
    - Load balancing
    - The two goals conflict with each other

#### Hardware Multithreading

- A thread represents a logical processor, emulated by an independent set of registers
- Hides memory-stall time: while one thread waits on memory, the core can do other work
- ![image](https://hackmd.io/_uploads/Sk3a5Pz11e.png)
- ![image](https://hackmd.io/_uploads/BJtH3vGk1g.png)
- Two logical CPUs on the same core still share one physical CPU, so with two jobs an assignment of 1 0 1 0 is better than 1 1 0 0
- Linux strategy: distribute processes evenly, working from the bottom of the topology upward

### Real-Time Scheduling

- IEEE definition of real-time systems: *"A real-time system is a system whose correctness includes its response time as well as its functional correctness."*
- It does not have to be fast, but it must finish before the deadline
- Soft real-time: missing the deadline is tolerable, but finish as soon as possible, e.g. audio/video streaming
- Hard real-time: missing the deadline is catastrophic, e.g. avionics computers

#### Real-Time Scheduling Algorithm

- Supports preemption (a minimal Linux sketch appears at the end of this chapter)
- Processes have a new characteristic: they are periodic
    - The process must run once every fixed period
    - Usually deadline = period
- **Rate Monotonic Scheduling**
    - The shorter the period, the higher the priority
    - ![image](https://hackmd.io/_uploads/BJsZy_z1kg.png)
    - May miss deadlines
- **Earliest Deadline First Scheduling (EDF)**
    - The earlier the deadline, the higher the priority
    - ![image](https://hackmd.io/_uploads/HkJDJdzJyx.png)

### Thread Scheduling

- Light-Weight Process (LWP)
    - Abstraction for the mapping of user threads to kernel threads
    - An LWP usually maps 1-1 to a kernel thread
    - ![image](https://hackmd.io/_uploads/Hygvedz1ye.png)
- Two levels of scheduling
    - Local scheduling
        - How the thread library schedules user threads onto LWPs
        - Process contention scope (PCS)
    - Global scheduling
        - How the kernel schedules kernel threads onto CPUs
        - System contention scope (SCS)
- Pthread API
```cpp=
/* set the contention scope to PROCESS or SYSTEM */
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
/* set the scheduling policy - FIFO, RR, or OTHER */
pthread_attr_setschedpolicy(&attr, SCHED_OTHER);
```
- Linux's native thread model is 1-1, so only `PTHREAD_SCOPE_SYSTEM` can be set

### Operating System Example

- Solaris
    - Priority-based, with RR within the same priority
    - ![image](https://hackmd.io/_uploads/HyUrzuGJJe.png)
        - Each class has its own priorities and algorithm
    - ![image](https://hackmd.io/_uploads/H1scGOf1ke.png)
        - time quantum expired: the priority a process gets after using up its quantum (demotion)
        - return from sleep: the priority a process gets after waking from sleep (promotion)
- Linux CFS Scheduling (2.6.23+)
    - Virtual runtime (vruntime)
        - Each process has its own clock, and the clocks may advance at different speeds
        - Always pick the process with the smallest vruntime to run
        - Also called the virtual clock algorithm
    - Nice value: -20 ~ +19 (priority: high ~ low)
        - ![image](https://hackmd.io/_uploads/BkdiB_zkkg.png)
        - $\text{priority}=120+\text{nice value}$
    - Completely Fair Scheduler (CFS)
        - CFS favors I/O-bound processes, so their virtual clocks advance more slowly
        - CFS adjusts how fast vruntime grows based on the nice value and recent behavior (I/O-bound vs. CPU-bound)
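As a Linux-side illustration of the real-time scheduling classes discussed earlier in this chapter, here is a minimal sketch (assuming Linux; usually needs root or CAP_SYS_NICE) that switches the calling process to the SCHED_FIFO policy with `sched_setscheduler()`; the priority value 50 is an arbitrary choice.

```c
/* Minimal sketch (Linux): put the calling process into the SCHED_FIFO
 * real-time class. Usually requires root privileges. */
#include <sched.h>
#include <stdio.h>

int main(void) {
    struct sched_param param = { .sched_priority = 50 };  /* arbitrary RT priority */

    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
        perror("sched_setscheduler");
        return 1;
    }
    printf("now running under policy %d (SCHED_FIFO = %d)\n",
           sched_getscheduler(0), SCHED_FIFO);
    /* ... real-time work here; a SCHED_FIFO process runs until it blocks,
     * yields, or a higher-priority real-time process preempts it ... */
    return 0;
}
```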
## Ch. 6 Synchronization

- Background: concurrent reads and writes of shared data may lead to data inconsistency, especially with multiple processes. Earlier we used the two indices `in` and `out` to describe the space in the buffer; could we instead use a counter `count` to track how many items the buffer holds?

```cpp=
/* Producer Code */
while (true) {
    /* produce an item and put it in nextProduced */
    while (count == BUFFER_SIZE)
        ;   // do nothing
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
    count++;
}

/* Consumer Code */
while (true) {
    while (count == 0)
        ;   // do nothing
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    count--;
    /* consume the item in nextConsumed */
}
```

- However, `count++` and `count--` each actually consist of three assembly instructions; because of context switches, preemptive scheduling, and so on, they may not execute in one step, so the result may not be what we expect (race condition)

```=
mov  ax, count
add  ax, 1
mov  count, ax
```

### Critical Section Problem

- The code segment in which a process accesses shared data is called its critical section
- Requirements for a solution
    - Mutual Exclusion: while one process is executing its critical section, no other process may execute its critical section
    - Progress: if a process wants to enter its critical section and nobody else is inside, it must be allowed to enter
        - This rules out solutions that pre-arrange a fixed entry order
        - e.g. with a fixed order P1 -> P2 -> P3, if P2's own logic rarely enters the critical section, P1 and P3 both get stuck
    - Bounded Waiting: there must be a bound on how long a process waits to enter its critical section; it cannot be left queueing forever

### Synchronization Hardware

#### Interrupt Disabling

- Masking interrupts prevents preemption and resolves the race problem
- Disable interrupts before the critical section and re-enable them afterwards
- Commonly used inside the kernel
- Disabling interrupts is a privileged instruction and cannot be used in user mode
- Cannot handle contention between different CPUs on a multiprocessor

#### Atomic Instructions

- Modern machines provide special atomic hardware instructions, i.e. instructions that cannot be interrupted
- On a multiprocessor, the memory accessed by such an atomic instruction is additionally locked for the duration of the instruction
- e.g.
    - Test a memory word and set its value (TSL)
    - Swap the contents of two memory words (XCHG)
- TestAndSet: lock a particular section or memory region

```cpp=
bool TestAndSet(bool *target) {
    bool rv = *target;
    *target = TRUE;
    return rv;
}

/*** P1 ***/
do {
    while (TestAndSet(&lock))
        ;  /* do nothing */
    // critical section
    lock = FALSE;
    // remainder section
} while (TRUE);

/*** P2 ***/
do {
    while (TestAndSet(&lock))
        ;  /* do nothing */
    // critical section
    lock = FALSE;
    // remainder section
} while (TRUE);
```

- Swap

```cpp=
/* In each process */
do {
    key = TRUE;
    while (key == TRUE)
        Swap(&lock, &key);
    // critical section
    lock = FALSE;
    // remainder section
} while (TRUE);
```

- TAS / SWAP
    - Can be used on a uniprocessor or multiprocessor, in user mode or kernel mode
    - Issues
        - Wastes CPU cycles (busy-waiting for the lock to be released)
        - The contention is stateless, so starvation is possible
        - Satisfies mutual exclusion and progress, but not bounded waiting

### Semaphores

- Semaphore `S`: an integer variable
    - Init: S >= 0
    - `wait()`
        - `S--`
        - running -> waiting (process state)
    - `signal()`
        - `S++`
        - waiting -> ready
    - Associated with a waiting queue

```cpp=
wait(S) {
    S--;
    if (S < 0) {
        // add this process to the waiting queue
        block();
    }
}

signal(S) {
    S++;
    if (S <= 0) {
        // remove a process P from the waiting queue
        wakeup(P);
    }
}
```

- The condition in `signal` is `<= 0`: S was negative before `S++`, meaning some process is blocked; now that there is room, P can be released
- Semaphore operations must be atomic
- The role of a semaphore is determined by its initial value
    - Mutex lock: init = 1
    - Sequencing or event: init = 0
    - Capacity control: init = capacity
- Mutex lock: `Semaphore mutex = 1`

```cpp=
/* P1 */                   /* P2 */
do {                       do {
    wait(mutex);               wait(mutex);
    // Critical section        // Critical section
    signal(mutex);             signal(mutex);
    // Remainder section       // Remainder section
} while (TRUE);            } while (TRUE);
```

- Suppose P1 enters first: `wait(mutex)` turns `mutex` into 0; if P2 then enters, `mutex` becomes -1 and P2 is blocked
- When P1 leaves the critical section and calls `signal()`, `mutex` becomes 0 and P2 is released
- Sequencing or event: `Semaphore synch = 0`

```cpp=
/* P1 */             /* P2 */
S1;                  wait(synch);
signal(synch);       S2;
```

- Capacity control: `Semaphore sem = capacity`

```cpp=
/* Pi, Pj, Pk, ... */
{
    // ...
    wait(sem);
    // ...
    signal(sem);
    // ...
}
```

- Deadlock
    - Let `S` and `Q` be two semaphores initialized to 1

```cpp=
/* P0 */                   /* P1 */
wait(S);                   wait(Q);
wait(Q);                   wait(S);
// Critical section        // Critical section
signal(Q);                 signal(S);
signal(S);                 signal(Q);
```

- P0 finishes `wait(S)`, then a context switch lets P1 execute `wait(Q)`; now each process waits for the semaphore the other holds, and both are stuck
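The pseudo-code above has a direct POSIX counterpart. The following minimal sketch uses an unnamed `sem_t` with initial value 1 as a mutex around a shared counter; the thread count and iteration count are arbitrary.

```c
/* Minimal sketch: a POSIX semaphore (initial value 1) used as a mutex.
 * Compile with: gcc demo.c -lpthread */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t mutex;
static long count = 0;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        sem_wait(&mutex);     /* wait(): may block if another thread holds it */
        count++;              /* critical section */
        sem_post(&mutex);     /* signal(): release and wake a waiter, if any */
    }
    return NULL;
}

int main(void) {
    sem_init(&mutex, 0, 1);   /* 0 = shared between threads, initial value 1 */
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("count = %ld (expected 200000)\n", count);
    sem_destroy(&mutex);
    return 0;
}
```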
### Classic Problems of Synchronization

#### Bounded-Buffer Problem

- Semaphore `mutex` initialized to 1
    - Protects the buffer
    - Allows many consumers and producers

:::warning
How do we know whether an item can be taken / added? A: after successfully entering the buffer, check whether it is empty / full.
Issue: if the buffer is empty and a consumer enters first, it must block and wait for the producer to add an item before continuing -> busy waiting
:::

- Semaphore `full` initialized to 0
    - Counts filled slots; 0 means empty (the consumer waits on it)
- Semaphore `empty` initialized to N
    - N free slots

```cpp=
/*** Producer ***/
do {
    // produce an item
    wait(empty);
    wait(mutex);
    // add the item to the buffer
    signal(mutex);
    signal(full);
} while (true);

/*** Consumer ***/
do {
    wait(full);
    wait(mutex);
    // remove an item from the buffer
    signal(mutex);
    signal(empty);
    // consume the removed item
} while (true);
```

#### Readers and Writers Problem

> Description: a data set is shared among a number of concurrent processes
> - Readers: can only read the shared data
> - Writers: can both read and write
> - Allow multiple readers to read at the same time.
> - Only a single writer can access the shared data at the same time.
> - No reader waits unless a writer has already locked the shared object (once one reader is in, subsequent readers can keep entering)
> - Readers need not synchronize with each other

- Semaphore `wrt` initialized to 1
- Semaphore `mutex` initialized to 1 (to protect `readcount`)
- Integer `readcount` initialized to 0
    - Handles the case of the first reader

```cpp=
/* Writer */
do {
    wait(wrt);
    // writing is performed
    signal(wrt);
} while (true)

/* Reader */
do {
    wait(mutex);
    readcount++;
    if (readcount == 1)
        wait(wrt);
    signal(mutex);
    // reading is performed
    wait(mutex);
    readcount--;
    if (readcount == 0)
        signal(wrt);
    signal(mutex);
} while (true)
```

- Writers may starve (the second readers-writers problem)
    - Writer priority
    - When a writer wants to enter, readers arriving later must not update `readcount`, so an extra lock is placed outside the read mutex: `readTry`
    - `writecount` decides when `readTry` should be locked / unlocked while writers are using the resource
    - `wmutex` protects `writecount`

#### Dining-Philosophers Problem

> Description: five philosophers sit around a circular table, each doing one of two things: eating or thinking. While eating they stop thinking, and while thinking they stop eating. To eat, a philosopher needs both forks (chopsticks) next to them. The problem is **to design a set of rules so that, with no communication at all (no philosopher knows when another may want to eat or think), every philosopher can keep alternating between the two states forever.**

```cpp=
do {
    wait(chopstick[i]);
    wait(chopstick[(i + 1) % 5]);
    // eat
    signal(chopstick[i]);
    signal(chopstick[(i + 1) % 5]);
    // think
} while(true);
```

- May lead to deadlock (every philosopher holds their left chopstick)
- Fixes
    - Allow fewer than 5 philosophers to try to eat at once (just add a semaphore)
    - Have one philosopher pick up the left chopstick first while everyone else picks up the right one first
    - Pick up both chopsticks at the same time (only when both are available)

#### Sleeping Barber Problem

> Description: a barbershop has $n$ chairs and one barber
> - If there are no customers, the barber sleeps
> - If all chairs are taken, a customer leaves immediately
> - If there is a free chair and the barber is busy, the customer sits down and waits
> - If there is a free chair and the barber is sleeping, the customer wakes the barber up

- Semaphore `customers` = 0
    - Customers that are waiting
    - The barber waits if `customers` = 0
- Semaphore `barber` = 0
    - The barber is ready
    - A customer waits if the barber is busy
- Semaphore `accessSeats` = 1 (locks `NumberOfFreeSeats`)
- int `NumberOfFreeSeats` = N

```cpp=
/* Customer */
while(1) {
    wait(accessSeats);            // mutex protecting the number of available seats
    if ( NumberOfFreeSeats > 0 ) {  // if any free seats
        NumberOfFreeSeats--;      // sit down on a chair
        signal(customers);        // notify the barber
        signal(accessSeats);      // release the lock
        wait(barber);             // wait if the barber is busy
        ...                       // cut hair
    } else {                      // there are no free seats
        signal(accessSeats);      // release the lock on the seats
        ...                       // customer leaves without a haircut
    }
}

/* Barber */
while(1) {
    wait(customers);              // wait for a customer; sleep if there is none
    wait(accessSeats);            // mutex protecting the number of available seats
    NumberOfFreeSeats++;          // one chair becomes free
    signal(barber);               // bring in a customer for a haircut
    signal(accessSeats);          // release the mutex on the chairs
    ...                           // cut hair
}
```

### Monitors

#### Mutex Locks

- `pthread_mutex_lock()`, `pthread_mutex_unlock()`
- Similar to a semaphore with initial value 1
- A mutex can only be unlocked by the thread that has locked the mutex

#### Monitor

- A monitor can be thought of as a special class (a high-level language construct). Each monitor has local variables that could suffer race conditions (the shared data in the figure below) and must be protected; these variables can only be manipulated through the monitor's methods.
- ![image](https://hackmd.io/_uploads/SkSqrs2ykg.png)
- Condition variable
    - Two operations
        - `x.wait()`
        - `x.signal()`: if some process is waiting, wake it up; otherwise it has no effect (unlike a semaphore, where `signal()` still increments the count)
    - Inside a monitor, a condition variable is not a counter but a queue; `wait()` and `signal()` correspond to push and pop, so if the queue is empty, `signal()` does nothing
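A minimal Pthreads sketch of the monitor idea above: a mutex guards the shared state, and condition variables play the role of `x.wait()` / `x.signal()`. The one-slot buffer, the variable names, and the item count are arbitrary toy assumptions.

```c
/* Minimal sketch: monitor-style synchronization with a pthread mutex
 * and condition variables guarding a one-slot buffer.
 * Compile with: gcc demo.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static int slot, has_item = 0;

static void *producer(void *arg) {
    (void)arg;
    for (int i = 1; i <= 3; i++) {
        pthread_mutex_lock(&lock);               /* enter the "monitor" */
        while (has_item)                         /* wait until the slot is free */
            pthread_cond_wait(&not_full, &lock);
        slot = i;
        has_item = 1;
        pthread_cond_signal(&not_empty);         /* x.signal(): wake a waiter, if any */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < 3; i++) {
        pthread_mutex_lock(&lock);
        while (!has_item)                        /* re-check the condition after waking */
            pthread_cond_wait(&not_empty, &lock); /* x.wait(): sleep and release the lock */
        printf("consumed %d\n", slot);
        has_item = 0;
        pthread_cond_signal(&not_full);          /* tell the producer the slot is free */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```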
## Ch. 7 Deadlock

### System Model

- Resources $R_1,R_2,\dots,R_m$: CPU cycles, memory objects, I/O devices, ...
- Each resource type $R_i$ has $W_i$ instances, e.g. DMA channels
- Each process utilizes a resource as follows
    1. Request
    2. Use
    3. Release
- Resource Allocation Graph
    - ![image](https://hackmd.io/_uploads/S1vv5o3kyl.png)
    - ![image](https://hackmd.io/_uploads/S1BucshyJe.png)

### Deadlock Characterization

> If a deadlock occurs, all of the following conditions hold

- Mutual exclusion
- Hold and wait: a process holding some resources is waiting for resources held by other processes
- No preemption: a resource cannot be taken away; it is released only when the holding process gives it up voluntarily
- Circular wait: $P_0$ is waiting for a resource held by $P_1$, $P_1$ is waiting for one held by $P_2$, ..., and $P_n$ is waiting for one held by $P_0$
- In the resource allocation graph
    - ![螢幕擷取畫面 2024-10-16 105518](https://hackmd.io/_uploads/BJETji3kkg.png)
    - Resources with multiple instances: Deadlock $\Rightarrow$ Cycle
    - Resources with single instances: Deadlock $\Leftrightarrow$ Cycle
    - If the graph contains no cycle $\Rightarrow$ no deadlock
    - If the graph contains a cycle $\Rightarrow$
        - If the resources have single instances, then deadlock
        - If the resources have multiple instances, then **possible** deadlock

### Handle Deadlock

- Guarantee the system never enters a deadlock state (prevention or avoidance), e.g. RTOS
- Allow the system to enter a deadlock, but be able to recover (detection and recovery)
- Just ignore it... (the approach taken by most OSes)

### Deadlock Prevention

- Break at least one of the four conditions required for deadlock to occur
- No mutual exclusion: impossible
- No hold and wait: low concurrency
- Allow preemption (break the no-preemption condition)
    - When a process holding $R_i$ (called the victim) is waiting for another resource, $R_i$ is taken away whenever another process needs it
    - The victim is restarted when $R_i$ becomes available again
    - Requires a checkpointing mechanism -> computationally expensive
- No circular wait
    - Impose a total ordering or a partial ordering of allocation on resources
    - e.g. if $R_1\rightarrow R_2$ is allowed, then $R_2\rightarrow R_1$ must not exist
    - Implemented by the programmer

### Deadlock Avoidance

- Predict deadlock and never let processes enter an unsafe state

#### Avoidance with single-instance resources

- Relabel the edges of the resource-allocation graph
    - Claim edge: may use the resource at some time
    - Request edge: is requesting the resource
    - Assignment edge: is holding the resource
- Whenever a process issues a request, check whether adding that edge creates a cycle in the graph; if it does, the process must wait
- In commodity OSes, deadlock management has been dropped; it is the programmer's responsibility to write deadlock-free programs
- But in real-time OSes (RTOS), there are still resource-synchronization protocols to avoid deadlocks
    - e.g. the Highest Locker's Protocol in an RTOS
        - When many processes may lock the same resource, the process that successfully acquires it has its priority raised to the highest priority among those processes; once it releases the resource, its priority is restored
        - A mutex can only be unlocked by the one who locked it
        - ![image](https://hackmd.io/_uploads/BkG2x1ElJl.png)
            - The upper half shows a deadlock
            - Lower half: after L acquires the purple resource it raises its priority to H's, so when H arrives it cannot preempt L; when L releases the purple resource it drops back to its original priority, and only then does H start running
            - Avoid creating a cycle: let the process that already holds the resource finish first

#### Avoidance with multiple-instance resources

- Safe/unsafe-state method
    - Safe state: definitely no deadlock
    - Unsafe state: deadlock may occur
    - ![image](https://hackmd.io/_uploads/SJw7PWVx1x.png)
    - ![image](https://hackmd.io/_uploads/HkO4wbNgJl.png)
    - ![image](https://hackmd.io/_uploads/Sy0SwZVlJx.png)
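The safe/unsafe-state check for multiple-instance resources is commonly implemented as the safety algorithm at the core of the Banker's algorithm. The sketch below uses made-up Allocation/Need matrices and an Available vector for 3 processes and 2 resource types; it reports whether the state is safe.

```c
/* Minimal sketch of the safety check used for deadlock avoidance with
 * multiple-instance resources. All matrix values are made-up example data. */
#include <stdbool.h>
#include <stdio.h>

#define P 3   /* processes */
#define R 2   /* resource types */

int main(void) {
    int available[R]     = {1, 1};
    int allocation[P][R] = {{1, 0}, {0, 1}, {1, 1}};
    int need[P][R]       = {{1, 1}, {1, 0}, {0, 1}};

    int work[R];
    bool finish[P] = {false};
    for (int j = 0; j < R; j++) work[j] = available[j];

    /* Repeatedly look for a process whose Need fits in Work; if found,
     * pretend it runs to completion and returns its allocation. */
    bool progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < P; i++) {
            if (finish[i]) continue;
            bool fits = true;
            for (int j = 0; j < R; j++)
                if (need[i][j] > work[j]) { fits = false; break; }
            if (fits) {
                for (int j = 0; j < R; j++) work[j] += allocation[i][j];
                finish[i] = true;
                progress  = true;
                printf("P%d can finish\n", i);
            }
        }
    }

    bool safe = true;
    for (int i = 0; i < P; i++)
        if (!finish[i]) safe = false;
    printf("state is %s\n", safe ? "SAFE" : "UNSAFE");
    return 0;
}
```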