# OS Introduction + CPU Management
## Ch. 1 Introduction + I/O
> **The kernel is built around interrupts**
- What is OS
- No definition
- A program that acts as an intermediary between a user of a computer and the computer hardware
- a resource allocator
- a control program
- *Everything a vendor ships when you order an operating system*
- The one program running at all times on the computer is the kernel
- UNIX OS
- POSIX: a standard interface for operating systems
- A kernel + some system programs (coreutils + binutils)
- Examples following this standard: UNIX, Linux
- Typical PC Organization
- North bridge (memory hub): CPU, Memory, PCIe devices (GPUs, NVMe SSDs, ...)
- South bridge (IO hub): PCI, USB, SCSI, SATA
- IO operation
- polling
- the CPU is involved throughout the whole operation
- Busy-wait cycle to wait for I/O from device
- CPU perform data movement
- e.g. GPIO
- interrupt
- the CPU goes off to do other work and comes back only when the I/O has completed
- Interrupt vector to dispatch interrupt to correct handler
- Interrupt handler (ISR) receives interrupts
- Synchronous I/O vs. Asynchronous I/O
- sync I/O: blocking call, e.g. `read()`, `fread()`
- async I/O: non-blocking call, e.g. `write()`, `fwrite()`
- e.g.
- `fsync()`: makes file writes synchronous, guaranteeing the data has actually been written -> sync I/O
- `aio_read()`: makes a read asynchronous; used for prefetching: if the application predicts it will need some data soon, it can ask the OS to fetch it in advance
### Interrupt

- Common function
- hardware: interrupt req. (IRQ)
- IO completion, timer
- async. interrupt
- software: trap, exception
- divide by 0, access violation, system call
- sync. interrupt
- interrupt handling
- suppose an I/O device raises an interrupt signal; it is sent to the PIC (programmable interrupt controller)
- the PIC forwards the signal to the CPU
- the CPU saves the PC and flags
- the CPU looks up the IVT (interrupt vector table) to find the address of the ISR to jump to
- the CPU jumps to that address and runs the ISR (interrupt service routine) to handle the event
- reaching IRET means the handler is done; the saved PC and flags are restored and execution returns to the user program
- Some terms
- ISR (interrupt service routine): the code that handles an interrupt
- Interrupt vector: maps one interrupt to the address of its ISR
- IVT (interrupt vector table): the collection of interrupt vectors; after receiving an interrupt, the CPU looks up the ISR address here and jumps to it
- IRQ (interrupt request): an interrupt generated by hardware
- PIC (programmable interrupt controller): receives interrupts from I/O devices and forwards them to the CPU
### Direct Memory Access (DMA)
- Data movement of I/O operation
- Option 1: CPU moves data between main memory and I/O local buffers
- slow
- waste cycle
- Option 2: CPU offloads data movement to DMA
- DMA
- hardware
- used to move data between I/O devices and memory quickly
- the CPU tells the DMA controller the source, the destination, and how much to transfer
- the DMA controller then performs the transfer directly, without CPU involvement
- when the transfer finishes, the DMA controller raises an interrupt to notify the CPU, which then does any follow-up processing
### Multiprogramming
- Multiprogramming organizes processes so CPU always has one to execute
- Timesharing (Single core case)
- switch the process running on the CPU frequently (≥ 100 Hz)
- a switch happens when
- the current process issues a sync I/O call
- the current process has run too long
- because the frequency is high, the processes appear to run simultaneously
- terminology
- Multiprogramming: multiple processes are loaded into memory for execution
- Multitasking: Multiprogramming with overlapped task execution
- Timesharing: multitasking + periodic switching among processes (introduces the notion of time)
### Booting
- Bootstrap program
- Typically stored in ROM or EEPROM, generally known as firmware
- Initializes all aspects of system
- also called BIOS
- bootloading procedure
- FFFF:0000
- BIOS, which then reads sector 0 of the disk (the MBR)
- MBR (master boot record): loads the OS from the disk
- Boot Manager
- Pre-OS
- OS
- BIOS+MBR has been replaced by UEFI+GPT, an extensive spec.
## Ch. 2 System Structures

### System Program
- e.g. `cp`, `mv`, `ls`, ...
- UNIX = system program + kernel
- GNU coreutils + binutils
- Programming-language support - Compilers, assemblers, debuggers and interpreters sometimes provided
### User Interface
- Shell: the interface program between users and the kernel
- CLI, GUI (yes, a GUI also counts as a shell)
### System Call

| | system call | system API | system program |
| - | -------- | -------- | -------- |
| **e.g.** | some assembly | `fork()`, `read()` | `cat`, `ls`, `cp` |
- The kernel can touch everything (e.g. all of memory), and system calls are executed via interrupts, so we rarely invoke them directly; `fork()`, for instance, is a system API
- Three most common system APIs
- Win32 API for Windows
- POSIX API for UNIX
- Java API for the Java VM
- `int`: the x86 instruction that raises a software interrupt, here interrupt number `0x2e`
- Dual Mode Operations
- kernel mode & user mode
- supported by OS, allowed OS to protect itself against un-trusted code
- some instructions designated as privileged
- an application invokes a system call through a software interrupt, which switches the CPU into kernel mode
- System call parameter passing
- pass them in registers
- store them in a block (table) in memory and pass its address in a register
- push them onto the program's stack
- e.g.
- `mov eax, 4`: system call number (4: sys_write)
- `int 0x80`: trigger the interrupt to enter the kernel
- `mov ebx, 0`: return value (exit status) passed to sys_exit
- both the write and the exit are system calls
### OS Design and Implementation
- mechanism & policy
- mechanism: How to do it
- policy: What will be done (or which choice to make)
- e.g. disk IO
- Mechanism: How to read and write from disk?
- Policy: Which disk I/O operation should be performed first?
### OS Structure
- MS-DOS
- a very early OS
- applications can directly access memory and control hardware -> dangerous
- extremely fast
- still used today for motherboard firmware upgrades
- FreeRTOS
- a real-time, embedded operating system
- the OS and the application are linked into a single image; new programs cannot be installed
- UNIX
- Systems programs + Kernel
- modules
- originally, adding programs required recompiling the kernel
- later, loadable kernel modules made it possible to extend the kernel by simply adding a module
- Microkernel
- move most functionality into user space, keeping only the essentials in the kernel (CPU scheduling, memory management, IPC)
- easy to extend: each feature is a user process, so extending the OS just means adding user processes
- easy to port the operating system to new architectures: the kernel itself is tiny and easy to port
- reliable and secure
- very slow: processes can only communicate by message passing, plus context switches
- Mac OS
- Mach (μ-kernel): memory management, RPC, IPC, message passing, thread scheduling
- BSD: networking, file systems, POSIX APIs
- Google Fuchsia
- based on the Zircon microkernel
- designed for IoT devices
### Virtual Machine
- Provides an interface identical to the underlying bare hardware
- Now popular because of cloud-based computation
- Different levels of virtualization exist
- virtualize the hardware, e.g. VMware
- virtualize the system calls, e.g. running Win32 system calls on a Linux kernel
- approach: intercept the Win32 call, translate it into POSIX calls, and pass them to the kernel
- Namespaces
- isolated execution environments
- e.g. Docker, virtualized file systems
- Each virtual machine is isolated from all other virtual machines
- VMware Architecture
- Processor Management
- only the VMM runs in kernel mode; everything else runs in user mode
- when an application issues a system call, a trap is delivered to the VMM
- the VMM inspects it, picks the appropriate guest OS, and forwards the instruction to it
- when the guest OS finishes, its IRET traps back to the VMM
- the VMM then returns control to the application to signal completion
- sensitive / privileged instructions
- instructions that may only execute in kernel mode
- if an application (or guest OS) needs one, it must go through the VMM first
- Java Virtual Machine (JVM)
- The JVM's ISA differs from the underlying hardware's, so an interpreter is used to translate the bytecode
- e.g. Android
## Ch. 3 Process Concept
- A process uses the following context
- Text section: executable binaries (e.g. function)
- Stack section: function args. + local vars.
- Data section: global vars. + heap
- a global variable is treated differently depending on whether it has an initializer
- uninitialized global variables live in the BSS segment
- heap: the source of dynamically allocated memory
- Program counter and other CPU registers
- e.g.
```cpp=
int i, j = 2;
int foo(int x) {
    int *y;
    char c;
    static char d = 'x';
    i = 0;
    y = (int*)malloc(100);
    return 0;
}
```
- stack: `x`, `y`, `c`
- data: `i`, `j`, `d`
- text: `foo`
- `y` points to a block of memory allocated from the heap
- Process schedule
- running -> ready
- Case 1: it runs out of its time quantum
- Case 2: an I/O interrupt moves a higher-priority process from waiting to ready; that process immediately enters running and preempts the one currently running
- Process Control Block (PCB)
- each process has one; it stores that process's information
- contents include
- PID
- Process state
- Saved CPU register values (the process's CPU state from when it last ran)
- CPU scheduling info (e.g., priority)
- Memory-management information (e.g. segment table and page-table base register)
- I/O status info (e.g., opened files)
- Context Switch
- the transition between ready and running
- during a context switch the CPU cannot run any process
- about 2000 ns on an Intel 5150
- where the time goes: saving and restoring CPU state, plus pipeline stalls and cache-refill costs
- procedure
1. P0 runs
2. P0 is interrupted and yields the CPU, which sits idle
3. Context switch: save P0's state into its PCB and load P1's PCB
4. P1 runs
- first save the PC (CS + IP) and FLAGS (PSW)
- then save the general registers
- finally store the stack pointer into the PCB (process control block)
### Process scheduling
- the queues other than the ready queue all hold processes in the waiting state
- Scheduler
- Short-term scheduler (or CPU scheduler)
- selects which process should be executed next and allocates CPU
- invoked frequently (every few ms) -> must be fast
- Long-term scheduler (or job scheduler)
- selects which processes should be brought into the ready queue
- controls the degree of multiprogramming
- invoked in seconds or minutes
- the nature of a process: where it spends most of its time
- I/O-bound: spends more time on I/O
- CPU-bound: spends more time on computation
- a long-term scheduler must mix the two types effectively to fully utilize resources
- most PCs actually have no long-term scheduler, since the user decides which programs run
- e.g. MapReduce
- Mid-term scheduler
- when memory is full, decides which processes' memory gets swapped out
- nowadays replaced by virtual memory (a process's memory is split into pages, and pages are evicted based on usage)
### Operation on Process
#### Process creation
- `fork` system call creates new process. The child is an exact copy of the parent
- `exec` system call used after a fork to replace the process' memory space with a new program
- kthreadd: parent of the kernel daemons
- pdflush: `write()` is async I/O; pdflush flushes the buffered (dirty) data to disk in the background
- `fork()` -> `exec()` convention
- fork duplicates the current process; exec then replaces the process's memory image with a new program
- fork does not literally copy everything; it copies efficiently via memory mapping, handled by the MMU (Memory Management Unit, a hardware component)
- details: right after the fork, the child maps onto the parent's memory; only when the child modifies a page does it get its own copy. This scheme is called Copy-On-Write (CoW)
- if the CPU has no MMU, `vfork()` is used instead of `fork()`
- vfork suspends the parent and runs the child directly in the parent's address space; if the child then calls exec, a fresh address space is allocated for it, the child's data is removed from the parent's stack, and the parent resumes
```cpp=
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int x = 0;

int main() {
    pid_t pid;
    /* fork another process */
    pid = vfork();
    if (pid < 0) { /* error occurred */
        fprintf(stderr, "Fork Failed");
        exit(-1);
    }
    else if (pid == 0) { /* child process */
        x++; // often, exec() is called here
        _exit(0);
    }
    else { /* parent process */
        /* parent waits for the child to complete */
        wait(NULL);
        printf("%d", x); // prints 1: vfork shares the parent's memory
        exit(0);
    }
}
```
- Multiprocess architecture example:web browser (Chrome)
- each tab is its own process, so one tab crashing does not affect the others
#### Process termination
- two ways
1. the process ends itself with `exit()`; its return value must be collected by the parent
2. the parent forcibly terminates the child by sending it SIGKILL
- zombie (defunct) process
- a process that has terminated but whose return value has not been collected
- its resources are all freed, but an entry remains in the process table
- orphan process
- the parent terminates before the child does
- on Linux, the orphan is adopted by init (pid = 1), which waits for it and collects its return value
- application: use a double fork to avoid zombie processes
- fork twice and let the middle process exit immediately, so the grandchild is automatically adopted by init
#### Inter-Process Communication

##### Shared Memory
- Linux system call
- `shmget()`: create a block of shared memory
- `shmat()`: attach shared memory to the current process’s address space
- `shmdt()`: detach shared memory from the current process's address space
- `shmctl()`: control shared memory (including delete)
- Producer-consumer problem
- the producer keeps inserting and the consumer keeps removing; how do we synchronize the two through a buffer (shared memory)?
- constraints
- the buffer size is finite
- no overwriting and no null reads allowed
- solution: circular (ring) buffer
- two pointers mark the `in` and `out` positions
- `in == out`: empty
- `(in + 1) % BUFFER_SIZE == out`: full
- still problematic: busy waiting wastes CPU cycles -> requires proper synchronization (see Ch. 6)
##### Message Passing
- processes cannot see each other's data
- a channel must be established
- the channel has a buffer
- empty and full buffers are handled automatically (built-in synchronization)
- e.g. pipe
- the pipe is created and configured by the parent process
- each process should close the pipe end it does not use (the writer closes the read end and vice versa)
##### Signal
- a way for processes to communicate (though usually used to handle exceptional conditions)
- signals are to a process what interrupts are to the CPU
- sync vs. async
- sync
- triggered by the process itself
- divide overflow, memory-access violations, ...
- e.g. SIGSEGV (Memory protection fault), SIGFPE (Arithmetic fault)
- async
- e.g. SIGKILL, SIGSTOP, SIGCHLD (child terminated)
- signal handler: a function defining how to react when a given signal arrives
## Ch. 4 Thread Programming

- A thread is also called a lightweight process: creating one is faster than creating a process, and context switching between threads is also faster
### Multithreading Models
- Thread Library
- Pthreads (P for POSIX), Win32 threads, Java threads
- POSIX also specifies the thread-related APIs for such systems
- but the Pthreads spec does not mandate an implementation model (for flexibility), i.e. Pthreads may be implemented as 1-1, M-1, or M-M
- User threads vs. kernel threads
- user thread: a thread created through the thread library API
- kernel thread: not an official term, just an aid to understanding; it refers to the schedulable unit that can receive CPU time from the scheduler
- mapping of user thread to kernel thread
- Many-to-One
- One-to-One
- Many-to-Many
- Many-to-One
- many user threads map onto one kernel thread
- the CPU scheduler gives this process only one schedulable entity; which user thread runs within the process is decided by the thread library
- pros & cons
- portability: works even when the kernel does not support multithreading, and not only on POSIX systems
- if one thread blocks (e.g. calls `read()`), every thread blocks, because to the kernel there is only one thread and it is stuck in the read
- though some wrapper functions can work around this, e.g.
```cpp
ssize_t pth_read(int fd, void *buf, size_t nbytes)
```
- no parallelism
- e.g.
- Solaris Green Threads (Java thread)
- GNU Portable Threads: An implementation of the Pthread specification
- One-to-One
- pros & cons
- Require kernel support; less portable
- parallelism is possible
- one thread making a blocking call does not block the others
- e.g. Windows NT/XP/2000, Linux Pthread, Solaris 9 and later
- Many-to-Many
- rare nowadays
- Amdahl's Law (see computer organization)
### Threading issue
- `fork()` and `exec()` in thread: undefined
- signal
- Synchronous signal: delivered to the thread that caused it, e.g. an access violation
- Asynchronous signal: typically delivered to the first thread that does not block the signal, e.g. process termination
## Ch. 5 CPU Scheduling

- I/O wait: the fraction of time a process spends waiting for I/O
- $p$: I/O wait; $n$: degree of multiprogramming; CPU utilization $=1-p^n$
- CPU scheduling decisions may take place when a process:
1. Switches from running to waiting state (sync IO, `sleep()`, `yield()`)
2. Switches from running to ready state (e.g. preempted by a timer interrupt)
3. Switches from waiting to ready state (when a high-priority process becomes ready, the scheduler runs and may dispatch it to the CPU)
4. Terminates
- Scheduling under only 1 and 4 is non-preemptive or cooperative
- Easy to implement
- e.g. Windows 3.1, Old versions of Mac OS, Sensor OS (e.g., TinyOS 1.0)
- All other scheduling is preemptive
- Higher responsiveness
- Higher context switch overheads
- Must deal with race conditions (In Ch. 6)
- Dispatcher
- the scheduler decides who runs next; the dispatcher performs the actual switch, which involves
- switching context
- switching from kernel mode to user mode
- after the process change: updating the page-table pointer and loading the new program counter (state)
- Dispatch latency: the time the dispatcher needs to stop one process and start another
- Scheduling time
- Interrupt re-enabling time
- Context switch time
### Scheduling Algorithms
#### Criteria
- (+) CPU utilization – keep the CPU as busy as possible
- (+) Throughput – # of processes that complete their execution per time unit
- (-) Turnaround time – time to execute a particular process
- (-) Waiting time – time waiting in the ready queue
- (-) Responsiveness (interactivity) – time from user sending input until the system delivers a feedback
#### Algorithms
- **First-Come, First-Served (FCFS)**
- absolutely fair (served strictly in arrival order)
- convoy effect: short processes stuck behind a long process
- they could finish quickly, yet must wait a long time
- e.g. an I/O-bound process cannot finish its short CPU burst quickly and hand the work back to the I/O device
- poor I/O utilization
- slow I/O response
- **Shortest-Job-First (SJF)**
- Two schemes:
- Non-preemptive: compare total burst times directly
- Preemptive: compare remaining times; if a newly arrived process's burst is shorter than the current process's remaining time, the newcomer takes over (Shortest-Remaining-Time-First)
- starvation: a long process, even if it arrived early, may never finish if short processes keep arriving
- Optimal average waiting time
- Large waiting-time variance
- in practice the actual burst time cannot be known in advance -> use prediction
- e.g. exponential moving average
- predict the next burst from the previous ones
- $\tau_{n+1}=\alpha t_n + (1-\alpha)\tau_n$
- The long-standing dilemma: efficiency vs. fairness
- Solution: aging (raise priority with waiting time)
- **Priority Scheduling**
- Convention: the smaller the number, the higher the priority
- Problem: starvation
- Solution: aging (the longer a process waits, the higher its priority)
- **Round Robin (RR)**
- Time sharing: each process gets one time quantum $q$ (usually 10–100 ms); if it has not finished when the quantum expires, it is preempted and sent back to the ready queue
- $q$ large ⇒ behaves like FIFO
- $q$ small ⇒ high context-switch overhead
- large quantum: better average turnaround time (time from a process's arrival to its completion)
- small quantum: better interactivity
- **Multilevel Queue**
- split the ready queue into two (or more) queues, each managed with a different algorithm
- foreground (interactive): RR
- background (batch): FIFO
- scheduling is also needed between the queues, e.g.
- fixed priority: finish all foreground work before background -> starvation
- allocate each queue a fixed share of CPU time
- **Multilevel Feedback Queue**
- Promotion / Demotion: moves processes between queues
- e.g.
- a process that uses up its whole quantum in one of the top two queues is demoted to the next queue; one that blocks before the quantum expires is promoted
- gives I/O-bound processes higher priority, improving I/O utilization and interactivity
- gives CPU-bound processes a larger time quantum, improving turnaround time and throughput
### Multiple Processor Scheduling
#### Multiprocessor Architecture
- Homogeneous architecture: All CPUs share global memory (**SMP**, Symmetric MultiProcessing) and contend for memory cycles via the common bus
- Easy
- Low process migration cost
- Bad scalability: with many CPUs, contention for the shared memory grows
- Cache coherence: every CPU caches data, and with many CPUs keeping the caches consistent becomes expensive
- Heterogeneous architecture: CPUs have local memory and memory reference is either local or remote (**NUMA**, Non-Uniform Memory Access)
- more common today
- Good scalability
- High process migration cost
- common topologies: ring, mesh, hypercube, ad-hoc
#### Multiple Processor Scheduling Algorithm

- partitioned scheduling
- better suited to NUMA: avoids constantly moving data between nodes
- more common
- e.g. Linux
- Goals:
- Processor affinity
- process migration costs (for NUMA): cache re-population, pipeline restart, process data transfer
- ideally each process stays on one fixed CPU
- Load balancing
- the two goals conflict with each other
#### Hardware Multithreading
- A thread represents a logical processor, emulated by an independent set of registers
- hides memory-stall time: while one hardware thread waits on memory, the core can execute another
- two logical CPUs on the same core still share one physical CPU, so with two jobs, the placement 1 0 1 0 (one job per core) beats 1 1 0 0 (both on one core)
- Linux strategy: assign processes bottom-up, spreading them evenly across physical cores first
### Real-Time Scheduling
- IEEE definition of real time systems: *"A real-time system is a system whose correctness includes its response time as well as its functional correctness."*
- speed is not the point; work must finish before its deadline
- Soft real-time: missing a deadline is tolerable but should be rare, e.g. audio/video streaming
- Hard real-time: missing a deadline is catastrophic, e.g. avionics
#### Real-Time Scheduling Algorithm
- Must support preemption
- Processes have a new characteristic: they are periodic
- the process must be run once every period
- usually deadline = period
- **Rate Monotonic Scheduling**
- the shorter the period, the higher the priority
- can miss deadlines even when a feasible schedule exists
- **Earliest Deadline First Scheduling (EDF)**
- the earlier the deadline, the higher the priority
### Thread Scheduling
- Light-Weight Process (LWP)
- abstraction for mapping of user threads to kernel threads
- an LWP usually maps 1-1 to a kernel thread
- Two levels of scheduling
- Local scheduling
- how the thread library schedules user threads onto LWPs
- process-contention scope (PCS)
- Global scheduling
- how the kernel schedules kernel threads onto CPUs
- system-contention scope (SCS)
- Pthread API
```cpp=
/* set the contention scope - PROCESS or SYSTEM */
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
/* set the scheduling policy - FIFO, RR, or OTHER */
pthread_attr_setschedpolicy(&attr, SCHED_OTHER);
```
- Linux's native thread model is 1-1, so only `PTHREAD_SCOPE_SYSTEM` can be set
### Operating System Example
- Solaris
- priority-based, with RR among threads of the same priority
- each class has its own priority range and scheduling algorithm
- time quantum expired: the priority assigned after a thread uses up its quantum (demotion)
- return from sleep: the priority assigned after a thread wakes from sleep (promotion)
- Linux CFS Scheduling (2.6.23+)
- Virtual runtime (vruntime)
- each process has its own clock, and the clocks may tick at different rates
- always pick the process with the smallest vruntime to run
- also known as the virtual clock algorithm
- Nice value: -20 to +19 (priority: high to low)
- $\text{priority}=120+\text{nice value}$
- Completely Fair Scheduler (CFS)
- CFS favors I/O-bound processes, so their virtual clocks tick more slowly
- CFS adjusts how fast vruntime grows based on the nice value and recent behavior (I/O-bound vs. CPU-bound)
## Ch. 6 Synchronization
- Background: concurrent reads and writes of shared data may cause data inconsistency, especially across multiple processes. Earlier we tracked the buffer with the two pointers `in` and `out`; can we instead keep a `count` of the number of items in the buffer?
```cpp=
/* Producer Code */
while (true) {
    /* produce an item and put it in nextProduced */
    while (count == BUFFER_SIZE)
        ; // do nothing
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
    count++;
}

/* Consumer Code */
while (true) {
    while (count == 0)
        ; // do nothing
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    count--;
    /* consume the item in nextConsumed */
}
```
- But `count++` and `count--` each compile into three assembly instructions; because of context switches and preemptive scheduling, the instructions of the two processes may interleave, producing unexpected results (a race condition)
```=
mov  ax, count
add  ax, 1
mov  count, ax
```
### Critical Section Problem
- the code segment in which a process accesses shared data is called its critical section
- a solution must satisfy
- Mutual exclusion: while one process is executing its critical section, no other process may execute theirs
- Progress: if a process wants to enter its critical section and nobody else is inside, it must eventually be allowed in
- this rules out a pre-arranged entry order
- e.g. with a fixed order P1 -> P2 -> P3, if P2's logic rarely enters its critical section, both P1 and P3 get stuck waiting for it
- Bounded waiting: there must be a bound on how long a process waits to enter its critical section; it cannot queue forever
### Synchronization Hardware
#### Interrupt Disabling
- masking interrupts prevents preemption, resolving the race problem
- disable interrupts before the critical section and re-enable them after
- commonly used inside the kernel
- interrupt disabling is a privileged instruction and cannot be used in user mode
- cannot handle contention between different CPUs on a multiprocessor
#### Atomic Instructions
- modern machines provide special atomic hardware instructions, i.e. instructions that cannot be interrupted
- on a multiprocessor, the memory word accessed by such an atomic instruction is also locked for the duration of the operation
- e.g.
- Test memory word and set value (TSL)
- Swap contents of two memory words (XCHG)
- TestAndSet: used to lock a particular section or memory region
```cpp=
bool TestAndSet(bool *target) {
    bool rv = *target;
    *target = TRUE;
    return rv;
}

/*** P1 ***/
do {
    while (TestAndSet(&lock)); /* busy wait */
    // critical section
    lock = FALSE;
    // remainder section
} while (TRUE);

/*** P2 ***/
do {
    while (TestAndSet(&lock)); /* busy wait */
    // critical section
    lock = FALSE;
    // remainder section
} while (TRUE);
```
- swap
```cpp=
/* In each process */
do {
    key = TRUE;
    while (key == TRUE)
        Swap(&lock, &key);
    // critical section
    lock = FALSE;
    // remainder section
} while (TRUE);
```
- TAS / SWAP
- can be used on a uniprocessor or multiprocessor, in user mode or kernel mode
- issues
- wastes CPU cycles (busy-waits for the lock)
- the contention is stateless, so starvation is possible
- satisfies mutual exclusion and progress, but not bounded waiting
### Semaphores
- Semaphore `S`:integer variable
- Init: S >= 0
- `wait()`
- `S--`
- running -> waiting (process status)
- `signal()`
- `S++`
- waiting -> ready
- Associated with a waiting queue
```cpp=
wait(S) {
    S--;
    if (S < 0) {
        // add this process to the waiting queue
        block();
    }
}

signal(S) {
    S++;
    if (S <= 0) {
        // remove a process P from the waiting queue
        wakeup(P);
    }
}
```
- The `<= 0` condition in `signal` means `S` was negative before `S++`, i.e. some process is blocked; now that there is room, a process P can be released
- Semaphores operations must be atomic
- a semaphore's role is determined by its initial value
- Mutex lock: init = 1
- Sequencing or event: init = 0
- Capacity control: init = capacity
- Mutex lock: `Semaphore mutex = 1`
```cpp=
/* P1 */                 /* P2 */
do {                     do {
    wait(mutex);             wait(mutex);
    // critical section      // critical section
    signal(mutex);           signal(mutex);
    // remainder section     // remainder section
} while (TRUE);          } while (TRUE);
```
- suppose P1 enters first: its `wait(mutex)` sets `mutex` to 0; if P2 then enters, `mutex` becomes -1 and P2 blocks
- when P1 leaves the critical section, `signal()` sets `mutex` back to 0 and releases P2
- Sequencing or event: `Semaphore synch = 0`
```cpp=
/* P1 */            /* P2 */
S1;                 wait(synch);
signal(synch);      S2;
```
- Capacity control: `Semaphore sem = capacity`
```cpp=
/* Pi, Pj, Pk, ... */
{
    // ...
    wait(sem);
    // ... at most `capacity` processes get here at once
    signal(sem);
    // ...
}
```
- Deadlock
- `S` and `Q` be two semaphores initialized to 1
```cpp=
/* P0 */               /* P1 */
wait(S);               wait(Q);
wait(Q);               wait(S);
// critical section    // critical section
signal(Q);             signal(S);
signal(S);             signal(Q);
```
- P0 finishes `wait(S)`, then a context switch lets P1 execute `wait(Q)`; now each process waits on the semaphore the other holds, and both are stuck
### Classic Problems of Synchronization
#### Bounded-Buffer Problem
- Semaphore `mutex` initialized to 1
- protects the buffer
- allows multiple consumers and producers
:::warning
How do we know whether an item may be taken / inserted?
A: enter the buffer first, then check whether it is empty / full.
Issue: if the buffer is empty and a consumer enters first, it must block until a producer inserts an item -> busy waiting
:::
- Semaphore `full` initialized to 0
- counts filled slots (consumers wait on it)
- Semaphore `empty` initialized to N
- counts the N free slots (producers wait on it)
```cpp=
/*** Producer ***/
do {
    // produce an item
    wait(empty);
    wait(mutex);
    // add the item to the buffer
    signal(mutex);
    signal(full);
} while (true);

/*** Consumer ***/
do {
    wait(full);
    wait(mutex);
    // remove an item from the buffer
    signal(mutex);
    signal(empty);
    // consume the removed item
} while (true);
```
#### Readers and Writers Problem
> Description: A data set is shared among a number of concurrent processes
> - Readers: only read the data set
> - Writers: can both read and write
> - Allow multiple readers to read at the same time.
> - Only a single writer can access the shared data at the same time.
> - No reader waits unless a writer has already locked the shared object (once one reader is in, later readers keep getting in)
> - Readers need not synchronize with each other
- Semaphore `wrt` initialized to 1
- Semaphore `mutex` initialized to 1. (to protect “readcount”)
- Integer `readcount` initialized to 0
- handles the first-reader / last-reader cases
```cpp=
/* Writer */
do {
    wait(wrt);
    // writing is performed
    signal(wrt);
} while (true)

/* Reader */
do {
    wait(mutex);
    readcount++;
    if (readcount == 1) wait(wrt); // first reader locks out writers
    signal(mutex);
    // reading is performed
    wait(mutex);
    readcount--;
    if (readcount == 0) signal(wrt); // last reader lets writers back in
    signal(mutex);
} while (true)
```
- Writers may starve (the second readers-writers problem)
- writer-priority variant
- when a writer wants to enter, readers arriving later must not update `readcount`, so an extra lock is placed outside the read mutex: `readTry`
- `writecount` decides when `readTry` is locked / unlocked while writers are using the resource
- `wmutex` protects `writecount`
#### Dining-Philosophers Problem
> Description: five philosophers sit around a circular table, each doing one of two things: eating or thinking. While eating they stop thinking, and while thinking they stop eating. To eat, a philosopher needs both chopsticks beside them. The problem: **design a set of rules such that, without any communication (no philosopher knows when another will eat or think), every philosopher can keep alternating between the two states forever.**
```cpp=
do {
    wait(chopstick[i]);
    wait(chopstick[(i + 1) % 5]);
    // eat
    signal(chopstick[i]);
    signal(chopstick[(i + 1) % 5]);
    // think
} while (true);
```
- can deadlock (everyone picks up their left chopstick and waits)
- fixes:
- allow at most four philosophers to sit at once (add a counting semaphore)
- make one philosopher pick up the left chopstick first while the others start with the right
- pick up both chopsticks atomically
#### Sleeping Barber Problem
> Description: a barbershop has $n$ chairs and one barber
> - if there are no customers, the barber sleeps
> - if the chairs are full, a customer leaves immediately
> - if a chair is free and the barber is busy, the customer sits down and waits
> - if a chair is free and the barber is asleep, the customer wakes the barber
- Semaphore `customers` = 0
- number of waiting customers
- the barber waits (sleeps) while `customers` = 0
- Semaphore `barber` = 0
- barber is ready
- a customer waits while the barber is busy
- Semaphore `accessSeats` = 1 (lock `NumberOfFreeSeats`)
- int `NumberOfFreeSeats` = N
```cpp=
/* Customer */
while (1) {
    wait(accessSeats);              // mutex protecting the number of free seats
    if (NumberOfFreeSeats > 0) {    // if any seat is free
        NumberOfFreeSeats--;        // sit down on a chair
        signal(customers);          // notify the barber
        signal(accessSeats);        // release the lock
        wait(barber);               // wait if the barber is busy
        ...                         // get a haircut
    }
    else {                          // there are no free seats
        signal(accessSeats);        // release the lock on the seats
        ...                         // leave without a haircut
    }
}

/* Barber */
while (1) {
    wait(customers);                // sleep until a customer arrives
    wait(accessSeats);              // mutex protecting the number of free seats
    NumberOfFreeSeats++;            // one chair becomes free
    signal(barber);                 // bring in a customer for a haircut
    signal(accessSeats);            // release the mutex on the chairs
    ...
}
```
### Monitors
#### Mutex Locks
- `pthread_mutex_lock()`, `pthread_mutex_unlock()`
- similar to a semaphore with initial value 1
- A mutex can only be unlocked by the thread that has locked the mutex
#### Monitor
- A monitor can be thought of as a special class (a high-level language construct). Each monitor has local variables subject to race conditions (the shared data) that must be protected, and those variables can only be manipulated through the monitor's methods.
- Condition variable
- Two operations
- `x.wait()`
- `x.signal()`: if a process is waiting, wake it; otherwise it has no effect (unlike a semaphore, whose `signal()` always increments the counter)
- In a monitor, a condition variable is not a counter but a queue: `wait()` and `signal()` correspond to push and pop, so `signal()` on an empty queue does nothing
## Ch. 7 Deadlock
### System Model
- Resources $R_1,R_2,\dots,R_m$: CPU cycles, memory objects, I/O devices...
- Each resource type $R_i$ has $W_i$ instances, e.g. DMA channels...
- Each process utilizes a resource as follows
1. request
2. use
3. release
- Resource Allocation Graph
### Deadlock Characterization
> If a deadlock has occurred, all of the following conditions hold
- Mutual exclusion
- Hold and wait: a process holding some resources is waiting for resources held by other processes
- No preemption: a resource cannot be taken away; the holding process must release it voluntarily
- Circular wait: $P_0$ waits for a resource held by $P_1$, $P_1$ waits for $P_2$'s, ..., and $P_n$ waits for $P_0$'s
- In resource allocation graph
- Resources have multiple instances:Deadlock $\Rightarrow$ Cycle
- Resources have single instances:Deadlock $\Leftrightarrow$ Cycle
- If graph contains no cycles $\Rightarrow$ no deadlock
- If graph contains a cycle $\Rightarrow$
- Resources have single instance, then deadlock
- Resources have multiple instances, then **possible** deadlock
### Handle Deadlock
- ensure the system never enters a deadlock state (prevention or avoidance), e.g. RTOS
- allow the system to enter deadlock, but be able to recover (detection and recovery)
- just ignore it... (the approach taken by most OSes)
### Deadlock Prevention
- blocking at least one of the four conditions required for deadlock to occur
- No mutual exclusion: impossible
- No hold and wait: low concurrency
- No preemption
- when a process holding $R_i$ (the victim) is waiting for another resource, $R_i$ is released as soon as some other process requests it
- the victim is restarted when $R_i$ becomes available again
- requires a checkpoint mechanism -> computationally expensive
- No circular wait
- impose a total or partial ordering on resource allocation
- e.g. if $R_1\rightarrow R_2$ is allowed, then $R_2\rightarrow R_1$ cannot exist
- implemented by the programmer
### Deadlock Avoidance
- predict deadlocks and never let a process enter an unsafe state
#### Avoidance with single instance resources
- annotate the edges of the resource-allocation graph
- Claim edge: the process may request the resource at some time
- Request edge: the process is requesting the resource
- Assignment edge: the process is holding the resource
- whenever a process issues a request, check whether adding the edge would create a cycle; if so, the process must wait
- In commodity OSes, deadlock management has been dropped; it has become the programmer's responsibility to write deadlock-free programs
- But in real-time OS (RTOS), there are still resource-synchronization protocols to avoid deadlocks
- e.g. Highest Locker’s Protocol in RTOS
- when several processes may lock a resource and one of them succeeds, that process's priority is raised to the highest priority among the group; once it releases the resource, its priority is restored
- a mutex can only be unlocked by the process that locked it
- the upper half of the figure shows a deadlock
- the lower half: after L acquires the purple resource it raises its own priority to H's, so when H arrives L is not preempted; after L releases the resource its priority is restored and only then does H run
- no cycle is created: the process that already holds the resource finishes first
#### Avoidance with multiple instance resources
- Safe/unsafe-state method
- safe state: deadlock cannot occur
- unsafe state: deadlock may occur