OS Introduction + CPU Management
Ch. 1 Introduction + I/O
The kernel is built around interrupts
- What is OS
- No definition
- A program that acts as an intermediary between a user of a computer and the computer hardware
- a resource allocator
- a control program
- Everything a vendor ships when you order an operating system
- The one program running at all times on the computer is the kernel
- UNIX OS
- POSIX: a standard for operating systems
- A kernel + some system programs (coreutils + binutils)
- Examples that follow this standard: UNIX, Linux
- Typical PC Organization
- North bridge (memory hub): CPU, Memory, PCIe devices (graphics cards, NVMe SSDs, …)
- South bridge (IO hub): PCI, USB, SCSI, SATA
- IO operation
- polling
- The CPU is involved for the entire operation
- Busy-wait cycle to wait for I/O from device
- CPU performs the data movement
- e.g. GPIO
- interrupt
- The CPU comes back to continue only after the I/O has completed
- Interrupt vector to dispatch interrupt to correct handler
- Interrupt handler (ISR) receives interrupts
- Synchronous I/O vs. Asynchronous I/O
- sync I/O: blocking call, e.g. `read()`, `fread()`
- async I/O: non-blocking call, e.g. `write()`, `fwrite()`
- e.g. `fsync()`: makes file writes synchronous, guaranteeing the data has really been written -> sync I/O
- e.g. `aio_read()`: makes a read asynchronous, used for prefetching; if the application predicts it will need some data later, it can ask the OS to fetch it in advance (a C sketch follows below)
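A minimal C sketch (not from the lecture; the file name `log.txt` and buffer sizes are my own) contrasting the two styles: `fsync()` blocks until the written data is durable, while `aio_read()` queues a read and returns immediately so the program can keep working. Link with `-lrt` on older glibc.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* Synchronous path: write() may only reach the page cache;
       fsync() blocks until the data is actually on the disk. */
    int out = open("log.txt", O_WRONLY | O_CREAT, 0644);
    write(out, "hello\n", 6);
    fsync(out);                      /* returns only after the data is durable */
    close(out);

    /* Asynchronous path: aio_read() queues the read and returns
       immediately, so the CPU can do other work (prefetching). */
    int in = open("log.txt", O_RDONLY);
    char buf[64];
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = in;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    aio_read(&cb);                   /* request queued, call does not block */

    /* ... do other useful work here ... */

    while (aio_error(&cb) == EINPROGRESS)
        ;                            /* poll (or use a completion signal) */
    printf("prefetched %zd bytes\n", aio_return(&cb));
    close(in);
    return 0;
}
```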
Interrupt
- Common function
- hardware: interrupt req. (IRQ)
- IO completion, timer
- async. interrupt
- software: trap, exception
- divide by 0, access violation, system call
- sync. interrupt
- interrupt handling
- Suppose an I/O device raises an interrupt; the signal first goes to the PIC (programmable interrupt controller)
- The PIC forwards the signal to the CPU
- The CPU saves the PC and the flags
- The CPU looks up the IVT (interrupt vector table) to find the address of the ISR to jump to
- The CPU jumps to that address and runs the ISR (interrupt service routine) to handle the interrupt
- Reaching IRET means the handler is done; the PC and flags are restored and control returns to the user program
- Some terms
- ISR (interrupt service routine): the code that handles an interrupt
- Interrupt vector: maps one interrupt to one ISR address
- IVT (interrupt vector table): the collection of interrupt vectors; after receiving an interrupt, the CPU looks up the ISR address here and jumps to it
- IRQ (interrupt request): an interrupt generated by hardware
- PIC (programmable interrupt controller): receives interrupts from I/O devices first and forwards them to the CPU
Direct Memory Access (DMA)
- Data movement of I/O operation
- Option 1: CPU moves data between main memory and I/O local buffers
- Option 2: CPU offloads data movement to DMA
- DMA
- A piece of hardware
- Used to move I/O data to memory quickly
- The CPU tells the DMA controller the source, the destination, and how much data to transfer
- The DMA controller then performs the transfer directly, without CPU involvement
- When the transfer finishes, the DMA controller raises an interrupt to tell the CPU the transfer is complete, so the CPU can do the follow-up processing
Multiprogramming
- Multiprogramming organizes processes so CPU always has one to execute
- Timesharing (Single core case)
- Switch the process running on the CPU frequently (at 100 Hz or more)
- Ways a switch is triggered
- The previous process issues a synchronous I/O
- The previous process has run too long
- Because the switching frequency is high, the processes appear to run simultaneously
- terminology
- Multiprogramming: multiple processes are loaded into memory for execution
- Multitasking: Multiprogramming with overlapped task execution
- Timesharing: multitasking + periodic switching among processes (introduces the notion of time)
Booting
- Bootstrap program
- Typically stored in ROM or EEPROM, generally known as firmware
- Initializes all aspects of system
- also called BIOS
- bootloading procedure
- FFFF:0000: the BIOS entry point; the BIOS then reads sector 0 of the disk (the MBR)
- MBR (master boot record): loads the OS (or the next-stage loader) from the disk
- Boot Manager
- Pre-OS
- OS
- BIOS+MBR has been replaced by UEFI+GPT, an extensive spec.
Ch. 2 System Structures
System Program
- e.g. `cp`, `mv`, `ls`, …
- UNIX = system programs + kernel
- GNU coreutils + binutils
- Programming-language support - Compilers, assemblers, debuggers and interpreters sometimes provided
User Interface
- Shell: the interface program between users and the kernel
System Call
|      | system call   | system API         | system program    |
| ---- | ------------- | ------------------ | ----------------- |
| e.g. | some assembly | `fork()`, `read()` | `cat`, `ls`, `cp` |
- The kernel can touch everything (e.g. all of memory), and a system call is executed via an interrupt, so we rarely use system calls directly; `fork()`, for example, is a system API (see the sketch below)
- Three most common system APIs
- Win32 API for Windows
- POSIX API for UNIX
- Java API for the Java VM
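A minimal Linux/glibc sketch (my own; the message strings are made up) of the distinction in the table above: the POSIX system API `write()` versus issuing the same system call through glibc's `syscall()` wrapper.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>

int main(void) {
    /* System API: the libc wrapper hides the trap into the kernel. */
    const char *api = "via the POSIX API\n";
    write(STDOUT_FILENO, api, strlen(api));

    /* Closer to the raw system call: pass the call number yourself. */
    const char *raw = "via the syscall() wrapper\n";
    syscall(SYS_write, STDOUT_FILENO, raw, strlen(raw));
    return 0;
}
```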
- `int`: an x86 instruction used to trigger an interrupt, here interrupt number 0x2e
- Dual Mode Operations
- kernel mode & user mode
- supported by OS, allowed OS to protect itself against un-trusted code
- some instructions designated as privileged
- An application calls a system call and enters kernel mode through a software interrupt
- System call parameter passing
- Pass them in registers
- Store them in a block (table) in memory and pass the block's address in a register to the system call
- Push them onto the program's stack
- e.g.
- `mov eax, 4`: system call number (4: sys_write)
- `int 0x80`: triggers the interrupt and enters the kernel
- `mov ebx, 0`: return value (the exit status)
- Both the write and the exit are each one system call
OS Design and Implementation
- mechanism & policy
- mechanism: How to do it
- policy: What will be done (or how to choose among options)
- e.g. disk IO
- Mechanism: How to read and write from disk?
- Policy: Which disk I/O operation should be performed first?
OS Structure
Virtual Machine
- Provides an interface identical to the underlying bare hardware
- Now popular because of cloud-based computation
- Different levels of virtualization exist
- Virtualize the hardware, e.g. VMware
- Virtualize the system calls, e.g. running Win32 system calls on a Linux kernel
- Approach: the Win32 call is translated into POSIX calls and passed to the kernel
- Namespace
- A virtualized execution environment
- e.g. Docker, a virtualized file system

- Each virtual machine is isolated from all other virtual machines
- VMware Architecture
- Processor Management
- Only the VMM runs in kernel mode; everything else runs in user mode
- When an application calls a system call, a trap is delivered to the VMM
- The VMM checks it, picks the appropriate guest OS, and forwards the instruction to it
- When the guest OS is done, its IRET is delivered to the VMM
- The VMM then returns to the application to let it know the call has completed
- sensitive / privileged instructions
- Instructions that can only be executed in kernel mode
- If an application wants to execute one, it must first go through the VMM
- Java Virtual Machine (JVM)
- The JVM's ISA is different from that of the underlying machine, so an interpreter is used for translation

- e.g. Android
Ch. 3 Process Concept
- A process uses the following context
- Text section: executable binaries (e.g. function)
- Stack section: function args. + local vars.
- Data section: global vars. + heap
- A program treats a global variable differently depending on whether it has an initial value
- Uninitialized global variables are also called the BSS segment
- heap: the source of dynamically allocated memory
- Program counter and other CPU registers
- e.g.
- stack: `x`, `y`, `c`
- data: `i`, `j`, `d`
- text: `foo`
- `y` points to a block of memory allocated from the heap (reconstructed in C below)
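A hypothetical C reconstruction (mine, not the original slide code) that matches the mapping above: `i`, `j`, `d` are globals in the data/BSS segment, `x`, `y`, `c` are locals on `foo`'s stack frame, `foo` itself lives in the text segment, and `y` points to heap memory.

```c
#include <stdlib.h>

int    i;         /* uninitialized global -> BSS            */
int    j = 1;     /* initialized global   -> data segment   */
double d;         /* uninitialized global -> BSS            */

void foo(int x)   /* foo's machine code   -> text segment   */
{
    int  *y = malloc(4 * sizeof(int)); /* y itself is on the stack;      */
                                       /* the block it points to is heap */
    char  c = 'a';                     /* local -> stack                 */
    y[0] = x + c;
    free(y);
}

int main(void) { foo(i + j + (int)d); return 0; }
```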
- Process schedule

- running -> ready
- Case 1: run out of its time quantum
- Case 2: an I/O interrupt moves a higher-priority process from waiting to ready, and it immediately enters running, pushing out the process that was running
- Process Control Block (PCB)
- Every process has one; it stores the process's information
- It contains
- PID
- Process state
- Saved CPU register values (the state from when it last ran on the CPU)
- CPU scheduling info (e.g., priority)
- Memory-management information (e.g. segment table and page-table base register)
- I/O status info (e.g., opened files)
- Context Switch
- The transition between ready and running
- During a context switch the CPU cannot run any process
- About 2000 ns on an Intel 5150
- Where the time goes: saving and restoring CPU state, plus pipeline stalls and cache costs
- Flow
- Run P0
- P0 is interrupted and gives up the CPU; the CPU is idle for a moment
- Perform the context switch: save P0's state into its PCB and load P1's PCB
- Run P1
- First save the PC (CS + IP) and FLAGS (PSW)
- Then save the general-purpose registers
- Finally store the stack pointer into the PCB (process control block)
Process scheduling
- Scheduler
- Short-term scheduler (or CPU scheduler)
- selects which process should be executed next and allocates CPU
- invoked frequently (every few ms) -> must be fast
- Long-term scheduler (or job scheduler)
- selects which processes should be brought into the ready queue
- controls the degree of multiprogramming
- invoked in seconds or minutes
- Process characteristics: where a process spends most of its time
- I/O-bound: spends more time on I/O
- CPU-bound: spends more time on computation
- A long-term scheduler needs to mix the two types of process well to fully utilize the resources
- Most PCs actually have no long-term scheduler, because the user decides which programs are run
- e.g. MapReduce
- Mid-term scheduler
- When memory is full, decides which processes' memory gets swapped out

- Nowadays largely replaced by virtual memory (a process's memory is split into pages, and pages are evicted based on usage)
Operation on Process
Process creation
- `fork` system call creates a new process; the child is an exact copy of the parent
- `exec` system call is used after a fork to replace the process's memory space with a new program

- kthreadd: kernel daemons
- pdflush: `write()` is async I/O; pdflush is the kernel's background task that writes the buffered data out later
- `fork()` -> `exec()` convention (a C sketch follows below)
- fork duplicates the current process, while exec then replaces the memory image
- fork does not actually copy everything; the copy is done efficiently with memory mapping, handled by the MMU (Memory Management Unit, a piece of hardware)
- Detail: right after the fork, the child only maps to the parent's memory; a block is copied for the child only when the child modifies it. This is called Copy-On-Write (CoW)
- If the CPU has no MMU, `vfork()` is used instead of `fork()`
- vfork suspends the parent and runs the child directly on the parent's stack; if the child then calls exec, a new memory image is allocated for it, the child's data is removed from the stack, and the stack is handed back to the parent, which resumes running
- Multiprocess architecture example: web browser (Chrome)
- Every page/tab is its own process, so one page crashing does not affect the others
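A minimal sketch of the `fork()` -> `exec()` convention (the command `ls -l` is an arbitrary choice of mine): the parent forks, the child replaces its memory image with a new program, and the parent waits to collect the child's return value.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();               /* duplicate the current process      */
    if (pid < 0) { perror("fork"); exit(1); }

    if (pid == 0) {                   /* child: replace the memory image    */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("exec");               /* reached only if exec failed        */
        _exit(127);
    }

    int status;                       /* parent: reap the child's status,   */
    waitpid(pid, &status, 0);         /* otherwise it stays a zombie        */
    printf("child exited with %d\n", WEXITSTATUS(status));
    return 0;
}
```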
Process termination
- Two ways
- The process ends itself with `exit()`; its return value must be collected by the parent
- The parent sends a signal (SIGKILL) to forcibly terminate the child
- zombie (defunct) process
- A process that has terminated but whose return value has not been collected by anyone
- All of its data has been cleaned up, but one entry remains in the process table
- orphan process
- The parent terminates before the child does
- On Linux, the orphan is then adopted by init (pid = 1), which waits for it and collects its return value
- application: using a double fork to avoid zombie processes (a sketch follows below)
- fork twice and let the middle process exit, so the (grand)child is automatically adopted by init
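A minimal sketch (my own, following the standard double-fork idiom) of the trick above: the parent forks a middle process and reaps it immediately; the middle process forks the real worker and exits, so the worker is re-parented to init and its exit status is collected there.

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t mid = fork();
    if (mid == 0) {                    /* middle process                    */
        pid_t worker = fork();
        if (worker == 0) {             /* grandchild: the real work         */
            sleep(1);
            printf("worker %d, parent is now %d\n", getpid(), getppid());
            _exit(0);                  /* reaped by init, never our zombie  */
        }
        _exit(0);                      /* middle process exits immediately  */
    }
    waitpid(mid, NULL, 0);             /* parent reaps only the middle one  */
    sleep(2);                          /* keep running; no wait for worker  */
    return 0;
}
```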
Inter-Process Communication

Shared Memory
- Linux system calls (a sketch follows below)
- `shmget()`: create a block of shared memory
- `shmat()`: attach the shared memory to the current process’s address space
- `shmdt()`: detach the shared memory
- `shmctl()`: control the shared memory (including deleting it)
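A minimal sketch (the `IPC_PRIVATE` key and 4 KiB size are arbitrary choices of mine) of the four calls listed above, with a `fork()` so that parent and child really do share the segment.

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* shmget(): create a 4 KiB shared-memory block */
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

    /* shmat(): map it into this process's address space */
    char *buf = shmat(shmid, NULL, 0);

    if (fork() == 0) {                     /* child writes into the block */
        strcpy(buf, "hello from the child");
        shmdt(buf);                        /* shmdt(): detach             */
        _exit(0);
    }
    wait(NULL);                            /* parent reads after child    */
    printf("parent sees: %s\n", buf);

    shmdt(buf);
    shmctl(shmid, IPC_RMID, NULL);         /* shmctl(): delete the block  */
    return 0;
}
```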
- Producer-consumer problem
- The producer keeps producing and the consumer keeps taking; how can a buffer (shared memory) synchronize the two?
- Constraints
- The buffer size is finite
- Overwriting and null reading are not allowed
- solution: ring (circular) buffer (sketched below)
- Two indices mark the `in` and `out` positions
- `in == out`: empty
- `(in + 1) % size == out`: full (one slot is left unused so that "full" can be told apart from "empty")
- Still a problem: busy-waiting wastes CPU cycles -> requires proper synchronization (see Ch. 6)
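A minimal single-threaded sketch (mine; `BUF_SIZE` of 8 is arbitrary) of the `in`/`out` ring buffer described above. Real producer and consumer processes would place `buf`, `in`, and `out` in shared memory and would still need the synchronization from Ch. 6.

```c
#include <stdbool.h>
#include <stdio.h>

#define BUF_SIZE 8
static int buf[BUF_SIZE];
static int in  = 0;   /* next slot the producer writes  */
static int out = 0;   /* next slot the consumer reads   */

static bool produce(int item) {
    if ((in + 1) % BUF_SIZE == out) return false;  /* full: would overwrite  */
    buf[in] = item;
    in = (in + 1) % BUF_SIZE;
    return true;
}

static bool consume(int *item) {
    if (in == out) return false;                   /* empty: nothing to read */
    *item = buf[out];
    out = (out + 1) % BUF_SIZE;
    return true;
}

int main(void) {
    for (int i = 0; i < 10; i++)
        if (!produce(i)) printf("buffer full at %d\n", i);
    int x;
    while (consume(&x)) printf("consumed %d\n", x);
    return 0;
}
```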
Message Passing
- Processes cannot see each other's data
- A channel needs to be established
- The channel has a buffer
- Empty or full buffers are handled automatically (built-in synchronization)
- e.g. pipe (a sketch follows below)
- A pipe is created and configured by the parent process
- Each side closes the pipe end it does not use (e.g. the writer closes the read end)
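A minimal sketch (the message text is made up) of the parent-creates-pipe pattern: the parent writes, the child reads, and each side closes the pipe end it does not use.

```c
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];                         /* fd[0] = read end, fd[1] = write end */
    pipe(fd);                          /* parent creates and configures it    */

    if (fork() == 0) {                 /* child: reader                       */
        close(fd[1]);                  /* close the unused write end          */
        char msg[64];
        ssize_t n = read(fd[0], msg, sizeof msg - 1);
        msg[n] = '\0';
        printf("child got: %s\n", msg);
        close(fd[0]);
        _exit(0);
    }

    close(fd[0]);                      /* parent: writer, close unused read end */
    const char *msg = "hello over the pipe";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);                      /* EOF for the reader                    */
    wait(NULL);
    return 0;
}
```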
Signal
- A way for processes to communicate (but usually used to handle special situations)
- A signal is to a process what an interrupt is to a CPU
- sync vs. async
- sync
- Triggered by the process itself
- divide overflow, memory-access violations, …
- e.g. SIGSEGV (Memory protection fault), SIGFPE (Arithmetic fault)
- async
- e.g. SIGKILL, SIGSTOP, SIGCHLD (child terminated)
- signal handler: a function specifying how to handle a given signal when it arrives (a sketch follows below)
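A minimal sketch (SIGINT is an arbitrary choice) of installing a signal handler with `sigaction()`; pressing Ctrl-C then runs the handler instead of killing the process.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void handler(int signo) {       /* called asynchronously on SIGINT  */
    (void)signo;
    got_signal = 1;                    /* only async-signal-safe work here */
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigaction(SIGINT, &sa, NULL);      /* install the handler for SIGINT   */

    while (!got_signal)
        pause();                       /* sleep until a signal arrives     */
    printf("caught SIGINT, exiting cleanly\n");
    return 0;
}
```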
Ch. 4 Thread Programming

- A thread is also called a lightweight process; creating a thread is faster, and context switching between threads is also faster
Multithreading Models
- Thread Library
- Pthreads (P for POSIX), Win32 threads, Java threads
- POSIX also specifies the thread-related API for the operating system
- But the Pthread spec does not define the implementation model (for flexibility), i.e. Pthreads can be implemented as 1-1, M-1, or M-M (a usage sketch follows below)
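A minimal Pthreads sketch (the thread count of 4 and the printed message are my choices): create user threads through the library and join them. Build with `-pthread`.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static void *worker(void *arg) {
    long id = (long)arg;
    printf("user thread %ld running\n", id);
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);  /* create user threads */
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);                        /* wait for them       */
    return 0;
}
```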
- User thread vs. kernel thread
- user thread: a thread created through the library's API
- kernel thread: not an official term, just an aid to understanding; it refers to the unit that the kernel's CPU scheduler can assign CPU time to
- mapping of user thread to kernel thread
- Many-to-One
- One-to-One
- Many-to-Many

- Many-to-One
- Many user threads are mapped to one kernel thread
- When the CPU scheduler picks this process, only one thread's worth of CPU time is available; which user thread runs inside the process is decided by the thread library
- pros & cons
- portability: works even if the kernel does not support multithreading, and is not limited to POSIX systems
- If one thread blocks (e.g. calls `read()`), all threads are blocked, because to the kernel there is only one thread and it is blocked in read
- There are wrapper functions that can work around this, though
- No parallelism is possible
- e.g.
- Solaris Green Threads (Java thread)
- GNU Portable Threads: An implementation of the Pthread specification
- One-to-One
- pros & cons
- Require kernel support; less portable
- Parallelism is possible
- Not blocked by another thread's synchronous (blocking) call
- e.g. Windows NT/XP/2000, Linux Pthreads, Solaris 9 and later
- Many-to-Many
- Less common nowadays

- Amdahl’s Law (see the computer organization notes for details)
Threading issue
- `fork()` and `exec()` in a thread: undefined
- signal
- Synchronous signal: delivered to the thread that caused it, e.g. an access violation
- Asynchronous signal: typically delivered to the first thread that does not block the signal, e.g. process termination
Ch. 5 CPU Scheduling

- I/O wait: the fraction of time spent waiting for I/O
- CPU utilization = 1 - p^n, where p is the I/O-wait fraction and n is the degree of multiprogramming
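- Worked example (assuming the classic 1 - p^n model above): with p = 0.8 and n = 4 processes in memory, CPU utilization = 1 - 0.8^4 ≈ 0.59, i.e. roughly 59%.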
- CPU scheduling decisions may take place when a process:
- (1) Switches from running to waiting state (sync I/O, `sleep()`, `yield()`)
- (2) Switches from running to ready state (context switch)
- (3) Switches from waiting to ready (when a high-priority process becomes ready, the scheduler runs and may dispatch it to the CPU)
- (4) Terminates
- Scheduling under only 1 and 4 is non-preemptive or cooperative
- Easy to implement
- e.g. Windows 3.1, Old versions of Mac OS, Sensor OS (e.g., TinyOS 1.0)
- All other scheduling is preemptive (the running process can be forced to give up the CPU)
- Higher responsiveness
- Higher context switch overheads
- Must deal with race conditions (In Ch. 6)
- Dispatcher
- The scheduler decides which process runs next; the dispatcher performs the actual switch, which involves
- switching context
- switching from kernel mode to user mode
- after the process changes, updating the page-table pointer and the program counter (loading the new state)
- Dispatch latency: the time the dispatcher needs to stop one process and start another
- Scheduling time
- Interrupt re-enabling time
- Context switch time
Scheduling Algorithms
Criteria
- (+) CPU utilization – keep the CPU as busy as possible
- (+) Throughput – # of processes that complete their execution per time unit
- (-) Turnaround time – time to execute a particular process
- (-) Waiting time – time waiting in the ready queue
- (-) Responsiveness (interactivity) – time from user sending input until the system delivers a feedback
Algorithms
- First-Come, First-Served (FCFS)

- absolutely fair
- convoy effect: short process behind long process
- A job that would finish quickly still has to wait a long time
- e.g. an I/O-bound process cannot finish its short CPU burst quickly and hand the work back to the I/O device
- Shortest-Job-First (SJF)
- Two schemes:
- Non-preemptive: compare processes by their full burst time
- Preemptive: compare by remaining time; if a newly arrived process needs less time than the current process's remaining time, the CPU is handed to the newcomer
- starvation: a very long process, even if it arrived early, may never finish if short processes keep arriving
- Optimal average waiting time
- Large waiting time variation
- In practice the actual burst time of a process cannot be known in advance -> use prediction
- e.g. exponential moving average (formula below)
- Predict the next burst time from the process's previous burst times
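- (The usual textbook form of this prediction, which I believe these notes refer to: τ_{n+1} = α·t_n + (1 - α)·τ_n, where t_n is the actual length of the n-th burst, τ_n is the previous prediction, and 0 ≤ α ≤ 1 weights recent history.)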
- The long-standing dilemma: efficiency vs. fairness
- Solution: aging (raise a process's priority the longer it waits)
- Priority Scheduling
- Convention: the smaller the number, the higher the priority
- Problem: starvation
- Solution: aging (the longer a process waits, the higher its priority becomes)
- Round Robin (RR)
- Time sharing: each process gets one time quantum (usually 10-100 ms); if it has not finished when the quantum expires, it is forcibly sent back to the ready queue
- large quantum = FIFO
- small quantum: high context-switch overhead
- large quantum: better average turnaround time (the time from a process's arrival to its completion)
- small quantum: better interactivity
- Multilevel Queue
- Split the ready queue into two (or more) queues, each managed by a different policy
- foreground (interactive): RR
- background (batch): FIFO
- Scheduling is also needed between the queues, e.g.
- fixed priority: finish the foreground queue before running background -> starvation
- allocate different shares of CPU time to each queue

- Multilevel Feedback Queue
- Promotion / Demotion: move a process to a different queue
- e.g.

- In the first two queues, a process that uses up its whole time quantum is demoted to the next queue; a process that gives up the CPU early is promoted
- Lets I/O-bound processes get higher priority, improving I/O utilization and interactivity
- Lets CPU-bound processes get a larger time quantum, improving turnaround time and throughput
Multiple Processor Scheduling
Multiprocessor Architecture
- Homogeneous architecture: All CPUs share global memory (SMP, Symmetric MultiProcessing) and contend for memory cycles via the common bus
- Easy
- Low process migration cost
- Bad scalability: with many CPUs, contention for the shared memory becomes too heavy
- Cache coherence: every CPU caches recently used data; with too many CPUs, keeping the caches consistent becomes costly
- Heterogeneous architecture: CPUs have local memory and memory reference is either local or remote (NUMA, Non-Uniform Memory Access)

- More common nowadays
- Good scalability
- High process migration cost
- common topologies: ring, mesh, hypercube, ad-hoc
Multiple Processor Scheduling Algorithm

- partitioned scheduling
- Better suited to NUMA: data does not have to keep moving between nodes
- More common in practice
- e.g. Linux
- Goals:
- Processor affinity
- process migration cost (for NUMA): cache re-population, pipeline re-start, process data transfer
- Ideally each process stays on one fixed CPU
- Load balancing
- The two goals above conflict with each other
Hardware Multithreading
- A thread represents a logical processor, emulated by an independent set of registers
- Hides memory-stall time: while waiting on a memory access, the core can switch to other work

- Two logical CPUs on the same physical core still share that physical core, so for two runnable jobs the placement 1 0 1 0 (one per core) is better than 1 1 0 0 (both on one core)
- Linux strategy: distribute processes evenly, from the bottom (physical cores) up
Real-Time Scheduling
- IEEE definition of real time systems: "A real-time system is a system whose correctness includes its response time as well as its functional correctness."
- Does not have to be fast, but must finish before the deadline
- Soft real-time: missing the deadline is tolerable but should be minimized, e.g. audio/video streaming
- Hard real-time: missing the deadline is catastrophic, e.g. avionics computers
Real-Time Scheduling Algorithm
- Preemption is supported
- Processes have new characteristics: periodic
- The process must be run once every fixed period
- Usually deadline = period
- Rate Monotonic Scheduling
- The shorter the period, the higher the priority

- May still miss deadlines
- Earliest Deadline First Scheduling (EDF)
- The earlier the deadline, the higher the priority

Thread Scheduling
- Light-Weight Process (LWP)
- abstraction for mapping of user threads to kernel threads
- An LWP usually maps 1-1 to a kernel thread

- Two levels of scheduling
- Local Scheduling
- How the thread library schedules user threads onto available LWPs
- Process contention scope (PCS)
- Global Scheduling
- How the kernel schedules kernel threads onto the CPUs
- System contention scope (SCS)
- Pthread API
- Linux's native thread model is 1-1, so only `PTHREAD_SCOPE_SYSTEM` can be set (sketch below)
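A minimal sketch (mine) of setting the contention scope through the Pthread API; on Linux only `PTHREAD_SCOPE_SYSTEM` succeeds, while `PTHREAD_SCOPE_PROCESS` is rejected because of the 1-1 model. Build with `-pthread`.

```c
#include <pthread.h>
#include <stdio.h>

static void *task(void *arg) { (void)arg; return NULL; }

int main(void) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);

    /* Ask for system contention scope (SCS); on Linux this is the only
       scope actually supported, since its thread model is 1-1. */
    int rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc != 0)
        fprintf(stderr, "pthread_attr_setscope failed: %d\n", rc);

    pthread_t tid;
    pthread_create(&tid, &attr, task, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}
```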
Operating System Example
- Solaris
- priority-based, with RR among processes of the same priority
- time quantum expired: the priority the process gets after using up its quantum (a demotion)
- return from sleep: the priority the process gets after returning from sleep (a promotion)
- Linux CFS Scheduling (2.6.23+)
- Virtual runtime (vruntime)
- Each process has its own clock, and different clocks may run at different speeds
- Each time, pick the process with the smallest vruntime to run
- Also called a virtual-clock algorithm
- Nice value: -20 to +19 (priority: high to low)
- Completely Fair Scheduler (CFS)
- CFS favors I/O-bound processes, so their clocks run more slowly
- CFS adjusts how fast a process's vruntime grows based on its nice value and its recent behavior (I/O-bound vs. CPU-bound)
Ch. 6 Synchronization
- Background: concurrent reads and writes of shared data can cause data inconsistency, especially with multiple processes. Earlier we used the two indices `in` and `out` to describe the buffer space; could we instead keep a counter `cnt` holding the number of items in the buffer?
- But `count++` and `count--` each consist of roughly three assembly instructions; because of context switches, preemptive scheduling, and so on, these instructions may not run to completion as one unit, so the result can differ from what we expect (a race condition). A demonstration follows below.
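A minimal sketch (the thread and iteration counts are my choices) that makes the race visible: two threads each increment a shared counter 1,000,000 times with no synchronization, and the final value usually falls well short of 2,000,000 because `count++` is not atomic. Build with `-pthread`.

```c
#include <pthread.h>
#include <stdio.h>

#define ITER 1000000
static long count = 0;                 /* shared data, unprotected */

static void *inc(void *arg) {
    (void)arg;
    for (int i = 0; i < ITER; i++)
        count++;                       /* load, add, store: not atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, inc, NULL);
    pthread_create(&t2, NULL, inc, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Expected 2000000, but the race usually loses updates. */
    printf("count = %ld (expected %d)\n", count, 2 * ITER);
    return 0;
}
```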
Critical Section Problem
- The code segment in which a process accesses shared data is called its critical section
- A solution must satisfy
- Mutual Exclusion: while one process is executing in its critical section, no other process may execute in its critical section
- Progress: if a process wants to enter the critical section and no other process is inside, it must eventually be allowed in
- This rules out solutions that fix the entry order in advance
- e.g. with a fixed order P1 -> P2 -> P3, if P2's own logic rarely enters the critical section, P1 and P3 end up stuck waiting for it
- Bounded Waiting: there must be a bound on how long a process waits to enter the critical section; it cannot be kept queueing forever
Synchronization Hardware
Interrupt Disabling
- Masking interrupts prevents preemption and resolves the race problem
- Disable interrupts before the critical section and re-enable them afterwards
- Commonly used inside the kernel
- Interrupt disabling is a privileged instruction and cannot be used in user mode
- Cannot handle contention among different CPUs in a multiprocessor
Atomic Instructions
- Modern machines provide special atomic hardware instructions, i.e. instructions that cannot be interrupted
- On a multiprocessor, the memory accessed by these atomic instructions is additionally isolated (locked) for the duration of the instruction
- e.g.
- Test memory word and set value (TSL)
- Swap contents of two memory words (XCHG)
- TestAndSet: used to lock a particular section or region of memory
- swap
- TAS / SWAP
- can be used on a uniprocessor or multiprocessor, in user mode or kernel mode (a spinlock sketch follows below)
- issues
- wastes CPU cycles (busy-waiting for the lock to be released)
- the contention is stateless, so starvation is possible
- solves mutual exclusion and progress, but not bounded waiting
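A minimal test-and-set spinlock sketch using C11 `<stdatomic.h>` (the hardware TSL/XCHG instruction is hidden behind `atomic_flag_test_and_set`); it shows both the mutual exclusion and the busy-waiting issue noted above. Build with `-pthread`.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;   /* clear = unlocked      */
static long counter = 0;                      /* shared data           */

static void acquire(void) {
    /* Atomic test-and-set: sets the flag and returns its old value.
       Spin (busy-wait, wasting CPU cycles) until the old value was clear. */
    while (atomic_flag_test_and_set(&lock))
        ;
}

static void release(void) {
    atomic_flag_clear(&lock);                 /* unlock                */
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        acquire();                            /* entry section         */
        counter++;                            /* critical section      */
        release();                            /* exit section          */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}
```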
Semaphores
- Semaphore `S`: an integer variable
- Init: S >= 0
- `wait()`: `S--`; the caller may go from running -> waiting (process state)
- `signal()`: `S++`
- Associated with a waiting queue
- The condition checked inside `signal()` is `S <= 0`: it means S was negative before the `S++`, i.e. some process is blocked; now that there is room, a blocked process P can be released
- Semaphores operations must be atomic
- The role a semaphore plays is determined by its initial value
- Mutex lock: init = 1
- Sequencing or event: init = 0
- Capacity control: init = capacity
- Mutex lock: `Semaphore mutex = 1`
- Suppose P1 enters first: `wait(mutex)` turns `mutex` into 0; if P2 then enters, `mutex` becomes -1 and P2 is blocked
- When P1 finishes its critical section it calls `signal()`, bringing `mutex` back to 0, which releases P2
- Sequencing or event: `Semaphore synch = 0`
- Capacity control: `Semaphore sem = capacity` (a POSIX sketch follows below)
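A minimal sketch using POSIX semaphores (`sem_init`/`sem_wait`/`sem_post` standing in for init/`wait()`/`signal()`; the thread setup is mine) of the mutex-lock use with an initial value of 1. Build with `-pthread`.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t mutex;                    /* semaphore used as a mutex lock */
static long balance = 0;               /* shared data                    */

static void *deposit(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        sem_wait(&mutex);              /* wait(): S--, block if S < 0    */
        balance++;                     /* critical section               */
        sem_post(&mutex);              /* signal(): S++, wake a waiter   */
    }
    return NULL;
}

int main(void) {
    sem_init(&mutex, 0, 1);            /* init = 1 -> mutex lock         */
    pthread_t t1, t2;
    pthread_create(&t1, NULL, deposit, NULL);
    pthread_create(&t2, NULL, deposit, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("balance = %ld (expected 200000)\n", balance);
    sem_destroy(&mutex);
    return 0;
}
```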
- Deadlock
- Let `S` and `Q` be two semaphores initialized to 1
- P0 finishes `wait(S)`, then a context switch lets P1 run `wait(Q)`; when P0 next waits on Q and P1 next waits on S, both processes are stuck
Classic Problems of Synchronization
Bounded-Buffer Problem
- Semaphore `mutex` initialized to 1
- protects the buffer
- allows many consumers and producers
- How do we know when we may take out / put in data? A: after successfully entering the buffer, check whether it is empty / full
- Issue: when the buffer is empty and a consumer enters it first, the consumer must block and wait for the producer to insert something before it can continue -> busy waiting
- Semaphore `full` initialized to 0
- Semaphore `empty` initialized to N (a sketch of the full/empty/mutex solution follows below)
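A minimal sketch (one producer and one consumer thread, buffer size 8, and the item values are my choices) of the classic solution with the three semaphores listed above: `empty` counts free slots, `full` counts filled slots, and `mutex` protects the buffer itself. Build with `-pthread`.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 8
static int   buffer[N];
static int   in = 0, out = 0;
static sem_t empty_slots;              /* init N: free slots             */
static sem_t full_slots;               /* init 0: filled slots           */
static sem_t mutex;                    /* init 1: protects the buffer    */

static void *producer(void *arg) {
    (void)arg;
    for (int item = 0; item < 20; item++) {
        sem_wait(&empty_slots);        /* wait for a free slot           */
        sem_wait(&mutex);
        buffer[in] = item;             /* critical section: insert       */
        in = (in + 1) % N;
        sem_post(&mutex);
        sem_post(&full_slots);         /* announce a new item            */
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < 20; i++) {
        sem_wait(&full_slots);         /* wait for an item               */
        sem_wait(&mutex);
        int item = buffer[out];        /* critical section: remove       */
        out = (out + 1) % N;
        sem_post(&mutex);
        sem_post(&empty_slots);        /* announce a free slot           */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    sem_init(&empty_slots, 0, N);
    sem_init(&full_slots, 0, 0);
    sem_init(&mutex, 0, 1);
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```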
Readers and Writers Problem
Description: A data set is shared among a number of concurrent processes
- Readers: only read the data set; they do not update it
- Writers: can both read and write
- Allow multiple readers to read at the same time.
- Only one single writer can access the shared data at the same time.
- No reader is kept waiting unless a writer has already locked the shared object (once one reader is in, later readers can keep coming in)
- Readers need not synchronize with each other
- Semaphore `wrt` initialized to 1
- Semaphore `mutex` initialized to 1 (to protect `readcount`)
- Integer `readcount` initialized to 0 (a sketch follows below)
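A minimal sketch (the pthreads wrapper code is mine; the reader/writer entry and exit sections follow the textbook first readers-writers solution) using the `wrt`, `mutex`, and `readcount` variables listed above. Build with `-pthread`.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t wrt;                      /* init 1: writers (and first/last reader) */
static sem_t mutex;                    /* init 1: protects readcount              */
static int   readcount = 0;
static int   shared_data = 0;

static void *writer(void *arg) {
    (void)arg;
    sem_wait(&wrt);                    /* exclusive access                        */
    shared_data++;
    sem_post(&wrt);
    return NULL;
}

static void *reader(void *arg) {
    (void)arg;
    sem_wait(&mutex);
    if (++readcount == 1)              /* first reader locks out writers          */
        sem_wait(&wrt);
    sem_post(&mutex);

    printf("read %d\n", shared_data);  /* many readers may be here together       */

    sem_wait(&mutex);
    if (--readcount == 0)              /* last reader lets writers back in        */
        sem_post(&wrt);
    sem_post(&mutex);
    return NULL;
}

int main(void) {
    sem_init(&wrt, 0, 1);
    sem_init(&mutex, 0, 1);
    pthread_t r[3], w;
    pthread_create(&w, NULL, writer, NULL);
    for (int i = 0; i < 3; i++) pthread_create(&r[i], NULL, reader, NULL);
    pthread_join(w, NULL);
    for (int i = 0; i < 3; i++) pthread_join(r[i], NULL);
    return 0;
}
```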
- Writers may starve (the second readers-writers problem)
- Writer-priority version
- When a writer wants to come in, readers arriving afterwards must not update `readcount`, so an extra lock, `readTry`, is added outside `readmutex`
- While a writer is using the resource, `writecount` decides when `readTry` is locked / unlocked
- `wmutex` protects `writecount`
Dining-Philosophers Problem
Description: five philosophers sit around a circular table and each alternates between two activities: eating and thinking. While eating they stop thinking, and while thinking they stop eating. To eat, a philosopher needs both forks next to them. The problem is to design a set of rules such that, even though the philosophers never talk to each other (so nobody knows when anyone else wants to eat or think), they can keep alternating between the two states forever.
- May lead to deadlock (everyone holds their left fork)
- Only allow fewer than 5 philosophers to try to eat at once (just add a counting semaphore)
- Have one philosopher pick up the left fork first while everyone else picks up the right fork first
- Pick up both forks at the same time
Sleeping Barber Problem
Description: a barbershop has N waiting chairs and one barber
- If there are no customers, the barber sleeps
- If all the chairs are taken, an arriving customer simply leaves
- If there is a free chair and the barber is busy, the customer sits down and waits
- If there is a free chair but the barber is asleep, the customer wakes the barber up
- Semaphore `customers` = 0
- counts the customers who are waiting
- the barber waits (sleeps) if `customers` = 0
- Semaphore `barber` = 0
- signals that the barber is ready
- a customer waits if the `barber` is busy
- Semaphore `accessSeats` = 1 (locks `NumberOfFreeSeats`)
- int `NumberOfFreeSeats` = N
Monitors
Mutex Locks
- `pthread_mutex_lock()`, `pthread_mutex_unlock()`
- Similar to a semaphore with an initial value of 1
- A mutex can only be unlocked by the thread that has locked the mutex
Monitor
- A monitor can be thought of as a special class (a high-level language construct). Every monitor has local variables that may suffer race conditions (the shared data in the figure below) and must be protected, and these variables can only be operated on through the monitor's methods.

- Condition variable
- Two operations
- `x.wait()`
- `x.signal()`: if some process is waiting on x, wake it up; if no one is waiting, it does nothing (unlike a semaphore, whose `signal()` still increments the counter)
- Inside a monitor, a condition variable is not a counter but a queue; `wait()` and `signal()` correspond to push and pop, so if the queue is empty `signal()` has no effect (a pthread sketch follows below)
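A minimal monitor-style sketch in C (mine; pthreads has no real monitors, so a `pthread_mutex_t` plays the monitor lock and a `pthread_cond_t` plays the condition variable x): a consumer waits on the condition until a producer signals that an item is ready. Build with `-pthread`.

```c
#include <pthread.h>
#include <stdio.h>

/* "Monitor" state: the shared data plus its lock and condition variable. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  x    = PTHREAD_COND_INITIALIZER;
static int item_ready = 0;

static void *consumer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);         /* enter the monitor                   */
    while (!item_ready)                /* x.wait(): sleep, releasing the lock */
        pthread_cond_wait(&x, &lock);
    printf("consumer: got the item\n");
    pthread_mutex_unlock(&lock);       /* leave the monitor                   */
    return NULL;
}

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    item_ready = 1;
    pthread_cond_signal(&x);           /* x.signal(): wake a waiter (if any)  */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t c, p;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(c, NULL);
    pthread_join(p, NULL);
    return 0;
}
```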
Ch. 7 Deadlock
System Model
- Resource types: CPU cycles, memory objects, I/O devices, …
- Each resource type can have multiple instances, e.g. DMA channels
- Each process utilizes a resource as follows
- request
- use
- release
- Resource Allocation Graph
Deadlock Characterization
If a deadlock has occurred, all of the following conditions hold
- Mutual exclusion
- Hold and wait: a process holding some resources is waiting for resources held by other processes
- No preemption: a resource cannot be taken away; it must be released voluntarily by the process holding it
- Circular wait: P0 is waiting for a resource held by P1, P1 is waiting for a resource held by P2, …, and Pn is waiting for a resource held by P0
- In resource allocation graph

- Resources with multiple instances: deadlock => cycle
- Resources with single instances: deadlock <=> cycle
- If the graph contains no cycle => no deadlock
- If the graph contains a cycle
- and resources have single instances, then deadlock
- and resources have multiple instances, then possibly deadlock
Handle Deadlock
- Guarantee that the system never enters a deadlock state (prevention or avoidance), e.g. RTOS
- Allow the system to enter a deadlock state, but be able to recover (detection and recovery)
- Just ignore the problem… (the approach used by most OSes)
Deadlock Prevention
- Block at least one of the four conditions required for deadlock to occur
- No mutual exclusion: impossible
- No hold and wait: low concurrency
- No preemption
- When a process (the victim) holding a resource is waiting for another resource, the held resource is released whenever another process requests it
- The victim is restarted later, when the resource is available again
- Requires a checkpoint mechanism -> computationally expensive
- No circular wait
- Impose a total ordering or a partial ordering of allocation on resources
- e.g. if every process must request resources in increasing order of that numbering, a circular wait cannot exist
- Implemented by the programmer
Deadlock Avoidance
- Predict deadlocks; never let a process take the system into an unsafe state
Avoidance with single instance resources
- Relabel the edges of the resource-allocation graph
- Claim edge: the process may request the resource at some time in the future
- Request edge: the process is requesting the resource
- Assignment edge: the process is holding the resource
- Whenever a process issues a request, check whether adding that edge creates a cycle in the graph; if it does, the process must wait
- In commodity OSes, deadlock management has been dropped and it has become the programmer’s responsibility to write deadlock-free programs
- But in real-time OS (RTOS), there are still resource-synchronization protocols to avoid deadlocks
- e.g. Highest Locker’s Protocol in RTOS
- When several processes may lock a resource, the process that successfully acquires the lock has its priority raised to the highest priority among that group of processes; once it releases the resource, its priority returns to its original value
- A mutex can only be unlocked by the one who locked it

- The upper half of the figure shows a deadlock
- Lower half: after L acquires the purple resource, it raises its own priority to H's, so when H arrives L is not preempted; when L releases the purple resource it drops back to its original priority, and only then does H run
- Do not create a cycle: let the process that already holds the resource finish first
Avoidance with multiple instance resources
- Safe/unsafe-state method
- safe state: guaranteed to have no deadlock
- unsafe state: a deadlock may occur (a safety-check sketch follows below)
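A compact sketch (mine; the Banker's-algorithm safety check is the standard way to classify safe vs. unsafe states with multiple-instance resources, and the example matrices are made up) that reports whether a state is safe.

```c
#include <stdbool.h>
#include <stdio.h>

#define P 3   /* processes       */
#define R 2   /* resource types  */

/* Example state (made up): current Allocation, remaining Need, Available. */
static int alloc[P][R] = {{1, 0}, {0, 1}, {1, 1}};
static int need [P][R] = {{1, 1}, {1, 0}, {0, 1}};
static int avail[R]    = {1, 0};

static bool is_safe(void) {
    int  work[R];
    bool finished[P] = {false};
    for (int r = 0; r < R; r++) work[r] = avail[r];

    for (int done = 0; done < P; ) {
        bool progressed = false;
        for (int p = 0; p < P; p++) {
            if (finished[p]) continue;
            bool can_run = true;                 /* need[p] <= work ?         */
            for (int r = 0; r < R; r++)
                if (need[p][r] > work[r]) { can_run = false; break; }
            if (can_run) {                       /* pretend p runs to the end */
                for (int r = 0; r < R; r++) work[r] += alloc[p][r];
                finished[p] = true;
                progressed = true;
                done++;
            }
        }
        if (!progressed) return false;           /* nobody can finish: unsafe */
    }
    return true;                                 /* a safe sequence exists    */
}

int main(void) {
    printf("state is %s\n", is_safe() ? "safe" : "unsafe");
    return 0;
}
```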


