作業系統概論 – 張立平

傳教

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

ppt / 考古題
2017 OS note
中正資工 OS

這個網路論壇已經不存在或沒有存取權 QQ

Syllbus

上課會抽人問問題
7-8 次作業
期中/期末

Ch1. Introduction

What is an operating system

使用者與電腦硬體中間幫忙互動的 intermediary program
Goals:
- execute: 幫忙執行使用者的程式
- convenient: 讓使用者更便利的使用
- efficient: 效能
Computer System Structure (分四層)
- Hardware
- OS: 控制與讓硬體跟 app 合作
- Application
- User
  Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
OS 的定義
沒有一個明確的定義
- 資源配置者
  - 效能 vs 公平
- 是一個控制程式
  - control / monitor
  - 避免錯誤或不當使用
- 訂購操作系統時供應商所提供的一切
- 電腦開機時始終會執行的東西
  - kernel
- UNIX OS = kernel + system programs

IO Structure

IO Hardware
- IO devices
- concepts:
  - Port:
    - CPU 只要對特定的 port 做讀寫界可以操作 IO
    - ex. 320 - 32F: disk IO
  - Bus
    - daisy chain
      - CPU處理多個同時中斷要求主要有三種方法之一
      - Daisy Chaining / Pulling / Independent Requesting
    - shared direct access
  - Controller (host adapter)
- IO instructions control devices
- devices address
  - Direct IO instructions
  - Memory-mapped IO (load / store)
- - 傳統看法: Processor 是 OS 的中心
  - 近年有另外一說: Memory 才是 OS 的中心
PC Organization
- 北橋 (memory hub)
  - CPU
  - Memory
  - High-Speed devices (ex. GPU)
- 南橋 (IO hub)
  - IO
  - Network
  - USB
  - HD
  - …
IO Operation
- IO 與 CPU 同時運作
- device controller 有 local buffer
- CPU or DMA 處理 device buffer 與 memory 間的 data 移動
- IO 對 CPU 很慢
  - polling
    - CPU 等 IO 處理完後再往下執行
    - busy-waiting: CPU 沒有意義的等待 or 檢查一件事情
  - interrupt
    - IO device 主動告知執行完，執行期間 CPU 可以處理其他事情 (DMA)
    - interrupt-request
    - 可以被 ignore 或 delay (mask)
    - 常見功能:
      - 錯誤處理
      - 進入 kernel mode
      - timer
    - 過程:
      - 將 interrupted instructure 存進 memory (之後才能回來繼續執行)
      - 同時只能執行一個 interrupt 事件
      - trap: 觸發 by 軟體
      - IRQ: 觸發 by 硬體
- Interrupt Handling
  - 保護 CPU 目前的 state 需要把 registers 與 PC(program counter) 存入 memory
  - 決定哪一種 interrupt，跳到相對應的動作 (Interrupt vector table)
Direct Memory Access
- 需要快速的 IO 會將它放到離 CPU/memory 較近的地方
- ex. PCI: GPU 現在離 CPU 越來越近了
- ex. disk 進 memory 由誰來做?
  - CPU? 會汙染 cache，不適合來做這種東西
  - device (DMA)
Synchronous / Asynchronous IO
- Sync:
  - 在 IO 開始後，process blocking 住，只有在 IO 結束後才會繼續
  - blocking 不代表 CPU 會 hang 住，而是會讓 process 進入 waiting mode 而被 preempt 到其他 process
  - ex. read() / fread() / fsync() (一個會確定都寫入才繼續的 sys call)
- Asyn:
  - IO 開始後，control 回到 program 繼續執行，不會等待 IO 結束 (non-blocking)
  - ex. IO writing (without O_SYNC) write() / fwrite() / aio_read() (多讀入)
  - PS. 寫入時，data 會進入 buffer，程式可以繼續，but data 何時進入 file 就不一定了

Memory / Storage structure

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

Memory Hierarchy
- Main memory
  - large / volatile
  - RAM (random access)
- Secondary storage
  - nonvolatile / no random access
Storage Structure
- hierarchy 階層式
- Caching (快取)
  - 從較慢的 storage 複製到較快的 storage
  - access 時會先檢查 cache，如果有就用，沒有就先複製到 cache 在用
- cache 比 main storage 小
- 需求
  - 更有效的 lookup service
  - replacement policy 減少 cache miss
- Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →

Operating-System structure

電腦開機
- Bootstrap program
  - 開機時載入
  - 存在 ROM 或 EEPROM(firmware)
  - load OS kernel
  - BIOS + MBR / UEFI + GTP
Multiprogramming
- 同時載入多個 job
  - 單一使用者不可長時占用 CPU / IO
  - 對單一 Job 來看，只有 OS 和自己
- Timesharing (multitasking)
  - 程式可以交錯執行
  - knowage:
    - process
    - CPU scheduling
    - swapping
    - virtual memory
    - etc.
- 差別
  - Multiprogramming
    - 多工，同時載入多個 job 進 main memory
    - (不一定可以同時執行/分時多工/interrupt)
  - Multitasking
    - Job 可交錯執行
    - 不一定適用 timesharing 方式分工
  - Timesharing
    - 一定要有 timer，會 interrupt
    - multitasking + periodic switch
  - Multiprocessing
    - 一個機器有多個 processor
Dual Mode Operations
- user space -> kernel space
- Interrupt 為硬體驅動
- exception or trap 為軟體造成
- 硬體(CPU) 需要支援
- 把控制權轉移到 kernel
- 可以讓 OS 自我保護並保護其他 component

Three Pillars

Process Management
- process 是指執行中的 program
- 需要資源
  - CPU time
  - IO
  - files / data storage
- Scheduling
  - create / delete / suspending / resuming
  - communication
  - synchronization (多 process 時)
  - deadlock handling
Memory Management
- 少數 CPU 可以直接 access 的地方
- 從有 data / instruction
- 操作
  - allocating / deallocating => memory fragment (記憶體破碎)
  - swapping / paging
  - protection (追蹤哪段 mem 是哪個 proc 正在使用)
Storage Management
- OS 統一邏輯的看到 (file style)
  - logical storage -> abstract physical properties
- file-system management
  - accessw control
- 操作
  - create / delete
  - read / write
  - mapping 從 secondary storage 到 main memory
- Mass Storage management
- IO Subsystem
  - buffing
  - caching
  - spooling (後台 ex. printer)

Ch2. System Structures

提供 Service
- User Interface
  - GUI
  - CLI
- Program execution
- IO
- File system
- Communications
- Error detection
- efficient

OS user interface

CLI
- shell
GUI
- 所視即所得
touchscreen

System Call

programming interface
多是被包裝好的 high level API (Application Program Interface)，而不會直接呼叫
- 軟體抽象化
- 可攜性、簡單性
- (早期是用組語直接呼叫)
- ex. WIN32 API, POSIX API, Java API
Implementation
- Interrupt 通常會帶入一個中斷編號
- system-call interface 會維護 interrupt vector table 來查找中斷編號的意義
- app 不需要知道 system call 實作方法
Dual Mode Operation
- trap
- kernel <-> user mode
- 可以保護系統與其他使用者/元件
- 硬體提供 mode bit
- system call 只有在 interrupt 完後，進入 kernel mode 才開始執行
- 參數傳遞 (三種方法)
  - by registers
  - 存在 memory 的 block 或 table，再傳遞 memory address
  - push 進 stack
system call 種類
- process control
- file management
- device management
- information maintaenance
- communications

OS design and implementation

goals / spec.
- user goals
- system goals
hardware / type of system
different goals:
- ex. RTOS (Real-time OS): 時間預測性，低延遲
- ex. Mainframe OS: 帶寬，公平性
原則
- mechanism: 決定如何做
- policy: 決定下一個該做哪個 job

OS Structure

Simple Structure
- ex. MS-DOS / IoT
- 沒有 modules 的設計
- 快
- 沒有保護，應用程式可以直接 access hardware
- 沒有 dual mode (user / kernel mode) => 不用 system call / interrupt => 快
- 系統錯誤 (ex. 記憶體使用錯誤) 會造成整個系統 crash
UNIX – monolithic
- 目前最受歡迎的架構
- 分兩部分
  - systems programs
    - ex. binutils, cc, as, ld
  - kernel
    - kernel 中沒有再明確的架構區分
Layered Approach
- 目前沒有 OS 把它實作出來
- debug 與 scale 都很方便
- layer 是很難定義的
- 程式間的 depandency 太高
- ex. storage controler 需要 memroy，但是 memory 需要 storage 做 swapping
- modulize
  - 再過去，需要加入新的 device 時，需要將 device driver code patch 進 kernel code，才能載入，重新開機後才能執行
  - module 將 driver code 事先就 compile 進 kernel 帶是沒有載入，只有當硬體插入後，才會 load 進 kernel
  - .ko: kernel object
Microkernel system
- 上面的 kernel 東西太多了，希望只把有必要的東西放進 kernel，其他的東西都放到 service (user space)
- message passing: 不同 app 溝通
- 好處
  - 更容易加功能
  - 移植更簡單
  - 可靠 / 安全
    - Monolithic 所有 sys call 都可能是漏洞，Microkernel 的 sys call 很少
- 缺點
  - message pass 都要中斷返回，解資料是用 copy/shared memory 的
  - ex. windows NT 3.5
Virtual Machine
- layered approach (分層)
- 虛擬化的硬體 (hypervisor)
- 需要將 phy hardware 分享給 VM 使用
- 可以提供 isolate
- guest OS 與 host OS 的 CPU instructure set 相同

Ch3. Processes-Concept

Concept

A program in execution 執行中的程式
包含
- text section
- program counter
- stack
- data section
- heap
Process State
Process 不可能一直在 CPU 上工作
- new
- running
- ready
- waiting
- terminated
State Changage
- Running -> waiting
  - IO / event waiting
- Running -> Ready
  - timer interrupt
  - 有高 priority process 進入 ready，進而搶佔進 running
- Waiting 不會直接進入 Running
Process Control Block (PCB)
- 每個 process 有關的資訊
  - process state
  - registers value
  - CPU scheduling info. (ex. priority)
  - Mem-management info.
    - ex. segment table
    - ex. page-table base register
  - Accounting info. (quota)
  - IO status
Context Switch
- 將 CPU 的執行焦點轉移
- CPU switch to another process
- OS 需要儲存 old process 的 state
- switch 時 CPU 不能使用
- Context Switch 一定要用組語來寫

Process Scheduling

Scheduling Qqueu
- Ready queue
- Device queue (等待 IO)
- Event queue (等待 event ex. semaphore)
schedule 種類
- long-term scheduler (job scheduler)
  - 哪一個 process 該進入 ready queue
  - 不常見，以秒/分為單位
  - server
  - 控制 degree of multiprogramming
  - 兩種 bound 的 process 合理的混合載入 memory 做 multiprogram
  - 適用 batch-processing systems
  - PC (timesharing system) 不該使用 long-term
    - 會物理限制 degree of multiprogramming
    - 如果使用者長時間搶占不到 CPU…
- short-term scheduler (CPU scheduler)
  - 發生頻率高 => 效能要好 => 最好是 O(1) / O(log)
  - 哪一個 process 該接下來使用 CPU
  - timer interrupt
- Medium term scheduling
  - 觀察是否有長時間 waiting 的 process，將它的 memory swapout 到 disk
Process 可分為
- IO bound
  - 大多數時間在等待 IO
  - CPU priority 較高 (因為 CPU 使用量較少)
- CPU bound
  - 大多是時間在使用 CPU
  - CPU priority 較低
  - ex. 轉檔 / 科學計算

Process 的運作

Creation
- 父程序 create 子程序
- 最終會形成樹狀結構
- 資源分配
  - 父子分享所有資源
  - 子使用父資源的 subset
  - 不分享資源 <- 現行做法
- 執行
  - 同時執行
  - 父等待子結束後才繼續執行
- process creation
  - address space
    - 子複製父的 (duplicate)
    - 然後子程序 load 進剛剛複製的 resource
    - 把父的 data copy 來
  - ex.
    - fork()
    - exec()
Termination
- C compiler 會在 main() {} 最後 "}" 自行插入 exit();
- 要求 OS 刪掉它 (exit)
  - 清 memory (de-allocated)
  - return exit value 給 parent (parent 利用 wait 收到)
  - synchronous termination (wait)
- parent 主動刪除 child
  - kill
  - asynchronous termination (signal handling)
- 如果 parent 比 child 早結束
  - child 會變成孤兒 (orphan process)
  - 會被 grand-parent 接管
  - 通常會被 init / systemd 接管
  - init / systemd 實作上會再 kill 掉 orphan
Orphan & Zombie
- orphan
  - parent 提前結束，child 還沒死亡
  - orphan 是短暫狀態，會很快被 grandparent 接管
- zombie
  - parent 沒有收回回傳值 (parent 還沒死亡)
  - retern value 會卡在 process table (memory) 中，造成不必要的空間浪費
  - 實作方法: fork 出 child 後很快 exit，而 parent 沒有 wait，同時讓 parent 活很久
fork / vfork / clone
- 參考
- fork: 完全複製出記憶體資料，記憶體位置不同，連 PC 都不一樣的 process
- vfork: 子程序完全跑在父程序的 memory 上，父會 blocking 直到子跑完
- clone: 有條件的 clone 出來

Inter-Process communication (IPC)

message passing
- 有同步效果，可以協調兩端 speed (一端需要等待另外一端塞入資料後才能運行)
- 花時間 copy / sys call
- IPC-Message passing
  - pipe()
    - 接收者應該關掉 output side，利用 input side 接收
    - 傳送者應該關掉 input side，利用 output side 傳送
  - signal
    - sync.
      - ex. /0, overflow
      - SIGSEGV, SIGFPE
    - asyn.
      - ex. kill
      - SIGKILL, SIGSTOP, SIGCHLD (child 用來送給 parent)
    - signal handler
      - default handler: C compiler 幫加了
    - 對象是 process
shared memory
- 適合交換大量資料
- 需要傳送 signal 才能讓對面知道已經寫完了，可以讀取了
- IPC-Shared memory
  - shmget(): create
  - shmat(): attach
  - shmdt: detach
  - shmctl: control (通常是用來 delete)
- Producer-Consumer Problem
  - Producer: 提供者，放入 shm 的人
  - Consumer: 消耗者，提出 shm 的人
  - 用 shm 來同步 prod / cons
  - 問題: buffer size 是有限的，如何確保得知 buffer 已滿 / 已空
    - bounded buffer (環狀設計)
  - 問題二: 如何確定 full / empty?
    - 如果用 counter 來處理，會有 race condition
    - in = out: 空
    - in = (out + 1) % n: 滿
    - 以上實做可能會出現 busy-waiting 問題

Ch4. Multithreaded

Overview

thread 的 global value (text, data) 共享，local value (stack, heap) 是獨立的
thread 好處
- 允許在其他 UI 執行時這個 UI input
- 資源共享 (share code) (multiprocess 會 copy 一份)
- 節能 (thread 比 process 輕量)
- 如果有 multi-core，有機會拿來提升效能
- 就算只有 single-core，有時候 multithread 可以幫助寫 code 時的邏輯
thread 有自己的
- stack pointer
- register
- shceduling properties
- etc.
user thread / kernel thread
- user thread 和 kernel thread 的 mapping 方法不再 spec. 上

MultiThreading models

thread 是否可以被 kernel 看到?
Kernel 看到的 thread 數量不必然等於 user space 上的 thread 數量

主要的幾個 thread libraries
- Pthreads (POSIX 中，UNIX 家族的 sys call api)
  - spec. 上沒有決定如何實作
  - 1-1, M-1, M-M
- Win32 threads
- Java threads
Many-to-One
- kernel 看不到 user thread => user thread 會自己 scheduling (timesharing) => 沒有真的平行的效果
- thread 內部自己輪流
- 若其中一個 thread 執行到 blocking code，整個 process 的 thread 都會卡死 (process 看起來還是只有一個)
- ex. GNU portable threads (pthread)
One-to-One
- 每一 user thread map 到每一 kernel thread
Many-to-Many
- 一個 thread 不會 block 到另外一個 thread
- OS 只需要製造出足夠的 kernel thread (比 1-1 有效率)
Two-level model
- 跟 M:M 很像，但是可以同時多種 model
Light-Weight Processes (LWP)
- optional
- 像是虛擬化的 processors

Thread libraries

Pthread
- POSIX
- 只有 spec. 沒有如何時做
- 大概有三種實作:
  - GNU portable thread
  - Linux Pthread
  - Mac OSX Pthread
Win32 thread

Issue

fork / exec
- fork 會 fork 單一 thread 還是所有的 thread 都會一同 fork?
  - spec. 沒定義
- exec?
  - 同樣 spec. 沒定義
Signal Handling
- 如果有一個 thread 觸發 signal，會由誰來接收
- Synchronous signal (自己觸發的 signal)
  - 會交由觸發的 thread 接收 signal
  - ex. 錯誤存取
- Asynchronous signal (外部觸發)
  - 通常交由第一個沒有被 block 的 thread 處理
  - ex. 程序終止
Thread Pools
- 製造一個 pools 的 thread 等待使用
- 好處
  - 比要的時候才製造快
  - 允許應用程序中的 thread 數綁定到池的大小
  - 類似於多 M:M model 的概念

Ch5. CPU Scheduling

Amdahl's law: 平行程式的效能提升一定不是線性的

Basic Concept

何謂 CPU utilization
CPU-IO burst cycle
CPU burst 分配
CPU Scheduler (short-term scheduler)
- 從 ready queue 選出一個 process 來跑
- 發生時機
  - from Running to waiting -> 自己願意把 CPU 放出來
  - from running to ready -> time slice 用完
  - from waiting to ready -> 出現 priority 較高的 ready process
  - terminates -> 結束
- 上面 2 跟 3 都是需要 preemptive(搶佔) 的
- 種類
  - cooperative scheduling (合作型)
    - 不需要而外 HW (ex. timer) 來處理
    - process 主動釋放 CPU
      - sleep(0) or vield() -> 都是釋放 CPU 的 sys call
      - blocking call 也可以造成 context switch
    - 如果 ill-behaved(ex. 進入無窮迴圈) process 可以造成系統其他 process 永遠搶不到 CPU
  - preemptive scheduling
    - 更高的響應
    - 更高的 overheads
    - 需要處理 race conditions
- dispatcher (調度員)
  - 處理 CPU 的 process selected (short-term)
  - dispatch latency: dispatcher 在停止一個 process，開啟另外一個 process 間，需要花費的時間

Scheduling Criteria (調度標準)

criteria
- CPU utilization(使用率): 越高越好
- Throughput(單位時間內完成的 process 數量): 越高越好
- turnaround time(完成一個 process 所需要的總時間): 越低越好
- waiting time(一個 process 完成前，待在 ready queue 的時間): 越低越好
- response time(arrive 到產生第一個 response 所經過的平均時間): 沒有很正式的定義
- waiting + bursting = turnaround

Scheduling Algorithms

FCFS
- First Come First Served
- non-preemptible
- 不搶占，但也不會讓 CPU 進入 IDLE
- 不會 starvation
- IO-bound 的 process 可能因為高 waiting time 而很晚才進入 IO
- 公平、CPU utilization 高、throughput 低
SJF
- Shortest Job First
- 兩種
  - non-preemptive
  - preemptive: 新的 process 進 ready queue 時，都檢查一下現在誰是 shortest job
- 只看平均 waiting SJF 一定是最短的
- 可能會有 starvation
- 不公平
如何預測下一個 CPU burst time
- 幾乎不可能
- 利用簡單線性方程式
  $τ_{n + 1} = α t_{n} + (1 - α) τ_{n}$
  - $0 \leq α \leq 1$ (係數)
  - $τ_{n}$ 第 n 個 process 的預測 burst time
  - $t_{n}$ 第 n 個 process 的實際 burst time
- 試想
  - $α = 0 \equiv τ_{n + 1} = τ_{n}$ : 歷史真實記錄無效
  - $α = 1 \equiv τ_{n + 1} = t_{n}$ : 最近一個 burst time 完全影響下一次的預測
  - 展開原式，會發現越接近最近的 burst time，越影響預測
Priority Scheduling
- ex. SJF: 越短的 job priority 越高
- 分 preemptive 與 non-preemptive
- 可能會 starvation
- 解法: aging (會根據時間改變 priority)
Round Robin (RR)
- 單位 CPU 的排程
- time quantum (time slice)，超過就被搶占 / 進入 ready queue
- RR 的 ready queue 適用 FCFS 的
- 當有 n 個 process 在 ready queue，可以預測至少 (n-1) * q 的 time unit 後可以輪到 (q 是 time quantum)
- 如果 q 小，call interrupt 的 overhead 就會很大 => 不划算
- 不會 starvation
MultiLevel Queue
- ready queue 被分為多種類
- ex.
  - Foreground - RR
  - Background - FCFS
- process 會根據性質進入不同的 queue
- Feedback Queue
  - process 可以在不同的 queue 間移動 (aging 可以利用這個性質實做)
  - 前面的 queue 拿完，才進後面的 queue
  - upgrade / demote
  - Ex.
    - Q0 - RR 8 ms
    - Q1 - RR 16 ms
    - Q2 - FCFS
    - schedule: 新的 job 進入 Q0，拿到 CPU time 後開始執行，如果在時限內完成，則結束，如果沒有，會被移動到 Q1。進入 Q1 後同理
    - 通常最後會造成 IO-bound 被推到前面(這種 process 常是跟 user 互動的)，CPU bound 被推到後面
    - 公平，不會 starvation

Multiple Processor Scheduling

種類
- SMP (Symmetric MultiProcessing)
  - 對稱式架構，每顆 CPU 同樣重要
  - 共用 main memory
  - 透過 bus 溝通，要設計溝通方法
  - 每顆 CPU 自己處理自己的 scheduling
  - 可以共用 ready queue，也可以有自己的 ready queue
  - Process 數量小時效能好，架構簡單，擴展性差，cache 如何設計，race condition
- NUMA (Non-Uniform Memory Access)
  - Asymmetric multiprocessing
  - 只有一個 processor 訪問系統數據結構，減少了 data sharing 的需要
  - master - slave 架構
  - 擴展性更大
  - 常見建法: 超立方(hypercube): 保證點與點間最長距離 log(n)
Multiple processor scheduling
- processor affinity(親和力)
  - process migration(從一顆 CPU 換到另外一顆 CPU) 的 cost 很高，有時候把 process stick 在同一個 processor 反而比較好
  - soft affinity / hard affinity
- load balancing
  - push migration: 將 process 從高載的 CPU 推到低載的 CPU 的 ready queue
  - pull migration: 將 waiting process 拉進低載的 CPU 的 ready queue
  - Linux: 每 200ms 跑一次 push migration / 每次有 empty queue 跑一次 pull migration
- process migration 因為 LB 而常用，因為 cache 而少用 (互相衝突)
multicore vs multithreading
- multiple processor cores
  - 在同一個 chip 上
  - 快速溝通 / 減少功耗 / cache sharing
- multiple hardware threads (per core)
  - Takes advantage of memory stall to make progress on another thread while memory retrieve happens
  - A thread here is hardware-oriented, different from “threads” in Chapter 4
  - A thread corresponds to a logical processor, emulated by an independent set of registers
  - 當有多個 thread 時，可以交錯 CPU cycle 與 memory cycle
  - 個別 register，共用 ALU
  - 因為 instruction set 有相依性，所以如果邏輯上有多個 CPU，可以交錯執行而減少 Idle time

Real-Time Scheduling

限制 waiting time 不能超過 n
“A real-time system is a system whose correctness includes its response time as well as its functional correctness.” – IEEE

兩種類
- soft real-time systems
  - 不保證何時安排 critical real-time process
  - ex. packet 要在 streaming buffer 用完前送到並重組 (reassamble)
- hard real-time systems
  - 要在 deadline 前執行完畢
  - ex. 飛機航空系統 (控制)
Priority-based Scheduling
- 需要有 preemptive 與 priority based scheduling
- 但只保證 soft-realtime
- periodic process: process 需要 CPU 以恆定間隔進行
- processing time: t, deadline d, period p
- $0 \leq t \leq d \leq p$
Rate Montonic Scheduling (RM Scheduling)
- rate(較短的 period) 較高的 process 有較高的 priority
- ex. P1 (t=20, p=50), P2 (t=35, p=100), P1 較高 priority
- 會出現 missed deadlines
Earliest Deadline First Scheduling (EDF)
- CPU 在任何時期都該先做 deadline 較進的 process

Thread Scheduling

Local Scheduling
- thread library 決定哪一個要放入 LWP
- Process Contention Scope (PCS)
Global Scheduling
- Kernel 決定下一個 run 的 kernel thread
- System Contention Scope (SCS)

OS example

Solaris
- RR
- 60 條 RR queue
- 提早交出 CPU 的 priority 提前
- time slice 用完的 priority 退後
Windows
- 不重要
Linux
- Completely Fair Scheduler (CFS)
- virtual runtime
  - 每個 process 有自己的 runtime
  - 每使用 n 個 CPU 時間，runtime 就 +n (用這個 n 來控制成長速度(priority 的一種))
  - 每次從 RBTree (將 process 以 runtime 建立 RBTree) 中拿出 vRT 小的 process 來執行
  - 利用 vRT 成長速率不同，控制 process 使用真的 CPU Time 的長度
    - vRT slow: RT 成長慢 -> 較常使用 CPU
    - vRT fast: RT 成長快 -> RT 一上升，就無法使用 CPU Time
- nice value
  - -20 ~ 19 (high ~ low priority)
  - 小 nice 值會讓 vRT 成長慢，可得到更多 CPU time

Ch6. Synchronization

主要問題: 某個操作是由很多小操作組成，而多個小操作不能保證連續過程中不被搶占
ex. Producer-Consumer Problem 利用 counter 來操作
Race Condition

Critical-Section Problem

正在操作被共用的東西時，需要連續而不會被搶占(interrupt)的操作
解法三要素
- Mutual Exclusion: 排他性，不可被搶占
- Progress: 如果 Pj 想使用 resource 而目前沒有 Process 相要使用此 resource，Pj 應該要能夠使用此 resource
- Bounded Waiting: 因為 critical section 而進入 waiting 的時間要是可以預測的 (ex. Pj 在 waiting 時，因為 priority 低而永遠進不去)

Hardware-based approach

synchronization hardware
- 需要有能停止 context switch (interrupt) 的 hardware
- Interrupt disabling
  - 設計簡單，小系統上好用
  - 問題
    1. 如果 multiprocessor 無效
    2. privilege instruction 無法在 user mode 使用
    3. 程式寫太差了話，造成 interrupt leg 太久(interrupt disable 時)，signal 送不進來
  - Atomic Instruction
    - Atomic = non-interruptable
    - spin locks
- Test and set or swap
  - uniprocessor 與 multiprocessor 都可行
  - 透過 Atomic Instruction 幫忙，此時測試並設定 memory 中的 value
  - 透過 Atomic Instruction 幫忙，此時 swap value
．．．

Ch.7 Deadlock

討論 deadlock 需要一些預設的環境

System Model:

資源： CPU, memory, … (R1, R2, …)
權重：對每個 R1 有 W1 權重
資源效用：
1. request
2. use
3. release
Deadlock 條件(需同時發生)：
- Mutual execlusion: 一個資源一次只能讓一個 process 用
- Hold and wait: process 同時拿到一些資源，也等待一些資源
- No preemption: 無法強佔 (如果可以，則可以用搶佔的方式把所有資源拿來跑，跑完還回去繼續用)
- Circular wait: 環狀等待
將條件用 Graph 表示
- $P = P_{1}, P_{2}, . . .$ process 用 P set 表示
- $R = R_{1}, R_{2}, . . .$ Resourse 用 R set 表示
- Request Edge: Pi -> Rj Pi 需要 Rj 資源
- Assignment Edge: Rj -> Pi Rj 資源給予 Pi 使用
- ex.

Deadlock handling

detection or prevention or avoidance (偵測與避免)

Prevention

只要 deadlock 四條件其中一個保證不成立就好

Mutual Exclusion
- 只要是 serially reusable resources 就一定是 mutual exclusion => 無法在這點上避免 deadlock
Hold and Wait
- 可以嘗試保證在 requesting 資源時，沒有 holding 其他資源
- 資源全有全無
- Low concurrency among processes due to long critical sections
  - 用一個長期執行(long critical sections)的 process 佔用資源，可以在執行期間降低 concurrency(並發性) (執行完就不知道了…)
- 缺點：平行度下降
No Preemption
- 可以用 preemption 的機制把被別人 holding 的資源搶走
- victim (被搶的 process) 要在資源回來時 restart
- 需要有 checkpoint 的機制 (reg copy to memory, …)
Circular Wait
- 把 request / assig 變成 DAG (有向無環圖)
- 建立一個 total order 或 partial order，只能增序或降序要求資源

Avoidence

基於 cycle detection，在給予資源時，先做檢查

Graph
- Claim edge: May use a resource at some time
- Request Edge: Requesting a resource
- Assignment Edge: holding a resource
- 資源可能會在一段時間被 claim (但不一定會被用 hold) (ex. memory allocate 會先要求大一些空間)
1-Instance Resources (每次更改時，檢查是否有環)
1. 確認所有 claim edges (預測)
2. 當 process 真的要求時，將 claim 改成 request edge
3. 當 resource 可用時，將 request 改成 assignment edge 並重新檢查是否有環
4. 如果檢查發現有環，將 request 改回 assignment edge => 沒有環
另外一種方法：
- 用 highest level lock 來鎖著 cycle
N-Instance Resources
- graph-based approach
- safe / unsafe-state method
  - safe sys: 沒有 deadlock
  - 給予資源時都要持續保證保持在 safe-state
  - 需要定義 safe-state
Safe-State
- Process 需要 resources 時，系統應該馬上知道給予資源是否會破壞 stafe state
- safe sequence 可以維護 safe state
- safe sequence 中的 process 可以 request 的資源是目前可以使用的資源或前面的 process 握住的資源
  - 如果 request 前面 process 正在使用的資源，當那個 process 結束後，馬上可以給這個 process 使用

Banker's Algorithm

對於 N-Instances，需要預測做大要求資源，等待拿到資源，以及在拿到所有要求資源後，要在時限內環回來

Data
- Available: Avai[j] = k (紀錄 type j 的 Resource 有多少資源可用)
- Max: M[i, j] (Pi 最大會要求多少 Resource j)
- Allocation: Allo[i, j] (紀錄 Pi 已經佔用了多少 Rj)
- Need: N[i, j] (紀錄 Pi 還需要多少 Rj)
Need[i, j] = Max[i, j] - Allo[i, j]
Safety Algo.
- Max - Allo = Need (為什麼要用最大需求：如果可以符合最大需求，就一定可以符合比最大需求小的一班需求)
- => 算出 Need，排出每次 Available > Need 的序列，並且當次用完後，Need 可被加回 Available
- 如果可以找到上述序列，就可以用這個序列保證 saft-state

Detection

Detection & Recovery (週期性檢查)

Single Instance
- 維護 wait-for graph
  - Pi -> Pj: Pi 等待 Pj 用完 Resource (per resource)
- 定期檢查 cycle
- 拓墣排序用
  $n^{2}$ 檢查 cycle
- Resource-Allocation Graph -> Wait-for Graph
- Detection Algo.
  - 一樣找出 available / allocation / request
  - 方法：
    - 看不大懂 @@a
- Deadlock detection 保證 safe state，unsafe 不保證一定會 deadlock
- 如果 detect 到 => roll back 一些 process
Recovery
- Kill Process
  - 選擇 order
- Preemption
  - victim selection (minimize cost)
  - Rollback
  - 要避免 Starvation (因為用了 preemption)

Summary

Deadlock => 四條件 (mutual exclusion, hold and wait, non-preemptible, circular wait)
四條件擇一不成立 => 必定無 deadlock
prevention
- 擇一取消
avoidance
- 拒絕會產生 DL 的 resource request
- 1-instance per resource
  - DL <-> cycle
  - safe state <-> no DL
- N-instance
  - DL -> cycle
  - safe state -> no DL
  - DL -> unsafe
- safety check

Ch.8 Memory Management

Address binding

程式必須被帶入 memeory 並載入 processor 才可以變成程序
如何將 data / instruction 從 disk 帶入 memory

Binding inst. / data to Memory，有三種時間會 bind to mem.
- Compile time
  - static link: compile 時就知道放到 MM 的哪裡 (絕對位置)
  - 如果位置改變就要重新 compile
- Load time
  - 如果不知道執行時的 memory address，compiler 就必須寫入記憶體相對位置，在 load program 時才決定位置
  - relocatable instructions
  - non-relocatable instructions
- Execution time
  - dynamic link
    - .dll / .so 檔
    - 當程式用到時才會載入，每次載入時 address 都會不同，可以多個 process 共用 ex. glibc
    - address table / GOT 處理 dynamic linking address 位址
  - 安全
  - 不同種類的 memory management
    - Logical address: CPU 產出的，也叫做 virtual memory
    - Physical address: memory unit 看到的
  - User program 永遠只看得到 logical address
  - 需要 hardware 幫忙轉 logical 成 physical
MMU (Memory Management Unit)
- CPU 中將 logical 轉為 physical address
- Paging
  - Address mapping
  - Virtual memory
- Segmentation
  - Memory protection

Contiguous Memory Allocation

連續分派
- MM 通常會被分成兩部分
  - OS 保留，low memory 上，包含 interrupt vector
  - User processes
- Single-partition allocation
  - Relocation 可以讓 user process 互相隔離，並且益於更改環境(系統)與資料
  - Limit register 紀載 logical address range 並限制使用大小
Memory Allocation
- 給 process 的記憶體空間一定要連續
- 然，連續給出 process 再隨 process 結束後收回，會出現 free memory 片段(不連續)問題
- 如何安全且有效率地給出 free holes
  - First fit
    - 效率
  - Best fit: 給出最小符合
    - 盡量保留最大連續空間，但是剩餘的小空間會非常難使用
  - Worst-fit: 給出最大符合
    - 與上相反，通常會留下很多差不多大小的洞
    - 速度與使用效率都是最差的
  - Buddy System: 都已
    $2^{n}$ 給記憶體 (類似切網段)
    - 速度快，破碎化不一定多
Fragmentation
- External Fragmentation
  - 切割零碎，雖然剩下的加總記憶體空間夠程序使用，但連續空間不夠程序使用，使得程序無法執行
- Internal Fragmentation
  - 作業系統配置給 process 的 memory 空間大於 process 真正所需的空間，這些多出來的空間該 process 用不到，而且也沒辦法供其他 process 使用，形成浪費。(配合 paging / buddy system)

Paging

當出現 Fragmentation 問題時，是否可以將零碎的空間收回，再統一釋放使用
需要所有的 program relocate => OS 降速
=> 是否可以邏輯上連續，而實際上不一定連續呢?

Page
- 將物理 mem. 切割成塊(frame)，通常 4k
- 將邏輯 mem. 切割成塊(page)，與 frame 同大
- 再需要用到 mem. 時，建立 page table 指派到沒用的 frame，並且將這些 frame 標記為正在使用
Implementation of Page Table
- 每個 process 都有自己的 Page Table
  - Page-table base register (PTBR) 指向 page table
  - Page-table length register (PRLR) 代表 page table size
- 存在 Main memory
- 需要兩次的 mem access 才能執行/存取，一次是查找 page table，一次是載入 data/instruction
- 兩次記憶體查找問題可以用特殊硬體快速解決 associative memory / translation look-aside buffers (TLBs)
  - TLB 是 page table 的 catch

這裡怎麼計算是重點，求有緣人補充 XD

Effective Access Time
- $ε$ : 查找 TLB 時間
- $α$ : Hit ratio
- $T$ : memory access time
- $E A T = (1 T + ε) \cdot α + (2 T + ε) (1 - α) = 2 + ε - α$
- 1 是 TLB hit 只需要載入 data/instruction 一次記憶體存取，2 是 TLB miss 需要查找 Pabe table + 載入
- $ε$ 很小

Structure of the Page Table

Hierarchical Paging

每個 process 一個 PT，且 PT 大且連續是不方便且無效率的
既然 PT 也是在 mem. 中的，是否也可以被切片
- fragmented (不用連續)
- allocated on demand (省空間)
two-levle PT
- 原來 20 個 entry => 10*10 = 100 個 entry => 邏輯空間大，若 TLB miss 高，會消耗很多效率
- 應該是 logi(page) -> phy(outer page table) -> phy(page of page table) -> phy(frame)，不然會出現地回查找問題

Hashed Page Tables

記憶體稀疏且邏輯位址空間大時，適合用這種

適合真正使用的記憶體遠小於邏輯記憶體空間 (稀疏)

Inverted Page Table

邏輯空間的總大小會隨的 process 數量上升而變大 (logi mem. 是 per-process 的)，但 phy. mem. 不會跟著變大
inverted page table 的大小只跟 phy mem 有關，跟 log mem. 無關
全部 process 只有一個 page table
PT 內部存的是 pid 與 page number
search 到對的 pid 與 page number 時，search index 就是 frame num.

Segmentation

從 per-process　的角度看 memroy，不用管 mem.-management 等 phy. 的東西

Hardware
- Logical address: <segment-number, offset>
- Segment table (map phy. addr.)
  - base: 從哪裡
  - limit: 多長
- Segment-table base register (STBR)
- Segment-table length register (STLR)
- 也有類似 PT 的東西
Segment Protection
- 一些安全性規範，ex. 某些地方可執行不可寫入

Ch.9 Virtual-Memory

Demand Paging

Memory Protection
- protection bit
- valid-invalid
  - valid: 存在 process 的 local address space，是 legal page
  - invalid: 不在 process 的 local address space
- invalid page 可能是 illegal 或 valid 但還沒被 referenced
Virtual Memory
有多種理由用 VM
- 只有 program 中的一部分需要在 memory 中 (其他可以執行到在載入等等)
- VM 可以比 phy. 開的還要大
- 更加高效 (process 開始時，不用真的買上把 mem. 開起來)
- 可以有 share memory
VM 多用 demanded paging 實作
- 當 VM 摸到 invalide 的 main mem. 時(page fault)，OS 再去要新的 mem. (ex. swap in)
Demand Paging
- VM 可以比 phy. mem. 還大
- 只有在需要時，才把 page 拉近 memory (從 disk)
  - 減少 IO
  - 增加 process 數量 (減少 mem 需求)
  - 回應速度加快 (跟加速 process 開啟時間原因相等)
Performance of DP
- Page Fault Rate: p
- $E A T = (1 - p) * m e m . a c c e s s + p * (p a g e_{f} a u l t_{o} v e r h e a d + s w a p_{o} u t + s w a p_{i} n + r e s t a r t_{o} v e r h e a d)$
TLB
- hit
  - 絕對不會 page fault
- miss
  - not page fault
    - 去 PT 把 cache 抓回 TLB
  - page fault
    - 去 disk 抓進 PT 再抓進 TLB

Page Replacement

避免 over-allocation，需要有 page 回收機制
回收機制在 logical 與 physical 是分開做互不影響的

Basic Replacement
1. 找 frame
  - 如果有 free frame => 直接用
  - 如果沒 free frame => 複寫一個 victim frame
2. 讀 data 進 frame
3. 更新 PT 跟 frame table
4. restart process
Page Repolacement
如何選出 victim frame
- 有些 data 不用寫回，可以省一次 IO
- 用 dirty bit 紀錄是否需要寫回
Replacement 方法
- FIFO (first in first out)
- OPT (Optimal page replacement)
  - 誰的下一次 reference 最遠選他
  - but process 無法預測
  - 當作效能分析的 upper bound
- LRU (least-recently used)
  - 上次用裡現在最久的
  - Belady’s anomaly 當 Process 分配到較多的 Frame 數量，有時其 Page Fault Ratio 卻不降反升
- LRU 相似算法
  - Reference bit
    - 當這個 page 有被 reference 時，將 bit 設為一
    - 優先更新 RB = 0 的
    - 無法得知 RB = 1 的人誰先誰後
  - Second chance
    - 多一個 clock 固定清掉 RB
  - Enhanced Second-Chance Algo.
    - 再多一個 dirty bit
    - dirty bit 需要 mem IO 兩次，會比沒有 dirty bit 的人更耗時
    - (ref, dirty) victim 選擇順序
      1. (0, 0)
      2. (0, 1)
      3. (1, 0)
      4. (1, 1)
- Counting
  - LFU (Frequency)
    - 更新最小 count 的
    - 用 counter 的半衰期(left shift) 解決時間影響
  - MFU
    - 更新最大 count 的
    - 因為覺得最少使用的應該是剛剛進來的，不久之後就要使用了

Thrashing

Thrashing == busy swapping in and out
Page size 不夠導致高 page fault rate，造成

低 CPU 使用率
OS 可能會以為需要一狀況提高 mutiprogramming 的維度
更多 process 加入

Performance

再背景定期做 swapout 確保 free frame 夠多 & 背景執行不影響效能

Pre-paging
- 避免 process 剛起來時有大量的 page fault，預先作 swap in
- 但是可能載入大量錯誤 / 不需要的 data
TLB Reach
- TLB Reach = (TLB size) * (Page size)
- 有這麼多的 mem data size 是可以直接查找 TLB 就找到的
- 理想上 working set 的 data 都要可以進 TLB => 高效
- 也許會想要提高 TLB size，但是 TLB 需要大面積電路且功耗高
- 兩方法:
  - 增加 page size
    - 可能會產生高 IO、page fault，載入一整條 page 都不用
  - 提供 multiple page size
    - 比較常用的方法
Page Size Tradeoff
- Large
  - page table 相對小
  - 大 TLB reach
  - 高 page fault
- Small
  - 大 page table
  - 小 TLB reach
  - 低 page fault
- 對 Page size
  - 希望大 page siez => 小 page table
  - 小 page table 再 page search 時可以減少第一次 memory access time
- 對 IO
  - 大 page size
  - 序列畫 IO 一次近來
- 對 locality
  - 小 page size 可以比較精確的抓到要的值

兩個 process 各自維護的 page 共用同一個 frame

way
- 程式共用 .dll / .so
- lib 動態載入
- share memory

Copy on Write (COW)

常見於 fork
當 parent fork 出 child 時，理論上會式兩份 mem.，可是為了加速，可能會用 share frame 的方式，並設定 read only
如果 write 時會需要 trap，然後才 copy 出自己這份
只有再要 write 時，會再把 frame copy 一份並更新 page table

Swap & Memory-Mapped File

Swap-space (VM mapping 到 disk 上)
- Linux: Swap partition
- windows: pagefile.sys
swap-map: 追蹤 swap out 與位置
Memory Mapping Files
- 正常 file RW 需要 disk IO (system call)
- 如果把 file 當作 swap space，就可以用記憶體方式操作
- 好處
  - 方便好用 (ex. linux shared memory)
  - 不需要 user-kernel mode 轉換
  - 可以多 process 共用

Swapping

Blocking store
- 非揮發性儲存
Swap out, in
- 有 priority 決定誰出誰進