---

title: OS Notes

---

# OS Notes
## CH1、CH2 - Overview of Operating Systems
### Computing Systems and Computation Mode
A system's computation mode falls into three categories:
|  | Batch Mode | Online Mode | Real-Time Mode |
| - | - | - | - |
| Interaction with the outside world (user / environment) | none | yes | intensive |
| Focus on | **efficiency** | **responsiveness** | **predictability** |
| Performance metric | completion time for a single process (one process, not each process) | throughput (how much work completes per unit time) | **tardiness** (lateness) |
| Resource allocation policy | allocate enough resources to a process before it starts executing, to shorten its execution and completion time | the OS must allocate each process a fair share of resources to handle input and output, and no process should monopolize the CPU or memory for long | the OS must guarantee the resources of every running process at all times, so that the timing of every execution stays consistent |
| Description | results written to file or output device; one process needs to complete as soon as possible | no time constraints involved; each process needs to take inputs and generate outputs from time to time | time constraints imposed; the completion time of every process is bounded within an interval |
| Example | compiler, ML training | spreadsheet, games, browsing | robot control, self-driving car |
### Computing System Architecture
#### Components of a Computer
![OS圖](https://hackmd.io/_uploads/S1r-rmIn-g.png)
#### architectures
There are two architectures, both modeled on the Turing machine (which has three essential parts: input, output, and the thinking process):
|  | von Neumann Architecture | Harvard Architecture |
| - | - | - |
| Diagram | ![無標題（草稿）](https://hackmd.io/_uploads/HyeVpqmU2-e.png) | ![OS圖](https://hackmd.io/_uploads/rJGc9mIhZl.png) |
| Explanation | data and instructions share a single bus | separate storage and signal pathways for instructions and data |

Most modern processors look like von Neumann machines externally, but internally, for performance, the instruction cache and data cache are separate; this blend of the two is called the modified Harvard architecture.
Drawback: in both architectures the processors sit at the very core of the computer system, which means the processors carry the heaviest and most complex workload while all the other components merely work to support them.
Modern GPUs and deep neural networks: a modern GPU has enormous computing power and can execute a single instruction on thousands of GPU cores within one or two cycles, but deep neural networks spend tens, hundreds, or even thousands of cycles just moving data.
#### Definition of Operating Systems
(There is actually no universally accepted definition.)
An operating system mainly oversees the hardware components in order to serve application programs.
Besides the OS kernel, the software in a system comes in two kinds:
- system programs: shipped with the operating system, but not part of the kernel
- application programs: all programs unrelated to the operating system

Today's general-purpose and mobile operating systems usually also integrate middleware (a set of software frameworks that provide additional services to application developers, such as dockers, databases, multimedia, and graphics)
### Type of Operating Systems
Two broad categories: Stand Alone and Distributed Operating Systems
#### Stand Alone (Uni-Processor) Operating Systems
(This course covers almost exclusively this kind.)
Manages one computer with one processor, though possibly multiple processing cores.
This kind of OS:
- acts as a resource manager or an arbitrator (Manages CPU, I/O devices, memory)
- provides a virtual interface that makes the hardware easier to use

A View of Operating System Services：
![OS圖](https://hackmd.io/_uploads/rkUHZsLn-x.png)
Stand-alone OSes are further divided into three kinds:
1. Monolithic (e.g., MS-DOS, early UNIX and Linux)
  One large kernel that handles everything.
  The UNIX OS consists of two separable parts：
      - system programs
      - the kernel（everything below the system call interface and above the physical hardware / provides the file system, CPU scheduling, memory management, and other operating-system functions; a large number of functions for one level）
      
   Advantage of monolithic: everything lives in one place, so components can share data simply through shared memory, which is simple and convenient.
   Disadvantage of monolithic: modifying any one part means the whole system must be shut down and reconfigured.
   Traditional UNIX System Structure and Linux System Structure are both monolithic, but they differ slightly.
   **Traditional UNIX System Structure：**
   ![截圖 2026-04-10 16.03.02](https://hackmd.io/_uploads/H1Z6CQIh-l.png)
   **Linux System Structure：**
   Runs entirely in kernel mode in a single address space（does have a modular design that allows the kernel to be modified during run time）
   Applications typically use the glibc standard C library when communicating with the system call interface to the kernel
   ![截圖 2026-04-10 16.03.39](https://hackmd.io/_uploads/B1PykNIn-l.png)
2. Ring-based / Layer-Designed
   ![截圖 2026-04-10 16.18.41](https://hackmd.io/_uploads/B1eOzVI3-l.png)
  Functionality is decomposed into N layers
  Each layer uses services of layer N-1 and implements new service(s) for layer N+1
  Layers stack upward: the bottom layer (the core) is the hardware, and the top layer is the user interface.
  The lower (inner) a layer, the higher its privilege; the higher (outer), the lower its privilege.
  Some processors want privileges even higher than layer 0, hence negative layers (e.g., a hypervisor).
3. Microkernel
  ![截圖 2026-04-10 16.20.42](https://hackmd.io/_uploads/r1rJ7N8h-l.png)
  small kernel
  user-level servers implement additional functionality
  The OS kernel should not bundle everything in.
  Moves as much as possible from the kernel into user space (OS functionality is taken out of kernel space and run in user space instead).
  Mach is the best-known example (the Mac OS X kernel, Darwin, is partly based on Mach).
  Communication takes place between user modules using message passing
  Advantages of a microkernel: easier to extend, easier to port the operating system to new architectures, more reliable and secure.
  Disadvantage of a microkernel: performance overhead of user-space-to-kernel-space communication.

Modules: many modern operating systems implement loadable kernel modules (LKMs) - use an object-oriented approach, each core component is separate, each talks to the others over known interfaces, and each is loadable as needed within the kernel
Although there are the three structures above, hardly any single structure fits all needs, so modern systems are hybrid systems (slide p. 81 gives examples, though the examples seem unlikely to be tested).
#### Distributed Operating Systems
- Manages multiple computers (there are many computers, and one of them runs the operating system)
- A number of computers or processors are connected via a network to increase the computation capacity

Three categories:
| | Distributed Operating Systems（DOS） | Networked Operating Systems (NOS) | Middleware |
| - | - | - | - |
| **Description** | Tightly-coupled operating system for multi-processors (one computer may carry several processor chips) and homogeneous multicomputers (many computers whose processors share the same architecture). A single operating system manages all the processors and can decide what work each processor does. In both cases every processor uses the same instruction set (e.g., all Intel x86). | Loosely-coupled operating system for heterogeneous multicomputers (each computer may use a different processor; the computers are connected by LAN or WAN). Each computer runs its own local operating system and decides what its local machine does; you cannot force a computer to do something, only request it, and the underlying computer may refuse. | Additional layer atop a NOS implementing general-purpose services; beyond the NOS it plays a mediating role (e.g., deciding what to do when an underlying computer refuses a request). |
| **Main Goal** | Hide and manage hardware resources | Offer local services to remote clients (e.g., when a task needs a lot of computing power, many computers can be used). Provides features such as file sharing across the network, along with a communication scheme that allows different processes on different computers to exchange messages | Provide distribution transparency |
| **Behavior** | collects a number of computers to **act like ONE** computer | collects a number of computers to **act like a group** of computers | |

The two statements below capture the same contrast:
1. A NOS enables resource sharing among the computers, but each computer acts autonomously from all the others
2. A DOS enables resource sharing, and the computers act less autonomously
### Interface with Operating Systems
![OS圖 2](https://hackmd.io/_uploads/rkoPMjIn-l.png)
### System Services / System Calls
#### API - System Call - OS Relationship
![截圖 2026-04-10 17.09.45](https://hackmd.io/_uploads/HyDvCNU2Wl.png)
#### System Calls
- Programming interface to the services provided by the OS
- Typically written in a high-level language (C or C++)
- Mostly accessed by programs via a high-level Application Programming Interface（API）rather than direct system call use
- 3 most common APIs：
    - Win32 API for Windows
    - POSIX API for POSIX-based systems（including virtually all versions of UNIX, Linux, and Mac OS X）
    - Java API for the Java virtual machine（JVM）
- The OS usually assigns each system call a dedicated number; internally, the system call interface maintains a table indexed by these numbers
- When an application issues a request, the system call interface looks the number up in the table, invokes the corresponding system call in the OS kernel, and afterwards returns the call's status and results to the application (parameters have to be passed from user space to kernel space when calling a system call)
  ![截圖 2026-04-10 21.57.44](https://hackmd.io/_uploads/SyNyfY8nWx.png)
- The caller never needs to know how the system call is implemented and only has to follow the API (the implementation details are hidden and managed by the run-time support library)
- Invoking a system call passes more than just the number; there are three parameter-passing methods (see the sketch after this list)
    - registers (drawback: sometimes there are more parameters than registers; the two methods below place no limit on the number or length of parameters)
    - put the parameters in a block / table in memory and pass that memory address in a register (used by both Linux and Solaris)
      ![截圖 2026-04-10 22.53.38](https://hackmd.io/_uploads/rybW19U2be.png)
    - the stack: the program pushes the parameters on and the OS pops them off
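
On Linux, the number-plus-table mechanism can be observed from user code through glibc's `syscall()` wrapper, which issues a system call by its number. A minimal, Linux-specific sketch (ordinary code would just call the API, as the second `printf` shows):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>   /* SYS_getpid: the system call's number */

int main(void){
    /* Pass the number to the kernel; the system call interface
       indexes its internal table with SYS_getpid. */
    long pid_raw = syscall(SYS_getpid);
    printf("via syscall(SYS_getpid): %ld\n", pid_raw);
    /* The usual path: the API hides the same trap behind getpid(). */
    printf("via getpid():            %d\n", (int)getpid());
    return 0;
}
```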
System Calls to Serve / Types of System Calls：
- Process control
    - create process, terminate process
    - end, abort
    - load, execute
    - get process attributes, set process attributes
    - wait for time
    - wait event, signal event
    - allocate and free memory
    - Dump memory if error
    - Debugger for determining bugs, single step execution
    - Locks for managing access to shared data between processes
- File management
    - create file, delete file
    - open, close file
    - read, write, reposition
    - get and set file attributes
- Device management
    - request device, release device
    - read, write, reposition
    - get device attributes, set device attributes
    - logically attach or detach devices
- Information maintenance
    - get time or date, set time or date
    - get system data, set system data
    - get and set process, file, or device attributes
- Communications
    - create, delete communication connection
    - send, receive messages if message passing model to host name or process name (from client to server)
    - Shared-memory model: create and gain access to memory regions
    - transfer status information
    - attach and detach remote devices
- Protection
    - Control access to resources
    - Get and set permissions
    - Allow and deny user access
### Interrupts
![OS圖 3](https://hackmd.io/_uploads/SJweQo8nbx.png)
#### Interrupt handling flow
1. Save the current state: the OS preserves the CPU's state by saving the registers and the program counter
2. Transfer control: the interrupt transfers control to the interrupt service routine, usually through the interrupt vector, a table containing the addresses of all the service routines (the interrupt architecture must save the address of the interrupted instruction)
3. Determine the type and act: the OS determines which type of interrupt occurred, and a separate segment of code decides what action to take for that type of interrupt. Only after handling completes does the CPU return to the interrupted user program
    Types of interrupts:
    - hardware / external interrupts
    - software interrupts (traps or exceptions)

![截圖 2026-04-10 23.40.09](https://hackmd.io/_uploads/Hkwy55L3We.png)
The operating system is driven by interrupts.
#### Interrupt Timeline
![截圖 2026-04-10 23.39.09](https://hackmd.io/_uploads/BJ3oFqL2Zl.png)
In the upper timeline, the green line is high while the CPU executes the user program and low while it executes the interrupt service routine.
In the lower timeline, the green line is high while the I/O device is idle and low while it is working.
The lower timeline shows what happens on an I/O request.
### Functional Components in Operating Systems
Not covered in lecture; the instructor said everything from p. 55 on is only an overview that later chapters treat in more detail. Revisit it if it appears in past exams.
## CH3 - Process
### Process Concept
- An operating system executes a variety of programs that run as a process
- Process – a program in execution; process execution must progress in sequential fashion. No parallel execution of instructions of a single process.
- multiple parts：
    - The program code, also called **text section**
    - Current states including **program counter, processor registers**
    - **Stack** containing temporary data（Function parameters, return addresses, local variables）
    - **Data section** containing global variables
    - **Heap** containing memory dynamically allocated during run time
- A program is passive, stored on disk (executable file); a process is active (a program becomes a process when an executable file is loaded into memory)
- Execution of program started via GUI mouse clicks, command line entry of its name, etc.
- One program can create several processes at the same time（Consider multiple users executing the same program）

Process in memory:
![截圖 2026-03-16 00.36.52](https://hackmd.io/_uploads/rJZElvV5-x.png)
It contains the components listed earlier (different OSes differ slightly, but not by much).
The uninitialized data, initialized data, and text sections have fixed sizes; the rest are dynamic.
Example C program:
![截圖 2026-03-16 00.42.58](https://hackmd.io/_uploads/B1xj-wN9Wl.png)
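
The slide's program isn't reproduced here; below is a minimal sketch along the same lines (variable names are made up), printing one address from each section:

```c
#include <stdio.h>
#include <stdlib.h>

int bss_var;        /* uninitialized data (BSS) */
int data_var = 42;  /* initialized data */

int main(void){
    int stack_var = 0;                   /* stack: local variable */
    int *heap_p = malloc(sizeof(int));   /* heap: allocated at run time */
    printf("text:  %p\n", (void *)main); /* text: the code itself */
    printf("data:  %p\n", (void *)&data_var);
    printf("bss:   %p\n", (void *)&bss_var);
    printf("heap:  %p\n", (void *)heap_p);
    printf("stack: %p\n", (void *)&stack_var);
    free(heap_p);
    return 0;
}
```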
Use the `size` command to view the size of each section of a program:
- Linux
  ![截圖 2026-04-11 00.37.56](https://hackmd.io/_uploads/ryQjDsL2-g.png)
- Mac
  ![截圖 2026-04-11 00.39.10](https://hackmd.io/_uploads/rkEawjIh-l.png)
  Observe `__DATA: 16384`; 16384 = 0x4000, i.e., one 16 KB memory page (on ARM64, each memory page is 16 KB).
  With `size -m` you can see more detail, including `__PAGEZERO: 4294967296`; 4294967296 = 0x100000000, i.e., a 4 GB page zero (a chunk of the address space starting at address 0 which allows no access).

When a program is loaded and executed, memory, including virtual and physical, will be allocated for different sections of the process
Use `vmmap` to view the size of each section of a process:
```
vmmap(1)                    General Commands Manual                   vmmap(1)

NAME
     vmmap – Display the virtual memory regions allocated in a process

SYNOPSIS
     vmmap [-s] [-w] [-v] [-pages] [-interleaved] [-submap] [-allSplitLibs]
           [-noCoalesce] [-summary]
           pid | partial-executable-name | memory-graph-file [address]

DESCRIPTION
     vmmap displays the virtual memory regions allocated in a specified
     process, helping a programmer understand how memory is being used, and
     what the purposes of memory at a given address may be.

     vmmap requires one argument -- either the process ID or the full or
     partial executable name of the process to examine, or the pathname of a
     memory graph file generated by leaks or the Xcode Memory Graph Debugger.

     If the optional address is given, information is only shown for the VM
     region containing that address (if any) and the regions around it.

OPTIONS
     -s, -sortBySize
                    Print sorted regions and malloc zones by size (dirty +
                    swapped)

     -w, -wide      Print wide output, to show full paths of mapped files.

     -v, -verbose   Equivalent to -w -submap -allSplitLibs -noCoalesce

     -pages         Print region sizes in page counts rather than kilobytes.
```
Below is an excerpt from a Hello World program:
```c
for( i = 0; i < NUM_ITERATIONS; i++ ){
    memset( str, 0, sizeof(str) );
    strcpy( str, "Hello World, OS Spring 2023." );
    if( iflag == TRUE ) printf( "%s\n", str );
    else if( sflag == TRUE ) nanosleep( &deadline, NULL );
}
const long long unsigned elapsed = time_ns() - start_ns;
printf( "%s\n", str );
```
Running the Hello World program above:
![截圖 2026-04-11 00.51.51](https://hackmd.io/_uploads/H1E2csL2-e.png)
After running the program, view that process's information:
![截圖 2026-04-11 00.52.20](https://hackmd.io/_uploads/B1zC5o8hbe.png)
![截圖 2026-04-11 00.54.02](https://hackmd.io/_uploads/S1irsiL2Wx.png)
Zooming in on the summary above reveals:
1. All the memories to execute the process
   ![截圖 2026-04-11 00.56.09](https://hackmd.io/_uploads/ryShsoInZe.png)
2. All the memories on physical memory devices
   ![截圖 2026-04-11 00.56.26](https://hackmd.io/_uploads/SJIpojU2Wl.png)
### Process State
Requirements of running processes in OS
1. Multiplexing / Time-Sharing
   on modern computers, physical and logical resources need to be shared among a number of processes. The OS needs to allow the processes to occupy AND yield the resources from time to time (at the sub-second level)
2. Isolation
   the operating system should protect each process from being abused or attacked on resource assignment, data storage, and execution
   ![截圖 2026-04-11 00.59.56](https://hackmd.io/_uploads/ryI5hsUhZx.png)
3. Interaction
   resources, including data and code, should be shared among processes
   ![截圖 2026-04-11 01.00.42](https://hackmd.io/_uploads/B1uT2s8nbl.png)

How to achieve these requirements:
- The processes in the same operating system need to share
    - the processors, but one process at a time,
    - the memory
    - other peripheral devices
- The OS on uni-processor schedules the processes to execute：
    - One at a time
    - One process can only be scheduled onto the processor in certain cases and suspended from it in certain cases
    - Process State is the way to allow the OS to decide when to schedule properly

About process state:
As a process executes, it changes its state.
The states:
- New（The process is being created）
- Running（Instructions are being executed）
- Waiting（The process is waiting for some event to occur）
- Ready（The process is waiting to be assigned to a processor）
- Terminated（The process has finished execution）

State diagram:
![截圖 2026-04-11 01.06.34](https://hackmd.io/_uploads/SJdXAi8nZg.png)
The PCB keeps track of the states:
PCB - information associated with each process (also called task control block)
It stores:
- Process state（running, waiting, etc.）
- Program counter（location of instruction to next execute）
- CPU registers（contents of all process-centric registers）
- CPU scheduling information（priorities, scheduling queue pointers）
- Memory-management information（memory allocated to the process）
- Accounting information（CPU time used, clock time elapsed since start, time limits）
- I/O status information（I/O devices allocated to process, list of open files）

![截圖 2026-04-11 01.09.28](https://hackmd.io/_uploads/r1vARsI2Zl.png)
Implemented with a C structure (Linux's `task_struct`):
```c
struct task_struct {
    pid_t pid;                  /* process identifier */
    long state;                 /* state of the process */
    unsigned int time_slice;    /* scheduling information */
    struct task_struct *parent; /* this process's parent */
    struct list_head children;  /* this process's children */
    struct files_struct *files; /* list of open files */
    struct mm_struct *mm;       /* address space of this process */
};
```
![截圖 2026-04-11 01.12.27](https://hackmd.io/_uploads/HkiK1n8h-e.png)
### Process and Thread
#### What is Process and Thread
Processes and threads both execute code, but they operate at different granularities.
- The resources are allocated to the processes.
- the processors can sit idle (e.g., while the running process waits) even though the ready queue is backlogged, which wastes processing (CPU/GPU) time
- To optimize resource usage, the operating system moves the idle process from the processors and selects another ready process to use the processor.
- However, from the process's perspective, can one process delegate its processing time to other parts of its own work, or to processes of its choosing? The answer: threads.

Multitasking in a process:
![截圖 2026-04-11 02.34.04](https://hackmd.io/_uploads/HyKszTIh-x.png)
#### Multi-core or Multi-processor
| multiple processors | multicore processor |
| - | - |
| ![截圖 2026-04-11 02.36.10](https://hackmd.io/_uploads/HJ5Xm6L3-l.png) | ![截圖 2026-04-11 02.36.31](https://hackmd.io/_uploads/BkxBmT82-x.png) |
| Each processor has its own cache but share main memory and IO with other processors via shared Bus | Within each processor, there are more than one processing unit, called a core. Each core has its own L1 cache but may share other cache with other cores of same processor. |

#### About multiple processors
The OS can schedule different processes to run on different processors, but the OS needs to be re-designed (there are multiple processors; what about memory, disk, I/O devices, etc.?)
![截圖 2026-04-11 02.42.48](https://hackmd.io/_uploads/Skv2NaLnWg.png)
#### multiple processors design 1
Each CPU has its own private operating system; memory is shared among all the processors, the input-output system is also shared, and the whole system is connected by a single bus.
![截圖 2026-04-11 02.43.56](https://hackmd.io/_uploads/BkFgHpI2Ze.png)
But this suffers from race conditions, and a CPU may be allocated very few resources.
So another design appeared:
#### multiple processors design 2：Asymmetric multiprocessing - Master slave multiprocessor
There is a single data structure which keeps track of the ready processes.
- In this model, one processor works as the master and the other central processing units work as slaves.
- All the processors are managed by a single processor called the master server.
- The master server runs the operating-system processes and the slave servers run the user processes.
- The memory and input-output devices are shared among all the processors, and all the processors are connected to a common bus.
- This system is simple and reduces data sharing, so it is called asymmetric multiprocessing.

![截圖 2026-04-11 02.46.25](https://hackmd.io/_uploads/rJAFB6IhWl.png)
But problems remain: for example, the bus becomes a bottleneck, and every request for a system service has to go through the master processor (among other issues).
So yet another design appeared:
#### multiple processors design 3：Symmetric multiprocessor（SMP）
In SMP model, there is one copy of the OS in memory, but any processor can run it.
- When a system call is made, the processor on which the system call was made traps to the kernel and then processes that system call.
- This model balances processes and memory dynamically.
- This approach uses Symmetric Multiprocessing where each processor is self-scheduling. The scheduling proceeds further by having the scheduler for each processor examine the ready queue and select a process to execute

![截圖 2026-04-11 02.49.22](https://hackmd.io/_uploads/HyMHUpI2Zg.png)
#### multiple processors design 4：big.LITTLE
Not all processors have to be symmetric.
The most powerful use model of the big.LITTLE architecture is Heterogeneous Multi-Processing (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the "big" cores, while threads with less priority or less computational intensity, such as background tasks, can be performed by the "LITTLE" cores.
![截圖 2026-04-11 02.51.29](https://hackmd.io/_uploads/BJbTIT8nbe.png)
The designs above may suit CPUs but not GPUs; GPU material won't be on the midterm, though.
### Process Scheduling
#### How Schedule
Process scheduler selects among available processes for next execution on CPU core.
- For a system with a single core, there will never be more than one process running at a time, whereas a multicore system can run multiple processes at one time.
- Goal：Maximize CPU use, quickly switch processes onto CPU core
- Maintains scheduling queues of processes (process scheduling uses a ready queue and wait queues)
    - Ready queue – set of processes residing in main memory, ready and waiting to execute
    - Wait queues – set of processes waiting for an event（i.e., I/O）
    - Processes migrate among the various queues.

![截圖 2026-04-11 02.54.11](https://hackmd.io/_uploads/Sy-Dwa8nbx.png)
Not every process has the same workload, so processor time cannot be assigned optimally. How, then, should the OS schedule?
There are two kinds of processes:
- I/O-bound process: spends more of its time doing I/O than doing computations (an I/O-bound process is like someone washing clothes with a washing machine: they only load and unload it, and are idle the rest of the time)
- CPU-bound process: spends most of its time computing and only a little on I/O; these should be allocated more CPU time


There is a catch, though: some processes or programmers may cheat to acquire more processor time, for example with a spin lock.
#### Spin Lock
Such a process is really doing I/O, but to get more CPU time it pretends to be computing, like standing next to the washing machine the whole time between loading and unloading instead of leaving: it never releases processing time.
Locking approaches (see the pthread sketch after this list):
- lock a resource：once and for good
  The process tries to lock the resource
  If not available, it may abort or come back later
  ```c
  if lock( myFile ){
      printf( "this is what I want to write.\n" );
      unlock( myFile );
  }
  ```
  ![截圖 2026-04-11 11.42.10](https://hackmd.io/_uploads/B1CfXrw2-x.png)
  ![截圖 2026-04-11 11.42.20](https://hackmd.io/_uploads/ryuQ7rv3Wg.png)
  ![截圖 2026-04-11 11.49.29](https://hackmd.io/_uploads/BydRErPh-x.png)
- spin lock：keep trying
  The process repeatedly tries to lock the resource
  During the trials, the process stays in running state and consume CPU time
  ```c
  while( not lock( myFile ) ) {
      // do nothing; keep spinning
  };
  printf( "this is what I want to write.\n" );
  unlock( myFile );
  ```
  ![截圖 2026-04-11 11.46.41](https://hackmd.io/_uploads/S13Q4BPhbe.png)
  ![截圖 2026-04-11 11.46.49](https://hackmd.io/_uploads/BywEESPh-g.png)
  ![截圖 2026-04-11 11.49.50](https://hackmd.io/_uploads/r1BlBHv3-x.png)
- mutex lock：wait for lock
  The process waits for the signal when the resource is available.
  During the waiting, the process is suspended
  ```c
  if acquire( myFile.lock ){
      printf( "this is what I want to write.\n" );
      release( myFile.lock );
  }
  ```
  ![截圖 2026-04-11 11.48.57](https://hackmd.io/_uploads/BJN3ESP2Zg.png)
  ![截圖 2026-04-11 11.49.05](https://hackmd.io/_uploads/HJ32VBD2-e.png)
  ![截圖 2026-04-11 11.50.22](https://hackmd.io/_uploads/rJ3WSBv3bx.png)
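
POSIX threads expose both behaviors directly (`pthread_spin_*` exists on Linux but not on macOS). A minimal sketch of the contrast, not taken from the slides:

```c
#include <pthread.h>
#include <stdio.h>

pthread_spinlock_t spin;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void with_spinlock(void){
    pthread_spin_lock(&spin);   /* contended callers stay in running state and burn CPU */
    printf("this is what I want to write.\n");
    pthread_spin_unlock(&spin);
}

void with_mutex(void){
    pthread_mutex_lock(&mutex); /* contended callers are suspended (waiting state) */
    printf("this is what I want to write.\n");
    pthread_mutex_unlock(&mutex);
}

int main(void){
    pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE);
    with_spinlock();
    with_mutex();
    pthread_spin_destroy(&spin);
    return 0;
}
```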

Different queues hold processes in different states (e.g., a process that issued an I/O request sits in the I/O wait queue).
Queueing-diagram representation of process scheduling:
![截圖 2026-04-11 11.56.36](https://hackmd.io/_uploads/HJZY8BP2bg.png)
#### Context Switch
Context switch: when switching processes, the current process's execution environment must be removed and saved to another storage device, and the next process to execute is brought into the processor and memory.
- The context of a process is represented in the PCB (the context is the code and environment the process is executing)
- Context-switch time is pure overhead; the system does no useful work while switching
- The more complex the OS and the PCB, the longer the context switch

Illustration (process P0 context-switches to process P1 and then back):
![截圖 2026-04-11 11.59.33](https://hackmd.io/_uploads/SkHVDBvhWl.png)
Context-switch time/overhead dependent on hardware support（Some hardware provides multiple sets of registers per CPU, so multiple contexts loaded at once）
Context switch can occur voluntarily or involuntarily (both kinds are counted in the sketch after this list):
- voluntarily
    - one process explicitly yields the CPU to another.
    - Call `yield()` to yield the CPU (after calling yield, the process state becomes ready)
    - Call `sleep()`
    - Request I/O, request to lock/semaphore, makes a system call that blocks, etc.
- involuntarily
    - the system scheduler suspends an active process and switches control to a different process.
    - Process/thread scheduler tries to share the CPU fairly by time-slicing (a process that is involuntarily context-switched because its time slice expired also returns to the ready state)
    - Suspend/resume at periodic intervals
    - Involuntary context switches can happen **any time**.
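
A process can count its own switches of both kinds via `getrusage(2)` (the `ru_nvcsw` / `ru_nivcsw` fields exist on both Linux and macOS); a small sketch:

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>

int main(void){
    struct rusage ru;
    for (volatile long i = 0; i < 100000000L; i++)
        ;         /* CPU-bound phase: expect involuntary switches */
    sleep(1);     /* blocking phase: at least one voluntary switch */
    if (getrusage(RUSAGE_SELF, &ru) == 0){
        printf("voluntary context switches:   %ld\n", ru.ru_nvcsw);
        printf("involuntary context switches: %ld\n", ru.ru_nivcsw);
    }
    return 0;
}
```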

CPU-bound process：only has involuntary context switch or no context switch
![截圖 2026-04-11 12.08.59](https://hackmd.io/_uploads/ry5PtSwnZl.png)
I/O-bound process：lead to voluntary and involuntary context switch
![截圖 2026-04-11 12.09.40](https://hackmd.io/_uploads/ryVjFrwnZe.png)
![截圖 2026-04-11 12.09.49](https://hackmd.io/_uploads/Hy0cKSPh-l.png)
(The trace is long, so only the beginning and the end are shown.)
The cost of a context switch has direct and indirect parts:
- Direct cost: load and store instructions
- Indirect cost: a COLD cache and cache misses
    - HOT cache: the required data are already stored in the cache and can be used directly. No need to fetch data/instructions from main memory.
    - COLD cache: a blank cache, or one with stale data; the required data cannot be found (e.g., right after a context switch), so data/instructions must be re-fetched from main memory.

Implementation of `sleep()`:
- puts the process into the waiting state and a wait queue, which causes a voluntary context switch

Implementation of `nanosleep()` (suspends the process for a very short period of time):
- one possible implementation: a spin lock that keeps the processor in the running state; the process is then rescheduled involuntarily when the time slice expires.
- Mac: involuntary
  ![截圖 2026-04-11 12.37.45](https://hackmd.io/_uploads/SktXxLPhWx.png)
- Linux: voluntary
  ![截圖 2026-04-11 12.38.03](https://hackmd.io/_uploads/rycVxIP2bx.png)
#### Mode Switch（Not Context Switch）
Switching between User and Kernel Modes
A mode switch requires a privilege switch and switching the pointers to memory regions, and it takes time.
- user mode (the processes execute user-level instructions)
    - math operations：`sqrt()`、`abs()`
    - memory operations：`memset()`、`strcpy()`
- kernel mode（the processes execute the instructions implemented in kernel）
    - device I/O：`printf()`、`dd()`
    - collect system information：`getpid()`

![截圖 2026-04-11 12.57.30](https://hackmd.io/_uploads/r1qTELP2bl.png)
## CH4 - Threads & Concurrency
### Overview - Multitasking a Process
#### overview
Can the system execute different functions of one process in a time-sharing manner?
One process can have many tasks to accomplish, including
- User interface
- Storage
- Computing

Sequencing all the tasks can lead to poor responsiveness and performance.
They are not all sequential and can be conducted in parallel. However, it is expensive to use multiple processes, including
- demanding memory use
- unnecessary isolation among the sibling processes
- competing resources with other applications/processes

**Threading** is designed to tackle aforementioned pitfalls. In particular, process creation is heavy-weight while thread creation is light-weight.
Multiple tasks of the application can be implemented by separate threads
- Update display
- Fetch data
- Spell checking
- Answer a network request

![截圖 2026-04-11 16.38.14](https://hackmd.io/_uploads/rJBKdYP3-l.png)
![截圖 2026-03-16 02.22.50](https://hackmd.io/_uploads/BynbYONc-g.png)
- multi-processed socket server: if the server does not fork a child, it can serve only one client
  ![截圖 2026-04-11 16.40.40](https://hackmd.io/_uploads/ByVzttvnbg.png)
  ![截圖 2026-04-11 16.49.14](https://hackmd.io/_uploads/r1rMsFwn-g.png)
  child of the Server:
  ```c
  if( !fork() ){
      ...
      while( nBytes!=0 ){
          ...
      }
      printf("Close one socket.\n");
      close(newSocket);
      exit(0);
  ```
  Server:
  ```c
      while( 1 ){
          newSocket = ...
          ...
      }
      return 0;
  }
  ```
  Each child has its own process memory (copied from the parent):
  ![截圖 2026-04-11 16.53.50](https://hackmd.io/_uploads/S1FQ3YvhWe.png)
- multithreaded socket server: sometimes only a little data needs to be transferred and forking is unnecessary, so threads are used instead
  ![截圖 2026-04-11 16.40.53](https://hackmd.io/_uploads/Hy-7tYw3Zg.png)

Achieving multithreading on a single processor:
What do we need to run these concurrent tasks in a process (from the memory point of view)?
- Argc/argv: no need to change.
- Stack: needs to support concurrent access to different stack areas. Each thread can only access its own stack.
- Heap: shared by the threads and need to avoid access conflict among threads.
- Data: shared by the threads and need to avoid access conflict among threads.
- Text: read-only and can be shared without extra efforts.

From the CPU point of view:
- Registers for each thread: increase the number of registers on the CPU, or limit each thread's register usage.
- Cache: reserve part of the cache for each thread (but how?)

From the OS point of view:
- The threads of one process can share resources：physical and logical.
- Reduce the frequency and overhead of context switch.

![截圖 2026-04-11 17.00.45](https://hackmd.io/_uploads/Hkc6aYD3Zg.png)
![截圖 2026-03-16 02.43.06](https://hackmd.io/_uploads/rJevpTOE5Wg.png)
Server thread:
```c
void * socketThread( void *arg ){
...
/* loop while connection is live */
while( nBytes!=0 ){
    if ( nBytes = recv( newSocket, ...
        printf( "[%llu] Received: %s\n", ...
```
Server (two fragments):
```c
newSocket = accept( welcomeSocket, ...
if( pthread_create( &tid[i++], ...
    printf("Failed to create thread\n");
}
```
```c
while( i < nThread ){
    pthread_join( tid[i++],NULL );
    printf( "Thread %d joined.\n", i );
}
```
Some multithreaded versions:
![截圖 2026-03-16 14.36.48](https://hackmd.io/_uploads/BkTbHXBc-x.png)
![截圖 2026-03-16 14.40.50](https://hackmd.io/_uploads/BkAl8mr9Ze.png)
![截圖 2026-03-16 14.41.02](https://hackmd.io/_uploads/Skob8QBcZl.png)
Conclusion: concurrently running different functions, called multithreading, can be achieved in software, using compilers, system software, and operating systems.
#### TCB
What should change so the threads can share the memory space?
| PCB（process control block） | TCB（thread control block） |
| - | - |
| - process ID <br> - process state <br> - program counter <br> - CPU registers <br> - CPU scheduling information <br> - accounting information <br> - I/O status information | - current state <br> - registers <br> - status（EFLAGS） <br> - program counter（EIP） <br> - stack |
| ![截圖 2026-04-11 19.37.19](https://hackmd.io/_uploads/Sy5_z3P2bx.png) | ![截圖 2026-04-11 19.37.37](https://hackmd.io/_uploads/ryhtGnw2Wl.png) |

benefits：
- Responsiveness：may allow continued execution if part of process is blocked, especially important for user interfaces
- Resource Sharing：threads share resources of process, easier than shared memory or message passing
- Economy：cheaper than process creation, thread switching lower overhead than context switching
- Scalability：process can take advantage of multicore architectures
#### Amdahl's Law and Gustafson's Law
Amdahl’s Law：Identifies performance gains from adding additional cores to an application that has both serial and parallel components
Formula: $speedup \leq \frac{1}{S+\frac{1-S}{N}}$ ($S$ is the serial portion, $N$ the number of processing cores)
- e.g. if application is 75% parallel / 25% serial, moving from 1 to 2 cores results in speedup of 1.6 times（S = 0.25, N = 2）

![截圖 2026-04-11 20.24.17](https://hackmd.io/_uploads/SylKa2v3Zl.png)
when $N \rightarrow \infty$, $speedup \rightarrow \frac{1}{S}$（Serial portion of an application has disproportionate effect on performance gained by adding additional cores）
Drawbacks of Amdahl's Law:
- it only gives an UPPER bound, which is limited by the fraction of sequential execution (the sequential execution dominates the execution of a process)
- it equates the size of the problem (i.e., the size of the data) with the lines of code
- it assumes every processor is identical, but CPUs differ from GPUs, and in a big.LITTLE architecture not all cores are the same

Gustafson's law:
$speedup=s+p\times N=s+(1-s)\times N=N+(1-N)\times s$ ($N$ is the number of processors; $s$ and $p$ are the fractions of time spent executing the serial and parallel parts of the program on the parallel system, where $s+p=1$)
![截圖 2026-04-11 20.49.13](https://hackmd.io/_uploads/H1HImaDnbe.png)
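
A quick numeric check of both formulas with the numbers used above ($S = s = 0.25$, $N = 2$); plain C, nothing assumed beyond the two equations:

```c
#include <stdio.h>

double amdahl(double S, double N)    { return 1.0 / (S + (1.0 - S) / N); }
double gustafson(double s, double N) { return N + (1.0 - N) * s; }

int main(void){
    printf("Amdahl:    %.2f\n", amdahl(0.25, 2));    /* 1.60x */
    printf("Gustafson: %.2f\n", gustafson(0.25, 2)); /* 1.75x */
    return 0;
}
```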
### Multithreading Models
| User Level Thread（ULT） | Kernel Level Thread（KLT） |
| - | - |
| management done by user-level threads library | Supported by the Kernel |
| Three primary thread libraries： <br> - POSIX Pthreads <br> - Windows threads <br> - Java threads | Examples：virtually all general-purpose operating systems, including： <br> - Windows <br> - Linux <br> - Mac OS X <br> - iOS <br> - Android |

A mode switch between ULT and KLT requires a privilege change and thread switching. In some cases it requires locking certain kernel resources and takes significant overhead.
The two slides below are animated in the lecture video:
![截圖 2026-04-11 20.56.42](https://hackmd.io/_uploads/HJBfBTPhbe.png)
![截圖 2026-04-11 20.56.52](https://hackmd.io/_uploads/S1JmraPhWx.png)
Mapping between User and Kernel Threads
![截圖 2026-04-11 20.57.37](https://hackmd.io/_uploads/BypBB6Pn-l.png)
- Many-to-One Model（N:1）
  ![截圖 2026-04-11 20.59.48](https://hackmd.io/_uploads/rkyRB6vh-e.png)
  - Many user-level threads mapped to single kernel thread
  - One thread blocking causes all to block
  - Multiple threads may not run in parallel on multicore system because only one may be in kernel at a time
  - Few systems currently use this model：Solaris Green Threads、GNU Portable Threads
- One-to-One Model（1:1）
  ![截圖 2026-04-11 21.00.01](https://hackmd.io/_uploads/HJn0BpDnZg.png)
  - Each user-level thread maps to kernel thread
  - Creating a user-level thread creates a kernel thread
  - More concurrency than many-to-one
  - Number of threads per process sometimes restricted due to overhead
  - Example：Windows、Linux
- Many-to-Many Model（N:M）
  ![截圖 2026-04-11 21.00.12](https://hackmd.io/_uploads/Hkwy8TP3Wl.png)
  - Allows many user level threads to be mapped to many kernel threads
  - Allows the operating system to create a sufficient number of kernel threads
  - Windows with the ThreadFiber package
  - Otherwise not very common
- Two-level model
  ![截圖 2026-04-11 21.00.28](https://hackmd.io/_uploads/HyveUpw2-g.png)
  - Similar to N:M, except that it allows a user thread to be bound to kernel thread

Although the many-to-many model appears to be the most flexible of the models discussed, in practice it is difficult to implement. In addition, with an increasing number of processing cores appearing on most systems, limiting the number of kernel threads has become less important. As a result, most operating systems now use the one-to-one model.
### Implicit Threading
#### What is Implicit Threading
- Implicit Threading：Creation and management of threads done by compilers and run-time libraries rather than programmers
- Growing in popularity as the number of threads increases; program correctness is more difficult with explicit threads (code the programmer writes by hand may contain bugs)
- Five methods explored
    - Thread Pools
    - Fork-Join
    - OpenMP
    - Grand Central Dispatch
    - Intel Threading Building Blocks
#### Thread Pool
- Create a number of threads in a pool where they await work
- Advantages：
    - Usually slightly faster to service a request with an existing thread than create a new thread
    - Allows the number of threads in the application(s) to be bound to the size of the pool
    - Separating task to be performed from mechanics of creating task allows different strategies for running task（i.e. Tasks could be scheduled to run periodically）

Windows API supports thread pools：
```c
DWORD WINAPI PoolFunction( LPVOID Param ){
    /*
    * this function runs as a separate thread.
    */
}
```
Threads created before any request arrives are called worker threads:
- When no assignment, it waits on signals.
- When signaled（unlocked）, it executes the function of the assigned task.
- When done, it waits for next assignment.
- One worker thread may execute different functions.
- Worker threads require explicit freeing and shutdown.

Threadpool for Socket Server：
```c
int main(){
    int welcomeSocket, newSocket, portNum, clientLen, nBytes, i;
    char buffer[1024];
    ...
    assert( ( pool = threadpool_create( THREAD, QUEUE, 0 ) ) != NULL );
    ...
    while( i < nThread ){
        newSocket = accept( welcomeSocket, (struct sockaddr *) &serverStorage, &addr_size );
        i++;
        printf( "Accept %d connection.\n", i );
        
        // for each client request creates a thread and assign the client request   
        // to it to process 
        // so the main thread can entertain next request
        if( threadpool_add( pool, socketThread, &newSocket, 0 ) != 0 ) 
            printf( "Failed to add thread\n" );
    }
    ...
```
![截圖 2026-04-11 23.03.38](https://hackmd.io/_uploads/By_Rzyd3Zl.png)
Creating the thread pool:
```c
threadpool_t *threadpool_create( int thread_count, int queue_size, int flags ){
    if( thread_count <= 0 || thread_count > MAX_THREADS || queue_size <= 0 || queue_size > MAX_QUEUE ){
        return NULL;
    }

    threadpool_t *pool;
    int i;

    if( ( pool = (threadpool_t *)malloc(sizeof(threadpool_t))) == NULL ){
        goto err;
    }

    /* Initialize */
    pool->thread_count = 0;
    pool->queue_size = queue_size;
    pool->head = pool->tail = pool->count = 0;
    pool->shutdown = pool->started = 0;

    /* Allocate thread and task queue */
    pool->threads = (pthread_t *)malloc(sizeof(pthread_t) * thread_count);
    pool->queue = (threadpool_task_t *)malloc(sizeof(threadpool_task_t) * queue_size);

    /* Initialize mutex and conditional variable first */
    if( ( pthread_mutex_init( &(pool->lock), NULL ) != 0)  || ( pthread_cond_init( &(pool->notify), NULL ) != 0 ) || ( pool->threads == NULL ) || ( pool->queue == NULL ) ){
        goto err;
    }

    /* Start worker threads */
    for( i = 0; i < thread_count; i++ ){
        if( pthread_create( &(pool->threads[i]), NULL, threadpool_thread, (void*)pool ) != 0 ){
            threadpool_destroy( pool, 0 );
            return NULL;
        }
        pool->thread_count++;
        pool->started++;
    }

    return pool;

 err:
    if( pool ){
        threadpool_free( pool );
    }
    return NULL;
}
```
The thread pool is an array of threads:
```c
/**
 *  @struct threadpool
 *  @brief The threadpool struct
 *
 *  @var notify       Condition variable to notify worker threads.
 *  @var threads      Array containing worker threads ID.
 *  @var thread_count Number of threads
 *  @var queue        Array containing the task queue.
 *  @var queue_size   Size of the task queue.
 *  @var head         Index of the first element.
 *  @var tail         Index of the next element.
 *  @var count        Number of pending tasks
 *  @var shutdown     Flag indicating if the pool is shutting down
 *  @var started      Number of started threads
 */
struct threadpool_t {
  pthread_mutex_t lock;
  pthread_cond_t notify;
  pthread_t *threads;
  threadpool_task_t *queue;
  int thread_count;
  int queue_size;
  int head;
  int tail;
  int count;
  int shutdown;
  int started;
};
```
Assigning a task to the queue:
```c
int threadpool_add( threadpool_t *pool, void (*function)(void *), void *argument, int flags ){
    int err = 0;
    int next;

    if( pool == NULL || function == NULL ){
        return threadpool_invalid;
    }

    if( pthread_mutex_lock( &(pool->lock)) != 0 ){
        return threadpool_lock_failure;
    }

    next = pool->tail + 1;
    next = ( next == pool->queue_size ) ? 0 : next;

    do{
        /* Are we full ? */
        if( pool->count == pool->queue_size ){
            err = threadpool_queue_full;
            break;
        }

        /* Are we shutting down ? */
        if( pool->shutdown ){
            err = threadpool_shutdown;
            break;
        }

        /* Add task to queue */
        pool->queue[pool->tail].function = function;
        pool->queue[pool->tail].argument = argument;
        pool->tail = next;
        pool->count += 1;

        /* pthread_cond_broadcast */
        if( pthread_cond_signal( &(pool->notify) ) != 0 ){
            err = threadpool_lock_failure;
            break;
        }
    }while( 0 );

    if( pthread_mutex_unlock(&pool->lock) != 0 ){
        err = threadpool_lock_failure;
    }

    return err;
}
```
The worker thread:
```c
static void *threadpool_thread( void *threadpool ){
    threadpool_t *pool = (threadpool_t *)threadpool;
    threadpool_task_t task;

    for( ;; ){
        /* Lock must be taken to wait on conditional variable */
        pthread_mutex_lock( &(pool->lock) );

        /* Wait on condition variable, check for spurious wakeups.
           When returning from pthread_cond_wait(), we own the lock. */
        while( ( pool->count == 0 ) && ( !pool->shutdown ) ){
            pthread_cond_wait( &(pool->notify), &(pool->lock) );
        }

        if( ( pool->shutdown == immediate_shutdown ) || ( ( pool->shutdown == graceful_shutdown ) && ( pool->count == 0 ) ) ){
            break;
        }

        /* Grab our task */
        task.function = pool->queue[pool->head].function;
        task.argument = pool->queue[pool->head].argument;
        pool->head += 1;
        pool->head = ( pool->head == pool->queue_size ) ? 0 : pool->head;
        pool->count -= 1;

        /* Unlock */
        pthread_mutex_unlock( &(pool->lock) );

        /* Get to work */
        (*(task.function))(task.argument);
    }

    pool->started--;

    pthread_mutex_unlock( &(pool->lock) );
    pthread_exit( NULL );
    return( NULL );
}
```
A multithreaded client:
```c
int main(){
    int i = 0;
    pthread_t tid[nThread];
     
    while( i < nThread ){
        sleep( i*2 );	
        if( pthread_create( &tid[i], NULL, clientThread, NULL ) != 0 )
            printf( "Failed to create thread\n" );
        i++;
    }
    i = 0;
    while( i < nThread ){
        pthread_join( tid[i++],NULL );
        printf( "Thread %d joined.\n", i );
    }
    return 0;
}
```
Redundant worker threads:
Each connection uses different worker thread to send/recv messages
![截圖 2026-04-11 23.17.37](https://hackmd.io/_uploads/SJyQIkO2Wl.png)
![截圖 2026-04-11 23.18.09](https://hackmd.io/_uploads/HkTNL1_n-e.png)
Fewer worker threads:
Some connections have to re-use worker threads to send/recv messages.
Threads 410569 and 410571 are assigned more than once.
![截圖 2026-04-11 23.19.50](https://hackmd.io/_uploads/ryro8ydnWx.png)
![截圖 2026-04-11 23.20.30](https://hackmd.io/_uploads/B1IRIy_nWl.png)
Comparing resource usage:
- 3 worker threads
  ![截圖 2026-04-11 23.23.14](https://hackmd.io/_uploads/SJJuD1uhZg.png)
- 10 worker threads
  ![截圖 2026-04-11 23.23.29](https://hackmd.io/_uploads/ry6uvkdhZl.png)
- 100 worker threads (twice the memory is used for the additional threads)
  ![截圖 2026-04-11 23.23.46](https://hackmd.io/_uploads/HyecwyunZe.png)
#### Fork-Join Parallelism
![截圖 2026-04-11 23.25.33](https://hackmd.io/_uploads/S1qe_1_2-l.png)
- Multiple threads (tasks) are forked, and then joined.
    - Similar to the Divide-and-Conquer algorithm.
    - There are dependence among the tasks.
- In Thread Pool model, the tasks are mostly independent.
- General algorithm for fork-join strategy：
  ```
  Task( problem )
      if problem is small enough
          solve the problem directly
      else
          subtask1 = fork( new Task( subset of problem ) )
          subtask2 = fork( new Task( subset of problem ) )
          result1 = join( subtask1 )
          result2 = join( subtask2 )
          return combined result
  ```
  ![截圖 2026-04-11 23.28.50](https://hackmd.io/_uploads/Syl6_1uhWg.png)
 
OpenMP and GCD
### Threading Issues
- Semantics of `fork()` and `exec()` system calls
    - Does `fork()` duplicate only the calling thread or all threads？
        - Some UNIXes have two versions of fork:
            1. duplicates all threads
            2. duplicates only the thread that invoked the `fork()` system call.
        - `exec()` usually works as normal – replace the running process including all threads.
- Signal handling：Synchronous and asynchronous
    - Signals are used in UNIX systems to notify a process that a particular event has occurred.
    - A signal handler is used to process signals
        - Signal is generated by particular event
        - Signal is delivered to a process
        - Signal is handled by one of two signal handlers：default handler or user-defined
    - Every signal has default handler that kernel runs when handling signal
        - User-defined signal handler can override default
        - For single-threaded, signal delivered to process
    - Where should a signal be delivered for multi-threaded？
        - Deliver the signal to the thread to which the signal applies：
          ```c
          pthread_kill( pthread_t tid, int signal );
          ```
        - Deliver the signal to every thread in the process
        - Deliver the signal to certain threads in the process：each thread can choose to block certain signals
        - Assign a specific thread to receive all signals for the process
        - Without a specific assignment, because signals need to be handled only once, a signal is typically delivered only to the first thread found that is not blocking it.
- Thread cancellation of target thread：Asynchronous or deferred
    - Terminating a thread before it has finished
    - Thread to be canceled is target thread
    - Two general approaches:
        1. Asynchronous cancellation：terminates the target thread immediately
        2. Deferred cancellation：allows the target thread to periodically check if it should be cancelled
    - Pthread code to create and cancel a thread：
      ```c
      pthread_t tid;
      
      /* create the thread */
      pthread_create( &tid, 0, worker, NULL );
      ...
      /* cancel the thread */
      pthread_cancel( tid );
      
      /* wait for the thread to terminate */
      pthread_join( tid, NULL );
      ```
    - Invoking thread cancellation requests cancellation, but actual cancellation depends on thread state
    - Pthreads supports three cancellation modes
      `pthread_setcancelstate()`：changes the cancellation state for the calling thread, that is, whether cancellation requests are ignored or not
      `pthread_setcanceltype()`：changes the type of responses to cancellation requests for the calling thread：asynchronous immediate or deferred
      | Mode | State | Type |
      | - | - | - |
      | Off | Disabled | - |
      | Deferred | Enabled | Deferred |
      | Asynchronous | Enabled | Asynchronous|
    - If thread has cancellation disabled, cancellation remains pending until thread enables it
    - Default type is deferred（Cancellation only occurs when thread reaches cancellation point, i.e. `pthread_testcancel()` then cleanup handler is invoked）
    - On Linux systems, thread cancellation is handled through signals.
- Thread-local storage
    - Thread-local storage（TLS）allows each thread to have its own copy of data (see the sketch after this list)
    - Useful when you do not have control over the thread creation process（i.e. when using a thread pool）
    - Different from local variables
        - Local variables visible only during single function invocation
        - TLS visible across function invocations
    - Similar to static data
        - TLS is unique to each thread
- Scheduler Activations
    - Both M:N and Two-level models require communication to maintain the appropriate number of kernel threads allocated to the application
    - Typically use an intermediate data structure between user and kernel threads – lightweight process（LWP）
    ![截圖 2026-04-11 23.55.20](https://hackmd.io/_uploads/Sk4lJgu2-x.png)
        - Appears to be a virtual processor on which process can schedule user thread to run
        - Each LWP attached to kernel thread
    - How many LWPs to create？
        - Scheduler activations provide upcalls - a communication mechanism from the kernel to the upcall handler in the thread library.
        - This communication allows an application to maintain the correct number of kernel threads.
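
The TLS sketch referenced earlier: with GCC/Clang, the `__thread` storage class (spelled `_Thread_local` in C11) gives each thread its own copy of a file-scope variable, visible across every function that thread calls. A minimal example, not from the slides:

```c
#include <pthread.h>
#include <stdio.h>

__thread int counter = 0;   /* each thread gets its own copy */

void *worker(void *arg){
    for (int i = 0; i < 3; i++)
        counter++;          /* touches only this thread's copy */
    printf("my counter = %d\n", counter);   /* always prints 3 */
    return NULL;
}

int main(void){
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```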
## CH9 - Memory Management for Main Memory
### Background
Overview: how to manage memory for a limited number of processes on bare-metal (physical) main memory
![截圖 2026-04-12 00.25.17](https://hackmd.io/_uploads/ByFeUgOhWx.png)
From program to process:
![截圖 2026-04-12 00.27.51](https://hackmd.io/_uploads/rkL9Llu3-l.png)
This section mainly covers the last block: the instructions in memory.
Every process's memory is addressed starting from 00...0:
![截圖 2026-04-12 00.30.23](https://hackmd.io/_uploads/r1j7veOn-x.png)
That allows only one process to execute at a time; with multiple processes, a pointer can be used as below:
![截圖 2026-03-29 02.12.37](https://hackmd.io/_uploads/SkUQc9roWl.png)
But there is a problem: each process's addresses start from 00...0, which no longer matches once the pointer moves, so a number (the base) must be added to every address. This is called address binding.
A process should only be able to access its own address space, so besides the base register a limit register is also needed:
![截圖 2026-04-12 00.34.03](https://hackmd.io/_uploads/B1P-_e_hbe.png)
The CPU must check every memory access generated in user mode to be sure it is between base and limit for that user process.
The instructions that load the base and limit registers are privileged (ring 2 or lower).
![截圖 2026-04-12 00.35.38](https://hackmd.io/_uploads/SkdPOxO2Wg.png)
In more detail:
Address binding of instructions and data to memory addresses can happen at three different stages:
- compile time: if it is known a priori where the program will reside in memory, the compiler can generate **absolute code**; the drawback is that if the program's starting location later changes, the whole program must be recompiled
- load time: if the location is not known at compile time, the compiler generates **relocatable code**, and the actual binding is delayed until the program is loaded into memory
- execution time: if the process may be moved from one memory segment to another during execution (e.g., on a context switch or when swapping), binding must be delayed until run time; this requires hardware support such as base and limit registers (address maps)

| ![截圖 2026-04-12 01.05.32](https://hackmd.io/_uploads/ry5vy-u3Wl.png) | ![截圖 2026-04-12 01.06.22](https://hackmd.io/_uploads/Bk35k-d3-x.png) |
|-|-|

(An object module is the .o file produced by the compiler.)
With compile-time and load-time binding, the logical address (generated by the CPU; also referred to as virtual address) and the physical address (the address seen by the memory unit) are identical.
![截圖 2026-04-12 01.37.13](https://hackmd.io/_uploads/B1uAUWu3Wl.png)
The MMU uses a relocation register (simply the base register under another name) to add the base to the logical address (producing the physical address) and checks that the address is within the legal range.
The user program deals with logical addresses; it never sees the real physical addresses
- Execution-time binding occurs when reference is made to location in memory
- Logical address bound to physical addresses

![截圖 2026-04-12 01.59.55](https://hackmd.io/_uploads/BJaXnWd2Zl.png)
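
A software model of the check just described; `translate()` is hypothetical illustration code mimicking what the MMU does on every user-mode access:

```c
#include <stdio.h>
#include <stdint.h>

/* Logical addresses must fall below the limit; the MMU then adds the
   relocation (base) register to form the physical address. */
uint32_t translate(uint32_t logical, uint32_t base, uint32_t limit){
    if (logical >= limit){
        fprintf(stderr, "trap: addressing error (%u >= %u)\n", logical, limit);
        return 0;   /* real hardware traps to the OS instead of returning */
    }
    return base + logical;   /* physical address */
}

int main(void){
    printf("%u\n", translate(300, 14000, 120900));   /* prints 14300 */
    translate(200000, 14000, 120900);                /* takes the trap path */
    return 0;
}
```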
### Contiguous Memory Allocation on Physical Memory
#### Memory Allocation
- Main memory must store both OS and user processes.
    - Because of limited resource, must allocate efficiently.
    - Contiguous allocation is one（early）method.
- Main memory usually is divided into two partitions for：
    - Operating system：usually held in high/low memory with interrupt vector
    - User processes：usually held in low/high memory（Each process contained in single contiguous section of memory）

![截圖 2026-04-12 02.58.30](https://hackmd.io/_uploads/r1EJ9G_2-l.png)
Multiple-partition allocation
- Degree of multiprogramming limited by number of partitions.
- Variable-partition sizes for efficiency (sized to a given process’ needs).
- When a process arrives, its memory is allocated from a partition large enough to accommodate it
- When a process terminates
    - the memory is released as a free partition.
    - Adjacent free partitions can be combined to enlarge the partition.
- Hole：block of available memory; holes of various size are scattered throughout memory.
- Operating system maintains information for 
    1. allocated partitions
    2. free partitions（holes）

![截圖 2026-04-12 03.08.32](https://hackmd.io/_uploads/HJAN3M_hbx.png)
With many holes to choose from, which is best? There are three policies (see the sketch after this list):
1. First-fit: allocate the first hole that is big enough (best time complexity, since the other two must traverse the entire list and this one need not)
2. Best-fit: allocate the smallest hole that is big enough; must search the entire list, unless it is ordered by size
3. Worst-fit: allocate the largest hole; must also search the entire list
    - Produces the largest leftover hole
    - First-fit and best-fit are better than worst-fit in terms of speed and storage utilization
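
A sketch of first-fit over a singly linked free list (the `hole` struct is made up for illustration); best-fit and worst-fit would scan the entire list, tracking the smallest or largest adequate hole instead of returning early:

```c
#include <stddef.h>

/* A hole: `size` free bytes starting at address `start`. */
struct hole { size_t start, size; struct hole *next; };

/* First-fit: return the first hole large enough; the early return is
   why it is the fastest of the three policies. */
struct hole *first_fit(struct hole *free_list, size_t request){
    for (struct hole *h = free_list; h != NULL; h = h->next)
        if (h->size >= request)
            return h;
    return NULL;   /* no single hole fits: external fragmentation */
}
```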
#### Fragmentation
Fragmentation: there are many holes, none of which can hold a given process by itself, even though all of them together could.
![截圖 2026-04-12 03.17.06](https://hackmd.io/_uploads/Sykr0zO2-x.png)
To solve this, the fragments are moved together as shown below; but the bases change, so the physical addresses of data and instructions, which were generated by the loader, must be changed (relocated). This operation is called compaction.
![截圖 2026-04-12 03.17.43](https://hackmd.io/_uploads/ryvwCGO3Zg.png)
But compaction has an I/O problem (if the OS moves a process's memory while the process is doing I/O, the data will be written to the wrong place). Solutions:
- Latch (pin) the job in memory while it is involved in I/O
- Do I/O only into OS buffers

(The backing store has the same fragmentation problems.)
There are two kinds of fragmentation:
1. External fragmentation:
    total memory space exists to satisfy a request, but it is not contiguous
    With first-fit: given N blocks allocated, another 0.5 N blocks are lost to fragmentation, so one third of memory is unusable
2. Internal fragmentation: allocated memory may be slightly larger than the requested memory (the process gets more than it needs); this size difference is memory internal to a partition, but not being used
   
| ![截圖 2026-04-12 11.11.26](https://hackmd.io/_uploads/rkcvTFO3Wx.png) | ![截圖 2026-04-12 11.11.43](https://hackmd.io/_uploads/ryaOpYOhbg.png) | ![截圖 2026-04-12 11.12.04](https://hackmd.io/_uploads/HyMcTt_2We.png) |
| - | - | - |

#### Memory Relocation
Relocation registers are used to protect user processes from each other, and from changing operating-system code and data.
- Base register contains value of smallest physical address.
- Limit register contains range of logical addresses – each logical address must be less than the limit register.
- MMU maps logical address dynamically.
- Can then allow actions such as kernel code being transient and kernel changing size

![截圖 2026-04-12 11.32.58](https://hackmd.io/_uploads/rJO_zqdhbx.png)
### Paging to Manage Memory
#### Paging
Paging was designed to solve fragmentation.
Physical address space of a process can be noncontiguous; process is allocated physical memory whenever the latter is available
Paging can:
- Avoid external fragmentation
- Avoid problem of varying sized memory chunks
- Divide physical memory into fixed-sized blocks called frames (size is a power of 2, between 512 bytes and 16 MB)
- Divide logical memory into blocks of same size called pages
- Keep track of all free frames, rather than free holes.
- To run a program of N pages, need to find N free frames in order to load program
- Set up a page table to translate logical to physical addresses

The address generated by the compiler/loader is divided into:
- page number：used as an index into a page table which contains base address of each page in physical memory
- page offset：combined with base address to define the physical memory address that is sent to the memory unit.

Given a logical address space of size $2^m$ and a page size of $2^n$:
![截圖 2026-04-12 11.43.43](https://hackmd.io/_uploads/rJnlH9d2Zg.png)
Assuming the page size in logical memory equals the frame size in physical memory, translating a logical address to a physical address uses the page table to map a page index to a frame index:
![截圖 2026-04-12 11.48.07](https://hackmd.io/_uploads/S1S-8qu3Zg.png)
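
In code, the split and the lookup are just shifts and masks; a sketch assuming m = 32, n = 12 (4 KB pages) and a flat, fully populated page table:

```c
#include <stdint.h>

#define OFFSET_BITS 12                  /* n = 12 -> 4 KB pages */
#define PAGE_SIZE   (1u << OFFSET_BITS)

uint32_t translate(uint32_t logical, const uint32_t page_table[]){
    uint32_t p = logical >> OFFSET_BITS;    /* page number: index into the table */
    uint32_t d = logical & (PAGE_SIZE - 1); /* offset: unchanged by the mapping  */
    uint32_t f = page_table[p];             /* page number -> frame number       */
    return (f << OFFSET_BITS) | d;          /* physical address                  */
}
```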
Paging thus allows physical addresses to be noncontiguous:
![截圖 2026-04-12 11.48.26](https://hackmd.io/_uploads/rkufI5u2Wx.png)
Example:
![截圖 2026-04-12 11.52.07](https://hackmd.io/_uploads/HyBgw5d2bl.png)
Computing internal fragmentation（redone in the sketch after this list）：
- Suppose page size = 2048 bytes and process memory size = 72766 bytes：the process needs 35 pages + 1086 bytes, so internal fragmentation = 2048 - 1086 = 962 bytes
- Worst-case fragmentation：1 frame - 1 byte
- Average fragmentation：1/2 frame size
- So would a smaller frame size be better？
    - each page table entry takes memory to track
    - page sizes have been growing over time（Solaris supports two page sizes：8 KB and 4 MB）
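The arithmetic above as a sketch:

```python
import math

def internal_fragmentation(process_bytes, page_size):
    frames = math.ceil(process_bytes / page_size)   # frames actually allocated
    return frames * page_size - process_bytes       # unused tail of the last frame

print(internal_fragmentation(72766, 2048))   # 36 frames allocated -> 962 bytes wasted
```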

Free-frame list：
![截圖 2026-04-12 12.07.16](https://hackmd.io/_uploads/rybF99u2-e.png)
The page table can be inefficient, with a lot of overhead：
Page table is kept in main memory
- Page-table base register（PTBR）points to the page table.
- Page-table length register（PTLR）indicates size of the page table.

In this scheme every data/instruction access requires two memory accesses
- One for the page table and one for the data/instruction.
- Suppose one memory fetch takes 10 ns and one instruction takes 0.28–0.84 ns（on a 3.5 GHz single-core processor）
    - The time to execute one instruction becomes 0.28 + 10 × 2 = 20.28 ns, in which only 1.38% of the time is used for computation.

The two-memory access problem can be solved by the use of a special fast-lookup hardware cache called translation look-aside buffers（TLBs aka associative memory）
Two problems on estimating the page table size（both are redone in a code sketch after them）：
> Before solving these, you need to know the power-of-two prefixes
> | K | M | G | T | P | E |
> | - | - | - | - | - | - |
> | 2^10 | 2^20 | 2^30 | 2^40 | 2^50 | 2^60 |
1. Consider a system with a 32-bit logical address space. Assuming that each entry consists of 4 bytes and the page size in such a system is 4 KB（2^12）, each process may need up to **？** of physical address space for the page table alone
   - The page size is 2^12, so the page offset takes 12 bits
   - A logical address is the page number concatenated with the page offset, so page-number bits = logical-address bits - page-offset bits = 32 - 12 = 20 bits
   - With 20 page-number bits, the number of pages = 2^20 = 1 M
   - There are 1 M entries and each entry takes 4 bytes, so the total page table size = 1 M × 4 bytes = **4 MB**
2. Consider a system with a 64-bit logical address space. Assuming that each entry consists of 8 bytes and the page size in such a system is 4 KB（2^12）, each process may need up to **？** of physical address space for the page table alone
   - The page size is 2^12, so the page offset takes 12 bits
   - A logical address is the page number concatenated with the page offset, so page-number bits = 64 - 12 = 52 bits
   - With 52 page-number bits, the number of pages = 2^52 = 4 P
   - There are 4 P entries and each entry takes 8 bytes, so the total page table size = 4 P × 8 bytes = **32 PB**

#### Translation Look-Aside Buffer（TLB）
TLB（aka associative memory）stores a subset of page-mapping entries for all the processes in the system, to avoid fetching page tables and enhance computation efficiency.（In short：it caches frequently used page-number → frame-number mappings; the page number is the key, the frame number is the value, and the hardware searches the `(key, value)` pairs in parallel）
![截圖 2026-04-12 18.11.16](https://hackmd.io/_uploads/S1XCkxK3Wx.png)
TLBs typically small（64 to 1024 entries）
Looking up `(key, value) = (p, d)` has two possible outcomes：
- TLB hit（`p` is found）
- TLB miss（`p` is not found）：fall back to the page table to find the frame number `d`, then load `(p, d)` into the TLB to speed up the next lookup
    - a replacement policy must be considered（if the TLB is full, decide which entry to evict）
    - some entries can be wired down for permanent fast access

some TLBs store address-space identifiers（ASIDs）in each entry - an ASID uniquely identifies each process and provides address-space protection for that process（otherwise the TLB must be flushed at every context switch）
paging with TLB：
![截圖 2026-04-12 18.15.41](https://hackmd.io/_uploads/B1y1ZxYnbl.png)
Effective Access Time（EAT）：
- hit ratio：the percentage of TLB hits（an 80% hit ratio means the page number is found in the TLB 80% of the time）
- Suppose a memory access takes 10 ns：a TLB hit costs 10 ns in total, and a TLB miss costs 20 ns
- With an 80% hit ratio and 10 ns memory access, EAT = 0.8 × 10 + 0.2 × 20 = 12 ns, i.e. a 20% slowdown
- With a 99% hit ratio and 10 ns memory access, EAT = 0.99 × 10 + 0.01 × 20 = 10.1 ns, i.e. only a 1% slowdown
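The EAT arithmetic above as a tiny helper（assuming, as the notes do, that a hit costs one 10 ns memory access and a miss costs two）:

```python
def tlb_eat(hit_ratio, mem_ns=10):
    # hit: one memory access; miss: page-table access + actual access
    return hit_ratio * mem_ns + (1 - hit_ratio) * (2 * mem_ns)

print(tlb_eat(0.80))   # 12.0 ns -> 20% slowdown
print(tlb_eat(0.99))   # 10.1 ns ->  1% slowdown
```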
#### Memory Protection
- Memory protection implemented by associating protection bit with each frame to indicate if read-only or read-write access is allowed
- Can also add more bits to indicate page execute-only, and so on
- Valid-invalid bit attached to each entry in the page table
    - "valid"：the associated page is in the process’ logical address
    space, and is thus a legal page
    - "invalid"：the page is not in the process’ logical address space
    - Or use page-table length register（PTLR）
- Any violations result in a trap to the kernel

![截圖 2026-04-12 18.57.35](https://hackmd.io/_uploads/H1ln9eY2-l.png)
#### Shared Page
Multiple processes can share a single page
- Shared code
    - One copy of read-only（reentrant）code shared among processes（i.e. text editors, compilers, window systems）
    - Similar to multiple threads sharing the same process space
    - Also useful for interprocess communication if sharing of read-write pages are allowed
- Private code and data
    - Each process keeps a separate copy of the code and data
    - The pages for the private code and data can appear anywhere in the logical address space

| ![截圖 2026-04-12 19.03.50](https://hackmd.io/_uploads/B1LXheF3Ze.png) | ![截圖 2026-04-12 19.04.09](https://hackmd.io/_uploads/r1YV2lK2Zl.png) |
| - | - |

### Structure of the Page Table
Memory structures for paging can get huge using straight-forward methods
Consider a 32-bit logical address space as on modern computers and a page size of 4 KB（2^12）：each process requires 4 MB of physical address space for the page table alone.
The simple solution is to divide the page table into smaller units
There are three approaches：
1. hierarchical page tables
   Break up the logical address space into multiple page tables
   A simple technique is a two-level page table to page the page table
   ![截圖 2026-04-12 19.16.23](https://hackmd.io/_uploads/SyDz1-tnbx.png)
   A 32-bit logical address with a 4 KB page size has a 20-bit page number and a 12-bit offset; since the page table itself is paged, the page number is further divided into a 10-bit outer index and a 10-bit inner index
   ![截圖 2026-04-12 19.17.49](https://hackmd.io/_uploads/Sycv1-KhWx.png)
   p1 = index to outer page table, p2 = displacement within inner page table（aka forward-mapped page table; see the address-split sketch after this list）
   ![截圖 2026-04-12 19.19.31](https://hackmd.io/_uploads/B1SC1ZK2Wl.png)
   For a 64-bit logical address space：with a 4 KB（2^12）page size, the page table has 2^52 entries; with two levels, the inner page tables hold 2^10 4-byte entries, and the address looks like this：
   ![截圖 2026-04-12 19.24.55](https://hackmd.io/_uploads/B1NM--thWg.png)
   The outer page table then has 2^42 entries, which is still a lot, so another outer level could be added, but the 2nd outer page table would still have 2^32 entries：
   ![截圖 2026-04-12 19.27.12](https://hackmd.io/_uploads/Bk1j--Fhbe.png)
   The 64-bit UltraSPARC would require seven levels of paging, which costs too much, so hierarchical paging is inappropriate for 64-bit architectures
2. hashed page tables
   Common when the memory spaces are addressed by more than 32 bits.
   The virtual page number is hashed into a page table to avoid huge table
   This page table contains a chain of elements hashing to the same location, each element contains virtual page number（for matching）, value of the mapped page frame（for mapping）, pointer to the next element
   Virtual page numbers are compared in this chain searching for a match; if a match is found, the corresponding physical frame is extracted
   ![截圖 2026-04-12 19.37.21](https://hackmd.io/_uploads/SkMbNWt2Zl.png)
   Variation for 64-bit addresses is clustered page tables
   - Similar to hashed but each entry refers to several pages（such as 16）rather than 1
   - Especially useful for sparse address spaces（where memory references are non-contiguous and scattered）
3. inverted page tables
   Traditional：each process has its own page table recording the pages it uses
   Inverted：each frame records the page number and the process ID / ASID of the page occupying it
   Decreases memory needed to store each page table, but increases time needed to search the table when a page reference occurs
   ![截圖 2026-04-12 20.33.36](https://hackmd.io/_uploads/S1kV-zFh-e.png)
   What about shared memory then？It fails（a page fault occurs ☹️）.
   Multiple processes contend for the same entry, which is a disaster：
   ![截圖 2026-04-12 20.40.35](https://hackmd.io/_uploads/HJkRGGYhZe.png)
   Inverted page tables usually require an ASID to be stored in each entry of the page table
   The ASID uniquely identifies each process and is used to provide address-space protection for that process（some TLBs store ASIDs in each TLB entry; in an inverted page table, the pid assumes the role of the ASID）
   When resolving virtual page numbers, the OS ensures that the ASID of the currently running process matches the ASID associated with the virtual page.
   Use hash table to limit the search to one — or at most a few — page-table entries（TLB can accelerate access）
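For the 32-bit two-level layout in item 1（10-bit p1, 10-bit p2, 12-bit offset）, extracting the three fields is just shifting and masking; a sketch:

```python
P1_BITS, P2_BITS, OFFSET_BITS = 10, 10, 12   # field widths from the figure

def split_two_level(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    p2 = (addr >> OFFSET_BITS) & ((1 << P2_BITS) - 1)   # index into inner page table
    p1 = addr >> (OFFSET_BITS + P2_BITS)                # index into outer page table
    return p1, p2, offset

print(split_two_level(0x12345678))   # (72, 837, 1656)
```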
### Swapping
When the total memory space requirements of processes exceeds physical memory, a process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution.
- Backing store – fast disk large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images
- Roll out, roll in – swapping variant used for priority-based scheduling algorithms; lower-priority process is swapped out so higher-priority process can be loaded and executed.

Major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped.
System maintains a ready queue of ready-to-run processes which have memory images on disk.
![截圖 2026-04-12 20.52.18](https://hackmd.io/_uploads/Bk-qHMKhWe.png)
The process memory is stored in a contiguous address space, which is reserved by the operating system for fast and sequential access.
This method is **not** used in modern operating systems to support memory paging.
Does the swapped out process need to swap back in to same physical addresses？
- Depends on address binding method
- Plus consider pending I/O to / from process memory space

Modified versions of swapping are found on many systems（e.g. UNIX, Linux, and Windows）：
- （Standard）Swapping normally disabled
- Started if more than threshold amount of memory allocated
- Disabled again once memory demand reduced below threshold

Context-switch time must account for swapping（swap out and swap in）：
- If next processes to be put on CPU is not in memory, need to swap out a process and swap in target process
- Context switch time can then be very high
- 100 MB process swapping to hard disk with transfer rate of 50 MB/sec
    - Swap out time of 2000 ms
    - Plus swap in of same sized process
    - Total context switch swapping component time of 4000 ms（4 seconds）
- Swap time can be reduced by reducing the amount of memory swapped – by knowing how much memory is really being used（system calls inform the OS of memory use via `request_memory()` and `release_memory()`）
- Other constraints as well on swapping
    - Pending I/O – can't swap out as I/O would occur to wrong process
    - Or always transfer I/O to kernel space, then to I/O device（Known as double buffering, adds overhead）
- Standard swapping not used in modern operating systems
    - But modified version common（Swap only when free memory extremely low）

swapping with paging：
![截圖 2026-04-12 21.09.40](https://hackmd.io/_uploads/HkzstMYnWx.png)
- Swapping contiguous memory for processes is challenging.
    - Need to find the victim, which uses similar memory, has lower priority, and not ready to run.
    - Otherwise, the fragmentation will become worse.
- Swapping with paging can avoid the aforementioned issues.
## CH10 - Virtual Memory
### Background
Virtual memory – separation of user logical memory from physical memory
- Only part of the program needs to be in memory for execution
- Logical address space can therefore be much larger than physical address space.
- Allows physical address spaces to be shared by several processes
- Allows for more efficient process creation
- More programs running concurrently
- Less I/O needed to load or swap processes

Only physical addresses can be used to access data and instructions in main memory.
In executable files, the addresses are either logical addresses or virtual addresses.
- In operating systems supporting old CPUs, virtual memory addresses are the same as logical memory addresses.
- In operating systems supporting modern CPUs, they may not be the same.
    - Modern CPUs support segment selectors to switch addressing between segments：code, data, stack, etc.
    - For 64-bit processors, a 16-bit segment selector and a 32-bit offset, 48 bits in total, are used to address the memory.

x86 memory segmentation refers to the implementation of memory segmentation in the Intel x86 computer instruction set architecture.
Segmentation was introduced on the Intel 8086 in 1978 as a way to allow programs to address more than 64 KB（65,536 bytes）of memory.
The Intel 80286 introduced a second version of segmentation in 1982 that added support for virtual memory and memory protection.
- In both real and protected modes, the system uses 16-bit segment registers to derive the actual memory address.
- In real mode, the registers CS, DS, SS, and ES point to the currently used program code segment（CS）, the current data segment（DS）, the current stack segment（SS）, and one extra segment determined by the programmer（ES）. The Intel 80386, introduced in 1985, adds two additional segment registers, FS and GS, with no specific uses defined by the hardware. The way in which the segment registers are used differs between the two modes.

logical memory vs. virtual memory：
| | logical memory | virtual memory |
| - | - | - |
| base | 1 | several（one for each segment selector） |
| offset | ranges the entire addressable memory | ranges one segment only for memory protection |

Why design multiple bases and restrict the offset？Example：the text segment is read-only; with this segmented virtual-memory mechanism, the system can prevent a program from accidentally accessing or corrupting the read-only code segment through data-segment addressing.
Virtual address space – logical view of how process is stored in memory
- Usually start at address 0, contiguous addresses until end of segment space
- Meanwhile, physical memory organized in page frames
- MMU must map logical to physical

Virtual memory can be implemented via：
- Demand paging
- Demand segmentation

Virtual Memory Space Can be Greater Than Physical Memory
Separation of logical memory as perceived by developers from physical memory
- Allow an extremely large virtual memory to be provided for programmers when only a smaller physical memory is available
- Make the task of programming much easier

![截圖 2026-04-13 02.32.53](https://hackmd.io/_uploads/ByrDSPt2be.png)

Usually design logical address space for stack to start at Max logical address and grow "down" while heap grows "up"
- Maximizes address space use
- Unused address space between the two is a hole
- No physical memory needed until heap or stack grows to a given new page.

Enables sparse address spaces with holes left for growth, dynamically linked libraries, etc.
System libraries shared via mapping（not loading）into virtual address space.
Shared memory by mapping（not loading）pages read-write into virtual address space
![截圖 2026-04-13 02.38.11](https://hackmd.io/_uploads/r1liUwt3Wx.png)

shared library using virtual memory：
Virtual memory allows files and memory to be shared by two or more processes through page sharing.
- System libraries such as the standard C library can be shared by several processes through mapping of the shared object into a virtual address space.
- Processes can share memory.
- Pages can be shared during process creation with the `fork()` system call.

![截圖 2026-04-13 02.40.11](https://hackmd.io/_uploads/HJufDPY3Zg.png)
### Demand Paging
#### Demand Paging Concept
Bring a page into memory only when it is needed（**not when the process starts**）
- Less I/O needed, no unnecessary I/O
- Less memory needed
- Faster response
- More users

Similar to paging system with swapping
Page is needed, then reference to it
- invalid reference：abort
- not-in-memory：bring to memory

Lazy swapper：never swaps a page into memory unless the page will be needed（a swapper that deals with pages is a pager）
![截圖 2026-04-13 02.52.09](https://hackmd.io/_uploads/SywJqDtnZg.png)
With swapping, the pager would have to guess which pages will be used before the process is swapped out again; instead of swapping in a whole process, the pager brings only those pages into memory
How to determine that set of pages？Need new MMU functionality to implement demand paging
- If pages needed are already memory resident：No difference from non demand-paging
- If page needed and not memory resident：Need to detect and load the page into memory from storage
    - Without changing program behavior
    - Without programmer needing to change code

Valid-Invalid Bit：
![截圖 2026-04-13 03.01.12](https://hackmd.io/_uploads/H1w-nPF3bl.png)
- With each page table entry a valid–invalid bit is associated（v：in-limit/in-memory – memory resident, i：not-in-limit or not-in-memory）
- Initially the valid–invalid bit is set to i on all entries
- During MMU address translation, if the valid–invalid bit in a page table entry is i：page fault
- While a process is executing, some pages are in memory and some pages are in secondary storage; this situation can also be expressed with the valid–invalid bit
  ![截圖 2026-04-13 03.02.49](https://hackmd.io/_uploads/r1PDnPF3bl.png)
  - Valid：the associated page is legal and in memory
  - Invalid：the page is either not valid, i.e. not in limit（#6 and #7）, or valid but currently in secondary storage, i.e. not in memory（#1, #3, and #4）
#### Page Fault Handling
- Demand paging is also used to load logical memory pages on demand.
- Swapped-in/out pages need NOT occupy contiguous memory space.
- Swap-in/out is carried out by pages in logical memory / frames in physical memory.
- Pure demand paging：
    - No page is loaded before first instruction.
    - The process starts on the first instruction, which is stored on a non-resident memory page and causes page fault.
    - Each page fault loads the required memory page into physical memory.

steps in handling page fault：
- If there is a reference to a page, first reference to that page will trap to operating system
    - Page fault
- Operating system looks at an internal table（kept with the PCB）to decide：
    - Invalid reference（Out of range）：abort
    - Just not in memory（page fault）：page in
- Steps for page in（condensed into a pseudocode sketch after the figures）：
  ![截圖 2026-04-13 03.34.01](https://hackmd.io/_uploads/rJVhm_K2-x.png)
    - Find free frame in physical memory.
      ![截圖 2026-04-13 03.35.29](https://hackmd.io/_uploads/ry2-NuF3bx.png)
    - Swap page into frame via scheduled disk operation
      ![截圖 2026-04-13 03.34.51](https://hackmd.io/_uploads/rJBk4_F2bl.png)
    - Reset tables to indicate page now in memory, Set validation bit = v
      ![截圖 2026-04-13 03.35.50](https://hackmd.io/_uploads/rJlm74uKhZe.png)
    - Restart the instruction that causes the page fault
      ![截圖 2026-04-13 03.35.00](https://hackmd.io/_uploads/BykxNOF3bl.png)
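The four steps condensed into a handler sketch; every name here is illustrative（a dict stands in for physical memory and the backing store）, not a real kernel API:

```python
def handle_page_fault(page, page_table, free_frames, backing_store, memory):
    """Sketch of the 'page in' path from the steps above."""
    frame = free_frames.pop()               # 1. find a free frame
    memory[frame] = backing_store[page]     # 2. scheduled disk read into the frame
    page_table[page] = (frame, "v")         # 3. update the table, set valid bit = v
    return frame                            # 4. trap returns; the instruction restarts

page_table = {0: (None, "i")}
memory, free_frames = {}, [3, 7]
backing_store = {0: b"page-0 contents"}
handle_page_fault(0, page_table, free_frames, backing_store, memory)
print(page_table[0])   # (7, 'v')
```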
#### Aspects of Demand Paging
Locality of reference
- Theoretically, some programs could access several new pages of memory with each instruction execution（if one instruction touches many pages and misses on all of them, a single instruction triggers several page faults）
    - Example：one page for the instruction and many for data
    - Lead to unacceptable system performance
- Analysis of running processes shows that this behavior is exceedingly unlikely

Hardware support（same as paging and swapping）
- Page table with a valid-invalid bit（or other protection bits）
- Secondary memory（swap device with swap space）
#### Instruction Restart
A page fault may occur at any memory reference
- If the page fault occurs on the instruction fetch, we can restart by fetching the instruction again
- If a page fault occurs while we are fetching an operand, we must fetch and decode the instruction again and then fetch the operand
    - Example：add A and B as C
    ![截圖 2026-04-13 09.05.50](https://hackmd.io/_uploads/By2u-6F2Wg.png)
        - Fetch and decode the instruction（ADD）
        - Fetch A
        - Fetch B
        - Add A and B
        - Store the sum in C（Page Fault）
            - If C is in a page not currently in memory, we will have to get the desired page, bring it in, correct the page table, and restart the instruction（all steps above）

Major difficulty：one instruction modifies different locations
- Example: an instruction moves some bytes from one location to another（possibly overlapping）location
    - If either block（source or destination）straddles a page boundary, a page fault may occur after the move is partially done
    - If the source and destination blocks overlap, the source block may have been modified, in which case we cannot simply restart the instruction
- solution：
    - Add microcode that attempts to access both ends of both blocks and triggers any page faults before anything is modified（before the instruction actually changes any data, the microcode touches both ends of each block, so a page fault fires before data is overwritten）
    - Use temporary registers to hold the values of overwritten locations, and, if there is a page fault, all the old values are written back into memory before the trap

#### Free-Frame List
- When a page fault occurs, the operating system must bring the desired page from secondary storage into main memory.
    - Most operating systems maintain a free-frame list, a pool of free frames.
      ![截圖 2026-04-13 09.16.07](https://hackmd.io/_uploads/HkDJETtn-e.png)
    - Free frames must also be allocated when the stack or heap segments from a process expand.
- Operating systems typically allocate free frames using a technique known as zero-fill-on-demand：the content of a frame is zeroed out before it is allocated.
- When a system starts up, all available memory is placed on the free-frame list.
#### Performance of Demand Paging
3 major activities：
1. Service the interrupt：careful coding means just several hundred instructions are needed：1–100 microseconds
2. Read the page：lots of time（several milliseconds）
3. Restart the process：again just a small amount of time（1–100 microseconds）

The worst-case sequence of steps：
1. Trap to the operating system
2. Save the user registers and process state
3. Determine that the interrupt was a page fault
4. Check that the page reference was legal and determine the location of the page on the disk
5. Issue a read from the disk to a free frame:
    - Wait in a queue for this device until the read request is serviced
    - Wait for the device seek and/or latency time
    - Begin the transfer of the page to a free frame
6. While waiting, allocate the CPU to some other user
7. Receive an interrupt from the disk I/O subsystem（I/O completed）
8. Save the registers and process state for the other user
9. Determine that the interrupt was from the disk
10. Correct the page table and other tables to show page is now in memory
11. Wait for the CPU to be allocated to this process again
12. Restore the user registers, process state, and new page table, and then resume the interrupted instruction

page fault rate：$0 \leq p \leq 1$
- $p=0$：no page faults
- $p=1$：every reference is a fault

Effective Access Time（EAT）= $(1-p) \times($memory access$)+ p \times($page fault overhead $+$ swap page out $+$ swap page in$)$
- Example
    - Memory access time = 200 nanoseconds
    - Average page-fault service time = 8 milliseconds = 8000000 nanoseconds
    - EAT = $(1-p)\times 200+p\times 8000000$ = $200+p\times 7999800$
    - If 1 access out of 1000 causes a page fault, i.e. $p$ = 0.001, then EAT = 8.2 microseconds. This is a slowdown by a factor of 40（painfully slow compared to 200 ns）
    - To keep the performance degradation $\leq$ 10 percent（EAT $\leq$ 220 ns）：
      $220 \geq 200 + 7,999,800 \times p$
      $20 \geq 7,999,800 \times p$
      $p \leq .0000025$
      i.e. at most one page fault in every 400,000 memory accesses

Demand Paging Optimizations：
- Swap space I/O faster than file system I/O even if on the same device（Swap allocated in larger chunks, less management needed than file system）
- Copy entire process image to swap space at process load time
    - Then page in and out of swap space
    - Used in older BSD Unix
- Demand page in from the program binary on disk, but discard rather than page out when freeing the frame
    - Used in Solaris and current BSD
    - Still need to write to swap space
        - Pages not associated with a file（like stack and heap）– anonymous memory
        - Pages modified in memory but not yet written back to the file system
- Mobile systems
    - Typically don’t support swapping
    - Instead, demand page from file system and reclaim read-only pages（such as code）
### Copy-on-Write
When `fork()` creates a child, the child should get its own copy of each page, but COW first lets parent and child share the same pages; a page is copied only when one of them writes to it, and only the page that is modified gets copied
COW allows more efficient process creation as only modified pages are copied
| ![截圖 2026-04-13 10.19.26](https://hackmd.io/_uploads/Hyj3fAKhWx.png) | ![截圖 2026-04-13 10.19.36](https://hackmd.io/_uploads/HyJRGRt3bx.png) |
| - | - |

In general, free pages are allocated from a pool of zero-fill-on-demand pages
- Pool should always have free frames for fast demand page execution（don't want to have to free a frame as well as do other processing on a page fault）
- Why zero-out a page before allocating it?

`fork()` can speed up the process by using COW. However, `vfork()` has parent suspend and child using address space of parent
- Designed to have child call `exec()`
- No need to use COW.
- Often used to implement UNIX command-line shell interfaces.

### Page Replacement
Over-booking / over-allocating memory：
Because not all pages will be used, it is not necessary to reserve physical memory for the entire virtual address space.
- By allocating only 50% of the requested memory, we can（theoretically）double the number of processes in the system.

Similarly, there are demands from the kernel, I/O buffers, etc.
Consequently, the system over-allocates memory; page replacement is needed when processes actually use more than is available：
![截圖 2026-04-13 16.03.44](https://hackmd.io/_uploads/B1nw7XqhWl.png)
How much to allocate to each, including processes, kernel, and I/O buffers？
Page replacement – find some page in memory, but not really in use, page it out
- Algorithm – terminate？swap out the process？replace the page？
- Performance – need an algorithm which will result in minimum number of page faults

Same page may be brought back and forth into memory several times.
Prevent over-allocation of memory by modifying page-fault service routine to include page replacement：
![截圖 2026-04-13 16.16.30](https://hackmd.io/_uploads/rytPUXc3-x.png)
Basic Page Replacement：
1. find the location of the desired page on disk
2. Find a free frame
    - If there is a free frame, use it
    - If there is no free frame, use a page replacement algorithm to select a victim frame
    - write victim frame to disk if dirty
        - Use modify (dirty) bit to reduce overhead of page transfers – only modified pages are written to disk
            - Each page or frame has a modify bit associated with it in the hardware to indicate if the page has been modified
            - When we select a page for replacement
              if the modify bit is set：write the page to storage
              if the modify bit is clear：no need to write the page to storage（it has not been modified since it was read in）
            - It reduces I/O time by one-half if the page has not been modified.
3. bring the desired page into the（newly）free frame; update the page and frame tables
4. continue the process by restarting the instruction that caused the trap

Note：now there are potentially 2 page transfers per page fault – increasing EAT.
Frame-allocation algorithm：decides how many frames to give each process and which frames to replace
Page-replacement algorithm：reduce/minimize the page-fault rate on both first access and re-access
![截圖 2026-04-13 17.01.38](https://hackmd.io/_uploads/S1x-bNcnbe.png)
Evaluate an algorithm by running it on a particular string of memory references（reference string）and computing the number of page faults on that string（given a reference string and a fixed number of frames, see which algorithm produces the fewest page faults）
- The string is just page numbers, not full addresses
- Repeated access to the same page does not cause a page fault
- Results depend on the number of frames available

FIFO Page Replacement：
- Create a FIFO queue to hold all pages in memory
- Replace a page at the head of the queue
- Insert a page at the tail of the queue
- 15 page faults in the running example `7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1`（reproduced by the simulation sketch after the figures）：
  ![截圖 2026-04-13 21.26.18](https://hackmd.io/_uploads/rkvZ1u92bx.png)
  ![截圖 2026-04-13 21.26.29](https://hackmd.io/_uploads/SJGzyO52bg.png)
  ![截圖 2026-04-13 21.26.41](https://hackmd.io/_uploads/ByCzJ_9n-x.png)
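A short simulation（illustrative, with the resident set tracked alongside the FIFO queue）reproduces the 15 faults above with 3 frames:

```python
from collections import deque

def fifo_faults(refs, nframes):
    queue, resident, faults = deque(), set(), 0
    for page in refs:
        if page in resident:
            continue                          # repeated access: no fault
        faults += 1
        if len(resident) == nframes:          # memory full: evict head of queue
            resident.discard(queue.popleft())
        queue.append(page)
        resident.add(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3))   # 15
```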
#### Belady's Anomaly
For some page-replacement algorithms, the page-fault rate may increase as the number of allocated frames increases
Reference string `1 2 3 4 1 2 5 1 2 3 4 5`：
- If # of frames = 1, # of page faults = 12
  ![截圖 2026-04-13 21.29.45](https://hackmd.io/_uploads/SJt0yOqh-e.png)
- If # of frames = 2, # of page faults = 12
  ![截圖 2026-04-13 21.29.59](https://hackmd.io/_uploads/B1rkld52We.png)
- If # of frames = 3, # of page faults = 9
  ![截圖 2026-04-13 21.30.19](https://hackmd.io/_uploads/ryFxl_5nWe.png)
- If # of frames = 4, # of page faults = **10**
  ![截圖 2026-04-13 21.30.29](https://hackmd.io/_uploads/HJeM-edqn-l.png)
- If # of frames ≥ 5, # of page faults = 5
  ![截圖 2026-04-13 21.30.47](https://hackmd.io/_uploads/HJ4fgO92Zx.png)

Giving more memory to a process does not necessarily improve its performance

FIFO Illustrating Belady’s Anomaly：
![截圖 2026-04-13 21.31.33](https://hackmd.io/_uploads/SJzrxd9nbe.png)
#### Optimal Page Replacement
NO Belady’s Anomaly（all stack-based algorithms assign each page a priority by a criterion that is completely independent of the number of frames the system currently has）
Replace the page that will not be used for the longest period of time
Guarantee the lowest number of page faults for a fixed reference string（It requires future knowledge of the reference string）
9 page faults in the running example `7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1`（15 page faults while using FIFO）
![截圖 2026-04-13 21.35.35](https://hackmd.io/_uploads/Hy4N-_92bl.png)
The optimal algorithm is used mainly for comparison studies
#### LRU Page Replacement
NO Belady’s Anomaly
Replace the page that has not been used for the longest period of time
It is like the optimal page replacement looking backward in time, rather than forward
12 page faults in the running example `7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1`（reproduced by the sketch below）
![截圖 2026-04-13 21.40.54](https://hackmd.io/_uploads/BJQ_zOc2-g.png)
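The same kind of simulation, with recency kept in an `OrderedDict`（least recently used entry at the front）, reproduces the 12 faults:

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    lru, faults = OrderedDict(), 0
    for page in refs:
        if page in lru:
            lru.move_to_end(page)        # hit: now the most recently used
            continue
        faults += 1
        if len(lru) == nframes:
            lru.popitem(last=False)      # evict the least recently used page
        lru[page] = True
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(lru_faults(refs, 3))   # 12
```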
Often used and considered to be good
Two ways to implement it：
1. Counter implementation：
    - Every page entry has a counter; every time page is referenced through this entry, copy the clock into the counter
    - When a page needs to be changed, look at the counters to find smallest value
    - Search through table needed
2. Stack implementation：
    - Keep a stack of page numbers in a double link form
    - Page referenced：
        - move it to the top
        - requires 6 pointers to be changed
    - But each update becomes more expensive
    - No search for replacement（the victim is simply the page at the bottom of the stack, no traversal needed）
    - Use of a stack to record most recent page references：
      ![截圖 2026-04-13 22.00.27](https://hackmd.io/_uploads/HyiZvOq2-x.png)

Not many computer systems provide sufficient hardware support for true LRU page replacement（The updating of the clock fields or stack must be done for every memory reference）
Many systems provide help in the form of a reference bit
- The reference bit（in the page table）of a page is set to 1 by the hardware when the page is referenced
- We can determine which pages have（or have not）been used after their reference bits are set to 0（initially and periodically）
  ![截圖 2026-04-13 22.28.38](https://hackmd.io/_uploads/r17i6Oc3-x.png)
  ![截圖 2026-04-13 22.28.47](https://hackmd.io/_uploads/Sy2spu52bl.png)
  ![截圖 2026-04-13 22.28.57](https://hackmd.io/_uploads/ryInad5nZl.png)
    However, we do not know the order of use

Examples of variations
- Additional-reference-bits algorithm
    - Gain ordering information by recording the reference bits at regular intervals
        - At regular intervals（e.g. every 100 milliseconds）, a timer interrupt transfers control to the operating system which 
            - Shift the reference bit for each page into the high-order bit
            - Shift the other bits right
        - Example: keep an 8-bit byte for each page in a table in memory
            - 00000000 vs. 11111111 vs. 11000100 vs. 01110111（interpreted as unsigned integers, the lowest value is the LRU page; e.g. 11000100 was used more recently than 01110111）
        - The number of bits can be varied 
            - In the extreme case, it can be zero, leaving only the reference bit itself (second-chance page-replacement algorithm)
- Second-chance algorithm
    - A FIFO replacement algorithm with reference bits
    - When a page has been selected according to FIFO Replacement, inspect its reference bit
      if it is 0（not referenced）：replace the page
      if it is 1：give the page a second chance and move on to the next FIFO page（getting a second chance means the reference bit is cleared to 0 and the arrival time is reset to the current time, so the page will not be replaced until all other pages have been replaced or given second chances）
      ![截圖 2026-04-13 22.44.47](https://hackmd.io/_uploads/SJ3vbYc2We.png)
    - the reference bit of a page is set to 1 when the page is referenced（a page that is used often enough to keep its bit set to 1 is never replaced）
    - One way to implement the second-chance algorithm is as a circular queue（aka clock algorithm; see the sketch after this list）
    - In the worst case it degenerates into FIFO replacement
- Enhanced second-chance algorithm
    - Consider the reference bit and the modify bit（described earlier）as an ordered pair
      `(0, 0)`：neither recently referenced nor modified（best page to replace, i.e. no need to page out）
      `(0, 1)`：not recently referenced but modified（needs to be written out before replacement）
      `(1, 0)`：recently referenced but clean（probably will be used again soon）
      `(1, 1)`：recently referenced and modified（probably will be used again soon, and needs to be written out before replacement）
    - Use the same scheme as in the clock algorithm and examine the class to which that page belongs
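A compact sketch of the clock form of second-chance mentioned above（list-based, with a parallel list of reference bits; all names are illustrative）: the hand sweeps forward, clearing reference bits, until it finds a page whose bit is already 0:

```python
def clock_evict(frames, ref_bits, hand):
    """Return (index of the victim frame, new hand position)."""
    while True:
        if ref_bits[hand] == 0:                   # no second chance left: evict
            return hand, (hand + 1) % len(frames)
        ref_bits[hand] = 0                        # referenced: clear bit, move on
        hand = (hand + 1) % len(frames)

frames, ref_bits = [7, 0, 1], [1, 0, 1]
print(clock_evict(frames, ref_bits, hand=0))      # (1, 2): page 0 is the victim
```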
#### Counting-Based Page Replacement
Keep a counter of the number of references that have been made to each page
This yields two replacement policies：
1. Least frequently used（LFU）page-replacement algorithm
    - Requires that the page with the smallest count be replaced（a page with a large count is presumably in active use and should not be replaced）
2. Most frequently used（MFU）page-replacement algorithm
    - Requires that the page with the largest count be replaced（the argument：a page with a small count was probably just brought in and has yet to be used, so give it a chance）
#### Page-Buffering Algorithms
Keep a pool of free frames 
- When a page fault occurs, a victim frame is chosen as before 
- The desired page is read into a free frame from the pool before the victim is written out（Restart the process as soon as possible）

Maintain a list of modified pages
- When the paging device is idle, a modified page is selected and written to secondary storage, and its modify bit is then reset（increases the probability that a page will be clean）

Keep a pool of free frames but remember which page was in each frame
- When a page fault occurs, check whether the desired page is in the free-frame pool to reduce paging-in overhead. If not, select a free frame and read into it
#### Applications and Page Replacement
Applications accessing data through the operating system's virtual memory may perform worse
- Some applications understand their memory use and storage use better than an operating system, e.g. database and data warehouse
- Twice the memory is being used for a set of I/O（Both the operating system and the application are buffering I/O）

Some operating systems provide a raw disk：A large sequential array of logical blocks, without any file-system data structure（Bypass all the file-system services, such as file I/O demand paging, file locking, prefetching, space allocation, file names, and directories）
### Allocation of Frames
Each process needs minimum number of frames, defined by the computer architecture.
- Performance: as the number of frames allocated to each process decreases, the page-fault rate increases
- Instruction restart: to hold all the different pages that any single instruction can reference
- Example: IBM 370 – 6 pages to handle SS MOVE instruction:
    - instruction is 6 bytes, might span 2 pages
    - 2 pages to handle from
    - 2 pages to handle to

Maximum of course is the total number of frames in the system
Two major allocation schemes：
- fixed allocation
- priority allocation

Many variations
![截圖 2026-04-13 23.50.09](https://hackmd.io/_uploads/BJphecq3Zg.png)
![截圖 2026-04-13 23.50.19](https://hackmd.io/_uploads/B1wpgq53Wx.png)
![截圖 2026-04-13 23.50.28](https://hackmd.io/_uploads/rkbCx5c3-e.png)
![截圖 2026-04-13 23.50.37](https://hackmd.io/_uploads/rkqCxq53We.png)
![截圖 2026-04-13 23.50.49](https://hackmd.io/_uploads/r1LyWqcnbg.png)
![截圖 2026-04-13 23.50.58](https://hackmd.io/_uploads/Skkg-5q3be.png)
![截圖 2026-04-13 23.51.11](https://hackmd.io/_uploads/HyilW5chbg.png)
![截圖 2026-04-13 23.52.49](https://hackmd.io/_uploads/ryRUb99n-x.png)
![截圖 2026-04-13 23.52.59](https://hackmd.io/_uploads/BkOw-9cnZx.png)
![截圖 2026-04-13 23.53.08](https://hackmd.io/_uploads/H1ZOZc5hbe.png)
### Thrashing
![截圖 2026-04-13 23.53.42](https://hackmd.io/_uploads/S1XcZc93Zg.png)
![截圖 2026-04-13 23.55.05](https://hackmd.io/_uploads/S1IJMc93bg.png)
![截圖 2026-04-13 23.55.49](https://hackmd.io/_uploads/B1WfGqch-l.png)
![截圖 2026-04-13 23.55.57](https://hackmd.io/_uploads/HkcGfqqhWe.png)
![截圖 2026-04-13 23.56.13](https://hackmd.io/_uploads/r1KQMqchZg.png)
![截圖 2026-04-13 23.56.22](https://hackmd.io/_uploads/SJzVGcqnZx.png)
![截圖 2026-04-13 23.57.15](https://hackmd.io/_uploads/HJYPzcc3bl.png)
![截圖 2026-04-13 23.57.24](https://hackmd.io/_uploads/r1buz99hZg.png)
![截圖 2026-04-13 23.57.33](https://hackmd.io/_uploads/B1t_Gqc2Wg.png)
![截圖 2026-04-13 23.57.44](https://hackmd.io/_uploads/ryXtM5q3Wx.png)
![截圖 2026-04-13 23.57.52](https://hackmd.io/_uploads/H1iKfqq3Wg.png)
### Allocating Kernel Memory
![截圖 2026-03-29 18.17.15](https://hackmd.io/_uploads/rkKNhu8oZl.png)
![截圖 2026-03-29 18.30.51](https://hackmd.io/_uploads/Sk8DyKUibl.png)
#### Buddy System
![截圖 2026-03-29 18.33.02](https://hackmd.io/_uploads/Bk91lF8s-e.png)
![截圖 2026-03-29 18.34.26](https://hackmd.io/_uploads/rJ-BgKLi-e.png)
Simpler, but with quite a few drawbacks（see the sizing sketch after the figure）：
![截圖 2026-03-29 18.35.16](https://hackmd.io/_uploads/SkzOltLjbl.png)
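A sketch of the buddy-system sizing rule from the slides: a request is rounded up to the next power of two（larger blocks are split in half until that size is reached）, which is what makes coalescing easy but wastes the rounded-up remainder:

```python
def buddy_block_size(request):
    """Smallest power-of-two block that satisfies the request."""
    size = 1
    while size < request:
        size <<= 1
    return size

req = 33 * 1024                      # a 33 KB request
blk = buddy_block_size(req)          # rounded up to a 64 KB buddy block
print(blk // 1024, "KB block,", (blk - req) // 1024, "KB internal fragmentation")
```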
#### slab
| ![截圖 2026-03-29 18.45.52](https://hackmd.io/_uploads/ryjkmK8sWg.png) | ![截圖 2026-03-29 18.46.19](https://hackmd.io/_uploads/rJwZQYUiZg.png) |
| - | - |

slab was the first one proposed; the other two（slub, slob）came later, each with its own use cases
![截圖 2026-03-29 18.48.51](https://hackmd.io/_uploads/SJli7FIj-g.png)
![截圖 2026-03-29 19.00.12](https://hackmd.io/_uploads/HJwrIF8jZg.png)
![截圖 2026-03-29 19.01.42](https://hackmd.io/_uploads/Bkfi8YIj-e.png)
![截圖 2026-03-29 19.03.57](https://hackmd.io/_uploads/rkcmPt8sZe.png)
![截圖 2026-03-29 19.05.12](https://hackmd.io/_uploads/r1EOvYUi-g.png)