---
# System prepended metadata

title: OS-Chap12 - I/O systems_
tags: [作業系統 Operating System note, 110-1, '2021']

---

# OS-Chap12 - I/O systems_
###### tags: `作業系統 Operating System note`,  `110-1`, `2021`
# Contents
[TOC]
<style>
.blue {
  color: red;
}
</style>

---
# Overview
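- A computer does two main kinds of work
    - I/O
    - Computation
> I/O takes up most of the time.
1. I/O devices (there are too many devices, so they are grouped into categories)
    - Storage devices: disks (hard drives), tapes (magnetic tape)
    - Transmission devices: network cards
    - Human-interface devices: keyboard, mouse, screen
    - Other specialized devices: game controllers (joysticks), touch panels
- The point of the categories is that different device types need different operations
    - storage ➜ Read/Write
        - flash ➜ a special protocol
        - SD card ➜ to R/W you must also issue a command and wait for the card's reply
    - transmission ➜ ts/rs (transmit/receive)
- So once you have tallied which operations each of these categories may end up needing (in other words, which function pointers its data structure must hold), each device simply inherits the data structure of its category.
    - ex. A keyboard needs a buffer but a network card does not, so the buffer field can be added only to the human-interface device structure.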
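
The category idea above can be sketched in Python. This is only an illustration: the class names, fields, and methods (`Device`, `StorageDevice`, `key_press`, and so on) are hypothetical and do not come from any real kernel.

```python
class Device:
    """Base record the OS keeps for every device (hypothetical layout)."""
    def __init__(self, name):
        self.name = name


class StorageDevice(Device):
    """Storage category: offers read/write on numbered blocks."""
    def __init__(self, name):
        super().__init__(name)
        self.blocks = {}

    def read(self, block):
        return self.blocks.get(block, b"")

    def write(self, block, data):
        self.blocks[block] = data


class TransmissionDevice(Device):
    """Transmission category: offers ts (transmit) and rs (receive)."""
    def __init__(self, name):
        super().__init__(name)
        self.wire = []

    def ts(self, data):
        self.wire.append(data)

    def rs(self):
        return self.wire.pop(0) if self.wire else None


class HumanInterfaceDevice(Device):
    """Human-interface category: the only one here that carries a buffer."""
    def __init__(self, name):
        super().__init__(name)
        self.buffer = []

    def key_press(self, ch):
        self.buffer.append(ch)

    def read_line(self):
        line = "".join(self.buffer)
        self.buffer.clear()
        return line
```

Only `HumanInterfaceDevice` has a `buffer` field, which mirrors the keyboard example in the notes.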


2. I/O Subsystem :star:
    - A **subsystem of the OS**
    - ==**Manages and controls the devices connected to the computer**==.
3. Device drivers :star:
    - Must present a **uniform device-access interface** to the I/O subsystem.

4. I/O Hardware
    - Port: 
        - Each device has its own number (port)
        ![](https://i.imgur.com/ggPosfE.jpg)
    - Bus: 
        ![](https://i.imgur.com/rblJyUr.jpg)
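        - Every device on the bus except the CPU needs a memory address.
        :star: Why does the CPU not need a memory address?
        :star2: Why does a device need a memory address?
            - :star2: Because memory addresses are how each device's switches (registers) are controlled.
            - :star: And because the CPU is the master! It actively enables/disables the RAM, ROM, timer, interrupt, and GPIO blocks inside the CPU chip; each of these also has its own controller for its I/O, so the CPU can control when data inside it is sent out.
                - Conclusion: **registers inside the CPU are not accessed by address! (r0~r15, $sp)** 
    - Controller: the controller used to operate a device
        - Each device has its own controller.
    - Overall picture: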
        ![](https://i.imgur.com/nL3u3V9.jpg)
    - example: (personal computer, PC)
        ![](https://i.imgur.com/F0kP51Z.jpg)
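        - The CPU has 4 cores, which means 4 controllers
            (each core has its own clock, so each can run its own work)
        - The northbridge sits closer to the CPU than the southbridge, so the northbridge clock is faster than the southbridge clock.
        - CPU clock : northbridge clock : southbridge clock is divided down by 1 : 4 : 8
            - The division is done with a counter (built from JK flip-flops)
         >   // A device's speed is set by its clock
         >   // A flip-flop is a switch.
        - From the figure above, clock (I/O) speed: CPU > northbridge > southbridge (which only connects devices, so it need not be as fast)
            - Main memory hangs off the northbridge, so every CPU memory access has to go through the northbridge chip, which costs a great deal of time (the CPU stalls for many cycles).
            - That is why the MMU, caches, and TLB were piled on: to shorten memory access time.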
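
A counter-based clock divider can be modeled in a few lines. This is a rough sketch that assumes the 1:4:8 ratio in the notes means divide-by factors; `output_ticks` is a made-up helper, not a hardware description.

```python
def output_ticks(input_ticks, divide_by):
    """Count how many output ticks a divide-by-N counter emits.

    The counter increments on every input tick and emits one output
    tick (then resets) each time it reaches divide_by.
    """
    counter = 0
    emitted = 0
    for _ in range(input_ticks):
        counter += 1
        if counter == divide_by:
            counter = 0
            emitted += 1
    return emitted


# Over 8 base clock ticks, under the assumed divide-by factors:
cpu_ticks = output_ticks(8, 1)          # undivided clock
northbridge_ticks = output_ticks(8, 4)  # divided by 4
southbridge_ticks = output_ticks(8, 8)  # divided by 8
```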

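> The switch diagram the professor drew on the board in class
> ![](https://i.imgur.com/HVXsHFk.png)
> When the switch on the 12 V side (left) is closed, the coil in the middle produces a magnetic field, which lets current flow in the 220 V circuit on the right.
> :star: Because the two circuits are separate loops, switching does not burn out the 12 V circuit.
> Relay: the coil component in the middle
> :+1: But when it is switched on and off rapidly many times, the 220 V side on the right sparks: the higher the voltage, the higher the spike, and past a critical point the metal melts (and it sticks shut!!!).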


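- System complexity is high because I/O hardware varies so much (devices differ in speed and function, for example), so I/O handling is **factored out into its own subsystem, separate from the rest of the kernel**.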


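ex. Every device has to be initialized, so once a device appears the OS gives it a data structure (a data structure is how the OS manages anything).
If the computer has 100 devices there are 100 such structures, and the OS manages them by storing them in a linked list (or tree).
    When an application asks for the network card, the OS looks the device up in the linked list (or tree).
    To send data through that card, it takes a function pointer out of the device's structure and calls the transfer (ts) function through it. **The function behind that pointer is the driver!**
// ts means transmit, rs means receive
 :star: **This is why every device must be registered first: registration hooks the device's driver onto the OS API so that the apps above can call it!** :star: 
:+1: In other words! :star: 
- The I/O subsystem provides APIs to the OS; to expose them to applications, the OS lets apps make system calls, through which they reach the drivers that devices have hooked onto the API.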
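
The registration flow can be sketched as follows. Everything here is hypothetical: `registry`, `register_device`, `sys_transmit`, and the `"nic0"` name are invented for illustration, and a dict stands in for the linked list (or tree).

```python
registry = {}  # stands in for the OS's linked list (or tree) of device structures


def register_device(name, ops):
    """Driver registration: store the device's table of function pointers."""
    registry[name] = ops


def sys_transmit(name, data):
    """A made-up 'system call': look the device up, then call its ts function."""
    ops = registry[name]
    return ops["ts"](data)


# A toy network-card driver: ts records the data and reports the byte count.
sent = []


def nic_ts(data):
    sent.append(data)
    return len(data)


def nic_rs():
    return None  # nothing to receive in this sketch


register_device("nic0", {"ts": nic_ts, "rs": nic_rs})
```

An application would then call `sys_transmit("nic0", b"abc")` without knowing which function actually implements the transfer.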
 
 
# Two "channels" for communicating with a device (I/O)
- ### Ways to access a device
## Port-mapped I/O
1. By Port
2. Each port has its own unique port address (ID), which the CPU uses to access each device.
==**(not the memory address space)**==
        ![](https://i.imgur.com/ggPosfE.jpg)
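3. Each I/O port consists of **four registers**.
    - **Data in** reg. : read by the host to get input from the device
    - **Data out** reg. : written by the host to send output to the device
    - **Status** reg. : bits the host reads to learn the device's state (e.g. busy, data ready, error)
    - **Control** reg. : written by the host to start a command or change the device's mode
>    The registers' addresses are the I/O port's addresses.
4. To reach a device through an I/O port, ==you need "special I/O instructions"==.
    - ex. the x86 IN and OUT instructions
>    "Special I/O instructions" means access instructions that are distinct from memory access. (With memory access, I/O is reached through memory addresses.)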
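
A toy model shows why port numbers and memory addresses do not collide. `PortBus`, `inb`, `outb`, and the port number `0x03` are assumptions made for this sketch; the two methods only mimic the role of the x86 IN and OUT instructions.

```python
class PortBus:
    """Toy I/O port space, kept separate from RAM (hypothetical model)."""

    def __init__(self):
        self.ports = {}

    def outb(self, port, value):
        """Plays the role of OUT: write one byte to an I/O port."""
        self.ports[port] = value & 0xFF

    def inb(self, port):
        """Plays the role of IN: read one byte from an I/O port."""
        return self.ports.get(port, 0)


ram = bytearray(16)   # ordinary memory, reached with normal loads and stores
bus = PortBus()       # I/O ports, reached only through inb/outb

DATA_OUT = 0x03       # hypothetical port number of a data-out register
bus.outb(DATA_OUT, 0x41)
ram[DATA_OUT] = 0x7F  # same number, but a different address space
```

Port `0x03` and memory address `0x03` hold different values, because they live in two separate spaces.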


## Memory-mapped I/O 
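- #### :star: Most systems today use the memory-mapped approach :star:
1. By Bus:
2. Every I/O device is given memory addresses, which the CPU uses to access each device.
    >    Treat the device as ordinary memory.
    - Memory addresses that RAM does not use are reserved and serve as the "house numbers" (addresses) of the I/O devices!
        e.g. RAM occupies one half of the address space and the (peripheral) devices occupy the other half (the high addresses).
        - Pro: helpful for I/O devices with larger memory (ex. a graphics card)
            > - With memory mapping the I/O device can be reached directly by DMA, so data need not be moved into the CPU and back out with special instructions.
            > - The I/O device in question is itself a controller with its own clock, so it does not depend on something else to move the data.
        - Con: if the RAM and I/O ranges are not delimited properly, a write meant for RAM may land on I/O ➜ accidental modification ➜ error.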
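### Quick summary
1. I/O-mapped I/O (port-mapped I/O or direct I/O)
- I/O and memory each have their own address space
- Special instructions are needed to handle I/O
- Pro: the memory address space is never taken up by I/O. Con: extra instructions dedicated to I/O access are required.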
![](https://i.imgur.com/PrrGLLt.png)

2. Memory Mapped I/O
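- I/O and memory share one address space
- No special instructions are needed to handle I/O
- Memory-mapped I/O simply maps the I/O ports or device memory onto memory addresses.
- Pro: **I/O access can be treated just like memory access**. Con: **the mapped region generally cannot hold real memory**.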
![](https://i.imgur.com/qIf7487.png)
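
Here is a small sketch of one shared address space. The layout is an assumption for illustration only: RAM at `0x0000` to `0x7FFF`, and one device register, here called `UART_DATA`, at `0x8000`.

```python
RAM_SIZE = 0x8000    # hypothetical: RAM occupies 0x0000-0x7FFF
UART_DATA = 0x8000   # hypothetical device register just above RAM


class AddressSpace:
    """One address space; the address decides whether RAM or a device answers."""

    def __init__(self):
        self.ram = bytearray(RAM_SIZE)
        self.device_log = []  # bytes the 'device' has received

    def store(self, addr, value):
        if addr < RAM_SIZE:
            self.ram[addr] = value         # ordinary memory write
        elif addr == UART_DATA:
            self.device_log.append(value)  # the same store reaches the device
        else:
            raise ValueError("unmapped address")


mem = AddressSpace()
mem.store(0x0010, 7)        # lands in RAM
mem.store(UART_DATA, 65)    # lands in the device register
mem.store(RAM_SIZE, 99)     # off-by-one past RAM: accidentally hits the device
```

The last store shows the accidental modification described above: a write that runs one byte past the end of RAM silently goes to the device.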


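# Two "ways" of communicating with a device
- #### Finding out whether the device has the data I want
## Busy-Waiting = Poll
- Loop, checking over and over for the data you want
- The processor periodically reads the device's status register (status reg.)
- ex. How does a phone know when a call is coming in?
    - It keeps detecting/asking (polling) whether a signal has arrived.

## Interrupt
- The device finishes its work and signals the CPU only when it needs the CPU to do something.
- ex. timer
    - It signals (interrupts) the CPU at a fixed interval (e.g. every 10 ms).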
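
The two styles can be compared with a simulated device. This is a sketch only: `SimDevice`, its `status` field, and the tick-based timing are invented for the example.

```python
class SimDevice:
    """Simulated device whose data becomes ready after a number of ticks."""

    def __init__(self, ready_after):
        self.ready_after = ready_after  # ticks left until data is ready
        self.status = 0                 # status register: 1 means data ready
        self.handler = None             # interrupt handler, if one is installed

    def tick(self):
        self.ready_after -= 1
        if self.ready_after == 0:
            self.status = 1
            if self.handler:
                self.handler()          # raise the 'interrupt'


def poll(device):
    """Busy-wait: read the status register until it says ready."""
    checks = 0
    while True:
        checks += 1
        if device.status:
            return checks
        device.tick()                   # time passes between checks


def wait_by_interrupt(device, ticks):
    """Interrupt style: install a handler and let the device call it."""
    fired = []
    device.handler = lambda: fired.append("irq")
    for _ in range(ticks):
        device.tick()                   # the CPU would be free to do other work here
    return fired
```

With polling the CPU spends one status read per loop iteration, while in the interrupt version the handler runs once, at the moment the device becomes ready.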


# Two ways to "transfer data"
## Programmed I/O
- You write code that does read/write or in/out, and the CPU controls the transfer. 
- Summary: properties of programmed I/O
    - Carried out by the CPU
    - Program code decides which data goes where

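## DMA :star: :star: :star: :star2: :star2: (seems very important)
###### Direct memory access
- #### Suited to ==large data transfers==
- Explained by example!
    - ex. net card
    - ex. camera ➜ move the camera's data into RAM 
    > - If the CPU moved large data itself, the transfer would take up too much CPU time and the CPU could do nothing else 
    > - Of the five stages the CPU executes, only two (instruction fetch and memory access) use the bus; in the remaining time the DMA can use the bus to move data! (And the bus is wide, so the transfer finishes quickly.)
- DMA Controller: the controller that implements DMA (hardware)
    1. Controls when data flows out of the device over the bus.
    2. When the DMA transfer completes, sends a notification (interrupt) to the OS
- Summary: properties of DMA
    - Developed to **move large amounts of data** in one go or quickly
    - Has a **DMA controller**
    - Transfers go over the bus, so it belongs with the **memory-mapped** approach
    - When the controller finishes its work it must tell the OS ➜ **Interrupt**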
- :star2: :star2: :star2: :star2: :star2: ![](https://i.imgur.com/t5DUebG.png)
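
The division of labor between CPU and DMA controller can be sketched as below. `DMAController`, `start`, and `on_complete` are hypothetical names, and the copy happens instantly, which real hardware does not do.

```python
class DMAController:
    """Toy DMA controller: copies device data into memory, then 'interrupts'."""

    def __init__(self, memory, on_complete):
        self.memory = memory            # main memory (a bytearray)
        self.on_complete = on_complete  # stands in for the interrupt line

    def start(self, source, dest, count):
        """The CPU only programs source, destination, and count."""
        chunk = bytes(source[:count])
        self.memory[dest:dest + len(chunk)] = chunk  # transfer without the CPU
        self.on_complete(len(chunk))                 # notify the OS when done


ram = bytearray(8)
interrupts = []  # records each completion interrupt (bytes moved)
dma = DMAController(ram, interrupts.append)
dma.start(b"frame", dest=2, count=5)
```

The CPU's only involvement is the single `start` call; the completion arrives as one interrupt carrying the byte count.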

## Blocking 
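- With blocking I/O, the requesting process stays suspended until the I/O completes and the result comes back; only then does it continue
- When a user program wants data it makes a system call into the kernel, and until the kernel has found and copied the data the process cannot do anything.
- // The process goes to sleep until the kernel wakes it up
## Non-Blocking
- Non-blocking -> the call returns immediately even if the I/O has not finished, so the process can carry on with its next piece of work (a pipeline-like idea)
- Many user programs can poll at the same time to see whether the data has arrived: they keep calling the kernel, which returns no_data right away if nothing is there. Once the data is there, it is copied and returned to the user program
- // The process does not sleep; once the kernel has the data, it is returned to the user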
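
Non-blocking reads can be tried on a POSIX system with a pipe. The helper names `try_read` and `demo` are made up for this sketch; `os.pipe`, `os.set_blocking`, `os.read`, and `BlockingIOError` are standard Python.

```python
import os


def try_read(fd, size=1024):
    """Non-blocking read: return the bytes available, or None for 'no_data'."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None  # nothing there yet; the caller is free to do other work


def demo():
    read_end, write_end = os.pipe()
    os.set_blocking(read_end, False)   # switch the read end to non-blocking mode
    before = try_read(read_end)        # pipe is empty, so this returns at once
    os.write(write_end, b"hello")
    after = try_read(read_end)         # data is now available
    os.close(read_end)
    os.close(write_end)
    return before, after
```

Without the `os.set_blocking(read_end, False)` line, the first read would be a blocking read and the process would sleep until something was written.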


### [ref.](https://www.itread01.com/content/1549696876.html)
### [ref.](https://kaka-lin.github.io/2020/07/io_models/)

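## Asynchronous
- Asynchronous (the user program and the kernel are not in step) -> when the user program wants to read data it makes a system call asking the kernel to take care of it, and the kernel calls back the user program only after it has finished
- // The kernel does the legwork: it is like a user placing an online order, filling in the address and paying (user), then leaving the site (kernel) to find the goods, pack them and ship them; the user is notified only when the parcel arrives and just has to receive it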

![](https://i.imgur.com/KvnZtTe.png)
![](https://i.imgur.com/QoWA0TA.png)
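
The callback flow can be imitated with a worker thread playing the kernel. This is an analogy rather than real kernel AIO: `kernel_read`, `on_done`, and the `device_ready` event are invented for the sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_async_read():
    events = []
    device_ready = threading.Event()  # set when the simulated device has data

    def kernel_read():
        device_ready.wait(timeout=1.0)  # the 'kernel' waits for the device
        return b"parcel"

    def on_done(future):
        # Callback: the 'kernel' notifies the user program on completion.
        events.append(("notified", future.result()))

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(kernel_read)    # issue the request, return at once
        future.add_done_callback(on_done)
        events.append(("user keeps working", None))
        device_ready.set()                   # the device finishes later
    # Leaving the with-block waits for the worker, so the callback has run.
    return events
```

The user code never waits on the read itself; it records its own work first and receives the data through the callback.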


---

# H.W.

## 13.2
- Q: What are the ==advantages and disadvantages== of supporting **memory-mapped I/O** to device control registers?
- A:
    - The **advantage** of supporting memory-mapped I/O to device control registers is that it **eliminates the need for *special I/O instructions* from the instruction set** and therefore also does not require the enforcement of protection rules that prevent user programs from executing these I/O instructions. 
    - The **disadvantage** is that the resulting **flexibility** needs to be handled with **care**; the **memory translation units** need to ensure that **the memory addresses associated with the device control registers** are **==not== accessible ==by user programs==** in order to ensure protection.

>－－－－－－－－－－－－－－－－－－－－－－－－－
>- Advantages include:
>    1. Memory mapped I/O gives you **a single address space** and **a common set of instructions** for both **data** and **I/O operations**.
>    2. You can define memory ordering rules and memory barriers that apply both to device accesses and normal memory.
>    3. You do**n’t need a whole separate set of opcodes for I/O instructions**. You can **reuse your ==ordinary memory access instructions==**.
>    4. You can use pointers in languages such as C and C++ to access devices, rather than platform-specific intrinsics or inline assembly. Caveats:
>        - You still need to tag those pointers volatile.
>        - You may still need intrinsics or inline assembly to implement memory barriers.
>    5. You can **reuse the same memory mapping mechanisms** you use for other memory to control access to devices (e.g. page table entries).
>    6. You may benefit from much of the low-latency buses and request routing infrastructure put in place to optimize normal data accesses.

>- Disadvantages include:
>    1. It potentially **complicates your cache controller**, as **device accesses behave differently from normal memory accesses**.
>    2. It potentially complicates instruction scheduling (especially speculation) as the processor doesn’t know immediately that a given load or store goes to device memory; rather, the MMU or some other structure in or near the memory system informs it once it receives and decodes the address.
>    3. It adds corner conditions and restrictions, such as requiring certain specific access widths (e.g. 32-bit writes only; no 8-bit, 16-bit, or 64-bit writes), which may catch some compilers by surprise.
>    4. You still end up with high access latency and lower throughput when your request steps off the fast, low-latency path meant for data into an I/O subsystem with slower, simpler buses.
>    5. You can get some nasty surprises when you use types such as std::atomic to perform MMIO, and discover the compiler uses an instruction that isn’t compatible with your peripheral.
>- [ref.](https://www.quora.com/What-are-the-advantages-and-disadvantages-of-memory-mapped-I-O)


## 13.5
- Q: What are the various kinds of performance overhead associated with **servicing an interrupt**?
- A:[![](https://i.imgur.com/ccvnIKQ.png)](https://nanopdf.com/download/exe-c07-io_pdf)

Review: what is overhead (click the image)
[![](https://i.imgur.com/SiHBxyC.png)](https://hackmd.io/@cindyrumi/cindyOSchp3#Context-Switch)


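In short:
When interrupts are used to transfer data with an I/O device, the program running on the CPU has to be interrupted. The flow is:
1. Save the state of the running process
2. Because the process is interrupted: flush the instructions in the pipeline
3. ..the interrupt is serviced and the data is transferred..
4. Restore the earlier process state
5. Put the earlier instructions back into the pipeline

So the overhead depends on:
* how fast process state can be saved and restored
* how fast the pipeline can be flushed and refilled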
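
The bookkeeping can be written out as simple arithmetic. The cycle counts below are purely hypothetical placeholders chosen to make the sum easy to follow; they are not measurements of any CPU.

```python
def interrupt_overhead(save, flush, restore, refill):
    """Cycles spent around the handler rather than in it."""
    return save + flush + restore + refill


def overhead_fraction(overhead, handler):
    """Share of the whole interrupt episode that is pure overhead."""
    return overhead / (overhead + handler)


# Hypothetical cycle counts, for illustration only.
overhead = interrupt_overhead(save=50, flush=5, restore=50, refill=5)
fraction = overhead_fraction(overhead, handler=890)
```

Under these made-up numbers, 110 of every 1000 cycles go to saving, flushing, restoring, and refilling rather than to the transfer itself.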


- [ref. of answer](https://nanopdf.com/download/exe-c07-io_pdf)


## 13.6
- Q: $^{(1)}$Describe three circumstances under which blocking I/O should be used.
    $^{(2)}$Describe three circumstances under which nonblocking I/O should be used.
    $^{(3)}$Why not just implement nonblocking I/O and have processes busy-wait until their devices are ready?
[![](https://i.imgur.com/t1X4vYz.png)](https://nanopdf.com/download/exe-c07-io_pdf)

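blocking vs non-blocking:
* With blocking I/O, the calling process is stuck on the I/O operation while it waits for the transfer and cannot do anything else. Non-blocking I/O resembles busy-waiting: the call returns the transfer status right away, but the process has to keep asking the kernel whether the data has arrived.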
* ![](https://i.imgur.com/uKukRsR.png)


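When to use non-blocking:
* With many I/O sources (e.g. sockets), non-blocking I/O gives a multitasking effect (I/O multiplexing): one thread asks each source in turn about its I/O status, so a single thread can monitor several devices at once instead of spending system resources on multiple threads/processes.
* For intermittent data (e.g. UDP traffic or streaming), non-blocking I/O copes with a device that does not send continuously. With blocking I/O the call would not return until a complete unit of data (e.g. a UDP packet) had arrived, and the process would waste a lot of time waiting for the device to send.
* For handling data from several sources together: with blocking I/O you would have to wait for one transfer to finish before starting the next, while the other I/O devices may sit idle, so non-blocking I/O lets you process multiple sources concurrently.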
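
Single-threaded monitoring of several sources can be sketched with Python's standard `selectors` module. The two socket pairs named `"a"` and `"b"` and the helper `poll_sources` are assumptions for the example.

```python
import selectors
import socket


def poll_sources(timeout=1.0):
    """Watch two sockets from one thread and read whichever has data."""
    selector = selectors.DefaultSelector()
    pairs = {"a": socket.socketpair(), "b": socket.socketpair()}
    for name, (reader, _writer) in pairs.items():
        reader.setblocking(False)
        selector.register(reader, selectors.EVENT_READ, data=name)

    pairs["b"][1].sendall(b"ping")  # only source "b" has something to say

    received = {}
    for key, _events in selector.select(timeout):
        received[key.data] = key.fileobj.recv(1024)

    selector.close()
    for reader, writer in pairs.values():
        reader.close()
        writer.close()
    return received
```

One call to `select` covers both sockets, so no extra thread or process is needed per source.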

- [ref. of answer](https://nanopdf.com/download/exe-c07-io_pdf)
- [ref. of infor. 菜鳥成長史](https://wirelessr.gitbooks.io/working-life/content/io_model.html)
- blocking vs non-blocking
    - :star: [Java的I/O模型](https://medium.com/@clu1022/%E6%B7%BA%E8%AB%87i-o-model-32da09c619e6) :star: 
    - [我所理解的 I/O](https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/81519/)
    - [Stack overflow : Blocking IO vs non-blocking IO](https://stackoverflow.com/questions/1241429/blocking-io-vs-non-blocking-io-looking-for-good-articles)
    - [Stack overflow :  What is "non-blocking" concurrency](https://stackoverflow.com/questions/2824225/what-is-non-blocking-concurrency-and-how-is-it-different-than-normal-concurren)
    - [計算機網路-TCP和UDP總結（區別、優缺點、應用場景）](https://www.796t.com/article.php?id=161917)

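---
- Broadly, an I/O operation has two parts: issuing the request and completing the result.
- If the caller blocks from issuing the request until the result returns, that is blocking I/O. If the call returns as soon as the request is issued (regardless of completion), it is non-blocking I/O. If the call returns at once and the caller then blocks in select or poll waiting for the result, it can only be called I/O multiplexing. If the call returns at once and the result is handled through a callback, it is AIO.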
- $^{(1)}$
- $^{(2)}$
    - Non-blocking I/O is useful when I/O may come from more than one source and the order of the I/O arrival is not predetermined. Examples include network daemons listening to more than one network socket, window managers that accept mouse movement as well as keyboard input, and I/O-management programs, such as a copy command that copies data between I/O devices. In the last case, the program could optimize its performance by buffering the input and output and using non-blocking I/O to keep both devices fully occupied.

 
- $^{(3)}$Non-blocking I/O is more complicated for programmers, because of the asynchronous rendezvous that is needed when an I/O occurs. Also, busy waiting is less efficient than interrupt-driven I/O so the overall system performance would decrease.
[ref.](http://fcs351.pbworks.com/w/page/6493664/Tutorial9)


## 13.8
- Q: Some **DMA controllers** support **direct virtual memory access**, where the targets of I/O operations are specified as virtual addresses and **a translation from virtual to physical address** is performed **during the DMA**. ==How does this design complicate the design of the DMA controller?== What are the **advantages** of providing such functionality?

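Letting the DMA controller access virtual memory directly means that parts of secondary storage (such as the hard disk) may have to be copied into memory. Virtual memory is implemented by looking up the page table to map virtual addresses to physical ones, so the DMA controller has to **handle both physical and virtual addresses**, and that makes the virtual-memory translation (table lookup) more complicated.

The advantage is that other hardware can enjoy the benefits of virtual memory alongside the CPU, for example:
* contiguous memory regions
* extra memory space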

- [What do you think a DMA controller can get direct access to the virtual memory or not?](https://www.quora.com/What-do-you-think-a-DMA-controller-can-get-direct-access-to-the-virtual-memory-or-not)

![](https://i.imgur.com/M314vc9.png)
- Answer:
- **Direct virtual memory access** allows a device to **perform a transfer between two memory-mapped devices *without the intervention of the CPU* or *the use of main memory as a staging ground***;
- the device simply issues memory operations to the memory-mapped addresses of a target device and the ensuing virtual address translation guarantees that the data is transferred to the appropriate device. 
- This functionality, however, comes at the **cost** of having to support virtual address translation on addresses accessed by a DMA controller and ==requires the addition of an address-translation unit **(MMU)** to the DMA controller.== 
- The address translation results in **both hardware and software costs** and might also result in ==coherence problems== between **the data structures <span class="blue">maintained</span> by the CPU for address translation** and **corresponding structures used by the DMA controller**. 
- These coherence issues would also need to be dealt with and result in a further increase in system complexity.
![](https://i.imgur.com/JlOtYwi.png)
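
The extra translation work can be sketched with a one-level page table. The 4096-byte page size, the dict-based `page_table`, and the helpers `translate` and `dma_write` are all assumptions; real DMA translation hardware is far more involved.

```python
PAGE_SIZE = 4096  # assumed page size


def translate(vaddr, page_table):
    """Map a virtual address to a physical one via a page -> frame table."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"page {page} is not resident")
    return page_table[page] * PAGE_SIZE + offset


def dma_write(memory, page_table, vaddr, data):
    """Toy DMA transfer that translates every byte's virtual address."""
    for i, byte in enumerate(data):
        memory[translate(vaddr + i, page_table)] = byte


page_table = {0: 2, 1: 0}           # virtual page 0 -> frame 2, page 1 -> frame 0
memory = bytearray(3 * PAGE_SIZE)
dma_write(memory, page_table, 4094, b"ABCD")  # crosses a page boundary
```

The four bytes are contiguous in virtual memory but end up in two different frames, which is the "contiguous memory regions" benefit, paid for by a lookup the controller must now perform.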


# My homework answers

## 13.2
- ![](https://i.imgur.com/OVRZTr7.png)
## 13.5
- ![](https://i.imgur.com/ixZLxgh.png)

## 13.6
- ![](https://i.imgur.com/cXOeDDX.png)
    
## 13.8
- ![](https://i.imgur.com/4lVYgQS.png)