OS-Chap12 - I/O systems_

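# OS-Chap12 - I/O systems_
###### tags: `作業系統 Operating System note`, `110-1`, `2021`

# Contents
[TOC]

<style>
.blue {
  color: red;
}
</style>

---

# Overview
- A computer does two main kinds of work
    - I/O
    - Computation
    > I/O takes up the vast majority of the time.

1. I/O devices (there are too many devices, so they are grouped into categories)
    - Storage devices: disks (hard drives), tapes (tape drives)
    - Transmission devices: network cards
    - Human-interface devices: keyboard, mouse, screen
    - Other specialized devices: game controllers (joysticks), touch panels
    - The point of the classification is that different types of device have to provide different functions
        - storage ➜ read/write
            - flash ➜ a special protocol
            - SD card ➜ to R/W you must also issue a command and wait for the SD card to reply
        - transmission ➜ ts/rs (transfer/receive)
    - So once we have tallied, for the categories above, which functions may eventually be needed (in other words, which function pointers the device's data structure must hold), each category can simply inherit the data structure its devices need.
        - ex. A keyboard needs a buffer but a network card does not, so the buffer data structure only has to be added to human-interface devices.
2. I/O Subsystem :star:
    - An **OS subsystem**
    - ==**Manages and controls the devices connected to the computer**==.
3. Device drivers :star:
    - Must offer a **uniform device-access interface** to the I/O subsystem.
4. I/O Hardware
    - Port:
        - Every device has its own number (port)
        ![](https://i.imgur.com/ggPosfE.jpg)
    - Bus:
        ![](https://i.imgur.com/rblJyUr.jpg)
        - Everything except the CPU is accessed through a memory address.
          :star: Why doesn't the CPU need a memory address?
          :star2: Why does a device need a memory address?
            - :star2: A device needs memory addresses so that each device's switches (registers) can be controlled.
            - :star: And because the CPU is the master! It actively enables/disables the RAM, ROM, timer, interrupt, and GPIO inside the CPU. These are packaged inside the CPU chip, and each has its own controller for the I/O inside it, so the CPU can also control when its internal data is sent out.
            - Conclusion: **registers inside the CPU are not accessed by address! (r0~r15, $sp)**
    - Controller: the controller used to operate a device
        - Every device has its own controller.
    - The overall picture:
        ![](https://i.imgur.com/nL3u3V9.jpg)
    - example: (personal computer, PC)
        ![](https://i.imgur.com/F0kP51Z.jpg)
        - The CPU has 4 cores, which means 4 controllers (each core has its own clock, so each can do its own work)
        - The northbridge is closer to the CPU than the southbridge, so the northbridge clock is faster than the southbridge clock.
        - CPU : northbridge : southbridge = 1:4:8 (presumably the ratio of clock periods, i.e. divider values, since the CPU clock is the fastest)
            - This ratio is implemented with counters (counters are built from JK flip-flops)
            > // How fast a device is depends on its clock
            > // A flip-flop is just a switch.
        - From the figure above, clock (I/O) speed: CPU > northbridge > southbridge (the southbridge only connects devices, so it does not need to be that fast)
        - Main memory hangs off the northbridge, so every time the CPU accesses memory it has to go through the northbridge chip, which costs a lot of time (the CPU stalls for many cycles).
            - That is why the MMU, cache, and TLB were added: to speed up memory access.

        > The switch diagram the teacher drew on the blackboard
        > ![](https://i.imgur.com/HVXsHFk.png)
        > When the switch on the 12V side (left) is closed, the coil in the middle generates a magnetic field, which lets the 220V current on the right flow.
        > :star: Because the two circuits are separate loops, switching does not burn out the 12V circuit.
        > Relay: the coil component in the middle
        > :+1: But when it is switched on and off rapidly many times, the 220V side on the right sparks (the higher the voltage, the higher the spike), and past a critical point the iron melts (and the contacts stick together!!!).

- System complexity is high because I/O hardware devices differ widely, for example in speed and function, so a **subsystem is set up separately, independent of the kernel**.
  ex. Every device has to be initialized, so once a device shows up the OS gives it a data structure (a data structure is how the OS manages anything). If the computer has 100 devices there are 100 data structures, and the OS keeps them in a linked list (or tree). When an application asks for the network card, the OS looks the device up in that linked list (or tree). To send data with that card, it finds a "function pointer" in the device's data structure and calls the transfer (ts) function. **The behaviour behind that transfer "function pointer" is the driver!**
  // ts means transfer, rs means receive

:star: **This is why every device has to be registered first: its driver is hooked onto the OS API so that the apps above can call it!** :star:
:+1: In other words! :star:
- The I/O subsystem provides APIs to the OS. To expose these APIs to upper-layer apps, the OS lets the apps invoke it through system calls, and the apps can then use the drivers that devices have hooked onto the API.

# There are two "channels" for communicating with a device (I/O)
- ### Ways to access a device

## Port-mapped I/O
1. By port
2. Each port has its own unique port address (ID), which the CPU uses to access each device. ==**(not part of the memory address space)**==
![](https://i.imgur.com/ggPosfE.jpg)
3. Each I/O port consists of **four registers**.
    - **Data in** reg. :
    - **Data out** reg. :
    - **Status** reg. :
    - **Control** reg. :
    > The addresses of these registers are the address of the I/O port.
4. To access an I/O device through an I/O port, ==special I/O instructions are required==.
    - ex. the x86 IN and OUT instructions
    > "Special I/O instructions" means access instructions that are different from memory access. (Memory access reaches I/O through memory addresses.)

## Memory-mapped I/O
- #### :star: Most systems today use the memory-mapped approach :star:
1. By bus:
2. Every I/O device is given memory addresses, which the CPU uses to access each device.
    > Treat the device as ordinary memory.
    - Memory addresses that RAM does not use are reserved and serve as the "house numbers" (addresses) for accessing I/O devices!
      e.g. RAM occupies the upper half of the memory map, and peripheral devices occupy the lower half (the high-address part).
    - Pro: helpful when accessing larger I/O memory regions (e.g.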
graphics card)
        > - With memory-mapped I/O the desired I/O can be accessed directly via DMA, without using special move instructions to bring the data into the CPU and back out again.
        > - The I/O in question is itself a controller with its own clock, so it does not need anyone else to move the data.
    - Con: if the RAM and I/O ranges are not properly delimited, an access meant to modify RAM may end up modifying I/O ➜ accidental modification ➜ error.

### Quick summary
1. I/O-mapped I/O (port-mapped I/O, or direct I/O)
    - I/O and memory each have their own address space
    - Special instructions are needed to handle I/O
    - The advantage is that no memory address space is taken up by I/O; the disadvantage is that extra instructions dedicated to I/O access are required.
    ![](https://i.imgur.com/PrrGLLt.png)
2. Memory-mapped I/O
    - I/O and memory share one address space
    - No special instructions are needed to handle I/O
    - Memory-mapped I/O really just maps the I/O ports or device memory onto memory addresses,
    - The advantage is that **I/O access can be treated exactly like memory access**; the disadvantage is that **the mapped region generally cannot hold real memory**.
    ![](https://i.imgur.com/qIf7487.png)

# There are two "ways" to communicate with a device
- #### Finding out whether the device has the data I want

## Busy-Waiting = Poll
- Loop continuously, checking for the wanted data
- The processor periodically asks the device for the contents of its status register (status reg.)
- ex. How does a phone know when a call is coming in?
    - It keeps detecting/asking (polling) whether a signal has been received.

## Interrupt
- After the device finishes its work, it signals the CPU if it needs the CPU to do something.
- ex. timer
    - It signals (interrupts) the CPU at a fixed interval (e.g. every 10 ms).

# There are two ways to "transfer data"
## Programmed I/O
- Write code that does read/write or in/out, with the CPU controlling the transfer.
- Summary of the properties of programmed I/O
    - Happens inside the CPU
    - Code decides which data is sent where

## DMA :star: :star: :star: :star2: :star2: (seems very important)
###### Direct memory access
- #### Suited to ==large data transfers==
- Explained by example!
    - ex. network card
    - ex. camera ➜ move the camera's data into RAM
    > - If the CPU handles a large transfer, it takes up too much CPU time and the CPU cannot do anything else
    > - Of the CPU's five pipeline stages, only two use the bus; in the remaining time DMA can take the opportunity to use the bus to move data! (And the bus is wide, so the data is transferred quickly.)
- DMA controller: the hardware that implements DMA
    1. Controls when data flows out of the device over the bus.
    2. When the DMA transfer completes, sends a notification (interrupt) to the OS
- Summary of the properties of DMA
    - A method developed to **transfer large amounts of data** in one go or quickly
    - Has a **DMA controller**
    - Because the transfer goes over the bus, it is a **memory-mapped** method
    - When the controller finishes its job, it has to tell the OS ➜ **Interrupt**
- :star2: :star2: :star2: :star2: :star2:
![](https://i.imgur.com/t5DUebG.png)

## Blocking
- With blocking I/O, the requesting process is suspended entirely until the I/O completes and returns; only then does the OS let it continue
- When a user program wants data it makes a system call into the kernel, and until the kernel has found and copied the data the process cannot do anything.
- // The process goes to sleep until the kernel wakes it up

## Non-Blocking
- Non-blocking: even while one process's I/O request is outstanding, work carries on with the next process (the pipeline idea)
- Many user programs poll at the same time to check whether the data has been found, calling the kernel again and again; if it has not been found the kernel immediately returns no_data. Once it is found, the data is copied and returned to the user program
- // The process does not sleep; once the kernel has a piece of data, it returns it to the user

### [ref.](https://www.itread01.com/content/1549696876.html)
### [ref.](https://kaka-lin.github.io/2020/07/io_models/)

## Asynchronous
- Asynchronous (the user program and the kernel are not synchronized): when the user program wants to read data, it makes a system call asking the kernel to handle it, and the kernel notifies the user program only after it has finished
- // The kernel does the legwork. It is like online shopping: the user places the order, fills in the address, and pays; the website (kernel) then finds the goods, ships them, and handles logistics; the user is notified only when the parcel arrives and just has to receive it

![](https://i.imgur.com/KvnZtTe.png)
![](https://i.imgur.com/QoWA0TA.png)

---

# H.W.
## 13.2
- Q: What are the ==advantages and disadvantages== of supporting **memory-mapped I/O** to device control registers?
- A:
    - The **advantage** of supporting memory-mapped I/O to device control registers is that it **eliminates the need for *special I/O instructions* from the instruction set** and therefore also does not require the enforcement of protection rules that prevent user programs from executing these I/O instructions.
    - The **disadvantage** is that the resulting **flexibility** needs to be handled with **care**; the **memory translation units** need to ensure that **the memory addresses associated with the device control registers** are **==not== accessible ==by user programs==** in order to ensure protection.

> －－－－－－－－－－－－－－－－－－－－－－－－－
> - Advantages include:
>     1. Memory-mapped I/O gives you **a single address space** and **a common set of instructions** for both **data** and **I/O operations**.
>     2. You can define memory ordering rules and memory barriers that apply both to device accesses and normal memory.
>     3. You do**n't need a whole separate set of opcodes for I/O instructions**. You can **reuse your ==ordinary memory access instructions==**.
>     4. You can use pointers in languages such as C and C++ to access devices, rather than platform-specific intrinsics or inline assembly. Caveats:
>         - You still need to tag those pointers `volatile`.
>         - You may still need intrinsics or inline assembly to implement memory barriers.
>     5. You can **reuse the same memory mapping mechanisms** you use for other memory to control access to devices (e.g. page table entries).
>     6. You may benefit from much of the low-latency buses and request routing infrastructure put in place to optimize normal data accesses.
> - Disadvantages include:
>     1. It potentially **complicates your cache controller**, as **device accesses behave differently from normal memory accesses**.
>     2. It potentially complicates instruction scheduling (especially speculation) as the processor doesn't know immediately that a given load or store goes to device memory; rather, the MMU or some other structure in or near the memory system informs it once it receives and decodes the address.
>     3. It adds corner conditions and restrictions, such as requiring certain specific access widths (e.g. 32-bit writes only; no 8-bit, 16-bit, or 64-bit writes), which may catch some compilers by surprise.
>     4. You still end up with high access latency and lower throughput when your request steps off the fast, low-latency path meant for data into an I/O subsystem with slower, simpler buses.
>     5. You can get some nasty surprises when you use types such as `std::atomic` to perform MMIO, and discover the compiler uses an instruction that isn't compatible with your peripheral.
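
Advantage 4 in the list above (reaching device registers through ordinary pointers, tagged `volatile`) can be sketched in C roughly as follows. This is only an illustration under assumptions: the register layout `dev_regs_t`, the bit masks, and the helpers `dev_map` and `dev_write_word` are hypothetical names, and a plain struct in RAM stands in for the device so the code can run in user space. On real hardware the base address would come from the chip's datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block, mirroring the four registers named in the notes. */
typedef struct {
    uint32_t data_in;
    uint32_t data_out;
    uint32_t status;
    uint32_t control;
} dev_regs_t;

#define STATUS_READY 0x1u /* assumed: device can accept a word */
#define CTRL_START   0x1u /* assumed: tells the device to start */

/* Stand-in for the device: ordinary RAM instead of a real MMIO region. */
static dev_regs_t fake_device;

/* Returns the register block as a volatile pointer so that the compiler
 * neither caches nor reorders away the loads and stores. */
static volatile dev_regs_t *dev_map(void) {
    return &fake_device;
}

/* Poll the status register up to max_polls times; once the device is ready,
 * store the word with a plain memory write and set the start bit.
 * Returns 0 on success, -1 if the device never became ready. */
static int dev_write_word(volatile dev_regs_t *dev, uint32_t word,
                          unsigned max_polls) {
    assert(dev != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (dev->status & STATUS_READY) {
            dev->data_out = word;      /* ordinary store, no special opcode */
            dev->control = CTRL_START;
            return 0;
        }
    }
    return -1;
}
```

Note that the store to `data_out` is an ordinary assignment; with port-mapped I/O the same step would need a special instruction such as x86 `OUT`.
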
> - [ref.](https://www.quora.com/What-are-the-advantages-and-disadvantages-of-memory-mapped-I-O)

## 13.5
- Q: What are the various kinds of performance overhead associated with **servicing an interrupt**?
- A: [![](https://i.imgur.com/ccvnIKQ.png)](https://nanopdf.com/download/exe-c07-io_pdf)

Review: what is overhead (the image is clickable)
[![](https://i.imgur.com/SiHBxyC.png)](https://hackmd.io/@cindyrumi/cindyOSchp3#Context-Switch)

In short: when data is transferred to or from an I/O device using interrupts, the process running on the CPU has to be interrupted. The flow is:
1. Save the state of the process that was running
2. Because the process is interrupted: flush the instructions in the pipeline
3. ...transfer, transfer, transfer...
4. Restore the previous process state
5. Put the previous instructions back into the pipeline

So the overhead depends on:
* How fast the process state can be saved and restored
* How fast the pipeline can be flushed and refilled

- [ref. of answer](https://nanopdf.com/download/exe-c07-io_pdf)

## 13.6
- Q: $^{(1)}$Describe three circumstances under which blocking I/O should be used. $^{(2)}$Describe three circumstances under which nonblocking I/O should be used. $^{(3)}$Why not just implement nonblocking I/O and have processes busy-wait until their devices are ready?

[![](https://i.imgur.com/t1X4vYz.png)](https://nanopdf.com/download/exe-c07-io_pdf)

blocking vs non-blocking:
* With blocking I/O, the calling process is stuck on the I/O operation while it waits for the transfer and cannot do anything else. Non-blocking I/O resembles busy-waiting: the call returns the transfer status immediately, but the process has to keep asking the kernel whether the data is ready.
* ![](https://i.imgur.com/uKukRsR.png)

When to use non-blocking:
* When there are many I/O devices (for example sockets), non-blocking I/O can be used for multiplexing (I/O multiplexing): the thread polls each device in turn for its I/O status, so a single thread monitors many devices at once and no system resources are wasted on multiple threads/processes.
* For intermittent data transfer (for example UDP traffic or streaming), non-blocking I/O copes with a device that does not send continuously. With blocking I/O, the call returns only after a complete unit of data (for example a UDP packet) has arrived, so the process wastes a lot of time waiting for the device to send.
* For handling data from several sources together: with blocking I/O, one transfer has to finish before the next one starts, during which the other I/O devices may sit idle; non-blocking I/O lets data from multiple sources be processed concurrently.

- [ref. of answer](https://nanopdf.com/download/exe-c07-io_pdf)
- [ref. of info: 菜鳥成長史](https://wirelessr.gitbooks.io/working-life/content/io_model.html)
- blocking vs non-blocking
    - :star: [Java的I/O模型](https://medium.com/@clu1022/%E6%B7%BA%E8%AB%87i-o-model-32da09c619e6) :star:
    - [我所理解的 I/O](https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/81519/)
    - [Stack overflow : Blocking IO vs non-blocking IO](https://stackoverflow.com/questions/1241429/blocking-io-vs-non-blocking-io-looking-for-good-articles)
    - [Stack overflow : What is "non-blocking" concurrency](https://stackoverflow.com/questions/2824225/what-is-non-blocking-concurrency-and-how-is-it-different-than-normal-concurren)
    - [計算機網路-TCP和UDP總結（區別、優缺點、應用場景）](https://www.796t.com/article.php?id=161917)

- Broadly, an I/O operation has two parts: issuing the request, and completion of the result.
    - If the caller blocks all the way from issuing the request until the result comes back, it is blocking I/O. If the call returns as soon as the request is issued (regardless of completion), it is non-blocking I/O. If the call returns as soon as the request is issued and the caller then blocks in select or poll waiting for the result, it can only be called I/O multiplexing. If the call returns as soon as the request is issued and the result is handled through a callback, it is AIO.
- $^{(1)}$
- $^{(2)}$
    - Non-blocking I/O is useful when I/O may come from more than one source and the order of the I/O arrival is not predetermined. Examples include network daemons listening to more than one network socket, window managers that accept mouse movement as well as keyboard input, and I/O-management programs, such as a copy command that copies data between I/O devices. In the last case, the program could optimize its performance by buffering the input and output and using non-blocking I/O to keep both devices fully occupied.
- $^{(3)}$Non-blocking I/O is more complicated for programmers, because of the asynchronous rendezvous that is needed when an I/O occurs. Also, busy waiting is less efficient than interrupt-driven I/O so the overall system performance would decrease.

[ref.](http://fcs351.pbworks.com/w/page/6493664/Tutorial9)

## 13.8
- Q: Some **DMA controllers** support **direct virtual memory access**, where the targets of I/O operations are specified as virtual addresses and **a translation from virtual to physical address** is performed **during the DMA**.
==How does this design complicate the design of the DMA controller?== What are the **advantages** of providing such functionality?

Having the DMA controller support direct virtual memory access means that part of the data in secondary storage (such as the hard disk) may need to be copied into memory. Virtual memory is implemented by looking up the page table to map virtual addresses to physical ones, so the DMA controller has to **handle both physical and virtual addresses**, which makes the address translation (the table lookup) more complex. The advantage is that other hardware can enjoy the benefits of virtual memory along with the CPU, for example:
* Contiguous memory regions
* Extra memory space

- [What do you think a DMA controller can get direct access to the virtual memory or not?](https://www.quora.com/What-do-you-think-a-DMA-controller-can-get-direct-access-to-the-virtual-memory-or-not)

![](https://i.imgur.com/M314vc9.png)

- Answer:
    - **Direct virtual memory access** allows a device to **perform a transfer between two memory-mapped devices *without the intervention of the CPU* or *the use of main memory as a staging ground***;
    - the device simply issues memory operations to the memory-mapped addresses of a target device and the ensuing virtual address translation guarantees that the data is transferred to the appropriate device.
    - This functionality, however, comes at the **cost** of having to support virtual address translation on addresses accessed by a DMA controller and ==requires the addition of an address-translation unit **(MMU)** to the DMA controller.==
    - The address translation results in **both hardware and software costs** and might also result in ==coherence problems== between **the data structures <span class="blue">maintained</span> by the CPU for address translation** and **corresponding structures used by the DMA controller**.
    - These coherence issues would also need to be dealt with and result in a further increase in system complexity.
    - ![](https://i.imgur.com/JlOtYwi.png)

# My homework answers
## 13.2
- ![](https://i.imgur.com/OVRZTr7.png)

## 13.5
- ![](https://i.imgur.com/ixZLxgh.png)

## 13.6
- ![](https://i.imgur.com/cXOeDDX.png)

## 13.8
- ![](https://i.imgur.com/4lVYgQS.png)
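
To make the 13.8 discussion a little more concrete, here is a small C simulation of why a DMA engine that accepts virtual addresses needs its own translation step. Everything in it is an assumption made for illustration: the tiny page size, the single-level `page_table`, and the function names `pt_init`, `translate`, and `dvma_read` are hypothetical, and real hardware (an IOMMU, for instance) is far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE     16u   /* deliberately tiny so the example is easy to trace */
#define NUM_PAGES     8u    /* size of the virtual address space, in pages */
#define NUM_FRAMES    8u    /* size of simulated physical memory, in frames */
#define INVALID_FRAME 0xFFu /* marks a virtual page with no mapping */

static uint8_t phys_mem[NUM_FRAMES * PAGE_SIZE]; /* simulated RAM */
static uint8_t page_table[NUM_PAGES];            /* virtual page -> frame */

/* Mark every virtual page as unmapped. */
static void pt_init(void) {
    memset(page_table, INVALID_FRAME, sizeof page_table);
}

/* The extra unit the DMA controller needs: virtual -> physical translation.
 * Returns 0 and fills *paddr on success, -1 if the page is not mapped. */
static int translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn = vaddr / PAGE_SIZE;
    uint32_t off = vaddr % PAGE_SIZE;
    if (vpn >= NUM_PAGES || page_table[vpn] >= NUM_FRAMES) {
        return -1;
    }
    *paddr = (uint32_t)page_table[vpn] * PAGE_SIZE + off;
    return 0;
}

/* Copy len bytes starting at virtual address vaddr into dst, as a DMA engine
 * would. The buffer is contiguous virtually but may be scattered physically,
 * so the copy is split at page boundaries and each piece is translated. */
static int dvma_read(uint32_t vaddr, uint8_t *dst, size_t len) {
    assert(dst != NULL);
    while (len > 0) {
        uint32_t paddr;
        if (translate(vaddr, &paddr) != 0) {
            return -1; /* real hardware would raise a fault to the OS here */
        }
        size_t chunk = PAGE_SIZE - (vaddr % PAGE_SIZE);
        if (chunk > len) {
            chunk = len;
        }
        memcpy(dst, &phys_mem[paddr], chunk);
        dst += chunk;
        vaddr += (uint32_t)chunk;
        len -= chunk;
    }
    return 0;
}
```

The sketch also hints at the coherence issue mentioned in the answer: `page_table` here is a single shared array, whereas in a real system the CPU's page tables and whatever the DMA controller caches from them have to be kept in sync.
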