PDF link: https://book.douban.com/subject/33463930/
為什麼需要 Vitualization
為了使 OS 更加易於使用-將 Hardware(physical resource) 轉換為通用的虛擬形式
資源的隔離
以模擬獨有的方式共享 CPU resource
以模擬獨有的方式共享 Memory resource
在 writeback 模式下,cpu cache 寫回 memory 要三個步驟。
多個Vitual Resource 同時對單一 api 的解釋不存在 atomically
DRAM 以 volatile 方式存儲數據,中止後數據就丟失。
概敘流程:透過 System Call 打到 File System ,對 I/O Device 發出請求,轉到 Device Driver。
在 Abstraction 和 Performance 中間找到平衡點
寫 C 不用管組合語言,寫組合語言不用考慮指令集實作,實作指令集不用考慮電晶體的組成,這就是抽象的力量。
避免 Application 破壞整個 OS
避免 OS 層級的失效,提高 OS 本身的 Reliability
OS = Library
Isolation 的概念開始出現
任意存取 Memory 或 File I/O 沒有 Isolation 和 Reliability
從操作 File 的 Library 升級為操作 File System 的 Library
如何隔離對 File 和 File System 的操作 -> System Call
隔離 User Mode 跟 Kernel Mode
在 Kernel Mode 下,OS 可以完全操作 Hardware,因此可以透過 driver 操作 I/O 或取求更多 Memory 給 User 之類的功能
流程是 Application -> trap -> trap handler -> kernal mode -> return from trap
提高 CPU 利用率,減少 I/O 操作的佔比。
Memory protection, Concurrency 變成顯學
Pipe 是創舉,提供極為直覺的程序互動方式。
Unix -> Linux 脫離了版權枷鎖。
Physical 上有一個桃子
有無限多個 User 都想吃桃,由於 User 量不定因此我們也無法預先切好桃子來分配。
於是我們提供一個假象,承諾 Make Peach Great Again 每個人都有一個自己的桃子,但其實每個人拿到的都是一個 Virtual Peach。
這樣做不會出錯的基礎是建立在
在 OS 的議題上,CPU 就是那個桃子
後面不會有更多桃子了。
Process 是 OS 對 Resource 調度的最基本抽象,Program 本身沒有生命週期,只是存在 Disk 上的一堆 Byte,只有跟 Process 結合成為 Application 之後才有 Life cycle 可言。
有了 Process 之後 User 就不用管理 CPU 的可用狀況。
Q: 如何提供有無限多
桃..Resource 的假象
A: 基於 Time Sharing 和 Space Sharing 共享 Resource,對 CPU 使用 Time Sharing,在不同的時間段把 CPU 提供給不同 Instance 使用。對 Disk 使用 Space Sharing,把整個 Disk 切成不同的空間分給不同 Instance 使用。
在此機制之上,必定要有特定 Policy 存在以作到 Sharing,Scheduling Policy 會嘗試利用各種訊息(e.g. Time usage,Space usage, throughput) 去做出決定。
Process 的 Life cycle 儲存在兩個部份,其一是 address space,具體長相參見下圖
其二是 Register,很多執行的狀態會在 Register 中流轉。
例如以下組合語言
個人就曾經有經驗,由於 Stack overflow 不小心修改到 Stack Pointer,最終導致 Register 從 Stack Pointer 取得位置並 return 時跳到無法解析的位置導致 Exception。
這種狀況操作的好可以直接變成一種攻擊手段,控制 Applicatino 跳轉到任意指定的位置。
分為兩種,Eagerly & Lazily
Ready -> Running = Scheduled
Running -> Blcoked = Descheduled
OS 在 Trace Process status 的時候會需要許多資料,因此在 Descheduled 的時候會需要把這些資料保存到 Memory 內,範例如下
OS
xv6 process kernel data struct
此外 Process 其實還有一些其他的狀態如
因此 Linux 中 Parent 呼叫 wait 之後的流程其實是這樣
Event | Parent | Child |
---|---|---|
Running | Running | |
Parent::wait(Block) | Blocking | Running |
Child::return | Blocking | Final |
Ready | Final | |
Parent::wait(Unblock) | Running |
./process-run.py
獨萬捲書不如行萬里路,對於內容有任何問題都推薦直接啃 xv6 Source Code ,極為簡潔易懂。
fork 是相當特殊的 API,同時也是搞 Multi-Process 一切開始的地方。
輸出
或
這種不確定就是 Concurrency 的代表特徵之一
Child Process 有不同的 address space, register 等,但 memory 內的數值都是複製過去的。
Parent process 會獲得 Child process 的 PID,至於 Child Process 則會獲得 0。
感謝宇光提出
TL;DR
fork 的執行在 Kernel mode,有全部權限,執行完透過 Trap Frame 修改 Register 狀態來回傳。
可以參考 xv6 這裡的敘述
https://github.com/mit-pdos/xv6-public/blob/eeb7b415dbcb12cc362d0783e41c3d1f44066b17/TRICKS
實作在這
https://github.com/mit-pdos/xv6-public/blob/master/proc.c#L181
https://github.com/mit-pdos/xv6-public/blob/master/proc.c#L74
調用 Fork 後切到 Kernel Mode,在系統創建新 Process 之後可以從 PCB 取得 PID,接下來在載入 Parent Process stack pointer 的時候先將 PID 寫入 return value 的 register,等 Parent Process 載入完成時自然可以從該 register 讀到對應 Value。
On x86 the EAX register is normally used for return values.
可見編譯出來的組合語言也是直接從 eax 取值。
https://godbolt.org/z/7893jnf6z
Block until all child process execute to Final
waitpid 可以指定 child process
具體流程可參考上方 Data Struct 小節
直接開啟加載給定 binary 並且執行,繼承現有 resource,除非 exec 在加載時發生異常不然永不返回。
把操作範圍最小化,這樣可以方便實現很多其他功能。
比如 Shell 在解析 cat main.go > test.txt
的時候,可以先 fork 一個 Process,並且在真正執行 cat 之前先把這個 Child process 的 stdout 導到 test.txt,在 exec cat main.go
時會自動繼承,並且由於 exec cat main.go
時是直接使用 Child process,對於 Parent process 可以不在意 Child process 打算做什麼,單純就 wait() 直到對應 Child process 返回即可。
簡單,方便,易測試。
RTFM(Read the friendly manpage)
We want to virtualize our resources without performance costs
Only kernal mode can access the physical resource
System Call
a way to allow OS expose some key feature to user application.
An standard data exchange interface for System Call
When System Call happen
Call fork() -> Call Trap() ->
Save current register status to TrapFrame ->
Query Trap Table to find out where is the System Call code ->
Jump to specific space to execute System Call Code(A number) ->
Send interupt ->
OS received interupt, read System Call Code number, Call specific function ->
Function read argument from trap frame
If a process never yield itself, how we make sure OS is still executing in our CPU?
Through Clock interupt.(Provide by hardware)
However, cause the trap is not call by User.
So the register should be save by hardware. (Just like when a function call happen)
then trigger trap() -> jump to switch() -> executing other process
Time already approve that reboot computer is a good way to make sure the status of application back to our control.
It's also easy way can be automatic
So be positive about reboot. It's not a bad way to solve problem. It's a reliable way to solve the problem XD
Why context effect performance a lot?
We may heard that it's because that CPU have to reload process status.
However, what's "process status"?
It's actully combine lot's of things include
Recover register status is actully not the biggest cost, it's quite cheap.
All the memory address we see is virtual address
only OS & Hardware know the physical address
Those are actully C library
reference: https://blog.csdn.net/dog250/article/details/80331590
Source code: https://github.com/lattera/glibc/blob/master/malloc/malloc.c
Unitialize/Initialized data are also a part of heap
With Base address
& Size
Usually It will store like this(explicit)
Top 2 bit declare is it stack/heap/text
So the code to find the physical address will be like
Bounds and Base will be different for each process? or different for each segment?
stack grow reversely, how we present it?
Quite simple, add a reverse flag.(one bit)
Or hardcode the caculation of stack as subtraction
Some of Segment need to been sharing
e.g, dynamic library
So there is actually a Protect bit
to mark is those
Better use space, let heap and stack can share the same unused address range.
There is two way to record memory space usage
for example, we have follow memory space
0~9 | 10~19 | 20~29 |
---|---|---|
free | used | free |
Sees the no big different. However, let's see another example
0~9 | 10~19 | 20~29 |
---|---|---|
used | used | free |
It show's that the time complexity of find a free space in second solution(Record free space) is always O(1). However, in the first solution(Record used space) we may have to traverse to the end of list, it worst-case time-complexity is O(n).
There are still a disadvantages for second solution, it may sperate a free space to two different block, such as
We have to merge them by ourself when we schedule them.
magic
is used to check is that header correct or not, a.k.a checksum.
traverse the list, find the smallest free space that fit the requirement
0~29
0~9 | 10~19 | 20~29 |
---|---|---|
used | used | used |
free(obj1) 40
obj2 50
free C -> Mark