
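# XV6 A simple, Unix-like Teaching Operating System Chapter 3

---

> [TOC]

---

## Chapter 3 Traps, interrupts, and drivers

- Handling these events poses three challenges:
    - The kernel must arrange for the processor to switch from user mode to kernel mode (and back again).
    - The kernel and the devices must coordinate their parallel activities.
    - The kernel must understand the details of the hardware interface.

---

### System calls, exceptions, and interrupts

- Three situations require a transition from User -> Kernel:
    - System call: a user program asks for an OS service.
    - Exception: a program performs an illegal operation.
    - Interrupt: a hardware device signals that it wants a handler to run.
        - Example: a timer raises an interrupt every 100 ms to implement timekeeping.

---

### X86 protection (privilege levels)

- x86 has four privilege levels, numbered from 0 (most privileged, kernel) to 3 (least privileged, user). In practice most operating systems use only two of them, 0 and 3, called kernel mode and user mode. The **privilege level** of the currently executing instruction is stored in the CPL field of the %cs register.
- On x86, the entry points of interrupt handlers are defined in the Interrupt Descriptor Table (IDT). The table has 256 entries, each giving the %cs (code segment) and %eip (extended instruction pointer) to use for the corresponding interrupt.
- To make a system call on x86, a program invokes the int n instruction, where n is the index into the IDT.
- The int instruction performs the following steps:
    1. Fetch the n-th descriptor from the IDT, where n is the argument of int.
    2. Check that the CPL field of %cs is <= DPL, where DPL is the privilege level recorded in the descriptor.
    3. Save %esp and %ss in CPU-internal registers, but only if the target segment selector's PL < CPL.
    4. Load %ss and %esp from a task segment descriptor.
    5. Push %ss.
    6. Push %esp.
    7. Push %eflags.
    8. Push %cs.
    9. Push %eip.
    10. Clear some bits of %eflags.
    11. Set %cs and %eip to the values in the descriptor.

![](https://i.imgur.com/8iqtqHG.png)

---

### Code: The first system call

### initcode.S

```c=
# Initial process execs /init.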

#include "syscall.h"
#include "traps.h"

# exec(init, argv)
.globl start
start:
  pushl $argv
  pushl $init
  pushl $0  // where caller pc would be
  movl $SYS_exec, %eax
  int $T_SYSCALL

# for(;;) exit();
exit:
  movl $SYS_exit, %eax
  int $T_SYSCALL
  jmp exit

# char init[] = "/init\0";
init:
  .string "/init\0"

# char *argv[] = { init, 0 };
.p2align 2
argv:
  .long init
  .long 0
```

---

### Code: Assembly trap handlers

- Related files
    - traps.c
    - traps.h
    - trapasm.S
    - vectors.pl
    - mmu.h
- tvinit is called from main.c and sets up the 256 entries of the IDT; interrupt i is then handled by the code at vectors[i] (the service routine).
- In the code below the T_SYSCALL entry is handled specially: it is the user system call trap, and passing 1 as the second argument marks it as a trap gate.
- Its privilege is also set to DPL_USER so that a user program can raise it with an explicit int instruction. (int may not be used to raise the other interrupts, such as device interrupts; an attempt causes a general protection exception, which goes to vector 13.)

### - tvinit()

```c=
void
tvinit(void)
{
  int i;

  for(i = 0; i < 256; i++)
    SETGATE(idt[i], 0, SEG_KCODE<<3, vectors[i], 0);
  SETGATE(idt[T_SYSCALL], 1, SEG_KCODE<<3, vectors[T_SYSCALL], DPL_USER);

  initlock(&tickslock, "time");
}
```

### - SETGATE()

```c=
// Set up a normal interrupt/trap gate descriptor.
// - istrap: 1 for a trap (= exception) gate, 0 for an interrupt gate.
//   interrupt gate clears FL_IF, trap gate leaves FL_IF alone
// - sel: Code segment selector for interrupt/trap handler
// - off: Offset in code segment for interrupt/trap handler
// - dpl: Descriptor Privilege Level -
//        the privilege level required for software to invoke
//        this interrupt/trap gate explicitly using an int instruction.
#define SETGATE(gate, istrap, sel, off, d) \
{ \
  (gate).off_15_0 = (uint)(off) & 0xffff; \
  (gate).cs = (sel); \
  (gate).args = 0; \
  (gate).rsv1 = 0; \
  (gate).type = (istrap) ? \
    STS_TG32 : STS_IG32; \
  (gate).s = 0; \
  (gate).dpl = (d); \
  (gate).p = 1; \
  (gate).off_31_16 = (uint)(off) >> 16; \
}
```

---

- When a trap occurs:
    - In user mode: the processor loads %esp and %ss from the task segment descriptor and pushes the old user %ss and %esp onto the new stack.
    - In kernel mode: none of this happens.

    **(Compare with the steps in "X86 protection".)**
- xv6 uses a Perl script to generate the entry points that the IDT entries point to. Each entry pushes an error code (if the processor didn't), pushes the interrupt number, and then jumps to alltraps.

:::warning
- Can't find where vectors.pl gets run. (In the xv6 source tree, the Makefile appears to run it to generate vectors.S.)
- Don't understand what "if the processor didn't" means. (Judging from vectors.pl below, vectors 8, 10 to 14, and 17 skip the pushl $0, which suggests that the processor pushes an error code by itself for those.)
:::

### - vectors.pl

```perl=
#!/usr/bin/perl -w

# Generate vectors.S, the trap/interrupt entry points.
# There has to be one entry point per interrupt number
# since otherwise there's no way for trap() to discover
# the interrupt number.

print "# generated by vectors.pl - do not edit\n";
print "# handlers\n";
print ".globl alltraps\n";
for(my $i = 0; $i < 256; $i++){
    print ".globl vector$i\n";
    print "vector$i:\n";
    if(!($i == 8 || ($i >= 10 && $i <= 14) || $i == 17)){
        print "  pushl \$0\n";
    }
    print "  pushl \$$i\n";
    print "  jmp alltraps\n";
}

print "\n# vector table\n";
print ".data\n";
print ".globl vectors\n";
print "vectors:\n";
for(my $i = 0; $i < 256; $i++){
    print "  .long vector$i\n";
}
```

- vectors.S https://gist.github.com/syuu1228/3588942
- alltraps continues saving the processor registers. Its main job is to set up, and later restore, the call stack. trap takes a trapframe* as input: within the trapframe, eip, cs, and so on are filled in by hardware, trapno and err are set by the code in vectors.S, and alltraps pushes ds, es, and all remaining registers onto the stack in the order shown in the figure below. This both preserves the interrupted context and gives the interrupt handler convenient access to information about the interrupt.

```c=
# File: trapasm.S
#include "mmu.h"

  # vectors.S sends all traps here.
.globl alltraps
alltraps:
  # Build trap frame.
  # Save the current data segment registers and general registers.
  pushl %ds
  pushl %es
  pushl %fs
  pushl %gs
  pushal

  # Set up data and per-cpu segments.
  # Switch the data segments to the kernel data segment.
  movw $(SEG_KDATA<<3), %ax
  movw %ax, %ds
  movw %ax, %es
  movw $(SEG_KCPU<<3), %ax
  movw %ax, %fs
  movw %ax, %gs

  # Call trap(tf), where tf=%esp
  # Pass a pointer to the trapframe (%esp) as the argument of trap.
  pushl %esp
  call trap
  addl $4, %esp

  # Return falls through to trapret...
.globl trapret
trapret:
  popal
  popl %gs
  popl %fs
  popl %es
  popl %ds
  addl $0x8, %esp  # trapno and errcode
  iret
```

:::warning
- Problem
    - Set up data and per-cpu segments? (How are they set up? Why set them up for every CPU?)
:::

![](https://i.imgur.com/zToEoSc.png)

---

### Code: C trap handler

- After building the trap frame, every handler calls the C function trap. trap looks at the trap number in tf->trapno to find out why it was called and what needs to be done.
- If the trap is T_SYSCALL, trap calls the system call handler syscall.
- Otherwise it goes on to check for hardware interrupts.
- The details come up again in Chapter 5.

### - trap()

```c=
//PAGEBREAK: 41
void
trap(struct trapframe *tf)
{
  if(tf->trapno == T_SYSCALL){
    if(proc->killed)
      exit();
    proc->tf = tf;
    syscall();
    if(proc->killed)
      exit();
    return;
  }

  switch(tf->trapno){
  case T_IRQ0 + IRQ_TIMER:
    if(cpu->id == 0){
      acquire(&tickslock);
      ticks++;
      wakeup(&ticks);
      release(&tickslock);
    }
    lapiceoi();
    break;
  case T_IRQ0 + IRQ_IDE:
    ideintr();
    lapiceoi();
    break;
  case T_IRQ0 + IRQ_IDE+1:
    // Bochs generates spurious IDE1 interrupts.
    break;
  case T_IRQ0 + IRQ_KBD:
    kbdintr();
    lapiceoi();
    break;
  case T_IRQ0 + IRQ_COM1:
    uartintr();
    lapiceoi();
    break;
  case T_IRQ0 + 7:
  case T_IRQ0 + IRQ_SPURIOUS:
    cprintf("cpu%d: spurious interrupt at %x:%x\n",
            cpu->id, tf->cs, tf->eip);
    lapiceoi();
    break;

  //PAGEBREAK: 13
  default:
    if(proc == 0 || (tf->cs&3) == 0){
      // In kernel, it must be our mistake.
      cprintf("unexpected trap %d from cpu %d eip %x (cr2=0x%x)\n",
              tf->trapno, cpu->id, tf->eip, rcr2());
      panic("trap");
    }
    // In user space, assume process misbehaved.
    cprintf("pid %d %s: trap %d err %d on cpu %d "
            "eip 0x%x addr 0x%x--kill proc\n",
            proc->pid, proc->name, tf->trapno, tf->err, cpu->id, tf->eip,
            rcr2());
    proc->killed = 1;
  }

  // Force process exit if it has been killed and is in user space.
  // (If it is still executing in the kernel, let it keep running
  // until it gets to the regular system call return.)
  if(proc && proc->killed && (tf->cs&3) == DPL_USER)
    exit();

  // Force process to give up CPU on clock tick.
  // If interrupts were on while locks held, would need to check nlock.
  if(proc && proc->state == RUNNING && tf->trapno == T_IRQ0+IRQ_TIMER)
    yield();

  // Check if the process has been killed since we yielded
  if(proc && proc->killed && (tf->cs&3) == DPL_USER)
    exit();
}
```

---

### Code: System calls (mainly about the mechanism)

- trap calls syscall.

### - System call numbers

```c=
// File:syscall.h
// System call numbers
#define SYS_fork    1
#define SYS_exit    2
#define SYS_wait    3
#define SYS_pipe    4
#define SYS_read    5
#define SYS_kill    6
#define SYS_exec    7
#define SYS_fstat   8
#define SYS_chdir   9
#define SYS_dup    10
#define SYS_getpid 11
#define SYS_sbrk   12
#define SYS_sleep  13
#define SYS_uptime 14
#define SYS_open   15
#define SYS_write  16
#define SYS_mknod  17
#define SYS_unlink 18
#define SYS_link   19
#define SYS_mkdir  20
#define SYS_close  21
```

### - Function

```c=
// File:syscall.c
static int (*syscalls[])(void) = {
[SYS_fork]    sys_fork,
[SYS_exit]    sys_exit,
[SYS_wait]    sys_wait,
[SYS_pipe]    sys_pipe,
[SYS_read]    sys_read,
[SYS_kill]    sys_kill,
[SYS_exec]    sys_exec,
[SYS_fstat]   sys_fstat,
[SYS_chdir]   sys_chdir,
[SYS_dup]     sys_dup,
[SYS_getpid]  sys_getpid,
[SYS_sbrk]    sys_sbrk,
[SYS_sleep]   sys_sleep,
[SYS_uptime]  sys_uptime,
[SYS_open]    sys_open,
[SYS_write]   sys_write,
[SYS_mknod]   sys_mknod,
[SYS_unlink]  sys_unlink,
[SYS_link]    sys_link,
[SYS_mkdir]   sys_mkdir,
[SYS_close]   sys_close,
};
```

### - syscall()

```c=
// File:syscall.c
void
syscall(void)
{
  int num;

  // The system call number arrives in the saved %eax.
  num = proc->tf->eax;
  if(num > 0 && num < NELEM(syscalls) && syscalls[num]) {
    // The return value goes back to the user in %eax.
    proc->tf->eax = syscalls[num]();
  } else {
    cprintf("%d %s: unknown sys call %d\n",
            proc->pid, proc->name, num);
    proc->tf->eax = -1;
  }
}
```

- How system call arguments are retrieved
    - argint()
    - argptr()
    - argstr()

    These fetch an integer, a pointer, and the start address of a string, respectively.

---

### Code: Interrupts

- Programmable Interrupt Controller (PIC)
    - With the arrival of multiprocessor boards a new way of handling interrupts was needed, because every CPU needs an interrupt controller to handle the interrupts sent to it, and there must be a method for routing interrupts to processors. This scheme has two parts:
        - The first part lives in the I/O system (IO APIC, ioapic.c).
        - The other part is attached to each processor (local APIC, lapic.c).
    - xv6 is designed for boards with multiple processors, and each processor must be programmed to receive interrupts.
- Old -> PIC, managed by picirq.c. (uniprocessor)
- New -> APIC, managed by ioapic.c and lapic.c. (multiprocessor)
- Key point: multi-core
    - 
      On a multiprocessor, xv6 must program the IOAPIC and the LAPIC of every processor. The IO APIC maintains a table, and a processor programs the table's entries through memory-mapped I/O instead of the inb and outb instructions used for the PIC. During initialization, xv6 maps interrupt 0 to IRQ 0 and so on, and then masks them all. Each device enables its own interrupt and at the same time says which processor should receive it. For example, xv6 routes keyboard interrupts to processor 0 and routes disk interrupts to the highest-numbered processor, as shown below.

### - lapicinit()

```c=
// File:lapic.c
void
lapicinit(void)
{
  if(!lapic)
    return;

  // Enable local APIC; set spurious interrupt vector.
  lapicw(SVR, ENABLE | (T_IRQ0 + IRQ_SPURIOUS));

  // The timer repeatedly counts down at bus frequency
  // from lapic[TICR] and then issues an interrupt.
  // If xv6 cared more about precise timekeeping,
  // TICR would be calibrated using an external time source.
  lapicw(TDCR, X1);
  lapicw(TIMER, PERIODIC | (T_IRQ0 + IRQ_TIMER));
  lapicw(TICR, 10000000);

  // Disable logical interrupt lines.
  lapicw(LINT0, MASKED);
  lapicw(LINT1, MASKED);

  // Disable performance counter overflow interrupts
  // on machines that provide that interrupt entry.
  if(((lapic[VER]>>16) & 0xFF) >= 4)
    lapicw(PCINT, MASKED);

  // Map error interrupt to IRQ_ERROR.
  lapicw(ERROR, T_IRQ0 + IRQ_ERROR);

  // Clear error status register (requires back-to-back writes).
  lapicw(ESR, 0);
  lapicw(ESR, 0);

  // Ack any outstanding interrupts.
  lapicw(EOI, 0);

  // Send an Init Level De-Assert to synchronise arbitration ID's.
  lapicw(ICRHI, 0);
  lapicw(ICRLO, BCAST | INIT | LEVEL);
  while(lapic[ICRLO] & DELIVS)
    ;

  // Enable interrupts on the APIC (but not on the processor).
  lapicw(TPR, 0);
}

int
cpunum(void)
{
  // Cannot call cpu when interrupts are enabled:
  // result not guaranteed to last long enough to be used!
  // Would prefer to panic but even printing is chancy here:
  // almost everything, including cprintf and panic, calls cpu,
  // often indirectly through acquire and release.
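  if(readeflags()&FL_IF){
    static int n;
    if(n++ == 0)
      cprintf("cpu called from %x with interrupts enabled\n",
        __builtin_return_address(0));
  }

  if(lapic)
    return lapic[ID]>>24;
  return 0;
}
```

---

### Supplement: PIC

- Reference: http://docs.linuxtone.org/ebooks/Optimze/Interrupt%20in%20Linux.pdf
- PIC
    - The interrupt controller is the bridge between devices and the CPU: an interrupt raised by a device must be forwarded by the interrupt controller before it finally reaches the CPU.
    - Generally, pins 5, 7, 9, 10, and 11 are available to other devices. Annoyingly, for compatibility reasons the BIOS usually reserves a pin even when no legacy device is attached to it; IRQ 3, 4, and 13 are often in this situation. The PIC can therefore serve very few devices and, more fatally, it cannot be used on MP (multiprocessor) platforms.

![](https://i.imgur.com/5gpSt14.png =500x400)

- APIC
    - The APIC consists of two parts:
        - one called the LAPIC (local advanced programmable interrupt controller)
        - one called the IOAPIC (I/O APIC, I/O advanced programmable interrupt controller)
    - The former sits inside the CPU; on an MP platform every CPU has its own LAPIC. The latter usually sits on the southbridge and, like the PIC, connects to the devices that generate interrupts.

![](https://i.imgur.com/dIMiyRX.png)

- IOAPIC
    - The main role of the IOAPIC is interrupt routing. Based on its internal PRT (Programmable Redirection Table), the IOAPIC formats an interrupt message and sends it to the LAPIC of some CPU, and that LAPIC notifies its CPU to handle it. Each redirection table entry (RTE) has the following layout:

| Bit | Description |
| -------- | -------- |
| 63:56 | Destination Field, R/W (readable and writable). Its meaning depends on the Destination Mode (see below): |
| | 1. Physical mode (Destination Mode is 0): the value is an APIC ID, identifying a single APIC. |
| | 2. Logical mode (Destination Mode is 1): the value stands for a group of CPUs, depending on how the LAPICs are configured (see the LAPIC material). |
| 55:17 | Reserved |
| 16 | Interrupt Mask, R/W. When 1, the corresponding interrupt pin is masked and interrupts raised on it are ignored. When 0, interrupts raised on the pin are sent to the LAPIC. |
| 15 | Trigger Mode, R/W. Says how interrupts on this pin are triggered. 1: level-triggered; 0: edge-triggered. |
| 14 | Remote IRR, RO (read-only). Only meaningful for level-triggered interrupts; for edge-triggered interrupts its value is undefined. For a level-triggered interrupt, the bit is set to 1 when the LAPIC accepts the interrupt and cleared when the LAPIC writes EOI. |
| 13 | Interrupt Input Pin Polarity (INTPOL), R/W. Says whether the pin is active high or active low. 0: active high; 1: active low. |
| 12 | Delivery Status, RO. 0: Idle, no interrupt is pending. 1: Send Pending, the IOAPIC has received the interrupt but for some reason has not yet sent it to the LAPIC. (Author's note: "some reason" could be, for example, that the IOAPIC has not won bus arbitration.) |
| 11 | Destination Mode, R/W. 0: Physical Mode, see Destination Field. 1: Logical Mode, see Destination Field. |
| 10:8 | Delivery Mode. Fixed: 000b, SMI: 010b, NMI: 100b, INIT: 101b, ExtINT: 111b. |
| 7:0 | Interrupt Vector, R/W. The vector assigned to this interrupt, in the range 10h to FEh. |

- When a pin of the IOAPIC receives an interrupt signal, the IOAPIC formats an interrupt message according to the RTE of that pin and sends it to the LAPIC of one (or several) CPUs. As the table shows, the message carries all the information about the interrupt.
- LAPIC
    - After receiving an interrupt message from the IOAPIC, the LAPIC hands the interrupt to the CPU. Compared with the IOAPIC, the LAPIC has more registers and more complex mechanisms, but for handling interrupt messages from the IOAPIC the most important registers are still IRR, ISR, and EOI.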
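
To make the redirection table entry (RTE) layout from the table above concrete, here is a minimal sketch in C that packs and unpacks a 64-bit RTE value. The names rte_fields, rte_decode, and rte_make are hypothetical and chosen only for illustration; they are not taken from xv6. The bit positions follow the table, and bit 15 is assumed to mean 1 for level-triggered and 0 for edge-triggered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of one IOAPIC redirection table entry (not xv6 code). */
struct rte_fields {
  uint8_t vector;          /* bits 7:0   interrupt vector (10h..FEh)     */
  uint8_t delivery_mode;   /* bits 10:8  000b = Fixed, 100b = NMI, ...   */
  uint8_t dest_mode;       /* bit 11     0 = physical, 1 = logical       */
  uint8_t delivery_status; /* bit 12     0 = idle, 1 = send pending      */
  uint8_t polarity;        /* bit 13     0 = active high, 1 = active low */
  uint8_t remote_irr;      /* bit 14     level-triggered only            */
  uint8_t trigger_mode;    /* bit 15     assumed: 1 = level, 0 = edge    */
  uint8_t masked;          /* bit 16     1 = pin is masked               */
  uint8_t destination;     /* bits 63:56 APIC ID or logical group        */
};

/* Split a raw 64-bit RTE into its fields. */
static struct rte_fields rte_decode(uint64_t rte)
{
  struct rte_fields f;
  f.vector          = (uint8_t)(rte & 0xFF);
  f.delivery_mode   = (uint8_t)((rte >> 8) & 0x7);
  f.dest_mode       = (uint8_t)((rte >> 11) & 0x1);
  f.delivery_status = (uint8_t)((rte >> 12) & 0x1);
  f.polarity        = (uint8_t)((rte >> 13) & 0x1);
  f.remote_irr      = (uint8_t)((rte >> 14) & 0x1);
  f.trigger_mode    = (uint8_t)((rte >> 15) & 0x1);
  f.masked          = (uint8_t)((rte >> 16) & 0x1);
  f.destination     = (uint8_t)((rte >> 56) & 0xFF);
  return f;
}

/* Build a fixed-delivery, edge-triggered, active-high, physical-mode RTE. */
static uint64_t rte_make(uint8_t vector, uint8_t apic_id, int masked)
{
  uint64_t rte;
  assert(vector >= 0x10 && vector <= 0xFE); /* valid vector range per the table */
  rte = vector;
  if (masked)
    rte |= (uint64_t)1 << 16;
  rte |= (uint64_t)apic_id << 56;
  return rte;
}
```

For instance, rte_decode(rte_make(0x21, 0, 0)) describes an unmasked, edge-triggered interrupt with vector 0x21 routed in physical mode to the APIC with ID 0.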
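
![](https://i.imgur.com/5hGW1RQ.png)

- Unlike the IRR and ISR of the PIC, the ISR and IRR of the LAPIC are both 256-bit registers, matching the 256 interrupt vectors of the x86 platform; vectors 0~15 are reserved by the architecture.
    - IRR: works like the one in the PIC. It records interrupts that the LAPIC has accepted but not yet handed to the CPU.
    - ISR: works like the one in the PIC. It records interrupts that the CPU has started handling but not yet finished. One difference from the PIC: while the CPU is handling an interrupt, another interrupt of the same type sets the corresponding IRR bit again (in PIC mode, interrupts of the same type are masked); and if an interrupt is pending in the IRR when another of the same type arrives, the corresponding ISR bit is set. In other words, on an APIC system an interrupt of a given type can be counted at most twice (if this is hard to picture, think of Linux reliable signals). Beyond two, different architectures behave differently.
- As with the PIC, software must write EOI to tell the LAPIC that interrupt handling has finished. The difference is that the EOI of the LAPIC is a 32-bit register.

---

### Code: Disk driver

- main calls ideinit.
- ideinit calls picenable and ioapicenable to enable the IRQ_IDE interrupt.
    - picenable enables the interrupt on a uniprocessor.
    - ioapicenable enables the interrupt on a multiprocessor, but only on the last CPU.

:::warning
On a two-processor system, CPU 1 handles the disk interrupts.
:::

- ideinit then probes the disk hardware:
    - It first calls idewait to wait until the disk can accept commands.
    - The PC motherboard presents the status of the disk hardware on I/O port 0x1f7.
    - idewait polls the status until IDE_BSY is clear and IDE_DRDY is set.
- Once the disk controller is ready, ideinit can check how many disks are present.
    - It assumes that disk 0 is present, because the boot loader and the kernel were both loaded from disk 0, but it must check for disk 1. It writes to I/O port 0x1f6 to select disk 1, waits for a while, and reads the status bits to see whether the disk is ready. If not, ideinit assumes the disk is absent.

---

### Real world

- Most of the devices in this chapter are programmed with I/O instructions, which reflects how old these devices are. Modern devices are all programmed through memory-mapped I/O.

---

### Summary

- For me, the parts of this chapter that deserve the most attention are the trap mechanism and the APIC (multi-core).

---

## 20170909 Meeting

Nothing!

---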
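
Looking back at the disk driver section of Chapter 3, the readiness check that idewait performs on the status port can be sketched in ordinary user-space C. This is only an illustration under stated assumptions: the status_reader callback is a hypothetical stand-in for reading I/O port 0x1f7 with inb, fake_status simulates a device, and the max_polls limit is an addition for the sake of a terminating demo (the notes describe idewait as simply waiting). The bit values 0x80 and 0x40 are the usual ATA busy and ready status bits.

```c
#include <assert.h>
#include <stdint.h>

#define IDE_BSY  0x80  /* status bit: controller is busy */
#define IDE_DRDY 0x40  /* status bit: drive is ready     */

/* Hypothetical stand-in for inb(0x1f7); not part of xv6. */
typedef uint8_t (*status_reader)(void);

/* Ready means: busy bit clear AND ready bit set. */
static int ide_status_ready(uint8_t status)
{
  return (status & (IDE_BSY | IDE_DRDY)) == IDE_DRDY;
}

/*
 * Poll the status until the disk is ready.
 * Returns the number of reads it took, or -1 if max_polls reads
 * were made without the disk becoming ready.
 */
static int ide_wait_sketch(status_reader read_status, int max_polls)
{
  int i;
  for (i = 1; i <= max_polls; i++) {
    if (ide_status_ready(read_status()))
      return i;
  }
  return -1;
}

/* Simulated device: reports busy for the first two reads, then ready. */
static int fake_reads;
static uint8_t fake_status(void)
{
  return (fake_reads++ < 2) ? IDE_BSY : IDE_DRDY;
}
```

With fake_reads reset to 0, ide_wait_sketch(fake_status, 10) sees two busy readings and succeeds on the third read. A status with both bits set still counts as not ready, which is why the check compares against IDE_DRDY instead of testing that bit alone.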
