Multi-threading on top of Operating System in 1,000 Lines

# Multi-threading on top of Operating System in 1,000 Lines 你們的任務是詳閱 [Operating System in 1,000 Lines](https://operating-system-in-1000-lines.vercel.app/en) 並在 QEMU 上實驗，先逐步建構出小型的RISC-V 作業系統，隨後擴充程式碼，使其具備部分 POSIX Thread 介面，例如 mutex, conditionvariable, semaphore, barrier 等，而且該考慮到 priority inversion。參見: https://hackmd.io/@sysprog/concurrency-thread-package [期末專題 hackmd](https://hackmd.io/@sysprog/BJBqnuPBye) ## 事前資料閱讀 ### 基本資料 [isa-zicsr](https://drive.google.com/file/d/1uviu1nH-tScFfgrovvFCrj7Omv8tFtkp/view) [ISA-privilged](https://drive.google.com/file/d/17GeetSnT5wW3xNuAHI95-SI1gPGd5sJ_/view) ### ISA #### [CSR instructions]([isa-zicsr](https://drive.google.com/file/d/17GeetSnT5wW3xNuAHI95-SI1gPGd5sJ_/view)) p.65 ![{F2D02CFA-268A-48FC-BA90-61767751FEDA}](https://hackmd.io/_uploads/S1ze7MY8kg.png) ##### Conditions determining whether a CSR instruction reads or writes the specified CSR. * Register operand | **Instruction** | **rd is x0** | **rs1 is x0** | **Reads CSR** | **Writes CSR** | |--------------------|--------------|---------------|---------------|----------------| | CSRRW | Yes | - | No | Yes | | CSRRW | No | - | Yes | Yes | | CSRRS/CSRRC | - | Yes | Yes | No | | CSRRS/CSRRC | - | No | Yes | Yes | * Immediate operand | **Instruction** | **rd is x0** | **uimm = 0** | **Reads CSR** | **Writes CSR** | |------------------------|--------------|--------------|---------------|----------------| | CSRRWI | Yes | - | No | Yes | | CSRRWI | No | - | Yes | Yes | | CSRRSI/CSRRCI | - | Yes | Yes | No | | CSRRSI/CSRRCI | - | No | Yes | Yes | ### 作業系統層次架構 ```graphviz digraph RISC_V_OS { rankdir=TB; node [shape=box, style=filled, fillcolor=lightblue]; UserSpace [label="User Space\nHigh-level APIs\n(e.g., write(), sleep())"]; OSKernel [label="OS Kernel\nSystem Calls\n(e.g., sys_write(), sys_sleep())"]; SBI [label="SBI (Supervisor Binary Interface)\nHardware Abstraction Layer\n(e.g., UART, Timer, Shutdown)"]; Hardware [label="Hardware\nUART, Timer, CPU"]; subgraph cluster_UserSpace { label = "User Space"; style = rounded; node [shape=box]; write [label="write(fd, buf, count)"]; sleep [label="sleep(ticks)"]; } subgraph cluster_OSKernel { label = "OS Kernel"; style = rounded; node [shape=box]; sys_write [label="sys_write(fd, buf, count)"]; sys_sleep [label="sys_sleep(ticks)"]; SBIWrapper [label="SBI Wrappers"]; } subgraph cluster_SBI { label = "SBI"; style = rounded; node [shape=box]; sbi_console_putchar [label="sbi_console_putchar"]; sbi_set_timer [label="sbi_set_timer"]; sbi_shutdown [label="sbi_shutdown"]; } subgraph cluster_Hardware { label = "Hardware"; style = rounded; node [shape=box]; UART [label="UART (Serial Port)"]; Timer [label="Timer"]; CPU [label="CPU"]; } // Connections write -> sys_write; sleep -> sys_sleep; sys_write -> sbi_console_putchar; sys_sleep -> sbi_set_timer; sbi_console_putchar -> UART; sbi_set_timer -> Timer; sbi_shutdown -> CPU; } ``` ### 巨集語法 `__VA_ARGS__` 是 C 的一個標準功能，用於處理可變參數宏。它允許宏接收不定數量的參數 `##` 當可變參數為空時，## 會移除前面多餘的逗號，避免編譯錯誤。 ### 異常處理 RISC-V 中異常處理的簡化過程異常發生後，處理器： 1. 保存當前指令的地址到 sepc (保存異常指令地址的 CSR )。記錄異常原因到 scause。根據 stvec 的值跳轉到異常處理程序。在進入異常處理程序之前： 2. 堆疊指針（SP）被設置到內核專用的空間（可以透過 sscratch 暫存舊的 SP）。一些必要的暫存器值可能已經自動保存到堆疊。異常處理程序： 3. 呼叫 handle_trap 函數，使用 a0 中的堆疊指針，根據 scause 判斷異常的原因。根據異常類型進行具體處理，並可能修改保存的寄存器值。處理完成後： 4. 恢復保存的寄存器內容。使用 sret 返回到異常發生的位置，繼續執行原本的程式。 :::info * sscratch 當 trap 發生時，暫存此執行緒的 kernel sp * 保存異常狀態的結構的地址會保存在 a0 傳遞給 exception_handler * `WRITE_CSR` 設置 `stvec` 為進入異常處理入口 (`kernel_entry`) 的地址。 `kernel_entry` 主要責任是： 1. 保存當前處理器狀態（寄存器、堆疊等）。 2. 確定異常的原因（通過讀取 scause, sepc, stval 等 CSR）。 3. 根據異常原因執行對應的處理邏輯（如非法指令、頁面錯誤或中斷處理）。 4. 恢復處理器狀態，並繼續執行原程序或進行其他操作。 ::: ### 模擬 trap ```clike void kernel_main(void) { memset(__bss, 0, (size_t)__bss_end - (size_t)__bss); WRITE_CSR(stvec, (uint32_t)kernel_entry); __asm__ __volatile__("unimp"); // pseudo instruction to trigger an illegal instruction trap for (;;) { __asm__ __volatile__("wfi"); } } ``` :::info `#reg` 將 reg 轉換成字串 ``` __asm__ __volatile__("csrw " #reg ", %0" ::"r"(__tmp)) ``` ::: ### Linker Script 特殊的腳本語言，用於控制鏈接器（Linker）在將多個目標文件（object files）組合成可執行文件（executable）時，如何安排記憶體的使用和程式的佈局。可以定義例如 `__stack_top ` 等各種符號，讓 c 語言可以使用來控制記憶體分配。 ### memory allocation `PAGE_SIZE` 定義在 `common.h` 而非在 `kernel.h` 因為它為通用的變數，其他模組也可能用到。不單只用於內核管理，廣泛的記憶體管理都會用到。 `next_paddr` 記錄了分配的下一個空閒地址，因此需要在每次呼叫間保留值，靜態變數非常適合這種需求。 ## 實作 ### 系統資訊 ```bash! Operating System: Ubuntu 22.04.4 LTS Kernel: Linux 6.8.0-47-generic Architecture: x86-64 ``` ### 介紹在極簡 RISC-V OS，往「單行程、多執行緒（POSIX threads）」擴充：在不破壞原本多工/分頁/系統呼叫/簡易檔案與 shell 的骨架上，加上 TCB。 ### 實作說明 1. 建立 TCB（[thread.h](https://github.com/nelson0720j/posix_thread_in_1k_lines/commit/22497e4897d0d74114d925f52515051546b3d2e6#diff-76e2e90299ffb45b1f53bfb55d71710faeae10eb96be201efebe5e622e47b59bR17)） ```clike struct thread { int tid; enum thread_state state; uint32_t kernel_stack_sp; struct process *parent; uint8_t kernel_stack[8192]; int priority; }; ``` 2. 建立 [thread](https://github.com/nelson0720j/posix_thread_in_1k_lines/commit/22497e4897d0d74114d925f52515051546b3d2e6#diff-161b2a279f4c67a1ab075a7890ecf6f3f1d483d910910fa52b3715e25cfdcbd7R83) * 從 threads[] 找空位 * 準備 kernel stack 初值 * 放到就緒隊列 ```clike int thread_create(struct process *proc, void (*fn)(void *), void *arg){ // 尋找一個空的threads[i] int i; for(i=0; i<MAX_THREADS; i++){ if(threads[i].state == THREAD_UNUSED){ break; } } if(i == MAX_THREADS){ return -1; // no slot } struct thread *t = &threads[i]; t->tid = alloc_tid++; t->state = THREAD_RUNNABLE; t->parent = proc; // 初始化 kernel stack uint32_t *sp = (uint32_t*)(t->kernel_stack + sizeof(t->kernel_stack)); *--sp = (uint32_t)&thread_trampoline; // 預留空間給 callee-saved registers (s0~s11) for(int i=0; i<12; i++){ *--sp = 0; } *--sp = (uint32_t)arg; *--sp = (uint32_t)fn; t->kernel_stack_sp = (uint32_t)sp; // 放到就緒隊列 enqueue_thread(t); return t->tid; } ``` :::spoiler |--------------------------| <- stack top (高位址) | fn | | arg | | s0 | | s1 | | ... | | s11 | | return address (ra=thread_trampoline) |--------------------------| <- stack bottom (低位址) ::: 3. 第一次執行入口（[thread_trampoline](https://github.com/nelson0720j/posix_thread_in_1k_lines/commit/22497e4897d0d74114d925f52515051546b3d2e6#diff-161b2a279f4c67a1ab075a7890ecf6f3f1d483d910910fa52b3715e25cfdcbd7R68)，在 S-mode） * 從當前 kernel stack 取出 (fn, arg) * 直接 jalr 呼叫 fn(arg) * 回來就 thread_exit() ```clike __attribute__((naked)) static void thread_trampoline(void){ __asm__ volatile ( // 從當前 kernel stack 取出 fn 與 arg "lw a0, 0(sp)\n" // a0=arg "lw a1, 4(sp)\n" // a1=fn "jalr a1\n" // 跳到真正要執行的函式 fn 的位址 // 執行完 fn() 就 thread_exit "call thread_exit\n" "1: j 1b\n" ); } ``` 4. [排程](https://github.com/nelson0720j/posix_thread_in_1k_lines/commit/22497e4897d0d74114d925f52515051546b3d2e6#diff-161b2a279f4c67a1ab075a7890ecf6f3f1d483d910910fa52b3715e25cfdcbd7R140)與情境切換 * 簡單 RR：目前 RUNNABLE 的先丟回佇列，再取下一個 * 若跨 process，先切 satp（page table），sfence.vma (清除TLB) * 設 sscratch 為要切入 thread 的 kernel stack top * 呼叫 switch_context(&prev_sp, &next_sp)，只交換 kernel SP，恢復 * s0~s11、ra，跳到對方的 kernel stack 上繼續執行 5. user side 介面與測試 * user space 中包裝好 sys call [thread_create(fn,arg) / thread_exit()](https://github.com/nelson0720j/posix_thread_in_1k_lines/commit/22497e4897d0d74114d925f52515051546b3d2e6#diff-d44815bd5187b5fda25f879bf3d1b536966c0f5bc596ec776eed9b49f1035b7eR40) * shell 新增 thread 指令 ### user stack ``` kernel (S-mode) | | trap_return (從 trapframe 還原 user 狀態，sret) 等同 kernel_entry 尾端 v sret → U-mode | | thread_trampoline_u v user fn(arg) ``` 建立 user stack 1. TCB 新增以下: `struct trapframe *tf` // trapframe 位址 `*user_stack_top` 建 thread 時，先把它放到自己的 kernel stack 頂端 2. thread create 新增: 分配 user stack 在該 thread 的 kernel stack 上預留一塊位址，讓 trapframe 在 kernel stack 上面設定 sepc 為 thread_trampoline_u sstatus 設成 SPP=U 3. 排程切換若換到不同 process：切 satp + sfence.vma 更新 `sscratch = next->kstack_top` 更新 kernel stack 位址 `switch_context(&prev->kernel_stack_sp, &next->kernel_stack_sp)` 4. trap 入口/出口建立 thread_trampoline_u (user-side thread 的入口點（U-mode）) 流程: 1. thread_create 2. 第一次 context switch 3. 在 thread_trampoline： kernel thread：直接 call fn(arg)；結束呼叫 thread_exit()。 user thread：載入 trapframe，sret → U-mode 進入 thread_trampoline_u。 4. thread_trampoline_u：取出 fn/arg → 執行 → syscall(thread_exit) 回 kernel 清理。