# Multi-threading on top of 'Operating System in 1,000 Lines' > 王景霈, 傅孟楷 [GitHub](https://github.com/k12795g/posix_thread_in_1k_lines) ## Descriptions Your task is to read [Operating System in 1,000 Lines](https://operating-system-in-1000-lines.vercel.app/en) and experiment on QEMU, gradually building a small RISC-V operating system. The system then expands the code to have some POSIX Thread interfaces, such as mutex, conditionvariable, semaphore, barrier, etc., and takes priority inversion into consideration. reference: https://hackmd.io/@sysprog/concurrency-thread-package ## System Information ```bash! Operating System: Ubuntu 22.04.4 LTS Kernel: Linux 6.8.0-47-generic Architecture: x86-64 ``` # Chapter 2 ## Basic Information [isa-zicsr](https://drive.google.com/file/d/1uviu1nH-tScFfgrovvFCrj7Omv8tFtkp/view) [ISA-privilged](https://drive.google.com/file/d/17GeetSnT5wW3xNuAHI95-SI1gPGd5sJ_/view) ## ISA ### [CSR instructions]([isa-zicsr](https://drive.google.com/file/d/17GeetSnT5wW3xNuAHI95-SI1gPGd5sJ_/view)) p.65 ![{F2D02CFA-268A-48FC-BA90-61767751FEDA}](https://hackmd.io/_uploads/S1ze7MY8kg.png) #### Conditions determining whether a CSR instruction reads or writes the specified CSR. * Register operand | **Instruction** | **rd is x0** | **rs1 is x0** | **Reads CSR** | **Writes CSR** | |--------------------|--------------|---------------|---------------|----------------| | CSRRW | Yes | - | No | Yes | | CSRRW | No | - | Yes | Yes | | CSRRS/CSRRC | - | Yes | Yes | No | | CSRRS/CSRRC | - | No | Yes | Yes | * Immediate operand | **Instruction** | **rd is x0** | **uimm = 0** | **Reads CSR** | **Writes CSR** | |------------------------|--------------|--------------|---------------|----------------| | CSRRWI | Yes | - | No | Yes | | CSRRWI | No | - | Yes | Yes | | CSRRSI/CSRRCI | - | Yes | Yes | No | | CSRRSI/CSRRCI | - | No | Yes | Yes | # Chapter 4 ## Operating System Hierarchy ```graphviz digraph RISC_V_OS { rankdir=TB; node [shape=box, style=filled, fillcolor=lightblue]; UserSpace [label="User Space\nHigh-level APIs\n(e.g., write(), sleep())"]; OSKernel [label="OS Kernel\nSystem Calls\n(e.g., sys_write(), sys_sleep())"]; SBI [label="SBI (Supervisor Binary Interface)\nHardware Abstraction Layer\n(e.g., UART, Timer, Shutdown)"]; Hardware [label="Hardware\nUART, Timer, CPU"]; subgraph cluster_UserSpace { label = "User Space"; style = rounded; node [shape=box]; write [label="write(fd, buf, count)"]; sleep [label="sleep(ticks)"]; } subgraph cluster_OSKernel { label = "OS Kernel"; style = rounded; node [shape=box]; sys_write [label="sys_write(fd, buf, count)"]; sys_sleep [label="sys_sleep(ticks)"]; SBIWrapper [label="SBI Wrappers"]; } subgraph cluster_SBI { label = "SBI"; style = rounded; node [shape=box]; sbi_console_putchar [label="sbi_console_putchar"]; sbi_set_timer [label="sbi_set_timer"]; sbi_shutdown [label="sbi_shutdown"]; } subgraph cluster_Hardware { label = "Hardware"; style = rounded; node [shape=box]; UART [label="UART (Serial Port)"]; Timer [label="Timer"]; CPU [label="CPU"]; } // Connections write -> sys_write; sleep -> sys_sleep; sys_write -> sbi_console_putchar; sys_sleep -> sbi_set_timer; sbi_console_putchar -> UART; sbi_set_timer -> Timer; sbi_shutdown -> CPU; } ``` # Chapter 6 ## `define NULL ((void *)0)` `void *` is a special type of generic pointer in the C language, which represents a "pointer without a specific type". On some platforms or compilers, directly using 0 as a pointer might result in type mismatch warnings or errors. Casting 0 explicitly to void * indicates that "this is a null pointer, not just the numeric value 0," ensuring semantic clarity and cross-platform consistency. # Chapter 7 ## Macro Syntax `__VA_ARGS__` is a standard feature in C used for handling variable argument macros, allowing macros to accept a variable number of arguments. removes extra commas when the variable argument list is empty, preventing compilation errors. ## do {} while(0) ### Reasons for Using `do { ... } while(0)` in Macros 1. **Single Statement Behavior** Ensures multi-line macros behave as a single statement, avoiding issues in `if-else` structures. 2. **Scope Isolation** Encapsulates variables inside the macro to prevent conflicts with external variables. 3. **Consistency** Treats the macro like a regular statement, requiring a semicolon (`;`) after its usage. 4. **Safety** Prevents unexpected behavior when macros are expanded in different contexts. ::: # Chapter 8 ## Exception Handling Simplified RISC-V exception handling process: 1. Save the current instruction address to `sepc` (CSR for storing the faulting instruction address). Record the exception cause in `scause`. Jump to the exception handler based on `stvec`. 2. The stack pointer (SP) is set to kernel space (via `sscratch` to store the old user SP). Necessary register values may be automatically saved to the stack. 3. Calls `handle_trap` using `a0` (stack pointer) to analyze `scause`. Performs specific exception handling and may modify saved register values. 4. Restores saved register contents. Uses `sret` to return to the original execution. :::info * `sscratch` register usually used to store the user mode's SP while exception occurse. After handling the trap, when we want to return to user mode, we must restore the original user stack pointer using `sscratch`. * The address of the structure that stores the exception state is saved in `a0` and passed to `exception_handler` . * `WRITE_CSR` sets `stvec` to the address to enter the exception_handler entry(`kernel_entry`). `kernel_entry` responsibilities: 1. store current processor's states (registers, stack ...). 2. define the reasons of exceptions. 3. execute the corresponding processing logic according to the exception cases. 4. restore the processor, and continue to execute the original program. ## Simulating trap ```clike void kernel_main(void) { memset(__bss, 0, (size_t)__bss_end - (size_t)__bss); WRITE_CSR(stvec, (uint32_t)kernel_entry); __asm__ __volatile__("unimp"); // pseudo instruction to trigger an illegal instruction trap for (;;) { __asm__ __volatile__("wfi"); } } ``` ```clike void handle_trap(struct trap_frame *f) { uint32_t scause = READ_CSR(scause); uint32_t stval = READ_CSR(stval); uint32_t user_pc = READ_CSR(sepc); PANIC("unexpected trap scause=%x, stval=%x, sepc=%x\n", scause, stval, user_pc); } ``` :::info * `scause` : Stores the type of exception. * `stval`(Supervisor Trap Value Register): Stores the value associated with the exception. * `spec`: Stores the address of the instruction where the exception occurred. * `#reg` turn `reg` into string ``` __asm__ __volatile__("csrw " #reg ", %0" ::"r"(__tmp)) ``` ::: # Chapter 9 ## Linker Script A specialized scripting language used to control how the linker arranges memory usage and program layout when combining multiple object files into an executable. It allows defining symbols like `__stack_top`, making them accessible in C for memory allocation control. By defining this in the linker script instead of hardcoding addresses, the linker can determine the position to avoid overlapping with the kernel's static data. ## memory allocation `PAGE_SIZE` is defined in `common.h` instead of `kernel.h` because it is a general variable used in multiple modules, not just for kernel management but also for broader memory management. `next_paddr` tracks the next free address for allocation, requiring persistence across function calls, making it suitable for a static variable. # Chapter 10 ## Process * RISC-V, s0 to s11 are callee-saved registers. Other registers like a0 are caller-saved registers, and already saved on the stack by the caller. * `yield()` will search the executable process, if there is no executable process, it will excute idle_proc. * While using separate kernel stacks for each process, `sscratch` will store their own SP of current process. * `csrrw sp, sscratch, sp` swap operation. Swaps the user mode sp and kernel mode sp, so system can change from user mode to supervisor mode. :::info Each process has its own independent kernel stack! ::: ## about `a0`, `a1` and `naked` When a function is marked as naked, it skips the standard prologue and epilogue generated by the C language function, relying entirely on inline assembly to handle the function's operations. As a result, the first and second parameters of the function directly correspond to the a0 and a1 registers in the inline assembly. ```asm __attribute__((naked)) void switch_context(uint32_t *prev_sp, uint32_t *next_sp) { __asm__ __volatile__( // omitted // Switch the stack pointer. "sw sp, (a0)\n" // *prev_sp = sp; "lw sp, (a1)\n" // Switch stack pointer (sp) here // omitted ); } ``` :::info ## Explanation of Array Bounds In C, we can only take the address of the element **just past the end** of an array (e.g., `&array[size]`), but we cannot read from or write to it. Accessing any index outside the array bounds, except the one just past the last element for address computation, is considered undefined behavior. ::: :::info ## Pointer Behavior in C The effects of `++` and `--` operators in C are determined by the type of the corresponding variable. For example: ```c uint32_t *sp = (uint32_t *) &proc->stack[sizeof(proc->stack)]; *--sp = 0; ``` ::: # Chapter 11 ## Page table * `map_page()` does : 1. Maps the `vaddr` to `paddr` and creates corresponding page tables. 2. Creates a new page table, if L2 table does not exist. 3. Sets permissions * `sfence.vma` used to manage TLB. Refresh correctly to ensure that the outdated TLB entry will not be used. ### Examine page table content ``` SECTIONS { . = 0x80200000; __kernel_base = .; ``` Examine if kernel virtual address (0x80200000) is correctly mapped to physical address 1. ``` # satp | MODE (1 bit) | ASID (9 bits) | PPN (22 bits) | ``` satp = 80080243, satp stores L1 physical address We can predict as follow: PPN = (0x80080243 & 0x3FFFFF) = 0x80243 L1 base physical address = 0x80243 * 4096 = 0x80243000 2. VPN1 = (0x80200000 >> 22) & 0x3FF = 512 VPN0 = (0x80200000 >> 12) & 0x3FF = 512 3. ``` (qemu) xp /x 0x80243000+512*4 0000000080243800: 0x20091001 ``` 4. PPN = 0x20091001 >> 10 = 0x20091 L2 Physical Address = PPN * 4096 = 0x20091 * 4096 = 0x80244000 5. ``` (qemu) xp /x 0x80244000+512*4 0000000080244800: 0x200800cf ``` L2 PTE = 0x200800cf 6. calculate physical address PTE point to. ``` #PTE | PPN (22 bits) | Flags (10 bits) | ``` 0x20080 = PPN Physical Address = PPN * 4096 = 0x20080 * 4096 = 0x80200000 Indicates that 0x80200000 (virtual address) is mapped to 0x80200000 (physical address). Also, we can check by: ``` (qemu) info mem vaddr paddr size attr -------- ---------------- -------- ------- 80200000 0000000080200000 00001000 rwx--ad ``` ## Per-Process Page Tables with Shared Kernel Mapping In the current implementation, each process has its own page table, and all page tables include mappings to the same kernel space. # Chapter 13 ## How to get in user mode After running `switch_context()` ! If we encounter interrupt or execption, we will want to do context switch to save the information. Since when creating process, we will set `ra` to `user_entry`. ``` struct process *create_process(const void *image, size_t image_size) { //omitted *--sp = 0; // s3 *--sp = 0; // s2 *--sp = 0; // s1 *--sp = 0; // s0 *--sp = (uint32_t) user_entry; // ra (changed!) ``` When doing `switch_context` , we will read the next_process's `ra` , and it will lead us to `user_entry` and go to `USER_BASE` by `sret` which be set to `sepc` . Thus, we will change to user mode. ### HOW to activate shell After several steps above, we will go to `user.c` `start()`, and it will call `main` which define in `shell.c` (We will compile common.c, user.c, shell.c into shell.elf, so we can call `main`) # Chapter 14 ## Shell Exit Flow Triggered by exit() ### **User-Level Call to `exit()`** The user program (`shell.c`) calls the `exit()` function to terminate the process. `exit()` in user space internally triggers a system call: ```c syscall(SYS_EXIT, 0, 0, 0); ``` - `SYS_EXIT` (with value `3`) is passed as the system call number in the `a3` register. - The `syscall` function executes the RISC-V `ecall` instruction to request a transition to kernel mode. --- ### **Trap to Kernel Mode** 1. The `ecall` instruction generates a synchronous exception. 2. The processor saves the current user program state: - `sepc`: Address of the `ecall` instruction. - `scause`: Exception cause (set to `8`, indicating a user-mode `ecall`). 3. The processor jumps to the address specified in the `stvec` register, which is the kernel's trap entry point: ```c stvec = kernel_entry; ``` --- ### **Kernel Trap Handler (`kernel_entry`)** In `kernel_entry`, the kernel saves the trap frame (registers and state) and calls the main trap handler function: ```c handle_trap(struct trap_frame *f); ``` --- ### **Handling the Exception (`handle_trap`)** The `handle_trap` function reads the `scause` register: - `scause == 8`: Indicates a user-mode `ecall`. The kernel calls the system call handler: ```c handle_syscall(f); ``` --- ### **System Call Handler (`handle_syscall`)** 1. In `handle_syscall`, the kernel checks the system call number passed in the `a3` register of the trap frame: - `f->a3 == SYS_EXIT` (value `3`). 2. The kernel logs the process termination: ```c printf("process %d exited\n", current_proc->pid); ``` 3. The kernel updates the current process's state: ```c current_proc->state = PROC_EXITED; ``` 4. The kernel calls `yield()` to trigger process scheduling. --- ### **Process Scheduler (`yield()`)** 1. The scheduler selects the next process to run. 2. If no other processes are runnable, it falls back to the idle process. 3. The terminated process (`current_proc`) is not rescheduled. # Chapter 16 ## Issue with `readfile` When we attempted the final `readfile` in Chapter 16, following the textbook's code exactly, we encountered an issue: the file system initialized successfully and listed two files, but using the `readfile` command to read the hardcoded `hello.txt` in `shell.c` failed to find the file. Eventually, we resolved the issue by changing the hardcoded `"hello.txt"` to `"./hello.txt"` in `shell.c`. The reason is that the command: ```bash (cd disk && tar cf ../disk.tar --format=ustar ./*.txt) ``` matches file names with relative paths. As a result, the file names in the file system include the `./` prefix, which must match the file names hardcoded in `shell.c` to locate the file successfully. For example: ```bash shiky@SY:~/ncku_master/CO/disk$ ls ./*.txt ./hello.txt ./meow.txt shiky@SY:~/ncku_master/CO/disk$ ls *.txt hello.txt meow.txt ``` # Implement Pthread in 1K OS Steps: 1. Implement kernel thread in kernel. 2. Provide pthread interface in user space, and interact with kernel threa through `syscall` . 3. Solve priority inversion, and implement `mutex`, `priority inheritance` in kernel thread. ## Future works futex ## tools check exception happend line `llvm-addr2line -e kernel.elf {sepc}` # Challenges in Modifying the System to Support User-Mode Threads When transitioning the existing minimal system to support threads, we encountered several challenges. The original system was not designed to implement threads. While using kernel-mode threads, the functionality worked as expected, including context switching. However, when attempting to create user-mode threads, we faced two critical issues. ### Issue 1: Context Switching from Kernel Mode to User Mode When using a syscall to create threads and directly assigning the user-mode function as the entry point, context switching from the original kernel mode to the new thread was not possible. This is because context switching relies on `ret` to return to the address stored in the `ra` register. However, in the current kernel mode, there is no method to `ret` to a different stack space. To address this, an additional function is required to rewrite `sepc` and `sstatus`, and finally, invoke `sret` to return to the specified address in user mode. ### Issue 2: User Stack Management in User Mode Even if the above issue is resolved, switching to user mode leads to execution errors for almost every instruction. Further investigation revealed that this is also a mode management issue. Each user-mode thread must have a dedicated user stack to utilize in user mode. It is not permissible to directly access addresses in the kernel stack while in user mode. ``` PANIC: kernel.c:772: unexpected trap scause=0000000f, stval=80225d58, sepc=01000026 ``` This issue remains unresolved at the moment. ### Issue 3: Can not read the text field In [this github example.](https://github.com/nelson0720j/posix_thread_in_1k_lines/tree/pthread) After setting the sp to user entry, the sp been set to the text field address in user mode, but it can not read the address to execute the shell function. Eventually invoke panic.