Multi-threading on top of 'Operating System in 1,000 Lines'

王景霈, 傅孟楷

Descriptions

Your task is to read Operating System in 1,000 Lines and experiment on QEMU, gradually building a small RISC-V operating system. The system then expands the code to have some POSIX Thread interfaces, such as mutex, conditionvariable, semaphore, barrier, etc., and takes priority inversion into consideration.

reference: https://hackmd.io/@sysprog/concurrency-thread-package

System Information

Operating System: Ubuntu 22.04.4 LTS              
          Kernel: Linux 6.8.0-47-generic 
    Architecture: x86-64

Chapter 2

Basic Information

isa-zicsr
ISA-privilged

ISA

CSR instructions p.65

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

Conditions determining whether a CSR instruction reads or writes the specified CSR.

Instruction	rd is x0	rs1 is x0	Reads CSR	Writes CSR
CSRRW	Yes	-	No	Yes
CSRRW	No	-	Yes	Yes
CSRRS/CSRRC	-	Yes	Yes	No
CSRRS/CSRRC	-	No	Yes	Yes

Immediate operand

Instruction	rd is x0	uimm = 0	Reads CSR	Writes CSR
CSRRWI	Yes	-	No	Yes
CSRRWI	No	-	Yes	Yes
CSRRSI/CSRRCI	-	Yes	Yes	No
CSRRSI/CSRRCI	-	No	Yes	Yes

Chapter 4

Operating System Hierarchy

Chapter 6

`define NULL ((void *)0)`

void * is a special type of generic pointer in the C language, which represents a "pointer without a specific type".
On some platforms or compilers, directly using 0 as a pointer might result in type mismatch warnings or errors.
Casting 0 explicitly to void * indicates that "this is a null pointer, not just the numeric value 0," ensuring semantic clarity and cross-platform consistency.

Chapter 7

Macro Syntax

__VA_ARGS__ is a standard feature in C used for handling variable argument macros, allowing macros to accept a variable number of arguments.
removes extra commas when the variable argument list is empty, preventing compilation errors.

do {} while(0)

Reasons for Using `do { ... } while(0)` in Macros

Single Statement Behavior
Ensures multi-line macros behave as a single statement, avoiding issues in if-else structures.
Scope Isolation
Encapsulates variables inside the macro to prevent conflicts with external variables.
Consistency
Treats the macro like a regular statement, requiring a semicolon (;) after its usage.
Safety
Prevents unexpected behavior when macros are expanded in different contexts.

:::

Chapter 8

Exception Handling

Simplified RISC-V exception handling process:

Save the current instruction address to sepc (CSR for storing the faulting instruction address).
Record the exception cause in scause.
Jump to the exception handler based on stvec.
The stack pointer (SP) is set to kernel space (via sscratch to store the old user SP).
Necessary register values may be automatically saved to the stack.
Calls handle_trap using a0 (stack pointer) to analyze scause.
Performs specific exception handling and may modify saved register values.
Restores saved register contents.
Uses sret to return to the original execution.

sscratch register usually used to store the user mode's SP while exception occurse. After handling the trap, when we want to return to user mode, we must restore the original user stack pointer using sscratch.
The address of the structure that stores the exception state is saved in a0 and passed to exception_handler .
WRITE_CSR sets stvec to the address to enter the exception_handler entry(kernel_entry).
kernel_entry responsibilities：
1. store current processor's states (registers, stack …).
2. define the reasons of exceptions.
3. execute the corresponding processing logic according to the exception cases.
4. restore the processor, and continue to execute the original program.

Simulating trap

void kernel_main(void)
{
    memset(__bss, 0, (size_t)__bss_end - (size_t)__bss);

    WRITE_CSR(stvec, (uint32_t)kernel_entry);
    __asm__ __volatile__("unimp"); // pseudo instruction to trigger an illegal instruction trap

    for (;;)
    {
        __asm__ __volatile__("wfi");
    }
}

void handle_trap(struct trap_frame *f)
{
    uint32_t scause = READ_CSR(scause);
    uint32_t stval = READ_CSR(stval);
    uint32_t user_pc = READ_CSR(sepc);

    PANIC("unexpected trap scause=%x, stval=%x, sepc=%x\n", scause, stval, user_pc);
}

scause : Stores the type of exception.
stval（Supervisor Trap Value Register）: Stores the value associated with the exception.
spec: Stores the address of the instruction where the exception occurred.
#reg turn reg into string

 __asm__ __volatile__("csrw " #reg ", %0" ::"r"(__tmp))

Chapter 9

Linker Script

A specialized scripting language used to control how the linker arranges memory usage and program layout when combining multiple object files into an executable.

It allows defining symbols like __stack_top, making them accessible in C for memory allocation control.

By defining this in the linker script instead of hardcoding addresses, the linker can determine the position to avoid overlapping with the kernel's static data.

memory allocation

PAGE_SIZE is defined in common.h instead of kernel.h because it is a general variable used in multiple modules, not just for kernel management but also for broader memory management.

next_paddr tracks the next free address for allocation, requiring persistence across function calls, making it suitable for a static variable.

Chapter 10

Process

RISC-V, s0 to s11 are callee-saved registers. Other registers like a0 are caller-saved registers, and already saved on the stack by the caller.
yield() will search the executable process, if there is no executable process, it will excute idle_proc.
While using separate kernel stacks for each process, sscratch will store their own SP of current process.
csrrw sp, sscratch, sp swap operation. Swaps the user mode sp and kernel mode sp, so system can change from user mode to supervisor mode.

Each process has its own independent kernel stack!

about `a0`, `a1` and `naked`

When a function is marked as naked, it skips the standard prologue and epilogue generated by the C language function, relying entirely on inline assembly to handle the function's operations. As a result, the first and second parameters of the function directly correspond to the a0 and a1 registers in the inline assembly.

__attribute__((naked)) void switch_context(uint32_t *prev_sp,
                                           uint32_t *next_sp) {
    __asm__ __volatile__(
        // omitted

        // Switch the stack pointer.
        "sw sp, (a0)\n"         // *prev_sp = sp;
        "lw sp, (a1)\n"         // Switch stack pointer (sp) here
        
        // omitted
    );
}

Explanation of Array Bounds

In C, we can only take the address of the element just past the end of an array (e.g., &array[size]), but we cannot read from or write to it. Accessing any index outside the array bounds, except the one just past the last element for address computation, is considered undefined behavior.

Pointer Behavior in C

The effects of ++ and -- operators in C are determined by the type of the corresponding variable. For example:

uint32_t *sp = (uint32_t *) &proc->stack[sizeof(proc->stack)];
*--sp = 0;

Chapter 11

Page table

map_page() does :

Maps the vaddr to paddr and creates corresponding page tables.
Creates a new page table, if L2 table does not exist.
Sets permissions

sfence.vma used to manage TLB.
Refresh correctly to ensure that the outdated TLB entry will not be used.

Examine page table content

SECTIONS {
. = 0x80200000;
__kernel_base = .;

Examine if kernel virtual address (0x80200000) is correctly mapped to physical address

# satp
| MODE (1 bit) | ASID (9 bits) | PPN (22 bits) |

satp = 80080243, satp stores L1 physical address
We can predict as follow:
PPN = (0x80080243 & 0x3FFFFF) = 0x80243
L1 base physical address = 0x80243 * 4096 = 0x80243000
2. VPN1 = (0x80200000 >> 22) & 0x3FF = 512
VPN0 = (0x80200000 >> 12) & 0x3FF = 512
3. ```
(qemu) xp /x 0x80243000+512*4
0000000080243800: 0x20091001

4.  PPN = 0x20091001 >> 10 = 0x20091
L2 Physical Address = PPN * 4096
= 0x20091 * 4096
= 0x80244000

5. ```
(qemu) xp /x 0x80244000+512*4
0000000080244800: 0x200800cf

L2 PTE = 0x200800cf
6. calculate physical address PTE point to.

#PTE
| PPN (22 bits) | Flags (10 bits) |

0x20080 = PPN
Physical Address = PPN * 4096
= 0x20080 * 4096
= 0x80200000
Indicates that 0x80200000 (virtual address) is mapped to 0x80200000 (physical address).

Also, we can check by:

(qemu) info mem
vaddr    paddr            size     attr
-------- ---------------- -------- -------
80200000 0000000080200000 00001000 rwx--ad

Per-Process Page Tables with Shared Kernel Mapping

In the current implementation, each process has its own page table, and all page tables include mappings to the same kernel space.

Chapter 13

How to get in user mode

After running switch_context() !

If we encounter interrupt or execption, we will want to do context switch to save the information.
Since when creating process, we will set ra to user_entry.

struct process *create_process(const void *image, size_t image_size) {
//omitted
    *--sp = 0;                      // s3
    *--sp = 0;                      // s2
    *--sp = 0;                      // s1
    *--sp = 0;                      // s0
    *--sp = (uint32_t) user_entry;  // ra (changed!)

When doing switch_context , we will read the next_process's ra , and it will lead us to user_entry and go to USER_BASE by sret which be set to sepc . Thus, we will change to user mode.

HOW to activate shell

After several steps above, we will go to user.c start(), and it will call main which define in shell.c (We will compile common.c, user.c, shell.c into shell.elf, so we can call main)

Chapter 14

Shell Exit Flow Triggered by exit()

User-Level Call to `exit()`

The user program (shell.c) calls the exit() function to terminate the process.
exit() in user space internally triggers a system call:

syscall(SYS_EXIT, 0, 0, 0);

SYS_EXIT (with value 3) is passed as the system call number in the a3 register.
The syscall function executes the RISC-V ecall instruction to request a transition to kernel mode.

Trap to Kernel Mode

The ecall instruction generates a synchronous exception.
The processor saves the current user program state:
- sepc: Address of the ecall instruction.
- scause: Exception cause (set to 8, indicating a user-mode ecall).
The processor jumps to the address specified in the stvec register, which is the kernel's trap entry point:

stvec = kernel_entry;

Kernel Trap Handler (`kernel_entry`)

In kernel_entry, the kernel saves the trap frame (registers and state) and calls the main trap handler function:

handle_trap(struct trap_frame *f);

Handling the Exception (`handle_trap`)

The handle_trap function reads the scause register:

scause == 8: Indicates a user-mode ecall.
The kernel calls the system call handler:

handle_syscall(f);

System Call Handler (`handle_syscall`)

In handle_syscall, the kernel checks the system call number passed in the a3 register of the trap frame:
- f->a3 == SYS_EXIT (value 3).
The kernel logs the process termination:

printf("process %d exited\n", current_proc->pid);

The kernel updates the current process's state:

current_proc->state = PROC_EXITED;

The kernel calls yield() to trigger process scheduling.

Process Scheduler (`yield()`)

The scheduler selects the next process to run.
If no other processes are runnable, it falls back to the idle process.
The terminated process (current_proc) is not rescheduled.

Chapter 16

Issue with `readfile`

When we attempted the final readfile in Chapter 16, following the textbook's code exactly, we encountered an issue: the file system initialized successfully and listed two files, but using the readfile command to read the hardcoded hello.txt in shell.c failed to find the file.

Eventually, we resolved the issue by changing the hardcoded "hello.txt" to "./hello.txt" in shell.c. The reason is that the command:

(cd disk && tar cf ../disk.tar --format=ustar ./*.txt)

matches file names with relative paths. As a result, the file names in the file system include the ./ prefix, which must match the file names hardcoded in shell.c to locate the file successfully.

For example:

shiky@SY:~/ncku_master/CO/disk$ ls ./*.txt
./hello.txt  ./meow.txt
shiky@SY:~/ncku_master/CO/disk$ ls *.txt
hello.txt  meow.txt

Implement Pthread in 1K OS

Steps:

Implement kernel thread in kernel.
Provide pthread interface in user space, and interact with kernel threa through syscall .
Solve priority inversion, and implement mutex, priority inheritance in kernel thread.

Future works

futex

tools

check exception happend line
llvm-addr2line -e kernel.elf {sepc}

Challenges in Modifying the System to Support User-Mode Threads

When transitioning the existing minimal system to support threads, we encountered several challenges. The original system was not designed to implement threads. While using kernel-mode threads, the functionality worked as expected, including context switching. However, when attempting to create user-mode threads, we faced two critical issues.

Issue 1: Context Switching from Kernel Mode to User Mode

When using a syscall to create threads and directly assigning the user-mode function as the entry point, context switching from the original kernel mode to the new thread was not possible. This is because context switching relies on ret to return to the address stored in the ra register. However, in the current kernel mode, there is no method to ret to a different stack space.

To address this, an additional function is required to rewrite sepc and sstatus, and finally, invoke sret to return to the specified address in user mode.

Issue 2: User Stack Management in User Mode

Even if the above issue is resolved, switching to user mode leads to execution errors for almost every instruction. Further investigation revealed that this is also a mode management issue. Each user-mode thread must have a dedicated user stack to utilize in user mode. It is not permissible to directly access addresses in the kernel stack while in user mode.

PANIC: kernel.c:772: unexpected trap scause=0000000f, stval=80225d58, sepc=01000026

This issue remains unresolved at the moment.

Issue 3: Can not read the text field

In this github example.
After setting the sp to user entry, the sp been set to the text field address in user mode, but it can not read the address to execute the shell function. Eventually invoke panic.