
Port xv6-riscv to 32-bit RV32I core

陳昭詣, 周姵彣

GitHub

Project Goal

Port the xv6-riscv operating system to the RISC-V RV32I architecture and run it on the QEMU emulator.

  1. xv6-riscv boots successfully on the QEMU emulator.
  2. Core functionalities of the system (e.g., output, interrupt handling, memory management) work correctly on the RV32I architecture.
  3. All modifications ensure compatibility with the 32-bit architecture.

Steps

1. Prepare the Development Environment

  • Install the RISC-V toolchain (e.g., riscv32-unknown-elf-gcc) to ensure compatibility with the RV32I architecture.
  • Configure environment variables by adding the toolchain path to PATH.

2. Modify xv6-riscv Code

(1) entry.S:

  • Replace instructions in the boot code that the RV32I target cannot rely on (e.g., reading mhartid with csrr) with RV32I-compatible instructions.

(2) printf.c and trap.c:

  • Fix type mismatches caused by 64-bit integers, adapting the code for 32-bit processing.

(3) VirtIO Driver:

  • Adjust handling of 64-bit shift operations (e.g., >> 32) to fit the 32-bit architecture.
  • Modify data structures to be compatible with the 32-bit architecture.

3. Adjust Hardware Support

(1) ACLINT/CLINT:

  • Update kernel access to CLINT to ensure proper timer and interrupt control in the 32-bit architecture.

(2) Page Table Entries:

  • Ensure memory management and page table configurations are compatible with the 32-bit kernel.

(3) Memory Management:

  • A 32-bit architecture supports only 4GB of virtual address space, requiring a redesign of memory allocation and management strategies.
  • The memory layout in xv6, originally designed for a 64-bit architecture, may need to be adapted for a 32-bit address space.

(4) File System:

  • The sizes of file system structures (e.g., inode, superblock) may not align with the memory alignment requirements of a 32-bit architecture. Therefore, it is necessary to verify whether the data types used in the file system structures are compatible with a 32-bit system.

(5) User Mode:

  • Verify and enable proper functionality of user mode in the 32-bit architecture.

(6) Trap:

  • In a 32-bit architecture, the Trap Vector Table (TVT) and the handling of hardware traps (e.g., w_stvec) must be adapted accordingly.

(7) Interrupt & Device Driver:

  • The priority, trigger mechanism, and handling logic of interrupts need to be checked for compatibility with a 32-bit architecture.
  • Virtual devices like VirtIO, which typically use 64-bit data structures, need to be adjusted to 32-bit.

4. Compile xv6-riscv

(1) Use the RISC-V toolchain (e.g., riscv32-unknown-elf-gcc) to compile the modified xv6 code into an executable kernel for the RV32I architecture.
(2) Resolve compilation errors and ensure the generated files are fully compatible with the 32-bit architecture.

5. Test on QEMU Emulator

(1) Run the compiled xv6 kernel on the QEMU emulator:

  • Example command:
    qemu-system-riscv32 -machine virt -kernel kernel/kernel.elf

(2) Test the kernel's core functionalities, including:

  • Console output (verify that functions like puts and putchar work correctly).
  • Interrupt handling.
  • Memory management.

(3) Validate that the system boots successfully and identify any issues that may require further adjustments.

Prerequisite

https://github.com/mit-pdos/xv6-riscv
https://github.com/nananapo/xv6-rv32
https://github.com/jserv/xv6-riscv
https://github.com/harihitode/ladybird-xv6/commits/ladybird/

Modify Content

Modify xv6-riscv Code

(1) entry.S:

  • Original Code
csrr a1, mhartid

This code is used to retrieve the hartid from the mhartid register.

  • Modified Code
li a1, 0

The hartid is directly set to 0, assuming a single-core processor environment.

Reason for Modification

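  1. mhartid Is Not 64-bit Specific
    mhartid is a machine-mode CSR that the RISC-V privileged specification defines for RV32 as well as RV64, so csrr a1, mhartid is not rejected merely because the core is 32-bit. It does raise an illegal-instruction exception if the hart is not in machine mode when entry.S runs; hard-coding the value removes that dependency.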
  2. Single-Core Processor Assumption
    Setting the hartid to 0 is based on the assumption of a single-core environment, where the default hardware thread ID is 0.
    This assumption is suitable for single-core simulation environments (e.g., when running on QEMU) to ensure the system operates correctly.

(2) printf.c :

  • Replace all occurrences of long long and unsigned long long in the functions with int and unsigned int to adapt to a 32-bit architecture.

Reason for Modification

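long long and unsigned long long remain 64-bit types on RV32, but the registers are only 32 bits wide, so the compiler has to emulate 64-bit arithmetic with register pairs. Division and modulo, which printint() uses to extract digits, typically become calls to libgcc helper routines (such as __udivdi3 and __umoddi3) that a freestanding kernel build may not link. Using int and unsigned int keeps the formatting code within 32-bit arithmetic.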
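A minimal sketch of a 32-bit printint(), assuming output goes into a caller-supplied buffer instead of to the console as in the kernel (printint32 and the buffer interface are hypothetical names for illustration). Only int and unsigned int are used, so the digit loop involves no 64-bit division.

```c
#include <assert.h>

static const char digits[] = "0123456789abcdef";

/* Format xx in the given base into out (at least 34 bytes: up to 32 binary
 * digits, a '-' sign, and the terminating NUL). Returns the string length.
 * Same structure as xv6's printint(), but only 32-bit types are used. */
static int printint32(char *out, int xx, int base, int sign)
{
    char buf[33];
    int i = 0;
    unsigned int x;

    assert(base >= 2 && base <= 16);
    if (sign && (sign = (xx < 0)))
        x = -(unsigned int)xx;     /* unsigned negation: also correct for INT_MIN */
    else
        x = (unsigned int)xx;

    do {
        buf[i++] = digits[x % (unsigned int)base];
    } while ((x /= (unsigned int)base) != 0);

    if (sign)
        buf[i++] = '-';

    int n = 0;
    while (--i >= 0)               /* digits were produced least significant first */
        out[n++] = buf[i];
    out[n] = '\0';
    return n;
}
```

For example, printint32(buf, -42, 10, 1) stores "-42" and returns 3. Negating through unsigned int also avoids the signed overflow that -xx would cause for INT_MIN.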

(3) trap.c :

  • Replace incompatible pointer-to-integer and integer-to-pointer conversions with uintptr_t.


  • Cast the unused variable trampoline_userret to void ((void)trampoline_userret;) to suppress the unused-variable warning.


Reason for Modification

  • Pointer-to-integer conversion issue:
    (1) On RV32, pointers are 32 bits wide, whereas the original code casts them to and from 64-bit integer types written for RV64, which produces size-mismatch warnings in both directions.
    (2) uintptr_t is an unsigned integer type wide enough to hold a pointer, so converting through it is correct on both RV32 and RV64 and keeps the code cross-platform compatible.
  • Unused variable warning:
    The compiler warned that trampoline_userret is unused. Casting it to void marks it as used and silences the warning without removing the variable or changing the generated code.
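The two conversion patterns, plus the (void) cast, can be sketched as follows. handler(), fn_to_reg(), reg_to_ptr() and unused_variable_example() are hypothetical helpers, not code from the port; the sketch only assumes that uint32 is a 32-bit unsigned type, as in xv6's types.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t uint32;              /* xv6-style name, assumed for RV32 */

/* Hypothetical trap handler standing in for kernelvec or usertrap. */
static void handler(void) { }

/* Function address -> integer for a CSR write such as w_stvec(). On RV32
 * uintptr_t is 32 bits, so this avoids the pointer/integer size-mismatch
 * warning that a cast to a 64-bit type triggers there. */
static uintptr_t fn_to_reg(void (*fn)(void))
{
    return (uintptr_t)fn;
}

/* Integer holding an address -> pointer, e.g. a page address read back
 * from a register or a page-table entry. */
static char *reg_to_ptr(uintptr_t v)
{
    return (char *)v;
}

void unused_variable_example(void)
{
    uint32 trampoline_userret = 0;   /* hypothetical: computed but never used */
    (void)trampoline_userret;        /* marks it used; generates no code */
}
```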

(4) VirtIO Driver:

Modify 1: features_low and features_high

  • Before modification: A single 64-bit integer features was used to represent the VirtIO device's feature fields.
  • After modification: The features field has been split into two 32-bit variables:
  • Reason for modification:
    (1) A 64-bit value does not fit in one 32-bit register, so the feature set is kept as two 32-bit words (the VirtIO MMIO interface already exposes device features 32 bits at a time).
    (2) Only the low word is processed at the moment, so the high word is cast to void ((void)features_high;) to avoid an unused-variable warning.

Modify 2: Operating on features_low

  • Before modification: The original code might directly perform bit operations on the 64-bit features.
  • After modification: Bit operations are performed on features_low to disable unnecessary VirtIO features:
  • Reason for modification:
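    All of the feature bits that xv6 clears (VIRTIO_BLK_F_RO, VIRTIO_BLK_F_SCSI, VIRTIO_BLK_F_CONFIG_WCE, VIRTIO_BLK_F_MQ, VIRTIO_F_ANY_LAYOUT, VIRTIO_RING_F_INDIRECT_DESC, VIRTIO_RING_F_EVENT_IDX) are numbered below 32, so every operation can be performed on features_low alone.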
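A minimal sketch of the low-word negotiation, using the feature bit numbers from xv6's kernel/virtio.h. negotiate_features_low() is a hypothetical helper; in virtio_disk_init() the same masking is done inline before the result is written back to the device. The _Static_assert records the assumption that every bit being cleared lies below bit 32.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers as defined in xv6's kernel/virtio.h. */
#define VIRTIO_BLK_F_RO              5
#define VIRTIO_BLK_F_SCSI            7
#define VIRTIO_BLK_F_CONFIG_WCE     11
#define VIRTIO_BLK_F_MQ             12
#define VIRTIO_F_ANY_LAYOUT         27
#define VIRTIO_RING_F_INDIRECT_DESC 28
#define VIRTIO_RING_F_EVENT_IDX     29

/* The highest bit cleared is 29, so the low 32-bit word is enough. */
_Static_assert(VIRTIO_RING_F_EVENT_IDX < 32, "feature bit outside features_low");

/* Clear the features xv6 does not use, operating on the low word only. */
static uint32_t negotiate_features_low(uint32_t features_low)
{
    features_low &= ~(1u << VIRTIO_BLK_F_RO);
    features_low &= ~(1u << VIRTIO_BLK_F_SCSI);
    features_low &= ~(1u << VIRTIO_BLK_F_CONFIG_WCE);
    features_low &= ~(1u << VIRTIO_BLK_F_MQ);
    features_low &= ~(1u << VIRTIO_F_ANY_LAYOUT);
    features_low &= ~(1u << VIRTIO_RING_F_EVENT_IDX);
    features_low &= ~(1u << VIRTIO_RING_F_INDIRECT_DESC);
    return features_low;
}
```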

Modify 3: Handling high and low parts of physical addresses

  • Before modification: The code might directly use a 64-bit integer to set the physical address:
  • After modification: The 64-bit address is split into high and low parts and written separately:
  • Reason for modification:
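    The VirtIO MMIO interface takes each queue address as a pair of 32-bit registers (low and high). On RV32 the address is a 32-bit value, and shifting a 32-bit value right by 32 is undefined behavior in C, so the high half must either be written as 0 or computed after widening the address to 64 bits.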
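A small sketch of the split, assuming the address arrives as a uintptr_t (split_queue_addr() is a hypothetical helper). The two results would then go to a LOW/HIGH register pair such as VIRTIO_MMIO_QUEUE_DESC_LOW and VIRTIO_MMIO_QUEUE_DESC_HIGH, the names used in recent xv6-riscv's virtio.h.

```c
#include <assert.h>
#include <stdint.h>

/* Split a queue address into the values for a VirtIO MMIO LOW/HIGH register
 * pair. Widening to uint64_t first keeps the >> 32 well defined: on RV32
 * uintptr_t is 32 bits, and shifting a 32-bit value by 32 is undefined. */
static void split_queue_addr(uintptr_t addr, uint32_t *low, uint32_t *high)
{
    uint64_t wide = (uint64_t)addr;
    *low  = (uint32_t)wide;           /* bits 31:0 */
    *high = (uint32_t)(wide >> 32);   /* bits 63:32; always 0 on RV32 */
}
```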

Adjust Hardware Support

(1) Memory Management:

  • Original Code:
    (1) uint64 is not suitable for a 32-bit architecture.
    (2) Standard cross-platform types were not used.
  • Modified Code:
    (1) Use uintptr_t instead of uint64.
    (2) Improved type conversion: explicitly convert pa_start and the result of PGROUNDUP to uintptr_t, then cast back to char * to keep the pointer types consistent.
  • Reason for modification:
    (1) Adapt to a 32-bit architecture.
    (2) Eliminate type mismatch errors.
    (3) Enhance cross-architecture portability.
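The uintptr_t pattern can be sketched with a freerange()-style loop that counts pages instead of freeing them, so it can run outside the kernel. freerange_count() is a hypothetical name; PGSIZE and PGROUNDUP follow xv6's definitions, with the mask widened to uintptr_t as an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096
#define PGROUNDUP(sz) (((sz) + PGSIZE - 1) & ~(uintptr_t)(PGSIZE - 1))

/* Same loop shape as xv6's freerange(), but counting the pages instead of
 * passing them to kfree(). Addresses are handled as uintptr_t (32 bits on
 * RV32) and only turned back into char * where a pointer is needed. */
static int freerange_count(void *pa_start, void *pa_end)
{
    int n = 0;
    uintptr_t p = PGROUNDUP((uintptr_t)pa_start);
    uintptr_t end = (uintptr_t)pa_end;

    for (; p + PGSIZE <= end; p += PGSIZE) {
        char *page = (char *)p;                  /* what kfree() would receive */
        assert(((uintptr_t)page % PGSIZE) == 0); /* kfree() expects aligned pages */
        n++;
    }
    return n;
}
```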

(2) Page Table:

  • Original code (kernel/riscv.h and kernel/vm.c).

  • Modified code (kernel/riscv.h and kernel/vm.c):
    (1) Adjust the bit widths and constants in the page-table definitions from 64-bit (Sv39) to 32-bit (Sv32).
    (2) walk() adjustment: cast the page returned by kalloc() through uintptr_t so that 32-bit addresses are handled without pointer/integer size mismatches.

  • Reason for modification:
    The modifications are necessary because the original code was designed for a 64-bit architecture, whereas in a 32-bit environment, the address space, data types, and page table format are different. Without these changes, the following issues may arise:
    (1) Address calculation errors (overflow or incorrect addresses).
    (2) Page table entry access failures.
    (3) Memory allocation and management errors.
    (4) Inability to properly execute virtual memory management functions.
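A sketch of Sv32 versions of the riscv.h page-table macros, based on the Sv32 formats in the RISC-V privileged specification rather than on the port's actual code (make_leaf() is a hypothetical helper). These 32-bit macros assume physical addresses below 4 GiB, which holds for QEMU virt's RAM at 0x80000000; Sv32 itself allows 34-bit physical addresses.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pte_t;

#define PTE_V (1u << 0)
#define PTE_R (1u << 1)
#define PTE_W (1u << 2)
#define PTE_X (1u << 3)
#define PTE_U (1u << 4)

/* Sv32 PTE: PPN in bits 31:10, flags in bits 9:0 (Sv39 keeps the PPN in
 * bits 53:10 of a 64-bit PTE). */
#define PA2PTE(pa)     ((((uint32_t)(pa)) >> 12) << 10)
#define PTE2PA(pte)    ((((uint32_t)(pte)) >> 10) << 12)
#define PTE_FLAGS(pte) ((pte) & 0x3FFu)

/* satp in Sv32: MODE = bit 31 (1 = Sv32), ASID = bits 30:22, PPN = bits 21:0.
 * (Sv39 uses MODE = 8 in bits 63:60.) */
#define SATP_SV32            (1u << 31)
#define MAKE_SATP(pagetable) (SATP_SV32 | (((uint32_t)(pagetable)) >> 12))

/* Build a valid leaf PTE for a page-aligned physical address. */
static pte_t make_leaf(uint32_t pa, uint32_t perm)
{
    assert((pa & 0xFFFu) == 0);
    return PA2PTE(pa) | perm | PTE_V;
}
```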

RISC-V MODES

A RISC-V CPU has multiple modes, each with different privileges. In RISC-V, there are three modes:

Mode | Overview | Use in xv6
Machine mode | Highest privilege, most powerful | Startup, initialization, and timer interrupts
Supervisor mode | Mode in which the kernel runs; privileged instructions are allowed | All kernel code runs in this mode
User mode | Mode in which applications run | All user code runs in this mode

Xv6-riscv Startup

Figure (startup flow): entry (kernel/entry.S, sets up the stack) → start (kernel/start.c, machine mode) → main (kernel/main.c, supervisor mode).



CSRs (Control and Status Registers)

Instruction and operands | Overview
csrr rd, csr | Read from a CSR
csrw csr, rs | Write to a CSR
csrrw rd, csr, rs | Read from and write to a CSR atomically in one instruction
sret | Return from a supervisor trap handler (sets the pc to sepc and the mode to the one saved in sstatus.SPP)
sfence.vma | Flush the Translation Lookaside Buffer (TLB)

Machine-level CSRs in xv6

Name | Description
mhartid | Hardware thread ID.
mstatus | Machine status register.
mstatush | Additional machine status register, RV32 only; contains the fields found in bits 62:36 of the RV64 mstatus.
mtvec | Machine trap-handler base address.
mepc | Machine exception program counter.
mscratch | Scratch register for machine trap handlers.
mie | Machine interrupt-enable register.
medeleg | Machine exception delegation register.
medelegh | Upper 32 bits of medeleg, RV32 only.
mideleg | Machine interrupt delegation register.
pmpcfg0 | Physical memory protection configuration (entries 0 to 3 on RV32, 0 to 7 on RV64).
pmpcfg1 | Physical memory protection configuration for entries 4 to 7, RV32 only.
pmpaddr0 | Physical memory protection address register 0.
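Recent xv6-riscv versions set up PMP entry 0 in start.c with w_pmpaddr0(0x3fffffffffffffull) and w_pmpcfg0(0xf) so that supervisor mode can reach all of physical memory. On RV32 pmpaddr0 is only 32 bits wide, so that constant has to change. The sketch below (hypothetical helper names) decodes the cfg value and shows how far a TOR entry reaches; 0xffffffff is an assumed choice for the 32-bit value, not necessarily the port's.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of one 8-bit pmpNcfg byte (RISC-V privileged specification). */
#define PMP_R     (1u << 0)
#define PMP_W     (1u << 1)
#define PMP_X     (1u << 2)
#define PMP_A_TOR (1u << 3)   /* A field (bits 4:3) = 1: top-of-range matching */

/* xv6's pmpcfg0 value 0xf means: entry 0 is R|W|X with TOR matching. */
_Static_assert((PMP_R | PMP_W | PMP_X | PMP_A_TOR) == 0xf, "pmp0cfg");

/* On RV32, pmpaddrN holds bits 33:2 of a 34-bit physical address, so a
 * TOR entry 0 covers the range [0, pmpaddr0 << 2). */
static uint64_t pmp_tor_limit(uint32_t pmpaddr0)
{
    return (uint64_t)pmpaddr0 << 2;
}

static int pmp_entry0_covers(uint32_t pmpaddr0, uint64_t pa)
{
    return pa < pmp_tor_limit(pmpaddr0);
}
```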

Page table

RISC-V defines several page-table formats, including:

  • Sv32: two levels (RV32)
  • Sv39: three levels (RV64; used by the 64-bit xv6)
  • Sv48: four levels (RV64)

sv32 Page Table Entries

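Reference: Sv32

Sv32 uses a two-level page table to enable virtual memory. A 32-bit virtual address contains two Virtual Page Numbers, VPN[1] (bits 31:22) and VPN[0] (bits 21:12), followed by a 12-bit page offset. Each table holds 1024 four-byte Page Table Entries (PTEs), so one table fills exactly one 4 KiB page. The Physical Page Number (PPN) of the leaf PTE is combined with the offset to form the physical address.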
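The translation can be sketched on the host with a tiny simulated physical memory. phys[], with physical address 0 at its first page, is an assumption for illustration, and sv32_translate() is a hypothetical name. Superpages (leaf PTEs at the first level) are rejected to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE  4096u
#define NFRAMES 8
#define PTE_V   (1u << 0)
#define PTE_R   (1u << 1)
#define PTE_W   (1u << 2)
#define PTE_X   (1u << 3)

/* Sv32 virtual address: VPN[1] = bits 31:22, VPN[0] = bits 21:12. */
#define PX(level, va) (((uint32_t)(va) >> (12 + 10 * (level))) & 0x3FFu)
#define PA2PTE(pa)    (((uint32_t)(pa) >> 12) << 10)
#define PTE2PA(pte)   (((uint32_t)(pte) >> 10) << 12)

/* Simulated physical memory: NFRAMES pages of 1024 four-byte PTEs each. */
static uint32_t phys[NFRAMES][1024];

static uint32_t *pa2ptr(uint32_t pa)
{
    assert(pa / PGSIZE < NFRAMES);
    return phys[pa / PGSIZE];
}

/* Walk the two levels starting at the root table at physical address
 * root_pa. Returns 0 and stores the physical address on success. */
static int sv32_translate(uint32_t root_pa, uint32_t va, uint32_t *pa)
{
    uint32_t pte = pa2ptr(root_pa)[PX(1, va)];
    if (!(pte & PTE_V))
        return -1;
    if (pte & (PTE_R | PTE_X))          /* 4 MiB superpage: not handled here */
        return -1;

    pte = pa2ptr(PTE2PA(pte))[PX(0, va)];
    if (!(pte & PTE_V) || !(pte & (PTE_R | PTE_X)))
        return -1;

    *pa = PTE2PA(pte) | (va & (PGSIZE - 1));
    return 0;
}
```

For example, with the root table in page 0, a second-level table in page 1, and a data page at physical address 0x2000, virtual address 0x00401234 uses VPN[1] = 1 and VPN[0] = 1 and translates to 0x2234.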

RV32

Trap machinery

There are three kinds of events that cause the CPU to set aside ordinary execution of instructions and force a transfer of control to special code that handles the event.

  1. System Call (Synchronous):
    A user program executes the ecall instruction to ask the kernel to do something for it.
    Reference: user/user.h
  2. Exception (Synchronous):
    An unusual condition occurring at run time, associated with an instruction in the current RISC-V hart.
    • Program error (illegal instruction, alignment error, memory access or page fault, etc.)
  3. Interrupt (Asynchronous):
    An external asynchronous event that may cause a RISC-V hart (hardware thread) to experience an unexpected transfer of control.
    • Software interrupt
    • Timer interrupt (taken in machine mode in this version of xv6, then forwarded to the kernel as a supervisor software interrupt)
    • External interrupt (disk and UART)

Supervisor-Level CSRs

Reference: kernel/riscv.h & RISC-V Instruction Set Manual

  • stvec (Supervisor Trap Vector Base Address): The kernel writes the address of its trap handler here; the RISC-V hart jumps to the address in stvec to handle a trap.

  • sepc (Supervisor Exception Program Counter): When a trap occurs, RISC-V saves the program counter here (since the pc is then overwritten with the value in stvec). The sret (return from trap) instruction copies sepc to the pc. The kernel can write sepc to control where sret goes.

  • scause (Supervisor Cause): RISC-V puts a number here that describes the reason for the trap.

  • sscratch (Supervisor Scratch Register): A scratch register for the trap code. In the uservec code below, it briefly holds the user's a0 so that a0 can be used to address the trapframe, and userret leaves the TRAPFRAME address in it when returning to user space.

  • sstatus (Supervisor Status Register): The SIE (Supervisor Interrupt Enable) bit controls whether device interrupts are allowed. If the kernel clears SIE, the RISC-V hart defers device interrupts until SIE is set again. The SPP bit indicates whether a trap came from user mode or supervisor mode, and controls which mode sret returns to.

  • satp (Supervisor Address Translation and Protection Register) : Holds the address of the page table.

Trap Roadmap

Figure (trap roadmap): a user program (e.g., the shell calling write) executes ecall. The hart switches to supervisor mode and jumps through stvec to uservec (kernel/trampoline.S), which calls usertrap() (kernel/trap.c). usertrap() dispatches on the cause:
  • Exception: print a message and exit()
  • System call: syscall(), then usertrapret()
  • Device interrupt: devintr(), then usertrapret()
  • Timer interrupt: yield(), then usertrapret()
usertrapret() calls userret() (kernel/trampoline.S), which executes sret to return to user mode.


Hardware layer

  1. If the trap is a device interrupt, and the sstatus SIE bit is clear, skip the following steps.
  2. Disable interrupts by clearing the SIE bit in sstatus.
  3. Copy the pc to sepc.
  4. Save the current mode (user or supervisor) in the SPP bit in sstatus.
  5. Set scause to reflect the trap’s cause.

Here is the structure of scause: the most significant bit (bit 31 on RV32, bit 63 on RV64) is the Interrupt field, and the remaining bits form the Exception Code field (WLRL).

If the trap was caused by an interrupt, the Interrupt field is 1. The Exception Code field identifies the most recent exception or interrupt.

  6. Set the mode to supervisor.
  7. Copy stvec to the pc.
  8. Start executing at the new pc.

Note that the CPU doesn't switch to the kernel page table or to a kernel stack. Kernel software must perform these tasks.
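The scause layout can be decoded as below, a minimal sketch for RV32. The helper names are hypothetical; the cause numbers come from the scause table in the RISC-V privileged specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* RV32: bit 31 is the Interrupt flag; bits 30:0 hold the Exception Code. */
#define SCAUSE_INTR (1u << 31)

static int scause_is_interrupt(uint32_t scause) { return (scause & SCAUSE_INTR) != 0; }
static uint32_t scause_code(uint32_t scause)   { return scause & ~SCAUSE_INTR; }

/* Names for the causes xv6 deals with; everything else is "other". */
static const char *scause_name(uint32_t scause)
{
    uint32_t code = scause_code(scause);
    if (scause_is_interrupt(scause)) {
        switch (code) {
        case 1: return "supervisor software interrupt";
        case 5: return "supervisor timer interrupt";
        case 9: return "supervisor external interrupt";
        }
    } else {
        switch (code) {
        case 2:  return "illegal instruction";
        case 8:  return "environment call from U-mode";
        case 12: return "instruction page fault";
        case 13: return "load page fault";
        case 15: return "store/AMO page fault";
        }
    }
    return "other";
}
```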

Trap from user space

In user space, a trap may occur if the user program makes a system call (ecall), does something illegal, or if a device interrupts.

However, RISC-V hardware does not switch page tables when it forces a trap. This means that the user page table must include a mapping for uservec so that it can execute, and while the process runs in user space, stvec must point to uservec.

Xv6 satisfies these requirements with a trampoline page that contains uservec. The trampoline page is mapped at the same virtual address (TRAMPOLINE) in both the user page table and the kernel page table.

Overview: executing write

  1. write() executes ecall.
  2. The hart switches to supervisor mode.
  3. It jumps to uservec (kernel/trampoline.S), where stvec points.
  4. uservec jumps to usertrap() (kernel/trap.c).
  5. usertrap() calls syscall().
  6. syscall() looks up the system call number in a table and calls the corresponding function inside the kernel (here, sys_write()).
  7. When sys_write() finishes, it returns to syscall().
  8. To return to user space, usertrap() calls usertrapret() (kernel/trap.c).
  9. usertrapret() calls userret() (kernel/trampoline.S), passing TRAPFRAME in a0 and the user page table (as a satp value) in a1.
  10. userret executes sret to return to user space.
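Steps 5 to 7 can be sketched as a table of function pointers indexed by the number in a7, with the result stored in a0, which is the shape of xv6's syscall(). mini_trapframe, cur_tf, syscall_dispatch() and the stub sys_write() are hypothetical stand-ins; SYS_write is 16 in xv6's kernel/syscall.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t uint32;

/* Only the trapframe fields the dispatcher touches: a7 carries the
 * system call number in, a0 carries the return value out. */
struct mini_trapframe { uint32 a0; uint32 a7; };

#define SYS_write 16                       /* value from xv6's kernel/syscall.h */

static struct mini_trapframe *cur_tf;      /* hypothetical "current process" trapframe */

static uint32 sys_write(void) { return 5; }  /* stub: pretend 5 bytes were written */

static uint32 (*syscalls[])(void) = {
    [SYS_write] = sys_write,
};

static void syscall_dispatch(void)
{
    uint32 num = cur_tf->a7;
    if (num > 0 && num < sizeof(syscalls) / sizeof(syscalls[0]) && syscalls[num])
        cur_tf->a0 = syscalls[num]();      /* becomes write()'s return value */
    else
        cur_tf->a0 = (uint32)-1;           /* unknown system call */
}
```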






Figure (write system call path): write() → ecall → uservec (kernel/trampoline.S; entered through stvec in supervisor mode) → usertrap() (kernel/trap.c) → syscall() → sys_write() → back to syscall() → usertrapret() → userret() (kernel/trampoline.S) → sret, back to user mode.


Trapframe Page Content (kernel/proc.h)

struct trapframe {
  /*   0 */ uint32 kernel_satp;   // kernel page table
  /*   4 */ uint32 kernel_sp;     // top of process's kernel stack
  /*   8 */ uint32 kernel_trap;   // usertrap()
  /*  12 */ uint32 epc;           // saved user program counter
  /*  16 */ uint32 kernel_hartid; // saved kernel tp
  /*  20 */ uint32 ra;
  /*  24 */ uint32 sp;
  /*  28 */ uint32 gp;
  /*  32 */ uint32 tp;
  ...                        // 31 general-purpose registers in total
  /* 140 */ uint32 t6;
};
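The offsets in these comments, and in the lw/sw instructions of trampoline.S, only work if the C layout matches. A sketch of the full 32-bit structure with compile-time checks, assuming the same register order as the 64-bit xv6 trapframe:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t uint32;

/* Five kernel fields followed by the 31 general-purpose registers x1..x31. */
struct trapframe {
  /*   0 */ uint32 kernel_satp;
  /*   4 */ uint32 kernel_sp;
  /*   8 */ uint32 kernel_trap;
  /*  12 */ uint32 epc;
  /*  16 */ uint32 kernel_hartid;
  /*  20 */ uint32 ra, sp, gp, tp;
  /*  36 */ uint32 t0, t1, t2;
  /*  48 */ uint32 s0, s1;
  /*  56 */ uint32 a0, a1, a2, a3, a4, a5, a6, a7;
  /*  88 */ uint32 s2, s3, s4, s5, s6, s7, s8, s9, s10, s11;
  /* 128 */ uint32 t3, t4, t5, t6;
};

/* Offsets hard-coded in trampoline.S (sw ra, 20(a0); sw t0, 56(a0); ...). */
_Static_assert(offsetof(struct trapframe, ra) == 20, "ra offset");
_Static_assert(offsetof(struct trapframe, a0) == 56, "a0 offset");
_Static_assert(offsetof(struct trapframe, t6) == 140, "t6 offset");
_Static_assert(sizeof(struct trapframe) == 144, "trapframe size");
```

The 64-bit trapframe has the same order with 8-byte fields, which is why every offset has to be recomputed for the port.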

uservec (kernel/trampoline.S)

After ecall, the hart switches to supervisor mode and sets the program counter to the value in stvec, which is the start of the trampoline page (TRAMPOLINE; 0x3ffffff000 in the 64-bit Sv39 build). The code at the beginning of the trampoline page is uservec.
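To see why that constant cannot survive the port, the sketch below recomputes TRAMPOLINE the way xv6's riscv.h and memlayout.h do for Sv39, next to a hypothetical Sv32 layout that puts the trampoline in the last page of the 4 GiB address space. The *_SV32 names and that placement are assumptions, not the port's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096ull

/* 64-bit xv6 (Sv39): MAXVA is one bit below the 39-bit maximum so that
 * addresses never need sign extension. */
#define MAXVA_SV39      (1ull << (9 + 9 + 9 + 12 - 1))
#define TRAMPOLINE_SV39 (MAXVA_SV39 - PGSIZE)

/* Hypothetical Sv32 layout: last page of the 4 GiB space for the trampoline,
 * the page below it for the trapframe. (1ull << 32) itself does not fit in
 * 32 bits, so a real port would write the results as 32-bit constants. */
#define MAXVA_SV32      (1ull << 32)
#define TRAMPOLINE_SV32 (MAXVA_SV32 - PGSIZE)
#define TRAPFRAME_SV32  (TRAMPOLINE_SV32 - PGSIZE)

_Static_assert(TRAMPOLINE_SV32 <= UINT32_MAX, "must fit a 32-bit virtual address");
```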

  • Task Description
    • Save the user registers (e.g., ra, sp, gp, and the user's a0) into the trapframe.
    • Set up satp register to point to the kernel page table.
    • Jump to usertrap() to handle the trap
uservec:    
        # trap.c sets stvec to point here, so
        # traps from user space start here,
        # in supervisor mode, but with a
        # user page table.
        #

        # save user a0 in sscratch so
        # a0 can be used to get at TRAPFRAME.
        csrw sscratch, a0

        # each process has a separate p->trapframe memory area,
        # but it's mapped to the same virtual address
        # (TRAPFRAME) in every process's user page table.
        li a0, TRAPFRAME
        
        # save the user registers in TRAPFRAME
        sw ra, 20(a0)
        sw sp, 24(a0)
        sw gp, 28(a0)
        ...
        
        # save the user a0 in p->tf->a0
        csrr t0, sscratch
        sw t0, 56(a0)
        
        # restore kernel stack pointer from p->tf->kernel_sp
        lw sp, 4(a0)

        # make tp hold the current hartid, from p->tf->kernel_hartid
        lw tp, 16(a0)

        # load the address of usertrap(), p->tf->kernel_trap
        lw t0, 8(a0)

        # restore kernel page table from p->tf->kernel_satp
        lw t1, 0(a0)
        csrw satp, t1
        sfence.vma zero, zero
        
        # a0 is no longer valid, since the kernel page
        # table does not specially map p->tf.

        # jump to usertrap(), which does not return
        jr t0

Detailed Operation

  1. csrw sscratch, a0: save the user's a0 in sscratch, freeing a0 for use as a pointer.
  2. li a0, TRAPFRAME: a0 now holds the address TRAPFRAME, so the trapframe can be accessed through a0.
  3. Save the user registers in TRAPFRAME (the trapframe layout is shown above). TRAPFRAME is a fixed virtual address that every process's user page table maps to that process's own p->tf page, so the same code works for every process.
  4. csrr t0, sscratch: read back the user's original a0 and store it at offset 56 (the a0 slot).
  5. Load sp and tp from TRAPFRAME: the kernel stack pointer and the hart ID.
  6. lw t0, 8(a0): load the address of usertrap() into t0.
  7. lw t1, 0(a0): load the kernel page table (satp value) stored at offset 0 in TRAPFRAME.
  8. csrw satp, t1: switch to the kernel page table; sfence.vma then flushes stale TLB entries.
  9. jr t0: jump to usertrap().

usertrap (kernel/trap.c)

This function handles the specific behavior corresponding to the cause of the trap.

  • Task Description
    • Set stvec to kernelvec, so that traps taken while in the kernel go to kerneltrap().
    • Read the scause register and handle traps based on their cause:
      • Exception: print an error message and terminate the process using exit().
      • Device interrupt: call devintr() to handle the device interrupt.
      • System call: enable interrupts and invoke syscall(). If the process has been killed, call exit().
      • Timer interrupt: end the current time slice by calling yield(), unless the process has been killed.
void
usertrap(void)
{
  int which_dev = 0;

  if((r_sstatus() & SSTATUS_SPP) != 0)
    panic("usertrap: not from user mode");

  // send interrupts and exceptions to kerneltrap(),
  // since we're now in the kernel.
  w_stvec((uint32)kernelvec);

  struct proc *p = myproc();
  
  // save user program counter.
  p->tf->epc = r_sepc();
  
  if(r_scause() == 8){
    // system call

    if(p->killed)
      exit(-1);

    // sepc points to the ecall instruction,
    // but we want to return to the next instruction.
    p->tf->epc += 4;

    // an interrupt will change sstatus &c registers,
    // so don't enable until done with those registers.
    intr_on();

    syscall();
  } else if((which_dev = devintr()) != 0){
    // ok
  } else {
    printf("usertrap(): unexpected scause %p pid=%d\n", r_scause(), p->pid);
    printf("            sepc=%p stval=%p\n", r_sepc(), r_stval());
    p->killed = 1;
  }

  if(p->killed)
    exit(-1);

  // give up the CPU if this is a timer interrupt.
  if(which_dev == 2)
    yield();

  usertrapret();
}

Detailed Operation

  1. if((r_sstatus() & SSTATUS_SPP) != 0): check the SPP bit of sstatus, which records the mode before the trap (0: user mode, 1: supervisor mode). A trap arriving here must come from user mode.
  2. w_stvec((uint32)kernelvec): route any further traps, which now happen in supervisor mode, to kernelvec.
  3. p->tf->epc = r_sepc(): save the user program counter (sepc) into the trapframe, so the kernel can resume the process after handling the trap.
  4. if(r_scause() == 8): scause 8 means an environment call from user mode, i.e., a system call. Otherwise devintr() checks for a device interrupt, and any other cause kills the process.
  5. Prepare to return to user mode via usertrapret().

usertrapret (kernel/trap.c)

The first step in returning to user space is the call to usertrapret. This function sets up the RISC-V control registers to prepare for a future trap from user space.

  • Task Description
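    • Disable interrupts. The next step points stvec at uservec while the hart is still in the kernel; an interrupt arriving in that window would be sent to uservec, which expects to be entered from user mode.
    • Set up the stvec register to point to uservec.
    • Save the kernel page table, kernel stack pointer, address of usertrap(), and hart ID (tp) into the trapframe for the next trap.
    • Write the saved user program counter into the sepc register.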
void
usertrapret(void)
{
  struct proc *p = myproc();

  // turn off interrupts, since we're switching
  // now from kerneltrap() to usertrap().
  intr_off();

  // send syscalls, interrupts, and exceptions to trampoline.S
  w_stvec(TRAMPOLINE + (uservec - trampoline));

  // set up trapframe values that uservec will need when
  // the process next re-enters the kernel.
  p->tf->kernel_satp = r_satp();         // kernel page table
  p->tf->kernel_sp = p->kstack + PGSIZE; // process's kernel stack
  p->tf->kernel_trap = (uint32)usertrap;
  p->tf->kernel_hartid = r_tp();         // hartid for cpuid()

  // set up the registers that trampoline.S's sret will use
  // to get to user space.

  // set S Previous Privilege mode to User.
  uint32 x = r_sstatus();
  x &= ~SSTATUS_SPP; // clear SPP to 0 for user mode
  x |= SSTATUS_SPIE; // enable interrupts in user mode
  w_sstatus(x);

  // set S Exception Program Counter to the saved user pc.
  w_sepc(p->tf->epc);

  // tell trampoline.S the user page table to switch to.
  uint32 satp = MAKE_SATP(p->pagetable);

  // jump to trampoline.S at the top of memory, which
  // switches to the user page table, restores user registers,
  // and switches to user mode with sret.
  uint32 fn = TRAMPOLINE + (userret - trampoline);
  ((void (*)(uint32,uint32))fn)(TRAPFRAME, satp);
}

Detailed Operation

  1. From x = r_sstatus() to w_sstatus(x): configure sstatus for user mode (clear SPP, set SPIE).
  2. w_sepc(p->tf->epc): set the user program counter that sret will jump to.
  3. uint32 satp = MAKE_SATP(p->pagetable): build the satp value for the user page table; userret performs the actual switch.
  4. Jump to userret, passing TRAPFRAME in a0 and satp in a1.
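The sstatus manipulation and its effect at sret can be modeled on the host. A minimal sketch; sstatus_for_user_return() and model_sret() are hypothetical names, and the bit positions and sret behavior come from the RISC-V privileged specification.

```c
#include <assert.h>
#include <stdint.h>

/* sstatus bits (same positions on RV32 and RV64). */
#define SSTATUS_SIE  (1u << 1)   /* supervisor interrupt enable */
#define SSTATUS_SPIE (1u << 5)   /* value SIE takes after sret */
#define SSTATUS_SPP  (1u << 8)   /* mode before the trap: 1 = supervisor, 0 = user */

/* The value usertrapret() writes back to sstatus. */
static uint32_t sstatus_for_user_return(uint32_t x)
{
    x &= ~SSTATUS_SPP;   /* sret will enter user mode */
    x |= SSTATUS_SPIE;   /* interrupts will be enabled after sret */
    return x;
}

/* What sret does to sstatus: SIE = SPIE, mode = SPP, then SPIE = 1 and
 * SPP = user. *to_user reports whether the hart ends up in user mode. */
static uint32_t model_sret(uint32_t x, int *to_user)
{
    *to_user = (x & SSTATUS_SPP) == 0;
    if (x & SSTATUS_SPIE)
        x |= SSTATUS_SIE;
    else
        x &= ~SSTATUS_SIE;
    x |= SSTATUS_SPIE;
    x &= ~SSTATUS_SPP;
    return x;
}
```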

userret (kernel/trampoline.S)

This is assembly code located on the trampoline page that is mapped in both user and kernel page tables; the reason is that userret will switch page tables.

  • Task Description
    • Set up satp register to point to the user page table.
    • Restore the saved user-mode register contents
    • Configure the sstatus register:
      • Set the spp bit to user mode.
      • Set the SPIE bit to enable interrupts
    • Execute the sret instruction and switches to user mode
userret:
        # userret(TRAPFRAME, pagetable)
        # switch from kernel to user.
        # usertrapret() calls here.
        # a0: TRAPFRAME, in user page table.
        # a1: user page table, for satp.

        # switch to the user page table.
        csrw satp, a1
        sfence.vma zero, zero

        # put the saved user a0 in sscratch, so we
        # can swap it with our a0 (TRAPFRAME) in the last step.
        lw t0, 56(a0)
        csrw sscratch, t0

        # restore all but a0 from TRAPFRAME
        lw ra, 20(a0)
        lw sp, 24(a0)
        ...
        # restore user a0, and save TRAPFRAME in sscratch
        csrrw a0, sscratch, a0
        
        # return to user mode and user pc.
        # usertrapret() set up sstatus and sepc.
        sret

Detailed Operation

  1. csrw satp, a1: switch to the user page table
  2. sfence.vma zero, zero: Flushes the Translation Lookaside Buffer (TLB) to ensure the processor does not use outdated page table entries.
  3. Restore Registers from TRAPFRAME
  4. sret: switch back to user mode. It sets the pc to sepc, which for a system call is the address of the ecall instruction plus 4.

Trap from kernel space

Xv6 handles traps from kernel code in a different way than traps from user code.

  • Two types of kernel trap:
    • Device Interrupt
    • Exception

Review Trap Roadmap

Q: How did we get into supervisor mode?
Ans: A trap occurred in user mode, and the hardware switched to supervisor mode.

void
usertrap(void)
{
  int which_dev = 0;

  if((r_sstatus() & SSTATUS_SPP) != 0)
    panic("usertrap: not from user mode");

  // send interrupts and exceptions to kerneltrap(),
  // since we're now in the kernel.
  w_stvec((uint32)kernelvec);
  ...

By this point usertrap() is already running in supervisor mode and has set stvec to the address of kernelvec.
Because the hart is now in the kernel, sp already points to a kernel stack and satp already holds the kernel page table, so a later trap can be handled without switching either.

kernelvec.S (upper part)

When a trap occurs while the CPU is in supervisor mode, the stvec register points to kernelvec (located in kernel/kernelvec.S).

  • Key points at this stage:
    • The satp register already points to the kernel page table.
    • The sp (stack pointer) already points to a valid kernel stack.

kernelvec saves the 31 general-purpose registers x1 to x31 (x0 is hard-wired to zero) onto the stack of the interrupted kernel thread. This ensures the interrupted thread can resume execution without interference from the trap.

Then it calls kerneltrap().

.globl kerneltrap
.globl kernelvec
.align 4
kernelvec:
        // make room to save registers.
        addi sp, sp, -128

        // save the registers.
        sw ra, 0(sp)
        sw sp, 4(sp)
        sw gp, 8(sp)
        ...
        sw t6, 120(sp)
        // call the C trap handler in trap.c
        call kerneltrap
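The -128 follows from the register count: 31 registers of 4 bytes need 124 bytes, rounded up to the 16-byte stack alignment of the RISC-V calling convention; the 64-bit kernelvec reserves 256 bytes for the same reason. A small arithmetic sketch (round_up() and the constant names are hypothetical):

```c
#include <assert.h>

#define NSAVED      31   /* x1..x31; x0 is hard-wired to zero */
#define XLEN_BYTES   4   /* RV32: each register is 4 bytes */
#define STACK_ALIGN 16   /* RISC-V calling convention keeps sp 16-byte aligned */

/* Round n up to a multiple of align (align must be a power of two). */
static unsigned round_up(unsigned n, unsigned align)
{
    assert((align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}
```

With these values, round_up(NSAVED * XLEN_BYTES, STACK_ALIGN) is 128, and the last register (t6) lands at offset (NSAVED - 1) * 4 = 120, matching sw t6, 120(sp).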

kerneltrap (kernel/trap.c)

void
kerneltrap()
{
  int which_dev = 0;
  uint32 sepc = r_sepc();
  uint32 sstatus = r_sstatus();
  uint32 scause = r_scause();
  

  if((sstatus & SSTATUS_SPP) == 0)
    panic("kerneltrap: not from supervisor mode");
  if(intr_get() != 0)
    panic("kerneltrap: interrupts enabled");

  if((which_dev = devintr()) == 0){
    printf("scause %p\n", scause);
    printf("sepc=%p stval=%p\n", r_sepc(), r_stval());
    panic("kerneltrap");
  }

  // give up the CPU if this is a timer interrupt.
  if(which_dev == 2 && myproc() != 0 && myproc()->state == RUNNING)
    yield();

  // the yield() may have caused some traps to occur,
  // so restore trap registers for use by kernelvec.S's sret instruction.
  w_sepc(sepc);
  w_sstatus(sstatus);
}

Detailed Operation

  1. Save the current contents of sepc, sstatus, and scause. If yield() is called below, other traps may occur while another thread runs and overwrite these CSRs, so the saved values are needed to restore them after yield() returns.
  2. (sstatus & SSTATUS_SPP) == 0: use sstatus to confirm that the trap came from supervisor mode.
  3. kerneltrap handles two types of traps:
    • Device interrupt: devintr() identifies and handles interrupts from devices such as the timer, UART, or disk. Its return value is:
      devintr() = 1 (UART/disk)
      devintr() = 2 (timer)
      devintr() = 0 (not a recognized device interrupt)
    • Exception: if the trap is an exception (e.g., illegal instruction or page fault), it is a fatal error in the kernel, and the kernel calls panic() to stop execution.
  4. w_sepc(sepc), w_sstatus(sstatus): once kerneltrap has handled the trap, restore sepc (the saved program counter) and the previous mode stored in sstatus.
  5. Return to kernelvec.
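The classification part of devintr() can be sketched for RV32 as follows. devintr_kind() is a hypothetical name, and PLIC claim/complete, clockintr() and clearing sip.SSIP are left out. If the 64-bit constants 0x8000000000000000L and 0x8000000000000001L were kept, the comparisons against a 32-bit scause would never match, so no device interrupt would be recognized.

```c
#include <assert.h>
#include <stdint.h>

#define SCAUSE_INTR (1u << 31)   /* 0x8000000000000000L on RV64 */

/* Returns 1 for an external interrupt (UART or virtio disk via the PLIC),
 * 2 for the supervisor software interrupt that the machine-mode timer
 * handler raises, and 0 for anything that is not a device interrupt. */
static int devintr_kind(uint32_t scause)
{
    if ((scause & SCAUSE_INTR) && (scause & 0xff) == 9)
        return 1;                     /* supervisor external interrupt */
    if (scause == (SCAUSE_INTR | 1u))
        return 2;                     /* forwarded timer tick */
    return 0;
}
```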

Special Case: Timer Interrupts

If the trap was caused by a timer interrupt while a process's kernel thread was running (that is, the interrupted thread is a process's kernel thread, not a scheduler thread), kerneltrap calls yield() to let other threads run.

Q: What happens in yield()?
Ans: The current thread gives up the CPU. Another thread is scheduled to run, and eventually the original thread resumes execution in its kerneltrap context.

kernelvec.S (lower part)

When kerneltrap() returns, control goes back to kernelvec, which restores the saved registers from the stack and then executes sret.

sret copies sepc to the program counter (pc), so execution resumes at the kernel instruction where the trap occurred.

        // restore registers.
        lw ra, 0(sp)
        lw sp, 4(sp)
        lw gp, 8(sp)
        // not this, in case we moved CPUs: lw tp, 12(sp)
        lw t0, 16(sp)
        lw t1, 20(sp)
        ...
        // return to whatever we were doing in the kernel.
        sret

Page Fault Exception

Page Fault is triggered when the CPU fials to translate a virtual address to valid a physical address.
RISC-V has three different types of page faults:
(Reference: scause CSR)

  1. Instruction Page Fault
  2. Load Page Fault
  3. Store Page Fault

Important Registers:

  1. scause: Record the type of Page Fault
  2. stval: Record the virtual address that could not be translated

xv6’s Handling of Page Faults:
User space: the kernel kills the faulting process.
Kernel space: the kernel panics.
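The decision xv6 makes can be sketched as a small host-side function. The exception codes 12, 13, and 15 come from the privileged specification's scause encoding. page_fault_action and the FAULT_* names are hypothetical. The helper also page-aligns stval, which is the first thing the handlers in the following sections need. Because the interrupt bit is clear for exceptions, these codes are the same on RV32 and RV64.

```c
#include <stdint.h>

#define PGSIZE 4096u
#define PGROUNDDOWN(a) ((a) & ~(PGSIZE - 1))

/* Exception codes from the RISC-V privileged spec (scause, interrupt bit clear). */
#define EXC_INST_PAGE_FAULT  12u
#define EXC_LOAD_PAGE_FAULT  13u
#define EXC_STORE_PAGE_FAULT 15u

enum fault_action { FAULT_NONE, FAULT_KILL_PROC, FAULT_PANIC };

int is_page_fault(uint32_t scause)
{
  return scause == EXC_INST_PAGE_FAULT ||
         scause == EXC_LOAD_PAGE_FAULT ||
         scause == EXC_STORE_PAGE_FAULT;
}

/* Hypothetical helper: what stock xv6 does with a page fault.
   *page receives the page-aligned faulting address taken from stval. */
enum fault_action page_fault_action(uint32_t scause, uint32_t stval,
                                    int from_user, uint32_t *page)
{
  if (!is_page_fault(scause))
    return FAULT_NONE;
  *page = PGROUNDDOWN(stval);
  return from_user ? FAULT_KILL_PROC : FAULT_PANIC;
}
```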

Advanced Applications of Page Faults

Copy-On-Write (COW)

Goal: Share physical pages between parent and child processes to improve memory efficiency during fork.
Mechanism:

  1. During fork, the parent and child initially share all physical pages, mapped read-only (PTE_W cleared).
  2. If either process attempts to write to a shared page, a Store Page Fault is triggered.
  3. The kernel handles the fault by:
  • Allocating a new physical page.
  • Copying the contents of the shared page to the new page.
  • Updating the PTE in the faulting process’s page table to point to the new page and allow writes.
  4. The kernel resumes the faulting process, which re-executes the instruction that caused the fault.

Advantages: Reduces memory usage and speeds up fork
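The following host-side model sketches the copy-on-write mechanism described above. It is not xv6 code. struct page stands in for a physical page with a reference count, struct pte is a simplified PTE rather than the Sv32 bit layout, malloc replaces kalloc, and PTE_COW is a hypothetical flag placed in one of the PTE's two software-reserved (RSW) bits. The refcnt == 1 shortcut is a common refinement that the steps above do not mention.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PGSIZE  4096
#define PTE_V   (1u << 0)
#define PTE_W   (1u << 2)
#define PTE_COW (1u << 8)   /* hypothetical: uses one of the RSW bits (8-9) */

struct page { int refcnt; char data[PGSIZE]; };   /* stand-in for a physical page */
struct pte  { struct page *pa; uint32_t flags; }; /* simplified PTE, not the Sv32 layout */

/* fork(): share the parent's page with the child; writable pages become
   read-only copy-on-write mappings in both page tables. */
void cow_share(struct pte *parent, struct pte *child)
{
  if (parent->flags & PTE_W)
    parent->flags = (parent->flags & ~PTE_W) | PTE_COW;
  *child = *parent;
  parent->pa->refcnt++;
}

/* Store page fault handler. Returns 0 if resolved (the store is then
   re-executed), -1 if this is not a COW fault or memory ran out. */
int cow_fault(struct pte *pte)
{
  if (!(pte->flags & PTE_V) || !(pte->flags & PTE_COW))
    return -1;                               /* xv6 would kill the process */

  struct page *old = pte->pa;
  if (old->refcnt == 1) {                    /* sole owner: just re-enable writes */
    pte->flags = (pte->flags & ~PTE_COW) | PTE_W;
    return 0;
  }

  struct page *copy = malloc(sizeof *copy);  /* kalloc() in a real kernel */
  if (copy == NULL)
    return -1;
  memcpy(copy->data, old->data, PGSIZE);     /* copy the shared contents */
  copy->refcnt = 1;
  old->refcnt--;                             /* one fewer sharer of the old page */
  pte->pa = copy;                            /* remap the faulting process, now writable */
  pte->flags = (pte->flags & ~PTE_COW) | PTE_W;
  return 0;
}
```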

Lazy Allocation

Goal: Delay physical memory allocation until it is actually used.
Mechanism:

  1. When a process requests memory (e.g., via sbrk), the kernel:
  • Records the address space expansion.
  • Does not allocate physical memory or create PTEs for the new virtual addresses.
  2. When the process accesses these new addresses:
  • A Page Fault is triggered.
  • The kernel allocates a physical page and updates the PTE to map the address.

Advantages: Prevents memory waste for unused allocations and reduces overhead for large memory requests.
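Below is a minimal model of lazy allocation. It assumes a tiny 16-page address space represented as an array of page pointers, with NULL meaning no PTE yet. lazy_sbrk, lazy_fault, and struct proc_model are hypothetical names. calloc stands in for kalloc plus the zero-fill that xv6 performs on new heap pages.

```c
#include <stdint.h>
#include <stdlib.h>

#define PGSIZE 4096u
#define NPAGES 16                 /* hypothetical tiny address space: 16 pages */
#define PGROUNDDOWN(a) ((a) & ~(PGSIZE - 1))

struct proc_model {
  uint32_t sz;                    /* heap size as recorded by sbrk */
  void *pages[NPAGES];            /* NULL = no PTE / no physical page yet */
};

/* Lazy sbrk: only record the new size; allocate nothing. */
int lazy_sbrk(struct proc_model *p, uint32_t n)
{
  if (p->sz + n > NPAGES * PGSIZE)
    return -1;
  p->sz += n;
  return 0;
}

/* Page fault handler: allocate and map the page on first touch.
   Returns 0 if the fault was resolved, -1 if the process should be killed. */
int lazy_fault(struct proc_model *p, uint32_t stval)
{
  if (stval >= p->sz)             /* outside the recorded heap: a genuine bad access */
    return -1;
  uint32_t idx = PGROUNDDOWN(stval) / PGSIZE;
  if (p->pages[idx] != NULL)      /* already mapped: not a lazy-allocation fault */
    return -1;
  p->pages[idx] = calloc(1, PGSIZE);
  return p->pages[idx] ? 0 : -1;
}
```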

Demand Paging

Goal: Improve application startup times by loading memory pages only when accessed.
Mechanism:

  1. During exec, the kernel sets up the program’s page table with invalid PTEs for the program’s text and data segments.
  2. When the program accesses these pages for the first time, a Page Fault occurs.
  3. The kernel loads the required page from disk into memory and updates the PTE.

Advantages:

  1. Only pages that are accessed are loaded, minimizing I/O
  2. Improves the responsiveness of large applications.
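The sketch below models demand paging on a host. It assumes the program image is an in-memory byte array standing in for the executable on disk, where a real kernel would read it with readi() from the inode. Each page-table slot is assumed to be NULL until first access. struct demand_proc and demand_fault are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PGSIZE 4096u
#define NPAGES 8                  /* hypothetical: program image is at most 8 pages */

struct demand_proc {
  const char *image;              /* stand-in for the executable on disk */
  uint32_t image_size;            /* bytes of text + data in the file */
  char *pages[NPAGES];            /* NULL = invalid PTE, not loaded yet */
  int disk_reads;                 /* counts simulated disk I/O */
};

/* Fault handler: load the faulting page from the "file" on first access.
   Returns 0 on success, -1 if the address is outside the program image. */
int demand_fault(struct demand_proc *p, uint32_t stval)
{
  if (stval >= p->image_size || stval / PGSIZE >= NPAGES)
    return -1;
  uint32_t idx = stval / PGSIZE;
  if (p->pages[idx] != NULL)
    return -1;                    /* already valid: some other kind of fault */

  char *pa = calloc(1, PGSIZE);   /* zeroed, so a short last page is padded with 0s */
  if (pa == NULL)
    return -1;
  uint32_t off = idx * PGSIZE;
  uint32_t n = p->image_size - off < PGSIZE ? p->image_size - off : PGSIZE;
  memcpy(pa, p->image + off, n);  /* "load the page from disk" */
  p->pages[idx] = pa;             /* "update the PTE" */
  p->disk_reads++;
  return 0;
}
```

Only the pages the program actually touches cost a disk read, which is the I/O saving listed above.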

Paging to Disk

Goal: Manage memory usage when the total demand exceeds physical RAM.
Mechanism:

  1. Pages that are not actively used are written to disk and marked invalid in the PTE.
  2. When an application accesses a paged-out page:
  • A Page Fault occurs.
  • The kernel:
    • Allocates a physical page
    • Loads the page from disk.
    • Updates the PTE to point to the new physical page.
  3. If no physical memory is free when a page must be allocated, the kernel first evicts another page to disk.

Advantages: Enables efficient memory utilization, allowing more processes to run simultaneously.
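Paging to disk can be modeled the same way with deliberately tiny sizes: two physical frames, four virtual pages, and a swap area with one slot per virtual page. The victim is chosen in FIFO (round-robin) order purely for illustration, because the steps above do not fix a replacement policy. Real systems usually approximate LRU and skip writing back clean pages. All names are hypothetical.

```c
#include <string.h>

#define PGSIZE  4096
#define NFRAMES 2                 /* hypothetical: only 2 physical pages of RAM */
#define NVPAGES 4                 /* 4 virtual pages in use */

struct vm_model {
  char ram[NFRAMES][PGSIZE];
  char swap[NVPAGES][PGSIZE];     /* one swap slot per virtual page */
  int frame_of[NVPAGES];          /* -1 = PTE invalid (page is on disk) */
  int owner[NFRAMES];             /* virtual page held by each frame, -1 = free */
  int next_victim;                /* FIFO (round-robin) eviction pointer */
  int swap_ins, swap_outs;
};

void vm_init(struct vm_model *vm)
{
  memset(vm, 0, sizeof *vm);
  for (int v = 0; v < NVPAGES; v++) vm->frame_of[v] = -1;
  for (int f = 0; f < NFRAMES; f++) vm->owner[f] = -1;
}

/* Page fault on virtual page v: find a frame (evicting if RAM is full),
   read the page back from swap, and mark the PTE valid. */
void vm_fault(struct vm_model *vm, int v)
{
  int f = -1;
  for (int i = 0; i < NFRAMES; i++)
    if (vm->owner[i] == -1) { f = i; break; }

  if (f == -1) {                            /* no free RAM: evict a victim */
    f = vm->next_victim;
    vm->next_victim = (vm->next_victim + 1) % NFRAMES;
    int victim = vm->owner[f];
    memcpy(vm->swap[victim], vm->ram[f], PGSIZE);  /* write it to disk */
    vm->frame_of[victim] = -1;              /* mark its PTE invalid */
    vm->swap_outs++;
  }
  memcpy(vm->ram[f], vm->swap[v], PGSIZE);  /* load the page from disk */
  vm->frame_of[v] = f;                      /* update the PTE */
  vm->owner[f] = v;
  vm->swap_ins++;
}

/* Access byte off of virtual page v, faulting the page in if necessary. */
char *vm_access(struct vm_model *vm, int v, int off)
{
  if (vm->frame_of[v] == -1)
    vm_fault(vm, v);
  return &vm->ram[vm->frame_of[v]][off];
}
```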

Interrupt & device drivers

Devices that need attention from the operating system can usually be configured to generate interrupts, which are one type of trap. The kernel trap handling code recognizes when a device has raised an interrupt and calls the driver’s interrupt handler.

Recall the trap roadmap: system calls, page faults, and interrupts all enter the kernel through the same trap mechanism. For interrupts, the trap handler hands off to devintr() (kernel/trap.c), which identifies the device and dispatches to its driver.

The following figure, from SiFive's manual for the FU540-C000 SoC, shows that all devices are connected to the processor, which handles device interrupts through the Platform-Level Interrupt Controller (PLIC).
[Figures: PLIC overview; SiFive FU540-C000 interrupt architecture]

As the upper-left corner of the figure shows, the FU540-C000 has 53 interrupt sources coming from devices. These interrupts arrive at the PLIC, which routes them to the harts. Local interrupts (software and timer) are generated separately by the Core-Local Interruptor (CLINT).

Global Interrupt (PLIC)

The Platform-Level Interrupt Controller (PLIC) is the global interrupt controller in a RISC-V system.
Global interrupts are routed through the PLIC, which can direct them to any hart in the system via that hart's external interrupt.

Reference: PLIC.adoc

Keywords

  1. Gateway:
    • The gateway converts external hardware signals (edge-triggered, level-triggered, or message-signaled) into interrupt requests recognizable by the PLIC.
    • Each device or signal source passes through its corresponding gateway before entering the PLIC.
  2. Interrupt Identifiers (IDs) & Priority:
    • Each interrupt is assigned a unique ID
    • Interrupts are also assigned a priority level
  3. Interrupt Targets:
    • In the RISC-V specification, an "Interrupt Target" refers to a specific Hart and privilege mode (M-mode, S-mode, or U-mode).
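    • Each target has its own per-source enable bits and priority threshold in the PLIC. The hart must also enable external interrupts locally (mie.MEIE for M-mode, mie.SEIE for S-mode).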
  4. Claim and Completion:
    • The PLIC notifies a target only when a pending source is enabled for that target and its priority exceeds the target's threshold.
    • The hart then “claims” the interrupt by reading the claim register, which returns the ID of the highest-priority pending interrupt and atomically clears its pending bit, and enters the ISR (Interrupt Service Routine).
    • After the ISR completes, the target writes the same ID to the claim/complete register, signaling the PLIC (via the source's gateway) that it may forward the next request from that source.
  5. Memory Mapping:
    • The PLIC's memory map organizes these registers into regions at fixed offsets from the PLIC base address, such as:
      • Priority registers: Base address 0x000000.
      • Pending bits: Base address 0x001000.
      • Enable bits: Base address 0x002000.
      • Threshold and Claim/Completion registers: Base address 0x200000
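These base addresses are offsets from the PLIC's base address. The helpers below turn them into absolute addresses. They assume the QEMU virt base 0x0c000000 that xv6 uses and QEMU's context numbering, in which hart h's M-mode context is 2h and its S-mode context is 2h+1. The per-context strides (0x80 for enable bits, 0x1000 for threshold/claim) come from PLIC.adoc. The results should match xv6's PLIC_SENABLE, PLIC_SPRIORITY, and PLIC_SCLAIM macros. The function names are hypothetical.

```c
#include <stdint.h>

#define PLIC_BASE 0x0c000000u      /* QEMU virt, as in xv6's memlayout.h */

/* S-mode context of a hart under QEMU virt's numbering (M = 2h, S = 2h + 1). */
static uint32_t s_context(int hart) { return 2u * (uint32_t)hart + 1u; }

uint32_t plic_priority(int irq)     { return PLIC_BASE + 0x000000u + 4u * (uint32_t)irq; }
uint32_t plic_pending_word(int irq) { return PLIC_BASE + 0x001000u + 4u * ((uint32_t)irq / 32u); }
uint32_t plic_senable(int hart)     { return PLIC_BASE + 0x002000u + 0x80u * s_context(hart); }
uint32_t plic_sthreshold(int hart)  { return PLIC_BASE + 0x200000u + 0x1000u * s_context(hart); }
uint32_t plic_sclaim(int hart)      { return plic_sthreshold(hart) + 4u; } /* claim/complete */
```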

PLIC Interrupt Flow

[Figure: PLIC interrupt flow]

  1. Hardware Interrupt Signal: External devices (e.g., network cards, disks, UARTs) generate an interrupt signal and send it to the PLIC through the gateway.
  2. Interrupt Request: The gateway determines the appropriate trigger method (edge, level, or message-signaled) and converts the signal into an interrupt request. The PLIC sets the corresponding pending bit.
  3. Priority Evaluation: The PLIC evaluates the priority of pending interrupts against the threshold values of potential targets.
  4. Interrupt Notification: If the interrupt is enabled for a target and its priority exceeds that target's threshold, the PLIC sends a notification to the target hart.
  5. Interrupt Claim: The target claims the interrupt by reading the claim register, which returns the interrupt ID and clears its pending bit.
  6. ISR Execution: The target executes the interrupt service routine (ISR) associated with the claimed interrupt.
  7. Completion: The target writes the interrupt ID back to the claim/complete register, telling the PLIC that the ISR has finished so the gateway can forward the next request from that source.
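The flow can be simulated on a host for a single target. This is a simplified model, not the hardware's exact behavior. It covers sources 1..31 and one target only, and its gateway simply ignores new requests from a source until the previous claim of that source is completed. Ties between equal priorities go to the lower ID, as PLIC.adoc specifies. All names are hypothetical.

```c
#include <stdint.h>

#define NSRC 32                       /* hypothetical: sources 1..31 (0 is reserved) */

struct plic_model {
  uint32_t priority[NSRC];            /* 0 = never interrupts */
  uint32_t pending;                   /* one pending bit per source */
  uint32_t enable;                    /* enable bits for a single target */
  uint32_t threshold;                 /* target's priority threshold */
  uint32_t in_service;                /* claimed but not yet completed */
};

/* Gateway: a device raises source id; ignored while a previous request is in service. */
void plic_raise(struct plic_model *p, int id)
{
  if (!(p->in_service & (1u << id)))
    p->pending |= 1u << id;
}

/* Highest-priority pending source that is enabled and above threshold (0 = none). */
int plic_best(const struct plic_model *p)
{
  int best = 0;
  for (int id = 1; id < NSRC; id++) {
    uint32_t bit = 1u << id;
    if ((p->pending & bit) && (p->enable & bit) &&
        p->priority[id] > p->threshold &&
        (best == 0 || p->priority[id] > p->priority[best]))
      best = id;                      /* strict '>' keeps the lowest ID on ties */
  }
  return best;
}

/* Claim: return the ID and clear its pending bit (0 = nothing to claim). */
int plic_claim_model(struct plic_model *p)
{
  int id = plic_best(p);
  if (id != 0) {
    p->pending &= ~(1u << id);
    p->in_service |= 1u << id;
  }
  return id;
}

/* Completion: the gateway may now forward the next request from this source. */
void plic_complete_model(struct plic_model *p, int id)
{
  p->in_service &= ~(1u << id);
}
```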

PLIC Implementation in xv6

On QEMU's virt machine, xv6 routes only two devices through the PLIC:

  • 1: VIRTIO0_IRQ (virtio disk)
  • 10: UART0_IRQ (console UART)
  1. Initialization:
    • Set the priority of both interrupt sources to 1 (plicinit()).
    • For each hart (S-mode context only): set the enable bits for both sources (plicinithart()).
    • Set the hart's S-mode priority threshold to 0 (plicinithart()).
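PLIC registers are 32 bits wide, so xv6's initialization code should carry over to RV32 without type changes. The sketch below reproduces what xv6's plicinit() and plicinithart() do, but it writes to a static array standing in for the MMIO window so that it runs on a host. plic_regs and the _model suffixes are hypothetical, and the real plicinithart() takes the hart ID from cpuid() instead of a parameter.

```c
#include <stdint.h>

#define UART0_IRQ   10
#define VIRTIO0_IRQ 1

/* Host stand-in for the PLIC MMIO window (covers harts 0 and 1).
   In the kernel, PLIC is 0x0c000000 and these are direct pointer writes. */
static uint32_t plic_regs[0x204000 / 4];
#define PLIC ((uintptr_t)plic_regs)
#define PLIC_SENABLE(hart)   (PLIC + 0x2080 + (hart) * 0x100)
#define PLIC_SPRIORITY(hart) (PLIC + 0x201000 + (hart) * 0x2000) /* S-mode threshold */

void plicinit_model(void)
{
  /* A non-zero priority lets the source interrupt at all. */
  *(volatile uint32_t *)(PLIC + UART0_IRQ * 4) = 1;
  *(volatile uint32_t *)(PLIC + VIRTIO0_IRQ * 4) = 1;
}

void plicinithart_model(int hart)
{
  /* Enable both sources for this hart's S-mode context. */
  *(volatile uint32_t *)PLIC_SENABLE(hart) = (1u << UART0_IRQ) | (1u << VIRTIO0_IRQ);
  /* Threshold 0: accept every interrupt with priority > 0. */
  *(volatile uint32_t *)PLIC_SPRIORITY(hart) = 0;
}
```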

Once an external interrupt is triggered, the following steps occur:

  1. The CPU traps into the kernel's S-mode trap handler (usertrap() or kerneltrap() in trap.c), which calls devintr().
  2. If devintr() identifies the cause as an external interrupt, it calls the PLIC helper plic_claim().
  3. plic_claim() reads the claim register to obtain the ID of the interrupting source.
  4. xv6 uses this ID to determine which device caused the interrupt and calls its handler (uartintr() or virtio_disk_intr()).
  5. After the handler finishes, plic_complete() writes the interrupt ID back to the claim/complete register.
  6. Control returns through trap.c, which performs its common exit work, and finally sret returns to the interrupted program.
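Steps 2 to 5 correspond to the external-interrupt branch of xv6's devintr(). The sketch below shows that claim, dispatch, complete pattern with stub functions so that it runs on a host: a fake claim register, plus counters in place of uartintr() and virtio_disk_intr(). handle_external and the stubs are hypothetical.

```c
#include <stdint.h>

#define UART0_IRQ   10
#define VIRTIO0_IRQ 1

/* Hypothetical stand-ins for the PLIC claim register and the two drivers. */
static int pending_irq;                 /* what the next claim read returns */
static int last_completed = -1;
static int uart_count, disk_count;

static int  plic_claim(void)        { int irq = pending_irq; pending_irq = 0; return irq; }
static void plic_complete(int irq)  { last_completed = irq; }
static void uartintr(void)          { uart_count++; }
static void virtio_disk_intr(void)  { disk_count++; }

/* The external-interrupt branch of devintr(): claim, dispatch, complete. */
int handle_external(void)
{
  int irq = plic_claim();
  if (irq == UART0_IRQ)
    uartintr();
  else if (irq == VIRTIO0_IRQ)
    virtio_disk_intr();
  /* xv6 prints "unexpected interrupt irq=%d" for any other non-zero ID */

  if (irq)                              /* ID 0 means nothing was pending */
    plic_complete(irq);
  return 1;                             /* devintr() reports "device interrupt" */
}
```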

Local interrupt (CLINT/ACLINT)

CLINT

Core-Local Interruptor (CLINT). Local interrupts (software and timer interrupts) are signaled directly to an individual hart with a dedicated interrupt value.

Reference: Core Local Interrupt (CLINT)

CLINT register

  • mtime: a real-time counter, 64 bits wide even on RV32, that increments monotonically at a constant frequency to provide a uniform time base for the system (unit: tick).
  • mtimecmp: a per-hart 64-bit compare register for timer interrupts. When mtime is greater than or equal to mtimecmp, the CLINT raises a machine timer interrupt for that hart.
  • msip: a per-hart register for software interrupts. The CLINT raises a machine software interrupt for the hart while its msip value is non-zero.
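On RV32 the 64-bit mtime and mtimecmp cannot be accessed with a single load or store, so each is handled as two little-endian 32-bit words. The sketch follows two patterns from the RISC-V privileged specification. Reads re-read the high word until it is stable, so a carry between the two reads is not missed. Writes to mtimecmp go in the order low = all ones, then high, then low, so no intermediate value triggers a spurious timer interrupt. The function names are hypothetical and the arrays stand in for the MMIO registers.

```c
#include <stdint.h>

/* mtime[0] / mtimecmp[0] = low word, [1] = high word (RISC-V is little-endian). */

/* Read a 64-bit counter whose low word may carry into the high word between reads. */
uint64_t read_mtime32(volatile uint32_t *mtime)
{
  uint32_t hi, lo;
  do {
    hi = mtime[1];
    lo = mtime[0];
  } while (mtime[1] != hi);            /* high word changed: low word wrapped, retry */
  return ((uint64_t)hi << 32) | lo;
}

/* Each intermediate comparand is at least the old or the new value,
   so the timer cannot fire early while the two halves are being updated. */
void write_mtimecmp32(volatile uint32_t *mtimecmp, uint64_t value)
{
  mtimecmp[0] = 0xffffffffu;            /* no smaller than the old value */
  mtimecmp[1] = (uint32_t)(value >> 32);  /* no smaller than the new value */
  mtimecmp[0] = (uint32_t)value;        /* final new value */
}
```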

mip

[Figure: layout of the mip register]

  • MEIP: machine-level external interrupt pending bit (bit 11); the matching enable bit is mie.MEIE.
  • MTIP: machine timer interrupt pending bit (bit 7); set while mtime >= mtimecmp and enabled by mie.MTIE.
  • MSIP: machine-level software interrupt pending bit (bit 3); driven by the hart's CLINT msip register and enabled by mie.MSIE.
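The following small sketch shows how these pending bits combine with the matching enable bits in mie. Bit positions are those of the privileged specification, and xv6's MIE_MTIE is bit 7. takeable_machine_interrupts is a hypothetical helper. The model ignores delegation (mideleg) and priority ordering between interrupts.

```c
#include <stdint.h>

#define MIP_MSIP (1u << 3)   /* machine software interrupt pending (mie.MSIE) */
#define MIP_MTIP (1u << 7)   /* machine timer interrupt pending (mie.MTIE, xv6's MIE_MTIE) */
#define MIP_MEIP (1u << 11)  /* machine external interrupt pending (mie.MEIE) */

/* A machine-level interrupt is taken only if it is pending AND enabled.
   While running in M-mode, mstatus.MIE is an additional global gate;
   below M-mode, machine-level interrupts are always globally enabled. */
uint32_t takeable_machine_interrupts(uint32_t mip, uint32_t mie,
                                     int in_m_mode, int mstatus_mie)
{
  if (in_m_mode && !mstatus_mie)
    return 0;
  return mip & mie & (MIP_MSIP | MIP_MTIP | MIP_MEIP);
}
```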

Timer Interrupt in xv6

[Figures: excerpt from the RISC-V assembly manual; timer interrupt flow]

kernel/start.c: timerinit()

void
timerinit()
{
  // each CPU has a separate source of timer interrupts.
  int id = r_mhartid();

  // ask the CLINT for a timer interrupt.
  uint32 interval = 1000000; // mtime ticks; about 1/10th second in qemu (10 MHz timebase).
  // on RV32 these 64-bit accesses compile into pairs of 32-bit loads/stores.
  *(uint64*)CLINT_MTIMECMP(id) = *(uint64*)CLINT_MTIME + interval;

  // prepare information in scratch[] for timervec.
  // scratch[0..3] : space for timervec to save registers.
  // scratch[4] : address of CLINT MTIMECMP register.
  // scratch[5] : desired interval (in mtime ticks) between timer interrupts.
  uint32 *scratch = &mscratch0[32 * id];
  scratch[4] = CLINT_MTIMECMP(id);
  scratch[5] = interval;
  w_mscratch((uint32)scratch);

  // set the machine-mode trap handler.
  w_mtvec((uint32)timervec);

  // enable machine-mode interrupts.
  w_mstatus(r_mstatus() | MSTATUS_MIE);

  // enable machine-mode timer interrupts.
  w_mie(r_mie() | MIE_MTIE);
}
  • CLINT_MTIMECMP(id): defined in kernel/memlayout.h.
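After timerinit(), each machine timer interrupt runs timervec. timervec uses scratch[4] and scratch[5] to push mtimecmp forward by one interval and then raises a supervisor software interrupt for the kernel. The host-side model below sketches that logic for RV32. It assumes mtimecmp is handled as two 32-bit words because RV32I has no 64-bit ld/sd. timervec_model is hypothetical, and the uintptr_t scratch slots are used only so that the model also runs on a 64-bit host. On QEMU's virt machine mtime ticks at 10 MHz, which is why an interval of 1,000,000 is roughly 0.1 s.

```c
#include <stdint.h>

#define SIP_SSIP (1u << 1)      /* supervisor software interrupt pending */

/* scratch[4]: address of this hart's mtimecmp (two 32-bit words on RV32).
   scratch[5]: interval between timer interrupts, as set up by timerinit(). */
uint32_t timervec_model(uintptr_t *scratch, uint32_t sip)
{
  volatile uint32_t *cmp = (volatile uint32_t *)scratch[4];
  uint32_t interval = (uint32_t)scratch[5];

  /* 64-bit add done as two 32-bit halves, as RV32I code must. */
  uint32_t lo = cmp[0], hi = cmp[1];
  uint32_t new_lo = lo + interval;
  uint32_t new_hi = hi + (new_lo < lo);      /* carry out of the low word */

  cmp[0] = 0xffffffffu;                      /* update order from the privileged spec */
  cmp[1] = new_hi;
  cmp[0] = new_lo;

  /* The kernel then sees a supervisor software interrupt (xv6 writes 2 to sip). */
  return sip | SIP_SSIP;
}
```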

ACLINT

Advanced Core Local Interrupt (ACLINT). The RISC-V ACLINT specification defines a set of memory mapped devices which provide inter-processor interrupts (IPI) and timer functionalities for each HART on a multi-HART RISC-V platform.

ACLINT is a group of memory-mapped devices used on multi-HART RISC-V platforms to provide:

  • Inter-Processor Interrupts (IPI)
  • Timer functionalities

CLINT Limitations

The SiFive Core-Local Interruptor (CLINT) device has been widely adopted in the RISC-V world to provide machine-level IPI and timer functionalities.
Unfortunately, the SiFive CLINT has a unified register map for both IPI and timer functionalities and it does not provide supervisor-level IPI functionality.

  • Unified register map for both IPI and timer functionalities
  • Lack of supervisor-level IPI functionality

So the RISC-V ACLINT specification takes a more modular approach by defining separate memory mapped devices for IPI and timer functionalities.

  • ACLINT defines IPI and timer functions as separate memory mapped devices.
  • Provide dedicated memory-mapped I/O devices for supervisor-level IPIs, so that supervisor software such as Linux RISC-V can send IPIs directly instead of going through SBI calls.
  • Compatibility with SiFive CLINT: the RISC-V ACLINT specification is backward compatible with the SiFive CLINT.
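Backward compatibility means that the familiar SiFive CLINT offsets still apply: msip words at the start of the region, one 64-bit mtimecmp per hart starting at offset 0x4000, and mtime at offset 0xBFF8. The helpers below compute these addresses for the QEMU virt CLINT base 0x02000000 used in xv6's memlayout.h, where CLINT_MTIMECMP(id) is defined the same way. The function names are hypothetical.

```c
#include <stdint.h>

#define CLINT 0x02000000u   /* QEMU virt CLINT base, as in xv6's memlayout.h */

/* SiFive-CLINT-compatible register offsets. */
uint32_t clint_msip(int hart)     { return CLINT + 4u * (uint32_t)hart; }            /* 32-bit msip */
uint32_t clint_mtimecmp(int hart) { return CLINT + 0x4000u + 8u * (uint32_t)hart; }  /* 64-bit */
uint32_t clint_mtime(void)        { return CLINT + 0xbff8u; }                        /* 64-bit */
```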

ACLINT Devices

Name     Privilege Level   Functionality
MTIMER   Machine           Fixed-frequency counter and timer events
MSWI     Machine           Inter-processor (or software) interrupts
SSWI     Supervisor        Inter-processor (or software) interrupts

Reference

  1. https://github.com/mit-pdos/xv6-riscv-book
  2. RISC-V ISA Specifications