# Final Term project: RV32 port for latest MIT xv6 operating system
## Preparation for executing [ladybird](https://github.com/harihitode/ladybird)
- Review of [srv32](https://github.com/kuopinghsu/srv32)
- [note](https://hackmd.io/Y1kiEiZzSuyWi8TH8Y0q0w)
- installation
```shell=
$ git clone https://github.com/harihitode/ladybird
$ git clone https://github.com/harihitode/ladybird-xv6.git
$ cd ladybird/sim
$ make
$ make xv6
```

- Install LLVM to rebuild the kernel
- reference: [RISC-V LLVM setup in Ubuntu](https://mucrolores.medium.com/risc-v-llvm-setup-in-ubuntu-a27652a5b9a8)
```shell=
$ sudo apt-get update
$ sudo apt-get -y install \
binutils build-essential libtool texinfo \
gzip zip unzip patchutils curl git \
make cmake ninja-build automake bison flex gperf \
grep sed gawk python bc \
zlib1g-dev libexpat1-dev libmpc-dev \
libglib2.0-dev libfdt-dev libpixman-1-dev
$ mkdir riscv
$ cd riscv
$ mkdir _install
$ export PATH=`pwd`/_install/bin:$PATH
$ hash -r
$ git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
$ pushd riscv-gnu-toolchain
$ ./configure --prefix=`pwd`/../_install --enable-multilib
$ make -j`nproc`
$ make -j`nproc` build-qemu
$ popd
$ git clone https://github.com/llvm/llvm-project.git riscv-llvm
$ pushd riscv-llvm
$ ln -s ../../clang llvm/tools || true
$ mkdir _build
$ cd _build
$ cmake -G Ninja -DCMAKE_BUILD_TYPE="Release" \
-DBUILD_SHARED_LIBS=True -DLLVM_USE_SPLIT_DWARF=True \
-DCMAKE_INSTALL_PREFIX="../../_install" \
-DLLVM_OPTIMIZED_TABLEGEN=True -DLLVM_BUILD_TESTS=False \
-DDEFAULT_SYSROOT="../../_install/riscv64-unknown-elf" \
-DLLVM_DEFAULT_TARGET_TRIPLE="riscv64-unknown-elf" \
-DLLVM_TARGETS_TO_BUILD="RISCV" \
../llvm
$ cmake --build . --target install
$ popd
```
- Now, test whether you can rebuild the kernel
```shell=
$ cd $HOME/ladybird-xv6/
$ make clean
$ make
$ make fs.img
$ cd $HOME/ladybird/sim
$ make xv6
```
If everything works, `make clean` removes ladybird-xv6/kernel/kernel, fs.img, and ladybird-xv6/user/_XXX. Running **make** afterwards should rebuild them without any error message.
## The problem when running modified xv6 on qemu
- The execution workflow is roughly as follows
```shell
kernel.ld -> entry.S -> start.c -> main.c -> scheduler()(in proc.c, remain looping without stopping in general)
```
- While inspecting the scheduler, we found that only one process is runnable under **qemu**, whereas multiple runnable processes are executed under **launch_sim**

- still working on it...
## The startup of the kernel
- kernel.ld, entry.S, start.c
### kernel.ld
- the entry point: **_entry**
- lays out the memory sections for data, bss, rodata, text, trampsec, etc.
### entry.S
- initialize the stack pointer for the current hart: sp = stack0 + (hartid * 4096)
- call start(), which is defined in start.c
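
The stack setup above can be sketched in C. This is a minimal sketch, assuming 4096-byte stacks laid out back to back starting at `stack0`; `hart_stack_top` is a hypothetical helper name, not a function in the kernel. Because the stack grows toward lower addresses, upstream xv6 adds 1 to the hartid before multiplying, so sp starts at the end of each hart's slot; the port may differ.

```c
#include <stdint.h>

#define STACK_SIZE 4096u  /* bytes of boot stack per hart, as in the note */

/* Hypothetical helper: initial stack pointer for a given hart.
 * The stack grows downward, so sp starts at the END of the hart's
 * slot: stack0 + (hartid + 1) * STACK_SIZE. */
static uintptr_t hart_stack_top(uintptr_t stack0, unsigned hartid)
{
    return stack0 + ((uintptr_t)hartid + 1u) * STACK_SIZE;
}
```

For example, with `stack0` at 0x80010000, hart 0 would start with sp = 0x80011000 and hart 1 with sp = 0x80012000.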
### start.c
- start() runs in machine mode and finally executes mret to switch to supervisor mode
- w_satp(0) => no paging
- Delegate all exceptions and interrupts to supervisor mode, except the timer interrupt.
- w_mepc((uint64)main): sets the first instruction to execute after machine mode returns to supervisor mode
- the final instruction **mret** returns to the previous mode (which in this case is supervisor mode) and jumps to the address saved in mepc
- timerinit():
- w_mtvec: store the address of the machine-mode interrupt handler
- w_mstatus: enable machine-mode interrupts
- w_mie(r_mie() | MIE_MTIE): enable machine-mode timer interrupts
- In the handler, registers a1, a2, a3 and a4 are only temporaries for computing mtimecmp = mtimecmp + interval
- The code
```assembly=
# in kernel/kernelvec.S
.globl kerneltrap
.globl kernelvec
...
.globl timervec
.align 4
timervec:
# start.c has set up the memory that mscratch points to:
# scratch[0,4,8,12] : register save area.
# scratch[16] : address of CLINT's MTIMECMP register.
# scratch[20] : desired interval between interrupts.
csrrw a0, mscratch, a0
sw a1, 0(a0)
sw a2, 4(a0)
sw a3, 8(a0)
sw a4, 12(a0)
# schedule the next timer interrupt
# by adding interval to mtimecmp.
lw a1, 16(a0) # CLINT_MTIMECMP(hart)
lw a2, 20(a0) # interval
lw a3, 0(a1)
lw a4, 4(a1)
add a3, a3, a2
sltu a2, a3, a2
add a4, a4, a2
# new comparand is in a4:a3
li a2, -1
sw a2, 0(a1)
sw a4, 4(a1)
sw a3, 0(a1)
# raise a supervisor software interrupt.
li a1, 2
csrw sip, a1
lw a4, 12(a0)
lw a3, 8(a0)
lw a2, 4(a0)
lw a1, 0(a0)
csrrw a0, mscratch, a0 // this is for the swap between a0 and mscratch
mret
```
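
The `add` / `sltu` / `add` sequence in timervec is a 64-bit addition built from 32-bit registers. Below is a minimal C sketch of the same arithmetic; `struct u64_halves` and `add_interval` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, like the a4:a3 pair. */
struct u64_halves {
    uint32_t lo;
    uint32_t hi;
};

/* Add a 32-bit interval to a 64-bit comparand using 32-bit operations. */
static struct u64_halves add_interval(struct u64_halves cmp, uint32_t interval)
{
    struct u64_halves next;
    next.lo = cmp.lo + interval;             /* add  a3, a3, a2 (may wrap) */
    uint32_t carry = (next.lo < interval);   /* sltu a2, a3, a2 (1 if wrapped) */
    next.hi = cmp.hi + carry;                /* add  a4, a4, a2 */
    return next;
}
```

The three stores that follow in the assembly (write -1 to the low word, then the high word, then the real low word) match the sequence recommended in the RISC-V privileged specification for updating a 64-bit mtimecmp on RV32, so that the comparand is never temporarily smaller than intended.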
## Status and Trap architectures
### Trap
- Exception
- two types
- syscall: ecall
- Error: illegal instruction, alignment error
- Interrupt
- two modes
- user mode
- supervisor mode
- both enter **Handler(...)** when the trap is taken, and the handler runs in supervisor mode
- stvec
- a CSR (control and status register)
- contains the address of the **Handler(...)** code
- two types:
- kernelvec
- uservec
- satp (supervisor address translation and protection)
- a CSR that points to the page table
- always on in S and U mode
- initialization: set to 0
- status register
[source code from ladybird](https://github.com/harihitode/ladybird/blob/1f27f9470cf15b595ab799e83f3dab5c7f298ddb/sim/csr.c)
- delegation: in xv6, all trap except the timer interruption are delegated to supervisor mode.
- the workflow of handling a trap(according to [xv6 Kernel-14: Trap Handling](https://www.youtube.com/watch?v=k4f2vHCV5iQ&list=PLbtzT1TYeoMhTPzyTZboW_j7TPAnjv9XB&index=14))
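
As a rough illustration of how a handler tells these trap types apart, here is a minimal sketch that decodes a 32-bit scause value. The cause numbers (8 for an ecall from user mode, 1 for a supervisor software interrupt, 9 for a supervisor external interrupt) come from the RISC-V privileged specification; the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCAUSE_INTERRUPT_RV32 0x80000000u  /* top bit of a 32-bit scause */

static bool scause_is_interrupt(uint32_t scause)
{
    return (scause & SCAUSE_INTERRUPT_RV32) != 0;
}

static uint32_t scause_code(uint32_t scause)
{
    return scause & ~SCAUSE_INTERRUPT_RV32;
}

/* Hypothetical classification, for illustration only. */
enum trap_kind { TRAP_SYSCALL, TRAP_SOFT_INTR, TRAP_EXT_INTR, TRAP_OTHER };

static enum trap_kind classify_trap(uint32_t scause)
{
    if (scause_is_interrupt(scause)) {
        switch (scause_code(scause)) {
        case 1:  return TRAP_SOFT_INTR;  /* supervisor software interrupt */
        case 9:  return TRAP_EXT_INTR;   /* supervisor external interrupt */
        default: return TRAP_OTHER;
        }
    }
    /* 8: environment call (ecall) from user mode */
    return scause_code(scause) == 8 ? TRAP_SYSCALL : TRAP_OTHER;
}
```

In xv6 the timer interrupt arrives in supervisor mode as a software interrupt, because the machine-mode timer handler raises one for the kernel.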


### Trampoline and trapframe
- The corresponding information is stored in proc.h
- Pay attention to several data structures
- struct cpu
- struct proc
- struct trapframe
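
To make the trapframe offsets used by trampoline.S easier to follow, here is a minimal sketch of the start of a 32-bit trapframe. The offsets 0, 4, 8, 16, 20 to 40 and 56 are taken from the assembly in this note; the position of `epc` and the fields between `t1` and `a0` are assumptions based on the field order in upstream xv6, and the struct name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the first fields of an RV32 trapframe (4 bytes per field). */
struct trapframe_rv32 {
    /*  0 */ uint32_t kernel_satp;    /* kernel page table */
    /*  4 */ uint32_t kernel_sp;      /* top of the process's kernel stack */
    /*  8 */ uint32_t kernel_trap;    /* address of usertrap() */
    /* 12 */ uint32_t epc;            /* saved user pc (position assumed) */
    /* 16 */ uint32_t kernel_hartid;  /* saved kernel tp */
    /* 20 */ uint32_t ra;
    /* 24 */ uint32_t sp;
    /* 28 */ uint32_t gp;
    /* 32 */ uint32_t tp;
    /* 36 */ uint32_t t0;
    /* 40 */ uint32_t t1;
    /* 44 */ uint32_t t2;             /* assumed */
    /* 48 */ uint32_t s0;             /* assumed */
    /* 52 */ uint32_t s1;             /* assumed */
    /* 56 */ uint32_t a0;
    /* remaining registers elided */
};

/* Compile-time checks against the offsets used in the assembly. */
_Static_assert(offsetof(struct trapframe_rv32, kernel_sp) == 4, "kernel_sp");
_Static_assert(offsetof(struct trapframe_rv32, kernel_hartid) == 16, "kernel_hartid");
_Static_assert(offsetof(struct trapframe_rv32, ra) == 20, "ra");
_Static_assert(offsetof(struct trapframe_rv32, a0) == 56, "a0");
```

Checking the offsets at compile time is one way to catch a mismatch between the C struct and the hand-written assembly when porting from 8-byte to 4-byte registers.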

- The corresponding code is in **kernel/trampoline.S**.
- trap.c sets **stvec** to point here, so traps from user space start here, in supervisor mode, but with a user page table. sscratch points to where the process's p.trapframe is mapped into user space at TRAPFRAME. a0 is TRAPFRAME.
```assembly=
## kernel/trampoline.S
.section trampsec
.globl trampoline
trampoline:
.align 4
.globl uservec
uservec:
csrrw a0, sscratch, a0
# save the user registers in TRAPFRAME
sw ra, 20(a0)
sw sp, 24(a0)
sw gp, 28(a0)
sw tp, 32(a0)
sw t0, 36(a0)
...
# save the user a0 in p->trapframe->a0
csrr t0, sscratch
sw t0, 56(a0)
# restore kernel stack pointer from p->trapframe->kernel_sp
lw sp, 4(a0)
# make tp hold the current hartid, from p->trapframe->kernel_hartid
lw tp, 16(a0)
# load the address of usertrap(), p->trapframe->kernel_trap
lw t0, 8(a0)
# restore kernel page table from p->trapframe->kernel_satp
lw t1, 0(a0)
csrw satp, t1
sfence.vma zero, zero # !! Note: all instructions above must have completed before this point
# a0 is no longer valid, since the kernel page
# table does not specially map p->tf.
# jump to usertrap(), which does not return
jr t0
```
```c=
//trap.c
//
// handle an interrupt, exception, or system call from user space.
// called from trampoline.S
//
void usertrap(void)
{
int which_dev = 0;
if((r_sstatus() & SSTATUS_SPP) != 0)
panic("usertrap: not from user mode");
// send interrupts and exceptions to kerneltrap(),
// since we're now in the kernel.
w_stvec((uint32)kernelvec);
struct proc *p = myproc();
// save user program counter.
p->trapframe->epc = r_sepc();//read the pc where the exception occurs
if(r_scause() == 8){// system call!!
if(p->killed)
exit(-1);
// sepc points to the ecall instruction,
// but we want to return to the next instruction.
p->trapframe->epc += 4;
// an interrupt will change sstatus &c registers,
// so don't enable until done with those registers.
intr_on();
syscall();
} else if((which_dev = devintr()) != 0){
// 1: uart/disk
// 2: timer(software interrupt)
// 0: other(error)
} else {
printf("usertrap(): unexpected scause %p pid=%d\n", r_scause(), p->pid);
//!! Note: once intr_on() has been called, a nested interrupt may overwrite sepc and stval, so r_sepc() and r_stval() can no longer be relied on in that case
printf(" sepc=%p stval=%p\n", r_sepc(), r_stval());
p->killed = 1;
}
if(p->killed)
exit(-1);
// give up the CPU if this is a timer interrupt.
if(which_dev == 2)
yield();//!! looks like a no-op to this process, but it may resume on a different core
usertrapret();
}
```
```c=
//trap.c
// return to user space
//
void usertrapret(void)
{
struct proc *p = myproc();
// we're about to switch the destination of traps from
// kerneltrap() to usertrap(), so turn off interrupts until
// we're back in user space, where usertrap() is correct.
intr_off();
// send syscalls, interrupts, and exceptions to trampoline.S
//TRAMPOLINE: point to the highest page in memory
//this function will set the stvec from kernelvec to uservec!!
w_stvec(TRAMPOLINE + (uservec - trampoline));
// set up trapframe values that uservec will need when
// the process next re-enters the kernel.
p->trapframe->kernel_satp = r_satp(); // kernel page table
p->trapframe->kernel_sp = p->kstack + PGSIZE; // process's kernel stack // initialize with an empty stack!!
p->trapframe->kernel_trap = (uint32)usertrap;
p->trapframe->kernel_hartid = r_tp(); // hartid for cpuid() // since yield() may have been called, we may be on a different core
// set up the registers that trampoline.S's sret will use
// to get to user space.
// set S Previous Privilege mode to User.
unsigned long x = r_sstatus();
x &= ~SSTATUS_SPP; // clear SPP to 0 for user mode
x |= SSTATUS_SPIE; // enable interrupts in user mode
w_sstatus(x);
// set S Exception Program Counter to the saved user pc.
w_sepc(p->trapframe->epc);
// tell trampoline.S the user page table to switch to. //MAKE_SATP is a preprocessor macro
uint32 satp = MAKE_SATP(p->pagetable);
// jump to trampoline.S at the top of memory, which
// switches to the user page table, restores user registers,
// and switches to user mode with sret.
// (userret - trampoline) is the offset of userret in the trampoline page
uint32 fn = TRAMPOLINE + (userret - trampoline);
//(void (*)(uint32,uint32))fn: tell the compiler that fn is a function with 2 arguments that returns void
//and then we set the 2 arguments with (TRAPFRAME, satp)
((void (*)(uint32,uint32))fn)(TRAPFRAME, satp);
}
```
```assembly=
.globl userret
userret:
# userret(TRAPFRAME, pagetable)
# switch from kernel to user.
# usertrapret() calls here.
# a0: TRAPFRAME, in user page table.
# a1: user page table, for satp.
# switch to the user page table.
#!! that is, it switches to the user's address space!!
csrw satp, a1
sfence.vma zero, zero
# put the saved user a0 in sscratch, so we
# can swap it with our a0 (TRAPFRAME) in the last step.
lw t0, 56(a0)
csrw sscratch, t0
# restore all but a0 from TRAPFRAME
lw ra, 20(a0)
lw sp, 24(a0)
lw gp, 28(a0)
lw tp, 32(a0)
lw t0, 36(a0)
lw t1, 40(a0)
...
# restore user a0, and save TRAPFRAME in sscratch
csrrw a0, sscratch, a0
# return to user mode and user pc.
# usertrapret() set up sstatus and sepc.
sret
```
- Note:
- **usertrapret** sets up the kernel fields of the trapframe (kernel_satp, kernel_sp, kernel_trap, kernel_hartid) that uservec needs the next time the process enters the kernel.
- Entering the kernel through uservec is a transition from user mode to supervisor mode (also called kernel mode).
- The last instruction of userret is **sret**, which returns to the user process that trapped.
- Once the process enters supervisor mode, interrupts are disabled immediately. If the trap was caused by a **syscall**, interrupts are enabled again.
- Interrupts may also be enabled during supervisor mode when we call the **yield()** function. Remember that **yield** allows a context switch during execution (i.e. the running process, and with it the mode, may change in this period).
- When a trap occurs in kernel mode, things are different. For example, there is no trapframe in which to store the registers, so they are saved on the kernel stack instead. Further details are discussed later.
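
The sstatus update that usertrapret performs before returning can be sketched as a pure function. The bit positions (SPP is bit 8, SPIE is bit 5) come from the RISC-V privileged specification; `sstatus_for_user_return` is a hypothetical name used only for this illustration.

```c
#include <stdint.h>

#define SSTATUS_SPP  (1u << 8)  /* previous privilege: 1 = supervisor, 0 = user */
#define SSTATUS_SPIE (1u << 5)  /* interrupt-enable value that sret restores */

/* Compute the sstatus value that makes sret drop to user mode
 * with interrupts enabled. */
static uint32_t sstatus_for_user_return(uint32_t sstatus)
{
    sstatus &= ~SSTATUS_SPP;   /* sret will return to user mode */
    sstatus |= SSTATUS_SPIE;   /* sret will turn interrupts on */
    return sstatus;
}
```

Only these two bits change; every other bit of sstatus is passed through untouched.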
## Trap in Kernel Mode
- Take a look at kernelvec.S (the first code executed after a trap occurs in kernel mode). Its main job is to call **kerneltrap()** in trap.c
```c=
// interrupts and exceptions from kernel code go here via kernelvec,
// on whatever the current kernel stack is.
void
kerneltrap()
{
int which_dev = 0;
uint32 sepc = r_sepc();
uint32 sstatus = r_sstatus();
uint32 scause = r_scause();
if((sstatus & SSTATUS_SPP) == 0)
panic("kerneltrap: not from supervisor mode");
if(intr_get() != 0)
panic("kerneltrap: interrupts enabled");
if((which_dev = devintr()) == 0){
printf("scause %p\n", scause);
printf("sepc=%p stval=%p\n", r_sepc(), r_stval());
panic("kerneltrap");
}
// give up the CPU if this is a timer interrupt.
if(which_dev == 2 && myproc() != 0 && myproc()->state == RUNNING)
yield();
// the yield() may have caused some traps to occur,
// so restore trap registers for use by kernelvec.S's sepc instruction.
w_sepc(sepc);
w_sstatus(sstatus);
}
```
## Page Table
- 2 types
- kernel page table
- one page table per user process (scheme chosen by xv6-riscv: Sv39, three levels)
- translation lookaside buffers (TLBs)
- cache recent PTEs
- when satp changes, we need to flush all TLBs
- using sfence.vma (in xv6, we use sfence_vma())
- virtual address(39 bits)
| 25 bits | 9 bits | 9 bits | 9 bits | 12 bits |
| ------- | ------ | ------ | ------ | ------- |
| ignored | level 2 index | level 1 index | level 0 index | offset into 4 KB data page |
- page table entry(pte)
| 10 bits | 44 bits | 5 bits | 1 | 1 | 1 | 1 | 1 |
| ------- | -------------------- | ------------- | --- | --- | --- | --- | --- |
| unused | physical page number | unused in xv6 | u | x | w | r | v |
- physical address (56 bits; the top 8 bits of a 64-bit address are ignored)
| 8 bits | 44 bits | 12 bits |
| ------- | -------------------- | ------- |
| ignored | physical page number | offset |
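
The field layouts above translate into a few shifts and masks. This is a minimal sketch for Sv39 as described in this section; the function names are hypothetical and loosely follow the PX, PTE2PA and PA2PTE macros in upstream xv6. An RV32 port would presumably use a different scheme (Sv32), which is not shown here.

```c
#include <stdint.h>

#define PGSHIFT 12      /* bits of offset within a page */
#define PXMASK  0x1FFu  /* 9 bits per page-table index */

/* Extract the 9-bit page-table index for a given level (0, 1 or 2). */
static unsigned px(int level, uint64_t va)
{
    return (unsigned)((va >> (PGSHIFT + 9 * level)) & PXMASK);
}

/* Physical address of the page a PTE points to (drop the 10 flag bits). */
static uint64_t pte2pa(uint64_t pte)
{
    return (pte >> 10) << 12;
}

/* PTE value (without flag bits) for a page-aligned physical address. */
static uint64_t pa2pte(uint64_t pa)
{
    return (pa >> 12) << 10;
}
```

A page-table walk would use `px(2, va)`, then `px(1, va)`, then `px(0, va)` to index the three levels, following `pte2pa` from one level to the next.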
## Memory layout
- Kernel Virtual Memory(256 GB)
| Content | Note | Permission |
| ----------------------------------- |:--------------------------------------------------------- |:---------- |
| trampoline | 1 page(4KB) | r_x |
| guard page | 1 page | --- |
| kernel stack 0 | 1 page | rw_ |
| guard page | 1 page | --- |
| kernel stack 1 | 1 page | rw_ |
| ...(total 64 stacks and guard page) | ... | ... |
| free memory | for the page allocator(kalloc, kfree), end at 0x8800_0000 | rw_ |
| kernel data | | rw_ |
| kernel text | start at 0x80000000 | r_x |
| ... | ... | ... |
| virtio disk | 1 page, start at 0x10001000 | rw_ |
| uart 0 | 1 page, start at 0x10000000 | rw_ |
| ... | | |
| plic | 4 MB, start at 0x0c000000 | rw_ |
| ... | 2 GB | ... |
- Physical Address
| Content | Note | Permission |
|:---------------------------:|:------------------:| ---------- |
| Unused | | |
| Physical memory(RAM) | end at 0x8800_0000 | |
| Unused and other IO devices | | |
| virtio disk | 1 page | |
| uart0 | 1 page | |
| ... | | |
| plic | 4 MB | |
| ... | | |
| clint | machine mode only | |
| Unused | | |
| boot | | |
- Note:
- **PLIC**(Platform Level Interrupt Controller)
- the CLINT is used in machine mode only (remember that virtual memory addressing is **not** active under machine mode)
- kernel virtual addresses below 0x8800_0000 are **direct mapped** to physical memory
- the **trampoline page** in the kernel and in every user address space maps to the same physical page
- Each user process has its own address space (up to 64 processes in total)
- memlayout.h contains the necessary constants for the memory layout
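
The top of the kernel virtual address space can be computed from a few constants. The sketch below follows the definitions in upstream (64-bit) xv6-riscv's memlayout.h and riscv.h; the values in the RV32 port will differ, and `kstack_va` is a hypothetical function standing in for the KSTACK macro.

```c
#include <stdint.h>

#define PGSIZE     4096ULL
/* One bit less than the full 39-bit Sv39 range, i.e. 256 GB. */
#define MAXVA      (1ULL << (9 + 9 + 9 + 12 - 1))
/* The trampoline occupies the highest page. */
#define TRAMPOLINE (MAXVA - PGSIZE)

/* Lowest address of kernel stack p. Each stack takes one page and
 * has an unmapped guard page directly above it. */
static uint64_t kstack_va(unsigned p)
{
    return TRAMPOLINE - ((uint64_t)p + 1) * 2 * PGSIZE;
}
```

With these definitions, stack 0 sits two pages below the trampoline and each further stack sits two pages below the previous one, which matches the alternating guard page / kernel stack rows in the table.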
## PLIC(platform level interrupt controller)
- A figure in a reference book illustrates the role of the PLIC. It is a controller with multiple inputs (peripheral devices such as keyboard, UART, VGA and so on) and multiple outputs (the cores).
- Note that the figure seems to send the interrupt signal directly from the peripheral to the PLIC and then to the cores. In the real world, however, we need a **cross bus** (e.g. AXI, one kind of bus protocol) with handshake signals (ready, valid, and bits) in order to transfer data with a variable delay (some devices are implemented off-chip, so the latency exceeds one cycle).
- Software accesses it just like any other memory-mapped unit.
- Internally the controller holds an enable matrix with one bit per (device, core) pair. Once a peripheral raises an interrupt, the PLIC determines which cores receive the signal.
- When a source interrupt pending bit is set
- All enabled cores are notified (if **interrupts** are enabled on a core, a **trap** occurs)
- The trap handler will **Claim** the interrupt
- Read a memory-mapped register in the PLIC.
- Returns id of the interrupting device
- PLIC will clear the **source interrupt pending bit**
- Notice that
- only one **CLAIM** will return the ID (the others return 0)
- **handler** runs and deals with the device
- **handler** signals **interrupt completion**
- The details of the implementation:
- Devices are numbered 1, 2, 3, ..., 1023 (0 = none!!)
- Each core could contain more than one HART
- Each mode can be interrupted (m, s, u)
- Multiple **Targets** (from 0 to 15871)
- Each device is assigned a **priority**
- Each core ("**target**") is assigned a **Threshold**
- A device can only interrupt a core if its **priority** > the core's **threshold**
- PLIC Memory-Mapped Registers
- Device Priorities (1023 * 4 bytes)
- Enable bit matrix (1023 * 15872 bits)
- Priority Thresholds for cores (15872 * 4 bytes)
- Pending bits (1024 bits, one per device)
- **CLAIM WORDS** (15872 * 4 bytes)
- load to claim: a core loads from the address associated with its target to claim an interrupt (in short, the core loads the id of the device that sent the interrupt)
- store to complete: the core stores the id of the device whose interrupt has just been handled
- In original xv6, the **QEMU** implements the PLIC
- only two devices for IRQ
- 1: VIRTIO0_IRQ
- 10: UART0_IRQ
- Initialization:
- set priority of both devices to 1
- for each core: set the enable bit for both devices and set the core's threshold to 0 (then both devices can interrupt every core, since priority > threshold!!)
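
The claim rule described above can be illustrated with a small software model. This is a toy sketch, not the QEMU or hardware implementation: it supports only 31 sources and 2 targets, it does not model completion, and all names (`toy_plic`, `toy_plic_raise`, `toy_plic_claim`) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSRC 32  /* toy model: sources 1..31, 0 means "none" */
#define NTGT 2   /* toy model: two targets (cores) */

struct toy_plic {
    uint32_t priority[NSRC];   /* one priority per source */
    uint32_t pending;          /* bit i set: source i is pending */
    uint32_t enable[NTGT];     /* per-target enable bits, one per source */
    uint32_t threshold[NTGT];  /* per-target priority threshold */
};

/* A device raises its interrupt line. */
static void toy_plic_raise(struct toy_plic *p, unsigned src)
{
    p->pending |= 1u << src;
}

/* A target claims: returns the id of the highest-priority pending and
 * enabled source whose priority exceeds the threshold, or 0 if none.
 * The pending bit of the claimed source is cleared. */
static unsigned toy_plic_claim(struct toy_plic *p, unsigned tgt)
{
    unsigned best = 0;
    uint32_t best_prio = p->threshold[tgt];  /* must be strictly greater */
    for (unsigned src = 1; src < NSRC; src++) {
        bool ready = (p->pending & p->enable[tgt] & (1u << src)) != 0;
        if (ready && p->priority[src] > best_prio) {
            best = src;
            best_prio = p->priority[src];
        }
    }
    if (best != 0)
        p->pending &= ~(1u << best);
    return best;
}
```

With the xv6 setup (priority 1 for sources 1 and 10, threshold 0, both sources enabled on every target), raising source 10 lets exactly one target claim id 10, and a second claim from another target returns 0.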

## System Call
- The roadmap of the system call

- trace the code of **sbrk** system call
```c=
//trap.c
void usertrap(void)
{
int which_dev = 0;
if((r_sstatus() & SSTATUS_SPP) != 0)
panic("usertrap: not from user mode");
...
if(r_scause() == 8){
// system call
if(p->killed)
exit(-1);
// sepc points to the ecall instruction,
// but we want to return to the next instruction.
p->trapframe->epc += 4;
// an interrupt will change sstatus &c registers,
// so don't enable until done with those registers.
intr_on();
syscall();
...
usertrapret();
}
```
```c=
//syscall.c
void
syscall(void)
{
int num;
struct proc *p = myproc();
num = p->trapframe->a7;
if(num > 0 && num < NELEM(syscalls) && syscalls[num]) {
p->trapframe->a0 = syscalls[num]();
} else {
printf("%d %s: unknown sys call %d\n",
p->pid, p->name, num);
p->trapframe->a0 = -1;
}
}
////==============
...
extern uint64 sys_sbrk(void);
...
///===============
static uint64 (*syscalls[])(void) = {
...
[SYS_sbrk] = sys_sbrk,
...
};
//==============
static uint64 argraw(int n)
{
struct proc *p = myproc();
switch (n) {
case 0:
return p->trapframe->a0;
case 1:
return p->trapframe->a1;
case 2:
return p->trapframe->a2;
case 3:
return p->trapframe->a3;
case 4:
return p->trapframe->a4;
case 5:
return p->trapframe->a5;
}
panic("argraw");
return -1;
}
int argint(int n, int *ip)
{
*ip = argraw(n);
return 0;
}
```
```c=
//sysproc.c
uint64 sys_sbrk(void)
{
int addr;
int n;
if(argint(0, &n) < 0)
return -1;
addr = myproc()->sz;
if(growproc(n) < 0)// increase the size of memory space!
return -1;
return addr;
}
```
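
The dispatch pattern traced above (syscall number in a7, table of function pointers, result written back to a0) can be tried outside the kernel with a toy model. Everything prefixed with `toy_` is hypothetical: the register struct, the syscall numbers and the handlers are stand-ins, and `toy_sys_sbrk` only mimics the "return the old size, then grow" behaviour of sys_sbrk.

```c
#include <stddef.h>
#include <stdint.h>

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

enum { TOY_SYS_getpid = 1, TOY_SYS_sbrk = 2 };

/* Stand-in for the part of the trapframe that syscall() touches. */
struct toy_regs {
    uint32_t a0;  /* first argument in, return value out */
    uint32_t a7;  /* syscall number */
};

static uint32_t toy_sz = 4096;  /* pretend process size in bytes */

static uint32_t toy_sys_getpid(struct toy_regs *r)
{
    (void)r;
    return 42;
}

/* Return the old size and grow by the requested amount. */
static uint32_t toy_sys_sbrk(struct toy_regs *r)
{
    uint32_t old = toy_sz;
    toy_sz += r->a0;
    return old;
}

/* Table indexed by syscall number; unlisted slots are NULL. */
static uint32_t (*const toy_syscalls[])(struct toy_regs *) = {
    [TOY_SYS_getpid] = toy_sys_getpid,
    [TOY_SYS_sbrk]   = toy_sys_sbrk,
};

static void toy_syscall(struct toy_regs *r)
{
    uint32_t num = r->a7;
    if (num > 0 && num < NELEM(toy_syscalls) && toy_syscalls[num])
        r->a0 = toy_syscalls[num](r);
    else
        r->a0 = (uint32_t)-1;  /* unknown syscall */
}
```

Calling `toy_syscall` with a7 set to `TOY_SYS_sbrk` and a0 set to 100 leaves the old size (4096) in a0, while an unknown number leaves -1 in a0, mirroring the two branches of syscall().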