莊易騰
The source code for this project is available on GitHub.
You should download the Alpine Python3 RootFS and place it in the user directory (and place `fw_jump.bin` in the bootloader directory). Please refer to commits 4834d56, a2b16cf, and 95021ed.
These commits resolve compilation errors and modify the Makefile in the repository to ensure successful operation.
To complete the compilation process, you can use the `all.sh` script as follows:
$ chmod +x all.sh
$ ./all.sh
If successful, the result will look like the image below:
We can use `make run` to launch QEMU and execute the kernel:
$ make run
GDB Background
An inferior is a general term that refers to "an entity being debugged using GDB."
How to Run GDB on uCore-SMP
We can configure GDB to use an initialization file called `.gdbinit`.
To set this up, follow these steps before running GDB:
$ mkdir -p ~/.config/gdb
$ echo "add-auto-load-safe-path your_uCore-SMP_path/.gdbinit" >> ~/.config/gdb/gdbinit
Next, start the GDB server with the following command:
$ make gdb_debug
Finally, open another terminal and use the following command to connect to the GDB server:
$ riscv64-unknown-elf-gdb
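For reference, a project-local `.gdbinit` for this kind of setup typically names the kernel ELF and connects to the remote stub. The fragment below is a sketch of what such a file might contain; the kernel path is an assumption (check uCore-SMP's actual `.gdbinit`), while port 1234 is QEMU's default gdbstub port:

```gdb
# Hypothetical .gdbinit contents -- verify against the repository's own file
file build/kernel              # path to the kernel ELF (assumed)
set architecture riscv:rv64    # match the target ISA
target remote localhost:1234   # QEMU's default gdbstub port
```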
Common GDB commands
- `r`: Run the program being debugged
- `l`: Show the source code around the current execution point
- `info breakpoints`: List all breakpoints
- `d breakpoint_number`: Delete a specific breakpoint
- `n`: Execute the next line of code, stepping over function calls
- `s`: Execute the next line of code, stepping into function calls
- `c`: Resume execution until the next breakpoint
- `bt`: Display the current call stack
- `info registers`: Display the current values of CPU registers
- `q`: Exit GDB

The issue appeared in the `fu740` branch, but I discovered that I should be using the `fu740-python` branch.
When I used the `make run` command to launch QEMU, I encountered the following issue:
Using GDB, I determined that the problem originates in the `kinit` function.
To address the issue, I traced the code of the `kinit` function. The following image illustrates the flow of the `kinit` function. We can observe that the `freerange` function repeatedly calls the `recycle_physical_page` function. Within the `recycle_physical_page` function, the `memset` function is invoked to initialize each physical page.
According to the image below, we need to allocate 256 MiB of memory space, spanning from `0x80000000` to `0x90000000`.
According to the QEMU documentation, the default RAM size is 128 MiB.
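Given the 256 MiB requirement above, a likely fix is to pass an explicit memory size to QEMU. The Makefile fragment below is only a sketch; `QEMUFLAGS` is an assumed variable name, not necessarily the one used in the repository's Makefile:

```make
# QEMU's virt machine defaults to 128 MiB of RAM;
# kinit expects the 256 MiB range 0x80000000-0x90000000
QEMUFLAGS += -m 256M
```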
After resolving the issue mentioned above, I encountered another problem:
After discussing with my instructor, I attempted to utilize the QEMU monitor to resolve the issue and learn how to use it effectively.
QEMU Monitor Background
The QEMU monitor is used to give complex commands to the QEMU emulator.
How to run QEMU monitor on uCore-SMP
To use the QEMU monitor for memory inspection, I modified the uCore-SMP makefile.
You can launch the QEMU monitor by running the following command:
$ make monitor_debug
Finally, open another terminal and use the following command to connect to the QEMU monitor:
$ telnet 127.0.0.1 4444
Common commands
- `info qtree`: Show the device tree
- `xp /fmt addr`: Dump physical memory starting at addr

I used the QEMU monitor to inspect memory state and discovered that the content of the mount point consisted entirely of zeros. This led me to suspect that the issue might have been caused by the format of the drive image.
Ultimately, I identified the root cause and resolved the issue (Exploration Process). The image below demonstrates Python executing successfully on the kernel.
Exploration Process
This section explains how I resolved the `f_mount` failure issue.
Upon encountering the issue, I began investigating the functionality of the `f_mount` API and learned that it is used to mount a filesystem object to a logical volume. Additionally, I examined the meaning of `f_mount`'s return value and found that a return value of 13 indicates FR_NO_FILESYSTEM.
However, I initially lacked a clear understanding of FR_NO_FILESYSTEM, so I consulted with my instructor. My instructor suggested using the QEMU monitor to inspect the memory's status. Following this advice, I eventually discovered that the drive image's format was not as expected.
To investigate further, I examined the contents of `fs.img` using the `file` command, which identified the type of `fs.img` as "data".
user/fs.img: data
I also analyzed the contents of `riscv64-rootfs.img` with the `file` command, and the result was as follows:
user/riscv64-rootfs.img: DOS/MBR boot sector MS-MBR Windows 7 english at offset 0x163 "Invalid partition table" at offset 0x17b "Error loading operating system" at offset 0x19a "Missing operating system", disk signature 0xc7aafbf3; partition 1 : ID=0xb, active, start-CHS (0x0,32,33), end-CHS (0xc,190,50), startsector 2048, 202752 sectors
From this analysis, I concluded that `fs.img` was not usable. I then revisited the `README.md` file for further clarification. Upon reviewing it, I realized that I had been executing the kernel on the wrong branch. After switching to the correct branch, I was able to successfully launch the kernel.
Before entering the kernel's main function, `_entry.S` allocates a boot stack for each hart.
Since the code only allocates 8 stacks of 4 KB each, the number of CPUs should not exceed 8.
When the first hart begins executing the `main` function, it stores the hartid in the `tp` register and sets the static variable `first_hart` to 0.
The first hart then calls the `init_cpu` function to initialize the status of all CPUs, including setting the `core_id` and `start_cycle`.
for (int i = 0; i < NCPU; i++) {
    ...
    cpus[i].core_id = i;
    ...
    // The r_time function returns the value of the time CSR
    cpus[i].start_cycle = r_time();
}
Next, the first hart calls the `trapinit_hart` function to set stvec and enable the supervisor global interrupt by setting sstatus.SIE. It then enables the supervisor external interrupt and the supervisor software interrupt by setting sie.SEIE and sie.SSIE, respectively.
PLIC Background
The PLIC supports up to 1023 interrupt sources (source 0 is reserved) and 15872 contexts.
After executing the `trapinit_hart` function, the first hart executes the `plicinit` and `plicinithart` functions. The `plicinit` function sets the priority of interrupt 1 to 1. The `plicinithart` function enables interrupt 1 for supervisor mode and sets the supervisor's interrupt threshold to 0.
Then, the first hart executes the `kvminit` function to create the kernel page table and the `kvminithart` function to write the satp CSR, enabling virtual memory. After enabling virtual memory, the first hart executes the `sfence.vma zero, zero` instruction to flush all TLB entries.
After the first hart completes all initialization, it calls the `init_booted` function to set both static arrays, booted and halted, to all zeros:
volatile int booted[NCPU]; // whether a core is booted
volatile int halted[NCPU]; // whether a core is halted
The first hart then sets `booted[hartid]` to 1, marking itself as booted, and begins booting the other harts by calling the `start_hart` function.
booted[hartid] = 1; // Mark the first hart as booted
int CPU_START = 0; // Starting index for CPUs to boot
for (int i = CPU_START; i < NCPU; i++) {
    if (i != hartid) { // Skip the first hart (already booted)
        start_hart(i, (uint64_t)_entry, 0);
        while (booted[i] == 0) {
            // Wait until the hart has updated its booted status
        }
    }
}
OpenSBI Background
An SBI extension defines a set of SBI functions which provide a particular functionality to supervisor-mode software.
The higher-privilege software providing the SBI interface to the supervisor-mode software is referred to as an SBI implementation or Supervisor Execution Environment (SEE).
All SBI functions share a single binary encoding, which facilitates the mixing of SBI extensions. The SBI specification follows the calling convention below.
struct sbiret {
    long error;
    long value;
};
The `start_hart` function internally calls the `a_sbi_ecall` function, which uses the ecall instruction to invoke the Supervisor Execution Environment (SEE).
void start_hart(uint64 hartid, uint64 start_addr, uint64 a1) {
    a_sbi_ecall(0x48534D, 0, hartid, start_addr, a1, 0, 0, 0);
}
The value 0x48534D represents the Extension ID (EID) for the Hart State Management (HSM) extension in the RISC-V SBI.
The second argument of the `a_sbi_ecall` function is the Function ID (FID).
In the HSM extension, Function ID 0 corresponds to the `sbi_hart_start` function. The `sbi_hart_start` function requests the SBI implementation to start executing the target hart in supervisor mode.
struct sbiret sbi_hart_start(
    unsigned long hartid,
    unsigned long start_addr,
    unsigned long opaque
)
After the first hart completes its initialization, the other harts execute the `hart_bootcamp` function as follows:
void hart_bootcamp(uint64 hartid, uint64 a1) {
    w_tp(hartid);       // Write hart ID to the thread pointer
    kvminithart();      // Initialize kernel virtual memory for this hart
    trapinit_hart();    // Set up trap handling for this hart
    plicinithart();     // Configure the PLIC for this hart
    booted[hartid] = 1; // Mark this hart as booted
}
Once each hart (besides the first) has executed the `hart_bootcamp` function, it enters a `while` loop to wait for all harts to complete booting:
while (!all_started) {
    ; // Busy wait until all harts have started
}
When the first hart completes booting all the other harts, it executes the `wait_all_boot` function to signal that all harts have started by setting all_started to 1.
void wait_all_boot() {
    ...
    all_started = 1;
}
Once all harts have been booted and the all_started flag is set to 1 by the first hart, all harts proceed to execute the `scheduler` function.
This content is referenced from fw.md.
OpenSBI currently supports three different types of firmware:
- FW_PAYLOAD: firmware with the next-stage payload bundled in
- FW_JUMP: firmware with a fixed jump address to the next stage
- FW_DYNAMIC: firmware with dynamic information about the next stage

The uCore-SMP utilizes the FW_JUMP firmware, so I will explain the FW_JUMP flow.
At the beginning, the `_start` function determines which hart will execute the `_relocate` function. Assume that hart 0 executes the amoadd.w instruction first. As shown in the image below, hart 0 reads a zero value from memory, while the other harts read a non-zero value. Therefore, hart 0 is designated to execute the `_relocate` function.
The next image illustrates the execution flow for harts other than hart 0 (the boot core).
These non-boot cores will wait for the boot core to complete relocation and initialization before proceeding.
The following image displays the execution flow for hart 0 (the boot core).
After one hart executes `_start_warm`, it calls the `sbi_init` function.
In the `sbi_init` function, a lottery mechanism is used to select a coldboot hart from all available harts. Only one hart will execute `init_coldboot`, while the other harts will execute `init_warmboot`.
if (next_mode_supported && atomic_xchg(&coldboot_lottery, 1) == 0)
    coldboot = TRUE;
if (coldboot)
    init_coldboot(scratch, hartid);
else
    init_warmboot(scratch, hartid);
HSM Extension Background
The Hart State Management (HSM) Extension introduces a set of hart states and a set of functions which allow the supervisor-mode software to request a hart state change.
The OS kernel manages all processes and determines which process each hart will execute by scheduling them using a specific algorithm.
The uCore-SMP kernel uses a structure called proc to record the state of each process.
struct proc {
struct spinlock lock;
enum procstate state;
int pid;
...
struct context context;
...
uint64 stride;
uint64 priority;
...
};
We can see that there are two variables, stride and priority, in the proc structure. The uCore-SMP primarily uses these two variables to determine which process will be executed, following an algorithm called Stride Scheduling.
Scheduling Metrics
To compare different scheduling policies effectively, we need a scheduling metric.
The turnaround time of a job is defined as the difference between the time the job completes and the time it arrives in the system:
T_turnaround = T_completion - T_arrival
While turnaround time serves as a performance metric, scheduling algorithms can also be evaluated in terms of fairness. However, performance and fairness often conflict, making it challenging to optimize both simultaneously.
The advent of time-shared systems introduced a new metric called response time. Response time is defined as the duration between a job's arrival in the system and the first time it is scheduled to run:
T_response = T_firstrun - T_arrival
Proportional-Share Background
The proportional-share scheduler, also known as a fair-share scheduler, is built around a straightforward concept: instead of optimizing for metrics like turnaround time or response time, this scheduler focuses on ensuring that each job receives a specific percentage of CPU time.
Stride scheduling is a deterministic fair-share scheduler that is also straightforward to implement. Each job in the system is assigned a stride, which is inversely proportional to the number of tickets it holds. The stride for each job is calculated by dividing a large constant value by the number of tickets assigned to it.
The pseudocode for stride scheduling is shown below:
curr = remove_min(queue);
schedule(curr);
curr->pass += curr->stride;
insert(queue, curr);
Assume we have three jobs, and their strides are illustrated in the image below:
The uCore-SMP kernel uses the following code to find the process with the smallest stride:
for (struct proc *p = pool; p < &pool[NPROC]; p++) {
    if (p->state == RUNNABLE && !p->lock.locked) {
        if (p->stride < min_stride) {
            min_stride = p->stride;
            next_proc = p;
        }
    }
}
Once the scheduler identifies the process with the smallest stride, it updates `next_proc`'s stride using the following code:
uint64 pass = BIGSTRIDE / (next_proc->priority);
next_proc->stride += pass;
In the current implementation of uCore-SMP, a process's priority is fixed at 16, resulting in all processes having the same `pass` value, calculated as:
uint64 pass = BIGSTRIDE / 16;
Since the `pass` value is identical for all processes, the stride adjustment does not account for variations in priority, causing uniform progression of `stride` values across all processes. This effectively makes the scheduler behave as though all processes have equal priority.
In the uCore-SMP implementation, while the default priority is fixed at 16, the system includes a system call, `sys_setpriority`, which allows processes to dynamically adjust their priority. The function is defined as follows:
/**
* @brief Set the priority of the current process
*
* @param priority The desired priority value (must be >= 2).
* @return int64 Returns the set priority value if successful, or -1 if the input is invalid.
*/
int64 sys_setpriority(int64 priority) {
    if (2 <= priority) {
        struct proc *p = curr_proc();
        acquire(&p->lock);
        p->priority = priority;
        release(&p->lock);
        return priority;
    }
    return -1;
}
References
- GCC manual: 6.1 Statements and Declarations in Expressions
- What does 'inferior' mean in the term 'inferior debugger'?
- GDB manual: 2.1.3 What GDB Does During Startup
- GDB manual: 2.1.4 Initialization Files