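Linux Kernel Project: Building a RISC-V Compatible Processor and Running the Linux Kernel

Contributor: millaker
Walkthrough video: https://youtu.be/wsnKy-woxdQ

Task description

Reproduce the 2023 experiment, switch the main ISA to RV32IMA, upgrade to Linux v6.1, and confirm that specific hardware peripherals (such as the NIC) work correctly.
Original project: RISC-V SoC Architecture Design from Scratch and Running the Linux Kernel, Hardware Part (從零開始的RISC-V SoC架構設計與Linux核心運行 - 硬體篇)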

JTAG

OpenSBI

SBI (Supervisor Binary Interface) is a RISC-V specific term. It acts as the bridge between software running in supervisor mode and the underlying SEE (Supervisor Execution Environment). OpenSBI is one of the open-source SBI implementations. A list of other implementations can be found in the RISC-V SBI documentation.

Qemu

QEMU provides an emulated 32-bit RISC-V CPU that can be used to test our software and drivers. To build QEMU for this target, configure the build with --target-list=riscv32-softmmu.

riscv-gnu-toolchain

To build binaries for our target ISA, we need a cross-compiler toolchain. riscv-gnu-toolchain hosts a suite of tools we can use, including gcc and objdump. Configure the build with --with-arch=rv32ima_zicsr_zifencei --with-abi=ilp32.

zicsr and zifencei were separated from the I extension and must be specified explicitly.

Bootloader

Wikipedia
The first-stage bootloader is typically stored in ROM (BootROM). It initializes the platform and loads the next-stage bootloader. In my case, the PYNQ-Z2 development board initializes the board peripherals, including DDR, when booting the ARM core, so there is no need to initialize DDR myself. One thing that concerns me is that in the previous implementation the second-stage bootloader is loaded into "System SRAM", a dedicated memory space used only for the boot sequence. The author wanted to follow the boot sequence of an embedded ARM core, but in my opinion this is just a waste of BRAM on the chip.

TODO: Reproduce on PYNQ-Z2: AMP (Arm + RISC-V)

Running Linux on an FPGA-based RISC-V processor

Obtain previous project file

Since the Vivado project file is available, I'll use the same project file to avoid trivial tasks.

Open the project using Vivado 2022.2

Three AMD/Xilinx proprietary IPs used in this project (AXI Smartconnect, AXI_APB Bridge, and AXI Interconnect) require an update. I expect the behaviour of the IPs not to change after the revision, so no modification is needed. Because of this occurrence, I'm considering writing my own AXI interconnect and bridge.

After some inspection, I realized that these IPs are used to connect the CPU AXI ports to the PS. The interconnects can be omitted if the interface naming follows the Xilinx naming convention.

The rest is the EDA tool's job:

Synthesis
Implementation (Place and Route)
Generate bitstream

Implementation results

Resource	Utilization	Available	Utilization %
LUT	40839	53200	76.77
LUTRAM	879	17400	5.05
FF	27078	106400	25.45
BRAM	45.50	140	32.50
DSP	36	220	16.36
IO	17	125	13.60
BUFG	5	32	15.63

Placement is not my concern at the moment, so I simply ignore the physical implementation details. From the table above, we can see that the FPGA's LUTs and FFs are not fully utilized, which leaves some room for hardware improvements. BRAM, the FPGA's on-chip SRAM equivalent, is underutilized as well, which means a bigger cache is possible. Since I don't know the implementation parameters used for this CPU yet, I'll come back to this once I figure them out.

Analyze the worst negative slack path and look for possible improvements.

Generate bitstream

The previous work uses Generate Hardware Platform to obtain the bitstream from the .xsa file, which, according to the article, can be opened by any archive software. I was curious about the .xsa file and found this thorough explanation. The .xsa file contains:

Hardware descriptions (xml)
Bitstream
Tcl script to rebuild the block design

I'm only interested in the bitstream, so running Generate Bitstream gives the same output.

Prepare SD card

The system sees SD card as its secondary storage, and two partitions must be provided for the system to run. The first partition stores the bootloader and the operating system, the second partition stores rootfs.

On Linux, use lsblk to list block devices

RISC-V_SoC/src/cpu$ lsblk
NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
...
sda                         8:0    1    29G  0 disk
├─sda1                      8:1    1    10G  0 part
└─sda2                      8:2    1    19G  0 part
nvme0n1                   259:0    0 931.5G  0 disk
├─nvme0n1p1               259:1    0     1G  0 part /boot/efi
├─nvme0n1p2               259:2    0     2G  0 part /boot
└─nvme0n1p3               259:3    0 928.5G  0 part
  └─ubuntu--vg-ubuntu--lv 253:0    0 928.5G  0 lvm  ...        /

The device called sda is the SD card connected to my computer. Since I got the SD card from the previous author, it is already populated with the required files.

Device     Boot    Start      End  Sectors Size Id Type
/dev/sda1           2048 20973567 20971520  10G  c W95 FAT32 (LBA)
/dev/sda2       20973568 60751871 39778304  19G 83 Linux

We can distinguish the partitions by inspecting the Type column.

I'll come back to this later when I'm switching the bootloader or the operating system.

Prepare booting PYNQ-Z2

I followed the official instructions on how to boot the arm core.

Adjust boot option to SD card instead of JTAG
Switch to USB mode // Not precise
Download the official image onto the SD card and plug it in. Note that the SD card is not the same as the one previously mentioned.
Connect power cable.
Connect ethernet cable.

After setting up the board correctly, two serial devices show up on my computer.

$ sudo dmesg | grep USB
[1068003.239506] ftdi_sio 5-2:1.0: FTDI USB Serial Device converter detected
[1068003.240168] usb 5-2: FTDI USB Serial Device converter now attached to ttyUSB0
[1068003.240877] ftdi_sio 5-2:1.1: FTDI USB Serial Device converter detected
[1068003.241528] usb 5-2: FTDI USB Serial Device converter now attached to ttyUSB1

Open the second serial device, /dev/ttyUSB1, to reach the PYNQ Linux shell.

$ screen /dev/ttyUSB1 115200
xilinx@pynq:~$

I don't know what is transmitted or received on the first serial device. I'll check the manual later.

TODO: Check Zynq-7000 series manual for ttyUSB0 and ttyUSB1.

PYNQ overlay

A PYNQ overlay gives us an easy way to program the FPGA with our own bitstream. We can also access the AXI system bus from the PS to verify the peripherals connected to the bus.

The original author chose this approach to debug the peripherals, including JTAG, the SD card, …

The first time I called Overlay('soc_xsa.bit') to download the bitstream onto the FPGA, Python raised an error involving PortType.

Few people have reported this error, and I had no idea what was wrong with PortType. I found this thread and this thread, which describe similar problems with slight differences. User stf's answer gave me a hint: the pynqmetadata package used by the script had been updated and was not compatible with bitstreams from mismatched versions. So I downgraded the PYNQ prebuilt image from 3.0 to 2.7, and luckily the bitstream worked this time.

Testing peripherals on bus

Now that I can download a bitstream to the PL, tests can be run through the PYNQ overlay. The overall address map of the system is listed below:

Name	Size	Address	End
BootROM	8KB	0x00000000	0x00001FFF
Reserved	~~120KB~~	0x00002000	0x0001FFFF
SystemRAM	128KB	0x00020000	0x0003FFFF
Reserved	~~768KB~~	0x00040000	0x03FFFFFF
CPU Config	4KB	0x04000000	0x04000FFF
Reserved	~~4KB~~	0x04001000	0x04001FFF
Debug Monitor	8KB	0x04002000	0x04003FFF
Reserved	KB	0x04004000	0x07FFFFFF
CLINT	64KB	0x08000000	0x0800FFFF
Reserved	KB	0x08010000	0x0BFFFFFF
PLIC	64MB	0x0C000000	0x0FFFFFFF
UART	4KB	0x10000000	0x10000FFF
SPI	4KB	0x10001000	0x10001FFF
MAC	4KB	0x10002000	0x10002FFF
Reserved	KB	0x10003000	0x7FFFFFFF
DRAM	512 MB	0x80000000	0x9FFFFFFF
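As a quick cross-check of the map above, the following Python sketch looks up which region an address falls in and recomputes region sizes from the start and end addresses. The region list is copied from the table; the helper names region_of and size_of are made up for illustration and are not part of the project.

```python
# Non-reserved regions copied from the address map: (name, start, end inclusive).
REGIONS = [
    ("BootROM",       0x00000000, 0x00001FFF),
    ("SystemRAM",     0x00020000, 0x0003FFFF),
    ("CPU Config",    0x04000000, 0x04000FFF),
    ("Debug Monitor", 0x04002000, 0x04003FFF),
    ("CLINT",         0x08000000, 0x0800FFFF),
    ("PLIC",          0x0C000000, 0x0FFFFFFF),
    ("UART",          0x10000000, 0x10000FFF),
    ("SPI",           0x10001000, 0x10001FFF),
    ("MAC",           0x10002000, 0x10002FFF),
    ("DRAM",          0x80000000, 0x9FFFFFFF),
]


def region_of(addr):
    """Return the name of the region containing addr, or 'Reserved'."""
    for name, start, end in REGIONS:
        if start <= addr <= end:
            return name
    return "Reserved"


def size_of(name):
    """Return the size in bytes of the named region."""
    for region, start, end in REGIONS:
        if region == name:
            return end - start + 1
    raise KeyError(name)


print(region_of(0x10000004))             # an address inside the UART window
print(size_of("PLIC") // (1024 * 1024))  # PLIC size in MB
```

The same list can be reused when writing the device tree or checking that an MMIO access from the overlay targets the intended peripheral.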

Download bitstream

Use Overlay() to parse bitstream and download to PL.

from pynq import Overlay
#riscv = Overlay("./soc_xsa.bit")
riscv = Overlay("./soc_wrapper.bit")

Testing BootROM, System SRAM

Use MMIO() to write/read data to/from the AXI bus.
Testing method:

Check the initial value of the BootROM
Write arbitrary data to it
Check the value again

Before Download ROM code
Read 00000000:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
Download ROM
After Download ROM code
Read 00000000:
00 00 00 00 01 01 01 01 02 02 02 02 03 03 03 03 
04 04 04 04 05 05 05 05 06 06 06 06 07 07 07 07 
08 08 08 08 09 09 09 09 0a 0a 0a 0a 0b 0b 0b 0b 
0c 0c 0c 0c 0d 0d 0d 0d 0e 0e 0e 0e 0f 0f 0f 0f 
10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 
14 14 14 14 15 15 15 15 16 16 16 16 17 17 17 17 
18 18 18 18 19 19 19 19 1a 1a 1a 1a 1b 1b 1b 1b 
1c 1c 1c 1c 1d 1d 1d 1d 1e 1e 1e 1e 1f 1f 1f 1f

The results show that the memory and the bus both work as expected. The test for System SRAM on the bus is identical: set the base address to the System SRAM region and modify its contents, which results in:

First read:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

Second read:
00 00 00 00 01 01 01 01 02 02 02 02 03 03 03 03 
04 04 04 04 05 05 05 05 06 06 06 06 07 07 07 07 
08 08 08 08 09 09 09 09 0a 0a 0a 0a 0b 0b 0b 0b 
0c 0c 0c 0c 0d 0d 0d 0d 0e 0e 0e 0e 0f 0f 0f 0f 
10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 
14 14 14 14 15 15 15 15 16 16 16 16 17 17 17 17 
18 18 18 18 19 19 19 19 1a 1a 1a 1a 1b 1b 1b 1b 
1c 1c 1c 1c 1d 1d 1d 1d 1e 1e 1e 1e 1f 1f 1f 1f
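The write and read-back test used for both memories can be sketched as follows. Since the real test needs the board, this sketch substitutes a hypothetical FakeMMIO class that only mimics the read(offset) and write(offset, data) style of access; on the board, the object would come from pynq's MMIO instead. The test pattern (word i holds the byte value i repeated four times) is inferred from the dumps above.

```python
class FakeMMIO:
    """Hypothetical stand-in for a memory-mapped window (not the pynq class)."""

    def __init__(self, base_addr, length):
        self.base_addr = base_addr
        self.length = length
        self._words = [0] * (length // 4)

    def read(self, offset):
        return self._words[offset // 4]

    def write(self, offset, value):
        self._words[offset // 4] = value & 0xFFFFFFFF


def pattern_word(i):
    """Word i of the test pattern: the byte value i repeated four times."""
    return (i & 0xFF) * 0x01010101


def hexdump(mem, nbytes):
    """Return 16-byte rows formatted like the dumps in this section."""
    rows = []
    for row in range(0, nbytes, 16):
        data = b"".join(
            mem.read(row + off).to_bytes(4, "little") for off in range(0, 16, 4)
        )
        rows.append(" ".join(f"{b:02x}" for b in data))
    return rows


def write_and_verify(mem, nwords):
    """Write the pattern, then read every word back and compare."""
    for i in range(nwords):
        mem.write(i * 4, pattern_word(i))
    return all(mem.read(i * 4) == pattern_word(i) for i in range(nwords))


mem = FakeMMIO(0x00000000, 8 * 1024)  # BootROM window: base 0, 8 KB
before = hexdump(mem, 32)
ok = write_and_verify(mem, 32)
after = hexdump(mem, 32)
print("\n".join(after))
```

For the System SRAM test, only the base address and length passed to the MMIO object would change.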

Testing UART

UART requires some configuration like base address, baud rate.

TODO: Initializing details, will be useful when writing driver.

# Enable UART
riscv_base.write(UART_TXCTRL, 1)
riscv_base.write(UART_RXCTRL, 1)

#set baud rate 115200
riscv_base.write(UART_DIV, 1000000000//CLK_PERIOD//115200)

for i in "Hello".encode():
    riscv_base.write(UART_TXFIFO, i)

Connect a uart-usb converter to the TX, RX, VDD, GND pin on the board and monitor the output.
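The divider written to UART_DIV above can be checked offline. This sketch assumes, based on the formula in the script, that CLK_PERIOD is the bus clock period in nanoseconds and that the register holds the number of clock cycles per bit. The 40 ns period is only an example value, not the actual clock of this design.

```python
def uart_divisor(clk_period_ns, baud):
    """Clock cycles per UART bit, using the same integer math as the script."""
    return 1_000_000_000 // clk_period_ns // baud


def actual_baud(clk_period_ns, divisor):
    """Baud rate actually produced by an integer divisor."""
    return 1e9 / clk_period_ns / divisor


div = uart_divisor(40, 115200)   # hypothetical 25 MHz clock (40 ns period)
rate = actual_baud(40, div)
error = abs(rate - 115200) / 115200
print(div, round(rate, 1), f"{error:.4%}")
```

Because the divisor is truncated to an integer, the resulting baud rate is slightly off the target; the relative error shows whether it stays within what a UART receiver typically tolerates.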

Testing SPI sd card reader

What is SPI
Initializing Details
How does SD card reader work?

The SPI control register is defined as follows:

Bit Position	Field Name	Description
31:17	15'b0	Reserved bits (write with zeros)
16	spi_cr1_del	Delay
15	spi_cr1_bidimode	Bidirectional data mode enable
14	spi_cr1_bidioe	Output enable in bidirectional mode
13	spi_cr1_crcen	Hardware CRC calculation enable
12	spi_cr1_crcnext	CRC transfer next
11	spi_cr1_dff	Data frame format (0: 8-bit, 1: 16-bit)
10	spi_cr1_rxonly	Receive only
9	spi_cr1_ssm	Software slave management
8	spi_cr1_ssi	Internal slave select
7	spi_cr1_lsbfirst	Frame format (0: MSB first, 1: LSB first)
6	spi_cr1_spe	SPI enable
5:3	spi_cr1_br	Baud rate control
2	spi_cr1_mstr	Master selection (0: Slave, 1: Master)
1	spi_cr1_cpol	Clock polarity (0: CK to 0 when idle, 1: CK to 1 when idle)
0	spi_cr1_cpha	Clock phase (0: First clock transition, 1: Second clock transition)

After initializing the SPI module, we must switch the SD card to SPI mode by sending the appropriate commands. Then we can read sector data from the SD card.

Output:
SPI_CR1 = 0x0000007c
[DBG] buff = 000001aa
[DBG] acmd41_r1 = 0x00000001
[DBG] acmd41_r1 = 0x00000000
[DBG] cmd58_r1 = 0x00000000
[DBG] buff = c0ff8000
SPI_CR1 = 0x0000004c
[DBG] r1 = 0x00000000
[DBG] r1 = 0x00000000
SD STATE: STAT_INIT_OK
[SD_STATE] STAT_INIT_OK
[SD_TYPE] SD_SDHC
Read Success
00000000: fa b8 00 10 8e d0 bc 00 b0 b8 00 00 8e d8 8e c0 ................
00000010: fb be 00 7c bf 00 06 b9 00 02 f3 a4 ea 21 06 00 ...|.........!..
00000020: 00 be be 07 38 04 75 0b 83 c6 10 81 fe fe 07 75 ....8.u........u
00000030: f3 eb 16 b4 02 b0 01 bb 00 7c b2 80 8a 74 01 8b .........|...t..
00000040: 4c 02 cd 13 ea 00 7c 00 00 eb fe 00 00 00 00 00 L.....|.........
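The two SPI_CR1 values in the log can be decoded with a small helper. This sketch assumes the STM32-style layout with cpha at bit 0, cpol at bit 1, mstr at bit 2, br at bits 5:3, spe at bit 6, and del at bit 16, which is the only assignment that leaves exactly 15 reserved bits at the top of a 32-bit register. The layout and the function name decode_cr1 are assumptions for illustration, not taken from the RTL.

```python
# (field name, least significant bit, width), assuming an STM32-style layout.
CR1_FIELDS = [
    ("cpha", 0, 1), ("cpol", 1, 1), ("mstr", 2, 1), ("br", 3, 3),
    ("spe", 6, 1), ("lsbfirst", 7, 1), ("ssi", 8, 1), ("ssm", 9, 1),
    ("rxonly", 10, 1), ("dff", 11, 1), ("crcnext", 12, 1), ("crcen", 13, 1),
    ("bidioe", 14, 1), ("bidimode", 15, 1), ("del", 16, 1),
]


def decode_cr1(value):
    """Split a 32-bit SPI_CR1 value into its named fields."""
    return {
        name: (value >> lsb) & ((1 << width) - 1)
        for name, lsb, width in CR1_FIELDS
    }


def nonzero_fields(value):
    """Keep only the fields that are set, for compact printing."""
    return {k: v for k, v in decode_cr1(value).items() if v}


print(nonzero_fields(0x7C))  # value printed before SD card initialization
print(nonzero_fields(0x4C))  # value printed after initialization
```

Under this layout both values select master mode with the SPI enabled, and only the br field changes between them, which would be consistent with the common practice of initializing an SD card at a slow clock and switching to a faster one afterwards.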

Download boot code to BootROM

The BootROM in the previous design is actually a readable and writable memory sitting on the bus, presumably for faster development and debugging. As a result, the ROM is empty after the PL is programmed, and an additional step is needed to load its contents manually.

Use the existing debugger provided by the author to download the boot file to ROM.

The debugger is written in C# and is a Windows-only application, so I planned to write a Linux version of it.

I couldn't get the debugger to work. It detects the JTAG-USB device but reports a "Get input buffer timeout" error. I don't know how the debugger works, so I left this issue unsolved and moved on to the next approach for now.

Since the BootROM is on the AXI bus, I can write data to it from the Python overlay. I uploaded rom.bin to the SD card and read it from Python.

# Read rom.bin and split it into 4-byte chunks (8 hex digits each)
def read_file_as_hex(file_path):
    with open(file_path, 'rb') as file:
        hex_content = file.read().hex()
    return [hex_content[i:i+8] for i in range(0, len(hex_content), 8)]

# Convert one hex chunk to the 32-bit word to write on the bus
def hexstr_to_int(s):
    # Pad a short trailing chunk with zero bytes
    s = s.ljust(8, '0')
    # Reverse the byte order: the file bytes are little-endian,
    # while int(s, 16) reads the string as big-endian
    s = s[6:8] + s[4:6] + s[2:4] + s[0:2]
    return int(s, 16)

boot = read_file_as_hex('rom.bin')

# riscv_base: MMIO handle for the RISC-V bus, created earlier in the notebook
# (assumed to map the BootROM at offset 0)
print('Write rom.bin to ROM')
for i, b in enumerate(boot):
    riscv_base.write(i * 4, hexstr_to_int(b))
print('Write done')

Then start the CPU the same way the debugger does.

The RISC-V UART output shows that the boot code is actually running: bbl and vmlinux are loaded to System SRAM and ready for execution. However, the program then got stuck and produced no further output. I have no idea what bbl did or why it got stuck, so I'll first study the bbl and vmlinux boot sequence, then try to compile them and run them with QEMU.

Next, I tried the original bbl and vmlinux from Google Drive.

I followed commands from the previous work to partition the SD card and format the two partitions.

Use fdisk to create two new partitions.

$ sudo fdisk -l /dev/sda
Disk /dev/sda: 28.97 GiB, 31104958464 bytes, 60751872 sectors
Disk model: Storage Device
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc1685bee

Device     Boot  Start      End  Sectors  Size Id Type
/dev/sda1         2048   104447   102400   50M  c W95 FAT32 (LBA)
/dev/sda2       104448 60751871 60647424 28.9G 83 Linux

Use mkfs to format the partitions into FAT32 and ext3.
Copy bbl and vmlinux into partition 1 and the content of rootfs into partition 2.

This time bbl never got loaded correctly and resulted in an infinite loop inside the zero stage boot loader.

[BROM] UART init done
[BROM] HW ver: 20230508
[BROM] SD card init
[BROM] FAT BPB init
[BROM] load bbl
[BROM] File not found

At first I thought something had gone wrong with the fdisk command, until I read partition 1 of the SD card from the Python overlay and got the following content:

00100000: eb 58 90 6d 6b 66 73 2e 66 61 74 00 02 01 20 00 .X.mkfs.fat... .
00100010: 02 00 00 00 00 f8 00 00 20 00 40 00 00 08 00 00 ........ .@.....
00100020: 00 90 01 00 14 03 00 00 00 00 00 00 02 00 00 00 ................
00100030: 01 00 06 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00100040: 80 01 29 7e aa a0 7a 4e 4f 20 4e 41 4d 45 20 20 ..)~..zNO NAME  
00100050: 20 20 46 41 54 33 32 20 20 20 0e 1f be 77 7c ac   FAT16   ...w|.
00100060: 22 c0 74 0b 56 b4 0e bb 07 00 cd 10 5e eb f0 32 ".t.V.......^..2
00100070: e4 cd 16 cd 19 eb fe 54 68 69 73 20 69 73 20 6e .......This is n
00100080: 6f 74 20 61 20 62 6f 6f 74 61 62 6c 65 20 64 69 ot a bootable di
00100090: 73 6b 2e 20 20 50 6c 65 61 73 65 20 69 6e 73 65 sk.  Please inse
001000a0: 72 74 20 61 20 62 6f 6f 74 61 62 6c 65 20 66 6c rt a bootable fl
001000b0: 6f 70 70 79 20 61 6e 64 0d 0a 70 72 65 73 73 20 oppy and..press 
001000c0: 61 6e 79 20 6b 65 79 20 74 6f 20 74 72 79 20 61 any key to try a
001000d0: 67 61 69 6e 20 2e 2e 2e 20 0d 0a 00 00 00 00 00 gain ... .......

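The FAT16 string shows that the file system is FAT16, which does not match the FAT32 partition type set with fdisk. fdisk only writes the partition type into the partition table; the file system itself is created by mkfs.fat, which picks the FAT size automatically unless told otherwise. According to the mkfs.fat(8) manual page, the -F 32 option selects the FAT size explicitly. After reformatting with -F 32, it works.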

Now that I've reproduced the previous work, several questions about this design came to mind.

I don't know how rom.bin, bbl, and vmlinux were built from source.
I don't know exactly what each bootloader does.
Why is System SRAM needed in the current design?
The FPGA soft core shares the same physical memory with the on-chip ARM core. However, the ARM core does not know of its existence and accesses corrupted DDR. Is there any way to solve this problem?
The bitstream I synthesized myself does not work the way last year's pre-built bitstream does:

[BROM] UART init done
[BROM] HW ver: 20230701
[BROM] SD card init
[BROM] FAT BPB init
[BROM] FAT BPB init fail

The first two questions can be answered by reading the source code. The third requires a hardware modification if the SRAM is removed.

Build `rom.bin` from source

If I compile the ROM code in /rom with the RISC-V GNU cross compiler, I get the link error shown below:

/riscv-64/lib/gcc/riscv64-unknown-elf/13.2.0/../../../../riscv64-unknown-elf/bin/ld: main section `.rodata.str1.8' will not fit in region `brom'
/riscv-64/lib/gcc/riscv64-unknown-elf/13.2.0/../../../../riscv64-unknown-elf/bin/ld: region `brom' overflowed by 278 bytes
/riscv-64/lib/gcc/riscv64-unknown-elf/13.2.0/../../../../riscv64-unknown-elf/bin/ld: warning: main has a LOAD segment with RWX permissions
collect2: error: ld returned 1 exit status

The BootROM has a maximum size of 8 KB, but the binary exceeds that limit. So I shortened some of the debug messages and successfully built the binary. However, I couldn't boot from the rom.bin I had just built.

When analyzing what setup.S does, I realized that there are #ifdef guards around UART initialization and cache initialization.

#ifdef ENABLE_UART_IRQ
    li t0, 0x10000000;
    li t1, 7;
    sw t1, 0x10(t0);
#endif

These macros are not defined in the original Makefile. After adding this line
DEFINE := -DENABLE_UART_IRQ -DENABLE_CACHE -DREAL_ROM
rom.bin worked.

bbl study

The current boot flow requires the ROM code to set up the UART and the SPI SD card reader, and to load bbl and vmlinux to fixed locations. Given that, I was skeptical about the purpose of bbl (Berkeley Boot Loader) if vmlinux is already loaded in DDR. So I dived into the bbl source code to understand what I was missing.

The entry point of bbl can be found in the linker script bbl.lds

ENTRY( reset_vector )

which is located in machine/mentry.S. reset_vector simply resets all the registers and mscratch to zero. The machine mode trap handler is set to a bbl trap_vector with the following sequence

  # write mtvec and make sure it sticks
  la t0, trap_vector
  csrw mtvec, t0
  csrr t1, mtvec
1:bne t0, t1, 1b

After all the reset, init_first_hart() is called. init_first_hart() in brief:

Set up the console as soon as possible with query_uart() etc.
init_hart()
1. mstatus_init(): Set mstatus for the available extensions, such as the F and V extensions.
2. fp_init(): Set up the floating-point control registers if the F extension is supported.
3. delegate_trap(): If supervisor mode is supported, send S-mode interrupts and exceptions to S-mode instead of M-mode by setting the mideleg and medeleg CSRs.
4. setup_pmp(): Set up PMP to allow access to all memory locations. (Details to be figured out…)
A series of query_* functions that search the device tree blob (dtb) for information about our platform hardware.

query_finisher(dtb);
query_mem(dtb);
query_harts(dtb);
query_clint(dtb);
query_plic(dtb);
query_chosen(dtb);

Wake up all the harts. In my case, there is only one.
Init memory, per hart plic according to the info we got from query.
bootloader():
1. filter_dtb() : Place dtb right after the kernel code.
2. Set kernel entry point
3. boot_other_hart(): Get the entry point and enter supervisor mode with that entry. When looking at boot_other_hart(), I found something I couldn't comprehend:

void boot_other_hart(uintptr_t unused __attribute__((unused)))
{
  const void* entry;
  do {
    entry = entry_point;
    mb();
  } while (!entry);

  long hartid = read_csr(mhartid);
  if ((1 << hartid) & disabled_hart_mask) {
    while (1) {
      __asm__ volatile("wfi");
#ifdef __riscv_div
      __asm__ volatile("div x0, x0, x0");
#endif
    }
  }

#ifdef BBL_BOOT_MACHINE
  enter_machine_mode(entry, hartid, dtb_output());
#else /* Run bbl in supervisor mode */
  protect_memory();
  enter_supervisor_mode(entry, hartid, dtb_output());
#endif
}

boot_other_hart() takes one uintptr_t argument, but it is named unused and carries the compiler attribute unused. The argument really is unused, and I don't know what its purpose is. The GNU extension documentation says:

unused
This attribute, attached to a variable, means that the variable is meant to be possibly unused. GCC does not produce a warning for this variable.

The attribute has no effect other than suppressing compiler warnings about this variable.

Note the BBL_BOOT_MACHINE guard at line 19. It is used with the RISC-V proxy kernel (pk), which uses bbl as a fake machine, so I can ignore that part.

In conclusion, the zero-stage boot loader rom.bin and bbl do almost the same thing, so they can be reduced to a more minimal boot code. If possible, bbl should be replaced with OpenSBI for better portability and a more capable boot loader.

Shared physical memory problem

In the previous work, the same physical memory is shared between the two operating systems, and I found no evidence that the ARM core is turned off. Therefore, to let both operating systems coexist, the easiest approach is to assign each OS its own DDR region through the dtb, until I learn a better way to deal with this issue. I've gone through multiple official documents and found no way to completely stop the PS (ARM core).

TODO: Change the instruction set from RV64 to RV32

Hardware RTL analysis

Memory mapping

Name	Size	Address	End
BootROM	8KB	0x00000000	0x00001FFF
Reserved	~~120KB~~	0x00002000	0x0001FFFF
SystemRAM	128KB	0x00020000	0x0003FFFF
Reserved	~~768KB~~	0x00040000	0x03FFFFFF
CPU Config	4KB	0x04000000	0x04000FFF
Reserved	~~4KB~~	0x04001000	0x04001FFF
Debug Monitor	8KB	0x04002000	0x04003FFF
Reserved	KB	0x04004000	0x07FFFFFF
CLINT	64KB	0x08000000	0x0800FFFF
Reserved	KB	0x08010000	0x0BFFFFFF
PLIC	64MB	0x0C000000	0x0FFFFFFF
UART	4KB	0x10000000	0x10000FFF
SPI	4KB	0x10001000	0x10001FFF
MAC	4KB	0x10002000	0x10002FFF
Reserved	KB	0x10003000	0x7FFFFFFF
DRAM	512 MB	0x80000000	0x9FFFFFFF

UART controller

The address mapping

`define UART_TXFIFO 12'h00 
`define UART_RXFIFO 12'h04
`define UART_TXCTRL 12'h08
`define UART_RXCTRL 12'h0C
`define UART_IE     12'h10
`define UART_IP     12'h14
`define UART_IC     12'h18
`define UART_DIV    12'h1C
`define UART_LCR    12'h20

TXFIFO: Write to send data, read to check if TX FIFO is full
Read:

Write:
RXFIFO: Read to receive data
Read :

Write: Not defined
TXCTRL:
RXCTRL

cnt field is the threshold for almost empty and almost full for TX and RX
IE: Interrupt enable, Enable TX, RX, Error interrupts
IP: Interrupt pending, Indicates pending TX, RX, Error interrupts
IC: Interrupt clear, Write to clear corresponding interrupt
DIV: Baudrate divider, defaults to 115200 baudrate, can be modified by writing to this register
LCR: Line control register; controls whether parity is used. The current UART controller provides no way to configure stop bits or data length.

PLIC (Platform level interrupt controller)

Memory mapping:

Name	Offset	Description
Priority	0x0000000	Zero: Never interrupt
Pending	0x0001000	Current status of the interrupt, 1 bit per source
Enable	0x0002000	Decides if the interrupt source is enabled
Priority threshold	0x0200000	Priority threshold for each context (hart)
Claim/Clear	0x0200000 + 0x4	Claim or Clear interrupt for each context (hart)

According to PLIC spec:
Context
Interrupt targets are usually hart contexts, where a hart context is a given privilege mode on a given hart (though there are other possible interrupt targets, such as DMA engines).
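The register addresses for a given interrupt source and context can be computed from the offsets in the table. This sketch assumes that this PLIC uses the per-source and per-context strides from the RISC-V PLIC specification (4 bytes per source priority, 1 bit per source in the pending and enable arrays, 0x80 bytes of enable bits per context, 0x1000 bytes per context for threshold and claim). The base address comes from the system address map; the function names are illustrative only.

```python
PLIC_BASE = 0x0C000000  # from the system address map


def priority_addr(source):
    """Priority register of one interrupt source (one 32-bit word each)."""
    return PLIC_BASE + 0x000000 + 4 * source


def pending_addr(source):
    """Word holding the pending bit of a source (bit index is source % 32)."""
    return PLIC_BASE + 0x001000 + 4 * (source // 32)


def enable_addr(context, source):
    """Word holding the enable bit of a source for a given context."""
    return PLIC_BASE + 0x002000 + 0x80 * context + 4 * (source // 32)


def threshold_addr(context):
    """Priority threshold register of a context."""
    return PLIC_BASE + 0x200000 + 0x1000 * context


def claim_addr(context):
    """Claim (read) and completion (write) register of a context."""
    return threshold_addr(context) + 4


# Example: source 10 routed to context 1 (hypothetical numbers)
print(hex(priority_addr(10)), hex(enable_addr(1, 10)), hex(claim_addr(1)))
```

With a single hart that supports M-mode and S-mode, there would typically be two contexts, so both the firmware and the kernel need their own enable and threshold registers.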

`define PLIC_INT_PRIOR   26'h000_0000
`define PLIC_INT_PEND    26'h000_1000
`define PLIC_INT_TYPE    26'h000_1080
`define PLIC_INT_POL     26'h000_1100
`define PLIC_INT_EN      26'h000_2000
`define PLIC_PRIOR_TH    26'h020_0000

The PLIC spec defines the memory map and the meaning of each register, so there is no need to write my own firmware. However, the Verilog source defines two extra fields, PLIC_INT_TYPE and PLIC_INT_POL, whose purpose I could not tell from their names alone.

In cpu/plic.sv I found that these two fields are gateway configuration registers that determine the type (edge or level triggered) and polarity (active high or active low) of each source. The default configuration is high level triggered.

.src_type ( int_type[gvar_i] ), // 0: edge, 1: level
.src_pol  ( int_pol [gvar_i] ), // 0: high, 1: low

Clint (Core local interrupt controller)

Memory mapping:

Name	Offset	Note
msip	0x0	Machine mode software interrupt
mtimecmp	0x4000	Machine mode timer compare register
mtime	0xBFF8	Timer register

mtime and mtimecmp are defined in the RISC-V privileged ISA, section 3.2.1.
The machine timer interrupt becomes pending when mtime is greater than or equal to mtimecmp.

SPI controller

Memory mapping:

Name	Offset	Description
SPI_CR1	0x00	Control register 1
SPI_CR2	0x04	Control register 2
SPI_SR	0x08	Status register
SPI_DR	0x0C	Data register

Control register 1:
spi_cr1

cpha: Clock phase
cpol: Clock polarity
mstr: Master selection
br: Baud rate control
spe: SPI enable
lsbfirst: shifting from lsb or msb
ssi: Internal slave select
ssm: Software slave management, whether use internal ssi bit to select
rxonly: Receive only
dff: Data frame format, 8 or 16 per data frame
crcnext: Transmit CRC next
crcen: Hardware CRC calculation enable
bidioe: Output enable in bidirectional mode
bidimode: Bidirectional data mode enable
del: Delay

Control register 2:
spi_cr2

txeie: TX buffer empty interrupt enable
rxneie: RX buffer not empty interrupt enable
errie: Error interrupt enable
ssoe: SS output enable
txdmaen: TX buffer DMA enable
rxdmaen: RX buffer DMA enable

Status register:
spi_status

bsy: SPI busy flag
ovr: Overrun flag
modf: Mode fault flag
crcerr: CRC error flag
udr: Underrun flag
chside: Channel side flag
rxne: Receive buffer not empty
txe: Transmit buffer empty

The normal execution flow requires the CPU to read txe and rxne in SPI_SR to determine whether the TX and RX buffers need attention. This wastes CPU cycles on trivial tasks, so a DMA is included inside the SPI core. The DMA writes to or reads from the buffers whenever txe/rxne is set, freeing the CPU for other critical tasks.

See the STM32 RM0090 reference manual, section 28.3.5, for more data transmission details.

SPI DMA

Memory mapping:

Name	Offset	Description
DMA_SRC	0x00	Source address for DMA transfer
DMA_DEST	0x04	Destination address for DMA
DMA_LEN	0x08	Length of DMA transfer
DMA_CON	0x0C	DMA control register
DMA_IE	0x10	DMA interrupt enable register
DMA_IP	0x14	DMA interrupt pending register
DMA_IC	0x18	DMA interrupt clear register
DMA_WDT_CNT	0x1C	DMA watchdog timer count register

SPI DMA is responsible for transferring data from DDR to the SD card. It is capable of interacting with the SPI controller and sending tx/rx commands.

Control register:
spi_dma_csr

size: WORD, HWORD, BYTE
type: FIXED, INCR, CONST
bypass: I'm not sure what bypass means

Debugger CSR

APB Memory mapping:

Name	Address	Description
DBGAPB_DBG_EN	12'h000	Debug enable register
DBGAPB_INST	12'h004	Debug instruction register
DBGAPB_INST_WR	12'h008	Debug instruction write register
DBGAPB_WDATA_L	12'h010	Debug write data low register
DBGAPB_WDATA_H	12'h014	Debug write data high register
DBGAPB_WDATA_WR	12'h01C	Debug write data write enable register
DBGAPB_RDATA_L	12'h020	Debug read data low register
DBGAPB_RDATA_H	12'h024	Debug read data high register

Monitor memory mapping:

Definition	Address	Description
DBGMON_BP0	13'h1100	Debug monitor breakpoint 0
DBGMON_BP1	13'h1108	Debug monitor breakpoint 1
DBGMON_BP2	13'h1110	Debug monitor breakpoint 2
DBGMON_BP3	13'h1118	Debug monitor breakpoint 3
DBGMON_WP0	13'h1120	Debug monitor watchpoint 0
DBGMON_WP1	13'h1128	Debug monitor watchpoint 1
DBGMON_WP2	13'h1130	Debug monitor watchpoint 2
DBGMON_WP3	13'h1138	Debug monitor watchpoint 3
DBGMON_VC_EXC	13'h1140	Debug monitor vector catch exception
DBGMON_VC_IRQ	13'h1144	Debug monitor vector catch interrupt
DBGMON_DELAY	13'h1148	Debug monitor delay
DBGMON_STOP_TRACE	13'h114c	Debug monitor stop trace
DBGMON_IE	13'h1150	Debug monitor interrupt enable

There are more addresses mapped to architectural registers.
To trace the instruction execution sequence after a specific PC, I need to:

Set the breakpoint to the target address
Set the breakpoint enable (BP0 + 4) to 1
Set the delay to the target number
Run the CPU until the stop bit becomes 1

Difference between RV32 and RV64

Ref: RISC-V ISA manual

This section lists the differences between RV32I and RV64I, which serves as a to-do list and reference for me.

General

XLEN: width of an integer register in bits: 64 for RV64I and 32 for RV32I. This number also determines the size \(2^{XLEN}\) of the maximum supported address space.

Unprivileged ISA

RISC-V unprivileged ISA Chapter 4.0:

This chapter describes the RV64I base integer instruction set, which builds upon the RV32I variant described in Chapter 2.

Chapter 4.2:

Most integer computational instructions operate on XLEN-bit values.

Chapter 13.1, 13.2:

MULW is an RV64 instruction…

DIVW and DIVUW are RV64 instructions…

REMW and REMUW are RV64 instructions…

Chapter 14.2:

LR.D and SC.D act analogously on doublewords and are only available on RV64.

The 64-bit ISA is built upon RV32I. The differences are:

All instructions defined in RV32I are also supported in RV64I, but their behavior depends on the value of XLEN.
RV64I adds instructions that operate on 32-bit values (such as ADDIW, SLLIW, SRLW, SUBW, SRAW). These can be removed in RV32I.
LD and SD (as well as LWU) can be removed.
For the "M" extension, behavior depends on XLEN.
LR.D and SC.D can be removed.
AMO*.D can be removed.

Modified files:

alu.sv: Remove 64-bit arithmetic logic
dec.sv: Remove 64-bit instructions, so the removed encodings are now decoded as illegal instructions (ill_isns).
mdu.sv: Mul/Div behavior
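The decoder change can be cross-checked with a small software model that flags 32-bit encodings which exist only in RV64 (for the I, M, and A extensions). The opcode and funct3 values follow the RISC-V unprivileged spec encoding tables; the function name is_rv64_only is made up for this sketch, and it is not a model of dec.sv itself.

```python
def is_rv64_only(insn):
    """Return True if a 32-bit instruction word is legal only on RV64 (IMA)."""
    opcode = insn & 0x7F
    funct3 = (insn >> 12) & 0x7

    # OP-IMM-32 (ADDIW, SLLIW, ...) and OP-32 (ADDW, SUBW, MULW, DIVW, ...)
    if opcode in (0x1B, 0x3B):
        return True
    # LOAD: LD (funct3=011) and LWU (funct3=110)
    if opcode == 0x03 and funct3 in (0b011, 0b110):
        return True
    # STORE: SD (funct3=011)
    if opcode == 0x23 and funct3 == 0b011:
        return True
    # AMO: LR.D, SC.D, AMO*.D (funct3=011)
    if opcode == 0x2F and funct3 == 0b011:
        return True
    # SLLI/SRLI/SRAI with shamt[5] set need a 6-bit shift amount
    if opcode == 0x13 and funct3 in (0b001, 0b101) and (insn >> 25) & 1:
        return True
    return False


samples = {
    "addiw x1, x1, 1": 0x0010809B,
    "addi  x1, x1, 1": 0x00108093,
    "ld    x1, 0(x2)": 0x00013083,
    "lw    x1, 0(x2)": 0x00012083,
    "slli  x1, x1, 32": 0x02009093,
    "mulw  x1, x2, x3": 0x023100BB,
}
for text, word in samples.items():
    print(f"{text:18s} {word:#010x} rv64_only={is_rv64_only(word)}")
```

A testbench could feed every word flagged by this model into the modified decoder and expect an illegal-instruction exception.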

Privileged ISA

RISC-V privileged ISA Chapter 3.1.6:

MXLEN is the effective XLEN in M-mode

For RV32 only, there is an mstatush register, which contains the same fields as the upper 32 bits of the RV64 mstatus.

When MXLEN=32, the SXL and UXL fields do not exist, and SXLEN=32 and UXLEN=32.

mtvec is MXLEN bits long.

medelegh, menvcfgh, and mseccfgh alias the upper 32 bits of their 64-bit non-h counterparts.

mscratch, mepc, mcause, mtval, and mconfigptr are MXLEN bits wide.

mtime is still 64-bit precision. In RV32, memory-mapped writes to mtimecmp modify only one 32-bit part of the register.
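Because each 32-bit write changes only half of mtimecmp, the order of the two writes matters. The toy model below (not the CLINT RTL) compares a naive low-then-high update with the three-step sequence recommended in the privileged spec: write all ones to the low half, write the high half, then write the real low half. For simplicity, mtime is held constant during the update, and the class and function names are made up for this sketch.

```python
MASK32 = 0xFFFFFFFF


class TimerModel:
    """Toy 64-bit mtime/mtimecmp pair accessed through 32-bit writes."""

    def __init__(self, mtime, mtimecmp):
        self.mtime = mtime
        self.mtimecmp = mtimecmp
        self.glitch = False  # True if the interrupt was pending mid-update

    def _observe(self):
        # The timer interrupt is pending whenever mtime >= mtimecmp
        if self.mtime >= self.mtimecmp:
            self.glitch = True

    def write_lo(self, value):
        self.mtimecmp = (self.mtimecmp & ~MASK32) | (value & MASK32)
        self._observe()

    def write_hi(self, value):
        self.mtimecmp = (self.mtimecmp & MASK32) | ((value & MASK32) << 32)
        self._observe()


def naive_update(timer, new):
    timer.write_lo(new)
    timer.write_hi(new >> 32)


def safe_update(timer, new):
    timer.write_lo(MASK32)     # low half all ones: value cannot get smaller
    timer.write_hi(new >> 32)  # now set the new high half
    timer.write_lo(new)        # finally set the real low half


now, old_cmp, new_cmp = 0x1_8000_0000, 0x1_F000_0000, 0x2_0000_0000
naive = TimerModel(now, old_cmp)
naive_update(naive, new_cmp)
safe = TimerModel(now, old_cmp)
safe_update(safe, new_cmp)
print(naive.glitch, safe.glitch, hex(safe.mtimecmp))
```

In the naive order, clearing the low half first briefly makes mtimecmp smaller than mtime, which would raise a spurious timer interrupt; the three-step order never lowers the compare value below the target.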

Physical memory protection Chapter 3.7.1:

16 pmpcfg registers are 32-bit compared to 8 64-bit regs in RV64.

pmp address is 32-bit as well, storing the addr[33:2] in the address field.
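To illustrate how a 32-bit pmpaddr register holds address bits 33:2, the sketch below encodes and decodes a naturally aligned power-of-two (NAPOT) region as described in the privileged spec. Whether this design's PMP implements NAPOT mode is not stated here, so treat the example as an illustration of the register format only; the function names are hypothetical.

```python
def napot_encode(base, size):
    """Encode a NAPOT region (size is a power of two, at least 8 bytes)."""
    if size < 8 or size & (size - 1):
        raise ValueError("size must be a power of two >= 8")
    if base % size:
        raise ValueError("base must be aligned to size")
    # pmpaddr stores address bits [33:2]; the trailing ones encode the size
    return (base >> 2) | ((size >> 3) - 1)


def napot_decode(pmpaddr):
    """Recover (base, size) from a NAPOT-encoded pmpaddr value."""
    trailing_ones = 0
    while (pmpaddr >> trailing_ones) & 1:
        trailing_ones += 1
    size = 1 << (trailing_ones + 3)
    base = (pmpaddr & ~((1 << (trailing_ones + 1)) - 1)) << 2
    return base, size


# Example: the 512 MB DRAM window from the address map
value = napot_encode(0x80000000, 512 * 1024 * 1024)
print(hex(value), [hex(x) for x in napot_decode(value)])
```

Because the stored value is the address shifted right by two, a 32-bit pmpaddr can describe regions in a 34-bit physical address space.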

S-mode has almost the same changes in the CSR fields.

Sv32 Chapter 11.3:

A two-level page table is used, compared to three levels in Sv39.

The hardware page table walker does not require a change because the hardware checks if this entry is a leaf node and stops there. The level isn't that relevant to the walking process.
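The leaf check is what makes the walk independent of the number of levels. Below is a minimal software model of an Sv32 walk, written by me to illustrate this and not derived from the RTL. Physical memory is simulated by a word array starting at physical address 0, and permission checks and misaligned-superpage checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SV32_LEVELS    2
#define SV32_VPN_BITS  10
#define SV32_PAGE_BITS 12

#define PTE_V (1u << 0)
#define PTE_R (1u << 1)
#define PTE_W (1u << 2)
#define PTE_X (1u << 3)

/* VPN[1] = va[31:22], VPN[0] = va[21:12]. */
static uint32_t sv32_vpn(uint32_t va, int level)
{
    return (va >> (SV32_PAGE_BITS + level * SV32_VPN_BITS)) & 0x3ffu;
}

/* A valid PTE with any of R/W/X set is a leaf; otherwise it points to the next table. */
static bool pte_is_leaf(uint32_t pte)
{
    return (pte & PTE_V) && (pte & (PTE_R | PTE_W | PTE_X));
}

/* mem models physical memory as 32-bit words starting at address 0. */
static bool sv32_walk(const uint32_t *mem, uint64_t root_pa, uint32_t va, uint64_t *pa)
{
    uint64_t table = root_pa;
    for (int level = SV32_LEVELS - 1; level >= 0; level--) {
        uint32_t pte = mem[(table + 4u * sv32_vpn(va, level)) / 4u];
        if (!(pte & PTE_V))
            return false;                      /* page fault */
        uint64_t ppn = pte >> 10;              /* PTE[31:10] */
        if (pte_is_leaf(pte)) {
            uint32_t off_mask = (1u << (SV32_PAGE_BITS + level * SV32_VPN_BITS)) - 1u;
            *pa = ((ppn << SV32_PAGE_BITS) & ~(uint64_t)off_mask) | (va & off_mask);
            return true;
        }
        table = ppn << SV32_PAGE_BITS;         /* descend one level */
    }
    return false;                              /* no leaf found at level 0 */
}
```

Only SV32_LEVELS and the field widths differ from an Sv39 version of the same loop.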

The upper 32 bits of medeleg in RV64 are hardwired to zero, so medelegh can be all zeros. The menvcfg and mseccfg registers do not exist in this design, so their additional *h versions can be ignored.

AXI Interconnect

The AXI Interconnect bitwidth is 32 bits, which conforms to RV32. Take AR channel as an example,

logic  [ 31: 0] m0_araddr

In the RV64 design, a read is done in two bursts, so there is no need to change the bus or the corresponding masters (cache, …).

Verifying the CPU

I found only the system-level testbench for cpu_wrap under scripts/. I think a full test of the whole system is overkill and verifying only cpu_top is adequate, since I did not modify any files other than the CPU core. I need to figure out how to compile and use riscv-tests, because there is no documentation on how to do this.

Simple rv32i test assembly: https://github.com/hamsternz/simple-riscv/blob/main/sw/asm/isa_test.S

Verify the core first, inspect the IO of the core

cpu_top DUT(
    // Need to figure out what the difference between the two reset signals is
    // and how to generate a correct systime
    .clk(),
    // The two resets can be seen as one
    .srstn(),
    .xrstn(),
    // cpu id == 0
    .cpu_id(),
    // output, can be ignored
    .rv64_mode(),
    // The first instruction after cpu start
    .bootvec(),
    //output can be ignored
    .warm_rst_trigger(),
    // the actual mtime, can be hardwired zero
    .systime(),

    // mpu csr
    // All outputs, I assume I dont need these signals

    // mmu csr
    // All outputs, I assume I dont need these signals.
    .satp_ppn(),
    .satp_asid(),
    .satp_mode(),
    .prv(),
    .sum(),
    .mprv(),
    .mpp(),

    // TLB control
    // All outputs, I assume I dont need these signals.
    .tlb_flush_req(),
    .tlb_flush_all_vaddr(),
    .tlb_flush_all_asid(),
    .tlb_flush_vaddr(),
    .tlb_flush_asid(),

    // interrupt interface
    // interrupt pins, hardwire to zeros for no interrupt
    .msip(),
    .mtip(),
    .meip(),
    .seip(),

    // insn interface
    // Need to figure out what the expected memory behavior is and write my own model
    .imem_en(),
    .imem_addr(),
    .imem_rdata(),
    .imem_bad(),
    .imem_busy(),
    .ic_flush(),

    // data interface
    // Need to figure out what the expected memory behavior is and write my own model
    .dmem_en(),
    .dmem_addr(),
    .dmem_write(),
    .dmem_ex(),
    .dmem_strb(),
    .dmem_wdata(),
    .dmem_rdata(),
    .dmem_bad(),
    .dmem_xstate(),
    .dmem_busy(),

    // debug interface
    // I can just ignore the debug driving signals
    .dbg_gpr_all(),
    .dbg_addr(),
    .dbg_wdata(),
    .dbg_gpr_rd(),
    .dbg_gpr_wr(),
    .dbg_gpr_out(),
    .dbg_csr_rd(),
    .dbg_csr_wr(),
    .dbg_csr_out(),
    .dbg_pc_out(),
    .dbg_exec(),
    .dbg_insn(),
    .attach(),
    .halted(),

    // CPU tracer
    // All output, assume its irrelevant
    .trace_pkg_valid(),
    .trace_pkg()
  );

Most of the output ports are irrelevant when verifying baremetal rv32 so I can ignore them. I wrote a simple memory model for imem and dmem.

TODO: Verify with OpenSBI

bbl (boot loader): confirm that Sv32 (MMU) works

After watching the openSBI Deep Dive by WD, I could grasp the big picture of openSBI. It has many features, but I need only a small subset of them to boot Linux. The new boot flow will look like this:

ZSBL: Load openSBI firmware and my own bootloader to DDR
FSBL: Initialize console, peripherals, PLIC, PMP. Load linux and pass control to the kernel. This stage will be implemented using openSBI.
Kernel

To keep ZSBL small enough, I think iolib can be removed and LEDs can be used to indicate boot progress. As soon as openSBI is ready, the console can be initialized. This is the initial plan, so the boot flow may change. I don't need to do all the hardware (PLL, DDR, …) initialization, since the Zynq FSBL will do it for me.

Try openSBI on qemu

To further understand openSBI in action, I use qemu to try out openSBI firmware.

Prepare riscv cross-compiler and compile openSBI with default qemu platform and no payload:

$ make PLATFORM=generic CROSS_COMPILE=riscv64-unknown-linux-gnu- O=build

Then execute the firmware with qemu:

$ qemu-system-riscv64 -M virt -m 256M -nographic -bios build/platform/generic/firmware/fw_payload.bin

I got the openSBI output:

$ qemu-system-riscv64 -M virt -m 256M -nographic    -bios build/platform/generic/firmware/fw_payload.bin

OpenSBI v1.4-111-gd962db2
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name             : riscv-virtio,qemu
Platform Features         : medeleg
Platform HART Count       : 1
Platform IPI Device       : aclint-mswi
Platform Timer Device     : aclint-mtimer @ 10000000Hz
...
Domain0 Name              : root
Domain0 Boot HART         : 0
...
Boot HART ID              : 0
Boot HART Domain          : root
Boot HART Priv Version    : v1.12
Boot HART Base ISA        : rv64imafdch
Boot HART ISA Extensions  : sstc,zicntr,zihpm,zicboz,zicbom,sdtrig
Boot HART PMP Count       : 16
Boot HART PMP Granularity : 2 bits
Boot HART PMP Address Bits: 54
Boot HART MHPM Info       : 16 (0x0007fff8)
Boot HART Debug Triggers  : 2 triggers
Boot HART MIDELEG         : 0x0000000000001666
Boot HART MEDELEG         : 0x0000000000f0b509

Test payload running

Notice that this platform is the default platform for qemu emulation, and we need to implement a new platform for our SOC. The execution stopped at Test payload running because I didn't specify any next stage payload.

Next, I compiled linux kernel v6.4 from source and got a bootable image.
Then, I prepared a rootfs image with buildroot, all with the default configuration.

To compile openSBI firmware with a payload, the linux kernel in my case, the following command is used:

$ make PLATFORM=generic FW_PAYLOAD_PATH=../linux/arch/riscv/boot/Image CROSS_COMPILE=riscv64-unknown-linux-gnu- O=build

and emulate it with:

$ qemu-system-riscv64 -M virt -m 256M -nographic \
	-bios build/platform/generic/firmware/fw_payload.bin \
	-drive file=../rootfs.ext2,format=raw,id=hd0 \
	-device virtio-blk-device,drive=hd0 \
	-append "root=/dev/vda rw console=ttyS0"

I got an error running the command above:

qemu-system-riscv64: -append only allowed with -kernel option

If I delete the append flag, the kernel would start but couldn't load the filesystem correctly.

[    0.289222] VFS: Cannot open root device "" or unknown-block(0,0): error -6
[    0.289349] Please append a correct "root=" boot option; here are the available partitions:
[    0.289590] fe00           61440 vda
...
[    0.292969] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    0.293283] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0 #1
[    0.293462] Hardware name: riscv-virtio,qemu (DT)
[    0.293627] Call Trace:
[    0.293792] [<ffffffff800061e2>] dump_backtrace+0x1c/0x24
[    0.294199] [<ffffffff8097ca9c>] show_stack+0x2c/0x38
[    0.294317] [<ffffffff809896d6>] dump_stack_lvl+0x52/0x74
[    0.294417] [<ffffffff8098970c>] dump_stack+0x14/0x1c
[    0.294526] [<ffffffff8097cfaa>] panic+0x106/0x2ba
[    0.294650] [<ffffffff80a0174a>] mount_root_generic+0x208/0x2ca
[    0.294789] [<ffffffff80a019fe>] mount_root+0x1f2/0x224
[    0.294905] [<ffffffff80a01c2e>] prepare_namespace+0x1fe/0x25a
[    0.295027] [<ffffffff80a0118e>] kernel_init_freeable+0x26c/0x28e
[    0.295159] [<ffffffff8098b164>] kernel_init+0x1e/0x10a
[    0.295275] [<ffffffff809936da>] ret_from_fork+0xe/0x1c
[    0.295750] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---

I need to specify the root block device on the kernel command line, but I couldn't append anything without the -kernel option.

My guess is that the tutorial uses an older version of qemu. So instead I ran the example using fw_jump (with a jump address):

$ qemu-system-riscv64 -M virt -m 256M -nographic \
	-bios build/platform/generic/firmware/fw_jump.bin \
	-kernel ../linux/arch/riscv/boot/Image \
	-drive file=../rootfs.ext2,format=raw,id=hd0 \
	-device virtio-blk-device,drive=hd0 \
	-append "root=/dev/vda rw console=ttyS0"

Now that I'm more familiar with openSBI and the Linux boot sequence, I will try to configure the qemu system to resemble my SoC's environment and debug openSBI on qemu first. It's simpler and faster.

Configuring and Compiling linux kernel

After searching how to specify kernel command line at compile time, I found out that there is a config dedicated to this purpose. In arch/riscv/Kconfig:

menu "Boot options"

config CMDLINE
	string "Built-in kernel command line"
	help
	  For most platforms, the arguments for the kernel's command line
	  are provided at run-time, during boot. However, there are cases
	  where either no arguments are being provided or the provided
	  arguments are insufficient or even invalid.

	  When that occurs, it is possible to define a built-in command
	  line here and choose how the kernel should use it later on.

choice
	prompt "Built-in command line usage" if CMDLINE != ""
	default CMDLINE_FALLBACK
	help
	  Choose how the kernel will handle the provided built-in command
	  line.

config CMDLINE_FALLBACK
	bool "Use bootloader kernel arguments if available"
	help
	  Use the built-in command line as fallback in case we get nothing
	  during boot. This is the default behaviour.

...

config CMDLINE_FORCE
	bool "Always use the default kernel command string"
	help
	  Always use the built-in command line, even if we get one during
	  boot. This is useful in case you need to override the provided
	  command line on systems where you don't have or want control
	  over it.

endchoice

I can set CMDLINE_FORCE to always use the built-in kernel command line. So I edited the .config file to use the qemu root device:

CONFIG_CMDLINE="root=/dev/vda rw console=ttyS0"
CONFIG_CMDLINE_FORCE=y

and rebuilt the kernel.

When running the fw_payload opensbi firmware, the same issue occurs.

...
[    0.000000] percpu: Embedded 22 pages/cpu s49400 r8192 d32520 u90112
[    0.000000] Kernel command line:
[    0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
...

The Kernel command line was empty.

😅 It turned out that I forgot to recompile the openSBI firmware with the latest Linux kernel image. This shows an important drawback of bundling the kernel as an openSBI payload: every kernel rebuild requires rebuilding the firmware as well.

Useful reference: Linux Kernel configuration list

ZSBL (Zero stage boot loader)

openSBI relies on the previous bootloader to load its firmware to DDR. The zero stage bootloader is quite simple since the clocks and DDR are all initialized by the PS.

UART

To report zsbl boot progress, we can use LEDs or the UART. Since the bus is ready when the boot ROM runs, the UART can be enabled, and it is more informative than LED indicators. The UART init sequence is similar to the one in the openSBI firmware, but with only putc and puts capabilities.

SD/MMC

Ref: How to use MMC/SDC
Ref: SPI

In order to use MMC/SDC in my system, I need to first initialize SPI and then put SD card into SPI mode. The following shows the initialization process:

Note:

ACMD(N) is a sequence of CMD55-CMD(N)

SPI mode is block addressing

After putting SD card into SPI mode, we can send CMD17 to read a block and CMD24 to write a block. Upon receiving a valid response for read/write block commands, we can utilize DMA to move the data packets.
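Every SD command in SPI mode is sent as a 6-byte frame: a start/command byte, a 32-bit big-endian argument, and a CRC7 with a stop bit. The sketch below builds such a frame. The helper names are my own and this is not the code used in the zsbl.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC7 with polynomial x^7 + x^3 + 1, as used by SD/MMC command frames. */
static uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t d = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;
            if ((d ^ crc) & 0x80)
                crc ^= 0x09;
            d <<= 1;
        }
    }
    return crc & 0x7f;
}

/* Build the 6-byte frame for CMD<cmd> with a 32-bit argument. */
static void sd_build_cmd(uint8_t cmd, uint32_t arg, uint8_t frame[6])
{
    frame[0] = 0x40 | (cmd & 0x3f);            /* start bit 0, transmission bit 1 */
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((sd_crc7(frame, 5) << 1) | 1u);  /* CRC7 + stop bit */
}
```

For example, sd_build_cmd(17, block, frame) prepares a single-block read, and an ACMD is simply a CMD55 frame followed by the CMD(N) frame.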

I'm not 100% sure how the custom DMA works at this point

void __dma_cfg(u32 src,
               u32 dest,
               u32 len,
               u8 spi_bypass,
               u8 src_btype,
               u8 dest_btype,
               u8 src_size,
               u8 dest_size);

#define __dma_spi2buf(__BUFF__, __LEN__)                            \
    do {                                                            \
        __dma_cfg(0xffffffff, (u32) (__BUFF__), (u32) (__LEN__), 0, \
                  DMA_TYPE_CONST, DMA_TYPE_INCR, DMA_SIZE_WORD,     \
                  DMA_SIZE_WORD);                                   \
        while (__dma_busy())                                        \
            ;                                                       \
    } while (0)

__dma_cfg has the function signature shown above. The macro __dma_spi2buf reads data from the SPI data register into the destination address. At first, I was skeptical about the src config 0xffffffff. But later, when I looked at the RTL source

assign fifo_wdata_pre = ({32{dma_con_src_type  == TYPE_FIXED}} & m_axi_intf.rdata >> {src_addr[1:0], 3'b0})|
                        ({32{dma_con_src_type  == TYPE_INCR }} & m_axi_intf.rdata >> {src_addr[1:0], 3'b0})|
                        ({32{dma_con_src_type  == TYPE_CONST}} & dma_src);

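I found that TYPE_CONST is the mode used for SPI, which is different from TYPE_FIXED for a fixed address. In this mode the FIFO write data is taken directly from dma_src instead of from the AXI read data, so the configured 0xffffffff is used as a data word rather than as an address. The bypass probably means that the DMA does not need to read via AXI bus -> APB -> SPI.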
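To double-check my reading of the RTL expression, here is a small C model of the fifo_wdata_pre mux. It assumes a 32-bit rdata, and the enum encodings are placeholders since the real values of the TYPE_* constants are not shown here.

```c
#include <stdint.h>

/* Placeholder encodings; the RTL values are not known here. */
enum dma_src_type { TYPE_FIXED, TYPE_INCR, TYPE_CONST };

/* Model of the fifo_wdata_pre assignment:
 * FIXED/INCR take the AXI read data shifted by the byte offset,
 * CONST ignores the read data and passes dma_src through. */
static uint32_t fifo_wdata_pre(enum dma_src_type type, uint32_t rdata,
                               uint32_t src_addr, uint32_t dma_src)
{
    unsigned shift = (src_addr & 3u) * 8u;     /* {src_addr[1:0], 3'b0} */
    if (type == TYPE_CONST)
        return dma_src;
    return rdata >> shift;
}
```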

File IO

To load files from the bootable section partition 0, I need to understand the details of the filesystem used.
Ref: Microsoft FAT spec
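A minimal sketch of the first step, reading the BIOS Parameter Block (BPB) from the boot sector. It assumes the partition is FAT32 and uses the field offsets from the FAT specification. The struct and function names are mine, not those of the actual zsbl code.

```c
#include <stdint.h>

/* Subset of the FAT32 BPB needed to locate clusters. */
struct fat32_geom {
    uint16_t bytes_per_sec;
    uint8_t  sec_per_clus;
    uint16_t rsvd_sec;
    uint8_t  num_fats;
    uint32_t fat_size;     /* sectors per FAT */
    uint32_t root_clus;    /* first cluster of the root directory */
};

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the boot sector; returns 0 on success, -1 if the signature is missing. */
static int fat32_parse_bpb(const uint8_t sec[512], struct fat32_geom *g)
{
    if (sec[510] != 0x55 || sec[511] != 0xAA)
        return -1;
    g->bytes_per_sec = le16(sec + 11);   /* BPB_BytsPerSec */
    g->sec_per_clus  = sec[13];          /* BPB_SecPerClus */
    g->rsvd_sec      = le16(sec + 14);   /* BPB_RsvdSecCnt */
    g->num_fats      = sec[16];          /* BPB_NumFATs */
    g->fat_size      = le32(sec + 36);   /* BPB_FATSz32 */
    g->root_clus     = le32(sec + 44);   /* BPB_RootClus */
    return 0;
}

/* First sector of a cluster, relative to the start of the partition. */
static uint32_t fat32_cluster_to_sector(const struct fat32_geom *g, uint32_t cluster)
{
    uint32_t first_data = g->rsvd_sec + g->num_fats * g->fat_size;
    return first_data + (cluster - 2) * g->sec_per_clus;
}
```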

Loader

With adequate file operations, we can load the elf file from SD card to DDR.

Migrating from bbl to openSBI firmware

digraph G {
    N1[label="Zero stage boot loader"];
    N2[label="OpenSBI firmware - Payload"];
    N3[label="Linux Kernel"];
    N1 -> N2 -> N3;
}

The new boot flow will look like this graph. The zero stage boot loader is responsible for loading the openSBI firmware to DDR. The Linux kernel is bundled with the openSBI firmware as a payload. After openSBI initialization, the CPU switches to supervisor mode and control is transferred to the kernel.

Create a new platform in openSBI following the official guide. The file structure will look like this:

.
├── configs
│   └── defconfig
├── Kconfig
├── objects.mk
└── platform.c

Kconfig and defconfig will provide build time configuration options. platform.c will provide struct sbi_platform object for building openSBI firmware.

The official repo kindly provides a template for new platforms. There is also a generic platform used by many SoC vendors, including Andes, SiFive, and THead. The generic platform is FDT (flattened device tree) based, which is really overkill in my case: the hardware info can be hardcoded in the firmware for my design. However, if I choose this method for the new platform, I need to figure out how openSBI generates and passes the device tree blob to the next stage.

I created a new platform called amp, for asymmetric multiprocessing. To make firmware debugging easier, I need to get the serial console working as soon as possible. I followed how SiFive implemented their own UART driver and created amp-uart.[ch] under sbi_utils/serial.

I cannot find any specification for the custom UART controller, so I searched the original RTL source code. See the Hardware RTL UART controller section for more detail. I then implemented putc and getc for struct sbi_console_device.

struct sbi_console_device {
	/** Name of the console device */
	char name[32];

	/** Write a character to the console output */
	void (*console_putc)(char ch);

	/** Write a character string to the console output */
	unsigned long (*console_puts)(const char *str, unsigned long len);

	/** Read a character from the console input */
	int (*console_getc)(void);
};

puts is not required because sbi_console will check if it is implemented and choose between using puts or iterative putcs.
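A minimal sketch of what such a putc can look like for a memory-mapped UART, together with the character-by-character fallback for puts. The register layout and the TXFULL bit are hypothetical, since the custom controller's registers are not listed here. The real console_putc callback takes only the character, so the base pointer would live in a static variable instead of being a parameter.

```c
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
struct amp_uart_regs {
    volatile uint32_t txdata;   /* write: byte to transmit */
    volatile uint32_t status;   /* bit 0: TX FIFO full (assumed) */
};

#define AMP_UART_TXFULL (1u << 0)

/* Busy-wait until the transmitter can accept a byte, then send it. */
static void amp_uart_putc(struct amp_uart_regs *uart, char ch)
{
    while (uart->status & AMP_UART_TXFULL)
        ;
    uart->txdata = (uint8_t)ch;
}

/* Fallback puts: emit the string one character at a time. */
static unsigned long amp_uart_puts(struct amp_uart_regs *uart,
                                   const char *str, unsigned long len)
{
    for (unsigned long i = 0; i < len; i++)
        amp_uart_putc(uart, str[i]);
    return len;
}
```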

Next the irqchip (PLIC controller). See Hardware RTL PLIC section for more memory mapping details. OpenSBI provides a set of PLIC APIs in sbi_utils/plic.c, which I can use in the firmware. Simply fill in my PLIC config

#define PLATFORM_HART_COUNT		1
#define PLATFORM_PLIC_ADDR		0xc000000
#define PLATFORM_PLIC_SIZE		(0x200000 + \
					 (PLATFORM_HART_COUNT * 0x1000))
#define PLATFORM_PLIC_NUM_SOURCES	32
static struct plic_data plic = {
	.addr = PLATFORM_PLIC_ADDR,
	.size = PLATFORM_PLIC_SIZE,
	.num_src = PLATFORM_PLIC_NUM_SOURCES,
};

and I can use the implemented PLIC firmware.
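The size macro follows from the standard RISC-V PLIC memory map, where per-context threshold/claim registers start at offset 0x200000 with a 0x1000 stride. The sketch below computes a few register addresses under the assumption that the custom PLIC follows that standard layout. The helper names are mine.

```c
#include <stdint.h>

/* Offsets from the RISC-V PLIC specification. */
#define PLIC_ENABLE_BASE    0x2000u
#define PLIC_ENABLE_STRIDE  0x80u
#define PLIC_CONTEXT_BASE   0x200000u
#define PLIC_CONTEXT_STRIDE 0x1000u

/* Priority register of an interrupt source. */
static uint32_t plic_priority_addr(uint32_t base, uint32_t source)
{
    return base + 4u * source;
}

/* First enable word of a context. */
static uint32_t plic_enable_addr(uint32_t base, uint32_t context)
{
    return base + PLIC_ENABLE_BASE + PLIC_ENABLE_STRIDE * context;
}

/* Priority threshold register of a context. */
static uint32_t plic_threshold_addr(uint32_t base, uint32_t context)
{
    return base + PLIC_CONTEXT_BASE + PLIC_CONTEXT_STRIDE * context;
}

/* Claim/complete register of a context. */
static uint32_t plic_claim_addr(uint32_t base, uint32_t context)
{
    return plic_threshold_addr(base, context) + 4u;
}
```

With the base 0xc000000 used above, context 0 starts at 0xc200000.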

Next, the timer firmware. The hardware description can be found in the CLINT section. An SBI timer device must implement the following attributes

static struct sbi_timer_device plmt_timer = {
	.name		   = "amp-plmt",
	.timer_freq	   = 10000000,
	.timer_value	   = plmt_timer_value,
	.timer_event_start = plmt_timer_event_start,
	.timer_event_stop  = plmt_timer_event_stop
};

timer_event_start and timer_event_stop allow the operating system to do scheduling and other timing-related operations.

With uart controller, plic, clint firmware, my platform can be initialized.
After adding platform amp to the compilation config options, I built the platform with:

$ make PLATFORM=amp CROSS_COMPILE=riscv64-unknown-linux-gnu- O=build

To specify platform info, use PLATFORM_RISCV_ISA, PLATFORM_RISCV_ABI, PLATFORM_RISCV_XLEN for a specific architecture. For example:

$  make PLATFORM=amp PLATFORM_RISCV_ISA=rv64ima_zicsr_zifencei PLATFORM_RISCV_ABI=lp64 CROSS_COMPILE=riscv64-unknown-linux-gnu- O=build

boot openSBI

After loading openSBI firmware to DDR, zsbl will jump to firmware entry point and firmware will start its work. Then I got this error message:

  ... BBL
  ... VMLINUX
  ... SBI
OpenSBI v1.4-111-gd962db2
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|


sbi_trap_error: hart0: trap0: trap redirect failed (error -2)

sbi_trap_error: hart0: trap0: mcause=0x0000000000000003 mtval=0x0000000000000000
sbi_trap_error: hart0: trap0: mepc=0x0000000090014844 mstatus=0x0000000a00001800
sbi_trap_error: hart0: trap0: ra=0x0000000090009830 sp=0x0000000090023ef0
sbi_trap_error: hart0: trap0: gp=0x0000000000000000 tp=0x0000000090024000
sbi_trap_error: hart0: trap0: s0=0x0000000090023f00 s1=0x0000000090024150
...

I also encountered the situation that the CPU jumps execution back to 0x0, which restarts the rom code again. Not sure what caused this issue. The next step will be enabling JTAG debugger and correct physical memory separation between ARM and RISC-V.

USB-to-JTAG FT232H driver issue

The JTAG debugger written in C# provided by the previous work ran into an error when I followed the same steps. To make sure that the FT232H chip is functioning, I did some basic tests on it. If the chip works, then the problem is either the connection or an RTL failure.

Ref: FTDI-in-C
Ref: FTDI driver programming guide
Ref: FTDI JTAG

I tried the example driver code for the FT232H chip from the first reference and found no issue running it. The official read-buffer code looks like this:

dwNumBytesToSend = 0; // Reset output buffer pointer
  do {
    ftStatus = FT_GetQueueStatus(ftHandle, &dwNumBytesToRead);
    // Get the number of bytes in the device input buffer
  } while ((dwNumBytesToRead == 0) && (ftStatus == FT_OK));
  // or Timeout
  bool bCommandEchod = false;
  ftStatus =
      FT_Read(ftHandle, &byInputBuffer, dwNumBytesToRead, &dwNumBytesRead);

and the one in the debugger

FTDI.FT_STATUS ftStatus;
Int32 retry = 0;
do
{
    // Get the number of bytes in the device input buffer
    ftStatus = ftdi.GetRxBytesAvailable(ref NumBytesToRead);
    retry++;
    if (retry > 5000)
    {
        MessageBox.Show("Get input buffer timeout");
        ftdi.Close();
        return FTDI.FT_STATUS.FT_OTHER_ERROR;
    }
} while (((len == 0 && NumBytesToRead == 0) || (len != 0 && NumBytesToRead != len)) && (ftStatus == FTDI.FT_STATUS.FT_OK));
// Read out the data from input buffer
return ftStatus |= ftdi.Read(InputBuffer, NumBytesToRead, ref NumBytesRead);

Apart from the programming language, the difference is the retry limit in the debugger. I commented out the retry limit, gave it a quick try, and it worked!

This is an important milestone because it allows me to trace the instructions executed by the CPU and probe some internal states. A future plan is to port this debugger to Linux for convenience, as Windows is not my main working environment.

Separating ARM and RISC-V physical memory space

Ref: https://gist.github.com/yunqu/827862e580a5f9b069eccdfcdcf70398
I followed the tutorial and got the PYNQ boot image info:

FIT description: U-Boot fitImage for PYNQ arm kernel
Created:         Thu Nov 18 03:29:34 2021
 Image 0 (kernel@0)
  Description:  Linux Kernel
  Created:      Thu Nov 18 03:29:34 2021
  Type:         Kernel Image
  Compression:  uncompressed
  Data Size:    5869440 Bytes = 5731.88 KiB = 5.60 MiB
  Architecture: ARM
  OS:           Linux
  Load Address: 0x00080000
  Entry Point:  0x00080000
  Hash algo:    sha1
  Hash value:   d113552f61c40e646b7ec24bab9b0c31f3778d57
 Image 1 (fdt@0)
  Description:  Flattened Device Tree blob
  Created:      Thu Nov 18 03:29:34 2021
  Type:         Flat Device Tree
  Compression:  uncompressed
  Data Size:    19771 Bytes = 19.31 KiB = 0.02 MiB
  Architecture: ARM
  Hash algo:    sha1
  Hash value:   314cc1baf3d0d5360c5ac3c6c4e0dfa742a1a27f
 Default Configuration: 'conf@1'
 Configuration 0 (conf@1)
  Description:  Boot Linux kernel with FDT blob
  Kernel:       kernel@0
  FDT:          fdt@0
  Hash algo:    sha1
  Hash value:   unavailable

Extract the device tree with

dumpimage -T flat_dt -p 1 -o ~/amp.dtb image.ub

Convert it to a human-readable format and identify the physical memory node:

...
	memory {
		device_type = "memory";
		reg = <0x00 0x20000000>;
	};
...

Change reg = <0x00 0x20000000> to reg = <0x00 0x10000000>, then recompile the device tree:

dtc -I dts -O dtb -o system.dtb amp.dts

then repackage the boot image with

mkimage -f image.its image.ub

After a reboot, I checked the memory usage:

$ xilinx@pynq:~$ free -g -h -t
              total        used        free      shared  buff/cache   available
Mem:          494Mi       131Mi       141Mi       1.0Mi       220Mi       351Mi
Swap:         511Mi          0B       511Mi
Total:        1.0Gi       131Mi       653Mi

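The total is still about 494 MiB rather than the 256 MiB expected from the new size 0x10000000, so the change did not take effect. I searched for other methods but have not solved this issue yet.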

openSBI trap issue

I found that my platform timer init was implemented incorrectly: it executed an ebreak, which halted the openSBI runtime. This is consistent with mcause=0x3 (breakpoint) in the trap dump above. After a quick fix, openSBI successfully printed all the hart and platform info:

UART done
SD done
FAT BPB done
  ... BBL
  ... VMLINUX
  ... SBI
sbi done

OpenSBI v1.4-111-gd962db2
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name             : amp
Platform Features         : medeleg
Platform HART Count       : 1
Platform IPI Device       : ---
Platform Timer Device     : amp-plmt @ 1000000Hz
Platform Console Device   : amp_uart
Platform HSM Device       : ---
Platform PMU Device       : ---
Platform Reboot Device    : ---
Platform Shutdown Device  : ---
Platform Suspend Device   : ---
Platform CPPC Device      : ---
Firmware Base             : 0x80000000
Firmware Size             : 182 KB
Firmware RW Offset        : 0x20000
Firmware RW Size          : 54 KB
Firmware Heap Offset      : 0x25000
Firmware Heap Size        : 34 KB (total), 2 KB (reserved), 11 KB (used), 20 KB (free)
Firmware Scratch Size     : 4096 B (total), 344 B (used), 3752 B (free)
Runtime SBI Version       : 2.0

Domain0 Name              : root
Domain0 Boot HART         : 0
Domain0 HARTs             : 0*
Domain0 Region00          : 0x000000000c200000-0x000000000c200fff M: (I,R,W) S/U: (R,W)
Domain0 Region01          : 0x0000000010000000-0x0000000010000fff M: (I,R,W) S/U: (R,W)
Domain0 Region02          : 0x0000000008004000-0x0000000008007fff M: (I,R,W) S/U: ()
Domain0 Region03          : 0x0000000008010000-0x0000000008013fff M: (I,R,W) S/U: ()
Domain0 Region04          : 0x0000000008008000-0x000000000800ffff M: (I,R,W) S/U: ()
Domain0 Region05          : 0x0000000080020000-0x000000008002ffff M: (R,W) S/U: ()
Domain0 Region06          : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: ()
Domain0 Region07          : 0x000000000c000000-0x000000000c1fffff M: (I,R,W) S/U: (R,W)
Domain0 Region08          : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
Domain0 Next Address      : 0x0000000080200000
Domain0 Next Arg1         : 0x00000000000011d0
Domain0 Next Mode         : S-mode
Domain0 SysReset          : yes
Domain0 SysSuspend        : yes

Boot HART ID              : 0
Boot HART Domain          : root
Boot HART Priv Version    : v1.12
Boot HART Base ISA        : rv64iemac
Boot HART ISA Extensions  : smaia,smstateen,sscofpmf,sstc,zicntr,smcntrpmf,sdtrig
Boot HART PMP Count       : 0
Boot HART PMP Granularity : 0 bits
Boot HART PMP Address Bits: 0
Boot HART MHPM Info       : 0 (0x00000000)
Boot HART Debug Triggers  : 1 triggers
Boot HART MIDELEG         : 0x0000000000000222
Boot HART MEDELEG         : 0x000000000000b109

Notice that some fields have odd values, such as Boot HART Base ISA and Boot HART ISA Extensions, and the dummy payload did not print Test payload running. The debugger showed that

(28678999 cycles) [M] 00000000800130a0:30200073 mret
(28679023 cycles) [S] InstructionAccessFault, epc = 0x80200000, tval = 0x80200000

after the mret, the CPU took an InstructionAccessFault exception.

Boot HART Base ISA was implemented as rv64eimac in the RTL, so there is no issue.
Boot HART ISA Extensions requires platform code implementing extensions_init. So I need to implement that. It is related to PMP as well.

Ref: OpenSBI Domain
If I remove the domain memory regions I added myself in the SBI firmware, the test payload runs without error.

Domain0 Name              : root
Domain0 Boot HART         : 0
Domain0 HARTs             : 0*
Domain0 Region00          : 0x000000000c200000-0x000000000c200fff M: (I,R,W) S/U: (R,W)
Domain0 Region01          : 0x0000000080020000-0x000000008002ffff M: (R,W) S/U: ()
Domain0 Region02          : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: ()
Domain0 Region03          : 0x000000000c000000-0x000000000c1fffff M: (I,R,W) S/U: (R,W)
Domain0 Region04          : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
Domain0 Next Address      : 0x0000000080200000
Domain0 Next Arg1         : 0x0000000000000000
...
Boot HART Debug Triggers  : 1 triggers
Boot HART MIDELEG         : 0x0000000000000222
Boot HART MEDELEG         : 0x000000000000b109

Test payload running

The first region is PLIC config. Then the two regions protecting openSBI firmware. I couldn't find which context added the fourth region. The last region is by default the rest of the memory space. The payload is a while(1) wfi(); loop.

TODO: Verify with the Linux kernel

Compile linux kernel 4.20

Prepare toolchain beforehand, then put in custom drivers.
Add driver files and modify Makefile

drivers/Makefile
+obj-y += debug/

drivers/net/ethernet/Makefile
+obj-y += eth-riscv.o

drivers/power/reset/Makefile
+obj-y += pwrcon-riscv.o

drivers/spi/Makefile
+obj-y += spi-riscv.o

drivers/tty/serial/Makefile
+obj-y += sifive.o

Configure features

make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- menuconfig

Compile kernel

make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- -j$(nproc)

Fix some minor issues, such as the required zifencei extension and DECLARE_TASKLET.
Then bundle the resulting Image with the openSBI firmware as a payload.

make PLATFORM=amp PLATFORM_RISCV_ISA=rv64ima_zicsr_zifencei PLATFORM_RISCV_ABI=lp64 CROSS_COMPILE=riscv64-unknown-linux-gnu- FW_PAYLOAD_PATH=../../linux/arch/riscv/boot/Image FW_PAYLOAD_OFFSET=0x400000 FW_FDT_PATH=./amp.dtb O=build

Boot from openSBI firmware

Booting from SBI then failed:

...
Platform CPPC Device      : ---
Firmware Base             : 0x80000000
Firmware Size             : 182 KB
Firmware RW Offset        : 0x20000
Firmware RW Size          : 54 KB
Firmware Heap Offset      : 0x25000
Firmware Heap Size        : 34 KB (total), 2 KB (reserved), 11 KB (used), 20 KB (free)
Firmware Scratch Size     : 4096 B (total), 344 B (used), 3752 B (free)
Runtime SBI Version       : 2.0

Domain0 Name              : root
Domain0 Boot HART         : 0
Domain0 HARTs             : 0*
Domain0 Region00          : 0x000000000c200000-0x000000000c200fff M: (I,R,W) S/U: (R,W)
Domain0 Region01          : 0x0000000080020000-0x000000008002ffff M: (R,W) S/U: ()
Domain0 Region02          : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: ()
Domain0 Region03          : 0x000000000c000000-0x000000000c1fffff M: (I,R,W) S/U: (R,W)
Domain0 Region04          : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
Domain0 Next Address      : 0x0000000080400000
Domain0 Next Arg1         : 0x0000000080016000
Domain0 Next Mode         : S-mode
Domain0 SysReset          : yes
Domain0 SysSuspend        : yes

Some irrelevant information has been stripped. The next address is the payload (Linux kernel entry) address and next arg1 is the device tree address. So control has been passed to the kernel, but nothing is shown on the screen. The kernel documentation on RISC-V booting only mentions that a0 holds the hartid and a1 the FDT address. Looking into the CPU instruction trace, the PC got stuck at 0xffffffff800000dc. From the kernel disassembly

ffffffff800000cc:	5f018193          	addi	gp,gp,1520 # ffffffff804256b8 <sched_clock_running>
ffffffff800000d0:	18061073          	csrw	satp,a2
ffffffff800000d4:	8082                	ret
ffffffff800000d6:	0001                	nop
ffffffff800000d8:	10500073          	wfi
ffffffff800000dc:	bff5                	j	ffffffff800000d8 <relocate+0x60>

This code can be mapped to arch/riscv/kernel/head.S

	csrw sptbr, a0
.align 2
1:
	/* Set trap vector to spin forever to help debug */
	la a0, .Lsecondary_park
	csrw stvec, a0

	/* Reload the global pointer */
.option push
.option norelax
	la gp, __global_pointer$
.option pop

	/* Switch to kernel page tables */
	csrw sptbr, a2

	ret
    
.Lsecondary_park:
/* We lack SMP support or have too many harts, so park this hart */
wfi
j .Lsecondary_park

The la a0, .Lsecondary_park and csrw stvec, a0 pair sets up a temporary trap vector that points to a wfi/j loop, which spins forever.

(38151087 cycles) [S] ffffffff800144c0:00338097 auipc ra,0x338
  ra        ffffffff8034c4c0
(38151088 cycles) [S] ffffffff800144c4:5d4080e7 jalr ra,1492(ra)
  ra        ffffffff800144c8
(38151200 cycles) [S] LoadPageFault, epc = 0xffffffff8034ca94, tval = 0xffffffff7fc16000
(38151242 cycles) [S] ffffffff80000100:10500073 wfi

After the LoadPageFault, the PC jumps to the spin address. My first suspicion was the PMP and PMA configuration that I did not handle in the openSBI firmware, although a PMP violation would normally be reported as an access fault rather than a page fault. In the openSBI firmware, sbi_hart_pmp_configure(struct sbi_scratch *scratch) is called when initializing the platform. This function configures the PMP CSRs according to the domains set up previously.
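The faulting address can be cross-checked against the device tree address. The sketch assumes that the kernel maps physical memory linearly, with its load address (0x80400000, the Next Address above) mapped at 0xffffffff80000000. That base is inferred from the addresses in the disassembly and not verified in the kernel source, and the helper name is mine.

```c
#include <stdint.h>

/* Linear-map translation under the stated assumption:
 * va = pa - load_pa + page_offset (wraps modulo 2^64). */
static uint64_t linear_va(uint64_t pa, uint64_t load_pa, uint64_t page_offset)
{
    return pa - load_pa + page_offset;
}
```

For the DTB at 0x80016000 (Next Arg1) this gives 0xffffffff7fc16000, which is exactly the tval of the LoadPageFault. Under this assumption the DTB sits below the kernel's load address and therefore falls outside the mapped range, which would explain a page fault without involving PMP.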

The hang happened in fdt_check_header(params), where the kernel parses the device tree passed from the previous boot stage. My guess was that the location where openSBI places the DTB given by FW_FDT_PATH is not accessible from S/U mode. So I tried moving the DTB to a higher valid address (0x98000000) in the boot code, which resulted in:

Domain0 Next Address      : 0x0000000080400000
Domain0 Next Arg1         : 0x0000000098000000
Domain0 Next Mode         : S-mode
Domain0 SysReset          : yes
...
Boot HART MIDELEG         : 0x0000000000000222
Boot HART MEDELEG         : 0x000000000000b109
[    0.000000] Linux version 4.20.0+ (jacob@jacob-ubuntu-server) (gcc version 13.2.0 (gc891d8dc23e)) #13 Fri Jun 21 15:32:17 CST 2024
[    0.000000] printk: bootconsole [early0] enabled
[    0.000000] initrd not found or empty - disabling initrd
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000080400000-0x00000000a03fffff]
[    0.000000]   Normal   [mem 0x00000000a0400000-0x00000a03ffffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080400000-0x00000000a03fffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080400000-0x00000000a03fffff]

Luckily the kernel now gives some feedback, but it is still stuck at Initmem setup node 0. I immediately spotted something suspicious: Initmem ranges from 0x80400000 to 0xa03fffff, but the upper bound should be 0x9fffffff. It turned out that the device tree was wrong.

#address-cells = <2>;
#size-cells = <2>;
ddr: ddr@80000000 {
    device_type = "memory";
-    reg = <0x00000000 0x80400000 0x00000000 0x20000000>;
+    reg = <0x00000000 0x80400000 0x00000000 0x1fc00000>;
};

The last two numbers in the reg field are the size. Since I moved the DDR base from 0x80000000 to 0x80400000, the size must shrink by the same 0x400000. Solved.
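The arithmetic behind this fix can be checked with a short script. This is only a sketch: `mem_range` and `DDR_END` are names made up for illustration, and it assumes the two-cell address and two-cell size layout shown in the device tree fragment above.

```python
def mem_range(reg_cells):
    """Return (first, last) address for a reg = <addr_hi addr_lo size_hi size_lo>."""
    addr_hi, addr_lo, size_hi, size_lo = reg_cells
    base = (addr_hi << 32) | addr_lo
    size = (size_hi << 32) | size_lo
    return base, base + size - 1


DDR_END = 0x9FFFFFFF  # expected upper bound, as stated in the text above

# Old and new reg values from the device tree diff above.
old = mem_range([0x00000000, 0x80400000, 0x00000000, 0x20000000])
new = mem_range([0x00000000, 0x80400000, 0x00000000, 0x1FC00000])

for name, (first, last) in (("old", old), ("new", new)):
    status = "ok" if last <= DDR_END else "overruns DDR"
    print(f"{name}: {first:#010x}-{last:#010x} {status}")
```

The old entry ends at 0xa03fffff, which matches the bogus Initmem range in the kernel log, while the new entry ends exactly at 0x9fffffff.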

Then the kernel got stuck at this point, with no error message or other feedback. Looking at the CPU PC, it is not in a dead loop but still running.

[    0.000000] Memory: 440772K/520192K available (3377K kernel code, 195K rwdata, 1619K rodata, 124K init, 232K bss, 79420K reserved, 0K cm)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS: 0, nr_irqs: 0, preallocated irqs: 0
[    0.000000] plic: mapped 32 interrupts to 2 (out of 2) handlers.
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns

Because the kernel gave no feedback (no error message), I inserted some custom messages into start_kernel.

[    0.000000] Tick init
[    0.000000] RCU
[    0.000000] Init timers
[    0.000000] hrtimers
[    0.000000] Soft IRQ init
[    0.000000] Time init
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns
[    0.000000] Printk safe
[    0.000000] Perf event
[    0.000000] Profile event
[    0.000000] Call function
[    0.000000] Local irq

CPU instruction trace:
(49066075 cycles) [S] ffffffff80000aec:10016073 csrsi sstatus,0x2
  sstatus   0000000200000102
(49066099 cycles) [S] Interrupt 5, epc = 0xffffffff80000af0, tval = 0x00000000 <- Interrupt 5: S mode timer interrupt
(49066149 cycles) [S] ffffffff800201e0:14021273 csrrw tp,sscratch,tp
  sscratch  ffffffff80508428
  tp        0000000000000000
(49066150 cycles) [S] ffffffff800201e4:00021663 bnez tp,ffffffff800201f0
(49066151 cycles) [S] ffffffff800201e8:14002273 csrr tp,sscratch

After enabling local interrupts, the kernel hangs. Looking at the trace, I found that the CPU took a timer interrupt and never returned to the kernel startup code. I realized that my timer firmware might have an issue and reworked it to use the ACLINT MTIMER support provided by OpenSBI, which solved the issue.
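To see why interrupt 5 is taken directly in S-mode, it helps to decode the delegation masks that OpenSBI printed in its boot banner earlier (MIDELEG 0x222, MEDELEG 0xb109). The sketch below is illustrative only: `delegated` is a hypothetical helper, and the tables list just the standard cause numbers from the RISC-V privileged specification that appear in these two masks.

```python
# Standard cause numbers from the RISC-V privileged specification
# (only the ones set in the masks from the boot log are listed).
INTERRUPTS = {
    1: "supervisor software interrupt",
    5: "supervisor timer interrupt",
    9: "supervisor external interrupt",
}
EXCEPTIONS = {
    0: "instruction address misaligned",
    3: "breakpoint",
    8: "ecall from U-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def delegated(mask, names):
    """List the causes whose bit is set in a mideleg/medeleg value."""
    return [names.get(bit, f"cause {bit}") for bit in range(64) if (mask >> bit) & 1]


print("MIDELEG 0x222 :", delegated(0x222, INTERRUPTS))
print("MEDELEG 0xb109:", delegated(0xB109, EXCEPTIONS))
```

Bit 5 of MIDELEG is set, so the supervisor timer interrupt goes straight to the kernel's trap handler. Bit 13 of MEDELEG is set as well, which is consistent with the earlier LoadPageFault landing on the S-mode trap vector rather than in the firmware.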

ACLINT spec 1.1:
The RISC-V ACLINT specification is defined to be backward compatible with the SiFive CLINT specification.

[   13.860000] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[   13.860000] CPU: 0 PID: 1 Comm: init Not tainted 4.20.0+ #18
[   13.860000] Call Trace:
[   13.860000] [<ffffffff8002166c>] walk_stackframe+0x0/0xc0
[   13.860000] [<ffffffff80025f74>] panic+0x110/0x248
[   13.860000] [<ffffffff8002721c>] forget_original_parent+0x2c8/0x2d4
[   13.860000] [<ffffffff8002757c>] exit_notify+0x30/0x144
[   13.860000] [<ffffffff80027884>] do_exit+0x1f4/0x420
[   13.860000] [<ffffffff80028514>] do_group_exit+0x2c/0x8c
[   13.860000] [<ffffffff80031e44>] get_signal+0x100/0x4c0
[   13.860000] [<ffffffff80020de0>] do_notify_resume+0x4c/0x180
[   13.860000] [<ffffffff8002034c>] ret_from_syscall+0xc/0x10
[   13.860000] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 ]---
[  310.240000] EXT4-fs (mmcblk0p2): error count since last fsck: 10
[  310.240000] EXT4-fs (mmcblk0p2): initial error at time 457: ext4_iget:5074: inode 1218298
[  310.250000] EXT4-fs (mmcblk0p2): last error at time 703: ext4_iget:5074: inode 1218298

The busybox init was built with floating-point instructions, while my firmware does not provide FP emulation, so I need to build a new rootfs.
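One quick way to spot such a binary before booting is to read the float ABI bits from the ELF header. The following is a minimal sketch, not part of the original flow: `riscv_float_abi` is a hypothetical helper, and it relies on the RISC-V ELF psABI convention that bits 1 and 2 of `e_flags` encode the floating-point ABI.

```python
import struct

EM_RISCV = 243          # e_machine value for RISC-V
FLOAT_ABI_MASK = 0x6    # e_flags bits 1..2 hold the float ABI (RISC-V psABI)
FLOAT_ABI = {0x0: "soft", 0x2: "single", 0x4: "double", 0x6: "quad"}


def riscv_float_abi(header: bytes) -> str:
    """Return the float ABI named in the first 64 bytes of a RISC-V ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = header[4] == 2                    # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little endian
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    if machine != EM_RISCV:
        raise ValueError("not a RISC-V binary")
    # e_flags follows e_entry, e_phoff and e_shoff, whose width depends on the class.
    flags_offset = 48 if is_64 else 36
    (flags,) = struct.unpack_from(endian + "I", header, flags_offset)
    return FLOAT_ABI[flags & FLOAT_ABI_MASK]
```

Passing the first 64 bytes of the busybox binary to `riscv_float_abi` should return "soft" for a build that suits this CPU. Note that this flag only describes the calling convention: a "single" or "double" result certainly needs FP registers, but a "soft" result does not by itself prove that the binary contains no FP instructions.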

Construct rootfs and busybox

Compile busybox

$ ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- make menuconfig
$ ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- make -j$(nproc)
$ ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- make install # Copy all built files to _install/

Make rootfs

# Make rootfs
$ mkdir rootfs
$ cp -r $BUSYBOX_DIR/_install/* rootfs
# Install libraries from toolchain
$ cp -a rv64ima-linux/sysroot/lib/ ../amp/rootfs/
# create empty directories
$ mkdir -p dev home mnt proc sys tmp var
$ mkdir -p etc/init.d

Create a minimal rcS file:

#!/bin/sh +x
# /etc/rcS

export PATH=/sbin:/bin:/usr/bin

mount -t sysfs  sysfs  /sys
mount -t proc   proc   /proc

hostname amp
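The directory skeleton and the rcS file above can also be generated by a script, which makes the rootfs easier to rebuild. This is a sketch under assumptions: `make_skeleton` is a hypothetical helper, the demo writes into a temporary directory instead of the real rootfs, and it assumes BusyBox init's default behaviour of running /etc/init.d/rcS when no /etc/inittab is present, which requires the file to be executable.

```python
import os
import stat
import tempfile

# Empty directories created in the steps above.
DIRS = ["dev", "home", "mnt", "proc", "sys", "tmp", "var", "etc/init.d"]

# Contents of the minimal rcS shown above.
RCS = """#!/bin/sh +x
# /etc/rcS

export PATH=/sbin:/bin:/usr/bin

mount -t sysfs  sysfs  /sys
mount -t proc   proc   /proc

hostname amp
"""


def make_skeleton(root):
    """Create the empty directories and an executable etc/init.d/rcS under root."""
    for d in DIRS:
        os.makedirs(os.path.join(root, d), exist_ok=True)
    rcs_path = os.path.join(root, "etc", "init.d", "rcS")
    with open(rcs_path, "w") as f:
        f.write(RCS)
    os.chmod(rcs_path, 0o755)  # init can only run the script if it is executable
    return rcs_path


# Demo in a throwaway directory; point this at the real rootfs when building.
demo_root = tempfile.mkdtemp(prefix="rootfs-")
rcs = make_skeleton(demo_root)
print(rcs, oct(stat.S_IMODE(os.stat(rcs).st_mode)))
```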

The rest is the same as before. Make the disk using mkfs.ext3, copy files …
Quick demo of this version: (image not available)

Now that I have a basic grasp of the entire flow (CPU RTL -> OpenSBI firmware -> Linux drivers -> rootfs), I can start over and make it an RV32IMAC CPU.
