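Contributor: millaker
Explanation video: https://youtu.be/wsnKy-woxdQ
Reproduce the 2023 experiment, switch the main ISA to RV32IMA, upgrade to Linux v6.1, and make sure that specific hardware peripherals (such as the NIC) work correctly.
Original project: 從零開始的RISC-V SoC架構設計與Linux核心運行 - 硬體篇 (RISC-V SoC architecture design and running the Linux kernel from scratch: hardware part)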
Reference link : VLSI tutorial: JTAG
`SBI` is a RISC-V specific term that stands for Supervisor Binary Interface. It acts as the bridge between the program running in supervisor mode and the underlying SEE (Supervisor Execution Environment). `OpenSBI` is one of the open source SBI implementations. A list of other implementations can be found in the RISC-V SBI documentation.
QEMU provides an emulated RISC-V 32-bit CPU that can be used to test our software and drivers. To build QEMU with the RISC-V target, configure the build with `--target-list=riscv32-softmmu`.
In order to build binary programs for our target ISA, we need a cross-compiler toolchain. riscv-gnu-toolchain hosts a suite of tools that we can use, including gcc and objdump. Configure the build with `--with-arch=rv32ima_zicsr_zifencei --with-abi=ilp32`.
`zicsr` and `zifencei` were separated from the `I` extension and must be specified explicitly.
Wikipedia
The first-stage bootloader is typically stored in ROM, or BootROM. It initializes the platform and loads the next-stage bootloader. In my case, the PYNQ-Z2 development board initializes the board peripherals, including DDR, when booting the ARM core, so there is no need to initialize DDR myself. One thing that concerns me is that in the previous implementation, the second-stage bootloader is loaded into "System SRAM", a dedicated memory space just for the boot sequence. The author wanted to follow the boot sequence of an embedded ARM core, but in my opinion, this is just a waste of BRAM on the chip.
Since the Vivado project file is available, I'll use the same project file to avoid trivial tasks.
Open the project using Vivado 2022.2
Three AMD/Xilinx proprietary IPs used in this project, AXI Smartconnect, AXI_APB Bridge and AXI Interconnect, require an update. I expect the behaviour of the IPs not to change after the revision, so no modification is needed. Because of this, I'm considering writing my own AXI interconnect and bridge.
After some inspection, I realized that these IPs were used to connect the CPU AXI ports to the PS. The interconnects can be omitted if the interface naming follows the Xilinx naming convention.
The rest is up to the EDA tools.
Resource | Utilization | Available | Utilization % |
---|---|---|---|
LUT | 40839 | 53200 | 76.77 |
LUTRAM | 879 | 17400 | 5.05 |
FF | 27078 | 106400 | 25.45 |
BRAM | 45.50 | 140 | 32.50 |
DSP | 36 | 220 | 16.36 |
IO | 17 | 125 | 13.60 |
BUFG | 5 | 32 | 15.63 |
Placement is not my concern at this moment, so I simply ignore the hardware implementation. From the table above, we can clearly see that the FPGA LUTs and FFs were not fully utilized, which means that there is some room for hardware improvements. BRAM, the FPGA analogue of on-chip SRAM, was underutilized as well, which means a bigger cache is possible. Since I don't know the implementation parameters used for this CPU yet, I'll come back to this later when I figure them out.
Analyze the worst negative slack path and seek improvement possibilities.
Previous work uses `generate hardware platform` to obtain the bitstream from the `.xsa` file, which can be opened by any archive software according to the article. I was curious about the `xsa` file and found this thorough explanation. The `xsa` file contains:
I'm only interested in the bitstream; therefore, using `generate bitstream` has the same output.
The system sees SD card as its secondary storage, and two partitions must be provided for the system to run. The first partition stores the bootloader and the operating system, the second partition stores rootfs.
On Linux, use `lsblk` to list block devices
The device called `sda` is the SD card connected to my computer. Since I got the SD card from the previous author, it is already populated with the required files.
We can distinguish the partitions by inspecting the FS Type field.
I'll come back to this later when I'm switching the bootloader or the operating system.
I followed the official instructions on how to boot the arm core.
After correctly setting up the board, we can see two serial devices being connected to my computer.
Access the second serial device `/dev/ttyUSB1` to access PYNQ/Linux.
I don't know what is being transmitted/received on the first serial device. Maybe I'll check the manual later.
TODO: Check Zynq-7000 series manual for ttyUSB0 and ttyUSB1.
PYNQ overlay provides us with an easy way to program the FPGA chip with our own bitstream. We can also access the AXI system bus from PS to verify our peripherals connected to the bus.
The original author chose this way to debug the peripherals including JTAG, SD card…
The first time I called `Overlay('soc_xsa.bit')` to download the bitstream onto the FPGA, Python emitted the error messages below:
Few people have mentioned this error, and I had no idea what was wrong with the `PortType` thing. I found this thread and this thread, which are similar to my case but with some slight differences. User stf's answer gave me a hint about the problem: `pynqmetadata`, used in the script, got updated and was not compatible with unmatched bitstream versions. So I downgraded the pynq prebuilt image from 3.0 to 2.7, and luckily the bitstream worked this time.
Now that I can download bitstream to PL, tests can be made via pynq overlay. The overall address mapping of the whole system is listed below:
Name | Size | Address | End |
---|---|---|---|
BootROM | 8KB | 0x00000000 | 0x00001FFF |
Reserved | | 0x00002000 | 0x0001FFFF |
SystemRAM | 128KB | 0x00020000 | 0x0003FFFF |
Reserved | | 0x00040000 | 0x03FFFFFF |
CPU Config | 4KB | 0x04000000 | 0x04000FFF |
Reserved | | 0x04001000 | 0x04001FFF |
Debug Monitor | 8KB | 0x04002000 | 0x04003FFF |
Reserved | | 0x04004000 | 0x07FFFFFF |
CLINT | 64KB | 0x08000000 | 0x0800FFFF |
Reserved | | 0x08010000 | 0x0BFFFFFF |
PLIC | 64MB | 0x0C000000 | 0x0FFFFFFF |
UART | 4KB | 0x10000000 | 0x10000FFF |
SPI | 4KB | 0x10001000 | 0x10001FFF |
MAC | 4KB | 0x10002000 | 0x10002FFF |
Reserved | | 0x10003000 | 0x7FFFFFFF |
DRAM | 512MB | 0x80000000 | 0x9FFFFFFF |
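When poking at the bus it helps to check which region an address falls into. The sketch below transcribes the table above into a lookup; the helper names `region_of` and `region_size` are made up for illustration and are not part of the original project.

```python
# Address map transcribed from the table above: (name, start, end), ends inclusive.
ADDRESS_MAP = [
    ("BootROM", 0x00000000, 0x00001FFF),
    ("SystemRAM", 0x00020000, 0x0003FFFF),
    ("CPU Config", 0x04000000, 0x04000FFF),
    ("Debug Monitor", 0x04002000, 0x04003FFF),
    ("CLINT", 0x08000000, 0x0800FFFF),
    ("PLIC", 0x0C000000, 0x0FFFFFFF),
    ("UART", 0x10000000, 0x10000FFF),
    ("SPI", 0x10001000, 0x10001FFF),
    ("MAC", 0x10002000, 0x10002FFF),
    ("DRAM", 0x80000000, 0x9FFFFFFF),
]


def region_of(addr):
    """Return (name, offset) for the region containing addr (hypothetical helper)."""
    for name, start, end in ADDRESS_MAP:
        if start <= addr <= end:
            return name, addr - start
    return "Reserved", None


def region_size(name):
    """Return the size in bytes of a named region (hypothetical helper)."""
    for region, start, end in ADDRESS_MAP:
        if region == name:
            return end - start + 1
    raise KeyError(name)
```

For example, `region_of(0x10001008)` resolves to offset 8 inside the SPI block, which is the status register in the SPI memory map later in this note.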
Use `Overlay()` to parse the bitstream and download it to the PL.
Use `MMIO()` to write/read data to/from the AXI bus.
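A write/read-back check over such an `MMIO` object can be sketched as follows. This is only an illustration: `FakeMMIO` is a hypothetical dictionary-backed stand-in so the sketch runs off-board, and `readback_test` is a made-up helper. It relies only on the `read(offset)` and `write(offset, data)` methods that `pynq.MMIO` also offers.

```python
class FakeMMIO:
    """Hypothetical stand-in for pynq.MMIO so the sketch runs without a board."""

    def __init__(self, base_addr, length):
        self.base_addr = base_addr
        self.length = length
        self._words = {}

    def _check(self, offset):
        # Only aligned 32-bit accesses inside the mapped window are allowed here.
        if offset % 4 or not 0 <= offset < self.length:
            raise ValueError("bad offset 0x%x" % offset)

    def write(self, offset, data):
        self._check(offset)
        self._words[offset] = data & 0xFFFFFFFF

    def read(self, offset):
        self._check(offset)
        return self._words.get(offset, 0)


def readback_test(mmio, offsets, pattern=0xA5A5A5A5):
    """Write an offset-dependent pattern to each word and read it back.

    Returns a list of (offset, expected, actual) tuples for mismatching words.
    """
    failures = []
    for offset in offsets:
        expected = (pattern ^ offset) & 0xFFFFFFFF
        mmio.write(offset, expected)
        actual = mmio.read(offset)
        if actual != expected:
            failures.append((offset, expected, actual))
    return failures
```

On the board, the same function could be passed a real `pynq.MMIO` instance; the base address to use from the PS side is an assumption that has to be taken from the Vivado address editor.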
Testing method:
We can see from the results that the memory and the bus both worked as expected. For system sram on the bus, the test is identical. Set the base address and modify it, which results in
UART requires some configuration like base address, baud rate.
TODO: Initializing details, will be useful when writing driver.
Connect a uart-usb converter to the TX, RX, VDD, GND pin on the board and monitor the output.
The SPI control register is defined as follows:
Bit Position | Field Name | Description |
---|---|---|
31:16 | 15'b0 | Reserved bits (write with zeros) |
15 | spi_cr1_del | Delay |
14 | spi_cr1_bidimode | Bidirectional data mode enable |
13 | spi_cr1_bidioe | Output enable in bidirectional mode |
12 | spi_cr1_crcen | Hardware CRC calculation enable |
11 | spi_cr1_crcnext | CRC transfer next |
10 | spi_cr1_dff | Data frame format (0: 8-bit, 1: 16-bit) |
9 | spi_cr1_rxonly | Receive only |
8 | spi_cr1_ssm | Software slave management |
7 | spi_cr1_ssi | Internal slave select |
6 | spi_cr1_lsbfirst | Frame format (0: MSB first, 1: LSB first) |
5 | spi_cr1_spe | SPI enable |
4:2 | spi_cr1_br | Baud rate control |
1 | spi_cr1_mstr | Master selection (0: Slave, 1: Master) |
0 | spi_cr1_cpol | Clock polarity (0: CK to 0 when idle, 1: CK to 1 when idle) |
0 | spi_cr1_cpha | Clock phase (0: First clock transition, 1: Second clock transition) |
After initializing the SPI module, we must set the SD card to SPI mode by sending appropriate commands. Then we can read sector information from the SD card.
The previous design of the BootROM is actually a readable/writable memory sitting on the bus. I guess it's for faster development and debugging. Therefore, there is nothing in the ROM when the PL is programmed, and an additional step is required to load the memory manually.
Use the existing debugger provided by the author to download the boot file to ROM.
The debugger was written in C# and is a Windows-only app, so I planned to write a Linux version of it.
I couldn't get the debugger to work. The debugger can detect the JTAG-USB device but emitted a `Get input buffer timeout` error. I don't know how the debugger works, so I left this issue unsolved and moved on to the next approach for now.
Since the BootROM is on the AXI bus, I can write data to it from the Python overlay. I uploaded `rom.bin` to the SD card and read it from Python.
And then start the CPU the same way the debugger does.
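Loading the ROM image from Python boils down to splitting the file into 32-bit words and writing them one by one. A minimal sketch, assuming little-endian words and any object with a `write(offset, data)` method (as `pynq.MMIO` provides); `bin_to_words` and `load_image` are hypothetical helper names, and the 8 KB limit comes from the address map.

```python
import struct

BOOTROM_SIZE = 8 * 1024  # BootROM size from the address map


def bin_to_words(data):
    """Split a binary image into little-endian 32-bit words, zero-padding the tail."""
    padded = data + b"\x00" * (-len(data) % 4)
    return list(struct.unpack("<%dI" % (len(padded) // 4), padded))


def load_image(mmio, data, base_offset=0, limit=BOOTROM_SIZE):
    """Write the image word by word through mmio.write(); return the word count."""
    if len(data) > limit:
        raise ValueError("image is %d bytes, limit is %d" % (len(data), limit))
    words = bin_to_words(data)
    for index, word in enumerate(words):
        mmio.write(base_offset + 4 * index, word)
    return len(words)
```

The size check mirrors the 8 KB BootROM limit, so an oversized image is rejected before anything is written.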
From the RISC-V UART output, we can observe that the boot code is actually running. bbl and vmlinux are loaded to System SRAM and ready for execution. However, the program got stuck and produced no other output. I have no idea what bbl did or why it got stuck, so I'll first study the bbl and vmlinux boot sequence, then try to compile it and run it with QEMU.
Try using the original bbl and vmlinux from google drive.
I followed the commands from the previous work to partition the SD card and format the two partitions.
- `fdisk` to create two new partitions.
- `mkfs` to format the partitions into FAT32 and ext3.

This time bbl never got loaded correctly, which resulted in an infinite loop inside the zero stage boot loader.
At first I thought that something went wrong with the `fdisk` command until I read SD card partition 1 in the Python overlay and saw the following content
The `FAT16` word shows that the file system has the FAT16 format, which is not the same as the partition type set with `fdisk`. According to the `mkfs.fat(8)` manual page, the `-F 32` option specifies the FAT size. With this option, it works.
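The check described above can be automated by looking at the file system type string in the boot sector. A minimal sketch, assuming the offsets from the Microsoft FAT specification (`BS_FilSysType` at byte 54 for FAT12/FAT16 and at byte 82 for FAT32). The spec treats this string as informational only, so this is a quick sanity check rather than a proper FAT type determination; `fat_type_label` is a hypothetical helper name.

```python
def fat_type_label(boot_sector):
    """Return the FAT type string found in a 512-byte boot sector, or "unknown"."""
    if len(boot_sector) < 512 or boot_sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid boot sector (missing 0x55AA signature)")
    label32 = boot_sector[82:90]  # BS_FilSysType for FAT32
    label16 = boot_sector[54:62]  # BS_FilSysType for FAT12/FAT16
    if label32.startswith(b"FAT32"):
        return "FAT32"
    if label16.startswith(b"FAT"):
        return label16.decode("ascii").strip()
    return "unknown"
```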
Now that I've reproduced the previous work, some questions came to me with this design.
`rom.bin`, `bbl`, `vmlinux` were built from source.

The first two questions can be solved by reading the source code. The third problem requires a hardware modification if the SRAM is removed.

`rom.bin` from source

If I compile the ROM code with the RISC-V GNU cross compiler from /rom, I get a link error as shown below:
The bootrom has max size 8 KB but the binary exceeds the limit. So I shortened some of the debug messages and successfully built the binary file. However, I couldn't boot from the new rom.bin I just built.
When analyzing what `setup.S` did, I realized that there were `ifdef` guards for UART initialization and cache init.
The corresponding defines were missing in the original Makefile. After adding this line
DEFINE := -DENABLE_UART_IRQ -DENABLE_CACHE -DREAL_ROM
`rom.bin` worked.
The current boot flow requires the ROM to set up the UART and the SPI SD card reader and to load bbl and vmlinux to a fixed location. Given this, I'm skeptical about the purpose of bbl (Berkeley Boot Loader) if vmlinux is already loaded in DDR. So I dived into the bbl source code to understand what I missed here.
The entry point of bbl can be found in the linker script `bbl.lds`, and it is located in `machine/mentry.S`. `reset_vector` simply resets all the registers and `mscratch` to zero. The machine mode trap handler is set to the bbl `trap_vector` with the following sequence
After all the reset, `init_first_hart()` is called. `init_first_hart()` in brief:
- `query_uart()` etc…
- `init_hart()`
  - `mstatus_init()`: Set `mstatus` for available extensions, like F extension, V extension…
  - `fp_init()`: Set up floating point control registers if the F extension is supported.
  - `delegate_trap()`: If supervisor mode is supported, send S-mode interrupts and exceptions to S-mode instead of M-mode. Set the `mideleg` and `medeleg` CSRs.
  - `setup_pmp()`: Set up PMP to allow access to all memory locations. (Details to be figured out…)
- `query_*` functions that search the device tree blob (dtb) for information about our platform hardware.
- `bootloader()`:
  - `filter_dtb()`: Place the dtb right after the kernel code.
  - `boot_other_hart()`: Get the entry point and enter supervisor mode with that entry.

When looking at `boot_other_hart()`, I found something I couldn't comprehend.

We can see that `boot_other_hart()` takes one `uintptr_t` as an argument, but it is named `unused` and has the compiler attribute `unused`. This argument really is unused, and I don't know what its purpose is. Looking at the GNU extension documentation:
unused
This attribute, attached to a variable, means that the variable is meant to be possibly unused. GCC does not produce a warning for this variable.
This attribute has no other meaning but to let compiler ignore warnings related to this variable.
Notice the define guard `BBL_BOOT_MACHINE` at line 19. It is used when riscv pk (RISC-V proxy kernel) is used, which uses bbl as a fake machine, so I can ignore that part.
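To make the `delegate_trap()` step listed earlier more concrete, the sketch below computes delegation masks from the cause numbers in the RISC-V privileged spec. The bit positions are from the spec; which traps to delegate is a choice, and the set used here (supervisor interrupts plus page faults, breakpoint, misaligned fetch and U-mode ecall) is an assumption that should be checked against the bbl source.

```python
# Interrupt bit positions in mip/mie/mideleg (RISC-V privileged spec).
IRQ_S_SOFT = 1
IRQ_S_TIMER = 5
IRQ_S_EXT = 9

# Exception cause codes (RISC-V privileged spec).
CAUSE_MISALIGNED_FETCH = 0
CAUSE_BREAKPOINT = 3
CAUSE_USER_ECALL = 8
CAUSE_FETCH_PAGE_FAULT = 12
CAUSE_LOAD_PAGE_FAULT = 13
CAUSE_STORE_PAGE_FAULT = 15


def mask_of(bits):
    """OR together one bit per listed cause number."""
    value = 0
    for bit in bits:
        value |= 1 << bit
    return value


# Assumed delegation choice: everything S-mode can handle on its own.
MIDELEG = mask_of([IRQ_S_SOFT, IRQ_S_TIMER, IRQ_S_EXT])
MEDELEG = mask_of([
    CAUSE_MISALIGNED_FETCH,
    CAUSE_BREAKPOINT,
    CAUSE_USER_ECALL,
    CAUSE_FETCH_PAGE_FAULT,
    CAUSE_LOAD_PAGE_FAULT,
    CAUSE_STORE_PAGE_FAULT,
])
```

With this selection, `MIDELEG` is `0x222` and `MEDELEG` is `0xB109`. Anything not delegated, such as illegal instruction or access faults, still traps to M-mode.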
In conclusion, the zero stage boot loader `rom.bin` and bbl do almost the same thing and can therefore be reduced to a more minimal boot code. bbl should be swapped for openSBI for better portability and, if possible, a more capable boot loader.
In the previous work, the same physical memory is shared between the two operating systems, and I found no sign that the ARM core is turned off. Therefore, to make both operating systems live together, the easiest way is to assign a separate DDR region to each OS via the dtb until I learn another approach to this issue. I've gone through multiple official documents and found no way to completely stop the PS (ARM core).
Hardware RTL analysis
Name | Size | Address | End |
---|---|---|---|
BootROM | 8KB | 0x00000000 | 0x00001FFF |
Reserved | | 0x00002000 | 0x0001FFFF |
SystemRAM | 128KB | 0x00020000 | 0x0003FFFF |
Reserved | | 0x00040000 | 0x03FFFFFF |
CPU Config | 4KB | 0x04000000 | 0x04000FFF |
Reserved | | 0x04001000 | 0x04001FFF |
Debug Monitor | 8KB | 0x04002000 | 0x04003FFF |
Reserved | | 0x04004000 | 0x07FFFFFF |
CLINT | 64KB | 0x08000000 | 0x0800FFFF |
Reserved | | 0x08010000 | 0x0BFFFFFF |
PLIC | 64MB | 0x0C000000 | 0x0FFFFFFF |
UART | 4KB | 0x10000000 | 0x10000FFF |
SPI | 4KB | 0x10001000 | 0x10001FFF |
MAC | 4KB | 0x10002000 | 0x10002FFF |
Reserved | | 0x10003000 | 0x7FFFFFFF |
DRAM | 512MB | 0x80000000 | 0x9FFFFFFF |
The address mapping
The `cnt` field is the threshold for almost empty and almost full for TX and RX.

Memory mapping:
Name | Offset | Description |
---|---|---|
Priority | 0x0000000 | Zero: Never interrupt |
Pending | 0x0001000 | Current status of the interrupt, 1 bit per source |
Enable | 0x0002000 | Decides if the interrupt source is enabled |
Priority threshold | 0x0200000 | Priority threshold for each context (hart) |
Claim/Clear | 0x0200000 + 0x4 | Claim or Clear interrupt for each context (hart) |
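The offsets in the table expand into per-source and per-context register addresses. The sketch below follows the layout in the RISC-V PLIC specification (4 bytes per source priority, one pending/enable bit per source, 0x80 bytes of enable bits per context, 0x1000 bytes per context for threshold and claim). The base address is the PLIC entry of the system address map; the function names are made up for illustration.

```python
PLIC_BASE = 0x0C000000  # PLIC base from the system address map


def priority_addr(source, base=PLIC_BASE):
    """Address of the 32-bit priority register of an interrupt source."""
    return base + 4 * source


def pending_addr(source, base=PLIC_BASE):
    """Address of the pending word holding this source's bit (bit = source % 32)."""
    return base + 0x1000 + 4 * (source // 32)


def enable_addr(source, context, base=PLIC_BASE):
    """Address of the enable word for a source in a given context."""
    return base + 0x2000 + 0x80 * context + 4 * (source // 32)


def threshold_addr(context, base=PLIC_BASE):
    """Address of the priority threshold register of a context."""
    return base + 0x200000 + 0x1000 * context


def claim_addr(context, base=PLIC_BASE):
    """Address of the claim/complete register of a context."""
    return threshold_addr(context, base) + 4
```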
According to PLIC spec:
Context
Interrupt targets are usually hart contexts, where a hart context is a given privilege mode on a given hart (though there are other possible interrupt targets, such as DMA engines).
The PLIC spec defines all the memory mapping and the meaning of each register, so there is no need to write my own firmware. However, the Verilog source code defines two extra fields, `PLIC_INT_TYPE` and `PLIC_INT_POL`, whose usage I cannot tell just from their names.
In `cpu/plic.sv` I found out that those two fields are gateway configs which determine the type (edge or level trigger) and polarity (high or low trigger) of each source. The default config is high level trigger.
Memory mapping:
Name | Offset | Note |
---|---|---|
msip | 0x0 | Machine mode software interrupt |
mtimecmp | 0x4000 | Machine mode timer compare register |
mtime | 0xBFF8 | Timer register |
`mtime` and `mtimecmp` are defined in the RISC-V privileged ISA section 3.2.1.
The machine timer interrupt becomes pending when `mtime` is greater than or equal to `mtimecmp`.
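The pending condition, and the way a 64-bit `mtimecmp` has to be updated with 32-bit stores, can be sketched as follows. The three-store sequence (set the low half to all ones first, then the high half, then the real low half) is the one suggested in the privileged spec to avoid a spurious interrupt from an intermediate value. The offsets come from the table above; the function names are hypothetical.

```python
MTIMECMP_OFFSET = 0x4000  # from the CLINT memory mapping above
MASK32 = 0xFFFFFFFF


def timer_pending(mtime, mtimecmp):
    """The machine timer interrupt is pending while mtime >= mtimecmp."""
    return mtime >= mtimecmp


def mtimecmp_writes_rv32(value):
    """Return the (offset, word) stores that set a 64-bit mtimecmp on RV32.

    The low word is first set to all ones so that no intermediate value of
    mtimecmp is smaller than both the old and the new comparison value.
    """
    low = value & MASK32
    high = (value >> 32) & MASK32
    return [
        (MTIMECMP_OFFSET, MASK32),
        (MTIMECMP_OFFSET + 4, high),
        (MTIMECMP_OFFSET, low),
    ]
```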
Memory mapping:
Name | Offset | Description |
---|---|---|
SPI_CR1 | 0x00 | Control register 1 |
SPI_CR2 | 0x04 | Control register 2 |
SPI_SR | 0x08 | Status register |
SPI_DR | 0x0C | Data register |
Control register 1:
- `cpha`: Clock phase
- `cpol`: Clock polarity
- `mstr`: Master selection
- `br`: Baud rate control
- `spe`: SPI enable
- `lsbfirst`: shifting from lsb or msb
- `ssi`: Internal slave select
- `ssm`: Software slave management, whether to use the internal `ssi` bit to select
- `rxonly`: Receive only
- `dff`: Data frame format, 8 or 16 bits per data frame
- `crcnext`: Transmit CRC next
- `crcen`: Hardware CRC calculation enable
- `bidioe`: Output enable in bidirectional mode
- `bidimode`: Bidirectional data mode enable
- `del`: Delay

Control register 2:

Status register:
- `bsy`: SPI busy flag
- `ovr`: Overrun flag
- `modf`: Mode fault flag
- `crcerr`: CRC error flag
- `udr`: Underrun flag
- `chside`: Channel side flag
- `rxne`: Receive buffer not empty
- `txe`: Transmit buffer empty

Normal execution flow requires the CPU to read `txe` and `rxne` in SPI_SR to determine whether the TX/RX buffers need attention. This wastes CPU cycles on trivial tasks, so a DMA is introduced inside the SPI core. The DMA writes to or reads from the buffers whenever txe/rxne is set, freeing the CPU for other critical tasks.
See STM32 RM0090 Reference manual section 28.3.5 for more data transmission details.
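Since the field names mirror the STM32 SPI, a control word can be assembled from named fields. The sketch below assumes the STM32 RM0090 `SPI_CR1` bit positions; this is an assumption for this custom core, which adds a `del` field and may place bits differently, so the positions should be verified against the RTL before use. `pack_cr1` and `unpack_cr1` are hypothetical helpers.

```python
# name: (shift, width), following STM32 RM0090 SPI_CR1 (assumed, not verified in RTL)
CR1_FIELDS = {
    "cpha": (0, 1),
    "cpol": (1, 1),
    "mstr": (2, 1),
    "br": (3, 3),
    "spe": (6, 1),
    "lsbfirst": (7, 1),
    "ssi": (8, 1),
    "ssm": (9, 1),
    "rxonly": (10, 1),
    "dff": (11, 1),
    "crcnext": (12, 1),
    "crcen": (13, 1),
    "bidioe": (14, 1),
    "bidimode": (15, 1),
}


def pack_cr1(**fields):
    """Build a CR1 value from keyword fields, rejecting out-of-range values."""
    value = 0
    for name, field_value in fields.items():
        shift, width = CR1_FIELDS[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError("%s=%r does not fit in %d bit(s)" % (name, field_value, width))
        value |= field_value << shift
    return value


def unpack_cr1(value):
    """Split a CR1 value back into a dict of named fields."""
    return {
        name: (value >> shift) & ((1 << width) - 1)
        for name, (shift, width) in CR1_FIELDS.items()
    }
```

A typical master configuration with software slave management would be `pack_cr1(mstr=1, br=2, spe=1, ssm=1, ssi=1)`.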
Memory mapping:
Name | Offset | Description |
---|---|---|
DMA_SRC | 0x00 | Source address for DMA transfer |
DMA_DEST | 0x04 | Destination address for DMA |
DMA_LEN | 0x08 | Length of DMA transfer |
DMA_CON | 0x0C | DMA control register |
DMA_IE | 0x10 | DMA interrupt enable register |
DMA_IP | 0x14 | DMA interrupt pending register |
DMA_IC | 0x18 | DMA interrupt clear register |
DMA_WDT_CNT | 0x1C | DMA watchdog timer count register |
The SPI DMA is responsible for transferring data from DDR to the SD card. It is capable of interacting with the SPI controller, sending tx/rx commands.
Control register:
- `WORD`, `HWORD`, `BYTE`
- `FIXED`, `INCR`, `CONST`
APB Memory mapping:
Name | Address | Description |
---|---|---|
DBGAPB_DBG_EN | 12'h000 | Debug enable register |
DBGAPB_INST | 12'h004 | Debug instruction register |
DBGAPB_INST_WR | 12'h008 | Debug instruction write register |
DBGAPB_WDATA_L | 12'h010 | Debug write data low register |
DBGAPB_WDATA_H | 12'h014 | Debug write data high register |
DBGAPB_WDATA_WR | 12'h01C | Debug write data write enable register |
DBGAPB_RDATA_L | 12'h020 | Debug read data low register |
DBGAPB_RDATA_H | 12'h024 | Debug read data high register |
Monitor memory mapping:
Definition | Address | Description |
---|---|---|
DBGMON_BP0 | 13'h1100 | Debug monitor breakpoint 0 |
DBGMON_BP1 | 13'h1108 | Debug monitor breakpoint 1 |
DBGMON_BP2 | 13'h1110 | Debug monitor breakpoint 2 |
DBGMON_BP3 | 13'h1118 | Debug monitor breakpoint 3 |
DBGMON_WP0 | 13'h1120 | Debug monitor watchpoint 0 |
DBGMON_WP1 | 13'h1128 | Debug monitor watchpoint 1 |
DBGMON_WP2 | 13'h1130 | Debug monitor watchpoint 2 |
DBGMON_WP3 | 13'h1138 | Debug monitor watchpoint 3 |
DBGMON_VC_EXC | 13'h1140 | Debug monitor vector catch exception |
DBGMON_VC_IRQ | 13'h1144 | Debug monitor vector catch interrupt |
DBGMON_DELAY | 13'h1148 | Debug monitor delay |
DBGMON_STOP_TRACE | 13'h114c | Debug monitor stop trace |
DBGMON_IE | 13'h1150 | Debug monitor interrupt enable |
There are more addresses mapped to architectural registers.
To trace instruction execution sequence after a specific PC, I need to:
Ref: RiscV ISA manual
This section lists the differences between RV32I and RV64I, which act as a to-do list or reference for me.
`XLEN`: width of the integer registers in bits, 64 for `rv64i` and 32 for `rv32i`. This number is also related to the size of the maximum supported address space.

RiscV unprivileged ISA Chapter 4.0:
- This chapter describes the RV64I base integer instruction set, which builds upon the RV32I variant described in Chapter 2.
Chapter 4.2:
- Most integer computational instructions operate on XLEN-bit values.
Chapter 13.1, 13.2:
- MULW is an RV64 instruction…
- DIVW and DIVUW are RV64 instructions…
- REMW and REMUW are RV64 instructions…
Chapter 14.2:
- LR.D and SC.D act analogously on doublewords and are only available on RV64.
The 64-bit ISA is built upon RV32I. I will list out the differences between them.
- […] `XLEN` value).
- […] `ADDIW`, `SLLIW`, `SRLW`, `SUBW`, `SRAW`). These can be removed in RV32I.
- `LD` and `SD` can be removed.
- […] `XLEN`.
- `LR.D` and `SC.D` can be removed.
- `AMO*.D` can be removed.

Modified files:
- `alu.sv`: Remove 64-bit arithmetic logic
- `dec.sv`: Remove 64-bit instructions, which means the removed instructions will be decoded to `ill_isns`.
- `mdu.sv`: Mul/Div behavior

RISC-V privileged ISA Chapter 3.1.6:
- `MXLEN` is the effective `XLEN` in M-mode
- For RV32 only, there is the `mstatush` register, which contains the same fields as the upper 32 bits of RV64 `mstatus`.
- When MXLEN=32, the SXL and UXL fields do not exist, and SXLEN=32 and UXLEN=32.
- `mtvec` is `MXLEN` bits long.
- `medelegh`, `menvcfgh`, `mseccfgh` are aliases of the upper half of their 64-bit non-h counterparts.
- `mscratch`, `mepc`, `mcause`, `mtval`, `mconfigptr` are `MXLEN` bits
- `mtime` is still 64-bit precision. In RV32, memory-mapped writes to mtimecmp modify only one 32-bit part of the register.

Physical memory protection Chapter 3.7.1:
- 16 pmpcfg registers are 32-bit compared to 8 64-bit regs in RV64.
- pmp address is 32-bit as well, storing the `addr[33:2]` in the address field.
- S-mode has almost the same changes in the CSR fields.
SV32 Chapter 11.3:
- Two level page table is used compared to three in SV39.
- The hardware page table walker does not require a change because the hardware checks if this entry is a leaf node and stops there. The level isn't that relevant to the walking process.
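The two-level walk is easy to see from how an Sv32 virtual address is split: 10 bits of VPN[1], 10 bits of VPN[0] and a 12-bit page offset, with 4-byte page table entries. A minimal sketch of the field extraction and the PTE address computed at each level; the helper names are made up for illustration.

```python
PAGE_OFFSET_BITS = 12
VPN_BITS = 10
PTE_SIZE = 4  # Sv32 page table entries are 4 bytes


def split_sv32(va):
    """Split a 32-bit virtual address into (vpn1, vpn0, offset)."""
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn0 = (va >> PAGE_OFFSET_BITS) & ((1 << VPN_BITS) - 1)
    vpn1 = (va >> (PAGE_OFFSET_BITS + VPN_BITS)) & ((1 << VPN_BITS) - 1)
    return vpn1, vpn0, offset


def pte_addr(table_base, vpn):
    """Physical address of the PTE selected by one VPN field in a page table."""
    return table_base + vpn * PTE_SIZE
```

If the level-1 PTE is already a leaf, the walk stops there and maps a 4 MiB megapage, which is why the walker logic itself does not depend on the number of levels.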
The upper 32 bits of `medeleg` in RV64 are hardwired to zero, so `medelegh` can be all zero. The `menvcfg` and `mseccfg` registers do not exist in this design, so the additional *h versions can be ignored.
The AXI Interconnect bitwidth is 32 bits, which conforms to RV32. Take AR channel as an example,
In RV64, the read is done by two bursts, therefore no need to change the bus and the corresponding masters (cache, …).
I found only the system level testbench for `cpu_wrap` under `scripts/`. I think a full test of the whole system is overkill, and verifying only `cpu_top` is adequate since I did not modify files other than the CPU core. I need to figure out how to compile and use riscv-tests. There is no documentation on how to do this.
Simple rv32i test assembly: https://github.com/hamsternz/simple-riscv/blob/main/sw/asm/isa_test.S
Verify the core first, inspect the IO of the core
Most of the output ports are irrelevant when verifying bare-metal RV32, so I can ignore them. I wrote a simple memory model for `imem` and `dmem`.
bbl (boot loader): confirm that Sv32 (MMU) works
After watching the openSBI Deep Dive by WD, I could grasp the big picture of openSBI. There are many features, but I need only a small set of them to boot Linux. The new boot flow will be like this:
To keep the ZSBL small enough, I think `iolib` can be removed, with LEDs used to indicate the boot process. As soon as openSBI is ready, the console can then be initialized. This is the initial plan, so changes may be made to this boot flow. I don't need to do all the hardware (PLL, DDR, …) inits, since the Zynq FSBL will do this for me.
To further understand openSBI in action, I use qemu to try out openSBI firmware.
Prepare riscv cross-compiler and compile openSBI with default qemu platform and no payload:
Then execute the firmware with qemu:
I got the openSBI output:
Notice that this platform is the default platform for QEMU emulation, and we need to implement a new platform for our SoC. The execution stopped at `Test payload running` because I didn't specify any next stage payload.
Next, I compiled linux kernel v6.4 from source and got a bootable image.
Then, I prepared a rootfs image with buildroot all with default config.
To compile openSBI firmware with a payload, the linux kernel in my case, the following command is used:
and emulate with:
I got an error running the command above:
If I delete the `append` flag, the kernel starts but cannot load the filesystem correctly.
I need to specify the root block device in the kernel command line, but I cannot append anything without the `kernel` option.
My guess is that the tutorial uses an older version of QEMU. So instead I ran the example using `fw_jump` (with jump address):
Now that I'm more familiar with openSBI and the Linux boot sequence, I will try to configure the QEMU system like my SoC's environment and do the openSBI debugging on QEMU first. It's simpler and faster.
After searching for how to specify the kernel command line at compile time, I found out that there is a config dedicated to this purpose. In `arch/riscv/Kconfig`:
I can set `CMDLINE_FORCE` to always use the default kernel command string. So I reconfigured the .config file to use the QEMU default:
and rebuilt the kernel.
When running the `fw_payload` openSBI firmware, the same issue occurred.
The kernel command line was empty.
😅 It turned out that I forgot to recompile the openSBI firmware with the latest Linux kernel image. This shows an important drawback of booting the Linux kernel directly from openSBI firmware: the firmware must be rebuilt every time the kernel changes.
Useful reference: Linux Kernel configuration list
openSBI relies on the previous bootloader to load its firmware to DDR. The zero stage bootloader is quite simple since the clocks and DDR are initialized by the PS.
To emit zsbl boot process information, we can use LEDs or UART as reliable feedback. Since the bus is ready when running the boot ROM, UART is enabled, as it is more informative than LED indicators. The UART init sequence is similar to the openSBI firmware but with only `putc` and `puts` capabilities.
Ref: How to use MMC/SDC
Ref: SPI
In order to use MMC/SDC in my system, I need to first initialize SPI and then put SD card into SPI mode. The following shows the initialization process:
Note:
- ACMD(N) is a sequence of CMD55-CMD(N)
- SPI mode is block addressing
After putting SD card into SPI mode, we can send CMD17 to read a block and CMD24 to write a block. Upon receiving a valid response for read/write block commands, we can utilize DMA to move the data packets.
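Every SD command in SPI mode is a 6-byte frame: a start/transmission byte (`0x40 | index`), a 32-bit big-endian argument, and a CRC7 followed by a stop bit. The sketch below builds such frames, assuming the standard CRC7 polynomial x^7 + x^3 + 1; `crc7` and `sd_command_frame` are hypothetical helper names and not functions from the boot ROM source.

```python
def crc7(data):
    """CRC7 with polynomial x^7 + x^3 + 1 (0x09), as used by SD/MMC commands."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            in_bit = (byte >> bit) & 1
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if in_bit ^ top:
                crc ^= 0x09
    return crc


def sd_command_frame(index, argument):
    """Build the 6-byte SPI-mode frame for CMD<index> with a 32-bit argument."""
    if not 0 <= index < 64:
        raise ValueError("command index must be 0..63")
    body = bytes([0x40 | index]) + (argument & 0xFFFFFFFF).to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])
```

CMD0 and CMD8 are the two commands whose CRC must be valid before the card is in SPI mode, which is why their checksum bytes (`0x95` and `0x87`) are often hard-coded. An ACMD is sent as a CMD55 frame followed by the frame of the actual command.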
I'm not 100% sure how the custom DMA works at this point
`dma_cfg` has the function signature shown above. The macro `dma_spi2buf` reads data from the SPI data register to the destination address. At first, I was skeptical about the `src` config `0xffffffff`. But later, when I looked at the RTL source
I found that `TYPE_CONST` means SPI mode, which is different from `TYPE_FIXED` for a fixed address. `dma_src`, which is configured as `0xffffffff`, acts as a mask on the data. The `bypass` probably means that the DMA does not need to read via AXI bus -> APB -> SPI.
To load files from the bootable section partition 0, I need to understand the details of the filesystem used.
Ref: Microsoft FAT spec
With adequate file operations, we can load the elf file from SD card to DDR.
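Locating file data on a FAT32 volume starts from a few BPB fields in the boot sector. The sketch below uses the field offsets and the first-data-sector formula from the Microsoft FAT specification for FAT32 (where the root directory occupies no fixed sectors). The function names are hypothetical, and sector numbers are relative to the start of the partition.

```python
import struct


def parse_fat32_bpb(sector):
    """Extract the BPB fields needed to locate clusters on a FAT32 volume."""
    bytes_per_sec, sec_per_clus, rsvd_sec_cnt, num_fats = struct.unpack_from(
        "<HBHB", sector, 11
    )
    (fat_sz32,) = struct.unpack_from("<I", sector, 36)
    (root_clus,) = struct.unpack_from("<I", sector, 44)
    return {
        "bytes_per_sec": bytes_per_sec,
        "sec_per_clus": sec_per_clus,
        "rsvd_sec_cnt": rsvd_sec_cnt,
        "num_fats": num_fats,
        "fat_sz32": fat_sz32,
        "root_clus": root_clus,
    }


def first_data_sector(bpb):
    """Sector where cluster 2 starts: reserved area plus all FAT copies."""
    return bpb["rsvd_sec_cnt"] + bpb["num_fats"] * bpb["fat_sz32"]


def first_sector_of_cluster(bpb, cluster):
    """First sector of a data cluster; clusters are numbered from 2."""
    if cluster < 2:
        raise ValueError("data clusters start at 2")
    return (cluster - 2) * bpb["sec_per_clus"] + first_data_sector(bpb)
```

Reading a file then means following the cluster chain in the FAT, starting from the first cluster stored in the directory entry, and reading `sec_per_clus` sectors for each cluster.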
The new boot flow will look like this graph. The zero stage boot loader is responsible for loading openSBI firmware to DDR. The Linux kernel is bundled with openSBI firmware as a payload. After openSBI initialization, the CPU will execute in supervisor mode and control will be transferred to the kernel.
Create a new platform in openSBI following the official guide. The file structure will look like this:
`Kconfig` and `defconfig` provide build time configuration options. `platform.c` provides the `struct sbi_platform` object for building openSBI firmware.
The official repo kindly provides a template for newly built platforms. There is a `generic` platform used by many SoC vendors including Andes, SiFive, THead. The `generic` platform is an FDT (flattened device tree) based platform, and it's really overkill in my case. The hardware info can be hardcoded in the firmware for my design. However, I need to figure out how openSBI generates and passes the device tree blob to the next stage if I choose this method for the new platform.
I created a new platform called `amp` for asymmetric multiprocessing. To make debugging firmware easier, I need to make the serial console work as soon as possible. I followed how SiFive implemented their own UART firmware and created `amp-uart.[ch]` under `sbi_utils/serial`.
I cannot find any specification for the custom UART controller, so I searched the original RTL source code. See the Hardware RTL UART controller section for more detail. I then implemented `putc` and `getc` for `struct sbi_console_device`.
`puts` is not required because `sbi_console` will check if it is implemented and choose between using `puts` or iterative `putc`s.
Next the irqchip (PLIC controller). See the Hardware RTL PLIC section for more memory mapping details. OpenSBI provides a set of PLIC APIs in `sbi_utils/plic.c`, which I can use in the firmware. Simply fill in my PLIC config
and I can use the implemented PLIC firmware.
Next the timer firmware. The hardware description can be found in the CLINT section. An SBI timer device must implement the following attributes
`timer_event_start` and `timer_event_stop` allow the operating system to do scheduling and other timing related operations.
With uart controller, plic, clint firmware, my platform can be initialized.
After adding platform `amp` to the compilation config options, I built the platform with:
To specify platform info, use `PLATFORM_RISCV_ISA`, `PLATFORM_RISCV_ABI`, `PLATFORM_RISCV_XLEN` for a specific architecture. For example:
After loading openSBI firmware to DDR, zsbl jumps to the firmware entry point and the firmware starts its work. Then I got this error message:
I also encountered a situation where the CPU jumps back to 0x0, which restarts the ROM code again. I'm not sure what caused this issue. The next step will be enabling the JTAG debugger and correctly separating physical memory between ARM and RISC-V.
The JTAG debugger written in C# provided by the previous work runs into an error when I follow the same steps. To make sure that the FT232H chip is functioning, I will do some basic tests on it. If the chip is working, then it's either a connection issue or an RTL failure.
Ref: FTDI-in-C
Ref: FTDI driver programming guide
Ref: FTDI JTAG
I tried the example driver code for the FT232H chip from Reference 1 and found no issue running the code. The official read buffer code is like:
and the one in the debugger
Despite the different programming languages, there was a retry limit in the debugger. I commented out the retry limit and did a quick try, then it worked!
This is an important milestone because it allows me to trace instructions executed by the CPU and probe some internal states. A future plan is to port this debugger to Linux for convenience, as Windows is not my main working environment.
Ref: https://gist.github.com/yunqu/827862e580a5f9b069eccdfcdcf70398
Following the tutorial, get the pynq boot image info
Extract the device tree with
convert it to a human-readable format and identify the physical memory field
change `reg = <0x00 0x20000000>` to `reg = <0x00 0x10000000>`, then recompile the device tree.
then repackage the boot image with
After a reboot, I checked the memory usage:
I searched for other methods but still failed to overcome this issue.
I found that my platform timer init had the wrong implementation, which executed `ebreak` and halted the openSBI runtime. openSBI successfully printed out all the hart info and platform info after a quick fix:
Notice that some fields have weird values, like `Boot HART Base ISA` and `Boot HART ISA Extensions`, and the dummy payload did not print out `test payload running`. The debugger showed that
after an `mret`, the CPU caught an InstructionAccessFault exception.
`Boot HART Base ISA` was implemented as `rv64eimac` in the RTL, so there is no issue.
`Boot HART ISA Extensions` requires platform code implementing `extensions_init`. So I need to implement that. It is related to PMP as well.
Ref: OpenSBI Domain
If I removed the domain memory region I added myself in the sbi firmware, the test payload can run without error.
The first region is the PLIC config. Then come the two regions protecting the openSBI firmware. I couldn't find which context added the fourth region. The last region is by default the rest of the memory space. The payload is a `while(1) wfi();` loop.
Prepare toolchain beforehand, then put in custom drivers.
Add driver files and modify Makefile
Configure features
Compile kernel
Fix some minor issues like requiring `zifencei`, `DECLARE_TASKLET`.
Then bundle the resulting `Image` with openSBI firmware as a payload.
Then boot from SBI gave an error,
Some irrelevant information has been stripped. We can see that next address is the payload (Linux kernel entry) address and next arg1 is the device tree address. So control has been passed to the kernel, but nothing is shown on the screen. The kernel doc about RISC-V booting only mentions that `a0` holds the hartid and `a1` the FDT address. Looking into the CPU instruction trace, PC got stuck at 0xffffffff800000dc. From the kernel disassembly
This code can be mapped to `arch/riscv/kernel/head.S`
Lines 5 ~ 6 set up a temporary trap address that points to a `wfi()` `j` combo, which will spin forever.
After a LoadPageFault, PC jumps to the spin address. It is clear that something is wrong with the PMP and PMA configs that I didn't handle in the openSBI firmware. In openSBI firmware, there is a call to `sbi_hart_pmp_configure(struct sbi_scratch *scratch)` when initializing the platform. This function configures the PMP CSRs according to the `Domain` previously set.
The hang happened in `fdt_check_header(params)`, where the kernel tries to parse the device tree passed from the previous boot stage. My guess was that the location where the DTB stays when compiling openSBI firmware with `FW_FDT_PATH` is not accessible from S/U mode. So I tried moving the DTB to a higher valid address (0x98000000) in the boot code, which resulted in:
Luckily the kernel gave some feedback, but it was still stuck on `Initmem setup node0`. I immediately recognized a suspicious spot where `Initmem` ranges from `0x80400000` to `0xa03fffff`. The upper bound should be `0x9fffffff`. It turned out that the device tree was wrong.
The last two numbers in the reg field are the size. Since I moved the DDR base from `0x80000000` to `0x80400000`, the size must be shrunk too. Solved.
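The arithmetic behind this fix is short enough to write down. The sketch below recomputes the memory size for a moved base, using the DRAM range from the address map (0x80000000, 512 MB); the helper names are made up for illustration.

```python
DDR_BASE = 0x80000000
DDR_SIZE = 0x20000000  # 512 MB, from the address map


def last_address(base, size):
    """Last byte address covered by a (base, size) memory node."""
    return base + size - 1


def shrunk_size(new_base, ddr_base=DDR_BASE, ddr_size=DDR_SIZE):
    """Size to put in the device tree when the usable region starts at new_base."""
    ddr_end = ddr_base + ddr_size
    if not ddr_base <= new_base < ddr_end:
        raise ValueError("new base is outside the DDR range")
    return ddr_end - new_base
```

Keeping the old size with the new base gives `last_address(0x80400000, 0x20000000) == 0xA03FFFFF`, which is exactly the wrong upper bound printed by the kernel. The corrected size is `shrunk_size(0x80400000) == 0x1FC00000`, which ends at `0x9FFFFFFF`.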
Then the kernel got stuck at this point, with no error message and no other feedback. When I looked at the CPU PC, it was not at a dead end but still running.
Because there was no feedback (error message) from the kernel, I inserted some custom messages into `start_kernel`.
After enabling local interrupts, the kernel hangs. By looking at the trace, I found that the CPU got a timer interrupt and never came back to the kernel startup code. I realized that my timer firmware might have some issue and revised it using the ACLINT mtimer provided by openSBI, which solved the issue.
ACLINT spec 1.1:
The RISC-V ACLINT specification is defined to be backward compatible with the SiFive CLINT
specification.
The busybox init was built with floating point instructions, while my firmware does not provide FP emulation, so I need to rebuild a new rootfs.
Compile busybox
Make rootfs
Create a minimal rcS file:
The rest is the same as before. Make the disk using mkfs.ext3, copy files …
Quick demo of this version:
Now that I have a basic grasp of the entire flow, including CPU RTL -> openSBI firmware -> Linux drivers -> rootfs, I can start all over again and make it an RV32IMAC CPU.