# Linux Kernel Project: Building a RISC-V Compatible Processor and Running the Linux Kernel
> Contributor: millaker
> Walkthrough video: https://youtu.be/wsnKy-woxdQ

## Task description
Reproduce the [2023 experiment](https://hackmd.io/@sysprog/S1jNiYgr2), switch the main ISA to RV32IMA, upgrade to Linux v6.1, and confirm that specific hardware peripherals (such as the NIC) work correctly.

Original project: [RISC-V SoC Architecture Design and Running the Linux Kernel from Scratch - Hardware Part](https://hackmd.io/@w4K9apQGS8-NFtsnFXutfg/B1Re5uGa5#SoC%E6%9E%B6%E6%A7%8B)

#### JTAG
Reference link: [VLSI tutorial: JTAG](https://vlsitutorials.com/jtag-architecture-overview/)

#### OpenSBI
`SBI` is a RISC-V specific term that stands for supervisor binary interface. It acts as the bridge between the program running in supervisor mode and the underlying SEE (supervisor execution environment). `OpenSBI` is one of the open source SBI implementations. A list of other implementations can be found in the [riscv SBI documentation](https://github.com/riscv-non-isa/riscv-sbi-doc).

#### Qemu
Qemu provides an emulated RISC-V 32-bit CPU that can be used to test our software and drivers. To build qemu with the riscv target, configure the build with `--target-list=riscv32-softmmu`.

#### riscv-gnu-toolchain
In order to build binary programs for our target ISA, we need a cross-compiler toolset that targets it. [riscv-gnu-toolchain](https://github.com/riscv-collab/riscv-gnu-toolchain) hosts a suite of tools that we can use, including gcc and objdump. Configure the build with `--with-arch=rv32ima_zicsr_zifencei --with-abi=ilp32`.
> `zicsr` and `zifencei` were separated from the `I` extension and must be specified explicitly.

#### Bootloader
[Wikipedia](https://en.wikipedia.org/wiki/Booting#Modern_boot_loaders)

The first stage bootloader is typically stored in ROM, or BootROM. It initializes the platform and loads the next stage bootloader. In my case, the PYNQ-Z2 development board initializes the board peripherals, including DDR, when booting the ARM core, so there is no need to init DDR myself.
One thing that concerns me is that in the previous implementation, the second stage bootloader is loaded into "System SRAM", a dedicated memory space just for the boot sequence. The author wanted to follow the boot sequence of an embedded ARM core, but in my opinion, this is just a waste of BRAM on the chip.

## TODO: Reproduce PYNQ-Z2: AMP (Arm + RISC-V)
> [Running Linux on an FPGA-based RISC-V processor](https://hackmd.io/@sysprog/S1jNiYgr2)

![image](https://hackmd.io/_uploads/ByTY8RpUR.png)
![image](https://hackmd.io/_uploads/SJn12T6IR.png)

### Obtain previous project file
Since the Vivado project file is available, I'll use the same project file to avoid trivial tasks. Open the project using Vivado 2022.2.

![image](https://hackmd.io/_uploads/ByAkyC0V0.png)

Three AMD/Xilinx proprietary IPs used in this project, AXI Smartconnect, AXI_APB Bridge and AXI Interconnect, require an update. I expect the behaviour of the IPs not to change after the revision, so no modification is needed. Because of this, I'm considering writing my own AXI interconnect and bridge. After some inspection, I realized that these IPs were used to connect the CPU AXI ports to the PS. The interconnects can be omitted if the interface naming follows the Xilinx naming convention.

The rest is up to the EDA tools:
1. Synthesis
2. Implementation (Place and Route)
3. Generate bitstream

### Implementation results
| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 40839       | 53200     | 76.77         |
| LUTRAM   | 879         | 17400     | 5.05          |
| FF       | 27078       | 106400    | 25.45         |
| BRAM     | 45.50       | 140       | 32.50         |
| DSP      | 36          | 220       | 16.36         |
| IO       | 17          | 125       | 13.60         |
| BUFG     | 5           | 32        | 15.63         |

Placement is not my concern at the moment, so I set the details of the hardware implementation aside. From the table above, we can clearly see that the FPGA LUTs and FFs were not fully utilized, which means that there is some space for hardware improvements.
BRAM, the FPGA's analogue of on-chip SRAM, was underutilized as well, which means a bigger cache is possible. Since I don't know the implementation parameters used for this CPU yet, I'll come back to this later when I figure them out.
> Analyze the worst negative slack path and look for possible improvements.

### Generate bitstream
Previous work uses `generate hardware platform` to obtain the bitstream from the `.xsa` file, which can be opened by any archive software according to the article. I was curious about the `xsa` file, and found this [thorough explanation](https://ohwr.org/project/soc-course/wikis/Reverse-Engineering-the-XSA-File). The `xsa` file contains:
1. Hardware descriptions (xml)
2. Bitstream
3. Tcl script to rebuild the block design

I'm only interested in the bitstream, so running `generate bitstream` directly gives the same output.

### Prepare SD card
![image](https://hackmd.io/_uploads/SkN-J00VR.png)

The system sees the SD card as its secondary storage, and two partitions must be provided for the system to run. The first partition stores the bootloader and the operating system; the second partition stores the rootfs.

On Linux, use `lsblk` to list block devices:
```
RISC-V_SoC/src/cpu$ lsblk
NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
...
sda                         8:0    1    29G  0 disk
├─sda1                      8:1    1    10G  0 part
└─sda2                      8:2    1    19G  0 part
nvme0n1                   259:0    0 931.5G  0 disk
├─nvme0n1p1               259:1    0     1G  0 part /boot/efi
├─nvme0n1p2               259:2    0     2G  0 part /boot
└─nvme0n1p3               259:3    0 928.5G  0 part
  └─ubuntu--vg-ubuntu--lv 253:0    0 928.5G  0 lvm  ... /
```
The device called `sda` is the SD card connected to my computer. Since I got the SD card from the previous author, it is already populated with the required files.
```
Device     Boot    Start      End  Sectors Size Id Type
/dev/sda1           2048 20973567 20971520  10G  c W95 FAT32 (LBA)
/dev/sda2       20973568 60751871 39778304  19G 83 Linux
```
We can distinguish the partitions by inspecting the `Type` field. I'll come back to this later when I'm switching the bootloader or the operating system.
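The `Id`/`Type` columns in the `fdisk` listing above come from the MBR partition table in sector 0 of the card. As a minimal sketch (not part of the original project), the following Python parses the four 16-byte MBR entries and reports the type ID, start LBA and sector count. The helper names `parse_mbr` and `make_entry` are hypothetical, and the sample sector is synthetic, built only to mirror the numbers shown above.

```python
import struct

# Partition type IDs that appear in the fdisk listing above.
PARTITION_TYPES = {0x0C: "W95 FAT32 (LBA)", 0x83: "Linux"}


def parse_mbr(sector: bytes):
    """Return (type_id, start_lba, num_sectors) for each used MBR entry."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    entries = []
    for i in range(4):
        # The partition table starts at byte 446; each entry is 16 bytes.
        raw = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        type_id = raw[4]
        # Start LBA and sector count are little-endian 32-bit values.
        start_lba, num_sectors = struct.unpack_from("<II", raw, 8)
        if type_id != 0:
            entries.append((type_id, start_lba, num_sectors))
    return entries


def make_entry(type_id: int, start: int, count: int) -> bytes:
    """Build a synthetic entry (CHS fields left as zero) for the demo."""
    return bytes([0, 0, 0, 0, type_id, 0, 0, 0]) + struct.pack("<II", start, count)


# Synthetic MBR mirroring the layout of the SD card described above.
mbr = bytearray(512)
mbr[446:462] = make_entry(0x0C, 2048, 20971520)
mbr[462:478] = make_entry(0x83, 20973568, 39778304)
mbr[510:512] = b"\x55\xaa"

for type_id, start, count in parse_mbr(bytes(mbr)):
    name = PARTITION_TYPES.get(type_id, "unknown")
    print(f"type=0x{type_id:02x} ({name}) start={start} sectors={count}")
```

On a real card, the same function could be fed the first 512 bytes read from the block device (which needs root privileges).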
### Prepare booting PYNQ-Z2
![image](https://hackmd.io/_uploads/ryCGk004R.png)

I followed the official instructions on how to boot the ARM core.
1. Adjust the boot option to SD card instead of JTAG.
2. Switch to USB mode // Not precise
3. Download the official image onto the SD card and plug it in. Note that this SD card is not the same as the one previously mentioned.
4. Connect the power cable.
5. Connect the ethernet cable.

After correctly setting up the board, we can see two serial devices connected to my computer.
```
$ sudo dmesg | grep USB
[1068003.239506] ftdi_sio 5-2:1.0: FTDI USB Serial Device converter detected
[1068003.240168] usb 5-2: FTDI USB Serial Device converter now attached to ttyUSB0
[1068003.240877] ftdi_sio 5-2:1.1: FTDI USB Serial Device converter detected
[1068003.241528] usb 5-2: FTDI USB Serial Device converter now attached to ttyUSB1
```
Open the second serial device `/dev/ttyUSB1` to access PYNQ/Linux.
```
$ screen /dev/ttyUSB1 115200
xilinx@pynq:~$
```
I don't know what is being transmitted/received on the first serial device. Maybe I'll check the manual later.

:::info
TODO: Check Zynq-7000 series manual for ttyUSB0 and ttyUSB1.
:::

### PYNQ overlay
PYNQ overlay provides us with an easy way to program the FPGA chip with our own bitstream. We can also access the AXI system bus from the PS to verify the peripherals connected to the bus.

![image](https://hackmd.io/_uploads/HJySkCC40.png)

The original author chose this way to debug the peripherals, including JTAG, SD card...

The first time I called `Overlay('soc_xsa.bit')` to download the bitstream onto the FPGA, Python emitted the error messages below:

![image](https://hackmd.io/_uploads/SynHyAC40.png)

Few people have reported this error, and I had no idea what was wrong with `PortType`.
I found this [thread](https://discuss.pynq.io/t/keyerror-when-loading-new-bit-file/6162/7) and this [thread](https://discuss.pynq.io/t/key-error-when-loading-overlay/5620/3) similar to me but with some slight differences. User stf's answer gave me a hint about the problem, `pynqmetadata` used in the script got updated and was not compatible with unmatched bitstream versions. So I downgraded the pynq prebuilt image from 3.0 to 2.7, and luckily the bitstream worked this time. ### Testing peripherals on bus Now that I can download bitstream to PL, tests can be made via pynq overlay. The overall address mapping of the whole system is listed below: | Name | Size | Address | End | | ------------- | --------- | ---------- |:---------- | | BootROM | 8KB | 0x00000000 | 0x00001FFF | | Reserved | ~~120KB~~ | 0x00002000 | 0x0001FFFF | | SystemRAM | 128KB | 0x00020000 | 0x0003FFFF | | Reserved | ~~768KB~~ | 0x00040000 | 0x03FFFFFF | | CPU Config | 4KB | 0x04000000 | 0x04000FFF | | Reserved | ~~4KB~~ | 0x04001000 | 0x04001FFF | | Debug Monitor | 8KB | 0x04002000 | 0x04003FFF | | Reserved | ~~KB~~ | 0x04004000 | 0x07FFFFFF | | CLINT | 64KB | 0x08000000 | 0x0800FFFF | | Reserved | ~~KB~~ | 0x08010000 | 0x0BFFFFFF | | PLIC | 64MB | 0x0C000000 | 0x0FFFFFFF | | UART | 4KB | 0x10000000 | 0x10000FFF | | SPI | 4KB | 0x10001000 | 0x10001FFF | | MAC | 4KB | 0x10002000 | 0x10002FFF | | Reserved | ~~KB~~ | 0x10003000 | 0x7FFFFFFF | | DRAM | 512 MB | 0x80000000 | 0x9FFFFFFF | ### Download bitstream Use `Overlay()` to parse bitstream and download to PL. ```python from pynq import Overlay #riscv = Overlay("./soc_xsa.bit") riscv = Overlay("./soc_wrapper.bit") ``` ### Testing BootROM, System SRAM Use `MMIO()` to write/read data to/from the AXI bus. Testing method: 1. Check the initial value of the BootROM 2. Write arbitrary data to it 3. 
Check the value again ``` Before Download ROM code Read 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Download ROM After Download ROM code Read 00000000: 00 00 00 00 01 01 01 01 02 02 02 02 03 03 03 03 04 04 04 04 05 05 05 05 06 06 06 06 07 07 07 07 08 08 08 08 09 09 09 09 0a 0a 0a 0a 0b 0b 0b 0b 0c 0c 0c 0c 0d 0d 0d 0d 0e 0e 0e 0e 0f 0f 0f 0f 10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15 15 15 16 16 16 16 17 17 17 17 18 18 18 18 19 19 19 19 1a 1a 1a 1a 1b 1b 1b 1b 1c 1c 1c 1c 1d 1d 1d 1d 1e 1e 1e 1e 1f 1f 1f 1f ``` We can see from the results that the memory and the bus both worked as expected. For system sram on the bus, the test is identical. Set the base address and modify it, which results in ``` First read: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Second read: 00 00 00 00 01 01 01 01 02 02 02 02 03 03 03 03 04 04 04 04 05 05 05 05 06 06 06 06 07 07 07 07 08 08 08 08 09 09 09 09 0a 0a 0a 0a 0b 0b 0b 0b 0c 0c 0c 0c 0d 0d 0d 0d 0e 0e 0e 0e 0f 0f 0f 0f 10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15 15 15 16 16 16 16 17 17 17 17 18 18 18 18 19 19 19 19 1a 1a 1a 1a 1b 1b 1b 1b 1c 1c 1c 1c 1d 1d 1d 1d 1e 1e 1e 1e 1f 1f 1f 1f ``` ### Testing UART UART requires some configuration like base address, baud rate. 
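The baud-rate divisor is the one piece of UART configuration that involves arithmetic. As a small sketch, assuming `CLK_PERIOD` is the bus clock period in nanoseconds (an inference from the expression `1000000000//CLK_PERIOD//115200` used in this note, not something confirmed by the RTL), the divisor can be computed as follows. The function name `uart_divisor` and the 40 ns example period are hypothetical.

```python
def uart_divisor(clk_period_ns: int, baud: int) -> int:
    """Integer divisor: clock frequency in Hz divided by the baud rate."""
    clk_hz = 1_000_000_000 // clk_period_ns  # ns period -> Hz
    return clk_hz // baud


# Hypothetical 40 ns clock period (25 MHz) at 115200 baud.
div = uart_divisor(40, 115200)
print(div)
```

Because the division truncates, the actual baud rate is slightly above the nominal one; whether the hardware divides by the register value or by the value plus one would need to be checked in the UART RTL.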
:::info TODO: Initializing details, will be useful when writing driver. ::: ```python # Enable UART riscv_base.write(UART_TXCTRL, 1) riscv_base.write(UART_RXCTRL, 1) #set baud rate 115200 riscv_base.write(UART_DIV, 1000000000//CLK_PERIOD//115200) for i in "Hello".encode(): riscv_base.write(UART_TXFIFO, i) ``` Connect a uart-usb converter to the TX, RX, VDD, GND pin on the board and monitor the output. {%youtube wNnI4BICbhw %} ### Testing SPI sd card reader :::info 1. What is SPI 2. Initializing Details 3. How does SD card reader work? ::: The SPI control register is defined as follow: | Bit Position | Field Name | Description | | ------------ | ---------------- | ------------------------------------------------------------------- | | 31:16 | 15'b0 | Reserved bits (write with zeros) | | 15 | spi_cr1_del | Delay | | 14 | spi_cr1_bidimode | Bidirectional data mode enable | | 13 | spi_cr1_bidioe | Output enable in bidirectional mode | | 12 | spi_cr1_crcen | Hardware CRC calculation enable | | 11 | spi_cr1_crcnext | CRC transfer next | | 10 | spi_cr1_dff | Data frame format (0: 8-bit, 1: 16-bit) | | 9 | spi_cr1_rxonly | Receive only | | 8 | spi_cr1_ssm | Software slave management | | 7 | spi_cr1_ssi | Internal slave select | | 6 | spi_cr1_lsbfirst | Frame format (0: MSB first, 1: LSB first) | | 5 | spi_cr1_spe | SPI enable | | 4:2 | spi_cr1_br | Baud rate control | | 1 | spi_cr1_mstr | Master selection (0: Slave, 1: Master) | | 0 | spi_cr1_cpol | Clock polarity (0: CK to 0 when idle, 1: CK to 1 when idle) | | 0 | spi_cr1_cpha | Clock phase (0: First clock transition, 1: Second clock transition) | After initializing the SPI module, we must set the SD card to SPI mode by sending appropriate commands. 
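For reference, the commands sent to the card in SPI mode are 6-byte frames: a command byte (`0x40 | index`), a 32-bit big-endian argument, and a CRC7 followed by a stop bit. The sketch below, which is not taken from the project's firmware, builds the first two frames of the usual initialization sequence (CMD0 and CMD8 with the `0x1AA` check pattern). The helper names `crc7` and `sd_command` are hypothetical.

```python
def crc7(data: bytes) -> int:
    """CRC7 with polynomial x^7 + x^3 + 1, as used for SD command frames."""
    crc = 0
    for byte in data:
        for i in range(8):
            bit_in = (byte >> (7 - i)) & 1
            msb = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if bit_in ^ msb:
                crc ^= 0x09
    return crc


def sd_command(index: int, arg: int) -> bytes:
    """Build a 6-byte SD command frame: start bits, index, argument, CRC, stop bit."""
    body = bytes([0x40 | (index & 0x3F)]) + arg.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])


cmd0 = sd_command(0, 0)      # GO_IDLE_STATE: enters SPI mode while CS is low
cmd8 = sd_command(8, 0x1AA)  # SEND_IF_COND: voltage range + check pattern
print(cmd0.hex(), cmd8.hex())
```

The card echoes the `0x1AA` pattern in its CMD8 response, which is what a `buff = 000001aa` style debug line would correspond to.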
Then we can read sector information from the SD card:
```
Output:
SPI_CR1 = 0x0000007c
[DBG] buff = 000001aa
[DBG] acmd41_r1 = 0x00000001
[DBG] acmd41_r1 = 0x00000000
[DBG] cmd58_r1 = 0x00000000
[DBG] buff = c0ff8000
SPI_CR1 = 0x0000004c
[DBG] r1 = 0x00000000
[DBG] r1 = 0x00000000
SD STATE: STAT_INIT_OK
[SD_STATE] STAT_INIT_OK
[SD_TYPE] SD_SDHC
Read Success
00000000: fa b8 00 10 8e d0 bc 00 b0 b8 00 00 8e d8 8e c0 ................
00000010: fb be 00 7c bf 00 06 b9 00 02 f3 a4 ea 21 06 00 ...|.........!..
00000020: 00 be be 07 38 04 75 0b 83 c6 10 81 fe fe 07 75 ....8.u........u
00000030: f3 eb 16 b4 02 b0 01 bb 00 7c b2 80 8a 74 01 8b .........|...t..
00000040: 4c 02 cd 13 ea 00 7c 00 00 eb fe 00 00 00 00 00 L.....|.........
```

### Download boot code to BootROM
The previous design of the BootROM is actually a readable/writable memory sitting on the bus. I guess this is for faster development and debugging. As a result, there is nothing in the ROM when the PL is programmed, and an additional step is required to load the memory manually.

Use the [existing debugger](https://github.com/yutongshen/RISC-V-Debugger) provided by the author to download the boot file to ROM.

![image](https://hackmd.io/_uploads/Sk_vJCRER.png)

The debugger was written in C# and is a Windows-only app, so I planned to write a Linux version of it.

I couldn't get the debugger to work. It can detect the JTAG-USB device but emits a `Get input buffer timeout` error. I don't know how the debugger works, so I left this issue unsolved and moved on to the next approach for now.

Since the BootROM is on the AXI bus, I can write data to it from the Python overlay. I uploaded `rom.bin` to the SD card and read it from Python.
```python
# Read rom.bin file
def read_file_as_hex(file_path):
    with open(file_path, 'rb') as file:
        file_content = file.read()
    hex_content = file_content.hex()
    # Group the hex content into 4-byte chunks (8 hex digits each)
    hex_chunks = [hex_content[i:i+8] for i in range(0, len(hex_content), 8)]
    return hex_chunks

# Function to convert hex chunks to unsigned integers
def hexstr_to_int(s):
    # Pad a short trailing chunk with zeros
    if len(s) < 8:
        s = s.ljust(8, '0')
    # Swap byte order: the file bytes are little-endian words
    s = s[6:8] + s[4:6] + s[2:4] + s[0:2]
    return int(s, 16)

boot = read_file_as_hex('rom.bin')
print('Write rom.bin to ROM')
for i, b in enumerate(boot):
    riscv_base.write(i*4, hexstr_to_int(b))
print('Write done')
```
Then start the CPU the same way the debugger does.

![image](https://hackmd.io/_uploads/rJF_JRA4R.png)

From the RISC-V UART output, we can observe that the boot code is actually running. bbl and vmlinux are loaded to System SRAM and ready for execution. However, the program got stuck and produced no other output. I have no idea what bbl did or why it got stuck, so I'll first study the bbl and vmlinux boot sequence, then try to compile it and run it with qemu.

Try using the original bbl and vmlinux from [google drive](https://drive.google.com/drive/folders/1U_mz91qeFlM4RmhGj2KTosD3d-gEyJ4v).

I followed the commands from the previous work to partition the SD card and format the two partitions.
1. Use `fdisk` to create two new partitions.
```
$ sudo fdisk -l /dev/sda
Disk /dev/sda: 28.97 GiB, 31104958464 bytes, 60751872 sectors
Disk model: Storage Device
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc1685bee

Device     Boot  Start      End  Sectors  Size Id Type
/dev/sda1         2048   104447   102400   50M  c W95 FAT32 (LBA)
/dev/sda2       104448 60751871 60647424 28.9G 83 Linux
```
2. Use `mkfs` to format the partitions into FAT32 and ext3.
3. Copy bbl and vmlinux into partition 1 and the content of rootfs into partition 2.

This time bbl never got loaded correctly, which resulted in an infinite loop inside the zero stage boot loader.
```
[BROM] UART init done
[BROM] HW ver: 20230508
[BROM] SD card init
[BROM] FAT BPB init
[BROM] load bbl
[BROM] File not found
```
At first I thought that something had gone wrong with the `fdisk` command, until I read partition 1 of the SD card from the Python overlay and got the following content:
```
00100000: eb 58 90 6d 6b 66 73 2e 66 61 74 00 02 01 20 00 .X.mkfs.fat... .
00100010: 02 00 00 00 00 f8 00 00 20 00 40 00 00 08 00 00 ........ .@.....
00100020: 00 90 01 00 14 03 00 00 00 00 00 00 02 00 00 00 ................
00100030: 01 00 06 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00100040: 80 01 29 7e aa a0 7a 4e 4f 20 4e 41 4d 45 20 20 ..)~..zNO NAME  
00100050: 20 20 46 41 54 33 32 20 20 20 0e 1f be 77 7c ac   FAT16   ...w|.
00100060: 22 c0 74 0b 56 b4 0e bb 07 00 cd 10 5e eb f0 32 ".t.V.......^..2
00100070: e4 cd 16 cd 19 eb fe 54 68 69 73 20 69 73 20 6e .......This is n
00100080: 6f 74 20 61 20 62 6f 6f 74 61 62 6c 65 20 64 69 ot a bootable di
00100090: 73 6b 2e 20 20 50 6c 65 61 73 65 20 69 6e 73 65 sk.  Please inse
001000a0: 72 74 20 61 20 62 6f 6f 74 61 62 6c 65 20 66 6c rt a bootable fl
001000b0: 6f 70 70 79 20 61 6e 64 0d 0a 70 72 65 73 73 20 oppy and..press 
001000c0: 61 6e 79 20 6b 65 79 20 74 6f 20 74 72 79 20 61 any key to try a
001000d0: 67 61 69 6e 20 2e 2e 2e 20 0d 0a 00 00 00 00 00 gain ... .......
```
The `FAT16` word shows that the file system was formatted as FAT16, which does not match the FAT32 partition type set with `fdisk`. According to the [`mkfs.fat(8)`](https://man7.org/linux/man-pages/man8/mkfs.fat.8.html) manual page, the `-F` option selects the FAT size, so I reformatted the partition with `-F 32`. Then it worked.

{%youtube mVWKk6xXTbs %}

Now that I've reproduced the previous work, some questions about this design came to mind.
1. I don't know how `rom.bin`, `bbl`, `vmlinux` were built from source.
2. I don't know what each bootloader is doing exactly.
3. Why is system SRAM needed in the current design?
4. The FPGA softcore shares the same physical memory with the on-chip ARM core. However, the ARM core is not aware of the softcore and ends up accessing corrupted DDR. Is there any way to solve this problem?
5. The hardware bitstream I synthesized myself did not work like the pre-built bitstream from last year:
```
[BROM] UART init done
[BROM] HW ver: 20230701
[BROM] SD card init
[BROM] FAT BPB init
[BROM] FAT BPB init fail
```

The first two questions can be answered by reading the source code. The third one requires a hardware modification if the SRAM is removed.

### Build `rom.bin` from source
If I compile the ROM code from [/rom](https://github.com/yutongshen/RISC-V_SoC/tree/master/rom) with the RISC-V GNU cross compiler, I get the link error shown below:
```
/riscv-64/lib/gcc/riscv64-unknown-elf/13.2.0/../../../../riscv64-unknown-elf/bin/ld: main section `.rodata.str1.8' will not fit in region `brom'
/riscv-64/lib/gcc/riscv64-unknown-elf/13.2.0/../../../../riscv64-unknown-elf/bin/ld: region `brom' overflowed by 278 bytes
/riscv-64/lib/gcc/riscv64-unknown-elf/13.2.0/../../../../riscv64-unknown-elf/bin/ld: warning: main has a LOAD segment with RWX permissions
collect2: error: ld returned 1 exit status
```
The BootROM has a maximum size of 8 KB, but the binary exceeds that limit. So I shortened some of the debug messages and successfully built the binary file.

However, I couldn't boot from the new `rom.bin` I had just built. When analyzing what `setup.S` did, I realized that there were `ifdef` guards around the UART initialization and the cache init:
```
#ifdef ENABLE_UART_IRQ
    li t0, 0x10000000;
    li t1, 7;
    sw t1, 0x10(t0);
#endif
```
These macros are not defined in the original Makefile. After adding the line `DEFINE := -DENABLE_UART_IRQ -DENABLE_CACHE -DREAL_ROM`, `rom.bin` worked.

### bbl study
![image](https://hackmd.io/_uploads/Hkw9J0CER.png)

The current boot flow requires the ROM to set up the UART and the SPI SD card reader, and to load bbl and vmlinux to a fixed location.
Given this, I'm skeptical about the purpose of bbl (Berkeley boot loader) if vmlinux is already loaded in DDR. So I dived into the bbl source code to understand what I missed here.

The entry point of bbl can be found in the linker script `bbl.lds`
```ld
ENTRY( reset_vector )
```
which is located in [`machine/mentry.S`](https://github.com/riscv-software-src/riscv-pk/blob/master/machine/mentry.S). `reset_vector` simply resets all the registers and `mscratch` to zero. The machine mode trap handler is set to the bbl `trap_vector` with the following sequence
```asm
  # write mtvec and make sure it sticks
  la t0, trap_vector
  csrw mtvec, t0
  csrr t1, mtvec
1:bne t0, t1, 1b
```
After all the resets, [`init_first_hart()`](https://github.com/riscv-software-src/riscv-pk/blob/4f3debe4d04f56d31089c1c716a27e2d5245e9a1/machine/minit.c#L192) is called.

`init_first_hart()` in brief:
1. Set up the console as soon as possible with `query_uart()` etc...
2. `init_hart()`
    1. `mstatus_init()`: Set `mstatus` for available extensions, like the F extension, V extension...
    2. `fp_init()`: Set up the floating point control registers if the F extension is supported.
    3. `delegate_traps()`: If supervisor mode is supported, send S-mode interrupts and exceptions to S-mode instead of M-mode. Sets the `mideleg` and `medeleg` CSRs.
    4. `setup_pmp()`: Set up PMP to allow access to all memory locations. (Details to be figured out...)
3. A series of `query_*` functions that search the device tree blob (dtb) for information about our platform hardware.
    ```c
    query_finisher(dtb);
    query_mem(dtb);
    query_harts(dtb);
    query_clint(dtb);
    query_plic(dtb);
    query_chosen(dtb);
    ```
4. Wake up all the harts. In my case, there is only one.
5. Init memory and the per-hart PLIC according to the info we got from the queries.
6. `bootloader()`:
    1. `filter_dtb()`: Place the dtb right after the kernel code.
    2. Set the kernel entry point.
    3. `boot_other_hart()`: Get the entry point and enter supervisor mode at that entry.

When looking at `boot_other_hart()`, I found something I couldn't comprehend:
```c=
void boot_other_hart(uintptr_t unused __attribute__((unused)))
{
  const void* entry;
  do {
    entry = entry_point;
    mb();
  } while (!entry);

  long hartid = read_csr(mhartid);
  if ((1 << hartid) & disabled_hart_mask) {
    while (1) {
      __asm__ volatile("wfi");
#ifdef __riscv_div
      __asm__ volatile("div x0, x0, x0");
#endif
    }
  }

#ifdef BBL_BOOT_MACHINE
  enter_machine_mode(entry, hartid, dtb_output());
#else /* Run bbl in supervisor mode */
  protect_memory();
  enter_supervisor_mode(entry, hartid, dtb_output());
#endif
}
```
We can see that `boot_other_hart()` takes one `uintptr_t` as an argument, but it is named `unused` and carries the compiler attribute `unused`. The argument really is unused, as stated, and I don't know what its purpose is. Looking at the GNU extension documentation:
> `unused`
> This attribute, attached to a variable, means that the variable is meant to be possibly unused. GCC does not produce a warning for this variable.

This attribute has no meaning other than telling the compiler to suppress warnings related to this variable.

Notice the define guard `BBL_BOOT_MACHINE` at line 19. It is used when riscv pk (RISC-V proxy kernel) is used, which uses bbl as a fake machine, so I can ignore that part.

In conclusion, the zero stage boot loader `rom.bin` and bbl do almost the same thing, and can therefore be reduced to more minimal boot code. bbl should be swapped for OpenSBI for better portability, together with a more capable boot loader if possible.

### Shared physical memory problem
In the previous work, the same physical memory is shared between the two operating systems, and I found no evidence that the ARM core is turned off. Therefore, in order to let both operating systems live together, the easiest way is to give each OS its own DDR region through the dtb, until I learn another approach to deal with this issue.

I've gone through multiple official documents and found no way to completely stop the PS (ARM core).

## TODO: Change the ISA from RV64 to RV32
> Hardware RTL analysis

### Memory mapping
| Name          | Size      | Address    | End        |
| ------------- | --------- | ---------- |:---------- |
| BootROM       | 8KB       | 0x00000000 | 0x00001FFF |
| Reserved      | ~~120KB~~ | 0x00002000 | 0x0001FFFF |
| SystemRAM     | 128KB     | 0x00020000 | 0x0003FFFF |
| Reserved      | ~~768KB~~ | 0x00040000 | 0x03FFFFFF |
| CPU Config    | 4KB       | 0x04000000 | 0x04000FFF |
| Reserved      | ~~4KB~~   | 0x04001000 | 0x04001FFF |
| Debug Monitor | 8KB       | 0x04002000 | 0x04003FFF |
| Reserved      | ~~KB~~    | 0x04004000 | 0x07FFFFFF |
| CLINT         | 64KB      | 0x08000000 | 0x0800FFFF |
| Reserved      | ~~KB~~    | 0x08010000 | 0x0BFFFFFF |
| PLIC          | 64MB      | 0x0C000000 | 0x0FFFFFFF |
| UART          | 4KB       | 0x10000000 | 0x10000FFF |
| SPI           | 4KB       | 0x10001000 | 0x10001FFF |
| MAC           | 4KB       | 0x10002000 | 0x10002FFF |
| Reserved      | ~~KB~~    | 0x10003000 | 0x7FFFFFFF |
| DRAM          | 512 MB    | 0x80000000 | 0x9FFFFFFF |

### UART controller
The address mapping
```verilog
`define UART_TXFIFO 12'h00
`define UART_RXFIFO 12'h04
`define UART_TXCTRL 12'h08
`define UART_RXCTRL 12'h0C
`define UART_IE     12'h10
`define UART_IP     12'h14
`define UART_IC     12'h18
`define UART_DIV    12'h1C
`define UART_LCR    12'h20
```
- TXFIFO: Write to send data, read to check if the TX FIFO is full
    Read: ![image](https://hackmd.io/_uploads/SkyR1ACVC.png)
    Write: ![image](https://hackmd.io/_uploads/BJbJlRCV0.png)
- RXFIFO: Read to receive data
    Read: ![image](https://hackmd.io/_uploads/By-eeCCNC.png)
    Write: Not defined
- TXCTRL: ![image](https://hackmd.io/_uploads/SJ7bg0CEA.png)
- RXCTRL: ![image](https://hackmd.io/_uploads/r1kGeCAER.png)
    The `cnt` field is the almost-empty and almost-full threshold for TX and RX
- IE: Interrupt enable, enables TX, RX, Error interrupts
    ![image](https://hackmd.io/_uploads/BJzXxRCE0.png)
- IP: Interrupt pending, indicates pending TX, RX, Error interrupts
    ![image](https://hackmd.io/_uploads/BJzXxRCE0.png)
- IC: Interrupt clear, write to clear the corresponding interrupt
    ![image](https://hackmd.io/_uploads/BJzXxRCE0.png)
- DIV: Baud rate divider, defaults to 115200 baud, can be modified by writing to this register
    ![image](https://hackmd.io/_uploads/SJ5Ne0AVR.png)
- LCR: Line control register, controls whether parity is used. The current UART controller provides no way to configure stop bits or data length
    ![image](https://hackmd.io/_uploads/SJmreA0NA.png)

### PLIC (Platform level interrupt controller)
Memory mapping:
| Name               | Offset          | Description                                       |
| ------------------ | --------------- | ------------------------------------------------- |
| Priority           | 0x0000000       | Zero: Never interrupt                             |
| Pending            | 0x0001000       | Current status of the interrupt, 1 bit per source |
| Enable             | 0x0002000       | Decides if the interrupt source is enabled        |
| Priority threshold | 0x0200000       | Priority threshold for each context (hart)        |
| Claim/Clear        | 0x0200000 + 0x4 | Claim or Clear interrupt for each context (hart)  |

> According to PLIC spec:
> **Context**
> Interrupt targets are usually hart contexts, where a hart context is a given privilege mode on a given hart (though there are other possible interrupt targets, such as DMA engines).

```verilog
`define PLIC_INT_PRIOR 26'h000_0000
`define PLIC_INT_PEND  26'h000_1000
`define PLIC_INT_TYPE  26'h000_1080
`define PLIC_INT_POL   26'h000_1100
`define PLIC_INT_EN    26'h000_2000
`define PLIC_PRIOR_TH  26'h020_0000
```

The [PLIC spec](https://github.com/riscv/riscv-plic-spec) defines the memory mapping and the meaning of each register, so there is no need to write my own firmware. However, the Verilog source code defines two extra fields, `PLIC_INT_TYPE` and `PLIC_INT_POL`, whose usage I could not tell from their names alone. In `cpu/plic.sv` I found out that those two fields are gateway configs which determine the type (edge or level trigger) and polarity (high or low trigger) of each source. The default config is high level trigger.
```verilog
.src_type ( int_type[gvar_i] ), // 0: edge, 1: level
.src_pol  ( int_pol [gvar_i] ), // 0: high, 1: low
```

### Clint (Core local interrupt controller)
Memory mapping:
| Name     | Offset | Note                                |
| -------- | ------ | ----------------------------------- |
| msip     | 0x0    | Machine mode software interrupt     |
| mtimecmp | 0x4000 | Machine mode timer compare register |
| mtime    | 0xBFF8 | Timer register                      |

`mtime` and `mtimecmp` are defined in the RISC-V privileged ISA, section 3.2.1. The machine timer interrupt becomes pending when `mtime` is greater than or equal to `mtimecmp`.

### SPI controller
Memory mapping:
| Name    | Offset | Description        |
| ------- | ------ | ------------------ |
| SPI_CR1 | 0x00   | Control register 1 |
| SPI_CR2 | 0x04   | Control register 2 |
| SPI_SR  | 0x08   | Status register    |
| SPI_DR  | 0x0C   | Data register      |

Control register 1:
![spi_cr1](https://hackmd.io/_uploads/HyxUe34HC.png)
- `cpha`: Clock phase
- `cpol`: Clock polarity
- `mstr`: Master selection
- `br`: Baud rate control
- `spe`: SPI enable
- `lsbfirst`: Shift from LSB or MSB first
- `ssi`: Internal slave select
- `ssm`: Software slave management, whether to use the internal `ssi` bit to select
- `rxonly`: Receive only
- `dff`: Data frame format, 8 or 16 bits per data frame
- `crcnext`: Transmit CRC next
- `crcen`: Hardware CRC calculation enable
- `bidioe`: Output enable in bidirectional mode
- `bidimode`: Bidirectional data mode enable
- `del`: Delay

Control register 2:
![spi_cr2](https://hackmd.io/_uploads/Hywd82NHC.png)
- `txeie`: TX buffer empty interrupt enable
- `rxneie`: RX buffer not empty interrupt enable
- `errie`: Error interrupt enable
- `ssoe`: SS output enable
- `txdmaen`: TX buffer DMA enable
- `rxdmaen`: RX buffer DMA enable

Status register:
![spi_status](https://hackmd.io/_uploads/rk8gJ2NH0.png)
- `bsy`: SPI busy flag
- `ovr`: Overrun flag
- `modf`: Mode fault flag
- `crcerr`: CRC error flag
- `udr`: Underrun flag
- `chside`: Channel side flag
- `rxne`: Receive buffer not empty
- `txe`: Transmit buffer empty

The normal execution flow requires the CPU to poll `txe` and `rxne` in SPI_SR to determine whether the TX or RX buffer needs attention. This wastes CPU cycles on trivial tasks, so a DMA is introduced inside the SPI core. The DMA writes to or reads from the buffers whenever `txe`/`rxne` is set, freeing the CPU for other critical tasks.
> See STM32 RM0090 Reference manual 28.3.5 for more data transmission details.

### SPI DMA
Memory mapping:
| Name        | Offset | Description                       |
| ----------- | ------ | --------------------------------- |
| DMA_SRC     | 0x00   | Source address for DMA transfer   |
| DMA_DEST    | 0x04   | Destination address for DMA       |
| DMA_LEN     | 0x08   | Length of DMA transfer            |
| DMA_CON     | 0x0C   | DMA control register              |
| DMA_IE      | 0x10   | DMA interrupt enable register     |
| DMA_IP      | 0x14   | DMA interrupt pending register    |
| DMA_IC      | 0x18   | DMA interrupt clear register      |
| DMA_WDT_CNT | 0x1C   | DMA watchdog timer count register |

The SPI DMA is responsible for transferring data between DDR and the SD card. It is capable of interacting with the SPI controller, sending tx/rx commands.

Control register: ![spi_dma_csr](https://hackmd.io/_uploads/SkhSO0NS0.png) - size: `WORD`, `HWORD`, `BYTE` - type: `FIXED`, `INCR`, `CONST` - bypass: Not sure what bypass mean ### Debugger CSR APB Memory mapping: | Name | Address | Description | | --------------- | ------- | -------------------------------------- | | DBGAPB_DBG_EN | 12'h000 | Debug enable register | | DBGAPB_INST | 12'h004 | Debug instruction register | | DBGAPB_INST_WR | 12'h008 | Debug instruction write register | | DBGAPB_WDATA_L | 12'h010 | Debug write data low register | | DBGAPB_WDATA_H | 12'h014 | Debug write data high register | | DBGAPB_WDATA_WR | 12'h01C | Debug write data write enable register | | DBGAPB_RDATA_L | 12'h020 | Debug read data low register | | DBGAPB_RDATA_H | 12'h024 | Debug read data high register | Monitor memory mapping: | Definition | Address | Description | |---------------------|----------|----------------------------------------| | DBGMON_BP0 | 13'h1100 | Debug monitor breakpoint 0 | | DBGMON_BP1 | 13'h1108 | Debug monitor breakpoint 1 | | DBGMON_BP2 | 13'h1110 | Debug monitor breakpoint 2 | | DBGMON_BP3 | 13'h1118 | Debug monitor breakpoint 3 | | DBGMON_WP0 | 13'h1120 | Debug monitor watchpoint 0 | | DBGMON_WP1 | 13'h1128 | Debug monitor watchpoint 1 | | DBGMON_WP2 | 13'h1130 | Debug monitor watchpoint 2 | | DBGMON_WP3 | 13'h1138 | Debug monitor watchpoint 3 | | DBGMON_VC_EXC | 13'h1140 | Debug monitor vector catch exception | | DBGMON_VC_IRQ | 13'h1144 | Debug monitor vector catch interrupt | | DBGMON_DELAY | 13'h1148 | Debug monitor delay | | DBGMON_STOP_TRACE | 13'h114c | Debug monitor stop trace | | DBGMON_IE | 13'h1150 | Debug monitor interrupt enable | There are more address mapped to architectural registers. To trace instruction execution sequence after a specific PC, I need to: 1. Set BreakPoint to target address 2. Set BreakPoint enable (BP0 + 4) to 1 3. Set Delay to target number 4. 
Run the CPU until the stop bit becomes 1

### Difference between RV32 and RV64

Ref: [RISC-V ISA manual](https://github.com/riscv/riscv-isa-manual)

This section lists the differences between RV32I and RV64I, and serves as a to-do list and reference for me.

#### General

1. `XLEN`: width of an integer register in bits, 64 for `rv64i` and 32 for `rv32i`. It also determines the size $2^{XLEN}$ of the maximum supported address space.

#### Unprivileged ISA

> RISC-V unprivileged ISA Chapter 4.0:
> - This chapter describes the RV64I base integer instruction set, which builds upon the RV32I variant described in Chapter 2.
>
> Chapter 4.2:
> - Most integer computational instructions operate on XLEN-bit values.
>
> Chapter 13.1, 13.2:
> - MULW is an RV64 instruction...
> - DIVW and DIVUW are RV64 instructions...
> - REMW and REMUW are RV64 instructions...
>
> Chapter 14.2:
> - LR.D and SC.D act analogously on doublewords and are only available on RV64.

The 64-bit ISA is built upon RV32I. The differences between them are:

1. All instructions defined in RV32I are supported in RV64I, with behavior that depends on the `XLEN` value.
2. RV64I adds instructions that operate on 32-bit values (such as `ADDIW`, `SLLIW`, `SRLW`, `SUBW`, `SRAW`). These can be removed in RV32I.
3. `LD` and `SD` can be removed.
4. For the "M" extension, behavior depends on `XLEN`.
5. `LR.D` and `SC.D` can be removed.
6. `AMO*.D` can be removed.

Modified files:

- `alu.sv`: Remove 64-bit arithmetic logic
- `dec.sv`: Remove 64-bit instructions, which means the removed instructions are now decoded as `ill_isns`.
- `mdu.sv`: Mul/Div behavior

#### Privileged ISA

> RISC-V privileged ISA Chapter 3.1.6:
>
> - `MXLEN` is the effective `XLEN` in M-mode
> - For RV32 only, there is an `mstatush` register, which contains the same fields as the upper 32 bits of the RV64 `mstatus`.
> - When MXLEN=32, the SXL and UXL fields do not exist, and SXLEN=32 and UXLEN=32.
> - `mtvec` is `MXLEN` bits long.
> - `medelegh`, `menvcfgh`, and `mseccfgh` alias the upper 32 bits of their 64-bit non-h counterparts.
> - `mscratch`, `mepc`, `mcause`, `mtval`, and `mconfigptr` are `MXLEN` bits wide.
> - `mtime` is still 64-bit precision. In RV32, memory-mapped writes to mtimecmp modify only one 32-bit part of the register.
>
> Physical memory protection Chapter 3.7.1:
> - 16 pmpcfg registers are 32-bit compared to 8 64-bit regs in RV64.
> - pmp address is 32-bit as well, storing the `addr[33:2]` in the address field.
> - S-mode has almost the same changes in the CSR fields.
>
> SV32 Chapter 11.3:
> - A two-level page table is used, compared to three levels in SV39.
> - The hardware page table walker does not require a change because the hardware checks if this entry is a leaf node and stops there. The level isn't that relevant to the walking process.

The upper 32 bits of `medeleg` in RV64 are hardwired to zero, so `medelegh` can be all zero. The `menvcfg` and `mseccfg` registers do not exist in this design, so their *h versions can be ignored.

#### AXI Interconnect

The AXI Interconnect bitwidth is 32 bits, which conforms to RV32. Take the AR channel as an example:

```verilog
logic [ 31: 0] m0_araddr
```

In the RV64 design, a read is already done in two bursts, so there is no need to change the bus or the corresponding masters (cache, ...).

### Verifying the CPU

I found only the system-level testbench for `cpu_wrap` under `scripts/`. I think a full test of the whole system is overkill, and verifying only `cpu_top` is adequate since I did not modify any files outside the CPU core.

I need to figure out how to compile and use riscv-tests. There is no documentation on how to do this.
Simple rv32i test assembly: https://github.com/hamsternz/simple-riscv/blob/main/sw/asm/isa_test.S

Verify the core first by inspecting the IO of the core:

```verilog
cpu_top DUT(
    // Need to figure out what the difference between the two reset signals is
    // and how to generate a correct systime
    .clk(),
    // The two resets can be seen as one
    .srstn(),
    .xrstn(),
    // cpu id == 0
    .cpu_id(),
    // output, can be ignored
    .rv64_mode(),
    // The first instruction after cpu start
    .bootvec(),
    // output, can be ignored
    .warm_rst_trigger(),
    // the actual mtime, can be hardwired to zero
    .systime(),

    // mpu csr
    // All outputs, I assume I don't need these signals

    // mmu csr
    // All outputs, I assume I don't need these signals.
    .satp_ppn(),
    .satp_asid(),
    .satp_mode(),
    .prv(),
    .sum(),
    .mprv(),
    .mpp(),

    // TLB control
    // All outputs, I assume I don't need these signals.
    .tlb_flush_req(),
    .tlb_flush_all_vaddr(),
    .tlb_flush_all_asid(),
    .tlb_flush_vaddr(),
    .tlb_flush_asid(),

    // interrupt interface
    // interrupt pins, hardwire to zeros for no interrupt
    .msip(),
    .mtip(),
    .meip(),
    .seip(),

    // insn interface
    // Need to figure out the expected memory behavior and write my own model
    .imem_en(),
    .imem_addr(),
    .imem_rdata(),
    .imem_bad(),
    .imem_busy(),
    .ic_flush(),

    // data interface
    // Need to figure out the expected memory behavior and write my own model
    .dmem_en(),
    .dmem_addr(),
    .dmem_write(),
    .dmem_ex(),
    .dmem_strb(),
    .dmem_wdata(),
    .dmem_rdata(),
    .dmem_bad(),
    .dmem_xstate(),
    .dmem_busy(),

    // debug interface
    // I can just ignore the debug driving signals
    .dbg_gpr_all(),
    .dbg_addr(),
    .dbg_wdata(),
    .dbg_gpr_rd(),
    .dbg_gpr_wr(),
    .dbg_gpr_out(),
    .dbg_csr_rd(),
    .dbg_csr_wr(),
    .dbg_csr_out(),
    .dbg_pc_out(),
    .dbg_exec(),
    .dbg_insn(),
    .attach(),
    .halted(),

    // CPU tracer
    // All outputs, assume they are irrelevant
    .trace_pkg_valid(),
    .trace_pkg()
);
```

Most of the output ports are irrelevant when verifying bare-metal rv32, so I can ignore them. I wrote a simple memory model for `imem` and `dmem`.
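The memory model itself is not shown here. As a behavioral reference for what such a model has to get right on the data side, the following C sketch merges write data under a byte strobe. It is only an illustration under assumptions: `dmem_strb` is taken to carry one bit per byte lane of a 32-bit data bus, and the names `mem_model`, `apply_strobe`, `mem_read`, `mem_write` and the 4 KiB size are all hypothetical.

```c
#include <stdint.h>

#define MEM_WORDS 1024u /* hypothetical size: 1024 words = 4 KiB */

struct mem_model {
    uint32_t words[MEM_WORDS];
};

/* Merge wdata into old, replacing only the byte lanes whose strobe bit is set. */
static uint32_t apply_strobe(uint32_t old, uint32_t wdata, uint8_t strb)
{
    uint32_t mask = 0;
    for (int lane = 0; lane < 4; lane++) {
        if (strb & (1u << lane))
            mask |= 0xFFu << (8 * lane);
    }
    return (old & ~mask) | (wdata & mask);
}

/* Word-aligned read; the address wraps around the modeled memory size. */
static uint32_t mem_read(const struct mem_model *m, uint32_t addr)
{
    return m->words[(addr >> 2) % MEM_WORDS];
}

/* Word-aligned write with per-byte strobe. */
static void mem_write(struct mem_model *m, uint32_t addr,
                      uint32_t wdata, uint8_t strb)
{
    uint32_t *word = &m->words[(addr >> 2) % MEM_WORDS];
    *word = apply_strobe(*word, wdata, strb);
}
```

A store byte (`sb`) would then show up as a write with a single strobe bit set, and a store word (`sw`) as strobe `0xF`, which is a convenient thing to check against the waveform.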
## TODO: 用 OpenSBI 驗證 > bbl (boot loader),確認 Sv32 (MMU) 可運作 After watching [openSBI Deep dive by WD](https://www.youtube.com/watch?v=jstwB-o9ll0&ab_channel=RISC-VInternational) I could grasp the big picture of openSBI. There are many features but I need only a small set of them to boot linux. The new boot flow will be like this: 1. ZSBL: Load openSBI firmware and my own bootloader to DDR 2. FSBL: Initialize console, peripherals, PLIC, PMP. Load linux and pass control to the kernel. This stage will be implemented using openSBI. 3. Kernel To keep ZSBL small enough, I think `iolib` can be removed and use LEDs to indicate the boot process. As soon as openSBI is ready, the console can then be init. This is initial plan, so changes may be made to this boot flow. I don't need to do all the hardware(PLL, DDR, ...) inits, since zynq FSBL will do this for me. ### Try openSBI on qemu To further understand openSBI in action, I use qemu to try out openSBI firmware. Prepare riscv cross-compiler and compile openSBI with default qemu platform and no payload: ``` $ make PLATFORM=generic CROSS_COMPILE=riscv64-unknown-linux-gnu- O=build ``` Then execute the firmware with qemu: ``` $ qemu-system-riscv64 -M virt -m 256M -nographic -bios build/platform/generic/firmware/fw_payload.bin ``` I got the openSBI output: ``` $ qemu-system-riscv64 -M virt -m 256M -nographic -bios build/platform/generic/firmware/fw_payload.bin OpenSBI v1.4-111-gd962db2 ____ _____ ____ _____ / __ \ / ____| _ \_ _| | | | |_ __ ___ _ __ | (___ | |_) || | | | | | '_ \ / _ \ '_ \ \___ \| _ < | | | |__| | |_) | __/ | | |____) | |_) || |_ \____/| .__/ \___|_| |_|_____/|____/_____| | | |_| Platform Name : riscv-virtio,qemu Platform Features : medeleg Platform HART Count : 1 Platform IPI Device : aclint-mswi Platform Timer Device : aclint-mtimer @ 10000000Hz ... Domain0 Name : root Domain0 Boot HART : 0 ... 
Boot HART ID : 0 Boot HART Domain : root Boot HART Priv Version : v1.12 Boot HART Base ISA : rv64imafdch Boot HART ISA Extensions : sstc,zicntr,zihpm,zicboz,zicbom,sdtrig Boot HART PMP Count : 16 Boot HART PMP Granularity : 2 bits Boot HART PMP Address Bits: 54 Boot HART MHPM Info : 16 (0x0007fff8) Boot HART Debug Triggers : 2 triggers Boot HART MIDELEG : 0x0000000000001666 Boot HART MEDELEG : 0x0000000000f0b509 Test payload running ``` Notice that this platform is the default platform for qemu emulation, and we need to implement a new platform for our SOC. The execution stopped at `Test payload running` because I didn't specify any next stage payload. Next, I compiled linux kernel v6.4 from source and got a bootable image. Then, I prepared a rootfs image with buildroot all with default config. To compile openSBI firmware with a payload, the linux kernel in my case, the following command is used: ``` $ make PLATFORM=generic FW_PAYLOAD_PATH=../linux/arch/riscv/boot/Image CROSS_COMPILE=riscv64-unknown-linux-gnu- O=build ``` and emulate with : ``` $ qemu-system-riscv64 -M virt -m 256M -nographic \ -bios build/platform/generic/firmware/fw_payload.bin \ -drive file=../rootfs.ext2,format=raw,id=hd0 \ -device virtio-blk-device,drive=hd0 \ -append "root=/dev/vda rw console=ttyS0" ``` I got an error running the command above: ``` qemu-system-riscv64: -append only allowed with -kernel option ``` If I delete the `append` flag, the kernel would start but couldn't load the filesystem correctly. ``` [ 0.289222] VFS: Cannot open root device "" or unknown-block(0,0): error -6 [ 0.289349] Please append a correct "root=" boot option; here are the available partitions: [ 0.289590] fe00 61440 vda ... 
[ 0.292969] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 0.293283] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0 #1
[ 0.293462] Hardware name: riscv-virtio,qemu (DT)
[ 0.293627] Call Trace:
[ 0.293792] [<ffffffff800061e2>] dump_backtrace+0x1c/0x24
[ 0.294199] [<ffffffff8097ca9c>] show_stack+0x2c/0x38
[ 0.294317] [<ffffffff809896d6>] dump_stack_lvl+0x52/0x74
[ 0.294417] [<ffffffff8098970c>] dump_stack+0x14/0x1c
[ 0.294526] [<ffffffff8097cfaa>] panic+0x106/0x2ba
[ 0.294650] [<ffffffff80a0174a>] mount_root_generic+0x208/0x2ca
[ 0.294789] [<ffffffff80a019fe>] mount_root+0x1f2/0x224
[ 0.294905] [<ffffffff80a01c2e>] prepare_namespace+0x1fe/0x25a
[ 0.295027] [<ffffffff80a0118e>] kernel_init_freeable+0x26c/0x28e
[ 0.295159] [<ffffffff8098b164>] kernel_init+0x1e/0x10a
[ 0.295275] [<ffffffff809936da>] ret_from_fork+0xe/0x1c
[ 0.295750] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
```

I need to specify the root block device on the kernel command line, but I couldn't append anything without the `-kernel` option. My guess is that the [tutorial](https://github.com/riscv-software-src/opensbi/blob/master/docs/platform/qemu_virt.md) was written for an older version of qemu. So instead I ran the example using `fw_jump` (with a jump address):

```
$ qemu-system-riscv64 -M virt -m 256M -nographic \
    -bios build/platform/generic/firmware/fw_jump.bin \
    -kernel ../linux/arch/riscv/boot/Image \
    -drive file=../rootfs.ext2,format=raw,id=hd0 \
    -device virtio-blk-device,drive=hd0 \
    -append "root=/dev/vda rw console=ttyS0"
```

{%youtube 23L4Sg5hjpY %}

Now that I'm more familiar with openSBI and the linux boot sequence, I will try to configure the qemu system to resemble my SoC's environment and debug openSBI on qemu first. It's simpler and faster.

### Configuring and Compiling linux kernel

After searching for how to specify the kernel command line at compile time, I found out that there is a config option dedicated to this purpose.
In `arch/riscv/Kconfig`: ``` menu "Boot options" config CMDLINE string "Built-in kernel command line" help For most platforms, the arguments for the kernel's command line are provided at run-time, during boot. However, there are cases where either no arguments are being provided or the provided arguments are insufficient or even invalid. When that occurs, it is possible to define a built-in command line here and choose how the kernel should use it later on. choice prompt "Built-in command line usage" if CMDLINE != "" default CMDLINE_FALLBACK help Choose how the kernel will handle the provided built-in command line. config CMDLINE_FALLBACK bool "Use bootloader kernel arguments if available" help Use the built-in command line as fallback in case we get nothing during boot. This is the default behaviour. ... config CMDLINE_FORCE bool "Always use the default kernel command string" help Always use the built-in command line, even if we get one during boot. This is useful in case you need to override the provided command line on systems where you don't have or want control over it. endchoice ``` I can set `CMDLINE_FORCE` to always use default kernel command string. So I reconfigure the .config file to use the qemu default: ```diff CONFIG_CMDLINE="root=/dev/vda rw console=ttyS0" CONFIG_CMDLINE_FORCE=y ``` and rebuilt the kernel. When running the `fw_payload` opensbi firmware, the same issue occurs. ``` ... [ 0.000000] percpu: Embedded 22 pages/cpu s49400 r8192 d32520 u90112 [ 0.000000] Kernel command line: [ 0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes, linear) ... ``` The Kernel command line was empty. 😅It turned out that I forgot to recompile the opensbi firmware with the latest linux kernel image. This showed an important drawback of booting linux kernel directly from openSBI firmware. 
Useful reference: [Linux Kernel configuration list](https://cateee.net/lkddb/web-lkddb/)

### ZSBL (Zero stage boot loader)

openSBI relies on the previous bootloader to load its firmware to DDR. The zero stage bootloader is quite simple since the clocks and DDR are all initialized by the PS.

#### UART

To report zsbl boot progress, we can use LEDs or UART as reliable feedback. Since the bus is ready by the time the boot rom runs, the uart can be enabled, and it is more informative than LED indicators. The uart init sequence is similar to the one in the openSBI firmware, but with only `putc` and `puts` capabilities.

#### SD/MMC

Ref: [How to use MMC/SDC](http://elm-chan.org/docs/mmc/mmc_e.html)
Ref: [SPI](http://elm-chan.org/docs/spi_e.html)

In order to use MMC/SDC in my system, I need to first initialize SPI and then put the SD card into SPI mode. The following shows the initialization process:

```plantuml
@startuml
start
:Init SPI;
:Wait for at least 1ms;
:Send 80 SCLK pulses;
note right: CS pin low
repeat
:Send **CMD0**;
repeat while(R1 response?) is (Bad)
-> 0x1;
:Send **CMD8** with 0x000001AA;
switch (**CMD8** response?)
case ( error(0x05) )
:SDC Ver.1;
case ( 0x1AA )
:SDC Ver.2+;
case ( other )
:reject card;
stop
endswitch
:Send **ACMD41**;
:Send **CMD58** read OCR;
if (CCS[30]) is (1) then
:SDHC/SDXC;
else (0)
:x;
endif
:Send **CMD59** (CRC_ON_OFF);
note right :DisableCRC
:Send **CMD16** (Set block len);
note right :512 bytes
stop
@enduml
```

> Note:
> 1. ACMD(N) is a sequence of CMD55-CMD(N)
> 2. SPI mode is block addressing

After putting the SD card into SPI mode, we can send **CMD17** to read a block and **CMD24** to write a block. Upon receiving a valid response to a read/write block command, we can use DMA to move the data packets.
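To make the command steps in the flow above concrete, here is a minimal sketch of how a 6-byte SPI-mode command frame is assembled: a start/transmission byte (`0x40 | index`), the 32-bit argument MSB first, and a CRC7 followed by a stop bit. The helper names `sd_crc7` and `sd_build_cmd` are made up for this note and are not taken from the boot code. The CRC only really matters for **CMD0** and **CMD8**, since CRC checking is off by default in SPI mode and the flow disables it explicitly with **CMD59**.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC7 with polynomial x^7 + x^3 + 1, processed MSB first. */
static uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;
            /* Feedback = incoming bit XOR the bit just shifted out of the 7-bit register. */
            if ((byte ^ crc) & 0x80)
                crc ^= 0x09;
            byte <<= 1;
        }
    }
    return crc & 0x7F;
}

/* Build one command frame: 0x40|cmd, 4 argument bytes (big-endian), CRC7 + stop bit. */
static void sd_build_cmd(uint8_t cmd, uint32_t arg, uint8_t frame[6])
{
    frame[0] = (uint8_t)(0x40 | (cmd & 0x3F));
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((sd_crc7(frame, 5) << 1) | 0x01);
}
```

With this, `sd_build_cmd(0, 0, frame)` yields the familiar `40 00 00 00 00 95` and `sd_build_cmd(8, 0x1AA, frame)` yields `48 00 00 01 AA 87`, which is why many minimal drivers hardcode those two CRC bytes instead of computing them.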
I'm not 100% sure how the custom DMA works at this point.

```c
void __dma_cfg(u32 src, u32 dest, u32 len, u8 spi_bypass, u8 src_btype,
               u8 dest_btype, u8 src_size, u8 dest_size);

#define __dma_spi2buf(__BUFF__, __LEN__)                              \
    do {                                                              \
        __dma_cfg(0xffffffff, (u32) (__BUFF__), (u32) (__LEN__), 0,   \
                  DMA_TYPE_CONST, DMA_TYPE_INCR, DMA_SIZE_WORD,       \
                  DMA_SIZE_WORD);                                     \
        while (__dma_busy())                                          \
            ;                                                         \
    } while (0)
```

`__dma_cfg` has the function signature shown above. The macro `__dma_spi2buf` reads data from the SPI data register into the destination address. At first, I was skeptical about the `src` config `0xffffffff`. But later, when I looked at the RTL source

```verilog
assign fifo_wdata_pre = ({32{dma_con_src_type == TYPE_FIXED}} & m_axi_intf.rdata >> {src_addr[1:0], 3'b0})|
                        ({32{dma_con_src_type == TYPE_INCR }} & m_axi_intf.rdata >> {src_addr[1:0], 3'b0})|
                        ({32{dma_con_src_type == TYPE_CONST}} & dma_src);
```

I found that `TYPE_CONST` is what is used for SPI mode, and that it is different from `TYPE_FIXED`, which means a fixed source address. With `TYPE_CONST`, the FIFO write data is `dma_src` itself rather than data read over AXI, so the configured `0xffffffff` is not an address: it appears to act as a constant all-ones word on the source side while the SPI data is read. The `bypass` option probably means that the DMA does not need to go through AXI bus -> APB -> SPI.

#### File IO

To load files from partition 0, the bootable partition, I need to understand the details of the filesystem used.

Ref: [Microsoft FAT spec](https://academy.cba.mit.edu/classes/networking_communications/SD/FAT.pdf)

#### Loader

With adequate file operations, we can load the elf file from the SD card to DDR.

### Migrating from bbl to openSBI firmware

```graphviz
digraph G {
    N1[label="Zero stage boot loader"];
    N2[label="OpenSBI firmware - Payload"];
    N3[label="Linux Kernel"];
    N1 -> N2 -> N3;
}
```

The new boot flow will look like this graph. The zero stage boot loader is responsible for loading the openSBI firmware to DDR. The linux kernel is bundled with the openSBI firmware as its payload. After openSBI initialization, the CPU switches to supervisor mode and control is transferred to the kernel.
Create a new platform in openSBI following the [official guide](https://github.com/riscv-software-src/opensbi/blob/master/docs/platform_guide.md). The file structure will look like this:

```
.
├── configs
│   └── defconfig
├── Kconfig
├── objects.mk
└── platform.c
```

`Kconfig` and `defconfig` provide build-time configuration options. `platform.c` provides the `struct sbi_platform` object used to build the openSBI firmware. The official repo kindly provides a template for new platforms.

There is a `generic` platform used by many SoC vendors including Andes, SiFive, and T-Head. The `generic` platform is an FDT (flattened device tree) based platform, and it's really overkill in my case. The hardware info can be hardcoded in the firmware for my design. However, if I choose this method for the new platform, I need to figure out how openSBI generates the device tree blob and passes it to the next stage.

I created a new platform called `amp`, for asymmetric multiprocessing. To make debugging the firmware easier, I need to make the serial console work as soon as possible. I followed how SiFive implemented their own uart firmware and created `amp-uart.[ch]` under `sbi_utils/serial`. I could not find any specification for the custom uart controller, so I searched the original RTL source code. See the Hardware RTL [UART controller](#UART-controller) section for more detail. I then implemented `putc` and `getc` for `struct sbi_console_device`.

```c
struct sbi_console_device {
    /** Name of the console device */
    char name[32];

    /** Write a character to the console output */
    void (*console_putc)(char ch);

    /** Write a character string to the console output */
    unsigned long (*console_puts)(const char *str, unsigned long len);

    /** Read a character from the console input */
    int (*console_getc)(void);
};
```

`puts` is not required because `sbi_console` checks whether it is implemented and chooses between `puts` and repeated `putc` calls.

Next is the irqchip (PLIC controller).
See Hardware RTL [PLIC](#PLIC-Platform-level-interrupt-controller) section for more memory mapping details. OpenSBI provides a set of PLIC APIs in `sbi_utils/plic.c`, which I can use in the firmware. Simply fill in my PLIC config ```c #define PLATFORM_HART_COUNT 1 #define PLATFORM_PLIC_ADDR 0xc000000 #define PLATFORM_PLIC_SIZE (0x200000 + \ (PLATFORM_HART_COUNT * 0x1000)) #define PLATFORM_PLIC_NUM_SOURCES 32 static struct plic_data plic = { .addr = PLATFORM_PLIC_ADDR, .size = PLATFORM_PLIC_SIZE, .num_src = PLATFORM_PLIC_NUM_SOURCES, }; ``` and I can use the implemented PLIC firmware. Next the timer firmware. The hardware description can be found in [CLINT](#Clint-Core-local-interrupt-controller) section. A sbi timer device must implement the following attributes ```c static struct sbi_timer_device plmt_timer = { .name = "amp-plmt", .timer_freq = 10000000, .timer_value = plmt_timer_value, .timer_event_start = plmt_timer_event_start, .timer_event_stop = plmt_timer_event_stop }; ``` `timer_event_start` and `timer_event_stop` allows the operating system to do scheduling and other timing related operations. With uart controller, plic, clint firmware, my platform can be initialized. After adding platform `amp` to the compilation config options, I built the platform with: ``` $ make PLATFORM=amp CROSS_COMPILE=riscv64-unknown-linux-gnu- O=build ``` To specify platform info, use `PLATFORM_RISCV_ISA`, `PLATFORM_RISCV_ABI`, `PLATFORM_RISCV_XLEN` for a specific architecture. For example: ``` $ make PLATFORM=amp PLATFORM_RISCV_ISA=rv64ima_zicsr_zifencei PLATFORM_RISCV_ABI=lp64 CROSS_COMPILE=riscv64-unknown-linux-gnu- O=build ``` ### boot openSBI After loading openSBI firmware to DDR, zsbl will jump to firmware entry point and firmware will start its work. Then I got this error message: ``` ... BBL ... VMLINUX ... 
SBI OpenSBI v1.4-111-gd962db2 ____ _____ ____ _____ / __ \ / ____| _ \_ _| | | | |_ __ ___ _ __ | (___ | |_) || | | | | | '_ \ / _ \ '_ \ \___ \| _ < | | | |__| | |_) | __/ | | |____) | |_) || |_ \____/| .__/ \___|_| |_|_____/|____/_____| | | |_| sbi_trap_error: hart0: trap0: trap redirect failed (error -2) sbi_trap_error: hart0: trap0: mcause=0x0000000000000003 mtval=0x0000000000000000 sbi_trap_error: hart0: trap0: mepc=0x0000000090014844 mstatus=0x0000000a00001800 sbi_trap_error: hart0: trap0: ra=0x0000000090009830 sp=0x0000000090023ef0 sbi_trap_error: hart0: trap0: gp=0x0000000000000000 tp=0x0000000090024000 sbi_trap_error: hart0: trap0: s0=0x0000000090023f00 s1=0x0000000090024150 ... ``` I also encountered the situation that the CPU jumps execution back to 0x0, which restarts the rom code again. Not sure what caused this issue. The next step will be enabling JTAG debugger and correct physical memory separation between ARM and RISC-V. ### USB-to-JTAG FT232H driver issue The JTAG debugger written in C# provided by the previous work will run into error when I followed the same steps. To make sure that the FT232H chip is functioning, I will do some basic tests on it. If the chip is working, then its the connection issue or RTL failure. Ref: [FTDI-in-C](https://www.instructables.com/FTDI-in-C/) Ref: [FTDI driver programming guide](https://ftdichip.com/wp-content/uploads/2023/09/D2XX_Programmers_Guide.pdf) Ref: [FTDI JTAG](https://ftdichip.com/Documents/AppNotes/AN_129_FTDI_Hi_Speed_USB_To_JTAG_Example.pdf) I tried to use the example driver code for FT232H chip from Reference 1, and found no issue running the code. 
The official read buffer code looks like this:

```c
dwNumBytesToSend = 0; // Reset output buffer pointer
do {
    ftStatus = FT_GetQueueStatus(ftHandle, &dwNumBytesToRead); // Get the number of bytes in the device input buffer
} while ((dwNumBytesToRead == 0) && (ftStatus == FT_OK)); // or Timeout
bool bCommandEchod = false;
ftStatus = FT_Read(ftHandle, &byInputBuffer, dwNumBytesToRead, &dwNumBytesRead);
```

and the one in the debugger looks like this:

```csharp
FTDI.FT_STATUS ftStatus;
Int32 retry = 0;
do {
    // Get the number of bytes in the device input buffer
    ftStatus = ftdi.GetRxBytesAvailable(ref NumBytesToRead);
    retry++;
    if (retry > 5000) {
        MessageBox.Show("Get input buffer timeout");
        ftdi.Close();
        return FTDI.FT_STATUS.FT_OTHER_ERROR;
    }
} while (((len == 0 && NumBytesToRead == 0) || (len != 0 && NumBytesToRead != len)) && (ftStatus == FTDI.FT_STATUS.FT_OK));
// Read out the data from input buffer
return ftStatus |= ftdi.Read(InputBuffer, NumBytesToRead, ref NumBytesRead);
```

Apart from the programming language, the difference is that the debugger has a retry limit. I commented out the retry limit, gave it a quick try, and it worked!

{%youtube tJRigf_rBEk %}

This is an important milestone because it allows me to trace the instructions executed by the CPU and probe some internal states. A future plan is to port this debugger to Linux for convenience, since Windows is not my main working environment.
### Separating ARM and RISC-V physical memory space

Ref: https://gist.github.com/yunqu/827862e580a5f9b069eccdfcdcf70398

I followed the tutorial and got the pynq boot image info:

```
FIT description: U-Boot fitImage for PYNQ arm kernel
Created: Thu Nov 18 03:29:34 2021
 Image 0 (kernel@0)
  Description: Linux Kernel
  Created: Thu Nov 18 03:29:34 2021
  Type: Kernel Image
  Compression: uncompressed
  Data Size: 5869440 Bytes = 5731.88 KiB = 5.60 MiB
  Architecture: ARM
  OS: Linux
  Load Address: 0x00080000
  Entry Point: 0x00080000
  Hash algo: sha1
  Hash value: d113552f61c40e646b7ec24bab9b0c31f3778d57
 Image 1 (fdt@0)
  Description: Flattened Device Tree blob
  Created: Thu Nov 18 03:29:34 2021
  Type: Flat Device Tree
  Compression: uncompressed
  Data Size: 19771 Bytes = 19.31 KiB = 0.02 MiB
  Architecture: ARM
  Hash algo: sha1
  Hash value: 314cc1baf3d0d5360c5ac3c6c4e0dfa742a1a27f
 Default Configuration: 'conf@1'
 Configuration 0 (conf@1)
  Description: Boot Linux kernel with FDT blob
  Kernel: kernel@0
  FDT: fdt@0
  Hash algo: sha1
  Hash value: unavailable
```

Extract the device tree with

```
dumpimage -T flat_dt -p 1 -o ~/amp.dtb image.ub
```

Convert it to a human-readable format and find the physical memory node:

```
...
memory {
    device_type = "memory";
    reg = <0x00 0x20000000>;
};
...
```

Change `reg = <0x00 0x20000000>` to `reg = <0x00 0x10000000>`, then recompile the device tree:

```
dtc -I dts -O dtb -o system.dtb amp.dts
```

Then repackage the boot image with

```
mkimage -f image.its image.ub
```

After a reboot, I checked the memory usage:

```
xilinx@pynq:~$ free -g -h -t
        total   used    free    shared  buff/cache  available
Mem:    494Mi   131Mi   141Mi   1.0Mi   220Mi       351Mi
Swap:   511Mi   0B      511Mi
Total:  1.0Gi   131Mi   653Mi
```

The total is still about 494Mi, which corresponds to the original 512 MiB (`0x20000000`) rather than the 256 MiB (`0x10000000`) I asked for, so the change did not take effect. I searched for other methods but still failed to overcome this issue.

### openSBI trap issue

I found that my platform timer init had a wrong implementation, which executed `ebreak` and halted the openSBI runtime.
openSBI successfully printed out all the hart info and platform info after a quick fix: ``` UART done SD done FAT BPB done ... BBL ... VMLINUX ... SBI sbi done OpenSBI v1.4-111-gd962db2 ____ _____ ____ _____ / __ \ / ____| _ \_ _| | | | |_ __ ___ _ __ | (___ | |_) || | | | | | '_ \ / _ \ '_ \ \___ \| _ < | | | |__| | |_) | __/ | | |____) | |_) || |_ \____/| .__/ \___|_| |_|_____/|____/_____| | | |_| Platform Name : amp Platform Features : medeleg Platform HART Count : 1 Platform IPI Device : --- Platform Timer Device : amp-plmt @ 1000000Hz Platform Console Device : amp_uart Platform HSM Device : --- Platform PMU Device : --- Platform Reboot Device : --- Platform Shutdown Device : --- Platform Suspend Device : --- Platform CPPC Device : --- Firmware Base : 0x80000000 Firmware Size : 182 KB Firmware RW Offset : 0x20000 Firmware RW Size : 54 KB Firmware Heap Offset : 0x25000 Firmware Heap Size : 34 KB (total), 2 KB (reserved), 11 KB (used), 20 KB (free) Firmware Scratch Size : 4096 B (total), 344 B (used), 3752 B (free) Runtime SBI Version : 2.0 Domain0 Name : root Domain0 Boot HART : 0 Domain0 HARTs : 0* Domain0 Region00 : 0x000000000c200000-0x000000000c200fff M: (I,R,W) S/U: (R,W) Domain0 Region01 : 0x0000000010000000-0x0000000010000fff M: (I,R,W) S/U: (R,W) Domain0 Region02 : 0x0000000008004000-0x0000000008007fff M: (I,R,W) S/U: () Domain0 Region03 : 0x0000000008010000-0x0000000008013fff M: (I,R,W) S/U: () Domain0 Region04 : 0x0000000008008000-0x000000000800ffff M: (I,R,W) S/U: () Domain0 Region05 : 0x0000000080020000-0x000000008002ffff M: (R,W) S/U: () Domain0 Region06 : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: () Domain0 Region07 : 0x000000000c000000-0x000000000c1fffff M: (I,R,W) S/U: (R,W) Domain0 Region08 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X) Domain0 Next Address : 0x0000000080200000 Domain0 Next Arg1 : 0x00000000000011d0 Domain0 Next Mode : S-mode Domain0 SysReset : yes Domain0 SysSuspend : yes Boot HART ID : 0 Boot HART Domain 
: root
Boot HART Priv Version : v1.12
Boot HART Base ISA : rv64iemac
Boot HART ISA Extensions : smaia,smstateen,sscofpmf,sstc,zicntr,smcntrpmf,sdtrig
Boot HART PMP Count : 0
Boot HART PMP Granularity : 0 bits
Boot HART PMP Address Bits: 0
Boot HART MHPM Info : 0 (0x00000000)
Boot HART Debug Triggers : 1 triggers
Boot HART MIDELEG : 0x0000000000000222
Boot HART MEDELEG : 0x000000000000b109
```

Notice that some fields have odd values, such as `Boot HART Base ISA` and `Boot HART ISA Extensions`, and that the dummy payload did not print `Test payload running`. The debugger showed:

```
(28678999 cycles) [M] 00000000800130a0:30200073 mret
(28679023 cycles) [S] InstructionAccessFault, epc = 0x80200000, tval = 0x80200000
```

Right after an `mret`, the CPU took an InstructionAccessFault exception.

`Boot HART Base ISA` was implemented as `rv64iemac` in the RTL, so there is no issue there. `Boot HART ISA Extensions` requires the platform code to implement `extensions_init`, so I need to implement that. It is related to PMP as well.

Ref: [OpenSBI Domain](https://review.coreboot.org/plugins/gitiles/opensbi/+/781cafdbee0fc9c8b0296149da35e7929abb6224/docs/domain_support.md)

If I remove the domain memory regions I added myself in the sbi firmware, the test payload runs without error.

```
Domain0 Name : root
Domain0 Boot HART : 0
Domain0 HARTs : 0*
Domain0 Region00 : 0x000000000c200000-0x000000000c200fff M: (I,R,W) S/U: (R,W)
Domain0 Region01 : 0x0000000080020000-0x000000008002ffff M: (R,W) S/U: ()
Domain0 Region02 : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: ()
Domain0 Region03 : 0x000000000c000000-0x000000000c1fffff M: (I,R,W) S/U: (R,W)
Domain0 Region04 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
Domain0 Next Address : 0x0000000080200000
Domain0 Next Arg1 : 0x0000000000000000
...
Boot HART Debug Triggers : 1 triggers
Boot HART MIDELEG : 0x0000000000000222
Boot HART MEDELEG : 0x000000000000b109

Test payload running
```

The first region is PLIC config.
The next two regions protect the openSBI firmware. I couldn't find what added the fourth region. The last region is, by default, the rest of the memory space. The payload is a `while(1) wfi();` loop.

## TODO: Verify with the Linux kernel

### Compile linux kernel 4.20

Prepare the toolchain beforehand, then put in the custom drivers. Add the driver files and modify the Makefiles:

```diff
drivers/Makefile
+obj-y += debug/

drivers/net/ethernet/Makefile
+obj-y += eth-riscv.o

drivers/power/reset/Makefile
+obj-y += pwrcon-riscv.o

drivers/spi/Makefile
+obj-y += spi-riscv.o

drivers/tty/serial/Makefile
+obj-y += sifive.o
```

Configure features:

```
make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- menuconfig
```

Compile the kernel:

```
make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- -j$(nproc)
```

Fix some minor issues like `require zifencei` and `DECLARE_TASKLET`. Then bundle the resulting `Image` with the openSBI firmware as a payload.

```
make PLATFORM=amp PLATFORM_RISCV_ISA=rv64ima_zicsr_zifencei PLATFORM_RISCV_ABI=lp64 CROSS_COMPILE=riscv64-unknown-linux-gnu- FW_PAYLOAD_PATH=../../linux/arch/riscv/boot/Image FW_PAYLOAD_OFFSET=0x400000 FW_FDT_PATH=./amp.dtb O=build
```

### Boot from openSBI firmware

Booting from SBI then gave an error:

```
...
Platform CPPC Device : ---
Firmware Base : 0x80000000
Firmware Size : 182 KB
Firmware RW Offset : 0x20000
Firmware RW Size : 54 KB
Firmware Heap Offset : 0x25000
Firmware Heap Size : 34 KB (total), 2 KB (reserved), 11 KB (used), 20 KB (free)
Firmware Scratch Size : 4096 B (total), 344 B (used), 3752 B (free)
Runtime SBI Version : 2.0

Domain0 Name : root
Domain0 Boot HART : 0
Domain0 HARTs : 0*
Domain0 Region00 : 0x000000000c200000-0x000000000c200fff M: (I,R,W) S/U: (R,W)
Domain0 Region01 : 0x0000000080020000-0x000000008002ffff M: (R,W) S/U: ()
Domain0 Region02 : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: ()
Domain0 Region03 : 0x000000000c000000-0x000000000c1fffff M: (I,R,W) S/U: (R,W)
Domain0 Region04 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
Domain0 Next Address : 0x0000000080400000
Domain0 Next Arg1 : 0x0000000080016000
Domain0 Next Mode : S-mode
Domain0 SysReset : yes
Domain0 SysSuspend : yes
```

Some irrelevant information has been stripped. We can see that the next address is the payload (linux kernel entry) address and next arg1 is the device tree address. So control has been passed to the kernel, but nothing is shown on the screen. The kernel doc about [RISC-V booting](https://www.kernel.org/doc/html/next/riscv/boot.html) only mentions that `a0` holds the hartid and `a1` the FDT address.

Looking into the CPU instruction trace, the PC got stuck at 0xffffffff800000dc.
From the kernel disassembly:

```
ffffffff800000cc: 5f018193 addi gp,gp,1520 # ffffffff804256b8 <sched_clock_running>
ffffffff800000d0: 18061073 csrw satp,a2
ffffffff800000d4: 8082 ret
ffffffff800000d6: 0001 nop
ffffffff800000d8: 10500073 wfi
ffffffff800000dc: bff5 j ffffffff800000d8 <relocate+0x60>
```

This code maps to `arch/riscv/kernel/head.S`:

```asm=
	csrw sptbr, a0
.align 2
1:
	/* Set trap vector to spin forever to help debug */
	la a0, .Lsecondary_park
	csrw stvec, a0

	/* Reload the global pointer */
.option push
.option norelax
	la gp, __global_pointer$
.option pop

	/* Switch to kernel page tables */
	csrw sptbr, a2
	ret

.Lsecondary_park:
	/* We lack SMP support or have too many harts, so park this hart */
	wfi
	j .Lsecondary_park
```

Lines 5 ~ 6 set up a temporary trap vector that points to a `wfi` / `j` pair, which spins forever.

```
(38151087 cycles) [S] ffffffff800144c0:00338097 auipc ra,0x338 ra ffffffff8034c4c0
(38151088 cycles) [S] ffffffff800144c4:5d4080e7 jalr ra,1492(ra) ra ffffffff800144c8
(38151200 cycles) [S] LoadPageFault, epc = 0xffffffff8034ca94, tval = 0xffffffff7fc16000
(38151242 cycles) [S] ffffffff80000100:10500073 wfi
```

After a LoadPageFault, the PC jumps to the spin address. My first suspicion was that something was wrong with the PMP and PMA configuration, which I didn't handle in the openSBI firmware. In the openSBI firmware, there is a call to `sbi_hart_pmp_configure(struct sbi_scratch *scratch)` when initializing the platform. This function configures the PMP CSRs according to the `Domain` previously set. The hang happened in `fdt_check_header(params)`, where the kernel tries to parse the device tree passed from the previous boot stage. My guess was that the location where the DTB is placed when it is built into the openSBI firmware through `FW_FDT_PATH` cannot be accessed from S/U mode.
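The guess above can be checked against the Domain0 regions printed in the boot log. The following is a minimal Python sketch, not openSBI code: the region table is copied by hand from the log, and the helper name `su_permissions` is made up for illustration. It assumes regions are matched in the listed order, in the way that the lowest-numbered matching PMP entry decides.

```python
# Domain0 regions copied from the openSBI boot log: (start, end, S/U permissions).
REGIONS = [
    (0x0C200000, 0x0C200FFF, "RW"),   # Region00
    (0x80020000, 0x8002FFFF, ""),     # Region01: firmware RW area
    (0x80000000, 0x8001FFFF, ""),     # Region02: firmware code
    (0x0C000000, 0x0C1FFFFF, "RW"),   # Region03
    (0x0000000000000000, 0xFFFFFFFFFFFFFFFF, "RWX"),  # Region04: everything else
]


def su_permissions(addr):
    """Return the S/U permission string of the first region containing addr.

    Assumption: the first matching region in the listed order wins,
    mirroring PMP priority (hypothetical helper, for illustration only).
    """
    for start, end, perms in REGIONS:
        if start <= addr <= end:
            return perms
    return ""


for name, addr in [("DTB inside firmware", 0x80016000),
                   ("candidate DTB address", 0x98000000)]:
    perms = su_permissions(addr)
    print(f"{name} @ {addr:#x}: S/U permissions = {perms or 'none'}")
```

Under these assumptions, `0x80016000` falls in Region02, which grants S/U mode no access, while `0x98000000` only matches the catch-all Region04. Note that a PMP denial is normally reported as an access fault rather than a page fault, so this check shows that the address is off limits to S-mode but does not by itself prove that PMP caused the LoadPageFault in the trace.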
So I tried moving the DTB to a higher, valid address (0x98000000) in the boot code, which resulted in:

```
Domain0 Next Address : 0x0000000080400000
Domain0 Next Arg1 : 0x0000000098000000
Domain0 Next Mode : S-mode
Domain0 SysReset : yes
...
Boot HART MIDELEG : 0x0000000000000222
Boot HART MEDELEG : 0x000000000000b109
[    0.000000] Linux version 4.20.0+ (jacob@jacob-ubuntu-server) (gcc version 13.2.0 (gc891d8dc23e)) #13 Fri Jun 21 15:32:17 CST 2024
[    0.000000] printk: bootconsole [early0] enabled
[    0.000000] initrd not found or empty - disabling initrd
[    0.000000] Zone ranges:
[    0.000000] DMA32 [mem 0x0000000080400000-0x00000000a03fffff]
[    0.000000] Normal [mem 0x00000000a0400000-0x00000a03ffffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000] node 0: [mem 0x0000000080400000-0x00000000a03fffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080400000-0x00000000a03fffff]
```

This time the kernel printed some output, but it still got stuck at `Initmem setup node 0`. I immediately noticed a suspicious spot: `Initmem` ranges from `0x80400000` to `0xa03fffff`, whereas the upper bound should be `0x9fffffff`. It turned out that the device tree was wrong.

```diff
 #address-cells = <2>;
 #size-cells = <2>;
 ddr: ddr@80000000 {
     device_type = "memory";
-    reg = <0x00000000 0x80400000 0x00000000 0x20000000>;
+    reg = <0x00000000 0x80400000 0x00000000 0x1fc00000>;
 };
```

The last two cells of the `reg` property are the size. Since I moved the DDR base from `0x80000000` to `0x80400000`, the size must shrink by the same `0x400000`, from `0x20000000` to `0x1fc00000`, so that the region still ends at `0x9fffffff`. Solved.

After that, the kernel got stuck at the following point, with no error message or any other feedback. When I looked at the CPU PC, it was not parked in a dead loop but still running.
```
[    0.000000] Memory: 440772K/520192K available (3377K kernel code, 195K rwdata, 1619K rodata, 124K init, 232K bss, 79420K reserved, 0K cm)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS: 0, nr_irqs: 0, preallocated irqs: 0
[    0.000000] plic: mapped 32 interrupts to 2 (out of 2) handlers.
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns
```

Because there was no feedback (error message) from the kernel, I inserted some custom messages into `start_kernel`.

```
[    0.000000] Tick init
[    0.000000] RCU
[    0.000000] Init timers
[    0.000000] hrtimers
[    0.000000] Soft IRQ init
[    0.000000] Time init
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns
[    0.000000] Printk safe
[    0.000000] Perf event
[    0.000000] Profile event
[    0.000000] Call function
[    0.000000] Local irq

CPU instruction trace:
(49066075 cycles) [S] ffffffff80000aec:10016073 csrsi sstatus,0x2 sstatus 0000000200000102
(49066099 cycles) [S] Interrupt 5, epc = 0xffffffff80000af0, tval = 0x00000000 <- Interrupt 5: S mode timer interrupt
(49066149 cycles) [S] ffffffff800201e0:14021273 csrrw tp,sscratch,tp sscratch ffffffff80508428 tp 0000000000000000
(49066150 cycles) [S] ffffffff800201e4:00021663 bnez tp,ffffffff800201f0
(49066151 cycles) [S] ffffffff800201e8:14002273 csrr tp,sscratch
```

After enabling local interrupts, the kernel hangs. Looking at the trace, I found that the CPU took a timer interrupt and never returned to the kernel startup code. I realized that my timer firmware might have an issue and revised it to use the ACLINT mtimer provided by openSBI, which solved the issue.

> ACLINT spec 1.1:
> The RISC-V ACLINT specification is defined to be backward compatible with the SiFive CLINT specification.

```
[   13.860000] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[   13.860000] CPU: 0 PID: 1 Comm: init Not tainted 4.20.0+ #18
[   13.860000] Call Trace:
[   13.860000] [<ffffffff8002166c>] walk_stackframe+0x0/0xc0
[   13.860000] [<ffffffff80025f74>] panic+0x110/0x248
[   13.860000] [<ffffffff8002721c>] forget_original_parent+0x2c8/0x2d4
[   13.860000] [<ffffffff8002757c>] exit_notify+0x30/0x144
[   13.860000] [<ffffffff80027884>] do_exit+0x1f4/0x420
[   13.860000] [<ffffffff80028514>] do_group_exit+0x2c/0x8c
[   13.860000] [<ffffffff80031e44>] get_signal+0x100/0x4c0
[   13.860000] [<ffffffff80020de0>] do_notify_resume+0x4c/0x180
[   13.860000] [<ffffffff8002034c>] ret_from_syscall+0xc/0x10
[   13.860000] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 ]---
[  310.240000] EXT4-fs (mmcblk0p2): error count since last fsck: 10
[  310.240000] EXT4-fs (mmcblk0p2): initial error at time 457: ext4_iget:5074: inode 1218298
[  310.250000] EXT4-fs (mmcblk0p2): last error at time 703: ext4_iget:5074: inode 1218298
```

The exit code `0x00000004` corresponds to signal 4 (`SIGILL`, illegal instruction). The busybox init was built with floating-point instructions, while my firmware does not provide FP emulation, so I need to build a new rootfs.

### Construct rootfs and busybox

Compile busybox

```
$ ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- make menuconfig
$ ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- make -j$(nproc)
$ ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- make install # Copy all built files to _install/
```

Make rootfs

```
# Make rootfs
$ mkdir rootfs
$ cp -r $BUSYBOX_DIR/_install/* rootfs
# Install libraries from toolchain
$ cp -a rv64ima-linux/sysroot/lib/ ../amp/rootfs/
# Create empty directories
$ mkdir -p dev home mnt proc sys tmp var
$ mkdir -p etc/init.d
```

Create a minimal rcS file:

```shell
#!/bin/sh +x
# /etc/rcS
export PATH=/sbin:/bin:/usr/bin
mount -t sysfs sysfs /sys
mount -t proc proc /proc
hostname amp
```

The rest is the same as before. Make the disk image using mkfs.ext3, copy the files ...
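Before copying a rebuilt busybox into the rootfs, it may be worth confirming that it targets a soft-float ABI, since the earlier panic came from floating-point instructions in init. The following is a minimal Python sketch that reads the `e_flags` field of the ELF header, whose float-ABI bits are defined by the RISC-V ELF psABI. The helper names and the example path are assumptions for illustration, not part of the original flow.

```python
import struct

EM_RISCV = 243                 # e_machine value for RISC-V
EF_RISCV_FLOAT_ABI = 0x0006    # mask of the float-ABI bits in e_flags
FLOAT_ABI_NAMES = {0x0: "soft", 0x2: "single", 0x4: "double", 0x6: "quad"}


def riscv_float_abi(header: bytes) -> str:
    """Return the float ABI name encoded in a RISC-V ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = header[4] == 2                    # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little endian
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    if machine != EM_RISCV:
        raise ValueError("not a RISC-V binary")
    # e_flags follows e_entry/e_phoff/e_shoff, whose width depends on the class.
    flags_offset = 48 if is_64 else 36
    (flags,) = struct.unpack_from(endian + "I", header, flags_offset)
    return FLOAT_ABI_NAMES[flags & EF_RISCV_FLOAT_ABI]


def float_abi_of_file(path: str) -> str:
    """Read the first 64 bytes of a file and report its RISC-V float ABI."""
    with open(path, "rb") as f:
        return riscv_float_abi(f.read(64))
```

For example, `float_abi_of_file("rootfs/bin/busybox")` (a hypothetical path) should report `soft` for an `lp64` build. This only inspects the declared ABI; a binary compiled with F/D in `-march` but a soft-float ABI could still contain FP instructions, so checking the disassembly with `objdump` remains the more thorough test.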
**Quick Demo of this version:**

{%youtube iY2cqgiRfYY %}

Now that I have a rough grasp of the entire flow, including CPU RTL -> openSBI firmware -> Linux drivers -> rootfs, I can start all over again and make it an RV32IMAC CPU.