# Homework3: Soft CPU
## [Reindeer - RISCV `RV32I[M]` Soft CPU](https://github.com/PulseRain/Reindeer)
PulseRain Reindeer is a soft CPU with a Von Neumann architecture. It supports the RISC-V RV32I[M] instruction set and features a 2 x 2 pipeline. It aims to strike a balance between speed and area, and offers a flexible soft CPU choice across FPGA platforms.
## Highlights - 2 x 2 Pipeline
Reindeer's pipeline is composed of 4 stages:
1. Instruction Fetch (IF)
2. Instruction Decode (ID)
3. Instruction Execution (IE)
4. Register Write Back and Memory Access (MEM)
However, unlike traditional pipelines, Reindeer's pipeline stages are mapped to a 2 x 2 layout, as illustrated below:

In the 2 x 2 layout, each stage is active every other clock cycle. On even cycles only the IF and IE stages are active, while on odd cycles only the ID and MEM stages are active. As a result, Instruction Fetch and Memory Access always happen on different clock cycles, which avoids the structural hazard caused by the single-port memory.
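The even/odd alternation can be sketched in a few lines of Python. This is only an illustrative model of the schedule described above, not Reindeer's RTL; the function names `active_stages`, `schedule`, and `has_memory_port_conflict` are made up for this example.

```python
def active_stages(cycle):
    """Return the pipeline stages active on the given clock cycle."""
    # Even cycles: IF and IE. Odd cycles: ID and MEM.
    return ("IF", "IE") if cycle % 2 == 0 else ("ID", "MEM")


def schedule(num_cycles):
    """List the active stages for cycles 0 .. num_cycles - 1."""
    return [active_stages(c) for c in range(num_cycles)]


def has_memory_port_conflict(num_cycles):
    """True if IF and MEM (both need the memory port) share a cycle."""
    return any("IF" in s and "MEM" in s for s in schedule(num_cycles))


if __name__ == "__main__":
    for cycle, stages in enumerate(schedule(4)):
        print(cycle, stages)
    print("conflict:", has_memory_port_conflict(1000))
```

Because IF only appears on even cycles and MEM only on odd cycles, the conflict check never fires, which is the property the 2 x 2 layout relies on.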
## [Verilator](https://www.veripool.org/wiki/verilator)
Verilator is invoked with parameters similar to GCC or Synopsys's VCS. It "Verilates" the specified Verilog or SystemVerilog code by reading it, performing lint checks, and optionally inserting assertion checks and coverage-analysis points. It outputs single- or multi-threaded .cpp and .h files, the "Verilated" code.
The user writes a small C++/SystemC wrapper file, which instantiates the "Verilated" model of the user's top-level module. These C++/SystemC files are then compiled by a C++ compiler (gcc/clang/MSVC++). The resulting executable performs the design simulation. Verilator also supports linking its generated libraries, optionally encrypted, into other simulators.
Verilator may not be the best choice if you are expecting a full-featured replacement for NC-Verilog, VCS, or another commercial Verilog simulator, or if you are looking for a behavioral Verilog simulator, e.g. for a quick class project (Icarus Verilog is recommended for this). However, if you are looking for a path to migrate SystemVerilog to C++ or SystemC, or your team is comfortable writing just a touch of C++ code, Verilator is the tool for you.
## Hold and Load
Bootstrapping a soft CPU tends to be a headache. The traditional approach is more or less as follows:
1. Write a boot loader in software.
2. Store the boot loader in a ROM.
3. After power-on reset, the boot loader executes first and moves the rest of the code/data into RAM. The PC is then set to the `_start` address of the new image.
The drawbacks of the above approach are:
1. The boot loader is more or less intrusive, as it takes up memory space.
2. The implementation of the ROM is not consistent across FPGA platforms. For some FPGAs, like Intel Cyclone or Xilinx Artix, the initial memory contents can be stored in the FPGA bitstream, but other platforms might use external flash for the ROM data. The boot loader itself could be executed in place, or loaded by a small preloader implemented in FPGA fabric. And when it comes to SoC + FPGA devices, like the Microsemi SmartFusion2, a hard-core processor could also be involved in boot loading. In other words, the soft CPU might have to improvise a little to work on various platforms.
To break the status quo, the PulseRain Reindeer takes a different approach called "hold and load", which brings a hardware-based OCD (on-chip debugger) to the fore, as illustrated below:


## Source code taken from [Homework1](https://hackmd.io/SvOdYVW9Qi6aASB4oCRvag)
**Assembly code**
```cpp=
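main:
    addi a2, zero, 7      # a2 = number of disks
    addi a0, zero, 0      # a0 = move counter
    jal hanoi
    j end
hanoi:
    addi sp, sp, -8       # allocate a stack frame
    sw ra, 4(sp)          # save return address
    sw a2, 0(sp)          # save disk count
    addi t0, zero, 1
    beq a2, t0, re        # base case: one disk left
    addi a2, a2, -1       # n - 1
    jal hanoi             # move n - 1 disks
    addi a0, a0, 1        # move the largest disk
    jal hanoi             # move n - 1 disks again
    lw ra, 4(sp)          # restore return address
    lw a2, 0(sp)          # restore disk count
    addi sp, sp, 8        # release the stack frame
    jr ra
re:
    addi sp, sp, 8        # release the stack frame
    addi a0, a0, 1        # count the single move
    jr ra
end: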
```
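To make the recursion easier to follow, here is a rough Python model of the routine above. It is only a sketch for checking the expected move count: the parameter `n` plays the role of `a2` and `moves` plays the role of `a0`, and the function name is chosen for this example.

```python
def hanoi(n, moves=0):
    """Return the move counter after solving n disks, starting from `moves`."""
    if n == 1:
        # Base case (label `re`): a single disk costs one move.
        return moves + 1
    moves = hanoi(n - 1, moves)  # first recursive call
    moves += 1                   # move the largest disk
    moves = hanoi(n - 1, moves)  # second recursive call
    return moves


if __name__ == "__main__":
    for n in (1, 3, 7):
        print(n, hanoi(n))
```

The count follows T(1) = 1 and T(n) = 2T(n-1) + 1, i.e. 2^n - 1 moves, so 7 disks should leave 127 in the counter and 3 disks should leave 7.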
**Rewrite the code to match the requirements described in [RISC-V Compliance Tests](https://github.com/riscv/riscv-compliance/blob/master/doc/README.adoc).**
```cpp=
#include "riscv_test_macros.h"
#include "compliance_test.h"
#include "compliance_io.h"
# Test Virtual Machine (TVM) used by program.
RV_COMPLIANCE_RV32M
# Test code region.
RV_COMPLIANCE_CODE_BEGIN
RVTEST_IO_INIT
RVTEST_IO_ASSERT_GPR_EQ(x31, x0, 0x00000000)
RVTEST_IO_WRITE_STR(x31, "Test Begin\n")
# ---------------------------------------------------------------------------------------------
RVTEST_IO_WRITE_STR(x31, "# Test part 1\n")
# Addresses for test data and results
la x5, test_1_data
la x6, test_1_res
# Load testdata
lw x12, 0(x5)
# Register initialization
li x11, 0 # result
# Test
jal x1, hanoi
# Store results
sw x12, 0(x6)
sw x11, 4(x6)
//
// Assert
//
RVTEST_IO_CHECK()
RVTEST_IO_ASSERT_GPR_EQ(x6, x12, 0x3)
RVTEST_IO_WRITE_STR(x31, "# Argument test - complete\n")
RVTEST_IO_ASSERT_GPR_EQ(x6, x11, 0x00000007)
RVTEST_IO_WRITE_STR(x31, "# Combination test - complete\n")
RVTEST_IO_WRITE_STR(x31, "# Test part 1 End\n")
# Skip over the subroutine only after the assertions have run
j end
hanoi:
addi x2, x2, -8
sw x1, 4(x2)
sw x12, 0(x2)
addi x13, x0, 1
beq x12, x13, re
addi x12, x12, -1
jal x1, hanoi
addi x11, x11, 1
jal x1, hanoi
lw x1, 4(x2)
lw x12, 0(x2)
addi x2, x2, 8
jr x1
re:
addi x2, x2, 8
addi x11, x11, 1
jr x1
end:
RV_COMPLIANCE_HALT
RV_COMPLIANCE_CODE_END
# Input data section.
.data
test_1_data:
.word 3
#Output data section.
RV_COMPLIANCE_DATA_BEGIN
test_1_res:
.fill 2, 4, -1
RV_COMPLIANCE_DATA_END
```
**Convert .S file to .elf**
:::spoiler .elf.objdump file
```cpp=
/home/samyang/riscv-compliance/imperas-riscv-tests/work/rv32i/Tower_of_Hanoi.elf: file format elf32-littleriscv
Disassembly of section .text.init:
80000000 <_start>:
80000000: 04c0006f j 8000004c <reset_vector>
80000004 <trap_vector>:
80000004: 34202f73 csrr t5,mcause
80000008: 00800f93 li t6,8
8000000c: 03ff0a63 beq t5,t6,80000040 <write_tohost>
80000010: 00900f93 li t6,9
80000014: 03ff0663 beq t5,t6,80000040 <write_tohost>
80000018: 00b00f93 li t6,11
8000001c: 03ff0263 beq t5,t6,80000040 <write_tohost>
80000020: 80000f17 auipc t5,0x80000
80000024: fe0f0f13 addi t5,t5,-32 # 0 <_start-0x80000000>
80000028: 000f0463 beqz t5,80000030 <trap_vector+0x2c>
8000002c: 000f0067 jr t5
80000030: 34202f73 csrr t5,mcause
80000034: 000f5463 bgez t5,8000003c <handle_exception>
80000038: 0040006f j 8000003c <handle_exception>
8000003c <handle_exception>:
8000003c: 5391e193 ori gp,gp,1337
80000040 <write_tohost>:
80000040: 00001f17 auipc t5,0x1
80000044: fc3f2023 sw gp,-64(t5) # 80001000 <tohost>
80000048: ff9ff06f j 80000040 <write_tohost>
8000004c <reset_vector>:
8000004c: f1402573 csrr a0,mhartid
80000050: 00051063 bnez a0,80000050 <reset_vector+0x4>
80000054: 00000297 auipc t0,0x0
80000058: 01028293 addi t0,t0,16 # 80000064 <reset_vector+0x18>
8000005c: 30529073 csrw mtvec,t0
80000060: 18005073 csrwi satp,0
80000064: 00000297 auipc t0,0x0
80000068: 01c28293 addi t0,t0,28 # 80000080 <reset_vector+0x34>
8000006c: 30529073 csrw mtvec,t0
80000070: fff00293 li t0,-1
80000074: 3b029073 csrw pmpaddr0,t0
80000078: 01f00293 li t0,31
8000007c: 3a029073 csrw pmpcfg0,t0
80000080: 00000297 auipc t0,0x0
80000084: 01828293 addi t0,t0,24 # 80000098 <reset_vector+0x4c>
80000088: 30529073 csrw mtvec,t0
8000008c: 30205073 csrwi medeleg,0
80000090: 30305073 csrwi mideleg,0
80000094: 30405073 csrwi mie,0
80000098: 00000193 li gp,0
8000009c: 00000297 auipc t0,0x0
800000a0: f6828293 addi t0,t0,-152 # 80000004 <trap_vector>
800000a4: 30529073 csrw mtvec,t0
800000a8: 00100513 li a0,1
800000ac: 01f51513 slli a0,a0,0x1f
800000b0: 00054863 bltz a0,800000c0 <reset_vector+0x74>
800000b4: 0ff0000f fence
800000b8: 00100193 li gp,1
800000bc: 00000073 ecall
800000c0: 80000297 auipc t0,0x80000
800000c4: f4028293 addi t0,t0,-192 # 0 <_start-0x80000000>
800000c8: 00028e63 beqz t0,800000e4 <reset_vector+0x98>
800000cc: 10529073 csrw stvec,t0
800000d0: 0000b2b7 lui t0,0xb
800000d4: 10928293 addi t0,t0,265 # b109 <_start-0x7fff4ef7>
800000d8: 30229073 csrw medeleg,t0
800000dc: 30202373 csrr t1,medeleg
800000e0: f4629ee3 bne t0,t1,8000003c <handle_exception>
800000e4: 30005073 csrwi mstatus,0
800000e8: 00002537 lui a0,0x2
800000ec: 80050513 addi a0,a0,-2048 # 1800 <_start-0x7fffe800>
800000f0: 30052073 csrs mstatus,a0
800000f4: 00000297 auipc t0,0x0
800000f8: 01428293 addi t0,t0,20 # 80000108 <begin_testcode>
800000fc: 34129073 csrw mepc,t0
80000100: f1402573 csrr a0,mhartid
80000104: 30200073 mret
80000108 <begin_testcode>:
80000108: 00002297 auipc t0,0x2
8000010c: ef828293 addi t0,t0,-264 # 80002000 <test_1_data>
80000110: 00002317 auipc t1,0x2
80000114: f0030313 addi t1,t1,-256 # 80002010 <begin_signature>
80000118: 0002a603 lw a2,0(t0)
8000011c: 00000593 li a1,0
80000120: 010000ef jal ra,80000130 <hanoi>
80000124: 00c32023 sw a2,0(t1)
80000128: 00b32223 sw a1,4(t1)
8000012c: 0440006f j 80000170 <end>
80000130 <hanoi>:
80000130: ff810113 addi sp,sp,-8
80000134: 00112223 sw ra,4(sp)
80000138: 00c12023 sw a2,0(sp)
8000013c: 00100693 li a3,1
80000140: 02d60263 beq a2,a3,80000164 <re>
80000144: fff60613 addi a2,a2,-1
80000148: fe9ff0ef jal ra,80000130 <hanoi>
8000014c: 00158593 addi a1,a1,1
80000150: fe1ff0ef jal ra,80000130 <hanoi>
80000154: 00412083 lw ra,4(sp)
80000158: 00012603 lw a2,0(sp)
8000015c: 00810113 addi sp,sp,8
80000160: 00008067 ret
80000164 <re>:
80000164: 00810113 addi sp,sp,8
80000168: 00158593 addi a1,a1,1
8000016c: 00008067 ret
80000170 <end>:
80000170: 0ff0000f fence
80000174: 00100193 li gp,1
80000178: 00000073 ecall
8000017c <end_testcode>:
8000017c: c0001073 unimp
80000180: 0000 unimp
...
Disassembly of section .tohost:
80001000 <tohost>:
...
80001100 <fromhost>:
...
Disassembly of section .data:
80002000 <test_1_data>:
80002000: 00000003 lb zero,0(zero) # 0 <_start-0x80000000>
...
80002010 <begin_signature>:
80002010: ffff 0xffff
80002012: ffff 0xffff
80002014: ffff 0xffff
80002016: ffff 0xffff
...
80002020 <end_signature>:
...
80002100 <begin_regstate>:
80002100: 0080 addi s0,sp,64
...
80002200 <end_regstate>:
80002200: 0004 0x4
```
:::
**Convert .elf file to .vcd**
