# Assignment3: SoftCPU
contributed by < `dck9661` >
## Modify the assembly programs for Reindeer Simulation
### Analyze
The Reindeer GitHub repository only lets us execute the prebuilt ELF files provided by [Reindeer](https://github.com/PulseRain/Reindeer/tree/master/sim/compliance). To run our own test we have to compile the assembly code ourselves. So we first analyze the assembly code provided by [riscv-compliance](https://github.com/riscv/riscv-compliance) to learn how to build code that runs on the Reindeer Verilator simulation. Take [I-ADD-01.S](https://github.com/riscv/riscv-compliance/blob/master/riscv-test-suite/rv32i/src/I-ADD-01.S) as an example.
From this code we need to know four things:
1. The assembly code uses many macros from different header files: compliance_test.h, compliance_io.h, and test_macros.h.
2. It uses the register names x0 to x31 instead of the ABI names (ra, sp, t1, ...).
3. The input data section provides the initial global data.
4. The output data section holds the results, which show whether the test is correct.
```cpp=
#include "compliance_test.h"
#include "compliance_io.h"
#include "test_macros.h"
# Test Virtual Machine (TVM) used by program.
RV_COMPLIANCE_RV32M
# Test code region
RV_COMPLIANCE_CODE_BEGIN
RVTEST_IO_INIT
RVTEST_IO_ASSERT_GPR_EQ(x31, x0, 0x00000000)
RVTEST_IO_WRITE_STR(x31, "# Test Begin\n")
# ---------------------------------------------------------------------------------------------
RVTEST_IO_WRITE_STR(x31, "# Test part A1 - general test of value 0 with 0, 1, -1, MIN, MAX immediate values\n");
# Addresses for test data and results
la x1, test_A1_data
la x2, test_A1_res
# Load testdata
lw x3, 0(x1)
# Test
addi x4, x3, 1
addi x5, x3, 0x7FF
addi x6, x3, 0xFFFFFFFF
addi x7, x3, 0
addi x8, x3, 0xFFFFF800
# Store results
sw x3, 0(x2)
sw x4, 4(x2)
sw x5, 8(x2)
sw x6, 12(x2)
sw x7, 16(x2)
sw x8, 20(x2)
//
// Assert
//
RVTEST_IO_CHECK()
RVTEST_IO_ASSERT_GPR_EQ(x2, x3, 0x00000000)
RVTEST_IO_ASSERT_GPR_EQ(x2, x4, 0x00000001)
RVTEST_IO_ASSERT_GPR_EQ(x2, x5, 0x000007FF)
RVTEST_IO_ASSERT_GPR_EQ(x2, x6, 0xFFFFFFFF)
RVTEST_IO_ASSERT_GPR_EQ(x2, x7, 0x00000000)
RVTEST_IO_ASSERT_GPR_EQ(x2, x8, 0xFFFFF800)
RVTEST_IO_WRITE_STR(x31, "# Test part A1 - Complete\n");
# ---------------------------------------------------------------------------------------------
# HALT
RV_COMPLIANCE_HALT
RV_COMPLIANCE_CODE_END
# Input data section.
.data
.align 4
test_A1_data:
.word 0
test_A2_data:
.word 1
test_A3_data:
.word -1
test_A4_data:
.word 0x7FFFFFFF
test_A5_data:
.word 0x80000000
test_B_data:
.word 0x0000ABCD
test_C_data:
.word 0x12345678
test_D_data:
.word 0xFEDCBA98
test_E_data:
.word 0x36925814
# Output data section.
RV_COMPLIANCE_DATA_BEGIN
.align 4
test_A1_res:
.fill 6, 4, -1
test_A2_res:
.fill 6, 4, -1
test_A3_res:
.fill 6, 4, -1
test_A4_res:
.fill 6, 4, -1
test_A5_res:
.fill 6, 4, -1
test_B_res:
.fill 7, 4, -1
test_C_res:
.fill 1, 4, -1
test_D_res:
.fill 2, 4, -1
test_E_res:
.fill 4, 4, -1
RV_COMPLIANCE_DATA_END
```
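The expected values in the assertions follow from how `addi` sign-extends its 12-bit immediate. The sketch below is a small C++ model of that arithmetic for part A1, where x3 is loaded with 0 from `test_A1_data`. It is not code from riscv-compliance; `sign_extend12` and `addi` are hypothetical helper names chosen for illustration.

```cpp
#include <cstdint>

// Sign-extend the low 12 bits of an I-type immediate to 32 bits.
constexpr std::int32_t sign_extend12(std::uint32_t imm) {
    return ((imm & 0x800u) != 0u)
               ? static_cast<std::int32_t>(imm & 0xFFFu) - 0x1000
               : static_cast<std::int32_t>(imm & 0xFFFu);
}

// Model of `addi rd, rs1, imm` on a 32-bit register (wraps on overflow).
constexpr std::uint32_t addi(std::uint32_t rs1, std::uint32_t imm) {
    return rs1 + static_cast<std::uint32_t>(sign_extend12(imm));
}

// Part A1: x3 = 0, so each result is just the sign-extended immediate.
static_assert(addi(0u, 1u) == 0x00000001u, "x4");
static_assert(addi(0u, 0x7FFu) == 0x000007FFu, "x5");
static_assert(addi(0u, 0xFFFFFFFFu) == 0xFFFFFFFFu, "x6: immediate is -1");
static_assert(addi(0u, 0u) == 0x00000000u, "x7");
static_assert(addi(0u, 0xFFFFF800u) == 0xFFFFF800u, "x8: immediate is -2048");
```

The `static_assert` lines are checked at compile time, so the file only compiles if the model agrees with the values in the `RVTEST_IO_ASSERT_GPR_EQ` lines.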
Once we understand how the assembly above works, the next step is a linker script that matches the Reindeer simulation testbench.
```cpp=
OUTPUT_ARCH( "riscv" )
ENTRY(_start)
SECTIONS
{
. = 0x00000000;
.text.trap : { *(.text.trap) }
. = 0x80000000;
.text.init : { *(.text.init) }
. = ALIGN(0x1000);
.tohost : { *(.tohost) }
. = ALIGN(0x1000);
.text : { *(.text) }
. = ALIGN(0x1000);
.data : { *(.data) }
.data.string : { *(.data.string)}
.bss : { *(.bss) }
_end = .;
}
```
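Each `. = ALIGN(0x1000);` moves the location counter up to the next 4 KiB boundary before the following section is placed. A minimal sketch of that rounding in C++ is shown here. `align_up` is a hypothetical helper name, and the 0x1c4-byte size is taken as an example value for `.text.init`.

```cpp
#include <cstdint>

// Round addr up to the next multiple of alignment (must be a power of two),
// which is the effect of `. = ALIGN(alignment);` on the location counter.
constexpr std::uint32_t align_up(std::uint32_t addr, std::uint32_t alignment) {
    return (addr + alignment - 1u) & ~(alignment - 1u);
}

// .text.init starts at 0x80000000; 0x1c4 bytes is an example section size.
constexpr std::uint32_t text_init_end = 0x80000000u + 0x1c4u;
constexpr std::uint32_t next_section = align_up(text_init_end, 0x1000u);
static_assert(next_section == 0x80001000u, "next 4 KiB boundary");

// An address that is already aligned is left unchanged.
static_assert(align_up(0x80002000u, 0x1000u) == 0x80002000u, "no change");
```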
The final step is to write a Makefile that compiles the assembly code.
```cpp=
RISCV_PREFIX ?= riscv32-unknown-elf-
RISCV_GCC ?= $(RISCV_PREFIX)gcc
RISCV_OBJDUMP ?= $(RISCV_PREFIX)objdump
RISCV_GCC_OPTS ?= -static -mcmodel=medany -fvisibility=hidden -nostdlib -nostartfiles
ROOTDIR ?= ${your_path}/riscv-compliance
TARGETDIR ?= $(ROOTDIR)/riscv-target
RISCV_TARGET ?= riscvOVPsim
APP_SRC = I-ADDI-01.S
EXE = I-ADDI-01.elf
all:
	$(RISCV_GCC) $(APP_SRC) $(RISCV_GCC_OPTS) \
		-I$(ROOTDIR)/riscv-test-env/ \
		-I$(ROOTDIR)/riscv-test-env/p/ \
		-I$(TARGETDIR)/$(RISCV_TARGET)/ \
		-T$(ROOTDIR)/riscv-test-env/p/link.ld \
		-o $(EXE)
```
### Write my own assembly code (bubble sort)
```cpp=
# RISC-V Compliance Test Bubble_sort.S
#include "compliance_test.h"
#include "compliance_io.h"
#include "test_macros.h"
# Test Virtual Machine (TVM) used by program.
RV_COMPLIANCE_RV32M
# Test code region.
RV_COMPLIANCE_CODE_BEGIN
RVTEST_IO_INIT
RVTEST_IO_ASSERT_GPR_EQ(x31, x0, 0x00000000)
RVTEST_IO_WRITE_STR(x31, "# Test Begin\n")
# Addresses for test data and results
la x1, test_A1_data
la x2, test_A1_res
# Load testdata
lw x3, 0(x1)
# Register initialization
addi x3,x2,-32
li x4, 2
sw x4, 12(x3)
li x4, 1
sw x4, 16(x3)
li x4, 5
sw x4, 20(x3)
li x4, 4
sw x4, 24(x3)
li x4, 3
sw x4, 28(x3)
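li x5, 4 // outer counter: comparisons in the first pass (n - 1 = 4)
4:
li x6, 0 // j = 0 (inner loop index)
addi x4, x3, 12 // x4 = x3 + 12, address of the first element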
bge x6, x5, 1f
3:
lw x7, 0(x4) //first element
lw x8, 4(x4) //second element
addi x6, x6,1
bge x8, x7, 2f
sw x8, 0(x4)
sw x7, 4(x4)
2:
addi x4, x4, 4 //next element address
blt x6, x5, 3b
1:
addi x5, x5, -1
bnez x5, 4b
lw x6, 28(x3)
lw x5, 24(x3)
lw x4, 20(x3)
lw x7, 16(x3)
lw x8, 12(x3)
sw x6, 0(x2)
sw x5, 4(x2)
sw x4, 8(x2)
sw x7, 12(x2)
sw x8, 16(x2)
addi x3, x3, 32
# ---------------------------------------------------------------------------------------------
# HALT
RV_COMPLIANCE_HALT
RV_COMPLIANCE_CODE_END
# Input data section.
.data
test_A1_data:
.word 0
test_A2_data:
.word 1
test_A3_data:
.word -1
test_A4_data:
.word 0x7FFFFFFF
test_A5_data:
.word 0x80000000
test_B_data:
.word 0x0000ABCD
test_C_data:
.word 0x12345678
test_D_data:
.word 0xFEDCBA98
test_E_data:
.word 0x36925814
# Output data section.
RV_COMPLIANCE_DATA_BEGIN
test_A1_res:
.fill 6, 4, -1
RV_COMPLIANCE_DATA_END
```
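To know which golden values to expect, the loop structure of the assembly can be mirrored in a few lines of C++. This is only a software model written for this report, not part of the test suite. `bubble_sort_signature` is a hypothetical name, and the comments map each variable to the register it stands for.

```cpp
#include <array>
#include <utility>

// Software model of the loops in Bubble_sort.S.
// Returns the five words written to test_A1_res, in store order.
inline std::array<int, 5> bubble_sort_signature() {
    std::array<int, 5> a{{2, 1, 5, 4, 3}};      // values stored at 12(x3)..28(x3)
    for (int limit = 4; limit != 0; --limit) {  // x5 counts 4, 3, 2, 1
        for (int j = 0; j < limit; ++j) {       // x6 is the inner index
            if (a[j + 1] < a[j]) {              // `bge x8, x7, 2f` skips the swap
                std::swap(a[j], a[j + 1]);
            }
        }
    }
    // The assembly reads back from 28(x3) down to 12(x3),
    // so the signature holds the sorted array in reverse order.
    std::array<int, 5> signature{{a[4], a[3], a[2], a[1], a[0]}};
    return signature;
}
```

The array is sorted in ascending order in memory, while the signature stores it from the highest address down, which is why the golden data has to be written largest first.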
### Result: run on the Reindeer Simulator
We have to write our own golden data into the reference file (`Bubble_sort.reference_output`).
```cpp=
=============================================================
=== PulseRain Technology, RISC-V RV32IM Test Bench
=============================================================
elf file : ../compliance/Bubble_sort.elf
reference : ../compliance/references/Bubble_sort.reference_output
start address = 0x80000000
begin signature address = 0x80002030
end signature address = 0x80002050
```
Instruction section:
```cpp=
Loading section .text.init ... 1c4 bytes, LMA = 0x80000000
80000000 04c0006f
80000004 34202f73
80000008 00800f93
8000000c 03ff0a63
80000010 00900f93
80000014 03ff0663
80000018 00b00f93
8000001c 03ff0263
80000020 80000f17
80000024 fe0f0f13
80000028 000f0463
8000002c 000f0067
80000030 34202f73
80000034 000f5463
80000038 0040006f
8000003c 5391e193
80000040 00001f17
80000044 fc3f2023
.
.
.
.
```
Initial global data section:
```cpp=
Loading section .data ... 516 bytes, LMA = 0x80002000
80002000 00000000
80002004 00000001
80002008 ffffffff
8000200c 7fffffff
80002010 80000000
80002014 0000abcd
80002018 12345678
8000201c fedcba98
80002020 36925814
80002024 00000000
80002028 00000000
8000202c 00000000
80002030 ffffffff
80002034 ffffffff
80002038 ffffffff
8000203c ffffffff
80002040 ffffffff
```
Check that the sorting result is correct:
```cpp=
========> Matching signature ...
80002030 00000005 PASS
80002034 00000004 PASS
80002038 00000003 PASS
8000203c 00000002 PASS
80002040 00000001 PASS
======> Signature ALL MATCH!!!
=============================================================
Simulation exit ../compliance/Bubble_sort.elf
Wave trace Bubble_sort.vcd
=============================================================
```
## Explain how your program runs in Reindeer Simulation
### Check the hierarchy of the HDL code
```cpp=
HDL
|-- Reindeer.v |-- PLL (clock control)
|-- MCU (PulseRain_RV2T_MCU)
|-- OCD (debugger)
|-- MCU |-- UART_TX.v
|-- port_ram.v
|-- PulseRain_processor_core |-- RV2T_controller
|-- RV2T_CSR
|-- RV2T_data_access
|-- RV2T_execution
|-- RV2T_fetch_inst
|-- RV2T_inst_decode
|-- RV2T_machine_timer
|-- RV2T_mm_reg
|-- RV2T_reg_file
|-- OCD |-- debug_coprocessor_wrapper|-- debug_coprocessor
|-- debug_reply
|-- debug_UART
```
### Explain how an instruction runs in Reindeer Simulation
I chose a load instruction to illustrate the result because a single load shows every step: fetch, decode, register read, execution, memory read, and register write-back.
We focus on PC = 0x80000278, instruction = 0x0000ab83. It computes the address as ra + 0. Here ra holds 0x80002010, set by the instruction at PC = 0x8000026c, so we expect the lw instruction to read memory at address 0x80002010 and finally write the data to register s7.
```cpp=
80000268: 00002097 auipc ra,0x2
8000026c: da808093 addi ra,ra,-600 # 80002010 <test_A5_data>
80000270: 00002117 auipc sp,0x2
80000274: e2010113 addi sp,sp,-480 # 80002090 <test_A5_res>
80000278: 0000ab83 lw s7,0(ra)
8000027c: 00000c13 li s8,0
80000280: 00100c93 li s9,1
80000284: fff00d13 li s10,-1
80002010 <test_A5_data>:
80002010: 0000 unimp
80002012: 8000 0x8000
```
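The value of ra comes from the `auipc`/`addi` pair in the listing above. A short C++ sketch of that address arithmetic is given here. `auipc` is a hypothetical helper that models the instruction, not code from Reindeer.

```cpp
#include <cstdint>

// auipc rd, imm20: rd = pc + (imm20 << 12)
constexpr std::uint32_t auipc(std::uint32_t pc, std::uint32_t imm20) {
    return pc + (imm20 << 12);
}

// 80000268: auipc ra,0x2      -> ra = 0x80002268
// 8000026c: addi  ra,ra,-600  -> ra = 0x80002010 (test_A5_data)
constexpr std::uint32_t ra_after_auipc = auipc(0x80000268u, 0x2u);
constexpr std::uint32_t ra_after_addi =
    ra_after_auipc + static_cast<std::uint32_t>(-600);

static_assert(ra_after_auipc == 0x80002268u, "pc plus upper immediate");
static_assert(ra_after_addi == 0x80002010u, "address of test_A5_data");
```

The same calculation with PC = 0x80000270 and an offset of -480 gives 0x80002090 for sp, matching `test_A5_res` in the listing.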
The waveform shows the progress of the lw instruction.
* In the fetch stage:
PC is 0x80000278, so the fetched instruction is 0x0000ab83.
* In the decode stage:
rs1 = 1 because ra is register x1, and rd is 0x17 because s7 is register x23.
* In the execute stage:
mem_access_addr is 0x8000_2010, which is ra + 0.
* In the memory stage:
word_out is 0x8000_0000 because memory[ra] initially holds test_A5_data, which is 0x8000_0000, so the memory stage reads this value out.
* In the write-back stage:
write_addr = 0x17 and write_data_in = 0x8000_0000. In the next cycle, mem[23] in the register file holds 0x8000_0000.
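The rs1 and rd values seen in the decode stage can be checked by splitting the instruction word into its I-type fields. The following is a minimal C++ sketch of that decoding based on the RISC-V base encoding. `ITypeFields` and `decode_itype` are hypothetical names and are not taken from the Reindeer source.

```cpp
#include <cstdint>

// Fields of an I-type instruction (RISC-V base ISA encoding).
struct ITypeFields {
    std::uint32_t opcode;  // bits [6:0]
    std::uint32_t rd;      // bits [11:7]
    std::uint32_t funct3;  // bits [14:12]
    std::uint32_t rs1;     // bits [19:15]
    std::int32_t imm;      // bits [31:20], sign-extended
};

inline ITypeFields decode_itype(std::uint32_t inst) {
    ITypeFields f;
    f.opcode = inst & 0x7Fu;
    f.rd = (inst >> 7) & 0x1Fu;
    f.funct3 = (inst >> 12) & 0x7u;
    f.rs1 = (inst >> 15) & 0x1Fu;
    // Sign-extend the 12-bit immediate without relying on signed shifts.
    const std::uint32_t raw = inst >> 20;
    f.imm = ((raw & 0x800u) != 0u) ? static_cast<std::int32_t>(raw) - 0x1000
                                   : static_cast<std::int32_t>(raw);
    return f;
}
```

For `decode_itype(0x0000ab83)` the fields are opcode 0x03 (LOAD), funct3 2 (lw), rd 23 (s7), rs1 1 (ra), and imm 0, which agrees with the waveform.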

## Explain how Reindeer works with Verilator
1. Reset the CPU, put it into hold state
```cpp=
uut->reset_n = 0; // Set some inputs
uut->sync_reset = 0;
std::cout << "\n=============> reset..." << "\n";
uut->sync_reset = 0;
uut->ocd_reg_we = 0;
uut->ocd_reg_read_addr = 0;
uut->ocd_reg_write_addr = 0;
uut->ocd_reg_write_data = 0;
uut->ocd_read_enable = 0;
uut->ocd_write_enable = 0;
uut->ocd_rw_addr = 0;
uut->ocd_write_word = 0;
uut->start = 0;
uut->start_address = start_address;
t.reset();
```
2. Call upon toolchain to extract code/data from the .elf file for the test case
```cpp=
std::cout << "=============> load elf file..." << "\n";
load_elf_sections(&t, uut);
t.run();
```
3. Start the CPU, run for 2000 clock cycles
```cpp=
std::cout << "\n=============> start running ..." << "\n";
uut->start = 1;
t.run();
uut->start = 0;
t.run();
```
4. Reset the CPU, put it into hold state for the second time
```cpp=
t.reset();
```
5. Read the data out of the memory, and compare them against the reference signature
```cpp=
if (sig_list.size()) {
std::cout << "\n========> Matching signature ...\n\n";
if (uut_memory_peek (&t, uut, begin_signature_addr, sig_list.size() * 4, true)) {
std::cout << "\n======> Signature MISMATCH!!!\n";
ret = -1;
} else {
std::cout << "\n======> Signature ALL MATCH!!!\n";
}
}
```
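The comparison in step 5 boils down to checking each signature word against the reference. The sketch below shows that idea in plain C++ on two vectors. It is not the `uut_memory_peek` implementation from Reindeer, and `count_signature_mismatches` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Compare signature words read from memory with the reference words.
// Returns the number of mismatching words (0 means the signature matches).
inline std::size_t count_signature_mismatches(
        const std::vector<std::uint32_t>& memory_words,
        const std::vector<std::uint32_t>& reference_words) {
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < reference_words.size(); ++i) {
        // A reference word with no memory word to compare counts as a mismatch.
        if (i >= memory_words.size() || memory_words[i] != reference_words[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Only as many words as the reference contains are compared, so extra words in the signature region beyond the reference are ignored in this model.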
## How the simulation handles bootstrapping
We do not need a ROM or flash to store boot code. Instead, the host PC transmits the data to the debug module (OCD). The OCD writes the data into the CPU's local memory and starts the CPU. When the program finishes, the OCD drives a signal to the TX mux module so that the CPU's data can be sent to the host PC through TX_CPU. In the end we can see the result on the PC.
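The load-then-start order can be pictured with a toy software model. Everything in this C++ sketch is an assumption made for illustration: `ToyCore`, `ocd_write_word`, and `bootstrap` are hypothetical names, and the rule that writes are rejected while the core runs is a simplification of this model, not a documented Reindeer behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Toy stand-in for the CPU's local memory plus its hold/run state.
struct ToyCore {
    std::map<std::uint32_t, std::uint32_t> memory;  // byte address -> word
    bool running = false;
    std::uint32_t pc = 0;
};

// Debug-side write; in this model it is only accepted while the core is held.
inline bool ocd_write_word(ToyCore& core, std::uint32_t addr, std::uint32_t word) {
    if (core.running) {
        return false;
    }
    core.memory[addr] = word;
    return true;
}

// Load an image word by word, then release the core at start_address.
inline bool bootstrap(ToyCore& core, std::uint32_t base,
                      const std::vector<std::uint32_t>& image,
                      std::uint32_t start_address) {
    for (std::size_t i = 0; i < image.size(); ++i) {
        const std::uint32_t addr = base + 4u * static_cast<std::uint32_t>(i);
        if (!ocd_write_word(core, addr, image[i])) {
            return false;
        }
    }
    core.pc = start_address;
    core.running = true;
    return true;
}
```

The point of the model is the ordering: the image is placed in memory through the debug path first, and only then is the core released at the start address, so no boot ROM is involved.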
