# Assignment3: SoftCPU
contributed by < `WeiCheng14159` >
###### tags: `computer architure 2020`
## Introduction
### How do compliance tests work?
RISC-V compliance tests are a set of tests that check whether a claimed RISC-V implementation meets the basic standard and interoperates with other RISC-V implementations in the ecosystem.
The [riscv-compliance/doc/README.adoc](https://github.com/riscv/riscv-compliance/blob/master/doc/README.adoc) describes the compliance tests in more detail:
> At the heart of the testing infrastructure is the detailed compliance test. This is the RISC-V assembler code that is executed on the processor and that provides results in a **defined memory area (the signature)**. The test should only use **the minimum of instructions** and only those absolutely necessary. It should only use instructions and registers from the ISA instruction set on which it is targeted.
#### What is a signature?
==A **signature** is a defined memory area where the results of a test suite are stored.==
#### How to run the RISC-V compliance tests?
According to the [README.md](https://github.com/riscv/riscv-compliance) in the [riscv-compliance](https://github.com/riscv/riscv-compliance) repo, these are the requirements for running the RISC-V compliance tests:
- Specify the toolchain in the [Makefile](https://github.com/riscv/riscv-compliance/blob/master/Makefile)
  - `export RISCV_PREFIX ?= riscv64-unknown-elf-`
  - `export RISCV_TARGET_FLAGS ?=`
  - `export RISCV_ASSERT ?= 0`
- Specify the target device in the [Makefile](https://github.com/riscv/riscv-compliance/blob/master/Makefile)
  - `export RISCV_TARGET ?= riscvOVPsim`
  - `export RISCV_DEVICE ?= rv32i`
- Set the ISA of the target device
- Install the `riscvOVPsim` simulator
  - `riscvOVPsim` is the simulator used here. It has to be installed alongside the `riscv-compliance` directory.
  - Install `riscvOVPsim` with `git clone https://github.com/riscv-ovpsim/imperas-riscv-tests.git`
  - Also, the environment variable `TARGET_SIM` should point to the executable `/riscv-ovpsim/bin/Linux64/riscvOVPsim.exe`
The resulting modified Makefile can be found [here](https://github.com/WeiCheng14159/riscv-compliance/blob/rv32i-arch/Makefile).
There are currently 48 test suites in the repository. Run all of them with the following command:
```bash=
export TARGET_SIM=/path/to/imperas-riscv-tests/riscv-ovpsim/bin/Linux64/riscvOVPsim.exe
make simulate verify
```
The result will be `OK: 48/48 RISCV_TARGET=riscvOVPsim RISCV_DEVICE=rv32i RISCV_ISA=`
#### How to write your own RISC-V compliance test?
- Learn from examples by looking into a list of test suites for `rv32i` architecture in the [riscv-compliance/riscv-test-suite/rv32i/src/](https://github.com/riscv/riscv-compliance/tree/master/riscv-test-suite/rv32i/src) directory.
- A list of available test macros can be found in [compliance_io.h](https://github.com/riscv/riscv-compliance/blob/master/riscv-target/riscvOVPsim/compliance_io.h) and [compliance_test.h](https://github.com/riscv/riscv-compliance/blob/master/riscv-target/riscvOVPsim/compliance_test.h)
- Write your own test suite (`.S` file), and place it in `riscv-compliance/riscv-test-suite/rv32i/src/` directory
- Modify the [Makefrag](https://github.com/riscv/riscv-compliance/blob/master/riscv-test-suite/rv32i/Makefrag) file to add your test suite
- Write the reference output file for your test suite and place it in [riscv-compliance/riscv-test-suite/rv32i/references](https://github.com/riscv/riscv-compliance/tree/master/riscv-test-suite/rv32i/references) directory
- Run `make simulate verify` to check the result of your test suite. The output files can be found in the `riscv-compliance/work` directory
## Choose an assembly program as the new test suite
The assembly code `Count Leading Zero` from [Assignment 1](https://hackmd.io/S7_Jr4AiRZelkXA-rH-ILQ?view) is chosen for our new test suite.
### Count Leading Zero assembly code (from hw1)
:::spoiler Assembly Code
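```c=1
clz:
    lw   x5, mask            # x5 = bit mask, loaded from the `mask` word
    li   x6, 32              # x6 = number of bits left to test
    li   x7, 0               # x7 = leading-zero count
_for:
    bne  x6, zero, _count    # keep scanning while bits remain
_return:
    mv   x10, x7             # return the count in a0 (x10)
    jr   ra
_count:
    addi x6, x6, -1
    and  x28, x10, x5        # test the current bit of the input
    bne  x28, zero, _return  # first set bit found: stop counting
    addi x7, x7, 1           # one more leading zero
    srli x5, x5, 1           # move the mask one bit to the right
    j    _for
```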
:::
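
To sanity-check expected results before writing a reference signature, the loop above can be mirrored in a few lines of Python. This is only a sketch of a software reference model, not part of the test suite; the function name `clz` and the constant `MASK_INIT` are chosen here for illustration, and the initial mask value is taken from the `mask` word in the test file.

```python
MASK_INIT = 0x80000000  # same initial value as the `mask` word in the test file


def clz(x: int) -> int:
    """Count leading zeros of a 32-bit value, mirroring the assembly loop."""
    x &= 0xFFFFFFFF
    mask = MASK_INIT   # x5: bit currently being tested
    remaining = 32     # x6: bits left to test
    count = 0          # x7: leading zeros seen so far
    while remaining != 0:      # bne x6, zero, _count
        remaining -= 1         # addi x6, x6, -1
        if x & mask:           # bne x28, zero, _return
            break
        count += 1             # addi x7, x7, 1
        mask >>= 1             # srli x5, x5, 1
    return count


if __name__ == "__main__":
    for value in (0xFFFFFFFF, 0x00000001, 0x00000000):
        print(f"{value:08x} -> {clz(value)}")
```

An input of zero never hits the early exit, so the model returns 32 once all bits have been tested, just as the assembly does.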
### Count Leading Zero test suite
#### Test case itself
- This is the `I-CLZ-01.S` Count Leading Zero test suite.
- I created 9 test cases in total. For simplicity, only the first test case is shown here. For all the test cases, see [here](https://github.com/WeiCheng14159/riscv-compliance/blob/rv32i-arch/riscv-test-suite/rv32i/src/I-CLZ-01.S)
:::spoiler Code
```c=1
#include "compliance_test.h"
#include "compliance_io.h"
#include "test_macros.h"
# Test Virtual Machine (TVM) used by program.
RV_COMPLIANCE_RV32M
# Test code region.
RV_COMPLIANCE_CODE_BEGIN
RVTEST_IO_INIT
RVTEST_IO_ASSERT_GPR_EQ(x31, x0, 0x00000000)
RVTEST_IO_WRITE_STR(x31, "# Test Begin\n")
# ---------------------------------------------------------------------------------------------
RVTEST_IO_WRITE_STR(x31, "# Test part 1\n");
# Addresses for test data and results
la x1, test_1_data
la x2, test_1_res
# Load testdata
lw x10, 0(x1)
# Register initialization
# Test
sw x10, 0(x2)
jal clz
# Store results
sw x10, 4(x2)
//
// Assert
//
RVTEST_IO_CHECK()
RVTEST_IO_ASSERT_GPR_EQ(x2, x10, 0x00000000)
RVTEST_IO_WRITE_STR(x31, "# Test part 1 - Complete\n");
...
RV_COMPLIANCE_HALT
# ---------------------------------------------------------------------------------------------
# HALT
# Count Leading Zero program
# x5 -> t0
# x6 -> t1
# x7 -> t2
# x28 -> t3
# x10 -> a0
clz:
lw x5, mask
li x6, 32
li x7, 0
_for:
bne x6, zero, _count
_return:
mv x10, x7
jr ra
_count:
addi x6, x6, -1
and x28, x10, x5
bne x28, zero, _return
addi x7, x7, 1
srli x5, x5, 1
j _for
RV_COMPLIANCE_CODE_END
# Input data section.
.data
mask:
.word 0x80000000
test_1_data:
.word 0xFFFFFFFF
... skip ...
# Output data section.
RV_COMPLIANCE_DATA_BEGIN
test_1_res:
.fill 2, 4, -1
... skip ...
RV_COMPLIANCE_DATA_END
```
:::
#### Signature
- This is the signature I created: [I-CLZ-01.reference_output](https://github.com/WeiCheng14159/riscv-compliance/blob/rv32i-arch/riscv-test-suite/rv32i/references/I-CLZ-01.reference_output)
- There are 9 test cases in total. Each test case writes 2 words to memory. For simplicity, only the results of the first two test cases are shown
  - 1st word: input argument
  - 2nd word: result
- The reference output signature will be compared with the output of the simulator
```c=1
ffffffff // input 1
00000000 // result 1
00000001 // input 2
0000001f // result 2
... skip...
```
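
The two-words-per-test layout can be sketched in Python as follows. This is an illustrative model only: the helper names `clz32`, `signature_lines` and `signatures_match` are hypothetical, and the sketch assumes that each signature line is one 32-bit word printed as 8 lowercase hex digits, as in the excerpt above.

```python
def clz32(x: int) -> int:
    """Leading zeros of a 32-bit value (32 for an input of zero)."""
    return 32 - (x & 0xFFFFFFFF).bit_length()


def signature_lines(inputs):
    """Emit two words per test case: the input, then the CLZ result."""
    lines = []
    for value in inputs:
        lines.append(f"{value & 0xFFFFFFFF:08x}")
        lines.append(f"{clz32(value):08x}")
    return lines


def signatures_match(reference, actual):
    """Compare two signatures line by line, ignoring case and blank lines."""
    def normalize(lines):
        return [line.strip().lower() for line in lines if line.strip()]
    return normalize(reference) == normalize(actual)


if __name__ == "__main__":
    # Only the two inputs visible in the excerpt above are used here.
    print("\n".join(signature_lines([0xFFFFFFFF, 0x00000001])))
```

Generating the reference from an independent model like this helps catch mistakes in a hand-written reference file before the simulator output is compared against it.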
## Waveform with GTKWave
### Steps
- Copy the generated files (.elf, .objdump, .signature.output) from the `riscv-compliance/work` directory to the `Reindeer/sim/compliance` directory to run our test suite on the simulated Reindeer CPU
- Run command `make test I-CLZ-01` to start the simulation
```c=1
=============================================================
Simulation exit ../compliance/I-CLZ-01.elf
Wave trace I-CLZ-01.vcd
=============================================================
====> Test PASSED, Total of 1 case(s)
```
- An `I-CLZ-01.vcd` file will be generated. Use GTKWave to inspect the waveform
### Waveform analysis
- According to `I-CLZ-01.elf.objdump`, our first test case starts at address `0x80000110`
```c=1
<begin_signature>
80000110: 0000a503 lw a0,0(ra)
80000114: 00a12023 sw a0,0(sp)
80000118: 114000ef jal ra,8000022c <clz>
8000011c: 00a12223 sw a0,4(sp)
80000120: 00002097 auipc ra,0x2
... skip ...
```
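
When matching the waveform against the objdump, it helps to confirm by hand where the `jal` at `0x80000118` lands. The sketch below decodes the J-type immediate of the word `0x114000ef` following the standard RV32I JAL encoding; the helper names `sign_extend` and `decode_jal` are made up for this example.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_jal(word: int):
    """Return (rd, offset) for a JAL instruction word."""
    if word & 0x7F != 0x6F:
        raise ValueError("not a JAL instruction")
    rd = (word >> 7) & 0x1F
    imm = (
        (((word >> 31) & 0x1) << 20)     # imm[20]
        | (((word >> 12) & 0xFF) << 12)  # imm[19:12]
        | (((word >> 20) & 0x1) << 11)   # imm[11]
        | (((word >> 21) & 0x3FF) << 1)  # imm[10:1]
    )
    return rd, sign_extend(imm, 21)


if __name__ == "__main__":
    pc = 0x80000118
    rd, offset = decode_jal(0x114000EF)
    print(f"rd=x{rd}, target=0x{pc + offset:08x}")
```

The decoded destination register is `x1` (`ra`) and the target is `0x8000022c`, which agrees with the `<clz>` address shown in the objdump line.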
## How does Reindeer work with Verilator?
Reindeer is a soft RISC-V core written in Verilog, and Verilator is a compiler that translates Verilog HDL into C++ for simulation. Our test suite (the compiled assembly code) is copied into the Reindeer simulation directory and executed on the simulated Reindeer hardware.
## What is the 2x2 pipeline and what are its benefits?

- A single-port memory is used. The 2x2 pipeline interleaves instruction reads (Instruction Fetch) with data accesses (Reg and Mem Access), so a single port is sufficient for the design. The main purpose of this design is to avoid structural hazards.
- In the waveform, memory reads and memory writes are interleaved (reads in even cycles, writes in odd cycles)
  - `PC_out` is the current PC signal
  - `read_mem_addr[31:0]` is the address read by the IF unit
  - `data_write_word[31:0]` is the data word written by the MEM unit
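
The interleaving can be illustrated with a toy schedule. This is a sketch under stated assumptions, not a model of the actual RTL: it assumes four stages named IF, ID, IE and MEM, with IF and IE active in even cycles and ID and MEM active in odd cycles, and that only IF and MEM use the memory port. The stage names and the function names are chosen here for illustration.

```python
MEMORY_USERS = {"IF", "MEM"}  # assumption: only these stages touch the memory port


def active_stages(cycle: int):
    """Stages assumed active in a given clock cycle of the 2x2 pipeline."""
    return ("IF", "IE") if cycle % 2 == 0 else ("ID", "MEM")


def port_requests(cycle: int):
    """Stages that want the single memory port in this cycle."""
    return [stage for stage in active_stages(cycle) if stage in MEMORY_USERS]


if __name__ == "__main__":
    for cycle in range(6):
        print(cycle, active_stages(cycle), "port ->", port_requests(cycle))
```

Under these assumptions at most one stage requests the port in any cycle, which is why a single-port memory does not cause a structural hazard.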
## Hardware architecture of the Reindeer CPU and OCD
### FSM inside Reindeer CPU
The following diagram shows the FSM on the Reindeer CPU side.
```graphviz
digraph reindeer_fsm{
graph [fontname=Arial];
node [shape=record,style=filled,
fillcolor=aquamarine,fontsize=20.0];
edge [fontcolor=red, fontsize=20.0];
// nodes
init [label="S_INIT"];
init_wait_1 [label="S_INIT_WAIT1"];
fetch [label="S_FETCH"];
decode [label="S_DECODE"];
fetch_exec [label="S_FETCH_EXE"];
except [label="S_EXCEPTION"];
except_reinit [label="S_EXCEPTION_REINIT"];
decode_data [label="S_DECODE_DATA"];
wfi [label="S_WFI"];
load [label="S_LOAD"];
load_wait [label="S_LOAD_WAIT"];
mul_div [label="S_MUL_DIV"];
wfi_wait [label="S_WFI_WAIT"];
store [label="S_STORE"];
store_wait [label="S_STORE_WAIT"];
// edges
init->init_wait_1 [label="start=1"];
init->init [label="start=0"]
init_wait_1->fetch->decode->fetch_exec;
fetch_exec->except[label="timer & interrupt & ecall"];
fetch_exec->except[label="branch & branch_addr[1]"];
fetch_exec->except[label="except & data_acc_enb"];
fetch_exec->init_wait_1[label="branch | mret_active"];
fetch_exec->decode_data;
decode_data->wfi[label=" decode_ctl_WFI=1 "];
decode_data->store[label=" decode_ctl_STORE=1 "];
decode_data->load[label=" decode_ctl_LOAD=1 "];
decode_data->mul_div[label=" MUL_DIV_FUNCT3 "];
decode_data->fetch_exec;
wfi->wfi_wait;
wfi_wait->except[label=" timer & interrupt "];
wfi_wait->wfi_wait;
store->store_wait;
store_wait->except[label="misalign"];
store_wait->fetch_exec[label="store_done=1"];
store_wait->store_wait;
load->load_wait;
load_wait->except[label=" misalign "];
load_wait->fetch_exec[label="load_done=1"];
load_wait->load_wait;
except->except_reinit->init_wait_1;
mul_div->mul_div[label="done=0"];
mul_div->fetch_exec[label="done=1"];
}
```
### FSM inside OCD
The following diagram shows the FSM in OCD.
```graphviz
digraph ocd_fsm{
graph [fontname=Arial];
node [shape=record,style=filled,
fillcolor=aquamarine,fontsize=20.0];
edge [fontcolor=red, fontsize=20.0];
// nodes
idle [label="S_IDLE"];
sync_1 [label="S_SYNC_1"];
sync_0 [label="S_SYNC_0"];
input [label="S_INPUT_WAIT"];
frame [label="S_FRAME_TYPE"];
crc [label="S_CRC"];
ext_crc [label="S_EXT_CRC"];
wr_ext [label="S_WR_EXT"];
wr_ack [label="S_WR_ACK"];
read_wait [label="S_PRAM_READ_WAIT"];
cpu_ack [label="S_CPU_STATUS_ACK"];
wait_done [label="S_WAIT_DONE"];
// edges
idle->sync_1 [label="new_data_in"];
idle->idle;
sync_1->sync_1;
sync_1->sync_0[label="new_data_in"];
sync_1->idle;
sync_0->input[label="new_data_in"];
sync_0->idle;
sync_0->sync_0;
input->frame[label="input_counter=??"];
input->input;
frame->frame;
frame->crc[label="crc_en"];
crc->wr_ext[label=" WRITE_128_BYTES_WITH_ACK "];
crc->idle[label=" WRITE_4_BYTES_WITHOUT_ACK "];
crc->wr_ack[label=" WRITE_4_BYTES_WITH_ACK "];
crc->idle[label=" WRITE_4_BYTES_WITHOUT_ACK "];
crc->read_wait[label=" READ_4_BYTES "];
crc->wr_ack[label=" CPU_RESET_WITH_ACK "];
crc->wr_ack[label=" RUN_PULSE_WITH_ACK "];
crc->cpu_ack[label=" READ_CPU_STATUS "];
crc->wr_ack[label=" COUNTER_CONFIG "];
crc->idle[label="UART_SEL"];
crc->idle;
wr_ack->wait_done;
cpu_ack->wait_done;
read_wait->read_wait[label="!pram_read_enable_in"];
read_wait->wait_done;
wait_done->idle[label="reply_done"];
wait_done->wait_done;
wr_ext->wr_ext;
wr_ext->ext_crc[label="input_counter"];
ext_crc->idle[label="crc_out"];
}
```
## What is "Hold and Load"?
- Figure: overview of Hold and Load
### Reindeer CPU side
The FSM in the Reindeer controller [RV2T_controller.v](https://github.com/PulseRain/Reindeer/blob/master/submodules/PulseRain_MCU/PulseRain_processor_core/source/RV2T_controller.v) (~600 lines of Verilog) shows how the soft CPU interacts with the OCD coprocessor.
```verilog=448
current_state[S_INIT]: begin
    ctl_paused = 1'b1;
    if (start) begin
        ctl_pc_init = 1'b1;
        next_state [S_INIT_WAIT1] = 1'b1;
    end else begin
        next_state [S_INIT] = 1'b1;
    end
end
```
The Reindeer CPU switches to the `S_INIT_WAIT1` state and sets `ctl_pc_init` to `1'b1` only when the `start` signal is high. Otherwise, it stays in the `S_INIT` state, where `ctl_paused` is asserted.
```verilog=348
ctl_pc_init : begin
    fetch_start_addr <= {start_addr [`PC_BITWIDTH - 1 : 1], 1'b0};
end
```
The CPU sets `fetch_start_addr` to `start_addr` with bit 0 forced to zero when `ctl_pc_init` is `1'b1`. After this event, the CPU starts running from the IF stage.
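
The CPU-side behavior in the two excerpts above can be condensed into a small Python model. It is a sketch only: it covers nothing beyond the `S_INIT` state, the function name `init_step` is hypothetical, and `PC_BITWIDTH = 32` is an assumption, since the actual macro value is not shown here.

```python
PC_BITWIDTH = 32                      # assumption: the real macro value is not shown
PC_MASK = (1 << PC_BITWIDTH) - 1


def init_step(state: str, start: bool, start_addr: int):
    """One evaluation of the S_INIT state.

    Returns (next_state, fetch_start_addr); the address is None while held.
    """
    if state != "S_INIT":
        raise ValueError("only S_INIT is modeled in this sketch")
    if start:
        # {start_addr[PC_BITWIDTH-1:1], 1'b0}: keep the upper bits, clear bit 0
        return "S_INIT_WAIT1", (start_addr & PC_MASK) & ~1
    return "S_INIT", None


if __name__ == "__main__":
    print(init_step("S_INIT", False, 0x80000000))  # held: no fetch address yet
    print(init_step("S_INIT", True, 0x80000001))   # released: bit 0 is cleared
```

Clearing bit 0 means the fetch address is always at least 2-byte aligned, whatever value arrives on `start_addr`.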
### OCD side
The OCD waits for a signal before transferring memory contents to the Reindeer CPU. Initially, the OCD is in the `S_IDLE` state. It switches to the `S_SYNC_1` state when `enable_in_sr[0]` is set and the incoming data `new_data_in` equals `DEBUG_SYNC_2`.
```verilog=356
current_state[S_IDLE]: begin
    ctl_wr_ext_disable = 1'b1;
    if (enable_in_sr[0] && (new_data_in == `DEBUG_SYNC_2)) begin
        next_state [S_SYNC_1] = 1;
    end else begin
        ctl_crc_sync_reset = 1'b1;
        next_state [S_IDLE] = 1;
    end
end
```
(Some trivial states are skipped for simplicity.)
The OCD switches from `S_INPUT_WAIT` to `S_FRAME_TYPE` **(the frame checking state)** once `input_counter` reaches `DEBUG_FRAME_LENGTH - DEBUG_SYNC_LENGTH - 1`.
```verilog=393
current_state [S_INPUT_WAIT] : begin
    if (input_counter == (`DEBUG_FRAME_LENGTH - `DEBUG_SYNC_LENGTH - 1)) begin
        next_state [S_FRAME_TYPE] = 1;
    end else begin
        next_state [S_INPUT_WAIT] = 1;
    end
end
```
Then the OCD switches from the `S_FRAME_TYPE` state to the `S_CRC` state, where it performs a **CRC16 cyclic redundancy check**.
```verilog=401
current_state [S_FRAME_TYPE] : begin
    ctl_reset_input_counter = 1'b1;
    if (enable_in_sr[0]) begin
        next_state [S_CRC] = 1;
    end else begin
        next_state [S_FRAME_TYPE] = 1;
    end
end
```
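
The three OCD transitions quoted above can be summarized as a next-state function. This is a sketch that covers only the states shown in the excerpts; the skipped sync states and everything after `S_CRC` are not modeled. The class `OcdParams` and the function `next_state` are hypothetical names, and the actual values of `DEBUG_SYNC_2`, `DEBUG_FRAME_LENGTH` and `DEBUG_SYNC_LENGTH` are not given here, so they are passed in as parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcdParams:
    debug_sync_2: int   # stands in for `DEBUG_SYNC_2
    frame_length: int   # stands in for `DEBUG_FRAME_LENGTH
    sync_length: int    # stands in for `DEBUG_SYNC_LENGTH


def next_state(state, params, enable, new_data_in=0, input_counter=0):
    """Next state for the three OCD states shown in the excerpts."""
    if state == "S_IDLE":
        hit = enable and new_data_in == params.debug_sync_2
        return "S_SYNC_1" if hit else "S_IDLE"
    if state == "S_INPUT_WAIT":
        last = params.frame_length - params.sync_length - 1
        return "S_FRAME_TYPE" if input_counter == last else "S_INPUT_WAIT"
    if state == "S_FRAME_TYPE":
        return "S_CRC" if enable else "S_FRAME_TYPE"
    raise ValueError(f"state {state} is not modeled in this sketch")


if __name__ == "__main__":
    # Placeholder values chosen only for this demo, not taken from the RTL.
    demo = OcdParams(debug_sync_2=0x5A, frame_length=12, sync_length=3)
    print(next_state("S_IDLE", demo, enable=True, new_data_in=0x5A))
    print(next_state("S_INPUT_WAIT", demo, enable=True, input_counter=8))
    print(next_state("S_FRAME_TYPE", demo, enable=True))
```

Keeping the constants as parameters avoids baking guessed values into the model; substitute the real macro values from the OCD source when checking against a waveform.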
## How does the simulation bootstrap?
TBD
## Signal and events inside Reindeer
TBD
## Requirements
:::spoiler
1. Following the instructions of [Lab3: Reindeer - RISCV RV32I[M] Soft CPU](https://hackmd.io/@sysprog/rJw2A5DqS), you shall modify the assembly programs used/done with [Assignment1](https://hackmd.io/@sysprog/2020-arch-homework1) or [Assignment2](https://hackmd.io/@sysprog/2020-arch-homework2) as new test case(s) for [Reindeer](https://github.com/PulseRain/Reindeer) Simulation with Verilator.
* [ I-ADD-01](https://github.com/riscv/riscv-compliance/blob/master/riscv-test-suite/rv32i/src/I-ADD-01.S) is a good starting point for writing test cases.
* You have to ensure signature matched with the requirements described in [RISC-V Compliance Tests](https://github.com/riscv/riscv-compliance/blob/master/doc/README.adoc).
2. Check the generated VCD file and use GTKwave to view the waveform. Then, explain how your program is executed along with [Reindeer](https://github.com/PulseRain/Reindeer) Simulation.
3. Write down your thoughts and progress in [HackMD notes](https://hackmd.io/s/features).
* Summarize how [RISC-V Compliance Tests](https://github.com/riscv/riscv-compliance/blob/master/doc/README.adoc) works and why the signature should be matched.
* Explain how [Reindeer](https://github.com/PulseRain/Reindeer) works with [Verilator](https://www.veripool.org/wiki/verilator).
* What is 2 x 2 Pipeline? How can we benefit from such pipeline design?
* What is "Hold and Load"? And how does the simulation bootstrap?
* Can you show some signals/events inside [Reindeer](https://github.com/PulseRain/Reindeer) and describe?
:::