# Reindeer lab3
contributed by < `kksweet8845` >
## Source code taken from [assignment 1](https://hackmd.io/@kksweet8845/lab1r32i)
```cpp
# This example shows an implementation of the
# mathematical combination formula.
.data
argument: .word 7 5
.text
main: # Initialize the register of argument
addi a2, zero, 0x7 # The n of nCk
addi a3, zero, 0x5 # The k of nCk
addi a0, zero, 0 # The return value of nCk
jal comb # Begin the routine of comb
j end # After finished, goto end:
comb:
addi sp, sp, -12 # Create stack
sw ra, 8(sp) # Save return address
sw a2, 4(sp) # Save caller argument $a2(The n of nCk)
sw a3, 0(sp) # Save caller argument $a3(The k of nCk)
beq a2, a3, if # Jump if nCn == 1
beq a3, zero, if # Jump if nC0 == 1
addi a2, a2, -1 # Sub 1 from $a2
addi a3, a3, -1 # Sub 1 from $a3
jal comb # The recursion of (n-1)C(k-1)
lw a3, 0(sp) # Restore $a3 to the original k
jal comb # The recursion of (n-1)C(k)
lw ra, 8(sp) # Restore $ra, $a2, $a3
lw a2, 4(sp) # ''
lw a3, 0(sp) # ''
addi sp, sp, 12 # Pop the stack
jr ra # Return to ${ra}
if:
addi sp, sp, 12 # Pop the stack
addi a0, a0, 1 # Add 1 to ${a0}
jr ra # Return to ${ra}
end:
```
This assembly implements a simple combination function. It computes nCk recursively using Pascal's rule
$$C^{n}_{k} = C^{n-1}_{k-1} + C^{n-1}_{k}$$
with the base cases $C^{n}_{n} = C^{n}_{0} = 1$. The result is accumulated by adding 1 to `a0` at every base case, so the number of recursive calls grows with the value of nCk itself, and the method becomes slow when n and k are large.
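To make the control flow of the assembly easier to follow, here is a minimal Python sketch of the same recursion. It is only a model, not part of the lab code: the names `comb`, `run` and the `state` dictionary are made up for illustration, with `state["a0"]` standing in for register `a0`.

```python
def comb(n, k, state):
    """Model of the assembly routine: every base case adds 1 to a0."""
    state["calls"] += 1
    if n == k or k == 0:              # the two `beq ..., if` branches
        state["a0"] += 1              # addi a0, a0, 1
        return
    comb(n - 1, k - 1, state)         # first `jal comb`:  (n-1)C(k-1)
    comb(n - 1, k, state)             # second `jal comb`: (n-1)C(k)


def run(n, k):
    """Call comb with a0 cleared, as main does, and return the final state."""
    state = {"a0": 0, "calls": 0}
    comb(n, k, state)
    return state


print(run(7, 5))  # {'a0': 21, 'calls': 41}
```

Since every base case adds exactly 1, the call tree has nCk leaves and 2 * nCk - 1 calls in total, which is why the run time grows with the result itself.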
## Transform assembly into ELF file
### From assembly to ELF
According to the [README.md](https://github.com/riscv/riscv-compliance/blob/master/README.md) in [riscv-compliance](https://github.com/riscv/riscv-compliance), the `.S` file first has to be compiled into an `.elf` file.
Quoted from the [README.md](https://github.com/riscv/riscv-compliance/blob/master/README.md):
>The only setup required is to define where the toolchain is found, and where the target / device is found.
For the toolchain, the binaries must be in the search path and the compiler prefix is defined on the make line. The default value for this is
```
RISCV_PREFIX ?= riscv64-unknown-elf-
```
>The path to the RUN_TARGET is defined within the riscv-target Makefile.include.
To run the rv32i test suite on riscvOVPsim
```
make RISCV_TARGET=riscvOVPsim RISCV_DEVICE=rv32i
```
So three variables have to be specified: `RISCV_PREFIX`, `RISCV_TARGET` and `RISCV_DEVICE`.
I set them to match my RISC-V toolchain, that is,
- `riscv-none-embed-` is the prefix of the toolchain I use; it takes the place of `riscv64-unknown-elf-` or `riscv32-unknown-elf-`.
- `riscvOVPsim` is the simulator used in this example.
- `rv32i` is the base 32-bit integer ISA of RISC-V, with no multiply/divide or floating-point extensions.
The following lines are set in `/riscv-test-suite/rv32i/Makefile`. Note that the prefix keeps its trailing hyphen, as in the default value quoted above.
```
RISCV_PREFIX := riscv-none-embed-
RISCV_TARGET := riscvOVPsim
RISCV_DEVICE := rv32i
```
However, two more variables in the rv32i `Makefile` have to be modified: `TARGETDIR` and `ROOTDIR`. `TARGETDIR` is used to locate the `Makefile.include` in
`/riscv-target/riscvOVPsim/device/rv32i/`, and `ROOTDIR` is the root path under which the `work` directory is created; `work` holds the ELF output together with the objdump and signature files.
That is, I need to specify absolute paths to these directories.
```
TARGETDIR:=/home/nober/git/adv_CO/riscv-compliance/riscv-target
ROOTDIR := /home/nober/git/adv_CO/riscv-compliance
```
:::danger
Avoid hardcoded paths such as `home/nober`. Instead, you can set via environment variables.
:notes: jserv
:::
At this point the `.S` file still cannot be compiled into an `.elf` file, because the build fails with an error that the simulator, `riscvOVPsim`, cannot be found.
So, finally, I modified the path of `riscvOVPsim` in `Makefile.include`.
```
TARGET_SIM ?= /home/nober/git/adv_CO/riscv-compliance/riscv-ovpsim/bin/Linux64/riscvOVPsim.exe
```
Running the following command generates the ELF file in the `work` directory.
```
make I-ADD-01.elf
```
The hierarchy of the `work` directory:
```shell
work
├── I-ADD-01.elf
├── I-ADD-01.elf.objdump
└── rv32i
    └── I-ENDIANESS-01.elf.objdump
```
### Generate the ground truth for this test file
To test `Reindeer`, we need another simulator to generate the reference (ground-truth) output; riscvOVPsim is used in this example.
```
%.log: %.elf
$(RUN_TARGET)
```
Simply running `make I-ADD-01.log` generates the `log` file and the `signature` file.
The `log` and `signature` files are shown below, respectively.
```log
Info Session started: Mon Nov 11 14:09:54 2019
Info ------------- ENVIRONMENT -------------
Info --------------------------------------
Info -------- FLAGS (from /home/nober/git/adv_CO/riscv-compliance/riscv-ovpsim/bin/Linux64/riscvOVPsim.exe)
Info --variant RV32I
Info --program /home/nober/git/adv_CO/riscv-compliance/work//I-ADD-01.elf
Info --signaturedump
Info --customcontrol
Info --override riscvOVPsim/cpu/sigdump/SignatureFile=/home/nober/git/adv_CO/riscv-compliance/work//I-ADD-01.signature.output
Info --override riscvOVPsim/cpu/sigdump/ResultReg=3
Info --override riscvOVPsim/cpu/simulateexceptions=T
Info --override riscvOVPsim/cpu/defaultsemihost=F
Info --logfile /home/nober/git/adv_CO/riscv-compliance/work//I-ADD-01.log
Info --override riscvOVPsim/cpu/user_version=2.3
Info --override riscvOVPsim/cpu/priv_version=1.11
Info --------------------------------------
Imperas riscvOVPsim
CpuManagerFixedPlatform (64-Bit) v20190923.0 Open Virtual Platform simulator from www.IMPERAS.com.
Copyright (c) 2005-2019 Imperas Software Ltd. Contains Imperas Proprietary Information.
Licensed Software, All Rights Reserved.
Visit www.IMPERAS.com for multicore debug, verification and analysis solutions.
CpuManagerFixedPlatform started: Mon Nov 11 14:09:54 2019
Info (OR_OF) Target 'riscvOVPsim/cpu' has object file read from '/home/nober/git/adv_CO/riscv-compliance/work//I-ADD-01.elf'
Info (OR_PH) Program Headers:
Info (OR_PH) Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align
Info (OR_PD) LOAD 0x00001000 0x80000000 0x80000000 0x000003c4 0x000003c4 R-E 1000
Info (OR_PD) LOAD 0x00002000 0x80001000 0x80001000 0x00001204 0x00001204 RW- 1000
Info (SIGNATURE_DUMP) Found Symbol 'begin_signature' in application at 0x80002000
Info (SIGNATURE_DUMP) Found Symbol 'end_signature' in application at 0x80002090
Info (SIGNATURE_DUMP) Signature File enabled, file '/home/nober/git/adv_CO/riscv-compliance/work//I-ADD-01.signature.output'.
Info (SIGNATURE_DUMP) Extracting signature from 0x80002000 size 144 bytes
Info (SIGNATURE_DUMP) Symbol 'begin_signature' at 0x80002000
Info (SIGNATURE_DUMP) Symbol 'end_signature' at 0x80002090
Info (SIGNATURE_DUMP) Intercept 'write_tohost'. Generate Signature file
fffff5cbfffffffffffff80200000000
800000000765432000001a3480000000
8000000000001a340765432080000000
00000000fffff802fffffffefffff5cb
fffff5cbfffffffffffff802ffffffff
800000000765432000001a3480000000
8000000000001a340765432080000000
00000000fffff802fffffffefffff5cb
00000000ffffffffffffffffffffffff
Test PASSED
Info
Info ---------------------------------------------------
Info CPU 'riscvOVPsim/cpu' STATISTICS
Info Type : riscv (RV32I)
Info Nominal MIPS : 100
Info Final program counter : 0x80000044
Info Simulated instructions: 212
Info Simulated MIPS : run too short for meaningful result
Info ---------------------------------------------------
Info
Info ---------------------------------------------------
Info SIMULATION TIME STATISTICS
Info Simulated time : 0.00 seconds
Info User time : 0.00 seconds
Info System time : 0.00 seconds
Info Elapsed time : 0.00 seconds
Info ---------------------------------------------------
CpuManagerFixedPlatform finished: Mon Nov 11 14:09:54 2019
CpuManagerFixedPlatform (64-Bit) v20190923.0 Open Virtual Platform simulator from www.IMPERAS.com.
Visit www.IMPERAS.com for multicore debug, verification and analysis solutions.
Info Session ended: Mon Nov 11 14:09:54 2019
```
```shell
00000000
fffff802
00001a34
...
00001a34
80000000
ffffffff
00000000
```
## Rewrite as part of test suite
Before choosing this assembly code as my test file, I had picked the 9x9 multiplication table. However, the test macros are not as high-level as `assert` in C/C++, and it was hard for me to change the address of the array dynamically, so I gave up that option.
The first thing to note is that the test file must not declare any unnecessary section such as `.text`; doing so produces a file with wrong addresses.
```cpp
#include "riscv_test_macros.h"
#include "compliance_test.h"
#include "compliance_io.h"
RV_COMPLIANCE_RV32M
RV_COMPLIANCE_CODE_BEGIN
RVTEST_IO_INIT
RVTEST_IO_ASSERT_GPR_EQ(x31, x0, 0x00000000)
RVTEST_IO_WRITE_STR(x31, "Test begin Reserved regs ra(x1) a0(x10) t0(x5)\n")
# ----------------------------------------------------------------------------------
initial1:
la x5, test_1_res
li x18, 0x7
li x19, 0x5
li x20, 0
main1:
sw x18, 0(x5)
sw x19, 4(x5)
jal comb
sw x20, 8(x5)
RVTEST_IO_CHECK()
RVTEST_IO_ASSERT_GPR_EQ(x5, x18, 0x7)
RVTEST_IO_ASSERT_GPR_EQ(x5, x19, 0x5)
RVTEST_IO_WRITE_STR(x31, "# Argument test - complete\n")
RVTEST_IO_ASSERT_GPR_EQ(x5, x20, 0x00000015)
RVTEST_IO_WRITE_STR(x31, "# Combination test - complete")
#-----------------------------------------------------------------------------------------
initial2:
la x5, test_2_res
li x18, 0x3
li x19, 0x2
li x20, 0
main2:
sw x18, 0(x5)
sw x19, 4(x5)
jal comb
sw x20, 8(x5)
j end
RVTEST_IO_CHECK()
RVTEST_IO_ASSERT_GPR_EQ(x5, x18, 0x3)
RVTEST_IO_ASSERT_GPR_EQ(x5, x19, 0x2)
RVTEST_IO_WRITE_STR(x31, "# Argument test - complete\n")
RVTEST_IO_ASSERT_GPR_EQ(x5, x20, 0x00000003)
RVTEST_IO_WRITE_STR(x31, "# Combination test - complete")
comb:
addi x2, x2, -12
sw x1, 8(x2)
sw x18, 4(x2)
sw x19, 0(x2)
beq x18, x19, if
beq x19, x0, if
addi x18, x18, -1
addi x19, x19, -1
jal comb
lw x19, 0(x2)
jal comb
lw x1, 8(x2)
lw x18, 4(x2)
lw x19, 0(x2)
addi x2, x2, 12
jr x1
if:
addi x2, x2, 12
addi x20, x20, 1
jr x1
end:
RV_COMPLIANCE_HALT
RV_COMPLIANCE_CODE_END
# Input data section.
.data
# Output data section.
RV_COMPLIANCE_DATA_BEGIN
.align 4
test_1_res:
.fill 3, 4, -1
test_2_res:
.fill 3, 4, -1
RV_COMPLIANCE_DATA_END
```
I use `x18`, `x19` and `x20` as arg1, arg2 and the return value of the combination function, respectively.
The file can then be compiled into ELF format. Before compiling, the test name has to be added to `rv32i_sc_tests` in `Makefrag`.
```
make comb.elf
make comb.log
```
This generates the files in the `work` directory.
```shell
work
├── comb.elf
├── comb.elf.objdump
├── comb.log
├── comb.signature.output
├── I-ADD-01.elf
├── I-ADD-01.elf.objdump
├── I-ADD-01.log
├── I-ADD-01.signature.output
└── rv32i
    └── I-ENDIANESS-01.elf.objdump
```
Anyway, my test is very simple: it just calculates $C^{7}_{5}$ and $C^{3}_{2}$. The signature is as follows.
```shell
00000007 # The arg1 of case 1
00000005 # The arg2 of case 1
00000015 # The return value
00000003 # Case 2 ...
00000002
00000003
00000000
00000000
```
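As a cross-check, the first six signature words can be predicted on the host. The following Python sketch is a helper written for this note, not part of riscv-compliance: `expected_signature` is a hypothetical name, and it only models the three words that each test case stores (arg1, arg2, result), not the trailing zero words of the dump.

```python
from math import comb  # available since Python 3.8


def expected_signature(cases):
    """Return the 32-bit words stored by each test case as 8-digit hex strings."""
    words = []
    for n, k in cases:
        words += [n, k, comb(n, k)]   # sw x18, sw x19, sw x20
    return [f"{w & 0xFFFFFFFF:08x}" for w in words]


for line in expected_signature([(7, 5), (3, 2)]):
    print(line)
```

The printed words can be compared line by line with the beginning of `comb.signature.output`.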
## Generate the vcd file
After the ELF and log files have been generated, we can simply copy them into /sim/compliance/ in [Reindeer]() and type `make test comb`.
```shell
$ make test comb
```
It will generate `comb.vcd`.
Use [gtkwave]() to view `comb.vcd`.
```shell
$ gtkwave comb.vcd
```

::: success
// TODO : Analysis the comb.vcd
Update at 11/21
:::
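Tracing the Reindeer core shows that the register file uses dual-port RAM, so that reading and writing can happen simultaneously. It instantiates two block RAMs: both share the same write port (`write_addr`, `write_data_in`, `write_enable`), and each one serves a single read port, one for rs1 and one for rs2.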
```shell
dual_port_ram #(.ADDR_WIDTH (`REG_ADDR_BITS), .DATA_WIDTH (`XLEN)) single_clk_ram_rs1 (
.waddr (write_addr), /* Synchronize the memory */
.raddr (read_rs1_addr),
.din (write_data_in),
.write_en (write_enable),
.clk (clk),
.dout (read_rs1_data_out_i) );
dual_port_ram #(.ADDR_WIDTH (`REG_ADDR_BITS), .DATA_WIDTH (`XLEN)) single_clk_ram_rs2 (
.waddr (write_addr), /* Synchronize the memory */
.raddr (read_rs2_addr),
.din (write_data_in),
.write_en (write_enable),
.clk (clk),
.dout (read_rs2_data_out_i) );
```
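The idea of mirroring one write port into two RAMs can be sketched behaviourally in Python. This is only a model of what the two instantiations above suggest, not the Reindeer implementation: the class name `MirroredRegFile` is made up, and clock timing, read latency and any special handling of `x0` are not modelled.

```python
class MirroredRegFile:
    """Two RAM copies with a shared write port and one read port each."""

    def __init__(self, num_regs=32):
        self.ram_rs1 = [0] * num_regs   # copy read through the rs1 port
        self.ram_rs2 = [0] * num_regs   # copy read through the rs2 port

    def write(self, addr, data):
        # Both copies see the same waddr / din / write_en, so they stay identical.
        data &= 0xFFFFFFFF
        self.ram_rs1[addr] = data
        self.ram_rs2[addr] = data

    def read(self, rs1_addr, rs2_addr):
        # Each copy serves one read address, giving two reads at once.
        return self.ram_rs1[rs1_addr], self.ram_rs2[rs2_addr]


regs = MirroredRegFile()
regs.write(18, 7)
regs.write(19, 5)
print(regs.read(18, 19))  # (7, 5)
```

The cost of this scheme is that the register contents are stored twice; the benefit is that each RAM needs only one read port.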
Let's take a look at the waveform. I selected `PC_in`, `PC_out` and the registers used by the test. In comb.s, I use `x18`, `x19` and `x20`.
- `PC_in`
- `PC_out`
- `mem(18)` : `x18`, which holds n
- `mem(19)` : `x19`, which holds k
- `mem(20)` : `x20`, which holds the return value
Here is the disassembly of the ELF file.
```shell
Disassembly of section .text.init:
80000000 <_start>:
80000000: 04c0006f j 8000004c <reset_vector>
80000004 <trap_vector>:
80000004: 34202f73 csrr t5,mcause
80000008: 00800f93 li t6,8
8000000c: 03ff0a63 beq t5,t6,80000040 <write_tohost>
80000010: 00900f93 li t6,9
80000014: 03ff0663 beq t5,t6,80000040 <write_tohost>
80000018: 00b00f93 li t6,11
8000001c: 03ff0263 beq t5,t6,80000040 <write_tohost>
80000020: 80000f17 auipc t5,0x80000
80000024: fe0f0f13 addi t5,t5,-32 # 0 <_start-0x80000000>
80000028: 000f0463 beqz t5,80000030 <trap_vector+0x2c>
8000002c: 000f0067 jr t5
80000030: 34202f73 csrr t5,mcause
80000034: 000f5463 bgez t5,8000003c <handle_exception>
80000038: 0040006f j 8000003c <handle_exception>
...
```
`PC_in` and `PC_out` are the program counter signals of the PulseRain MCU. Note, however, that the MCU has several signals named `PC_in` and `PC_out`, one pair for each stage, i.e. IF, ID, EXE, etc.
The first instruction is a jump to `0x8000004c`.
Let's look at the waveform to verify this instruction.

In the EXE stage, `branch_addr` is `0x8000004c`, which is identical to the target in comb.elf.objdump.
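The jump target can also be checked by hand from the encoding. The Python sketch below decodes a `jal` according to the RISC-V J-type immediate layout and applies it to the first instruction of the listing; `decode_jal` is a hypothetical helper written for this note, not a function from any tool used here.

```python
def decode_jal(insn, pc):
    """Return (rd, target) for a 32-bit JAL encoding located at address pc."""
    if insn & 0x7F != 0x6F:
        raise ValueError("not a JAL instruction")
    rd = (insn >> 7) & 0x1F
    imm = (
        (((insn >> 31) & 0x1) << 20)      # imm[20]
        | (((insn >> 12) & 0xFF) << 12)   # imm[19:12]
        | (((insn >> 20) & 0x1) << 11)    # imm[11]
        | (((insn >> 21) & 0x3FF) << 1)   # imm[10:1]
    )
    if imm & (1 << 20):                   # sign-extend the 21-bit offset
        imm -= 1 << 21
    return rd, (pc + imm) & 0xFFFFFFFF


rd, target = decode_jal(0x04C0006F, 0x80000000)
print(rd, hex(target))  # 0 0x8000004c
```

With `rd` equal to `x0` the instruction is the plain `j` pseudo-instruction, and the computed target matches `branch_addr` in the waveform.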
Now, let's skip to the execution of the test code.
```shell
8000011c <main1>:
8000011c: 0122a023 sw s2,0(t0)
80000120: 0132a223 sw s3,4(t0)
80000124: 030000ef jal ra,80000154 <comb>
80000128: 0142a423 sw s4,8(t0)
```
The waveform is shown below.
The first instruction is
`0x8000011c sw s2, 0(t0)`, rs1 = x5, rs2 = x18, offset = 0
As the waveform shows, rs1 is `x5`, rs2 is `x18`, and the immediate offset is zero.
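The same fields can be recovered from the machine code `0122a023` itself. The sketch below decodes a store according to the RISC-V S-type format; `decode_store` is a hypothetical helper for illustration and only handles the store opcode.

```python
def decode_store(insn):
    """Split a 32-bit S-type (store) encoding into its fields."""
    if insn & 0x7F != 0x23:
        raise ValueError("not a store instruction")
    imm = (((insn >> 25) & 0x7F) << 5) | ((insn >> 7) & 0x1F)  # imm[11:5] | imm[4:0]
    if imm & 0x800:                                            # sign-extend 12 bits
        imm -= 0x1000
    return {
        "funct3": (insn >> 12) & 0x7,   # 2 means sw
        "rs1": (insn >> 15) & 0x1F,     # base address register
        "rs2": (insn >> 20) & 0x1F,     # register being stored
        "imm": imm,                     # byte offset
    }


print(decode_store(0x0122A023))
# {'funct3': 2, 'rs1': 5, 'rs2': 18, 'imm': 0}
```

The decoded values agree with the waveform: base register `x5` (`t0`), source register `x18` (`s2`) and offset 0.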

In the first test, n and k are stored in `x18` (7) and `x19` (5), respectively. That is,
$C^{7}_{5} = 21$
```
80000154 <comb>:
80000154: ff410113 addi sp,sp,-12
80000158: 00112423 sw ra,8(sp)
8000015c: 01212223 sw s2,4(sp)
80000160: 01312023 sw s3,0(sp)
80000164: 03390863 beq s2,s3,80000194 <if>
80000168: 02098663 beqz s3,80000194 <if>
8000016c: fff90913 addi s2,s2,-1
80000170: fff98993 addi s3,s3,-1
80000174: fe1ff0ef jal ra,80000154 <comb>
80000178: 00012983 lw s3,0(sp)
8000017c: fd9ff0ef jal ra,80000154 <comb>
80000180: 00812083 lw ra,8(sp)
80000184: 00412903 lw s2,4(sp)
80000188: 00012983 lw s3,0(sp)
8000018c: 00c10113 addi sp,sp,12
80000190: 00008067 ret
```
In my code, the two arguments are saved on the stack by these instructions:
```
8000015c: 01212223 sw s2,4(sp)
80000160: 01312023 sw s3,0(sp)
```
For the instruction at `15c`, `rs2` must be `x18` and `rs2_in` must be 7.
For the instruction at `160`, `rs2` must be `x19` and `rs2_in` must be 5.
- Instruction `15c`

- Instruction `160`

The final answer is in `x20` when `comb` returns to `128`.
```
8000011c <main1>:
8000011c: 0122a023 sw s2,0(t0)
80000120: 0132a223 sw s3,4(t0)
80000124: 030000ef jal ra,80000154 <comb>
80000128: 0142a423 sw s4,8(t0)
```
We can see that the answer is 0x15, which is 21 in decimal.

## How Reindeer works with Verilator
Verilator converts synthesizable Verilog code, plus some synthesis and SystemVerilog constructs and a small subset of Verilog-AMS, into C++ or SystemC code. It is not a traditional simulator but a compiler.
It is a fantastic tool for running Verilog without a commercial application such as ModelSim or Intel Quartus.
The Makefile in `Reindeer/sim/verilator/Makefile` builds the UUT, `VPulseRain_RV2T_MCU`, from the submodules in Reindeer/submodules/PulseRain_MCU and compiles it in `obj_dir`, which is filled with the generated C++ sources of those submodules.
### In `tb_PulseRain_RV2T.cpp`
The test bench, tb_PulseRain_RV2T.cpp, creates a `UUT` object, which is the Verilated Reindeer model, `VPulseRain_RV2T_MCU`.
`tb_PulseRain_RV2T.cpp` then extracts the values it needs from the ELF file.
```cpp
ref_file_process(ref_file);
/* Extract the begin_signature and end_signature */
elf_label_process(elf_file);
prepare_elf_section_list(elf_file);
```
It then resets the UUT input wires with the following code. `testbench` is a class that drives the simulation, given the UUT and `tfp`, a `VerilatedVcdC*`.
```cpp
uut = new UUT; // Create instance
/* Create a VCD dump file in c standalone (no SystemC) simulations */
VerilatedVcdC* tfp = new VerilatedVcdC;
if (!trace_file.empty()) {
Verilated::traceEverOn(true);
uut->trace (tfp, 99);
tfp->open (trace_file.c_str());
}
else {
tfp = NULL;
}
testbench t {10, uut, tfp};
uut->reset_n = 0; // Set some inputs
uut->sync_reset = 0;
std::cout << "\n=============> reset..." << "\n";
/* Initialize the uut */
uut->sync_reset = 0;
uut->ocd_reg_we = 0;
uut->ocd_reg_read_addr = 0;
uut->ocd_reg_write_addr = 0;
uut->ocd_reg_write_data = 0;
uut->ocd_read_enable = 0;
uut->ocd_write_enable = 0;
uut->ocd_rw_addr = 0;
uut->ocd_write_word = 0;
uut->start = 0;
uut->start_address = start_address;
t.reset();
std::cout << "=============> init stack ..." << "\n";
uut->ocd_reg_we = 1;
uut->ocd_reg_write_addr = 2; // SP
uut->ocd_reg_write_data = DEFAULT_STACK_INIT_VALUE;
t.run();
uut->ocd_reg_we = 0;
t.run();
```
## Hold and Load
The traditional method to bootstrap a CPU is:
1. Write a boot loader in software.
2. Store the boot loader in a ROM.
3. After power-on reset, execute the boot loader first; it moves the rest of the code/data into RAM, and the PC is set to the `_start` address of the new image afterwards.

Quoted from the [README.md] of [Reindeer]():
>After reset, the soft CPU will be put into a hold state, and it will have access to the UART TX port by default. But a valid debug frame sending from the host PC can let OCD to reconfigure the mux and switch the UART TX to OCD side, for which the memory can be accessed, and the control frames can be exchanged. A new software image can be loaded into the memory during the CPU hold state, which gives rise to the name "hold-and-load".