# Assignment3: SoftCPU
###### tags: `arch2020`
Here we create a test case from [Assignment 1 - Digits of π](https://hackmd.io/@guaneec/arch2020-a1).
Each compliance test case is written in a `.S` file. The file is compiled into an `.elf` file and executed. After execution, the memory content is compared to a reference output file, `.reference_output`. The signature location must match what the testbench expects (0x80002000) so that it knows which region of memory to compare.
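
The comparison step can be pictured with a small Python sketch. This is only an illustration of the idea, not the compliance framework's actual script; the function name `first_mismatch` and the choice to ignore case and surrounding whitespace are assumptions made for the example.

```python
def first_mismatch(signature_lines, reference_lines):
    """Return the index of the first differing line, or None if they match.

    Hypothetical helper: both inputs are lists of hex words, one per line.
    """
    sig = [line.strip().lower() for line in signature_lines]
    ref = [line.strip().lower() for line in reference_lines]
    # Compare the lines both dumps have in common.
    for i, (s, r) in enumerate(zip(sig, ref)):
        if s != r:
            return i
    # A dump that is too short or too long is also a failure.
    if len(sig) != len(ref):
        return min(len(sig), len(ref))
    return None


if __name__ == "__main__":
    dumped = ["01040103", "00000000"]
    expected = ["01040103", "06020905"]
    print(first_mismatch(dumped, expected))
```

A result of `None` corresponds to the `OK` verdict printed by `make`; any other value is the first line where the dumped memory differs from the reference.
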
Originally in Assignment 1, output is directly written to the console and not stored in the memory. In order to create a test case, the code is modified to output to the memory instead. The memory can be inspected with Ripes:

The full modified test case for PI:
```a
#include "riscv_test_macros.h"
#include "compliance_test.h"
#include "compliance_io.h"
RV_COMPLIANCE_RV32M
RV_COMPLIANCE_CODE_BEGIN
RVTEST_IO_INIT
RVTEST_IO_ASSERT_GPR_EQ(x31, x0, 0x00000000)
RVTEST_IO_WRITE_STR(x31, "Test Begin\n")
# ---------------------------------------------------------------------------------------------
addi sp, sp, -32
sw s0, 0(sp)
sw s1, 4(sp)
sw s2, 8(sp)
sw s4, 12(sp)
sw s5, 16(sp)
sw s7, 20(sp)
sw ra, 24(sp)
sw s6, 28(sp)
la s0, arr # arr.begin()
la s1, pred # arr.end(), pred.begin()
la s2, pred # pred.end()
la s6, out # output buffer
# s4: main loop counter
# s5: digit output
# s7: m
li s4, 100
li s7, 333
L1: # for _ in range(n)
ble s4, zero, L1E
li t0, 10
mv t1, s0
L2: # [x * 10 for x in a]
beq t1, s1, L2E
lh t2, 0(t1)
mul t2, t2, t0
sh t2, 0(t1)
addi t1, t1, 2
j L2
L2E:
addi t0, s7, -1 # t0: i
slli t1, t0, 1
add t1, s0, t1 # t1: a.begin() + i
L3: # for i in range(m - 1, 0, -1):
ble t0, zero, L3E
lh t5, 0(t1)
slli t2, t0, 1
addi t2, t2, 1
divu t3, t5, t2
remu t4, t5, t2
sh t4, 0(t1)
mul t3, t3, t0
lh t4, -2(t1)
add t3, t3, t4
sh t3, -2(t1)
addi t0, t0, -1
addi t1, t1, -2
j L3
L3E:
# d, a[0] = divmod(a[0], 10)
lh t0, 0(s0)
li t1, 10
divu s5, t0, t1
remu t3, t0, t1
sh t3, 0(s0)
li t1, 8
# C0: if d <= 8
bgt s5, t1, C0
mv t0, s1 # t0: pred.begin() + i
L4: # for p in pred:
beq t0, s2, L4E
lb t3, 0(t0)
# print
mv a0, t3
sb a0, 0(s6)
addi s6, s6, 1
addi t0, t0, 1
j L4
L4E:
addi s2, s1, 1 # pred.end() = pred.begin() + 1
sb s5, 0(s1) # pred[0] = d
j C0E
C0:
li t0, 9
## if d == 9:
bne s5, t0, C1
sb s5, 0(s2)
addi s2, s2, 1
j C1E
C1:
mv t0, s1 # t0: pred.begin() + i
L5: # for p in pred
beq t0, s2, L5E
lb t2, 0(t0)
addi t2, t2, 1
li t3, 10
remu a0, t2, t3
sb a0, 0(s6)
addi s6, s6, 1
addi t0, t0, 1
j L5
L5E:
# pred = [0]
sb zero, 0(s1)
addi s2, s1, 1
C1E:
C0E:
addi s4, s4, -1
j L1
L1E:
lw s0, 0(sp)
lw s1, 4(sp)
lw s2, 8(sp)
lw s4, 12(sp)
lw s5, 16(sp)
lw s7, 20(sp)
lw ra, 24(sp)
lw s6, 28(sp)
addi sp, sp, 32 # release the full 32-byte frame allocated at the start
RVTEST_IO_WRITE_STR(x31, "Test End\n")
# ---------------------------------------------------------------------------------------------
RV_COMPLIANCE_HALT
RV_COMPLIANCE_CODE_END
# Input data section.
.data
RV_COMPLIANCE_DATA_BEGIN
out: .zero 112
RV_COMPLIANCE_DATA_END
arr: .half 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
pred: .zero 200
# Output data section.
```
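
For cross-checking, the assembly can be mirrored in Python, following the pseudocode in its comments. This is a sketch rather than the original Assignment 1 source: the function name `pi_digits` and the list `out` (standing in for the `out` buffer) are names chosen here, and `m = 333` is taken from the value loaded into `s7`.

```python
def pi_digits(n_iter=100, m=333):
    """Return the digits the test case would store in the `out` buffer."""
    a = [2] * m      # arr
    pred = []        # held predigits (pred.begin() .. pred.end())
    out = []         # one digit per byte, as written through s6
    for _ in range(n_iter):                      # L1
        a = [x * 10 for x in a]                  # L2
        for i in range(m - 1, 0, -1):            # L3
            q, a[i] = divmod(a[i], 2 * i + 1)
            a[i - 1] += q * i
        d, a[0] = divmod(a[0], 10)
        if d <= 8:
            out.extend(pred)                     # L4: flush held digits
            pred = [d]
        elif d == 9:
            pred.append(d)                       # hold the 9, a carry may follow
        else:
            # d == 10: propagate the carry through the held digits (L5)
            out.extend((p + 1) % 10 for p in pred)
            pred = [0]
    return out


if __name__ == "__main__":
    print(pi_digits()[:8])
```

The first bytes should read 3, 1, 4, 1, 5, 9, 2, 6, which is what the first two words of the reference output encode. Because a digit is only written once the next one is known, the last digit computed stays in `pred` and never reaches the buffer.
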
To generate the `.elf` file, the [riscv/riscv-compliance](https://github.com/riscv/riscv-compliance) repository is used. [riscv-ovpsim](https://github.com/riscv-ovpsim/imperas-riscv-tests) also needs to be downloaded; otherwise `make` fails.
The source `PI.S` is placed in `riscv-compliance/riscv-test-suite/rv32im` (not `rv32i`, since `mul`, `divu`, and `remu` are used), and the output is generated in `riscv-compliance/work/rv32im`. The reference output is written to `riscv-compliance/riscv-test-suite/rv32im/references/PI.reference_output`.
PI.reference_output:
```
01040103
06020905
08050305
03090709
04080302
04060206
03080303
05090702
08080200
07090104
03090601
07030909
05000105
09000208
04090407
02090504
08070003
00040601
06080206
09080002
02060809
04030008
03050208
01010204
00060007
00000000
00000000
00000000
```
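These are the digits of π, one digit per byte. Each line is a 32-bit word printed in hexadecimal, so the bytes appear in little-endian order: `01040103` holds the digits 3, 1, 4, 1. The last three lines of zeros are necessary because, for some reason, the number of lines must be a multiple of 4.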
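
To make the file layout concrete, here is a small Python sketch that converts between digits and reference lines. It assumes each line is one little-endian 32-bit word in hex and that the line count is padded with zero words to a multiple of 4; the helper names `decode_reference` and `encode_reference` are made up for this example.

```python
def decode_reference(lines):
    """Turn hex words (one per line) into a flat list of byte values."""
    digits = []
    for line in lines:
        word = int(line.strip(), 16)
        # The lowest-addressed byte is the least significant one.
        digits.extend(word.to_bytes(4, "little"))
    return digits


def encode_reference(digits, lines_multiple=4):
    """Pack one digit per byte into hex words, zero-padded at the end."""
    data = bytes(digits)
    data += bytes(-len(data) % 4)                    # pad to a whole word
    words = [data[i:i + 4] for i in range(0, len(data), 4)]
    words += [bytes(4)] * (-len(words) % lines_multiple)  # pad the line count
    return ["%08x" % int.from_bytes(w, "little") for w in words]


if __name__ == "__main__":
    print(decode_reference(["01040103", "06020905"]))
    print(encode_reference([3, 1, 4, 1, 5, 9, 2, 6]))
```

With 99 digits, `encode_reference` produces 25 data words plus 3 zero words, which is the 28-line shape of the file above.
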
After setting up, run `make`:
```
$ make
...
Compare to reference files ...
Check DIV ... OK
Check DIVU ... OK
Check MUL ... OK
Check MULH ... OK
Check MULHSU ... OK
Check MULHU ... OK
Check PI ... OK
Check REM ... OK
Check REMU ... OK
--------------------------------
OK: 9/9 RISCV_TARGET=riscvOVPsim RISCV_DEVICE=rv32im RISCV_ISA=rv32im
...
```
This confirms that our new test case works with `riscvOVPsim`.
However, simply copying the `.elf` file over to the Reindeer directory doesn't work.

It turns out that the code isn't wrong; it just isn't run for long enough.
Ripes reports that calculating 100 digits takes ~870k cycles. However, the testbench only runs for 2000 cycles.
The number of cycles is defined in [tb_PulseRain_RV2T.cpp](https://github.com/PulseRain/Reindeer/blob/master/sim/verilator/tb_PulseRain_RV2T.cpp#L39). Changing the number to 1000k still gives an error:

Only five lines are output correctly instead of 25. This is probably because "cycle" means different things in Ripes and Reindeer. In Ripes, a cycle advances the pipeline by one step, whereas in Reindeer a cycle is a Verilog clock tick, which is only a fraction of a pipeline cycle.
Running 1M cycles produces a 915 MB waveform file. I decided not to run until all the digits are output, since it might take up too much space.
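
A rough extrapolation shows why a full run was not attempted. The Python sketch below assumes that output lines appear at a steady rate and that the waveform file grows linearly with the number of ticks; both are simplifications, and the function name `estimate_full_run` is hypothetical.

```python
def estimate_full_run(ticks_run=1_000_000, lines_ok=5, lines_total=25,
                      waveform_mb=915, ripes_cycles=870_000):
    """Linear extrapolation from the 1M-tick run to a complete run."""
    # Ticks needed if all 25 lines arrive at the rate of the first five.
    ticks_needed = ticks_run * lines_total / lines_ok
    # Waveform size, assuming it scales with the tick count.
    waveform_gb = waveform_mb * (ticks_needed / ticks_run) / 1000
    # Implied number of Reindeer ticks per Ripes cycle.
    ticks_per_ripes_cycle = ticks_needed / ripes_cycles
    return ticks_needed, waveform_gb, ticks_per_ripes_cycle


if __name__ == "__main__":
    ticks, gb, ratio = estimate_full_run()
    print(f"ticks: {ticks:.0f}, waveform: {gb:.1f} GB, ticks per Ripes cycle: {ratio:.1f}")
```

Under these assumptions a complete run needs about 5M ticks and a waveform of roughly 4.6 GB, or a little under 6 Reindeer ticks per Ripes cycle.
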
-------------
Reindeer is a soft CPU written in Verilog. Being a Verilog "program", it can be loaded onto FPGAs or simulated with simulators such as Verilator.

In the waveform above, we can inspect the inner workings of the register and memory module.