# Assignment3: SoftCPU
###### tags: `arch2020`

Here we create a test case from [Assignment 1 - Digits of π](https://hackmd.io/@guaneec/arch2020-a1)

Each compliance test case is written in a `.S` file. The file is then compiled into an `.elf` file and executed. After execution, the memory content is compared to a reference output file, `.reference_output`. The signature must match so that the testbench knows which region of memory to compare (0x80002000).

In Assignment 1, the output was written directly to the console and not stored in memory. To create a test case, the code is modified to write its output to memory instead. The memory can be inspected with Ripes:

![](https://i.imgur.com/aHAy9rk.png)

The full modified test case for PI:

```a
#include "riscv_test_macros.h"
#include "compliance_test.h"
#include "compliance_io.h"

RV_COMPLIANCE_RV32M

RV_COMPLIANCE_CODE_BEGIN
    RVTEST_IO_INIT
    RVTEST_IO_ASSERT_GPR_EQ(x31, x0, 0x00000000)
    RVTEST_IO_WRITE_STR(x31, "Test Begin\n")

    # ---------------------------------------------------------------------------------------------

    addi sp, sp, -32
    sw s0, 0(sp)
    sw s1, 4(sp)
    sw s2, 8(sp)
    sw s4, 12(sp)
    sw s5, 16(sp)
    sw s7, 20(sp)
    sw ra, 24(sp)
    sw s6, 28(sp)

    la s0, arr   # arr.begin()
    la s1, pred  # arr.end(), pred.begin()
    la s2, pred  # pred.end()
    la s6, out   # output buffer
    # s4: main loop counter
    # s5: digit output
    # s7: m
    li s4, 100
    li s7, 333

L1: # for _ in range(n)
    ble s4, zero, L1E

    li t0, 10
    mv t1, s0
L2: # [x * 10 for x in a]
    beq t1, s1, L2E
    lh t2, 0(t1)
    mul t2, t2, t0
    sh t2, 0(t1)
    addi t1, t1, 2
    j L2
L2E:

    addi t0, s7, -1  # t0: i
    slli t1, t0, 1
    add t1, s0, t1   # t1: a.begin() + i
L3: # for i in range(m - 1, 0, -1):
    ble t0, zero, L3E
    lh t5, 0(t1)
    slli t2, t0, 1
    addi t2, t2, 1
    divu t3, t5, t2
    remu t4, t5, t2
    sh t4, 0(t1)
    mul t3, t3, t0
    lh t4, -2(t1)
    add t3, t3, t4
    sh t3, -2(t1)
    addi t0, t0, -1
    addi t1, t1, -2
    j L3
L3E:

    # d, a[0] = divmod(a[0], 10)
    lh t0, 0(s0)
    li t1, 10
    divu s5, t0, t1
    remu t3, t0, t1
    sh t3, 0(s0)

    li 
t1, 8 # C0: if d <= 8 bgt s5, t1, C0 mv t0, s1 # t0: pred.begin() + i L4: # for p in pred: beq t0, s2, L4E lb t3, 0(t0) # print mv a0, t3 sb a0, 0(s6) addi s6, s6, 1 addi t0, t0,1 j L4 L4E: addi s2, s1, 1 # pred.end() = pred.begin() + 1 sb s5, 0(s1) # pred[0] = d j C0E C0: li t0, 9 ## if d == 9: bne s5, t0 ,C1 sb s5, 0(s2) addi s2, s2, 1 j C1E C1: mv t0, s1 # t0: pred.begin() + i L5: # for p in pred beq t0, s2, L5E lb t2, 0(t0) addi t2, t2, 1 li t3, 10 remu a0, t2, t3 sb a0, 0(s6) addi s6, s6, 1 addi t0, t0, 1 j L5 L5E: # pred = [0] sb zero, 0(s1) addi s2, s1, 1 C1E: C0E: addi s4, s4, -1 j L1 L1E: lw s0, 0(sp) lw s1, 4(sp) lw s2, 8(sp) lw s4, 12(sp) lw s5, 16(sp) lw s7, 20(sp) lw ra, 24(sp) lw s6, 28(sp) addi sp, sp, 28 RVTEST_IO_WRITE_STR(x31, "Test End\n") # --------------------------------------------------------------------------------------------- RV_COMPLIANCE_HALT RV_COMPLIANCE_CODE_END # Input data section. .data RV_COMPLIANCE_DATA_BEGIN out: .zero 112 RV_COMPLIANCE_DATA_END arr: .half 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2
pred: .zero 200
# Output data section.
```

To generate the `.elf`, the [riscv/riscv-compliance](https://github.com/riscv/riscv-compliance) repository is used. [riscv-ovpsim](https://github.com/riscv-ovpsim/imperas-riscv-tests) also needs to be downloaded, otherwise `make` fails.

The source `PI.S` is placed in `riscv-compliance/riscv-test-suite/rv32im` (not `rv32i`, since M-extension instructions such as `mul` and `remu` are used), and the output is generated in `riscv-compliance/work/rv32im`. The reference output is written to `riscv-compliance/riscv-test-suite/rv32im/references/PI.reference_output`.

PI.reference_output:
```
01040103
06020905
08050305
03090709
04080302
04060206
03080303
05090702
08080200
07090104
03090601
07030909
05000105
09000208
04090407
02090504
08070003
00040601
06080206
09080002
02060809
04030008
03050208
01010204
00060007
00000000
00000000
00000000
```

These are the digits of pi, one digit per byte. Each line is a 32-bit word printed in hexadecimal, so the four digits of a line appear in reverse order: `01040103` holds 3, 1, 4, 1. The last three lines of zeros are necessary because, for some reason, the number of lines must be a multiple of 4.

After setting up, run `make`:

```
$ make
...
Compare to reference files ...

Check DIV ... OK
Check DIVU ... OK
Check MUL ... OK
Check MULH ... OK
Check MULHSU ... OK
Check MULHU ... OK
Check PI ... OK
Check REM ... OK
Check REMU ... OK
--------------------------------
OK: 9/9 RISCV_TARGET=riscvOVPsim RISCV_DEVICE=rv32im RISCV_ISA=rv32im
...
```

This confirms that our new test case works with `riscvOVPsim`.

However, simply copying the `.elf` file over to the Reindeer directory doesn't work.

![](https://i.imgur.com/O3Wpivd.png)

It turns out that the code isn't wrong; it just doesn't run for long enough. Ripes reports that calculating 100 digits takes ~870k cycles, whereas the testbench only runs for 2000 cycles. The number of cycles is defined in [tb_PulseRain_RV2T.cpp](https://github.com/PulseRain/Reindeer/blob/master/sim/verilator/tb_PulseRain_RV2T.cpp#L39).
Changing the number to 1000k still gives an error:

![](https://i.imgur.com/nb6XqHo.png)

Only five lines are output correctly instead of 25. This is probably because "cycle" means different things in Ripes and Reindeer. In Ripes, a cycle advances the pipeline by one step; in Reindeer, a cycle is a Verilog clock tick, which is only a fraction of a pipeline cycle.

Running 1M cycles produces a 915 MB waveform file. I decided not to run the simulation until all the digits are output, since the waveform might take up too much space.

-------------

Reindeer is a soft CPU written in Verilog. Being a Verilog "program", it can be loaded onto FPGAs or simulated with simulators such as Verilator.

![](https://i.imgur.com/CZRyNUp.png)

In the waveform above, we can inspect the inner workings of the register and memory modules.
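
As a cross-check on the PI test case, the Python-style comments in the assembly and the byte layout of `PI.reference_output` can be combined into a short script. This is only a minimal sketch, not the script used for this assignment: the function names `spigot_digits` and `to_reference_lines` are hypothetical, and the padding to a multiple of 4 lines simply follows the observation made about the reference file.

```python
def spigot_digits(n=100, m=333):
    """Mirror the assembly loop: n iterations over an m-element array."""
    a = [2] * m          # arr
    pred = []            # buffered predigits (pred)
    out = []             # digits written to the output buffer
    for _ in range(n):
        a = [x * 10 for x in a]
        for i in range(m - 1, 0, -1):
            q, a[i] = divmod(a[i], 2 * i + 1)
            a[i - 1] += q * i
        d, a[0] = divmod(a[0], 10)
        if d <= 8:
            out.extend(pred)          # flush buffered digits
            pred = [d]
        elif d == 9:
            pred.append(d)            # hold back, a carry may follow
        else:                         # d == 10: propagate the carry
            out.extend((p + 1) % 10 for p in pred)
            pred = [0]
    return out


def to_reference_lines(digits, multiple=4):
    """Pack one digit per byte into little-endian 32-bit hex words.

    Assumption: the number of lines is padded with zero words to a
    multiple of `multiple`, as observed for PI.reference_output.
    """
    data = bytes(digits)
    if len(data) % 4:
        data += bytes(4 - len(data) % 4)   # pad the last word with zeros
    words = [int.from_bytes(data[i:i + 4], "little")
             for i in range(0, len(data), 4)]
    while len(words) % multiple:
        words.append(0)
    return [f"{w:08x}" for w in words]


# The first eight digits of pi give the first two lines of the file,
# followed by two zero lines of padding.
print("\n".join(to_reference_lines([3, 1, 4, 1, 5, 9, 2, 6])))
# 01040103
# 06020905
# 00000000
# 00000000
```

Calling `to_reference_lines(spigot_digits())` produces the candidate reference lines for the full run, which can then be compared against the file by hand.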