# Assignment3: SoftCPU
contributed by < `WeiCheng14159` >
###### tags: `computer architecture 2020`

## Introduction
### How do Compliance Tests work?
RISC-V compliance tests are a set of tests that check whether a claimed RISC-V implementation meets the basic standard and interoperates with other RISC-V implementations in the ecosystem.

From the [riscv-compliance/doc/README.adoc](https://github.com/riscv/riscv-compliance/blob/master/doc/README.adoc) we can get a better understanding of the compliance tests.

> At the heart of the testing infrastructure is the detailed compliance test. This is the RISC-V assembler code that is executed on the processor and that provides results in a **defined memory area (the signature)**. The test should only use **the minimum of instructions** and only those absolutely necessary. It should only use instructions and registers from the ISA instruction set on which it is targeted.

#### What is the signature?
==The **signature** is a defined memory area where the result of a test suite is stored.==

#### How to run the RISC-V compliance tests?
According to the [README.md](https://github.com/riscv/riscv-compliance) in the [riscv-compliance](https://github.com/riscv/riscv-compliance) repo, these are the requirements for running the RISC-V compliance tests:
- Specify the toolchain in the [Makefile](https://github.com/riscv/riscv-compliance/blob/master/Makefile)
    - `export RISCV_PREFIX ?= riscv64-unknown-elf-`
    - `export RISCV_TARGET_FLAGS ?=`
    - `export RISCV_ASSERT ?= 0`
- Specify the target device in the [Makefile](https://github.com/riscv/riscv-compliance/blob/master/Makefile)
    - `export RISCV_TARGET ?= riscvOVPsim`
    - `export RISCV_DEVICE ?= rv32i`
    - Set the ISA of the target device
- Install the `riscvOVPsim` simulator
    - `riscvOVPsim` is the simulator used here. It has to be installed next to (as a sibling of) the `riscv-compliance` directory.
    - Install `riscvOVPsim` by `git clone https://github.com/riscv-ovpsim/imperas-riscv-tests.git`
    - Also, the environment variable `TARGET_SIM` should point to the executable `/riscv-ovpsim/bin/Linux64/riscvOVPsim.exe` inside the cloned repository

The resulting modified Makefile can be found [here](https://github.com/WeiCheng14159/riscv-compliance/blob/rv32i-arch/Makefile).

There are currently 48 test suites in the repository. Run all of them with the following commands:
```bash=
export TARGET_SIM=/path-on-ur-computer/imperas-riscv-tests/riscv-ovpsim/bin/Linux64/riscvOVPsim.exe
make simulate verify
```
The result will be `OK: 48/48 RISCV_TARGET=riscvOVPsim RISCV_DEVICE=rv32i RISCV_ISA=`

#### How to write my own RISC-V compliance test?
- Learn from examples by looking at the test suites for the `rv32i` architecture in the [riscv-compliance/riscv-test-suite/rv32i/src/](https://github.com/riscv/riscv-compliance/tree/master/riscv-test-suite/rv32i/src) directory.
- A list of the available testing macros can be found in [compliance_io.h](https://github.com/riscv/riscv-compliance/blob/master/riscv-target/riscvOVPsim/compliance_io.h) and [compliance_test.h](https://github.com/riscv/riscv-compliance/blob/master/riscv-target/riscvOVPsim/compliance_test.h)
- Write your own test suite (`.S` file) and place it in the `riscv-compliance/riscv-test-suite/rv32i/src/` directory
- Modify the [Makefrag](https://github.com/riscv/riscv-compliance/blob/master/riscv-test-suite/rv32i/Makefrag) file to add your test suite
- Write the reference output file for your test suite and place it in the [riscv-compliance/riscv-test-suite/rv32i/references](https://github.com/riscv/riscv-compliance/tree/master/riscv-test-suite/rv32i/references) directory
- Run `make simulate verify` to check the result of your test suite.
  The output files can be found in the `riscv-compliance/work` directory.

## Choose an assembly program as a new test suite
The `Count Leading Zero` assembly code from [Assignment 1](https://hackmd.io/S7_Jr4AiRZelkXA-rH-ILQ?view) is chosen for our new test suite.

### Count Leading Zero assembly code (from hw1)
:::spoiler Assembly Code
```c=1
clz:
    lw x5, mask             # x5 = 0x80000000, the bit currently being tested
    li x6, 32               # x6 = number of bits left to inspect
    li x7, 0                # x7 = leading zeros counted so far
_for:
    bne x6, zero, _count    # keep scanning while bits remain
_return:
    mv x10, x7              # return the count in a0
    jr ra
_count:
    addi x6, x6, -1
    and x28, x10, x5        # test the current bit of the input
    bne x28, zero, _return  # first set bit found: stop
    addi x7, x7, 1
    srli x5, x5, 1          # move the mask one bit to the right
    j _for
```
:::

### Count Leading Zero test suite
#### Test case itself
- This is the `I-CLZ-01.S` Count Leading Zero test suite.
- I created 9 test cases in total. For simplicity, only the first test case is shown here. For the complete set of test cases, check [here](https://github.com/WeiCheng14159/riscv-compliance/blob/rv32i-arch/riscv-test-suite/rv32i/src/I-CLZ-01.S)

:::spoiler Code
```c=1
#include "compliance_test.h"
#include "compliance_io.h"
#include "test_macros.h"

# Test Virtual Machine (TVM) used by program.
RV_COMPLIANCE_RV32M

# Test code region.
RV_COMPLIANCE_CODE_BEGIN

    RVTEST_IO_INIT
    RVTEST_IO_ASSERT_GPR_EQ(x31, x0, 0x00000000)
    RVTEST_IO_WRITE_STR(x31, "# Test Begin\n")

    # ---------------------------------------------------------------------------------------------
    RVTEST_IO_WRITE_STR(x31, "# Test part 1\n");

    # Addresses for test data and results
    la x1, test_1_data
    la x2, test_1_res

    # Load testdata
    lw x10, 0(x1)

    # Register initialization

    # Test
    sw x10, 0(x2)
    jal clz

    # Store results
    sw x10, 4(x2)

    //
    // Assert
    //
    RVTEST_IO_CHECK()
    RVTEST_IO_ASSERT_GPR_EQ(x2, x10, 0x00000000)

    RVTEST_IO_WRITE_STR(x31, "# Test part 1 - Complete\n");

    ...
    RV_COMPLIANCE_HALT

    # ---------------------------------------------------------------------------------------------
    # HALT

# Count Leading Zero program
# x5 -> t0
# x6 -> t1
# x7 -> t2
# x28 -> t3
# x10 -> a0
clz:
    lw x5, mask
    li x6, 32
    li x7, 0
_for:
    bne x6, zero, _count
_return:
    mv x10, x7
    jr ra
_count:
    addi x6, x6, -1
    and x28, x10, x5
    bne x28, zero, _return
    addi x7, x7, 1
    srli x5, x5, 1
    j _for

RV_COMPLIANCE_CODE_END

# Input data section.
    .data
mask:
    .word 0x80000000
test_1_data:
    .word 0xFFFFFFFF
... skip ...

# Output data section.
RV_COMPLIANCE_DATA_BEGIN
test_1_res:
    .fill 2, 4, -1
... skip ...
RV_COMPLIANCE_DATA_END
```
:::

#### Signature
- This is the signature file I created: [I-CLZ-01.reference_output](https://github.com/WeiCheng14159/riscv-compliance/blob/rv32i-arch/riscv-test-suite/rv32i/references/I-CLZ-01.reference_output)
- There are 9 test cases in total. Each test case writes 2 words to memory. For simplicity, only the first two test cases are shown
    - 1st word: input argument
    - 2nd word: result
- The reference output signature is compared with the signature produced by the simulator
```c=1
ffffffff // input 1
00000000 // result 1
00000001 // input 2
0000001f // result 2
... skip...
```

## Waveform with GTKWave
### Steps
- Copy the generated files (`.elf`, `.objdump`, `.signature.output`) from the `riscv-compliance/work` directory to the `Reindeer/sim/compliance` directory so that our test suite can run on the simulated Reindeer CPU
- Run the command `make test I-CLZ-01` to start the simulation
```c=1
=============================================================
Simulation exit ../compliance/I-CLZ-01.elf
Wave trace I-CLZ-01.vcd
=============================================================
====> Test PASSED, Total of 1 case(s)
```
- The `I-CLZ-01.vcd` file will be generated.
  Use GTKWave to inspect the waveform.

### Waveform analysis
- According to the file `I-CLZ-01.elf.objdump`, our first test case starts at address `0x80000110`
```c=1
<begin_signature>
80000110: 0000a503 lw a0,0(ra)
80000114: 00a12023 sw a0,0(sp)
80000118: 114000ef jal ra,8000022c <clz>
8000011c: 00a12223 sw a0,4(sp)
80000120: 00002097 auipc ra,0x2
... skip ...
```
- ![](https://i.imgur.com/kAXfQxB.png)

## How does Reindeer work with Verilator?
Reindeer is a soft RISC-V core written in Verilog, and Verilator is a compiler that translates Verilog HDL into C++ for simulation. Our test suite (the compiled assembly code) is copied into the Reindeer simulation directory and executed on the simulated Reindeer hardware.

## What is a 2x2 pipeline and what are its benefits?
![](https://i.imgur.com/uC5c8uA.png)
- A single-port memory is used. The 2x2 pipeline interleaves memory reads (Instruction Fetch) with memory accesses (Reg and Mem Access), so a single port is sufficient for the design. The main purpose of this design is to avoid structural hazards.
- This is visible in the waveform: memory reads and memory writes are interleaved.
  (Reads happen in even cycles, writes in odd cycles.)
- ![](https://i.imgur.com/LtPpD2b.png)
- `PC_out` is the current PC signal
- `read_mem_addr[31:0]` is the address read by the IF unit
- `data_write_word[31:0]` is the data word written by the MEM unit

## Hardware architecture of the Reindeer CPU and OCD
### FSM inside Reindeer CPU
The following diagram shows the FSM on the Reindeer CPU side.
```graphviz
digraph reindeer_fsm{
    graph [fontname=Arial];
    node [shape=record,style=filled, fillcolor=aquamarine,fontsize=20.0];
    edge [fontcolor=red, fontsize=20.0];

    // nodes
    init [label="S_INIT"];
    init_wait_1 [label="S_INIT_WAIT1"];
    fetch [label="S_FETCH"];
    decode [label="S_DECODE"];
    fetch_exec [label="S_FETCH_EXE"];
    except [label="S_EXCEPTION"];
    except_reinit [label="S_EXCEPTION_REINIT"];
    decode_data [label="S_DECODE_DATA"];
    wfi [label="S_WFI"];
    load [label="S_LOAD"];
    load_wait [label="S_LOAD_WAIT"];
    mul_div [label="S_MUL_DIV"];
    wfi_wait [label="S_WFI_WAIT"];
    store [label="S_STORE"];
    store_wait [label="S_STORE_WAIT"];

    // edges
    init->init_wait_1 [label="start=1"];
    init->init [label="start=0"];
    init_wait_1->fetch->decode->fetch_exec;
    fetch_exec->except[label="timer & interrupt & ecall"];
    fetch_exec->except[label="branch & branch_addr[1]"];
    fetch_exec->except[label="except & data_acc_enb"];
    fetch_exec->init_wait_1[label="branch | mret_active"];
    fetch_exec->decode_data;
    decode_data->wfi[label=" decode_ctl_WFI=1 "];
    decode_data->store[label=" decode_ctl_STORE=1 "];
    decode_data->load[label=" decode_ctl_LOAD=1 "];
    decode_data->mul_div[label=" MUL_DIV_FUNCT3 "];
    decode_data->fetch_exec;
    wfi->wfi_wait;
    wfi_wait->except[label=" timer & interrupt "];
    wfi_wait->wfi_wait;
    store->store_wait;
    store_wait->except[label="misalign"];
    store_wait->fetch_exec[label="store_done=1"];
    store_wait->store_wait;
    load->load_wait;
    load_wait->except[label=" misalign "];
    load_wait->fetch_exec[label="load_done=1"];
    load_wait->load_wait;
    except->except_reinit->init_wait_1;
    mul_div->mul_div[label="done=0"];
    mul_div->fetch_exec[label="done=1"];
}
```

### FSM inside OCD
The following diagram shows the FSM in the OCD.
```graphviz
digraph ocd_fsm{
    graph [fontname=Arial];
    node [shape=record,style=filled, fillcolor=aquamarine,fontsize=20.0];
    edge [fontcolor=red, fontsize=20.0];

    // nodes
    idle [label="S_IDLE"];
    sync_1 [label="S_SYNC_1"];
    sync_0 [label="S_SYNC_0"];
    input [label="S_INPUT_WAIT"];
    frame [label="S_FRAME_TYPE"];
    crc [label="S_CRC"];
    ext_crc [label="S_EXT_CRC"];
    wr_ext [label="S_WR_EXT"];
    wr_ack [label="S_WR_ACK"];
    read_wait [label="S_PRAM_READ_WAIT"];
    cpu_ack [label="S_CPU_STATUS_ACK"];
    wait_done [label="S_WAIT_DONE"];

    // edges
    idle->sync_1 [label="new_data_in"];
    idle->idle;
    sync_1->sync_1;
    sync_1->sync_0[label="new_data_in"];
    sync_1->idle;
    sync_0->input[label="new_data_in"];
    sync_0->idle;
    sync_0->sync_0;
    input->frame[label="input_counter=??"];
    input->input;
    frame->frame;
    frame->crc[label="crc_en"];
    crc->wr_ext[label=" WRITE_128_BYTES_WITH_ACK "];
    crc->idle[label=" WRITE_4_BYTES_WITHOUT_ACK "];
    crc->wr_ack[label=" WRITE_4_BYTES_WITH_ACK "];
    crc->read_wait[label=" READ_4_BYTES "];
    crc->wr_ack[label=" CPU_RESET_WITH_ACK "];
    crc->wr_ack[label=" RUN_PULSE_WITH_ACK "];
    crc->cpu_ack[label=" READ_CPU_STATUS "];
    crc->wr_ack[label=" COUNTER_CONFIG "];
    crc->idle[label="UART_SEL"];
    crc->idle;
    wr_ack->wait_done;
    cpu_ack->wait_done;
    read_wait->read_wait[label="!pram_read_enable_in"];
    read_wait->wait_done;
    wait_done->idle[label="reply_done"];
    wait_done->wait_done;
    wr_ext->wr_ext;
    wr_ext->ext_crc[label="input_counter"];
    ext_crc->idle[label="crc_out"];
}
```

## What is "Hold and Load"?
- Overview of Hold and Load
- ![](https://i.imgur.com/nTWkrJr.png)

### Reindeer CPU side
The FSM in the Reindeer controller [RV2T_controller.v](https://github.com/PulseRain/Reindeer/blob/master/submodules/PulseRain_MCU/PulseRain_processor_core/source/RV2T_controller.v) (~600 lines of Verilog) shows how the soft CPU interacts with the OCD coprocessor.
```c=448
current_state[S_INIT]: begin
    ctl_paused = 1'b1;
    if (start) begin
        ctl_pc_init = 1'b1;
        next_state [S_INIT_WAIT1] = 1'b1;
    end else begin
        next_state [S_INIT] = 1'b1;
    end
end
```
The Reindeer CPU switches to the `S_INIT_WAIT1` state and sets `ctl_pc_init` to `1'b1` only when the `start` signal is high. Otherwise, it stays in the `S_INIT` state with `ctl_paused` asserted.
```c=348
ctl_pc_init : begin
    fetch_start_addr <= {start_addr [`PC_BITWIDTH - 1 : 1], 1'b0};
end
```
The CPU sets `fetch_start_addr` (with the lowest bit of `start_addr` forced to 0) when `ctl_pc_init` is `1'b1`. After this event, the CPU starts running from the IF stage.

### OCD side
The OCD waits for incoming debug data before transferring memory contents to the Reindeer CPU. Initially, the OCD is in the `S_IDLE` state. It switches to the `S_SYNC_1` state when input is enabled and the sync byte `DEBUG_SYNC_2` arrives.
```c=356
current_state[S_IDLE]: begin
    ctl_wr_ext_disable = 1'b1;
    if (enable_in_sr[0] && (new_data_in == `DEBUG_SYNC_2)) begin
        next_state [S_SYNC_1] = 1;
    end else begin
        ctl_crc_sync_reset = 1'b1;
        next_state [S_IDLE] = 1;
    end
end
```
(Some trivial states are skipped for simplicity.)

The OCD switches from `S_INPUT_WAIT` to `S_FRAME_TYPE` **(frame checking state)** once `input_counter` reaches `DEBUG_FRAME_LENGTH - DEBUG_SYNC_LENGTH - 1`.
```c=393
current_state [S_INPUT_WAIT] : begin
    if (input_counter == (`DEBUG_FRAME_LENGTH - `DEBUG_SYNC_LENGTH - 1)) begin
        next_state [S_FRAME_TYPE] = 1;
    end else begin
        next_state [S_INPUT_WAIT] = 1;
    end
end
```
Then the OCD switches from the `S_FRAME_TYPE` state to the `S_CRC` state, where it performs the **CRC16 cyclic redundancy check**.
```c=401
current_state [S_FRAME_TYPE] : begin
    ctl_reset_input_counter = 1'b1;
    if (enable_in_sr[0]) begin
        next_state [S_CRC] = 1;
    end else begin
        next_state [S_FRAME_TYPE] = 1;
    end
end
```

## How does the simulation bootstrap?
TBD

## Signals and events inside Reindeer
TBD

## Requirements
:::spoiler
1.
   Following the instructions of [Lab3: Reindeer - RISCV RV32I[M] Soft CPU](https://hackmd.io/@sysprog/rJw2A5DqS), you shall modify the assembly programs used/done with [Assignment1](https://hackmd.io/@sysprog/2020-arch-homework1) or [Assignment2](https://hackmd.io/@sysprog/2020-arch-homework2) as new test case(s) for [Reindeer](https://github.com/PulseRain/Reindeer) Simulation with Verilator.
    * [I-ADD-01](https://github.com/riscv/riscv-compliance/blob/master/riscv-test-suite/rv32i/src/I-ADD-01.S) is a good starting point for writing test cases.
    * You have to ensure the signature matches the requirements described in [RISC-V Compliance Tests](https://github.com/riscv/riscv-compliance/blob/master/doc/README.adoc).
2. Check the generated VCD file and use GTKwave to view the waveform. Then, explain how your program is executed along with [Reindeer](https://github.com/PulseRain/Reindeer) Simulation.
3. Write down your thoughts and progress in [HackMD notes](https://hackmd.io/s/features).
    * Summarize how [RISC-V Compliance Tests](https://github.com/riscv/riscv-compliance/blob/master/doc/README.adoc) work and why the signature should be matched.
    * Explain how [Reindeer](https://github.com/PulseRain/Reindeer) works with [Verilator](https://www.veripool.org/wiki/verilator).
    * What is a 2 x 2 Pipeline? How can we benefit from such a pipeline design?
    * What is "Hold and Load"? And how does the simulation bootstrap?
    * Can you show some signals/events inside [Reindeer](https://github.com/PulseRain/Reindeer) and describe them?
:::
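
## Appendix: cross-checking the CLZ signature
The reference signature for `I-CLZ-01` stores two words per test case: the input argument followed by the result of `clz`. As a sanity check, the expected words can be recomputed outside the simulator. The following is only a minimal Python sketch that mirrors the loop of the `clz` routine shown earlier. The function names `clz32` and `signature_words` are hypothetical helpers introduced for illustration, and the output format (one lowercase 8-digit hex word per line) is assumed from the listing in the Signature section, not taken from the riscv-compliance tooling.

```python
MASK_MSB = 0x80000000  # same initial value as the `mask` word in I-CLZ-01.S


def clz32(value: int) -> int:
    """Count leading zeros of a 32-bit word, mirroring the assembly loop."""
    mask = MASK_MSB        # x5: bit currently being tested
    remaining = 32         # x6: bits left to inspect
    count = 0              # x7: leading zeros seen so far
    value &= 0xFFFFFFFF    # x10 is a 32-bit register
    while remaining != 0:  # _for
        remaining -= 1     # _count
        if value & mask:
            return count   # _return: first set bit found
        count += 1
        mask >>= 1
    return count           # no bit was set, so all 32 bits are zero


def signature_words(inputs):
    """Return the assumed signature layout: input word, then result word."""
    words = []
    for value in inputs:
        words.append(f"{value & 0xFFFFFFFF:08x}")
        words.append(f"{clz32(value):08x}")
    return words


if __name__ == "__main__":
    # The first two inputs listed in the reference output.
    for word in signature_words([0xFFFFFFFF, 0x00000001]):
        print(word)
```

For the two inputs above this prints `ffffffff`, `00000000`, `00000001`, `0000001f`, which agrees with the first four words of the reference output. The remaining test cases are not reproduced here because their inputs are elided in this note.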