# Assignment3: single-cycle RISC-V CPU
contributed by < [kkkkk1109](https://github.com/kkkkk1109) >

## Introduction

In this assignment, we are asked to learn [Chisel](https://www.chisel-lang.org/) and implement a single-cycle RISC-V CPU. By following the steps in [Lab3](https://hackmd.io/@sysprog/r1mlr3I7p#Lab3-Construct-a-single-cycle-RISC-V-CPU-with-Chisel), I completed `mycpu`, the CPU mentioned above, and tested it with my assembly code from [Assignment 2](https://hackmd.io/Jt_bFcnUQMWSnUI55kU-GQ#Assignment2-RISC-V-Toolchain). In addition, this note explains the signals at the different stages of `mycpu` and shows evidence of the successful tests.

## Hello World in Chisel

```scala
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U

  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```

* `led` is the output of this circuit.
* `CNT_MAX` is the unsigned integer 24999999.
* `cntReg` is a 32-bit register initialized to 0.
* `blkReg` is a 1-bit register initialized to 0.

`cntReg` increases by one every cycle. When `cntReg` equals `CNT_MAX`, it is reset to 0 and the bit in `blkReg` flips. The output `led` is the value in `blkReg`, so it toggles once every 25,000,000 cycles (every 0.5 s, assuming the 50 MHz clock implied by the constant 50000000).

We can also implement **Hello World in Chisel** more compactly with `Mux`:

```scala
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U

  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))

  // Wrap the counter and flip the LED bit when the counter reaches CNT_MAX
  cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
  blkReg := Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
  io.led := blkReg
}
```

## Single-cycle RISC-V CPU

To complete the single-cycle RISC-V CPU, we need to add code to the Scala files in `src/main/scala/riscv/core`. The following subsections outline how each module is completed and what the output looks like when its test passes.
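To make the control decisions described in the following subsections concrete, here is a minimal plain-Scala reference model (ordinary Scala, not Chisel hardware). It is a sketch under stated assumptions: the object and function names are hypothetical and do not exist in `mycpu`; the opcodes are the standard RV32I base opcodes; the `op1` rule follows the Lab3 description (instruction address for `auipc`, `jal` and branches, register otherwise); and, as an assumption, only R-type instructions take `op2` from a register, while branches use the immediate because the ALU computes the branch target. Operand sources are modeled as named cases rather than the 0/1 encodings used by the hardware.

```scala
// Plain-Scala reference model (not Chisel) of the control decisions made by
// Instruction Fetch, Instruction Decode and Execute. All names are hypothetical.
object ControlModel {
  // Standard RV32I base opcodes (instruction bits [6:0]).
  val OpLoad   = 0x03
  val OpStore  = 0x23
  val OpOp     = 0x33 // R-type register-register ALU operations
  val OpAuipc  = 0x17
  val OpJal    = 0x6f
  val OpBranch = 0x63

  def opcode(inst: Int): Int = inst & 0x7f

  // Instruction Fetch: jump target when jump_flag_id is set, otherwise pc + 4.
  def nextPc(pc: Int, jumpFlag: Boolean, jumpAddress: Int): Int =
    if (jumpFlag) jumpAddress else pc + 4

  // Instruction Decode: loads enable memory reads, stores enable memory writes.
  def memoryReadEnable(inst: Int): Boolean  = opcode(inst) == OpLoad
  def memoryWriteEnable(inst: Int): Boolean = opcode(inst) == OpStore

  // ALU operand sources, modeled as named cases instead of 0/1 encodings.
  sealed trait Source
  case object RegisterValue      extends Source
  case object InstructionAddress extends Source
  case object Immediate          extends Source

  // op1: instruction address for auipc, jal and branches; rs1's value otherwise.
  def aluOp1Source(inst: Int): Source = opcode(inst) match {
    case OpAuipc | OpJal | OpBranch => InstructionAddress
    case _                          => RegisterValue
  }

  // op2 (assumption): rs2's value only for R-type; the immediate otherwise.
  def aluOp2Source(inst: Int): Source =
    if (opcode(inst) == OpOp) RegisterValue else Immediate

  // Execute: select the real ALU inputs instead of always using rs1/rs2.
  def aluOperands(inst: Int, pc: Int, rs1Value: Int, rs2Value: Int, imm: Int): (Int, Int) = {
    val op1 = if (aluOp1Source(inst) == InstructionAddress) pc else rs1Value
    val op2 = if (aluOp2Source(inst) == Immediate) imm else rs2Value
    (op1, op2)
  }
}
```

Feeding the model the two instructions of `Instruction.s` from the waveform section, `0x00300093` (`addi x1, x0, 3`) and `0x0000006f` (`j exit`, i.e. `jal x0, 0`, a jump to itself), gives a register/immediate pair for `addi` and the instruction address as `op1` for `j`. `aluOperands` also shows why wiring `op1`/`op2` straight to `reg1_data`/`reg2_data` is wrong for anything but R-type instructions.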
### **Instruction Fetch**

Check `jump_flag_id`: if it is true, set the program counter (`pc`) to the jump address; otherwise, add `4.U` to `pc`.

```
$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"
```

```
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] Run completed in 4 seconds, 609 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 6 s, completed Nov 29, 2023, 10:13:46 PM
```

### **Instruction Decode**

The blanks we need to fill in produce the memory read and write enable signals. To determine them, decode the instruction: if it is a load (`L-type`), set `memory_read_enable` to true; if it is a store (`S-type`), set `memory_write_enable` to true.

```
$ sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
```

```
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 4 seconds, 820 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 12 s, completed Nov 29, 2023, 10:28:23 PM
```

### **Execute**

In the Execute stage, we have to select the ALU inputs `op1` and `op2`. At first, I wrote the code like this:

```scala
alu.io.op1 := io.reg1_data
alu.io.op2 := io.reg2_data
```

Surprisingly, this still passes the Execute test:

```
$ sbt "testOnly riscv.singlecycle.ExecuteTest"
```

```
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] Run completed in 4 seconds, 685 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Nov 29, 2023, 10:35:42 PM
```

However, this mistake makes several of the whole-CPU tests fail:

```
$ sbt test
```

```
[error] Failed tests:
[error] riscv.singlecycle.ByteAccessTest
[error] riscv.singlecycle.FibonacciTest
[error] riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 31 s, completed Nov 29, 2023, 10:37:59 PM
```

It took me hours to debug: since the Execute test had passed, I assumed the bug was somewhere in the CPU code. It turned out that I had missed a section of [Lab3](https://hackmd.io/@sysprog/r1mlr3I7p#Single-cycle-RISC-V-CPU):

> Taking ex_aluop1_source control signal as an example, this control signal determines the input for the first operand of the ALU. It assigns a value to ex_aluop1_source based on the opcode. When the instruction type is either auipc, jal, or B, ex_aluop1_source is set to 0, controlling the ALU’s first operand input to be the instruction address. In other cases, ex_aluop1_source is set to 1, controlling the ALU’s first operand input to be a register.

I had forgotten that, depending on the instruction, `op1` must be either the instruction address or a register value, and `op2` either a register value or an immediate. After taking this into account, all tests passed:

```
[info] Run completed in 27 seconds, 916 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

## Waveform

I wrote an assembly program `Instruction.s` in `csrc` to inspect the waveform:

```
.text
_start:
    addi x1, x0, 3
exit:
    j exit
```

* **Instruction Fetch**

  When `instruction_valid` is high, `instruction` takes the value of `instruction_read_data`.
  Also, when `jump_flag_id` is high, `instruction_address` becomes the jump address, which here is the address of the label `exit`.

  ![image](https://hackmd.io/_uploads/B14p7yDrT.png)

* **Instruction Decode**

  Since the first instruction is `addi`, `aluop1_source` is 0, which selects a register as the first ALU operand, and `aluop2_source` is 1, which selects the immediate as the second operand.

  ![image](https://hackmd.io/_uploads/HyTFNJPr6.png)

  When the instruction is `j`, `aluop1_source` becomes 1, which selects the instruction address as the first operand.

  ![image](https://hackmd.io/_uploads/r1uvIyvH6.png)

  This stage also decodes the instruction into its opcode, the `rs1`/`rs2`/`rd` register addresses, and so on.

* **Execute**

  In this stage, `op1` and `op2` are the value 0 from `x0` and the immediate 3, and `result` shows 3. When the instruction changes to `j`, `jump_flag` goes to 1.

  ![image](https://hackmd.io/_uploads/r1gVDyDB6.png)

* **Memory**

  `Instruction.s` neither reads nor writes memory, so `memory_read_enable` and `memory_write_enable` are both 0.

  ![image](https://hackmd.io/_uploads/H1wlFkDBT.png)

* **Write Back**

  In the write-back stage, the computed data or the data read from memory is written into the registers.

  ![image](https://hackmd.io/_uploads/S1RcFyPST.png)

## Modify handwritten RISC-V code in Assignment 2

I modified the assembly code from Assignment 2 by removing the `ecall` and `RDCYCLE`/`RDCYCLEH` instructions; the result is stored in register `s3`.
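The program multiplies two bfloat16 numbers, so before running it on `mycpu` it helps to know which bits `s3` should end up holding. The following plain-Scala sketch (the `BFloat16` object is hypothetical, not part of `mycpu` or my Assignment 2 code) rounds a `Float` to bfloat16 by dropping its lower 16 bits with round-to-nearest-even, multiplies in single precision, and rounds back. Because two bfloat16 significands have 8 bits each, their float32 product is exact, so only one rounding happens. It assumes the program leaves the bfloat16 result in the upper half of `s3` with the lower half zero, which matches the expected value `0x43000000` used in the test, and it ignores NaN inputs.

```scala
// Software sketch of bfloat16 multiplication (hypothetical helper, not part of mycpu).
// bfloat16 = upper 16 bits of an IEEE-754 float32. NaN handling is omitted.
object BFloat16 {
  // Round a float32 to bfloat16 (round-to-nearest-even on the dropped 16 bits).
  def fromFloat(f: Float): Int = {
    val bits = java.lang.Float.floatToRawIntBits(f)
    val lsb  = (bits >>> 16) & 1 // tie-breaker: round half to even
    ((bits + 0x7fff + lsb) >>> 16) & 0xffff
  }

  // Widen bfloat16 back to float32 by appending 16 zero bits.
  def toFloat(b: Int): Float = java.lang.Float.intBitsToFloat((b & 0xffff) << 16)

  // Multiply in float32 (exact for bfloat16 inputs), then round to bfloat16.
  def mul(a: Int, b: Int): Int = fromFloat(toFloat(a) * toFloat(b))

  // Value as it would appear in a 32-bit register: bfloat16 in the upper half.
  def toRegister(b: Int): Int = (b & 0xffff) << 16
}
```

For the test inputs 1.29999 and 99.09999, the operands become `0x3FA6` (1.296875) and `0x42C6` (99.0); their product 128.390625 rounds to `0x4300` (128.0), i.e. `0x43000000` in the register. Plain truncation gives the same bits for these inputs, so the expected value does not depend on which rounding the assembly uses.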
### Makefile

Add the assembly programs `hw3.s` and `instruction.s` to `BINS` so that their `.asmbin` files are generated:

```diff
CROSS_COMPILE ?= riscv-none-elf-

ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
CFLAGS = -O0 -Wall -march=rv32i_zicsr -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv

AS := $(CROSS_COMPILE)as
CC := $(CROSS_COMPILE)gcc
LD := $(CROSS_COMPILE)ld
OBJCOPY := $(CROSS_COMPILE)objcopy

%.o: %.S
	$(AS) -R $(ASFLAGS) -o $@ $<

%.elf: %.S
	$(AS) -R $(ASFLAGS) -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o)

%.elf: %.c init.o
	$(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o

%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@

BINS = \
	fibonacci.asmbin \
	hello.asmbin \
	mmio.asmbin \
	quicksort.asmbin \
	sb.asmbin \
+	hw3.asmbin \
+	instruction.asmbin

# Clear the .DEFAULT_GOAL special variable, so that the following turns
# to the first target after .DEFAULT_GOAL is not set.
.DEFAULT_GOAL :=

all: $(BINS)

update: $(BINS)
	cp -f $(BINS) ../src/main/resources

clean:
	$(RM) *.o *.elf *.asmbin
```

```
$ make update
```

In `src/test/scala/riscv/singlecycle/CPUTest.scala`, add the following test:

```scala
class hw3 extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "do bfloat16 multiplication" in {
    test(new TestTopModule("hw3.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 500) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.regs_debug_read_address.poke(19.U) // s3 (x19)
      c.io.regs_debug_read_data.expect(0x43000000.U)
    }
  }
}
```

In my assembly code, the bfloat16 product is stored in `s3`, which is `x19`, and the product `1.29999 * 99.09999` should be `0x43000000`. Then, run the tests:
```
$ sbt test
```

```
[info] - should do bfloat16 multiplication *** FAILED ***
[info]   io_regs_debug_read_data=133 (0x85) did not equal expected=1124073472 (0x43000000) (lines in CPUTest.scala: 126, 120) (CPUTest.scala:126)
[info] *** 1 TEST FAILED ***
[error] Failed tests:
[error] riscv.singlecycle.hw3
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
```

The test failed because the register value does not match the expected value `0x43000000`, so I used GTKWave to check the signals:

```
$ ./run-verilator.sh -instruction src/main/resources/hw3.asmbin -time 2000 -vcd dump.vcd
$ gtkwave dump.vcd
```

![image](https://hackmd.io/_uploads/SkcLzhPrT.png)

At cycle 500, the program has only run about halfway. The waveform shows that register `s3` holds the expected value by cycle 681.

![image](https://hackmd.io/_uploads/SykQX3vra.png)

Therefore, I increased the number of simulated cycles to 1000:

```scala
class hw3 extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "do bfloat16 multiplication" in {
    test(new TestTopModule("hw3.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 1000) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.regs_debug_read_address.poke(19.U) // s3 (x19)
      c.io.regs_debug_read_data.expect(0x43000000.U)
    }
  }
}
```

All tests passed!

```
[info] hw3:
[info] Single Cycle CPU
[info] - should do bfloat16 multiplication
[info] Run completed in 29 seconds, 278 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```