# Assignment3: single-cycle RISC-V CPU
contributed by [yptang5488](https://github.com/yptang5488/ca2023-lab3)

## Prerequisites
Follow the environment setup in [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p#Development-Objectives-of-this-Project).

### install the dependent packages
```
$ sudo apt install build-essential verilator gtkwave
```

### install sbt
```shell
# install SDKMAN
$ curl -s "https://get.sdkman.io" | bash
# load SDKMAN into the current shell (adjust the path to your own home directory)
$ source "/home/cgvsl/.sdkman/bin/sdkman-init.sh"
# install JDK and sbt using SDKMAN
$ sdk install java $(sdk list java | grep -o "\b8\.[0-9]*\.[0-9]*\-tem" | head -1)
$ sdk install sbt
```

## Hello World in Chisel
```scala!
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W)) // cycle counter
  val blkReg = RegInit(0.U(1.W))  // current LED state

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
- `CNT_MAX` : the maximum value of `cntReg`
- `cntReg` : counter
    - increments by 1 every cycle until `cntReg` = `CNT_MAX`, then resets to zero
- `blkReg` : drives `io.led`
    - inverted (NOT operation) each time `cntReg` = `CNT_MAX`
- `io.led` : output of the module `Hello`

### enhance
I replace the `when` block with an explicit `Mux`, which states the selection directly (Chisel lowers `when` to multiplexers as well, so the generated hardware should be similar). `blkReg` is then toggled by XORing it with the comparison result, which removes the need for an explicit `when` statement.
```scala
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))

  // wrap the counter to zero at CNT_MAX, otherwise count up
  cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
  // toggle the LED state only in the cycle where the counter hits CNT_MAX
  blkReg := blkReg ^ (cntReg === CNT_MAX)
  io.led := blkReg
}
```

## Single-Cycle RISC-V CPU with Chisel
In [ca2023-lab3](https://github.com/sysprog21/ca2023-lab3/tree/main), the code for each program segment annotated with `//lab3` has to be filled in so that the tests pass.

I run the CPU tests with:
```
$ sbt test
```
The output is:
```
[info] welcome to sbt 1.9.7 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from {myFilePath}/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:{myFilePath}/ca2023-lab3/)
[info] compiling 1 Scala source to {myFilePath}/ca2023-lab3/target/scala-2.13/test-classes ...
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 14 seconds, 205 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 19 s, completed Nov 29, 2023 5:00:23 PM
```
This shows that the CPU passes all of the provided tests.

### My Implementation
#### Instruction Fetch
Use `jump_flag_id` to determine whether a jump is taken and assign the next value of `pc` (i.e. `instruction_address`) accordingly: the jump address when the flag is set, otherwise `pc + 4`.

#### Instruction Decode
Use `opcode` to set the `memory_read_enable` and `memory_write_enable` control lines, which decide whether the memory is read or written.
- memory read enable : load instruction (`opcode = b0000011`)
- memory write enable : store instruction (`opcode = b0100011`)

![image](https://hackmd.io/_uploads/rye92KTEBp.png)

- **case 1 : read**
    - Instruction `0x0003AA03` is equal to `lw x20, 0(x7)`
    - `opcode = 0x03 (b0000011)`, so `memory_read_enable = 1`
- **case 2 : store**
    - Instruction `0x00112023` is equal to `sw x1, 0(x2)`
    - `opcode = 0x23 (b0100011)`, so `memory_write_enable = 1`

#### Execute
Implement the inputs of the ALU. The ALU control line `alu.io.func` comes from the output of the ALU control unit, `alu_ctrl.io.alu_funct`, and the two ALU operands are selected by the `aluop1_source` and `aluop2_source` control lines, respectively, through two Muxes.

![image](https://hackmd.io/_uploads/Hk45ZANrT.png)

In this waveform graph, we can see that the ALU inputs `io_op1` and `io_op2` are selected by `aluop1_source` and `aluop2_source`, respectively.

#### CPU
In `CPU.scala`, the inputs and outputs of all the components are connected together so that the CPU works correctly.
My part of the implementation sets up the inputs of the execute stage, including :
- input data sources
    - `ex.io.instruction_address`
    - `ex.io.reg1_data`
    - `ex.io.reg2_data`
    - `ex.io.immediate`
- input controls
    - `ex.io.aluop1_source`
    - `ex.io.aluop2_source`
- instructions : `ex.io.instruction`

The instruction is used to extract `func`, `opcode`, etc., which serve as controls for the ALU.

### Unit Test
#### Instruction Fetch Test
The test assigns an arbitrary `io.jump_flag_id` and checks whether `io.instruction_address` holds the expected address.

![image](https://hackmd.io/_uploads/ry_twHEr6.png)

From the waveform graph, you can see that when `io_jump_flag_id = 1`, `io_instruction_address` is set to `io_jump_address_id (0x1000)` instead of `io_instruction_address + 4 (0x1010)`.

#### Instruction Decoder Test
The test uses three instructions, an `s-type` instruction, `lui` and `add`, to check whether the two ALU source selects, `ex_aluop1_source` and `ex_aluop2_source`, are correct.

In `InstructionDecode.scala`, two objects, `ALUOp1Source` and `ALUOp2Source`, define the sources and the corresponding select values of the Muxes :
- `ex_aluop1_source`
    - 0 : Register (others)
    - 1 : InstructionAddress (`auipc`, `B-type`, `jal`)
- `ex_aluop2_source`
    - 0 : Register (`R-type`)
    - 1 : Immediate (others)

- **test 1 : lui**
    ![image](https://hackmd.io/_uploads/H1_EaHNS6.png)
    - Instruction `0x000022b7` stands for `lui x5, 2`
    - check that `ex_aluop1_source` = `0` (Register), since `lui` does not use the instruction address as an operand
    - check that `ex_aluop2_source` = `1` (Immediate), to feed the immediate value to the ALU
- **test 2 : add**
    ![image](https://hackmd.io/_uploads/HkEoTrVHp.png)
    - Instruction `0x002081b3` stands for `add x3, x1, x2`
    - check that `ex_aluop1_source` = `0` (Register), to feed the `rs1` data to the ALU
    - check that `ex_aluop2_source` = `0` (Register), since `add` is an `R-type` instruction and takes `rs2` instead of an immediate

#### Execute Test
The test uses two instructions, `add` and `beq`, to check whether the outputs `mem_alu_result` and `if_jump_flag` match the expected results. For a jump-related instruction (i.e. `beq`), the `if_jump_address` value calculated by the ALU is checked as well.

- **test 1 : add**
    ![image](https://hackmd.io/_uploads/SkHVzI4ST.png)
    - Instruction `0x001101b3` stands for `add x3, x2, x1`
    - check that `io_mem_alu_result` = `io_reg1_data` + `io_reg2_data`
        - In the waveform graph, we can see that `0x2a6400da` = `0x118f33fa` + `0x18d4cce0`
    - check that `if_jump_flag` = `0`
- **test 2 : beq**
    ![image](https://hackmd.io/_uploads/HkmCfAEBT.png)
    - Instruction `0x00208163` stands for `beq x1, x2, 2`
    - equal case :
        - `io_if_jump_flag` = `1` when `io_reg1_data` = `io_reg2_data` = `9`
        - `io_if_jump_address` is set to `4`

### Summary of different test cases
Three different programs are tested: Fibonacci Test, Quicksort Test and Byte Access Test. The first two tests check memory and the last test checks registers.

#### read the binary file
The `readAsmBinary` function in the custom class `InstructionROM` reads the `.asmbin` file. `inputStream` is an input stream of bytes obtained through the `java.nio` package.
```scala
var instructions = new Array[BigInt](0)
val arr = new Array[Byte](4)
// read one 4-byte word per iteration until the stream is exhausted
while (inputStream.read(arr) == 4) {
  val instBuf = ByteBuffer.wrap(arr)
  instBuf.order(ByteOrder.LITTLE_ENDIAN)
  // mask to 32 bits so the word is treated as unsigned
  val inst = BigInt(instBuf.getInt() & 0xffffffffL)
  instructions = instructions :+ inst
}
```
In this function, each iteration reads four bytes and interprets them as one little-endian 32-bit instruction.

#### clock step setting of memory and register
In the class `TestTopModule`, I found that `cpu.io.debug_read_address` is connected inside the `CPU_tick` clock domain while `mem.io.debug_read_address` is connected outside of it.
```scala
  withClock(CPU_tick.asClock) {
    val cpu = Module(new CPU)
    cpu.io.debug_read_address := 0.U
    /* various interactions with the CPU module are performed here */
    cpu.io.debug_read_address := io.regs_debug_read_address
    io.regs_debug_read_data := cpu.io.debug_read_data
  }
  mem.io.debug_read_address := io.mem_debug_read_address
  io.mem_debug_read_data := mem.io.debug_read_data
}
```
This difference is probably because the CPU, and therefore its register file, is clocked by the derived `CPU_tick` clock while the memory is not.

#### avoiding timeout
There is an initial loop before the checks in each test :
```scala
for (i <- 1 to 50) {
  c.clock.step(1000)
  c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
}
```
The loop advances the simulation by 50 × 1000 clock cycles so that the program has enough time to finish before memory is inspected. Poking `mem_debug_read_address` in every iteration keeps the testbench active, which is what the `// Avoid timeout` comment refers to.

At first I used `c.clock.step()` instead of `c.clock.step(1000)` in my tests, and the value I read from memory was always zero, presumably because the program had not yet run long enough to write its results.

## Adapt RISC-V assembly code to myCPU
I take the handwritten assembly code from [homework2](https://hackmd.io/YEt_6po6SgigOtBNi5Tm2Q?view) and run it on this CPU.

### converted into binary file
1. put the assembly code in `/csrc`
2. modify the Makefile and execute `make` in the terminal to get the `.asmbin` file
3. copy the `.asmbin` file into `src/main/resources`

```
cp -fr csrc/slicing_rv32emu.asmbin src/main/resources/slicing_rv32emu.asmbin
```

### test case in MyCPU
I wrote my own test cases by referring to the other tests. There are 3 cases, and the expected answers of each case are kept in a list for easy access.
```scala
val test_size = List(4, 9, 6)
// test case 0: results start at byte address 4
val test0_ans = List(255.U, 0.U, 255.U, 0.U)
val base_addr = 4
for (i <- 0 until test_size(0)) {
  c.io.mem_debug_read_address.poke((i * 4 + base_addr).U)
  c.clock.step()
  c.io.mem_debug_read_data.expect(test0_ans(i))
}
// test case 1: results start at byte address 20
val test1_ans = List(255.U, 255.U, 0.U, 255.U, 255.U, 255.U, 255.U, 255.U, 255.U)
for (i <- 0 until test_size(1)) {
  c.io.mem_debug_read_address.poke((i * 4 + 20).U)
  c.clock.step()
  c.io.mem_debug_read_data.expect(test1_ans(i))
}
// test case 2: results start at byte address 56
val test2_ans = List(0.U, 0.U, 255.U, 255.U, 255.U, 255.U)
for (i <- 0 until test_size(2)) {
  c.io.mem_debug_read_address.poke((i * 4 + 56).U)
  c.clock.step()
  c.io.mem_debug_read_data.expect(test2_ans(i))
}
```
I run my test with :
```
$ sbt "testOnly riscv.singlecycle.SlicingTest"
```
Finally, the terminal output shows that the program passes the test on the CPU :
```
[info] welcome to sbt 1.9.7 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from {myFilePath}/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:{myFilePath}/ca2023-lab3/)
[info] compiling 1 Scala source to {myFilePath}/ca2023-lab3/target/scala-2.13/test-classes ...
[info] SlicingTest:
[info] Single Cycle CPU
[info] - should calculate the bit slicing matrix
[info] Run completed in 5 seconds, 342 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Nov 28, 2023 10:47:32 PM
```

### waveform
To make sure that the program runs correctly on the CPU, I use Verilator for simulation.
```shell
$ make verilator  # build the simulation executable
$ ./run-verilator.sh -instruction src/main/resources/slicing_rv32emu.asmbin -time 1800 -vcd dump.vcd
```
The output message is as follows :
```
-time 1800
-memory 1048576
-instruction src/main/resources/slicing_rv32emu.asmbin
[-------------------->] 100%
```
This produces the `dump.vcd` file, which records the waveforms of the CPU as it runs.

I chose three different instructions to check whether the CPU executes correctly.

#### case 1
![image](https://hackmd.io/_uploads/rJKbN3NSp.png)

Instruction `0x33336313` stands for `ori t1, t1, 0x333` (the 12-bit immediate field is `0x333`, i.e. 819).
- ALU
    - `alu_io_func` is set to `6` (`ALUFunctions.or`), which means `ori` executes the `or` function in the ALU (`result = op1 | op2`)
- Registers
    - write the answer `0x33333333` back to the `t1 (06)` register

#### case 2
![image](https://hackmd.io/_uploads/SJ3HwnNST.png)

Instruction `0x038000ef` stands for `jal ra, 56 (0x38)`.
- ALU
    - jump-related instruction, so `ex_io_if_jump_flag` = `1`
    - `alu_io_func` = `1` (`ALUFunctions.add`)
    - `result` (`0x104C`) = `inst_fetch_io_instruction_address` (`0x1014`) + `ex_io_immediate` (`0x38`), so the next `inst_fetch_io_instruction_address` is `0x104C`
- Registers
    - write back `ra (01)` = `inst_fetch_io_instruction_address` (`0x1014`) + 4 = `0x1018`, the return address that case 3 later reads from `ra`

#### case 3
![image](https://hackmd.io/_uploads/Hyf8ypVrp.png)

Instruction `0x00112023` stands for `sw ra, 0(sp)`.
- Register File
    - read register `ra (01)` and get the value `0x1018`
    - immediate = `0x0`
- ALU
    - calculate the target address
    - `alu_io_func` = `1` (`ALUFunctions.add`)
    - `result` (`0xFFFFFFFC`) = `regs_io_read_data1` (`0xFFFFFFFC`) + `id_io_ex_immediate` (`0x0`)
- Memory
    - `memory_write_enable` = `1`
    - write the data (`0x1018`) read from `ra` to the memory address held in `sp` (`0xFFFFFFFC`)
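
The field values read from the three waveforms can be cross-checked in software by decoding the raw instruction words. The following is a minimal plain-Scala sketch (no Chisel required) based on the standard RV32I instruction formats. The object `DecodeSketch` and its helper names are hypothetical and are not part of the lab code; the `pc` value `0x1014` is taken from the case 2 waveform.

```scala
// Hypothetical helper, not part of ca2023-lab3: decodes RV32I fields in plain Scala.
object DecodeSketch {
  // Extract bits hi..lo (inclusive) of a 32-bit word stored in a Long.
  def bits(inst: Long, hi: Int, lo: Int): Long =
    (inst >> lo) & ((1L << (hi - lo + 1)) - 1)

  // Sign-extend the low `width` bits of `value`.
  def signExtend(value: Long, width: Int): Long = {
    val shift = 64 - width
    (value << shift) >> shift
  }

  def opcode(inst: Long): Long = bits(inst, 6, 0)
  def rd(inst: Long): Long     = bits(inst, 11, 7)
  def funct3(inst: Long): Long = bits(inst, 14, 12)
  def rs1(inst: Long): Long    = bits(inst, 19, 15)
  def rs2(inst: Long): Long    = bits(inst, 24, 20)

  // I-type immediate: inst[31:20]
  def immI(inst: Long): Long = signExtend(bits(inst, 31, 20), 12)

  // S-type immediate: inst[31:25] ++ inst[11:7]
  def immS(inst: Long): Long =
    signExtend((bits(inst, 31, 25) << 5) | bits(inst, 11, 7), 12)

  // J-type immediate: inst[31] ++ inst[19:12] ++ inst[20] ++ inst[30:21] ++ 0
  def immJ(inst: Long): Long =
    signExtend(
      (bits(inst, 31, 31) << 20) | (bits(inst, 19, 12) << 12) |
        (bits(inst, 20, 20) << 11) | (bits(inst, 30, 21) << 1),
      21
    )

  def main(args: Array[String]): Unit = {
    val ori = 0x33336313L
    val jal = 0x038000efL
    val sw  = 0x00112023L
    val pc  = 0x1014L // address of the jal, as read from the waveform

    println(f"ori: rd=x${rd(ori)} rs1=x${rs1(ori)} imm=0x${immI(ori)}%x")
    println(f"jal: rd=x${rd(jal)} target=0x${pc + immJ(jal)}%x link=0x${pc + 4}%x")
    println(f"sw : rs1=x${rs1(sw)} rs2=x${rs2(sw)} imm=${immS(sw)}")
  }
}
```

Running `DecodeSketch.main` should report `x6` (`t1`) as both source and destination of the `ori` with immediate `0x333`, a `jal` target of `0x104c` with link value `0x1018`, and `rs1 = x2` (`sp`), `rs2 = x1` (`ra`) with a zero offset for the `sw`.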