## Assignment3: Single-cycle RISC-V CPU
contributed by < [yuchen0620](https://github.com/yuchen0620) >

[Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p)

## Prerequisites
Install the dependent packages.
- For Ubuntu Linux
```shell
$ sudo apt install build-essential verilator gtkwave
```

Install [sbt](https://www.scala-sbt.org/)
```shell
# Install sdkman
$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"
# Install Eclipse Temurin JDK 11
$ sdk install java 11.0.21-tem
$ sdk install sbt
```

### Hello World in Chisel
```scala
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U;
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
The role of each value:
- `CNT_MAX`: a constant (not a register) holding the maximum counter value, 24999999.
- `cntReg`: a 32-bit register that acts as a counter.
- `blkReg`: a 1-bit register that acts as a flag deciding the output value.
- `led`: a 1-bit output whose value (0 or 1) follows `blkReg`.

This Hello World describes a circuit. `cntReg` increments by 1 every clock cycle. When it reaches `CNT_MAX`, which happens once every 25000000 clock cycles, `cntReg` is reset to 0 and `blkReg` is inverted. The output `led` therefore toggles between 0 and 1 every 25000000 cycles, following the value of `blkReg`.

## Single-cycle RISC-V CPU
We have to construct a single-cycle RISC-V CPU named `Mycpu` by completing `InstructionFetch.scala`, `InstructionDecode.scala`, `Execute.scala`, and `CPU.scala`.

### InstructionFetch
![image](https://hackmd.io/_uploads/Bk7deBPBp.png)

At the instruction fetch stage, we need to update the PC register to determine the address of the next instruction. Based on `jump_flag_id`, the next PC is either the jump address or `PC + 4`.

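
The selection rule above can be sketched as a small software model. This is plain Scala rather than Chisel, and `NextPcModel`, `nextPc`, and the parameter names are illustrative assumptions, not identifiers from the lab code. It only mirrors the rule "jump address if the flag is set, otherwise `PC + 4`".

```scala
// Software reference model of the next-PC selection (plain Scala, not Chisel).
// `NextPcModel` and `nextPc` are hypothetical names used only for illustration.
object NextPcModel {
  // Keep every result within 32 bits, like a 32-bit PC register would.
  private val Mask = 0xFFFFFFFFL

  def nextPc(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long =
    if (jumpFlag) jumpAddress & Mask // taken jump: load the target address
    else (pc + 4) & Mask             // otherwise: advance to the next instruction
}
```

For example, `NextPcModel.nextPc(0x1000L, false, 0L)` yields `0x1004`, and `NextPcModel.nextPc(0x100CL, true, 0x1000L)` yields `0x1000`, which corresponds to the jump from `0x0000100C` back to `0x00001000` described in this section.
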

**jump_flag_id = 0**
![image](https://hackmd.io/_uploads/SyFhwHDSa.png)
When `jump_flag_id = 0`, the PC simply advances by 4.

**jump_flag_id = 1**
![image](https://hackmd.io/_uploads/Byh3_SDr6.png)
When `jump_flag_id = 1`, the PC jumps back to `0x00001000` from `0x0000100C`.

### InstructionDecode
At the instruction decode stage, we need to drive `memory_read_enable` and `memory_write_enable` according to the opcode.
If the instruction type is L (load), we set `memory_read_enable` to 1, otherwise to 0.
If the instruction type is S (store), we set `memory_write_enable` to 1, otherwise to 0.

![image](https://hackmd.io/_uploads/Sy6P6BvB6.png)

- `Instruction = 0x00A02223` is an S-type instruction, thus `io_memory_write_enable` is set to 1 and `io_memory_read_enable` is set to 0.
- `Instruction = 0x000022B7` is a `lui` instruction, thus both `io_memory_write_enable` and `io_memory_read_enable` are set to 0.
- `Instruction = 0x002081B3` is an `add` instruction, thus both `io_memory_write_enable` and `io_memory_read_enable` are set to 0.

### Execute
At the execute stage, we have to assign values to the input ports of the ALU.
```scala
class ALU extends Module {
  val io = IO(new Bundle {
    val func = Input(ALUFunctions())

    val op1 = Input(UInt(Parameters.DataWidth))
    val op2 = Input(UInt(Parameters.DataWidth))

    val result = Output(UInt(Parameters.DataWidth))
  })
  // ALU logic omitted in this excerpt
}
```
Based on `aluop1_source` and `aluop2_source`, we can determine whether `op1` should be reg_data or instruction_address, and whether `op2` should be reg_data or the immediate.

**The testing way**
The first test runs the `add` instruction about 100 times with random `op1` and `op2`. The next is the `beq` test, which covers two cases: one where the operands are equal and one where they are not. In both cases we check whether `if_jump_flag` and `if_jump_address` match our expectations.
```scala
class ExecuteTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Execution of Single Cycle CPU")
  it should "execute correctly" in {
    test(new Execute).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x001101b3L.U) // x3 = x2 + x1

      var x = 0
      for (x <- 0 to 100) {
        val op1    = scala.util.Random.nextInt(429496729)
        val op2    = scala.util.Random.nextInt(429496729)
        val result = op1 + op2
        val addr   = scala.util.Random.nextInt(32)

        c.io.reg1_data.poke(op1.U)
        c.io.reg2_data.poke(op2.U)

        c.clock.step()
        c.io.mem_alu_result.expect(result.U)
        c.io.if_jump_flag.expect(0.U)
      }

      // beq test
      c.io.instruction.poke(0x00208163L.U) // pc + 2 if x1 === x2
      c.io.instruction_address.poke(2.U)
      c.io.immediate.poke(2.U)
      c.io.aluop1_source.poke(1.U)
      c.io.aluop2_source.poke(1.U)
      c.clock.step()

      // equ
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(9.U)
      c.clock.step()
      c.io.if_jump_flag.expect(1.U)
      c.io.if_jump_address.expect(4.U)

      // not equ
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(19.U)
      c.clock.step()
      c.io.if_jump_flag.expect(0.U)
      c.io.if_jump_address.expect(4.U)
    }
  }
}
```

**add test**
We can observe that `io_aluop1_source = 0` and `io_aluop2_source = 0`, hence both ALU operands come from reg_data. Besides, `io_if_jump_address = 0`, `io_if_jump_flag = 0`, and `io_mem_alu_result (1AAA7DB1) = io_reg1_data (1180D150) + io_reg2_data (0929AC61)`.
![image](https://hackmd.io/_uploads/HJTO0IDHa.png)

**beq test**
`equal`
When `io_reg1_data = io_reg2_data`, `io_if_jump_address = 4` and `io_if_jump_flag = 1`.
![image](https://hackmd.io/_uploads/S169pIvrp.png)

`not equal`
When `io_reg1_data != io_reg2_data`, `io_if_jump_address = 4` (the same as in the previous clock cycle, since `instruction_address` and `immediate` have not changed) and `io_if_jump_flag = 0`.
![image](https://hackmd.io/_uploads/Skh-C8DrT.png)

We can observe that `io_aluop1_source = 1` and `io_aluop2_source = 1` in both situations.
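
The values checked in the two tests can be reproduced with a small software model. This is plain Scala rather than Chisel, and every name in it (`ExecuteModel`, `selectOp1`, `selectOp2`, `aluAdd`, `beq`) is a hypothetical helper, not part of the lab code. It assumes that source value 0 selects register data, and that the jump target is `instruction_address + immediate`, which is what the observed value 4 = 2 + 2 suggests.

```scala
// Plain-Scala reference model of operand selection and beq handling (not Chisel).
// All names are illustrative; the jump-target formula is an assumption.
object ExecuteModel {
  private val Mask = 0xFFFFFFFFL // keep results within 32 bits

  // Source 0 selects register data, any other value selects the instruction address.
  def selectOp1(source: Int, reg1: Long, instructionAddress: Long): Long =
    if (source == 0) reg1 else instructionAddress

  // Source 0 selects register data, any other value selects the immediate.
  def selectOp2(source: Int, reg2: Long, immediate: Long): Long =
    if (source == 0) reg2 else immediate

  // 32-bit wrapping addition, as used for the `add` instruction.
  def aluAdd(op1: Long, op2: Long): Long = (op1 + op2) & Mask

  // beq: the flag depends on the register comparison,
  // while the target address does not depend on it.
  def beq(reg1: Long, reg2: Long, instructionAddress: Long, immediate: Long): (Boolean, Long) =
    (reg1 == reg2, (instructionAddress + immediate) & Mask)
}
```

With the waveform values, `aluAdd(selectOp1(0, 0x1180D150L, 0L), selectOp2(0, 0x0929AC61L, 0L))` gives `0x1AAA7DB1`. `beq(9L, 9L, 2L, 2L)` gives `(true, 4)` and `beq(9L, 19L, 2L, 2L)` gives `(false, 4)`, matching the two `beq` cases.
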
### Combining into a CPU
We have to connect all the components together according to the single-cycle CPU architecture diagram.
![image](https://hackmd.io/_uploads/rJ8wDwwHT.png)

In this section, we have to connect the inputs of the Execute module to the outputs of the other modules. The Execute module has 7 inputs in total, and each one has to be connected to the correct wire.

### sbt test
Run the following command to check our implementation.
```shell
$ sbt test
```
The successful output I get:
```
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 8 seconds, 500 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

## Modify Handwritten RISC-V assembly code
I replace the `ecall` that printed the result on the console with stores that write the results of the four test cases to memory addresses `0x4`, `0x8`, `0xC`, and `0x10`.
```
main:
    la s1, test
    lw a0, 0(s1)
    lw a1, 4(s1)
    jal ra, palindrome_detected
    addi s2, s2, 4
    sw a0, 0(s2)

    lw a0, 8(s1)
    lw a1, 12(s1)
    jal ra, palindrome_detected
    sw a0, 4(s2)

    lw a0, 16(s1)
    lw a1, 20(s1)
    jal ra, palindrome_detected
    sw a0, 8(s2)

    lw a0, 24(s1)
    lw a1, 28(s1)
    jal ra, palindrome_detected
    sw a0, 12(s2)
```
I also modify the main function of my C code to store the results in memory.
```c
int main(){
    uint64_t testA = 0x0000000000000000; //0 is palindrome
    uint64_t testB = 0x0000000000000001; //testB not palindrome
    uint64_t testC = 0x00000C0000000003; //testC is palindrome
    uint64_t testD = 0x0F000000000000F0; //testD not palindrome

    *((volatile int *) 4) = palindrome_detected(testA);
    *((volatile int *) 8) = palindrome_detected(testB);
    *((volatile int *) 12) = palindrome_detected(testC);
    *((volatile int *) 16) = palindrome_detected(testD);
    return 0;
}
```
The next step is to put the C file and the assembly file into the directory `ca2023-lab3/csrc`.
To regenerate the RISC-V programs for unit tests, change to the `csrc` directory and use `make update` to generate the asmbin file.
```
$ cd ~/ca2023-lab3/csrc
$ make update
```
For `make update` to build the new program, the Makefile in the `csrc` directory has to list its asmbin file under `BINS`.
```
BINS = \
	fibonacci.asmbin \
	hello.asmbin \
	mmio.asmbin \
	quicksort.asmbin \
	sb.asmbin \
	palindrome_opt_hw3.asmbin
```
Add the `PalindromeTest` to the `CPUTest.scala` file.
```scala
class PalindromeTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "check a 64-bit uint is palindrome or not" in {
    test(new TestTopModule("palindrome_opt_hw3.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }

      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)

      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)

      c.io.mem_debug_read_address.poke(12.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)

      c.io.mem_debug_read_address.poke(16.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)
    }
  }
}
```
Go back to the directory `ca2023-lab3` and run the `PalindromeTest`.
```
$ cd ~/ca2023-lab3
$ sbt "testOnly riscv.singlecycle.PalindromeTest"
```
The successful output we get:
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/ubuntu/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/ubuntu/ca2023-lab3/)
[info] PalindromeTest:
[info] Single Cycle CPU
[info] - should check a 64-bit uint is palindrome or not
[info] Run completed in 4 seconds, 582 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Verilator analysis
To load the `palindrome_opt_hw3.asmbin` file, simulate for 1000 cycles, and save the simulation waveform to the `dump.vcd` file, we can run the following commands.
```
$ make verilator
$ ./run-verilator.sh -instruction csrc/palindrome_opt_hw3.asmbin -time 2000 -vcd dump.vcd
```