Assignment3: Single-cycle RISC-V CPU

# Assignment3: Single-cycle RISC-V CPU
contributed by < [`ChengChiTing`](https://github.com/ChengChiTing) >

###### tags: `RISC-V`, `jserv`

## Implementation in Chisel

### Operating System
Ubuntu 22.04.3

### Environment Setup
Follow the instructions in [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p) and install the essential packages.

#### Install Verilator and GTKWave
```
$ sudo apt install build-essential verilator gtkwave
```
Verilator version: 4.038
GTKWave Analyzer version: 3.3.104

#### Install sbt (the Scala build tool)
```shell
$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"
$ sdk install java 11.0.21-tem
$ sdk install sbt
```
Java version: 11.0.21
sbt version: 1.9.7

>Note: the Java version is crucial!
>Versions 11 and 17 can run sbt commands, but I encountered errors when I used Java 21 during my first environment setup.

## Chisel Tutorial
Before we start Lab 3, we have to learn the fundamental concepts of Chisel. Chisel is a domain specific language (DSL) implemented using Scala’s macro features. We can get the repository with the following git command.
```
$ git clone https://github.com/ucb-bar/chisel-tutorial
```
After that, we can use the following commands to check whether sbt was installed successfully and runs correctly on our system.
```shell
$ cd chisel-tutorial
$ sbt run
```
The first run needs to download the necessary components. If sbt runs successfully, we get the following output:
```
[info] Loading project definition from /home/riscv/chisel-tutorial/project
[info] Loading settings for project chisel-tutorial from build.sbt ...
[info] Set current project to chisel-tutorial (in build file:/home/riscv/chisel-tutorial/)
[info] running hello.Hello
[info] [0.001] Elaborating design...
[info] [0.049] Done elaborating.
Computed transform order in: 106.1 ms
Total FIRRTL Compile Time: 223.9 ms
End of dependency graph
Circuit state created
[info] [0.001] SEED 1701115466014
test Hello Success: 1 tests passed in 6 cycles taking 0.008942 seconds
[info] [0.002] RAN 1 CYCLES PASSED
[success] Total time: 2 s, completed Nov 28, 2023, 4:04:27 AM
```

### Using Docker
Follow the instructions in [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p) and install Docker on Ubuntu. Run the following command:
```
$ docker run -it --rm -p 8888:8888 sysprog21/chisel-bootcamp
```
Copy the URL in the output message, which starts with http://127.0.0.1:8888/, and paste it into your web browser to access the [Jupyter Notebook](https://jupyter.org/).

### Learning Chisel Online
Learn how to write basic Scala code and the concepts of Chisel from the [Chisel tutorial](https://hub.ovh2.mybinder.org/user/freechipsproject-chisel-bootcamp-kinh6utn/lab/tree/1_intro_to_scala.ipynb). We should go through the following chapters and complete the exercises:
* 1_intro_to_scala
* 2.1_first_module
* 2.2_comb_logic
* 2.3_control_flow
* 2.4_sequential_logic
* 2.5_exercise
* 2.6_chiseltest
* 3.1_parameters
* 3.2_collections
* 3.2_interlude
* 3.3_higher-order_functions
* 3.4_functional_programming
* 3.5_object_oriented_programming
* 3.6_types

After completing all the exercises above, we can begin our work on the single-cycle RISC-V CPU.

## Single-cycle RISC-V CPU
Fork the GitHub repository ca2023-lab3.
```shell
$ git clone https://github.com/sysprog21/ca2023-lab3
$ cd ca2023-lab3
```
We can use the following command to check whether the single-cycle RISC-V CPU is implemented successfully.
```shell
$ sbt test
```
However, the Scala code in this repository is not entirely complete. If we run the tests without filling in the missing code, we get the error messages shown below:
```
[info] Run completed in 9 seconds, 985 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 3, failed 6, canceled 0, ignored 0, pending 0
[info] *** 6 TESTS FAILED ***
[error] Failed tests:
[error] riscv.singlecycle.InstructionDecoderTest
[error] riscv.singlecycle.ByteAccessTest
[error] riscv.singlecycle.InstructionFetchTest
[error] riscv.singlecycle.ExecuteTest
[error] riscv.singlecycle.FibonacciTest
[error] riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 15 s, completed Nov 28, 2023, 10:50:50 AM
```
Therefore, we have to fill in the Scala code and pass all of the core tests. The code related to the core is located in the `src/main/scala/riscv` directory.
If we want to run a single test, such as only InstructionFetchTest, we execute the following command:
```shell
$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"
```
We can check the single-cycle CPU architecture diagram to help us complete the CPU core.

![upload_95dee3babeb8f99e90e7a63866b7faa0](https://hackmd.io/_uploads/rJDTORzBT.png)

### Instruction Fetch
>Code can be found in src/main/scala/riscv/core/InstructionFetch.scala

:::spoiler Instruction Fetch scala code
```scala
class InstructionFetch extends Module {
  val io = IO(new Bundle {
    val jump_flag_id          = Input(Bool())
    val jump_address_id       = Input(UInt(Parameters.AddrWidth))
    val instruction_read_data = Input(UInt(Parameters.DataWidth))
    val instruction_valid     = Input(Bool())

    val instruction_address = Output(UInt(Parameters.AddrWidth))
    val instruction         = Output(UInt(Parameters.InstructionWidth))
  })
  val pc = RegInit(ProgramCounter.EntryAddress)

  when(io.instruction_valid) {
    io.instruction := io.instruction_read_data
    // lab3(InstructionFetch) begin

    // lab3(InstructionFetch) end

  }.otherwise {
    pc             := pc
    io.instruction := 0x00000013.U
  }
  io.instruction_address := pc
}
```
:::

We can compare it with the instruction fetch stage diagram shown below:

![if](https://hackmd.io/_uploads/SyWIhRzH6.png)

In
the instruction fetch stage, we have four inputs (`jump_flag_id`, `jump_address_id`, `instruction_read_data`, `instruction_valid`) and two outputs (`instruction_address`, `instruction`). We first check whether the instruction is valid with `instruction_valid`, then check `jump_flag_id`. If `jump_flag_id` is `True`, the PC is set to `jump_address_id`; otherwise, it is incremented to PC + 4.

### Instruction Decode
If we run InstructionDecoderTest, we get the following error:
```
[info] firrtl.passes.PassExceptions: firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/InstructionDecode.scala 127:14] : [module InstructionDecode] Reference io is not fully initialized.
[info] : io.memory_write_enable <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/InstructionDecode.scala 127:14] : [module InstructionDecode] Reference io is not fully initialized.
[info] : io.memory_read_enable <= VOID
```
Therefore, we need to find the correct signals to drive `io.memory_write_enable` and `io.memory_read_enable`. `io.memory_write_enable` is used to implement `store` operations and `io.memory_read_enable` is used to implement `load` operations.
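The fetch and decode rules described above can be sketched as a small software model. This is plain Scala, not the Chisel code the lab expects, and the names `SingleCycleModel`, `nextPc`, `memoryReadEnable`, and `memoryWriteEnable` are hypothetical. It assumes the standard RV32I opcodes for loads (`0000011`) and stores (`0100011`).

```scala
// Software sketch only: models the behavior described in the text,
// not the actual Chisel modules in ca2023-lab3. All names are hypothetical.
object SingleCycleModel {
  // Standard RV32I opcodes (bits 6..0 of the instruction).
  val OpcodeLoad: Int  = 0x03 // 0b0000011
  val OpcodeStore: Int = 0x23 // 0b0100011

  // Extract the 7-bit opcode field.
  def opcode(instruction: Long): Int = (instruction & 0x7f).toInt

  // Next-PC rule: hold the PC while the instruction is not valid,
  // take the jump target when the jump flag is set, otherwise PC + 4.
  def nextPc(pc: Long, valid: Boolean, jumpFlag: Boolean, jumpAddr: Long): Long =
    if (!valid) pc
    else if (jumpFlag) jumpAddr
    else (pc + 4) & 0xffffffffL // wrap to 32 bits

  // A load reads memory; a store writes memory.
  def memoryReadEnable(instruction: Long): Boolean  = opcode(instruction) == OpcodeLoad
  def memoryWriteEnable(instruction: Long): Boolean = opcode(instruction) == OpcodeStore
}
```

For example, `SingleCycleModel.memoryWriteEnable(0x00A02223L)` is `true`, since `0x00A02223` encodes `sw x10, 4(x0)`, while `memoryReadEnable` is `false` for the same word.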
![image](https://hackmd.io/_uploads/H1q2cXQra.png)

When we check the instruction types defined in `InstructionDecode.scala`, `load` instructions are defined in `InstructionsTypeL` and `store` instructions are defined in `InstructionsTypeS`.

>Code can be found in src/main/scala/riscv/core/InstructionDecode.scala

:::spoiler Instruction Decode scala code
```scala
object InstructionsTypeL {
  val lb  = "b000".U
  val lh  = "b001".U
  val lw  = "b010".U
  val lbu = "b100".U
  val lhu = "b101".U
}

object InstructionsTypeI {
  val addi  = 0.U
  val slli  = 1.U
  val slti  = 2.U
  val sltiu = 3.U
  val xori  = 4.U
  val sri   = 5.U
  val ori   = 6.U
  val andi  = 7.U
}

object InstructionsTypeS {
  val sb = "b000".U
  val sh = "b001".U
  val sw = "b010".U
}
```
:::

We can use the opcode to check whether the instruction is a `load` or a `store`, and then set each enable signal to `True` or `False` accordingly.

### Execution
If we run ExecuteTest, we get the following error:
```
[info] firrtl.passes.PassExceptions: firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/Execute.scala 32:24] : [module Execute] Reference alu is not fully initialized.
[info] : alu.io.op1 <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/Execute.scala 32:24] : [module Execute] Reference alu is not fully initialized.
[info] : alu.io.op2 <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/Execute.scala 32:24] : [module Execute] Reference alu is not fully initialized.
[info] : alu.io.func <= VOID
```
According to the execute stage diagram shown below, the two inputs `aluop1_source` and `aluop2_source` determine where each ALU operand comes from.

![image](https://hackmd.io/_uploads/SJ2Q0EmBT.png)

If we check `InstructionDecode.scala`, we can see how each `aluop_source` value maps to an operand source. Therefore, we can use conditionals to complete our code.
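The operand selection can be sketched with ordinary conditionals. The following is a plain-Scala model rather than the Chisel solution; `ExecuteModel` and its helpers are hypothetical names, and the encodings (0 selects a register, 1 selects the instruction address or the immediate) are assumed to match `ALUOp1Source` and `ALUOp2Source` in `InstructionDecode.scala`.

```scala
// Software sketch only: models ALU operand selection as described in the text.
// Names are hypothetical; encodings are assumed to mirror InstructionDecode.scala.
object ExecuteModel {
  val Op1Register: Int           = 0
  val Op1InstructionAddress: Int = 1
  val Op2Register: Int           = 0
  val Op2Immediate: Int          = 1

  // First ALU operand: rs1 data or the current instruction address.
  def selectOp1(source: Int, reg1Data: Long, instructionAddress: Long): Long =
    if (source == Op1InstructionAddress) instructionAddress else reg1Data

  // Second ALU operand: rs2 data or the decoded immediate.
  def selectOp2(source: Int, reg2Data: Long, immediate: Long): Long =
    if (source == Op2Immediate) immediate else reg2Data

  // 32-bit wrapping addition, standing in for the ALU's add function.
  def add32(a: Long, b: Long): Long = (a + b) & 0xffffffffL
}
```

With both sources set to 0, `add32` on the register values `0x155EEA9E` and `0x046C00FB` gives `0x19CAEB99`. In Chisel the same choice would typically be expressed with `Mux` or `when` instead of `if`.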
>Code can be found in src/main/scala/riscv/core/InstructionDecode.scala

:::spoiler Instruction Decode scala code
```scala
object ALUOp1Source {
  val Register           = 0.U(1.W)
  val InstructionAddress = 1.U(1.W)
}

object ALUOp2Source {
  val Register  = 0.U(1.W)
  val Immediate = 1.U(1.W)
}
```
:::

Another missing part is `alu.io.func`. The ALU function signal comes from the `ALU Control`, so we check `ALUControl.scala`. We use `alu_ctrl.io.alu_funct` as the source and drive `alu.io.func` with it.

>Code can be found in src/main/scala/riscv/core/ALUControl.scala

:::spoiler ALUControl scala code
```scala
class ALUControl extends Module {
  val io = IO(new Bundle {
    val opcode = Input(UInt(7.W))
    val funct3 = Input(UInt(3.W))
    val funct7 = Input(UInt(7.W))

    val alu_funct = Output(ALUFunctions())
  })
  // ... (rest of the module omitted in this excerpt)
}
```
:::

### CPU
After filling in the missing Scala code above, we can run `sbt test` again. However, some errors still appear, and the output messages are shown below:
```
[info] firrtl.passes.PassExceptions: firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.aluop2_source <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.aluop1_source <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.reg2_data <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.immediate <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.instruction_address <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.reg1_data <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.instruction <= VOID
```
According to the error messages, we have to drive the missing input signals of the Execute stage. Comparing with the single-cycle CPU architecture diagram above, we can see that the `aluop_source`, `immediate`, and `instruction` signals come from the InstructionDecode stage. The `reg_data` signals come from the RegisterFile, and `instruction_address` comes from the InstructionFetch stage. We can then check the corresponding Scala code and connect the right input signals.

### sbt test
Once we have filled in all the missing parts of the Scala code, we can run `sbt test` again. If all of the Scala code is correct, we get the following output.
```
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] InstructionDecoderTest:
[info] ExecuteTest:
[info] InstructionDecoder of Single Cycle CPU
[info] Execution of Single Cycle CPU
[info] - should produce correct control signal
[info] - should execute correctly
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] Run completed in 13 seconds, 26 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 17 s, completed Nov 29, 2023, 12:23:55 AM
```

## GTKWave Analysis
We can use waveform simulation to check the signals generated by our CPU. Run the following command.
```shell
$ WRITE_VCD=1 sbt test
```
Afterward, we can find the generated `.vcd` files in the `test_run_dir` folder. We can open these files with GTKWave and analyze the relationships between different signals.

### InstructionFetch test
![image](https://hackmd.io/_uploads/ByHrpTEB6.png)

At time 33ps, `io_instruction_address` is 100C. Since `io_instruction_valid` is high and `io_jump_flag_id` is `False` (low signal), 4 is added to `io_instruction_address` and the result is written to `pc`. Therefore `pc` and `io_instruction_address` are both 1010 at time 35ps. At time 37ps, `pc` and `io_instruction_address` would normally advance by 4 to 1014. However, `io_jump_flag_id` is `True` (high signal) and `io_jump_address_id` is 1000, so `pc` and `io_instruction_address` become 1000 instead.

### InstructionDecode test
![image](https://hackmd.io/_uploads/BJ5WXRNH6.png)

At time 2ps, the input instruction is `00A02223`. Converting it from machine code to RISC-V assembly gives `sw x10, 4(x0)`. We can then check the output signals. The offset `4` corresponds to `io_ex_immediate`, and `io_memory_write_enable` is `True` because we are executing a store instruction. The `rd` field shows `4` because in an S-type instruction bits 11:7 hold the low bits of the immediate (the `4` in `4(x0)`) rather than a destination register; the base register is `x0`. The source data comes from `x10`, which corresponds to `io_regs_reg2_read_address`.

### Execute test
![image](https://hackmd.io/_uploads/BySD9CVSa.png)

At time 2ps, we can see that `io_instruction` is `001101B3`, which corresponds to `add x3, x2, x1`. `io_func` is equal to `1`, and its definition in `ALUControl.scala` is also the `add` function.
Therefore, `io_op1` is added to `io_op2`, and the result should be `19CAEB99` (`155EEA9E` + `046C00FB`).

## Run HW2 on MyCPU
We can use the GNU toolchain to run HW2 on MyCPU. First, we have to set up the environment on our system. According to [Assignment2: RISC-V Toolchain](https://hackmd.io/5TIUG_u-RPmyWI7YBmSx0w), we execute the following commands from the home directory.
```shell
$ cd $HOME
$ source riscv-none-elf-gcc/setenv
```
Second, we keep our RISC-V assembly code in the `csrc` directory and modify the `Makefile`. After that, to regenerate the RISC-V programs used for unit tests, we change to the `csrc` directory and run the `make update` command.

:::spoiler Makefile
```
CROSS_COMPILE ?= riscv-none-elf-

ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
CFLAGS = -O0 -Wall -march=rv32i_zicsr -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv

AS := $(CROSS_COMPILE)as
CC := $(CROSS_COMPILE)gcc
LD := $(CROSS_COMPILE)ld
OBJCOPY := $(CROSS_COMPILE)objcopy

%.o: %.S
	$(AS) -R $(ASFLAGS) -o $@ $<
%.elf: %.S
	$(AS) -R $(ASFLAGS) -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o)
%.elf: %.c init.o
	$(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o

%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@

BINS = \
	fibonacci.asmbin \
	hello.asmbin \
	mmio.asmbin \
	quicksort.asmbin \
	sb.asmbin

# Clear the .DEFAULT_GOAL special variable, so that the following turns
# to the first target after .DEFAULT_GOAL is not set.
.DEFAULT_GOAL :=

all: $(BINS)

update: $(BINS)
	cp -f $(BINS) ../src/main/resources

clean:
	$(RM) *.o *.elf *.asmbin
```
:::

```shell
$ cd csrc
$ make update
```
If the `make` command runs successfully, we get new `.asmbin` files in the `csrc` folder. In addition, we can use `sbt test` to test our file. `sbt test` runs the test cases that validate the CPU implementation, which rely on ChiselTest. We have to modify `CPUTest.scala` first.
>File: src/test/scala/riscv/singlecycle/CPUTest.scala

The code I added to the file is shown below. We can run the `sbt test` command and confirm whether our `.asmbin` file passes the test.

:::spoiler CPUTest.scala
```scala
class ipcheckTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "report success" in {
    test(new TestTopModule("ipcheck.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 2000) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      // Check the final register values one by one through the debug port
      c.io.regs_debug_read_address.poke(1.U)
      c.io.regs_debug_read_data.expect(0x1300.U)
      c.io.regs_debug_read_address.poke(2.U)
      c.io.regs_debug_read_data.expect(0xf80.U)
      c.io.regs_debug_read_address.poke(8.U)
      c.io.regs_debug_read_data.expect(0xfb0.U)
      c.io.regs_debug_read_address.poke(9.U)
      c.io.regs_debug_read_data.expect(0xfc4.U)
      c.io.regs_debug_read_address.poke(12.U)
      c.io.regs_debug_read_data.expect(0x10.U)
      c.io.regs_debug_read_address.poke(15.U)
      c.io.regs_debug_read_data.expect(0xfcc.U)
      c.io.regs_debug_read_address.poke(16.U)
      c.io.regs_debug_read_data.expect(0x8.U)
    }
    println("success")
  }
}
```
:::

Third, we need to execute the following command in the project’s root directory to generate the Verilog files:
```shell
$ make verilator
```
To load the `ipcheck.asmbin` file, simulate for 1000 cycles, and save the simulation waveform to the `dump.vcd` file, we can run:
```shell
$ ./run-verilator.sh -instruction src/main/resources/ipcheck.asmbin -time 2000 -vcd dump.vcd
```
Then, open `dump.vcd` with GTKWave to check its waveform.

### InstructionFetch
![IF1](https://hackmd.io/_uploads/B1RqoeLST.png)

At time 2ps, `io_instruction_valid` is `True` and `io_jump_flag_id` is `False`, so the pc points to instruction address `0x00001000`. The fetched instruction is `0x00001137`, which is sent to the InstructionDecode stage.

### InstructionDecode
![ID1](https://hackmd.io/_uploads/Sy2DE3USp.png)

The instruction is decoded in this stage.
If we check the RISC-V instruction set manual, we find that the corresponding assembly is `lui x2, 1`. We can confirm the signals in GTKWave. Since `lui` is a U-type instruction that writes a register, `io_reg_write_enable` is `True`. `rd` and `io_reg_write_address` are 02 because the target register is `x2`. `lui` places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros. Therefore, `0x00001000` will be placed in the `x2` register.

### Execute
![EXE1](https://hackmd.io/_uploads/Hy5Ot2IS6.png)

`io_aluop1_source` is 0 and `io_aluop2_source` is 1. The ALU performs the `add` function because `func` is 1, so the result is `0x00001000`.

### Register
![REG1](https://hackmd.io/_uploads/BJ0a1pUBT.png)

`io_write_address` is 2 and `io_write_data` is `0x00001000`. The value of register `x2` changes to `0x00001000` at time 5ps, when the next cycle begins.

### Memory
![MEM1](https://hackmd.io/_uploads/SkSOg68Sa.png)

The `lui` instruction does not read or write memory, so most of the signals are `0` in this stage.

### Writeback
![WRB1](https://hackmd.io/_uploads/rJe6ZaLH6.png)

This stage sends `0x00001000` back to the registers, and `x2` changes when the next cycle begins.

# Reference
* [The RISC-V Instruction Set Manual](https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf)
* [chisel-tutorial](https://github.com/ucb-bar/chisel-tutorial)
* [chisel ca2023-lab3](https://github.com/sysprog21/ca2023-lab3)
* [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p#Single-cycle-RISC-V-CPU)