# Branch prediction for 5-stage pipelined RISC-V

> 邱柏穎, 黃詩哲

[GitHub](https://github.com/charliechiou/CA2024_Branch-prediction-5-stage-RISCV)

## Goal
- [x] Study [5-Stage-RV32I](https://github.com/kinzafatim/5-Stage-RV32I)
- [x] Refer to the books mentioned at the bottom of the [ChiselRiscV](https://github.com/nozomioshi/ChiselRiscV) page to learn how to design a RISC-V processor using Chisel.
- [x] Introduce branch prediction mechanisms into the 5-Stage-RV32I project and thoroughly validate them.

## Environment Setting
### 5-Stage-RV32I from [Kinza Fatima](https://github.com/kinzafatim)

Clone the repository from [5-Stage-RV32I](https://github.com/kinzafatim/5-Stage-RV32I):
```
$ git clone https://github.com/kinzafatim/5-stage-RV32I.git
$ cd 5-stage-RV32I
```
The original `Main.scala` contains a hard-coded absolute path, so we need to modify it manually.
```
Modify the path in "./src/main/scala/Pipeline/Main.scala"

origin:
val InstMemory = Module(new InstMem ("/home/kinzaa/Desktop/5-Stage-RV32I/src/main/scala/Pipeline/test.txt"))

Replace it with:
val InstMemory = Module(new InstMem ("./src/main/scala/Pipeline/test.txt"))
```
Then run the processor simulation with `sbt test`. The output should be:
```shell=
$ sbt test
Elaborating design...
Elaborating design...
Done elaborating.
Done elaborating.
test PIPELINE Success: 0 tests passed in 202 cycles in 0.088404 seconds 2284.97 Hz
test DecoupledGcd Success: 0 tests passed in 841 cycles in 0.469659 seconds 1790.66 Hz
[info] TOPTest:
[info] - 5-Stage test
[info] GCDSpec:
[info] - Gcd should calculate proper greatest common denominator
[info] Run completed in 1 second, 828 milliseconds.
[info] Total number of tests run: 2
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 7 s, completed Jan 2, 2025, 11:13:43 PM
```

### Build our own 5-Stage-RV32I
> sbt.version=1.9.1

Following the [sbt Reference Manual - Hello, World](https://www.scala-sbt.org/1.x/docs/Hello.html), we can quickly set up a simple `Hello world` build for our own 5-Stage-RV32I:
```shell=
$ sbt new sbt/scala-seed.g8
```
The tree should be
```shell=
$ tree
.
├── build.sbt
├── project
│   ├── build.properties
│   └── Dependencies.scala
└── src
    ├── main
    │   └── scala
    │       └── example
    │           └── Hello.scala
    └── test
        └── scala
            └── example
                └── HelloSpec.scala
```
After that, we need to modify the `build.sbt` file to include Chisel as a dependency.
```scala=
import Dependencies._

ThisBuild / scalaVersion := "2.12.8"

lazy val root = (project in file("."))
  .settings(
    name := "our_riscv_5_stage",
    libraryDependencies ++= Seq(
      "edu.berkeley.cs" %% "chisel3" % "3.4.3",
      "edu.berkeley.cs" %% "chiseltest" % "0.3.3" % Test,
      "org.scalatest" %% "scalatest" % "3.1.4" % Test
    ),
    scalacOptions ++= Seq(
      "-Xsource:2.11",
      "-language:reflectiveCalls",
      "-deprecation",
      "-feature",
      "-Xcheckinit",
      // Enables autoclonetype2 in 3.4.x (on by default in 3.5)
      "-P:chiselplugin:useBundlePlugin"
    ),
    addCompilerPlugin("edu.berkeley.cs" % "chisel3-plugin" % "3.4.3" cross CrossVersion.full),
    addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.1" cross CrossVersion.full)
  )
```

### riscv-tests
To test our 5-stage pipelined processor, we use `riscv-tests`. By default, the PC start address in riscv-tests is set to `0x80000000`, but for convenience, we change it to `0x00000000`.
```shell=
$ vim /opt/riscv/riscv-tests/env/p/link.ld
SECTIONS
{
  . = 0x80000000; // change it to 0x00000000
  ...
}
```
Then build riscv-tests:
```shell=
$ cd /opt/riscv/riscv-tests
$ autoconf
$ ./configure --prefix=/src/target
$ make
$ make install
```
The test files from **riscv-tests** will be generated in `/src/target/share/riscv-tests/isa`. For instance, we can check one of them with
```shell=
$ file /src/target/share/riscv-tests/isa/rv32ui-p-add
/src/target/share/riscv-tests/isa/rv32ui-p-add: ELF 32-bit LSB executable, UCB RISC-V, soft-float ABI, version 1 (SYSV), statically linked, not stripped
```
We have to convert the ELF file into a `.bin` file.
```shell=
$ riscv64-unknown-elf-objcopy -O binary /src/target/share/riscv-tests/isa/rv32ui-p-add rv32ui-p-add.bin
```
Then convert it into a `.hex` file with one byte per line:
```shell=
$ od -An -tx1 -w1 -v rv32ui-p-add.bin >> rv32ui-p-add.hex
```
The resulting `.hex` file looks like
```hex=
6f
00
00
05
73
2f
20
34
..
```
Ref: [riscv-chisel-book - Chapter 20](https://github.com/chadyuu/riscv-chisel-book)

### Run by [Docker](https://www.docker.com/)
To streamline the process of managing and deploying the project, we use **Docker** to package everything. Follow the steps below to build and run the Docker container.

Run the following command in your project directory to create a Docker image:
```shell=
docker build . -t riscv/our_riscv
```
This will build a Docker image named `riscv/our_riscv`.
Once the image is built, use the command below to start the Docker container and mount the current directory to `/src` inside the container:
```shell=
docker run -it -v ./:/src riscv/our_riscv
```
Dockerfile:
:::spoiler dockerfile
```dockerfile=
FROM ubuntu:22.04

ENV RISCV=/opt/riscv
ENV PATH=$RISCV/bin:$PATH
ENV MAKEFLAGS=-j4

WORKDIR $RISCV

# Install dependencies
RUN apt update && \
    apt install -y autoconf automake autotools-dev curl libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev pkg-config git libusb-1.0-0-dev device-tree-compiler default-jdk gnupg vim

# riscv-gnu-toolchain
RUN git clone --recursive --single-branch https://github.com/riscv-collab/riscv-gnu-toolchain
RUN cd riscv-gnu-toolchain && mkdir build && cd build && ../configure --prefix=${RISCV} --enable-multilib && make

# riscv-tests
RUN git clone -b master --single-branch https://github.com/riscv/riscv-tests && \
    cd riscv-tests && git submodule update --init --recursive

# sbt
RUN echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | tee -a /etc/apt/sources.list.d/sbt.list && \
    echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | tee /etc/apt/sources.list.d/sbt_old.list && \
    curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | apt-key add && \
    apt-get update && apt-get install -y sbt
```
:::

## 5-Stage-RV32I (Main)
The following image illustrates the five-stage pipelined datapath:
![image](https://hackmd.io/_uploads/B1JJymrUyl.png)

The folder structure is as follows:
```shell=
$ tree
.
├── Hazard Units
│   ├── BranchForward.scala
│   ├── Forwarding.scala
│   ├── HazardDetection.scala
│   └── StructuralHazard.scala
├── Main.scala
├── Memory
│   ├── DataMemory.scala
│   └── InstMem.scala
├── Pipelines
│   ├── EX_MEM.scala
│   ├── ID_EX.scala
│   ├── IF_ID.scala
│   └── MEM_WB.scala
├── test.txt
└── UNits
    ├── Alu_Control.scala
    ├── Alu.scala
    ├── BRANCH.scala
    ├── Control.scala
    ├── ImmGenerator.scala
    ├── JALR.scala
    ├── PC4.scala
    ├── PC.scala
    └── RegisterFile.scala
```
The following sections introduce the contents of each component.

---
### Memory
Memory units that are accessed during execution.

#### Data Memory
The Data Memory has five I/O ports:

| **I/O Port** | **Variable Name** | **Description** |
|--------------------|------------------|---------------------------------------------------------------------------------|
| **MemAddress** | `addr` | Specifies the memory address to either read data from or write data to. |
| **MemWriteData** | `dataIn` | The data to be written into the memory. |
| **MemREn** | `mem_read` | Indicates whether a read operation is enabled. |
| **MemWEn** | `mem_write` | Indicates whether a write operation is enabled. |
| **MemReadData** | `dataOut` | Outputs the data read from the memory. |

![image](https://hackmd.io/_uploads/rJutMmrU1e.png)

```scala=
val io = IO(new Bundle {
  val addr      = Input(UInt(32.W))   // Address input
  val dataIn    = Input(SInt(32.W))   // Data to be written
  val mem_read  = Input(Bool())       // Memory read enable
  val mem_write = Input(Bool())       // Memory write enable
  val dataOut   = Output(SInt(32.W))  // Data output
})
```
After declaring the I/O ports, we create the memory with
```scala=
val Dmemory = Mem(1024, SInt(32.W))
```
where `1024` is the number of memory locations and `SInt(32.W)` defines the stored data type as **a signed integer that is 32 bits wide**. Therefore, we have `32 bits * 1024 = 32768 bits = 4096 bytes (4 KB)` of memory.
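The capacity arithmetic and the read/write behaviour described here can be cross-checked with a small software model. The following is a minimal plain-Scala sketch, not part of the original project: the names `DataMemoryModel`, `Model`, and `step` are hypothetical, `addr` is treated as a word index, and out-of-range addresses are simply rejected because the handling of upper address bits in hardware is not modeled.

```scala
// Plain-Scala reference model of the data memory (hypothetical helper, not Chisel).
object DataMemoryModel {
  val Depth: Int = 1024      // number of locations, as in Mem(1024, SInt(32.W))
  val WordBits: Int = 32     // width of each location in bits

  // 32 bits * 1024 = 32768 bits = 4096 bytes
  val capacityBytes: Int = Depth * WordBits / 8

  final class Model {
    private val cells = Array.fill(Depth)(0)

    // Models one clock cycle. The read is combinational, so it returns the value
    // stored before this cycle's write commits; the output defaults to 0 when
    // mem_read is low, mirroring `io.dataOut := 0.S`.
    def step(addr: Int, dataIn: Int, memRead: Boolean, memWrite: Boolean): Int = {
      require(addr >= 0 && addr < Depth, "address outside the modeled range")
      val dataOut = if (memRead) cells(addr) else 0
      if (memWrite) cells(addr) = dataIn
      dataOut
    }
  }
}
```

For example, writing `42` to location `3` and reading the same location in the next cycle yields `42`, while a read with `memRead = false` yields `0`.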
:::info
#### Difference of SyncReadMem and Mem
Chisel provides two types of read-write memories: `SyncReadMem` and `Mem`.
- `SyncReadMem` represents **synchronous-read, synchronous-write memories**, where the values on the read data port are not guaranteed to be valid until the next clock cycle.
- `Mem` represents **asynchronous-read, synchronous-write memories**, where the value is output immediately after the address is provided.

For the implementation of the 5-stage RISC-V pipeline, we choose `Mem` due to its simpler integration, although `SyncReadMem` is closer to real-world hardware such as the block RAM found in FPGAs.

Ref : [Chisel Cookbook (Memories)][Memories_ChiselCookbook]
:::

For the final step, we initialize the output to `0` and perform read/write operations based on the values of `io.mem_write` and `io.mem_read`.
```scala=
io.dataOut := 0.S

when(io.mem_write) {
  Dmemory.write(io.addr, io.dataIn)
}

when(io.mem_read) {
  io.dataOut := Dmemory.read(io.addr)
}
```
The complete code of `DataMemory.scala` is
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chisel3.util._

class DataMemory extends Module {
  val io = IO(new Bundle {
    val addr      = Input(UInt(32.W))   // Address input
    val dataIn    = Input(SInt(32.W))   // Data to be written
    val mem_read  = Input(Bool())       // Memory read enable
    val mem_write = Input(Bool())       // Memory write enable
    val dataOut   = Output(SInt(32.W))  // Data output
  })

  val Dmemory = Mem(1024, SInt(32.W))

  io.dataOut := 0.S

  when(io.mem_write) {
    Dmemory.write(io.addr, io.dataIn)
  }

  when(io.mem_read) {
    io.dataOut := Dmemory.read(io.addr)
  }
}
```
:::

#### Instruction Memory
The Instruction Memory has two I/O ports:

| **I/O Port** | **Variable Name** | **Description** |
|--------------------|------------------|---------------------------------------------------------------------------------|
| **PC address** | `addr` | PC address of the instruction to fetch. |
| **inst** | `data` | The fetched instruction. |

![image](https://hackmd.io/_uploads/SknbHUrIJl.png)

Like the data memory, the instruction memory is declared with the `Mem` class, which allocates $2^{12}$ bytes (4 KB) in total.
```scala=
val imem = Mem(1024, UInt(32.W))
```
We implement this module as a read-only memory, so there is only one input and one output.
```scala=
val io = IO(new Bundle {
  val addr = Input(UInt(32.W))   // Address input to fetch instruction
  val data = Output(UInt(32.W))  // Output instruction
})
```
Notably, we call the `loadMemoryFromFile` function here.
```scala=
loadMemoryFromFile(imem, initFile)
```
This function initializes the memory from a text file (hexadecimal by default). In this scenario, the module loads `test.txt`, located in the root directory of the package, into the `InstMem` module.

Finally, we drive the output signal with:
```scala=
io.data := imem(io.addr/4.U)
```
We divide `io.addr` by `4` to convert the byte address into a word index.

:::info
2025/01/09
To enable our module to read the `.hex` files generated from riscv-tests, we modified the module as follows:
```scala=
val imem = Mem(4096, UInt(8.W))
loadMemoryFromFile(imem, initFile)
io.data := Cat(
  imem(io.addr+3.U(32.W)),
  imem(io.addr+2.U(32.W)),
  imem(io.addr+1.U(32.W)),
  imem(io.addr)
)
```
The memory (`imem`) is defined as 4096 x 8-bit, where each location holds a single byte, so the total storage remains the same.
The module fetches 4 consecutive bytes and concatenates them, with the byte at the lowest address as the least significant byte (little-endian), to form a 32-bit instruction.
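The byte concatenation can be sanity-checked in software. The following is a minimal plain-Scala sketch, not taken from the project: `FetchModel` and `fetchWord` are hypothetical names, and the memory is modeled as a plain sequence of byte values.

```scala
// Plain-Scala model of the little-endian 4-byte fetch (hypothetical helper, not Chisel).
object FetchModel {
  // Mirrors Cat(imem(addr + 3), imem(addr + 2), imem(addr + 1), imem(addr)):
  // the byte at the lowest address ends up in bits 7..0 of the word.
  def fetchWord(imem: IndexedSeq[Int], addr: Int): Long =
    (0 until 4).map { i =>
      (imem(addr + i) & 0xff).toLong << (8 * i)
    }.sum
}
```

Using the first eight bytes of the `.hex` example above (`6f 00 00 05 73 2f 20 34`), `fetchWord(bytes, 0)` gives `0x0500006f` and `fetchWord(bytes, 4)` gives `0x34202f73`.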
:::

Putting it all together, we get:
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chisel3.util._
import chisel3.util.experimental.loadMemoryFromFile
import scala.io.Source

class InstMem(initFile: String) extends Module {
  val io = IO(new Bundle {
    val addr = Input(UInt(32.W))   // Address input to fetch instruction
    val data = Output(UInt(32.W))  // Output instruction
  })

  // Byte-addressed memory, 4096 x 8-bit as described above
  val imem = Mem(4096, UInt(8.W))
  loadMemoryFromFile(imem, initFile)
  io.data := Cat(
    imem(io.addr+3.U(32.W)),
    imem(io.addr+2.U(32.W)),
    imem(io.addr+1.U(32.W)),
    imem(io.addr)
  )

  // For txt input
  // val imem = Mem(1024, UInt(32.W))
  // io.data := imem(io.addr/4.U)
}
```
:::

---
### Pipelines
Pipeline registers store the results of the previous stage.

#### IF_ID
The register between the IF stage and the ID stage has two groups of I/O ports:

| **I/O Port** | **Variable Name** | **Description** |
|--------------------|------------------|---------------------------------------------------------------------------------|
| **PC (I/O)** | `pc_in(out)`, `pc4_in(out)` , `SelectedPC(out)` | PC of the current instruction, PC of the next instruction, and the selected PC.|
| **inst (I/O)** | `SelectedInstr(out)` | The instruction that has been selected. |

The `PC (I/O)` and `PC+4 (I/O)` correspond to the I/O ports of the `PC` module, while the `SelectedPC (I/O)` and `Selected Instruction (I/O)` correspond to the I/O ports of the instruction module.

| ![Image 1](https://hackmd.io/_uploads/B1qCcFSU1x.png) | ![Image 2](https://hackmd.io/_uploads/ryXJitB8Jx.png) |
|------------------------------------------------------|------------------------------------------------------|

This module serves as a register that saves the program counter and the fetched instruction. In order to speed up the pipeline, we forward `PC` and `PC+4` simultaneously.
Four registers are declared and initialized using `RegInit`, which sets their reset values:
```scala=
val Pc_In   = RegInit (0.S (32.W))
val Pc4_In  = RegInit (0.U (32.W))
val S_pc    = RegInit (0.S (32.W))
val S_instr = RegInit (0.U (32.W))
```
These reset values are applied during a system reset, ensuring the hardware starts in a known state.

During normal operation, the registers capture the input signals on every clock edge, and the outputs expose the values captured on the previous edge:
```scala=
Pc_In   := io.pc_in
Pc4_In  := io.pc4_in
S_pc    := io.SelectedPC
S_instr := io.SelectedInstr

io.pc_out            := Pc_In
io.pc4_out           := Pc4_In
io.SelectedPC_out    := S_pc
io.SelectedInstr_out := S_instr
```
As a result, each output lags its input by exactly one clock cycle.

Putting it all together, we get:
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chisel3.util._

class IF_ID extends Module {
  val io = IO(new Bundle {
    val pc_in             = Input (SInt(32.W))   // PC in
    val pc4_in            = Input (UInt(32.W))   // PC + 4 in
    val SelectedPC        = Input (SInt(32.W))
    val SelectedInstr     = Input (UInt(32.W))

    val pc_out            = Output (SInt(32.W))  // PC out
    val pc4_out           = Output (UInt(32.W))  // PC + 4 out
    val SelectedPC_out    = Output (SInt(32.W))
    val SelectedInstr_out = Output (UInt(32.W))
  })

  val Pc_In   = RegInit (0.S (32.W))
  val Pc4_In  = RegInit (0.U (32.W))
  val S_pc    = RegInit (0.S (32.W))
  val S_instr = RegInit (0.U (32.W))

  Pc_In   := io.pc_in
  Pc4_In  := io.pc4_in
  S_pc    := io.SelectedPC
  S_instr := io.SelectedInstr

  io.pc_out            := Pc_In
  io.pc4_out           := Pc4_In
  io.SelectedPC_out    := S_pc
  io.SelectedInstr_out := S_instr
}
```
:::

In the `ID_EX`, `EX_MEM`, and `MEM_WB` stages, we use `RegNext` so that the input signals represent the **next states** of the registers, while the output signals reflect their **current states**.
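The next-state/current-state relationship can be illustrated with a tiny software model. This is a plain-Scala sketch under stated assumptions: `PipelineRegModel`, `Reg`, and `trace` are hypothetical names, values are plain `Int`s, and both registers start at `0` (as the `RegInit` registers do; `RegNext` without an initial value has no defined reset value in hardware).

```scala
// Plain-Scala model of two chained pipeline registers (hypothetical helper, not Chisel).
object PipelineRegModel {
  final class Reg[T](init: T) {
    private var current: T = init
    def out: T = current                        // value visible during the current cycle
    def tick(next: T): Unit = { current = next } // clock edge: capture the next state
  }

  // Feeds one value per cycle into IF_ID, whose output feeds ID_EX.
  // Returns, for each cycle, what (IF_ID, ID_EX) output during that cycle.
  def trace(inputs: Seq[Int]): Seq[(Int, Int)] = {
    val ifId = new Reg(0)
    val idEx = new Reg(0)
    inputs.map { in =>
      val seen = (ifId.out, idEx.out)
      val nextIdEx = ifId.out // sample all inputs before any register updates
      ifId.tick(in)
      idEx.tick(nextIdEx)
      seen
    }
  }
}
```

With inputs `10, 20, 30`, the trace is `(0, 0)`, `(10, 0)`, `(20, 10)`: each value reaches the next register one cycle later.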
#### ID_EX
The register between the ID stage and the EX stage has four groups of I/O ports, in addition to the control signals:

| **I/O Port** | **Variable Name** | **Description** |
|--------------------|------------------|---------------------------------------------------------------------------------|
| **PC (I/O)** |`IFID_pc4_in(out)`| `PC + 4` passed on to the next stage.|
| **RegRead Data 1 (I/O)** | `rs1_data_in(out)`| Data stored in the `rs1` register.|
| **RegRead Data 2 (I/O)** | `rs2_data_in(out)`| Data stored in the `rs2` register.|
| **inst (I/O)** | `rs1_in(out)`,`rs2_in(out)`,`rd_in(out)`, `func3_in(out)`, `func7_in(out)`| Indices of the selected registers, and the decoded fields of the instruction. |

| ![image](https://hackmd.io/_uploads/Bk4i9_UIJl.png) | ![image](https://hackmd.io/_uploads/BJQ2cdLUyl.png) | ![image](https://hackmd.io/_uploads/HJ705_IUJl.png) | ![image](https://hackmd.io/_uploads/r1sysuI8kg.png) |
|------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|

so we have:
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chisel3.util._

class ID_EX extends Module {
  val io = IO(new Bundle {
    val rs1_in           = Input(UInt(5.W))
    val rs2_in           = Input(UInt(5.W))
    val rs1_data_in      = Input(SInt(32.W))
    val rs2_data_in      = Input(SInt(32.W))
    val imm              = Input(SInt(32.W))
    val rd_in            = Input(UInt(5.W))
    val func3_in         = Input(UInt(3.W))
    val func7_in         = Input(Bool())
    val ctrl_MemWr_in    = Input(Bool())
    val ctrl_Branch_in   = Input(Bool())
    val ctrl_MemRd_in    = Input(Bool())
    val ctrl_Reg_W_in    = Input(Bool())
    val ctrl_MemToReg_in = Input(Bool())
    val ctrl_AluOp_in    = Input(UInt(3.W))
    val ctrl_OpA_in      = Input(UInt(2.W))
    val ctrl_OpB_in      = Input(Bool())
    val ctrl_nextpc_in   = Input(UInt(2.W))
    val IFID_pc4_in      = Input(UInt(32.W))

    val rs1_out           = Output(UInt(5.W))
    val rs2_out           = Output(UInt(5.W))
    val rs1_data_out      = Output(SInt(32.W))
    val rs2_data_out      = Output(SInt(32.W))
    val rd_out            = Output(UInt(5.W))
    val imm_out           = Output(SInt(32.W))
    val func3_out         = Output(UInt(3.W))
    val func7_out         = Output(Bool())
    val ctrl_MemWr_out    = Output(Bool())
    val ctrl_Branch_out   = Output(Bool())
    val ctrl_MemRd_out    = Output(Bool())
    val ctrl_Reg_W_out    = Output(Bool())
    val ctrl_MemToReg_out = Output(Bool())
    val ctrl_AluOp_out    = Output(UInt(3.W))
    val ctrl_OpA_out      = Output(UInt(2.W))
    val ctrl_OpB_out      = Output(Bool())
    val ctrl_nextpc_out   = Output(UInt(2.W))
    val IFID_pc4_out      = Output(UInt(32.W))
  })

  io.rs1_out           := RegNext(io.rs1_in)
  io.rs2_out           := RegNext(io.rs2_in)
  io.rs1_data_out      := RegNext(io.rs1_data_in)
  io.rs2_data_out      := RegNext(io.rs2_data_in)
  io.imm_out           := RegNext(io.imm)
  io.rd_out            := RegNext(io.rd_in)
  io.func3_out         := RegNext(io.func3_in)
  io.func7_out         := RegNext(io.func7_in)
  io.ctrl_MemWr_out    := RegNext(io.ctrl_MemWr_in)
  io.ctrl_Branch_out   := RegNext(io.ctrl_Branch_in)
  io.ctrl_MemRd_out    := RegNext(io.ctrl_MemRd_in)
  io.ctrl_Reg_W_out    := RegNext(io.ctrl_Reg_W_in)
  io.ctrl_MemToReg_out := RegNext(io.ctrl_MemToReg_in)
  io.ctrl_AluOp_out    := RegNext(io.ctrl_AluOp_in)
  io.ctrl_OpA_out      := RegNext(io.ctrl_OpA_in)
  io.ctrl_OpB_out      := RegNext(io.ctrl_OpB_in)
  io.ctrl_nextpc_out   := RegNext(io.ctrl_nextpc_in)
  io.IFID_pc4_out      := RegNext(io.IFID_pc4_in)
}
```
:::

#### EX_MEM
The register between the EX stage and the MEM stage has four groups of I/O ports, in addition to the control signals:

| **I/O Port** | **Variable Name** | **Description** |
|--------------------|------------------|---------------------------------------------------------------------------------|
| **PC (I/O)** | `None`| |
| **ALU Out (I/O)** | `alu_out (EXMEM_alu_out)`| Result computed by the ALU.|
| **RegRead Data 2 (I/O)** | `IDEX_rs2 (EXMEM_rs2_out)`| Data read from the register `rs2`.|
| **inst (I/O)** | `IDEX_rd (EXMEM_rd_out)`| Index of the destination register `rd`.|

| ![image](https://hackmd.io/_uploads/Bk4i9_UIJl.png) | ![image](https://hackmd.io/_uploads/S1kcjdIIJx.png) | ![image](https://hackmd.io/_uploads/r1QisuL8Jg.png) | ![image](https://hackmd.io/_uploads/Hymhjd8I1x.png) |
|------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|

so we have:
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chisel3.util._

class EX_MEM extends Module {
  val io = IO(new Bundle {
    val IDEX_MEMRD    = Input(Bool())
    val IDEX_MEMWR    = Input(Bool())
    val IDEX_MEMTOREG = Input(Bool())
    val IDEX_REG_W    = Input(Bool())
    val IDEX_rs2      = Input(SInt(32.W))
    val IDEX_rd       = Input(UInt(5.W))
    val alu_out       = Input(SInt(32.W))

    val EXMEM_memRd_out    = Output(Bool())
    val EXMEM_memWr_out    = Output(Bool())
    val EXMEM_memToReg_out = Output(Bool())
    val EXMEM_reg_w_out    = Output(Bool())
    val EXMEM_rs2_out      = Output(SInt(32.W))
    val EXMEM_rd_out       = Output(UInt(5.W))
    val EXMEM_alu_out      = Output(SInt(32.W))
  })

  io.EXMEM_memRd_out    := RegNext(io.IDEX_MEMRD)
  io.EXMEM_memWr_out    := RegNext(io.IDEX_MEMWR)
  io.EXMEM_memToReg_out := RegNext(io.IDEX_MEMTOREG)
  io.EXMEM_reg_w_out    := RegNext(io.IDEX_REG_W)
  io.EXMEM_rs2_out      := RegNext(io.IDEX_rs2)
  io.EXMEM_rd_out       := RegNext(io.IDEX_rd)
  io.EXMEM_alu_out      := RegNext(io.alu_out)
}
```
:::

#### MEM_WB
The register between the MEM stage and the WB stage has two groups of I/O ports, in addition to the control signals:

| **I/O Port** | **Variable Name** | **Description** |
|--------------------|------------------|---------------------------------------------------------------------------------|
| **Reg Write Data (I/O)** | `in_dataMem_out (MEMWB_dataMem_out)`, `in_alu_out (MEMWB_alu_out)`| Data read from the data memory, and the result computed by the ALU. |
| **inst (I/O)** | `EXMEM_rd (MEMWB_rd_out)`| Index of the register that will store the data.|

| ![image](https://hackmd.io/_uploads/Skj-n_LIkg.png) | ![Image 2](https://hackmd.io/_uploads/ryXJitB8Jx.png) |
|------------------------------------------------------|------------------------------------------------------|

so we have:
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chisel3.util._

class MEM_WB extends Module {
  val io = IO(new Bundle {
    val EXMEM_MEMTOREG = Input(Bool())
    val EXMEM_REG_W    = Input(Bool())
    val EXMEM_MEMRD    = Input(Bool())
    val EXMEM_rd       = Input(UInt(5.W))
    val in_dataMem_out = Input(SInt(32.W))
    val in_alu_out     = Input(SInt(32.W))

    val MEMWB_memToReg_out = Output(Bool())
    val MEMWB_reg_w_out    = Output(Bool())
    val MEMWB_memRd_out    = Output(Bool())
    val MEMWB_rd_out       = Output(UInt(5.W))
    val MEMWB_dataMem_out  = Output(SInt(32.W))
    val MEMWB_alu_out      = Output(SInt(32.W))
  })

  io.MEMWB_memToReg_out := RegNext(io.EXMEM_MEMTOREG)
  io.MEMWB_reg_w_out    := RegNext(io.EXMEM_REG_W)
  io.MEMWB_memRd_out    := RegNext(io.EXMEM_MEMRD)
  io.MEMWB_rd_out       := RegNext(io.EXMEM_rd)
  io.MEMWB_dataMem_out  := RegNext(io.in_dataMem_out)
  io.MEMWB_alu_out      := RegNext(io.in_alu_out)
}
```
:::

----
### UNits
#### ALUDecode
To generate the control signal for the ALU, we design the `ALUDecode` unit. This unit takes `funct3`, `funct7`, and `Type` as inputs, which represent the instruction's function fields and the instruction type.
```scala=
val io = IO(new Bundle {
  val Type   = Input(UInt(3.W))
  val funct7 = Input(UInt(1.W)) // Only funct7[5] matters
  val funct3 = Input(UInt(3.W))
  val ALUSel = Output(UInt(4.W))
})
```
To find out which operation the `ALU` module needs to perform, we first identify the instruction's type, then generate the `ALUSel` signal from `funct3` (combined with `funct7[5]` for `R-type` and `I-type` instructions).
The `ALUSel` signal is generated as follows:
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chisel3.util._

class ALUDecode extends Module {
  val io = IO(new Bundle {
    val Type   = Input(UInt(3.W))
    val funct7 = Input(UInt(1.W)) // Only funct7[5] matters
    val funct3 = Input(UInt(3.W))
    val ALUSel = Output(UInt(4.W))
  })

  // ALUSel table
  val ADD = 0.U(4.W)
  val SUB = 1.U(4.W)
  val AND = 2.U(4.W)
  val OR  = 3.U(4.W)
  val XOR = 4.U(4.W)
  val SLL = 5.U(4.W)
  val SRL = 6.U(4.W)
  val SRA = 7.U(4.W)
  val X   = 8.U(4.W) // X means don't care

  // Instruction types
  val Rtype               = 0.U(3.W)
  val Itype               = 1.U(3.W)
  val Ltype               = 4.U(3.W)
  val Stype               = 5.U(3.W)
  val ConditionalBranch   = 2.U(3.W)
  val UnconditionalBranch = 3.U(3.W)
  val AUIPC               = 7.U(3.W)
  val LUI                 = 6.U(3.W)

  val temp = Cat(io.funct7, io.funct3) // Combine funct7 and funct3

  // Select ALUSel by instruction type; X is the default
  io.ALUSel := MuxLookup(io.Type, X, Array(
    Stype -> ADD,
    Ltype -> ADD,
    AUIPC -> ADD,
    Rtype -> MuxLookup(temp, X, Array(
      "b0111".U -> AND,
      "b0110".U -> OR,
      "b1101".U -> SRA,
      "b0101".U -> SRL,
      "b0100".U -> XOR,
      "b0001".U -> SLL,
      "b1000".U -> SUB,
      "b0000".U -> ADD
    )),
    Itype -> MuxLookup(temp, X, Array(
      "b1101".U -> SRA,
      "b0101".U -> SRL,
      "b0001".U -> SLL,
      "b0111".U -> AND,
      "b0110".U -> OR,
      "b0100".U -> XOR,
      "b0000".U -> ADD
    )),
    ConditionalBranch   -> X,
    UnconditionalBranch -> X,
    LUI                 -> X
  ))
}
```
:::

With this module in place, designing the `ALU` module becomes simple.
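To cross-check the lookup tables, the decode logic can be mirrored in plain Scala. This is a sketch under stated assumptions rather than project code: `AluDecodeModel` and `aluSel` are hypothetical names, the numeric codes are copied from the tables above, and `funct7b5` stands for the single `funct7` bit fed to the module.

```scala
// Plain-Scala mirror of the ALUDecode lookup tables (hypothetical helper, not Chisel).
object AluDecodeModel {
  // ALUSel codes, copied from the ALUSel table
  val ADD = 0; val SUB = 1; val AND = 2; val OR = 3; val XOR = 4
  val SLL = 5; val SRL = 6; val SRA = 7; val X = 8

  // Instruction type codes, copied from the module
  val Rtype = 0; val Itype = 1; val ConditionalBranch = 2; val UnconditionalBranch = 3
  val Ltype = 4; val Stype = 5; val LUI = 6; val AUIPC = 7

  // Keys are Cat(funct7, funct3) as 4-bit integers
  private val rTable = Map(7 -> AND, 6 -> OR, 13 -> SRA, 5 -> SRL, 4 -> XOR, 1 -> SLL, 8 -> SUB, 0 -> ADD)
  private val iTable = Map(13 -> SRA, 5 -> SRL, 1 -> SLL, 7 -> AND, 6 -> OR, 4 -> XOR, 0 -> ADD)

  def aluSel(tpe: Int, funct7b5: Int, funct3: Int): Int = {
    val temp = ((funct7b5 & 1) << 3) | (funct3 & 7)
    tpe match {
      case Stype | Ltype | AUIPC => ADD
      case Rtype                 => rTable.getOrElse(temp, X)
      case Itype                 => iTable.getOrElse(temp, X)
      case _                     => X // branches, jumps, LUI
    }
  }
}
```

For instance, `aluSel(Rtype, 1, 0)` selects `SUB` and `aluSel(Itype, 1, 5)` selects `SRA`. The model also makes one property of the table visible: an I-type lookup with the `funct7` bit set and `funct3 = 000` falls through to `X`. Whether that combination can occur depends on how `Main.scala` drives `funct7`, which is not shown here.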
<!-- - I type - SB type - Branch type - Loads, S type, U type (lui), U type (auipc) --> <!-- > cannot understand :( [name=Charlie Chiou] --> #### Alu | **I/O Port** | **Variable Name** | **Description** | |--------------------|------------------|---------------------------------------------------------------------------------| | **A** | `in_A` |Input data A | | **B** | `in_B` |Input data B| | **ALU Select** | `ALUSel` |ALU select decide which operation to compute| | **ALU Output** | `out` |The result of ALU.| ![image](https://hackmd.io/_uploads/By2_UQ_Lyx.png =10%x) ALU is used for performing **arithmetic operations** (such as addition, subtraction, multiplication, and division) and **logical operations** (such as AND, OR, NOT, XOR). How the ALU perform operation is defined as : ```scala= // ALUSel table val ADD = 0.U(4.W) val SUB = 1.U(4.W) val AND = 2.U(4.W) val OR = 3.U(4.W) val XOR = 4.U(4.W) val SLL = 5.U(4.W) val SRL = 6.U(4.W) val SRA = 7.U(4.W) val X = 8.U(4.W) // X means don't care ``` And the ALU will execute according to the table: :::spoiler Code switch(io.ALUSel) { is(ADD) { io.out := io.in_A + io.in_B } is(SUB) { io.out := io.in_A - io.in_B } is(AND) { io.out := io.in_A & io.in_B } is(OR) { io.out := io.in_A | io.in_B } is(XOR) { io.out := io.in_A ^ io.in_B } is(SLL) { io.out := io.in_A << io.in_B(4, 0) } // Limit shift amount to 5 bits is(SRL) { io.out := io.in_A >> io.in_B(4, 0) } // Limit shift amount to 5 bits is(SRA) { io.out := (io.in_A.asSInt >> io.in_B(4, 0)) } // Signed right shift is(X) { io.out := 0.S } } ::: Putting everything together: :::spoiler Code ```sca= package Pipeline import chisel3._ import chisel3.util._ class ALU extends Module { val io = IO(new Bundle { val in_A = Input(SInt(32.W)) val in_B = Input(SInt(32.W)) val ALUSel = Input(UInt(4.W)) val out = Output(SInt(32.W)) }) // ALUSel table val ADD = 0.U(4.W) val SUB = 1.U(4.W) val AND = 2.U(4.W) val OR = 3.U(4.W) val XOR = 4.U(4.W) val SLL = 5.U(4.W) val SRL = 6.U(4.W) val SRA = 
7.U(4.W) val X = 8.U(4.W) // X means don't care // Default value for out io.out := 0.S switch(io.ALUSel) { is(ADD) { io.out := io.in_A + io.in_B } is(SUB) { io.out := io.in_A - io.in_B } is(AND) { io.out := io.in_A & io.in_B } is(OR) { io.out := io.in_A | io.in_B } is(XOR) { io.out := io.in_A ^ io.in_B } is(SLL) { io.out := io.in_A << io.in_B(4, 0) } // Limit shift amount to 5 bits is(SRL) { io.out := io.in_A >> io.in_B(4, 0) } // Limit shift amount to 5 bits is(SRA) { io.out := (io.in_A.asSInt >> io.in_B(4, 0)) } // Signed right shift is(X) { io.out := 0.S } } } ``` ::: <!-- First used an object to define the **operation code**, > Not sure the meanings of the number ?? [name=Charlie Chiou] ```scala= object AluOpCode { // funct7 funct3 opcode val ALU_ADDI = 0.U(5.W) // | --- ---- | 000 | 001 0011 | val ALU_ADD = 0.U(5.W) // | 000 0000 | 000 | 011 0011 | val ALU_SW = 0.U(5.W) // | --- ---- | 010 | 010 0011 | val ALU_LW = 0.U(5.W) // | --- ---- | 010 | 000 0011 | val ALU_LUI = 0.U(5.W) // | --- ---- | --- | 011 0111 | val ALU_AUIPC = 0.U(5.W) // | --- ---- | --- | 001 0111 | val ALU_SLL = 1.U(5.W) // | 000 0000 | 001 | 011 0011 | val ALU_SLLI = 1.U(5.W) // | 000 0000 | 001 | 001 0011 | val ALU_SLT = 2.U(5.W) // | 000 0000 | 010 | 011 0011 | val ALU_SLTI = 2.U(5.W) // | --- ---- | 010 | 001 0011 | val ALU_SLTU = 3.U(5.W) // | 000 0000 | 011 | 011 0011 | val ALU_SLTUI = 3.U(5.W) // | --- ---- | 011 | 001 0011 | val ALU_XOR = 4.U(5.W) // | 000 0000 | 100 | 011 0011 | val ALU_XORI = 4.U(5.W) // | --- ---- | 100 | 001 0011 | val ALU_SRL = 5.U(5.W) // | 000 0000 | 101 | 011 0011 | val ALU_SRLI = 5.U(5.W) // | 000 0000 | 101 | 001 0011 | val ALU_OR = 6.U(5.W) // | 000 0000 | 110 | 011 0011 | val ALU_ORI = 6.U(5.W) // | --- ---- | 110 | 001 0011 | val ALU_AND = 7.U(5.W) // | 000 0000 | 111 | 011 0011 | val ALU_ANDI = 7.U(5.W) // | --- ---- | 111 | 001 0011 | val ALU_SUB = 8.U(5.W) // | 010 0000 | 000 | 011 0011 | val ALU_SRA = 13.U(5.W) // | 010 0000 | 101 | 011 0011 | val 
ALU_SRAI = 13.U(5.W) // | 010 0000 | 101 | 001 0011 | val ALU_JAL = 31.U(5.W) // | --- ---- | --- | 110 1111 | val ALU_JALR = 31.U(5.W) // | --- ---- | 000 | 110 0111 | } ``` --> <!-- we initialize the output by `val result = WireDefault(0.S(32.W))` and choose the operation by `switch` ```scala= switch(io.alu_Op) { is(ALU_ADD, ALU_ADDI, ALU_SW, ALU_LW, ALU_LUI, ALU_AUIPC) { result := io.in_A + io.in_B } is(ALU_SLL, ALU_SLLI) { result := (io.in_A.asUInt << io.in_B(4, 0)).asSInt } is(ALU_SLT, ALU_SLTI) { result := Mux(io.in_A < io.in_B, 1.S, 0.S) } is(ALU_SLTU, ALU_SLTUI) { result := Mux(io.in_A.asUInt < io.in_B.asUInt, 1.S, 0.S) } is(ALU_XOR, ALU_XORI) { result := io.in_A ^ io.in_B } is(ALU_SRL, ALU_SRLI) { result := (io.in_A.asUInt >> io.in_B(4, 0)).asSInt } is(ALU_OR, ALU_ORI) { result := io.in_A | io.in_B } is(ALU_AND, ALU_ANDI) { result := io.in_A & io.in_B } is(ALU_SUB) { result := io.in_A - io.in_B } is(ALU_SRA, ALU_SRAI) { result := (io.in_A >> io.in_B(4, 0)).asSInt } is(ALU_JAL, ALU_JALR) { result := io.in_A } } ``` Notably, we utilized the following approach to implement the functionality: - **ADD**, **ADDI**, **SW**, **LW**, **LUI**, **AUIPC** -> directly use `addition` to compute the results. - **SLL**, **SLLI**, **SRL**, **SRLI** -> shift the input by `<<`or`>>` and use `.asUInt`,`.asSInt` to convert input and output into **a unsigned integer (UInt)** and **a signed integer (SInt)** - **SLT**, **SLTI**, **SLTU**, **SLTUI** -> set the result by `Mux` and for `SLTU`,`SLTUI` we use `.asUInt` to convert the input into `UInt`. - **XOR**, **XORI** -> directly use `xor` to compute the results. - **OR**, **ORI** -> directly use `or` to compute the results. - **AND**, **ANDI** -> directly use `and` to compute the results. - **SUB** -> directly use `subtraction` to compute the results. 
- **SRA**, **SRAI** -> shift the input by `>>` - **JAL**, **JALR** -> directly wired the output to input A Put them all together: :::spoiler Code ```scala= package Pipeline import chisel3._ import chisel3.util._ import AluOpCode._ object AluOpCode { // funct7 funct3 opcode val ALU_ADDI = 0.U(5.W) // | --- ---- | 000 | 001 0011 | val ALU_ADD = 0.U(5.W) // | 000 0000 | 000 | 011 0011 | val ALU_SW = 0.U(5.W) // | --- ---- | 010 | 010 0011 | val ALU_LW = 0.U(5.W) // | --- ---- | 010 | 000 0011 | val ALU_LUI = 0.U(5.W) // | --- ---- | --- | 011 0111 | val ALU_AUIPC = 0.U(5.W) // | --- ---- | --- | 001 0111 | val ALU_SLL = 1.U(5.W) // | 000 0000 | 001 | 011 0011 | val ALU_SLLI = 1.U(5.W) // | 000 0000 | 001 | 001 0011 | val ALU_SLT = 2.U(5.W) // | 000 0000 | 010 | 011 0011 | val ALU_SLTI = 2.U(5.W) // | --- ---- | 010 | 001 0011 | val ALU_SLTU = 3.U(5.W) // | 000 0000 | 011 | 011 0011 | val ALU_SLTUI = 3.U(5.W) // | --- ---- | 011 | 001 0011 | val ALU_XOR = 4.U(5.W) // | 000 0000 | 100 | 011 0011 | val ALU_XORI = 4.U(5.W) // | --- ---- | 100 | 001 0011 | val ALU_SRL = 5.U(5.W) // | 000 0000 | 101 | 011 0011 | val ALU_SRLI = 5.U(5.W) // | 000 0000 | 101 | 001 0011 | val ALU_OR = 6.U(5.W) // | 000 0000 | 110 | 011 0011 | val ALU_ORI = 6.U(5.W) // | --- ---- | 110 | 001 0011 | val ALU_AND = 7.U(5.W) // | 000 0000 | 111 | 011 0011 | val ALU_ANDI = 7.U(5.W) // | --- ---- | 111 | 001 0011 | val ALU_SUB = 8.U(5.W) // | 010 0000 | 000 | 011 0011 | val ALU_SRA = 13.U(5.W) // | 010 0000 | 101 | 011 0011 | val ALU_SRAI = 13.U(5.W) // | 010 0000 | 101 | 001 0011 | val ALU_JAL = 31.U(5.W) // | --- ---- | --- | 110 1111 | val ALU_JALR = 31.U(5.W) // | --- ---- | 000 | 110 0111 | } class ALU extends Module { val io = IO(new Bundle { val in_A = Input(SInt(32.W)) val in_B = Input(SInt(32.W)) val alu_Op = Input(UInt(5.W)) val out = Output(SInt(32.W)) }) val result = WireDefault(0.S(32.W)) switch(io.alu_Op) { is(ALU_ADD, ALU_ADDI, ALU_SW, ALU_LW, ALU_LUI, ALU_AUIPC) { result := io.in_A + 
io.in_B } is(ALU_SLL, ALU_SLLI) { result := (io.in_A.asUInt << io.in_B(4, 0)).asSInt } is(ALU_SLT, ALU_SLTI) { result := Mux(io.in_A < io.in_B, 1.S, 0.S) } is(ALU_SLTU, ALU_SLTUI) { result := Mux(io.in_A.asUInt < io.in_B.asUInt, 1.S, 0.S) } is(ALU_XOR, ALU_XORI) { result := io.in_A ^ io.in_B } is(ALU_SRL, ALU_SRLI) { result := (io.in_A.asUInt >> io.in_B(4, 0)).asSInt } is(ALU_OR, ALU_ORI) { result := io.in_A | io.in_B } is(ALU_AND, ALU_ANDI) { result := io.in_A & io.in_B } is(ALU_SUB) { result := io.in_A - io.in_B } is(ALU_SRA, ALU_SRAI) { result := (io.in_A >> io.in_B(4, 0)).asSInt } is(ALU_JAL, ALU_JALR) { result := io.in_A } } io.out := result } ``` ::: --> #### BRANCH ![image](https://hackmd.io/_uploads/Hyln7L9IJe.png =20%x) There are $6$ B-type instructions in RV32I. They share a single `opcode`, and `funct3` indicates which instruction it is. | instruction | funct3 (binary) | funct3 (decimal) | | -------- | -------- |-------- | | `beq` | 000 |0 | | `bge` | 101 |5 | | `bgeu` | 111 |7 | | `blt` | 100 |4 | | `bltu` | 110 |6 | | `bne` | 001 |1 | Examining the IO ports makes it clear how this module works: ```scala= class Branch extends Module { val io = IO(new Bundle { val fnct3 = Input(UInt(3.W)) val branch = Input(Bool()) val arg_x = Input(SInt(32.W)) val arg_y = Input(SInt(32.W)) val br_taken = Output(Bool()) }) ``` `branch` indicates whether this is a `B-type` instruction, `arg_x` and `arg_y` are the values read from `rs1` and `rs2`, `fnct3` carries the `funct3` field that selects which B-type instruction it is, and `br_taken` is the output signal, set according to these inputs.
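As a cross-check for the hardware, the branch decision can be sketched as a plain Scala reference model. This is only an illustrative software model, not part of the Chisel design: the object name `BranchModel` and the function `taken` are hypothetical, and the `branch` enable input is assumed to be asserted.

```scala
// Hypothetical software reference model of the Branch module's decision.
// It assumes the `branch` input is asserted; x and y stand for rs1 and rs2.
object BranchModel {
  def taken(funct3: Int, x: Int, y: Int): Boolean = funct3 match {
    case 0 => x == y                                       // beq
    case 1 => x != y                                       // bne
    case 4 => x < y                                        // blt  (signed)
    case 5 => x >= y                                       // bge  (signed)
    case 6 => java.lang.Integer.compareUnsigned(x, y) < 0  // bltu (unsigned)
    case 7 => java.lang.Integer.compareUnsigned(x, y) >= 0 // bgeu (unsigned)
    case _ => false                                        // funct3 2 and 3 are not branches
  }
}
```

Such a model is handy when writing tests: the signed and unsigned comparisons only differ when the operands have different sign bits, for example `-1` versus `1`.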
Control signal setting: ```scala= when(io.branch) { // beq when(io.fnct3 === 0.U) { io.br_taken := io.arg_x === io.arg_y } // bne .elsewhen(io.fnct3 === 1.U) { io.br_taken := io.arg_x =/= io.arg_y } // blt .elsewhen(io.fnct3 === 4.U) { io.br_taken := io.arg_x < io.arg_y } // bge .elsewhen(io.fnct3 === 5.U) { io.br_taken := io.arg_x >= io.arg_y } // bltu (unsigned less than) .elsewhen(io.fnct3 === 6.U) { io.br_taken := io.arg_x.asUInt < io.arg_y.asUInt } // bgeu (unsigned greater than or equal) .elsewhen(io.fnct3 === 7.U) { io.br_taken := io.arg_x.asUInt >= io.arg_y.asUInt } } ``` Putting them all together: :::spoiler Code ```scala= package Pipeline import chisel3._ import chisel3.util._ class Branch extends Module { val io = IO(new Bundle { val fnct3 = Input(UInt(3.W)) val branch = Input(Bool()) val arg_x = Input(SInt(32.W)) val arg_y = Input(SInt(32.W)) val br_taken = Output(Bool()) }) io.br_taken := false.B when(io.branch) { // beq when(io.fnct3 === 0.U) { io.br_taken := io.arg_x === io.arg_y } // bne .elsewhen(io.fnct3 === 1.U) { io.br_taken := io.arg_x =/= io.arg_y } // blt .elsewhen(io.fnct3 === 4.U) { io.br_taken := io.arg_x < io.arg_y } // bge .elsewhen(io.fnct3 === 5.U) { io.br_taken := io.arg_x >= io.arg_y } // bltu (unsigned less than) .elsewhen(io.fnct3 === 6.U) { io.br_taken := io.arg_x.asUInt < io.arg_y.asUInt } // bgeu (unsigned greater than or equal) .elsewhen(io.fnct3 === 7.U) { io.br_taken := io.arg_x.asUInt >= io.arg_y.asUInt } } } ``` ::: #### Control Control signals are mapped directly from the `opcode`. The table below shows the mapping: | opcode | Instruction Type | | -------- | -------- | | 011 0011 | R-type | | 110 0011 | B-type | | 001 0011 | I-type | | 010 0011 | S-type | | 000 0011 | L-type | | 001 0111 | AUIPC | | 011 0111 | LUI | | 110 1111 | JAL | | 110 0111 | JALR | The control signals can then be set accordingly.
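The `Control` module matches on decimal opcode values (for example `51.U`), while the table above lists them in binary. The short sketch below, in plain Scala, converts each table row to its decimal value so the two can be compared. The object name `OpcodeTable` and its members are hypothetical helpers for illustration only.

```scala
// Hypothetical helper: converts the binary opcodes from the table to decimal.
object OpcodeTable {
  // Rows copied from the opcode table: binary opcode -> instruction type.
  val rows: Seq[(String, String)] = Seq(
    "011 0011" -> "R-type",
    "110 0011" -> "B-type",
    "001 0011" -> "I-type",
    "010 0011" -> "S-type",
    "000 0011" -> "L-type",
    "001 0111" -> "AUIPC",
    "011 0111" -> "LUI",
    "110 1111" -> "JAL",
    "110 0111" -> "JALR"
  )

  // Drop the cosmetic space and parse the remaining 7 bits as base 2.
  def toDecimal(bits: String): Int = Integer.parseInt(bits.replace(" ", ""), 2)

  def main(args: Array[String]): Unit =
    rows.foreach { case (bits, kind) => println(s"$kind: $bits = ${toDecimal(bits)}") }
}
```

Running `main` lists each instruction type next to the decimal literal that a `switch(io.opcode)` case would need.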
We set the control signals with Chisel's `switch` syntax: :::spoiler Code ```scala= package Pipeline import chisel3._ import chisel3.util._ class Control extends Module { val io = IO(new Bundle { val opcode = Input(UInt(7.W)) // 7-bit opcode val mem_write = Output(Bool()) // whether a write to memory val branch = Output(Bool()) // whether a branch instruction val mem_read = Output(Bool()) // whether a read from memory val reg_write = Output(Bool()) // whether a register write val men_to_reg = Output(Bool()) // whether the value written to a register comes from memory (for load instructions) val alu_operation = Output(UInt(3.W)) val operand_A = Output(UInt(2.W)) // Operand A source selection for the ALU val operand_B = Output(Bool()) // Operand B source selection for the ALU // Indicates the type of extension to be used (e.g., sign-extend, zero-extend) val extend = Output(UInt(2.W)) val next_pc_sel = Output(UInt(2.W)) // next PC value (e.g., PC+4, branch target, jump target) }) io.mem_write := 0.B io.branch := 0.B io.mem_read := 0.B io.reg_write := 0.B io.men_to_reg := 0.B io.alu_operation := 0.U io.operand_A := 0.U io.operand_B := 0.B io.extend := 0.U io.next_pc_sel := 0.U switch(io.opcode) { // R type instructions (e.g., add, sub) is(51.U) { io.mem_write := 0.B io.branch := 0.B io.mem_read := 0.B io.reg_write := 1.B io.men_to_reg := 0.B io.alu_operation := 0.U io.operand_A := 0.U io.operand_B := 0.B io.extend := 0.U io.next_pc_sel := 0.U } // I type instructions (e.g., immediate operations) is(19.U) { io.mem_write := 0.B io.branch := 0.B io.mem_read := 0.B io.reg_write := 1.B io.men_to_reg := 0.B io.alu_operation := 1.U io.operand_A := 0.U io.operand_B := 1.B io.extend := 0.U io.next_pc_sel := 0.U } // S type instructions (e.g., store operations) is(35.U) { io.mem_write := 1.B io.branch := 0.B io.mem_read := 0.B io.reg_write := 0.B io.men_to_reg := 0.B io.alu_operation := 5.U io.operand_A := 0.U io.operand_B := 1.B io.extend := 1.U io.next_pc_sel := 0.U } // Load instructions
(e.g., load data from memory) is(3.U) { io.mem_write := 0.B io.branch := 0.B io.mem_read := 1.B io.reg_write := 1.B io.men_to_reg := 1.B io.alu_operation := 4.U io.operand_A := 0.U io.operand_B := 1.B io.extend := 0.U io.next_pc_sel := 0.U } // SB type instructions (e.g., conditional branch) is(99.U) { io.mem_write := 0.B io.branch := 1.B io.mem_read := 0.B io.reg_write := 0.B io.men_to_reg := 0.B io.alu_operation := 2.U io.operand_A := 0.U io.operand_B := 0.B io.extend := 0.U io.next_pc_sel := 1.U } // UJ type instructions (e.g., jump and link) is(111.U) { io.mem_write := 0.B io.branch := 0.B io.mem_read := 0.B io.reg_write := 1.B io.men_to_reg := 0.B io.alu_operation := 3.U io.operand_A := 1.U io.operand_B := 0.B io.extend := 0.U io.next_pc_sel := 2.U } // Jalr instruction (e.g., jump and link register) is(103.U) { io.mem_write := 0.B io.branch := 0.B io.mem_read := 0.B io.reg_write := 1.B io.men_to_reg := 0.B io.alu_operation := 3.U io.operand_A := 1.U io.operand_B := 0.B io.extend := 0.U io.next_pc_sel := 3.U } // U type (LUI) instructions (e.g., load upper immediate) is(55.U) { io.mem_write := 0.B io.branch := 0.B io.mem_read := 0.B io.reg_write := 1.B io.men_to_reg := 0.B io.alu_operation := 6.U io.operand_A := 3.U io.operand_B := 1.B io.extend := 2.U io.next_pc_sel := 0.U } // U type (AUIPC) instructions (e.g., add immediate to PC) is(23.U) { io.mem_write := 0.B io.branch := 0.B io.mem_read := 0.B io.reg_write := 1.B io.men_to_reg := 0.B io.alu_operation := 7.U io.operand_A := 2.U io.operand_B := 1.B io.extend := 2.U io.next_pc_sel := 0.U } } } ``` ::: #### ImmGenerator ![image](https://hackmd.io/_uploads/rJxryPcLke.png =15%x) Different types of instructions need different ways to concatenate the immediate number. The module outputs the immediate number based on the input instruction. 
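To make the idea of "concatenating the immediate" concrete, here is a minimal software sketch in plain Scala that decodes the I-, S-, and B-type immediates from a 32-bit instruction word. It is an illustrative reference model, not the Chisel module: the names `ImmModel`, `iImm`, `sImm`, and `bImm` are hypothetical, and the model returns the raw immediate without adding the program counter.

```scala
// Hypothetical software model of immediate decoding (raw immediates, no PC added).
object ImmModel {
  // Extract bits hi..lo (inclusive) of a 32-bit word as an unsigned value.
  private def bits(inst: Int, hi: Int, lo: Int): Int =
    (inst >>> lo) & ((1 << (hi - lo + 1)) - 1)

  // I-type: inst[31:20]; the arithmetic shift performs the sign extension.
  def iImm(inst: Int): Int = inst >> 20

  // S-type: inst[31:25] (sign-extended) followed by inst[11:7].
  def sImm(inst: Int): Int = ((inst >> 25) << 5) | bits(inst, 11, 7)

  // B-type: inst[31], inst[7], inst[30:25], inst[11:8], then a constant 0 bit.
  def bImm(inst: Int): Int =
    ((inst >> 31) << 12) |
      (bits(inst, 7, 7) << 11) |
      (bits(inst, 30, 25) << 5) |
      (bits(inst, 11, 8) << 1)
}
```

For example, `addi x1, x0, -1` encodes as `0xFFF00093` and `beq x0, x0, 8` as `0x00000463`, so `ImmModel.iImm(0xFFF00093)` is `-1` and `ImmModel.bImm(0x00000463)` is `8`.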
- I-type <table style="text-align: center;"> <tr> <th>bits position</th> <th>31-25</th> <th>24-20</th> <th>19-15</th> <th>14-12</th> <th>11-7</th> <th>6-0</th> </tr> <tr> <td>I</td> <td colspan="2">imm[11:0]</td> <td>rs1</td> <td>funct3</td> <td>rd</td> <td>opcode</td> </tr> </table> For **I-type**, we extract `inst[31-20]` and sign-extend it from the MSB. ```scala= io.I_type := Cat(Fill(20, io.instr(31)), io.instr(31, 20)).asSInt ``` - S-Type <table style="text-align: center;"> <tr> <th>bits position</th> <th>31-25</th> <th>24-20</th> <th>19-15</th> <th>14-12</th> <th>11-7</th> <th>6-0</th> </tr> <tr> <td>S</td> <td >imm[11:5]</td> <td>rs2</td> <td>rs1</td> <td>funct3</td> <td>imm[4:0]</td> <td>opcode</td> </tr> </table> For **S-type**, we concatenate `inst[31-25]` with `inst[11-7]` and sign-extend from the MSB. ```scala= io.S_type := Cat(Fill(20, io.instr(31)), io.instr(31, 25), io.instr(11, 7)).asSInt ``` - Branch-Type <table style="text-align: center;"> <tr> <th>bits position</th> <th>31-25</th> <th>24-20</th> <th>19-15</th> <th>14-12</th> <th>11-7</th> <th>6-0</th> </tr> <tr> <td>B</td> <td>imm[12|10:5]</td> <td>rs2</td> <td>rs1</td> <td>funct3</td> <td>imm[4:1|11]</td> <td>opcode</td> </tr> </table> For **Branch-type**, we concatenate `inst[31]`, `inst[7]`, `inst[30-25]`, `inst[11-8]`, and a trailing 0, then sign-extend from the MSB. Furthermore, we also **add the program counter** to obtain the branch target. ```scala= val sbImm = Cat(Fill(19, io.instr(31)), io.instr(31), io.instr(7), io.instr(30, 25), io.instr(11, 8), 0.U(1.W)).asSInt io.SB_type := sbImm + io.pc.asSInt ``` - U-Type <table style="text-align: center;"> <tr> <th>bits position</th> <th>31-25</th> <th>24-20</th> <th>19-15</th> <th>14-12</th> <th>11-7</th> <th>6-0</th> </tr> <tr> <td>U</td> <td colspan="4">imm[31:12]</td> <td>rd</td> <td>opcode</td> </tr> </table> For **U-type**, we extract `inst[31-12]` and fill the rest of the bits with 0.
```scala= io.U_type := Cat(io.instr(31, 12), Fill(12, 0.U)).asSInt ``` - UJ-Type <table style="text-align: center;"> <tr> <th>bits position</th> <th>31-25</th> <th>24-20</th> <th>19-15</th> <th>14-12</th> <th>11-7</th> <th>6-0</th> </tr> <tr> <td>UJ</td> <td colspan="4">imm[20|10:1|11|19:12]</td> <td>rd</td> <td>opcode</td> </tr> </table> For **UJ-type**, we concatenate `inst[31]`, `inst[19-12]`, `inst[20]`, `inst[30-21]`, and a trailing 0, then sign-extend from the MSB. Furthermore, we also **add the program counter** to obtain the jump target. ```scala= val ujImm = Cat(Fill(11, io.instr(31)), io.instr(31), io.instr(19, 12), io.instr(20), io.instr(30, 21), 0.U(1.W)).asSInt io.UJ_type := ujImm + io.pc.asSInt ``` The complete code is shown below. :::spoiler Code ```scala= package Pipeline import chisel3._ import chisel3.util._ class ImmGenerator extends Module { val io = IO(new Bundle { val instr = Input(UInt(32.W)) val pc = Input(UInt(32.W)) val I_type = Output(SInt(32.W)) val S_type = Output(SInt(32.W)) val SB_type = Output(SInt(32.W)) val U_type = Output(SInt(32.W)) val UJ_type = Output(SInt(32.W)) }) // I-Type Immediate: [31:20] sign-extended to 32 bits io.I_type := Cat(Fill(20, io.instr(31)), io.instr(31, 20)).asSInt // S-Type Immediate: [31:25][11:7] sign-extended to 32 bits io.S_type := Cat(Fill(20, io.instr(31)), io.instr(31, 25), io.instr(11, 7)).asSInt // Branch-Type Immediate: [31][7][30:25][11:8] sign-extended to 32 bits val sbImm = Cat(Fill(19, io.instr(31)), io.instr(31), io.instr(7), io.instr(30, 25), io.instr(11, 8), 0.U(1.W)).asSInt io.SB_type := sbImm + io.pc.asSInt // U-Type Immediate: [31:12] shifted left by 12 bits io.U_type := Cat(io.instr(31, 12), Fill(12, 0.U)).asSInt // UJ-Type Immediate: [31][19:12][20][30:21] sign-extended to 32 bits, shifted left by 1 bit val ujImm = Cat(Fill(11, io.instr(31)), io.instr(31), io.instr(19, 12), io.instr(20), io.instr(30, 21), 0.U(1.W)).asSInt io.UJ_type := ujImm + io.pc.asSInt } ``` ::: #### JALR This part is not directly related to `the
five-stage pipelined datapath image`, but by writing this component independently, the circuit can be more understandable. The module implements the `JALR` instruction directly by adding generated `imm` and value from register to compute destination address ```scala= val computedAddr = io.imme + io.rdata1 ``` then we align the address by bitwise operation. ```scala= io.out := computedAddr & "hFFFFFFFE".U ``` The complete code is shown below. :::spoiler Code ```scala! package Pipeline import chisel3._ import chisel3.util._ class Jalr extends Module { val io = IO(new Bundle { val imme = Input(UInt(32.W)) val rdata1 = Input(UInt(32.W)) val out = Output(UInt(32.W)) }) val computedAddr = io.imme + io.rdata1 // Align the address by masking the least significant bit (LSB) to 0 io.out := computedAddr & "hFFFFFFFE".U } ``` ::: #### PC & PC4 ![image](https://hackmd.io/_uploads/Hy3PnT5UJe.png =30%x) The `PC` and `PC4` modules share a similar structure in terms of handling input and output, but their functionality differs slightly based on whether the input is incremented by 4. PC: ```scala= package Pipeline import chisel3._ import chisel3.util._ class PC extends Module { val io = IO (new Bundle { val in = Input(SInt(32.W)) val out = Output(SInt(32.W)) }) val PC = RegInit(0.S(32.W)) io.out := PC PC := io.in } ``` PC4: ```scala= package Pipeline import chisel3._ import chisel3.util._ class PC4 extends Module { val io = IO (new Bundle { val pc = Input(UInt(32.W)) val out = Output(UInt(32.W)) }) io.out := 0.U io.out := io.pc + 4.U(32.W) } ``` #### RegisterFile ![image](https://hackmd.io/_uploads/ryFi4p9I1e.png =25%x) For the register file, we use `RegInit` together with `VecInit` to create 32 registers, each initialized to 0. ```scala= val regfile = RegInit(VecInit(Seq.fill(32)(0.S(32.W)))) ``` `RegisterFile` module accepts two register addresses for reading and one address for writing. When reading a register, if the address is 0, the output is always 0. 
Otherwise, it outputs the data stored at the specified address. ```scala= io.rdata1 := Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1)) io.rdata2 := Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2)) ``` When writing to a register, the module first checks the write enable signal (`reg_write`) and ensures the target address is not 0. If both conditions are met, the data is written to the specified register. ```scala= when(io.reg_write && io.w_reg =/= 0.U) { regfile(io.w_reg) := io.w_data } ``` The complete code is shown below. :::spoiler Code ```scala= package Pipeline import chisel3._ import chisel3.util._ class RegisterFile extends Module { val io = IO(new Bundle { val rs1 = Input(UInt(5.W)) val rs2 = Input(UInt(5.W)) val reg_write = Input(Bool()) val w_reg = Input(UInt(5.W)) val w_data = Input(SInt(32.W)) val rdata1 = Output(SInt(32.W)) val rdata2 = Output(SInt(32.W)) }) val regfile = RegInit(VecInit(Seq.fill(32)(0.S(32.W)))) io.rdata1 := Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1)) io.rdata2 := Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2)) when(io.reg_write && io.w_reg =/= 0.U) { regfile(io.w_reg) := io.w_data } } ``` ::: --- ### Hazard Units #### HazardDetection This module is combinational logic that decides whether forwarding is needed. `IF_ID_inst`, `ID_EX_memRead`, and `ID_EX_rd` are inputs from the pipeline registers; `pc_in` and `current_pc` play no part in the forwarding decision and are simply passed through.
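The decision this module makes can be sketched as a small software model in plain Scala. It is a hedged illustration only: the object `LoadUseModel` and its function names are hypothetical, and the register fields are taken from the standard RISC-V positions `inst[19:15]` (`rs1`) and `inst[24:20]` (`rs2`).

```scala
// Hypothetical software model of the load-use check in HazardDetection.
object LoadUseModel {
  // Source register fields of the instruction held in the IF/ID register.
  def rs1(inst: Int): Int = (inst >>> 15) & 0x1F
  def rs2(inst: Int): Int = (inst >>> 20) & 0x1F

  // True when the instruction in ID/EX is a load whose destination register
  // is read by the instruction currently in IF/ID.
  def hazard(ifIdInst: Int, idExMemRead: Boolean, idExRd: Int): Boolean =
    idExMemRead && (idExRd == rs1(ifIdInst) || idExRd == rs2(ifIdInst))
}
```

For instance, `add s0, t0, s1` encodes as `0x00928433`; if the preceding instruction is a load into `t0` (register 5), `LoadUseModel.hazard(0x00928433, true, 5)` evaluates to `true`.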
```scala= val io = IO(new Bundle { val IF_ID_inst = Input(UInt(32.W)) val ID_EX_memRead = Input(Bool()) val ID_EX_rd = Input(UInt(5.W)) val pc_in = Input(SInt(32.W)) val current_pc = Input(SInt(32.W)) val inst_forward = Output(Bool()) val pc_forward = Output(Bool()) val ctrl_forward = Output(Bool()) val inst_out = Output(UInt(32.W)) val pc_out = Output(SInt(32.W)) val current_pc_out = Output(SInt(32.W)) }) ``` If an L-type (load) instruction (i.e., `io.ID_EX_memRead === 1.B`) is followed by an R-type, S-type or B-type instruction, and their registers overlap (i.e., `((io.ID_EX_rd === Rs1) || (io.ID_EX_rd === Rs2))`), we forward the instruction, the PC and certain control signals. For example: ```asm lw t0, 0(t1) # instruction 1 add s0, t0, s1 # instruction 2 ``` In this scenario, we set the signals `inst_forward`, `pc_forward` and `ctrl_forward` to `true`; otherwise they are `false`. ```scala= when(io.ID_EX_memRead === 1.B && ((io.ID_EX_rd === Rs1) || (io.ID_EX_rd === Rs2))) { io.inst_forward := true.B io.pc_forward := true.B io.ctrl_forward := true.B }.otherwise { io.inst_forward := false.B io.pc_forward := false.B io.ctrl_forward := false.B } ``` The instruction and PC values are then passed through the module unchanged: ```scala= io.inst_out := io.IF_ID_inst io.pc_out := io.pc_in io.current_pc_out := io.current_pc ``` The implementation is shown below: :::spoiler Code ```scala= package Pipeline import chisel3._ import chisel3.util._ class HazardDetection extends Module { val io = IO(new Bundle { val IF_ID_inst = Input(UInt(32.W)) val ID_EX_memRead = Input(Bool()) val ID_EX_rd = Input(UInt(5.W)) val pc_in = Input(SInt(32.W)) val current_pc = Input(SInt(32.W)) val inst_forward = Output(Bool()) val pc_forward = Output(Bool()) val ctrl_forward = Output(Bool()) val inst_out = Output(UInt(32.W)) val pc_out = Output(SInt(32.W)) val current_pc_out = Output(SInt(32.W)) }) val Rs1 = io.IF_ID_inst(19, 15) val Rs2 = io.IF_ID_inst(24, 20) when(io.ID_EX_memRead === 1.B && ((io.ID_EX_rd === Rs1) || (io.ID_EX_rd === Rs2))) { io.inst_forward :=
true.B io.pc_forward := true.B io.ctrl_forward := true.B }.otherwise { io.inst_forward := false.B io.pc_forward := false.B io.ctrl_forward := false.B } io.inst_out := io.IF_ID_inst io.pc_out := io.pc_in io.current_pc_out := io.current_pc } ``` ::: #### BranchForward This module handles hazards involving B-type instructions. We first explain the meaning of the IO ports, then the logic of the code. | Name | I/O | Meaning | | -------- | -------- | -------- | | `ID_EX_RD` | Input | Index of the destination register for the instruction in the `ID/EX` stage. | | `EX_MEM_RD` | Input | Index of the destination register for the instruction in the `EX/MEM` stage. | | `MEM_WB_RD` | Input | Index of the destination register for the instruction in the `MEM/WB` stage. | | `ID_EX_memRd` | Input | Whether the instruction in the `ID/EX` stage is `L-type` | | `EX_MEM_memRd` | Input | Whether the instruction in the `EX/MEM` stage is `L-type` | | `MEM_WB_memRd` | Input | Whether the instruction in the `MEM/WB` stage is `L-type` | | `rs1` | Input | `rs1` register of the `B-type` instruction | | `rs2` | Input | `rs2` register of the `B-type` instruction | | `ctrl_branch` | Input | Whether this instruction is `B-type` | | `forward_rs1` | Output | Where `rs1` comes from | | `forward_rs2` | Output | Where `rs2` comes from | This module handles $4$ types of hazards, each explained with an example: - `B-type`, ALU hazard Data needs to be forwarded right after ALU execution to prevent the hazard.
```asm= add x3, x1, x2 beq x3, x4, label ``` The circuit handling this situation: ```scala= when(io.ID_EX_RD =/= 0.U && io.ID_EX_memRd =/= 1.U) { when(io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2) { io.forward_rs1 := "b0001".U io.forward_rs2 := "b0001".U }.elsewhen(io.ID_EX_RD === io.rs1) { io.forward_rs1 := "b0001".U }.elsewhen(io.ID_EX_RD === io.rs2) { io.forward_rs2 := "b0001".U } } ``` Here `io.forward_rs1 := "b0001".U` indicates that data will be forwarded from the `EX/MEM` register. - `B-type`, EX/MEM Hazard Data fetched from memory needs to be forwarded. ```asm= lw x3, 0(x1) beq x3, x2, label ``` The circuit handling this situation: ```scala= // EX/MEM Hazard when(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd =/= 1.U) { when(io.EX_MEM_RD === io.rs1 && io.EX_MEM_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2)) { io.forward_rs1 := "b0010".U io.forward_rs2 := "b0010".U }.elsewhen(io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) { io.forward_rs1 := "b0010".U }.elsewhen(io.EX_MEM_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs2)) { io.forward_rs2 := "b0010".U } } ``` Here, `io.forward_rs1 := "b0010".U` means data is forwarded from the `MEM/WB` stage.
- `B-type`, MEM/WB Hazard The circuit for this part is similar to the `B-type`, EX/MEM Hazard part: ```asm= addi x1, x2, 42 beq x1, x4, label ``` and the corresponding code: ```scala= // MEM/WB Hazard when(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd =/= 1.U) { when(io.MEM_WB_RD === io.rs1 && io.MEM_WB_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1 && io.EX_MEM_RD === io.rs2)) { io.forward_rs1 := "b0011".U io.forward_rs2 := "b0011".U }.elsewhen(io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) { io.forward_rs1 := "b0011".U }.elsewhen(io.MEM_WB_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs2) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs2)) { io.forward_rs2 := "b0011".U } } ``` Here, `io.forward_rs1 := "b0011".U` means data will be forwarded from the `WB` stage. - Jalr forwarding logic Unconditional jumps cause hazards as well. Here is an example: ```asm= lw x1, 0(x2) jalr x2, x1, 42 # target address depends on last instruction ``` Different types of instructions can all cause this hazard, so `forward_rs1` must be set accordingly.
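The three B-type cases above form a priority chain: the youngest producer wins, and a match in an earlier stage suppresses the later ones. A minimal software sketch of that priority for `rs1`, in plain Scala, is given below. It is an illustrative model written from the conditions shown in the snippets, with `ctrl_branch` assumed to be 1; `Stage`, `BranchForwardModel`, and `forwardRs1` are hypothetical names.

```scala
// Hypothetical software model of forward_rs1 for a B-type instruction.
// rd is the destination register index; isLoad mirrors the *_memRd flag.
final case class Stage(rd: Int, isLoad: Boolean)

object BranchForwardModel {
  def forwardRs1(rs1: Int, idEx: Stage, exMem: Stage, memWb: Stage): Int = {
    // A stage "hits" when it writes a non-zero register equal to rs1.
    def hit(s: Stage): Boolean = s.rd != 0 && s.rd == rs1

    if (hit(idEx) && !idEx.isLoad) 1                                    // ALU hazard
    else if (hit(exMem) && !exMem.isLoad && !hit(idEx)) 2               // EX/MEM hazard
    else if (hit(memWb) && !memWb.isLoad && !hit(idEx) && !hit(exMem)) 3 // MEM/WB hazard
    else 0                                                              // no forwarding
  }
}
```

Note that, as in the conditions above, a load in an earlier stage that matches `rs1` still suppresses the later stages, so the model returns 0 in that case.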
The following table shows what each value of `forward_rs1` means: | Value of `forward_rs1` | Type | Where data are forwarded from | | -------- | -------- | -------- | | 0001 | ALU Hazard | `ID/EX` | | 0010 | EX/MEM Hazard | `EX/MEM` | | 0011 | MEM/WB Hazard | `MEM/WB` | | 0110 | JALR | `ID/EX` | | 0111 | JALR | `EX/MEM` | | 1001 | JALR | `EX/MEM` | | 1000 | JALR | `MEM/WB` | | 1010 | JALR | `MEM/WB` | The full implementation: :::spoiler Code ```scala= package Pipeline import chisel3._ import chisel3.util._ class BranchForward extends Module { val io = IO(new Bundle { val ID_EX_RD = Input(UInt(5.W)) val EX_MEM_RD = Input(UInt(5.W)) val MEM_WB_RD = Input(UInt(5.W)) val ID_EX_memRd = Input(UInt(1.W)) val EX_MEM_memRd = Input(UInt(1.W)) val MEM_WB_memRd = Input(UInt(1.W)) val rs1 = Input(UInt(5.W)) val rs2 = Input(UInt(5.W)) val ctrl_branch = Input(UInt(1.W)) val forward_rs1 = Output(UInt(4.W)) val forward_rs2 = Output(UInt(4.W)) }) io.forward_rs1 := "b0000".U io.forward_rs2 := "b0000".U // Branch forwarding logic when(io.ctrl_branch === 1.U) { // ALU Hazard when(io.ID_EX_RD =/= 0.U && io.ID_EX_memRd =/= 1.U) { when(io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2) { io.forward_rs1 := "b0001".U io.forward_rs2 := "b0001".U }.elsewhen(io.ID_EX_RD === io.rs1) { io.forward_rs1 := "b0001".U }.elsewhen(io.ID_EX_RD === io.rs2) { io.forward_rs2 := "b0001".U } } // EX/MEM Hazard when(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd =/= 1.U) { when(io.EX_MEM_RD === io.rs1 && io.EX_MEM_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2)) { io.forward_rs1 := "b0010".U io.forward_rs2 := "b0010".U }.elsewhen(io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) { io.forward_rs1 := "b0010".U }.elsewhen(io.EX_MEM_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs2)) { io.forward_rs2 := "b0010".U } } // MEM/WB Hazard when(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd =/= 1.U) { when(io.MEM_WB_RD === io.rs1 && io.MEM_WB_RD
=== io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1 && io.EX_MEM_RD === io.rs2)) { io.forward_rs1 := "b0011".U io.forward_rs2 := "b0011".U }.elsewhen(io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) { io.forward_rs1 := "b0011".U }.elsewhen(io.MEM_WB_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs2) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs2)) { io.forward_rs2 := "b0011".U } } // Jalr forwarding logic }.elsewhen(io.ctrl_branch === 0.U) { when(io.ID_EX_RD =/= 0.U && io.ID_EX_memRd =/= 1.U && io.ID_EX_RD === io.rs1) { io.forward_rs1 := "b0110".U }.elsewhen(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd =/= 1.U && io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) { io.forward_rs1 := "b0111".U }.elsewhen(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd === 1.U && io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) { io.forward_rs1 := "b1001".U }.elsewhen(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd =/= 1.U && io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) { io.forward_rs1 := "b1000".U }.elsewhen(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd === 1.U && io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) { io.forward_rs1 := "b1010".U } } } ``` ::: #### Forwarding The module `Forwarding.scala` handles `data hazard`s. It covers 2 types of hazards, each explained with an example: - EX Hazard The EX Hazard is shown below: ```asm= add x3, x1, x2 sub x4, x3, x5 ``` For the first instruction `add`, the result needs to be stored in `x3`. However, `x3` is also required as an input for the `sub` instruction in the next line. This situation creates an **EX Hazard**.
The situation is detected by ```scala= io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs1.asUInt) && (io.EXMEM_rd === io.IDEX_rs2) ``` 1. `io.EXMEM_regWr === "b1".U` is used to detect whether the write-back signal is active, which in this case means that the result of `add` will be written back to `x3`. 2. Then, `io.EXMEM_rd =/= "b00000".U` ensures that the target register is valid (not `x0`, which always holds the value 0 in the RISC-V architecture). 3. Finally, the destination register being written is compared with the source registers of the next instruction; in our case, `x3` (in `add`) is compared with `x3` and `x5` (in `sub`). This logic ensures that the hazard is detected and sends the signal `b10`, allowing the processor to forward the result of `x3` directly from the `EX/MEM stage` to the next instruction before it is written back to the register file. The complete code for handling the `EX hazard` is: :::spoiler Code ```scala= // EX HAZARD when(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs1.asUInt) && (io.EXMEM_rd === io.IDEX_rs2)) { io.forward_a := "b10".U io.forward_b := "b10".U }.elsewhen(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs2)) { io.forward_b := "b10".U }.elsewhen(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs1)) { io.forward_a := "b10".U } ``` ::: - MEM Hazard The MEM Hazard is shown below: ```asm= lw x3, 0(x1) sub x4, x3, x5 ``` For the first instruction `lw`, the result needs to be stored in `x3`. However, the next instruction `sub` also requires the data in `x3`. This situation creates a **MEM Hazard**. The situation is detected by ```scala= (io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) && (io.MEMWB_rd === io.IDEX_rs1) && (io.MEMWB_rd === io.IDEX_rs2) && ~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs1) && (io.EXMEM_rd === io.IDEX_rs2)) ``` 1.
`(io.MEMWB_regWr === "b1".U)` is used to detect whether the write-back signal is active, which in this case means that the result of `lw` will be written back to `x3`. 2. Then, `io.MEMWB_rd =/= "b00000".U` ensures that the target register is valid (not `x0`, which always holds the value 0 in the RISC-V architecture). 3. `(io.MEMWB_rd === io.IDEX_rs1) && (io.MEMWB_rd === io.IDEX_rs2)` compares the destination register being written with the source registers of the next instruction; in our case, `x3` (in `lw`) is compared with `x3` and `x5` (in `sub`). 4. Finally, exclude the `EX Hazard` case, which takes priority: ```scala= ~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs1) && (io.EXMEM_rd === io.IDEX_rs2)) ``` This logic ensures that the hazard is detected and sends the signal `b01`, allowing the processor to forward the result of `x3` directly from the `MEM/WB` stage to the next instruction, even before it is written back to the register file. #### StructuralHazard Most hazards are already handled by the modules above, so this module may be redundant. --- ### Main Once we have completed all the modules, we can integrate them into a system. First, we instantiate every module.
```scala= // Pipes of stages val IF_ID_ = Module(new IF_ID) val ID_EX_ = Module(new ID_EX) val EX_MEM_M = Module(new EX_MEM) val MEM_WB_M = Module(new MEM_WB) // PC / PC+4 val PC = Module(new PC) val PC4 = Module(new PC4) // Memory val InstMemory = Module(new InstMem ("./src/riscv/test.hex")) val DataMemory = Module(new DataMemory) // Helping Units val control_module = Module(new Control) val ImmGen = Module(new ImmGenerator) val RegFile = Module(new RegisterFile) val ALU_Control = Module(new AluControl) dontTouch(ALU_Control.io) val ALU = Module(new ALU) dontTouch(ALU.io) val Branch_M = Module(new Branch) val JALR = Module(new Jalr) // hazard units val Forwarding = Module(new Forwarding) val HazardDetect = Module(new HazardDetection) val Branch_Forward = Module(new BranchForward) val Structural = Module(new StructuralHazard) ``` Notably, we use `dontTouch` to prevent Chisel from removing the signals during optimization. Ref : [DontTouch](https://www.chisel-lang.org/api/latest/chisel3/dontTouch$.html) #### 1. Fetch To fetch the instruction, we select the correct program counter (PC) based on the signal from **the HazardDetect module**. ```scala= val PC_F = MuxLookup (HazardDetect.io.pc_forward, 0.S, Array ( (0.U) -> PC4.io.out.asSInt, (1.U) -> HazardDetect.io.pc_out)) ``` - If no hazard is detected, the `PC+4` value from the **PC4 module** is selected. - If a hazard is detected, the result from the **HazardDetect module** is chosen. Then, update the current program counter (PC) by `PC.io.in := PC_F`, update the next PC by `PC4.io.pc := PC.io.out.asUInt` and fetch the instruction by `InstMemory.io.addr := PC.io.out.asUInt`. ```scala= PC.io.in := PC_F // PC_in input PC4.io.pc := PC.io.out.asUInt // PC4_in input <- PC_out InstMemory.io.addr := PC.io.out.asUInt // Address to fetch instruction ``` For PC and instruction forwarding, we choose the instruction and PC using the **HazardDetect module**.
```scala= val PC_for = MuxLookup (HazardDetect.io.inst_forward, 0.S, Array ( (0.U) -> PC.io.out, (1.U) -> HazardDetect.io.current_pc_out)) val Instruction_F = MuxLookup (HazardDetect.io.inst_forward, 0.U, Array ( (0.U) -> InstMemory.io.data, (1.U) -> HazardDetect.io.inst_out)) ``` Finally, we pass the PC and instruction to the pipeline register (i.e., the IF_ID module), where - `PC.io.out` represents the **current PC**, - `PC4.io.out` represents the **next PC (i.e., PC + 4)**, - `PC_for` represents the **correct PC** selected by the **HazardDetect module**. ```scala= IF_ID_.io.pc_in := PC.io.out // PC out from pc IF_ID_.io.pc4_in := PC4.io.out // PC4 out from pc4 IF_ID_.io.SelectedPC := PC_for // Selected PC IF_ID_.io.SelectedInstr := Instruction_F // Selected Instruction ``` #### 2. Decode First, we pass the selected instruction and PC into the **ImmGenerator module** and forward the opcode (i.e., inst[6,0]) into the **control module**. ```scala= ImmGen.io.instr := IF_ID_.io.SelectedInstr_out // Instruction used to generate the 32-bit immediate value ImmGen.io.pc := IF_ID_.io.SelectedPC_out.asUInt // PC to add // Decode connections (Control unit RegFile) control_module.io.opcode := IF_ID_.io.SelectedInstr_out(6, 0) // Opcode used to check the instruction type ``` If the type of the instruction is `R-type` (opcode = 51), `I-type` (opcode = 19), `S-type` (opcode = 35), `I-type (load instructions)` (opcode = 3), `SB-type (branch)` (opcode = 99), or `JALR` (opcode = 103), we decode `rs1` (i.e., inst[19,15]).
```scala= RegFile.io.rs1 := Mux( control_module.io.opcode === 51.U || // R-type control_module.io.opcode === 19.U || // I-type control_module.io.opcode === 35.U || // S-type control_module.io.opcode === 3.U || // I-type (load instructions) control_module.io.opcode === 99.U || // SB-type (branch) control_module.io.opcode === 103.U, // JALR instruction IF_ID_.io.SelectedInstr_out(19, 15), 0.U ) ``` Similarly, we decode `rs2` (i.e., inst[24,20]) if the type of the instruction is `R-type` (opcode = 51), `S-type` (opcode = 35), or `SB-type (branch)` (opcode = 99). ```scala= RegFile.io.rs2 := Mux(control_module.io.opcode === 51.U || // R-type control_module.io.opcode === 35.U || // S-type control_module.io.opcode === 99.U, // SB-type (branch) IF_ID_.io.SelectedInstr_out(24, 20), 0.U) ``` Then, we drive the write signal with `RegFile.io.reg_write := control_module.io.reg_write`. Finally, we select the immediate value generated by the **ImmGenerator module**. ```scala= val ImmValue = MuxLookup (control_module.io.extend, 0.S, Array ( (0.U) -> ImmGen.io.I_type, (1.U) -> ImmGen.io.S_type, (2.U) -> ImmGen.io.U_type)) ``` For handling **Structural Hazards**, we extract `rs1` and `rs2` from the `IF_ID instruction` and pass them to the **Structural module**. This allows the module to detect whether the stage requires the data. ```scala= Structural.io.rs1 := IF_ID_.io.SelectedInstr_out(19, 15) Structural.io.rs2 := IF_ID_.io.SelectedInstr_out(24, 20) ``` It also receives the forwarding information from the `MEM_WB register`. ```scala= Structural.io.MEM_WB_regWr := MEM_WB_M.io.EXMEM_REG_W Structural.io.MEM_WB_Rd := MEM_WB_M.io.MEMWB_rd_out ``` Then we decide whether the data needs to be forwarded, based on the **Structural module** signals.
```scala=
// rs1_data
when (Structural.io.fwd_rs1 === 0.U) {
  S_rs1DataIn := RegFile.io.rdata1
}.elsewhen (Structural.io.fwd_rs1 === 1.U) {
  S_rs1DataIn := RegFile.io.w_data
}.otherwise {
  S_rs1DataIn := 0.S
}

// rs2_data
when (Structural.io.fwd_rs2 === 0.U) {
  S_rs2DataIn := RegFile.io.rdata2
}.elsewhen (Structural.io.fwd_rs2 === 1.U) {
  S_rs2DataIn := RegFile.io.w_data
}.otherwise {
  S_rs2DataIn := 0.S
}

//ID_EX_ inputs
ID_EX_.io.rs1_data_in := S_rs1DataIn
ID_EX_.io.rs2_data_in := S_rs2DataIn
```

To detect hazards, we pass the data from the `IF_ID register` and the `ID_EX register` into the **HazardDetect module**.

```scala=
// Hazard detection Unit inputs
HazardDetect.io.IF_ID_inst := IF_ID_.io.SelectedInstr_out
HazardDetect.io.ID_EX_memRead := ID_EX_.io.ctrl_MemRd_out
HazardDetect.io.ID_EX_rd := ID_EX_.io.rd_out
HazardDetect.io.pc_in := IF_ID_.io.pc4_out.asSInt
HazardDetect.io.current_pc := IF_ID_.io.SelectedPC_out
```

If a stall is needed (as detected by the **HazardDetect module**), we insert a bubble by setting all the control signals to 0.
```scala=
// Stall when forward
when(HazardDetect.io.ctrl_forward === "b1".U) {
  ID_EX_.io.ctrl_MemWr_in := 0.U
  ID_EX_.io.ctrl_MemRd_in := 0.U
  ID_EX_.io.ctrl_MemToReg_in := 0.U
  ID_EX_.io.ctrl_Reg_W_in := 0.U
  ID_EX_.io.ctrl_AluOp_in := 0.U
  ID_EX_.io.ctrl_OpB_in := 0.U
  ID_EX_.io.ctrl_Branch_in := 0.U
  ID_EX_.io.ctrl_nextpc_in := 0.U
}.otherwise {
  ID_EX_.io.ctrl_MemWr_in := control_module.io.mem_write
  ID_EX_.io.ctrl_MemRd_in := control_module.io.mem_read
  ID_EX_.io.ctrl_MemToReg_in := control_module.io.men_to_reg
  ID_EX_.io.ctrl_Reg_W_in := control_module.io.reg_write
  ID_EX_.io.ctrl_AluOp_in := control_module.io.alu_operation
  ID_EX_.io.ctrl_OpB_in := control_module.io.operand_B
  ID_EX_.io.ctrl_Branch_in := control_module.io.branch
  ID_EX_.io.ctrl_nextpc_in := control_module.io.next_pc_sel
}
```

Before the data reaches the `ID_EX register`, the `branch` and `jalr` operations are handled separately; they are explained in the next section. The remaining decode outputs are passed into the `ID_EX register`.

```scala=
// ID_EX PIPELINE
ID_EX_.io.rs1_in := RegFile.io.rs1
ID_EX_.io.rs2_in := RegFile.io.rs2
ID_EX_.io.imm := ImmValue
ID_EX_.io.func3_in := IF_ID_.io.SelectedInstr_out(14, 12)
ID_EX_.io.func7_in := IF_ID_.io.SelectedInstr_out(30)
ID_EX_.io.rd_in := IF_ID_.io.SelectedInstr_out(11, 7)
ID_EX_.io.ctrl_OpA_in := control_module.io.operand_A // Operand A selection
ID_EX_.io.IFID_pc4_in := IF_ID_.io.pc4_out           // pc+4 from Decode to execute
```

#### 3. Branch and JALR

For Branch and JALR, we pass the data from `IF_ID`, `ID_EX`, `EX_MEM`, and `MEM_WB` into the **BranchForward module** to determine where the values of `rs1` and `rs2` should come from.
```scala=
Branch_Forward.io.ID_EX_RD := ID_EX_.io.rd_out
Branch_Forward.io.EX_MEM_RD := EX_MEM_M.io.EXMEM_rd_out
Branch_Forward.io.MEM_WB_RD := MEM_WB_M.io.MEMWB_rd_out
Branch_Forward.io.ID_EX_memRd := ID_EX_.io.ctrl_MemRd_out
Branch_Forward.io.EX_MEM_memRd := EX_MEM_M.io.EXMEM_memRd_out
Branch_Forward.io.MEM_WB_memRd := MEM_WB_M.io.MEMWB_memRd_out
Branch_Forward.io.rs1 := IF_ID_.io.SelectedInstr_out(19, 15)
Branch_Forward.io.rs2 := IF_ID_.io.SelectedInstr_out(24, 20)
Branch_Forward.io.ctrl_branch := control_module.io.branch
```

We use the **BranchForward module** outputs to select the forwarded `rs1` and `rs2` values and pass them into the **Branch module**, which decides whether the branch is taken.

```scala=
// Branch X
Branch_M.io.arg_x := MuxLookup (Branch_Forward.io.forward_rs1, 0.S, Array (
  (0.U) -> RegFile.io.rdata1,
  (1.U) -> ALU.io.out,
  (2.U) -> EX_MEM_M.io.EXMEM_alu_out,
  (3.U) -> RegFile.io.w_data,
  (4.U) -> DataMemory.io.dataOut,
  (5.U) -> RegFile.io.w_data,
  (6.U) -> RegFile.io.rdata1,
  (7.U) -> RegFile.io.rdata1,
  (8.U) -> RegFile.io.rdata1,
  (9.U) -> RegFile.io.rdata1,
  (10.U) -> RegFile.io.rdata1))

// Branch Y
Branch_M.io.arg_y := MuxLookup (Branch_Forward.io.forward_rs2, 0.S, Array (
  (0.U) -> RegFile.io.rdata2,
  (1.U) -> ALU.io.out,
  (2.U) -> EX_MEM_M.io.EXMEM_alu_out,
  (3.U) -> RegFile.io.w_data,
  (4.U) -> DataMemory.io.dataOut,
  (5.U) -> RegFile.io.w_data))

Branch_M.io.fnct3 := IF_ID_.io.SelectedInstr_out(14, 12) // Funct3 for (beq, bne, ...)
Branch_M.io.branch := control_module.io.branch           // Branch instruction: yes/no
```

We also use the **JALR module** to compute the jump target; its `rs1` input goes through the same forwarding selection.
```scala=
// for JALR
JALR.io.rdata1 := MuxLookup (Branch_Forward.io.forward_rs1, 0.U, Array (
  (0.U) -> RegFile.io.rdata1.asUInt,
  (1.U) -> RegFile.io.rdata1.asUInt,
  (2.U) -> RegFile.io.rdata1.asUInt,
  (3.U) -> RegFile.io.rdata1.asUInt,
  (4.U) -> RegFile.io.rdata1.asUInt,
  (5.U) -> RegFile.io.rdata1.asUInt,
  (6.U) -> ALU.io.out.asUInt,
  (7.U) -> EX_MEM_M.io.EXMEM_alu_out.asUInt,
  (8.U) -> RegFile.io.w_data.asUInt,
  (9.U) -> DataMemory.io.dataOut.asUInt,
  (10.U) -> RegFile.io.w_data.asUInt))

JALR.io.imme := ImmValue.asUInt
```

The **JALR module** should output the target address with correct alignment.

Finally, the PC is updated according to the **HazardDetect module** and the **Control module**. When a branch is taken or a jump occurs, the inputs of the IF_ID register are also cleared so that the already fetched instruction is flushed.

```scala=
when(HazardDetect.io.pc_forward === 1.B) {
  PC.io.in := HazardDetect.io.pc_out
}.otherwise {
  when(control_module.io.next_pc_sel === "b01".U) {
    when(Branch_M.io.br_taken === 1.B && control_module.io.branch === 1.B) {
      PC.io.in := ImmGen.io.SB_type
      IF_ID_.io.pc_in := 0.S
      IF_ID_.io.pc4_in := 0.U
      IF_ID_.io.SelectedPC := 0.S
      IF_ID_.io.SelectedInstr := 0.U
    }.otherwise {
      PC.io.in := PC4.io.out.asSInt
    }
  }.elsewhen(control_module.io.next_pc_sel === "b10".U) {
    PC.io.in := ImmGen.io.UJ_type
    IF_ID_.io.pc_in := 0.S
    IF_ID_.io.pc4_in := 0.U
    IF_ID_.io.SelectedPC := 0.S
    IF_ID_.io.SelectedInstr := 0.U
  }.elsewhen(control_module.io.next_pc_sel === "b11".U) {
    PC.io.in := JALR.io.out.asSInt
    IF_ID_.io.pc_in := 0.S
    IF_ID_.io.pc4_in := 0.U
    IF_ID_.io.SelectedPC := 0.S
    IF_ID_.io.SelectedInstr := 0.U
  }.otherwise {
    PC.io.in := PC4.io.out.asSInt
  }
}
```

#### 4. ALU

So far, we have completed the functions for `IF`, `ID`, and `branch & jalr`. The register operands are ready for calculation, and the PC is correct.

First, we pass the ALU operation code together with `funct3` and `funct7` into the **ALU control module**, whose output drives the **ALU module**.
```scala=
ALU_Control.io.aluOp := ID_EX_.io.ctrl_AluOp_out // Alu op code
ALU.io.alu_Op := ALU_Control.io.out              // Alu op code
ALU_Control.io.func3 := ID_EX_.io.func3_out      // function 3
ALU_Control.io.func7 := ID_EX_.io.func7_out      // function 7
```

Because operands may have to be forwarded in the EX stage, we pass the register indices and write-enable signals into the **Forwarding module**.

```scala=
Forwarding.io.IDEX_rs1 := ID_EX_.io.rs1_out
Forwarding.io.IDEX_rs2 := ID_EX_.io.rs2_out
Forwarding.io.EXMEM_rd := EX_MEM_M.io.EXMEM_rd_out
Forwarding.io.EXMEM_regWr := EX_MEM_M.io.EXMEM_reg_w_out
Forwarding.io.MEMWB_rd := MEM_WB_M.io.MEMWB_rd_out
Forwarding.io.MEMWB_regWr := MEM_WB_M.io.MEMWB_reg_w_out
```

The Forwarding module outputs then select the values passed into the ALU. Here `d` is the write-back data wire, which is assigned in the write-back stage.

```scala=
when (ID_EX_.io.ctrl_OpA_out === "b01".U) {
  ALU.io.in_A := ID_EX_.io.IFID_pc4_out.asSInt
}.otherwise {
  // forwarding A
  when(Forwarding.io.forward_a === "b00".U) {
    ALU.io.in_A := ID_EX_.io.rs1_data_out
  }.elsewhen(Forwarding.io.forward_a === "b01".U) {
    ALU.io.in_A := d
  }.elsewhen(Forwarding.io.forward_a === "b10".U) {
    ALU.io.in_A := EX_MEM_M.io.EXMEM_alu_out
  }.otherwise {
    ALU.io.in_A := ID_EX_.io.rs1_data_out
  }
}
```

```scala=
// forwarding B
val RS2_value = Wire(SInt(32.W))
when (Forwarding.io.forward_b === 0.U) {
  RS2_value := ID_EX_.io.rs2_data_out
}.elsewhen (Forwarding.io.forward_b === 1.U) {
  RS2_value := d
}.elsewhen (Forwarding.io.forward_b === 2.U) {
  RS2_value := EX_MEM_M.io.EXMEM_alu_out
}.otherwise {
  RS2_value := 0.S
}

when (ID_EX_.io.ctrl_OpB_out === 0.U) {
  ALU.io.in_B := RS2_value
}.otherwise {
  ALU.io.in_B := ID_EX_.io.imm_out
}
```

Then we pass the results into the EX_MEM register.

```scala=
// Execute
EX_MEM_M.io.IDEX_rd := ID_EX_.io.rd_out
EX_MEM_M.io.IDEX_MEMRD := ID_EX_.io.ctrl_MemRd_out
EX_MEM_M.io.IDEX_MEMWR := ID_EX_.io.ctrl_MemWr_out
EX_MEM_M.io.IDEX_MEMTOREG := ID_EX_.io.ctrl_MemToReg_out
EX_MEM_M.io.IDEX_REG_W := ID_EX_.io.ctrl_Reg_W_out
EX_MEM_M.io.IDEX_rs2 := RS2_value
EX_MEM_M.io.alu_out := ALU.io.out
```

#### 5. Memory

The only thing we have to do in the MEM stage is to pass the data into the **DataMemory module**.

```scala=
DataMemory.io.mem_read := EX_MEM_M.io.EXMEM_memRd_out
DataMemory.io.mem_write := EX_MEM_M.io.EXMEM_memWr_out
DataMemory.io.dataIn := EX_MEM_M.io.EXMEM_rs2_out
DataMemory.io.addr := EX_MEM_M.io.EXMEM_alu_out.asUInt
```

Then we update the MEM_WB register.

```scala=
MEM_WB_M.io.EXMEM_MEMRD := EX_MEM_M.io.EXMEM_memRd_out       // 0/1: data read from memory
MEM_WB_M.io.EXMEM_MEMTOREG := EX_MEM_M.io.EXMEM_memToReg_out
MEM_WB_M.io.EXMEM_REG_W := EX_MEM_M.io.EXMEM_reg_w_out
MEM_WB_M.io.EXMEM_rd := EX_MEM_M.io.EXMEM_rd_out
MEM_WB_M.io.in_dataMem_out := DataMemory.io.dataOut           // data from Data Memory
MEM_WB_M.io.in_alu_out := EX_MEM_M.io.EXMEM_alu_out           // data from Alu Result
```

#### 6. Write back

For the write-back stage, we pass the destination register and the write-enable signal from the MEM_WB register to the register file.

```scala=
RegFile.io.w_reg := MEM_WB_M.io.MEMWB_rd_out
RegFile.io.reg_write := MEM_WB_M.io.MEMWB_reg_w_out
```

Finally, we determine which value should be written back based on the control signal.

```scala=
when (MEM_WB_M.io.MEMWB_memToReg_out === 0.U) {
  d := MEM_WB_M.io.MEMWB_alu_out     // data from Alu Result
}.elsewhen (MEM_WB_M.io.MEMWB_memToReg_out === 1.U) {
  d := MEM_WB_M.io.MEMWB_dataMem_out // data from Data Memory
}.otherwise {
  d := 0.S
}

RegFile.io.w_data := d // Write back data
```

We have now completed the 5-stage RISC-V pipeline. The complete code is shown below.
:::spoiler Code
```scala=
package Pipeline
import chisel3._
import chisel3.util._

class PIPELINE extends Module {
  val io = IO(new Bundle {
    val out = Output (SInt(4.W))
  })

  // Pipes of stages
  val IF_ID_ = Module(new IF_ID)
  val ID_EX_ = Module(new ID_EX)
  val EX_MEM_M = Module(new EX_MEM)
  val MEM_WB_M = Module(new MEM_WB)

  // PC / PC+4
  val PC = Module(new PC)
  val PC4 = Module(new PC4)

  // Memory
  val InstMemory = Module(new InstMem ("./src/riscv/rv32ui-p-add.hex"))
  val DataMemory = Module(new DataMemory)

  // Helping Units
  val control_module = Module(new Control)
  val ImmGen = Module(new ImmGenerator)
  val RegFile = Module(new RegisterFile)
  val ALU_Control = Module(new AluControl)
  dontTouch(ALU_Control.io)
  val ALU = Module(new ALU)
  dontTouch(ALU.io)
  val Branch_M = Module(new Branch)
  val JALR = Module(new Jalr)

  // hazard units
  val Forwarding = Module(new Forwarding)
  val HazardDetect = Module(new HazardDetection)
  val Branch_Forward = Module(new BranchForward)
  val Structural = Module(new StructuralHazard)

  val PC_F = MuxLookup (HazardDetect.io.pc_forward, 0.S, Array (
    (0.U) -> PC4.io.out.asSInt,
    (1.U) -> HazardDetect.io.pc_out))

  PC.io.in := PC_F                        // PC_in input
  PC4.io.pc := PC.io.out.asUInt           // PC4_in input <- PC_out
  InstMemory.io.addr := PC.io.out.asUInt  // Address to fetch instruction

  val PC_for = MuxLookup (HazardDetect.io.inst_forward, 0.S, Array (
    (0.U) -> PC.io.out,
    (1.U) -> HazardDetect.io.current_pc_out))

  val Instruction_F = MuxLookup (HazardDetect.io.inst_forward, 0.U, Array (
    (0.U) -> InstMemory.io.data,
    (1.U) -> HazardDetect.io.inst_out))

  // Fetch decode pipe connections
  IF_ID_.io.pc_in := PC.io.out               // PC out from pc
  IF_ID_.io.pc4_in := PC4.io.out             // PC4 out from pc4
  IF_ID_.io.SelectedPC := PC_for             // Selected PC
  IF_ID_.io.SelectedInstr := Instruction_F   // Selected Instruction

  //ImmGenerator Inputs
  ImmGen.io.instr := IF_ID_.io.SelectedInstr_out   // Instruction used to generate the 32-bit immediate value
  ImmGen.io.pc := IF_ID_.io.SelectedPC_out.asUInt  // PC to add

  // Decode connections
  // (Control unit RegFile)
  control_module.io.opcode := IF_ID_.io.SelectedInstr_out(6, 0) // Opcode used to check the instruction type

  // Registerfile inputs
  RegFile.io.rs1 := Mux(
    control_module.io.opcode === 51.U ||  // R-type
    control_module.io.opcode === 19.U ||  // I-type
    control_module.io.opcode === 35.U ||  // S-type
    control_module.io.opcode === 3.U  ||  // I-type (load instructions)
    control_module.io.opcode === 99.U ||  // SB-type (branch)
    control_module.io.opcode === 103.U,   // JALR instruction
    IF_ID_.io.SelectedInstr_out(19, 15), 0.U )
  RegFile.io.rs2 := Mux(
    control_module.io.opcode === 51.U ||  // R-type
    control_module.io.opcode === 35.U ||  // S-type
    control_module.io.opcode === 99.U,    // SB-type (branch)
    IF_ID_.io.SelectedInstr_out(24, 20), 0.U)
  RegFile.io.reg_write := control_module.io.reg_write

  val ImmValue = MuxLookup (control_module.io.extend, 0.S, Array (
    (0.U) -> ImmGen.io.I_type,
    (1.U) -> ImmGen.io.S_type,
    (2.U) -> ImmGen.io.U_type))

  // Structural hazard inputs
  Structural.io.rs1 := IF_ID_.io.SelectedInstr_out(19, 15)
  Structural.io.rs2 := IF_ID_.io.SelectedInstr_out(24, 20)
  Structural.io.MEM_WB_regWr := MEM_WB_M.io.EXMEM_REG_W
  Structural.io.MEM_WB_Rd := MEM_WB_M.io.MEMWB_rd_out

  val S_rs1DataIn = Wire(SInt(32.W))
  val S_rs2DataIn = Wire(SInt(32.W))

  // rs1_data
  when (Structural.io.fwd_rs1 === 0.U) {
    S_rs1DataIn := RegFile.io.rdata1
  }.elsewhen (Structural.io.fwd_rs1 === 1.U) {
    S_rs1DataIn := RegFile.io.w_data
  }.otherwise {
    S_rs1DataIn := 0.S
  }

  // rs2_data
  when (Structural.io.fwd_rs2 === 0.U) {
    S_rs2DataIn := RegFile.io.rdata2
  }.elsewhen (Structural.io.fwd_rs2 === 1.U) {
    S_rs2DataIn := RegFile.io.w_data
  }.otherwise {
    S_rs2DataIn := 0.S
  }

  //ID_EX_ inputs
  ID_EX_.io.rs1_data_in := S_rs1DataIn
  ID_EX_.io.rs2_data_in := S_rs2DataIn

  // Stall when forward
  when(HazardDetect.io.ctrl_forward === "b1".U) {
    ID_EX_.io.ctrl_MemWr_in := 0.U
    ID_EX_.io.ctrl_MemRd_in := 0.U
    ID_EX_.io.ctrl_MemToReg_in := 0.U
    ID_EX_.io.ctrl_Reg_W_in := 0.U
    ID_EX_.io.ctrl_AluOp_in := 0.U
    ID_EX_.io.ctrl_OpB_in :=
      0.U
    ID_EX_.io.ctrl_Branch_in := 0.U
    ID_EX_.io.ctrl_nextpc_in := 0.U
  }.otherwise {
    ID_EX_.io.ctrl_MemWr_in := control_module.io.mem_write
    ID_EX_.io.ctrl_MemRd_in := control_module.io.mem_read
    ID_EX_.io.ctrl_MemToReg_in := control_module.io.men_to_reg
    ID_EX_.io.ctrl_Reg_W_in := control_module.io.reg_write
    ID_EX_.io.ctrl_AluOp_in := control_module.io.alu_operation
    ID_EX_.io.ctrl_OpB_in := control_module.io.operand_B
    ID_EX_.io.ctrl_Branch_in := control_module.io.branch
    ID_EX_.io.ctrl_nextpc_in := control_module.io.next_pc_sel
  }

  // Hazard detection Unit inputs
  HazardDetect.io.IF_ID_inst := IF_ID_.io.SelectedInstr_out
  HazardDetect.io.ID_EX_memRead := ID_EX_.io.ctrl_MemRd_out
  HazardDetect.io.ID_EX_rd := ID_EX_.io.rd_out
  HazardDetect.io.pc_in := IF_ID_.io.pc4_out.asSInt
  HazardDetect.io.current_pc := IF_ID_.io.SelectedPC_out

  MEM_WB_M.io.EXMEM_MEMRD := EX_MEM_M.io.EXMEM_memRd_out // 0/1: data read from memory

  // Branch forward Unit inputs
  Branch_Forward.io.ID_EX_RD := ID_EX_.io.rd_out
  Branch_Forward.io.EX_MEM_RD := EX_MEM_M.io.EXMEM_rd_out
  Branch_Forward.io.MEM_WB_RD := MEM_WB_M.io.MEMWB_rd_out
  Branch_Forward.io.ID_EX_memRd := ID_EX_.io.ctrl_MemRd_out
  Branch_Forward.io.EX_MEM_memRd := EX_MEM_M.io.EXMEM_memRd_out
  Branch_Forward.io.MEM_WB_memRd := MEM_WB_M.io.MEMWB_memRd_out
  Branch_Forward.io.rs1 := IF_ID_.io.SelectedInstr_out(19, 15)
  Branch_Forward.io.rs2 := IF_ID_.io.SelectedInstr_out(24, 20)
  Branch_Forward.io.ctrl_branch := control_module.io.branch

  // Branch X
  Branch_M.io.arg_x := MuxLookup (Branch_Forward.io.forward_rs1, 0.S, Array (
    (0.U) -> RegFile.io.rdata1,
    (1.U) -> ALU.io.out,
    (2.U) -> EX_MEM_M.io.EXMEM_alu_out,
    (3.U) -> RegFile.io.w_data,
    (4.U) -> DataMemory.io.dataOut,
    (5.U) -> RegFile.io.w_data,
    (6.U) -> RegFile.io.rdata1,
    (7.U) -> RegFile.io.rdata1,
    (8.U) -> RegFile.io.rdata1,
    (9.U) -> RegFile.io.rdata1,
    (10.U) -> RegFile.io.rdata1))

  // for JALR
  JALR.io.rdata1 := MuxLookup (Branch_Forward.io.forward_rs1, 0.U, Array (
    (0.U) -> RegFile.io.rdata1.asUInt,
    (1.U) ->
      RegFile.io.rdata1.asUInt,
    (2.U) -> RegFile.io.rdata1.asUInt,
    (3.U) -> RegFile.io.rdata1.asUInt,
    (4.U) -> RegFile.io.rdata1.asUInt,
    (5.U) -> RegFile.io.rdata1.asUInt,
    (6.U) -> ALU.io.out.asUInt,
    (7.U) -> EX_MEM_M.io.EXMEM_alu_out.asUInt,
    (8.U) -> RegFile.io.w_data.asUInt,
    (9.U) -> DataMemory.io.dataOut.asUInt,
    (10.U) -> RegFile.io.w_data.asUInt))

  JALR.io.imme := ImmValue.asUInt

  // Branch Y
  Branch_M.io.arg_y := MuxLookup (Branch_Forward.io.forward_rs2, 0.S, Array (
    (0.U) -> RegFile.io.rdata2,
    (1.U) -> ALU.io.out,
    (2.U) -> EX_MEM_M.io.EXMEM_alu_out,
    (3.U) -> RegFile.io.w_data,
    (4.U) -> DataMemory.io.dataOut,
    (5.U) -> RegFile.io.w_data))

  Branch_M.io.fnct3 := IF_ID_.io.SelectedInstr_out(14, 12) // Funct3 for (beq, bne, ...)
  Branch_M.io.branch := control_module.io.branch           // Branch instruction: yes/no

  when(HazardDetect.io.pc_forward === 1.B) {
    PC.io.in := HazardDetect.io.pc_out
  }.otherwise {
    when(control_module.io.next_pc_sel === "b01".U) {
      when(Branch_M.io.br_taken === 1.B && control_module.io.branch === 1.B) {
        PC.io.in := ImmGen.io.SB_type
        IF_ID_.io.pc_in := 0.S
        IF_ID_.io.pc4_in := 0.U
        IF_ID_.io.SelectedPC := 0.S
        IF_ID_.io.SelectedInstr := 0.U
      }.otherwise {
        PC.io.in := PC4.io.out.asSInt
      }
    }.elsewhen(control_module.io.next_pc_sel === "b10".U) {
      PC.io.in := ImmGen.io.UJ_type
      IF_ID_.io.pc_in := 0.S
      IF_ID_.io.pc4_in := 0.U
      IF_ID_.io.SelectedPC := 0.S
      IF_ID_.io.SelectedInstr := 0.U
    }.elsewhen(control_module.io.next_pc_sel === "b11".U) {
      PC.io.in := JALR.io.out.asSInt
      IF_ID_.io.pc_in := 0.S
      IF_ID_.io.pc4_in := 0.U
      IF_ID_.io.SelectedPC := 0.S
      IF_ID_.io.SelectedInstr := 0.U
    }.otherwise {
      PC.io.in := PC4.io.out.asSInt
    }
  }

  // ID_EX PIPELINE
  ID_EX_.io.rs1_in := RegFile.io.rs1
  ID_EX_.io.rs2_in := RegFile.io.rs2
  ID_EX_.io.imm := ImmValue
  ID_EX_.io.func3_in := IF_ID_.io.SelectedInstr_out(14, 12)
  ID_EX_.io.func7_in := IF_ID_.io.SelectedInstr_out(30)
  ID_EX_.io.rd_in := IF_ID_.io.SelectedInstr_out(11, 7)

  ALU_Control.io.aluOp := ID_EX_.io.ctrl_AluOp_out // Alu op code
  ALU.io.alu_Op := ALU_Control.io.out
  // Alu op code
  ALU_Control.io.func3 := ID_EX_.io.func3_out // function 3
  ALU_Control.io.func7 := ID_EX_.io.func7_out // function 7

  EX_MEM_M.io.IDEX_rd := ID_EX_.io.rd_out

  // Forwarding Inputs
  Forwarding.io.IDEX_rs1 := ID_EX_.io.rs1_out
  Forwarding.io.IDEX_rs2 := ID_EX_.io.rs2_out
  Forwarding.io.EXMEM_rd := EX_MEM_M.io.EXMEM_rd_out
  Forwarding.io.EXMEM_regWr := EX_MEM_M.io.EXMEM_reg_w_out
  Forwarding.io.MEMWB_rd := MEM_WB_M.io.MEMWB_rd_out
  Forwarding.io.MEMWB_regWr := MEM_WB_M.io.MEMWB_reg_w_out

  ID_EX_.io.ctrl_OpA_in := control_module.io.operand_A // Operand A selection
  ID_EX_.io.IFID_pc4_in := IF_ID_.io.pc4_out           // pc+4 from Decode to execute

  val d = Wire(SInt(32.W))

  when (ID_EX_.io.ctrl_OpA_out === "b01".U) {
    ALU.io.in_A := ID_EX_.io.IFID_pc4_out.asSInt
  }.otherwise {
    // forwarding A
    when(Forwarding.io.forward_a === "b00".U) {
      ALU.io.in_A := ID_EX_.io.rs1_data_out
    }.elsewhen(Forwarding.io.forward_a === "b01".U) {
      ALU.io.in_A := d
    }.elsewhen(Forwarding.io.forward_a === "b10".U) {
      ALU.io.in_A := EX_MEM_M.io.EXMEM_alu_out
    }.otherwise {
      ALU.io.in_A := ID_EX_.io.rs1_data_out
    }
  }

  // forwarding B
  val RS2_value = Wire(SInt(32.W))
  when (Forwarding.io.forward_b === 0.U) {
    RS2_value := ID_EX_.io.rs2_data_out
  }.elsewhen (Forwarding.io.forward_b === 1.U) {
    RS2_value := d
  }.elsewhen (Forwarding.io.forward_b === 2.U) {
    RS2_value := EX_MEM_M.io.EXMEM_alu_out
  }.otherwise {
    RS2_value := 0.S
  }

  when (ID_EX_.io.ctrl_OpB_out === 0.U) {
    ALU.io.in_B := RS2_value
  }.otherwise {
    ALU.io.in_B := ID_EX_.io.imm_out
  }

  // Execute
  EX_MEM_M.io.IDEX_MEMRD := ID_EX_.io.ctrl_MemRd_out
  EX_MEM_M.io.IDEX_MEMWR := ID_EX_.io.ctrl_MemWr_out
  EX_MEM_M.io.IDEX_MEMTOREG := ID_EX_.io.ctrl_MemToReg_out
  EX_MEM_M.io.IDEX_REG_W := ID_EX_.io.ctrl_Reg_W_out
  EX_MEM_M.io.IDEX_rs2 := RS2_value
  EX_MEM_M.io.alu_out := ALU.io.out

  // Data memory inputs
  DataMemory.io.mem_read := EX_MEM_M.io.EXMEM_memRd_out
  DataMemory.io.mem_write := EX_MEM_M.io.EXMEM_memWr_out
  DataMemory.io.dataIn := EX_MEM_M.io.EXMEM_rs2_out
  DataMemory.io.addr :=
    EX_MEM_M.io.EXMEM_alu_out.asUInt

  MEM_WB_M.io.EXMEM_MEMTOREG := EX_MEM_M.io.EXMEM_memToReg_out
  MEM_WB_M.io.EXMEM_REG_W := EX_MEM_M.io.EXMEM_reg_w_out
  MEM_WB_M.io.EXMEM_rd := EX_MEM_M.io.EXMEM_rd_out
  MEM_WB_M.io.in_dataMem_out := DataMemory.io.dataOut  // data from Data Memory
  MEM_WB_M.io.in_alu_out := EX_MEM_M.io.EXMEM_alu_out  // data from Alu Result

  // Register file connections
  RegFile.io.w_reg := MEM_WB_M.io.MEMWB_rd_out
  RegFile.io.reg_write := MEM_WB_M.io.MEMWB_reg_w_out

  // Write back data to registerfile writedata
  when (MEM_WB_M.io.MEMWB_memToReg_out === 0.U) {
    d := MEM_WB_M.io.MEMWB_alu_out     // data from Alu Result
  }.elsewhen (MEM_WB_M.io.MEMWB_memToReg_out === 1.U) {
    d := MEM_WB_M.io.MEMWB_dataMem_out // data from Data Memory
  }.otherwise {
    d := 0.S
  }

  RegFile.io.w_data := d // Write back data
  io.out := 0.S

  // printf(p"pc : 0x${Hexadecimal(IF_ID_.io.SelectedPC)}\n")
  // printf(p"inst : 0x${Hexadecimal(InstMemory.io.data)}\n")
}
```
:::

## 5-Stage-RV32I (Testing)

In this section, we test each module with `ChiselTest`, which is based on `ScalaTest`, and with `Verilator`, a tool for simulating Verilog modules.

Ref : [Chisel Cookbook (Migrating from ChiselTest to ChiselSim)][ChiselTest_ChiselCookbook]

:::info
#### Using Verilator
Chisel provides tools to convert Scala code into Verilog. For example:
```shell=
scala-cli generate.scala
```
Once the Verilog module is generated, it can be converted into a C++ model using Verilator. Ex :
```shell=
verilator --cc DataMemory.v --exe DataMemory_tb.cpp --trace
```
A testbench can then be written in C++ (Ex : DataMemory_tb.cpp) to test the module.

Ref : [Chisel Cookbook (Quickstart with Scala CLI)][Verilator_ChiselCookbook]
:::

### Memory

#### Data Memory

##### Test with ChiselTest

For the **DataMemory module**, we test the functionality of `Write Data` and `Read Data` separately.

First, we drive the inputs with `address 0` and write data `42`, keeping both read and write disabled.
```scala=
c.io.addr.poke(0.U)          // Address 0
c.io.dataIn.poke(42.S)       // Write data = 42
c.io.mem_write.poke(false.B) // Disable write initially
c.io.mem_read.poke(false.B)  // Disable read initially
c.clock.step(1)              // Step clock
```

To test the functionality of `Write Data`, we set `mem_write` to **true** and step the clock.

```scala=
c.io.mem_write.poke(true.B)  // Enable write
c.clock.step(1)              // Step clock
c.io.mem_write.poke(false.B) // Disable write after one cycle
```

To test the functionality of `Read Data`, we set `mem_read` to **true**, step the clock, and expect the output to be `42`.

```scala=
c.io.mem_read.poke(true.B)   // Enable read
c.clock.step(1)              // Step clock
c.io.dataOut.expect(42.S)    // Expect dataOut = 42
c.io.mem_read.poke(false.B)  // Disable read
```

Finally, we test `Write Data` and `Read Data` together: write `-15` to `address 1` and read it back from the memory.

```scala=
c.io.addr.poke(1.U)          // Address 1
c.io.dataIn.poke(-15.S)      // Write data = -15
c.io.mem_write.poke(true.B)  // Enable write
c.clock.step(1)              // Step clock
c.io.mem_write.poke(false.B) // Disable write
c.io.mem_read.poke(true.B)   // Enable read
c.clock.step(1)              // Step clock
c.io.dataOut.expect(-15.S)   // Expect dataOut = -15
```

The complete code of `DataMemoryTest.scala` is

:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class DataMemoryTester extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "DataMemory"

  it should "Correct" in {
    test(new DataMemory) { c =>
      // Initialize
      c.io.addr.poke(0.U)          // Address 0
      c.io.dataIn.poke(42.S)       // Write data = 42
      c.io.mem_write.poke(false.B) // Disable write initially
      c.io.mem_read.poke(false.B)  // Disable read initially
      c.clock.step(1)              // Advance clock

      // Write data
      c.io.mem_write.poke(true.B)  // Enable write
      c.clock.step(1)              // Advance clock
      c.io.mem_write.poke(false.B) // Disable write after one cycle

      // Read data
      c.io.mem_read.poke(true.B)   // Enable read
      c.clock.step(1)              // Advance clock
      c.io.dataOut.expect(42.S)    // Expect dataOut = 42
      c.io.mem_read.poke(false.B)  // Disable read

      // Write and read from another address
      c.io.addr.poke(1.U)          // Address 1
      c.io.dataIn.poke(-15.S)      // Write data = -15
      c.io.mem_write.poke(true.B)  // Enable write
      c.clock.step(1)              // Advance clock
      c.io.mem_write.poke(false.B) // Disable write
      c.io.mem_read.poke(true.B)   // Enable read
      c.clock.step(1)              // Advance clock
      c.io.dataOut.expect(-15.S)   // Expect dataOut = -15
    }
  }
}
```
:::

##### Test with Verilator

The code for generating Verilog:

```scala=
import chisel3._
// ChiselStage from the CIRCT-based flow, as used in the Chisel Cookbook quickstart
import circt.stage.ChiselStage

object Main extends App {
  println(
    ChiselStage.emitSystemVerilog(
      gen = new DataMemory,
      firtoolOpts = Array("-disable-all-randomization", "-strip-debug-info")
    )
  )
}
```

The generated Verilog code is:

```verilog=
module Dmemory_1024x32(
  input  [9:0]  R0_addr,
  input         R0_en,
                R0_clk,
  output [31:0] R0_data,
  input  [9:0]  W0_addr,
  input         W0_en,
                W0_clk,
  input  [31:0] W0_data
);

  reg [31:0] Memory[0:1023];
  always @(posedge W0_clk) begin
    if (W0_en & 1'h1)
      Memory[W0_addr] <= W0_data;
  end // always @(posedge)
  assign R0_data = R0_en ? Memory[R0_addr] : 32'bx;
endmodule

module DataMemory(
  input         clock,
                reset,
  input  [31:0] io_addr,
                io_dataIn,
  input         io_mem_read,
                io_mem_write,
  output [31:0] io_dataOut
);

  wire [31:0] _Dmemory_ext_R0_data;
  Dmemory_1024x32 Dmemory_ext (
    .R0_addr (io_addr[9:0]),
    .R0_en   (io_mem_read),
    .R0_clk  (clock),
    .R0_data (_Dmemory_ext_R0_data),
    .W0_addr (io_addr[9:0]),
    .W0_en   (io_mem_write),
    .W0_clk  (clock),
    .W0_data (io_dataIn)
  );
  assign io_dataOut = io_mem_read ?
    _Dmemory_ext_R0_data : 32'h0;
endmodule
```

We wrote a testbench that writes 30 random values to random addresses and reads each one back:

:::spoiler Code
```cpp=
#include "VDataMemory.h"
#include "verilated.h"
#include "verilated_vcd_c.h"
#include <stdint.h>
#include <cstdio>   // printf
#include <cstdlib>  // rand
#include <iostream>

// clock: one full cycle (negedge, then posedge), dumping both edges to the trace
void tick(VDataMemory* dut, VerilatedVcdC* tfp, int tickcount) {
    dut->clock = 0; // negedge
    dut->eval();
    if (tfp) tfp->dump(tickcount * 10 - 5); // dump negedge result

    dut->clock = 1; // posedge
    dut->eval();
    if (tfp) tfp->dump(tickcount * 10); // dump posedge result
    if (tfp) tfp->flush();
}

int main(int argc, char** argv) {
    Verilated::commandArgs(argc, argv);
    Verilated::traceEverOn(true);

    VDataMemory* dut = new VDataMemory;
    VerilatedVcdC* tfp = new VerilatedVcdC;
    dut->trace(tfp, 99); // 99 for tracing all signals
    tfp->open("DataMemory.vcd");

    // init
    dut->reset = 1;
    int tickcount = 0;
    tick(dut, tfp, ++tickcount);
    dut->reset = 0;

    for(int i = 0; i < 30; i++) {
        // write into memory
        uint32_t addr = rand() % 1024;
        int data = rand();
        dut->io_mem_write = 1;
        dut->io_mem_read = 0;
        dut->io_addr = addr;
        dut->io_dataIn = data;
        tick(dut, tfp, ++tickcount);

        // read from memory
        dut->io_mem_write = 0;
        dut->io_mem_read = 1;
        dut->io_addr = addr;
        tick(dut, tfp, ++tickcount);

        // verify output
        if (dut->io_dataOut == data) {
            printf("Test passed: Read value is %d\n", dut->io_dataOut);
        } else {
            printf("Test failed: Expected %d, ", data);
            printf("but got %d\n", dut->io_dataOut);
        }
    }

    dut->final();
    if (tfp) tfp->close();
    delete dut;
    delete tfp;
    return 0;
}
```
:::

The verification result is:

```shell=
Test passed: Read value is 846930886
Test passed: Read value is 1714636915
Test passed: Read value is 424238335
Test passed: Read value is 1649760492
Test passed: Read value is 1189641421
Test passed: Read value is 1350490027
Test passed: Read value is 1102520059
Test passed: Read value is 1967513926
Test passed: Read value is 1540383426
Test passed: Read value is 1303455736
Test passed: Read value is 521595368
Test passed: Read value is 1726956429
Test passed: Read value is 861021530
Test passed: Read value is 233665123
Test passed: Read value is 468703135
Test passed: Read value is 1801979802
Test passed: Read value is 635723058
Test passed: Read value is 1125898167
Test passed: Read value is 2089018456
Test passed: Read value is 1656478042
Test passed: Read value is 1653377373
Test passed: Read value is 1914544919
Test passed: Read value is 756898537
Test passed: Read value is 1973594324
Test passed: Read value is 2038664370
Test passed: Read value is 184803526
Test passed: Read value is 1424268980
Test passed: Read value is 749241873
Test passed: Read value is 42999170
Test passed: Read value is 13549728
```

#### Instruction Memory

##### Test with ChiselTest

For the **InstMem module**, we use the first bytes of the `rv32ui-p-add.hex` file:

```hex=
6f 00 00 05 73 2f 20 34 93 0f 80 00 63 08 ff 03
```

Reading each group of four bytes in little-endian order gives the first four instructions: `0x0500006f`, `0x34202f73`, `0x00800f93`, and `0x03ff0863`. Using a `Seq`, we can generate the test cases for the module.

```scala=
val testCases = Seq(
  (0.U, "h0500006f".U),  // Address 0, Expect 0x0500006f
  (4.U, "h34202f73".U),  // Address 4, Expect 0x34202f73
  (8.U, "h00800f93".U),  // Address 8, Expect 0x00800f93
  (12.U, "h03ff0863".U)  // Address 12, Expect 0x03ff0863
)
```

Then we test the module with `poke` and `expect`.

```scala=
for ((addr, expectedData) <- testCases) {
  c.io.addr.poke(addr)
  c.clock.step(1)
  // Print the value actually read, formatted as hexadecimal
  println(s"Address: $addr, Expected: $expectedData, Actual: 0x${c.io.data.peek().litValue.toString(16)}")
  c.io.data.expect(expectedData)
}
```

:::info
Chisel uses `h` as the prefix for hexadecimal values, instead of `0x` as used in C/C++.
Ref : [Chisel Cookbook (Chisel Data Types)][DataTypes_ChiselCookbook]
:::

:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class InstMem_Tester extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "InstMem"

  it should "Read data from file" in {
    test(new InstMem("./src/riscv/rv32ui-p-add.hex")) { c =>
      val testCases = Seq(
        (0.U, "h0500006f".U),  // Address 0, Expect 0x0500006f
        (4.U, "h34202f73".U),  // Address 4, Expect 0x34202f73
        (8.U, "h00800f93".U),  // Address 8, Expect 0x00800f93
        (12.U, "h03ff0863".U)  // Address 12, Expect 0x03ff0863
      )

      for ((addr, expectedData) <- testCases) {
        c.io.addr.poke(addr)
        c.clock.step(1)
        c.io.data.expect(expectedData)
      }
    }
  }
}
```
:::

##### Test with Verilator

---

### Pipelines

To test the pipeline registers, we **initialize the registers**, **step the clock**, and **validate the outputs**. The `ChiselTest` procedure is the same for every register; therefore, to keep the article short, the complete code is provided without additional explanation.

#### IF_ID

##### Test with ChiselTest

For the **IF_ID module**, we also test the initial values of the registers.

```scala=
c.io.pc_out.expect(0.S)
c.io.pc4_out.expect(0.U)
c.io.SelectedPC_out.expect(0.S)
c.io.SelectedInstr_out.expect(0.U)
```

>However, Chisel seems to initialize the registers to zero
>automatically, so we are not sure whether the test works.
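
One way to make this check more discriminating is to load non-zero values first, then assert reset and expect the outputs to fall back to zero. The sketch below is only an illustration under several assumptions: that `IF_ID` declares its registers with `RegInit` (so that reset actually clears them), that the implicit `reset` port can be poked from chiseltest, and that, as far as we understand, chiseltest already applies a reset before the test body runs, which is why the plain zero check above passes either way. The class name `IF_ID_ResetTest` is hypothetical and not part of the original project.

```scala
package Pipeline

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

// Hypothetical test class; assumes IF_ID uses RegInit for its registers.
class IF_ID_ResetTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "IF_ID"

  it should "return to zero when reset is asserted" in {
    test(new IF_ID) { c =>
      // Load non-zero values so that a later zero cannot be a leftover default
      c.io.pc_in.poke(42.S)
      c.io.pc4_in.poke(46.U)
      c.io.SelectedPC.poke(100.S)
      c.io.SelectedInstr.poke("h12345678".U)
      c.clock.step(1)
      c.io.pc_out.expect(42.S)

      // Assert reset for one cycle while the non-zero inputs are still driven
      c.reset.poke(true.B)
      c.clock.step(1)
      c.reset.poke(false.B)

      // With RegInit, every output should be back at its reset value
      c.io.pc_out.expect(0.S)
      c.io.pc4_out.expect(0.U)
      c.io.SelectedPC_out.expect(0.S)
      c.io.SelectedInstr_out.expect(0.U)
    }
  }
}
```

If the registers were declared with a plain `Reg` instead, the final `expect` calls would fail, which is exactly the distinction the original zero check cannot make.
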
The complete code :

:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class IF_ID_Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "IF_ID"

  it should "Initialize registers with default values" in {
    test(new IF_ID) { c =>
      c.io.pc_out.expect(0.S)
      c.io.pc4_out.expect(0.U)
      c.io.SelectedPC_out.expect(0.S)
      c.io.SelectedInstr_out.expect(0.U)
    }
  }

  it should "Pass inputs to outputs after one clock cycle" in {
    test(new IF_ID) { c =>
      val testPcIn = 42.S
      val testPc4In = 46.U
      val testSelectedPC = 100.S
      val testSelectedInstr = "h12345678".U

      c.io.pc_in.poke(testPcIn)
      c.io.pc4_in.poke(testPc4In)
      c.io.SelectedPC.poke(testSelectedPC)
      c.io.SelectedInstr.poke(testSelectedInstr)

      c.clock.step(1)

      c.io.pc_out.expect(testPcIn)
      c.io.pc4_out.expect(testPc4In)
      c.io.SelectedPC_out.expect(testSelectedPC)
      c.io.SelectedInstr_out.expect(testSelectedInstr)
    }
  }
}
```
:::

#### ID_EX

##### Test with ChiselTest

:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class ID_EX_Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "ID_EX"

  it should "Pass inputs to outputs after one clock cycle" in {
    test(new ID_EX) { c =>
      // Initialize inputs
      c.io.rs1_in.poke(1.U)
      c.io.rs2_in.poke(2.U)
      c.io.rs1_data_in.poke(10.S)
      c.io.rs2_data_in.poke(20.S)
      c.io.imm.poke(100.S)
      c.io.rd_in.poke(5.U)
      c.io.func3_in.poke(3.U)
      c.io.func7_in.poke(true.B)
      c.io.ctrl_MemWr_in.poke(true.B)
      c.io.ctrl_Branch_in.poke(false.B)
      c.io.ctrl_MemRd_in.poke(true.B)
      c.io.ctrl_Reg_W_in.poke(true.B)
      c.io.ctrl_MemToReg_in.poke(false.B)
      c.io.ctrl_AluOp_in.poke(2.U)
      c.io.ctrl_OpA_in.poke(1.U)
      c.io.ctrl_OpB_in.poke(true.B)
      c.io.ctrl_nextpc_in.poke(1.U)
      c.io.IFID_pc4_in.poke(4.U)

      // Clock step to register inputs
      c.clock.step(1)

      // Validate outputs
      c.io.rs1_out.expect(1.U)
      c.io.rs2_out.expect(2.U)
      c.io.rs1_data_out.expect(10.S)
      c.io.rs2_data_out.expect(20.S)
      c.io.imm_out.expect(100.S)
c.io.rd_out.expect(5.U) c.io.func3_out.expect(3.U) c.io.func7_out.expect(true.B) c.io.ctrl_MemWr_out.expect(true.B) c.io.ctrl_Branch_out.expect(false.B) c.io.ctrl_MemRd_out.expect(true.B) c.io.ctrl_Reg_W_out.expect(true.B) c.io.ctrl_MemToReg_out.expect(false.B) c.io.ctrl_AluOp_out.expect(2.U) c.io.ctrl_OpA_out.expect(1.U) c.io.ctrl_OpB_out.expect(true.B) c.io.ctrl_nextpc_out.expect(1.U) c.io.IFID_pc4_out.expect(4.U) } } } ``` ::: ##### Test with Verilator #### EX_MEM ##### Test with ChiselTest :::spoiler Code ```scala= package Pipeline import chisel3._ import chiseltest._ import org.scalatest.flatspec.AnyFlatSpec class EX_MEM_Test extends AnyFlatSpec with ChiselScalatestTester { behavior of "EX_MEM" it should "Pass inputs to outputs after one clock cycle" in { test(new EX_MEM) { c => c.io.IDEX_MEMRD.poke(false.B) c.io.IDEX_MEMWR.poke(false.B) c.io.IDEX_MEMTOREG.poke(false.B) c.io.IDEX_REG_W.poke(false.B) c.io.IDEX_rs2.poke(0.S) c.io.IDEX_rd.poke(0.U) c.io.alu_out.poke(0.S) c.clock.step(1) c.io.EXMEM_memRd_out.expect(false.B) c.io.EXMEM_memWr_out.expect(false.B) c.io.EXMEM_memToReg_out.expect(false.B) c.io.EXMEM_reg_w_out.expect(false.B) c.io.EXMEM_rs2_out.expect(0.S) c.io.EXMEM_rd_out.expect(0.U) c.io.EXMEM_alu_out.expect(0.S) c.io.IDEX_MEMRD.poke(true.B) c.io.IDEX_MEMWR.poke(true.B) c.io.IDEX_MEMTOREG.poke(true.B) c.io.IDEX_REG_W.poke(true.B) c.io.IDEX_rs2.poke(42.S) c.io.IDEX_rd.poke(5.U) c.io.alu_out.poke(123.S) c.clock.step(1) c.io.EXMEM_memRd_out.expect(true.B) c.io.EXMEM_memWr_out.expect(true.B) c.io.EXMEM_memToReg_out.expect(true.B) c.io.EXMEM_reg_w_out.expect(true.B) c.io.EXMEM_rs2_out.expect(42.S) c.io.EXMEM_rd_out.expect(5.U) c.io.EXMEM_alu_out.expect(123.S) c.io.IDEX_MEMRD.poke(false.B) c.io.IDEX_MEMWR.poke(false.B) c.io.IDEX_MEMTOREG.poke(false.B) c.io.IDEX_REG_W.poke(false.B) c.io.IDEX_rs2.poke(-100.S) c.io.IDEX_rd.poke(10.U) c.io.alu_out.poke(-50.S) c.clock.step(1) c.io.EXMEM_memRd_out.expect(false.B) c.io.EXMEM_memWr_out.expect(false.B) 
      c.io.EXMEM_memToReg_out.expect(false.B)
      c.io.EXMEM_reg_w_out.expect(false.B)
      c.io.EXMEM_rs2_out.expect(-100.S)
      c.io.EXMEM_rd_out.expect(10.U)
      c.io.EXMEM_alu_out.expect(-50.S)
    }
  }
}
```
:::
##### Test with Verilator
#### MEM_WB
##### Test with ChiselTest
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class MEM_WBTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "MEM_WB"

  it should "Pass inputs to outputs after one clock cycle" in {
    test(new MEM_WB) { c =>
      c.io.EXMEM_MEMTOREG.poke(true.B)
      c.io.EXMEM_REG_W.poke(true.B)
      c.io.EXMEM_MEMRD.poke(false.B)
      c.io.EXMEM_rd.poke(5.U)
      c.io.in_dataMem_out.poke(123.S)
      c.io.in_alu_out.poke(456.S)
      c.clock.step(1)
      c.io.MEMWB_memToReg_out.expect(true.B)
      c.io.MEMWB_reg_w_out.expect(true.B)
      c.io.MEMWB_memRd_out.expect(false.B)
      c.io.MEMWB_rd_out.expect(5.U)
      c.io.MEMWB_dataMem_out.expect(123.S)
      c.io.MEMWB_alu_out.expect(456.S)

      c.io.EXMEM_MEMTOREG.poke(false.B)
      c.io.EXMEM_REG_W.poke(false.B)
      c.io.EXMEM_MEMRD.poke(true.B)
      c.io.EXMEM_rd.poke(10.U)
      c.io.in_dataMem_out.poke(789.S)
      c.io.in_alu_out.poke(1011.S)
      c.clock.step(1)
      c.io.MEMWB_memToReg_out.expect(false.B)
      c.io.MEMWB_reg_w_out.expect(false.B)
      c.io.MEMWB_memRd_out.expect(true.B)
      c.io.MEMWB_rd_out.expect(10.U)
      c.io.MEMWB_dataMem_out.expect(789.S)
      c.io.MEMWB_alu_out.expect(1011.S)
    }
  }
}
```
:::
##### Test with Verilator
### Units
#### Alu_Control
##### Test with ChiselTest
For the **Alu Control module**, we test every `aluOp` value to ensure the output bits are concatenated correctly.
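
The expected values used in the test suggest a simple bit layout for the output. The following plain-Scala sketch reproduces that layout as a reference model. It is inferred only from the test vectors: the name `AluControlModel` and the exact layout are assumptions for illustration, not the module's actual source.

```scala
// Hypothetical reference model inferred from the AluControl test vectors.
object AluControlModel {
  // aluOp: 0 = R type, 1 = I type, 2 = SB type, 3 = jumps (JAL/JALR in the Control test),
  // 4..7 = loads, stores, LUI, AUIPC.
  def out(aluOp: Int, func3: Int, func7: Boolean): Int = aluOp match {
    case 0 => ((if (func7) 1 else 0) << 3) | (func3 & 0x7) // {func7, func3}
    case 1 => func3 & 0x7                                  // {00, func3}
    case 2 => (2 << 3) | (func3 & 0x7)                     // {010, func3}
    case 3 => 31                                           // 0b11111
    case _ => 0                                            // plain addition
  }
}
```

For example, `AluControlModel.out(0, 2, true)` gives `0b001010`, the value expected for the first R-type case.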
:::spoiler Code ```scala= package Pipeline import chisel3._ import chiseltest._ import org.scalatest.flatspec.AnyFlatSpec class AluControlTest extends AnyFlatSpec with ChiselScalatestTester { behavior of "AluControl" it should "generate correct control signals" in { test(new AluControl) { c => // Test R type c.io.func3.poke("b010".U) c.io.func7.poke(true.B) c.io.aluOp.poke(0.U) c.io.out.expect("b001010".U) c.io.func3.poke("b000".U) c.io.func7.poke(false.B) c.io.aluOp.poke(0.U) c.io.out.expect("b000000".U) // Test I type c.io.func3.poke("b101".U) c.io.func7.poke(false.B) c.io.aluOp.poke(1.U) c.io.out.expect("b00101".U) c.io.func3.poke("b011".U) c.io.aluOp.poke(1.U) c.io.out.expect("b00011".U) // Test SB type c.io.func3.poke("b110".U) c.io.aluOp.poke(2.U) c.io.out.expect("b010110".U) c.io.func3.poke("b001".U) c.io.aluOp.poke(2.U) c.io.out.expect("b010001".U) // Test Branch type c.io.aluOp.poke(3.U) c.io.out.expect("b11111".U) // Test Loads, S type, U type (lui), U type (auipc) c.io.aluOp.poke(4.U) c.io.out.expect("b00000".U) c.io.aluOp.poke(5.U) c.io.out.expect("b00000".U) c.io.aluOp.poke(6.U) c.io.out.expect("b00000".U) c.io.aluOp.poke(7.U) c.io.out.expect("b00000".U) } } } ``` ::: ##### Test with Verilator #### Alu ##### Test with ChiselTest For the **ALU module**, we test all operations to ensure they produce correct results. 
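
Several expected values in the ALU test depend on whether the same 32-bit pattern is read as signed or unsigned. A minimal plain-Scala sketch of those four operations is given here for reference; the object name `AluSemantics` is made up for illustration and is not part of the project.

```scala
// Illustrative helper, not part of the project sources.
object AluSemantics {
  // SLT compares as signed integers; SLTU compares the same bits as unsigned.
  def slt(a: Int, b: Int): Int  = if (a < b) 1 else 0
  def sltu(a: Int, b: Int): Int =
    if (java.lang.Integer.compareUnsigned(a, b) < 0) 1 else 0

  // SRL shifts in zeros; SRA replicates the sign bit.
  // RV32 uses only the low 5 bits of the shift amount.
  def srl(a: Int, b: Int): Int = a >>> (b & 0x1f)
  def sra(a: Int, b: Int): Int = a >> (b & 0x1f)
}
```

With `a = -1` and `b = 0`, `slt` returns 1 while `sltu` returns 0, because `-1` is the largest unsigned value. Likewise `sra(-16, 2)` is `-4`, whereas `srl(-16, 2)` is a large positive number.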
:::spoiler Code ```scala= package Pipeline import chisel3._ import chiseltest._ import org.scalatest.flatspec.AnyFlatSpec import AluOpCode._ class ALUTest extends AnyFlatSpec with ChiselScalatestTester { behavior of "ALU" it should "perform correct operations" in { test(new ALU) { c => // Test ALU_ADD and ALU_ADDI c.io.in_A.poke(10.S) c.io.in_B.poke(15.S) c.io.alu_Op.poke(ALU_ADD) c.io.out.expect(25.S) c.io.alu_Op.poke(ALU_ADDI) c.io.out.expect(25.S) // Test ALU_SUB c.io.alu_Op.poke(ALU_SUB) c.io.out.expect(-5.S) // Test ALU_SLL and ALU_SLLI c.io.in_A.poke(3.S) // 3 = 0b11 c.io.in_B.poke(2.S) // Shift left by 2 c.io.alu_Op.poke(ALU_SLL) c.io.out.expect(12.S) // 12 = 0b1100 c.io.alu_Op.poke(ALU_SLLI) c.io.out.expect(12.S) // Test ALU_SLT and ALU_SLTI c.io.in_A.poke(5.S) c.io.in_B.poke(10.S) c.io.alu_Op.poke(ALU_SLT) c.io.out.expect(1.S) c.io.alu_Op.poke(ALU_SLTI) c.io.out.expect(1.S) // Test ALU_SLTU and ALU_SLTUI c.io.in_A.poke(-1.S) // Treated as unsigned: max value c.io.in_B.poke(0.S) c.io.alu_Op.poke(ALU_SLTU) c.io.out.expect(0.S) c.io.alu_Op.poke(ALU_SLTUI) c.io.out.expect(0.S) // Test ALU_XOR and ALU_XORI c.io.in_A.poke(6.S) // 6 = 0b110 c.io.in_B.poke(3.S) // 3 = 0b011 c.io.alu_Op.poke(ALU_XOR) c.io.out.expect(5.S) // 5 = 0b101 c.io.alu_Op.poke(ALU_XORI) c.io.out.expect(5.S) // Test ALU_OR and ALU_ORI c.io.alu_Op.poke(ALU_OR) c.io.out.expect(7.S) // 7 = 0b111 c.io.alu_Op.poke(ALU_ORI) c.io.out.expect(7.S) // Test ALU_AND and ALU_ANDI c.io.alu_Op.poke(ALU_AND) c.io.out.expect(2.S) // 2 = 0b010 c.io.alu_Op.poke(ALU_ANDI) c.io.out.expect(2.S) // Test ALU_SRL and ALU_SRLI c.io.in_A.poke(16.S) // 16 = 0b10000 c.io.in_B.poke(2.S) c.io.alu_Op.poke(ALU_SRL) c.io.out.expect(4.S) // 4 = 0b100 c.io.alu_Op.poke(ALU_SRLI) c.io.out.expect(4.S) // Test ALU_SRA and ALU_SRAI c.io.in_A.poke(-16.S) // -16 = 0b1111111111110000 (sign-extended) c.io.alu_Op.poke(ALU_SRA) c.io.out.expect(-4.S) // -4 = 0b1111111111111100 c.io.alu_Op.poke(ALU_SRAI) c.io.out.expect(-4.S) // Test 
      // ALU_JAL and ALU_JALR
      c.io.in_A.poke(42.S)
      c.io.alu_Op.poke(ALU_JAL)
      c.io.out.expect(42.S)
      c.io.alu_Op.poke(ALU_JALR)
      c.io.out.expect(42.S)
    }
  }
}
```
:::
##### Test with Verilator
#### BRANCH
##### Test with ChiselTest
For the **Branch module**, we define a helper function that takes one test case as its arguments.
```scala=
def testBranch(fnct3: Int, branch: Boolean, x: Int, y: Int, expected: Boolean): Unit = {
  c.io.fnct3.poke(fnct3.U)
  c.io.branch.poke(branch.B)
  c.io.arg_x.poke(x.S)
  c.io.arg_y.poke(y.S)
  c.clock.step()
  c.io.br_taken.expect(expected.B)
}
```
Then we test every branch condition:
```scala=
// beq (fnct3 = 0)
testBranch(0, branch = true, x = 10, y = 10, expected = true)  // Equal
testBranch(0, branch = true, x = 10, y = 5, expected = false)  // Not equal
// bne (fnct3 = 1)
testBranch(1, branch = true, x = 10, y = 10, expected = false) // Equal
testBranch(1, branch = true, x = 10, y = 5, expected = true)   // Not equal
// blt (fnct3 = 4)
testBranch(4, branch = true, x = 5, y = 10, expected = true)   // Less than
testBranch(4, branch = true, x = 10, y = 5, expected = false)  // Not less than
// bge (fnct3 = 5)
testBranch(5, branch = true, x = 10, y = 5, expected = true)   // Greater than or equal
testBranch(5, branch = true, x = 5, y = 10, expected = false)  // Not greater than or equal
// bltu (fnct3 = 6)
testBranch(6, branch = true, x = -1, y = 10, expected = false) // Unsigned: -1 is large
testBranch(6, branch = true, x = 5, y = 10, expected = true)   // Unsigned less than
// bgeu (fnct3 = 7)
testBranch(7, branch = true, x = -1, y = 10, expected = true)  // Unsigned: -1 is large
testBranch(7, branch = true, x = 5, y = 10, expected = false)  // Unsigned not greater than or equal
// branch = false (should always be false)
testBranch(0, branch = false, x = 10, y = 10, expected = false)
testBranch(1, branch = false, x = 10, y = 5, expected = false)
testBranch(4, branch = false, x = 5, y = 10, expected = false)
```
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chiseltest._ import org.scalatest.flatspec.AnyFlatSpec class BranchTester extends AnyFlatSpec with ChiselScalatestTester { behavior of "Branch" it should "correctly evaluate branch conditions" in { test(new Branch) { c => // Function to test a single case def testBranch(fnct3: Int, branch: Boolean, x: Int, y: Int, expected: Boolean): Unit = { c.io.fnct3.poke(fnct3.U) c.io.branch.poke(branch.B) c.io.arg_x.poke(x.S) c.io.arg_y.poke(y.S) c.clock.step() c.io.br_taken.expect(expected.B) } // beq (fnct3 = 0) testBranch(0, branch = true, x = 10, y = 10, expected = true) // Equal testBranch(0, branch = true, x = 10, y = 5, expected = false) // Not equal // bne (fnct3 = 1) testBranch(1, branch = true, x = 10, y = 10, expected = false) // Equal testBranch(1, branch = true, x = 10, y = 5, expected = true) // Not equal // blt (fnct3 = 4) testBranch(4, branch = true, x = 5, y = 10, expected = true) // Less than testBranch(4, branch = true, x = 10, y = 5, expected = false) // Not less than // bge (fnct3 = 5) testBranch(5, branch = true, x = 10, y = 5, expected = true) // Greater than or equal testBranch(5, branch = true, x = 5, y = 10, expected = false) // Not greater than or equal // bltu (fnct3 = 6) testBranch(6, branch = true, x = -1, y = 10, expected = false) // Unsigned: -1 is large testBranch(6, branch = true, x = 5, y = 10, expected = true) // Unsigned less than // bgeu (fnct3 = 7) testBranch(7, branch = true, x = -1, y = 10, expected = true) // Unsigned: -1 is large testBranch(7, branch = true, x = 5, y = 10, expected = false) // Unsigned not greater than or equal // branch = false (should always be false) testBranch(0, branch = false, x = 10, y = 10, expected = false) testBranch(1, branch = false, x = 10, y = 5, expected = false) testBranch(4, branch = false, x = 5, y = 10, expected = false) } } } ``` ::: #### Control ##### Test with ChiselTest For the **Control module**, we test for each opcode ```scala= testOpcode(Opcode,(memWrite, branch, memRead, regWrite, 
menToReg, aluOp, opA, opB, ext, nextPcSel)) ``` ```scala= // R-type instruction (opcode 51) testOpcode(51, (false.B, false.B, false.B, true.B, false.B, 0.U, 0.U, false.B, 0.U, 0.U)) // I-type instruction (opcode 19) testOpcode(19, (false.B, false.B, false.B, true.B, false.B, 1.U, 0.U, true.B, 0.U, 0.U)) // S-type instruction (opcode 35) testOpcode(35, (true.B, false.B, false.B, false.B, false.B, 5.U, 0.U, true.B, 1.U, 0.U)) // Load instruction (opcode 3) testOpcode(3, (false.B, false.B, true.B, true.B, true.B, 4.U, 0.U, true.B, 0.U, 0.U)) // SB-type instruction (opcode 99) testOpcode(99, (false.B, true.B, false.B, false.B, false.B, 2.U, 0.U, false.B, 0.U, 1.U)) // UJ-type instruction (opcode 111) testOpcode(111, (false.B, false.B, false.B, true.B, false.B, 3.U, 1.U, false.B, 0.U, 2.U)) // Jalr instruction (opcode 103) testOpcode(103, (false.B, false.B, false.B, true.B, false.B, 3.U, 1.U, false.B, 0.U, 3.U)) // U-type (LUI) instruction (opcode 55) testOpcode(55, (false.B, false.B, false.B, true.B, false.B, 6.U, 3.U, true.B, 2.U, 0.U)) // U-type (AUIPC) instruction (opcode 23) testOpcode(23, (false.B, false.B, false.B, true.B, false.B, 7.U, 2.U, true.B, 2.U, 0.U)) ``` :::spoiler Code ```scala= package Pipeline import chisel3._ import chiseltest._ import org.scalatest.flatspec.AnyFlatSpec class ControlTest extends AnyFlatSpec with ChiselScalatestTester { behavior of "Control" it should "generate correct signals for each opcode" in { test(new Control) { c => // Function to test opcodes def testOpcode(opcode: Int, expectedSignals: (Bool, Bool, Bool, Bool, Bool, UInt, UInt, Bool, UInt, UInt)) = { c.io.opcode.poke(opcode.U) c.clock.step(1) val (memWrite, branch, memRead, regWrite, menToReg, aluOp, opA, opB, ext, nextPcSel) = expectedSignals c.io.mem_write.expect(memWrite) c.io.branch.expect(branch) c.io.mem_read.expect(memRead) c.io.reg_write.expect(regWrite) c.io.men_to_reg.expect(menToReg) c.io.alu_operation.expect(aluOp) c.io.operand_A.expect(opA) 
        c.io.operand_B.expect(opB)
        c.io.extend.expect(ext)
        c.io.next_pc_sel.expect(nextPcSel)
      }

      // R-type instruction (opcode 51)
      testOpcode(51, (false.B, false.B, false.B, true.B, false.B, 0.U, 0.U, false.B, 0.U, 0.U))
      // I-type instruction (opcode 19)
      testOpcode(19, (false.B, false.B, false.B, true.B, false.B, 1.U, 0.U, true.B, 0.U, 0.U))
      // S-type instruction (opcode 35)
      testOpcode(35, (true.B, false.B, false.B, false.B, false.B, 5.U, 0.U, true.B, 1.U, 0.U))
      // Load instruction (opcode 3)
      testOpcode(3, (false.B, false.B, true.B, true.B, true.B, 4.U, 0.U, true.B, 0.U, 0.U))
      // SB-type instruction (opcode 99)
      testOpcode(99, (false.B, true.B, false.B, false.B, false.B, 2.U, 0.U, false.B, 0.U, 1.U))
      // UJ-type instruction (opcode 111)
      testOpcode(111, (false.B, false.B, false.B, true.B, false.B, 3.U, 1.U, false.B, 0.U, 2.U))
      // Jalr instruction (opcode 103)
      testOpcode(103, (false.B, false.B, false.B, true.B, false.B, 3.U, 1.U, false.B, 0.U, 3.U))
      // U-type (LUI) instruction (opcode 55)
      testOpcode(55, (false.B, false.B, false.B, true.B, false.B, 6.U, 3.U, true.B, 2.U, 0.U))
      // U-type (AUIPC) instruction (opcode 23)
      testOpcode(23, (false.B, false.B, false.B, true.B, false.B, 7.U, 2.U, true.B, 2.U, 0.U))
    }
  }
}
```
:::
#### ImmGenerator
##### Test with ChiselTest
For the **ImmGenerator module**, we feed in representative instructions of each format and examine the generated immediates.
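
The expected values in the ImmGenerator test can be cross-checked against a plain-Scala decoder that follows the RV32I immediate encodings. This is a minimal sketch, not the module's implementation: `ImmModel` and its method names are hypothetical. Judging from the expected values, the test's `SB_type` and `UJ_type` outputs appear to be `pc + imm` rather than the bare immediate, which is an inference from the test vectors.

```scala
// Hypothetical software model of the RV32I immediate formats.
object ImmModel {
  // Sign-extend the low `bits` bits of `value` to 32 bits.
  private def signExtend(value: Int, bits: Int): Int =
    (value << (32 - bits)) >> (32 - bits)

  // I type: instr[31:20]
  def immI(instr: Int): Int = instr >> 20

  // S type: {instr[31:25], instr[11:7]}
  def immS(instr: Int): Int = ((instr >> 25) << 5) | ((instr >>> 7) & 0x1f)

  // SB type: {instr[31], instr[7], instr[30:25], instr[11:8], 0}
  def immB(instr: Int): Int = signExtend(
    (((instr >>> 31) & 0x1) << 12) | (((instr >>> 7) & 0x1) << 11) |
      (((instr >>> 25) & 0x3f) << 5) | (((instr >>> 8) & 0xf) << 1), 13)

  // UJ type: {instr[31], instr[19:12], instr[20], instr[30:21], 0}
  def immJ(instr: Int): Int = signExtend(
    (((instr >>> 31) & 0x1) << 20) | (((instr >>> 12) & 0xff) << 12) |
      (((instr >>> 20) & 0x1) << 11) | (((instr >>> 21) & 0x3ff) << 1), 21)
}
```

For instance, `ImmModel.immB(0xFE008EE3)` is `-4`, so with `pc = 8` the branch target is `4`, which matches the value the test expects on `SB_type`.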
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class ImmGeneratorTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "ImmGenerator"

  it should "generate correct immediate values for all types" in {
    test(new ImmGenerator) { c =>
      // Helper function to sign-extend a value
      def signExtend(value: BigInt, bits: Int): BigInt = {
        val shift = 32 - bits
        (value << shift) >> shift
      }

      // Test Case 1: I-type immediate
      c.io.instr.poke("h00000093".U) // ADDI x1, x0, 0 -> Immediate: 0x000
      c.clock.step(1)
      c.io.I_type.expect(0.S)

      c.io.instr.poke("hFFF00093".U) // ADDI x1, x0, -1 -> Immediate: 0xFFF
      c.clock.step(1)
      c.io.I_type.expect(-1.S)

      // Test Case 2: S-type immediate
      c.io.instr.poke("h00F02023".U) // SW x15, 0(x0) -> Immediate: 0x000
      c.clock.step(1)
      c.io.S_type.expect(0.S)

      c.io.instr.poke("hF8002023".U) // SW x0, -128(x0) -> Immediate: 0xF80
      c.clock.step(1)
      c.io.S_type.expect(-128.S)

      // Test Case 3: SB-type immediate
      c.io.instr.poke("h00008063".U) // BEQ x1, x0, 0 -> Immediate: 0x000
      c.io.pc.poke(0.U)
      c.clock.step(1)
      c.io.SB_type.expect(0.S)

      c.io.instr.poke("hFE008EE3".U) // BEQ x1, x0, -4 -> Immediate: -4; expected value matches pc + imm = 8 - 4
      c.io.pc.poke(8.U)
      c.clock.step(1)
      c.io.SB_type.expect(4.S)

      // Test Case 4: U-type immediate
      c.io.instr.poke("h000000B7".U) // LUI x1, 0 -> Immediate: 0x00000000
      c.clock.step(1)
      c.io.U_type.expect(0.S)

      // Test Case 5: UJ-type immediate
      c.io.instr.poke("h0000006F".U) // JAL x0, 0 -> Immediate: 0x00000000
      c.io.pc.poke(0.U)
      c.clock.step(1)
      c.io.UJ_type.expect(0.S)

      c.io.instr.poke("hFF00006F".U) // JAL x0 with immediate -1046544 (0xFFF007F0); expected value matches pc + imm = 16 - 1046544
      c.io.pc.poke(16.U)
      c.clock.step(1)
      c.io.UJ_type.expect(-1046528.S)
    }
  }
}
```
:::
#### JALR
##### Test with ChiselTest
For the **JALR module**, we test several kinds of input (zero, aligned, unaligned, and boundary values) and check the resulting output.
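
The alignment behaviour checked by the JALR test follows the RISC-V rule that the target is `rs1 + imm` with the least significant bit cleared. A minimal plain-Scala sketch of that rule, with `JalrModel` as a hypothetical name and `Long` used to hold unsigned 32-bit values:

```scala
// Hypothetical software model of the JALR target computation.
object JalrModel {
  // Add, wrap to 32 bits, and clear bit 0 in one mask.
  def target(imm: Long, rs1: Long): Long = (imm + rs1) & 0xFFFFFFFEL
}
```

For example, `JalrModel.target(5, 10)` yields `14`: the sum `15` is odd, so bit 0 is cleared.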
:::spoiler Code ```scala= package Pipeline import chisel3._ import chiseltest._ import org.scalatest.flatspec.AnyFlatSpec class JalrTest extends AnyFlatSpec with ChiselScalatestTester { behavior of "Jalr" it should "compute the correct jump address with alignment" in { test(new Jalr) { c => // Test case 1: imme = 0, rdata1 = 0 c.io.imme.poke(0.U) c.io.rdata1.poke(0.U) c.clock.step(1) c.io.out.expect(0.U) // Test case 2: imme = 4, rdata1 = 8 c.io.imme.poke(4.U) c.io.rdata1.poke(8.U) c.clock.step(1) c.io.out.expect(12.U) // 4 + 8 = 12 (no masking required) // Test case 3: imme = 5, rdata1 = 10 (unaligned address) c.io.imme.poke(5.U) c.io.rdata1.poke(10.U) c.clock.step(1) c.io.out.expect(14.U) // (5 + 10 = 15, aligned to 14) // Test case 4: imme = 0xFFFFFFFE, rdata1 = 1 (boundary test) c.io.imme.poke("hFFFFFFFE".U) c.io.rdata1.poke(1.U) c.clock.step(1) c.io.out.expect("hFFFFFFFE".U) // 0xFFFFFFFE + 1 = 0xFFFFFFFF, aligned to 0xFFFFFFFE // Test case 5: imme = 0x1234, rdata1 = 0x5678 c.io.imme.poke("h1234".U) c.io.rdata1.poke("h5678".U) c.clock.step(1) c.io.out.expect("h68AC".U) // 0x1234 + 0x5678 = 0x68AC (already aligned) } } } ``` ::: #### PC4 ##### Test with ChiselTest For the **PC4 module**, we update the **program counter** for `0`, `4`, `100` and the output results should be `pc+4`. 
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class PC4Test extends AnyFlatSpec with ChiselScalatestTester {
  "PC4" should "correctly compute PC + 4" in {
    test(new PC4) { c =>
      // Test case 1: Input PC = 0
      c.io.pc.poke(0.U)
      c.clock.step(1)
      c.io.out.expect(4.U)

      // Test case 2: Input PC = 4
      c.io.pc.poke(4.U)
      c.clock.step(1)
      c.io.out.expect(8.U)

      // Test case 3: Input PC = 100
      c.io.pc.poke(100.U)
      c.clock.step(1)
      c.io.out.expect(104.U)
    }
  }
}
```
:::
#### PC
##### Test with ChiselTest
For the **PC module**, we update the **program counter** to `4`, `100`, and `-8` to check its behavior, and also check that the program counter holds its value when the input does not change.
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class PCTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "PC"

  it should "update and hold the correct value" in {
    test(new PC) { c =>
      // Initial state: PC register should initialize to 0
      c.io.out.expect(0.S)

      // Test case 1: Update PC to 4
      c.io.in.poke(4.S)
      c.clock.step(1) // Advance one clock cycle
      c.io.out.expect(4.S)

      // Test case 2: Update PC to 100
      c.io.in.poke(100.S)
      c.clock.step(1)
      c.io.out.expect(100.S)

      // Test case 3: Update PC to -8
      c.io.in.poke(-8.S)
      c.clock.step(1)
      c.io.out.expect(-8.S)

      // Test case 4: Hold PC value (no change to input)
      c.io.in.poke(-8.S)
      c.clock.step(1)
      c.io.out.expect(-8.S)
    }
  }
}
```
:::
#### RegisterFile
##### Test with ChiselTest
For the **RegisterFile module**, we test the write and read functions and, most importantly, ensure that `x0` is **always zero**.
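
The `x0` requirement can be stated compactly as a software model: writes to register 0 are dropped and reads of register 0 return zero. The class below is only an illustrative sketch (the name `RegFileModel` is hypothetical); the actual module may enforce the rule on the write side, the read side, or both.

```scala
// Hypothetical software model of a 32-entry register file with x0 hard-wired to zero.
class RegFileModel {
  private val regs = Array.fill(32)(0)

  // Writes to x0 are silently ignored.
  def write(rd: Int, data: Int): Unit = if (rd != 0) regs(rd) = data

  // Reads of x0 always return zero.
  def read(rs: Int): Int = if (rs == 0) 0 else regs(rs)
}
```

This mirrors the two checks in the test: a write of `42` to `x5` is read back, while a write of `123` to `x0` still reads as `0`.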
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class RegisterFileTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "RegisterFile"

  it should "initialize and perform read/write correctly" in {
    test(new RegisterFile) { c =>
      // initialize
      for (i <- 0 until 32) {
        c.io.rs1.poke(i.U)
        c.io.rs2.poke(i.U)
        c.io.reg_write.poke(false.B)
        c.io.w_reg.poke(0.U)
        c.io.w_data.poke(0.S)
        c.clock.step(1)
        // check the initial data
        c.io.rdata1.expect(0.S)
        c.io.rdata2.expect(0.S)
      }

      // write and read
      c.io.reg_write.poke(true.B)
      c.io.w_reg.poke(5.U)
      c.io.w_data.poke(42.S)
      c.clock.step(1)

      // check
      c.io.rs1.poke(5.U)
      c.io.rs2.poke(5.U)
      c.io.reg_write.poke(false.B)
      c.clock.step(1)
      c.io.rdata1.expect(42.S)
      c.io.rdata2.expect(42.S)

      // check write x0
      c.io.reg_write.poke(true.B)
      c.io.w_reg.poke(0.U)
      c.io.w_data.poke(123.S)
      c.clock.step(1)

      // check x0
      c.io.rs1.poke(0.U)
      c.io.rs2.poke(0.U)
      c.io.reg_write.poke(false.B)
      c.clock.step(1)
      c.io.rdata1.expect(0.S)
      c.io.rdata2.expect(0.S)
    }
  }
}
```
:::
### Hazard Units
#### BranchForward
##### Test with ChiselTest
For the **BranchForward module**, we test for `ALU hazards`, `EX/MEM hazards`, `MEM/WB hazards`, and `Jalr forwarding`.
```asm=
add x5, x1, x2
beq x5, x5, label
```
- ALU hazards

Here, we set the `rd` at **ID_EX register** to `x5` (`c.io.ID_EX_RD.poke(5.U)`) and set both `rs1` and `rs2` to `x5`.
```scala=
it should "handle ALU hazards correctly" in {
  test(new BranchForward) { c =>
    c.io.ID_EX_RD.poke(5.U)
    c.io.EX_MEM_RD.poke(0.U)
    c.io.MEM_WB_RD.poke(0.U)
    c.io.ID_EX_memRd.poke(0.U)
    c.io.EX_MEM_memRd.poke(0.U)
    c.io.MEM_WB_memRd.poke(0.U)
    c.io.rs1.poke(5.U)
    c.io.rs2.poke(5.U)
    c.io.ctrl_branch.poke(1.U)

    // ALU hazard forwarding
    c.clock.step(1)
    c.io.forward_rs1.expect("b0001".U)
    c.io.forward_rs2.expect("b0001".U)
  }
}
```
- EX/MEM hazards

Here, we set the `rd` at **EX_MEM register** to `x5` (`c.io.EX_MEM_RD.poke(5.U)`) and set both `rs1` and `rs2` to `x5`.
```scala=
it should "handle EX/MEM hazards correctly" in {
  test(new BranchForward) { c =>
    c.io.ID_EX_RD.poke(0.U)
    c.io.EX_MEM_RD.poke(5.U)
    c.io.MEM_WB_RD.poke(0.U)
    c.io.ID_EX_memRd.poke(0.U)
    c.io.EX_MEM_memRd.poke(0.U)
    c.io.MEM_WB_memRd.poke(0.U)
    c.io.rs1.poke(5.U)
    c.io.rs2.poke(5.U)
    c.io.ctrl_branch.poke(1.U)

    c.clock.step(1)
    c.io.forward_rs1.expect("b0010".U)
    c.io.forward_rs2.expect("b0010".U)
  }
}
```
- MEM/WB hazards

Here, we set the `rd` at **MEM_WB register** to `x5` (`c.io.MEM_WB_RD.poke(5.U)`) and set both `rs1` and `rs2` to `x5`.
```scala=
it should "handle MEM/WB hazards correctly" in {
  test(new BranchForward) { c =>
    c.io.ID_EX_RD.poke(0.U)
    c.io.EX_MEM_RD.poke(0.U)
    c.io.MEM_WB_RD.poke(5.U)
    c.io.ID_EX_memRd.poke(0.U)
    c.io.EX_MEM_memRd.poke(0.U)
    c.io.MEM_WB_memRd.poke(0.U)
    c.io.rs1.poke(5.U)
    c.io.rs2.poke(5.U)
    c.io.ctrl_branch.poke(1.U)

    c.clock.step(1)
    c.io.forward_rs1.expect("b0011".U)
    c.io.forward_rs2.expect("b0011".U)
  }
}
```
- Jalr forwarding
```asm=
add x5, x1, x2
jalr x0, x5, 0
```
Here, we set `ctrl_branch` to 0, which means the instruction is **JALR**.
```scala= it should "handle Jalr forwarding logic correctly" in { test(new BranchForward) { c => c.io.ID_EX_RD.poke(5.U) c.io.EX_MEM_RD.poke(0.U) c.io.MEM_WB_RD.poke(0.U) c.io.ID_EX_memRd.poke(0.U) c.io.EX_MEM_memRd.poke(0.U) c.io.MEM_WB_memRd.poke(0.U) c.io.rs1.poke(5.U) c.io.ctrl_branch.poke(0.U) c.clock.step(1) c.io.forward_rs1.expect("b0110".U) } } ``` The whole code would be : :::spoiler Code ```scala= package Pipeline import chisel3._ import chiseltest._ import org.scalatest.flatspec.AnyFlatSpec class BranchForwardTest extends AnyFlatSpec with ChiselScalatestTester { behavior of "BranchForward" it should "handle ALU hazards correctly" in { test(new BranchForward) { c => c.io.ID_EX_RD.poke(5.U) c.io.EX_MEM_RD.poke(0.U) c.io.MEM_WB_RD.poke(0.U) c.io.ID_EX_memRd.poke(0.U) c.io.EX_MEM_memRd.poke(0.U) c.io.MEM_WB_memRd.poke(0.U) c.io.rs1.poke(5.U) c.io.rs2.poke(5.U) c.io.ctrl_branch.poke(1.U) // ALU hazard forwarding c.clock.step(1) c.io.forward_rs1.expect("b0001".U) c.io.forward_rs2.expect("b0001".U) } } it should "handle EX/MEM hazards correctly" in { test(new BranchForward) { c => c.io.ID_EX_RD.poke(0.U) c.io.EX_MEM_RD.poke(5.U) c.io.MEM_WB_RD.poke(0.U) c.io.ID_EX_memRd.poke(0.U) c.io.EX_MEM_memRd.poke(0.U) c.io.MEM_WB_memRd.poke(0.U) c.io.rs1.poke(5.U) c.io.rs2.poke(5.U) c.io.ctrl_branch.poke(1.U) c.clock.step(1) c.io.forward_rs1.expect("b0010".U) c.io.forward_rs2.expect("b0010".U) } } it should "handle MEM/WB hazards correctly" in { test(new BranchForward) { c => c.io.ID_EX_RD.poke(0.U) c.io.EX_MEM_RD.poke(0.U) c.io.MEM_WB_RD.poke(5.U) c.io.ID_EX_memRd.poke(0.U) c.io.EX_MEM_memRd.poke(0.U) c.io.MEM_WB_memRd.poke(0.U) c.io.rs1.poke(5.U) c.io.rs2.poke(5.U) c.io.ctrl_branch.poke(1.U) c.clock.step(1) c.io.forward_rs1.expect("b0011".U) c.io.forward_rs2.expect("b0011".U) } } it should "handle Jalr forwarding logic correctly" in { test(new BranchForward) { c => c.io.ID_EX_RD.poke(5.U) c.io.EX_MEM_RD.poke(0.U) c.io.MEM_WB_RD.poke(0.U) c.io.ID_EX_memRd.poke(0.U) 
c.io.EX_MEM_memRd.poke(0.U) c.io.MEM_WB_memRd.poke(0.U) c.io.rs1.poke(5.U) c.io.ctrl_branch.poke(0.U) c.clock.step(1) c.io.forward_rs1.expect("b0110".U) } } } ``` ::: For the remaining Hazard Units, we applied the same testing methodology. Therefore, only the code demonstration is provided here. #### Forwarding ##### Test with ChiselTest :::spoiler Code ```scala= package Pipeline import chisel3._ import chiseltest._ import org.scalatest.flatspec.AnyFlatSpec class ForwardingTest extends AnyFlatSpec with ChiselScalatestTester { behavior of "Forwarding" it should "handle EX Hazard correctly" in { test(new Forwarding) { c => c.io.IDEX_rs1.poke(5.U) c.io.IDEX_rs2.poke(5.U) c.io.EXMEM_rd.poke(5.U) c.io.EXMEM_regWr.poke(1.U) c.io.MEMWB_rd.poke(0.U) c.io.MEMWB_regWr.poke(0.U) c.clock.step(1) c.io.forward_a.expect("b10".U) c.io.forward_b.expect("b10".U) c.io.IDEX_rs1.poke(5.U) c.io.IDEX_rs2.poke(6.U) c.io.EXMEM_rd.poke(5.U) c.io.EXMEM_regWr.poke(1.U) c.io.MEMWB_rd.poke(0.U) c.io.MEMWB_regWr.poke(0.U) c.clock.step(1) c.io.forward_a.expect("b10".U) c.io.forward_b.expect("b00".U) c.io.IDEX_rs1.poke(6.U) c.io.IDEX_rs2.poke(5.U) c.io.EXMEM_rd.poke(5.U) c.io.EXMEM_regWr.poke(1.U) c.io.MEMWB_rd.poke(0.U) c.io.MEMWB_regWr.poke(0.U) c.clock.step(1) c.io.forward_a.expect("b00".U) c.io.forward_b.expect("b10".U) } } it should "handle MEM Hazard correctly" in { test(new Forwarding) { c => c.io.IDEX_rs1.poke(5.U) c.io.IDEX_rs2.poke(5.U) c.io.EXMEM_rd.poke(0.U) c.io.EXMEM_regWr.poke(0.U) c.io.MEMWB_rd.poke(5.U) c.io.MEMWB_regWr.poke(1.U) c.clock.step(1) c.io.forward_a.expect("b01".U) c.io.forward_b.expect("b01".U) // Case 2: MEM Hazard for rs1 only c.io.IDEX_rs1.poke(5.U) c.io.IDEX_rs2.poke(6.U) c.io.EXMEM_rd.poke(0.U) c.io.EXMEM_regWr.poke(0.U) c.io.MEMWB_rd.poke(5.U) c.io.MEMWB_regWr.poke(1.U) c.clock.step(1) c.io.forward_a.expect("b01".U) c.io.forward_b.expect("b00".U) // Case 3: MEM Hazard for rs2 only c.io.IDEX_rs1.poke(6.U) c.io.IDEX_rs2.poke(5.U) c.io.EXMEM_rd.poke(0.U) 
      c.io.EXMEM_regWr.poke(0.U)
      c.io.MEMWB_rd.poke(5.U)
      c.io.MEMWB_regWr.poke(1.U)
      c.clock.step(1)
      c.io.forward_a.expect("b00".U)
      c.io.forward_b.expect("b01".U)
    }
  }

  it should "handle no hazards correctly" in {
    test(new Forwarding) { c =>
      c.io.IDEX_rs1.poke(5.U)
      c.io.IDEX_rs2.poke(6.U)
      c.io.EXMEM_rd.poke(0.U)
      c.io.EXMEM_regWr.poke(0.U)
      c.io.MEMWB_rd.poke(0.U)
      c.io.MEMWB_regWr.poke(0.U)
      c.clock.step(1)
      c.io.forward_a.expect("b00".U)
      c.io.forward_b.expect("b00".U)
    }
  }
}
```
:::
#### HazardDetection
##### Test with ChiselTest
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class HazardDetectionTest extends AnyFlatSpec with ChiselScalatestTester {
  "HazardDetection" should "detect hazards correctly" in {
    test(new HazardDetection) { c =>
      // Case 1: Hazard detected (rs1 matches ID_EX_rd)
      c.io.IF_ID_inst.poke("h00028033".U) // rs1 = x5, rs2 = x0
      c.io.ID_EX_memRead.poke(true.B)
      c.io.ID_EX_rd.poke(5.U)
      c.io.pc_in.poke(100.S)
      c.io.current_pc.poke(96.S)
      c.clock.step(1)
      c.io.inst_forward.expect(true.B)
      c.io.pc_forward.expect(true.B)
      c.io.ctrl_forward.expect(true.B)
      c.io.inst_out.expect("h00028033".U)
      c.io.pc_out.expect(100.S)
      c.io.current_pc_out.expect(96.S)

      // Case 2: Hazard detected with a different funct3.
      // Note: this encoding still has rs1 = x5 and rs2 = x0, so an rs2 match is not exercised here.
      c.io.IF_ID_inst.poke("h0002A033".U) // rs1 = x5, rs2 = x0
      c.io.ID_EX_memRead.poke(true.B)
      c.io.ID_EX_rd.poke(5.U)
      c.io.pc_in.poke(104.S)
      c.io.current_pc.poke(100.S)
      c.clock.step(1)
      c.io.inst_forward.expect(true.B)
      c.io.pc_forward.expect(true.B)
      c.io.ctrl_forward.expect(true.B)
      c.io.inst_out.expect("h0002A033".U)
      c.io.pc_out.expect(104.S)
      c.io.current_pc_out.expect(100.S)
    }
  }
}
```
:::
#### StructuralHazard
##### Test with ChiselTest
##### Test with Verilator

---
### Main
##### Test with ChiselTest
To test Main, we use [riscv-tests](https://github.com/riscv-software-src/riscv-tests) as our test bench.
riscv-tests provides many kinds of tests; we choose `rv32ui-p-*` to test our **5-stage pipelined RISC-V CPU**, where `rv32ui` stands for the **RV32 user-level integer instruction set** and `p` denotes the target environment in which virtual memory is disabled and only core 0 boots up.

In every RISC-V test, if the CPU passes all test cases, the global pointer (gp) is set to 1. Otherwise, it is set to a value greater than 1 (the `fail` routine rewrites it as `(gp << 1) | 1`, where `gp` held the number of the failing test case).

Take `rv32ui-p-add` as an example:
```asm=
0000018c <test_2>:
 18c:	00200193	li	gp,2
 190:	00000593	li	a1,0
 194:	00000613	li	a2,0
 198:	00c58733	add	a4,a1,a2
 19c:	00000393	li	t2,0
 1a0:	4c771663	bne	a4,t2,66c <fail>

000001a4 <test_3>:
 1a4:	00300193	li	gp,3
 1a8:	00100593	li	a1,1
 1ac:	00100613	li	a2,1
 ...

00000650 <test_38>:
 650:	02600193	li	gp,38
 654:	01000093	li	ra,16
 658:	01e00113	li	sp,30
 65c:	00208033	add	zero,ra,sp
 660:	00000393	li	t2,0
 664:	00701463	bne	zero,t2,66c <fail>
 668:	02301063	bne	zero,gp,688 <pass>

0000066c <fail>:
 66c:	0ff0000f	fence
 670:	00018063	beqz	gp,670 <fail+0x4>
 674:	00119193	slli	gp,gp,0x1
 678:	0011e193	ori	gp,gp,1
 67c:	05d00893	li	a7,93
 680:	00018513	mv	a0,gp
 684:	00000073	ecall

00000688 <pass>:
 688:	0ff0000f	fence
 68c:	00100193	li	gp,1
 690:	05d00893	li	a7,93
 694:	00000513	li	a0,0
 698:	00000073	ecall
 69c:	c0001073	unimp
 6a0:	0000	.insn	2, 0x
```
If a test case fails, the PC jumps to `fail`; otherwise, the program completes all the test cases and reaches `pass`.

To check `gp`, we add a debug output to the **RegisterFile module** (i.e., `io.reg_debug1 := regfile(3)`, where 3 is the register index of gp) and print it out.
```scala=
printf(p"============================= \n")
printf(p"pc : 0x${Hexadecimal(IF_ID_.io.SelectedPC)}\n")
printf(p"inst : 0x${Hexadecimal(IF_ID_.io.SelectedInstr)}\n")
printf(p"gp : 0x${Hexadecimal(RegFile.io.reg_debug)}\n")
```
In the test program, we step the clock long enough for all the instructions to finish.
```scala=
package Pipeline

import chisel3._
import org.scalatest.flatspec.AnyFlatSpec
import chiseltest._

class MainTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "5-Stage test"

  it should "Go through" in {
    test(new PIPELINE) { c =>
      c.clock.step(600)
    }
  }
}
```
We tested all the instructions, and the gp register consistently shows `gp: 0x00000001` after the program finishes, indicating that the test has passed.
:::warning
We did not pass the `fence` and `jalr` tests, for the reasons explained below.

The `fence` instruction serves as a memory barrier that enforces a specific ordering of loads and stores. Since we did not implement this functionality, we do not need to test `rv32ui-p-fence`.

For `jalr`, we checked the `.dump` file and noticed that the t1 register is set to `0x00000010`, but the target of test 2 is at `0x000001a4`. Therefore, we believe that we fail the test because the testbench sets the destination of `jalr` incorrectly. Note, however, that `auipc t1,0x0` at `0x194` is specified to write `pc + 0 = 0x194` into `t1`, so the `addi` should produce `0x1a4`; a value of `0x10` may therefore also point to how our design handles `auipc`.
```asm=
0000018c <test_2>:
 18c:	00200193	li	gp,2
 190:	00000293	li	t0,0
 194:	00000317	auipc	t1,0x0
 198:	01030313	addi	t1,t1,16 # 1a4 <target_2>
 19c:	000302e7	jalr	t0,t1

000001a0 <linkaddr_2>:
 1a0:	0e00006f	j	280 <fail>

000001a4 <target_2>:
 1a4:	00000317	auipc	t1,0x0
 1a8:	ffc30313	addi	t1,t1,-4 # 1a0 <linkaddr_2>
 1ac:	0c629a63	bne	t0,t1,280 <fail>
```
:::
##### Test with Verilator
We test the `PIPELINE` module with the following `cpp` code. As mentioned above, `gp` (register index 3) will be set to 1 if all test cases pass.
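
The pass/fail convention can be written down as a small decoder for the final `gp` value. This is a sketch based only on the `pass` and `fail` routines in the `rv32ui-p-add` dump; `RiscvTestResult` is a hypothetical helper, and the decoding is meaningful only after the program has reached `pass` or `fail` (while a test case is still running, `gp` simply holds its number).

```scala
// Hypothetical helper that interprets the final gp value of a riscv-tests program.
object RiscvTestResult {
  // pass: gp = 1.  fail: gp = (testNum << 1) | 1, from `slli gp,gp,1; ori gp,gp,1`.
  def describeFinal(gp: Int): String =
    if (gp == 1) "pass" else s"fail at test ${gp >> 1}"
}
```

For example, a failure in `test_2` leaves `gp = 5`, which decodes to `fail at test 2`.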
The following is the `cpp` code: :::spoiler Code ```cpp= #include "verilated.h" #include "VPIPELINE.h" #include "verilated_vcd_c.h" void tick(VPIPELINE* dut) { dut->clock = 0; dut->eval(); dut->clock = 1; dut->eval(); } int main(int argc, char** argv, char** env) { Verilated::commandArgs(argc, argv); Verilated::traceEverOn(true); VPIPELINE* dut = new VPIPELINE; dut->reset = 1; for (int i = 0; i < 5; i++) { tick(dut); } dut->reset = 0; int cycle_count = 0; for (size_t i = 0; i < 600; i++) { tick(dut); cycle_count++; if(dut->PIPELINE__DOT__RegFile__DOT__regfile_3 == 1) { printf("passed, cycle count : %d\n", cycle_count); return 0; } } printf("failed\n"); return 0; } ``` ::: and the execution result: ``` ~/5-Stage-RV32I/generated$ ./obj_dir/VPIPELINE passed, cycle count : 521 ``` ## Improvement ### Branch Prediction We wrote a FSM to decide whether to take the branch or not. This module will be placed in the instruction fetch stage, and will be updated when the actual result is computed. ![image](https://hackmd.io/_uploads/SJG69JBwyx.png =60%x) ```scala= package Pipeline import chisel3._ import chisel3.util._ import Branch_predict_state._ object Branch_predict_state{ val STRONG_TAKEN = 0.U(2.W) val WEAK_TAKEN = 1.U(2.W) val STRONG_NOT_TAKEN = 2.U(2.W) val WEAK_NOT_TAKEN = 3.U(2.W) def stateToString(state: UInt): String = { state.litValue.toInt match { case 0 => "STRONG_TAKEN" case 1 => "WEAK_TAKEN" case 2 => "STRONG_NOT_TAKEN" case 3 => "WEAK_NOT_TAKEN" case _ => "UNKNOWN" } } } class branch_predict extends Module{ val io = IO(new Bundle{ val taken = Input(Bool()) val branch_predict = Output(Bool()) }) val current_state = RegInit(STRONG_NOT_TAKEN) val next_state = Wire(UInt(2.W)) next_state := current_state when(current_state === STRONG_TAKEN){ next_state := Mux(io.taken, STRONG_TAKEN, WEAK_TAKEN) }.elsewhen(current_state === WEAK_TAKEN){ next_state := Mux(io.taken, STRONG_TAKEN, WEAK_NOT_TAKEN) }.elsewhen(current_state === STRONG_NOT_TAKEN){ next_state := 
      Mux(io.taken, WEAK_NOT_TAKEN, STRONG_NOT_TAKEN)
  }.elsewhen(current_state === WEAK_NOT_TAKEN) {
    next_state := Mux(io.taken, WEAK_TAKEN, STRONG_NOT_TAKEN)
  }.otherwise {
    next_state := current_state
  }
  current_state := next_state

  // Debug trace of the state machine
  printf(p"taken: ${io.taken}, current_state: ${current_state}, next_state: ${next_state}, predict: ${(next_state === STRONG_TAKEN) || (next_state === WEAK_TAKEN)}\n")

  // The prediction is derived from the next state
  io.branch_predict := (next_state === STRONG_TAKEN) || (next_state === WEAK_TAKEN)
}
```
Then we test it with:
```scala=
package Pipeline

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec
import Branch_predict_state._

class BranchPredictTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "branch_predict"

  it should "correctly predict branch behavior" in {
    test(new branch_predict) { c =>
      val testCases = Seq(
        // (taken, expected prediction), starting from STRONG_NOT_TAKEN
        (true.B, false.B),
        (true.B, true.B),
        (false.B, false.B),
        (true.B, true.B),
        (true.B, true.B),
        (true.B, true.B),
        (false.B, true.B)
      )

      for ((taken, expectedPrediction) <- testCases) {
        c.io.taken.poke(taken)
        c.io.branch_predict.expect(expectedPrediction)
        c.clock.step()
      }
    }
  }
}
```

The prediction must be compared with the actual result to determine whether the fetched instruction should be flushed. To achieve this in the 5-stage pipelined RISC-V processor, we rewrote the `BRANCH` module as well, extending its interface as follows:
```scala=
val io = IO(new Bundle {
  val fnct3  = Input(UInt(3.W))
  val branch = Input(Bool())
  val arg_x  = Input(SInt(32.W))
  val arg_y  = Input(SInt(32.W))
  val pred   = Input(Bool())   // predicted result of whether branch is taken or not
  val actual = Output(Bool())  // actual result of whether branch is taken or not
  val flush  = Output(Bool())
})
```
The whole code is as follows:
:::spoiler Code
```scala=
package Pipeline

import chisel3._
import chisel3.util._

class Branch extends Module {
  val io = IO(new Bundle {
    val fnct3  = Input(UInt(3.W))
    val branch = Input(Bool())
    val arg_x  = Input(SInt(32.W))
    val arg_y  = Input(SInt(32.W))
    val pred   = Input(Bool())   // predicted result of whether branch is taken or not
    val actual = Output(Bool())  // actual result of whether branch is taken or not
    val flush  = Output(Bool())
  })

  io.actual := false.B
  io.flush  := false.B
  val temp = WireDefault(false.B)

  when(io.branch) {
    when(io.fnct3 === 0.U) {
      // beq
      io.actual := io.arg_x === io.arg_y
      temp      := io.arg_x === io.arg_y
    }.elsewhen(io.fnct3 === 1.U) {
      // bne
      io.actual := io.arg_x =/= io.arg_y
      temp      := io.arg_x =/= io.arg_y
    }.elsewhen(io.fnct3 === 4.U) {
      // blt
      io.actual := io.arg_x < io.arg_y
      temp      := io.arg_x < io.arg_y
    }.elsewhen(io.fnct3 === 5.U) {
      // bge
      io.actual := io.arg_x >= io.arg_y
      temp      := io.arg_x >= io.arg_y
    }.elsewhen(io.fnct3 === 6.U) {
      // bltu (unsigned less than)
      io.actual := io.arg_x.asUInt < io.arg_y.asUInt
      temp      := io.arg_x.asUInt < io.arg_y.asUInt
    }.elsewhen(io.fnct3 === 7.U) {
      // bgeu (unsigned greater than or equal)
      io.actual := io.arg_x.asUInt >= io.arg_y.asUInt
      temp      := io.arg_x.asUInt >= io.arg_y.asUInt
    }
    // Flush when the prediction disagrees with the actual outcome
    io.flush := io.pred ^ temp
  }
}
```
:::

Other than the predictor, we added two more modules to compute the desired program counter. The `BTB` module computes the predicted program counter, and the `PCselector` decides which program counter to use. Both modules are explained below.
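As background for the target computation in the `BTB`, recall that the RV32I base ISA scatters the B-type immediate across the instruction word: `imm[12] = inst[31]`, `imm[11] = inst[7]`, `imm[10:5] = inst[30:25]`, `imm[4:1] = inst[11:8]`, and `imm[0]` is always 0. The C++ sketch below is a software-only illustration of that layout and is not part of the Chisel design; the function names `is_btype`, `btype_imm` and `predicted_target` are hypothetical.

```cpp
#include <cstdint>

// Opcode shared by all RV32I conditional branches (0b1100011).
constexpr uint32_t kBranchOpcode = 0x63;

// True when the low 7 bits of the instruction match the B-type opcode.
bool is_btype(uint32_t inst) {
    return (inst & 0x7F) == kBranchOpcode;
}

// Reassemble the sign-extended B-type immediate from its scattered fields.
int32_t btype_imm(uint32_t inst) {
    uint32_t imm = (((inst >> 31) & 0x1)  << 12) |  // imm[12]   = inst[31]
                   (((inst >> 7)  & 0x1)  << 11) |  // imm[11]   = inst[7]
                   (((inst >> 25) & 0x3F) << 5)  |  // imm[10:5] = inst[30:25]
                   (((inst >> 8)  & 0xF)  << 1);    // imm[4:1]  = inst[11:8]
    if (imm & 0x1000) {
        imm |= 0xFFFFE000;  // sign-extend from bit 12
    }
    return static_cast<int32_t>(imm);
}

// Taken-path target for a branch, PC + 4 for everything else.
uint32_t predicted_target(uint32_t pc, uint32_t inst) {
    return is_btype(inst) ? pc + static_cast<uint32_t>(btype_imm(inst))
                          : pc + 4;
}
```

For example, the `bne t0,t1,280 <fail>` at address `0x1ac` in the `jalr` dump above (encoding `0x0c629a63`) decodes to an offset of 212, which gives the target `0x280`.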
**BTB**
The `BTB` module takes the current `PC` and the current instruction, computes the target address, and determines whether the instruction is `B-type`.
```scala=
package Pipeline

import chisel3._
import chisel3.util._

class BTB extends Module {
  val io = IO(new Bundle {
    val inst    = Input(UInt(32.W))
    val PC      = Input(UInt(32.W))
    val isBtype = Output(Bool())
    val target  = Output(UInt(32.W))
  })

  // Compute the sign-extended B-type immediate:
  // imm[12] = inst[31], imm[11] = inst[7], imm[10:5] = inst[30:25], imm[4:1] = inst[11:8], imm[0] = 0
  val imm = Cat(Fill(20, io.inst(31)), io.inst(7), io.inst(30, 25), io.inst(11, 8), 0.U(1.W))

  // Compute target
  io.target := Mux(io.inst(6, 0) === "b1100011".U, io.PC + imm, io.PC + 4.U)

  // Determine if instruction is B-type
  io.isBtype := io.inst(6, 0) === "b1100011".U
}
```
**PC selection**
With the predictor introduced, choosing the next program counter becomes more complicated. Therefore, we redesigned the program counter selection logic.
```scala=
when(HazardDetect.io.pc_forward === 1.B) { // If load type instruction happens, stall for one cycle
  PC.io.in := HazardDetect.io.pc_out
}.otherwise {
  when(control_module.io.next_pc_sel === "b01".U) {
    when(Branch_M.io.flush === 1.B && control_module.io.branch === 1.B) { // conditional jump, check if flush is needed
      PC.io.in := Mux(Branch_M.io.actual, IF_ID_.io.target_old, IF_ID_.io.pc4_out.asSInt)
      IF_ID_.io.pc_in := 0.S
      IF_ID_.io.pc4_in := 0.U
      IF_ID_.io.target := 0.S
      IF_ID_.io.SelectedInstr := 0.U
    }.otherwise { // decide next PC using predictor
      PC.io.in := Mux(btb.io.isBtype, Mux(predictor.io.prediction, btb.io.target, PC4.io.out.asSInt), PC4.io.out.asSInt)
    }
  }.elsewhen(control_module.io.next_pc_sel === "b10".U) { // unconditional jump, flush unconditionally
    PC.io.in := ImmGen.io.UJ_type
    IF_ID_.io.pc_in := 0.S
    IF_ID_.io.pc4_in := 0.U
    IF_ID_.io.target := 0.S
    IF_ID_.io.SelectedInstr := 0.U
  }.elsewhen(control_module.io.next_pc_sel === "b11".U) { // unconditional jump, flush unconditionally
    PC.io.in := JALR.io.out.asSInt
    IF_ID_.io.pc_in := 0.S
    IF_ID_.io.pc4_in := 0.U
    IF_ID_.io.target := 0.S
    IF_ID_.io.SelectedInstr := 0.U
  }.otherwise { //
    // decide next PC using predictor
    PC.io.in := Mux(btb.io.isBtype, Mux(predictor.io.prediction, btb.io.target, PC4.io.out.asSInt), PC4.io.out.asSInt)
  }
}
```

## Verify with Verilator
We test our processor with branch prediction using the riscv-tests mentioned above, which set the register `gp` to $1$ on success. The following is the testing program and its result:
```cpp=
#include <cstdio>
#include "verilated.h"
#include "VCPU.h"
#include "verilated_vcd_c.h"

// Drive one full clock cycle
void tick(VCPU* dut) {
    dut->clock = 0;
    dut->eval();
    dut->clock = 1;
    dut->eval();
}

int main(int argc, char** argv, char** env) {
    Verilated::commandArgs(argc, argv);
    Verilated::traceEverOn(true);

    VCPU* dut = new VCPU;

    // Hold reset for a few cycles
    dut->reset = 1;
    for (int i = 0; i < 5; i++) {
        tick(dut);
    }
    dut->reset = 0;

    int cycle_count = 0;
    for (size_t i = 0; i < 600; i++) {
        tick(dut);
        cycle_count++;
        // Periodically report the value of gp (x3)
        if (i % 50 == 0) printf("gp register = 0x%08X\n", dut->CPU__DOT__RegFile__DOT__regfile_3);
        if (dut->CPU__DOT__RegFile__DOT__regfile_3 == 1) {
            printf("gp register = 0x%08X\n", dut->CPU__DOT__RegFile__DOT__regfile_3);
            printf("passed, cycle count : %d\n", cycle_count);
            return 0;
        }
    }
    printf("failed\n");
    return 0;
}
```
and the execution result:
```
gp register = 0x00000001
passed, cycle count : 521
```

### Test Case 1
To verify the branch prediction module, we compare the number of cycles required to execute the same program on the original processor and on the one with the prediction mechanism.
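To get a feel for how a 2-bit predictor behaves on simple branch patterns, the following C++ sketch models the FSM in software and counts mispredictions. It is a simplified illustration under stated assumptions, not the Chisel module itself: it assumes a single shared counter that resets to strong-not-taken, and it predicts from the stored state before the update, whereas the Chisel module derives its output from `next_state`. The names `State`, `next_state`, `predicts_taken`, `count_mispredictions` and `count_static_not_taken_misses` are hypothetical.

```cpp
#include <vector>

// The four states of the 2-bit saturating counter.
enum class State { StrongTaken, WeakTaken, WeakNotTaken, StrongNotTaken };

// State transition for one resolved branch outcome.
State next_state(State s, bool taken) {
    switch (s) {
        case State::StrongTaken:    return taken ? State::StrongTaken  : State::WeakTaken;
        case State::WeakTaken:      return taken ? State::StrongTaken  : State::WeakNotTaken;
        case State::WeakNotTaken:   return taken ? State::WeakTaken    : State::StrongNotTaken;
        case State::StrongNotTaken: return taken ? State::WeakNotTaken : State::StrongNotTaken;
    }
    return s;  // unreachable, keeps compilers quiet
}

// The two "taken" states predict taken, the other two predict not taken.
bool predicts_taken(State s) {
    return s == State::StrongTaken || s == State::WeakTaken;
}

// Number of mispredictions (each would cause a flush) for a sequence of outcomes.
int count_mispredictions(const std::vector<bool>& outcomes) {
    State s = State::StrongNotTaken;  // assumed reset state
    int misses = 0;
    for (bool taken : outcomes) {
        if (predicts_taken(s) != taken) {
            ++misses;  // prediction XOR actual outcome
        }
        s = next_state(s, taken);
    }
    return misses;
}

// Baseline: always predict not taken, so every taken branch is a miss.
int count_static_not_taken_misses(const std::vector<bool>& outcomes) {
    int misses = 0;
    for (bool taken : outcomes) {
        if (taken) ++misses;
    }
    return misses;
}
```

Under these assumptions, ten consecutive taken branches cost two mispredictions while the counter warms up, a loop branch that is taken nine times and then falls through costs three, and a strictly alternating pattern is mispredicted on every taken branch, so it is no better than always predicting not taken.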
The testing program is shown below:
:::spoiler Code
```asm=
.section .text
.global _start

_start:
    # initialize
    li t0, 0
    li t1, 10

# ==========================
# Test 1: Always Taken
# ==========================
always_taken:
    addi t0, t0, 1          # t0 += 1
    beq t0, t1, branch_end
    beq zero, zero, always_taken
branch_end:

# ==========================
# Test 2: Not Taken
# ==========================
not_taken:
    li t0, 0
    li t1, 10
not_taken_loop:
    addi t0, t0, 1
    bne t0, t1, not_taken_loop

# ==========================
# Test 3: Alternating
# ==========================
alternating:
    li t0, 0
    li t1, 10
    li t2, 0
alternating_loop:
    addi t0, t0, 1
    beq t2, zero, alt_taken
    j alt_not_taken
alt_taken:
    li t2, 1
    j alternating_loop
alt_not_taken:
    li t2, 0
    bne t0, t1, alternating_loop

done:
    li gp, 1
```
:::
and we used Verilator to monitor how many cycles are required:
:::spoiler Code
```cpp=
#include <cstdio>
#include "verilated.h"
#include "VCPU.h"
#include "verilated_vcd_c.h"

// Drive one full clock cycle
void tick(VCPU* dut) {
    dut->clock = 0;
    dut->eval();
    dut->clock = 1;
    dut->eval();
}

int main(int argc, char** argv, char** env) {
    Verilated::commandArgs(argc, argv);
    Verilated::traceEverOn(true);

    VCPU* dut = new VCPU;

    // Hold reset for a few cycles
    dut->reset = 1;
    for (int i = 0; i < 5; i++) {
        tick(dut);
    }
    dut->reset = 0;

    int cycle_count = 0;
    int branch_count = 0;
    int flush_count = 0;
    for (size_t i = 0; i < 1500; i++) {
        tick(dut);
        cycle_count++;
        // Count branch instructions and how many of them caused a flush
        if (dut->CPU__DOT__control_module_io_branch) {
            branch_count++;
            flush_count = dut->CPU__DOT__Branch_M_io_flush ?
                flush_count + 1 : flush_count;
        }
        if (dut->CPU__DOT__RegFile__DOT__regfile_3 == 1) {
            printf("gp register = 0x%08X\n", dut->CPU__DOT__RegFile__DOT__regfile_3);
            printf("passed, cycle count : %d\n", cycle_count);
            printf("number of flushes : %d\n", flush_count);
            return 0;
        }
    }
    printf("failed\n");
    return 0;
}
```
:::
and the execution results:

**without branch prediction**
![Screenshot from 2025-01-21 19-49-58](https://hackmd.io/_uploads/Sk_DcZ6Pkl.png =300%x)
```
gp register = 0x00000001
passed, cycle count : 137
number of flushes : 22
```
**with branch prediction**
![Screenshot from 2025-01-21 19-46-24](https://hackmd.io/_uploads/rJOP5ZpPyg.png =300%x)
```
gp register = 0x00000001
passed, cycle count : 171
number of flushes : 37
```

### Test Case 2
We test our processor with a program that performs multiplication:
:::spoiler Code
```
.section .text
.global _start

_start:
    addi a0,zero, -79    # multiplier
    addi a1,zero, 2      # multiplicand
    addi t2,zero, 0      # result
loop:
    addi t0,zero, 1      # check if lsb = 0
    beq a1, zero, done
    and t0, a1, t0
    beq t0, zero, next
    add t2, a0, t2
next:
    srli a1, a1, 1
    slli a0, a0, 1
    j loop
done:
    li gp, 1
```
:::
Result:

**with branch prediction**
```
gp register = 0x00000001
passed, cycle count : 83
number of flushes : 10
```
**without branch prediction**
```
gp register = 0x00000001
passed, cycle count : 92
number of flushes : 5
```

### Test Case 3
We tried running bubble sort on our processor. The following is the testing program:
:::spoiler Code
```
.section .text
.global _start
.data
arr:
    .word 456
    .word 78
    .word -796
    .word 456785
    .word 3
    .word -12345
    .word 98765
    .word 4321
    .word -654
    .word 0
_start:
    li sp, 0x7ff
    addi sp,sp,-16
    sw ra,12(sp)
    sw s0,8(sp)
    addi s0,sp,16
    li a1,10
    lui a5,%hi(arr)
    addi a0,a5,%lo(arr)
    call bubble_sort
    li a5,0
    mv a0,a5
    lw ra,12(sp)
    lw s0,8(sp)
    addi sp,sp,16
    li gp, 1
    jr ra
bubble_sort:
    addi sp,sp,-48
    sw ra,44(sp)
    sw s0,40(sp)
    addi s0,sp,48
    sw a0,-36(s0)
    sw a1,-40(s0)
    sw zero,-20(s0)
    j .L2
.L8:
    sw zero,-28(s0)
    sw zero,-24(s0)
    j .L3
.L5:
    lw a5,-24(s0)
    slli a5,a5,2
    lw a4,-36(s0)
    add a5,a4,a5
    lw a4,0(a5)
    lw a5,-24(s0)
    addi a5,a5,1
    slli a5,a5,2
    lw a3,-36(s0)
    add a5,a3,a5
    lw a5,0(a5)
    ble a4,a5,.L4
    lw a5,-24(s0)
    addi a5,a5,1
    slli a5,a5,2
    lw a4,-36(s0)
    add a5,a4,a5
    lw a5,0(a5)
    sw a5,-32(s0)
    lw a5,-24(s0)
    slli a5,a5,2
    lw a4,-36(s0)
    add a4,a4,a5
    lw a5,-24(s0)
    addi a5,a5,1
    slli a5,a5,2
    lw a3,-36(s0)
    add a5,a3,a5
    lw a4,0(a4)
    sw a4,0(a5)
    lw a5,-24(s0)
    slli a5,a5,2
    lw a4,-36(s0)
    add a5,a4,a5
    lw a4,-32(s0)
    sw a4,0(a5)
.L4:
    lw a5,-24(s0)
    addi a5,a5,1
    sw a5,-24(s0)
.L3:
    lw a4,-40(s0)
    lw a5,-20(s0)
    sub a5,a4,a5
    addi a5,a5,-1
    lw a4,-24(s0)
    blt a4,a5,.L5
    lw a5,-28(s0)
    bne a5,zero,.L9
    lw a5,-20(s0)
    addi a5,a5,1
    sw a5,-20(s0)
.L2:
    lw a4,-20(s0)
    lw a5,-40(s0)
    blt a4,a5,.L8
    j .L10
.L9:
    nop
.L10:
    nop
    lw ra,44(sp)
    lw s0,40(sp)
    addi sp,sp,48
    jr ra
```
:::
We could not pass this one, and our guess is that the memory map is problematic. Our first hypothesis is a stack overflow: because we did not allocate enough space when designing the data memory module, the `lw ra,44(sp)` instruction did not work as expected, which breaks the return. Our second hypothesis is a segmentation fault: because we did not specify the memory map during linking, the return address is lost during sorting. We will try to fix this problem as soon as possible.

**Brute Force solution**
We write the array data with `sw` instructions and place the head of the array at address 0.
:::spoiler Code
```asm=
# a0 start at the bottom of memory
# a0 = {9, 8, 7, 6, 5, 4, 3, 2, 1, 0}
li a0, 0
mv t0, a0
li t2, 9
sw t2, 0(t0)
addi t0, t0, 4
li t2, 8
sw t2, 0(t0)
addi t0, t0, 4
li t2, 7
sw t2, 0(t0)
addi t0, t0, 4
li t2, 6
sw t2, 0(t0)
addi t0, t0, 4
li t2, 5
sw t2, 0(t0)
addi t0, t0, 4
li t2, 4
sw t2, 0(t0)
addi t0, t0, 4
li t2, 3
sw t2, 0(t0)
addi t0, t0, 4
li t2, 2
sw t2, 0(t0)
addi t0, t0, 4
li t2, 1
```
:::
With this change, the program became runnable.
**with predictor**
```
execution result
gp register = 0x00000001
passed, cycle count : 66
```
**without predictor**
```
execution result
gp register = 0x00000001
passed, cycle count : 70
number of branch instructions : 2, predictor hit : 2
```
Although this method solves the problem, it is too inefficient. We are trying to solve it with a linker script.

---
Overall, the branch predictor reduces the cycle count in Test Case 2 (83 vs. 92 cycles) and in the brute-force bubble sort (66 vs. 70 cycles), while Test Case 1 took more cycles with prediction enabled (171 vs. 137 cycles).

[Memories_ChiselCookbook]: https://www.chisel-lang.org/docs/explanations/memories
[ChiselTest_ChiselCookbook]: https://www.chisel-lang.org/docs/appendix/migrating-from-chiseltest
[DataTypes_ChiselCookbook]: https://www.chisel-lang.org/docs/explanations/data-types
[Verilator_ChiselCookbook]: https://www.chisel-lang.org/docs/installation#quickstart-with-scala-cli