# Pipelined RISC-V in Chisel
> 饒胤琛 [GitHub](https://github.com/bevmmf/CA2024_term-project/tree/main)

>[!Important]
>The goal of this project is to become familiar with designing digital systems in **Chisel**, and then to extend the [5-Stage-RV32I by kinzafatim](https://github.com/kinzafatim/5-Stage-RV32I/tree/main/src/main/scala/Pipeline) with hazard detection and forwarding logic, fully verified by 5 testbenches.

## Study **5-Stage-RV32I** by kinzafatim
### I. Chisel Top-Level Structure and Common Workflow
#### 1. Defining a Top-Level System (Top-Level Module)
A top-level design is typically written as `class ... extends Module`. In this project it is `class PIPELINE(...) extends Module`. This is the highest-level hardware module, responsible for coordinating the pipeline stages and submodules.

#### 2. Defining Top-Level I/O (Bundle)
Inside the top-level class:
```scala
// Top-level I/O bundle: a single 4-bit signed output
val io = IO(new Bundle {
  val out = Output(SInt(4.W))
})
```

#### 3. Instantiating Submodules
According to the design's needs, the top-level code creates instances of other modules, such as:
```scala
val IF_ID_     = Module(new IF_ID)             // IF/ID pipeline register
val InstMemory = Module(new InstMem(initFile)) // instruction memory, preloaded from initFile
```
In Chisel, these instances behave like components in a circuit schematic: we declare them with `Module(new ...)` and give them names.

#### 4. Wiring and Interconnection
We then make all necessary I/O connections among the submodules and the top-level `io`. For example:
```scala
IF_ID_.io.pc_in    := PC.io.out         // pass the current PC into the IF/ID register
InstMemory.io.addr := PC.io.out.asUInt  // the PC also addresses the instruction memory
```
Finally, we make sure that data and control signals are linked correctly from one stage to the next.

(Summary) This procedure (defining the top module, declaring the I/O interface, instantiating submodules, and wiring them up) is a typical Chisel hardware design workflow. Once it is complete, the design can be simulated (in Chisel or as generated Verilog) and mapped onto an FPGA or ASIC.

### II. Pipeline Architecture and Hazard Analysis
#### 1. Five-Stage Pipeline Registers
This project's `Main.scala` instantiates four pipeline registers:
```scala
val IF_ID_   = Module(new IF_ID)
val ID_EX_   = Module(new ID_EX)
val EX_MEM_M = Module(new EX_MEM)
val MEM_WB_M = Module(new MEM_WB)
```
They sit on the IF→ID, ID→EX, EX→MEM, and MEM→WB boundaries, respectively. These registers latch instruction details and control signals at each stage, so that each instruction traverses the five pipeline steps (fetch, decode, execute, memory access, write-back) over multiple clock cycles.

#### 2. Hazard and Forwarding Modules
##### **Load-Use Hazard**
If the previous instruction is a load (`memRead=1`) and the next instruction needs that register (rs1/rs2 == rd) in the EX stage right away, we must stall for one cycle and insert a bubble in EX.
In practice, `HazardDetection.scala` detects this condition and outputs `ctrl_forward=1`, prompting the top-level code to freeze `PC/IF_ID` and set the `ID_EX` control signals to zero (bubble).

##### **General Data Hazard (R/I/B-type)**
If a previous instruction has not yet written its result back to the register file but the next instruction in EX needs that value, `Forwarding.scala` routes the data (EX/MEM or MEM/WB outputs) directly to the ALU.

##### **Branch Hazard (Branch / Jal / Jalr)**
If the instruction is a branch or jump, we need to decide whether to take it in the ID or EX stage. If it is taken, we flush the pipeline (clearing IF/ID or beyond). Meanwhile, `BranchForward.scala` forwards the rs1 and rs2 values needed for the branch comparison.

##### **Structural Hazard (Simultaneous Read/Write)**
Since this project uses separate instruction and data memories, there is no structural hazard on a unified memory. However, we do have to consider a register-file read and write in the same cycle. In `StructuralHazard.scala`, if the ID stage reads xN in the same cycle that the WB stage writes xN, we take `RegFile.io.w_data` directly instead of the stale register value.

#### 3. Forwarding Scenarios
##### **ALU forwarding (`Forwarding.scala`)**
Deals with the ALU inputs in the EX stage (`in_A, in_B`).
- If `forward_x = 1.U` or `2.U`, the data comes from EX/MEM or MEM/WB.
- If there is no hazard, we simply use `rs1/rs2`.

##### **Branch / JALR forwarding (`BranchForward.scala`)**
Specifically for instructions such as `beq`, `bne`, or `jalr` that may resolve in ID. If a previous instruction has not yet written the needed register, we route the value from ALU.out, EX_MEM, or WB.
Unlike ALU forwarding, branch/jalr decisions may be made in the ID stage or in a specialized branch unit, which requires separate forwarding logic.

##### **RegisterFile Same-Cycle Forwarding (`StructuralHazard.scala`)**
If the ID stage reads xN while the WB stage is writing xN in the same clock cycle, we immediately use `RegFile.io.w_data`.
This is really another form of data hazard. Many textbooks solve it by writing in the first half-cycle and reading in the second half-cycle; this project handles it explicitly in a separate file.

### III. Differences Between `.elf` and `.txt` Memory Loading
#### 1. Limitations of Chisel's `loadMemoryFromFile`
`loadMemoryFromFile(...)`, as provided by Chisel/FIRRTL/Treadle, generally expects a simple text file with one data word per line (in hex or binary). It does not parse ELF structure (section headers, symbol tables, etc.).
Hence, if we only have an `.elf` file, we must first convert it to a textual format that `loadMemoryFromFile` can handle, such as `.hex`, `.mem`, or `.txt`.

#### 2. ELF Format Complexity
ELF (Executable and Linkable Format) includes section headers, relocation data, and more. A minimal hardware memory model in Chisel is unaware of these details and does not come with an ELF loader.
Consequently, the typical solution is to convert the ELF to `.txt` / `.mem`, with each line holding one 32-bit instruction for addresses 0, 4, 8, and so on.

### IV. The Principles of Stall and Bubble
In a pipeline design:
- **Stall**
  Freezes a particular stage (e.g., ID) by preventing its pipeline register from updating and the front end from fetching a new instruction. Effectively, the same instruction stays in that stage for one extra cycle.
- **Bubble**
  Lets the pipeline register advance but replaces the control signals with zeros (a NOP), so that the stage does no meaningful work.

In practice, we often stall the ID stage and bubble the EX stage at the same time. For example, on a load-use hazard, ID holds its instruction while EX receives a no-op.

### V. Conclusion: Pipeline Architecture Division and Key Takeaways
#### A. Possible Ways to Partition the Design
1. **By Pipeline Stage**
   - **IF**: PC + InstMemory, then send outputs to `IF_ID`.
   - **ID**: `IF_ID` inputs + RegisterFile + Control + ImmGen + HazardDetect, then forward to `ID_EX`.
   - **EX**: `ID_EX` inputs + ALU + ALUControl + Forwarding, then forward to `EX_MEM`.
   - **MEM**: `EX_MEM` inputs + DataMemory, then forward to `MEM_WB`.
   - **WB**: `MEM_WB` inputs, writes back to RegFile.
2. **By Functional Blocks**
   - PC/Branch (PC, PC4, Branch_M, JALR, BranchForward)
   - Hazard Detection (HazardDetect, Structural, Forwarding)
   - Decode/Control (Control, ImmGen, RegisterFile)
   - Pipeline Registers (IF_ID, ID_EX, EX_MEM, MEM_WB)
   - Memory (InstMemory, DataMemory)
3. **Hybrid**
   - Outline the five pipeline stages, then insert hazard, forwarding, and branch logic wherever they intersect.
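
The stall and bubble behaviour from Section IV can be modelled in a few lines of plain Scala. This is only a software sketch for illustration, not the project's Chisel code: `Ctrl`, `PipeRegModel`, and the three control fields are hypothetical names, and letting `stall` take priority over `bubble` is an assumption. On a load-use hazard, the IF/ID register would be driven with `stall = true` and the ID/EX register with `bubble = true` in the same cycle.

```scala
// Hypothetical, simplified control word carried by a pipeline register.
final case class Ctrl(regWrite: Boolean, memRead: Boolean, memWrite: Boolean)

object PipeRegModel {
  // A bubble is a slot with every control signal deasserted, so it acts as a NOP.
  val Bubble: Ctrl = Ctrl(regWrite = false, memRead = false, memWrite = false)

  /** Contents of one pipeline register after the next clock edge.
    *  - stall:  keep the current contents (the stage is frozen)
    *  - bubble: advance, but with all control signals cleared
    * Assumption of this sketch: stall wins if both are asserted.
    */
  def next(current: Ctrl, incoming: Ctrl, stall: Boolean, bubble: Boolean): Ctrl =
    if (stall) current
    else if (bubble) Bubble
    else incoming
}
```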
:::spoiler TOP module structure
![image](https://hackmd.io/_uploads/rJzau-hDJe.png)
:::

## Refining and Extending the 5-Stage RV32I Pipeline
### problem1 : sbt test TOPTest fail
reason : the path to `test.txt` in **Main.scala** is an absolute path that only exists on the original author's machine.
=> The solution is simply to change it to a relative path.
```scala
val InstMemory = Module(new InstMem ("/home/kinzaa/Desktop/5-Stage-RV32I/src/main/scala/Pipeline/test.txt"))
```

### problem2 : we can only run one program on the 5-stage-RV32I per test
reason : In my **Main.scala** (under the PIPELINE directory), InstMem is currently hard-coded as follows:
```scala
val InstMemory = Module(new InstMem("/home/.../test.txt"))
```
As a result, it always loads instructions from test.txt.
=> I made it **parameterized**.
To test multiple sets of machine code, a common approach is to pass the file path as a constructor parameter to PIPELINE. This way, each test can select its own instruction file.
=> I added a test2 to TOPTest, a RISC-V program called sum_int, and it runs on the 5-Stage-RV32I successfully.
![image](https://hackmd.io/_uploads/HkCfTYR81l.png)

### Implementation of the DebugPort Interface and Its Verification
After introducing a **DebugPort** interface to allow direct readout from the CPU's register file (`RegisterFile`), I performed the following steps:

1. **Module Modifications**
   - **`RegisterFile.scala`**: Added new I/O ports to select a register to read (e.g., `debug_read_reg`) and to output its value (`debug_reg_value`).
   - **`PIPELINE.scala`**: Exposed the above debug ports through the top-level module, allowing external test code to poke/peek those signals.
2. **Dedicated DebugPortTest Suite**
   - Created a **`DebugPortTest`** class in **`MainTest.scala`**, using [ChiselTest's][chiseltest-docs] `poke()`, `peek()`, and `expect()` APIs.
   - Loaded a simple RISC-V `add` instruction sequence (located in **`test_add.txt`**). This sequence initializes registers `x3` and `x4` to constants and then adds them into `x5`.
3. **Validation**
   - After running the pipeline for a sufficient number of clock cycles (`dut.clock.step(...)`), I used:
     ```scala
     dut.io.debug_read_reg.poke(5.U)      // Select register x5 on the debug port
     val result = dut.io.debug_reg_value.peek()
     dut.io.debug_reg_value.expect(42.S)  // Expect x5 == 42
     ```
   - The test confirmed that register `x5` contained the expected value of **42**, which exercises both the new debug interface and the five-stage RV32I pipeline on this program.

![image](https://hackmd.io/_uploads/HkJcq5A8ke.png)

> **Note**:
> - Make sure to give the pipeline enough clock cycles to complete the instructions before checking the debug port.
> - If your ChiselTest version differs, you may need to adjust the specific API calls or use alternative testing methods (e.g., comparing `result.litValue` with the expected integer).

### problem3 : testq2_square fail and debug

---

#### **I. `sp` Not Initialized**
##### Symptom:
In a typical Linux/OS or standard C runtime environment, the system's startup code automatically sets the stack pointer (sp) to the top of a valid memory region. In our educational, bare-metal 5-stage RV32I processor, however, nothing initializes sp by default. As a result, sp is zero (or undefined) after reset, and the program uses it as an address even though it points to an invalid or out-of-range memory location.

##### Impact:
Any code that uses the stack (function calls/returns or local variables) can fail. The program may produce incorrect results, or jump to an invalid address and hang or crash.

##### Solution:
On the software side, explicitly set sp to a safe location in RAM at the program entry point.

#### **II. Program Output (`a0`) Wrong and Unusually Large**
:::info
#### recall the hazard and forwarding
::: spoiler Types of Hazards Encountered
1. **Load-use Hazard** (data hazard)
   Requires a **1-cycle stall** (a special case of a RAW hazard in which the previous instruction is a load).
2. **Other RAW Hazards** (data hazard)
   Usually handled by **forwarding** (no stall needed).
3. **Branch/JAL/JALR Hazards** (control hazard)
   Involve **flushing** the IF/ID stage and possibly dealing with hazards in ID/EX if the instruction depends on a register yet to be written.

### Data Hazards
#### 1. RAW (Read After Write)
- **Symptom:**
  A subsequent instruction tries to read a register that a previous instruction has not written yet.
- **Solution:**
  - **Forwarding** (also called bypassing) typically solves RAW hazards without stalling.

---

#### 2. **Load-use Hazard** (a special RAW)
- Occurs when a load instruction is immediately followed by an instruction that needs the loaded data.
- **Scenario:**
  - **Load** + R-type (or I-type ALU)
  - The next instruction needs the value in its **EX** (or ID) stage, but the load data only becomes available once **MEM** completes.
- **Solution:**
  - **Stall for 1 cycle**
    - The data only exists after the load's **MEM** stage, so without a stall the dependent instruction would enter EX one cycle too early.
  - Concretely (with the load fetched in cycle n):
    - In cycle n+3 the load is in MEM and the CPU retrieves the data from memory.
    - In cycle n+4 the load is in WB, and forwarding passes the data to the dependent instruction, which only now enters EX because of the stall.
    - During the stall we freeze `PC` and `IF/ID` and insert a bubble into `ID/EX`, so the dependent instruction does not advance for one cycle.

> **Example Timing**
> - **Cycle n+3**: `lw` finishes MEM and obtains the data.
> - **Cycle n+4**: `lw` is in WB and the dependent instruction is in EX, receiving that data through forwarding.

- **Load + Branch** is even trickier because branch decisions are resolved in the ID stage:
  - If the branch comparison depends on a loaded register, we cannot resolve the branch until the load data is known, which typically requires a **2-cycle stall** (since the data is needed earlier in the pipeline, at ID).
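
The load-use condition recalled above can be written down as a small plain-Scala predicate. This is a sketch of the textbook check, not the code in `HazardDetection.scala`: `LoadUseModel` and its parameter names are hypothetical, the `idExRd != 0` guard is an added assumption (a load into `x0` can never create a real dependency), and a fuller check would also look at whether the instruction in ID actually uses rs2. For example, `lw x5, 0(x2)` followed by `add x6, x5, x7` gives `needsStall(true, 5, 5, 7) == true`.

```scala
object LoadUseModel {
  /** True when the instruction in ID must wait one cycle for a load in EX.
    *
    * idExMemRead: the instruction currently in EX is a load
    * idExRd:      destination register of that load
    * ifIdRs1/2:   source registers of the instruction currently in ID
    */
  def needsStall(idExMemRead: Boolean, idExRd: Int, ifIdRs1: Int, ifIdRs2: Int): Boolean =
    idExMemRead &&
      idExRd != 0 && // assumption: x0 never creates a dependency
      (idExRd == ifIdRs1 || idExRd == ifIdRs2)
}
```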
---

### Control Hazards
- **When?**
  - For a conditional branch, the decision to jump (or not) is known after the ID stage.
- **Solution:**
  - Possibly handle a data hazard via **forwarding** if the register used in the branch comparison is being written by a previous instruction.
  - **Flush** the pipeline (e.g., clear IF/ID) if the branch decides to jump.
:::

> **Timing Note (Digital Circuit / Pipeline):**
> We often assume an idealized timescale where in the "first half" of a cycle, an instruction's output is produced and can be forwarded; in the "second half," the subsequent instruction consumes it via forwarding.

---

:::info
Systematically analyzing `.vcd` files with **GTKWave** to pinpoint logic errors, verify protocol behavior, and ensure correct signal timing
:::
![image](https://hackmd.io/_uploads/BJNxrb2Dyl.png)

##### Observing Instruction Execution and Results
The program was intended to perform an addition of `x0` and `a0`, but the final outcome ended up as `0x49`.
![image](https://hackmd.io/_uploads/HJfwSWhv1g.png)
> This indicates that `x0` was treated as `0x48`, and we noticed `reg_7 = 0x49` at the end (i.e., `io_w_data = 0x49`), implying the ALU's output (or one of its inputs) was corrupted.

---

##### Tracing the ALU Output
![image](https://hackmd.io/_uploads/SyJ9S-3Pkl.png)
I examined the ALU's `out`; its input `inB` was supposed to be `x0=0`, but in reality it was `0x48`. We initially suspected that **hazard forwarding** (EX/MEM or MEM/WB) had overridden `rs2_data` (which should have been `x0`) with an unrelated value.

---

##### Checking Forwarding
![image](https://hackmd.io/_uploads/HJMhBZhD1g.png)
- `forwarding.io.forward_a` and `forwarding.io.forward_b` were both `b00`, indicating **no forwarding** at those ports.
- Hence, the standard forwarding logic **did not** inject spurious data into `x0`.
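
For reference, the selection that an EX-stage forwarding unit of this kind usually makes can be modelled in plain Scala. This is a sketch under stated assumptions, not the contents of `Forwarding.scala`: `ForwardModel` and its parameter names are hypothetical, and the encoding (`1` = MEM/WB, `2` = EX/MEM) follows the common textbook convention, which the project may or may not share. With this logic a source register of `x0` always yields `0` (`b00`), consistent with the waveform above.

```scala
object ForwardModel {
  val NoForward = 0 // b00: use the value read from the register file
  val FromMemWb = 1 // b01 (assumed encoding): take the MEM/WB result
  val FromExMem = 2 // b10 (assumed encoding): take the EX/MEM result

  /** Forwarding select for one ALU source register `rs` of the instruction in EX. */
  def select(rs: Int,
             exMemRegWrite: Boolean, exMemRd: Int,
             memWbRegWrite: Boolean, memWbRd: Int): Int =
    // The younger producer (EX/MEM) takes priority over the older one (MEM/WB).
    if (exMemRegWrite && exMemRd != 0 && exMemRd == rs) FromExMem
    else if (memWbRegWrite && memWbRd != 0 && memWbRd == rs) FromMemWb
    else NoForward
}
```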
---

##### Moving One Stage Up: `ID/EX rs2_data_out`
![image](https://hackmd.io/_uploads/HyfAB-2wyl.png)
We found:
- `ID/EX.io.rs2_data_out = 0x48`
- `Structural.io.fwd_rs2 = 1`

This means the **Structural** module **incorrectly** decided to **overwrite** `rs2` with `RegFile.io.w_data`.
Upon checking `Structural.scala`, we discovered:
![image](https://hackmd.io/_uploads/SJUzLWnDyx.png)
> Whenever the instruction in the ID stage is **`lw`** and the WB stage has `io.MEM_WB_regWr = 1` **plus** a matching `rd == rs2`, it triggers forwarding. But **if `rd=0`**, it incorrectly matches `rs2=0`, causing an unwanted forwarding into `x0`.

---

##### Identifying the Root Cause
In RISC-V, `x0` is hardwired to zero, so an instruction with **`rd = x0`** never changes a register. Because the Structural logic did **not** exclude `rd=0` from its check, any time `rd=0` coincided with `rs2=0`, the module attempted to forward. This erroneously plugged in `0x48` where `x0` should have stayed zero.

---

##### Fixing the Issue
![image](https://hackmd.io/_uploads/S15uQKhDJg.png)
By adding the condition **`&& (rd =/= 0.U)`** to the Structural forwarding logic, we exclude `x0` from being forwarded to. Consequently, `io.MEM_WB_rd=0` no longer matches `rs2=0`. This ensures `x0` **remains zero** as intended and prevents the ALU from receiving `0x48` in place of `x0`.

### problem4 : testq4_log2 fail and debug
#### Debugging the Incorrect Result
![image](https://hackmd.io/_uploads/Sk66gq3vJx.png)
:::info
Systematically analyzing `.vcd` files with **GTKWave** to pinpoint logic errors, verify protocol behavior, and ensure correct signal timing
![image](https://hackmd.io/_uploads/rkKIbq3D1g.png)
:::

1. **Initial Observation**
   ![image](https://hackmd.io/_uploads/HJz4353Pkl.png)
   - I first check key registers from the register file, particularly `x10 (a0)`, `x1 (ra)`, `x2 (sp)`, and `x5 (t0)`.
   - Then, I match them against the fetched instruction from `instmem.io_data`, noting that the machine code `00150513` corresponds to `addi a0, a0, 1`.
   - **Expected** outcome: `a0` should be incremented by 1.
   - **Actual** outcome: the value written back was `9`, instead of `1`.
2. **Tracing Backward From the Faulty Instruction**
   - I identify whether the instruction takes a load or an add path, and check `dmem` if it is a load, or `alu.io_out` plus `alu.io_in_A / alu.io_in_B` if it is an add.
   - It turns out the ALU output was `9`, suggesting the input was incorrect.
   - Indeed, `inA` was `0x08`, which means `a0` was read as `0x8` instead of `0x0`.
3. **Further Analysis: Wrong ALU Input**
   ![image](https://hackmd.io/_uploads/HyWFsq2Dyl.png)
   - Looking back, the older `add a0, zero, zero` (machine code `00000533`) should have set `a0` to `0`, but the `addi` still read the stale value `0x8`.
   - No forwarding was triggered for that path. However, when `addi a0, a0, 1` (`00150513`) is in ID, the older `add a0, zero, zero` is already at WB.
   - **Hence**, we would expect a forwarding path from MEM/WB to ID to supply the correct `a0` value.
4. **Misrouted Forwarding**
   ![image](https://hackmd.io/_uploads/B1jZoc3wyg.png)
   - Because we want to forward from the **WB** stage back to an instruction currently in the **ID** stage, we check the signals of the `structural` hazard (or forwarding) unit.
   - We see that `structural.fwd_rs1` is not asserted because `structural.MEM_WB_regWr` was `0` at that moment. This conflicts with the actual pipeline state: `add a0, zero, zero` is indeed writing back, so `regWr` should be `1`.
   - On reviewing the design in `Main` or `PIPELINE`, we realize that the WB-stage `reg_w` signal was not propagated into `structural`. The `structural` unit was only looking at `EX_MEM_M.io.EXMEM_reg_w_out`, but it needs the WB-stage control line to detect **WB** → **ID** hazards.
5. **Fixing the Issue**
   ![image](https://hackmd.io/_uploads/B1QUjqhDyg.png)
   - The solution is to make sure the **WB** register-write signal (`MEM_WB_regWr`) feeds into `structural` (or an equivalent hazard-check module), so that it can detect that the instruction in WB is writing to the same register `rs1` needed by the instruction in ID.
   - With the correct control path in place, `structural` asserts `fwd_rs1` properly and forwards from WB to ID when required. This fixes the stale-data problem, so `addi a0, a0, 1` sees the updated register value rather than `0x8`.

[chiseltest-docs]: https://github.com/ucb-bar/chisel-testers2 "Official ChiselTest Documentation"

### Processor Validation with RISC-V Test Programs
To verify the processor's functionality, particularly its hazard detection and forwarding mechanisms, I used five RISC-V tests adapted from quiz questions. The tests were:

1. **Testq1_shift_and_add_mul**
2. **Testq2_square**
3. **Testq3_fib**
4. **Testq4_log2**
5. **Testq5_bitreverse**

#### Final Validation via SBT Test
After iterating through the debugging steps, I reran the SBT-based test framework:

- **All Five Tests Passed**
  Each of the listed RISC-V programs (mul, square, fib, log2, bitreverse) executed successfully end-to-end, indicating that for these programs:
  - **Hazard Detection** stalls or forwards as required.
  - **Forwarding Logic** handles the ALU-to-ALU, MEM-to-ALU, and WB-to-ALU paths without data corruption.
  - **Basic ISA Functionality** matches the RISC-V specification for these core instructions.
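
The two fixes to the WB → ID check from problem3 and problem4 can be summarized in one plain-Scala model. This is an illustrative sketch, not the project's `Structural.scala`: `WbToIdBypassModel` and its parameter names are hypothetical. It only captures the two conditions discussed in this report, namely that the write enable must come from the MEM/WB stage and that `rd = x0` must never trigger a bypass.

```scala
object WbToIdBypassModel {
  /** Value that the ID stage should see for source register `rs`.
    *
    * regFileValue:  what the register file currently holds for rs (possibly stale)
    * memWbRegWrite: register-write enable of the instruction in WB (the problem4 fix)
    * memWbRd:       destination register of the instruction in WB
    * wbData:        value being written back in this cycle
    */
  def read(rs: Int, regFileValue: Int,
           memWbRegWrite: Boolean, memWbRd: Int, wbData: Int): Int = {
    val bypass = memWbRegWrite &&
      memWbRd != 0 && // the problem3 fix: never forward into x0
      memWbRd == rs
    if (bypass) wbData else regFileValue
  }
}
```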

![image](https://hackmd.io/_uploads/HJDRqHTw1l.png)

#### reference
- [5-Stage Pipeline Processor in Chisel - Kinza Fatima](https://github.com/kinzafatim/5-Stage-RV32I/tree/main/src/main/scala/Pipeline)
- [testbench](https://hackmd.io/@dppa1008/testbench)
- [How-to-write-a-testbench-in-Chisel](https://hackmd.io/@Haouo/BkV0yTTo5#How-to-write-a-testbench-in-Chisel)
- [YatCPU](https://yatcpu.sysu.tech/)
- [chiseltest-docs](https://github.com/ucb-bar/chisel-testers2)