Pipelined RISC-V in Chisel

饒胤琛

Important

The goal of this project is to become familiar with designing digital systems in Chisel, and then to extend the 5-Stage-RV32I by kinzafatim with hazard detection and forwarding logic, fully verified by 5 testbenches.

Studying 5-Stage-RV32I by kinzafatim

I. Chisel Top-Level Structure and Common Workflow

1. Defining a Top-Level System (Top-Level Module)

Typically, one uses class ... extends Module. In this project, for instance, we have class PIPELINE(...) extends Module.
This represents the “highest-level” hardware design, responsible for coordinating the various stages and submodules.

2. Defining Top-Level I/O (Bundle)

Inside the top-level class:

val io = IO(new Bundle {
  val out = Output(SInt(4.W))
})

3. Instantiating Submodules

According to our design needs, we “create” other module instances in the top-level code, such as:

val IF_ID_ = Module(new IF_ID)
val InstMemory = Module(new InstMem(initFile))

In Chisel, these instances function like components in a circuit schematic: we declare them using Module(new …) and give them names.

4. Wiring and Interconnection

We then perform all necessary I/O connections among submodules and the top-level io.
For example:

IF_ID_.io.pc_in    := PC.io.out
InstMemory.io.addr := PC.io.out.asUInt

Finally, we ensure that signals and control paths are properly linked from one stage to the next.

(Summary)
This procedure—defining the top module, declaring the I/O interface, instantiating submodules, and wiring them up—is a typical Chisel hardware design workflow. Once complete, you can simulate in Chisel/Verilog and map it onto an FPGA or ASIC.

II. Pipeline Architecture and Hazard Analysis

1. Five-Stage Pipeline Registers

This project’s Main.scala instantiates four pipeline registers, one for each boundary between the five stages:

val IF_ID_   = Module(new IF_ID)
val ID_EX_   = Module(new ID_EX)
val EX_MEM_M = Module(new EX_MEM)
val MEM_WB_M = Module(new MEM_WB)

They sit on the IF→ID, ID→EX, EX→MEM, and MEM→WB boundaries, respectively.
These registers latch instruction details and control signals at each stage, so that each instruction traverses the five pipeline steps (fetch, decode, execute, memory access, write-back) over multiple clock cycles.

2. Hazard and Forwarding Modules

Load-Use Hazard

If a previous instruction is a load (memRead=1), and the next instruction immediately needs that register (rs1/rs2 == rd) in the EX stage, we must stall for one cycle and insert a bubble in EX.

In practice, HazardDetection.scala detects this condition and outputs ctrl_forward=1, prompting the top-level code to freeze PC/IF_ID and set ID_EX control signals to zero (bubble).
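The load-use condition above can be sketched as a plain-Scala reference model. This is not the project's HazardDetection.scala: the names `detect`, `StallCtrl`, `freezePc`, `freezeIfId`, and `bubbleIdEx` are hypothetical, and the `rd != 0` guard is an added assumption (a load into x0 produces nothing to wait for).

```scala
// Software model of the load-use check; illustrative only, not the Chisel source.
object LoadUseModel {
  // Hypothetical bundle of the three actions the top level takes on a stall.
  final case class StallCtrl(freezePc: Boolean, freezeIfId: Boolean, bubbleIdEx: Boolean)

  // idExMemRead / idExRd describe the load currently in EX;
  // ifIdRs1 / ifIdRs2 are the source registers of the instruction in ID.
  def detect(idExMemRead: Boolean, idExRd: Int, ifIdRs1: Int, ifIdRs2: Int): StallCtrl = {
    val hazard =
      idExMemRead && idExRd != 0 && (idExRd == ifIdRs1 || idExRd == ifIdRs2)
    // On a hazard: hold PC and IF/ID, and zero the control signals entering ID/EX.
    StallCtrl(freezePc = hazard, freezeIfId = hazard, bubbleIdEx = hazard)
  }
}
```

For example, `LoadUseModel.detect(true, 5, 5, 0)` models `lw x5, ...` followed directly by an instruction that reads x5, and requests all three actions.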

General Data Hazard (R/I/B-type)

If a previous instruction hasn’t fully written back to the register file, but the next instruction in EX needs that value, we use Forwarding.scala to route data (EX/MEM or MEM/WB outputs) directly to the ALU.

Branch Hazard (Branch / Jal / Jalr)

If the instruction is a branch or jump, we need to decide whether to take the branch in the ID or EX stage. If taken, we flush the pipeline (clearing IF/ID or beyond). Meanwhile, BranchForward.scala performs data forwarding for rs1, rs2 needed in branch comparisons.

Structural Hazard (Simultaneous Read/Write)

Since this project uses separate instruction/data memories, we don’t encounter a unified memory structural hazard. However, we do consider “RegisterFile read and write in the same cycle.” In StructuralHazard.scala, if ID stage reads xN at the same time WB stage writes xN, we directly fetch RegFile.io.w_data instead of stale data.

3. Forwarding Scenarios

ALU forwarding (`Forwarding.scala`)

Deals with ALU inputs in the EX stage (in_A, in_B).

If forward_x is 1.U or 2.U, the operand comes from the EX/MEM or MEM/WB pipeline register instead of the register file.
If there is no hazard, we simply use the rs1/rs2 values read in ID.
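A minimal plain-Scala model of this selection is sketched below. The text does not say which select value maps to which source, so the encoding here (2 = EX/MEM, 1 = MEM/WB, 0 = register file) is an assumption borrowed from the common textbook scheme, and the function and parameter names are hypothetical.

```scala
// Software model of ALU operand forwarding; illustrative only, not Forwarding.scala.
object ForwardModel {
  // Returns the assumed select code for one source register rs.
  // EX/MEM is checked first because it holds the most recent value of rd.
  def select(rs: Int,
             exMemRegWrite: Boolean, exMemRd: Int,
             memWbRegWrite: Boolean, memWbRd: Int): Int =
    if (exMemRegWrite && exMemRd != 0 && exMemRd == rs) 2
    else if (memWbRegWrite && memWbRd != 0 && memWbRd == rs) 1
    else 0

  // Picks the value that reaches the ALU input for a given select code.
  def operand(sel: Int, regValue: Int, exMemAlu: Int, wbData: Int): Int = sel match {
    case 2 => exMemAlu // result of the instruction one ahead
    case 1 => wbData   // result of the instruction two ahead
    case _ => regValue // no hazard: value read from the register file
  }
}
```

When both pipeline registers hold a pending write to the same register, the model prefers EX/MEM, since that write is younger and would otherwise be overwritten by the older one.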

Branch / JALR forwarding (`BranchForward.scala`)

Specifically for instructions like beq, bne or jalr that may resolve in ID. If a previous instruction hasn’t written the needed register, we route from ALU.out, EX_MEM, or WB.

Unlike ALU forwarding, branch/jalr decisions might be made in the ID stage or a specialized branch unit, requiring separate forwarding logic.

RegisterFile Same-Cycle Forwarding (`StructuralHazard.scala`)

If ID stage reads xN while WB stage is writing xN in the same clock cycle, we immediately use RegFile.io.w_data. This is another form of data hazard, though many textbooks might solve it by writing in the first half-cycle and reading in the second half-cycle. This project handles it explicitly in a separate file.

III. Differences Between `.elf` and `.txt` Memory Loading

1. Limitations of Chisel’s `loadMemoryFromFile`

loadMemoryFromFile(...), as provided by Chisel/FIRRTL/Treadle, generally expects a simple text file with one data word per line (in hex or binary). It does not parse ELF structure (section headers, symbol tables, etc.). Hence, if we only have an .elf file, we must first convert it to a textual .hex or .mem file that loadMemoryFromFile can handle (a raw .bin dump is only an intermediate step, since it is not text).

2. ELF Format Complexity

ELF (Executable and Linkable Format) includes section headers, relocation data, and more. A minimal hardware memory model in Chisel is unaware of these complexities and doesn’t come with an ELF loader. Consequently, typical solutions convert the ELF to .txt / .mem so that each line holds one 32-bit instruction, for addresses 0, 4, 8, etc.
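The last conversion step can be sketched in plain Scala. The sketch assumes the code has already been extracted from the ELF into a raw little-endian byte image (for example with `objcopy -O binary`); the helper name `toHexLines` is hypothetical, and padding a trailing partial word with zero bytes is a choice made here.

```scala
// Packs a raw byte image into one 32-bit hex word per line; illustrative helper.
object HexImage {
  def toHexLines(image: Array[Byte]): List[String] =
    image.grouped(4).map { chunk =>
      // Pad a trailing partial word with zero bytes.
      val padded = chunk.padTo(4, 0.toByte)
      // RISC-V is little-endian: byte i is bits [8*i+7 : 8*i] of the word.
      val word = padded.zipWithIndex.map { case (b, i) =>
        (b & 0xff).toLong << (8 * i)
      }.sum
      f"$word%08x"
    }.toList
}
```

Line k of the result then corresponds to byte address 4·k, which is the layout described above.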

IV. The Principles of Stall and Bubble

In a pipeline design:

Stall
Freezes a particular stage (e.g., ID), preventing that pipeline register from updating or fetching a new instruction. Effectively, the same instruction remains for one extra cycle.
Bubble
Continues advancing the pipeline register but replaces control signals with zeros (NOP) so that stage does no meaningful work.

Practically, we often stall the ID stage and simultaneously bubble the EX stage. For example, if we have a load-use hazard, ID remains stuck, while EX receives a no-op.
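The difference between the two actions can be sketched as a tiny plain-Scala model of one pipeline register. `Reg`, `next`, and the `"nop"` marker are hypothetical names; `"nop"` stands in for zeroed control signals.

```scala
// Software model of a pipeline register with stall and bubble inputs; illustrative only.
object PipeRegModel {
  // A register holding the name of the instruction currently latched in it.
  final case class Reg(instr: String)

  // stall:  ignore the incoming value and keep the current contents.
  // bubble: update as usual, but latch a NOP instead of the incoming value.
  def next(current: Reg, incoming: Reg, stall: Boolean, bubble: Boolean): Reg =
    if (stall) current
    else if (bubble) Reg("nop")
    else incoming
}
```

On a load-use hazard, IF/ID would be updated with `stall = true` (the dependent instruction stays in ID), while ID/EX would be updated with `bubble = true` (EX receives a no-op).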

V. Conclusion: Pipeline Architecture Division and Key Takeaways

A. Possible Ways to Partition the Design

By Pipeline Stage
- IF: PC + InstMemory, then send outputs to IF_ID.
- ID: IF_ID inputs + RegisterFile + Control + ImmGen + HazardDetect, then forward to ID_EX.
- EX: ID_EX inputs + ALU + ALUControl + Forwarding, then forward to EX_MEM.
- MEM: EX_MEM inputs + DataMemory, then forward to MEM_WB.
- WB: MEM_WB inputs, writes back to RegFile.
By Functional Blocks
- PC/Branch (PC, PC4, Branch_M, JALR, BranchForward)
- Hazard Detection (HazardDetect, Structural, Forwarding)
- Decode/Control (Control, ImmGen, RegisterFile)
- Pipeline Registers (IF_ID, ID_EX, EX_MEM, MEM_WB)
- Memory (InstMemory, DataMemory)
Hybrid
- Outline the five pipeline stages, then insert hazard forwarding and branch logic wherever they intersect.

TOP module structure

Refining and Extending the 5-Stage RV32I Pipeline

problem1 : sbt test TOPTest fails

reason : Main.scala loads test.txt through an absolute path that only exists on the original author's machine:

val InstMemory = Module(new InstMem("/home/kinzaa/Desktop/5-Stage-RV32I/src/main/scala/Pipeline/test.txt"))

=> The solution is to change this to a path relative to the project root.

problem2 : only one program can be run on the 5-Stage-RV32I per test

reason : In my **Main.scala** (under the PIPELINE directory), InstMem is currently hard-coded as follows:

val InstMemory = Module(new InstMem("/home/.../test.txt"))

As a result, it always loads instructions from test.txt.
=> I made it parameterized.
To test multiple sets of machine code, a common approach is to pass the file path as a constructor parameter to PIPELINE, so that each test can select a different instruction file.
=> I added test2 to TOPTest, a RISC-V program called sum_int, and it runs on the 5-Stage-RV32I successfully.

Implementation of the DebugPort Interface and Its Verification

After introducing a DebugPort interface to facilitate direct readout from the CPU’s register file (RegisterFile), I performed the following steps:

Module Modifications
- RegisterFile.scala: Added new I/O ports to read a specified register (e.g., debug_read_reg) and output its value (debug_reg_value).
- PIPELINE.scala: Exposed the above debug ports through the top-level module, allowing external test code to poke/peek those signals.
Dedicated DebugPortTest Suite
- Created a DebugPortTest class in MainTest.scala, leveraging ChiselTest’s poke(), peek(), and expect() APIs.
- Loaded a simple RISC-V add instruction sequence (located in test_add.txt). This sequence initializes registers x3 and x4 to constants and then adds them into x5.
Validation
- After running the pipeline for a sufficient number of clock cycles (dut.clock.step(...)), I used:
```
dut.io.debug_read_reg.poke(5.U)         // Read register x5
val result = dut.io.debug_reg_value.peek()
dut.io.debug_reg_value.expect(42.S)     // Expect x5 == 42
```
- The test confirmed that register x5 indeed contained the expected value of 42, demonstrating both the functionality of the new debug interface and the correctness of the five-stage RV32I pipeline.

Note:

Make sure to give the pipeline enough clock cycles to complete the instructions before checking the debug port.

If your ChiselTest version differs, you may need to adjust the specific API calls or use alternative testing methods (e.g., comparing result.litValue with the expected integer).

problem3 : testq2_square fails, and how it was debugged

I. `sp` Not Initialized

Symptom:

In typical Linux/OS or standard C runtime environments, the system’s startup code automatically sets the stack pointer (sp) to the top of a valid memory region. However, in our educational 5-stage RV32I processor/bare-metal environment, there is no default mechanism to initialize sp. As a result, upon reset, sp often starts off as zero (or undefined), which causes the program to treat sp as a valid address even though it actually points to invalid or out-of-range memory locations.

Impact:

Any code that uses the stack (function calls/returns, local variables) can fail.
The program may produce incorrect results, or jump to an invalid address and hang or crash.

Solution:

On the software side, at the program entry point, explicitly set sp to a safe location in RAM before any code touches the stack.

II. Program Output (`a0`) Wrong and Unusually Large

Recap of hazards and forwarding

Types of Hazards Encountered

Load-use Hazard (data hazard)
Requires 1-cycle stall (special case of RAW hazard when the previous instruction is a load).
Other RAW Hazards (data hazard)
Usually handled by forwarding (no stall needed).
Branch/JAL/JALR Hazards (control hazard)
Involve flushing the IF/ID stage and possibly dealing with hazards in ID/EX if the instruction depends on a register yet to be written.

Data Hazards

1. RAW (Read After Write)

Symptom: A subsequent instruction tries to read a register that a previous instruction hasn’t written yet.
Solution:
- Forwarding (also called bypassing) typically solves RAW hazards without stalling.

2. Load-use Hazard (a special RAW)

Occurs when a load instruction is immediately followed by an instruction that needs its loaded data.
Scenario:
- Load + R-type (or I-type ALU)
- The next instruction tries to read the result in EX or ID stage, but the load data is only available after MEM completes.
Solution:
- Stall for 1 cycle
  - The loaded data only exists once the load has finished MEM, which is one cycle too late for an instruction that enters EX right behind the load.
  - Concretely:
    - In cycle n+3 (where the load is in MEM), the CPU retrieves the data from memory.
    - In cycle n+4, the load is in WB and the dependent instruction, delayed by the stall, is only now in EX, so forwarding from MEM/WB can hand it the data.
    - To create that delay, we freeze PC and IF/ID for one cycle and insert a bubble into ID/EX, so the dependent instruction does not advance.

Example Timing

Front half: lw has finished MEM and its data, now held in MEM/WB, is forwarded.

Back half: the dependent instruction is in EX, receiving that data.

Load + Branch is even trickier because branch decisions are resolved in the ID stage:
- If the branch comparison depends on a loaded register, we can’t resolve the branch until the load data is known, which typically requires a 2-cycle stall (since the data is needed earlier in the pipeline, at ID).

Control Hazards

When?
- For a conditional branch, the decision to jump (or not) is known after ID stage.
Solution:
- Possibly handle data hazard via forwarding if the register used in the branch comparison is being written by a previous instruction.
- Flush the pipeline (e.g., clear IF/ID) if the branch decides to jump.

Timing Note (Digital Circuit / Pipeline):
We often assume an idealized timescale where in the “first half” of a cycle, an instruction’s output is produced and can be forwarded; in the “second half,” the subsequent instruction consumes it via forwarding.

Systematically analyzing .vcd files with GTKWave to pinpoint logic errors, verify protocol behavior, and ensure correct signal timing

Observing Instruction Execution and Results

The program was intended to perform an addition of x0 and a0, but the final outcome ended up as 0x49.

This indicates that x0 was treated as 0x48, and we noticed reg_7 = 0x49 at the end (i.e., io_w_data = 0x49), implying the ALU’s output (or one of its inputs) was corrupted.

Tracing the ALU Output

I examined the ALU’s out; its input inB was supposed to be x0=0, but in reality became 0x48. We initially suspected hazard forwarding (EX/MEM or MEM/WB) might have overridden rs2_data (which should have been x0) with an unrelated value.

Checking Forwarding

forwarding.io.forward_a and forwarding.io.forward_b were both b00, indicating no forwarding at those ports.
Hence, the standard forwarding logic did not inject spurious data into x0.

Moving One Stage Up: `ID/EX rs2_data_out`

We found:

ID/EX.io.rs2_data_out = 0x48
Structural.io.fwd_rs2 = 1

Meaning the Structural module incorrectly decided to overwrite rs2 with RegFile.io.w_data.

Upon checking Structural.scala, we discovered:

Whenever the instruction is lw in ID stage and the WB stage has io.MEM_WB_regWr = 1 plus a matching rd == rs2, it triggers forwarding.
But if rd=0, it incorrectly matches rs2=0, causing an unwanted forwarding into x0.

Identifying the Root Cause

In RISC-V, a write to rd = x0 has no architectural effect (x0 always reads as 0). Because the Structural logic did not exclude rd=0 from its check, any time rd=0 coincided with rs2=0, the module attempted to forward. This erroneously plugged in 0x48 where x0 should have stayed zero.

Fixing the Issue

By adding the condition && (rd =/= 0.U) to the Structural forwarding logic, we stop x0 from ever being forwarded. Consequently, io.MEM_WB_rd=0 will no longer match rs2=0. This ensures x0 remains zero as intended and prevents the ALU from receiving 0x48 in place of x0.
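The bug and the fix can be reproduced with a small plain-Scala model of the same-cycle bypass decision. This is a software sketch, not the contents of Structural.scala: `bypassBuggy`, `bypassFixed`, and `read` are hypothetical names, and only the rs2 path from the trace above is modelled.

```scala
// Software model of the register-file same-cycle bypass; illustrative only.
object RegFileBypassModel {
  // Original behaviour: any matching register number triggers the bypass,
  // so rd = 0 in WB matches rs2 = 0 in ID.
  def bypassBuggy(memWbRegWr: Boolean, memWbRd: Int, rs: Int): Boolean =
    memWbRegWr && memWbRd == rs

  // Fixed behaviour: x0 is never a bypass source.
  def bypassFixed(memWbRegWr: Boolean, memWbRd: Int, rs: Int): Boolean =
    memWbRegWr && memWbRd != 0 && memWbRd == rs

  // Value seen by ID: the write-back data if bypassing, else the register file value.
  def read(regValue: Int, wData: Int, bypass: Boolean): Int =
    if (bypass) wData else regValue
}
```

With `memWbRd = 0`, `rs = 0`, and write-back data 0x48, the buggy check hands 0x48 to the reader, while the fixed check leaves x0 at 0.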

problem4 : testq4_log2 fails, and how it was debugged

Debugging the Incorrect Result