饒胤琛
Important
The goal of this project is to get familiar with designing digital systems in Chisel, and then to extend the 5-Stage-RV32I core by kinzafatim with hazard detection and forwarding logic, fully verified by 5 testbenches.
Typically, one uses class ... extends Module. In this project, for instance, we have class PIPELINE(...) extends Module.
This represents the “highest-level” hardware design, responsible for coordinating the various stages and submodules.
Inside the top-level class:
val io = IO(new Bundle {
  val out = Output(SInt(4.W))
})
According to our design needs, we “create” other module instances in the top-level code, such as:
val IF_ID_ = Module(new IF_ID)
val InstMemory = Module(new InstMem(initFile))
In Chisel, these instances function like components in a circuit schematic: we declare them using Module(new …) and give them names.
We then perform all necessary I/O connections among submodules and the top-level io.
For example:
IF_ID_.io.pc_in := PC.io.out
InstMemory.io.addr := PC.io.out.asUInt
Finally, we ensure that signals and control paths are properly linked from one stage to the next.
(Summary)
This procedure—defining the top module, declaring the I/O interface, instantiating submodules, and wiring them up—is a typical Chisel hardware design workflow. Once complete, you can simulate in Chisel/Verilog and map it onto an FPGA or ASIC.
In this project’s Main.scala, we see four pipeline registers:
val IF_ID_ = Module(new IF_ID)
val ID_EX_ = Module(new ID_EX)
val EX_MEM_M = Module(new EX_MEM)
val MEM_WB_M = Module(new MEM_WB)
They handle the IF→ID, ID→EX, EX→MEM, MEM→WB stages, respectively.
These registers latch instruction details and control signals at each stage, ensuring each instruction traverses the five pipeline steps—fetching, decoding, executing, memory accessing, and writing back—over multiple clock cycles.
If a previous instruction is a load (memRead=1) and the next instruction immediately needs that register (rs1/rs2 == rd) in the EX stage, we must stall for one cycle and insert a bubble in EX.
In practice, HazardDetection.scala detects this condition and outputs ctrl_forward=1, prompting the top-level code to freeze PC/IF_ID and set the ID_EX control signals to zero (bubble).
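The load-use condition described above can be sketched as a small software reference model in plain Scala. This is not the project's HazardDetection.scala; the object name, the parameter names, and the rd != 0 guard are assumptions made for illustration.

```scala
// Hypothetical reference model of the load-use check (not the project's Chisel code).
object LoadUseModel {
  // idExMemRead: the instruction currently in EX is a load.
  // idExRd:      destination register of that load.
  // idRs1/idRs2: source registers of the instruction currently in ID.
  // Returns true when PC and IF_ID should be frozen and a bubble placed in ID_EX.
  def mustStall(idExMemRead: Boolean, idExRd: Int, idRs1: Int, idRs2: Int): Boolean =
    idExMemRead && idExRd != 0 && (idExRd == idRs1 || idExRd == idRs2)
}
```

For example, a load into x5 followed directly by an instruction that reads x5 gives LoadUseModel.mustStall(true, 5, 5, 0) == true, while the same pair without a load in EX does not stall.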
If a previous instruction hasn’t fully written back to the register file, but the next instruction in EX needs that value, we use Forwarding.scala to route data (EX/MEM or MEM/WB outputs) directly to the ALU.
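As a rough illustration of that routing decision, here is a plain-Scala reference model that picks the source of one ALU operand. It assumes the usual textbook priority (the newer EX/MEM result wins over MEM/WB) and an rd != 0 guard; the names are hypothetical and the project's actual forward_a/forward_b encoding is not reproduced.

```scala
// Hypothetical reference model of EX-stage operand forwarding (illustrative names).
object ForwardModel {
  sealed trait Src
  case object RegFile extends Src // no hazard: use the value read in ID
  case object ExMem   extends Src // forward the result held in EX/MEM
  case object MemWb   extends Src // forward the value held in MEM/WB

  // rs: source register needed by the instruction in EX.
  def select(rs: Int,
             exMemRegWrite: Boolean, exMemRd: Int,
             memWbRegWrite: Boolean, memWbRd: Int): Src =
    if (exMemRegWrite && exMemRd != 0 && exMemRd == rs) ExMem      // newest producer first
    else if (memWbRegWrite && memWbRd != 0 && memWbRd == rs) MemWb // older producer
    else RegFile
}
```

Checking EX/MEM first matters when two in-flight instructions write the same register: the consumer must see the most recent value.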
If the instruction is a branch or jump, we need to decide whether to take the branch in the ID or EX stage. If taken, we flush the pipeline (clearing IF/ID or beyond). Meanwhile, BranchForward.scala performs data forwarding for rs1, rs2 needed in branch comparisons.
Since this project uses separate instruction/data memories, we don’t encounter a unified memory structural hazard. However, we do consider “RegisterFile read and write in the same cycle.” In StructuralHazard.scala, if ID stage reads xN at the same time WB stage writes xN, we directly fetch RegFile.io.w_data instead of stale data.
- Forwarding.scala: Deals with ALU inputs in the EX stage (in_A, in_B). When forward_x = 1.U or 2.U, the data might come from EX/MEM or MEM/WB instead of the register-file values of rs1/rs2.
- BranchForward.scala: Specifically for instructions like beq, bne or jalr that may resolve in ID. If a previous instruction hasn’t written the needed register, we route from ALU.out, EX_MEM, or WB. Unlike ALU forwarding, branch/jalr decisions might be made in the ID stage or a specialized branch unit, requiring separate forwarding logic.
- StructuralHazard.scala: If ID stage reads xN while WB stage is writing xN in the same clock cycle, we immediately use RegFile.io.w_data. This is another form of data hazard, though many textbooks solve it by writing in the first half-cycle and reading in the second half-cycle. This project handles it explicitly in a separate file.
.elf and .txt Memory Loading
loadMemoryFromFile
loadMemoryFromFile(...), as provided by Chisel/FIRRTL/Treadle, generally expects a simple text file with one data word per line (written as hexadecimal or binary digits). It does not parse ELF structure (section headers, symbol tables, etc.). Hence, if we only have an .elf file, we must first convert it (typically via a raw .bin image) into a textual .hex or .mem file that loadMemoryFromFile can handle.
ELF (Executable and Linkable Format) includes section headers, relocation data, and more. A minimal hardware memory model in Chisel is unaware of these complexities and doesn’t come with an ELF loader. Consequently, typical solutions involve converting ELF to .txt / .mem, ensuring each line holds one 32-bit instruction for addresses 0, 4, 8, etc.
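One possible way to produce such a file is sketched below in plain Scala. It assumes the ELF has already been flattened to a raw little-endian binary image (for example with objcopy -O binary from a RISC-V toolchain) and that the memory is loaded in hex mode; the object name, function names, and file paths are hypothetical.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical helper: turns a raw little-endian binary image into one
// 8-digit hex word per line, the layout a hex-mode memory loader expects.
object BinToHexLines {
  def toHexLines(bytes: Array[Byte]): List[String] =
    bytes.grouped(4).map { chunk =>
      // Pad a short final chunk with zeros, then assemble byte 0 as the least significant byte.
      val word = chunk.padTo(4, 0.toByte).zipWithIndex
        .map { case (b, i) => (b & 0xFFL) << (8 * i) }
        .sum
      f"$word%08x"
    }.toList

  // Reads binPath and writes the hex lines to txtPath (both paths supplied by the caller).
  def convert(binPath: String, txtPath: String): Unit = {
    val lines = toHexLines(Files.readAllBytes(Paths.get(binPath)))
    Files.write(Paths.get(txtPath), lines.mkString("", "\n", "\n").getBytes("UTF-8"))
  }
}
```

The bytes 13 05 15 00, as they would appear in the binary image, become the single line 00150513, which is the encoding of addi a0, a0, 1.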
In a pipeline design:
Stall
Freezes a particular stage (e.g., ID), preventing that pipeline register from updating or fetching a new instruction. Effectively, the same instruction remains for one extra cycle.
Bubble
Continues advancing the pipeline register but replaces control signals with zeros (NOP) so that stage does no meaningful work.
Practically, we often stall the ID stage and simultaneously bubble the EX stage. For example, if we have a load-use hazard, ID remains stuck, while EX receives a no-op.
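The difference between the two can be made concrete with a tiny plain-Scala model of a pipeline register update. This is a simplified sketch with hypothetical names, not the project's IF_ID or ID_EX modules; a real register carries many more fields.

```scala
// Hypothetical model of one pipeline register's next-state rule.
object PipeRegModel {
  // Simplified register contents: an instruction label and one control bit.
  final case class Stage(inst: String, regWrite: Boolean)
  val Nop: Stage = Stage("nop", regWrite = false)

  // stall:  keep the current contents (the same instruction stays one more cycle).
  // bubble: advance, but latch a NOP with all control signals cleared.
  // else:   latch whatever the previous stage presents.
  def next(current: Stage, incoming: Stage, stall: Boolean, bubble: Boolean): Stage =
    if (stall) current else if (bubble) Nop else incoming
}
```

In a load-use hazard, IF_ID would be updated with stall = true (the dependent instruction stays in ID) while ID_EX is updated with bubble = true (EX receives a no-op).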
By Pipeline Stage
- Fetch logic, then forward to IF_ID.
- IF_ID inputs + RegisterFile + Control + ImmGen + HazardDetect, then forward to ID_EX.
- ID_EX inputs + ALU + ALUControl + Forwarding, then forward to EX_MEM.
- EX_MEM inputs + DataMemory, then forward to MEM_WB.
- MEM_WB inputs, writes back to RegFile.
By Functional Blocks
Hybrid
Reason: the path to test.txt is absolute in Main.scala.
=> The solution is to change it to a relative path.
val InstMemory = Module(new InstMem ("/home/kinzaa/Desktop/5-Stage-RV32I/src/main/scala/Pipeline/test.txt"))
Reason: In my Main.scala (under the PIPELINE directory), InstMem is currently hard-coded as follows:
val InstMemory = Module(new InstMem("/home/.../test.txt"))
As a result, it always loads instructions from test.txt.
=> I made it parameterized.
In case I want to test multiple sets of machine code, a common approach is to pass the file path as a constructor parameter to PIPELINE. This way, you can easily switch between different instruction files as needed.
=> I added test2, a RISC-V program called sum_int, to TOPTest, and it runs successfully on 5-Stage-RV32I.
After introducing a DebugPort interface to facilitate direct readout from the CPU’s register file (RegisterFile), I performed the following steps:
Module Modifications
- RegisterFile.scala: Added new I/O ports to read a specified register (e.g., debug_read_reg) and output its value (debug_reg_value).
- PIPELINE.scala: Exposed the above debug ports through the top-level module, allowing external test code to poke/peek those signals.
Dedicated DebugPortTest Suite
- Added a DebugPortTest class in MainTest.scala, leveraging ChiselTest’s poke(), peek(), and expect() APIs.
- The test runs an add instruction sequence (located in test_add.txt). This sequence initializes registers x3 and x4 to constants and then adds them into x5.
Validation
- After advancing the clock (dut.clock.step(...)), I used:
dut.io.debug_read_reg.poke(5.U) // Read register x5
val result = dut.io.debug_reg_value.peek()
dut.io.debug_reg_value.expect(42.S) // Expect x5 == 42
- Register x5 indeed contained the expected value of 42, demonstrating both the functionality of the new debug interface and the correctness of the five-stage RV32I pipeline for this sequence.
Note:
- Make sure to give the pipeline enough clock cycles to complete the instructions before checking the debug port.
- If your ChiselTest version differs, you may need to adjust the specific API calls or use alternative testing methods (e.g., comparing result.litValue with the expected integer).
sp Not Initialized
In typical Linux/OS or standard C runtime environments, the system’s startup code automatically sets the stack pointer (sp) to the top of a valid memory region. However, in our educational 5-stage RV32I processor/bare-metal environment, there is no default mechanism to initialize sp. As a result, upon reset, sp often starts off as zero (or undefined), so the program treats sp as a valid address even though it actually points to invalid or out-of-range memory.
- Whenever you execute code that uses the stack (such as function calls/returns or local variables), it can fail.
- The program may produce incorrect results or jump to an invalid address and hang or crash.
Solution:
In the “software side” of the program entry point, explicitly set sp to a safe location in RAM.
a0 Wrong and Unusually Large
Occurs when a load instruction is immediately followed by an instruction that needs its loaded data.
Scenario:
Solution:
Insert a stall between IF/ID and ID/EX so the dependent instruction does not advance for one cycle.
Example Timing
- Front half: lw finishes MEM, obtains data, and forwards it.
- Back half: the dependent instruction is in EX, receiving that data.
Timing Note (Digital Circuit / Pipeline):
We often assume an idealized timescale where in the “first half” of a cycle, an instruction’s output is produced and can be forwarded; in the “second half,” the subsequent instruction consumes it via forwarding.
Systematically analyzing .vcd files with GTKWave to pinpoint logic errors, verify protocol behavior, and ensure correct signal timing
The program was intended to perform an addition of x0 and a0, but the final outcome ended up as 0x49.
This indicates that x0 was treated as 0x48, and we noticed reg_7 = 0x49 at the end (i.e., io_w_data = 0x49), implying the ALU’s output (or one of its inputs) was corrupted.
I examined the ALU’s out; its input inB was supposed to be x0=0, but in reality became 0x48. We initially suspected hazard forwarding (EX/MEM or MEM/WB) might have overridden rs2_data (which should have been x0) with an unrelated value.
- forwarding.io.forward_a and forwarding.io.forward_b were both b00, indicating no forwarding at those ports.
- So the forwarding unit was not what replaced x0.
ID/EX rs2_data_out
We found:
ID/EX.io.rs2_data_out = 0x48
Structural.io.fwd_rs2 = 1
Meaning the Structural module incorrectly decided to overwrite rs2 with RegFile.io.w_data.
Upon checking Structural.scala, we discovered:
Whenever the instruction is lw in ID stage and the WB stage has io.MEM_WB_regWr = 1 plus a matching rd == rs2, it triggers forwarding.
But if rd=0, it incorrectly matches rs2=0, causing an unwanted forwarding into x0.
In RISC-V, a write to rd = x0 has no effect (x0 always reads as 0). Because the Structural logic did not exclude rd=0 from its check, any time rd=0 coincided with rs2=0, the module attempted to forward. This erroneously plugged in 0x48 where x0 should have stayed zero.
By adding a condition && (rd =/= 0.U) in the Structural forwarding logic, we exclude x0 from being forwarded to. Consequently, io.MEM_WB_rd=0 will no longer match rs2=0. This ensures x0 remains zero as intended and prevents the ALU from receiving 0x48 in place of x0.
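The effect of the extra rd != 0 condition can be reproduced with a small plain-Scala reference model of the WB-to-ID bypass decision. The function names are hypothetical and the model only mirrors the comparison described in this section, not the full Structural.scala.

```scala
// Hypothetical model of the WB -> ID bypass decision, before and after the fix.
object BypassModel {
  // Original check: forwards whenever WB writes the register that ID reads.
  def fwdBuggy(wbRegWrite: Boolean, wbRd: Int, idRs: Int): Boolean =
    wbRegWrite && wbRd == idRs

  // Fixed check: x0 is never a forwarding target.
  def fwdFixed(wbRegWrite: Boolean, wbRd: Int, idRs: Int): Boolean =
    wbRegWrite && wbRd != 0 && wbRd == idRs

  // Operand seen by ID: the write-back data when forwarding, else the register-file value.
  def readOperand(regFileValue: Int, wbData: Int, fwd: Boolean): Int =
    if (fwd) wbData else regFileValue
}
```

With rd = 0, rs2 = 0 and 0x48 on the write-back bus, the original check yields 0x48 for x0, while the fixed check yields 0.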
Systematically analyzing .vcd files with GTKWave to pinpoint logic errors, verify protocol behavior, and ensure correct signal timing
Initial Observation
- In GTKWave, I watched x10 (a0), x1 (ra), x2 (sp), and x5 (t0).
- I also watched instmem.io_data, noting that the machine code 00150513 corresponds to addi a0, a0, 1.
- After this instruction, a0 should be incremented by 1.
- However, a0 ended up as 9 instead of 1.
Tracing Backward From the Faulty Instruction
- Check dmem if it’s a load, or alu.io_out plus alu.io_in_A / alu.io_in_B if it’s an add.
- The ALU output was 9, suggesting the input was incorrect.
- inA was 0x08, which means a0 had been 0x8 instead of 0x0.
Further Analysis: Wrong ALU Input
- The instruction that should have overwritten a0 (still at 0x8) was the older add a0, zero, zero (machine code 00000533); a0 was presumably not updated correctly (stale data was used).
- When addi a0, a0, 1 (00150513) is in ID, the older add a0, zero, zero is already at WB.
- So the ID stage read the old a0 value.
Misrouted Forwarding
- I inspected the structural hazard (or forwarding) unit’s signals.
- structural.fwd_rs1 isn’t triggered because structural.MEM_WB_regWr was 0 at that moment. This conflicts with the actual pipeline state: add a0, zero, zero is indeed writing back, so regWr should be 1.
- Looking at Main or PIPELINE, we realize the signal for reg_w in WB wasn’t correctly propagated into structural. The structural unit is only looking at EX_MEM_M.io.EXMEM_reg_w_out, but we really need the WB-stage control line to detect WB → ID hazards.
Fixing the Issue
- Make sure the WB-stage write enable (MEM_WB_regWr) feeds into structural (or an equivalent hazard-check module) so it can detect “the instruction in WB is writing to the same register rs1 needed by the instruction in ID.”
- This allows structural to assert fwd_rs1 properly and forward from WB to ID when required, fixing the stale data problem and ensuring addi a0, a0, 1 sees the updated register value rather than 0x8.
In order to verify the processor’s functionality, particularly its hazard detection and forwarding mechanisms, I used five RISC-V tests originally adapted from quiz questions. The tests were:
After iterating through the debugging steps, I reran the SBT-based test framework: