# Construct RISC-V in Chisel
> 蕭郁霖, 徐向廷
## A. Repository Study
> [5-Stage-RV32I](https://github.com/kinzafatim/5-Stage-RV32I/)
### 1. Basic Components
#### 1.1 Register File
**Filepath**: ``src/main/scala/Pipeline/UNits/RegisterFile.scala``
```scala=
class RegisterFile extends Module {
  val io = IO(new Bundle {
    val rs1 = Input(UInt(5.W))
    val rs2 = Input(UInt(5.W))
    val reg_write = Input(Bool())
    val w_reg = Input(UInt(5.W))
    val w_data = Input(SInt(32.W))
    val rdata1 = Output(SInt(32.W))
    val rdata2 = Output(SInt(32.W))
  })
  val regfile = RegInit(VecInit(Seq.fill(32)(0.S(32.W))))
  io.rdata1 := Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1))
  io.rdata2 := Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2))
  when(io.reg_write && io.w_reg =/= 0.U) {
    regfile(io.w_reg) := io.w_data
  }
}
```
The code snippet defines a `RegisterFile` module for a RISC-V pipeline with seven ports: five inputs and two outputs. As in the classic MIPS pipeline, the register file reads two source registers (`rs1` and `rs2`) and writes one destination register per cycle. The file is instantiated as 32 registers, all reset to 0. The outputs `rdata1` and `rdata2` are driven combinationally from `rs1` and `rs2`, with a check that makes a read of register 0 always return 0. For writes, if `reg_write` is asserted and the target register (`w_reg`) is not zero, that register is updated with `w_data` on the next clock edge. The following image illustrates the seven ports of the `RegisterFile` unit.
<center><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRfRZv5TLVtl1ZMktwcdj2mdCjrYtxOuWPzTTMKanMMLQ7n7JvWRLe-mVg7UCA8kFzrIxNHe_t4P-Q2OdxLPw5cKxQB4xoYU9EXpVnpEBgBUs8F9szcwuZwIr1AgVpOqFtcdtV-DZ3TvQ/s1600/8.gif" height="275"></img></center>
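The read and write rules above can be restated as a small software model. This is a minimal sketch in plain Scala rather than Chisel; the names `RegFileModel` and `Model` are hypothetical and do not appear in the repository.

```scala
// Plain-Scala reference model (not Chisel) of the register-file rules described above.
object RegFileModel {
  final class Model {
    // 32 registers; a Scala Int is a 32-bit signed value, like SInt(32.W).
    private val regs = Array.fill(32)(0)

    // Combinational read: register 0 always reads as zero.
    def read(rs: Int): Int = if (rs == 0) 0 else regs(rs)

    // Write at the clock edge: ignored unless regWrite is set and rd is non-zero.
    def write(regWrite: Boolean, rd: Int, data: Int): Unit =
      if (regWrite && rd != 0) regs(rd) = data
  }
}
```

A model like this can serve as a golden reference when checking the hardware in simulation.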
#### 1.2 PC
**Filepath**: `src/main/scala/Pipeline/UNits/PC.scala`
```scala=
class PC extends Module {
  val io = IO (new Bundle {
    val in = Input(SInt(32.W))
    val out = Output(SInt(32.W))
  })
  val PC = RegInit(0.S(32.W))
  io.out := PC
  PC := io.in
}
```
The code snippet implements a program counter (`PC`) module that maintains the current program counter value. It uses `RegInit` to initialize the register to 0 and updates the stored PC value with the input (`io.in`) at every cycle, while also exposing this value via `io.out`.
#### 1.3 PC + 4
**Filepath**: `src/main/scala/Pipeline/UNits/PC4.scala`
```scala=
class PC4 extends Module {
  val io = IO (new Bundle {
    val pc = Input(UInt(32.W))
    val out = Output(UInt(32.W))
  })
  io.out := 0.U
  io.out := io.pc + 4.U(32.W)
}
```
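The code snippet defines a `PC4` module, which computes the address of the next sequential instruction by adding 4 to the current PC input (`io.pc`), since every RV32I instruction is 4 bytes wide. The initial `io.out := 0.U` assignment is immediately overridden by the line that follows it (Chisel's last-connect semantics), so it has no effect on the result.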
#### 1.4 JALR
**Filepath**: `src/main/scala/Pipeline/UNits/JALR.scala`
```scala=
class Jalr extends Module {
  val io = IO(new Bundle {
    val imme = Input(UInt(32.W))
    val rdata1 = Input(UInt(32.W))
    val out = Output(UInt(32.W))
  })
  val computedAddr = io.imme + io.rdata1
  // Align the address by masking the least significant bit (LSB) to 0
  io.out := computedAddr & "hFFFFFFFE".U
}
```
The code snippet above implements the address calculation for the jump-and-link-register (`JALR`) instruction. The module computes the target address by adding the register value (`rdata1`) to the immediate offset (`imme`). It then applies the bit mask `0xFFFFFFFE`, which forces the least significant bit (LSB) to 0, as the RISC-V specification requires for `JALR`. The resulting jump address is provided through `io.out`.
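As a quick arithmetic check of the masking step, the same computation can be written in plain Scala. This is an illustrative sketch; `JalrModel` and `target` are hypothetical names.

```scala
// Plain-Scala model (not Chisel) of the JALR target computation.
object JalrModel {
  // Int addition wraps modulo 2^32, like the 32-bit hardware adder.
  // 0xFFFFFFFE clears bit 0 of the sum and leaves the other bits unchanged.
  def target(rs1: Int, imm: Int): Int = (rs1 + imm) & 0xFFFFFFFE
}
```

For instance, a base of `0x1001` with offset 4 gives the sum `0x1005`, and the mask turns it into `0x1004`.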
#### 1.5 Imm-Generator
**Filepath**: `src/main/scala/Pipeline/UNits/ImmGenerator.scala`
```scala=
class ImmGenerator extends Module {
val io = IO(new Bundle {
val instr = Input(UInt(32.W))
val pc = Input(UInt(32.W))
val I_type = Output(SInt(32.W))
val S_type = Output(SInt(32.W))
val SB_type = Output(SInt(32.W))
val U_type = Output(SInt(32.W))
val UJ_type = Output(SInt(32.W))
})
// I-Type Immediate: [31:20] sign-extended to 32 bits
io.I_type := Cat(Fill(20, io.instr(31)), io.instr(31, 20)).asSInt
// S-Type Immediate: [31:25][11:7] sign-extended to 32 bits
io.S_type := Cat(Fill(20, io.instr(31)), io.instr(31, 25), io.instr(11, 7)).asSInt
// Branch-Type Immediate: [31][7][30:25][11:8] sign-extended to 32 bits
val sbImm = Cat(Fill(19, io.instr(31)), io.instr(31), io.instr(7), io.instr(30, 25), io.instr(11, 8), 0.U(1.W)).asSInt
io.SB_type := sbImm + io.pc.asSInt
// U-Type Immediate: [31:12] shifted left by 12 bits
io.U_type := Cat(io.instr(31, 12), Fill(12, 0.U)).asSInt
// UJ-Type Immediate: [31][19:12][20][30:21] sign-extended to 32 bits, shifted left by 1 bit
val ujImm = Cat(Fill(11, io.instr(31)), io.instr(31), io.instr(19, 12), io.instr(20), io.instr(30, 21), 0.U(1.W)).asSInt
io.UJ_type := ujImm + io.pc.asSInt
}
```
The code snippet implements the generation of 32-bit immediate values from RISC-V instructions, tailored to each instruction format. For I-type instructions, it extracts bits `[31:20]` from the instruction and sign-extends them to 32 bits. In the case of S-type instructions, the immediate is formed by concatenating bits `[31:25]` with bits `[11:7]` and then sign-extending the result. For branch (SB-type) instructions, the immediate is built by concatenating several segments—bit 31, bit 7, bits `[30:25]`, and bits `[11:8]`—with an additional 0 appended as the least significant bit for proper alignment, followed by sign extension. For U-type instructions, the immediate is taken from bits `[31:12]` and shifted left by 12 bits. Finally, for UJ-type instructions, the immediate is generated by concatenating bit 31, bits `[19:12]`, bit 20, and bits `[30:21]`, appending a trailing 0, and then sign-extending the result to 32 bits.
Additionally, the module computes target addresses for control flow instructions using these immediates. The output `io.SB_type` represents the branch target address for SB-type instructions, obtained by adding the sign-extended branch immediate to the current program counter (PC), thus yielding a PC-relative address for branch operations. Similarly, `io.UJ_type` provides the target address for UJ-type (jump) instructions by adding the corresponding immediate value to the current PC. These computed addresses are essential for correctly directing the control flow during instruction execution in the RISC-V pipeline.
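The bit layouts described above can be cross-checked with a software decoder. The following is a minimal sketch in plain Scala, not the repository's code; `ImmModel` and its method names are hypothetical, and the bit positions follow the RV32I instruction formats.

```scala
// Plain-Scala model (not Chisel) of the five immediate formats.
object ImmModel {
  // Sign-extend the low `bits` bits of `value` to a 32-bit Int.
  private def sext(value: Int, bits: Int): Int = (value << (32 - bits)) >> (32 - bits)

  // Extract instr[hi:lo] as an unsigned field (field width must be below 32).
  private def field(instr: Int, hi: Int, lo: Int): Int =
    (instr >>> lo) & ((1 << (hi - lo + 1)) - 1)

  // I-type: instr[31:20]; the arithmetic shift performs the sign extension.
  def immI(instr: Int): Int = instr >> 20

  // S-type: instr[31:25] ## instr[11:7].
  def immS(instr: Int): Int = sext((field(instr, 31, 25) << 5) | field(instr, 11, 7), 12)

  // SB-type: instr[31] ## instr[7] ## instr[30:25] ## instr[11:8] ## 0.
  def immB(instr: Int): Int =
    sext((field(instr, 31, 31) << 12) | (field(instr, 7, 7) << 11) |
         (field(instr, 30, 25) << 5) | (field(instr, 11, 8) << 1), 13)

  // U-type: instr[31:12] placed in the upper 20 bits, low 12 bits zero.
  def immU(instr: Int): Int = instr & 0xFFFFF000

  // UJ-type: instr[31] ## instr[19:12] ## instr[20] ## instr[30:21] ## 0.
  def immJ(instr: Int): Int =
    sext((field(instr, 31, 31) << 20) | (field(instr, 19, 12) << 12) |
         (field(instr, 20, 20) << 11) | (field(instr, 30, 21) << 1), 21)

  // PC-relative targets, mirroring what io.SB_type and io.UJ_type output.
  def branchTarget(pc: Int, instr: Int): Int = pc + immB(instr)
  def jumpTarget(pc: Int, instr: Int): Int = pc + immJ(instr)
}
```

For example, `0xFE000EE3` encodes `beq x0, x0, -4`, so `immB` yields -4 and a branch at PC 16 targets address 12.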
#### 1.6 Control Unit
**Filepath**: `src/main/scala/Pipeline/UNits/control.scala`
```scala=
class Control extends Module {
val io = IO(new Bundle {
val opcode = Input(UInt(7.W)) // 7-bit opcode
val mem_write = Output(Bool()) // asserted for stores (write to data memory)
val branch = Output(Bool()) // asserted for conditional branches
val mem_read = Output(Bool()) // asserted for loads (read from data memory)
val reg_write = Output(Bool()) // asserted when the instruction writes a register
val men_to_reg = Output(Bool()) // selects memory data as the register write value (loads)
val alu_operation = Output(UInt(3.W))
val operand_A = Output(UInt(2.W)) // Operand A source selection for the ALU
val operand_B = Output(Bool()) // Operand B source selection for the ALU
// Immediate format select (0 for I-type, 1 for S-type, 2 for U-type in the cases below)
val extend = Output(UInt(2.W))
val next_pc_sel = Output(UInt(2.W)) // next PC source: 0 = PC+4, 1 = branch, 2 = JAL, 3 = JALR
})
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 0.B
io.men_to_reg := 0.B
io.alu_operation := 0.U
io.operand_A := 0.U
io.operand_B := 0.B
io.extend := 0.U
io.next_pc_sel := 0.U
switch(io.opcode) {
// R type instructions (e.g., add, sub)
is(51.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 1.B
io.men_to_reg := 0.B
io.alu_operation := 0.U
io.operand_A := 0.U
io.operand_B := 0.B
io.extend := 0.U
io.next_pc_sel := 0.U
}
// I type instructions (e.g., immediate operations)
is(19.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 1.B
io.men_to_reg := 0.B
io.alu_operation := 1.U
io.operand_A := 0.U
io.operand_B := 1.B
io.extend := 0.U
io.next_pc_sel := 0.U
}
// S type instructions (e.g., store operations)
is(35.U) {
io.mem_write := 1.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 0.B
io.men_to_reg := 0.B
io.alu_operation := 5.U
io.operand_A := 0.U
io.operand_B := 1.B
io.extend := 1.U
io.next_pc_sel := 0.U
}
// Load instructions (e.g., load data from memory)
is(3.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 1.B
io.reg_write := 1.B
io.men_to_reg := 1.B
io.alu_operation := 4.U
io.operand_A := 0.U
io.operand_B := 1.B
io.extend := 0.U
io.next_pc_sel := 0.U
}
// SB type instructions (e.g., conditional branch)
is(99.U) {
io.mem_write := 0.B
io.branch := 1.B
io.mem_read := 0.B
io.reg_write := 0.B
io.men_to_reg := 0.B
io.alu_operation := 2.U
io.operand_A := 0.U
io.operand_B := 0.B
io.extend := 0.U
io.next_pc_sel := 1.U
}
// UJ type instructions (e.g., jump and link)
is(111.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 1.B
io.men_to_reg := 0.B
io.alu_operation := 3.U
io.operand_A := 1.U
io.operand_B := 0.B
io.extend := 0.U
io.next_pc_sel := 2.U
}
// Jalr instruction (e.g., jump and link register)
is(103.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 1.B
io.men_to_reg := 0.B
io.alu_operation := 3.U
io.operand_A := 1.U
io.operand_B := 0.B
io.extend := 0.U
io.next_pc_sel := 3.U
}
// U type (LUI) instructions (e.g., load upper immediate)
is(55.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 1.B
io.men_to_reg := 0.B
io.alu_operation := 6.U
io.operand_A := 3.U
io.operand_B := 1.B
io.extend := 2.U
io.next_pc_sel := 0.U
}
// U type (AUIPC) instructions (e.g., add immediate to PC)
is(23.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 1.B
io.men_to_reg := 0.B
io.alu_operation := 7.U
io.operand_A := 2.U
io.operand_B := 1.B
io.extend := 2.U
io.next_pc_sel := 0.U
}
}
}
```
The code snippet above implements the control unit for a 5-stage RISC-V pipeline. From the 7-bit opcode, the module generates the control signals that steer the processor's datapath: `mem_write`, `branch`, `mem_read`, `reg_write`, `men_to_reg`, `alu_operation`, `operand_A`, `operand_B`, `extend`, and `next_pc_sel`. Every signal first receives a default of 0, so an unrecognized opcode asserts nothing; a `switch` keyed on the opcode then overrides the defaults for each instruction type (R-type, I-type, S-type, load, SB-type, UJ-type, JALR, LUI, and AUIPC). The accompanying diagram and mapping table illustrate how these signals are routed to the appropriate hardware components in the pipeline.
<center><img src="https://stevengong.co/attachments/Screen-Shot-2022-12-11-at-1.16.11-PM.png" height="275"></img></center>
<center>
|Label|Signal Name (Code)|Signal Name (Diagram)|
|---|---|---|
|1|io.mem_write|MemWrite|
|2|io.branch|Branch|
|3|io.mem_read|MemRead|
|4|io.reg_write|RegWrite|
|5|io.men_to_reg|MemtoReg|
|6|io.alu_operation|ALUSrc|
|7|io.operand_A|ALUOp1|
|8|io.operand_B|ALUOp0|
</center>
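The opcode-to-signal mapping in the `switch` can also be read as a lookup table. The sketch below restates it in plain Scala; `ControlModel`, `Ctrl`, and the field names are hypothetical, while the opcode numbers and signal values are copied from the cases in the module.

```scala
// Plain-Scala model (not Chisel) of the control unit as a lookup table.
object ControlModel {
  // Defaults mirror the module: every signal is 0 unless a case overrides it.
  final case class Ctrl(
    memWrite: Boolean = false, branch: Boolean = false, memRead: Boolean = false,
    regWrite: Boolean = false, memToReg: Boolean = false, aluOp: Int = 0,
    opA: Int = 0, opB: Boolean = false, extend: Int = 0, nextPcSel: Int = 0)

  private val table: Map[Int, Ctrl] = Map(
    51  -> Ctrl(regWrite = true),                                        // R-type
    19  -> Ctrl(regWrite = true, aluOp = 1, opB = true),                 // I-type
    35  -> Ctrl(memWrite = true, aluOp = 5, opB = true, extend = 1),     // S-type
    3   -> Ctrl(memRead = true, regWrite = true, memToReg = true,
                aluOp = 4, opB = true),                                  // loads
    99  -> Ctrl(branch = true, aluOp = 2, nextPcSel = 1),                // SB-type
    111 -> Ctrl(regWrite = true, aluOp = 3, opA = 1, nextPcSel = 2),     // JAL
    103 -> Ctrl(regWrite = true, aluOp = 3, opA = 1, nextPcSel = 3),     // JALR
    55  -> Ctrl(regWrite = true, aluOp = 6, opA = 3, opB = true, extend = 2), // LUI
    23  -> Ctrl(regWrite = true, aluOp = 7, opA = 2, opB = true, extend = 2)  // AUIPC
  )

  // Unknown opcodes fall back to the all-zero defaults.
  def decode(opcode: Int): Ctrl = table.getOrElse(opcode & 0x7F, Ctrl())
}
```

Writing the table this way makes it easy to see that only stores assert `memWrite` and only loads assert both `memRead` and `memToReg`.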
#### 1.7 Branching Unit
**Filepath**: `src/main/scala/Pipeline/UNits/BRANCH.scala`
```scala=
class Branch extends Module {
val io = IO(new Bundle {
val fnct3 = Input(UInt(3.W))
val branch = Input(Bool())
val arg_x = Input(SInt(32.W))
val arg_y = Input(SInt(32.W))
val br_taken = Output(Bool())
})
io.br_taken := false.B
when(io.branch) {
// beq
when(io.fnct3 === 0.U) {
io.br_taken := io.arg_x === io.arg_y
}
// bne
.elsewhen(io.fnct3 === 1.U) {
io.br_taken := io.arg_x =/= io.arg_y
}
// blt
.elsewhen(io.fnct3 === 4.U) {
io.br_taken := io.arg_x < io.arg_y
}
// bge
.elsewhen(io.fnct3 === 5.U) {
io.br_taken := io.arg_x >= io.arg_y
}
// bltu (unsigned less than)
.elsewhen(io.fnct3 === 6.U) {
io.br_taken := io.arg_x.asUInt < io.arg_y.asUInt
}
// bgeu (unsigned greater than or equal)
.elsewhen(io.fnct3 === 7.U) {
io.br_taken := io.arg_x.asUInt >= io.arg_y.asUInt
}
}
}
```
The code snippet implements branch decision logic for RISC-V's conditional branch instructions—namely, `beq`, `bne`, `blt`, `bge`, `bltu`, and `bgeu`. It uses four input ports: `io.fnct3`, which indicates the specific branch condition based on the instruction's function field; `io.branch`, a Boolean flag identifying whether the current instruction is an SB-Type branch; and `io.arg_x` and `io.arg_y`, which are the operands to be compared. Based on the value of `fnct3`, the module evaluates the appropriate comparison between `arg_x` and `arg_y`, and if the condition is satisfied, sets the output `io.br_taken` to true, indicating that a branch should be taken.
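The difference between the signed and unsigned comparisons is easiest to see with concrete values. The following plain-Scala sketch mirrors the decision logic; `BranchModel` and `taken` are hypothetical names.

```scala
// Plain-Scala model (not Chisel) of the branch decision.
object BranchModel {
  def taken(branch: Boolean, funct3: Int, x: Int, y: Int): Boolean =
    branch && (funct3 match {
      case 0 => x == y                                  // beq
      case 1 => x != y                                  // bne
      case 4 => x < y                                   // blt (signed)
      case 5 => x >= y                                  // bge (signed)
      case 6 => Integer.compareUnsigned(x, y) < 0       // bltu
      case 7 => Integer.compareUnsigned(x, y) >= 0      // bgeu
      case _ => false                                   // funct3 2 and 3 are not branches
    })
}
```

With `x = -1` and `y = 1`, `blt` is taken because -1 is less than 1 as a signed value, while `bltu` is not taken because the same bit pattern is `0xFFFFFFFF` when read as unsigned.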
#### 1.8 ALU Control Unit
**Filepath**: `src/main/scala/Pipeline/UNits/Alu_Control.scala`
```scala=
class AluControl extends Module {
val io = IO(new Bundle {
val func3 = Input(UInt(3.W))
val func7 = Input(Bool())
val aluOp = Input(UInt(3.W))
val out = Output(UInt(5.W))
})
io.out := 0.U
// R type
when(io.aluOp === 0.U) {
io.out := Cat(0.U(2.W), io.func7, io.func3)
// I type
}.elsewhen(io.aluOp === 1.U) {
io.out := Cat("b00".U(2.W), io.func3)
// SB type
}.elsewhen(io.aluOp === 2.U) {
io.out := Cat("b010".U(3.W), io.func3)
// UJ type (jal) and jalr
}.elsewhen(io.aluOp === 3.U) {
io.out := "b11111".U
// Loads, S type, U type (lui), U type (auipc)
}.elsewhen(io.aluOp === 4.U || io.aluOp === 5.U || io.aluOp === 6.U || io.aluOp === 7.U) {
io.out := "b00000".U
} .otherwise {
io.out := 0.U
}
}
```
The code snippet above implements the ALU Control Unit for a RISC-V pipeline, as illustrated in the diagram below. This unit has three input ports, `func3`, `func7` (a single bit here, declared as `Bool`), and `aluOp` (provided by the main control unit), and one output port, `io.out`. The 5-bit output depends on the instruction type. For R-type instructions (`aluOp` = 0), the ALU operation is formed by concatenating `func7` and `func3`; for I-type instructions (`aluOp` = 1), a fixed `00` prefix is prepended to `func3`. Branches (`aluOp` = 2) prepend `010` to `func3`, jumps (`aluOp` = 3, used by both JAL and JALR) receive the constant `11111`, and loads, stores, LUI, and AUIPC (`aluOp` = 4 to 7) receive `00000`, which the ALU treats as addition.
<center><img src="https://media.cheggcdn.com/media/246/24615c38-1d21-4653-8276-51de3789f545/phpWU6LEQ" height="500"></img></center>
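The encoding can be tabulated with a small plain-Scala model. This is a sketch under one stated assumption: the R-type and branch concatenations in the module are 6 bits wide, and the model assumes they are truncated to the low 5 bits when assigned to the 5-bit output. `AluControlModel` and `aluControl` are hypothetical names.

```scala
// Plain-Scala model (not Chisel) of the ALU control encoding.
// Assumption: 6-bit concatenations are truncated to the 5-bit output port.
object AluControlModel {
  def aluControl(aluOp: Int, func3: Int, func7: Boolean): Int = aluOp match {
    case 0 => ((if (func7) 1 else 0) << 3) | (func3 & 7) // R-type: func7 ## func3
    case 1 => func3 & 7                                  // I-type: 00 ## func3
    case 2 => 0x10 | (func3 & 7)                         // branches: low 5 bits of 010 ## func3
    case 3 => 31                                         // JAL and JALR
    case 4 | 5 | 6 | 7 => 0                              // loads, stores, LUI, AUIPC
    case _ => 0
  }
}
```

Under this reading, `sub` (`func7` set, `func3` = 0) maps to 8 and `sra` maps to 13, matching the `ALU_SUB` and `ALU_SRA` constants. The I-type path ignores `func7`, so `srai` would map to the same code as `srli` (5) rather than 13.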
#### 1.9 ALU Unit
**Filepath**: `src/main/scala/Pipeline/UNits/Alu.scala`
```scala=
object AluOpCode {
val ALU_ADD = 0.U(5.W)
val ALU_ADDI = 0.U(5.W)
val ALU_SW = 0.U(5.W)
val ALU_LW = 0.U(5.W)
val ALU_LUI = 0.U(5.W)
val ALU_AUIPC = 0.U(5.W)
val ALU_SLL = 1.U(5.W)
val ALU_SLLI = 1.U(5.W)
val ALU_SLT = 2.U(5.W)
val ALU_SLTI = 2.U(5.W)
val ALU_SLTU = 3.U(5.W)
val ALU_SLTUI = 3.U(5.W)
val ALU_XOR = 4.U(5.W)
val ALU_XORI = 4.U(5.W)
val ALU_SRL = 5.U(5.W)
val ALU_SRLI = 5.U(5.W)
val ALU_OR = 6.U(5.W)
val ALU_ORI = 6.U(5.W)
val ALU_AND = 7.U(5.W)
val ALU_ANDI = 7.U(5.W)
val ALU_SUB = 8.U(5.W)
val ALU_SRA = 13.U(5.W)
val ALU_SRAI = 13.U(5.W)
val ALU_JAL = 31.U(5.W)
val ALU_JALR = 31.U(5.W)
}
import AluOpCode._ // bring the ALU_* constants into scope
class ALU extends Module {
val io = IO(new Bundle {
val in_A = Input(SInt(32.W))
val in_B = Input(SInt(32.W))
val alu_Op = Input(UInt(5.W))
val out = Output(SInt(32.W))
})
val result = WireDefault(0.S(32.W))
switch(io.alu_Op) {
is(ALU_ADD, ALU_ADDI, ALU_SW, ALU_LW, ALU_LUI, ALU_AUIPC) {
result := io.in_A + io.in_B
}
is(ALU_SLL, ALU_SLLI) {
result := (io.in_A.asUInt << io.in_B(4, 0)).asSInt
}
is(ALU_SLT, ALU_SLTI) {
result := Mux(io.in_A < io.in_B, 1.S, 0.S)
}
is(ALU_SLTU, ALU_SLTUI) {
result := Mux(io.in_A.asUInt < io.in_B.asUInt, 1.S, 0.S)
}
is(ALU_XOR, ALU_XORI) {
result := io.in_A ^ io.in_B
}
is(ALU_SRL, ALU_SRLI) {
result := (io.in_A.asUInt >> io.in_B(4, 0)).asSInt
}
is(ALU_OR, ALU_ORI) {
result := io.in_A | io.in_B
}
is(ALU_AND, ALU_ANDI) {
result := io.in_A & io.in_B
}
is(ALU_SUB) {
result := io.in_A - io.in_B
}
is(ALU_SRA, ALU_SRAI) {
result := (io.in_A >> io.in_B(4, 0)).asSInt
}
is(ALU_JAL, ALU_JALR) {
result := io.in_A
}
}
io.out := result
}
```
The code snippet implements the ALU for a RISC-V pipeline, which executes the arithmetic and logical operations selected by the ALU Control Unit. The module has three input ports: two operands (`io.in_A` and `io.in_B`) and an operation code (`io.alu_Op`). The result is output via `io.out`. Several instructions share one code: for example, `ALU_ADD`, `ALU_ADDI`, `ALU_SW`, `ALU_LW`, `ALU_LUI`, and `ALU_AUIPC` are all 0, and for that code the module assigns the sum of `io.in_A` and `io.in_B` to `io.out`. Any code without a matching `is` clause leaves the result at its default of 0.
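A software reference for the ALU helps when checking individual operations, especially the two right shifts. The sketch below is plain Scala, not the repository's code; `AluModel` and `alu` are hypothetical names, and the numeric codes are those of the `AluOpCode` constants.

```scala
// Plain-Scala model (not Chisel) of the ALU operations.
object AluModel {
  def alu(op: Int, a: Int, b: Int): Int = op match {
    case 0  => a + b                                          // add, addi, lw, sw, lui, auipc
    case 1  => a << (b & 31)                                  // sll: shift amount is b[4:0]
    case 2  => if (a < b) 1 else 0                            // slt (signed)
    case 3  => if (Integer.compareUnsigned(a, b) < 0) 1 else 0 // sltu
    case 4  => a ^ b                                          // xor
    case 5  => a >>> (b & 31)                                 // srl: zero-filling shift
    case 6  => a | b                                          // or
    case 7  => a & b                                          // and
    case 8  => a - b                                          // sub
    case 13 => a >> (b & 31)                                  // sra: sign-preserving shift
    case 31 => a                                              // jal, jalr: pass in_A through
    case _  => 0                                              // default result
  }
}
```

Shifting -8 right by one gives `0x7FFFFFFC` for `srl` but -4 for `sra`, which is the distinction the two separate codes exist to express.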
---
### 2. Pipeline Registers
Since the RISC-V pipeline consists of five stages, it requires four sets of pipeline registers. These registers are encapsulated in modules labeled `IF/ID`, `ID/EX`, `EX/MEM`, and `MEM/WB`, where the slash separates the two adjacent stages that the register bridges. The pipeline registers are highlighted in orange in the illustration below.
<center><img src="https://sirinsoftware.com/wp-content/uploads/2024/03/Scheme-2-1-1.svg" height="275"></img></center>
#### 2.1 IF_ID Pipeline
**Filepath**: `src/main/scala/Pipeline/Pipelines/IF_ID.scala`
```scala=
class IF_ID extends Module {
val io = IO(new Bundle {
val pc_in = Input (SInt(32.W)) // PC in
val pc4_in = Input (UInt(32.W)) // PC4 in
val SelectedPC = Input (SInt(32.W))
val SelectedInstr = Input (UInt(32.W))
val pc_out = Output (SInt(32.W)) // PC out
val pc4_out = Output (UInt(32.W)) // PC + 4 out
val SelectedPC_out = Output (SInt(32.W))
val SelectedInstr_out = Output (UInt(32.W))
})
val Pc_In = RegInit (0.S (32.W))
val Pc4_In = RegInit (0.U (32.W))
val S_pc = RegInit (0.S (32.W))
val S_instr = RegInit (0.U (32.W))
Pc_In := io.pc_in
Pc4_In := io.pc4_in
S_pc := io.SelectedPC
S_instr := io.SelectedInstr
io.pc_out := Pc_In
io.pc4_out := Pc4_In
io.SelectedPC_out := S_pc
io.SelectedInstr_out := S_instr
// io.pc_out := RegNext(io.pc_in)
// io.pc4_out := RegNext(io.pc4_in)
// io.SelectedPC_out := RegNext(io.SelectedPC)
// io.SelectedInstr_out := RegNext(io.SelectedInstr)
}
```
Although the illustration above shows only three register ports at `IF/ID`, the design also takes hazard detection into account (discussed later). In this context, the `SelectedPC` signal represents the program counter after hazard resolution. Consequently, the `IF/ID` pipeline register stores four values: `io.pc_in`, `io.pc4_in`, `io.SelectedPC`, and `io.SelectedInstr`. Each is held in a register created with `RegInit`, which gives it a reset value of 0. The commented-out `RegNext` lines at the end show a more compact alternative that omits the reset values.
#### 2.2 ID_EX Pipeline
**Filepath**: `src/main/scala/Pipeline/Pipelines/ID_EX.scala`
```scala=
class ID_EX extends Module {
val io = IO(new Bundle {
val rs1_in = Input(UInt(5.W))
val rs2_in = Input(UInt(5.W))
val rs1_data_in = Input(SInt(32.W))
val rs2_data_in = Input(SInt(32.W))
val imm = Input(SInt(32.W))
val rd_in = Input(UInt(5.W))
val func3_in = Input(UInt(3.W))
val func7_in = Input(Bool())
val ctrl_MemWr_in = Input(Bool())
val ctrl_Branch_in = Input(Bool())
val ctrl_MemRd_in = Input(Bool())
val ctrl_Reg_W_in = Input(Bool())
val ctrl_MemToReg_in = Input(Bool())
val ctrl_AluOp_in = Input(UInt(3.W))
val ctrl_OpA_in = Input(UInt(2.W))
val ctrl_OpB_in = Input(Bool())
val ctrl_nextpc_in = Input(UInt(2.W))
val IFID_pc4_in = Input(UInt(32.W))
val rs1_out = Output(UInt(5.W))
val rs2_out = Output(UInt(5.W))
val rs1_data_out = Output(SInt(32.W))
val rs2_data_out = Output(SInt(32.W))
val rd_out = Output(UInt(5.W))
val imm_out = Output(SInt(32.W))
val func3_out = Output(UInt(3.W))
val func7_out = Output(Bool())
val ctrl_MemWr_out = Output(Bool())
val ctrl_Branch_out = Output(Bool())
val ctrl_MemRd_out = Output(Bool())
val ctrl_Reg_W_out = Output(Bool())
val ctrl_MemToReg_out = Output(Bool())
val ctrl_AluOp_out = Output(UInt(3.W))
val ctrl_OpA_out = Output(UInt(2.W))
val ctrl_OpB_out = Output(Bool())
val ctrl_nextpc_out = Output(UInt(2.W))
val IFID_pc4_out = Output(UInt(32.W))
})
io.rs1_out := RegNext(io.rs1_in)
io.rs2_out := RegNext(io.rs2_in)
io.rs1_data_out := RegNext(io.rs1_data_in)
io.rs2_data_out := RegNext(io.rs2_data_in)
io.imm_out := RegNext(io.imm)
io.rd_out := RegNext(io.rd_in)
io.func3_out := RegNext(io.func3_in)
io.func7_out := RegNext(io.func7_in)
io.ctrl_MemWr_out := RegNext(io.ctrl_MemWr_in)
io.ctrl_Branch_out := RegNext(io.ctrl_Branch_in)
io.ctrl_MemRd_out := RegNext(io.ctrl_MemRd_in)
io.ctrl_Reg_W_out := RegNext(io.ctrl_Reg_W_in)
io.ctrl_MemToReg_out := RegNext(io.ctrl_MemToReg_in)
io.ctrl_AluOp_out := RegNext(io.ctrl_AluOp_in)
io.ctrl_OpA_out := RegNext(io.ctrl_OpA_in)
io.ctrl_OpB_out := RegNext(io.ctrl_OpB_in)
io.ctrl_nextpc_out := RegNext(io.ctrl_nextpc_in)
io.IFID_pc4_out := RegNext(io.IFID_pc4_in)
}
```
The code snippet implements the `ID/EX` pipeline register, which captures and stores several critical values for the subsequent execution stage. In particular, it holds the operand data (`rs1_data` and `rs2_data`), the incremented program counter (`IFID_pc4`), and the immediate value (`imm`).
Additionally, it preserves nine control signals generated during instruction decode, ensuring proper propagation through the multi-stage pipeline. Register addresses and function fields such as `rs1`, `rs2`, `rd`, `func3`, and `func7` are also stored to support data forwarding in the event of hazards.
Here `RegNext` is used instead of `RegInit`. `RegNext(x)` creates a register whose input is `x`, so each output is its input delayed by one clock cycle, written in a single line per signal. The trade-off is that, used this way, the registers have no reset value, whereas the `RegInit` registers in `IF/ID` start from 0 after reset.
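The one-cycle delay introduced by each pipeline register can be illustrated with a tiny cycle-level model. This is a plain-Scala sketch, not Chisel; `PipeRegModel`, `Regs`, and `tick` are hypothetical names, and the initial value of 0 is an assumption of the model, since the `RegNext` registers themselves have no reset value.

```scala
// Plain-Scala model (not Chisel) of one value travelling through three pipeline registers.
object PipeRegModel {
  // Contents of the ID/EX, EX/MEM, and MEM/WB registers for a single field (e.g. rd).
  final case class Regs(idEx: Int, exMem: Int, memWb: Int)

  // One clock edge: every register captures its input at the same time,
  // so each stage sees the value its predecessor held in the previous cycle.
  def tick(s: Regs, in: Int): Regs = Regs(idEx = in, exMem = s.idEx, memWb = s.exMem)
}
```

Feeding 7, 8, and 9 on three consecutive cycles leaves 9 in `ID/EX`, 8 in `EX/MEM`, and 7 in `MEM/WB`: a value presented to `ID/EX` reaches the `MEM/WB` output three clock edges later.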
#### 2.3 EX_MEM Pipeline
**Filepath**: `src/main/scala/Pipeline/Pipelines/EX_MEM.scala`
```scala=
class EX_MEM extends Module {
val io = IO(new Bundle {
val IDEX_MEMRD = Input(Bool())
val IDEX_MEMWR = Input(Bool())
val IDEX_MEMTOREG = Input(Bool())
val IDEX_REG_W = Input(Bool())
val IDEX_rs2 = Input(SInt(32.W))
val IDEX_rd = Input(UInt(5.W))
val alu_out = Input(SInt(32.W))
val EXMEM_memRd_out = Output(Bool())
val EXMEM_memWr_out = Output(Bool())
val EXMEM_memToReg_out = Output(Bool())
val EXMEM_reg_w_out = Output(Bool())
val EXMEM_rs2_out = Output(SInt(32.W))
val EXMEM_rd_out = Output(UInt(5.W))
val EXMEM_alu_out = Output(SInt(32.W))
})
io.EXMEM_memRd_out := RegNext(io.IDEX_MEMRD)
io.EXMEM_memWr_out := RegNext(io.IDEX_MEMWR)
io.EXMEM_memToReg_out := RegNext(io.IDEX_MEMTOREG)
io.EXMEM_reg_w_out := RegNext(io.IDEX_REG_W)
io.EXMEM_rs2_out := RegNext(io.IDEX_rs2)
io.EXMEM_rd_out := RegNext(io.IDEX_rd)
io.EXMEM_alu_out := RegNext(io.alu_out)
}
```
The code snippet above implements the `EX/MEM` pipeline register, which transfers data and control signals from the execution stage (`EX`) to the memory stage (`MEM`). The control signals `memRd`, `memWr`, and `memToReg` are preserved to drive the memory access and the later write-back selection. The ALU result (`alu_out`) and the `rs2` data (`EXMEM_rs2_out`, presumably the value to be written by a store) are stored alongside `reg_w_out` and `rd_out`, which are needed for hazard detection and data forwarding in later pipeline stages.
#### 2.4 MEM_WB Pipeline
**Filepath**: `src/main/scala/Pipeline/Pipelines/MEM_WB.scala`
```scala=
class MEM_WB extends Module {
val io = IO(new Bundle {
val EXMEM_MEMTOREG = Input(Bool())
val EXMEM_REG_W = Input(Bool())
val EXMEM_MEMRD = Input(Bool())
val EXMEM_rd = Input(UInt(5.W))
val in_dataMem_out = Input(SInt(32.W))
val in_alu_out = Input(SInt(32.W))
val MEMWB_memToReg_out = Output(Bool())
val MEMWB_reg_w_out = Output(Bool())
val MEMWB_memRd_out = Output(Bool())
val MEMWB_rd_out = Output(UInt(5.W))
val MEMWB_dataMem_out = Output(SInt(32.W))
val MEMWB_alu_out = Output(SInt(32.W))
})
io.MEMWB_memToReg_out := RegNext(io.EXMEM_MEMTOREG)
io.MEMWB_reg_w_out := RegNext(io.EXMEM_REG_W)
io.MEMWB_memRd_out := RegNext(io.EXMEM_MEMRD)
io.MEMWB_rd_out := RegNext(io.EXMEM_rd)
io.MEMWB_dataMem_out := RegNext(io.in_dataMem_out)
io.MEMWB_alu_out := RegNext(io.in_alu_out)
}
```
The code snippet above implements the `MEM/WB` pipeline registers, which transfer essential data from the memory stage (`MEM`) to the write-back stage (`WB`). Specifically, this module preserves control signals such as `memToReg`, `reg_w`, and `memRd`, as well as key data values including the destination register (`rd`), data from memory (`dataMem`), and the ALU output (`alu`).
---
### 3. Memory Units
In the RISC-V pipeline, two distinct memory units are employed: instruction memory and data memory. The repository implements these as separate modules, each tailored to its specific role in the processor's operation.
#### 3.1 Inst-Memory
**Filepath**: `src/main/scala/Pipeline/Memory/InstMem.scala`
```scala=
class InstMem(initFile: String) extends Module {
  val io = IO(new Bundle {
    val addr = Input(UInt(32.W)) // Address input to fetch instruction
    val data = Output(UInt(32.W)) // Output instruction
  })
  val imem = Mem(1024, UInt(32.W))
  loadMemoryFromFile(imem, initFile)
  io.data := imem(io.addr/4.U)
}
```
The code snippet implements the instruction memory module for the RISC-V pipeline. The module has one 32-bit address input (`io.addr`) and one 32-bit data output (`io.data`) that delivers the fetched instruction. The memory is instantiated with `Mem(1024, UInt(32.W))`, which creates an array of 1024 entries, each holding one 32-bit instruction. The `initFile` parameter names the file from which the initial contents are loaded, and `loadMemoryFromFile` populates the memory with these values. Finally, the module divides the byte address by 4 to obtain the word index, since each instruction occupies 4 bytes.
#### 3.2 Data-Memory
**Filepath**: `src/main/scala/Pipeline/Memory/DataMemory.scala`
```scala=
class DataMemory extends Module {
  val io = IO(new Bundle {
    val addr = Input(UInt(32.W)) // Address input
    val dataIn = Input(SInt(32.W)) // Data to be written
    val mem_read = Input(Bool()) // Memory read enable
    val mem_write = Input(Bool()) // Memory write enable
    val dataOut = Output(SInt(32.W)) // Data output
  })
  val Dmemory = Mem(1024, SInt(32.W))
  io.dataOut := 0.S
  when(io.mem_write) {
    Dmemory.write(io.addr, io.dataIn)
  }
  when(io.mem_read) {
    io.dataOut := Dmemory.read(io.addr)
  }
}
```
The code snippet implements the data memory unit for the RISC-V pipeline. This module has four input ports (`io.addr`, `io.dataIn`, `io.mem_read`, and `io.mem_write`) and one output port, `io.dataOut`. It instantiates a memory array with 1024 entries, where each entry is a 32-bit word. When the control signal `io.mem_write` is asserted, the module writes the data from `io.dataIn` into the memory at the address specified by `io.addr`. Conversely, if `io.mem_read` is asserted, the module reads the word stored at `io.addr` and outputs it via `io.dataOut`; otherwise the output is 0. Note that, unlike the instruction memory, this module uses `io.addr` directly as the word index without dividing by 4.
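The addressing of the two memory modules can be compared side by side in a small software model. This is a plain-Scala sketch with hypothetical names (`MemModel`, `InstMem`, `DataMem`); it does not model out-of-range addresses.

```scala
// Plain-Scala model (not Chisel) of the two memories' addressing.
object MemModel {
  final class InstMem(words: Array[Int]) {
    // Byte address -> word index, mirroring imem(io.addr / 4.U).
    def fetch(addr: Int): Int = words(addr >>> 2)
  }

  final class DataMem(size: Int = 1024) {
    private val mem = Array.fill(size)(0)

    // The address is used directly as the word index, as in the DataMemory module.
    def write(memWrite: Boolean, addr: Int, data: Int): Unit =
      if (memWrite) mem(addr) = data

    // The output is 0 whenever memRead is not asserted.
    def read(memRead: Boolean, addr: Int): Int = if (memRead) mem(addr) else 0
  }
}
```

In this model, byte address 4 selects the second instruction word, whereas data address 4 selects the fifth data word.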
---
### 4. Hazard Units
#### 4.1 Structural Hazard
**Filepath**: `src/main/scala/Pipeline/Hazard Units/StructuralHazard.scala`
```scala=
class StructuralHazard extends Module {
val io = IO(new Bundle {
val rs1 = Input(UInt(5.W))
val rs2 = Input(UInt(5.W))
val MEM_WB_regWr = Input(Bool())
val MEM_WB_Rd = Input(UInt(5.W))
val fwd_rs1 = Output(Bool())
val fwd_rs2 = Output(Bool())
})
// Determine if forwarding is needed for rs1
when(io.MEM_WB_regWr && io.MEM_WB_Rd === io.rs1) {
io.fwd_rs1 := true.B
}.otherwise {
io.fwd_rs1 := false.B
}
// Determine if forwarding is needed for rs2
when(io.MEM_WB_regWr && io.MEM_WB_Rd === io.rs2) {
io.fwd_rs2 := true.B
}.otherwise {
io.fwd_rs2 := false.B
}
}
```
The code snippet implements the structural hazard resolution mechanism for the RISC-V pipeline, presumably covering the case where the write-back stage writes a register in the same cycle in which another instruction reads it. The module has four input ports (`rs1`, `rs2`, `MEM_WB_regWr`, and `MEM_WB_Rd`) and two output ports (`fwd_rs1` and `fwd_rs2`). It checks whether the destination register in the `MEM/WB` stage (`MEM_WB_Rd`) matches either source register (`rs1` or `rs2`) while write-back is enabled (`MEM_WB_regWr` is asserted). If a match is detected, the corresponding forwarding signal (`fwd_rs1` or `fwd_rs2`) is set to `true`; otherwise, it is `false`.
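One way to picture how such a flag might be consumed is as the select input of a multiplexer. The sketch below is plain Scala with hypothetical names (`BypassModel`, `needsForward`, `operand`); the multiplexer itself is an assumption, since the top-level wiring is not part of the snippet above.

```scala
// Plain-Scala model (not Chisel) of the match check and a hypothetical bypass mux.
object BypassModel {
  // Same condition as the module; like the module, it has no special case for register 0.
  def needsForward(rs: Int, memWbRegWr: Boolean, memWbRd: Int): Boolean =
    memWbRegWr && memWbRd == rs

  // Hypothetical use of the flag: prefer the write-back data over the
  // register-file value that has not been updated yet.
  def operand(rs: Int, regfileValue: Int, memWbRegWr: Boolean, memWbRd: Int, wbData: Int): Int =
    if (needsForward(rs, memWbRegWr, memWbRd)) wbData else regfileValue
}
```

If register 3 is being written with 99 while it is read, the bypass returns 99 instead of the older register-file value.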
#### 4.2 Hazard Detection
**Filepath**: `src/main/scala/Pipeline/Hazard Units/HazardDetection.scala`
```scala=
class HazardDetection extends Module {
val io = IO(new Bundle {
val IF_ID_inst = Input(UInt(32.W))
val ID_EX_memRead = Input(Bool())
val ID_EX_rd = Input(UInt(5.W))
val pc_in = Input(SInt(32.W))
val current_pc = Input(SInt(32.W))
val inst_forward = Output(Bool())
val pc_forward = Output(Bool())
val ctrl_forward = Output(Bool())
val inst_out = Output(UInt(32.W))
val pc_out = Output(SInt(32.W))
val current_pc_out = Output(SInt(32.W))
})
val Rs1 = io.IF_ID_inst(19, 15)
val Rs2 = io.IF_ID_inst(24, 20)
when(io.ID_EX_memRead === 1.B && ((io.ID_EX_rd === Rs1) || (io.ID_EX_rd === Rs2))) {
io.inst_forward := true.B
io.pc_forward := true.B
io.ctrl_forward := true.B
}.otherwise {
io.inst_forward := false.B
io.pc_forward := false.B
io.ctrl_forward := false.B
}
io.inst_out := io.IF_ID_inst
io.pc_out := io.pc_in
io.current_pc_out := io.current_pc
}
```
The code snippet implements the hazard detection mechanism, which monitors the pipeline for load-use data hazards. When the instruction in the `ID/EX` stage is a load (`io.ID_EX_memRead` is `true`) and its destination register (`io.ID_EX_rd`) matches either source register of the instruction in `IF/ID` (`Rs1` or `Rs2`, extracted from `io.IF_ID_inst`), the module asserts `inst_forward`, `pc_forward`, and `ctrl_forward`. Otherwise, all three remain `false`. A load-use hazard cannot be resolved by forwarding alone, because the loaded value is not available until the memory stage. Despite their names, these signals therefore most likely serve as stall controls: the surrounding pipeline can use them to hold the current instruction and PC and to suppress the control signals for one cycle. Additionally, the module passes `io.IF_ID_inst`, `io.pc_in`, and `io.current_pc` through to `io.inst_out`, `io.pc_out`, and `io.current_pc_out`, respectively, so that the held instruction and PC values are available to the rest of the design.
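The detection condition can be exercised on a real instruction encoding. The following plain-Scala sketch extracts the source register fields the same way the module does; `LoadUseModel` and `loadUseHazard` are hypothetical names.

```scala
// Plain-Scala model (not Chisel) of the load-use hazard check.
object LoadUseModel {
  def loadUseHazard(ifIdInst: Int, idExMemRead: Boolean, idExRd: Int): Boolean = {
    val rs1 = (ifIdInst >>> 15) & 0x1F // instr[19:15]
    val rs2 = (ifIdInst >>> 20) & 0x1F // instr[24:20]
    idExMemRead && (idExRd == rs1 || idExRd == rs2)
  }
}
```

For example, `0x002081B3` encodes `add x3, x1, x2`. If the instruction ahead of it is a load into `x1`, the check returns `true`; if that instruction is not a load, or loads into an unrelated register, it returns `false`.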
#### 4.3 Forwarding Unit
**Filepath**: `src/main/scala/Pipeline/Hazard Units/Forwarding.scala`
```scala=
class Forwarding extends Module {
val io = IO(new Bundle {
val IDEX_rs1 = Input(UInt(5.W))
val IDEX_rs2 = Input(UInt(5.W))
val EXMEM_rd = Input(UInt(5.W))
val EXMEM_regWr = Input(UInt(1.W))
val MEMWB_rd = Input(UInt(5.W))
val MEMWB_regWr = Input(UInt(1.W))
val forward_a = Output(UInt(2.W))
val forward_b = Output(UInt(2.W))
})
io.forward_a := "b00".U
io.forward_b := "b00".U
// EX HAZARD
when(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U &&
(io.EXMEM_rd === io.IDEX_rs1.asUInt) && (io.EXMEM_rd === io.IDEX_rs2)) {
io.forward_a := "b10".U
io.forward_b := "b10".U
}.elsewhen(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U &&
(io.EXMEM_rd === io.IDEX_rs2)) {
io.forward_b := "b10".U
}.elsewhen(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U &&
(io.EXMEM_rd === io.IDEX_rs1)) {
io.forward_a := "b10".U
}
// MEM HAZARD
when((io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) && (io.MEMWB_rd === io.IDEX_rs1) && (io.MEMWB_rd === io.IDEX_rs2) &&
~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs1) && (io.EXMEM_rd === io.IDEX_rs2))) {
io.forward_a := "b01".U
io.forward_b := "b01".U
}.elsewhen((io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) && (io.MEMWB_rd === io.IDEX_rs2) &&
~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs2))){
io.forward_b := "b01".U
}.elsewhen((io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) && (io.MEMWB_rd === io.IDEX_rs1) &&
~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs1))){
io.forward_a := "b01".U
}
}
```
This module implements the forwarding unit, which dynamically selects and routes data from later pipeline stages to resolve data hazards in the RISC-V pipeline. The unit examines the source registers from the `ID/EX` stage (i.e., `IDEX_rs1` and `IDEX_rs2`) and compares them with the destination registers from both the `EX/MEM` and `MEM/WB` stages. Depending on which stage provides the most recent data, the module assigns a corresponding two-bit value to the forwarding outputs (`forward_a` and `forward_b`). For example, when the `EX/MEM` stage is writing to a non-zero register that matches a source operand, the corresponding forward signal is set to binary `10`, indicating that data should be forwarded directly from the `EX/MEM` stage.
In the `MEM` hazard section, the module addresses cases where the `MEM/WB` stage holds the data needed by the current instruction. Here, the module checks whether the `MEM/WB` stage is writing to a non-zero register that matches the source registers of the `ID/EX` stage. However, this forwarding is only enabled if the `EX/MEM` stage is not already forwarding for that register (thereby prioritizing `EX` hazards). If the conditions are met, the forward signal is set to binary `01`, signaling that the required data should be forwarded from the `MEM/WB` stage. This mechanism ensures that even if an instruction's result has not been written back yet, the correct value is available for subsequent computations, thereby avoiding pipeline stalls.
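The priority rule described above (the `EX/MEM` result wins over the `MEM/WB` result, and `x0` is never forwarded) can be sketched per operand as a plain Scala reference model. This is not Chisel hardware and not code from the repository; the object name `ForwardingModel` and the method `select` are hypothetical, while the parameter names follow the module's ports.

```scala
// Software reference model of the forwarding decision (plain Scala, not Chisel).
// ForwardingModel and select are hypothetical names used only for illustration.
object ForwardingModel {
  /** Returns the select code for one ALU operand:
    * 2 = forward from EX/MEM, 1 = forward from MEM/WB, 0 = use the ID/EX value. */
  def select(rs: Int,
             exMemRegWr: Boolean, exMemRd: Int,
             memWbRegWr: Boolean, memWbRd: Int): Int = {
    val exHazard  = exMemRegWr && exMemRd != 0 && exMemRd == rs
    val memHazard = memWbRegWr && memWbRd != 0 && memWbRd == rs
    if (exHazard) 2        // the most recent producer takes priority
    else if (memHazard) 1  // only used when EX/MEM does not match
    else 0
  }
}
```

Calling `select` once with `IDEX_rs1` and once with `IDEX_rs2` gives the values that `forward_a` and `forward_b` are intended to carry, which makes it convenient as a golden model when checking the logged forwarding signals.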
#### 4.4 Branch Forwarding
**Filepath**: `src/main/scala/Pipeline/Hazard Units/BranchForward.scala`
```scala=
class BranchForward extends Module {
val io = IO(new Bundle {
val ID_EX_RD = Input(UInt(5.W))
val EX_MEM_RD = Input(UInt(5.W))
val MEM_WB_RD = Input(UInt(5.W))
val ID_EX_memRd = Input(UInt(1.W))
val EX_MEM_memRd = Input(UInt(1.W))
val MEM_WB_memRd = Input(UInt(1.W))
val rs1 = Input(UInt(5.W))
val rs2 = Input(UInt(5.W))
val ctrl_branch = Input(UInt(1.W))
val forward_rs1 = Output(UInt(4.W))
val forward_rs2 = Output(UInt(4.W))
})
io.forward_rs1 := "b0000".U
io.forward_rs2 := "b0000".U
// Branch forwarding logic
when(io.ctrl_branch === 1.U) {
// ALU Hazard
when(io.ID_EX_RD =/= 0.U && io.ID_EX_memRd =/= 1.U) {
when(io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2) {
io.forward_rs1 := "b0001".U
io.forward_rs2 := "b0001".U
}.elsewhen(io.ID_EX_RD === io.rs1) {
io.forward_rs1 := "b0001".U
}.elsewhen(io.ID_EX_RD === io.rs2) {
io.forward_rs2 := "b0001".U
}
}
// EX/MEM Hazard
when(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd =/= 1.U) {
when(io.EX_MEM_RD === io.rs1 && io.EX_MEM_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2)) {
io.forward_rs1 := "b0010".U
io.forward_rs2 := "b0010".U
}.elsewhen(io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) {
io.forward_rs1 := "b0010".U
}.elsewhen(io.EX_MEM_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs2)) {
io.forward_rs2 := "b0010".U
}
}
// MEM/WB Hazard
when(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd =/= 1.U) {
when(io.MEM_WB_RD === io.rs1 && io.MEM_WB_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1 && io.EX_MEM_RD === io.rs2)) {
io.forward_rs1 := "b0011".U
io.forward_rs2 := "b0011".U
}.elsewhen(io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) {
io.forward_rs1 := "b0011".U
}.elsewhen(io.MEM_WB_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs2) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs2)) {
io.forward_rs2 := "b0011".U
}
}
// Jalr forwarding logic
}.elsewhen(io.ctrl_branch === 0.U) {
when(io.ID_EX_RD =/= 0.U && io.ID_EX_memRd =/= 1.U && io.ID_EX_RD === io.rs1) {
io.forward_rs1 := "b0110".U
}.elsewhen(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd =/= 1.U && io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) {
io.forward_rs1 := "b0111".U
}.elsewhen(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd === 1.U && io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) {
io.forward_rs1 := "b1001".U
}.elsewhen(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd =/= 1.U && io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) {
io.forward_rs1 := "b1000".U
}.elsewhen(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd === 1.U && io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) {
io.forward_rs1 := "b1010".U
}
}
}
```
The `BranchForward` module resolves data hazards for branch and `jalr` instructions. It determines whether the source operands used for branch evaluation must be forwarded from later pipeline stages to avoid stalls. Its inputs are the destination register identifiers and memory read flags from the `ID/EX`, `EX/MEM`, and `MEM/WB` pipeline stages, the source register identifiers (`rs1` and `rs2`) of the branch instruction, and a control signal (`ctrl_branch`). The outputs, `forward_rs1` and `forward_rs2`, are four-bit signals indicating the source of the forwarded data. When `ctrl_branch` is `1`, the branch forwarding logic checks the `ID/EX`, `EX/MEM`, and `MEM/WB` stages in turn and forwards the most recent valid data; a match in an earlier stage suppresses a match in a later one, and results of load instructions (`memRd` set) are not forwarded in this path.
For `jalr` instructions, indicated when `ctrl_branch` is `0`, the module only evaluates the source register `rs1`. It checks the `ID/EX`, `EX/MEM`, and `MEM/WB` stages in the same order of priority. Different codes are assigned to `forward_rs1` depending on whether the data comes from a memory read (load) or a non-memory-read operation. This layered approach ensures that the most recent operand is forwarded for branch or `jalr` evaluation, reducing pipeline stalls.
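The `jalr` branch of the module can be summarized by a small plain-Scala model that returns the same four-bit codes as the listing above. This is a sketch for cross-checking the priority order, not part of the repository; `JalrForwardModel`, `Producer`, and `forwardRs1` are hypothetical names.

```scala
// Plain Scala model of the jalr forwarding codes (not Chisel hardware).
// JalrForwardModel, Producer and forwardRs1 are hypothetical names.
object JalrForwardModel {
  // An in-flight instruction: its destination register and whether it is a load.
  final case class Producer(rd: Int, memRd: Boolean)

  def forwardRs1(rs1: Int, idEx: Producer, exMem: Producer, memWb: Producer): String = {
    val idExMatch  = idEx.rd  != 0 && idEx.rd  == rs1
    val exMemMatch = exMem.rd != 0 && exMem.rd == rs1
    val memWbMatch = memWb.rd != 0 && memWb.rd == rs1
    if (idExMatch && !idEx.memRd) "0110"                                   // ID/EX, ALU result
    else if (exMemMatch && !exMem.memRd && !idExMatch) "0111"              // EX/MEM, ALU result
    else if (exMemMatch && exMem.memRd && !idExMatch) "1001"               // EX/MEM, load
    else if (memWbMatch && !memWb.memRd && !idExMatch && !exMemMatch) "1000" // MEM/WB, ALU result
    else if (memWbMatch && memWb.memRd && !idExMatch && !exMemMatch) "1010"  // MEM/WB, load
    else "0000"                                                            // no forwarding
  }
}
```

Note that a matching load in `ID/EX` yields `"0000"`: the listing has no code for that case, so it presumably has to be handled by a stall elsewhere in the design.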
### 5. Pipeline
#### 5.1 Selecting the PC Value with MuxLookup
**Filepath**: `src/main/scala/Pipeline/Main.scala`
```scala=
val PC_F = MuxLookup(HazardDetect.io.pc_forward, 0.S, Array(
(0.U) -> PC4.io.out.asSInt,
(1.U) -> HazardDetect.io.pc_out))
PC.io.in := PC_F // PC_in input
PC4.io.pc := PC.io.out.asUInt // PC4_in input <- PC_out
InstMemory.io.addr := PC.io.out.asUInt // Address to fetch instruction
val PC_for = MuxLookup (HazardDetect.io.inst_forward, 0.S, Array (
(0.U) -> PC.io.out,
(1.U) -> HazardDetect.io.current_pc_out))
val Instruction_F = MuxLookup (HazardDetect.io.inst_forward, 0.U, Array (
(0.U) -> InstMemory.io.data,
(1.U) -> HazardDetect.io.inst_out))
```
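This snippet uses `MuxLookup` to select the next Program Counter (PC) value and the PC/instruction pair passed on from the fetch stage. When `HazardDetect.io.pc_forward` is `0`, the PC takes the output of the `PC4` unit; when it is `1`, the PC takes `HazardDetect.io.pc_out` instead. In the same way, `HazardDetect.io.inst_forward` chooses between the current PC and the freshly fetched instruction (`0`) and the `current_pc_out` and `inst_out` values supplied by the hazard detection unit (`1`). Any other selector value falls back to the default of `0`.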
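The selection behaviour can be mimicked in plain Scala to make the default case explicit. The following is a minimal sketch, assuming distinct keys in the mapping; `PcSelectModel`, `muxLookup`, and `nextPc` are hypothetical names and only imitate what the hardware `MuxLookup` expresses.

```scala
// Plain Scala imitation of a key/value lookup with a default (not the Chisel API).
// PcSelectModel, muxLookup and nextPc are hypothetical names.
object PcSelectModel {
  // Returns the value whose key equals the selector, or the default if none matches.
  def muxLookup[T](key: Int, default: T, mapping: Seq[(Int, T)]): T =
    mapping.collectFirst { case (k, v) if k == key => v }.getOrElse(default)

  // Next PC: selector 0 picks PC + 4, selector 1 picks the hazard unit's PC.
  def nextPc(pcForward: Int, pcPlus4: Int, hazardPc: Int): Int =
    muxLookup(pcForward, 0, Seq(0 -> pcPlus4, 1 -> hazardPc))
}
```

For example, `PcSelectModel.nextPc(1, 8, 100)` picks the hazard unit's value `100`, while an unexpected selector such as `2` yields the default `0`.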
#### 5.2 Register File Inputs (rs1 and rs2)
```scala=
// Decode connections (Control unit RegFile)
control_module.io.opcode := IF_ID_.io.SelectedInstr_out(6, 0) // opcode, used to determine the instruction type
// Registerfile inputs
RegFile.io.rs1 := Mux(
control_module.io.opcode === 51.U || // R-type
control_module.io.opcode === 19.U || // I-type
control_module.io.opcode === 35.U || // S-type
control_module.io.opcode === 3.U || // I-type (load instructions)
control_module.io.opcode === 99.U || // SB-type (branch)
control_module.io.opcode === 103.U, // JALR instruction
IF_ID_.io.SelectedInstr_out(19, 15), 0.U )
RegFile.io.rs2 := Mux(
control_module.io.opcode === 51.U || // R-type
control_module.io.opcode === 35.U || // S-type
control_module.io.opcode === 99.U, // SB-type (branch)
IF_ID_.io.SelectedInstr_out(24, 20), 0.U)
RegFile.io.reg_write := control_module.io.reg_write
```
This code decodes the fetched instruction by extracting its opcode (bits 6 to 0) to identify the instruction type. Based on the opcode, it determines the `rs1` and `rs2` register fields, which specify the source registers to read. The `rs1` field (bits 19 to 15) is used for R-type, I-type (including loads), S-type, SB-type, and JALR instructions, while the `rs2` field (bits 24 to 20) is used for R-type, S-type, and SB-type instructions. For every other opcode the field is forced to `0`, so the register file reads `x0`. Additionally, the `reg_write` signal from the control unit enables or disables write-back to the register file (`RegFile`), depending on whether the current instruction writes a register.
<center>
| Instruction type | Opcode (binary) | Opcode (decimal) |
| -------- | -------- | -------- |
| R-type | 011 0011 | 51 |
| I-type | 001 0011 | 19 |
| S-type | 010 0011 | 35 |
| I-type (load instructions) | 000 0011 | 3 |
| SB-type (branch) | 110 0011 | 99 |
| JALR instruction | 110 0111 | 103 |

</center>
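The field extraction and the opcode filter from the table can be reproduced with a few bit operations. Below is a plain-Scala sketch (not the Chisel code of the repository); `DecodeModel` and `sources` are hypothetical names, and the opcode sets are copied from the `Mux` conditions above.

```scala
// Plain Scala model of the rs1/rs2 selection in the decode stage.
// DecodeModel and sources are hypothetical names.
object DecodeModel {
  // Extracts inst[hi:lo] as an unsigned value.
  private def bits(inst: Long, hi: Int, lo: Int): Int =
    ((inst >>> lo) & ((1L << (hi - lo + 1)) - 1)).toInt

  private val usesRs1 = Set(51, 19, 35, 3, 99, 103) // R, I, S, load, branch, JALR
  private val usesRs2 = Set(51, 35, 99)             // R, S, branch

  /** Returns (opcode, rs1, rs2) as presented to the register file. */
  def sources(inst: Long): (Int, Int, Int) = {
    val opcode = bits(inst, 6, 0)
    val rs1 = if (usesRs1(opcode)) bits(inst, 19, 15) else 0
    val rs2 = if (usesRs2(opcode)) bits(inst, 24, 20) else 0
    (opcode, rs1, rs2)
  }
}
```

For the instruction word `0x00a002b3` (`add t0, x0, a0`) this yields opcode 51, `rs1 = 0`, and `rs2 = 10`; for `0x333333b7` (a `lui`, opcode 55) both source fields are forced to 0.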
#### 5.3 Data Forwarding for rs1 and rs2 to Resolve Pipeline Hazards
```scala=
// rs1_data
when (Structural.io.fwd_rs1 === 0.U) {
S_rs1DataIn := RegFile.io.rdata1
}.elsewhen (Structural.io.fwd_rs1 === 1.U) {
S_rs1DataIn := RegFile.io.w_data
}.otherwise {
S_rs1DataIn := 0.S
}
// rs2_data
when (Structural.io.fwd_rs2 === 0.U) {
S_rs2DataIn := RegFile.io.rdata2
}.elsewhen (Structural.io.fwd_rs2 === 1.U) {
S_rs2DataIn := RegFile.io.w_data
}.otherwise {
S_rs2DataIn := 0.S
}
```
This code implements data forwarding for the `rs1` and `rs2` source registers in the decode stage to handle potential data hazards in the pipeline.
`S_rs1DataIn` and `S_rs2DataIn` are wires that hold the selected values for `rs1` and `rs2` after the forwarding decision.
* **Forwarding logic:**
    * If the selector (`fwd_rs1` or `fwd_rs2`) is `0`, no hazard exists and the data is read directly from the register file.
    * If the selector is `1`, a hazard is detected and the write-back data (`RegFile.io.w_data`) is forwarded instead of the register file output.
* **Default behavior:** any other selector value sets the operand to `0.S`.

This ensures that the pipeline uses the most up-to-date data for execution, maintaining correctness and avoiding unnecessary stalls.
#### 5.4 Stalling Logic for Control Hazard Resolution in Pipeline
```scala=
// Stall when forward
when(HazardDetect.io.ctrl_forward === "b1".U) {
ID_EX_.io.ctrl_MemWr_in := 0.U
ID_EX_.io.ctrl_MemRd_in := 0.U
ID_EX_.io.ctrl_MemToReg_in := 0.U
ID_EX_.io.ctrl_Reg_W_in := 0.U
ID_EX_.io.ctrl_AluOp_in := 0.U
ID_EX_.io.ctrl_OpB_in := 0.U
ID_EX_.io.ctrl_Branch_in := 0.U
ID_EX_.io.ctrl_nextpc_in := 0.U
}.otherwise {
ID_EX_.io.ctrl_MemWr_in := control_module.io.mem_write
ID_EX_.io.ctrl_MemRd_in := control_module.io.mem_read
ID_EX_.io.ctrl_MemToReg_in := control_module.io.men_to_reg
ID_EX_.io.ctrl_Reg_W_in := control_module.io.reg_write
ID_EX_.io.ctrl_AluOp_in := control_module.io.alu_operation
ID_EX_.io.ctrl_OpB_in := control_module.io.operand_B
ID_EX_.io.ctrl_Branch_in := control_module.io.branch
ID_EX_.io.ctrl_nextpc_in := control_module.io.next_pc_sel
}
```
This code snippet implements **stalling logic** to handle control hazards in a pipelined processor. When `HazardDetect.io.ctrl_forward` is asserted, every control signal entering the `ID_EX` pipeline register is set to `0`, which inserts a bubble: the instruction moving into the execute stage performs no memory access and no register write. Otherwise, the control signals from the control unit are passed through unchanged.
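The effect of zeroing the control signals can be sketched in plain Scala as replacing the decoded control bundle with an all-zero "bubble". The names `StallModel`, `Ctrl`, `bubble`, and `toIdEx` are hypothetical, and the fields only loosely follow the `ctrl_*_in` ports of the listing.

```scala
// Plain Scala model of bubble insertion into ID/EX (not Chisel hardware).
// StallModel, Ctrl, bubble and toIdEx are hypothetical names.
object StallModel {
  final case class Ctrl(memWr: Int, memRd: Int, memToReg: Int, regW: Int,
                        aluOp: Int, opB: Int, branch: Int, nextPc: Int)

  // All-zero control word: no memory access, no register write, no branch.
  val bubble: Ctrl = Ctrl(0, 0, 0, 0, 0, 0, 0, 0)

  // Selects what is latched into ID/EX for the next cycle.
  def toIdEx(ctrlForward: Boolean, decoded: Ctrl): Ctrl =
    if (ctrlForward) bubble else decoded
}
```

Because `regW` and `memWr` are both 0 in the bubble, the stalled slot cannot change architectural state, which is what makes zeroing the control signals a safe way to stall.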
---
## B. Rectifications
### 1. Exposing Registers
In addition to constructing a pipelined RISC-V CPU using Chisel, it is essential to verify that the design behaves correctly. We therefore first validate our RISC-V test programs on a third-party processor simulator, Ripes, record the expected register values, and then compare them with the results produced by our CPU.
However, since the register values are confined within the `RegisterFile` module, we need to "expose" them through the IO Bundle. The following code snippet shows the modified IO of this module, which exposes all argument registers, temporary registers, and saved registers.
```scala=
// RegisterFile (RegisterFile.scala)
val io = IO(new Bundle {
val rs1 = Input(UInt(5.W))
val rs2 = Input(UInt(5.W))
val reg_write = Input(Bool())
val w_reg = Input(UInt(5.W))
val w_data = Input(SInt(32.W))
val rdata1 = Output(SInt(32.W))
val rdata2 = Output(SInt(32.W))
// >> exposed argument registers
val a0 = Output(SInt(32.W))
val a1 = Output(SInt(32.W))
val a2 = Output(SInt(32.W))
val a3 = Output(SInt(32.W))
val a4 = Output(SInt(32.W))
val a5 = Output(SInt(32.W))
val a6 = Output(SInt(32.W))
val a7 = Output(SInt(32.W))
// << exposed argument registers
// >> exposed temporary registers
val t0 = Output(SInt(32.W))
val t1 = Output(SInt(32.W))
val t2 = Output(SInt(32.W))
val t3 = Output(SInt(32.W))
val t4 = Output(SInt(32.W))
val t5 = Output(SInt(32.W))
val t6 = Output(SInt(32.W))
// << exposed temporary registers
// >> exposed save registers
val s0 = Output(SInt(32.W))
val s1 = Output(SInt(32.W))
val s2 = Output(SInt(32.W))
val s3 = Output(SInt(32.W))
val s4 = Output(SInt(32.W))
val s5 = Output(SInt(32.W))
val s6 = Output(SInt(32.W))
val s7 = Output(SInt(32.W))
val s8 = Output(SInt(32.W))
val s9 = Output(SInt(32.W))
val s10 = Output(SInt(32.W))
val s11 = Output(SInt(32.W))
// << exposed save registers
})
```
After exposing these IO ports, we need to wire the register values to the corresponding output ports. The following code snippet implements the wiring logic within the module.
```scala=
// RegisterFile (RegisterFile.scala)
// >> wiring argument registers to corresponding output ports
io.a0 := Mux(io.reg_write && io.w_reg === 10.U, io.w_data, regfile(10))
io.a1 := Mux(io.reg_write && io.w_reg === 11.U, io.w_data, regfile(11))
io.a2 := Mux(io.reg_write && io.w_reg === 12.U, io.w_data, regfile(12))
io.a3 := Mux(io.reg_write && io.w_reg === 13.U, io.w_data, regfile(13))
io.a4 := Mux(io.reg_write && io.w_reg === 14.U, io.w_data, regfile(14))
io.a5 := Mux(io.reg_write && io.w_reg === 15.U, io.w_data, regfile(15))
io.a6 := Mux(io.reg_write && io.w_reg === 16.U, io.w_data, regfile(16))
io.a7 := Mux(io.reg_write && io.w_reg === 17.U, io.w_data, regfile(17))
// << wiring argument registers to corresponding output ports
// >> wiring temporary registers to corresponding output ports
io.t0 := Mux(io.reg_write && io.w_reg === 5.U, io.w_data, regfile(5))
io.t1 := Mux(io.reg_write && io.w_reg === 6.U, io.w_data, regfile(6))
io.t2 := Mux(io.reg_write && io.w_reg === 7.U, io.w_data, regfile(7))
io.t3 := Mux(io.reg_write && io.w_reg === 28.U, io.w_data, regfile(28))
io.t4 := Mux(io.reg_write && io.w_reg === 29.U, io.w_data, regfile(29))
io.t5 := Mux(io.reg_write && io.w_reg === 30.U, io.w_data, regfile(30))
io.t6 := Mux(io.reg_write && io.w_reg === 31.U, io.w_data, regfile(31))
// << wiring temporary registers to corresponding output ports
// >> wiring save registers to corresponding output ports
io.s0 := Mux(io.reg_write && io.w_reg === 8.U, io.w_data, regfile(8))
io.s1 := Mux(io.reg_write && io.w_reg === 9.U, io.w_data, regfile(9))
io.s2 := Mux(io.reg_write && io.w_reg === 18.U, io.w_data, regfile(18))
io.s3 := Mux(io.reg_write && io.w_reg === 19.U, io.w_data, regfile(19))
io.s4 := Mux(io.reg_write && io.w_reg === 20.U, io.w_data, regfile(20))
io.s5 := Mux(io.reg_write && io.w_reg === 21.U, io.w_data, regfile(21))
io.s6 := Mux(io.reg_write && io.w_reg === 22.U, io.w_data, regfile(22))
io.s7 := Mux(io.reg_write && io.w_reg === 23.U, io.w_data, regfile(23))
io.s8 := Mux(io.reg_write && io.w_reg === 24.U, io.w_data, regfile(24))
io.s9 := Mux(io.reg_write && io.w_reg === 25.U, io.w_data, regfile(25))
io.s10 := Mux(io.reg_write && io.w_reg === 26.U, io.w_data, regfile(26))
io.s11 := Mux(io.reg_write && io.w_reg === 27.U, io.w_data, regfile(27))
// << wiring save registers to corresponding output ports
```
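The register indices used in the wiring above follow the standard RISC-V ABI names (`a0`-`a7` are `x10`-`x17`, `t0`-`t2` are `x5`-`x7`, `t3`-`t6` are `x28`-`x31`, `s0`-`s1` are `x8`-`x9`, and `s2`-`s11` are `x18`-`x27`). A small lookup table in plain Scala, with the hypothetical name `AbiRegisters`, can help double-check the constants:

```scala
// ABI name -> x-register index for the registers exposed above.
// AbiRegisters is a hypothetical helper, not part of the repository.
object AbiRegisters {
  val index: Map[String, Int] =
    ((0 to 7).map(i => s"a$i" -> (10 + i)) ++   // a0-a7  -> x10-x17
     (0 to 2).map(i => s"t$i" -> (5 + i)) ++    // t0-t2  -> x5-x7
     (3 to 6).map(i => s"t$i" -> (25 + i)) ++   // t3-t6  -> x28-x31
     (0 to 1).map(i => s"s$i" -> (8 + i)) ++    // s0-s1  -> x8-x9
     (2 to 11).map(i => s"s$i" -> (16 + i))).toMap // s2-s11 -> x18-x27
}
```

For instance, `AbiRegisters.index("t3")` gives 28, matching the `regfile(28)` used for `io.t3`.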
Similarly, we expose the register values outside the `PIPELINE` module using the subsequent code snippets.
```scala=
// PIPELINE (Main.scala)
val io = IO(new Bundle {
val out = Output(SInt(32.W))
val out_pc = Output(SInt(32.W))
// >> exposed argument registers
val out_a0 = Output(SInt(32.W))
val out_a1 = Output(SInt(32.W))
val out_a2 = Output(SInt(32.W))
val out_a3 = Output(SInt(32.W))
val out_a4 = Output(SInt(32.W))
val out_a5 = Output(SInt(32.W))
val out_a6 = Output(SInt(32.W))
val out_a7 = Output(SInt(32.W))
// << exposed argument registers
// >> exposed temporary registers
val out_t0 = Output(SInt(32.W))
val out_t1 = Output(SInt(32.W))
val out_t2 = Output(SInt(32.W))
val out_t3 = Output(SInt(32.W))
val out_t4 = Output(SInt(32.W))
val out_t5 = Output(SInt(32.W))
val out_t6 = Output(SInt(32.W))
// << exposed temporary registers
// >> exposed save registers
val out_s0 = Output(SInt(32.W))
val out_s1 = Output(SInt(32.W))
val out_s2 = Output(SInt(32.W))
val out_s3 = Output(SInt(32.W))
val out_s4 = Output(SInt(32.W))
val out_s5 = Output(SInt(32.W))
val out_s6 = Output(SInt(32.W))
val out_s7 = Output(SInt(32.W))
val out_s8 = Output(SInt(32.W))
val out_s9 = Output(SInt(32.W))
val out_s10 = Output(SInt(32.W))
val out_s11 = Output(SInt(32.W))
// << exposed save registers
})
```
```scala=
// PIPELINE (Main.scala)
// >> wiring argument registers to corresponding output ports
io.out_a0 := RegFile.io.a0
io.out_a1 := RegFile.io.a1
io.out_a2 := RegFile.io.a2
io.out_a3 := RegFile.io.a3
io.out_a4 := RegFile.io.a4
io.out_a5 := RegFile.io.a5
io.out_a6 := RegFile.io.a6
io.out_a7 := RegFile.io.a7
// << wiring argument registers to corresponding output ports
// >> wiring temporary registers to corresponding output ports
io.out_t0 := RegFile.io.t0
io.out_t1 := RegFile.io.t1
io.out_t2 := RegFile.io.t2
io.out_t3 := RegFile.io.t3
io.out_t4 := RegFile.io.t4
io.out_t5 := RegFile.io.t5
io.out_t6 := RegFile.io.t6
// << wiring temporary registers to corresponding output ports
// >> wiring save registers to corresponding output ports
io.out_s0 := RegFile.io.s0
io.out_s1 := RegFile.io.s1
io.out_s2 := RegFile.io.s2
io.out_s3 := RegFile.io.s3
io.out_s4 := RegFile.io.s4
io.out_s5 := RegFile.io.s5
io.out_s6 := RegFile.io.s6
io.out_s7 := RegFile.io.s7
io.out_s8 := RegFile.io.s8
io.out_s9 := RegFile.io.s9
io.out_s10 := RegFile.io.s10
io.out_s11 := RegFile.io.s11
// << wiring save registers to corresponding output ports
```
Finally, in our `MainTest.scala`, we add test cases following the structure shown in the code snippet below:
```scala=
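// MainTest.scala
// Imports assumed for the chiseltest / ScalaTest versions that still provide FreeSpec
import chisel3._
import chiseltest._
import org.scalatest._
class TOPTest extends FreeSpec with ChiselScalatestTester{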
"test a0" in {
// test program
test(new PIPELINE("/home/mi2s/FProject/compilation/testA0.txt")){
x =>
// the number of clock cycles to finish the program
x.clock.step(6)
// the expected value of a0 register
x.io.out_a0.expect(10.S)
}
}
}
```
---
### 2. Logging States
The code provided in the repository initially could not execute our test cases correctly. We therefore traced the execution and monitored the register state after each clock cycle. Since neither Chisel nor the repository provides a debugging view like that of the Ripes simulator, which displays register values, we implemented logging with `printf` statements. The following three code snippets log the temporary, argument, and saved registers; the bracketed `hwCounter` value is the clock cycle referred to in the following sections.
```scala=
// PIPELINE (Main.scala)
// t0-t6 : temporary registers
printf(p"[ ${hwCounter} ] t0: ${Hexadecimal(RegFile.io.t0)}, t1: ${Hexadecimal(RegFile.io.t1)}, t2: ${Hexadecimal(RegFile.io.t2)}, t3: ${Hexadecimal(RegFile.io.t3)}, t4: ${Hexadecimal(RegFile.io.t4)}, t5: ${Hexadecimal(RegFile.io.t5)}, t6: ${Hexadecimal(RegFile.io.t6)}\n")
```
```scala=
// PIPELINE (Main.scala)
// a0-a7 : argument registers
printf(p"[ ${hwCounter} ] a0: ${Hexadecimal(RegFile.io.a0)}, a1: ${Hexadecimal(RegFile.io.a1)}, a2: ${Hexadecimal(RegFile.io.a2)}, a3: ${Hexadecimal(RegFile.io.a3)}, a4: ${Hexadecimal(RegFile.io.a4)}, a5: ${Hexadecimal(RegFile.io.a5)}, a6: ${Hexadecimal(RegFile.io.a6)}, a7: ${Hexadecimal(RegFile.io.a7)}\n")
```
```scala=
// PIPELINE (Main.scala)
// s0-s11 : save registers
printf(p"[ ${hwCounter} ] s0: ${Hexadecimal(RegFile.io.s0)}, s1: ${Hexadecimal(RegFile.io.s1)}, s2: ${Hexadecimal(RegFile.io.s2)}, s3: ${Hexadecimal(RegFile.io.s3)}, s4: ${Hexadecimal(RegFile.io.s4)}, s5: ${Hexadecimal(RegFile.io.s5)}, s6: ${Hexadecimal(RegFile.io.s6)}, s7: ${Hexadecimal(RegFile.io.s7)}, s8: ${Hexadecimal(RegFile.io.s8)}, s9: ${Hexadecimal(RegFile.io.s9)}, s10: ${Hexadecimal(RegFile.io.s10)}, s11: ${Hexadecimal(RegFile.io.s11)}\n")
```
Additionally, to monitor the `IF_ID` and `ID_EX` pipeline registers and the ALU control signals, we include the following code snippet for logging supplementary information.
```scala=
// PIPELINE (Main.scala)
// control signals from decode to execute (including ALU operands)
printf(p"[ ${hwCounter} ] idx: ${Decimal(PC.io.out / 4.S + 1.S)} op: 0x${Hexadecimal(control_module.io.opcode)} rs1: ${Decimal(RegFile.io.rs1)} (0x${Hexadecimal(RegFile.io.rdata1)}) rs2: ${Decimal(RegFile.io.rs2)} (0x${Hexadecimal(RegFile.io.rdata2)}) alu_arg1: 0x${Hexadecimal(ALU.io.in_A)} alu_arg2: 0x${Hexadecimal(ALU.io.in_B)} inst: 0x${Hexadecimal(InstMemory.io.data)} alu_ctrl_op_A: ${ID_EX_.io.ctrl_OpA_out} alu_forward_a: ${Forwarding.io.forward_a} alu_ctrl_op_B: ${ID_EX_.io.ctrl_OpB_out} alu_forward_b: ${Forwarding.io.forward_b} EXMEM_rd: ${Decimal(Forwarding.io.EXMEM_rd)} IDEX_rs1: ${Decimal(Forwarding.io.IDEX_rs1)} IDEX_rs1_data_out: 0x${Hexadecimal(ID_EX_.io.rs1_data_out)} EXMEM_alu_out: 0x${Hexadecimal(EX_MEM_M.io.EXMEM_alu_out)} IDEX_rs2_data: 0x${Hexadecimal(ID_EX_.io.rs2_data_out)} IDEX_rs1_data_in: 0x${Hexadecimal(ID_EX_.io.rs1_data_in)} fwd_rs1: ${Structural.io.fwd_rs1} MEM_WB_RD: ${Decimal(Forwarding.io.MEMWB_rd)} io_rs1: ${Decimal(ID_EX_.io.rs1_out)} io_rs2: ${Decimal(ID_EX_.io.rs2_out)} MEM_WB_RD_Data: ${Hexadecimal(MEM_WB_M.io.MEMWB_alu_out)} ALUOp: ${Decimal(ALU.io.alu_Op)}\n")
```
---
### 3. Structural Hazards
While testing one of our programs, we observed an unusual discrepancy by tracing the logs and comparing the register states with those produced by the Ripes simulator.
```text
[ 35 ] t0: 7fffffff, t1: 3fffffff, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 36 ] t0: 7fffffff, t1: 3fffffff, t2: 55555000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 37 ] t0: 7fffffff, t1: 3fffffff, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 38 ] t0: 7fffffff, t1: 15555555, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 39 ] t0: 6aaaaaaa, t1: 15555555, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 40 ] t0: 6aaaaaaa, t1: 1aaaaaaa, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 41 ] t0: 6aaaaaaa, t1: 1aaaaaaa, t2: 48888555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 42 ] t0: 6aaaaaaa, t1: 1aaaaaaa, t2: 48888888, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
```
In Ripes, at clock cycle 41, register `t2` is expected to change to `0x33333000` and then to `0x33333333` due to the instruction `li t2, 0x33333333`. Our log instead shows `0x48888555` followed by `0x48888888`.
```text
[ 35 ] idx: 20 op: 0x33 rs1: 6 (0x00007fff) rs2: 7 (0x00000000) alu_arg1: 0x55555000 alu_arg2: 0x00000555 inst: 0x406282b3 alu_ctrl_op_A: 0 alu_forward_a: 2 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 7 IDEX_rs1: 7 IDEX_rs1_data: 0x00000000 EXMEM_alu_out: 0x55555000
[ 36 ] idx: 21 op: 0x33 rs1: 5 (0x7fffffff) rs2: 6 (0x3fffffff) alu_arg1: 0x3fffffff alu_arg2: 0x55555555 inst: 0x0022d313 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 2 EXMEM_rd: 7 IDEX_rs1: 6 IDEX_rs1_data: 0x3fffffff EXMEM_alu_out: 0x55555555
[ 37 ] idx: 22 op: 0x13 rs1: 5 (0x7fffffff) rs2: 0 (0x00000000) alu_arg1: 0x7fffffff alu_arg2: 0x15555555 inst: 0x333333b7 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 2 EXMEM_rd: 6 IDEX_rs1: 5 IDEX_rs1_data: 0x7fffffff EXMEM_alu_out: 0x15555555
[ 38 ] idx: 23 op: 0x37 rs1: 0 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0x6aaaaaaa alu_arg2: 0x00000002 inst: 0x33338393 alu_ctrl_op_A: 0 alu_forward_a: 2 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 5 IDEX_rs1: 5 IDEX_rs1_data: 0x7fffffff EXMEM_alu_out: 0x6aaaaaaa
[ 39 ] idx: 24 op: 0x13 rs1: 7 (0x55555555) rs2: 0 (0x00000000) alu_arg1: 0x15555555 alu_arg2: 0x33333000 inst: 0x00737333 alu_ctrl_op_A: 3 alu_forward_a: 0 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 6 IDEX_rs1: 0 IDEX_rs1_data: 0x15555555 EXMEM_alu_out: 0x1aaaaaaa
[ 40 ] idx: 25 op: 0x33 rs1: 6 (0x15555555) rs2: 7 (0x55555555) alu_arg1: 0x48888555 alu_arg2: 0x00000333 inst: 0x0072f3b3 alu_ctrl_op_A: 0 alu_forward_a: 2 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 7 IDEX_rs1: 7 IDEX_rs1_data: 0x55555555 EXMEM_alu_out: 0x48888555
[ 41 ] idx: 26 op: 0x33 rs1: 5 (0x6aaaaaaa) rs2: 7 (0x55555555) alu_arg1: 0x1aaaaaaa alu_arg2: 0x48888888 inst: 0x007302b3 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 2 EXMEM_rd: 7 IDEX_rs1: 6 IDEX_rs1_data: 0x1aaaaaaa EXMEM_alu_out: 0x48888888
[ 42 ] idx: 27 op: 0x33 rs1: 6 (0x1aaaaaaa) rs2: 7 (0x48888555) alu_arg1: 0x6aaaaaaa alu_arg2: 0x48888888 inst: 0x0042d313 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 1 EXMEM_rd: 6 IDEX_rs1: 5 IDEX_rs1_data: 0x6aaaaaaa EXMEM_alu_out: 0x08888888
```
However, examining the ALU log reveals that at clock cycle 39, in the `EX` stage, the CPU adds `0x15555555` and `0x33333000` instead of `0x00000000` and `0x33333000`, as expected from the `lui t2, 0x33333` instruction. The resulting sum, `0x48888555`, is exactly the wrong value that appears in `t2` at clock cycle 41. Further analysis shows that the value `0x15555555` is incorrectly forwarded from the write-back pipeline register. This issue originates from the module responsible for hazard detection.
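For reference, the two intermediate values expected from `li t2, 0x33333333` follow from the usual expansion of `li` into `lui` plus `addi`. The sketch below shows that split in plain Scala; `LiExpansion`, `split`, and `rebuild` are hypothetical names, and the expansion shown is the common assembler convention rather than something taken from the repository.

```scala
// Plain Scala sketch of the lui/addi split of a 32-bit constant.
// LiExpansion, split and rebuild are hypothetical names.
object LiExpansion {
  /** Splits a 32-bit constant into (lui immediate, addi immediate). */
  def split(value: Int): (Int, Int) = {
    val lo = (value << 20) >> 20              // low 12 bits, sign-extended
    val hi = ((value - lo) >>> 12) & 0xFFFFF  // upper 20 bits, compensating a negative lo
    (hi, lo)
  }

  // lui writes hi << 12, addi then adds the sign-extended lo.
  def rebuild(hi: Int, lo: Int): Int = (hi << 12) + lo
}
```

For `0x33333333` the split is `lui 0x33333` followed by `addi 0x333`, so `t2` should first hold `0x33333000` and then `0x33333333`, provided the `lui` result is added to zero rather than to a forwarded value.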
The original implementation included a `StructuralHazard` class intended to resolve structural hazards but inadvertently handled data hazards instead, as shown in the code snippet below.
```scala=
// StructuralHazard (StructuralHazard.scala)
// Determine if forwarding is needed for rs1
when(io.MEM_WB_regWr && (io.MEM_WB_Rd === io.rs1)) {
io.fwd_rs1 := true.B
}.otherwise {
io.fwd_rs1 := false.B
}
// Determine if forwarding is needed for rs2
when(io.MEM_WB_regWr && (io.MEM_WB_Rd === io.rs2)) {
io.fwd_rs2 := true.B
}.otherwise {
io.fwd_rs2 := false.B
}
```
Additionally, its integration in `Main.scala` disrupted proper data forwarding: it only addressed hazards from the `MEM/WB` pipeline register, ignored those from the `EX/MEM` pipeline register, and detected hazards at incorrect stages. To rectify this, we removed the flawed `StructuralHazard` class and correctly implemented structural hazard resolution.
```scala=
// PIPELINE (Main.scala)
// rs1_data
when (Structural.io.fwd_rs1 === 0.U) {
S_rs1DataIn := RegFile.io.rdata1
}.elsewhen (Structural.io.fwd_rs1 === 1.U) {
S_rs1DataIn := RegFile.io.w_data
}.otherwise {
S_rs1DataIn := 0.S
}
// rs2_data
when (Structural.io.fwd_rs2 === 0.U) {
S_rs2DataIn := RegFile.io.rdata2
}.elsewhen (Structural.io.fwd_rs2 === 1.U) {
S_rs2DataIn := RegFile.io.w_data
}.otherwise {
S_rs2DataIn := 0.S
}
```
After removing the defective module, new issues emerged, observable in the logs of argument and temporary registers.
```text
[ 20 ] a0: ffffffff, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[ 21 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[ 22 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[ 23 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[ 24 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[ 25 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
```
```text
[ 20 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 21 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 22 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 23 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 24 ] t0: ffffffff, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 25 ] t0: ffffffff, t1: 7fffffff, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
```
Specifically, at clock cycle 24, the instruction `add t0, x0, a0` was supposed to complete execution with the new value of `a0`. However, the control signal history of the decode and execute stages shows that the latest value of `a0` was never forwarded. The read of `a0` in the decode stage and the write to `a0` in the write-back stage occurred in the same cycle, and because the register array is only updated at the clock edge while reads are combinational, the CPU fetched stale data.
```text
[ 20 ] idx: 5 op: 0x00 rs1: 0 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0x000000e0 alu_arg2: 0x00000000 inst: 0x00a002b3 alu_ctrl_op_A: 1 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 0 EXMEM_rd: 10 IDEX_rs1: 0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x7fff0000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 18 io_rs1: 0 io_rs2: 0 MEM_WB_RD_Data: 7fff0000 ALUOp: 31
[ 21 ] idx: 6 op: 0x33 rs1: 0 (0x00000000) rs2: 10 (0xffffffff) alu_arg1: 0x00000000 alu_arg2: 0x00000000 inst: 0x0012d313 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 0 EXMEM_rd: 1 IDEX_rs1: 0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x000000e0 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 10 io_rs1: 0 io_rs2: 0 MEM_WB_RD_Data: 7fff0000 ALUOp: 0
[ 22 ] idx: 7 op: 0x13 rs1: 5 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0x00000000 alu_arg2: 0xffffffff inst: 0x0062e2b3 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 0 EXMEM_rd: 0 IDEX_rs1: 0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x00000000 IDEX_rs2_data: 0xffffffff IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 1 io_rs1: 0 io_rs2: 10 MEM_WB_RD_Data: 000000e0 ALUOp: 0
[ 23 ] idx: 8 op: 0x33 rs1: 5 (0x00000000) rs2: 6 (0x00000000) alu_arg1: 0xffffffff alu_arg2: 0x00000001 inst: 0x0022d313 alu_ctrl_op_A: 0 alu_forward_a: 2 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 5 IDEX_rs1: 5 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0xffffffff IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 0 io_rs1: 5 io_rs2: 0 MEM_WB_RD_Data: 00000000 ALUOp: 5
[ 24 ] idx: 9 op: 0x13 rs1: 5 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0xffffffff alu_arg2: 0x7fffffff inst: 0x0062e2b3 alu_ctrl_op_A: 0 alu_forward_a: 1 alu_ctrl_op_B: 0 alu_forward_b: 2 EXMEM_rd: 6 IDEX_rs1: 5 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x7fffffff IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 1 MEM_WB_RD: 5 io_rs1: 5 io_rs2: 6 MEM_WB_RD_Data: ffffffff ALUOp: 6
[ 25 ] idx: 10 op: 0x33 rs1: 5 (0xffffffff) rs2: 6 (0x00000000) alu_arg1: 0xffffffff alu_arg2: 0x00000002 inst: 0x0042d313 alu_ctrl_op_A: 0 alu_forward_a: 2 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 5 IDEX_rs1: 5 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0xffffffff IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0xffffffff fwd_rs1: 0 MEM_WB_RD: 6 io_rs1: 5 io_rs2: 0 MEM_WB_RD_Data: 7fffffff ALUOp: 5
```
<center><img src="https://user-images.githubusercontent.com/56905673/117547053-f932fe00-b046-11eb-91af-9291291d4f52.png"></center>
The forwarding scenarios primarily included `EX/MEM` → `ALU`, `MEM/WB` → `ALU`, and `MEM/WB` → `InstrDecode`. The root cause of the issue was that a write to a register was not given priority over a read of the same register in the same cycle, so the read returned stale data. For example, while `0x7FFF0000` was being written to `a0`, the CPU simultaneously read `a0` and obtained the stale value `0xFFFFFFFF`.
```text
[ 20 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 21 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 22 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 23 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 24 ] t0: 7fff0000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 25 ] t0: 7fff0000, t1: 3fff8000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
```
To ensure that a write to a register takes priority over a same-cycle read of that register, we took the following section of the `RegisterFile` module:
```scala=
// RegisterFile (RegisterFile.scala)
io.rdata1 := Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1))
io.rdata2 := Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2))
```
and replaced it with the following code snippet:
```scala=
// RegisterFile (RegisterFile.scala)
// 1) Read old data from the array.
val readData1 = Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1))
val readData2 = Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2))
// 2) If there's a same-cycle write to the same register, override (bypass) it.
val bypassedData1 = Mux(io.reg_write && (io.w_reg === io.rs1) && (io.w_reg =/= 0.U),
                        io.w_data,
                        readData1)
val bypassedData2 = Mux(io.reg_write && (io.w_reg === io.rs2) && (io.w_reg =/= 0.U),
                        io.w_data,
                        readData2)
// 3) Send those results to outputs
io.rdata1 := bypassedData1
io.rdata2 := bypassedData2
```
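The write-first behavior of the revised read ports can be checked with a small software model. The sketch below is plain Scala rather than Chisel, and the `RegFile` class with its `read` and `tick` methods is an illustrative construction, not code from the repository.

```scala
// Behavioral model of a register file with a write-first read bypass.
// Assumption: names and structure are illustrative only.
object RegFileBypassModel {
  final class RegFile {
    private val regs = Array.fill(32)(0)

    // Combinational read: x0 reads as zero, and a same-cycle write to the
    // requested register is forwarded instead of the stored (stale) value.
    def read(rs: Int, regWrite: Boolean, wReg: Int, wData: Int): Int =
      if (rs == 0) 0
      else if (regWrite && wReg == rs) wData
      else regs(rs)

    // Clock edge: commit the pending write (x0 is never written).
    def tick(regWrite: Boolean, wReg: Int, wData: Int): Unit =
      if (regWrite && wReg != 0) regs(wReg) = wData
  }

  def main(args: Array[String]): Unit = {
    val rf = new RegFile
    rf.tick(regWrite = true, wReg = 5, wData = 0xFFFFFFFF) // old value in x5
    // A write of 0x7FFF0000 to x5 is in flight while x5 is being read.
    val seen = rf.read(rs = 5, regWrite = true, wReg = 5, wData = 0x7FFF0000)
    println(f"0x$seen%08x") // prints 0x7fff0000, not the stale 0xffffffff
  }
}
```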
---
### 4. Missing Instructions
After resolving these hazards, we encountered an unexpected issue with arithmetic operations. The logs below show the state history of the saved registers and the control signals. The instruction `0x408a5a13` (`srai s4, s4, 8`) is fetched at clock cycle 64 and executed at clock cycle 66. By clock cycle 68, `s4` holds `0x0083ff00` instead of the expected `0xff83ff00`: the shift was performed as a logical right shift, without the sign extension that `srai` and `sra` require. This is evident from the `ALUOp` value of 5 at clock cycle 66, which corresponds to `SRL` instead of `SRA`.
```
[ 64 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 00000000, s4: 00000000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[ 65 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 00000000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[ 66 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 04000000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[ 67 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 83ff0000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[ 68 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 0083ff00, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[ 69 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 0083ff00, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
```
```
[ 64 ] idx: 64 op: 0x33 rs1: 18 (0x7fff0000) rs2: 20 (0x00000000) alu_arg1: 0x00000000 alu_arg2: 0x04000000 inst: 0x408a5a13 alu_ctrl_op_A: 3 alu_forward_a: 0 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 19 IDEX_rs1: 0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x0000002b IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x7fff0000 fwd_rs1: 0 MEM_WB_RD: 8 io_rs1: 0 io_rs2: 0 MEM_WB_RD_Data: 00000000 ALUOp: 0
[ 65 ] idx: 65 op: 0x13 rs1: 20 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0x7fff0000 alu_arg2: 0x04000000 inst: 0x7f8002b7 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 2 EXMEM_rd: 20 IDEX_rs1: 18 IDEX_rs1_data_out: 0x7fff0000 EXMEM_alu_out: 0x04000000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 19 io_rs1: 18 io_rs2: 20 MEM_WB_RD_Data: 0000002b ALUOp: 0
[ 66 ] idx: 66 op: 0x37 rs1: 0 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0x83ff0000 alu_arg2: 0x00000408 inst: 0x005a7a33 alu_ctrl_op_A: 0 alu_forward_a: 2 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 20 IDEX_rs1: 20 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x83ff0000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 20 io_rs1: 20 io_rs2: 0 MEM_WB_RD_Data: 04000000 ALUOp: 5
[ 67 ] idx: 67 op: 0x33 rs1: 20 (0x83ff0000) rs2: 5 (0x00000001) alu_arg1: 0x00000000 alu_arg2: 0x7f800000 inst: 0xfff90a93 alu_ctrl_op_A: 3 alu_forward_a: 0 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 20 IDEX_rs1: 0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x0083ff00 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x83ff0000 fwd_rs1: 1 MEM_WB_RD: 20 io_rs1: 0 io_rs2: 0 MEM_WB_RD_Data: 83ff0000 ALUOp: 0
[ 68 ] idx: 68 op: 0x13 rs1: 18 (0x7fff0000) rs2: 0 (0x00000000) alu_arg1: 0x0083ff00 alu_arg2: 0x7f800000 inst: 0x01fada93 alu_ctrl_op_A: 0 alu_forward_a: 1 alu_ctrl_op_B: 0 alu_forward_b: 2 EXMEM_rd: 5 IDEX_rs1: 20 IDEX_rs1_data_out: 0x83ff0000 EXMEM_alu_out: 0x7f800000 IDEX_rs2_data: 0x00000001 IDEX_rs1_data_in: 0x7fff0000 fwd_rs1: 0 MEM_WB_RD: 20 io_rs1: 20 io_rs2: 5 MEM_WB_RD_Data: 0083ff00 ALUOp: 7
[ 69 ] idx: 69 op: 0x13 rs1: 21 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0x7fff0000 alu_arg2: 0xffffffff inst: 0x013912b3 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 20 IDEX_rs1: 18 IDEX_rs1_data_out: 0x7fff0000 EXMEM_alu_out: 0x00800000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 5 io_rs1: 18 io_rs2: 0 MEM_WB_RD_Data: 7f800000 ALUOp: 0
```
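The symptom in the logs above can be reproduced with ordinary integer shifts. The following plain-Scala illustration applies both shifts to the operand seen in the log (`0x83ff0000`); the helper names `srl` and `sra` are ours.

```scala
// Logical vs. arithmetic right shift on a 32-bit value.
object ShiftDemo {
  // SRL/SRLI: zero-fill from the left; only the low 5 bits of shamt are used.
  def srl(x: Int, shamt: Int): Int = x >>> (shamt & 31)

  // SRA/SRAI: replicate the sign bit from the left.
  def sra(x: Int, shamt: Int): Int = x >> (shamt & 31)

  def main(args: Array[String]): Unit = {
    val operand = 0x83ff0000
    println(f"srl: 0x${srl(operand, 8)}%08x") // prints srl: 0x0083ff00 (value observed in the log)
    println(f"sra: 0x${sra(operand, 8)}%08x") // prints sra: 0xff83ff00 (value srai should produce)
  }
}
```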
Through debugging and observation, we discovered that some instructions were not implemented correctly. Specifically, in the ALU module, `SRA` and `SRAI` should be assigned an `ALUOp` value of 13 instead of 5.
```scala=
// AluOpCode (ALU.scala)
val ALU_ADD = 0.U(5.W)
val ALU_ADDI = 0.U(5.W)
val ALU_SW = 0.U(5.W)
val ALU_LW = 0.U(5.W)
val ALU_LUI = 0.U(5.W)
val ALU_AUIPC = 0.U(5.W)
val ALU_SLL = 1.U(5.W)
val ALU_SLLI = 1.U(5.W)
val ALU_SLT = 2.U(5.W)
val ALU_SLTI = 2.U(5.W)
val ALU_SLTU = 3.U(5.W)
val ALU_SLTUI = 3.U(5.W)
val ALU_XOR = 4.U(5.W)
val ALU_XORI = 4.U(5.W)
val ALU_SRL = 5.U(5.W)
val ALU_SRLI = 5.U(5.W)
val ALU_OR = 6.U(5.W)
val ALU_ORI = 6.U(5.W)
val ALU_AND = 7.U(5.W)
val ALU_ANDI = 7.U(5.W)
val ALU_SUB = 8.U(5.W)
val ALU_SRA = 13.U(5.W)
val ALU_SRAI = 13.U(5.W)
val ALU_JAL = 31.U(5.W)
val ALU_JALR = 31.U(5.W)
```
The original ALU control code could only produce operation codes below 8 for I-type instructions, because it considered only `func3`, whose values range from `000` to `111` (0 to 7).
```scala=
// AluControl (Alu_Control.scala)
// R type
when (io.aluOp === 0.U) {
  io.out := Cat(0.U(2.W), io.func7, io.func3)
// I type
}.elsewhen (io.aluOp === 1.U) {
  io.out := Cat("b00".U(2.W), io.func3)
}
```
To fix this issue, we referred to the RISC-V instruction set and extended the ALU module to include the missing instructions. The following table illustrates the I-type and R-type instructions.
<center><img src="https://five-embeddev.com/riscv-user-isa-manual/Priv-v1.12/instr-table_00.svg"></center>
To accurately calculate the `aluOp` code, the ALU Control unit must consider the entire `func7` field of I-type and R-type instructions. Originally, `func7` was defined as a boolean in ALU Control, which was incorrect. We rectified this by defining `func7` as a 7-bit unsigned integer.
```scala=
// AluControl (Alu_Control.scala)
val io = IO(new Bundle {
  val func3 = Input(UInt(3.W))
  val func7 = Input(UInt(7.W)) // changed from Input(Bool())
  val aluOp = Input(UInt(3.W))
  val out   = Output(UInt(5.W))
})
```
Additionally, outside of ALU Control, we ensured that the correct bits for `func7` are properly extracted.
```scala=
// PIPELINE (Main.scala)
ID_EX_.io.func7_in := IF_ID_.io.SelectedInstr_out(31, 25) // changed from IF_ID_.io.SelectedInstr_out(30)
```
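To make the field boundaries concrete, the sketch below decodes the failing instruction from the log (`0x408a5a13`) in plain Scala. The `bits` helper mimics Chisel's `(hi, lo)` bit extraction and is a hypothetical utility, not part of the repository.

```scala
// Extracting instruction fields from a 32-bit RISC-V instruction word.
object InstrFields {
  // Returns inst[hi:lo]; valid for fields narrower than 32 bits.
  def bits(inst: Int, hi: Int, lo: Int): Int =
    (inst >>> lo) & ((1 << (hi - lo + 1)) - 1)

  def opcode(inst: Int): Int = bits(inst, 6, 0)
  def rd(inst: Int): Int     = bits(inst, 11, 7)
  def func3(inst: Int): Int  = bits(inst, 14, 12)
  def func7(inst: Int): Int  = bits(inst, 31, 25)

  def main(args: Array[String]): Unit = {
    val srai = 0x408a5a13 // srai s4, s4, 8
    val srli = 0x0042d313 // an srli instruction taken from the earlier log
    // Both share func3 = 5; only func7 (0x20 vs 0x00) tells them apart.
    println(s"srai: func7=${func7(srai)} func3=${func3(srai)} rd=${rd(srai)}") // func7=32 func3=5 rd=20
    println(s"srli: func7=${func7(srli)} func3=${func3(srli)} rd=${rd(srli)}") // func7=0 func3=5 rd=6
  }
}
```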
The revised ALU Control code snippet below now supports `SRA`, `SRAI`, and `SUB` instructions, which have operation codes greater than 7.
```scala=
// AluControl (Alu_Control.scala)
// R type
when (io.aluOp === 0.U) {
  when ((io.func7 === "b0100000".U(7.W)) && (io.func3 === "b000".U(3.W))) {
    io.out := 8.U // SUB, originally broken
  }.elsewhen ((io.func7 === "b0100000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
    io.out := 13.U // SRA, originally broken
  }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
    io.out := 5.U // SRL
  }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b001".U(3.W))) {
    io.out := 1.U // SLL
  }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b010".U(3.W))) {
    io.out := 2.U // SLT
  }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b011".U(3.W))) {
    io.out := 3.U // SLTU
  }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b100".U(3.W))) {
    io.out := 4.U // XOR
  }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b111".U(3.W))) {
    io.out := 7.U // AND
  }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b110".U(3.W))) {
    io.out := 6.U // OR
  }.otherwise {
    io.out := Cat(0.U(2.W), io.func7, io.func3)
  }
// I type
}.elsewhen(io.aluOp === 1.U) {
  when ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
    io.out := 5.U // SRLI
  }.elsewhen ((io.func7 === "b0100000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
    io.out := 13.U // SRAI, originally broken
  }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b001".U(3.W))) {
    io.out := 1.U // SLLI
  }.otherwise {
    io.out := Cat("b00".U(2.W), io.func3)
  }
}
```
```
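One detail of the I-type branch is worth spelling out: for non-shift I-type instructions, bits 31 to 25 belong to the immediate, so they must not be interpreted as `func7`. The plain-Scala model below mirrors the I-type `when` chain above to show this; the function name `iTypeAluOp` is hypothetical.

```scala
// Software model of the I-type branch of the ALU control decode.
object ITypeAluControlModel {
  def iTypeAluOp(func7: Int, func3: Int): Int = (func7, func3) match {
    case (0x00, 5) => 5        // SRLI
    case (0x20, 5) => 13       // SRAI
    case (0x00, 1) => 1        // SLLI
    case _         => func3 & 7 // everything else: Cat("b00", func3)
  }

  def main(args: Array[String]): Unit = {
    // srai: func7 = 0b0100000, func3 = 0b101
    println(iTypeAluOp(0x20, 5)) // prints 13
    // addi with immediate 1024: imm[11:5] is also 0b0100000, but func3 = 0b000,
    // so the upper bits are ignored and the operation stays ADDI (code 0).
    println(iTypeAluOp(0x20, 0)) // prints 0
  }
}
```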
#### M Extension
##### 1. ALU Control
Filepath: `src/main/scala/Pipeline/UNits/Alu_Control.scala`
To implement the **M-extension**, we need to modify the `AluControl` module to allow the `func7` signal to be passed into the module. Below is the updated definition for the AluControl class:
```scala=
class AluControl extends Module {
  val io = IO(new Bundle {
    val func3 = Input(UInt(3.W))  // 3-bit function code for RISC-V instructions
    val func7 = Input(UInt(7.W))  // 7-bit function code for RISC-V instructions (used for M-extension)
    val aluOp = Input(UInt(3.W))  // ALU operation selector
    val out   = Output(UInt(5.W)) // ALU operation output code
  })
  io.out := 0.U
  ...
}
```
In the R-type instruction logic, we need to add a condition to handle M-extension instructions. Specifically, when `func7` equals `b0000001`, the instruction corresponds to an M-extension operation, such as multiplication (MUL), division (DIV), or remainder (REM). Below is the updated code for supporting M-extension:
```scala=
// R type
when(io.aluOp === 0.U) {
  // First, check for M-extension: func7 === "b0000001"
  when(io.func7 === "b0000001".U) {
    // M-extension operations (e.g., MUL, DIV, REM)
    switch(io.func3) {
      is("b000".U) { io.out := 14.U } // MUL
      is("b001".U) { io.out := 15.U } // MULH
      is("b010".U) { io.out := 16.U } // MULHSU
      is("b011".U) { io.out := 17.U } // MULHU
      is("b100".U) { io.out := 18.U } // DIV
      is("b101".U) { io.out := 19.U } // DIVU
      is("b110".U) { io.out := 20.U } // REM
      is("b111".U) { io.out := 21.U } // REMU
    }
  ...
}
```
1. Adding func7 Input:
* The func7 signal is now passed as an input to the AluControl module. This allows the module to distinguish between standard R-type instructions and M-extension instructions, as M-extension operations are identified by `func7 === "b0000001"`.
2. Condition for M-extension:
* A new when block is introduced to check if `func7` equals `b0000001`, which indicates an M-extension instruction.
* Inside this block, a switch statement is used to determine the specific operation based on the func3 value.
3. Assigning ALU Operation Codes:
* Each M-extension instruction (e.g., MUL, DIV, REM) is assigned a unique 5-bit operation code (`14.U` to `21.U`), which corresponds to the predefined codes in the ALU.
##### 2. ALU
Extending the ALU to Support M-extension Instructions
To fully implement the M-extension, we need to modify both the AluOpCode object and the ALU module. Below are the detailed steps with the modifications.
1. Modifying `AluOpCode` to Include M-extension Instruction Types
We add operation codes for the M-extension instructions (MUL, DIV, REM, etc.) in the AluOpCode object. These codes will represent each specific M-extension operation.
```scala=
object AluOpCode {
  ...
  // M-extension operations
  val ALU_MUL    = 14.U(5.W) // Multiplication
  val ALU_MULH   = 15.U(5.W) // Multiplication high (signed)
  val ALU_MULHSU = 16.U(5.W) // Multiplication high (signed x unsigned)
  val ALU_MULHU  = 17.U(5.W) // Multiplication high (unsigned)
  val ALU_DIV    = 18.U(5.W) // Division (signed)
  val ALU_DIVU   = 19.U(5.W) // Division (unsigned)
  val ALU_REM    = 20.U(5.W) // Remainder (signed)
  val ALU_REMU   = 21.U(5.W) // Remainder (unsigned)
}
```
* Each M-extension instruction is assigned a unique 5-bit operation code.
* These codes are used by the AluControl module to generate the appropriate `alu_Op` value for the **ALU**.
2. Implementing M-extension Operations in the ALU Module
The ALU module is extended to perform the M-extension operations based on the `alu_Op` value provided.
```scala=
class ALU extends Module {
  val io = IO(new Bundle {
    val in_A   = Input(SInt(32.W))  // First operand
    val in_B   = Input(SInt(32.W))  // Second operand
    val alu_Op = Input(UInt(5.W))   // ALU operation code
    val out    = Output(SInt(32.W)) // ALU result
  })
  val result = WireDefault(0.S(32.W)) // Default result is zero
  switch(io.alu_Op) {
    ...
    // M-extension operations
    is(ALU_MUL) {
      result := io.in_A * io.in_B // Standard multiplication
    }
    is(ALU_MULH) {
      result := (io.in_A * io.in_B)(63, 32).asSInt // High 32 bits of signed multiplication
    }
    is(ALU_MULHSU) {
      result := (io.in_A.asSInt * io.in_B.asUInt)(63, 32).asSInt // High 32 bits (signed x unsigned)
    }
    is(ALU_MULHU) {
      result := (io.in_A.asUInt * io.in_B.asUInt)(63, 32).asSInt // High 32 bits of unsigned multiplication
    }
    is(ALU_DIV) {
      result := io.in_A / io.in_B // Signed division
    }
    is(ALU_DIVU) {
      result := (io.in_A.asUInt / io.in_B.asUInt).asSInt // Unsigned division
    }
    is(ALU_REM) {
      result := io.in_A % io.in_B // Signed remainder
    }
    is(ALU_REMU) {
      result := (io.in_A.asUInt % io.in_B.asUInt).asSInt // Unsigned remainder
    }
  }
  io.out := result // Output the result
}
```
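When testing these operations, a software reference model is useful for comparison. The sketch below is plain Scala, with names of our own choosing, and follows the results the RISC-V M-extension specification defines, including the corner cases: division by zero returns all ones for the quotient and the dividend for the remainder, and signed overflow (`-2^31 / -1`) returns `-2^31` with remainder 0. The Chisel snippet above does not special-case a zero divisor, so how its `/` and `%` behave in that case is something we have not verified here.

```scala
// Reference model for RV32M results on 32-bit operands.
object MExtModel {
  private def u(x: Int): Long = x & 0xFFFFFFFFL // zero-extend to 64 bits

  def mul(a: Int, b: Int): Int    = a * b                            // low 32 bits
  def mulh(a: Int, b: Int): Int   = ((a.toLong * b.toLong) >> 32).toInt // signed x signed
  def mulhsu(a: Int, b: Int): Int = ((a.toLong * u(b)) >> 32).toInt     // signed x unsigned
  def mulhu(a: Int, b: Int): Int  = ((u(a) * u(b)) >>> 32).toInt        // unsigned x unsigned

  def div(a: Int, b: Int): Int =
    if (b == 0) -1                                   // division by zero: all ones
    else if (a == Int.MinValue && b == -1) Int.MinValue // signed overflow
    else a / b                                       // rounds toward zero

  def divu(a: Int, b: Int): Int =
    if (b == 0) -1 else Integer.divideUnsigned(a, b)

  def rem(a: Int, b: Int): Int =
    if (b == 0) a                                    // division by zero: dividend
    else if (a == Int.MinValue && b == -1) 0         // signed overflow
    else a % b                                       // sign follows the dividend

  def remu(a: Int, b: Int): Int =
    if (b == 0) a else Integer.remainderUnsigned(a, b)
}
```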
---
### 5. Pipeline Flushing
Because a branch or jump takes effect in the `MEM` stage, the instructions already held in the `IF/ID` and `ID/EX` pipeline registers must be flushed: we replace them with `NOP` instructions (`addi x0, x0, 0`) and clear all control signals. The corrected code is shown below:
```scala=
// PIPELINE (Main.scala)
when(HazardDetect.io.pc_forward === 1.B) {
  PC.io.in := HazardDetect.io.pc_out
}.otherwise {
  when(control_module.io.next_pc_sel === "b01".U) {
    when(Branch_M.io.br_taken === 1.B && control_module.io.branch === 1.B) {
      PC.io.in := ImmGen.io.SB_type
      // Flush IF/ID
      IF_ID_.io.pc_in         := 0.S
      IF_ID_.io.pc4_in        := 0.U
      IF_ID_.io.SelectedPC    := 0.S
      IF_ID_.io.SelectedInstr := 0.U
      // Flush ID_EX (set inputs to zero so next cycle ID_EX_ registers become 0)
      ID_EX_.io.rs1_in   := 0.U
      ID_EX_.io.rs2_in   := 0.U
      ID_EX_.io.imm      := 0.S
      ID_EX_.io.func3_in := 0.U
      ID_EX_.io.func7_in := 0.U
      ID_EX_.io.rd_in    := 0.U
      // Also set the control signals to 0 so no writes occur:
      ID_EX_.io.ctrl_MemWr_in    := 0.U
      ID_EX_.io.ctrl_MemRd_in    := 0.U
      ID_EX_.io.ctrl_MemToReg_in := 0.U
      ID_EX_.io.ctrl_OpA_in      := 0.U
      ID_EX_.io.ctrl_OpB_in      := 0.U
      ID_EX_.io.ctrl_Branch_in   := 0.U
      ID_EX_.io.ctrl_nextpc_in   := 0.U
      ID_EX_.io.IFID_pc4_in      := 0.U
      ID_EX_.io.rs1_data_in      := 0.S
      ID_EX_.io.rs2_data_in      := 0.S
    }.otherwise {
      PC.io.in := PC4.io.out.asSInt
    }
  }.elsewhen(control_module.io.next_pc_sel === "b10".U) {
    PC.io.in := ImmGen.io.UJ_type
    // Flush IF/ID
    IF_ID_.io.pc_in         := 0.S
    IF_ID_.io.pc4_in        := 0.U
    IF_ID_.io.SelectedPC    := 0.S
    IF_ID_.io.SelectedInstr := 0.U
    // Flush ID_EX (set inputs to zero so next cycle ID_EX_ registers become 0)
    ID_EX_.io.rs1_in   := 0.U
    ID_EX_.io.rs2_in   := 0.U
    ID_EX_.io.imm      := 0.S
    ID_EX_.io.func3_in := 0.U
    ID_EX_.io.func7_in := 0.U
    ID_EX_.io.rd_in    := 0.U
    // Also set the control signals to 0 so no writes occur:
    ID_EX_.io.ctrl_MemWr_in    := 0.U
    ID_EX_.io.ctrl_MemRd_in    := 0.U
    ID_EX_.io.ctrl_MemToReg_in := 0.U
    ID_EX_.io.ctrl_OpA_in      := 0.U
    ID_EX_.io.ctrl_OpB_in      := 0.U
    ID_EX_.io.ctrl_Branch_in   := 0.U
    ID_EX_.io.ctrl_nextpc_in   := 0.U
    ID_EX_.io.IFID_pc4_in      := 0.U
    ID_EX_.io.rs1_data_in      := 0.S
    ID_EX_.io.rs2_data_in      := 0.S
  }.elsewhen(control_module.io.next_pc_sel === "b11".U) {
    PC.io.in := JALR.io.out.asSInt
    // Flush IF/ID
    IF_ID_.io.pc_in         := 0.S
    IF_ID_.io.pc4_in        := 0.U
    IF_ID_.io.SelectedPC    := 0.S
    IF_ID_.io.SelectedInstr := 0.U
    // Flush ID_EX (set inputs to zero so next cycle ID_EX_ registers become 0)
    ID_EX_.io.rs1_in   := 0.U
    ID_EX_.io.rs2_in   := 0.U
    ID_EX_.io.imm      := 0.S
    ID_EX_.io.func3_in := 0.U
    ID_EX_.io.func7_in := 0.U
    ID_EX_.io.rd_in    := 0.U
    // Also set the control signals to 0 so no writes occur:
    ID_EX_.io.ctrl_MemWr_in    := 0.U
    ID_EX_.io.ctrl_MemRd_in    := 0.U
    ID_EX_.io.ctrl_MemToReg_in := 0.U
    ID_EX_.io.ctrl_OpA_in      := 0.U
    ID_EX_.io.ctrl_OpB_in      := 0.U
    ID_EX_.io.ctrl_Branch_in   := 0.U
    ID_EX_.io.ctrl_nextpc_in   := 0.U
    ID_EX_.io.IFID_pc4_in      := 0.U
    ID_EX_.io.rs1_data_in      := 0.S
    ID_EX_.io.rs2_data_in      := 0.S
  }.otherwise {
    PC.io.in := PC4.io.out.asSInt
  }
}
```
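For reference, the canonical `NOP` mentioned above can be encoded with a small I-type encoder. This is a plain-Scala sketch with a hypothetical helper name (`encodeIType`). Note that the flush code above drives an all-zero word into `IF/ID`, which is not the same bit pattern as `addi x0, x0, 0`; whether the all-zero word acts as a bubble depends on how the control unit treats opcode 0, which is not shown in this section.

```scala
// Encoding I-type instructions, including the canonical RISC-V NOP.
object NopEncoding {
  // Layout: imm[11:0] | rs1 | funct3 | rd | opcode
  def encodeIType(imm: Int, rs1: Int, funct3: Int, rd: Int, opcode: Int): Int =
    ((imm & 0xFFF) << 20) | ((rs1 & 0x1F) << 15) | ((funct3 & 0x7) << 12) |
      ((rd & 0x1F) << 7) | (opcode & 0x7F)

  // addi x0, x0, 0
  val Nop: Int = encodeIType(imm = 0, rs1 = 0, funct3 = 0, rd = 0, opcode = 0x13)

  def main(args: Array[String]): Unit = {
    println(f"0x$Nop%08x") // prints 0x00000013
  }
}
```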
---
## C. Test Cases
### 1. argmax
This RISC-V assembly program finds the index of the maximum value in an integer array. It first stores three elements (`0, 2, 1`) into the array and then iterates through it, comparing each element with the current maximum. The program uses registers to track the current maximum value (`t0`), its index (`t1`), and the current index (`t2`). If an element is greater than or equal to the current maximum (`bge`), both the maximum value and its index are updated. Once the loop completes, the index of the maximum value is stored in register `a0`, and the program exits using a system call. The `addi s0, s0, 0` instructions act as padding between dependent instructions. This implementation demonstrates basic array traversal and conditional updates in assembly.
```asm=
.data
array: .word 0, 0, 0
.text
_start:
la a0, array
li t1, 0
addi s0, s0, 0
addi s0, s0, 0
sw t1, 0(a0)
li t1, 2
addi s0, s0, 0
addi s0, s0, 0
sw t1, 4(a0)
li t1, 1
addi s0, s0, 0
addi s0, s0, 0
sw t1, 8(a0)
li a1, 3
argmax:
li t6, 1
lw t0, 0(a0)
li t1, 0
li t2, 1
loop_start:
beq t2, a1, end
addi s0, s0, 0
addi s0, s0, 0
addi a0, a0, 4
addi s0, s0, 0
addi s0, s0, 0
lw t3, 0(a0)
addi s0, s0, 0
addi s0, s0, 0
bge t3, t0, set_max_num
addi s0, s0, 0
addi s0, s0, 0
addi t2, t2, 1
addi s0, s0, 0
addi s0, s0, 0
j loop_start
addi s0, s0, 0
addi s0, s0, 0
set_max_num:
mv t0, t3
mv t1, t2
addi t2, t2, 1
addi s0, s0, 0
addi s0, s0, 0
j loop_start
addi s0, s0, 0
addi s0, s0, 0
end:
mv a0, t1
li a7, 10
ecall
```
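The expected result can be cross-checked with a short software model of the same loop. The sketch below is plain Scala with a hypothetical function name; like the assembly, it updates on greater-than-or-equal, so ties resolve to the later index.

```scala
// Software model of the argmax loop (update on >=, as with `bge`).
object ArgmaxModel {
  def argmax(xs: Seq[Int]): Int = {
    var best = xs.head // t0: current maximum
    var bestIdx = 0    // t1: index of the maximum
    for (i <- 1 until xs.length) { // t2: current index
      if (xs(i) >= best) {
        best = xs(i)
        bestIdx = i
      }
    }
    bestIdx
  }

  def main(args: Array[String]): Unit = {
    println(argmax(Seq(0, 2, 1))) // prints 1
  }
}
```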
### 2. clz
This RISC-V assembly program calculates the number of leading zeros in a 32-bit integer. The program starts by loading a value (`0x70000002`) into register `a0` and calls the `my_clz` function. In `my_clz`, the input value is processed using a bitmask (`t3`) initialized to `0x80000000` (representing the most significant bit). A loop checks each bit from left to right by performing a bitwise AND operation between the input value and the bitmask. If the current bit is 1, the loop exits; otherwise, the bitmask is right-shifted, and a counter (`t1`) is incremented. Once the loop completes, the count of leading zeros is returned in `a0`, and the program exits.
```asm=
main:
li a0, 0x70000002
jal ra, my_clz
li a7, 10
ecall
my_clz:
mv t0, a0
li t1, 0
li t3, 0x80000000
clz_loop:
and t4, t0, t3
bne t4, x0, exit_clz
srli t3, t3, 1
addi t1, t1, 1
bnez t3, clz_loop
exit_clz:
mv a0, t1
ret
```
### 3. fabsf
This RISC-V assembly program calculates the absolute value of a 32-bit floating-point number. The program begins by loading the value `0xFFFFFFFF` into register `a0`, representing the input, and then calls the `fabsf` function. Inside `fabsf`, a bitmask (`0x7FFFFFFF`) is loaded into `t0`, which clears the sign bit of the input number when applied using a bitwise AND operation. The result, stored back in `a0`, represents the absolute value of the input. Finally, the program exits the function and terminates using a system call.
```asm=
main:
li a0, 0xFFFFFFFF
jal ra, fabsf
li a7, 10
ecall
fabsf:
li t0, 0x7FFFFFFF
and a0, a0, t0
jr ra
```
### 4. fp16 to 32
This RISC-V assembly program converts a 16-bit floating-point number (FP16) to a 32-bit floating-point number (FP32). The main function loads the FP16 value (`0xFFFFFFFF`) into register `a0` and calls the `fp16_to_fp32` function. Within `fp16_to_fp32`, the program handles sign extraction, normalization, and exponent adjustment. The `my_clz` function is used to calculate the number of leading zeros for normalization. The program adjusts the FP16 format to FP32 by aligning the mantissa, adding a bias to the exponent, and managing special cases like zeros, infinities, and NaNs. Finally, the result is constructed by combining the sign, exponent, and mantissa and is returned in `a0`. The program uses a stack for register saving and restoring during function calls to maintain execution context.
```asm=
main:
li a0, 0xFFFFFFFF
jal ra, fp16_to_fp32
li a7, 10
ecall
my_clz:
my_clz_prologue:
add t0, x0, a0
my_clz_padding:
srli t1, t0, 1
or t0, t0, t1
srli t1, t0, 2
or t0, t0, t1
srli t1, t0, 4
or t0, t0, t1
srli t1, t0, 8
or t0, t0, t1
srli t1, t0, 16
or t0, t0, t1
my_clz_popcount:
srli t1, t0, 1
li t2, 0x55555555
and t1, t1, t2
sub t0, t0, t1
srli t1, t0, 2
li t2, 0x33333333
and t1, t1, t2
and t2, t0, t2
add t0, t1, t2
srli t1, t0, 4
add t1, t1, t0
li t2, 0x0F0F0F0F
and t0, t1, t2
srli t1, t0, 8
add t0, t0, t1
srli t1, t0, 16
add t0, t0, t1
li t2, 0x3F
and t0, t0, t2
li t1, 32
sub a0, t1, t0
my_clz_epilogue:
jr ra
fp16_to_fp32:
fp16_to_fp32_prologue:
addi sp, sp, -28
sw ra, 0(sp)
sw s0, 4(sp)
sw s1, 8(sp)
sw s2, 12(sp)
sw s3, 16(sp)
sw s4, 20(sp)
sw s5, 24(sp)
fp16_to_fp32_prologue_after:
slli s0, a0, 16
li s1, 0x80000000
and s1, s1, s0
li s2, 0x7FFFFFFF
and s2, s2, s0
mv a0, s2
jal ra, my_clz
li s3, 0
li t0, 5
slt t0, t0, a0
beq t0, x0, fp16_to_fp32_post_overflow_check
addi s3, a0, -5
fp16_to_fp32_post_overflow_check:
li s4, 0x04000000
add s4, s2, s4
srai s4, s4, 8
li t0, 0x7F800000
and s4, s4, t0
addi s5, s2, -1
srai s5, s5, 31
sll t0, s2, s3
srli t0, t0, 3
li t1, 0x70
sub t1, t1, s3
slli t1, t1, 23
add t0, t0, t1
or t0, t0, s4
not t1, s5
and t0, t0, t1
or a0, s1, t0
fp16_to_fp32_epilogue:
lw ra, 0(sp)
lw s0, 4(sp)
lw s1, 8(sp)
lw s2, 12(sp)
lw s3, 16(sp)
lw s4, 20(sp)
lw s5, 24(sp)
addi sp, sp, 28
jr ra
```
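The same bit manipulation can be written as a few lines of integer arithmetic, which gives a reference value for the test. The sketch below is plain Scala with names of our own choosing; it uses `Integer.numberOfLeadingZeros` in place of `my_clz` and an arithmetic shift for the zero mask, so that the mask is all ones when the non-sign bits are zero.

```scala
// Software model of the FP16 -> FP32 bit conversion.
object Fp16ToFp32Model {
  def fp16ToFp32(h: Int): Int = {
    val w = h << 16                     // FP16 bits moved to the upper half
    val sign = w & 0x80000000
    val nonsign = w & 0x7FFFFFFF
    val lz = Integer.numberOfLeadingZeros(nonsign)
    val renormShift = if (lz > 5) lz - 5 else 0 // only denormals need shifting
    // All ones in the exponent field when the input is infinity or NaN.
    val infNanMask = ((nonsign + 0x04000000) >> 8) & 0x7F800000
    // All ones when nonsign == 0 (signed zero), otherwise zero.
    val zeroMask = (nonsign - 1) >> 31
    val body = (((nonsign << renormShift) >>> 3) + ((0x70 - renormShift) << 23)) | infNanMask
    sign | (body & ~zeroMask)
  }

  def main(args: Array[String]): Unit = {
    println(fp16ToFp32(0xFFFF))             // prints -8192 (0xffffe000)
    println(f"0x${fp16ToFp32(0x3C00)}%08x") // prints 0x3f800000 (1.0f)
  }
}
```

This also shows why the `srai` bug mattered for this test: with a logical shift in the `infNanMask` step, the exponent bits of the result are wrong for the NaN input `0xFFFF`.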
### 5. multiply
This RISC-V assembly program performs multiplication using the shift-and-add method, which is a bitwise algorithm. It takes two numbers (multiplier and multiplicand) and calculates their product without using the `mul` instruction. The program handles negative values by converting them to positive before computation and uses a 32-bit loop counter to iterate through each bit of the multiplier. For each bit, it conditionally adds the multiplicand to an accumulator if the bit is 1. The multiplier is shifted right, and the multiplicand is shifted left after each iteration. The result is stored in `a0` at the end, and the program exits.
```asm=
main:
li a1, 6
li a3, 7
li t0, 0
li t1, 32
bltz a1, handle_negative1
j shift_and_add_loop
bltz a3, handle_negative2
j shift_and_add_loop
handle_negative1:
neg a1, a1
handle_negative2:
neg a3, a3
shift_and_add_loop:
beqz t1, end_shift_and_add
andi t2, a1, 1
beqz t2, skip_add
add t0, t0, a3
skip_add:
srai a1, a1, 1
slli a3, a3, 1
addi t1, t1, -1
j shift_and_add_loop
end_shift_and_add:
mv a0, t0
li a7, 10
ecall
```
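The core loop can be modeled in a few lines. The sketch below is plain Scala with a hypothetical function name and covers only the 32-iteration shift-and-add loop, not the sign-handling prologue.

```scala
// Software model of the shift-and-add multiplication loop.
object ShiftAddModel {
  def multiply(multiplier: Int, multiplicand: Int): Int = {
    var a = multiplier   // a1: shifted right each iteration (srai)
    var b = multiplicand // a3: shifted left each iteration (slli)
    var acc = 0          // t0: accumulator
    for (_ <- 0 until 32) { // t1: loop counter
      if ((a & 1) != 0) acc += b
      a >>= 1
      b <<= 1
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    println(multiply(6, 7)) // prints 42
  }
}
```

Because the loop visits all 32 bits and the arithmetic wraps modulo 2^32, this model also returns the correct low 32 bits for negative operands.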
### MainTest.scala
```scala=
class TOPTest extends FreeSpec with ChiselScalatestTester {
  "argmax test" in {
    test(new PIPELINE("/home/mi2s/FProject/compilation/argmax.txt")) { x =>
      x.clock.step(69)
      x.io.out.expect(1.S)
    }
  }
  "clz test" in {
    test(new PIPELINE("/home/mi2s/FProject/compilation/clz.txt")) { x =>
      x.clock.step(200)
      x.io.out.expect(15.S)
    }
  }
  "fabsf test" in {
    test(new PIPELINE("/home/mi2s/FProject/test_compilation/fabsf.txt")) { x =>
      x.clock.step(200)
      x.io.out.expect(2147483647.S)
    }
  }
  "fp16_to_32 test" in {
    test(new PIPELINE("/home/mi2s/FProject/test_compilation/fp16_to_32.txt")) { x =>
      x.clock.step(107)
      x.io.out.expect(-8192.S)
    }
  }
  "multiply test" in {
    test(new PIPELINE("/home/mi2s/FProject/compilation/multiply.txt")) { x =>
      x.clock.step(370)
      x.io.out.expect(42.S)
    }
  }
}
```
### Test Result
```
[info] TOPTest:
[info] - argmax test
[info] - clz test
[info] - fabsf test
[info] - fp16_to_32 test
[info] - multiply test
[info] Run completed in 4 seconds, 621 milliseconds.
[info] Total number of tests run: 6
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 8 s, completed Jan 23, 2025, 6:23:41 PM
```
---
## D. Chisel Tutorial
- **Construct RISC-V CPU**
- `sbt test`
:::info
To save the execution history as a file, use `sbt test > <filename.txt>`.
:::
---
## E. RISC-V Compilation
- **Compiler Environment Setup**
- `git clone https://github.com/riscv/riscv-gnu-toolchain`
- `cd riscv-gnu-toolchain`
- `sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ninja-build`
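- `./configure --prefix=/opt/riscv`
- `make` (the default Newlib build provides the `riscv64-unknown-elf-*` tools used below, whereas `make linux` builds the `riscv64-unknown-linux-gnu-*` tools)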
- **Program Compilation**
1. **Conversion (`*.s` to `*.elf`)**
- **Command**: `riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib -o <out_name>.elf <in_name>.s`
:::info
For RISC-V programs utilizing the M-extension, change to `-march=rv32im`.
:::
2. **Conversion (`*.elf` to `*.bin`)**
- **Command:** `riscv64-unknown-elf-objcopy -O binary <in_name>.elf <out_name>.bin`
3. **Conversion (`*.elf` to `*.hex`)**
- **Command:** `riscv64-unknown-elf-objcopy -O verilog <in_name>.elf <out_name>.hex`
:::warning
The generated file must be post-processed before it can be loaded: its bytes are emitted in little-endian order, and the file contains special characters and whitespace.
:::
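One possible post-processing step is sketched below in plain Scala. It assumes, without confirmation from the repository, that the target format is one 32-bit hexadecimal word per line, and that the `objcopy -O verilog` output consists of `@address` marker lines followed by whitespace-separated bytes. The object and function names are hypothetical.

```scala
// Converts `objcopy -O verilog` text (little-endian bytes) into 32-bit words.
// Assumptions: "@..." lines are address markers and can be dropped; the byte
// count is a multiple of 4; the consumer expects one hex word per line.
object VerilogHexToWords {
  def toWords(text: String): Vector[String] = {
    val bytes = text
      .split("\\r?\\n")
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("@"))
      .flatMap(_.split("\\s+"))
    // Each group of 4 bytes is least-significant first; reverse it to get the word.
    bytes.grouped(4).map(group => group.reverse.mkString.toLowerCase).toVector
  }

  def main(args: Array[String]): Unit = {
    val sample = "@00000000\n13 00 00 00 B7 02 80 7F\n"
    toWords(sample).foreach(println)
    // prints:
    // 00000013
    // 7f8002b7
  }
}
```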