蕭郁霖, 徐向廷
Filepath: src/main/scala/Pipeline/UNits/RegisterFile.scala
class RegisterFile extends Module {
val io = IO(new Bundle {
val rs1 = Input(UInt(5.W))
val rs2 = Input(UInt(5.W))
val reg_write = Input(Bool())
val w_reg = Input(UInt(5.W))
val w_data = Input(SInt(32.W))
val rdata1 = Output(SInt(32.W))
val rdata2 = Output(SInt(32.W))
})
val regfile = RegInit(VecInit(Seq.fill(32)(0.S(32.W))))
io.rdata1 := Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1))
io.rdata2 := Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2))
when(io.reg_write && io.w_reg =/= 0.U) {
regfile(io.w_reg) := io.w_data
}
}
The code snippet defines a `RegisterFile` module for a RISC-V pipeline with seven ports: five inputs (`rs1`, `rs2`, `reg_write`, `w_reg`, `w_data`) and two outputs (`rdata1`, `rdata2`). As in the classic MIPS pipeline, the register file supports two read registers (`rs1` and `rs2`) and a single write register. The register file is instantiated with 32 registers, all reset to 0. The outputs `rdata1` and `rdata2` are driven combinationally from the registers selected by `rs1` and `rs2`, respectively, with a special check so that reading register 0 always returns 0. For write operations, if the `reg_write` flag is asserted and the target register (`w_reg`) is not zero, the corresponding register is updated with the value on `w_data` at the next clock edge. The following image illustrates the seven ports that facilitate these operations in the `RegisterFile` unit.
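The read and write rules described above can be sketched as a plain-Scala behavioural model. This is only an illustration and not part of the repository; the names `RegFileModel` and `RegFile` are made up for this sketch, and clocking is reduced to an explicit `write` call.

```scala
// Plain-Scala behavioural model (no Chisel); names are illustrative only.
object RegFileModel {
  final class RegFile {
    private val regs = Array.fill(32)(0)

    // Reads are immediate; register 0 always reads as 0.
    def read(rs: Int): Int = if (rs == 0) 0 else regs(rs)

    // A write takes effect only when enabled and the target is not register 0.
    def write(regWrite: Boolean, rd: Int, data: Int): Unit =
      if (regWrite && rd != 0) regs(rd) = data
  }
}
```

Writing to register 0 is silently dropped in this model, which matches the `io.w_reg =/= 0.U` guard in the module.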
Filepath: src/main/scala/Pipeline/UNits/PC.scala
class PC extends Module {
val io = IO (new Bundle {
val in = Input(SInt(32.W))
val out = Output(SInt(32.W))
})
val PC = RegInit(0.S(32.W))
io.out := PC
PC := io.in
}
The code snippet implements a program counter (`PC`) module that holds the current program counter value. It uses `RegInit` to reset the register to 0, loads the input (`io.in`) into the register on every clock cycle, and exposes the stored value on `io.out`.
Filepath: src/main/scala/Pipeline/UNits/PC4.scala
class PC4 extends Module {
val io = IO (new Bundle {
val pc = Input(UInt(32.W))
val out = Output(UInt(32.W))
})
// Address of the next sequential instruction: current PC plus 4 bytes
io.out := io.pc + 4.U(32.W)
}
This snippet defines a `PC4` module, which computes the address of the next sequential instruction by adding 4 to the current PC input (`io.pc`). Each instruction in this design is 32 bits (4 bytes) wide, so this increment is what drives sequential instruction execution in the pipeline.
Filepath: src/main/scala/Pipeline/UNits/JALR.scala
class Jalr extends Module {
val io = IO(new Bundle {
val imme = Input(UInt(32.W))
val rdata1 = Input(UInt(32.W))
val out = Output(UInt(32.W))
})
val computedAddr = io.imme + io.rdata1
// Align the address by masking the least significant bit (LSB) to 0
io.out := computedAddr & "hFFFFFFFE".U
}
The code snippet above implements the target-address calculation for the jump-and-link-register (`JALR`) instruction. The module adds a register value (`rdata1`, possibly forwarded) to an immediate offset (`imme`), then applies the mask `0xFFFFFFFE` to force the least significant bit (LSB) to 0, as the RISC-V specification requires for `JALR`. The resulting jump address is provided on `io.out`.
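The same calculation can be written in plain Scala as a quick sanity check. This is a minimal sketch, not code from the repository; `JalrModel` and `target` are hypothetical names, and 32-bit wrap-around is modelled with Scala's `Int` arithmetic.

```scala
// Plain-Scala model of the JALR target calculation (illustrative names).
object JalrModel {
  // Add the register value and the immediate (wrapping at 32 bits), then clear bit 0.
  def target(rs1: Int, imm: Int): Int = (rs1 + imm) & 0xFFFFFFFE
}
```

For instance, a register value of `0x1001` with an offset of 4 gives `0x1005`, which the mask turns into `0x1004`.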
Filepath: src/main/scala/Pipeline/UNits/ImmGenerator.scala
class ImmGenerator extends Module {
val io = IO(new Bundle {
val instr = Input(UInt(32.W))
val pc = Input(UInt(32.W))
val I_type = Output(SInt(32.W))
val S_type = Output(SInt(32.W))
val SB_type = Output(SInt(32.W))
val U_type = Output(SInt(32.W))
val UJ_type = Output(SInt(32.W))
})
// I-Type Immediate: [31:20] sign-extended to 32 bits
io.I_type := Cat(Fill(20, io.instr(31)), io.instr(31, 20)).asSInt
// S-Type Immediate: [31:25][11:7] sign-extended to 32 bits
io.S_type := Cat(Fill(20, io.instr(31)), io.instr(31, 25), io.instr(11, 7)).asSInt
// Branch-Type Immediate: [31][7][30:25][11:8] sign-extended to 32 bits
val sbImm = Cat(Fill(19, io.instr(31)), io.instr(31), io.instr(7), io.instr(30, 25), io.instr(11, 8), 0.U(1.W)).asSInt
io.SB_type := sbImm + io.pc.asSInt
// U-Type Immediate: [31:12] shifted left by 12 bits
io.U_type := Cat(io.instr(31, 12), Fill(12, 0.U)).asSInt
// UJ-Type Immediate: [31][19:12][20][30:21] sign-extended to 32 bits, shifted left by 1 bit
val ujImm = Cat(Fill(11, io.instr(31)), io.instr(31), io.instr(19, 12), io.instr(20), io.instr(30, 21), 0.U(1.W)).asSInt
io.UJ_type := ujImm + io.pc.asSInt
}
The code snippet generates the 32-bit immediate values for each RISC-V instruction format. For I-type instructions, it extracts bits `[31:20]` and sign-extends them to 32 bits. For S-type instructions, the immediate is formed by concatenating bits `[31:25]` with bits `[11:7]` and sign-extending the result. For branch (SB-type) instructions, the immediate is built by concatenating bit 31, bit 7, bits `[30:25]`, and bits `[11:8]`, appending a 0 as the least significant bit because branch offsets are multiples of 2, and sign-extending the result. For U-type instructions, bits `[31:12]` form the upper 20 bits of the immediate and the lower 12 bits are zero. Finally, for UJ-type instructions, the immediate is generated by concatenating bit 31, bits `[19:12]`, bit 20, and bits `[30:21]`, appending a trailing 0, and sign-extending the result to 32 bits.
Additionally, the module computes target addresses for control-flow instructions from these immediates. The output `io.SB_type` is the branch target address: the sign-extended branch immediate added to the current program counter (`io.pc`), which gives a PC-relative address. Similarly, `io.UJ_type` is the jump target address for UJ-type instructions, obtained by adding the corresponding immediate to the current PC. These two outputs therefore carry addresses rather than raw immediates, whereas `io.I_type`, `io.S_type`, and `io.U_type` carry the immediates themselves.
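The bit shuffling above is easy to get wrong, so here is a plain-Scala reference model of the five immediate formats that can be checked against hand-assembled instructions. It is a sketch under the standard RV32I encodings; `ImmDecode` and its helper names are hypothetical and do not appear in the repository.

```scala
// Plain-Scala reference model of RV32I immediate decoding (illustrative names).
object ImmDecode {
  // Bits hi..lo (inclusive) of a 32-bit word, as a non-negative Int (width < 32).
  def bits(x: Int, hi: Int, lo: Int): Int = (x >>> lo) & ((1 << (hi - lo + 1)) - 1)

  // Sign-extend the low `width` bits of x to 32 bits.
  def sext(x: Int, width: Int): Int = (x << (32 - width)) >> (32 - width)

  def immI(instr: Int): Int = sext(bits(instr, 31, 20), 12)

  def immS(instr: Int): Int = sext((bits(instr, 31, 25) << 5) | bits(instr, 11, 7), 12)

  // Branch offset: imm[12|11|10:5|4:1] with an implicit 0 in bit 0.
  def immB(instr: Int): Int = sext(
    (bits(instr, 31, 31) << 12) | (bits(instr, 7, 7) << 11) |
      (bits(instr, 30, 25) << 5) | (bits(instr, 11, 8) << 1), 13)

  def immU(instr: Int): Int = bits(instr, 31, 12) << 12

  // Jump offset: imm[20|19:12|11|10:1] with an implicit 0 in bit 0.
  def immJ(instr: Int): Int = sext(
    (bits(instr, 31, 31) << 20) | (bits(instr, 19, 12) << 12) |
      (bits(instr, 20, 20) << 11) | (bits(instr, 30, 21) << 1), 21)

  // What io.SB_type and io.UJ_type carry: PC-relative target addresses.
  def branchTarget(pc: Int, instr: Int): Int = pc + immB(instr)
  def jumpTarget(pc: Int, instr: Int): Int = pc + immJ(instr)
}
```

As a usage example, `0xFE000EE3` encodes `beq x0, x0, -4`, so `ImmDecode.immB(0xFE000EE3)` is -4 and `ImmDecode.branchTarget(0x100, 0xFE000EE3)` is `0xFC`.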
Filepath: src/main/scala/Pipeline/UNits/control.scala
class Control extends Module {
val io = IO(new Bundle {
val opcode = Input(UInt(7.W)) // 7-bit opcode
val mem_write = Output(Bool()) // asserted when the instruction writes to data memory (stores)
val branch = Output(Bool()) // asserted for conditional branch instructions
val mem_read = Output(Bool()) // asserted when the instruction reads from data memory (loads)
val reg_write = Output(Bool()) // asserted when the instruction writes a register
val men_to_reg = Output(Bool()) // asserted when the register write data comes from memory (loads)
val alu_operation = Output(UInt(3.W))
val operand_A = Output(UInt(2.W)) // Operand A source selection for the ALU
val operand_B = Output(Bool()) // Operand B source selection for the ALU
// Indicates the type of extension to be used (e.g., sign-extend, zero-extend)
val extend = Output(UInt(2.W))
val next_pc_sel = Output(UInt(2.W)) // next PC source: 0 = PC+4, 1 = branch, 2 = JAL, 3 = JALR
})
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 0.B
io.men_to_reg := 0.B
io.alu_operation := 0.U
io.operand_A := 0.U
io.operand_B := 0.B
io.extend := 0.U
io.next_pc_sel := 0.U
switch(io.opcode) {
// R type instructions (e.g., add, sub)
is(51.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 1.B
io.men_to_reg := 0.B
io.alu_operation := 0.U
io.operand_A := 0.U
io.operand_B := 0.B
io.extend := 0.U
io.next_pc_sel := 0.U
}
// I type instructions (e.g., immediate operations)
is(19.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 1.B
io.men_to_reg := 0.B
io.alu_operation := 1.U
io.operand_A := 0.U
io.operand_B := 1.B
io.extend := 0.U
io.next_pc_sel := 0.U
}
// S type instructions (e.g., store operations)
is(35.U) {
io.mem_write := 1.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 0.B
io.men_to_reg := 0.B
io.alu_operation := 5.U
io.operand_A := 0.U
io.operand_B := 1.B
io.extend := 1.U
io.next_pc_sel := 0.U
}
// Load instructions (e.g., load data from memory)
is(3.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 1.B
io.reg_write := 1.B
io.men_to_reg := 1.B
io.alu_operation := 4.U
io.operand_A := 0.U
io.operand_B := 1.B
io.extend := 0.U
io.next_pc_sel := 0.U
}
// SB type instructions (e.g., conditional branch)
is(99.U) {
io.mem_write := 0.B
io.branch := 1.B
io.mem_read := 0.B
io.reg_write := 0.B
io.men_to_reg := 0.B
io.alu_operation := 2.U
io.operand_A := 0.U
io.operand_B := 0.B
io.extend := 0.U
io.next_pc_sel := 1.U
}
// UJ type instructions (e.g., jump and link)
is(111.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 1.B
io.men_to_reg := 0.B
io.alu_operation := 3.U
io.operand_A := 1.U
io.operand_B := 0.B
io.extend := 0.U
io.next_pc_sel := 2.U
}
// Jalr instruction (e.g., jump and link register)
is(103.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 1.B
io.men_to_reg := 0.B
io.alu_operation := 3.U
io.operand_A := 1.U
io.operand_B := 0.B
io.extend := 0.U
io.next_pc_sel := 3.U
}
// U type (LUI) instructions (e.g., load upper immediate)
is(55.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 1.B
io.men_to_reg := 0.B
io.alu_operation := 6.U
io.operand_A := 3.U
io.operand_B := 1.B
io.extend := 2.U
io.next_pc_sel := 0.U
}
// U type (AUIPC) instructions (e.g., add immediate to PC)
is(23.U) {
io.mem_write := 0.B
io.branch := 0.B
io.mem_read := 0.B
io.reg_write := 1.B
io.men_to_reg := 0.B
io.alu_operation := 7.U
io.operand_A := 2.U
io.operand_B := 1.B
io.extend := 2.U
io.next_pc_sel := 0.U
}
}
}
The code snippet above implements the control unit for a 5-stage RISC-V pipeline. This module generates the control signals that steer the processor's datapath: memory write, branch, memory read, register write, memory-to-register, ALU operation, operand selection, extension type, and next-PC selection. Every output defaults to 0, and a `switch` on the 7-bit opcode then assigns the values for each instruction type (R-type, I-type, S-type, load, SB-type, UJ-type, JALR, LUI, and AUIPC). An opcode that matches none of the cases therefore leaves all control signals deasserted. The accompanying diagram and mapping table illustrate how these signals are routed to the hardware components in the pipeline.
Label | Signal Name (Code) | Signal Name (Diagram) |
---|---|---|
1 | io.mem_write | MemWrite |
2 | io.branch | Branch |
3 | io.mem_read | MemRead |
4 | io.reg_write | RegWrite |
5 | io.men_to_reg | MemtoReg |
6 | io.alu_operation | ALUSrc |
7 | io.operand_A | ALUOp1 |
8 | io.operand_B | ALUOp0 |
Filepath: src/main/scala/Pipeline/UNits/BRANCH.scala
class Branch extends Module {
val io = IO(new Bundle {
val fnct3 = Input(UInt(3.W))
val branch = Input(Bool())
val arg_x = Input(SInt(32.W))
val arg_y = Input(SInt(32.W))
val br_taken = Output(Bool())
})
io.br_taken := false.B
when(io.branch) {
// beq
when(io.fnct3 === 0.U) {
io.br_taken := io.arg_x === io.arg_y
}
// bne
.elsewhen(io.fnct3 === 1.U) {
io.br_taken := io.arg_x =/= io.arg_y
}
// blt
.elsewhen(io.fnct3 === 4.U) {
io.br_taken := io.arg_x < io.arg_y
}
// bge
.elsewhen(io.fnct3 === 5.U) {
io.br_taken := io.arg_x >= io.arg_y
}
// bltu (unsigned less than)
.elsewhen(io.fnct3 === 6.U) {
io.br_taken := io.arg_x.asUInt < io.arg_y.asUInt
}
// bgeu (unsigned greater than or equal)
.elsewhen(io.fnct3 === 7.U) {
io.br_taken := io.arg_x.asUInt >= io.arg_y.asUInt
}
}
}
The code snippet implements the branch decision logic for RISC-V's conditional branch instructions: `beq`, `bne`, `blt`, `bge`, `bltu`, and `bgeu`. It has four input ports: `io.fnct3`, which selects the branch condition from the instruction's funct3 field; `io.branch`, a Boolean flag indicating that the current instruction is an SB-type branch; and `io.arg_x` and `io.arg_y`, the two operands to compare. When `io.branch` is asserted, the module evaluates the comparison selected by `fnct3` (signed for `blt` and `bge`, unsigned for `bltu` and `bgeu`) and sets the output `io.br_taken` to true if the condition holds. In every other case, including the unused `fnct3` values 2 and 3, `io.br_taken` stays false.
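The difference between the signed and unsigned comparisons is the part most easily overlooked, so the following plain-Scala sketch mirrors the decision table. It is an illustration only; `BranchModel` and `taken` are hypothetical names, and `Integer.compareUnsigned` from the Java standard library stands in for the `asUInt` comparisons.

```scala
// Plain-Scala model of the branch decision (illustrative names).
object BranchModel {
  def taken(branch: Boolean, funct3: Int, x: Int, y: Int): Boolean =
    branch && (funct3 match {
      case 0 => x == y                               // beq
      case 1 => x != y                               // bne
      case 4 => x < y                                // blt (signed)
      case 5 => x >= y                               // bge (signed)
      case 6 => Integer.compareUnsigned(x, y) < 0    // bltu
      case 7 => Integer.compareUnsigned(x, y) >= 0   // bgeu
      case _ => false                                // funct3 values 2 and 3 are unused
    })
}
```

With operands -1 and 1, `blt` is taken because -1 is less than 1 as a signed value, while `bltu` is not taken because -1 is `0xFFFFFFFF` when read as unsigned.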
Filepath: src/main/scala/Pipeline/UNits/Alu_Control.scala
class AluControl extends Module {
val io = IO(new Bundle {
val func3 = Input(UInt(3.W))
val func7 = Input(Bool())
val aluOp = Input(UInt(3.W))
val out = Output(UInt(5.W))
})
io.out := 0.U
// R type
when(io.aluOp === 0.U) {
io.out := Cat(0.U(2.W), io.func7, io.func3)
// I type
}.elsewhen(io.aluOp === 1.U) {
io.out := Cat("b00".U(2.W), io.func3)
// SB type
}.elsewhen(io.aluOp === 2.U) {
io.out := Cat("b010".U(3.W), io.func3)
// UJ type and JALR (jumps; aluOp 3 in the Control unit)
}.elsewhen(io.aluOp === 3.U) {
io.out := "b11111".U
// Loads, S type, U type (lui), U type (auipc)
}.elsewhen(io.aluOp === 4.U || io.aluOp === 5.U || io.aluOp === 6.U || io.aluOp === 7.U) {
io.out := "b00000".U
} .otherwise {
io.out := 0.U
}
}
The code snippet above implements the ALU control unit for a RISC-V pipeline, as illustrated in the diagram below. This unit has three input ports, `func3`, `func7` (a single bit), and `aluOp` (a signal provided by the main control unit), and one output port, `io.out`. The 5-bit output depends on the instruction type. R-type instructions concatenate the `func7` bit with `func3`, I-type instructions prepend two zero bits to `func3`, and branch (SB-type) instructions prepend the fixed pattern `010` to `func3`. Jumps are assigned the constant `11111`, while loads, stores, LUI, and AUIPC are assigned `00000`, the code that the ALU treats as an addition.
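To make the encoding concrete, the mapping can be modelled in plain Scala and compared with the constants in `AluOpCode`. This is a sketch for illustration; `AluControlModel` and `aluCtl` are hypothetical names, and `func7` is the single bit that the module receives.

```scala
// Plain-Scala model of the ALU control encoding (illustrative names).
object AluControlModel {
  def aluCtl(aluOp: Int, func7: Boolean, func3: Int): Int = aluOp match {
    case 0 => ((if (func7) 1 else 0) << 3) | func3  // R type: func7 bit, then func3
    case 1 => func3                                  // I type: 00, then func3
    case 2 => (2 << 3) | func3                       // SB type: 010, then func3 (fits in 5 bits)
    case 3 => 31                                     // jumps: 11111
    case _ => 0                                      // aluOp 4 to 7: loads, stores, LUI, AUIPC
  }
}
```

For example, `aluCtl(0, true, 0)` is 8 (`ALU_SUB`) and `aluCtl(0, true, 5)` is 13 (`ALU_SRA`). In this model the I-type path ignores `func7`, so `func3 = 5` yields 5 (`ALU_SRLI`) whatever the value of `func7`, which is worth keeping in mind when reading the `ALU_SRAI` constant.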
Filepath: src/main/scala/Pipeline/UNits/Alu.scala
object AluOpCode {
val ALU_ADD = 0.U(5.W)
val ALU_ADDI = 0.U(5.W)
val ALU_SW = 0.U(5.W)
val ALU_LW = 0.U(5.W)
val ALU_LUI = 0.U(5.W)
val ALU_AUIPC = 0.U(5.W)
val ALU_SLL = 1.U(5.W)
val ALU_SLLI = 1.U(5.W)
val ALU_SLT = 2.U(5.W)
val ALU_SLTI = 2.U(5.W)
val ALU_SLTU = 3.U(5.W)
val ALU_SLTUI = 3.U(5.W)
val ALU_XOR = 4.U(5.W)
val ALU_XORI = 4.U(5.W)
val ALU_SRL = 5.U(5.W)
val ALU_SRLI = 5.U(5.W)
val ALU_OR = 6.U(5.W)
val ALU_ORI = 6.U(5.W)
val ALU_AND = 7.U(5.W)
val ALU_ANDI = 7.U(5.W)
val ALU_SUB = 8.U(5.W)
val ALU_SRA = 13.U(5.W)
val ALU_SRAI = 13.U(5.W)
val ALU_JAL = 31.U(5.W)
val ALU_JALR = 31.U(5.W)
}
class ALU extends Module {
import AluOpCode._ // bring the ALU_* constants into scope for the switch below
val io = IO(new Bundle {
val in_A = Input(SInt(32.W))
val in_B = Input(SInt(32.W))
val alu_Op = Input(UInt(5.W))
val out = Output(SInt(32.W))
})
val result = WireDefault(0.S(32.W))
switch(io.alu_Op) {
is(ALU_ADD, ALU_ADDI, ALU_SW, ALU_LW, ALU_LUI, ALU_AUIPC) {
result := io.in_A + io.in_B
}
is(ALU_SLL, ALU_SLLI) {
result := (io.in_A.asUInt << io.in_B(4, 0)).asSInt
}
is(ALU_SLT, ALU_SLTI) {
result := Mux(io.in_A < io.in_B, 1.S, 0.S)
}
is(ALU_SLTU, ALU_SLTUI) {
result := Mux(io.in_A.asUInt < io.in_B.asUInt, 1.S, 0.S)
}
is(ALU_XOR, ALU_XORI) {
result := io.in_A ^ io.in_B
}
is(ALU_SRL, ALU_SRLI) {
result := (io.in_A.asUInt >> io.in_B(4, 0)).asSInt
}
is(ALU_OR, ALU_ORI) {
result := io.in_A | io.in_B
}
is(ALU_AND, ALU_ANDI) {
result := io.in_A & io.in_B
}
is(ALU_SUB) {
result := io.in_A - io.in_B
}
is(ALU_SRA, ALU_SRAI) {
result := (io.in_A >> io.in_B(4, 0)).asSInt
}
is(ALU_JAL, ALU_JALR) {
result := io.in_A
}
}
io.out := result
}
The code snippet implements the ALU for a RISC-V pipeline, which executes arithmetic and logical operations according to the instruction type. The module has three input ports: two operands (`io.in_A` and `io.in_B`) and an operation code (`io.alu_Op`) from the ALU control unit. The result is output on `io.out`. For example, when `io.alu_Op` equals `ALU_ADD` or `ALU_ADDI` (or any of the load, store, LUI, and AUIPC codes, which share the value 0), the module computes the sum of `io.in_A` and `io.in_B` and drives it onto `io.out`. Codes with no matching case produce 0, the default value of `result`.
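The three shift flavours and the two set-less-than variants are the operations where signedness matters, so a plain-Scala model of the same opcode table may help when checking results by hand. It is only a sketch; `AluModel` and `alu` are hypothetical names, and the opcode numbers are the values listed in `AluOpCode`.

```scala
// Plain-Scala model of the ALU operations (illustrative names).
object AluModel {
  def alu(op: Int, a: Int, b: Int): Int = {
    val shamt = b & 0x1F // corresponds to io.in_B(4, 0)
    op match {
      case 0  => a + b                                            // ADD and address or immediate adds
      case 1  => a << shamt                                       // SLL
      case 2  => if (a < b) 1 else 0                              // SLT (signed)
      case 3  => if (Integer.compareUnsigned(a, b) < 0) 1 else 0  // SLTU
      case 4  => a ^ b                                            // XOR
      case 5  => a >>> shamt                                      // SRL (logical)
      case 6  => a | b                                            // OR
      case 7  => a & b                                            // AND
      case 8  => a - b                                            // SUB
      case 13 => a >> shamt                                       // SRA (arithmetic)
      case 31 => a                                                // JAL and JALR pass in_A through
      case _  => 0                                                // default value of result
    }
  }
}
```

Shifting -8 right by one gives `0x7FFFFFFC` with the logical shift (code 5) and -4 with the arithmetic shift (code 13).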
Since the RISC-V pipeline consists of five stages, it requires four sets of pipeline registers. These registers are encapsulated in modules labeled `IF/ID`, `ID/EX`, `EX/MEM`, and `MEM/WB`, where the slash separates the two adjacent stages that the register bridges. These pipeline registers are colored orange in the illustration below.
Filepath: src/main/scala/Pipeline/Pipelines/IF_ID.scala
class IF_ID extends Module {
val io = IO(new Bundle {
val pc_in = Input (SInt(32.W)) // PC in
val pc4_in = Input (UInt(32.W)) // PC4 in
val SelectedPC = Input (SInt(32.W))
val SelectedInstr = Input (UInt(32.W))
val pc_out = Output (SInt(32.W)) // PC out
val pc4_out = Output (UInt(32.W)) // PC + 4 out
val SelectedPC_out = Output (SInt(32.W))
val SelectedInstr_out = Output (UInt(32.W))
})
val Pc_In = RegInit (0.S (32.W))
val Pc4_In = RegInit (0.U (32.W))
val S_pc = RegInit (0.S (32.W))
val S_instr = RegInit (0.U (32.W))
Pc_In := io.pc_in
Pc4_In := io.pc4_in
S_pc := io.SelectedPC
S_instr := io.SelectedInstr
io.pc_out := Pc_In
io.pc4_out := Pc4_In
io.SelectedPC_out := S_pc
io.SelectedInstr_out := S_instr
// io.pc_out := RegNext(io.pc_in)
// io.pc4_out := RegNext(io.pc4_in)
// io.SelectedPC_out := RegNext(io.SelectedPC)
// io.SelectedInstr_out := RegNext(io.SelectedInstr)
}
Although the illustration above shows only three register ports at `IF/ID`, the design also takes hazard detection into account (discussed later). In this context, the `SelectedPC` signal represents the program counter after hazard resolution. Consequently, the `IF/ID` pipeline register stores four values: `io.pc_in`, `io.pc4_in`, `io.SelectedPC`, and `io.SelectedInstr`. Each value is held in a register created with `RegInit`, which gives it a reset value of 0.
Filepath: src/main/scala/Pipeline/Pipelines/ID_EX.scala
class ID_EX extends Module {
val io = IO(new Bundle {
val rs1_in = Input(UInt(5.W))
val rs2_in = Input(UInt(5.W))
val rs1_data_in = Input(SInt(32.W))
val rs2_data_in = Input(SInt(32.W))
val imm = Input(SInt(32.W))
val rd_in = Input(UInt(5.W))
val func3_in = Input(UInt(3.W))
val func7_in = Input(Bool())
val ctrl_MemWr_in = Input(Bool())
val ctrl_Branch_in = Input(Bool())
val ctrl_MemRd_in = Input(Bool())
val ctrl_Reg_W_in = Input(Bool())
val ctrl_MemToReg_in = Input(Bool())
val ctrl_AluOp_in = Input(UInt(3.W))
val ctrl_OpA_in = Input(UInt(2.W))
val ctrl_OpB_in = Input(Bool())
val ctrl_nextpc_in = Input(UInt(2.W))
val IFID_pc4_in = Input(UInt(32.W))
val rs1_out = Output(UInt(5.W))
val rs2_out = Output(UInt(5.W))
val rs1_data_out = Output(SInt(32.W))
val rs2_data_out = Output(SInt(32.W))
val rd_out = Output(UInt(5.W))
val imm_out = Output(SInt(32.W))
val func3_out = Output(UInt(3.W))
val func7_out = Output(Bool())
val ctrl_MemWr_out = Output(Bool())
val ctrl_Branch_out = Output(Bool())
val ctrl_MemRd_out = Output(Bool())
val ctrl_Reg_W_out = Output(Bool())
val ctrl_MemToReg_out = Output(Bool())
val ctrl_AluOp_out = Output(UInt(3.W))
val ctrl_OpA_out = Output(UInt(2.W))
val ctrl_OpB_out = Output(Bool())
val ctrl_nextpc_out = Output(UInt(2.W))
val IFID_pc4_out = Output(UInt(32.W))
})
io.rs1_out := RegNext(io.rs1_in)
io.rs2_out := RegNext(io.rs2_in)
io.rs1_data_out := RegNext(io.rs1_data_in)
io.rs2_data_out := RegNext(io.rs2_data_in)
io.imm_out := RegNext(io.imm)
io.rd_out := RegNext(io.rd_in)
io.func3_out := RegNext(io.func3_in)
io.func7_out := RegNext(io.func7_in)
io.ctrl_MemWr_out := RegNext(io.ctrl_MemWr_in)
io.ctrl_Branch_out := RegNext(io.ctrl_Branch_in)
io.ctrl_MemRd_out := RegNext(io.ctrl_MemRd_in)
io.ctrl_Reg_W_out := RegNext(io.ctrl_Reg_W_in)
io.ctrl_MemToReg_out := RegNext(io.ctrl_MemToReg_in)
io.ctrl_AluOp_out := RegNext(io.ctrl_AluOp_in)
io.ctrl_OpA_out := RegNext(io.ctrl_OpA_in)
io.ctrl_OpB_out := RegNext(io.ctrl_OpB_in)
io.ctrl_nextpc_out := RegNext(io.ctrl_nextpc_in)
io.IFID_pc4_out := RegNext(io.IFID_pc4_in)
}
The code snippet implements the `ID/EX` pipeline register, which captures the values needed by the execution stage. In particular, it holds the operand data (`rs1_data` and `rs2_data`), the incremented program counter (`IFID_pc4`), and the immediate value (`imm`).
Additionally, it carries the nine control signals generated during instruction decode, so that they travel down the pipeline together with the instruction. The register numbers `rs1`, `rs2`, and `rd` are stored to support data forwarding when hazards occur, and the function fields `func3` and `func7` are stored for later stages such as the ALU control unit.
`RegNext` is used here instead of `RegInit`. `RegNext(x)` creates a register whose input is `x`, so each output is simply the corresponding input delayed by one clock cycle. In this one-argument form the register has no reset value, so the outputs are not defined until the first clock edge.
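The one-cycle delay of a register without a reset value can be pictured with a small plain-Scala model. This is purely illustrative and not how Chisel implements registers; `PipeRegModel`, `DelayReg`, and `tick` are hypothetical names, and `None` stands for "no defined value yet".

```scala
// Plain-Scala picture of a register with no reset value (illustrative names).
object PipeRegModel {
  final class DelayReg[T] {
    private var q: Option[T] = None // nothing captured before the first clock edge

    // Value currently visible at the register output.
    def out: Option[T] = q

    // One clock edge: capture the input so it becomes visible afterwards.
    def tick(d: T): Unit = { q = Some(d) }
  }
}
```

After `tick(7)` the output is `Some(7)`, and it stays there until the next `tick` replaces it, which is how each stage's values reach the following stage one cycle later.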
Filepath: src/main/scala/Pipeline/Pipelines/EX_MEM.scala
class EX_MEM extends Module {
val io = IO(new Bundle {
val IDEX_MEMRD = Input(Bool())
val IDEX_MEMWR = Input(Bool())
val IDEX_MEMTOREG = Input(Bool())
val IDEX_REG_W = Input(Bool())
val IDEX_rs2 = Input(SInt(32.W))
val IDEX_rd = Input(UInt(5.W))
val alu_out = Input(SInt(32.W))
val EXMEM_memRd_out = Output(Bool())
val EXMEM_memWr_out = Output(Bool())
val EXMEM_memToReg_out = Output(Bool())
val EXMEM_reg_w_out = Output(Bool())
val EXMEM_rs2_out = Output(SInt(32.W))
val EXMEM_rd_out = Output(UInt(5.W))
val EXMEM_alu_out = Output(SInt(32.W))
})
io.EXMEM_memRd_out := RegNext(io.IDEX_MEMRD)
io.EXMEM_memWr_out := RegNext(io.IDEX_MEMWR)
io.EXMEM_memToReg_out := RegNext(io.IDEX_MEMTOREG)
io.EXMEM_reg_w_out := RegNext(io.IDEX_REG_W)
io.EXMEM_rs2_out := RegNext(io.IDEX_rs2)
io.EXMEM_rd_out := RegNext(io.IDEX_rd)
io.EXMEM_alu_out := RegNext(io.alu_out)
}
The code snippet above implements the `EX/MEM` pipeline register, which transfers data and control signals from the execution stage (`EX`) to the memory stage (`MEM`). The control signals `memRd`, `memWr`, and `memToReg` are preserved to drive the memory access and the later write-back selection. The ALU result (`alu_out`) and the 32-bit `rs2` value (`EXMEM_rs2_out`) are stored as well, together with the `reg_w_out` and `rd_out` signals, which are needed for hazard detection and data forwarding in later pipeline stages.
Filepath: src/main/scala/Pipeline/Pipelines/MEM_WB.scala
class MEM_WB extends Module {
val io = IO(new Bundle {
val EXMEM_MEMTOREG = Input(Bool())
val EXMEM_REG_W = Input(Bool())
val EXMEM_MEMRD = Input(Bool())
val EXMEM_rd = Input(UInt(5.W))
val in_dataMem_out = Input(SInt(32.W))
val in_alu_out = Input(SInt(32.W))
val MEMWB_memToReg_out = Output(Bool())
val MEMWB_reg_w_out = Output(Bool())
val MEMWB_memRd_out = Output(Bool())
val MEMWB_rd_out = Output(UInt(5.W))
val MEMWB_dataMem_out = Output(SInt(32.W))
val MEMWB_alu_out = Output(SInt(32.W))
})
io.MEMWB_memToReg_out := RegNext(io.EXMEM_MEMTOREG)
io.MEMWB_reg_w_out := RegNext(io.EXMEM_REG_W)
io.MEMWB_memRd_out := RegNext(io.EXMEM_MEMRD)
io.MEMWB_rd_out := RegNext(io.EXMEM_rd)
io.MEMWB_dataMem_out := RegNext(io.in_dataMem_out)
io.MEMWB_alu_out := RegNext(io.in_alu_out)
}
The code snippet above implements the `MEM/WB` pipeline register, which transfers data from the memory stage (`MEM`) to the write-back stage (`WB`). Specifically, this module preserves the control signals `memToReg`, `reg_w`, and `memRd`, as well as the key data values: the destination register (`rd`), the data read from memory (`dataMem`), and the ALU output (`alu`).
In the RISC-V pipeline, two distinct memory units are employed: instruction memory and data memory. The repository implements these as separate modules, each tailored to its specific role in the processor's operation.
Filepath: src/main/scala/Pipeline/Memory/InstMem.scala
class InstMem(initFile: String) extends Module {
val io = IO(new Bundle {
val addr = Input(UInt(32.W)) // Address input to fetch instruction
val data = Output(UInt(32.W)) // Output instruction
})
val imem = Mem(1024, UInt(32.W))
loadMemoryFromFile(imem, initFile)
io.data := imem(io.addr/4.U)
}
The code snippet implements the instruction memory module for the RISC-V pipeline. This module has one 32-bit address input (`io.addr`) used to fetch instructions and one 32-bit data output (`io.data`) that delivers the corresponding instruction. The memory is instantiated with `Mem(1024, UInt(32.W))`, which creates an array of 1024 entries, each holding one 32-bit instruction. The `initFile` parameter names the file from which the initial contents are loaded, and the function `loadMemoryFromFile` populates the memory with these values. Finally, the module indexes the memory with the input address divided by 4, which converts the byte address into a word index.
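The byte-address to word-index conversion can be checked with a couple of lines of plain Scala. This is a minimal sketch with hypothetical names (`InstMemModel`, `wordIndex`, `fetch`); it assumes non-negative addresses and models the memory as an array of 32-bit words.

```scala
// Plain-Scala picture of word-indexed instruction fetch (illustrative names).
object InstMemModel {
  // Dividing a non-negative byte address by 4 is the same as dropping its two low bits.
  def wordIndex(byteAddr: Int): Int = byteAddr >>> 2

  // Fetch the 32-bit word that contains the given byte address.
  def fetch(imem: Array[Int], byteAddr: Int): Int = imem(wordIndex(byteAddr))
}
```

Byte addresses 4 through 7 all map to word 1, so a PC that advances by 4 walks through consecutive memory entries.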
Filepath: src/main/scala/Pipeline/Memory/DataMemory.scala
class DataMemory extends Module {
val io = IO(new Bundle {
val addr = Input(UInt(32.W)) // Address input
val dataIn = Input(SInt(32.W)) // Data to be written
val mem_read = Input(Bool()) // Memory read enable
val mem_write = Input(Bool()) // Memory write enable
val dataOut = Output(SInt(32.W)) // Data output
})
val Dmemory = Mem(1024, SInt(32.W))
io.dataOut := 0.S
when(io.mem_write) {
Dmemory.write(io.addr, io.dataIn)
}
when(io.mem_read) {
io.dataOut := Dmemory.read(io.addr)
}
}
The code snippet implements the data memory unit for the RISC-V pipeline. This module has four input ports, `io.addr`, `io.dataIn`, `io.mem_read`, and `io.mem_write`, and one output port, `io.dataOut`. It instantiates a memory array with 1024 entries, where each entry is a 32-bit word. When the control signal `io.mem_write` is asserted, the module writes the data from `io.dataIn` into the memory at the index given by `io.addr`. When `io.mem_read` is asserted, the module reads the entry at `io.addr` and outputs it on `io.dataOut`; otherwise `io.dataOut` is 0. Unlike the instruction memory, this module uses `io.addr` directly as the index, with no division by 4.
Filepath: src/main/scala/Pipeline/Hazard Units/StructuralHazard.scala
class StructuralHazard extends Module {
val io = IO(new Bundle {
val rs1 = Input(UInt(5.W))
val rs2 = Input(UInt(5.W))
val MEM_WB_regWr = Input(Bool())
val MEM_WB_Rd = Input(UInt(5.W))
val fwd_rs1 = Output(Bool())
val fwd_rs2 = Output(Bool())
})
// Determine if forwarding is needed for rs1
when(io.MEM_WB_regWr && io.MEM_WB_Rd === io.rs1) {
io.fwd_rs1 := true.B
}.otherwise {
io.fwd_rs1 := false.B
}
// Determine if forwarding is needed for rs2
when(io.MEM_WB_regWr && io.MEM_WB_Rd === io.rs2) {
io.fwd_rs2 := true.B
}.otherwise {
io.fwd_rs2 := false.B
}
}
The code snippet implements the structural hazard resolution mechanism for the RISC-V pipeline. This module has four input ports, `rs1`, `rs2`, `MEM_WB_regWr`, and `MEM_WB_Rd`, and two output ports, `fwd_rs1` and `fwd_rs2`. The module checks whether the destination register in the `MEM/WB` stage (`MEM_WB_Rd`) matches either source register (`rs1` or `rs2`) while write-back is enabled (that is, `MEM_WB_regWr` is asserted). If a match is detected, the corresponding forwarding signal (`fwd_rs1` or `fwd_rs2`) is set to `true`; otherwise, it is `false`.
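A plain-Scala sketch of the check is shown below. The `fwd` function mirrors the condition in the module; the `operand` function is a hypothetical consumer added only to show what such a flag is typically used for, since the logic that reads `fwd_rs1` and `fwd_rs2` is not part of this module. All names are made up for illustration.

```scala
// Plain-Scala model of the write-back bypass check (illustrative names).
object StructuralHazardModel {
  // Mirrors the fwd_rs1 / fwd_rs2 condition for one source register.
  def fwd(memWbRegWr: Boolean, memWbRd: Int, rs: Int): Boolean =
    memWbRegWr && memWbRd == rs

  // Hypothetical consumer: use the write-back data when the flag is set,
  // otherwise the value read from the register file.
  def operand(fwdFlag: Boolean, wbData: Int, regFileData: Int): Int =
    if (fwdFlag) wbData else regFileData
}
```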
Filepath: src/main/scala/Pipeline/Hazard Units/HazardDetection.scala
class HazardDetection extends Module {
val io = IO(new Bundle {
val IF_ID_inst = Input(UInt(32.W))
val ID_EX_memRead = Input(Bool())
val ID_EX_rd = Input(UInt(5.W))
val pc_in = Input(SInt(32.W))
val current_pc = Input(SInt(32.W))
val inst_forward = Output(Bool())
val pc_forward = Output(Bool())
val ctrl_forward = Output(Bool())
val inst_out = Output(UInt(32.W))
val pc_out = Output(SInt(32.W))
val current_pc_out = Output(SInt(32.W))
})
val Rs1 = io.IF_ID_inst(19, 15)
val Rs2 = io.IF_ID_inst(24, 20)
when(io.ID_EX_memRead === 1.B && ((io.ID_EX_rd === Rs1) || (io.ID_EX_rd === Rs2))) {
io.inst_forward := true.B
io.pc_forward := true.B
io.ctrl_forward := true.B
}.otherwise {
io.inst_forward := false.B
io.pc_forward := false.B
io.ctrl_forward := false.B
}
io.inst_out := io.IF_ID_inst
io.pc_out := io.pc_in
io.current_pc_out := io.current_pc
}
The code snippet implements the hazard detection unit, which watches for load-use data hazards. When the instruction in the `ID/EX` stage is a memory read (`io.ID_EX_memRead` is `true`) and its destination register (`io.ID_EX_rd`) matches either source register of the instruction in `IF/ID` (`Rs1` or `Rs2`, extracted from bits `[19:15]` and `[24:20]` of `io.IF_ID_inst`), the module sets `inst_forward`, `pc_forward`, and `ctrl_forward` to true; otherwise, all three remain `false`. Despite the `_forward` suffix, this condition cannot be resolved by forwarding alone, because the loaded value does not exist until the load reaches the `MEM` stage. The three flags are therefore best read as stall requests for the instruction, the program counter, and the control signals, although the logic that consumes them lies outside this module. Additionally, the module passes `io.IF_ID_inst`, `io.pc_in`, and `io.current_pc` through unchanged to `io.inst_out`, `io.pc_out`, and `io.current_pc_out`, respectively, so that the same instruction and PC values remain available to the surrounding logic.
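The detection condition can be tried out on real instruction encodings with a short plain-Scala model. This is a sketch with hypothetical names (`LoadUseModel`, `loadUse`); it reproduces only the comparison, not the stall logic around it.

```scala
// Plain-Scala model of the load-use check (illustrative names).
object LoadUseModel {
  // Source register fields of a 32-bit RISC-V instruction.
  def rs1(instr: Int): Int = (instr >>> 15) & 0x1F
  def rs2(instr: Int): Int = (instr >>> 20) & 0x1F

  // True when the load in ID/EX writes a register that the IF/ID instruction reads.
  def loadUse(idExMemRead: Boolean, idExRd: Int, ifIdInstr: Int): Boolean =
    idExMemRead && (idExRd == rs1(ifIdInstr) || idExRd == rs2(ifIdInstr))
}
```

For example, `0x002081B3` encodes `add x3, x1, x2`. If the instruction ahead of it is a load into `x1` or `x2`, `loadUse` returns true; a load into `x3` does not trigger it.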
Filepath: src/main/scala/Pipeline/Hazard Units/Forwarding.scala
class Forwarding extends Module {
val io = IO(new Bundle {
val IDEX_rs1 = Input(UInt(5.W))
val IDEX_rs2 = Input(UInt(5.W))
val EXMEM_rd = Input(UInt(5.W))
val EXMEM_regWr = Input(UInt(1.W))
val MEMWB_rd = Input(UInt(5.W))
val MEMWB_regWr = Input(UInt(1.W))
val forward_a = Output(UInt(2.W))
val forward_b = Output(UInt(2.W))
})
io.forward_a := "b00".U
io.forward_b := "b00".U
// EX HAZARD
when(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U &&
(io.EXMEM_rd === io.IDEX_rs1.asUInt) && (io.EXMEM_rd === io.IDEX_rs2)) {
io.forward_a := "b10".U
io.forward_b := "b10".U
}.elsewhen(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U &&
(io.EXMEM_rd === io.IDEX_rs2)) {
io.forward_b := "b10".U
}.elsewhen(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U &&
(io.EXMEM_rd === io.IDEX_rs1)) {
io.forward_a := "b10".U
}
// MEM HAZARD
when((io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) && (io.MEMWB_rd === io.IDEX_rs1) && (io.MEMWB_rd === io.IDEX_rs2) &&
~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs1) && (io.EXMEM_rd === io.IDEX_rs2))) {
io.forward_a := "b01".U
io.forward_b := "b01".U
}.elsewhen((io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) && (io.MEMWB_rd === io.IDEX_rs2) &&
~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs2))){
io.forward_b := "b01".U
}.elsewhen((io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) && (io.MEMWB_rd === io.IDEX_rs1) &&
~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs1))){
io.forward_a := "b01".U
}
}
This module implements the forwarding unit, which selects data from later pipeline stages to resolve data hazards in the RISC-V pipeline. The unit compares the source registers of the instruction in the `ID/EX` stage (`IDEX_rs1` and `IDEX_rs2`) with the destination registers in the `EX/MEM` and `MEM/WB` stages. Depending on which stage holds the most recent value, it assigns a two-bit code to the forwarding outputs (`forward_a` for `rs1` and `forward_b` for `rs2`); the default `00` means no forwarding. For example, when the `EX/MEM` stage is writing to a non-zero register that matches a source operand, the corresponding output is set to binary `10`, indicating that the data should be forwarded from the `EX/MEM` stage.
In the `MEM` hazard section, the module handles cases where the `MEM/WB` stage holds the data needed by the current instruction. Here, the module checks whether the `MEM/WB` stage is writing to a non-zero register that matches a source register of the `ID/EX` stage. This forwarding is enabled only if the `EX/MEM` stage is not already forwarding for that register, so `EX` hazards take priority because they carry the newer value. If the conditions are met, the output is set to binary `01`, signaling that the required data should be forwarded from the `MEM/WB` stage. This mechanism makes the correct value available to subsequent computations even when an instruction's result has not been written back yet, which avoids stalling the pipeline for these hazards.
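Viewed one operand at a time, the priority rule reduces to a three-way choice, which the following plain-Scala sketch captures. It is an illustration with hypothetical names (`ForwardingModel`, `forward`) and returns the two-bit code as an integer: 2 for binary `10`, 1 for binary `01`, and 0 for `00`.

```scala
// Plain-Scala model of the forwarding decision for one source register (illustrative names).
object ForwardingModel {
  // 2 = forward from EX/MEM, 1 = forward from MEM/WB, 0 = no forwarding.
  def forward(idExRs: Int,
              exMemRegWr: Boolean, exMemRd: Int,
              memWbRegWr: Boolean, memWbRd: Int): Int =
    if (exMemRegWr && exMemRd != 0 && exMemRd == idExRs) 2       // EX hazard has priority
    else if (memWbRegWr && memWbRd != 0 && memWbRd == idExRs) 1  // MEM hazard
    else 0
}
```

When both later stages write the same register that the current instruction reads, the function returns 2, so the newer `EX/MEM` value wins; register 0 never triggers forwarding.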
Filepath: src/main/scala/Pipeline/Hazard Units/BranchForward.scala
class BranchForward extends Module {
val io = IO(new Bundle {
val ID_EX_RD = Input(UInt(5.W))
val EX_MEM_RD = Input(UInt(5.W))
val MEM_WB_RD = Input(UInt(5.W))
val ID_EX_memRd = Input(UInt(1.W))
val EX_MEM_memRd = Input(UInt(1.W))
val MEM_WB_memRd = Input(UInt(1.W))
val rs1 = Input(UInt(5.W))
val rs2 = Input(UInt(5.W))
val ctrl_branch = Input(UInt(1.W))
val forward_rs1 = Output(UInt(4.W))
val forward_rs2 = Output(UInt(4.W))
})
io.forward_rs1 := "b0000".U
io.forward_rs2 := "b0000".U
// Branch forwarding logic
when(io.ctrl_branch === 1.U) {
// ALU Hazard
when(io.ID_EX_RD =/= 0.U && io.ID_EX_memRd =/= 1.U) {
when(io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2) {
io.forward_rs1 := "b0001".U
io.forward_rs2 := "b0001".U
}.elsewhen(io.ID_EX_RD === io.rs1) {
io.forward_rs1 := "b0001".U
}.elsewhen(io.ID_EX_RD === io.rs2) {
io.forward_rs2 := "b0001".U
}
}
// EX/MEM Hazard
when(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd =/= 1.U) {
when(io.EX_MEM_RD === io.rs1 && io.EX_MEM_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2)) {
io.forward_rs1 := "b0010".U
io.forward_rs2 := "b0010".U
}.elsewhen(io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) {
io.forward_rs1 := "b0010".U
}.elsewhen(io.EX_MEM_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs2)) {
io.forward_rs2 := "b0010".U
}
}
// MEM/WB Hazard
when(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd =/= 1.U) {
when(io.MEM_WB_RD === io.rs1 && io.MEM_WB_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1 && io.EX_MEM_RD === io.rs2)) {
io.forward_rs1 := "b0011".U
io.forward_rs2 := "b0011".U
}.elsewhen(io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) {
io.forward_rs1 := "b0011".U
}.elsewhen(io.MEM_WB_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs2) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs2)) {
io.forward_rs2 := "b0011".U
}
}
// Jalr forwarding logic
}.elsewhen(io.ctrl_branch === 0.U) {
when(io.ID_EX_RD =/= 0.U && io.ID_EX_memRd =/= 1.U && io.ID_EX_RD === io.rs1) {
io.forward_rs1 := "b0110".U
}.elsewhen(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd =/= 1.U && io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) {
io.forward_rs1 := "b0111".U
}.elsewhen(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd === 1.U && io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) {
io.forward_rs1 := "b1001".U
}.elsewhen(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd =/= 1.U && io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) {
io.forward_rs1 := "b1000".U
}.elsewhen(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd === 1.U && io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) {
io.forward_rs1 := "b1010".U
}
}
}
The BranchForward module is a key component in the RISC-V pipeline, responsible for resolving data hazards during branch and Jalr instruction execution. It determines whether the source operands needed for branch evaluation must be forwarded from later pipeline stages to avoid stalls. The module takes as inputs the destination register identifiers and memory read flags from the ID/EX, EX/MEM, and MEM/WB pipeline stages, alongside the source register identifiers (rs1 and rs2) of the branch instruction and a control signal (ctrl_branch). The outputs, forward_rs1 and forward_rs2, are four-bit signals indicating the source of the forwarded data. When ctrl_branch is set to 1, branch forwarding logic is applied by sequentially checking for hazards in the ID/EX, EX/MEM, and MEM/WB stages. A stage is considered only if its destination register is non-zero and its memory read flag is not set, and a later stage is used only when no earlier stage already matches, so the most recent valid data is forwarded. The codes 0001, 0010, and 0011 select the ID/EX, EX/MEM, and MEM/WB stages, respectively.
For Jalr instructions, indicated when ctrl_branch is set to 0, the module only evaluates the source register rs1 for potential forwarding. It similarly checks the ID/EX, EX/MEM, and MEM/WB stages for data matches, prioritizing the most recent and valid data for forwarding. Different codes are assigned to forward_rs1 based on whether the data comes from a memory read or a non-memory read operation. This hierarchical approach ensures that the correct operand is forwarded for branch or Jalr instruction evaluation, reducing pipeline stalls and maintaining efficient instruction execution.
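To make the Jalr encoding easier to follow, the sketch below restates the Jalr branch of BranchForward as a plain-Scala function. It is a software model under the assumption that it mirrors the when/elsewhen chain shown above; the Stage case class and the function name forwardRs1 are hypothetical helpers, not part of the repository.

```scala
object JalrForwardModel {
  // Hypothetical helper: destination register and memory-read flag of one pipeline stage.
  final case class Stage(rd: Int, memRd: Boolean)

  // Returns the value of forward_rs1 for a Jalr instruction (ctrl_branch === 0).
  def forwardRs1(idEx: Stage, exMem: Stage, memWb: Stage, rs1: Int): Int = {
    val idExMatch  = idEx.rd != 0 && idEx.rd == rs1
    val exMemMatch = exMem.rd != 0 && exMem.rd == rs1
    val memWbMatch = memWb.rd != 0 && memWb.rd == rs1

    if (idExMatch && !idEx.memRd) 0x6                                      // b0110: ID/EX result
    else if (exMemMatch && !exMem.memRd && !idExMatch) 0x7                 // b0111: EX/MEM, non-load
    else if (exMemMatch && exMem.memRd && !idExMatch) 0x9                  // b1001: EX/MEM, load
    else if (memWbMatch && !memWb.memRd && !idExMatch && !exMemMatch) 0x8  // b1000: MEM/WB, non-load
    else if (memWbMatch && memWb.memRd && !idExMatch && !exMemMatch) 0xA   // b1010: MEM/WB, load
    else 0x0                                                               // b0000: no forwarding
  }
}
```

Note that a load still in ID/EX that matches rs1 yields 0 in this model, because every later condition excludes an ID/EX match, exactly as in the Chisel conditions above.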
Filepath: src/main/scala/Pipeline/Main.scala
val PC_F = MuxLookup(HazardDetect.io.pc_forward, 0.S, Array(
(0.U) -> PC4.io.out.asSInt,
(1.U) -> HazardDetect.io.pc_out))
PC.io.in := PC_F // PC_in input
PC4.io.pc := PC.io.out.asUInt // PC4_in input <- PC_out
InstMemory.io.addr := PC.io.out.asUInt // Address to fetch instruction
val PC_for = MuxLookup (HazardDetect.io.inst_forward, 0.S, Array (
(0.U) -> PC.io.out,
(1.U) -> HazardDetect.io.current_pc_out))
val Instruction_F = MuxLookup (HazardDetect.io.inst_forward, 0.U, Array (
(0.U) -> InstMemory.io.data,
(1.U) -> HazardDetect.io.inst_out))
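This code snippet uses MuxLookup to select the next Program Counter (PC) value and the fetched instruction in the pipelined processor. When HazardDetect.io.pc_forward is 0, the PC advances to PC4.io.out; when it is 1, the PC takes HazardDetect.io.pc_out instead. In the same way, HazardDetect.io.inst_forward chooses between the freshly fetched PC and instruction and the values supplied by the hazard detection unit (current_pc_out and inst_out), so the correct instruction is executed even in the presence of pipeline hazards.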
// Decode connections (Control unit RegFile)
control_module.io.opcode := IF_ID_.io.SelectedInstr_out(6, 0) // Opcode used to check the instruction type
// Registerfile inputs
RegFile.io.rs1 := Mux(
control_module.io.opcode === 51.U || // R-type
control_module.io.opcode === 19.U || // I-type
control_module.io.opcode === 35.U || // S-type
control_module.io.opcode === 3.U || // I-type (load instructions)
control_module.io.opcode === 99.U || // SB-type (branch)
control_module.io.opcode === 103.U, // JALR instruction
IF_ID_.io.SelectedInstr_out(19, 15), 0.U )
RegFile.io.rs2 := Mux(
control_module.io.opcode === 51.U || // R-type
control_module.io.opcode === 35.U || // S-type
control_module.io.opcode === 99.U, // SB-type (branch)
IF_ID_.io.SelectedInstr_out(24, 20), 0.U)
RegFile.io.reg_write := control_module.io.reg_write
This code is responsible for decoding the fetched instruction by extracting its opcode to identify the instruction type. Based on the opcode and the instruction format, it determines the values of the rs1 and rs2 register fields, specifying the source registers to be used for operations. The rs1 field (bits 19 to 15) is selected for R-type, I-type (including loads), S-type, SB-type, and JALR instructions, while the rs2 field (bits 24 to 20) is used for R-type, S-type, and SB-type instructions. For any other opcode the register index is forced to 0, so the read returns the constant zero register. Additionally, the reg_write signal is configured to enable or disable write-back to the register file (RegFile), depending on whether the current instruction requires a write operation. This ensures the proper setup of source registers and write-back control for subsequent execution stages.
Instruction | Opcode | Decimal |
---|---|---|
R-type | 011 0011 | 51 |
I-type | 001 0011 | 19 |
S-type | 010 0011 | 35 |
I-type (load instructions) | 000 0011 | 3 |
SB-type (branch) | 110 0011 | 99 |
JALR instruction | 110 0111 | 103 |
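The field selection can be checked against real encodings with a small plain-Scala model. The object and function names here are hypothetical; the opcode sets are taken from the table above, and the two sample encodings (0x00a002b3 for add t0, x0, a0 and 0x333333b7 for lui t2, 0x33333) appear in the logs later in this report.

```scala
object DecodeModel {
  // Opcodes (decimal) for which the decode stage reads rs1 and rs2, per the table above.
  val usesRs1: Set[Int] = Set(51, 19, 35, 3, 99, 103)
  val usesRs2: Set[Int] = Set(51, 35, 99)

  // Bits 6..0 of the instruction word.
  def opcode(inst: Int): Int = inst & 0x7F

  // Bits 19..15, or 0 when the instruction format has no rs1 field.
  def rs1(inst: Int): Int =
    if (usesRs1(opcode(inst))) (inst >>> 15) & 0x1F else 0

  // Bits 24..20, or 0 when the instruction format has no rs2 field.
  def rs2(inst: Int): Int =
    if (usesRs2(opcode(inst))) (inst >>> 20) & 0x1F else 0
}
```

For 0x00a002b3 the model gives opcode 51, rs1 = 0, and rs2 = 10 (a0). For the lui encoding 0x333333b7, bits 19 to 15 hold part of the immediate, and the opcode check keeps them from being misread as a register index.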
// rs1_data
when (Structural.io.fwd_rs1 === 0.U) {
S_rs1DataIn := RegFile.io.rdata1
}.elsewhen (Structural.io.fwd_rs1 === 1.U) {
S_rs1DataIn := RegFile.io.w_data
}.otherwise {
S_rs1DataIn := 0.S
}
// rs2_data
when (Structural.io.fwd_rs2 === 0.U) {
S_rs2DataIn := RegFile.io.rdata2
}.elsewhen (Structural.io.fwd_rs2 === 1.U) {
S_rs2DataIn := RegFile.io.w_data
}.otherwise {
S_rs2DataIn := 0.S
}
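This code implements data forwarding for the rs1 and rs2 source registers to handle potential data hazards in the pipeline.
S_rs1DataIn and S_rs2DataIn: Wires used to hold the correct values for rs1 and rs2 after evaluating forwarding needs.
Structural.io.fwd_rs1 and Structural.io.fwd_rs2: Select signals. A value of 0 passes the register file read data (rdata1 or rdata2), a value of 1 passes the data currently being written back (RegFile.io.w_data), and any other value falls back to 0.S because no valid data path is available.
This ensures that the pipeline uses the most up-to-date data for execution, maintaining correctness and avoiding unnecessary stalls.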
// Stall when forward
when(HazardDetect.io.ctrl_forward === "b1".U) {
ID_EX_.io.ctrl_MemWr_in := 0.U
ID_EX_.io.ctrl_MemRd_in := 0.U
ID_EX_.io.ctrl_MemToReg_in := 0.U
ID_EX_.io.ctrl_Reg_W_in := 0.U
ID_EX_.io.ctrl_AluOp_in := 0.U
ID_EX_.io.ctrl_OpB_in := 0.U
ID_EX_.io.ctrl_Branch_in := 0.U
ID_EX_.io.ctrl_nextpc_in := 0.U
}.otherwise {
ID_EX_.io.ctrl_MemWr_in := control_module.io.mem_write
ID_EX_.io.ctrl_MemRd_in := control_module.io.mem_read
ID_EX_.io.ctrl_MemToReg_in := control_module.io.men_to_reg
ID_EX_.io.ctrl_Reg_W_in := control_module.io.reg_write
ID_EX_.io.ctrl_AluOp_in := control_module.io.alu_operation
ID_EX_.io.ctrl_OpB_in := control_module.io.operand_B
ID_EX_.io.ctrl_Branch_in := control_module.io.branch
ID_EX_.io.ctrl_nextpc_in := control_module.io.next_pc_sel
}
This code snippet implements stalling logic for hazards flagged by the HazardDetect unit. When HazardDetect.io.ctrl_forward is asserted, all control signals entering the ID_EX pipeline register are set to 0, which turns the instruction in that slot into a bubble that neither writes a register nor accesses memory. Otherwise, the normal control signals from the control unit are passed through.
In addition to constructing a pipelined RISC-V CPU using Chisel, it is essential to verify the integrity of the structure. Therefore, we first verify the correctness of our RISC-V test code using a third-party processor simulator named Ripes. Next, we establish the expected register outputs and compare them with the results produced by our CPU.
However, since the register values are confined within the RegisterFile module, we need to expose them through the IO Bundle. The following code snippet shows the modified IO of this module, which exposes all argument registers, temporary registers, and saved registers.
// RegisterFile (RegisterFile.scala)
val io = IO(new Bundle {
val rs1 = Input(UInt(5.W))
val rs2 = Input(UInt(5.W))
val reg_write = Input(Bool())
val w_reg = Input(UInt(5.W))
val w_data = Input(SInt(32.W))
val rdata1 = Output(SInt(32.W))
val rdata2 = Output(SInt(32.W))
// >> exposed argument registers
val a0 = Output(SInt(32.W))
val a1 = Output(SInt(32.W))
val a2 = Output(SInt(32.W))
val a3 = Output(SInt(32.W))
val a4 = Output(SInt(32.W))
val a5 = Output(SInt(32.W))
val a6 = Output(SInt(32.W))
val a7 = Output(SInt(32.W))
// << exposed argument registers
// >> exposed temporary registers
val t0 = Output(SInt(32.W))
val t1 = Output(SInt(32.W))
val t2 = Output(SInt(32.W))
val t3 = Output(SInt(32.W))
val t4 = Output(SInt(32.W))
val t5 = Output(SInt(32.W))
val t6 = Output(SInt(32.W))
// << exposed temporary registers
// >> exposed save registers
val s0 = Output(SInt(32.W))
val s1 = Output(SInt(32.W))
val s2 = Output(SInt(32.W))
val s3 = Output(SInt(32.W))
val s4 = Output(SInt(32.W))
val s5 = Output(SInt(32.W))
val s6 = Output(SInt(32.W))
val s7 = Output(SInt(32.W))
val s8 = Output(SInt(32.W))
val s9 = Output(SInt(32.W))
val s10 = Output(SInt(32.W))
val s11 = Output(SInt(32.W))
// << exposed save registers
})
After exposing these IO ports, we need to wire the register values to the corresponding output ports. The following code snippet implements the wiring logic within the module.
// RegisterFile (RegisterFile.scala)
// >> wiring argument registers to corresponding output ports
io.a0 := Mux(io.reg_write && io.w_reg === 10.U, io.w_data, regfile(10))
io.a1 := Mux(io.reg_write && io.w_reg === 11.U, io.w_data, regfile(11))
io.a2 := Mux(io.reg_write && io.w_reg === 12.U, io.w_data, regfile(12))
io.a3 := Mux(io.reg_write && io.w_reg === 13.U, io.w_data, regfile(13))
io.a4 := Mux(io.reg_write && io.w_reg === 14.U, io.w_data, regfile(14))
io.a5 := Mux(io.reg_write && io.w_reg === 15.U, io.w_data, regfile(15))
io.a6 := Mux(io.reg_write && io.w_reg === 16.U, io.w_data, regfile(16))
io.a7 := Mux(io.reg_write && io.w_reg === 17.U, io.w_data, regfile(17))
// << wiring argument registers to corresponding output ports
// >> wiring temporary registers to corresponding output ports
io.t0 := Mux(io.reg_write && io.w_reg === 5.U, io.w_data, regfile(5))
io.t1 := Mux(io.reg_write && io.w_reg === 6.U, io.w_data, regfile(6))
io.t2 := Mux(io.reg_write && io.w_reg === 7.U, io.w_data, regfile(7))
io.t3 := Mux(io.reg_write && io.w_reg === 28.U, io.w_data, regfile(28))
io.t4 := Mux(io.reg_write && io.w_reg === 29.U, io.w_data, regfile(29))
io.t5 := Mux(io.reg_write && io.w_reg === 30.U, io.w_data, regfile(30))
io.t6 := Mux(io.reg_write && io.w_reg === 31.U, io.w_data, regfile(31))
// << wiring temporary registers to corresponding output ports
// >> wiring save registers to corresponding output ports
io.s0 := Mux(io.reg_write && io.w_reg === 8.U, io.w_data, regfile(8))
io.s1 := Mux(io.reg_write && io.w_reg === 9.U, io.w_data, regfile(9))
io.s2 := Mux(io.reg_write && io.w_reg === 18.U, io.w_data, regfile(18))
io.s3 := Mux(io.reg_write && io.w_reg === 19.U, io.w_data, regfile(19))
io.s4 := Mux(io.reg_write && io.w_reg === 20.U, io.w_data, regfile(20))
io.s5 := Mux(io.reg_write && io.w_reg === 21.U, io.w_data, regfile(21))
io.s6 := Mux(io.reg_write && io.w_reg === 22.U, io.w_data, regfile(22))
io.s7 := Mux(io.reg_write && io.w_reg === 23.U, io.w_data, regfile(23))
io.s8 := Mux(io.reg_write && io.w_reg === 24.U, io.w_data, regfile(24))
io.s9 := Mux(io.reg_write && io.w_reg === 25.U, io.w_data, regfile(25))
io.s10 := Mux(io.reg_write && io.w_reg === 26.U, io.w_data, regfile(26))
io.s11 := Mux(io.reg_write && io.w_reg === 27.U, io.w_data, regfile(27))
// << wiring save registers to corresponding output ports
Similarly, we expose the register values outside the PIPELINE module using the following code snippets.
// PIPELINE (Main.scala)
val io = IO(new Bundle {
val out = Output(SInt(32.W))
val out_pc = Output(SInt(32.W))
// >> exposed argument registers
val out_a0 = Output(SInt(32.W))
val out_a1 = Output(SInt(32.W))
val out_a2 = Output(SInt(32.W))
val out_a3 = Output(SInt(32.W))
val out_a4 = Output(SInt(32.W))
val out_a5 = Output(SInt(32.W))
val out_a6 = Output(SInt(32.W))
val out_a7 = Output(SInt(32.W))
// << exposed argument registers
// >> exposed temporary registers
val out_t0 = Output(SInt(32.W))
val out_t1 = Output(SInt(32.W))
val out_t2 = Output(SInt(32.W))
val out_t3 = Output(SInt(32.W))
val out_t4 = Output(SInt(32.W))
val out_t5 = Output(SInt(32.W))
val out_t6 = Output(SInt(32.W))
// << exposed temporary registers
// >> exposed save registers
val out_s0 = Output(SInt(32.W))
val out_s1 = Output(SInt(32.W))
val out_s2 = Output(SInt(32.W))
val out_s3 = Output(SInt(32.W))
val out_s4 = Output(SInt(32.W))
val out_s5 = Output(SInt(32.W))
val out_s6 = Output(SInt(32.W))
val out_s7 = Output(SInt(32.W))
val out_s8 = Output(SInt(32.W))
val out_s9 = Output(SInt(32.W))
val out_s10 = Output(SInt(32.W))
val out_s11 = Output(SInt(32.W))
// << exposed save registers
})
// PIPELINE (Main.scala)
// >> wiring argument registers to corresponding output ports
io.out_a0 := RegFile.io.a0
io.out_a1 := RegFile.io.a1
io.out_a2 := RegFile.io.a2
io.out_a3 := RegFile.io.a3
io.out_a4 := RegFile.io.a4
io.out_a5 := RegFile.io.a5
io.out_a6 := RegFile.io.a6
io.out_a7 := RegFile.io.a7
// << wiring argument registers to corresponding output ports
// >> wiring temporary registers to corresponding output ports
io.out_t0 := RegFile.io.t0
io.out_t1 := RegFile.io.t1
io.out_t2 := RegFile.io.t2
io.out_t3 := RegFile.io.t3
io.out_t4 := RegFile.io.t4
io.out_t5 := RegFile.io.t5
io.out_t6 := RegFile.io.t6
// << wiring temporary registers to corresponding output ports
// >> wiring save registers to corresponding output ports
io.out_s0 := RegFile.io.s0
io.out_s1 := RegFile.io.s1
io.out_s2 := RegFile.io.s2
io.out_s3 := RegFile.io.s3
io.out_s4 := RegFile.io.s4
io.out_s5 := RegFile.io.s5
io.out_s6 := RegFile.io.s6
io.out_s7 := RegFile.io.s7
io.out_s8 := RegFile.io.s8
io.out_s9 := RegFile.io.s9
io.out_s10 := RegFile.io.s10
io.out_s11 := RegFile.io.s11
// << wiring save registers to corresponding output ports
Finally, in our MainTest.scala, we add test cases following the structure shown in the code snippet below:
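// MainTest.scala
// Imports assume chiseltest with a ScalaTest version that still provides FreeSpec
import chisel3._
import chiseltest._
import org.scalatest.FreeSpec
class TOPTest extends FreeSpec with ChiselScalatestTester{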
"test a0" in {
// test program
test(new PIPELINE("/home/mi2s/FProject/compilation/testA0.txt")){
x =>
// the number of clock cycles to finish the program
x.clock.step(6)
// the expected value of a0 register
x.io.out_a0.expect(10.S)
}
}
}
The code provided in the repository initially could not properly execute our test cases. Consequently, we traced the execution process and monitored register states after each clock cycle. However, since neither Chisel nor the author of the repository offers a user-friendly debugging tool like the Ripes simulator, which displays register values, we had to implement logging using printf statements. The following three code snippets demonstrate logging for temporary, argument, and saved registers.
// PIPELINE (Main.scala)
// t0-t6 : temporary registers
printf(p"[ ${hwCounter} ] t0: ${Hexadecimal(RegFile.io.t0)}, t1: ${Hexadecimal(RegFile.io.t1)}, t2: ${Hexadecimal(RegFile.io.t2)}, t3: ${Hexadecimal(RegFile.io.t3)}, t4: ${Hexadecimal(RegFile.io.t4)}, t5: ${Hexadecimal(RegFile.io.t5)}, t6: ${Hexadecimal(RegFile.io.t6)}\n")
// PIPELINE (Main.scala)
// a0-a7 : argument registers
printf(p"[ ${hwCounter} ] a0: ${Hexadecimal(RegFile.io.a0)}, a1: ${Hexadecimal(RegFile.io.a1)}, a2: ${Hexadecimal(RegFile.io.a2)}, a3: ${Hexadecimal(RegFile.io.a3)}, a4: ${Hexadecimal(RegFile.io.a4)}, a5: ${Hexadecimal(RegFile.io.a5)}, a6: ${Hexadecimal(RegFile.io.a6)}, a7: ${Hexadecimal(RegFile.io.a7)}\n")
// PIPELINE (Main.scala)
// s0-s11 : save registers
printf(p"[ ${hwCounter} ] s0: ${Hexadecimal(RegFile.io.s0)}, s1: ${Hexadecimal(RegFile.io.s1)}, s2: ${Hexadecimal(RegFile.io.s2)}, s3: ${Hexadecimal(RegFile.io.s3)}, s4: ${Hexadecimal(RegFile.io.s4)}, s5: ${Hexadecimal(RegFile.io.s5)}, s6: ${Hexadecimal(RegFile.io.s6)}, s7: ${Hexadecimal(RegFile.io.s7)}, s8: ${Hexadecimal(RegFile.io.s8)}, s9: ${Hexadecimal(RegFile.io.s9)}, s10: ${Hexadecimal(RegFile.io.s10)}, s11: ${Hexadecimal(RegFile.io.s11)}\n")
Additionally, to effectively monitor and analyze the IF_ID and ID_EX pipeline registers and the ALU controls, we include the following code snippet for logging supplementary information.
// PIPELINE (Main.scala)
// control signals from decode to execute (including ALU operands)
printf(p"[ ${hwCounter} ] idx: ${Decimal(PC.io.out / 4.S + 1.S)} op: 0x${Hexadecimal(control_module.io.opcode)} rs1: ${Decimal(RegFile.io.rs1)} (0x${Hexadecimal(RegFile.io.rdata1)}) rs2: ${Decimal(RegFile.io.rs2)} (0x${Hexadecimal(RegFile.io.rdata2)}) alu_arg1: 0x${Hexadecimal(ALU.io.in_A)} alu_arg2: 0x${Hexadecimal(ALU.io.in_B)} inst: 0x${Hexadecimal(InstMemory.io.data)} alu_ctrl_op_A: ${ID_EX_.io.ctrl_OpA_out} alu_forward_a: ${Forwarding.io.forward_a} alu_ctrl_op_B: ${ID_EX_.io.ctrl_OpB_out} alu_forward_b: ${Forwarding.io.forward_b} EXMEM_rd: ${Decimal(Forwarding.io.EXMEM_rd)} IDEX_rs1: ${Decimal(Forwarding.io.IDEX_rs1)} IDEX_rs1_data_out: 0x${Hexadecimal(ID_EX_.io.rs1_data_out)} EXMEM_alu_out: 0x${Hexadecimal(EX_MEM_M.io.EXMEM_alu_out)} IDEX_rs2_data: 0x${Hexadecimal(ID_EX_.io.rs2_data_out)} IDEX_rs1_data_in: 0x${Hexadecimal(ID_EX_.io.rs1_data_in)} fwd_rs1: ${Structural.io.fwd_rs1} MEM_WB_RD: ${Decimal(Forwarding.io.MEMWB_rd)} io_rs1: ${Decimal(ID_EX_.io.rs1_out)} io_rs2: ${Decimal(ID_EX_.io.rs2_out)} MEM_WB_RD_Data: ${Hexadecimal(MEM_WB_M.io.MEMWB_alu_out)} ALUOp: ${Decimal(ALU.io.alu_Op)}\n")
While testing one of our programs, we observed an unusual discrepancy by tracing the logs and comparing the register states with those produced by the Ripes simulator.
[ 35 ] t0: 7fffffff, t1: 3fffffff, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 36 ] t0: 7fffffff, t1: 3fffffff, t2: 55555000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 37 ] t0: 7fffffff, t1: 3fffffff, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 38 ] t0: 7fffffff, t1: 15555555, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 39 ] t0: 6aaaaaaa, t1: 15555555, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 40 ] t0: 6aaaaaaa, t1: 1aaaaaaa, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 41 ] t0: 6aaaaaaa, t1: 1aaaaaaa, t2: 48888555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 42 ] t0: 6aaaaaaa, t1: 1aaaaaaa, t2: 48888888, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
In Ripes, at clock cycle 41, register t2 is expected to change to 0x33333000 and then to 0x33333333 due to the instruction li t2, 0x33333333.
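The two-step change of t2 follows from how the li pseudo-instruction is expanded into lui followed by addi; the log below indeed contains 0x333333b7 (lui t2, 0x33333) and 0x33338393 (addi t2, t2, 0x333). The following plain-Scala sketch shows the usual split of a 32-bit constant into the two immediates. The object and function names are hypothetical, and the sketch models the arithmetic only, not the assembler itself.

```scala
object LiExpansion {
  // Split a 32-bit constant into (upper 20-bit immediate for lui, signed 12-bit immediate for addi).
  def split(imm: Int): (Int, Int) = {
    val lo = (imm << 20) >> 20              // sign-extend the low 12 bits
    val hi = ((imm - lo) >>> 12) & 0xFFFFF  // compensate for a negative low part
    (hi, lo)
  }

  // Value in the register after lui (hi << 12) and then addi (+ lo).
  def rebuild(hi: Int, lo: Int): Int = (hi << 12) + lo
}
```

For 0x33333333 the split is (0x33333, 0x333), so the register first holds 0x33333000 after lui and then 0x33333333 after addi, which is the sequence Ripes shows for t2.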
[ 35 ] idx: 20 op: 0x33 rs1: 6 (0x00007fff) rs2: 7 (0x00000000) alu_arg1: 0x55555000 alu_arg2: 0x00000555 inst: 0x406282b3 alu_ctrl_op_A: 0 alu_forward_a: 2 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 7 IDEX_rs1: 7 IDEX_rs1_data: 0x00000000 EXMEM_alu_out: 0x55555000
[ 36 ] idx: 21 op: 0x33 rs1: 5 (0x7fffffff) rs2: 6 (0x3fffffff) alu_arg1: 0x3fffffff alu_arg2: 0x55555555 inst: 0x0022d313 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 2 EXMEM_rd: 7 IDEX_rs1: 6 IDEX_rs1_data: 0x3fffffff EXMEM_alu_out: 0x55555555
[ 37 ] idx: 22 op: 0x13 rs1: 5 (0x7fffffff) rs2: 0 (0x00000000) alu_arg1: 0x7fffffff alu_arg2: 0x15555555 inst: 0x333333b7 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 2 EXMEM_rd: 6 IDEX_rs1: 5 IDEX_rs1_data: 0x7fffffff EXMEM_alu_out: 0x15555555
[ 38 ] idx: 23 op: 0x37 rs1: 0 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0x6aaaaaaa alu_arg2: 0x00000002 inst: 0x33338393 alu_ctrl_op_A: 0 alu_forward_a: 2 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 5 IDEX_rs1: 5 IDEX_rs1_data: 0x7fffffff EXMEM_alu_out: 0x6aaaaaaa
[ 39 ] idx: 24 op: 0x13 rs1: 7 (0x55555555) rs2: 0 (0x00000000) alu_arg1: 0x15555555 alu_arg2: 0x33333000 inst: 0x00737333 alu_ctrl_op_A: 3 alu_forward_a: 0 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 6 IDEX_rs1: 0 IDEX_rs1_data: 0x15555555 EXMEM_alu_out: 0x1aaaaaaa
[ 40 ] idx: 25 op: 0x33 rs1: 6 (0x15555555) rs2: 7 (0x55555555) alu_arg1: 0x48888555 alu_arg2: 0x00000333 inst: 0x0072f3b3 alu_ctrl_op_A: 0 alu_forward_a: 2 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 7 IDEX_rs1: 7 IDEX_rs1_data: 0x55555555 EXMEM_alu_out: 0x48888555
[ 41 ] idx: 26 op: 0x33 rs1: 5 (0x6aaaaaaa) rs2: 7 (0x55555555) alu_arg1: 0x1aaaaaaa alu_arg2: 0x48888888 inst: 0x007302b3 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 2 EXMEM_rd: 7 IDEX_rs1: 6 IDEX_rs1_data: 0x1aaaaaaa EXMEM_alu_out: 0x48888888
[ 42 ] idx: 27 op: 0x33 rs1: 6 (0x1aaaaaaa) rs2: 7 (0x48888555) alu_arg1: 0x6aaaaaaa alu_arg2: 0x48888888 inst: 0x0042d313 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 1 EXMEM_rd: 6 IDEX_rs1: 5 IDEX_rs1_data: 0x6aaaaaaa EXMEM_alu_out: 0x08888888
However, examining the ALU log reveals that at clock cycle 39 during the EXE stage, the CPU adds 0x15555555 and 0x33333000 instead of 0x00000000 and 0x33333000 as expected from the lui t2, 0x33333 instruction. Further analysis shows that the value 0x15555555 is incorrectly forwarded from the write-back pipeline register. This issue originates from the module responsible for hazard detection.
The original implementation included a StructuralHazard class intended to resolve structural hazards but inadvertently handled data hazards instead, as shown in the code snippet below.
// StructuralHazard (StructuralHazard.scala)
// Determine if forwarding is needed for rs1
when(io.MEM_WB_regWr && (io.MEM_WB_Rd === io.rs1)) {
io.fwd_rs1 := true.B
}.otherwise {
io.fwd_rs1 := false.B
}
// Determine if forwarding is needed for rs2
when(io.MEM_WB_regWr && (io.MEM_WB_Rd === io.rs2)) {
io.fwd_rs2 := true.B
}.otherwise {
io.fwd_rs2 := false.B
}
Additionally, its integration in Main.scala disrupted proper data forwarding by only addressing hazards from the MEM/WB pipeline register and ignoring those from the EX/MEM pipeline register. It also detected hazards at incorrect stages. To rectify this, we removed the flawed StructuralHazard class and correctly implemented structural hazard resolution.
// PIPELINE (Main.scala)
// rs1_data
when (Structural.io.fwd_rs1 === 0.U) {
S_rs1DataIn := RegFile.io.rdata1
}.elsewhen (Structural.io.fwd_rs1 === 1.U) {
S_rs1DataIn := RegFile.io.w_data
}.otherwise {
S_rs1DataIn := 0.S
}
// rs2_data
when (Structural.io.fwd_rs2 === 0.U) {
S_rs2DataIn := RegFile.io.rdata2
}.elsewhen (Structural.io.fwd_rs2 === 1.U) {
S_rs2DataIn := RegFile.io.w_data
}.otherwise {
S_rs2DataIn := 0.S
}
After removing the defective module, new issues emerged, observable in the logs of argument and temporary registers.
[ 20 ] a0: ffffffff, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[ 21 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[ 22 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[ 23 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[ 24 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[ 25 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[ 20 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 21 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 22 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 23 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 24 ] t0: ffffffff, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 25 ] t0: ffffffff, t1: 7fffffff, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
Specifically, at clock cycle 24, the instruction add t0, x0, a0 was supposed to complete execution. However, analysis of the control signal history in the decode and execute stages revealed that there was no forwarding of the latest value of a0. Consequently, the register file was read in the same cycle in which a0 was being written. Because the write only takes effect at the next clock edge while the read is combinational, the CPU fetched stale data.
[ 20 ] idx: 5 op: 0x00 rs1: 0 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0x000000e0 alu_arg2: 0x00000000 inst: 0x00a002b3 alu_ctrl_op_A: 1 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 0 EXMEM_rd: 10 IDEX_rs1: 0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x7fff0000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 18 io_rs1: 0 io_rs2: 0 MEM_WB_RD_Data: 7fff0000 ALUOp: 31
[ 21 ] idx: 6 op: 0x33 rs1: 0 (0x00000000) rs2: 10 (0xffffffff) alu_arg1: 0x00000000 alu_arg2: 0x00000000 inst: 0x0012d313 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 0 EXMEM_rd: 1 IDEX_rs1: 0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x000000e0 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 10 io_rs1: 0 io_rs2: 0 MEM_WB_RD_Data: 7fff0000 ALUOp: 0
[ 22 ] idx: 7 op: 0x13 rs1: 5 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0x00000000 alu_arg2: 0xffffffff inst: 0x0062e2b3 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 0 EXMEM_rd: 0 IDEX_rs1: 0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x00000000 IDEX_rs2_data: 0xffffffff IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 1 io_rs1: 0 io_rs2: 10 MEM_WB_RD_Data: 000000e0 ALUOp: 0
[ 23 ] idx: 8 op: 0x33 rs1: 5 (0x00000000) rs2: 6 (0x00000000) alu_arg1: 0xffffffff alu_arg2: 0x00000001 inst: 0x0022d313 alu_ctrl_op_A: 0 alu_forward_a: 2 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 5 IDEX_rs1: 5 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0xffffffff IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 0 io_rs1: 5 io_rs2: 0 MEM_WB_RD_Data: 00000000 ALUOp: 5
[ 24 ] idx: 9 op: 0x13 rs1: 5 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0xffffffff alu_arg2: 0x7fffffff inst: 0x0062e2b3 alu_ctrl_op_A: 0 alu_forward_a: 1 alu_ctrl_op_B: 0 alu_forward_b: 2 EXMEM_rd: 6 IDEX_rs1: 5 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x7fffffff IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 1 MEM_WB_RD: 5 io_rs1: 5 io_rs2: 6 MEM_WB_RD_Data: ffffffff ALUOp: 6
[ 25 ] idx: 10 op: 0x33 rs1: 5 (0xffffffff) rs2: 6 (0x00000000) alu_arg1: 0xffffffff alu_arg2: 0x00000002 inst: 0x0042d313 alu_ctrl_op_A: 0 alu_forward_a: 2 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 5 IDEX_rs1: 5 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0xffffffff IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0xffffffff fwd_rs1: 0 MEM_WB_RD: 6 io_rs1: 5 io_rs2: 0 MEM_WB_RD_Data: 7fffffff ALUOp: 5
Furthermore, the forwarding scenarios primarily included EX/MEM → ALU, MEM/WB → ALU, and MEM/WB → InstrDecode. The root cause of the issue was neglecting the priority of writing to registers before reading from them when both operations use the same register. This oversight led to reading stale data. For example, while storing 0x7FFF0000 to a0, the CPU simultaneously attempted to read a0, resulting in the stale value 0xFFFFFFFF.
[ 20 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 21 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 22 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 23 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 24 ] t0: 7fff0000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[ 25 ] t0: 7fff0000, t1: 3fff8000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
To ensure that writing to registers is prioritized before reading, we revised a section of the RegisterFile module:
// RegisterFile (RegisterFile.scala)
io.rdata1 := Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1))
io.rdata2 := Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2))
and replaced it with the following code snippet:
// RegisterFile (RegisterFile.scala)
// 1) Read old data from the array.
val readData1 = Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1))
val readData2 = Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2))
// 2) If there's a same-cycle write to the same register, override (bypass) it.
val bypassedData1 = Mux(io.reg_write && (io.w_reg === io.rs1) && (io.w_reg =/= 0.U),
io.w_data,
readData1)
val bypassedData2 = Mux(io.reg_write && (io.w_reg === io.rs2) && (io.w_reg =/= 0.U),
io.w_data,
readData2)
// 3) Send those results to outputs
io.rdata1 := bypassedData1
io.rdata2 := bypassedData2
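The effect of the bypass can be reproduced with a plain-Scala model of a single cycle. This is a sketch for illustration, not the Chisel module: the object name, the function read, and its bypass flag are hypothetical, and regs stands for the register contents before the clock edge.

```scala
object RegFileModel {
  // Model the value seen on a read port during one cycle.
  // regs: stored values before the clock edge; wReg/wData: the write issued in the same cycle.
  def read(regs: Vector[Int], rs: Int,
           regWrite: Boolean, wReg: Int, wData: Int,
           bypass: Boolean): Int = {
    // 1) Read the old data; register 0 always reads as 0.
    val stored = if (rs == 0) 0 else regs(rs)
    // 2) Detect a same-cycle write to the register being read (never for x0).
    val sameCycleWrite = regWrite && wReg == rs && wReg != 0
    // 3) With the bypass enabled, the written data overrides the stored value.
    if (bypass && sameCycleWrite) wData else stored
  }
}
```

With a0 (register 10) holding 0xFFFFFFFF and a same-cycle write of 0x7FFF0000, the model returns the stale 0xFFFFFFFF without the bypass and 0x7FFF0000 with it, matching the t0 values in the logs before and after the fix.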
After resolving these hazards, we encountered an unexpected issue with arithmetic operations. The log files below display the state history of saved registers and control signals. At clock cycle 64, the instruction 0x408a5a13 (srai s4, s4, 8) is loaded, and it is executed at clock cycle 66. By clock cycle 68, the value written to s4 shows that a logical right shift was performed, without the sign extension required for srai or sra instructions. This is evident from the ALUOp value of 5 at clock cycle 66, which corresponds to SRL instead of SRA.
[ 64 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 00000000, s4: 00000000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[ 65 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 00000000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[ 66 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 04000000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[ 67 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 83ff0000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[ 68 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 0083ff00, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[ 69 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 0083ff00, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[ 64 ] idx: 64 op: 0x33 rs1: 18 (0x7fff0000) rs2: 20 (0x00000000) alu_arg1: 0x00000000 alu_arg2: 0x04000000 inst: 0x408a5a13 alu_ctrl_op_A: 3 alu_forward_a: 0 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 19 IDEX_rs1: 0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x0000002b IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x7fff0000 fwd_rs1: 0 MEM_WB_RD: 8 io_rs1: 0 io_rs2: 0 MEM_WB_RD_Data: 00000000 ALUOp: 0
[ 65 ] idx: 65 op: 0x13 rs1: 20 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0x7fff0000 alu_arg2: 0x04000000 inst: 0x7f8002b7 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 0 alu_forward_b: 2 EXMEM_rd: 20 IDEX_rs1: 18 IDEX_rs1_data_out: 0x7fff0000 EXMEM_alu_out: 0x04000000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 19 io_rs1: 18 io_rs2: 20 MEM_WB_RD_Data: 0000002b ALUOp: 0
[ 66 ] idx: 66 op: 0x37 rs1: 0 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0x83ff0000 alu_arg2: 0x00000408 inst: 0x005a7a33 alu_ctrl_op_A: 0 alu_forward_a: 2 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 20 IDEX_rs1: 20 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x83ff0000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 20 io_rs1: 20 io_rs2: 0 MEM_WB_RD_Data: 04000000 ALUOp: 5
[ 67 ] idx: 67 op: 0x33 rs1: 20 (0x83ff0000) rs2: 5 (0x00000001) alu_arg1: 0x00000000 alu_arg2: 0x7f800000 inst: 0xfff90a93 alu_ctrl_op_A: 3 alu_forward_a: 0 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 20 IDEX_rs1: 0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x0083ff00 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x83ff0000 fwd_rs1: 1 MEM_WB_RD: 20 io_rs1: 0 io_rs2: 0 MEM_WB_RD_Data: 83ff0000 ALUOp: 0
[ 68 ] idx: 68 op: 0x13 rs1: 18 (0x7fff0000) rs2: 0 (0x00000000) alu_arg1: 0x0083ff00 alu_arg2: 0x7f800000 inst: 0x01fada93 alu_ctrl_op_A: 0 alu_forward_a: 1 alu_ctrl_op_B: 0 alu_forward_b: 2 EXMEM_rd: 5 IDEX_rs1: 20 IDEX_rs1_data_out: 0x83ff0000 EXMEM_alu_out: 0x7f800000 IDEX_rs2_data: 0x00000001 IDEX_rs1_data_in: 0x7fff0000 fwd_rs1: 0 MEM_WB_RD: 20 io_rs1: 20 io_rs2: 5 MEM_WB_RD_Data: 0083ff00 ALUOp: 7
[ 69 ] idx: 69 op: 0x13 rs1: 21 (0x00000000) rs2: 0 (0x00000000) alu_arg1: 0x7fff0000 alu_arg2: 0xffffffff inst: 0x013912b3 alu_ctrl_op_A: 0 alu_forward_a: 0 alu_ctrl_op_B: 1 alu_forward_b: 0 EXMEM_rd: 20 IDEX_rs1: 18 IDEX_rs1_data_out: 0x7fff0000 EXMEM_alu_out: 0x00800000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1: 0 MEM_WB_RD: 5 io_rs1: 18 io_rs2: 0 MEM_WB_RD_Data: 7f800000 ALUOp: 0
Through debugging and observation, we discovered that some instructions were not implemented correctly. Specifically, in the ALU module, SRA and SRAI should be assigned an ALUOp value of 13 instead of 5.
// AluOpCode (ALU.scala)
val ALU_ADD = 0.U(5.W)
val ALU_ADDI = 0.U(5.W)
val ALU_SW = 0.U(5.W)
val ALU_LW = 0.U(5.W)
val ALU_LUI = 0.U(5.W)
val ALU_AUIPC = 0.U(5.W)
val ALU_SLL = 1.U(5.W)
val ALU_SLLI = 1.U(5.W)
val ALU_SLT = 2.U(5.W)
val ALU_SLTI = 2.U(5.W)
val ALU_SLTU = 3.U(5.W)
val ALU_SLTUI = 3.U(5.W)
val ALU_XOR = 4.U(5.W)
val ALU_XORI = 4.U(5.W)
val ALU_SRL = 5.U(5.W)
val ALU_SRLI = 5.U(5.W)
val ALU_OR = 6.U(5.W)
val ALU_ORI = 6.U(5.W)
val ALU_AND = 7.U(5.W)
val ALU_ANDI = 7.U(5.W)
val ALU_SUB = 8.U(5.W)
val ALU_SRA = 13.U(5.W)
val ALU_SRAI = 13.U(5.W)
val ALU_JAL = 31.U(5.W)
val ALU_JALR = 31.U(5.W)
The original ALU code only supported I-type instructions with operation codes less than 8, because it only considered func3 values from 000 to 111 (0 to 7).
// AluControl (Alu_Control.scala)
// R type
when (io.aluOp === 0.U) {
io.out := Cat(0.U(2.W), io.func7, io.func3)
// I type
}.elsewhen (io.aluOp === 1.U) {
io.out := Cat("b00".U(2.W), io.func3)
}
To fix this issue, we referred to the RISC-V instruction set and extended the ALU module to include the missing instructions. The following table illustrates the I-type and R-type instructions.
To accurately calculate the aluOp code, the ALU Control unit must consider the entire func7 field of I-type and R-type instructions. Originally, func7 was defined as a boolean in ALU Control, which was incorrect. We rectified this by defining func7 as a 7-bit unsigned integer.
// AluControl (Alu_Control.scala)
val io = IO(new Bundle {
val func3 = Input(UInt(3.W))
val func7 = Input(UInt(7.W)) // changed from Input(Bool())
val aluOp = Input(UInt(3.W))
val out = Output(UInt(5.W))
})
Additionally, outside of ALU Control, we ensured that the correct bits for func7 are properly extracted.
// PIPELINE (Main.scala)
ID_EX_.io.func7_in := IF_ID_.io.SelectedInstr_out(31, 25) // changed from IF_ID_.io.SelectedInstr_out(30)
The revised ALU Control code snippet below now supports the SRA, SRAI, and SUB instructions, which have operation codes greater than 7.
// AluControl (Alu_Control.scala)
// R type
when (io.aluOp === 0.U) {
when ((io.func7 === "b0100000".U(7.W)) && (io.func3 === "b000".U(3.W))) {
io.out := 8.U // SUB, originally broken
}.elsewhen ((io.func7 === "b0100000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
io.out := 13.U // SRA, originally broken
}.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
io.out := 5.U // SRL
}.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b001".U(3.W))) {
io.out := 1.U // SLL
}.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b010".U(3.W))) {
io.out := 2.U // SLT
}.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b011".U(3.W))) {
io.out := 3.U // SLTU
}.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b100".U(3.W))) {
io.out := 4.U // XOR
}.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b111".U(3.W))) {
io.out := 7.U // AND
}.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b110".U(3.W))) {
io.out := 6.U // OR
}.otherwise {
io.out := Cat(0.U(2.W), io.func7, io.func3)
}
// I type
}.elsewhen(io.aluOp === 1.U) {
when ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
io.out := 5.U // SRLI
}.elsewhen ((io.func7 === "b0100000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
io.out := 13.U // SRAI, originally broken
}.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b001".U(3.W))) {
io.out := 1.U // SLLI
}.otherwise {
io.out := Cat("b00".U(2.W), io.func3)
}
}
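As a cross-check of the decode above, the following is a minimal Python sketch of a software model, not the project's code. The function names (alu_control, srl32, sra32) are hypothetical, and the fallback path assumes that the 12-bit Cat(...) value is truncated to the low 5 bits when assigned to the 5-bit output. It decodes the srai instruction from the trace (0x408a5a13) and contrasts a logical and an arithmetic shift of 0x83ff0000 by 8.

```python
MASK32 = 0xFFFFFFFF

# (func7, func3) -> ALU code, mirroring the R-type branch of the revised decode.
R_TYPE = {
    (0b0100000, 0b000): 8,   # SUB
    (0b0100000, 0b101): 13,  # SRA
    (0b0000000, 0b101): 5,   # SRL
    (0b0000000, 0b001): 1,   # SLL
    (0b0000000, 0b010): 2,   # SLT
    (0b0000000, 0b011): 3,   # SLTU
    (0b0000000, 0b100): 4,   # XOR
    (0b0000000, 0b111): 7,   # AND
    (0b0000000, 0b110): 6,   # OR
}

# (func7, func3) -> ALU code for the I-type shifts that need func7.
I_TYPE = {
    (0b0000000, 0b101): 5,   # SRLI
    (0b0100000, 0b101): 13,  # SRAI
    (0b0000000, 0b001): 1,   # SLLI
}


def alu_control(alu_op: int, func3: int, func7: int) -> int:
    """Return the 5-bit ALU code for the given control inputs."""
    if alu_op == 0:  # R type
        # Assumption: the wide Cat(...) fallback keeps only its low 5 bits.
        fallback = ((func7 << 3) | func3) & 0x1F
        return R_TYPE.get((func7, func3), fallback)
    if alu_op == 1:  # I type
        return I_TYPE.get((func7, func3), func3)
    return 0  # default value of io.out


def srl32(x: int, shamt: int) -> int:
    """Logical right shift on a 32-bit value."""
    return (x & MASK32) >> (shamt & 31)


def sra32(x: int, shamt: int) -> int:
    """Arithmetic right shift on a 32-bit value (sign bit is replicated)."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> (shamt & 31)) & MASK32


if __name__ == "__main__":
    inst = 0x408A5A13  # srai s4, s4, 8 (from the trace)
    func3, func7 = (inst >> 12) & 0x7, inst >> 25
    print(alu_control(1, func3, func7))       # ALU code selected for SRAI
    print(hex(srl32(0x83FF0000, 8)))          # value the faulty run produced
    print(hex(sra32(0x83FF0000, 8)))          # value an arithmetic shift gives
```

The logical shift reproduces the 0x0083ff00 seen in s4 at cycle 68, while the arithmetic shift keeps the sign bits set.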
Filepath: src/main/scala/Pipeline/UNits/Alu_Control.scala
To implement the M-extension, we need to modify the AluControl module to allow the func7 signal to be passed into the module. Below is the updated definition for the AluControl class:
class AluControl extends Module {
val io = IO(new Bundle {
val func3 = Input(UInt(3.W)) // 3-bit function code for RISC-V instructions
val func7 = Input(UInt(7.W)) // 7-bit function code for RISC-V instructions (used for M-extension)
val aluOp = Input(UInt(3.W)) // ALU operation selector
val out = Output(UInt(5.W)) // ALU operation output code
})
io.out := 0.U
...
}
In the R-type instruction logic, we need to add a condition to handle M-extension instructions. Specifically, when func7 equals b0000001, the instruction corresponds to an M-extension operation, such as multiplication (MUL), division (DIV), or remainder (REM). Below is the updated code for supporting M-extension:
// R type
when(io.aluOp === 0.U) {
// First, check for M-extension: func7 === "b0000001"
when(io.func7 === "b0000001".U) {
// M-extension operations (e.g., MUL, DIV, REM)
switch(io.func3) {
is("b000".U) { io.out := 14.U } // MUL
is("b001".U) { io.out := 15.U } // MULH
is("b010".U) { io.out := 16.U } // MULHSU
is("b011".U) { io.out := 17.U } // MULHU
is("b100".U) { io.out := 18.U } // DIV
is("b101".U) { io.out := 19.U } // DIVU
is("b110".U) { io.out := 20.U } // REM
is("b111".U) { io.out := 21.U } // REMU
}
...
}
The condition func7 === "b0000001" checks whether func7 equals b0000001, which indicates an M-extension instruction.
Each func3 value then selects an output code (14.U to 21.U), which corresponds to the predefined codes in the ALU.
Extending the ALU to Support M-extension Instructions
To fully implement the M-extension, we need to modify both the AluOpCode object and the ALU module. Below are the detailed steps with the modifications.
Modifying AluOpCode to Include M-extension Instruction Types
We add operation codes for the M-extension instructions (MUL, DIV, REM, etc.) in the AluOpCode object. These codes will represent each specific M-extension operation.
object AluOpCode {
...
// M-extension operations
val ALU_MUL = 14.U(5.W) // Multiplication
val ALU_MULH = 15.U(5.W) // Multiplication high (signed)
val ALU_MULHSU = 16.U(5.W) // Multiplication high (signed x unsigned)
val ALU_MULHU = 17.U(5.W) // Multiplication high (unsigned)
val ALU_DIV = 18.U(5.W) // Division (signed)
val ALU_DIVU = 19.U(5.W) // Division (unsigned)
val ALU_REM = 20.U(5.W) // Remainder (signed)
val ALU_REMU = 21.U(5.W) // Remainder (unsigned)
}
Each of these codes is used as the alu_Op value for the ALU.
Implementing M-extension Operations in the ALU Module
The ALU module is extended to perform the M-extension operations based on the alu_Op value provided.
class ALU extends Module {
val io = IO(new Bundle {
val in_A = Input(SInt(32.W)) // First operand
val in_B = Input(SInt(32.W)) // Second operand
val alu_Op = Input(UInt(5.W)) // ALU operation code
val out = Output(SInt(32.W)) // ALU result
})
val result = WireDefault(0.S(32.W)) // Default result is zero
switch(io.alu_Op) {
...
// M-extension operations
is(ALU_MUL) {
result := io.in_A * io.in_B // Standard multiplication
}
is(ALU_MULH) {
result := (io.in_A * io.in_B)(63, 32).asSInt // High 32 bits of signed multiplication
}
is(ALU_MULHSU) {
result := (io.in_A.asSInt * io.in_B.asUInt)(63, 32).asSInt // High 32 bits (signed x unsigned)
}
is(ALU_MULHU) {
result := (io.in_A.asUInt * io.in_B.asUInt)(63, 32).asSInt // High 32 bits of unsigned multiplication
}
is(ALU_DIV) {
result := io.in_A / io.in_B // Signed division
}
is(ALU_DIVU) {
result := (io.in_A.asUInt / io.in_B.asUInt).asSInt // Unsigned division
}
is(ALU_REM) {
result := io.in_A % io.in_B // Signed remainder
}
is(ALU_REMU) {
result := (io.in_A.asUInt % io.in_B.asUInt).asSInt // Unsigned remainder
}
}
io.out := result // Output the result
}
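When testing these operations, a software reference model is useful for the corner cases. The RISC-V specification defines the results of division by zero (quotient with all bits set, remainder equal to the dividend) and of signed overflow (-2^31 divided by -1 gives quotient -2^31 and remainder 0). Whether the hardware division and remainder operators above produce these values is not established here, so the following Python sketch only states what the ISA expects. All function names are hypothetical, and results are returned as unsigned 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF
INT_MIN = -(1 << 31)


def to_signed(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(a, b):     # low 32 bits of the product
    return (to_signed(a) * to_signed(b)) & MASK32


def mulh(a, b):    # high 32 bits, signed x signed
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32


def mulhsu(a, b):  # high 32 bits, signed x unsigned
    return ((to_signed(a) * (b & MASK32)) >> 32) & MASK32


def mulhu(a, b):   # high 32 bits, unsigned x unsigned
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


def div(a, b):     # signed division, rounding toward zero
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return MASK32                 # -1
    if sa == INT_MIN and sb == -1:
        return INT_MIN & MASK32       # overflow case
    q = abs(sa) // abs(sb)
    return (-q if (sa < 0) != (sb < 0) else q) & MASK32


def divu(a, b):    # unsigned division
    ua, ub = a & MASK32, b & MASK32
    return MASK32 if ub == 0 else ua // ub


def rem(a, b):     # signed remainder, sign follows the dividend
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return sa & MASK32
    if sa == INT_MIN and sb == -1:
        return 0
    r = abs(sa) % abs(sb)
    return (-r if sa < 0 else r) & MASK32


def remu(a, b):    # unsigned remainder
    ua, ub = a & MASK32, b & MASK32
    return ua if ub == 0 else ua % ub


if __name__ == "__main__":
    print(mul(6, 7), hex(div(-7, 2)), hex(rem(-7, 2)), hex(div(5, 0)))
```

Comparing simulation output against such a model for the zero-divisor and overflow inputs is one way to decide whether extra handling is needed in the ALU.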
Since branching or jumping occurs during the MEM stage, we need to flush both the IF/ID and ID/EX pipeline registers with NOP instructions (addi x0, x0, 0) and clear all control signals. The corrected code is shown below:
// PIPELINE (Main.scala)
when(HazardDetect.io.pc_forward === 1.B) {
PC.io.in := HazardDetect.io.pc_out
}.otherwise {
when(control_module.io.next_pc_sel === "b01".U) {
when(Branch_M.io.br_taken === 1.B && control_module.io.branch === 1.B) {
PC.io.in := ImmGen.io.SB_type
// Flush IF/ID
IF_ID_.io.pc_in := 0.S
IF_ID_.io.pc4_in := 0.U
IF_ID_.io.SelectedPC:= 0.S
IF_ID_.io.SelectedInstr := 0.U
// Flush ID_EX (set inputs to zero so next cycle ID_EX_ registers become 0)
ID_EX_.io.rs1_in := 0.U
ID_EX_.io.rs2_in := 0.U
ID_EX_.io.imm := 0.S
ID_EX_.io.func3_in := 0.U
ID_EX_.io.func7_in := 0.U
ID_EX_.io.rd_in := 0.U
// Also set the control signals to 0 so no writes occur:
ID_EX_.io.ctrl_MemWr_in := 0.U
ID_EX_.io.ctrl_MemRd_in := 0.U
ID_EX_.io.ctrl_MemToReg_in := 0.U
ID_EX_.io.ctrl_OpA_in := 0.U
ID_EX_.io.ctrl_OpB_in := 0.U
ID_EX_.io.ctrl_Branch_in := 0.U
ID_EX_.io.ctrl_nextpc_in := 0.U
ID_EX_.io.IFID_pc4_in := 0.U
ID_EX_.io.rs1_data_in := 0.S
ID_EX_.io.rs2_data_in := 0.S
}.otherwise {
PC.io.in := PC4.io.out.asSInt
}
}.elsewhen(control_module.io.next_pc_sel === "b10".U) {
PC.io.in := ImmGen.io.UJ_type
// Flush IF/ID
IF_ID_.io.pc_in := 0.S
IF_ID_.io.pc4_in := 0.U
IF_ID_.io.SelectedPC:= 0.S
IF_ID_.io.SelectedInstr := 0.U
// Flush ID_EX (set inputs to zero so next cycle ID_EX_ registers become 0)
ID_EX_.io.rs1_in := 0.U
ID_EX_.io.rs2_in := 0.U
ID_EX_.io.imm := 0.S
ID_EX_.io.func3_in := 0.U
ID_EX_.io.func7_in := 0.U
ID_EX_.io.rd_in := 0.U
// Also set the control signals to 0 so no writes occur:
ID_EX_.io.ctrl_MemWr_in := 0.U
ID_EX_.io.ctrl_MemRd_in := 0.U
ID_EX_.io.ctrl_MemToReg_in := 0.U
ID_EX_.io.ctrl_OpA_in := 0.U
ID_EX_.io.ctrl_OpB_in := 0.U
ID_EX_.io.ctrl_Branch_in := 0.U
ID_EX_.io.ctrl_nextpc_in := 0.U
ID_EX_.io.IFID_pc4_in := 0.U
ID_EX_.io.rs1_data_in := 0.S
ID_EX_.io.rs2_data_in := 0.S
}.elsewhen(control_module.io.next_pc_sel === "b11".U) {
PC.io.in := JALR.io.out.asSInt
// Flush IF/ID
IF_ID_.io.pc_in := 0.S
IF_ID_.io.pc4_in := 0.U
IF_ID_.io.SelectedPC:= 0.S
IF_ID_.io.SelectedInstr := 0.U
// Flush ID_EX (set inputs to zero so next cycle ID_EX_ registers become 0)
ID_EX_.io.rs1_in := 0.U
ID_EX_.io.rs2_in := 0.U
ID_EX_.io.imm := 0.S
ID_EX_.io.func3_in := 0.U
ID_EX_.io.func7_in := 0.U
ID_EX_.io.rd_in := 0.U
// Also set the control signals to 0 so no writes occur:
ID_EX_.io.ctrl_MemWr_in := 0.U
ID_EX_.io.ctrl_MemRd_in := 0.U
ID_EX_.io.ctrl_MemToReg_in := 0.U
ID_EX_.io.ctrl_OpA_in := 0.U
ID_EX_.io.ctrl_OpB_in := 0.U
ID_EX_.io.ctrl_Branch_in := 0.U
ID_EX_.io.ctrl_nextpc_in := 0.U
ID_EX_.io.IFID_pc4_in := 0.U
ID_EX_.io.rs1_data_in := 0.S
ID_EX_.io.rs2_data_in := 0.S
}.otherwise {
PC.io.in := PC4.io.out.asSInt
}
}
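For reference, the NOP named above (addi x0, x0, 0) has a fixed I-type encoding. The short Python sketch below builds I-type instruction words from their fields; the helper name encode_i_type is hypothetical. It also reproduces two instruction words that appear in the trace earlier in this report, which is a convenient way to confirm the field layout.

```python
def encode_i_type(opcode: int, rd: int, funct3: int, rs1: int, imm: int) -> int:
    """Pack an RV32I I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return (
        ((imm & 0xFFF) << 20)
        | ((rs1 & 0x1F) << 15)
        | ((funct3 & 0x7) << 12)
        | ((rd & 0x1F) << 7)
        | (opcode & 0x7F)
    )


OP_IMM = 0x13  # opcode shared by addi, srli, srai, ...

# addi x0, x0, 0: the canonical NOP
NOP = encode_i_type(OP_IMM, rd=0, funct3=0b000, rs1=0, imm=0)

# srai s4, s4, 8: the immediate carries funct7 = 0100000 in its upper bits
SRAI_S4 = encode_i_type(OP_IMM, rd=20, funct3=0b101, rs1=20, imm=(0b0100000 << 5) | 8)

# addi s5, s2, -1
ADDI_S5 = encode_i_type(OP_IMM, rd=21, funct3=0b000, rs1=18, imm=-1)

if __name__ == "__main__":
    print(f"{NOP:08x} {SRAI_S4:08x} {ADDI_S5:08x}")
```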
This RISC-V assembly program finds the index of the maximum value in a predefined integer array. It initializes the array with three elements (0, 2, 1) and iterates through it to compare each element with the current maximum value. The program uses registers to track the current maximum value (t0), its index (t1), and the current index (t2). If a larger value is found, both the maximum value and its index are updated. Once the loop completes, the index of the maximum value is stored in register a0, and the program exits using a system call. This implementation demonstrates basic array traversal and conditional updates in assembly.
.data
array: .word 0, 0, 0
.text
_start:
la a0, array
li t1, 0
addi s0, s0, 0
addi s0, s0, 0
sw t1, 0(a0)
li t1, 2
addi s0, s0, 0
addi s0, s0, 0
sw t1, 4(a0)
li t1, 1
addi s0, s0, 0
addi s0, s0, 0
sw t1, 8(a0)
li a1, 3
argmax:
li t6, 1
lw t0, 0(a0)
li t1, 0
li t2, 1
loop_start:
beq t2, a1, end
addi s0, s0, 0
addi s0, s0, 0
addi a0, a0, 4
addi s0, s0, 0
addi s0, s0, 0
lw t3, 0(a0)
addi s0, s0, 0
addi s0, s0, 0
bge t3, t0, set_max_num
addi s0, s0, 0
addi s0, s0, 0
addi t2, t2, 1
addi s0, s0, 0
addi s0, s0, 0
j loop_start
addi s0, s0, 0
addi s0, s0, 0
set_max_num:
mv t0, t3
mv t1, t2
addi t2, t2, 1
addi s0, s0, 0
addi s0, s0, 0
j loop_start
addi s0, s0, 0
addi s0, s0, 0
end:
mv a0, t1
li a7, 10
ecall
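The expected result of this program can be checked with a few lines of Python. This is a sketch of the same loop, with a hypothetical function name; it mirrors the bge comparison in the listing, so an element equal to the current maximum also updates the index.

```python
def argmax_like_listing(values):
    """Return the index of the maximum, following the listing's bge-based update."""
    best_val = values[0]   # t0
    best_idx = 0           # t1
    for idx in range(1, len(values)):   # t2 runs from 1 up to a1
        if values[idx] >= best_val:     # bge t3, t0, set_max_num
            best_val = values[idx]
            best_idx = idx
    return best_idx


if __name__ == "__main__":
    print(argmax_like_listing([0, 2, 1]))
```

For the array used above, the model returns index 1, the position of the value 2.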
This RISC-V assembly program calculates the number of leading zeros in a 32-bit integer. The program starts by loading a value (0x70000002) into register a0 and calls the my_clz function. In my_clz, the input value is processed using a bitmask (t3) initialized to 0x80000000 (representing the most significant bit). A loop checks each bit from left to right by performing a bitwise AND operation between the input value and the bitmask. If the current bit is 1, the loop exits; otherwise, the bitmask is right-shifted, and a counter (t1) is incremented. Once the loop completes, the count of leading zeros is returned in a0, and the program exits.
main:
li a0, 0x70000002
jal ra, my_clz
li a7, 10
ecall
my_clz:
mv t0, a0
li t1, 0
li t3, 0x80000000
clz_loop:
and t4, t0, t3
bne t4, x0, exit_clz
srli t3, t3, 1
addi t1, t1, 1
bnez t3, clz_loop
exit_clz:
mv a0, t1
ret
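The loop in my_clz can be modeled directly in Python. The following sketch (hypothetical function name) follows the listing step by step: test the masked bit, shift the mask right, increment the counter, and stop when the mask becomes zero.

```python
MASK32 = 0xFFFFFFFF


def clz_loop(x: int) -> int:
    """Count leading zeros of a 32-bit value with a walking bitmask."""
    x &= MASK32          # t0
    mask = 0x80000000    # t3
    count = 0            # t1
    while True:
        if x & mask:     # bne t4, x0, exit_clz
            break
        mask >>= 1       # srli t3, t3, 1
        count += 1       # addi t1, t1, 1
        if mask == 0:    # bnez t3, clz_loop falls through
            break
    return count


if __name__ == "__main__":
    print(clz_loop(0x70000002))
```

An all-zero input runs the loop 32 times and returns 32.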
This RISC-V assembly program calculates the absolute value of a 32-bit floating-point number. The program begins by loading the value 0xFFFFFFFF into register a0, representing the input, and then calls the fabsf function. Inside fabsf, a bitmask (0x7FFFFFFF) is loaded into t0, which clears the sign bit of the input number when applied using a bitwise AND operation. The result, stored back in a0, represents the absolute value of the input. Finally, the program exits the function and terminates using a system call.
main:
li a0, 0xFFFFFFFF
jal ra, fabsf
li a7, 10
ecall
fabsf:
li t0, 0x7FFFFFFF
and a0, a0, t0
jr ra
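The masking step can be tried on real floating-point values in Python. This sketch uses the standard struct module to move between a float and its 32-bit pattern; the helper names are hypothetical.

```python
import struct


def float_to_bits(value: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def fabsf_bits(bits: int) -> int:
    """Clear the sign bit, as the and with 0x7FFFFFFF does in the listing."""
    return bits & 0x7FFFFFFF


if __name__ == "__main__":
    print(fabsf_bits(0xFFFFFFFF))                            # input used above
    print(bits_to_float(fabsf_bits(float_to_bits(-1.5))))    # ordinary value
```

For the input 0xFFFFFFFF the result is 0x7FFFFFFF, which is 2147483647 when read as a signed integer.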
This RISC-V assembly program converts a 16-bit floating-point number (FP16) to a 32-bit floating-point number (FP32). The main function loads the FP16 value (0xFFFFFFFF) into register a0 and calls the fp16_to_fp32 function. Within fp16_to_fp32, the program handles sign extraction, normalization, and exponent adjustment. The my_clz function is used to calculate the number of leading zeros for normalization. The program adjusts the FP16 format to FP32 by aligning the mantissa, adding a bias to the exponent, and managing special cases like zeros, infinities, and NaNs. Finally, the result is constructed by combining the sign, exponent, and mantissa and is returned in a0. The program uses a stack for register saving and restoring during function calls to maintain execution context.
main:
li a0, 0xFFFFFFFF
jal ra, fp16_to_fp32
li a7, 10
ecall
my_clz:
my_clz_prologue:
add t0, x0, a0
my_clz_padding:
srli t1, t0, 1
or t0, t0, t1
srli t1, t0, 2
or t0, t0, t1
srli t1, t0, 4
or t0, t0, t1
srli t1, t0, 8
or t0, t0, t1
srli t1, t0, 16
or t0, t0, t1
my_clz_popcount:
srli t1, t0, 1
li t2, 0x55555555
and t1, t1, t2
sub t0, t0, t1
srli t1, t0, 2
li t2, 0x33333333
and t1, t1, t2
and t2, t0, t2
add t0, t1, t2
srli t1, t0, 4
add t1, t1, t0
li t2, 0x0F0F0F0F
and t0, t1, t2
srli t1, t0, 8
add t0, t0, t1
srli t1, t0, 16
add t0, t0, t1
li t2, 0x3F
and t0, t0, t2
li t1, 32
sub a0, t1, t0
my_clz_epilogue:
jr ra
fp16_to_fp32:
fp16_to_fp32_prologue:
addi sp, sp, -28
sw ra, 0(sp)
sw s0, 4(sp)
sw s1, 8(sp)
sw s2, 12(sp)
sw s3, 16(sp)
sw s4, 20(sp)
sw s5, 24(sp)
fp16_to_fp32_prologue_after:
slli s0, a0, 16
li s1, 0x80000000
and s1, s1, s0
li s2, 0x7FFFFFFF
and s2, s2, s0
mv a0, s2
jal ra, my_clz
li s3, 0
li t0, 5
slt t0, t0, a0
beq t0, x0, fp16_to_fp32_post_overflow_check
addi s3, a0, -5
fp16_to_fp32_post_overflow_check:
li s4, 0x04000000
add s4, s2, s4
srai s4, s4, 8
li t0, 0x7F800000
and s4, s4, t0
addi s5, s2, -1
srli s5, s5, 31
sll t0, s2, s3
srli t0, t0, 3
li t1, 0x70
sub t1, t1, s3
slli t1, t1, 23
add t0, t0, t1
or t0, t0, s4
not t1, s5
and t0, t0, t1
or a0, s1, t0
fp16_to_fp32_epilogue:
lw ra, 0(sp)
lw s0, 4(sp)
lw s1, 8(sp)
lw s2, 12(sp)
lw s3, 16(sp)
lw s4, 20(sp)
lw s5, 24(sp)
addi sp, sp, 28
jr ra
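To see what values the conversion should produce, the branch-free sequence can be ported to Python. This is a sketch under stated assumptions rather than a transcription of the listing: the function names are hypothetical, leading zeros are counted by bit smearing followed by a population count, and the zero mask is computed with an arithmetic shift so that a zero input maps to zero.

```python
MASK32 = 0xFFFFFFFF


def sra32(x: int, shamt: int) -> int:
    """Arithmetic right shift on a 32-bit value."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> shamt) & MASK32


def clz32(x: int) -> int:
    """Leading-zero count: smear the top set bit downward, then count ones."""
    x &= MASK32
    for shift in (1, 2, 4, 8, 16):
        x |= x >> shift
    return 32 - bin(x).count("1")


def fp16_to_fp32_bits(h: int) -> int:
    """Convert the FP16 pattern in the low 16 bits of h to an FP32 pattern."""
    w = (h << 16) & MASK32
    sign = w & 0x80000000
    nonsign = w & 0x7FFFFFFF

    # Subnormal inputs are shifted left until the exponent field is reached.
    lz = clz32(nonsign)
    renorm_shift = lz - 5 if lz > 5 else 0

    # All exponent bits set for infinity/NaN inputs, zero otherwise.
    inf_nan_mask = sra32((nonsign + 0x04000000) & MASK32, 8) & 0x7F800000

    # All ones when nonsign == 0 (assumption: arithmetic shift intended).
    zero_mask = sra32((nonsign - 1) & MASK32, 31)

    body = (((nonsign << renorm_shift) & MASK32) >> 3) \
        + (((0x70 - renorm_shift) << 23) & MASK32)
    body &= MASK32
    return sign | ((body | inf_nan_mask) & ~zero_mask & MASK32)


if __name__ == "__main__":
    print(hex(fp16_to_fp32_bits(0xFFFFFFFF)))  # input used by main above
    print(hex(fp16_to_fp32_bits(0x3C00)))      # FP16 encoding of 1.0
```

For the input used in main, the model yields 0xFFFFE000, which is -8192 when read as a signed 32-bit integer.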
This RISC-V assembly program performs multiplication using the shift-and-add method, which is a bitwise algorithm. It takes two numbers (multiplier and multiplicand) and calculates their product without using the mul instruction. When the multiplier is negative, the program negates both operands before the loop, which leaves the product unchanged. It then uses a loop counter initialized to 32 to iterate through each bit of the multiplier. For each bit, it conditionally adds the multiplicand to an accumulator if the bit is 1. The multiplier is shifted right, and the multiplicand is shifted left after each iteration. The result is stored in a0 at the end, and the program exits.
main:
li a1, 6
li a3, 7
li t0, 0
li t1, 32
bltz a1, handle_negative1
j shift_and_add_loop
bltz a3, handle_negative2
j shift_and_add_loop
handle_negative1:
neg a1, a1
handle_negative2:
neg a3, a3
shift_and_add_loop:
beqz t1, end_shift_and_add
andi t2, a1, 1
beqz t2, skip_add
add t0, t0, a3
skip_add:
srai a1, a1, 1
slli a3, a3, 1
addi t1, t1, -1
j shift_and_add_loop
end_shift_and_add:
mv a0, t0
li a7, 10
ecall
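The algorithm can be modeled in Python with explicit 32-bit wraparound. The sketch below (hypothetical function names) follows the control flow of the listing: if the multiplier is negative, both operands are negated, and then 32 iterations of test, add, and shift are performed.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def shift_and_add(multiplier: int, multiplicand: int) -> int:
    """Return the low 32 bits of the product as a signed integer."""
    a1 = multiplier & MASK32
    a3 = multiplicand & MASK32
    if a1 & 0x80000000:          # bltz a1, handle_negative1
        a1 = (-a1) & MASK32      # neg a1, a1
        a3 = (-a3) & MASK32      # neg a3, a3 (fall-through)
    acc = 0                      # t0
    for _ in range(32):          # t1 counts down from 32
        if a1 & 1:               # andi t2, a1, 1
            acc = (acc + a3) & MASK32
        a1 = (to_signed(a1) >> 1) & MASK32   # srai a1, a1, 1
        a3 = (a3 << 1) & MASK32              # slli a3, a3, 1
    return to_signed(acc)


if __name__ == "__main__":
    print(shift_and_add(6, 7))
```

Because the accumulator wraps at 32 bits, a negative multiplicand needs no separate handling in this model.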
class TOPTest extends FreeSpec with ChiselScalatestTester{
"argmax test" in{
test(new PIPELINE("/home/mi2s/FProject/compilation/argmax.txt")){
x =>
x.clock.step(69)
x.io.out.expect(1.S)
}
}
"clz test" in{
test(new PIPELINE("/home/mi2s/FProject/compilation/clz.txt")){
x =>
x.clock.step(200)
x.io.out.expect(15.S)
}
}
"fabsf test" in {
test(new PIPELINE("/home/mi2s/FProject/test_compilation/fabsf.txt")){
x =>
x.clock.step(200)
x.io.out.expect(2147483647.S)
}
}
"fp16_to_32 test" in {
test(new PIPELINE("/home/mi2s/FProject/test_compilation/fp16_to_32.txt")){
x =>
x.clock.step(107)
x.io.out.expect(-8192.S)
}
}
"multiply test" in{
test(new PIPELINE("/home/mi2s/FProject/compilation/multiply.txt")){
x =>
x.clock.step(370)
x.io.out.expect(42.S)
}
}
}
[info] TOPTest:
[info] - argmax test
[info] - clz test
[info] - fabsf test
[info] - fp16_to_32 test
[info] - multiply test
[info] Run completed in 4 seconds, 621 milliseconds.
[info] Total number of tests run: 6
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 8 s, completed Jan 23, 2025, 6:23:41 PM
sbt test
To save the execution history as a file, use sbt test > <filename.txt>.
git clone https://github.com/riscv/riscv-gnu-toolchain
cd riscv-gnu-toolchain
sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ninja-build
make linux
Assemble and link (*.s to *.elf)
riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib -o <out_name>.elf <in_name>.s
For RISC-V programs utilizing the M-extension, change to -march=rv32im.
Convert to raw binary (*.elf to *.bin)
riscv64-unknown-elf-objcopy -O binary <in_name>.elf <out_name>.bin
Convert to hex (*.elf to *.hex)
riscv64-unknown-elf-objcopy -O verilog <in_name>.elf <out_name>.hex
The compiled program must be post-processed before use, because the generated file is encoded in little-endian byte order and contains special characters and whitespace.
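One possible post-processing step is sketched below in Python. It assumes the input is the text produced by objcopy -O verilog, that is, address markers starting with @ followed by whitespace-separated byte values in hexadecimal, and it assumes the desired output is one 32-bit instruction word per line. The output format expected by the PIPELINE memory loader is not specified here, so treat both the function name and the target format as hypothetical.

```python
def verilog_hex_to_words(text: str) -> list[str]:
    """Turn a byte-wise hex dump into 32-bit words (8 hex digits each).

    Address markers (tokens starting with '@') are skipped, so this sketch
    only suits images that consist of one contiguous section.
    """
    data = bytearray()
    for token in text.split():
        if token.startswith("@"):
            continue                      # section address marker
        data.append(int(token, 16))       # one byte per token

    while len(data) % 4:                  # pad the tail to a whole word
        data.append(0)

    # Bytes are stored least-significant first, so assemble little-endian.
    return [
        f"{int.from_bytes(data[i:i + 4], 'little'):08x}"
        for i in range(0, len(data), 4)
    ]


if __name__ == "__main__":
    sample = "@00000000\n13 00 00 00 13 5A 8A 40\n"
    for word in verilog_hex_to_words(sample):
        print(word)
```

In the sample, the byte sequence 13 5A 8A 40 becomes the word 408a5a13, the srai instruction seen in the trace.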