Construct RISC-V in Chisel

蕭郁霖, 徐向廷

A. Repository Study

5-Stage-RV32I

1. Basic Components

1.1 Register File

Filepath: src/main/scala/Pipeline/UNits/RegisterFile.scala

class RegisterFile extends Module {
  val io = IO(new Bundle {
    val rs1       = Input(UInt(5.W))
    val rs2       = Input(UInt(5.W))
    val reg_write = Input(Bool())
    val w_reg     = Input(UInt(5.W))
    val w_data    = Input(SInt(32.W))
    val rdata1    = Output(SInt(32.W))
    val rdata2    = Output(SInt(32.W))
  })
  val regfile = RegInit(VecInit(Seq.fill(32)(0.S(32.W))))

  io.rdata1 := Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1))
  io.rdata2 := Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2))

  when(io.reg_write && io.w_reg =/= 0.U) {
    regfile(io.w_reg) := io.w_data
  }
}

The code snippet defines a RegisterFile module for a RISC-V pipeline with seven ports: five inputs and two outputs. As in the classic MIPS pipeline, the register file has two read ports (rs1 and rs2) and a single write port. The file is instantiated as 32 registers, all reset to 0. The outputs rdata1 and rdata2 are combinational reads selected by rs1 and rs2, with a special check so that reading register 0 always returns 0. For writes, if the reg_write flag is asserted and the target register (w_reg) is not zero, the corresponding register is updated with the value on w_data at the next clock edge. The following image illustrates the seven ports that facilitate these operations in the RegisterFile unit.
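The read and write rules described above can be mirrored in a small software model. The sketch below is plain Scala rather than Chisel, so it runs without any hardware toolchain; the names RegFileModel, read, and write are hypothetical and were introduced only for illustration.

```scala
// Software reference model of the register file's behavior (not hardware).
// RegFileModel, read and write are illustrative names, not part of the repository.
final class RegFileModel {
  private val regs = Array.fill(32)(0) // 32 registers, all reset to 0

  // Reading x0 always yields 0; other registers return their stored value.
  def read(rs: Int): Int = if (rs == 0) 0 else regs(rs)

  // A write takes effect only when regWrite is set and the target is not x0.
  def write(regWrite: Boolean, rd: Int, data: Int): Unit =
    if (regWrite && rd != 0) regs(rd) = data
}
```

A write to register 0 is silently dropped in this model, which matches the guard on w_reg in the when block of the module.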

1.2 PC

Filepath: src/main/scala/Pipeline/UNits/PC.scala

class PC extends Module {
    val io = IO (new Bundle {
        val in = Input(SInt(32.W))
        val out = Output(SInt(32.W))
    })
    val PC = RegInit(0.S(32.W))
    io.out := PC
    PC := io.in
}

The code snippet implements the program counter (PC) module, which holds the address of the current instruction. It uses RegInit to reset the register to 0, loads the input (io.in) into the register on every clock cycle, and exposes the stored value via io.out.

1.3 PC + 4

Filepath: src/main/scala/Pipeline/UNits/PC4.scala

class PC4 extends Module {
    val io = IO (new Bundle {
        val pc = Input(UInt(32.W))
        val out = Output(UInt(32.W))
    })
    io.out := 0.U
    io.out := io.pc + 4.U(32.W)
}

The code snippet defines the PC4 module, which computes the address of the next sequential instruction by adding 4 to the current PC input (io.pc). Every RV32I instruction is 4 bytes long, so this increment drives sequential instruction execution in the pipeline.

1.4 JALR

Filepath: src/main/scala/Pipeline/UNits/JALR.scala

class Jalr extends Module {
  val io = IO(new Bundle {
    val imme = Input(UInt(32.W))
    val rdata1 = Input(UInt(32.W))
    val out = Output(UInt(32.W))
  })
  val computedAddr = io.imme + io.rdata1

  // Align the address by masking the least significant bit (LSB) to 0
  io.out := computedAddr & "hFFFFFFFE".U
}

The code snippet above implements the address calculation for the jump-and-link-register (JALR) instruction. The module computes the target address by adding the base register value (rdata1, which may be a forwarded value) to an immediate offset (imme). It then ANDs the sum with the mask 0xFFFFFFFE, forcing the least significant bit (LSB) to 0, as the RISC-V specification requires for JALR targets. The aligned jump address is finally provided through io.out.
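As a worked example of the masking step, the following plain Scala sketch reproduces the same arithmetic in software. JalrModel and target are hypothetical names used only here; the model assumes 32-bit wraparound addition, which is what Scala's Int provides.

```scala
// Software model of the JALR target computation (plain Scala, not Chisel).
// JalrModel and target are illustrative names, not part of the repository.
object JalrModel {
  // Add base and offset with 32-bit wraparound, then clear bit 0.
  def target(rdata1: Int, imme: Int): Int = (rdata1 + imme) & 0xFFFFFFFE
}
```

For instance, a base of 0x1000 with an offset of 7 gives the sum 0x1007, and clearing bit 0 yields the target 0x1006.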

1.5 Imm-Generator

Filepath: src/main/scala/Pipeline/UNits/ImmGenerator.scala

class ImmGenerator extends Module {
  val io = IO(new Bundle {
    val instr = Input(UInt(32.W))
    val pc = Input(UInt(32.W))
    val I_type = Output(SInt(32.W))
    val S_type = Output(SInt(32.W))
    val SB_type = Output(SInt(32.W))
    val U_type = Output(SInt(32.W))
    val UJ_type = Output(SInt(32.W))
  })

  // I-Type Immediate: [31:20] sign-extended to 32 bits
  io.I_type := Cat(Fill(20, io.instr(31)), io.instr(31, 20)).asSInt

  // S-Type Immediate: [31:25][11:7] sign-extended to 32 bits
  io.S_type := Cat(Fill(20, io.instr(31)), io.instr(31, 25), io.instr(11, 7)).asSInt

  // Branch-Type Immediate: [31][7][30:25][11:8] sign-extended to 32 bits
  val sbImm = Cat(Fill(19, io.instr(31)), io.instr(31), io.instr(7), io.instr(30, 25), io.instr(11, 8), 0.U(1.W)).asSInt
  io.SB_type := sbImm + io.pc.asSInt

  // U-Type Immediate: [31:12] shifted left by 12 bits
  io.U_type := Cat(io.instr(31, 12), Fill(12, 0.U)).asSInt

  // UJ-Type Immediate: [31][19:12][20][30:21] sign-extended to 32 bits, shifted left by 1 bit
  val ujImm = Cat(Fill(11, io.instr(31)), io.instr(31), io.instr(19, 12), io.instr(20), io.instr(30, 21), 0.U(1.W)).asSInt
  io.UJ_type := ujImm + io.pc.asSInt
}

The code snippet implements the generation of 32-bit immediate values from RISC-V instructions, tailored to each instruction format. For I-type instructions, it extracts bits [31:20] from the instruction and sign-extends them to 32 bits. In the case of S-type instructions, the immediate is formed by concatenating bits [31:25] with bits [11:7] and then sign-extending the result. For branch (SB-type) instructions, the immediate is built by concatenating several segments—bit 31, bit 7, bits [30:25], and bits [11:8]—with an additional 0 appended as the least significant bit for proper alignment, followed by sign extension. For U-type instructions, the immediate is taken from bits [31:12] and shifted left by 12 bits. Finally, for UJ-type instructions, the immediate is generated by concatenating bit 31, bits [19:12], bit 20, and bits [30:21], appending a trailing 0, and then sign-extending the result to 32 bits.

Additionally, the module computes target addresses for control flow instructions using these immediates. The output io.SB_type represents the branch target address for SB-type instructions, obtained by adding the sign-extended branch immediate to the current program counter (PC), thus yielding a PC-relative address for branch operations. Similarly, io.UJ_type provides the target address for UJ-type (jump) instructions by adding the corresponding immediate value to the current PC. These computed addresses are essential for correctly directing the control flow during instruction execution in the RISC-V pipeline.
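The bit shuffling described above is easier to check against concrete instruction words. The sketch below is a plain Scala software model, not the Chisel module; ImmModel and its method names are hypothetical. Unlike io.SB_type and io.UJ_type, the sbOffset and ujOffset functions return only the offset and leave the PC addition to the caller.

```scala
// Software model of the immediate extraction (plain Scala, not Chisel).
// ImmModel and its methods are illustrative names, not part of the repository.
object ImmModel {
  // Bits hi..lo of a 32-bit word, as an unsigned value.
  private def bits(x: Int, hi: Int, lo: Int): Int =
    (x >>> lo) & ((1 << (hi - lo + 1)) - 1)

  // Sign-extend the low `width` bits of x to a full 32-bit Int.
  private def signExtend(x: Int, width: Int): Int =
    (x << (32 - width)) >> (32 - width)

  def iType(instr: Int): Int = signExtend(bits(instr, 31, 20), 12)

  def sType(instr: Int): Int =
    signExtend((bits(instr, 31, 25) << 5) | bits(instr, 11, 7), 12)

  // Branch offset only; the module above additionally adds the PC.
  def sbOffset(instr: Int): Int =
    signExtend(
      (bits(instr, 31, 31) << 12) | (bits(instr, 7, 7) << 11) |
        (bits(instr, 30, 25) << 5) | (bits(instr, 11, 8) << 1),
      13)

  def uType(instr: Int): Int = instr & 0xFFFFF000

  // Jump offset only; the module above additionally adds the PC.
  def ujOffset(instr: Int): Int =
    signExtend(
      (bits(instr, 31, 31) << 20) | (bits(instr, 19, 12) << 12) |
        (bits(instr, 20, 20) << 11) | (bits(instr, 30, 21) << 1),
      21)
}
```

For example, the word 0xFFF00093 (addi x1, x0, -1) decodes to an I-type immediate of -1, and 0xFE000EE3 (beq x0, x0, -4) decodes to a branch offset of -4.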

1.6 Control Unit

Filepath: src/main/scala/Pipeline/UNits/control.scala

class Control extends Module {
  val io = IO(new Bundle {
    val opcode = Input(UInt(7.W))         // 7-bit opcode
    val mem_write = Output(Bool())        // asserted for stores (write to data memory)
    val branch = Output(Bool())           // asserted for conditional branches
    val mem_read = Output(Bool())         // asserted for loads (read from data memory)
    val reg_write = Output(Bool())        // asserted when the instruction writes a register
    val men_to_reg = Output(Bool())       // asserted when the written value comes from memory (loads)
    val alu_operation = Output(UInt(3.W)) // instruction class passed to the ALU control unit
    val operand_A = Output(UInt(2.W))     // operand A source selection for the ALU
    val operand_B = Output(Bool())        // operand B source selection for the ALU

    // Indicates the type of extension to be used (e.g., sign-extend, zero-extend)
    val extend = Output(UInt(2.W))
    val next_pc_sel = Output(UInt(2.W))   // next PC selection (e.g., PC+4, branch target, jump target)
  })
  io.mem_write := 0.B
  io.branch := 0.B
  io.mem_read := 0.B
  io.reg_write := 0.B
  io.men_to_reg := 0.B
  io.alu_operation := 0.U
  io.operand_A := 0.U
  io.operand_B := 0.B
  io.extend := 0.U
  io.next_pc_sel := 0.U

  switch(io.opcode) {
    // R type instructions (e.g., add, sub)
    is(51.U) {
      io.mem_write := 0.B
      io.branch := 0.B
      io.mem_read := 0.B
      io.reg_write := 1.B
      io.men_to_reg := 0.B
      io.alu_operation := 0.U
      io.operand_A := 0.U
      io.operand_B := 0.B
      io.extend := 0.U
      io.next_pc_sel := 0.U
    }

    // I type instructions (e.g., immediate operations)
    is(19.U) {
      io.mem_write := 0.B
      io.branch := 0.B
      io.mem_read := 0.B
      io.reg_write := 1.B
      io.men_to_reg := 0.B
      io.alu_operation := 1.U
      io.operand_A := 0.U
      io.operand_B := 1.B
      io.extend := 0.U
      io.next_pc_sel := 0.U
    }

    // S type instructions (e.g., store operations)
    is(35.U) {
      io.mem_write := 1.B
      io.branch := 0.B
      io.mem_read := 0.B
      io.reg_write := 0.B
      io.men_to_reg := 0.B
      io.alu_operation := 5.U
      io.operand_A := 0.U
      io.operand_B := 1.B
      io.extend := 1.U
      io.next_pc_sel := 0.U
    }

    // Load instructions (e.g., load data from memory)
    is(3.U) {
      io.mem_write := 0.B
      io.branch := 0.B
      io.mem_read := 1.B
      io.reg_write := 1.B
      io.men_to_reg := 1.B
      io.alu_operation := 4.U
      io.operand_A := 0.U
      io.operand_B := 1.B
      io.extend := 0.U
      io.next_pc_sel := 0.U
    }

    // SB type instructions (e.g., conditional branch)
    is(99.U) {
      io.mem_write := 0.B
      io.branch := 1.B
      io.mem_read := 0.B
      io.reg_write := 0.B
      io.men_to_reg := 0.B
      io.alu_operation := 2.U
      io.operand_A := 0.U
      io.operand_B := 0.B
      io.extend := 0.U
      io.next_pc_sel := 1.U
    }

    // UJ type instructions (e.g., jump and link)
    is(111.U) {
      io.mem_write := 0.B
      io.branch := 0.B
      io.mem_read := 0.B
      io.reg_write := 1.B
      io.men_to_reg := 0.B
      io.alu_operation := 3.U
      io.operand_A := 1.U
      io.operand_B := 0.B
      io.extend := 0.U
      io.next_pc_sel := 2.U
    }

    // Jalr instruction (e.g., jump and link register)
    is(103.U) {
      io.mem_write := 0.B
      io.branch := 0.B
      io.mem_read := 0.B
      io.reg_write := 1.B
      io.men_to_reg := 0.B
      io.alu_operation := 3.U
      io.operand_A := 1.U
      io.operand_B := 0.B
      io.extend := 0.U
      io.next_pc_sel := 3.U
    }

    // U type (LUI) instructions (e.g., load upper immediate)
    is(55.U) {
      io.mem_write := 0.B
      io.branch := 0.B
      io.mem_read := 0.B
      io.reg_write := 1.B
      io.men_to_reg := 0.B
      io.alu_operation := 6.U
      io.operand_A := 3.U
      io.operand_B := 1.B
      io.extend := 2.U
      io.next_pc_sel := 0.U
    }

    // U type (AUIPC) instructions (e.g., add immediate to PC)
    is(23.U) {
      io.mem_write := 0.B
      io.branch := 0.B
      io.mem_read := 0.B
      io.reg_write := 1.B
      io.men_to_reg := 0.B
      io.alu_operation := 7.U
      io.operand_A := 2.U
      io.operand_B := 1.B
      io.extend := 2.U
      io.next_pc_sel := 0.U
    }
  }
}

The code snippet above implements the control unit for a 5-stage RISC-V pipeline. This module generates a suite of control signals—such as memory write, branch, memory read, register write, memory-to-register, ALU operation, operand selection, extension type, and next PC selection—that steer the processor’s datapath. Using a switch-case construct keyed on the opcode, the module assigns specific values to these signals according to the instruction type (e.g., R-type, I-type, S-type, SB-type, U-type, UJ-type, etc.). The accompanying diagram and mapping table illustrate how these signals are routed to the appropriate hardware components in the pipeline.

Label	Signal Name (Code)	Signal Name (Diagram)
1	io.mem_write	MemWrite
2	io.branch	Branch
3	io.mem_read	MemRead
4	io.reg_write	RegWrite
5	io.men_to_reg	MemtoReg
6	io.alu_operation	ALUSrc
7	io.operand_A	ALUOp1
8	io.operand_B	ALUOp0

1.7 Branching Unit

Filepath: src/main/scala/Pipeline/UNits/BRANCH.scala

class Branch extends Module {
  val io = IO(new Bundle {
    val fnct3 = Input(UInt(3.W))
    val branch = Input(Bool())
    val arg_x = Input(SInt(32.W))
    val arg_y = Input(SInt(32.W))
    val br_taken = Output(Bool())
  })
  io.br_taken := false.B

  when(io.branch) {
    // beq
    when(io.fnct3 === 0.U) {
      io.br_taken := io.arg_x === io.arg_y
    }
    // bne
    .elsewhen(io.fnct3 === 1.U) {
      io.br_taken := io.arg_x =/= io.arg_y
    }
    // blt
    .elsewhen(io.fnct3 === 4.U) {
      io.br_taken := io.arg_x < io.arg_y
    }
    // bge
    .elsewhen(io.fnct3 === 5.U) {
      io.br_taken := io.arg_x >= io.arg_y
    }
    // bltu (unsigned less than)
    .elsewhen(io.fnct3 === 6.U) {
      io.br_taken := io.arg_x.asUInt < io.arg_y.asUInt
    }
    // bgeu (unsigned greater than or equal)
    .elsewhen(io.fnct3 === 7.U) {
      io.br_taken := io.arg_x.asUInt >= io.arg_y.asUInt
    }
  }
}

The code snippet implements the branch decision logic for RISC-V's conditional branch instructions: beq, bne, blt, bge, bltu, and bgeu. It uses four input ports: io.fnct3, which selects the branch condition from the instruction's funct3 field; io.branch, a Boolean flag indicating whether the current instruction is an SB-type branch; and io.arg_x and io.arg_y, the operands to be compared. Based on the value of fnct3, the module evaluates the appropriate signed or unsigned comparison between arg_x and arg_y and, if the condition is satisfied, sets io.br_taken to true, indicating that the branch should be taken. When io.branch is low, or fnct3 holds one of the unused encodings (2 or 3), io.br_taken keeps its default value of false.
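The difference between the signed and unsigned comparisons is the part most easily gotten wrong, so a small software model may help. The sketch below is plain Scala, not Chisel; BranchModel and taken are hypothetical names, and Integer.compareUnsigned from the Java standard library stands in for the asUInt comparisons.

```scala
// Software model of the branch decision (plain Scala, not Chisel).
// BranchModel and taken are illustrative names, not part of the repository.
object BranchModel {
  def taken(branch: Boolean, fnct3: Int, x: Int, y: Int): Boolean =
    branch && (fnct3 match {
      case 0 => x == y                               // beq
      case 1 => x != y                               // bne
      case 4 => x < y                                // blt  (signed)
      case 5 => x >= y                               // bge  (signed)
      case 6 => Integer.compareUnsigned(x, y) < 0    // bltu (unsigned)
      case 7 => Integer.compareUnsigned(x, y) >= 0   // bgeu (unsigned)
      case _ => false                                // unused encodings 2 and 3
    })
}
```

With x = -1 and y = 1, blt is taken because -1 is less than 1 as a signed value, while bltu is not taken because the same bit pattern is 0xFFFFFFFF when read as unsigned.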

1.8 ALU Control Unit

Filepath: src/main/scala/Pipeline/UNits/Alu_Control.scala

class AluControl extends Module {
  val io = IO(new Bundle {
    val func3 = Input(UInt(3.W))
    val func7 = Input(Bool())
    val aluOp = Input(UInt(3.W))
    val out = Output(UInt(5.W))
  })
  io.out := 0.U

  // R type
  when(io.aluOp === 0.U) {
    io.out := Cat(0.U(2.W), io.func7, io.func3)

  // I type
  }.elsewhen(io.aluOp === 1.U) {
    io.out := Cat("b00".U(2.W), io.func3)

  // SB type
  }.elsewhen(io.aluOp === 2.U) {
    io.out := Cat("b010".U(3.W), io.func3)

  // UJ type and JALR (jumps)
  }.elsewhen(io.aluOp === 3.U) {
    io.out := "b11111".U

  // Loads, S type, U type (lui), U type (auipc)
  }.elsewhen(io.aluOp === 4.U || io.aluOp === 5.U || io.aluOp === 6.U || io.aluOp === 7.U) {
    io.out := "b00000".U

  } .otherwise {
    io.out := 0.U
  }
}

The code snippet above implements the ALU Control Unit for a RISC-V pipeline, as illustrated in the diagram below. This unit has three input ports, func3, func7 (a single bit), and aluOp (a signal provided by the core control unit), and one 5-bit output port, io.out. How the output is formed depends on the instruction type. R-type instructions concatenate the func7 bit with func3, while I-type instructions prepend a fixed two-bit zero value to func3. Branch (SB-type) instructions prepend the constant b010 to func3. Jumps (aluOp 3) are assigned the constant b11111, and loads, stores, lui, and auipc are assigned b00000, which selects addition in the ALU.
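To see which 5-bit codes the unit produces, the encoding can be modeled in software. The sketch below is plain Scala with hypothetical names (AluControlModel, out). It assumes that when a six-bit concatenation is assigned to the five-bit io.out, only the low five bits are kept, so the SB-type case becomes 16 plus func3.

```scala
// Software model of the ALU control encoding (plain Scala, not Chisel).
// AluControlModel and out are illustrative names, not part of the repository.
object AluControlModel {
  // func7 is the single bit that the module receives as a Bool.
  def out(aluOp: Int, func3: Int, func7: Boolean): Int = aluOp match {
    case 0 => ((if (func7) 1 else 0) << 3) | func3 // R type: {func7, func3}
    case 1 => func3                                 // I type: {00, func3}
    case 2 => 16 | func3                            // SB type: low five bits of {010, func3}
    case 3 => 31                                    // jumps
    case _ => 0                                     // loads, stores, lui, auipc, anything else
  }
}
```

Under these assumptions, an R-type instruction with func3 = 0 and the func7 bit set maps to 8, and func3 = 5 with the func7 bit set maps to 13.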

1.9 ALU Unit

Filepath: src/main/scala/Pipeline/UNits/Alu.scala

object AluOpCode {
  val ALU_ADD     =   0.U(5.W)
  val ALU_ADDI    =   0.U(5.W)
  val ALU_SW      =   0.U(5.W)
  val ALU_LW      =   0.U(5.W)
  val ALU_LUI     =   0.U(5.W)
  val ALU_AUIPC   =   0.U(5.W)
  val ALU_SLL     =   1.U(5.W)
  val ALU_SLLI    =   1.U(5.W)
  val ALU_SLT     =   2.U(5.W)
  val ALU_SLTI    =   2.U(5.W)
  val ALU_SLTU    =   3.U(5.W)
  val ALU_SLTUI   =   3.U(5.W)
  val ALU_XOR     =   4.U(5.W)
  val ALU_XORI    =   4.U(5.W)
  val ALU_SRL     =   5.U(5.W)
  val ALU_SRLI    =   5.U(5.W)
  val ALU_OR      =   6.U(5.W)
  val ALU_ORI     =   6.U(5.W)
  val ALU_AND     =   7.U(5.W)
  val ALU_ANDI    =   7.U(5.W)
  val ALU_SUB     =   8.U(5.W)
  val ALU_SRA     =   13.U(5.W)
  val ALU_SRAI    =   13.U(5.W)
  val ALU_JAL     =   31.U(5.W)
  val ALU_JALR    =   31.U(5.W)
}

import AluOpCode._ // bring the ALU_* constants into scope

class ALU extends Module {
  val io = IO(new Bundle {
    val in_A = Input(SInt(32.W))
    val in_B = Input(SInt(32.W))
    val alu_Op = Input(UInt(5.W))
    val out = Output(SInt(32.W))
  })

  val result = WireDefault(0.S(32.W))
  switch(io.alu_Op) {
    is(ALU_ADD, ALU_ADDI, ALU_SW, ALU_LW, ALU_LUI, ALU_AUIPC) {
      result := io.in_A + io.in_B
    }
    is(ALU_SLL, ALU_SLLI) {
      result := (io.in_A.asUInt << io.in_B(4, 0)).asSInt
    }
    is(ALU_SLT, ALU_SLTI) {
      result := Mux(io.in_A < io.in_B, 1.S, 0.S)
    }
    is(ALU_SLTU, ALU_SLTUI) {
      result := Mux(io.in_A.asUInt < io.in_B.asUInt, 1.S, 0.S)
    }
    is(ALU_XOR, ALU_XORI) {
      result := io.in_A ^ io.in_B
    }
    is(ALU_SRL, ALU_SRLI) {
      result := (io.in_A.asUInt >> io.in_B(4, 0)).asSInt
    }
    is(ALU_OR, ALU_ORI) {
      result := io.in_A | io.in_B
    }
    is(ALU_AND, ALU_ANDI) {
      result := io.in_A & io.in_B
    }
    is(ALU_SUB) {
      result := io.in_A - io.in_B
    }
    is(ALU_SRA, ALU_SRAI) {
      result := (io.in_A >> io.in_B(4, 0)).asSInt
    }
    is(ALU_JAL, ALU_JALR) {
      result := io.in_A
    }
  }

  io.out := result
}

The code snippet implements the ALU unit for a RISC-V pipeline, responsible for executing the arithmetic and logical operations selected by the instruction type. The module accepts three input ports: two operands (io.in_A and io.in_B) and an operation code (io.alu_Op) coming from the ALU Control Unit. The result of the computation is output via io.out. For example, when io.alu_Op equals ALU_ADD or one of the constants that share its encoding (ALU_ADDI, ALU_SW, ALU_LW, ALU_LUI, ALU_AUIPC), the module computes the sum of io.in_A and io.in_B and assigns the result to io.out. An operation code that matches none of the cases leaves the result at its default value of 0.
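The operation table can be exercised in software to compare, for example, the logical and arithmetic right shifts. The following is a plain Scala sketch with hypothetical names (AluModel, exec); the numeric codes are the ones listed in AluOpCode, and the model assumes results are truncated to 32 bits.

```scala
// Software model of the ALU operations (plain Scala, not Chisel).
// AluModel and exec are illustrative names, not part of the repository.
object AluModel {
  def exec(op: Int, a: Int, b: Int): Int = op match {
    case 0  => a + b                                            // add, addi, lw, sw, lui, auipc
    case 1  => a << (b & 0x1F)                                  // sll: shift amount is b(4, 0)
    case 2  => if (a < b) 1 else 0                              // slt (signed)
    case 3  => if (Integer.compareUnsigned(a, b) < 0) 1 else 0  // sltu (unsigned)
    case 4  => a ^ b                                            // xor
    case 5  => a >>> (b & 0x1F)                                 // srl: zeros shifted in
    case 6  => a | b                                            // or
    case 7  => a & b                                            // and
    case 8  => a - b                                            // sub
    case 13 => a >> (b & 0x1F)                                  // sra: sign bit shifted in
    case 31 => a                                                // jal, jalr: pass in_A through
    case _  => 0                                                // default result
  }
}
```

Shifting -8 right by one position gives 0x7FFFFFFC with code 5 (srl) but -4 with code 13 (sra), which is the distinction the asUInt conversion in the module expresses.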

2. Pipeline Registers

Since the RISC-V pipeline consists of five stages, it requires four sets of pipeline registers. These registers are encapsulated in modules labeled IF/ID, ID/EX, EX/MEM, and MEM/WB, where the slash indicates the two adjacent stages that the register bridges. These pipeline registers are painted orange in the illustration below.

2.1 IF_ID Pipeline

Filepath: src/main/scala/Pipeline/Pipelines/IF_ID.scala

class IF_ID extends Module {
    val io = IO(new Bundle {
        val pc_in               = Input (SInt(32.W))         // PC in
        val pc4_in              = Input (UInt(32.W))         // PC4 in
        val SelectedPC          = Input (SInt(32.W))
        val SelectedInstr       = Input (UInt(32.W))

        val pc_out              = Output (SInt(32.W))        // PC out
        val pc4_out             = Output (UInt(32.W))        // PC + 4 out
        val SelectedPC_out      = Output (SInt(32.W))
        val SelectedInstr_out   = Output (UInt(32.W))
    })

    val Pc_In               = RegInit (0.S (32.W))
    val Pc4_In              = RegInit (0.U (32.W))
    val S_pc                = RegInit (0.S (32.W))
    val S_instr             = RegInit (0.U (32.W))

    Pc_In                   := io.pc_in
    Pc4_In                  := io.pc4_in
    S_pc                    := io.SelectedPC
    S_instr                 := io.SelectedInstr

    io.pc_out               := Pc_In
    io.pc4_out              := Pc4_In
    io.SelectedPC_out       := S_pc
    io.SelectedInstr_out    := S_instr

    // io.pc_out               := RegNext(io.pc_in)
    // io.pc4_out              := RegNext(io.pc4_in)
    // io.SelectedPC_out       := RegNext(io.SelectedPC)
    // io.SelectedInstr_out    := RegNext(io.SelectedInstr)
}

Although the illustration above shows only three register ports at IF/ID, the design also takes hazard detection into account (discussed later). In this context, the SelectedPC signal represents the program counter after hazard resolution. Consequently, the IF/ID pipeline register stores four values: io.pc_in, io.pc4_in, io.SelectedPC, and io.SelectedInstr. The registers are created with RegInit, which gives each of them a reset value of 0. The commented-out lines at the end of the module show an alternative formulation based on RegNext.
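The timing of such a register, where a value presented in one cycle becomes visible to the next stage one cycle later, can be sketched with a tiny cycle-level model. This is plain Scala, not Chisel, and PipeRegModel, out, and tick are hypothetical names introduced for illustration.

```scala
// Cycle-level software model of a single pipeline register (plain Scala, not Chisel).
// PipeRegModel, out and tick are illustrative names, not part of the repository.
final class PipeRegModel[T](resetValue: T) {
  private var stored: T = resetValue

  // Value visible to the next stage during the current cycle.
  def out: T = stored

  // Clock edge: capture the value presented by the previous stage.
  def tick(in: T): Unit = { stored = in }
}
```

Before the first clock edge the model returns its reset value, which corresponds to the 0 that RegInit assigns in the module above.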

2.2 ID_EX Pipeline

Filepath: src/main/scala/Pipeline/Pipelines/ID_EX.scala

class ID_EX extends Module {
  val io = IO(new Bundle {
    val rs1_in              = Input(UInt(5.W))
    val rs2_in              = Input(UInt(5.W))
    val rs1_data_in         = Input(SInt(32.W))
    val rs2_data_in         = Input(SInt(32.W))
    val imm                 = Input(SInt(32.W))
    val rd_in               = Input(UInt(5.W))
    val func3_in            = Input(UInt(3.W))
    val func7_in            = Input(Bool())
    val ctrl_MemWr_in       = Input(Bool())
    val ctrl_Branch_in      = Input(Bool())
    val ctrl_MemRd_in       = Input(Bool())
    val ctrl_Reg_W_in       = Input(Bool())
    val ctrl_MemToReg_in    = Input(Bool())
    val ctrl_AluOp_in       = Input(UInt(3.W))
    val ctrl_OpA_in         = Input(UInt(2.W))
    val ctrl_OpB_in         = Input(Bool())
    val ctrl_nextpc_in      = Input(UInt(2.W))
    val IFID_pc4_in         = Input(UInt(32.W))

    val rs1_out             = Output(UInt(5.W))
    val rs2_out             = Output(UInt(5.W))
    val rs1_data_out        = Output(SInt(32.W))
    val rs2_data_out        = Output(SInt(32.W))
    val rd_out              = Output(UInt(5.W))
    val imm_out             = Output(SInt(32.W))
    val func3_out           = Output(UInt(3.W))
    val func7_out           = Output(Bool())
    val ctrl_MemWr_out      = Output(Bool())
    val ctrl_Branch_out     = Output(Bool())
    val ctrl_MemRd_out      = Output(Bool())
    val ctrl_Reg_W_out      = Output(Bool())
    val ctrl_MemToReg_out   = Output(Bool())
    val ctrl_AluOp_out      = Output(UInt(3.W))
    val ctrl_OpA_out        = Output(UInt(2.W))
    val ctrl_OpB_out        = Output(Bool())
    val ctrl_nextpc_out     = Output(UInt(2.W))
    val IFID_pc4_out        = Output(UInt(32.W))
  })

  io.rs1_out            :=  RegNext(io.rs1_in)
  io.rs2_out            :=  RegNext(io.rs2_in)
  io.rs1_data_out       :=  RegNext(io.rs1_data_in)
  io.rs2_data_out       :=  RegNext(io.rs2_data_in)
  io.imm_out            :=  RegNext(io.imm)
  io.rd_out             :=  RegNext(io.rd_in)
  io.func3_out          :=  RegNext(io.func3_in)
  io.func7_out          :=  RegNext(io.func7_in)
  io.ctrl_MemWr_out     :=  RegNext(io.ctrl_MemWr_in)
  io.ctrl_Branch_out    :=  RegNext(io.ctrl_Branch_in)
  io.ctrl_MemRd_out     :=  RegNext(io.ctrl_MemRd_in)
  io.ctrl_Reg_W_out     :=  RegNext(io.ctrl_Reg_W_in)
  io.ctrl_MemToReg_out  :=  RegNext(io.ctrl_MemToReg_in)
  io.ctrl_AluOp_out     :=  RegNext(io.ctrl_AluOp_in)
  io.ctrl_OpA_out       :=  RegNext(io.ctrl_OpA_in)
  io.ctrl_OpB_out       :=  RegNext(io.ctrl_OpB_in)
  io.ctrl_nextpc_out    :=  RegNext(io.ctrl_nextpc_in)
  io.IFID_pc4_out       :=  RegNext(io.IFID_pc4_in)
}

The code snippet implements the ID/EX pipeline register, which captures and stores several critical values for the subsequent execution stage. In particular, it holds the operand data (rs1_data and rs2_data), the incremented program counter (IFID_pc4), and the immediate value (imm).

Additionally, it preserves the nine control signals generated during instruction decode so that they travel down the pipeline together with their instruction. The register addresses rs1, rs2, and rd are stored to support data forwarding in the event of hazards, and the function fields func3 and func7 are stored for use in the execution stage.

Here RegNext is used instead of RegInit: RegNext(x) creates a register that captures x on every clock edge and outputs it one cycle later, without declaring an explicit reset value.

2.3 EX_MEM Pipeline

Filepath: src/main/scala/Pipeline/Pipelines/EX_MEM.scala

class EX_MEM extends Module {
  val io = IO(new Bundle {
    val IDEX_MEMRD          =   Input(Bool())
    val IDEX_MEMWR          =   Input(Bool())
    val IDEX_MEMTOREG       =   Input(Bool())
    val IDEX_REG_W          =   Input(Bool())
    val IDEX_rs2            =   Input(SInt(32.W))
    val IDEX_rd             =   Input(UInt(5.W))
    val alu_out             =   Input(SInt(32.W))

    val EXMEM_memRd_out     = Output(Bool())
    val EXMEM_memWr_out     = Output(Bool())
    val EXMEM_memToReg_out  = Output(Bool())
    val EXMEM_reg_w_out     = Output(Bool())
    val EXMEM_rs2_out       = Output(SInt(32.W))
    val EXMEM_rd_out        = Output(UInt(5.W))
    val EXMEM_alu_out       = Output(SInt(32.W))
    })
  
    io.EXMEM_memRd_out      := RegNext(io.IDEX_MEMRD)
    io.EXMEM_memWr_out      := RegNext(io.IDEX_MEMWR)
    io.EXMEM_memToReg_out   := RegNext(io.IDEX_MEMTOREG)
    io.EXMEM_reg_w_out      := RegNext(io.IDEX_REG_W)
    io.EXMEM_rs2_out        := RegNext(io.IDEX_rs2)
    io.EXMEM_rd_out         := RegNext(io.IDEX_rd)
    io.EXMEM_alu_out        := RegNext(io.alu_out)
}

The code snippet above implements the EX/MEM pipeline register, which transfers data and control signals from the execution stage (EX) to the memory stage (MEM). The control signals memRd, memWr, and memToReg are preserved to drive the memory access and the later write-back selection. The module also stores the ALU result (alu_out), the 32-bit rs2 value (IDEX_rs2), and the reg_w and rd signals, which are needed for hazard detection and data forwarding in later pipeline stages.

2.4 MEM_WB Pipeline

Filepath: src/main/scala/Pipeline/Pipelines/MEM_WB.scala

class MEM_WB extends Module {
  val io = IO(new Bundle {
    val EXMEM_MEMTOREG      = Input(Bool())
    val EXMEM_REG_W         = Input(Bool())
    val EXMEM_MEMRD         = Input(Bool())
    val EXMEM_rd            = Input(UInt(5.W))
    val in_dataMem_out      = Input(SInt(32.W))
    val in_alu_out          = Input(SInt(32.W))

    val MEMWB_memToReg_out  = Output(Bool())
    val MEMWB_reg_w_out     = Output(Bool())
    val MEMWB_memRd_out     = Output(Bool())
    val MEMWB_rd_out        = Output(UInt(5.W))
    val MEMWB_dataMem_out   = Output(SInt(32.W))
    val MEMWB_alu_out       = Output(SInt(32.W))
  })

  io.MEMWB_memToReg_out     := RegNext(io.EXMEM_MEMTOREG)
  io.MEMWB_reg_w_out        := RegNext(io.EXMEM_REG_W)
  io.MEMWB_memRd_out        := RegNext(io.EXMEM_MEMRD)
  io.MEMWB_rd_out           := RegNext(io.EXMEM_rd)
  io.MEMWB_dataMem_out      := RegNext(io.in_dataMem_out)
  io.MEMWB_alu_out          := RegNext(io.in_alu_out)
}

The code snippet above implements the MEM/WB pipeline registers, which transfer essential data from the memory stage (MEM) to the write-back stage (WB). Specifically, this module preserves control signals such as memToReg, reg_w, and memRd, as well as key data values including the destination register (rd), data from memory (dataMem), and the ALU output (alu).

3. Memory Units

In the RISC-V pipeline, two distinct memory units are employed: instruction memory and data memory. The repository implements these as separate modules, each tailored to its specific role in the processor's operation.

3.1 Inst-Memory

Filepath: src/main/scala/Pipeline/Memory/InstMem.scala

class InstMem(initFile: String) extends Module {
  val io = IO(new Bundle {
    val addr        =   Input(UInt(32.W))       // Address input to fetch instruction
    val data        =   Output(UInt(32.W))      // Output instruction
  })
  val imem = Mem(1024, UInt(32.W))
  loadMemoryFromFile(imem, initFile)
  io.data := imem(io.addr/4.U)
}

The code snippet implements the instruction memory module for the RISC-V pipeline. This module has one 32-bit address input (io.addr) used to fetch instructions and one 32-bit data output (io.data) that delivers the corresponding instruction. The memory is instantiated with Mem(1024, UInt(32.W)), which creates an array of 1024 entries, each holding one 32-bit instruction. The initFile parameter names the file from which the initial contents are loaded, and loadMemoryFromFile populates the memory with these values. Finally, because the PC is a byte address while the memory is indexed by word, the module divides the input address by 4 to obtain the word index.
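The byte-to-word address conversion can be illustrated with a few concrete values. The sketch below is plain Scala with hypothetical names (InstMemModel, wordIndex, Depth); it only models addresses that fall inside the 1024-entry memory and rejects anything larger.

```scala
// Software model of the instruction memory indexing (plain Scala, not Chisel).
// InstMemModel, wordIndex and Depth are illustrative names, not part of the repository.
object InstMemModel {
  val Depth = 1024 // number of 32-bit entries, as in Mem(1024, UInt(32.W))

  def wordIndex(byteAddr: Int): Int = {
    val idx = byteAddr >>> 2 // unsigned division by 4
    require(idx < Depth, "address outside the modelled memory")
    idx
  }
}
```

Successive PC values 0, 4, and 8 therefore select entries 0, 1, and 2, and the two low address bits are simply dropped.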

3.2 Data-Memory

Filepath: src/main/scala/Pipeline/Memory/DataMemory.scala

class DataMemory extends Module {
  val io = IO(new Bundle {
    val addr        = Input(UInt(32.W))         // Address input
    val dataIn      = Input(SInt(32.W))         // Data to be written
    val mem_read    = Input(Bool())             // Memory read enable
    val mem_write   = Input(Bool())             // Memory write enable
    val dataOut     = Output(SInt(32.W))        // Data output
  })
  val Dmemory = Mem(1024, SInt(32.W))
  io.dataOut := 0.S

  when(io.mem_write) {
    Dmemory.write(io.addr, io.dataIn)
  }
  when(io.mem_read) {
    io.dataOut := Dmemory.read(io.addr)
  }
}

The code snippet implements the data memory unit for the RISC-V pipeline. This module has four input ports (io.addr, io.dataIn, io.mem_read, and io.mem_write) and one output port, io.dataOut. It instantiates a memory array with 1024 entries, where each entry is a 32-bit word. When the control signal io.mem_write is asserted, the module writes the data from io.dataIn into the memory at the address specified by io.addr. When io.mem_read is asserted, the module reads the data stored at io.addr and outputs it via io.dataOut; otherwise io.dataOut stays at its default of 0. Unlike the instruction memory, this module uses io.addr directly as the index, without dividing it by 4.

4. Hazard Units

4.1 Structural Hazard

Filepath: src/main/scala/Pipeline/Hazard Units/StructuralHazard.scala

class StructuralHazard extends Module {
  val io = IO(new Bundle {
    val rs1 = Input(UInt(5.W))
    val rs2 = Input(UInt(5.W))
    val MEM_WB_regWr = Input(Bool())
    val MEM_WB_Rd = Input(UInt(5.W))
    val fwd_rs1 = Output(Bool())
    val fwd_rs2 = Output(Bool())
  })

  // Determine if forwarding is needed for rs1
  when(io.MEM_WB_regWr && io.MEM_WB_Rd === io.rs1) {
    io.fwd_rs1 := true.B
  }.otherwise {
    io.fwd_rs1 := false.B
  }

  // Determine if forwarding is needed for rs2
  when(io.MEM_WB_regWr && io.MEM_WB_Rd === io.rs2) {
    io.fwd_rs2 := true.B
  }.otherwise {
    io.fwd_rs2 := false.B
  }
}

The code snippet implements the structural hazard resolution mechanism for the RISC-V pipeline. This module has four input ports (rs1, rs2, MEM_WB_regWr, and MEM_WB_Rd) and two output ports (fwd_rs1 and fwd_rs2). It checks whether the destination register in the MEM/WB stage (MEM_WB_Rd) matches either source register (rs1 or rs2) while write-back is enabled (MEM_WB_regWr is asserted). If a match is detected, the corresponding forwarding signal (fwd_rs1 or fwd_rs2) is set to true; otherwise, it is set to false.
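The two comparisons are independent of each other, which a short software model makes explicit. The sketch below is plain Scala, not Chisel; StructuralHazardModel and forward are hypothetical names, and the result is returned as a pair (fwd_rs1, fwd_rs2).

```scala
// Software model of the MEM/WB match check (plain Scala, not Chisel).
// StructuralHazardModel and forward are illustrative names, not part of the repository.
object StructuralHazardModel {
  // Returns (fwd_rs1, fwd_rs2) for the given register numbers.
  def forward(rs1: Int, rs2: Int, memWbRegWr: Boolean, memWbRd: Int): (Boolean, Boolean) =
    (memWbRegWr && memWbRd == rs1, memWbRegWr && memWbRd == rs2)
}
```

If the instruction in MEM/WB writes register 5 and the current instruction reads registers 5 and 7, only the first flag is set.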

4.2 Hazard Detection

Filepath: src/main/scala/Pipeline/Hazard Units/HazardDetection.scala

class HazardDetection extends Module {
  val io = IO(new Bundle {
    val IF_ID_inst = Input(UInt(32.W))
    val ID_EX_memRead = Input(Bool())
    val ID_EX_rd = Input(UInt(5.W))
    val pc_in = Input(SInt(32.W))
    val current_pc = Input(SInt(32.W))

    val inst_forward = Output(Bool())
    val pc_forward = Output(Bool())
    val ctrl_forward = Output(Bool())
    val inst_out = Output(UInt(32.W))
    val pc_out = Output(SInt(32.W))
    val current_pc_out = Output(SInt(32.W))
  })

  val Rs1 = io.IF_ID_inst(19, 15)
  val Rs2 = io.IF_ID_inst(24, 20)

  when(io.ID_EX_memRead === 1.B && ((io.ID_EX_rd === Rs1) || (io.ID_EX_rd === Rs2))) {
    io.inst_forward := true.B
    io.pc_forward := true.B
    io.ctrl_forward := true.B
  }.otherwise {
    io.inst_forward := false.B
    io.pc_forward := false.B
    io.ctrl_forward := false.B
  }
  io.inst_out := io.IF_ID_inst
  io.pc_out := io.pc_in
  io.current_pc_out := io.current_pc
}

The code snippet implements the hazard detection mechanism, which monitors the pipeline for load-use data hazards. When the instruction in the ID/EX stage is performing a memory read (io.ID_EX_memRead is true) and its destination register (io.ID_EX_rd) matches either source register of the instruction in IF/ID (Rs1 or Rs2, extracted from io.IF_ID_inst), the module sets inst_forward, pc_forward, and ctrl_forward to true. Because a loaded value is not available until the load has passed the MEM stage, the dependent instruction cannot use it in the very next cycle, so these three signals tell the rest of the pipeline that the instruction, the program counter, and the control signals need special handling for that cycle. Otherwise, all three signals remain false. Additionally, the module passes io.IF_ID_inst, io.pc_in, and io.current_pc through unchanged to io.inst_out, io.pc_out, and io.current_pc_out, respectively, making the instruction and PC values available to the surrounding logic.
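The detection condition can be tried on a concrete instruction pair, such as a load into x5 followed by add x6, x5, x7. The sketch below is plain Scala, not Chisel; LoadUseModel, rType, and hazard are hypothetical names, and rType is a helper that assembles an add instruction word from its register fields.

```scala
// Software model of the load-use check (plain Scala, not Chisel).
// LoadUseModel, rType and hazard are illustrative names, not part of the repository.
object LoadUseModel {
  // Assemble an R-type word with funct7 = 0 and funct3 = 0 (an add instruction).
  def rType(rd: Int, rs1: Int, rs2: Int): Int =
    (rs2 << 20) | (rs1 << 15) | (rd << 7) | 0x33

  def hazard(ifIdInst: Int, idExMemRead: Boolean, idExRd: Int): Boolean = {
    val rs1 = (ifIdInst >>> 15) & 0x1F // instruction bits 19..15
    val rs2 = (ifIdInst >>> 20) & 0x1F // instruction bits 24..20
    idExMemRead && (idExRd == rs1 || idExRd == rs2)
  }
}
```

With the add instruction in IF/ID and a load writing x5 in ID/EX, the condition holds; it does not hold when the load writes x6, which the add only uses as its destination.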

4.3 Forwarding Unit

Filepath: src/main/scala/Pipeline/Hazard Units/Forwarding.scala














































class Forwarding extends Module {
    val io = IO(new Bundle {
        val IDEX_rs1 = Input(UInt(5.W))
        val IDEX_rs2 = Input(UInt(5.W))
        val EXMEM_rd = Input(UInt(5.W))
        val EXMEM_regWr = Input(UInt(1.W))
        val MEMWB_rd = Input(UInt(5.W))
        val MEMWB_regWr = Input(UInt(1.W))
        
        val forward_a = Output(UInt(2.W))
        val forward_b = Output(UInt(2.W))
    })

    io.forward_a := "b00".U
    io.forward_b := "b00".U

    // EX HAZARD
    when(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && 
            (io.EXMEM_rd === io.IDEX_rs1.asUInt) && (io.EXMEM_rd === io.IDEX_rs2)) {
        io.forward_a := "b10".U
        io.forward_b := "b10".U

    }.elsewhen(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && 
            (io.EXMEM_rd === io.IDEX_rs2)) {    
        io.forward_b := "b10".U
    
    }.elsewhen(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && 
            (io.EXMEM_rd === io.IDEX_rs1)) {    
        io.forward_a := "b10".U
    }

    // MEM HAZARD
    when((io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) && (io.MEMWB_rd === io.IDEX_rs1) && (io.MEMWB_rd === io.IDEX_rs2) && 
            ~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs1) && (io.EXMEM_rd === io.IDEX_rs2))) {
        io.forward_a := "b01".U
        io.forward_b := "b01".U

    }.elsewhen((io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) && (io.MEMWB_rd === io.IDEX_rs2) && 
            ~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs2))){
        io.forward_b := "b01".U

    }.elsewhen((io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) && (io.MEMWB_rd === io.IDEX_rs1) && 
            ~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U && (io.EXMEM_rd === io.IDEX_rs1))){
        io.forward_a := "b01".U
        }
}

This module implements the forwarding unit, which dynamically selects and routes data from later pipeline stages to resolve data hazards in the RISC-V pipeline. The unit examines the source registers from the ID/EX stage (i.e., IDEX_rs1 and IDEX_rs2) and compares them with the destination registers from both the EX/MEM and MEM/WB stages. Depending on which stage provides the most recent data, the module assigns a corresponding two-bit value to the forwarding outputs (forward_a and forward_b). For example, when the EX/MEM stage is writing to a non-zero register that matches a source operand, the corresponding forward signal is set to binary 10, indicating that data should be forwarded directly from the EX/MEM stage.

In the MEM hazard section, the module addresses cases where the MEM/WB stage holds the data needed by the current instruction. Here, the module checks whether the MEM/WB stage is writing to a non-zero register that matches the source registers of the ID/EX stage. However, this forwarding is only enabled if the EX/MEM stage is not already forwarding for that register (thereby prioritizing EX hazards). If the conditions are met, the forward signal is set to binary 01, signaling that the required data should be forwarded from the MEM/WB stage. This mechanism ensures that even if an instruction's result has not been written back yet, the correct value is available for subsequent computations, thereby avoiding pipeline stalls.
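
The priority described above can be summarized per operand. The sketch below is a plain-Scala model, not the repository code: ForwardModel and select are hypothetical names, and the model assumes that evaluating rs1 and rs2 independently matches the combined "both operands" branches of the Chisel module.

```scala
// Per-operand model of the forwarding select (illustrative only).
object ForwardModel {
  // Returns 2 (binary 10) to forward from EX/MEM,
  //         1 (binary 01) to forward from MEM/WB,
  //         0 (binary 00) to use the value read in decode.
  def select(rs: Int,
             exMemRd: Int, exMemRegWr: Boolean,
             memWbRd: Int, memWbRegWr: Boolean): Int = {
    val exHit  = exMemRegWr && exMemRd != 0 && exMemRd == rs
    val memHit = memWbRegWr && memWbRd != 0 && memWbRd == rs
    if (exHit) 2        // the EX hazard has priority: it holds the newer value
    else if (memHit) 1  // MEM hazard only when EX/MEM does not match
    else 0
  }
}
```

When both stages write the same register, select(7, 7, true, 7, true) yields 2, which reflects the rule that the EX/MEM result is the more recent one.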

4.4 Branch Forwarding

Filepath: src/main/scala/Pipeline/Hazard Units/BranchForward.scala







































































class BranchForward extends Module {
  val io = IO(new Bundle {
    val ID_EX_RD    = Input(UInt(5.W))
    val EX_MEM_RD   = Input(UInt(5.W))
    val MEM_WB_RD   = Input(UInt(5.W))
    val ID_EX_memRd = Input(UInt(1.W))
    val EX_MEM_memRd = Input(UInt(1.W))
    val MEM_WB_memRd = Input(UInt(1.W))
    val rs1         = Input(UInt(5.W))
    val rs2         = Input(UInt(5.W))
    val ctrl_branch = Input(UInt(1.W))

    val forward_rs1 = Output(UInt(4.W))
    val forward_rs2 = Output(UInt(4.W))
  })
  io.forward_rs1 := "b0000".U
  io.forward_rs2 := "b0000".U

  // Branch forwarding logic
  when(io.ctrl_branch === 1.U) {
    // ALU Hazard
    when(io.ID_EX_RD =/= 0.U && io.ID_EX_memRd =/= 1.U) {
      when(io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2) {
        io.forward_rs1 := "b0001".U
        io.forward_rs2 := "b0001".U
      }.elsewhen(io.ID_EX_RD === io.rs1) {
        io.forward_rs1 := "b0001".U
      }.elsewhen(io.ID_EX_RD === io.rs2) {
        io.forward_rs2 := "b0001".U
      }
    }

    // EX/MEM Hazard
    when(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd =/= 1.U) {
      when(io.EX_MEM_RD === io.rs1 && io.EX_MEM_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2)) {
        io.forward_rs1 := "b0010".U
        io.forward_rs2 := "b0010".U
      }.elsewhen(io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) {
        io.forward_rs1 := "b0010".U
      }.elsewhen(io.EX_MEM_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs2)) {
        io.forward_rs2 := "b0010".U
      }
    }

    // MEM/WB Hazard
    when(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd =/= 1.U) {
      when(io.MEM_WB_RD === io.rs1 && io.MEM_WB_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1 && io.EX_MEM_RD === io.rs2)) {
        io.forward_rs1 := "b0011".U
        io.forward_rs2 := "b0011".U
      }.elsewhen(io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) {
        io.forward_rs1 := "b0011".U
      }.elsewhen(io.MEM_WB_RD === io.rs2 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs2) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs2)) {
        io.forward_rs2 := "b0011".U
      }
    }

  // Jalr forwarding logic
  }.elsewhen(io.ctrl_branch === 0.U) {
    when(io.ID_EX_RD =/= 0.U && io.ID_EX_memRd =/= 1.U && io.ID_EX_RD === io.rs1) {
      io.forward_rs1 := "b0110".U
    }.elsewhen(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd =/= 1.U && io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) {
      io.forward_rs1 := "b0111".U
    }.elsewhen(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd === 1.U && io.EX_MEM_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) {
      io.forward_rs1 := "b1001".U
    }.elsewhen(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd =/= 1.U && io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) {
      io.forward_rs1 := "b1000".U
    }.elsewhen(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd === 1.U && io.MEM_WB_RD === io.rs1 && !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) && !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) {
      io.forward_rs1 := "b1010".U
    }
  }
}

The BranchForward module is a key component in the RISC-V pipeline, responsible for resolving data hazards during branch and Jalr instruction execution. It determines if source operands for branch evaluation need to be forwarded from later pipeline stages to avoid stalls. The module takes as inputs the destination register identifiers and memory read flags from the ID/EX, EX/MEM, and MEM/WB pipeline stages, alongside the source register identifiers (rs1 and rs2) of the branch instruction and a control signal (ctrl_branch). The outputs, forward_rs1 and forward_rs2, are four-bit signals indicating the source of the forwarded data. When ctrl_branch is set to 1, branch forwarding logic is applied by sequentially checking for hazards in the ID/EX, EX/MEM, and MEM/WB stages, forwarding the most recent valid data to the source registers based on specific matching conditions.

For Jalr instructions, indicated when ctrl_branch is set to 0, the module only evaluates the source register rs1 for potential forwarding. It similarly checks the ID/EX, EX/MEM, and MEM/WB stages for data matches, prioritizing the most recent and valid data for forwarding. Different codes are assigned to forward_rs1 based on whether the data comes from a memory read or a non-memory read operation. This modular and hierarchical approach ensures that the correct operand is always forwarded for branch or Jalr instruction evaluation, reducing pipeline stalls and maintaining efficient instruction execution.
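
Because the four-bit codes are easy to confuse when reading logs, it can help to keep them in a lookup table. The following plain-Scala sketch lists the codes assigned in the module above; the object name BranchForwardCodes and the wording of the descriptions are our own, and "load" refers to the corresponding memRd flag being 1.

```scala
// Lookup table for the forward_rs1 / forward_rs2 codes (illustrative only).
object BranchForwardCodes {
  val meaning: Map[Int, String] = Map(
    0x0 -> "no forwarding",
    0x1 -> "branch: match on ID/EX rd (not a load)",
    0x2 -> "branch: match on EX/MEM rd (not a load)",
    0x3 -> "branch: match on MEM/WB rd (not a load)",
    0x6 -> "jalr: match on ID/EX rd (not a load)",
    0x7 -> "jalr: match on EX/MEM rd (not a load)",
    0x8 -> "jalr: match on MEM/WB rd (not a load)",
    0x9 -> "jalr: match on EX/MEM rd (load)",
    0xa -> "jalr: match on MEM/WB rd (load)"
  )

  // Codes that the module never assigns are reported as unused.
  def describe(code: Int): String = meaning.getOrElse(code, "unused code")
}
```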

5. Pipeline

5.1 Selecting the PC Value with MuxLookup

Filepath: src/main/scala/Pipeline/Main.scala















val PC_F = MuxLookup(HazardDetect.io.pc_forward, 0.S, Array(
(0.U) -> PC4.io.out.asSInt,
(1.U) -> HazardDetect.io.pc_out))

PC.io.in := PC_F                            // PC_in input
PC4.io.pc := PC.io.out.asUInt               // PC4_in input <- PC_out
InstMemory.io.addr := PC.io.out.asUInt      // Address to fetch instruction

val PC_for = MuxLookup (HazardDetect.io.inst_forward, 0.S, Array (
    (0.U) -> PC.io.out,
    (1.U) -> HazardDetect.io.current_pc_out))

val Instruction_F = MuxLookup (HazardDetect.io.inst_forward, 0.U, Array (
    (0.U) -> InstMemory.io.data,
    (1.U) -> HazardDetect.io.inst_out))

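This code snippet uses MuxLookup to select the next Program Counter (PC) value and the instruction passed on from the fetch stage. When HazardDetect.io.pc_forward is 0, the PC advances to PC4.io.out; when it is 1, the PC takes HazardDetect.io.pc_out instead. PC_for and Instruction_F are selected in the same way by inst_forward: they take the current PC and the freshly fetched instruction when no hazard is flagged, and the values held by the hazard detection unit otherwise, so the stalled instruction is not lost.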
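
The PC selection can be modelled in a few lines of plain Scala. This is a sketch under stated assumptions: FetchModel and nextPc are hypothetical names, heldPc stands for whatever value is wired to HazardDetect.io.pc_in (not shown in the snippet), and branch or jump redirection is left out.

```scala
// Software model of the PC_F selection (illustrative only).
object FetchModel {
  // On a stall the PC takes the value held by hazard detection;
  // otherwise it advances to the next sequential instruction.
  def nextPc(pc: Int, stall: Boolean, heldPc: Int): Int =
    if (stall) heldPc else pc + 4
}
```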

5.2 Register File Inputs (rs1 and rs2)


















// Decode connections (control unit and register file)
control_module.io.opcode := IF_ID_.io.SelectedInstr_out(6, 0)   // Opcode used to check the instruction type
// Registerfile inputs
RegFile.io.rs1 := Mux(
control_module.io.opcode === 51.U ||    // R-type
control_module.io.opcode === 19.U ||    // I-type
control_module.io.opcode === 35.U ||    // S-type
control_module.io.opcode === 3.U ||     // I-type (load instructions)
control_module.io.opcode === 99.U ||    // SB-type (branch)
control_module.io.opcode === 103.U,     // JALR instruction
IF_ID_.io.SelectedInstr_out(19, 15), 0.U )

RegFile.io.rs2 := Mux(
control_module.io.opcode === 51.U || // R-type
control_module.io.opcode === 35.U || // S-type
control_module.io.opcode === 99.U,   // SB-type (branch)
IF_ID_.io.SelectedInstr_out(24, 20), 0.U)
RegFile.io.reg_write := control_module.io.reg_write

This code is responsible for decoding the fetched instruction by extracting its opcode to identify the instruction type. Based on the opcode and the instruction format, it determines the values of the rs1 and rs2 register fields, specifying the source registers to be used for operations. The rs1 field is selected for instruction types such as R-type, I-type, S-type, SB-type, and JALR, while the rs2 field is used for R-type, S-type, and SB-type instructions. Additionally, the reg_write signal is configured to enable or disable write-back to the register file (RegFile), depending on whether the current instruction requires a write operation. This ensures the proper setup of source registers and write-back control for subsequent execution stages.

Instruction	Opcode	Decimal
R-type	011 0011	51
I-type	001 0011	19
S-type	010 0011	35
I-type (load instructions)	000 0011	3
SB-type (branch)	110 0011	99
JALR instruction	110 0111	103
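
The opcode checks listed above matter because bits 19:15 and 24:20 do not always hold register numbers. The plain-Scala sketch below mirrors the two Mux conditions; SourceRegModel and sourceRegs are hypothetical names, and the instruction words are taken from the logs in Section B.3.

```scala
// Software model of the rs1/rs2 selection in the decode stage (illustrative only).
object SourceRegModel {
  // Opcodes (decimal) whose encoding contains an rs1 or rs2 field.
  private val usesRs1 = Set(51, 19, 35, 3, 99, 103)
  private val usesRs2 = Set(51, 35, 99)

  // Returns (rs1, rs2); a field that the format does not define reads as x0.
  def sourceRegs(inst: Int): (Int, Int) = {
    val opcode = inst & 0x7f
    val rs1 = if (usesRs1(opcode)) (inst >>> 15) & 0x1f else 0
    val rs2 = if (usesRs2(opcode)) (inst >>> 20) & 0x1f else 0
    (rs1, rs2)
  }
}
```

For 0x406282b3 (sub t0, t0, t1) the model returns (5, 6). For 0x333333b7 (lui t2, 0x33333) it returns (0, 0), even though the raw bits 19:15 of that word equal 6, because those bits belong to the upper immediate rather than to a source register.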

5.3 Data Forwarding for rs1 and rs2 to Resolve Pipeline Hazards
















//  rs1_data
when (Structural.io.fwd_rs1 === 0.U) {
  S_rs1DataIn := RegFile.io.rdata1
}.elsewhen (Structural.io.fwd_rs1 === 1.U) {
  S_rs1DataIn := RegFile.io.w_data
}.otherwise {
  S_rs1DataIn := 0.S 
}
// rs2_data
when (Structural.io.fwd_rs2 === 0.U) {
  S_rs2DataIn := RegFile.io.rdata2
}.elsewhen (Structural.io.fwd_rs2 === 1.U) {
  S_rs2DataIn := RegFile.io.w_data
}.otherwise {
  S_rs2DataIn := 0.S
}

This code implements data forwarding for the rs1 and rs2 source registers to handle potential data hazards in the pipeline.

S_rs1DataIn and S_rs2DataIn: Wires used to hold the correct values for rs1 and rs2 after evaluating forwarding needs.

Forwarding Logic:
- If no hazard exists, data is read directly from the register file.
- If a hazard is detected, data is forwarded from the write-back stage to avoid delays.
Default Behavior: Sets the values to 0.S if no valid data path is available.

This ensures that the pipeline uses the most up-to-date data for execution, maintaining correctness and avoiding unnecessary stalls.
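
To illustrate why the write-back value has to be bypassed, the following plain-Scala sketch models a register array whose pending write has not been committed yet. The names BypassModel, Write, rawRead, and bypassRead are hypothetical, and the check that excludes x0 is our own addition rather than part of the snippet above.

```scala
// Software model of a read that coincides with a pending write-back (illustrative only).
object BypassModel {
  final case class Write(reg: Int, data: Int)

  // Plain read: sees the register array as it is before the pending write lands.
  def rawRead(regs: Vector[Int], rs: Int): Int =
    if (rs == 0) 0 else regs(rs)

  // Read with bypass: prefer the pending write-back data when it targets rs.
  def bypassRead(regs: Vector[Int], rs: Int, pending: Option[Write]): Int =
    pending match {
      case Some(Write(reg, data)) if reg == rs && reg != 0 => data
      case _                                               => rawRead(regs, rs)
    }
}
```

With an all-zero register array and a pending write of 0x7fff0000 to register 10, rawRead returns 0 while bypassRead returns 0x7fff0000.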

5.4 Stalling Logic for Load-Use Hazard Resolution in Pipeline




















// Stall when forward
when(HazardDetect.io.ctrl_forward === "b1".U) {
    ID_EX_.io.ctrl_MemWr_in       := 0.U
    ID_EX_.io.ctrl_MemRd_in       := 0.U
    ID_EX_.io.ctrl_MemToReg_in    := 0.U
    ID_EX_.io.ctrl_Reg_W_in       := 0.U
    ID_EX_.io.ctrl_AluOp_in       := 0.U
    ID_EX_.io.ctrl_OpB_in         := 0.U
    ID_EX_.io.ctrl_Branch_in      := 0.U
    ID_EX_.io.ctrl_nextpc_in      := 0.U
}.otherwise {
    ID_EX_.io.ctrl_MemWr_in      := control_module.io.mem_write
    ID_EX_.io.ctrl_MemRd_in      := control_module.io.mem_read
    ID_EX_.io.ctrl_MemToReg_in   := control_module.io.men_to_reg
    ID_EX_.io.ctrl_Reg_W_in      := control_module.io.reg_write 
    ID_EX_.io.ctrl_AluOp_in      := control_module.io.alu_operation
    ID_EX_.io.ctrl_OpB_in        := control_module.io.operand_B
    ID_EX_.io.ctrl_Branch_in     := control_module.io.branch
    ID_EX_.io.ctrl_nextpc_in     := control_module.io.next_pc_sel
}

This code snippet implements the stalling logic of the pipelined processor for the load-use hazards flagged by the hazard detection unit. When ctrl_forward is asserted, every control signal entering the ID_EX pipeline register is set to 0, which turns the instruction into a bubble that writes neither memory nor the register file. Otherwise, the control signals from the control module are passed through unchanged.
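
The same idea can be expressed compactly when the control signals are grouped into one value. The sketch below is plain Scala rather than Chisel; the case class Ctrl, its field types, and the names BubbleModel and toIdEx are assumptions made for illustration.

```scala
// Software model of bubble insertion on a stall (illustrative only).
object BubbleModel {
  final case class Ctrl(memWr: Boolean, memRd: Boolean, memToReg: Boolean,
                        regW: Boolean, aluOp: Int, opB: Int,
                        branch: Boolean, nextPc: Int)

  // All-zero control word: no memory access and no register write.
  val bubble: Ctrl = Ctrl(false, false, false, false, 0, 0, false, 0)

  // Control word that enters ID/EX in the current cycle.
  def toIdEx(decoded: Ctrl, stall: Boolean): Ctrl =
    if (stall) bubble else decoded
}
```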

B. Rectifications

1. Exposing Registers

In addition to constructing a pipelined RISC-V CPU using Chisel, it is essential to verify the integrity of the structure. Therefore, we first verify the correctness of our RISC-V test code using a third-party processor simulator named Ripes. Next, we establish the expected register outputs and compare them with the results produced by our CPU.

However, since the register values are confined within the RegisterFile module, we need to "expose" them through the IO Bundle. The following code snippet shows the modified IO of this module, which exposes all argument registers, temporary registers, and saved registers.
















































// RegisterFile (RegisterFile.scala)
val io = IO(new Bundle {
    
    val rs1       = Input(UInt(5.W))
    val rs2       = Input(UInt(5.W))
    val reg_write = Input(Bool())
    val w_reg     = Input(UInt(5.W))
    val w_data    = Input(SInt(32.W))
    val rdata1    = Output(SInt(32.W))
    val rdata2    = Output(SInt(32.W))

    // >> exposed argument registers
    val a0        = Output(SInt(32.W))
    val a1        = Output(SInt(32.W))
    val a2        = Output(SInt(32.W))
    val a3        = Output(SInt(32.W))
    val a4        = Output(SInt(32.W))
    val a5        = Output(SInt(32.W))
    val a6        = Output(SInt(32.W))
    val a7        = Output(SInt(32.W))
    // << exposed argument registers

    // >> exposed temporary registers
    val t0        = Output(SInt(32.W))
    val t1        = Output(SInt(32.W))
    val t2        = Output(SInt(32.W))
    val t3        = Output(SInt(32.W))
    val t4        = Output(SInt(32.W))
    val t5        = Output(SInt(32.W))
    val t6        = Output(SInt(32.W))
    // << exposed temporary registers

    // >> exposed save registers
    val s0        = Output(SInt(32.W))
    val s1        = Output(SInt(32.W))
    val s2        = Output(SInt(32.W))
    val s3        = Output(SInt(32.W))
    val s4        = Output(SInt(32.W))
    val s5        = Output(SInt(32.W))
    val s6        = Output(SInt(32.W))
    val s7        = Output(SInt(32.W))
    val s8        = Output(SInt(32.W))
    val s9        = Output(SInt(32.W))
    val s10       = Output(SInt(32.W))
    val s11       = Output(SInt(32.W))
    // << exposed save registers
    
})

After exposing these IO ports, we need to wire the register values to the corresponding output ports. The following code snippet implements the wiring logic within the module.






































// RegisterFile (RegisterFile.scala)

// >> wiring argument registers to corresponding output ports
io.a0 := Mux(io.reg_write && io.w_reg === 10.U, io.w_data, regfile(10))
io.a1 := Mux(io.reg_write && io.w_reg === 11.U, io.w_data, regfile(11))
io.a2 := Mux(io.reg_write && io.w_reg === 12.U, io.w_data, regfile(12))
io.a3 := Mux(io.reg_write && io.w_reg === 13.U, io.w_data, regfile(13))
io.a4 := Mux(io.reg_write && io.w_reg === 14.U, io.w_data, regfile(14))
io.a5 := Mux(io.reg_write && io.w_reg === 15.U, io.w_data, regfile(15))
io.a6 := Mux(io.reg_write && io.w_reg === 16.U, io.w_data, regfile(16))
io.a7 := Mux(io.reg_write && io.w_reg === 17.U, io.w_data, regfile(17))
// << wiring argument registers to corresponding output ports

// >> wiring temporary registers to corresponding output ports
io.t0 := Mux(io.reg_write && io.w_reg === 5.U,  io.w_data, regfile(5))
io.t1 := Mux(io.reg_write && io.w_reg === 6.U,  io.w_data, regfile(6))
io.t2 := Mux(io.reg_write && io.w_reg === 7.U,  io.w_data, regfile(7))
io.t3 := Mux(io.reg_write && io.w_reg === 28.U, io.w_data, regfile(28))
io.t4 := Mux(io.reg_write && io.w_reg === 29.U, io.w_data, regfile(29))
io.t5 := Mux(io.reg_write && io.w_reg === 30.U, io.w_data, regfile(30))
io.t6 := Mux(io.reg_write && io.w_reg === 31.U, io.w_data, regfile(31))
// << wiring temporary registers to corresponding output ports

// >> wiring save registers to corresponding output ports
io.s0  := Mux(io.reg_write && io.w_reg === 8.U,  io.w_data, regfile(8))
io.s1  := Mux(io.reg_write && io.w_reg === 9.U,  io.w_data, regfile(9))
io.s2  := Mux(io.reg_write && io.w_reg === 18.U, io.w_data, regfile(18))
io.s3  := Mux(io.reg_write && io.w_reg === 19.U, io.w_data, regfile(19))
io.s4  := Mux(io.reg_write && io.w_reg === 20.U, io.w_data, regfile(20))
io.s5  := Mux(io.reg_write && io.w_reg === 21.U, io.w_data, regfile(21))
io.s6  := Mux(io.reg_write && io.w_reg === 22.U, io.w_data, regfile(22))
io.s7  := Mux(io.reg_write && io.w_reg === 23.U, io.w_data, regfile(23))
io.s8  := Mux(io.reg_write && io.w_reg === 24.U, io.w_data, regfile(24))
io.s9  := Mux(io.reg_write && io.w_reg === 25.U, io.w_data, regfile(25))
io.s10 := Mux(io.reg_write && io.w_reg === 26.U, io.w_data, regfile(26))
io.s11 := Mux(io.reg_write && io.w_reg === 27.U, io.w_data, regfile(27))
// << wiring save registers to corresponding output ports
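
The register indices used in the wiring above follow the RISC-V ABI names. As a cross-check, the mapping can be generated in plain Scala; AbiRegs is a hypothetical name and the sketch is independent of the Chisel module.

```scala
// ABI register name to register-file index, for the 27 exposed registers (illustrative only).
object AbiRegs {
  val index: Map[String, Int] =
    (Seq("t0" -> 5, "t1" -> 6, "t2" -> 7, "s0" -> 8, "s1" -> 9) ++
      (0 to 7).map(i => s"a$i" -> (10 + i)) ++   // a0-a7  -> x10-x17
      (2 to 11).map(i => s"s$i" -> (16 + i)) ++  // s2-s11 -> x18-x27
      (3 to 6).map(i => s"t$i" -> (25 + i))      // t3-t6  -> x28-x31
    ).toMap
}
```

For instance, AbiRegs.index("t3") is 28 and AbiRegs.index("s11") is 27, which agrees with the constants in the Mux conditions.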

Similarly, we expose the register values outside the PIPELINE module using the subsequent code snippets.










































// PIPELINE (Main.scala)

val io = IO(new Bundle {
    val out     = Output(SInt(32.W))
    val out_pc  = Output(SInt(32.W))

    // >> exposed argument registers
    val out_a0    = Output(SInt(32.W))
    val out_a1    = Output(SInt(32.W))
    val out_a2    = Output(SInt(32.W))
    val out_a3    = Output(SInt(32.W))
    val out_a4    = Output(SInt(32.W))
    val out_a5    = Output(SInt(32.W))
    val out_a6    = Output(SInt(32.W))
    val out_a7    = Output(SInt(32.W))
    // << exposed argument registers

    // >> exposed temporary registers
    val out_t0    = Output(SInt(32.W))
    val out_t1    = Output(SInt(32.W))
    val out_t2    = Output(SInt(32.W))
    val out_t3    = Output(SInt(32.W))
    val out_t4    = Output(SInt(32.W))
    val out_t5    = Output(SInt(32.W))
    val out_t6    = Output(SInt(32.W))
    // << exposed temporary registers

    // >> exposed save registers
    val out_s0    = Output(SInt(32.W))
    val out_s1    = Output(SInt(32.W))
    val out_s2    = Output(SInt(32.W))
    val out_s3    = Output(SInt(32.W))
    val out_s4    = Output(SInt(32.W))
    val out_s5    = Output(SInt(32.W))
    val out_s6    = Output(SInt(32.W))
    val out_s7    = Output(SInt(32.W))
    val out_s8    = Output(SInt(32.W))
    val out_s9    = Output(SInt(32.W))
    val out_s10   = Output(SInt(32.W))
    val out_s11   = Output(SInt(32.W))
    // << exposed save registers
})






































// PIPELINE (Main.scala)

// >> wiring argument registers to corresponding output ports
io.out_a0  := RegFile.io.a0
io.out_a1  := RegFile.io.a1
io.out_a2  := RegFile.io.a2
io.out_a3  := RegFile.io.a3
io.out_a4  := RegFile.io.a4
io.out_a5  := RegFile.io.a5
io.out_a6  := RegFile.io.a6
io.out_a7  := RegFile.io.a7
// << wiring argument registers to corresponding output ports

// >> wiring temporary registers to corresponding output ports
io.out_t0  := RegFile.io.t0
io.out_t1  := RegFile.io.t1
io.out_t2  := RegFile.io.t2
io.out_t3  := RegFile.io.t3
io.out_t4  := RegFile.io.t4
io.out_t5  := RegFile.io.t5
io.out_t6  := RegFile.io.t6
// << wiring temporary registers to corresponding output ports

// >> wiring save registers to corresponding output ports
io.out_s0  := RegFile.io.s0
io.out_s1  := RegFile.io.s1
io.out_s2  := RegFile.io.s2
io.out_s3  := RegFile.io.s3
io.out_s4  := RegFile.io.s4
io.out_s5  := RegFile.io.s5
io.out_s6  := RegFile.io.s6
io.out_s7  := RegFile.io.s7
io.out_s8  := RegFile.io.s8
io.out_s9  := RegFile.io.s9
io.out_s10 := RegFile.io.s10
io.out_s11 := RegFile.io.s11
// << wiring save registers to corresponding output ports

Finally, in our MainTest.scala, we add test cases following the structure shown in the code snippet below:


















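// MainTest.scala

// Imports assumed from the names used below (a ScalaTest version that still provides FreeSpec)
import chisel3._
import chiseltest._
import org.scalatest.FreeSpec

class TOPTest extends FreeSpec with ChiselScalatestTester{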
"test a0" in {
    
    // test program
    test(new PIPELINE("/home/mi2s/FProject/compilation/testA0.txt")){
        x =>
        
        // the number of clock cycles to finish the program
        x.clock.step(6)

        // the expected value of a0 register
        x.io.out_a0.expect(10.S)

       }
   }
}

2. Logging States

The code provided in the repository initially could not execute our test cases correctly. Consequently, we traced the execution and monitored the register states after each clock cycle. However, since neither Chisel nor the author of the repository offers a user-friendly debugging view like the one in the Ripes simulator, which displays register values, we implemented logging with printf statements. The following three code snippets demonstrate logging for temporary, argument, and saved registers.




// PIPELINE (Main.scala)

// t0-t6 : temporary registers
printf(p"[ ${hwCounter} ] t0: ${Hexadecimal(RegFile.io.t0)}, t1: ${Hexadecimal(RegFile.io.t1)}, t2: ${Hexadecimal(RegFile.io.t2)}, t3: ${Hexadecimal(RegFile.io.t3)}, t4: ${Hexadecimal(RegFile.io.t4)}, t5: ${Hexadecimal(RegFile.io.t5)}, t6: ${Hexadecimal(RegFile.io.t6)}\n")




// PIPELINE (Main.scala)

// a0-a7 : argument registers
printf(p"[ ${hwCounter} ] a0: ${Hexadecimal(RegFile.io.a0)}, a1: ${Hexadecimal(RegFile.io.a1)}, a2: ${Hexadecimal(RegFile.io.a2)}, a3: ${Hexadecimal(RegFile.io.a3)}, a4: ${Hexadecimal(RegFile.io.a4)}, a5: ${Hexadecimal(RegFile.io.a5)}, a6: ${Hexadecimal(RegFile.io.a6)}, a7: ${Hexadecimal(RegFile.io.a7)}\n")




// PIPELINE (Main.scala)

// s0-s11 : save registers
printf(p"[ ${hwCounter} ] s0: ${Hexadecimal(RegFile.io.s0)}, s1: ${Hexadecimal(RegFile.io.s1)}, s2: ${Hexadecimal(RegFile.io.s2)}, s3: ${Hexadecimal(RegFile.io.s3)}, s4: ${Hexadecimal(RegFile.io.s4)}, s5: ${Hexadecimal(RegFile.io.s5)}, s6: ${Hexadecimal(RegFile.io.s6)}, s7: ${Hexadecimal(RegFile.io.s7)}, s8: ${Hexadecimal(RegFile.io.s8)}, s9: ${Hexadecimal(RegFile.io.s9)}, s10: ${Hexadecimal(RegFile.io.s10)}, s11: ${Hexadecimal(RegFile.io.s11)}\n")
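
The printf calls above reference hwCounter, whose definition is not shown. A minimal sketch of how such a cycle counter could be declared is given below, assuming a free-running 32-bit register; the module name CycleLogger and its ports are hypothetical and only serve to make the example self-contained.

```scala
import chisel3._

// Illustrative module: counts clock cycles and tags each log line with the count.
class CycleLogger extends Module {
  val io = IO(new Bundle {
    val value = Input(SInt(32.W))   // any signal to be logged
    val cycle = Output(UInt(32.W))  // current cycle count
  })

  // Free-running counter, reset to 0 and incremented every clock cycle.
  val hwCounter = RegInit(0.U(32.W))
  hwCounter := hwCounter + 1.U
  io.cycle := hwCounter

  // The printf fires once per clock cycle during simulation.
  printf(p"[ ${hwCounter} ] value: ${Hexadecimal(io.value)}\n")
}
```

In PIPELINE, only the two hwCounter lines would be needed, placed before the logging statements.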

Additionally, to monitor and analyze the IF_ID and ID_EX pipeline registers and the ALU controls, we include the following code snippet, which logs supplementary information.




// PIPELINE (Main.scala)

// control signals from decode to execute (including ALU operands)
printf(p"[ ${hwCounter} ] idx: ${Decimal(PC.io.out / 4.S + 1.S)} op: 0x${Hexadecimal(control_module.io.opcode)} rs1: ${Decimal(RegFile.io.rs1)} (0x${Hexadecimal(RegFile.io.rdata1)}) rs2: ${Decimal(RegFile.io.rs2)} (0x${Hexadecimal(RegFile.io.rdata2)}) alu_arg1: 0x${Hexadecimal(ALU.io.in_A)} alu_arg2: 0x${Hexadecimal(ALU.io.in_B)} inst: 0x${Hexadecimal(InstMemory.io.data)} alu_ctrl_op_A: ${ID_EX_.io.ctrl_OpA_out} alu_forward_a: ${Forwarding.io.forward_a} alu_ctrl_op_B: ${ID_EX_.io.ctrl_OpB_out} alu_forward_b: ${Forwarding.io.forward_b} EXMEM_rd: ${Decimal(Forwarding.io.EXMEM_rd)} IDEX_rs1: ${Decimal(Forwarding.io.IDEX_rs1)} IDEX_rs1_data_out: 0x${Hexadecimal(ID_EX_.io.rs1_data_out)} EXMEM_alu_out: 0x${Hexadecimal(EX_MEM_M.io.EXMEM_alu_out)} IDEX_rs2_data: 0x${Hexadecimal(ID_EX_.io.rs2_data_out)} IDEX_rs1_data_in: 0x${Hexadecimal(ID_EX_.io.rs1_data_in)} fwd_rs1: ${Structural.io.fwd_rs1} MEM_WB_RD: ${Decimal(Forwarding.io.MEMWB_rd)} io_rs1: ${Decimal(ID_EX_.io.rs1_out)} io_rs2: ${Decimal(ID_EX_.io.rs2_out)} MEM_WB_RD_Data: ${Hexadecimal(MEM_WB_M.io.MEMWB_alu_out)} ALUOp: ${Decimal(ALU.io.alu_Op)}\n")

3. Structural Hazards

While testing one of our programs, we observed an unusual discrepancy by tracing the logs and comparing the register states with those produced by the Ripes simulator.

[          35 ] t0: 7fffffff, t1: 3fffffff, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          36 ] t0: 7fffffff, t1: 3fffffff, t2: 55555000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          37 ] t0: 7fffffff, t1: 3fffffff, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          38 ] t0: 7fffffff, t1: 15555555, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          39 ] t0: 6aaaaaaa, t1: 15555555, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          40 ] t0: 6aaaaaaa, t1: 1aaaaaaa, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          41 ] t0: 6aaaaaaa, t1: 1aaaaaaa, t2: 48888555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          42 ] t0: 6aaaaaaa, t1: 1aaaaaaa, t2: 48888888, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000

In Ripes, at clock cycle 41, register t2 is expected to change to 0x33333000 and then to 0x33333333 due to the instruction li t2, 0x33333333.

[          35 ] idx:          20 op: 0x33 rs1:   6 (0x00007fff) rs2:   7 (0x00000000) alu_arg1: 0x55555000 alu_arg2: 0x00000555 inst: 0x406282b3 alu_ctrl_op_A:  0 alu_forward_a:  2 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:   7 IDEX_rs1:   7 IDEX_rs1_data: 0x00000000 EXMEM_alu_out: 0x55555000
[          36 ] idx:          21 op: 0x33 rs1:   5 (0x7fffffff) rs2:   6 (0x3fffffff) alu_arg1: 0x3fffffff alu_arg2: 0x55555555 inst: 0x0022d313 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  2 EXMEM_rd:   7 IDEX_rs1:   6 IDEX_rs1_data: 0x3fffffff EXMEM_alu_out: 0x55555555
[          37 ] idx:          22 op: 0x13 rs1:   5 (0x7fffffff) rs2:   0 (0x00000000) alu_arg1: 0x7fffffff alu_arg2: 0x15555555 inst: 0x333333b7 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  2 EXMEM_rd:   6 IDEX_rs1:   5 IDEX_rs1_data: 0x7fffffff EXMEM_alu_out: 0x15555555
[          38 ] idx:          23 op: 0x37 rs1:   0 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0x6aaaaaaa alu_arg2: 0x00000002 inst: 0x33338393 alu_ctrl_op_A:  0 alu_forward_a:  2 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:   5 IDEX_rs1:   5 IDEX_rs1_data: 0x7fffffff EXMEM_alu_out: 0x6aaaaaaa
[          39 ] idx:          24 op: 0x13 rs1:   7 (0x55555555) rs2:   0 (0x00000000) alu_arg1: 0x15555555 alu_arg2: 0x33333000 inst: 0x00737333 alu_ctrl_op_A:  3 alu_forward_a:  0 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:   6 IDEX_rs1:   0 IDEX_rs1_data: 0x15555555 EXMEM_alu_out: 0x1aaaaaaa
[          40 ] idx:          25 op: 0x33 rs1:   6 (0x15555555) rs2:   7 (0x55555555) alu_arg1: 0x48888555 alu_arg2: 0x00000333 inst: 0x0072f3b3 alu_ctrl_op_A:  0 alu_forward_a:  2 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:   7 IDEX_rs1:   7 IDEX_rs1_data: 0x55555555 EXMEM_alu_out: 0x48888555
[          41 ] idx:          26 op: 0x33 rs1:   5 (0x6aaaaaaa) rs2:   7 (0x55555555) alu_arg1: 0x1aaaaaaa alu_arg2: 0x48888888 inst: 0x007302b3 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  2 EXMEM_rd:   7 IDEX_rs1:   6 IDEX_rs1_data: 0x1aaaaaaa EXMEM_alu_out: 0x48888888
[          42 ] idx:          27 op: 0x33 rs1:   6 (0x1aaaaaaa) rs2:   7 (0x48888555) alu_arg1: 0x6aaaaaaa alu_arg2: 0x48888888 inst: 0x0042d313 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  1 EXMEM_rd:   6 IDEX_rs1:   5 IDEX_rs1_data: 0x6aaaaaaa EXMEM_alu_out: 0x08888888

However, examining the ALU log reveals that at clock cycle 39 during the EXE stage, the CPU adds 0x15555555 and 0x33333000 instead of 0x00000000 and 0x33333000 as expected from the lui t2, 0x33333 instruction. Further analysis shows that the value 0x15555555 is incorrectly forwarded from the write-back pipeline register. This issue originates from the module responsible for hazard detection.
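
The wrong operand fully accounts for the values of t2 seen in the register log. The following plain-Scala check reproduces the arithmetic; LuiAddiCheck is a hypothetical name, and the constant 0x333 is the immediate of the addi that follows the lui (alu_arg2 at clock cycle 40).

```scala
// Reproduces the observed and expected values of t2 (illustrative only).
object LuiAddiCheck {
  val staleRs1 = 0x15555555 // value wrongly supplied as the first ALU operand
  val luiImm   = 0x33333000 // upper immediate of lui t2, 0x33333

  // What the CPU computed: stale operand plus the immediate, then addi 0x333.
  val observedAfterLui  = staleRs1 + luiImm
  val observedAfterAddi = observedAfterLui + 0x333

  // What Ripes reports: zero plus the immediate, then addi 0x333.
  val expectedAfterLui  = 0 + luiImm
  val expectedAfterAddi = expectedAfterLui + 0x333
}
```

The observed values evaluate to 0x48888555 and 0x48888888, matching the log at clock cycles 41 and 42, whereas the expected values are 0x33333000 and 0x33333333.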

The original implementation included a StructuralHazard class intended to resolve structural hazards but inadvertently handled data hazards instead, as shown in the code snippet below.
















// StructuralHazard (StructuralHazard.scala)

// Determine if forwarding is needed for rs1
when(io.MEM_WB_regWr && (io.MEM_WB_Rd === io.rs1)) {
    io.fwd_rs1 := true.B
}.otherwise {
    io.fwd_rs1 := false.B
}

// Determine if forwarding is needed for rs2
when(io.MEM_WB_regWr && (io.MEM_WB_Rd === io.rs2)) {
    io.fwd_rs2 := true.B
}.otherwise {
    io.fwd_rs2 := false.B
}

Additionally, its integration in Main.scala disrupted proper data forwarding by only addressing hazards from the MEM/WB pipeline and ignoring those from the EX/MEM pipeline. It also detected hazards at incorrect stages. To rectify this, we removed the flawed StructuralHazard class and correctly implemented structural hazard resolution.




















// PIPELINE (Main.scala)

// rs1_data
when (Structural.io.fwd_rs1 === 0.U) {
    S_rs1DataIn := RegFile.io.rdata1
}.elsewhen (Structural.io.fwd_rs1 === 1.U) {
    S_rs1DataIn := RegFile.io.w_data
}.otherwise {
    S_rs1DataIn := 0.S 
}

// rs2_data
when (Structural.io.fwd_rs2 === 0.U) {
    S_rs2DataIn := RegFile.io.rdata2
}.elsewhen (Structural.io.fwd_rs2 === 1.U) {
    S_rs2DataIn := RegFile.io.w_data
}.otherwise {
    S_rs2DataIn := 0.S
}

After removing the defective module, new issues emerged, observable in the logs of argument and temporary registers.

[          20 ] a0: ffffffff, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[          21 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[          22 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[          23 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[          24 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[          25 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000

[          20 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          21 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          22 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          23 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          24 ] t0: ffffffff, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          25 ] t0: ffffffff, t1: 7fffffff, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000

Specifically, at clock cycle 24, the instruction add t0, x0, a0 was supposed to complete execution. However, analysis of the control signal history in the decode and execute stages revealed that the latest value of a0 was never forwarded. The read of a0 therefore coincided with the write-back of a0, and the CPU fetched stale data: the register file read is combinational, whereas a write only takes effect at the next clock edge.

[          20 ] idx:           5 op: 0x00 rs1:   0 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0x000000e0 alu_arg2: 0x00000000 inst: 0x00a002b3 alu_ctrl_op_A:  1 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  0 EXMEM_rd:  10 IDEX_rs1:   0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x7fff0000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:  18 io_rs1:   0 io_rs2:   0 MEM_WB_RD_Data: 7fff0000 ALUOp:  31
[          21 ] idx:           6 op: 0x33 rs1:   0 (0x00000000) rs2:  10 (0xffffffff) alu_arg1: 0x00000000 alu_arg2: 0x00000000 inst: 0x0012d313 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  0 EXMEM_rd:   1 IDEX_rs1:   0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x000000e0 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:  10 io_rs1:   0 io_rs2:   0 MEM_WB_RD_Data: 7fff0000 ALUOp:   0
[          22 ] idx:           7 op: 0x13 rs1:   5 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0x00000000 alu_arg2: 0xffffffff inst: 0x0062e2b3 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  0 EXMEM_rd:   0 IDEX_rs1:   0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x00000000 IDEX_rs2_data: 0xffffffff IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:   1 io_rs1:   0 io_rs2:  10 MEM_WB_RD_Data: 000000e0 ALUOp:   0
[          23 ] idx:           8 op: 0x33 rs1:   5 (0x00000000) rs2:   6 (0x00000000) alu_arg1: 0xffffffff alu_arg2: 0x00000001 inst: 0x0022d313 alu_ctrl_op_A:  0 alu_forward_a:  2 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:   5 IDEX_rs1:   5 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0xffffffff IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:   0 io_rs1:   5 io_rs2:   0 MEM_WB_RD_Data: 00000000 ALUOp:   5
[          24 ] idx:           9 op: 0x13 rs1:   5 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0xffffffff alu_arg2: 0x7fffffff inst: 0x0062e2b3 alu_ctrl_op_A:  0 alu_forward_a:  1 alu_ctrl_op_B:  0 alu_forward_b:  2 EXMEM_rd:   6 IDEX_rs1:   5 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x7fffffff IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  1 MEM_WB_RD:   5 io_rs1:   5 io_rs2:   6 MEM_WB_RD_Data: ffffffff ALUOp:   6
[          25 ] idx:          10 op: 0x33 rs1:   5 (0xffffffff) rs2:   6 (0x00000000) alu_arg1: 0xffffffff alu_arg2: 0x00000002 inst: 0x0042d313 alu_ctrl_op_A:  0 alu_forward_a:  2 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:   5 IDEX_rs1:   5 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0xffffffff IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0xffffffff fwd_rs1:  0 MEM_WB_RD:   6 io_rs1:   5 io_rs2:   0 MEM_WB_RD_Data: 7fffffff ALUOp:   5

The forwarding paths already covered EX/MEM → ALU, MEM/WB → ALU, and MEM/WB → InstrDecode. The root cause was that the register file did not give a write priority over a read of the same register in the same cycle, so the read returned stale data. For example, while 0x7FFF0000 was being written to a0, the CPU simultaneously read a0 and obtained the stale value 0xFFFFFFFF. The correct register history, in which t0 receives 0x7FFF0000 at clock cycle 24, is shown below.

[          20 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          21 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          22 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          23 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          24 ] t0: 7fff0000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          25 ] t0: 7fff0000, t1: 3fff8000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000

To ensure that a register write takes priority over a same-cycle read of the same register, we revised the following section of the RegisterFile module:

// RegisterFile (RegisterFile.scala)

io.rdata1 := Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1))
io.rdata2 := Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2))

and replaced it with the following code snippet:

// RegisterFile (RegisterFile.scala)

// 1) Read old data from the array.
val readData1 = Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1))
val readData2 = Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2))

// 2) If there's a same-cycle write to the same register, override (bypass) it.
val bypassedData1 = Mux(io.reg_write && (io.w_reg === io.rs1) && (io.w_reg =/= 0.U), 
    io.w_data, 
    readData1)

val bypassedData2 = Mux(io.reg_write && (io.w_reg === io.rs2) && (io.w_reg =/= 0.U),
    io.w_data,
    readData2)

// 3) Send those results to outputs
io.rdata1 := bypassedData1
io.rdata2 := bypassedData2
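
The write-before-read behavior can also be checked against a small software model. The sketch below is plain Scala rather than Chisel; RegFileModel and its read helper are hypothetical names that only mirror the bypass condition used above, not code from the repository.

```scala
// Software model of a 32-entry register file with a same-cycle write bypass.
// RegFileModel and read are illustrative names, not part of the repository.
object RegFileModel {
  def read(regs: Vector[Int], rs: Int,
           regWrite: Boolean, wReg: Int, wData: Int): Int = {
    // x0 is hard-wired to zero.
    val stored = if (rs == 0) 0 else regs(rs)
    // A pending write to the same non-zero register overrides the stored value.
    if (regWrite && wReg == rs && wReg != 0) wData else stored
  }
}
```

With a0 (x10) still holding 0xffffffff and a pending write of 0x7fff0000 to x10, read returns 0x7fff0000, which is the value add t0, x0, a0 should have seen.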

4. Missing Instructions

After resolving these hazards, we encountered an unexpected issue with arithmetic operations. The logs below show the history of the saved registers and the control signals. The instruction 0x408a5a13 (srai s4, s4, 8) is fetched at clock cycle 64 and executed at clock cycle 66. At clock cycle 68, s4 changes from 0x83ff0000 to 0x0083ff00: the value was shifted right logically, without the sign extension that srai and sra require (the correct result is 0xff83ff00). This is evident from the ALUOp value of 5 at clock cycle 66, which corresponds to SRL instead of SRA.

[          64 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 00000000, s4: 00000000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[          65 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 00000000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[          66 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 04000000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[          67 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 83ff0000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[          68 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 0083ff00, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[          69 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 0083ff00, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000

[          64 ] idx:          64 op: 0x33 rs1:  18 (0x7fff0000) rs2:  20 (0x00000000) alu_arg1: 0x00000000 alu_arg2: 0x04000000 inst: 0x408a5a13 alu_ctrl_op_A:  3 alu_forward_a:  0 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:  19 IDEX_rs1:   0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x0000002b IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x7fff0000 fwd_rs1:  0 MEM_WB_RD:   8 io_rs1:   0 io_rs2:   0 MEM_WB_RD_Data: 00000000 ALUOp:   0
[          65 ] idx:          65 op: 0x13 rs1:  20 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0x7fff0000 alu_arg2: 0x04000000 inst: 0x7f8002b7 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  2 EXMEM_rd:  20 IDEX_rs1:  18 IDEX_rs1_data_out: 0x7fff0000 EXMEM_alu_out: 0x04000000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:  19 io_rs1:  18 io_rs2:  20 MEM_WB_RD_Data: 0000002b ALUOp:   0
[          66 ] idx:          66 op: 0x37 rs1:   0 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0x83ff0000 alu_arg2: 0x00000408 inst: 0x005a7a33 alu_ctrl_op_A:  0 alu_forward_a:  2 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:  20 IDEX_rs1:  20 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x83ff0000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:  20 io_rs1:  20 io_rs2:   0 MEM_WB_RD_Data: 04000000 ALUOp:   5
[          67 ] idx:          67 op: 0x33 rs1:  20 (0x83ff0000) rs2:   5 (0x00000001) alu_arg1: 0x00000000 alu_arg2: 0x7f800000 inst: 0xfff90a93 alu_ctrl_op_A:  3 alu_forward_a:  0 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:  20 IDEX_rs1:   0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x0083ff00 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x83ff0000 fwd_rs1:  1 MEM_WB_RD:  20 io_rs1:   0 io_rs2:   0 MEM_WB_RD_Data: 83ff0000 ALUOp:   0
[          68 ] idx:          68 op: 0x13 rs1:  18 (0x7fff0000) rs2:   0 (0x00000000) alu_arg1: 0x0083ff00 alu_arg2: 0x7f800000 inst: 0x01fada93 alu_ctrl_op_A:  0 alu_forward_a:  1 alu_ctrl_op_B:  0 alu_forward_b:  2 EXMEM_rd:   5 IDEX_rs1:  20 IDEX_rs1_data_out: 0x83ff0000 EXMEM_alu_out: 0x7f800000 IDEX_rs2_data: 0x00000001 IDEX_rs1_data_in: 0x7fff0000 fwd_rs1:  0 MEM_WB_RD:  20 io_rs1:  20 io_rs2:   5 MEM_WB_RD_Data: 0083ff00 ALUOp:   7
[          69 ] idx:          69 op: 0x13 rs1:  21 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0x7fff0000 alu_arg2: 0xffffffff inst: 0x013912b3 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:  20 IDEX_rs1:  18 IDEX_rs1_data_out: 0x7fff0000 EXMEM_alu_out: 0x00800000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:   5 io_rs1:  18 io_rs2:   0 MEM_WB_RD_Data: 7f800000 ALUOp:   0

Through debugging and observation, we discovered that some instructions were not implemented correctly. Specifically, in the ALU module, SRA and SRAI should be assigned an ALUOp value of 13 instead of 5.

// AluOpCode (ALU.scala)

val ALU_ADD     =   0.U(5.W)
val ALU_ADDI    =   0.U(5.W)
val ALU_SW      =   0.U(5.W)
val ALU_LW      =   0.U(5.W)
val ALU_LUI     =   0.U(5.W)
val ALU_AUIPC   =   0.U(5.W)
val ALU_SLL     =   1.U(5.W)
val ALU_SLLI    =   1.U(5.W)
val ALU_SLT     =   2.U(5.W)
val ALU_SLTI    =   2.U(5.W)
val ALU_SLTU    =   3.U(5.W)
val ALU_SLTUI   =   3.U(5.W)
val ALU_XOR     =   4.U(5.W)
val ALU_XORI    =   4.U(5.W)
val ALU_SRL     =   5.U(5.W)
val ALU_SRLI    =   5.U(5.W)
val ALU_OR      =   6.U(5.W)
val ALU_ORI     =   6.U(5.W)
val ALU_AND     =   7.U(5.W)
val ALU_ANDI    =   7.U(5.W)
val ALU_SUB     =   8.U(5.W)
val ALU_SRA     =   13.U(5.W)
val ALU_SRAI    =   13.U(5.W)
val ALU_JAL     =   31.U(5.W)
val ALU_JALR    =   31.U(5.W)
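
The difference between the two shifts can be reproduced directly in Scala, where >>> is a logical and >> an arithmetic right shift on Int. The sketch below (ShiftDemo is an illustrative name) applies both to the value 0x83ff0000 observed in s4.

```scala
// Compare logical and arithmetic right shifts on the value observed in s4.
object ShiftDemo {
  val s4: Int = 0x83ff0000        // negative when read as a signed 32-bit value
  val logical: Int = s4 >>> 8     // SRL/SRLI behavior: zero-fills the upper bits
  val arithmetic: Int = s4 >> 8   // SRA/SRAI behavior: copies the sign bit
}
```

Here logical is 0x0083ff00, the value that appeared in the log, while arithmetic is 0xff83ff00, the value srai should produce.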

The original ALU Control code only supported I-type instructions with operation codes less than 8, because it derived the code from func3 alone, whose values range from 000 to 111 (0 to 7).

// AluControl (Alu_Control.scala)

// R type
when (io.aluOp === 0.U) {
    io.out := Cat(0.U(2.W), io.func7, io.func3)
// I type
}.elsewhen (io.aluOp === 1.U) {
    io.out := Cat("b00".U(2.W), io.func3)
}

To fix this issue, we referred to the RISC-V instruction set and extended the ALU module to include the missing instructions. The following table illustrates the I-type and R-type instructions.

To accurately calculate the aluOp code, the ALU Control unit must consider the entire func7 field of I-type and R-type instructions. Originally, func7 was defined as a boolean in ALU Control, which was incorrect. We rectified this by defining func7 as a 7-bit unsigned integer.

// AluControl (Alu_Control.scala)

val io = IO(new Bundle {
    val func3 = Input(UInt(3.W))
    val func7 = Input(UInt(7.W)) // changed from Input(Bool())
    val aluOp = Input(UInt(3.W))
    val out = Output(UInt(5.W))
})

Additionally, outside of ALU Control, we ensured that the correct bits for func7 are properly extracted.

// PIPELINE (Main.scala)
ID_EX_.io.func7_in := IF_ID_.io.SelectedInstr_out(31, 25) // changed from IF_ID_.io.SelectedInstr_out(30)

The revised ALU Control code snippet below now supports SRA, SRAI, and SUB instructions, which have operation codes greater than 7.

// AluControl (Alu_Control.scala)

// R type
when (io.aluOp === 0.U) {
    when ((io.func7 === "b0100000".U(7.W)) && (io.func3 === "b000".U(3.W))) {
        io.out := 8.U // SUB, originally broken
    }.elsewhen ((io.func7 === "b0100000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
        io.out := 13.U // SRA, originally broken
    }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
        io.out := 5.U // SRL
    }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b001".U(3.W))) {
        io.out := 1.U // SLL
    }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b010".U(3.W))) {
        io.out := 2.U // SLT
    }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b011".U(3.W))) {
        io.out := 3.U // SLTU
    }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b100".U(3.W))) {
        io.out := 4.U // XOR
    }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b111".U(3.W))) {
        io.out := 7.U // AND
    }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b110".U(3.W))) {
        io.out := 6.U // OR
    }.otherwise {
        io.out := Cat(0.U(2.W), io.func7, io.func3)
    }

// I type
}.elsewhen(io.aluOp === 1.U) {
    when ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
        io.out := 5.U // SRLI
    }.elsewhen ((io.func7 === "b0100000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
        io.out := 13.U // SRAI, originally broken
    }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b001".U(3.W))) {
        io.out := 1.U // SLLI
    }.otherwise {
        io.out := Cat("b00".U(2.W), io.func3)
    }
}
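
As a cross-check of the decoding above, the following plain-Scala sketch extracts func3 and func7 from an instruction word and selects the same ALU codes for the base RV32I R-type and I-type ALU instructions. AluControlModel is a hypothetical helper; it switches on the opcode directly instead of the aluOp signal and does not cover the M-extension.

```scala
// Software model of the ALU code selection for base RV32I instructions.
// AluControlModel is an illustrative name, not part of the repository.
object AluControlModel {
  def opcode(inst: Int): Int = inst & 0x7f
  def funct3(inst: Int): Int = (inst >>> 12) & 0x7
  def funct7(inst: Int): Int = (inst >>> 25) & 0x7f   // bits 31..25

  def aluCode(inst: Int): Int = {
    val f3 = funct3(inst)
    val f7 = funct7(inst)
    opcode(inst) match {
      case 0x33 => (f7, f3) match {      // R type
        case (0x20, 0) => 8              // SUB
        case (0x20, 5) => 13             // SRA
        case _         => f3             // ADD, SLL, SLT, SLTU, XOR, SRL, OR, AND
      }
      case 0x13 => (f7, f3) match {      // I type
        case (0x20, 5) => 13             // SRAI
        case _         => f3             // ADDI, SLLI, SLTI, ..., SRLI, ORI, ANDI
      }
      case other =>
        throw new IllegalArgumentException(f"unsupported opcode 0x$other%02x")
    }
  }
}
```

For the instruction from the log, AluControlModel.aluCode(0x408a5a13) yields 13 (SRAI), whereas 0x0012d313 (srli t1, t0, 1) yields 5.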

M Extension

1. ALU Control

Filepath: src/main/scala/Pipeline/UNits/Alu_Control.scala

To implement the M-extension, we need to modify the AluControl module to allow the func7 signal to be passed into the module. Below is the updated definition for the AluControl class:

class AluControl extends Module {
  val io = IO(new Bundle {
    val func3 = Input(UInt(3.W))  // 3-bit function code for RISC-V instructions
    val func7 = Input(UInt(7.W))  // 7-bit function code for RISC-V instructions (used for M-extension)
    val aluOp = Input(UInt(3.W))  // ALU operation selector
    val out = Output(UInt(5.W))   // ALU operation output code
  })
  io.out := 0.U
  ...
}

In the R-type instruction logic, we need to add a condition to handle M-extension instructions. Specifically, when func7 equals b0000001, the instruction corresponds to an M-extension operation, such as multiplication (MUL), division (DIV), or remainder (REM). Below is the updated code for supporting M-extension:

// R type
when(io.aluOp === 0.U) {
  // First, check for M-extension: func7 === "b0000001"
  when(io.func7 === "b0000001".U) {
    // M-extension operations (e.g., MUL, DIV, REM)
    switch(io.func3) {
      is("b000".U) { io.out := 14.U } // MUL
      is("b001".U) { io.out := 15.U } // MULH
      is("b010".U) { io.out := 16.U } // MULHSU
      is("b011".U) { io.out := 17.U } // MULHU
      is("b100".U) { io.out := 18.U } // DIV
      is("b101".U) { io.out := 19.U } // DIVU
      is("b110".U) { io.out := 20.U } // REM
      is("b111".U) { io.out := 21.U } // REMU
    }
  ...
}

Adding func7 Input:
- The func7 signal is now passed as an input to the AluControl module. This allows the module to distinguish between standard R-type instructions and M-extension instructions, as M-extension operations are identified by func7 === "b0000001".
Condition for M-extension:
- A new when block is introduced to check if func7 equals b0000001, which indicates an M-extension instruction.
- Inside this block, a switch statement is used to determine the specific operation based on the func3 value.
Assigning ALU Operation Codes:
- Each M-extension instruction (e.g., MUL, DIV, REM) is assigned a unique 5-bit operation code (14.U to 21.U), which corresponds to the predefined codes in the ALU.
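
In the listing above, the assigned codes follow the pattern 14 + func3. The sketch below shows that mapping in plain Scala; MExtControlModel and decode are hypothetical names, and the model works on a raw instruction word rather than on the aluOp signal.

```scala
// Software model of the M-extension code selection (codes 14 to 21).
// MExtControlModel is an illustrative name, not part of the repository.
object MExtControlModel {
  private val names =
    Vector("MUL", "MULH", "MULHSU", "MULHU", "DIV", "DIVU", "REM", "REMU")

  // Returns Some((aluCode, mnemonic)) for an M-extension instruction, else None.
  def decode(inst: Int): Option[(Int, String)] = {
    val opcode = inst & 0x7f
    val f3 = (inst >>> 12) & 0x7
    val f7 = (inst >>> 25) & 0x7f
    if (opcode == 0x33 && f7 == 0x01) Some((14 + f3, names(f3))) else None
  }
}
```

For example, 0x02b50533 (mul a0, a0, a1) decodes to code 14, while a base instruction such as 0x00a002b3 (add t0, x0, a0) is not matched.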

2. ALU

Extending the ALU to Support M-extension Instructions

To fully implement the M-extension, we need to modify both the AluOpCode object and the ALU module. Below are the detailed steps with the modifications.

Modifying AluOpCode to Include M-extension Instruction Types

We add operation codes for the M-extension instructions (MUL, DIV, REM, etc.) in the AluOpCode object. These codes will represent each specific M-extension operation.

object AluOpCode {
  ...

  // M-extension operations
  val ALU_MUL     =   14.U(5.W)  // Multiplication
  val ALU_MULH    =   15.U(5.W)  // Multiplication high (signed)
  val ALU_MULHSU  =   16.U(5.W)  // Multiplication high (signed x unsigned)
  val ALU_MULHU   =   17.U(5.W)  // Multiplication high (unsigned)
  val ALU_DIV     =   18.U(5.W)  // Division (signed)
  val ALU_DIVU    =   19.U(5.W)  // Division (unsigned)
  val ALU_REM     =   20.U(5.W)  // Remainder (signed)
  val ALU_REMU    =   21.U(5.W)  // Remainder (unsigned)
}

Each M-extension instruction is assigned a unique 5-bit operation code.
These codes are used by the AluControl module to generate the appropriate alu_Op value for the ALU.

Implementing M-extension Operations in the ALU Module

The ALU module is extended to perform the M-extension operations based on the alu_Op value provided.

class ALU extends Module {
  val io = IO(new Bundle {
    val in_A = Input(SInt(32.W))    // First operand
    val in_B = Input(SInt(32.W))    // Second operand
    val alu_Op = Input(UInt(5.W))   // ALU operation code
    val out = Output(SInt(32.W))    // ALU result
  })

  val result = WireDefault(0.S(32.W))  // Default result is zero

  switch(io.alu_Op) {
    ...

    // M-extension operations
    is(ALU_MUL) {
      result := io.in_A * io.in_B  // Standard multiplication
    }
    is(ALU_MULH) {
      result := (io.in_A * io.in_B)(63, 32).asSInt  // High 32 bits of signed multiplication
    }
    is(ALU_MULHSU) {
      result := (io.in_A.asSInt * io.in_B.asUInt)(63, 32).asSInt  // High 32 bits (signed x unsigned)
    }
    is(ALU_MULHU) {
      result := (io.in_A.asUInt * io.in_B.asUInt)(63, 32).asSInt  // High 32 bits of unsigned multiplication
    }
    is(ALU_DIV) {
      result := io.in_A / io.in_B  // Signed division
    }
    is(ALU_DIVU) {
      result := (io.in_A.asUInt / io.in_B.asUInt).asSInt  // Unsigned division
    }
    is(ALU_REM) {
      result := io.in_A % io.in_B  // Signed remainder
    }
    is(ALU_REMU) {
      result := (io.in_A.asUInt % io.in_B.asUInt).asSInt  // Unsigned remainder
    }
  }

  io.out := result  // Output the result
}
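
When testing these operations, it helps to have a software reference for the expected results. The RISC-V specification defines division by zero (quotient of all ones, remainder equal to the dividend) and signed overflow (-2^31 / -1 gives quotient -2^31 and remainder 0). The Chisel snippet above does not special-case these inputs, so an implementation may need explicit checks. The plain-Scala sketch below (MExtModel is a hypothetical name) encodes those rules and can serve as a golden model.

```scala
// Reference model for the RV32M operations on 32-bit values.
// MExtModel is an illustrative name, not part of the repository.
object MExtModel {
  private def u(x: Int): Long = x & 0xFFFFFFFFL   // zero-extend to 64 bits

  def mul(a: Int, b: Int): Int    = a * b                                // low 32 bits
  def mulh(a: Int, b: Int): Int   = ((a.toLong * b.toLong) >> 32).toInt  // signed x signed
  def mulhsu(a: Int, b: Int): Int = ((a.toLong * u(b)) >> 32).toInt      // signed x unsigned
  def mulhu(a: Int, b: Int): Int  = ((u(a) * u(b)) >>> 32).toInt         // unsigned x unsigned

  def div(a: Int, b: Int): Int =
    if (b == 0) -1                                        // all ones
    else if (a == Int.MinValue && b == -1) Int.MinValue   // signed overflow
    else a / b                                            // truncates toward zero

  def divu(a: Int, b: Int): Int =
    if (b == 0) -1 else Integer.divideUnsigned(a, b)

  def rem(a: Int, b: Int): Int =
    if (b == 0) a
    else if (a == Int.MinValue && b == -1) 0
    else a % b                                            // sign follows the dividend

  def remu(a: Int, b: Int): Int =
    if (b == 0) a else Integer.remainderUnsigned(a, b)
}
```

For instance, MExtModel.div(7, 0) returns -1 and MExtModel.rem(7, 0) returns 7, which are the values a test program should expect in the destination register.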

5. Pipeline Flushing

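Since a branch or jump takes effect during the MEM stage, the instructions already fetched behind it must be discarded. We flush both the IF/ID and ID/EX pipeline registers by zeroing their inputs, so that the discarded instructions behave like NOPs (addi x0, x0, 0), and we clear all control signals. The corrected code is shown below: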

// PIPELINE (Main.scala)

when(HazardDetect.io.pc_forward === 1.B) {
    PC.io.in := HazardDetect.io.pc_out
}.otherwise {
    when(control_module.io.next_pc_sel === "b01".U) {
        when(Branch_M.io.br_taken === 1.B && control_module.io.branch === 1.B) {
            PC.io.in := ImmGen.io.SB_type
            // Flush IF/ID
            IF_ID_.io.pc_in := 0.S
            IF_ID_.io.pc4_in := 0.U
            IF_ID_.io.SelectedPC := 0.S
            IF_ID_.io.SelectedInstr := 0.U
            // Flush ID_EX (set inputs to zero so next cycle ID_EX_ registers become 0)
            ID_EX_.io.rs1_in           := 0.U
            ID_EX_.io.rs2_in           := 0.U
            ID_EX_.io.imm              := 0.S
            ID_EX_.io.func3_in         := 0.U
            ID_EX_.io.func7_in         := 0.U
            ID_EX_.io.rd_in            := 0.U
            // Also set the control signals to 0 so no writes occur:
            ID_EX_.io.ctrl_MemWr_in    := 0.U
            ID_EX_.io.ctrl_MemRd_in    := 0.U
            ID_EX_.io.ctrl_MemToReg_in := 0.U
            ID_EX_.io.ctrl_OpA_in      := 0.U
            ID_EX_.io.ctrl_OpB_in      := 0.U
            ID_EX_.io.ctrl_Branch_in   := 0.U
            ID_EX_.io.ctrl_nextpc_in   := 0.U
            ID_EX_.io.IFID_pc4_in      := 0.U
            ID_EX_.io.rs1_data_in      := 0.S
            ID_EX_.io.rs2_data_in      := 0.S
        }.otherwise {
            PC.io.in := PC4.io.out.asSInt
        }
    }.elsewhen(control_module.io.next_pc_sel === "b10".U) {
        PC.io.in := ImmGen.io.UJ_type
        // Flush IF/ID
        IF_ID_.io.pc_in := 0.S
        IF_ID_.io.pc4_in := 0.U
        IF_ID_.io.SelectedPC := 0.S
        IF_ID_.io.SelectedInstr := 0.U
        // Flush ID_EX (set inputs to zero so next cycle ID_EX_ registers become 0)
        ID_EX_.io.rs1_in           := 0.U
        ID_EX_.io.rs2_in           := 0.U
        ID_EX_.io.imm              := 0.S
        ID_EX_.io.func3_in         := 0.U
        ID_EX_.io.func7_in         := 0.U
        ID_EX_.io.rd_in            := 0.U
        // Also set the control signals to 0 so no writes occur:
        ID_EX_.io.ctrl_MemWr_in    := 0.U
        ID_EX_.io.ctrl_MemRd_in    := 0.U
        ID_EX_.io.ctrl_MemToReg_in := 0.U
        ID_EX_.io.ctrl_OpA_in      := 0.U
        ID_EX_.io.ctrl_OpB_in      := 0.U
        ID_EX_.io.ctrl_Branch_in   := 0.U
        ID_EX_.io.ctrl_nextpc_in   := 0.U
        ID_EX_.io.IFID_pc4_in      := 0.U
        ID_EX_.io.rs1_data_in      := 0.S
        ID_EX_.io.rs2_data_in      := 0.S
    }.elsewhen(control_module.io.next_pc_sel === "b11".U) {
        PC.io.in := JALR.io.out.asSInt
        // Flush IF/ID
        IF_ID_.io.pc_in := 0.S
        IF_ID_.io.pc4_in := 0.U
        IF_ID_.io.SelectedPC := 0.S
        IF_ID_.io.SelectedInstr := 0.U
        // Flush ID_EX (set inputs to zero so next cycle ID_EX_ registers become 0)
        ID_EX_.io.rs1_in           := 0.U
        ID_EX_.io.rs2_in           := 0.U
        ID_EX_.io.imm              := 0.S
        ID_EX_.io.func3_in         := 0.U
        ID_EX_.io.func7_in         := 0.U
        ID_EX_.io.rd_in            := 0.U
        // Also set the control signals to 0 so no writes occur:
        ID_EX_.io.ctrl_MemWr_in    := 0.U
        ID_EX_.io.ctrl_MemRd_in    := 0.U
        ID_EX_.io.ctrl_MemToReg_in := 0.U
        ID_EX_.io.ctrl_OpA_in      := 0.U
        ID_EX_.io.ctrl_OpB_in      := 0.U
        ID_EX_.io.ctrl_Branch_in   := 0.U
        ID_EX_.io.ctrl_nextpc_in   := 0.U
        ID_EX_.io.IFID_pc4_in      := 0.U
        ID_EX_.io.rs1_data_in      := 0.S
        ID_EX_.io.rs2_data_in      := 0.S
    }.otherwise {
        PC.io.in := PC4.io.out.asSInt
    }
}
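
The effect of the flush on the two pipeline registers can be summarized with a toy model. The sketch below is plain Scala with hypothetical names (FlushModel, Regs, step); it tracks only the instruction word held in each register and uses 0 for a bubble, matching the zeroed inputs above.

```scala
// Toy model of the IF/ID and ID/EX registers around a PC redirect.
// FlushModel, Regs and step are illustrative names, not part of the repository.
object FlushModel {
  final case class Regs(ifId: Int, idEx: Int)
  val Bubble = 0

  // One clock edge: the fetched word enters IF/ID and the old IF/ID word moves
  // to ID/EX, unless the PC is redirected, in which case both become bubbles.
  def step(regs: Regs, fetched: Int, redirect: Boolean): Regs =
    if (redirect) Regs(Bubble, Bubble) else Regs(fetched, regs.ifId)
}
```

A taken branch or jump therefore costs two bubbles in this model, one for each instruction that was fetched behind it.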

C. Test Cases

1. argmax

This RISC-V assembly program finds the index of the maximum value in a predefined integer array. It initializes the array with three elements (0, 2, 1) and iterates through it, comparing each element with the current maximum value. The program uses registers to track the current maximum value (t0), its index (t1), and the current index (t2). If an element is greater than or equal to the current maximum (bge), both the maximum value and its index are updated. Once the loop completes, the index of the maximum value is stored in register a0, and the program exits using a system call. The addi s0, s0, 0 instructions leave s0 unchanged and act as NOP padding between dependent instructions and after branches and jumps. This implementation demonstrates basic array traversal and conditional updates in assembly.

.data
array: .word 0, 0, 0
.text
_start:
    la a0, array
    li t1, 0
    addi s0, s0, 0
    addi s0, s0, 0
    sw t1, 0(a0)
    li t1, 2
    addi s0, s0, 0
    addi s0, s0, 0
    sw t1, 4(a0)
    li t1, 1
    addi s0, s0, 0
    addi s0, s0, 0
    sw t1, 8(a0)
    li a1, 3
argmax:
    li t6, 1
    lw t0, 0(a0)
    li t1, 0
    li t2, 1
loop_start:
    beq t2, a1, end
    addi s0, s0, 0
    addi s0, s0, 0
    addi a0, a0, 4
    addi s0, s0, 0
    addi s0, s0, 0
    lw t3, 0(a0)
    addi s0, s0, 0
    addi s0, s0, 0
    bge t3, t0, set_max_num
    addi s0, s0, 0
    addi s0, s0, 0
    addi t2, t2, 1
    addi s0, s0, 0
    addi s0, s0, 0
    j loop_start
    addi s0, s0, 0
    addi s0, s0, 0
set_max_num:
    mv t0, t3
    mv t1, t2
    addi t2, t2, 1
    addi s0, s0, 0
    addi s0, s0, 0
    j loop_start
    addi s0, s0, 0
    addi s0, s0, 0
end:
    mv a0, t1
    li a7, 10
    ecall
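
The expected result can be derived from a direct software version of the same loop. The sketch below is a plain-Scala rendering (ArgmaxModel is a hypothetical name) that keeps the >= comparison of the bge instruction, so ties resolve to the later index.

```scala
// Software version of the argmax loop; mirrors the register roles of the assembly.
// ArgmaxModel is an illustrative name, not part of the repository.
object ArgmaxModel {
  def argmax(xs: Array[Int]): Int = {
    require(xs.nonEmpty, "array must not be empty")
    var best = xs(0)      // t0: current maximum
    var bestIdx = 0       // t1: index of the maximum
    var i = 1             // t2: current index
    while (i < xs.length) {
      if (xs(i) >= best) { best = xs(i); bestIdx = i }   // bge t3, t0, set_max_num
      i += 1
    }
    bestIdx
  }
}
```

For the array (0, 2, 1), ArgmaxModel.argmax returns 1, the value the test expects on the output.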

2. clz

This RISC-V assembly program calculates the number of leading zeros in a 32-bit integer. The program starts by loading a value (0x70000002) into register a0 and calls the my_clz function. In my_clz, the input value is processed using a bitmask (t3) initialized to 0x80000000 (representing the most significant bit). A loop checks each bit from left to right by performing a bitwise AND operation between the input value and the bitmask. If the current bit is 1, the loop exits; otherwise, the bitmask is right-shifted, and a counter (t1) is incremented. Once the loop completes, the count of leading zeros is returned in a0, and the program exits.

main:
    li a0, 0x70000002
    jal ra, my_clz
    li a7, 10
    ecall

my_clz:
    mv t0, a0
    li t1, 0
    li t3, 0x80000000
clz_loop:
    and t4, t0, t3
    bne t4, x0, exit_clz
    srli t3, t3, 1
    addi t1, t1, 1
    bnez t3, clz_loop
exit_clz:
    mv a0, t1
    ret
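
The bitmask loop translates almost line by line into Scala. The following sketch (ClzModel is a hypothetical name) is a software rendering of clz_loop and can be compared against the JDK's Integer.numberOfLeadingZeros.

```scala
// Software version of the bitmask-based leading-zero count.
// ClzModel is an illustrative name, not part of the repository.
object ClzModel {
  def clz(x: Int): Int = {
    var mask = 0x80000000   // t3: starts at the most significant bit
    var count = 0           // t1: number of leading zeros seen so far
    while (mask != 0 && (x & mask) == 0) {
      mask = mask >>> 1     // srli t3, t3, 1
      count += 1            // addi t1, t1, 1
    }
    count
  }
}
```

The loop stops either at the first set bit or when the mask has been shifted out completely, in which case the count is 32.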

3. fabsf

This RISC-V assembly program calculates the absolute value of a 32-bit floating-point number. The program begins by loading the value 0xFFFFFFFF into register a0, representing the input, and then calls the fabsf function. Inside fabsf, a bitmask (0x7FFFFFFF) is loaded into t0, which clears the sign bit of the input number when applied using a bitwise AND operation. The result, stored back in a0, represents the absolute value of the input. Finally, the program exits the function and terminates using a system call.

main:
    li a0, 0xFFFFFFFF
    jal ra, fabsf
    li a7, 10 
    ecall
fabsf:
    li t0, 0x7FFFFFFF
    and a0, a0, t0
    jr ra
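
The same masking can be tried on the JVM, which exposes the raw bits of a float. The sketch below (FabsModel is a hypothetical name) applies the 0x7FFFFFFF mask to a bit pattern and, as a convenience, to a Float value.

```scala
// Clearing the sign bit of an IEEE 754 single-precision bit pattern.
// FabsModel is an illustrative name, not part of the repository.
object FabsModel {
  def fabsBits(bits: Int): Int = bits & 0x7FFFFFFF   // and a0, a0, t0

  def fabs(x: Float): Float =
    java.lang.Float.intBitsToFloat(fabsBits(java.lang.Float.floatToRawIntBits(x)))
}
```

For the input 0xFFFFFFFF the masked pattern is 0x7FFFFFFF, which is 2147483647 when read as a signed integer, the value checked by the test.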

4. fp16 to 32

This RISC-V assembly program converts a 16-bit floating-point number (FP16) to a 32-bit floating-point number (FP32). The main function loads 0xFFFFFFFF into register a0, of which only the lower 16 bits (0xFFFF) are used as the FP16 input, and calls the fp16_to_fp32 function. Within fp16_to_fp32, the program handles sign extraction, normalization, and exponent adjustment. The my_clz function is used to calculate the number of leading zeros for normalization. The program adjusts the FP16 format to FP32 by aligning the mantissa, adding a bias to the exponent, and managing special cases like zeros, infinities, and NaNs. Finally, the result is constructed by combining the sign, exponent, and mantissa and is returned in a0. The program uses the stack to save and restore registers across function calls.

main:
    li a0, 0xFFFFFFFF
    jal ra, fp16_to_fp32
    li a7, 10
    ecall
my_clz:
    my_clz_prologue:
        add t0, x0, a0
    my_clz_padding:
        srli t1, t0, 1
        or t0, t0, t1
        srli t1, t0, 2
        or t0, t0, t1
        srli t1, t0, 4
        or t0, t0, t1
        srli t1, t0, 8
        or t0, t0, t1
        srli t1, t0, 16
        or t0, t0, t1
    my_clz_popcount:
        srli t1, t0, 1
        li t2, 0x55555555
        and t1, t1, t2
        sub t0, t0, t1
        srli t1, t0, 2
        li t2, 0x33333333
        and t1, t1, t2
        and t2, t0, t2
        add t0, t1, t2
        srli t1, t0, 4
        add t1, t1, t0
        li t2, 0x0F0F0F0F
        and t0, t1, t2
        srli t1, t0, 8
        add t0, t0, t1
        srli t1, t0, 16
        add t0, t0, t1
        li t2, 0x3F
        and t0, t0, t2
        li t1, 32
        sub a0, t1, t0
    my_clz_epilogue:
        jr ra
fp16_to_fp32:
    fp16_to_fp32_prologue:
        addi sp, sp, -28
        sw ra, 0(sp)
        sw s0, 4(sp)
        sw s1, 8(sp)
        sw s2, 12(sp)
        sw s3, 16(sp)
        sw s4, 20(sp)
        sw s5, 24(sp)
    fp16_to_fp32_prologue_after:
        slli s0, a0, 16
        li s1, 0x80000000
        and s1, s1, s0
        li s2, 0x7FFFFFFF
        and s2, s2, s0
        mv a0, s2
        jal ra, my_clz
        li s3, 0
        li t0, 5
        slt t0, t0, a0
        beq t0, x0, fp16_to_fp32_post_overflow_check
        addi s3, a0, -5
    fp16_to_fp32_post_overflow_check:
        li s4, 0x04000000
        add s4, s2, s4
        srai s4, s4, 8
        li t0, 0x7F800000
        and s4, s4, t0
        addi s5, s2, -1
        srai s5, s5, 31    # arithmetic shift: the zero mask must be all ones when s2 == 0
        sll t0, s2, s3
        srli t0, t0, 3
        li t1, 0x70
        sub t1, t1, s3
        slli t1, t1, 23
        add t0, t0, t1
        or t0, t0, s4
        not t1, s5
        and t0, t0, t1
        or a0, s1, t0
    fp16_to_fp32_epilogue:
        lw ra, 0(sp)
        lw s0, 4(sp)
        lw s1, 8(sp)
        lw s2, 12(sp)
        lw s3, 16(sp)
        lw s4, 20(sp)
        lw s5, 24(sp)
        addi sp, sp, 28
        jr ra
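
To see where the expected output comes from, the bit manipulations can be replayed in plain Scala. The sketch below (Fp16ToFp32 and convert are hypothetical names) follows the same steps, with two labeled simplifications: it uses Integer.numberOfLeadingZeros in place of my_clz, and it computes the zero mask with an arithmetic shift (>> 31), so that a zero input yields an all-ones mask.

```scala
// Software rendering of the FP16 to FP32 bit manipulation.
// Fp16ToFp32 and convert are illustrative names, not part of the repository.
object Fp16ToFp32 {
  def convert(h: Int): Int = {
    val w = h << 16                                     // slli s0, a0, 16
    val sign = w & 0x80000000                           // s1
    val nonsign = w & 0x7FFFFFFF                        // s2
    val lz = Integer.numberOfLeadingZeros(nonsign)      // stands in for my_clz
    val renormShift = if (lz > 5) lz - 5 else 0         // s3
    val infNanMask = ((nonsign + 0x04000000) >> 8) & 0x7F800000   // s4 (uses srai)
    val zeroMask = (nonsign - 1) >> 31                  // s5: all ones only for zero input
    val magnitude =
      ((nonsign << renormShift) >>> 3) + ((0x70 - renormShift) << 23)
    sign | ((magnitude | infNanMask) & ~zeroMask)
  }
}
```

For the input used in main, Fp16ToFp32.convert(0xFFFFFFFF) gives 0xFFFFE000, which is -8192 as a signed integer. The FP16 encoding of 1.0 (0x3C00) maps to 0x3F800000.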

5. multiply

This RISC-V assembly program performs multiplication using the shift-and-add method, a bitwise algorithm. It takes two numbers (a multiplier in a1 and a multiplicand in a3) and calculates their product without using the mul instruction. The program contains branches intended to negate negative operands before the computation; with the operands used here (6 and 7), they are not taken. A loop counter (t1) initialized to 32 steps through each bit of the multiplier. For each bit, the multiplicand is added to an accumulator (t0) if the bit is 1; the multiplier is then shifted right and the multiplicand is shifted left. The result is stored in a0 at the end, and the program exits.

main:
    li a1, 6
    li a3, 7
    li t0, 0
    li t1, 32
    bltz a1, handle_negative1
    j shift_and_add_loop
    bltz a3, handle_negative2
    j shift_and_add_loop
handle_negative1:
    neg a1, a1
handle_negative2:
    neg a3, a3
shift_and_add_loop:
    beqz t1, end_shift_and_add
    andi t2, a1, 1
    beqz t2, skip_add
    add t0, t0, a3
skip_add:
    srai a1, a1, 1
    slli a3, a3, 1
    addi t1, t1, -1
    j shift_and_add_loop
end_shift_and_add:
    mv a0, t0
    li a7, 10
    ecall
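
The loop itself can be written in a few lines of Scala. The sketch below (ShiftAddModel is a hypothetical name) reproduces shift_and_add_loop without the sign-handling branches; because 32-bit arithmetic wraps around in two's complement, the 32 iterations already produce the low 32 bits of the product for negative operands as well.

```scala
// Software version of the 32-iteration shift-and-add loop.
// ShiftAddModel is an illustrative name, not part of the repository.
object ShiftAddModel {
  def multiply(a: Int, b: Int): Int = {
    var multiplier = a      // a1
    var multiplicand = b    // a3
    var acc = 0             // t0
    var remaining = 32      // t1
    while (remaining > 0) {
      if ((multiplier & 1) != 0) acc += multiplicand   // add t0, t0, a3
      multiplier >>= 1                                 // srai a1, a1, 1
      multiplicand <<= 1                               // slli a3, a3, 1
      remaining -= 1
    }
    acc
  }
}
```

ShiftAddModel.multiply(6, 7) returns 42, the value the test expects.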

MainTest.scala

import chisel3._
import chiseltest._
import org.scalatest.FreeSpec

// Each test loads a compiled program, runs it for a fixed number of cycles,
// and checks the value on the pipeline's output port.
class TOPTest extends FreeSpec with ChiselScalatestTester {
  "argmax test" in {
    test(new PIPELINE("/home/mi2s/FProject/compilation/argmax.txt")) { x =>
      x.clock.step(69)
      x.io.out.expect(1.S)
    }
  }

  "clz test" in {
    test(new PIPELINE("/home/mi2s/FProject/compilation/clz.txt")) { x =>
      x.clock.step(200)
      x.io.out.expect(15.S)
    }
  }

  "fabsf test" in {
    test(new PIPELINE("/home/mi2s/FProject/test_compilation/fabsf.txt")) { x =>
      x.clock.step(200)
      x.io.out.expect(2147483647.S)
    }
  }

  "fp16_to_32 test" in {
    test(new PIPELINE("/home/mi2s/FProject/test_compilation/fp16_to_32.txt")) { x =>
      x.clock.step(107)
      x.io.out.expect(-8192.S)
    }
  }

  "multiply test" in {
    test(new PIPELINE("/home/mi2s/FProject/compilation/multiply.txt")) { x =>
      x.clock.step(370)
      x.io.out.expect(42.S)
    }
  }
}

Test Result

[info] TOPTest:
[info] - argmax test
[info] - clz test
[info] - fabsf test
[info] - fp16_to_32 test
[info] - multiply test
[info] Run completed in 4 seconds, 621 milliseconds.
[info] Total number of tests run: 6
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 8 s, completed Jan 23, 2025, 6:23:41 PM

D. Chisel Tutorial

Construct RISC-V CPU
- sbt test
  
  To save the execution history as a file, use sbt test > <filename.txt>.

E. RISC-V Compilation

Compiler Environment Setup
- git clone https://github.com/riscv/riscv-gnu-toolchain
- cd riscv-gnu-toolchain
- sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ninja-build
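- ./configure --prefix=/opt/riscv
- make
  
  A plain make builds the Newlib toolchain (riscv64-unknown-elf-*) that the commands below use; make linux would instead build the riscv64-unknown-linux-gnu-* toolchain. The install prefix /opt/riscv is only an example.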
Program Compilation
1. Conversion (*.s to *.elf)
  - Command: riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib -o <out_name>.elf <in_name>.s
    
    For RISC-V programs utilizing the M-extension, change to -march=rv32im.
2. Conversion (*.elf to *.bin)
  - Command: riscv64-unknown-elf-objcopy -O binary <in_name>.elf <out_name>.bin
3. Conversion (*.elf to *.hex)
  - Command: riscv64-unknown-elf-objcopy -O verilog <in_name>.elf <out_name>.hex
    
    The generated .hex file must be post-processed before it can be loaded: it lists the program byte by byte in little-endian order and contains special characters (address markers beginning with @) and whitespace.
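
The post-processing step could look like the following plain-Scala sketch. It rests on two assumptions that the text does not confirm: the instruction memory is loaded from a file with one 32-bit hexadecimal word per line, and the program occupies one contiguous region, so the @address markers can simply be dropped. VerilogHexToWords and toWords are hypothetical names.

```scala
// Converts the text produced by "objcopy -O verilog" into 32-bit words.
// VerilogHexToWords is an illustrative name; the output format is an assumption.
object VerilogHexToWords {
  def toWords(lines: Seq[String]): Seq[String] = {
    val bytes = lines
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("@"))   // drop address markers
      .flatMap(_.split("\\s+"))                        // one token per byte
      .map(Integer.parseInt(_, 16))

    bytes.grouped(4).map { group =>
      // Little-endian: byte 0 is the least significant byte of the word.
      val word = group.padTo(4, 0).zipWithIndex
        .map { case (b, i) => b << (8 * i) }
        .reduce(_ | _)
      f"$word%08x"
    }.toSeq
  }
}
```

For example, the byte sequence 13 5A 8A 40 becomes the word 408a5a13, which is the encoding of srai s4, s4, 8.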