
Construct RISC-V in Chisel

蕭郁霖, 徐向廷

A. Repository Study

5-Stage-RV32I

1. Basic Components

1.1 Register File

Filepath: src/main/scala/Pipeline/UNits/RegisterFile.scala

    class RegisterFile extends Module {
      val io = IO(new Bundle {
        val rs1       = Input(UInt(5.W))
        val rs2       = Input(UInt(5.W))
        val reg_write = Input(Bool())
        val w_reg     = Input(UInt(5.W))
        val w_data    = Input(SInt(32.W))
        val rdata1    = Output(SInt(32.W))
        val rdata2    = Output(SInt(32.W))
      })

      val regfile = RegInit(VecInit(Seq.fill(32)(0.S(32.W))))

      io.rdata1 := Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1))
      io.rdata2 := Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2))

      when(io.reg_write && io.w_reg =/= 0.U) {
        regfile(io.w_reg) := io.w_data
      }
    }

The code snippet defines a RegisterFile module for a RISC-V pipeline with seven ports: five inputs and two outputs. As in the classic MIPS pipeline, the register file has two read ports (addressed by rs1 and rs2) and a single write port. The file is instantiated as 32 registers, all reset to 0. The outputs rdata1 and rdata2 are combinational reads selected by rs1 and rs2, with a special check so that reading register 0 always returns 0. For writes, if reg_write is asserted and the target register (w_reg) is not zero, that register is updated with w_data on the next clock edge. The following image illustrates the seven ports of the RegisterFile unit.

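The read and write rules described above can be mirrored in a small software model. This is only a plain-Scala sketch for reasoning about the behavior, not Chisel and not code from the repository; the names `RegFileModel` and `RegFile` are made up for illustration.

```scala
// Plain-Scala behavioral model of the register file (illustrative names, not from the repository).
object RegFileModel {
  final class RegFile {
    private val regs = Array.fill(32)(0)

    // Mirrors the Mux on the read ports: register 0 always reads as 0.
    def read(addr: Int): Int = if (addr == 0) 0 else regs(addr)

    // Mirrors `when(io.reg_write && io.w_reg =/= 0.U)`: writes to register 0 are dropped.
    def write(enable: Boolean, addr: Int, data: Int): Unit =
      if (enable && addr != 0) regs(addr) = data
  }
}
```

A write to register 0 or a write with the enable low leaves the state unchanged, which is the property the hardware guards with its `when` condition.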

1.2 PC

Filepath: src/main/scala/Pipeline/UNits/PC.scala

    class PC extends Module {
      val io = IO(new Bundle {
        val in  = Input(SInt(32.W))
        val out = Output(SInt(32.W))
      })

      val PC = RegInit(0.S(32.W))

      io.out := PC
      PC := io.in
    }

The code snippet implements a program counter (PC) module that maintains the current program counter value. It uses RegInit to initialize the register to 0 and updates the stored PC value with the input (io.in) at every cycle, while also exposing this value via io.out.

1.3 PC + 4

Filepath: src/main/scala/Pipeline/UNits/PC4.scala

    class PC4 extends Module {
      val io = IO(new Bundle {
        val pc  = Input(UInt(32.W))
        val out = Output(UInt(32.W))
      })

      io.out := 0.U                  // default; overridden by the assignment below
      io.out := io.pc + 4.U(32.W)    // address of the next sequential instruction
    }

The code snippet defines a PC4 module, which computes the next sequential program counter value by adding 4 (the size in bytes of one RV32I instruction) to the current PC input (io.pc). This increment is what drives sequential instruction execution in the pipeline.

1.4 JALR

Filepath: src/main/scala/Pipeline/UNits/JALR.scala

    class Jalr extends Module {
      val io = IO(new Bundle {
        val imme   = Input(UInt(32.W))
        val rdata1 = Input(UInt(32.W))
        val out    = Output(UInt(32.W))
      })

      val computedAddr = io.imme + io.rdata1

      // Align the address by masking the least significant bit (LSB) to 0
      io.out := computedAddr & "hFFFFFFFE".U
    }

The code snippet above implements the address calculation for the jump-and-link-register (JALR) instruction. The module computes the target address by adding a forwarded register value (rdata1) to an immediate offset (imme). To ensure proper alignment, it then applies a binary mask (0xFFFFFFFE), forcing the least significant bit (LSB) to 0. The aligned jump address is finally provided through io.out.
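The address calculation above comes down to one add and one mask. A minimal plain-Scala sketch (the object and function names are hypothetical, chosen only for this example) makes the arithmetic concrete:

```scala
// Software restatement of the JALR target computation (illustrative names).
object JalrModel {
  // Int addition wraps at 32 bits, like the 32-bit hardware adder.
  // 0xFFFFFFFE clears bit 0 of the sum.
  def target(rs1: Int, imm: Int): Int = (rs1 + imm) & 0xFFFFFFFE
}
```

For instance, a base of 0x1000 with an offset of 7 gives 0x1007 before masking and 0x1006 after it.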

1.5 Imm-Generator

Filepath: src/main/scala/Pipeline/UNits/ImmGenerator.scala

    class ImmGenerator extends Module {
      val io = IO(new Bundle {
        val instr   = Input(UInt(32.W))
        val pc      = Input(UInt(32.W))
        val I_type  = Output(SInt(32.W))
        val S_type  = Output(SInt(32.W))
        val SB_type = Output(SInt(32.W))
        val U_type  = Output(SInt(32.W))
        val UJ_type = Output(SInt(32.W))
      })

      // I-Type Immediate: [31:20] sign-extended to 32 bits
      io.I_type := Cat(Fill(20, io.instr(31)), io.instr(31, 20)).asSInt

      // S-Type Immediate: [31:25][11:7] sign-extended to 32 bits
      io.S_type := Cat(Fill(20, io.instr(31)), io.instr(31, 25), io.instr(11, 7)).asSInt

      // Branch-Type Immediate: [31][7][30:25][11:8] sign-extended to 32 bits
      val sbImm = Cat(Fill(19, io.instr(31)), io.instr(31), io.instr(7),
                      io.instr(30, 25), io.instr(11, 8), 0.U(1.W)).asSInt
      io.SB_type := sbImm + io.pc.asSInt

      // U-Type Immediate: [31:12] shifted left by 12 bits
      io.U_type := Cat(io.instr(31, 12), Fill(12, 0.U)).asSInt

      // UJ-Type Immediate: [31][19:12][20][30:21] sign-extended to 32 bits, shifted left by 1 bit
      val ujImm = Cat(Fill(11, io.instr(31)), io.instr(31), io.instr(19, 12),
                      io.instr(20), io.instr(30, 21), 0.U(1.W)).asSInt
      io.UJ_type := ujImm + io.pc.asSInt
    }

The code snippet implements the generation of 32-bit immediate values from RISC-V instructions, tailored to each instruction format. For I-type instructions, it extracts bits [31:20] from the instruction and sign-extends them to 32 bits. In the case of S-type instructions, the immediate is formed by concatenating bits [31:25] with bits [11:7] and then sign-extending the result. For branch (SB-type) instructions, the immediate is built by concatenating several segments—bit 31, bit 7, bits [30:25], and bits [11:8]—with an additional 0 appended as the least significant bit for proper alignment, followed by sign extension. For U-type instructions, the immediate is taken from bits [31:12] and shifted left by 12 bits. Finally, for UJ-type instructions, the immediate is generated by concatenating bit 31, bits [19:12], bit 20, and bits [30:21], appending a trailing 0, and then sign-extending the result to 32 bits.

Additionally, the module computes target addresses for control flow instructions using these immediates. The output io.SB_type represents the branch target address for SB-type instructions, obtained by adding the sign-extended branch immediate to the current program counter (PC), thus yielding a PC-relative address for branch operations. Similarly, io.UJ_type provides the target address for UJ-type (jump) instructions by adding the corresponding immediate value to the current PC. These computed addresses are essential for correctly directing the control flow during instruction execution in the RISC-V pipeline.
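To check the bit slicing by hand, the same extraction can be written as ordinary integer arithmetic. The following is a plain-Scala sketch rather than the module itself; `ImmModel` and its helper names are assumptions made for this example, and the instruction words in the usage note were encoded by hand from the RV32I formats.

```scala
// Software model of the immediate extraction (illustrative names, not from the repository).
object ImmModel {
  // Bits hi..lo (inclusive) of a 32-bit word, right-aligned.
  private def bits(x: Int, hi: Int, lo: Int): Int =
    (x >>> lo) & ((1 << (hi - lo + 1)) - 1)

  // Sign-extend the low `width` bits of x to 32 bits.
  private def sext(x: Int, width: Int): Int =
    (x << (32 - width)) >> (32 - width)

  def iType(i: Int): Int = sext(bits(i, 31, 20), 12)
  def sType(i: Int): Int = sext((bits(i, 31, 25) << 5) | bits(i, 11, 7), 12)
  def bType(i: Int): Int =
    sext((bits(i, 31, 31) << 12) | (bits(i, 7, 7) << 11) |
         (bits(i, 30, 25) << 5) | (bits(i, 11, 8) << 1), 13)
  def uType(i: Int): Int = i & 0xFFFFF000
  def jType(i: Int): Int =
    sext((bits(i, 31, 31) << 20) | (bits(i, 19, 12) << 12) |
         (bits(i, 20, 20) << 11) | (bits(i, 30, 21) << 1), 21)

  // Like io.SB_type and io.UJ_type, the targets are PC-relative.
  def branchTarget(pc: Int, i: Int): Int = pc + bType(i)
  def jumpTarget(pc: Int, i: Int): Int = pc + jType(i)
}
```

As a usage example, `0xFFF00093` (`addi x1, x0, -1`) yields an I-type immediate of -1, and `0xFE000EE3` (`beq x0, x0, -4`) at PC 0x100 yields a branch target of 0xFC.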

1.6 Control Unit

Filepath: src/main/scala/Pipeline/UNits/control.scala

    class Control extends Module {
      val io = IO(new Bundle {
        val opcode        = Input(UInt(7.W))   // 7-bit opcode
        val mem_write     = Output(Bool())     // write to data memory
        val branch        = Output(Bool())     // instruction is a conditional branch
        val mem_read      = Output(Bool())     // read from data memory
        val reg_write     = Output(Bool())     // write to the register file
        val men_to_reg    = Output(Bool())     // written-back value comes from memory (loads)
        val alu_operation = Output(UInt(3.W))
        val operand_A     = Output(UInt(2.W))  // operand A source selection for the ALU
        val operand_B     = Output(Bool())     // operand B source selection for the ALU
        val extend        = Output(UInt(2.W))  // type of extension to use (e.g., sign-extend, zero-extend)
        val next_pc_sel   = Output(UInt(2.W))  // next PC value (e.g., PC+4, branch target, jump target)
      })

      // Defaults, used for any opcode not listed below
      io.mem_write     := 0.B
      io.branch        := 0.B
      io.mem_read      := 0.B
      io.reg_write     := 0.B
      io.men_to_reg    := 0.B
      io.alu_operation := 0.U
      io.operand_A     := 0.U
      io.operand_B     := 0.B
      io.extend        := 0.U
      io.next_pc_sel   := 0.U

      switch(io.opcode) {
        // R type instructions (e.g., add, sub)
        is(51.U) {
          io.mem_write     := 0.B
          io.branch        := 0.B
          io.mem_read      := 0.B
          io.reg_write     := 1.B
          io.men_to_reg    := 0.B
          io.alu_operation := 0.U
          io.operand_A     := 0.U
          io.operand_B     := 0.B
          io.extend        := 0.U
          io.next_pc_sel   := 0.U
        }
        // I type instructions (e.g., immediate operations)
        is(19.U) {
          io.mem_write     := 0.B
          io.branch        := 0.B
          io.mem_read      := 0.B
          io.reg_write     := 1.B
          io.men_to_reg    := 0.B
          io.alu_operation := 1.U
          io.operand_A     := 0.U
          io.operand_B     := 1.B
          io.extend        := 0.U
          io.next_pc_sel   := 0.U
        }
        // S type instructions (e.g., store operations)
        is(35.U) {
          io.mem_write     := 1.B
          io.branch        := 0.B
          io.mem_read      := 0.B
          io.reg_write     := 0.B
          io.men_to_reg    := 0.B
          io.alu_operation := 5.U
          io.operand_A     := 0.U
          io.operand_B     := 1.B
          io.extend        := 1.U
          io.next_pc_sel   := 0.U
        }
        // Load instructions (e.g., load data from memory)
        is(3.U) {
          io.mem_write     := 0.B
          io.branch        := 0.B
          io.mem_read      := 1.B
          io.reg_write     := 1.B
          io.men_to_reg    := 1.B
          io.alu_operation := 4.U
          io.operand_A     := 0.U
          io.operand_B     := 1.B
          io.extend        := 0.U
          io.next_pc_sel   := 0.U
        }
        // SB type instructions (e.g., conditional branch)
        is(99.U) {
          io.mem_write     := 0.B
          io.branch        := 1.B
          io.mem_read      := 0.B
          io.reg_write     := 0.B
          io.men_to_reg    := 0.B
          io.alu_operation := 2.U
          io.operand_A     := 0.U
          io.operand_B     := 0.B
          io.extend        := 0.U
          io.next_pc_sel   := 1.U
        }
        // UJ type instructions (e.g., jump and link)
        is(111.U) {
          io.mem_write     := 0.B
          io.branch        := 0.B
          io.mem_read      := 0.B
          io.reg_write     := 1.B
          io.men_to_reg    := 0.B
          io.alu_operation := 3.U
          io.operand_A     := 1.U
          io.operand_B     := 0.B
          io.extend        := 0.U
          io.next_pc_sel   := 2.U
        }
        // Jalr instruction (e.g., jump and link register)
        is(103.U) {
          io.mem_write     := 0.B
          io.branch        := 0.B
          io.mem_read      := 0.B
          io.reg_write     := 1.B
          io.men_to_reg    := 0.B
          io.alu_operation := 3.U
          io.operand_A     := 1.U
          io.operand_B     := 0.B
          io.extend        := 0.U
          io.next_pc_sel   := 3.U
        }
        // U type (LUI) instructions (e.g., load upper immediate)
        is(55.U) {
          io.mem_write     := 0.B
          io.branch        := 0.B
          io.mem_read      := 0.B
          io.reg_write     := 1.B
          io.men_to_reg    := 0.B
          io.alu_operation := 6.U
          io.operand_A     := 3.U
          io.operand_B     := 1.B
          io.extend        := 2.U
          io.next_pc_sel   := 0.U
        }
        // U type (AUIPC) instructions (e.g., add immediate to PC)
        is(23.U) {
          io.mem_write     := 0.B
          io.branch        := 0.B
          io.mem_read      := 0.B
          io.reg_write     := 1.B
          io.men_to_reg    := 0.B
          io.alu_operation := 7.U
          io.operand_A     := 2.U
          io.operand_B     := 1.B
          io.extend        := 2.U
          io.next_pc_sel   := 0.U
        }
      }
    }

The code snippet above implements the control unit for a 5-stage RISC-V pipeline. This module generates a suite of control signals—such as memory write, branch, memory read, register write, memory-to-register, ALU operation, operand selection, extension type, and next PC selection—that steer the processor’s datapath. Using a switch-case construct keyed on the opcode, the module assigns specific values to these signals according to the instruction type (e.g., R-type, I-type, S-type, SB-type, U-type, UJ-type, etc.). The accompanying diagram and mapping table illustrate how these signals are routed to the appropriate hardware components in the pipeline.

| Label | Signal Name (Code) | Signal Name (Diagram) |
| --- | --- | --- |
| 1 | io.mem_write | MemWrite |
| 2 | io.branch | Branch |
| 3 | io.mem_read | MemRead |
| 4 | io.reg_write | RegWrite |
| 5 | io.men_to_reg | MemtoReg |
| 6 | io.alu_operation | ALUSrc |
| 7 | io.operand_A | ALUOp1 |
| 8 | io.operand_B | ALUOp0 |
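Because every opcode sets the same ten signals, the switch statement in the Control module can also be read as a lookup table. The sketch below restates it as a plain-Scala map for quick reference; `ControlModel`, `Ctrl`, and the field names are hypothetical, and the values are copied from the `is(...)` cases in the module.

```scala
// Table view of the control unit (illustrative names; values taken from the Control module).
object ControlModel {
  final case class Ctrl(memWrite: Boolean, branch: Boolean, memRead: Boolean,
                        regWrite: Boolean, memToReg: Boolean, aluOp: Int,
                        opA: Int, opB: Boolean, extend: Int, nextPc: Int)

  // Same as the defaults assigned before the switch.
  val default: Ctrl = Ctrl(false, false, false, false, false, 0, 0, false, 0, 0)

  val table: Map[Int, Ctrl] = Map(
    //           memWr  branch memRd  regWr  m2r    alu A  opB    ext npc
    51  -> Ctrl(false, false, false, true,  false, 0, 0, false, 0, 0), // R type
    19  -> Ctrl(false, false, false, true,  false, 1, 0, true,  0, 0), // I type
    35  -> Ctrl(true,  false, false, false, false, 5, 0, true,  1, 0), // S type
    3   -> Ctrl(false, false, true,  true,  true,  4, 0, true,  0, 0), // loads
    99  -> Ctrl(false, true,  false, false, false, 2, 0, false, 0, 1), // SB type
    111 -> Ctrl(false, false, false, true,  false, 3, 1, false, 0, 2), // JAL
    103 -> Ctrl(false, false, false, true,  false, 3, 1, false, 0, 3), // JALR
    55  -> Ctrl(false, false, false, true,  false, 6, 3, true,  2, 0), // LUI
    23  -> Ctrl(false, false, false, true,  false, 7, 2, true,  2, 0)  // AUIPC
  )

  def decode(opcode: Int): Ctrl = table.getOrElse(opcode, default)
}
```

Reading the table column by column shows, for example, that stores and branches are the only listed opcodes that leave the register-write signal low.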

1.7 Branching Unit

Filepath: src/main/scala/Pipeline/UNits/BRANCH.scala

    class Branch extends Module {
      val io = IO(new Bundle {
        val fnct3    = Input(UInt(3.W))
        val branch   = Input(Bool())
        val arg_x    = Input(SInt(32.W))
        val arg_y    = Input(SInt(32.W))
        val br_taken = Output(Bool())
      })

      io.br_taken := false.B

      when(io.branch) {
        when(io.fnct3 === 0.U) {          // beq
          io.br_taken := io.arg_x === io.arg_y
        }.elsewhen(io.fnct3 === 1.U) {    // bne
          io.br_taken := io.arg_x =/= io.arg_y
        }.elsewhen(io.fnct3 === 4.U) {    // blt
          io.br_taken := io.arg_x < io.arg_y
        }.elsewhen(io.fnct3 === 5.U) {    // bge
          io.br_taken := io.arg_x >= io.arg_y
        }.elsewhen(io.fnct3 === 6.U) {    // bltu (unsigned less than)
          io.br_taken := io.arg_x.asUInt < io.arg_y.asUInt
        }.elsewhen(io.fnct3 === 7.U) {    // bgeu (unsigned greater than or equal)
          io.br_taken := io.arg_x.asUInt >= io.arg_y.asUInt
        }
      }
    }

The code snippet implements branch decision logic for RISC-V's conditional branch instructions—namely, beq, bne, blt, bge, bltu, and bgeu. It uses four input ports: io.fnct3, which indicates the specific branch condition based on the instruction's function field; io.branch, a Boolean flag identifying whether the current instruction is an SB-Type branch; and io.arg_x and io.arg_y, which are the operands to be compared. Based on the value of fnct3, the module evaluates the appropriate comparison between arg_x and arg_y, and if the condition is satisfied, sets the output io.br_taken to true, indicating that a branch should be taken.
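The difference between the signed and unsigned comparisons is easiest to see with a negative operand. Here is a plain-Scala sketch of the same decision table (the names `BranchModel` and `taken` are illustrative only), using `Integer.compareUnsigned` in place of the `asUInt` casts:

```scala
// Software model of the branch decision (illustrative names, not from the repository).
object BranchModel {
  def taken(branch: Boolean, funct3: Int, x: Int, y: Int): Boolean =
    branch && (funct3 match {
      case 0 => x == y                                  // beq
      case 1 => x != y                                  // bne
      case 4 => x < y                                   // blt
      case 5 => x >= y                                  // bge
      case 6 => Integer.compareUnsigned(x, y) < 0       // bltu
      case 7 => Integer.compareUnsigned(x, y) >= 0      // bgeu
      case _ => false                                   // unused funct3 values
    })
}
```

With x = -1 and y = 1, blt is taken because -1 is less than 1 as a signed value, while bltu is not taken because the same bit pattern is 0xFFFFFFFF when read as unsigned.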

1.8 ALU Control Unit

Filepath: src/main/scala/Pipeline/UNits/Alu_Control.scala

    class AluControl extends Module {
      val io = IO(new Bundle {
        val func3 = Input(UInt(3.W))
        val func7 = Input(Bool())
        val aluOp = Input(UInt(3.W))
        val out   = Output(UInt(5.W))
      })

      io.out := 0.U

      when(io.aluOp === 0.U) {            // R type
        io.out := Cat(0.U(2.W), io.func7, io.func3)
      }.elsewhen(io.aluOp === 1.U) {      // I type
        io.out := Cat("b00".U(2.W), io.func3)
      }.elsewhen(io.aluOp === 2.U) {      // SB type (conditional branches)
        io.out := Cat("b010".U(3.W), io.func3)
      }.elsewhen(io.aluOp === 3.U) {      // Jumps (JAL, JALR)
        io.out := "b11111".U
      }.elsewhen(io.aluOp === 4.U || io.aluOp === 5.U ||
                 io.aluOp === 6.U || io.aluOp === 7.U) {
        // Loads, S type, U type (lui), U type (auipc)
        io.out := "b00000".U
      }.otherwise {
        io.out := 0.U
      }
    }

The code snippet above implements the ALU Control Unit for a RISC-V pipeline, as illustrated in the diagram below. The unit has three input ports, func3, func7 (a single Bool), and aluOp (a signal provided by the core control unit), and one 5-bit output port, io.out. How the output is formed depends on the instruction class encoded in aluOp. R-type instructions (aluOp 0) concatenate the func7 bit with func3, and I-type instructions (aluOp 1) prepend the fixed two-bit value 00 to func3. Branches (aluOp 2) prepend 010 to func3. Jumps (aluOp 3) get the constant 11111, while loads, stores, LUI, and AUIPC (aluOp 4 to 7) get 00000, which the ALU treats as addition. Note that the R-type and branch concatenations are six bits wide, one bit more than io.out.

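The encoding can be tried out with a small plain-Scala function. This is a sketch under one assumption that should be checked against the Chisel version in use: when a six-bit `Cat(...)` is connected to the five-bit `io.out`, the extra top bit is dropped, so only the low five bits are modeled here. The names `AluControlModel` and `aluControl` are illustrative.

```scala
// Software model of the ALU control encoding (illustrative names).
// Assumption: a 6-bit value connected to the 5-bit output keeps only its low 5 bits.
object AluControlModel {
  def aluControl(aluOp: Int, func3: Int, func7: Boolean): Int = aluOp match {
    case 0 => ((if (func7) 1 else 0) << 3) | func3  // R type: func7 bit, then func3
    case 1 => func3                                 // I type: "00" then func3
    case 2 => 0x10 | func3                          // branches: low 5 bits of "010" then func3
    case 3 => 0x1F                                  // jumps: 11111
    case _ => 0                                     // aluOp 4 to 7: 00000 (addition)
  }
}
```

With these rules, an R-type `sub` (func7 set, func3 = 0) maps to 8 and `sra` (func7 set, func3 = 5) maps to 13, matching ALU_SUB and ALU_SRA in the ALU's opcode list.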

1.9 ALU Unit

Filepath: src/main/scala/Pipeline/UNits/Alu.scala

    object AluOpCode {
      val ALU_ADD   = 0.U(5.W)
      val ALU_ADDI  = 0.U(5.W)
      val ALU_SW    = 0.U(5.W)
      val ALU_LW    = 0.U(5.W)
      val ALU_LUI   = 0.U(5.W)
      val ALU_AUIPC = 0.U(5.W)
      val ALU_SLL   = 1.U(5.W)
      val ALU_SLLI  = 1.U(5.W)
      val ALU_SLT   = 2.U(5.W)
      val ALU_SLTI  = 2.U(5.W)
      val ALU_SLTU  = 3.U(5.W)
      val ALU_SLTUI = 3.U(5.W)
      val ALU_XOR   = 4.U(5.W)
      val ALU_XORI  = 4.U(5.W)
      val ALU_SRL   = 5.U(5.W)
      val ALU_SRLI  = 5.U(5.W)
      val ALU_OR    = 6.U(5.W)
      val ALU_ORI   = 6.U(5.W)
      val ALU_AND   = 7.U(5.W)
      val ALU_ANDI  = 7.U(5.W)
      val ALU_SUB   = 8.U(5.W)
      val ALU_SRA   = 13.U(5.W)
      val ALU_SRAI  = 13.U(5.W)
      val ALU_JAL   = 31.U(5.W)
      val ALU_JALR  = 31.U(5.W)
    }

    import AluOpCode._  // needed so the ALU can refer to the opcode names directly

    class ALU extends Module {
      val io = IO(new Bundle {
        val in_A   = Input(SInt(32.W))
        val in_B   = Input(SInt(32.W))
        val alu_Op = Input(UInt(5.W))
        val out    = Output(SInt(32.W))
      })

      val result = WireDefault(0.S(32.W))

      switch(io.alu_Op) {
        is(ALU_ADD, ALU_ADDI, ALU_SW, ALU_LW, ALU_LUI, ALU_AUIPC) {
          result := io.in_A + io.in_B
        }
        is(ALU_SLL, ALU_SLLI) {
          result := (io.in_A.asUInt << io.in_B(4, 0)).asSInt
        }
        is(ALU_SLT, ALU_SLTI) {
          result := Mux(io.in_A < io.in_B, 1.S, 0.S)
        }
        is(ALU_SLTU, ALU_SLTUI) {
          result := Mux(io.in_A.asUInt < io.in_B.asUInt, 1.S, 0.S)
        }
        is(ALU_XOR, ALU_XORI) {
          result := io.in_A ^ io.in_B
        }
        is(ALU_SRL, ALU_SRLI) {
          result := (io.in_A.asUInt >> io.in_B(4, 0)).asSInt
        }
        is(ALU_OR, ALU_ORI) {
          result := io.in_A | io.in_B
        }
        is(ALU_AND, ALU_ANDI) {
          result := io.in_A & io.in_B
        }
        is(ALU_SUB) {
          result := io.in_A - io.in_B
        }
        is(ALU_SRA, ALU_SRAI) {
          result := (io.in_A >> io.in_B(4, 0)).asSInt
        }
        is(ALU_JAL, ALU_JALR) {
          result := io.in_A
        }
      }

      io.out := result
    }

The code snippet implements the ALU unit for a RISC-V pipeline, responsible for executing various arithmetic and logical operations based on the instruction type. The module accepts three input ports: two operands (io.in_A and io.in_B) and an operation code (io.alu_Op) coming from the ALU Control Unit. The result of the computation is output via io.out. For example, when io.alu_Op is set to ALU_ADD or ALU_ADDI (among other similar opcodes for load/store and immediate operations), the module computes the sum of io.in_A and io.in_B and assigns the result to io.out.
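The operations in the ALU module map directly onto 32-bit integer arithmetic, which makes it easy to predict results before simulating. The following plain-Scala sketch mirrors the switch cases by opcode value; `AluModel` and `alu` are hypothetical names used only here.

```scala
// Software model of the ALU operations, keyed by the 5-bit opcode value (illustrative names).
object AluModel {
  def alu(op: Int, a: Int, b: Int): Int = {
    val sh = b & 0x1F // shift amount, like io.in_B(4, 0)
    op match {
      case 0  => a + b                                          // add, addi, loads, stores, lui, auipc
      case 1  => a << sh                                        // sll
      case 2  => if (a < b) 1 else 0                            // slt (signed)
      case 3  => if (Integer.compareUnsigned(a, b) < 0) 1 else 0 // sltu (unsigned)
      case 4  => a ^ b                                          // xor
      case 5  => a >>> sh                                       // srl (zero fill)
      case 6  => a | b                                          // or
      case 7  => a & b                                          // and
      case 8  => a - b                                          // sub
      case 13 => a >> sh                                        // sra (sign fill)
      case 31 => a                                              // jal, jalr: pass operand A through
      case _  => 0                                              // WireDefault value
    }
  }
}
```

Shifting -16 right by 2 shows why SRL and SRA need separate cases: the logical shift gives 0x3FFFFFFC, while the arithmetic shift keeps the sign and gives -4.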


2. Pipeline Registers

Since the RISC-V pipeline consists of five stages, it requires four sets of pipeline registers. These registers are encapsulated in modules labeled IF/ID, ID/EX, EX/MEM, and MEM/WB, where the slash indicates the two adjacent stages that the register bridges. These pipeline registers are painted orange in the illustration below.


2.1 IF_ID Pipeline

Filepath: src/main/scala/Pipeline/Pipelines/IF_ID.scala

    class IF_ID extends Module {
      val io = IO(new Bundle {
        val pc_in             = Input(SInt(32.W))   // PC in
        val pc4_in            = Input(UInt(32.W))   // PC4 in
        val SelectedPC        = Input(SInt(32.W))
        val SelectedInstr     = Input(UInt(32.W))
        val pc_out            = Output(SInt(32.W))  // PC out
        val pc4_out           = Output(UInt(32.W))  // PC + 4 out
        val SelectedPC_out    = Output(SInt(32.W))
        val SelectedInstr_out = Output(UInt(32.W))
      })

      val Pc_In   = RegInit(0.S(32.W))
      val Pc4_In  = RegInit(0.U(32.W))
      val S_pc    = RegInit(0.S(32.W))
      val S_instr = RegInit(0.U(32.W))

      Pc_In   := io.pc_in
      Pc4_In  := io.pc4_in
      S_pc    := io.SelectedPC
      S_instr := io.SelectedInstr

      io.pc_out            := Pc_In
      io.pc4_out           := Pc4_In
      io.SelectedPC_out    := S_pc
      io.SelectedInstr_out := S_instr

      // io.pc_out := RegNext(io.pc_in)
      // io.pc4_out := RegNext(io.pc4_in)
      // io.SelectedPC_out := RegNext(io.SelectedPC)
      // io.SelectedInstr_out := RegNext(io.SelectedInstr)
    }

Although the illustration above shows only three register ports at IF/ID, the design also accounts for hazard detection (discussed later). In this context, the SelectedPC signal represents the program counter after hazard resolution. Consequently, the IF/ID pipeline register stores four values: io.pc_in, io.pc4_in, io.SelectedPC, and io.SelectedInstr. Each is held in a register created with RegInit, which gives it a reset value of 0. The commented-out lines at the end show an alternative RegNext formulation, which would omit the reset values.

2.2 ID_EX Pipeline

Filepath: src/main/scala/Pipeline/Pipelines/ID_EX.scala

    class ID_EX extends Module {
      val io = IO(new Bundle {
        val rs1_in           = Input(UInt(5.W))
        val rs2_in           = Input(UInt(5.W))
        val rs1_data_in      = Input(SInt(32.W))
        val rs2_data_in      = Input(SInt(32.W))
        val imm              = Input(SInt(32.W))
        val rd_in            = Input(UInt(5.W))
        val func3_in         = Input(UInt(3.W))
        val func7_in         = Input(Bool())
        val ctrl_MemWr_in    = Input(Bool())
        val ctrl_Branch_in   = Input(Bool())
        val ctrl_MemRd_in    = Input(Bool())
        val ctrl_Reg_W_in    = Input(Bool())
        val ctrl_MemToReg_in = Input(Bool())
        val ctrl_AluOp_in    = Input(UInt(3.W))
        val ctrl_OpA_in      = Input(UInt(2.W))
        val ctrl_OpB_in      = Input(Bool())
        val ctrl_nextpc_in   = Input(UInt(2.W))
        val IFID_pc4_in      = Input(UInt(32.W))

        val rs1_out           = Output(UInt(5.W))
        val rs2_out           = Output(UInt(5.W))
        val rs1_data_out      = Output(SInt(32.W))
        val rs2_data_out      = Output(SInt(32.W))
        val rd_out            = Output(UInt(5.W))
        val imm_out           = Output(SInt(32.W))
        val func3_out         = Output(UInt(3.W))
        val func7_out         = Output(Bool())
        val ctrl_MemWr_out    = Output(Bool())
        val ctrl_Branch_out   = Output(Bool())
        val ctrl_MemRd_out    = Output(Bool())
        val ctrl_Reg_W_out    = Output(Bool())
        val ctrl_MemToReg_out = Output(Bool())
        val ctrl_AluOp_out    = Output(UInt(3.W))
        val ctrl_OpA_out      = Output(UInt(2.W))
        val ctrl_OpB_out      = Output(Bool())
        val ctrl_nextpc_out   = Output(UInt(2.W))
        val IFID_pc4_out      = Output(UInt(32.W))
      })

      io.rs1_out           := RegNext(io.rs1_in)
      io.rs2_out           := RegNext(io.rs2_in)
      io.rs1_data_out      := RegNext(io.rs1_data_in)
      io.rs2_data_out      := RegNext(io.rs2_data_in)
      io.imm_out           := RegNext(io.imm)
      io.rd_out            := RegNext(io.rd_in)
      io.func3_out         := RegNext(io.func3_in)
      io.func7_out         := RegNext(io.func7_in)
      io.ctrl_MemWr_out    := RegNext(io.ctrl_MemWr_in)
      io.ctrl_Branch_out   := RegNext(io.ctrl_Branch_in)
      io.ctrl_MemRd_out    := RegNext(io.ctrl_MemRd_in)
      io.ctrl_Reg_W_out    := RegNext(io.ctrl_Reg_W_in)
      io.ctrl_MemToReg_out := RegNext(io.ctrl_MemToReg_in)
      io.ctrl_AluOp_out    := RegNext(io.ctrl_AluOp_in)
      io.ctrl_OpA_out      := RegNext(io.ctrl_OpA_in)
      io.ctrl_OpB_out      := RegNext(io.ctrl_OpB_in)
      io.ctrl_nextpc_out   := RegNext(io.ctrl_nextpc_in)
      io.IFID_pc4_out      := RegNext(io.IFID_pc4_in)
    }

The code snippet implements the ID/EX pipeline register, which captures and stores several critical values for the subsequent execution stage. In particular, it holds the operand data (rs1_data and rs2_data), the incremented program counter (IFID_pc4), and the immediate value (imm).

Additionally, it preserves nine control signals generated during instruction decode, ensuring proper propagation through the multi-stage pipeline. Register addresses and function fields such as rs1, rs2, rd, func3, and func7 are also stored to support data forwarding in the event of hazards.

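RegNext is used here instead of RegInit: RegNext(x) creates a register that captures x on every clock edge and presents it one cycle later, which is all a pipeline register needs. Because no initial value is given, these registers have no reset value, so their outputs are undefined until the first clock edge. Chisel also offers RegNext(x, init) when a defined reset state is needed, for example to keep control signals deasserted after reset.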

2.3 EX_MEM Pipeline

Filepath: src/main/scala/Pipeline/Pipelines/EX_MEM.scala

    class EX_MEM extends Module {
      val io = IO(new Bundle {
        val IDEX_MEMRD    = Input(Bool())
        val IDEX_MEMWR    = Input(Bool())
        val IDEX_MEMTOREG = Input(Bool())
        val IDEX_REG_W    = Input(Bool())
        val IDEX_rs2      = Input(SInt(32.W))
        val IDEX_rd       = Input(UInt(5.W))
        val alu_out       = Input(SInt(32.W))

        val EXMEM_memRd_out    = Output(Bool())
        val EXMEM_memWr_out    = Output(Bool())
        val EXMEM_memToReg_out = Output(Bool())
        val EXMEM_reg_w_out    = Output(Bool())
        val EXMEM_rs2_out      = Output(SInt(32.W))
        val EXMEM_rd_out       = Output(UInt(5.W))
        val EXMEM_alu_out      = Output(SInt(32.W))
      })

      io.EXMEM_memRd_out    := RegNext(io.IDEX_MEMRD)
      io.EXMEM_memWr_out    := RegNext(io.IDEX_MEMWR)
      io.EXMEM_memToReg_out := RegNext(io.IDEX_MEMTOREG)
      io.EXMEM_reg_w_out    := RegNext(io.IDEX_REG_W)
      io.EXMEM_rs2_out      := RegNext(io.IDEX_rs2)
      io.EXMEM_rd_out       := RegNext(io.IDEX_rd)
      io.EXMEM_alu_out      := RegNext(io.alu_out)
    }

The code snippet above implements the EX/MEM pipeline register, which transfers data and control signals from the execution stage (EX) to the memory stage (MEM). The control signals memRd, memWr, and memToReg are preserved to drive the memory access and the later write-back selection. The ALU result (alu_out) and the rs2 register value (EXMEM_rs2_out, presumably the data to be stored) are carried along, together with reg_w_out and rd_out, which the forwarding logic uses to detect hazards in later stages.

2.4 MEM_WB Pipeline

Filepath: src/main/scala/Pipeline/Pipelines/MEM_WB.scala

    class MEM_WB extends Module {
      val io = IO(new Bundle {
        val EXMEM_MEMTOREG = Input(Bool())
        val EXMEM_REG_W    = Input(Bool())
        val EXMEM_MEMRD    = Input(Bool())
        val EXMEM_rd       = Input(UInt(5.W))
        val in_dataMem_out = Input(SInt(32.W))
        val in_alu_out     = Input(SInt(32.W))

        val MEMWB_memToReg_out = Output(Bool())
        val MEMWB_reg_w_out    = Output(Bool())
        val MEMWB_memRd_out    = Output(Bool())
        val MEMWB_rd_out       = Output(UInt(5.W))
        val MEMWB_dataMem_out  = Output(SInt(32.W))
        val MEMWB_alu_out      = Output(SInt(32.W))
      })

      io.MEMWB_memToReg_out := RegNext(io.EXMEM_MEMTOREG)
      io.MEMWB_reg_w_out    := RegNext(io.EXMEM_REG_W)
      io.MEMWB_memRd_out    := RegNext(io.EXMEM_MEMRD)
      io.MEMWB_rd_out       := RegNext(io.EXMEM_rd)
      io.MEMWB_dataMem_out  := RegNext(io.in_dataMem_out)
      io.MEMWB_alu_out      := RegNext(io.in_alu_out)
    }

The code snippet above implements the MEM/WB pipeline registers, which transfer essential data from the memory stage (MEM) to the write-back stage (WB). Specifically, this module preserves control signals such as memToReg, reg_w, and memRd, as well as key data values including the destination register (rd), data from memory (dataMem), and the ALU output (alu).


3. Memory Units

In the RISC-V pipeline, two distinct memory units are employed: instruction memory and data memory. The repository implements these as separate modules, each tailored to its specific role in the processor's operation.

3.1 Inst-Memory

Filepath: src/main/scala/Pipeline/Memory/InstMem.scala

    import chisel3.util.experimental.loadMemoryFromFile  // provides loadMemoryFromFile

    class InstMem(initFile: String) extends Module {
      val io = IO(new Bundle {
        val addr = Input(UInt(32.W))   // Address input to fetch instruction
        val data = Output(UInt(32.W))  // Output instruction
      })

      val imem = Mem(1024, UInt(32.W))
      loadMemoryFromFile(imem, initFile)

      io.data := imem(io.addr / 4.U)   // byte address -> word index
    }

The code snippet implements the instruction memory module for the RISC-V pipeline. This module has one 32-bit address input (io.addr) used to fetch instructions and one 32-bit data output (io.data) that delivers the corresponding instruction. The memory is instantiated with Mem(1024, UInt(32.W)), which creates an array of 1024 entries, each holding one 32-bit instruction. The initFile parameter names the file from which the initial contents are loaded, and loadMemoryFromFile populates the memory with these values. Finally, the module divides the byte address by 4 to obtain a word index, since each entry holds one 4-byte instruction; the 1024 entries therefore cover 4 KiB of instruction memory.
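Two small helpers illustrate the addressing and the initialization file. This is a plain-Scala sketch with hypothetical names (`InstMemModel`, `wordIndex`, `toHexLines`); it also assumes that the init file uses the default hexadecimal format with one 32-bit word per line, which should be verified against the Chisel version in use.

```scala
// Helpers for reasoning about InstMem (illustrative names, not from the repository).
object InstMemModel {
  // Byte address -> index into the word-organized memory, like io.addr / 4.U.
  def wordIndex(byteAddr: Long): Long = byteAddr / 4

  // Assumed init-file layout: one 32-bit word per line, as 8 hex digits.
  def toHexLines(words: Seq[Int]): Seq[String] = words.map(w => f"$w%08x")
}
```

For example, the instruction at byte address 8 lives in entry 2, and the words 0x00000463 and 0xFFF00093 would be written as the lines `00000463` and `fff00093`.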

3.2 Data-Memory

Filepath: src/main/scala/Pipeline/Memory/DataMemory.scala

    class DataMemory extends Module {
      val io = IO(new Bundle {
        val addr      = Input(UInt(32.W))   // Address input
        val dataIn    = Input(SInt(32.W))   // Data to be written
        val mem_read  = Input(Bool())       // Memory read enable
        val mem_write = Input(Bool())       // Memory write enable
        val dataOut   = Output(SInt(32.W))  // Data output
      })

      val Dmemory = Mem(1024, SInt(32.W))

      io.dataOut := 0.S

      when(io.mem_write) {
        Dmemory.write(io.addr, io.dataIn)
      }

      when(io.mem_read) {
        io.dataOut := Dmemory.read(io.addr)
      }
    }

The code snippet implements the data memory unit for the RISC-V pipeline. This module has four input ports (io.addr, io.dataIn, io.mem_read, and io.mem_write) and one output port, io.dataOut. It instantiates a memory array with 1024 entries, where each entry is a 32-bit word. When the control signal io.mem_write is asserted, the module writes the data from io.dataIn into the memory at the address given by io.addr. When io.mem_read is asserted, the module reads the word stored at io.addr and outputs it via io.dataOut; otherwise io.dataOut is driven to 0. Unlike InstMem, this module uses the address directly as the memory index, without dividing by 4.


4. Hazard Units

4.1 Structural Hazard

Filepath: src/main/scala/Pipeline/Hazard Units/StructuralHazard.scala

    class StructuralHazard extends Module {
      val io = IO(new Bundle {
        val rs1          = Input(UInt(5.W))
        val rs2          = Input(UInt(5.W))
        val MEM_WB_regWr = Input(Bool())
        val MEM_WB_Rd    = Input(UInt(5.W))
        val fwd_rs1      = Output(Bool())
        val fwd_rs2      = Output(Bool())
      })

      // Determine if forwarding is needed for rs1
      when(io.MEM_WB_regWr && io.MEM_WB_Rd === io.rs1) {
        io.fwd_rs1 := true.B
      }.otherwise {
        io.fwd_rs1 := false.B
      }

      // Determine if forwarding is needed for rs2
      when(io.MEM_WB_regWr && io.MEM_WB_Rd === io.rs2) {
        io.fwd_rs2 := true.B
      }.otherwise {
        io.fwd_rs2 := false.B
      }
    }

The code snippet implements the structural hazard resolution mechanism for the RISC-V pipeline. This module has four input ports (rs1, rs2, MEM_WB_regWr, and MEM_WB_Rd) and two output ports (fwd_rs1 and fwd_rs2). It checks whether the destination register in the MEM/WB stage (MEM_WB_Rd) matches either source register (rs1 or rs2) while write-back is enabled (MEM_WB_regWr is asserted). If a match is detected, the corresponding forwarding signal (fwd_rs1 or fwd_rs2) is set to true; otherwise it is false. This presumably covers the case where an instruction reads a register in the same cycle in which the write-back stage writes it: the register file only updates on the clock edge, so a plain read would still return the old value. Note that, unlike the Forwarding unit, this check does not exclude register 0.
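The match condition and one plausible use of its result can be sketched in plain Scala. The `fwd` function restates the condition in the module; the `operand` function is an assumption about how the surrounding pipeline might consume the flag and is not taken from the repository. All names are illustrative.

```scala
// Software model of the same-cycle read/write check (illustrative names).
object StructuralHazardModel {
  // Mirrors `io.MEM_WB_regWr && io.MEM_WB_Rd === io.rsN`.
  def fwd(memWbRegWr: Boolean, memWbRd: Int, rs: Int): Boolean =
    memWbRegWr && memWbRd == rs

  // Assumed consumer: take the value being written back instead of the stale register read.
  def operand(fwdFlag: Boolean, wbData: Int, regData: Int): Int =
    if (fwdFlag) wbData else regData
}
```

If write-back targets x5 with the value 42 while the decoding instruction reads x5, the flag is set and the operand becomes 42 rather than the old register contents.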

4.2 Hazard Detection

Filepath: src/main/scala/Pipeline/Hazard Units/HazardDetection.scala

    class HazardDetection extends Module {
      val io = IO(new Bundle {
        val IF_ID_inst     = Input(UInt(32.W))
        val ID_EX_memRead  = Input(Bool())
        val ID_EX_rd       = Input(UInt(5.W))
        val pc_in          = Input(SInt(32.W))
        val current_pc     = Input(SInt(32.W))
        val inst_forward   = Output(Bool())
        val pc_forward     = Output(Bool())
        val ctrl_forward   = Output(Bool())
        val inst_out       = Output(UInt(32.W))
        val pc_out         = Output(SInt(32.W))
        val current_pc_out = Output(SInt(32.W))
      })

      val Rs1 = io.IF_ID_inst(19, 15)
      val Rs2 = io.IF_ID_inst(24, 20)

      when(io.ID_EX_memRead === 1.B && ((io.ID_EX_rd === Rs1) || (io.ID_EX_rd === Rs2))) {
        io.inst_forward := true.B
        io.pc_forward   := true.B
        io.ctrl_forward := true.B
      }.otherwise {
        io.inst_forward := false.B
        io.pc_forward   := false.B
        io.ctrl_forward := false.B
      }

      io.inst_out       := io.IF_ID_inst
      io.pc_out         := io.pc_in
      io.current_pc_out := io.current_pc
    }

The code snippet implements the hazard detection mechanism, which monitors the pipeline for load-use data hazards. When the instruction in the ID/EX stage is a memory read (io.ID_EX_memRead is true) and its destination register (io.ID_EX_rd) matches either source register of the instruction in IF/ID (Rs1 or Rs2, extracted from io.IF_ID_inst), the module asserts three signals: inst_forward, pc_forward, and ctrl_forward. This is the classic load-use condition, in which the loaded value is not yet available to forward, so the dependent instruction has to wait one cycle. Despite their names, the three signals therefore most likely tell the rest of the pipeline to hold the instruction and PC and to neutralize the control signals (a stall), rather than to forward data. Otherwise, all three signals are false. The module also passes io.IF_ID_inst, io.pc_in, and io.current_pc through to io.inst_out, io.pc_out, and io.current_pc_out unchanged, so that the instruction and PC values remain available to the pipeline.
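The condition inside the `when` block can be exercised on a concrete instruction word. The plain-Scala sketch below uses hypothetical names (`LoadUseModel`, `hazard`) and extracts the source register fields from the same bit positions as the module; the example word `0x002081B3` is a hand-encoded `add x3, x1, x2`.

```scala
// Software model of the load-use check (illustrative names, not from the repository).
object LoadUseModel {
  def rs1(inst: Int): Int = (inst >>> 15) & 0x1F  // bits [19:15]
  def rs2(inst: Int): Int = (inst >>> 20) & 0x1F  // bits [24:20]

  // True when the instruction in ID/EX is a load whose destination
  // is a source register of the instruction in IF/ID.
  def hazard(idExMemRead: Boolean, idExRd: Int, ifIdInst: Int): Boolean =
    idExMemRead && (idExRd == rs1(ifIdInst) || idExRd == rs2(ifIdInst))
}
```

If a load into x1 is followed directly by `add x3, x1, x2`, the check fires; a load into x4, or a non-load instruction writing x1, does not trigger it.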

4.3 Forwarding Unit

Filepath: src/main/scala/Pipeline/Hazard Units/Forwarding.scala

    class Forwarding extends Module {
      val io = IO(new Bundle {
        val IDEX_rs1    = Input(UInt(5.W))
        val IDEX_rs2    = Input(UInt(5.W))
        val EXMEM_rd    = Input(UInt(5.W))
        val EXMEM_regWr = Input(UInt(1.W))
        val MEMWB_rd    = Input(UInt(5.W))
        val MEMWB_regWr = Input(UInt(1.W))
        val forward_a   = Output(UInt(2.W))
        val forward_b   = Output(UInt(2.W))
      })

      io.forward_a := "b00".U
      io.forward_b := "b00".U

      // EX HAZARD
      when(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U &&
           (io.EXMEM_rd === io.IDEX_rs1.asUInt) && (io.EXMEM_rd === io.IDEX_rs2)) {
        io.forward_a := "b10".U
        io.forward_b := "b10".U
      }.elsewhen(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U &&
                 (io.EXMEM_rd === io.IDEX_rs2)) {
        io.forward_b := "b10".U
      }.elsewhen(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U &&
                 (io.EXMEM_rd === io.IDEX_rs1)) {
        io.forward_a := "b10".U
      }

      // MEM HAZARD
      when((io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) &&
           (io.MEMWB_rd === io.IDEX_rs1) && (io.MEMWB_rd === io.IDEX_rs2) &&
           ~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U &&
             (io.EXMEM_rd === io.IDEX_rs1) && (io.EXMEM_rd === io.IDEX_rs2))) {
        io.forward_a := "b01".U
        io.forward_b := "b01".U
      }.elsewhen((io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) &&
                 (io.MEMWB_rd === io.IDEX_rs2) &&
                 ~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U &&
                   (io.EXMEM_rd === io.IDEX_rs2))) {
        io.forward_b := "b01".U
      }.elsewhen((io.MEMWB_regWr === "b1".U) && (io.MEMWB_rd =/= "b00000".U) &&
                 (io.MEMWB_rd === io.IDEX_rs1) &&
                 ~(io.EXMEM_regWr === "b1".U && io.EXMEM_rd =/= "b00000".U &&
                   (io.EXMEM_rd === io.IDEX_rs1))) {
        io.forward_a := "b01".U
      }
    }

This module implements the forwarding unit, which dynamically selects and routes data from later pipeline stages to resolve data hazards in the RISC-V pipeline. The unit examines the source registers from the ID/EX stage (i.e., IDEX_rs1 and IDEX_rs2) and compares them with the destination registers from both the EX/MEM and MEM/WB stages. Depending on which stage provides the most recent data, the module assigns a corresponding two-bit value to the forwarding outputs (forward_a and forward_b). For example, when the EX/MEM stage is writing to a non-zero register that matches a source operand, the corresponding forward signal is set to binary 10, indicating that data should be forwarded directly from the EX/MEM stage.

In the MEM hazard section, the module addresses cases where the MEM/WB stage holds the data needed by the current instruction. Here, the module checks whether the MEM/WB stage is writing to a non-zero register that matches the source registers of the ID/EX stage. However, this forwarding is only enabled if the EX/MEM stage is not already forwarding for that register (thereby prioritizing EX hazards). If the conditions are met, the forward signal is set to binary 01, signaling that the required data should be forwarded from the MEM/WB stage. This mechanism ensures that even if an instruction's result has not been written back yet, the correct value is available for subsequent computations, thereby avoiding pipeline stalls.
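The priority rule described above (EX/MEM first, then MEM/WB, otherwise no forwarding) can be stated per operand in a few lines. The following plain-Scala sketch is a restatement of that rule for a single source register, not a line-for-line translation of the module, which handles both operands in one chain of conditions; the names `ForwardingModel` and `select` are illustrative.

```scala
// Per-operand restatement of the forwarding priority (illustrative names).
object ForwardingModel {
  // Returns 2 (binary 10) to forward from EX/MEM, 1 (binary 01) from MEM/WB, 0 for none.
  def select(exMemRegWr: Boolean, exMemRd: Int,
             memWbRegWr: Boolean, memWbRd: Int, rs: Int): Int =
    if (exMemRegWr && exMemRd != 0 && exMemRd == rs) 2       // most recent result wins
    else if (memWbRegWr && memWbRd != 0 && memWbRd == rs) 1  // older result
    else 0                                                   // use the register file value
}
```

When both stages hold a result for the same register, the EX/MEM value is chosen because it was produced by the more recent instruction, and register 0 is never forwarded.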

4.4 Branch Forwarding

Filepath: src/main/scala/Pipeline/Hazard Units/BranchForward.scala

class BranchForward extends Module {
  val io = IO(new Bundle {
    val ID_EX_RD = Input(UInt(5.W))
    val EX_MEM_RD = Input(UInt(5.W))
    val MEM_WB_RD = Input(UInt(5.W))
    val ID_EX_memRd = Input(UInt(1.W))
    val EX_MEM_memRd = Input(UInt(1.W))
    val MEM_WB_memRd = Input(UInt(1.W))
    val rs1 = Input(UInt(5.W))
    val rs2 = Input(UInt(5.W))
    val ctrl_branch = Input(UInt(1.W))
    val forward_rs1 = Output(UInt(4.W))
    val forward_rs2 = Output(UInt(4.W))
  })

  io.forward_rs1 := "b0000".U
  io.forward_rs2 := "b0000".U

  // Branch forwarding logic
  when(io.ctrl_branch === 1.U) {
    // ALU Hazard
    when(io.ID_EX_RD =/= 0.U && io.ID_EX_memRd =/= 1.U) {
      when(io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2) {
        io.forward_rs1 := "b0001".U
        io.forward_rs2 := "b0001".U
      }.elsewhen(io.ID_EX_RD === io.rs1) {
        io.forward_rs1 := "b0001".U
      }.elsewhen(io.ID_EX_RD === io.rs2) {
        io.forward_rs2 := "b0001".U
      }
    }

    // EX/MEM Hazard
    when(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd =/= 1.U) {
      when(io.EX_MEM_RD === io.rs1 && io.EX_MEM_RD === io.rs2 &&
        !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2)) {
        io.forward_rs1 := "b0010".U
        io.forward_rs2 := "b0010".U
      }.elsewhen(io.EX_MEM_RD === io.rs1 &&
        !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) {
        io.forward_rs1 := "b0010".U
      }.elsewhen(io.EX_MEM_RD === io.rs2 &&
        !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs2)) {
        io.forward_rs2 := "b0010".U
      }
    }

    // MEM/WB Hazard
    when(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd =/= 1.U) {
      when(io.MEM_WB_RD === io.rs1 && io.MEM_WB_RD === io.rs2 &&
        !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1 && io.ID_EX_RD === io.rs2) &&
        !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1 && io.EX_MEM_RD === io.rs2)) {
        io.forward_rs1 := "b0011".U
        io.forward_rs2 := "b0011".U
      }.elsewhen(io.MEM_WB_RD === io.rs1 &&
        !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) &&
        !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) {
        io.forward_rs1 := "b0011".U
      }.elsewhen(io.MEM_WB_RD === io.rs2 &&
        !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs2) &&
        !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs2)) {
        io.forward_rs2 := "b0011".U
      }
    }
  // Jalr forwarding logic
  }.elsewhen(io.ctrl_branch === 0.U) {
    when(io.ID_EX_RD =/= 0.U && io.ID_EX_memRd =/= 1.U && io.ID_EX_RD === io.rs1) {
      io.forward_rs1 := "b0110".U
    }.elsewhen(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd =/= 1.U && io.EX_MEM_RD === io.rs1 &&
      !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) {
      io.forward_rs1 := "b0111".U
    }.elsewhen(io.EX_MEM_RD =/= 0.U && io.EX_MEM_memRd === 1.U && io.EX_MEM_RD === io.rs1 &&
      !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1)) {
      io.forward_rs1 := "b1001".U
    }.elsewhen(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd =/= 1.U && io.MEM_WB_RD === io.rs1 &&
      !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) &&
      !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) {
      io.forward_rs1 := "b1000".U
    }.elsewhen(io.MEM_WB_RD =/= 0.U && io.MEM_WB_memRd === 1.U && io.MEM_WB_RD === io.rs1 &&
      !(io.ID_EX_RD =/= 0.U && io.ID_EX_RD === io.rs1) &&
      !(io.EX_MEM_RD =/= 0.U && io.EX_MEM_RD === io.rs1)) {
      io.forward_rs1 := "b1010".U
    }
  }
}

The BranchForward module resolves data hazards that arise when branch and JALR instructions are evaluated. It determines whether the source operands for branch evaluation need to be forwarded from later pipeline stages to avoid stalls. The module takes as inputs the destination register identifiers and memory-read flags from the ID/EX, EX/MEM, and MEM/WB pipeline stages, alongside the source register identifiers (rs1 and rs2) of the branch instruction and a control signal (ctrl_branch). The outputs, forward_rs1 and forward_rs2, are four-bit codes indicating the source of the forwarded data, with 0000 meaning no forwarding. When ctrl_branch is set to 1, the branch forwarding logic checks the ID/EX, EX/MEM, and MEM/WB stages in turn and emits 0001, 0010, or 0011, respectively. A stage is considered only if its destination register is non-zero and its memory-read flag is not set, and a later stage is used only when no earlier stage matches the same source register, so the most recent value takes priority.

When ctrl_branch is 0, the module applies the JALR forwarding logic, which evaluates only the source register rs1. It again checks the ID/EX, EX/MEM, and MEM/WB stages for a matching destination register and gives priority to the most recent one. Different codes are assigned to forward_rs1 depending on whether the producing instruction is a memory read: 0110 for ID/EX (non-load only), 0111 or 1001 for EX/MEM (non-load or load), and 1000 or 1010 for MEM/WB (non-load or load). This hierarchy ensures that the correct operand is selected for branch or JALR evaluation, reducing pipeline stalls.
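The JALR code selection can be mirrored in a small plain-Scala model, which may make the priority between stages easier to follow. This is a sketch rather than the repository's code: JalrForwardModel and Stage are hypothetical names, and the returned strings are the binary codes assigned to forward_rs1 in the module above.

```scala
// Plain-Scala model of the JALR branch of BranchForward (not Chisel hardware).
// JalrForwardModel and Stage are illustrative names only.
object JalrForwardModel {
  // One in-flight producer: its destination register and whether it is a load.
  final case class Stage(rd: Int, memRd: Boolean)

  def forwardRs1(rs1: Int, idEx: Stage, exMem: Stage, memWb: Stage): String = {
    val idExMatch  = idEx.rd != 0 && idEx.rd == rs1
    val exMemMatch = exMem.rd != 0 && exMem.rd == rs1
    val memWbMatch = memWb.rd != 0 && memWb.rd == rs1

    if (idExMatch && !idEx.memRd) "0110"            // newest producer, non-load
    else if (exMemMatch && !idExMatch) {
      if (exMem.memRd) "1001" else "0111"           // EX/MEM: load vs. non-load
    } else if (memWbMatch && !idExMatch && !exMemMatch) {
      if (memWb.memRd) "1010" else "1000"           // MEM/WB: load vs. non-load
    } else "0000"                                   // no forwarding
  }
}
```

Note that a matching load in ID/EX yields 0000 in this model, as in the module: every later case is guarded by the absence of an ID/EX match, so that situation is presumably handled elsewhere, for example by a stall.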

5. Pipeline

5.1 MuxLookup select PC value

Filepath: src/main/scala/Pipeline/Main.scala

val PC_F = MuxLookup(HazardDetect.io.pc_forward, 0.S, Array(
  (0.U) -> PC4.io.out.asSInt,
  (1.U) -> HazardDetect.io.pc_out))

PC.io.in := PC_F                        // PC_in input
PC4.io.pc := PC.io.out.asUInt           // PC4_in input <- PC_out
InstMemory.io.addr := PC.io.out.asUInt  // Address to fetch instruction

val PC_for = MuxLookup (HazardDetect.io.inst_forward, 0.S, Array (
  (0.U) -> PC.io.out,
  (1.U) -> HazardDetect.io.current_pc_out))

val Instruction_F = MuxLookup (HazardDetect.io.inst_forward, 0.U, Array (
  (0.U) -> InstMemory.io.data,
  (1.U) -> HazardDetect.io.inst_out))

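This code snippet uses MuxLookup to select the next Program Counter (PC) value and the instruction that is passed down the pipeline. When HazardDetect.io.pc_forward is 0, the PC advances to PC4.io.out; when it is 1, the PC takes HazardDetect.io.pc_out instead. Likewise, inst_forward chooses between the freshly fetched values (PC.io.out and InstMemory.io.data) and the ones supplied by the hazard detection unit (current_pc_out and inst_out), so that the correct instruction is executed even in the presence of pipeline hazards.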

5.2 Register File Inputs (rs1 and rs2)

// Decode connections (Control unit, RegFile)
control_module.io.opcode := IF_ID_.io.SelectedInstr_out(6, 0) // opcode identifies the instruction type

// Register file inputs
RegFile.io.rs1 := Mux(
  control_module.io.opcode === 51.U ||  // R-type
  control_module.io.opcode === 19.U ||  // I-type
  control_module.io.opcode === 35.U ||  // S-type
  control_module.io.opcode === 3.U ||   // I-type (load instructions)
  control_module.io.opcode === 99.U ||  // SB-type (branch)
  control_module.io.opcode === 103.U,   // JALR instruction
  IF_ID_.io.SelectedInstr_out(19, 15),
  0.U
)
RegFile.io.rs2 := Mux(
  control_module.io.opcode === 51.U ||  // R-type
  control_module.io.opcode === 35.U ||  // S-type
  control_module.io.opcode === 99.U,    // SB-type (branch)
  IF_ID_.io.SelectedInstr_out(24, 20),
  0.U)
RegFile.io.reg_write := control_module.io.reg_write

This code is responsible for decoding the fetched instruction by extracting its opcode to identify the instruction type. Based on the opcode and the instruction format, it determines the values of the rs1 and rs2 register fields, specifying the source registers to be used for operations. The rs1 field is selected for instruction types such as R-type, I-type, S-type, SB-type, and JALR, while the rs2 field is used for R-type, S-type, and SB-type instructions. Additionally, the reg_write signal is configured to enable or disable write-back to the register file (RegFile), depending on whether the current instruction requires a write operation. This ensures the proper setup of source registers and write-back control for subsequent execution stages.

| Instruction type | Opcode (binary) | Opcode (decimal) |
| --- | --- | --- |
| R-type | 011 0011 | 51 |
| I-type | 001 0011 | 19 |
| S-type | 010 0011 | 35 |
| I-type (load instructions) | 000 0011 | 3 |
| SB-type (branch) | 110 0011 | 99 |
| JALR instruction | 110 0111 | 103 |
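The field extraction described in this subsection can be checked against instruction words with a short plain-Scala sketch. The object name DecodeModel is hypothetical; the bit ranges and opcode sets are the ones used in the Chisel snippet above.

```scala
// Plain-Scala model of the rs1/rs2 selection (not Chisel hardware).
// DecodeModel is an illustrative name only.
object DecodeModel {
  def opcode(inst: Int): Int   = inst & 0x7f           // bits 6..0
  def rs1Field(inst: Int): Int = (inst >>> 15) & 0x1f  // bits 19..15
  def rs2Field(inst: Int): Int = (inst >>> 20) & 0x1f  // bits 24..20

  // Opcodes (decimal) whose encoding actually carries rs1 / rs2.
  private val usesRs1 = Set(51, 19, 35, 3, 99, 103)
  private val usesRs2 = Set(51, 35, 99)

  // Formats without the field read x0 instead of interpreting immediate bits.
  def rs1(inst: Int): Int = if (usesRs1(opcode(inst))) rs1Field(inst) else 0
  def rs2(inst: Int): Int = if (usesRs2(opcode(inst))) rs2Field(inst) else 0
}
```

For 0x00a002b3 (add t0, x0, a0) this gives opcode 51, rs1 = 0 and rs2 = 10. For 0x0022d313 (an I-type shift on t0) it gives rs1 = 5 and rs2 = 0, even though bits 24..20 hold the shift amount 2.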

5.3 Data Forwarding for rs1 and rs2 to Resolve Pipeline Hazards

// rs1_data
when (Structural.io.fwd_rs1 === 0.U) {
  S_rs1DataIn := RegFile.io.rdata1
}.elsewhen (Structural.io.fwd_rs1 === 1.U) {
  S_rs1DataIn := RegFile.io.w_data
}.otherwise {
  S_rs1DataIn := 0.S
}

// rs2_data
when (Structural.io.fwd_rs2 === 0.U) {
  S_rs2DataIn := RegFile.io.rdata2
}.elsewhen (Structural.io.fwd_rs2 === 1.U) {
  S_rs2DataIn := RegFile.io.w_data
}.otherwise {
  S_rs2DataIn := 0.S
}

This code implements data forwarding for the rs1 and rs2 source registers to handle potential data hazards in the pipeline.

S_rs1DataIn and S_rs2DataIn: Wires used to hold the correct values for rs1 and rs2 after evaluating forwarding needs.

  • Forwarding Logic:
    • If no hazard exists, data is read directly from the register file.
    • If a hazard is detected (fwd_rs1 or fwd_rs2 is 1), the value currently being written back (RegFile.io.w_data) is used instead.
  • Default Behavior: Sets the values to 0.S if no valid data path is available.

This ensures that the pipeline uses the most up-to-date data for execution, maintaining correctness and avoiding unnecessary stalls.

5.4 Stalling Logic for Control Hazard Resolution in Pipeline

// Stall when forward
when(HazardDetect.io.ctrl_forward === "b1".U) {
  ID_EX_.io.ctrl_MemWr_in := 0.U
  ID_EX_.io.ctrl_MemRd_in := 0.U
  ID_EX_.io.ctrl_MemToReg_in := 0.U
  ID_EX_.io.ctrl_Reg_W_in := 0.U
  ID_EX_.io.ctrl_AluOp_in := 0.U
  ID_EX_.io.ctrl_OpB_in := 0.U
  ID_EX_.io.ctrl_Branch_in := 0.U
  ID_EX_.io.ctrl_nextpc_in := 0.U
}.otherwise {
  ID_EX_.io.ctrl_MemWr_in := control_module.io.mem_write
  ID_EX_.io.ctrl_MemRd_in := control_module.io.mem_read
  ID_EX_.io.ctrl_MemToReg_in := control_module.io.men_to_reg
  ID_EX_.io.ctrl_Reg_W_in := control_module.io.reg_write
  ID_EX_.io.ctrl_AluOp_in := control_module.io.alu_operation
  ID_EX_.io.ctrl_OpB_in := control_module.io.operand_B
  ID_EX_.io.ctrl_Branch_in := control_module.io.branch
  ID_EX_.io.ctrl_nextpc_in := control_module.io.next_pc_sel
}

This code snippet implements stalling logic to handle control hazards in a pipelined processor. When HazardDetect asserts ctrl_forward, every control signal entering the ID_EX pipeline register is set to 0, so the instruction in that slot becomes a bubble that neither accesses memory nor writes a register. Otherwise, the control signals produced by the control module are passed through unchanged.
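The same selection can be written as a tiny plain-Scala model. This is only an illustration: BubbleModel and Ctrl are hypothetical names, and the fields of Ctrl stand for the eight control signals assigned in the snippet above.

```scala
// Plain-Scala model of bubble insertion at the ID/EX boundary (not Chisel hardware).
// BubbleModel and Ctrl are illustrative names only.
object BubbleModel {
  final case class Ctrl(memWr: Int, memRd: Int, memToReg: Int, regW: Int,
                        aluOp: Int, opB: Int, branch: Int, nextPc: Int)

  // All-zero control word: no memory access, no register write, no branch.
  val Bubble: Ctrl = Ctrl(0, 0, 0, 0, 0, 0, 0, 0)

  // What the ID/EX register receives in a given cycle.
  def idExInput(ctrlForward: Boolean, decoded: Ctrl): Ctrl =
    if (ctrlForward) Bubble else decoded
}
```

Because regW and memWr are both 0 in the bubble, the stalled slot cannot change architectural state, which is what makes zeroing the control signals a safe way to stall.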


B. Rectifications

1. Exposing Registers

In addition to constructing a pipelined RISC-V CPU using Chisel, it is essential to verify that the CPU executes programs correctly. We therefore first verify the correctness of our RISC-V test code using Ripes, a third-party processor simulator. Next, we take the register values reported by Ripes as the expected outputs and compare them with the results produced by our CPU.

However, since the register values are confined within the RegisterFile module, we need to "expose" them through the IO Bundle. The following code snippet shows the modified IO of this module, which exposes all argument registers, temporary registers, and saved registers.

// RegisterFile (RegisterFile.scala)
val io = IO(new Bundle {
  val rs1 = Input(UInt(5.W))
  val rs2 = Input(UInt(5.W))
  val reg_write = Input(Bool())
  val w_reg = Input(UInt(5.W))
  val w_data = Input(SInt(32.W))
  val rdata1 = Output(SInt(32.W))
  val rdata2 = Output(SInt(32.W))

  // >> exposed argument registers
  val a0 = Output(SInt(32.W))
  val a1 = Output(SInt(32.W))
  val a2 = Output(SInt(32.W))
  val a3 = Output(SInt(32.W))
  val a4 = Output(SInt(32.W))
  val a5 = Output(SInt(32.W))
  val a6 = Output(SInt(32.W))
  val a7 = Output(SInt(32.W))
  // << exposed argument registers

  // >> exposed temporary registers
  val t0 = Output(SInt(32.W))
  val t1 = Output(SInt(32.W))
  val t2 = Output(SInt(32.W))
  val t3 = Output(SInt(32.W))
  val t4 = Output(SInt(32.W))
  val t5 = Output(SInt(32.W))
  val t6 = Output(SInt(32.W))
  // << exposed temporary registers

  // >> exposed save registers
  val s0 = Output(SInt(32.W))
  val s1 = Output(SInt(32.W))
  val s2 = Output(SInt(32.W))
  val s3 = Output(SInt(32.W))
  val s4 = Output(SInt(32.W))
  val s5 = Output(SInt(32.W))
  val s6 = Output(SInt(32.W))
  val s7 = Output(SInt(32.W))
  val s8 = Output(SInt(32.W))
  val s9 = Output(SInt(32.W))
  val s10 = Output(SInt(32.W))
  val s11 = Output(SInt(32.W))
  // << exposed save registers
})

After exposing these IO ports, we need to wire the register values to the corresponding output ports. The following code snippet implements the wiring logic within the module.

// RegisterFile (RegisterFile.scala)
// >> wiring argument registers to corresponding output ports
io.a0 := Mux(io.reg_write && io.w_reg === 10.U, io.w_data, regfile(10))
io.a1 := Mux(io.reg_write && io.w_reg === 11.U, io.w_data, regfile(11))
io.a2 := Mux(io.reg_write && io.w_reg === 12.U, io.w_data, regfile(12))
io.a3 := Mux(io.reg_write && io.w_reg === 13.U, io.w_data, regfile(13))
io.a4 := Mux(io.reg_write && io.w_reg === 14.U, io.w_data, regfile(14))
io.a5 := Mux(io.reg_write && io.w_reg === 15.U, io.w_data, regfile(15))
io.a6 := Mux(io.reg_write && io.w_reg === 16.U, io.w_data, regfile(16))
io.a7 := Mux(io.reg_write && io.w_reg === 17.U, io.w_data, regfile(17))
// << wiring argument registers to corresponding output ports

// >> wiring temporary registers to corresponding output ports
io.t0 := Mux(io.reg_write && io.w_reg === 5.U, io.w_data, regfile(5))
io.t1 := Mux(io.reg_write && io.w_reg === 6.U, io.w_data, regfile(6))
io.t2 := Mux(io.reg_write && io.w_reg === 7.U, io.w_data, regfile(7))
io.t3 := Mux(io.reg_write && io.w_reg === 28.U, io.w_data, regfile(28))
io.t4 := Mux(io.reg_write && io.w_reg === 29.U, io.w_data, regfile(29))
io.t5 := Mux(io.reg_write && io.w_reg === 30.U, io.w_data, regfile(30))
io.t6 := Mux(io.reg_write && io.w_reg === 31.U, io.w_data, regfile(31))
// << wiring temporary registers to corresponding output ports

// >> wiring save registers to corresponding output ports
io.s0 := Mux(io.reg_write && io.w_reg === 8.U, io.w_data, regfile(8))
io.s1 := Mux(io.reg_write && io.w_reg === 9.U, io.w_data, regfile(9))
io.s2 := Mux(io.reg_write && io.w_reg === 18.U, io.w_data, regfile(18))
io.s3 := Mux(io.reg_write && io.w_reg === 19.U, io.w_data, regfile(19))
io.s4 := Mux(io.reg_write && io.w_reg === 20.U, io.w_data, regfile(20))
io.s5 := Mux(io.reg_write && io.w_reg === 21.U, io.w_data, regfile(21))
io.s6 := Mux(io.reg_write && io.w_reg === 22.U, io.w_data, regfile(22))
io.s7 := Mux(io.reg_write && io.w_reg === 23.U, io.w_data, regfile(23))
io.s8 := Mux(io.reg_write && io.w_reg === 24.U, io.w_data, regfile(24))
io.s9 := Mux(io.reg_write && io.w_reg === 25.U, io.w_data, regfile(25))
io.s10 := Mux(io.reg_write && io.w_reg === 26.U, io.w_data, regfile(26))
io.s11 := Mux(io.reg_write && io.w_reg === 27.U, io.w_data, regfile(27))
// << wiring save registers to corresponding output ports
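The wiring above relies on the mapping from ABI register names to register numbers. As a cross-check, that mapping can be generated rather than typed by hand; the sketch below is plain Scala and the object name AbiRegs is hypothetical.

```scala
// ABI name -> register number for the registers exposed above.
// AbiRegs is an illustrative name only.
object AbiRegs {
  val index: Map[String, Int] = (
    (0 to 7).map(i => s"a$i" -> (10 + i)) ++   // a0-a7  -> x10-x17
    (0 to 2).map(i => s"t$i" -> (5 + i)) ++    // t0-t2  -> x5-x7
    (3 to 6).map(i => s"t$i" -> (25 + i)) ++   // t3-t6  -> x28-x31
    (0 to 1).map(i => s"s$i" -> (8 + i)) ++    // s0-s1  -> x8-x9
    (2 to 11).map(i => s"s$i" -> (16 + i))     // s2-s11 -> x18-x27
  ).toMap
}
```

The map has 27 entries, one per exposed port, and each entry agrees with the index used in the corresponding Mux line (for example, `AbiRegs.index("t3")` is 28).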

Similarly, we expose the register values outside the PIPELINE module using the subsequent code snippets.

// PIPELINE (Main.scala)
// Port names carry the out_ prefix used by the wiring and the test below.
val io = IO(new Bundle {
  val out = Output(SInt(32.W))
  val out_pc = Output(SInt(32.W))

  // >> exposed argument registers
  val out_a0 = Output(SInt(32.W))
  val out_a1 = Output(SInt(32.W))
  val out_a2 = Output(SInt(32.W))
  val out_a3 = Output(SInt(32.W))
  val out_a4 = Output(SInt(32.W))
  val out_a5 = Output(SInt(32.W))
  val out_a6 = Output(SInt(32.W))
  val out_a7 = Output(SInt(32.W))
  // << exposed argument registers

  // >> exposed temporary registers
  val out_t0 = Output(SInt(32.W))
  val out_t1 = Output(SInt(32.W))
  val out_t2 = Output(SInt(32.W))
  val out_t3 = Output(SInt(32.W))
  val out_t4 = Output(SInt(32.W))
  val out_t5 = Output(SInt(32.W))
  val out_t6 = Output(SInt(32.W))
  // << exposed temporary registers

  // >> exposed save registers
  val out_s0 = Output(SInt(32.W))
  val out_s1 = Output(SInt(32.W))
  val out_s2 = Output(SInt(32.W))
  val out_s3 = Output(SInt(32.W))
  val out_s4 = Output(SInt(32.W))
  val out_s5 = Output(SInt(32.W))
  val out_s6 = Output(SInt(32.W))
  val out_s7 = Output(SInt(32.W))
  val out_s8 = Output(SInt(32.W))
  val out_s9 = Output(SInt(32.W))
  val out_s10 = Output(SInt(32.W))
  val out_s11 = Output(SInt(32.W))
  // << exposed save registers
})
// PIPELINE (Main.scala)
// >> wiring argument registers to corresponding output ports
io.out_a0 := RegFile.io.a0
io.out_a1 := RegFile.io.a1
io.out_a2 := RegFile.io.a2
io.out_a3 := RegFile.io.a3
io.out_a4 := RegFile.io.a4
io.out_a5 := RegFile.io.a5
io.out_a6 := RegFile.io.a6
io.out_a7 := RegFile.io.a7
// << wiring argument registers to corresponding output ports

// >> wiring temporary registers to corresponding output ports
io.out_t0 := RegFile.io.t0
io.out_t1 := RegFile.io.t1
io.out_t2 := RegFile.io.t2
io.out_t3 := RegFile.io.t3
io.out_t4 := RegFile.io.t4
io.out_t5 := RegFile.io.t5
io.out_t6 := RegFile.io.t6
// << wiring temporary registers to corresponding output ports

// >> wiring save registers to corresponding output ports
io.out_s0 := RegFile.io.s0
io.out_s1 := RegFile.io.s1
io.out_s2 := RegFile.io.s2
io.out_s3 := RegFile.io.s3
io.out_s4 := RegFile.io.s4
io.out_s5 := RegFile.io.s5
io.out_s6 := RegFile.io.s6
io.out_s7 := RegFile.io.s7
io.out_s8 := RegFile.io.s8
io.out_s9 := RegFile.io.s9
io.out_s10 := RegFile.io.s10
io.out_s11 := RegFile.io.s11
// << wiring save registers to corresponding output ports

Finally, in our MainTest.scala, we add test cases following the structure shown in the code snippet below:

// MainTest.scala
class TOPTest extends FreeSpec with ChiselScalatestTester {
  "test a0" in {
    // test program
    test(new PIPELINE("/home/mi2s/FProject/compilation/testA0.txt")) { x =>
      // the number of clock cycles to finish the program
      x.clock.step(6)
      // the expected value of a0 register
      x.io.out_a0.expect(10.S)
    }
  }
}

2. Logging States

The code provided in the repository initially could not execute our test cases correctly. Consequently, we traced the execution process and monitored the register states after each clock cycle. However, since neither Chisel nor the author of the repository offers a user-friendly debugging tool like the Ripes simulator, which displays register values, we had to implement logging using printf statements. The following three code snippets demonstrate logging for temporary, argument, and saved registers.

// PIPELINE (Main.scala)
// t0-t6 : temporary registers
printf(p"[ ${hwCounter} ] t0: ${Hexadecimal(RegFile.io.t0)}, t1: ${Hexadecimal(RegFile.io.t1)}, t2: ${Hexadecimal(RegFile.io.t2)}, t3: ${Hexadecimal(RegFile.io.t3)}, t4: ${Hexadecimal(RegFile.io.t4)}, t5: ${Hexadecimal(RegFile.io.t5)}, t6: ${Hexadecimal(RegFile.io.t6)}\n")

// PIPELINE (Main.scala)
// a0-a7 : argument registers
printf(p"[ ${hwCounter} ] a0: ${Hexadecimal(RegFile.io.a0)}, a1: ${Hexadecimal(RegFile.io.a1)}, a2: ${Hexadecimal(RegFile.io.a2)}, a3: ${Hexadecimal(RegFile.io.a3)}, a4: ${Hexadecimal(RegFile.io.a4)}, a5: ${Hexadecimal(RegFile.io.a5)}, a6: ${Hexadecimal(RegFile.io.a6)}, a7: ${Hexadecimal(RegFile.io.a7)}\n")

// PIPELINE (Main.scala)
// s0-s11 : save registers
printf(p"[ ${hwCounter} ] s0: ${Hexadecimal(RegFile.io.s0)}, s1: ${Hexadecimal(RegFile.io.s1)}, s2: ${Hexadecimal(RegFile.io.s2)}, s3: ${Hexadecimal(RegFile.io.s3)}, s4: ${Hexadecimal(RegFile.io.s4)}, s5: ${Hexadecimal(RegFile.io.s5)}, s6: ${Hexadecimal(RegFile.io.s6)}, s7: ${Hexadecimal(RegFile.io.s7)}, s8: ${Hexadecimal(RegFile.io.s8)}, s9: ${Hexadecimal(RegFile.io.s9)}, s10: ${Hexadecimal(RegFile.io.s10)}, s11: ${Hexadecimal(RegFile.io.s11)}\n")
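Comparing such log lines with the values shown in Ripes by eye is error-prone, so a small helper can do the comparison. The following is a plain-Scala sketch under the assumption that the register log lines look like `[ 41 ] t0: 6aaaaaaa, t1: 1aaaaaaa, ...`; the object RegLog and its methods are hypothetical and not part of the repository.

```scala
// Parses one register log line and reports registers that differ from expectations.
// RegLog, parse and mismatches are illustrative names only.
object RegLog {
  // Matches "name: xxxxxxxx" pairs where the value is exactly eight hex digits.
  private val Entry = """(\w+): ([0-9a-fA-F]{8})""".r

  // Register name -> unsigned 32-bit value (held in a Long).
  def parse(line: String): Map[String, Long] =
    Entry.findAllMatchIn(line)
      .map(m => m.group(1) -> java.lang.Long.parseLong(m.group(2), 16))
      .toMap

  // Names whose logged value is missing or differs from the expected one.
  def mismatches(line: String, expected: Map[String, Long]): Set[String] = {
    val got = parse(line)
    expected.collect {
      case (name, value) if !got.get(name).contains(value) => name
    }.toSet
  }
}
```

With the expected values taken from the simulator, `RegLog.mismatches(line, Map("t2" -> 0x33333000L))` returns `Set("t2")` for a line that logs a different t2 and an empty set otherwise.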

Additionally, to effectively monitor and analyze the IF_ID and ID_EXE pipelines and ALU controls, we include the following code snippet for logging supplementary information.

// PIPELINE (Main.scala)
// control signals from decode to execute (including ALU operands)
printf(p"[ ${hwCounter} ] idx: ${Decimal(PC.io.out / 4.S + 1.S)} op: 0x${Hexadecimal(control_module.io.opcode)} rs1: ${Decimal(RegFile.io.rs1)} (0x${Hexadecimal(RegFile.io.rdata1)}) rs2: ${Decimal(RegFile.io.rs2)} (0x${Hexadecimal(RegFile.io.rdata2)}) alu_arg1: 0x${Hexadecimal(ALU.io.in_A)} alu_arg2: 0x${Hexadecimal(ALU.io.in_B)} inst: 0x${Hexadecimal(InstMemory.io.data)} alu_ctrl_op_A: ${ID_EX_.io.ctrl_OpA_out} alu_forward_a: ${Forwarding.io.forward_a} alu_ctrl_op_B: ${ID_EX_.io.ctrl_OpB_out} alu_forward_b: ${Forwarding.io.forward_b} EXMEM_rd: ${Decimal(Forwarding.io.EXMEM_rd)} IDEX_rs1: ${Decimal(Forwarding.io.IDEX_rs1)} IDEX_rs1_data_out: 0x${Hexadecimal(ID_EX_.io.rs1_data_out)} EXMEM_alu_out: 0x${Hexadecimal(EX_MEM_M.io.EXMEM_alu_out)} IDEX_rs2_data: 0x${Hexadecimal(ID_EX_.io.rs2_data_out)} IDEX_rs1_data_in: 0x${Hexadecimal(ID_EX_.io.rs1_data_in)} fwd_rs1: ${Structural.io.fwd_rs1} MEM_WB_RD: ${Decimal(Forwarding.io.MEMWB_rd)} io_rs1: ${Decimal(ID_EX_.io.rs1_out)} io_rs2: ${Decimal(ID_EX_.io.rs2_out)} MEM_WB_RD_Data: ${Hexadecimal(MEM_WB_M.io.MEMWB_alu_out)} ALUOp: ${Decimal(ALU.io.alu_Op)}\n")

3. Structural Hazards

While testing one of our programs, we observed an unusual discrepancy by tracing the logs and comparing the register states with those produced by the Ripes simulator.

[          35 ] t0: 7fffffff, t1: 3fffffff, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          36 ] t0: 7fffffff, t1: 3fffffff, t2: 55555000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          37 ] t0: 7fffffff, t1: 3fffffff, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          38 ] t0: 7fffffff, t1: 15555555, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          39 ] t0: 6aaaaaaa, t1: 15555555, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          40 ] t0: 6aaaaaaa, t1: 1aaaaaaa, t2: 55555555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          41 ] t0: 6aaaaaaa, t1: 1aaaaaaa, t2: 48888555, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          42 ] t0: 6aaaaaaa, t1: 1aaaaaaa, t2: 48888888, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000

In Ripes, at clock cycle 41, register t2 is expected to change to 0x33333000 and then to 0x33333333 due to the instruction li t2, 0x33333333.

[          35 ] idx:          20 op: 0x33 rs1:   6 (0x00007fff) rs2:   7 (0x00000000) alu_arg1: 0x55555000 alu_arg2: 0x00000555 inst: 0x406282b3 alu_ctrl_op_A:  0 alu_forward_a:  2 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:   7 IDEX_rs1:   7 IDEX_rs1_data: 0x00000000 EXMEM_alu_out: 0x55555000
[          36 ] idx:          21 op: 0x33 rs1:   5 (0x7fffffff) rs2:   6 (0x3fffffff) alu_arg1: 0x3fffffff alu_arg2: 0x55555555 inst: 0x0022d313 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  2 EXMEM_rd:   7 IDEX_rs1:   6 IDEX_rs1_data: 0x3fffffff EXMEM_alu_out: 0x55555555
[          37 ] idx:          22 op: 0x13 rs1:   5 (0x7fffffff) rs2:   0 (0x00000000) alu_arg1: 0x7fffffff alu_arg2: 0x15555555 inst: 0x333333b7 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  2 EXMEM_rd:   6 IDEX_rs1:   5 IDEX_rs1_data: 0x7fffffff EXMEM_alu_out: 0x15555555
[          38 ] idx:          23 op: 0x37 rs1:   0 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0x6aaaaaaa alu_arg2: 0x00000002 inst: 0x33338393 alu_ctrl_op_A:  0 alu_forward_a:  2 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:   5 IDEX_rs1:   5 IDEX_rs1_data: 0x7fffffff EXMEM_alu_out: 0x6aaaaaaa
[          39 ] idx:          24 op: 0x13 rs1:   7 (0x55555555) rs2:   0 (0x00000000) alu_arg1: 0x15555555 alu_arg2: 0x33333000 inst: 0x00737333 alu_ctrl_op_A:  3 alu_forward_a:  0 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:   6 IDEX_rs1:   0 IDEX_rs1_data: 0x15555555 EXMEM_alu_out: 0x1aaaaaaa
[          40 ] idx:          25 op: 0x33 rs1:   6 (0x15555555) rs2:   7 (0x55555555) alu_arg1: 0x48888555 alu_arg2: 0x00000333 inst: 0x0072f3b3 alu_ctrl_op_A:  0 alu_forward_a:  2 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:   7 IDEX_rs1:   7 IDEX_rs1_data: 0x55555555 EXMEM_alu_out: 0x48888555
[          41 ] idx:          26 op: 0x33 rs1:   5 (0x6aaaaaaa) rs2:   7 (0x55555555) alu_arg1: 0x1aaaaaaa alu_arg2: 0x48888888 inst: 0x007302b3 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  2 EXMEM_rd:   7 IDEX_rs1:   6 IDEX_rs1_data: 0x1aaaaaaa EXMEM_alu_out: 0x48888888
[          42 ] idx:          27 op: 0x33 rs1:   6 (0x1aaaaaaa) rs2:   7 (0x48888555) alu_arg1: 0x6aaaaaaa alu_arg2: 0x48888888 inst: 0x0042d313 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  1 EXMEM_rd:   6 IDEX_rs1:   5 IDEX_rs1_data: 0x6aaaaaaa EXMEM_alu_out: 0x08888888

However, examining the ALU log reveals that at clock cycle 39 during the EXE stage, the CPU adds 0x15555555 and 0x33333000 instead of 0x00000000 and 0x33333000 as expected from the lui t2, 0x33333 instruction. Further analysis shows that the value 0x15555555 is incorrectly forwarded from the write-back pipeline register. This issue originates from the module responsible for hazard detection.
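The arithmetic behind this observation can be replayed in a few lines of plain Scala. The sketch assumes, as the trace suggests, that li t2, 0x33333333 is assembled into the two instruction words 0x333333b7 (lui) and 0x33338393 (addi) that appear in the log; the object name LiTrace is hypothetical.

```scala
// Replays lui + addi on the ALU with a given (possibly wrong) operand A.
// LiTrace is an illustrative name only.
object LiTrace {
  val lui  = 0x333333b7 // lui  t2, 0x33333
  val addi = 0x33338393 // addi t2, t2, 0x333

  // U-type immediate: the upper 20 bits of the instruction word.
  def luiImm(inst: Int): Int = inst & 0xfffff000
  // I-type immediate: bits 31..20, sign-extended by the arithmetic shift.
  def addiImm(inst: Int): Int = inst >> 20

  // The ALU adds operand A to the lui immediate; operand A should be 0.
  def afterLui(operandA: Int): Int = operandA + luiImm(lui)
  def afterAddi(operandA: Int): Int = afterLui(operandA) + addiImm(addi)
}
```

With operand A equal to 0 the sequence produces 0x33333000 and then 0x33333333, as in Ripes. With the wrongly forwarded 0x15555555 it produces 0x48888555 and then 0x48888888, which are exactly the t2 values logged at cycles 41 and 42.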

The original implementation included a StructuralHazard class intended to resolve structural hazards but inadvertently handled data hazards instead, as shown in the code snippet below.

// StructuralHazard (StructuralHazard.scala)
// Determine if forwarding is needed for rs1
when(io.MEM_WB_regWr && (io.MEM_WB_Rd === io.rs1)) {
  io.fwd_rs1 := true.B
}.otherwise {
  io.fwd_rs1 := false.B
}

// Determine if forwarding is needed for rs2
when(io.MEM_WB_regWr && (io.MEM_WB_Rd === io.rs2)) {
  io.fwd_rs2 := true.B
}.otherwise {
  io.fwd_rs2 := false.B
}

Additionally, its integration in Main.scala disrupted proper data forwarding by only addressing hazards from the MEM/WB pipeline and ignoring those from the EX/MEM pipeline. It also detected hazards at incorrect stages. To rectify this, we removed the flawed StructuralHazard class and correctly implemented structural hazard resolution.

// PIPELINE (Main.scala)
// rs1_data
when (Structural.io.fwd_rs1 === 0.U) {
  S_rs1DataIn := RegFile.io.rdata1
}.elsewhen (Structural.io.fwd_rs1 === 1.U) {
  S_rs1DataIn := RegFile.io.w_data
}.otherwise {
  S_rs1DataIn := 0.S
}

// rs2_data
when (Structural.io.fwd_rs2 === 0.U) {
  S_rs2DataIn := RegFile.io.rdata2
}.elsewhen (Structural.io.fwd_rs2 === 1.U) {
  S_rs2DataIn := RegFile.io.w_data
}.otherwise {
  S_rs2DataIn := 0.S
}

After removing the defective module, new issues emerged, observable in the logs of argument and temporary registers.

[          20 ] a0: ffffffff, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[          21 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[          22 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[          23 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[          24 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[          25 ] a0: 7fff0000, a1: 00000000, a2: 00000000, a3: 00000000, a4: 00000000, a5: 00000000, a6: 00000000, a7: 00000000
[          20 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          21 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          22 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          23 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          24 ] t0: ffffffff, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          25 ] t0: ffffffff, t1: 7fffffff, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000

Specifically, at clock cycle 24, the instruction add t0, x0, a0 was supposed to complete execution. However, analysis of the control signal history in the decode and execute stages revealed that the latest value of a0 was never forwarded. The read of a0 and the write-back to a0 fell in the same clock cycle, and because a register write only becomes visible at the next clock edge while the read is combinational, the CPU fetched stale data.

[          20 ] idx:           5 op: 0x00 rs1:   0 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0x000000e0 alu_arg2: 0x00000000 inst: 0x00a002b3 alu_ctrl_op_A:  1 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  0 EXMEM_rd:  10 IDEX_rs1:   0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x7fff0000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:  18 io_rs1:   0 io_rs2:   0 MEM_WB_RD_Data: 7fff0000 ALUOp:  31
[          21 ] idx:           6 op: 0x33 rs1:   0 (0x00000000) rs2:  10 (0xffffffff) alu_arg1: 0x00000000 alu_arg2: 0x00000000 inst: 0x0012d313 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  0 EXMEM_rd:   1 IDEX_rs1:   0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x000000e0 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:  10 io_rs1:   0 io_rs2:   0 MEM_WB_RD_Data: 7fff0000 ALUOp:   0
[          22 ] idx:           7 op: 0x13 rs1:   5 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0x00000000 alu_arg2: 0xffffffff inst: 0x0062e2b3 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  0 EXMEM_rd:   0 IDEX_rs1:   0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x00000000 IDEX_rs2_data: 0xffffffff IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:   1 io_rs1:   0 io_rs2:  10 MEM_WB_RD_Data: 000000e0 ALUOp:   0
[          23 ] idx:           8 op: 0x33 rs1:   5 (0x00000000) rs2:   6 (0x00000000) alu_arg1: 0xffffffff alu_arg2: 0x00000001 inst: 0x0022d313 alu_ctrl_op_A:  0 alu_forward_a:  2 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:   5 IDEX_rs1:   5 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0xffffffff IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:   0 io_rs1:   5 io_rs2:   0 MEM_WB_RD_Data: 00000000 ALUOp:   5
[          24 ] idx:           9 op: 0x13 rs1:   5 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0xffffffff alu_arg2: 0x7fffffff inst: 0x0062e2b3 alu_ctrl_op_A:  0 alu_forward_a:  1 alu_ctrl_op_B:  0 alu_forward_b:  2 EXMEM_rd:   6 IDEX_rs1:   5 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x7fffffff IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  1 MEM_WB_RD:   5 io_rs1:   5 io_rs2:   6 MEM_WB_RD_Data: ffffffff ALUOp:   6
[          25 ] idx:          10 op: 0x33 rs1:   5 (0xffffffff) rs2:   6 (0x00000000) alu_arg1: 0xffffffff alu_arg2: 0x00000002 inst: 0x0042d313 alu_ctrl_op_A:  0 alu_forward_a:  2 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:   5 IDEX_rs1:   5 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0xffffffff IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0xffffffff fwd_rs1:  0 MEM_WB_RD:   6 io_rs1:   5 io_rs2:   0 MEM_WB_RD_Data: 7fffffff ALUOp:   5

Furthermore, the forwarding scenarios primarily included EX/MEM → ALU, MEM/WB → ALU, and MEM/WB → InstrDecode. The root cause of the issue was that a write to a register was not given priority over a read of the same register in the same cycle, so the read returned stale data. For example, while storing 0x7FFF0000 to a0, the CPU simultaneously attempted to read a0, resulting in the stale value 0xFFFFFFFF. With write-before-read in place, the temporary register log shows the expected values:

[          20 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          21 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          22 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          23 ] t0: 00000000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          24 ] t0: 7fff0000, t1: 00000000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000
[          25 ] t0: 7fff0000, t1: 3fff8000, t2: 00000000, t3: 00000000, t4: 00000000, t5: 00000000, t6: 00000000

To ensure that writing to registers is prioritized before reading, we revised a section of the RegisterFile module

// RegisterFile (RegisterFile.scala)
io.rdata1 := Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1))
io.rdata2 := Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2))

and replaced it with the following code snippet

// RegisterFile (RegisterFile.scala)
// 1) Read old data from the array.
val readData1 = Mux(io.rs1 === 0.U, 0.S, regfile(io.rs1))
val readData2 = Mux(io.rs2 === 0.U, 0.S, regfile(io.rs2))

// 2) If there's a same-cycle write to the same register, override (bypass) it.
val bypassedData1 = Mux(io.reg_write && (io.w_reg === io.rs1) && (io.w_reg =/= 0.U), io.w_data, readData1)
val bypassedData2 = Mux(io.reg_write && (io.w_reg === io.rs2) && (io.w_reg =/= 0.U), io.w_data, readData2)

// 3) Send those results to outputs
io.rdata1 := bypassedData1
io.rdata2 := bypassedData2
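The effect of the bypass can be reproduced with a cycle-level software model. This is a plain-Scala sketch, not the Chisel module: RegFileModel, RegFile, cycle, and readDuringWrite are hypothetical names, and the model assumes that a read sees the stored value while a write commits at the end of the cycle.

```scala
// Cycle-level model of a register file with an optional write-first bypass.
// All names here are illustrative only.
object RegFileModel {
  final class RegFile(bypass: Boolean) {
    private val regs = Array.fill(32)(0)

    // One clock cycle: the read happens first, then the write commits.
    def cycle(rs: Int, regWrite: Boolean, wReg: Int, wData: Int): Int = {
      val stored = if (rs == 0) 0 else regs(rs)
      val read =
        if (bypass && regWrite && wReg == rs && wReg != 0) wData // same-cycle write wins
        else stored
      if (regWrite && wReg != 0) regs(wReg) = wData              // visible from the next cycle
      read
    }
  }

  // Replays the trace: a0 (x10) holds 0xFFFFFFFF, then one cycle both
  // writes 0x7FFF0000 to a0 and reads a0.
  def readDuringWrite(bypass: Boolean): Int = {
    val rf = new RegFile(bypass)
    rf.cycle(rs = 0, regWrite = true, wReg = 10, wData = 0xffffffff)
    rf.cycle(rs = 10, regWrite = true, wReg = 10, wData = 0x7fff0000)
  }
}
```

Without the bypass, `readDuringWrite(false)` returns the stale 0xFFFFFFFF; with it, `readDuringWrite(true)` returns 0x7FFF0000, matching the corrected log.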

4. Missing Instructions

After resolving these hazards, we encountered an unexpected issue with arithmetic operations. The log files below display the state history of the saved registers and the control signals. At clock cycle 64, the instruction 0x408a5a13 (srai s4, s4, 8) is loaded, and it is executed at clock cycle 66. By clock cycle 68, s4 has changed from 0x83ff0000 to 0x0083ff00, which shows that the CPU performed a logical right shift without the sign extension required by the srai and sra instructions. This is evident from the ALUOp value of 5 at clock cycle 66, which corresponds to SRL instead of SRA.
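The difference between the two shifts can be checked on the values from this trace with a short plain-Scala sketch. The object name ShiftCheck is hypothetical; the instruction word and the s4 value are the ones that appear in the logs.

```scala
// Compares a logical and an arithmetic right shift on the logged s4 value.
// ShiftCheck is an illustrative name only.
object ShiftCheck {
  val inst = 0x408a5a13                        // srai s4, s4, 8
  val shamt: Int = (inst >>> 20) & 0x1f        // shift amount, bits 24..20
  // Bit 30 distinguishes the arithmetic shift (1) from the logical shift (0).
  val isArithmetic: Boolean = ((inst >>> 30) & 1) == 1

  val s4 = 0x83ff0000                          // value of s4 before the shift
  val logical: Int    = s4 >>> shamt           // SRL: fills with zeros
  val arithmetic: Int = s4 >> shamt            // SRA: replicates the sign bit
}
```

Here `logical` is 0x0083ff00, the value the CPU wrote to s4, while `arithmetic` is 0xff83ff00, the value that srai should have produced.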

[          64 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 00000000, s4: 00000000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[          65 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 00000000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[          66 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 04000000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[          67 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 83ff0000, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[          68 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 0083ff00, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[          69 ] s0: ffff0000, s1: 80000000, s2: 7fff0000, s3: 0000002b, s4: 0083ff00, s5: 00000000, s6: 00000000, s7: 00000000, s8: 00000000, s9: 00000000, s10: 00000000, s11: 00000000
[          64 ] idx:          64 op: 0x33 rs1:  18 (0x7fff0000) rs2:  20 (0x00000000) alu_arg1: 0x00000000 alu_arg2: 0x04000000 inst: 0x408a5a13 alu_ctrl_op_A:  3 alu_forward_a:  0 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:  19 IDEX_rs1:   0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x0000002b IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x7fff0000 fwd_rs1:  0 MEM_WB_RD:   8 io_rs1:   0 io_rs2:   0 MEM_WB_RD_Data: 00000000 ALUOp:   0
[          65 ] idx:          65 op: 0x13 rs1:  20 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0x7fff0000 alu_arg2: 0x04000000 inst: 0x7f8002b7 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  0 alu_forward_b:  2 EXMEM_rd:  20 IDEX_rs1:  18 IDEX_rs1_data_out: 0x7fff0000 EXMEM_alu_out: 0x04000000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:  19 io_rs1:  18 io_rs2:  20 MEM_WB_RD_Data: 0000002b ALUOp:   0
[          66 ] idx:          66 op: 0x37 rs1:   0 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0x83ff0000 alu_arg2: 0x00000408 inst: 0x005a7a33 alu_ctrl_op_A:  0 alu_forward_a:  2 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:  20 IDEX_rs1:  20 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x83ff0000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:  20 io_rs1:  20 io_rs2:   0 MEM_WB_RD_Data: 04000000 ALUOp:   5
[          67 ] idx:          67 op: 0x33 rs1:  20 (0x83ff0000) rs2:   5 (0x00000001) alu_arg1: 0x00000000 alu_arg2: 0x7f800000 inst: 0xfff90a93 alu_ctrl_op_A:  3 alu_forward_a:  0 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:  20 IDEX_rs1:   0 IDEX_rs1_data_out: 0x00000000 EXMEM_alu_out: 0x0083ff00 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x83ff0000 fwd_rs1:  1 MEM_WB_RD:  20 io_rs1:   0 io_rs2:   0 MEM_WB_RD_Data: 83ff0000 ALUOp:   0
[          68 ] idx:          68 op: 0x13 rs1:  18 (0x7fff0000) rs2:   0 (0x00000000) alu_arg1: 0x0083ff00 alu_arg2: 0x7f800000 inst: 0x01fada93 alu_ctrl_op_A:  0 alu_forward_a:  1 alu_ctrl_op_B:  0 alu_forward_b:  2 EXMEM_rd:   5 IDEX_rs1:  20 IDEX_rs1_data_out: 0x83ff0000 EXMEM_alu_out: 0x7f800000 IDEX_rs2_data: 0x00000001 IDEX_rs1_data_in: 0x7fff0000 fwd_rs1:  0 MEM_WB_RD:  20 io_rs1:  20 io_rs2:   5 MEM_WB_RD_Data: 0083ff00 ALUOp:   7
[          69 ] idx:          69 op: 0x13 rs1:  21 (0x00000000) rs2:   0 (0x00000000) alu_arg1: 0x7fff0000 alu_arg2: 0xffffffff inst: 0x013912b3 alu_ctrl_op_A:  0 alu_forward_a:  0 alu_ctrl_op_B:  1 alu_forward_b:  0 EXMEM_rd:  20 IDEX_rs1:  18 IDEX_rs1_data_out: 0x7fff0000 EXMEM_alu_out: 0x00800000 IDEX_rs2_data: 0x00000000 IDEX_rs1_data_in: 0x00000000 fwd_rs1:  0 MEM_WB_RD:   5 io_rs1:  18 io_rs2:   0 MEM_WB_RD_Data: 7f800000 ALUOp:   0

Through debugging and observation, we discovered that some instructions were not implemented correctly. Specifically, in the ALU module, SRA and SRAI should be assigned an ALUOp value of 13 instead of 5.

    // AluOpCode (ALU.scala)
    val ALU_ADD = 0.U(5.W)
    val ALU_ADDI = 0.U(5.W)
    val ALU_SW = 0.U(5.W)
    val ALU_LW = 0.U(5.W)
    val ALU_LUI = 0.U(5.W)
    val ALU_AUIPC = 0.U(5.W)
    val ALU_SLL = 1.U(5.W)
    val ALU_SLLI = 1.U(5.W)
    val ALU_SLT = 2.U(5.W)
    val ALU_SLTI = 2.U(5.W)
    val ALU_SLTU = 3.U(5.W)
    val ALU_SLTUI = 3.U(5.W)
    val ALU_XOR = 4.U(5.W)
    val ALU_XORI = 4.U(5.W)
    val ALU_SRL = 5.U(5.W)
    val ALU_SRLI = 5.U(5.W)
    val ALU_OR = 6.U(5.W)
    val ALU_ORI = 6.U(5.W)
    val ALU_AND = 7.U(5.W)
    val ALU_ANDI = 7.U(5.W)
    val ALU_SUB = 8.U(5.W)
    val ALU_SRA = 13.U(5.W)
    val ALU_SRAI = 13.U(5.W)
    val ALU_JAL = 31.U(5.W)
    val ALU_JALR = 31.U(5.W)
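The difference between the two shift operations can be reproduced in plain Scala with the value of s4 taken from the log. This is a minimal sketch; `ShiftModel`, `srl`, and `sra` are hypothetical names, and Scala's `>>>` and `>>` operators stand in for the hardware shifters.

```scala
// Reference model of the two RV32I right shifts (hypothetical helper).
object ShiftModel {
  // Logical right shift: zeros are shifted in (SRL / SRLI).
  def srl(x: Int, shamt: Int): Int = x >>> (shamt & 0x1f)

  // Arithmetic right shift: the sign bit is replicated (SRA / SRAI).
  def sra(x: Int, shamt: Int): Int = x >> (shamt & 0x1f)

  def main(args: Array[String]): Unit = {
    val s4 = 0x83ff0000 // value of s4 before the shift, taken from the log
    println(f"srl: ${srl(s4, 8)}%08x") // prints srl: 0083ff00 (the faulty result)
    println(f"sra: ${sra(s4, 8)}%08x") // prints sra: ff83ff00 (what srai should give)
  }
}
```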

The original ALU Control code could only produce operation codes below 8 for I-type instructions, because it derived the code from func3 alone, which ranges from 000 to 111 (0 to 7).

    // AluControl (Alu_Control.scala)
    // R type
    when (io.aluOp === 0.U) {
      io.out := Cat(0.U(2.W), io.func7, io.func3)
    // I type
    }.elsewhen (io.aluOp === 1.U) {
      io.out := Cat("b00".U(2.W), io.func3)
    }

To fix this issue, we referred to the RISC-V instruction set and extended the ALU module to include the missing instructions. The following table illustrates the I-type and R-type instructions.

[Image not available: table of the I-type and R-type instruction encodings]

To compute the ALU operation code (io.out) correctly, the ALU Control unit must consider the entire func7 field of R-type instructions and of the I-type shift instructions. Originally, func7 was defined as a Bool in ALU Control, so only a single bit of the field (instruction bit 30) reached the unit, which was incorrect. We rectified this by defining func7 as a 7-bit unsigned integer.

    // AluControl (Alu_Control.scala)
    val io = IO(new Bundle {
      val func3 = Input(UInt(3.W))
      val func7 = Input(UInt(7.W)) // changed from Input(Bool())
      val aluOp = Input(UInt(3.W))
      val out = Output(UInt(5.W))
    })

Additionally, outside of ALU Control, we ensured that the correct bits for func7 are properly extracted.

    // PIPELINE (Main.scala)
    ID_EX_.io.func7_in := IF_ID_.io.SelectedInstr_out(31, 25) // changed from IF_ID_.io.SelectedInstr_out(30)

The revised ALU Control code snippet below now supports SRA, SRAI, and SUB instructions, which have operation codes greater than 7.

    // AluControl (Alu_Control.scala)
    // R type
    when (io.aluOp === 0.U) {
      when ((io.func7 === "b0100000".U(7.W)) && (io.func3 === "b000".U(3.W))) {
        io.out := 8.U // SUB, originally broken
      }.elsewhen ((io.func7 === "b0100000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
        io.out := 13.U // SRA, originally broken
      }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
        io.out := 5.U // SRL
      }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b001".U(3.W))) {
        io.out := 1.U // SLL
      }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b010".U(3.W))) {
        io.out := 2.U // SLT
      }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b011".U(3.W))) {
        io.out := 3.U // SLTU
      }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b100".U(3.W))) {
        io.out := 4.U // XOR
      }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b111".U(3.W))) {
        io.out := 7.U // AND
      }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b110".U(3.W))) {
        io.out := 6.U // OR
      }.otherwise {
        io.out := Cat(0.U(2.W), io.func7, io.func3)
      }
    // I type
    }.elsewhen(io.aluOp === 1.U) {
      when ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
        io.out := 5.U // SRLI
      }.elsewhen ((io.func7 === "b0100000".U(7.W)) && (io.func3 === "b101".U(3.W))) {
        io.out := 13.U // SRAI, originally broken
      }.elsewhen ((io.func7 === "b0000000".U(7.W)) && (io.func3 === "b001".U(3.W))) {
        io.out := 1.U // SLLI
      }.otherwise {
        io.out := Cat("b00".U(2.W), io.func3)
      }
    }
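The decoding rule can be summarised as a plain Scala function and checked against instruction words from the log. This is a simplified sketch under stated assumptions: `AluControlModel` and its members are hypothetical names, only valid RV32I encodings are considered (so every case other than SUB, SRA, and SRAI reduces to func3), and aluOp selectors other than 0 and 1 are not modelled.

```scala
// Software model of the revised ALU Control decoding (hypothetical helper).
object AluControlModel {
  // Field extraction from a 32-bit instruction word.
  def func7(inst: Int): Int = (inst >>> 25) & 0x7f
  def func3(inst: Int): Int = (inst >>> 12) & 0x7

  val RType = 0 // aluOp selector for R-type instructions
  val IType = 1 // aluOp selector for I-type instructions

  def decode(aluOp: Int, f7: Int, f3: Int): Int = (aluOp, f7, f3) match {
    case (RType, 0x20, 0) => 8  // SUB
    case (RType, 0x20, 5) => 13 // SRA
    case (IType, 0x20, 5) => 13 // SRAI
    case (RType, _, _)    => f3 // remaining RV32I R-type encodings have func7 = 0
    case (IType, _, _)    => f3 // for non-shift I-type, bits 31:25 belong to the immediate
    case _                => 0  // other selectors are not modelled in this sketch
  }

  def main(args: Array[String]): Unit = {
    val srai = 0x408a5a13 // srai s4, s4, 8 (from the log)
    println(decode(IType, func7(srai), func3(srai))) // prints 13
  }
}
```

Taking the func7 bits into account only for func3 = 101 on the I-type path matters, because for instructions such as addi the same bit positions hold immediate bits.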

M Extension

1. ALU Control

Filepath: src/main/scala/Pipeline/UNits/Alu_Control.scala

To implement the M-extension, we need to modify the AluControl module to allow the func7 signal to be passed into the module. Below is the updated definition for the AluControl class:

    class AluControl extends Module {
      val io = IO(new Bundle {
        val func3 = Input(UInt(3.W)) // 3-bit function code for RISC-V instructions
        val func7 = Input(UInt(7.W)) // 7-bit function code for RISC-V instructions (used for M-extension)
        val aluOp = Input(UInt(3.W)) // ALU operation selector
        val out = Output(UInt(5.W)) // ALU operation output code
      })
      io.out := 0.U
      ...
    }

In the R-type instruction logic, we need to add a condition to handle M-extension instructions. Specifically, when func7 equals b0000001, the instruction corresponds to an M-extension operation, such as multiplication (MUL), division (DIV), or remainder (REM). Below is the updated code for supporting M-extension:

    // R type
    when(io.aluOp === 0.U) {
      // First, check for M-extension: func7 === "b0000001"
      when(io.func7 === "b0000001".U) {
        // M-extension operations (e.g., MUL, DIV, REM)
        switch(io.func3) {
          is("b000".U) { io.out := 14.U } // MUL
          is("b001".U) { io.out := 15.U } // MULH
          is("b010".U) { io.out := 16.U } // MULHSU
          is("b011".U) { io.out := 17.U } // MULHU
          is("b100".U) { io.out := 18.U } // DIV
          is("b101".U) { io.out := 19.U } // DIVU
          is("b110".U) { io.out := 20.U } // REM
          is("b111".U) { io.out := 21.U } // REMU
        }
      ...
    }
  1. Adding func7 Input:
    • The func7 signal is now passed as an input to the AluControl module. This allows the module to distinguish between standard R-type instructions and M-extension instructions, as M-extension operations are identified by func7 === "b0000001".
  2. Condition for M-extension:
    • A new when block is introduced to check if func7 equals b0000001, which indicates an M-extension instruction.
    • Inside this block, a switch statement is used to determine the specific operation based on the func3 value.
  3. Assigning ALU Operation Codes:
    • Each M-extension instruction (e.g., MUL, DIV, REM) is assigned a unique 5-bit operation code (14.U to 21.U), which corresponds to the predefined codes in the ALU.
2. ALU

Extending the ALU to Support M-extension Instructions

To fully implement the M-extension, we need to modify both the AluOpCode object and the ALU module. Below are the detailed steps with the modifications.

  1. Modifying AluOpCode to Include M-extension Instruction Types

    We add operation codes for the M-extension instructions (MUL, DIV, REM, etc.) in the AluOpCode object. These codes will represent each specific M-extension operation.

    object AluOpCode {
      ...
      // M-extension operations
      val ALU_MUL = 14.U(5.W) // Multiplication
      val ALU_MULH = 15.U(5.W) // Multiplication high (signed)
      val ALU_MULHSU = 16.U(5.W) // Multiplication high (signed x unsigned)
      val ALU_MULHU = 17.U(5.W) // Multiplication high (unsigned)
      val ALU_DIV = 18.U(5.W) // Division (signed)
      val ALU_DIVU = 19.U(5.W) // Division (unsigned)
      val ALU_REM = 20.U(5.W) // Remainder (signed)
      val ALU_REMU = 21.U(5.W) // Remainder (unsigned)
    }
  • Each M-extension instruction is assigned a unique 5-bit operation code.
  • These codes are used by the AluControl module to generate the appropriate alu_Op value for the ALU.
  2. Implementing M-extension Operations in the ALU Module

    The ALU module is extended to perform the M-extension operations based on the alu_Op value provided.

    class ALU extends Module {
      val io = IO(new Bundle {
        val in_A = Input(SInt(32.W)) // First operand
        val in_B = Input(SInt(32.W)) // Second operand
        val alu_Op = Input(UInt(5.W)) // ALU operation code
        val out = Output(SInt(32.W)) // ALU result
      })
      val result = WireDefault(0.S(32.W)) // Default result is zero
      switch(io.alu_Op) {
        ...
        // M-extension operations
        is(ALU_MUL) {
          result := io.in_A * io.in_B // Standard multiplication
        }
        is(ALU_MULH) {
          result := (io.in_A * io.in_B)(63, 32).asSInt // High 32 bits of signed multiplication
        }
        is(ALU_MULHSU) {
          result := (io.in_A.asSInt * io.in_B.asUInt)(63, 32).asSInt // High 32 bits (signed x unsigned)
        }
        is(ALU_MULHU) {
          result := (io.in_A.asUInt * io.in_B.asUInt)(63, 32).asSInt // High 32 bits of unsigned multiplication
        }
        is(ALU_DIV) {
          result := io.in_A / io.in_B // Signed division
        }
        is(ALU_DIVU) {
          result := (io.in_A.asUInt / io.in_B.asUInt).asSInt // Unsigned division
        }
        is(ALU_REM) {
          result := io.in_A % io.in_B // Signed remainder
        }
        is(ALU_REMU) {
          result := (io.in_A.asUInt % io.in_B.asUInt).asSInt // Unsigned remainder
        }
      }
      io.out := result // Output the result
    }
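A software reference model is useful for checking the hardware results in simulation. The sketch below is plain Scala with hypothetical names (`MExtModel`, `exec`); it follows the RV32M rules, including the results the specification defines for division by zero and for signed overflow. The ALU snippet above does not show special handling for those cases, so whether the generated hardware matches them is something to verify rather than assume.

```scala
// Reference model of the RV32M operations on 32-bit values (hypothetical helper).
object MExtModel {
  private def u(x: Int): Long = x & 0xffffffffL // zero-extend to 64 bits

  def mul(a: Int, b: Int): Int    = a * b                              // low 32 bits
  def mulh(a: Int, b: Int): Int   = ((a.toLong * b.toLong) >> 32).toInt // signed x signed
  def mulhsu(a: Int, b: Int): Int = ((a.toLong * u(b)) >> 32).toInt     // signed x unsigned
  def mulhu(a: Int, b: Int): Int  = ((u(a) * u(b)) >>> 32).toInt        // unsigned x unsigned

  def div(a: Int, b: Int): Int =
    if (b == 0) -1                                        // division by zero: all ones
    else if (a == Int.MinValue && b == -1) Int.MinValue   // signed overflow
    else a / b                                            // truncates toward zero

  def divu(a: Int, b: Int): Int = if (b == 0) -1 else (u(a) / u(b)).toInt

  def rem(a: Int, b: Int): Int =
    if (b == 0) a                                         // remainder by zero: dividend
    else if (a == Int.MinValue && b == -1) 0              // signed overflow
    else a % b

  def remu(a: Int, b: Int): Int = if (b == 0) a else (u(a) % u(b)).toInt

  // Dispatch on func3, in the same order as the switch in ALU Control.
  def exec(func3: Int, a: Int, b: Int): Int = func3 match {
    case 0 => mul(a, b)
    case 1 => mulh(a, b)
    case 2 => mulhsu(a, b)
    case 3 => mulhu(a, b)
    case 4 => div(a, b)
    case 5 => divu(a, b)
    case 6 => rem(a, b)
    case 7 => remu(a, b)
  }

  def main(args: Array[String]): Unit = {
    println(exec(0, 6, 7))           // prints 42
    println(f"${mulhu(-1, -1)}%08x") // prints fffffffe
    println(div(7, 0))               // prints -1
  }
}
```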

5. Pipeline Flushing

Since branching or jumping occurs during the MEM stage, the instructions that have already entered the IF/ID and ID/EX pipeline registers must be flushed: we replace them with NOP instructions (addi x0, x0, 0) and clear all control signals so that they have no effect. The corrected code is shown below:

    // PIPELINE (Main.scala)
    when(HazardDetect.io.pc_forward === 1.B) {
      PC.io.in := HazardDetect.io.pc_out
    }.otherwise {
      when(control_module.io.next_pc_sel === "b01".U) {
        when(Branch_M.io.br_taken === 1.B && control_module.io.branch === 1.B) {
          PC.io.in := ImmGen.io.SB_type
          // Flush IF/ID
          IF_ID_.io.pc_in := 0.S
          IF_ID_.io.pc4_in := 0.U
          IF_ID_.io.SelectedPC := 0.S
          IF_ID_.io.SelectedInstr := 0.U
          // Flush ID_EX (set inputs to zero so next cycle ID_EX_ registers become 0)
          ID_EX_.io.rs1_in := 0.U
          ID_EX_.io.rs2_in := 0.U
          ID_EX_.io.imm := 0.S
          ID_EX_.io.func3_in := 0.U
          ID_EX_.io.func7_in := 0.U
          ID_EX_.io.rd_in := 0.U
          // Also set the control signals to 0 so no writes occur:
          ID_EX_.io.ctrl_MemWr_in := 0.U
          ID_EX_.io.ctrl_MemRd_in := 0.U
          ID_EX_.io.ctrl_MemToReg_in := 0.U
          ID_EX_.io.ctrl_OpA_in := 0.U
          ID_EX_.io.ctrl_OpB_in := 0.U
          ID_EX_.io.ctrl_Branch_in := 0.U
          ID_EX_.io.ctrl_nextpc_in := 0.U
          ID_EX_.io.IFID_pc4_in := 0.U
          ID_EX_.io.rs1_data_in := 0.S
          ID_EX_.io.rs2_data_in := 0.S
        }.otherwise {
          PC.io.in := PC4.io.out.asSInt
        }
      }.elsewhen(control_module.io.next_pc_sel === "b10".U) {
        PC.io.in := ImmGen.io.UJ_type
        // Flush IF/ID
        IF_ID_.io.pc_in := 0.S
        IF_ID_.io.pc4_in := 0.U
        IF_ID_.io.SelectedPC := 0.S
        IF_ID_.io.SelectedInstr := 0.U
        // Flush ID_EX (set inputs to zero so next cycle ID_EX_ registers become 0)
        ID_EX_.io.rs1_in := 0.U
        ID_EX_.io.rs2_in := 0.U
        ID_EX_.io.imm := 0.S
        ID_EX_.io.func3_in := 0.U
        ID_EX_.io.func7_in := 0.U
        ID_EX_.io.rd_in := 0.U
        // Also set the control signals to 0 so no writes occur:
        ID_EX_.io.ctrl_MemWr_in := 0.U
        ID_EX_.io.ctrl_MemRd_in := 0.U
        ID_EX_.io.ctrl_MemToReg_in := 0.U
        ID_EX_.io.ctrl_OpA_in := 0.U
        ID_EX_.io.ctrl_OpB_in := 0.U
        ID_EX_.io.ctrl_Branch_in := 0.U
        ID_EX_.io.ctrl_nextpc_in := 0.U
        ID_EX_.io.IFID_pc4_in := 0.U
        ID_EX_.io.rs1_data_in := 0.S
        ID_EX_.io.rs2_data_in := 0.S
      }.elsewhen(control_module.io.next_pc_sel === "b11".U) {
        PC.io.in := JALR.io.out.asSInt
        // Flush IF/ID
        IF_ID_.io.pc_in := 0.S
        IF_ID_.io.pc4_in := 0.U
        IF_ID_.io.SelectedPC := 0.S
        IF_ID_.io.SelectedInstr := 0.U
        // Flush ID_EX (set inputs to zero so next cycle ID_EX_ registers become 0)
        ID_EX_.io.rs1_in := 0.U
        ID_EX_.io.rs2_in := 0.U
        ID_EX_.io.imm := 0.S
        ID_EX_.io.func3_in := 0.U
        ID_EX_.io.rs1_data_in := 0.S
        ID_EX_.io.rs2_data_in := 0.S
        // Also set the control signals to 0 so no writes occur:
        ID_EX_.io.ctrl_MemWr_in := 0.U
        ID_EX_.io.ctrl_MemRd_in := 0.U
        ID_EX_.io.ctrl_MemToReg_in := 0.U
        ID_EX_.io.ctrl_OpA_in := 0.U
        ID_EX_.io.ctrl_OpB_in := 0.U
        ID_EX_.io.ctrl_Branch_in := 0.U
        ID_EX_.io.ctrl_nextpc_in := 0.U
        ID_EX_.io.IFID_pc4_in := 0.U
        ID_EX_.io.rs1_data_in := 0.S
        ID_EX_.io.rs2_data_in := 0.S
      }.otherwise {
        PC.io.in := PC4.io.out.asSInt
      }
    }
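For reference, the NOP named above has a fixed encoding in RV32I. The sketch below builds it from the I-type fields in plain Scala; `NopEncoding` and `iType` are hypothetical names. Note that the flush code above drives an all-zero word into IF/ID rather than this encoding; whether an all-zero word is harmless in this design depends on how the control unit decodes opcode 0, which is not shown here.

```scala
// Encoder for RV32I I-type instruction words (hypothetical helper).
object NopEncoding {
  // Field layout: imm[11:0] | rs1 | funct3 | rd | opcode
  def iType(imm: Int, rs1: Int, funct3: Int, rd: Int, opcode: Int): Int =
    ((imm & 0xfff) << 20) | ((rs1 & 0x1f) << 15) | ((funct3 & 0x7) << 12) |
      ((rd & 0x1f) << 7) | (opcode & 0x7f)

  val OpImm = 0x13 // opcode shared by the register-immediate ALU instructions

  // addi x0, x0, 0
  val nop: Int = iType(imm = 0, rs1 = 0, funct3 = 0, rd = 0, opcode = OpImm)

  def main(args: Array[String]): Unit = {
    println(f"$nop%08x") // prints 00000013
    // srai s4, s4, 8 from the earlier log, rebuilt from its fields (s4 is x20).
    println(f"${iType(0x408, 20, 5, 20, OpImm)}%08x") // prints 408a5a13
  }
}
```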

C. Test Cases

1. argmax

This RISC-V assembly program finds the index of the maximum value in a predefined integer array. It initializes the array with three elements (0, 2, 1) and iterates through it to compare each element with the current maximum value. The program uses registers to track the current maximum value (t0), its index (t1), and the current index (t2). If a larger value is found, both the maximum value and its index are updated. Once the loop completes, the index of the maximum value is stored in register a0, and the program exits using a system call. This implementation demonstrates basic array traversal and conditional updates in assembly.

    .data
    array: .word 0, 0, 0
    .text
    _start:
        la a0, array
        li t1, 0
        addi s0, s0, 0
        addi s0, s0, 0
        sw t1, 0(a0)
        li t1, 2
        addi s0, s0, 0
        addi s0, s0, 0
        sw t1, 4(a0)
        li t1, 1
        addi s0, s0, 0
        addi s0, s0, 0
        sw t1, 8(a0)
        li a1, 3
    argmax:
        li t6, 1
        lw t0, 0(a0)
        li t1, 0
        li t2, 1
    loop_start:
        beq t2, a1, end
        addi s0, s0, 0
        addi s0, s0, 0
        addi a0, a0, 4
        addi s0, s0, 0
        addi s0, s0, 0
        lw t3, 0(a0)
        addi s0, s0, 0
        addi s0, s0, 0
        bge t3, t0, set_max_num
        addi s0, s0, 0
        addi s0, s0, 0
        addi t2, t2, 1
        addi s0, s0, 0
        addi s0, s0, 0
        j loop_start
        addi s0, s0, 0
        addi s0, s0, 0
    set_max_num:
        mv t0, t3
        mv t1, t2
        addi t2, t2, 1
        addi s0, s0, 0
        addi s0, s0, 0
        j loop_start
        addi s0, s0, 0
        addi s0, s0, 0
    end:
        mv a0, t1
        li a7, 10
        ecall
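The loop of the assembly program corresponds to the following plain Scala sketch (`ArgmaxModel` is a hypothetical name). It mirrors the `bge` comparison, which also updates on equal values, so among several maxima the last index wins.

```scala
// Software model of the argmax loop (hypothetical helper).
object ArgmaxModel {
  def argmax(xs: Array[Int]): Int = {
    var best = xs(0)  // t0: current maximum value
    var bestIdx = 0   // t1: index of the current maximum
    var i = 1         // t2: current index
    while (i < xs.length) {
      // bge t3, t0, set_max_num: taken when xs(i) >= best
      if (xs(i) >= best) { best = xs(i); bestIdx = i }
      i += 1
    }
    bestIdx
  }

  def main(args: Array[String]): Unit =
    println(argmax(Array(0, 2, 1))) // prints 1
}
```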

2. clz

This RISC-V assembly program calculates the number of leading zeros in a 32-bit integer. The program starts by loading a value (0x70000002) into register a0 and calls the my_clz function. In my_clz, the input value is processed using a bitmask (t3) initialized to 0x80000000 (representing the most significant bit). A loop checks each bit from left to right by performing a bitwise AND operation between the input value and the bitmask. If the current bit is 1, the loop exits; otherwise, the bitmask is right-shifted, and a counter (t1) is incremented. Once the loop completes, the count of leading zeros is returned in a0, and the program exits.

    main:
        li a0, 0x70000002
        jal ra, my_clz
        li a7, 10
        ecall
    my_clz:
        mv t0, a0
        li t1, 0
        li t3, 0x80000000
    clz_loop:
        and t4, t0, t3
        bne t4, x0, exit_clz
        srli t3, t3, 1
        addi t1, t1, 1
        bnez t3, clz_loop
    exit_clz:
        mv a0, t1
        ret
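The bit-scan loop of my_clz can be written as a plain Scala sketch (`ClzModel` is a hypothetical name) and cross-checked against the JDK's `Integer.numberOfLeadingZeros`.

```scala
// Software model of the bit-scan clz loop (hypothetical helper).
object ClzModel {
  def clz(x: Int): Int = {
    var mask = 0x80000000 // t3: starts at the most significant bit
    var count = 0         // t1: number of zero bits seen so far
    while (mask != 0 && (x & mask) == 0) {
      mask = mask >>> 1   // srli t3, t3, 1
      count += 1
    }
    count
  }

  def main(args: Array[String]): Unit = {
    println(clz(0x70000002))                            // prints 1
    println(Integer.numberOfLeadingZeros(0x70000002))   // prints 1
  }
}
```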

3. fabsf

This RISC-V assembly program calculates the absolute value of a 32-bit floating-point number. The program begins by loading the value 0xFFFFFFFF into register a0, representing the input, and then calls the fabsf function. Inside fabsf, a bitmask (0x7FFFFFFF) is loaded into t0, which clears the sign bit of the input number when applied using a bitwise AND operation. The result, stored back in a0, represents the absolute value of the input. Finally, the program exits the function and terminates using a system call.

    main:
        li a0, 0xFFFFFFFF
        jal ra, fabsf
        li a7, 10
        ecall
    fabsf:
        li t0, 0x7FFFFFFF
        and a0, a0, t0
        jr ra
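The same masking can be tried on the bit pattern of a float in plain Scala. This is a minimal sketch; `FabsModel` and `fabsBits` are hypothetical names, and the JDK's `java.lang.Float` conversions are used to move between floats and their raw bits.

```scala
// Software model of fabsf on raw IEEE 754 single-precision bits (hypothetical helper).
object FabsModel {
  // Clearing bit 31 (the sign bit) leaves exponent and mantissa untouched.
  def fabsBits(bits: Int): Int = bits & 0x7fffffff

  def main(args: Array[String]): Unit = {
    println(fabsBits(0xffffffff)) // prints 2147483647
    val bits = java.lang.Float.floatToRawIntBits(-1.5f)
    println(java.lang.Float.intBitsToFloat(fabsBits(bits))) // prints 1.5
  }
}
```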

4. fp16 to 32

This RISC-V assembly program converts a 16-bit floating-point number (FP16) to a 32-bit floating-point number (FP32). The main function loads 0xFFFFFFFF into register a0 and calls the fp16_to_fp32 function; only the low 16 bits (0xFFFF) act as the FP16 input, because the function first shifts a0 left by 16. Within fp16_to_fp32, the program handles sign extraction, normalization, and exponent adjustment. The my_clz function is used to calculate the number of leading zeros for normalization. The program adjusts the FP16 format to FP32 by aligning the mantissa, adding a bias to the exponent, and managing special cases like zeros, infinities, and NaNs. Finally, the result is constructed by combining the sign, exponent, and mantissa and is returned in a0. The program uses the stack to save and restore registers across function calls to maintain execution context.

    main:
        li a0, 0xFFFFFFFF
        jal ra, fp16_to_fp32
        li a7, 10
        ecall
    my_clz:
    my_clz_prologue:
        add t0, x0, a0
    my_clz_padding:
        srli t1, t0, 1
        or t0, t0, t1
        srli t1, t0, 2
        or t0, t0, t1
        srli t1, t0, 4
        or t0, t0, t1
        srli t1, t0, 8
        or t0, t0, t1
        srli t1, t0, 16
        or t0, t0, t1
    my_clz_popcount:
        srli t1, t0, 1
        li t2, 0x55555555
        and t1, t1, t2
        sub t0, t0, t1
        srli t1, t0, 2
        li t2, 0x33333333
        and t1, t1, t2
        and t2, t0, t2
        add t0, t1, t2
        srli t1, t0, 4
        add t1, t1, t0
        li t2, 0x0F0F0F0F
        and t0, t1, t2
        srli t1, t0, 8
        add t0, t0, t1
        srli t1, t0, 16
        add t0, t0, t1
        li t2, 0x3F
        and t0, t0, t2
        li t1, 32
        sub a0, t1, t0
    my_clz_epilogue:
        jr ra
    fp16_to_fp32:
    fp16_to_fp32_prologue:
        addi sp, sp, -28
        sw ra, 0(sp)
        sw s0, 4(sp)
        sw s1, 8(sp)
        sw s2, 12(sp)
        sw s3, 16(sp)
        sw s4, 20(sp)
        sw s5, 24(sp)
    fp16_to_fp32_prologue_after:
        slli s0, a0, 16
        li s1, 0x80000000
        and s1, s1, s0
        li s2, 0x7FFFFFFF
        and s2, s2, s0
        mv a0, s2
        jal ra, my_clz
        li s3, 0
        li t0, 5
        slt t0, t0, a0
        beq t0, x0, fp16_to_fp32_post_overflow_check
        addi s3, a0, -5
    fp16_to_fp32_post_overflow_check:
        li s4, 0x04000000
        add s4, s2, s4
        srai s4, s4, 8
        li t0, 0x7F800000
        and s4, s4, t0
        addi s5, s2, -1
        srli s5, s5, 31
        sll t0, s2, s3
        srli t0, t0, 3
        li t1, 0x70
        sub t1, t1, s3
        slli t1, t1, 23
        add t0, t0, t1
        or t0, t0, s4
        not t1, s5
        and t0, t0, t1
        or a0, s1, t0
    fp16_to_fp32_epilogue:
        lw ra, 0(sp)
        lw s0, 4(sp)
        lw s1, 8(sp)
        lw s2, 12(sp)
        lw s3, 16(sp)
        lw s4, 20(sp)
        lw s5, 24(sp)
        addi sp, sp, 28
        jr ra
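The arithmetic of fp16_to_fp32 can be followed step by step in plain Scala. The sketch below is a literal transcription of the assembly, not a validated converter; `Fp16ToFp32Model` and the `useSra` flag are hypothetical, and `Integer.numberOfLeadingZeros` stands in for my_clz. The flag makes it possible to see what a logical shift in place of srai does to the result.

```scala
// Instruction-by-instruction transcription of fp16_to_fp32 (hypothetical helper).
object Fp16ToFp32Model {
  def fp16ToFp32(a0: Int, useSra: Boolean = true): Int = {
    val w = a0 << 16                                   // s0: slli s0, a0, 16
    val sign = w & 0x80000000                          // s1
    val nonsign = w & 0x7fffffff                       // s2
    val lz = Integer.numberOfLeadingZeros(nonsign)     // a0 after my_clz
    val renormShift = if (lz > 5) lz - 5 else 0        // s3
    val biased = nonsign + 0x04000000                  // add s4, s2, s4 (may wrap negative)
    val shifted = if (useSra) biased >> 8 else biased >>> 8 // srai s4, s4, 8
    val infNanMask = shifted & 0x7f800000              // s4
    val zeroMask = (nonsign - 1) >>> 31                // s5: srli, so 0 or 1
    val body =
      (((nonsign << renormShift) >>> 3) + ((0x70 - renormShift) << 23)) | infNanMask
    sign | (body & ~zeroMask)                          // not, and, or
  }

  def main(args: Array[String]): Unit = {
    println(fp16ToFp32(0xffffffff))                             // prints -8192
    println(f"${fp16ToFp32(0xffffffff, useSra = false)}%08x")   // prints c7ffe000
    println(f"${fp16ToFp32(0x3c00)}%08x")                       // prints 3f800000
  }
}
```

For the input used in the test, the intermediate values 0x83ff0000 and 0x7f800000 match the s4 and alu_arg2 values visible in the log of the Missing Instructions section.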

5. multiply

This RISC-V assembly program performs multiplication using the shift-and-add method, which is a bitwise algorithm. It takes two numbers (multiplier and multiplicand) and calculates their product without using the mul instruction. The program includes branches that negate negative operands before the computation and uses a loop counter initialized to 32 to iterate through each bit of the multiplier. For each bit, it conditionally adds the multiplicand to an accumulator if the bit is 1. The multiplier is shifted right, and the multiplicand is shifted left after each iteration. The result is stored in a0 at the end, and the program exits.

    main:
        li a1, 6
        li a3, 7
        li t0, 0
        li t1, 32
        bltz a1, handle_negative1
        j shift_and_add_loop
        bltz a3, handle_negative2
        j shift_and_add_loop
    handle_negative1:
        neg a1, a1
    handle_negative2:
        neg a3, a3
    shift_and_add_loop:
        beqz t1, end_shift_and_add
        andi t2, a1, 1
        beqz t2, skip_add
        add t0, t0, a3
    skip_add:
        srai a1, a1, 1
        slli a3, a3, 1
        addi t1, t1, -1
        j shift_and_add_loop
    end_shift_and_add:
        mv a0, t0
        li a7, 10
        ecall
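The shift-and-add loop itself can be modelled in a few lines of plain Scala. The sketch covers only the loop (from shift_and_add_loop to end_shift_and_add), not the sign-handling branches before it; `ShiftAddModel` is a hypothetical name.

```scala
// Software model of the 32-iteration shift-and-add loop (hypothetical helper).
object ShiftAddModel {
  def multiply(multiplier: Int, multiplicand: Int): Int = {
    var a = multiplier    // a1
    var b = multiplicand  // a3
    var acc = 0           // t0: accumulator
    var n = 32            // t1: loop counter
    while (n != 0) {
      if ((a & 1) != 0) acc += b // add the multiplicand when the current bit is 1
      a = a >> 1                 // srai a1, a1, 1
      b = b << 1                 // slli a3, a3, 1
      n -= 1
    }
    acc
  }

  def main(args: Array[String]): Unit =
    println(multiply(6, 7)) // prints 42
}
```

Because all 32 bits of the multiplier are visited and the arithmetic wraps at 32 bits, the loop yields the low 32 bits of the product.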

MainTest.scala

    class TOPTest extends FreeSpec with ChiselScalatestTester {
      "argmax test" in {
        test(new PIPELINE("/home/mi2s/FProject/compilation/argmax.txt")) { x =>
          x.clock.step(69)
          x.io.out.expect(1.S)
        }
      }
      "clz test" in {
        test(new PIPELINE("/home/mi2s/FProject/compilation/clz.txt")) { x =>
          x.clock.step(200)
          x.io.out.expect(15.S)
        }
      }
      "fabsf test" in {
        test(new PIPELINE("/home/mi2s/FProject/test_compilation/fabsf.txt")) { x =>
          x.clock.step(200)
          x.io.out.expect(2147483647.S)
        }
      }
      "fp16_to_32 test" in {
        test(new PIPELINE("/home/mi2s/FProject/test_compilation/fp16_to_32.txt")) { x =>
          x.clock.step(107)
          x.io.out.expect(-8192.S)
        }
      }
      "multiply test" in {
        test(new PIPELINE("/home/mi2s/FProject/compilation/multiply.txt")) { x =>
          x.clock.step(370)
          x.io.out.expect(42.S)
        }
      }
    }

Test Result

[info] TOPTest:
[info] - argmax test
[info] - clz test
[info] - fabsf test
[info] - fp16_to_32 test
[info] - multiply test
[info] Run completed in 4 seconds, 621 milliseconds.
[info] Total number of tests run: 6
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 8 s, completed Jan 23, 2025, 6:23:41 PM

D. Chisel Tutorial

  • Construct RISC-V CPU
    • sbt test

      To save the execution history as a file, use sbt test > <filename.txt>.


E. RISC-V Compilation

  • Compiler Environment Setup
    • git clone https://github.com/riscv/riscv-gnu-toolchain
    • cd riscv-gnu-toolchain
    • sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ninja-build
    • make linux
  • Program Compilation
    1. Conversion (*.s to *.elf)
      • Command: riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib -o <out_name>.elf <in_name>.s

        For RISC-V programs utilizing the M-extension, change to -march=rv32im.

    2. Conversion (*.elf to *.bin)
      • Command: riscv64-unknown-elf-objcopy -O binary <in_name>.elf <out_name>.bin
    3. Conversion (*.elf to *.hex)
      • Command: riscv64-unknown-elf-objcopy -O verilog <in_name>.elf <out_name>.hex

        The resulting file must be post-processed before it can be loaded: its contents are encoded in little-endian byte order and contain special characters and whitespace.
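A possible post-processing step is sketched below in plain Scala. It rests on several assumptions that are not confirmed by this document: that the objcopy output consists of address markers starting with `@` followed by whitespace-separated bytes, that the program is one contiguous block starting at the lowest address, and that the instruction memory loader expects one 32-bit hexadecimal word per line. `VerilogHexToWords` and `toWords` are hypothetical names.

```scala
// Converts byte-wise Verilog hex text into 32-bit words (hypothetical helper).
object VerilogHexToWords {
  def toWords(text: String): Seq[String] = {
    // Keep only the byte tokens; drop "@address" markers and empty tokens.
    val bytes = text.split("\\s+").toSeq
      .filter(tok => tok.nonEmpty && !tok.startsWith("@"))
    // Bytes are stored least significant first, so each group of four is reversed.
    bytes.grouped(4).map { group =>
      group.padTo(4, "00").reverse.mkString.toLowerCase
    }.toList
  }

  def main(args: Array[String]): Unit = {
    val sample = "@00000000\n13 00 00 00 B7 02 80 7F\n"
    toWords(sample).foreach(println) // prints 00000013 then 7f8002b7
  }
}
```

To convert a file, its contents can be read with `scala.io.Source.fromFile(path).mkString` and passed to `toWords`.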