# Computer Architecture - Assignment 3

:::warning
__Notice!__
In this assignment, I used ChatGPT to fix my code, understand the problem requirements, and improve my grammar.
:::

contributed by <[Hao2152](https://github.com/Hao2152)>

[Source Code](https://github.com/Hao2152/ca2025-mycpu)

## 1-Single-Cycle Exercise 1~9

### Ex1

```scala=
// ============================================================
// [CA25: Exercise 1] Immediate Extension - RISC-V Instruction Encoding
// ============================================================
// Hint: RISC-V has five immediate formats, requiring correct bit-field
// extraction and sign extension
//
// I-type (12-bit): Used for ADDI, LW, JALR, etc.
//   Immediate located at inst[31:20]
//   Requires sign extension to 32 bits
//   Hint: Use Fill() to replicate sign bit instruction(31)
//
val immI = Cat(
  Fill(Parameters.DataBits - 12, instruction(31)), // Sign extension: replicate bit 31 twenty times
  instruction(31, 20)                              // Immediate: bits [31:20]
)

// S-type (12-bit): Used for SW, SH, SB store instructions
//   Immediate split into two parts: inst[31:25] and inst[11:7]
//   Need to concatenate these parts then sign extend
//   Hint: High bits at upper field, low bits at lower field
//
// TODO: Complete S-type immediate extension
val immS = Cat(
  Fill(Parameters.DataBits - 12, instruction(?)), // Sign extension
  instruction(?),                                 // High 7 bits
  instruction(?)                                  // Low 5 bits
)

// B-type (13-bit): Used for BEQ, BNE, BLT branch instructions
//   Immediate requires reordering: {sign, bit11, bits[10:5], bits[4:1], 0}
//   Note: LSB is always 0 (2-byte alignment)
//   Requires sign extension to 32 bits
//   Hint: B-type bit order is scrambled, must reorder per specification
//
// TODO: Complete B-type immediate extension
val immB = Cat(
  Fill(Parameters.DataBits - 13, instruction(31)), // Sign extension
  instruction(?),                                  // bit [12]
  instruction(?),                                  // bit [11]
  instruction(?),                                  // bits [10:5]
  instruction(?),                                  // bits [4:1]
  ?                                                // bit [0] = 0 (alignment)
)

// U-type (20-bit): Used for LUI, AUIPC
//   Immediate located at inst[31:12], low 12 bits filled with zeros
//   No sign extension needed (placed directly in upper 20 bits)
//   Hint: U-type places 20 bits in result's upper bits, fills low 12 bits with 0
val immU = Cat(instruction(31, 12), 0.U(12.W))

// J-type (21-bit): Used for JAL
//   Immediate requires reordering: {sign, bits[19:12], bit11, bits[10:1], 0}
//   Note: LSB is always 0 (2-byte alignment)
//   Requires sign extension to 32 bits
//   Hint: J-type bit order is scrambled, similar to B-type
//
// TODO: Complete J-type immediate extension
val immJ = Cat(
  Fill(Parameters.DataBits - 21, instruction(31)), // Sign extension
  instruction(?),                                  // bit [20]
  instruction(?),                                  // bits [19:12]
  instruction(?),                                  // bit [11]
  instruction(?),                                  // bits [10:1]
  ?                                                // bit [0] = 0 (alignment)
)

val immediate = MuxLookup(immKind.asUInt, 0.U(Parameters.DataBits.W))(
  Seq(
    ImmediateKind.I.asUInt -> immI,
    ImmediateKind.S.asUInt -> immS,
    ImmediateKind.B.asUInt -> immB,
    ImmediateKind.U.asUInt -> immU,
    ImmediateKind.J.asUInt -> immJ
  )
)
io.ex_immediate := immediate
}
```

#### Answer

- S-Type
```scala=
val immS = Cat(
  Fill(Parameters.DataBits - 12, instruction(31)),
  instruction(31, 25),
  instruction(11, 7)
)
```
- B-Type
```scala=
val immB = Cat(
  Fill(Parameters.DataBits - 13, instruction(31)),
  instruction(31),
  instruction(7),
  instruction(30, 25),
  instruction(11, 8),
  0.U(1.W)
)
```
- J-Type
```scala=
val immJ = Cat(
  Fill(Parameters.DataBits - 21, instruction(31)),
  instruction(31),
  instruction(19, 12),
  instruction(20),
  instruction(30, 21),
  0.U(1.W)
)
```

In Exercise 1, the immediate values are constructed strictly according to the RISC-V ISA specification. For each instruction format, the immediate fields are extracted from the instruction word, reordered if necessary, and then sign-extended to 32 bits.

- S-type immediates are formed by concatenating inst[31:25] and inst[11:7], followed by sign extension using inst[31].
-
  B-type immediates require bit reordering as {inst[31], inst[7], inst[30:25], inst[11:8], 0}. The least significant bit is always zero due to 2-byte alignment, and the immediate is sign-extended.
- J-type immediates are constructed as {inst[31], inst[19:12], inst[20], inst[30:21], 0}, with sign extension applied.
- U-type immediates place inst[31:12] directly in the upper 20 bits and fill the lower 12 bits with zeros. No sign extension is needed because the result already occupies all 32 bits.

For the signed formats (I, S, B, J), the sign bit inst[31] is replicated using Fill() to perform the sign extension.

### Ex2

```scala=
// ============================================================
// [CA25: Exercise 2] Control Signal Generation
// ============================================================
// Hint: Generate correct control signals based on instruction type
//
// Need to determine three key multiplexer selections:
// 1. WriteBack data source selection (wbSource)
// 2. ALU operand 1 selection (aluOp1Sel)
// 3. ALU operand 2 selection (aluOp2Sel)

// WriteBack data source selection:
// - Default: ALU result
// - Load instructions: Read from Memory
// - JAL/JALR: Save PC+4 (return address)
val wbSource = WireDefault(RegWriteSource.ALUResult)
// TODO: Determine when to write back from Memory
when(?) {
  wbSource := RegWriteSource.Memory
}
// TODO: Determine when to write back PC+4
.elsewhen(?) {
  wbSource := RegWriteSource.NextInstructionAddress
}

// ALU operand 1 selection:
// - Default: Register rs1
// - Branch/AUIPC/JAL: Use PC (for calculating target address or PC+offset)
//
val aluOp1Sel = WireDefault(ALUOp1Source.Register)
// TODO: Determine when to use PC as first operand
// Hint: Consider instructions that need PC-relative addressing
when(?) {
  aluOp1Sel := ALUOp1Source.InstructionAddress
}

// ALU operand 2 selection:
// - Default: Register rs2 (for R-type instructions)
// - I-type/S-type/B-type/U-type/J-type: Use immediate
val needsImmediate =
  isLoad || isStore || isOpImm || isBranch || isLui || isAuipc || isJal || isJalr
val aluOp2Sel = WireDefault(ALUOp2Source.Register)
// TODO: Determine when to use immediate as second operand
// Hint: Most instruction types except R-type use immediate
when(?) {
  aluOp2Sel := ALUOp2Source.Immediate
}
```

#### Answer

```scala=
val wbSource = WireDefault(RegWriteSource.ALUResult)
when(isLoad) {
  wbSource := RegWriteSource.Memory
}.elsewhen(isJal || isJalr) {
  wbSource := RegWriteSource.NextInstructionAddress
}
```
```scala=
val aluOp1Sel = WireDefault(ALUOp1Source.Register)
when(isBranch || isAuipc || isJal) {
  aluOp1Sel := ALUOp1Source.InstructionAddress
}
```
```scala=
val aluOp2Sel = WireDefault(ALUOp2Source.Register)
when(needsImmediate) {
  aluOp2Sel := ALUOp2Source.Immediate
}
```

In Exercise 2, control signals are generated to select the correct datapath sources for each instruction type.

- Write-back source selection: The default write-back source is the ALU result. Load instructions write back data from memory, while JAL and JALR write back the next instruction address (PC+4) as the return address.
- ALU operand 1 selection: Most instructions use register rs1 as the first ALU operand. Branch instructions, AUIPC, and JAL require PC-relative address calculation, so the program counter is selected as the first operand.
- ALU operand 2 selection: Every instruction type covered by needsImmediate (load, store, OpImm, branch, LUI, AUIPC, JAL, JALR) uses an immediate as the second ALU operand, so the ALU selects the immediate instead of register rs2. R-type instructions keep the default, rs2.

### Ex3

```scala=
// ============================================================
// [CA25: Exercise 3] ALU Control Logic - Opcode/Funct3/Funct7 Decoding
// ============================================================
// Hint: Determine which ALU operation to execute based on instruction's
// opcode, funct3, and funct7
//
// RISC-V instruction encoding rules:
// - OpImm (I-type): funct3 determines operation type
//   - Special case: SRLI/SRAI distinguished by funct7[5]
// - Op (R-type): funct3 determines operation type
//   - Special cases: ADD/SUB and SRL/SRA distinguished by funct7[5]
// - Other types: All use ADD (for address calculation or immediate loading)

// Default ALU function for address calculation (Branch, Load, Store, JAL, JALR, LUI, AUIPC)
io.alu_funct := ALUFunctions.add

switch(io.opcode) {
  is(InstructionTypes.OpImm) {
    // I-type immediate operation instructions (ADDI, SLTI, XORI, ORI, ANDI, SLLI, SRLI, SRAI)
    io.alu_funct := MuxLookup(io.funct3, ALUFunctions.zero)(
      Seq(
        // TODO: Map funct3 to corresponding ALU operation
        // Hint: Refer to definitions in InstructionsTypeI object
        InstructionsTypeI.addi -> ALUFunctions.add, // Completed example
        InstructionsTypeI.slli -> ALUFunctions.sll,
        InstructionsTypeI.slti -> ALUFunctions.slt,
        InstructionsTypeI.sltiu -> ALUFunctions.sltu,
        // TODO: Complete the following mappings
        InstructionsTypeI.xori -> ?,
        InstructionsTypeI.ori -> ?,
        InstructionsTypeI.andi -> ?,
        // SRLI/SRAI distinguished by funct7[5]:
        //   funct7[5] = 0 → SRLI (logical right shift)
        //   funct7[5] = 1 → SRAI (arithmetic right shift)
        // TODO: Complete Mux selection logic
        InstructionsTypeI.sri -> ?
      )
    )
  }
  is(InstructionTypes.Op) {
    // R-type register operation instructions (ADD, SUB, SLL, SLT, SLTU, XOR, SRL, SRA, OR, AND)
    io.alu_funct := MuxLookup(io.funct3, ALUFunctions.zero)(
      Seq(
        // ADD/SUB distinguished by funct7[5]:
        //   funct7[5] = 0 → ADD
        //   funct7[5] = 1 → SUB
        // TODO: Complete Mux selection logic
        InstructionsTypeR.add_sub -> ?,
        InstructionsTypeR.sll -> ALUFunctions.sll,
        InstructionsTypeR.slt -> ALUFunctions.slt,
        InstructionsTypeR.sltu -> ALUFunctions.sltu,
        // TODO: Complete the following mappings
        InstructionsTypeR.xor -> ?,
        InstructionsTypeR.or -> ?,
        InstructionsTypeR.and -> ?,
        // SRL/SRA distinguished by funct7[5]:
        //   funct7[5] = 0 → SRL (logical right shift)
        //   funct7[5] = 1 → SRA (arithmetic right shift)
        // TODO: Complete Mux selection logic
        InstructionsTypeR.sr -> ?
      )
    )
  }
  // All other instruction types use ADD for address/immediate calculation
  // (Branch, Load, Store, JAL, JALR, LUI, AUIPC) - handled by default assignment above
}
}
```

#### Answer

```scala=
is(InstructionTypes.OpImm) {
  io.alu_funct := MuxLookup(io.funct3, ALUFunctions.zero)(
    Seq(
      InstructionsTypeI.addi -> ALUFunctions.add,
      InstructionsTypeI.slli -> ALUFunctions.sll,
      InstructionsTypeI.slti -> ALUFunctions.slt,
      InstructionsTypeI.sltiu -> ALUFunctions.sltu,
      InstructionsTypeI.xori -> ALUFunctions.xor,
      InstructionsTypeI.ori -> ALUFunctions.or,
      InstructionsTypeI.andi -> ALUFunctions.and,
      InstructionsTypeI.sri -> Mux(io.funct7(5), ALUFunctions.sra, ALUFunctions.srl)
    )
  )
}
```
```scala=
is(InstructionTypes.Op) {
  io.alu_funct := MuxLookup(io.funct3, ALUFunctions.zero)(
    Seq(
      InstructionsTypeR.add_sub -> Mux(io.funct7(5), ALUFunctions.sub, ALUFunctions.add),
      InstructionsTypeR.sll -> ALUFunctions.sll,
      InstructionsTypeR.slt -> ALUFunctions.slt,
      InstructionsTypeR.sltu -> ALUFunctions.sltu,
      InstructionsTypeR.xor -> ALUFunctions.xor,
      InstructionsTypeR.or -> ALUFunctions.or,
      InstructionsTypeR.and -> ALUFunctions.and,
      InstructionsTypeR.sr -> Mux(io.funct7(5), ALUFunctions.sra, ALUFunctions.srl)
    )
  )
}
```

In Exercise 3, the ALU operation is determined by decoding the instruction's opcode, funct3, and funct7 fields according to the RISC-V ISA specification.

For I-type (OpImm) instructions, the ALU operation is selected based on funct3. The shift-right instructions (SRLI and SRAI) share the same funct3 value and are distinguished by funct7[5].

For R-type (Op) instructions, funct3 determines the operation category, while funct7[5] differentiates the special cases ADD vs. SUB and SRL vs. SRA.

All other instruction types (Branch, Load, Store, JAL, JALR, LUI, AUIPC) use the ADD operation by default for address or immediate calculations.

This design follows the RISC-V instruction encoding rules for the RV32I instructions handled here.

### Ex4

```scala=
// ============================================================
// [CA25: Exercise 4] Branch Comparison Logic
// ============================================================
// Hint: Implement all six RV32I branch conditions
//
// Branch types:
// - BEQ/BNE: Equality/inequality comparison (sign-agnostic)
// - BLT/BGE: Signed comparison (requires type conversion)
// - BLTU/BGEU: Unsigned comparison (direct comparison)
val branchCondition = MuxLookup(funct3, false.B)(
  Seq(
    // TODO: Implement six branch conditions
    // Hint: Compare two register data values based on branch type
    InstructionsTypeB.beq -> ?,
    InstructionsTypeB.bne -> ?,
    // Signed comparison (need conversion to signed type)
    InstructionsTypeB.blt -> ?,
    InstructionsTypeB.bge -> ?,
    // Unsigned comparison
    InstructionsTypeB.bltu -> ?,
    InstructionsTypeB.bgeu -> ?
  )
)
val isBranch = opcode === InstructionTypes.Branch
val isJal = opcode === Instructions.jal
val isJalr = opcode === Instructions.jalr
```

#### Answer

```scala=
val branchCondition = MuxLookup(funct3, false.B)(
  Seq(
    // Equality / Inequality (sign-agnostic)
    InstructionsTypeB.beq -> (reg1 === reg2),
    InstructionsTypeB.bne -> (reg1 =/= reg2),
    // Signed comparison
    InstructionsTypeB.blt -> (reg1.asSInt < reg2.asSInt),
    InstructionsTypeB.bge -> (reg1.asSInt >= reg2.asSInt),
    // Unsigned comparison
    InstructionsTypeB.bltu -> (reg1 < reg2),
    InstructionsTypeB.bgeu -> (reg1 >= reg2)
  )
)
```

In Exercise 4, branchCondition implements all six RV32I branch conditions based on funct3, following the RISC-V ISA specification.

- __*beq/bne*__ perform equality/inequality checks and are sign-agnostic, so Chisel's hardware equality operators (===, =/=) are sufficient.
- __*blt/bge*__ require signed comparisons, so both operands must be converted to a signed type (asSInt) before comparing.
- __*bltu/bgeu*__ require unsigned comparisons, so the default unsigned comparisons on UInt are used directly.

### Ex5

```scala=
// ============================================================
// [CA25: Exercise 5] Jump Target Address Calculation
// ============================================================
// Hint: Calculate branch and jump target addresses
//
// Address calculation rules:
// - Branch: PC + immediate (PC-relative)
// - JAL: PC + immediate (PC-relative)
// - JALR: (rs1 + immediate) & ~1 (register base, clear LSB for alignment)
//
// TODO: Complete the following address calculations
val branchTarget = ?
val jalTarget = branchTarget // JAL and Branch use same calculation method

// JALR address calculation:
// 1. Add register value and immediate
// 2. Clear LSB (2-byte alignment)
val jalrSum = ?
// TODO: Clear LSB using bit concatenation
// Hint: Extract upper bits and append zero
val jalrTarget = ?

val branchTaken = isBranch && branchCondition
io.if_jump_flag := branchTaken || isJal || isJalr
io.if_jump_address := Mux(
  isJalr,
  jalrTarget,
  Mux(isJal, jalTarget, branchTarget)
)
}
```

#### Answer

```scala=
val branchTarget = instruction_address + io.ex_immediate
val jalrSum = io.reg1_data + io.ex_immediate
val jalrTarget = Cat(jalrSum(Parameters.DataBits - 1, 1), 0.U(1.W))
```

Exercise 5 computes target addresses for control-flow instructions according to the RV32I specification.

__Branch__ and __jal__ use PC-relative addressing, so the target is computed as:
- target = PC + immediate

__jalr__ uses register-relative addressing, so the target is:
- target = (rs1 + immediate) & ~1

The least significant bit is cleared to enforce 2-byte alignment. In Chisel, this is implemented by concatenating the upper bits with 0:
- Cat(sum(31, 1), 0.U(1.W))

### Ex6

```scala=
// ============================================================
// [CA25: Exercise 6] Load Data Extension - Sign and Zero Extension
// ============================================================
// Hint: Implement proper sign extension and zero extension for load operations
//
// RISC-V Load instruction types:
// - LB (Load Byte): Load 8-bit value and sign-extend to 32 bits
// - LBU (Load Byte Unsigned): Load 8-bit value and zero-extend to 32 bits
// - LH (Load Halfword): Load 16-bit value and sign-extend to 32 bits
// - LHU (Load Halfword Unsigned): Load 16-bit value and zero-extend to 32 bits
// - LW (Load Word): Load full 32-bit value, no extension needed
//
// Sign extension: Replicate the sign bit (MSB) to fill upper bits
//   Example: LB loads 0xFF → sign-extended to 0xFFFFFFFF
// Zero extension: Fill upper bits with zeros
//   Example: LBU loads 0xFF → zero-extended to 0x000000FF
when(io.memory_read_enable) {
  // Optimized load logic: extract bytes/halfwords based on address alignment
  val data = io.memory_bundle.read_data
  val bytes = Wire(Vec(Parameters.WordSize, UInt(Parameters.ByteWidth)))
  for (i <- 0 until Parameters.WordSize) {
    bytes(i) := data((i + 1) * Parameters.ByteBits - 1, i * Parameters.ByteBits)
  }
  // Select byte based on lower 2 address bits (mem_address_index)
  val byte = bytes(mem_address_index)
  // Select halfword based on bit 1 of address (word-aligned halfwords)
  val half = Mux(mem_address_index(1), Cat(bytes(3), bytes(2)), Cat(bytes(1), bytes(0)))

  // TODO: Complete sign/zero extension for load operations
  // Hint:
  // - Use Fill to replicate a bit multiple times
  // - For sign extension: Fill with the sign bit (MSB)
  // - For zero extension: Fill with zeros
  // - Use Cat to concatenate extension bits with loaded data
  io.wb_memory_read_data := MuxLookup(io.funct3, 0.U)(
    Seq(
      // TODO: Complete LB (sign-extend byte)
      // Hint: Replicate sign bit, then concatenate with byte
      InstructionsTypeL.lb -> ?,
      // TODO: Complete LBU (zero-extend byte)
      // Hint: Fill upper bits with zero, then concatenate with byte
      InstructionsTypeL.lbu -> ?,
      // TODO: Complete LH (sign-extend halfword)
      // Hint: Replicate sign bit, then concatenate with halfword
      InstructionsTypeL.lh -> ?,
      // TODO: Complete LHU (zero-extend halfword)
      // Hint: Fill upper bits with zero, then concatenate with halfword
      InstructionsTypeL.lhu -> ?,
      // LW: Load full word, no extension needed (completed example)
      InstructionsTypeL.lw -> data
    )
  )
```

#### Answer

```scala=
io.wb_memory_read_data := MuxLookup(io.funct3, 0.U)(Seq(
  InstructionsTypeL.lb -> Cat(Fill(24, byte(7)), byte),
  InstructionsTypeL.lbu -> Cat(Fill(24, 0.U), byte),
  InstructionsTypeL.lh -> Cat(Fill(16, half(15)), half),
  InstructionsTypeL.lhu -> Cat(Fill(16, 0.U), half),
  InstructionsTypeL.lw -> data
))
```

Exercise 6 implements sign and zero extension for the RV32I load instructions.

- __lb__ loads an 8-bit value and performs sign extension to 32 bits by replicating the byte's sign bit byte(7) into the upper 24 bits.
- __lbu__ loads an 8-bit value and performs zero extension by filling the upper 24 bits with zeros.
- __lh__ loads a 16-bit value and
  performs sign extension by replicating the halfword's sign bit half(15) into the upper 16 bits.
- __lhu__ loads a 16-bit value and performs zero extension by filling the upper 16 bits with zeros.
- __lw__ loads a full 32-bit word, so no extension is needed.

### Ex7

```scala=
// ============================================================
// [CA25: Exercise 7] Store Data Alignment - Byte Strobes and Shifting
// ============================================================
// Hint: Implement proper data alignment and byte strobes for store operations
//
// RISC-V Store instruction types:
// - SB (Store Byte): Write 8-bit value to memory at byte-aligned address
// - SH (Store Halfword): Write 16-bit value to memory at halfword-aligned address
// - SW (Store Word): Write 32-bit value to memory at word-aligned address
//
// Key concepts:
// 1. Byte strobes: Control which bytes in a 32-bit word are written
//    - SB: 1 strobe active (at mem_address_index position)
//    - SH: 2 strobes active (based on address bit 1)
//    - SW: All 4 strobes active
// 2. Data shifting: Align data to correct byte position in 32-bit word
//    - mem_address_index (bits 1:0) indicates byte position
//    - Left shift by (mem_address_index * 8) bits for byte operations
//    - Left shift by 16 bits for upper halfword
//
// Examples:
// - SB to address 0x1002 (index=2): data[7:0] → byte 2, strobe[2]=1
// - SH to address 0x1002 (index=2): data[15:0] → bytes 2-3, strobes[2:3]=1
}.elsewhen(io.memory_write_enable) {
  io.memory_bundle.write_enable := true.B
  io.memory_bundle.address := io.alu_result

  val data = io.reg2_data
  // Optimized store logic: reduce combinational depth by simplifying shift operations
  // mem_address_index is already computed from address alignment (bits 1:0)
  val strobeInit = VecInit(Seq.fill(Parameters.WordSize)(false.B))
  val defaultData = 0.U(Parameters.DataWidth)
  val writeStrobes = WireInit(strobeInit)
  val writeData = WireDefault(defaultData)

  switch(io.funct3) {
    is(InstructionsTypeS.sb) {
      // TODO: Complete store byte logic
      // Hint:
      // 1. Enable single byte strobe at appropriate position
      // 2. Shift byte data to correct position based on address
      writeStrobes(?) := true.B
      writeData := data(?) << (mem_address_index << ?)
    }
    is(InstructionsTypeS.sh) {
      // TODO: Complete store halfword logic
      // Hint: Check address to determine lower/upper halfword position
      when(mem_address_index(?) === 0.U) {
        // Lower halfword (bytes 0-1)
        // TODO: Enable strobes for lower two bytes, no shifting needed
        writeStrobes(?) := true.B
        writeStrobes(?) := true.B
        writeData := data(?)
      }.otherwise {
        // Upper halfword (bytes 2-3)
        // TODO: Enable strobes for upper two bytes, apply appropriate shift
        writeStrobes(?) := true.B
        writeStrobes(?) := true.B
        writeData := data(?) << ?
      }
    }
    is(InstructionsTypeS.sw) {
      // Store word: enable all byte strobes, no shifting needed (completed example)
      writeStrobes := VecInit(Seq.fill(Parameters.WordSize)(true.B))
      writeData := data
    }
  }

  io.memory_bundle.write_data := writeData
  io.memory_bundle.write_strobe := writeStrobes
}
}
```

#### Answer

```scala=
switch(io.funct3) {
  is(InstructionsTypeS.sb) {
    writeStrobes(mem_address_index) := true.B
    writeData := data(7, 0) << (mem_address_index << 3)
  }
  is(InstructionsTypeS.sh) {
    when(mem_address_index(1) === 0.U) {
      writeStrobes(0) := true.B
      writeStrobes(1) := true.B
      writeData := data(15, 0)
    }.otherwise {
      writeStrobes(2) := true.B
      writeStrobes(3) := true.B
      writeData := data(15, 0) << 16
    }
  }
  is(InstructionsTypeS.sw) {
    writeStrobes := VecInit(Seq.fill(Parameters.WordSize)(true.B))
    writeData := data
  }
}
```

Exercise 7 implements byte-lane alignment for the RV32I store instructions using byte strobes and data shifting.

__sb__ (Store Byte) writes only one byte. Therefore, exactly one strobe bit is asserted at mem_address_index, and the low 8 bits of data (rs2[7:0]) are shifted left by (mem_address_index * 8) to place them into the correct byte lane of the 32-bit write data bus.

__sh__ (Store Halfword) writes two bytes.
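
As a rough software model of the lane placement that sb and sh perform, the sketch below uses plain Scala integers instead of Chisel hardware types. The object and method names (`StoreLaneModel`, `storeByte`, `storeHalf`) are hypothetical and not part of the assignment code, and the four strobes are packed into a 4-bit mask in which bit i stands for writeStrobes(i).

```scala
// Plain-Scala model of store lane alignment (illustration only, no Chisel).
// All names here are hypothetical helpers, not part of the mycpu sources.
object StoreLaneModel {
  // SB: returns (writeData, strobeMask); strobeMask bit i enables byte lane i.
  def storeByte(rs2: Int, addr: Int): (Int, Int) = {
    val index = addr & 0x3                   // models mem_address_index = addr(1, 0)
    val data  = (rs2 & 0xFF) << (index * 8)  // models data(7, 0) << (mem_address_index << 3)
    (data, 1 << index)                       // exactly one strobe is set
  }

  // SH: the halfword goes to lanes 0-1 or lanes 2-3, chosen by address bit 1.
  def storeHalf(rs2: Int, addr: Int): (Int, Int) = {
    val upper = (addr & 0x2) != 0            // models mem_address_index(1)
    val half  = rs2 & 0xFFFF                 // models data(15, 0)
    if (upper) (half << 16, 0xC)             // strobes 2 and 3
    else (half, 0x3)                         // strobes 0 and 1
  }
}
```

With rs2 = 0x12345678 and address 0x1002, this model places 0x78 in byte lane 2 for sb and 0x5678 in lanes 2 and 3 for sh, which matches the examples in the exercise comments.
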
The target halfword position is determined by mem_address_index(1):
- If mem_address_index(1)=0, the lower halfword (bytes 0–1) is written with strobes 0 and 1 enabled and no shifting.
- Otherwise, the upper halfword (bytes 2–3) is written with strobes 2 and 3 enabled and the 16-bit data shifted left by 16 bits.

__sw__ (Store Word) writes all four bytes, so all strobes are enabled and no shifting is required.

### Ex8

```scala=
// ============================================================
// [CA25: Exercise 8] WriteBack Source Selection
// ============================================================
// Hint: Select the appropriate write-back data source based on instruction type
//
// WriteBack sources:
// - ALU result (default): Used by arithmetic/logical/branch/jump instructions
// - Memory read data: Used by load instructions (LB, LH, LW, LBU, LHU)
// - Next instruction address (PC+4): Used by JAL/JALR for return address
//
// The control signal regs_write_source (from Decode stage) selects:
// - RegWriteSource.ALUResult (0): Default, use ALU computation result
// - RegWriteSource.Memory (1): Load instruction, use memory read data
// - RegWriteSource.NextInstructionAddress (2): JAL/JALR, save return address
//
// TODO: Complete MuxLookup to multiplex writeback sources
// Hint: Specify default value and cases for each source type
io.regs_write_data := MuxLookup(io.regs_write_source, ?)(
  Seq(
    RegWriteSource.Memory -> ?,
    RegWriteSource.NextInstructionAddress -> ?
  )
)
}
```

#### Answer

```scala=
io.regs_write_data := MuxLookup(io.regs_write_source, io.alu_result)(
  Seq(
    RegWriteSource.Memory -> io.wb_memory_read_data,
    RegWriteSource.NextInstructionAddress -> (instruction_address + 4.U)
  )
)
```

Exercise 8 selects the register write-back data using the control signal regs_write_source.

The default write-back value is the ALU result, which is used by most arithmetic and address-generation instructions.

For load instructions, the write-back value comes from the memory read data (already sign/zero-extended in Exercise 6).

For jal/jalr, the write-back value is PC + 4, the address of the next sequential instruction, which serves as the return address.

### Ex9

```scala=
// ============================================================
// [CA25: Exercise 9] PC Update Logic - Sequential vs Control Flow
// ============================================================
// Hint: Implement program counter (PC) update logic for sequential execution
// and control flow changes
//
// PC update rules:
// 1. Control flow (jump/branch taken): PC = jump target address
//    - When jump flag is asserted, use jump address
//    - Covers: JAL, JALR, and taken branches (BEQ, BNE, BLT, BGE, BLTU, BGEU)
// 2. Sequential execution: PC = PC + 4
//    - When no jump/branch, increment PC by 4 bytes (next instruction)
//    - RISC-V instructions are 4 bytes (32 bits) in RV32I
// 3. Invalid instruction: PC = PC (hold current value)
//    - When instruction is invalid, don't update PC
//    - Insert NOP to prevent illegal instruction execution
//
// Examples:
// - Normal ADD: PC = 0x1000 → next PC = 0x1004 (sequential)
// - JAL offset: PC = 0x1000, target = 0x2000 → next PC = 0x2000 (control flow)
// - BEQ taken: PC = 0x1000, target = 0x0FFC → next PC = 0x0FFC (control flow)
when(io.instruction_valid) {
  io.instruction := io.instruction_read_data
  // TODO: Complete PC update logic
  // Hint: Use multiplexer to select between jump target and sequential PC
  // - Check jump flag condition
  // - True case: Use jump target address
  // - False case: Sequential execution
  pc := ?
}.otherwise {
  // When instruction is invalid, hold PC and insert NOP (ADDI x0, x0, 0)
  // NOP = 0x00000013 allows pipeline to continue safely without side effects
  pc := pc
  io.instruction := 0x00000013.U // NOP: prevents illegal instruction execution
}
io.instruction_address := pc
}
```

#### Answer

```scala=
pc := Mux(io.if_jump_flag, io.if_jump_address, pc + 4.U)
```

Exercise 9 updates the program counter (PC) based on whether a control-flow change occurs.

If if_jump_flag is asserted (jal, jalr, or a taken branch), the next PC is set to the computed jump target address if_jump_address.

Otherwise, the CPU executes sequentially and the PC increments by 4 bytes (PC + 4), since RV32I instructions are 32 bits wide.

### Test All

![image](https://hackmd.io/_uploads/S10CNSCfbg.png)

## 2-MMIO-Trap Exercise 1~15

### Ex1~Ex5

Same as the 1-Single-Cycle exercises, except that Ex2 is not found in 2-mmio-trap.

### Ex6

```scala=
// ============================================================
// [CA25: Exercise 6] Control Signal Generation
// ============================================================
// Hint: Generate correct control signals based on instruction type
//
// Need to determine three key multiplexer selections:
// 1. WriteBack data source selection (wbSource)
// 2. ALU operand 1 selection (aluOp1Sel)
// 3. ALU operand 2 selection (aluOp2Sel)

// WriteBack data source selection:
// - Default: ALU result
// - Load instructions: Read from Memory
// - CSR instructions: Read from CSR
// - JAL/JALR: Save PC+4 (return address)
val wbSource = WireDefault(RegWriteSource.ALUResult)
// TODO: Determine when to write back from Memory
when(?) {
  wbSource := RegWriteSource.Memory
}
// TODO: Determine when to write back from CSR
.elsewhen(?) {
  wbSource := RegWriteSource.CSR
}
// TODO: Determine when to write back PC+4
.elsewhen(?) {
  wbSource := RegWriteSource.NextInstructionAddress
}

// ALU operand 1 selection:
// - Default: Register rs1
// - Branch/AUIPC/JAL: Use PC (for calculating target address or PC+offset)
//
val aluOp1Sel = WireDefault(ALUOp1Source.Register)
// TODO: Determine when to use PC as first operand
// Hint: Consider instructions that need PC-relative addressing
when(?) {
  aluOp1Sel := ALUOp1Source.InstructionAddress
}

// ALU operand 2 selection:
// - Default: Register rs2 (for R-type instructions)
// - I-type/S-type/B-type/U-type/J-type: Use immediate
val needsImmediate =
  isLoad || isStore || isOpImm || isBranch || isLui || isAuipc || isJal || isJalr
val aluOp2Sel = WireDefault(ALUOp2Source.Register)
// TODO: Determine when to use immediate as second operand
// Hint: Most instruction types except R-type use immediate
when(?) {
  aluOp2Sel := ALUOp2Source.Immediate
}

val immKind = WireDefault(ImmediateKind.None)
when(isLoad || isOpImm || isJalr) {
  immKind := ImmediateKind.I
}
when(isStore) {
  immKind := ImmediateKind.S
}
when(isBranch) {
  immKind := ImmediateKind.B
}
when(isLui || isAuipc) {
  immKind := ImmediateKind.U
}
when(isJal) {
  immKind := ImmediateKind.J
}

io.regs_reg1_read_address := Mux(usesRs1, rs1, 0.U)
io.regs_reg2_read_address := Mux(usesRs2, rs2, 0.U)
io.ex_aluop1_source := aluOp1Sel
io.ex_aluop2_source := aluOp2Sel
io.memory_read_enable := isLoad
io.memory_write_enable := isStore
io.wb_reg_write_source := wbSource
io.reg_write_enable := regWrite
io.reg_write_address := rd
io.csr_reg_address := instruction(31, 20)
val csrWrites = isCsr && (
  (funct3 === InstructionsTypeCSR.csrrw) ||
    (funct3 === InstructionsTypeCSR.csrrwi) ||
    (!csrSourceZero && (funct3 === InstructionsTypeCSR.csrrs ||
      funct3 === InstructionsTypeCSR.csrrc ||
      funct3 === InstructionsTypeCSR.csrrsi ||
      funct3 === InstructionsTypeCSR.csrrci))
)
io.csr_reg_write_enable := csrWrites
```

#### Answer

```scala=
val wbSource = WireDefault(RegWriteSource.ALUResult)
when(isLoad) {
  wbSource := RegWriteSource.Memory
}.elsewhen(isCsr) {
  wbSource := RegWriteSource.CSR
}.elsewhen(isJal || isJalr) {
  wbSource := RegWriteSource.NextInstructionAddress
}

val aluOp1Sel = WireDefault(ALUOp1Source.Register)
when(isBranch || isAuipc || isJal) {
  aluOp1Sel := ALUOp1Source.InstructionAddress
}

val aluOp2Sel = WireDefault(ALUOp2Source.Register)
when(needsImmediate) {
  aluOp2Sel := ALUOp2Source.Immediate
}
```

Exercise 6 in the MMIO-Trap design generates control signals for the datapath, extending the single-cycle control logic with CSR support.

- Write-back source selection: The default write-back source is the ALU result. Load instructions write back data from memory. CSR instructions write back the data read from the CSR register. jal and jalr write back the next instruction address (PC + 4) as the return address.
- ALU operand 1 selection: Most
  instructions use register rs1 as the first ALU operand. Branch, AUIPC, and jal instructions require PC-relative address calculation, so the program counter is selected as the first operand.
- ALU operand 2 selection: Every instruction type covered by needsImmediate (load, store, OpImm, branch, LUI, AUIPC, JAL, JALR) uses an immediate as the second ALU operand, so the ALU selects the immediate instead of register rs2. R-type instructions keep the default, rs2.

### Ex7~Ex9

Not found in 2-mmio-trap.

### Ex10

```scala=
// ============================================================
// [CA25: Exercise 10] CSR Register Lookup Table - CSR Address Mapping
// ============================================================
// Hint: Map CSR addresses to corresponding registers
//
// CSR addresses defined in CSRRegister object:
// - MSTATUS (0x300): Machine status register
// - MIE (0x304): Machine interrupt enable register
// - MTVEC (0x305): Machine trap vector base address
// - MSCRATCH (0x340): Machine scratch register
// - MEPC (0x341): Machine exception program counter
// - MCAUSE (0x342): Machine trap cause
// - CycleL (0xC00): Cycle counter low 32 bits
// - CycleH (0xC80): Cycle counter high 32 bits
val regLUT =
  IndexedSeq(
    // TODO: Complete CSR address to register mapping
    CSRRegister.MSTATUS -> ?,
    CSRRegister.MIE -> ?,
    CSRRegister.MTVEC -> ?,
    CSRRegister.MSCRATCH -> ?,
    CSRRegister.MEPC -> ?,
    CSRRegister.MCAUSE -> ?,
    // 64-bit cycle counter split into high and low 32 bits
    // TODO: Extract low 32 bits and high 32 bits from cycles
    CSRRegister.CycleL -> ?,
    CSRRegister.CycleH -> ?,
  )
cycles := cycles + 1.U

// If the pipeline and the CLINT are going to write the CSR at the same time, CLINT writes take priority.
// Interrupt entry (CLINT) must override normal CSR writes to properly handle traps.
io.reg_read_data := MuxLookup(io.reg_read_address_id, 0.U)(regLUT)
io.debug_reg_read_data := MuxLookup(io.debug_reg_read_address, 0.U)(regLUT)

// what data should be passed from csr to clint (Note: what should clint see is the next state of the CPU)
io.clint_access_bundle.mstatus := mstatus
io.clint_access_bundle.mtvec := mtvec
io.clint_access_bundle.mcause := mcause
io.clint_access_bundle.mepc := mepc
io.clint_access_bundle.mie := mie
```

#### Answer

```scala=
val regLUT =
  IndexedSeq(
    CSRRegister.MSTATUS -> mstatus,
    CSRRegister.MIE -> mie,
    CSRRegister.MTVEC -> mtvec,
    CSRRegister.MSCRATCH -> mscratch,
    CSRRegister.MEPC -> mepc,
    CSRRegister.MCAUSE -> mcause,
    // 64-bit cycle counter split into low/high 32 bits
    CSRRegister.CycleL -> cycles(31, 0),
    CSRRegister.CycleH -> cycles(63, 32),
  )
cycles := cycles + 1.U
```

Exercise 10 implements a lookup table (LUT) that maps CSR addresses to their corresponding CSR registers.

Each CSR address defined in the RISC-V privileged specification is associated with an internal register:
- mstatus, mie, mtvec, mscratch, mepc, and mcause are mapped directly to their respective CSR registers.

The 64-bit cycle counter is split into two 32-bit CSRs:
- CycleL returns the lower 32 bits of cycles.
- CycleH returns the upper 32 bits of cycles.

### Ex11

```scala=
// ============================================================
// [CA25: Exercise 11] CSR Write Priority Logic
// ============================================================
// Hint: Handle priority when both CLINT and CPU write to CSRs simultaneously
//
// Write priority rules:
// 1. CLINT direct write (interrupt handling): Highest priority
// 2. CPU CSR instruction write: Secondary priority
//
// CSRs requiring atomic update (interrupt-related):
// - mstatus: Save/restore interrupt enable state
// - mepc: Save exception return address
// - mcause: Record trap cause
when(io.clint_access_bundle.direct_write_enable) {
  // Atomic update when CLINT triggers interrupt
  // TODO: Which CSRs does CLINT need to write?
  ? := io.clint_access_bundle.mstatus_write_data
  ? := io.clint_access_bundle.mepc_write_data
  ? := io.clint_access_bundle.mcause_write_data
}.elsewhen(io.reg_write_enable_id) {
  // CPU CSR instruction write
  // TODO: Update corresponding CSR based on write address
  when(io.reg_write_address_id === CSRRegister.MSTATUS) {
    mstatus := ?
  }.elsewhen(io.reg_write_address_id === CSRRegister.MEPC) {
    ? := io.reg_write_data_ex
  }.elsewhen(io.reg_write_address_id === CSRRegister.MCAUSE) {
    ? := io.reg_write_data_ex
  }
}

// CPU-exclusive CSRs (CLINT never writes these):
// - mie: Machine interrupt enable bits
// - mtvec: Machine trap vector base address
// - mscratch: Machine scratch register for trap handlers
when(io.reg_write_enable_id) {
  // TODO: Complete write logic for these CSRs
  when(io.reg_write_address_id === CSRRegister.MIE) {
    ? := io.reg_write_data_ex
  }.elsewhen(io.reg_write_address_id === CSRRegister.MTVEC) {
    ? := io.reg_write_data_ex
  }.elsewhen(io.reg_write_address_id === CSRRegister.MSCRATCH) {
    ?
:= io.reg_write_data_ex } } } ``` #### Answer ``` scala= when(io.clint_access_bundle.direct_write_enable) { // CLINT has highest priority (atomic update on trap/interrupt) mstatus := io.clint_access_bundle.mstatus_write_data mepc := io.clint_access_bundle.mepc_write_data mcause := io.clint_access_bundle.mcause_write_data }.elsewhen(io.reg_write_enable_id) { // CPU CSR instruction write (secondary priority) when(io.reg_write_address_id === CSRRegister.MSTATUS) { mstatus := io.reg_write_data_ex }.elsewhen(io.reg_write_address_id === CSRRegister.MEPC) { mepc := io.reg_write_data_ex }.elsewhen(io.reg_write_address_id === CSRRegister.MCAUSE) { mcause := io.reg_write_data_ex } } // CPU-exclusive CSRs (CLINT never writes these) when(io.reg_write_enable_id) { when(io.reg_write_address_id === CSRRegister.MIE) { mie := io.reg_write_data_ex }.elsewhen(io.reg_write_address_id === CSRRegister.MTVEC) { mtvec := io.reg_write_data_ex }.elsewhen(io.reg_write_address_id === CSRRegister.MSCRATCH) { mscratch := io.reg_write_data_ex } } ``` Exercise 11 enforces correct write priority when both the CLINT (interrupt/trap handling) and the CPU attempt to update CSRs - Highest priority: CLINT direct writes When an interrupt/trap is taken, CLINT must update mstatus, mepc, and mcause atomically. These updates represent the architectural next state required by the RISC-V privileged specification, so they must override any concurrent CPU CSR writes - Secondary priority: CPU CSR instruction writes If CLINT is not performing a direct write, the CPU may update CSRs through CSR instructions. The write is applied based on reg_write_address_id (e.g., MSTATUS, MEPC, MCAUSE) - CPU-exclusive CSRs Some CSRs (mie, mtvec, mscratch) are not written by CLINT. 
Therefore, they are updated only when reg_write_enable_id is asserted and the CSR address matches ### Ex12-1 ``` scala= // ============================================================ // [CA25: Exercise 12] Load Data Extension - Sign and Zero Extension // ============================================================ // Hint: Implement proper sign extension and zero extension for load operations // // RISC-V Load instruction types: // - LB (Load Byte): Load 8-bit value and sign-extend to 32 bits // - LBU (Load Byte Unsigned): Load 8-bit value and zero-extend to 32 bits // - LH (Load Halfword): Load 16-bit value and sign-extend to 32 bits // - LHU (Load Halfword Unsigned): Load 16-bit value and zero-extend to 32 bits // - LW (Load Word): Load full 32-bit value, no extension needed // // Sign extension: Replicate the sign bit (MSB) to fill upper bits // Example: LB loads 0xFF → sign-extended to 0xFFFFFFFF // Zero extension: Fill upper bits with zeros // Example: LBU loads 0xFF → zero-extended to 0x000000FF when(io.memory_read_enable) { // Optimized load logic: extract bytes/halfwords based on address alignment val data = io.memory_bundle.read_data val bytes = Wire(Vec(Parameters.WordSize, UInt(Parameters.ByteWidth))) for (i <- 0 until Parameters.WordSize) { bytes(i) := data((i + 1) * Parameters.ByteBits - 1, i * Parameters.ByteBits) } // Select byte based on lower 2 address bits (mem_address_index) val byte = bytes(mem_address_index) // Select halfword based on bit 1 of address (word-aligned halfwords) val half = Mux(mem_address_index(1), Cat(bytes(3), bytes(2)), Cat(bytes(1), bytes(0))) // TODO: Complete sign/zero extension for load operations // Hint: // - Use Fill to replicate a bit multiple times // - For sign extension: Fill with the sign bit (MSB) // - For zero extension: Fill with zeros // - Use Cat to concatenate extension bits with loaded data // // Note: This optimized implementation uses MuxLookup for byte selection // to handle all possible byte positions 
(0, 1, 2, 3) in a 32-bit word io.wb_memory_read_data := MuxLookup(io.funct3, 0.U)( Seq( // TODO: Complete LB (sign-extend byte) // Hint: Replicate sign bit, then concatenate with byte InstructionsTypeL.lb -> MuxLookup(mem_address_index, Cat(Fill(24, data(31)), data(31, 24)))( Seq( 0.U -> ?, 1.U -> ?, 2.U -> ? ) ), // TODO: Complete LBU (zero-extend byte) // Hint: Fill upper bits with zero, then concatenate with byte InstructionsTypeL.lbu -> MuxLookup(mem_address_index, Cat(Fill(24, 0.U), data(31, 24)))( Seq( 0.U -> ?, 1.U -> ?, 2.U -> ? ) ), // TODO: Complete LH (sign-extend halfword) // Hint: Replicate sign bit, then concatenate with halfword InstructionsTypeL.lh -> Mux( mem_address_index === 0.U, ?, ? ), // TODO: Complete LHU (zero-extend halfword) // Hint: Fill upper bits with zero, then concatenate with halfword InstructionsTypeL.lhu -> Mux( mem_address_index === 0.U, ?, ? ), // LW: Load full word, no extension needed (completed example) InstructionsTypeL.lw -> data ) ) ``` #### Answer ``` scala= io.wb_memory_read_data := MuxLookup(io.funct3, 0.U)( Seq( // LB (sign-extend byte) InstructionsTypeL.lb -> MuxLookup(mem_address_index, Cat(Fill(24, data(31)), data(31, 24)))( Seq( 0.U -> Cat(Fill(24, data(7)), data(7, 0)), 1.U -> Cat(Fill(24, data(15)), data(15, 8)), 2.U -> Cat(Fill(24, data(23)), data(23, 16)) ) ), // LBU (zero-extend byte) InstructionsTypeL.lbu -> MuxLookup(mem_address_index, Cat(Fill(24, 0.U), data(31, 24)))( Seq( 0.U -> Cat(Fill(24, 0.U), data(7, 0)), 1.U -> Cat(Fill(24, 0.U), data(15, 8)), 2.U -> Cat(Fill(24, 0.U), data(23, 16)) ) ), // LH (sign-extend halfword) InstructionsTypeL.lh -> Mux( mem_address_index === 0.U, Cat(Fill(16, data(15)), data(15, 0)), Cat(Fill(16, data(31)), data(31, 16)) ), // LHU (zero-extend halfword) InstructionsTypeL.lhu -> Mux( mem_address_index === 0.U, Cat(Fill(16, 0.U), data(15, 0)), Cat(Fill(16, 0.U), data(31, 16)) ), // LW InstructionsTypeL.lw -> data ) ) ``` In Exercise 12-1, the load data extension logic is 
implemented to correctly handle byte and halfword loads at every aligned position within a 32-bit memory word, as required by the RV32I specification. Because memory is accessed as a 32-bit word, the exact byte or halfword to be loaded depends on the lower bits of the memory address (mem_address_index). __Byte Loads (LB / LBU)__ For LB and LBU, the CPU must select one of the four bytes inside the 32-bit word: - mem_address_index = 0 → bits [7:0] - mem_address_index = 1 → bits [15:8] - mem_address_index = 2 → bits [23:16] - mem_address_index = 3 → bits [31:24] LB performs sign extension, so the most significant bit of the selected byte is replicated into the upper 24 bits using Fill(24, signBit). LBU performs zero extension, so the upper 24 bits are filled with zeros. __Halfword Loads (LH / LHU)__ For LH and LHU, only two aligned positions are valid: - mem_address_index(1) = 0 → lower halfword ([15:0]) - mem_address_index(1) = 1 → upper halfword ([31:16]) A simple Mux is sufficient to select between these two cases; the answer tests mem_address_index === 0.U, which gives the same result for halfword-aligned addresses (index 0 or 2): - LH performs sign extension by replicating bit 15 or bit 31 into the upper 16 bits - LHU performs zero extension by filling the upper 16 bits with zeros __Word Load (LW)__ For LW, the entire 32-bit word is loaded directly, so no extension or byte selection is required. ### Ex12-2 ``` scala= // ============================================================ // [CA25: Exercise 12] WriteBack Source Selection with CSR Support // ============================================================ // Hint: Select the appropriate write-back data source based on instruction type // // WriteBack sources (extended from single-cycle design): // - ALU result (default): Used by arithmetic/logical/branch/jump instructions // - Memory read data: Used by load instructions (LB, LH, LW, LBU, LHU) // - CSR read data: Used by CSR read instructions (CSRRW, CSRRS, CSRRC) **NEW** // - Next instruction address (PC+4): Used by JAL/JALR for return address // // The control signal regs_write_source (from Decode
stage) selects: // - RegWriteSource.ALUResult (0): Default, use ALU computation result // - RegWriteSource.Memory (1): Load instruction, use memory read data // - RegWriteSource.CSR (2): CSR instruction, use CSR read data **NEW** // - RegWriteSource.NextInstructionAddress (3): JAL/JALR, save return address // // Comparison with 1-single-cycle Exercise 8: // - Single-cycle: 3 sources (ALU, Memory, PC+4) // - MMIO-trap: 4 sources (ALU, Memory, CSR, PC+4) **Added CSR support** // // TODO: Complete MuxLookup to multiplex writeback sources with CSR support // Hint: Specify default value and cases for each source type, including CSR io.regs_write_data := MuxLookup(io.regs_write_source, ?)( Seq( RegWriteSource.Memory -> ?, RegWriteSource.CSR -> ?, RegWriteSource.NextInstructionAddress -> ? ) ) } ``` #### Answer ``` scala= io.regs_write_data := MuxLookup(io.regs_write_source, io.alu_result)( Seq( RegWriteSource.Memory -> io.wb_memory_read_data, RegWriteSource.CSR -> io.csr_reg_read_data, RegWriteSource.NextInstructionAddress -> (instruction_address + 4.U) ) ) ``` Exercise 12-2 extends the write-back selection logic to support CSR instructions in the MMIO-Trap design - The default write-back value is the ALU result, which is used by most arithmetic, logical, and address-generation instructions - For load instructions, the write-back value comes from memory read data, which has already been sign- or zero-extended in Exercise 12-1 - For CSR instructions, the write-back value is the data read from the CSR register, since CSR instructions return the previous CSR value to rd - For jal and jalr, the write-back value is the next instruction address (PC + 4), which serves as the return address ### Ex13-1 ``` scala= // ============================================================ // [CA25: Exercise 13] Interrupt Entry - mstatus State Transition // ============================================================ // Hint: Implement mstatus register update during interrupt/trap entry // // 
mstatus bit positions (RISC-V Privileged Spec): // - Bit 3 (MIE): Machine Interrupt Enable (global interrupt enable) // - Bit 7 (MPIE): Machine Previous Interrupt Enable (saved MIE) // // Trap entry state transition: // 1. Save current interrupt enable: MPIE ← MIE (save before disabling) // 2. Disable interrupts: MIE ← 0 (prevent nested interrupts) // 3. Save return address: mepc ← instruction_address // 4. Record cause: mcause ← interrupt code (bit 31=1 for interrupts) // 5. Jump to handler: PC ← mtvec // // Example: // - Before: mstatus.MIE=1, mstatus.MPIE=? (don't care) // - After: mstatus.MIE=0, mstatus.MPIE=1 (saved previous enable state) // Check individual interrupt source enable based on interrupt type val interrupt_source_enabled = Mux( io.interrupt_flag === InterruptCode.Timer0, interrupt_enable_timer, interrupt_enable_external ) when(io.interrupt_flag =/= InterruptCode.None && interrupt_enable_global && interrupt_source_enabled) { // interrupt io.interrupt_assert := true.B io.interrupt_handler_address := io.csr_bundle.mtvec // TODO: Complete mstatus update logic for interrupt entry // Hint: mstatus bit layout (showing only relevant bits): // [31:13] | [12:11:MPP] | [10:8] | [7:MPIE] | [6:4] | [3:MIE] | [2:0] // Need to: // 1. Set MPP to 0b11 (Machine mode) // 2. Save current MIE to MPIE (bit 7) // 3. 
Clear MIE (bit 3) to disable interrupts io.csr_bundle.mstatus_write_data := Cat( io.csr_bundle.mstatus(31, 13), 3.U(2.W), // mpp ← 0b11 (Machine mode) io.csr_bundle.mstatus(10, 8), mie, // mpie ← mie (save current interrupt enable) io.csr_bundle.mstatus(6, 4), 0.U(1.W), // mie ← 0 (disable interrupts) io.csr_bundle.mstatus(2, 0) ) io.csr_bundle.mepc_write_data := instruction_address io.csr_bundle.mcause_write_data := Cat( 1.U, MuxLookup( io.interrupt_flag, 11.U(31.W) // machine external interrupt )( IndexedSeq( InterruptCode.Timer0 -> 7.U(31.W), ) ) ) io.csr_bundle.direct_write_enable := true.B }.elsewhen(io.instruction === InstructionsEnv.ebreak || io.instruction === InstructionsEnv.ecall) { // exception io.interrupt_assert := true.B io.interrupt_handler_address := io.csr_bundle.mtvec io.csr_bundle.mstatus_write_data := Cat( io.csr_bundle.mstatus(31, 13), 3.U(2.W), // mpp ← 0b11 (Machine mode) io.csr_bundle.mstatus(10, 8), mie, // mpie io.csr_bundle.mstatus(6, 4), 0.U(1.W), // mie io.csr_bundle.mstatus(2, 0) ) io.csr_bundle.mepc_write_data := instruction_address io.csr_bundle.mcause_write_data := Cat( 0.U, MuxLookup(io.instruction, 0.U)( IndexedSeq( InstructionsEnv.ebreak -> 3.U(31.W), InstructionsEnv.ecall -> 11.U(31.W), ) ) ) io.csr_bundle.direct_write_enable := true.B ``` Exercise 13-1 does not require any code modification The interrupt and exception entry logic is already fully implemented in the provided template, so this exercise does not contain any blanks to fill ### Ex13-2 ``` scala= // ============================================================ // [CA25: Exercise 13] Store Data Alignment - Byte Strobes and Shifting // ============================================================ // Hint: Implement proper data alignment and byte strobes for store operations // // RISC-V Store instruction types: // - SB (Store Byte): Write 8-bit value to memory at byte-aligned address // - SH (Store Halfword): Write 16-bit value to memory at halfword-aligned address // - 
SW (Store Word): Write 32-bit value to memory at word-aligned address // // Key concepts: // 1. Byte strobes: Control which bytes in a 32-bit word are written // - SB: 1 strobe active (at mem_address_index position) // - SH: 2 strobes active (based on address bit 1) // - SW: All 4 strobes active // 2. Data shifting: Align data to correct byte position in 32-bit word // - mem_address_index (bits 1:0) indicates byte position // - Left shift by (mem_address_index * 8) bits for byte operations // - Left shift by 16 bits for upper halfword // // Examples: // - SB to address 0x1002 (index=2): data[7:0] → byte 2, strobe[2]=1 // - SH to address 0x1002 (index=2): data[15:0] → bytes 2-3, strobes[2:3]=1 }.elsewhen(io.memory_write_enable) { io.memory_bundle.write_data := io.reg2_data io.memory_bundle.write_enable := true.B io.memory_bundle.write_strobe := VecInit(Seq.fill(Parameters.WordSize)(false.B)) // Optimized store logic: reduce combinational depth by simplifying shift operations // mem_address_index is already computed from address alignment (bits 1:0) when(io.funct3 === InstructionsTypeS.sb) { // TODO: Complete store byte logic // Hint: // 1. Enable single byte strobe at appropriate position // 2. Shift byte data to correct position based on address io.memory_bundle.write_strobe(?) := true.B io.memory_bundle.write_data := io.reg2_data(?) << (mem_address_index << ?) }.elsewhen(io.funct3 === InstructionsTypeS.sh) { // TODO: Complete store halfword logic // Hint: Check address to determine lower/upper halfword position when(mem_address_index(?) 
=== 0.U) { // Lower halfword (bytes 0-1) // TODO: Enable strobes for bytes 0 and 1, no shifting needed for (i <- 0 until Parameters.WordSize / 2) { io.memory_bundle.write_strobe(i) := true.B } io.memory_bundle.write_data := io.reg2_data( Parameters.WordSize / 2 * Parameters.ByteBits - 1, 0 ) }.otherwise { // Upper halfword (bytes 2-3) // TODO: Enable strobes for bytes 2 and 3, shift left by 16 bits for (i <- Parameters.WordSize / 2 until Parameters.WordSize) { io.memory_bundle.write_strobe(i) := true.B } io.memory_bundle.write_data := io.reg2_data( Parameters.WordSize / 2 * Parameters.ByteBits - 1, 0 ) << (Parameters.WordSize / 2 * Parameters.ByteBits) } }.elsewhen(io.funct3 === InstructionsTypeS.sw) { // Store word: enable all byte strobes, no shifting needed (completed example) for (i <- 0 until Parameters.WordSize) { io.memory_bundle.write_strobe(i) := true.B } } } } ``` #### Answer ``` scala= }.elsewhen(io.memory_write_enable) { io.memory_bundle.write_data := io.reg2_data io.memory_bundle.write_enable := true.B io.memory_bundle.write_strobe := VecInit(Seq.fill(Parameters.WordSize)(false.B)) // Store Byte (SB) when(io.funct3 === InstructionsTypeS.sb) { io.memory_bundle.write_strobe(mem_address_index) := true.B io.memory_bundle.write_data := io.reg2_data(7, 0) << (mem_address_index << 3) // Store Halfword (SH) }.elsewhen(io.funct3 === InstructionsTypeS.sh) { when(mem_address_index(1) === 0.U) { // lower halfword (bytes 0,1) for (i <- 0 until Parameters.WordSize / 2) { io.memory_bundle.write_strobe(i) := true.B } io.memory_bundle.write_data := io.reg2_data(Parameters.WordSize / 2 * Parameters.ByteBits - 1, 0) }.otherwise { // upper halfword (bytes 2,3) for (i <- Parameters.WordSize / 2 until Parameters.WordSize) { io.memory_bundle.write_strobe(i) := true.B } io.memory_bundle.write_data := io.reg2_data(Parameters.WordSize / 2 * Parameters.ByteBits - 1, 0) << (Parameters.WordSize / 2 * Parameters.ByteBits) } // Store Word (SW) }.elsewhen(io.funct3 === 
InstructionsTypeS.sw) { for (i <- 0 until Parameters.WordSize) { io.memory_bundle.write_strobe(i) := true.B } io.memory_bundle.write_data := io.reg2_data } } ``` Exercise 13-2 implements correct data alignment and byte strobe control for RV32I store instructions in the MMIO-Trap design Memory writes are performed on a 32-bit data bus, but store instructions may write only a byte (SB), a halfword (SH), or a full word (SW). Therefore, the processor must explicitly specify which byte lanes are written and how the store data is aligned within the 32-bit word - __Store Byte (SB):__ Only one byte is written. The target byte lane is selected using the lower two bits of the memory address (mem_address_index) Exactly one write strobe is asserted, and the lower 8 bits of rs2 are shifted left by (mem_address_index × 8) to align the data with the correct byte position - __Store Halfword (SH):__ Two adjacent bytes are written. The halfword position is determined by mem_address_index(1) If it is zero, the lower halfword (bytes 0–1) is written without shifting Otherwise, the upper halfword (bytes 2–3) is written, and the data is shifted left by 16 bits Corresponding byte strobes are enabled to ensure only the intended bytes are modified - __Store Word (SW):__ All four bytes are written. All write strobes are asserted, and no data shifting is required ### Ex14 ``` scala= // ============================================================ // [CA25: Exercise 14] Trap Return (MRET) - mstatus State Restoration // ============================================================ // Hint: Implement mstatus register update during trap return (MRET instruction) // // MRET (Machine Return) state transition: // 1. Restore interrupt enable: MIE ← MPIE (restore saved state) // 2. Set MPIE to 1: MPIE ← 1 (spec requires MPIE=1 after MRET) // 3. 
Return to saved PC: PC ← mepc // // This is the inverse of trap entry: // - Trap entry: MPIE←MIE, MIE←0 (save and disable) // - MRET: MIE←MPIE, MPIE←1 (restore and reset) // // Example: // - Before MRET: mstatus.MIE=0, mstatus.MPIE=1 (in trap handler) // - After MRET: mstatus.MIE=1, mstatus.MPIE=1 (interrupts re-enabled) }.elsewhen(io.instruction === InstructionsRet.mret) { // ret io.interrupt_assert := true.B io.interrupt_handler_address := io.csr_bundle.mepc // TODO: Complete mstatus update logic for MRET // Hint: mstatus bit layout (showing only relevant bits): // [31:13] | [12:11:MPP] | [10:8] | [7:MPIE] | [6:4] | [3:MIE] | [2:0] // Need to: // 1. Set MPP to 0b11 (Machine mode, for M-mode only systems) // 2. Set MPIE to 1 (bit 7) // 3. Restore MIE from MPIE (bit 3 ← bit 7) io.csr_bundle.mstatus_write_data := Cat( io.csr_bundle.mstatus(31, 13), 3.U(2.W), // mpp ← 0b11 (Machine mode) io.csr_bundle.mstatus(10, 8), 1.U(1.W), // mpie ← 1 (reset MPIE) io.csr_bundle.mstatus(6, 4), mpie, // mie ← mpie (restore interrupt enable) io.csr_bundle.mstatus(2, 0) ) io.csr_bundle.mepc_write_data := io.csr_bundle.mepc io.csr_bundle.mcause_write_data := io.csr_bundle.mcause io.csr_bundle.direct_write_enable := true.B }.otherwise { io.interrupt_assert := false.B io.interrupt_handler_address := io.csr_bundle.mtvec io.csr_bundle.mstatus_write_data := io.csr_bundle.mstatus io.csr_bundle.mepc_write_data := io.csr_bundle.mepc io.csr_bundle.mcause_write_data := io.csr_bundle.mcause io.csr_bundle.direct_write_enable := false.B } // io.interrupt_handler_address := io.csr_bundle.mepc } ``` Exercise 14 covers the MRET instruction, which restores the program counter from mepc and the interrupt-enable state from MPIE so that execution can return from the trap handler to the interrupted program. The provided template already implements this logic, so no code modification is required. ### Ex15 ``` scala= // ============================================================ // [CA25: Exercise 15] PC Update Logic - Sequential vs Control Flow with Interrupts // 
============================================================ // Hint: Implement program counter (PC) update logic for sequential execution, // control flow changes, and interrupt handling // // PC update rules: // 1. Interrupt asserted: PC = interrupt handler address (highest priority) // - When interrupt is asserted, vector to trap handler // - Saves current PC to mepc before jump (handled by CLINT) // 2. Control flow (jump/branch taken): PC = jump target address // - When jump flag is asserted, use jump address // - Covers: JAL, JALR, taken branches, and MRET // 3. Sequential execution: PC = PC + 4 // - When no interrupt/jump/branch, increment PC by 4 bytes (next instruction) // - RISC-V instructions are 4 bytes (32 bits) in RV32I // 4. Invalid instruction: PC = PC (hold current value) // - When instruction is invalid, don't update PC // - Insert NOP to prevent illegal instruction execution // // Priority: Interrupt > Jump/Branch > Sequential // // Examples: // - Normal ADD: PC = 0x1000 → next PC = 0x1004 (sequential) // - JAL offset: PC = 0x1000, target = 0x2000 → next PC = 0x2000 (control flow) // - Timer interrupt: PC = 0x1000, handler = 0x8000 → next PC = 0x8000 (interrupt) when(io.instruction_valid) { io.instruction := io.instruction_read_data // TODO: Complete PC update logic with interrupt priority // Hint: Use nested multiplexer to implement priority: interrupt > jump > sequential // - Outermost multiplexer: Check interrupt condition // - True: Use interrupt handler address // - False: Check jump/branch condition // - Inner multiplexer: Check jump flag // - True: Use jump target address // - False: Sequential execution pc := ? 
}.otherwise { // When instruction is invalid, hold PC and insert NOP (ADDI x0, x0, 0) // NOP = 0x00000013 allows pipeline to continue safely without side effects pc := pc io.instruction := 0x00000013.U // NOP: prevents illegal instruction execution } io.instruction_address := pc } ``` #### Answer ``` scala= pc := Mux( io.interrupt_assert, io.interrupt_handler_address, Mux( io.if_jump_flag, io.if_jump_address, pc + 4.U ) ) ``` Exercise 15 implements the program counter (PC) update logic with proper priority handling in the MMIO-Trap design. The PC update follows a strict priority order: - Interrupt handling (highest priority): When interrupt_assert is asserted, the PC is immediately redirected to the interrupt handler address. Interrupts and exceptions must override all normal control flow to guarantee correct trap handling. - Control-flow instructions (jump / branch): If no interrupt is asserted and a jump or taken branch occurs, the PC is updated to the computed jump target address. This covers jal, jalr, and taken branch instructions. The template comment also lists MRET here, but the CLINT code shown in Exercise 14 redirects MRET through interrupt_assert with mepc as the handler address. - Sequential execution (lowest priority): When neither an interrupt nor a control-flow change occurs, the PC increments by 4 bytes to fetch the next instruction, following the RV32I sequential execution model. ### Test All ![image](https://hackmd.io/_uploads/Syh0Sr0fZl.png) ## 3-Pipeline Exercise 16~21 ### Ex16 ``` scala= // ============================================================ // [CA25: Exercise 16] ALU Operation Implementation - Basic Arithmetic and Logic // ============================================================ // Hint: Implement all RV32I ALU operations // // Completed examples: // - add: Addition (op1 + op2) // - sub: Subtraction (op1 - op2) // // Students need to complete: // - Logical operations: xor, or, and // - Shift operations: sll, srl, sra // - Comparison operations: slt, sltu io.result := 0.U switch(io.func) { is(ALUFunctions.add) { io.result := io.op1 + io.op2 } is(ALUFunctions.sub) { io.result := 
io.op1 - io.op2 } // Shift Operations // Hint: RISC-V specifies that shift amount uses only low 5 bits (max shift 31 bits) is(ALUFunctions.sll) { // TODO: Implement logical left shift (Shift Left Logical) // Hint: Use shift left operator, only use low 5 bits of second operand io.result := ? } is(ALUFunctions.srl) { // TODO: Implement logical right shift (Shift Right Logical) // Hint: Use shift right operator, fill high bits with 0, only use low 5 bits io.result := ? } is(ALUFunctions.sra) { // TODO: Implement arithmetic right shift (Shift Right Arithmetic) // Hint: Need to preserve sign bit, steps: // 1. Convert operand to signed type // 2. Perform arithmetic right shift // 3. Convert back to unsigned type io.result := ? } // Comparison Operations // is(ALUFunctions.slt) { // TODO: Implement signed comparison (Set Less Than) // Hint: Convert both operands to signed type then compare // If op1 < op2 (signed), result is 1, otherwise 0 io.result := ? } is(ALUFunctions.sltu) { // TODO: Implement unsigned comparison (Set Less Than Unsigned) // Hint: Directly compare unsigned values // If op1 < op2 (unsigned), result is 1, otherwise 0 io.result := ? } // Logical Operations // is(ALUFunctions.xor) { // TODO: Implement XOR operation io.result := ? } is(ALUFunctions.or) { // TODO: Implement OR operation io.result := ? } is(ALUFunctions.and) { // TODO: Implement AND operation io.result := ? 
} } } ``` #### Answer ``` scala io.result := 0.U switch(io.func) { is(ALUFunctions.add) { io.result := io.op1 + io.op2 } is(ALUFunctions.sub) { io.result := io.op1 - io.op2 } // Shift Operations (use only low 5 bits of shift amount) is(ALUFunctions.sll) { io.result := io.op1 << io.op2(4, 0) } is(ALUFunctions.srl) { io.result := io.op1 >> io.op2(4, 0) } is(ALUFunctions.sra) { io.result := (io.op1.asSInt >> io.op2(4, 0)).asUInt } // Comparison Operations is(ALUFunctions.slt) { io.result := (io.op1.asSInt < io.op2.asSInt).asUInt } is(ALUFunctions.sltu) { io.result := (io.op1 < io.op2).asUInt } // Logical Operations is(ALUFunctions.xor) { io.result := io.op1 ^ io.op2 } is(ALUFunctions.or) { io.result := io.op1 | io.op2 } is(ALUFunctions.and) { io.result := io.op1 & io.op2 } } ``` Exercise 16 completes the RV32I ALU operations by implementing shifts, comparisons, and logical functions 1. Shift operations (sll/srl/sra): RISC-V defines that the shift amount uses only the lower 5 bits of the second operand (RV32 → 0–31) Therefore, the shift amount is op2(4,0) - SLL: logical left shift (op1 << shamt) - SRL: logical right shift with zero-fill (op1 >> shamt) - SRA: arithmetic right shift that preserves the sign bit, implemented by converting op1 to SInt, shifting, then converting back to UInt 2. Comparison operations (slt/sltu): - slt performs a signed comparison (asSInt) and produces 1 if op1 < op2 - sltu performs an unsigned comparison directly on UInt 3. Logical operations (xor/or/and): - These are implemented using bitwise operators on 32-bit operands. ### Ex17 ``` scala= // ============================================================ // [CA25: Exercise 17] Data Forwarding to EX Stage // ============================================================ // Hint: Resolve RAW (Read After Write) data hazards // // Forwarding conditions (using rs1 as example): // 1. MEM stage needs to write register // 2. Source register matches destination register // 3. 
Destination register is not x0 (x0 is always 0, no forwarding needed) // // Example scenario: // ADD x1, x2, x3 [MEM stage, x1 just computed] // SUB x4, x1, x5 [EX stage, needs x1 value] → Forwarding required! // EX stage rs1 forwarding: Resolve RAW hazards for first ALU operand // TODO: Complete rs1_ex forwarding logic when(? && ? =/= 0.U) { // Condition 1: MEM stage writes register // Condition 2: Register address match // Condition 3: Not x0 register // // Priority 1: Forward from EX/MEM stage (1-cycle RAW hazard) // Most recent result takes precedence io.reg1_forward_ex := ForwardingType.ForwardFromMEM }.elsewhen(? && ? =/= 0.U) { // Condition 1: WB stage writes register // Condition 2: Register address match // Condition 3: Not x0 register // // Priority 2: Forward from MEM/WB stage (2-cycle RAW hazard) // Older result if no newer hazard exists io.reg1_forward_ex := ForwardingType.ForwardFromWB }.otherwise { // No hazard: Use register file value io.reg1_forward_ex := ForwardingType.NoForward } // EX stage rs2 forwarding: Resolve RAW hazards for second ALU operand // TODO: Complete rs2_ex forwarding logic (similar to rs1) when(? && ? && ? =/= 0.U) { // Priority 1: Forward from EX/MEM stage // Example: ADD x1, x2, x3; SUB x4, x5, x1 (forward x1 to rs2) io.reg2_forward_ex := ForwardingType.ForwardFromMEM }.elsewhen(? && ? && ? 
=/= 0.U) { // Priority 2: Forward from MEM/WB stage io.reg2_forward_ex := ForwardingType.ForwardFromWB }.otherwise { // No hazard: Use register file value io.reg2_forward_ex := ForwardingType.NoForward } ``` #### Answer ``` scala= when(io.reg_write_enable_mem && (io.reg_write_address_mem === io.regs_reg1_read_address_ex) && io.reg_write_address_mem =/= 0.U) { io.reg1_forward_ex := ForwardingType.ForwardFromMEM }.elsewhen(io.reg_write_enable_wb && (io.reg_write_address_wb === io.regs_reg1_read_address_ex) && io.reg_write_address_wb =/= 0.U) { io.reg1_forward_ex := ForwardingType.ForwardFromWB }.otherwise { io.reg1_forward_ex := ForwardingType.NoForward } when(io.reg_write_enable_mem && (io.reg_write_address_mem === io.regs_reg2_read_address_ex) && io.reg_write_address_mem =/= 0.U) { io.reg2_forward_ex := ForwardingType.ForwardFromMEM }.elsewhen(io.reg_write_enable_wb && (io.reg_write_address_wb === io.regs_reg2_read_address_ex) && io.reg_write_address_wb =/= 0.U) { io.reg2_forward_ex := ForwardingType.ForwardFromWB }.otherwise { io.reg2_forward_ex := ForwardingType.NoForward } ``` Exercise 17 resolves RAW (Read-After-Write) hazards in a 5-stage pipeline by forwarding the most recently computed value to the EX stage. For each source operand (rs1 and rs2), forwarding is enabled when all three conditions hold: - a later pipeline stage (MEM or WB) will write a register (reg_write_enable_mem or reg_write_enable_wb) - that stage's destination register matches the EX-stage source register address - the destination register is not x0 (since x0 is always zero) Forwarding priority: - EX/MEM (MEM stage) forwarding has higher priority because it contains the newest result (1-cycle hazard) - MEM/WB (WB stage) forwarding is used only if no MEM-stage match exists (2-cycle hazard) ### Ex18 ``` scala= // ============================================================ // [CA25: Exercise 18] Data Forwarding to ID Stage // ============================================================ // Hint: Provide early forwarding for branch instructions to reduce // branch 
penalty // // Significance of ID stage forwarding: // - Branch instructions need to compare rs1 and rs2 in ID stage // - With ID stage forwarding, branch penalty reduced from 2 cycles to 1 cycle // // Example scenario: // ADD x1, x2, x3 [MEM stage] // BEQ x1, x4, label [ID stage, needs x1 for comparison] → Forwarding required! // ID stage rs1 forwarding: Enable early branch operand resolution // TODO: Complete rs1_id forwarding logic when(? && ? && ? =/= 0.U) { // Condition 1: MEM stage writes register // Condition 2: Register address match // Condition 3: Not x0 register // // Forward from EX/MEM to ID for branch comparison // Example: ADD x1, x2, x3 (in EX); BEQ x1, x4, label (in ID) // Without this: 2-cycle branch penalty // With this: 1-cycle branch penalty io.reg1_forward_id := ForwardingType.ForwardFromMEM }.elsewhen(? && ? && ? =/= 0.U) { // Condition 1: WB stage writes register // Condition 2: Register address match // Condition 3: Not x0 register // // Forward from MEM/WB to ID for older dependencies io.reg1_forward_id := ForwardingType.ForwardFromWB }.otherwise { // No forwarding needed for ID stage rs1 io.reg1_forward_id := ForwardingType.NoForward } // ID stage rs2 forwarding: Enable early branch operand resolution // TODO: Complete rs2_id forwarding logic (similar to rs1_id) when(? && ? && ? =/= 0.U) { // Forward from EX/MEM to ID for second branch operand // Critical for instructions like: BEQ x1, x2, label // where both operands may have pending writes io.reg2_forward_id := ForwardingType.ForwardFromMEM }.elsewhen(? && ? && ? 
=/= 0.U) { // Forward from MEM/WB to ID io.reg2_forward_id := ForwardingType.ForwardFromWB }.otherwise { // No forwarding needed for ID stage rs2 io.reg2_forward_id := ForwardingType.NoForward } } ``` #### Answer ``` scala= // ID stage rs1 forwarding when(io.reg_write_enable_mem && (io.reg_write_address_mem === io.regs_reg1_read_address_id) && io.reg_write_address_mem =/= 0.U) { io.reg1_forward_id := ForwardingType.ForwardFromMEM }.elsewhen(io.reg_write_enable_wb && (io.reg_write_address_wb === io.regs_reg1_read_address_id) && io.reg_write_address_wb =/= 0.U) { io.reg1_forward_id := ForwardingType.ForwardFromWB }.otherwise { io.reg1_forward_id := ForwardingType.NoForward } // ID stage rs2 forwarding when(io.reg_write_enable_mem && (io.reg_write_address_mem === io.regs_reg2_read_address_id) && io.reg_write_address_mem =/= 0.U) { io.reg2_forward_id := ForwardingType.ForwardFromMEM }.elsewhen(io.reg_write_enable_wb && (io.reg_write_address_wb === io.regs_reg2_read_address_id) && io.reg_write_address_wb =/= 0.U) { io.reg2_forward_id := ForwardingType.ForwardFromWB }.otherwise { io.reg2_forward_id := ForwardingType.NoForward } ``` Exercise 18 adds early forwarding into the ID stage to reduce branch penalty Branch instructions perform operand comparison (rs1 vs rs2) in the ID stage. 
Without ID-stage forwarding, a branch depending on a recently produced register value must wait until the value is written back, increasing the branch penalty.

For each ID-stage source operand (rs1 and rs2), forwarding is enabled when:
- the upstream stage will write a register (`reg_write_enable_mem` or `reg_write_enable_wb`),
- the destination register matches the ID-stage source register address (`regs_reg*_read_address_id`), and
- the destination register is not x0.

Forwarding priority:
- EX/MEM forwarding has higher priority because it contains the newest result.
- MEM/WB forwarding is used only when there is no EX/MEM match.

### Ex19
``` scala=
// ============================================================
// [CA25: Exercise 19] Pipeline Hazard Detection
// ============================================================
// Hint: Detect data and control hazards, decide when to insert bubbles
// or flush the pipeline
//
// Hazard types:
// 1. Load-use hazard: Load result used immediately by next instruction
// 2. Jump-related hazard: Jump instruction needs register value not ready
// 3. Control hazard: Branch/jump instruction changes PC
//
// Control signals:
// - pc_stall: Freeze PC (don't fetch next instruction)
// - if_stall: Freeze IF/ID register (hold current fetch result)
// - id_flush: Flush ID/EX register (insert NOP bubble)
// - if_flush: Flush IF/ID register (discard wrong-path instruction)

// Complex hazard detection for early branch resolution in ID stage
when(
  // ============ Complex Hazard Detection Logic ============
  // This condition detects multiple hazard scenarios requiring stalls:

  // --- Condition 1: EX stage hazards (1-cycle dependencies) ---
  // TODO: Complete hazard detection conditions
  // Need to detect:
  // 1. Jump instruction in ID stage
  // 2. OR Load instruction in EX stage
  // 3. AND destination register is not x0
  // 4. AND destination register conflicts with ID source registers
  //
  ((?) && // Either:
    // - Jump in ID needs register value, OR
    // - Load in EX (load-use hazard)
    ? =/= 0.U && // Destination is not x0
    (?)) // Destination matches ID source
  //
  // Examples triggering Condition 1:
  // a) Jump dependency: ADD x1, x2, x3 [EX]; JALR x0, x1, 0 [ID] → stall
  // b) Load-use: LW x1, 0(x2) [EX]; ADD x3, x1, x4 [ID] → stall
  // c) Load-branch: LW x1, 0(x2) [EX]; BEQ x1, x4, label [ID] → stall

  || // OR

  // --- Condition 2: MEM stage load with jump dependency (2-cycle) ---
  // TODO: Complete MEM stage hazard detection
  // Need to detect:
  // 1. Jump instruction in ID stage
  // 2. Load instruction in MEM stage
  // 3. Destination register is not x0
  // 4. Destination register conflicts with ID source registers
  //
  (? && // Jump instruction in ID
    ? && // Load instruction in MEM
    ? =/= 0.U && // Load destination not x0
    (?)) // Load dest matches jump source
  //
  // Example triggering Condition 2:
  // LW x1, 0(x2) [MEM]; NOP [EX]; JALR x0, x1, 0 [ID]
  // Even with forwarding, load result needs extra cycle to reach ID stage
) {
  // Stall action: Insert bubble and freeze pipeline
  // TODO: Which control signals need to be set to insert a bubble?
  // Hint:
  // - Flush ID/EX register (insert bubble)
  // - Freeze PC (don't fetch next instruction)
  // - Freeze IF/ID (hold current fetch result)
  io.id_flush := ?
  io.pc_stall := ?
  io.if_stall := ?
}.elsewhen(io.jump_flag) {
  // ============ Control Hazard (Branch Taken) ============
  // Branch resolved in ID stage - only 1 cycle penalty
  // Only flush IF stage (not ID) since branch resolved early
  // TODO: Which stage needs to be flushed when branch is taken?
  // Hint: Branch resolved in ID stage, discard wrong-path instruction
  io.if_flush := ?
  // Note: No ID flush needed - branch already resolved in ID!
  // This is the key optimization: 1-cycle branch penalty vs 2-cycle
}
```
#### Answer
``` scala=
when(
  (
    ((io.jump_id || io.memory_read_enable_ex) &&
      io.reg_write_address_ex =/= 0.U &&
      ((io.reg_write_address_ex === io.regs_reg1_read_address_id) ||
        (io.reg_write_address_ex === io.regs_reg2_read_address_id))) ||
    (io.jump_id && io.memory_read_enable_mem &&
      io.reg_write_address_mem =/= 0.U &&
      (io.reg_write_address_mem === io.regs_reg1_read_address_id))
  )
) {
  io.id_flush := true.B
  io.pc_stall := true.B
  io.if_stall := true.B
}.elsewhen(io.jump_flag) {
  io.if_flush := true.B
}
```
Exercise 19 detects pipeline hazards that cannot be solved by normal forwarding and generates stall/flush control signals.

Condition 1 (EX-stage dependency):
- A stall is required when the EX-stage destination register is not x0, matches rs1 or rs2 of the ID-stage instruction, and either:
  - the ID-stage instruction is a jump (e.g., JALR) that needs the register value being produced in EX, or
  - the instruction in EX is a load (load-use / load-branch hazard).

Condition 2 (MEM-stage load with jump dependency):
- Even with forwarding, if a jump instruction in ID depends on a load currently in MEM, the load data still may not be available early enough for ID-stage jump resolution.
Therefore, an additional stall is inserted when the MEM-stage load destination matches the jump base register (rs1).

Stall action:
- To insert a bubble, the pipeline freezes PC and IF/ID (`pc_stall`, `if_stall`) and flushes ID/EX (`id_flush`).

Control hazard (branch/jump taken):
- When the branch/jump decision is taken in ID (`jump_flag`), the wrong-path instruction in IF/ID is flushed (`if_flush`), achieving a 1-cycle branch penalty.

### Ex20
``` scala=
// ============================================================
// [CA25: Exercise 20] Pipeline Register Flush Logic
// ============================================================
// Hint: Implement pipeline register behavior with stall and flush support
//
// Pipeline register behavior:
// 1. Normal operation: Pass input to output (register contents updated)
// 2. Stall: Hold current output (freeze register)
// 3. Flush: Output NOP/default value (clear invalid instruction)
//
// PipelineRegister module interface:
// - io.in: Input data to register
// - io.stall: Freeze register when true
// - io.flush: Clear register to default when true
// - io.out: Registered output
// - defaultValue: Value output when flushed
//
// For instruction register:
// - Normal: Pass instruction from IF
// - Stall: Keep previous instruction
// - Flush: Output NOP (InstructionsNop.nop = 0x00000013)
//
// TODO: Complete the instantiation and connection
// Hint: Use Module() to instantiate PipelineRegister with appropriate default
val instruction = Module(new PipelineRegister(defaultValue = ?))
instruction.io.in := ?
instruction.io.stall := ?
instruction.io.flush := ?
io.output_instruction := ?

// For instruction address register:
// - Flush: Output entry address (ProgramCounter.EntryAddress)
// TODO: Complete the instantiation and connection
val instruction_address = Module(new PipelineRegister(defaultValue = ?))
instruction_address.io.in := ?
instruction_address.io.stall := ?
instruction_address.io.flush := ?
io.output_instruction_address := ?

// For interrupt flag register:
// - Flush: Output 0 (no interrupt)
// TODO: Complete the instantiation and connection
val interrupt_flag = Module(new PipelineRegister(Parameters.InterruptFlagBits))
interrupt_flag.io.in := ?
interrupt_flag.io.stall := ?
interrupt_flag.io.flush := ?
io.output_interrupt_flag := ?
}
```
#### Answer
``` scala=
val instruction = Module(new PipelineRegister(defaultValue = InstructionsNop.nop))
instruction.io.in := io.input_instruction
instruction.io.stall := io.if_stall
instruction.io.flush := io.if_flush
io.output_instruction := instruction.io.out

val instruction_address = Module(new PipelineRegister(defaultValue = ProgramCounter.EntryAddress.U))
instruction_address.io.in := io.input_instruction_address
instruction_address.io.stall := io.if_stall
instruction_address.io.flush := io.if_flush
io.output_instruction_address := instruction_address.io.out

val interrupt_flag = Module(new PipelineRegister(Parameters.InterruptFlagBits))
interrupt_flag.io.in := io.input_interrupt_flag
interrupt_flag.io.stall := io.if_stall
interrupt_flag.io.flush := io.if_flush
io.output_interrupt_flag := interrupt_flag.io.out
```
Exercise 20 implements the stall and flush behavior of pipeline registers to correctly handle data hazards and control hazards in a pipelined processor.

Each pipeline register supports three operating modes:
- Normal operation
  When neither stall nor flush is asserted, the register captures its input and forwards it to the output in the next cycle.
- Stall (freeze)
  When stall is asserted, the pipeline register holds its current value. This prevents the pipeline from advancing when a hazard is detected, such as a load-use or jump-related dependency.
- Flush (clear)
  When flush is asserted, the pipeline register outputs a predefined default value. Flushing is used to discard incorrect or unwanted instructions fetched along the wrong control path.
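
The three modes can be sketched as a small software model. This is a plain-Scala behavioral sketch, not the Chisel `PipelineRegister` from the repository: the names `PipelineRegisterModel`, `tick`, and `PipelineRegisterDemo` are hypothetical, and letting flush win over stall when both are asserted is an assumption made for illustration (the hazard logic in Ex19 never asserts `if_stall` and `if_flush` in the same cycle, so the choice does not matter for the IF/ID register).

``` scala
// Behavioral software model, NOT the Chisel module from the assignment.
// One call to tick() stands for one rising clock edge.
final class PipelineRegisterModel(defaultValue: Long) {
  // Assumption: the reset value is the same as the flush value.
  private var state: Long = defaultValue

  def out: Long = state

  def tick(in: Long, stall: Boolean, flush: Boolean): Unit = {
    if (flush) state = defaultValue // flush: replace contents with the default (e.g. NOP)
    else if (!stall) state = in     // normal operation: capture the input
    // stall without flush: keep the old contents
  }
}

object PipelineRegisterDemo {
  val Nop: Long = 0x00000013L // same NOP encoding the exercise uses

  def main(args: Array[String]): Unit = {
    val ifId = new PipelineRegisterModel(Nop)

    // The input words below are arbitrary 32-bit values used as stand-ins for instructions.
    ifId.tick(in = 0x00a00093L, stall = false, flush = false)
    println(f"normal: 0x${ifId.out}%08x") // normal: 0x00a00093

    ifId.tick(in = 0x00b00113L, stall = true, flush = false)
    println(f"stall:  0x${ifId.out}%08x") // stall:  0x00a00093 (held)

    ifId.tick(in = 0x00c00193L, stall = false, flush = true)
    println(f"flush:  0x${ifId.out}%08x") // flush:  0x00000013 (NOP)
  }
}
```

In the model, a stalled cycle simply drops the new input, which mirrors how the IF/ID register keeps the same instruction while a bubble is inserted downstream.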
For the instruction register, flushing inserts a NOP (0x00000013) so that no side effects occur from an invalid instruction.

For the instruction address register, flushing outputs the entry address to maintain a safe and consistent program counter state.

For the interrupt flag register, flushing clears the flag to zero, ensuring that no stale interrupt information propagates through the pipeline.

### Ex21
``` scala=
// ============================================================
// [CA25: Exercise 21] Hazard Detection Summary and Analysis
// ============================================================
// Conceptual Exercise: Answer the following questions based on the hazard
// detection logic implemented above
//
// Q1: Why do we need to stall for load-use hazards?
// A: [Student answer here]
// Hint: Consider data dependency and forwarding limitations
//
// Q2: What is the difference between "stall" and "flush" operations?
// A: [Student answer here]
// Hint: Compare their effects on pipeline registers and PC
//
// Q3: Why does jump instruction with register dependency need stall?
// A: [Student answer here]
// Hint: When is jump target address available?
//
// Q4: In this design, why is branch penalty only 1 cycle instead of 2?
// A: [Student answer here]
// Hint: Compare ID-stage vs EX-stage branch resolution
//
// Q5: What would happen if we removed the hazard detection logic entirely?
// A: [Student answer here]
// Hint: Consider data hazards and control flow correctness
//
// Q6: Complete the stall condition summary:
// Stall is needed when:
// 1. ? (EX stage condition)
// 2. ? (MEM stage condition)
//
// Q7: Flush is needed when:
// 1. ? (Branch/Jump condition)
//
}
```
#### Answer
Q1: Why do we need to stall for load-use hazards?
- Ans: We stall on load-use hazards because the load’s data only becomes available at the end of the MEM stage, which is too late for the next instruction. Forwarding cannot supply a value that has not been read from memory yet, so the dependent instruction is held back and a bubble is inserted; otherwise it would use a stale value.

Q2: What is the difference between "stall" and "flush" operations?
- Ans: A stall freezes the PC and the IF/ID register so the current instruction waits for its data, whereas a flush overwrites a pipeline register with its default value (a NOP) to discard an incorrect or outdated instruction.

Q3: Why does jump instruction with register dependency need stall?
- Ans: Jump/branch instructions are resolved in the ID stage and need the latest rs1/rs2 values there; if the producing instruction has not computed its result yet, we must stall until a forwarding path can supply the fresh data.

Q4: In this design, why is branch penalty only 1 cycle instead of 2?
- Ans: Branches are resolved in the ID stage, so only the single instruction already fetched into IF is discarded, yielding a one-cycle penalty instead of two.

Q5: What would happen if we removed the hazard detection logic entirely?
- Ans: Without hazard detection, the CPU would read stale operands or continue down the wrong control path, producing incorrect results or crashing almost immediately.

Q6: Complete the stall condition summary:
- Ans:
  1. The EX-stage instruction writes a register (not x0) that the ID-stage instruction reads, and either the EX-stage instruction is a load or the ID-stage instruction is a jump.
  2. A MEM-stage load’s destination (not x0) matches the ID-stage jump’s source register.

Q7: Flush is needed when:
- Ans: A branch/jump is taken in ID, so the wrong-path instruction fetched in the IF stage must be discarded.

### Test All
![image](https://hackmd.io/_uploads/Sy2vOBAG-e.png)

## NyanCat so cute
![image](https://hackmd.io/_uploads/SJ4Vb6Af-x.png)
