CA2025 : Quiz6

# CA2025 : Quiz6

contributed by < [winterchen](https://github.com/kstoko02) >

## Q70. Why does PC-relative addressing simplify datapath?

Because a PC-relative target needs only one addition, "current PC + immediate", it requires no register read and no special path for the target, so the datapath stays short and clean.

#### Comparing JAL and JALR :

jal rd, imm (PC-relative)
```
rd = PC + 4
PC = PC + imm
```
* The target address depends only on the PC.
* rs1 is not needed at all.

jalr rd, rs1, imm (register-based)
```
rd = PC + 4
PC = (rs1 + imm) & ~1
```
* The target address comes from a general-purpose register (rs1).
* The result must be aligned: the least-significant bit is cleared.

The control logic for a PC-relative target is almost fixed. Because the target calculation depends only on the PC and not on any register, it needs no forwarding and cannot cause a data hazard.

Reference : [Lecture 19: Datapath II p.25~29](https://docs.google.com/presentation/d/1nJknDrB402GuriZuqeYOVpoJv9bvYoZeFRSPKzqjdqw/edit?slide=id.p26#slide=id.p26)

### MyCPU discussions :

In `1-single-cycle/src/main/scala/riscv/core/Execute.scala`, the jump target addresses are computed as follows:

```scala
val branchTarget = io.instruction_address + io.immediate
val jalTarget    = branchTarget
```
* Branch target = PC + imm
* JAL target = PC + imm

```scala
val jalrSum    = io.reg1_data + io.immediate
val jalrTarget = Cat(jalrSum(31, 1), 0.U(1.W))
```
* Compared with JAL/branch, JALR needs two extra things:
    1. An extra input: the source register rs1, on which the target depends.
    2. An extra alignment step: clearing the LSB of the sum.

In MyCPU's Execute module, jal and branch only need to calculate PC + imm and share the same `branchTarget`. In contrast, jalr must clear the LSB after calculating rs1 + imm, which introduces a register data dependency and additional alignment logic. Therefore, PC-relative addressing significantly simplifies the datapath.

Reference : [Github : ca2025-mycpu_Execute.scala](https://github.com/kstoko02/ca2025-mycpu/blob/main/1-single-cycle/src/main/scala/riscv/core/Execute.scala)

## Q69.
Why is ALU control separated from main opcode decode?

The main opcode decoder determines "what type of instruction this is", while the ALU control determines "which operation the ALU should perform". Packing everything into a single, monolithic decoder would force it to handle all combinations of opcode, funct3, and funct7 at once. This leads to very large case logic, deep and complex combinational paths, and a design that is difficult to debug, modify, and extend. Hierarchical decoding divides the work into smaller, simpler pieces, resulting in a cleaner design that is much easier to maintain and scale.

![Screenshot from 2026-01-07 15-39-00](https://hackmd.io/_uploads/Bkio6FiE-l.png)

* The table lists the R-type instructions. The key point is that the instructions listed on the right (add, sub, slt, xor, srl, or, and) all have the same opcode (0110011). By looking at the opcode alone, the CPU cannot tell which ALU operation to perform; it still needs funct3/funct7. It is therefore better to move this decision into a dedicated module.

Reference : [Lecture 18: Datapath I p.27](https://docs.google.com/presentation/d/1kYFjJROF0OVFPPXCrPXBnlY7d9lP2rwcLxNJvI6DuAE/edit?slide=id.p27#slide=id.p27)

### MyCPU discussions :

In `3-pipeline/src/main/scala/riscv/core/ALUControl.scala`, the design centralizes the task of "selecting what the ALU should do", so the main decoder is not bogged down by the details of funct3/funct7.
The module takes three input fields:
* opcode (7 bits) : Major instruction category (I/R/Load/Store/Branch…)
* funct3 (3 bits) : Subtype (e.g., addi/andi/xori…)
* funct7 (7 bits) : Finer distinction (used specifically to tell add from sub, and srl from sra)

```scala
is(InstructionTypes.I) {
  io.alu_funct := MuxLookup(
    io.funct3,
    ALUFunctions.zero
  )(
    IndexedSeq(
      InstructionsTypeI.addi  -> ALUFunctions.add,
      InstructionsTypeI.slli  -> ALUFunctions.sll,
      InstructionsTypeI.slti  -> ALUFunctions.slt,
      InstructionsTypeI.sltiu -> ALUFunctions.sltu,
      InstructionsTypeI.xori  -> ALUFunctions.xor,
      InstructionsTypeI.ori   -> ALUFunctions.or,
      InstructionsTypeI.andi  -> ALUFunctions.and,
      InstructionsTypeI.sri   -> Mux(io.funct7(5), ALUFunctions.sra, ALUFunctions.srl)
    )
  )
}
```
* For I-type instructions, the ALU operation is selected by funct3 (plus funct7(5) for right shifts).

```scala
is(InstructionTypes.RM) {
  io.alu_funct := MuxLookup(
    io.funct3,
    ALUFunctions.zero
  )(
    IndexedSeq(
      InstructionsTypeR.add_sub -> Mux(io.funct7(5), ALUFunctions.sub, ALUFunctions.add),
      InstructionsTypeR.sll     -> ALUFunctions.sll,
      InstructionsTypeR.slt     -> ALUFunctions.slt,
      InstructionsTypeR.sltu    -> ALUFunctions.sltu,
      InstructionsTypeR.xor     -> ALUFunctions.xor,
      InstructionsTypeR.or      -> ALUFunctions.or,
      InstructionsTypeR.and     -> ALUFunctions.and,
      InstructionsTypeR.sr      -> Mux(io.funct7(5), ALUFunctions.sra, ALUFunctions.srl)
    )
  )
}
```
* R-type instructions also use funct3 + funct7(5) for the finer distinction (add/sub, srl/sra).

```scala
is(InstructionTypes.B) {
  io.alu_funct := ALUFunctions.add
}
is(InstructionTypes.L) {
  io.alu_funct := ALUFunctions.add
}
is(InstructionTypes.S) {
  io.alu_funct := ALUFunctions.add
}
is(Instructions.jal) {
  io.alu_funct := ALUFunctions.add
}
is(Instructions.jalr) {
  io.alu_funct := ALUFunctions.add
}
is(Instructions.lui) {
  io.alu_funct := ALUFunctions.add
}
is(Instructions.auipc) {
  io.alu_funct := ALUFunctions.add
}
```
* For B / L / S / jal / jalr / lui / auipc, the ALU always performs an addition.
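
The two-level decision described above can be modelled in a few lines of plain Scala. This is only a sketch for reasoning about the mapping, not Chisel and not MyCPU's code: `AluControlSketch`, `AluOp`, `aluOp`, and the constants `Op` / `OpImm` are hypothetical names, and every opcode other than the R-type and I-type arithmetic ones is assumed to fall through to "add".

```scala
// Plain-Scala model of two-level ALU decoding (illustrative, not MyCPU's code).
// All names in this object are hypothetical.
object AluControlSketch {
  sealed trait AluOp
  case object Add  extends AluOp
  case object Sub  extends AluOp
  case object Sll  extends AluOp
  case object Slt  extends AluOp
  case object Sltu extends AluOp
  case object Xor  extends AluOp
  case object Srl  extends AluOp
  case object Sra  extends AluOp
  case object Or   extends AluOp
  case object And  extends AluOp

  // RV32I major opcodes for register-register and register-immediate arithmetic.
  val Op    = 0x33 // 0110011
  val OpImm = 0x13 // 0010011

  def aluOp(opcode: Int, funct3: Int, funct7: Int): AluOp = {
    // funct7(5) is instruction bit 30, the add/sub and srl/sra selector.
    val bit30 = ((funct7 >> 5) & 1) == 1
    opcode match {
      case Op | OpImm =>
        (funct3 & 0x7) match {
          case 0 => if (opcode == Op && bit30) Sub else Add // addi has no "sub" form
          case 1 => Sll
          case 2 => Slt
          case 3 => Sltu
          case 4 => Xor
          case 5 => if (bit30) Sra else Srl
          case 6 => Or
          case _ => And
        }
      // Loads, stores, branches, jal, jalr, lui, auipc: assumed to use "add".
      case _ => Add
    }
  }
}
```

For example, `AluControlSketch.aluOp(0x33, 0, 0x20)` selects `Sub`, while the same funct3/funct7 with the I-type opcode `0x13` selects `Add`, since bit 30 of an `addi` is just part of its immediate.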
Therefore, if the two were not separated, the main decoder would have to repeat "ADD" in every opcode branch and also handle the funct3/funct7 details of the R-type and I-type instructions.

Reference : [Github : ca2025-mycpu_ALUControl.scala](https://github.com/kstoko02/ca2025-mycpu/blob/main/3-pipeline/src/main/scala/riscv/core/ALUControl.scala)

## Q68. What pipeline hazard is introduced by CSR instructions?

CSR instructions primarily introduce a **data hazard**.
For example:
```
csrrw  x0, csr, rs1    # Write rs1 to the CSR (here mstatus)
csrrwi rd, csr, uimm   # Read the old mstatus into rd (rd != x0) and write the immediate value
```

|          | Cycle 1  | Cycle 2  | Cycle 3       | Cycle 4 | Cycle 5 |
| -------- | -------- | -------- | ------------- | ------- | ------- |
| csrrw    | IF       | ID       | EX(write CSR) | MEM     | WB      |
| csrrwi   |          | IF       | ID(read CSR)  | EX      | MEM     |

* As shown in the table above, csrrw writes the CSR in the EX stage during cycle 3. In the same cycle, csrrwi reads the same CSR in the ID stage. Unless the value being written is made visible to that read, csrrwi sees the stale value, which is a Read-After-Write (RAW) data hazard.

Reference : [Lecture 20: Processor Design Control p.26~31](https://docs.google.com/presentation/d/1EoTcGYgOlqO7ytnzHmQi_2JB1g-fyXpyYp7AV7dvwMs/edit?slide=id.p27#slide=id.p27)

### MyCPU discussions :

The design in `3-pipeline/src/main/scala/riscv/core/CSR.scala` shows this split:

```scala
io.id_reg_read_data := MuxLookup(io.reg_read_address_id, 0.U)(regLUT)
```
* The CSR is read in the ID stage.

```scala
val reg_write_enable_ex  = Input(Bool())
val reg_write_address_ex = Input(UInt(...))
val reg_write_data_ex    = Input(UInt(...))
...
}.elsewhen(io.reg_write_enable_ex) {
  ...
  mstatus := io.reg_write_data_ex
  ...
}
```
* The CSR is written in the EX stage.

Reading in ID while writing in EX is what creates the RAW hazard described above. `CSR.scala` does, however, include a solution to the same-cycle read conflict between CLINT and the pipeline:
```scala
io.clint_access_bundle.mstatus := Mux(
  io.reg_write_enable_ex && io.reg_write_address_ex === CSRRegister.MSTATUS,
  io.reg_write_data_ex,
  mstatus
)
```
If the pipeline is writing to mstatus in the same cycle, CLINT directly uses `reg_write_data_ex` (the new value) instead of the `mstatus` register (the old value).

Reference : [Github : ca2025-mycpu_CSR.scala](https://github.com/kstoko02/ca2025-mycpu/blob/main/3-pipeline/src/main/scala/riscv/core/CSR.scala)
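
The bypass above can be checked with a small plain-Scala model. This is a sketch under stated assumptions, not MyCPU's implementation: `CsrBypassSketch` and `observedMstatus` are hypothetical names, and the only external fact used is that mstatus lives at CSR address `0x300`.

```scala
// Plain-Scala model of a same-cycle CSR write bypass (illustrative, not MyCPU's code).
// All names in this object are hypothetical.
object CsrBypassSketch {
  // CSR address of mstatus in the RISC-V privileged specification.
  val MstatusAddr = 0x300

  // Value of mstatus that an observer (e.g. CLINT) should see in this cycle:
  // the data being written if the pipeline targets mstatus now, else the stored value.
  def observedMstatus(
      writeEnable: Boolean,
      writeAddr: Int,
      writeData: Long,
      stored: Long
  ): Long =
    if (writeEnable && writeAddr == MstatusAddr) writeData else stored
}
```

With a pending write to `0x300`, the observer sees the new data; with the write disabled, or aimed at another CSR such as `0x305` (mtvec), it sees the stored value. The same selection could in principle be applied to the ID-stage read path, but whether MyCPU does so is not shown in the excerpts above.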