# Final project: Rework Homework3
contributed by <[00853029/ca2023-lab3](https://github.com/00853029/ca2023-lab3)>

## Environment and Setup
> Follow [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p)

* VirtualBox guest: **Ubuntu 22.04.3**
* Install **SDKMAN**
* Install **Eclipse Temurin JDK 8** (the `sdk` command below selects the newest 8.x Temurin build; the test logs report `Temurin Java 1.8.0_392`)
* Install the build tools, Verilator, and GTKWave
```shell
$ sudo apt install build-essential verilator gtkwave
```
* Installing **sbt** from **SDKMAN**
```shell
$ sdk install java $(sdk list java | grep -o "\b8\.[0-9]*\.[0-9]*\-tem" | head -1)
$ sdk install sbt
```
* Alternatively, install **sbt** with `apt` on Ubuntu and other Debian-based distributions
```shell
sudo apt-get update
sudo apt-get install apt-transport-https curl gnupg -yqq
echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee /etc/apt/sources.list.d/sbt_old.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo -H gpg --no-default-keyring --keyring gnupg-ring:/etc/apt/trusted.gpg.d/scalasbt-release.gpg --import
sudo chmod 644 /etc/apt/trusted.gpg.d/scalasbt-release.gpg
sudo apt-get update
sudo apt-get install sbt
```

## Single-cycle RISC-V CPU
#### Get the repository:
```shell
$ git clone https://github.com/sysprog21/ca2023-lab3
```
![image](https://hackmd.io/_uploads/BkaeaDPH6.png)

### Finish MyCPU
#### 1. InstructionFetch
The provided skeleton left the PC (program counter) update unimplemented. I added a condition on the jump flag: when `jump_flag_id` is asserted, the PC takes `jump_address_id`; otherwise it advances to `pc + 4` (the next instruction).
> Source: `./ca2023-lab3/src/main/scala/riscv/core/InstructionFetch.scala`
```scala=
// mycpu is freely redistributable under the MIT License. See the file
// "LICENSE" for information on usage and redistribution of this file.

package riscv.core

import chisel3._
import riscv.Parameters

object ProgramCounter {
  val EntryAddress = Parameters.EntryAddress
}

class InstructionFetch extends Module {
  val io = IO(new Bundle {
    val jump_flag_id          = Input(Bool())
    val jump_address_id       = Input(UInt(Parameters.AddrWidth))
    val instruction_read_data = Input(UInt(Parameters.DataWidth))
    val instruction_valid     = Input(Bool())

    val instruction_address = Output(UInt(Parameters.AddrWidth))
    val instruction         = Output(UInt(Parameters.InstructionWidth))
  })
  val pc = RegInit(ProgramCounter.EntryAddress)

  when(io.instruction_valid) {
    io.instruction := io.instruction_read_data
    // lab3(InstructionFetch) begin
    when(io.jump_flag_id) {
      pc := io.jump_address_id // taken jump or branch
    }.otherwise {
      pc := pc + 0x4.U // next sequential instruction
    }
    // lab3(InstructionFetch) end
  }.otherwise {
    // hold the PC and issue a nop (addi x0, x0, 0)
    pc             := pc
    io.instruction := 0x00000013.U
  }
  io.instruction_address := pc
}
```

#### 2. InstructionDecode
The provided skeleton did not drive the control signals `memory_read_enable` (*MemRead*) and `memory_write_enable` (*MemWrite*). I derive both from the opcode: `memory_read_enable` is asserted for load instructions (`InstructionTypes.L`) and `memory_write_enable` for store instructions (`InstructionTypes.S`).
![image](https://hackmd.io/_uploads/BJ2MpwPr6.png)
> Source: `src/main/scala/riscv/core/InstructionDecode.scala`
```scala=
// mycpu is freely redistributable under the MIT License. See the file
// "LICENSE" for information on usage and redistribution of this file.

package riscv.core

import scala.collection.immutable.ArraySeq

import chisel3._
import chisel3.util._
import riscv.Parameters

object InstructionTypes {
  val L  = "b0000011".U
  val I  = "b0010011".U
  val S  = "b0100011".U
  val RM = "b0110011".U
  val B  = "b1100011".U
}

object Instructions {
  val lui   = "b0110111".U
  val nop   = "b0000001".U
  val jal   = "b1101111".U
  val jalr  = "b1100111".U
  val auipc = "b0010111".U
  val csr   = "b1110011".U
  val fence = "b0001111".U
}

object InstructionsTypeL {
  val lb  = "b000".U
  val lh  = "b001".U
  val lw  = "b010".U
  val lbu = "b100".U
  val lhu = "b101".U
}

object InstructionsTypeI {
  val addi  = 0.U
  val slli  = 1.U
  val slti  = 2.U
  val sltiu = 3.U
  val xori  = 4.U
  val sri   = 5.U
  val ori   = 6.U
  val andi  = 7.U
}

object InstructionsTypeS {
  val sb = "b000".U
  val sh = "b001".U
  val sw = "b010".U
}

object InstructionsTypeR {
  val add_sub = 0.U
  val sll     = 1.U
  val slt     = 2.U
  val sltu    = 3.U
  val xor     = 4.U
  val sr      = 5.U
  val or      = 6.U
  val and     = 7.U
}

object InstructionsTypeM {
  val mul    = 0.U
  val mulh   = 1.U
  val mulhsu = 2.U
  val mulhum = 3.U
  val div    = 4.U
  val divu   = 5.U
  val rem    = 6.U
  val remu   = 7.U
}

object InstructionsTypeB {
  val beq  = "b000".U
  val bne  = "b001".U
  val blt  = "b100".U
  val bge  = "b101".U
  val bltu = "b110".U
  val bgeu = "b111".U
}

object InstructionsTypeCSR {
  val csrrw  = "b001".U
  val csrrs  = "b010".U
  val csrrc  = "b011".U
  val csrrwi = "b101".U
  val csrrsi = "b110".U
  val csrrci = "b111".U
}

object InstructionsNop {
  val nop = 0x00000013L.U(Parameters.DataWidth)
}

object InstructionsRet {
  val mret = 0x30200073L.U(Parameters.DataWidth)
  val ret  = 0x00008067L.U(Parameters.DataWidth)
}

object InstructionsEnv {
  val ecall  = 0x00000073L.U(Parameters.DataWidth)
  val ebreak = 0x00100073L.U(Parameters.DataWidth)
}

object ALUOp1Source {
  val Register           = 0.U(1.W)
  val InstructionAddress = 1.U(1.W)
}

object ALUOp2Source {
  val Register  = 0.U(1.W)
  val Immediate = 1.U(1.W)
}

object RegWriteSource {
  val ALUResult = 0.U(2.W)
  val Memory    = 1.U(2.W)
  // val CSR = 2.U(2.W)
  val NextInstructionAddress = 3.U(2.W)
}

class InstructionDecode extends Module {
  val io = IO(new Bundle {
    val instruction = Input(UInt(Parameters.InstructionWidth))

    val regs_reg1_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
    val regs_reg2_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
    val ex_immediate           = Output(UInt(Parameters.DataWidth))
    val ex_aluop1_source       = Output(UInt(1.W))
    val ex_aluop2_source       = Output(UInt(1.W))
    val memory_read_enable     = Output(Bool())
    val memory_write_enable    = Output(Bool())
    val wb_reg_write_source    = Output(UInt(2.W))
    val reg_write_enable       = Output(Bool())
    val reg_write_address      = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
  })
  val opcode = io.instruction(6, 0)
  val funct3 = io.instruction(14, 12)
  val funct7 = io.instruction(31, 25)
  val rd     = io.instruction(11, 7)
  val rs1    = io.instruction(19, 15)
  val rs2    = io.instruction(24, 20)

  io.regs_reg1_read_address := Mux(opcode === Instructions.lui, 0.U(Parameters.PhysicalRegisterAddrWidth), rs1)
  io.regs_reg2_read_address := rs2
  val immediate = MuxLookup(
    opcode,
    Cat(Fill(20, io.instruction(31)), io.instruction(31, 20)),
    IndexedSeq(
      InstructionTypes.I -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 20)),
      InstructionTypes.L -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 20)),
      Instructions.jalr  -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 20)),
      InstructionTypes.S -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 25), io.instruction(11, 7)),
      InstructionTypes.B -> Cat(
        Fill(20, io.instruction(31)),
        io.instruction(7),
        io.instruction(30, 25),
        io.instruction(11, 8),
        0.U(1.W)
      ),
      Instructions.lui   -> Cat(io.instruction(31, 12), 0.U(12.W)),
      Instructions.auipc -> Cat(io.instruction(31, 12), 0.U(12.W)),
      // jal's imm represents a multiple of 2 bytes.
      Instructions.jal -> Cat(
        Fill(12, io.instruction(31)),
        io.instruction(19, 12),
        io.instruction(20),
        io.instruction(30, 21),
        0.U(1.W)
      )
    )
  )
  io.ex_immediate := immediate
  io.ex_aluop1_source := Mux(
    opcode === Instructions.auipc || opcode === InstructionTypes.B || opcode === Instructions.jal,
    ALUOp1Source.InstructionAddress,
    ALUOp1Source.Register
  )

  // ALU op2 from reg: R-type,
  // ALU op2 from imm: L-Type (I-type subtype),
  //                   I-type (nop=addi, jalr, csr-class, fence),
  //                   J-type (jal),
  //                   U-type (lui, auipc),
  //                   S-type (rs2 value sent to MemControl, ALU computes rs1 + imm.)
  //                   B-type (rs2 compares with rs1 in jump judge unit, ALU computes jump address PC+imm.)
  io.ex_aluop2_source := Mux(
    opcode === InstructionTypes.RM,
    ALUOp2Source.Register,
    ALUOp2Source.Immediate
  )

  // lab3(InstructionDecode) begin
  // MemRead: asserted only for load instructions
  io.memory_read_enable := Mux(
    opcode === InstructionTypes.L,
    1.U(1.W),
    0.U(1.W)
  )
  // MemWrite: asserted only for store instructions
  io.memory_write_enable := Mux(
    opcode === InstructionTypes.S,
    1.U(1.W),
    0.U(1.W)
  )
  // lab3(InstructionDecode) end

  io.wb_reg_write_source := MuxCase(
    RegWriteSource.ALUResult,
    ArraySeq(
      (opcode === InstructionTypes.RM || opcode === InstructionTypes.I ||
        opcode === Instructions.lui || opcode === Instructions.auipc) -> RegWriteSource.ALUResult, // same as default
      (opcode === InstructionTypes.L)                                 -> RegWriteSource.Memory,
      (opcode === Instructions.jal || opcode === Instructions.jalr)   -> RegWriteSource.NextInstructionAddress
    )
  )

  io.reg_write_enable := (opcode === InstructionTypes.RM) || (opcode === InstructionTypes.I) ||
    (opcode === InstructionTypes.L) || (opcode === Instructions.auipc) || (opcode === Instructions.lui) ||
    (opcode === Instructions.jal) || (opcode === Instructions.jalr)
  io.reg_write_address := rd
}
```

#### 3. Execute
The provided skeleton did not connect any inputs to the ALU. I wired in the two operands and the function select: `alu.io.op1` takes either the instruction address or `reg1_data` depending on `aluop1_source`, `alu.io.op2` takes either the immediate or `reg2_data` depending on `aluop2_source`, and `alu.io.func` takes the function code produced by `ALUControl`. The ALU output `alu.io.result` is then connected to `io.mem_alu_result`.
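
To make the operand selection concrete, here is a minimal software-only sketch in plain Scala (no Chisel dependency). It is not code from the repository: the object `ExecuteOperandModel` and its helper names are hypothetical, and `add32` merely stands in for the ALU's add function. The select encodings (0 = register, 1 = instruction address or immediate) follow `ALUOp1Source` and `ALUOp2Source`.

```scala
// Hypothetical reference model of the Execute-stage operand muxes.
// Values are held in Long and masked to 32 bits where needed.
object ExecuteOperandModel {
  // aluop1Source: 0 selects the rs1 value, 1 selects the instruction address (PC).
  def selectOp1(aluop1Source: Int, instructionAddress: Long, reg1Data: Long): Long =
    if (aluop1Source == 1) instructionAddress else reg1Data

  // aluop2Source: 0 selects the rs2 value, 1 selects the immediate.
  def selectOp2(aluop2Source: Int, immediate: Long, reg2Data: Long): Long =
    if (aluop2Source == 1) immediate else reg2Data

  // 32-bit wrap-around addition, standing in for the ALU's add function.
  def add32(a: Long, b: Long): Long = (a + b) & 0xFFFFFFFFL

  def main(args: Array[String]): Unit = {
    // auipc-like case: op1 is the PC, op2 is the immediate.
    val op1 = selectOp1(1, 0x1000L, 0xdeadbeefL)
    val op2 = selectOp2(1, 0x2000L, 0x0L)
    println("0x" + add32(op1, op2).toHexString)
  }
}
```

For an R-type instruction both selects would be 0, so the model returns the two register values unchanged.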
> Source: `src/main/scala/riscv/core/Execute.scala`
```scala=
// mycpu is freely redistributable under the MIT License. See the file
// "LICENSE" for information on usage and redistribution of this file.

package riscv.core

import chisel3._
import chisel3.util.Cat
import chisel3.util.MuxLookup
import riscv.Parameters

class Execute extends Module {
  val io = IO(new Bundle {
    val instruction         = Input(UInt(Parameters.InstructionWidth))
    val instruction_address = Input(UInt(Parameters.AddrWidth))
    val reg1_data           = Input(UInt(Parameters.DataWidth))
    val reg2_data           = Input(UInt(Parameters.DataWidth))
    val immediate           = Input(UInt(Parameters.DataWidth))
    val aluop1_source       = Input(UInt(1.W))
    val aluop2_source       = Input(UInt(1.W))

    val mem_alu_result  = Output(UInt(Parameters.DataWidth))
    val if_jump_flag    = Output(Bool())
    val if_jump_address = Output(UInt(Parameters.DataWidth))
  })

  val opcode = io.instruction(6, 0)
  val funct3 = io.instruction(14, 12)
  val funct7 = io.instruction(31, 25)
  val rd     = io.instruction(11, 7)
  val uimm   = io.instruction(19, 15)

  val alu      = Module(new ALU)
  val alu_ctrl = Module(new ALUControl)

  alu_ctrl.io.opcode := opcode
  alu_ctrl.io.funct3 := funct3
  alu_ctrl.io.funct7 := funct7

  // lab3(Execute) begin
  alu.io.func := alu_ctrl.io.alu_funct
  alu.io.op1  := Mux(io.aluop1_source.asBool, io.instruction_address, io.reg1_data)
  alu.io.op2  := Mux(io.aluop2_source.asBool, io.immediate, io.reg2_data)
  // lab3(Execute) end

  io.mem_alu_result := alu.io.result
  io.if_jump_flag := opcode === Instructions.jal ||
    (opcode === Instructions.jalr) ||
    (opcode === InstructionTypes.B) &&
    MuxLookup(
      funct3,
      false.B,
      IndexedSeq(
        InstructionsTypeB.beq  -> (io.reg1_data === io.reg2_data),
        InstructionsTypeB.bne  -> (io.reg1_data =/= io.reg2_data),
        InstructionsTypeB.blt  -> (io.reg1_data.asSInt < io.reg2_data.asSInt),
        InstructionsTypeB.bge  -> (io.reg1_data.asSInt >= io.reg2_data.asSInt),
        InstructionsTypeB.bltu -> (io.reg1_data.asUInt < io.reg2_data.asUInt),
        InstructionsTypeB.bgeu -> (io.reg1_data.asUInt >=
          io.reg2_data.asUInt)
      )
    )
  io.if_jump_address := io.immediate + Mux(opcode === Instructions.jalr, io.reg1_data, io.instruction_address)
}
```

#### 4. CPU
I connected the inputs of the Execute module (`ex`), which the provided skeleton left unconnected: the instruction (read through `id.io.instruction`), the instruction address from `inst_fetch`, the two register values from `regs`, and the immediate and ALU operand-source selects from `id`.
> Source: `src/main/scala/riscv/core/CPU.scala`
```scala=
// mycpu is freely redistributable under the MIT License. See the file
// "LICENSE" for information on usage and redistribution of this file.

package riscv.core

import chisel3._
import chisel3.util.Cat
import riscv.CPUBundle
import riscv.Parameters

class CPU extends Module {
  val io = IO(new CPUBundle)

  val regs       = Module(new RegisterFile)
  val inst_fetch = Module(new InstructionFetch)
  val id         = Module(new InstructionDecode)
  val ex         = Module(new Execute)
  val mem        = Module(new MemoryAccess)
  val wb         = Module(new WriteBack)

  io.deviceSelect := mem.io.memory_bundle
    .address(Parameters.AddrBits - 1, Parameters.AddrBits - Parameters.SlaveDeviceCountBits)

  inst_fetch.io.jump_address_id       := ex.io.if_jump_address
  inst_fetch.io.jump_flag_id          := ex.io.if_jump_flag
  inst_fetch.io.instruction_valid     := io.instruction_valid
  inst_fetch.io.instruction_read_data := io.instruction
  io.instruction_address              := inst_fetch.io.instruction_address

  regs.io.write_enable  := id.io.reg_write_enable
  regs.io.write_address := id.io.reg_write_address
  regs.io.write_data    := wb.io.regs_write_data
  regs.io.read_address1 := id.io.regs_reg1_read_address
  regs.io.read_address2 := id.io.regs_reg2_read_address

  regs.io.debug_read_address := io.debug_read_address
  io.debug_read_data         := regs.io.debug_read_data

  id.io.instruction := inst_fetch.io.instruction

  // lab3(cpu) begin
  ex.io.instruction         := id.io.instruction
  ex.io.instruction_address := inst_fetch.io.instruction_address
  ex.io.reg1_data           := regs.io.read_data1
  ex.io.reg2_data           := regs.io.read_data2
  ex.io.immediate           := id.io.ex_immediate
  ex.io.aluop1_source       := id.io.ex_aluop1_source
  ex.io.aluop2_source       := id.io.ex_aluop2_source
  // lab3(cpu) end

  mem.io.alu_result          := ex.io.mem_alu_result
  mem.io.reg2_data           := regs.io.read_data2
  mem.io.memory_read_enable  := id.io.memory_read_enable
  mem.io.memory_write_enable := id.io.memory_write_enable
  mem.io.funct3              := inst_fetch.io.instruction(14, 12)

  io.memory_bundle.address := Cat(
    0.U(Parameters.SlaveDeviceCountBits.W),
    mem.io.memory_bundle.address(Parameters.AddrBits - 1 - Parameters.SlaveDeviceCountBits, 0)
  )
  io.memory_bundle.write_enable  := mem.io.memory_bundle.write_enable
  io.memory_bundle.write_data    := mem.io.memory_bundle.write_data
  io.memory_bundle.write_strobe  := mem.io.memory_bundle.write_strobe
  mem.io.memory_bundle.read_data := io.memory_bundle.read_data

  wb.io.instruction_address := inst_fetch.io.instruction_address
  wb.io.alu_result          := ex.io.mem_alu_result
  wb.io.memory_read_data    := mem.io.wb_memory_read_data
  wb.io.regs_write_source   := id.io.wb_reg_write_source
}
```

## MyCPU testing
```shell=
$ make test
sbt test
[info] welcome to sbt 1.9.7 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/vboxuser/chisel-tutorial/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/vboxuser/chisel-tutorial/ca2023-lab3/)
[info] compiling 1 Scala source to /home/vboxuser/chisel-tutorial/ca2023-lab3/target/scala-2.13/test-classes ...
[info] done compiling
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] Run completed in 17 seconds, 631 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 23 s, completed Jan 12, 2024 10:04:35 PM
```

### Modify assembly code
[Assembly code](https://github.com/00853029/ca2023-lab3/blob/main/csrc/clz.S)
- To verify that the HW2 assembly program computes the right answer, the test checks the values stored in two registers: `s2` (the high 32 bits of the result) and `s3` (the low 32 bits).
```=
li s2, 0        #s2: high 32 of number
li s3, 0        #s3: low 32 of number
.
.
.
add s2, s2, s6
mv s3, s7
```

### Modify Makefile
Add `main.asmbin \` to `BINS`
```=
BINS = \
    main.asmbin \
    fibonacci.asmbin \
    hello.asmbin \
    mmio.asmbin \
    quicksort.asmbin \
    sb.asmbin
```

### Modify CPUTest.scala
- Add the following code in `CPUTest.scala`:
```scala=116
class main extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "answer is s2 = 0x1234540a , s3 = 0x8f5c3d98." in {
    test(new TestTopModule("main.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 1000) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }

      // x18 is s2 and x19 is s3 in the RISC-V ABI
      c.io.regs_debug_read_address.poke(18.U)
      c.io.regs_debug_read_data.expect(0x1234540aL.U)
      c.io.regs_debug_read_address.poke(19.U)
      c.io.regs_debug_read_data.expect(0x8f5c3d98L.U)
    }
  }
}
```

### Testing the previous assembly on MyCPU
- Summary: **10 tests run across 8 suites; 10 succeeded, 0 failed. All tests passed.** (Run completed in 19 seconds, 659 milliseconds.)
- Full output:
```=
sbt test
[info] welcome to sbt 1.9.7 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/vboxuser/chisel-tutorial/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/vboxuser/chisel-tutorial/ca2023-lab3/)
[info] compiling 1 Scala source to /home/vboxuser/chisel-tutorial/ca2023-lab3/target/scala-2.13/test-classes ...
[info] done compiling
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] main:
[info] Single Cycle CPU
[info] - should answer is s2 = 0x1234540a , s3 = 0x8f5c3d98.
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] Run completed in 19 seconds, 659 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 24 s, completed Jan 12, 2024 10:56:09 PM
```

:::warning
No usage of Verilator nor extended requirements.
:::
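
For reference, the `main` test reads register indices 18 and 19 because those are the hardware numbers of `s2` and `s3` in the standard RISC-V integer calling convention. The sketch below is plain Scala written for illustration only; the object `AbiRegisters` and its helper are hypothetical names, not part of the repository.

```scala
// Hypothetical lookup table: RISC-V integer ABI names in register-number order (x0..x31).
object AbiRegisters {
  val names: Vector[String] = Vector(
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6"
  )

  // Returns the x-register number for an ABI name, or -1 if the name is unknown.
  def indexOf(abiName: String): Int = names.indexOf(abiName)

  def main(args: Array[String]): Unit = {
    // The two registers checked by the test: high and low 32 bits of the result.
    println(s"s2 -> x${indexOf("s2")}")
    println(s"s3 -> x${indexOf("s3")}")
  }
}
```

A table like this can make the debug-port pokes in a test easier to read, for example by writing the index as `AbiRegisters.indexOf("s2")` instead of a bare `18`.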