# Evaluate NucleusRV
> 林趺菩

[GitHub](https://github.com/merledu/nucleusrv)

NucleusRV is a 32-bit 5-stage pipelined RISC-V core implemented in Chisel.

## Prerequisites
Clone the `nucleusrv` repository.
```shell
$ git clone https://github.com/merledu/nucleusrv.git
```
---
I ran into difficulties building `riscv-gnu-toolchain` from source, so I followed an online guide and used a prebuilt release instead of building the toolchain from scratch.

* Download the prebuilt `riscv-gnu-toolchain` release
```shell
$ wget https://github.com/riscv-collab/riscv-gnu-toolchain/releases/download/2024.12.16/riscv32-elf-ubuntu-22.04-gcc-nightly-2024.12.16-nightly.tar.xz
$ tar -xvf riscv32-elf-ubuntu-22.04-gcc-nightly-2024.12.16-nightly.tar.xz
$ rm -f riscv32-elf-ubuntu-22.04-gcc-nightly-2024.12.16-nightly.tar.xz
```
---
I decided to use Docker to build the required environment efficiently.
* The Dockerfile content.
```
# Start with Ubuntu 22.04 as the base image
FROM ubuntu:22.04

# Set the time zone to avoid interactive prompts during apt install
ENV TZ="Asia/Taipei"
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Update package lists and install required packages
RUN apt-get update && apt-get install -y \
    build-essential \
    verilator \
    gtkwave \
    curl \
    zip \
    unzip \
    sudo \
    bsdmainutils \
    && rm -rf /var/lib/apt/lists/*

# Set up SDKMAN
RUN curl -s "https://get.sdkman.io" | bash

# Set up environment for SDKMAN
ENV SDKMAN_DIR="/root/.sdkman"
ENV PATH="${PATH}:${SDKMAN_DIR}/bin:${SDKMAN_DIR}/candidates/java/current/bin:${SDKMAN_DIR}/candidates/sbt/current/bin"

# Install Java and SBT using SDKMAN
RUN bash -c "source ${SDKMAN_DIR}/bin/sdkman-init.sh && \
    sdk install java 11.0.21-tem && \
    sdk install sbt"

# Set the working directory
WORKDIR /nucleusrv

############################ for riscv-gnu-toolchain ############################
# Copy the riscv directory to /opt/riscv
COPY ./riscv /opt/riscv

# Add /opt/riscv/bin to PATH
RUN echo "export
PATH=/opt/riscv/bin:$PATH" >> /root/.bashrc ############################ for riscv-gnu-toolchain ############################ # Set the default command CMD ["/bin/bash", "-c", "source /root/.bashrc && /bin/bash"] ``` * The script to build the docker image. ```shell $ bash build.sh ``` * The script to run the docker container. ```shell $ bash run.sh ``` ## NucleusRV Demo First run the docker container. ```shell $ bash run.sh ``` The terminal will look like this: ``` root@0e76ca700801:/app# ls Dockerfile build.sh nucleusrv riscv run.sh ``` ### Building C Programs (hello_world) Referencing the steps that the `nucleusrv` repository gives: ![image](https://hackmd.io/_uploads/ry3bMknUyl.png) ```shell $ cd nucleusrv/tools/ $ make PROGRAM=hello_world ``` The terminal output: ``` rm -rf out riscv32-unknown-elf-gcc -c -march=rv32i -mabi=ilp32 -ffreestanding -fomit-frame-pointer -c -o tests/hello_world/hello.o tests/hello_world/hello.c riscv32-unknown-elf-gcc -c -march=rv32i -mabi=ilp32 -ffreestanding -fomit-frame-pointer -c -o tests/hello_world/main.o tests/hello_world/main.c riscv32-unknown-elf-gcc -c -march=rv32i -mabi=ilp32 -ffreestanding -fomit-frame-pointer -c -o tests/hello_world/world.o tests/hello_world/world.c riscv32-unknown-elf-gcc -march=rv32im -mabi=ilp32 -static -nostdlib -nostartfiles -T link.ld tests/hello_world/hello.o tests/hello_world/main.o tests/hello_world/world.o -o out/program.elf -lgcc riscv32-unknown-elf-objdump --disassemble-all --section=.text out/program.elf > out/program.dump python3 makehex.py out/program.elf 2048 > out/program.hex ``` The corresponding `program.dump`, `program.elf`, `program.hex` files will be generated under `nucleusrv/tools/out/` The content of `program.dump`: ``` out/program.elf: file format elf32-littleriscv Disassembly of section .text: 00000000 <hello>: 0: ff010113 addi sp,sp,-16 4: 00400793 li a5,4 8: 00f12623 sw a5,12(sp) c: 00500793 li a5,5 10: 00f12423 sw a5,8(sp) 14: 00c12703 lw a4,12(sp) 18: 00812783 lw a5,8(sp) 
1c: 00f707b3 add a5,a4,a5 20: 00f12223 sw a5,4(sp) 24: 00412783 lw a5,4(sp) 28: 00078513 mv a0,a5 2c: 01010113 addi sp,sp,16 30: 00008067 ret 00000034 <main>: 34: fe010113 addi sp,sp,-32 38: 00112e23 sw ra,28(sp) 3c: fc5ff0ef jal 0 <hello> 40: 00a12623 sw a0,12(sp) 44: 02c000ef jal 70 <world> 48: 00a12423 sw a0,8(sp) 4c: 00c12703 lw a4,12(sp) 50: 00812783 lw a5,8(sp) 54: 00f707b3 add a5,a4,a5 58: 00f12223 sw a5,4(sp) 5c: 00000793 li a5,0 60: 00078513 mv a0,a5 64: 01c12083 lw ra,28(sp) 68: 02010113 addi sp,sp,32 6c: 00008067 ret 00000070 <world>: 70: 00500793 li a5,5 74: 00078513 mv a0,a5 78: 00008067 ret ``` ### Building with SBT Referencing the steps that the `nucleusrv` repository gives: ![image](https://hackmd.io/_uploads/rJPErkn8Jl.png) Moving to `nucleusrv` directory: ```shell $ cd .. ``` Opening SBT server: ```shell $ sbt ``` The terminal output: ``` ... [info] loading settings for project nucleusrv from build.sbt ... [info] set current project to nucleusrv (in build file:/app/nucleusrv/) [info] sbt server started at local:///root/.sbt/1.0/server/0aa2831cde32e66c128a/sock [info] started sbt server ``` Running SBT test: ```shell $ testOnly nucleusrv.components.TopTest -- -DwriteVcd=1 -DprogramFile=/app/nucleusrv/tools/out/program.hex ``` * `DwriteVcd=1`: This flag enables VCD (Value Change Dump) file generation, which is useful for waveform viewing and debugging purpose. * `DprogramFile=/app/nucleusrv/tools/out/program.hex`: This specifies the path to the program file (in hexadecimal format) that will be used for testing. The terminal output: ``` ... Enabling waves.. Exit Code: 0 [info] - Top Test [info] Run completed in 4 seconds, 903 milliseconds. [info] Total number of tests run: 1 [info] Suites: completed 1, aborted 0 [info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. [success] Total time: 14 s, completed Jan 9, 2025, 6:29:50 PM sbt:nucleusrv> ``` If we want to exit the sbt server, just use `CTRL+D`. 
### Running Compliance Tests Referencing the steps that the `nucleusrv` repository gives: ![image](https://hackmd.io/_uploads/r1JKrJhUJe.png) Cloning `riscv-arch-test` repository under `nucleusrv`. ```shell $ git clone git@github.com:riscv-non-isa/riscv-arch-test.git -b 1.0 ``` The default `run_compliance.sh` uses `riscv64`, so I modified to `riscv32`. ![image](https://hackmd.io/_uploads/HkHsVcjwJg.png) Running compliance tests: ```shell $ bash run_compliance.sh rv32i ``` The terminal output shows some errors: ``` /app/nucleusrv/test_run_dir/Top_Test/VTop exists. make \ RISCV_TARGET=nucleusrv \ RISCV_DEVICE=rv32i \ RISCV_PREFIX=riscv32-unknown-elf- \ clean -C /app/nucleusrv/riscv-arch-test/riscv-test-suite/rv32i make[1]: Entering directory '/app/nucleusrv/riscv-arch-test/riscv-test-suite/rv32i' rm -rf /app/nucleusrv/riscv-arch-test/work make[1]: Leaving directory '/app/nucleusrv/riscv-arch-test/riscv-test-suite/rv32i' make \ RISCV_TARGET=nucleusrv \ RISCV_DEVICE=rv32i \ RISCV_PREFIX=riscv32-unknown-elf- \ run -C /app/nucleusrv/riscv-arch-test/riscv-test-suite/rv32i make[1]: Entering directory '/app/nucleusrv/riscv-arch-test/riscv-test-suite/rv32i' Compile /app/nucleusrv/riscv-arch-test/work/rv32i/I-MISALIGN_JMP-01.elf src/I-MISALIGN_JMP-01.S: Assembler messages: src/I-MISALIGN_JMP-01.S:48: Error: unrecognized opcode `csrrw x31,mtvec,x1', extension `zicsr' required src/I-MISALIGN_JMP-01.S:51: Error: unrecognized opcode `csrrci x0,misa,4', extension `zicsr' required src/I-MISALIGN_JMP-01.S:273: Error: unrecognized opcode `csrw mtvec,x31', extension `zicsr' required src/I-MISALIGN_JMP-01.S:282: Error: unrecognized opcode `csrr x30,mtval', extension `zicsr' required src/I-MISALIGN_JMP-01.S:284: Error: unrecognized opcode `csrw mepc,x30', extension `zicsr' required src/I-MISALIGN_JMP-01.S:287: Error: unrecognized opcode `csrr x30,mtval', extension `zicsr' required src/I-MISALIGN_JMP-01.S:292: Error: unrecognized opcode `csrr x30,mcause', extension `zicsr' required 
riscv32-unknown-elf-objcopy: '/app/nucleusrv/riscv-arch-test/work/rv32i/I-MISALIGN_JMP-01.elf': No such file
riscv32-unknown-elf-objcopy: '/app/nucleusrv/riscv-arch-test/work/rv32i/I-MISALIGN_JMP-01.elf': No such file
riscv32-unknown-elf-objdump: '/app/nucleusrv/riscv-arch-test/work/rv32i/I-MISALIGN_JMP-01.elf': No such file
hexdump: /app/nucleusrv/riscv-arch-test/work/rv32i/I-MISALIGN_JMP-01.elf.text.bin: No such file or directory
hexdump: all input file arguments failed
hexdump: /app/nucleusrv/riscv-arch-test/work/rv32i/I-MISALIGN_JMP-01.elf.data.bin: No such file or directory
hexdump: all input file arguments failed
make[1]: *** [Makefile:50: /app/nucleusrv/riscv-arch-test/work/rv32i/I-MISALIGN_JMP-01.elf] Error 1
make[1]: Leaving directory '/app/nucleusrv/riscv-arch-test/riscv-test-suite/rv32i'
make: *** [Makefile:79: simulate] Error 2
```
The errors concern the RISC-V Control and Status Register (CSR) instructions. The assembler rejects them because, with `-march=rv32i`, this toolchain treats the CSR instructions as part of the separate `Zicsr` extension rather than the base ISA.

To resolve this, I explicitly enabled the `Zicsr` extension. In `nucleusrv/riscv-arch-test/riscv-test-suite/rv32i/Makefile`, at line 48, I changed the `-march` flag from `rv32i` to `rv32i_zicsr`.

Running the compliance tests again:
```shell
$ bash run_compliance.sh rv32i
```
The terminal output still shows errors:
```
...
Check I-SW-011d0
< ffffffff
8d6
< 7fffffff
14,15d11
< ffffffff
< fffff801
24,25d19
< 7fffffff
< 00000001
31d24
< fffff801
... FAIL
Check I-XOR-011d0
< 00000000
5d3
< 80000000
9d6
< 7ffffffe
13,15d9
< ffffea33
< 00000000
< fffff800
25d18
< 7ffffffe
33d25
< ffffffff
... FAIL
Check I-XORI-014d3
< ffffffff
7d5
< f89abb21
20d17
< ffffffff
23d19
< f89abb21
25d20
< fffff801
30,31d24
< 00000000
< fffff800
...
FAIL
--------------------------------
FAIL: 48/48
RISCV_TARGET=nucleusrv RISCV_DEVICE=rv32i RISCV_ISA=rv32i
make: *** [Makefile:86: verify] Error 1
```
All 48 tests fail, which doesn't make sense.
To debug this, I compared the actual output with the golden reference data.
Taking `I-ADD-01` as an example, I compared `nucleusrv/riscv-arch-test/work/rv32i/I-ADD-01.signature.output` with `nucleusrv/riscv-arch-test/riscv-test-suite/rv32i/references/I-ADD-01.reference_output`.
![image](https://hackmd.io/_uploads/BkeiPik_kl.png)

This problem has not been solved yet.

## NucleusRV Explanation
### Instruction Fetch
![image](https://hackmd.io/_uploads/B1e0DXVRDJl.png)

The `InstructionFetch` module fetches instructions from memory at a given address.

The code can be found in `nucleusrv/src/main/scala/components/InstructionFetch.scala`
```scala=
package nucleusrv.components
import chisel3._
import chisel3.util._

class InstructionFetch extends Module {
  val io = IO(new Bundle {
    val address: UInt = Input(UInt(32.W))
    val instruction: UInt = Output(UInt(32.W))
    val stall: Bool = Input(Bool())
    val coreInstrReq = Decoupled(new MemRequestIO)
    val coreInstrResp = Flipped(Decoupled(new MemResponseIO))
  })

  val rst = Wire(Bool())
  rst := reset.asBool()
  io.coreInstrResp.ready := true.B
  ...
```
---
```scala
io.coreInstrReq.bits.activeByteLane := "b1111".U
```
Marks all four bytes of the 32-bit word as active.

---
```scala
io.coreInstrReq.bits.isWrite := false.B
```
Marks the request as a read operation.

---
```scala
io.coreInstrReq.bits.dataRequest := DontCare
```
Since this is a read operation (fetching an instruction), there is no write data to specify.

---
```scala
io.coreInstrReq.bits.addrRequest := io.address >> 2
```
Sets the address for the memory request. The input address `io.address` is right-shifted by 2 bits, which is equivalent to dividing by 4. This converts the byte address into a word address.
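The byte-to-word address conversion can be checked with a few lines of Python. This is only a software model of the `>> 2` shift, not part of NucleusRV; the function name `word_address` is made up for this sketch, and the sample addresses are taken from `program.dump`, where `main` starts at byte address `0x34`.

```python
def word_address(byte_address: int) -> int:
    """Model of `io.address >> 2`: drop the two byte-offset bits."""
    return (byte_address & 0xFFFFFFFF) >> 2


# All four bytes of one instruction share the same word address.
print(word_address(0x34))  # 13
print(word_address(0x37))  # 13
# The next instruction (byte address 0x38) lives in the next word.
print(word_address(0x38))  # 14
```

Because every RV32I instruction here is 4 bytes wide and aligned, the two dropped bits are always zero for instruction fetches.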
--- ```scala io.coreInstrReq.valid := Mux(rst || io.stall, false.B, true.B) ``` Ensures that no instruction fetch requests are made when the system is being reset or when the pipeline is stalled. --- ```scala io.instruction := Mux(io.coreInstrResp.valid, io.coreInstrResp.bits.dataResponse, DontCare) ``` Ensures that the instruction output is only updated with valid data from memory, and remains in an undefined state when no valid instruction has been fetched. ### Instruction Decode The `InstructionDecode` stage is responsible for decoding instructions. Code can be found in `nucleusrv/src/main/scala/components/InstructionDecode.scala` ```scala= package nucleusrv.components import chisel3._ class InstructionDecode(TRACE:Boolean) extends Module { val io = IO(new Bundle { val id_instruction = Input(UInt(32.W)) val writeData = Input(UInt(32.W)) val writeReg = Input(UInt(5.W)) val pcAddress = Input(UInt(32.W)) val ctl_writeEnable = Input(Bool()) val id_ex_mem_read = Input(Bool()) // val ex_mem_mem_write = Input(Bool()) val ex_mem_mem_read = Input(Bool()) val dmem_resp_valid = Input(Bool()) val id_ex_rd = Input(UInt(5.W)) val ex_mem_rd = Input(UInt(5.W)) val id_ex_branch = Input(Bool()) //for forwarding val ex_mem_ins = Input(UInt(32.W)) val mem_wb_ins = Input(UInt(32.W)) val ex_ins = Input(UInt(32.W)) val ex_result = Input(UInt(32.W)) val ex_mem_result = Input(UInt(32.W)) val mem_wb_result = Input(UInt(32.W)) //Outputs val immediate = Output(UInt(32.W)) val writeRegAddress = Output(UInt(5.W)) val readData1 = Output(UInt(32.W)) val readData2 = Output(UInt(32.W)) val func7 = Output(UInt(7.W)) val func3 = Output(UInt(3.W)) val ctl_aluSrc = Output(Bool()) val ctl_memToReg = Output(UInt(2.W)) val ctl_regWrite = Output(Bool()) val ctl_memRead = Output(Bool()) val ctl_memWrite = Output(Bool()) val ctl_branch = Output(Bool()) val ctl_aluOp = Output(UInt(2.W)) val ctl_jump = Output(UInt(2.W)) val ctl_aluSrc1 = Output(UInt(2.W)) val hdu_pcWrite = Output(Bool()) val 
hdu_if_reg_write = Output(Bool())
    val pcSrc = Output(Bool())
    val pcPlusOffset = Output(UInt(32.W))
    val ifid_flush = Output(Bool())

    val stall = Output(Bool())

    // RVFI pins
    val rs_addr = if (TRACE) Some(Output(Vec(2, UInt(5.W)))) else None
  })

  //Hazard Detection Unit
  val hdu = Module(new HazardUnit)
  ...

  //Control Unit
  val control = Module(new Control)
  ...

  //Register File
  val registers = Module(new Registers)
  ...

  val immediate = Module(new ImmediateGen)
  immediate.io.instruction := io.id_instruction
  io.immediate := immediate.io.out
  ...

  //Branch Unit
  val bu = Module(new BranchUnit)
  ...
```
The `InstructionDecode` module instantiates several sub-modules, each with a specific task:
* The `HazardUnit` module detects and handles hazards in the pipeline.
* The `Control` module generates control signals based on the instruction opcode.
* The `Registers` module is the register file, which stores and retrieves register values.
* The `ImmediateGen` module generates immediate values from the instruction.
* The `BranchUnit` module evaluates branch conditions and calculates the target address for branches and jumps.

---
```scala
when(hdu.io.ctl_mux && io.id_instruction =/= "h13".U) {
  io.ctl_memWrite := control.io.memWrite
  io.ctl_regWrite := control.io.regWrite
}.otherwise {
  io.ctl_memWrite := false.B
  io.ctl_regWrite := false.B
}
```
The control unit's write signals pass through only when the `HDU (Hazard Detection Unit)` indicates it is safe and the instruction is not a `NOP (No Operation)`; `"h13".U` is the encoding of `addi x0, x0, 0`, the canonical RISC-V `NOP`.
Memory and register writes are disabled when there is a hazard or when a `NOP` instruction is being processed.
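To see where the constant `0x13` comes from, the sketch below encodes an I-type instruction in Python and models the write gating. It is an illustration only: `encode_i_type` and `gated_writes` are names invented here, and the model covers just the two signals shown in the snippet above.

```python
OP_IMM = 0b0010011  # opcode shared by addi, slti, xori, ...


def encode_i_type(opcode: int, rd: int, funct3: int, rs1: int, imm: int) -> int:
    """Pack the I-type fields: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def gated_writes(ctl_mux: bool, instruction: int, mem_write: bool, reg_write: bool):
    """Model of the when/otherwise block: pass the control signals through
    only if the hazard unit allows it and the instruction is not a NOP."""
    if ctl_mux and instruction != 0x13:
        return mem_write, reg_write
    return False, False


nop = encode_i_type(OP_IMM, rd=0, funct3=0, rs1=0, imm=0)  # addi x0, x0, 0
print(hex(nop))  # 0x13

# Cross-check against program.dump: `addi sp,sp,-16` is ff010113.
print(hex(encode_i_type(OP_IMM, rd=2, funct3=0, rs1=2, imm=-16)))  # 0xff010113

print(gated_writes(True, nop, True, True))          # (False, False)
print(gated_writes(True, 0x00F12623, True, False))  # (True, False)
```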
---
```scala
//Forwarding to fix structural hazard
when(io.ctl_writeEnable && (io.writeReg === registerRs1)){
  when(registerRs1 === 0.U){
    io.readData1 := 0.U
  }.otherwise{
    io.readData1 := io.writeData
  }
}.otherwise{
  io.readData1 := registers.io.readData(0)
}
when(io.ctl_writeEnable && (io.writeReg === registerRs2)){
  when(registerRs2 === 0.U){
    io.readData2 := 0.U
  }.otherwise{
    io.readData2 := io.writeData
  }
}.otherwise{
  io.readData2 := registers.io.readData(1)
}
```
This forwarding logic handles the case where a register is written and read in the same cycle (the source comment calls this a structural hazard).
Instead of waiting for the write to complete and then reading, which would introduce a delay, it forwards the data being written directly to the read output.
Reads of the zero register (`x0`) still return 0, even in forwarding situations.

---
```scala
// Branch Forwarding
val input1 = Wire(UInt(32.W))
val input2 = Wire(UInt(32.W))

when(registerRs1 === io.ex_mem_ins(11, 7)) {
  input1 := io.ex_mem_result
}.elsewhen(registerRs1 === io.mem_wb_ins(11, 7)) {
  input1 := io.mem_wb_result
} .otherwise {
  input1 := io.readData1
}
when(registerRs2 === io.ex_mem_ins(11, 7)) {
  input2 := io.ex_mem_result
}.elsewhen(registerRs2 === io.mem_wb_ins(11, 7)) {
  input2 := io.mem_wb_result
} .otherwise {
  input2 := io.readData2
}
```
The branch forwarding logic resolves data hazards specifically for branch instructions.
* If `registerRs1` / `registerRs2` matches the destination register (bits 11-7) of the instruction in the `EX/MEM` stage, `input1` / `input2` is set to the result from that stage.
* Else if `registerRs1` / `registerRs2` matches the destination register of the instruction in the `MEM/WB` stage, `input1` / `input2` is set to the result from that stage.
* Otherwise, `input1` / `input2` is set to the value read from the register file.

The `EX/MEM` stage has priority because it holds the more recent instruction.
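The priority order of the branch forwarding can be modeled in plain Python. This is a sketch that mirrors only the excerpt above: the function names `rd_field` and `branch_operand` are invented for illustration, and, like the excerpt, the model compares only the `rd` field (bits 11-7) without checking whether the earlier instruction really writes a register.

```python
def rd_field(instruction: int) -> int:
    """Bits 11-7 of an instruction word, i.e. `ins(11, 7)` in the Chisel code."""
    return (instruction >> 7) & 0x1F


def branch_operand(rs: int, ex_mem_ins: int, ex_mem_result: int,
                   mem_wb_ins: int, mem_wb_result: int, read_data: int) -> int:
    """Select the branch comparison operand: EX/MEM first, then MEM/WB,
    otherwise the value read from the register file."""
    if rs == rd_field(ex_mem_ins):
        return ex_mem_result
    if rs == rd_field(mem_wb_ins):
        return mem_wb_result
    return read_data


ADD_A5 = 0x00F707B3  # add a5,a4,a5  (rd = x15), from program.dump
LW_A5 = 0x00812783   # lw a5,8(sp)   (rd = x15), from program.dump
NOP = 0x00000013     # addi x0,x0,0  (rd = x0)

print(rd_field(ADD_A5))                             # 15
print(branch_operand(15, ADD_A5, 9, LW_A5, 5, 1))   # 9  (EX/MEM wins)
print(branch_operand(15, NOP, 0, LW_A5, 5, 1))      # 5  (MEM/WB)
print(branch_operand(14, NOP, 0, LW_A5, 5, 1))      # 1  (register file)
```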
---
```scala
//Forwarding for Jump
val j_offset = Wire(UInt(32.W))
when(registerRs1 === io.ex_ins(11, 7)){
  j_offset := io.ex_result
}.elsewhen(registerRs1 === io.ex_mem_ins(11, 7)) {
  j_offset := io.ex_mem_result
}.elsewhen(registerRs1 === io.mem_wb_ins(11, 7)) {
  j_offset := io.mem_wb_result
}.elsewhen(registerRs1 === io.ex_ins(11, 7)){
  j_offset := io.ex_result
}.otherwise {
  j_offset := io.readData1
}
```
This forwarding logic resolves data hazards that occur when a jump instruction depends on the result of a recent instruction that has not yet been written back to the register file.
* If `registerRs1` matches the destination register of the instruction in the `EX` stage, `j_offset` is set to the result from that stage.
* Else if `registerRs1` matches the destination register of the instruction in the `EX/MEM` stage, `j_offset` is set to the result from that stage.
* Else if `registerRs1` matches the destination register of the instruction in the `MEM/WB` stage, `j_offset` is set to the result from that stage.
* The last `elsewhen` repeats the `EX` stage check. It can never fire, because the first condition already covers it (likely a leftover in the code).
* If none of the above conditions are met, `j_offset` is set to the value read from the register file, `io.readData1`.

---
```scala
//Offset Calculation (Jump/Branch)
when(io.ctl_jump === 1.U) {
  io.pcPlusOffset := io.pcAddress + io.immediate
}.elsewhen(io.ctl_jump === 2.U) {
  io.pcPlusOffset := j_offset + io.immediate
} .otherwise {
  io.pcPlusOffset := io.pcAddress + immediate.io.out
}

when(bu.io.taken || io.ctl_jump =/= 0.U) {
  io.pcSrc := true.B
}.otherwise {
  io.pcSrc := false.B
}
```
The code handles offset calculation for jump and branch instructions.
It computes the target `program counter (PC)` value based on the type of control flow instruction (jump/branch) and determines whether the `PC` should be redirected to it.

`io.ctl_jump === 1.U` checks if the control signal `ctl_jump` indicates a jump instruction where the offset is calculated relative to the current program counter `pcAddress`.
The next `PC` value `io.pcPlusOffset` is computed as `pcPlusOffset = pcAddress + immediate`, typically for jump instructions like `jal (jump and link)`.

`io.ctl_jump === 2.U` checks if the control signal `ctl_jump` indicates a jump instruction where the offset is calculated relative to a register value `j_offset`.
The next `PC` value is computed as `pcPlusOffset = j_offset + immediate`, typically for `jalr (jump and link register)`.

Otherwise, no jump is indicated, and the code assumes a branch instruction or regular sequential execution.
The value is computed as `pcPlusOffset = pcAddress + immediate`, typically for branch instructions like `beq`, `bne`, etc., where the offset is relative to the current `PC`.

If `bu.io.taken || io.ctl_jump =/= 0.U` is true, meaning that either a branch is taken or a jump instruction is present, `io.pcSrc` is set to `true.B`, indicating that the program counter should be updated to the new target address.
Otherwise, neither a branch is taken nor a jump instruction is present, so `io.pcSrc` is set to `false.B`, indicating that the program counter continues sequentially.

---
```scala
//Instruction Flush
io.ifid_flush := hdu.io.ifid_flush

io.writeRegAddress := io.id_instruction(11, 7)
io.func3 := io.id_instruction(14, 12)
when((io.id_instruction(6,0) === "b0110011".U) | ((io.id_instruction(6,0) === "b0010011".U) & (io.func3 === 5.U))){
  io.func7 := io.id_instruction(31,25)
}.otherwise{
  io.func7 := 0.U
}
io.stall := io.func7 === 1.U && (io.func3 === 4.U || io.func3 === 5.U || io.func3 === 6.U || io.func3 === 7.U)
```
The code forwards the flush signal from the hazard unit, extracts specific fields from the instruction, and determines if a stall is necessary.

It checks whether the `opcode (bits 6-0)` is `"0110011" (R-type)`, or `"0010011" (I-type)` with `func3 == 5` (the right-shift immediates `srli` / `srai`).
If so, it sets `func7` to `bits 31-25` of the instruction; otherwise it sets `func7` to 0, because for other instructions those bits belong to the immediate.
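The field extraction and the stall condition from the snippet above can be modeled in Python. This is a sketch under the assumption that the excerpt is the complete logic for these signals; `r_type` and `decode_fields` are helper names invented here, not part of NucleusRV.

```python
OP = 0b0110011      # R-type opcode
OP_IMM = 0b0010011  # I-type ALU opcode


def r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int = OP) -> int:
    """Pack R-type style fields into a 32-bit instruction word."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def decode_fields(instruction: int):
    """Return (func3, func7, stall) the way the decode stage derives them."""
    opcode = instruction & 0x7F
    func3 = (instruction >> 12) & 0x7
    uses_func7 = opcode == OP or (opcode == OP_IMM and func3 == 5)
    func7 = (instruction >> 25) & 0x7F if uses_func7 else 0
    stall = func7 == 1 and func3 in (4, 5, 6, 7)
    return func3, func7, stall


div = r_type(1, rs2=12, rs1=11, funct3=4, rd=10)   # div a0,a1,a2
mul = r_type(1, rs2=12, rs1=11, funct3=0, rd=10)   # mul a0,a1,a2
srai = r_type(0b0100000, rs2=1, rs1=10, funct3=5, rd=10, opcode=OP_IMM)  # srai a0,a0,1

print(decode_fields(div))         # (4, 1, True)
print(decode_fields(mul))         # (0, 1, False)
print(decode_fields(srai))        # (5, 32, False)
# addi sp,sp,-16: bits 31-25 are immediate bits, so func7 is forced to 0.
print(decode_fields(0xFF010113))  # (0, 0, False)
```

The last case shows why the opcode check matters: without it, the upper immediate bits of `addi` would be misread as a `func7` value.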
The stall condition checks whether `func7` is 1 and `func3` is 4, 5, 6, or 7.
In `RV32M` these encodings are `div`, `divu`, `rem`, and `remu`, which need additional processing time in this design and therefore require a pipeline stall.

### Execute
The code can be found in `nucleusrv/src/main/scala/components/Execute.scala`
```scala=
package nucleusrv.components

import chisel3._
import chisel3.util.MuxCase

class Execute(M:Boolean = false) extends Module {
  val io = IO(new Bundle {
    val immediate = Input(UInt(32.W))
    val readData1 = Input(UInt(32.W))
    val readData2 = Input(UInt(32.W))
    val pcAddress = Input(UInt(32.W))
    val func7 = Input(UInt(7.W))
    val func3 = Input(UInt(3.W))
    val mem_result = Input(UInt(32.W))
    val wb_result = Input(UInt(32.W))

    val ex_mem_regWrite = Input(Bool())
    val mem_wb_regWrite = Input(Bool())
    val id_ex_ins = Input(UInt(32.W))
    val ex_mem_ins = Input(UInt(32.W))
    val mem_wb_ins = Input(UInt(32.W))

    val ctl_aluSrc = Input(Bool())
    val ctl_aluOp = Input(UInt(2.W))
    val ctl_aluSrc1 = Input(UInt(2.W))

    val writeData = Output(UInt(32.W))
    val ALUresult = Output(UInt(32.W))

    val stall = Output(Bool())
  })

  val alu = Module(new ALU)
  val aluCtl = Module(new AluControl)
  val fu = Module(new ForwardingUnit).io

  // Forwarding Unt
  fu.ex_regWrite := io.ex_mem_regWrite
  fu.mem_regWrite := io.mem_wb_regWrite
  fu.ex_reg_rd := io.ex_mem_ins(11, 7)
  fu.mem_reg_rd := io.mem_wb_ins(11, 7)
  fu.reg_rs1 := io.id_ex_ins(19, 15)
  fu.reg_rs2 := io.id_ex_ins(24, 20)

  val inputMux1 = MuxCase(
    0.U,
    Array(
      (fu.forwardA === 0.U) -> (io.readData1),
      (fu.forwardA === 1.U) -> (io.mem_result),
      (fu.forwardA === 2.U) -> (io.wb_result)
    )
  )
  val inputMux2 = MuxCase(
    0.U,
    Array(
      (fu.forwardB === 0.U) -> (io.readData2),
      (fu.forwardB === 1.U) -> (io.mem_result),
      (fu.forwardB === 2.U) -> (io.wb_result)
    )
  )

  val aluIn1 = MuxCase(
    inputMux1,
    Array(
      (io.ctl_aluSrc1 === 1.U) -> io.pcAddress,
      (io.ctl_aluSrc1 === 2.U) -> 0.U
    )
  )
  val aluIn2 = Mux(io.ctl_aluSrc, inputMux2, io.immediate)

  aluCtl.io.f3 := io.func3
aluCtl.io.f7 := io.func7(5) aluCtl.io.aluOp := io.ctl_aluOp aluCtl.io.aluSrc := io.ctl_aluSrc alu.io.input1 := aluIn1 alu.io.input2 := aluIn2 alu.io.aluCtl := aluCtl.io.out io.stall := false.B if(M){ val mdu = Module (new MDU) mdu.io.src_a := aluIn1 mdu.io.src_b := aluIn2 mdu.io.op := io.func3 // mdu.io.valid := true.B // io.stall := false.B val src_a_reg = RegInit(0.U(32.W)) val src_b_reg = RegInit(0.U(32.W)) val op_reg = RegInit(0.U(3.W)) val div_en = RegInit(false.B) val f7_reg = RegInit(0.U(6.W)) val counter = RegInit(0.U(6.W)) when(io.func7 === 1.U && (io.func3 === 0.U || io.func3 === 1.U || io.func3 === 2.U || io.func3 === 3.U)){ mdu.io.valid := true.B }otherwise{ mdu.io.valid := false.B } dontTouch(io.stall) when(io.func7 === 1.U && ~div_en && (io.func3 === 4.U || io.func3 === 5.U || io.func3 === 6.U || io.func3 === 7.U)){ mdu.io.valid := RegNext(true.B) div_en := true.B src_a_reg := aluIn1 src_b_reg := aluIn2 op_reg := io.func3 f7_reg := io.func7 io.stall := true.B dontTouch(f7_reg) } when(div_en){ // io.stall := true.B when (counter < 32.U){ io.stall := true.B mdu.io.src_a := src_a_reg mdu.io.src_b := src_b_reg mdu.io.op := op_reg // mdu.io.valid := true.B counter := counter + 1.U }.otherwise{ mdu.io.valid := false.B div_en := false.B mdu.io.src_a := src_a_reg mdu.io.src_b := src_b_reg mdu.io.op := op_reg counter := 0.U } }//.otherwise{io.stall := false.B} when(div_en && f7_reg === 1.U && mdu.io.ready){ io.ALUresult := Mux(mdu.io.output.valid, mdu.io.output.bits, 0.U) } .elsewhen (io.func7 === 1.U && mdu.io.ready){ io.ALUresult := Mux(mdu.io.output.valid, mdu.io.output.bits, 0.U) } .otherwise{io.ALUresult := alu.io.result} } else { io.ALUresult := alu.io.result } // io.ALUresult := alu.io.result io.writeData := inputMux2 } ``` The `Execute` module handle arithmetic, logical operations, data forwarding, etc. 
```scala
val inputMux1 = MuxCase(
  0.U,
  Array(
    (fu.forwardA === 0.U) -> (io.readData1),
    (fu.forwardA === 1.U) -> (io.mem_result),
    (fu.forwardA === 2.U) -> (io.wb_result)
  )
)
val inputMux2 = MuxCase(
  0.U,
  Array(
    (fu.forwardB === 0.U) -> (io.readData2),
    (fu.forwardB === 1.U) -> (io.mem_result),
    (fu.forwardB === 2.U) -> (io.wb_result)
  )
)

val aluIn1 = MuxCase(
  inputMux1,
  Array(
    (io.ctl_aluSrc1 === 1.U) -> io.pcAddress,
    (io.ctl_aluSrc1 === 2.U) -> 0.U
  )
)
val aluIn2 = Mux(io.ctl_aluSrc, inputMux2, io.immediate)
```
This code selects the appropriate inputs for the `ALU`.

For `inputMux1` and `inputMux2`:
* If `fu.forwardA === 0.U` / `fu.forwardB === 0.U`, selects `io.readData1` / `io.readData2`, the original register value.
* Else if `fu.forwardA === 1.U` / `fu.forwardB === 1.U`, selects `io.mem_result`, the result from the memory stage.
* Else if `fu.forwardA === 2.U` / `fu.forwardB === 2.U`, selects `io.wb_result`, the result from the writeback stage.
* For any other value, the `MuxCase` default `0.U` is selected.

For `aluIn1`:
* If `io.ctl_aluSrc1 === 1.U`, selects `io.pcAddress`, the current program counter value.
* Else if `io.ctl_aluSrc1 === 2.U`, selects `0.U`, a constant zero.
* Else selects `inputMux1`, the result of the forwarding logic.

For `aluIn2`:
* If `io.ctl_aluSrc` is true, selects `inputMux2`, the (possibly forwarded) second register operand.
* Else selects `io.immediate`, the immediate value encoded in the instruction.
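The operand selection above can be summarized as a small Python model. It is a sketch, not the Chisel implementation: `forward_mux` and `alu_inputs` are names invented for illustration, and the third return value reflects the `io.writeData := inputMux2` assignment in the `Execute` listing.

```python
def forward_mux(forward: int, read_data: int, mem_result: int, wb_result: int) -> int:
    """Model of the MuxCase: 0 -> register value, 1 -> memory-stage result,
    2 -> writeback-stage result, anything else -> default 0."""
    return {0: read_data, 1: mem_result, 2: wb_result}.get(forward, 0)


def alu_inputs(forward_a, forward_b, read_data1, read_data2, mem_result,
               wb_result, pc, immediate, alu_src, alu_src1):
    """Return (aluIn1, aluIn2, writeData) for one set of control signals."""
    mux1 = forward_mux(forward_a, read_data1, mem_result, wb_result)
    mux2 = forward_mux(forward_b, read_data2, mem_result, wb_result)
    if alu_src1 == 1:
        alu_in1 = pc
    elif alu_src1 == 2:
        alu_in1 = 0
    else:
        alu_in1 = mux1
    alu_in2 = mux2 if alu_src else immediate
    return alu_in1, alu_in2, mux2


# rs1 forwarded from the memory stage, second operand is the immediate.
print(alu_inputs(1, 0, 10, 20, 111, 222, 0x40, 7, False, 0))  # (111, 7, 20)
# PC-relative operation with rs2 forwarded from writeback.
print(alu_inputs(0, 2, 10, 20, 111, 222, 0x40, 7, True, 1))   # (64, 222, 222)
```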
--- ### Memory Access Code can be found in `nucleusrv/src/main/scala/components/MemoryFetch.scala` ```scala= package nucleusrv.components import chisel3._ import chisel3.util._ class MemoryFetch extends Module { val io = IO(new Bundle { val aluResultIn: UInt = Input(UInt(32.W)) val writeData: UInt = Input(UInt(32.W)) val writeEnable: Bool = Input(Bool()) val readEnable: Bool = Input(Bool()) val readData: UInt = Output(UInt(32.W)) val stall: Bool = Output(Bool()) val f3 = Input(UInt(3.W)) val dccmReq = Decoupled(new MemRequestIO) val dccmRsp = Flipped(Decoupled(new MemResponseIO)) }) io.dccmRsp.ready := true.B val wdata = Wire(Vec(4, UInt(8.W))) val rdata = Wire(UInt(32.W)) val offset = RegInit(0.U(2.W)) val funct3 = RegInit(0.U(3.W)) val offsetSW = io.aluResultIn(1,0) when(!io.dccmRsp.valid){ funct3 := io.f3 offset := io.aluResultIn(1,0) }.otherwise{ funct3 := funct3 offset := offset } wdata(0) := io.writeData(7,0) wdata(1) := io.writeData(15,8) wdata(2) := io.writeData(23,16) wdata(3) := io.writeData(31,24) /* Store Half Word */ when(io.writeEnable && io.f3 === "b000".U){ when(offsetSW === 0.U){ io.dccmReq.bits.activeByteLane := "b0001".U }.elsewhen(offsetSW === 1.U){ wdata(0) := io.writeData(15,8) wdata(1) := io.writeData(7,0) wdata(2) := io.writeData(23,16) wdata(3) := io.writeData(31,24) io.dccmReq.bits.activeByteLane := "b0010".U }.elsewhen(offsetSW === 2.U){ wdata(0) := io.writeData(15,8) wdata(1) := io.writeData(23,16) wdata(2) := io.writeData(7,0) wdata(3) := io.writeData(31,24) io.dccmReq.bits.activeByteLane := "b0100".U }.otherwise{ wdata(0) := io.writeData(15,8) wdata(1) := io.writeData(23,16) wdata(2) := io.writeData(31,24) wdata(3) := io.writeData(7,0) io.dccmReq.bits.activeByteLane := "b1000".U } } /* Store Half Word */ .elsewhen(io.writeEnable && io.f3 === "b001".U){ // offset will either be 0 or 2 since address will be 0x0000 or 0x0002 when(offsetSW === 0.U){ // data to be stored at lower 16 bits (15,0) io.dccmReq.bits.activeByteLane := "b0011".U 
}.elsewhen(offsetSW === 1.U){ // data to be stored at lower 16 bits (15,0) io.dccmReq.bits.activeByteLane := "b0110".U wdata(0) := io.writeData(23,16) wdata(1) := io.writeData(7,0) wdata(2) := io.writeData(15,8) wdata(3) := io.writeData(31,24) }.otherwise{ // data to be stored at upper 16 bits (31,16) io.dccmReq.bits.activeByteLane := "b1100".U wdata(2) := io.writeData(7,0) wdata(3) := io.writeData(15,8) wdata(0) := io.writeData(23,16) wdata(1) := io.writeData(31,24) } } /* Store Word */ .otherwise{ io.dccmReq.bits.activeByteLane := "b1111".U } io.dccmReq.bits.dataRequest := wdata.asUInt() io.dccmReq.bits.addrRequest := (io.aluResultIn & "h00001fff".U) >> 2 io.dccmReq.bits.isWrite := io.writeEnable io.dccmReq.valid := Mux(io.writeEnable | io.readEnable, true.B, false.B) io.stall := (io.writeEnable || io.readEnable) && !io.dccmRsp.valid rdata := Mux(io.dccmRsp.valid, io.dccmRsp.bits.dataResponse, DontCare) when(io.readEnable) { when(funct3 === "b010".U) { // load word io.readData := rdata } .elsewhen(funct3 === "b000".U) { // load byte when(offset === "b00".U) { // addressing memory with 0,4,8... io.readData := Cat(Fill(24,rdata(7)),rdata(7,0)) } .elsewhen(offset === "b01".U) { // addressing memory with 1,5,9... io.readData := Cat(Fill(24, rdata(15)),rdata(15,8)) } .elsewhen(offset === "b10".U) { // addressing memory with 2,6,10... io.readData := Cat(Fill(24, rdata(23)),rdata(23,16)) } .elsewhen(offset === "b11".U) { // addressing memory with 3,7,11... io.readData := Cat(Fill(24, rdata(31)),rdata(31,24)) } .otherwise { // this condition would never occur but using to avoid Chisel generating VOID errors io.readData := DontCare } } .elsewhen(funct3 === "b100".U) { //load byte unsigned when(offset === "b00".U) { // addressing memory with 0,4,8... io.readData := Cat(Fill(24, 0.U), rdata(7, 0)) }.elsewhen(offset === "b01".U) { // addressing memory with 1,5,9... 
io.readData := Cat(Fill(24, 0.U), rdata(15, 8)) }.elsewhen(offset === "b10".U) { // addressing memory with 2,6,10... io.readData := Cat(Fill(24, 0.U), rdata(23, 16)) }.elsewhen(offset === "b11".U) { // addressing memory with 3,7,11... io.readData := Cat(Fill(24, 0.U), rdata(31, 24)) } .otherwise { // this condition would never occur but using to avoid Chisel generating VOID errors io.readData := DontCare } } .elsewhen(funct3 === "b101".U) { // load halfword unsigned when(offset === "b00".U) { // addressing memory with 0,4,8... io.readData := Cat(Fill(16, 0.U),rdata(15,0)) } .elsewhen(offset === "b01".U) { // addressing memory with 2,6,10... io.readData := Cat(Fill(16, 0.U),rdata(23,8)) } .elsewhen(offset === "b10".U) { // addressing memory with 2,6,10... io.readData := Cat(Fill(16, 0.U),rdata(31,16)) } .otherwise { // this condition would never occur but using to avoid Chisel generating VOID errors io.readData := DontCare } } .elsewhen(funct3 === "b001".U) { // load halfword when(offset === "b00".U) { // addressing memory with 0,4,8... io.readData := Cat(Fill(16, rdata(15)),rdata(15,0)) } .elsewhen(offset === "b01".U) { // addressing memory with 1,3,7... io.readData := Cat(Fill(16, rdata(23)),rdata(23,8)) } .elsewhen(offset === "b10".U) { // addressing memory with 2,6,10... io.readData := Cat(Fill(16, rdata(31)),rdata(31,16)) } .otherwise { // this condition would never occur but using to avoid Chisel generating VOID errors io.readData := DontCare } } .otherwise { // unknown func3 bits io.readData := DontCare } } .otherwise { io.readData := DontCare } when(io.writeEnable && io.aluResultIn(31, 28) === "h8".asUInt()){ printf("%x\n", io.writeData) } } ``` The `MemoryFetch` module handles data memory `DCCM (Data Closely Coupled Memory)` read and write operations. 
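The load path in the listing picks a byte or halfword out of the 32-bit response word by address offset and then sign- or zero-extends it. The Python sketch below models that behavior; `sign_extend` and `load_result` are names invented here, and `None` stands in for the `DontCare` cases of the Chisel code.

```python
def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide value to 32 bits."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & 0xFFFFFFFF


def load_result(rdata: int, funct3: int, offset: int):
    """Model of the readData selection for lw/lb/lbu/lh/lhu."""
    if funct3 == 0b010:                       # lw
        return rdata & 0xFFFFFFFF
    if funct3 in (0b000, 0b100):              # lb / lbu
        byte = (rdata >> (8 * offset)) & 0xFF
        return sign_extend(byte, 8) if funct3 == 0b000 else byte
    if funct3 in (0b001, 0b101):              # lh / lhu
        if offset == 3:                       # not handled in the listing
            return None
        half = (rdata >> (8 * offset)) & 0xFFFF
        return sign_extend(half, 16) if funct3 == 0b001 else half
    return None                               # unknown funct3


word = 0x80FF7F01
print(hex(load_result(word, 0b000, 1)))  # 0x7f        (lb, positive byte)
print(hex(load_result(word, 0b000, 3)))  # 0xffffff80  (lb, sign-extended)
print(hex(load_result(word, 0b100, 3)))  # 0x80        (lbu, zero-extended)
print(hex(load_result(word, 0b001, 2)))  # 0xffff80ff  (lh, upper halfword)
print(hex(load_result(word, 0b101, 1)))  # 0xff7f      (lhu, middle halfword)
```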
---

```scala
/* Store Half Word */
when(io.writeEnable && io.f3 === "b000".U){
  when(offsetSW === 0.U){
    io.dccmReq.bits.activeByteLane := "b0001".U
  }.elsewhen(offsetSW === 1.U){
    wdata(0) := io.writeData(15,8)
    wdata(1) := io.writeData(7,0)
    wdata(2) := io.writeData(23,16)
    wdata(3) := io.writeData(31,24)
    io.dccmReq.bits.activeByteLane := "b0010".U
  }.elsewhen(offsetSW === 2.U){
    wdata(0) := io.writeData(15,8)
    wdata(1) := io.writeData(23,16)
    wdata(2) := io.writeData(7,0)
    wdata(3) := io.writeData(31,24)
    io.dccmReq.bits.activeByteLane := "b0100".U
  }.otherwise{
    wdata(0) := io.writeData(15,8)
    wdata(1) := io.writeData(23,16)
    wdata(2) := io.writeData(31,24)
    wdata(3) := io.writeData(7,0)
    io.dccmReq.bits.activeByteLane := "b1000".U
  }
}
```

I think this code actually handles a `Store Byte (SB)` operation, not a `Store Half Word (SH)` as the comment suggests, because `funct3 = 000` is the encoding of `SB`. When `io.writeEnable` is true and `io.f3 === "b000".U`, the operation stores a single byte (8 bits) at the specified memory address.

`offsetSW` is the least significant 2 bits of the `ALU` result (the memory address) and determines where within the 32-bit word the byte is stored. `activeByteLane` is a 4-bit mask indicating which byte of the 32-bit word should be written. In the branches with a nonzero offset, the byte to store, `io.writeData(7,0)`, is routed to the `wdata` entry that matches the active lane.

* If `offsetSW === 0.U`, the byte is stored in the least significant byte (bits 7-0) of the 32-bit word, and `io.dccmReq.bits.activeByteLane := "b0001".U`.
* Else if `offsetSW === 1.U`, the byte is stored in the second least significant byte (bits 15-8) of the 32-bit word, and `io.dccmReq.bits.activeByteLane := "b0010".U`.
* Else if `offsetSW === 2.U`, the byte is stored in the second most significant byte (bits 23-16) of the 32-bit word, and `io.dccmReq.bits.activeByteLane := "b0100".U`.
* Otherwise (`offsetSW === 3.U`), the byte is stored in the most significant byte (bits 31-24) of the 32-bit word, and `io.dccmReq.bits.activeByteLane := "b1000".U`.

---

```scala
/* Store Half Word */
.elsewhen(io.writeEnable && io.f3 === "b001".U){
  // offset will either be 0 or 2 since address will be 0x0000 or 0x0002
  when(offsetSW === 0.U){
    // data to be stored at lower 16 bits (15,0)
    io.dccmReq.bits.activeByteLane := "b0011".U
  }.elsewhen(offsetSW === 1.U){
    // data to be stored at lower 16 bits (15,0)
    io.dccmReq.bits.activeByteLane := "b0110".U
    wdata(0) := io.writeData(23,16)
    wdata(1) := io.writeData(7,0)
    wdata(2) := io.writeData(15,8)
    wdata(3) := io.writeData(31,24)
  }.otherwise{
    // data to be stored at upper 16 bits (31,16)
    io.dccmReq.bits.activeByteLane := "b1100".U
    wdata(2) := io.writeData(7,0)
    wdata(3) := io.writeData(15,8)
    wdata(0) := io.writeData(23,16)
    wdata(1) := io.writeData(31,24)
  }
}
```

When `io.writeEnable` is true and `io.f3 === "b001".U`, the code handles the `Store Half Word (SH)` operation. The comment states that `offsetSW` will be either 0 or 2, since the address will be `0x0000` or `0x0002`, yet the code also contains a branch for `offsetSW === 1.U`.

* If `offsetSW === 0.U`, the half word is stored in the lower 16 bits (15-0) of the 32-bit word. `io.dccmReq.bits.activeByteLane := "b0011".U` indicates that the two least significant bytes should be written.
* If `offsetSW === 1.U`, the lane mask is `"b0110"` and the half word is routed to `wdata(1)` and `wdata(2)`, so it lands in bits 23-8 of the word, even though the comment on that branch still says "lower 16 bits".
* Otherwise (`offsetSW === 2.U`, and also 3, since `.otherwise` catches every remaining value), the half word is stored in the upper 16 bits (31-16) of the 32-bit word. `io.dccmReq.bits.activeByteLane := "b1100".U`, and `wdata` is rearranged so that `io.writeData(7,0)` goes to `wdata(2)` and `io.writeData(15,8)` goes to `wdata(3)`.
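The lane selection for the three store types can be summarised in a small software model. This is a minimal sketch in plain Scala (no Chisel) and not code from the repository: `StoreModel`, `Request`, and `store` are hypothetical names. It assumes that the default value of `wdata` is `io.writeData` with its bytes in order, which is set earlier in the module and not shown in the excerpts here. It also only models the bytes in the active lanes; the hardware fills the inactive lanes with permuted bytes of `io.writeData`, which the memory is expected to ignore.

```scala
// Software model of the store path in MemoryFetch (illustrative only).
object StoreModel {
  /** byteLanes mirrors activeByteLane; data holds only the active lanes. */
  final case class Request(byteLanes: Int, data: Int)

  def store(funct3: Int, addr: Int, writeData: Int): Request = {
    val off = addr & 0x3 // offsetSW: two low bits of the address
    funct3 match {
      case 0x0 =>
        // SB: one lane, the low byte moves to the addressed lane.
        Request(1 << off, (writeData & 0xff) << (8 * off))
      case 0x1 =>
        // SH: two adjacent lanes. The hardware's .otherwise branch treats
        // offset 3 like offset 2, so the shift is capped at two bytes.
        val o = math.min(off, 2)
        Request(0x3 << o, (writeData & 0xffff) << (8 * o))
      case _ =>
        // SW (and any other funct3): all four lanes, data unchanged.
        Request(0xf, writeData)
    }
  }
}
```

For instance, `StoreModel.store(0x1, 0x1001, 0xAABBCCDD)` yields the lane mask `0x6` (`b0110`) with the half word `0xCCDD` placed in bits 23-8, matching the `offsetSW === 1.U` branch of the listing.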
---

```scala
/* Store Word */
.otherwise{
  io.dccmReq.bits.activeByteLane := "b1111".U
}
```

This is the `Store Word (SW)` case. `io.dccmReq.bits.activeByteLane := "b1111".U` indicates that all four bytes of the 32-bit word should be active for writing.

---

```scala
io.dccmReq.bits.dataRequest := wdata.asUInt()
io.dccmReq.bits.addrRequest := (io.aluResultIn & "h00001fff".U) >> 2
io.dccmReq.bits.isWrite := io.writeEnable
io.dccmReq.valid := Mux(io.writeEnable | io.readEnable, true.B, false.B)
```

These lines prepare the memory request. They set the data to be written (used only for a write), compute the word address by keeping the low 13 bits of the `ALU` result and shifting right by 2, set the write flag, and mark the request as valid whenever there is an actual read or write to perform.

---

```scala
io.stall := (io.writeEnable || io.readEnable) && !io.dccmRsp.valid
```

`io.stall` is asserted while a read or write is pending and the `DCCM` response is not yet valid, so the processor waits for the memory operation to complete before proceeding.

---

```scala
rdata := Mux(io.dccmRsp.valid, io.dccmRsp.bits.dataResponse, DontCare)
```

Selects the data from the `DCCM` response if it is valid, otherwise sets it to `DontCare`.

---

```scala
when(io.readEnable) {
  when(funct3 === "b010".U) {
    // load word
    io.readData := rdata
  }
```

When `funct3 === "b010"`, it performs a full 32-bit word load, `Load Word (LW)`.

---

```scala
.elsewhen(funct3 === "b000".U) { // load byte (sign-extended)
  // ...
}
```

When `funct3 === "b000"`, it loads a single byte and sign-extends it to 32 bits, `Load Byte (LB)`. It uses `offset` to determine which byte of the 32-bit word to load.

---

```scala
.elsewhen(funct3 === "b100".U) { //load byte unsigned
  // ...
}
```

Similar to `Load Byte (LB)`, but zero-extends the byte instead of sign-extending it, `Load Byte Unsigned (LBU)`.

---

```scala
.elsewhen(funct3 === "b101".U) { // load halfword unsigned
  // ...
}
```

Loads a 16-bit halfword and zero-extends it to 32 bits, `Load Halfword Unsigned (LHU)`.

---

```scala
.elsewhen(funct3 === "b001".U) { // load halfword
  // ...
}
```

Loads a 16-bit halfword and sign-extends it to 32 bits, `Load Halfword (LH)`.

## RV32IM Instruction Introduction

### Multiplication Operations

**mul (Multiplication)**
![image](https://hackmd.io/_uploads/BJ02aHtDye.png)

**Format:** `mul rd,rs1,rs2`
**Description:** performs a 32-bit × 32-bit multiplication and places the lower 32 bits of the product in the destination register. The lower 32 bits are the same whether `rs1` and `rs2` are treated as signed or unsigned numbers.
**Implementation:** `x[rd] = x[rs1] * x[rs2]`

---

**mulh (Multiplication Higher)**
![image](https://hackmd.io/_uploads/HJA6JIFvkl.png)

**Format:** `mulh rd,rs1,rs2`
**Description:** performs a 32-bit × 32-bit multiplication and places the upper 32 bits of the 64-bit product in the destination register (both `rs1` and `rs2` treated as signed numbers).
**Implementation:** `x[rd] = (x[rs1] s*s x[rs2]) >>s 32`

---

**mulhsu (Multiplication Higher Signed Unsigned)**
![image](https://hackmd.io/_uploads/Sye3LXcwJe.png)

**Format:** `mulhsu rd,rs1,rs2`
**Description:** performs a 32-bit × 32-bit multiplication and places the upper 32 bits of the 64-bit product in the destination register (`rs1` treated as a signed number, `rs2` treated as an unsigned number).
**Implementation:** `x[rd] = (x[rs1] s*u x[rs2]) >>s 32`

---

**mulhu (Multiplication Higher Unsigned)**
![image](https://hackmd.io/_uploads/HJOE_X5vyl.png)

**Format:** `mulhu rd,rs1,rs2`
**Description:** performs a 32-bit × 32-bit multiplication and places the upper 32 bits of the 64-bit product in the destination register (both `rs1` and `rs2` treated as unsigned numbers).
**Implementation:** `x[rd] = (x[rs1] u*u x[rs2]) >>u 32`

---

### Division Operations

**div (Division)**
![image](https://hackmd.io/_uploads/HJeTKXqw1g.png)

**Format:** `div rd,rs1,rs2`
**Description:** performs signed integer division of 32 bits by 32 bits (rounding towards zero).
**Implementation:** `x[rd] = x[rs1] /s x[rs2]`

---

**divu (Division Unsigned)**
![image](https://hackmd.io/_uploads/SJOzjQ5P1g.png)

**Format:** `divu rd, rs1, rs2`
**Description:** performs unsigned integer division of 32 bits by 32 bits (rounding towards zero).
**Implementation:** `x[rd] = x[rs1] /u x[rs2]`

---

**rem (Remainder)**
![image](https://hackmd.io/_uploads/r1VQoQcPyl.png)

**Format:** `rem rd, rs1, rs2`
**Description:** provides the remainder of the corresponding signed division `div` (the sign of the result equals the sign of the dividend `rs1`).
**Implementation:** `x[rd] = x[rs1] %s x[rs2]`

---

**remu (Remainder Unsigned)**
![image](https://hackmd.io/_uploads/ryxNjQ5Dkl.png)

**Format:** `remu rd, rs1, rs2`
**Description:** provides the remainder of the corresponding unsigned division `divu`.
**Implementation:** `x[rd] = x[rs1] %u x[rs2]`

## References

* https://github.com/merledu/nucleusrv
* https://hackmd.io/@sysprog/r1mlr3I7p#Single-cycle-RISC-V-CPU
* https://blog.csdn.net/raw_inputhello/article/details/135848711
* https://verilator.org/guide/latest/install.html
* https://github.com/riscv-collab/riscv-gnu-toolchain
* https://github.com/riscv-non-isa/riscv-arch-test/tree/1.0
* https://msyksphinz-self.github.io/riscv-isadoc/html/rvm.html
* https://docs.openhwgroup.org/projects/cva6-user-manual/01_cva6_user/RISCV_Instructions_RV32M.html