# ChiselRiscV
> 郭君瑋

[GitHub](https://github.com/ks63154304/MyCPU)

## Mission
In my term project, my goal is to study [ChiselRiscV](https://github.com/nozomioshi/ChiselRiscV) and the books mentioned on the project page to understand how to build a RISC-V processor using Chisel, and to ensure that it passes the [riscv-arch-test](https://github.com/riscv-non-isa/riscv-arch-test) and supports the RV32IM and CSR instructions. Finally, I select and modify at least three RISC-V programs from the course quizzes and make them run on my improved ChiselRiscV processor.

The entire process can be divided into the following three main steps:
1. Study how to make a Chisel-based RISC-V CPU
2. Make my CPU pass riscv-arch-test
3. Run three programs selected from the course quizzes on it

## Prerequisites
Before beginning my study and implementation, it is essential to build my environment. The following are the OS and the software I used in this project.
- Ubuntu Linux 22.04
- [riscv-gnu-toolchain](https://github.com/riscv/riscv-gnu-toolchain)

For chisel
- verilator
- gtkwave
- [sbt](https://sdkman.io/install)
- jdk

For riscv-arch-test
- python3
- [SAIL](https://github.com/riscv/sail-riscv.git)
- [RISCOF](https://riscof.readthedocs.io/en/latest/installation.html)

I use Docker to set up my own environment. The detailed installation steps for the software above, and for other packages not mentioned, are recorded in the following Dockerfile.
::: spoiler Dockerfile ```!=bash # Building Stage FROM ubuntu:22.04 AS stage_building ARG RISCV_GNU_DEP="wget curl make autoconf automake autotools-dev python3 python3-pip \ libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf \ libtool patchutils bc zlib1g-dev libexpat-dev ninja-build git libglib2.0-dev" ARG SAIL_DEP="opam build-essential libgmp-dev z3 pkg-config zlib1g-dev" RUN apt update && apt -y install ${RISCV_GNU_DEP} ${SAIL_DEP} # riscv-gnu-toolchain RUN git clone https://github.com/riscv/riscv-gnu-toolchain && \ cd riscv-gnu-toolchain && \ ./configure --prefix=/usr/local/riscv --enable-multilib --with-arch=rv32gc --with-abi=ilp32d && \ make -j `nproc` && \ make install # SAIL C-emulator RUN opam init -y --disable-sandboxing && \ opam switch create ocaml-base-compiler.4.08.1 && \ opam install sail -y && \ eval $(opam config env) && \ git clone https://github.com/riscv/sail-riscv.git /usr/local/sail-riscv && \ cd /usr/local/sail-riscv && \ ARCH=RV32 make -j `nproc` # Final Stage FROM ubuntu:22.04 AS stage_final # copy tools built in last stage to final stage COPY --from=stage_building /usr/local/riscv /usr/local/riscv COPY --from=stage_building /usr/local/sail-riscv /usr/local/sail-riscv # set PATH to include RISC-V GNU Toolchain and SAIL C-emulator ENV PATH="$PATH:/usr/local/riscv/bin" ENV PATH="$PATH:/usr/local/sail-riscv/c_emulator" # zip and unzip are required for the sdkman to be installed from script RUN \ apt update && \ DEBIAN_FRONTEND=noninteractive apt-get install -y \ build-essential \ verilator \ curl \ zip \ unzip \ sudo \ git \ python3 \ python3-pip \ && \ rm -rf /var/lib/apt/lists/* # this SHELL command is needed to allow `source` to work properly # reference: https://stackoverflow.com/questions/20635472/using-the-run-instruction-in-a-dockerfile-with-source-does-not-work/45087082#45087082 SHELL ["/bin/bash", "-c"] # add a user whose uid and gid are same as the master user ARG UID GID NAME=user RUN groupadd -g $GID -o 
$NAME
RUN useradd -u $UID -m -g $NAME -G plugdev $NAME && \
    echo "$NAME ALL = NOPASSWD: ALL" > /etc/sudoers.d/user && \
    chmod 0440 /etc/sudoers.d/user
RUN chown -R $NAME:$NAME /home/$NAME
USER $NAME

# reference: https://sdkman.io/install
RUN curl -s "https://get.sdkman.io" | bash
RUN source "$HOME/.sdkman/bin/sdkman-init.sh" && \
    sdk install java $(sdk list java | grep -o "\b8\.[0-9]*\.[0-9]*\-tem" | head -1) && \
    sdk install sbt

# install riscof and riscv-arch-test
RUN pip3 install --upgrade pip && \
    pip3 install riscof
ENV PATH="$PATH:/home/user/.local/bin"

WORKDIR "/home/user/workspace"
ENTRYPOINT ["/bin/bash"]
```
:::

:::warning
The official RISCOF documentation contains a small mistake regarding the installation of SAIL: the version of ocaml-base-compiler needs to be at least 4.08.1, not 4.06.1.
Additionally, pip installs RISCOF into `/home/user/.local/bin`, so you need to run `export PATH="$PATH:/home/user/.local/bin"` to make the `riscof` command available in your terminal.
:::

## ChiselRiscV
[ChiselRiscV](https://github.com/nozomioshi/ChiselRiscV) is a 32-bit RISC-V CPU implemented according to the book ["CPU Design with RISC-V and Chisel - First step to custom CPU implementation with open-source ISA"](https://github.com/chadyuu/riscv-chisel-book).

To understand how to build a RISC-V CPU with Chisel, the first step is to clone this repository.
```bash
git clone https://github.com/nozomioshi/ChiselRiscV.git
```
After cloning the repository, we can see this file structure:
```
.
├── build.sbt
├── doc
├── dockerfile
├── .gitignore
├── README.md
├── results
├── src
│   ├── main
│   │   ├── c
│   │   ├── scala
│   │   └── shell
│   └── test
│       ├── resources
│       └── scala
└── target
```
In a Chisel project, the source code is divided into two parts, `main` and `test`. `main` contains the code for all hardware behavior, and `test` is similar to a testbench in Verilog: it is responsible for providing inputs and verifying the outputs.
`build.sbt` contains the build configuration and the versions of Scala and Chisel. To run all the tests, use
```bash
sbt test
```
or use testOnly
```bash
sbt "testOnly ctest.tests.HexTest"
```
to run a specified test.

### Fetch
Memory is an array of `UInt(8.W)` elements, each representing one byte. RISC-V is little-endian, which means the least significant byte is stored at the lowest address.

8 bit fetch.hex

| Address | Data |
| :-----: | :--: |
| 0 | 11 |
| 1 | 12 |
| 2 | 13 |
| 3 | 14 |
| 4 | 21 |
| ... | ... |
| 11 | 34 |

8 bit fetch.hex after being organized into 32-bit words.

| Address | Data |
| :-----: | :------: |
| 0 | 14131211 |
| 4 | 24232221 |
| 8 | 34333231 |

Hence, an instruction fetch concatenates four consecutive bytes, with the byte at the highest address placed in the most significant position.
```scala
imem.inst := Cat(
  mem(imem.addr + 3.U),
  mem(imem.addr + 2.U),
  mem(imem.addr + 1.U),
  mem(imem.addr      )
)
```
`pcReg` is initialized to `StartAddr` when the test starts and is incremented by 4 every cycle. In each cycle, the CPU fetches the instruction at the address held in `pcReg`.
```scala
val pcReg = RegInit(StartAddr)
pcReg := pcReg + 4.U
imem.addr := pcReg
val inst = imem.inst
```

### Decode
RV32I defines 32 registers, which are used to store data and addresses. Each register is 32 bits wide.

:::spoiler register
| Address | register |
|:-------:|:--------:|
| 0 | zero |
| 1 | ra |
| 2 | sp |
| 3 | gp |
| 4 | tp |
| 5 | t0 |
| 6 | t1 |
| 7 | t2 |
| 8 | s0/fp |
| 9 | s1 |
| 10 | a0 |
| 11 | a1 |
| 12 | a2 |
| 13 | a3 |
| 14 | a4 |
| 15 | a5 |
| 16 | a6 |
| 17 | a7 |
| 18 | s2 |
| 19 | s3 |
| 20 | s4 |
| 21 | s5 |
| 22 | s6 |
| 23 | s7 |
| 24 | s8 |
| 25 | s9 |
| 26 | s10 |
| 27 | s11 |
| 28 | t3 |
| 29 | t4 |
| 30 | t5 |
| 31 | t6 |
:::

The register file is declared in `Core` as a memory. The table above gives the index of the register we want to access.
```scala
val regFile = Mem(32, UInt(WordLen.W))
```
The RV32I base instruction set is composed of 32-bit instructions. The instruction format is divided into six types: R-Type, I-Type, S-Type, U-Type, J-Type and B-Type.
J-Type and B-Type are immediate-encoding variants of U-Type and S-Type respectively, so we can say that there are four basic types of instructions.

:::spoiler Instruction format
[R-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+---------------------+--------------+--------------+--------+--------------+---------------------+
| funct7              | rs2          | rs1          | funct3 | rd           | opcode              |
+---------------------+--------------+--------------+--------+--------------+---------------------+

[I-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+------------------------------------+--------------+--------+--------------+---------------------+
| imm_i                              | rs1          | funct3 | rd           | opcode              |
+------------------------------------+--------------+--------+--------------+---------------------+

[S-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+---------------------+--------------+--------------+--------+--------------+---------------------+
| imm_s(11:5)         | rs2          | rs1          | funct3 | imm_s(4:0)   | opcode              |
+---------------------+--------------+--------------+--------+--------------+---------------------+

[U-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+------------------------------------------------------------+--------------+---------------------+
| imm_u(31:12)                                               | rd           | opcode              |
+------------------------------------------------------------+--------------+---------------------+

[J-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+------------------------------------------------------------+--------------+---------------------+
| imm_j(20 + 10:1 + 11 + 19:12)                              | rd           | opcode              |
+------------------------------------------------------------+--------------+---------------------+

[B-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+---------------------+--------------+--------------+--------+--------------+---------------------+
| imm_b(12 + 10:5)    | rs2          | rs1          | funct3 | imm_b(4:1+11)| opcode              |
+---------------------+--------------+--------------+--------+--------------+---------------------+
:::

The CPU parses each instruction according to these formats to determine the registers and data it needs. Reads of register 0 always return zero.
```scala
val rs1Addr = inst(19, 15)
val rs2Addr = inst(24, 20)
val wbAddr  = inst(11, 7)
val rs1Data = Mux(rs1Addr =/= 0.U, regFile(rs1Addr), 0.U)
val rs2Data = Mux(rs2Addr =/= 0.U, regFile(rs2Addr), 0.U)
```
The immediates are extracted in the same way, then sign-extended (or zero-extended for `immZ`) to 32 bits.
```scala
val immI        = inst(31, 20)
val immIsext    = Cat(Fill(20, immI(11)), immI)
val immS        = Cat(inst(31, 25), inst(11, 7))
val immSsext    = Cat(Fill(20, immS(11)), immS)
val immB        = Cat(inst(31), inst(7), inst(30, 25), inst(11, 8))
val immBsext    = Cat(Fill(19, immB(11)), immB, 0.U(1.W))
val immJ        = Cat(inst(31), inst(19, 12), inst(20), inst(30, 21))
val immJsext    = Cat(Fill(11, immJ(19)), immJ, 0.U(1.W))
val immU        = inst(31, 12)
val immUshifted = Cat(immU, 0.U(12.W))
val immZ        = inst(19, 15)
val immZext     = Cat(Fill(27, 0.U), immZ)
```
Since each instruction behaves differently, control signals are needed to select the circuit paths. The following signals are used in this CPU.
```scala val exeFun :: op1Sel :: op2Sel :: memWen :: regFileWen :: wbSel :: csrCmd :: Nil = controlSignals ``` A lookup table is required to decode every instruction and determining all the signals. ```scala val controlSignals = ListLookup(inst, List(AluX, Op1Rs1, Op2Rs2, MenX, RenS, WbX, CsrX), Array( Lw -> List(AluAdd, Op1Rs1, Op2Imi, MenX, RenS, WbMem, CsrX), Sw -> List(AluAdd, Op1Rs1, Op2Ims, MenS, RenX, WbX, CsrX), Add -> List(AluAdd, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Addi -> List(AluAdd, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX), Sub -> List(AluSub, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), And -> List(AluAnd, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Or -> List(AluOr, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Xor -> List(AluXor, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Andi -> List(AluAnd, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX), Ori -> List(AluOr, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX), Xori -> List(AluXor, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX), Sll -> List(AluSll, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Srl -> List(AluSrl, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Sra -> List(AluSra, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Slli -> List(AluSll, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX), Srli -> List(AluSrl, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX), Srai -> List(AluSra, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX), Slt -> List(AluSlt, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Sltu -> List(AluSltu, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Slti -> List(AluSlt, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX), Sltiu -> List(AluSltu, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX), Beq -> List(BrBeq, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX), Bne -> List(BrBne, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX), Blt -> List(BrBlt, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX), Bge -> List(BrBge, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX), Bltu -> List(BrBltu, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX), Bgeu -> List(BrBgeu, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX), Jal -> List(AluAdd, Op1Pc, 
Op2Imj, MenX, RenS, WbPc,  CsrX),
    Jalr   -> List(AluJalr,  Op1Rs1, Op2Imi, MenX, RenS, WbPc,  CsrX),
    Lui    -> List(AluAdd,   Op1X,   Op2Imu, MenX, RenS, WbAlu, CsrX),
    AuiPc  -> List(AluAdd,   Op1Pc,  Op2Imu, MenX, RenS, WbAlu, CsrX),
    CsrRw  -> List(AluCopy1, Op1Rs1, Op2X,   MenX, RenS, WbCsr, CsrW),
    CsrRs  -> List(AluCopy1, Op1Rs1, Op2X,   MenX, RenS, WbCsr, CsrS),
    CsrRc  -> List(AluCopy1, Op1Rs1, Op2X,   MenX, RenS, WbCsr, CsrC),
    CsrRwi -> List(AluCopy1, Op1Imz, Op2X,   MenX, RenS, WbCsr, CsrW),
    CsrRsi -> List(AluCopy1, Op1Imz, Op2X,   MenX, RenS, WbCsr, CsrS),
    CsrRci -> List(AluCopy1, Op1Imz, Op2X,   MenX, RenS, WbCsr, CsrC),
    Ecall  -> List(AluX,     Op1X,   Op2X,   MenX, RenX, WbX,   CsrE)
  )
)
```
Here, we take an `add` instruction as an example to demonstrate the behavior of the lookup table. Consider the following `add` instruction.
```
00a50e33     add t3,a0,a0
# 0x00a50e33 = 0000 0000 1010 0101 0000 1110 0011 0011
```
The bit pattern of `add` is `b0000000??????????000?????0110011`, where `?` marks a don't-care bit. The CPU matches the instruction against the patterns in the table to identify which instruction it is, and gets back the corresponding set of control signals.
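The matching step can be illustrated in software. The following is a minimal plain-Scala sketch of mask/value matching, assuming each `?` is simply excluded from the comparison; it is only a model for intuition, not how Chisel's `BitPat` is implemented internally, and the names `BitPatModel`, `Pattern` and `parse` are hypothetical.

```scala
// Software model of don't-care pattern matching (illustrative only).
object BitPatModel {
  // mask has a 1 wherever the pattern fixes a bit; value holds the fixed bits.
  final case class Pattern(mask: Long, value: Long) {
    def matches(inst: Long): Boolean = (inst & mask) == value
  }

  // Parse a 32-bit pattern string such as "b0000000??????????000?????0110011".
  def parse(s: String): Pattern = {
    val bits = s.stripPrefix("b")
    require(bits.length == 32, "expected 32 pattern bits")
    bits.foldLeft(Pattern(0L, 0L)) { (p, c) =>
      c match {
        case '0'   => Pattern((p.mask << 1) | 1L, p.value << 1)
        case '1'   => Pattern((p.mask << 1) | 1L, (p.value << 1) | 1L)
        case '?'   => Pattern(p.mask << 1, p.value << 1)
        case other => throw new IllegalArgumentException(s"bad pattern char: $other")
      }
    }
  }

  val Add: Pattern = parse("b0000000??????????000?????0110011")
  val Sub: Pattern = parse("b0100000??????????000?????0110011")

  def main(args: Array[String]): Unit = {
    val inst = 0x00a50e33L // add t3, a0, a0
    println(s"matches Add: ${Add.matches(inst)}")
    println(s"matches Sub: ${Sub.matches(inst)}")
  }
}
```

For `add`, only `funct7`, `funct3` and the opcode are fixed, so the register fields of `0x00a50e33` do not take part in the comparison; `sub` differs from `add` in a single `funct7` bit and therefore does not match.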
::: spoiler bit pattern ```scala val Lw = BitPat("b?????????????????010?????0000011") val Sw = BitPat("b?????????????????010?????0100011") // Add val Add = BitPat("b0000000??????????000?????0110011") val Addi = BitPat("b?????????????????000?????0010011") // Subtract val Sub = BitPat("b0100000??????????000?????0110011") // Logical val And = BitPat("b0000000??????????111?????0110011") val Or = BitPat("b0000000??????????110?????0110011") val Xor = BitPat("b0000000??????????100?????0110011") val Andi = BitPat("b?????????????????111?????0010011") val Ori = BitPat("b?????????????????110?????0010011") val Xori = BitPat("b?????????????????100?????0010011") // Shift val Sll = BitPat("b0000000??????????001?????0110011") val Srl = BitPat("b0000000??????????101?????0110011") val Sra = BitPat("b0100000??????????101?????0110011") val Slli = BitPat("b0000000??????????001?????0010011") val Srli = BitPat("b0000000??????????101?????0010011") val Srai = BitPat("b0100000??????????101?????0010011") // Compare val Slt = BitPat("b0000000??????????010?????0110011") val Sltu = BitPat("b0000000??????????011?????0110011") val Slti = BitPat("b?????????????????010?????0010011") val Sltiu = BitPat("b?????????????????011?????0010011") // Branch val Beq = BitPat("b?????????????????000?????1100011") val Bne = BitPat("b?????????????????001?????1100011") val Blt = BitPat("b?????????????????100?????1100011") val Bge = BitPat("b?????????????????101?????1100011") val Bltu = BitPat("b?????????????????110?????1100011") val Bgeu = BitPat("b?????????????????111?????1100011") // Jump val Jal = BitPat("b?????????????????????????1101111") val Jalr = BitPat("b?????????????????000?????1100111") // Load immediate val Lui = BitPat("b?????????????????????????0110111") val AuiPc = BitPat("b?????????????????????????0010111") // CSR val CsrRw = BitPat("b?????????????????001?????1110011") val CsrRwi = BitPat("b?????????????????101?????1110011") val CsrRs = BitPat("b?????????????????010?????1110011") val CsrRsi = 
BitPat("b?????????????????110?????1110011")
val CsrRc  = BitPat("b?????????????????011?????1110011")
val CsrRci = BitPat("b?????????????????111?????1110011")
// Exception
val Ecall = BitPat("b00000000000000000000000001110011")
```
:::

Because instruction formats and usage differ, the operation is sometimes performed not on two registers but on a register and an immediate value.
```scala
val op1Data = MuxCase(0.U, Seq(
  (op1Sel === Op1Rs1) -> rs1Data,
  (op1Sel === Op1Pc)  -> pcReg,
  (op1Sel === Op1Imz) -> immZext
))
val op2Data = MuxCase(0.U, Seq(
  (op2Sel === Op2Rs2) -> rs2Data,
  (op2Sel === Op2Imi) -> immIsext,
  (op2Sel === Op2Ims) -> immSsext,
  (op2Sel === Op2Imj) -> immJsext,
  (op2Sel === Op2Imu) -> immUshifted
))
```

### Execute
After `op1Data` and `op2Data` are determined, they are sent to the ALU. The design of the ALU is shown in the following code, where the control signal `exeFun` determines the type of operation to be executed.
```scala
aluOut := MuxCase(0.U, Seq(
  (exeFun === AluAdd)   -> (op1Data + op2Data),
  (exeFun === AluSub)   -> (op1Data - op2Data),
  (exeFun === AluAnd)   -> (op1Data & op2Data),
  (exeFun === AluOr)    -> (op1Data | op2Data),
  (exeFun === AluXor)   -> (op1Data ^ op2Data),
  (exeFun === AluSll)   -> (op1Data << op2Data(4, 0))(31, 0),
  (exeFun === AluSrl)   -> (op1Data >> op2Data(4, 0)),
  (exeFun === AluSra)   -> (op1Data.asSInt >> op2Data(4, 0)).asUInt,
  (exeFun === AluSlt)   -> (op1Data.asSInt < op2Data.asSInt).asUInt,
  (exeFun === AluSltu)  -> (op1Data < op2Data).asUInt,
  (exeFun === AluJalr)  -> ((op1Data + op2Data) & ~1.U(WordLen.W)),
  (exeFun === AluCopy1) -> op1Data
))
```
For B-format instructions, a branch comparator is also required.
```scala
brFlag := MuxCase(false.B, Seq(
  (exeFun === BrBeq)  ->  (op1Data === op2Data),
  (exeFun === BrBne)  ->  (op1Data =/= op2Data),
  (exeFun === BrBlt)  ->  (op1Data.asSInt < op2Data.asSInt),
  (exeFun === BrBge)  -> !(op1Data.asSInt < op2Data.asSInt),
  (exeFun === BrBltu) ->  (op1Data < op2Data),
  (exeFun === BrBgeu) -> !(op1Data < op2Data)
))
brTarget := pcReg + immBsext
```
Because B-format and J-format instructions (and `jalr`) may jump to another address, the update of `pcReg` has to be modified. If the instruction is `Jal` or `Jalr`, the jump address is written directly to `pcReg`, and a taken B-format branch does the same with its branch target.
```scala
val pcReg = RegInit(StartAddr)
// pcReg := pcReg + 4.U   // replaced by the MuxCase below
imem.addr := pcReg
val inst = imem.inst

val jmpFlag = inst === Jal || inst === Jalr
val eCallFlag = inst === Ecall
val aluOut = Wire(UInt(WordLen.W))

pcReg := MuxCase(pcPlus4, Seq(
  brFlag    -> brTarget,
  jmpFlag   -> aluOut,
  eCallFlag -> csrRegFile(0x305.U) // Trap vector
))
```
When the CPU executes `ecall`, the trap handler must be triggered. Since this CPU only implements M-mode (Machine mode), when an `ecall` occurs, the value of `mtvec` (CSR address `0x305`) must be written to `pcReg` so that the CPU jumps to the trap handler.
![](https://hackmd.io/_uploads/r1zvIPpDJx.png)

### Memory access
Load and store instructions must access data memory, so only `lw` and `sw` are involved in this stage.
```scala
dmem.addr  := aluOut
dmem.wEn   := memWen
dmem.wData := rs2Data
```
For `lw`, the CPU reads data memory at the address formed from the register value and the immediate (`rs1Data` + `immI`).
```scala
dmem.data := Cat(
  mem(dmem.addr + 3.U),
  mem(dmem.addr + 2.U),
  mem(dmem.addr + 1.U),
  mem(dmem.addr)
)
```
The `memWen` signal is asserted only for the `sw` instruction. In other words, only `sw` writes data memory, at the address formed from the register value and the immediate (`rs1Data` + `immS`).
```scala
when(dmem.wEn) {
  mem(dmem.addr + 3.U) := dmem.wData(31, 24)
  mem(dmem.addr + 2.U) := dmem.wData(23, 16)
  mem(dmem.addr + 1.U) := dmem.wData(15, 8)
  mem(dmem.addr)       := dmem.wData(7, 0) // lowest byte is bits 7..0
}
```

### Write back
Except for `lw`, `sw`, J-format and B-format instructions, the remaining basic instructions write `aluOut` back to the specified register when `regFileWen === RenS`. `lw`, `Jal` and `Jalr` also write a register, but `lw` writes the data read from memory, while `Jal` and `Jalr` write back `pcPlus4` instead of `aluOut`. `sw` and B-format instructions do not use this stage.
```scala
val wbData = MuxCase(aluOut, Seq(
  (wbSel === WbMem) -> dmem.data,
  (wbSel === WbPc)  -> pcPlus4,
  (wbSel === WbCsr) -> csrRdata
))
when(regFileWen === RenS) { regFile(wbAddr) := wbData }
```
CSR instructions are atomic: they cannot be divided into separate steps. In other words, a CSR instruction reads and writes the same CSR at the same time. Take `csrrw` ("Atomic Read/Write CSR") as an example: it reads the old value of the CSR and writes it to the `rd` register (see the code above), while the value in the `rs1` register is written to the CSR.
```
csrrw  rd, csr, rs1
csrrwi rd, csr, imm_z
csrrs  rd, csr, rs1
csrrsi rd, csr, imm_z
csrrc  rd, csr, rs1
csrrci rd, csr, imm_z
```
`s` stands for set: the operand is ORed with the CSR value.
`c` stands for clear: the operand is inverted and ANDed with the CSR value.
```scala
// CSR
val csrAddr = Mux(csrCmd === CsrE, 0x342.U, inst(31, 20)) // mcause: 0x342
val csrRdata = csrRegFile(csrAddr)
val csrWdata = MuxCase(0.U, Seq(
  (csrCmd === CsrW) -> op1Data,
  (csrCmd === CsrS) -> (csrRdata | op1Data),
  (csrCmd === CsrC) -> (csrRdata & ~op1Data),
  (csrCmd === CsrE) -> 11.U // Machine ECALL
))
when(csrCmd > 0.U) { csrRegFile(csrAddr) := csrWdata }
```
`exit` is the end signal. When the instruction is `Unimp`, the `exit` signal is raised high.
Once the test detects that `exit` is high, the whole CPU test finishes.
```scala
exit := (inst === Unimp)
```
```scala
test(config()) { dut =>
  while(!dut.exit.peek().litToBoolean) {
    dut.clock.setTimeout(0)
    dut.clock.step()
  }
  dut.clock.step()
}
```

## Improve ChiselRiscV
While studying the code, I found that this CPU lacks some basic instructions and still needs the M-extension instructions.

### Complete the load and store instructions
The missing instructions are `Lh`, `Lhu`, `Lb`, `Lbu`, `Sh` and `Sb`.
To add them to the current CPU, the first step is to modify the decoder. Hence, I add the bit patterns of these instructions and adjust the lookup table and some of the control signals.
```scala
// bit pattern
val Lw  = BitPat("b?????????????????010?????0000011")
val Lh  = BitPat("b?????????????????001?????0000011")
val Lhu = BitPat("b?????????????????101?????0000011")
val Lb  = BitPat("b?????????????????000?????0000011")
val Lbu = BitPat("b?????????????????100?????0000011")
val Sw  = BitPat("b?????????????????010?????0100011")
val Sh  = BitPat("b?????????????????001?????0100011")
val Sb  = BitPat("b?????????????????000?????0100011")
```
```scala
// lookup table
Lh  -> List(AluAdd, Op1Rs1, Op2Imi, MenX,  RenS, WbLH,  CsrX),
Lhu -> List(AluAdd, Op1Rs1, Op2Imi, MenX,  RenS, WbLHU, CsrX),
Lb  -> List(AluAdd, Op1Rs1, Op2Imi, MenX,  RenS, WbLB,  CsrX),
Lbu -> List(AluAdd, Op1Rs1, Op2Imi, MenX,  RenS, WbLBU, CsrX),
Sw  -> List(AluAdd, Op1Rs1, Op2Ims, MenSW, RenX, WbX,   CsrX),
Sh  -> List(AluAdd, Op1Rs1, Op2Ims, MenSH, RenX, WbX,   CsrX),
Sb  -> List(AluAdd, Op1Rs1, Op2Ims, MenSB, RenX, WbX,   CsrX),
```
Originally, the `memWen` signal is true only when the instruction is `sw`. After adding `sh` and `sb`, this signal is expanded to four states, so that it also identifies how many bytes should be stored.
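Because a narrow store must leave the untouched bytes of the word unchanged, the write data is effectively a merge of the old memory word and `rs2`. The following plain-Scala sketch models that merge in software under the assumption that the whole 32-bit word at the target address is read and rewritten; the names `StoreMergeModel`, `StoreWidth` and `merge` are hypothetical and not taken from the ChiselRiscV sources.

```scala
// Software model of read-modify-write store merging (illustrative only).
object StoreMergeModel {
  sealed trait StoreWidth
  case object StoreWord extends StoreWidth // sw
  case object StoreHalf extends StoreWidth // sh
  case object StoreByte extends StoreWidth // sb

  // oldWord: the 32-bit word currently in memory at the store address.
  // rs2:     the 32-bit register value being stored.
  // Returns the 32-bit word that should be written back.
  def merge(width: StoreWidth, oldWord: Long, rs2: Long): Long = {
    val merged = width match {
      case StoreWord => rs2
      case StoreHalf => (oldWord & 0xFFFF0000L) | (rs2 & 0x0000FFFFL)
      case StoreByte => (oldWord & 0xFFFFFF00L) | (rs2 & 0x000000FFL)
    }
    merged & 0xFFFFFFFFL // keep the result within 32 bits
  }
}
```

For example, storing the low byte of `0xAABBCCDD` over a memory word `0x11223344` gives `0x112233DD`: only the lowest byte changes, which is what `sb` requires.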
```scala
// old Memory access
dmem.addr  := aluOut
dmem.wEn   := memWen
dmem.wData := rs2Data

// new Memory access
dmem.addr  := aluOut
dmem.wEn   := (memWen > 0.U)
dmem.wData := MuxCase(0.U, Seq(
  (memWen === MenSW) -> rs2Data,
  (memWen === MenSH) -> Cat(dmem.data(31, 16), rs2Data(15, 0)), // keep the upper half-word
  (memWen === MenSB) -> Cat(dmem.data(31, 8), rs2Data(7, 0))    // keep the upper three bytes
))
```
`wbSel` is also expanded to 8 states. Each load instruction has its own write-back behavior: `lh` and `lb` sign-extend the loaded value from its own top bit (bit 15 or bit 7), while `lhu` and `lbu` zero-extend it.
```scala
// old Write back
val wbData = MuxCase(aluOut, Seq(
  (wbSel === WbMem) -> dmem.data,
  (wbSel === WbPc)  -> pcPlus4,
  (wbSel === WbCsr) -> csrRdata
))
when(regFileWen === RenS) { regFile(wbAddr) := wbData }

// new Write back
val wbData = MuxCase(aluOut, Seq(
  (wbSel === WbLW)  -> dmem.data,
  (wbSel === WbLH)  -> Cat(Fill(16, dmem.data(15)), dmem.data(15, 0)), // sign bit of the half-word
  (wbSel === WbLHU) -> Cat(Fill(16, 0.U), dmem.data(15, 0)),
  (wbSel === WbLB)  -> Cat(Fill(24, dmem.data(7)), dmem.data(7, 0)),   // sign bit of the byte
  (wbSel === WbLBU) -> Cat(Fill(24, 0.U), dmem.data(7, 0)),
  (wbSel === WbPc)  -> pcPlus4,
  (wbSel === WbCsr) -> csrRdata
))
when(regFileWen === RenS) { regFile(wbAddr) := wbData }
```

### M extension
Adding the M-extension instructions is similar to adding the missing load and store instructions: new bit patterns must be added and the lookup table must be extended.
```scala // bit pattern val Mul = BitPat("b0000001??????????000?????0110011") val Mulh = BitPat("b0000001??????????001?????0110011") val Mulhsu = BitPat("b0000001??????????010?????0110011") val Mulhu = BitPat("b0000001??????????011?????0110011") val Div = BitPat("b0000001??????????100?????0110011") val Divu = BitPat("b0000001??????????101?????0110011") val Rem = BitPat("b0000001??????????110?????0110011") val Remu = BitPat("b0000001??????????111?????0110011") ``` ```scala // lookup table Mul -> List(AluMul, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Mulh -> List(AluMulh, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Mulhsu -> List(AluMulhsu, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Mulhu -> List(AluMulhu, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Div -> List(AluDiv, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Divu -> List(AluDivu, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Rem -> List(AluRem, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), Remu -> List(AluRemu, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX), ``` For M-series instructions, it is sufficient to expand the behavior of the ALU, and the rest of the signal controls are the same as `add` instruction. 
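Before extending the ALU, it can help to pin down the expected results in software. The following is a minimal plain-Scala reference sketch of the eight M-extension operations on 32-bit values, following the RISC-V rules for division by zero (quotient with all bits set, remainder equal to the dividend) and for signed overflow. It is only an illustration for checking results by hand; the object name `MExtModel` and its method names are hypothetical and not part of the ChiselRiscV sources.

```scala
// Software reference model of RV32M results (illustrative only).
// Registers are modelled as Int; unsigned views are built with u32.
object MExtModel {
  private def u32(x: Int): BigInt = BigInt(x.toLong & 0xFFFFFFFFL)

  // Low 32 bits of the product (same for signed and unsigned operands).
  def mul(a: Int, b: Int): Int = a * b
  // High 32 bits: signed x signed, signed x unsigned, unsigned x unsigned.
  def mulh(a: Int, b: Int): Int   = ((BigInt(a) * BigInt(b)) >> 32).toInt
  def mulhsu(a: Int, b: Int): Int = ((BigInt(a) * u32(b)) >> 32).toInt
  def mulhu(a: Int, b: Int): Int  = ((u32(a) * u32(b)) >> 32).toInt

  // Division by zero returns all ones; signed overflow returns the dividend.
  def div(a: Int, b: Int): Int =
    if (b == 0) -1
    else if (a == Int.MinValue && b == -1) Int.MinValue
    else a / b
  def divu(a: Int, b: Int): Int =
    if (b == 0) -1 else java.lang.Integer.divideUnsigned(a, b)

  // Remainder by zero returns the dividend; signed overflow returns 0.
  def rem(a: Int, b: Int): Int =
    if (b == 0) a
    else if (a == Int.MinValue && b == -1) 0
    else a % b
  def remu(a: Int, b: Int): Int =
    if (b == 0) a else java.lang.Integer.remainderUnsigned(a, b)
}
```

A few values from this model, such as `div(7, 0)` or `mulhu(-1, -1)`, can be compared against the register contents seen in a waveform when debugging the hardware.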
```scala // ALU aluOut := MuxCase(0.U, Seq( (exeFun === AluAdd) -> (op1Data + op2Data), (exeFun === AluSub) -> (op1Data - op2Data), (exeFun === AluAnd) -> (op1Data & op2Data), (exeFun === AluOr) -> (op1Data | op2Data), (exeFun === AluXor) -> (op1Data ^ op2Data), (exeFun === AluSll) -> (op1Data << op2Data(4, 0))(31, 0), (exeFun === AluSrl) -> (op1Data >> op2Data(4, 0)), (exeFun === AluSra) -> (op1Data.asSInt >> op2Data(4, 0)).asUInt, (exeFun === AluSlt) -> (op1Data.asSInt < op2Data.asSInt).asUInt, (exeFun === AluSltu) -> (op1Data < op2Data).asUInt, (exeFun === AluJalr) -> ((op1Data + op2Data) & ~1.U(WordLen.W)), (exeFun === AluCopy1) -> op1Data, (exeFun === AluMul) -> (op1Data * op2Data)(31, 0).asUInt, (exeFun === AluMulh) -> (op1Data.asSInt * op2Data.asSInt)(63, 32).asUInt, (exeFun === AluMulhsu) -> (op1Data.asSInt * op2Data)(63, 32).asUInt, (exeFun === AluMulhu) -> (op1Data * op2Data)(63, 32).asUInt, (exeFun === AluDiv) -> Mux(op2Data.asSInt === 0.S(WordLen.W), 0xFFFFFFFF.S(WordLen.W), (op1Data.asSInt/op2Data.asSInt)).asUInt, (exeFun === AluDivu) -> Mux(op2Data === 0.U(WordLen.W), 0xFFFFFFFF.S(WordLen.W).asUInt, (op1Data/op2Data)).asUInt, (exeFun === AluRem) -> Mux(op2Data.asSInt === 0.S(WordLen.W), op1Data.asSInt, (op1Data.asSInt % op2Data.asSInt)).asUInt, (exeFun === AluRemu) -> Mux(op2Data === 0.U(WordLen.W), op1Data, (op1Data % op2Data)).asUInt )) ``` >On division, you have to be careful about deviding by 0. Specification for this case is also defined.The quotient of division by zero has all bits set, and the remainder of division by zero equals the dividend. >from [Implemented M-extension of RISCV](https://medium.com/@moriryosuke48/implemented-m-extension-of-riscv-84cc20e30d17) ## Riscv-Arch-Test The [riscv-arch-test](https://github.com/riscv-non-isa/riscv-arch-test) are an evolving set of tests that are created to help ensure that software written for a given RISC-V Profile/Specification will run on all implementations that comply with that profile. 
The older 2.x version of the framework is based on Makefiles, while the current version 3.10, which I adopt, uses [RISCOF](https://riscof.readthedocs.io/en/latest/installation.html) as its base.

[RISCOF](https://riscof.readthedocs.io/en/latest/installation.html) (The RISC-V Compatibility Framework) is a Python-based framework which enables testing of a RISC-V target (hard or soft implementations) against a standard RISC-V golden reference model using a suite of RISC-V architectural assembly tests.

RISCOF generates standard pre-built templates for DUTs and reference models via the `setup` command as shown below:
```bash
riscof setup --dutname=spike
```
The above command generates the following files and directories in the current directory:
```
├──config.ini                   # configuration file for riscof
├──spike/                       # DUT plugin templates
    ├── env
    │   ├── link.ld             # DUT linker script
    │   └── model_test.h        # DUT specific header file
    ├── riscof_spike.py         # DUT python plugin
    ├── spike_isa.yaml          # DUT ISA yaml based on riscv-config
    └── spike_platform.yaml     # DUT Platform yaml based on riscv-config
├──sail_cSim/                   # reference plugin templates
    ├── env
    │   ├── link.ld             # Reference linker script
    │   └── model_test.h        # Reference model specific header file
    ├── __init__.py
    └── riscof_sail_cSim.py     # Reference model python plugin.
```
The generated template `config.ini` looks something like this by default:
```
[RISCOF]
ReferencePlugin=sail_cSim
ReferencePluginPath=/path/to/riscof/sail_cSim
DUTPlugin=spike
DUTPluginPath=/path/to/riscof/spike

## Example configuration for spike plugin.
[spike]
pluginpath=/path/to/riscof/spike/
ispec=/path/to/riscof/spike/spike_isa.yaml
pspec=/path/to/riscof/spike/spike_platform.yaml

[sail_cSim]
pluginpath=/path/to/riscof/sail_cSim
```
Before running RISCOF, you need to fill in `config.ini` with the paths to the files describing your hardware model, such as the plugin directory, `ispec` and `pspec`.
### DUT Plugin
A typical DUT plugin directory has the following structure:
```
├──dut-name/                    # DUT plugin templates
    ├── env
    │   ├── link.ld             # DUT linker script
    │   └── model_test.h        # DUT specific header file
    ├── riscof_dut-name.py      # DUT python plugin
    ├── dut-name_isa.yaml       # DUT ISA yaml based on riscv-config
    └── dut-name_platform.yaml  # DUT Platform yaml based on riscv-config
```
The Python plugin files capture the behavior of the model for compiling tests, executing them on the DUT and finally extracting the signature for each test.

The YAML specs in the DUT plugin directory are the most important inputs to the RISCOF framework. All decisions about filtering tests depend on these YAML files. The files must follow the syntax/format specified by riscv-config, and RISCOF validates them using riscv-config.

The `env` folder can also contain other necessary plugin-specific files for pre/post processing of logs, signatures, elfs, etc.

### Building ChiselRiscV Plugin
For ChiselRiscV, the input is a `.hex` file compiled from a `.c` source via riscv-gnu-toolchain. I use the following Makefile rule to compile the test program.
```makefile
%: ./src/%.c
	riscv32-unknown-elf-as -R -march=rv32i_zicsr -mabi=ilp32 -o ./build/init.o ./scripts/init.S
	riscv32-unknown-elf-gcc $< -O0 -march=rv32im_zicsr -mabi=ilp32 -c -o ./build/$@.o
	riscv32-unknown-elf-ld ./build/$@.o ./build/init.o -b elf32-littleriscv -T ./scripts/link.ld -o ./build/$@
	riscv32-unknown-elf-objcopy ./build/$@ -O binary ./bin/$@.bin
	od ./bin/$@.bin -An -tx1 -w1 -v > ../../test/resources/hex/$@.hex
	riscv32-unknown-elf-objdump ./build/$@ -b elf32-littleriscv -D > ./dump/$@.dump
```
This produces a `.hex` file as the input to the hardware and a `.dump` file for debugging. The test command is the following:
```bash
sbt "testOnly ChiselRiscV.tests.HexTest -- -DprogramFile=ctest.hex -DwriteVcd=1"
```
I add the argument `-DprogramFile` to tell the test where the `.hex` file is. The `loadMemoryFromFileInline` function then loads it into `mem`.
```scala!
def loadMemoryFromHexFile(filename: Option[String]): Unit = loadMemoryFromFileInline(mem, filename.get)
```
RISCOF verifies the hardware by comparing the signature, a region of the program's `.data` section, produced by the DUT with the one produced by the reference model. Therefore, the DUT must be able to print out its memory content. However, Chisel doesn't have a function for writing data to a file, like `$fwrite` in Verilog. So I use `2>&1 | tee output.stdout` to save the terminal output and capture the memory data from it. This method comes from [nucleusrv](https://github.com/merledu/nucleusrv/blob/master/riscv-target/nucleusrv/device/rv32i/Makefile.include).
```bash
sbt \"testOnly ChiselRiscV.tests.HexTest -- -DprogramFile=ctest.hex -DwriteVcd=1\" 2>&1 | tee output.stdout;
`grep '^[a-f0-9]\+$$' output.stdout > output.signature`
```
The remaining task is to print the content of a specified memory address. Here, I borrow the MMIO approach and reserve two rarely used memory addresses as output addresses. When the program moves a memory value to `OutAddr` and then stores 1 into `PrintAddr`, the hardware prints the value at `OutAddr` to the terminal and clears `PrintAddr` again.
```scala
when(PrintAddr_mem === 1.U(WordLen.W)) { // OutAddr = 0x00100000, PrintAddr = 0x00100004
  val memdata = Cat(mem(OutAddr + 3.U), mem(OutAddr + 2.U), mem(OutAddr + 1.U), mem(OutAddr))
  printf(cf"${Hexadecimal(memdata)}\n")
  mem(PrintAddr)       := 0.U(8.W)
  mem(PrintAddr + 1.U) := 0.U(8.W)
  mem(PrintAddr + 2.U) := 0.U(8.W)
  mem(PrintAddr + 3.U) := 0.U(8.W)
}
```
At the end of the test program, I add the following assembly code. It reads the `begin_signature` and `end_signature` addresses, then prints the words in this memory region one by one using the method described above.
The dump loop is implemented in the `RVMODEL_HALT` macro:

```cpp
#define RVMODEL_HALT \
    la a0, begin_signature; \
    la a1, end_signature; \
    li t1, 0x00100000; \
    li t2, 1; \
print_data: \
    beq a0, a1, halt; \
    lw t0, 0(a0); \
    sw t0, 0(t1); \
    sw t2, 4(t1); \
    addi a0, a0, 4; \
    j print_data; \
halt: \
    unimp;
```

### Result

:::spoiler test result
```bash
INFO | Running Tests on Reference Model.
INFO | Initiating signature checking.
INFO | Following 47 tests have been run :
INFO | TEST NAME : COMMIT ID : STATUS
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/add-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/addi-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/and-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/andi-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/auipc-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/beq-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bge-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bgeu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/blt-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bltu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bne-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/fence-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/jal-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/jalr-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lb-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lbu-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lh-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lhu-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lui-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lw-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/misalign1-jalr-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/or-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/ori-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sb-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sh-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sll-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slli-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slt-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slti-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sltiu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sltu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sra-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srai-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srl-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srli-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sub-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sw-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/xor-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/xori-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/div-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/divu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/mul-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/mulh-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/mulhsu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/mulhu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/rem-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/remu-01.S : - : Passed
```
:::

## Program Test

### Test 1

This test is modified from [Quiz2](/VpbJuNCxS4i92SbkF8BOTw) Problem `A`.

```cpp
// shift-and-add multiplication algorithm
    .global main
    .data
multiplier:   .word -9
multiplicand: .word 7

    .text
main:
    addi sp, sp, -16
    sw ra, 12(sp)
    sw s0, 8(sp)
    addi s0, sp, 16
    la a0, multiplier           # Load multiplier address
    lw a1, 0(a0)                # Load multiplier value
    la a2, multiplicand         # Load multiplicand address
    lw a3, 0(a2)                # Load multiplicand value
    li t0, 0                    # Initialize accumulator
    li t1, 32                   # Set bit counter (#A01)

    # Check for negative values
    bltz a1, handle_negative1   # If multiplier negative (#A02)
    j shift_and_add_loop        # Skip to main loop (#A05)
    bltz a3, handle_negative2   # If multiplicand negative (#A03)
    j shift_and_add_loop        # Continue to main loop (#A04)

handle_negative1:
    neg a1, a1                  # Make multiplier positive

handle_negative2:
    neg a3, a3                  # Make multiplicand positive

shift_and_add_loop:
    beqz t1, end_shift_and_add  # Exit if bit count is zero
    andi t2, a1, 1              # Check least significant bit (#A06)
    beqz t2, skip_add           # Skip add if bit is 0
    add t0, t0, a3              # Add to accumulator

skip_add:
    srai a1, a1, 1              # Right shift multiplier
    slli a3, a3, 1              # Left shift multiplicand
    addi t1, t1, -1             # Decrease bit counter
    j shift_and_add_loop        # Repeat loop (#A07)

end_shift_and_add:
    li a4, 0x00100000           # Load print address
    sw t0, 0(a4)                # Store final result (#A08)
    li a5, 1
    sw a5, 4(a4)
    li a5, 0
    mv a0, a5
    lw ra, 12(sp)
    lw s0, 8(sp)
    addi sp, sp, 16
    ret
```

The test prints `0xffffffc1`, which is `-63` as a signed 32-bit value.
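The control flow of Test 1 can be cross-checked on a host machine. The C sketch below is a model written for illustration, not the quiz solution: `shift_and_add_mul` is a hypothetical name, and it returns the raw 32-bit word the CPU would print. It keeps one detail of the assembly that is easy to miss: `handle_negative1` falls through into `handle_negative2`, so a negative multiplier negates both operands, which leaves the product unchanged.

```c
#include <stdint.h>

/* Model of the Test 1 loop: 32 iterations, adding the shifted multiplicand
   whenever the low bit of the multiplier is set. Unsigned arithmetic is used
   so that wrap-around matches the 32-bit registers. */
uint32_t shift_and_add_mul(int32_t multiplier, int32_t multiplicand) {
    uint32_t a1 = (uint32_t)multiplier;    /* a1 in the assembly */
    uint32_t a3 = (uint32_t)multiplicand;  /* a3 in the assembly */
    uint32_t acc = 0;                      /* t0 in the assembly */

    if (multiplier < 0) {   /* bltz a1, handle_negative1 */
        a1 = 0u - a1;       /* neg a1, a1 */
        a3 = 0u - a3;       /* fall-through: neg a3, a3 */
    }

    for (int bit = 0; bit < 32; bit++) {   /* t1 counts down from 32 */
        if (a1 & 1u)
            acc += a3;
        /* The assembly uses srai; over exactly 32 iterations the tested bit
           is the same as with a logical shift. */
        a1 >>= 1;
        a3 <<= 1;
    }
    return acc;
}
```

Calling `shift_and_add_mul(-9, 7)` returns `0xffffffc1`, the same word the hardware prints.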
The sbt output for Test 1:

```bash
[info] welcome to sbt 1.10.7 (Temurin Java 1.8.0_432)
[info] loading project definition from /home/user/workspace/MyCPU/project
[info] loading settings for project root from build.sbt...
[info] set current project to cpu (in build file:/home/user/workspace/MyCPU/)
[info] compiling 1 Scala source to /home/user/workspace/MyCPU/target/scala-2.13/classes ...
ffffffc1
[info] HexTest:
[info] CPU
[info] - should work through hex
[info] Run completed in 3 seconds, 78 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Test 2

This test is modified from [Quiz2](/VpbJuNCxS4i92SbkF8BOTw) Problem `D`.

```cpp
// Ancient Egyptian Multiplication
    .global main
    .data
    # Define the data section with two numbers to multiply
num1: .word 13
num2: .word 7

    .text
main:
    addi sp, sp, -16
    sw ra, 12(sp)
    sw s0, 8(sp)
    addi s0, sp, 16
    # Begin the main code in the text section
    li x1, 0x00100000   # Load the print address into register x1
    lw t0, num1         # Load the first number (num1) into register t0
    lw t1, num2         # Load the second number (num2) into register t1
    li t2, 0            # Initialize the result (t2) to 0

loop:
    # Check if the least significant bit of t0 (num1) is 1 (i.e., if the number is odd)
    andi t3, t0, 1
    beq t3, x0, skip_add    # If the bit is 0 (even), skip the addition
    # If the number is odd, add the value in t1 (num2) to the result in t2
    add t2, t2, t1          # D01

skip_add:
    # Perform a right shift on t0 (num1), effectively dividing it by 2
    srli t0, t0, 1          # D02
    # Perform a left shift on t1 (num2), effectively multiplying it by 2
    slli t1, t1, 1          # D03
    # If t0 (num1) is not zero, repeat the loop
    bnez t0, loop

    # Store the final result in the memory location pointed by x1
    sw t2, 0(x1)
    li a5, 1
    sw a5, 4(x1)
    li a5, 0
    mv a0, a5
    lw ra, 12(sp)
    lw s0, 8(sp)
    addi sp, sp, 16
    ret
```

The test prints `0x0000005b`, which is `91`.
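Unlike Test 1, this loop stops as soon as `num1` reaches zero instead of always running 32 iterations. The following C sketch models that behavior under the assumption of 32-bit unsigned operands; `egyptian_mul` and its `steps` output are hypothetical names added only to make the iteration count visible.

```c
#include <stdint.h>

/* Ancient Egyptian multiplication with the same loop shape as the assembly:
   the body runs once before num1 is tested (bnez at the bottom of the loop).
   If `steps` is non-null it receives the number of iterations executed. */
uint32_t egyptian_mul(uint32_t num1, uint32_t num2, unsigned *steps) {
    uint32_t result = 0;        /* t2 */
    unsigned count = 0;

    do {
        if (num1 & 1u)          /* andi t3, t0, 1 ; beq t3, x0, skip_add */
            result += num2;     /* add t2, t2, t1 */
        num1 >>= 1;             /* srli t0, t0, 1 */
        num2 <<= 1;             /* slli t1, t1, 1 */
        count++;
    } while (num1 != 0);        /* bnez t0, loop */

    if (steps)
        *steps = count;
    return result;
}
```

For the values in the test, `egyptian_mul(13, 7, &steps)` returns `91` after 4 iterations, since 13 is `1101` in binary: the partial sums are 7, then 7 + 28 = 35, then 35 + 56 = 91.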
The sbt output for Test 2:

```bash
[info] welcome to sbt 1.10.7 (Temurin Java 1.8.0_432)
[info] loading project definition from /home/user/workspace/MyCPU/project
[info] loading settings for project root from build.sbt...
[info] set current project to cpu (in build file:/home/user/workspace/MyCPU/)
0000005b
[info] HexTest:
[info] CPU
[info] - should work through hex
[info] Run completed in 2 seconds, 142 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Test 3

This test is modified from [Quiz4](/6jedbSKtQ2mcuVfTS9dDOw) Problem `A`.

```cpp
// Sum of Squares
    .global main
    .data
    # Define the data section with two numbers
n: .word 50
m: .word 25

    .text
main:
    addi sp, sp, -4
    sw ra, 0(sp)
    lw a0, n
    lw a1, m
    call sum_of_squares
    li t0, 0x00100000
    li t1, 1
    sw a0, 0(t0)
    sw t1, 4(t0)
    lw ra, 0(sp)
    addi sp, sp, 4
    ret

sum_of_squares:
    # Check if n (a0) is less than or equal to zero
    bgt a0, x0, recurse_case    # __ A01 __

zero_case:
    # If n ≤ 0, return m (a1)
    add a0, a1, x0
    jalr x0, ra, 0              # __ A02 __

recurse_case:
    # Save caller-saved registers on the stack
    add t0, a0, x0              # t0 = a0 (copy n)
    addi sp, sp, -12            # Allocate stack space __ A03 __
    sw a1, 0(sp)                # Save a1 (m)
    sw t0, 4(sp)                # Save t0 (n)
    sw ra, 8(sp)                # Save return address __ A04 __

    # Call the square function
    jal ra, square              # __ A05 __

    # Restore registers and stack
    lw a1, 0(sp)                # Restore a1 (m)
    lw t0, 4(sp)                # Restore t0 (n)
    lw ra, 8(sp)                # Restore return address __ A06 __
    addi sp, sp, 12             # Deallocate stack space __ A07 __

    # Update m = m + n^2
    add a1, a1, a0
    # Decrement n: a0 = n - 1
    addi a0, t0, -1

    # Recursive call to sum_of_squares
    addi sp, sp, -4             # Allocate stack space for ra __ A08 __
    sw ra, 0(sp)                # Save return address
    jal ra, sum_of_squares      # __ A09 __
    lw ra, 0(sp)                # Restore return address
    addi sp, sp, 4              # Deallocate stack space __ A10 __

    # Return from the function
    jalr x0, ra, 0              # __ A11 __

# Function: square
# Computes the square of an integer (a0 = n), returns result in a0
square:
    addi sp, sp, -8             # Allocate stack space
    sw ra, 0(sp)                # Save return address __ A13 __
    add t0, x0, x0              # t0 = 0 (accumulator for the result)
    add t1, a0, x0              # t1 = a0 (copy of n, multiplicand)
    add t2, a0, x0              # t2 = a0 (copy of n, multiplier)

square_loop:
    andi t3, t2, 1              # Check the lowest bit of t2 (t2 & 1) __ A14 __
    beq t3, x0, skip_add        # If the bit is 0, skip addition
    add t0, t0, t1              # Accumulate: t0 += t1

skip_add:
    sll t1, t1, 1               # Left shift t1 (multiply by 2) __ A15 __
    srl t2, t2, 1               # Right shift t2 (divide by 2) __ A16 __
    bne t2, x0, square_loop     # Repeat loop if t2 is not zero __ A17 __

square_end:
    add a0, t0, x0              # Move result to a0
    lw ra, 0(sp)                # Restore return address __ A18 __
    addi sp, sp, 8              # Deallocate stack space
    jalr x0, ra, 0              # Return from function __ A19 __
```

The test prints `0x0000a7c6`, which is `42950`:

```bash
[info] welcome to sbt 1.10.7 (Temurin Java 1.8.0_432)
[info] loading project definition from /home/user/workspace/MyCPU/project
[info] loading settings for project root from build.sbt...
[info] set current project to cpu (in build file:/home/user/workspace/MyCPU/)
0000a7c6
[info] HexTest:
[info] CPU
[info] - should work through hex
[info] Run completed in 8 seconds, 172 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

The following C program produces the same result. It implements the equation below.

$$
sum = m + n^2 + (n - 1)^2 + \ldots + 1^2
$$

```cpp
#include <stdio.h>

int main() {
    int n = 50, m = 25;
    int sum = 0;
    for (int i = 1; i < n + 1; i++) {
        sum += i * i;
    }
    printf("%d\n", sum + m);
    return 0;
}
```
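As an independent sanity check on the printed value, the sum can also be evaluated in closed form. This uses the standard identity for the sum of the first $n$ squares, which is not part of the original program:

$$
\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}
$$

With $n = 50$ and $m = 25$:

$$
sum = m + \frac{50 \cdot 51 \cdot 101}{6} = 25 + 42925 = 42950 = \mathtt{0xa7c6}
$$

This agrees with the `0000a7c6` line in the sbt output.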