ChiselRiscV

郭君瑋

GitHub

Mission

In my term project, my goal is to study ChiselRiscV and the books mentioned on the project page to understand how to build a RISC-V processor using Chisel. The processor must pass the riscv-arch-test and support the RV32IM and CSR instructions. Finally, I select and modify at least three RISC-V programs from the course quizzes and make them run on my improved ChiselRiscV processor.
The entire process can be divided into the following three main steps:

  1. Study how to build a Chisel-based RISC-V CPU
  2. Make my CPU pass riscv-arch-test
  3. Run three programs selected from the course quizzes on it

Prerequisites

Before beginning my study and implementation, I needed to set up my environment. The following is the software I used in this project.

For Chisel

  • verilator
  • gtkwave
  • sbt
  • jdk

For riscv-arch-test

  • riscv-gnu-toolchain
  • SAIL C-emulator (sail-riscv)
  • RISCOF

I use Docker to set up my environment. The detailed installation steps for the software above, and for other tools not mentioned here, are recorded in the following Dockerfile.

Dockerfile
# Building Stage
FROM ubuntu:22.04 AS stage_building
ARG RISCV_GNU_DEP="wget curl make autoconf automake autotools-dev python3 python3-pip \
  libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf \
  libtool patchutils bc zlib1g-dev libexpat-dev ninja-build git libglib2.0-dev"

ARG SAIL_DEP="opam build-essential libgmp-dev z3 pkg-config zlib1g-dev"

RUN apt update && apt -y install ${RISCV_GNU_DEP} ${SAIL_DEP}

# riscv-gnu-toolchain
RUN git clone https://github.com/riscv/riscv-gnu-toolchain && \
    cd riscv-gnu-toolchain && \
    ./configure --prefix=/usr/local/riscv --enable-multilib --with-arch=rv32gc --with-abi=ilp32d && \
    make -j `nproc` && \
    make install

# SAIL C-emulator
RUN opam init -y --disable-sandboxing && \
    opam switch create ocaml-base-compiler.4.08.1 && \
    opam install sail -y && \
    eval $(opam config env) && \
    git clone https://github.com/riscv/sail-riscv.git /usr/local/sail-riscv && \
    cd /usr/local/sail-riscv && \
    ARCH=RV32 make -j `nproc`

# Final Stage
FROM ubuntu:22.04 AS stage_final

# copy tools built in last stage to final stage
COPY --from=stage_building /usr/local/riscv /usr/local/riscv
COPY --from=stage_building /usr/local/sail-riscv /usr/local/sail-riscv

# set PATH to include RISC-V GNU Toolchain and SAIL C-emulator
ENV PATH="$PATH:/usr/local/riscv/bin"
ENV PATH="$PATH:/usr/local/sail-riscv/c_emulator"

# zip and unzip are required for the sdkman to be installed from script
RUN \
    apt update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y \
        build-essential \
        verilator \
        curl \
        zip \ 
        unzip \
        sudo \
        git \
        python3 \
        python3-pip \
        && \
    rm -rf /var/lib/apt/lists/*

# this SHELL command is needed to allow `source` to work properly
# reference: https://stackoverflow.com/questions/20635472/using-the-run-instruction-in-a-dockerfile-with-source-does-not-work/45087082#45087082
SHELL ["/bin/bash", "-c"] 

# add a user whose uid and gid are same as the master user
ARG UID GID NAME=user
RUN groupadd -g $GID -o $NAME
RUN useradd -u $UID -m -g $NAME -G plugdev $NAME && \
    echo "$NAME ALL = NOPASSWD: ALL" > /etc/sudoers.d/user && \
    chmod 0440 /etc/sudoers.d/user
RUN chown -R $NAME:$NAME /home/$NAME
USER $NAME

# reference: https://sdkman.io/install
RUN curl -s "https://get.sdkman.io" | bash 

RUN source "$HOME/.sdkman/bin/sdkman-init.sh" && \
    sdk install java $(sdk list java | grep -o "\b8\.[0-9]*\.[0-9]*\-tem" | head -1) && \
    sdk install sbt

# install riscof and riscv-arch-test
RUN pip3 install --upgrade pip && \
    pip3 install riscof 
ENV PATH="$PATH:/home/user/.local/bin"

WORKDIR "/home/user/workspace"

ENTRYPOINT ["/bin/bash"]

The official RISCOF documentation contains a small mistake regarding the installation of SAIL: the version of ocaml-base-compiler must be at least 4.08.1 instead of 4.06.1. Additionally, pip installs RISCOF into /home/user/.local/bin, so you need to run export PATH="$PATH:/home/user/.local/bin" to make the riscof command available in your terminal.

ChiselRiscV

ChiselRiscV is a 32-bit RISC-V CPU implemented according to the book "CPU Design with RISC-V and Chisel - First step to custom CPU implementation with open-source ISA". To understand how to build a RISC-V CPU with Chisel, the first step is to clone this repository.

git clone https://github.com/nozomioshi/ChiselRiscV.git

After cloning the repository, we see the following file structure:

.
├── build.sbt
├── doc
├── dockerfile
├── .gitignore
├── README.md
├── results
├── src
│   ├── main
│   │   ├── c
│   │   ├── scala
│   │   └── shell
│   └── test
│       ├── resources
│       └── scala
└── target

In Chisel, the source code is divided into two parts: main and test. main contains the code describing all hardware behavior, while test is similar to a testbench in Verilog: it provides the inputs and verifies the outputs. build.sbt contains the build configuration and the Scala and Chisel versions.

To run all the tests, use

sbt test

or use testOnly

sbt "testOnly ctest.tests.HexTest"

to run a specific test.

Fetch

Memory is an array of UInt(8.W) elements, each holding one byte. RISC-V is little-endian, which means the least significant byte of a word is stored at the lowest address.

8-bit fetch.hex (one byte per address; addresses 5 to 10 omitted)

Address  Data
0        11
1        12
2        13
3        14
4        21
...      ...
11       34

The same fetch.hex organized into 32-bit words:

Address  Data
0        14131211
4        24232221
8        34333231

Hence, an instruction fetch concatenates four consecutive bytes, with the byte at the highest address as the most significant:

imem.inst := Cat(
    mem(imem.addr + 3.U),
    mem(imem.addr + 2.U),
    mem(imem.addr + 1.U),
    mem(imem.addr )
)
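
To check the byte ordering outside of hardware, here is a minimal plain-Scala software model of the fetch (not Chisel code). FetchModel and fetchWord are hypothetical names, and the memory is assumed to be an Array[Int] holding one byte per element.

```scala
object FetchModel {
  // mem holds one byte (0..255) per element, like the Chisel Mem of UInt(8.W).
  // The byte at addr is the least significant byte (little-endian).
  def fetchWord(mem: Array[Int], addr: Int): Int =
    ((mem(addr + 3) & 0xFF) << 24) |
    ((mem(addr + 2) & 0xFF) << 16) |
    ((mem(addr + 1) & 0xFF) << 8) |
    (mem(addr) & 0xFF)
}
```

For example, fetchWord(Array(0x11, 0x12, 0x13, 0x14), 0) returns 0x14131211, matching the first row of the 32-bit table.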

pcReg is initialized to StartAddr when the test starts and is incremented by 4 every cycle. In each cycle, the CPU fetches the instruction at the address held in pcReg.

val pcReg = RegInit(StartAddr)
pcReg := pcReg + 4.U
imem.addr := pcReg
val inst = imem.inst

Decode

RV32I defines 32 registers, x0 to x31, which hold data and addresses. Each register is 32 bits wide, and x0 is hardwired to zero.

Register file

Address  Name
0        zero
1        ra
2        sp
3        gp
4        tp
5        t0
6        t1
7        t2
8        s0/fp
9        s1
10       a0
11       a1
12       a2
13       a3
14       a4
15       a5
16       a6
17       a7
18       s2
19       s3
20       s4
21       s5
22       s6
23       s7
24       s8
25       s9
26       s10
27       s11
28       t3
29       t4
30       t5
31       t6

The register file is declared in Core as a 32-entry Mem; the table above maps each register index to its ABI name.

val regFile = Mem(32, UInt(WordLen.W))

The RV32I base instruction set consists of 32-bit instructions.
There are six instruction formats: R-Type, I-Type, S-Type, U-Type, J-Type and B-Type. B-Type is a variant of S-Type and J-Type is a variant of U-Type (they differ only in how the immediate bits are arranged), so there are essentially four basic formats.

Instruction format
(In imm_b and imm_j, | separates the immediate bit ranges in the order they appear in the instruction.)

[R-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+---------------------+--------------+--------------+--------+--------------+---------------------+
| funct7              | rs2          | rs1          | funct3 | rd           | opcode              |
+---------------------+--------------+--------------+--------+--------------+---------------------+

[I-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+------------------------------------+--------------+--------+--------------+---------------------+
| imm_i                              | rs1          | funct3 | rd           | opcode              |
+------------------------------------+--------------+--------+--------------+---------------------+

[S-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+---------------------+--------------+--------------+--------+--------------+---------------------+
| imm_s(11:5)         | rs2          | rs1          | funct3 | imm_s(4:0)   | opcode              |
+---------------------+--------------+--------------+--------+--------------+---------------------+

[U-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+------------------------------------------------------------+--------------+---------------------+
| imm_u(31:12)                                               | rd           | opcode              |
+------------------------------------------------------------+--------------+---------------------+

[J-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+------------------------------------------------------------+--------------+---------------------+
| imm_j(20|10:1|11|19:12)                                    | rd           | opcode              |
+------------------------------------------------------------+--------------+---------------------+

[B-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+---------------------+--------------+--------------+--------+--------------+---------------------+
| imm_b(12|10:5)      | rs2          | rs1          | funct3 | imm_b(4:1|11)| opcode              |
+---------------------+--------------+--------------+--------+--------------+---------------------+

The CPU decodes each instruction according to these formats to obtain the register addresses and read the source registers. Reads of register 0 always return 0:

val rs1Addr = inst(19, 15)
val rs2Addr = inst(24, 20)
val wbAddr  = inst(11, 7)
val rs1Data = Mux(rs1Addr =/= 0.U, regFile(rs1Addr), 0.U)
val rs2Data = Mux(rs2Addr =/= 0.U, regFile(rs2Addr), 0.U)
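
The field extraction and the x0 rule can be mirrored by a small plain-Scala model; RegFieldModel and its helpers are hypothetical names used only for illustration. Applied to the add instruction 0x00a50e33 (add t3,a0,a0), it yields rs1 = rs2 = 10 (a0) and rd = 28 (t3).

```scala
object RegFieldModel {
  // Bits hi..lo (inclusive) of a 32-bit instruction, like Chisel's inst(hi, lo).
  // Valid for fields narrower than 32 bits.
  def bits(inst: Int, hi: Int, lo: Int): Int =
    (inst >>> lo) & ((1 << (hi - lo + 1)) - 1)

  def rs1Addr(inst: Int): Int = bits(inst, 19, 15)
  def rs2Addr(inst: Int): Int = bits(inst, 24, 20)
  def wbAddr(inst: Int): Int  = bits(inst, 11, 7)

  // x0 always reads as 0, regardless of what is stored in regFile(0).
  def readReg(regFile: Array[Int], addr: Int): Int =
    if (addr != 0) regFile(addr) else 0
}
```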

Immediates are extracted in the same way and then sign-extended (or, for immZ, zero-extended) to 32 bits.

val immI        = inst(31, 20)
val immIsext    = Cat(Fill(20, immI(11)), immI)
val immS        = Cat(inst(31, 25), inst(11, 7))
val immSsext    = Cat(Fill(20, immS(11)), immS)
val immB        = Cat(inst(31), inst(7), inst(30, 25), inst(11, 8))
val immBsext    = Cat(Fill(19, immB(11)), immB, 0.U(1.W))
val immJ        = Cat(inst(31), inst(19, 12), inst(20), inst(30, 21))
val immJsext    = Cat(Fill(11, immJ(19)), immJ, 0.U(1.W))
val immU        = inst(31, 12)
val immUshifted = Cat(immU, 0.U(12.W))
val immZ        = inst(19, 15)
val immZext     = Cat(Fill(27, 0.U), immZ)
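
As a sanity check on the bit shuffling, the same immediates can be computed in plain Scala with shifts. This is a hypothetical software model (ImmModel is not part of ChiselRiscV); sign extension is done by shifting the field to the top of a 32-bit Int and arithmetic-shifting it back.

```scala
object ImmModel {
  private def bits(inst: Int, hi: Int, lo: Int): Int =
    (inst >>> lo) & ((1 << (hi - lo + 1)) - 1)

  // Sign-extend the low `width` bits of x to 32 bits.
  private def sext(x: Int, width: Int): Int = (x << (32 - width)) >> (32 - width)

  def immI(inst: Int): Int = sext(bits(inst, 31, 20), 12)
  def immS(inst: Int): Int = sext((bits(inst, 31, 25) << 5) | bits(inst, 11, 7), 12)

  // B-type: imm[12|10:5] in bits 31..25, imm[4:1|11] in bits 11..7; imm[0] is always 0.
  def immB(inst: Int): Int = sext(
    (bits(inst, 31, 31) << 12) | (bits(inst, 7, 7) << 11) |
      (bits(inst, 30, 25) << 5) | (bits(inst, 11, 8) << 1),
    13)

  // J-type: imm[20|10:1|11|19:12] in bits 31..12; imm[0] is always 0.
  def immJ(inst: Int): Int = sext(
    (bits(inst, 31, 31) << 20) | (bits(inst, 19, 12) << 12) |
      (bits(inst, 20, 20) << 11) | (bits(inst, 30, 21) << 1),
    21)

  // U-type: upper 20 bits kept in place, low 12 bits zero.
  def immU(inst: Int): Int = inst & 0xFFFFF000

  // CSR zimm: the 5-bit rs1 field, zero-extended.
  def immZ(inst: Int): Int = bits(inst, 19, 15)
}
```

For example, immB(0xFE000CE3), the encoding of beq x0, x0, -8, evaluates to -8.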

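Because each instruction behaves differently, control signals are needed to select the data paths. The CPU uses the following signals: exeFun (ALU or branch function), op1Sel and op2Sel (operand sources), memWen (memory write enable), regFileWen (register write enable), wbSel (write-back source) and csrCmd (CSR command).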

val exeFun :: op1Sel :: op2Sel :: memWen :: regFileWen :: wbSel :: csrCmd :: Nil = controlSignals

A lookup table decodes each instruction and produces all of these signals at once.

val controlSignals = ListLookup(inst, List(AluX, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX), // default: no register write
    Array(
        Lw     -> List(AluAdd,   Op1Rs1, Op2Imi, MenX, RenS, WbMem, CsrX),
        Sw     -> List(AluAdd,   Op1Rs1, Op2Ims, MenS, RenX, WbX,   CsrX),
        Add    -> List(AluAdd,   Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
        Addi   -> List(AluAdd,   Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
        Sub    -> List(AluSub,   Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
        And    -> List(AluAnd,   Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
        Or     -> List(AluOr,    Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
        Xor    -> List(AluXor,   Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
        Andi   -> List(AluAnd,   Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
        Ori    -> List(AluOr,    Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
        Xori   -> List(AluXor,   Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
        Sll    -> List(AluSll,   Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
        Srl    -> List(AluSrl,   Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
        Sra    -> List(AluSra,   Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
        Slli   -> List(AluSll,   Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
        Srli   -> List(AluSrl,   Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
        Srai   -> List(AluSra,   Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
        Slt    -> List(AluSlt,   Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
        Sltu   -> List(AluSltu,  Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
        Slti   -> List(AluSlt,   Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
        Sltiu  -> List(AluSltu,  Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
        Beq    -> List(BrBeq,    Op1Rs1, Op2Rs2, MenX, RenX, WbX,   CsrX),
        Bne    -> List(BrBne,    Op1Rs1, Op2Rs2, MenX, RenX, WbX,   CsrX),
        Blt    -> List(BrBlt,    Op1Rs1, Op2Rs2, MenX, RenX, WbX,   CsrX),
        Bge    -> List(BrBge,    Op1Rs1, Op2Rs2, MenX, RenX, WbX,   CsrX),
        Bltu   -> List(BrBltu,   Op1Rs1, Op2Rs2, MenX, RenX, WbX,   CsrX),
        Bgeu   -> List(BrBgeu,   Op1Rs1, Op2Rs2, MenX, RenX, WbX,   CsrX),
        Jal    -> List(AluAdd,   Op1Pc,  Op2Imj, MenX, RenS, WbPc,  CsrX),
        Jalr   -> List(AluJalr,  Op1Rs1, Op2Imi, MenX, RenS, WbPc,  CsrX),
        Lui    -> List(AluAdd,   Op1X,   Op2Imu, MenX, RenS, WbAlu, CsrX),
        AuiPc  -> List(AluAdd,   Op1Pc,  Op2Imu, MenX, RenS, WbAlu, CsrX),
        CsrRw  -> List(AluCopy1, Op1Rs1, Op2X,   MenX, RenS, WbCsr, CsrW),
        CsrRs  -> List(AluCopy1, Op1Rs1, Op2X,   MenX, RenS, WbCsr, CsrS),
        CsrRc  -> List(AluCopy1, Op1Rs1, Op2X,   MenX, RenS, WbCsr, CsrC),
        CsrRwi -> List(AluCopy1, Op1Imz, Op2X,   MenX, RenS, WbCsr, CsrW),
        CsrRsi -> List(AluCopy1, Op1Imz, Op2X,   MenX, RenS, WbCsr, CsrS),
        CsrRci -> List(AluCopy1, Op1Imz, Op2X,   MenX, RenS, WbCsr, CsrC),
        Ecall  -> List(AluX,     Op1X,   Op2X,   MenX, RenX, WbX,   CsrE)
    )
)

As an example, consider how the lookup table handles the following add instruction:

00a50e33   add	t3,a0,a0  # 0x00a50e33 = 0000 0000 1010 0101 0000 1110 0011 0011

The bit pattern of add is b0000000??????????000?????0110011, where ? marks a don't-care bit. The CPU matches the instruction against each pattern in the table, identifies which instruction it is, and returns the corresponding set of control signals.

bit pattern
val Lw      = BitPat("b?????????????????010?????0000011")
val Sw      = BitPat("b?????????????????010?????0100011")

// Add
val Add     = BitPat("b0000000??????????000?????0110011")
val Addi    = BitPat("b?????????????????000?????0010011")

// Subtract
val Sub     = BitPat("b0100000??????????000?????0110011")

// Logical
val And     = BitPat("b0000000??????????111?????0110011")
val Or      = BitPat("b0000000??????????110?????0110011")
val Xor     = BitPat("b0000000??????????100?????0110011")
val Andi    = BitPat("b?????????????????111?????0010011")
val Ori     = BitPat("b?????????????????110?????0010011")
val Xori    = BitPat("b?????????????????100?????0010011")

// Shift
val Sll     = BitPat("b0000000??????????001?????0110011")
val Srl     = BitPat("b0000000??????????101?????0110011")
val Sra     = BitPat("b0100000??????????101?????0110011")
val Slli    = BitPat("b0000000??????????001?????0010011")
val Srli    = BitPat("b0000000??????????101?????0010011")
val Srai    = BitPat("b0100000??????????101?????0010011")

// Compare
val Slt     = BitPat("b0000000??????????010?????0110011")
val Sltu    = BitPat("b0000000??????????011?????0110011")
val Slti    = BitPat("b?????????????????010?????0010011")
val Sltiu   = BitPat("b?????????????????011?????0010011")

// Branch
val Beq     = BitPat("b?????????????????000?????1100011")
val Bne     = BitPat("b?????????????????001?????1100011")
val Blt     = BitPat("b?????????????????100?????1100011")
val Bge     = BitPat("b?????????????????101?????1100011")
val Bltu    = BitPat("b?????????????????110?????1100011")
val Bgeu    = BitPat("b?????????????????111?????1100011")

// Jump
val Jal     = BitPat("b?????????????????????????1101111")
val Jalr    = BitPat("b?????????????????000?????1100111")

// Load immediate
val Lui     = BitPat("b?????????????????????????0110111")
val AuiPc   = BitPat("b?????????????????????????0010111")

// CSR
val CsrRw   = BitPat("b?????????????????001?????1110011")
val CsrRwi  = BitPat("b?????????????????101?????1110011")
val CsrRs   = BitPat("b?????????????????010?????1110011")
val CsrRsi  = BitPat("b?????????????????110?????1110011")
val CsrRc   = BitPat("b?????????????????011?????1110011")
val CsrRci  = BitPat("b?????????????????111?????1110011")

// Exception
val Ecall   = BitPat("b00000000000000000000000001110011")
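
ListLookup compares the instruction against each BitPat, treating ? as a don't-care bit. A minimal plain-Scala sketch of that matching could look like this, assuming a pattern string converts to a (value, mask) pair and that the first matching entry wins (the priority order ListLookup uses); BitPatModel is a hypothetical name.

```scala
object BitPatModel {
  // Converts a pattern such as "b0000000??????????000?????0110011" into
  // (value, mask): '0'/'1' bits must match (mask bit 1), '?' bits are ignored.
  def parse(pat: String): (Long, Long) = {
    require(pat.startsWith("b"), "pattern must start with 'b'")
    pat.drop(1).foldLeft((0L, 0L)) { case ((value, mask), c) =>
      c match {
        case '0'   => (value << 1, (mask << 1) | 1L)
        case '1'   => ((value << 1) | 1L, (mask << 1) | 1L)
        case '?'   => (value << 1, mask << 1)
        case other => throw new IllegalArgumentException(s"unexpected character '$other'")
      }
    }
  }

  // True when every non-'?' bit of the pattern equals the instruction bit.
  def matches(inst: Int, pat: String): Boolean = {
    val (value, mask) = parse(pat)
    ((inst.toLong & 0xFFFFFFFFL) & mask) == value
  }

  // The first matching entry wins; otherwise the default is returned.
  def lookup[T](inst: Int, default: T, table: Seq[(String, T)]): T =
    table.collectFirst { case (pat, signals) if matches(inst, pat) => signals }.getOrElse(default)
}
```

With the add instruction above, matches(0x00a50e33, "b0000000??????????000?????0110011") is true, while the Sub pattern fails on bit 30.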

Depending on the instruction format, an operation does not always combine two registers; it may instead combine a register with an immediate, or use the PC. The operand selectors choose the right sources:

val op1Data = MuxCase(0.U, Seq(
    (op1Sel === Op1Rs1) -> rs1Data,
    (op1Sel === Op1Pc)  -> pcReg,
    (op1Sel === Op1Imz) -> immZext
))
val op2Data = MuxCase(0.U, Seq(
    (op2Sel === Op2Rs2) -> rs2Data,
    (op2Sel === Op2Imi) -> immIsext,
    (op2Sel === Op2Ims) -> immSsext,
    (op2Sel === Op2Imj) -> immJsext,
    (op2Sel === Op2Imu) -> immUshifted
))

Execute

After op1Data and op2Data are determined, they are sent to the ALU. The design of the ALU is shown in the following code, where the control signal exeFun selects the operation to execute.

aluOut := MuxCase(0.U, Seq(
    (exeFun === AluAdd)   -> (op1Data + op2Data),
    (exeFun === AluSub)   -> (op1Data - op2Data),
    (exeFun === AluAnd)   -> (op1Data & op2Data),
    (exeFun === AluOr)    -> (op1Data | op2Data),
    (exeFun === AluXor)   -> (op1Data ^ op2Data),
    (exeFun === AluSll)   -> (op1Data << op2Data(4, 0))(31, 0),
    (exeFun === AluSrl)   -> (op1Data >> op2Data(4, 0)),
    (exeFun === AluSra)   -> (op1Data.asSInt >> op2Data(4, 0)).asUInt,
    (exeFun === AluSlt)   -> (op1Data.asSInt < op2Data.asSInt).asUInt,
    (exeFun === AluSltu)  -> (op1Data < op2Data).asUInt,
    (exeFun === AluJalr)  -> ((op1Data + op2Data) & ~1.U(WordLen.W)),
    (exeFun === AluCopy1) -> op1Data
))
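
For checking ALU results by hand, a plain-Scala reference model of the RV32I operations above might look like this. It is a sketch, not the project's code: the case objects are hypothetical stand-ins for the Chisel constants, and JVM Int arithmetic is relied on to wrap modulo 2^32 like the 32-bit hardware.

```scala
object AluModel {
  sealed trait AluOp
  case object AluAdd   extends AluOp
  case object AluSub   extends AluOp
  case object AluAnd   extends AluOp
  case object AluOr    extends AluOp
  case object AluXor   extends AluOp
  case object AluSll   extends AluOp
  case object AluSrl   extends AluOp
  case object AluSra   extends AluOp
  case object AluSlt   extends AluOp
  case object AluSltu  extends AluOp
  case object AluJalr  extends AluOp
  case object AluCopy1 extends AluOp

  def alu(op: AluOp, a: Int, b: Int): Int = op match {
    case AluAdd   => a + b
    case AluSub   => a - b
    case AluAnd   => a & b
    case AluOr    => a | b
    case AluXor   => a ^ b
    case AluSll   => a << (b & 0x1F)       // only the low 5 bits are the shift amount
    case AluSrl   => a >>> (b & 0x1F)      // logical shift: fills with zeros
    case AluSra   => a >> (b & 0x1F)       // arithmetic shift: replicates the sign bit
    case AluSlt   => if (a < b) 1 else 0   // signed comparison
    case AluSltu  => if (Integer.compareUnsigned(a, b) < 0) 1 else 0
    case AluJalr  => (a + b) & ~1          // clear bit 0 of the jump target
    case AluCopy1 => a
  }
}
```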

B-type instructions additionally need a branch comparator, and the branch target is the current PC plus the B-type immediate.

brFlag := MuxCase(false.B, Seq(
    (exeFun === BrBeq)  -> (op1Data === op2Data),
    (exeFun === BrBne)  -> (op1Data =/= op2Data),
    (exeFun === BrBlt)  -> (op1Data.asSInt < op2Data.asSInt),
    (exeFun === BrBge)  -> !(op1Data.asSInt < op2Data.asSInt),
    (exeFun === BrBltu) -> (op1Data < op2Data),
    (exeFun === BrBgeu) -> !(op1Data < op2Data)
))

brTarget := pcReg + immBsext
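
The signed/unsigned distinction in the comparator is easy to get wrong, so here is a hypothetical plain-Scala model of brFlag and brTarget (BranchModel is an illustrative name). Signed comparisons use Int directly; unsigned ones use java.lang.Integer.compareUnsigned.

```scala
object BranchModel {
  sealed trait BrOp
  case object BrBeq  extends BrOp
  case object BrBne  extends BrOp
  case object BrBlt  extends BrOp
  case object BrBge  extends BrOp
  case object BrBltu extends BrOp
  case object BrBgeu extends BrOp

  def brFlag(op: BrOp, a: Int, b: Int): Boolean = op match {
    case BrBeq  => a == b
    case BrBne  => a != b
    case BrBlt  => a < b                                   // signed
    case BrBge  => !(a < b)                                // signed
    case BrBltu => Integer.compareUnsigned(a, b) < 0       // unsigned
    case BrBgeu => !(Integer.compareUnsigned(a, b) < 0)    // unsigned
  }

  // The target is relative to the branch instruction's own PC, not PC + 4.
  def brTarget(pc: Int, immBsext: Int): Int = pc + immBsext
}
```

For example, -1 (0xFFFFFFFF) is less than 1 for blt but greater than 1 for bltu.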

Because B-type instructions, jal and jalr may jump to another address, the logic that updates pcReg must be modified. For jal and jalr, the jump address computed by the ALU (aluOut) is written directly to pcReg; for a taken branch, brTarget is written instead.

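val pcReg   = RegInit(StartAddr)
val pcPlus4 = pcReg + 4.U(WordLen.W) // default next PC (replaces pcReg := pcReg + 4.U)
imem.addr := pcReg
val inst = imem.inst

val jmpFlag   = inst === Jal || inst === Jalr
val eCallFlag = inst === Ecall

// Declared here, driven later by the ALU and the branch comparator
val aluOut   = Wire(UInt(WordLen.W))
val brFlag   = Wire(Bool())
val brTarget = Wire(UInt(WordLen.W))

pcReg := MuxCase(pcPlus4, Seq(
    brFlag    -> brTarget,
    jmpFlag   -> aluOut,
    eCallFlag -> csrRegFile(0x305.U) // mtvec: trap vector base address
))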
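
The priority of this MuxCase (taken branch, then jump, then ecall, otherwise pcPlus4) can be written as a plain-Scala function. NextPcModel is a hypothetical name, and the flags are assumed to be passed in as already-computed Booleans.

```scala
object NextPcModel {
  // Same priority order as the MuxCase: the first true condition wins.
  def nextPc(pc: Int,
             brFlag: Boolean, brTarget: Int,
             jmpFlag: Boolean, aluOut: Int,
             eCallFlag: Boolean, mtvec: Int): Int =
    if (brFlag) brTarget
    else if (jmpFlag) aluOut
    else if (eCallFlag) mtvec
    else pc + 4
}
```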

When the CPU executes ecall, the trap handler must be invoked. Since this CPU implements only M-mode (machine mode), on an ecall the value of mtvec (CSR 0x305) is written to pcReg so that the CPU jumps to the trap handler. At the same time, the CSR logic shown later writes the exception cause (11, Machine ECALL) to mcause.

Memory access

Only load and store instructions access data memory, so in this base design only lw and sw use this stage.

dmem.addr  := aluOut
dmem.wEn   := memWen
dmem.wData := rs2Data

For lw, the CPU reads data memory at the address rs1Data + immIsext, which is computed by the ALU (aluOut):

dmem.data := Cat(
    mem(dmem.addr + 3.U),
    mem(dmem.addr + 2.U),
    mem(dmem.addr + 1.U),
    mem(dmem.addr)
)

The memWen signal is true only for sw. In other words, only sw writes to data memory: it stores rs2Data at the address rs1Data + immSsext.

when(dmem.wEn) {
    mem(dmem.addr + 3.U) := dmem.wData(31, 24)
    mem(dmem.addr + 2.U) := dmem.wData(23, 16)
    mem(dmem.addr + 1.U) := dmem.wData(15, 8)
    mem(dmem.addr)       := dmem.wData(7, 0) // lowest byte goes to the lowest address
}
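
The store path can be modeled in plain Scala in the same way as the fetch: the word is split into four bytes, with bits 7..0 at the lowest address. StoreWordModel is a hypothetical software model, assuming an Array[Int] with one byte per element.

```scala
object StoreWordModel {
  // Little-endian store: bits 7..0 at addr, bits 31..24 at addr + 3.
  def storeWord(mem: Array[Int], addr: Int, data: Int): Unit = {
    mem(addr)     = data & 0xFF
    mem(addr + 1) = (data >>> 8) & 0xFF
    mem(addr + 2) = (data >>> 16) & 0xFF
    mem(addr + 3) = (data >>> 24) & 0xFF
  }

  // Matching little-endian load, used to check the round trip.
  def loadWord(mem: Array[Int], addr: Int): Int =
    (mem(addr + 3) << 24) | (mem(addr + 2) << 16) | (mem(addr + 1) << 8) | mem(addr)
}
```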

Write back

Except for sw and the B-type instructions, every basic instruction writes a result to register rd when regFileWen === RenS. Most of them write aluOut, but lw writes the data loaded from memory, jal and jalr write pcPlus4 (the return address) instead of aluOut, and the CSR instructions write csrRdata. sw and B-type instructions do not use this stage.

val wbData = MuxCase(aluOut, Seq(
    (wbSel === WbMem) -> dmem.data,
    (wbSel === WbPc)  -> pcPlus4,
    (wbSel === WbCsr) -> csrRdata
))
when(regFileWen === RenS) {
    regFile(wbAddr) := wbData
}

CSR instructions are atomic: each one reads and writes the same CSR in a single step that cannot be split.
Take csrrw ("Atomic Read/Write CSR") as an example: it reads the old value of the CSR and writes it to rd (the WbCsr path in the code above), while the value of rs1 is written to the CSR.

csrrw  rd, csr, rs1
csrrwi rd, csr, imm_z
csrrs  rd, csr, rs1
csrrsi rd, csr, imm_z
csrrc  rd, csr, rs1
csrrci rd, csr, imm_z

s (set): the operand is ORed with the CSR value.

c (clear): the operand is inverted and ANDed with the CSR value.

The i variants use the 5-bit zero-extended immediate imm_z (immZext) as the operand instead of rs1.

// CSR
val csrAddr = Mux(csrCmd === CsrE, 0x342.U, inst(31, 20)) // mcause: 0x342
val csrRdata = csrRegFile(csrAddr)
val csrWdata = MuxCase(0.U, Seq(
    (csrCmd === CsrW) -> op1Data,
    (csrCmd === CsrS) -> (csrRdata | op1Data),
    (csrCmd === CsrC) -> (csrRdata & ~op1Data),
    (csrCmd === CsrE) -> 11.U // Machine ECALL
))
when(csrCmd > 0.U) {
    csrRegFile(csrAddr) := csrWdata
}
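
The three CSR commands can be summarized by a plain-Scala function that returns both the old value (written to rd) and the new CSR value, reflecting the atomic read-and-write described above. CsrModel is hypothetical, and the ecall path (CsrE) is left out.

```scala
object CsrModel {
  sealed trait CsrCmd
  case object CsrW extends CsrCmd // csrrw / csrrwi
  case object CsrS extends CsrCmd // csrrs / csrrsi
  case object CsrC extends CsrCmd // csrrc / csrrci

  // operand is rs1Data for the register forms or immZext for the i forms.
  // Returns (value written to rd, new CSR value).
  def execute(cmd: CsrCmd, csrOld: Int, operand: Int): (Int, Int) = {
    val csrNew = cmd match {
      case CsrW => operand
      case CsrS => csrOld | operand
      case CsrC => csrOld & ~operand
    }
    (csrOld, csrNew)
  }
}
```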

exit is the end-of-simulation signal: it is raised high when the current instruction is unimp. The test steps the clock until it sees exit high, and then the CPU test finishes.

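// In Core: raise exit when the unimp instruction is fetched
exit := (inst === Unimp)

// In the test: step the clock until exit goes high
test(config()) { dut =>
    dut.clock.setTimeout(0) // disable the idle-cycle timeout
    while(!dut.exit.peek().litToBoolean) {
        dut.clock.step()
    }
    dut.clock.step()
}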

Improve ChiselRiscV

While studying this CPU, I found that it lacks some basic RV32I instructions and that the M-extension instructions need to be added.

Complete the load and store instructions

The missing instructions are lh, lhu, lb, lbu, sh and sb. To add them to the current CPU, the first step is to modify the decoder, so I added the bit patterns of these instructions and adjusted the lookup table and some of the control signals.

// bit pattern
val Lw      = BitPat("b?????????????????010?????0000011")
val Lh      = BitPat("b?????????????????001?????0000011")
val Lhu     = BitPat("b?????????????????101?????0000011")
val Lb      = BitPat("b?????????????????000?????0000011")
val Lbu     = BitPat("b?????????????????100?????0000011")
val Sw      = BitPat("b?????????????????010?????0100011")
val Sh      = BitPat("b?????????????????001?????0100011")
val Sb      = BitPat("b?????????????????000?????0100011")
// lookup table
Lh     -> List(AluAdd,    Op1Rs1, Op2Imi, MenX, RenS, WbLH,  CsrX),
Lhu    -> List(AluAdd,    Op1Rs1, Op2Imi, MenX, RenS, WbLHU, CsrX),
Lb     -> List(AluAdd,    Op1Rs1, Op2Imi, MenX, RenS, WbLB,  CsrX),
Lbu    -> List(AluAdd,    Op1Rs1, Op2Imi, MenX, RenS, WbLBU, CsrX),
Sw     -> List(AluAdd,    Op1Rs1, Op2Ims, MenSW, RenX, WbX,  CsrX),
Sh     -> List(AluAdd,    Op1Rs1, Op2Ims, MenSH, RenX, WbX,  CsrX),
Sb     -> List(AluAdd,    Op1Rs1, Op2Ims, MenSB, RenX, WbX,  CsrX),

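Originally, memWen was true only for sw. After adding sh and sb, this signal is expanded to four states (MenX, MenSW, MenSH, MenSB) that specify how many bytes should be stored. Because the memory always writes four bytes starting at dmem.addr, sh and sb merge the new low bytes with the existing upper bytes read from dmem.data.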

// old Memory access
dmem.addr  := aluOut
dmem.wEn   := memWen
dmem.wData := rs2Data

// new Memory access
dmem.addr  := aluOut
dmem.wEn   := (memWen > 0.U)
dmem.wData := MuxCase(0.U, Seq(
    (memWen === MenSW)  -> rs2Data,
    (memWen === MenSH)  -> Cat(dmem.data(31, 16), rs2Data(15, 0)),
    (memWen === MenSB)  -> Cat(dmem.data(31, 8), rs2Data(7, 0))
))
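
The read-modify-write that sh and sb perform on dmem.wData can be expressed in plain Scala as follows; SubWordStoreModel is a hypothetical model in which oldWord stands for dmem.data.

```scala
object SubWordStoreModel {
  sealed trait MemOp
  case object MenSW extends MemOp
  case object MenSH extends MemOp
  case object MenSB extends MemOp

  // Keep the upper bytes of the word already in memory and replace only
  // the low 2 (sh) or 1 (sb) bytes with the low bytes of rs2Data.
  def mergeStore(op: MemOp, oldWord: Int, rs2Data: Int): Int = op match {
    case MenSW => rs2Data
    case MenSH => (oldWord & 0xFFFF0000) | (rs2Data & 0x0000FFFF)
    case MenSB => (oldWord & 0xFFFFFF00) | (rs2Data & 0x000000FF)
  }
}
```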

wbSel is also expanded to 8 states, so that each load instruction has its own write-back behavior (sign or zero extension).

// old Write back
val wbData = MuxCase(aluOut, Seq(
    (wbSel === WbMem) -> dmem.data,
    (wbSel === WbPc)  -> pcPlus4,
    (wbSel === WbCsr) -> csrRdata
))
when(regFileWen === RenS) {
    regFile(wbAddr) := wbData
}

// New write back
val wbData = MuxCase(aluOut, Seq(
    (wbSel === WbLW)  -> dmem.data,
    (wbSel === WbLH)  -> Cat(Fill(16, dmem.data(15)), dmem.data(15, 0)), // sign bit is bit 15
    (wbSel === WbLHU) -> Cat(Fill(16, 0.U), dmem.data(15, 0)),
    (wbSel === WbLB)  -> Cat(Fill(24, dmem.data(7)), dmem.data(7, 0)),   // sign bit is bit 7
    (wbSel === WbLBU) -> Cat(Fill(24, 0.U), dmem.data(7, 0)),
    (wbSel === WbPc)  -> pcPlus4,
    (wbSel === WbCsr) -> csrRdata
))
when(regFileWen === RenS) {
    regFile(wbAddr) := wbData
}
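
Load extension is where a sign-bit mistake is easiest to make: lb must replicate bit 7 and lh must replicate bit 15 of the loaded data. A hypothetical plain-Scala model (SubWordLoadModel, with word standing for dmem.data) shows the expected results:

```scala
object SubWordLoadModel {
  def lb(word: Int): Int  = (word << 24) >> 24 // sign-extend from bit 7
  def lbu(word: Int): Int = word & 0xFF        // zero-extend the low byte
  def lh(word: Int): Int  = (word << 16) >> 16 // sign-extend from bit 15
  def lhu(word: Int): Int = word & 0xFFFF      // zero-extend the low halfword
}
```

Note that lb(0x8000007F) is 0x7F: bit 31 of the word is set, but only bit 7 decides the sign.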

M extension

Adding the M-extension instructions is similar to completing the load and store instructions: I added their bit patterns and extended the lookup table.

// bit pattern
val Mul     = BitPat("b0000001??????????000?????0110011")
val Mulh    = BitPat("b0000001??????????001?????0110011")
val Mulhsu  = BitPat("b0000001??????????010?????0110011")
val Mulhu   = BitPat("b0000001??????????011?????0110011")
val Div     = BitPat("b0000001??????????100?????0110011")
val Divu    = BitPat("b0000001??????????101?????0110011")
val Rem     = BitPat("b0000001??????????110?????0110011")
val Remu    = BitPat("b0000001??????????111?????0110011")
// lookup table
Mul    -> List(AluMul,    Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Mulh   -> List(AluMulh,   Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Mulhsu -> List(AluMulhsu, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Mulhu  -> List(AluMulhu,  Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Div    -> List(AluDiv,    Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Divu   -> List(AluDivu,   Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Rem    -> List(AluRem,    Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Remu   -> List(AluRemu,   Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),

For the M-extension instructions, it is sufficient to extend the ALU; the remaining control signals are the same as for the add instruction.

// ALU
aluOut := MuxCase(0.U, Seq(
    (exeFun === AluAdd)    -> (op1Data + op2Data),
    (exeFun === AluSub)    -> (op1Data - op2Data),
    (exeFun === AluAnd)    -> (op1Data & op2Data),
    (exeFun === AluOr)     -> (op1Data | op2Data),
    (exeFun === AluXor)    -> (op1Data ^ op2Data),
    (exeFun === AluSll)    -> (op1Data << op2Data(4, 0))(31, 0),
    (exeFun === AluSrl)    -> (op1Data >> op2Data(4, 0)),
    (exeFun === AluSra)    -> (op1Data.asSInt >> op2Data(4, 0)).asUInt,
    (exeFun === AluSlt)    -> (op1Data.asSInt < op2Data.asSInt).asUInt,
    (exeFun === AluSltu)   -> (op1Data < op2Data).asUInt,
    (exeFun === AluJalr)   -> ((op1Data + op2Data) & ~1.U(WordLen.W)),
    (exeFun === AluCopy1)  -> op1Data,
    (exeFun === AluMul)    -> (op1Data * op2Data)(31, 0).asUInt,
    (exeFun === AluMulh)   -> (op1Data.asSInt * op2Data.asSInt)(63, 32).asUInt,
    (exeFun === AluMulhsu) -> (op1Data.asSInt * op2Data)(63, 32).asUInt,
    (exeFun === AluMulhu)  -> (op1Data * op2Data)(63, 32).asUInt,
    (exeFun === AluDiv)    -> Mux(op2Data.asSInt === 0.S(WordLen.W), 0xFFFFFFFF.S(WordLen.W), (op1Data.asSInt/op2Data.asSInt)).asUInt,
    (exeFun === AluDivu)   -> Mux(op2Data === 0.U(WordLen.W), 0xFFFFFFFF.S(WordLen.W).asUInt, (op1Data/op2Data)).asUInt,
    (exeFun === AluRem)    -> Mux(op2Data.asSInt === 0.S(WordLen.W), op1Data.asSInt, (op1Data.asSInt % op2Data.asSInt)).asUInt,
    (exeFun === AluRemu)   -> Mux(op2Data === 0.U(WordLen.W), op1Data, (op1Data % op2Data)).asUInt
))

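For division, you have to be careful about dividing by 0. The specification defines this case: the quotient of a division by zero has all bits set, and the remainder of a division by zero equals the dividend. The signed overflow case (the most negative integer divided by -1) is also defined: the quotient equals the dividend and the remainder is 0.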
from Implemented M-extension of RISCV
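
A plain-Scala reference model of the M-extension results can be handy for checking the ALU by hand. It is a sketch, not the project's code (MulDivModel is hypothetical): high products are computed with 64-bit Long arithmetic, and the division-by-zero and signed-overflow cases are handled explicitly as described above.

```scala
object MulDivModel {
  private val Mask32 = 0xFFFFFFFFL
  private def u(x: Int): Long = x.toLong & Mask32 // zero-extend to 64 bits
  private def s(x: Int): Long = x.toLong          // sign-extend to 64 bits

  def mul(a: Int, b: Int): Int    = a * b                          // low 32 bits
  def mulh(a: Int, b: Int): Int   = ((s(a) * s(b)) >> 32).toInt    // signed x signed
  def mulhsu(a: Int, b: Int): Int = ((s(a) * u(b)) >> 32).toInt    // signed x unsigned
  def mulhu(a: Int, b: Int): Int  = ((u(a) * u(b)) >>> 32).toInt   // unsigned x unsigned

  // Division by zero: quotient all ones, remainder = dividend.
  // Signed overflow (Int.MinValue / -1): quotient = dividend, remainder = 0.
  def div(a: Int, b: Int): Int =
    if (b == 0) -1
    else if (a == Int.MinValue && b == -1) Int.MinValue
    else a / b // JVM division truncates toward zero, as RISC-V does

  def divu(a: Int, b: Int): Int = if (b == 0) -1 else Integer.divideUnsigned(a, b)

  def rem(a: Int, b: Int): Int =
    if (b == 0) a
    else if (a == Int.MinValue && b == -1) 0
    else a % b // the remainder takes the sign of the dividend

  def remu(a: Int, b: Int): Int = if (b == 0) a else Integer.remainderUnsigned(a, b)
}
```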

Riscv-Arch-Test

The riscv-arch-test is an evolving set of tests created to help ensure that software written for a given RISC-V profile/specification will run on all implementations that comply with that profile. The older 2.x versions of the framework are based on Makefiles, while the current version 3.10, which I adopted, uses RISCOF as its base framework.

RISCOF (the RISC-V Compatibility Framework) is a Python-based framework that tests a RISC-V target (hardware or software implementation) against a standard RISC-V golden reference model using a suite of RISC-V architectural assembly tests.

RISCOF generates standard pre-built templates for the DUT and the reference model via the setup command, as shown below:

riscof setup --dutname=spike

The above command will generate the following files and directories in the current directory:

├──config.ini                   # configuration file for riscof
├──spike/                       # DUT plugin templates
   ├── env
   │   ├── link.ld              # DUT linker script
   │   └── model_test.h         # DUT specific header file
   ├── riscof_spike.py          # DUT python plugin
   ├── spike_isa.yaml           # DUT ISA yaml based on riscv-config
   └── spike_platform.yaml      # DUT Platform yaml based on riscv-config
├──sail_cSim/                   # reference plugin templates
   ├── env
   │   ├── link.ld              # Reference linker script
   │   └── model_test.h         # Reference model specific header file
   ├── __init__.py
   └── riscof_sail_cSim.py      # Reference model python plugin.

By default, the generated config.ini template looks something like this:

[RISCOF]
ReferencePlugin=sail_cSim
ReferencePluginPath=/path/to/riscof/sail_cSim
DUTPlugin=spike
DUTPluginPath=/path/to/riscof/spike

## Example configuration for spike plugin.
[spike]
pluginpath=/path/to/riscof/spike/
ispec=/path/to/riscof/spike/spike_isa.yaml
pspec=/path/to/riscof/spike/spike_platform.yaml

[sail_cSim]
pluginpath=/path/to/riscof/sail_cSim

Before running RISCOF, you must fill in config.ini with the paths to your hardware model's files: the plugin directory, the ISA YAML (ispec) and the platform YAML (pspec).

DUT Plugin

A typical DUT plugin directory has the following structure:

├──dut-name/                    # DUT plugin templates
   ├── env
   │   ├── link.ld              # DUT linker script
   │   └── model_test.h         # DUT specific header file
   ├── riscof_dut-name.py       # DUT python plugin
   ├── dut-name_isa.yaml        # DUT ISA yaml based on riscv-config
   └── dut-name_platform.yaml   # DUT Platform yaml based on riscv-config

The Python plugin file captures how the model compiles the tests, executes them on the DUT and finally extracts the signature of each test.

The YAML specs in the DUT plugin directory are the most important inputs to RISCOF: all decisions about which tests to run are based on these files. They must follow the syntax/format specified by riscv-config, and RISCOF validates them using riscv-config.

The env folder can also contain other necessary plugin specific files for pre/post processing of logs, signatures, elfs, etc.

Building ChiselRiscV Plugin

ChiselRiscV takes as input a .hex file produced by compiling a .c program with riscv-gnu-toolchain. I use the following Makefile rule to compile a test program:

%: ./src/%.c
    riscv32-unknown-elf-as -R -march=rv32i_zicsr -mabi=ilp32 -o ./build/init.o ./scripts/init.S
    riscv32-unknown-elf-gcc $< -O0 -march=rv32im_zicsr -mabi=ilp32 -c -o ./build/$@.o
    riscv32-unknown-elf-ld ./build/$@.o ./build/init.o -b elf32-littleriscv -T ./scripts/link.ld -o ./build/$@
    riscv32-unknown-elf-objcopy ./build/$@ -O binary ./bin/$@.bin
    od ./bin/$@.bin -An -tx1 -w1 -v > ../../test/resources/hex/$@.hex
    riscv32-unknown-elf-objdump ./build/$@ -b elf32-littleriscv -D > ./dump/$@.dump

This produces a .hex file that serves as the hardware's memory image, plus a .dump disassembly for debugging.
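The `od -An -tx1 -w1 -v` step turns the raw binary into text with one byte per line, written as two hex digits. This matches the byte-wide `mem` that the hex file is loaded into. The following is a minimal C sketch of that conversion. `bin_to_hex_lines` is a hypothetical name. Unlike `od`, it emits no leading space on each line, and Verilog's `$readmemh` (which a hex-format `loadMemoryFromFileInline` generates) ignores that whitespace anyway.

```c
#include <stddef.h>
#include <stdio.h>

/* Format n bytes of a flat binary image as lowercase hex, one byte per
 * line, like `od -An -tx1 -w1 -v` but without od's leading space.
 * out must hold at least 3 * n + 1 chars; returns the string length. */
size_t bin_to_hex_lines(const unsigned char *bin, size_t n, char *out)
{
    size_t len = 0;
    out[0] = '\0'; /* valid empty string even when n == 0 */
    for (size_t i = 0; i < n; i++) {
        /* "%02x\n" writes exactly 3 chars plus a terminating NUL */
        len += (size_t)sprintf(out + len, "%02x\n", (unsigned)bin[i]);
    }
    return len;
}
```

Line i of the output holds byte i of the image, so the loader places it in mem(i). This assumes link.ld places the image at address 0.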

The test command is:

sbt "testOnly ChiselRiscV.tests.HexTest -- -DprogramFile=ctest.hex -DwriteVcd=1"

I added the -DprogramFile argument to specify which .hex file to load. The loadMemoryFromFileInline function then loads that file into mem:

def loadMemoryFromHexFile(filename: Option[String]): Unit = loadMemoryFromFileInline(mem, filename.get)

In RISCOF, a test passes when the signature produced on the DUT matches the one produced by the reference model. The signature is the memory region between begin_signature and end_signature in the program's data. The DUT therefore needs a way to print memory contents. However, the Chisel version I used has no counterpart to Verilog's $fwrite for writing data to a file. Instead, I append 2>&1 | tee output.stdout to save the terminal output, then extract only the lines that contain memory data. I adopted this method from nucleusrv.

sbt \"testOnly ChiselRiscV.tests.HexTest -- -DprogramFile=ctest.hex -DwriteVcd=1\" 2>&1 | tee output.stdout; `grep '^[a-f0-9]\+$$' output.stdout > output.signature`
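The `grep '^[a-f0-9]\+$$'` filter keeps only lines made entirely of lowercase hex digits. This drops sbt's `[info]` lines and keeps the printed words. The doubled `$$` is presumably there because the command runs through a Makefile, where `$$` stands for a literal `$`. The C sketch below shows the same line test. `is_signature_line` is a hypothetical helper, and the line is assumed to have its newline already stripped.

```c
#include <stdbool.h>

/* Mirror of grep '^[a-f0-9]\+$': true if line is non-empty and every
 * character is a lowercase hex digit. */
bool is_signature_line(const char *line)
{
    if (*line == '\0')
        return false; /* \+ requires at least one character */
    for (; *line != '\0'; line++) {
        bool digit = (*line >= '0' && *line <= '9');
        bool lower_hex = (*line >= 'a' && *line <= 'f');
        if (!digit && !lower_hex)
            return false;
    }
    return true;
}
```

The grep shares one weakness with this sketch. Any stray output line that happens to contain only a-f and digits (for example `cafe`) would also be treated as signature data.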

The remaining task is to print the contents of a given memory address. I follow a memory-mapped I/O (MMIO) approach and reserve two rarely used memory addresses for output: OutAddr (0x00100000) and PrintAddr (0x00100004). A program stores the value to print at OutAddr and then stores 1 to PrintAddr. The hardware then prints the word at OutAddr to the terminal and clears PrintAddr back to 0.

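// OutAddr = 0x00100000, PrintAddr = 0x00100004
// PrintAddr_mem is the 32-bit word currently stored at PrintAddr
when(PrintAddr_mem === 1.U(WordLen.W)) {
    // Assemble the little-endian word at OutAddr from its four bytes
    val memdata = Cat(mem(OutAddr + 3.U), mem(OutAddr + 2.U), mem(OutAddr + 1.U), mem(OutAddr))
    printf(cf"${Hexadecimal(memdata)}\n")
    // Clear PrintAddr so each value is printed only once
    mem(PrintAddr)       := 0.U(8.W)
    mem(PrintAddr + 1.U) := 0.U(8.W)
    mem(PrintAddr + 2.U) := 0.U(8.W)
    mem(PrintAddr + 3.U) := 0.U(8.W)
}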
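To make the handshake concrete, here is a host-side C model of the same behavior. It is a sketch that assumes a flat, byte-addressed memory of 8-bit cells like `mem`. The word at OutAddr is assembled little-endian in the same byte order as the `Cat(...)`, and the flag at PrintAddr is cleared after printing. The type and function names are hypothetical, and the check runs once per call rather than on every clock cycle.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define OUT_ADDR   0x00100000u
#define PRINT_ADDR 0x00100004u
#define MEM_SIZE   (PRINT_ADDR + 4u) /* just large enough for both MMIO words */

typedef struct {
    uint8_t bytes[MEM_SIZE]; /* byte-addressed, like the Chisel mem */
} Memory;

/* Little-endian 32-bit store, as a RISC-V sw performs it. */
void store_word(Memory *m, uint32_t addr, uint32_t value)
{
    for (uint32_t i = 0; i < 4; i++)
        m->bytes[addr + i] = (uint8_t)(value >> (8 * i));
}

/* Little-endian 32-bit load: same byte order as Cat(mem(a+3), ..., mem(a)). */
uint32_t load_word(const Memory *m, uint32_t addr)
{
    return (uint32_t)m->bytes[addr]
         | (uint32_t)m->bytes[addr + 1] << 8
         | (uint32_t)m->bytes[addr + 2] << 16
         | (uint32_t)m->bytes[addr + 3] << 24;
}

/* One evaluation of the print logic: if PrintAddr holds 1, print the word
 * at OutAddr as 8 hex digits, clear PrintAddr and report the value. */
bool mmio_print_step(Memory *m, uint32_t *printed)
{
    if (load_word(m, PRINT_ADDR) != 1u)
        return false;
    *printed = load_word(m, OUT_ADDR);
    printf("%08x\n", (unsigned)*printed);
    store_word(m, PRINT_ADDR, 0u);
    return true;
}
```

Printing with `%08x` mirrors the zero-padded 8-digit words seen in the logs (for example `0000005b`). Clearing the flag is what prevents the same value from being printed again on the next step.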

At the end of each test program, the RVMODEL_HALT macro (defined in model_test.h) expands to the following assembly. It walks the memory region from begin_signature to end_signature and prints each word in turn, using the mechanism described above.

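#define RVMODEL_HALT \
            la a0, begin_signature;  /* a0 = address of current word */ \
            la a1, end_signature;    /* a1 = end of signature region */ \
            li t1, 0x00100000;       /* t1 = OutAddr, PrintAddr = t1 + 4 */ \
            li t2, 1; \
        print_data: \
            beq a0, a1, halt;        /* stop after the last word */ \
            lw t0, 0(a0); \
            sw t0, 0(t1);            /* write the word to OutAddr */ \
            sw t2, 4(t1);            /* set PrintAddr = 1 to trigger the print */ \
            addi a0, a0, 4; \
            j print_data; \
        halt: \
            unimp;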
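The macro's loop can be read as the following C sketch. Instead of performing the MMIO stores, it records them, so the sequence can be checked on a host. `MmioWrite` and `halt_write_sequence` are hypothetical names. On the real core, the two stores go straight to OutAddr and PrintAddr (t1 and t1 + 4).

```c
#include <stddef.h>
#include <stdint.h>

#define OUT_ADDR   0x00100000u
#define PRINT_ADDR (OUT_ADDR + 4u)

typedef struct {
    uint32_t addr;
    uint32_t value;
} MmioWrite;

/* Walk the words in [begin, end) like the print_data loop: for each word,
 * store it to OutAddr, then store 1 to PrintAddr to trigger the print.
 * out needs room for 2 * (end - begin) entries; returns the count written. */
size_t halt_write_sequence(const uint32_t *begin, const uint32_t *end,
                           MmioWrite *out)
{
    size_t n = 0;
    for (const uint32_t *p = begin; p != end; p++) { /* beq a0, a1, halt */
        out[n++] = (MmioWrite){OUT_ADDR, *p};        /* sw t0, 0(t1) */
        out[n++] = (MmioWrite){PRINT_ADDR, 1u};      /* sw t2, 4(t1) */
    }
    return n;
}
```

The loop exits on equality (`beq`), not on a greater-or-equal comparison. begin_signature and end_signature must therefore be a whole number of words apart, or the loop would run past end_signature.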

Result

test result
    INFO | Running Tests on Reference Model.
    INFO | Initiating signature checking.
    INFO | Following 47 tests have been run :

    INFO | TEST NAME                                          : COMMIT ID                                : STATUS
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/add-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/addi-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/and-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/andi-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/auipc-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/beq-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bge-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bgeu-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/blt-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bltu-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bne-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/fence-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/jal-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/jalr-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lb-align-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lbu-align-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lh-align-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lhu-align-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lui-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lw-align-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/misalign1-jalr-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/or-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/ori-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sb-align-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sh-align-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sll-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slli-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slt-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slti-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sltiu-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sltu-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sra-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srai-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srl-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srli-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sub-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sw-align-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/xor-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/xori-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/div-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/divu-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/mul-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/mulh-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/mulhsu-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/mulhu-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/rem-01.S : -                                        : Passed
    INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/remu-01.S : -                                        : Passed

Program Test

Test 1

This test is modified from Quiz 2, Problem A.

//shift-and-add multiplication algorithm
.global main

.data
    multiplier: .word -9
    multiplicand: .word 7

.text
main:
    addi sp, sp, -16
    sw ra, 12(sp)
    sw s0, 8(sp)
    addi s0, sp, 16

    la a0, multiplier         # Load multiplier address
    lw a1, 0(a0)              # Load multiplier value
    la a2, multiplicand       # Load multiplicand address
    lw a3, 0(a2)              # Load multiplicand value
    li t0, 0                  # Initialize accumulator
    li t1, 32                 # Set bit counter (#A01)

    # Check for negative values
    bltz a1, handle_negative1 # If multiplier negative (#A02)
    j shift_and_add_loop      # Skip to main loop (#A05)
    bltz a3, handle_negative2 # Unreachable: always jumped over above (#A03)
    j shift_and_add_loop      # Unreachable (#A04)

handle_negative1:
    neg a1, a1                # Make multiplier positive, then fall through

handle_negative2:
    neg a3, a3                # Negate multiplicand too, so the product's sign is unchanged

shift_and_add_loop:
    beqz t1, end_shift_and_add # Exit if bit count is zero
    andi t2, a1, 1            # Check least significant bit (#A06)
    beqz t2, skip_add         # Skip add if bit is 0
    add t0, t0, a3            # Add to accumulator

skip_add:
    srai a1, a1, 1            # Right shift multiplier
    slli a3, a3, 1            # Left shift multiplicand
    addi t1, t1, -1           # Decrease bit counter
    j shift_and_add_loop      # Repeat loop (#A07)

end_shift_and_add:
    li a4, 0x00100000         # Load print address
    sw t0, 0(a4)              # Store final result (#A08)
    li a5, 1
    sw a5, 4(a4)

    li a5, 0
    mv a0, a5
    lw ra, 12(sp)
    lw s0, 8(sp)
    addi sp, sp, 16
    ret

The test prints 0xffffffc1, which is -63 in two's complement (-9 × 7 = -63).

[info] welcome to sbt 1.10.7 (Temurin Java 1.8.0_432)
[info] loading project definition from /home/user/workspace/MyCPU/project
[info] loading settings for project root from build.sbt...
[info] set current project to cpu (in build file:/home/user/workspace/MyCPU/)
[info] compiling 1 Scala source to /home/user/workspace/MyCPU/target/scala-2.13/classes ...
ffffffc1
[info] HexTest:
[info] CPU
[info] - should work through hex
[info] Run completed in 3 seconds, 78 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
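A C reference model of the Test 1 loop is a convenient way to cross-check the printed value on the host. This sketch (function name hypothetical) omits the sign-handling prologue. Thirty-two rounds of shift-and-add already give the correct low 32 bits of the two's-complement product for any operand signs, because round k tests bit k of the original multiplier.

```c
#include <stdint.h>

/* Reference model of the Test 1 loop: 32 rounds of shift-and-add.
 * Unsigned arithmetic gives the same low 32 bits as the assembly's
 * srai/slli version, so the signed product wraps modulo 2^32. */
int32_t shift_and_add_mul(int32_t multiplier, int32_t multiplicand)
{
    uint32_t a = (uint32_t)multiplier;   /* a1 */
    uint32_t b = (uint32_t)multiplicand; /* a3 */
    uint32_t acc = 0;                    /* t0 */
    for (int bit = 0; bit < 32; bit++) { /* t1 counts down from 32 */
        if (a & 1u)                      /* andi t2, a1, 1 */
            acc += b;                    /* add t0, t0, a3 */
        a >>= 1;  /* srai a1, a1, 1: bits 0..31 are tested the same either way */
        b <<= 1;                         /* slli a3, a3, 1 */
    }
    return (int32_t)acc;
}
```

For example, `shift_and_add_mul(-9, 7)` returns -63, whose bit pattern is the 0xffffffc1 printed above.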

Test 2

This test is modified from Quiz 2, Problem D.

//Ancient Egyptian Multiplication
.global main

.data
    # Define the data section with two numbers to multiply
    num1: .word 13
    num2: .word 7

.text
main:
    addi sp, sp, -16
    sw ra, 12(sp)
    sw s0, 8(sp)
    addi s0, sp, 16
    # Begin the main code in the text section
    li x1, 0x00100000   # Load the print address into register x1
    lw t0, num1         # Load the first number (num1) into register t0
    lw t1, num2         # Load the second number (num2) into register t1
    li t2, 0            # Initialize the result (t2) to 0

loop:                                         
    # Check if the least significant bit of t0 (num1) is 1 (i.e., if the number is odd)
    andi t3, t0, 1
    beq t3, x0, skip_add  # If the bit is 0 (even), skip the addition
    # If the number is odd, add the value in t1 (num2) to the result in t2
    add t2, t2, t1 # D01

skip_add:
    # Perform a right shift on t0 (num1), effectively dividing it by 2
    srli t0, t0, 1 # D02
    # Perform a left shift on t1 (num2), effectively multiplying it by 2
    slli t1, t1, 1 # D03
    # If t0 (num1) is not zero, repeat the loop
    bnez t0, loop

    # Store the final result in the memory location pointed by x1
    sw t2, 0(x1)
    li a5, 1
    sw a5, 4(x1)

    li a5, 0
    mv a0, a5
    lw ra, 12(sp)
    lw s0, 8(sp)
    addi sp, sp, 16
    ret

The test prints 0x0000005b, which is 91 (13 × 7).

[info] welcome to sbt 1.10.7 (Temurin Java 1.8.0_432)
[info] loading project definition from /home/user/workspace/MyCPU/project
[info] loading settings for project root from build.sbt...
[info] set current project to cpu (in build file:/home/user/workspace/MyCPU/)
0000005b
[info] HexTest:
[info] CPU
[info] - should work through hex
[info] Run completed in 2 seconds, 142 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
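As with Test 1, a small C model of the loop can confirm the expected output on the host. This is a sketch with a hypothetical function name. It uses a do-while loop because the assembly tests `bnez t0, loop` only after the first round.

```c
#include <stdint.h>

/* Reference model of the Test 2 loop (Ancient Egyptian multiplication):
 * halve num1 and double num2 each round, adding num2 when num1 is odd. */
uint32_t egyptian_mul(uint32_t num1, uint32_t num2)
{
    uint32_t result = 0;              /* t2 */
    do {
        if (num1 & 1u)                /* andi t3, t0, 1 */
            result += num2;           /* add t2, t2, t1 (D01) */
        num1 >>= 1;                   /* srli t0, t0, 1 (D02) */
        num2 <<= 1;                   /* slli t1, t1, 1 (D03) */
    } while (num1 != 0);              /* bnez t0, loop */
    return result;
}
```

Unlike Test 1's fixed 32 rounds, this loop stops as soon as num1 reaches zero. With num1 = 13 (binary 1101), it takes four rounds and adds 7 + 28 + 56 = 91.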

Test 3

This test is modified from Quiz 4, Problem A.

// Sum of Squares
.global main 

.data
    # Define the data section with two numbers
    n: .word 50
    m: .word 25

.text
main:
    addi sp, sp, -4
    sw ra, 0(sp)

    lw a0, n
    lw a1, m
    call sum_of_squares
    li t0, 0x00100000
    li t1, 1
    sw a0, 0(t0)
    sw t1, 4(t0)

    lw ra, 0(sp)
    addi sp, sp, 4
    ret

sum_of_squares:
    # Check if n (a0) is less than or equal to zero
    bgt a0, x0, recurse_case # __ A01 __

zero_case:
    # If n ≤ 0, return m (a1)
    add a0, a1, x0
    jalr x0, ra, 0 # __ A02 __

recurse_case:
    # Save caller-saved registers on the stack
    add t0, a0, x0           # t0 = a0 (copy n)
    addi sp, sp, -12         # Allocate stack space __ A03 __
    sw a1, 0(sp)             # Save a1 (m)
    sw t0, 4(sp)             # Save t0 (n)
    sw ra, 8(sp)             # Save return address __ A04 __

    # Call the square function
    jal ra, square # __ A05 __

    # Restore registers and stack
    lw a1, 0(sp)             # Restore a1 (m)
    lw t0, 4(sp)             # Restore t0 (n)
    lw ra, 8(sp)             # Restore return address __ A06 __
    addi sp, sp, 12          # Deallocate stack space __ A07 __

    # Update m = m + n^2
    add a1, a1, a0

    # Decrement n: a0 = n - 1
    addi a0, t0, -1

    # Recursive call to sum_of_squares
    addi sp, sp, -4          # Allocate stack space for ra __ A08 __
    sw ra, 0(sp)             # Save return address
    jal ra, sum_of_squares # __ A09 __
    lw ra, 0(sp)             # Restore return address
    addi sp, sp, 4           # Deallocate stack space __ A10 __

    # Return from the function
    jalr x0, ra, 0 # __ A11 __

# Function: square
# Computes the square of an integer (a0 = n), returns result in a0
square:
    addi sp, sp, -8         # Allocate stack space
    sw ra, 0(sp)            # Save return address __ A13 __

    add t0, x0, x0          # t0 = 0 (accumulator for the result)
    add t1, a0, x0          # t1 = a0 (copy of n, multiplicand)
    add t2, a0, x0          # t2 = a0 (copy of n, multiplier)

square_loop:
    andi t3, t2, 1          # Check the lowest bit of t2 (t2 & 1) __ A14 __
    beq t3, x0, skip_add    # If the bit is 0, skip addition
    add t0, t0, t1          # Accumulate: t0 += t1

skip_add:
    sll t1, t1, 1           # Left shift t1 (multiply by 2) __ A15 __
    srl t2, t2, 1           # Right shift t2 (divide by 2) __ A16 __
    bne t2, x0, square_loop # Repeat loop if t2 is not zero __ A17 __

square_end:
    add a0, t0, x0          # Move result to a0

    lw ra, 0(sp)            # Restore return address __ A18 __
    addi sp, sp, 8          # Deallocate stack space
    jalr x0, ra, 0          # Return from function __ A19 __

The test prints 0x0000a7c6, which is 42950.

[info] welcome to sbt 1.10.7 (Temurin Java 1.8.0_432)
[info] loading project definition from /home/user/workspace/MyCPU/project
[info] loading settings for project root from build.sbt...
[info] set current project to cpu (in build file:/home/user/workspace/MyCPU/)
0000a7c6
[info] HexTest:
[info] CPU
[info] - should work through hex
[info] Run completed in 8 seconds, 172 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

The C program below produces the same result. It implements the following equation:

\[ sum = m + n^2 + (n - 1)^2 + \ldots + 1^2 \]

#include <stdio.h>

int main() {
    int n=50, m=25;
    int sum = 0;
    for(int i=1;i<n+1;i++){
        sum += i * i;
    }
    printf("%d\n", sum+m);

    return 0;
}
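As a further check, the standard closed form for the sum of the first n squares gives the expected value directly, with n = 50 and m = 25:

\[
\mathrm{sum} = m + \sum_{i=1}^{n} i^2 = m + \frac{n(n+1)(2n+1)}{6} = 25 + \frac{50 \cdot 51 \cdot 101}{6} = 25 + 42925 = 42950 = \mathtt{0xa7c6}
\]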