郭君瑋
In my term project, my goal is to study ChiselRiscV and the books mentioned on the project page to understand how to build a RISC-V processor using Chisel, and to ensure that it passes riscv-arch-test and supports the RV32IM and CSR instructions. Finally, I select and modify at least three RISC-V programs from the course quizzes and make them run on my improved ChiselRiscV processor.
The entire process can be divided into the following three main steps:
Before beginning my study and implementation, it is essential to build my environment. The following are the OS and the software I used in this project.
For chisel
For riscv-arch-test
I use Docker to set up my environment. The detailed installation of the above software, and of other software not mentioned, is recorded in the following Dockerfile.
# Building Stage
FROM ubuntu:22.04 AS stage_building
ARG RISCV_GNU_DEP="wget curl make autoconf automake autotools-dev python3 python3-pip \
libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf \
libtool patchutils bc zlib1g-dev libexpat-dev ninja-build git libglib2.0-dev"
ARG SAIL_DEP="opam build-essential libgmp-dev z3 pkg-config zlib1g-dev"
RUN apt update && apt -y install ${RISCV_GNU_DEP} ${SAIL_DEP}
# riscv-gnu-toolchain
RUN git clone https://github.com/riscv/riscv-gnu-toolchain && \
cd riscv-gnu-toolchain && \
./configure --prefix=/usr/local/riscv --enable-multilib --with-arch=rv32gc --with-abi=ilp32d && \
make -j `nproc` && \
make install
# SAIL C-emulator
RUN opam init -y --disable-sandboxing && \
opam switch create ocaml-base-compiler.4.08.1 && \
opam install sail -y && \
eval $(opam config env) && \
git clone https://github.com/riscv/sail-riscv.git /usr/local/sail-riscv && \
cd /usr/local/sail-riscv && \
ARCH=RV32 make -j `nproc`
# Final Stage
FROM ubuntu:22.04 AS stage_final
# copy tools built in last stage to final stage
COPY --from=stage_building /usr/local/riscv /usr/local/riscv
COPY --from=stage_building /usr/local/sail-riscv /usr/local/sail-riscv
# set PATH to include RISC-V GNU Toolchain and SAIL C-emulator
ENV PATH="$PATH:/usr/local/riscv/bin"
ENV PATH="$PATH:/usr/local/sail-riscv/c_emulator"
# zip and unzip are required for the sdkman to be installed from script
RUN \
apt update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y \
build-essential \
verilator \
curl \
zip \
unzip \
sudo \
git \
python3 \
python3-pip \
&& \
rm -rf /var/lib/apt/lists/*
# this SHELL command is needed to allow `source` to work properly
# reference: https://stackoverflow.com/questions/20635472/using-the-run-instruction-in-a-dockerfile-with-source-does-not-work/45087082#45087082
SHELL ["/bin/bash", "-c"]
# add a user whose uid and gid are same as the master user
ARG UID GID NAME=user
RUN groupadd -g $GID -o $NAME
RUN useradd -u $UID -m -g $NAME -G plugdev $NAME && \
echo "$NAME ALL = NOPASSWD: ALL" > /etc/sudoers.d/user && \
chmod 0440 /etc/sudoers.d/user
RUN chown -R $NAME:$NAME /home/$NAME
USER $NAME
# reference: https://sdkman.io/install
RUN curl -s "https://get.sdkman.io" | bash
RUN source "$HOME/.sdkman/bin/sdkman-init.sh" && \
sdk install java $(sdk list java | grep -o "\b8\.[0-9]*\.[0-9]*\-tem" | head -1) && \
sdk install sbt
# install riscof and riscv-arch-test
RUN pip3 install --upgrade pip && \
pip3 install riscof
ENV PATH="$PATH:/home/user/.local/bin"
WORKDIR "/home/user/workspace"
ENTRYPOINT ["/bin/bash"]
In the official RISCOF documentation, there is a small mistake regarding the installation of SAIL: the version of ocaml-base-compiler needs to be at least 4.08.1 instead of 4.06.1. Additionally, when RISCOF is installed with pip, it is placed in /home/user/.local/bin, so you need to run export PATH="$PATH:/home/user/.local/bin" to make the riscof command available in your terminal.
ChiselRiscV is a 32-bit RISC-V CPU implemented according to the book "CPU Design with RISC-V and Chisel - First step to custom CPU implementation with open-source ISA". To understand how to build a RISC-V CPU in Chisel, the first step is to clone this repository.
git clone https://github.com/nozomioshi/ChiselRiscV.git
After cloning the repository, we can see the following file structure:
.
├── build.sbt
├── doc
├── dockerfile
├── .gitignore
├── README.md
├── results
├── src
│ ├── main
│ │ ├── c
│ │ ├── scala
│ │ └── shell
│ └── test
│ ├── resources
│ └── scala
└── target
In Chisel, the source code is divided into two parts, main and test. main contains the code for all hardware behavior, and test is similar to a testbench in Verilog: it is responsible for providing inputs and verifying outputs. build.sbt contains the build configuration and the versions of Scala and Chisel.
To run all the tests, use
sbt test
or use testOnly
sbt "testOnly ctest.tests.HexTest"
to run a specific test.
Memory is an array of UInt(8.W) entries, each representing one byte. RISC-V is little-endian, which means the least significant byte of a word is stored at the lowest address.
8 bit fetch.hex
Address | Data |
---|---|
0 | 11 |
1 | 12 |
2 | 13 |
3 | 14 |
4 | 21 |
… | … |
11 | 34 |
8 bit fetch.hex after being organized into 32 bit.
Address | Data |
---|---|
0 | 14131211 |
4 | 24232221 |
8 | 34333231 |
Hence, an instruction fetch from memory behaves as follows.
imem.inst := Cat(
mem(imem.addr + 3.U),
mem(imem.addr + 2.U),
mem(imem.addr + 1.U),
mem(imem.addr )
)
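As a cross-check of the byte order, the same concatenation can be modeled in plain Scala without Chisel. This is only a minimal sketch: `FetchModel` and `fetchWord` are hypothetical names, and the array holds just the first eight bytes of the example fetch.hex shown above.

```scala
object FetchModel {
  // Byte-addressed memory with the first eight bytes of the example fetch.hex.
  val mem: Array[Int] = Array(0x11, 0x12, 0x13, 0x14, 0x21, 0x22, 0x23, 0x24)

  // Assemble a 32-bit little-endian word starting at byte address `addr`,
  // mirroring Cat(mem(addr + 3), mem(addr + 2), mem(addr + 1), mem(addr)).
  def fetchWord(mem: Array[Int], addr: Int): Int =
    (mem(addr + 3) << 24) | (mem(addr + 2) << 16) | (mem(addr + 1) << 8) | mem(addr)
}
```

Calling `FetchModel.fetchWord(FetchModel.mem, 0)` yields 0x14131211, matching the first row of the 32-bit table.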
pcReg starts at StartAddr when the test starts and is incremented by 4 every cycle. Each cycle, the CPU fetches the instruction addressed by pcReg.
val pcReg = RegInit(StartAddr)
pcReg := pcReg + 4.U
imem.addr := pcReg
val inst = imem.inst
32 registers are defined in RV32I, which are used to store data and addresses. Each register is 32-bit wide.
Address | register |
---|---|
0 | zero |
1 | ra |
2 | sp |
3 | gp |
4 | tp |
5 | t0 |
6 | t1 |
7 | t2 |
8 | s0/fp |
9 | s1 |
10 | a0 |
11 | a1 |
12 | a2 |
13 | a3 |
14 | a4 |
15 | a5 |
16 | a6 |
17 | a7 |
18 | s2 |
19 | s3 |
20 | s4 |
21 | s5 |
22 | s6 |
23 | s7 |
24 | s8 |
25 | s9 |
26 | s10 |
27 | s11 |
28 | t3 |
29 | t4 |
30 | t5 |
31 | t6 |
A register file memory is declared in Core. We can use the table above to find the index of the register we want to access.
val regFile = Mem(32, UInt(WordLen.W))
RV32I basic instruction set is composed of 32-bit instructions.
The instruction formats are divided into six types: R-Type, I-Type, S-Type, U-Type, J-Type, and B-Type. J-Type and B-Type are variants of U-Type and S-Type respectively, so there are four basic instruction formats.
[R-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+---------------------+--------------+--------------+--------+--------------+---------------------+
| funct7 | rs2 | rs1 | funct3 | rd | opcode |
+---------------------+--------------+--------------+--------+--------------+---------------------+
[I-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+------------------------------------+--------------+--------+--------------+---------------------+
| imm_i | rs1 | funct3 | rd | opcode |
+------------------------------------+--------------+--------+--------------+---------------------+
[S-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+---------------------+--------------+--------------+--------+--------------+---------------------+
| imm_s(11:5) | rs2 | rs1 | funct3 | imm_s(4:0) | opcode |
+---------------------+--------------+--------------+--------+--------------+---------------------+
[U-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+------------------------------------------------------------+--------------+---------------------+
| imm_u(31:12) | rd | opcode |
+------------------------------------------------------------+--------------+---------------------+
[J-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+------------------------------------------------------------+--------------+---------------------+
| imm_j(20 + 10:1 + 11 + 19:12) | rd | opcode |
+------------------------------------------------------------+--------------+---------------------+
[B-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+---------------------+--------------+--------------+--------+--------------+---------------------+
| imm_b(12 + 10:5) | rs2 | rs1 | funct3 | imm_b(4:1+11)| opcode |
+---------------------+--------------+--------------+--------+--------------+---------------------+
The CPU can analyze the instructions based on these formats to determine the required registers and data.
val rs1Addr = inst(19, 15)
val rs2Addr = inst(24, 20)
val wbAddr = inst(11, 7)
val rs1Data = Mux(rs1Addr =/= 0.U, regFile(rs1Addr), 0.U)
val rs2Data = Mux(rs2Addr =/= 0.U, regFile(rs2Addr), 0.U)
The immediates can be extracted in the same way.
val immI = inst(31, 20)
val immIsext = Cat(Fill(20, immI(11)), immI)
val immS = Cat(inst(31, 25), inst(11, 7))
val immSsext = Cat(Fill(20, immS(11)), immS)
val immB = Cat(inst(31), inst(7), inst(30, 25), inst(11, 8))
val immBsext = Cat(Fill(19, immB(11)), immB, 0.U(1.W))
val immJ = Cat(inst(31), inst(19, 12), inst(20), inst(30, 21))
val immJsext = Cat(Fill(11, immJ(19)), immJ, 0.U(1.W))
val immU = inst(31, 12)
val immUshifted = Cat(immU, 0.U(12.W))
val immZ = inst(19, 15)
val immZext = Cat(Fill(27, 0.U), immZ)
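To sanity-check the bit slicing, the immediates can be decoded in software and compared against known encodings. The following is a minimal plain-Scala sketch, not part of ChiselRiscV; `ImmDecode`, `sext`, and `bits` are hypothetical helper names, and a Scala `Int` stands in for the 32-bit word.

```scala
object ImmDecode {
  // Sign-extend the low `width` bits of `value` to a 32-bit Int.
  def sext(value: Int, width: Int): Int = (value << (32 - width)) >> (32 - width)

  // Extract inst(hi, lo), inclusive, like Chisel bit extraction (field width < 32).
  def bits(inst: Int, hi: Int, lo: Int): Int = (inst >>> lo) & ((1 << (hi - lo + 1)) - 1)

  def immI(inst: Int): Int = sext(bits(inst, 31, 20), 12)

  def immS(inst: Int): Int = sext((bits(inst, 31, 25) << 5) | bits(inst, 11, 7), 12)

  // B-type: imm[12|11|10:5|4:1] with an implicit zero in bit 0.
  def immB(inst: Int): Int =
    sext((bits(inst, 31, 31) << 12) | (bits(inst, 7, 7) << 11) |
         (bits(inst, 30, 25) << 5) | (bits(inst, 11, 8) << 1), 13)

  // J-type: imm[20|19:12|11|10:1] with an implicit zero in bit 0.
  def immJ(inst: Int): Int =
    sext((bits(inst, 31, 31) << 20) | (bits(inst, 19, 12) << 12) |
         (bits(inst, 20, 20) << 11) | (bits(inst, 30, 21) << 1), 21)

  // U-type: the upper 20 bits, already in position.
  def immU(inst: Int): Int = inst & 0xFFFFF000
}
```

For example, `ImmDecode.immI(0xFFF50513)` (the encoding of addi a0, a0, -1) gives -1, and `ImmDecode.immB(0xFE000EE3)` (beq x0, x0, -4) gives -4.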
Since each instruction behaves differently, control signals are needed to select the circuit paths. The following signals are used in this CPU.
val exeFun :: op1Sel :: op2Sel :: memWen :: regFileWen :: wbSel :: csrCmd :: Nil = controlSignals
A lookup table is required to decode every instruction and determine all the signals.
val controlSignals = ListLookup(inst, List(AluX, Op1Rs1, Op2Rs2, MenX, RenS, WbX, CsrX),
Array(
Lw -> List(AluAdd, Op1Rs1, Op2Imi, MenX, RenS, WbMem, CsrX),
Sw -> List(AluAdd, Op1Rs1, Op2Ims, MenS, RenX, WbX, CsrX),
Add -> List(AluAdd, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Addi -> List(AluAdd, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
Sub -> List(AluSub, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
And -> List(AluAnd, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Or -> List(AluOr, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Xor -> List(AluXor, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Andi -> List(AluAnd, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
Ori -> List(AluOr, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
Xori -> List(AluXor, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
Sll -> List(AluSll, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Srl -> List(AluSrl, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Sra -> List(AluSra, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Slli -> List(AluSll, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
Srli -> List(AluSrl, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
Srai -> List(AluSra, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
Slt -> List(AluSlt, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Sltu -> List(AluSltu, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Slti -> List(AluSlt, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
Sltiu -> List(AluSltu, Op1Rs1, Op2Imi, MenX, RenS, WbAlu, CsrX),
Beq -> List(BrBeq, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX),
Bne -> List(BrBne, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX),
Blt -> List(BrBlt, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX),
Bge -> List(BrBge, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX),
Bltu -> List(BrBltu, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX),
Bgeu -> List(BrBgeu, Op1Rs1, Op2Rs2, MenX, RenX, WbX, CsrX),
Jal -> List(AluAdd, Op1Pc, Op2Imj, MenX, RenS, WbPc, CsrX),
Jalr -> List(AluJalr, Op1Rs1, Op2Imi, MenX, RenS, WbPc, CsrX),
Lui -> List(AluAdd, Op1X, Op2Imu, MenX, RenS, WbAlu, CsrX),
AuiPc -> List(AluAdd, Op1Pc, Op2Imu, MenX, RenS, WbAlu, CsrX),
CsrRw -> List(AluCopy1, Op1Rs1, Op2X, MenX, RenS, WbCsr, CsrW),
CsrRs -> List(AluCopy1, Op1Rs1, Op2X, MenX, RenS, WbCsr, CsrS),
CsrRc -> List(AluCopy1, Op1Rs1, Op2X, MenX, RenS, WbCsr, CsrC),
CsrRwi -> List(AluCopy1, Op1Imz, Op2X, MenX, RenS, WbCsr, CsrW),
CsrRsi -> List(AluCopy1, Op1Imz, Op2X, MenX, RenS, WbCsr, CsrS),
CsrRci -> List(AluCopy1, Op1Imz, Op2X, MenX, RenS, WbCsr, CsrC),
Ecall -> List(AluX, Op1X, Op2X, MenX, RenX, WbX, CsrE)
)
)
Here, take an add instruction as an example to demonstrate the behavior of the lookup table. Suppose the add instruction is the following.
00a50e33 add t3,a0,a0 # 0x00a50e33 = 0000 0000 1010 0101 0000 1110 0011 0011
The bit pattern of add is b0000000??????????000?????0110011. The CPU uses this pattern to match the instruction against the table, identify which instruction it is, and return the corresponding set of control signals.
val Lw = BitPat("b?????????????????010?????0000011")
val Sw = BitPat("b?????????????????010?????0100011")
// Add
val Add = BitPat("b0000000??????????000?????0110011")
val Addi = BitPat("b?????????????????000?????0010011")
// Subtract
val Sub = BitPat("b0100000??????????000?????0110011")
// Logical
val And = BitPat("b0000000??????????111?????0110011")
val Or = BitPat("b0000000??????????110?????0110011")
val Xor = BitPat("b0000000??????????100?????0110011")
val Andi = BitPat("b?????????????????111?????0010011")
val Ori = BitPat("b?????????????????110?????0010011")
val Xori = BitPat("b?????????????????100?????0010011")
// Shift
val Sll = BitPat("b0000000??????????001?????0110011")
val Srl = BitPat("b0000000??????????101?????0110011")
val Sra = BitPat("b0100000??????????101?????0110011")
val Slli = BitPat("b0000000??????????001?????0010011")
val Srli = BitPat("b0000000??????????101?????0010011")
val Srai = BitPat("b0100000??????????101?????0010011")
// Compare
val Slt = BitPat("b0000000??????????010?????0110011")
val Sltu = BitPat("b0000000??????????011?????0110011")
val Slti = BitPat("b?????????????????010?????0010011")
val Sltiu = BitPat("b?????????????????011?????0010011")
// Branch
val Beq = BitPat("b?????????????????000?????1100011")
val Bne = BitPat("b?????????????????001?????1100011")
val Blt = BitPat("b?????????????????100?????1100011")
val Bge = BitPat("b?????????????????101?????1100011")
val Bltu = BitPat("b?????????????????110?????1100011")
val Bgeu = BitPat("b?????????????????111?????1100011")
// Jump
val Jal = BitPat("b?????????????????????????1101111")
val Jalr = BitPat("b?????????????????000?????1100111")
// Load immediate
val Lui = BitPat("b?????????????????????????0110111")
val AuiPc = BitPat("b?????????????????????????0010111")
// CSR
val CsrRw = BitPat("b?????????????????001?????1110011")
val CsrRwi = BitPat("b?????????????????101?????1110011")
val CsrRs = BitPat("b?????????????????010?????1110011")
val CsrRsi = BitPat("b?????????????????110?????1110011")
val CsrRc = BitPat("b?????????????????011?????1110011")
val CsrRci = BitPat("b?????????????????111?????1110011")
// Exception
val Ecall = BitPat("b00000000000000000000000001110011")
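Conceptually, matching an instruction against such a pattern is a mask-and-compare: every fixed bit must agree, and every `?` is ignored. The sketch below models that idea in plain Scala. It is an illustration under that assumption, not Chisel's BitPat implementation, and `BitPatModel`, `parse`, and `matches` are hypothetical names.

```scala
object BitPatModel {
  // Patterns from the decoder above, built piecewise so the field widths are explicit:
  // funct7 (7) | rs2 + rs1 (10) | funct3 (3) | rd (5) | opcode (7)
  val Add: String = "b0000000" + "?" * 10 + "000" + "?" * 5 + "0110011"
  val Sub: String = "b0100000" + "?" * 10 + "000" + "?" * 5 + "0110011"

  // Convert a pattern into (mask, value): mask has a 1 for every fixed bit,
  // value holds the required contents of those fixed bits.
  def parse(pattern: String): (Int, Int) = {
    val body = pattern.stripPrefix("b")
    require(body.length == 32, "expected a 32-bit pattern")
    body.foldLeft((0, 0)) { case ((mask, value), c) =>
      c match {
        case '?'   => (mask << 1, value << 1)
        case '0'   => ((mask << 1) | 1, value << 1)
        case '1'   => ((mask << 1) | 1, (value << 1) | 1)
        case other => throw new IllegalArgumentException(s"unexpected character: $other")
      }
    }
  }

  // An instruction matches when all of its fixed bits equal the pattern.
  def matches(inst: Int, pattern: String): Boolean = {
    val (mask, value) = parse(pattern)
    (inst & mask) == value
  }
}
```

With the example instruction above, `BitPatModel.matches(0x00a50e33, BitPatModel.Add)` is true, while the same call with `BitPatModel.Sub` is false because funct7 differs.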
Because instruction formats and usages differ, an operation is sometimes performed not on two registers but on a register and an immediate value, so both operands are selected by multiplexers.
val op1Data = MuxCase(0.U, Seq(
(op1Sel === Op1Rs1) -> rs1Data,
(op1Sel === Op1Pc) -> pcReg,
(op1Sel === Op1Imz) -> immZext
))
val op2Data = MuxCase(0.U, Seq(
(op2Sel === Op2Rs2) -> rs2Data,
(op2Sel === Op2Imi) -> immIsext,
(op2Sel === Op2Ims) -> immSsext,
(op2Sel === Op2Imj) -> immJsext,
(op2Sel === Op2Imu) -> immUshifted
))
After op1Data and op2Data are determined, they are sent to the ALU. The design of the ALU is shown in the following code, where the control signal determines which operation is executed.
aluOut := MuxCase(0.U, Seq(
(exeFun === AluAdd) -> (op1Data + op2Data),
(exeFun === AluSub) -> (op1Data - op2Data),
(exeFun === AluAnd) -> (op1Data & op2Data),
(exeFun === AluOr) -> (op1Data | op2Data),
(exeFun === AluXor) -> (op1Data ^ op2Data),
(exeFun === AluSll) -> (op1Data << op2Data(4, 0))(31, 0),
(exeFun === AluSrl) -> (op1Data >> op2Data(4, 0)),
(exeFun === AluSra) -> (op1Data.asSInt >> op2Data(4, 0)).asUInt,
(exeFun === AluSlt) -> (op1Data.asSInt < op2Data.asSInt).asUInt,
(exeFun === AluSltu) -> (op1Data < op2Data).asUInt,
(exeFun === AluJalr) -> ((op1Data + op2Data) & ~1.U(WordLen.W)),
(exeFun === AluCopy1) -> op1Data
))
For B format instructions, a branch comparator is also required.
brFlag := MuxCase(false.B, Seq(
(exeFun === BrBeq) -> (op1Data === op2Data),
(exeFun === BrBne) -> (op1Data =/= op2Data),
(exeFun === BrBlt) -> (op1Data.asSInt < op2Data.asSInt),
(exeFun === BrBge) -> !(op1Data.asSInt < op2Data.asSInt),
(exeFun === BrBltu) -> (op1Data < op2Data),
(exeFun === BrBgeu) -> !(op1Data < op2Data)
))
brTarget := pcReg + immBsext
Because B and J format instructions may jump to another address, pcReg must be modified. If the instruction is Jal or Jalr, the jump address is written directly to pcReg, and the same applies to taken B format instructions.
val pcReg = RegInit(StartAddr)
// pcReg := pcReg + 4.U
imem.addr := pcReg
val inst = imem.inst
val jmpFlag = inst === Jal || inst === Jalr
val eCallFlag = inst === Ecall
val aluOut = Wire(UInt(WordLen.W))
pcReg := MuxCase(pcPlus4, Seq(
brFlag -> brTarget,
jmpFlag -> aluOut,
eCallFlag -> csrRegFile(0x305.U) // Trap vector
))
When the CPU executes ecall, the trap handler must be triggered. Since this CPU only implements M-mode (Machine mode), when an ecall occurs, the value of mtvec must be written to pcReg so that the CPU jumps to the trap handler.
Load and store instructions must access data memory, so only lw and sw are involved in this stage.
dmem.addr := aluOut
dmem.wEn := memWen
dmem.wData := rs2Data
For lw, the CPU reads data memory at the address formed from the register value and the immediate (rs1Data + immI).
dmem.data := Cat(
mem(dmem.addr + 3.U),
mem(dmem.addr + 2.U),
mem(dmem.addr + 1.U),
mem(dmem.addr)
)
The memWen signal is true only for the sw instruction. In other words, only sw writes data memory, at the address formed from the register value and the immediate (rs1Data + immS).
when(dmem.wEn) {
mem(dmem.addr + 3.U) := dmem.wData(31, 24)
mem(dmem.addr + 2.U) := dmem.wData(23, 16)
mem(dmem.addr + 1.U) := dmem.wData(15, 8)
mem(dmem.addr) := dmem.wData(7, 0)
}
Except for lw, sw, and the J and B format instructions, the remaining basic instructions write aluOut back to the specified register when regFileWen === RenS. lw, Jal, and Jalr also write back, but lw writes the data read from memory, while Jal and Jalr write pcPlus4 instead of aluOut. sw and the B format instructions do not use this stage.
val wbData = MuxCase(aluOut, Seq(
(wbSel === WbMem) -> dmem.data,
(wbSel === WbPc) -> pcPlus4,
(wbSel === WbCsr) -> csrRdata
))
when(regFileWen === RenS) {
regFile(wbAddr) := wbData
}
CSR instructions are atomic: such an instruction cannot be divided into separate steps.
In other words, a CSR instruction reads and writes the same CSR in a single operation. Take csrrw ("Atomic Read/Write CSR") as an example: it reads the old value of the CSR and writes it to the rd register (see the write-back code above), while the value of the rs1 register is written to the CSR.
csrrw rd, csr, rs1
csrrwi rd, csr, imm_z
csrrs rd, csr, rs1
csrrsi rd, csr, imm_z
csrrc rd, csr, rs1
csrrci rd, csr, imm_z
In the s (set) variants, the operand is ORed with the CSR value.
In the c (clear) variants, the operand is inverted and ANDed with the CSR value.
// CSR
val csrAddr = Mux(csrCmd === CsrE, 0x342.U, inst(31, 20)) // mcause: 0x342
val csrRdata = csrRegFile(csrAddr)
val csrWdata = MuxCase(0.U, Seq(
(csrCmd === CsrW) -> op1Data,
(csrCmd === CsrS) -> (csrRdata | op1Data),
(csrCmd === CsrC) -> (csrRdata & ~op1Data),
(csrCmd === CsrE) -> 11.U // Machine ECALL
))
when(csrCmd > 0.U) {
csrRegFile(csrAddr) := csrWdata
}
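The read-then-modify behavior of the three CSR commands can be summarized by a small software model. This is a plain-Scala sketch rather than the hardware itself; `CsrModel` and `execute` are hypothetical names, while CsrW, CsrS, and CsrC mirror the control signal names used above.

```scala
object CsrModel {
  sealed trait CsrCmd
  case object CsrW extends CsrCmd // csrrw / csrrwi
  case object CsrS extends CsrCmd // csrrs / csrrsi
  case object CsrC extends CsrCmd // csrrc / csrrci

  // Returns (value written back to rd, new CSR value).
  // rd always receives the old CSR value; op1 is rs1 or the zero-extended imm_z.
  def execute(cmd: CsrCmd, csrOld: Int, op1: Int): (Int, Int) = {
    val csrNew = cmd match {
      case CsrW => op1
      case CsrS => csrOld | op1
      case CsrC => csrOld & ~op1
    }
    (csrOld, csrNew)
  }
}
```

For instance, `CsrModel.execute(CsrModel.CsrC, 0xF, 0x5)` returns `(0xF, 0xA)`: rd gets the old value and the bits selected by the operand are cleared.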
exit is the end-of-test signal. When the instruction is Unimp, the exit signal is raised high. Once the test detects that exit is high, the whole CPU test finishes.
exit := (inst === Unimp)
test(config()) { dut =>
while(!dut.exit.peek().litToBoolean) {
dut.clock.setTimeout(0)
dut.clock.step()
}
dut.clock.step()
}
Through my study, I found that this CPU lacks some base instructions and also needs the M extension instructions added.
The missing instructions are Lh, Lhu, Lb, Lbu, Sh, and Sb. To add them to the current CPU, the first step is to modify the decoder. Hence, I add the bit patterns of these instructions and adjust the lookup table and some of the control signals.
// bit pattern
val Lw = BitPat("b?????????????????010?????0000011")
val Lh = BitPat("b?????????????????001?????0000011")
val Lhu = BitPat("b?????????????????101?????0000011")
val Lb = BitPat("b?????????????????000?????0000011")
val Lbu = BitPat("b?????????????????100?????0000011")
val Sw = BitPat("b?????????????????010?????0100011")
val Sh = BitPat("b?????????????????001?????0100011")
val Sb = BitPat("b?????????????????000?????0100011")
// lookup table
Lh -> List(AluAdd, Op1Rs1, Op2Imi, MenX, RenS, WbLH, CsrX),
Lhu -> List(AluAdd, Op1Rs1, Op2Imi, MenX, RenS, WbLHU, CsrX),
Lb -> List(AluAdd, Op1Rs1, Op2Imi, MenX, RenS, WbLB, CsrX),
Lbu -> List(AluAdd, Op1Rs1, Op2Imi, MenX, RenS, WbLBU, CsrX),
Sw -> List(AluAdd, Op1Rs1, Op2Ims, MenSW, RenX, WbX, CsrX),
Sh -> List(AluAdd, Op1Rs1, Op2Ims, MenSH, RenX, WbX, CsrX),
Sb -> List(AluAdd, Op1Rs1, Op2Ims, MenSB, RenX, WbX, CsrX),
Originally, the memWen signal was true only when the instruction was sw. After adding sh and sb, this signal is expanded to four states so that it also identifies how many bytes should be stored.
// old Memory access
dmem.addr := aluOut
dmem.wEn := memWen
dmem.wData := rs2Data
// new Memory access
dmem.addr := aluOut
dmem.wEn := (memWen > 0.U)
dmem.wData := MuxCase(0.U, Seq(
(memWen === MenSW) -> rs2Data,
(memWen === MenSH) -> Cat(dmem.data(31, 16), rs2Data(15, 0)),
(memWen === MenSB) -> Cat(dmem.data(31, 8), rs2Data(7, 0))
))
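The sh and sb cases above amount to a read-modify-write of the addressed word: the upper bytes come from the word already in memory and only the low halfword or byte comes from rs2. A minimal plain-Scala sketch of that merge follows; `StoreModel`, `storeHalf`, and `storeByte` are hypothetical names.

```scala
object StoreModel {
  // sh: keep the upper 16 bits of the old word, replace the low 16 bits with rs2(15, 0).
  def storeHalf(oldWord: Int, rs2: Int): Int = (oldWord & 0xFFFF0000) | (rs2 & 0xFFFF)

  // sb: keep the upper 24 bits of the old word, replace the low 8 bits with rs2(7, 0).
  def storeByte(oldWord: Int, rs2: Int): Int = (oldWord & 0xFFFFFF00) | (rs2 & 0xFF)
}
```

With an old word of 0x11223344 and rs2 = 0xAABBCCDD, `storeHalf` gives 0x1122CCDD and `storeByte` gives 0x112233DD.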
wbSel is also expanded to 8 states. For each load instruction, there is a corresponding write-back behavior.
// old Write back
val wbData = MuxCase(aluOut, Seq(
(wbSel === WbMem) -> dmem.data,
(wbSel === WbPc) -> pcPlus4,
(wbSel === WbCsr) -> csrRdata
))
when(regFileWen === RenS) {
regFile(wbAddr) := wbData
}
// new Write back
val wbData = MuxCase(aluOut, Seq(
(wbSel === WbLW) -> dmem.data,
(wbSel === WbLH) -> Cat(Fill(16, dmem.data(15)), dmem.data(15, 0)),
(wbSel === WbLHU) -> Cat(Fill(16, 0.U), dmem.data(15, 0)),
(wbSel === WbLB) -> Cat(Fill(24, dmem.data(7)), dmem.data(7, 0)),
(wbSel === WbLBU) -> Cat(Fill(24, 0.U), dmem.data(7, 0)),
(wbSel === WbPc) -> pcPlus4,
(wbSel === WbCsr) -> csrRdata
))
when(regFileWen === RenS) {
regFile(wbAddr) := wbData
}
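The difference between the signed and unsigned loads is only in how the upper bits are filled: lh and lb replicate the top bit of the loaded halfword or byte (bit 15 or bit 7), while lhu and lbu fill with zeros. The following plain-Scala sketch models this on a 32-bit word read from memory; `LoadModel`, `loadHalf`, and `loadByte` are hypothetical names.

```scala
object LoadModel {
  // lh / lhu: take the low halfword; sign-extend from bit 15 or zero-extend.
  def loadHalf(word: Int, signed: Boolean): Int =
    if (signed) (word << 16) >> 16 else word & 0xFFFF

  // lb / lbu: take the low byte; sign-extend from bit 7 or zero-extend.
  def loadByte(word: Int, signed: Boolean): Int =
    if (signed) (word << 24) >> 24 else word & 0xFF
}
```

For the word 0x12348001, `loadHalf(word, signed = true)` gives 0xFFFF8001 and `loadHalf(word, signed = false)` gives 0x00008001; the upper halfword 0x1234 never influences the result.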
Adding the M extension instructions is similar to completing the load and store instructions: the bit patterns must be added and the lookup table modified.
// bit pattern
val Mul = BitPat("b0000001??????????000?????0110011")
val Mulh = BitPat("b0000001??????????001?????0110011")
val Mulhsu = BitPat("b0000001??????????010?????0110011")
val Mulhu = BitPat("b0000001??????????011?????0110011")
val Div = BitPat("b0000001??????????100?????0110011")
val Divu = BitPat("b0000001??????????101?????0110011")
val Rem = BitPat("b0000001??????????110?????0110011")
val Remu = BitPat("b0000001??????????111?????0110011")
// lookup table
Mul -> List(AluMul, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Mulh -> List(AluMulh, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Mulhsu -> List(AluMulhsu, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Mulhu -> List(AluMulhu, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Div -> List(AluDiv, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Divu -> List(AluDivu, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Rem -> List(AluRem, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
Remu -> List(AluRemu, Op1Rs1, Op2Rs2, MenX, RenS, WbAlu, CsrX),
For the M extension instructions, it is sufficient to extend the behavior of the ALU; the rest of the control signals are the same as for the add instruction.
// ALU
aluOut := MuxCase(0.U, Seq(
(exeFun === AluAdd) -> (op1Data + op2Data),
(exeFun === AluSub) -> (op1Data - op2Data),
(exeFun === AluAnd) -> (op1Data & op2Data),
(exeFun === AluOr) -> (op1Data | op2Data),
(exeFun === AluXor) -> (op1Data ^ op2Data),
(exeFun === AluSll) -> (op1Data << op2Data(4, 0))(31, 0),
(exeFun === AluSrl) -> (op1Data >> op2Data(4, 0)),
(exeFun === AluSra) -> (op1Data.asSInt >> op2Data(4, 0)).asUInt,
(exeFun === AluSlt) -> (op1Data.asSInt < op2Data.asSInt).asUInt,
(exeFun === AluSltu) -> (op1Data < op2Data).asUInt,
(exeFun === AluJalr) -> ((op1Data + op2Data) & ~1.U(WordLen.W)),
(exeFun === AluCopy1) -> op1Data,
(exeFun === AluMul) -> (op1Data * op2Data)(31, 0).asUInt,
(exeFun === AluMulh) -> (op1Data.asSInt * op2Data.asSInt)(63, 32).asUInt,
(exeFun === AluMulhsu) -> (op1Data.asSInt * op2Data)(63, 32).asUInt,
(exeFun === AluMulhu) -> (op1Data * op2Data)(63, 32).asUInt,
(exeFun === AluDiv) -> Mux(op2Data.asSInt === 0.S(WordLen.W), 0xFFFFFFFF.S(WordLen.W), (op1Data.asSInt/op2Data.asSInt)).asUInt,
(exeFun === AluDivu) -> Mux(op2Data === 0.U(WordLen.W), 0xFFFFFFFF.S(WordLen.W).asUInt, (op1Data/op2Data)).asUInt,
(exeFun === AluRem) -> Mux(op2Data.asSInt === 0.S(WordLen.W), op1Data.asSInt, (op1Data.asSInt % op2Data.asSInt)).asUInt,
(exeFun === AluRemu) -> Mux(op2Data === 0.U(WordLen.W), op1Data, (op1Data % op2Data)).asUInt
))
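The three high-word multiplications differ only in how each operand is interpreted before the 64-bit product is formed. A plain-Scala reference model, useful for checking expected values in a test, might look like the sketch below; `MulModel` and its method names are hypothetical, and a Scala `Long` stands in for the 64-bit product.

```scala
object MulModel {
  // Reinterpret a 32-bit Int as an unsigned value held in a Long.
  private def unsigned(x: Int): Long = x.toLong & 0xFFFFFFFFL

  // mul: low 32 bits of the product (the same for signed and unsigned operands).
  def mul(a: Int, b: Int): Int = a * b

  // mulh: signed x signed, upper 32 bits.
  def mulh(a: Int, b: Int): Int = ((a.toLong * b.toLong) >> 32).toInt

  // mulhsu: signed x unsigned, upper 32 bits.
  def mulhsu(a: Int, b: Int): Int = ((a.toLong * unsigned(b)) >> 32).toInt

  // mulhu: unsigned x unsigned, upper 32 bits.
  def mulhu(a: Int, b: Int): Int = ((unsigned(a) * unsigned(b)) >>> 32).toInt
}
```

With both operands equal to 0xFFFFFFFF, `mulh` returns 0, `mulhsu` returns 0xFFFFFFFF, and `mulhu` returns 0xFFFFFFFE, which shows why the three cases need separate ALU entries.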
On division, you have to be careful about dividing by 0. The specification also defines this case: the quotient of division by zero has all bits set, and the remainder of division by zero equals the dividend.
from Implemented M-extension of RISCV
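The division rules quoted above can be written as a small reference model. This is a plain-Scala sketch with hypothetical names (`DivModel` and its methods). Besides division by zero, it also covers the signed overflow case (the most negative value divided by -1), which the RISC-V specification defines as quotient equal to the dividend and remainder 0; JVM integer division happens to produce exactly those results.

```scala
object DivModel {
  // div: the quotient of division by zero has all bits set (-1).
  // Int.MinValue / -1 wraps to Int.MinValue on the JVM, matching the overflow rule.
  def div(a: Int, b: Int): Int = if (b == 0) -1 else a / b

  // divu: operands are treated as unsigned; division by zero gives 0xFFFFFFFF.
  def divu(a: Int, b: Int): Int =
    if (b == 0) -1 else java.lang.Integer.divideUnsigned(a, b)

  // rem: the remainder of division by zero equals the dividend.
  def rem(a: Int, b: Int): Int = if (b == 0) a else a % b

  // remu: unsigned remainder; division by zero returns the dividend.
  def remu(a: Int, b: Int): Int =
    if (b == 0) a else java.lang.Integer.remainderUnsigned(a, b)
}
```

For example, `DivModel.div(7, 0)` returns -1 (all bits set) and `DivModel.rem(7, 0)` returns 7.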
riscv-arch-test is an evolving set of tests created to help ensure that software written for a given RISC-V Profile/Specification will run on all implementations that comply with that profile. The older 2.x version of the framework is based on Makefiles, while the current version 3.10, which I adopted, uses RISCOF as its base framework.
RISCOF(The RISC-V Compatibility Framework) is a python based framework which enables testing of a RISC-V target (hard or soft implementations) against a standard RISC-V golden reference model using a suite of RISC-V architectural assembly tests.
RISCOF generates standard pre-built templates for DUTs and reference models for the user via the setup command, as shown below:
riscof setup --dutname=spike
The above command will generate the following files and directories in the current directory:
├──config.ini # configuration file for riscof
├──spike/ # DUT plugin templates
├── env
│ ├── link.ld # DUT linker script
│ └── model_test.h # DUT specific header file
├── riscof_spike.py # DUT python plugin
├── spike_isa.yaml # DUT ISA yaml based on riscv-config
└── spike_platform.yaml # DUT Platform yaml based on riscv-config
├──sail_cSim/ # reference plugin templates
├── env
│ ├── link.ld # Reference linker script
│ └── model_test.h # Reference model specific header file
├── __init__.py
└── riscof_sail_cSim.py # Reference model python plugin.
The generated template config.ini will look something like this by default:
[RISCOF]
ReferencePlugin=sail_cSim
ReferencePluginPath=/path/to/riscof/sail_cSim
DUTPlugin=spike
DUTPluginPath=/path/to/riscof/spike
## Example configuration for spike plugin.
[spike]
pluginpath=/path/to/riscof/spike/
ispec=/path/to/riscof/spike/spike_isa.yaml
pspec=/path/to/riscof/spike/spike_platform.yaml
[sail_cSim]
pluginpath=/path/to/riscof/sail_cSim
Before you start running RISCOF, you should supply the paths of the files describing your hardware model, such as the plugin, ispec, and pspec, in config.ini.
A typical DUT plugin directory has the following structure:
├──dut-name/ # DUT plugin templates
├── env
│ ├── link.ld # DUT linker script
│ └── model_test.h # DUT specific header file
├── riscof_dut-name.py # DUT python plugin
├── dut-name_isa.yaml # DUT ISA yaml based on riscv-config
└── dut-name_platform.yaml # DUT Platform yaml based on riscv-config
The python plugin files capture the behavior of model for compiling tests, executing them on the DUT and finally extracting the signature for each test.
The YAML specs in the DUT plugin directory are the most important inputs to the RISCOF framework. All decisions about filtering tests depend on these YAML files. The files must follow the syntax/format specified by riscv-config. These YAMLs are validated in RISCOF using riscv-config.
The env folder can also contain other plugin-specific files needed for pre/post-processing of logs, signatures, ELFs, etc.
For ChiselRiscV, the input is a .hex file compiled from a .c file with riscv-gnu-toolchain. I use the following Makefile rule to compile the test program.
%: ./src/%.c
riscv32-unknown-elf-as -R -march=rv32i_zicsr -mabi=ilp32 -o ./build/init.o ./scripts/init.S
riscv32-unknown-elf-gcc $< -O0 -march=rv32im_zicsr -mabi=ilp32 -c -o ./build/$@.o
riscv32-unknown-elf-ld ./build/$@.o ./build/init.o -b elf32-littleriscv -T ./scripts/link.ld -o ./build/$@
riscv32-unknown-elf-objcopy ./build/$@ -O binary ./bin/$@.bin
od ./bin/$@.bin -An -tx1 -w1 -v > ../../test/resources/hex/$@.hex
riscv32-unknown-elf-objdump ./build/$@ -b elf32-littleriscv -D > ./dump/$@.dump
This produces a .hex file as the input to the hardware and a dump file for debugging.
The test command is the following:
sbt "testOnly ChiselRiscV.tests.HexTest -- -DprogramFile=ctest.hex -DwriteVcd=1"
I added the -DprogramFile argument to specify where the .hex file is. The loadMemoryFromFileInline function reads it into mem.
def loadMemoryFromHexFile(filename: Option[String]): Unit = loadMemoryFromFileInline(mem, filename.get)
RISCOF tests the hardware by comparing the .data section of the program with that of the reference model to verify that the hardware behaves correctly. Therefore, the DUT must be able to print out memory contents. However, Chisel has no function for writing data to a file like fwrite in Verilog, so I used 2>&1 | tee output.stdout to save the terminal output and capture the memory data from it. This method was provided by nucleusrv.
sbt \"testOnly ChiselRiscV.tests.HexTest -- -DprogramFile=ctest.hex -DwriteVcd=1\" 2>&1 | tee output.stdout; `grep '^[a-f0-9]\+$$' output.stdout > output.signature`
The remaining task is to print the content of a specified memory address. Here I borrow the MMIO approach and reserve two rarely used memory addresses as output addresses. By moving a memory value to OutAddr and then storing 1 to PrintAddr, the hardware prints the value at OutAddr to the terminal.
when(PrintAddr_mem===1.U(WordLen.W)) {// OutAddr = 0x00100000 PrintAddr = 0x00100004
val memdata = Cat(mem(OutAddr + 3.U), mem(OutAddr + 2.U), mem(OutAddr + 1.U), mem(OutAddr))
printf(cf"${Hexadecimal(memdata)}\n")
mem(PrintAddr) := 0.U(8.W)
mem(PrintAddr + 1.U) := 0.U(8.W)
mem(PrintAddr + 2.U) := 0.U(8.W)
mem(PrintAddr + 3.U) := 0.U(8.W)
}
At the end of the test program, I add the following assembly code. It loads the begin_signature and end_signature addresses, then prints the values in this memory region one by one using the method described above.
#define RVMODEL_HALT \
la a0, begin_signature; \
la a1, end_signature; \
li t1, 0x00100000; \
li t2, 1; \
print_data: \
beq a0, a1, halt; \
lw t0, 0(a0); \
sw t0, 0(t1); \
sw t2, 4(t1); \
addi a0 , a0, 4; \
j print_data; \
halt: \
unimp;
INFO | Running Tests on Reference Model.
INFO | Initiating signature checking.
INFO | Following 47 tests have been run :
INFO | TEST NAME : COMMIT ID : STATUS
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/add-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/addi-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/and-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/andi-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/auipc-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/beq-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bge-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bgeu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/blt-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bltu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bne-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/fence-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/jal-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/jalr-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lb-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lbu-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lh-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lhu-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lui-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lw-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/misalign1-jalr-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/or-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/ori-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sb-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sh-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sll-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slli-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slt-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slti-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sltiu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sltu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sra-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srai-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srl-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srli-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sub-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sw-align-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/xor-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/xori-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/div-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/divu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/mul-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/mulh-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/mulhsu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/mulhu-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/rem-01.S : - : Passed
INFO | /home/user/workspace/MyCPU/arch-test-target/riscv-arch-test/riscv-test-suite/rv32i_m/M/src/remu-01.S : - : Passed
This test is modified from Quiz2 Problem A
//shift-and-add multiplication algorithm
.global main
.data
multiplier: .word -9
multiplicand: .word 7
.text
main:
addi sp, sp, -16
sw ra, 12(sp)
sw s0, 8(sp)
addi s0, sp, 16
la a0, multiplier # Load multiplier address
lw a1, 0(a0) # Load multiplier value
la a2, multiplicand # Load multiplicand address
lw a3, 0(a2) # Load multiplicand value
li t0, 0 # Initialize accumulator
li t1, 32 # Set bit counter (#A01)
# Check for negative values
bltz a1, handle_negative1 # If multiplier negative (#A02)
j shift_and_add_loop # Skip to main loop (#A05)
bltz a3, handle_negative2 # Unreachable after the jump above (#A03)
j shift_and_add_loop # Unreachable (#A04)
handle_negative1:
neg a1, a1 # Make multiplier positive
handle_negative2:
neg a3, a3 # Negate multiplicand as well, so the product keeps its sign
shift_and_add_loop:
beqz t1, end_shift_and_add # Exit if bit count is zero
andi t2, a1, 1 # Check least significant bit (#A06)
beqz t2, skip_add # Skip add if bit is 0
add t0, t0, a3 # Add to accumulator
skip_add:
srai a1, a1, 1 # Right shift multiplier
slli a3, a3, 1 # Left shift multiplicand
addi t1, t1, -1 # Decrease bit counter
j shift_and_add_loop # Repeat loop (#A07)
end_shift_and_add:
li a4, 0x00100000 # Load print address
sw t0, 0(a4) # Store final result (#A08)
li a5, 1
sw a5, 4(a4)
li a5, 0
mv a0, a5
lw ra, 12(sp)
lw s0, 8(sp)
addi sp, sp, 16
ret
The test result is 0xffffffc1, which equals -63.
[info] welcome to sbt 1.10.7 (Temurin Java 1.8.0_432)
[info] loading project definition from /home/user/workspace/MyCPU/project
[info] loading settings for project root from build.sbt...
[info] set current project to cpu (in build file:/home/user/workspace/MyCPU/)
[info] compiling 1 Scala source to /home/user/workspace/MyCPU/target/scala-2.13/classes ...
ffffffc1
[info] HexTest:
[info] CPU
[info] - should work through hex
[info] Run completed in 3 seconds, 78 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
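As a cross-check of the 0xffffffc1 result, the control flow of the assembly above can be modeled in C. This is a sketch rather than the code that ran on the processor: the function names are hypothetical, and unsigned arithmetic is used so that negation and shifts wrap modulo 2^32 as they do in registers.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the shift-and-add listing above. When the multiplier is
 * negative, both operands are negated (handle_negative1 falls through into
 * handle_negative2), which leaves the product unchanged. */
int32_t shift_add_mul(int32_t multiplier, int32_t multiplicand)
{
    uint32_t a1 = (uint32_t)multiplier;
    uint32_t a3 = (uint32_t)multiplicand;
    uint32_t acc = 0;

    if (multiplier < 0) {
        a1 = 0u - a1;               /* neg a1, a1 */
        a3 = 0u - a3;               /* neg a3, a3 */
    }
    for (int bit = 0; bit < 32; ++bit) {
        if (a1 & 1u)
            acc += a3;              /* add t0, t0, a3 */
        /* The assembly uses srai; over exactly 32 iterations a logical shift
         * exposes the same bits in position 0, so the result is identical. */
        a1 >>= 1;
        a3 <<= 1;                   /* slli a3, a3, 1 */
    }
    /* Conversion back to signed wraps on the usual two's-complement targets. */
    return (int32_t)acc;
}

/* Example check using the operands from the listing. */
void check_shift_add_mul(void)
{
    assert(shift_add_mul(-9, 7) == -63);
}
```

The model also shows why a negative multiplicand needs no special case: a negative a3 is simply added in two's complement, so 9 times -7 gives -63 without any sign handling.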
This test is modified from Quiz2 Problem D
//Ancient Egyptian Multiplication
.global main
.data
# Define the data section with two numbers to multiply
num1: .word 13
num2: .word 7
.text
main:
addi sp, sp, -16
sw ra, 12(sp)
sw s0, 8(sp)
addi s0, sp, 16
# Begin the main code in the text section
li x1, 0x00100000 # Load the print address into register x1
lw t0, num1 # Load the first number (num1) into register t0
lw t1, num2 # Load the second number (num2) into register t1
li t2, 0 # Initialize the result (t2) to 0
loop:
# Check if the least significant bit of t0 (num1) is 1 (i.e., if the number is odd)
andi t3, t0, 1
beq t3, x0, skip_add # If the bit is 0 (even), skip the addition
# If the number is odd, add the value in t1 (num2) to the result in t2
add t2, t2, t1 # D01
skip_add:
# Perform a right shift on t0 (num1), effectively dividing it by 2
srli t0, t0, 1 # D02
# Perform a left shift on t1 (num2), effectively multiplying it by 2
slli t1, t1, 1 # D03
# If t0 (num1) is not zero, repeat the loop
bnez t0, loop
# Store the final result in the memory location pointed by x1
sw t2, 0(x1)
li a5, 1
sw a5, 4(x1)
li a5, 0
mv a0, a5
lw ra, 12(sp)
lw s0, 8(sp)
addi sp, sp, 16
ret
The test result is 0x0000005b, which equals 91.
[info] welcome to sbt 1.10.7 (Temurin Java 1.8.0_432)
[info] loading project definition from /home/user/workspace/MyCPU/project
[info] loading settings for project root from build.sbt...
[info] set current project to cpu (in build file:/home/user/workspace/MyCPU/)
0000005b
[info] HexTest:
[info] CPU
[info] - should work through hex
[info] Run completed in 2 seconds, 142 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
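The result can also be checked by hand. Since 13 is 1101 in binary, the loop adds the shifted value of num2 on the first, third, and fourth iterations only:

\[ 13 \times 7 = (2^0 + 2^2 + 2^3) \times 7 = 7 + 28 + 56 = 91 = \mathtt{0x5b} \]

Over the four iterations, t0 takes the values 13, 6, 3, 1 and t1 takes the values 7, 14, 28, 56, so the accumulator t2 steps through 7, 7, 35, and 91 before t0 reaches zero and the loop exits.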
This test is modified from Quiz4 Problem A
// Sum of Squares
.global main
.data
# Define the data section with two numbers
n: .word 50
m: .word 25
.text
main:
addi sp, sp, -4
sw ra, 0(sp)
lw a0, n
lw a1, m
call sum_of_squares
li t0, 0x00100000
li t1, 1
sw a0, 0(t0)
sw t1, 4(t0)
lw ra, 0(sp)
addi sp, sp, 4
ret
sum_of_squares:
# Check if n (a0) is less than or equal to zero
bgt a0, x0, recurse_case # __ A01 __
zero_case:
# If n ≤ 0, return m (a1)
add a0, a1, x0
jalr x0, ra, 0 # __ A02 __
recurse_case:
# Save caller-saved registers on the stack
add t0, a0, x0 # t0 = a0 (copy n)
addi sp, sp, -12 # Allocate stack space __ A03 __
sw a1, 0(sp) # Save a1 (m)
sw t0, 4(sp) # Save t0 (n)
sw ra, 8(sp) # Save return address __ A04 __
# Call the square function
jal ra, square # __ A05 __
# Restore registers and stack
lw a1, 0(sp) # Restore a1 (m)
lw t0, 4(sp) # Restore t0 (n)
lw ra, 8(sp) # Restore return address __ A06 __
addi sp, sp, 12 # Deallocate stack space __ A07 __
# Update m = m + n^2
add a1, a1, a0
# Decrement n: a0 = n - 1
addi a0, t0, -1
# Recursive call to sum_of_squares
addi sp, sp, -4 # Allocate stack space for ra __ A08 __
sw ra, 0(sp) # Save return address
jal ra, sum_of_squares # __ A09 __
lw ra, 0(sp) # Restore return address
addi sp, sp, 4 # Deallocate stack space __ A10 __
# Return from the function
jalr x0, ra, 0 # __ A11 __
# Function: square
# Computes the square of an integer (a0 = n), returns result in a0
square:
addi sp, sp, -8 # Allocate stack space
sw ra, 0(sp) # Save return address __ A13 __
add t0, x0, x0 # t0 = 0 (accumulator for the result)
add t1, a0, x0 # t1 = a0 (copy of n, multiplicand)
add t2, a0, x0 # t2 = a0 (copy of n, multiplier)
square_loop:
andi t3, t2, 1 # Check the lowest bit of t2 (t2 & 1) __ A14 __
beq t3, x0, skip_add # If the bit is 0, skip addition
add t0, t0, t1 # Accumulate: t0 += t1
skip_add:
sll t1, t1, 1 # Left shift t1 (multiply by 2) __ A15 __
srl t2, t2, 1 # Right shift t2 (divide by 2) __ A16 __
bne t2, x0, square_loop # Repeat loop if t2 is not zero __ A17 __
square_end:
add a0, t0, x0 # Move result to a0
lw ra, 0(sp) # Restore return address __ A18 __
addi sp, sp, 8 # Deallocate stack space
jalr x0, ra, 0 # Return from function __ A19 __
The test result is 0x0000a7c6, which equals 42950.
[info] welcome to sbt 1.10.7 (Temurin Java 1.8.0_432)
[info] loading project definition from /home/user/workspace/MyCPU/project
[info] loading settings for project root from build.sbt...
[info] set current project to cpu (in build file:/home/user/workspace/MyCPU/)
0000a7c6
[info] HexTest:
[info] CPU
[info] - should work through hex
[info] Run completed in 8 seconds, 172 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
The following C program produces the same result. It implements the following equation.
\[ sum = m + n^2 + (n - 1)^2 + \ldots + 1^2 \]
#include <stdio.h>

int main() {
    int n = 50, m = 25;
    int sum = 0;
    for (int i = 1; i < n + 1; i++) {
        sum += i * i;
    }
    printf("%d\n", sum + m);
    return 0;
}
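The program above uses a plain loop, whereas the assembly version is recursive and computes each square with shifts and adds. A C sketch that follows the structure of the assembly more closely might look like this. The function names mirror the assembly labels; check_sum_of_squares is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add square, mirroring the `square` routine: the loop body runs
 * at least once and repeats while the shifted multiplier is non-zero. */
uint32_t square(uint32_t n)
{
    uint32_t acc = 0;
    uint32_t multiplicand = n;
    uint32_t multiplier = n;
    do {
        if (multiplier & 1u)
            acc += multiplicand;    /* add t0, t0, t1 */
        multiplicand <<= 1;         /* sll t1, t1, 1 */
        multiplier >>= 1;           /* srl t2, t2, 1 */
    } while (multiplier != 0);      /* bne t2, x0, square_loop */
    return acc;
}

/* Accumulator-style recursion, mirroring `sum_of_squares`: returns m when
 * n <= 0, otherwise recurses with (n - 1, m + n * n). */
int32_t sum_of_squares(int32_t n, int32_t m)
{
    if (n <= 0)
        return m;
    return sum_of_squares(n - 1, m + (int32_t)square((uint32_t)n));
}

/* Example check using the operands from the listing (n = 50, m = 25). */
void check_sum_of_squares(void)
{
    assert(sum_of_squares(50, 25) == 42950);
}
```

The expected value also agrees with the closed form n(n + 1)(2n + 1)/6 + m, which for n = 50 and m = 25 gives 42925 + 25 = 42950.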