
Cache system for NucleusRV

程品叡, 吳睿秉

Introduction

NucleusRV is a Chisel-based, 5-stage pipelined RISC-V CPU that implements the 32-bit version of the ISA. Verilator is used to generate a C++ simulator and an executable, which are then verified with the RISC-V Architecture Test suite. The CPU currently supports a limited set of instructions. In this project, we focus on the memory-related components, i.e., those built on SRAM. We build NucleusRV on Ubuntu 22.04.5 and study the cache system as it has been implemented so far. After that, we attempt to implement a simple direct-mapped cache similar to the existing one and study compulsory misses during instruction fetch.

Development Environment

  • OS: Ubuntu 22.04.5 LTS

Get NucleusRV and RISC-V Architecture Test SIG

1. nucleusrv

main branch: https://github.com/merledu/nucleusrv.git

git clone https://github.com/merledu/nucleusrv.git

or the new_cache branch: https://github.com/merledu/nucleusrv/tree/new_cache

git clone https://github.com/merledu/nucleusrv.git -b new_cache

2. riscv-arch-test

riscv-arch-test should be placed inside the nucleusrv directory.
riscv-arch-test: https://github.com/riscv-non-isa/riscv-arch-test.git

cd nucleusrv
git clone https://github.com/riscv-non-isa/riscv-arch-test.git -b 1.0

Dependencies

1. Install essential packages

sudo apt install git
sudo apt install curl
sudo apt install make

2. Install Java and SBT

Reference: https://www.chisel-lang.org/docs/installation#java-development-kit-jdk

sudo su

apt install -y wget gpg apt-transport-https
wget -qO - https://packages.adoptium.net/artifactory/api/gpg/key/public | gpg --dearmor | tee /etc/apt/trusted.gpg.d/adoptium.gpg > /dev/null
echo "deb https://packages.adoptium.net/artifactory/deb $(awk -F= '/^VERSION_CODENAME/{print$2}' /etc/os-release) main" | tee /etc/apt/sources.list.d/adoptium.list
apt update
apt install temurin-17-jdk

exit

Reference: https://www.chisel-lang.org/docs/installation#sbt

curl -s -L https://github.com/sbt/sbt/releases/download/v1.9.7/sbt-1.9.7.tgz | tar xvz
sudo mv sbt/bin/sbt /usr/local/bin/

3. Verilator

Reference: https://verilator.org/guide/latest/install.html#package-manager-quick-install

sudo apt-get install verilator

After successfully installing, check the version.

verilator --version

The terminal should show something like:

Verilator 4.038 2020-07-11 rev v4.036-114-g0cd4a57ad

NOTE: In our setup, only on Ubuntu 22.04.5 does sudo apt-get install verilator install a working version (Verilator 4.038).

With newer versions of Verilator, the sbt test may fail because they require an additional argument that specifies how timing controls are handled. For example, on Ubuntu 24.04.1 the same command installs Verilator 5.020, with which the sbt testOnly command fails.

Faults During Building

Running Compliance Tests (README.md of nucleusrv)

  • Clone the riscv-arch-test repo in the nucleusrv root: git clone git@github.com:riscv-non-isa/riscv-arch-test.git -b 1.0
  • Build the simulation executable as defined in "Building with SBT" section.
  • Run ./run-compliance.sh in root directory.

When I executed ./run-compliance.sh, I got the following error:

[info] TopTest:
Elaborating design...
Done elaborating.
cd /home/vboxuser/Desktop/nucleusrv/test_run_dir/Top_Test && verilator --cc Top.v --assert -Wno-fatal -Wno-WIDTH -Wno-STMTDLY -O1 --top-module Top +define+TOP_TYPE=VTop +define+PRINTF_COND=!Top.reset +define+STOP_COND=!Top.reset -CFLAGS "-Wno-undefined-bool-conversion -O1 -DTOP_TYPE=VTop -DVL_USER_FINISH -include VTop.h" -Mdir /home/vboxuser/Desktop/nucleusrv/test_run_dir/Top_Test -f /home/vboxuser/Desktop/nucleusrv/test_run_dir/Top_Test/firrtl_black_box_resource_files.f --exe /home/vboxuser/Desktop/nucleusrv/test_run_dir/Top_Test/Top-harness.cpp --trace
%Error-NEEDTIMINGOPT: /home/vboxuser/Desktop/nucleusrv/test_run_dir/Top_Test/sram.v:131:17: Use --timing or --no-timing to specify how timing controls should be handled
                    : ... note: In instance 'Top.imem.sram.memory'
  131 | dout0 <= #(DELAY) mem[addr0_reg];
      |          ^
                      ... For error description see https://verilator.org/warn/NEEDTIMINGOPT?v=5.033
%Error-NEEDTIMINGOPT: /home/vboxuser/Desktop/nucleusrv/test_run_dir/Top_Test/sram.v:139:17: Use --timing or --no-timing to specify how timing controls should be handled
                    : ... note: In instance 'Top.imem.sram.memory'
  139 | dout1 <= #(DELAY) mem[addr1_reg];
      |          ^
%Error: Exiting due to 2 error(s)
[info] - Top Test *** FAILED ***

According to Verilator's Errors and Warnings documentation, the NEEDTIMINGOPT error means that the command does not specify how Verilator should handle timing constructs such as delays (here, the #(DELAY) in sram.v).

Running ./run-compliance.sh invokes Verilator through the command on line 4 of the script, and we could not locate where the arguments passed to Verilator are set. As a workaround, we run the Verilator command manually instead of running ./run-compliance.sh directly, which lets us append --timing or --no-timing at the end of the command.
The command looks like:

cd /home/vboxuser/Desktop/nucleusrv/test_run_dir/Top_Test

verilator --cc Top.v --assert -Wno-fatal -Wno-WIDTH -Wno-STMTDLY -O1 \
--top-module Top +define+TOP_TYPE=VTop +define+PRINTF_COND=\!Top.reset +define+STOP_COND=\!Top.reset \
-CFLAGS "-Wno-undefined-bool-conversion \
-O1 -DTOP_TYPE=VTop -DVL_USER_FINISH -include VTop.h" \
-Mdir /home/vboxuser/Desktop/nucleusrv/test_run_dir/Top_Test \
-f /home/vboxuser/Desktop/nucleusrv/test_run_dir/Top_Test/firrtl_black_box_resource_files.f \
--exe /home/vboxuser/Desktop/nucleusrv/test_run_dir/Top_Test/Top-harness.cpp \
--trace --no-timing
![](https://imgur.com/6myVKuG.png)

4. RISC-V GNU Compiler Toolchain

riscv-gnu-toolchain: https://github.com/riscv-collab/riscv-gnu-toolchain

git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git
sudo apt-get install autoconf automake autotools-dev curl python3 python3-pip python3-tomli libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ninja-build git cmake libglib2.0-dev libslirp-dev

cd riscv-gnu-toolchain/
./configure --prefix=/opt/riscv

echo 'export PATH=$PATH:/opt/riscv/bin' >> ~/.bashrc
source ~/.bashrc

sudo make -j `nproc`

After successfully installing and building, run the following command to check the installation.

riscv64-unknown-elf-gcc -v

The terminal should show something like:

Using built-in specs.
COLLECT_GCC=riscv64-unknown-elf-gcc
COLLECT_LTO_WRAPPER=/opt/riscv/libexec/gcc/riscv64-unknown-elf/14.2.0/lto-wrapper
Target: riscv64-unknown-elf
Configured with: /home/user/Desktop/riscv-gnu-toolchain/gcc/configure --target=riscv64-unknown-elf --prefix=/opt/riscv --disable-shared --disable-threads --enable-languages=c,c++ --with-pkgversion= --with-system-zlib --enable-tls --with-newlib --with-sysroot=/opt/riscv/riscv64-unknown-elf --with-native-system-header-dir=/include --disable-libmudflap --disable-libssp --disable-libquadmath --disable-libgomp --disable-nls --disable-tm-clone-registry --src=.././gcc --disable-multilib --with-abi=lp64d --with-arch=rv64gc --with-tune=rocket --with-isa-spec=20191213 'CFLAGS_FOR_TARGET=-Os    -mcmodel=medlow' 'CXXFLAGS_FOR_TARGET=-Os    -mcmodel=medlow'
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 14.2.0 () 

After building the riscv-gnu-toolchain, the following commands are available. Type riscv64-unknown-elf- in the terminal and press Tab to list them.

user@user:~/Desktop/nucleusrv/tools$ riscv64-unknown-elf-
riscv64-unknown-elf-addr2line      riscv64-unknown-elf-gdb
riscv64-unknown-elf-ar             riscv64-unknown-elf-gdb-add-index
riscv64-unknown-elf-as             riscv64-unknown-elf-gprof
riscv64-unknown-elf-c++            riscv64-unknown-elf-ld
riscv64-unknown-elf-c++filt        riscv64-unknown-elf-ld.bfd
riscv64-unknown-elf-cpp            riscv64-unknown-elf-lto-dump
riscv64-unknown-elf-elfedit        riscv64-unknown-elf-nm
riscv64-unknown-elf-g++            riscv64-unknown-elf-objcopy
riscv64-unknown-elf-gcc            riscv64-unknown-elf-objdump
riscv64-unknown-elf-gcc-14.2.0     riscv64-unknown-elf-ranlib
riscv64-unknown-elf-gcc-ar         riscv64-unknown-elf-readelf
riscv64-unknown-elf-gcc-nm         riscv64-unknown-elf-run
riscv64-unknown-elf-gcc-ranlib     riscv64-unknown-elf-size
riscv64-unknown-elf-gcov           riscv64-unknown-elf-strings
riscv64-unknown-elf-gcov-dump      riscv64-unknown-elf-strip
riscv64-unknown-elf-gcov-tool      

Building with SBT

Top Test

Run the following command in the SBT shell (started by running sbt in the nucleusrv root directory):

testOnly nucleusrv.components.TopTest -- -DwriteVcd=1

The terminal should show something like:

[info] TopTest:
Elaborating design...
Done elaborating.
...
sim start on DESKTOP at Mon Jan 20 17:34:11 2025
inChannelName: 00001368.in
outChannelName: 00001368.out
cmdChannelName: 00001368.cmd
STARTING test_run_dir/Top_Test/VTop
Enabling waves..
Exit Code: 0
[info] - Top Test
[info] Run completed in 3 seconds, 278 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Jan 20, 2025, 5:34:12 PM

If the test passes, a VTop executable is generated in nucleusrv/test_run_dir/Top_Test.

SRAM Test

Additionally, the following command runs the cache SRAM test defined in nucleusrv/src/test/scala/components/SRamTests.scala:

testOnly nucleusrv.components.CacheSRAMTests

Running Compliance Tests

In the root directory, run ./run-compliance.sh. If the VTop executable already exists, the script starts the RISC-V architecture tests. However, a few modifications are needed before running it:

Modify Makefile

In older versions of the ISA specification, the CSR instructions were part of the base integer instruction set. In newer versions (such as the 20191213 specification this toolchain is configured with, see --with-isa-spec in the gcc -v output above), they were moved into the separate Zicsr extension. Therefore, for the toolchain to recognize these instructions, the extension must be enabled explicitly in the Makefile.

Running the RISC-V architecture test suite for rv32i requires appending
-march=rv32imac_zicsr to RISCV_GCC_OPTS in
nucleusrv/riscv-target/nucleusrv/device/rv32i/Makefile.include, as in the diff below. This is not needed when running rv32im or rv32imc.

#sbt "testOnly nucleusrv.components.TopTest -- -DmemFile=tools/out/program.hex -DwriteVcd=1 -DsignatureFile=test.sig"
TARGET_SIM ?= VTop
ifeq ($(shell command -v $(TARGET_SIM) 2> /dev/null),)
$(error Target simulator executable '$(TARGET_SIM)` not found)
endif

RUN_TARGET=\
cd $(NUCLEUSRV) && sbt "testOnly nucleusrv.components.TopTest -- -DprogramFile=$(<).program.hex -DwriteVcd=1 -DdataFile=$(<).data.hex" \
> $(*).stdout; \
`grep '^[a-f0-9]\+$$' $(*).stdout > $(*).signature.output`;

RISCV_PREFIX ?= riscv64-unknown-elf-
RISCV_GCC ?= $(RISCV_PREFIX)gcc
RISCV_OBJCOPY ?= $(RISCV_PREFIX)objcopy
RISCV_OBJDUMP ?= $(RISCV_PREFIX)objdump
RISCV_ELF2HEX ?= $(RISCV_PREFIX)elf2hex
-RISCV_GCC_OPTS ?= -static -mcmodel=medany -fvisibility=hidden -nostdlib -nostartfiles
+RISCV_GCC_OPTS ?= -static -mcmodel=medany -fvisibility=hidden -nostdlib -nostartfiles -march=rv32imac_zicsr
SBT ?= sbt

COMPILE_TARGET=\
...

Modify ./run-compliance.sh to Run Test

In ./run-compliance.sh, the variables $ISA and $TEST select the test case in
nucleusrv/riscv-arch-test/riscv-test-suite/[ISA]/src/[TEST] to execute.

Note that if ISA is rv32i, the Makefile must be modified as described in the Modify Makefile section above.

To run all tests, set TEST to $ALL.
For example, the following settings run only the C-ADDI16SP test of rv32imc:

ISA=rv32imc
TEST=C-ADDI16SP

After ISA and TEST are set, the two make commands in the script run the tests and produce output like the following:

Compare to reference files ... 

Check               C-ADDI16SP ... OK
Check               C-ADDI4SPN ... IGNORE
Check                   C-ADDI ... IGNORE
Check                    C-ADD ... IGNORE
Check                   C-ANDI ... IGNORE
Check                    C-AND ... IGNORE
Check                   C-BEQZ ... IGNORE
Check                   C-BNEZ ... IGNORE
Check                    C-JAL ... IGNORE
Check                   C-JALR ... IGNORE
Check                      C-J ... IGNORE
Check                     C-JR ... IGNORE
Check                     C-LI ... IGNORE
Check                    C-LUI ... IGNORE
Check                     C-LW ... IGNORE
Check                   C-LWSP ... IGNORE
Check                     C-MV ... IGNORE
Check                     C-OR ... IGNORE
Check                   C-SLLI ... IGNORE
Check                   C-SRAI ... IGNORE
Check                   C-SRLI ... IGNORE
Check                    C-SUB ... IGNORE
Check                     C-SW ... IGNORE
Check                   C-SWSP ... IGNORE
Check                    C-XOR ... IGNORE
--------------------------------
OK: 25/25 RISCV_TARGET=nucleusrv RISCV_DEVICE=rv32i RISCV_ISA=rv32imc

Each test compares two files and passes if they are identical:

  • Reference answer: riscv-arch-test/riscv-test-suite/[ISA]/reference
  • Test output: riscv-arch-test/work/[ISA]/[TEST].output

Building C Programs

Modify the Makefile

Before building a C program, note that a toolchain configured with
./configure --prefix=/opt/riscv provides the riscv64-unknown-elf- tools by default. Therefore, change line 1 of the project's makefile in nucleusrv/tools/makefile to RISCV=riscv64-unknown-elf- so that it does not call riscv32-unknown-elf-, which is not installed.

-RISCV=riscv32-unknown-elf-
+RISCV=riscv64-unknown-elf-
CC=$(RISCV)gcc
OBJDUMP=$(RISCV)objdump
OBJCOPY=$(RISCV)objcopy
CFLAGS=-c -march=rv32i -mabi=ilp32 -ffreestanding -fomit-frame-pointer
OFLAGS=--disassemble-all --section=.text
LFLAGS = -march=rv32im -mabi=ilp32 -static -nostdlib -nostartfiles -T link.ld
PROGRAM ?= fibonacci
...

Make Program

Navigate to nucleusrv/tools and run the following command to build a program located in nucleusrv/tools/tests/<FOLDER_NAME>:
make PROGRAM=<FOLDER_NAME>

For example, if your program folder in nucleusrv/tools/tests is named hello_world, use:

make PROGRAM=hello_world

The terminal should show something like:

rm -rf out
riscv64-unknown-elf-gcc -c -march=rv32i -mabi=ilp32 -ffreestanding -fomit-frame-pointer   -c -o tests/hello_world/hello.o tests/hello_world/hello.c
riscv64-unknown-elf-gcc -c -march=rv32i -mabi=ilp32 -ffreestanding -fomit-frame-pointer   -c -o tests/hello_world/main.o tests/hello_world/main.c
riscv64-unknown-elf-gcc -c -march=rv32i -mabi=ilp32 -ffreestanding -fomit-frame-pointer   -c -o tests/hello_world/world.o tests/hello_world/world.c
riscv64-unknown-elf-gcc -march=rv32im -mabi=ilp32 -static -nostdlib -nostartfiles -T link.ld tests/hello_world/hello.o tests/hello_world/main.o tests/hello_world/world.o -o out/program.elf -lgcc
riscv64-unknown-elf-objdump --disassemble-all --section=.text out/program.elf > out/program.dump
python3 makehex.py out/program.elf 2048 > out/program.hex

The disassembled RISC-V instructions are written to nucleusrv/tools/out/program.dump, and the hex image to nucleusrv/tools/out/program.hex.


Understanding the Cache in new_cache Branch

Note that in this project, class names and file names do not always match, and their capitalization may be inconsistent. Names below without a file extension refer to class names.

Difference Between Main and new_cache Branch

The main difference between the main branch and the new_cache branch is that, in new_cache, the SRAM is split into an Instruction SRAM and a Data SRAM. The new_cache branch contains four memory modules: SRamTop, new_SRamTop, Instruction_SRamTop, and Data_SRamTop. They differ only in class or variable names; their contents are essentially identical.

More specifically, SRamTop is not used at all, and new_SRamTop is used only by the CacheSRAMTests class in the test program SRamTests.scala. After changing that test to instantiate Instruction_SRamTop or Data_SRamTop instead, as shown below, new_SRamTop is no longer used either.

...
class CacheSRAMTests extends FreeSpec with ChiselScalatestTester {
    "New SRAM Test" in {
-        test(new         new_SRamTop(None)).withAnnotations(Seq(VerilatorBackendAnnotation)) { c =>
+        test(new Instruction_SRamTop(None)).withAnnotations(Seq(VerilatorBackendAnnotation)) { c => 
            // Write data to a specific address
            c.io.req.valid.poke(true.B)
...

The CacheSRAMTests test is also new in the new_cache branch; it is not enabled in the main branch.

Program for Memory Components

The cache (or memory) is the storage closest to the CPU, from which it fetches instructions and accesses data during execution. In this project's design, an instruction cache and a data cache are instantiated in the Top module, and the Core uses them in the IF and MEM stages, respectively. The implementations of these caches (Instruction_SRamTop and Data_SRamTop) instantiate the Verilog files (*.v) in src/main/resources from Chisel through the BlackBox mechanism.
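Since the goal of this project is a direct-mapped cache for instruction fetch and a study of compulsory misses, a small software model helps to reason about hit and miss counts before touching the Chisel code. The sketch below is our own, not code from the new_cache branch: the class name, the line size, and the tag-only design (no data is stored) are hypothetical. A miss is counted as compulsory when it is the first reference to that memory block.

```scala
import scala.collection.mutable

// Hypothetical direct-mapped cache model: tracks valid bits and tags only,
// which is enough to count hits, misses, and compulsory misses for a trace.
final class DirectMappedCache(numLines: Int, lineBytes: Int) {
  require(Integer.bitCount(numLines) == 1, "numLines must be a power of two")
  require(Integer.bitCount(lineBytes) == 1, "lineBytes must be a power of two")

  private val offsetBits = Integer.numberOfTrailingZeros(lineBytes) // byte offset in a line
  private val indexBits = Integer.numberOfTrailingZeros(numLines)   // selects one line
  private val valid = Array.fill(numLines)(false)
  private val tags = Array.fill(numLines)(0)
  private val seenBlocks = mutable.Set.empty[Int] // every block referenced so far

  var hits = 0
  var misses = 0
  var compulsoryMisses = 0

  /** Looks up a byte address; returns true on a hit. A miss allocates the line. */
  def access(addr: Int): Boolean = {
    val index = (addr >>> offsetBits) & (numLines - 1)
    val tag = addr >>> (offsetBits + indexBits)
    val hit = valid(index) && tags(index) == tag
    if (hit) hits += 1
    else {
      misses += 1
      // Set.add returns true only if the block was never referenced before.
      if (seenBlocks.add(addr >>> offsetBits)) compulsoryMisses += 1
      valid(index) = true
      tags(index) = tag
    }
    hit
  }
}
```

For example, fetching a 16-instruction loop (PC 0 to 60) twice through a cache of 4 lines of 16 bytes gives only 4 misses, all compulsory, because the whole loop fits in the cache; the second pass hits every time.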

Moreover, we explain the code in more detail in code_explain.

Here, we briefly introduce the code in src/main/scala/components of the new_cache branch.

|-src/main/scala/components
  |-ALU.scala
  |-AluControl.scala
  |-BranchUnit.scala
  |-CompressedDecoder.scala
  |-Configs.scala
  |-Constants.scala
  |-Control.scala
  |-Core.scala
  |-Data_SRamTOP.scala
  |-Execute.scala
  |-ForwardingUnit.scala
  |-HazardUnit.scala
  |-ImmediateGen.scala
  |-InstructionDecode.scala
  |-InstructionFetch.scala
  |-Instruction_SRamTOP.scala
  |-JumpUnit.scala
  |-MDU.scala
  |-Main.scala
  |-MemIO.scala
  |-MemoryFetch.scala
  |-PC.scala
  |-Realigner.scala
  |-Registers.scala
  |-Top.scala
  |-SRamTop.scala
  |-new_SRamTOP.scala

ALU.scala

Implements the basic ALU (Arithmetic Logic Unit) operations, such as AND, OR, ADD, SUB, and SLT.

AluControl.scala

This code defines a module named AluControl using Chisel, which implements the control logic for an ALU (Arithmetic Logic Unit). It generates specific ALU control signals based on the input control signals (aluOp, f7, f3, aluSrc). The control signals determine the operation mode of the ALU.

The table below maps aluOp, f3, f7, and aluSrc to the output io.out:

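| aluOp | f3  | f7  | aluSrc | Operation Type                | io.out |
|-------|-----|-----|--------|-------------------------------|--------|
| 0.U   | Any | Any | Any    | Add (ADD)                     | 2.U    |
| 2.U   | 0.U | 0.U | false  | Add (ADD)                     | 2.U    |
| 2.U   | 0.U | 1.U | true   | Subtract (SUB)                | 3.U    |
| 2.U   | 1.U | Any | Any    | Shift Left (SLL)              | 6.U    |
| 2.U   | 2.U | Any | Any    | Set Less Than (SLT)           | 4.U    |
| 2.U   | 3.U | Any | Any    | Set Less Than Unsigned (SLTU) | 5.U    |
| 2.U   | 5.U | 0.U | Any    | Logical Right Shift (SRL)     | 7.U    |
| 2.U   | 5.U | 1.U | Any    | Arithmetic Right Shift (SRA)  | 8.U    |
| 2.U   | 7.U | Any | Any    | Logical AND (AND)             | 0.U    |
| 2.U   | 6.U | Any | Any    | Logical OR (OR)               | 1.U    |
| 2.U   | 4.U | Any | Any    | Logical XOR (XOR)             | 9.U    |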
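To see what each io.out code means at the bit level, here is a plain-Scala reference model of the ALU. It is our own sketch, not ALU.scala: it assumes the ALU decodes the same 4-bit codes listed in the io.out column above. Operands are 32-bit values held in Scala Ints, so ADD and SUB wrap modulo 2^32 just like the hardware.

```scala
// Hypothetical software model of a 32-bit RV32 ALU, keyed by the io.out
// codes from the AluControl table (assumed to match what ALU.scala decodes).
object AluModel {
  def exec(code: Int, a: Int, b: Int): Int = {
    val shamt = b & 0x1F // RV32 shifts use only the low 5 bits of operand b
    code match {
      case 0 => a & b                                        // AND
      case 1 => a | b                                        // OR
      case 2 => a + b                                        // ADD (wraps mod 2^32)
      case 3 => a - b                                        // SUB
      case 4 => if (a < b) 1 else 0                          // SLT: signed compare
      case 5 => if (Integer.compareUnsigned(a, b) < 0) 1 else 0 // SLTU: unsigned compare
      case 6 => a << shamt                                   // SLL
      case 7 => a >>> shamt                                  // SRL: shifts in zeros
      case 8 => a >> shamt                                   // SRA: shifts in the sign bit
      case 9 => a ^ b                                        // XOR
      case other => throw new IllegalArgumentException(s"unknown ALU code $other")
    }
  }
}
```

Note how SLT and SLTU differ only in how the same 32 bits are interpreted: -1 is less than 1 as a signed value but greater as an unsigned one.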

BranchUnit.scala

This code implements a RISC-V processor's branch unit that determines whether a branch should be taken based on branch conditions (funct3), operands, and control signals.

switch(io.funct3) {
    is(0.U) { check := (io.rd1 === io.rd2) } // beq
    is(1.U) { check := (io.rd1 =/= io.rd2) } // bne
    is(4.U) { check := (io.rd1.asSInt < io.rd2.asSInt) } // blt
    is(5.U) { check := (io.rd1.asSInt >= io.rd2.asSInt) } // bge
    is(6.U) { check := (io.rd1 < io.rd2) } // bltu
    is(7.U) { check := (io.rd1 >= io.rd2) } // bgeu
  }
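The snippet above is Chisel hardware. The plain-Scala model below (object name hypothetical) mirrors the same decision so it can be tested in software, and it makes the signed/unsigned distinction explicit: the .asSInt comparisons correspond to ordinary Int comparisons, while the plain UInt comparisons correspond to Integer.compareUnsigned.

```scala
// Software model of the branch decision, keyed by funct3 as in BranchUnit.scala.
object BranchModel {
  def taken(funct3: Int, rd1: Int, rd2: Int): Boolean = funct3 match {
    case 0 => rd1 == rd2                              // beq
    case 1 => rd1 != rd2                              // bne
    case 4 => rd1 < rd2                               // blt (signed)
    case 5 => rd1 >= rd2                              // bge (signed)
    case 6 => Integer.compareUnsigned(rd1, rd2) < 0   // bltu
    case 7 => Integer.compareUnsigned(rd1, rd2) >= 0  // bgeu
    case _ => false                                   // funct3 2 and 3 are not branches
  }
}
```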

CompressedDecoder.scala

This module decodes 16-bit RISC-V compressed instructions into their equivalent 32-bit standard instructions.

The table below summarizes how the instructions in quadrants C0 to C3 are decoded:

In the Opcode column, the first two digits are the quadrant (bits [1:0]) and the b-prefixed digits are the function bits at the top of the instruction (bits [15:14] for C0 and C2, bits [15:13] for C1).

| Quadrant | Opcode     | Instruction Name       | Description                                        | Decoding/Explanation                        |
|----------|------------|------------------------|----------------------------------------------------|---------------------------------------------|
| C0       | 00 + b00   | c.addi4spn             | Add a scaled immediate to the stack pointer (x2)   | addi rd', x2, imm                           |
| C0       | 00 + b01   | c.lw                   | Load data into register                            | lw rd', imm(rs1')                           |
| C0       | 00 + b11   | c.sw                   | Store data to memory                               | sw rs2', imm(rs1')                          |
| C0       | 00 + b10   | Illegal instruction    | Unrecognized instruction, returned directly        | -                                           |
| C1       | 01 + b000  | c.addi / c.nop         | Add immediate or no operation                      | addi rd, rd, imm or nop                     |
| C1       | 01 + b001  | c.jal                  | Unconditional jump, return address saved to x1     | jal x1, imm                                 |
| C1       | 01 + b101  | c.j                    | Unconditional jump                                 | jal x0, imm                                 |
| C1       | 01 + b010  | c.li                   | Load immediate into register                       | addi rd, x0, imm                            |
| C1       | 01 + b011  | c.lui / c.addi16sp     | Load upper immediate or adjust the stack pointer   | lui rd, imm or addi x2, x2, imm             |
| C1       | 01 + b100  | Logical/Arithmetic     | Shift, logical operation, or subtraction           | srli, srai, andi, sub, etc.                 |
| C1       | 01 + b110  | c.beqz                 | Branch if rs1' is zero                             | beq rs1', x0, imm                           |
| C1       | 01 + b111  | c.bnez                 | Branch if rs1' is not zero                         | bne rs1', x0, imm                           |
| C2       | 10 + b00   | c.slli                 | Logical left shift                                 | slli rd, rd, shamt                          |
| C2       | 10 + b01   | c.lwsp                 | Load data from the stack with an offset            | lw rd, imm(x2)                              |
| C2       | 10 + b10   | c.mv / c.add / c.jr, etc. | Data move, addition, or jump                    | Depends on whether the register fields are zero |
| C2       | 10 + b11   | c.swsp                 | Store data to the stack                            | sw rs2, imm(x2)                             |
| C3       | 11         | Not compressed         | Bits [1:0] = 11 mark a 32-bit instruction, returned directly | -                                 |
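The quadrant/function-bit layout above can be checked with a small classifier. This is a hypothetical sketch that follows the standard RV32C encoding (quadrant in bits [1:0], funct3 in bits [15:13]); it only names the instruction group and does not perform the 16-to-32-bit expansion that CompressedDecoder.scala does.

```scala
// Hypothetical RV32C classifier: maps a 16-bit halfword to its instruction group.
object RvcClassifier {
  def classify(inst: Int): String = {
    val half = inst & 0xFFFF
    val quadrant = half & 0x3        // bits [1:0]
    val funct3 = (half >>> 13) & 0x7 // bits [15:13]
    if (half == 0) "illegal"         // the all-zero halfword is defined as illegal
    else (quadrant, funct3) match {
      case (0, 0) => "c.addi4spn"
      case (0, 2) => "c.lw"
      case (0, 6) => "c.sw"
      case (1, 0) => "c.addi/c.nop"
      case (1, 1) => "c.jal"         // RV32 only (RV64 uses this slot for c.addiw)
      case (1, 2) => "c.li"
      case (1, 3) => "c.lui/c.addi16sp"
      case (1, 4) => "c.srli/c.srai/c.andi/c.sub/c.xor/c.or/c.and"
      case (1, 5) => "c.j"
      case (1, 6) => "c.beqz"
      case (1, 7) => "c.bnez"
      case (2, 0) => "c.slli"
      case (2, 2) => "c.lwsp"
      case (2, 4) => "c.jr/c.mv/c.ebreak/c.jalr/c.add"
      case (2, 6) => "c.swsp"
      case (3, _) => "32-bit instruction"
      case _      => "illegal/unsupported" // e.g. floating-point loads/stores
    }
  }
}
```

For example, 0x4505 (c.li a0, 1) has quadrant 01 and funct3 010, and 0x8082 (c.jr ra, i.e. ret) has quadrant 10 and funct3 100.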

Configs.scala

This code defines a case class named Configs, which is used to store and initialize the basic configuration parameters for a RISC-V core, such as the bit width (XLEN), whether the M and C instruction sets are enabled (M and C), and whether TRACE messages are enabled (TRACE).

Constants.scala

This code defines an object called ALUOps that contains various operation codes (opcodes) for an Arithmetic Logic Unit (ALU).

| ALUop | Opcode |
|-------|--------|
| ADD   | 2      |
| SUB   | 6      |
| AND   | 0      |
| OR    | 1      |
| XOR   | 9      |
| SLL   | 3      |
| SRL   | 4      |
| SRA   | 5      |
| SLT   | 9      |
| SLTU  | 10     |
| COPY  | 11     |

Control.scala

This code implements a control unit based on instruction decoding, which generates the corresponding control signals by matching instruction bit patterns. It includes signals for ALU source selection (aluSrc), memory operation control (memToReg, memRead, memWrite), register write control (regWrite), branch decision (branch), jump instructions (jump), and ALU operation selection (aluOp, aluSrc1). These control signals guide the operation of the RISC-V processor based on the input instructions.

Core.scala

This code describes the core of the RISC-V processor as a five-stage pipeline (IF, ID, EX, MEM, WB). It defines pipeline registers that carry data, control signals, and computation results between stages. Each stage handles one part of the instruction flow: instruction fetch, decode, execution, memory access, and write-back. By overlapping the processing of consecutive instructions across these stages, the pipeline improves throughput. Additionally, the core issues memory access requests and outputs RVFI (RISC-V Formal Interface) data for simulation and verification.

Data_SRamTOP.scala

This code defines an SRAM memory controller that handles read and write requests from an external requester (such as the processor core). It accepts Decoupled requests, determines whether each one is a read or a write, and drives the SRAM's chip-select, read/write control, address, and data inputs accordingly. A status register (validReg) tracks whether the response is valid, and the read data (or invalid data) is returned on the response port.
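The request/response timing can be illustrated with a cycle-level software model. This is a hypothetical sketch, not Data_SRamTop itself, and it rests on several assumptions: reads return data one cycle after the request (consistent with the registered address addr0_reg seen in the sram.v error log, but not verified), requests carry byte addresses that are turned into word indices, bit i of a 4-bit write mask enables byte lane i, and writes produce no response data.

```scala
// Hypothetical cycle-level model of a Decoupled SRAM wrapper with 1-cycle read latency.
final case class MemRequest(addr: Int, wdata: Int, wmask: Int, isWrite: Boolean)

final class SramWrapperModel(depthWords: Int) {
  private val mem = Array.fill(depthWords)(0)
  // Plays the role of validReg plus a read-data register:
  // Some(data) means the response port is valid in the next cycle.
  private var pending: Option[Int] = None

  /** Advances one clock cycle. `req` is Some(...) when a request fires this cycle.
    * Returns the response that is valid during this cycle (from the previous request). */
  def clock(req: Option[MemRequest]): Option[Int] = {
    val resp = pending
    pending = req.flatMap { r =>
      val idx = (r.addr >>> 2) % depthWords // byte address -> word index
      if (r.isWrite) {
        var word = mem(idx)
        for (lane <- 0 until 4 if ((r.wmask >> lane) & 1) == 1) {
          val laneMask = 0xFF << (8 * lane)  // update only the enabled byte lanes
          word = (word & ~laneMask) | (r.wdata & laneMask)
        }
        mem(idx) = word
        None                                 // writes return no data in this model
      } else Some(mem(idx))
    }
    resp
  }
}
```

A read issued in cycle n is therefore visible in cycle n + 1, which is why the requester must wait for the response to become valid instead of using the data in the same cycle.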

Execute.scala

This code implements the execution unit that handles the execute stage of the RISC-V processor pipeline. It performs arithmetic and logical operations (such as addition and subtraction) based on the control signals (func3, func7). It uses a forwarding unit to resolve data hazards and optionally supports multiplication and division. If multiplication/division is enabled (M is set to true), it performs these operations based on the control signals and manages the resulting pipeline stalls. Its outputs include the computation result (ALUresult) and the data to be written to memory (writeData).

ForwardingUnit.scala

The Forwarding Unit resolves data hazards in the processor pipeline. It determines the appropriate data source for execution by checking whether the source registers (reg_rs1, reg_rs2) match the destination register of the execution unit (ex_reg_rd) or of the memory unit (mem_reg_rd), and then selects the correct forwarding path.

  • io.forwardA and io.forwardB are output signals that indicate which data source should be forwarded, either from the execution unit (EX) or the memory unit (MEM).
  • io.reg_rs1 and io.reg_rs2 are the source registers used in branch operations.
  • io.ex_reg_rd represents the destination register of the execution unit, and io.mem_reg_rd represents the destination register of the memory unit.
  • io.ex_regWrite and io.mem_regWrite indicate whether the execution unit and memory unit are writing back to their destination registers.

Its role is to provide the execution unit (ALU) with the correct forwarding path, resolving data hazards in which an operand has already been computed by an earlier instruction but not yet written back to the register file.
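The selection logic can be sketched in plain Scala as follows. This follows the usual textbook priority (the newer result in EX wins over the older one in MEM); the numeric select encoding and the check that the destination is not x0 are our assumptions, not taken from ForwardingUnit.scala.

```scala
// Hypothetical forwarding-select model; ForwardingUnit.scala may encode these differently.
object ForwardingModel {
  val FromRegFile = 0 // no hazard: use the value read from the register file
  val FromEx = 1      // forward the result of the instruction one stage ahead
  val FromMem = 2     // forward the result of the instruction two stages ahead

  def select(rs: Int, exRd: Int, exRegWrite: Boolean, memRd: Int, memRegWrite: Boolean): Int =
    if (exRegWrite && exRd != 0 && exRd == rs) FromEx            // newest value has priority
    else if (memRegWrite && memRd != 0 && memRd == rs) FromMem
    else FromRegFile                                              // x0 is never forwarded

  /** Returns (forwardA, forwardB) for the two source registers. */
  def forward(rs1: Int, rs2: Int, exRd: Int, exRegWrite: Boolean,
              memRd: Int, memRegWrite: Boolean): (Int, Int) =
    (select(rs1, exRd, exRegWrite, memRd, memRegWrite),
     select(rs2, exRd, exRegWrite, memRd, memRegWrite))
}
```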

HazardUnit.scala

This code is responsible for detecting data hazards and controlling the flow of the pipeline in a processor. The Hazard Unit checks if the current instruction (mainly memory access and branch instructions) causes pipeline hazards and adjusts the control signals accordingly.

Code Logic:
  1. Load-use hazard: This part checks for a data hazard in which the instruction in the ID/EX stage is a memory read (load) and its destination register (id_ex_rd) matches a source register (id_rs1 or id_rs2) of the instruction in the ID stage.

    • If this condition is true, it deasserts io.ctl_mux, io.pc_write, and io.if_reg_write, which stalls the PC and the IF/ID register and turns the instruction in ID into a bubble, so the dependent instruction does not use stale data.
  2. Branch hazard: This part detects a branch hazard. If the taken signal is true or jump is nonzero, the IF/ID stage must be cleared to avoid executing the wrong instruction.

    • It sets io.ifid_flush to true.B, which signals that the pipeline stage must be flushed.
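The two checks above can be summarized in a small plain-Scala model. The object and field names are hypothetical, and ignoring a destination of x0 is our assumption (writes to x0 are discarded, so they create no real dependency).

```scala
// Hypothetical model of the Hazard Unit's two decisions.
object HazardModel {
  final case class Decision(stall: Boolean, flushIfId: Boolean)

  def detect(idExMemRead: Boolean, idExRd: Int, idRs1: Int, idRs2: Int,
             taken: Boolean, jump: Int): Decision = {
    // Load-use: the load in EX has no data until after MEM, so forwarding is too late.
    val loadUse = idExMemRead && idExRd != 0 && (idExRd == idRs1 || idExRd == idRs2)
    // Control hazard: a taken branch or a jump invalidates the instruction in IF/ID.
    Decision(stall = loadUse, flushIfId = taken || jump != 0)
  }
}
```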

ImmediateGen.scala

This code extracts and extends appropriate immediate values from a given 32-bit instruction based on its instruction type (I-type, U-type, S-type, SB-type, UJ-type). The code determines the immediate type using the OPCODE part of the instruction and then combines the relevant bits using the Cat function. The extended immediate value is output to io.out, which will be used in subsequent arithmetic operations, address calculations, or branch decisions during instruction execution, playing a crucial role in the execution phase.
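The same bit shuffling can be written as a plain-Scala sketch that follows the immediate layouts of the RISC-V base ISA (I, S, SB/B, U, UJ/J). The object and helper names are hypothetical; ImmediateGen.scala does this in hardware with Cat and sign extension.

```scala
// Hypothetical software model of immediate extraction for RV32I.
object ImmGenModel {
  // Extracts inst[hi:lo] as an unsigned value (fields narrower than 32 bits only).
  private def bits(inst: Int, hi: Int, lo: Int): Int =
    (inst >>> lo) & ((1 << (hi - lo + 1)) - 1)

  // Sign-extends the low `width` bits of v to 32 bits.
  private def signExtend(v: Int, width: Int): Int =
    (v << (32 - width)) >> (32 - width)

  def iType(inst: Int): Int = inst >> 20 // imm[11:0] = inst[31:20]; >> sign-extends

  def sType(inst: Int): Int =
    signExtend((bits(inst, 31, 25) << 5) | bits(inst, 11, 7), 12)

  def bType(inst: Int): Int = signExtend(
    (bits(inst, 31, 31) << 12) | (bits(inst, 7, 7) << 11) |
      (bits(inst, 30, 25) << 5) | (bits(inst, 11, 8) << 1), 13)

  def uType(inst: Int): Int = inst & 0xFFFFF000 // imm[31:12], low 12 bits are zero

  def jType(inst: Int): Int = signExtend(
    (bits(inst, 31, 31) << 20) | (bits(inst, 19, 12) << 12) |
      (bits(inst, 20, 20) << 11) | (bits(inst, 30, 21) << 1), 21)
}
```

For example, the branch word 0xFE000EE3 (beq x0, x0, -4) scatters the offset over bits 31, 30:25, 11:8 and 7, and bType reassembles it as -4.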

InstructionDecode.scala

This code handles control logic, pipeline hazards, immediate generation, register reads and writes, and branch/jump target calculation, all of which are needed to decode an instruction and pass it on to the next stage.
It primarily performs the following functions:

  • Hazard Detection:
    • Detects data hazards and adjusts the pipeline accordingly based on control signals and memory responses.
  • Control Unit:
    • Decodes the instruction and generates control signals for operations, such as ALU source selection, memory access control, etc.
  • Register File:
    • Reads from and writes to the register file, and performs data forwarding.
  • Immediate Generation:
    • Extracts the immediate value from the instruction for use in calculations.
  • Branch Unit:
    • Handles branch decisions and checks if a branch should be taken.
  • Offset Calculation:
    • Calculates the new program counter (PC) address after a jump or branch.
  • Instruction Flush:
    • Flushes the fetched instruction when a control hazard occurs (a taken branch or a jump).
  • RVFI:
    • If tracing is enabled, outputs register information involved in the instruction operation.

InstructionFetch.scala

This code implements an instruction fetch module that retrieves a 32-bit instruction from memory at a specified address. It interacts with the memory using a handshake protocol, sending read requests and receiving instruction responses. The module supports reset and stall mechanisms to ensure request operations are paused when the pipeline is stalled or in a reset state.

Instruction_SRamTOP.scala

This code defines an SRAM controller module named Instruction_SRamTop designed specifically for instruction access, implemented using Chisel. It utilizes the BlackBox module instructioncache_sramTop to simulate SRAM behavior. The functionality and logic are similar to Data_SRamTop: it handles SRAM signals like address, write mask, and others based on external read or write requests, enabling instruction read or write operations. The module optionally supports loading an initialization file into memory and sends the processed results back to the response port, making it suitable for managing processor instruction access.

JumpUnit.scala

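The JumpUnit maps the 7-bit opcode of the instruction to a jump type:

| Condition (func7 input, i.e., the 7-bit opcode) | Output (jump) | Description                                              |
|-------------------------------------------------|---------------|----------------------------------------------------------|
| 1101111 (JAL)                                   | 2             | JAL instruction, unconditional jump                      |
| 1100111 (JALR)                                  | 3             | JALR instruction, register-based unconditional jump      |
| Other values                                    | 0             | Not a jump instruction, no jump occurs                   |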

MDU.scala

This code implements a multi-functional integer multiplication and division unit (MDU), capable of performing multiplication, division, and modulus operations.

  • Inputs: src_a, src_b (32-bit operands), op (operation type), valid.
  • Outputs: ready (whether the unit is ready), output.bits (operation result).

Key Logic:

  • Multiplication: Performs different multiplication operations depending on op.
  • Division: Implements division with state registers and outputs the quotient or remainder based on op.
  • Output: Based on op, selects whether to output the multiplication result, the division quotient, or the remainder.
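A reference model is useful for checking MDU results, especially the corner cases. The sketch below follows the RV32M specification semantics, including division by zero (quotient all ones, remainder equal to the dividend) and signed overflow (MinValue / -1). Keying the operations by the RV32M funct3 value is our assumption; the op encoding inside MDU.scala may differ.

```scala
// Hypothetical RV32M reference model, keyed by the RV32M funct3 encoding.
object MduModel {
  private val Mask32 = 0xFFFFFFFFL

  def exec(funct3: Int, a: Int, b: Int): Int = {
    val ua = a.toLong & Mask32 // a interpreted as an unsigned 32-bit value
    val ub = b.toLong & Mask32 // b interpreted as an unsigned 32-bit value
    funct3 match {
      case 0 => a * b                                   // MUL: low 32 bits
      case 1 => ((a.toLong * b.toLong) >> 32).toInt     // MULH: signed x signed
      case 2 => ((a.toLong * ub) >> 32).toInt           // MULHSU: signed x unsigned
      case 3 => ((ua * ub) >>> 32).toInt                // MULHU: Long wraps, low 64 bits exact
      case 4 => if (b == 0) -1 else a / b               // DIV: MinValue / -1 stays MinValue on the JVM
      case 5 => if (b == 0) -1 else Integer.divideUnsigned(a, b)     // DIVU
      case 6 => if (b == 0) a else a % b                // REM: MinValue % -1 == 0
      case 7 => if (b == 0) a else Integer.remainderUnsigned(a, b)  // REMU
      case other => throw new IllegalArgumentException(s"unsupported funct3 $other")
    }
  }
}
```

Note that the JVM truncates signed division toward zero, which matches RISC-V, so -7 / 2 gives -3 with remainder -1.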

Main.scala

Nothing here

MemIO.scala

These MemRequestIO and MemResponseIO classes define the request and response interfaces for memory operations, facilitating communication between hardware modules such as CPU cores and memory or peripherals.

MemoryFetch.scala

This code implements a Memory Fetch module with the following functionality:

  1. Write Operations:

    • Supports word, halfword, and byte-level data storage.
    • Activates specific byte lanes based on funct3 and configures valid data bits.
  2. Read Operations:

    • Supports signed and zero-extended data loading.
    • Selects the correct data byte or halfword based on funct3 and address offset.
  3. Address and Data Formatting:

    • Aligns memory addresses as required and packages them into read/write requests.
    • Processes returned data and formats it for proper output.
  4. Request Management:

    • Controls the validity of memory requests and sets a stall signal to ensure operational synchronization based on response readiness.
  5. Debugging Support:

    • Outputs data during specific write operations for debugging purposes.
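The byte-lane handling in points 1 and 2 can be sketched in plain Scala. This is our own model, not MemoryFetch.scala: it assumes funct3 follows the RV32I load/store encoding (SB/SH/SW = 0/1/2, LB/LH/LW/LBU/LHU = 0/1/2/4/5), that bit i of the mask enables byte lane i of the little-endian word, and that SH and SW are naturally aligned.

```scala
// Hypothetical model of store lane selection and load extraction/extension.
object MemFormatModel {
  /** Store: returns (byte-lane mask, write data shifted into its lanes). */
  def store(funct3: Int, addr: Int, data: Int): (Int, Int) = {
    val offset = addr & 0x3 // byte offset within the word
    funct3 match {
      case 0 => (0x1 << offset, (data & 0xFF) << (8 * offset))   // SB
      case 1 => (0x3 << offset, (data & 0xFFFF) << (8 * offset)) // SH, offset 0 or 2
      case 2 => (0xF, data)                                      // SW, offset 0
      case other => throw new IllegalArgumentException(s"unsupported store funct3 $other")
    }
  }

  /** Load: picks the addressed byte/halfword out of the word and extends it. */
  def load(funct3: Int, addr: Int, word: Int): Int = {
    val shifted = word >>> (8 * (addr & 0x3))
    funct3 match {
      case 0 => (shifted << 24) >> 24 // LB: sign-extend the byte
      case 1 => (shifted << 16) >> 16 // LH: sign-extend the halfword
      case 2 => word                  // LW
      case 4 => shifted & 0xFF        // LBU: zero-extend
      case 5 => shifted & 0xFFFF      // LHU: zero-extend
      case other => throw new IllegalArgumentException(s"unsupported load funct3 $other")
    }
  }
}
```

For example, loading byte 3 of the word 0x80000000 yields -128 with LB but 128 with LBU.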

PC.scala

This code implements a Program Counter (PC) module, which manages the current program address in a processor and calculates the next possible address (incremented by 4 or 2) based on a condition (whether halted), supporting program flow control and instruction address generation.

Realigner.scala

This code implements a Realigner module that handles misaligned instruction addresses. With compressed instructions, a 32-bit instruction may start at an address that is not a multiple of 4 (bit 1 of the address is set), so it spans two memory words. In that case, the module uses a state machine to process the instruction in two steps: first, it saves the upper 16 bits of the fetched word (which hold the lower half of the instruction), halts the PC for one cycle, and sends a NOP to the core; then, in the next cycle, it concatenates the lower 16 bits of the newly fetched word (the upper half of the instruction) with the saved 16 bits to form the complete instruction, which is sent to the core. If the address is already aligned, the module passes the instruction through unchanged. This design ensures that the core always receives complete instructions, even when they cross a word boundary.
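The concatenation step can be sketched as a pure function. This is a hypothetical model (object and method names are ours) that relies only on RISC-V's little-endian memory layout: for an instruction at PC % 4 == 2, its low halfword sits in the upper 16 bits of the first word and its high halfword in the lower 16 bits of the next word.

```scala
// Hypothetical model of the Realigner's two-cycle output for a word-crossing instruction.
object RealignerModel {
  val Nop: Int = 0x00000013 // addi x0, x0, 0, the canonical RISC-V NOP

  /** firstWord: word fetched at PC & ~3 (upper 16 bits = instruction's low half).
    * secondWord: the following word (lower 16 bits = instruction's high half). */
  def combine(firstWord: Int, secondWord: Int): Int =
    ((secondWord & 0xFFFF) << 16) | (firstWord >>> 16)

  /** Output seen by the core: a NOP bubble first, then the rebuilt instruction. */
  def sequence(firstWord: Int, secondWord: Int): Seq[Int] =
    Seq(Nop, combine(firstWord, secondWord))
}
```

For example, with c.li a0, 1 (0x4505) at address 0 followed by addi x1, x0, 5 (0x00500093) at address 2, memory holds 0x00934505 and 0x????0050, and combine rebuilds 0x00500093.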

Registers.scala

This code implements a simple 32-bit register file module for handling register read and write operations in a processor. It supports two read ports, allowing data to be read from two registers simultaneously, and can write data to a specified register based on the input write address.

Top.scala

The Top module is the integration point for the processor's components, such as the core, instruction memory, data memory, and trace functionality. It connects the core to the memory modules so that instructions and data are exchanged correctly. When tracing is enabled in the configuration, Top also sends the core's execution details to the trace module for monitoring and debugging.

SRamTop.scala

This code defines a module for handling read and write operations to SRAM. The SRamTop module acts as an interface for memory requests and responses and works with the sram_top module, which performs the actual reads and writes. On a read request, SRamTop enables the SRAM, reads the data, and returns it; on a write request, it writes the data into the SRAM. Memory operations are controlled through signals such as csb_i, we_i, and wmask_i. The sram_top module is implemented as a black box that wraps an external Verilog memory model and can optionally load a program file to initialize the memory contents.

new_SRamTOP.scala

The new_SRamTop module handles memory requests by managing data reads and writes. It interfaces with a custom SRAM model (new_sramTop) and performs read and write actions based on incoming requests, which carry an address and data. It uses a Decoupled handshake and controls the SRAM through signals such as csb_i, we_i, and addr_i, managing the response valid and request ready signals so that each transfer completes correctly and data integrity is preserved.

Memory Request and Response I/O

Memory requests and responses use Chisel's Decoupled interface. A Decoupled (ready/valid) interface separates a producer from a consumer so that each side can operate independently, which is especially useful in designs with multiple modules.

In a decoupled system, two signals must be set: the valid signal and the ready signal.

  • The sender sets the valid signal to indicate that the data it is presenting is valid.
  • The receiver sets the ready signal to indicate that it can accept data.

A transfer takes place only in a cycle where both valid and ready are high.

As shown in the figure below, there are two directions of communication. In either direction, the sender is responsible for setting the valid signal, while the receiver is responsible for setting the ready signal.

Here, the entity requesting memory could be the CPU or a test program.
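The handshake can be illustrated with a cycle-by-cycle software model. This is a hypothetical sketch (HandshakeModel is an invented name): in each cycle the sender is valid while it still has data, a transfer is counted only when valid and ready are both high (Chisel's ReadyValidIO names this condition fire), and otherwise the same item is offered again in the next cycle.

```scala
import scala.collection.mutable.ListBuffer

// Cycle-by-cycle model of a ready/valid (Decoupled) handshake.
object HandshakeModel {
  // `items` are sent in order; readyPattern(c) is the receiver's ready in
  // cycle c. Returns a (cycle, item) pair for every completed transfer.
  def run(items: List[Int], readyPattern: Seq[Boolean]): List[(Int, Int)] = {
    var pending = items
    val received = ListBuffer.empty[(Int, Int)]
    for ((ready, cycle) <- readyPattern.zipWithIndex) {
      val valid = pending.nonEmpty // sender: valid while it has data
      if (valid && ready) {        // fire: both sides high this cycle
        received += ((cycle, pending.head))
        pending = pending.tail     // advance to the next item
      }                            // otherwise the same item is offered again
    }
    received.toList
  }
}
```

With a receiver that is not ready in cycle 1, the second item simply waits and is transferred in cycle 2.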

Take InstructionFetch as an example. To request memory, it sets io.coreInstrResp.ready := true.B, so its MemResponseIO port (coreInstrResp) is always ready to accept a response. Likewise, the valid signal of its MemRequestIO port (coreInstrReq) is set to true unless a reset or stall condition occurs.

class InstructionFetch extends Module {
  val io = IO(new Bundle {
    val address: UInt = Input(UInt(32.W))
    val instruction: UInt = Output(UInt(32.W))
    val stall: Bool = Input(Bool())
    val coreInstrReq = Decoupled(new MemRequestIO)
    val coreInstrResp = Flipped(Decoupled(new MemResponseIO))
  })

  val rst = Wire(Bool())
  rst := reset.asBool()

  io.coreInstrResp.ready := true.B

  io.coreInstrReq.bits.activeByteLane := "b1111".U
  io.coreInstrReq.bits.isWrite := false.B
  io.coreInstrReq.bits.dataRequest := DontCare
  io.coreInstrReq.bits.addrRequest := io.address >> 2
  io.coreInstrReq.valid := Mux(rst || io.stall, false.B, true.B)

  io.instruction := Mux(io.coreInstrResp.valid, io.coreInstrResp.bits.dataResponse, DontCare)
}

The MemRequestIO and MemResponseIO described here are defined in MemIO.scala.

The addrRequest specifies the memory location to be accessed, and the isWrite signal determines whether the operation is a write or a read.

class MemRequestIO extends Bundle {
  val addrRequest: UInt = Input(UInt(32.W))
  val dataRequest: UInt = Input(UInt(32.W))
  val activeByteLane: UInt = Input(UInt(4.W))
  val isWrite: Bool = Input(Bool())
}

class MemResponseIO extends Bundle {
  val dataResponse: UInt = Input(UInt(32.W))
}

For a write, dataRequest carries the data to be written to the cache. For a read, isWrite is set to false and dataRequest is left as DontCare, as the isWrite and dataRequest assignments in InstructionFetch show. MemResponseIO returns the data read in its dataResponse field.
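To make the field usage concrete, the sketch below mirrors the two bundles as plain Scala case classes. It builds a read request the way InstructionFetch does (byte address shifted right by 2, all four byte lanes, no write data) and a generic write request. The case classes and helper names are hypothetical, and dataRequest is modelled as an Option to stand in for DontCare.

```scala
// Hypothetical plain-Scala mirrors of MemRequestIO / MemResponseIO.
final case class MemRequest(addrRequest: Long, dataRequest: Option[Long],
                            activeByteLane: Int, isWrite: Boolean)
final case class MemResponse(dataResponse: Long) // carries read data back

object MemRequests {
  // Instruction fetch: word address (byte address >> 2), all lanes,
  // no write data (DontCare in the Chisel code).
  def instrRead(byteAddress: Long): MemRequest =
    MemRequest(byteAddress >> 2, None, 0xF, isWrite = false)

  // Write: the data to store travels in dataRequest.
  def write(wordAddress: Long, data: Long, lanes: Int = 0xF): MemRequest =
    MemRequest(wordAddress, Some(data), lanes, isWrite = true)
}
```

Note that the shift means the memory side receives word addresses: byte address 28 arrives as addrRequest = 7.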

Observations in Cache Access

Below is the content of CacheSRAMTests after changing the module under test from new_SRamTop to Instruction_SRamTop. Note that the programs for Instruction_SRamTop and Data_SRamTop are essentially the same, differing only in name.

In this test program, data is written and then read from the corresponding address to verify that the cache works correctly and can access previously stored data.

package nucleusrv.components

import chisel3._
import chisel3.util._
import org.scalatest._
import chiseltest._
import chiseltest.experimental.TestOptionBuilder._
import chiseltest.internal.VerilatorBackendAnnotation

class CacheSRAMTests extends FreeSpec with ChiselScalatestTester {
    "New SRAM Test" in {
        test(new Instruction_SRamTop(None)).withAnnotations(Seq(VerilatorBackendAnnotation)) { c => 
            // Write data to a specific address
            c.io.req.valid.poke(true.B)
            c.io.req.bits.isWrite.poke(true.B)           // Write operation
            c.io.req.bits.addrRequest.poke(100.U)        // Address to write
            c.io.req.bits.dataRequest.poke(42.U)         // Data to write
            c.io.req.bits.activeByteLane.poke("b1111".U) // Enable all bytes
            c.clock.step(10)                             // Allow time for write to complete

            // Read data back from the same address
            c.io.req.bits.isWrite.poke(false.B)          // Read operation
            c.io.req.bits.addrRequest.poke(100.U)        // Address to read
            c.clock.step(10)                             // Allow time for read
            c.io.rsp.bits.dataResponse.expect(42.U)      // Verify the data

            // Test another address
            c.io.req.bits.addrRequest.poke(5000000.U)    // Address to write (large address within range)
            c.io.req.bits.dataRequest.poke(123.U)        // Data to write
            c.io.req.bits.isWrite.poke(true.B)           // Write operation
            c.clock.step(10)                             // Allow time for write

            c.io.req.bits.isWrite.poke(false.B)          // Read operation
            c.io.req.bits.addrRequest.poke(5000000.U)    // Address to read
            c.clock.step(10)                             // Allow time for read
            c.io.rsp.bits.dataResponse.expect(123.U)     // Verify the data
        }
    }
}

Here is a code snippet from Instruction_SRamTop, where two printf lines are added to observe whether the process is a read or write operation.

when(io.req.valid && !io.req.bits.isWrite) {
    // READ
+    printf("read case\n")
    ...
} .elsewhen(io.req.valid && io.req.bits.isWrite) {
    // WRITE
+    printf("write case\n")
    ...
} .otherwise {
    ...
}

During the CacheSRAMTests, both read and write requests can be observed.

...
STARTING test_run_dir/New_SRAM_Test/VInstruction_SRamTop
write case
write case
read case
read case
write case
read case
read case
Exit Code: 0
[info] - New SRAM Test
...
[info] All tests passed.
[success] Total time: 2 s, completed Jan 21, 2025, 11:48:02 AM

However, TopTest only ever shows read requests.

...
read case
read case
read case
read case
read case
Enabling waves..
Exit Code: 0
[info] - Top Test
...
[info] All tests passed.
[success] Total time: 3 s, completed Jan 21, 2025, 11:50:01 AM

Since the project lacks a memory hierarchy, every request results in a miss. Moreover, the CPU's loading of the program and data files does not appear to be fully implemented, so it cannot execute instructions from a file of our choosing.

Therefore, a more feasible approach seemed to be to focus first on improving and testing only the instruction fetch path.

Cache Implementation for Instruction Fetch

Modifying Memory Request and Response I/O

First, to know whether each access is a hit or a miss, I added a hit output to MemResponseIO. Everything else is unchanged, keeping the I/O interface as close to the original as possible. Once the hit/miss status is known, further actions such as handling a cache miss can follow.

class MemRequestIO extends Bundle{
  val addrRequest: UInt = Input(UInt(32.W))
  val isWrite:Bool = Input(Bool())
  val dataRequest: UInt = Input(UInt(32.W))
  val activeByteLane: UInt = Input(UInt(4.W))
}

class MemResponseIO extends Bundle{
  val dataResponse:UInt = Output(UInt(32.W))
  val hit: Bool = Output(Bool())  
}

Implementing a Simple Cache

Cache Program

I attempted to write another cache that does not use Verilog black boxes. Below is our implementation of a simple direct-mapped cache with 8 lines. Each line holds a valid bit, a tag, and one 32-bit data word.

package nucleusrv.components

import chisel3._
import chisel3.util._
import chisel3.experimental._
import chisel3.util.experimental._

class Instruction_SRamTop(val programFile: Option[String]) extends Module {
  val io = IO(new Bundle {
    val req = Flipped(Decoupled(new MemRequestIO))
    val rsp = Decoupled(new MemResponseIO)
  })

  val validReg = RegInit(false.B)
  io.rsp.valid := validReg
  io.req.ready := true.B

  val cacheSize = 8  // number of cache lines
  val blockSize = 32 // 32 bits per line
  val indexBits = log2Ceil(cacheSize)

  val validBits = RegInit(VecInit(Seq.fill(cacheSize)(false.B))) // V
  val tags = Reg(Vec(cacheSize, UInt((32 - indexBits).W)))       // TAG
  val data = Reg(Vec(cacheSize, UInt(blockSize.W)))              // DATA

  // extract tag and index from addr
  val tag = io.req.bits.addrRequest(31, indexBits)
  val index = io.req.bits.addrRequest(indexBits - 1, 0)

  // Data and TAG in the cache
  val isValid = validBits(index)
  val cacheTag = tags(index)
  val cacheData = data(index)

  io.rsp.bits.hit := isValid && (cacheTag === tag)
  dontTouch(io.req.valid)

  val target_data = Reg(UInt(32.W))
  target_data := 0.U

  when(io.req.valid && !io.req.bits.isWrite) {
    when(io.rsp.bits.hit) {
      printf(">>>Cache hit! %x\n", cacheData)
      target_data := cacheData
    }.otherwise {
      printf(">>>Cache miss!\n")
      target_data := 0.U
    }
    validReg := true.B
  }.elsewhen(io.req.valid && io.req.bits.isWrite) {
    printf(">>>Write, %x\n", io.req.bits.dataRequest)
    validBits(index) := true.B
    tags(index) := tag
    data(index) := io.req.bits.dataRequest
    validReg := true.B
  }.otherwise {
    validReg := false.B
  }

  io.rsp.bits.dataResponse := target_data
}
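The indexing and hit logic of this cache can be cross-checked with a software model. The sketch below is a hypothetical plain-Scala mirror of Instruction_SRamTop, not part of the project. Like the Chisel code, it uses the low log2(cacheSize) bits of addrRequest as the index and the remaining bits as the tag, allocates the line on every write, and returns 0 on a read miss. It ignores the one-cycle delay that the target_data register adds.

```scala
// Hypothetical software model of the 8-line direct-mapped cache above.
class DirectMappedCacheModel(cacheSize: Int = 8) {
  require(cacheSize > 0 && Integer.bitCount(cacheSize) == 1, "cacheSize must be a power of two")
  private val indexBits = Integer.numberOfTrailingZeros(cacheSize) // log2(cacheSize)
  private val valid = Array.fill(cacheSize)(false)
  private val tags  = Array.fill(cacheSize)(0L)
  private val data  = Array.fill(cacheSize)(0L)

  def index(addr: Long): Int = (addr & (cacheSize - 1)).toInt // addr(indexBits-1, 0)
  def tag(addr: Long): Long  = addr >>> indexBits             // addr(31, indexBits)

  // Read: (hit, data); a miss returns 0, as target_data does.
  def read(addr: Long): (Boolean, Long) = {
    val i   = index(addr)
    val hit = valid(i) && tags(i) == tag(addr)
    (hit, if (hit) data(i) else 0L)
  }

  // Write: always allocates the line, overwriting whatever was there.
  def write(addr: Long, value: Long): Unit = {
    val i = index(addr)
    valid(i) = true
    tags(i) = tag(addr)
    data(i) = value
  }
}
```

Because InstructionFetch sends io.address >> 2 as addrRequest, byte addresses 28 and 92 become word addresses 7 and 23, which share index 7, so writing one evicts the other.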

Test Program

Since this implementation keeps an almost identical interface to the original Instruction_SRamTop, CacheSRAMTests can still be used to test it. And because a hit signal has been added to MemResponseIO, the test can now also check it with c.io.rsp.bits.hit.expect(...) after each read, including a final read of an address that was never written, which must miss.

package nucleusrv.components

import chisel3._
import chisel3.util._
import org.scalatest._
import chiseltest._
import chiseltest.experimental.TestOptionBuilder._
import chiseltest.internal.VerilatorBackendAnnotation

class CacheSRAMTests extends FreeSpec with ChiselScalatestTester {
    "New SRAM Test" in {
        test(new Instruction_SRamTop(None)).withAnnotations(Seq(VerilatorBackendAnnotation)) { c =>
            // Write data to a specific address
            c.io.req.valid.poke(true.B)
            c.io.req.bits.isWrite.poke(true.B)           // Write operation
            c.io.req.bits.addrRequest.poke(100.U)        // Address to write
            c.io.req.bits.dataRequest.poke(42.U)         // Data to write
            c.io.req.bits.activeByteLane.poke("b1111".U) // Enable all bytes
            c.clock.step(10)                             // Allow time for write to complete

            // Read data back from the same address
            c.io.req.bits.isWrite.poke(false.B)          // Read operation
            c.io.req.bits.addrRequest.poke(100.U)        // Address to read
            c.clock.step(10)                             // Allow time for read
            c.io.rsp.bits.dataResponse.expect(42.U)      // Verify the data
            c.io.rsp.bits.hit.expect(true.B)

            // Test another address
            c.io.req.bits.addrRequest.poke(5000000.U)    // Address to write (large address within range)
            c.io.req.bits.dataRequest.poke(123.U)        // Data to write
            c.io.req.bits.isWrite.poke(true.B)           // Write operation
            c.clock.step(10)                             // Allow time for write

            c.io.req.bits.isWrite.poke(false.B)          // Read operation
            c.io.req.bits.addrRequest.poke(5000000.U)    // Address to read
            c.clock.step(10)                             // Allow time for read
            c.io.rsp.bits.dataResponse.expect(123.U)     // Verify the data
            c.io.rsp.bits.hit.expect(true.B)

            // Read an address that was never written: expect a miss
            c.io.req.bits.isWrite.poke(false.B)          // Read operation
            c.io.req.bits.addrRequest.poke(777.U)        // Address to read
            c.clock.step(10)                             // Allow time for read
            c.io.rsp.bits.dataResponse.expect(0.U)       // Verify the data
            c.io.rsp.bits.hit.expect(false.B)
        }
    }
}

The output is shown below. Each write prints the written data, and each read reports whether it is a hit or a miss. The number of lines printed per request depends on the cycle count passed to clock.step.

>>>Write, 0000002a
>>>Write, 0000002a
>>>Write, 0000002a
>>>Cache hit! 0000002a
>>>Cache hit! 0000002a
>>>Cache hit! 0000002a
>>>Write, 0000007b
>>>Write, 0000007b
>>>Write, 0000007b
>>>Cache hit! 0000007b
>>>Cache hit! 0000007b
>>>Cache hit! 0000007b
>>>Cache miss!
>>>Cache miss!
>>>Cache miss!
Exit Code: 0
[info] - New SRAM Test
...
[info] All tests passed.
[success] Total time: 2 s, completed Jan 21, 2025, 5:36:54 PM

Instruction Fetch

As described earlier, the CPU only issues read requests on each cycle and never writes the missing data into the cache after a cache miss.

After creating the new cache, I examined whether the CPU actually uses it, specifically whether the IF (Instruction Fetch) stage handles cache misses correctly by fetching the data from the next level.

Based on the original InstructionFetch, I modified the miss path so that the fetched data is written into the cache. Otherwise, after a cold start every access would keep missing, because the compulsory misses would never be resolved.

Instruction Fetch Program

package nucleusrv.components

import chisel3._
import chisel3.util._
import scala.io.Source

class InstructionFetch extends Module {
  val io = IO(new Bundle {
    val stall: Bool = Input(Bool())
    val address: UInt = Input(UInt(32.W))
    val coreInstrReq = Decoupled(new MemRequestIO)
    val coreInstrResp = Flipped(Decoupled(new MemResponseIO))
    val instruction: UInt = Output(UInt(32.W))
    val hit: Bool = Output(Bool())
  })

  io.instruction := 0.U
  io.hit := false.B
  io.coreInstrResp.ready := true.B

  val rst = Wire(Bool())
  rst := reset.asBool()

  io.coreInstrReq.valid := true.B // Mux(rst || io.stall, false.B, true.B)
  io.coreInstrReq.bits.isWrite := false.B
  io.coreInstrReq.bits.dataRequest := DontCare
  io.coreInstrReq.bits.addrRequest := io.address >> 2
  io.coreInstrReq.bits.activeByteLane := "b1111".U

  // Sending a request to SRAM, trying to read data.
  val real_iSRAM = Module(new Instruction_SRamTop(None))
  real_iSRAM.io.req.valid := true.B
  real_iSRAM.io.rsp.ready := true.B

  val writeEnable = RegInit(false.B)
  real_iSRAM.io.req.bits.isWrite := writeEnable
  real_iSRAM.io.req.bits.addrRequest := io.coreInstrReq.bits.addrRequest
  real_iSRAM.io.req.bits.dataRequest := DontCare
  real_iSRAM.io.req.bits.activeByteLane := "b1111".U

  // hit or miss
  io.hit := RegNext(real_iSRAM.io.rsp.bits.hit, false.B)

  when(real_iSRAM.io.rsp.valid && !io.hit) {
    // when cache miss, read .txt as next level cache
    val searchTableData = RegInit(0.U(32.W))
    val filename = "address_data.txt"
    val addressDataMap = Source.fromFile(filename).getLines()
      .map { line =>
        val Array(address, data) = line.split("=").map(_.trim)
        BigInt(address, 16) -> BigInt(data, 16).U(32.W)
      }.toMap
    addressDataMap.foreach { case (address, data) =>
      when(address.U === io.address) {
        searchTableData := data
      }
    }
    // get the instruction
    io.instruction := searchTableData
    // load into the cache
    real_iSRAM.io.req.bits.isWrite := true.B
    real_iSRAM.io.req.bits.addrRequest := io.coreInstrReq.bits.addrRequest
    real_iSRAM.io.req.bits.dataRequest := searchTableData
  }.elsewhen(real_iSRAM.io.rsp.valid && io.hit) {
    real_iSRAM.io.req.bits.isWrite := false.B
    // get the instruction
    io.instruction := real_iSRAM.io.rsp.bits.dataResponse
  }
}

This program is similar to the original InstructionFetch, except that on a cache miss it accesses the next level of the memory hierarchy to load the data at the requested address.

If a cache miss occurs, the when(real_iSRAM.io.rsp.valid && !io.hit) branch is taken, and the data is retrieved from the next level and written into the cache.
Since there is no memory hierarchy in this system, I use a text file as a substitute for the lower-level memory. On a cache miss, the program looks up the requested address in this file (the addressDataMap built from address_data.txt) to get its data.
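The refill behaviour can be summarised with a behavioural software model. This is a hypothetical sketch, not the project's code: a Scala Map stands in for address_data.txt, the 8-line direct-mapped cache is keyed by index with the word address split into index and tag, and the register delays of the Chisel version (io.hit and searchTableData are both registered) are ignored.

```scala
import scala.collection.mutable

// Hypothetical model of the modified fetch path: a miss reads the backing
// store (standing in for address_data.txt) and refills the cache line.
class RefillingFetchModel(backing: Map[Long, Long], lines: Int = 8) {
  private val cache = mutable.Map.empty[Long, (Long, Long)] // index -> (tag, data)

  // Returns (hit, instruction) for a byte address.
  def fetch(byteAddr: Long): (Boolean, Long) = {
    val word  = byteAddr >> 2 // same >> 2 as addrRequest
    val index = word % lines
    val tag   = word / lines
    cache.get(index) match {
      case Some((t, d)) if t == tag => (true, d)
      case _ =>
        val d = backing.getOrElse(byteAddr, 0L) // next-level lookup
        cache(index) = (tag, d)                 // refill the line
        (false, d)
    }
  }
}
```

Without the refill step, the second fetch of each address would miss again, which is exactly the endless cold-start behaviour the modification avoids.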

Copy the addr_x=data column (without the header) into a file named address_data.txt and place it in the root directory of the project.

index  addr_bin   addr_dec  addr_x=data
0      000000|00  0         0=fe010113
1      000001|00  4         4=00012a23
2      000010|00  8         8=08000793
3      000011|00  12        c=00f12823
4      000100|00  16        10=000027b7
5      000101|00  20        14=00f12623
6      000110|00  24        18=00012e23
7      000111|00  28        1c=0400006f
0      001000|00  32        20=00012c23
1      001001|00  36        24=0200006f
2      001010|00  40        28=01c12703
3      001011|00  44        2c=01812783
4      001100|00  48        30=00f707b3
5      001101|00  52        34=00f12a23
6      001110|00  56        38=01812783
7      001111|00  60        3c=00178793
0      010000|00  64        40=00f12c23
1      010001|00  68        44=01812703
2      010010|00  72        48=00c12783
3      010011|00  76        4c=fcf74ee3
4      010100|00  80        50=01c12783
5      010101|00  84        54=00178793
6      010110|00  88        58=00f12e23
7      010111|00  92        5c=01c12703
0      011000|00  96        60=01012783
1      011001|00  100       64=faf74ee3
2      011010|00  104       68=00000793
3      011011|00  108       6c=00078513
4      011100|00  112       70=02010113
5      011101|00  116       74=00008067
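The file format and the index column can be checked with a short parser. This sketch uses hypothetical names. It parses hexadecimal addr=data lines as InstructionFetch does at elaboration time (using java.lang.Long instead of BigInt) and recomputes the index as (addr >> 2) mod 8, matching the 8-line cache.

```scala
// Hypothetical helper: parse "addr=data" hex lines and recompute the index column.
object AddressDataTable {
  // Blank lines are skipped; each remaining line must contain exactly one '='.
  def parse(lines: Seq[String]): Map[Long, Long] =
    lines.map(_.trim).filter(_.nonEmpty).map { line =>
      val parts = line.split("=").map(_.trim)
      require(parts.length == 2, s"malformed line: $line")
      java.lang.Long.parseLong(parts(0), 16) -> java.lang.Long.parseLong(parts(1), 16)
    }.toMap

  // Word address (byte address >> 2) modulo the number of cache lines.
  def cacheIndex(byteAddr: Long, lines: Int = 8): Int = ((byteAddr >> 2) % lines).toInt

  // Reading the real file could look like:
  //   val src = scala.io.Source.fromFile("address_data.txt")
  //   try parse(src.getLines().toList) finally src.close()
}
```

Running cacheIndex over every address reproduces the index column, and it confirms that 0x1c (28) and 0x5c (92) both land on line 7.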

Test Program

We also wrote a test for InstructionFetch, and it passes. It first reads addresses 0 and 28, which result in compulsory misses.

Since a text file serves as the next-level storage, the missing data is retrieved from it on a miss and loaded into the cache, so the second read of each address is a cache hit.

Address 92 maps to the same cache line as address 28 (index 7 in the table above), so loading it evicts address 28. Reading address 28 again therefore results in a conflict miss.

package nucleusrv.components

import chisel3._
import org.scalatest.FreeSpec
import chiseltest._

class InstructionFetchTest extends FreeSpec with ChiselScalatestTester {
  "IF Test" in {
    test(new InstructionFetch) { IF =>
      IF.io.stall.poke(false.B)
      IF.io.address.poke(0.U)
      IF.io.hit.expect(false.B)
      IF.io.address.poke(28.U)
      IF.io.hit.expect(false.B)

      // read address 0
      IF.clock.step(10)
      IF.io.address.poke(0.U)
      IF.io.hit.expect(true.B)
      IF.clock.step()
      IF.io.instruction.expect(BigInt("fe010113", 16).U)

      // read address 28
      IF.clock.step(10)
      IF.io.address.poke(28.U)
      IF.io.hit.expect(true.B)
      IF.clock.step()
      IF.io.instruction.expect(BigInt("0400006f", 16).U)

      // Request 92 and evict 28
      IF.io.address.poke(92.U)
      IF.clock.step()
      IF.io.hit.expect(false.B)

      // read address 92
      IF.clock.step(10)
      IF.io.address.poke(92.U)
      IF.io.hit.expect(true.B)
      IF.clock.step()
      IF.io.instruction.expect(BigInt("01c12703", 16).U)

      // Conflict miss for 28
      IF.io.address.poke(28.U)
      IF.clock.step()
      IF.io.hit.expect(false.B)
    }
  }
}
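The hit and miss pattern this test expects can be reproduced with a trace-based classifier. This is a hypothetical sketch, not part of the test. For an 8-line direct-mapped cache indexed by the word address, it labels each access as a hit, a compulsory miss (first reference to the word), or a conflict miss (the word was loaded before but has since been evicted). It does not separate out capacity misses, which a trace touching only three distinct words cannot produce.

```scala
import scala.collection.mutable

// Hypothetical 3C-style classifier for a direct-mapped cache.
object MissClassifier {
  def classify(byteAddrs: Seq[Long], lines: Int = 8): List[String] = {
    val resident = mutable.Map.empty[Long, Long] // index -> word currently cached
    val seen     = mutable.Set.empty[Long]       // every word ever loaded
    byteAddrs.toList.map { a =>
      val word  = a >> 2
      val index = word % lines
      val kind =
        if (resident.get(index).contains(word)) "hit"
        else if (seen.contains(word)) "conflict miss"
        else "compulsory miss"
      resident(index) = word // a miss fills the line; a hit leaves it unchanged
      seen += word
      kind
    }
  }
}
```

For the access order 0, 28, 0, 28, 92, 92, 28 it reports two compulsory misses, two hits, a compulsory miss for 92, a hit, and finally a conflict miss for 28, matching the behaviour described above.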

Future Work

In this project, we focused on building NucleusRV, a Chisel-based CPU, with an emphasis on understanding its memory-related components. While an instruction cache has been implemented, a data cache remains to be developed, which will likely require a more complex cache architecture.

Additionally, the cache hierarchy has not yet been fully implemented, and there is room for further refinement in this area.

Another area for future work is improving how the CPU loads program and data files, which is currently limited. The CPU's compatibility with external systems could also be improved to make integration smoother.

Reference