陳榮昶, 林晉德
This project involves enhancing the SRV32 lightweight RISC-V processor core, developed in Verilog, by integrating a cache system. The aim is to improve memory access performance and implement classic cache replacement mechanisms.
This document provides a guide for installing and testing SRV32 on Ubuntu systems and outlines the steps to achieve the project goals.
First, install the required development packages:
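The original package list did not survive conversion; a typical set for building and simulating a Verilog RISC-V core on Ubuntu looks like the following (the exact package names are an assumption, not the project's pinned list):

```shell
# Build tools and simulation utilities commonly needed for srv32
# (package set is an assumption; adjust to the project's README)
sudo apt update
sudo apt install -y build-essential git make \
                    verilator gtkwave
```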
There are two methods to install the RISC-V toolchain:
1. Download the xPack RISC-V toolchain
2. Extract and configure it
3. Set environment variables
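In practice, the three steps above might look like this (a sketch; the release version and install path are placeholders, not the project's pinned values):

```shell
# 1. Download the xPack RISC-V toolchain
#    (version is a placeholder; pick a release from the
#     xpack-dev-tools/riscv-none-elf-gcc-xpack GitHub releases page)
XPACK_VER=<version>
wget https://github.com/xpack-dev-tools/riscv-none-elf-gcc-xpack/releases/download/v${XPACK_VER}/xpack-riscv-none-elf-gcc-${XPACK_VER}-linux-x64.tar.gz

# 2. Extract to an install location of your choice
mkdir -p ~/opt
tar xf xpack-riscv-none-elf-gcc-${XPACK_VER}-linux-x64.tar.gz -C ~/opt

# 3. Set environment variables so the build can find the toolchain
export PATH=~/opt/xpack-riscv-none-elf-gcc-${XPACK_VER}/bin:$PATH
```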
Problem: "cannot execute binary file: Exec format error" when running make tests-all.

Solution: Ensure you download the correct toolchain version for your system; run uname -m to check the system architecture.

Problem: Errors related to "la x0,5b" during compilation.

Solution:
1. Clear previous environment variables
2. Set correct environment variables
3. Clean and recompile
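As a concrete sketch of these three steps (the variable names and toolchain path are assumptions about a typical setup, not the project's exact values):

```shell
# 1. Remove stale toolchain variables/paths from the current shell
unset RISCV
export PATH=$(echo "$PATH" | tr ':' '\n' | grep -v riscv | paste -sd:)

# 2. Point PATH at the correct toolchain for this machine
export PATH=~/opt/xpack-riscv-none-elf-gcc-<version>/bin:$PATH

# 3. Rebuild from scratch
make clean
make tests-all
```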
The figure above shows the srv32 architecture, which has two separate memory systems: an instruction memory and a data memory. We therefore add two Level 1 caches to the CPU, namely the Instruction Cache and the Data Cache. Both use a 2-way set-associative mapping, and the replacement policy we chose is LRU (Least Recently Used). The system architecture is shown in the figure below.
In this project, the Cache I/O is based on the srv32 CPU, with some modifications. Because of the 2-way set-associative mapping, the Cache design includes two banks, each containing 32 blocks, and each block has its own Tag and Valid bit. The Cache therefore maintains a 64-bit valid signal internally, representing the validity of the 32 blocks in each of the two banks. The following diagrams illustrate the hardware architecture of the Cache for this project.
The figure above shows the state diagram of the Instruction Cache controller designed for this project. Upon receiving the rst signal, the controller transitions to the IDLE state. When a CPU access request to the Instruction Memory is received, the controller transitions to the READ_HIT state to determine whether the accessed address matches the tag stored in the Cache. If the tag matches and the Valid bit of the corresponding block is 1, it is considered a hit; otherwise, it is a miss. This state lasts for only one cycle.
If it is determined to be a hit, the temporarily stored value in the Cache is returned to the CPU. Each Cache block is 128 bits, so the word to be returned to the CPU is determined based on address[3:2]. If it is determined to be a miss, the controller transitions to the READ_AXI state.
In the event of a Read Miss, the controller employs a Read Allocate policy: it communicates with the SRAM to fetch the requested word along with the three remaining words of the block (since a block is 128 bits, i.e., four 32-bit words). Once all four words have been sequentially written into the Cache block, the read operation is complete and the controller transitions to the READ_DONE state.
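The hit check and read-allocate refill described above can be sketched in Verilog roughly as follows (signal names, the address split, and state names are assumptions, not the project's exact code):

```verilog
// Organization per the text: 2 ways x 32 blocks x 128 bits.
reg [127:0] data0 [0:31], data1 [0:31];   // data banks (ways)
reg  [22:0] tag0  [0:31], tag1  [0:31];   // 32 - 4 - 5 = 23 tag bits (assumed split)
reg  [63:0] valid;                        // one valid bit per block, both banks
reg   [1:0] sram_counter;                 // counts the four refill words

wire [4:0]  index    = addr[8:4];         // 16-byte blocks, 32 sets
wire [22:0] addr_tag = addr[31:9];

// Hit detection: tag compare plus valid bit, per way
wire hit0 = valid[{1'b0, index}] && (tag0[index] == addr_tag);
wire hit1 = valid[{1'b1, index}] && (tag1[index] == addr_tag);
wire hit  = hit0 || hit1;

// On a hit, address[3:2] selects the word inside the 128-bit block
wire [127:0] line     = hit0 ? data0[index] : data1[index];
wire  [31:0] core_out = line[addr[3:2]*32 +: 32];

// READ_AXI: one word written per axi_ready; after the fourth, READ_DONE
always @(posedge clk) begin
    if (state == READ_AXI && axi_ready) begin
        if (victim_way == 1'b0)
            data0[index][sram_counter*32 +: 32] <= axi_rdata;
        else
            data1[index][sram_counter*32 +: 32] <= axi_rdata;
        sram_counter <= sram_counter + 2'd1;
    end
end
```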
The figure above illustrates the state machine of the Data Cache controller designed for this project. The design process is generally similar to that of the Instruction Cache, with the main difference being that the Data Cache must handle both read and write operations, whereas the Instruction Cache only needs to handle read operations. Therefore, the state machine for the Data Cache includes two additional states, WRITE_HIT and WRITE_CACHE, to manage write operations to the Data Memory.
Initially, the system remains in the IDLE state, waiting for the CPU to issue a core_req. Upon receiving a core_req, the system uses the core_write signal to determine whether the operation is a write. If it is a write operation, the system transitions to the WRITE_HIT state to determine whether it is a Write Hit. In the case of a Write Miss, the system notifies the CPU to update the value in the Data Memory and simultaneously updates the value in the Cache. The system then returns to the IDLE state to await the next access request, ensuring that the data is successfully written before allowing the CPU to proceed.
If it is a Write Hit, the system transitions directly to the WRITE_CACHE state, following the Write-Through policy: the data being written to the Data Memory is also written to the corresponding block in the Cache. The system then transitions to the DONE state before returning to IDLE.
On the other hand, if the core_write signal is low while in the IDLE state, the CPU intends to read from the Data Memory, and the flow is similar to that of the Instruction Cache. The system transitions to the READ_HIT state to determine whether it is a Read Hit. On a Read Hit, the system returns to IDLE while outputting the cached data back to the CPU. On a Read Miss, the system transitions to the READ_AXI state to fetch data from the Data Memory via the CPU Wrapper. After receiving all four words, the system transitions to the DONE state and then returns to IDLE to await the next request.
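The flow just described can be summarized as a next-state sketch (state encodings and signal names are assumptions based on the description, not the project's exact code):

```verilog
// Data-cache controller states, per the description above
localparam IDLE        = 3'd0,
           WRITE_HIT   = 3'd1,  // decide write hit vs. miss
           WRITE_CACHE = 3'd2,  // write-through: also update the cache block
           READ_HIT    = 3'd3,  // decide read hit vs. miss
           READ_AXI    = 3'd4,  // fetch four words from the Data Memory
           DONE        = 3'd5;

always @(*) begin
    case (state)
        IDLE:        next = !core_req   ? IDLE
                          :  core_write ? WRITE_HIT : READ_HIT;
        WRITE_HIT:   next = hit ? WRITE_CACHE : IDLE;  // miss: write-through to DM, back to IDLE
        WRITE_CACHE: next = DONE;
        READ_HIT:    next = hit ? IDLE : READ_AXI;     // hit: data returned this cycle
        READ_AXI:    next = (axi_ready && sram_counter == 2'd3) ? DONE : READ_AXI;
        DONE:        next = IDLE;
        default:     next = IDLE;
    endcase
end
```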
This module implements the control logic for the L1 Instruction Cache. Since the instruction cache only handles read operations, it is relatively simple:

- Hit detection: hit0 || hit1
- A sram_counter to track refill progress

The data cache control logic is more complex, since it needs to handle both read and write operations:
In both L1C_data.v and L1C_inst.v, LRU is implemented similarly:
The LRU bit update logic has several cases:

- hit0: indicates a hit on set 0
- hit1: indicates a hit on set 1

Replacement priority order:
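A minimal sketch of the LRU update and victim selection, assuming one LRU bit per set that records the most recently used way, and the common priority of replacing an invalid way before the LRU way (signal names are assumptions):

```verilog
reg lru [0:31];   // one bit per set: which way was most recently used

// Update on every hit: remember the way that was just accessed
always @(posedge clk) begin
    if (hit0) lru[index] <= 1'b0;   // way 0 most recently used
    if (hit1) lru[index] <= 1'b1;   // way 1 most recently used
end

// Replacement priority: an invalid way first, otherwise the least
// recently used way (the opposite of the lru bit)
wire way0_valid = valid[{1'b0, index}];
wire way1_valid = valid[{1'b1, index}];
wire victim_way = !way0_valid ? 1'b0
                : !way1_valid ? 1'b1
                : ~lru[index];
```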
The waveform demonstrates an L1 Instruction Cache Read Miss case. During the READ_HIT state, the low hit signal indicates that the requested data is not present in the Cache block, which triggers a fetch from instruction memory. Following the Cache's Read Allocate policy, this fetch retrieves the requested word along with three additional words to fill the corresponding Cache block.
The READ_AXI state communicates with the CPU Wrapper to retrieve data from SRAM. The process completes only after receiving four data pieces, each accompanied by an axi_ready signal. The core_wait signal then informs the CPU that it can proceed with the next request. Notably, an lru_buffer is implemented to track which bank was most recently accessed in each set, facilitating efficient replacement decisions.
The waveform shows multiple Instruction Cache Read Hit cases. In the READ_HIT state, the high hit signal confirms a cache hit. The requested data is then output to the CPU through the core_out port. The system returns to IDLE state, signaling the CPU's readiness for the next request.
In the Data Cache Write Miss case, identified by the low hit signal, the cache leaves the WRITE_HIT state and directly informs the CPU to carry out the Write-Through policy, updating the DM accordingly.
The Data Cache Write Hit waveform shows a high hit signal. Upon detection in the WRITE_HIT state, the cache transitions to WRITE_CACHE state, where it writes the CPU's data to the corresponding Cache Block.