蔡承璋, 張恩祥
In this project, we write the RTL code for a pipelined RISC-V CPU. We had to become familiar with the RISC-V instruction set architecture and implement 45 instructions covering the R-type, I-type, S-type, B-type, U-type, and J-type formats, which let us review what we learned in Computer Organization.
The CPU is organized into five pipeline stages: IF (Instruction Fetch), ID (Instruction Decode and Register File Read), EXE (Execute or Address Calculation), MEM (Data Memory Access), and WB (Write Back).
IF: This stage contains the PC register (Program Counter) and the IM (Instruction Memory). On each clock cycle, the PC normally advances by 4 and sends the next instruction address to IM. IM, which is an SRAM, reads the instruction and passes it to the ID stage on the following clock cycle. For J-type and B-type instructions the PC cannot simply continue sequentially: it has to wait until the target address is computed before the correct instruction address can be sent to IM.
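A minimal Python sketch of the next-PC selection described above may help. It is a behavioral model, not our RTL: the function name `next_pc` and its arguments (`stall`, `jump_taken`, `jump_target`) are hypothetical, and giving a stall priority over a redirect is an assumption.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc: int, stall: bool, jump_taken: bool, jump_target: int) -> int:
    """Return the PC value latched at the next rising clock edge (hypothetical model)."""
    if stall:
        # Hazard Detection holds the PC (load-use or multiplication stall).
        return pc
    if jump_taken:
        # A jump or taken branch resolved in EXE redirects fetch to its target.
        return jump_target & MASK32
    # Default: fetch the next sequential 32-bit instruction.
    return (pc + 4) & MASK32
```

For example, `next_pc(0x8, False, False, 0)` returns `0xC`, while a taken jump to `0x40` returns `0x40`.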
ID: This stage consists mainly of the Decode (Control) Unit, the Register File, and the Immediate Unit. The Control Unit decodes the 32-bit instruction fetched from IM and generates the control signals used by the later stages. The Register File reads the source operands rs1 and rs2 for later computation, and the Immediate Unit extracts the immediate value from the instruction and passes it on for subsequent operations.
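The Immediate Unit's job can be sketched in Python as follows. The bit slicing follows the immediate formats of the RISC-V base ISA; the function names `sign_extend` and `immediate` are illustrative choices, not module names from the project.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def immediate(instr: int, fmt: str) -> int:
    """Decode the immediate of a 32-bit instruction in format I, S, B, U, or J."""
    if fmt == "I":  # imm[11:0] = instr[31:20]
        return sign_extend(instr >> 20, 12)
    if fmt == "S":  # imm[11:5] = instr[31:25], imm[4:0] = instr[11:7]
        return sign_extend(((instr >> 25) << 5) | ((instr >> 7) & 0x1F), 12)
    if fmt == "B":  # imm[12|10:5|4:1|11], bit 0 is always 0
        imm = (((instr >> 31) & 1) << 12) | (((instr >> 7) & 1) << 11) \
            | (((instr >> 25) & 0x3F) << 5) | (((instr >> 8) & 0xF) << 1)
        return sign_extend(imm, 13)
    if fmt == "U":  # imm[31:12], low 12 bits are zero
        return sign_extend(instr & 0xFFFFF000, 32)
    if fmt == "J":  # imm[20|10:1|11|19:12], bit 0 is always 0
        imm = (((instr >> 31) & 1) << 20) | (((instr >> 12) & 0xFF) << 12) \
            | (((instr >> 20) & 1) << 11) | (((instr >> 21) & 0x3FF) << 1)
        return sign_extend(imm, 21)
    raise ValueError(f"unknown format {fmt}")
```

For example, `immediate(0xFFE00113, "I")` (the encoding of `addi x2, x0, -2`) returns `-2`.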
EXE: This stage contains the ALU (Arithmetic Logic Unit), the Multiplier, the JB Unit, and several multiplexers. The ALU performs the basic operations: addition, subtraction, shifts, comparisons, XOR, OR, and AND. The Multiplier handles the four multiplication instructions and stalls all pipeline registers while it computes. The JB Unit resolves jumps and branches for J-type and B-type instructions: it performs the comparison that decides whether a branch is taken, and it also receives the jump signal, so that the Hazard Detection Unit can decide whether the instructions already fetched by the IF stage must be flushed. The multiplexers select each operand's source according to the control signals from the ID stage and the feedback from the Forwarding Unit.
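The comparison performed by the JB Unit can be modeled as below. The funct3 encodings are those of the six RV32I B-type instructions; the function name `branch_taken` is hypothetical, and register values are passed as 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Reinterpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def branch_taken(funct3: int, rs1: int, rs2: int) -> bool:
    """Branch decision for BEQ/BNE/BLT/BGE/BLTU/BGEU, selected by funct3."""
    a_u, b_u = rs1 & MASK32, rs2 & MASK32
    a_s, b_s = to_signed(rs1), to_signed(rs2)
    decisions = {
        0b000: a_u == b_u,  # BEQ
        0b001: a_u != b_u,  # BNE
        0b100: a_s < b_s,   # BLT  (signed)
        0b101: a_s >= b_s,  # BGE  (signed)
        0b110: a_u < b_u,   # BLTU (unsigned)
        0b111: a_u >= b_u,  # BGEU (unsigned)
    }
    return decisions[funct3]
```

With rs1 = -5 and rs2 = 10, BLT is taken but BLTU is not, because 0xFFFFFFFB is a large unsigned number.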
MEM: This stage mainly handles Data Memory (DM) accesses, distinguishing between loads (reading DM) and stores (writing DM). The Save Control unit generates the write signals for SW, SH, and SB. The CSR Unit is also located in this stage; it counts the number of executed instructions and clock cycles.
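A sketch of what a store-control block computes: for SB, SH, and SW it selects the byte lanes to write and shifts the store data into the matching position. This assumes active-high byte enables and a 32-bit-wide memory; the real SRAM interface may use different (for example, active-low) write-enable signals, and the name `store_strobe` is hypothetical.

```python
from typing import Tuple


def store_strobe(funct3: int, addr: int, rs2: int) -> Tuple[int, int]:
    """Return (byte_enable, write_data) for SB/SH/SW on a 32-bit-wide memory."""
    offset = addr & 0b11  # byte position inside the addressed word
    if funct3 == 0b000:   # SB: one byte lane
        return 0b0001 << offset, (rs2 & 0xFF) << (8 * offset)
    if funct3 == 0b001:   # SH: two byte lanes (assumes a halfword-aligned address)
        return 0b0011 << offset, (rs2 & 0xFFFF) << (8 * offset)
    if funct3 == 0b010:   # SW: all four byte lanes
        return 0b1111, rs2 & 0xFFFFFFFF
    raise ValueError("not a store funct3")
```

For example, `sh x3, 4(x0)` with x3 = 0xF5 gives byte enable 0b0011 and data 0x000000F5, and `sb x5, 6(x0)` with x5 = 5 gives 0b0100 and 0x00050000; together they form the value 0x000500F5.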
WB: This stage selects whether the ALU result or the value loaded from DM is written back to the Register File. For a value from DM, the Load Sign Extend unit first extends it to 32 bits: sign extension for LB and LH, zero extension for LBU and LHU.
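The extension step can be sketched as below, assuming the memory returns a little-endian 32-bit word and the byte offset comes from the low two address bits. The function name `load_extend` is hypothetical; the funct3 values are the standard RV32I load encodings, and the result is the 32-bit register bit pattern.

```python
def load_extend(funct3: int, addr: int, word: int) -> int:
    """Select the addressed byte/halfword of a memory word and extend it to 32 bits."""
    shifted = (word & 0xFFFFFFFF) >> (8 * (addr & 0b11))
    if funct3 == 0b000:   # LB: sign-extend byte
        v = shifted & 0xFF
        return v | 0xFFFFFF00 if v & 0x80 else v
    if funct3 == 0b001:   # LH: sign-extend halfword
        v = shifted & 0xFFFF
        return v | 0xFFFF0000 if v & 0x8000 else v
    if funct3 == 0b010:   # LW: whole word
        return shifted & 0xFFFFFFFF
    if funct3 == 0b100:   # LBU: zero-extend byte
        return shifted & 0xFF
    if funct3 == 0b101:   # LHU: zero-extend halfword
        return shifted & 0xFFFF
    raise ValueError("not a load funct3")
```

For the word 0x000500F5 at byte address 4, LB from address 4 returns 0xFFFFFFF5 and LBU returns 0x000000F5.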
Forwarding Unit: Forwarding is needed when a source register of an instruction matches the destination register of an earlier instruction whose result has not yet been written back. This unit compares rd in the MEM and WB stages with rs1 and rs2 in the EXE stage. When they match, it signals the EXE stage to take the operand from the corresponding stage instead of from the Register File.
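The forwarding decision for one EXE-stage operand can be sketched as follows. The argument names are hypothetical; skipping x0 and giving the newer MEM-stage result priority over the older WB-stage result are the usual choices for this kind of unit and are assumed here rather than taken from our RTL.

```python
def forward_select(rs: int, mem_rd: int, mem_reg_write: bool,
                   wb_rd: int, wb_reg_write: bool) -> str:
    """Pick the source of one EXE operand: 'MEM', 'WB', or 'REG' (no forwarding)."""
    # x0 is hard-wired to zero, so a write to it must never be forwarded.
    if mem_reg_write and mem_rd != 0 and mem_rd == rs:
        return "MEM"  # newest value: result of the instruction now in MEM
    if wb_reg_write and wb_rd != 0 and wb_rd == rs:
        return "WB"   # older value: result of the instruction now in WB
    return "REG"      # value read from the Register File in ID is current
```

The unit would evaluate this once for rs1 and once for rs2, driving the two operand multiplexers in EXE.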
Hazard Detection: This unit tells the pipeline registers when to flush or stall. When a J-type or B-type instruction causes a jump (or a taken branch), the EXE stage signals this unit, which flushes IFtoID and IDtoEXE because the instructions in those stages were fetched from the wrong path and are not needed. For a load-use hazard, where an instruction needs the value of the preceding load but the DM data arrives one cycle late, the unit stalls the PC and IFtoID so that the dependent instruction waits for the DM data, and flushes IDtoEXE so that a nop (bubble) enters the EXE stage. Because multiplication takes more time, when the ALU operation is a multiplication the unit stalls all five registers: PC, IFtoID, IDtoEXE, EXEtoMEM, and MEMtoWB.
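The stall and flush rules above can be summarized as a small combinational function. This Python sketch is illustrative only: the signal names are hypothetical, the priority order (multiply stall, then jump flush, then load-use stall) is an assumption, and the multiply stall is modeled as a single flag without the cycle counting a real multiplier needs.

```python
from dataclasses import dataclass


@dataclass
class HazardSignals:
    """Stall/flush outputs for the PC and the four pipeline registers."""
    stall_pc: bool = False
    stall_if_id: bool = False
    stall_id_exe: bool = False
    stall_exe_mem: bool = False
    stall_mem_wb: bool = False
    flush_if_id: bool = False
    flush_id_exe: bool = False


def hazard_unit(jump_taken: bool, exe_is_load: bool, exe_rd: int,
                id_rs1: int, id_rs2: int, exe_is_mul: bool) -> HazardSignals:
    s = HazardSignals()
    if exe_is_mul:
        # Multiplication in EXE: freeze all five registers until it finishes.
        s.stall_pc = s.stall_if_id = s.stall_id_exe = True
        s.stall_exe_mem = s.stall_mem_wb = True
    elif jump_taken:
        # Discard the two wrong-path instructions fetched after the jump/branch.
        s.flush_if_id = s.flush_id_exe = True
    elif exe_is_load and exe_rd != 0 and exe_rd in (id_rs1, id_rs2):
        # Load-use: hold PC and IFtoID for one cycle, send a nop into EXE.
        # (Comparing rs2 for every instruction is conservative but safe.)
        s.stall_pc = s.stall_if_id = True
        s.flush_id_exe = True
    return s
```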
In our current design, we have implemented the basic structure of the RV32IMC architecture, which includes the base RV32I instructions as well as the M extension for multiplication and division. However, we did not implement the C extension, which defines the compressed instructions.
Why the "C" Extension Was Not Implemented:
The C extension provides 16-bit compressed encodings of common instructions to reduce code size, which can also improve performance in some scenarios. While this is advantageous for memory efficiency, implementing the C extension requires additional logic to handle compressed instructions, for example expanding each 16-bit instruction into its 32-bit equivalent during decode and fetching instructions that are aligned on 2-byte rather than 4-byte boundaries.
For this project, we chose to focus on the base RV32I instructions and the M extension without adding the compression logic. This decision simplified the implementation and avoided the extra complexity of decoding and managing compressed instructions.
In this section, we explain the SRAM memory model used in our design, including the SRAM wrapper module and how it interfaces with the actual SRAM memory.
SRAM Wrapper Overview:
The SRAM_wrapper module serves as an interface to the underlying SRAM module. It takes various control signals as inputs and passes them to the internal SRAM instance to manage memory operations.
Internal SRAM Instance:
The SRAM_wrapper module instantiates the SRAM module and passes the control and data signals through to it, so every memory read and write is carried out by this internal SRAM instance.
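Because the wrapper only forwards signals, the memory behavior comes from the SRAM itself. The Python model below assumes what the Hazard Detection description implies, namely that read data appears one clock cycle after the address is presented, and it additionally assumes active-high per-byte write enables, word addressing, and a configurable depth. The class and port names (`SyncSRAM`, `chip_enable`, `byte_enable`) are hypothetical and do not correspond to the actual ports of SRAM.sv.

```python
MASK32 = 0xFFFFFFFF


class SyncSRAM:
    """Hypothetical cycle model: registered (one-cycle) read, byte-enable write."""

    def __init__(self, depth_words: int = 16384) -> None:
        self.mem = [0] * depth_words
        self.dout = 0  # registered read data, updated only on a clock edge

    def clock(self, chip_enable: bool, byte_enable: int, addr: int, din: int) -> int:
        """Apply one rising edge. `addr` is a word index; returns the registered output."""
        if chip_enable:
            index = addr % len(self.mem)  # assumption: out-of-range addresses wrap
            if byte_enable:
                word = self.mem[index]
                for lane in range(4):
                    if byte_enable & (1 << lane):
                        lane_mask = 0xFF << (8 * lane)
                        word = (word & ~lane_mask) | (din & lane_mask)
                self.mem[index] = word & MASK32
                # assumption: the output register keeps its old value during a write
            else:
                self.dout = self.mem[index]
        return self.dout
```

For example, writing lanes 0b0011 with 0x000000F5 and then lane 0b0100 with 0x00050000 to word 1, followed by a read of word 1, returns 0x000500F5 from that read edge onward.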
This section describes how to set up the simulation environment for our 5-stage pipelined RISC-V (RV32IMC) processor using ModelSim - Intel FPGA Starter Edition.
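1. Create a working library (for example, `vlib work`).
2. Compile the SystemVerilog files (for example, `vlog -sv` followed by the source file names).
3. Run the simulation (for example, `vsim` with the testbench as the top-level module, then `run -all`).
4. View the waveform to monitor signals (for example, `add wave`).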
First, we use a sim.do script to compile our design in ModelSim and start the simulation by entering `do sim.do` in the ModelSim terminal.
Second, we explain our testbench, cpu_tb.sv:
- It runs a RISC-V program stored in hex format in test.mem.
- It simulates the CPU together with its instruction and data memories.
- At the end, it prints the final state of all registers and of data memory.
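Testbenches commonly load such a file with the SystemVerilog `$readmemh` task. Assuming test.mem holds one 32-bit hex word per whitespace-separated token (optionally with `//` comments), a small Python helper like the hypothetical `load_mem_file` below can read the same file to inspect the program outside the simulator. It deliberately rejects `@address` directives rather than guessing how they are used.

```python
from typing import List


def load_mem_file(text: str) -> List[int]:
    """Parse readmemh-style text: one 32-bit hex word per token (assumed format)."""
    words = []
    for line in text.splitlines():
        line = line.split("//", 1)[0]  # drop end-of-line comments
        for token in line.split():
            if token.startswith("@"):
                raise ValueError("address directives are not handled in this sketch")
            # Verilog hex literals may contain '_' as a digit separator.
            words.append(int(token.replace("_", ""), 16) & 0xFFFFFFFF)
    return words
```

For example, `load_mem_file(open("test.mem").read())` returns the instruction words in file order.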
Because of our memory model (SRAM.sv and SRAM_wrapper.sv), we do not support the .data directive, and data memory starts at address 0. Our first test program, which sorts an array placed directly at address 0, is an example.
As the output shows, the array [5, 2, 7, 1, 3] is sorted into [1, 2, 3, 5, 7].
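To check a memory dump like this one automatically, a golden model can compute the expected final array. The sketch below uses bubble sort on signed 32-bit words; the sorting algorithm used by our assembly program is not shown here, so bubble sort is an assumption, and only the final memory contents are comparable. `bubble_sort_words` is a hypothetical helper.

```python
from typing import List


def bubble_sort_words(mem: List[int], n: int) -> List[int]:
    """Golden model: sort the first n signed 32-bit words of a memory image in place."""
    def s32(x: int) -> int:
        x &= 0xFFFFFFFF
        return x - (1 << 32) if x & 0x80000000 else x

    for i in range(n - 1):
        # After pass i, the largest i + 1 values sit at the end of the array.
        for j in range(n - 1 - i):
            if s32(mem[j]) > s32(mem[j + 1]):
                mem[j], mem[j + 1] = mem[j + 1], mem[j]
    return mem
```

For example, `bubble_sort_words([5, 2, 7, 1, 3], 5)` returns `[1, 2, 3, 5, 7]`, the expected final state of the first five data words.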
In this second program, we compute the dot product of two three-element arrays, A and B: 3*5 + (-4)*6 + 2*7 = 15 - 24 + 14 = 5, which is stored in Word[6].
Address | Value (Hex) | Value (Decimal) | Description |
---|---|---|---|
Word[ 0] | 0x00000003 | 3 | Array A[0]. |
Word[ 1] | 0xFFFFFFFC | -4 | Array A[1]. |
Word[ 2] | 0x00000002 | 2 | Array A[2]. |
Word[ 3] | 0x00000005 | 5 | Array B[0]. |
Word[ 4] | 0x00000006 | 6 | Array B[1]. |
Word[ 5] | 0x00000007 | 7 | Array B[2]. |
Word[ 6] | 0x00000005 | 5 | Final dot product result. |
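The expected value of Word[6] can be reproduced with a short reference model that reads the memory image from the table above and wraps the accumulation to 32 bits, as MUL and ADD do in hardware. `dot_product` and its arguments are hypothetical names.

```python
from typing import List

MASK32 = 0xFFFFFFFF


def s32(x: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def dot_product(mem: List[int], a_base: int, b_base: int, n: int) -> int:
    """Sum of A[i] * B[i] over word indices, wrapped to 32 bits after each step."""
    acc = 0
    for i in range(n):
        acc = (acc + s32(mem[a_base + i]) * s32(mem[b_base + i])) & MASK32
    return acc


mem = [0x00000003, 0xFFFFFFFC, 0x00000002, 0x00000005, 0x00000006, 0x00000007]
```

With the words above, `dot_product(mem, 0, 3, 3)` returns `5`, matching Word[6].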
In this program, we test the four multiplication instructions: MUL, MULH, MULHSU, and MULHU.
Register | Value (Hex) | Value (Decimal) | Description |
---|---|---|---|
x1 | 0x80000000 | -2147483648 | From lui x1, 0x80000 |
x2 | 0xFFFFFFFE | -2 | From addi x2, x0, -2 |
x3 | 0x07FFF7FF | 134215679 | From lui + addi combination |
x4 | 0x00000000 | 0 | Lower 32 bits of x1 * x2 |
x5 | 0x00000001 | 1 | Upper 32 bits (signed × signed) |
x6 | 0x80000001 | -2147483647 | Upper 32 bits (signed × unsigned) |
x7 | 0x7FFFFFFF | 2147483647 | Upper 32 bits (unsigned × unsigned) |
Address | Value (Hex) | Description |
---|---|---|
0x00 | 0x00000000 | MUL result (lower 32 bits) |
0x04 | 0x00000001 | MULH result (signed × signed) |
0x08 | 0x80000001 | MULHSU result (signed × unsigned) |
0x0C | 0x7FFFFFFF | MULHU result (unsigned × unsigned) |
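The four results can be cross-checked against the RV32M definitions. The model below assumes x4 to x7 are computed from the operands x1 and x2 (the values in the table are consistent with that); `rv32m_mul` is a hypothetical name. Python integers have arbitrary precision and `>>` is an arithmetic shift, so the upper half of the 64-bit product comes out directly.

```python
MASK32 = 0xFFFFFFFF


def s32(x: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def rv32m_mul(op: str, rs1: int, rs2: int) -> int:
    """Return the 32-bit destination value of MUL, MULH, MULHSU, or MULHU."""
    if op == "MUL":     # low 32 bits (same for signed and unsigned)
        return (s32(rs1) * s32(rs2)) & MASK32
    if op == "MULH":    # signed x signed, high 32 bits
        return ((s32(rs1) * s32(rs2)) >> 32) & MASK32
    if op == "MULHSU":  # signed rs1 x unsigned rs2, high 32 bits
        return ((s32(rs1) * (rs2 & MASK32)) >> 32) & MASK32
    if op == "MULHU":   # unsigned x unsigned, high 32 bits
        return (((rs1 & MASK32) * (rs2 & MASK32)) >> 32) & MASK32
    raise ValueError(f"unknown op {op}")
```

With x1 = 0x80000000 and x2 = 0xFFFFFFFE, the four calls return 0x00000000, 0x00000001, 0x80000001, and 0x7FFFFFFF, matching addresses 0x00 to 0x0C above.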
In this program, we verify the hazard detection and forwarding logic of our CPU. To do so, we wrote a program with many data hazards, including load-use hazards.
Registers
Register | Value (Hex) | Value (Decimal) | Description |
---|---|---|---|
x0 | 0x00000000 | 0 | Zero register (always 0). |
x1 | 0x0000000A | 10 | Initialized value. |
x2 | 0x00000014 | 20 | Initialized value. |
x3 | 0x00000005 | 5 | Initialized value. |
x4 | 0x00000003 | 3 | Initialized value. |
x5 | 0x0000000A | 10 | Loaded from byte address 0 (Word[0]). |
x6 | 0x0000001E | 30 | Result of `x5 + x2`. |
x7 | 0x00000014 | 20 | Loaded from byte address 4 (Word[1]). |
x8 | 0x0000000F | 15 | Result of `x7 - x3`. |
x9 | 0x00000005 | 5 | Loaded from byte address 8 (Word[2]). |
x10 | 0x00000096 | 150 | Result of `x9 * x6`. |
x11 | 0x000000A5 | 165 | Result of `x10 + x8`. |
x12 | 0x000001EF | 495 | Result of `x11 * x4`. |
x13 | 0x000001F9 | 505 | Result of `x12 + x5`. |
x14 | 0x000001F4 | 500 | Result of `x13 - x9`. |
Data Memory
Address | Value (Hex) | Value (Decimal) | Description |
---|---|---|---|
Word[0] | 0x0000000A | 10 | Stored value of `x1`. |
Word[1] | 0x00000014 | 20 | Stored value of `x2`. |
Word[2] | 0x00000005 | 5 | Stored value of `x3`. |
Word[3] | 0x00000003 | 3 | Stored value of `x4`. |
Word[4-15] | 0x00000000 | 0 | Placeholder for unused memory. |
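A plain sequential replay of the operations listed in the register table gives the values that an unpipelined, hazard-free execution produces, which is exactly what correct forwarding and stalling must reproduce. The instruction order, and the assumption that the four stores happen before the loads, are reconstructed from the table descriptions; `replay_hazard_program` is a hypothetical helper.

```python
from typing import Dict


def replay_hazard_program() -> Dict[str, int]:
    """Sequentially replay the register-table operations (no pipeline, no hazards)."""
    r = {"x1": 10, "x2": 20, "x3": 5, "x4": 3}
    mem = {}  # byte address -> 32-bit word
    for i, name in enumerate(["x1", "x2", "x3", "x4"]):
        mem[4 * i] = r[name]               # sw into Word[0..3]
    r["x5"] = mem[0]                       # lw; x6 depends on it (load-use)
    r["x6"] = r["x5"] + r["x2"]
    r["x7"] = mem[4]                       # lw; x8 depends on it
    r["x8"] = r["x7"] - r["x3"]
    r["x9"] = mem[8]                       # lw; x10 depends on it
    r["x10"] = r["x9"] * r["x6"]
    r["x11"] = r["x10"] + r["x8"]
    r["x12"] = r["x11"] * r["x4"]
    r["x13"] = r["x12"] + r["x5"]
    r["x14"] = r["x13"] - r["x9"]
    return {name: value & 0xFFFFFFFF for name, value in r.items()}
```

Comparing this dictionary with the simulator's final register dump flags any forwarding or stall error.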
In this program, we test I-type and S-type instructions, especially those that operate on bytes and halfwords.
Register | Value (Hex) | Value (Decimal) | Explanation |
---|---|---|---|
x0 | 0x00000000 | 0 | Zero register (always 0). |
x1 | 0x0000000A | 10 | `addi x1, x0, 10`. |
x2 | 0xFFFFFFFB | -5 | `addi x2, x0, -5`. |
x3 | 0x000000F5 | 245 | `xori x3, x1, 0xFF` → `10 ^ 0xFF = 0xF5`. |
x4 | 0xFFFFFFFF | -1 | `ori x4, x2, 0x7F` → `-5 \| 0x7F = -1`. |
x5 | 0x00000005 | 5 | `andi x5, x3, 0xF` → `0xF5 & 0xF = 0x5`. |
x6 | 0x00000028 | 40 | `slli x6, x1, 2` → `10 << 2 = 40`. |
x7 | 0x0000007A | 122 | `srli x7, x3, 1` → `0xF5 >> 1 = 0x7A`. |
x8 | 0xFFFFFFFD | -3 | `srai x8, x2, 1` → `-5 >> 1 = -3` (arithmetic shift keeps the sign). |
x9 | 0x00000001 | 1 | `slti x9, x2, 0` → `(-5 < 0) ? 1 : 0`. |
x10 | 0x00000001 | 1 | `sltiu x10, x1, 20` → `(10 < 20) ? 1 : 0`. |
x11 | 0x0000000A | 10 | `lw x11, 0(x0)` → loads the word at mem[0]. |
x12 | 0x000000F5 | 245 | `lh x12, 4(x0)` → signed halfword from mem[4]. |
x13 | 0x000000F5 | 245 | `lhu x13, 4(x0)` → unsigned halfword from mem[4]. |
x14 | 0xFFFFFFF5 | -11 | `lb x14, 6(x0)` → signed byte from mem[6]. |
x15 | 0x000000F5 | 5 | `lbu x15, 6(x0)` → unsigned byte from mem[6]. |
Word Address | Value (Hex) | Explanation |
---|---|---|
Word[ 0] | 0x0000000A | Stored by `sw x1, 0(x0)`. |
Word[ 1] | 0x000500F5 | Combined result of `sh x3, 4(x0)` and `sb x5, 6(x0)`. |
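The immediate ALU rows (x3 to x10) can be reproduced with a short model, which makes the difference between logical and arithmetic shifts and between signed and unsigned comparisons explicit. `itype_results` is a hypothetical helper; all values are returned as 32-bit register bit patterns.

```python
from typing import Dict

MASK32 = 0xFFFFFFFF


def s32(x: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def itype_results() -> Dict[str, int]:
    """Expected register bit patterns for the immediate ALU instructions."""
    x1 = 10                    # addi x1, x0, 10
    x2 = -5 & MASK32           # addi x2, x0, -5
    x3 = x1 ^ 0xFF             # xori
    return {
        "x3": x3,
        "x4": x2 | 0x7F,                # ori
        "x5": x3 & 0xF,                 # andi
        "x6": (x1 << 2) & MASK32,       # slli
        "x7": x3 >> 1,                  # srli: logical shift of the bit pattern
        "x8": (s32(x2) >> 1) & MASK32,  # srai: arithmetic shift keeps the sign
        "x9": int(s32(x2) < 0),         # slti: signed compare
        "x10": int(x1 < 20),            # sltiu: unsigned compare
    }
```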
### Program6
In this program, we test U-type and B-type instructions.
Register | Value (Hex) | Value (Decimal) | Description |
---|---|---|---|
x1 | 0x12345000 | 305418240 | Upper 20 bits loaded by lui |
x2 | 0x01000004 | 16777220 | PC + 0x1000000 from auipc |
x3 | 0xFFFFFFFB | -5 | Negative value for comparison |
x4 | 0x0000000A | 10 | Positive value for comparison |
x5 | 0x0000000A | 10 | Shows signed branch was taken |
x6 | 0x0000001E | 30 | Shows unsigned branch not taken |
Address | Value (Hex) | Value (Decimal) | Description |
---|---|---|---|
0x00 | 0x00000001 | 1 | Initial marker |
0x04 | 0x0000000A | 10 | Shows signed path taken |
0x08 | 0x0000001E | 30 | Shows unsigned path not taken |
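The two U-type results can be checked with the model below. The lui immediate 0x12345 follows from x1, while the auipc immediate 0x1000 and the PC value 0x4 are inferred from the result 0x01000004 rather than taken from the program listing; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lui(imm20: int) -> int:
    """rd = imm20 << 12 (the low 12 bits are zero)."""
    return (imm20 << 12) & MASK32


def auipc(pc: int, imm20: int) -> int:
    """rd = pc + (imm20 << 12), wrapped to 32 bits."""
    return (pc + (imm20 << 12)) & MASK32
```

`lui(0x12345)` returns 0x12345000 and `auipc(0x4, 0x1000)` returns 0x01000004, matching x1 and x2.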