蔡承璋, 張恩祥
In this project, we write the RTL code for a pipelined CPU. Doing so required us to become familiar with the RISC-V instruction set architecture and to implement 45 instructions. The required instructions cover the R-type, I-type, S-type, B-type, U-type, and J-type formats, which let us revisit what we learned in computer organization.
The CPU is organized into five pipeline stages: IF (Instruction Fetch), ID (Instruction Decode and Register File Read), EXE (Execution or Address Calculation), MEM (Data Memory Access), and WB (Write Back).
IF: This stage includes the PC Register (Program Counter) and IM (Instruction Memory). In this stage, the PC increments by 4 on each clock cycle and passes the next instruction's address to IM. IM, which is an SRAM, retrieves the instruction and passes it to the ID stage in the next clock cycle. When encountering special instructions like J-type or B-type instructions, the PC waits for the new address to be computed before sending the instruction address to IM.
ID: This stage consists mainly of the Control Unit (decoder), Register File, and Immediate Unit. The Control Unit decodes the 32-bit instruction fetched from IM and generates the control signals for the later stages. The stage also reads rs1 and rs2 from the Register File for use in later calculations. The Immediate Unit extracts the immediate value from the instruction according to its format and passes it on for subsequent operations.
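As an illustration of what the Immediate Unit has to compute, the following is a minimal Python sketch of the standard RV32I immediate layouts. The function names are hypothetical and are not taken from Immediate_Unit.sv. The sample words in the usage note come from the test programs later in this document.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def imm_i(inst: int) -> int:
    # I-type: imm[11:0] = inst[31:20]
    return sign_extend(inst >> 20, 12)


def imm_s(inst: int) -> int:
    # S-type: imm[11:5] = inst[31:25], imm[4:0] = inst[11:7]
    return sign_extend(((inst >> 25) << 5) | ((inst >> 7) & 0x1F), 12)


def imm_b(inst: int) -> int:
    # B-type: imm[12|10:5] = inst[31|30:25], imm[4:1|11] = inst[11:8|7]
    imm = (((inst >> 31) & 0x1) << 12) | (((inst >> 7) & 0x1) << 11) \
        | (((inst >> 25) & 0x3F) << 5) | (((inst >> 8) & 0xF) << 1)
    return sign_extend(imm, 13)


def imm_u(inst: int) -> int:
    # U-type: imm[31:12] = inst[31:12], low 12 bits are zero
    return inst & 0xFFFFF000


def imm_j(inst: int) -> int:
    # J-type: imm[20|10:1|11|19:12] = inst[31|30:21|20|19:12]
    imm = (((inst >> 31) & 0x1) << 20) | (((inst >> 12) & 0xFF) << 12) \
        | (((inst >> 20) & 0x1) << 11) | (((inst >> 21) & 0x3FF) << 1)
    return sign_extend(imm, 21)
```

For example, `imm_b(0x02218e63)` gives the 60-byte forward offset of `beq x3, x2, done` in the bubble sort program, and `imm_j(0xfe1ff06f)` gives the -32 offset of its `jal x0, inner`.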
EXE: This stage contains the ALU (Arithmetic Logic Unit), the Multiplier, the JB Unit, and several Mux units. The ALU performs basic operations such as addition, subtraction, shifting, comparison, OR, and AND. The Multiplier handles the four types of multiplication and stalls all pipeline registers while it works. The Mux units select operand sources based on control signals from the ID stage and feedback from the Forwarding Unit. The JB Unit handles B-type and J-type instructions: it performs the comparison that decides whether a branch is taken, and its jump signal is also routed to the Hazard Detection Unit, which uses it to decide whether the IF stage must be flushed.
MEM: This stage handles Data Memory (DM) accesses, distinguishing between loads from DM and stores to DM. The Save Control unit generates the write signals for SW, SH, and SB. The CSR Unit also sits in this stage and counts the number of instructions and cycles.
WB: This stage determines whether to write the ALU results or the value loaded from DM back to the Register File. If it's the value from DM, a Load Sign Extend operation is performed.
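The Load Sign Extend step can be sketched in Python as follows. This is a minimal model that assumes the loaded byte or halfword has already been right-aligned; the function name is hypothetical and the code is not derived from LoadSignExtend.sv. The funct3 values are the standard RV32I load encodings.

```python
def load_extend(raw: int, funct3: int) -> int:
    """Extend a right-aligned load value to 32 bits according to funct3."""
    # funct3 -> (width in bits, sign-extend?)
    formats = {
        0b000: (8, True),    # LB
        0b001: (16, True),   # LH
        0b010: (32, False),  # LW (already full width)
        0b100: (8, False),   # LBU
        0b101: (16, False),  # LHU
    }
    bits, signed = formats[funct3]
    value = raw & ((1 << bits) - 1)
    if signed and (value >> (bits - 1)) & 1:
        # Copy the sign bit into all upper bits of the 32-bit result
        value |= 0xFFFFFFFF & ~((1 << bits) - 1)
    return value
```

With this model, a byte of 0xF5 becomes 0xFFFFFFF5 for LB and stays 0x000000F5 for LBU.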
Forwarding Unit: Forwarding is needed when a source register of an instruction matches the destination register of an earlier instruction that has not yet been written back. The unit compares rd in the MEM and WB stages against rs1 and rs2 in the EXE stage. When there is a match, it signals the EXE stage which stage's value to forward.
module Forwarding_Unit(forwardA, forwardB, RegAddr1, RegAddr2, MEM_RdAddr, WB_RdAddr, MEM_RegWrite, WB_RegWrite);
    input MEM_RegWrite, WB_RegWrite;
    input [4:0] RegAddr1, RegAddr2, MEM_RdAddr, WB_RdAddr;
    output logic [1:0] forwardA, forwardB;

    always_comb begin
        // MEM-stage rd matches rs1; rs2 may still be forwarded from WB
        priority if (MEM_RegWrite && MEM_RdAddr == RegAddr1) begin
            if (WB_RegWrite && WB_RdAddr == RegAddr2) begin
                forwardA = `MEMRDA;
                forwardB = `WBRDB;
            end
            else begin
                forwardA = `MEMRDA;
                forwardB = `NOFB;
            end
        end
        // MEM-stage rd matches rs2; rs1 may still be forwarded from WB
        else if (MEM_RegWrite && MEM_RdAddr == RegAddr2) begin
            if (WB_RegWrite && WB_RdAddr == RegAddr1) begin
                forwardA = `WBRDA;
                forwardB = `MEMRDB;
            end
            else begin
                forwardA = `NOFA;
                forwardB = `MEMRDB;
            end
        end
        // Only the WB-stage rd matches rs1
        else if (WB_RegWrite && WB_RdAddr == RegAddr1) begin
            forwardA = `WBRDA;
            forwardB = `NOFB;
        end
        // Only the WB-stage rd matches rs2
        else if (WB_RegWrite && WB_RdAddr == RegAddr2) begin
            forwardA = `NOFA;
            forwardB = `WBRDB;
        end
        // No forwarding needed
        else begin
            forwardA = `NOFA;
            forwardB = `NOFB;
        end
    end
endmodule
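For comparison, the forwarding decision can also be modelled one operand at a time. The Python sketch below is an alternative formulation and not a translation of the module above: the string labels stand in for the MEMRDA/WBRDA/NOFA style macros, and the x0 check is an assumption borrowed from common textbook designs. Selecting each operand independently also covers the case where rs1 and rs2 both match the MEM-stage rd.

```python
NO_FWD, FROM_MEM, FROM_WB = "none", "mem", "wb"  # hypothetical labels


def select_forward(rs, mem_rd, mem_reg_write, wb_rd, wb_reg_write):
    """Pick the forwarding source for one EXE-stage source register."""
    if rs == 0:
        # Assumption: x0 always reads as zero, so it is never forwarded
        return NO_FWD
    if mem_reg_write and mem_rd == rs:
        # The MEM stage holds the most recent value, so it takes priority
        return FROM_MEM
    if wb_reg_write and wb_rd == rs:
        return FROM_WB
    return NO_FWD


def forwarding_unit(rs1, rs2, mem_rd, mem_reg_write, wb_rd, wb_reg_write):
    """Return the (forwardA, forwardB) selections for rs1 and rs2."""
    forward_a = select_forward(rs1, mem_rd, mem_reg_write, wb_rd, wb_reg_write)
    forward_b = select_forward(rs2, mem_rd, mem_reg_write, wb_rd, wb_reg_write)
    return forward_a, forward_b
```

For instance, `forwarding_unit(3, 3, 3, True, 0, False)` models an instruction such as `add x4, x3, x3` directly after a write to x3 and selects the MEM-stage value for both operands.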
Hazard Detection: This unit tells the pipeline registers when to flush or stall. When a J-type or B-type instruction changes the program flow, the EXE stage signals this unit, which sends flush signals to IFtoID and IDtoEXE because the instructions already fetched are no longer needed. When a load is followed by an instruction that needs the loaded value, the DM data arrives one cycle late, so the unit stalls IFtoID and the PC until the data is available and flushes IDtoEXE so that a nop enters the EXE stage. Because multiplication takes longer, when the ALU operation is a multiplication the stall signals go to all five registers: PC, IFtoID, IDtoEXE, EXEtoMEM, and MEMtoWB.
module HazardDetection(RegAddr1, RegAddr2, pc_stall, IF_flush, IF_stall, ID_stall, ID_flush, BranchorJump, RdAddr, MemRead);
    input [4:0] RegAddr1, RegAddr2, RdAddr;
    input MemRead;
    input BranchorJump;
    output logic pc_stall, IF_stall, IF_flush, ID_stall, ID_flush;

    always_comb begin
        if (MemRead && (RdAddr == RegAddr1 || RdAddr == RegAddr2)) begin // load-use data hazard
            pc_stall = 1'b1;
            IF_stall = 1'b1;
            IF_flush = 1'b0;
            ID_stall = 1'b0;
            ID_flush = 1'b1;
        end
        else if (BranchorJump) begin // taken branch or jump: flush IFtoID and IDtoEXE
            pc_stall = 1'b0;
            IF_stall = 1'b0;
            IF_flush = 1'b1;
            ID_stall = 1'b0;
            ID_flush = 1'b1;
        end
        else begin // no hazard
            pc_stall = 1'b0;
            IF_stall = 1'b0;
            IF_flush = 1'b0;
            ID_stall = 1'b0;
            ID_flush = 1'b0;
        end
    end
endmodule
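The priority between the two hazard cases can be checked quickly with a behavioural model. The Python sketch below mirrors the three branches of the module above; the function and tuple names are hypothetical, and the multiplication stall described in the text is not modelled because it does not appear in this module's ports.

```python
from typing import NamedTuple


class HazardSignals(NamedTuple):
    pc_stall: bool
    IF_stall: bool
    IF_flush: bool
    ID_stall: bool
    ID_flush: bool


def hazard_detection(mem_read, rd, rs1, rs2, branch_or_jump):
    """Return the stall/flush signals for one cycle."""
    if mem_read and rd in (rs1, rs2):
        # Load-use hazard: hold PC and IFtoID, insert a bubble into IDtoEXE
        return HazardSignals(True, True, False, False, True)
    if branch_or_jump:
        # Control hazard: discard the two instructions fetched after the branch
        return HazardSignals(False, False, True, False, True)
    return HazardSignals(False, False, False, False, False)
```

For example, `hazard_detection(True, 5, 5, 2, False)` models `lw x5, 0(x0)` followed by `add x6, x5, x2` and asserts `pc_stall`, `IF_stall`, and `ID_flush`.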
Our design targets the RV32IMC architecture. So far we have implemented the base RV32I instructions and the M extension for multiplication and division. We have not implemented the C extension, which defines the compressed instructions.
Why the "C" Extension Was Not Implemented:
The C extension compresses instructions to reduce code size, which can also improve performance in some scenarios. While this helps memory efficiency, implementing it requires additional logic to handle compressed instructions.
For this project, we chose to focus on the core RV32IM features without adding the compression logic. This decision simplifies the implementation and avoids the extra complexity of decoding and managing compressed instructions.
In this section, we explain the SRAM memory model used in our design, including the SRAM wrapper module and how it interfaces with the actual SRAM memory.
SRAM Wrapper Overview:
The SRAM_wrapper module serves as an interface to the underlying SRAM module. It takes various control signals as inputs and passes them to the internal SRAM instance to manage memory operations.
Internal SRAM Instance:
The SRAM_wrapper module instantiates the SRAM module and connects each bit of the address, data, and write-enable buses to the corresponding SRAM pin:
module SRAM_wrapper (
input CK,
input CS,
input OE,
input [3:0] WEB,
input [13:0] A,
input [31:0] DI,
output [31:0] DO
);
SRAM i_SRAM (
.A0 (A[0] ),
.A1 (A[1] ),
.A2 (A[2] ),
.A3 (A[3] ),
.A4 (A[4] ),
.A5 (A[5] ),
.A6 (A[6] ),
.A7 (A[7] ),
.A8 (A[8] ),
.A9 (A[9] ),
.A10 (A[10] ),
.A11 (A[11] ),
.A12 (A[12] ),
.A13 (A[13] ),
.DO0 (DO[0] ),
.DO1 (DO[1] ),
.DO2 (DO[2] ),
.DO3 (DO[3] ),
.DO4 (DO[4] ),
.DO5 (DO[5] ),
.DO6 (DO[6] ),
.DO7 (DO[7] ),
.DO8 (DO[8] ),
.DO9 (DO[9] ),
.DO10 (DO[10]),
.DO11 (DO[11]),
.DO12 (DO[12]),
.DO13 (DO[13]),
.DO14 (DO[14]),
.DO15 (DO[15]),
.DO16 (DO[16]),
.DO17 (DO[17]),
.DO18 (DO[18]),
.DO19 (DO[19]),
.DO20 (DO[20]),
.DO21 (DO[21]),
.DO22 (DO[22]),
.DO23 (DO[23]),
.DO24 (DO[24]),
.DO25 (DO[25]),
.DO26 (DO[26]),
.DO27 (DO[27]),
.DO28 (DO[28]),
.DO29 (DO[29]),
.DO30 (DO[30]),
.DO31 (DO[31]),
.DI0 (DI[0] ),
.DI1 (DI[1] ),
.DI2 (DI[2] ),
.DI3 (DI[3] ),
.DI4 (DI[4] ),
.DI5 (DI[5] ),
.DI6 (DI[6] ),
.DI7 (DI[7] ),
.DI8 (DI[8] ),
.DI9 (DI[9] ),
.DI10 (DI[10]),
.DI11 (DI[11]),
.DI12 (DI[12]),
.DI13 (DI[13]),
.DI14 (DI[14]),
.DI15 (DI[15]),
.DI16 (DI[16]),
.DI17 (DI[17]),
.DI18 (DI[18]),
.DI19 (DI[19]),
.DI20 (DI[20]),
.DI21 (DI[21]),
.DI22 (DI[22]),
.DI23 (DI[23]),
.DI24 (DI[24]),
.DI25 (DI[25]),
.DI26 (DI[26]),
.DI27 (DI[27]),
.DI28 (DI[28]),
.DI29 (DI[29]),
.DI30 (DI[30]),
.DI31 (DI[31]),
.CK (CK ),
.WEB0 (WEB[0]),
.WEB1 (WEB[1]),
.WEB2 (WEB[2]),
.WEB3 (WEB[3]),
.OE (OE ),
.CS (CS )
);
endmodule
This section describes how to set up the simulation environment for the 5-stage pipelined RV32IMC RISC-V processor design using ModelSim - Intel FPGA Starter Edition.
Create a working library:
vlib work
vmap work <path_to_work_directory>
Compile the SystemVerilog files:
vlog -sv HazardDetection.sv
Run the Simulation:
vsim work.HazardDetection
View the waveform to monitor signals:
add wave -position end sim:/HazardDetection/*
First, we use the sim.do script below in ModelSim to compile the design and start the simulation. Run it by entering "do sim.do" in the ModelSim terminal.
# Create work library
vlib work
# Compile design files in correct order
vlog -sv parameter_define.sv
vlog -sv PCReg.sv
vlog -sv IFtoID.sv
vlog -sv RegFile.sv
vlog -sv Control_Unit.sv
vlog -sv Immediate_Unit.sv
vlog -sv IDtoEXE.sv
vlog -sv ALU.sv
vlog -sv MUX.sv
vlog -sv MUX3.sv
vlog -sv ConditionChecker.sv
vlog -sv EXEtoMEM.sv
vlog -sv MEMtoWB.sv
vlog -sv Forwarding_Unit.sv
vlog -sv LoadSignExtend.sv
vlog -sv SaveControl.sv
vlog -sv HazardDetection.sv
vlog -sv CSR.sv
vlog -sv SRAM.sv
vlog -sv SRAM_wrapper.sv
vlog -sv CPU.sv
vlog -sv cpu_tb.sv
# Start simulation
vsim -c cpu_tb
# Run simulation
run -all
Second, we explain our testbench program, cpu_tb.sv.
module cpu_tb();
// Clock and reset signals
logic clk;
logic rst;
// Memory interface signals
logic IM_cs, DM_cs;
logic IM_oe, DM_oe;
logic [3:0] IM_web, DM_web;
logic [13:0] IM_addr, DM_addr;
logic [31:0] IM_datain, IM_dataout;
logic [31:0] DM_datain, DM_dataout;
// File reading variables
int i;
int file;
string line;
logic [31:0] instruction;
// Helper function to get memory word
function logic [31:0] get_mem_word(input int addr);
return {DM1.i_SRAM.Memory_byte3[addr],
DM1.i_SRAM.Memory_byte2[addr],
DM1.i_SRAM.Memory_byte1[addr],
DM1.i_SRAM.Memory_byte0[addr]};
endfunction
// Helper function to convert hex character to 4-bit value
function logic [3:0] hex_to_4bit(input byte hex_char);
if (hex_char >= "0" && hex_char <= "9")
return hex_char - "0";
else if (hex_char >= "a" && hex_char <= "f")
return hex_char - "a" + 10;
else if (hex_char >= "A" && hex_char <= "F")
return hex_char - "A" + 10;
else
return 4'h0;
endfunction
// Helper function to print registers
function void print_registers();
$display("\nRegister File Contents:");
$display("----------------------");
for (int i = 0; i < 32; i += 4) begin
$display("x%-2d: %8h x%-2d: %8h x%-2d: %8h x%-2d: %8h",
i, cpu_inst.RF.RegMem[i],
i+1, cpu_inst.RF.RegMem[i+1],
i+2, cpu_inst.RF.RegMem[i+2],
i+3, cpu_inst.RF.RegMem[i+3]);
end
endfunction
// Helper function to print data memory contents
function void print_data_mem();
$display("\nData Memory Contents:");
$display("--------------------");
for (int i = 0; i < 16; i += 4) begin
$display("Word[%2d]: %8h Word[%2d]: %8h Word[%2d]: %8h Word[%2d]: %8h",
i, get_mem_word(i),
i+1, get_mem_word(i+1),
i+2, get_mem_word(i+2),
i+3, get_mem_word(i+3));
end
endfunction
// Clock generation
initial begin
clk = 0;
forever #5 clk = ~clk;
end
// Reset generation
initial begin
rst = 1;
#20 rst = 0;
end
// CPU instantiation
CPU cpu_inst (
.clk(clk),
.rst(rst),
.IM_cs(IM_cs),
.DM_cs(DM_cs),
.IM_oe(IM_oe),
.DM_oe(DM_oe),
.IM_web(IM_web),
.DM_web(DM_web),
.IM_addr(IM_addr),
.DM_addr(DM_addr),
.IM_datain(IM_datain),
.IM_dataout(IM_dataout),
.DM_datain(DM_datain),
.DM_dataout(DM_dataout)
);
// Instruction Memory
SRAM_wrapper IM1 (
.CK(clk),
.CS(IM_cs),
.OE(IM_oe),
.WEB(IM_web),
.A(IM_addr),
.DI(IM_datain),
.DO(IM_dataout)
);
// Data Memory
SRAM_wrapper DM1 (
.CK(clk),
.CS(DM_cs),
.OE(DM_oe),
.WEB(DM_web),
.A(DM_addr),
.DI(DM_datain),
.DO(DM_dataout)
);
// Main test sequence
initial begin
i = 0;
// Initialize memories
for (int idx = 0; idx < 16384; idx++) begin
// Initialize both memories to 0
IM1.i_SRAM.Memory_byte0[idx] = 8'h00;
IM1.i_SRAM.Memory_byte1[idx] = 8'h00;
IM1.i_SRAM.Memory_byte2[idx] = 8'h00;
IM1.i_SRAM.Memory_byte3[idx] = 8'h00;
DM1.i_SRAM.Memory_byte0[idx] = 8'h00;
DM1.i_SRAM.Memory_byte1[idx] = 8'h00;
DM1.i_SRAM.Memory_byte2[idx] = 8'h00;
DM1.i_SRAM.Memory_byte3[idx] = 8'h00;
end
// Load test.mem into instruction memory
file = $fopen("test.mem", "r");
if (file) begin
while (!$feof(file) && i < 1024) begin
void'($fgets(line, file));
// Skip comment lines and empty lines
if (line.len() > 0 && line[0] != "/" && line[1] != "/") begin
// Read 8 hex characters for the instruction
if (line.len() >= 8) begin
instruction[31:28] = hex_to_4bit(line[0]);
instruction[27:24] = hex_to_4bit(line[1]);
instruction[23:20] = hex_to_4bit(line[2]);
instruction[19:16] = hex_to_4bit(line[3]);
instruction[15:12] = hex_to_4bit(line[4]);
instruction[11:8] = hex_to_4bit(line[5]);
instruction[7:4] = hex_to_4bit(line[6]);
instruction[3:0] = hex_to_4bit(line[7]);
// Store in instruction memory
IM1.i_SRAM.Memory_byte3[i] = instruction[31:24];
IM1.i_SRAM.Memory_byte2[i] = instruction[23:16];
IM1.i_SRAM.Memory_byte1[i] = instruction[15:8];
IM1.i_SRAM.Memory_byte0[i] = instruction[7:0];
i = i + 1;
end
end
end
$fclose(file);
end else begin
$display("Error: Could not open test.mem");
$finish;
end
// Wait for reset
@(negedge rst);
// Run program
repeat(2000) @(posedge clk);
// Print final state
print_registers();
print_data_mem();
$stop;
end
endmodule
This testbench runs a RISC-V program stored in hex format in test.mem.
It simulates the CPU together with its instruction and data memories.
At the end, it prints the final state of all registers and the first 16 words of data memory.
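The test.mem format can also be checked outside the simulator. The Python sketch below follows the same rules as the loader in cpu_tb.sv (skip empty lines and `//` comment lines, read 8 hex characters per line, stop at 1024 words), but it is not a line-for-line port: it strips surrounding whitespace and raises an error on non-hex characters, whereas the testbench maps them to 0. The function names are hypothetical.

```python
def parse_mem_lines(lines, max_words=1024):
    """Return the 32-bit instruction words found in test.mem-style lines."""
    words = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("//"):
            continue  # empty line or comment line
        if len(text) < 8:
            continue  # too short to hold a 32-bit instruction
        words.append(int(text[:8], 16))
        if len(words) == max_words:
            break
    return words


def split_byte_lanes(word):
    """Split a word into [byte0, byte1, byte2, byte3], least significant first."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]
```

`split_byte_lanes` corresponds to the way the testbench writes one instruction into Memory_byte0 through Memory_byte3 at the same index.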
initial begin
// 1. Initialize all memory to zero
for (int idx = 0; idx < 16384; idx++) begin
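// Shorthand: the full testbench clears Memory_byte0 through Memory_byte3 individually
IM1.i_SRAM.Memory_byte0[idx] = 8'h00; // likewise Memory_byte1..Memory_byte3
DM1.i_SRAM.Memory_byte0[idx] = 8'h00; // likewise Memory_byte1..Memory_byte3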
end
// 2. Read and load program from test.mem
file = $fopen("test.mem", "r");
// Read hex instructions line by line
// Each line should contain 8 hex characters (32-bit instruction)
// 3. Wait for reset
@(negedge rst);
// 4. Run program for 2000 clock cycles
repeat(2000) @(posedge clk);
// 5. Print results
print_registers();
print_data_mem();
end
// A. Change number of cycles:
// Find this line:
repeat(2000) @(posedge clk);
// Change 2000 to desired number
// B. Change memory size:
// Find this line:
for (int idx = 0; idx < 16384; idx++) begin
// Change 16384 to desired size
// C. Change displayed memory range:
// In print_data_mem function:
for (int i = 0; i < 16; i += 4) begin
// Change 16 to show more/less memory words
Because of our memory model (SRAM.sv and SRAM_wrapper.sv), we do not support the .data directive, and data memory starts at address 0. Data must instead be written to memory by the program itself, as in the example below:
# Common practice with .data
.data # Data segment
array: .word 1,2,3 # Array starting at some address
# Our CPU requires (memory starts at 0):
addi x1, x0, 1 # x1 = 1
addi x2, x0, 2 # x2 = 2
addi x3, x0, 3 # x3 = 3
sw x1, 0(x0) # Store 1 at addr 0
sw x2, 4(x0) # Store 2 at addr 4
sw x3, 8(x0) # Store 3 at addr 8
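Writing these addi/sw pairs by hand becomes tedious for longer arrays. The Python sketch below generates them from a list of values; it is a hypothetical helper, not part of the repository, and it assumes every value and offset fits in a 12-bit signed immediate and that one scratch register may be reused for all elements.

```python
def emit_array_init(values, base=0, scratch="x1"):
    """Return assembly lines that store `values` as words starting at `base`."""
    lines = []
    for index, value in enumerate(values):
        offset = base + 4 * index
        if not -2048 <= value <= 2047:
            raise ValueError(f"value {value} does not fit in a 12-bit immediate")
        if not -2048 <= offset <= 2047:
            raise ValueError(f"offset {offset} does not fit in a 12-bit immediate")
        lines.append(f"addi {scratch}, x0, {value}")
        lines.append(f"sw {scratch}, {offset}(x0)")
    return lines
```

Calling `emit_array_init([1, 2, 3])` yields the same stores as the example above, except that it reuses x1 instead of using x1, x2, and x3.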
# Initialize array [5, 2, 7, 1, 3]
addi x1, x0, 5 # Load 5
sw x1, 0(x0) # Store at mem[0]
addi x1, x0, 2 # Load 2
sw x1, 4(x0) # Store at mem[1]
addi x1, x0, 7 # Load 7
sw x1, 8(x0) # Store at mem[2]
addi x1, x0, 1 # Load 1
sw x1, 12(x0) # Store at mem[3]
addi x1, x0, 3 # Load 3
sw x1, 16(x0) # Store at mem[4]
# Initialize counters
addi x2, x0, 4 # n-1 = 4 (outer loop limit)
addi x3, x0, 0 # i = 0 (outer loop counter)
# Outer Loop
outer:
beq x3, x2, done # if i == n-1, jump to done
addi x4, x0, 0 # j = 0 (inner loop counter)
sub x5, x2, x3 # limit = (n-1)-i
jal x0, inner # Jump to inner loop
# Inner Loop
inner:
beq x4, x5, next_i # if j == limit, jump to next_i
slli x6, x4, 2 # x6 = j * 4
lw x7, 0(x6) # load A[j]
lw x8, 4(x6) # load A[j+1]
bge x8, x7, skip # if A[j+1] >= A[j], skip swap
# Swap values
sw x8, 0(x6) # store smaller value
sw x7, 4(x6) # store larger value
skip:
addi x4, x4, 1 # j++
jal x0, inner # Jump back to inner loop
# Move to the Next Outer Loop Iteration
next_i:
addi x3, x3, 1 # i++
jal x0, outer # Jump back to outer loop
# End Program
done:
# Program terminates here
00500093
00102023
00200093
00102223
00700093
00102423
00100093
00102623
00300093
00102823
00400113
00000193
02218e63
00000213
403102b3
0040006f
02520263
00221313
00032383
00432403
00745663
00832023
00732223
00120213
fe1ff06f
00118193
fc9ff06f
# Data Memory Contents:
# --------------------
# Word[ 0]: 00000001 Word[ 1]: 00000002 Word[ 2]: 00000003 Word[ 3]: 00000005
# Word[ 4]: 00000007 Word[ 5]: 00000000 Word[ 6]: 00000000 Word[ 7]: 00000000
# Word[ 8]: 00000000 Word[ 9]: 00000000 Word[10]: 00000000 Word[11]: 00000000
# Word[12]: 00000000 Word[13]: 00000000 Word[14]: 00000000 Word[15]: 00000000
As the output above shows, [5, 2, 7, 1, 3] is sorted into [1, 2, 3, 5, 7].
In this second program, we compute the dot product of two arrays.
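The expected memory contents can be reproduced with a small reference model. The Python sketch below follows the same loop structure as the assembly (outer counter i, inner counter j, limit = (n-1)-i, swap when A[j+1] < A[j]); the function name is hypothetical and the model assumes n is at least 1.

```python
def bubble_sort_words(mem, n):
    """Sort the first n words the way the assembly program does."""
    mem = list(mem)          # work on a copy
    i = 0
    while i != n - 1:        # outer: beq x3, x2, done
        j = 0
        limit = (n - 1) - i  # sub x5, x2, x3
        while j != limit:    # inner: beq x4, x5, next_i
            a, b = mem[j], mem[j + 1]
            if not (b >= a):  # bge x8, x7, skip
                mem[j], mem[j + 1] = b, a
            j += 1
        i += 1
    return mem
```

`bubble_sort_words([5, 2, 7, 1, 3], 5)` returns `[1, 2, 3, 5, 7]`, matching Word[0] through Word[4] in the simulation output.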
# Compute the dot product of two arrays
# Array A: [3, -4, 2] (signed, manually initialized in memory)
# Array B: [5, 6, 7] (unsigned, manually initialized in memory)
# Initialize array A
addi x10, x0, 3 # Load 3 into x10
sw x10, 0(x0) # Store A[0] at address 0x00000000
addi x10, x0, -4 # Load -4 into x10
sw x10, 4(x0) # Store A[1] at address 0x00000004
addi x10, x0, 2 # Load 2 into x10
sw x10, 8(x0) # Store A[2] at address 0x00000008
# Initialize array B
addi x10, x0, 5 # Load 5 into x10
sw x10, 12(x0) # Store B[0] at address 0x0000000C
addi x10, x0, 6 # Load 6 into x10
sw x10, 16(x0) # Store B[1] at address 0x00000010
addi x10, x0, 7 # Load 7 into x10
sw x10, 20(x0) # Store B[2] at address 0x00000014
# Initialize variables
addi x5, x0, 0 # Base address for array A (0x00000000)
addi x6, x0, 12 # Base address for array B (0x0000000C)
addi x2, x0, 3 # n = 3 (array size)
addi x3, x0, 0 # i = 0 (index counter)
addi x4, x0, 0 # dot_product = 0 (result accumulator)
# Loop to compute the dot product
loop:
beq x3, x2, done # If i == n, exit loop
# Load A[i] (signed) and B[i] (unsigned)
slli x7, x3, 2 # x7 = i * 4 (offset for arrays)
add x8, x5, x7 # Address of A[i]
add x9, x6, x7 # Address of B[i]
lw x10, 0(x8) # Load A[i] into x10 (signed)
lw x11, 0(x9) # Load B[i] into x11 (unsigned)
# Compute A[i] * B[i] using MUL (lower 32 bits)
mul x12, x10, x11 # x12 = A[i] * B[i] (lower 32 bits)
# Compute A[i] * B[i] using MULH (upper 32 bits, signed × signed)
mulh x13, x10, x11 # x13 = Upper 32 bits of A[i] * B[i] (signed × signed)
# Compute A[i] * B[i] using MULHSU (upper 32 bits, signed × unsigned)
mulhsu x14, x10, x11 # x14 = Upper 32 bits of A[i] * B[i] (signed × unsigned)
# Compute A[i] * B[i] using MULHU (upper 32 bits, unsigned × unsigned)
mulhu x15, x11, x11 # x15 = Upper 32 bits of B[i] * B[i] (unsigned × unsigned)
# Accumulate the lower 32 bits for dot product
add x4, x4, x12 # dot_product += A[i] * B[i] (lower 32 bits)
# Increment index
addi x3, x3, 1 # i++
jal x0, loop # Jump back to loop
# Store the final dot product
done:
sw x4, 24(x0) # Store dot_product in mem[6]
# End program
00300513
00a02023
ffc00513
00a02223
00200513
00a02423
00500513
00a02623
00600513
00a02823
00700513
00a02a23
00000293
00c00313
00300113
00000193
00000213
02218a63
00219393
00728433
007304b3
00042503
0004a583
02b50633
02b516b3
02b52733
02b5b7b3
00c20233
00118193
fd1ff06f
00402c23
Address | Value (Hex) | Value (Decimal) | Description |
---|---|---|---|
Word[ 0] | 0x00000003 | 3 | Array A[0]. |
Word[ 1] | 0xFFFFFFFC | -4 | Array A[1]. |
Word[ 2] | 0x00000002 | 2 | Array A[2]. |
Word[ 3] | 0x00000005 | 5 | Array B[0]. |
Word[ 4] | 0x00000006 | 6 | Array B[1]. |
Word[ 5] | 0x00000007 | 7 | Array B[2]. |
Word[ 6] | 0x00000005 | 5 | Final dot product result. |
# Data Memory Contents:
# --------------------
# Word[ 0]: 00000003 Word[ 1]: fffffffc Word[ 2]: 00000002 Word[ 3]: 00000005
# Word[ 4]: 00000006 Word[ 5]: 00000007 Word[ 6]: 00000005 Word[ 7]: 00000000
# Word[ 8]: 00000000 Word[ 9]: 00000000 Word[10]: 00000000 Word[11]: 00000000
# Word[12]: 00000000 Word[13]: 00000000 Word[14]: 00000000 Word[15]: 00000000
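The stored result can be cross-checked with a reference computation that wraps to 32 bits the way the `mul` and `add` instructions do. This is a minimal Python sketch with hypothetical names, not part of the testbench.

```python
MASK32 = 0xFFFFFFFF


def dot_product32(a, b):
    """Accumulate the low 32 bits of each product, as the loop does with mul/add."""
    acc = 0
    for x, y in zip(a, b):
        product_low = (x * y) & MASK32       # mul keeps the lower 32 bits
        acc = (acc + product_low) & MASK32   # add wraps modulo 2^32
    return acc
```

For A = [3, -4, 2] and B = [5, 6, 7] the partial products are 15, -24, and 14, so `dot_product32([3, -4, 2], [5, 6, 7])` returns 5, the value stored in Word[6].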
In this program, we test the four multiplication instructions: MUL, MULH, MULHSU, and MULHU.
# Initialize test values
lui x1, 0x80000 # x1 = 0x80000000 (most negative signed 32-bit)
addi x2, x0, -2 # x2 = -2 (0xFFFFFFFE)
lui x3, 0x7FFF # x3 = 0x07FFF000
addi x3, x3, 0x7FF # x3 = 0x07FFF7FF (x3 is not used by the tests below)
# Test MUL (lower 32 bits only)
mul x4, x1, x2 # (-2^31) * (-2) = 2^32
sw x4, 0(x0) # Should store 0x00000000 (lower 32 bits)
# Test MULH (signed × signed)
mulh x5, x1, x2 # (-2^31) * (-2) = 2^32, upper bits
sw x5, 4(x0) # Should store 0x00000001 (upper 32 bits)
# Test MULHSU (signed × unsigned)
mulhsu x6, x1, x2 # (-2^31) treated as signed, (-2) treated as unsigned
sw x6, 8(x0) # Should show difference from MULH
# Test MULHU (unsigned × unsigned)
mulhu x7, x1, x2 # Both treated as unsigned values
sw x7, 12(x0) # Should show difference from both MULH and MULHSU
# End program
800000b7
ffe00113
07fff1b7
7ff18193
02208233
00402023
022092b3
00502223
0220a333
00602423
0220b3b3
00702623
Register | Value (Hex) | Value (Decimal) | Description |
---|---|---|---|
x1 | 0x80000000 | -2147483648 | From lui x1, 0x80000 |
x2 | 0xFFFFFFFE | -2 | From addi x2, x0, -2 |
x3 | 0x07FFF7FF | 134215679 | From lui + addi combination |
x4 | 0x00000000 | 0 | Lower 32 bits of x1 * x2 |
x5 | 0x00000001 | 1 | Upper 32 bits (signed × signed) |
x6 | 0x80000001 | -2147483647 | Upper 32 bits (signed × unsigned) |
x7 | 0x7FFFFFFF | 2147483647 | Upper 32 bits (unsigned × unsigned) |
Address | Value (Hex) | Description |
---|---|---|
0x00 | 0x00000000 | MUL result (lower 32 bits) |
0x04 | 0x00000001 | MULH result (signed × signed) |
0x08 | 0x80000001 | MULHSU result (signed × unsigned) |
0x0C | 0x7FFFFFFF | MULHU result (unsigned × unsigned) |
# Register File Contents:
# ----------------------
# x0 : 00000000 x1 : 80000000 x2 : fffffffe x3 : 07fff7ff
# x4 : 00000000 x5 : 00000001 x6 : 80000001 x7 : 7fffffff
# x8 : 00000000 x9 : 00000000 x10: 00000000 x11: 00000000
# x12: 00000000 x13: 00000000 x14: 00000000 x15: 00000000
# x16: 00000000 x17: 00000000 x18: 00000000 x19: 00000000
# x20: 00000000 x21: 00000000 x22: 00000000 x23: 00000000
# x24: 00000000 x25: 00000000 x26: 00000000 x27: 00000000
# x28: 00000000 x29: 00000000 x30: 00000000 x31: 00000000
#
# Data Memory Contents:
# --------------------
# Word[ 0]: 00000000 Word[ 1]: 00000001 Word[ 2]: 80000001 Word[ 3]: 7fffffff
# Word[ 4]: 00000000 Word[ 5]: 00000000 Word[ 6]: 00000000 Word[ 7]: 00000000
# Word[ 8]: 00000000 Word[ 9]: 00000000 Word[10]: 00000000 Word[11]: 00000000
# Word[12]: 00000000 Word[13]: 00000000 Word[14]: 00000000 Word[15]: 00000000
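The four expected results follow from the definitions of the M-extension multiply instructions. The Python sketch below is a reference model with hypothetical function names; each function takes the two 32-bit register values and returns the 32-bit result.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Reinterpret a 32-bit pattern as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul(a, b):
    return (a * b) & MASK32                                  # lower 32 bits


def mulh(a, b):
    return ((to_signed32(a) * to_signed32(b)) >> 32) & MASK32  # signed x signed


def mulhsu(a, b):
    return ((to_signed32(a) * (b & MASK32)) >> 32) & MASK32    # signed x unsigned


def mulhu(a, b):
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32      # unsigned x unsigned
```

With x1 = 0x80000000 and x2 = 0xFFFFFFFE, the model gives 0x00000000, 0x00000001, 0x80000001, and 0x7FFFFFFF for MUL, MULH, MULHSU, and MULHU, the same values as x4 through x7 in the output above.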
In this program, we verify the hazard detection and forwarding of our CPU. To do so, we created a program with many data hazards, including load-use hazards.
# Initialize registers
addi x1, x0, 10 # x1 = 10
addi x2, x0, 20 # x2 = 20
addi x3, x0, 5 # x3 = 5
addi x4, x0, 3 # x4 = 3
# Store values into memory
sw x1, 0(x0) # Store x1 (10) at address 0x00000000
sw x2, 4(x0) # Store x2 (20) at address 0x00000004
sw x3, 8(x0) # Store x3 (5) at address 0x00000008
sw x4, 12(x0) # Store x4 (3) at address 0x0000000C
# Load values from memory and create load-use hazards
lw x5, 0(x0) # x5 = mem[0] = 10
add x6, x5, x2 # x6 = x5 + x2 (data hazard with x5)
lw x7, 4(x0) # x7 = mem[4] = 20
sub x8, x7, x3 # x8 = x7 - x3 (data hazard with x7)
lw x9, 8(x0) # x9 = mem[8] = 5
mul x10, x9, x6 # x10 = x9 * x6 (data hazard with x6 and x9)
add x11, x10, x8 # x11 = x10 + x8 (data hazard with x10 and x8)
# Additional operations to create more hazards
mul x12, x11, x4 # x12 = x11 * x4 (data hazard with x11)
add x13, x12, x5 # x13 = x12 + x5 (data hazard with x12 and x5)
sub x14, x13, x9 # x14 = x13 - x9 (data hazard with x13 and x9)
00a00093
01400113
00500193
00300213
00102023
00202223
00302423
00402623
00002283
00228333
00402383
40338433
00802483
02648533
008505b3
02458633
005606b3
40968733
Registers
Register | Value (Hex) | Value (Decimal) | Description |
---|---|---|---|
x0 | 0x00000000 | 0 | Zero register (always 0). |
x1 | 0x0000000A | 10 | Initialized value. |
x2 | 0x00000014 | 20 | Initialized value. |
x3 | 0x00000005 | 5 | Initialized value. |
x4 | 0x00000003 | 3 | Initialized value. |
x5 | 0x0000000A | 10 | Loaded from memory[0]. |
x6 | 0x0000001E | 30 | Result of x5 + x2 . |
x7 | 0x00000014 | 20 | Loaded from memory[4]. |
x8 | 0x0000000F | 15 | Result of x7 - x3 . |
x9 | 0x00000005 | 5 | Loaded from memory[8]. |
x10 | 0x00000096 | 150 | Result of x9 * x6 . |
x11 | 0x000000A5 | 165 | Result of x10 + x8 . |
x12 | 0x000001EF | 495 | Result of x11 * x4 . |
x13 | 0x000001F9 | 505 | Result of x12 + x5 . |
x14 | 0x000001F4 | 500 | Result of x13 - x9 . |
Data Memory
Address | Value (Hex) | Value (Decimal) | Description |
---|---|---|---|
Word[0] | 0x0000000A | 10 | Stored value of x1 . |
Word[1] | 0x00000014 | 20 | Stored value of x2 . |
Word[2] | 0x00000005 | 5 | Stored value of x3 . |
Word[3] | 0x00000003 | 3 | Stored value of x4 . |
Word[4-15] | 0x00000000 | 0 | Placeholder for unused memory. |
# Register File Contents:
# ----------------------
# x0 : 00000000 x1 : 0000000a x2 : 00000014 x3 : 00000005
# x4 : 00000003 x5 : 0000000a x6 : 0000001e x7 : 00000014
# x8 : 0000000f x9 : 00000005 x10: 00000096 x11: 000000a5
# x12: 000001ef x13: 000001f9 x14: 000001f4 x15: 00000000
# x16: 00000000 x17: 00000000 x18: 00000000 x19: 00000000
# x20: 00000000 x21: 00000000 x22: 00000000 x23: 00000000
# x24: 00000000 x25: 00000000 x26: 00000000 x27: 00000000
# x28: 00000000 x29: 00000000 x30: 00000000 x31: 00000000
#
# Data Memory Contents:
# --------------------
# Word[ 0]: 0000000a Word[ 1]: 00000014 Word[ 2]: 00000005 Word[ 3]: 00000003
# Word[ 4]: 00000000 Word[ 5]: 00000000 Word[ 6]: 00000000 Word[ 7]: 00000000
# Word[ 8]: 00000000 Word[ 9]: 00000000 Word[10]: 00000000 Word[11]: 00000000
# Word[12]: 00000000 Word[13]: 00000000 Word[14]: 00000000 Word[15]: 00000000
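To see how many load-use stalls this program should trigger, the instruction sequence can be scanned with the same condition the HazardDetection module uses (a load whose rd matches a source register of the next instruction). The Python sketch below is a simplified model: the tuple encoding and names are hypothetical, and it compares architectural source registers rather than raw instruction fields.

```python
# (mnemonic, destination register or None, source registers)
PROGRAM4 = [
    ("addi", 1, (0,)), ("addi", 2, (0,)), ("addi", 3, (0,)), ("addi", 4, (0,)),
    ("sw", None, (0, 1)), ("sw", None, (0, 2)),
    ("sw", None, (0, 3)), ("sw", None, (0, 4)),
    ("lw", 5, (0,)), ("add", 6, (5, 2)),
    ("lw", 7, (0,)), ("sub", 8, (7, 3)),
    ("lw", 9, (0,)), ("mul", 10, (9, 6)),
    ("add", 11, (10, 8)), ("mul", 12, (11, 4)),
    ("add", 13, (12, 5)), ("sub", 14, (13, 9)),
]


def count_load_use_stalls(program):
    """Count adjacent pairs where a load feeds the very next instruction."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        mnemonic, rd, _ = previous
        if mnemonic == "lw" and rd in current[2]:
            stalls += 1
    return stalls
```

For this program the model counts three load-use pairs (x5, x7, and x9); the remaining dependencies are between ALU instructions and can be resolved by forwarding alone.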
In this program, we test I-type and S-type instructions, especially those that operate on bytes and halfwords.
# Test program using various I-type and S-type instructions
addi x1, x0, 10 # x1 = 10
addi x2, x0, -5 # x2 = -5 (sign extended)
xori x3, x1, 0xFF # x3 = 10 ^ 0xFF = 0xF5
ori x4, x2, 0x7F # x4 = -5 | 0x7F
andi x5, x3, 0xF # x5 = 0xF5 & 0xF = 0x5
slli x6, x1, 2 # x6 = 10 << 2 = 40
srli x7, x3, 1 # x7 = 0xF5 >> 1
srai x8, x2, 1 # x8 = -5 >> 1 (arithmetic)
slti x9, x2, 0 # x9 = (-5 < 0) ? 1 : 0
sltiu x10, x1, 20 # x10 = (10 < 20) ? 1 : 0
# Store some values
sw x1, 0(x0) # Store x1(10) at mem[0]
sh x3, 4(x0) # Store x3(0xF5) as halfword at mem[4]
sb x5, 6(x0) # Store x5(5) as byte at mem[6]
# Load values back
lw x11, 0(x0) # Load word from mem[0]
lh x12, 4(x0) # Load halfword (signed) from mem[4]
lhu x13, 4(x0) # Load halfword (unsigned) from mem[4]
lb x14, 6(x0) # Load byte (signed) from mem[6]
lbu x15, 6(x0) # Load byte (unsigned) from mem[6]
00a00093
ffb00113
0ff0c193
07f16213
00f1f293
00209313
0011d393
40115413
00012493
0140b513
00102023
00301223
00500323
00002583
00401603
00405683
00600703
00604783
Register | Value (Hex) | Value (Decimal) | Explanation |
---|---|---|---|
x0 | 0x00000000 | 0 | Zero register (always 0). |
x1 | 0x0000000A | 10 | addi x1, x0, 10 . |
x2 | 0xFFFFFFFB | -5 | addi x2, x0, -5 . |
x3 | 0x000000F5 | 245 | xori x3, x1, 0xFF → 10 ^ 0xFF = 0xF5 . |
x4 | 0xFFFFFFFF | -1 | ori x4, x2, 0x7F → -5 \| 0x7F = -1 . |
x5 | 0x00000005 | 5 | andi x5, x3, 0xF → 0xF5 & 0xF = 0x5 . |
x6 | 0x00000028 | 40 | slli x6, x1, 2 → 10 << 2 = 40 . |
x7 | 0x0000007A | 122 | srli x7, x3, 1 → 0xF5 >> 1 = 0x7A . |
x8 | 0xFFFFFFFD | -3 | srai x8, x2, 1 → -5 >> 1 = -3 (arithmetic shift keeps the sign). |
x9 | 0x00000001 | 1 | slti x9, x2, 0 → (-5 < 0) ? 1 : 0 . |
x10 | 0x00000001 | 1 | sltiu x10, x1, 20 → (10 < 20) ? 1 : 0 . |
x11 | 0x0000000A | 10 | lw x11, 0(x0) → Loads word from mem[0] . |
x12 | 0x000000F5 | 245 | lh x12, 4(x0) → Signed halfword from mem[4] . |
x13 | 0x000000F5 | 245 | lhu x13, 4(x0) → Unsigned halfword from mem[4] . |
x14 | 0xFFFFFFF5 | -11 | lb x14, 6(x0) → Signed byte from mem[6] . |
x15 | 0x000000F5 | 245 | lbu x15, 6(x0) → Unsigned byte from mem[6] . |
Word Address | Value (Hex) | Explanation |
---|---|---|
Word[ 0] | 0x0000000A | Stored by sw x1, 0(x0) . |
Word[ 1] | 0x000500F5 | Combined result of sh x3, 4(x0) and sb x5, 6(x0) . |
# Register File Contents:
# ----------------------
# x0 : 00000000 x1 : 0000000a x2 : fffffffb x3 : 000000f5
# x4 : ffffffff x5 : 00000005 x6 : 00000028 x7 : 0000007a
# x8 : fffffffd x9 : 00000001 x10: 00000001 x11: 0000000a
# x12: 000000f5 x13: 000000f5 x14: fffffff5 x15: 000000f5
# x16: 00000000 x17: 00000000 x18: 00000000 x19: 00000000
# x20: 00000000 x21: 00000000 x22: 00000000 x23: 00000000
# x24: 00000000 x25: 00000000 x26: 00000000 x27: 00000000
# x28: 00000000 x29: 00000000 x30: 00000000 x31: 00000000
#
# Data Memory Contents:
# --------------------
# Word[ 0]: 0000000a Word[ 1]: 000500f5 Word[ 2]: 00000000 Word[ 3]: 00000000
# Word[ 4]: 00000000 Word[ 5]: 00000000 Word[ 6]: 00000000 Word[ 7]: 00000000
# Word[ 8]: 00000000 Word[ 9]: 00000000 Word[10]: 00000000 Word[11]: 00000000
# Word[12]: 00000000 Word[13]: 00000000 Word[14]: 00000000 Word[15]: 00000000
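The store and load results can be compared against a byte-addressed reference model. The Python sketch below keeps four byte lanes per word, similar in spirit to the Memory_byte0 through Memory_byte3 arrays accessed by the testbench, and applies little-endian RV32I semantics. The class and method names are hypothetical, and alignment checks are omitted.

```python
class ByteLaneMemory:
    """Little-endian memory stored as four byte lanes per 32-bit word."""

    def __init__(self, words=16):
        self.lanes = [[0] * words for _ in range(4)]

    def store(self, addr, value, size):
        """Store the low `size` bytes of value at byte address addr."""
        for k in range(size):
            byte_addr = addr + k
            self.lanes[byte_addr % 4][byte_addr // 4] = (value >> (8 * k)) & 0xFF

    def load(self, addr, size, signed):
        """Load `size` bytes from addr and extend the result to 32 bits."""
        value = 0
        for k in range(size):
            byte_addr = addr + k
            value |= self.lanes[byte_addr % 4][byte_addr // 4] << (8 * k)
        if signed and (value >> (8 * size - 1)) & 1:
            value -= 1 << (8 * size)
        return value & 0xFFFFFFFF

    def word(self, index):
        return self.load(4 * index, 4, signed=False)


mem = ByteLaneMemory()
mem.store(0, 10, 4)     # sw x1, 0(x0)
mem.store(4, 0xF5, 2)   # sh x3, 4(x0)
mem.store(6, 5, 1)      # sb x5, 6(x0)
```

In this model `mem.word(1)` is 0x000500F5, which agrees with Word[1] in the log. The model returns 0x00000005 for both byte loads at address 6, while the log reports 0xFFFFFFF5 and 0x000000F5 for x14 and x15; those are the values the model gives for byte loads at address 4, so the byte selection in the load path may be worth checking.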
### Program6
In this program, we test U-type and B-type instructions.
# U-type instructions test
lui x1, 0x12345 # x1 = 0x12345000
auipc x2, 0x1000 # x2 = PC + 0x1000000
# Test signed vs unsigned comparisons with distinct values
addi x3, x0, -5 # x3 = -5 (0xFFFFFFFB)
addi x4, x0, 10 # x4 = 10
# Store markers at different memory locations to track which path was taken
addi x5, x0, 1 # Path marker = 1
sw x5, 0(x0) # Store at mem[0] before comparison
# Test signed comparison
blt x3, x4, signed_path # Should take (-5 < 10)
addi x5, x0, 20 # Won't execute
sw x5, 4(x0) # Won't execute
j next_test
signed_path:
addi x5, x0, 10 # Will execute
sw x5, 4(x0) # Store at mem[1] to show path taken
next_test:
# Test unsigned comparison
bltu x3, x4, unsigned_path # Should not take (0xFFFFFFFB > 10)
addi x6, x0, 30 # Will execute
sw x6, 8(x0) # Store at mem[2]
j done
unsigned_path:
addi x6, x0, 40 # Won't execute
sw x6, 8(x0) # Won't execute
done:
123450b7
01000117
ffb00193
00a00213
00100293
00502023
0041c863
01400293
00502223
00c0006f
00a00293
00502223
0041e863
01e00313
00602423
Register | Value (Hex) | Value (Decimal) | Description |
---|---|---|---|
x1 | 0x12345000 | 305418240 | Upper 20 bits loaded by lui |
x2 | 0x01000004 | 16777220 | PC + 0x1000000 from auipc |
x3 | 0xFFFFFFFB | -5 | Negative value for comparison |
x4 | 0x0000000A | 10 | Positive value for comparison |
x5 | 0x0000000A | 10 | Shows signed branch was taken |
x6 | 0x0000001E | 30 | Shows unsigned branch not taken |
Address | Value (Hex) | Value (Decimal) | Description |
---|---|---|---|
0x00 | 0x00000001 | 1 | Initial marker |
0x04 | 0x0000000A | 10 | Shows signed path taken |
0x08 | 0x0000001E | 30 | Shows unsigned path not taken |
# Register File Contents:
# ----------------------
# x0 : 00000000 x1 : 12345000 x2 : 01000004 x3 : fffffffb
# x4 : 0000000a x5 : 0000000a x6 : 0000001e x7 : 00000000
# x8 : 00000000 x9 : 00000000 x10: 00000000 x11: 00000000
# x12: 00000000 x13: 00000000 x14: 00000000 x15: 00000000
# x16: 00000000 x17: 00000000 x18: 00000000 x19: 00000000
# x20: 00000000 x21: 00000000 x22: 00000000 x23: 00000000
# x24: 00000000 x25: 00000000 x26: 00000000 x27: 00000000
# x28: 00000000 x29: 00000000 x30: 00000000 x31: 00000000
#
# Data Memory Contents:
# --------------------
# Word[ 0]: 00000001 Word[ 1]: 0000000a Word[ 2]: 0000001e Word[ 3]: 00000000
# Word[ 4]: 00000000 Word[ 5]: 00000000 Word[ 6]: 00000000 Word[ 7]: 00000000
# Word[ 8]: 00000000 Word[ 9]: 00000000 Word[10]: 00000000 Word[11]: 00000000
# Word[12]: 00000000 Word[13]: 00000000 Word[14]: 00000000 Word[15]: 00000000
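The branch outcomes and U-type results above can be reproduced with a few lines of reference code. The Python sketch below uses hypothetical function names and assumes `auipc` is the second instruction of the program, so its PC is 4.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Reinterpret a 32-bit pattern as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def blt(a, b):
    return to_signed32(a) < to_signed32(b)   # signed comparison


def bltu(a, b):
    return (a & MASK32) < (b & MASK32)       # unsigned comparison


def lui(imm20):
    return (imm20 << 12) & MASK32


def auipc(pc, imm20):
    return (pc + (imm20 << 12)) & MASK32


x3 = -5 & MASK32   # 0xFFFFFFFB
x4 = 10
```

Here `blt(x3, x4)` is true and `bltu(x3, x4)` is false, which is why the signed branch is taken and the unsigned one is not. `lui(0x12345)` and `auipc(4, 0x1000)` give 0x12345000 and 0x01000004, matching x1 and x2.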