5-stage pipelined RISC-V processor with RV32IM

蔡承璋, 張恩祥

GitHub

Introduction

 In this project, we write the RTL code for a pipelined CPU. Doing so requires becoming familiar with the RISC-V instruction set architecture and implementing 45 instructions. The required instructions cover the R-type, I-type, S-type, B-type, U-type, and J-type formats, which lets us revisit what we learned in computer organization.

Block diagram

(Block diagram image unavailable.)

CPU Description:

 The CPU is organized into five pipeline stages: IF (Instruction Fetch), ID (Instruction Decode and Register File Read), EX (Execution or Address Calculation), MEM (Data Memory Access), and WB (Write Back).

 IF: This stage includes the PC Register (Program Counter) and IM (Instruction Memory). In this stage, the PC increments by 4 on each clock cycle and passes the next instruction's address to IM. IM, which is an SRAM, retrieves the instruction and passes it to the ID stage in the next clock cycle. For control-transfer instructions such as J-type or B-type instructions, the PC waits for the new address to be computed before sending the instruction address to IM.

 ID: This stage primarily consists of the Control Unit (decoder), the Register File, and the Immediate Unit. In this stage, the Control Unit decodes the 32-bit instruction fetched from IM and generates the control signals for subsequent units. The stage also reads rs1 and rs2 from the Register File and provides them for further calculations. The Immediate Unit computes the immediate value from the instruction and passes it on for subsequent operations.
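The Immediate Unit's job can be sketched as a small Python reference model. This is only an illustration that follows the standard RV32I immediate encodings; the function names `sign_extend` and `decode_immediate` are hypothetical and do not come from the RTL.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def decode_immediate(instr: int) -> int:
    """Return the sign-extended immediate of a 32-bit RV32I instruction."""
    opcode = instr & 0x7F
    if opcode in (0x13, 0x03, 0x67):      # I-type: OP-IMM, LOAD, JALR
        return sign_extend(instr >> 20, 12)
    if opcode == 0x23:                     # S-type: STORE
        imm = ((instr >> 25) << 5) | ((instr >> 7) & 0x1F)
        return sign_extend(imm, 12)
    if opcode == 0x63:                     # B-type: BRANCH
        imm = (((instr >> 31) & 0x1) << 12) | (((instr >> 7) & 0x1) << 11) \
            | (((instr >> 25) & 0x3F) << 5) | (((instr >> 8) & 0xF) << 1)
        return sign_extend(imm, 13)
    if opcode in (0x37, 0x17):             # U-type: LUI, AUIPC
        return sign_extend(instr & 0xFFFFF000, 32)
    if opcode == 0x6F:                     # J-type: JAL
        imm = (((instr >> 31) & 0x1) << 20) | (((instr >> 12) & 0xFF) << 12) \
            | (((instr >> 20) & 0x1) << 11) | (((instr >> 21) & 0x3FF) << 1)
        return sign_extend(imm, 21)
    return 0                               # R-type carries no immediate


# Machine words taken from the test programs later in this note.
print(decode_immediate(0x00500093))  # addi x1, x0, 5
print(decode_immediate(0xFE1FF06F))  # jal x0, inner (backward jump)
```

A model like this is handy for cross-checking the hand-assembled words in test.mem against the intended offsets.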

 EXE: This stage includes the ALU (Arithmetic Logic Unit), which executes arithmetic and logic operations; the Multiplier, which performs the four multiplication variants while all pipeline registers are stalled; the JB Unit, which resolves jumps and branches for J-type and B-type instructions; and several Mux units. The ALU performs basic operations such as addition, subtraction, shifting, comparison, OR, and AND. The Mux units select their sources based on control signals from the ID stage and on feedback from the Forwarding Unit. The JB Unit performs the comparison that decides whether a branch is taken; the jump signal is also routed to the Hazard Detection Unit, which uses it to decide whether a flush of the IF stage is necessary.

 MEM: This stage primarily handles Data Memory (DM) operations, distinguishing between loads from DM and stores to DM. The Save Control unit generates the write signals for SW, SH, and SB operations. The CSR Unit also sits in this stage and counts the number of instructions and cycles.

 WB: This stage determines whether to write the ALU results or the value loaded from DM back to the Register File. If it's the value from DM, a Load Sign Extend operation is performed.
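The Load Sign Extend step can be illustrated with a short Python sketch. It assumes the addressed byte or halfword already sits in the low bits of the word read from DM; that assumption, and the names `sign_extend` and `load_extend`, are illustrative and not taken from the RTL.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value` to a 32-bit pattern."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK32


def load_extend(dm_word: int, op: str) -> int:
    """Value written to the Register File for each load flavour."""
    if op == "lw":
        return dm_word & MASK32
    if op == "lh":
        return sign_extend(dm_word, 16)
    if op == "lhu":
        return dm_word & 0xFFFF
    if op == "lb":
        return sign_extend(dm_word, 8)
    if op == "lbu":
        return dm_word & 0xFF
    raise ValueError(f"unknown load: {op}")


print(hex(load_extend(0x000000F5, "lb")))   # byte 0xF5 has its sign bit set
print(hex(load_extend(0x000000F5, "lbu")))  # zero-extended instead
```

The only difference between the signed and unsigned variants is whether the top bit of the sub-word is replicated into the upper bits.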

 Forwarding Unit: Forwarding is needed when a source register of an instruction matches the destination register of an earlier instruction that has not yet been written back. This unit determines the need for forwarding by comparing rd in the MEM and WB stages with rs1 and rs2 in the EXE stage. If necessary, it sends signals to the EXE stage to select which stage's value to forward.

module Forwarding_Unit(forwardA, forwardB, RegAddr1, RegAddr2, MEM_RdAddr, WB_RdAddr, MEM_RegWrite, WB_RegWrite);
  input MEM_RegWrite, WB_RegWrite;
  input [4:0] RegAddr1, RegAddr2, MEM_RdAddr, WB_RdAddr;
  output logic [1:0] forwardA, forwardB;

  always_comb begin
    priority if (MEM_RegWrite && MEM_RdAddr == RegAddr1) begin
      if (WB_RegWrite && WB_RdAddr == RegAddr2) begin
        forwardA = `MEMRDA;
        forwardB = `WBRDB;
      end
      else begin
        forwardA = `MEMRDA;
        forwardB = `NOFB;
      end
    end
    else if (MEM_RegWrite && MEM_RdAddr == RegAddr2) begin
      if (WB_RegWrite && WB_RdAddr == RegAddr1) begin
        forwardA = `WBRDA;
        forwardB = `MEMRDB;
      end
      else begin
        forwardA = `NOFA;
        forwardB = `MEMRDB;
      end
    end
    else if (WB_RegWrite && WB_RdAddr == RegAddr1) begin
      forwardA = `WBRDA;
      forwardB = `NOFB;
    end
    else if (WB_RegWrite && WB_RdAddr == RegAddr2) begin
      forwardA = `NOFA;
      forwardB = `WBRDB;
    end
    else begin
      forwardA = `NOFA;
      forwardB = `NOFB;
    end
  end
endmodule
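For comparison, the textbook formulation decides forwarding independently for each source operand, with the MEM stage taking priority over WB because it holds the more recent value. The Python sketch below models that per-operand rule; the encodings `NO_FWD`, `FWD_MEM`, `FWD_WB`, the function name, and the explicit x0 check are assumptions for illustration and are not taken from the RTL above.

```python
NO_FWD, FWD_MEM, FWD_WB = 0, 1, 2  # hypothetical select encodings


def select_forward(rs: int, mem_rd: int, mem_reg_write: bool,
                   wb_rd: int, wb_reg_write: bool) -> int:
    """Choose the forwarding source for one EXE-stage operand register."""
    # The MEM stage holds the most recent result, so it is checked first.
    if mem_reg_write and mem_rd != 0 and mem_rd == rs:
        return FWD_MEM
    if wb_reg_write and wb_rd != 0 and wb_rd == rs:
        return FWD_WB
    return NO_FWD


# Example: add x11, x10, x8 where x10 is in MEM and x8 is in WB.
forward_a = select_forward(10, mem_rd=10, mem_reg_write=True, wb_rd=8, wb_reg_write=True)
forward_b = select_forward(8, mem_rd=10, mem_reg_write=True, wb_rd=8, wb_reg_write=True)
print(forward_a, forward_b)
```

Evaluating each operand separately also covers the case where rs1 and rs2 name the same register, since both operands then receive the same source.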

 Hazard Detection: This unit sends signals to the pipeline registers that decide whether to flush or stall the pipeline. When a J-type or B-type instruction redirects the PC, the EXE stage signals this unit, which sends flush signals to IFtoID and IDtoEXE because the instructions they hold are no longer needed. When a load is followed by an instruction that needs the loaded value, the DM data arrives one cycle too late to be forwarded, so the unit sends a stall signal to PC and IFtoID to hold the dependent instruction, and a flush signal to IDtoEXE so that a nop enters the EXE stage. Because multiplication takes longer than other ALU operations, a multiplication stalls all five registers: PC, IFtoID, IDtoEXE, EXEtoMEM, and MEMtoWB.

module HazardDetection(RegAddr1, RegAddr2, pc_stall, IF_flush, IF_stall, ID_stall, ID_flush, BranchorJump, RdAddr, MemRead);
  input [4:0] RegAddr1, RegAddr2, RdAddr;
  input MemRead;
  input BranchorJump;
  output logic pc_stall, IF_stall, IF_flush, ID_stall, ID_flush;

  always_comb begin
    if (MemRead && (RdAddr == RegAddr1 || RdAddr == RegAddr2)) begin // load-use data hazard
      pc_stall = 1'b1;
      IF_stall = 1'b1;
      IF_flush = 1'b0;
      ID_stall = 1'b0;
      ID_flush = 1'b1;
    end
    else if (BranchorJump) begin
      pc_stall = 1'b0;
      IF_stall = 1'b0;
      IF_flush = 1'b1;
      ID_stall = 1'b0;
      ID_flush = 1'b1;
    end
    else begin
      pc_stall = 1'b0;
      IF_stall = 1'b0;
      IF_flush = 1'b0;
      ID_stall = 1'b0;
      ID_flush = 1'b0;
    end
  end
endmodule
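The priority between the two cases above can be mirrored in a few lines of Python, which is convenient when predicting where bubbles appear in a test program. This is a sketch of the same decision table, not a replacement for the RTL; the function name `hazard_controls` and the dictionary return type are illustrative choices.

```python
def hazard_controls(rs1: int, rs2: int, ex_rd: int,
                    ex_mem_read: bool, branch_or_jump: bool) -> dict:
    """Mirror of the HazardDetection decision table (load-use first, then branch/jump)."""
    if ex_mem_read and ex_rd in (rs1, rs2):
        # Load-use hazard: hold PC and IFtoID, insert a bubble into IDtoEXE.
        return {"pc_stall": 1, "IF_stall": 1, "IF_flush": 0, "ID_stall": 0, "ID_flush": 1}
    if branch_or_jump:
        # Taken branch or jump: discard the two younger instructions.
        return {"pc_stall": 0, "IF_stall": 0, "IF_flush": 1, "ID_stall": 0, "ID_flush": 1}
    return {"pc_stall": 0, "IF_stall": 0, "IF_flush": 0, "ID_stall": 0, "ID_flush": 0}


# lw x5, 0(x0) in EXE followed by add x6, x5, x2 in ID: a one-cycle stall.
print(hazard_controls(rs1=5, rs2=2, ex_rd=5, ex_mem_read=True, branch_or_jump=False))
```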

RV32IMC - Compressed (C) Instructions

Our current design implements RV32IM: the base RV32I instructions plus the M extension for multiplication and division. We did not implement the C extension, which defines the compressed instructions.
Why the "C" Extension Was Not Implemented:
The C extension provides 16-bit encodings of common instructions to reduce code size, which can also improve performance in some scenarios. While this is advantageous in terms of memory efficiency, implementing the C extension requires additional logic to handle compressed instructions, such as:

  • Instruction Decoding: Expanding 16-bit compressed instructions into their 32-bit equivalents adds complexity to the instruction pipeline.
  • Handling Misaligned Instructions: With the C extension, 32-bit instructions may start on any 16-bit boundary, so a single instruction can straddle two 32-bit fetch words.
  • Backward Compatibility: Supporting standard 32-bit instructions and compressed instructions in the same instruction stream requires careful design.

For this specific project, we chose to focus on the core features of RV32IM without adding the compression logic. This decision was made to simplify the implementation and avoid the extra complexity involved in decoding and managing compressed instructions.
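To give a feel for the extra fetch logic, the sketch below shows how a front end could tell 16-bit and 32-bit encodings apart: in the RISC-V encoding scheme, an instruction whose two lowest bits are not `11` is a 16-bit compressed instruction. The helper names are hypothetical, and encodings longer than 32 bits are ignored for simplicity.

```python
def instruction_length(low_halfword: int) -> int:
    """Length in bytes of the instruction that starts with this 16-bit parcel."""
    return 4 if (low_halfword & 0b11) == 0b11 else 2


def instruction_starts(halfwords: list) -> list:
    """Byte offsets at which instructions begin in a stream of 16-bit parcels."""
    starts, index = [], 0
    while index < len(halfwords):
        starts.append(index * 2)
        index += instruction_length(halfwords[index]) // 2
    return starts


# One compressed instruction (0x0001, c.nop) followed by addi x1, x0, 5
# (0x00500093, stored low parcel first). The 32-bit instruction now starts
# at byte offset 2, so it is no longer aligned to a 32-bit fetch word.
print(instruction_starts([0x0001, 0x0093, 0x0050]))
```

In this design every instruction is 32 bits wide and word-aligned, so the IF stage can simply add 4 to the PC.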

Memory Model - SRAM Wrapper

In this section, we explain the SRAM memory model used in our design, including the SRAM wrapper module and how it interfaces with the actual SRAM memory.

SRAM Wrapper Overview:
The SRAM_wrapper module serves as an interface to the underlying SRAM module. It takes various control signals as inputs and passes them to the internal SRAM instance to manage memory operations.
Internal SRAM Instance:
The SRAM_wrapper module instantiates the SRAM module, passing the control and data signals to it. Here's how the internal memory operations work:

  • Addressing: The 14-bit address bus (A[13:0]) selects one of 16384 32-bit words in the SRAM.
  • Read/Write Control:
    • The 4-bit WEB (write enable) signal, one bit per byte lane, controls whether the corresponding byte of the DI bus is written to the SRAM.
    • The OE (output enable) signal controls whether the DO bus outputs data from the SRAM.
SRAM wrapper code:
module SRAM_wrapper (
  input CK,
  input CS,
  input OE,
  input [3:0] WEB,
  input [13:0] A,
  input [31:0] DI,
  output [31:0] DO
);

  SRAM i_SRAM (
    .A0   (A[0]  ),
    .A1   (A[1]  ),
    .A2   (A[2]  ),
    .A3   (A[3]  ),
    .A4   (A[4]  ),
    .A5   (A[5]  ),
    .A6   (A[6]  ),
    .A7   (A[7]  ),
    .A8   (A[8]  ),
    .A9   (A[9]  ),
    .A10  (A[10] ),
    .A11  (A[11] ),
    .A12  (A[12] ),
    .A13  (A[13] ),
    .DO0  (DO[0] ),
    .DO1  (DO[1] ),
    .DO2  (DO[2] ),
    .DO3  (DO[3] ),
    .DO4  (DO[4] ),
    .DO5  (DO[5] ),
    .DO6  (DO[6] ),
    .DO7  (DO[7] ),
    .DO8  (DO[8] ),
    .DO9  (DO[9] ),
    .DO10 (DO[10]),
    .DO11 (DO[11]),
    .DO12 (DO[12]),
    .DO13 (DO[13]),
    .DO14 (DO[14]),
    .DO15 (DO[15]),
    .DO16 (DO[16]),
    .DO17 (DO[17]),
    .DO18 (DO[18]),
    .DO19 (DO[19]),
    .DO20 (DO[20]),
    .DO21 (DO[21]),
    .DO22 (DO[22]),
    .DO23 (DO[23]),
    .DO24 (DO[24]),
    .DO25 (DO[25]),
    .DO26 (DO[26]),
    .DO27 (DO[27]),
    .DO28 (DO[28]),
    .DO29 (DO[29]),
    .DO30 (DO[30]),
    .DO31 (DO[31]),
    .DI0  (DI[0] ),
    .DI1  (DI[1] ),
    .DI2  (DI[2] ),
    .DI3  (DI[3] ),
    .DI4  (DI[4] ),
    .DI5  (DI[5] ),
    .DI6  (DI[6] ),
    .DI7  (DI[7] ),
    .DI8  (DI[8] ),
    .DI9  (DI[9] ),
    .DI10 (DI[10]),
    .DI11 (DI[11]),
    .DI12 (DI[12]),
    .DI13 (DI[13]),
    .DI14 (DI[14]),
    .DI15 (DI[15]),
    .DI16 (DI[16]),
    .DI17 (DI[17]),
    .DI18 (DI[18]),
    .DI19 (DI[19]),
    .DI20 (DI[20]),
    .DI21 (DI[21]),
    .DI22 (DI[22]),
    .DI23 (DI[23]),
    .DI24 (DI[24]),
    .DI25 (DI[25]),
    .DI26 (DI[26]),
    .DI27 (DI[27]),
    .DI28 (DI[28]),
    .DI29 (DI[29]),
    .DI30 (DI[30]),
    .DI31 (DI[31]),
    .CK   (CK    ),
    .WEB0 (WEB[0]),
    .WEB1 (WEB[1]),
    .WEB2 (WEB[2]),
    .WEB3 (WEB[3]),
    .OE   (OE    ),
    .CS   (CS    )
  );

endmodule
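The byte-lane organisation behind the four WEB bits can be pictured with a small behavioural model in Python. It is a sketch only: the class name is hypothetical, and it assumes WEB is active-low (a lane is written when its bit is 0), which is suggested by the signal name but not shown in the wrapper itself.

```python
class ByteLaneSram:
    """Behavioural model: four byte-wide arrays addressed by a 14-bit word address."""

    def __init__(self, words: int = 1 << 14):
        self.lanes = [bytearray(words) for _ in range(4)]

    def write(self, addr: int, data: int, web: int) -> None:
        # Assumption: WEB is active-low, so lane n is written when bit n is 0.
        for lane in range(4):
            if not (web >> lane) & 1:
                self.lanes[lane][addr] = (data >> (8 * lane)) & 0xFF

    def read(self, addr: int) -> int:
        return sum(self.lanes[lane][addr] << (8 * lane) for lane in range(4))


# Reproduce the sh/sb sequence of Program5 on word address 1.
dm = ByteLaneSram()
dm.write(1, 0x000000F5, web=0b1100)  # sh: write lanes 0 and 1
dm.write(1, 0x00050000, web=0b1011)  # sb at byte offset 2: write lane 2 only
print(hex(dm.read(1)))
```

Keeping one array per byte lane is what lets SB and SH update part of a word without a read-modify-write cycle.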

Simulation Environment Setup

This section describes how to set up the simulation environment for the 5-stage pipelined RV32IM RISC-V processor using ModelSim - Intel FPGA Starter Edition.
Create a working library:

vlib work
vmap work <path_to_work_directory>

Compile the SystemVerilog files:

vlog -sv HazardDetection.sv

Run the Simulation:

vsim work.HazardDetection

View the waveform to monitor signals:

add wave -position end sim:/HazardDetection/*

Verify the correctness of this RISC-V CPU

Compile and Simulation

First, we use the sim.do script below to compile the design and start the simulation by entering "do sim.do" in the ModelSim terminal.

# Create work library
vlib work

# Compile design files in correct order
vlog -sv parameter_define.sv
vlog -sv PCReg.sv
vlog -sv IFtoID.sv
vlog -sv RegFile.sv
vlog -sv Control_Unit.sv
vlog -sv Immediate_Unit.sv
vlog -sv IDtoEXE.sv
vlog -sv ALU.sv
vlog -sv MUX.sv
vlog -sv MUX3.sv
vlog -sv ConditionChecker.sv
vlog -sv EXEtoMEM.sv
vlog -sv MEMtoWB.sv
vlog -sv Forwarding_Unit.sv
vlog -sv LoadSignExtend.sv
vlog -sv SaveControl.sv
vlog -sv HazardDetection.sv
vlog -sv CSR.sv
vlog -sv SRAM.sv
vlog -sv SRAM_wrapper.sv
vlog -sv CPU.sv
vlog -sv cpu_tb.sv

# Start simulation
vsim -c cpu_tb

# Run simulation
run -all

Testbench

Second, we explain our testbench program, cpu_tb.sv.

module cpu_tb();
  // Clock and reset signals
  logic clk;
  logic rst;

  // Memory interface signals
  logic IM_cs, DM_cs;
  logic IM_oe, DM_oe;
  logic [3:0] IM_web, DM_web;
  logic [13:0] IM_addr, DM_addr;
  logic [31:0] IM_datain, IM_dataout;
  logic [31:0] DM_datain, DM_dataout;

  // File reading variables
  int i;
  int file;
  string line;
  logic [31:0] instruction;

  // Helper function to get memory word
  function logic [31:0] get_mem_word(input int addr);
    return {DM1.i_SRAM.Memory_byte3[addr], DM1.i_SRAM.Memory_byte2[addr],
            DM1.i_SRAM.Memory_byte1[addr], DM1.i_SRAM.Memory_byte0[addr]};
  endfunction

  // Helper function to convert hex character to 4-bit value
  function logic [3:0] hex_to_4bit(input byte hex_char);
    if (hex_char >= "0" && hex_char <= "9")
      return hex_char - "0";
    else if (hex_char >= "a" && hex_char <= "f")
      return hex_char - "a" + 10;
    else if (hex_char >= "A" && hex_char <= "F")
      return hex_char - "A" + 10;
    else
      return 4'h0;
  endfunction

  // Helper function to print registers
  function void print_registers();
    $display("\nRegister File Contents:");
    $display("----------------------");
    for (int i = 0; i < 32; i += 4) begin
      $display("x%-2d: %8h x%-2d: %8h x%-2d: %8h x%-2d: %8h",
               i,   cpu_inst.RF.RegMem[i],
               i+1, cpu_inst.RF.RegMem[i+1],
               i+2, cpu_inst.RF.RegMem[i+2],
               i+3, cpu_inst.RF.RegMem[i+3]);
    end
  endfunction

  // Helper function to print data memory contents
  function void print_data_mem();
    $display("\nData Memory Contents:");
    $display("--------------------");
    for (int i = 0; i < 16; i += 4) begin
      $display("Word[%2d]: %8h Word[%2d]: %8h Word[%2d]: %8h Word[%2d]: %8h",
               i,   get_mem_word(i),
               i+1, get_mem_word(i+1),
               i+2, get_mem_word(i+2),
               i+3, get_mem_word(i+3));
    end
  endfunction

  // Clock generation
  initial begin
    clk = 0;
    forever #5 clk = ~clk;
  end

  // Reset generation
  initial begin
    rst = 1;
    #20 rst = 0;
  end

  // CPU instantiation
  CPU cpu_inst (
    .clk(clk),
    .rst(rst),
    .IM_cs(IM_cs),
    .DM_cs(DM_cs),
    .IM_oe(IM_oe),
    .DM_oe(DM_oe),
    .IM_web(IM_web),
    .DM_web(DM_web),
    .IM_addr(IM_addr),
    .DM_addr(DM_addr),
    .IM_datain(IM_datain),
    .IM_dataout(IM_dataout),
    .DM_datain(DM_datain),
    .DM_dataout(DM_dataout)
  );

  // Instruction Memory
  SRAM_wrapper IM1 (
    .CK(clk),
    .CS(IM_cs),
    .OE(IM_oe),
    .WEB(IM_web),
    .A(IM_addr),
    .DI(IM_datain),
    .DO(IM_dataout)
  );

  // Data Memory
  SRAM_wrapper DM1 (
    .CK(clk),
    .CS(DM_cs),
    .OE(DM_oe),
    .WEB(DM_web),
    .A(DM_addr),
    .DI(DM_datain),
    .DO(DM_dataout)
  );

  // Main test sequence
  initial begin
    i = 0;

    // Initialize memories
    for (int idx = 0; idx < 16384; idx++) begin
      // Initialize both memories to 0
      IM1.i_SRAM.Memory_byte0[idx] = 8'h00;
      IM1.i_SRAM.Memory_byte1[idx] = 8'h00;
      IM1.i_SRAM.Memory_byte2[idx] = 8'h00;
      IM1.i_SRAM.Memory_byte3[idx] = 8'h00;
      DM1.i_SRAM.Memory_byte0[idx] = 8'h00;
      DM1.i_SRAM.Memory_byte1[idx] = 8'h00;
      DM1.i_SRAM.Memory_byte2[idx] = 8'h00;
      DM1.i_SRAM.Memory_byte3[idx] = 8'h00;
    end

    // Load test.mem into instruction memory
    file = $fopen("test.mem", "r");
    if (file) begin
      while (!$feof(file) && i < 1024) begin
        void'($fgets(line, file));
        // Skip comment lines and empty lines
        if (line.len() > 0 && line[0] != "/" && line[1] != "/") begin
          // Read 8 hex characters for the instruction
          if (line.len() >= 8) begin
            instruction[31:28] = hex_to_4bit(line[0]);
            instruction[27:24] = hex_to_4bit(line[1]);
            instruction[23:20] = hex_to_4bit(line[2]);
            instruction[19:16] = hex_to_4bit(line[3]);
            instruction[15:12] = hex_to_4bit(line[4]);
            instruction[11:8]  = hex_to_4bit(line[5]);
            instruction[7:4]   = hex_to_4bit(line[6]);
            instruction[3:0]   = hex_to_4bit(line[7]);

            // Store in instruction memory
            IM1.i_SRAM.Memory_byte3[i] = instruction[31:24];
            IM1.i_SRAM.Memory_byte2[i] = instruction[23:16];
            IM1.i_SRAM.Memory_byte1[i] = instruction[15:8];
            IM1.i_SRAM.Memory_byte0[i] = instruction[7:0];
            i = i + 1;
          end
        end
      end
      $fclose(file);
    end
    else begin
      $display("Error: Could not open test.mem");
      $finish;
    end

    // Wait for reset
    @(negedge rst);

    // Run program
    repeat(2000) @(posedge clk);

    // Print final state
    print_registers();
    print_data_mem();
    $stop;
  end
endmodule

1. Basic Purpose:

  • This testbench runs a RISC-V program in hex format from test.mem.
  • It simulates the CPU together with its instruction and data memories.
  • At the end, it prints the final state of all registers and the first 16 words of data memory.

2. Key Components:

  • CPU instance (cpu_inst)
  • Instruction Memory (IM1)
  • Data Memory (DM1)

3. Test flow

initial begin
  // 1. Initialize all memory to zero
  for (int idx = 0; idx < 16384; idx++) begin
    IM1.i_SRAM.Memory_byte[0-3][idx] = 8'h00;
    DM1.i_SRAM.Memory_byte[0-3][idx] = 8'h00;
  end

  // 2. Read and load program from test.mem
  file = $fopen("test.mem", "r");
  // Read hex instructions line by line
  // Each line should contain 8 hex characters (32-bit instruction)

  // 3. Wait for reset
  @(negedge rst);

  // 4. Run program for 2000 clock cycles
  repeat(2000) @(posedge clk);

  // 5. Print results
  print_registers();
  print_data_mem();
end

4. How to modify

// A. Change number of cycles:
// Find this line:
repeat(2000) @(posedge clk);
// Change 2000 to desired number

// B. Change memory size:
// Find this line:
for (int idx = 0; idx < 16384; idx++) begin
// Change 16384 to desired size

// C. Change displayed memory range:
// In print_data_mem function:
for (int i = 0; i < 16; i += 4) begin
// Change 16 to show more/less memory words

5. test.mem Format:

  • Each line should contain one 32-bit instruction in hexadecimal.
  • 8 characters per line (e.g., "00500093").
  • The file can include comment lines starting with "//".
  • Memory addresses start at 0 by default.
  • In this project, we write the test programs in RISC-V assembly ourselves and use Venus to generate the hexadecimal machine code used as test.mem.
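The format rules above can be checked offline with a few lines of Python before running a simulation. The sketch below mirrors what the testbench loader does (skip comments and short lines, take the first 8 hex characters, split the word into four byte lanes); the function names are hypothetical.

```python
def parse_mem_lines(lines: list) -> list:
    """Convert test.mem-style text lines into a list of 32-bit instruction words."""
    words = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("//") or len(line) < 8:
            continue  # skip blank lines, comments, and incomplete lines
        words.append(int(line[:8], 16))
    return words


def split_bytes(word: int) -> list:
    """Bytes of a word, lowest byte first (Memory_byte0 .. Memory_byte3)."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]


sample = ["// addi x1, x0, 5", "00500093", "", "00102023"]
program = parse_mem_lines(sample)
print([hex(w) for w in program])
print([hex(b) for b in split_bytes(program[0])])
```

To check a real program, the lines of test.mem can be read with `open("test.mem").read().splitlines()` and passed to `parse_mem_lines`.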

Start Testing

Because of our memory model (SRAM.sv and SRAM_wrapper.sv), we do not support the .data directive, and data memory starts at address 0 by default. Below is an example:

# Common practice with .data
.data                    # Data segment
array: .word 1,2,3       # Array starting at some address

# Our CPU requires (memory starts at 0):
addi x1, x0, 1           # x1 = 1
addi x2, x0, 2           # x2 = 2
addi x3, x0, 3           # x3 = 3
sw   x1, 0(x0)           # Store 1 at addr 0
sw   x2, 4(x0)           # Store 2 at addr 4
sw   x3, 8(x0)           # Store 3 at addr 8

Program1 - Bubble sort

# Initialize array [5, 2, 7, 1, 3]
addi x1, x0, 5           # Load 5
sw   x1, 0(x0)           # Store at mem[0]
addi x1, x0, 2           # Load 2
sw   x1, 4(x0)           # Store at mem[1]
addi x1, x0, 7           # Load 7
sw   x1, 8(x0)           # Store at mem[2]
addi x1, x0, 1           # Load 1
sw   x1, 12(x0)          # Store at mem[3]
addi x1, x0, 3           # Load 3
sw   x1, 16(x0)          # Store at mem[4]

# Initialize counters
addi x2, x0, 4           # n-1 = 4 (outer loop limit)
addi x3, x0, 0           # i = 0 (outer loop counter)

# Outer Loop
outer:
beq  x3, x2, done        # if i == n-1, jump to done
addi x4, x0, 0           # j = 0 (inner loop counter)
sub  x5, x2, x3          # limit = (n-1)-i
jal  x0, inner           # Jump to inner loop

# Inner Loop
inner:
beq  x4, x5, next_i      # if j == limit, jump to next_i
slli x6, x4, 2           # x6 = j * 4
lw   x7, 0(x6)           # load A[j]
lw   x8, 4(x6)           # load A[j+1]
bge  x8, x7, skip        # if A[j+1] >= A[j], skip swap

# Swap values
sw   x8, 0(x6)           # store smaller value
sw   x7, 4(x6)           # store larger value

skip:
addi x4, x4, 1           # j++
jal  x0, inner           # Jump back to inner loop

# Move to the Next Outer Loop Iteration
next_i:
addi x3, x3, 1           # i++
jal  x0, outer           # Jump back to outer loop

# End Program
done:
# Program terminates here

Corresponding hexadecimal machine code

00500093
00102023
00200093
00102223
00700093
00102423
00100093
00102623
00300093
00102823
00400113
00000193
02218e63
00000213
403102b3
0040006f
02520263
00221313
00032383
00432403
00745663
00832023
00732223
00120213
fe1ff06f
00118193
fc9ff06f

Output of Program1

# Data Memory Contents:
# --------------------
# Word[ 0]: 00000001    Word[ 1]: 00000002    Word[ 2]: 00000003    Word[ 3]: 00000005
# Word[ 4]: 00000007    Word[ 5]: 00000000    Word[ 6]: 00000000    Word[ 7]: 00000000
# Word[ 8]: 00000000    Word[ 9]: 00000000    Word[10]: 00000000    Word[11]: 00000000
# Word[12]: 00000000    Word[13]: 00000000    Word[14]: 00000000    Word[15]: 00000000

As the output above shows, the array [5, 2, 7, 1, 3] has been sorted into [1, 2, 3, 5, 7].

Program2

In this second program, we compute the dot product of two arrays.

# Compute the dot product of two arrays
# Array A: [3, -4, 2] (signed, manually initialized in memory)
# Array B: [5, 6, 7] (unsigned, manually initialized in memory)

# Initialize array A
addi x10, x0, 3          # Load 3 into x10
sw   x10, 0(x0)          # Store A[0] at address 0x00000000
addi x10, x0, -4         # Load -4 into x10
sw   x10, 4(x0)          # Store A[1] at address 0x00000004
addi x10, x0, 2          # Load 2 into x10
sw   x10, 8(x0)          # Store A[2] at address 0x00000008

# Initialize array B
addi x10, x0, 5          # Load 5 into x10
sw   x10, 12(x0)         # Store B[0] at address 0x0000000C
addi x10, x0, 6          # Load 6 into x10
sw   x10, 16(x0)         # Store B[1] at address 0x00000010
addi x10, x0, 7          # Load 7 into x10
sw   x10, 20(x0)         # Store B[2] at address 0x00000014

# Initialize variables
addi x5, x0, 0           # Base address for array A (0x00000000)
addi x6, x0, 12          # Base address for array B (0x0000000C)
addi x2, x0, 3           # n = 3 (array size)
addi x3, x0, 0           # i = 0 (index counter)
addi x4, x0, 0           # dot_product = 0 (result accumulator)

# Loop to compute the dot product
loop:
beq  x3, x2, done        # If i == n, exit loop

# Load A[i] (signed) and B[i] (unsigned)
slli x7, x3, 2           # x7 = i * 4 (offset for arrays)
add  x8, x5, x7          # Address of A[i]
add  x9, x6, x7          # Address of B[i]
lw   x10, 0(x8)          # Load A[i] into x10 (signed)
lw   x11, 0(x9)          # Load B[i] into x11 (unsigned)

# Compute A[i] * B[i] using MUL (lower 32 bits)
mul    x12, x10, x11     # x12 = A[i] * B[i] (lower 32 bits)

# Compute A[i] * B[i] using MULH (upper 32 bits, signed × signed)
mulh   x13, x10, x11     # x13 = Upper 32 bits of A[i] * B[i] (signed × signed)

# Compute A[i] * B[i] using MULHSU (upper 32 bits, signed × unsigned)
mulhsu x14, x10, x11     # x14 = Upper 32 bits of A[i] * B[i] (signed × unsigned)

# Compute A[i] * B[i] using MULHU (upper 32 bits, unsigned × unsigned)
mulhu  x15, x11, x11     # x15 = Upper 32 bits of B[i] * B[i] (unsigned × unsigned)

# Accumulate the lower 32 bits for dot product
add  x4, x4, x12         # dot_product += A[i] * B[i] (lower 32 bits)

# Increment index
addi x3, x3, 1           # i++
jal  x0, loop            # Jump back to loop

# Store the final dot product
done:
sw   x4, 24(x0)          # Store dot_product in mem[6]
# End program

Corresponding hexadecimal machine code

00300513
00a02023
ffc00513
00a02223
00200513
00a02423
00500513
00a02623
00600513
00a02823
00700513
00a02a23
00000293
00c00313
00300113
00000193
00000213
02218a63
00219393
00728433
007304b3
00042503
0004a583
02b50633
02b516b3
02b52733
02b5b7b3
00c20233
00118193
fd1ff06f
00402c23

Expected output of Program2

| Address | Value (Hex) | Value (Decimal) | Description |
|---|---|---|---|
| Word[ 0] | 0x00000003 | 3 | Array A[0]. |
| Word[ 1] | 0xFFFFFFFC | -4 | Array A[1]. |
| Word[ 2] | 0x00000002 | 2 | Array A[2]. |
| Word[ 3] | 0x00000005 | 5 | Array B[0]. |
| Word[ 4] | 0x00000006 | 6 | Array B[1]. |
| Word[ 5] | 0x00000007 | 7 | Array B[2]. |
| Word[ 6] | 0x00000005 | 5 | Final dot product result. |
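The expected value in Word[ 6] can be reproduced with a short Python check that accumulates the products the way the loop does, keeping only the lower 32 bits after each step. The function name is illustrative.

```python
MASK32 = 0xFFFFFFFF


def dot_product_low32(a: list, b: list) -> int:
    """Accumulate a[i] * b[i], wrapping to 32 bits as the mul/add sequence does."""
    acc = 0
    for x, y in zip(a, b):
        acc = (acc + x * y) & MASK32
    return acc


# 3*5 + (-4)*6 + 2*7 = 15 - 24 + 14
print(dot_product_low32([3, -4, 2], [5, 6, 7]))
```

The intermediate sum is negative after the second element (15 - 24 = -9), so the accumulator briefly holds 0xFFFFFFF7 before the last product brings it back to a small positive value.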

Output of Program2

# Data Memory Contents:
# --------------------
# Word[ 0]: 00000003    Word[ 1]: fffffffc    Word[ 2]: 00000002    Word[ 3]: 00000005
# Word[ 4]: 00000006    Word[ 5]: 00000007    Word[ 6]: 00000005    Word[ 7]: 00000000
# Word[ 8]: 00000000    Word[ 9]: 00000000    Word[10]: 00000000    Word[11]: 00000000
# Word[12]: 00000000    Word[13]: 00000000    Word[14]: 00000000    Word[15]: 00000000

Program3

In this program, we test the four multiplication instructions: MUL, MULH, MULHSU, and MULHU.

# Initialize test values
lui  x1, 0x80000         # x1 = 0x80000000 (most negative signed 32-bit)
addi x2, x0, -2          # x2 = -2 (0xFFFFFFFE)
lui  x3, 0x7FFF          # x3 = 0x07FFF000
addi x3, x3, 0x7FF       # x3 = 0x07FFF7FF

# Test MUL (lower 32 bits only)
mul    x4, x1, x2        # (-2^31) * (-2) = 2^32
sw     x4, 0(x0)         # Should store 0x00000000 (lower 32 bits)

# Test MULH (signed × signed)
mulh   x5, x1, x2        # (-2^31) * (-2) = 2^32, upper bits
sw     x5, 4(x0)         # Should store 0x00000001 (upper 32 bits)

# Test MULHSU (signed × unsigned)
mulhsu x6, x1, x2        # (-2^31) treated as signed, (-2) treated as unsigned
sw     x6, 8(x0)         # Should show difference from MULH

# Test MULHU (unsigned × unsigned)
mulhu  x7, x1, x2        # Both treated as unsigned values
sw     x7, 12(x0)        # Should show difference from both MULH and MULHSU
# End program

Corresponding hexadecimal machine code

800000b7
ffe00113
07fff1b7
7ff18193
02208233
00402023
022092b3
00502223
0220a333
00602423
0220b3b3
00702623

Expected output of Program3

| Register | Value (Hex) | Value (Decimal) | Description |
|---|---|---|---|
| x1 | 0x80000000 | -2147483648 | From lui x1, 0x80000 |
| x2 | 0xFFFFFFFE | -2 | From addi x2, x0, -2 |
| x3 | 0x07FFF7FF | 134215679 | From lui + addi combination |
| x4 | 0x00000000 | 0 | Lower 32 bits of x1 * x2 |
| x5 | 0x00000001 | 1 | Upper 32 bits (signed × signed) |
| x6 | 0x80000001 | -2147483647 | Upper 32 bits (signed × unsigned) |
| x7 | 0x7FFFFFFF | 2147483647 | Upper 32 bits (unsigned × unsigned) |

| Address | Value (Hex) | Description |
|---|---|---|
| 0x00 | 0x00000000 | MUL result (lower 32 bits) |
| 0x04 | 0x00000001 | MULH result (signed × signed) |
| 0x08 | 0x80000001 | MULHSU result (signed × unsigned) |
| 0x0C | 0x7FFFFFFF | MULHU result (unsigned × unsigned) |
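The expected values above can be derived with a small Python reference model of the four multiply instructions, which differ only in how each operand is interpreted before the 64-bit product is formed. This is a sketch of the instruction semantics, not of the Multiplier RTL; the helper `to_signed` is an illustrative name.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(a: int, b: int) -> int:
    return (a * b) & MASK32                               # lower 32 bits


def mulh(a: int, b: int) -> int:
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32  # signed x signed


def mulhsu(a: int, b: int) -> int:
    return ((to_signed(a) * (b & MASK32)) >> 32) & MASK32  # signed x unsigned


def mulhu(a: int, b: int) -> int:
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32  # unsigned x unsigned


x1, x2 = 0x80000000, 0xFFFFFFFE
for op in (mul, mulh, mulhsu, mulhu):
    print(op.__name__, hex(op(x1, x2)))
```

Python integers are unbounded and `>>` is an arithmetic shift on negative values, so the upper half of the product can be taken directly without manual sign handling.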

Output of Program3

# Register File Contents:
# ----------------------
# x0 : 00000000    x1 : 80000000    x2 : fffffffe    x3 : 07fff7ff
# x4 : 00000000    x5 : 00000001    x6 : 80000001    x7 : 7fffffff
# x8 : 00000000    x9 : 00000000    x10: 00000000    x11: 00000000
# x12: 00000000    x13: 00000000    x14: 00000000    x15: 00000000
# x16: 00000000    x17: 00000000    x18: 00000000    x19: 00000000
# x20: 00000000    x21: 00000000    x22: 00000000    x23: 00000000
# x24: 00000000    x25: 00000000    x26: 00000000    x27: 00000000
# x28: 00000000    x29: 00000000    x30: 00000000    x31: 00000000
# 
# Data Memory Contents:
# --------------------
# Word[ 0]: 00000000    Word[ 1]: 00000001    Word[ 2]: 80000001    Word[ 3]: 7fffffff
# Word[ 4]: 00000000    Word[ 5]: 00000000    Word[ 6]: 00000000    Word[ 7]: 00000000
# Word[ 8]: 00000000    Word[ 9]: 00000000    Word[10]: 00000000    Word[11]: 00000000
# Word[12]: 00000000    Word[13]: 00000000    Word[14]: 00000000    Word[15]: 00000000

Program4

In this program, we verify the hazard detection and forwarding abilities of our CPU. To do so, we wrote a program with many data hazards, including load-use hazards.

# Initialize registers
addi x1, x0, 10          # x1 = 10
addi x2, x0, 20          # x2 = 20
addi x3, x0, 5           # x3 = 5
addi x4, x0, 3           # x4 = 3

# Store values into memory
sw   x1, 0(x0)           # Store x1 (10) at address 0x00000000
sw   x2, 4(x0)           # Store x2 (20) at address 0x00000004
sw   x3, 8(x0)           # Store x3 (5) at address 0x00000008
sw   x4, 12(x0)          # Store x4 (3) at address 0x0000000C

# Load values from memory and create load-use hazards
lw   x5, 0(x0)           # x5 = mem[0] = 10
add  x6, x5, x2          # x6 = x5 + x2 (data hazard with x5)
lw   x7, 4(x0)           # x7 = mem[4] = 20
sub  x8, x7, x3          # x8 = x7 - x3 (data hazard with x7)
lw   x9, 8(x0)           # x9 = mem[8] = 5
mul  x10, x9, x6         # x10 = x9 * x6 (data hazard with x6 and x9)
add  x11, x10, x8        # x11 = x10 + x8 (data hazard with x10 and x8)

# Additional operations to create more hazards
mul  x12, x11, x4        # x12 = x11 * x4 (data hazard with x11)
add  x13, x12, x5        # x13 = x12 + x5 (data hazard with x12 and x5)
sub  x14, x13, x9        # x14 = x13 - x9 (data hazard with x13 and x9)

Corresponding hexadecimal machine code

00a00093
01400113
00500193
00300213
00102023
00202223
00302423
00402623
00002283
00228333
00402383
40338433
00802483
02648533
008505b3
02458633
005606b3
40968733

Expected output of Program4

Registers

| Register | Value (Hex) | Value (Decimal) | Description |
|---|---|---|---|
| x0 | 0x00000000 | 0 | Zero register (always 0). |
| x1 | 0x0000000A | 10 | Initialized value. |
| x2 | 0x00000014 | 20 | Initialized value. |
| x3 | 0x00000005 | 5 | Initialized value. |
| x4 | 0x00000003 | 3 | Initialized value. |
| x5 | 0x0000000A | 10 | Loaded from memory[0]. |
| x6 | 0x0000001E | 30 | Result of x5 + x2. |
| x7 | 0x00000014 | 20 | Loaded from memory[4]. |
| x8 | 0x0000000F | 15 | Result of x7 - x3. |
| x9 | 0x00000005 | 5 | Loaded from memory[8]. |
| x10 | 0x00000096 | 150 | Result of x9 * x6. |
| x11 | 0x000000A5 | 165 | Result of x10 + x8. |
| x12 | 0x000001EF | 495 | Result of x11 * x4. |
| x13 | 0x000001F9 | 505 | Result of x12 + x5. |
| x14 | 0x000001F4 | 500 | Result of x13 - x9. |

Data Memory

| Address | Value (Hex) | Value (Decimal) | Description |
|---|---|---|---|
| Word[0] | 0x0000000A | 10 | Stored value of x1. |
| Word[1] | 0x00000014 | 20 | Stored value of x2. |
| Word[2] | 0x00000005 | 5 | Stored value of x3. |
| Word[3] | 0x00000003 | 3 | Stored value of x4. |
| Word[4-15] | 0x00000000 | 0 | Placeholder for unused memory. |

Output of Program4

# Register File Contents:
# ----------------------
# x0 : 00000000    x1 : 0000000a    x2 : 00000014    x3 : 00000005
# x4 : 00000003    x5 : 0000000a    x6 : 0000001e    x7 : 00000014
# x8 : 0000000f    x9 : 00000005    x10: 00000096    x11: 000000a5
# x12: 000001ef    x13: 000001f9    x14: 000001f4    x15: 00000000
# x16: 00000000    x17: 00000000    x18: 00000000    x19: 00000000
# x20: 00000000    x21: 00000000    x22: 00000000    x23: 00000000
# x24: 00000000    x25: 00000000    x26: 00000000    x27: 00000000
# x28: 00000000    x29: 00000000    x30: 00000000    x31: 00000000
# 
# Data Memory Contents:
# --------------------
# Word[ 0]: 0000000a    Word[ 1]: 00000014    Word[ 2]: 00000005    Word[ 3]: 00000003
# Word[ 4]: 00000000    Word[ 5]: 00000000    Word[ 6]: 00000000    Word[ 7]: 00000000
# Word[ 8]: 00000000    Word[ 9]: 00000000    Word[10]: 00000000    Word[11]: 00000000
# Word[12]: 00000000    Word[13]: 00000000    Word[14]: 00000000    Word[15]: 00000000

Program5

In this program, we test I-type and S-type instructions, especially those dealing with bytes and halfwords.

# Test program using various I-type and S-type instructions
addi  x1, x0, 10         # x1 = 10
addi  x2, x0, -5         # x2 = -5 (sign extended)
xori  x3, x1, 0xFF       # x3 = 10 ^ 0xFF = 0xF5
ori   x4, x2, 0x7F       # x4 = -5 | 0x7F
andi  x5, x3, 0xF        # x5 = 0xF5 & 0xF = 0x5
slli  x6, x1, 2          # x6 = 10 << 2 = 40
srli  x7, x3, 1          # x7 = 0xF5 >> 1
srai  x8, x2, 1          # x8 = -5 >> 1 (arithmetic)
slti  x9, x2, 0          # x9 = (-5 < 0) ? 1 : 0
sltiu x10, x1, 20        # x10 = (10 < 20) ? 1 : 0

# Store some values
sw   x1, 0(x0)           # Store x1(10) at mem[0]
sh   x3, 4(x0)           # Store x3(0xF5) as halfword at mem[4]
sb   x5, 6(x0)           # Store x5(5) as byte at mem[6]

# Load values back
lw   x11, 0(x0)          # Load word from mem[0]
lh   x12, 4(x0)          # Load halfword (signed) from mem[4]
lhu  x13, 4(x0)          # Load halfword (unsigned) from mem[4]
lb   x14, 6(x0)          # Load byte (signed) from mem[6]
lbu  x15, 6(x0)          # Load byte (unsigned) from mem[6]

Corresponding hexadecimal machine code

00a00093
ffb00113
0ff0c193
07f16213
00f1f293
00209313
0011d393
40115413
00012493
0140b513
00102023
00301223
00500323
00002583
00401603
00405683
00600703
00604783

Expected output of Program5

Register File Contents

| Register | Value (Hex) | Value (Decimal) | Explanation |
|---|---|---|---|
| x0 | 0x00000000 | 0 | Zero register (always 0). |
| x1 | 0x0000000A | 10 | addi x1, x0, 10. |
| x2 | 0xFFFFFFFB | -5 | addi x2, x0, -5. |
| x3 | 0x000000F5 | 245 | xori x3, x1, 0xFF → 10 ^ 0xFF = 0xF5. |
| x4 | 0xFFFFFFFF | -1 | ori x4, x2, 0x7F → -5 OR 0x7F = 0xFFFFFFFF. |
| x5 | 0x00000005 | 5 | andi x5, x3, 0xF → 0xF5 & 0xF = 0x5. |
| x6 | 0x00000028 | 40 | slli x6, x1, 2 → 10 << 2 = 40. |
| x7 | 0x0000007A | 122 | srli x7, x3, 1 → 0xF5 >> 1 = 0x7A. |
| x8 | 0xFFFFFFFD | -3 | srai x8, x2, 1 → -5 >> 1 = -3 (arithmetic shift keeps the sign). |
| x9 | 0x00000001 | 1 | slti x9, x2, 0 → (-5 < 0) ? 1 : 0. |
| x10 | 0x00000001 | 1 | sltiu x10, x1, 20 → (10 < 20) ? 1 : 0. |
| x11 | 0x0000000A | 10 | lw x11, 0(x0) → Loads word from mem[0]. |
| x12 | 0x000000F5 | 245 | lh x12, 4(x0) → Signed halfword from mem[4]. |
| x13 | 0x000000F5 | 245 | lhu x13, 4(x0) → Unsigned halfword from mem[4]. |
| x14 | 0xFFFFFFF5 | -11 | lb x14, 6(x0) → Signed byte from mem[6]. |
| x15 | 0x000000F5 | 245 | lbu x15, 6(x0) → Unsigned byte from mem[6]. |

Note: x14 and x15 list the values this design produces, which correspond to the low byte (0xF5) of Word[ 1]. After sb x5, 6(x0) the byte at address 6 holds 0x05, so an implementation that selects the loaded byte by address bits [1:0] would return 0x00000005 for both lb and lbu.

Data Memory Contents

| Word Address | Value (Hex) | Explanation |
|---|---|---|
| Word[ 0] | 0x0000000A | Stored by sw x1, 0(x0). |
| Word[ 1] | 0x000500F5 | Combined result of sh x3, 4(x0) and sb x5, 6(x0). |

Output of Program5

# Register File Contents:
# ----------------------
# x0 : 00000000    x1 : 0000000a    x2 : fffffffb    x3 : 000000f5
# x4 : ffffffff    x5 : 00000005    x6 : 00000028    x7 : 0000007a
# x8 : fffffffd    x9 : 00000001    x10: 00000001    x11: 0000000a
# x12: 000000f5    x13: 000000f5    x14: fffffff5    x15: 000000f5
# x16: 00000000    x17: 00000000    x18: 00000000    x19: 00000000
# x20: 00000000    x21: 00000000    x22: 00000000    x23: 00000000
# x24: 00000000    x25: 00000000    x26: 00000000    x27: 00000000
# x28: 00000000    x29: 00000000    x30: 00000000    x31: 00000000
# 
# Data Memory Contents:
# --------------------
# Word[ 0]: 0000000a    Word[ 1]: 000500f5    Word[ 2]: 00000000    Word[ 3]: 00000000
# Word[ 4]: 00000000    Word[ 5]: 00000000    Word[ 6]: 00000000    Word[ 7]: 00000000
# Word[ 8]: 00000000    Word[ 9]: 00000000    Word[10]: 00000000    Word[11]: 00000000
# Word[12]: 00000000    Word[13]: 00000000    Word[14]: 00000000    Word[15]: 00000000

Program6

In this program, we test U-type and B-type instructions.

# U-type instructions test
lui   x1, 0x12345        # x1 = 0x12345000
auipc x2, 0x1000         # x2 = PC + 0x1000000

# Test signed vs unsigned comparisons with distinct values
addi x3, x0, -5          # x3 = -5 (0xFFFFFFFB)
addi x4, x0, 10          # x4 = 10

# Store markers at different memory locations to track which path was taken
addi x5, x0, 1           # Path marker = 1
sw   x5, 0(x0)           # Store at mem[0] before comparison

# Test signed comparison
blt  x3, x4, signed_path # Should take (-5 < 10)
addi x5, x0, 20          # Won't execute
sw   x5, 4(x0)           # Won't execute
j    next_test

signed_path:
addi x5, x0, 10          # Will execute
sw   x5, 4(x0)           # Store at mem[1] to show path taken

next_test:
# Test unsigned comparison
bltu x3, x4, unsigned_path # Should not take (0xFFFFFFFB > 10)
addi x6, x0, 30          # Will execute
sw   x6, 8(x0)           # Store at mem[2]
j    done

unsigned_path:
addi x6, x0, 40          # Won't execute
sw   x6, 8(x0)           # Won't execute

done:

Corresponding hexadecimal machine code

123450b7
01000117
ffb00193
00a00213
00100293
00502023
0041c863
01400293
00502223
00c0006f
00a00293
00502223
0041e863
01e00313
00602423

Expected output of Program6

| Register | Value (Hex) | Value (Decimal) | Description |
|---|---|---|---|
| x1 | 0x12345000 | 305418240 | Upper 20 bits loaded by lui |
| x2 | 0x01000004 | 16777220 | PC + 0x1000000 from auipc |
| x3 | 0xFFFFFFFB | -5 | Negative value for comparison |
| x4 | 0x0000000A | 10 | Positive value for comparison |
| x5 | 0x0000000A | 10 | Shows signed branch was taken |
| x6 | 0x0000001E | 30 | Shows unsigned branch not taken |

| Address | Value (Hex) | Value (Decimal) | Description |
|---|---|---|---|
| 0x00 | 0x00000001 | 1 | Initial marker |
| 0x04 | 0x0000000A | 10 | Shows signed path taken |
| 0x08 | 0x0000001E | 30 | Shows unsigned path not taken |
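The two branch outcomes follow from how the same 32-bit pattern is interpreted. The Python sketch below models the BLT and BLTU conditions and the AUIPC result; the function names are illustrative, and the PC value of 4 reflects that auipc is the second instruction of the program.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def blt_taken(rs1: int, rs2: int) -> bool:
    return to_signed(rs1) < to_signed(rs2)       # signed comparison


def bltu_taken(rs1: int, rs2: int) -> bool:
    return (rs1 & MASK32) < (rs2 & MASK32)       # unsigned comparison


def auipc(pc: int, imm20: int) -> int:
    return (pc + (imm20 << 12)) & MASK32


x3, x4 = 0xFFFFFFFB, 10                          # -5 and 10
print(blt_taken(x3, x4), bltu_taken(x3, x4))
print(hex(auipc(pc=4, imm20=0x1000)))
```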

Output of Program6

# Register File Contents:
# ----------------------
# x0 : 00000000    x1 : 12345000    x2 : 01000004    x3 : fffffffb
# x4 : 0000000a    x5 : 0000000a    x6 : 0000001e    x7 : 00000000
# x8 : 00000000    x9 : 00000000    x10: 00000000    x11: 00000000
# x12: 00000000    x13: 00000000    x14: 00000000    x15: 00000000
# x16: 00000000    x17: 00000000    x18: 00000000    x19: 00000000
# x20: 00000000    x21: 00000000    x22: 00000000    x23: 00000000
# x24: 00000000    x25: 00000000    x26: 00000000    x27: 00000000
# x28: 00000000    x29: 00000000    x30: 00000000    x31: 00000000
# 
# Data Memory Contents:
# --------------------
# Word[ 0]: 00000001    Word[ 1]: 0000000a    Word[ 2]: 0000001e    Word[ 3]: 00000000
# Word[ 4]: 00000000    Word[ 5]: 00000000    Word[ 6]: 00000000    Word[ 7]: 00000000
# Word[ 8]: 00000000    Word[ 9]: 00000000    Word[10]: 00000000    Word[11]: 00000000
# Word[12]: 00000000    Word[13]: 00000000    Word[14]: 00000000    Word[15]: 00000000