# 5-stage pipelined RISC-V processor with RV32IM
> 蔡承璋, 張恩祥 [GitHub](https://github.com/dennis15984/RISC-V-CPU)

## Introduction
In this project, we write the RTL code for a pipelined CPU. This requires getting familiar with the RISC-V instruction set architecture, and we implement 45 instructions. The requirements cover R-type, I-type, S-type, B-type, U-type, and J-type instructions, which lets us revisit what was taught in computer organization.

## Block diagram
![Image 1](https://hackmd.io/_uploads/HJHGJq5P1g.jpg)

## CPU Description:
The CPU is organized into five pipeline stages: IF (Instruction Fetch), ID (Instruction Decode and Register File Read), EXE (Execution or Address Calculation), MEM (Data Memory Access), and WB (Write Back).

IF: This stage includes the PC Register (Program Counter) and IM (Instruction Memory). The PC increments by 4 on each clock cycle and passes the next instruction's address to IM. IM, which is an SRAM, retrieves the instruction and passes it to the ID stage in the next clock cycle. For instructions that redirect control flow, such as J-type or B-type instructions, the PC waits for the new address to be computed before sending the instruction address to IM.

ID: This stage primarily consists of the decoder (Control Unit), the Register File, and the Immediate Unit. The Control Unit decodes the 32-bit instruction fetched from IM and generates the control signals for the subsequent units. The stage also reads rs1 and rs2 from the Register File and provides them for further calculations. The Immediate Unit computes the immediate value from the instruction and passes it on for subsequent operations.

EXE: This stage includes the ALU (Arithmetic Logic Unit), which executes arithmetic operations; the Multiplier, which performs the four types of multiplication (MUL, MULH, MULHSU, MULHU) while all pipeline registers are stalled; the JB Unit, which resolves jumps and branches for B-type and J-type instructions; and several Mux units.
The ALU performs basic operations such as addition, subtraction, shifts, comparisons, OR, and AND. The Mux units in this stage select their sources based on control signals from the ID stage and on feedback from the Forwarding Unit. The JB Unit performs the comparison that decides whether a branch is taken. The jump signal is connected as well, which allows the Hazard Detection Unit to judge whether the IF stage needs to be flushed.

MEM: This stage primarily handles Data Memory (DM) operations, distinguishing between loads from DM and stores to DM. The Save Control unit generates the write signals for SW, SH, and SB. The CSR Unit also sits in this stage and counts the number of instructions and cycles.

WB: This stage selects whether the ALU result or the value loaded from DM is written back to the Register File. A value loaded from DM first goes through the Load Sign Extend operation.

Forwarding Unit: Forwarding is needed when a source register of a later instruction matches the destination register being written by an earlier instruction that is still in the pipeline. The unit compares rd in the MEM and WB stages with rs1 and rs2 in the EXE stage. On a match, it signals the EXE stage which stage's value to forward.
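The comparison described above can be sketched as a small software model. This is a minimal Python sketch, not the RTL: the function names and the string labels are hypothetical, and letting the MEM stage win over the WB stage when both match is an assumption (it is the usual rule, because MEM holds the more recent value).

```python
# Software model of a forwarding decision (illustrative names, not the RTL).
NO_FORWARD, FROM_MEM, FROM_WB = "none", "mem", "wb"


def select_forward(rs, mem_rd, mem_reg_write, wb_rd, wb_reg_write):
    """Pick the forwarding source for one EXE-stage source register."""
    if mem_reg_write and mem_rd == rs:
        return FROM_MEM  # assumed priority: MEM holds the newer value
    if wb_reg_write and wb_rd == rs:
        return FROM_WB
    return NO_FORWARD


def forwarding_unit(rs1, rs2, mem_rd, mem_reg_write, wb_rd, wb_reg_write):
    """Return the (operand A, operand B) forwarding sources."""
    return (
        select_forward(rs1, mem_rd, mem_reg_write, wb_rd, wb_reg_write),
        select_forward(rs2, mem_rd, mem_reg_write, wb_rd, wb_reg_write),
    )


# rs1 = x5 is produced by the instruction now in MEM; rs2 = x2 is not pending.
print(forwarding_unit(rs1=5, rs2=2, mem_rd=5, mem_reg_write=True,
                      wb_rd=1, wb_reg_write=True))  # ('mem', 'none')
```

Treating each operand independently keeps the model short; a hardware implementation may structure the same decision differently.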
![IMG_2703](https://hackmd.io/_uploads/ry9O8ccPJe.jpg)
```sv=
module Forwarding_Unit(forwardA, forwardB, RegAddr1, RegAddr2, MEM_RdAddr, WB_RdAddr, MEM_RegWrite, WB_RegWrite);
    input MEM_RegWrite, WB_RegWrite;
    input [4:0] RegAddr1, RegAddr2, MEM_RdAddr, WB_RdAddr;
    output logic [1:0] forwardA, forwardB;

    always_comb begin
        // rd in MEM matches rs1
        priority if (MEM_RegWrite && MEM_RdAddr == RegAddr1) begin
            if (WB_RegWrite && WB_RdAddr == RegAddr2) begin
                forwardA = `MEMRDA;
                forwardB = `WBRDB;
            end
            else begin
                forwardA = `MEMRDA;
                forwardB = `NOFB;
            end
        end
        // rd in MEM matches rs2
        else if (MEM_RegWrite && MEM_RdAddr == RegAddr2) begin
            if (WB_RegWrite && WB_RdAddr == RegAddr1) begin
                forwardA = `WBRDA;
                forwardB = `MEMRDB;
            end
            else begin
                forwardA = `NOFA;
                forwardB = `MEMRDB;
            end
        end
        // rd in WB matches rs1
        else if (WB_RegWrite && WB_RdAddr == RegAddr1) begin
            forwardA = `WBRDA;
            forwardB = `NOFB;
        end
        // rd in WB matches rs2
        else if (WB_RegWrite && WB_RdAddr == RegAddr2) begin
            forwardA = `NOFA;
            forwardB = `WBRDB;
        end
        // no forwarding needed
        else begin
            forwardA = `NOFA;
            forwardB = `NOFB;
        end
    end
endmodule
```

Hazard Detection: This unit sends signals to the pipeline registers to decide whether to flush or stall the pipeline. When a J-type or B-type instruction redirects the program flow, the signals from EXE are sent to this unit, which then sends flush signals to IFtoID and IDtoEXE because the instructions currently in those registers are not needed. When a load is followed by an instruction that needs the loaded value through forwarding, the DM data arrives one cycle late, so the unit sends a stall signal to IFtoID and PC to wait for the DM data before proceeding with the next instruction. It also sends a flush signal to IDtoEXE, which turns the instruction leaving the ID stage into a nop. Because multiplication takes more time, when the ALU operation is a multiplication the stall signals are sent to all five registers: PC, IFtoID, IDtoEXE, EXEtoMEM, and MEMtoWB.
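The three cases in the paragraph above can be summarized in a small software model. This is a minimal Python sketch under stated assumptions: the function name, the argument names, and the dictionary keys are hypothetical, and checking the multiplier stall first is an assumed priority that the text does not specify.

```python
def hazard_control(load_in_exe, exe_rd, id_rs1, id_rs2, branch_or_jump, mul_busy):
    """Return stall/flush decisions for the pipeline registers (illustrative model)."""
    ctrl = {
        "pc_stall": False, "if_id_stall": False, "if_id_flush": False,
        "id_exe_stall": False, "id_exe_flush": False,
        "exe_mem_stall": False, "mem_wb_stall": False,
    }
    if mul_busy:
        # Multiplication in progress: hold all five registers (assumed to take priority).
        for key in ("pc_stall", "if_id_stall", "id_exe_stall",
                    "exe_mem_stall", "mem_wb_stall"):
            ctrl[key] = True
    elif load_in_exe and exe_rd in (id_rs1, id_rs2):
        # Load-use hazard: hold PC and IFtoID, insert a nop into IDtoEXE.
        ctrl.update(pc_stall=True, if_id_stall=True, id_exe_flush=True)
    elif branch_or_jump:
        # Taken branch or jump: discard the two younger instructions.
        ctrl.update(if_id_flush=True, id_exe_flush=True)
    return ctrl


# lw x5, 0(x0) in EXE followed by add x6, x5, x2 in ID
decision = hazard_control(True, 5, 5, 2, False, False)
print(decision["pc_stall"], decision["id_exe_flush"])  # True True
```

Returning one dictionary per cycle makes it easy to compare the model against waveform values signal by signal.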
![IMG_2702](https://hackmd.io/_uploads/SkL5UccDyx.jpg)
```sv=
module HazardDetection(RegAddr1, RegAddr2, pc_stall, IF_flush, IF_stall, ID_stall, ID_flush, BranchorJump, RdAddr, MemRead);
    input [4:0] RegAddr1, RegAddr2, RdAddr;
    input MemRead;
    input BranchorJump;
    output logic pc_stall, IF_stall, IF_flush, ID_stall, ID_flush;

    always_comb begin
        if (MemRead && (RdAddr == RegAddr1 || RdAddr == RegAddr2)) begin // load-use data hazard
            pc_stall = 1'b1;
            IF_stall = 1'b1;
            IF_flush = 1'b0;
            ID_stall = 1'b0;
            ID_flush = 1'b1;
        end
        else if (BranchorJump) begin // branch or jump taken
            pc_stall = 1'b0;
            IF_stall = 1'b0;
            IF_flush = 1'b1;
            ID_stall = 1'b0;
            ID_flush = 1'b1;
        end
        else begin // no hazard
            pc_stall = 1'b0;
            IF_stall = 1'b0;
            IF_flush = 1'b0;
            ID_stall = 1'b0;
            ID_flush = 1'b0;
        end
    end
endmodule
```

## RV32IMC - Compression (C) Instructions
Our current design provides the basic structure for the **RV32IMC architecture**: it includes the base **RV32I** instructions as well as the extension for multiplication and division (M). However, we did not implement the **C** extension, which defines the **compressed instructions**, so the implemented instruction set is RV32IM.

**Why the "C" Extension Was Not Implemented:**
The C extension adds 16-bit encodings for common instructions to reduce code size, which can also improve performance in some scenarios. While this is advantageous in terms of memory efficiency, implementing the C extension requires additional logic to handle compressed instructions, such as:
* **Instruction Decoding:** Expanding 16-bit compressed instructions into 32-bit instructions adds complexity to the instruction pipeline.
* **Handling Pairs of Instructions:** Once 16-bit and 32-bit instructions are mixed, a 32-bit instruction can start on a 16-bit boundary, so the fetch stage may have to assemble one instruction from two separately fetched halves.
* **Backward Compatibility:** Ensuring compatibility between standard 32-bit instructions and compressed instructions requires careful design to handle the mixed instruction set.

For this specific project, we chose to focus on the core features of RV32IM without adding the compression logic. This decision was made to simplify the implementation and **avoid the extra complexity involved in decoding and managing compressed instructions.**

## Memory Model - SRAM Wrapper
In this section, we explain the **SRAM memory model** used in our design, including the **SRAM wrapper module** and how it interfaces with the actual SRAM memory.

**SRAM Wrapper Overview:**
The `SRAM_wrapper` module serves as an interface to the underlying SRAM module. It takes the control signals as inputs and passes them to the internal SRAM instance, which performs the memory operations.

**Internal SRAM Instance:**
The `SRAM_wrapper` module instantiates the SRAM module and passes the control and data signals to it. The internal memory operations work as follows:
* Addressing: The 14-bit address bus (A[13:0]) selects a specific memory location in the SRAM.
* Read/Write Control:
    * The 4-bit WEB (Write Enable) signal controls whether the incoming data on the DI bus is written to the SRAM.
    * The OE (Output Enable) signal controls whether the DO bus outputs the data from the SRAM.
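The interplay of the address bus and the write-enable bits can be illustrated with a small behavioral model. This is a minimal Python sketch, not the SRAM used in the project: the class name is hypothetical, and two points are assumptions rather than facts from the design, namely that each WEB bit is active-low and guards one byte lane, and that a disabled OE is modeled by returning `None`.

```python
class ByteLaneSram:
    """Word-addressed memory with four independently writable byte lanes."""

    def __init__(self, addr_bits=14):
        self.words = [0] * (1 << addr_bits)  # 2**14 = 16384 words

    def write(self, addr, data, web):
        """Write `data` to word `addr`; a 0 in bit i of `web` enables lane i (assumed)."""
        word = self.words[addr]
        for lane in range(4):
            if not (web >> lane) & 1:
                mask = 0xFF << (8 * lane)
                word = (word & ~mask) | (data & mask)
        self.words[addr] = word & 0xFFFFFFFF

    def read(self, addr, oe=True):
        """Return the stored word, or None when the output is disabled (assumed)."""
        return self.words[addr] if oe else None


sram = ByteLaneSram()
sram.write(1, 0x000000F5, web=0b1100)  # halfword store to byte address 4 (lanes 0-1)
sram.write(1, 0x00050000, web=0b1011)  # byte store to byte address 6 (lane 2)
print(hex(sram.read(1)))  # 0x500f5
```

The two writes mirror the `sh` and `sb` stores of Program5 further below, whose recorded memory dump shows `000500f5` in Word[1].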
<details>
<summary>Click to view SRAM Wrapper code</summary>

```sv
module SRAM_wrapper (
    input         CK,
    input         CS,
    input         OE,
    input  [3:0]  WEB,
    input  [13:0] A,
    input  [31:0] DI,
    output [31:0] DO
);

    SRAM i_SRAM (
        // Address bits
        .A0  (A[0]),   .A1  (A[1]),   .A2  (A[2]),   .A3  (A[3]),
        .A4  (A[4]),   .A5  (A[5]),   .A6  (A[6]),   .A7  (A[7]),
        .A8  (A[8]),   .A9  (A[9]),   .A10 (A[10]),  .A11 (A[11]),
        .A12 (A[12]),  .A13 (A[13]),
        // Data output bits
        .DO0 (DO[0]),  .DO1 (DO[1]),  .DO2 (DO[2]),  .DO3 (DO[3]),
        .DO4 (DO[4]),  .DO5 (DO[5]),  .DO6 (DO[6]),  .DO7 (DO[7]),
        .DO8 (DO[8]),  .DO9 (DO[9]),  .DO10(DO[10]), .DO11(DO[11]),
        .DO12(DO[12]), .DO13(DO[13]), .DO14(DO[14]), .DO15(DO[15]),
        .DO16(DO[16]), .DO17(DO[17]), .DO18(DO[18]), .DO19(DO[19]),
        .DO20(DO[20]), .DO21(DO[21]), .DO22(DO[22]), .DO23(DO[23]),
        .DO24(DO[24]), .DO25(DO[25]), .DO26(DO[26]), .DO27(DO[27]),
        .DO28(DO[28]), .DO29(DO[29]), .DO30(DO[30]), .DO31(DO[31]),
        // Data input bits
        .DI0 (DI[0]),  .DI1 (DI[1]),  .DI2 (DI[2]),  .DI3 (DI[3]),
        .DI4 (DI[4]),  .DI5 (DI[5]),  .DI6 (DI[6]),  .DI7 (DI[7]),
        .DI8 (DI[8]),  .DI9 (DI[9]),  .DI10(DI[10]), .DI11(DI[11]),
        .DI12(DI[12]), .DI13(DI[13]), .DI14(DI[14]), .DI15(DI[15]),
        .DI16(DI[16]), .DI17(DI[17]), .DI18(DI[18]), .DI19(DI[19]),
        .DI20(DI[20]), .DI21(DI[21]), .DI22(DI[22]), .DI23(DI[23]),
        .DI24(DI[24]), .DI25(DI[25]), .DI26(DI[26]), .DI27(DI[27]),
        .DI28(DI[28]), .DI29(DI[29]), .DI30(DI[30]), .DI31(DI[31]),
        // Clock and control
        .CK  (CK),
        .WEB0(WEB[0]), .WEB1(WEB[1]), .WEB2(WEB[2]), .WEB3(WEB[3]),
        .OE  (OE),
        .CS  (CS)
    );
endmodule
```

</details>

## Simulation Environment Setup
This section describes how to set up the simulation environment for the 5-stage pipelined RISC-V processor (RV32IM) using **ModelSim - Intel FPGA Starter Edition**.
Create a working library:
```bash
vlib work
vmap work <path_to_work_directory>
```
Compile the SystemVerilog files (`vlog` is the Verilog/SystemVerilog compiler; `vcom` is for VHDL):
```bash
vlog -sv HazardDetection.sv
```
Run the simulation:
```bash
vsim work.HazardDetection
```
View the waveform to monitor signals:
```bash
add wave -position end sim:/HazardDetection/*
```

## Verify the correctness of this RISC-V CPU
## Compile and Simulation
First, we use the `sim.do` script below in ModelSim to compile the design and start the simulation by entering `do sim.do` in the ModelSim terminal.
```c
# Create work library
vlib work

# Compile design files in correct order
vlog -sv parameter_define.sv
vlog -sv PCReg.sv
vlog -sv IFtoID.sv
vlog -sv RegFile.sv
vlog -sv Control_Unit.sv
vlog -sv Immediate_Unit.sv
vlog -sv IDtoEXE.sv
vlog -sv ALU.sv
vlog -sv MUX.sv
vlog -sv MUX3.sv
vlog -sv ConditionChecker.sv
vlog -sv EXEtoMEM.sv
vlog -sv MEMtoWB.sv
vlog -sv Forwarding_Unit.sv
vlog -sv LoadSignExtend.sv
vlog -sv SaveControl.sv
vlog -sv HazardDetection.sv
vlog -sv CSR.sv
vlog -sv SRAM.sv
vlog -sv SRAM_wrapper.sv
vlog -sv CPU.sv
vlog -sv cpu_tb.sv

# Start simulation
vsim -c cpu_tb

# Run simulation
run -all
```
## Testbench
Second, we explain our testbench program `cpu_tb.sv`.
```sv=
module cpu_tb();
    // Clock and reset signals
    logic clk;
    logic rst;

    // Memory interface signals
    logic IM_cs, DM_cs;
    logic IM_oe, DM_oe;
    logic [3:0] IM_web, DM_web;
    logic [13:0] IM_addr, DM_addr;
    logic [31:0] IM_datain, IM_dataout;
    logic [31:0] DM_datain, DM_dataout;

    // File reading variables
    int i;
    int file;
    string line;
    logic [31:0] instruction;

    // Helper function to get memory word
    function logic [31:0] get_mem_word(input int addr);
        return {DM1.i_SRAM.Memory_byte3[addr],
                DM1.i_SRAM.Memory_byte2[addr],
                DM1.i_SRAM.Memory_byte1[addr],
                DM1.i_SRAM.Memory_byte0[addr]};
    endfunction

    // Helper function to convert hex character to 4-bit value
    function logic [3:0] hex_to_4bit(input byte hex_char);
        if (hex_char >= "0" && hex_char <= "9")
            return hex_char - "0";
        else if (hex_char >= "a" &&
                 hex_char <= "f")
            return hex_char - "a" + 10;
        else if (hex_char >= "A" && hex_char <= "F")
            return hex_char - "A" + 10;
        else
            return 4'h0;
    endfunction

    // Helper function to print registers
    function void print_registers();
        $display("\nRegister File Contents:");
        $display("----------------------");
        for (int i = 0; i < 32; i += 4) begin
            $display("x%-2d: %8h x%-2d: %8h x%-2d: %8h x%-2d: %8h",
                     i,   cpu_inst.RF.RegMem[i],
                     i+1, cpu_inst.RF.RegMem[i+1],
                     i+2, cpu_inst.RF.RegMem[i+2],
                     i+3, cpu_inst.RF.RegMem[i+3]);
        end
    endfunction

    // Helper function to print data memory contents
    function void print_data_mem();
        $display("\nData Memory Contents:");
        $display("--------------------");
        for (int i = 0; i < 16; i += 4) begin
            $display("Word[%2d]: %8h Word[%2d]: %8h Word[%2d]: %8h Word[%2d]: %8h",
                     i,   get_mem_word(i),
                     i+1, get_mem_word(i+1),
                     i+2, get_mem_word(i+2),
                     i+3, get_mem_word(i+3));
        end
    endfunction

    // Clock generation
    initial begin
        clk = 0;
        forever #5 clk = ~clk;
    end

    // Reset generation
    initial begin
        rst = 1;
        #20 rst = 0;
    end

    // CPU instantiation
    CPU cpu_inst (
        .clk(clk),
        .rst(rst),
        .IM_cs(IM_cs),
        .DM_cs(DM_cs),
        .IM_oe(IM_oe),
        .DM_oe(DM_oe),
        .IM_web(IM_web),
        .DM_web(DM_web),
        .IM_addr(IM_addr),
        .DM_addr(DM_addr),
        .IM_datain(IM_datain),
        .IM_dataout(IM_dataout),
        .DM_datain(DM_datain),
        .DM_dataout(DM_dataout)
    );

    // Instruction Memory
    SRAM_wrapper IM1 (
        .CK(clk),
        .CS(IM_cs),
        .OE(IM_oe),
        .WEB(IM_web),
        .A(IM_addr),
        .DI(IM_datain),
        .DO(IM_dataout)
    );

    // Data Memory
    SRAM_wrapper DM1 (
        .CK(clk),
        .CS(DM_cs),
        .OE(DM_oe),
        .WEB(DM_web),
        .A(DM_addr),
        .DI(DM_datain),
        .DO(DM_dataout)
    );

    // Main test sequence
    initial begin
        i = 0;

        // Initialize memories
        for (int idx = 0; idx < 16384; idx++) begin
            // Initialize both memories to 0
            IM1.i_SRAM.Memory_byte0[idx] = 8'h00;
            IM1.i_SRAM.Memory_byte1[idx] = 8'h00;
            IM1.i_SRAM.Memory_byte2[idx] = 8'h00;
            IM1.i_SRAM.Memory_byte3[idx] = 8'h00;
            DM1.i_SRAM.Memory_byte0[idx] = 8'h00;
            DM1.i_SRAM.Memory_byte1[idx] = 8'h00;
            DM1.i_SRAM.Memory_byte2[idx] = 8'h00;
            DM1.i_SRAM.Memory_byte3[idx] = 8'h00;
        end

        // Load test.mem into instruction memory
        file = $fopen("test.mem", "r");
        if (file) begin
            while (!$feof(file) && i < 1024) begin
                void'($fgets(line, file));
                // Skip comment lines and empty lines
                if (line.len() > 0 && line[0] != "/" && line[1] != "/") begin
                    // Read 8 hex characters for the instruction
                    if (line.len() >= 8) begin
                        instruction[31:28] = hex_to_4bit(line[0]);
                        instruction[27:24] = hex_to_4bit(line[1]);
                        instruction[23:20] = hex_to_4bit(line[2]);
                        instruction[19:16] = hex_to_4bit(line[3]);
                        instruction[15:12] = hex_to_4bit(line[4]);
                        instruction[11:8]  = hex_to_4bit(line[5]);
                        instruction[7:4]   = hex_to_4bit(line[6]);
                        instruction[3:0]   = hex_to_4bit(line[7]);

                        // Store in instruction memory
                        IM1.i_SRAM.Memory_byte3[i] = instruction[31:24];
                        IM1.i_SRAM.Memory_byte2[i] = instruction[23:16];
                        IM1.i_SRAM.Memory_byte1[i] = instruction[15:8];
                        IM1.i_SRAM.Memory_byte0[i] = instruction[7:0];
                        i = i + 1;
                    end
                end
            end
            $fclose(file);
        end
        else begin
            $display("Error: Could not open test.mem");
            $finish;
        end

        // Wait for reset
        @(negedge rst);

        // Run program
        repeat(2000) @(posedge clk);

        // Print final state
        print_registers();
        print_data_mem();
        $stop;
    end
endmodule
```
### 1.Basic Purpose:
This testbench runs a RISC-V program in hex format from `test.mem`.
It simulates the CPU together with the instruction and data memories.
At the end, it shows the final state of all registers and of the data memory.

### 2.Key Components:
- CPU instance (cpu_inst)
- Instruction Memory (IM1)
- Data Memory (DM1)

### 3.Test flow
```sv=
initial begin
    // 1. Initialize all memory to zero
    for (int idx = 0; idx < 16384; idx++) begin
        IM1.i_SRAM.Memory_byte[0-3][idx] = 8'h00;
        DM1.i_SRAM.Memory_byte[0-3][idx] = 8'h00;
    end

    // 2. Read and load program from test.mem
    file = $fopen("test.mem", "r");
    // Read hex instructions line by line
    // Each line should contain 8 hex characters (32-bit instruction)

    // 3. Wait for reset
    @(negedge rst);

    // 4.
    // Run program for 2000 clock cycles
    repeat(2000) @(posedge clk);

    // 5. Print results
    print_registers();
    print_data_mem();
end
```
### 4.How to modify
```sv=
// A. Change number of cycles:
// Find this line:
repeat(2000) @(posedge clk);
// Change 2000 to desired number

// B. Change memory size:
// Find this line:
for (int idx = 0; idx < 16384; idx++) begin
// Change 16384 to desired size

// C. Change displayed memory range:
// In print_data_mem function:
for (int i = 0; i < 16; i += 4) begin
// Change 16 to show more/less memory words
```
### 5.test.mem Format:
- Each line should contain one 32-bit instruction in hexadecimal
- 8 characters per line (e.g., "00500093")
- Can include comment lines starting with "//"
- Memory addresses start at 0 by default
- In this project, we write the test programs in RISC-V assembly ourselves and use Venus to generate the hexadecimal machine code for test.mem

## Start Testing
Because of our memory model (`SRAM.sv` and `SRAM_wrapper.sv`), we do not support `.data` usage, and our memory starts at address 0 by default.
Below is an example:
```sv=
# Common practice with .data
.data                  # Data segment
array: .word 1,2,3     # Array starting at some address

# Our CPU requires (memory starts at 0):
addi x1, x0, 1         # x1 = 1
addi x2, x0, 2         # x2 = 2
addi x3, x0, 3         # x3 = 3
sw x1, 0(x0)           # Store 1 at addr 0
sw x2, 4(x0)           # Store 2 at addr 4
sw x3, 8(x0)           # Store 3 at addr 8
```
### Program1 - Bubble sort
```sv=
# Initialize array [5, 2, 7, 1, 3]
addi x1, x0, 5         # Load 5
sw x1, 0(x0)           # Store at mem[0]
addi x1, x0, 2         # Load 2
sw x1, 4(x0)           # Store at mem[1]
addi x1, x0, 7         # Load 7
sw x1, 8(x0)           # Store at mem[2]
addi x1, x0, 1         # Load 1
sw x1, 12(x0)          # Store at mem[3]
addi x1, x0, 3         # Load 3
sw x1, 16(x0)          # Store at mem[4]

# Initialize counters
addi x2, x0, 4         # n-1 = 4 (outer loop limit)
addi x3, x0, 0         # i = 0 (outer loop counter)

# Outer Loop
outer:
    beq x3, x2, done   # if i == n-1, jump to done
    addi x4, x0, 0     # j = 0 (inner loop counter)
    sub x5, x2, x3     # limit = (n-1)-i
    jal x0, inner      # Jump to inner loop

# Inner Loop
inner:
    beq x4, x5, next_i # if j == limit, jump to next_i
    slli x6, x4, 2     # x6 = j * 4
    lw x7, 0(x6)       # load A[j]
    lw x8, 4(x6)       # load A[j+1]
    bge x8, x7, skip   # if A[j+1] >= A[j], skip swap

    # Swap values
    sw x8, 0(x6)       # store smaller value
    sw x7, 4(x6)       # store larger value

skip:
    addi x4, x4, 1     # j++
    jal x0, inner      # Jump back to inner loop

# Move to the Next Outer Loop Iteration
next_i:
    addi x3, x3, 1     # i++
    jal x0, outer      # Jump back to outer loop

# End Program
done:
    # Program terminates here
```
#### Corresponding hexadecimal machine code
```c
00500093
00102023
00200093
00102223
00700093
00102423
00100093
00102623
00300093
00102823
00400113
00000193
02218e63
00000213
403102b3
0040006f
02520263
00221313
00032383
00432403
00745663
00832023
00732223
00120213
fe1ff06f
00118193
fc9ff06f
```
#### Output of Program1
```c
# Data Memory Contents:
# --------------------
# Word[ 0]: 00000001 Word[ 1]: 00000002 Word[ 2]: 00000003 Word[ 3]: 00000005
# Word[ 4]: 00000007 Word[ 5]: 00000000 Word[ 6]: 00000000 Word[ 7]: 00000000
# Word[ 8]: 00000000 Word[ 9]: 00000000 Word[10]: 00000000 Word[11]: 00000000
# Word[12]: 00000000 Word[13]: 00000000 Word[14]: 00000000 Word[15]: 00000000
```
As the output above shows, [5, 2, 7, 1, 3] is sorted into [1, 2, 3, 5, 7].

### Program2
In this second program, we compute the dot product of two arrays.
```sv=
# Compute the dot product of two arrays
# Array A: [3, -4, 2] (signed, manually initialized in memory)
# Array B: [5, 6, 7] (unsigned, manually initialized in memory)

# Initialize array A
addi x10, x0, 3        # Load 3 into x10
sw x10, 0(x0)          # Store A[0] at address 0x00000000
addi x10, x0, -4       # Load -4 into x10
sw x10, 4(x0)          # Store A[1] at address 0x00000004
addi x10, x0, 2        # Load 2 into x10
sw x10, 8(x0)          # Store A[2] at address 0x00000008

# Initialize array B
addi x10, x0, 5        # Load 5 into x10
sw x10, 12(x0)         # Store B[0] at address 0x0000000C
addi x10, x0, 6        # Load 6 into x10
sw x10, 16(x0)         # Store B[1] at address 0x00000010
addi x10, x0, 7        # Load 7 into x10
sw x10, 20(x0)         # Store B[2] at address 0x00000014

# Initialize variables
addi x5, x0, 0         # Base address for array A (0x00000000)
addi x6, x0, 12        # Base address for array B (0x0000000C)
addi x2, x0, 3         # n = 3 (array size)
addi x3, x0, 0         # i = 0 (index counter)
addi x4, x0, 0         # dot_product = 0 (result accumulator)

# Loop to compute the dot product
loop:
    beq x3, x2, done   # If i == n, exit loop

    # Load A[i] (signed) and B[i] (unsigned)
    slli x7, x3, 2     # x7 = i * 4 (offset for arrays)
    add x8, x5, x7     # Address of A[i]
    add x9, x6, x7     # Address of B[i]
    lw x10, 0(x8)      # Load A[i] into x10 (signed)
    lw x11, 0(x9)      # Load B[i] into x11 (unsigned)

    # Compute A[i] * B[i] using MUL (lower 32 bits)
    mul x12, x10, x11  # x12 = A[i] * B[i] (lower 32 bits)

    # Compute A[i] * B[i] using MULH (upper 32 bits, signed × signed)
    mulh x13, x10, x11 # x13 = Upper 32 bits of A[i] * B[i] (signed × signed)

    # Compute A[i] * B[i] using MULHSU (upper 32 bits, signed × unsigned)
    mulhsu x14, x10, x11 # x14 = Upper 32 bits of
                         # A[i] * B[i] (signed × unsigned)

    # Compute B[i] * B[i] using MULHU (upper 32 bits, unsigned × unsigned)
    mulhu x15, x11, x11 # x15 = Upper 32 bits of B[i] * B[i] (unsigned × unsigned)

    # Accumulate the lower 32 bits for dot product
    add x4, x4, x12    # dot_product += A[i] * B[i] (lower 32 bits)

    # Increment index
    addi x3, x3, 1     # i++
    jal x0, loop       # Jump back to loop

# Store the final dot product
done:
    sw x4, 24(x0)      # Store dot_product in mem[6]

# End program
```
#### Corresponding hexadecimal machine code
```c
00300513
00a02023
ffc00513
00a02223
00200513
00a02423
00500513
00a02623
00600513
00a02823
00700513
00a02a23
00000293
00c00313
00300113
00000193
00000213
02218a63
00219393
00728433
007304b3
00042503
0004a583
02b50633
02b516b3
02b52733
02b5b7b3
00c20233
00118193
fd1ff06f
00402c23
```
#### Expected output of Program2
| Address  | Value (Hex) | Value (Decimal) | Description               |
|----------|-------------|-----------------|---------------------------|
| Word[ 0] | 0x00000003  | 3               | Array A[0].               |
| Word[ 1] | 0xFFFFFFFC  | -4              | Array A[1].               |
| Word[ 2] | 0x00000002  | 2               | Array A[2].               |
| Word[ 3] | 0x00000005  | 5               | Array B[0].               |
| Word[ 4] | 0x00000006  | 6               | Array B[1].               |
| Word[ 5] | 0x00000007  | 7               | Array B[2].               |
| Word[ 6] | 0x00000005  | 5               | Final dot product result. |

#### Output of Program2
```c
# Data Memory Contents:
# --------------------
# Word[ 0]: 00000003 Word[ 1]: fffffffc Word[ 2]: 00000002 Word[ 3]: 00000005
# Word[ 4]: 00000006 Word[ 5]: 00000007 Word[ 6]: 00000005 Word[ 7]: 00000000
# Word[ 8]: 00000000 Word[ 9]: 00000000 Word[10]: 00000000 Word[11]: 00000000
# Word[12]: 00000000 Word[13]: 00000000 Word[14]: 00000000 Word[15]: 00000000
```
### Program3
In this program, we test the four multiplication-related instructions: MUL, MULH, MULHSU, and MULHU.
```sv=
# Initialize test values
lui x1, 0x80000        # x1 = 0x80000000 (most negative signed 32-bit)
addi x2, x0, -2        # x2 = -2 (0xFFFFFFFE)
lui x3, 0x7FFF         # x3 = 0x07FFF000
addi x3, x3, 0x7FF     # x3 = 0x07FFF7FF

# Test MUL (lower 32 bits only)
mul x4, x1, x2         # (-2^31) * (-2) = 2^32
sw x4, 0(x0)           # Should store 0x00000000 (lower 32 bits)

# Test MULH (signed × signed)
mulh x5, x1, x2        # (-2^31) * (-2) = 2^32, upper bits
sw x5, 4(x0)           # Should store 0x00000001 (upper 32 bits)

# Test MULHSU (signed × unsigned)
mulhsu x6, x1, x2      # (-2^31) treated as signed, (-2) treated as unsigned
sw x6, 8(x0)           # Should show difference from MULH

# Test MULHU (unsigned × unsigned)
mulhu x7, x1, x2       # Both treated as unsigned values
sw x7, 12(x0)          # Should show difference from both MULH and MULHSU

# End program
```
#### Corresponding hexadecimal machine code
```c
800000b7
ffe00113
07fff1b7
7ff18193
02208233
00402023
022092b3
00502223
0220a333
00602423
0220b3b3
00702623
```
#### Expected output of Program3
| Register | Value (Hex) | Value (Decimal) | Description                         |
|----------|-------------|-----------------|-------------------------------------|
| x1       | 0x80000000  | -2147483648     | From lui x1, 0x80000                |
| x2       | 0xFFFFFFFE  | -2              | From addi x2, x0, -2                |
| x3       | 0x07FFF7FF  | 134215679       | From lui + addi combination         |
| x4       | 0x00000000  | 0               | Lower 32 bits of x1 * x2            |
| x5       | 0x00000001  | 1               | Upper 32 bits (signed × signed)     |
| x6       | 0x80000001  | -2147483647     | Upper 32 bits (signed × unsigned)   |
| x7       | 0x7FFFFFFF  | 2147483647      | Upper 32 bits (unsigned × unsigned) |

| Address | Value (Hex) | Description                        |
|---------|-------------|------------------------------------|
| 0x00    | 0x00000000  | MUL result (lower 32 bits)         |
| 0x04    | 0x00000001  | MULH result (signed × signed)      |
| 0x08    | 0x80000001  | MULHSU result (signed × unsigned)  |
| 0x0C    | 0x7FFFFFFF  | MULHU result (unsigned × unsigned) |

#### Output of Program3
```c
# Register File Contents:
# ----------------------
# x0 : 00000000 x1 : 80000000 x2 : fffffffe x3 : 07fff7ff
# x4 : 00000000 x5 : 00000001 x6 : 80000001 x7 : 7fffffff
# x8 : 00000000 x9 : 00000000 x10: 00000000 x11: 00000000
# x12: 00000000 x13: 00000000 x14: 00000000 x15: 00000000
# x16: 00000000 x17: 00000000 x18: 00000000 x19: 00000000
# x20: 00000000 x21: 00000000 x22: 00000000 x23: 00000000
# x24: 00000000 x25: 00000000 x26: 00000000 x27: 00000000
# x28: 00000000 x29: 00000000 x30: 00000000 x31: 00000000
#
# Data Memory Contents:
# --------------------
# Word[ 0]: 00000000 Word[ 1]: 00000001 Word[ 2]: 80000001 Word[ 3]: 7fffffff
# Word[ 4]: 00000000 Word[ 5]: 00000000 Word[ 6]: 00000000 Word[ 7]: 00000000
# Word[ 8]: 00000000 Word[ 9]: 00000000 Word[10]: 00000000 Word[11]: 00000000
# Word[12]: 00000000 Word[13]: 00000000 Word[14]: 00000000 Word[15]: 00000000
```
### Program4
In this program, we verify the hazard detection and forwarding ability of our CPU. To do so, we created a program with many hazards, including load-use hazards.
```sv=
# Initialize registers
addi x1, x0, 10        # x1 = 10
addi x2, x0, 20        # x2 = 20
addi x3, x0, 5         # x3 = 5
addi x4, x0, 3         # x4 = 3

# Store values into memory
sw x1, 0(x0)           # Store x1 (10) at address 0x00000000
sw x2, 4(x0)           # Store x2 (20) at address 0x00000004
sw x3, 8(x0)           # Store x3 (5) at address 0x00000008
sw x4, 12(x0)          # Store x4 (3) at address 0x0000000C

# Load values from memory and create load-use hazards
lw x5, 0(x0)           # x5 = mem[0] = 10
add x6, x5, x2         # x6 = x5 + x2 (data hazard with x5)
lw x7, 4(x0)           # x7 = mem[4] = 20
sub x8, x7, x3         # x8 = x7 - x3 (data hazard with x7)
lw x9, 8(x0)           # x9 = mem[8] = 5
mul x10, x9, x6        # x10 = x9 * x6 (data hazard with x6 and x9)
add x11, x10, x8       # x11 = x10 + x8 (data hazard with x10 and x8)

# Additional operations to create more hazards
mul x12, x11, x4       # x12 = x11 * x4 (data hazard with x11)
add x13, x12, x5       # x13 = x12 + x5 (data hazard with x12 and x5)
sub x14, x13, x9       # x14 = x13 - x9 (data hazard with x13 and x9)
```
#### Corresponding hexadecimal machine code
```c
00a00093
01400113
00500193
00300213
00102023
00202223
00302423
00402623
00002283
00228333
00402383
40338433
00802483
02648533
008505b3
02458633
005606b3
40968733
```
#### Expected output of Program4
**Registers**
| **Register** | **Value (Hex)** | **Value (Decimal)** | **Description**           |
|--------------|-----------------|---------------------|---------------------------|
| x0           | 0x00000000      | 0                   | Zero register (always 0). |
| x1           | 0x0000000A      | 10                  | Initialized value.        |
| x2           | 0x00000014      | 20                  | Initialized value.        |
| x3           | 0x00000005      | 5                   | Initialized value.        |
| x4           | 0x00000003      | 3                   | Initialized value.        |
| x5           | 0x0000000A      | 10                  | Loaded from memory[0].    |
| x6           | 0x0000001E      | 30                  | Result of `x5 + x2`.      |
| x7           | 0x00000014      | 20                  | Loaded from memory[4].    |
| x8           | 0x0000000F      | 15                  | Result of `x7 - x3`.      |
| x9           | 0x00000005      | 5                   | Loaded from memory[8].    |
| x10          | 0x00000096      | 150                 | Result of `x9 * x6`.      |
| x11          | 0x000000A5      | 165                 | Result of `x10 + x8`.     |
| x12          | 0x000001EF      | 495                 | Result of `x11 * x4`.     |
| x13          | 0x000001F9      | 505                 | Result of `x12 + x5`.     |
| x14          | 0x000001F4      | 500                 | Result of `x13 - x9`.     |

---

**Data Memory**
| **Address** | **Value (Hex)** | **Value (Decimal)** | **Description**                |
|-------------|-----------------|---------------------|--------------------------------|
| Word[0]     | 0x0000000A      | 10                  | Stored value of `x1`.          |
| Word[1]     | 0x00000014      | 20                  | Stored value of `x2`.          |
| Word[2]     | 0x00000005      | 5                   | Stored value of `x3`.          |
| Word[3]     | 0x00000003      | 3                   | Stored value of `x4`.          |
| Word[4-15]  | 0x00000000      | 0                   | Placeholder for unused memory. |

#### Output of Program4
```c
# Register File Contents:
# ----------------------
# x0 : 00000000 x1 : 0000000a x2 : 00000014 x3 : 00000005
# x4 : 00000003 x5 : 0000000a x6 : 0000001e x7 : 00000014
# x8 : 0000000f x9 : 00000005 x10: 00000096 x11: 000000a5
# x12: 000001ef x13: 000001f9 x14: 000001f4 x15: 00000000
# x16: 00000000 x17: 00000000 x18: 00000000 x19: 00000000
# x20: 00000000 x21: 00000000 x22: 00000000 x23: 00000000
# x24: 00000000 x25: 00000000 x26: 00000000 x27: 00000000
# x28: 00000000 x29: 00000000 x30: 00000000 x31: 00000000
#
# Data Memory Contents:
# --------------------
# Word[ 0]: 0000000a Word[ 1]: 00000014 Word[ 2]: 00000005 Word[ 3]: 00000003
# Word[ 4]: 00000000 Word[ 5]: 00000000 Word[ 6]: 00000000 Word[ 7]: 00000000
# Word[ 8]: 00000000 Word[ 9]: 00000000 Word[10]: 00000000 Word[11]: 00000000
# Word[12]: 00000000 Word[13]: 00000000 Word[14]: 00000000 Word[15]: 00000000
```
### Program5
In this program, we test I-type and S-type instructions, especially those dealing with bytes and half-words.
```sv=
# Test program using various I-type and S-type instructions
addi x1, x0, 10        # x1 = 10
addi x2, x0, -5        # x2 = -5 (sign extended)
xori x3, x1, 0xFF      # x3 = 10 ^ 0xFF = 0xF5
ori x4, x2, 0x7F       # x4 = -5 | 0x7F
andi x5, x3, 0xF       # x5 = 0xF5 & 0xF = 0x5
slli x6, x1, 2         # x6 = 10 << 2 = 40
srli x7, x3, 1         # x7 = 0xF5 >> 1
srai x8, x2, 1         # x8 =
                       # -5 >> 1 (arithmetic)
slti x9, x2, 0         # x9 = (-5 < 0) ? 1 : 0
sltiu x10, x1, 20      # x10 = (10 < 20) ? 1 : 0

# Store some values
sw x1, 0(x0)           # Store x1(10) at mem[0]
sh x3, 4(x0)           # Store x3(0xF5) as halfword at mem[4]
sb x5, 6(x0)           # Store x5(5) as byte at mem[6]

# Load values back
lw x11, 0(x0)          # Load word from mem[0]
lh x12, 4(x0)          # Load halfword (signed) from mem[4]
lhu x13, 4(x0)         # Load halfword (unsigned) from mem[4]
lb x14, 6(x0)          # Load byte (signed) from mem[6]
lbu x15, 6(x0)         # Load byte (unsigned) from mem[6]
```
#### Corresponding hexadecimal machine code
```c
00a00093
ffb00113
0ff0c193
07f16213
00f1f293
00209313
0011d393
40115413
00012493
0140b513
00102023
00301223
00500323
00002583
00401603
00405683
00600703
00604783
```
#### Expected output of Program5
**Registers**
| **Register** | **Value (Hex)** | **Value (Decimal)** | **Explanation** |
|--------------|-----------------|---------------------|-----------------|
| x0           | 0x00000000      | 0                   | Zero register (always 0). |
| x1           | 0x0000000A      | 10                  | `addi x1, x0, 10`. |
| x2           | 0xFFFFFFFB      | -5                  | `addi x2, x0, -5`. |
| x3           | 0x000000F5      | 245                 | `xori x3, x1, 0xFF` → `10 ^ 0xFF = 0xF5`. |
| x4           | 0xFFFFFFFF      | -1                  | `ori x4, x2, 0x7F` → `-5 \| 0x7F = 0xFFFFFFFF`. |
| x5           | 0x00000005      | 5                   | `andi x5, x3, 0xF` → `0xF5 & 0xF = 0x5`. |
| x6           | 0x00000028      | 40                  | `slli x6, x1, 2` → `10 << 2 = 40`. |
| x7           | 0x0000007A      | 122                 | `srli x7, x3, 1` → `0xF5 >> 1 = 0x7A`. |
| x8           | 0xFFFFFFFD      | -3                  | `srai x8, x2, 1` → `-5 >> 1 = -3` (arithmetic shift keeps the sign). |
| x9           | 0x00000001      | 1                   | `slti x9, x2, 0` → `(-5 < 0) ? 1 : 0`. |
| x10          | 0x00000001      | 1                   | `sltiu x10, x1, 20` → `(10 < 20) ? 1 : 0`. |
| x11          | 0x0000000A      | 10                  | `lw x11, 0(x0)` → Loads word from `mem[0]`. |
| x12          | 0x000000F5      | 245                 | `lh x12, 4(x0)` → Signed halfword from `mem[4]`. |
| x13          | 0x000000F5      | 245                 | `lhu x13, 4(x0)` → Unsigned halfword from `mem[4]`. |
| x14          | 0xFFFFFFF5      | -11                 | `lb x14, 6(x0)` → Sign-extended byte `0xF5`, as reported by the simulation (in Word[1] = `0x000500F5`, byte address 6 itself holds `0x05`). |
| x15          | 0x000000F5      | 245                 | `lbu x15, 6(x0)` → Zero-extended byte `0xF5`, as reported by the simulation. |

---

**Data Memory**
| **Word Address** | **Value (Hex)** | **Explanation**                                       |
|------------------|-----------------|-------------------------------------------------------|
| Word[ 0]         | 0x0000000A      | Stored by `sw x1, 0(x0)`.                             |
| Word[ 1]         | 0x000500F5      | Combined result of `sh x3, 4(x0)` and `sb x5, 6(x0)`. |

#### Output of Program5
```c
# Register File Contents:
# ----------------------
# x0 : 00000000 x1 : 0000000a x2 : fffffffb x3 : 000000f5
# x4 : ffffffff x5 : 00000005 x6 : 00000028 x7 : 0000007a
# x8 : fffffffd x9 : 00000001 x10: 00000001 x11: 0000000a
# x12: 000000f5 x13: 000000f5 x14: fffffff5 x15: 000000f5
# x16: 00000000 x17: 00000000 x18: 00000000 x19: 00000000
# x20: 00000000 x21: 00000000 x22: 00000000 x23: 00000000
# x24: 00000000 x25: 00000000 x26: 00000000 x27: 00000000
# x28: 00000000 x29: 00000000 x30: 00000000 x31: 00000000
#
# Data Memory Contents:
# --------------------
# Word[ 0]: 0000000a Word[ 1]: 000500f5 Word[ 2]: 00000000 Word[ 3]: 00000000
# Word[ 4]: 00000000 Word[ 5]: 00000000 Word[ 6]: 00000000 Word[ 7]: 00000000
# Word[ 8]: 00000000 Word[ 9]: 00000000 Word[10]: 00000000 Word[11]: 00000000
# Word[12]: 00000000 Word[13]: 00000000 Word[14]: 00000000 Word[15]: 00000000
```
### Program6
In this program, we test U-type and B-type instructions.
```sv=
# U-type instructions test
lui x1, 0x12345        # x1 = 0x12345000
auipc x2, 0x1000       # x2 = PC + 0x1000000

# Test signed vs unsigned comparisons with distinct values
addi x3, x0, -5        # x3 = -5 (0xFFFFFFFB)
addi x4, x0, 10        # x4 = 10

# Store markers at different memory locations to track which path was taken
addi x5, x0, 1         # Path marker = 1
sw x5, 0(x0)           # Store at mem[0] before comparison

# Test signed comparison
blt x3, x4, signed_path # Should take (-5 < 10)
addi x5, x0, 20        # Won't execute
sw x5, 4(x0)           # Won't execute
j next_test

signed_path:
    addi x5, x0, 10    # Will execute
    sw x5, 4(x0)       # Store at mem[1] to show path taken

next_test:
    # Test unsigned comparison
    bltu x3, x4, unsigned_path # Should not take (0xFFFFFFFB > 10)
    addi x6, x0, 30    # Will execute
    sw x6, 8(x0)       # Store at mem[2]
    j done

unsigned_path:
    addi x6, x0, 40    # Won't execute
    sw x6, 8(x0)       # Won't execute

done:
```
#### Corresponding hexadecimal machine code
```c
123450b7
01000117
ffb00193
00a00213
00100293
00502023
0041c863
01400293
00502223
00c0006f
00a00293
00502223
0041e863
01e00313
00602423
```
#### Expected output of Program6
| Register | Value (Hex) | Value (Decimal) | Description                     |
|----------|-------------|-----------------|---------------------------------|
| x1       | 0x12345000  | 305418240       | Upper 20 bits loaded by lui     |
| x2       | 0x01000004  | 16777220        | PC + 0x1000000 from auipc       |
| x3       | 0xFFFFFFFB  | -5              | Negative value for comparison   |
| x4       | 0x0000000A  | 10              | Positive value for comparison   |
| x5       | 0x0000000A  | 10              | Shows signed branch was taken   |
| x6       | 0x0000001E  | 30              | Shows unsigned branch not taken |

| Address | Value (Hex) | Value (Decimal) | Description                   |
|---------|-------------|-----------------|-------------------------------|
| 0x00    | 0x00000001  | 1               | Initial marker                |
| 0x04    | 0x0000000A  | 10              | Shows signed path taken       |
| 0x08    | 0x0000001E  | 30              | Shows unsigned path not taken |

#### Output of Program6
```c
# Register File Contents:
# ----------------------
# x0 : 00000000 x1 : 12345000 x2 : 01000004 x3 : fffffffb
# x4 : 0000000a x5 : 0000000a x6 : 0000001e x7 : 00000000
# x8 : 00000000 x9 : 00000000 x10: 00000000 x11: 00000000
# x12: 00000000 x13: 00000000 x14: 00000000 x15: 00000000
# x16: 00000000 x17: 00000000 x18: 00000000 x19: 00000000
# x20: 00000000 x21: 00000000 x22: 00000000 x23: 00000000
# x24: 00000000 x25: 00000000 x26: 00000000 x27: 00000000
# x28: 00000000 x29: 00000000 x30: 00000000 x31: 00000000
#
# Data Memory Contents:
# --------------------
# Word[ 0]: 00000001 Word[ 1]: 0000000a Word[ 2]: 0000001e Word[ 3]: 00000000
# Word[ 4]: 00000000 Word[ 5]: 00000000 Word[ 6]: 00000000 Word[ 7]: 00000000
# Word[ 8]: 00000000 Word[ 9]: 00000000 Word[10]: 00000000 Word[11]: 00000000
# Word[12]: 00000000 Word[13]: 00000000 Word[14]: 00000000 Word[15]: 00000000
```
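
The multiplication results recorded for Program3 can be cross-checked independently of the simulator. The following is a minimal Python sketch of the four M-extension multiply instructions as defined by the RISC-V specification. The helper names are hypothetical and the script is not part of the repository.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(a, b):
    """Lower 32 bits of the product."""
    return (to_signed(a) * to_signed(b)) & MASK32


def mulh(a, b):
    """Upper 32 bits of the signed x signed product."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32


def mulhsu(a, b):
    """Upper 32 bits of the signed x unsigned product."""
    return ((to_signed(a) * (b & MASK32)) >> 32) & MASK32


def mulhu(a, b):
    """Upper 32 bits of the unsigned x unsigned product."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


x1, x2 = 0x80000000, 0xFFFFFFFE  # operands used in Program3
for name, fn in (("mul", mul), ("mulh", mulh), ("mulhsu", mulhsu), ("mulhu", mulhu)):
    print(f"{name:7s}{fn(x1, x2):08x}")
# mul    00000000
# mulh   00000001
# mulhsu 80000001
# mulhu  7fffffff
```

These values agree with x4 to x7 in the Program3 register dump. Python integers have arbitrary precision, so the full 64-bit product is formed exactly before the upper or lower half is selected.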