# Memory and Write Back Analysis

In-class discussion (11/25) about which instructions need memory and/or WB, and the corresponding signals

## Introduction

This report analyzes the **Memory (MEM)** and **Writeback (WB)** behavior in the single-cycle CPU designed for Lab 3. It explains which instructions require MEM/WB, how control flows through the pipeline-like stages, and how loads, stores, and PC+4 writeback are implemented.

Although the CPU is *single-cycle*, its logic is divided conceptually into:

    IF → ID → EX → MEM → WB

This improves organization and debugging. Within one clock cycle, all of these computations occur combinationally.

Below is the analysis of the MEM/WB stages and their interactions with control signals.

## Instruction Types and Stage Usage

Different RISC-V instruction classes interact with the **MEM** and **WB** stages in different ways. This section summarizes which types of instructions require memory access, which require register writeback, and what data is produced or consumed in each case.

### Instructions that use the MEM stage

The MEM stage is responsible for reading from or writing to data memory. Only **load** and **store** instructions activate this stage:

| Instruction Type | Examples                       | Uses MEM? | Description |
|------------------|--------------------------------|-----------|-------------|
| **Load**         | `LB`, `LH`, `LW`, `LBU`, `LHU` | ✔         | Reads data from memory and forwards it to WB |
| **Store**        | `SB`, `SH`, `SW`               | ✔         | Writes register data to memory using byte strobes |

Only loads and stores truly “use” MEM; all other instruction types bypass this stage.

---

### Instructions that use the WB stage

The Writeback (WB) stage determines the final value written into the destination register (`rd`). Depending on the instruction type, the writeback source differs:

| Instruction Type | Examples       | Writes Back? | Writeback Source        |
|------------------|----------------|--------------|-------------------------|
| **Op / Op-Imm**  | `ADD`, `ANDI`  | ✔            | ALU result              |
| **Load**         | `LW`, `LBU`, … | ✔            | Memory read data        |
| **LUI**          | `LUI`          | ✔            | Immediate << 12         |
| **AUIPC**        | `AUIPC`        | ✔            | PC + immediate          |
| **Jump**         | `JAL`, `JALR`  | ✔            | PC + 4 (return address) |

This classification determines:

- which instructions set `reg_write_enable`
- and how `regs_write_source` is encoded during decode.

## Control Signals for MEM

Having identified which instructions require the MEM and WB stages, we now examine **how the Decode stage generates the control signals** that govern memory access and register writeback. Although the CPU is single-cycle, each conceptual stage activates specific combinational logic based on the control signals produced in the Decode stage.

### MemoryAccess Stage: Control and Data Flow

The `MemoryAccess` module implements the MEM stage of the single-cycle CPU. Its behavior is controlled entirely by two signals generated in Decode:

- `memory_read_enable`
- `memory_write_enable`

Together with `alu_result`, `funct3`, and `reg2_data`, these signals define how load and store instructions interact with data memory.

---

### Memory Address Decoding (Shared by Load & Store)

Although data memory is organized as **32-bit words (4 bytes)**, RV32I uses **byte addressing**. Therefore, the low-order bits of `alu_result` determine which byte or halfword inside a word is accessed.
```scala
// Byte offset inside the 32-bit word: the low log2Up(WordSize) bits of the address
val mem_address_index = io.alu_result(log2Up(Parameters.WordSize) - 1, 0)
```

If `WordSize = 4`, then:

```
log2Up(4) = 2
mem_address_index = alu_result(1, 0)
```

This index is used by *both* load and store logic to:

- pick the correct byte from the 32-bit word: `bytes(mem_address_index)`
- determine whether the halfword is lower or upper: `mem_address_index(1)`
- activate the correct byte strobes for SB/SH/SW
- shift stored data into the correct byte position inside the word

---

### Load Path (When `memory_read_enable = 1`)

When `memory_read_enable = 1`, the MEM stage performs a memory read. The effective address was already computed in Execute:

```text
effective_address = rs1 + immediate
```

This address is forwarded to the memory system:

```scala
io.memory_bundle.address := io.alu_result
```

The memory returns a full 32-bit word, which is then split into 4 bytes:

```scala
val data = io.memory_bundle.read_data  // full 32-bit word from memory
val bytes = Wire(Vec(4, UInt(8.W)))    // one entry per byte lane
```

The correct byte or halfword is selected using `mem_address_index`, and then sign- or zero-extended depending on `funct3`:

- `LB`, `LH` → sign-extend
- `LBU`, `LHU` → zero-extend
- `LW` → no extension

```scala
io.wb_memory_read_data := MuxLookup(io.funct3, 0.U)(...)
```

This value is forwarded to the WB stage as the load result.

---

### Store Path (When `memory_write_enable = 1`)

When `memory_write_enable = 1`, the MEM stage performs a **memory write**. As with loads, the effective address was already computed in Execute:

```text
effective_address = rs1 + immediate
```

It is forwarded to the memory system as:

```scala
io.memory_bundle.address := io.alu_result
io.memory_bundle.write_enable := true.B
```

Stores have two responsibilities:

1. **Enable the correct byte strobes**
2. **Shift the store data into the correct byte position**

Both depend on `mem_address_index`.

---
#### Byte Strobes (Which bytes in the word to write)

RISC-V stores support 3 sizes:

| Instruction | Description | Bytes Written | Depends on             |
|-------------|-------------|---------------|------------------------|
| **SB**      | Store byte  | 1 byte        | `mem_address_index`    |
| **SH**      | Store half  | 2 bytes       | `mem_address_index(1)` |
| **SW**      | Store word  | 4 bytes       | none                   |

The hardware creates a 4-bit strobe vector:

```
write_strobe = [b3 b2 b1 b0]
```

Each bit indicates whether the corresponding byte in the 32-bit word should be overwritten.

Example patterns:

| Operation | `mem_address_index` | Strobes |
|-----------|---------------------|---------|
| SB        | 2                   | 0 1 0 0 |
| SH        | 0                   | 0 0 1 1 |
| SH        | 2                   | 1 1 0 0 |
| SW        | x                   | 1 1 1 1 |

---

#### Data Shifting (Align store data into the correct byte range)

Because memory writes whole **32-bit words**, sub-word writes require shifting:

- SB: shift by `(index * 8)` bits
- SH (upper): shift by 16 bits
- SW: no shift needed

Example (SB):

```scala
writeData := data(7,0) << (mem_address_index << 3)  // shift left by index * 8 bits
```

Example (SH upper halfword):

```scala
writeData := data(15,0) << 16
```

The selected data is then forwarded to memory:

```scala
io.memory_bundle.write_data := writeData
io.memory_bundle.write_strobe := writeStrobes
```

---

### Demonstration Using Waveforms (SB → LW)

To validate that the MEM stage behaves correctly, we use a small test program (`sb.asmbin`) that performs the following sequence:

1. Compute a base address
2. Construct a 32-bit constant (0xDEADBEEF)
3. **Store individual bytes** into memory using `SB`
4. **Load back a full word** using `LW` to confirm the stored bytes were placed correctly

This program is ideal because it exercises both the **store path** (byte-level write) and the **load path** (full-word read + extension logic). When running the simulation with Verilator and observing the VCD waveform, we can verify both operations.
Disassembly of the program:

```asm=
   0:   00400513    li    a0,4
   4:   deadc2b7    lui   t0,0xdeadc
   8:   eef28293    addi  t0,t0,-273   # 0xdeadbeef
   c:   00550023    sb    t0,0(a0)
  10:   00052303    lw    t1,0(a0)
  14:   01500913    li    s2,21
  18:   012500a3    sb    s2,1(a0)
  1c:   00052083    lw    ra,0(a0)
  20:   0000006f    j     0x20
```

The dump lists offsets from the start of the program. The instruction addresses in the waveforms (for example `0x0000100c` for the instruction at offset `c`) suggest that the program is loaded at base address `0x1000`.

---

#### Store Operation (SB)

During an `SB` instruction:

- `memory_write_enable` becomes `1`
- `mem_address_index` selects the target byte inside the 32-bit word
- only one bit of `write_strobe` is asserted (e.g., `0010` for index 1)
- `write_data` contains the 8-bit value shifted to the correct byte position

![Screenshot 2025-11-30 7.05.46 PM](https://hackmd.io/_uploads/H1szBiY-Wg.png)

In the waveform above, the SB instruction begins at the cycle where `io_instruction_address = 0x0000100c`, matching the instruction `sb t0, 0(a0)` in the program (13 ps to 17 ps).

At the following cycle:

- `io_memory_write_enable` goes HIGH, indicating a store.
- The ALU computes the address:

```text
alu_result = 0x00000004
mem_address_index = alu_result[1:0] = 0
```

Since the index is `0`, the **lowest byte** of the 32-bit memory word is selected. This is visible in the strobe lines:

```text
write_strobe = 0001
```

The byte written comes from the low 8 bits of `t0 = 0xDEADBEEF`, so:

```text
writeData = 0x000000EF
```

Only that single byte is updated in memory; the other three bytes are left untouched.

This confirms that the store-byte implementation correctly:

- decodes the address offset,
- activates the proper byte strobe,
- and shifts the stored byte into the correct 8-bit lane.

---
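To cross-check the strobe and data values read from the SB waveform, the store-lane arithmetic can be reproduced in software. The following is a minimal plain-Scala sketch, not the Chisel source: `StoreModel` and `storeLanes` are hypothetical names introduced for illustration, and the `SH` behavior assumes the lower/upper halfword split described in this report.

```scala
// Software model of the store path (an illustration, not the lab's Chisel module).
// StoreModel and storeLanes are hypothetical names.
object StoreModel {
  // funct3 encodings of the RV32I store instructions
  val SB = 0
  val SH = 1
  val SW = 2

  /** Returns (writeStrobe, writeData); strobe bit i enables byte lane i. */
  def storeLanes(funct3: Int, address: Long, rs2: Long): (Int, Long) = {
    val index = (address & 0x3L).toInt // models alu_result(1, 0)
    funct3 match {
      case SB =>
        // one strobe bit, byte shifted left by index * 8 bits
        (1 << index, ((rs2 & 0xFFL) << (index * 8)) & 0xFFFFFFFFL)
      case SH =>
        // bit 1 of the index selects the lower or upper halfword
        if ((index & 2) == 0) (0x3, rs2 & 0xFFFFL)
        else (0xC, ((rs2 & 0xFFFFL) << 16) & 0xFFFFFFFFL)
      case SW =>
        (0xF, rs2 & 0xFFFFFFFFL)
      case _ =>
        (0, 0L) // not a store: no lane is written
    }
  }
}
```

Under these assumptions, `StoreModel.storeLanes(StoreModel.SB, 0x4L, 0xDEADBEEFL)` yields strobe `0001` and data `0x000000EF`, the values seen in the waveform. For the program's second store (`sb s2,1(a0)`, address 5, `s2 = 21`) the model predicts strobe `0010` and data `0x00001500`; that case is a prediction of the model and was not captured in a waveform here.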
#### Load Operation (LW)

When the program immediately performs an `LW` from the same address:

- `memory_read_enable` becomes `1`
- the memory returns the entire 32-bit word
- the CPU does **not** use byte strobes; the full word is forwarded to the load-extension logic
- for `LW`, no sign/zero extension is needed → the 32-bit value stored earlier is returned directly

![Screenshot 2025-11-30 7.18.07 PM](https://hackmd.io/_uploads/rySfdjKW-e.png)

During an `LW` instruction, the MEM stage performs a full 32-bit word read. In the waveform shown, the load activity starts around **18 ps**, where the CPU reaches:

`io_instruction_address = 0x00001010`

which corresponds to the `lw t1, 0(a0)` instruction inside the SB test program. From this point onward, the following signals illustrate the load behavior:

- `io_memory_read_enable = 1`
  This indicates that the current instruction is a memory read. The write-enable signal stays low (`io_memory_write_enable = 0`).
- The ALU-computed effective address becomes available: `io_alu_result = 0x00000004`
  This address is forwarded to memory as `io_memory_bundle.address = 0x00000004`.
- Since RV32I uses byte addressing but memory stores 32-bit words, the lower two bits determine the byte offset: `mem_address_index = io.alu_result(1,0) = 0`
  For `LW`, natural alignment is expected, so the index being zero confirms that the access is aligned.
- The data memory returns the full 32-bit word: `io_memory_bundle.read_data = 0x000000ef`
  This matches the earlier store-byte operation, which wrote only the least-significant byte. The 8-bit split bytes in the waveform confirm this: `bytes_0 = ef`, `bytes_1 = 00`, `bytes_2 = 00`, `bytes_3 = 00`.
- The MemoryAccess unit forwards the load result directly toward the Writeback stage without sign or zero extension (since `LW` loads a full word): `io_wb_memory_read_data = 0x000000ef`.

This demonstrates that:

- the read-enable logic activates correctly,
- the EX stage provides the correct effective address,
- memory returns the expected updated word,
- and the WB stage receives the correct 32-bit load result.

The waveform verifies that the LW datapath (address forwarding, word extraction, and MEM→WB propagation) operates correctly on a naturally aligned load.

---

## Control Signals for WB

The Writeback (WB) stage is the final step in the single-cycle CPU’s execution flow. Its responsibility is simple but essential: **select the correct data source and deliver it to the register file (`rd`)**.

Unlike the MEM stage, WB does not determine *whether* a register should be written. That decision is made earlier in the Decode stage through the `reg_write_enable` control signal. WB only determines **what value should be written**, based on the control encoding:

- `RegWriteSource.ALUResult`
- `RegWriteSource.Memory`
- `RegWriteSource.NextInstructionAddress`

These values are selected through a multiplexer implemented using `MuxLookup`.

---

### WB Inputs and Outputs

The WriteBack module receives three candidate data values and one select signal:

    instruction_address : PC of current instruction
    alu_result          : result computed in the EX stage
    memory_read_data    : value loaded in the MEM stage
    regs_write_source   : select signal indicating which source to use

Its output is:

    regs_write_data     : final data to be written into register rd

This output is consumed by the register file only when `reg_write_enable = 1`.

---

### Writeback Data Source Selection

The core logic of the WB stage is:

```scala=
// The default (second argument) is the ALU result; the Seq lists the other two sources.
io.regs_write_data := MuxLookup(io.regs_write_source, io.alu_result)(
  Seq(
    RegWriteSource.Memory -> io.memory_read_data,
    RegWriteSource.NextInstructionAddress -> (io.instruction_address + 4.U)
  )
)
```

This logic implements a 3-way multiplexer:

#### 1. Default: ALU Result

If `regs_write_source` is neither `Memory` nor `NextInstructionAddress`, `regs_write_data` defaults to the ALU output.
This covers the majority of ALU and immediate instructions:

- `ADD`, `SUB`, `AND`, `OR`, `XOR`
- `ADDI`, `ANDI`, `ORI`, `XORI`
- shifts (`SLL`, `SRL`, `SRA`)
- comparisons (`SLT`, `SLTU`)
- `LUI`, `AUIPC` (their results are generated through the ALU path)

#### 2. Load Instructions

Load instructions (`LB`, `LH`, `LW`, `LBU`, `LHU`) set `regs_write_source = RegWriteSource.Memory`.

The value passed into WB (`io.memory_read_data`) is the sign-extended or zero-extended value produced in the MEM stage. WB simply forwards this value to be written into `rd`.

#### 3. Jump Instructions (JAL, JALR)

Jump instructions need to write **PC + 4** into `rd` as a return address. When `regs_write_source = RegWriteSource.NextInstructionAddress`, the WB stage computes this return address using:

    io.instruction_address + 4.U

This behavior matches the RISC-V specification for link-register updates in `JAL` and `JALR`.

---

### Write-Back Stage Analysis

After the MEM stage finishes (either producing a loaded value or performing a store), the final step of the execution is handled by the **WriteBack (WB) stage**.
The WB stage receives three possible data sources:

- `alu_result`: used by arithmetic/logic instructions (`ADD`, `ADDI`, `SLT`, …)
- `memory_read_data`: used by all load instructions (`LB`, `LW`, `LHU`, …)
- `instruction_address + 4`: used by `JAL` / `JALR` to write the return address into `rd`

The Decode stage determines which of these should be selected by providing:

    regs_write_source

which is a 2-bit control value:

| Encoding | Meaning          | Used By                   |
|----------|------------------|---------------------------|
| 0        | ALU result       | Op / Op-Imm / LUI / AUIPC |
| 1        | Memory read data | Load instructions         |
| 2        | PC + 4           | JAL / JALR                |

The WriteBack module simply multiplexes these sources:

    if regs_write_source = 0 → write ALU result
    if regs_write_source = 1 → write memory data
    if regs_write_source = 2 → write PC+4

The correctness of WB can therefore be verified directly in the waveform.

---

### Waveform Demonstration of Write-Back Behavior

Using the same `sb.asmbin` program from the MEM demonstration, we can observe three types of write-back events:

1. **ALU-result writeback**: occurs for instructions like `li`, `addi`, `lui`
2. **Memory-data writeback**: occurs for `lw`
3. **PC+4 writeback**: occurs for the infinite `j 0x20` loop (a form of JAL)

Below is an example of how the waveform shows the write-back source selection at the cycle of an `LW`.

---

#### Write-Back for the `LW` Instruction

When the program executes:

    lw t1, 0(a0)

![Screenshot 2025-11-30 8.18.16 PM](https://hackmd.io/_uploads/ryrzI2tZ-l.png)

the waveform captures the expected write-back behavior for a load operation. At the cycle where `io_instruction_address = 0x00001010`, which corresponds to the `lw` instruction, the following signals become active:

- `io_wb_reg_write_source = 1`
  This tells the Write-Back stage to select **memory_read_data** as the source.
- `io_memory_read_enable = 1`, confirming that the MEM stage has performed a full-word read.
- The ALU provides the effective address:
  ```
  io_alu_result = 0x00000004
  ```
  matching the address computed from `a0 + 0`.
- Memory returns the 32-bit word stored earlier:
  ```
  io_memory_read_data = 0x000000ef
  ```
  This value corresponds to the byte previously written by the `SB` instruction (lower 8 bits of `0xDEADBEEF`). Because `LW` loads a full word, no sign or zero extension is applied, and the MEM stage forwards the value directly. This results in:
  ```
  io_regs_write_data = 0x000000ef
  ```
- `io_write_enable = 1` indicates that the register file will commit this value to `t1`.

This waveform confirms that:

- the Decode stage correctly identifies `LW` as a memory-source writeback instruction,
- the WB multiplexer selects the memory path (`regs_write_source = 1`),
- and the loaded word is correctly propagated from MEM → WB → Register File.

---

#### Write-Back for an Arithmetic Instruction

Earlier in the program, the CPU executes:

    addi t0, t0, -273

![Screenshot 2025-11-30 8.25.49 PM](https://hackmd.io/_uploads/r1n0w2YbWl.png)

In the waveform shown, the ADDI instruction begins when:

    io_instruction_address = 0x00001008

which corresponds to the `addi` at offset `8` of the program. During this time window (around 9–13 ps in the waveform), the following signals confirm correct WB behavior:

- `io_wb_reg_write_source = 0`
  This selects the **ALU result** as the write-back source (`0 → ALU`, `1 → Memory`, `2 → PC+4`).
- `io_write_enable = 1`, meaning the instruction writes a value back to register `rd`.
- The ALU has computed its result in the EX stage:

      io_alu_result = 0xDEADBEEF

- The WB output reflects exactly the same value:

      io_regs_write_data = 0xDEADBEEF

This demonstrates that the Write-Back multiplexer correctly forwards the ALU result for this immediate-type instruction; the other arithmetic/immediate instructions use the same source encoding and therefore take the same path. No memory access is involved, and the entire path:

    EX → WB → Register File

operates as expected.
The decode logic asserts write-enable, the WB stage selects the ALU result, and the correct value is written back to the destination register.

---

#### Write-Back for the Final Jump (PC + 4)

The last instruction in the program is:

    j 0x20

This is encoded as a JAL instruction whose destination register is `x0`. Although writing to `x0` has no effect (because `x0` is always zero), the Write-Back logic is still activated, making this a useful case to verify that the WB multiplexer behaves correctly.

![Screenshot 2025-11-30 8.34.40 PM](https://hackmd.io/_uploads/ByA153tbZl.png)

The waveform shows:

    io_instruction_address = 0x00001020

At this moment, the Write-Back path shows the following behavior:

- `io_wb_reg_write_source = 2`
  This selects the “NextInstructionAddress” input of the WB multiplexer, which is correct for JAL/JALR.
- `io_write_enable = 1`
  The decode stage marks J-type instructions as producing a value for `rd`, even though in this program the destination is `x0`.
- `io_alu_result = 0x00001020`
  This reflects the computed jump target from the EX stage. The instruction jumps to itself, so the target equals its own address.
- `io_regs_write_data = 0x00001024`
  This is exactly `PC + 4`, confirming that the Write-Back unit is generating the correct return address.

Even though the value is ultimately discarded because the write target is `x0`, the waveform demonstrates that the Write-Back logic works as intended:

- the decode stage selects the correct write-back source,
- the WB multiplexer outputs the correct `PC + 4` value,
- the write-enable behavior matches that of a standard jump instruction.

The combination of `regs_write_source = 2`, `write_enable = 1`, and `regs_write_data = PC+4` confirms that this JAL instruction propagates through the WB stage correctly.

---
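The three write-back cases observed in the waveforms can be replayed against a small software model of the multiplexer. This is a plain-Scala sketch rather than the Chisel module: `WriteBackModel` and `regsWriteData` are hypothetical names, and the numeric encodings 0/1/2 are taken from the source-encoding table in this report rather than from the `RegWriteSource` definition itself.

```scala
// Software model of the WB multiplexer (an illustration, not the lab's Chisel module).
// WriteBackModel and regsWriteData are hypothetical names.
object WriteBackModel {
  // Assumed encodings, following the table in this report
  val ALUResult = 0
  val Memory = 1
  val NextInstructionAddress = 2

  /** Selects the value written to rd; any other source falls back to the ALU result. */
  def regsWriteData(
      source: Int,
      instructionAddress: Long,
      aluResult: Long,
      memoryReadData: Long
  ): Long =
    source match {
      case Memory                 => memoryReadData
      case NextInstructionAddress => (instructionAddress + 4) & 0xFFFFFFFFL // wrap to 32 bits
      case _                      => aluResult
    }
}

object WriteBackDemo extends App {
  import WriteBackModel._
  // Inputs copied from the three waveforms discussed above
  println(regsWriteData(Memory, 0x1010L, 0x4L, 0xEFL).toHexString)
  println(regsWriteData(ALUResult, 0x1008L, 0xDEADBEEFL, 0L).toHexString)
  println(regsWriteData(NextInstructionAddress, 0x1020L, 0x1020L, 0L).toHexString)
}
```

With the waveform inputs, the model selects `0xef` for `lw`, `0xdeadbeef` for `addi`, and `0x1024` for `j 0x20`, which agrees with the `io_regs_write_data` values reported for each case. Keeping the ALU result as the fall-through case mirrors the default argument of the `MuxLookup` shown earlier.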