# Memory and Write Back Analysis
In-class discussion (11/25) on which instructions need the MEM and/or WB stages, and the corresponding control signals.
## Introduction
This report analyzes the **Memory (MEM)** and **Writeback (WB)** behavior in the single-cycle CPU designed for Lab 3.
It explains which instructions require MEM/WB, how control flows through the pipeline-like stages, and how loads/stores/PC+4 writeback are implemented.
Although the CPU is *single-cycle*, its logic is divided conceptually into five stages: `IF → ID → EX → MEM → WB`.
This division improves organization and debugging; within one clock cycle, all of these computations still occur combinationally.
Below is the analysis of MEM/WB stages and their interactions with control signals.
## Instruction Types and Stage Usage
Different RISC-V instruction classes interact with the **MEM** and **WB** stages in different ways.
This section summarizes which types of instructions require memory access, which require register writeback, and what data is produced or consumed in each case.
### Instructions that use the MEM stage
The MEM stage is responsible for reading from or writing to data memory.
Only **load** and **store** instructions activate this stage:
| Instruction Type | Examples | Uses MEM? | Description |
|------------------|----------------------------------------|-----------|-------------|
| **Load** | `LB`, `LH`, `LW`, `LBU`, `LHU` | ✔ | Reads data from memory and forwards it to WB |
| **Store** | `SB`, `SH`, `SW` | ✔ | Writes register data to memory using byte strobes |
Only loads and stores truly “use” MEM; all other instruction types bypass this stage.
---
### Instructions that use the WB stage
The Writeback (WB) stage determines the final value written into the destination register (`rd`).
Depending on the instruction type, the writeback source differs:
| Instruction Type | Examples | Writes Back? | Writeback Source |
|------------------|------------------|--------------|------------------|
| **Op / Op-Imm** | `ADD`, `ANDI` | ✔ | ALU result |
| **Load** | `LW`, `LBU`, … | ✔ | Memory read data |
| **LUI** | `LUI` | ✔ | Immediate << 12 |
| **AUIPC** | `AUIPC` | ✔ | PC + immediate |
| **Jump** | `JAL`, `JALR` | ✔ | PC + 4 (return address) |
This classification determines:
- which instructions set `reg_write_enable`
- and how `regs_write_source` is encoded during decode.
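
The classification above can be sketched as a small software model. This is plain Scala rather than the lab's Chisel decoder, and the names `DecodeSketch`, `Control`, and `decode` are hypothetical; only the standard RV32I major opcodes and the three writeback sources come from the tables.

```scala
// Software sketch (plain Scala, not the Chisel InstructionDecode module).
// All names here are illustrative assumptions.
object DecodeSketch {
  sealed trait WbSource
  case object AluResult extends WbSource
  case object Memory extends WbSource
  case object NextInstructionAddress extends WbSource

  final case class Control(
      memoryRead: Boolean,
      memoryWrite: Boolean,
      regWrite: Boolean,
      wbSource: WbSource
  )

  // RV32I major opcodes (instruction bits 6..0).
  val OpImm = 0x13
  val Op    = 0x33
  val Load  = 0x03
  val Store = 0x23
  val Lui   = 0x37
  val Auipc = 0x17
  val Jal   = 0x6f
  val Jalr  = 0x67

  def decode(instruction: Int): Control = (instruction & 0x7f) match {
    case Load =>
      Control(memoryRead = true, memoryWrite = false, regWrite = true, wbSource = Memory)
    case Store =>
      Control(memoryRead = false, memoryWrite = true, regWrite = false, wbSource = AluResult)
    case Jal | Jalr =>
      Control(memoryRead = false, memoryWrite = false, regWrite = true, wbSource = NextInstructionAddress)
    case Op | OpImm | Lui | Auipc =>
      Control(memoryRead = false, memoryWrite = false, regWrite = true, wbSource = AluResult)
    case _ => // branches and anything else: no MEM access, no register write
      Control(memoryRead = false, memoryWrite = false, regWrite = false, wbSource = AluResult)
  }
}
```

For example, `DecodeSketch.decode(0x00052303)` (the encoding of `lw t1,0(a0)`) yields a read enable, a register write, and the `Memory` source.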
## Control Signals for MEM
After identifying which instructions require the MEM and WB stages,
we now examine **how the Decode stage generates control signals** that
govern memory access and register writeback behavior.
Although the CPU is single-cycle, each conceptual stage activates specific
combinational logic based on the control signals produced in the Decode stage.
### MemoryAccess Stage: Control and Data Flow
The `MemoryAccess` module implements the MEM stage of the single-cycle CPU.
Its behavior is controlled entirely by two signals generated in Decode:
- `memory_read_enable`
- `memory_write_enable`
Together with `alu_result`, `funct3`, and `reg2_data`, these signals define how
load and store instructions interact with data memory.
---
### Memory Address Decoding (Shared by Load & Store)
Even though the memory is stored as **32-bit words (4 bytes)**, RV32I uses **byte-addressing**.
Therefore, the lower bits of `alu_result` determine which byte or halfword inside a word is accessed.
```scala
val mem_address_index = io.alu_result(log2Up(Parameters.WordSize) - 1, 0)
```
If `WordSize = 4`, then:
```
log2Up(4) = 2
mem_address_index = alu_result(1, 0)
```
This index is used by *both* load and store logic to:
- pick the correct byte from the 32-bit word:
`bytes(mem_address_index)`
- determine whether the halfword is lower or upper:
`mem_address_index(1)`
- activate the correct byte strobes for SB/SH/SW
- shift stored data into the correct byte position inside the word
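
As a rough software check of the index extraction, the following plain-Scala sketch mirrors the bit slicing described above. The object and method names are hypothetical, and `WordSize = 4` is assumed as in the example.

```scala
// Plain-Scala model of the address index logic (illustrative names).
object AddressIndexSketch {
  val WordSize = 4

  // Equivalent of log2Up(WordSize) when WordSize is a power of two.
  val indexBits: Int = 31 - Integer.numberOfLeadingZeros(WordSize)

  // Low address bits: which byte inside the 32-bit word.
  def byteIndex(address: Int): Int = address & ((1 << indexBits) - 1)

  // Bit 1 of the index: false = lower halfword, true = upper halfword.
  def upperHalf(address: Int): Boolean = ((byteIndex(address) >> 1) & 1) == 1
}
```

With this model, address `0x4` maps to byte index 0, and address `0x6` selects the upper halfword.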
---
### Load Path (When `memory_read_enable = 1`)
When `memory_read_enable = 1`, the MEM stage performs a memory read.
The effective address was already computed in Execute:
```text
effective_address = rs1 + immediate
```
This address is forwarded to the memory system:
```scala
io.memory_bundle.address := io.alu_result
```
The memory returns a full 32-bit word, which is then split into 4 bytes:
```scala
val data = io.memory_bundle.read_data
val bytes = Wire(Vec(4, UInt(8.W)))
```
The correct byte or halfword is selected using `mem_address_index`, and then
sign-/zero-extended depending on `funct3`:
- `LB`, `LH` → sign-extend
- `LBU`, `LHU` → zero-extend
- `LW` → no extension
```scala
io.wb_memory_read_data := MuxLookup(io.funct3, 0.U)(...)
```
This value is forwarded to the WB stage as the load result.
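
The elided `MuxLookup` cases are not reproduced here. As a hedged illustration of the selection and extension rules listed above, a plain-Scala reference model might look like the following; `LoadSketch` and `load` are hypothetical names, the `funct3` values are the standard RV32I load encodings, and the fallback of 0 mirrors the `0.U` default.

```scala
// Plain-Scala reference model for load selection and extension.
// Not the Chisel source; names are illustrative.
object LoadSketch {
  // Standard RV32I funct3 encodings for loads.
  val LB  = 0
  val LH  = 1
  val LW  = 2
  val LBU = 4
  val LHU = 5

  def load(word: Int, funct3: Int, index: Int): Int = {
    // Pick the addressed byte and halfword out of the 32-bit word.
    val byte = (word >>> (8 * (index & 3))) & 0xff
    val half = (word >>> (16 * ((index >> 1) & 1))) & 0xffff
    funct3 match {
      case LB  => byte.toByte.toInt   // sign-extend 8 -> 32 bits
      case LBU => byte                // zero-extend
      case LH  => half.toShort.toInt  // sign-extend 16 -> 32 bits
      case LHU => half                // zero-extend
      case LW  => word                // full word, no extension
      case _   => 0                   // mirrors the 0.U default
    }
  }
}
```

For instance, reading byte 0 of the word `0x000000ef` gives `0xffffffef` with `LB` but `0x000000ef` with `LBU`.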
---
### Store Path (When `memory_write_enable = 1`)
When `memory_write_enable = 1`, the MEM stage performs a **memory write**.
Just like loads, the effective address was already computed in Execute:
```text
effective_address = rs1 + immediate
```
And is forwarded to the memory system as:
```scala
io.memory_bundle.address := io.alu_result
io.memory_bundle.write_enable := true.B
```
Stores have two responsibilities:
1. **Enable the correct byte strobes**
2. **Shift the store data into the correct byte position**
Both depend on `mem_address_index`.
---
#### Byte Strobes (Which bytes in the word to write)
RISC-V stores support 3 sizes:
| Instruction | Description | Bytes Written | Depends on |
|------------|-------------|----------------|-------------|
| **SB** | Store byte | 1 byte | `mem_address_index` |
| **SH** | Store half | 2 bytes | `mem_address_index(1)` |
| **SW** | Store word | 4 bytes | none |
The hardware creates a 4-bit strobe vector:
```
write_strobe = [b3 b2 b1 b0]
```
Each bit indicates whether a byte in the 32-bit word should be overwritten.
Example patterns:
| Operation | `mem_address_index` | Strobes |
|----------|----------------------|---------|
| SB | 2 | 0 1 0 0 |
| SH | 0 | 0 0 1 1 |
| SH | 2 | 1 1 0 0 |
| SW | x | 1 1 1 1 |
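
A minimal plain-Scala sketch of the strobe generation, assuming the `[b3 b2 b1 b0]` ordering shown above (bit *i* of the mask enables byte lane *i*). The names are hypothetical and the `funct3` values are the standard RV32I store encodings.

```scala
// Plain-Scala model of byte-strobe generation (illustrative names).
object StrobeSketch {
  // Standard RV32I funct3 encodings for stores.
  val SB = 0
  val SH = 1
  val SW = 2

  // Returns a 4-bit mask; bit i set means byte lane i is written.
  def strobes(funct3: Int, index: Int): Int = funct3 match {
    case SB => 1 << (index & 3)
    case SH => if (((index >> 1) & 1) == 1) 0xc else 0x3
    case SW => 0xf
    case _  => 0
  }

  // Render as "b3 b2 b1 b0" for comparison with the waveform.
  def show(mask: Int): String =
    (3 to 0 by -1).map(lane => (mask >> lane) & 1).mkString(" ")
}
```

`StrobeSketch.show(StrobeSketch.strobes(StrobeSketch.SH, 2))` produces `1 1 0 0`, and an `SB` at index 1 produces `0 0 1 0`.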
---
#### Data Shifting (Align store data into the correct byte range)
Because memory writes whole **32-bit words**, sub-word writes require shifting:
- SB: shift by `(index * 8)` bits
- SH (lower): no shift needed
- SH (upper): shift by 16 bits
- SW: no shift needed
Example (SB):
```scala
writeData := data(7,0) << (mem_address_index << 3)
```
Example (SH upper halfword):
```scala
writeData := data(15,0) << 16
```
The selected data is then forwarded to memory:
```scala
io.memory_bundle.write_data := writeData
io.memory_bundle.write_strobe := writeStrobes
```
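
The alignment rules can be cross-checked with a small plain-Scala model. This is a sketch under the assumption that a lower-halfword `SH` needs no shift; `StoreDataSketch` and `align` are hypothetical names.

```scala
// Plain-Scala model of store-data alignment (illustrative names).
object StoreDataSketch {
  // funct3: 0 = SB, 1 = SH, 2 = SW (standard RV32I store encodings).
  def align(data: Int, funct3: Int, index: Int): Int = funct3 match {
    case 0 => (data & 0xff) << (8 * (index & 3))            // byte lane 0..3
    case 1 => (data & 0xffff) << (16 * ((index >> 1) & 1))  // lower or upper half
    case 2 => data                                           // full word
    case _ => 0
  }
}
```

For example, storing the low byte of `0xDEADBEEF` at index 0 gives `0x000000EF`, while storing `0x15` at index 1 gives `0x00001500`.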
---
### Demonstration Using Waveforms (SB → LW)
To validate that the MEM stage behaves correctly, we use a small test program (`sb.asmbin`) that performs the following sequence:
1. Compute a base address
2. Construct a 32-bit constant (0xDEADBEEF)
3. **Store individual bytes** into memory using `SB`
4. **Load back a full word** using `LW` to confirm the stored bytes were placed correctly
This program is ideal because it exercises both the **store path** (byte-level write) and the **load path** (full-word read, which bypasses the sign/zero-extension cases).
When running the simulation with Verilator and observing the VCD waveform, we can verify both operations.
Disassembly of the program:
```asm
0: 00400513 li a0,4
4: deadc2b7 lui t0,0xdeadc
8: eef28293 addi t0,t0,-273 # 0xdeadbeef
c: 00550023 sb t0,0(a0)
10: 00052303 lw t1,0(a0)
14: 01500913 li s2,21
18: 012500a3 sb s2,1(a0)
1c: 00052083 lw ra,0(a0)
20: 0000006f j 0x20
```
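
The `lui`/`addi` pair builds `0xDEADBEEF` because `addi` sign-extends its 12-bit immediate, so the upper part must be `0xdeadc` rather than `0xdeadb`. The arithmetic can be checked with a short plain-Scala sketch; the helper names `iImm` and `uImm` are hypothetical, and the encodings are taken from the listing.

```scala
// Worked check of the constant construction (illustrative names).
object ConstantSketch {
  // I-type immediate: instruction bits 31..20, sign-extended.
  def iImm(instruction: Int): Int = instruction >> 20

  // U-type immediate: instruction bits 31..12, low 12 bits cleared.
  def uImm(instruction: Int): Int = instruction & ~0xfff

  val lui: Int  = 0xdeadc2b7L.toInt // lui  t0,0xdeadc
  val addi: Int = 0xeef28293L.toInt // addi t0,t0,-273

  // 0xdeadc000 + (-273), with wraparound in 32 bits.
  val t0: Int = uImm(lui) + iImm(addi)
}
```

Here `iImm(addi)` evaluates to `-273` and `t0` to `0xdeadbeef`, whose low byte `0xef` is the value the first `sb` stores.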
---
#### Store Operation (SB)
During an `SB` instruction:
- `memory_write_enable` becomes `1`
- `mem_address_index` selects the target byte inside the 32-bit word
- only one bit of `write_strobe` is asserted (e.g., `0010` for index 1)
- `write_data` contains the 8-bit value shifted to the correct byte position

In the waveform, the SB instruction begins at the cycle where
`io_instruction_address = 0x0000100c`, matching the instruction `sb t0, 0(a0)` in the program (13 ps to 17 ps).
At the following cycle:
- `io_memory_write_enable` goes HIGH, indicating a store.
- The ALU computes the address:
```text
alu_result = 0x00000004
mem_address_index = alu_result[1:0] = 0
```
Since the index is `0`, the **lowest byte** of the 32-bit memory word is selected.
This is visible in the strobe lines:
```text
write_strobe = 0001
```
The byte written comes from the low 8 bits of `t0 = 0xDEADBEEF`, so:
```text
writeData = 0x000000EF
```
Only that single byte is updated in memory; the remaining three bytes remain untouched.
This confirms that the store-byte implementation correctly:
- decodes the address offset,
- activates the proper byte strobe,
- and shifts the stored byte into the correct 8-bit lane.
---
#### Load Operation (LW)
When the program immediately performs an `LW` from the same address:
- `memory_read_enable` becomes `1`
- the memory returns the entire 32-bit word
- the CPU does **not** use byte strobes; the full word is forwarded to the load-extension logic
- for `LW`, no sign/zero extension is needed
→ the 32-bit value stored earlier is returned directly

During an `LW` instruction, the MEM stage performs a full 32-bit word read.
In the waveform shown, the load activity starts around **18 ps**, where the CPU reaches:
`io_instruction_address = 0x00001010`
which corresponds to the `lw t1, 0(a0)` instruction inside the SB test program.
From this point onward, the following signals illustrate the load behavior:
- `io_memory_read_enable = 1`
This indicates that the current instruction is a memory read.
The write-enable signal stays low (`io_memory_write_enable = 0`).
- The ALU-computed effective address becomes available:
`io_alu_result = 0x00000004`
This address is forwarded to memory as
`io_memory_bundle.address = 0x00000004`.
- Since RV32I uses byte addressing but memory stores 32-bit words, the lower two bits determine the byte offset:
`mem_address_index = io.alu_result(1,0) = 0`
For `LW`, natural alignment is expected, so the index being zero confirms that the access is aligned.
- The data memory returns the full 32-bit word:
`io_memory_bundle.read_data = 0x000000ef`
This matches the earlier store-byte operation, which wrote only the least-significant byte.
The 8-bit split bytes in the waveform confirm this:
`bytes_0 = ef`, `bytes_1 = 00`, `bytes_2 = 00`, `bytes_3 = 00`.
- The MemoryAccess unit forwards the load result directly toward the Writeback stage without sign or zero extension (since `LW` loads a full word):
`io_wb_memory_read_data = 0x000000ef`.
This demonstrates that:
- the read-enable logic activates correctly,
- the EX stage provides the correct effective address,
- memory returns the expected updated word,
- and the WB stage receives the correct 32-bit load result.
The waveform verifies that the LW datapath—address forwarding, word extraction, and MEM→WB propagation—operates correctly on a naturally aligned load.
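
To cross-check the SB → LW behavior outside the simulator, a tiny word-addressed memory model with byte strobes is enough. This is a hedged plain-Scala sketch, not the lab's memory module: `WordMemory` is a hypothetical class, and it assumes memory starts out zeroed.

```scala
import scala.collection.mutable

// Plain-Scala model of a word-organized memory with byte strobes.
// Assumption: all words are initially zero.
object MemorySketch {
  class WordMemory {
    private val words = mutable.Map.empty[Int, Int].withDefaultValue(0)

    // Overwrite only the byte lanes whose strobe bit is set.
    def store(address: Int, writeData: Int, strobes: Int): Unit = {
      val wordAddress = address & ~3
      var mask = 0
      for (lane <- 0 until 4 if ((strobes >> lane) & 1) == 1)
        mask |= 0xff << (8 * lane)
      words(wordAddress) = (words(wordAddress) & ~mask) | (writeData & mask)
    }

    // LW: return the whole aligned word.
    def loadWord(address: Int): Int = words(address & ~3)
  }
}
```

Replaying the first store (`sb t0,0(a0)`: address 4, data `0x000000ef`, strobes `0001`) and then loading address 4 returns `0x000000ef` in this model, in agreement with the waveform values quoted above. Replaying the second store (`sb s2,1(a0)`: data `0x00001500`, strobes `0010`) makes the model return `0x000015ef`; that second value is the model's prediction, not a reading taken from the waveform.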
---
## Control Signals for WB
The Writeback (WB) stage is the final step in the single-cycle CPU’s execution flow.
Its responsibility is simple but essential: **select the correct data source and deliver it to the register file (`rd`)**.
WB does not decide *whether* a register is written.
That decision is made earlier in the Decode stage through the `reg_write_enable` control signal.
WB only determines **what value should be written**, based on the control encoding:
- `RegWriteSource.ALUResult`
- `RegWriteSource.Memory`
- `RegWriteSource.NextInstructionAddress`
These values are selected through a multiplexer implemented using `MuxLookup`.
---
### WB Inputs and Outputs
The WriteBack module receives three candidate data inputs plus a select signal:
- `instruction_address`: PC of the current instruction
- `alu_result`: result computed in the EX stage
- `memory_read_data`: value loaded in the MEM stage
- `regs_write_source`: select signal indicating which source to use

Its output is:
- `regs_write_data`: final data to be written into register `rd`

This output is consumed by the register file only when `reg_write_enable = 1`.
---
### Writeback Data Source Selection
The core logic of the WB stage is:
```scala
io.regs_write_data := MuxLookup(io.regs_write_source, io.alu_result)(
  Seq(
    RegWriteSource.Memory                 -> io.memory_read_data,
    RegWriteSource.NextInstructionAddress -> (io.instruction_address + 4.U)
  )
)
```
This logic implements a 3-way multiplexer:
#### 1. Default: ALU Result
If `regs_write_source` is neither `Memory` nor `NextInstructionAddress`,
`regs_write_data` defaults to the ALU output.
This covers the majority of ALU and immediate instructions:
- `ADD`, `SUB`, `AND`, `OR`, `XOR`
- `ADDI`, `ANDI`, `ORI`, `XORI`
- shifts (`SLL`, `SRL`, `SRA`)
- comparisons (`SLT`, `SLTU`)
- `LUI`, `AUIPC` (their results are generated through the ALU path)
#### 2. Load Instructions
Load instructions:
- `LB`, `LH`, `LW`, `LBU`, `LHU`
set `regs_write_source = RegWriteSource.Memory`.
The value passed into WB (`io.memory_read_data`) is the sign-extended or zero-extended value produced in the MEM stage.
WB simply forwards this value to be written into `rd`.
#### 3. Jump Instructions (JAL, JALR)
Jump instructions need to write **PC + 4** into `rd` as a return address.
When `regs_write_source = RegWriteSource.NextInstructionAddress`,
the WB stage computes this return address as `io.instruction_address + 4.U`.
This behavior matches the RISC-V specification for link-register updates in `JAL` and `JALR`.
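
The three cases can be mirrored in a plain-Scala reference function, which is handy for checking waveform values by hand. This is a sketch, not the Chisel module: the names are hypothetical, and the numeric encodings 0/1/2 for the sources are assumed.

```scala
// Plain-Scala model of the WB multiplexer (illustrative names).
object WriteBackSketch {
  // Assumed encodings of RegWriteSource.
  val AluResult              = 0
  val Memory                 = 1
  val NextInstructionAddress = 2

  def regsWriteData(
      source: Int,
      instructionAddress: Int,
      aluResult: Int,
      memoryReadData: Int
  ): Int = source match {
    case Memory                 => memoryReadData
    case NextInstructionAddress => instructionAddress + 4
    case _                      => aluResult // default, as in MuxLookup
  }
}
```

For a jump at `0x1020`, `regsWriteData(2, 0x1020, 0x1020, 0)` returns `0x1024`; for a load with `memoryReadData = 0xef`, source 1 returns `0xef`.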
---
### Write-Back Stage Analysis
After the MEM stage finishes (either producing a loaded value or performing a store),
the final step of the execution is handled by the **WriteBack (WB) stage**.
The WB stage receives three possible data sources:
- `alu_result` — used by arithmetic/logic instructions (`ADD`, `ADDI`, `SLT`, …)
- `memory_read_data` — used by all load instructions (`LB`, `LW`, `LHU`, …)
- `instruction_address + 4` — used by `JAL` / `JALR` to write the return address into `rd`
The Decode stage determines which of these is selected by driving `regs_write_source`, a 2-bit control value:
| Source Code | Meaning | Used By |
|-------------|------------------------------|-------------------------|
| 0 | ALU result | Op / Op-Imm / LUI / AUIPC |
| 1 | Memory read data | Load instructions |
| 2 | PC + 4 | JAL / JALR |
The WriteBack module simply multiplexes these sources:
- `regs_write_source = 0` → write the ALU result
- `regs_write_source = 1` → write the memory data
- `regs_write_source = 2` → write PC + 4

The correctness of WB can therefore be verified directly in the waveform.
---
### Waveform Demonstration of Write-Back Behavior
Using the same `sb.asmbin` program from the MEM demonstration,
we can observe these types of write-back events:
1. **ALU-result writeback** — occurs for instructions like `li`, `addi`, `lui`
2. **Memory-data writeback** — occurs for `lw`
3. **PC+4 writeback** — occurs for the infinite `j 0x20` loop (a form of JAL)
Below is an example of how the waveform shows the write-back source selection at the cycle of an `LW`.
---
#### Write-Back for the `LW` Instruction
When the program executes `lw t1, 0(a0)`, the waveform captures the expected write-back behavior for a load operation.
At the cycle where
`io_instruction_address = 0x00001010`,
which corresponds to the `lw` instruction, the following signals become active:
- `io_wb_reg_write_source = 1`
This tells the Write-Back stage to select **memory_read_data** as the source.
- `io_memory_read_enable = 1`
confirming that the MEM stage has performed a full-word read.
- The ALU provides the effective address:
```
io_alu_result = 0x00000004
```
matching the address computed from `a0 + 0`.
- Memory returns the 32-bit word stored earlier:
```
io_memory_read_data = 0x000000ef
```
This value corresponds to the byte previously written by the `SB` instruction (lower 8 bits of `0xDEADBEEF`).
Because `LW` loads a full word, no sign or zero extension is applied, and the MEM stage forwards the value directly.
This results in:
```
io_regs_write_data = 0x000000ef
```
- `io_write_enable = 1`
indicates that the register file will commit this value to `t1`.
This waveform confirms that:
- the Decode stage correctly identifies `LW` as a memory-source writeback instruction,
- the WB multiplexer selects the memory path (`regs_write_source = 1`),
- and the loaded word is correctly propagated from MEM → WB → Register File.
---
#### Write-Back for an Arithmetic Instruction
Earlier in the program, the CPU executes `addi t0, t0, -273`.
In the waveform, this instruction begins when `io_instruction_address = 0x00001008`.
During this time window (around 9–13 ps in the waveform), the following signals confirm correct WB behavior:
- `io_wb_reg_write_source = 0`
This selects the **ALU result** as the write-back source
(`0 → ALU`, `1 → Memory`, `2 → PC+4`).
- `io_write_enable = 1`
meaning the instruction writes a value back to register `rd`.
- The ALU has computed its result in the EX stage:
`io_alu_result = 0xDEADBEEF`
- The WB output reflects exactly the same value:
`io_regs_write_data = 0xDEADBEEF`

This demonstrates that the Write-Back multiplexer forwards the ALU result as expected for arithmetic/immediate-type instructions. No memory access is involved, and the entire path `EX → WB → Register File` operates as expected. The decode logic asserts write-enable, the WB stage selects the ALU result, and the correct value is written back to the destination register.
---
#### Write-Back for the Final Jump (PC + 4)
The last instruction in the program is `j 0x20`.
This is encoded as a JAL instruction where the destination register is `x0`.
Although writing to `x0` has no effect (because `x0` is always zero), the Write-Back logic is still activated, making this a useful case to verify that the WB multiplexer behaves correctly.
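
That `j 0x20` is a JAL with `rd = x0` and a zero offset can be confirmed by decoding its encoding `0x0000006f` from the listing. The following plain-Scala sketch uses the standard J-type field layout; the object and helper names are hypothetical.

```scala
// Field extraction for a J-type instruction (illustrative names).
object JalSketch {
  def opcode(instruction: Int): Int = instruction & 0x7f
  def rd(instruction: Int): Int = (instruction >> 7) & 0x1f

  // J-type immediate layout in bits 31..12: imm[20|10:1|11|19:12].
  def jImm(instruction: Int): Int = {
    val imm20    = (instruction >> 31) & 0x1
    val imm10to1 = (instruction >> 21) & 0x3ff
    val imm11    = (instruction >> 20) & 0x1
    val imm19to12 = (instruction >> 12) & 0xff
    val raw = (imm20 << 20) | (imm19to12 << 12) | (imm11 << 11) | (imm10to1 << 1)
    (raw << 11) >> 11 // sign-extend from bit 20
  }
}
```

For `0x0000006f` this gives opcode `0x6f` (JAL), `rd = 0`, and an offset of 0, so the jump target equals the instruction's own address while `PC + 4` is still produced on the writeback path.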

The waveform shows `io_instruction_address = 0x00001020`.
At this moment, the Write-Back path shows the following behavior:
- `io_wb_reg_write_source = 2`
This selects the “NextInstructionAddress” input of the WB multiplexer, which is correct for JAL/JALR.
- `io_write_enable = 1`
The decode stage marks J-type instructions as producing a value for `rd`, even though in this program the destination is `x0`.
- `io_alu_result = 0x00001020`
This reflects the computed jump target from the EX stage.
- `io_regs_write_data = 0x00001024`
This is exactly `PC + 4`, confirming that the Write-Back unit is generating the correct return address.
Even though the value is ultimately discarded because the write target is `x0`, the waveform demonstrates that the Write-Back logic works exactly as intended:
- the decode stage selects the correct write-back source,
- the WB multiplexer outputs the correct `PC + 4` value,
- the write-enable behavior matches that of a standard jump instruction.
The combination of `regs_write_source = 2`, `write_enable = 1`, and `regs_write_data = PC+4` confirms that JAL instructions propagate through the WB stage correctly.
---