---
# System prepended metadata

title: Implement MMU (Sv32) for MyCPU Report

---

# Implement MMU (Sv32) for MyCPU Report

> This is the summary report. The full development log is here: https://hackmd.io/@sysprog/SyCApAcWWg


## Goals
* The provided pipelined RISC-V design (`4-soc`) is currently a physical-address-only core. Extending it to support Sv32 (32-bit page-based virtual memory) requires substantial hardware changes.
* CSR implementation (Privileged Specification):
  * Implementation of `satp` (Supervisor Address Translation and Protection), which controls paging mode, ASID, and the physical page number of the root page table.
  * Correct handling of `mstatus` and `sstatus`, including privilege mode tracking (MPP/SPP) and permission-related bits such as SUM and MXR.
  * Support for the `sfence.vma` instruction, which flushes TLB entries and requires logic to invalidate selected or all TLB entries.
* Address translation logic:
  * Translation Lookaside Buffers (TLB):
    * Engineering challenge: Designing separate instruction and data TLBs (I-TLB and D-TLB). These must be set-associative or fully associative to maintain reasonable performance.
    * Critical-path concern: TLB lookup sits directly in the instruction-fetch and memory stages. Careful Chisel design is required to avoid degrading the maximum clock frequency.
  * Page Table Walker (PTW):
    * Requires a hardware finite-state machine that walks the two-level Sv32 page table (Level 1 then Level 0) in physical memory when a TLB miss occurs.
    * The PTW must arbitrate memory access with the rest of the pipeline, typically stalling the core while it fetches and processes Page Table Entries (PTEs).
* Hardware page updates (A/D bits):
  * The hardware must check and update the Accessed (A) and Dirty (D) bits in PTEs during the walk. When a page is accessed or written, these bits must be set atomically in memory, which increases the complexity of the PTW’s memory operations.
* Integrating the MMU into the MyCPU pipeline is one of the most error-prone aspects of the project:
  * Pipeline stalling: The core must correctly stall the Fetch stage on an I-TLB miss and the Memory stage on a D-TLB miss while the PTW completes translation.
  * Exception handling:
    * Precise detection of Instruction Page Faults, Load Page Faults, and Store/AMO Page Faults.
    * Correctly writing the faulting virtual address into `stval` (or `mtval`) and transferring control to the trap handler (`stvec`), with proper privilege-mode switching.
* Verification: Ensure that the [mmu-test suite](https://github.com/sysprog21/rv32emu/tree/master/tests/system/mmu) runs correctly and that all translations, exceptions, and corner cases behave as expected.

---

## What is Sv32?

### Sv32 Virtual Memory Overview

Sv32 is the 32-bit virtual memory paging scheme defined in the RISC-V privileged specification.
It translates a 32-bit virtual address (VA) into a physical address (PA) using a two-level page table structure.

Sv32 uses 4 KiB pages, and the virtual address is divided as follows:

```
31            22 21            12 11           0
+---------------+---------------+---------------+
|    VPN[1]     |    VPN[0]     |   page offset |
+---------------+---------------+---------------+
      10 bits         10 bits         12 bits
```

- VPN[1]: Index into the Level-1 page table
- VPN[0]: Index into the Level-0 page table
- page offset: Offset within a 4 KiB page

After translation, the physical address is formed as:
```
PA = [ PPN (from PTE) | page offset ]
```
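
The field split and the final concatenation can be checked with a few lines of Python. This is only a software sketch of the bit slicing; the helper names `split_va` and `make_pa` are made up for illustration and do not appear in the hardware.

```python
def split_va(va):
    """Split a 32-bit Sv32 virtual address into (VPN[1], VPN[0], page offset)."""
    offset = va & 0xFFF          # bits [11:0]
    vpn0 = (va >> 12) & 0x3FF    # bits [21:12]
    vpn1 = (va >> 22) & 0x3FF    # bits [31:22]
    return vpn1, vpn0, offset


def make_pa(ppn, offset):
    """Concatenate a PPN taken from a leaf PTE with the page offset."""
    return (ppn << 12) | offset


vpn1, vpn0, offset = split_va(0x00402004)
print(vpn1, vpn0, hex(offset))      # 1 2 0x4
print(hex(make_pa(0x402, offset)))  # 0x402004
```

The address `0x00402004` is one of the values used later in the standalone test, so the printed fields can be compared directly against the waveform.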

### satp Register and Paging Enable

Paging in Sv32 is controlled by the `satp` CSR (Supervisor Address Translation and Protection).

In Sv32 mode, `satp` contains:
- MODE: Enables Sv32 paging
- ASID: Address Space Identifier (not implemented yet)
- PPN: Physical Page Number of the root (Level-1) page table

When satp.MODE enables Sv32, all instruction fetches and data accesses use virtual addresses and must go through the MMU for VA→PA translation.
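
As a quick reference, the RV32 `satp` layout (MODE in bit 31, ASID in bits 30:22, PPN in bits 21:0) can be decoded as below. The function name `decode_satp` is hypothetical; the example value is the one the standalone test later writes to `satp`.

```python
def decode_satp(satp):
    """Decode an RV32 satp value into its three fields."""
    return {
        "mode": (satp >> 31) & 0x1,    # 1 = Sv32, 0 = bare (no translation)
        "asid": (satp >> 22) & 0x1FF,  # 9-bit ASID (unused in this design)
        "ppn": satp & 0x3FFFFF,        # 22-bit PPN of the root page table
    }


# Root page table at physical address 0x5000, Sv32 enabled.
fields = decode_satp((1 << 31) | (0x00005000 >> 12))
print(fields)                    # {'mode': 1, 'asid': 0, 'ppn': 5}
print(hex(fields["ppn"] << 12))  # 0x5000
```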

### Page Table Walk (PTW)
On a TLB miss, the MMU performs a two-step page table walk:

1. Level-1 Lookup
```
PTE1_addr = satp.PPN * 4096 + VPN[1] * 4
```
2. Level-0 Lookup
```
PTE0_addr = PTE1.PPN * 4096 + VPN[0] * 4
```

If any PTE is invalid or a permission check fails, the MMU must raise the page fault that matches the access type:
- Instruction Page Fault
- Load Page Fault
- Store/AMO Page Fault
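
The two address formulas and the fault selection can be sketched as follows. The helper names are illustrative only; the cause codes 12, 13, and 15 are the page-fault exception codes from the privileged specification, and the example PTE value `0x1801` assumes a pointer to a Level-0 table at `0x6000`, matching the layout used by the standalone test later in this report.

```python
PAGE_SIZE = 4096
PTE_SIZE = 4

# Exception codes defined by the RISC-V privileged specification.
PAGE_FAULT_CAUSE = {"fetch": 12, "load": 13, "store": 15}


def pte1_addr(satp_ppn, va):
    """Physical address of the Level-1 PTE for this VA."""
    vpn1 = (va >> 22) & 0x3FF
    return satp_ppn * PAGE_SIZE + vpn1 * PTE_SIZE


def pte0_addr(pte1, va):
    """Physical address of the Level-0 PTE, given a non-leaf Level-1 PTE."""
    vpn0 = (va >> 12) & 0x3FF
    return (pte1 >> 10) * PAGE_SIZE + vpn0 * PTE_SIZE


print(hex(pte1_addr(5, 0x00009000)))       # 0x5000
print(hex(pte0_addr(0x1801, 0x00009000)))  # 0x6024
print(PAGE_FAULT_CAUSE["store"])           # 15
```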

---

## Supervisor-mode CSR + Trap Infrastructure (Implemented)

### Why implement S-mode first?
Before MMU/PTW becomes meaningful, the core must be able to *enter and run Supervisor mode correctly*, because in a typical Sv32 system the operating system runs in S-mode and controls virtual memory through S-mode CSRs (e.g., `satp`, `stvec`, `sstatus`).  
So I implemented the S-mode CSR set and the trap/return path first, to make sure page table control and page-fault delivery can later work end-to-end in Supervisor context.

---

### 1) What I implemented
- Added Supervisor CSRs in the CSR module:
  - `satp`, `stvec`, `sscratch`, `sepc`, `scause`, `stval`, and the supervisor-visible status view `sstatus`.
- Added a privilege mode state (`priv_mode`) in CSR to track the current execution mode (M or S).
- Extended CLINT to handle trap entry / return and to commit CSR updates via a direct-write interface.
- Implemented exception delegation via `medeleg` (exceptions only; interrupt delegation via `mideleg` is not implemented in this stage).

---

### 2) How it is implemented (hardware behavior)

#### 2.1 CSR storage + `sstatus` behavior
- Keep one physical `mstatus` register.
- Expose `sstatus` as a masked view of `mstatus`:
  - read: `sstatus = mstatus & SSTATUS_MASK`
- Writes to `sstatus` only update the masked bits in `mstatus`, preserving all other fields.
- This ensures S-mode status fields (`SPP`, `SIE`, `SPIE`, etc.) are stored physically in `mstatus` while still behaving like `sstatus` architecturally.
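
A minimal software model of the masked view is shown below. The mask here is an assumption: it covers only `SIE`, `SPIE`, `SPP`, `SUM`, and `MXR`, which may differ from the `SSTATUS_MASK` constant used in the actual CSR module.

```python
# Bit positions from the privileged specification (RV32 mstatus).
SIE, SPIE, SPP = 1 << 1, 1 << 5, 1 << 8
SUM, MXR = 1 << 18, 1 << 19

# Assumed mask for illustration; the real design may expose more fields.
SSTATUS_MASK = SIE | SPIE | SPP | SUM | MXR


def read_sstatus(mstatus):
    """sstatus read: only the supervisor-visible bits of mstatus."""
    return mstatus & SSTATUS_MASK


def write_sstatus(mstatus, value):
    """sstatus write: update masked bits, keep every other mstatus bit."""
    return (mstatus & ~SSTATUS_MASK & 0xFFFFFFFF) | (value & SSTATUS_MASK)


mstatus = (1 << 11) | (1 << 3)            # MPP = S, MIE = 1
mstatus = write_sstatus(mstatus, 0xFFFFFFFF)
print(hex(mstatus))                # 0xc092a  (MPP and MIE untouched)
print(hex(read_sstatus(mstatus)))  # 0xc0122
```

Writing all ones through `sstatus` sets only the masked bits, which is the property that keeps M-mode fields such as `MPP` safe from S-mode software.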

#### 2.2 Trap commit path (atomic CSR updates)
- CSR module provides a “direct write” commit path from CLINT:
  - `direct_write_enable` (M target)
  - `direct_write_enable_s` (S target)
- When CLINT asserts these enables, CSR updates commit atomically in one place:
  - M target writes: `mstatus/mepc/mcause/mtval`
  - S target writes: `sepc/scause/stval` plus `sstatus` effect through the masked write path
- `priv_mode` is updated through a dedicated interface (`priv_write_enable/priv_write_data`) to make privilege transitions explicit and easy to debug in waveforms.

#### 2.3 Trap entry and return (CLINT)
- CLINT determines whether to take a trap (exceptions + optional interrupts).
- On trap entry, it records the faulting PC into `*epc`, writes `*cause`, then redirects PC to `*tvec`:
  - If handled in M-mode:
    - jump to `mtvec`
    - write `mepc/mcause/mtval`
    - set `priv_mode = M`
  - If handled in S-mode:
    - jump to `stvec`
    - write `sepc/scause/stval`
    - set `priv_mode = S`
- Return instructions:
  - `mret`: PC <- `mepc`, and `priv_mode` is restored from `mstatus.MPP` (supports M→S transition)
  - `sret`: PC <- `sepc`, and `priv_mode` stays in S (the current design has no U-mode yet)
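
The return behavior can be modeled in a few lines. This is a sketch of the privilege selection only (the function names and the example `mepc` value are hypothetical, and the updates to `MIE/MPIE/SIE/SPIE` are left out).

```python
PRIV_S, PRIV_M = 1, 3  # privilege encodings from the specification


def mret_target(mstatus, mepc):
    """mret: next PC comes from mepc, next privilege from mstatus.MPP."""
    mpp = (mstatus >> 11) & 0x3
    return mepc, mpp


def sret_target(sepc):
    """sret: next PC comes from sepc; without U-mode the core stays in S."""
    return sepc, PRIV_S


# Same MPP manipulation as the test program: clear MPP, then set MPP = 01.
mstatus = (0x1800 & ~(3 << 11)) | (1 << 11)
print(mret_target(mstatus, 0x1000))  # (4096, 1)
```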

#### 2.4 Exception delegation via `medeleg` (exceptions only)
- Delegation decision is explicit and per-cause:
  - `delegatedToS = (cur_priv != M) && trap_is_exception && medeleg[cause]`
- If delegated:
  - use S-mode CSRs (`sepc/scause/stval`) and jump to `stvec`
- If not delegated:
  - default to M-mode CSRs (`mepc/mcause/mtval`) and jump to `mtvec`
- Interrupt delegation via `mideleg` is intentionally not implemented at this stage.
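
The delegation rule above maps directly to a small decision function. The sketch below is a behavioral model with made-up names (`trap_target` and the returned dictionary keys); it only mirrors the `delegatedToS` expression and the two CSR groups listed above.

```python
PRIV_S, PRIV_M = 1, 3


def trap_target(cur_priv, is_exception, cause, medeleg):
    """Pick the handling mode, trap vector, and CSR group for a trap."""
    delegated = (
        cur_priv != PRIV_M
        and is_exception
        and bool((medeleg >> cause) & 1)
    )
    if delegated:
        return {"priv": PRIV_S, "vector": "stvec",
                "csrs": ("sepc", "scause", "stval")}
    return {"priv": PRIV_M, "vector": "mtvec",
            "csrs": ("mepc", "mcause", "mtval")}


ECALL_FROM_S = 9
print(trap_target(PRIV_S, True, ECALL_FROM_S, 1 << 9)["vector"])  # stvec
print(trap_target(PRIV_S, True, ECALL_FROM_S, 0)["vector"])       # mtvec
```

A trap taken while already in M-mode is never delegated, even when the matching `medeleg` bit is set.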

---

### 3) How I verified it (waveform checkpoints)

:::spoiler Test code
```asm=
.section .text
    .globl main
    .option norvc

main:
    # ===== M-mode checkpoint =====
    # Write a known value to mscratch to mark execution in M-mode
    li      t0, 0x4D300001
    csrw    mscratch, t0

    # ===== Configure MRET target and set MPP = S =====
    # Prepare mstatus so that MRET returns to Supervisor mode
    csrr    t1, mstatus
    li      t0, ~(3 << 11)      # Clear MPP[12:11]
    and     t1, t1, t0
    li      t0, (1 << 11)       # Set MPP = 01 (Supervisor mode)
    or      t1, t1, t0
    csrw    mstatus, t1

    # Set return address for MRET
    la      t0, s_main
    csrw    mepc, t0

    # Second M-mode checkpoint before MRET
    li      t0, 0x4D300002
    csrw    mscratch, t0

    # Return from M-mode to S-mode
    mret


# =========================
# S-mode trap handler
# =========================
    .align 4
s_trap:
    # Trap entry checkpoint
    li      t0, 0x5330EE01
    csrw    sscratch, t0

    # Advance SEPC to skip the faulting ECALL instruction
    csrr    t1, sepc
    addi    t1, t1, 4
    csrw    sepc, t1

    # Trap exit checkpoint before SRET
    li      t0, 0x5330EE02
    csrw    sscratch, t0

    # Return from Supervisor trap
    sret


# --- Insert a large gap to make PC jumps clearly visible in waveforms ---
    .align 4
    .space  1024                # Can be increased (e.g., 4096) for clearer separation


# =========================
# S-mode main
# =========================
    .align 4
s_main:
    # Write the current PC into sscratch for easy identification in waveforms
    auipc   t2, 0
    csrw    sscratch, t2        # sscratch = PC of s_main

    # Configure Supervisor trap vector to point to s_trap
    # This is intentionally done in S-mode
    la      t0, s_trap
    csrw    stvec, t0

    # S-mode execution checkpoint before ECALL
    li      t0, 0x53300002
    csrw    sscratch, t0

    # Trigger Supervisor-mode ECALL
    ecall

after_ecall:
    # Checkpoint indicating successful return from SRET
    li      t0, 0x53300003
    csrw    sscratch, t0

done:
    # Infinite loop to keep execution observable
    j       done
```
:::

#### 3.1 M→S transition via `mret`
- Test sets `mstatus.MPP = S` and `mepc = s_main`, then executes `mret`.
- Expected waveform:
  - PC jumps to `s_main`
  - `priv_mode` switches from M to S

![Screenshot 2026-01-06 12.21.32 AM](https://hackmd.io/_uploads/BkYfrvYEbe.png)

#### 3.2 S-mode `ecall` trap + `sret` return
- In S-mode, set `stvec = s_trap`, execute `ecall`.
- Handler increments `sepc` by 4 then executes `sret`.
- Expected waveform:
  - `sepc` captures the faulting PC
  - `scause = 9` (ECALL from S-mode)
  - PC jumps to `stvec`
  - `sret` returns to the instruction after `ecall`
 
![Screenshot 2026-01-06 12.22.46 AM](https://hackmd.io/_uploads/rkHwrPYVZl.png)
![Screenshot 2026-01-06 12.25.22 AM](https://hackmd.io/_uploads/SkbW8vt4Ze.png)


#### 3.3 `medeleg` behavior (delegated vs non-delegated)
- Run the same S-mode `ecall`, but toggle `medeleg[9]`:
  - `medeleg[9] = 1`:
    - trap stays in S-mode
    - PC jumps to `stvec`
    - writes `sepc/scause`

![Screenshot 2026-01-09 8.25.21 PM](https://hackmd.io/_uploads/S1AnXd0VWx.png)


  - `medeleg[9] = 0`:
    - trap escalates to M-mode (default behavior)
    - PC jumps to `mtvec`
    - writes `mepc/mcause`
    - `priv_mode` transitions S → M


![Screenshot 2026-01-09 8.31.01 PM](https://hackmd.io/_uploads/SyffSdA4-e.png)

---

## PTW + TLB (Full Version Only)

### 1) What I implemented
To support Sv32 VA→PA translation, I implemented inside the MMU:

- **Separate ITLB / DTLB**
  - **8 sets × 2 ways** (16 entries) each
  - Cache translations for **4KB pages** and **4MB superpages**
  - Per-set **round-robin replacement** (waveform-friendly and deterministic)

- **A shared PTW (Page Table Walker) FSM**
  - Performs a **full 2-level Sv32 walk**:
    - L1 PTE fetch → (if non-leaf) L0 PTE fetch
  - Produces either:
    - **A leaf translation** → fills ITLB/DTLB
    - **A fault** → raises I/D fault signals (full trap plumbing is the next step)

- **Pipeline stall + bus arbitration**
  - PTW must fetch PTEs via the **same AXI/bus** that the core MEM stage uses
  - So I added a PTW memory port and used **mutual exclusion**:
    - When PTW is active, **stall the whole core** (reuse `mem_stall`)
    - Block MEM stage from issuing requests during MMU stall
    - Use a **MUX** controlled by `ptw_active` to route bus request/response
      to either **PTW** or **normal MEM stage**
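
The mutual-exclusion scheme amounts to a two-input MUX plus a stall signal. The sketch below models it in software; `arbitrate` and `route_response` are illustrative names, and requests are represented by arbitrary Python values rather than real bus signals.

```python
def arbitrate(ptw_active, ptw_req, mem_req):
    """Return (request driven onto the bus, core stall flag)."""
    if ptw_active:
        # PTW owns the bus; the MEM stage request is blocked and the core stalls.
        return ptw_req, True
    return mem_req, False


def route_response(ptw_active, bus_resp):
    """Return (response seen by PTW, response seen by MEM stage)."""
    return (bus_resp, None) if ptw_active else (None, bus_resp)


print(arbitrate(True, "read PTE @0x5000", "load @0x9000"))
# ('read PTE @0x5000', True)
print(route_response(False, 0xCAFE1000))  # (None, 3405647872)
```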

---

### 2) How the TLB works (high level)
Each (I/D) TLB entry stores:
- `valid`
- `tag`
- `ppn` (PA[31:12])
- `isSuper` (distinguish 4KB vs 4MB superpage entry)

Lookup:
- **4KB page** lookup uses **VPN0-based set** and a 4KB tag
- **4MB superpage** lookup uses **VPN1-based set** and a superpage tag
- `isSuper` prevents matching the wrong page size

On hit:
- output `PA = (PPN << 12) | page_offset`

On PTW completion:
- fill the selected set/way
- update per-set RR victim pointer
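
A behavioral model of this organization is sketched below. Several details are assumptions rather than facts about the hardware: the tag is taken to be the full 20-bit VPN for 4KB entries and `VPN[1]` for superpage entries, the set index is the low 3 bits of the relevant VPN field, and a superpage hit substitutes the low 10 PPN bits with `VPN[0]` of the looked-up address.

```python
NUM_SETS, NUM_WAYS = 8, 2


class Tlb:
    def __init__(self):
        self.sets = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        self.victim = [0] * NUM_SETS  # per-set round-robin pointer

    @staticmethod
    def _index_and_tag(va, is_super):
        vpn1, vpn0 = (va >> 22) & 0x3FF, (va >> 12) & 0x3FF
        if is_super:
            return vpn1 % NUM_SETS, vpn1           # VPN1-based set and tag
        return vpn0 % NUM_SETS, (vpn1 << 10) | vpn0  # VPN0-based set

    def lookup(self, va):
        """Return the translated PA on a hit, or None on a miss."""
        for is_super in (False, True):
            idx, tag = self._index_and_tag(va, is_super)
            for e in self.sets[idx]:
                if e and e["is_super"] == is_super and e["tag"] == tag:
                    ppn = e["ppn"]
                    if is_super:  # assumed: low PPN bits follow the VA
                        ppn = (ppn & ~0x3FF) | ((va >> 12) & 0x3FF)
                    return (ppn << 12) | (va & 0xFFF)
        return None

    def fill(self, va, ppn, is_super):
        """Install a translation in the victim way and advance the pointer."""
        idx, tag = self._index_and_tag(va, is_super)
        way = self.victim[idx]
        self.sets[idx][way] = {"tag": tag, "ppn": ppn, "is_super": is_super}
        self.victim[idx] = (way + 1) % NUM_WAYS


tlb = Tlb()
tlb.fill(0x00001000, 0x001, is_super=False)
tlb.fill(0x00401000, 0x401, is_super=True)
print(hex(tlb.lookup(0x00001234)))  # 0x1234
print(hex(tlb.lookup(0x00402004)))  # 0x402004
```

In this example both entries land in set 1 and both carry the tag value 1, so the superpage lookup only resolves correctly because the `is_super` flag is compared as well.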

---


### 3) How the PTW works (full Sv32 walk)

![image](https://hackmd.io/_uploads/HkTXWnnHbx.png)

The PTW FSM contains the following states:

- **sIdle**  
  The PTW is inactive and the pipeline is not stalled by the MMU.  
  When Sv32 is enabled and an instruction/data access misses in the corresponding TLB, the MMU latches the faulting virtual address and access type (I-side fetch vs D-side load/store), then starts a page table walk.

- **sL1Req**  
  Issues a memory read request for the level-1 PTE (L1 PTE).  
  The requested physical address is computed from `satp.ppn` (root page table base) and `VPN[1]` extracted from the latched VA.

- **sL1Wait**  
  Waits for the memory response containing the L1 PTE.  
  Once the response arrives, the PTW checks:
  1) validity and illegal encoding (e.g., `V=0` or `R=0 && W=1`),  
  2) whether the entry is a **leaf** (translation terminates here) or a **pointer** (must continue to level-0),  
  3) if it is an L1 leaf (superpage/4MB mapping), it also checks alignment constraints (Sv32 requires `PPN0 == 0` for a superpage leaf).  
  If the L1 PTE is a valid leaf, the walk completes and transitions to `sLeaf`.  
  If it is a valid non-leaf pointer, the PTW proceeds to fetch the L0 PTE.

- **sL0Req**  
  Issues a memory read request for the level-0 PTE (L0 PTE).  
  The base address is derived from the PPN in the L1 PTE, and the index comes from `VPN[0]` of the latched VA.

- **sL0Wait**  
  Waits for the memory response containing the L0 PTE.  
  When the response arrives, the PTW validates the PTE similarly:
  - invalid or illegal encoding → page fault (`sFault`)
  - leaf entry → translation complete (`sLeaf`)
  - non-leaf entry at L0 → page fault (`sFault`) because Sv32 has only two levels

- **sLeaf**  
  Finalization state for a successful translation.  
  In this state the PTW:
  1) checks access permissions based on the request type:
     - instruction fetch requires `X`
     - load requires `R`
     - store requires `W`
  2) constructs the final translated PPN:
     - for a normal 4KB page, PPN comes directly from the L0 leaf PTE
     - for a 4MB superpage, the PPN is formed by combining `PPN1` from the L1 leaf PTE with `VPN0` from the VA
  3) fills the appropriate TLB (ITLB or DTLB), including replacement selection (2-way set-associative with per-set round-robin victim)
  After filling the TLB, the PTW returns to `sIdle` so the stalled request can be retried using the translated physical address.

- **sFault**  
  Terminal state for translation failure.  
  The MMU reports a fault to either the I-side or D-side depending on which request triggered the walk, then returns to `sIdle`.  
  *(Note: fault signaling to the IF stage and writing the corresponding `scause` are not implemented yet and will be added later.)*


> Note: A/D-bit update is **not implemented yet** (I currently pre-set A/D in the test PTEs).
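
The state sequence described above can be replayed in software. The following is a simplified behavioral sketch, not the Chisel implementation: memory is a Python dict from PTE address to PTE value (missing entries read as 0), the `U`, `SUM`, and `MXR` checks are omitted, A/D bits are ignored, and a permission failure in `sLeaf` is assumed to move to `sFault`.

```python
V, R, W, X = 0x1, 0x2, 0x4, 0x8
REQUIRED = {"fetch": X, "load": R, "store": W}


def classify(pte):
    """Return 'fault', 'leaf', or 'pointer' for a 32-bit Sv32 PTE."""
    if not (pte & V) or ((pte & W) and not (pte & R)):
        return "fault"
    return "leaf" if pte & (R | X) else "pointer"


def walk(mem, satp_ppn, va, access):
    """Step through the PTW states; return (state trace, PPN or None)."""
    vpn1, vpn0 = (va >> 22) & 0x3FF, (va >> 12) & 0x3FF
    state, trace, level, addr, pte = "sL1Req", ["sIdle"], 1, 0, 0
    while True:
        trace.append(state)
        if state == "sL1Req":
            addr, state = satp_ppn * 4096 + vpn1 * 4, "sL1Wait"
        elif state == "sL1Wait":
            pte = mem.get(addr, 0)
            kind = classify(pte)
            aligned = ((pte >> 10) & 0x3FF) == 0  # superpage needs PPN0 == 0
            if kind == "pointer":
                state = "sL0Req"
            elif kind == "leaf" and aligned:
                state = "sLeaf"
            else:
                state = "sFault"
        elif state == "sL0Req":
            addr, level, state = (pte >> 10) * 4096 + vpn0 * 4, 0, "sL0Wait"
        elif state == "sL0Wait":
            pte = mem.get(addr, 0)
            state = "sLeaf" if classify(pte) == "leaf" else "sFault"
        elif state == "sLeaf":
            if not (pte & REQUIRED[access]):
                state = "sFault"
                continue
            ppn = pte >> 10
            if level == 1:  # superpage: low PPN bits come from VPN0
                ppn = (ppn & ~0x3FF) | vpn0
            return trace, ppn
        else:  # sFault
            return trace, None


# L1[0] points to an L0 table at 0x6000, L1[1] is a 4MB leaf, L0[9] maps page 9.
mem = {0x5000: 0x1801, 0x5004: 0x1000CF, 0x6024: (9 << 10) | 0xCF}
print(walk(mem, 5, 0x00401000, "store"))
print(walk(mem, 5, 0x00009000, "load"))
```

The first call ends after a single PTE fetch with PPN `0x401`, and the second goes through both levels and returns PPN `0x9`.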

---

### 4) How I verified it
Because the full mmu-test requires fault delivery + other missing pieces, I validated PTW/TLB behavior using a **standalone Sv32 assembly test** that:
- Builds page tables in memory at fixed, aligned physical addresses
- Enables `satp`
- Triggers controlled accesses that force:
  - **L1 leaf (4MB superpage)** translations
  - **L1 pointer → L0 leaf (4KB)** translations
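
The PTE and `satp` values that the test writes can be precomputed to compare against memory contents in the waveform. The constants below are copied from the assembly listing; `make_pte` is a helper name introduced only for this sketch.

```python
PTE_PTR, PTE_LEAF = 0x001, 0x0CF          # V only / V|R|W|X|A|D
L1_PT_PA, L0_PT0_PA = 0x00005000, 0x00006000


def make_pte(ppn, flags):
    """Sv32 PTE: the 22-bit PPN sits in bits [31:10], flags in bits [7:0]."""
    return (ppn << 10) | flags


l1_entry0 = make_pte(L0_PT0_PA >> 12, PTE_PTR)   # pointer to the L0 table
l1_entry1 = make_pte(1 << 10, PTE_LEAF)          # superpage: PPN1 = 1, PPN0 = 0
l0_entries = [make_pte(vpn0, PTE_LEAF) for vpn0 in range(1024)]  # identity map
satp = (1 << 31) | (L1_PT_PA >> 12)

print(hex(l1_entry0))      # 0x1801
print(hex(l1_entry1))      # 0x1000cf
print(hex(l0_entries[9]))  # 0x24cf
print(hex(satp))           # 0x80000005
```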

:::spoiler Test code

```asm=
.section .text
.globl main

.equ L1_PT_PA,   0x00005000      # 4KB aligned
.equ L0_PT0_PA,  0x00006000      # vpn1=0  -> VA 0x0000_0000 ~ 0x003F_FFFF

.equ PTE_PTR,    0x001           # V=1, R=W=X=0  (pointer)
.equ PTE_LEAF,   0x0CF           # V|R|W|X|A|D

main:
  # 1) Point mtvec at the trap entry
  la   t0, __trap_entry
  csrw mtvec, t0

  # ---------------------------------------------------
  # 2) clear L1 + L0 tables (only tables we use)
  # ---------------------------------------------------
  li   t2, 0

  # clear L1 page table @ L1_PT_PA
  li   t0, L1_PT_PA
  li   t1, 1024
1:
  sw   t2, 0(t0)
  addi t0, t0, 4
  addi t1, t1, -1
  bnez t1, 1b

  # clear L0_PT0 @ L0_PT0_PA
  li   t0, L0_PT0_PA
  li   t1, 1024
2:
  sw   t2, 0(t0)
  addi t0, t0, 4
  addi t1, t1, -1
  bnez t1, 2b

  # ---------------------------------------------------
  # 3) L1[0] -> L0_PT0 (pointer)
  #    L1[1] = superpage leaf (4MB) for VA 0x0040_0000~0x007F_FFFF
  # ---------------------------------------------------
  li   t0, L1_PT_PA

  # L1[0] pointer -> L0_PT0
  li   t1, (L0_PT0_PA >> 12)
  slli t1, t1, 10
  ori  t1, t1, PTE_PTR
  sw   t1, 0(t0)              # entry vpn1=0

  # L1[1] superpage leaf:
  # PPN1 = 1 and PPN0 must be 0, so the PTE value is (1 << 20) | flags
  li   t2, (1 << 20)          # PPN1 goes to bits [31:20]
  ori  t2, t2, PTE_LEAF
  sw   t2, 4(t0)              # entry vpn1=1 (superpage leaf)

  # ---------------------------------------------------
  # 4) Fill L0_PT0: identity map 0~4MB (vpn1=0 chunk)
  # ---------------------------------------------------
  li   t0, L0_PT0_PA
  li   t1, 0                  # vpn0
4:
  slli t2, t1, 12             # va within vpn1=0 region
  srli t3, t2, 12             # ppn = va>>12 (identity)
  slli t3, t3, 10
  ori  t3, t3, PTE_LEAF
  sw   t3, 0(t0)

  addi t0, t0, 4
  addi t1, t1, 1
  li   t4, 1024
  blt  t1, t4, 4b

  # ---------------------------------------------------
  # 5) Enter S-mode
  # ---------------------------------------------------
  csrr t0, mstatus
  li   t1, ~(3 << 11)
  and  t0, t0, t1
  li   t1, (1 << 11)          # MPP=S
  or   t0, t0, t1
  csrw mstatus, t0

  la   t0, s_main
  csrw mepc, t0
  mret

# =====================================================
# S-mode main 
# =====================================================
.align 4
s_main:
  li   t0, (1 << 31) | (L1_PT_PA >> 12)
  csrw satp, t0

  li   s0, 0                 # error accumulator (0 = pass)

  # ===================================================
  # (A) Superpage region (vpn1=1 megapage)
  # touch a few addresses: base + 0x1000*k + small offset
  # ===================================================
  li   s1, 0x00400000         # superpage base
  li   s2, 0xA0000000         # seed

  # A0: [0x00401000]
  li   t1, 0x00401000
  li   t2, 0xA0000001
  sw   t2, 0(t1)
  lw   t3, 0(t1)
  xor  t4, t2, t3
  or   s0, s0, t4

  # A1: [0x00402004]
  li   t1, 0x00402004
  li   t2, 0xA0000002
  sw   t2, 0(t1)
  lw   t3, 0(t1)
  xor  t4, t2, t3
  or   s0, s0, t4

  # A2: [0x00403008]
  li   t1, 0x00403008
  li   t2, 0xA0000003
  sw   t2, 0(t1)
  lw   t3, 0(t1)
  xor  t4, t2, t3
  or   s0, s0, t4

  # A3: [0x0040400C]
  li   t1, 0x0040400C
  li   t2, 0xA0000004
  sw   t2, 0(t1)
  lw   t3, 0(t1)
  xor  t4, t2, t3
  or   s0, s0, t4


  # ===================================================
  # (B) 4KB region (vpn1=0 via L0)
  # Force SAME set conflicts:
  # base = 0x00001000 keeps VA[14:12]=001
  # stride = 0x8000 changes VA[21:15] (tag) but keeps set
  # Touch 6 distinct pages => must evict in 2-way
  # ===================================================
  li   s4, 0xCAFE0000

  # B0: 0x00001000
  li   t1, 0x00001000
  li   t2, 0xCAFE1000
  sw   t2, 0(t1)
  lw   t3, 0(t1)
  xor  t4, t2, t3
  or   s0, s0, t4

  # B1: 0x00009000
  li   t1, 0x00009000
  li   t2, 0xCAFE1001
  sw   t2, 0(t1)
  lw   t3, 0(t1)
  xor  t4, t2, t3
  or   s0, s0, t4

  # B2: 0x00011000
  li   t1, 0x00011000
  li   t2, 0xCAFE1002
  sw   t2, 0(t1)
  lw   t3, 0(t1)
  xor  t4, t2, t3
  or   s0, s0, t4

  # B3: 0x00019000
  li   t1, 0x00019000
  li   t2, 0xCAFE1003
  sw   t2, 0(t1)
  lw   t3, 0(t1)
  xor  t4, t2, t3
  or   s0, s0, t4

  # B4: 0x00021000
  li   t1, 0x00021000
  li   t2, 0xCAFE1004
  sw   t2, 0(t1)
  lw   t3, 0(t1)
  xor  t4, t2, t3
  or   s0, s0, t4

  # B5: 0x00029000
  li   t1, 0x00029000
  li   t2, 0xCAFE1005
  sw   t2, 0(t1)
  lw   t3, 0(t1)
  xor  t4, t2, t3
  or   s0, s0, t4


  # ===================================================
  # (C) Re-touch first two addresses again
  # If replacement is working, at least one should miss/refill
  # ===================================================
  # C0: 0x00001000 again
  li   t1, 0x00001000
  li   t2, 0xDEAD0000
  sw   t2, 0(t1)
  lw   t3, 0(t1)
  xor  t4, t2, t3
  or   s0, s0, t4

  # C1: 0x00009000 again
  li   t1, 0x00009000
  li   t2, 0xDEAD0001
  sw   t2, 0(t1)
  lw   t3, 0(t1)
  xor  t4, t2, t3
  or   s0, s0, t4


done:
  j done

```

:::
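
The set-conflict pattern in part (B) can be sanity-checked offline. The sketch below assumes the set index is `VA[14:12]` (as stated in the test comments) and models a single 2-way set with round-robin replacement; it counts page touches, not individual `sw`/`lw` instructions.

```python
NUM_SETS, NUM_WAYS = 8, 2

# Six pages from part (B): base 0x1000, stride 0x8000.
pages = [0x00001000 + k * 0x8000 for k in range(6)]


def set_of(va):
    """Set index for a 4KB page, assuming index = VA[14:12]."""
    return (va >> 12) % NUM_SETS


def simulate(accesses):
    """Return the accesses that miss in one 2-way round-robin set."""
    ways, victim, misses = [None] * NUM_WAYS, 0, []
    for va in accesses:
        vpn = va >> 12
        if vpn not in ways:
            misses.append(va)
            ways[victim] = vpn
            victim = (victim + 1) % NUM_WAYS
    return misses


print([set_of(p) for p in pages])  # [1, 1, 1, 1, 1, 1]
misses = simulate(pages + pages[:2])  # part (B), then part (C) re-touches
print(len(misses))                 # 8
```

Under these assumptions every page touch misses, including both re-touches in part (C), because the first two pages were evicted by the later four.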

Waveforms confirm the PTW/TLB loop works end-to-end:

- **I-side (Instruction fetch)**
  - On an ITLB miss, the PTW performs the expected Sv32 walk and then fills the **ITLB**.
    - 4KB case: `L1 fetch → L0 fetch → leaf → ITLB fill`
    - (If the VA maps to a superpage) 4MB case: `L1 fetch → leaf → ITLB fill`
  - After the fill, the same fetch is retried and **hits in ITLB**.

    ![Screenshot 2026-01-18 7.04.27 PM](https://hackmd.io/_uploads/H1KHAE9SZe.png)

- **D-side (Load/Store)**
  - I used **two waveforms** to cover both translation patterns:
    1) **D-side 1-stage (superpage / 4MB)**
       - `L1 fetch → leaf → DTLB fill`
       - The retried load/store then **hits in DTLB**.

    ![Screenshot 2026-01-18 11.22.15 PM](https://hackmd.io/_uploads/rkEh5_qBZe.png)

    2) **D-side 2-stage (normal 4KB)**
       - `L1 fetch → L0 fetch → leaf → DTLB fill`
       - The retried load/store then **hits in DTLB**.

    ![Screenshot 2026-01-19 12.00.01 AM](https://hackmd.io/_uploads/S1aK7t9S-l.png)

---

### 5) Current limitation (next work)
I can raise `i_fault/d_fault` from PTW, but **end-to-end page fault handling is not complete yet**, mainly because:
- In a pipeline, faults must be **precise** (only the correct-path instruction should trap)
- If the pipeline keeps presenting the same faulting request, PTW can re-walk repeatedly unless the core **kills the request + redirects PC** to the handler
- The final integration step is:
  - carry fault info to the right stage,
  - flush/redirect correctly,
  - write `stval/scause/sepc` via the trap path.


## Next work / Not finished yet

There are still several missing pieces before the Sv32 MMU is “complete” and OS-ready:

- **Run the official `mmu-test` suite**
  - I currently cannot pass the full test suite because the remaining exception/fault plumbing is incomplete (see below).
  - Target: make the core pass the `rv32emu mmu-test` end-to-end (translation + faults + corner cases).

- **Precise page fault handling (end-to-end)**
  - PTW can raise `i_fault/d_fault`, but I still need to:
    - generate the correct fault type (`Instruction/Load/Store Page Fault`) and write **`scause`**
    - write the faulting VA into **`stval`**
    - ensure faults are **precise** in a pipelined core (only correct-path instructions trap)
    - correctly **flush/kill** the faulting request and **redirect PC** to the trap handler (`stvec`) to avoid infinite re-walk loops.

- **A/D bit update in hardware**
  - Current tests pre-set `A/D` in PTEs; PTW is read-only.
  - Target: implement atomic A/D updates (including the extra memory write sequence and corner cases).

 
- **`sfence.vma` support**
  - TLB flush/invalidate behavior is still pending.
  - Target: implement `sfence.vma` and verify selective + global invalidation.