Zisk zkVM Memory Model

# Zisk zkVM Memory Model The efficiency and integrity of a zkVM are fundamentally rooted in its memory model. The model must not only be versatile enough to run complex programs but also be rigidly structured to allow for efficient and verifiable proof generation. The Zisk zkVM, employs a memory model designed to ensure correctness and consistency for every memory access. This note explores the architecture, components, and constraints that define Zisk's memory system. ## Core Concepts At its heart, Zisk's memory is a **64-bit, little-endian** addressable space. Every memory operation, regardless of its type, is tracked and ordered. The system relies on two key principles for verification: 1. **Chronological and Address-Based Sorting**: All memory operations are first sorted by their memory address. For operations at the same address, they are further sorted by a `mem-step`, which acts as a logical timestamp. This strict ordering is the foundation for verifying the state transitions of memory cells. 2. **Permutation Checks**: A permutation check is used to link every memory operation executed by the CPU with the corresponding entry in the sorted memory trace. This ensures that the processor's view of memory perfectly aligns with the verified memory state, leaving no room for inconsistencies. Each step of the main processor execution, or `main-step`, can correspond to multiple memory operations. Think of a `main-step` as a single tick of a clock, i.e. one discrete moment in which the CPU performs an instruction. However, a single instruction can be complex, sometimes requiring it to read from multiple memory locations before writing a result. To track these sub-operations, Zisk uses a more granular clock called a `mem-step`. If a `main-step` is like a second, a `mem-step` is like a millisecond. It gives a unique and ordered timestamp to every single memory access, ensuring there is no ambiguity about what happened when. The `offset` is what determines the specific "millisecond" within that "second." It's a small, fixed number assigned to each memory access within a single `main-step`. Zisk reserves specific offset "slots" for different types of memory access to maintain order. For instance, reads are typically assigned lower offsets (like 0, 1, 2) and writes are assigned a higher offset (like 3). This guarantees that within a single instruction, the prover can verify that all necessary data was read *before* a new value was written. This relationship is captured by the formula: \begin{aligned} mem\_step &= \text{RESERVED_MEM_STEPS} \\ &+ (main\_step \times \text{MAX_MEM_STEPS_PER_MAIN_STEP}) \\&+ \text{offset} \end{aligned} where `RESERVED_MEM_STEPS` is a starting constant (usually 1) and `MAX_MEM_STEPS_PER_MAIN_STEP` is the total number of slots available (usually 4). :::info ### An Example in Action Let's imagine the CPU executes the following operations over two `main-steps`. **CPU Execution Order:** 1. **main-step 5**: Read value from address `0xA008`. 2. **main-step 5**: Write value `42` to address `0xA000`. 3. **main-step 6**: Read value from address `0xA000`. 4. **main-step 6**: Read value from address `0xA008`. **Choosing the Offsets:** The system assigns offsets based on a predetermined sequence to ensure order within a `main-step`. * In `main-step 5`, the first operation is a **Read**, so it gets the first available read slot, `offset = 0`. The second operation is a **Write**. Writes are typically handled after reads and are assigned a higher, reserved slot, so it gets `offset = 3`. * In `main-step 6`, the first operation is another **Read**, so it starts again at the first read slot, `offset = 0`. The second operation is also a **Read**, so it takes the next available read slot, `offset = 1`. **Calculating the `mem-steps`:** Using the formula with `MAX_MEM_STEPS_PER_MAIN_STEP = 4`, we get: * **Op 1 (Read `0xA008`)**: `main-step`=5, `offset`=0 → `mem_step` = 1 + (5 * 4) + 0 = **21** * **Op 2 (Write `0xA000`)**: `main-step`=5, `offset`=3 → `mem_step` = 1 + (5 * 4) + 3 = **24** * **Op 3 (Read `0xA000`)**: `main-step`=6, `offset`=0 → `mem_step` = 1 + (6 * 4) + 0 = **25** * **Op 4 (Read `0xA008`)**: `main-step`=6, `offset`=1 → `mem_step` = 1 + (6 * 4) + 1 = **26** For verification, these operations are **not** processed in execution order. Instead, they are sorted first by `address`, then by `mem-step`. **Verified Memory Trace Order:** | Address | Mem-Step | Operation | Value | | :------- | :------- | :-------- | :---------------- | | `0xA000` | 24 | Write | 42 | | `0xA000` | 25 | Read | 42 | | `0xA008` | 21 | Read | (previous value) | | `0xA008` | 26 | Read | (same previous value) | This sorted trace is what the zk-prover verifies. It checks that the value read from address `0xA000` at `mem-step` 25 correctly follows the write at `mem-step` 24. ::: ## Memory Types Zisk defines three distinct types of memory, each with specific rules enforced by polynomial constraints. ### RAM (mem) This is the standard **Random Access Memory**. It is a mutable region that allows for regular read and write operations. The state of a RAM cell can change multiple times throughout the program's execution. The correctness of these transitions is verified by checking the sequence of operations at each address, sorted by their `mem-step`. For a given address, if an operation is a read, its value must match the value of the preceding operation (either a read or a write) at that same address. ### ROM (rom-data) This represents **Read-Only Memory**. It is designed to be "write-once, read-many." The ROM is initialized with the program's code and constants at startup. The constraints enforce that the very first operation on any ROM address must be a write. All subsequent operations to that same address are strictly limited to being reads of that initial value. Any attempt to overwrite an existing value will fail the proof verification. ### INPUT (input-data) This is a special, read-only memory region used to feed **private inputs** to the program. The primary purpose of this memory type is to allow the zkVM to perform computations on data that the prover wishes to keep secret from the verifier without making it public. For instance, a program could read a secret number from the input stream, compute its hash, and prove that the result matches a public commitment. The proof would validate the computation's integrity without ever exposing the secret number itself. Its behavior is unique: * A special address, `0x90000000`, is mapped to an **input stream**. Each read from this address consumes the next value from the stream. * Reads from any other address within the INPUT region simply **return the address itself** as the value. This feature acts as a safety measure, ensuring deterministic behavior if the program attempts to read from an uninitialized part of the input space. This type of memory is defined in the PIL as a `free_input_mem`, which is a variation of immutable memory where all operations are reads. ## Memory Map The Zisk memory space is precisely organized into regions, each serving a distinct purpose. The layout is defined by constants in the virtual machine's implementation. | Address Range | Name | Type | Description | | :--- | :--- | :--- | :--- | | `0x1000` - `0x1004` | BIOS | ROM | Contains the entry (`ROM_ENTRY`) and exit (`ROM_EXIT`) points for the BIOS, which handles program setup and finalization. | | `0x80000000`... | Program | ROM | Starting address (`ROM_ADDR`) for the main program's instructions. | | `0x90000000`... | Input Data | INPUT | Location (`INPUT_ADDR`) for program inputs. | | `0xA0000000`... | System Area | RAM | Reserved for system operations (`SYS_ADDR`). The first 256 bytes map to the 32 general-purpose registers. Includes the `UART_ADDR` for standard output. | | `0xA0010000`... | Output Data | RAM | A region (`OUTPUT_ADDR`) where the program writes its results. | | `0xA0020000`... | General Purpose | RAM | Available memory (`AVAILABLE_MEM_ADDR`) for the program's dynamic use (stack, heap, etc.). | ## Enforcing Correctness with PIL The rules of the memory model are not just conventions; they are enforced by a set of polynomial constraints defined in PIL (Polynomial Identity Language). ### Memory Segmentation and Continuations To handle programs with a large number of memory operations, the entire memory trace is broken down into smaller, fixed-size "segments". The integrity across these segments is maintained using **continuations**. At the end of a segment, the state of the last memory operation (its address, `mem-step`, and value) is recorded. The subsequent segment begins by "assuming" this exact state as its starting point. This is enforced via `direct_update_proves` and `direct_update_assumes` opcodes in PIL. For a memory area starting at $base\_address$ and divided into segments, the link between segment $i$ and segment $i+1$ is established as follows: - **Segment $i$ proves its final state**: `direct_update_proves(CONT_ID, [base_address, i+1, last_addr, last_step, ...last_value])` - **Segment $i+1$ assumes its initial state**: `direct_update_assumes(CONT_ID, [base_address, i+1, prev_addr, prev_step, ...prev_value])` The bus ensures that the values "proved" by segment $i$ match the values "assumed" by segment $i+1$. A global constraint initializes the very first segment and verifies the very last one, preventing circular proofs and ensuring a single, valid execution trace. ### State Transition Constraints Within a segment, constraints govern how memory state can evolve from one row to the next. A key flag is `addr_changes`, which is 1 if the current operation's address differs from the previous one, and 0 otherwise. - If the address does not change ($addr\_changes = 0$): - And the operation is a read ($wr = 0$), the value must be identical to the previous row's value. \begin{equation} (1 - addr\_changes) \times (1 - wr) \times (value - previous\_value) = 0 \end{equation} - If the address changes ($addr\_changes = 1$): - And the operation is a read on mutable memory ($wr = 0$), the value must be zero, representing the default state of uninitialized memory. \begin{equation} addr\_changes \times (1 - wr) \times value = 0 \end{equation} ### Memory Alignment The core memory component in PIL operates on 8-byte aligned memory accesses. However, the CPU can perform non-aligned reads and writes of 1, 2, or 4 bytes. These non-aligned accesses are handled by a dedicated **Memory Align** component. When a non-aligned operation occurs, the main state machine requests one or two aligned 8-byte chunks from the memory. The `mem.rs` implementation shows this logic in the `read_required` function, which fetches the necessary 8-byte words containing the requested data. The Memory Align component then proves that the smaller value was correctly extracted from or inserted into these larger, aligned words.