Valida zkVM Design

# Valida zkVM Design *Our vision:* We want to make it practical to compile conventional code to a zkVM with reasonable performance. No zkVM today is optimized for this goal. We desire a VM with neither registers nor a dedicated stack, and one that has a small field size with low-degree constraints. We also want efficient compilation from LLVM IR to the set of VM instructions. ## Architecture The zkVM consists of a CPU and several coprocessors, which are connected with communication buses: ```mermaid graph TD; CPU* --- Memory*; CPU* --- Memory*; CPU* --- Memory*; CPU* --- Tip5*; CPU* --- Logic; CPU* --- u32_mul; CPU* --- u32_add_sub; CPU* --- Bls12; CPU* --- Keccak-f; Memory* --- Pager*; ``` \* Part of the core (non-optional) configuration Communication buses are implemented using the logarithmic derivative lookup argument, and are multiplexed for efficiency (i.e. CPU interactions with multiple chips may share the same bus). There are multiple VM configurations. The "Core" configuration is always present, and provides instructions for basic control flow and memory access. Additional configurations, such as "Field Arithmetic" or "Additional Jump" build upon the core configuration and offer additional instructions. ### Instruction format Instructions are encoded in groups of 6 field elements. The first element in the group contains the opcode, followed by three elements representing the operands and two immediate value flags: $\text{opcode}, \text{op}_a$, $\text{op}_b$, $\text{op}_c$, $\text{imm}_b$, $\text{imm}_c$. ### Program ROM Our VM operates under the Harvard architecture, where program code is stored separately from main memory. Code is addressed by any field element, starting from $0$. The program counter `pc` stores the location (a field element) of the instruction that is being executed. ### Memory Memory is comprised of word-addressable cells. A given cell contains 4 field elements, each of which are typically used to store a single byte (arbitrary field elements can also be stored). All core and ALU-related instructions operate on cells (i.e. any operand address is word aligned -- a multiple of 4). In the VM compiler, the address of newly added local variables in the stack is word aligned. For example, a U32 is represented in memory by its byte decomposition (4 elements). To initialize a U32 from an immediate value, we use the `SETL16` instruction (see the complete instruction list below), which sets the first two bytes in memory. To initialize a U32 value greater than 16 bits, we can also call the `SETH16` instruction to set the upper two bytes. ### Immediate Values Our VM cannot represent operand values that are greater than the prime $p$, and cannot distinguish between $0$ and $p$. Therefore, any immediate values greater than or equal to $p$ need to be expanded into smaller values. ### Registers Our zkVM does not operate on general purpose registers. Instead, instructions refer to variables local to the call frame, i.e. relative to the current frame pointer `fp`. ### Notation The following notation is used throughout this document: **Operand values**: `opa`, `opb`, `opc` denote the value encoded in the operand a, b, or c of the current instruction. **CPU registers**: `fp`, `pc` denote the value of the current frame pointer and program counter, respectively. **Relative addressing**: `[a]` denote the cell value at address `a` offset from `fp`, i.e. `fp + a`. Variables local to the call frame are denoted in this form. *Note that we are omitting `fp` in the expression here*, but that the first dereference of an operand is always relative to the frame pointer. **Absolute addressing:** `[[a]]` denotes the cell value at absolute address `[a]`. Heap-allocated values are denoted in this form. To refer to relative or absolute element values, we use the notation $[a]_\text{elem}$ or $[[a]]_\text{elem}$ respectively. ## Instruction list [Depending on the values of the instruction flags, operands $a$ or $b$ may be either an immediate value or memory address. In the table below we assume that these operands represent addresses. Memory values are retrieved by indexing the array $\text{mem}$.]:# Each instruction contains 5 field element operands, $a, b, c, d, e$. Often, $d$ and $e$ are binary flags indicating whther operands $a$ and $b$ are immediate values or relative offets. Listed below are the instructions offered in each configuration. ### Core | Mnemonic | <div style="width:140px">Operands (asm)</div> | Description | | ----------- | --- | ----- | | **LW** / **LOAD32** | `a(fp), c(fp)` | Follow the pointer stored at offset $c$ from the current frame pointer and write the next 4 byte values to those beginning at offset a. Operand $b$ is unused, but is constrained to $[c]$ in the trace. LOAD32 is used to load 4 bytes from the heap, and is aligned (i.e. the address at offset $c$ is assumed to be a multiple of $4$). | | **SW** / **STORE32** | `b(fp), c(fp)` | Write the 4 byte values beginning at the address stored at offset $c$ to those beginning at offset $b$. Operand $a$ is unused, but is constrained to $[c]$ in the trace. STORE32 is used to write 4 bytes to the heap, and is aligned. | | **JAL** | `a(fp), b, c` | Jump to address and link: Store the $pc+1$ to local stack variable at offset $a$, then set $pc$ to field element $b$. Set $fp$ to $fp + c$. | | **JALV** | `a(fp), b(fp), c(fp)` | Jump to variable and link: Store the $pc+1$ to local stack variable at offset $a$, then set $pc$ to the field element $[b]_{elem}$. Set $fp$ to $fp$ + $[c]$. | | **BEQ** | `a, b(fp), c(fp)` | If $[b] = [c]$, then set the program counter $\text{pc}$ to $a$ | | **BEQI** | `a, b(fp), c` | If $[b] = c$, then set the program counter $\text{pc}$ to $a$ | | **BNE** | `a, b(fp), c(fp)` | If $[b] \neq [c]$, then set the program counter $\text{pc}$ to $a$ | | **BNEI** | `a, b(fp), c` | If $[b] \neq c$, then set the program counter $\text{pc}$ to $a$ | | **IMM32** | `a, b, c, d, e` | Write the immediate values $b, c, d, e$ to the cell located at offset $a$.| ### Field arithmetic | Mnemonic | <div style="width:170px">Operands (asm)</div> | Description | | ----------- | --- | ------------------------------------------------------------ | | **FEADD** | `a(fp), b(fp), c(fp)`| $d$ and $e$ are a flags denoting whether $a$ and $b$ are interpreted as an immediate or offset. Let $A = a$ if $d = 1$ and $[a]_{elem}$ otherwise. Let $B = b$ if $e = 1$ and $[b]_{elem}$ otherwise. The instruction compute $A + B$, and write the result to offset $c$ | | **FEMUL** | `a(fp), b(fp), c(fp)`| $d$ and $e$ are a flags denoting whether $a$ and $b$ are interpreted as an immediate or offset. Let $A = a$ if $d = 1$ and $[a]_{elem}$ otherwise. Let $B = b$ if $e = 1$ and $[b]_{elem}$ otherwise. The instruction compute $A \cdot B$, and write the result to offset $c$ | | **TO_FE** | `a(fp), b(fp)` | Convert an U32, represented by 4 field elements starting at offset b, to a field element stored to the first field element at offset $a$. $a$ is assuemd to be a multiple of $4$. | | **FROM_FE** | `a(fp), b(fp)` | Convert a field element $[b]_{elem}$ to an U32 stored at offset a, which is assumed to be a multiple of $4$. | Note that field arithmetic instructions only operate on the first element in a cell, which represents a field element instead of a single byte. ### U32 Arithmetic | Mnemonic | <div style="width:170px">Operands (asm)</div> | Description | | ----------- | ------------- | ----------------------------------------------------------------------------------- | | **ADD** | `a(fp), b(fp), c(fp)` | Compute the unchecked addition of the U32 values at cell offsets $b$ and $c$ and write the sum to cell offset $a$. Note that because a full 32-bit value does not fit within one field element, we assume that values have been decomposed into 4 8-byte elements. The summed output is stored at cell offset $a$. The same limb decomposition is used for the other U32 operations listed below. | | **ADDI** | `a(fp), b(fp), c` | Compute the unchecked addition of the U32 variable at cell offsets $b$ and an immediate $c$, and write the sum to cell offset $a$.| | **SUB** | `a(fp), b(fp), c(fp)` | Unchecked subtraction | | **SUBI** | `a(fp), b(fp), c` | Unchecked subtraction | | **MUL** | `a(fp), b(fp), c(fp)` | Unchecked multiplication | | **MULI** | `a(fp), b(fp), c` | Unchecked subtraction | | **DIV** | `a(fp), b(fp), c(fp)` | Division | | **SH{L,R}** | `a(fp), b(fp), c(fp)` | Shift [b] left/right by [c] and write to offset $a$. | | **SH{L,R}I** | `a(fp), b(fp), c` | Shift [b] left/right by c and write to offset $a$. | | **ISH{L,R}** | `a(fp), b, c(fp)` | Shift b left/right by [c] and write to offset $a$. | | **LT** | `a(fp), b(fp), c(fp)` | Set local variable $a$ to $1$ if $[b] < [c]$ and $0$ otherwise. | | **LTI** | `a(fp), b(fp), c` | Set local variable $a$ to $1$ if $[b] < c$ and $0$ otherwise. |   ### Bitwise | Mnemonic | Operands (asm) | Description | | ----------- | - | ----------------------------------------------------------------------------------- | | **AND** | `a(fp), b(fp), c(fp)` | Set $[a]$ to $[b]$ bitwise-and $[c]$ | | **OR** | `a(fp), b(fp), c(fp)` | Set $[a]$ to $[b]$ bitwise-or $[c]$ | | **XOR** | `a(fp), b(fp), c(fp)` | Set $[a]$ to $[b]$ bitwise-xor $[c]$ |  ### Byte Manipulation **Note**: These will not be supported in the initial version. | Instruction | <div style="width:130px">Operands (asm)</div> | | | ----------- | ------------- | -| | **LOAD8** | `a(fp), b(fp)` | Load a byte at the address specified by local variable at offset $b$ to local variable at offset $a$.| | **STORE8** | `b(fp), c(fp)` | Store a byte encoded at offset $c$ to the address encoded in offset $b$| | **STORE8I** | `b(fp), c` | Store a byte encoded in the field element $c$ to the address encoded in offset $b$| ## Heap allocation Notes: - Fixed configurable stack size (e.g. 8MB), growing in opposite direction of the heap. - Allocate-only malloc (no de-allocation using `free`) ## Assembly ### Instructions  We will closely follow RISC-V assembly, making modifications as necessary. The most important difference between our zkVM assembly and RV32IM is that instead of registers `x0-31`, we only have two special-purpose registers `fp` and `pc`. However, we have (up to $2^{31}-1$) local variables, addressed relative to the current frame point `fp`. ### Calling convention / stack frame | Stack (grows downwards, i.e. address decreases from top row to bottom row) | | -------- | | Arg 2 | | Arg 1 | | Return FP | | Return value | | Return address **(<- Current fp)** | | Local 1 | | Local 2 | | ... | | Local N | We follow the RISC-V convention and grow the stack downwards. For a function call, the arguments are pushed onto the stack in reverse order. **We only allow statically sized allocation on the stack**, unlike traditional architectures where `alloca` can be used to allocate dynamically. All dynamic allocation will be compiled to heap allocations. Instead of using a frame pointer that points at the begining of the frame, we use a stack pointer which points at the first free stack cell. Note that: * Functions arguments are stored at fp + 12, fp + 16, ... * Return FP (the value of FP before the call) is stored at fp+8 * Return value is stored at fp + 4 * Return address is stored at fp * Local variables are stored at fp - 4, fp - 8, ... ### Pseudo instructions | Pseudo Instruction | Instruction | | -------- | -------- | | `call label` | `imm32 (-b+8)(fp), 0, 0, 0, -b(fp)`; `jal -b(fp), label, -b(fp)`, where b is the size of the current stack frame plus the call frame size for instantiating a call to label | | `ret` | `jalv -4(fp), 0(fp), 8(fp)`| ### Implementing MEMCPY/SET/MOVE Memcpy will require roughly 2 cycles per word. We can follow [this memcpy implementation on RISC-V](https://github.com/westerndigitalcorporation/RISC-V-Linux/blob/master/linux/arch/riscv/lib/memcpy.S). ### Example program ```assembly= ; fib.s ... fib: ; @fib ; %bb.0: sw -4(fp), 12(fp) imm32 -8(fp), 0, 0, 0, 0 imm32 -12(fp), 0, 0, 0, 1 imm32 -16(fp), 0, 0, 0, 0 beq .LBB0_1, 0(fp), 0(fp) .LBB0_1: bne .LBB0_2, -16(fp), -4(fp) beq .LBB0_4, 0(fp), 0(fp) ; %bb.2: add -20(fp), -8(fp), -12(fp) sw -8(fp), -12(fp) sw -12(fp), -20(fp) beq .LBB0_3, 0(fp), 0(fp) ; %bb.3: addi -16(fp), -16(fp), 1 beq .LBB0_1, 0(fp), 0(fp) .LBB0_4: sw 4(fp), -8(fp) jalv -4(fp), 0(fp), 8(fp) ... main: ; @main ; %bb.0: imm32 -4(fp), 0, 0, 0, 0 imm32 -8(fp), 0, 0, 0, 10 sw -16(fp), -8(fp) imm32 -20(fp), 0, 0, 0, 28 jal -28(fp), fib, -28 sw -12(fp), -24(fp) sw 4(fp), -12(fp) jalv -4(fp), 0(fp), 8(fp) ... ``` The (partial) stack at the time of executing the first instruction (`sw`) inside `fib` after the call from `main` (line 6 above) looks like: | Stack | Description | | -------- | ---- | | 10 | Argument 1 | | 28 | Return frame pointer | | ... | Return value | | main:6 | Return address | | ... | Local variable 1 (unset) | | ... | Local variable 2 (unset) |  ## Trace ### Main CPU | Columns | Configuration | Description | | --------------- | --- | -------------------------------------------------------------------------------------- | | $\text{clk}$ | Core | Clock cycle | | $\text{pc}$ | Core | Program counter | | $\text{fp}$ | Core | Frame pointer | | $\text{opcode}$ | Core | Instruction opcode | | $\text{op}_a$ | Core | Operand $a$ | | $\text{op}_b$ | Core | Operand $b$ | | $\text{op}_c$ | Core | Operand $c$ | | $\text{op}_d$ | Core | Operand $d$, flag for if $\text{op}_a$ is immediate or offset.| | $\text{op}_e$ | Core | Operand $e$, flag for if $\text{op}_b$ is immediate or offset.| | $\text{addr}_a$ | Core | $\text{fp} + \text{op}_a$ (if $\text{op}_d$ is not set) | | $\text{addr}_b$ | Core | $\text{fp} + \text{op}_b$ (if $\text{op}_e$ is not set) | | $\text{addr}_c$ | Core | $\text{fp} + \text{op}_c$ | Columns $\text{opcode}, \text{op}_a, \text{op}_b, \text{op}_c, \text{op}_d, \text{op}_e$ are specified by the program code (see the "Instruction Trace" section below). Trace cells are also allocated to hold buffered read memory values for $\text{addr}_a$ and $\text{addr}_b$, and buffered write values for $\text{addr}_c$. We read and write 4 elements from memory at a time to the main trace. These elements are only constrained when the immediate value flags are not set (see the "Instruction Decoding" section below): | Cell | Configuration | Description | | --------------- | --- | -------------------------------------------------------------------------------------- | | $v_{a,0}$ | Core | $[\text{addr}_a]_\text{elem}$| | $v_{a,1}$ | Core | $[\text{addr}_a+1]_\text{elem}$| | $v_{a,2}$ | Core | $[\text{addr}_a+2]_\text{elem}$| | $v_{a,3}$ | Core | $[\text{addr}_a+3]_\text{elem}$| | $v_{b,0}$ | Core | $[\text{addr}_b]_\text{elem}$| | $v_{b,1}$ | Core | $[\text{addr}_b+1]_\text{elem}$| | $v_{b,2}$ | Core | $[\text{addr}_b+2]_\text{elem}$| | $v_{b,3}$ | Core | $[\text{addr}_b+3]_\text{elem}$| | $v_{c,0}$ | Core | Value written to $\text{addr}_c$ | | $v_{c,1}$ | Core | Value written to $\text{addr}_c+1$ | | $v_{c,2}$ | Core | Value written to $\text{addr}_c+2$ | | $v_{c,3}$ | Core | Value written to $\text{addr}_c+3$ | ### Memory | Cell | Description | | ------------------------ | ----------- | | $\text{addr}$ | Address | | $\text{clk}$ | Clock cycle | | $\text{val}_0$ | Value 0 | | $\text{val}_1$ | Value 1 | | $\text{val}_2$ | Value 2 | | $\text{val}_3$ | Value 3 | | $d_0$ | Lower 16 bits of $\text{clk}'-\text{clk}-1$ | | $d_1$ | Upper 16 bits of $\text{clk}'-\text{clk}-1$ | | $t$ | Nondeterministic inverse of $\text{addr}'-\text{addr}$ | The memory table is sorted by ($\text{addr}, \text{clk}$)  ### Instruction decoding Trace cells are also allocated for each selector. In each cycle, main CPU opcodes are decoded into binary selector flags, or to a single `is_bus_opcode` flag in the case that the opcode is processed by a different chip. ### Instruction Trace Each instruction is encoded as 6 field elements #### Core | Mnemonic | Operands (asm) | Encoded | | ----------- | --- | ---------------------------------------------------- | | **LW** / **LOAD32** | `a(fp), c(fp)` | $\text{OP}_{LW}$, $a$, _, $c$, $0$, $0$ | Follow the pointer stored at offset $c$ from the current frame pointer and write the next 4 byte values to those beginning at offset a. Operand $b$ is unused, but is constrained to $[c]$ in the trace. LOAD32 is used to load 4 bytes from the heap, and is aligned (i.e. the address at offset $c$ is assumed to be a multiple of $4$). | | **SW** / **STORE32** | `b(fp), c(fp)` | $\text{OP}_{SW}$, _, $b$, $c$, $0$, $0$ | Write the 4 byte values beginning at the address stored at offset $c$ to those beginning at offset $b$. Operand $a$ is unused, but is constrained to $[c]$ in the trace. STORE32 is used to write 4 bytes to the heap, and is aligned. | | **JAL** | `a(fp), b, c` | $\text{OP}_{jump}$, $a$, $b$, $c$, $1$, $1$| Jump to address and link: Store the $pc+1$ to local stack variable at offset $a$, then set $pc$ to field element $b$. Set $fp$ to $fp + c$. | | **JALV** | `a(fp), b(fp), c` | $\text{OP}_{jump}$, $a$, $b$, $c$, $0$, $1$| Jump to variable and link: Store the $pc+1$ to local stack variable at offset $a$, then set $pc$ to the field element $[b]_{elem}$. Set $fp$ to $fp + c$. | | **BEQ** | `a, b(fp), c(fp)` | $\text{OP}_{branch}$, $a$, $b$, $c$, $0$, $0$ | If $[b] = [c]$, then set the program counter $\text{pc}$ to $a$ | | **BEQ** | `a, b(fp), c` | $\text{OP}_{branch}$, $a$, $b$, $c$, $0$, $1$ | If $[b] = c$, then set the program counter $\text{pc}$ to $a$ | | **BNE** | `a, b(fp), c(fp)` | $\text{OP}_{branch}$, $a$, $b$, $c$, $1$, $0$ | If $[b] \neq [c]$, then set the program counter $\text{pc}$ to $a$ | | **BNEI** | `a, b(fp), c` | $\text{OP}_{branch}$, $a$, $b$, $c$, $1$, $1$ | If $[b] \neq c$, then set the program counter $\text{pc}$ to $a$ | | **IMM32** | `a, b, c, d, e` | $\text{OP}_{IMM}$, $a$, $b$, $c$, $d$, $e$ | Write the immediate values $b, c, d, e$ to the cell located at offset $a$.| #### Field arithmetic | Mnemonic | Operands (asm) | Encoded | | ----------- | --- | ------------- | | **FEADD** | `a(fp), b(fp), c(fp)`| $\text{OP}_{FEADD}$, $a$, $b$, $c$, $d$, $e$ | $d$ and $e$ are a flags denoting whether $a$ and $b$ are interpreted as an immediate or offset. Let $A = a$ if $d = 1$ and $[a]_{elem}$ otherwise. Let $B = b$ if $e = 1$ and $[b]_{elem}$ otherwise. The instruction compute $A + B$, and write the result to offset $c$ | | **FEMUL** | `a(fp), b(fp), c(fp)`| $\text{OP}_{FEMUL}$, $a$, $b$, $c$, $d$, $e$ | $d$ and $e$ are a flags denoting whether $a$ and $b$ are interpreted as an immediate or offset. Let $A = a$ if $d = 1$ and $[a]_{elem}$ otherwise. Let $B = b$ if $e = 1$ and $[b]_{elem}$ otherwise. The instruction compute $A \cdot B$, and write the result to offset $c$ | | **TO_FE** | `a(fp), b(fp)` | | Convert an U32, represented by 4 field elements starting at offset b, to a field element stored to the first field element at offset $a$. $a$ is assuemd to be a multiple of $4$. | | **FROM_FE** | `a(fp), b(fp)` | | Convert a field element $[b]_{elem}$ to an U32 stored at offset a, which is assumed to be a multiple of $4$. | #### U32 Arithmetic | Mnemonic | Operands (asm) | Encoded | | ----------- | ------------- | --| | **ADD** | `a(fp), b(fp), c(fp)` | $\text{OP}_{ADD}$, $a$, $b$, $c$, $0$, $0$ | Compute the unchecked addition of the U32 values at cell offsets $b$ and $c$ and write the sum to cell offset $a$. Note that because a full 32-bit value does not fit within one field element, we assume that values have been decomposed into 4 8-byte elements. The summed output is stored at cell offset $a$. The same limb decomposition is used for the other U32 operations listed below. | | **ADDI** | `a(fp), b(fp), c` | $\text{OP}_{ADD}$, $a$, $b$, $c_0$, $1$, $c_1$ | Compute the unchecked addition of the U32 variable at cell offsets $b$ and an immediate $c$, and write the sum to cell offset $a$.| | **SUB** | `a(fp), b(fp), c(fp)` | $\text{OP}_{SUB}$, $a$, $b$, $c$, $0$, $0$ | Unchecked subtraction | | **SUBI** | `a(fp), b(fp), c` | $\text{OP}_{SUB}$, $a$, $b$, $c_1$, $1$, $c_2$ | Unchecked subtraction | | **MUL** | `a(fp), b(fp), c(fp)` | $\text{OP}_{MUL}$, $a$, $b$, $c$, $0$, $0$| Unchecked multiplication | | **MULI** | `a(fp), b(fp), c` | $\text{OP}_{MUL}$, $a$, $b$, $c_1$, $1$, $c_2$ | Unchecked subtraction | | **DIV** | `a(fp), b(fp), c(fp)` | $\text{OP}_{MUL}$, $a$, $b$, $c$, $0$, $0$| Division | | **SH{L,R}** | `a(fp), b(fp), c(fp)` | $\text{OP}_{SHIFT}$, $a$, $b$, $c$, $0$, $0$| Shift [b] left/right by [c] and write to offset $a$. | | **SH{L,R}I** | `a(fp), b(fp), c` | $\text{OP}_{SHIFT}$, $a$, $b$, $c$, $0$, $1$| Shift [b] left/right by c and write to offset $a$. | | **ISH{L,R}** | `a(fp), b, c(fp)` | $\text{OP}_{SHIFT}$, $a$, $b$, $c$, $0$, $1$| Shift b left/right by [c] and write to offset $a$. | | **SLT** | `a(fp), b(fp), c(fp)` | $\text{OP}_{SLT}$, $a$, $b$, $c$, $0$, $0$| Set local variable $a$ to $1$ if $[b] < [c]$ and $0$ otherwise. | | **SLT** | `a(fp), b(fp), c` | $\text{OP}_{SLT}$, $a$, $b$, $c$, $0$, $1$| Set local variable $a$ to $1$ if $[b] < c$ and $0$ otherwise. |   ### Bitwise | Mnemonic | Operands (asm) | Encoded | | ----------- | - | ------------- | | **AND** | `a(fp), b(fp), c(fp)` | $\text{OP}_{AND}$, $a$, $b$, $c$, $0$, _| Set $[a]$ to $[b]$ bitwise-and $[c]$ | | **OR** | `a(fp), b(fp), c(fp)` | $\text{OP}_{OR}$, $a$, $b$, $c$, $0$, _| Set $[a]$ to $[b]$ bitwise-or $[c]$ | | **XOR** | `a(fp), b(fp), c(fp)` | $\text{OP}_{XOR}$, $a$, $b$, $c$, $0$, _| Set $[a]$ to $[b]$ bitwise-xor $[c]$ |    ## Design notes  ### Frontend target We are writing a compiler from LLVM IR to our ISA ### ZK stack This is a STARK-based zkVM. We are using [Plonky3](https://github.com/Plonky3/Plonky3) to implement the polynomial IOP and PCS.  ### Field choice We plan to use the 32-bit field defined by `p = 2^31 - 1`, which should give very good performance on GPUs or with most vector instruction sets.  ### Registers Our VM has no general purpose registers, since memory is cheap.  ### Memory We will use a conventional R/W memory. ### Tables The CPU can do up to three memory operations per cycle, to support binary operations involving two reads and one write. If we used a single-trace model, we could support this by adding columns for 6 memory operations in each row of our trace: 3 for the chronological memory log and 3 for the `(address, timestamp)` ordered memory log. Instead, we make the memory a separate table (i.e. a separate STARK which gets connected with a permutation argument). We also use multi-table support to implement other coprocessors that are wasteful to include in the main CPU, as their operations may not be used during most cycles (e.g. Keccak).  ### Continuations TODO: Explain the permutation-based continuation implementation. ### Lookups Initially, we will support lookups only against prover-supplied tables. The main use case is range checks. To perform a 16-bit range check, for example, we would have the prover send a table containing `[0 .. 2^16 - 1]` in order. (If the trace was not already 2^16, we would pad it. If it was longer than 2^16, the prover would include some duplicates.) We would then use constraints to enforce that this table starts at 0, ends at 2^16 - 1, and increments by 0 or 1. Preprocessed tables can also be useful, particularly for bitwise operations like xor. However, we will not support them initially because they require non-succinct preprocessing. ### Floating point arithmetic Fast floating point arithmetic doesn't seem important for our anticipated use cases, so we will convert float operations to integer ones during compilation.

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.