<style>
.narrow-note th:last-child { width: 35% }
</style>
## Update
> **Update 07/07**
> We discussed how bus mapping could be designed to be friendlier to the evm circuit, and came up with several approaches described [here](https://hackmd.io/muKcvXHnREqPh46X1HNKMA). Since we have not explored the evm circuit much, we don't know which one is better, so we will stick to the original, simplest way to do bus mapping and optimize for the evm circuit later if needed.
> **Update 07/26**
> After some discussion about how to handle misaligned memory access, we decided to treat memory as a byte array from the beginning. So any memory operation below should expand to 32 byte-by-byte items (which could be a lot of `0` reads/writes).
> From the perspective of constraints, the memory circuit differs little from the stack circuit. The main difference is that the memory circuit doesn't have the 1024 size constraint, so the prover can expand memory to whatever size they want as long as the remaining gas allows it.
> **Update 08/05**
> The `compress` for bus mapping seems to be unnecessary because [`halo2` does it for us](https://github.com/zcash/halo2/blob/27c4187673a9c6ade13fbdbd4f20955530c22d7f/src/plonk/lookup/prover.rs#L97). The state circuit and evm circuit still share the bus mapping table for vector lookup, and we don't need to worry about the compression; we just need to make sure the table-compressing randomness for the vector lookup is equal between the two proofs (we could just share the transcript). So the `compress` inside `bus_mapping_lookup` in this note will be removed, because we don't actually do it ourselves.
> Note that for [word encoding](/QUiYS3MnTu29s62yg9EtLQ), we still need to do the random linear combination ourselves; we call it `encode` from now on to avoid ambiguity.
## Introduction
The state circuit serves as a randomly accessible data holder for the stack, memory, storage, and everything else the evm interpreter could access at any time.
This note focuses on **stack** and **memory** first. It goes through the memory and stack sub-circuits, then explains how these sub-circuits provide the valid access records, the **bus mapping**, for the evm circuit to read and write.
I take this naive but concrete Solidity code as an example:
```solidity
pragma solidity ^0.8;

contract Sample {
    function memory_sample() public pure {
        assembly {
            let ptr := mload(0x40)
            mstore(ptr, 0xdeadbeef)
            mstore(ptr, add(mload(ptr), 0xfaceb00c))
            mstore(add(ptr, 0x20), 0xcafeb0ba)
        }
    }
}
```
When function `memory_sample` is executed, the evm should produce a log like this (this note only focuses on the function body):
```
pc  op              stack (top -> down)            memory
--  --------------  -----------------------------  -------------------------------------
...
53  JUMPDEST        [  ,         ,          ,   ]  {40: 80, 80:          , a0:         }
54  PUSH1 40        [  ,         ,          , 40]  {40: 80, 80:          , a0:         }
56  MLOAD           [  ,         ,          , 80]  {40: 80, 80:          , a0:         }
57  PUSH4 deadbeef  [  ,         ,  deadbeef, 80]  {40: 80, 80:          , a0:         }
62  DUP2            [  ,       80,  deadbeef, 80]  {40: 80, 80:          , a0:         }
63  MSTORE          [  ,         ,          , 80]  {40: 80, 80:  deadbeef, a0:         }
64  PUSH4 faceb00c  [  ,         ,  faceb00c, 80]  {40: 80, 80:  deadbeef, a0:         }
69  DUP2            [  ,       80,  faceb00c, 80]  {40: 80, 80:  deadbeef, a0:         }
70  MLOAD           [  , deadbeef,  faceb00c, 80]  {40: 80, 80:  deadbeef, a0:         }
71  ADD             [  ,         , 1d97c6efb, 80]  {40: 80, 80:  deadbeef, a0:         }
72  DUP2            [  ,       80, 1d97c6efb, 80]  {40: 80, 80:  deadbeef, a0:         }
73  MSTORE          [  ,         ,          , 80]  {40: 80, 80: 1d97c6efb, a0:         }
74  PUSH4 cafeb0ba  [  ,         ,  cafeb0ba, 80]  {40: 80, 80: 1d97c6efb, a0:         }
79  PUSH1 20        [  ,       20,  cafeb0ba, 80]  {40: 80, 80: 1d97c6efb, a0:         }
81  DUP3            [80,       20,  cafeb0ba, 80]  {40: 80, 80: 1d97c6efb, a0:         }
82  ADD             [  ,       a0,  cafeb0ba, 80]  {40: 80, 80: 1d97c6efb, a0:         }
83  MSTORE          [  ,         ,          , 80]  {40: 80, 80: 1d97c6efb, a0: cafeb0ba}
84  POP             [  ,         ,          ,   ]  {40: 80, 80: 1d97c6efb, a0: cafeb0ba}
...
```
## Definition
- `fq` - 253-bit value.
- `op` - A byte representing EVM operation code
- `code` - A vector of `op` compiled from smart contract
- `pc` - Program counter
- `gc` - Global counter, shifted to start from `0` in this note for simplicity
- `sp` - Stack pointer
- `stack` - A vector of `fq` with max size `1024`
- `memory` - A vector of bytes
- `$a === $b` - `$a` is equal to `$b`
- `$t_lookup` - A function that ensures the input is in table `$t`
## Memory Circuit
In the memory circuit, the prover should collect all `MLOAD` and `MSTORE` operations, order them by `key` and then by `gc`, and build a layout with columns `key`, `val`, `rw` and `gc`, which stand for:
- `key` - `key` of `memory` we are operating on
- `val` - `memory[key]` after the operation
- `rw` - access enum, could be
  - `0` - `Read`
  - `1` - `Write`
- `gc` - global counter at which the operation happens
With some auxiliary notation:
- `key_prev` - `key` value in previous row
- `gc_prev` - `gc` value in previous row
- `val_prev` - `val` value in previous row
The constraints will be:
<div class="narrow-note">
| Condition | Constraint | Note |
| ------------------- | ------------------------------------------ | -------------------------------------------- |
| INIT | `rw === Write` <br> and `val === 0` | First row of circuit (does not query `prev`) |
| `*` | `rw ∈ [0, 1]` | Should be valid `rw` |
| `*` | `key - key_prev ∈ [0, ?]` | Should be non-strict monotonic |
| `*` | `val ∈ [0, 255]` | Should be a byte value |
| `key != key_prev` | `rw === Write` <br> and `val === 0` | Should be initialized to `0` |
| `key == key_prev` | `gc > gc_prev` | Should be strict monotonic |
| `rw == Read` | `val === val_prev` | Should be previous written value |
</div>
For the Solidity sample above, the memory table should look like:
| `key` | `val` | `rw` | `gc` | Note |
|:------:| ------------- | ------- | ---- | ---------------------------------------- |
| `0x40` | `0` | `Write` | | Init |
| `0x40` | `0x80` | `Write` | ? | Assume written at the beginning of `code` |
| `0x40` | `0x80` | `Read` | 4 | `56 MLOAD` |
| - | | | | |
| `0x80` | `0` | `Write` | | Init |
| `0x80` | `0xdeadbeef` | `Write` | 10 | `63 MSTORE` |
| `0x80` | `0xdeadbeef` | `Read` | 16 | `70 MLOAD` |
| `0x80` | `0x1d97c6efb` | `Write` | 24 | `73 MSTORE` |
| - | | | | |
| `0xa0` | `0` | `Write` | | Init |
| `0xa0` | `0xcafeb0ba` | `Write` | 34 | `83 MSTORE` |
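
The constraint table of this section can be read as a row-by-row check over the sorted table. The following is a minimal Python sketch of that reading, not circuit code. The names `check_memory_rows` and `max_key_gap` (standing in for the undecided `?` bound), the use of `gc = 0` for init rows, and the byte-level sample rows are all assumptions made for illustration.

```python
READ, WRITE = 0, 1


def check_memory_rows(rows, max_key_gap):
    """Return a list of (row_index, message) for every violated constraint.

    rows: (key, val, rw, gc) tuples, already sorted by key and then by gc.
    max_key_gap: hypothetical stand-in for the undecided `?` upper bound.
    """
    errors = []
    for i, (key, val, rw, gc) in enumerate(rows):
        if rw not in (READ, WRITE):
            errors.append((i, "rw must be 0 or 1"))
        if not 0 <= val <= 255:
            errors.append((i, "val must be a byte"))
        if i == 0:
            # INIT: the first row has no previous row to query.
            if rw != WRITE or val != 0:
                errors.append((i, "first row must write 0"))
            continue
        key_prev, val_prev, _, gc_prev = rows[i - 1]
        if not 0 <= key - key_prev <= max_key_gap:
            errors.append((i, "key must be non-strictly increasing"))
        if key != key_prev:
            if rw != WRITE or val != 0:
                errors.append((i, "new key must be initialized to 0"))
        elif gc <= gc_prev:
            errors.append((i, "gc must be strictly increasing within a key"))
        if rw == READ and val != val_prev:
            errors.append((i, "read must return the previous value"))
    return errors


# Byte-level rows in the spirit of the 07/26 update (values are made up).
GOOD_ROWS = [
    (0x40, 0x00, WRITE, 0),
    (0x40, 0x80, WRITE, 1),
    (0x40, 0x80, READ, 4),
    (0x41, 0x00, WRITE, 0),
    (0x41, 0xEF, WRITE, 10),
    (0x41, 0xEF, READ, 16),
]
```

Calling `check_memory_rows(GOOD_ROWS, 1 << 22)` returns an empty list, while changing the value of a `Read` row or reusing a `gc` within one `key` yields the corresponding violation.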
## Stack Circuit
The stack circuit is like the memory circuit, but to avoid modifying every entry on each `PUSH` or `POP`, we let the evm circuit maintain a stack pointer `sp`, initialized to `1024`, to look up the top value of the stack. For example, `PUSH` does `sp--` and `POP` does `sp++`.
The constraints will be:
<div class="narrow-note">
| Condition | Constraint | Note |
| ----------------- | ----------------------------------- | -------------------------------------------- |
| INIT | `rw === Write` <br> and `val === 0` | First row of circuit (does not query `prev`) |
| `*` | `rw ∈ [0, 1]` | Should be valid `rw` |
| `*` | `key - key_prev ∈ [0, 1023]` | Should be non-strict monotonic |
| `*` | `key ∈ [0, 1023]` | Should be in range |
| `key != key_prev` | `rw === Write` <br> and `val === 0` | Should be initialized to `0` |
| `key == key_prev` | `gc > gc_prev` | Should be strict monotonic |
| `rw == Read` | `val === val_prev` | Should be previous written value |
</div>
Some `op`s are accompanied by multiple stack reads/writes at the same time. Taking `DUPX` as an example, we have to check that the source value and the newly pushed value are equal, so it requires both a `Read` and a `Write` to be in the bus mapping.
We let the evm circuit use multiple lookups to ensure these reads/writes happen at the same time (memory as well if needed), so we could have these constraints (`$x` denotes a variable, which should be equal within the same `op`):
<div class="narrow-note">
| Condition | Constraint | Note |
| ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `op == PUSH` | `bus_mapping_lookup(` <br> ` gc,` <br> ` Stack,` <br> ` key,` <br> ` val,` <br> ` Write` <br> `)` | Should exist in bus mapping |
| `op == DUPX` | `bus_mapping_lookup(` <br> ` gc,` <br> ` Stack,` <br> ` key+X,` <br> ` val,` <br> ` Read` <br> `)` and <br> `bus_mapping_lookup(` <br> ` gc+1,` <br> ` Stack,` <br> ` key,` <br> ` val,` <br> ` Write` <br> `)` | Both source and new written values should exist in bus mapping. |
| `op == MLOAD` | `bus_mapping_lookup(` <br> ` gc,` <br> ` Stack,` <br> ` key,` <br> ` $key,` <br> ` Read` <br> `)` and <br> `bus_mapping_lookup(` <br> ` gc+1,` <br> ` Stack,` <br> ` key,` <br> ` $val,` <br> ` Write` <br> `)` and <br> `bus_mapping_lookup(` <br> ` gc+2,` <br> ` Memory,` <br> ` $key,` <br> ` $val,` <br> ` Read` <br> `)` | Memory loading `$key` stack `Read`, `$val` stack `Write`, and memory `Read` should exist in bus mapping. |
| `op == MSTORE` | `bus_mapping_lookup(` <br> ` gc,` <br> ` Stack,` <br> ` key,` <br> ` $key,` <br> ` Read` <br> `)` and <br> `bus_mapping_lookup(` <br> ` gc+1,` <br> ` Stack,` <br> ` key+1,` <br> ` $val,` <br> ` Read` <br> `)` and <br> `bus_mapping_lookup(` <br> ` gc+2,` <br> ` Memory,` <br> ` $key,` <br> ` $val,` <br> ` Write` <br> `)` | Memory storing `$key` `$val` stack `Read`, and memory `Write` should exist in bus mapping. |
| `op == SWAPX` | See ↓ | |
> [name=hshen@scroll] Just want to help add the missing piece for the SWAPX. We need 4 bus mapping lookup for 2 reads and 2 writes as follows. Please help check the correctness of the constraints:
> `bus_mapping_lookup(gc, Stack, key+X, $val1, Read)`
> `bus_mapping_lookup(gc+1, Stack, key, $val2, Read)`
> `bus_mapping_lookup(gc+2, Stack, key+X, $val2, Write)`
> `bus_mapping_lookup(gc+3, Stack, key, $val1, Write)`
> Note: 2 reads at `key` and `key+X` and 2 writes with swapped values at `key` and `key+X` in the stack should exist in bus mapping.
</div>
For the Solidity sample above, the stack table should look like:
| `key` | `val` | `rw` | `gc` | Note |
|:------:| ------------- | ------- | ---- | --------------- |
| `1020` | `0` | `Write` | | Init |
| `1020` | `0x80` | `Write` | 28 | `81 DUP3` |
| `1020` | `0x80` | `Read` | 29 | `82 ADD.POP` |
| - | | | | |
| `1021` | `0` | `Write` | | Init |
| `1021` | `0x80` | `Write` | 7 | `62 DUP2` |
| `1021` | `0x80` | `Read` | 8 | `63 MSTORE.POP` |
| `1021` | `0x80` | `Write` | 13 | `69 DUP2` |
| `1021` | `0x80` | `Read` | 14 | `70 MLOAD.POP` |
| `1021` | `0xdeadbeef` | `Write` | 15 | `70 MLOAD.PUSH` |
| `1021` | `0xdeadbeef` | `Read` | 17 | `71 ADD.POP` |
| `1021` | `0x80` | `Write` | 21 | `72 DUP2` |
| `1021` | `0x80` | `Read` | 22 | `73 MSTORE.POP` |
| `1021` | `0x20` | `Write` | 26 | `79 PUSH1` |
| `1021` | `0x20` | `Read` | 30 | `82 ADD.POP` |
| `1021` | `0xa0` | `Write` | 31 | `82 ADD.PUSH` |
| `1021` | `0xa0` | `Read` | 32 | `83 MSTORE.POP` |
| - | | | | |
| `1022` | `0` | `Write` | | Init |
| `1022` | `0xdeadbeef` | `Write` | 5 | `57 PUSH4` |
| `1022` | `0xdeadbeef` | `Read` | 9 | `63 MSTORE.POP` |
| `1022` | `0xfaceb00c` | `Write` | 11 | `64 PUSH4` |
| `1022` | `0xfaceb00c` | `Read` | 18 | `71 ADD.POP` |
| `1022` | `0x1d97c6efb` | `Write` | 19 | `71 ADD.PUSH` |
| `1022` | `0x1d97c6efb` | `Read` | 23 | `73 MSTORE.POP` |
| `1022` | `0xcafeb0ba` | `Write` | 25 | `74 PUSH4` |
| `1022` | `0xcafeb0ba` | `Read` | 33 | `83 MSTORE.POP` |
| - | | | | |
| `1023` | `0` | `Write` | | Init |
| `1023` | `0x40` | `Write` | 1 | `54 PUSH1 40` |
| `1023` | `0x40` | `Read` | 2 | `56 MLOAD.POP` |
| `1023` | `0x80` | `Write` | 3 | `56 MLOAD.PUSH` |
| `1023` | `0x80` | `Read` | 6 | `62 DUP2` |
| `1023` | `0x80` | `Read` | 12 | `69 DUP2` |
| `1023` | `0x80` | `Read` | 20 | `72 DUP2` |
| `1023` | `0x80` | `Read` | 27 | `81 DUP3` |
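
The per-op lookups of this section can be sketched by replaying the sample with an explicit `sp` and `gc`. The Python below is only an illustration: the `Tracer` class and its method names are hypothetical, memory is kept word-granular like the sample tables (not byte-expanded), and `0x40` is assumed to already hold `0x80` before the function body runs.

```python
STACK, MEMORY = "Stack", "Memory"
READ, WRITE = "Read", "Write"


class Tracer:
    """Replays a few opcodes and records every access as a bus-mapping row."""

    def __init__(self, memory=None):
        self.gc = 0      # global counter, bumped once per record
        self.sp = 1024   # stack pointer: PUSH does sp--, POP does sp++
        self.stack = {}
        self.memory = dict(memory or {})
        self.bus = []    # rows of (gc, target, key, val, rw)

    def _record(self, target, key, val, rw):
        self.gc += 1
        self.bus.append((self.gc, target, key, val, rw))

    def _pop(self):
        val = self.stack[self.sp]
        self._record(STACK, self.sp, val, READ)
        self.sp += 1
        return val

    def push(self, val):
        self.sp -= 1
        self.stack[self.sp] = val
        self._record(STACK, self.sp, val, WRITE)

    def dup(self, x):
        src = self.sp + x - 1  # X-th item counted from the current top
        val = self.stack[src]
        self._record(STACK, src, val, READ)
        self.push(val)

    def add(self):
        a, b = self._pop(), self._pop()
        self.push((a + b) % 2**256)

    def mload(self):
        key = self._pop()
        val = self.memory.get(key, 0)
        self.push(val)
        self._record(MEMORY, key, val, READ)

    def mstore(self):
        key, val = self._pop(), self._pop()
        self.memory[key] = val
        self._record(MEMORY, key, val, WRITE)


def run_sample():
    """Replay pc 54..83 of the sample log."""
    t = Tracer(memory={0x40: 0x80})
    t.push(0x40); t.mload()
    t.push(0xDEADBEEF); t.dup(2); t.mstore()
    t.push(0xFACEB00C); t.dup(2); t.mload(); t.add(); t.dup(2); t.mstore()
    t.push(0xCAFEB0BA); t.push(0x20); t.dup(3); t.add(); t.mstore()
    return t
```

Sorting the `Stack` records of `run_sample().bus` by `key` and then `gc`, and prepending an init row per `key`, gives the rows of the stack table above.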
## Storage Circuit
Storage access is not covered by this note, see [here](/kON1GVL6QOC6t5tf_OTuKA) for more details
## Bus Mapping
The memory and stack circuits provide the valid and meaningful access records (some rows, like init, are not included) to the bus mapping lookup table, which is shared by the state circuit and the evm circuit.
Each record has a unique `gc` that serves as a synchronizing clock, a `target` that specifies where the access record resides (stack, memory, storage, ...), and as many `valX` as necessary. In the evm circuit, we look up every `gc` one by one and finally check, at the end of execution, that the degree of the bus mapping is bounded by `gc`. Then we have confidence that no malicious write has been inserted.
We have enum `target` with their `valX` representation:
- `Stack`
  - `val1` - `key`
  - `val2` - `val`
  - `val3` - `rw`
- `Memory`
  - `val1` - `key`
  - `val2` - `val`
  - `val3` - `rw`
- `Storage`
  - `val1` - `key`
  - `val2` - `val`
  - `val3` - `rw`
  - `val4` - `val_prev`
  - `val5` - `is_first_touch`
- `...`
For the Solidity sample above, the bus mapping should look like (ordered by increasing `gc`):
| `gc` | `target` | `val1` | `val2` | `val3` | Note |
| ---- | -------- | ------ | ------------- | ------- | --------------- |
| 1 | `Stack` | `1023` | `0x40` | `Write` | `54 PUSH1 40` |
| 2 | `Stack` | `1023` | `0x40` | `Read` | `56 MLOAD.POP` |
| 3 | `Stack` | `1023` | `0x80` | `Write` | `56 MLOAD.PUSH` |
| 4 | `Memory` | `0x40` | `0x80` | `Read` | `56 MLOAD` |
| 5 | `Stack` | `1022` | `0xdeadbeef` | `Write` | `57 PUSH4` |
| 6 | `Stack` | `1023` | `0x80` | `Read` | `62 DUP2` |
| 7 | `Stack` | `1021` | `0x80` | `Write` | `62 DUP2` |
| 8 | `Stack` | `1021` | `0x80` | `Read` | `63 MSTORE.POP` |
| 9 | `Stack` | `1022` | `0xdeadbeef` | `Read` | `63 MSTORE.POP` |
| 10 | `Memory` | `0x80` | `0xdeadbeef` | `Write` | `63 MSTORE` |
| 11 | `Stack` | `1022` | `0xfaceb00c` | `Write` | `64 PUSH4` |
| 12 | `Stack` | `1023` | `0x80` | `Read` | `69 DUP2` |
| 13 | `Stack` | `1021` | `0x80` | `Write` | `69 DUP2` |
| 14 | `Stack` | `1021` | `0x80` | `Read` | `70 MLOAD.POP` |
| 15 | `Stack` | `1021` | `0xdeadbeef` | `Write` | `70 MLOAD.PUSH` |
| 16 | `Memory` | `0x80` | `0xdeadbeef` | `Read` | `70 MLOAD` |
| 17 | `Stack` | `1021` | `0xdeadbeef` | `Read` | `71 ADD.POP` |
| 18 | `Stack` | `1022` | `0xfaceb00c` | `Read` | `71 ADD.POP` |
| 19 | `Stack` | `1022` | `0x1d97c6efb` | `Write` | `71 ADD.PUSH` |
| 20 | `Stack` | `1023` | `0x80` | `Read` | `72 DUP2` |
| 21 | `Stack` | `1021` | `0x80` | `Write` | `72 DUP2` |
| 22 | `Stack` | `1021` | `0x80` | `Read` | `73 MSTORE.POP` |
| 23 | `Stack` | `1022` | `0x1d97c6efb` | `Read` | `73 MSTORE.POP` |
| 24 | `Memory` | `0x80` | `0x1d97c6efb` | `Write` | `73 MSTORE` |
| 25 | `Stack` | `1022` | `0xcafeb0ba` | `Write` | `74 PUSH4` |
| 26 | `Stack` | `1021` | `0x20` | `Write` | `79 PUSH1` |
| 27 | `Stack` | `1023` | `0x80` | `Read` | `81 DUP3` |
| 28 | `Stack` | `1020` | `0x80` | `Write` | `81 DUP3` |
| 29 | `Stack` | `1020` | `0x80` | `Read` | `82 ADD.POP` |
| 30 | `Stack` | `1021` | `0x20` | `Read` | `82 ADD.POP` |
| 31 | `Stack` | `1021` | `0xa0` | `Write` | `82 ADD.PUSH` |
| 32 | `Stack` | `1021` | `0xa0` | `Read` | `83 MSTORE.POP` |
| 33 | `Stack` | `1022` | `0xcafeb0ba` | `Read` | `83 MSTORE.POP` |
| 34 | `Memory` | `0xa0` | `0xcafeb0ba` | `Write` | `83 MSTORE` |
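
Two properties of this table can be sketched in a few lines of Python. The first is one possible reading of "bounded by `gc`": every `gc` from `1` to the final one appears exactly once, so no extra record can hide in the table. The second is how the state circuit's view of one `target` is just the same records re-sorted by `key` and then `gc`. The function names are hypothetical, and only the first five rows of the table are used as data.

```python
def gc_is_complete(bus_mapping):
    """True if the gc column is exactly 1, 2, ..., len(bus_mapping)."""
    gcs = sorted(row[0] for row in bus_mapping)
    return gcs == list(range(1, len(bus_mapping) + 1))


def state_view(bus_mapping, target):
    """Rows of one target as (key, val, rw, gc), sorted by key and then gc."""
    rows = [
        (key, val, rw, gc)
        for gc, tgt, key, val, rw in bus_mapping
        if tgt == target
    ]
    return sorted(rows, key=lambda row: (row[0], row[3]))


# First five rows of the sample bus mapping.
SAMPLE_PREFIX = [
    (1, "Stack", 1023, 0x40, "Write"),
    (2, "Stack", 1023, 0x40, "Read"),
    (3, "Stack", 1023, 0x80, "Write"),
    (4, "Memory", 0x40, 0x80, "Read"),
    (5, "Stack", 1022, 0xDEADBEEF, "Write"),
]
```

With this data, `gc_is_complete(SAMPLE_PREFIX)` holds, and appending a forged record that reuses an existing `gc` makes it fail.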
## Call Context
In different EOA calls or internal calls, the stack and memory are separated by `call_context`. This is beneficial in two ways:
- The caller and callee don't need to copy the return data or call data if it is not used. When handling `CALLDATACOPY` or `RETURNDATACOPY`, we just locate the memory by the caller's or callee's `call_context` and ensure the data is indeed there.
- The callee can memorize the caller's `call_context` directly in the evm circuit, and decompress it when switching back to the caller's context.
This note only describes how the state circuit works within the same call and doesn't cover `call_context`, see [here](/0Z3N9gniRAmOg8X1RN6VAg) for more information.
## Question & Discussion
### Q1. Memory is actually a byte array
1. Memory is actually a byte array which can be accessed at any position. How do we handle operations that overlap? For example, `mstore(0x20, 0x1234)` followed by `mstore(0x19, 0x56)` gives `mload(0x20) = 0x5600000000001234`, because the `0x56` byte lands at address `0x38`.
2. Another TBD part is how large a memory address we allow to be accessed. If we check the non-strict monotonicity of the memory address by lookup, a larger memory address will require a larger table (or more cells to store chunks if using a smaller table).
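
For the first question, treating memory as a byte array (as decided in the 07/26 update) makes overlapping accesses unambiguous: every 32-byte operation expands into 32 byte-level records. A minimal Python sketch, assuming EVM big-endian word layout; the `ByteMemory` class and its `ops` log are hypothetical names used only for illustration.

```python
class ByteMemory:
    """Byte-addressed memory that logs one record per touched byte."""

    def __init__(self):
        self.bytes = {}  # address -> byte, missing addresses read as 0
        self.ops = []    # byte-level (key, val, rw) records

    def mstore(self, addr, value):
        # A 32-byte word is written big-endian, one byte per address.
        for i, b in enumerate(value.to_bytes(32, "big")):
            self.bytes[addr + i] = b
            self.ops.append((addr + i, b, "Write"))

    def mload(self, addr):
        data = bytearray()
        for i in range(32):
            b = self.bytes.get(addr + i, 0)
            self.ops.append((addr + i, b, "Read"))
            data.append(b)
        return int.from_bytes(data, "big")


def overlap_example():
    """Two overlapping stores followed by a load of the first word."""
    m = ByteMemory()
    m.mstore(0x20, 0x1234)
    m.mstore(0x19, 0x56)
    return m, m.mload(0x20)
```

In `overlap_example`, the second store zeroes addresses `0x19..0x37` and puts `0x56` at `0x38`, while `0x3e` and `0x3f` keep `0x12` and `0x34`. The three operations produce 96 byte-level records, most of them `0` reads/writes.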
For now in evm, the gas cost of memory expansion follows a [quadratic cost](https://github.com/ethereum/go-ethereum/blob/master/core/vm/gas_table.go#L27-L56), and if we assume the gas limit per block is up to $20,000,000$, we can write an inequality to find the max memory address that can actually be reached within that gas:
$$
\begin{aligned}
& 3\cdot\textsf{memSizeWord} + \frac{\textsf{memSizeWord}^2}{512} \le 20{,}000{,}000 \\
& \textsf{memSizeWord} \le 2^{16.62} \\
& \textsf{memSize} = 32 \cdot \textsf{memSizeWord} \le 2^{21.62}
\end{aligned}
$$
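
The bound can be double-checked numerically. This Python sketch assumes the cost formula `3 * words + words^2 / 512` with integer division and treats the whole `20,000,000` gas as available for memory expansion, which is an upper bound rather than a realistic budget; the function names are made up for the example.

```python
import math

GAS_LIMIT = 20_000_000


def memory_gas(words):
    """Memory expansion cost for a memory of `words` 32-byte words."""
    return 3 * words + words * words // 512


def max_words(gas_limit):
    """Largest word count whose expansion cost fits in gas_limit."""
    lo, hi = 0, 1
    while memory_gas(hi) <= gas_limit:
        hi *= 2
    # Invariant: memory_gas(lo) <= gas_limit < memory_gas(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if memory_gas(mid) <= gas_limit:
            lo = mid
        else:
            hi = mid
    return lo


def log2_bounds(gas_limit=GAS_LIMIT):
    """Return (log2 of max words, log2 of max bytes)."""
    words = max_words(gas_limit)
    return math.log2(words), math.log2(32 * words)
```

`log2_bounds()` gives values that round to `16.62` and `21.62`, matching the exponents in the inequality.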
Naively, we can have a really large table covering $0$ to $2^{22}$ and look up `key - key_prev` in it to check the non-strict monotonicity in the memory circuit. Alternatively, we can split the difference into 10-bit chunks and look up each chunk in a smaller 10-bit table.
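
The chunked alternative can be sketched as follows, assuming three 10-bit chunks (enough for 30 bits, so an extra bound on the top chunk would still be needed to get exactly $2^{22}$). The names `to_chunks` and `check_range_by_chunks` are hypothetical, and the Python `frozenset` stands in for the 10-bit lookup table.

```python
CHUNK_BITS = 10
CHUNK_TABLE = frozenset(range(1 << CHUNK_BITS))  # the small 10-bit table


def to_chunks(value, n_chunks):
    """Split a non-negative value into little-endian 10-bit chunks."""
    if value < 0 or value >> (CHUNK_BITS * n_chunks):
        raise ValueError("value does not fit in the given number of chunks")
    mask = (1 << CHUNK_BITS) - 1
    return [(value >> (CHUNK_BITS * i)) & mask for i in range(n_chunks)]


def check_range_by_chunks(value, chunks):
    """Accept only if every chunk is in the table and they recompose to value."""
    in_table = all(c in CHUNK_TABLE for c in chunks)
    recomposed = sum(c << (CHUNK_BITS * i) for i, c in enumerate(chunks))
    return in_table and recomposed == value
```

If `key < key_prev`, the difference wraps around in the field to a huge value, and no choice of three in-table chunks recomposes to it, so the check rejects it.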
### Q2. Do we put stack pointer inside stack table?
> We don't validate the stack pointer in the stack table. Instead we calculate this in the EVM and use that to check with index to read from.
> [name=barry]
### Q3. How and where do we handle `DUPX`, `SWAPX`?
How do we ensure that the swapped or duplicated value is the same as the other one?
> My thought is that in the EVM proof we check 2 reads for the values that are being swapped, and then we update both of these values.
> This makes swap a multi-slot opcode, but it keeps the bus mapping at a constant length, which will be helpful in other places.
> Curious to hear your thoughts on this tradeoff
> [name=barry]
### Q4. How to handle `MSTORE`
When `MSTORE` happens, it is accompanied by two `Read`s at different stack `key`s, and we have to combine them in the bus mapping lookup to ensure that `val` (second `Read`) is stored to `memory` at `key` (first `Read`). But in the `stack` table we sort rows by `key` and then `gc`, so the two reads are far away from each other. Need to think more on this.
> [name=barryWhiteHat] In the evm proof everything is looked up in order according to gc. So in the evm proof we will open the bus mapping at gc and see two stack ops, pop and pop, getting the index and the value. Then we will see a single memory op which writes the value to that index.
>[name=barryWhiteHat] Since the bus mapping currently only has one stack element per gc, we may have to use two gc elements: one for the stack pop of the index and another for the stack pop of the value.
>[name=barryWhiteHat] This is a difficult tradeoff to make, and it raises the question: how big should the bus mapping for stack elements be?
### Q5. What if we put more things in bus mapping (add more `valX`)?
In the above approach, the bus mapping contributed by the `stack` table uses `(key, val, rw)` as `(val1, val2, val3)`, which works for most ops. But it always costs **2 slots** for an op like `DUPX`, because in the evm circuit we need to look up a `Read` at the duplication source `(sp + X, $val, Read)` and a `Write` at the stack top `(sp, $val, Write)`.
If we can have an extra `val4` in the bus mapping, we can set the record to `(sp, $val, Write, sp + X)` and only use **1 slot** in the evm circuit (we save a `Read` check). It costs one more multiplication and addition in `compress`, but it seems not to increase the constraint degree, because the values are multiplied by constants (randomness) and added at the end.
> [name=han] Is this correct? Would it increase the constraint degree?
> [name=barryWhiteHat] My thinking here is that the DUP constraint check would have a different number of compressed elements inside it.
> [name=barryWhiteHat] I don't quite understand what you mean by increasing the degree of the constraint, because all the r's have already been raised to the powers we need and it's just a mul and add.
>
Take another example: for `MLOAD`, we can make the bus mapping `(sp, $val, Write, $key)`, where `$key` should be the previous row's `val` in the `stack` table. Then `($key, $val)` serves as `(key, val)` to look up the `memory` entry, and `MLOAD` takes **1 slot** instead of 2.
> [name=han] Does this make sense? Still stuck on how to build such an entry `(sp, $val, Write, sp + X)` into the bus mapping in the stack circuit.
> [name=barryWhiteHat] What does the bus mapping look like in this example ?
> [name=han] So take `gc = 3` in bus mapping as example, the bus mapping becomes <br> `(3, Stack, 1023, 0x80, Write, 0x40)` and `gc = 2` could be saved.
### Q6. EVM supports 256-bit values, which definitely don't fit in 253 bits
In the real circuit, all stack values should be in compressed form rather than actual values, and the actual values will not be used in the state circuit; this note uses actual values only for clarity.
Actual values will be split into an 8-bit array (byte array), and we use the same `compress` function (called `encode` since the 08/05 update) to compress them into a single `fq`.
>[name=barryWhiteHat] Yes
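
As an illustration of compressing a 256-bit word into one field element, here is a minimal Python sketch of a random linear combination over the word's bytes. The modulus `P` (the BN254 scalar field) is only an example field, since this note does not fix one; the little-endian byte order and the challenge `r` are likewise assumptions for the sketch.

```python
# BN254 scalar field modulus, used here only as an example field.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617


def encode(value, r):
    """Random linear combination of the 32 little-endian bytes of `value`.

    Computes sum(byte_i * r**i) mod P, where byte_0 is the lowest byte.
    """
    if not 0 <= value < 2**256:
        raise ValueError("value must fit in 256 bits")
    acc = 0
    # Horner's rule, starting from the highest byte.
    for byte in reversed(value.to_bytes(32, "little")):
        acc = (acc * r + byte) % P
    return acc


def encode_naive(value, r):
    """Same combination written as an explicit sum, for comparison."""
    data = value.to_bytes(32, "little")
    return sum(b * pow(r, i, P) for i, b in enumerate(data)) % P
```

Every 256-bit value, including `2**256 - 1`, maps to a single element below `P`. Two different words collide only for unlucky choices of `r`, which is why `r` has to be random and unknown to the prover when the bytes are committed.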
### Q7. How do we ensure the `val`s of two slots in an op are the same when we expect them to be?
Taking `DUPX` as an example, how do we make sure the two lookups have the same `$val` in different slots? I think copy constraints won't work because we never know which one is in which slot.
>[name=barryWhiteHat] Agree on copy constraints. In these cases we can look back at our previous constraints' wires and take the value from there. Does this make sense?
>[name=han] Sounds good. So we also have to care about the shifted access, because we have to open another shifted point when doing Kate, which increases the proof size a little bit. But if we succeed in making all opcodes single-slot, the problem is gone.
## Reference
- [ZKEVM Intro](https://hackmd.io/@liangcc/zkvmbook/https%3A%2F%2Fhackmd.io%2FHfCsKWfWRT-B_k5j3LjIVw)
- [Ethereum Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf)