Update

Update 07/07
We had some discussion about how to design the bus mapping to be friendlier to the evm circuit, which led to the several approaches described here. Since we have not explored the evm circuit much, we don't know which one is better. So we will stick to the original, simplest way to do bus mapping, then optimize for the evm circuit if needed.

Update 07/26
After some discussion about how to handle misaligned memory access, we decided to treat memory as a byte array from the beginning. So any memory operation below should expand to 32 byte-by-byte items (which could include a lot of 0 reads/writes).
From the perspective of constraints, the memory circuit differs little from the stack circuit. The main difference is that the memory circuit doesn't have the 1024 size constraint, so the prover can expand memory to any size the remaining gas allows.
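
To make the byte-by-byte expansion concrete, here is a minimal sketch of a byte-addressed memory in which every word access produces 32 byte-level records. The helper names mstore and mload and the record layout (key, val, rw) are assumptions made for illustration, not part of the circuit design.

```python
WORD = 32  # an EVM word is 32 bytes


def mstore(memory: dict, offset: int, value: int) -> list:
    """Write a word big-endian and return the 32 byte-level Write records."""
    data = value.to_bytes(WORD, "big")
    records = []
    for i, byte in enumerate(data):
        memory[offset + i] = byte
        records.append((offset + i, byte, "Write"))
    return records


def mload(memory: dict, offset: int):
    """Read 32 bytes (unwritten bytes read as 0); return the word and the Read records."""
    data = bytes(memory.get(offset + i, 0) for i in range(WORD))
    records = [(offset + i, byte, "Read") for i, byte in enumerate(data)]
    return int.from_bytes(data, "big"), records


memory = {}
mstore(memory, 0x20, 0x1234)
writes = mstore(memory, 0x1F, 0x56)  # misaligned: overlaps the previous word by 31 bytes
value, reads = mload(memory, 0x20)
zero_writes = sum(1 for _, byte, _ in writes if byte == 0)
```

In this run the second store contributes 31 zero-byte writes and a single non-zero one, which is the "a lot of 0 read/write" cost mentioned above, and the misaligned overlap needs no special handling because each byte is its own key.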

Update 08/05
The compression for bus mapping seems to be unnecessary because halo2 does it for us. The state circuit and evm circuit still share the bus mapping table for vector lookup, and we don't need to worry about the compression; we just need to make sure the table-compressing randomness for the vector lookup is equal between the two proofs (we could just share the transcript). So the compress inside bus_mapping_lookup in this note will be removed because we don't actually do it ourselves.
Note that for word encoding, we still need to do the random linear combination ourselves; we call it encode from now on to avoid ambiguity.

Introduction

The state circuit serves as a randomly accessible data holder for the stack, memory, storage, and all the other things the evm interpreter could access at any time.

This note focuses on stack and memory first. It goes through the memory and stack sub-circuits, then explains how these sub-circuits provide the valid access records, the bus mapping, for the evm circuit to read and write.

I take this naive but concrete Solidity code as an example:

pragma solidity ^0.8;

contract Sample {
    function memory_sample() public pure {
        assembly {
            let ptr := mload(0x40)
            mstore(ptr, 0xdeadbeef)
            mstore(ptr, add(mload(ptr), 0xfaceb00c))
            mstore(add(ptr, 0x20), 0xcafeb0ba)
        }
    }
}

When function memory_sample is executed, the evm should produce a log like this (this note only focuses on the function body):

pc  op              stack (top -> down)                  memory
--  --------------  ----------------------------------   ---------------------------------------  
... 
53  JUMPDEST        [    ,          ,           ,    ]   {40: 80,  80:          ,  a0:         }
54  PUSH1 40        [    ,          ,           ,  40]   {40: 80,  80:          ,  a0:         }
56  MLOAD           [    ,          ,           ,  80]   {40: 80,  80:          ,  a0:         }
57  PUSH4 deadbeef  [    ,          ,   deadbeef,  80]   {40: 80,  80:          ,  a0:         }
62  DUP2            [    ,        80,   deadbeef,  80]   {40: 80,  80:          ,  a0:         }
63  MSTORE          [    ,          ,           ,  80]   {40: 80,  80:  deadbeef,  a0:         }
64  PUSH4 faceb00c  [    ,          ,   faceb00c,  80]   {40: 80,  80:  deadbeef,  a0:         }
69  DUP2            [    ,        80,   faceb00c,  80]   {40: 80,  80:  deadbeef,  a0:         }
70  MLOAD           [    ,  deadbeef,   faceb00c,  80]   {40: 80,  80:  deadbeef,  a0:         }
71  ADD             [    ,          ,  1d97c6efb,  80]   {40: 80,  80:  deadbeef,  a0:         }
72  DUP2            [    ,        80,  1d97c6efb,  80]   {40: 80,  80:  deadbeef,  a0:         }
73  MSTORE          [    ,          ,           ,  80]   {40: 80,  80: 1d97c6efb,  a0:         }
74  PUSH4 cafeb0ba  [    ,          ,   cafeb0ba,  80]   {40: 80,  80: 1d97c6efb,  a0:         }
79  PUSH1 20        [    ,        20,   cafeb0ba,  80]   {40: 80,  80: 1d97c6efb,  a0:         }
81  DUP3            [  80,        20,   cafeb0ba,  80]   {40: 80,  80: 1d97c6efb,  a0:         }
82  ADD             [    ,        a0,   cafeb0ba,  80]   {40: 80,  80: 1d97c6efb,  a0:         }
83  MSTORE          [    ,          ,           ,  80]   {40: 80,  80: 1d97c6efb,  a0: cafeb0ba}
84  POP             [    ,          ,           ,    ]   {40: 80,  80: 1d97c6efb,  a0: cafeb0ba}
...

Definition

  • fq - 253-bit value.
  • op - A byte representing EVM operation code
  • code - A vector of op compiled from smart contract
  • pc - Program counter
  • gc - Global counter, offset to 0 for simplicity
  • sp - Stack pointer
  • stack - A vector of fq with max size 1024
  • memory - A vector of bytes
  • $a === $b - $a is equal to $b
  • $t_lookup - A function that ensures the input is in table $t

Memory Circuit

In the memory circuit, the prover should collect all MLOAD and MSTORE operations and order them by key and then by gc, then build a layout with columns key, val, rw and gc, which stand for:

  • key - key of memory we are operating
  • val - memory[key] after operation
  • rw - access enum, could be
    • 0 - Read
    • 1 - Write

With some auxiliary notation:

  • key_prev - key value in previous row
  • gc_prev - gc value in previous row
  • val_prev - val value in previous row

The constraint will be:

Condition        Constraint                  Note
---------------  --------------------------  ------------------------------------------
INIT             rw === Write and val === 0  First row of circuit (does not query prev)
*                rw ∈ [0, 1]                 Should be valid rw
*                key - key_prev ∈ [0, ?]     Should be non-strict monotonic
*                val ∈ [0, 255]              Should be a byte value
key != key_prev  rw === Write and val === 0  Should be initialized to 0
key == key_prev  gc > gc_prev                Should be strict monotonic
rw == Read       val === val_prev            Should be previous written value
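
The constraints above can be read as a row-by-row check over the sorted table. The following is a minimal sketch of that check in plain Python, not circuit code. The function name check_memory_table, the max_key_gap parameter (standing in for the undecided upper bound "?") and the use of gc 0 on init rows are assumptions for illustration.

```python
READ, WRITE = 0, 1


def check_memory_table(rows, max_key_gap=2**22):
    """rows: (key, val, rw, gc) tuples already sorted by key, then gc.

    Returns a list of (row_index, message) for every violated constraint.
    """
    errors = []
    for i, (key, val, rw, gc) in enumerate(rows):
        if rw not in (READ, WRITE):
            errors.append((i, "rw must be 0 or 1"))
        if not 0 <= val <= 255:
            errors.append((i, "val must be a byte"))
        new_key = i == 0 or key != rows[i - 1][0]
        if i > 0:
            key_prev, val_prev, _, gc_prev = rows[i - 1]
            if not 0 <= key - key_prev <= max_key_gap:
                errors.append((i, "key must be non-decreasing"))
            if key == key_prev and not gc > gc_prev:
                errors.append((i, "gc must strictly increase within a key"))
            if key == key_prev and rw == READ and val != val_prev:
                errors.append((i, "read must return previous value"))
        if new_key and not (rw == WRITE and val == 0):
            errors.append((i, "first row of a key must be Write 0"))
    return errors


# Hypothetical byte-level rows: 0x5f is the low byte of the word stored at 0x40.
good = [
    (0x5F, 0, WRITE, 0),     # init row (gc 0 is a placeholder)
    (0x5F, 0x80, WRITE, 1),
    (0x5F, 0x80, READ, 4),
    (0x60, 0, WRITE, 0),     # init row of the next key
]
bad = good[:2] + [(0x5F, 0x81, READ, 4)]  # reads a value that was never written

good_errors = check_memory_table(good)
bad_errors = check_memory_table(bad)
```

A Read on the first row of a key is rejected by the initialization rule, so the "previous written value" rule only ever needs to compare rows with the same key.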

In the above sample, the memory table should be like:

key   val          rw     gc  Note
----  -----------  -----  --  ---------------------------------------
0x40  0            Write      Init
0x40  0x80         Write  ?   Assume written at the beginning of code
0x40  0x80         Read   4   56 MLOAD
-
0x80  0            Write      Init
0x80  0xdeadbeef   Write  10  63 MSTORE
0x80  0xdeadbeef   Read   16  70 MLOAD
0x80  0x1d97c6efb  Write  24  73 MSTORE
-
0xa0  0            Write      Init
0xa0  0xcafeb0ba   Write  34  83 MSTORE

Stack Circuit

The stack circuit is like the memory circuit, but to avoid modifying every entry on PUSH or POP, we let the evm circuit maintain a stack pointer sp, initialized at 1024, to look up the top value of the stack. For example, PUSH does sp-- and POP does sp++.

The constraint will be:

Condition        Constraint                  Note
---------------  --------------------------  ------------------------------------------
INIT             rw === Write and val === 0  First row of circuit (does not query prev)
*                rw ∈ [0, 1]                 Should be valid rw
*                key - key_prev ∈ [0, 1023]  Should be non-strict monotonic
*                key ∈ [0, 1023]             Should be in range
key != key_prev  rw === Write and val === 0  Should be initialized to 0
key == key_prev  gc > gc_prev                Should be strict monotonic
rw == Read       val === val_prev            Should be previous written value

Some ops need multiple stack reads/writes at the same time. Taking DUPX as an example, we have to check that the source and the newly pushed value are equal, so it requires a Read and a Write to be in the bus mapping.

We let the evm circuit use multiple lookups to ensure these reads/writes happen at the same time (memory as well if needed), so we could have these constraints ($x denotes a variable that should be equal within the same op):

Condition     Constraint                                              Note
------------  ------------------------------------------------------  ----------------------------------------
op == PUSH    bus_mapping_lookup(gc, Stack, key, val, Write)          Should exist in bus mapping
op == DUPX    bus_mapping_lookup(gc, Stack, key+X, val, Read) and     Both source and newly written values
              bus_mapping_lookup(gc+1, Stack, key, val, Write)        should exist in bus mapping
op == MLOAD   bus_mapping_lookup(gc, Stack, key, $key, Read) and      For memory loading, the $key stack Read,
              bus_mapping_lookup(gc+1, Stack, key, $val, Write) and   the $val stack Write, and the memory
              bus_mapping_lookup(gc+2, Memory, $key, $val, Read)      Read should exist in bus mapping
op == MSTORE  bus_mapping_lookup(gc, Stack, key, $key, Read) and      For memory storing, the $key and $val
              bus_mapping_lookup(gc+1, Stack, key+1, $val, Read) and  stack Reads and the memory Write
              bus_mapping_lookup(gc+2, Memory, $key, $val, Write)     should exist in bus mapping
op == SWAPX   See ↓

hshen@scroll Just want to help add the missing piece for SWAPX. We need 4 bus mapping lookups for 2 reads and 2 writes as follows. Please help check the correctness of the constraints:
bus_mapping_lookup(gc, Stack, key+X, $val1, Read)
bus_mapping_lookup(gc+1, Stack, key, $val2, Read)
bus_mapping_lookup(gc+2, Stack, key+X, $val2, Write)
bus_mapping_lookup(gc+3, Stack, key, $val1, Write)
Note: 2 reads at key and key+X and 2 writes with swapped values at key and key+X in the stack should exist in bus mapping.
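
As a sanity check of the DUPX and SWAPX lookups, the sketch below builds the lookup tuples for one op and tests them for membership in a bus mapping modeled as a Python set. The function names and the tuple layout (gc, target, key, val, rw) are assumptions for illustration; in the circuit these would be lookup arguments, not set membership.

```python
def dup_lookups(gc, key, x, val):
    """Lookups for DUPX: read the source at key+X, then write the copy at key."""
    return [
        (gc, "Stack", key + x, val, "Read"),
        (gc + 1, "Stack", key, val, "Write"),
    ]


def swap_lookups(gc, key, x, val1, val2):
    """Lookups for SWAPX: two reads, then two writes with the values exchanged."""
    return [
        (gc, "Stack", key + x, val1, "Read"),
        (gc + 1, "Stack", key, val2, "Read"),
        (gc + 2, "Stack", key + x, val2, "Write"),
        (gc + 3, "Stack", key, val1, "Write"),
    ]


def all_in_bus_mapping(lookups, bus_mapping):
    """True only if every lookup tuple exists in the bus mapping."""
    return all(entry in bus_mapping for entry in lookups)


# The two records of the DUP2 at pc 62 in the sample (gc 6 and 7).
bus_mapping = {
    (6, "Stack", 1023, 0x80, "Read"),
    (7, "Stack", 1021, 0x80, "Write"),
}
dup_ok = all_in_bus_mapping(dup_lookups(6, 1021, 2, 0x80), bus_mapping)
dup_forged = all_in_bus_mapping(dup_lookups(6, 1021, 2, 0x81), bus_mapping)
```

Because both DUPX lookups share the same val variable, a prover cannot write a value that differs from the one read at the source without one of the lookups failing.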

In the above sample, the stack table should be like:

key   val          rw     gc  Note
----  -----------  -----  --  --------------
1020  0            Write      Init
1020  0x80         Write  28  81 DUP3
1020  0x80         Read   29  82 ADD.POP
-
1021  0            Write      Init
1021  0x80         Write  7   62 DUP2
1021  0x80         Read   8   63 MSTORE.POP
1021  0x80         Write  13  69 DUP2
1021  0x80         Read   14  70 MLOAD.POP
1021  0xdeadbeef   Write  15  70 MLOAD.PUSH
1021  0xdeadbeef   Read   17  71 ADD.POP
1021  0x80         Write  21  72 DUP2
1021  0x80         Read   22  73 MSTORE.POP
1021  0x20         Write  26  79 PUSH1
1021  0x20         Read   30  82 ADD.POP
1021  0xa0         Write  31  82 ADD.PUSH
1021  0xa0         Read   32  83 MSTORE.POP
-
1022  0            Write      Init
1022  0xdeadbeef   Write  5   57 PUSH4
1022  0xdeadbeef   Read   9   63 MSTORE.POP
1022  0xfaceb00c   Write  11  64 PUSH4
1022  0xfaceb00c   Read   18  71 ADD.POP
1022  0x1d97c6efb  Write  19  71 ADD.PUSH
1022  0x1d97c6efb  Read   23  73 MSTORE.POP
1022  0xcafeb0ba   Write  25  74 PUSH4
1022  0xcafeb0ba   Read   33  83 MSTORE.POP
-
1023  0            Write      Init
1023  0x40         Write  1   54 PUSH1 40
1023  0x40         Read   2   56 MLOAD.POP
1023  0x80         Write  3   56 MLOAD.PUSH
1023  0x80         Read   6   62 DUP2
1023  0x80         Read   12  69 DUP2
1023  0x80         Read   20  72 DUP2
1023  0x80         Read   27  81 DUP3

Storage Circuit

Storage access is not covered by this note, see here for more details

Bus Mapping

The memory and stack circuits will provide the valid and meaningful access records (some rows like init will not be included) to the bus mapping lookup table, which is shared by the state circuit and evm circuit.

Each record has a unique gc that serves as a synchronizing clock, a target that specifies where the access record resides, and as many arbitrary valX as necessary. In the evm circuit, we look up all gc one by one and finally check, at the end of execution, that the bus mapping degree is bounded by gc; then we have confidence that no malicious write is inserted.

We have enum target with their valX representation:

  • Stack
    • val1 - key
    • val2 - val
    • val3 - rw
  • Memory
    • val1 - key
    • val2 - val
    • val3 - rw
  • Storage
    • val1 - key
    • val2 - val
    • val3 - rw
    • val4 - val_prev
    • val5 - is_first_touch
  • ...

In the above sample, the bus mapping should be like (order by gc increasingly):

gc  target  val1  val2         val3   Note
--  ------  ----  -----------  -----  --------------
1   Stack   1023  0x40         Write  54 PUSH1 40
2   Stack   1023  0x40         Read   56 MLOAD.POP
3   Stack   1023  0x80         Write  56 MLOAD.PUSH
4   Memory  0x40  0x80         Read   56 MLOAD
5   Stack   1022  0xdeadbeef   Write  57 PUSH4
6   Stack   1023  0x80         Read   62 DUP2
7   Stack   1021  0x80         Write  62 DUP2
8   Stack   1021  0x80         Read   63 MSTORE.POP
9   Stack   1022  0xdeadbeef   Read   63 MSTORE.POP
10  Memory  0x80  0xdeadbeef   Write  63 MSTORE
11  Stack   1022  0xfaceb00c   Write  64 PUSH4
12  Stack   1023  0x80         Read   69 DUP2
13  Stack   1021  0x80         Write  69 DUP2
14  Stack   1021  0x80         Read   70 MLOAD.POP
15  Stack   1021  0xdeadbeef   Write  70 MLOAD.PUSH
16  Memory  0x80  0xdeadbeef   Read   70 MLOAD
17  Stack   1021  0xdeadbeef   Read   71 ADD.POP
18  Stack   1022  0xfaceb00c   Read   71 ADD.POP
19  Stack   1022  0x1d97c6efb  Write  71 ADD.PUSH
20  Stack   1023  0x80         Read   72 DUP2
21  Stack   1021  0x80         Write  72 DUP2
22  Stack   1021  0x80         Read   73 MSTORE.POP
23  Stack   1022  0x1d97c6efb  Read   73 MSTORE.POP
24  Memory  0x80  0x1d97c6efb  Write  73 MSTORE
25  Stack   1022  0xcafeb0ba   Write  74 PUSH4
26  Stack   1021  0x20         Write  79 PUSH1
27  Stack   1023  0x80         Read   81 DUP3
28  Stack   1020  0x80         Write  81 DUP3
29  Stack   1020  0x80         Read   82 ADD.POP
30  Stack   1021  0x20         Read   82 ADD.POP
31  Stack   1021  0xa0         Write  82 ADD.PUSH
32  Stack   1021  0xa0         Read   83 MSTORE.POP
33  Stack   1022  0xcafeb0ba   Read   83 MSTORE.POP
34  Memory  0xa0  0xcafeb0ba   Write  83 MSTORE
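
The table above can be reproduced by replaying the sample ops with an sp that starts at 1024 and a gc that increments on every access. The sketch below does that in plain Python. The function name trace_bus_mapping, the (op, arg) program encoding and the word-addressed memory dictionary are assumptions for illustration, and memory is kept word-level here only to match the sample tables.

```python
def trace_bus_mapping(program, initial_memory):
    """Replay (op, arg) pairs and return (gc, target, key, val, rw) records."""
    memory = dict(initial_memory)  # word-addressed, as in the sample tables
    stack, sp, gc = {}, 1024, 0
    records = []

    def emit(target, key, val, rw):
        nonlocal gc
        gc += 1
        records.append((gc, target, key, val, rw))

    def push(val):
        nonlocal sp
        sp -= 1
        stack[sp] = val
        emit("Stack", sp, val, "Write")

    def pop():
        nonlocal sp
        val = stack[sp]
        emit("Stack", sp, val, "Read")
        sp += 1
        return val

    for op, arg in program:
        if op == "PUSH":
            push(arg)
        elif op == "DUP":
            src = sp + arg - 1          # equals key + X once sp has moved
            emit("Stack", src, stack[src], "Read")
            push(stack[src])
        elif op == "ADD":
            a = pop()
            b = pop()
            push((a + b) % 2**256)
        elif op == "MLOAD":
            key = pop()
            val = memory.get(key, 0)
            push(val)
            emit("Memory", key, val, "Read")
        elif op == "MSTORE":
            key = pop()
            val = pop()
            memory[key] = val
            emit("Memory", key, val, "Write")
        else:
            raise ValueError(f"unsupported op: {op}")
    return records


program = [
    ("PUSH", 0x40), ("MLOAD", None),
    ("PUSH", 0xDEADBEEF), ("DUP", 2), ("MSTORE", None),
    ("PUSH", 0xFACEB00C), ("DUP", 2), ("MLOAD", None), ("ADD", None),
    ("DUP", 2), ("MSTORE", None),
    ("PUSH", 0xCAFEB0BA), ("PUSH", 0x20), ("DUP", 3), ("ADD", None),
    ("MSTORE", None),
]
records = trace_bus_mapping(program, {0x40: 0x80})
# Stack table rows (without the init rows): filter by target, sort by key then gc.
stack_table = sorted((r for r in records if r[1] == "Stack"), key=lambda r: (r[2], r[0]))
```

Sorting the same records by (key, gc) per target gives the rows of the stack and memory tables shown earlier, minus the init rows, which is the sense in which the two circuits share one bus mapping.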

Call Context

In different EOA calls or internal calls, the stack and memory will be separated by call_context. This is beneficial in two ways:

  • The caller and callee don't need to copy the return data or call data if not used. When handling CALLDATACOPY or RETURNDATACOPY, we just locate the memory by the caller's or callee's call_context and ensure the data is indeed there.
  • The callee can memorize the caller's call_context directly in the evm circuit, and decompress it when switching back to the caller's context.

This note only describes how the state circuit works within the same call and doesn't cover call_context, see here for more information.

Question & Discussion

Q1. Memory is actually a byte array

  1. Memory is actually a byte array which can be accessed at any position, so how do we handle operations that overlap? For example, mstore(0x20, 0x1234) + mstore(0x1f, 0x56) => mload(0x20) = 0x5634

  2. Another TBD part is how large a memory address we allow to be accessed. If we check the non-strict monotonicity of the memory address by lookup, a larger memory address will require a larger table (or more cells to store chunks if using a smaller table).

    For now in the evm, the gas cost of memory expansion is quadratic, and if we assume the gas limit per block is up to \(20,000,000\), we can write an inequality to find the maximum memory address that can actually be accessed:

    \[ \begin{aligned} & 3\cdot\textsf{memSizeWord} + \frac{\textsf{memSizeWord}^2}{512} \le 20,000,000 \\ & \textsf{memSizeWord} \le 2^{16.62} \\ & \textsf{memSize} = 32 \cdot \textsf{memSizeWord} \le 2^{21.62} \end{aligned} \]

    Naively we can have a really large table from \(1\) to \(2^{22}\) and lookup key - key_prev to check the non-strict monotonicity in memory circuit. Or we can split it into 10-bit chunks to lookup each by a smaller 10-bit table.
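
The bound above can be double-checked numerically. This is a small sketch that searches for the largest word count whose expansion cost still fits in the assumed 20,000,000 gas limit. The function names are made up for illustration, and the cost uses integer division, which differs from the real-valued inequality by less than one gas.

```python
import math


def memory_gas(words: int) -> int:
    """Memory expansion cost from the inequality: 3*w + floor(w^2 / 512)."""
    return 3 * words + words * words // 512


def max_memory_words(gas_limit: int) -> int:
    """Largest word count whose expansion cost fits in gas_limit (binary search)."""
    lo, hi = 0, 1
    while memory_gas(hi) <= gas_limit:
        hi *= 2
    # Invariant: memory_gas(lo) <= gas_limit < memory_gas(hi)
    while lo < hi - 1:
        mid = (lo + hi) // 2
        if memory_gas(mid) <= gas_limit:
            lo = mid
        else:
            hi = mid
    return lo


words = max_memory_words(20_000_000)
size_bytes = 32 * words
bits_for_address = math.ceil(math.log2(size_bytes))
```

The resulting bits_for_address is the table size exponent one would need if key - key_prev were looked up in a single table.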

Q2. Do we put stack pointer inside stack table?

We don't validate the stack pointer in the stack table. Instead, we calculate it in the EVM and use it to check the index to read from.
barry

Q3. How and where do we handle DUPX, SWAPX?

How do we ensure the swapped or duplicated value is the same as the other one?

My thought is that in the EVM proof we check 2 reads for the values that are being swapped and then we update both of these values.
This makes swap a multi-slot opcode, but we keep the bus mapping constant length, which will be helpful in other places.
Curious to hear your thoughts on this tradeoff
barry

Q4. How to handle MSTORE

When MSTORE happens, it is accompanied by two Reads at different stack keys, and we have to combine them to look up the bus mapping to ensure that val (second Read) is stored to memory at key (first Read). But in the stack table we sort them by key and then gc, so they are far away from each other. Need to think more on this.

barryWhiteHat In the evm proof everything is looked up in order according to gc. So in the evm proof we will open the bus mapping at gc and see two stack ops, pop and pop, getting the index and the value. Then we will see a single memory op which writes the value to that index.
barryWhiteHat Since the bus mapping currently only has one stack element per gc, we may have to use two gc elements: one for the stack pop of the index and another for the stack pop of the value.
barryWhiteHat This is a difficult tradeoff to make and raises the question: how big should the bus mapping for stack elements be?

Q5. What if we put more things in bus mapping (add more valX)?

In the above approach, the bus mapping contributed by the stack table uses (key, val, rw) as (val1, val2, val3), which works for most ops. But it always costs 2 slots for ops like DUPX because in the evm circuit we need to look up a Read at the duplication source (sp + X, $val, Read) and a Write at the stack top (sp, $val, Write).

If we can have an extra val4 in the bus mapping, we can set the entry to (sp, $val, Write, sp + X) and only use 1 slot in the evm circuit (we save a Read check). It costs one more multiplication and addition in compress, but seems not to increase the constraint degree because the values are multiplied by constants (randomness) and added at the end.

han Is this correct? Would it increase the constraint degree?
barryWhiteHat My thinking here is that the DUP constraint check would have a different number of compressed elements inside it.
barryWhiteHat I don't quite understand what you mean by increasing the degree of the constraint, because all the r's have already been raised to the powers we need and it's just a mul and add.

Take another example: for MLOAD, we can make the bus mapping (sp, $val, Write, $key), where $key should be the previous row's val in the stack table. Then ($key, $val) should serve as (key, val) to look up the memory entry. Then we get a 1-slot MLOAD instead of 2.

han Does this make sense? Still stuck on how we build such an entry (sp, $val, Write, sp + X) into the bus mapping in the stack circuit.
barryWhiteHat What does the bus mapping look like in this example ?
han So take gc = 3 in bus mapping as example, the bus mapping becomes
(3, Stack, 1023, 0x80, Write, 0x40) and gc = 2 could be saved.

Q6. EVM supports 256-bit values, which definitely do not fit in 253 bits

In this note, all the stack values should be in compressed form instead of actual values, and the actual values will not be used in the state circuit; here we use actual values for clarity.

Actual values will be split into an 8-bit array (byte array), and we use the same compress function to compress them into a single fq

barryWhiteHat Yes
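
To illustrate the byte-array compression (called encode in the 08/05 update), here is a minimal sketch of a random linear combination over the 32 bytes of a word. The modulus P is a stand-in prime chosen only for illustration and is not the real field, and the little-endian byte order and the function name encode are assumptions, since the note does not fix them.

```python
P = 2**127 - 1  # stand-in prime modulus for illustration only, NOT the real field


def encode(value: int, r: int, p: int = P) -> int:
    """Return sum(byte_i * r**i) mod p over the 32 little-endian bytes of value."""
    data = value.to_bytes(32, "little")
    acc = 0
    for byte in reversed(data):  # Horner's rule, highest power first
        acc = (acc * r + byte) % p
    return acc


r = 0x1234567  # in a real proof r would be verifier randomness
word = 2**256 - 1  # the largest 256-bit value, too wide for a single field element
encoded = encode(word, r)
tiny = encode(0x0102, 7)  # bytes (0x02, 0x01): 2 + 1 * 7
```

With r = 256 and a modulus larger than 2^256 the function returns the word itself, which shows that the encoding is the usual base-256 expansion with the base replaced by randomness.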

Q7. How do we ensure the val of two slots in an op are the same when we expect them to be?

Taking DUPX as an example, how do we make sure the two lookups have the same $val in different slots? I think a copy constraint won't work because we never know which one is in which slot.

barryWhiteHat Agree on copy constraints. In these cases we can look back at our previous constraint wires and take the value from there. Does this make sense?
han Sounds good. So we also have to care about the shifted access because we have to open another shifted point when doing Kate, which increases the proof size a little bit. But if we succeed in making all opcodes single slot, the problem is gone.
