Update 07/07
We discussed how to design the bus mapping to be friendlier to the evm circuit, and came up with several approaches described here. Since we have not explored the evm circuit much, we don't know which one is better. So we will stick to the original, simplest way of doing bus mapping, then optimize for the evm circuit if needed.
Update 07/26
After some discussion about how to handle misaligned memory access, we decided to treat memory as a byte array from the beginning. So any memory operation below should expand to 32 byte-by-byte items (which could be a lot of reads/writes).
From the perspective of constraints, the memory circuit differs little from the stack circuit. The main difference is that the memory circuit doesn't have the 1024 size constraint, so the prover can expand memory to any size he wants as long as the remaining gas allows.
Update 08/05
The `compress` for bus mapping seems to be unnecessary because `halo2` does it for us. The state circuit and evm circuit still share the bus mapping table for vector lookup, and we don't need to worry about the compression; we just need to make sure the table-compressing randomness for the vector lookup is equal between the two proofs (we could just share the transcript). So the `compress` inside `bus_mapping_lookup` in the note will be removed because we don't actually do it ourselves.
Note that for word encoding, we still need to do the random linear combination ourselves; we call it `encode` to avoid ambiguity from now on.
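As a rough illustration of what such an `encode` could look like, here is a minimal sketch of a random linear combination over the bytes of a word. The modulus `P`, the little-endian byte order, and the helper `to_le_bytes` are assumptions made for this example only; the real field is the ~253-bit `fq` and the randomness would come from the transcript.

```python
# Hypothetical small prime standing in for the real ~253-bit field modulus.
P = 2**61 - 1


def encode(word_bytes, r):
    """Random linear combination sum(b_i * r^i) mod P, evaluated with Horner's rule."""
    acc = 0
    for b in reversed(word_bytes):
        if not 0 <= b <= 255:
            raise ValueError("every item must be a byte")
        acc = (acc * r + b) % P
    return acc


def to_le_bytes(value, size=32):
    """Split a word into `size` bytes, least significant byte first (assumed order)."""
    return list(value.to_bytes(size, "little"))


r = 123456789  # assumed randomness, for illustration only
a = encode(to_le_bytes(0xdeadbeef), r)
b = encode(to_le_bytes(0xdeadbeef), r)
```

Equal words always encode to the same field element, which is the property the lookups rely on; distinct words collide only with small probability over the choice of `r`.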
The state circuit serves as a randomly accessible data holder for the stack, memory, storage, and everything else the evm interpreter could access at any time.
This note focuses on stack and memory first. It goes through the memory and stack sub-circuits, then explains how these sub-circuits provide the valid access records, the bus mapping, for the evm circuit to read and write.
I take this naive but concrete solidity code as example:
    pragma solidity ^0.8;

    contract Sample {
        function memory_sample() public pure {
            assembly {
                let ptr := mload(0x40)
                mstore(ptr, 0xdeadbeef)
                mstore(ptr, add(mload(ptr), 0xfaceb00c))
                mstore(add(ptr, 0x20), 0xcafeb0ba)
            }
        }
    }
When function `memory_sample` is executed, the evm should have a log like this (this note only focuses on the function body):
    pc  op              stack (top -> down)            memory
    --  --------------  -----------------------------  -------------------------------------
    ...
    53  JUMPDEST        [  ,         ,          ,   ]  {40: 80, 80:          , a0:         }
    54  PUSH1 40        [  ,         ,          , 40]  {40: 80, 80:          , a0:         }
    56  MLOAD           [  ,         ,          , 80]  {40: 80, 80:          , a0:         }
    57  PUSH4 deadbeef  [  ,         ,  deadbeef, 80]  {40: 80, 80:          , a0:         }
    62  DUP2            [  ,       80,  deadbeef, 80]  {40: 80, 80:          , a0:         }
    63  MSTORE          [  ,         ,          , 80]  {40: 80, 80:  deadbeef, a0:         }
    64  PUSH4 faceb00c  [  ,         ,  faceb00c, 80]  {40: 80, 80:  deadbeef, a0:         }
    69  DUP2            [  ,       80,  faceb00c, 80]  {40: 80, 80:  deadbeef, a0:         }
    70  MLOAD           [  , deadbeef,  faceb00c, 80]  {40: 80, 80:  deadbeef, a0:         }
    71  ADD             [  ,         , 1d97c6efb, 80]  {40: 80, 80:  deadbeef, a0:         }
    72  DUP2            [  ,       80, 1d97c6efb, 80]  {40: 80, 80:  deadbeef, a0:         }
    73  MSTORE          [  ,         ,          , 80]  {40: 80, 80: 1d97c6efb, a0:         }
    74  PUSH4 cafeb0ba  [  ,         ,  cafeb0ba, 80]  {40: 80, 80: 1d97c6efb, a0:         }
    79  PUSH1 20        [  ,       20,  cafeb0ba, 80]  {40: 80, 80: 1d97c6efb, a0:         }
    81  DUP3            [80,       20,  cafeb0ba, 80]  {40: 80, 80: 1d97c6efb, a0:         }
    82  ADD             [  ,       a0,  cafeb0ba, 80]  {40: 80, 80: 1d97c6efb, a0:         }
    83  MSTORE          [  ,         ,          , 80]  {40: 80, 80: 1d97c6efb, a0: cafeb0ba}
    84  POP             [  ,         ,          ,   ]  {40: 80, 80: 1d97c6efb, a0: cafeb0ba}
    ...
- `fq` - 253-bit value
- `op` - A byte representing EVM operation code
- `code` - A vector of `op` compiled from smart contract
- `pc` - Program counter
- `gc` - Global counter, is offset to `0` for simplicity
- `sp` - Stack pointer
- `stack` - A vector of `fq` with max size `1024`
- `memory` - A vector of bytes
- `$a === $b` - `$a` is equal to `$b`
- `$t_lookup` - A function that ensures the input is in table `$t`
In the memory circuit, the prover should collect all `MLOAD` and `MSTORE` operations and order them by `key` and then by `gc`, then build a layout with columns `key`, `val`, `rw` and `gc`, which stand for:
- `key` - `key` of `memory` we are operating on
- `val` - `memory[key]` after operation
- `rw` - access enum, could be
  - `0` - `Read`
  - `1` - `Write`

With some auxiliary notation:
- `key_prev` - `key` value in previous row
- `gc_prev` - `gc` value in previous row
- `val_prev` - `val` value in previous row

The constraint will be:
Condition | Constraint | Note |
---|---|---|
INIT | `rw === Write` and `val === 0` | First row of circuit (does not query `prev`) |
`*` | `rw ∈ [0, 1]` | Should be valid `rw` |
`*` | `key - key_prev ∈ [0, ?]` | Should be non-strict monotonic |
`*` | `val ∈ [0, 255]` | Should be a byte value |
`key != key_prev` | `rw === Write` and `val === 0` | Should be initialized to `0` |
`key == key_prev` | `gc > gc_prev` | Should be strict monotonic |
`rw == Read` | `val === val_prev` | Should be previous written value |
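The constraints in this table can be sanity-checked outside a circuit with a plain row-by-row scan. The sketch below is only an illustration: the function name `memory_violations`, the tuple layout `(key, val, rw, gc)`, and the choice to give init rows `gc = 0` are assumptions for this example, and the upper bound on `key - key_prev` is left unchecked because it is still open in the table above.

```python
READ, WRITE = 0, 1


def memory_violations(rows):
    """Return (row_index, rule) pairs for every constraint a sorted table breaks.

    rows: list of (key, val, rw, gc), already ordered by key and then by gc.
    """
    errors = []
    for i, (key, val, rw, gc) in enumerate(rows):
        if rw not in (READ, WRITE):
            errors.append((i, "rw"))
        if not 0 <= val <= 255:
            errors.append((i, "byte"))
        if i == 0:
            # INIT: the first row has no previous row to query.
            if not (rw == WRITE and val == 0):
                errors.append((i, "init"))
            continue
        key_prev, val_prev, _, gc_prev = rows[i - 1]
        if key < key_prev:
            errors.append((i, "key order"))
        if key != key_prev and not (rw == WRITE and val == 0):
            errors.append((i, "init"))
        if key == key_prev and not gc > gc_prev:
            errors.append((i, "gc order"))
        if rw == READ and val != val_prev:
            errors.append((i, "read"))
    return errors


# Byte-level rows: 0x5f is the last byte of the word at 0x40, 0x9f of the word at 0x80.
good_rows = [
    (0x5F, 0x00, WRITE, 0),
    (0x5F, 0x80, WRITE, 1),
    (0x5F, 0x80, READ, 4),
    (0x9F, 0x00, WRITE, 0),
    (0x9F, 0xEF, WRITE, 10),
]
bad_rows = [
    (0x5F, 0x00, WRITE, 0),
    (0x5F, 0x81, READ, 4),  # reads a value that was never written
]
```

Here `memory_violations(good_rows)` is empty, while `bad_rows` is rejected by the `rw == Read` rule.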
In the above sample, the memory table should be like:
key | val | rw | gc | Note |
---|---|---|---|---|
0x40 | 0 | Write | Init | |
0x40 | 0x80 | Write | ? | Assume written at the beginning of code |
0x40 | 0x80 | Read | 4 | 56 MLOAD |
- | | | | |
0x80 | 0 | Write | Init | |
0x80 | 0xdeadbeef | Write | 10 | 63 MSTORE |
0x80 | 0xdeadbeef | Read | 16 | 70 MLOAD |
0x80 | 0x1d97c6efb | Write | 24 | 73 MSTORE |
- | | | | |
0xa0 | 0 | Write | Init | |
0xa0 | 0xcafeb0ba | Write | 34 | 83 MSTORE |
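The ordering step (group by `key`, then sort by `gc`, with an init row opening each group) can be sketched as follows. The function name `build_memory_table` and the tuple layouts are assumptions for illustration; values are kept as whole words as in the table above, and the `0x40` write whose `gc` is unknown is left out.

```python
def build_memory_table(ops):
    """Group memory accesses by key, order each group by gc, and prepend an init row.

    ops: list of (gc, key, val, rw) in execution order.
    Returns rows laid out as (key, val, rw, gc).
    """
    table = []
    for key in sorted({k for _, k, _, _ in ops}):
        table.append((key, 0, "Write", "Init"))
        for gc, k, val, rw in sorted(ops, key=lambda op: op[0]):
            if k == key:
                table.append((key, val, rw, gc))
    return table


# Memory accesses of the sample, as they occur during execution.
ops = [
    (4, 0x40, 0x80, "Read"),
    (10, 0x80, 0xDEADBEEF, "Write"),
    (16, 0x80, 0xDEADBEEF, "Read"),
    (24, 0x80, 0x1D97C6EFB, "Write"),
    (34, 0xA0, 0xCAFEB0BA, "Write"),
]
table = build_memory_table(ops)
```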
The stack circuit is like the memory circuit, but to avoid modifying every entry on `PUSH` or `POP`, we let the evm circuit maintain a stack pointer `sp`, initialized to `1024`, to look up the top value of the stack. For example, `PUSH` does `sp--` and `POP` does `sp++`.
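To make the `sp` bookkeeping concrete, here is a small simulation that emits one access record per stack read or write. The class `StackTrace` and its method names are hypothetical and exist only for this sketch; `gc` is incremented once per record.

```python
class StackTrace:
    """Toy stack that grows downward from 1024 and logs (gc, key, val, rw) records."""

    def __init__(self):
        self.sp = 1024
        self.gc = 0
        self.slots = {}
        self.records = []

    def _record(self, key, val, rw):
        self.gc += 1
        self.records.append((self.gc, key, val, rw))

    def push(self, val):
        self.sp -= 1  # PUSH does sp--
        self.slots[self.sp] = val
        self._record(self.sp, val, "Write")

    def pop(self):
        val = self.slots[self.sp]
        self._record(self.sp, val, "Read")
        self.sp += 1  # POP does sp++
        return val

    def dup(self, x):
        # DUPX reads the X-th item from the top, then pushes a copy of it.
        source = self.sp + x - 1
        val = self.slots[source]
        self._record(source, val, "Read")
        self.push(val)


trace = StackTrace()
trace.push(0x80)
trace.push(0xDEADBEEF)
trace.dup(2)
```

After these three operations the records are a write at `1023`, a write at `1022`, a read at `1023` and a write at `1021`, which mirrors the `DUP2` rows of the sample apart from the `gc` numbering.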
The constraint will be:
Condition | Constraint | Note |
---|---|---|
INIT | `rw === Write` and `val === 0` | First row of circuit (does not query `prev`) |
`*` | `rw ∈ [0, 1]` | Should be valid `rw` |
`*` | `key - key_prev ∈ [0, 1023]` | Should be non-strict monotonic |
`*` | `key ∈ [0, 1023]` | Should be in range |
`key != key_prev` | `rw === Write` and `val === 0` | Should be initialized to `0` |
`key == key_prev` | `gc > gc_prev` | Should be strict monotonic |
`rw == Read` | `val === val_prev` | Should be previous written value |
Some `op` should be accompanied by multiple stack reads/writes at the same time. Take `DUPX` as an example: we have to check that the source and the newly pushed value are equal, so it requires a `Read` and a `Write` to be in the bus mapping.
We let the evm circuit use multiple lookups to ensure these reads/writes happen at the same time (memory as well if needed), so we could have these constraints (`$x` means a variable, which should be equal within the same `op`):
Condition | Constraint | Note |
---|---|---|
`op == PUSH` | `bus_mapping_lookup(gc, Stack, key, val, Write)` | Should exist in bus mapping |
`op == DUPX` | `bus_mapping_lookup(gc, Stack, key+X, val, Read)` and `bus_mapping_lookup(gc+1, Stack, key, val, Write)` | Both source and newly written values should exist in bus mapping |
`op == MLOAD` | `bus_mapping_lookup(gc, Stack, key, $key, Read)` and `bus_mapping_lookup(gc+1, Stack, key, $val, Write)` and `bus_mapping_lookup(gc+2, Memory, $key, $val, Read)` | Memory loading: the `$key` stack `Read`, the `$val` stack `Write`, and the memory `Read` should exist in bus mapping |
`op == MSTORE` | `bus_mapping_lookup(gc, Stack, key, $key, Read)` and `bus_mapping_lookup(gc+1, Stack, key+1, $val, Read)` and `bus_mapping_lookup(gc+2, Memory, $key, $val, Write)` | Memory storing: the `$key` and `$val` stack `Read`s and the memory `Write` should exist in bus mapping |
`op == SWAPX` | See ↓ | |
hshen@scroll Just want to help add the missing piece for `SWAPX`. We need 4 bus mapping lookups for 2 reads and 2 writes as follows. Please help check the correctness of the constraints:
- `bus_mapping_lookup(gc, Stack, key+X, $val1, Read)`
- `bus_mapping_lookup(gc+1, Stack, key, $val2, Read)`
- `bus_mapping_lookup(gc+2, Stack, key+X, $val2, Write)`
- `bus_mapping_lookup(gc+3, Stack, key, $val1, Write)`

Note: 2 reads at `key` and `key+X` and 2 writes with swapped values at `key` and `key+X` in the stack should exist in bus mapping.
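The per-opcode lookups can be mimicked with plain set membership. In this sketch the bus mapping is a Python set of `(gc, target, key, val, rw)` tuples, and `check_dup` and `check_swap` are hypothetical helpers that take the witness values `$val`, `$val1`, `$val2` as arguments; a real circuit would express the same thing as lookup arguments rather than function calls.

```python
def bus_mapping_lookup(table, gc, target, key, val, rw):
    """True when the exact access record exists in the bus mapping."""
    return (gc, target, key, val, rw) in table


def check_dup(table, gc, key, x, val):
    """DUPX: a Read at the source and a Write of the same value at the new top."""
    return (
        bus_mapping_lookup(table, gc, "Stack", key + x, val, "Read")
        and bus_mapping_lookup(table, gc + 1, "Stack", key, val, "Write")
    )


def check_swap(table, gc, key, x, val1, val2):
    """SWAPX: 2 reads followed by 2 writes with the values exchanged."""
    return (
        bus_mapping_lookup(table, gc, "Stack", key + x, val1, "Read")
        and bus_mapping_lookup(table, gc + 1, "Stack", key, val2, "Read")
        and bus_mapping_lookup(table, gc + 2, "Stack", key + x, val2, "Write")
        and bus_mapping_lookup(table, gc + 3, "Stack", key, val1, "Write")
    )


# Records of `62 DUP2` from the sample, plus a made-up SWAP1 at gc 100..103.
table = {
    (6, "Stack", 1023, 0x80, "Read"),
    (7, "Stack", 1021, 0x80, "Write"),
    (100, "Stack", 1023, 0xAA, "Read"),
    (101, "Stack", 1022, 0xBB, "Read"),
    (102, "Stack", 1023, 0xBB, "Write"),
    (103, "Stack", 1022, 0xAA, "Write"),
}
```

Because both lookups of `check_dup` share the same `val` argument, a prover cannot claim a duplicated value that differs from its source.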
In the above sample, the stack table should be like:
key | val | rw | gc | Note |
---|---|---|---|---|
1020 | 0 | Write | Init | |
1020 | 0x80 | Write | 28 | 81 DUP3 |
1020 | 0x80 | Read | 29 | 82 ADD.POP |
- | | | | |
1021 | 0 | Write | Init | |
1021 | 0x80 | Write | 7 | 62 DUP2 |
1021 | 0x80 | Read | 8 | 63 MSTORE.POP |
1021 | 0x80 | Write | 13 | 69 DUP2 |
1021 | 0x80 | Read | 14 | 70 MLOAD.POP |
1021 | 0xdeadbeef | Write | 15 | 70 MLOAD.PUSH |
1021 | 0xdeadbeef | Read | 17 | 71 ADD.POP |
1021 | 0x80 | Write | 21 | 72 DUP2 |
1021 | 0x80 | Read | 22 | 73 MSTORE.POP |
1021 | 0x20 | Write | 26 | 79 PUSH1 |
1021 | 0x20 | Read | 30 | 82 ADD.POP |
1021 | 0xa0 | Write | 31 | 82 ADD.PUSH |
1021 | 0xa0 | Read | 32 | 83 MSTORE.POP |
- | | | | |
1022 | 0 | Write | Init | |
1022 | 0xdeadbeef | Write | 5 | 57 PUSH4 |
1022 | 0xdeadbeef | Read | 9 | 63 MSTORE.POP |
1022 | 0xfaceb00c | Write | 11 | 64 PUSH4 |
1022 | 0xfaceb00c | Read | 18 | 71 ADD.POP |
1022 | 0x1d97c6efb | Write | 19 | 71 ADD.PUSH |
1022 | 0x1d97c6efb | Read | 23 | 73 MSTORE.POP |
1022 | 0xcafeb0ba | Write | 25 | 74 PUSH4 |
1022 | 0xcafeb0ba | Read | 33 | 83 MSTORE.POP |
- | | | | |
1023 | 0 | Write | Init | |
1023 | 0x40 | Write | 1 | 54 PUSH1 40 |
1023 | 0x40 | Read | 2 | 56 MLOAD.POP |
1023 | 0x80 | Write | 3 | 56 MLOAD.PUSH |
1023 | 0x80 | Read | 6 | 62 DUP2 |
1023 | 0x80 | Read | 12 | 69 DUP2 |
1023 | 0x80 | Read | 20 | 72 DUP2 |
1023 | 0x80 | Read | 27 | 81 DUP3 |
Storage access is not covered by this note, see here for more details
The memory and stack circuits will provide the valid and meaningful access records (some rows like init will not be included) to the bus mapping lookup table, which is shared by the state circuit and evm circuit.
Each record has a unique `gc` that serves as a synchronizing clock, a `target` that specifies where the access record resides, and as many arbitrary `valX` as necessary. In the evm circuit, we look up every `gc` one by one and finally check that the bus mapping degree is bounded by `gc` at the end of execution; then we have confidence that no malicious write is inserted.
We have the enum `target` with their `valX` representation:
- `Stack`
  - `val1` - `key`
  - `val2` - `val`
  - `val3` - `rw`
- `Memory`
  - `val1` - `key`
  - `val2` - `val`
  - `val3` - `rw`
- `Storage`
  - `val1` - `key`
  - `val2` - `val`
  - `val3` - `rw`
  - `val4` - `val_prev`
  - `val5` - `is_first_touch`
- ...
In the above sample, the bus mapping should be like (order by gc increasingly):
gc | target | val1 | val2 | val3 | Note |
---|---|---|---|---|---|
1 | Stack | 1023 | 0x40 | Write | 54 PUSH1 40 |
2 | Stack | 1023 | 0x40 | Read | 56 MLOAD.POP |
3 | Stack | 1023 | 0x80 | Write | 56 MLOAD.PUSH |
4 | Memory | 0x40 | 0x80 | Read | 56 MLOAD |
5 | Stack | 1022 | 0xdeadbeef | Write | 57 PUSH4 |
6 | Stack | 1023 | 0x80 | Read | 62 DUP2 |
7 | Stack | 1021 | 0x80 | Write | 62 DUP2 |
8 | Stack | 1021 | 0x80 | Read | 63 MSTORE.POP |
9 | Stack | 1022 | 0xdeadbeef | Read | 63 MSTORE.POP |
10 | Memory | 0x80 | 0xdeadbeef | Write | 63 MSTORE |
11 | Stack | 1022 | 0xfaceb00c | Write | 64 PUSH4 |
12 | Stack | 1023 | 0x80 | Read | 69 DUP2 |
13 | Stack | 1021 | 0x80 | Write | 69 DUP2 |
14 | Stack | 1021 | 0x80 | Read | 70 MLOAD.POP |
15 | Stack | 1021 | 0xdeadbeef | Write | 70 MLOAD.PUSH |
16 | Memory | 0x80 | 0xdeadbeef | Read | 70 MLOAD |
17 | Stack | 1021 | 0xdeadbeef | Read | 71 ADD.POP |
18 | Stack | 1022 | 0xfaceb00c | Read | 71 ADD.POP |
19 | Stack | 1022 | 0x1d97c6efb | Write | 71 ADD.PUSH |
20 | Stack | 1023 | 0x80 | Read | 72 DUP2 |
21 | Stack | 1021 | 0x80 | Write | 72 DUP2 |
22 | Stack | 1021 | 0x80 | Read | 73 MSTORE.POP |
23 | Stack | 1022 | 0x1d97c6efb | Read | 73 MSTORE.POP |
24 | Memory | 0x80 | 0x1d97c6efb | Write | 73 MSTORE |
25 | Stack | 1022 | 0xcafeb0ba | Write | 74 PUSH4 |
26 | Stack | 1021 | 0x20 | Write | 79 PUSH1 |
27 | Stack | 1023 | 0x80 | Read | 81 DUP3 |
28 | Stack | 1020 | 0x80 | Write | 81 DUP3 |
29 | Stack | 1020 | 0x80 | Read | 82 ADD.POP |
30 | Stack | 1021 | 0x20 | Read | 82 ADD.POP |
31 | Stack | 1021 | 0xa0 | Write | 82 ADD.PUSH |
32 | Stack | 1021 | 0xa0 | Read | 83 MSTORE.POP |
33 | Stack | 1022 | 0xcafeb0ba | Read | 83 MSTORE.POP |
34 | Memory | 0xa0 | 0xcafeb0ba | Write | 83 MSTORE |
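One way to read the final check is that the bus mapping must contain exactly one record for each `gc` from `1` up to the last `gc` of the execution. The sketch below merges stack and memory rows (skipping init rows) and tests that property; the function names and tuple layouts are assumptions for illustration, not the note's definition of the check.

```python
def build_bus_mapping(stack_rows, memory_rows):
    """Merge (key, val, rw, gc) rows into (gc, target, val1, val2, val3), ordered by gc."""
    records = []
    for target, rows in (("Stack", stack_rows), ("Memory", memory_rows)):
        for key, val, rw, gc in rows:
            if gc == "Init":
                continue  # init rows are not exposed to the evm circuit
            records.append((gc, target, key, val, rw))
    return sorted(records, key=lambda record: record[0])


def gc_is_contiguous(bus_mapping):
    """True when the gc values are exactly 1, 2, ..., len(bus_mapping)."""
    return [record[0] for record in bus_mapping] == list(range(1, len(bus_mapping) + 1))


# First four accesses of the sample (54 PUSH1 40 and 56 MLOAD).
stack_rows = [
    (1023, 0, "Write", "Init"),
    (1023, 0x40, "Write", 1),
    (1023, 0x40, "Read", 2),
    (1023, 0x80, "Write", 3),
]
memory_rows = [
    (0x40, 0, "Write", "Init"),
    (0x40, 0x80, "Read", 4),
]
bus_mapping = build_bus_mapping(stack_rows, memory_rows)

# An extra write smuggled in at an already used gc breaks the contiguity check.
tampered = build_bus_mapping(stack_rows + [(1022, 0x1, "Write", 3)], memory_rows)
```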
Across different EOA calls or internal calls, the stack and memory are separated by `call_context`. This is beneficial in two ways:
- For `CALLDATACOPY` or `RETURNDATACOPY`, we just locate the memory by the caller's or callee's `call_context` and ensure the data is indeed there.
- We can compress `call_context` directly in the evm circuit, and decompress it when switching back to the caller's context.

This note only describes how the state circuit works within the same call and doesn't cover `call_context`, see here for more information.
Memory is actually a byte array that can be accessed at any position, so how do we handle operations that overlap? For example, `mstore(0x20, 0x1234)` + `mstore(0x1f, 0x56)` => `mload(0x20) = 0x5634`
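A byte-addressed model makes overlapping accesses easy to reason about. The following sketch is a toy memory (the class `ByteMemory` is made up for this example) in which every `mstore` and `mload` touches 32 consecutive bytes in big-endian order, as the EVM does. It stores a word at `0x20` and then another at `0x1f`, so the second store overlaps the first in 31 bytes.

```python
class ByteMemory:
    """Toy EVM memory: a sparse byte array where unwritten bytes read as zero."""

    def __init__(self):
        self.data = {}

    def mstore(self, offset, value):
        # A 32-byte big-endian word becomes 32 single-byte writes.
        for i, byte in enumerate(value.to_bytes(32, "big")):
            self.data[offset + i] = byte

    def mload(self, offset):
        # A word read becomes 32 single-byte reads.
        word = bytes(self.data.get(offset + i, 0) for i in range(32))
        return int.from_bytes(word, "big")


mem = ByteMemory()
mem.mstore(0x20, 0x1234)  # bytes 0x3e, 0x3f become 0x12, 0x34
mem.mstore(0x1F, 0x56)    # byte 0x3e becomes 0x56, bytes 0x1f..0x3d become 0
result = mem.mload(0x20)
```

The second store overwrites byte `0x3e` but leaves `0x3f` untouched, so `result` is `0x5634`.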
Another TBD part is how large a memory address we allow to be accessed. If we check the non-strict monotonicity of memory addresses by lookup, a larger memory address will require a larger table (or more cells to store chunks if using a smaller table).
For now in the evm, the gas cost of memory expansion is quadratic, and if we assume the gas limit per block is up to \(20,000,000\), we can solve an inequality to find the max memory address that can actually be accessed:
\[ \begin{aligned} & 3\cdot\textsf{memSizeWord} + \frac{\textsf{memSizeWord}^2}{512} \le 20,000,000 \\ & \textsf{memSizeWord} \le 2^{16.62} \\ & \textsf{memSize} = 32 \cdot \textsf{memSizeWord} \le 2^{21.62} \end{aligned} \]
Naively we can have a really large table from \(1\) to \(2^{22}\) and look up `key - key_prev` to check the non-strict monotonicity in the memory circuit. Or we can split it into 10-bit chunks and look up each in a smaller 10-bit table.
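The bound can be reproduced numerically. This sketch assumes the EVM memory cost of `3 * words + floor(words^2 / 512)` and a gas limit of 20,000,000, then shows one way to split an address difference into 10-bit chunks; `max_mem_words` and `split_chunks` are names invented for the example, and three chunks cover 30 bits, which is looser than the \(2^{22}\) bound unless the top chunk is range-limited further.

```python
import math

GAS_LIMIT = 20_000_000


def mem_cost(words):
    """Gas cost of expanding memory to `words` 32-byte words."""
    return 3 * words + words * words // 512


def max_mem_words(gas_limit):
    """Largest word count whose expansion cost fits in gas_limit (binary search)."""
    lo, hi = 0, 1
    while mem_cost(hi) <= gas_limit:
        hi *= 2
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if mem_cost(mid) <= gas_limit:
            lo = mid
        else:
            hi = mid
    return lo


def split_chunks(value, bits=10, count=3):
    """Split value into `count` little-endian chunks of `bits` bits each."""
    mask = (1 << bits) - 1
    return [(value >> (bits * i)) & mask for i in range(count)]


words = max_mem_words(GAS_LIMIT)
mem_size = 32 * words
chunks = split_chunks(0xA0 - 0x80)
```

With these assumptions `words` is `100427`, about \(2^{16.62}\), so `mem_size` stays below \(2^{22}\).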
We don't validate the stack pointer in the stack table. Instead we calculate this in the EVM and use that to check with index to read from.
barry
`DUPX`, `SWAPX`?
How do we ensure the swapped or duplicated value is the same as the other one?
My thought is that in the EVM proof we check 2 reads for the values that are being swapped and then we update both of these values.
This makes swap a multi-slot opcode, but we keep the bus mapping constant length, which will be helpful in other places.
Curious to hear your thoughts on this tradeoff
barry
`MSTORE`
When `MSTORE` happens, it is accompanied by two `Read`s at different stack `key`s, and we have to combine them in a bus mapping lookup to ensure that `val` (second `Read`) is stored to `memory` at `key` (first `Read`). But in the `stack` table we sort them by `key` and then `gc`, so they are far away from each other. Need to think more on this.
barryWhiteHat in the evm proof everything is looked up in order according to gc. So in the evm proof we will open the bus mapping at gc and see two stack ops, pop and pop, getting the index and the value. Then we will see a single memory op which writes the value to that index.
barryWhiteHat since bus mapping currently only has one stack element per gc we may have to use two gc elements. One for the stack pop of the index and another for the stack pop of the value.
barryWhiteHat this is a difficult tradeoff to make and raises the question: how big should the bus mapping for stack elements be?
`valX`)?
In the above approach, it uses `(key, val, rw)` as `(val1, val2, val3)` in the bus mapping contributed by the `stack` table, which works for most ops. But it always costs 2 slots for ops like `DUPX`, because in the evm circuit we need to look up a `Read` at the duplication source `(sp + X, $val, Read)` and a `Write` at the stack top `(sp, $val, Write)`.
If we can have an extra `val4` in the bus mapping, we can set them to `(sp, $val, Write, sp + X)` and only use 1 slot in the evm circuit (we save a `Read` check). It costs one more multiplication and addition in `compress`, but seems not to increase the constraint degree because they are multiplied by a constant (randomness) and added at the end.
han Is this correct? Would it increase the constraint degree?
barryWhiteHat My thinking here is that the DUP constraint check would have a different number of compressed elements inside it.
barryWhiteHat I don't quite understand what you mean by increasing the degree of the constraint. Because all the r's have already been put to the powers we need and it's just a mul and add.
Take another example: for `MLOAD`, we can make the bus mapping `(sp, $val, Write, $key)`, where `$key` should be the previous row's `val` in the `stack` table. Then `($key, $val)` should serve as `(key, val)` to look up the `memory` entry. Then we get a 1-slot `MLOAD` instead of 2.
han Does this make sense? Still stuck on how we build such an entry `(sp, $val, Write, sp + X)` into the bus mapping in the stack circuit.
barryWhiteHat What does the bus mapping look like in this example?
han So take `gc = 3` in the bus mapping as an example: the bus mapping becomes `(3, Stack, 1023, 0x80, Write, 0x40)` and `gc = 2` could be saved.
In this note, all the values of the stack should be in compressed form instead of actual values, and the actual values will not be used in the state circuit; here we use actual values for clarity.
Actual values will be split into an 8-bit array (byte array) and we use the same `compress` function to compress them into a single `fq`
barryWhiteHat Yes
`val` of two slots in an op are the same if we are expecting?
Take `DUPX` as an example: how do we make sure the two lookups have the same `$val` in different slots? I think a copy constraint won't work because we never know which one is in which slot.
barryWhiteHat Agree on copy constraints. In these cases we can look back at our previous constraints' wires and take the value from there. Does this make sense?
han Sounds good. So we also have to care about the shifted access, because we have to open another shifted point when doing Kate, which increases the proof size a little bit. But if we succeed in making all opcodes single slot, the problem is gone.