(`STOP`, `RETURN`, or `REVERT`)

What's annoying is that there are 142 opcodes in total, and each of them has a few possible execution results, which sum to around 470 possible execution results (plus a few call-initialization cases).
If we treat `ErrOutOfGas` as 2 different errors, `ErrOutOfConstantGas` and `ErrOutOfDynamicGas`, and group the former with other common errors that can be verified easily into a single case (others like `ErrStackUnderflow`, `ErrStackOverflow`, `ErrWriteProtection`, `ErrDepth`, `ErrInsufficientBalance`), then we have 187 possible execution results, which seems better but is still a lot for a circuit.
In the end there will be fewer, because we will find that some opcodes can be handled without any extra cost (like `DUP*` and `SWAP*`).
In the current implementation I didn't treat common errors as a single case because I thought it wouldn't save much, but now it seems we should do this to avoid implementation redundancy. Then only around 30 opcodes need 2 cases (success and dynamic OOG), and only `CREATE*` needs to be handled specially. The others only need to focus on the success case.
han
See here for the estimation of possible execution results. Or see evm-opcodes by wolflo for a quick reference on EVM opcodes, which also explains the gas costs fantastically.
The pre-compiled contracts are not considered yet; it seems we could treat them as a special opcode that constrains the address to some fixed set of values.
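As a rough illustration of that idea, here is a minimal sketch (the name `is_precompile` and the plain-function form are my own, not from any implementation) of constraining the callee address to the precompile range; a circuit would express the same set membership as a constraint or lookup:

```rust
// Hypothetical sketch: treating precompiled contracts as a special opcode
// whose address is constrained to a fixed set of values.
// Addresses 0x01..=0x09 cover the nine precompiles as of Istanbul.
fn is_precompile(address: u64) -> bool {
    (0x01..=0x09).contains(&address)
}

fn main() {
    assert!(is_precompile(0x01)); // ecRecover
    assert!(is_precompile(0x09)); // blake2f
    assert!(!is_precompile(0x0a)); // an ordinary account
    println!("ok");
}
```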
Assume finally we have each polynomial degree to
Reference. Not entirely sure about this; need @yingtong's confirmation.
han
`gc`, `sp`, etc…)

- `is_padding` to identify whether it's a padding row
- `is_executing` to identify whether it's executing bytecode or initializing a call
- `q_ops` for op gadgets; each op gadget could handle more than 1 op (80 comes from `142 - (num_push - 1) - (num_dup - 1) - (num_swap - 1)`, and could be less in the future)
- `global_counter`, `call_id`, `bytecode_source_kind`, `bytecode_source`, `stack_pointer`, etc…

(e.g. `CALL`, which pops 7 words from the stack)

When memory is used to
If we can assume the memory will always be in
The reason is most opcode uses
But not sure if such an incompatibility with the EVM would break some current DApps; need to do more investigation.
Currently we have so many subset arguments… A subset argument costs 3 commitments and 5 evaluations (shuffled table + shuffled input accessible to the previous row + 1 shuffle product accessible to the next row), which seems really costly.
Reference. Not sure if it's correct in our case; need @yingtong's and @kilic's confirmation.
han
A narrower circuit seems reasonable because we can then make good use of a single subset argument, but it still has some drawbacks:
For each subset argument, we handle the layout `(tag * input, tag * table)` for the all-zero row when disabled.

There are several approaches to do branching (creation of a bool expression for each branch):
| Description | Degree Cost | Cell Cost | Note |
| --- | --- | --- | --- |
| Lagrange basis in single cell | | | State circuit currently uses this |
| Binary composition in | | | |
| Stand-alone booleans in | | | Current implementation uses this nestedly |
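To make the first row of the table concrete, here is a toy sketch (my own code; it uses exact integer division where a real circuit would multiply by field inverses baked into fixed coefficients) of the Lagrange-basis-in-a-single-cell approach: one cell holds the case index `v`, and branch `i`'s selector is the Lagrange basis polynomial that is 1 at `v == i` and 0 at every other index, with degree `num_branches - 1`:

```rust
// Selector for branch i, given a single cell holding case index v:
// l_i(v) = prod_{j != i} (v - j) / (i - j).
// Over the grid v in {0, .., num_branches-1} the division is exact.
fn lagrange_selector(num_branches: i64, i: i64, v: i64) -> i64 {
    let mut num = 1i64;
    let mut den = 1i64;
    for j in 0..num_branches {
        if j != i {
            num *= v - j;
            den *= i - j;
        }
    }
    num / den
}

fn main() {
    // With 3 branches the selector is a degree-2 polynomial in v,
    // equal to 1 exactly on its own branch index.
    for v in 0..3 {
        for i in 0..3 {
            let expected = if i == v { 1 } else { 0 };
            assert_eq!(lagrange_selector(3, i, v), expected);
        }
    }
    println!("ok");
}
```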
Wondering if there are more choices to do efficient branching.
han
Since we have 186 possible execution results (might be much less in the future), we can use binary composition with 8 bits to represent each case, which leads to a degree-8 synthetic selector. That would be too high for the previous limitation, but if we relax the mul degree bound to 16, then we can reduce 80 selectors to only 8. It seems not beneficial, because we only save the height from 7 to 5 while our total poly degree is also halved, and the linearisation could become much more complicated.
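A minimal sketch of what that binary composition looks like outside a circuit (the function names are hypothetical; a real circuit would additionally constrain each cell to be boolean): 8 boolean cells hold the bits of the active case index, and the synthetic selector for case `c` is a product of 8 factors, one per bit, hence degree 8, distinguishing up to 2^8 = 256 cases:

```rust
// Synthetic selector for `case`: product over bits of b[k] or (1 - b[k]),
// depending on whether bit k of `case` is 1 or 0. Equals 1 iff the witnessed
// bits encode exactly `case`.
fn synthetic_selector(bits: &[u64; 8], case: u8) -> u64 {
    (0..8).fold(1u64, |acc, k| {
        let bit_of_case = (case >> k) & 1;
        acc * if bit_of_case == 1 { bits[k] } else { 1 - bits[k] }
    })
}

// Witness generation: decompose the active case index into 8 boolean cells.
fn decompose(case: u8) -> [u64; 8] {
    let mut bits = [0u64; 8];
    for k in 0..8 {
        bits[k] = ((case >> k) & 1) as u64;
    }
    bits
}

fn main() {
    let witness = decompose(142); // the cells hold the bits of the active case
    assert_eq!(synthetic_selector(&witness, 142), 1); // selects its own case
    assert_eq!(synthetic_selector(&witness, 141), 0); // and no other case
    println!("ok");
}
```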
The reason the previous approach doesn't save much is that the 4 words still dominate the step cost. If we can outsource word operations, then with the previous approach the circuit could be much less wide.
So operations like arithmetic, bitwise, comparator, etc…, which need to be verified byte by byte, will be handled in another circuit as a table.
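For intuition, here is a sketch of my own construction (not the actual table circuit) of verifying a bitwise AND byte by byte against a precomputed table of all valid byte triples, which is the lookup-table shape such operations take:

```rust
use std::collections::HashSet;

// Fixed table of every valid (a, b, a & b) byte triple: 256 * 256 rows.
fn build_and_table() -> HashSet<(u8, u8, u8)> {
    let mut table = HashSet::new();
    for a in 0..=255u8 {
        for b in 0..=255u8 {
            table.insert((a, b, a & b));
        }
    }
    table
}

// Verify a claimed AND result by looking up each byte triple in the table,
// instead of constraining the operation directly in the EVM circuit.
fn verify_and(table: &HashSet<(u8, u8, u8)>, a: &[u8], b: &[u8], c: &[u8]) -> bool {
    a.len() == b.len()
        && b.len() == c.len()
        && a.iter()
            .zip(b)
            .zip(c)
            .all(|((&x, &y), &z)| table.contains(&(x, y, z)))
}

fn main() {
    let table = build_and_table();
    assert!(verify_and(&table, &[0xff, 0x0f], &[0xa5, 0xf0], &[0xa5, 0x00]));
    assert!(!verify_and(&table, &[0xff], &[0xa5], &[0xa4])); // wrong claim
    println!("ok");
}
```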
For some special cases like bytecode copy, we can assume the bytecode table is constrained to order the bytecode by index, then do a lookup like:
```rust
meta.lookup(|meta| {
    // ...
    // Encode the table word with 32 consecutive rotations instead of
    // decompressing the word in the EVM circuit step.
    let a_word = meta.query_advice(advice_opcode, Rotation::cur());
    let t_word = (0..32)
        .map(|rotation| meta.query_advice(table_opcode, Rotation(rotation)))
        .enumerate()
        .fold(Expression::Constant(F::zero()), |acc, (idx, byte)| {
            // accumulate byte * r^idx into a random linear combination
            acc + byte * r.exp(idx)
        });
    vec![
        // ...
        (a_word, t_word),
    ]
});
```
Then we don't need to decompress the word into 32 bytes in the EVM circuit.
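For intuition, here is a toy model of that random linear combination (my own code, using `u128` arithmetic modulo a small Mersenne prime in place of the real field): each word is compressed to the single value `sum(byte[i] * r^i)`, so equality of the compressed values stands in for byte-by-byte equality with overwhelming probability over the choice of `r`:

```rust
// Compress a byte string into sum(byte[idx] * r^idx) mod `modulus`,
// mirroring the enumerate/fold accumulation in the lookup above.
fn rlc(bytes: &[u8], r: u128, modulus: u128) -> u128 {
    bytes.iter().enumerate().fold(0u128, |acc, (idx, &byte)| {
        let mut term = byte as u128;
        for _ in 0..idx {
            term = term * r % modulus; // term = byte * r^idx
        }
        (acc + term) % modulus
    })
}

fn main() {
    let modulus = (1u128 << 61) - 1; // a Mersenne prime as a toy field
    let r = 123_456_789;
    let word = [7u8; 32];
    let mut other = word;
    other[31] = 8;
    // Distinct words compress to distinct values (w.h.p. over r), which is
    // what makes comparing the compressed values sound.
    assert_ne!(rlc(&word, r, modulus), rlc(&other, r, modulus));
    assert_eq!(rlc(&word, r, modulus), rlc(&word, r, modulus));
    println!("ok");
}
```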
For other cases like memory copy, we need some other constraints on the bus mapping to perform such a trick.
Currently there is a fixed selector that serves as a pivot to enable each step; it is enabled in the first row of every 10 rows to fit the most costly case.

What if we replace the fixed selector with a witnessed one? We then use another fixed selector to constrain the first row's witness pivot to 1, and each case constrains the next (N+1)-th row's witness pivot to 1, where N is the number of rows the case requires. Then we never waste any rows.
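A toy simulation of that layout (my own convention: each step of height N places the next pivot N rows below its own, so the exact off-by-one convention may differ from the text):

```rust
// Build the witnessed pivot column: row 0 is forced to 1 by a fixed
// selector, and each step of height `height` forces the pivot `height`
// rows below itself to 1, so steps pack back to back with no gaps.
fn pivot_column(step_heights: &[usize], total_rows: usize) -> Vec<u8> {
    let mut pivot = vec![0u8; total_rows];
    let mut row = 0;
    pivot[row] = 1; // fixed selector constrains the first pivot
    for &height in step_heights {
        row += height; // this step constrains the pivot `height` rows down
        if row < total_rows {
            pivot[row] = 1;
        }
    }
    pivot
}

fn main() {
    // Steps of heights 3, 1, and 5 occupy rows 0..3, 3..4, and 4..9:
    let pivot = pivot_column(&[3, 1, 5], 9);
    assert_eq!(pivot, vec![1, 0, 0, 1, 1, 0, 0, 0, 0]);
    // A fixed every-10-rows pivot would have used 30 rows for these 3 steps.
    println!("ok");
}
```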
To visualize, we can treat the circuit as a giant table; each gadget just slides from top to bottom and verifies that the cells satisfy the constraints wherever it's enabled, so we should be able to do this trick.

Not sure if there are some drawbacks (security issues, more complicated linearisation on the verifier, etc…).