## Intro
This document proposes a minimal *ExecutionWitness* data structure as input to guest programs for L1 block proving. Having an initial version of this data structure can unblock many other lines of work, as will be detailed below.
Please feel free to leave comments!
## Context
Let’s quickly refresh where this *ExecutionWitness* fits into the big picture.
To generate a proof, we need two main components:
1. The EL (zkVM guest program)
2. A zkVM that executes it and generates a proof of the execution trace.
The EL guest program is a stateless implementation of the state transition function (STF), so it must receive all required data as inputs (i.e., there isn’t any “disk” to read from).
A way to visualize this is:

The `StatelessInput` is what is passed as input to the EL:
- `block`: the full block being proven/validated.
- `execution_witness`: contains all the data (and proofs of its correctness) required to execute the STF.
- `chain_config`: contains the chain configuration so the STF can be configured properly to apply fork rules.
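To make this shape concrete, below is a minimal sketch of `StatelessInput` as a Python dataclass. The `Block` and `ChainConfig` type names are placeholders, not part of this proposal:

```python
from dataclasses import dataclass

# Minimal sketch; `Block` and `ChainConfig` are placeholder names for whatever
# concrete types the spec ends up defining.
@dataclass
class StatelessInput:
    block: "Block"                         # full block being proven/validated
    execution_witness: "ExecutionWitness"  # all data required to run the STF
    chain_config: "ChainConfig"            # chain configuration for fork rules
```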
The *ExecutionWitness* is also a core part of a proposed new RPC `debug_executionWitness` (name TBD). For a given block number/hash, this RPC returns the `ExecutionWitness` for it.
The e2e workflow for a prover for optional proofs would be (roughly):

In a reality where proofs are mandatory, the creator of the execution witness is the builder. In this case, the execution witness will be generated during block construction.
Note that in this document we mainly focus on the `ExecutionWitness` field of `StatelessInput`, since it is the most important (and complex) one to standardize. We also assume all EL guest programs use this `StatelessInput` structure as private input. **This topic requires proper discussion, and we'll kick it off in a separate document covering the pros/cons of standardizing the guest program API.** In a potential reality where the guest program API isn't standardized, step 2 in the diagram above would involve an `ExecutionWitness -> CustomExecutionWitness` transformation into whatever form the guest program prefers.
## Why is it helpful to agree on a standard *ExecutionWitness*?
Note that *ExecutionWitness* is part of the private inputs to the guest program; thus, it might not be obvious why standardizing it is helpful. Let’s explore this a bit more.
### Testing and benchmarking
One of the main reasons is to provide the testing framework and related infrastructure with a clear API to target EL guest programs. If there isn’t a minimal agreement on the primary input for guest programs, we can’t build shared testing cases, benchmark infrastructure, or fallback processes.
On the specs and testing side, it allows generating the *ExecutionWitness* for all existing tests. EL clients would run EEST tests as usual (i.e., in full-node mode), but also in stateless mode, which is the mode that runs as a guest program. This gives us confidence that the stateless EL version still passes all EEST tests, just as the full-node version does.
Moreover, generating the *ExecutionWitness* as part of the test fixtures also allows us to verify that the generation logic of the `debug_executionWitness` RPC works as expected. That is, ELs must also run all EEST fixtures, generate the execution witness, and verify that it matches the expected execution witness in the fixture (i.e., the one produced by the spec implementation). This provides more confidence that the underlying logic for `debug_executionWitness` works as expected.
In addition to leveraging existing tests to run in stateless mode, we must also create new test cases. Although 95% of the guest program logic is the STF, which is the same across full and stateless node modes, stateless mode also includes custom logic, mainly input verification. This is critical logic to cover: for example, if an EL client implementation has faulty pre-state verification logic, it can produce proofs that appear valid but correspond to an incorrect STF. The only way to create these tests is to leverage the current testing framework (e.g., creating tests with invalid execution witnesses and verifying that the EL rejects them in stateless mode), thus we need standardized fixtures that all EL clients can consume.
Benchmarking for worst cases is a specific case of testing. The current worst-case scenarios were constructed as [execution-specs](https://github.com/ethereum/execution-specs) tests. This means benchmarking can leverage each EL's test support to run benchmarks against them as well.
In summary, agreeing on such an important data structure allows building tests, tools, and benchmarks independently of custom EL client design decisions. For any EL, accepting only standardized input means immediate access to a useful set of processes for testing and benchmarking its implementation.
### Avoid infrastructure fragmentation
Having the standardized `debug_executionWitness` RPC, which returns the execution witness structure, provides more robust infrastructure in the case of unexpected bugs in production. If the `debug_executionWitness` RPC is standardized, any EL's RPC can generate the required input data for any *other* EL used as a guest program in proofs. If this isn't the case and the non-standardized logic for generating inputs for a particular EL breaks, there isn't an easy workaround in production, which can be risky.
### Live networks datasets
In addition to running synthetic cases created with the testing framework, it is also useful to generate datasets of historical blocks in existing networks, such as devnets, testnets, and mainnet. These sets allow any EL client that complies with this standard to test or benchmark against a big set of real blocks.
## Proposal
This section contains a proposal for the *ExecutionWitness*. Note that this document is not a spec, since explaining it precisely without code is very challenging. The goal is to give a high-level description with enough detail to gain confidence that this proposal is worth implementing in *execution-specs*, which will become the specs.
The proposed *ExecutionWitness* is:
```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExecutionWitness:
    nodes: List[bytes]      # MPT nodes for pre-state access and post-state root calculation
    bytecodes: List[bytes]  # contract bytecodes accessed from the pre-state
    ancestors: List[bytes]  # RLP-encoded ancestor block headers
```
### `nodes` field
The high-level explanation of this field is that it is the minimal union of:
1. Nodes from Merkle Patricia Trie (MPT) proofs of every state access done in the account trie or storage tries during the STF, i.e., both proofs of presence and proofs of absence.
2. Trie nodes required for post-state root calculations due to branch compressions.
Regarding point 1., it can be constructed by recording all state accesses that happened during the STF execution into a set, and generating an MPT multi-proof of this set.
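As an illustration, a minimal sketch of this construction might look as follows, where `mpt_proof_nodes` is an assumed helper that returns the RLP-encoded nodes on the path to a key (covering both presence and absence proofs):

```python
from typing import List, Set

def build_nodes(pre_state_trie, accessed_keys: Set[bytes]) -> List[bytes]:
    # Union of all proof nodes for every account/storage key touched by the STF.
    nodes: Set[bytes] = set()
    for key in accessed_keys:
        nodes.update(mpt_proof_nodes(pre_state_trie, key))  # assumed helper
    # Deduplicated and sorted ascending, matching the canonical form described below.
    return sorted(nodes)
```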
Regarding point 2., we must define a standardized procedure on how the post-state root is calculated. This is required because applying state changes in different orders can yield different observed branch compressions, so different auxiliary sibling nodes are needed during tree updates.
We propose applying state changes in the following order:
1. Apply all inserts and updates from the state diff. Ordering is not relevant.
2. Apply all deletions. Ordering is not relevant.
This proposed order, which applies all insertions and updates first, minimizes branch compression when deletions are applied. This helps minimize the number of auxiliary nodes required for post-state root calculations. Note this indirectly adds an extra rule on how the post-state tree is updated in stateless mode; not following it might lead to required nodes not being found.
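A sketch of this update procedure, assuming a hypothetical `trie` object with `update`/`delete`/`root` methods and a `state_diff` mapping trie keys to new values (or `None` for deletions):

```python
from typing import Dict, Optional

def apply_state_diff(trie, state_diff: Dict[bytes, Optional[bytes]]) -> bytes:
    # 1. Apply all inserts and updates; ordering among them is irrelevant.
    for key, value in state_diff.items():
        if value is not None:
            trie.update(key, value)
    # 2. Apply all deletions; ordering among them is irrelevant.
    for key, value in state_diff.items():
        if value is None:
            trie.delete(key)
    return trie.root()  # post-state root
```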
The proposed way to encode the nodes in the `nodes` list is RLP. While RLP isn’t convenient for deserialization performance inside the guest program, it is naturally required for MPT proof verification (i.e., hashing the RLP-encoded node). This list of `nodes` can effectively be used as a map of `keccak(node) -> node` to do pre-state checking.
As a final note, this field must:
- Not contain duplicate values. MPT proofs will share nodes, so duplicates would naturally arise; there is no point in including a node more than once.
- Not contain unnecessary values (e.g., trie nodes that aren't required), and be sorted in ascending lexicographic order, i.e., be canonical. Other orderings might feel more convenient, such as trie-access order, but they complicate the deduplication requirement.
Regarding verification of this field, the provided nodes must correspond to the pre-state trie, i.e., the one whose root equals the state root of the parent of the target block being proven. Said differently, the guest program must verify that `nodes` forms a valid MPT multi-proof. If this check isn’t done, the STF would run on invalid data.
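A sketch of how the guest program might use `nodes` for pre-state checking, assuming `keccak` from the `eth-hash` package (any keccak-256 implementation works) and a hypothetical `InvalidWitness` error:

```python
from typing import Dict, List

from eth_hash.auto import keccak  # any keccak-256 implementation works

class InvalidWitness(Exception):
    """Raised when the execution witness fails verification."""

def build_node_map(nodes: List[bytes]) -> Dict[bytes, bytes]:
    # `nodes` is effectively a map of keccak(node) -> node.
    return {keccak(node): node for node in nodes}

def resolve_node(node_map: Dict[bytes, bytes], node_hash: bytes) -> bytes:
    # Trie traversal starts from the parent block's state root; any hash that
    # can't be resolved means the witness is incomplete or invalid.
    node = node_map.get(node_hash)
    if node is None:
        raise InvalidWitness("referenced trie node missing from `nodes`")
    return node
```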
### `bytecodes` field
Every account bytecode that exists in the pre-state and is accessed during the STF must be captured and included in this field.
Examples of operations that trigger contract bytecode access are:
- Execution of `CALL`, `CALLCODE`, `STATICCALL`, `DELEGATECALL`, `EXTCODECOPY`, or `EXTCODESIZE`. Note that `CODECOPY` and `CODESIZE` won’t trigger a new bytecode access, since a previous opcode's execution should have already triggered this event for the target bytecode. `EXTCODEHASH` can be resolved from leaf nodes, thus doesn’t require providing the bytecode. New bytecode created by `CREATE` and `CREATE2` doesn’t live in `bytecodes`, since it was created at runtime during the block execution and thus isn’t required as input.
- System contracts called during the STF.
The `bytecodes` field is a minimal, deduplicated list of these accessed raw bytecodes, sorted in ascending lexicographic order.
Regarding the validation of this field, if a call tries to execute code whose `codeHash` is not matched by any element of `bytecodes`, the witness is invalid and the STF must fail.
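A sketch of this validation, reusing the assumed `keccak` and `InvalidWitness` from the previous sketch:

```python
from typing import Dict, List

def build_code_map(bytecodes: List[bytes]) -> Dict[bytes, bytes]:
    # Index the provided bytecodes by their code hash.
    return {keccak(code): code for code in bytecodes}

def code_for(code_map: Dict[bytes, bytes], code_hash: bytes) -> bytes:
    code = code_map.get(code_hash)
    if code is None:
        # An executed codeHash with no matching bytecode invalidates the witness.
        raise InvalidWitness("executed codeHash not present in `bytecodes`")
    return code
```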
### `ancestors` field
This field contains block headers that are ancestors of the target block.
This field is required for two reasons:
- System contracts may require access to previous headers, e.g., see [EIP-2935](https://eips.ethereum.org/EIPS/eip-2935), which requires the parent header.
- The `BLOCKHASH` opcode's valid block range is the last 256 blocks.
This means that, on current mainnet, this field will always have at least one element (i.e., the parent header) and at most 256 elements. Note that if the target block being proven has block number N, and a single `BLOCKHASH` asks for block number N-k, then the whole range of block numbers `[N-k, N-1]` will be needed, since the chain of parent hashes must be verifiable without gaps.
The `ancestors` field is a list of block headers encoded using RLP. Note this encoding isn’t efficient, but it avoids introducing new encoding formats for the EL; this decision can be reconsidered in future iterations of this standard.
Regarding verification, all provided ancestors must be validated by checking the chain of parent hashes. If this verification isn’t done, or gaps are allowed in the chain, invalid ancestor hashes could be served to `BLOCKHASH`. Also, due to the `BLOCKHASH` definition, this list can’t have more than 256 entries.
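A sketch of this verification, assuming the list is ordered from the parent header backwards and a hypothetical `decode_header` that decodes an RLP header and exposes its `parent_hash` (reusing `keccak` and `InvalidWitness` from above):

```python
from typing import List

def verify_ancestors(ancestors: List[bytes], parent_hash: bytes) -> None:
    # BLOCKHASH only ever reaches 256 blocks back, so longer lists are invalid.
    if len(ancestors) > 256:
        raise InvalidWitness("`ancestors` can't have more than 256 entries")
    expected = parent_hash
    for raw_header in ancestors:
        # Each header's hash must match the previous header's parent hash;
        # this forbids gaps or swapped-in invalid headers.
        if keccak(raw_header) != expected:
            raise InvalidWitness("gap or mismatch in ancestor chain")
        expected = decode_header(raw_header).parent_hash  # assumed helper
```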
## Future EIPs impact
Two main EIPs could impact the current proposal, but luckily, they do so for the better.
[EIP-7709](https://eips.ethereum.org/EIPS/eip-7709) proposes serving `BLOCKHASH` from the saved values in the [EIP-2935](https://eips.ethereum.org/EIPS/eip-2935) contract. This allows removing the `ancestors` field and only leaving the parent header (for Pectra or newer forks).
Any EIP that implements bytecode chunkification, like [EIP-7864](https://eips.ethereum.org/EIPS/eip-7864) or [EIP-2926](https://eips.ethereum.org/EIPS/eip-2926), would remove the `bytecodes` field entirely, since code chunks would naturally become part of `nodes`.
## Conclusion
This document aims to start the discussion on whether the proposed first version of a standardized `ExecutionWitness` makes sense, so we have solid ground to start making progress on _execution-specs_ work (tests and specs), and for EL clients implementing the `debug_executionWitness` RPC.
This wouldn't be the final version of this standard; after we have some working implementations, we can keep iterating on it.