zkVM on Ethereum - Guest input standardization exploration

*Thanks Kev for the feedback*

## Intro

This document explores a proposed standard for Ethereum execution-layer (EL) zkVM guest program inputs: a unified `StatelessInput` API and its encoding. We look at the tradeoffs of asking all EL clients to adopt the same format for the private inputs to their STF guest programs.

We aim to answer:
1. Can we define a “fungible” input interface such that, for a given block, the same input bytes can be used to run any EL guest program?
2. Do the costs of a standardized format (deserialization overhead, generic structures, encoding choice) outweigh the interoperability, testing, and maintenance benefits?

The intended readers are EL and specs core developers, and prover implementers who integrate with EL zkVM guest programs.

This is an open discussion, and the opinions of EL clients are very important, so please feel free to chime in!

## What are the private inputs for the EL guest program?

For Ethereum, zkVMs prove the execution of a specific program. This program implements the logic to run the state transition function (STF) for a block. zkVM guest programs have no access to external resources such as the network or disk, so any data beyond the block (e.g., state) must be provided as input to the program.

The following is a Python-style guest program for Ethereum:

```python
from dataclasses import dataclass

# NOTE: `zkvm_sdk`, `decode_input_bytes`, `Block`, `ExecutionWitness`
# and `ChainConfig` are assumed to be defined elsewhere.


# entrypoint is the function where the zkVM always starts
# execution. It does not take any parameters.
def entrypoint():
    # The only way to receive parameters for the guest program
    # is to use the zkVM SDK to read the raw input bytes.
    input = zkvm_sdk.read_input()

    # The raw bytes are the result of encoding `StatelessInput`
    # with some encoding format (e.g., JSON, cbor, bincode).
    # We call `decode_input_bytes` to decode the raw bytes
    # into the corresponding struct.
    stateless_input = decode_input_bytes(input)

    return stateless_block_validation(stateless_input)


# StatelessInput contains the inputs for the Ethereum guest program.
@dataclass
class StatelessInputV1:
    block: Block
    execution_witness: ExecutionWitness
    chain_config: ChainConfig


# This function represents a stateless execution of the Ethereum STF.
# It returns `True` if the STF validates correctly against the
# provided block, and `False` otherwise.
def stateless_block_validation(stateless_input: StatelessInputV1):
    # Input validation
    ...
    # STF
    ...
    # Public inputs commitment
    ...
    return True
```

The zkVM starts the guest program by calling `entrypoint`, which follows the flow described in the code comments.

Precisely speaking, the guest program's *private input* is the raw `input` bytes read using `zkvm_sdk.read_input()`. The host is the program that spins up the zkVM and sets the guest program's input, which corresponds to the `input` bytes received by the guest program.

From the host's perspective, the guest program's API is literally whatever these raw bytes represent. The goal of this document is to explore the pros and cons of asking all EL clients to define the exact same API.

*Side note: These function inputs are called private inputs because the proof verifier does not see them. If the verifier had to receive them, the proof would not be succinct. This function can (and must!) also define public inputs, but to keep this document focused, we will ignore them and discuss them in another document.*

## Deeper look at potential standardized API

The code snippet in the previous section defines the guest program input: a serialized `StatelessInputV1` structure. The `V1` suffix is just for versioning; from now on, we’ll omit it.

This API has two main parts:
- The `StatelessInput` struct definition:
    - The `block` contains the target block that we want to prove.
    - The `execution_witness` contains the data required to execute the block (e.g., Ethereum state, auxiliary ancestor block headers, bytecode).
    - The `chain_config` provides the chain configuration (e.g., forks active at block numbers or timestamps).
      More details about it later.
- The encoding format used to serialize `StatelessInput`. This produces the bytes that the host passes to the zkVM and that the guest program receives.

Regarding the `StatelessInput` structure, the `block` field is an obvious inclusion, and its format is already standardized in the Ethereum specs.

In this document, `execution_witness` is assumed to be exactly the same object as defined in the separate [ExecutionWitness standardization effort](https://hackmd.io/@jsign/zkvm-eth-minimal-execution-witness) (please join the discussion!).

The `chain_config` field contains chain configuration information so that the STF can be configured properly. It could be a full chain config like [this snippet](https://gist.githubusercontent.com/jsign/cc93fa494c04cc26fcfae223b48a0b47/raw/f10303c59c17dd4f6b55d04ef4ba0811e9b6851e/gistfile1.txt), or we could use [EIP-7910](https://eips.ethereum.org/EIPS/eip-7910).

Requiring the `chain_config` field may be less obvious to justify. ELs could instead define it as a constant in their source code. The main benefit of removing it from the input is that it reduces the number of bytes and lowers the input deserialization/parsing overhead. Note that having the `chain_config` doesn’t remove the need for EL clients to provide a new ELF for new forks, since new forks change the code that implements the STF.

So the question is: why is it worth including it? The main reason is that it is required for proper test runs, since the testing framework defines [many custom forks that exist only in tests](https://github.com/ethereum/execution-specs/blob/125cb14268552ac7aa34e39c1eae32be641f5bd8/packages/testing/src/execution_testing/forks/__init__.py#L4-L39), mainly transition forks that cover behavior changes between two consecutive forks. ELs already know how to map all these forks to their corresponding `chain_config` to run the tests.
If we do not include `chain_config` as a `StatelessInput` field, there will be a slight (but important) difference between how tests are run and how production runs. This is a risk, and we should discuss whether it's worth taking. The safest option is to define a rather simple `chain_config` format (e.g., maybe [EIP-7910](https://eips.ethereum.org/EIPS/eip-7910)) to minimize deserialization overhead.

## Pros and cons

Let’s quickly explore some pros and cons of standardizing the guest program API.

### Pro: Fungible EL guest programs

Let’s look at the following high-level diagram of the prover flow for optional proofs (the flow for mandatory proofs would be similar, minus the RPC call):

![image](https://hackmd.io/_uploads/r1NIz4mQbg.png =500x)

In this flow, if we standardize the guest program API, step 2 is independent of the target EL guest program. This simplifies the prover's logic because it doesn’t need custom code to prepare the inputs. The EL guest programs are fungible, which reduces host complexity and enables quick switching to a different guest program if required.

### Pro: Doesn’t increase the testing surface for invalid inputs

As mentioned in the [*ExecutionWitness* standardization document](https://hackmd.io/IK4xZ2N-QMaQRGgnzlm6-Q#Why-is-it-helpful-to-agree-on-a-standard-ExecutionWitness), at the spec and test levels, we need a standard format for representing the execution witness. This execution witness is assumed to be consumed by EL guests to run tests.

Many of the most critical tests to add for stateless ELs check that no proof can be generated when the received inputs are invalid, e.g., the pre-state MPT proof is invalid, the inputs are missing bytecodes, or the provided ancestors are incomplete.

If preparing the input for a specific EL client requires a transformation from the spec-level guest input to an EL-specific custom input format, that transformation becomes an additional source of bugs.
The shared testing framework has no visibility into this extra format or transformation logic, and therefore cannot generate adversarial test cases targeting it.

With a shared `StatelessInput` format, all spec-level invalidity lives in one place: malformed or inconsistent `StatelessInput` objects that the test framework can construct, mutate, and fuzz. Any additional, client-specific input formats introduce a second layer of invariants and invalid-input cases that the spec-level tests are unable to exercise.

### Cons: Prevents input data structure validation optimizations

Input validation is one of the main places where guest programs want to optimize their code. The rest of the guest program logic is the STF, which can also benefit from proving-focused optimizations, but these are harder because the STF code can overlap with full-node mode (if supported).

Validating the input usually means:
- Decoding raw bytes into the target input structure (e.g., decoding JSON, bincode, or cbor).
- Validating every field of the input. For example, for the `StatelessInput` described above:
    - The provided `block` conforms to the consensus rules for the active fork (e.g., expected or unexpected fields and values).
    - The `execution_witness.nodes` field contains the nodes of a multi-MPT proof that must be validated against the parent block's `state_root` field. This involves calculating the keccak of every node, checking the node-pointer logic, and performing RLP decoding.
    - The `execution_witness.bytecodes` and `execution_witness.ancestors` fields also have validation rules that involve hashing data with keccak, among other checks.

Today, many existing guest programs use more complex inputs. For example, they provide an input that simplifies the pre-state check by avoiding deserialization, at the cost of adding other "easier" validation checks. If this proves beneficial performance-wise, it makes sense to use this custom input.
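
To make the validation cost discussed in this section more concrete, here is a minimal sketch of one such check: confirming that the witness is not missing any required bytecode. It is only an illustration, not code from any EL client. The `Witness` class, the `has_required_bytecodes` helper, and the `hash_fn` parameter are hypothetical names, and SHA-256 is a stand-in for keccak256, chosen only because keccak256 is not available in the Python standard library.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Witness:
    # Hypothetical, simplified stand-in for `execution_witness`:
    # only the contract bytecodes are modelled here.
    bytecodes: List[bytes]


def stand_in_hash(data: bytes) -> bytes:
    # ASSUMPTION: SHA-256 is used only so the sketch runs with the
    # standard library. A real guest program would use keccak256.
    return hashlib.sha256(data).digest()


def has_required_bytecodes(
    witness: Witness,
    required_code_hashes: Iterable[bytes],
    hash_fn: Callable[[bytes], bytes] = stand_in_hash,
) -> bool:
    """Return True if every required code hash is backed by a provided bytecode.

    This sketch does not decide whether extra, unused bytecodes
    should be rejected.
    """
    # One hash per provided bytecode: this is the kind of work that
    # has to be proven inside the zkVM.
    by_hash = {hash_fn(code): code for code in witness.bytecodes}
    return all(code_hash in by_hash for code_hash in required_code_hashes)


# Usage: a witness that carries `code_a` but not `code_b`.
code_a = bytes.fromhex("6001600101")
code_b = bytes.fromhex("00")
witness = Witness(bytecodes=[code_a])

complete = has_required_bytecodes(witness, [stand_in_hash(code_a)])  # True
missing = has_required_bytecodes(witness, [stand_in_hash(code_b)])   # False
```

In this sketch every provided bytecode costs one hash, so the work grows with the size of the witness. That is one reason a guest program might prefer a custom input layout that changes which checks it has to perform.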
### Cons: Forces a potential non-optimal encoding format

As mentioned before, the input that the host passes to the zkVM is raw bytes. This means the underlying structure is serialized to a chosen format, such as JSON, bincode, or cbor.

Deciding on an encoding format that decodes efficiently in all EL guest programs is not obvious. EL guest programs are written in different languages, which may have varying levels of support for each encoding format. For example, commonly used encoding formats for Rust guest programs include [bincode](https://docs.rs/crate/bincode/2.0.1) and [rkyv](https://rkyv.org/). These formats are probably not supported in other languages.

Usually, using standardized formats incurs a performance penalty in exchange for broader support. It would be helpful if ELs could explore this further to determine how much performance is lost when using serialization formats like cbor, protobufs, flatbuffers, or msgpack. A standard zero-copy deserialization format is also a good candidate.

Note that we could have a partially standardized guest program input, where we agree on the `StatelessInput` data structure but allow custom encoding formats per EL.

## What if we conclude not to standardize?

If we conclude that the cons outweigh the pros, ELs might decide not to adopt a standard API. In that case, the reality may look like this:
- All ELs define, maintain, and publish a library that transforms the spec's `StatelessInput` (i.e., mainly the [*ExecutionWitness* under discussion](https://hackmd.io/@jsign/zkvm-eth-minimal-execution-witness)) into the raw bytes that their guest program expects. Having a public, well-maintained library for each EL can help minimize the friction provers face when switching ELs, since they would otherwise lack an easy way to prepare the custom input.
- The same library mentioned above would be used when running EEST tests.
  This is because EEST will provide the standardized `StatelessInput`, so the EL's t8n runner must use the same library to transform it into its custom input.

This implies multiple parallel “mini-standards” and adapter libraries, one per EL, each with its own versioning, documentation, and bug surface. Prover implementers and test harnesses would need to track and integrate all of them. At that point, we are effectively recreating a standard input layer, but in the form of per-EL tooling rather than a shared spec.

Note that if this scenario becomes real, building and maintaining only a “library” per EL might not be enough. We would probably need a unified tool to do this work, since libraries written in different languages are not easy for a single prover to use. This needs further discussion.

## Conclusion

This document explores the pros and cons of a standardized guest program API (i.e., the `StatelessInput` used in the spec/testing). The goal is to get EL and spec core devs to chime in and help walk through the tradeoffs. The decision on this topic will have implications for EL guest program implementations.