# Abstract

This document describes the proof verification threshold variants used when verifying an execution payload. Instead of re-executing the execution payload with an execution layer (EL) client as a means to check that it is correct, an attester receives zkEVM proofs that attest to the correctness of the execution payload with respect to a given EL. We parameterize the different variants and discuss their impact on soundness and completeness/liveness.

# Workflow

zkEVM proofs allow an attester to verify the correctness of an execution payload without re-executing it locally. The high-level view of the workflow is as follows:

1. The attester receives the header for an execution payload.
2. The attester waits for $K$ valid proofs before attesting that the payload is valid. Generally, these proofs may originate from different zkEVM implementations and/or different execution clients.

# Definitions

## Proof

> TODO: Maybe we should remove this?

A proof is simply evidence that some claim is true beyond reasonable doubt. In our setting, the proof is evidence for the claim that a deterministic function $f$ on input $x$ produces $y$:

$$f(x) = y$$

Analogously, an argument is a line of reasoning that aims to convince you that some claim is true; an argument is not irrefutable. It is common nomenclature to say "proofs" when we mean "arguments". For consistency, we will use "proof" when we mean "argument".

## Bugs

There are two implementation-level types of bugs that we care about:

- Bugs that cause a true claim to be unprovable
- Bugs that cause a false claim to be provable

### Completeness

When we say completeness, we are not referring to the theoretical property of the proof system. We refer to implementation bugs that would cause a claim that is otherwise true to be unprovable.

*Completeness*: The claim is true, but because the zkEVM implementation has a bug, it is not possible to convince the verifier. This can be due to the witness generator having a bug, panicking, etc.
### Soundness

Similarly, when we say soundness, we are not referring to the theoretical property of the underlying proof system.

*Soundness*: The actual claim may be false, but because the zkEVM implementation has a bug, it convinces a verifier that some claim is true.

*actual*: Trying to capture the notion of us writing down a statement, while the thing we actually prove is not the exact statement we want.

As we will see later in the document, different threshold variants emphasize either completeness or soundness.

## zkEVM proof semantics

A zkEVM proof aims to convince the verifier that the execution layer's state transition function $f_{\text{client}}$, on inputs $(\text{block}, \text{state\_witness})$, will produce $\text{next\_block\_header}$:

$$ f_{\text{client}}(\text{block}, \text{state\_witness}) = \text{next\_block\_header} $$

where $f_{\text{client}}$ denotes the state transition function (STF) of a specific execution client implementation.

> Note: This should probably be called zk(EL)STF to be more precise. But we are too far gone.

# Paradigm shift

## Counterparty agreement

Currently, with re-execution, the choice of which EL a validator runs to validate the execution payload is a local decision. This means that adding an EL is permissionless. The reason it is permissionless is that there is no counterparty.

With zkEVM proofs, because the entity making the proof (the prover) is different from the entity verifying the proof (the verifier), there needs to be an agreement between the two parties as to which zkEVM proofs are supported. In this sense, attesters need the prover's permission/agreement that they will create proofs for the proof type.

## Protocol awareness of ELs

Another analogue to draw here: today, the protocol does not know about any particular EL. If a particular EL client has a bug and all of the validators need to switch to a different EL client or patch the EL, this can be done without a hard fork.
In practice, this will still require coordination, more so in the case where the EL client with the bug is a majority.

Optional proofs are similar in the sense that a hard fork is not required in order to update the proofs that an attester will accept (you still need counterparty agreement), because the protocol is not aware of the proof types. In practice, if a significant number of attesters are running optional proofs, they are de facto enshrined, and hard forks will not proceed without ensuring that they work. This is similar to mev-boost.

Enshrining the proofs, i.e. adding them to the protocol, will make them protocol aware. The counterparty agreement between provers and verifiers will be explicitly codified in the protocol, whereas with optional proofs it was implicit. Since it is codified in the protocol, it can only change over hard forks, whereas optional proofs allow the counterparty agreement to change liberally.

## Summary

| Model | Counterparty Agreement | Protocol Awareness | Permission Model (for adding new proof types) | Change Process | Analogue |
| --- | --- | --- | --- | --- | --- |
| **Re-execution** | None — validator runs its own EL locally | Protocol is agnostic to EL choice | Fully permissionless | Validator can switch EL clients freely (no fork) | Current Ethereum model |
| **Optional proofs** | Contractual agreement between prover and verifier (off-protocol) | Protocol unaware of proof types | Semi-permissioned (depends on provers) | Agreement can change freely; no fork required | mev-boost–like phase |
| **Enshrined proofs** | Contractual agreement defined in consensus — prover/verifier types fixed in protocol | Protocol explicitly aware of proof types | Fully permissioned | Changes require hard fork | ePBS-like phase |

# Implementation diversity

## Motivation

zkEVMs have complex software stacks and inherent complexity due to the cryptographic primitives being used. Relying on a single zkEVM would introduce systemic risk, for the same reason that Ethereum does not rely on a single EL client. We use redundancy to mitigate soundness and completeness bugs.

A zkEVM creates a proof for a particular client, so with the addition of zkEVM proofs, we consider diversity across two axes:

| | Example | Benefit |
| --- | --- | --- |
| **EL client diversity** | Geth, Erigon, Nethermind, Reth | Different STF implementations reduce the risk of logical bugs |
| **zkEVM diversity** | Jolt, SP1, OpenVM, Ziren | Different circuits/provers reduce the risk of proving bugs |

## Total Proof combinations

*Mini example*

Suppose we have 2 zkEVMs and 5 EL clients. Let's see how many different zkEVM proof types we could make:

Each zkEVM can create a different proof for each client, i.e. $f_{1}, f_{2}, \ldots, f_{5}$.

We have two zkEVMs, so in total there are ten different proof types, i.e. $\pi_{z_1, f_{1}}, \ldots, \pi_{z_2, f_{5}}$.

In general, if we have a set $Z$ of zkEVMs and a set $C$ of execution clients, the total number of possible proof types is $|\delta| = |Z| \times |C|$. Each pair $(z_i, f_j)$ defines a proof type $\pi_{z_i, f_j}$.

> Note: Although optional proofs allow anyone to spin up their own prover, if two attesters accept two different sets of proofs, they will effectively live on separate forks. It is therefore reasonable to talk about a finite set of zkEVMs and ELs. The provers will be run by reputable entities that will, to the best of their abilities, guarantee liveness.

## Active Proof combinations

In practice, given $Z$ zkEVMs and $C$ execution clients, we do not activate all $|\delta|$ possible proof types, because we want to decrease the risk of correlated bugs.
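The counting in the mini example above can be sketched in a few lines of Python; the zkEVM and client names below are placeholders, not real deployments:

```python
from itertools import product

# Hypothetical sets of zkEVMs (Z) and EL clients (C); names are illustrative.
zkevms = ["z1", "z2"]
clients = ["f1", "f2", "f3", "f4", "f5"]

# Each (zkEVM, client) pair defines a distinct proof type pi_{z, f}.
proof_types = [(z, f) for z, f in product(zkevms, clients)]

# |delta| = |Z| x |C|  ->  2 x 5 = 10 proof types in the mini example.
assert len(proof_types) == len(zkevms) * len(clients) == 10
```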
To illustrate:

$$ \{ \pi_{z_1, f_{1}}, \pi_{z_1, f_2}, \pi_{z_1, f_3} \} $$

Since we have the same zkEVM for the three different proof types, if there is a bug in $z_1$, there is a high likelihood that it is exploitable even if the client is changed.

$$ \{ \pi_{z_1, f_1}, \pi_{z_2, f_1}, \pi_{z_3, f_1} \} $$

Similarly, if there is a logical bug in $f_1$, it will be exploitable no matter which zkEVM is proving it: a zkEVM will faithfully prove buggy behaviour.

We therefore take a subset of the $\delta$ proof types. Let's call this subset $N$.

# Proof Verification Thresholds

Intuitively, one would assume that attesters will wait for all $N$ proofs. Below we discuss different variants of this and their tradeoffs.

## N-of-N

**Overview**

- Builders must build all $N$ proofs
- Attesters must wait for and verify all $N$ proofs

**Tradeoffs**

*Advantages:*

- In order for a malicious builder to exploit a soundness bug, they need the bug to be present in all $N$ different proof types.

*Disadvantages:*

- If one proof type cannot be made, due to the prover having hardware issues or a completeness bug, then the block will not be seen as valid.

This favours soundness over liveness to the extreme; we assume that $N$ is chosen suitably, i.e. the proof types are highly uncorrelated.

## K-of-N

**Overview**

- Builders must build $K$ out of the $N$ proofs
- Attesters must wait for and verify all $K$ proofs

**Tradeoffs**

*Advantages:*

- If a particular proof type cannot be built, then the builder is able to fall back to any of the remaining $N-K$ proof types.

*Disadvantages:*

- Since builders can liberally choose any proof types out of the $N$, a rational builder would choose the fastest and cheapest proof types, or the proof types corresponding to zkEVMs that they have Service Level Agreements (SLAs) with. This means that some proof types won't be built for the vast majority of blocks.

Depending on the values of $K$ and $N$, this favours liveness or soundness. To see why, consider the unrealistic case where $K=1$ and $N=100$.
If there is a soundness bug in any of the $N$ different proof types, a malicious builder would only need to choose that one proof type. As $K$ gets closer to $N$, we trade liveness off for soundness, until we reach the $K=N$ case, i.e. N-of-N.

## K-of-N - Epoch based shuffling

**Overview:**

- At the start of the epoch, all nodes run a pseudo-random function that is seeded with the epoch number and the $N$ different proof types. This function returns $K$ proof types.
- Builders must build the specified $K$ proofs
- Attesters must wait for and verify the specified $K$ proofs

**Tradeoffs**

*Advantages:*

- All proof types (ELs) are equally likely to be chosen for a particular epoch

*Disadvantages:*

- Similar to N-of-N, if a proof type cannot be built due to completeness bugs or hardware issues, then the block will not be valid.

> Note: Although a malicious builder can no longer choose proof types that have bugs, in practice one can use the previous K-of-N variant and choose ($K$, $N$) carefully, in addition to choosing proof types that are highly uncorrelated.

## K-of-M-of-N

> This was independently thought of by Julian Ma (RIG) and Milos (Protocol Prototyping)

**Overview**

- Choose a subset $M$ of $N$. This can be done using a shuffling algorithm
- Builders must build $K$ out of the $M$ proofs
- Attesters must wait for and verify $K$ proofs

**Tradeoffs**

*Advantages:*

- This adds another parameter $M$ to control the soundness threshold. $M$ allows the protocol some degree of freedom in determining the subset of proofs that will be used for that epoch, while the builder still has autonomy to choose out of the $M$.

*Disadvantages:*

- This introduces a liveness issue with respect to $M$ as opposed to $N$.

Observe that when $M=N$, we recover the K-of-N variant, and when $M=K$, we recover the K-of-N epoch based shuffling variant.
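The K-of-M-of-N selection described above can be sketched as follows. This is a minimal illustration, assuming a deterministic epoch-seeded shuffle; the hashing scheme, function names, and proof-type labels are placeholders, not the actual protocol's:

```python
import hashlib
import random

def select_m_of_n(epoch: int, proof_types: list[str], m: int) -> list[str]:
    """Pseudo-randomly select the epoch's subset M out of the N proof types.

    Every node runs this with the same inputs, so all nodes agree on the
    same subset without communicating (illustrative scheme, not the spec).
    """
    seed = hashlib.sha256(f"epoch:{epoch}".encode()).digest()
    rng = random.Random(seed)
    shuffled = sorted(proof_types)  # canonical order before shuffling
    rng.shuffle(shuffled)
    return shuffled[:m]

# N proof types as (zkEVM, EL client) labels; names are illustrative.
N = ["z1-f1", "z1-f2", "z2-f1", "z2-f3", "z3-f2", "z3-f3"]
M = select_m_of_n(epoch=1234, proof_types=N, m=4)

# The builder then chooses any K of the M (e.g. its cheapest/fastest);
# attesters accept the block once K valid proofs from M are verified.
K = 2
builder_choice = M[:K]
assert set(builder_choice) <= set(M) <= set(N)
```

Setting `m = len(N)` degenerates to plain K-of-N (builder picks freely from everything), while `m = K` removes builder choice entirely, giving the epoch-shuffled K-of-N variant.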
## Summary

| Variant | Builder control | Shuffling | Soundness | Liveness |
| --- | --- | --- | --- | --- |
| N-of-N | None | None | Max | Min |
| K-of-N | Full | None | Medium (depends on $K$ and $N$) | Medium-High (depends on $K$ and $N$) |
| K-of-N (epoch shuffle) | None | High | High | Medium |
| K-of-M-of-N | Partial | Medium | Tunable | Tunable |

------

# Scratchpad (Ignore)

- Make a note of builder diversity also being marginally useful. If the block was built with Reth and then proven with Reth, then a bug in Reth causes issues. If the block was built with Besu and then proven with Reth, then if Reth has a bug, it will not be able to prove the Besu-built block.
- This is marginal because we get the same property by proving multiple different ELs.

---

**Correlation coefficient**

Let:

- $p_i$: probability that proof type $i$ has a soundness bug.
- $\rho$: correlation coefficient between any two proof types.

$$ P_{\text{fail}} \approx \rho + (1-\rho)\prod_{i=1}^{K} p_i $$

- If implementations are highly correlated $(\rho \approx 1)$, diversity offers little benefit when there's a bug.
- If implementations are mostly independent $(\rho \approx 0)$, risk decreases multiplicatively with $K$.
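The approximation above can be explored numerically. This is a sketch of that formula only, with illustrative per-proof-type bug probabilities:

```python
def p_fail(rho: float, bug_probs: list[float]) -> float:
    """Approximate probability that all K chosen proof types are
    simultaneously unsound: a correlated term plus an independent term,
    per the approximation P_fail ~ rho + (1 - rho) * prod(p_i)."""
    independent = 1.0
    for p in bug_probs:
        independent *= p
    return rho + (1 - rho) * independent

# Illustrative soundness-bug probabilities for K = 3 proof types.
probs = [0.01, 0.01, 0.01]

# Highly correlated implementations: diversity barely helps (~0.9).
high_corr = p_fail(0.9, probs)

# Mostly independent implementations: risk shrinks multiplicatively (~1e-6).
low_corr = p_fail(0.0, probs)
```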