EIP-4762 execution witness worst-case

This document surfaces a consequence of EIP-4762 that is worth analyzing more carefully since it might mean changing the proposed constants.

Context

Let’s start with a minimal refresher so we can understand the point of this document correctly.

Data grouping

The potential proposed Verkle (EIP-6800) and Binary (EIP-7864) trees come with many fundamental changes apart from changing the tree structure.

For example, they define a way to do code-chunkification, which solves the current worst-case scenario for block proving, and they also propose grouping data at the leaf level so the number of branch openings can be reduced, which helps for Verkle/Binary/SNARK proving (i.e. fewer branches = less hashing)

This proposed grouping in the new trees implies a gas cost remodeling to define how this should be charged. For example, in today’s reality, accessing two storage slots always implies different branches. In the new reality, if the two storage slots live in the same branch, it is required to explain how this should be charged.

This is where EIP-4762 comes into play, and part of it explains how gas costs for state access should be charged. The part of the EIP that describes how to do the gas cost charging is here, but here is a TL;DR:

A branch (named stem) in the new tree contains 256 values (i.e. storage slots, code chunks, or account data)
Whenever a new stem is accessed WITNESS_BRANCH_COST is charged.
Accessing any value in this (now warm) stem is WITNESS_CHUNK_COST.

The EIP also defines similar rules for tree writes.

The whole idea of the grouping is to try to improve the efficiency of state access since, for example, if storage slot X is accessed, there’s a high chance that near storage slots could be accessed, which means having to pay less for them since this doesn’t mean having to open new branches for proving.

Block execution witness

Let’s look at the implications of this gas cost remodeling from the angle of potential block size increases. Recall that the idea of Verkle/Binary kind of statelessness is to include in the block an execution witness that contains information and cryptographic proof that the provided pre-state data for the block execution is correct.

The execution witness is defined as:

class ExecutionWitness(container):
    state_diff: StateDiff
    verkle_proof: VerkleProof

where:

state_diff is a list describing the accessed/changed in the block execution. This is similar to what a block-access list would provide but in a different format grouped by stem.
verkle_proof information to prove that the provided state_diff corresponds to the parent state root. This allows the stateless client to verify that the state_diff information required for block re-execution is correct.

This is enough information to understand the next section of this document. If you want to understand more about these internal structures, you can see their definitions in this document. Also, to understand more about the execution witness's gory details, refer to this article I wrote a while ago.

Execution witness size worst-case scenario

The proposed gas remodeling, which reflects this grouping done to improve efficiency, implies a new potential worst-case for block size, which might not be evident since it behaves differently from worst-case sizes in the current MPT.

I’ll describe the rationale in steps:

Every state access implies adding a value in state_diff. For example, if we access a storage slot, it must be included there.
For this state access, there’re two scenarios:
- The storage slot is the first access in the stem, and it should pay WITNESS_BRANCH_COST+WITNESS_CHUNK_COST.
- The storage slot isn’t the first access in the stem, in which case it should pay WITNESS_CHUNK_COST since another access already paid WITNESS_BRANCH_COST.
This already should signal that there’s a way to include at least 32 bytes in the state_diff only paying WITNESS_CHUNK_COST, which is 200.

To exploit this to generate the biggest execution witness size, we can try to access all values for each stem, which lowers the average access cost per byte included in state_diff. The worst case isn’t doing random access but actually the opposite.

Let’s do the math with the proposed constants WITNESS_BRANCH_COST=1900 and WITNESS_CHUNK_COST=200:

Accessing a complete group costs 1900+256*200=53_100 gas.
Accessing this entire group has the following impact on state_diff:
- Add the stem value of 31 bytes (i.e., the key prefix for all these 256 values).
- Add the actual pre-state value of the storage slot for all 256 entries, which is 256*32 bytes.
- This means a total of 8223 bytes.
In a 36M gas limit block, we can do 36M/53_100~=678 of these accesses.
678*8223~=5.3MiB of state_diff in the execution witness.

This is the gross number of the worst-case. The execution witness has other nits included, but the number won’t change much.

For example, the verkle_proof part of the execution witness is less relevant since this kind of access minimizes the number of branches opened by definition. As in, the expected depth of the Verkle Tree is 4; thus opening 53_100 stems implies less than ~ 53_100*4*32=7KiB of internal node commitments. The cryptographic proof (IPA+Multiproof) has a constant size in the number of openings, so it is unaffected by any state access.

So the actual blow-up isn’t the verkle_proof per se, but the state_diff. This is rooted in the intention of the tree design to allow more efficient state access since you can have a lot of storage slot access for 200 gas costs, which is trying to improve UX. The point of this document is also to clarify this new worst-case scenario and account for it in the EIP design.

Should we do something about this?

The reality is that 5.3MiB might be too big, which increases linearly with the gas limit. It also depends on the networking block distribution protocol improvements and bandwidth requirements recommended in nodes.

If this number is too big, a reconsideration of WITNESS_CHUNK_COST and WITNESS_BRANCH_COST should be done. Note that the current values tried to simulate today's reality where WITNESS_BRANCH_COST+WITNESS_CHUNK_COST is 2100 so is very similar to current cold-SLOAD costs.

A potential option is to increase WITNESS_CHUNK_COST to regulate the impact of the described “attack” impact on state_diff, while also potentially lowering WITNESS_BRANCH_COST only if we want to keep this 2100 cost for single-value access. In any case, this requires a more thoughtful discussion.

EIP-4762 execution witness worst-case

Context

Data grouping

Block execution witness

Execution witness size worst-case scenario

Should we do something about this?

Read more

Berlin - Stateless Ethereum

Ethproof Killers Test Suite – Overview & Format Guide

Execution-Spec-Tests TODOs (Stateless Tests)

EIP- 7748 Tests (Tree conversion)