Thanks to Guillaume, Josh, and Dankrad for their feedback.
Verkle Trees enable Ethereum to go stateless, which requires adding contract code to the tree in addition to the usual account data and contract storage. This allows stateless clients to validate blocks: the required account data, contract storage and code can be verified with a state proof. Code-chunking is the process of transforming contract bytecode into tree key-values.
This document analyzes a recent set of ~1 million mainnet transactions to answer:
h/t to Paweł Bylica, who inspired the technical angle for this exploration in a chat we had while I was implementing the 32-chunker.
TL;DR: the projected average gas overhead (compared to mainnet receipt gas) of going stateless, including contract code in the state tree, is ~30%. If the current gas impact is considered too big, a few ideas could mitigate it:
The last point might be the most elegant one, considering that going stateless opens the door to increasing the block gas limit.
For completeness, it's worth clarifying that all this is relevant only if we include the execution witness in blocks, as is expected today. We could decide not to do that, but that would remove stateless verification from the expected benefits.
See the Conclusions section for more info.
The core idea is recording PC traces of mainnet txs. A PC trace records all `pc` values during a tx execution. If a tx execution has a `pc` trace `[1, 2, 3, 10, 15, 16, 20, ...]`, we can simulate which code-chunks are accessed and calculate the corresponding gas for any given code-chunker strategy. A PC trace for a tx can contain `pc` records for multiple contracts since contracts can do `*CALL`s.
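As a minimal sketch of that simulation step, assume a 31-byte chunker where the code byte at offset `pc` falls in chunk `pc // 31`, and a hypothetical trace format of `(contract_address, pc)` pairs (the pipeline's real trace format may differ). The function name `accessed_chunks` is illustrative only.

```python
from collections import defaultdict

CHUNK_CODE_BYTES = 31  # assumption: 31 code bytes per chunk (31-byte chunker)


def accessed_chunks(pc_trace):
    """Map a PC trace to the set of code-chunk indices touched per contract.

    pc_trace is a hypothetical list of (contract_address, pc) pairs.
    """
    chunks = defaultdict(set)
    for address, pc in pc_trace:
        chunks[address].add(pc // CHUNK_CODE_BYTES)
    return dict(chunks)


# Contract 0xA runs the example trace; contract 0xB is reached through a *CALL.
trace = [("0xA", pc) for pc in [1, 2, 3, 10, 15, 16, 20]] + [("0xB", 0), ("0xB", 40)]
result = accessed_chunks(trace)
print(result)  # {'0xA': {0}, '0xB': {0, 1}}
```

The sketch deliberately ignores finer rules, such as how instruction immediates that cross a chunk boundary are charged.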
I built the following pipeline:
This pipeline is automatic and can easily be re-run in the future. The PC Traces could be recorded via sources other than the Geth live tracer.
A pipeline run was done on mainnet, recording ~1 million txs between blocks 20158433 and 20168316 (roughly 2024-06-24 and 2024-06-25). No sampling was done: all (and only) txs in the described range which execute code in existing contracts were used.
I split the analysis into three sections:
In an appendix, I show the most important charts but with txs targeting some of the Top Burners claimed by ultrasound.money to see specific cases.
Let's build an intuition about these txs. Nothing in this section is related to Verkle; this is just mainnet info.
The image is a 2D heatmap of counted transactions in two dimensions:
- `execution_length`: is the length of the PC trace. e.g: `pc_trace=[1,2,3,4,3,4,3,4,5,6,7]` has length 11.
- `receipt_gas`: is the gas the tx used in mainnet (i.e. Dencun).

Some notes about the image:
Note that a large `execution_length` doesn't necessarily mean touching much code. The same `execution_length` could be a tight loop or a long linear execution.
With each tx PC trace, we can simulate which code chunks were accessed and calculate the corresponding gas under EIP-4762 rules.
This analysis counts as code-access gas any gas strictly required for code access. Code accessed in the account header chunks is charged only `WITNESS_CHUNK_COST`, since the account header branch is necessarily charged by the tx processing rules or the corresponding `*CALL`. Code access outside the account header branch must pay `WITNESS_BRANCH_COST` and `WITNESS_CHUNK_COST` (once) per branch and chunk, respectively.
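The charging rule above can be sketched as follows. The constant values and the tree layout (code chunk 0 stored at position 128, 256 leaves per branch) are assumptions taken from the EIP-4762 and EIP-6800 drafts and may change; `code_access_gas` is a hypothetical helper, not the pipeline's code.

```python
WITNESS_BRANCH_COST = 1900  # assumed value (EIP-4762 draft)
WITNESS_CHUNK_COST = 200    # assumed value (EIP-4762 draft)
CODE_OFFSET = 128           # assumed: code chunk 0 sits at tree position 128 (EIP-6800 draft)
VERKLE_NODE_WIDTH = 256     # assumed: 256 leaves per branch (EIP-6800 draft)


def code_access_gas(chunk_ids):
    """Gas for the distinct code chunks one contract touched in one tx."""
    gas = 0
    charged_branches = set()
    for chunk_id in set(chunk_ids):  # each chunk is charged once
        branch = (CODE_OFFSET + chunk_id) // VERKLE_NODE_WIDTH
        # Branch 0 is the account header, assumed to be already paid for
        # by the tx processing rules or the corresponding *CALL.
        if branch != 0 and branch not in charged_branches:
            charged_branches.add(branch)
            gas += WITNESS_BRANCH_COST
        gas += WITNESS_CHUNK_COST
    return gas


header_only = code_access_gas([0, 1, 1, 2])          # 3 distinct header chunks
beyond_header = code_access_gas([0, 127, 128, 129])  # 4 chunks, 1 extra branch
print(header_only, beyond_header)  # 600 2700
```

Under these assumptions, the first 128 code chunks of a contract live in the header branch, so short contracts never pay `WITNESS_BRANCH_COST` for code.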
The above means 95% of txs would have an overhead of <800k gas, with ~68% <50k.
Nominal overhead lacks context, so let's see the relative cost compared to the receipt gas for the tx:
An interpretation:
Important note: `code_access_gas/receipt_gas` is a ratio between Verkle code-access gas and current Dencun tx gas. EIP-4762 has other gas changes that should be accounted for (for good or bad). This doesn’t mean that “Verkle has on average 32% gas impact on tx”, but “Verkle code-access gas on average is 32% of current (Dencun) gas usage”. Recall that no compiler optimizations exist today that optimize for convenient Verkle code-access patterns.
To understand which fundamental cause can be correlated with the %-overhead, let's look at the following chart:
The Y-axis is the number of contracts that executed code in the tx, and the X-axis is the %-code-gas-overhead mentioned before. The correlation makes sense, since the more external contracts your code execution involves, the more often you'll have to pay `WITNESS_BRANCH_COST`.
To gain more insight, see the Uniswap: Universal Router and Tether cases shown in the appendix.
Worst-cases of %-code-access-overhead:
Out of curiosity, let's check the longest execution lengths:
As expected, a longer execution doesn't necessarily mean a high relative overhead.
The currently proposed chunking algorithms are:
TL;DR of how 31-byte and 32-byte chunking work:
- 31-byte chunker: […] `PUSHN` instruction. This algorithm doesn't require auxiliary tables.
- 32-byte chunker: […] `JUMP*` destinations. For example, only if a code chunk has a `0x5B` (`JUMPDEST`) byte that is part of `PUSHN` data will it be stored in this auxiliary table, so that an invalid jump to it can be detected. Since the table is densely encoded, it must always be fully read.

Let's separate the comparison between gas usage and encoded chunked size.
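As a rough illustration of the 31-byte strategy, here is a sketch modeled on the chunking routine in the EIP-6800 draft (an assumption; this is not the pipeline's implementation). Each 32-byte chunk is one leading byte, counting how many of the chunk's first bytes are `PUSHN` data continuing from the previous chunk, followed by 31 code bytes. The name `chunkify_31` is illustrative.

```python
PUSH1, PUSH32 = 0x60, 0x7F


def chunkify_31(code: bytes) -> list:
    """Split bytecode into 32-byte chunks: 1 metadata byte + 31 code bytes."""
    # Pad so the code length is a multiple of 31.
    if len(code) % 31:
        code += b"\x00" * (31 - len(code) % 31)
    # pushdata_left[i] = PUSH-data bytes remaining from offset i onward
    # (0 when offset i holds an executable opcode).
    pushdata_left = [0] * (len(code) + 32)
    pos = 0
    while pos < len(code):
        n = code[pos] - PUSH1 + 1 if PUSH1 <= code[pos] <= PUSH32 else 0
        pos += 1
        for i in range(n):
            pushdata_left[pos + i] = n - i
        pos += n
    return [
        bytes([min(pushdata_left[start], 31)]) + code[start:start + 31]
        for start in range(0, len(code), 31)
    ]


# 30 STOP bytes, then PUSH2 whose two data bytes (both 0x5B) spill into chunk 1.
code = bytes(30) + bytes([0x61, 0x5B, 0x5B])
chunks = chunkify_31(code)
print(len(chunks), chunks[0][0], chunks[1][0])  # 2 0 2
```

The leading byte `2` on the second chunk tells a verifier that its first two bytes are push data, so the `0x5B` bytes there are not valid jump destinations.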
Important note: The 32-byte chunker proposal requires an auxiliary table, but there’s no clear spec on where it should live, which can impact gas usage. I’ve decided to store it at the start of the bytecode without any EOF-container format: `bytecode = <table_size_varint> | <auxiliary_table> | <contract_bytecode>`. This is probably the most compact way to store it. A real spec might have another format.
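To make that layout concrete, here is a minimal sketch of packing and unpacking it. It assumes the varint is an unsigned LEB128 value giving the table size in bytes, and treats the table contents as opaque bytes; neither detail is specified above, and the function names are hypothetical.

```python
def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint (assumed encoding; not specified in the post)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(data: bytes):
    """Return (value, number_of_bytes_consumed)."""
    value = shift = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


def pack(aux_table: bytes, contract_bytecode: bytes) -> bytes:
    """<table_size_varint> | <auxiliary_table> | <contract_bytecode>"""
    return encode_uvarint(len(aux_table)) + aux_table + contract_bytecode


def unpack(blob: bytes):
    """Return (auxiliary_table, contract_bytecode)."""
    size, used = decode_uvarint(blob)
    return blob[used:used + size], blob[used + size:]


blob = pack(b"\x01\x02\x03", b"\x60\x00\x5b")
print(blob.hex())  # 0301020360005b
```

With this layout a contract with an empty table pays a single extra byte, which is why a prefix of this kind is compact.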
This is the hardest metric to estimate theoretically since we’d need a very good low-level intuition of code access execution. The new analysis pipeline is most useful here.
The 32-byte chunker uses ~1.5% less total gas than the 31-byte chunker over the full ~1 million tx run. Recall that contracts weren’t compiled with any optimization favoring either chunking strategy, and the 32-byte chunker table-encoding format was optimized to take as few bytes as possible. The byte saved per chunk more than offsets the newly required auxiliary table size.
Let’s look at the same plot we showed before regarding `byte31_chunker_gas/receipt_gas` but with the 32-byte chunker:
The relative overhead was reduced by ~1.5% (i.e. `32.57*(1-0.015)~=32.09`) as expected. The shape of the histogram is similar to the 31-byte chunker.
This metric is easier to estimate theoretically:
- 31-byte chunker: […] `original_size*32/31` (+3.23%).

Let's double-check those claims with all the contracts executed directly or indirectly (sub-calls) in our tx executions:
Notes:
An analysis pipeline now exists to take fresh mainnet txs and simulate how Verkle's new code-access rules impact gas costs and chunked contract sizes. As previously said, code-access gas isn't the only gas change in Verkle, so for good or bad there can be other overheads or reductions in gas usage, but code-access probably has the biggest impact.
The 31-byte chunker was compared against a proposed 32-byte chunker, which can help decide the best code-chunker, considering benefits and implementation complexity. Any new or existing chunker variant can be added to the pipeline to keep exploring the solution space.
The following is a scoped analysis on txs with the `To` field targeting some of the top 10 Top Burners:
Notice how for Uniswap: Universal Router the number of executed contracts per tx is way above the mean of mainnet txs. This makes sense, since Uniswap uses multiple liquidity pools (thus contracts) to make a swap between tokens A and B that don't have a dedicated pool.
Some of the worst cases, with many charged `WITNESS_BRANCH_COST`s: Tx 1, Tx 2, Tx 3. We can see how this correlates with the tx interacting with many pools in the swap route, which is unfortunate for code-chunk accessing.
Here we see clearly that Tether only has 1 contract involved in the tx, which makes sense since it's an ERC-20 contract. For that reason, the overall overhead of code-chunk gas is lower than average (20.59% < ~31%).