Thanks to Guillaume Ballet, Gottfried Herold, Łukasz Rozmej, and Josh Rudolf for the fruitful discussions.
Moving Ethereum to Verkle Trees requires a bunch of changes at many layers. Apart from the data structure, gas model, and cryptography changes, the chain will have to migrate the state of the current Merkle Patricia Trie to the new Verkle Tree (assuming no State Expiry is implemented).
Without getting into details about the migration method, the current path is to do it via the Overlay Tree method. This means we’ll migrate an estimated 1 billion key-values from the Merkle Patricia Trie to the Verkle Tree, moving a fixed batch of them on each block. I highly recommend watching this EthCC 2023 talk if you want more details.
So, what is the relationship between migrating X key-values per block and preimages? On each block, nodes will deterministically walk the MPT, take the next key-values to migrate, and insert those same key-values into the VKT. But there’s a catch: the keys in the MPT are Keccak-hashed values, and we need their preimages to compute the corresponding new keys in the VKT.
More concretely, suppose one of the key-values to migrate has the MPT key 0xabddde..., which results from keccak(someAddress). To know where to store someAddress’s account information in the new VKT, we need to apply a different kind of hashing (the Pedersen Hash) to someAddress. Walking the MPT only gives us 0xabddde..., but we need the preimage of that hash (i.e: someAddress) to be able to migrate it.
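To make this concrete, here is a minimal Go sketch of the two derivations. The Keccak call uses go-ethereum’s actual crypto package; pedersenHash is a hypothetical stand-in for the real VKT tree-key derivation (defined in the EIP), whose details don’t matter for this point:

```go
package main

import (
	"fmt"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/crypto"
)

// pedersenHash is a hypothetical stand-in for the real VKT tree-key
// derivation; only the fact that it's a *different* hash of the
// address matters here.
func pedersenHash(preimage []byte) []byte {
	return crypto.Keccak256(append([]byte("vkt-sketch:"), preimage...))
}

func main() {
	address := common.HexToAddress("0x0000000000000000000000000000000000000001")

	// Where the account lives in the current MPT: keccak(someAddress).
	mptKey := crypto.Keccak256(address.Bytes())

	// Where it must live in the VKT: another hash of the same address.
	vktKey := pedersenHash(address.Bytes())

	// mptKey can't be transformed into vktKey directly: Keccak is
	// one-way, so the migration needs the preimage (the address).
	fmt.Printf("MPT key: %x\nVKT key: %x\n", mptKey, vktKey)
}
```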
Many client implementations have a flag to enable preimage recording. As an example, you can check Geth’s documentation regarding preimage recording. Since this is an optional feature (not enabled by default), most validators don’t record preimages; thus, they can’t do the reverse lookup to transform 0xabddde... into someAddress. Other clients, such as Erigon, store their accounts keyed by the plain address (i.e: the preimage itself), so they don’t need to fetch preimages from external sources.
In summary, the Overlay tree migration strategy requires all validators to have access to a complete set of preimages for the last (frozen) MPT from where data will be migrated. Thus, we need some strategy to generate and distribute the preimages to all nodes before the migration starts.
Before describing the currently discussed solutions, it helps to have a visual timeline of some facts about how the Verkle Trees EIP will unfold and define some milestones and terms.
Let’s try to unpack the above diagram since it will be the base diagram for later explaining potential options for generating and distributing preimages.
In slot A, we see the VKT activation. This activation means that the Overlay Tree gets activated, which implies validators will have two trees:
- the current MPT, which becomes read-only (frozen) from this point on, and
- a new VKT overlaid on top of it, which receives all new state writes.
Between slots A and C, the only key-values that will be inserted in the VKT are those written by new blocks (e.g: account updates and storage writes via SSTORE). Note that the Sweeping phase hasn’t started yet; soon we’ll explain what this phase is about.
This slot isn’t especially relevant for the VKT EIP since nothing changes in validators’ logic. It’s just an observation that, from this point forward, no reorgs will make clients revert to a state before the VKT activation.
The Sweeping phase starts. In the Sweeping phase, we migrate X key-values from the MPT to the VKT on each block. For simplicity, the diagram above shows Slot B < Slot C, but that’s not strictly needed; it depends on the solution (i.e: Option 1., to be explained soon, doesn’t require this condition).
This is a critical slot because network validators must have the complete preimages database before starting the Sweeping phase. If that isn’t the case, a validator walking the MPT to migrate key-values can hit a key that can’t be resolved to its preimage, which will completely block the client from executing the block.
Theoretically, you could start this phase even with a partial preimages database, since it’s expected to be consumed in order (sequentially). Exploiting this might be up to each client, but it comes with risks compared to planning for validators to pull the entire database ahead of time. It can make sense for validators who join the network close to this slot.
The Sweeping phase finishes, meaning all the MPT key-values were already migrated to the VKT. The VKT has the complete state of the Ethereum chain.
We could also imagine some Slot E where this event gets finalized, and clients can do further cleaning tasks considering no reorg can jump back to the Sweeping phase.
Now that we understand why these preimages are needed and have some mental model of how the Verkle Trees deployment will (probably) unfold, let’s remember our goal: How can we ensure that network validators have the preimages database when the Sweeping phase starts?
Note that there are two angles to this question: how the preimages database gets generated, and how it gets distributed so every validator has it in time.
In the following two sub-sections, we’ll discuss two proposals. Right after, we’ll go through some general questions to help compare how both options have different tradeoffs.
Remember, this is an active discussion — better approaches might be discovered. What “better” means is not precisely defined, so we want to open this discussion as much as possible.
The options are described in no particular order.
Let’s look at the following diagram explaining this option:
Link to bigger image.
Note: the red elements were added compared to our base diagram. The distance between slots might not be accurate. Slot B is irrelevant (it can be ignored; it might happen before or after Slot C).
We first establish a loosely defined period in which all EL clients will release a highly recommended version that enables preimage recording by default. There’s no strict point where all clients must coordinate, but this period should be well before Slot A.
Around the same time, some network actors (to be explained later) will start generating preimage database versions based on finalized MPTs. These preimage databases will be published and made available for the network to download.
For validators that:
- run a client version with preimage recording enabled, and
- have pulled a preimage database published after they enabled recording,
the downloaded database plus their locally recorded preimages will cover every preimage needed for the migration.
The explanation of why this works is simple, but we’ll dive into more details later.
Some general observations about this solution:
Let’s look at the following diagram explaining this option:
Link to bigger image.
Note: the red elements were added compared to our base diagram. The distance between slots might not be accurate. The red “circle” means the uniquely generated and published database.
This proposal avoids the need for preimage recording in validators by generating the preimages database only after Slot B, when the MPT contents are final. Compared to Option 1., no new release needs to be shipped by EL clients to enable preimage recording.
To put it concretely, the preimages generated in this option can be viewed as those generated in Option 1., plus the preimages validators would have recorded themselves. When the Sweeping phase starts, validators will find every required preimage in this single generated database (i.e: they don’t have to check in “two places”).
To understand the different tradeoffs between these options, let’s compare both by looking from different angles.
By correctness, we mean that any validator, at Slot C and onward, can resolve the preimage of any MPT key that will be walked to migrate the state.
Option 1:
After enabling preimage recording and pulling a valid preimage database (i.e: published after the recording was activated), the validator can resolve any MPT preimage.
For example, let’s assume we’re at some Slot β after the validator has enabled recording and has pulled a database: any key present in the MPT at Slot β either already existed when the database was generated (so it’s in the pulled database) or was written after recording was enabled (so the validator recorded its preimage locally).
This means that validators will have a complete set of preimages to perform the migration at Slot C (i.e: Slot β = Slot C).
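As a rough sketch of the lookup this implies (both maps are hypothetical stand-ins for the real on-disk stores): the validator checks the pulled database and its locally recorded preimages, and under Option 1’s timing assumptions the union covers every key.

```go
package preimage

import "fmt"

// resolvePreimage resolves an MPT key under Option 1. Both maps are
// hypothetical stand-ins: published is the pulled database (generated
// after recording started), recorded holds preimages captured locally
// since recording was enabled.
func resolvePreimage(mptKey [32]byte, published, recorded map[[32]byte][]byte) ([]byte, error) {
	if p, ok := published[mptKey]; ok {
		return p, nil // key already existed when the database was generated
	}
	if p, ok := recorded[mptKey]; ok {
		return p, nil // key was written after recording was enabled
	}
	// Unreachable under Option 1's timing assumptions: every key either
	// predates the database or was recorded locally.
	return nil, fmt.Errorf("missing preimage for key %x", mptKey)
}
```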
Option 2:
This option creates the preimage database after the MPT is read-only and finalized. Between Slot B and Slot C, no keys can be modified (1. the MPT is read-only since we’re after Slot A; 2. no reorg can modify the consensus around the finalized state).
Note that between Slot A and Slot C, validators will have two active trees. We might need to charge more gas for reading state during this time, since accessing state potentially means accessing two trees, which makes it more costly for validators. This depends on whether we can assume all clients have flat DBs for state, avoiding tree reads.
In Option 1. the time between both slots is smaller than in Option 2. The reason for that is that Option 2. packs most of the preimage database generation and distribution between these slots. In Option 1. the preimage database generation and a big part of the distribution happens before Slot A; thus, the time between Slot A and Slot C can be shorter.
This means Option 1. could be considered better than Option 2. regarding UX.
This is a critical question to answer. If Slot C is defined incorrectly and most of the network doesn’t have all the needed preimages, there can be a liveness problem.
In Option 1. validators have more time to start recording preimages and downloading a preimage database, probably long before Slot A happens. Slot C will be defined when the Verkle EIP client release ships, so we don’t have to strictly “guess” what Slot C should be upfront.
In Option 2. there’s naturally a shorter time between Slot B and Slot C. Additionally, as mentioned in the Impact on Ethereum users section above, there’s a tension between giving the network more time and asking the users to pay an overhead in gas. Option 2 might need social coordination to know when the network is ready while not waiting too long to avoid this UX impact.
Some points about this:
Who can generate these preimage databases? Any validator that has been running with preimage recording, or one running specific clients that can take advantage of their design to dump the preimages (e.g: Erigon).
Do we need to trust whoever generates the database? No. This database is easily verifiable by any validator. The way to verify it is to walk the MPT and check that all the preimages needed for the Sweeping phase can be resolved. This verification must happen before the Sweeping phase starts and can potentially be done in the background by validators.
Recall that even one missing preimage can completely block a validator during the Sweeping phase, so this verification step is essential.
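Here’s a sketch of that verification, under the assumption that we can stream the frozen MPT’s keys (mptKeys) and query the downloaded database (lookup); both arguments are hypothetical stand-ins for client internals:

```go
package preimage

import (
	"bytes"
	"fmt"

	"github.com/ethereum/go-ethereum/crypto"
)

// verifyPreimageDB checks that every key of the frozen MPT resolves to a
// valid preimage. mptKeys is the ordered stream of keys from walking the
// trie; lookup queries the downloaded database.
func verifyPreimageDB(mptKeys [][]byte, lookup func(key []byte) ([]byte, bool)) error {
	for _, key := range mptKeys {
		preimage, ok := lookup(key)
		if !ok {
			return fmt.Errorf("missing preimage for key %x", key)
		}
		// The database comes from an untrusted source, so also check
		// that the preimage actually hashes back to the key.
		if !bytes.Equal(crypto.Keccak256(preimage), key) {
			return fmt.Errorf("invalid preimage for key %x", key)
		}
	}
	return nil
}
```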
This is something still being discussed, and we need more opinions from the community, but a few candidate options have already been floated.
Both options are possible, and below I’ll list some general questions that we’ve already touched on that can help to continue the conversation:
Take everything said here as brainstorming. Time spent on this topic was mainly ping-pong conversations.
What format will the database have? It will probably be a flat file with some trivial encoding. It won’t be a database with some complex format or similar. Ignacio and Guillaume have experimented with a Geth database generator and a format that is very simple; more will be shared soon in the VKT implementers call.
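Just to illustrate what “a flat file with some trivial encoding” could mean, here’s a toy record layout; this is an assumption for illustration, not the actual format being experimented with:

```go
package preimagefile

import "io"

// writeRecord appends one (hash, preimage) pair using a toy encoding:
// 32-byte keccak hash | 1-byte preimage length | preimage bytes.
// Sorting records by hash lets consumers read the file sequentially.
func writeRecord(w io.Writer, hash [32]byte, preimage []byte) error {
	buf := make([]byte, 0, 33+len(preimage))
	buf = append(buf, hash[:]...)
	buf = append(buf, byte(len(preimage)))
	buf = append(buf, preimage...)
	_, err := w.Write(buf)
	return err
}
```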
Option 1. keeps publishing updated preimage databases for two main reasons.
First, let’s imagine we published this database only once. A new validator joining the chain could then only sync from a point equal to or before the publishing timestamp; otherwise, there would be a gap (i.e: missing preimages) in its combined preimage database + preimage recording set.
Second, republishing databases also allows for less coordination between EL clients publishing new versions with preimage recording enabled. Let’s imagine two extreme cases:
The only downside of this republishing is that devops teams or archive nodes participating in publishing databases will have ongoing costs.
The actual processing to generate a preimage database isn’t costly. A standard machine can generate this database multiple times daily without any problem. Most of the cost might be in infrastructure to automate the generation and uploading, plus bandwidth, anti-DoS, and related expenses for parties that want to be download sources.
Can this database serve more than correctness? Yes. Although the primary goal of generating this database is to be sure every validator has the necessary data to do the Sweeping phase correctly, using it can have another big benefit: improving clients’ efficiency in the Sweeping phase.
This preimages database will be constructed in a specific way such that using it in client implementations of the Sweeping phase logic incurs the least possible disk IO overhead.
For example, if we used a typical Geth-like preimage database, resolving preimages would mean random disk lookups that the host OS can’t optimize. The preimage database we plan to generate is laid out precisely so that resolving preimages can be implemented as a sequential, forward-only read of a flat file, which is very efficient and requires no random lookups. Those random lookups could also interfere with other planned in-order walks of the MPT using “flat snapshots” in clients.
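Here’s a sketch of that sequential resolution, reusing the toy encoding from above and assuming the MPT walk yields keys in the same order the file is sorted by:

```go
package preimagefile

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// readNext decodes one record of the toy encoding sketched earlier:
// 32-byte keccak hash | 1-byte preimage length | preimage bytes.
func readNext(r *bufio.Reader) (hash, preimage []byte, err error) {
	hash = make([]byte, 32)
	if _, err = io.ReadFull(r, hash); err != nil {
		return nil, nil, err
	}
	n, err := r.ReadByte()
	if err != nil {
		return nil, nil, err
	}
	preimage = make([]byte, n)
	_, err = io.ReadFull(r, preimage)
	return hash, preimage, err
}

// resolveSequentially pairs each MPT key (in walk order) with the next
// record of the flat file: a pure forward-only read, no random lookups.
func resolveSequentially(mptKeys [][]byte, r *bufio.Reader) ([][]byte, error) {
	preimages := make([][]byte, 0, len(mptKeys))
	for _, key := range mptKeys {
		hash, preimage, err := readNext(r)
		if err != nil {
			return nil, err
		}
		if !bytes.Equal(hash, key) {
			return nil, fmt.Errorf("file out of sync at key %x", key)
		}
		preimages = append(preimages, preimage)
	}
	return preimages, nil
}
```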
In summary, besides providing all the necessary information, it can also be helpful compared to other generic forms of preimage databases in some clients. Using this generated database for performance reasons isn’t mandatory but just an indirect benefit that can be taken advantage of.
This is still an actively discussed topic. We want more people to share their opinions!