Verkle migration

Motivation

I think it would be preferable to reduce the time pressure on the state migration, so that we are no longer in a rush and do not put too much strain on block processing time. I don't think there is a perfect solution, but I'm trying to explore some ideas and see what we can do.

So?

What happens if, instead of aiming for a quick state migration, we migrate a small number of Patricia Merkle Trie (PMT) leaves to Verkle at a time, slowly enough that we neither require very recent machines nor impact block processing time? Could the migration simply take a long time without causing any problems?

Challenges and Solutions

Managing Database Size and Bug Risks

How can we ensure that this solution does not lead to an overly large database size and also an increased risk of bugs since we have to manage two states for a long period?

The initial idea is to freeze the PMT and the PMT flat DB at the time of the Verkle fork. From then on, every modified or newly created leaf is pushed into the Verkle tree and, in addition, a small number of leaves per block are moved from the PMT to Verkle.

The next step is to keep the PMT until the next finalized block, so that we can still reorg if necessary. Afterward, we keep only the flat DB, which contains accounts, slots, and code. So far, nothing new.

Dual Flat DB Composition

At the time of transition, there will be two flat DBs: one for Verkle (with stems as keys) and the flat PMT (based on accountHash, codeHash, slotHash).
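To make the difference between the two key spaces concrete, here is a minimal sketch of how each flat DB could derive its keys. Everything in it is an assumption for illustration: `stand_in_hash` uses SHA-256 in place of both keccak256 (PMT) and the Pedersen-style hash used for Verkle stems, the storage key layout (accountHash followed by slotHash) is one possible layout, and all function names are hypothetical.

```python
import hashlib


def stand_in_hash(data: bytes) -> bytes:
    """Placeholder hash. Real clients use keccak256 for the PMT and a
    Pedersen-style hash for Verkle stems; SHA-256 is used here only so
    the sketch runs with the standard library."""
    return hashlib.sha256(data).digest()


# --- flat PMT keys (hypothetical layout) ---

def account_hash(address: bytes) -> bytes:
    return stand_in_hash(address)


def slot_hash(slot: int) -> bytes:
    return stand_in_hash(slot.to_bytes(32, "big"))


def pmt_storage_key(address: bytes, slot: int) -> bytes:
    # Assumed layout: accountHash followed by slotHash.
    return account_hash(address) + slot_hash(slot)


# --- Verkle flat DB keys (stem plus a one-byte sub-index) ---

def verkle_stem(address: bytes, tree_index: int) -> bytes:
    # 31-byte stem derived from the zero-padded address and a tree index.
    padded = address.rjust(32, b"\x00")
    return stand_in_hash(padded + tree_index.to_bytes(32, "little"))[:31]


def verkle_key(address: bytes, tree_index: int, sub_index: int) -> bytes:
    # Leaves that share a stem differ only in the last byte.
    return verkle_stem(address, tree_index) + bytes([sub_index])
```

In both schemes the key is the output of a hash over the address (and slot or index), so a key from one flat DB cannot be converted into a key for the other without knowing the original address and index.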

Migration Process

Each leaf that we migrate moves to the Verkle flat DB and is deleted from the PMT flat DB, which keeps the overall state size equivalent. With each block, we also take a few leaves, from left to right, and migrate them. We keep the index of the last migrated leaf so that, at the time of an SLOAD, we know whether we need a fallback read from the PMT flat DB or whether reading the Verkle flat DB is sufficient. The problem is that, in the worst case, this costs two disk accesses, which is not very good for SLOAD.

Unfortunately, snap sync for Verkle only knows the stems. Unless we have a snap sync mechanism that can resync a flat DB by accountHash, slotHash, and codeHash, we will not be able to resync such a flat DB at the time of the snap sync. It is therefore preferable to key the Verkle flat DB by stem.
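The read path and the per-block migration described above can be sketched as follows. This is a toy model under stated assumptions, not a client implementation: the two flat DBs are plain dictionaries, `pmt_key` and `verkle_key` are stand-in SHA-256 derivations, `disk_reads` simply counts lookups, and the class and method names (`DualFlatDb`, `migrate_block`) are hypothetical. It also assumes the preimages of the hashed PMT keys are available, since they are needed to compute the Verkle key of a migrated leaf.

```python
import hashlib


def pmt_key(address: bytes, slot: int) -> bytes:
    """Stand-in for the hashed flat PMT key (real clients use keccak256)."""
    return hashlib.sha256(b"pmt" + address + slot.to_bytes(32, "big")).digest()


def verkle_key(address: bytes, slot: int) -> bytes:
    """Stand-in for the stem-based Verkle flat DB key."""
    return hashlib.sha256(b"vkl" + address + slot.to_bytes(32, "big")).digest()


class DualFlatDb:
    def __init__(self, frozen_slots: dict):
        # frozen_slots: (address, slot) -> value, the state at the fork block.
        self.pmt = {pmt_key(a, s): v for (a, s), v in frozen_slots.items()}
        # Assumption: preimages of the hashed PMT keys are available.
        self.preimages = {pmt_key(a, s): (a, s) for (a, s) in frozen_slots}
        self.verkle = {}
        self.order = sorted(self.pmt)  # left-to-right walk over the frozen PMT
        self.pos = 0
        self.cursor = b""              # last migrated PMT key (b"" = none yet)
        self.disk_reads = 0

    def sload(self, address: bytes, slot: int):
        self.disk_reads += 1
        value = self.verkle.get(verkle_key(address, slot))
        if value is not None:
            return value
        pk = pmt_key(address, slot)
        if pk <= self.cursor:
            return None                # already migrated: no fallback needed
        self.disk_reads += 1           # worst case: second disk access
        return self.pmt.get(pk)

    def sstore(self, address: bytes, slot: int, value: int) -> None:
        # Post-fork writes go to Verkle only; the PMT copy is dropped.
        self.verkle[verkle_key(address, slot)] = value
        self.pmt.pop(pmt_key(address, slot), None)

    def migrate_block(self, batch_size: int) -> int:
        """Move up to batch_size leaves from the PMT flat DB to Verkle."""
        moved = 0
        while moved < batch_size and self.pos < len(self.order):
            pk = self.order[self.pos]
            self.pos += 1
            self.cursor = pk
            if pk in self.pmt:         # skip leaves already rewritten post-fork
                address, slot = self.preimages[pk]
                self.verkle[verkle_key(address, slot)] = self.pmt.pop(pk)
                moved += 1
        if self.pos == len(self.order):
            self.cursor = b"\xff" * 32  # migration complete: never fall back
        return moved
```

In this model the combined number of entries in the two dictionaries never grows because of the migration itself, and once `migrate_block` has walked the whole frozen PMT every `sload` costs a single lookup.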

Addressing Synchronization Challenges

One significant challenge we face is the synchronization process during the migration to Verkle. Our goal is to avoid developing overly complex solutions for this temporary phase.

Implementing Snap Sync for Verkle

A crucial component of our strategy involves creating a snap sync mechanism for Verkle. This will facilitate the downloading of Verkle tree leaves, enabling us to construct and fully synchronize the tree.

Pre-Verkle Fork Data Recovery

Concerning the recovery of the flat DB from before the Verkle fork, we aim to simplify the process. A practical solution is to distribute a pre-image of the flat DB via torrent, containing:

account hash <> value
slot hash <> value
code hash <> value

This approach allows us to easily integrate the pre-image of the PMT flat DB into our client and validate it. After that, we can run the snap sync, which will create the Verkle trie and the Verkle flat DB.

Validation

To validate this received data, we could initially push it into a PMT trie. If the state root matches the last block before the fork, we can remove the PMT entirely, retaining only the Verkle flat DB. This method allows us to directly recover the pre-Verkle flat DB using stem keys, eliminating the need for fallback mechanisms and reducing SLOAD disk access to one. This conversion and validation should occur post-fork.
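The shape of that check can be sketched as below. This is only an illustration of the control flow: `toy_root` is a placeholder commitment (a hash over the sorted entries), not a real PMT state root, and `validate_snapshot` is a hypothetical name. A client would instead insert the entries into its actual trie implementation and compare the resulting root with the state root of the last pre-fork block.

```python
import hashlib
from typing import Callable, Dict

Snapshot = Dict[bytes, bytes]  # hashed key -> value, as received via torrent


def toy_root(entries: Snapshot) -> bytes:
    """Placeholder commitment over the entries. NOT a real PMT root."""
    h = hashlib.sha256()
    for key in sorted(entries):
        value = entries[key]
        # Length-prefix each field so entry boundaries are unambiguous.
        h.update(len(key).to_bytes(4, "big"))
        h.update(key)
        h.update(len(value).to_bytes(4, "big"))
        h.update(value)
    return h.digest()


def validate_snapshot(
    entries: Snapshot,
    expected_root: bytes,
    compute_root: Callable[[Snapshot], bytes] = toy_root,
) -> bool:
    """Accept the snapshot only if it commits to the expected root."""
    return compute_root(entries) == expected_root
```

Passing the root computation as a parameter keeps the check independent of the trie implementation, so the placeholder can be swapped for a real PMT root function.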

Verkle Snap Sync

When we have our pre-Verkle flat DB, we start the Verkle snap sync.

This sync must be able to recover the stems of the tree and recreate the tree. It will also bring the Verkle flat DB up to date with the latest changes. After the snap sync, the node is running and can import new blocks.

Conclusion

Even though it's not ideal, I believe there are no insurmountable obstacles. The primary concern I foresee is the dual flat DB structure, which could increase the cost of SLOAD operations if we cannot find a snap sync solution capable of reconstructing a flat DB using accountHash, codeHash, and slotHash. I think that if we were able to retrieve address+index from a stem, it would fix the double flat DB problem. But I don't think it's a reversible operation.

Therefore, the most significant issue appears to be the increased cost of SLOAD during the transition. However, it's important to note that this cost remains lower with a flat DB than when we had to traverse the tree to retrieve a leaf.