As the blockchain grows, a looming challenge arises: State Bloat. As more user activity flows into the network, more state is created, including accounts and contracts. The growing state imposes a storage burden on nodes, which must store the entire state, and it hurts performance. When people like you and me can no longer run a node, this leads to centralization.
Do you see the flow here? More state leads to higher hardware maintenance costs, reducing the number of people able to run a node, which then leads to centralization. In other words, state bloat puts the entire blockchain at risk if not properly addressed.
The underlying idea is to remove redundant data, that is, data that has not been accessed for a long time. We can achieve this with state expiry. State expiry allows us to temporarily remove inactive data from the blockchain state, so nodes only have to store the most recent data (e.g., the past 6 months) needed to execute a block. If a particular state is needed in the future, anyone can perform a state revive by submitting a cryptographic proof.
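To make the expire-and-revive cycle concrete, here is a minimal toy sketch. It is not how Geth or the Verkle tree implements it: the class `ToyExpiringState`, the `commit` helper, and the `EXPIRY_PERIOD` constant are hypothetical names, and a plain SHA-256 hash stands in for a real cryptographic proof.

```python
import hashlib

# Assumption: expire anything untouched for this many blocks (roughly 6 months).
EXPIRY_PERIOD = 1_051_200


def commit(key: bytes, value: bytes) -> bytes:
    """Toy commitment: a plain hash standing in for a real Verkle proof."""
    return hashlib.sha256(key + b"|" + value).digest()


class ToyExpiringState:
    """Hypothetical in-memory state that drops inactive entries."""

    def __init__(self) -> None:
        self.active = {}   # key -> (value, last_access_block)
        self.expired = {}  # key -> commitment kept after the value is dropped

    def write(self, key: bytes, value: bytes, block: int) -> None:
        self.expired.pop(key, None)
        self.active[key] = (value, block)

    def read(self, key: bytes, block: int) -> bytes:
        value, _ = self.active[key]        # raises KeyError if expired or absent
        self.active[key] = (value, block)  # a read refreshes the entry
        return value

    def expire(self, block: int) -> int:
        """Drop entries not accessed within EXPIRY_PERIOD; return how many."""
        stale = [k for k, (_, last) in self.active.items()
                 if block - last > EXPIRY_PERIOD]
        for k in stale:
            value, _ = self.active.pop(k)
            self.expired[k] = commit(k, value)
        return len(stale)

    def revive(self, key: bytes, value: bytes, block: int) -> bool:
        """Restore an expired entry if the submitted value matches the commitment."""
        expected = self.expired.get(key)
        if expected is None or commit(key, value) != expected:
            return False
        del self.expired[key]
        self.active[key] = (value, block)
        return True


state = ToyExpiringState()
state.write(b"alice", b"10", block=100)
state.write(b"bob", b"5", block=1_000_000)
expired_count = state.expire(block=1_100_000)          # only alice is stale
revived = state.revive(b"alice", b"10", block=1_100_001)
```

In this sketch the node keeps only a small commitment for each expired entry, and the party reviving the entry supplies the value itself. That mirrors the idea in the paragraph above: the data leaves the active state, but it can be brought back by anyone who can prove what it was.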
Okay, that's the state expiry part. But why Verkle tree? Because the Verkle tree is inevitable. It brings numerous benefits such as stateless clients and faster sync time. Hence, it is worth exploring how state expiry looks in the Verkle tree.
The primary goals are:
During the development process, it would also be beneficial to:
The PoC illustrates the outcome of enabling state expiry in a post-Verkle environment. I simulated an on-chain scenario where accounts expire and are later revived by submitting proofs.
To build the PoC, I wrote bash scripts to deploy nodes on a local private testnet. The testnet runs on PoA, so only execution clients (i.e., Geth) are required. The genesis file is also modified to include the state expiry hard forks.
The following table shows the key-value pairs (i.e., accounts and storage slots) accessed in each block range of 1,051,200 blocks (approximately 6 months):
Block Range | Accounts | Storage Slots | Accounts (%) | Storage Slots (%) | Key-Value Pairs (%) |
---|---|---|---|---|---|
1-1051199 | 29133 | 528590 | 0.01% | 0.06% | 0.05% |
1051200-2102399 | 355666 | 1045690 | 0.16% | 0.12% | 0.13% |
2102400-3153599 | 19442427 | 984221 | 9.02% | 0.12% | 1.92% |
3153600-4204799 | 3909482 | 10221100 | 1.81% | 1.20% | 1.33% |
4204800-5255999 | 19535302 | 45281140 | 9.06% | 5.33% | 6.08% |
5256000-6307199 | 12682407 | 78778712 | 5.88% | 9.27% | 8.58% |
6307200-7358399 | 11209987 | 57899289 | 5.20% | 6.81% | 6.48% |
7358400-8409599 | 13752072 | 53722555 | 6.38% | 6.32% | 6.33% |
8409600-9460799 | 11168617 | 59364265 | 5.18% | 6.98% | 6.62% |
9460800-10511999 | 17629223 | 81365754 | 8.18% | 9.57% | 9.29% |
10512000-11563199 | 22344746 | 70720477 | 10.37% | 8.32% | 8.73% |
11563200-12614399 | 30045584 | 74031128 | 13.94% | 8.71% | 9.77% |
12614400-13665599 | 22575867 | 136646139 | 10.47% | 16.07% | 14.94% |
13665600-14716799 | 23524144 | 150006706 | 10.91% | 17.64% | 16.28% |
14716800-14907811 | 7360654 | 29627335 | 3.41% | 3.48% | 3.47% |
Total | 215565311 | 850223101 | 100.00% | 100.00% | 100.00% |
Looking at the combined block range between 13665600 (22nd Nov 2021) and 14907811 (5th Jun 2022), only 19.75% of the total key-value pairs were accessed. This means that the remaining 80.25% of the key-value pairs were not accessed at all during that period and are practically redundant. They are just wasting nodes' storage space!
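The shares of recently accessed and untouched key-value pairs can be recomputed from the table. The sketch below copies the account and storage slot counts from the table rows; the function names are hypothetical, and the 15-second block time is an assumption chosen because it makes one range of 1,051,200 blocks come out at about half a year.

```python
BLOCKS_PER_RANGE = 1_051_200
ASSUMED_BLOCK_TIME_S = 15  # assumption, not stated in the table

# (first_block, accounts, storage_slots), copied from the table above
ROWS = [
    (1, 29_133, 528_590),
    (1_051_200, 355_666, 1_045_690),
    (2_102_400, 19_442_427, 984_221),
    (3_153_600, 3_909_482, 10_221_100),
    (4_204_800, 19_535_302, 45_281_140),
    (5_256_000, 12_682_407, 78_778_712),
    (6_307_200, 11_209_987, 57_899_289),
    (7_358_400, 13_752_072, 53_722_555),
    (8_409_600, 11_168_617, 59_364_265),
    (9_460_800, 17_629_223, 81_365_754),
    (10_512_000, 22_344_746, 70_720_477),
    (11_563_200, 30_045_584, 74_031_128),
    (12_614_400, 22_575_867, 136_646_139),
    (13_665_600, 23_524_144, 150_006_706),
    (14_716_800, 7_360_654, 29_627_335),
]


def range_length_days() -> float:
    """Length of one block range in days under the assumed block time."""
    return BLOCKS_PER_RANGE * ASSUMED_BLOCK_TIME_S / 86_400


def total_pairs() -> int:
    """Total key-value pairs (accounts plus storage slots) over all ranges."""
    return sum(accounts + slots for _, accounts, slots in ROWS)


def share_accessed_since(first_block: int) -> float:
    """Percentage of key-value pairs in ranges starting at or after first_block."""
    recent = sum(a + s for start, a, s in ROWS if start >= first_block)
    return 100 * recent / total_pairs()


recent_share = share_accessed_since(13_665_600)
untouched_share = 100 - recent_share

print(f"one range is about {range_length_days():.1f} days")
print(f"accessed since block 13665600: {recent_share:.2f}%")
print(f"not accessed since then:       {untouched_share:.2f}%")
```

The two printed percentages always add up to 100%, which is a quick sanity check on any figures quoted from the table.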
I did two conversions from MPT to VKT with 50 million key-value pairs. The first conversion uses the normal VKT, while the second uses my modified VKT. The result shows a reduction of about 27% in storage space.
In the initial phase of the fellowship program, I wrote a document on all the available state bloat solutions that I could find.
Here's the link to the document.
Because the Verkle tree may not yet be part of a stable Geth release, modifying the VKT source code directly was challenging, but it worked out in the end. The project showcased what state expiry would look like and clearly showed the storage reduction before and after state expiry.
Based on my findings so far, nobody has done a key-value pair analysis on Ethereum mainnet. By analyzing the key-value pairs, I was able to find out the exact amount of redundant data that exists on-chain.
The PoC works, but not for all scenarios. There were certain cases where the local testnet broke down due to some bugs. While the minimal scenario passes, it would be great to have more test cases pass to show the robustness of the state expiry scheme.
I ran a local self-hosted node for 2 months, and it broke down so many times that I had to restart the process again and again. Around week 15, a power outage corrupted the database. In the end, I was only able to sync the node from genesis up to block 14907811.
I didn't expect the conversion to take so much time. In the end, I was only able to convert 50 million key-value pairs, which is just ~5% of the total key-value pairs (50 million out of roughly 1.07 billion in the table above).
And also, I should have enabled preimages collection at the start!
The key-value pairs analysis is certainly an interesting one, and it'll help with the future research of any state expiry scheme. I'd probably improve and optimize the performance of the key-value collection method and resync from genesis again. This time, I must have a fail-safe node setup.
While coding on the VKT components, I found out that there is technical debt that needs to be addressed, as well as existing components that have not been integrated. It'll be a great opportunity for me to contribute to the VKT codebase in order to push things to the mainnet.
Special thanks to Ignacio and Guillaume for giving advice and helping me along the way. I certainly wouldn't have made it this far without their support.
The fellowship program has undoubtedly been one of the best things to happen this year. Up to this moment, I remain deeply grateful for being part of the fellowship program and having the opportunity to contribute back to the Ethereum ecosystem.
As the fellowship program comes to an end, it signifies the commencement of a new journey. This time, it's a fresh challenge for me to contribute to more current Ethereum ecosystem projects. I can't wait to continue growing in the future!