# Han (weihan) - Final Development Update

## 🌲 Why Post-Verkle State Expiry?

### Significance of the project

As the blockchain grows, a looming challenge arises: state bloat. As more user activity flows into the network, more state is created, including accounts and contracts. The growing state imposes a storage burden on nodes, which must store the entire state, and it hurts their performance. When individuals like you and me can no longer run a node, the network drifts toward centralization.

Do you see the flow here? More state leads to higher hardware maintenance costs, which reduces the number of people able to run a node, which in turn leads to centralization. In other words, state bloat puts the entire blockchain at risk if it is not properly addressed.

The underlying idea is to remove redundant data: data that has not been accessed for a long time. We can achieve this with state expiry. State expiry allows us to temporarily remove inactive data from the blockchain state, so nodes only have to store the most recent data (i.e., the past 6 months) needed to execute a block. If a particular piece of state is needed in the future, anyone can revive it by submitting a cryptographic proof.

Okay, that's the state expiry part. But why the Verkle tree? Because the Verkle tree is inevitable. It brings numerous benefits, such as stateless clients and faster sync times. Hence, it is worth exploring what state expiry looks like in a Verkle tree.

### Goals & Objectives

The primary goals are:

1. Demonstrate a PoC simulating the effect of state expiry using the Verkle tree as the underlying state data structure.
2. Analyze the amount of storage reduction before and after implementing this state expiry scheme.

During the development process, it would also be beneficial to:

1. Determine the amount of redundant state in the blockchain.
2. Compile all existing state bloat solutions for future reference.

## 🎯 Project Outcome

### Yes, the PoC works!
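As a rough illustration of the expire-then-revive idea described in the introduction, here is a toy model. It is not the PoC's code and does not use a Verkle tree: the names `ToyState`, `commit`, and `EPOCH_LENGTH` are hypothetical, a plain SHA-256 hash stands in for a real cryptographic proof, and the rule "expire anything untouched in the current or previous period" is an assumption made only for this sketch.

```python
import hashlib
from dataclasses import dataclass, field

# Assumption: one period is 1,051,200 blocks (roughly 6 months).
EPOCH_LENGTH = 1_051_200


def commit(key: bytes, value: bytes) -> bytes:
    """Toy commitment: a hash standing in for a real Verkle proof."""
    return hashlib.sha256(len(key).to_bytes(4, "big") + key + value).digest()


@dataclass
class ToyState:
    live: dict = field(default_factory=dict)     # key -> (value, last-access epoch)
    expired: dict = field(default_factory=dict)  # key -> commitment only

    def touch(self, key: bytes, value: bytes, block: int) -> None:
        """Write or access a key, recording the epoch of the access."""
        self.live[key] = (value, block // EPOCH_LENGTH)

    def expire(self, block: int) -> int:
        """Drop values not accessed in the current or previous epoch."""
        current = block // EPOCH_LENGTH
        stale = [k for k, (_, e) in self.live.items() if e < current - 1]
        for k in stale:
            value, _ = self.live.pop(k)
            # Keep only a small commitment so a later revival can be checked.
            self.expired[k] = commit(k, value)
        return len(stale)

    def revive(self, key: bytes, value: bytes, block: int) -> bool:
        """Restore an expired key if the submitted value matches its commitment."""
        if self.expired.get(key) != commit(key, value):
            return False
        del self.expired[key]
        self.touch(key, value, block)
        return True
```

In this sketch, a node that calls `expire` keeps full values only for recently touched keys, and `revive` succeeds only when the submitter provides the exact value that was expired. A real scheme would verify a proof against the tree's root instead of storing a per-key hash.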
The PoC illustrates the outcome of enabling state expiry in a post-Verkle environment. I simulated an on-chain scenario where accounts expire and are later revived by submitting proofs.

To build the PoC, I wrote bash scripts to deploy nodes on a local private testnet. The testnet runs on PoA, so only execution clients (i.e., Geth) are required. The genesis file is also modified to include the state expiry hard forks.

### How much useless state? I got it.

The following table shows the key-value pairs (i.e., accounts and storage slots) accessed in each range of 1,051,200 blocks (approximately 6 months each). The last column is the share of all key-value pairs (accounts plus storage slots) that fall in that range:

| Block Range | Accounts | Storage Slots | Accounts (%) | Storage Slots (%) | Combined (%) |
| -------- | -------- | -------- | -------- | -------- | -------- |
| 1-1051199 | 29133 | 528590 | 0.01% | 0.06% | 0.05% |
| 1051200-2102399 | 355666 | 1045690 | 0.16% | 0.12% | 0.13% |
| 2102400-3153599 | 19442427 | 984221 | 9.02% | 0.12% | 1.92% |
| 3153600-4204799 | 3909482 | 10221100 | 1.81% | 1.20% | 1.33% |
| 4204800-5255999 | 19535302 | 45281140 | 9.06% | 5.33% | 6.08% |
| 5256000-6307199 | 12682407 | 78778712 | 5.88% | 9.27% | 8.58% |
| 6307200-7358399 | 11209987 | 57899289 | 5.20% | 6.81% | 6.48% |
| 7358400-8409599 | 13752072 | 53722555 | 6.38% | 6.32% | 6.33% |
| 8409600-9460799 | 11168617 | 59364265 | 5.18% | 6.98% | 6.62% |
| 9460800-10511999 | 17629223 | 81365754 | 8.18% | 9.57% | 9.29% |
| 10512000-11563199 | 22344746 | 70720477 | 10.37% | 8.32% | 8.73% |
| 11563200-12614399 | 30045584 | 74031128 | 13.94% | 8.71% | 9.77% |
| 12614400-13665599 | 22575867 | 136646139 | 10.47% | 16.07% | 14.94% |
| 13665600-14716799 | 23524144 | 150006706 | 10.91% | 17.64% | 16.28% |
| 14716800-14907811 | 7360654 | 29627335 | 3.41% | 3.48% | 3.47% |
| **Total** | **215565311** | **850223101** | **100.00%** | **100.00%** | **100.00%** |

Looking at the combined block range between [13665600](https://etherscan.io/block/13665600) (22nd Nov 2021) and
[14907811](https://etherscan.io/block/14907811) (5th Jun 2022), only **19.75%** of the total key-value pairs were accessed. This means the remaining 80.25% of the key-value pairs were not accessed at all in that period and are practically redundant. They are just wasting nodes' storage space.

### So, what's the impact of this state expiry scheme?

I did two conversions from MPT to VKT with 50 million key-value pairs. The first conversion uses the normal VKT, while the second uses my modified VKT. The result shows a roughly 27% reduction in storage space.

### Right here, all the state bloat solutions.

In the initial phase of the fellowship programme, I wrote a document on all the available state bloat solutions that I could find. Here's the [link](https://hackmd.io/NZn8QMkATQOdEAncOu9ZMQ) to the document.

## 📊 Project Evaluation

### What Went Well

#### Working PoC with impact measurement

Considering that the Verkle tree may not yet be in a stable Geth release, modifying the VKT source code directly was challenging, but it worked out in the end. The project was able to showcase what state expiry would look like and clearly showed the amount of storage reduction before and after state expiry.

#### The first ever key-value pairs analysis

Based on my findings so far, nobody has done a key-value pair analysis on Ethereum mainnet before. By analyzing the key-value pairs, I was able to find out the exact amount of redundant data that exists on-chain.

### Challenges Faced

#### It needs more testing scenarios

The PoC works, but not for all scenarios. There were certain cases where the local testnet broke down due to bugs. While the minimal scenario passes, it would be great to have more test cases pass to show the robustness of the state expiry scheme.

#### Running a self-hosted node is harder than I thought

I ran a local self-hosted node for 2 months, and it broke down so many times that I had to restart the process repeatedly.
Around week 15, a power outage corrupted the database. In the end, I was only able to sync the node from genesis up to block 14907811.

#### Conversion from MPT to VKT took longer than expected

I didn't expect the conversion to take so much time. In the end, I was only able to convert 50 million key-value pairs, which is just ~5% of the total key-value pairs (50 million of the roughly 1.07 billion counted in the table above). Also, I should have enabled preimage collection at the start!

## 🔮 Future Plans

The key-value pairs analysis is certainly an interesting one, and it will help with future research on any state expiry scheme. I'd probably improve and optimize the performance of the key-value collection method and resync from genesis again. This time, I must have a fail-safe node setup.

While coding on the VKT components, I found that there is some technical debt that needs to be addressed, as well as existing components that have not been integrated yet. It will be a great opportunity for me to contribute to the VKT codebase in order to push things to mainnet.

## 🙏 Acknowledgements

Special thanks to Ignacio and Guillaume for giving advice and helping me along the way. I certainly wouldn't have made it this far without their support.

## 🏠 Feedback on EPF

#### What went well:

1. Weekly AMA sessions were really great. I got to learn directly from the experience of the current core devs.
2. Full autonomy and support to select our own research projects.
3. Grants and Devconnect subsidies. Without a financial burden, I could focus on research and development, and I gained the opportunity to meet and network with like-minded peers.
4. It's really easy to check out the projects that have been done before and refer to their resources.

#### Even better if:

1. I feel like the weekly Discord threads didn't work out as intended, as not all fellows posted their updates there. I personally feel that it's double work because I had already updated the GitHub repo.
2. We can have more exposure for each fellow's project. Perhaps we can use social media to showcase what each fellow is doing.
3. More hands-on workshops. For example, it would be great to showcase the development process of core devs building a feature on client nodes.

## 🔚 Bye EPF, it was a great one

The fellowship program has undoubtedly been one of the best things to happen this year. Up to this moment, I remain deeply grateful for being part of the fellowship program and having the opportunity to contribute back to the Ethereum ecosystem.

As the fellowship program comes to an end, a new journey begins. This time, it's a fresh challenge for me to contribute to more current Ethereum ecosystem projects. I can't wait to continue growing in the future!