Summary

This week, I completed the code for the storage analysis, but found some critical issues in the design, so I had to redo it. I also connected with Stellar's state expiry dev and continued my conversation with the relevant mentors, and I've begun writing my project proposal.

Storage Analysis

I've managed to set up a node on mainnet and let it run a full sync for 1 day. Here's a quick snippet:

As shown, we are able to see how much storage is being used for a certain block range (customizable).

But there's a huge flaw with this approach. Geth currently uses a hash-based scheme: each MPT node is stored in the database with its hash as the key and its RLP-encoded data or raw value bytes as the value. As we know, any change to a leaf node (value) propagates up to its parent nodes and changes their hashes, so each update writes nodes under new keys while stale ones can remain. Therefore, my current approach of storing key-value pairs as hash(node)-blocknum will not work, because the database is guaranteed to contain redundant trie nodes. Hence, the analysis result will be flawed.
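To make the redundancy concrete, here is a minimal sketch of a hash-keyed node store. It is a toy two-level tree, not Geth's actual MPT or database layout: `commit_by_hash` is a hypothetical helper, and SHA-256 stands in for Keccak-256 so the example only needs the standard library.

```python
import hashlib


def h(data):
    # Assumption: SHA-256 stands in for the Keccak-256 hash Geth uses.
    return hashlib.sha256(data).digest()


def commit_by_hash(db, leaf_values):
    """Store a toy two-level tree (root -> leaves) keyed by node hash."""
    leaf_hashes = []
    for value in leaf_values:
        key = h(value)
        db[key] = value  # leaf entry: hash -> raw value bytes
        leaf_hashes.append(key)
    root_body = b"".join(leaf_hashes)
    root_hash = h(root_body)
    db[root_hash] = root_body  # root entry: hash -> concatenated child hashes
    return root_hash


hash_db = {}
root_v1 = commit_by_hash(hash_db, [b"a", b"b"])
entries_v1 = len(hash_db)  # two leaves + one root
root_v2 = commit_by_hash(hash_db, [b"a", b"c"])  # change one leaf
entries_v2 = len(hash_db)  # one new leaf + one new root were added
```

After the second commit the old root and the old leaf are still in the store, which is why counting hash-keyed entries per block range overstates how much live state there is.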

With Geth's recent release of the Path-Based State Scheme (PBSS), this redundancy is reduced because each trie node is stored as path-value instead of hash-value. However, if the MPT structure changes, there will still be some data redundancy.
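As a rough illustration of why path keys reduce redundancy, the sketch below stores a toy two-level tree keyed by node path, so an updated node overwrites its previous version. This is not Geth's PBSS implementation: `commit_by_path`, the string paths, and the use of SHA-256 are all assumptions made for the example, and structural changes to the tree are not modelled.

```python
import hashlib


def h(data):
    # Assumption: SHA-256 stands in for the Keccak-256 hash Geth uses.
    return hashlib.sha256(data).digest()


def commit_by_path(db, leaf_values):
    """Store a toy two-level tree keyed by path; the root lives at path ''."""
    leaf_hashes = []
    for index, value in enumerate(leaf_values):
        db[str(index)] = value  # same path -> the old version is overwritten
        leaf_hashes.append(h(value))
    db[""] = b"".join(leaf_hashes)
    return h(db[""])


path_db = {}
root_before = commit_by_path(path_db, [b"a", b"b"])
root_after = commit_by_path(path_db, [b"a", b"c"])  # change one leaf
entry_count = len(path_db)  # still one root + two leaves
```

The root hash still changes after the update, but the store keeps a single entry per path instead of accumulating one entry per version.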

Another way to do this storage analysis is to analyze the number of KVs being accessed instead of the raw trie nodes. The general idea is that every time the bottommost diff layer is written to the disk layer, we also write the block number at which each KV was accessed. Most of the code components will look similar, so I aim to finish this feature by the end of Week 5.
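A minimal sketch of that bookkeeping, written in Python for brevity rather than as Geth code. `AccessTracker`, `flush_diff_layer`, and `histogram` are hypothetical names, and the sketch assumes only the most recent access block is kept per key.

```python
from collections import Counter


class AccessTracker:
    """Records, per key, the block number at which it was last accessed."""

    def __init__(self):
        self.last_access = {}  # key -> block number

    def flush_diff_layer(self, block_number, accessed_keys):
        # Called when the bottommost diff layer is merged into the disk layer.
        for key in accessed_keys:
            self.last_access[key] = block_number

    def histogram(self, bucket_size):
        # Count keys by the block range in which they were last accessed.
        counts = Counter()
        for block in self.last_access.values():
            counts[(block // bucket_size) * bucket_size] += 1
        return dict(sorted(counts.items()))


tracker = AccessTracker()
tracker.flush_diff_layer(100, ["acct-a", "acct-b"])
tracker.flush_diff_layer(250, ["acct-b", "acct-c"])
buckets = tracker.histogram(bucket_size=100)
```

Because each key maps to exactly one block number, a key that is touched again simply moves to a later bucket, so the counts are not inflated by stale versions.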

Link to the feature branch. I will make a separate branch for the new approach. Complete documentation on how to reproduce the results will also be prepared in the upcoming weeks.

Stellar x Ethereum

I had a talk with Garand from Stellar to understand their state expiry approach. Garand has also done a lot of research on the various Ethereum state expiry proposals, so I gained tons of insights from the short call with him. Here are some notes:

What is an Expired State Store (ESS) in detail?

  • It's a binary Merkle tree
  • Uses a Bloom filter to generate non-existence proofs
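For readers unfamiliar with Bloom filters, here is a minimal generic sketch. It is not Stellar's implementation; the sizes and the salted SHA-256 hashing are arbitrary choices for illustration. The useful property is that a Bloom filter never gives a false negative, so a "not present" answer is definitive, while a "present" answer may be a false positive.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


expired = BloomFilter()
for key in (b"entry-1", b"entry-2", b"entry-3"):
    expired.add(key)
```

A lookup that returns False can be answered without touching the tree at all; only a True result needs a follow-up check against the actual data.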

One of the biggest advantages of a Verkle tree is its small witness size. Why use a binary Merkle tree instead of a Verkle tree?

  • The witness size for a binary Merkle tree is also relatively small, but lookups take longer because the tree is deeper
  • Stellar's ESS has different access patterns, for which lookup speed doesn't matter
  • So a binary Merkle tree is good enough, and it avoids the complexity of Verkle trees
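The witness-size point can be seen in a generic binary Merkle proof: the witness holds one sibling hash per level, so it grows with the depth of the tree, roughly log2 of the number of leaves. The sketch below is a textbook construction rather than Stellar's ESS format, and it assumes a power-of-two number of leaves and SHA-256.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, leaf hashes first and the root level last."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect the sibling hash at every level below the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # sibling shares the same parent
        index //= 2
    return proof


def verify(root, leaf, index, proof):
    node = h(leaf)
    for sibling in proof:
        # Even index: node is the left child. Odd index: node is the right child.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


leaves = [bytes([i]) for i in range(8)]
levels = build_levels(leaves)
root = levels[-1][0]
proof = prove(levels, 5)  # 8 leaves -> 3 sibling hashes
```

Doubling the number of leaves adds only one more hash to the proof, which is why the witness stays relatively small even though the tree gets deeper.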

If Stellar stores deleted persistent data in the ESS, the amount of deleted data will grow over time, so state bloat still occurs. How is this issue mitigated?

  • Validators do not store the ESS; it is run off-chain by dedicated nodes called ESS nodes
  • ESS nodes earn rewards when they are queried to revive state data
  • ESS nodes can also store only the subset of the state data that they want (i.e. sharding)

What's your opinion on Vitalik's State Expiry EIP?

  • He worked on an epoch-based approach for 6 months, but it just doesn't work and can be attacked very easily
  • It introduces a lot of attack vectors
  • Since it requires proofs of existence, the state data still needs to be available somewhere, so someone still has to store it and keep it easily retrievable
  • As the number of trees grows, computing the proofs becomes slower and more expensive

Any advice on State Expiry for Ethereum?

  • He doesn't think it's that important
  • Validators are already stateless, so it just shifts the responsibility of storing state away from the consensus layer
  • But statelessness will cause MEV to skyrocket. It's hard to control MEV when block proposers are separated from block validators

Chat with Mentors

I've also continued my conversation with Guillaume on Discord. One state expiry approach that seems quite appealing compared to the rest is to expire only the values in the bottommost nodes of the Verkle tree. To paraphrase Guillaume: internal nodes are quite small, so most of the data is in the bottom nodes, and expiring only the values might save a lot of space.

Initially, I thought that expiring values in the leaf nodes would not save much space because internal nodes take up more space, but his view was the opposite. Therefore, I'll continue to explore this approach, as the other state expiry schemes do not seem too appealing at the moment.

Project Proposal

I've just started writing my project proposal and aim to finish it by the end of this week or next week. The general idea is to work on a post-Verkle state expiry scheme, building on the storage analysis project that I've been working on.

Daily Updates

To ensure consistency, I post daily updates (on weekdays) on what I did for EPF. Check out my daily updates this week:

Monday
Tuesday
Wednesday
Thursday
Friday

Next week's Action Items

  • Complete the project proposal
  • Dive deep into Verkle's state expiry scheme
  • Complete the coding part for storage analysis
  • Connect with Piper on state expiry