# Han (weihan) - Update 14, 15 & 16

## Development Summary

The analysis of the Ethereum state size and the key values analysis have been generated. Meanwhile, the conversion code is complete and the conversion process is running. It will take approximately 2 days to finish the conversion analysis (assuming there are no bugs in the process).

Check out my development branch for Verkle Tree [here](https://github.com/weiihann/go-verkle/tree/state-expiry-dev).

Check out my development branch for Geth [here](https://github.com/weiihann/go-ethereum/tree/state-expiry-dev).

Check out my local Verkle testnet setup [here](https://github.com/weiihann/verkle-state-expiry).

## State Storage Analysis

### Update on self-hosted node

There was a power outage while I was running my node, and the database got corrupted, sigh 😔. I tried restarting the node, but got the following error messages:

```
INFO [11-01|07:57:08.387] Allocated trie memory caches clean=614.00MiB dirty=1024.00MiB
INFO [11-01|07:57:08.621] Using pebble as the backing database
INFO [11-01|07:57:08.621] Allocated cache and file handles database=/home/eth-main/execution/data-seed/node-pbss-2/geth/chaindata cache=2.00GiB handles=262,144
INFO [11-01|07:57:11.471] Opened ancient database database=/home/eth-main/execution/data-seed/node-pbss-2/geth/chaindata/ancient/chain readonly=false
INFO [11-01|07:57:11.477] Initialising Ethereum protocol network=1 dbversion=8
WARN [11-01|07:57:11.477] Sanitizing invalid node buffer size provided=1024.00MiB updated=256.00MiB
INFO [11-01|07:57:11.478] Failed to load journal, discard it err="journal not found"
INFO [11-01|07:57:11.534] Opened ancient database database=/home/eth-main/execution/data-seed/node-pbss-2/geth/chaindata/ancient/state readonly=false
CRIT [11-01|07:57:11.607] Failed to truncate extra state histories err=EOF
```

This seems to be an ongoing issue on Geth (refer to [this issue](https://github.com/ethereum/go-ethereum/issues/28105)).
Anyway, the node had synced up to block `14907811` (5th June 2022), which is 1M blocks fewer than expected. Since EPF day is around the corner, there isn't much time to debug and resync the node. Hence, I will just carry on with the analysis, though I would love to redo it after the EPF so that I can get the full analysis.

### Key values analysis

The main reason I did a full sync from genesis is to check the last block number at which each key-value pair was accessed. Here's the output of my analysis:

```
Number of key-value pairs accessed from block 1 to block 14907811
+-------------------+---------------+---------------+--------------------+--------------------+------------------+
|    BLOCK RANGE    | ACCOUNT COUNT | STORAGE COUNT | ACCOUNT PERCENTAGE | STORAGE PERCENTAGE | RANGE PERCENTAGE |
+-------------------+---------------+---------------+--------------------+--------------------+------------------+
| 1-1051199         |         29133 |        528590 |              0.01% |              0.06% |            0.05% |
| 1051200-2102399   |        355666 |       1045690 |              0.16% |              0.12% |            0.13% |
| 2102400-3153599   |      19442427 |        984221 |              9.02% |              0.12% |            1.92% |
| 3153600-4204799   |       3909482 |      10221100 |              1.81% |              1.20% |            1.33% |
| 4204800-5255999   |      19535302 |      45281140 |              9.06% |              5.33% |            6.08% |
| 5256000-6307199   |      12682407 |      78778712 |              5.88% |              9.27% |            8.58% |
| 6307200-7358399   |      11209987 |      57899289 |              5.20% |              6.81% |            6.48% |
| 7358400-8409599   |      13752072 |      53722555 |              6.38% |              6.32% |            6.33% |
| 8409600-9460799   |      11168617 |      59364265 |              5.18% |              6.98% |            6.62% |
| 9460800-10511999  |      17629223 |      81365754 |              8.18% |              9.57% |            9.29% |
| 10512000-11563199 |      22344746 |      70720477 |             10.37% |              8.32% |            8.73% |
| 11563200-12614399 |      30045584 |      74031128 |             13.94% |              8.71% |            9.77% |
| 12614400-13665599 |      22575867 |     136646139 |             10.47% |             16.07% |           14.94% |
| 13665600-14716799 |      23524144 |     150006706 |             10.91% |             17.64% |           16.28% |
| 14716800-14907811 |       7360654 |      29627335 |              3.41% |              3.48% |            3.47% |
+-------------------+---------------+---------------+--------------------+--------------------+------------------+
| TOTAL             |     215565311 |     850223101 |            100.00% |            100.00% |          100.00% |
+-------------------+---------------+---------------+--------------------+--------------------+------------------+
Total account KV count from genesis block to the latest block: 215565311 (20.23%)
Total storage KV count from genesis block to the latest block: 850223101 (79.77%)
Total KV accessed from block 1 to block 14907811: 1065788412 (100.00%)
```

The analysis is shown using a block range of 1051200, which is about 6 months (given a block time of 15s). There are a total of **1.07B key-value pairs** up until block `14907811`.

The most important data point is that only about **19.75%** of the total key-value pairs were last accessed in the two most recent ranges (blocks `13665600` to `14907811`, about 1.24M blocks, or roughly the last 7 months at a 15s block time), which corresponds to about **211M key-value pairs**. In other words, **80.25%** of the total state storage has not been touched since then, and regular full nodes have been storing it for nothing, for free.

### Conversion progress

I got a preimage file from Guillaume, and thanks to that, I can do the conversion from MPT to VKT and analyze the total storage saved with my state expiry scheme.

The analysis process looks something like this:

1. Import preimages to the local chaindata database
2. Run offline conversion from MPT to VKT (no expiry)
3. Inspect database
4. Run offline conversion from MPT to VKT (with expiry)
5. Inspect database
6. Compare the inspect results

For step 1, once I have obtained the preimage file, I can execute the following geth command to import the preimages into my node's database:

```
./geth_verkle db import --datadir node preimages.bin
```

For step 2, there is an existing command that can do an offline conversion to VKT by using the underlying snapshot. I've modified the inner workings to include state expiry logic and cater to my new VKT format.
The command is as follows:

```
./geth_verkle verkle to-verkle --datadir node --stateexpiry true
```

For step 3, there is an existing tool that allows users to inspect the database. Here's an example of the output:

```
+-----------------+-------------------------+------------+-----------+
|    DATABASE     |        CATEGORY         |    SIZE    |   ITEMS   |
+-----------------+-------------------------+------------+-----------+
| Key-Value store | Headers                 |  49.86 MiB |     90001 |
| Key-Value store | Bodies                  |   8.96 GiB |     90001 |
| Key-Value store | Receipt lists           |   4.93 GiB |     90001 |
| Key-Value store | Difficulties            |   4.55 MiB |     90001 |
| Key-Value store | Block number->hash      |   3.60 MiB |     90001 |
| Key-Value store | Block hash->number      | 582.91 MiB |  14907812 |
| Key-Value store | Transaction index       |  14.72 GiB | 439137644 |
| Key-Value store | Bloombit index          |   2.71 GiB |   7456312 |
| Key-Value store | Contract codes          |   3.43 GiB |    600037 |
| Key-Value store | Hash trie nodes         |     0.00 B |         0 |
| Key-Value store | Path trie state lookups |   3.52 MiB |     90001 |
| Key-Value store | Path trie account nodes |  27.16 GiB | 232886629 |
| Key-Value store | Path trie storage nodes |  95.53 GiB | 959044684 |
| Key-Value store | Trie preimages          |  21.37 GiB | 318962126 |
| Key-Value store | Account snapshot        |   7.94 GiB | 174286121 |
| Key-Value store | Storage snapshot        |  50.54 GiB | 709545865 |
| Key-Value store | Account snapshot meta   |   8.23 GiB | 215565311 |
| Key-Value store | Storage snapshot meta   |  57.80 GiB | 850223101 |
| Key-Value store | Beacon sync headers     |   1.86 GiB |   3585187 |
| Key-Value store | Clique snapshots        |     0.00 B |         0 |
| Key-Value store | Singleton metadata      |   7.83 MiB |        15 |
| Light client    | CHT trie nodes          |     0.00 B |         0 |
| Light client    | Bloom trie nodes        |     0.00 B |         0 |
+-----------------+-------------------------+------------+-----------+
|                                     TOTAL | 305.83 GIB |           |
+-----------------+-------------------------+------------+-----------+
```

The command is as follows:

```
./geth db inspect --datadir node
```

## Next week's Action Items

- Complete the analysis on VKT conversion
- Complete the final development update
- Prepare for EPF day presentation
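
As a cross-check on the key values analysis, here is a minimal Python sketch that recomputes the totals and the share of key-value pairs last accessed at or after a chosen block. It is not the tool that produced the table: the rows are copied by hand from the table, the 15s block time is the one assumed in this update, and the names `ROWS`, `total_kv`, `share_since` and `span_days` are hypothetical helpers made up for illustration.

```python
# Sanity check for the key values table (a sketch, not the original analysis tool).
# Each row is (first_block, last_block, account_count, storage_count),
# copied by hand from the table in this update.
ROWS = [
    (1, 1051199, 29133, 528590),
    (1051200, 2102399, 355666, 1045690),
    (2102400, 3153599, 19442427, 984221),
    (3153600, 4204799, 3909482, 10221100),
    (4204800, 5255999, 19535302, 45281140),
    (5256000, 6307199, 12682407, 78778712),
    (6307200, 7358399, 11209987, 57899289),
    (7358400, 8409599, 13752072, 53722555),
    (8409600, 9460799, 11168617, 59364265),
    (9460800, 10511999, 17629223, 81365754),
    (10512000, 11563199, 22344746, 70720477),
    (11563200, 12614399, 30045584, 74031128),
    (12614400, 13665599, 22575867, 136646139),
    (13665600, 14716799, 23524144, 150006706),
    (14716800, 14907811, 7360654, 29627335),
]

BLOCK_TIME_S = 15  # block time assumed in this update


def total_kv(rows):
    """Total number of key-value pairs (accounts plus storage slots)."""
    return sum(acc + sto for _, _, acc, sto in rows)


def share_since(rows, start_block):
    """Fraction of all pairs in ranges that start at or after start_block."""
    recent = sum(acc + sto for first, _, acc, sto in rows if first >= start_block)
    return recent / total_kv(rows)


def span_days(first_block, last_block, block_time_s=BLOCK_TIME_S):
    """Approximate wall-clock length of an inclusive block span, in days."""
    return (last_block - first_block + 1) * block_time_s / 86400


if __name__ == "__main__":
    print(f"total pairs: {total_kv(ROWS)}")
    print(f"share since block 13665600: {share_since(ROWS, 13665600):.2%}")
    print(f"span covered: {span_days(13665600, 14907811):.1f} days")
```

Passing a different `start_block` (for example `12614400`) shows how the accessed share changes as the window grows, which is handy when picking an expiry period.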