The Ethereum state size analysis and the key-value analysis have been generated. Meanwhile, the conversion code is complete and the conversion process is now running. It should take approximately 2 days to finish the conversion analysis (assuming there are no bugs in the process).
Check out my development branch for Verkle Tree here.
Check out my development branch for Geth here.
Check out my local Verkle testnet setup here.
There was a power outage while I was running my node, and the database got corrupted, sigh 😔. I tried restarting the node, but got the following error messages:
INFO [11-01|07:57:08.387] Allocated trie memory caches clean=614.00MiB dirty=1024.00MiB
INFO [11-01|07:57:08.621] Using pebble as the backing database
INFO [11-01|07:57:08.621] Allocated cache and file handles database=/home/eth-main/execution/data-seed/node-pbss-2/geth/chaindata cache=2.00GiB handles=262,144
INFO [11-01|07:57:11.471] Opened ancient database database=/home/eth-main/execution/data-seed/node-pbss-2/geth/chaindata/ancient/chain readonly=false
INFO [11-01|07:57:11.477] Initialising Ethereum protocol network=1 dbversion=8
WARN [11-01|07:57:11.477] Sanitizing invalid node buffer size provided=1024.00MiB updated=256.00MiB
INFO [11-01|07:57:11.478] Failed to load journal, discard it err="journal not found"
INFO [11-01|07:57:11.534] Opened ancient database database=/home/eth-main/execution/data-seed/node-pbss-2/geth/chaindata/ancient/state readonly=false
CRIT [11-01|07:57:11.607] Failed to truncate extra state histories err=EOF
This seems to be an ongoing issue in Geth (refer to this issue). Anyway, the node had synced up to block 14907811 (5th June 2022), which is 1M blocks less than expected.
Since the EPF day is around the corner, there isn't much time to debug and resync the node. Hence, I will just carry on with the analysis, though I would love to do it all over again after the EPF so that I can get the full analysis.
The main reason I did a full sync from genesis was to record the last block number at which each key-value pair was accessed. Here's the output of my analysis:
Number of key-value pairs accessed from block 1 to block 14907811
+-------------------+---------------+---------------+--------------------+--------------------+------------------+
| BLOCK RANGE | ACCOUNT COUNT | STORAGE COUNT | ACCOUNT PERCENTAGE | STORAGE PERCENTAGE | RANGE PERCENTAGE |
+-------------------+---------------+---------------+--------------------+--------------------+------------------+
| 1-1051199 | 29133 | 528590 | 0.01% | 0.06% | 0.05% |
| 1051200-2102399 | 355666 | 1045690 | 0.16% | 0.12% | 0.13% |
| 2102400-3153599 | 19442427 | 984221 | 9.02% | 0.12% | 1.92% |
| 3153600-4204799 | 3909482 | 10221100 | 1.81% | 1.20% | 1.33% |
| 4204800-5255999 | 19535302 | 45281140 | 9.06% | 5.33% | 6.08% |
| 5256000-6307199 | 12682407 | 78778712 | 5.88% | 9.27% | 8.58% |
| 6307200-7358399 | 11209987 | 57899289 | 5.20% | 6.81% | 6.48% |
| 7358400-8409599 | 13752072 | 53722555 | 6.38% | 6.32% | 6.33% |
| 8409600-9460799 | 11168617 | 59364265 | 5.18% | 6.98% | 6.62% |
| 9460800-10511999 | 17629223 | 81365754 | 8.18% | 9.57% | 9.29% |
| 10512000-11563199 | 22344746 | 70720477 | 10.37% | 8.32% | 8.73% |
| 11563200-12614399 | 30045584 | 74031128 | 13.94% | 8.71% | 9.77% |
| 12614400-13665599 | 22575867 | 136646139 | 10.47% | 16.07% | 14.94% |
| 13665600-14716799 | 23524144 | 150006706 | 10.91% | 17.64% | 16.28% |
| 14716800-14907811 | 7360654 | 29627335 | 3.41% | 3.48% | 3.47% |
+-------------------+---------------+---------------+--------------------+--------------------+------------------+
| TOTAL | 215565311 | 850223101 | 100.00% | 100.00% | 100.00% |
+-------------------+---------------+---------------+--------------------+--------------------+------------------+
Total account KV count from genesis block to the latest block: 215565311 (20.23%)
Total storage KV count from genesis block to the latest block: 850223101 (79.77%)
Total KV accessed from block 1 to block 14907811: 1065788412 (100.00%)
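The grouping used in the table can be sketched in a few lines. This is only a minimal illustration, assuming each key-value pair carries the block number of its last access; the function names and the toy input are hypothetical and are not taken from my Geth branch.

```python
from collections import Counter

RANGE_SIZE = 1_051_200  # blocks per range, as in the table above


def bucket_label(block: int, head: int, size: int = RANGE_SIZE) -> str:
    """Return the 'start-end' label of the range that contains `block`."""
    index = block // size
    start = max(index * size, 1)             # the first range starts at block 1
    end = min((index + 1) * size - 1, head)  # the last range is cut at the head
    return f"{start}-{end}"


def count_by_range(last_access_blocks, head: int) -> Counter:
    """Count key-value pairs per range, given each pair's last-access block."""
    return Counter(bucket_label(b, head) for b in last_access_blocks)


# Hypothetical toy input: the last-access block of five key-value pairs.
sample = [1, 1_051_199, 1_051_200, 14_716_800, 14_907_811]
counts = count_by_range(sample, head=14_907_811)
for label, n in counts.items():
    print(label, n)
```

With the real data, the input would be one last-access block per account or storage slot, counted separately for the two columns.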
The analysis uses block ranges of 1051200 blocks each, which is about 6 months per range (given a block time of 15s). There are a total of 1.07B key-value pairs up until block 14907811. The most important data point is that only about 19.75% of the total key-value pairs have been accessed in the past 1 year (the last two block ranges, 13665600 to 14907811, the final one being partial), which corresponds to about 211M key-value pairs. In other words, 80.25% of the total state has been redundant over that period, and regular full nodes have been storing it for nothing, for free.
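As a sanity check, the figures above can be recomputed from the table. The sketch below only uses numbers that appear in the output (the account and storage counts of the last two ranges and the overall total); the variable names are my own.

```python
SECONDS_PER_BLOCK = 15      # block time assumed in the text
RANGE_SIZE = 1_051_200      # blocks per range

# (account count, storage count) for the last two ranges of the table.
recent = {
    "13665600-14716799": (23_524_144, 150_006_706),
    "14716800-14907811": (7_360_654, 29_627_335),
}
total_kv = 1_065_788_412    # total key-value pairs up to block 14907811

range_days = RANGE_SIZE * SECONDS_PER_BLOCK / 86_400
recent_kv = sum(accounts + storage for accounts, storage in recent.values())
recent_share = recent_kv / total_kv

print(f"one range is about {range_days:.1f} days")
print(f"recently accessed: {recent_kv} ({recent_share:.2%})")
print(f"not recently accessed: {total_kv - recent_kv} ({1 - recent_share:.2%})")
```

This gives 182.5 days per range and a recently accessed share of 19.75% (210518839 key-value pairs), matching the numbers quoted above.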
I got a preimage file from Guillaume, and thanks to that, I can do the conversion from MPT to VKT and analyze the total storage saved with my state expiry scheme. The analysis process looks something like this:
For step 1, once I have obtained the preimage file, I can execute the following geth command to import the preimages into my node's database:
./geth_verkle db import --datadir node preimages.bin
For step 2, there is an existing command that does an offline conversion to VKT using the underlying snapshot. I've modified its inner workings to include state expiry logic and to cater to my new VKT format. The command is as follows:
./geth_verkle verkle to-verkle --datadir node --stateexpiry true
For step 3, there is an existing tool that allows users to inspect the database. Here's an example of the output:
+-----------------+-------------------------+------------+-----------+
| DATABASE | CATEGORY | SIZE | ITEMS |
+-----------------+-------------------------+------------+-----------+
| Key-Value store | Headers | 49.86 MiB | 90001 |
| Key-Value store | Bodies | 8.96 GiB | 90001 |
| Key-Value store | Receipt lists | 4.93 GiB | 90001 |
| Key-Value store | Difficulties | 4.55 MiB | 90001 |
| Key-Value store | Block number->hash | 3.60 MiB | 90001 |
| Key-Value store | Block hash->number | 582.91 MiB | 14907812 |
| Key-Value store | Transaction index | 14.72 GiB | 439137644 |
| Key-Value store | Bloombit index | 2.71 GiB | 7456312 |
| Key-Value store | Contract codes | 3.43 GiB | 600037 |
| Key-Value store | Hash trie nodes | 0.00 B | 0 |
| Key-Value store | Path trie state lookups | 3.52 MiB | 90001 |
| Key-Value store | Path trie account nodes | 27.16 GiB | 232886629 |
| Key-Value store | Path trie storage nodes | 95.53 GiB | 959044684 |
| Key-Value store | Trie preimages | 21.37 GiB | 318962126 |
| Key-Value store | Account snapshot | 7.94 GiB | 174286121 |
| Key-Value store | Storage snapshot | 50.54 GiB | 709545865 |
| Key-Value store | Account snapshot meta | 8.23 GiB | 215565311 |
| Key-Value store | Storage snapshot meta | 57.80 GiB | 850223101 |
| Key-Value store | Beacon sync headers | 1.86 GiB | 3585187 |
| Key-Value store | Clique snapshots | 0.00 B | 0 |
| Key-Value store | Singleton metadata | 7.83 MiB | 15 |
| Light client | CHT trie nodes | 0.00 B | 0 |
| Light client | Bloom trie nodes | 0.00 B | 0 |
+-----------------+-------------------------+------------+-----------+
| TOTAL | | 305.83 GIB | |
+-----------------+-------------------------+------------+-----------+
The command is as follows:
./geth db inspect --datadir node
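To compare database sizes before and after the conversion, the inspect output can be parsed into numbers. The following is a rough sketch, assuming the rows look exactly like the ones shown above (pipe-separated, with a size, a unit and an item count); the regular expression and helper name are my own and are not part of Geth.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# One data row: | database | category | <size> <unit> | <items> |
ROW = re.compile(
    r"^\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([\d.]+)\s+(\w+)\s*\|\s*(\d+)\s*\|$"
)


def parse_rows(text: str):
    """Yield (category, size_in_bytes, items) for each data row of the table."""
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m and m.group(4) in UNITS:  # skips borders, the header and TOTAL
            _, category, size, unit, items = m.groups()
            yield category, float(size) * UNITS[unit], int(items)


# A few rows copied from the output above.
sample = """\
| DATABASE | CATEGORY | SIZE | ITEMS |
| Key-Value store | Path trie account nodes | 27.16 GiB | 232886629 |
| Key-Value store | Path trie storage nodes | 95.53 GiB | 959044684 |
| Key-Value store | Account snapshot | 7.94 GiB | 174286121 |
| Key-Value store | Storage snapshot | 50.54 GiB | 709545865 |
"""
sizes = {category: size for category, size, _ in parse_rows(sample)}
trie_gib = sum(v for k, v in sizes.items() if k.startswith("Path trie")) / 1024**3
print(f"Path trie account and storage nodes: {trie_gib:.2f} GiB")
```

Running the same parser on the output before and after the conversion would give a per-category difference, which is the number I care about for the state expiry analysis.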