# How effective is EIP7736's state expiry?

## Introduction

State expiry has long been considered complex, so it has seen limited implementation despite various proposals such as [state rent](https://github.com/ethereum/EIPs/issues/35), [regenesis](https://medium.com/@mandrigin/regenesis-explained-97540f457807), and [multi-period trees](https://notes.ethereum.org/@vbuterin/verkle_and_state_expiry_proposal).

With the introduction of Verkle trees, EIP7736 simplifies state expiry by adding an "expiry period" field to extension nodes, indicating whether a stem has expired. The leaves of expired stems may be pruned, and users must submit a resurrection transaction to recover the pruned data.

![Everything you need to know about state expiry [MZXQKJ]](https://hackmd.io/_uploads/ByYk0uP0kx.png)
*Figure 1: ["Everything you need to know about state expiry"](https://www.youtube.com/watch?v=UJrM6BOG7zk), Devcon Bangkok 2024*

Detailed information is available in [EIP7736](https://eips.ethereum.org/EIPS/eip-7736).

## Methodology

We modified go-ethereum (v1.15.5) for our analysis. The full code is [here](https://github.com/weiihann/go-ethereum/pull/1).

### How to reproduce

1. Setup
```
git clone https://github.com/weiihann/go-ethereum
cd go-ethereum
git switch v1.15.5/snapshot-meta
make geth
```

2. Run node
```
./build/bin/geth --mainnet \
  --snapshot=true \
  --syncmode=full \
  --cache.preimages=true \
  --datadir=datadir \
  --state.scheme="path" \
  --db.engine="pebble" \
  --synctarget=0x8e38b4dbf6b11fcc3b9dee84fb7986e29ca0a02cecd8977c161ff7333329681e
```
*Note: The `synctarget` is the block hash of block 21000000.*

3. Execute analysis
```
./build/bin/geth --datadir datadir db expiry
```

## Analysis & Findings

The analysis spanned Ethereum mainnet blocks 17165429 to 21000000 (~1 year and 5 months).

### How many accounts and slots were accessed?

The term `access` here refers to reads and writes of persistent state data (i.e. accounts and slots).
In terms of EVM opcodes, this refers to `SLOAD` and `SSTORE`.

#### Accounts

![image](https://hackmd.io/_uploads/SJaCf9kJll.png)

Only **29% of total accounts** were accessed from block 17165429 to 21000000.

#### Slots

![image](https://hackmd.io/_uploads/SJwWXc1Jxl.png)

Only **23.5% of the total storage slots** were accessed from block 17165429 to 21000000.

### How many stems were expired?

According to [EIP7736](https://eips.ethereum.org/EIPS/eip-7736), a stem is considered expired if it has not been accessed in the last 2 expiry periods. In this analysis, an expiry period is **1314000** blocks, which is about **6 months**. In other words, a stem that has not been accessed in the past year is considered expired.

Since we only synced about one year and 5 months of state, our experiment covers only three periods:

| Period | Block Range          | Status      |
|--------|----------------------|-------------|
| 0      | <17165429 - 18479428 | Expired     |
| 1      | 18479429 - 19793428  | Not Expired |
| 2      | 19793429 - 21000000  | Not Expired |

Refer to the [appendix](#Distribution-of-Stem-Counts-and-Sizes) for the detailed distribution.

#### Accounts

![image](https://hackmd.io/_uploads/ryTbhWzyxl.png)

**78.9%** of the account stems were expired.

#### Account Codes

![image](https://hackmd.io/_uploads/rJimmaFxxe.png)

**71%** of the account code stems were expired.

#### Slots

![image](https://hackmd.io/_uploads/Sk_NXTFxxe.png)

**84.9%** of the storage slot stems were expired.

#### Total Stems

![image](https://hackmd.io/_uploads/BJsuAbf1ge.png)

**83.5%** of the total stems were expired.

### In go-ethereum, how much storage space can we save?
Aside from trie storage, go-ethereum stores accounts, bytecodes, and storage slots in a flat key-value format (the account and slot entries are also known as the snapshot):
```
// Accounts
prefix (1 byte) + addr hash (32 bytes) -> acc val
// Bytecodes
prefix (1 byte) + code hash (32 bytes) -> bytecodes
// Slots
prefix (1 byte) + addr hash (32 bytes) + slot hash (32 bytes) -> slot val
```

At block 21000000, the size of the flat key-value storage is:
- Accounts - 11.95GB
- Bytecodes - 8.95GB
- Slots - 86.29GB

**Total: 107.19GB**

Theoretically, if we applied state expiry (using the access percentages from the [previous section](#Accounts), with the account percentage also applied to bytecodes) and pruned the expired key-values, the estimated storage size would be:
- Accounts - 11.95 * (0.29) = **3.47GB**
- Bytecodes - 8.95 * (0.29) = **2.59GB**
- Slots - 86.29 * (0.235) = **20.28GB**

**Total: 26.34GB**

This reduces the flat key-value storage size by **80.85GB** (about 75%).

### How much storage space can we save with EIP7736?

EIP7736 prunes only expired leaf values, not the intermediate Verkle trie nodes. Therefore, we can only evaluate the storage space saved for the leaf values.

#### Accounts

![image](https://hackmd.io/_uploads/Hy2aVaYgxx.png)

We can save **20.99GB** worth of leaf values in account stems.

#### Account Codes

![image](https://hackmd.io/_uploads/S1SzHTKglg.png)

We can save **6.5GB** worth of leaf values in account code stems.

#### Slots

![image](https://hackmd.io/_uploads/HJEXSTYegg.png)

We can save **11.54GB** worth of leaf values in slot stems.

#### Total Stems

![image](https://hackmd.io/_uploads/H1r4HTYggl.png)

In total, we can save **39.03GB** worth of leaf values.

## Discussion of Results

The results indicate that a significant proportion of state data can be considered expired and is thus eligible for pruning. However, EIP7736 only prunes leaf values, so its storage benefits are limited by the relatively small size of these values.
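
The flat key-value estimate from the go-ethereum section can be reproduced with a few lines of Python. This is only a sketch: the sizes and access percentages are copied from the figures in this post, applying the account access rate to bytecodes mirrors the estimate above, and the names are made up for illustration.

```python
# Flat key-value sizes at block 21000000, in GB (figures quoted in this post).
FLAT_SIZES_GB = {"accounts": 11.95, "bytecodes": 8.95, "slots": 86.29}

# Fraction of entries accessed in blocks 17165429..21000000.
# Assumption carried over from the post: bytecodes reuse the account rate.
ACCESSED_FRACTION = {"accounts": 0.29, "bytecodes": 0.29, "slots": 0.235}


def estimate_after_expiry(sizes_gb, accessed_fraction):
    """Return the estimated size per category if only accessed entries are kept."""
    return {name: size * accessed_fraction[name] for name, size in sizes_gb.items()}


remaining = estimate_after_expiry(FLAT_SIZES_GB, ACCESSED_FRACTION)
total_before = sum(FLAT_SIZES_GB.values())
total_after = sum(remaining.values())
saved = total_before - total_after

for name, size in remaining.items():
    print(f"{name:<10} {size:6.2f} GB")
print(f"before {total_before:.2f} GB, after {total_after:.2f} GB, saved {saved:.2f} GB")
```

For comparison, the 39.03GB of leaf values that EIP7736 can prune is roughly half of the reduction this estimate gives for the flat format.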
Additionally, the amount of storage reduction achievable depends on the client's storage architecture. For instance, while pruning leaf values in geth yields minimal savings relative to the overall tree size, more substantial reductions could be realized in the flat storage format.

## Conclusion

While a significant portion of Ethereum state data remains inactive and can be expired, EIP7736's approach of pruning only leaf values offers limited storage reduction. Broader pruning strategies may therefore be necessary for substantial storage savings. The savings may also depend on each client's storage architecture.

## Appendix

### Distribution of Stem Counts and Sizes

#### Accounts

| Period    | Stem Count    | Size (GB) |
|-----------|---------------|-----------|
| 0         | 206226005     | 20.99     |
| 1         | 22518195      | 1.26      |
| 2         | 32698494      | 2.19      |
| **Total** | **261442694** | **24.44** |

#### Codes

| Period    | Stem Count  | Size (GB) |
|-----------|-------------|-----------|
| 0         | 1467787     | 6.50      |
| 1         | 195037      | 0.99      |
| 2         | 404934      | 2.10      |
| **Total** | **2067758** | **9.59**  |

#### Slots

| Period    | Stem Count    | Size (GB) |
|-----------|---------------|-----------|
| 0         | 755102392     | 11.54     |
| 1         | 70336810      | 1.11      |
| 2         | 63965234      | 1.18      |
| **Total** | **889404436** | **13.83** |

#### Total

| Period    | Stem Count     | Size (GB) |
|-----------|----------------|-----------|
| 0         | 962796184      | 39.03     |
| 1         | 93050042       | 3.36      |
| 2         | 97068662       | 5.47      |
| **Total** | **1152914888** | **47.86** |

### Slot stems estimation

go-ethereum stores `hash(slot)` in the database instead of the raw `slot` value. To resolve `slot` from `hash(slot)`, we need to look it up in the preimage store. Doing this for every slot would take a long time, so we resort to estimation instead.

Here's how we get the estimated number of slot stems at block 21000000:

From this [article](https://stateless.fyi/development/mainnet-analysis/tree-shape.html#stems-type-counts), we know that there are 939462320 storage slot stems at block 22181932.
We can get the number of stems per block by:
```
939462320/22181932 = ~42.35
```

So the estimated number of slot stems at block 21000000 is:
```
21000000 * 42.35 = 889350000
```

The tables above use the unrounded ratio, which gives 889404436.

### DB Stat (Block #17,165,429)

```
+-----------------------+---------------------------+------------+------------+
| DATABASE              | CATEGORY                  | SIZE       | ITEMS      |
+-----------------------+---------------------------+------------+------------+
| Key-Value store       | Headers                   | 52.38 MiB  |      90001 |
| Key-Value store       | Bodies                    | 8.79 GiB   |      90001 |
| Key-Value store       | Receipt lists             | 5.31 GiB   |      90001 |
| Key-Value store       | Difficulties (deprecated) | 7.09 MiB   |     112989 |
| Key-Value store       | Block number->hash        | 6.12 MiB   |     112937 |
| Key-Value store       | Block hash->number        | 671.18 MiB |   17165430 |
| Key-Value store       | Transaction index         | 12.51 GiB  |  371513784 |
| Key-Value store       | Bloombit index            | 3.30 GiB   |    8585311 |
| Key-Value store       | Contract codes            | 5.69 GiB   |     896078 |
| Key-Value store       | Hash trie nodes           | 163.71 GiB | 1510440022 |
| Key-Value store       | Path trie state lookups   | 0.00 B     |          0 |
| Key-Value store       | Path trie account nodes   | 0.00 B     |          0 |
| Key-Value store       | Path trie storage nodes   | 0.00 B     |          0 |
| Key-Value store       | Verkle trie nodes         | 0.00 B     |          0 |
| Key-Value store       | Verkle trie state lookups | 0.00 B     |          0 |
| Key-Value store       | Trie preimages            | 75.10 GiB  | 1117798271 |
| Key-Value store       | Account snapshot          | 9.48 GiB   |  206393468 |
| Key-Value store       | Account snapshot meta     | 0.00 B     |          0 |
| Key-Value store       | Storage snapshot          | 69.58 GiB  |  972915484 |
| Key-Value store       | Storage snapshot meta     | 0.00 B     |          0 |
| Key-Value store       | Beacon sync headers       | 1.45 GiB   |    2673499 |
| Key-Value store       | Clique snapshots          | 0.00 B     |          0 |
| Key-Value store       | Singleton metadata        | 695.29 KiB |         13 |
| Light client          | CHT trie nodes            | 0.00 B     |          0 |
| Light client          | Bloom trie nodes          | 0.00 B     |          0 |
| Ancient store (Chain) | Headers                   | 7.74 GiB   |   17075430 |
| Ancient store (Chain) | Hashes                    | 618.81 MiB |   17075430 |
| Ancient store (Chain) | Bodies                    | 315.94 GiB |   17075430 |
| Ancient store (Chain) | Receipts                  | 141.98 GiB |   17075430 |
+-----------------------+---------------------------+------------+------------+
|                                             TOTAL | 821.92 GIB |            |
+-----------------------+---------------------------+------------+------------+
```

### DB Stat (Block #21,000,000)

```
+-----------------------+---------------------------+------------+------------+
| DATABASE              | CATEGORY                  | SIZE       | ITEMS      |
+-----------------------+---------------------------+------------+------------+
| Key-Value store       | Headers                   | 55.70 MiB  |      90001 |
| Key-Value store       | Bodies                    | 5.94 GiB   |      90001 |
| Key-Value store       | Receipt lists             | 6.62 GiB   |      90001 |
| Key-Value store       | Difficulties (deprecated) | 15.69 MiB  |     146957 |
| Key-Value store       | Block number->hash        | 14.68 MiB  |     146921 |
| Key-Value store       | Block hash->number        | 821.11 MiB |   21000001 |
| Key-Value store       | Transaction index         | 13.11 GiB  |  380442313 |
| Key-Value store       | Bloombit index            | 4.29 GiB   |   10503175 |
| Key-Value store       | Contract codes            | 8.95 GiB   |    1392047 |
| Key-Value store       | Hash trie nodes           | 712.63 GiB | 3744465835 |
| Key-Value store       | Path trie state lookups   | 0.00 B     |          0 |
| Key-Value store       | Path trie account nodes   | 0.00 B     |          0 |
| Key-Value store       | Path trie storage nodes   | 0.00 B     |          0 |
| Key-Value store       | Verkle trie nodes         | 0.00 B     |          0 |
| Key-Value store       | Verkle trie state lookups | 0.00 B     |          0 |
| Key-Value store       | Trie preimages            | 86.69 GiB  | 1289580154 |
| Key-Value store       | Account snapshot          | 11.95 GiB  |  261442694 |
| Key-Value store       | Account snapshot meta     | 2.98 GiB   |   76061854 |
| Key-Value store       | Storage snapshot          | 86.29 GiB  | 1196998244 |
| Key-Value store       | Storage snapshot meta     | 35.66 GiB  |  517482187 |
| Key-Value store       | Beacon sync headers       | 612.00 B   |          1 |
| Key-Value store       | Clique snapshots          | 0.00 B     |          0 |
| Key-Value store       | Singleton metadata        | 15.53 MiB  |         14 |
| Light client          | CHT trie nodes            | 0.00 B     |          0 |
| Light client          | Bloom trie nodes          | 0.00 B     |          0 |
| Ancient store (Chain) | Headers                   | 9.86 GiB   |   20910001 |
| Ancient store (Chain) | Hashes                    | 757.77 MiB |   20910001 |
| Ancient store (Chain) | Bodies                    | 586.19 GiB |   20910001 |
| Ancient store (Chain) | Receipts                  | 211.26 GiB |   20910001 |
+-----------------------+---------------------------+------------+------------+
|                                             TOTAL | 1.74 TIB   |            |
+-----------------------+---------------------------+------------+------------+
```

*Note: The node uses the hash-based state scheme and is unpruned. The actual db size is much smaller than this.*
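
As a cross-check on the appendix, the Python sketch below re-derives the period bucketing, the expired share of stems, and the slot stem estimate. The constants are copied from this post. The function names are illustrative, and treating every block before 17165429 as period 0 is an assumption that mirrors the period table in this analysis rather than the EIP itself.

```python
PERIOD_BLOCKS = 1_314_000   # one expiry period, about 6 months of blocks
START_BLOCK = 17_165_429    # first block of the analysis window
HEAD_BLOCK = 21_000_000     # block at which expiry is evaluated


def period_of(block):
    """Expiry period of a block; anything before START_BLOCK counts as period 0 (assumption)."""
    return max(0, (block - START_BLOCK) // PERIOD_BLOCKS)


def is_expired(last_access_block, head_block=HEAD_BLOCK):
    """A stem is expired if it was not accessed in the last two periods."""
    return period_of(head_block) - period_of(last_access_block) >= 2


def expired_share(counts_by_period):
    """Fraction of stems whose last-access period is at least two behind the latest one."""
    latest = len(counts_by_period) - 1
    expired = sum(n for p, n in enumerate(counts_by_period) if latest - p >= 2)
    return expired / sum(counts_by_period)


# Stem counts for periods 0, 1, 2, copied from the appendix tables.
ACCOUNT_STEMS = [206_226_005, 22_518_195, 32_698_494]
TOTAL_STEMS = [962_796_184, 93_050_042, 97_068_662]

# Slot stem estimate: scale the known count at block 22181932 down to block 21000000.
slot_stems_estimate = round(939_462_320 / 22_181_932 * HEAD_BLOCK)

print(f"head period: {period_of(HEAD_BLOCK)}")
print(f"expired account stems: {expired_share(ACCOUNT_STEMS):.1%}")
print(f"expired total stems:   {expired_share(TOTAL_STEMS):.1%}")
print(f"estimated slot stems:  {slot_stems_estimate}")
```

Under these assumptions, the script reproduces the 78.9% and 83.5% expiry figures and the 889404436 slot stem total used in the tables.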