# Ethereum mainnet - MPT analysis

# Context

Previously, we explored how the cryptographic primitives differ and measured their overhead; for more information, see:
- [Verkle Tries exploration about Keccak vs. Pedersen Commitments](https://hackmd.io/@jsign/verkle-tries-exploration-about-keccak-vs-pedersen-commitments)
- The [verkle-vs-patricia](https://github.com/jsign/verkle-vs-patricia) repo contains more benchmarks.

Apart from comparing MPT and VKT cryptographic primitives, it’s also worth looking at their shapes, since the number of cryptographic operations in each case depends on the shape (e.g., depth) of the trees.

We’ll analyze the:
- State Trie: this is a global trie with information for each Ethereum address.
- Storage Trie: this trie is scoped to a smart contract (SC) and holds the contract's persistent data.

# What’s the approach for these results?

The results described in this document correspond to Ethereum mainnet at the end of November 2022:
- The State Trie was walked depth-first (DFS), collecting information when touching leaves. If the leaf was an SC, the underlying Storage Trie was also walked DFS.
- The results aren’t a full walk of the whole trie, but that isn’t necessary. The results converge within a few minutes of starting the walk. This happens because the keys in the trie are hashes, so despite doing DFS, the visited leaves are effectively a random sample of the whole trie.
- Despite not being a full scan, I preferred to take a big walk anyway. This report shows the results of walking a total of:
    - More than 36 million State Trie leaves.
    - More than 3.7 million Storage Tries.

# How do I reproduce these results?

The tool to generate these results is open-source and part of the [verkle-vs-patricia](https://github.com/jsign/verkle-vs-patricia) repo, particularly the [`analytics` tool](https://github.com/jsign/verkle-vs-patricia#analytics).
Anyone with a synced Geth node can run it and should see almost the same results (since these trie traits evolve slowly). The tool prints a preliminary set of results to standard output every couple of seconds, and when the program is interrupted (Ctrl+C), it dumps multiple `.csv` files with the results. The repository also has Gnuplot files to transform the CSV files into plots. See the README for more instructions.

The `.csv` results and the plots are shown in the following sections.

# State Trie

The *State Trie* is the main trie that holds the information for every Ethereum address (EOAs and SCs).

## Plaintext report

```markdown
# Walked 36772203 (EOA + SC) accounts:
State Trie - Depths:
8: 49.35% (18145363)
9: 46.33% (17035102)
10: 4.18% (1537973)
11: 0.14% (52828)
12: 0.00% (466)
7: 0.00% (465)
13: 0.00% (6)

# State Trie - Path types:
B.B.B.B.B.B.B.L: 49.35% (18145363)
B.B.B.B.B.B.B.B.L: 46.32% (17034627)
B.B.B.B.B.B.B.E.B.L: 2.22% (817195)
B.B.B.B.B.B.B.B.B.L: 1.96% (720764)
B.B.B.B.B.B.B.B.E.B.L: 0.13% (49085)
B.B.B.B.B.B.B.E.B.B.L: 0.01% (1929)
B.B.B.B.B.B.B.B.B.B.L: 0.00% (1812)
B.B.B.B.B.B.E.B.L: 0.00% (475)
B.B.B.B.B.B.L: 0.00% (465)
B.B.B.B.B.B.B.B.B.E.B.L: 0.00% (240)
B.B.B.B.B.B.B.E.B.E.B.L: 0.00% (222)
B.B.B.B.B.B.E.B.B.L: 0.00% (14)
B.B.B.B.B.B.B.B.E.B.E.B.L: 0.00% (6)
B.B.B.B.B.B.B.B.E.B.B.L: 0.00% (4)
B.B.B.B.B.B.E.B.E.B.L: 0.00% (2)
```

Let’s explain the above report so we can get used to the nomenclature:
- `Walked XXX (EOA + SC) accounts` indicates how many leaf nodes of the trie were walked (depth-first search) to generate the report results.
- `State Trie - Depths` shows a histogram with the % of leaf nodes that have a particular depth.
    - e.g.: `<depth>: XX.XX% (Y)` means that `XX.XX%` of scanned leaves (a total of `Y` leaves) had a depth of `<depth>`.
- `Path types` shows a histogram with the % of each *path type* found, i.e. the sequence of node types from the root down to a leaf.
  `B` means a *Branch node*, `E` means an *Extension node*, and `L` means a *Leaf node*.
    - e.g.: `B.B.B.E.L: 42.42% (1234)` indicates that `42.42%` of leaves were reached through a path composed of three *Branch nodes*, then a single *Extension node*, and finally a *Leaf node*.
    - This was done mainly to double-check that the analysis made sense. We can see that paths start with a run of *Branch nodes* (which is expected), and that when an *Extension node* appears, it tends to sit near the end of the path, shortly before the *Leaf node*.
    - In the tail of the histogram (low %), we can see more interesting combinations, with paths having *Extension nodes* in the middle.

## Plots

This section contains graphical representations of the above plaintext reports.

![image](https://user-images.githubusercontent.com/6136245/205495458-fef29a5e-145f-4fb5-978c-1607ccd11948.png)

![image](https://user-images.githubusercontent.com/6136245/204922990-8e0f6f29-5634-4cda-8863-a9d3ba84cb06.png)

# Storage Tries

While walking the State Trie, whenever we touched a leaf corresponding to an SC account, we jumped into the Storage Trie to gather metrics.
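To make the walk a bit more concrete, here is a minimal sketch of the idea on a toy in-memory trie: a DFS that records the path type and depth of every State Trie leaf and, when the leaf has a Storage Trie, counts its used slots. The `Leaf`, `Extension`, and `Branch` classes and the `walk`/`analyze` functions are hypothetical names made up for this illustration; this is not the code of the `analytics` tool nor the Geth API. Depth is taken as the number of nodes in the path, which matches the reports (e.g. `B.B.B.B.B.B.B.L` has depth 8).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Union


# Hypothetical toy node types; real MPT nodes also carry keys, values and hashes.
@dataclass
class Leaf:
    # For an SC account leaf, the root of its Storage Trie (None for an EOA).
    storage: Optional["Node"] = None


@dataclass
class Extension:
    child: "Node"


@dataclass
class Branch:
    children: list = field(default_factory=list)


Node = Union[Leaf, Extension, Branch]


def walk(root):
    """Depth-first walk that yields (path_type, leaf) for every leaf."""
    stack = [(root, ())]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, Leaf):
            yield ".".join(prefix + ("L",)), node
        elif isinstance(node, Extension):
            stack.append((node.child, prefix + ("E",)))
        else:
            stack.extend((child, prefix + ("B",)) for child in node.children)


def analyze(state_root):
    """Build the three histograms: leaf depths, path types, used slots."""
    depths, paths, slots = Counter(), Counter(), Counter()
    for path_type, leaf in walk(state_root):
        paths[path_type] += 1
        depths[len(path_type.split("."))] += 1
        if leaf.storage is not None:
            # Jump into the Storage Trie: one leaf per used slot.
            slots[sum(1 for _ in walk(leaf.storage))] += 1
    return depths, paths, slots


# Toy State Trie: one EOA and two contracts (with 1 and 2 used slots).
toy = Branch([
    Leaf(),
    Extension(Branch([
        Leaf(storage=Leaf()),
        Leaf(storage=Branch([Leaf(), Leaf()])),
    ])),
])
depths, paths, slots = analyze(toy)
print(dict(paths), dict(depths), dict(slots))
```

The real tool obviously has to deal with node decoding and database access; the sketch only shows how the histograms relate to the walk.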
## Plaintext report

```markdown
# Walked 3710196 Storage Tries:
Storage Trie - Depths:
1: 62.20% (2307851)
2: 36.16% (1341745)
3: 1.31% (48445)
4: 0.27% (9996)
5: 0.06% (2042)
6: 0.00% (110)
7: 0.00% (7)

# Storage Trie - Number of used slots:
1: 62.20% (2307851)
2: 12.52% (464601)
4: 11.14% (413395)
3: 3.22% (119459)
5: 2.18% (80712)
6: 1.04% (38660)
10: 0.86% (32003)
7: 0.82% (30415)
11: 0.63% (23553)
8: 0.63% (23360)
12: 0.56% (20603)
9: 0.44% (16494)
18: 0.37% (13757)
13: 0.25% (9454)
16: 0.23% (8669)
17: 0.23% (8557)
14: 0.23% (8481)
15: 0.16% (5887)
19: 0.14% (5251)
24: 0.08% (2974)
20: 0.08% (2875)
22: 0.07% (2623)
23: 0.07% (2479)
21: 0.07% (2467)
25: 0.05% (1893)
30: 0.04% (1669)
27: 0.04% (1648)
28: 0.04% (1507)
26: 0.04% (1470)
29: 0.04% (1428)
32: 0.03% (1149)
31: 0.03% (1135)
33: 0.03% (1090)
34: 0.03% (993)
36: 0.02% (886)
35: 0.02% (877)
37: 0.02% (848)
38: 0.02% (762)
39: 0.02% (738)
40: 0.02% (685)
42: 0.02% (655)
43: 0.02% (634)
45: 0.02% (621)
41: 0.02% (600)
44: 0.01% (551)
46: 0.01% (541)
48: 0.01% (534)
49: 0.01% (479)
47: 0.01% (469)
52: 0.01% (457)
50: 0.01% (443)
51: 0.01% (436)
54: 0.01% (423)
53: 0.01% (416)
57: 0.01% (409)
55: 0.01% (374)
59: 0.01% (369)
56: 0.01% (369)
58: 0.01% (367)
63: 0.01% (318)
61: 0.01% (315)
62: 0.01% (307)
68: 0.01% (305)
66: 0.01% (302)
60: 0.01% (299)
67: 0.01% (289)
74: 0.01% (280)
71: 0.01% (278)
65: 0.01% (276)
72: 0.01% (273)
64: 0.01% (273)
70: 0.01% (272)
73: 0.01% (256)
69: 0.01% (253)
75: 0.01% (234)
76: 0.01% (220)
80: 0.01% (219)
82: 0.01% (215)
88: 0.01% (214)
79: 0.01% (208)
91: 0.01% (207)
81: 0.01% (200)
78: 0.01% (200)
87: 0.01% (199)
84: 0.01% (194)
85: 0.01% (190)
83: 0.01% (187)
90: 0.00% (184)
77: 0.00% (182)
96: 0.00% (177)
89: 0.00% (171)
92: 0.00% (169)
86: 0.00% (164)
98: 0.00% (163)
93: 0.00% (161)
95: 0.00% (159)
102: 0.00% (157)
97: 0.00% (157)
106: 0.00% (156)
99: 0.00% (153)
```

The tail of `Number of used slots` is pretty long, so it’s truncated.
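Because the list is truncated, shares have to be computed against the number of walked Storage Tries, not against the sum of the listed rows. As a small sketch of how the report entries can be read programmatically, the snippet below parses the first four `Number of used slots` entries and computes the share of Storage Tries with at most 4 used slots. The regular expression and the names `ENTRY`, `parse_histogram`, and `WALKED_STORAGE_TRIES` are my own assumptions for this example, not part of the tool.

```python
import re

# Matches report entries of the form "<key>: XX.XX% (<count>)".
ENTRY = re.compile(r"(\d+): \d+\.\d+% \((\d+)\)")


def parse_histogram(text):
    """Return {key: count} for every entry found in the report text."""
    return {int(key): int(count) for key, count in ENTRY.findall(text)}


# Total taken from the "# Walked 3710196 Storage Tries" header of the report.
WALKED_STORAGE_TRIES = 3710196

# First four entries of "Storage Trie - Number of used slots".
head = "1: 62.20% (2307851) 2: 12.52% (464601) 4: 11.14% (413395) 3: 3.22% (119459)"

slots = parse_histogram(head)
share = sum(slots.values()) / WALKED_STORAGE_TRIES
print(f"Storage Tries with at most 4 used slots: {share:.2%}")  # 89.09%
```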
## Plots

![image](https://user-images.githubusercontent.com/6136245/204923034-b2f01dde-0d07-4311-9bbe-42840fd3ba50.png)

![image](https://user-images.githubusercontent.com/6136245/204923074-aba283c7-2912-4782-b694-9f1f8434ce0a.png)

# Appendix

After reading the report, Guillaume was interested in knowing the average length of extension nodes. Here are the results for the State Trie:

```markdown
# Walked 163065487 (EOA + SC) accounts:
State Trie - Extension Length:
1: 93.30% (3609877)
2: 5.74% (221891)
3: 0.56% (21654)
4: 0.31% (11882)
5: 0.09% (3410)
6: 0.01% (220)
7: 0.00% (16)
```
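Since the question was about the average, the histogram can be reduced to a single number with a weighted mean. A minimal sketch using the counts from the report above (the variable names are mine, and the average is taken over the extension nodes counted in this histogram):

```python
# Extension length -> number of extension nodes, copied from the report above.
extension_lengths = {
    1: 3609877,
    2: 221891,
    3: 21654,
    4: 11882,
    5: 3410,
    6: 220,
    7: 16,
}

total = sum(extension_lengths.values())
weighted = sum(length * count for length, count in extension_lengths.items())
mean = weighted / total

print(f"extension nodes counted: {total}")       # 3868950
print(f"average extension length: {mean:.3f}")   # 1.082
```

So the average extension length comes out at roughly 1.08, which is consistent with more than 93% of the extensions having length 1.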