Data collection points for "BloatNet"

# Data collection for "BloatNet"

![image](https://hackmd.io/_uploads/BJw_ooDgee.png)

:::warning
The idea of this document is to collect interesting data points from a testnet running all EL clients, where state growth rates are maximized with respect to `gas_limit` increases across (60, 100, 300) MGas.

**We would like to know which data people consider interesting to obtain, and why.**
:::

:::info
**We strongly encourage EL teams and researchers to add more data points** (especially ephemeral ones) that they would like to have data for, so that more research on, or fixes to, current EL implementations can be done and subtle areas for improvement can be spotted.
:::

## Metrics always taken:
- RAM
- SWAP
- Goroutines
- Threads/CPUs used
- State-size growth
- Overall disk size
- Number of reorgs
- Heap activity

## Stateless-consensus desired metrics

### 0. Block hash and Block time calls
#### Description:
Discussed with Guillaume

### 1. Read/Write performance speed wrt. overall state growth.
#### Description:
It would be nice to know how I/O performance degrades with state growth and with the state size at each point in time.

Particular measurements:
- Min/Max/Avg Bytes/s Read and Write for every block with a 50k block queue. Same block queue for all the bloated states.
- Avg RAM usage

### Block time wrt. overall state size and gas_limit
If block time (block production/generation) goes above 4s, it could start to become a problem. Attesters would also have more trouble getting a block finalized on time.

Particular measurement:
- For a machine with the recommended hardware as per [EIP-9270](https://github.com/ethereum/EIPs/pull/9270)
    - Block flamegraph decomposition. Similar to:
    ![image](https://hackmd.io/_uploads/BycgPCb7xl.png)

### 2. At which state-size do client-DBs break?
#### Description:
It's important to know when and where client DB implementations will simply break because they can't hold X [G/T]B of state. This lets us see whether things such as
sync or other operations fail (OOM, etc.).

### 4. Size of an MPT witness & BAL wrt. state size and gas_limit
#### Description:
It would be useful to measure MPT-proof sizes for state as well as for BALs (already looking ahead to Glamsterdam).

Specific metrics:
- Size of the MPT multiproof for the state diff.
- Size of the BAL for each block at different gas limits with different bloating scenarios.
- Time taken to compute the BAL and the MPT multiproof.

### 5. DB compaction impact (if applies to DB type).
#### Description:
It would be interesting to see what impact state growth has on DB compaction, and when. It would also be interesting to have a client constantly running compaction, or at least to monitor closely what's the

Specific metrics:
- Duration
- Frequency
- What happens if we reorg while compaction is ongoing. Check for different reorg sizes.
- Compaction count as in:
    - ![image](https://hackmd.io/_uploads/Bk745AWQxl.png)

### 6. Sync metrics (when sync breaks and/or how slow/bad it becomes wrt state growth and gas_limit increases)
#### Description:
Test snap-sync with each client at every 50-100 GB of state growth. Record the starting and final chain/DB size and the sync speed, as well as any issues that happen in the meantime. These measurements should be taken both under heavy network stress and under average workload.

- Healing phase duration
- Overall sync duration
- Start-End DB size
- Gas limit at which the sync is performed
- R/W performance for all ops.

:::danger
Ping pari for the sync scenario
:::

### 7. Cache miss rates (if cache exists).
#### Description:
BALs might

### 9. Transaction Confirmation Times
#### Description:
Real-world transaction processing speeds under various network loads. Important for maintaining user experience as throughput increases.

### 10. Read/Write performance speed at different state sizes and tree depth leaf inclusions.
#### Description:
We want to understand the multidimensional trade-off space of state size, deepest leaf included, and read/write performance.
The idea is to go in increments of 100 GB, force DB leaf inclusions at different tree depths, and then measure read/write and state-root computation performance.

## Reth's desired metrics
* General I/O metrics (for example, those from the Prometheus node_exporter); even better if it's possible to collect page cache metrics.

## Erigon's desired metrics
* Measure database access (SLOAD/SSTORE)
* Measure state root computation time

## Nethermind's desired metrics

## Besu's desired metrics

### Percentage of state accesses like SLOAD in block execution time.
This should be checked with different gas limits and different state sizes, for similar blocks.

### State root hash calculation.
Do EL clients have the same throughput (mgas/s) with a higher gas limit? And if not, is it only related to state growth?

### At what gas_limit and state size do EL clients break on predefined hardware, such as a 32 GiB machine?
Use the [Hardware and Bandwidth recommendations](https://github.com/ethereum/EIPs/pull/9270) from EIP 9270 to see when different EL clients break because the state is too big and the gas_limit too high.
To test this multidimensional problem, we should reach certain state sizes (in increments of 50-100 GiB) and then stress-test the network with the different attack scenarios we have prepared, adapted to the proposed gas limits.
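
The state-size by gas-limit sweep described above can be written down as a small test matrix. The following is only a minimal sketch: the function names, the scenario labels, and the 300-400 GiB range are hypothetical placeholders, not part of any client or of the prepared attack scenarios. Only the (60, 100, 300) MGas limits and the 50 GiB step come from this document.

```python
from itertools import product

# Gas limits (in MGas) named at the top of this document.
GAS_LIMITS_MGAS = (60, 100, 300)


def state_sizes_gib(start_gib, stop_gib, step_gib=50):
    """State sizes to test, in GiB, from start to stop inclusive.

    step_gib=50 is the lower end of the 50-100 GiB increment; the start and
    stop values are placeholders chosen by whoever runs the sweep.
    """
    if step_gib <= 0:
        raise ValueError("step_gib must be positive")
    return list(range(start_gib, stop_gib + 1, step_gib))


def build_test_matrix(state_sizes, gas_limits, scenarios):
    """One test case per (state size, gas limit, scenario) combination."""
    return [
        {"state_gib": size, "gas_limit_mgas": limit, "scenario": name}
        for size, limit, name in product(state_sizes, gas_limits, scenarios)
    ]


def throughput_mgas_per_s(gas_used, execution_seconds):
    """Execution throughput in MGas/s for a single block."""
    if execution_seconds <= 0:
        raise ValueError("execution_seconds must be positive")
    return gas_used / 1_000_000 / execution_seconds


if __name__ == "__main__":
    # Hypothetical scenario labels; the real attack scenarios are defined elsewhere.
    scenarios = ["random_sload", "sstore_existing_to_zero"]
    matrix = build_test_matrix(state_sizes_gib(300, 400), GAS_LIMITS_MGAS, scenarios)
    print(len(matrix), "test cases")
    print(matrix[0])
```

Recording `throughput_mgas_per_s` for every cell of the matrix, per client, would give a direct view of whether throughput stays flat as the gas limit rises or drops with state size.
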
## Geth desired test cases

### Random storage access patterns
We are mainly concerned about blocks that have an access pattern similar to ZEN mining.
We would like to see the following scenarios tested with a single large contract with a lot of random storage slots:
- Reading a lot of random storage slots with SLOAD
- Updating EXISTING storage slots to another non-zero number
- Updating EXISTING storage slots to 0
- Writing more random storage slots in the contract

### Reorg testing
We would like to see larger reorgs of blocks that contain a lot of changes to the database, like
- Reorging the head block
- Reorging the last 127 blocks
- Reorging the last 128+ blocks (geth supports reorgs up to 8k)

### Testing of pathdb in archive mode
We are working on shipping a new archive mode, which would be nice to test on this network.

### Impact of account tree depth vs storage tree depth
During our state root computation, we calculate the storage trees in parallel. The account tree is computed at the end. Thus an increase in the depth of the account trie probably has more impact on our state root computation than an increase in the state tree