[TOC]
# TLDR
:::info
This document is a collection of questions directed to EL client developers to better assess which worst cases we need to build tests for and prepare the infra for.
Here's a list of all current scenarios: {%preview https://hackmd.io/@CPerezz/ryoMhzaLel %}
:::
## Questions for Geth
### Geth-specific questions
- For Pebble, point reads and range deletions are the worst operations that can be executed. Range scans are hard for me to imagine being triggered with the access patterns that block processing and/or RPC offer. But for point reads, do we have an idea of how to trigger worst cases? Not only from the tx perspective (we try to modify the shallowest range of keys possible), but also from the SSTable ordering perspective: can we influence that somehow to make sure we trigger the absolute worst case?
- What is the best way to mess with compaction?
- Sustained high-volume writes: we SSTORE as much as possible into as many different and "distant" positions as possible, hoping compaction never finishes, eventually halting upcoming writes (see the sketch at the end of this list).
- Tombstone spamming: we wait until storage positions or contracts leave the cache and then delete them. We also create new ones and do the same, eventually amplifying the space the DB takes, since **tombstones are only removed when compaction processes their specific file**.
- Any other scenarios to mess with compaction only? Any deterministic benchmarks or similar we could perform?
- With the integration of Pebble, have you modeled the frequency and performance impact of database compaction? Is there any easier way (outside of the chain itself) to assess deterministically the risk that the compaction queue could grow faster than it can be processed, leading to a feedback loop of performance degradation and increasing I/O stalls?
- What is the absolute worst-case block processing time you have modeled or fear for a 100M gas block on a 2x mainnet state, specifically a block crafted to maximize random MPT node lookups and cache misses?
- How do you envision cache management evolving to handle a 2x or 4x state size? Will operators require significantly more RAM (e.g., >128GB) to maintain performance parity, or are there architectural changes planned to improve cache efficiency and reduce the memory footprint for large states?
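
Below is a minimal sketch of the "distant SSTORE writes" scenario from the compaction question above, using web3.py against a local dev node. The `SlotSpammer` contract, its address, and its `writeSlots(bytes32[])` ABI are hypothetical placeholders for whatever spam contract BloatNet actually deploys; the point is only that consecutive writes land on widely dispersed slot keys, so compaction has to keep merging across many key ranges.

```python
# Sketch only: assumes a local dev node with an unlocked, funded account and a
# hypothetical pre-deployed SlotSpammer contract whose writeSlots(bytes32[])
# SSTOREs a nonzero value into every slot it receives.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))
sender = w3.eth.accounts[0]

SPAMMER_ADDR = "0x0000000000000000000000000000000000000000"  # placeholder address
SPAMMER_ABI = [{
    "name": "writeSlots", "type": "function", "stateMutability": "nonpayable",
    "inputs": [{"name": "slots", "type": "bytes32[]"}], "outputs": [],
}]
spammer = w3.eth.contract(address=SPAMMER_ADDR, abi=SPAMMER_ABI)

def distant_slots(epoch: int, n: int) -> list[bytes]:
    # Keccak-derived slot keys: consecutive writes end up far apart in the
    # hashed key space, touching as many distinct key ranges as possible.
    return [w3.keccak(text=f"bloat-{epoch}-{i}") for i in range(n)]

for epoch in range(1_000):
    tx_hash = spammer.functions.writeSlots(distant_slots(epoch, 200)).transact({"from": sender})
    w3.eth.wait_for_transaction_receipt(tx_hash)
```

Pointing the same loop at many different spammer contract instances, rather than a single one, would additionally spread the writes across many storage tries instead of one.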
### General questions
- Given the accelerated rate of state growth under a 100M gas limit, what is your projection for how frequently a full node operator will need to perform offline pruning to remain within a standard 2TB SSD limit? At what point does the required frequency and duration of this downtime become an unacceptable operational risk for a solo staker?
- During a deep reorg (e.g., 10+ blocks) on a 2x state size database, what are the primary performance bottlenecks you expect to find in Geth? Aside from having it under heavy load while compacting, is there anything else that could make a reorg harder for Geth to digest?
- Are there any particular scenarios (at 100M gas and 2x mainnet state size) that you are worried about w.r.t. your client architecture and the trade-offs you've made in the implementation of any part of the codebase? What would you like to see tested for 100M gas and 2x mainnet state size? Please be specific and elaborate as much as possible. This will help us figure out any high-detail testing scenarios we might be missing that are closely tied to the architecture and design of the client.
## Questions for Nethermind
### Nethermind-specific questions
- The flat database layout is designed to accelerate state reads. What is the computational process for reconstructing the state root from this flat layout at the end of each block, and what are its worst-case performance characteristics? Can a block be crafted with a specific pattern of state changes (e.g., across a wide range of addresses) to maximize the complexity and duration of this reconstruction step?
- [It has been observed](https://docs.nethermind.io/fundamentals/pruning/#preparation-for-full-pruning) that online pruning can lead to a ~5-10% loss of validator rewards on the current mainnet. How do you project this performance degradation will scale on a 2x state network with a 100M gas limit? Is there a risk that pruning becomes a continuous performance drag rather than a periodic maintenance task, as the rate of state growth matches or exceeds the rate of pruning?
- Are there known transaction patterns, such as the rapid creation and subsequent destruction of a large number of temporary accounts, that are particularly expensive for the flat database model to handle, either during block execution or the subsequent online pruning? In summary, does this optimization introduce a weakness we should stress-test in any other part of the node?
- Is it even possible to lose sync between the flat DB optimization structure and the underlying source of truth (the trie itself), and is it worth testing/monitoring for that? Is there any process, such as online pruning, or other factors that could influence this?
- What is the best way to mess with compaction?
- Sustained high-volume writes: we SSTORE as much as possible into as many different and "distant" positions as possible, hoping compaction never finishes, eventually halting upcoming writes.
- Tombstone spamming: we wait until storage positions or contracts leave the cache and then delete them. We also create new ones and do the same, eventually amplifying the space the DB takes, since **tombstones are only removed when compaction processes their specific file** (unsure if that applies here too; see the sketch at the end of this list).
- Any other scenarios to mess with compaction only? Any deterministic benchmarks or similar we could perform?
- Your documentation mentions that the flat DB layout helps "accelerate transaction processing & virtual machine execution." Can you elaborate on the specific mechanisms by which it speeds up the EVM beyond faster state reads?
- The `PreWarmStateOnBlockProcessing` configuration option claims up to a 2x speed-up in block processing. What exactly is being "pre-warmed," and what are the trade-offs in terms of memory or I/O? What kind of block (e.g., one with highly unpredictable state access patterns) would be a worst-case scenario for this feature, potentially making it slower than baseline?
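
Below is a sketch of the tombstone-spamming idea from the compaction question above, under the same assumptions as the Geth sketch (local dev node, funded unlocked account, hypothetical `SlotSpammer` contract), here extended with a hypothetical `clearSlots(bytes32[])` that SSTOREs zero into the given slots. Whether the resulting deletes actually become tombstones on Nethermind's layout is exactly the thing we'd want the team to confirm.

```python
# Sketch only: write a batch of slots, wait long enough for them to fall out of
# the client's caches, then zero them so the deletes have to hit the database.
import time
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))
sender = w3.eth.accounts[0]

SPAMMER_ADDR = "0x0000000000000000000000000000000000000000"  # placeholder address
SPAMMER_ABI = [
    {"name": "writeSlots", "type": "function", "stateMutability": "nonpayable",
     "inputs": [{"name": "slots", "type": "bytes32[]"}], "outputs": []},
    {"name": "clearSlots", "type": "function", "stateMutability": "nonpayable",
     "inputs": [{"name": "slots", "type": "bytes32[]"}], "outputs": []},
]
spammer = w3.eth.contract(address=SPAMMER_ADDR, abi=SPAMMER_ABI)

batch = [w3.keccak(text=f"tombstone-{i}") for i in range(500)]

# 1. Create the entries.
w3.eth.wait_for_transaction_receipt(
    spammer.functions.writeSlots(batch).transact({"from": sender}))

# 2. Let other traffic evict them from caches (placeholder delay; on BloatNet the
#    gap would be many blocks of unrelated load, not a fixed sleep).
time.sleep(600)

# 3. Delete them, turning each entry into a pending delete marker until compaction
#    reaches the file that holds it.
w3.eth.wait_for_transaction_receipt(
    spammer.functions.clearSlots(batch).transact({"from": sender}))
```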
### General questions
- Given the accelerated rate of state growth under a 100M gas limit, what is your projection for how many resources a home staker will need, relative to the recommended hardware, in order to support 2x mainnet state size without significant performance degradation that could affect rewards or proposals?
- During a deep reorg (e.g., 10+ blocks) on a 2x state size database, what are the primary performance bottlenecks you expect to find in Nethermind? Aside from having it under heavy load, online pruning, and compacting, is there anything else that could make a reorg harder for Nethermind to digest?
- Are there any particular scenarios (at 100M gas and 2x mainnet state size) that you are worried about w.r.t. your client architecture and the trade-offs you've made in the implementation of any part of the codebase? What would you like to see tested for 100M gas and 2x mainnet state size? Please be specific and elaborate as much as possible. This will help us figure out any high-detail testing scenarios we might be missing that are closely tied to the architecture and design of the client.
## Questions for Erigon
### Erigon-specific questions
- With MDBX, writes and non-immediate page freeing are the worst operations that can be performed (especially random writes). Do you have any thoughts or ideas on what the worst type of block pattern would be for you to digest, both state-root-computation-wise and DB/IO-performance-wise?
- The "Compute State Root" stage iscritical path for block processing time. What constitutes a worst-case input for this stage? Is it a large number of total state changes, changes to deep or sparse trie paths, or another pattern? How does its performance scale with the number of `IntermediateTrieHashes` that need to be rebuilt from scratch versus updated?
- Are there other (less obvious) stages in your block-processing pipeline (maybe like Recover Senders?) or the various index-generation stages that you're concerned about becoming performance bottlenecks within BloatNet? If so, which ones?
- How does "Execute Blocks" stage (which is described as "executing blocks without building the MPT in real time") get affected by different types of blocks? Are there any worst-cases you already know?
- You separate "Recover Senders" from "Execute Blocks". Are there data dependencies between these processes that we should try to exploit? Something like a block with lots of transactions with low computational burden that forces a cascading slowdown (see the sketch after this list).
- Erigon's choice of MDBX is a key differentiator. What are the specific advantages and potential failure modes of MDBX compared to the more common RocksDB/PebbleDB under the extreme and sustained write load that Bloatnet will generate? Does MDBX have an analogous process to compaction that could cause I/O stalls or performance degradation? If so, how can we try to push it to the limit? And what metrics should we be collecting to make sure we understand how it responds to the sustained bloating?
- Your architecture decouples transaction execution (on the "plain state") from Merkle trie generation ("hashed state"). What are the memory overheads and performance costs associated with maintaining this dual representation, particularly when the total state size doubles or quadruples? Should home-stakers expect to upgrade RAM for instance when we go to 100Mgas and 2-4x mainnet state-size?
- The unwind logic for handling reorgs in a staged pipeline is inherently complex. Have you stress-tested deep reorgs (e.g., 20+ blocks) on a large-state database? What are the failure modes you have observed or anticipate, and what is the expected recovery time under such a scenario?
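
Below is a sketch of the "many cheap transactions, many distinct signers" pattern mentioned in the Recover Senders question above: every transaction is a plain 21,000-gas transfer (so Execute Blocks has almost nothing to do), but each one comes from a freshly generated key, so sender recovery has to perform one ECDSA recovery per transaction. It assumes a local dev node with a funded faucet account; nothing here is Erigon-specific API.

```python
# Sketch only: flood the pool with minimal transfers from distinct, fresh signers.
from eth_account import Account
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))
faucet = w3.eth.accounts[0]
chain_id = w3.eth.chain_id

senders = [Account.create() for _ in range(500)]

# Seed each fresh sender with just enough ETH for one transfer.
for s in senders:
    last = w3.eth.send_transaction(
        {"from": faucet, "to": s.address, "value": w3.to_wei(0.01, "ether")})
w3.eth.wait_for_transaction_receipt(last)

# One 21,000-gas transfer per distinct signer: cheap to execute, but each tx
# still costs the client a full signature recovery.
for s in senders:
    signed = s.sign_transaction({
        "to": faucet, "value": 1, "gas": 21_000,
        "gasPrice": w3.eth.gas_price, "nonce": 0, "chainId": chain_id,
    })
    w3.eth.send_raw_transaction(signed.raw_transaction)  # .rawTransaction on older eth-account
```

Whether sender recovery actually dominates here, or parallel recovery hides it, is exactly what the question above is trying to find out.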
### General questions
- Given the accelerated rate of state growth under a 100M gas limit, what is your projection for how many resources a home staker will need, relative to the recommended hardware, in order to support 2x mainnet state size without significant performance degradation that could affect rewards or proposals?
- During a deep reorg (e.g., 10+ blocks) on a 2x state size database, what are the primary performance bottlenecks you expect to find in Erigon? Aside from having it under heavy load, a compaction-analogous process running, etc., is there anything else that could make a reorg harder for Erigon to digest?
- Are there any particular scenarios (at 100M gas and 2x mainnet state size) that you are worried about w.r.t. your client architecture and the trade-offs you've made in the implementation of any part of the codebase? What would you like to see tested for 100M gas and 2x mainnet state size? Please be specific and elaborate as much as possible. This will help us figure out any high-detail testing scenarios we might be missing that are closely tied to the architecture and design of the client.
## Questions for Besu
### Besu-specific questions
- The Bonsai Tries architecture is a novel approach to state storage. What are the known performance cliffs or pathological cases for this model? For instance, can a specific pattern of state writes (e.g., updates that are widely dispersed across the trie) lead to excessive fragmentation or inefficiency in the "Manicured Trie" update process?
- The "flat db healing" process is vital for ensuring performant state access after an initial sync. What is the resource overhead of this process, and are there scenarios (e.g., syncing from a particularly fragmented snapshot) where it could fail or take an exceptionally long time, leaving the node in a performance-degraded state?
- Bonsai trade-offs imply that reorging is one of the most critical operations for a node. Have you done any benchmarks? What would you say would complicate the reorg work even further? Compacting at the same time? High load? Anything else? Any specific tests that come to mind that could be much more critical?
- The Bonsai accumulator gathers state modifications during block processing and can trigger background tasks to pre-fetch trie paths. What types of transaction sequences or state access patterns would be most effective at defeating this pre-fetching optimization, forcing the final trie update to be slow? (See the sketch at the end of this list for one candidate pattern.)
- How have you optimized your Java-based EVM for performance? Are there specific opcodes or contract patterns that are known to be less performant on the JVM compared to implementations in other languages like Go or Rust?
- How is JVM garbage collection expected to behave under a sustained 100M gas/block load? Are you concerned that "stop-the-world" GC pauses could become a significant and unpredictable component of block processing time, potentially causing validators to miss attestation deadlines? Have you tested that scenario and if not, how would you suggest doing it? Maybe it's not worth it?
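
One candidate way to attack the accumulator's pre-fetching, sketched below: build a pointer-chasing chain in storage so that each SLOAD's slot is only known once the previous SLOAD has resolved, leaving nothing for a background pre-fetcher to predict. The `ChainWalker` contract and its `seed`/`walk` functions are hypothetical; the off-chain part only prepares a random permutation of slots.

```python
# Sketch only: assumes a local dev node with an unlocked, funded account and a
# hypothetical ChainWalker contract where seed(slots, nextSlots) stores
# nextSlots[i] into slots[i], and walk(start, steps) repeatedly SLOADs the slot
# named by the previous SLOAD's value.
import random
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))
sender = w3.eth.accounts[0]

WALKER_ADDR = "0x0000000000000000000000000000000000000000"  # placeholder address
WALKER_ABI = [
    {"name": "seed", "type": "function", "stateMutability": "nonpayable",
     "inputs": [{"name": "slots", "type": "bytes32[]"},
                {"name": "nextSlots", "type": "bytes32[]"}], "outputs": []},
    {"name": "walk", "type": "function", "stateMutability": "nonpayable",
     "inputs": [{"name": "start", "type": "bytes32"},
                {"name": "steps", "type": "uint256"}], "outputs": []},
]
walker = w3.eth.contract(address=WALKER_ADDR, abi=WALKER_ABI)

N = 2_000
slots = [w3.keccak(text=f"chase-{i}") for i in range(N)]
order = random.sample(range(N), N)  # random traversal order

keys = [slots[order[i]] for i in range(N - 1)]       # slot i of the chain ...
nexts = [slots[order[i + 1]] for i in range(N - 1)]  # ... points at slot i+1

# Set up the chain ahead of time, then traverse it in the block under test:
# every storage access is data-dependent on the one before it.
w3.eth.wait_for_transaction_receipt(
    walker.functions.seed(keys, nexts).transact({"from": sender}))
walker.functions.walk(slots[order[0]], N - 1).transact({"from": sender})
```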
### General questions
- Given the accelerated rate of state growth under a 100M gas limit, what is your projection for how many resources a home staker will need, relative to the recommended hardware, in order to support 2x mainnet state size without significant performance degradation that could affect rewards or proposals?
- During a deep reorg (e.g., 10+ blocks) on a 2x state size database, what are the primary performance bottlenecks you expect to find in Besu? Aside from having it under heavy load, online pruning, and compacting, is there anything else that could make a reorg harder for Besu to digest?
- Are there any particular scenarios (at 100M gas and 2x mainnet state size) that you are worried about w.r.t. your client architecture and the trade-offs you've made in the implementation of any part of the codebase? What would you like to see tested for 100M gas and 2x mainnet state size? Please be specific and elaborate as much as possible. This will help us figure out any high-detail testing scenarios we might be missing that are closely tied to the architecture and design of the client.
## Questions for Reth
### Reth-specific questions
- Your team has benchmarked Reth at impressive speeds for both live and historical sync. As you push towards a sustained 100M gas per second on a 2x state size, where do you anticipate the first major performance bottleneck will appear? Do you expect it to be in the execution stage, similar to Erigon, or do you foresee it emerging in the state root calculation or another part of the pipeline?
- You're working on optimizing the ["last mile" of state root calculation](https://github.com/paradigmxyz/reth/issues/16086), including the multiproof and sparse trie update steps. What are the most computationally intensive parts of this process in your implementation, and what types of state changes (e.g., creations, deletions, modifications) are most difficult to handle efficiently?
- Your benchmarks distinguish between "live sync" (including execution and trie updates) and "historical sync" (execution only), with historical sync being an order of magnitude faster. Does this imply that state commitment, rather than raw EVM execution, is still the dominant bottleneck in your pipeline?
- With MDBX, writes and non-immediate page freeing are the worst operations that can be performed (especially random writes). Do you have any thoughts or ideas on what the worst type of block pattern would be for you to digest, both state-root-computation-wise and DB/IO-performance-wise?
- In the same way as Erigon, you use MDBX. What are the specific advantages and potential failure modes of MDBX compared to the more common RocksDB/PebbleDB under the extreme and sustained write load that Bloatnet will generate? Does MDBX have an analogous process to compaction that could cause I/O stalls or performance degradation? If so, how can we try to push it to the limit? And what metrics should we be collecting to make sure we understand how it responds to the sustained bloating?
- Is simply triggering the maximum number of writes per block what will force Reth's state root computation to take as long as possible? Or are there combinations with other background processes, or types of tx patterns, that can slow it down more (see the sketch below for one example)?
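
As one example of a pattern that is not just "maximum writes per block": mass account creation, which forces the sparse trie update to insert many brand-new leaves (the creations case from the multiproof question above). The sketch below deploys many trivial contracts from a dev account; the 10-byte init code is the classic minimal deployer that returns a 1-byte runtime.

```python
# Sketch only: many contract creations in quick succession, each adding a fresh
# account (and code entry) that the state-root pipeline has never seen before.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))
sender = w3.eth.accounts[0]

# PUSH1 0x01, PUSH1 0x00, MSTORE8, PUSH1 0x01, PUSH1 0x00, RETURN
# -> deploys a 1-byte runtime (0x01): the cheapest non-empty contract.
MINIMAL_INIT_CODE = "0x600160005360016000f3"

for _ in range(500):
    tx_hash = w3.eth.send_transaction({
        "from": sender,
        "data": MINIMAL_INIT_CODE,  # no "to" field => contract creation
        "gas": 100_000,
    })
w3.eth.wait_for_transaction_receipt(tx_hash)
```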
### General questions
- Given the accelerated rate of state growth under a 100M gas limit, what is your projection for how many resources a home staker will need, relative to the recommended hardware, in order to support 2x mainnet state size without significant performance degradation that could affect rewards or proposals?
- During a deep reorg (e.g., 10+ blocks) on a 2x state size database, what are the primary performance bottlenecks you expect to find in Reth? Aside from having it under heavy load, online pruning, and/or compacting (or any analogous process, if applicable), is there anything else that could make a reorg harder for Reth to digest?
- Are there any particular scenarios (at 100M gas and 2x mainnet state size) that you are worried about w.r.t. your client architecture and the trade-offs you've made in the implementation of any part of the codebase? What would you like to see tested for 100M gas and 2x mainnet state size? Please be specific and elaborate as much as possible. This will help us figure out any high-detail testing scenarios we might be missing that are closely tied to the architecture and design of the client.