This doc is intended as an orientation guide. Each implementation comes with different tradeoffs that affect resource consumption, and some possible optimizations have only been partially adopted across clients.

# Bandwidth

## Gossip

Beacon chain bandwidth is dominated by gossip of SignedAggregateAndProof (global topic) and Attestation (subnet topics).

**Aggregate bandwidth**

There is a maximum of 64 attestation subnets, with a target committee size of 128 participants per slot. So at validator counts > `32 * 64 * 128` = 262,144 the subnet count is maxed out. On each subnet there is a target of 16 aggregators, which produce all the traffic allowed on the SignedAggregateAndProof global topic.

![](https://hackmd.io/_uploads/HkJuDWiY3.png)

**Attestation subnets bandwidth**

Each attestation subnet includes, at minimum, subscribers that forward all traffic:

- Selected aggregators
- Long-lived random subnet subscribers

Non-aggregator participants can broadcast their message to subscribers but are not required to forward other traffic.

The probability of a validator being selected as an aggregator for one epoch is:

```
P = 16 / max(128, N // 64 // 32)
```

An aggregator can only publish aggregates containing attestations for its assigned slot. While implementations take some actions ahead of time to ensure a healthy mesh before the aggregation slot, an aggregator _could_ subscribe only during a single slot.

![](https://hackmd.io/_uploads/BJRUHfoK2.png)

_Ref [Ethereum-beacon-chain-bandwith.ipynb](https://app.noteable.io/f/149afe15-ee8f-4ef6-931e-09b021024718/Ethereum-beacon-chain-bandwith.ipynb)_

The initial rule for long-lived random subnet selection is 1 random subnet subscription per connected validator (capped at 64). Since [ethereum/consensus-specs#3312](https://github.com/ethereum/consensus-specs/pull/3312) it is a fixed 2 subnets per beacon node regardless of connected validators. As per spec, these 2 subnets never overlap. Note that the latter spec is not rolled out as of July 2023 and will be progressively rolled out through late 2023.

![](https://hackmd.io/_uploads/HyVEHfoKh.png)

Post #3312, for any number of connected validators, all beacon nodes will experience the same level of traffic:

![](https://hackmd.io/_uploads/SyyrSGoth.png)

# Memory

Relevant memory hogs:

- State cache (implementation specific / unbounded)
- Pubkey cache (`Vec<index, JacobianPoint>`)
- Deposit tree cache
- Gossip queues
- Gossip history
- Gossip seen cache

## State cache

AFAIK all clients require the entirety of a state to be loaded in memory in order to process blocks. Hashing a full beacon state from scratch is expensive (5-10 s), so a hash cache is required.

There are two common representations of the state and its hash cache:

- linear [Prysm, Nimbus, Lighthouse* (transitioning to tree)]
- tree [Lodestar, Teku]

An SSZ-serialized state is approximately `139 * N` bytes (N = validator set size), so 1M validators -> 139 MB per state of data alone, with no caches. A tree representation requires significantly more memory for a single state but reduces the cost of additional states that structurally share data. In Lodestar benchmarks a full epoch's worth of descendant states takes ~1.5x the memory of a single state, but the cost of a single state is 5-10x its linear serialized size.

In good network conditions almost all processing can be done with the head state. However, in adverse conditions the number of states required to process network data can grow unbounded, as a function of parallel forking. Most clients choose to keep recently accessed states in memory and regenerate the rest from disk on demand.
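To make this tradeoff concrete, here is a back-of-the-envelope sketch using the figures above (the ~139 bytes per validator estimate and the 5-10x tree overhead); the memory budget is a hypothetical parameter for illustration, not any client's default:

```python
# Back-of-the-envelope state-cache sizing, using the rough figures above.
# BYTES_PER_VALIDATOR and TREE_OVERHEAD come from this section's estimates;
# the memory budget is a hypothetical parameter, not any client's default.
BYTES_PER_VALIDATOR = 139   # approx. SSZ-serialized state bytes per validator
TREE_OVERHEAD = 7           # assumed midpoint of the 5-10x tree representation cost

def linear_state_size(n_validators: int) -> int:
    """Approximate size of one SSZ-serialized beacon state, in bytes."""
    return BYTES_PER_VALIDATOR * n_validators

def states_in_budget(n_validators: int, budget_bytes: int, tree: bool = True) -> int:
    """How many fully independent states fit in a memory budget.

    Ignores structural sharing between descendant states, so it is a
    worst-case (heavy forking) estimate.
    """
    per_state = linear_state_size(n_validators) * (TREE_OVERHEAD if tree else 1)
    return budget_bytes // per_state

one_gib = 2**30
print(linear_state_size(1_000_000) // 10**6)     # ~139 MB of raw state data at 1M validators
print(states_in_budget(1_000_000, 4 * one_gib))  # independent states fitting in a 4 GiB budget
```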
Larger states mean fewer states kept in memory at once, limiting the severity of adverse conditions tolerated before having to drop traffic or crash (not an uncommon outcome in recent incidents). The exact tipping points are not trivial to compute and depend heavily on continuous client improvements. Clients could explore modeling the state in a similar way to the execution layer, which in theory would tolerate much larger state sizes.

# Compute

**Crypto BLS**

In good network conditions compute is dominated by BLS signature verification. BLST takes ~1 ms for a single signature verification. However, there are two main optimizations:

- Batch-verify multiple signature sets of distinct messages [ethresear.ch#5407](https://ethresear.ch/t/fast-verification-of-multiple-bls-signatures/5407): amortizes the final pairing, tending towards a 50% compute reduction.
- Batch-verify multiple signatures of the same message. Both attestations and aggregates on the same (slot, index) tuple are likely to have identical AttestationData. Many signature sets can be added together and pay a single signature verification, tending towards a +90% reduction (only paying the signature check + an add op).

**Crypto noise**

All traffic between beacon nodes goes through a libp2p stack and is:

- encrypted with Noise
- multiplexed with yamux or mplex

Noise uses ChaCha20-Poly1305; for Lodestar, our AssemblyScript implementation seals 1 MB in 15 ms, and the same for open.

**State transition**

The state transition contributes marginally to total compute. However, it is on the hot path of block verification. Beacon chain block processing is divided into:

- block processing: done per block, typically very fast (< 10 ms), sublinear in validator set size
- epoch processing: done every time processing crosses an epoch boundary, much slower (multiple hundreds of ms), linear in validator set size

Slow epoch processing is the main cause of late slot-0 blocks, as it delays block verification significantly compared to non-epoch boundaries. However, all clients implement an optimization to pre-compute the epoch transition ahead of time, so that block processing is the only remaining computation upon receiving the block at slot 0.

![](https://hackmd.io/_uploads/HknRL8iKh.png)

_Credit: Michael Sproul "Failed blocks on mainnet July 2023"_

# Disk

Beacon chain nodes store:

- Beacon blocks: size somewhat independent of active set size
  - non-finalized + finalized: mandatory as per the p2p spec
- Beacon states: size proportional to active set size
  - non-finalized: required for most state cache strategies
  - finalized: optional, used only for archive nodes / explorers

Non-finalized states persisted to disk increase as a function of bad network conditions (distance to finality + forking). So bigger states significantly increase peak requirements in adverse conditions, with naive implementations suffering the most.

For the initial period of the Beacon chain, all clients implemented naive strategies of state persistence on disk: each different state is written in full. AFAIK some clients are improving here:

- Nimbus: stores immutable validator data separately
- Teku: stores states as trees / diffs
- Lighthouse: (WIP) diff storage

Diff storage should not be that vulnerable to state growth, as for a non-archive node disk requirements should be dominated by block storage.

----

Bandwidth charts reference: https://app.noteable.io/f/149afe15-ee8f-4ef6-931e-09b021024718/Ethereum-beacon-chain-bandwith.ipynb
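As a companion to the charts linked above, here is a minimal sketch of the committee-size and aggregator-probability arithmetic from the Gossip section (the constants are the spec targets cited there; no per-message sizes are modeled, so this is not a full bandwidth estimate):

```python
# Committee size and aggregator probability, per the Gossip section.
# Constants are the spec targets referenced in that section.
ATTESTATION_SUBNET_COUNT = 64
SLOTS_PER_EPOCH = 32
TARGET_COMMITTEE_SIZE = 128
TARGET_AGGREGATORS_PER_COMMITTEE = 16

def committee_size(n_validators: int) -> int:
    """Per-(slot, subnet) committee size once the 64 subnets are saturated."""
    return max(TARGET_COMMITTEE_SIZE,
               n_validators // ATTESTATION_SUBNET_COUNT // SLOTS_PER_EPOCH)

def aggregator_probability(n_validators: int) -> float:
    """P = 16 / max(128, N // 64 // 32), as quoted in the Gossip section."""
    return TARGET_AGGREGATORS_PER_COMMITTEE / committee_size(n_validators)

for n in (262_144, 600_000, 1_000_000):
    print(n, committee_size(n), round(aggregator_probability(n), 4))
```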