
Should we adopt EIP-6914: re-use indexes?

TLDR

EIP-6914 re-assigns the indexes of long-withdrawn validators to new validator deposits, to prevent unbounded growth of the beacon state. This document addresses some recurring concerns about the EIP:

  • Is unbounded beacon state growth actually a problem? Long term, yes
  • Can it be solved with engineering solutions (no EIP)? Not entirely
  • Should we adopt EIP-6914? Maybe, in several years

This document is written for both core devs and casual observers. Feel free to jump to the section of interest.

Background

The beacon chain state is basically a big array of validator records. A deposit for a new public key appends a record to this array, and there is no mechanism to ever prune validator records. The beacon chain state, therefore, experiences unbounded growth. Note that while the maximum possible number of active validators is bounded by the ETH supply, the same capital can deposit, exit, and re-deposit, creating multiple validator records per unit of ETH over time.

The performance problems induced by a large count of active validators are out of scope for this document. Here we focus on the additional load caused by inactive validators, i.e. dead weight in the state. Let's define dead_weight_rate as:

dead_weight_rate = (total_indexes - active_indexes) / active_indexes
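
For intuition: the October 2023 figures cited below (roughly 990k total indices, a dead_weight_rate of 0.125) correspond to about 880k active indices and 110k dead ones, since (990k - 880k) / 880k = 0.125.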

The beacon chain is still in its high-growth phase. There have been few exits relative to deposits so far, so the dead_weight_rate is low (0.125 as of October 2023). Factors that could significantly increase the dead_weight_rate in the mid term:

  • Big rotations from one LSD protocol to another
  • Regulatory forces over a big exchange offering staking products
  • Widespread consolidation with MaxEB

Due to the activation churn, it is not possible for the dead_weight_rate to exceed 1 in less than 1-2 years, limiting the scale of the problem short term. Long term, once the current stakers and operators rotate or retire, we could see a steady increase in the dead_weight_rate.

Why is a big consensus state problematic?

Casual observers may point out how big the execution state is, and that execution clients handle it just fine.

How beacon state is handled

As of 2023, all beacon chain implementations require loading at least one full beacon state into memory to process blocks. This differs from execution clients, which do not load the full execution state into memory; instead, they selectively load parts of the state from disk. This is why execution sync speeds today are dominated by the host's disk read/write speeds.

|                       | Execution state | Consensus state                           |
| --------------------- | --------------- | ----------------------------------------- |
| data in state         | Address storage | Validator records                         |
| lookups               | by address      | by index                                  |
| reads per block       | very sparse     | all state is read at least once per epoch |
| total serialized size | many GBs        | many MBs                                  |

Beacon chain nodes must perform routine maintenance operations that require reading all validator records. To iterate the state in full "frequently" while its size still fits in the RAM of consumer hosts, consensus clients choose to keep the full state available in memory. Otherwise, the full iterations required at epoch transitions would be too expensive for the processing speeds required today.

Cost of a big beacon state

Because consensus clients keep at least one full beacon state in memory, bigger states translate into a higher memory footprint. Derived computational costs include serialization of states, longer iterations during block processing, and more hashing. These computational costs are slight compared to the memory cost. Note that both depend heavily on how each client represents the state in memory and on disk.

Forgetting inactive validator records

Let's define an inactive validator i as a record that has already fully withdrawn and has the following values:

state.validators[i] = {
    pubkey:                       # irrelevant
    withdrawal_credentials:       # 0x01 credentials
    effective_balance: 0
    slashed:                      # irrelevant
    activation_eligibility_epoch: # some old epoch
    activation_epoch:             # some old epoch
    exit_epoch:                   # some old epoch
    withdrawable_epoch:           # some old epoch
}
state.balances[i] = 0
state.previous_epoch_participation[i] = 0
state.current_epoch_participation[i] = 0
state.inactivity_scores[i] = 0

If long withdrawn validator records can be re-assigned, can we ignore them?

Not entirely with the current spec

Once a validator has withdrawn, it can't participate in duties, nor can it be slashed past validator.withdrawable_epoch. However, an inactive validator record is still involved in the following actions:

| Action                              | Changes validator record                                                                    |
| ----------------------------------- | ------------------------------------------------------------------------------------------- |
| Top-up deposit                      | Yes, increases balance + effective balance; may later trigger a withdrawal and return to 0 |
| BLS to execution credentials change | Yes, mutates the withdrawal_credentials field once                                          |
| Part of another's AttesterSlashing  | No                                                                                           |

Thus, a beacon node must retain some information about inactive validators to process the actions above. Let's review them one by one.

Inactive validator actions

Top-up deposits

A beacon node must keep a map of validator pubkeys to indexes for all known validators (active or inactive) to process deposits quickly. The size of this map will grow forever. However, it could be persisted on disk, since deposits are an infrequent, economically bounded operation.

def apply_deposit(state: BeaconState, pubkey: BLSPubkey, ...) -> None:
    validator_pubkeys = [v.pubkey for v in state.validators]
    if pubkey not in validator_pubkeys:
        # Add validator to state
        ...
    else:
        increase_balance(state, validator_pubkeys.index(pubkey), amount)
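
A hypothetical sketch of that disk-persisted alternative (PubkeyIndexStore and its layout are illustrative, not taken from any client):

import dbm

class PubkeyIndexStore:
    """On-disk pubkey -> index map, so the ever-growing mapping does not
    need to live in RAM. Deposits are infrequent and economically bounded,
    so the slower disk lookup is acceptable."""

    def __init__(self, path: str):
        self.db = dbm.open(path, "c")  # create the store if missing

    def get_index(self, pubkey: bytes) -> int | None:
        if pubkey in self.db:
            return int.from_bytes(self.db[pubkey], "little")
        return None  # unknown pubkey: the deposit creates a new validator

    def set_index(self, pubkey: bytes, index: int) -> None:
        self.db[pubkey] = index.to_bytes(8, "little")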

BLS to execution change

At any point, a validator can submit a one-time operation to upgrade its withdrawal credentials from type 0x00 (BLS pubkey) to type 0x01 (execution address). A beacon node requires access to the validator's withdrawal credentials to verify the message.
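
Concretely, verification needs the record's withdrawal_credentials to check the pre-image commitment (abridged from the Capella spec's process_bls_to_execution_change):

def process_bls_to_execution_change(state: BeaconState,
                                    signed_address_change: SignedBLSToExecutionChange) -> None:
    address_change = signed_address_change.message
    validator = state.validators[address_change.validator_index]
    # The record must still hold 0x00 (BLS) type credentials...
    assert validator.withdrawal_credentials[:1] == BLS_WITHDRAWAL_PREFIX
    # ...and those credentials must commit to the BLS key signing the change
    assert validator.withdrawal_credentials[1:] == hash(address_change.from_bls_pubkey)[1:]
    ...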

The p2p verification rules for the bls_to_execution_change also require access to the validator record. https://github.com/ethereum/consensus-specs/blob/dev/specs/capella/p2p-interface.md#bls_to_execution_change

AttesterSlashing

While a withdrawn validator can't be slashed, it can be a participant in the bitfield of some other validator's slashable message. A beacon node may then have to aggregate thousands of pubkeys of long-withdrawn validators.

Beacon nodes cache deserialized pubkeys in a finalized cache for fast aggregation. Dropping the pubkeys of long-withdrawn validators would prevent verifying those AttesterSlashings efficiently, requiring the pubkeys to be retrieved and deserialized from the state.
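
A minimal sketch of that slow path (hypothetical; deserialize_pubkey stands in for each client's BLS point deserialization):

def get_pubkeys_for_slashing(indices, pubkey_cache, state):
    # Collect deserialized pubkeys for the indices in an AttesterSlashing.
    pubkeys = []
    for i in indices:
        pk = pubkey_cache.get(i)  # fast path: finalized deserialized-pubkey cache
        if pk is None:
            # Slow path: a long-withdrawn pubkey was dropped from the cache;
            # re-read it from the state and deserialize it (expensive).
            pk = deserialize_pubkey(state.validators[i].pubkey)  # hypothetical helper
        pubkeys.append(pk)
    return pubkeys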

Memory cost of inactive validators

Each validator index contributes data items to each of these beacon state arrays:

class BeaconState(Container):
    ...
    validators: List[Validator, VALIDATOR_REGISTRY_LIMIT]
    balances: List[Gwei, VALIDATOR_REGISTRY_LIMIT]
    previous_epoch_participation: List[ParticipationFlags, VALIDATOR_REGISTRY_LIMIT]
    current_epoch_participation: List[ParticipationFlags, VALIDATOR_REGISTRY_LIMIT]
    inactivity_scores: List[uint64, VALIDATOR_REGISTRY_LIMIT]

For reference, the contribution of each index to the beacon state serialized size is 139 bytes ((48+32+8+1+8+8+8+8)+8+1+1+8). The actual size in memory depends on how each client represents the beacon state, plus the hashing cache, other caches, and forkchoice data structures.
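
As a quick check of that arithmetic (field sizes in bytes, from the containers above):

validator_record = 48 + 32 + 8 + 1 + 8 + 8 + 8 + 8  # pubkey, withdrawal_credentials, effective_balance, slashed, four epochs
per_index = validator_record + 8 + 1 + 1 + 8        # balance, both participation flags, inactivity score
assert per_index == 139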

EIP-6914 alternative: compressing inactive validators

Given that the possible actions of an inactive validator are limited, it's possible to represent them in compressed form.

| Action                  | Required data                                                                                                                                       |
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| Top-up deposits         | Knowledge of the pubkey, to map it to an index. Can be handled by a disk-persisted cache at the cost of slower deposit processing                   |
| BLS to execution change | Knowledge that the credentials are already 0x01: one bit of data                                                                                     |
| AttesterSlashing        | Knowledge of the pubkey, to compute aggregate pubkeys. Could be handled by a disk-persisted cache at the cost of slower AttesterSlashing processing |

Thus, one could represent validators as the following enum:

enum Validator {
    /// Validator which:
    /// - All status epochs are in the past
    /// - Withdrawal credentials are 0x01
    /// - Effective balance and balance are zero
    LongWithdrawnValidator,
    FullValidator(spec::Validator),
}

In response to any mutating action on an inactive validator, its record can switch back to being represented in full.

The beacon epoch transition has a few operations that require loops over most validator records. The Validator::LongWithdrawnValidator status is sufficient for all of them, since such a record can not cause any state change.

| Operation                          | Iterates inactive validators | Action on inactive validators             |
| ---------------------------------- | ---------------------------- | ----------------------------------------- |
| process_inactivity_updates         | No                           | -                                         |
| process_rewards_and_penalties      | No                           | -                                         |
| process_registry_updates           | Yes                          | None, can ignore inactive validators      |
| process_slashings                  | Yes                          | None, can ignore inactive validators      |
| process_effective_balance_updates  | Yes                          | None, balance and effective balance are 0 |
| process_withdrawals (per block)    | Yes                          | None, balance is zero                     |
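
To illustrate, a minimal Python sketch of such a loop (assuming the compressed representation above; names are hypothetical):

class LongWithdrawnValidator:
    """Compressed stand-in for a long-withdrawn record (see enum above)."""

def process_slashings_sketch(validators):
    for index, validator in enumerate(validators):
        if isinstance(validator, LongWithdrawnValidator):
            continue  # cannot cause any state change; skip outright
        ...  # full per-validator slashing logic for live records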

Will it actually help?

In languages with flat enum layouts, like Rust, an array of enums allocates enough memory to represent the biggest variant. So if we represent the beacon state in simple flat memory, the above optimization will not help reduce its memory footprint (unless the full variant is boxed behind a pointer, trading memory for indirection).

Some consensus implementations (Teku, Lodestar, and soon Lighthouse) represent the state as a tree, not flat memory. In that case, it's feasible to significantly reduce the memory footprint of validator records in the state. It's important to note that states represented as a tree allow data to be shared structurally, so the memory cost of inactive validators is usually paid only once, in the root state.

Adding EIP-6914: engineering cost

EIP-6914 is not free in terms of engineering. It breaks the assumption that the finalized array of pubkeys is append-only. Clients will have to maintain an unfinalized map of re-assigned indexes, since with EIP-6110 the order of deposits is susceptible to re-orgs.

The current spec requires clients to:

  • add new cache: unfinalized index to pubkey cache
  • modify cache: unfinalized pubkey to index cache
  • add new cache: re-assignable indexes cache

Index to pubkey cache

With EIP-6914, validator records of long-withdrawn validators can be re-assigned to new validators in any block.

Example case

Consider two operations:

  • A deposit that will cause index i to be re-assigned
  • An AttesterSlashing including the pubkey of index i

The inclusion of that deposit will invalidate the AttesterSlashing, since the pubkey at index i will have mutated. For this reason, implementation bugs in this unfinalized cache will cause consensus bugs.

Cache modifications

Most clients feature a cache to quickly access deserialized BLS pubkeys: a contiguous array providing fast lookups by index. To deal with the example above, the cache must now be fork-aware, able to return a different pubkey at the same position depending on the branch.

fn get_pubkey(i: Index) -> Option<PublicKey> {
    // The unfinalized cache takes precedence: index i may have been
    // re-assigned to a new validator on an unfinalized branch.
    if let Some(pk) = unfinalized_cache(i) {
        return Some(pk);
    }
    finalized_cache(i)
}

The above function adds an additional hash-map lookup when indexing any pubkey. This may add some overhead when processing large counts of attestations. However, the unfinalized_cache check can be skipped for any signed object that requires active duties: attestations, sync committee messages, blocks, etc.

Pubkey to index cache

To process deposits quickly, clients must look up whether a deposit's pubkey is already present in the block's pre-state. EIP-6110 will already require this cache to be fork-aware. With EIP-6914, it will also have to evict the pubkey of the re-assigned validator record.

Re-assignable indexes cache

The current spec for EIP-6914 uses a full sweep to pick the next index to re-use. This requires some cache to prevent full iterations on every deposit. Details of the cache requirements and a proposed implementation: https://github.com/ethereum/consensus-specs/pull/3307
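
One possible shape for that cache (a hypothetical sketch; the linked PR describes the actual proposal): a FIFO of indexes known to be re-usable as of finalization, consumed by new deposits.

from collections import deque

# Hypothetical sketch: indexes known to be re-usable as of the finalized
# checkpoint, maintained incrementally so deposits never need a full sweep.
reusable_indexes = deque()

def next_deposit_index(num_validators):
    # Re-assign a long-withdrawn index when available; otherwise
    # append a brand-new record at the end of the registry.
    if reusable_indexes:
        return reusable_indexes.popleft()
    return num_validators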

Ecosystem considerations

With EIP-6914

Stop identifying validators by index

What happens if a validator is no longer identified by its index? Identification by index is extremely common today, since an index is much shorter than the full 48-byte BLS pubkey.

Without EIP-6914

More expensive checkpoint sync

The inescapable fact is that the size of the raw beacon state will keep growing as more and more indexes are allocated. Because some fields are zero, compression can reduce the data cost of inactive validators, but only to some extent (TODO: compute it).

Beacon states are regularly transmitted over the wire today to support checkpoint sync. On October 30th 2023, the mainnet beacon state had a raw serialized size of 140 MB (990k indices); gzip compression reduces it to 60 MB (0.43x the original size). If checkpointz servers enable compression by default, it will take a significant amount of time before beacon states become big enough to cause problems.

EIP-6914 with EIP-7251 (Increase the MAX_EFFECTIVE_BALANCE)

If EIP-7251 is included, beacon chain state growth is likely to slow down significantly short-mid term. New depositing stake from big operators will produce 1/k new validators records, where k can range between 1 and 64. Long term, if the minimum active balance is reduced, the problem becomes relevant again. Also, a big consolidation event will create a big pool of re-usable indices. Including EIP-6914 after will potentially cause the beacon state to not grow in size for a significant period of time.