
Should we adopt EIP-6914: re-use indexes?

TLDR

EIP-6914 re-assigns the indexes of long-withdrawn validators to new validator deposits, to prevent unbounded growth of the beacon state. This document addresses some recurring concerns about the EIP:

  • Is unbounded beacon state growth actually a problem? Long term, yes
  • Can it be solved with engineering solutions (no EIP)? Not entirely
  • Should we adopt EIP-6914? Maybe, in several years

This document is written for both core devs and casual observers. Feel free to jump to the section of interest.

Background

The beacon chain state is basically a big array of validator records. A deposit for a new public key appends a record to this array, and there is no mechanism to ever prune validator records. The beacon chain state, therefore, experiences unbounded growth. Note that while the maximum possible number of active validators is bounded by the ETH supply, the same capital can deposit, exit, and re-deposit, creating multiple validator records per unit of ETH over time.

The performance problems induced by a large count of active validators are out of scope for this document. Here we focus on the additional load caused by inactive validators, i.e. dead weight in the state. Let's define dead_weight_rate as:

dead_weight_rate = (total_indexes - active_indexes) / active_indexes
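
For intuition: the October 2023 figures cited below (roughly 990k total indices, a dead_weight_rate of 0.125) correspond to about 880k active indices and 110k dead ones, since (990k - 880k) / 880k = 0.125.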

The beacon chain is still in its high-growth phase. There have been few exits relative to deposits so far, so the dead_weight_rate is low (0.125 as of October 2023). Factors that could significantly increase the dead_weight_rate in the mid term:

  • Big rotations from one LSD protocol to another
  • Regulatory forces over a big exchange offering staking products
  • Widespread consolidation with MaxEB

Due to the activation churn, it is not possible for the dead_weight_rate to exceed 1 in less than 1-2 years, limiting the scale of the problem short term. Long term, once the current stakers and operators rotate or retire, we could see a steady increase in the dead_weight_rate.

Why is a big consensus state problematic?

Casual observers may point out how big the execution state is, and that execution clients handle it just fine.

How beacon state is handled

As of 2023, all beacon chain implementations require loading at least one full beacon state into memory to process blocks. This differs from execution clients, which do not load the full execution state into memory; instead, they selectively load parts of the state from disk. This is why execution sync speeds today are dominated by the host's disk read/write speeds.

|                       | Execution state | Consensus state                           |
| --------------------- | --------------- | ----------------------------------------- |
| data in state         | Address storage | Validator records                         |
| lookups               | by address      | by index                                  |
| reads per block       | very sparse     | all state is read at least once per epoch |
| total serialized size | many GBs        | many MBs                                  |

Beacon chain nodes must perform routine maintenance operations that require reading all validator records. To iterate the state in full "frequently" while its size still fits in the RAM of consumer hosts, consensus clients choose to keep the full state available in memory. Otherwise, the full iterations required at epoch transitions would be too expensive for the processing speeds required today.

Cost of a big beacon state

Because consensus clients keep at least one full beacon state in memory, bigger states translate into a higher memory footprint. Derived computational costs include serialization of states, longer iterations during block processing, and more hashing. These computational costs are slight compared to the memory cost. Note that both depend heavily on how each client represents the state in memory and on disk.

Forgetting inactive validator records

Let's define an inactive validator i as a record that has already fully withdrawn and has the following values:

state.validators[i] = {
    pubkey:                       # irrelevant
    withdrawal_credentials:       # 0x01 credentials
    effective_balance: 0
    slashed:                      # irrelevant
    activation_eligibility_epoch: # some old epoch
    activation_epoch:             # some old epoch
    exit_epoch:                   # some old epoch
    withdrawable_epoch:           # some old epoch
}
state.balances[i] = 0
state.previous_epoch_participation[i] = 0
state.current_epoch_participation[i] = 0
state.inactivity_scores[i] = 0

If long withdrawn validator records can be re-assigned, can we ignore them?

Not entirely with the current spec

Once a validator has withdrawn, it can't participate in duties, nor can it be slashed past validator.withdrawable_epoch. However, an inactive validator record is still involved in the following actions:

| Action                              | Changes validator record                                                                    |
| ----------------------------------- | ------------------------------------------------------------------------------------------- |
| Top-up deposit                      | Yes, increases balance + effective balance; may later trigger a withdrawal and return to 0 |
| BLS to execution credentials change | Yes, mutates the withdrawal_credentials field once                                          |
| Part of another's AttesterSlashing  | No                                                                                           |

Thus, a beacon node must retain some information about inactive validators to process the actions above. Let's review them one by one.

Inactive validator actions

Top-up deposits

A beacon node must keep a map of validator pubkeys to indexes for all known validators (active or inactive) to process deposits quickly. The size of this map will grow forever. However, it could be persisted on disk, since deposits are an infrequent, economically bounded operation.

def apply_deposit(state: BeaconState, pubkey: BLSPubkey, ...) -> None:
    validator_pubkeys = [v.pubkey for v in state.validators]
    if pubkey not in validator_pubkeys:
        # Add validator to state
        ...
    else:
        increase_balance(state, validator_pubkeys.index(pubkey), amount)
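
A hypothetical sketch of that disk-persisted alternative (PubkeyIndexStore and its layout are illustrative, not taken from any client):

import dbm

class PubkeyIndexStore:
    """On-disk pubkey -> index map, so the ever-growing mapping does not
    need to live in RAM. Deposits are infrequent and economically bounded,
    so the slower disk lookup is acceptable."""

    def __init__(self, path: str):
        self.db = dbm.open(path, "c")  # create the store if missing

    def get_index(self, pubkey: bytes) -> int | None:
        if pubkey in self.db:
            return int.from_bytes(self.db[pubkey], "little")
        return None  # unknown pubkey: the deposit creates a new validator

    def set_index(self, pubkey: bytes, index: int) -> None:
        self.db[pubkey] = index.to_bytes(8, "little")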

BLS to execution change

At any point, a validator can submit a one-time operation to upgrade its withdrawal credentials from type 0x00 (BLS pubkey) to type 0x01 (execution address). A beacon node requires access to the validator's withdrawal credentials to verify the message.
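
Concretely, verification needs the record's withdrawal_credentials to check the pre-image commitment (abridged from the Capella spec's process_bls_to_execution_change):

def process_bls_to_execution_change(state: BeaconState,
                                    signed_address_change: SignedBLSToExecutionChange) -> None:
    address_change = signed_address_change.message
    validator = state.validators[address_change.validator_index]
    # The record must still hold 0x00 (BLS) type credentials...
    assert validator.withdrawal_credentials[:1] == BLS_WITHDRAWAL_PREFIX
    # ...and those credentials must commit to the BLS key signing the change
    assert validator.withdrawal_credentials[1:] == hash(address_change.from_bls_pubkey)[1:]
    ...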

The p2p verification rules for the bls_to_execution_change also require access to the validator record. https://github.com/ethereum/consensus-specs/blob/dev/specs/capella/p2p-interface.md#bls_to_execution_change

AttesterSlashing

While a withdrawn validator can't be slashed, it can be a participant in the bitfield of some other validator's slashable message. A beacon node may then have to aggregate thousands of pubkeys of long-withdrawn validators.

Beacon nodes cache deserialized pubkeys in a finalized cache for fast aggregation. Dropping the pubkeys of long-withdrawn validators would prevent verifying those AttesterSlashings efficiently, requiring the pubkeys to be retrieved and deserialized from the state.
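
A minimal sketch of that slow path (hypothetical; deserialize_pubkey stands in for each client's BLS point deserialization):

def get_pubkeys_for_slashing(indices, pubkey_cache, state):
    # Collect deserialized pubkeys for the indices in an AttesterSlashing.
    pubkeys = []
    for i in indices:
        pk = pubkey_cache.get(i)  # fast path: finalized deserialized-pubkey cache
        if pk is None:
            # Slow path: a long-withdrawn pubkey was dropped from the cache;
            # re-read it from the state and deserialize it (expensive).
            pk = deserialize_pubkey(state.validators[i].pubkey)  # hypothetical helper
        pubkeys.append(pk)
    return pubkeys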

Memory cost of inactive validators

Each validator index contributes data items to each of these beacon state arrays:

class BeaconState(Container):
    ...
    validators: List[Validator, VALIDATOR_REGISTRY_LIMIT]
    balances: List[Gwei, VALIDATOR_REGISTRY_LIMIT]
    previous_epoch_participation: List[ParticipationFlags, VALIDATOR_REGISTRY_LIMIT]
    current_epoch_participation: List[ParticipationFlags, VALIDATOR_REGISTRY_LIMIT]
    inactivity_scores: List[uint64, VALIDATOR_REGISTRY_LIMIT]

For reference, the contribution of each index to the beacon state serialized size is 139 bytes ((48+32+8+1+8+8+8+8)+8+1+1+8). The actual size in memory depends on how each client represents the beacon state, plus the hashing cache, other caches, and forkchoice data structures.
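
As a quick check of that arithmetic (field sizes in bytes, from the containers above):

validator_record = 48 + 32 + 8 + 1 + 8 + 8 + 8 + 8  # pubkey, withdrawal_credentials, effective_balance, slashed, four epochs
per_index = validator_record + 8 + 1 + 1 + 8        # balance, both participation flags, inactivity score
assert per_index == 139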

EIP-6914 alternative: compressing inactive validators

Given that the possible actions of an inactive validator are limited, it's possible to represent them in compressed form.

| Action                  | Required data                                                                                                                                       |
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| Top-up deposits         | Knowledge of the pubkey, to map it to an index. Can be handled by a disk-persisted cache at the cost of slower deposit processing                   |
| BLS to execution change | Knowledge that the credentials are already 0x01: one bit of data                                                                                     |
| AttesterSlashing        | Knowledge of the pubkey, to compute aggregate pubkeys. Could be handled by a disk-persisted cache at the cost of slower AttesterSlashing processing |

Thus, one could represent validators as the following enum:

enum Validator {
    /// Validator which:
    /// - All status epochs are in the past
    /// - Withdrawal credentials are 0x01
    /// - Effective balance and balance are zero
    LongWithdrawnValidator,
    FullValidator(spec::Validator),
}

In response to any mutating action on an inactive validator, its record can switch back to being represented in full.

The beacon epoch transition has a few operations that require loops over most validator records. The Validator::LongWithdrawnValidator status is sufficient for all of them, since such a record can not cause any state change.

| Operation                          | Iterates inactive validators | Action on inactive validators             |
| ---------------------------------- | ---------------------------- | ----------------------------------------- |
| process_inactivity_updates         | No                           | -                                         |
| process_rewards_and_penalties      | No                           | -                                         |
| process_registry_updates           | Yes                          | None, can ignore inactive validators      |
| process_slashings                  | Yes                          | None, can ignore inactive validators      |
| process_effective_balance_updates  | Yes                          | None, balance and effective balance are 0 |
| process_withdrawals (per block)    | Yes                          | None, balance is zero                     |
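
To illustrate, a minimal Python sketch of such a loop (assuming the compressed representation above; names are hypothetical):

class LongWithdrawnValidator:
    """Compressed stand-in for a long-withdrawn record (see enum above)."""

def process_slashings_sketch(validators):
    for index, validator in enumerate(validators):
        if isinstance(validator, LongWithdrawnValidator):
            continue  # cannot cause any state change; skip outright
        ...  # full per-validator slashing logic for live records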

Will it actually help?

In languages with flat enum layouts, like Rust, an array of enums allocates enough memory to represent the biggest variant. So if we represent the beacon state in simple flat memory, the above optimization will not help reduce its memory footprint (unless the full variant is boxed behind a pointer, trading memory for indirection).

Some consensus implementations (Teku, Lodestar, and soon Lighthouse) represent the state as a tree, not flat memory. In that case, it's feasible to significantly reduce the memory footprint of validator records in the state. It's important to note that states represented as a tree allow data to be shared structurally, so the memory cost of inactive validators is usually paid only once, in the root state.

Adding EIP-6914: engineering cost

EIP-6914 is not free in terms of engineering. It breaks the assumption that the finalized array of pubkeys is append-only. Clients will have to maintain an unfinalized map of re-assigned indexes, since with EIP-6110 the order of deposits is susceptible to re-orgs.

The current spec requires clients to:

  • add new cache: unfinalized index to pubkey cache
  • modify cache: unfinalized pubkey to index cache
  • add new cache: re-assignable indexes cache

Index to pubkey cache

With EIP-6914, validator records of long-withdrawn validators can be re-assigned to new validators in any block.

Example case

Consider two operations:

  • A deposit that will cause index i to be re-assigned
  • An AttesterSlashing including the pubkey of index i

The inclusion of that deposit will invalidate the AttesterSlashing, since the pubkey at index i will have mutated. For this reason, implementation bugs in this unfinalized cache will cause consensus bugs.

Cache modifications

Most clients feature a cache to quickly access deserialized BLS pubkeys: a contiguous array providing fast lookups by index. To deal with the example above, the cache must now be fork-aware, able to return a different pubkey at the same position depending on the branch.

fn get_pubkey(i: Index) -> Option<PublicKey> {
    // The unfinalized cache takes precedence: index i may have been
    // re-assigned to a new validator on an unfinalized branch.
    if let Some(pk) = unfinalized_cache(i) {
        return Some(pk);
    }
    finalized_cache(i)
}

The above function adds an additional hash-map lookup when indexing any pubkey. This may add some overhead when processing large counts of attestations. However, the unfinalized_cache check can be skipped for any signed object that requires active duties: attestations, sync committee messages, blocks, etc.

Pubkey to index cache

To process deposits quickly, clients must look up whether a deposit's pubkey is already present in the block's pre-state. EIP-6110 will already require this cache to be fork-aware. With EIP-6914, it will also have to evict the pubkey of the re-assigned validator record.

Re-assignable indexes cache

The current spec for EIP-6914 uses a full sweep to pick the next index to re-use. This requires some cache to prevent full iterations on every deposit. Details of the cache requirements and a proposed implementation: https://github.com/ethereum/consensus-specs/pull/3307
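
One possible shape for that cache (a hypothetical sketch; the linked PR describes the actual proposal): a FIFO of indexes known to be re-usable as of finalization, consumed by new deposits.

from collections import deque

# Hypothetical sketch: indexes known to be re-usable as of the finalized
# checkpoint, maintained incrementally so deposits never need a full sweep.
reusable_indexes = deque()

def next_deposit_index(num_validators):
    # Re-assign a long-withdrawn index when available; otherwise
    # append a brand-new record at the end of the registry.
    if reusable_indexes:
        return reusable_indexes.popleft()
    return num_validators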

Ecosystem considerations

With EIP-6914

Stop identifying validators by index

What happens if a validator is no longer identified by its index? Identification by index is extremely common today, since an index is much shorter than the full 48-byte BLS pubkey.

Without EIP-6914

More expensive checkpoint sync

The inescapable fact is that the size of the raw beacon state will keep growing as more and more indexes are allocated. Because some fields are zero, compression can reduce the data cost of inactive validators, but only to some extent (TODO: compute it).

Beacon states are regularly transmitted over the wire today to support checkpoint sync. On October 30th 2023, the mainnet beacon state had a raw serialized size of 140 MB (990k indices); gzip compression reduces it to 60 MB (0.43x the original size). If checkpointz servers enable compression by default, it will take a significant amount of time before beacon states become big enough to cause problems.

EIP-6914 with EIP-7251 (Increase the MAX_EFFECTIVE_BALANCE)

If EIP-7251 is included, beacon chain state growth is likely to slow down significantly short-mid term. New depositing stake from big operators will produce 1/k new validators records, where k can range between 1 and 64. Long term, if the minimum active balance is reduced, the problem becomes relevant again. Also, a big consolidation event will create a big pool of re-usable indices. Including EIP-6914 after will potentially cause the beacon state to not grow in size for a significant period of time.