Ethereum slasher backfilling algorithm

by Leo Lara (github.com/leolara | https://app.poap.xyz/scan/leolara.eth)

Abstract

Slashing is a vital part of the Ethereum PoS consensus layer. Work by several contributors has been done in order to optimize it.

Prysm is an implementation of PoS consensus layer node software. During its development it was observed that is not trivial to backfill previous information into the slasher. This is, how to consider attestations and block proposals included in blocks before the chain head after initial sync.

Previous work done on Prysm concluded that backfilling during the initial sync would unacceptably slow down initial sync, while doing it afterwards would be very inefficient as aparently would require to reconstruct all past states. Also, it is needed to be done in batches in order to do not impact the general performance of the node.

In this proposal it is concluded that to backfill the slasher it is actually not necessary to recreate past states. Any future state to the block data that is being backfilled into the slasher is enough. Additionally a way to throttle the backfilling in Prysm in order to do not impact the node performance is proposed.

Introduction

Slashing is a fundamental part of Ethereum PoS consensus security as it disincentives and reacts to harmful actions done by validators. Validators that performs these harmful actions, or slashable offenses, are slashed, which means that they will lose stake and eventually not allowed to participate anymore on the consensus (forced exit). There are three types of slashable offences [Eth2 Slashing Prevention Tips].

For slashing to work the beacon chain node (BN) used by validators, need to index and match all blocks proposed and all attestations done by all validators from the time period between now and now minus WEAK_SUBJECTIVITY_PERIOD epochs. For two types this is mostly trivial. For the surrounding vote slashable offence, it is not as straightforward but some work has been done that allows to build an index that has constant complexity for insertion and matching [sproul, protolambda and prysmaticlabs].

Prysm has a full slasher implementation included. However, it only includes and matches in the indexes, blocks and attestations that arrive when the node is synchronized. Additional code is required to add blocks and attestation to the slasher indexes that are retrieved while synchronizing. This activity is called slasher backfilling.

Research has been done about how to implement slasher backfilling and a prototype has been implemented by Raul Jordan [ prysmaticlabs research, open issue at prysm repo and Raul Jordan prototype ].

Raul Jordan points out the main reason more research needs before including this:

Algorithm would perform poorly and slow down the node, as it requires to rebuild old beacon chain states.
Something needs to be added to batch or throttle the backfilling algorithm so it does not slow down the node so much that it prevents it for running synchronized.

In this document a solution to both problems is proposed.

Current prototype implementation

https://github.com/prysmaticlabs/prysm/pull/9211/files

The main part that requires study from the prototype is this: https://github.com/prysmaticlabs/prysm/pull/9211/files#diff-1112be79c761db4cc8c3553ef3d38a8ebe437885a813de26415190b2194a7888R84-R121






































for i, block := range blocks {
    if block.Block().Slot() == 0 {
        continue // Skip genesis.
    }
    header, err := blockutil.SignedBeaconBlockHeaderFromBlock(block)
    if err != nil {
        return err
    }
    headers = append(headers, &slashertypes.SignedBlockHeaderWrapper{
        SignedBeaconBlockHeader: header,
        SigningRoot:             roots[i],
    })
    preState, err := s.getBlockPreState(s.ctx, block.Block())
    if err != nil {
        return err
    }
    if preState == nil || preState.IsNil() {
        continue // Block is useless without a pre-state.
    }
    for _, att := range block.Block().Body().Attestations() {
        committee, err := helpers.BeaconCommitteeFromState(preState, att.Data.Slot, att.Data.CommitteeIndex)
        if err != nil {
            return err
        }
        indexedAtt, err := attestationutil.ConvertToIndexed(s.ctx, att, committee)
        if err != nil {
            return err
        }
        signingRoot, err := indexedAtt.Data.HashTreeRoot()
        if err != nil {
            return err
        }
        atts = append(atts, &slashertypes.IndexedAttestationWrapper{
            IndexedAttestation: indexedAtt,
            SigningRoot:        signingRoot,
        })
    }
}

This is the main loop of the prototype. It goes over the blocks that needs backfilling, gets the pre-state of that, and then indexes all the attestations in the block using the pre-state.

As it is asserted in previous research calculating the pre-state of the block is really costly, unless we are using an archive node. An analysis of the code confirms this, as it needs to go far in the past for blocks (WEAK_SUBJECTIVITY_PERIOD epochs) the pre-state will not be stored neither in disk or memory, it will need to go to the genesis state and from there start performing all the transitions for all the blocks, which is equivalent to syncing after all the blocks are downloaded.

It is worth mentioning, that the normal slasher does not have to do this, as it works directly on the head of the chain blocks and recent states are available.

Hence, this articles focuses on how to reduce or hopefully eliminate the need to calcualte the pre-state.

Indexed attestation data dependencies on the state

The pre-state is needed to calculate indexed attestations (spec: IndexedAttestation) from the attestations as they are included in the block (spec: Attestation).

An analysis of the algorithm to index attestations (spec:get_indexed_attestation) shows that it depends on the committees assignations of its slot (spec:get_beacon_committee). This assignation depends on two datum that themselves depend on the state:

A seed: get_seed(state, epoch, DOMAIN_BEACON_ATTESTER)
The active validators list get_active_validator_indices(state, epoch)

Further analysis of each calculation is done below.

Seed calculation dependency on state

The state contains an array of randao mixes

randao_mixes: Vector[Bytes32, EPOCHS_PER_HISTORICAL_VECTOR],

the seed depends on

randao_mixes[(epoch - MIN_SEED_LOOKAHEAD - 1) % EPOCHS_PER_HISTORICAL_VECTOR] with MIN_SEED_LOOKAHEAD >= 1

and a state transition modifies this array on the position randao_mixes[epoch % EPOCHS_PER_HISTORICAL_VECTOR].

Hence, states transitions on future epochs do not modify the randao_mixes of past epochs up to EPOCHS_PER_HISTORICAL_VECTOR. The backfilling algorithm only needs to go WEAK_SUBJECTIVITY_PERIOD in the past, and WEAK_SUBJECTIVITY_PERIOD < EPOCHS_PER_HISTORICAL_VECTOR.

In conclusion, the backfilling algorithm can use any future state to calculate the seed.

Active validators dependency on state

The active validators at a epoch e can be calculated from a state of any epoch the same or higher than e, as it is shown bellow.

Relevant spec details, definions and lemmas

The state keeps an array of all the validators:








class Validator(Container):
    activation_epoch: Epoch
    exit_epoch: Epoch
    # other fields...
    
class BeaconState(Container):    
    validators: List[Validator, VALIDATOR_REGISTRY_LIMIT]
    # other fields...

To decide if a valdiator is active:


def is_active_validator(validator: Validator, epoch: Epoch) -> bool:
    return validator.activation_epoch <= epoch < validator.exit_epoch

When a validator is added activation_epoch and exit_epoch are both set to FAR_FUTURE_EPOCH. Then when it is decided when the validator is included activation_epoch is set, and at last when the exit is decided exit_epoch is set.

Therefore, it passes three stages in order:

Stage 1 (activation_epoch=FAR_FUTURE_EPOCH, exit_epoch=FAR_FUTURE_EPOCH)
Stage 2 (activation_epoch=ae, exit_epoch=FAR_FUTURE_EPOCH)
Stage 3 (activation_epoch=ae, exit_epoch=ee)

Every state transition, if these values change for a validator will do it in this order.

Definition D.1

Given a validator v. ae(v) is the activation epoch of a validator and ee(v) is the exit epoch of a validator. This is a constant for each validator independently of when this values are settled in the beacon chain state.

Lemma L.1

For any validator v we have that ae(v) < ee(v).

Proof. Follows from the spec.

Definition D.2

Given a validator v. et1(v) is the epoch when the transition from stage 1 to state 2 occurs and et2(v) is the epoch when the transitio from stage 2 to stage 3 occurs.

Definition D.3

A(state, validator, epoch) is true iff is_active_validator(validator, epoch) returns true at state. Note that epoch does not have to be the same as state.epoch.

Definition D.4

AS(state, epoch) = { val validator such that A(state, val, epoch) }
This is the set of active validators at epoch with the information available at state. Note that epoch does not have to be the same as state.epoch.

Lemma L.2

et1(v) <= ae(v) < et2(v) < ee(v)

Proof. Out of scope, from the spec.

Main proof

Theorem T.1

Given a state st1 and a validator val. For all state st2 such that st2.slot > st1.slot. If A(st1, val, st1.epoch) then A(st2, val, st1.epoch).

Proof.

Consider that st2.slot > st1.slot implies

(1) st2.epoch >= st1.epoch.

From (1), previous definitions, lemmas, and A(st1, val, st1.epoch):

(2) et1(val) <= ae(val) <= st1.epoch < ee(val),

(3) et1(val) <= ae(val) <= st1.epoch <= st2.epoch which implies et1(val) <= st2.epoch

Note that we cannot say anything about st2.epoch and et2(val). Then:

(P.1) either st2.epoch < et2(val) which means that at st2 val.exit_epoch=FAR_FUTURE_EPOCH,
(P.2) or st2.epoch < et2(val) which means that at st2 val.exit_epoch=ee(val).

For P.1 and from (3) val = (activation_epoch=ae(val), exit_epoch=FAR_FUTURE_EPOCH)) at st2, which means that is_active_validator(val, st1.epoch) at st2 returns true. Which directly implies A(st2, val, st1.epoch)

For P.2 and from (3) val = (activation_epoch=ae(val), exit_epoch=ee(val)) at st2, which means that is_active_validator(val, st1.epoch) at st2 returns true. Which directly implies A(st2, val, st1.epoch).

Therefore QED for both cases P.1 and P.2.

Theorem T.2

Given a state st1 and a validator val. For all state st2 such that st2.slot > st1.slot. If not A(st1, val, st1.epoch) then not A(st2, val, st1.epoch).

Proof.

Consider that st2.slot > st1.slot implies

(1) st2.epoch >= st1.epoch.

From (1), previous definitions, lemmas, and not A(st1, val, st1.epoch):

(2) st1.epoch < ae(val),

Note, that we cannot say anything about st2.epoch and et1(val) or et2(val).

There are the following posibilities:

either (P.1) st2.epoch < et1(val) < et2(val), hence at st2 val = (activation_epoch=FAR_FUTURE_EPOCH, exit_epoch=FAR_FUTURE_EPOCH),
or (P.2) et1(val) <= st2.epoch < et2(val), hence at st2 val = (activation_epoch=ae(val), exit_epoch=FAR_FUTURE_EPOCH),
or (P.3) et1(val) < et2(val) <= st2.epoch , hence at st2 val = (activation_epoch=ae(val), exit_epoch=ee(val)).

For P.1 is_active_validator(val, st1.epoch) at st2 returns false, so not A(st2, val, st1.epoch).

For P.2 and from (2) follows that is_active_validator(val, st1.epoch) at st2 returns false, so not A(st2, val, st1.epoch).

For P.3 and from (2) follows that is_active_validator(val, st1.epoch) at st2 returns false, so not A(st2, val, st1.epoch).

Therefore QED for cases P.1, P.2 and P.3.

Theorem T.3

Given a state st1 and a validator val. For all state st2 such that st2.slot > st1.slot. A(st1, val, st1.epoch) if and only if A(st2, val, st1.epoch).

Proof. T.1 and T.2 considered together.

Theorem T.4

Given a state st1, for all state st2 such that st2.slot > st1.slot. Then, AS(st1, st1.epoch) == AS(st2, st1.epoch).

Proof. By T.3 it is trivial that val belongs to AS(st1, st1.epoch) if and only if val belongs to AS(st2, st1.epoch).

QED the active validators set at an epoch, will be the same for a state at that epoch than for any state at a higher epoch.

Conclusion

The indexed attestations can be calculated from any state of a slot same or higher than the slot the attestations refer to. The importance of this is that we can relay on any recent state (the head block state for example is usually easily available). This reduces the biggest performance issue of backfilling the node slasher.

Proposed algorithm for slasher backfilling

Throttling

One point mentioned in previous worked and that we have not touched is how to throttle the slasher backfilling, so it does not prevent the normal functioning of the BN.

From analysis of the code we see that the computational effort is mainly proportional to the number of attestations processed.

The proposal is to have a BN configuration setting that is how much attestations to process before pausing the backfilling algorithm for SECONDS_PER_SLOT seconds. Consider that it is not necessary for the backfilling to be synchronized with the slots ticker.

Each person running a slasher BN could set this setting according to the performance of their machine.

This is one of the easiest ways to implement throttling. Other more sophisticated without configuration and self-regulated ones would require per thread CPU stats, which for example in golang it is not available (golang/go#41554).

Main algorithm

It would follow the order in which Raul's PR processed the blocks, but when the number of attestations gotten from a block rises over the threshold, we would stop processing more blocks until we have waited SECONDS_PER_SLOT seconds.

To index the attestations, we would use a cached state from the head state, if a attestation regards a higher slot we will require a more recent state from the head.

pseudo-code

















































func (s *Service) backfill(start, end types.Slot, maxAttestions uint64) error {
    curSlot := start
    if curSlot == 0 {
      curSlot = 1 // Skip genesis.
    }

    attestationsCounter := 0
    blocks := []{}
    attestations := []{}
    
	for curSlot <= end {
        curBlocks := s.beaconDB.BlocksBySlot(curSlot)

        if  len(attestations) + s.countAttestions(curBlocks) > maxAttestions {
            s.backfillBlocks(blocks)
            s.backfillAttestations(attestations)
            
            wait(SECONDS_PER_SLOT)
        }
    
        blocks = append(blocks, curBlocks)
        attestations = append(attestations, curBlocks.attesatations)
	}
}

func (s *Service) backfillBlocks(blocks) {
    headers := blockutil.SignedBeaconBlockHeaderFromBlocks(block)
    propSlashings := s.detectProposerSlashings(s.ctx, headers)
    s.processProposerSlashings(propSlashings)
}

func (s *Service) backfillAttestations(attestations) {
    indexed := []{}
    for _, att := range attestations {
        committee := helpers.BeaconCommitteeFromState(s.backfillState(att.Data.Slot), att.Data.Slot, att.Data.CommitteeIndex)
        indexedAtt := attestationutil.ConvertToIndexed(att, committee)
        indexed = append(indexed, indexedAtt)
    }
    attSlashings := s.checkSlashableAttestations( atts)
    s.processAttesterSlashings(attSlashings)
}

func (s *Service) backfillState(slot) {
    if s.cachedBackfillState == nil || s.cachedBackfillState.Slot < slot {
        s.cachedBackfillState = s.getHeadState()
    }
    
    return s.cachedBackfillState
}

I am using here pseudo-golang, some types and trivial code/optimizations are omitted.

Thanks for reading :-)

Leo Lara (github.com/leolara | https://app.poap.xyz/scan/leolara.eth)