by Leo Lara (github.com/leolara | https://app.poap.xyz/scan/leolara.eth)
Slashing is a vital part of the Ethereum PoS consensus layer. Work by several contributors has been done in order to optimize it.
Prysm is an implementation of PoS consensus layer node software. During its development it was observed that is not trivial to backfill previous information into the slasher. This is, how to consider attestations and block proposals included in blocks before the chain head after initial sync.
Previous work done on Prysm concluded that backfilling during the initial sync would unacceptably slow down initial sync, while doing it afterwards would be very inefficient as aparently would require to reconstruct all past states. Also, it is needed to be done in batches in order to do not impact the general performance of the node.
In this proposal it is concluded that to backfill the slasher it is actually not necessary to recreate past states. Any future state to the block data that is being backfilled into the slasher is enough. Additionally a way to throttle the backfilling in Prysm in order to do not impact the node performance is proposed.
Slashing is a fundamental part of Ethereum PoS consensus security as it disincentives and reacts to harmful actions done by validators. Validators that performs these harmful actions, or slashable offenses, are slashed, which means that they will lose stake and eventually not allowed to participate anymore on the consensus (forced exit). There are three types of slashable offences [Eth2 Slashing Prevention Tips].
For slashing to work the beacon chain node (BN) used by validators, need to index and match all blocks proposed and all attestations done by all validators from the time period between now and now minus WEAK_SUBJECTIVITY_PERIOD
epochs. For two types this is mostly trivial. For the surrounding vote slashable offence, it is not as straightforward but some work has been done that allows to build an index that has constant complexity for insertion and matching [sproul, protolambda and prysmaticlabs].
Prysm has a full slasher implementation included. However, it only includes and matches in the indexes, blocks and attestations that arrive when the node is synchronized. Additional code is required to add blocks and attestation to the slasher indexes that are retrieved while synchronizing. This activity is called slasher backfilling.
Research has been done about how to implement slasher backfilling and a prototype has been implemented by Raul Jordan [ prysmaticlabs research, open issue at prysm repo and Raul Jordan prototype ].
Raul Jordan points out the main reason more research needs before including this:
In this document a solution to both problems is proposed.
https://github.com/prysmaticlabs/prysm/pull/9211/files
The main part that requires study from the prototype is this: https://github.com/prysmaticlabs/prysm/pull/9211/files#diff-1112be79c761db4cc8c3553ef3d38a8ebe437885a813de26415190b2194a7888R84-R121
This is the main loop of the prototype. It goes over the blocks that needs backfilling, gets the pre-state of that, and then indexes all the attestations in the block using the pre-state.
As it is asserted in previous research calculating the pre-state of the block is really costly, unless we are using an archive node. An analysis of the code confirms this, as it needs to go far in the past for blocks (WEAK_SUBJECTIVITY_PERIOD
epochs) the pre-state will not be stored neither in disk or memory, it will need to go to the genesis state and from there start performing all the transitions for all the blocks, which is equivalent to syncing after all the blocks are downloaded.
It is worth mentioning, that the normal slasher does not have to do this, as it works directly on the head of the chain blocks and recent states are available.
Hence, this articles focuses on how to reduce or hopefully eliminate the need to calcualte the pre-state.
The pre-state is needed to calculate indexed attestations (spec: IndexedAttestation) from the attestations as they are included in the block (spec: Attestation).
An analysis of the algorithm to index attestations (spec:get_indexed_attestation) shows that it depends on the committees assignations of its slot (spec:get_beacon_committee). This assignation depends on two datum that themselves depend on the state:
get_seed(state, epoch, DOMAIN_BEACON_ATTESTER)
get_active_validator_indices(state, epoch)
Further analysis of each calculation is done below.
The state contains an array of randao mixes
randao_mixes: Vector[Bytes32, EPOCHS_PER_HISTORICAL_VECTOR]
,
the seed depends on
randao_mixes[(epoch - MIN_SEED_LOOKAHEAD - 1) % EPOCHS_PER_HISTORICAL_VECTOR]
with MIN_SEED_LOOKAHEAD >= 1
and a state transition modifies this array on the position randao_mixes[epoch % EPOCHS_PER_HISTORICAL_VECTOR]
.
Hence, states transitions on future epochs do not modify the randao_mixes of past epochs up to EPOCHS_PER_HISTORICAL_VECTOR
. The backfilling algorithm only needs to go WEAK_SUBJECTIVITY_PERIOD
in the past, and WEAK_SUBJECTIVITY_PERIOD < EPOCHS_PER_HISTORICAL_VECTOR
.
In conclusion, the backfilling algorithm can use any future state to calculate the seed.
The active validators at a epoch e
can be calculated from a state of any epoch the same or higher than e
, as it is shown bellow.
The state keeps an array of all the validators:
To decide if a valdiator is active:
When a validator is added activation_epoch
and exit_epoch
are both set to FAR_FUTURE_EPOCH
. Then when it is decided when the validator is included activation_epoch
is set, and at last when the exit is decided exit_epoch
is set.
Therefore, it passes three stages in order:
(activation_epoch=FAR_FUTURE_EPOCH, exit_epoch=FAR_FUTURE_EPOCH)
(activation_epoch=ae, exit_epoch=FAR_FUTURE_EPOCH)
(activation_epoch=ae, exit_epoch=ee)
Every state transition, if these values change for a validator will do it in this order.
Definition D.1
Given a validator
v
.ae(v)
is the activation epoch of a validator andee(v)
is the exit epoch of a validator. This is a constant for each validator independently of when this values are settled in the beacon chain state.
Lemma L.1
For any validator
v
we have thatae(v) < ee(v)
.
Proof. Follows from the spec.
Definition D.2
Given a validator
v
.et1(v)
is the epoch when the transition from stage 1 to state 2 occurs andet2(v)
is the epoch when the transitio from stage 2 to stage 3 occurs.
Definition D.3
A(state, validator, epoch)
is true iffis_active_validator(validator, epoch)
returns true atstate
. Note thatepoch
does not have to be the same asstate.epoch
.
Definition D.4
AS(state, epoch) = { val validator such that A(state, val, epoch) }
This is the set of active validators atepoch
with the information available atstate
. Note thatepoch
does not have to be the same asstate.epoch
.
Lemma L.2
et1(v) <= ae(v) < et2(v) < ee(v)
Proof. Out of scope, from the spec.
Theorem T.1
Given a state
st1
and a validatorval
. For all statest2
such thatst2.slot > st1.slot
. IfA(st1, val, st1.epoch)
thenA(st2, val, st1.epoch)
.
Proof.
Consider that st2.slot > st1.slot
implies
(1) st2.epoch >= st1.epoch
.
From (1), previous definitions, lemmas, and A(st1, val, st1.epoch)
:
(2) et1(val) <= ae(val) <= st1.epoch < ee(val)
,
(3) et1(val) <= ae(val) <= st1.epoch <= st2.epoch
which implies et1(val) <= st2.epoch
Note that we cannot say anything about st2.epoch
and et2(val)
. Then:
st2.epoch < et2(val)
which means that at st2 val.exit_epoch=FAR_FUTURE_EPOCH
,st2.epoch < et2(val)
which means that at st2 val.exit_epoch=ee(val)
.For P.1 and from (3) val = (activation_epoch=ae(val), exit_epoch=FAR_FUTURE_EPOCH))
at st2
, which means that is_active_validator(val, st1.epoch)
at st2 returns true. Which directly implies A(st2, val, st1.epoch)
For P.2 and from (3) val = (activation_epoch=ae(val), exit_epoch=ee(val))
at st2, which means that is_active_validator(val, st1.epoch)
at st2
returns true. Which directly implies A(st2, val, st1.epoch)
.
Therefore QED for both cases P.1 and P.2.
Theorem T.2
Given a state
st1
and a validatorval
. For all statest2
such thatst2.slot > st1.slot
. Ifnot A(st1, val, st1.epoch)
thennot A(st2, val, st1.epoch)
.
Proof.
Consider that st2.slot > st1.slot
implies
(1) st2.epoch >= st1.epoch
.
From (1), previous definitions, lemmas, and not A(st1, val, st1.epoch)
:
(2) st1.epoch < ae(val)
,
Note, that we cannot say anything about st2.epoch
and et1(val)
or et2(val)
.
There are the following posibilities:
st2.epoch < et1(val) < et2(val)
, hence at st2 val = (activation_epoch=FAR_FUTURE_EPOCH, exit_epoch=FAR_FUTURE_EPOCH)
,et1(val) <= st2.epoch < et2(val)
, hence at st2 val = (activation_epoch=ae(val), exit_epoch=FAR_FUTURE_EPOCH)
,et1(val) < et2(val) <= st2.epoch
, hence at st2 val = (activation_epoch=ae(val), exit_epoch=ee(val))
.For P.1 is_active_validator(val, st1.epoch)
at st2 returns false, so not A(st2, val, st1.epoch)
.
For P.2 and from (2) follows that is_active_validator(val, st1.epoch)
at st2
returns false, so not A(st2, val, st1.epoch)
.
For P.3 and from (2) follows that is_active_validator(val, st1.epoch)
at st2 returns false, so not A(st2, val, st1.epoch)
.
Therefore QED for cases P.1, P.2 and P.3.
Theorem T.3
Given a state
st1
and a validatorval
. For all statest2
such thatst2.slot > st1.slot
.A(st1, val, st1.epoch)
if and only ifA(st2, val, st1.epoch)
.
Proof. T.1 and T.2 considered together.
Theorem T.4
Given a state
st1
, for all statest2
such thatst2.slot > st1.slot
. Then,AS(st1, st1.epoch) == AS(st2, st1.epoch)
.
Proof. By T.3 it is trivial that val
belongs to AS(st1, st1.epoch)
if and only if val belongs to AS(st2, st1.epoch)
.
QED the active validators set at an epoch, will be the same for a state at that epoch than for any state at a higher epoch.
The indexed attestations can be calculated from any state of a slot same or higher than the slot the attestations refer to. The importance of this is that we can relay on any recent state (the head block state for example is usually easily available). This reduces the biggest performance issue of backfilling the node slasher.
One point mentioned in previous worked and that we have not touched is how to throttle the slasher backfilling, so it does not prevent the normal functioning of the BN.
From analysis of the code we see that the computational effort is mainly proportional to the number of attestations processed.
The proposal is to have a BN configuration setting that is how much attestations to process before pausing the backfilling algorithm for SECONDS_PER_SLOT seconds. Consider that it is not necessary for the backfilling to be synchronized with the slots ticker.
Each person running a slasher BN could set this setting according to the performance of their machine.
This is one of the easiest ways to implement throttling. Other more sophisticated without configuration and self-regulated ones would require per thread CPU stats, which for example in golang it is not available (golang/go#41554).
It would follow the order in which Raul's PR processed the blocks, but when the number of attestations gotten from a block rises over the threshold, we would stop processing more blocks until we have waited SECONDS_PER_SLOT seconds.
To index the attestations, we would use a cached state from the head state, if a attestation regards a higher slot we will require a more recent state from the head.
I am using here pseudo-golang, some types and trivial code/optimizations are omitted.
Thanks for reading :-)
Leo Lara (github.com/leolara | https://app.poap.xyz/scan/leolara.eth)