# Detecting slashing conditions

An efficient and optimized slashing condition detector is an [open engineering challenge](https://github.com/ethereum/eth2.0-pm/issues/63). This document describes one engineering effort with some new ideas and compiles various thoughts around the topic. If you have ideas, please share them at https://github.com/ethereum/eth2.0-pm/issues/63.

## Background reading

- Casper FFG paper https://arxiv.org/pdf/1710.09437.pdf
- Weak subjectivity evaluation https://ethresear.ch/t/weak-subjectivity-under-the-exit-queue-model/5187
- Sharding attestation policing https://github.com/ethereum/eth2.0-pm/issues/63
- eth2-surround by Diederik (**@protolambda**) https://github.com/protolambda/eth2-surround#min-max-surround

## Problem Description

### Prerequisites

The Casper FFG paper defines two slashing conditions which provide _accountable safety_:

- **Double vote** -- two distinct votes for the same _target_ checkpoint (e.g. votes with different _source_)
- **Surround vote** -- a vote whose _source_ and _target_ surround, or are surrounded by, those of a previously made vote

To prevent two conflicting checkpoints (checkpoints at the same height) from being finalized, the system needs to punish validators for this malicious behaviour. Validators can be punished if they are caught within the _weak subjectivity_ period.

The exact size of the weak subjectivity period is a bit tricky to pin down, but a simplified bound suffices here: if _2/3_ of the validator registry were malicious, they could retroactively finalize a conflicting checkpoint right up until they had all withdrawn their stake. Thus, the number of epochs it takes for _2/3_ of validators to withdraw is used as the size of the weak subjectivity period.

### Solution

A naive solution for slashing detection has _O(N) = O(E*C)_ storage and algorithmic complexity, where:

- _N_ -- number of distinct attestation messages in a weak subjectivity period
- _E_ -- number of epochs in the weak subjectivity period
- _C_ -- number of committees per epoch

A whistleblower's slashing message must include the original attestations that are slashable, so a history of attestations has to be kept. This amounts to gigabytes of data that cannot fit in the memory of a regular computer, which means storage complexity cannot be reduced and stays at _O(N)_. A reasonable solution is to keep this data on cheap storage (disk); however, that limits the speed of queries over attestation data to the speed of disk accesses.

We're looking for an algorithm that can be broken down into two tracks:

- _Fast_ -- memory based filter over the epochs with _O(E)_ complexity
- _Slow_ -- disk based lookup over committees of epochs found by the fast track, with _O(C)_ complexity

The fast track could probably be improved at the cost of a few gigabytes of memory, which could reduce its complexity significantly, e.g. to _O(1)_ or close to it. In that case we would be left with an _O(C)_ scan over slow disk storage, which could run in the background.

Bear in mind update complexity as well: the fast-track index should be cheap to update in order to handle _A_ attestation messages per slot.
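For reference, the two slashing conditions from the [Prerequisites](#Prerequisites) section boil down to a simple predicate over _(source, target)_ epoch pairs, in the spirit of the beacon chain spec's `is_slashable_attestation_data` (which compares full `AttestationData` objects rather than bare epochs). A minimal Python sketch:

```python
from typing import NamedTuple


class Vote(NamedTuple):
    source: int  # FFG source epoch
    target: int  # FFG target epoch


def is_double_vote(a: Vote, b: Vote) -> bool:
    # Two distinct votes for the same target epoch
    # (the spec compares full AttestationData objects, not just epochs).
    return a != b and a.target == b.target


def surrounds(a: Vote, b: Vote) -> bool:
    # Vote `a` surrounds vote `b`.
    return a.source < b.source and b.target < a.target


def is_slashable(a: Vote, b: Vote) -> bool:
    return is_double_vote(a, b) or surrounds(a, b) or surrounds(b, a)
```

Every attestation received within the weak subjectivity period has to be checked against this predicate with respect to previously seen votes, which is exactly the lookup problem the fast and slow tracks address.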
### Numbers

A good margin for slashing condition detection is:

- _E = 54,000_ -- epochs in a weak subjectivity period
- _V = 300,000_ -- active validators
- _C = 2,048_ -- committees per epoch in the Phase 1 spec with shards and _V = 300,000_
- _N = 110,970,000_ -- distinct attestation messages in the best case, which is _~84G_ worth of data; bear in mind the amplification factor caused by forks, so the size of the data could be 2X larger
- _A = ~10,000_ -- attestation messages per slot

## Design Overview

![](https://i.imgur.com/eidLNBt.png)

- **On attestation**
  - check for double vote
    - map the vote onto a node in the index
    - check if there is an intersection between the validator indices of the attestation and the indices from the validator bitfield
    - if an intersection exists, get the hashes for all distances of the node
    - run a disk lookup within the retrieved hashes to find the proper `IndexedAttestation` and fire a slashing message
  - check for surrounding vote
    - scan the index from _source + 1_ till _target - 1_ to look for votes _surrounded by_ the received attestation
    - scan the index from _target + 1_ till _current_epoch_ to look for votes _surrounding_ the received attestation
    - for each node being scanned, use the validator bitfield to filter out validators that have no votes for the target of that particular node
    - find distances satisfying the surrounded or surrounding condition
    - get hashes for the matched distances
    - run a disk lookup with these hashes to find the proper `IndexedAttestation` and fire a slashing message
  - update index
    - convert _(source, target)_ into _(target_idx, d)_
    - if _d_ is unique for the node, add it to the list of distances
    - update the attestation data hashes with the new hash
    - update the validator bitfield with new indices
    - update disk storage; this is required only if the hash does not exist in the list and there are new validator indices w.r.t. the validator bitfield
- **On new epoch**
  - update _current_epoch_
  - set _current_epoch_idx = (current_epoch_idx + 1) % N_
  - clear the node corresponding to the updated _current_epoch_idx_
  - erase outdated _IndexedAttestation_ objects from disk

## Parameters

Beacon chain spec version [v0.9.1](https://github.com/ethereum/eth2.0-specs/blob/v0.9.1/specs/core/0_beacon-chain.md)

|Parameter |Value |
|----------|---------|
|shards | 64 |
|slots | 32 |
|validators| 300,000 |
|epochs | 54,000 |

All calculations are done for Phase 1. In most cases Phase 0 sizes should be smaller by a factor of _64_, which is the expected number of "crosslinks".

## Surrounding votes

### Problem

Upon receiving an attestation we want to know whether its FFG vote surrounds, or is surrounded by, the vote of any attestation received previously. If such a pair is found, we want to get the _IndexedAttestation_ objects of both conflicting attestations and fire a slashing message.

The period of time during which it makes sense to check an attestation for slashing conditions is called the _weak subjectivity period_. According to the Casper FFG paper, this period equals the number of epochs during which _2/3_ of validators from the current dynasty are able to withdraw. It is _54K_ epochs in the worst case.

This means we have to do fast lookups across _~110M_ aggregated attestations in the best case (for _300K_ validators in Phase 1). In bytes it would be _~84G_ of (uncompressed) data.

### Votes Cache

This is an alternative to the index suggested by **@protolambda** (https://github.com/protolambda/eth2-surround#min-max-surround), which is _O(1)_ on lookup and _O(N)_ on update. The new approach has _O(1)_ update time and _O(N)_ search time, where _N_ is the number of epochs.
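The sketch below (illustrative Python; names and layout are assumptions of this document, not an existing implementation) puts the pieces together: a circular buffer of per-epoch nodes, an _O(1)_ `add_vote` update and an _O(N)_ surround scan. The node index formula and the scan ranges are derived in the subsections that follow; the per-distance hash lists and the validator bitfield feed the disk lookup and the double vote check described in the [Design Overview](#Design-Overview).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

N = 54_000  # epochs in the weak subjectivity period


@dataclass
class EpochNode:
    """In-memory state kept for a single target epoch."""
    distances: Set[int] = field(default_factory=set)              # distinct d = target - source
    hashes: Dict[int, List[bytes]] = field(default_factory=dict)  # d -> attestation data hashes
    validator_bits: bytearray = field(default_factory=bytearray)  # one bit per validator that voted for this target


def _set_bit(bits: bytearray, i: int) -> None:
    byte, bit = divmod(i, 8)
    if len(bits) <= byte:
        bits.extend(bytes(byte + 1 - len(bits)))
    bits[byte] |= 1 << bit


class VotesCache:
    """Circular buffer of epoch nodes covering the weak subjectivity period."""

    def __init__(self, n: int = N):
        self.n = n
        self.nodes = [EpochNode() for _ in range(n)]
        self.current_epoch = 0
        self.current_epoch_idx = 0

    def node_index(self, target: int) -> int:
        # idx = (current_epoch_idx - (current_epoch - t)) % N
        return (self.current_epoch_idx - (self.current_epoch - target)) % self.n

    def add_vote(self, source: int, target: int, data_hash: bytes, validators: List[int]) -> None:
        """O(1) index update for a new (source, target) vote."""
        node = self.nodes[self.node_index(target)]
        d = target - source
        node.distances.add(d)
        node.hashes.setdefault(d, []).append(data_hash)
        for v in validators:
            _set_bit(node.validator_bits, v)

    def find_surround_candidates(self, source: int, target: int) -> List[bytes]:
        """O(N) scan: hashes of votes surrounded by, or surrounding, (source, target)."""
        out: List[bytes] = []
        # Votes surrounded by the new one: their target t lies in (source, target)
        # and their source t - d is greater than `source`.
        for t in range(source + 1, target):
            node = self.nodes[self.node_index(t)]
            out += [h for d in node.distances if t - d > source for h in node.hashes[d]]
        # Votes surrounding the new one: their target t is greater than `target`
        # and their source t - d is less than `source`.
        for t in range(target + 1, self.current_epoch + 1):
            node = self.nodes[self.node_index(t)]
            out += [h for d in node.distances if t - d < source for h in node.hashes[d]]
        return out

    def on_new_epoch(self) -> None:
        """Advance the ring by one epoch and recycle the node now mapping to it."""
        self.current_epoch += 1
        self.current_epoch_idx = (self.current_epoch_idx + 1) % self.n
        self.nodes[self.current_epoch_idx] = EpochNode()
```

Here `add_vote` corresponds to the "update index" step and `find_surround_candidates` to the surround scan of the Design Overview; the returned hashes are what the slow disk track then resolves into `IndexedAttestation` objects.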
Due to the relatively low _MAX_N = 54K_, the scan time of the entire index is negligible (_~0.01 ms_ on a Core i7 laptop, [details](#Benchmark)). Hence, these two approaches are pretty similar in terms of performance.

The votes cache is a tiny structure in terms of memory (_~0.1Mb_ in the best case) which is used for basic lookups to find votes surrounding or surrounded by a received attestation.

As shown in the [Design Overview](#Design-Overview) figure, the cache is represented as a circular linked list of _N_ nodes, where _N_ is the number of epochs in the weak subjectivity period. Each node stores a list of 2-byte numbers, each number being the distance between the _source_ and _target_ epochs of a vote.

#### Update index with new vote

To add a new vote _(s, t)_ to the votes cache, we first calculate the node index in the circular linked list and then add `d = t - s` to the list of distances stored in that node.

Calculating the node's index:

```python
idx = (current_epoch_idx - (current_epoch - t)) % N
```

The complexity of an update is _O(1)_.

#### Scanning for surrounding votes

To get votes _surrounded by_ a received attestation _(s, t)_, scan cache nodes starting from `start_idx = (current_epoch_idx - (current_epoch - s) + 1) % N` till `end_idx = (current_epoch_idx - (current_epoch - t) - 1) % N`.

In the _surrounding_ case, scan starting from `start_idx = (current_epoch_idx - (current_epoch - t) + 1) % N` till `end_idx = current_epoch_idx`.

The cost of this scan is _O(N)_, where _N_ is the number of epochs in the weak subjectivity period.

#### Size evaluation

The size of this cache depends on the number of distinct votes made for the same _target_. In the best case there is one distinct vote per epoch, which happens when all attestations look like _(source, source + 1)_. In this best case the cache size would be _106Kb_. If, say, there are _10_ distinct votes per epoch, the size grows up to _1Mb_.

In the normal case the number of distinct votes should be close to _1_. Therefore, we may round the size up to _1Mb_ for simplicity.

#### Benchmark

A full scan of the cache with _10_ distances in each node takes _~0.01 ms_ on a Core i7, which is negligible despite the scan's linear complexity.

#### Optimization

The list of distances could be kept sorted to utilize binary search for even faster scans, but this doesn't seem to be a valuable improvement.

### Indexed attestations

Accessing _IndexedAttestation_ objects requires a couple more structures. The first binds each distinct vote to the list of attestation data hashes referring to that vote; its size is relatively small and it can be kept in memory. The second is a map from attestation data hash to the list of _IndexedAttestation_ objects carrying attestation data with the corresponding hash; this structure is potentially big and should be stored on disk.

#### Size evaluation

The number of distinct _AttestationData_ per epoch depends on the number of slots, shards, beacon chain forks and distinct FFG votes. In the best case (one fork, one distinct vote) the number of distinct _AttestationData_ per epoch is _64 * 32 = 2048_. Having two forks and two distinct votes increases this number by a factor of _4_. Let's denote this factor by _I_ (instability). Taking into account the two degrees of freedom (forks and distinct votes), _I=4_ is considered a sane evaluation of the instability factor.

For _54K_ epochs we have _3Gb_ and _12Gb_ worth of hashes for _I=1_ and _I=4_ respectively. If memory requirements are too high, it might make sense to offload hashes of old epochs to disk. For example, the first bunch of _1024_ epochs with _I=4_ would take only _256Mb_.

Size evaluation of the disk part of the _IndexedAttestation_ cache is a bit more tricky. To make it less complex, let's assume that there are no aggregation overlaps and each distinct attestation data always corresponds to one _IndexedAttestation_ object. With the new proposal for Phase 1, the size of a signed _AttestationData_ should be _230b_. With _I=4_ we get _230 * 64 * 32 * 4 = 1.8Mb_ of signed attestation data per epoch, plus _300000 * 4 = 1.2Mb_ occupied by validator indices. In total we get _~155Gb_ for the entire weak subjectivity period.
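To illustrate the slow track end-to-end, here is a hypothetical sketch of the hash-to-_IndexedAttestation_ store and of how candidate hashes produced by the votes cache turn into slashing material. The types and the plain dict standing in for disk-backed key-value storage are assumptions of this sketch, not part of any implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class IndexedAttestation:
    """Simplified stand-in for the spec's IndexedAttestation."""
    attesting_indices: List[int]
    data_hash: bytes  # hash of the AttestationData it carries
    # signature and AttestationData fields omitted


class AttestationStore:
    """Map: attestation data hash -> IndexedAttestations. A plain dict stands in
    for the disk-backed key-value storage discussed above."""

    def __init__(self):
        self._db: Dict[bytes, List[IndexedAttestation]] = {}

    def put(self, att: IndexedAttestation) -> None:
        self._db.setdefault(att.data_hash, []).append(att)

    def get(self, data_hash: bytes) -> List[IndexedAttestation]:
        return self._db.get(data_hash, [])


def report_slashings(store: AttestationStore,
                     new_att: IndexedAttestation,
                     candidate_hashes: List[bytes]) -> List[Tuple[IndexedAttestation, IndexedAttestation]]:
    """For each candidate hash produced by the votes cache scan, load the stored
    attestations and pair them with the new one if their validator sets intersect."""
    new_validators = set(new_att.attesting_indices)
    pairs = []
    for h in candidate_hashes:
        for old in store.get(h):
            if new_validators & set(old.attesting_indices):
                pairs.append((old, new_att))  # material for a slashing message
    return pairs
```

In practice the dict would be replaced by the on-disk storage keyed by attestation data hash, and each returned pair would be packaged into a slashing message.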
## Landing double votes detection

As mentioned in [eth2.0-pm#63](https://github.com/ethereum/eth2.0-pm/issues/63), to detect double votes it is enough to maintain, for each validator, a bitfield with bits corresponding to the epochs of the weak subjectivity period. That approach requires updating every validator's bitfield whenever the current epoch changes. To avoid this complexity we may invert the validator <-> epoch relation and store, per epoch, a bitfield of the validators that voted for it.

It's worth noting that this bitfield index aids the surrounding votes check as well, by providing a second round of in-memory filtering with the validator indices that voted for a particular _target_.

The size of this index depends linearly on the number of validators and equals _300000 / 8 * 54000 ~ 2Gb_ for _300K_ validators.

### Compressing bitfields

It might make sense to compress those bitfields to reduce their effective storage size.

Compression with Snappy:

|inactive validators per epoch|compression ratio|compressed size|
|-|-|-|
|1/16|1.55|1.3Gb|
|1/8|1.11|1.8Gb|
|1/4|1|2Gb|

**Note:** compressing the original bitfield index (epoch bitfields per validator) might be more efficient since those bitfields have lower entropy.

## Credits

- Danny Ryan (**@djrtwo**) for initial efforts in https://github.com/ethereum/eth2.0-pm/issues/63
- Diederik Loerakker (**@protolambda**) for this great work https://github.com/protolambda/eth2-surround#min-max-surround
- Alex Vlasov (**@ericsson49**) for discussions around this topic