Detecting slashing conditions

Building an efficient and optimized slashing condition detector is an open engineering challenge. This document describes one engineering effort with some new ideas, along with a compilation of various thoughts around the topic.

This is an open engineering challenge, and if you have ideas, please share at https://github.com/ethereum/eth2.0-pm/issues/63

Background reading

Problem Description

Prerequisites

The Casper FFG paper mandates two slashing conditions, which provide accountable safety:

  • Double vote: two conflicting votes for the same target checkpoint (votes with different sources)
  • Surround vote: a vote whose source and target surround, or are surrounded by, those of a previously made vote

To prevent a scenario where two conflicting checkpoints (e.g. two different checkpoints for the same epoch) get finalized, the system needs to punish validators for such malicious behaviour. Validators can be punished only if they are caught within the weak subjectivity period.

The weak subjectivity period is a bit tricky, but it can be described simply. If 2/3 of the validator registry were malicious, they could retroactively finalize a conflicting checkpoint after they had all withdrawn their stake. Thus, the number of epochs it takes for 2/3 of validators to withdraw is used as the size of the weak subjectivity period.

Solution

A naive solution for slashing detection has O(N) = O(E*C) storage and algorithmic complexity, where:

  • N: number of distinct attestation messages in the weak subjectivity period
  • E: number of epochs in the weak subjectivity period
  • C: number of committees per epoch

The whistleblower's attester slashing message must include the original attestations that are slashable, so a history of attestations must be kept. This amounts to gigabytes of data that can't fit in the memory of a regular computer.

This means storage complexity can't be reduced and stays at O(N). A reasonable solution is to use cheap storage (disk) to hold this amount of data. However, this limits the speed of queries over attestation data to the speed of disk accesses.

We're looking for an algorithm that can be broken down into two tracks:

  • a fast, memory-based filter over the epochs with O(E) complexity
  • a slow, disk-based lookup over the committees of the epochs found by the fast track, with O(C) complexity

The fast track could probably be improved at the cost of a few gigabytes stored in memory, which could significantly reduce its complexity, e.g. to O(1) or close to it. In that case we would be left with an O(C) scan over slow disk storage, which could run in the background.

Bear in mind update complexity as well. The fast-track index should have a relatively small update cost in order to handle A attestation messages per slot.

Numbers

Good ballpark figures for slashing condition detection are:

  • E = 54,000 epochs in the weak subjectivity period
  • V = 300,000 active validators
  • C = 2,048 committees per epoch (Phase 1 spec with shards and V = 300,000)
  • N = 110,970,000 distinct attestation messages in the best case, which is ~84 GB worth of data; bear in mind the amplification factor caused by forks, so the data could be 2x larger
  • A = ~10,000 attestation messages per slot
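
As a sanity check, here is a minimal back-of-envelope sketch in Python reproducing the figures above. The ~760 bytes per aggregated attestation is an assumption inferred from the ~84 GB figure, not a spec constant:

```python
# Back-of-envelope check of the numbers above (illustrative, not spec constants).
SHARDS = 64            # Phase 1 shard count
SLOTS_PER_EPOCH = 32
E = 54_000             # epochs in the weak subjectivity period
V = 300_000            # active validators

C = SHARDS * SLOTS_PER_EPOCH   # committees per epoch = 2,048
N = E * C                      # distinct attestation messages, best case (~1.1e8)

# Assumed average size of one aggregated attestation message,
# inferred from the ~84 GB figure above.
BYTES_PER_MESSAGE = 760

print(f"C = {C:,}, N ~= {N:,}")
print(f"storage ~= {N * BYTES_PER_MESSAGE / 1e9:.0f} GB (uncompressed, no forks)")
```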

Design Overview

On a high level the detector reacts to two events (a minimal code sketch follows the list):

  • On attestation

    • check for a double vote

      • map the vote onto a node in the index
      • check whether there is an intersection between the validator indices of the attestation and the indices in the node's validator bitfield
      • if an intersection exists, get the hashes for all distances of the node
      • run a disk lookup within the retrieved hashes to find the matching IndexedAttestation and fire a slashing message
    • check for a surround vote

      • scan the index from source + 1 to target - 1 to look for votes surrounded by the received attestation
      • scan the index from target + 1 to current_epoch to look for votes surrounding the received attestation
      • for each scanned node, use the validator bitfield to filter out validators that have no votes for the target of that particular node
      • find distances satisfying the surrounded or surrounding condition
      • get the hashes for the matched distances
      • run a disk lookup with these hashes to find the matching IndexedAttestation and fire a slashing message
    • update the index

      • convert (source, target) into (target_idx, d)
      • if d is unique for the node, add it to the list of distances
      • update the attestation data hashes with the new hash
      • update the validator bitfield with the new indices
      • update disk storage; this is required only if the hash does not exist in the list and there are new validator indices w.r.t. the validator bitfield
  • On new epoch

    • update current_epoch
    • set current_epoch_idx = (current_epoch_idx + 1) % N, where N is the number of epochs in the weak subjectivity period
    • clear the node corresponding to the updated current_epoch_idx
    • erase outdated IndexedAttestation objects from disk
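
The following is a minimal Python sketch of the skeleton above, under the assumption that the index is kept as a plain list of per-epoch nodes. Names such as EpochNode, Slasher, voted and hashes_by_distance are illustrative, and the disk lookups are left as placeholders:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

N = 54_000  # epochs in the weak subjectivity period


@dataclass
class EpochNode:
    """Per-target-epoch node of the circular in-memory index (illustrative layout)."""
    distances: List[int] = field(default_factory=list)                       # distinct d = target - source
    hashes_by_distance: Dict[int, Set[bytes]] = field(default_factory=dict)  # d -> attestation data hashes
    voted: Set[int] = field(default_factory=set)                             # validator indices seen for this target ("bitfield")


class Slasher:
    def __init__(self, current_epoch: int):
        self.nodes = [EpochNode() for _ in range(N)]
        self.current_epoch = current_epoch
        self.current_epoch_idx = current_epoch % N

    def node_idx(self, target: int) -> int:
        # Map a target epoch onto the circular index.
        return (self.current_epoch_idx - (self.current_epoch - target)) % len(self.nodes)

    def on_attestation(self, source: int, target: int, validators: Set[int], data_hash: bytes):
        node = self.nodes[self.node_idx(target)]

        # 1. double vote check: validators that already voted for this target epoch
        suspects = node.voted & validators
        if suspects:
            ...  # resolve via a disk lookup of the IndexedAttestations behind the node's stored hashes

        # 2. surround vote checks: scan nodes in (source, target) and (target, current_epoch]
        #    (see "Scanning for surrounding votes" below)

        # 3. update the index
        d = target - source
        if d not in node.distances:
            node.distances.append(d)
        node.hashes_by_distance.setdefault(d, set()).add(data_hash)
        node.voted |= validators

    def on_new_epoch(self):
        self.current_epoch += 1
        self.current_epoch_idx = (self.current_epoch_idx + 1) % len(self.nodes)
        self.nodes[self.current_epoch_idx] = EpochNode()  # drop data older than the WS period
        # ...also erase outdated IndexedAttestation objects from disk
```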

Parameters

Beacon chain spec version v0.9.1

Parameter     Value
shards        64
slots         32
validators    300,000
epochs        54,000

All calculations are done for Phase 1. In most cases Phase 0 sizes should be smaller by a factor of 64, which is the expected number of "crosslinks".

Surrounding votes

Problem

Upon receiving an attestation we want to know whether its FFG vote surrounds, or is surrounded by, the vote of any attestation received previously. If such a pair is found, we want to get the IndexedAttestation objects of both conflicting attestations and fire a slashing message.

The period of time during which it makes sense to check an attestation for slashing conditions is called the weak subjectivity period. According to the Casper FFG paper, this period equals the number of epochs during which 2/3 of validators from the current validator dynasty are able to withdraw. It is 54K epochs in the worst case.

This means we have to do fast lookups across ~110M aggregated attestations in the best case (for 300K validators in Phase 1). In bytes that is ~84 GB of (uncompressed) data.

Votes Cache

This is an alternative to the min-max index suggested by @protolambda (https://github.com/protolambda/eth2-surround#min-max-surround), which is O(1) on lookup and O(N) on update. The new approach has O(1) update time and O(N) search time, where N is the number of epochs. Due to the relatively low MAX_N = 54K, a scan of the entire index is negligible (~0.01 ms on a Core i7 laptop, see Benchmark below). Hence, the two approaches are pretty similar in terms of performance.

The votes cache is a tiny structure in terms of memory (~0.1 MB in the best case) which is used for basic lookups to find votes surrounding or surrounded by a received attestation.

As described in the Design Overview, the cache is represented as a circular linked list consisting of N nodes, where N is the number of epochs in the weak subjectivity period.

A node stores a list of 2-byte numbers. Each number is the distance between the source and target epochs of a vote.

Update index with new vote

To add a new vote (s, t) to the votes cache, we first calculate the node index in the circular linked list and then add d = t - s to the list of distances stored in that node.

Calculating node's index:

idx = (current_epoch_idx - (current_epoch - t)) % N

Complexity of update is O(1).
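
A minimal sketch of the update, assuming the cache is kept as a plain Python list of per-node distance lists; the helper name add_vote is illustrative:

```python
def add_vote(cache, current_epoch, current_epoch_idx, s, t):
    """Add vote (s, t) to the votes cache; effectively O(1) since a node holds only a few distances."""
    N = len(cache)                                        # number of epochs in the WS period
    idx = (current_epoch_idx - (current_epoch - t)) % N   # node index for target epoch t
    d = t - s
    if d not in cache[idx]:                               # keep only distinct distances per target
        cache[idx].append(d)
    return idx


# usage: cache = [[] for _ in range(54_000)]
#        add_vote(cache, current_epoch=1000, current_epoch_idx=1000 % 54_000, s=998, t=999)
```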

Scanning for surrounding votes

To get the votes surrounded by a received attestation (s, t), scan the cache nodes from start_idx = (current_epoch_idx - (current_epoch - s) + 1) % N to end_idx = (current_epoch_idx - (current_epoch - t) - 1) % N.

For the surrounding case, scan from start_idx = (current_epoch_idx - (current_epoch - t) + 1) % N to end_idx = current_epoch_idx.

The cost of this scanning is O(N), where N is the number of epochs in the weak subjectivity period.
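
A sketch of both scans in the same terms as the add_vote helper above (names are illustrative). It iterates over target epochs directly, which is equivalent to walking the node indices from start_idx to end_idx; a matched (target, distance) pair would then be resolved to attestation data hashes and, via disk, to the IndexedAttestation objects needed for a slashing message:

```python
def find_surround_violations(cache, current_epoch, current_epoch_idx, s, t):
    """Return stored votes that surround, or are surrounded by, the received vote (s, t).

    A stored distance d2 at the node of target epoch t2 encodes the vote (t2 - d2, t2).
    Assumes s lies within the weak subjectivity window covered by the cache.
    """
    N = len(cache)

    def idx(epoch):
        return (current_epoch_idx - (current_epoch - epoch)) % N

    violations = []
    # votes surrounded by (s, t): s < s2 and t2 < t, i.e. d2 < t2 - s
    for t2 in range(s + 1, t):
        violations += [("surrounded", t2, d2) for d2 in cache[idx(t2)] if d2 < t2 - s]
    # votes surrounding (s, t): s2 < s and t < t2, i.e. d2 > t2 - s
    for t2 in range(t + 1, current_epoch + 1):
        violations += [("surrounding", t2, d2) for d2 in cache[idx(t2)] if d2 > t2 - s]
    return violations
```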

Size evaluation

The size of this cache depends on the number of distinct votes made to the same target.

In the best case there is one distinct vote per epoch, which happens when all attestations look like (source, source + 1). In this case the cache size would be ~106 KB. If, say, there are 10 distinct votes per epoch, the size grows to ~1 MB. In the normal case the number of distinct votes per epoch should be close to 1; therefore, we may round the size up to 1 MB for simplicity.
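
The arithmetic behind these figures, assuming 2 bytes per stored distance:

```python
EPOCHS = 54_000
DISTANCE_SIZE = 2                           # bytes per stored distance

print(EPOCHS * DISTANCE_SIZE / 1024)        # ~106 KB: one distinct vote per epoch
print(EPOCHS * DISTANCE_SIZE * 10 / 2**20)  # ~1 MB: ten distinct votes per epoch
```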

Benchmark

A full scan of the cache with 10 distances in each node takes ~0.01 ms on a Core i7, which is negligible despite the scan's linear complexity.

Optimization

The list of distances could be kept sorted to enable binary search for even faster scans, but this doesn't seem to be a valuable improvement.

Indexed attestations

Accessing IndexedAttestation objects requires a couple more structures.

The first structure binds each distinct vote to the list of attestation data hashes that refer to that vote. The size of this structure is relatively small and it can be kept in memory.

The second structure is a map from attestation data hash to the list of IndexedAttestation objects carrying attestation data with that hash. This structure is potentially big and should be stored on disk.
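
A minimal sketch of these two structures, assuming a generic key-value store (the disk parameter) stands in for the on-disk backend; all names are illustrative:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class AttestationStore:
    def __init__(self, disk):
        # in-memory: (target_epoch, distance) -> attestation data hashes referring to that vote
        self.hashes_by_vote: Dict[Tuple[int, int], Set[bytes]] = defaultdict(set)
        # on-disk: attestation data hash -> serialized IndexedAttestation objects
        self.disk = disk  # assumed to expose simple get/put semantics

    def add(self, target: int, distance: int, data_hash: bytes, indexed_attestation: bytes):
        self.hashes_by_vote[(target, distance)].add(data_hash)
        stored: List[bytes] = self.disk.get(data_hash) or []
        if indexed_attestation not in stored:
            stored.append(indexed_attestation)
            self.disk.put(data_hash, stored)

    def lookup(self, target: int, distance: int) -> List[bytes]:
        # resolve a matched vote to the IndexedAttestation objects needed for a slashing message
        result = []
        for h in self.hashes_by_vote.get((target, distance), ()):
            result.extend(self.disk.get(h) or [])
        return result
```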

Size evaluation

The number of distinct AttestationData per epoch depends on the number of slots, shards, beacon chain forks and distinct FFG votes.

In the best case (one fork, one distinct vote) the number of distinct AttestationData per epoch is 64 * 32 = 2,048. Having two forks and two distinct votes increases this number by a factor of 4. Let's denote this factor by I (instability). Taking into account the two degrees of freedom (forks and distinct votes), I = 4 is considered a sane estimate of the instability factor.

For 54K epochs we have ~3 GB and ~12 GB worth of hashes for I = 1 and I = 4 respectively. If memory requirements are too high, it might make sense to offload the hashes of old epochs to disk. For example, a bunch of 1,024 epochs with I = 4 takes only 256 MB.
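
For reference, the arithmetic for the in-memory part, assuming 32-byte attestation data hashes:

```python
HASH_SIZE = 32                    # bytes per attestation data hash (assumed)
PER_EPOCH = 64 * 32               # distinct AttestationData per epoch at I = 1
I = 4                             # instability factor

per_epoch = PER_EPOCH * I * HASH_SIZE       # 256 KiB of hashes per epoch at I = 4
print(1024 * per_epoch / 2**20, "MiB")      # 256.0 MiB for a bunch of 1,024 epochs
```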

The size evaluation of the disk part of the IndexedAttestation cache is a bit more tricky. To keep it simple, let's assume that there are no aggregation overlaps and each distinct attestation data always corresponds to exactly one IndexedAttestation object.

With the new proposal for Phase 1, the size of a signed AttestationData should be 230 bytes. With I = 4 we get 230 * 64 * 32 * 4 ≈ 1.8 MB of signed attestation data per epoch, plus roughly 2.3 MB per epoch occupied by validator indices (300,000 validators). In total this gives ~155 GB for the entire weak subjectivity period.

Landing double votes detection

As mentioned in eth2.0-pm#63, to detect double votes it's enough to maintain, for each validator, a bitfield with bits corresponding to the epochs of the weak subjectivity period.

This approach requires updating every validator's bitfield whenever the current epoch changes. To avoid this complexity we may invert the validator <-> epoch relation and store a bitfield of the validators that voted for a particular epoch.

It's worth noting that this bitfield index aids the surrounding votes check as well, by providing a second round of in-memory filtering with the validator indices that voted for a particular target.

The size of this index depends linearly on the number of validators and equals 300000 / 8 * 54000 ≈ 2 GB for 300K validators.
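
A sketch of the inverted index, assuming one bitfield (a Python bytearray) per epoch of the weak subjectivity window; function names are illustrative:

```python
N_EPOCHS = 54_000
N_VALIDATORS = 300_000

# one bitfield of validators per target epoch: 37,500 bytes * 54,000 epochs ~= 2 GB
bitfields = [bytearray((N_VALIDATORS + 7) // 8) for _ in range(N_EPOCHS)]


def has_voted(epoch_idx: int, validator: int) -> bool:
    return bool(bitfields[epoch_idx][validator // 8] & (1 << (validator % 8)))


def mark_voted(epoch_idx: int, validator: int) -> None:
    bitfields[epoch_idx][validator // 8] |= 1 << (validator % 8)


def double_vote_suspects(epoch_idx: int, attesters: set) -> set:
    """Validators that already voted for this target epoch: candidates for a double vote,
    to be confirmed by a disk lookup of the stored IndexedAttestation objects."""
    return {v for v in attesters if has_voted(epoch_idx, v)}
```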

Compressing bitfields

It might make sense to compress those bitfields to reduce their effective storage size.

Compression with Snappy:

Inactive validators per epoch    Compression ratio    Compressed size
1/16                             1.55                 1.3 GB
1/8                              1.11                 1.8 GB
1/4                              1.00                 2 GB

Note: compressing the original bitfield index (an epoch bitfield per validator) might be more efficient, since those bitfields have lower entropy.
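
A sketch of how such ratios can be measured, assuming the python-snappy package; the random inactivity pattern is only a stand-in for real per-epoch bitfields:

```python
import random

import snappy  # pip install python-snappy

N_VALIDATORS = 300_000


def make_bitfield(inactive_fraction: float) -> bytes:
    """Validator bitfield for one epoch with a given fraction of inactive (zero) bits."""
    bits = bytearray((N_VALIDATORS + 7) // 8)
    for v in range(N_VALIDATORS):
        if random.random() >= inactive_fraction:     # active validator -> bit set
            bits[v // 8] |= 1 << (v % 8)
    return bytes(bits)


for frac in (1 / 16, 1 / 8, 1 / 4):
    raw = make_bitfield(frac)
    compressed = snappy.compress(raw)
    print(f"inactive {frac:.4f}: ratio {len(raw) / len(compressed):.2f}")
```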

Credits
