Currently, our slasher implementation uses a simple approach to min-max span detection. Slashing detection is non-trivial: a slasher needs to keep track of attesting histories for every single validator in a space-efficient format, and it must also observe every single attestation the beacon node sees in order to perform detection. The hardest part of building a slasher is optimizing its resource use. Our end goal is for an average staker to be able to run a slasher in the same way they can run a beacon node.
Unfortunately, our current approach does not fare well in terms of resource use. We need to revamp it from the ground up and test the assumptions we made when first developing it. Thankfully, Lighthouse has done a lot of work on optimized slashing detection here: https://hackmd.io/@sproul/min-max-slasher, and their slasher implementation is a great reference that will be the main backbone of this design discussion. Also, see protolambda's exploration of this problem here: https://github.com/protolambda/eth2-surround.
We have a working prototype of this design document here: https://github.com/prysmaticlabs/prysm/tree/advancedslasher
The hardest part about slashing detection comes when considering how to detect surround votes for attesters. As such, this design will only focus on that aspect of slasher.
At a high level, a surround vote happens when an attester creates an attestation that "contradicts" history and chain consensus, that is, an attestation showing they believe consensus should be different from what the chain believes. Let's take a look at an example first:
A = (source: 1, target: 2)
B = (source: 0, target: 3)
The second attestation, B, surrounds A: its source (0) is older than A's source (1), and its target (3) is newer than A's target (2). Signing both A and B is a slashable offense.
This sort of event is difficult to detect efficiently because, with a naive approach, we need to keep track of every (source, target) pair for every single validator, and for every new attestation that comes in, we need to loop through all of those pairs to cross-check whether there is a slashable event. As expected, the memory and time complexity of the naive approach is unreasonable.
Naive approach:
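To make that cost concrete, here is a minimal sketch of what naive detection would look like; the Attestation type and function name are hypothetical, simplified for illustration and not our actual code:

// Hypothetical, simplified attestation type for illustration only.
type Attestation struct {
	Source uint64
	Target uint64
}

// naiveDetectSurround scans every previously signed attestation for a
// validator and checks both surround directions against the incoming one.
// With n stored attestations this is O(n) work per incoming attestation,
// and every (source, target) pair must be kept around per validator.
func naiveDetectSurround(history []Attestation, incoming Attestation) bool {
	for _, prev := range history {
		surrounds := incoming.Source < prev.Source && incoming.Target > prev.Target
		isSurrounded := prev.Source < incoming.Source && prev.Target > incoming.Target
		if surrounds || isSurrounded {
			return true
		}
	}
	return false
}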
Formally, a surround vote is defined as follows: an attestation a surrounds an attestation b if a.source < b.source and a.target > b.target.

Using this definition, we can create two arrays that help us quickly detect surround votes, namely min_targets and max_targets, where we define:

min_targets[i] is the minimum target of an attestation with source epoch greater than i
max_targets[i] is the maximum target of an attestation with source epoch less than i

In Lighthouse's design doc https://hackmd.io/@sproul/min-max-slasher, they mathematically prove surround votes can be caught using these arrays: an incoming attestation att surrounds some previously signed attestation if and only if att.target > min_targets[att.source], and att is surrounded by some previously signed attestation if and only if att.target < max_targets[att.source].
Given we only care about attestations within a weak subjectivity period, for every validator, we have to store 2 arrays of type []uint64, each with length 54,000.
Even then, for many validators, each of these arrays is wasteful given we do not actually need to store target epochs at all. Instead, we can store the difference between an epoch and an attestation's target epoch, guaranteeing values in our arrays are bounded by MaxUint16 and further reducing the space requirements. We will be calling these epoch distances min_spans and max_spans, respectively. Let's take a closer look to ensure we understand what these arrays are.
min_spans[e] is defined as min((att.target.epoch - e) for att in attestations) where att.source.epoch > e. That is, it is the minimum distance between the specified epoch and all attestation target epochs a validator has created where att.source.epoch > e.
max_spans[e] is defined as max((att.target.epoch - e) for att in attestations) where att.source.epoch < e. That is, it is the maximum distance between the specified epoch and all attestation target epochs a validator has created where att.source.epoch < e.
Remember, the reason we do this is to optimize for storage: differences between epochs can be stored as uint16 values instead of uint64, giving us precious space savings. The above is pretty hard to understand without an example, so let's construct some min and max spans from the following attestation history, which is the ideal attesting history from genesis up to and including epoch 4.
(source: 0, target: 1)
(source: 1, target: 2)
(source: 2, target: 3)
(source: 3, target: 4)
The attestations above represent ideal network conditions. We'll build up min spans first; we have 5 epochs worth of data.
Building min spans manually
For min_spans[0], we find all attestations with source > 0. Then, for each, we compute the difference (target - 0). The smallest such value is 2, so we set the value to 2.
For min_spans[1], we find all attestations with source > 1. Then, for each, we compute the difference (target - 1). The smallest such value is 2, so we set the value to 2.
For min_spans[2], we find all attestations with source > 2. Then, for each, we compute the difference (target - 2). The smallest such value is 2, so we set the value to 2.
For min_spans[3], we find all attestations with source > 3. We have none, so we set the value to undefined.

Our list looks like this: min_spans = [2, 2, 2, undefined]
Building max spans manually
For max_spans[0], we find all attestations with source < 0. We have none, so we set the value to 0. The reason we can use 0 here (rather than undefined) is because detecting a surrounded vote given an attestation att requires att.target < att.source + max_spans[att.source], and a value of 0 can never satisfy this, since an attestation's target is always at least its source.
For max_spans[1], we find all attestations with source < 1. Then, for each, we compute the difference (target - 1), getting a maximum value of 0.
For max_spans[2], we find all attestations with source < 2. Then, for each, we compute the difference (target - 2), getting a maximum value of 0.
For max_spans[3], we find all attestations with source < 3. Then, for each, we compute the difference (target - 3), getting a maximum value of 0.
Our list looks like this: max_spans = [0, 0, 0, 0]
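As a sanity check of the walkthrough above, here is a minimal sketch that computes min and max spans directly from their definitions; the Attestation type and buildSpans helper are hypothetical, brute-force reference code rather than how slasher actually stores things:

// Hypothetical, simplified attestation type for illustration only.
type Attestation struct {
	Source uint64
	Target uint64
}

// buildSpans computes min_spans and max_spans for epochs [0, numEpochs)
// directly from their definitions. A 0 in minSpans means "undefined"
// (no attestation exists with a greater source for that epoch).
func buildSpans(atts []Attestation, numEpochs uint64) (minSpans, maxSpans []uint16) {
	minSpans = make([]uint16, numEpochs)
	maxSpans = make([]uint16, numEpochs)
	for e := uint64(0); e < numEpochs; e++ {
		for _, att := range atts {
			if att.Source > e {
				// target > source > e, so this subtraction cannot underflow,
				// and the distance is always at least 1.
				dist := uint16(att.Target - e)
				if minSpans[e] == 0 || dist < minSpans[e] {
					minSpans[e] = dist
				}
			}
			if att.Source < e && att.Target > e {
				// Negative distances are irrelevant for the max, so skip them.
				dist := uint16(att.Target - e)
				if dist > maxSpans[e] {
					maxSpans[e] = dist
				}
			}
		}
	}
	return
}

Running this over the four attestations above with numEpochs = 4 reproduces min_spans = [2, 2, 2, 0 (undefined)] and max_spans = [0, 0, 0, 0].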
Now let's take a look at a surround vote example and see why these span arrays are so helpful to us:
A = (source: 3, target: 4)
B = (source: 2, target: 5)
The second attestation, B, surrounds the first, A. We can detect this using min_spans, with the following condition explained perfectly by the Lighthouse team:

If the target epoch of a new attestation, att, is greater than the min target for its source, that means there exists a previously signed attestation with a source greater than att.source and a target less than att.target, i.e. att surrounds it. Moreover, because the min target belongs to the attestation with the minimum target amongst those that could be surrounded by att, we know that if the check fails, then att cannot surround any other previously signed attestation.

In other words, att surrounds a previously signed attestation if att.target > min_targets[att.source], and recall that min_targets[att.source] = att.source + min_spans[att.source].

Ok, so we want to check whether B.target = 5 > min_targets[B.source = 2] = 2 + min_spans[2].
First of all, let's update our min spans with attestation A.
For min_spans[2], we find all attestations with source > 2. Then, for each, we compute the difference (target - 2). The smallest such value is (4 - 2) = 2, so we set the value to 2.
So then min_targets[2] = 2 + min_spans[2] = 4, and since B.target = 5 > 4, we detect that B surrounds a previously signed attestation (namely A), which is a slashable offense.
That's great! Now what if we saw the attestations in the opposite order, with B already signed and A arriving as the new attestation? We can detect that A is surrounded using max_spans in this case, with the following condition: we want to check whether

A.target = 4 < max_targets[A.source = 3] = 3 + max_spans[3] = ???
To do this, let's update our max spans with attestation B.
For max_spans[3], we find all attestations with source < 3. Then, for each, we compute the difference (target - 3). The largest such value is (5 - 3) = 2, so we set the value to 2.
A.target = 4
max_targets[A.source = 3] = 3 + max_spans[3] = 5

So then, since 4 < 5, we detect that A is surrounded by a previously signed attestation (namely B), which is also a slashable offense. As the Lighthouse team explains:
If the target epoch of an attestation, att, is less than the max target for its source, that means there exists a previously signed attestation with a source less than att.source and a target greater than att.target, i.e. it surrounds att. Moreover, because the max target belongs to the attestation with the maximum target amongst those that could surround att, we know that if the check fails, then att cannot be surrounded by any other previously signed attestation.
These spans arrays give us a nice, compact way of mathematically checking surround votes quite easily!
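Putting the two conditions together, here is a hedged sketch of the detection checks; the Attestation type and function names are hypothetical, with spans stored as distances exactly as described above:

// Hypothetical, simplified attestation type for illustration only.
type Attestation struct {
	Source uint64
	Target uint64
}

// surroundsExisting reports whether att surrounds some previously signed
// attestation, i.e. att.target > min_targets[att.source], where
// min_targets[e] = e + min_spans[e]. Note: an "undefined" min span must be
// stored as a large sentinel (e.g. MaxUint16) so this check cannot fire
// spuriously.
func surroundsExisting(minSpans []uint16, att Attestation) bool {
	return att.Target > att.Source+uint64(minSpans[att.Source])
}

// surroundedByExisting reports whether att is surrounded by some previously
// signed attestation, i.e. att.target < max_targets[att.source], where
// max_targets[e] = e + max_spans[e]. A 0 max span can never trigger this,
// since an attestation's target is always at least its source.
func surroundedByExisting(maxSpans []uint16, att Attestation) bool {
	return att.Target < att.Source+uint64(maxSpans[att.Source])
}

Plugging in the example above: B.target = 5 > 2 + min_spans[2] = 4 flags the surrounding vote, and A.target = 4 < 3 + max_spans[3] = 5 flags the surrounded vote.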
The great thing about these span arrays is that under ideal network conditions, as we covered in the previous section and as we are seeing on mainnet today, min and max spans for most validators will look like this (assuming each of their attestation target epochs immediately follows its source epoch):

min_spans = [2, 2, 2, ..., 2]

and for max spans:

max_spans = [0, 0, 0, ..., 0]
Let's think about the naive way of storing these on disk. Something we could do is store two buckets in boltDB, called minSpansBucket and maxSpansBucket, where the keys are validator indices and the values are the respective arrays of 54,000 elements (the weak subjectivity period).
Since each array is a []uint16 and has 54,000 elements, the total data stored on disk for 100,000 validators is:

min_spans_size = 54000 * 2 bytes (since each elem is uint16) = 108kb
max_spans_size = 54000 * 2 bytes = 108kb
total_per_validator = min_spans_size + max_spans_size = 216kb
total = num_validator_indices * total_per_validator
      = 100,000 * 216kb = 21.6Gb
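A quick back-of-the-envelope check of those numbers as a sketch (decimal units, constants chosen to match the figures above):

// Rough storage math for the naive on-disk layout (decimal units).
const (
	historyLength = 54000  // weak subjectivity period, in epochs
	numValidators = 100000
)

func naiveStorageBytes() uint64 {
	perArray := uint64(historyLength) * 2 // each element is a uint16 => 108,000 bytes (~108kb)
	perValidator := 2 * perArray          // min_spans + max_spans => ~216kb
	return perValidator * numValidators   // => 21,600,000,000 bytes (~21.6Gb)
}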
Roughly 21.6Gb for 100k validators ends up being pretty big! Can we do something more to reduce this space?
Let's take a look at a single validator with 100% attestation effectiveness. Their span arrays will look like [2, 2, 2, ...] and [0, 0, 0, ...].
Given this data is all redundant, we could split each array up into chunks of some length C. Then, we can take each chunk's hash and store it in the DB. In a separate bucket, we store (validator_idx ++ chunk_idx) => chunk_hash. So our schema looks like this:

(validator_idx ++ chunk_idx) => chunk_hash
chunk_hash => chunk
To give an example, let's say validator index 1 is attesting perfectly and has a min span array that looks like this: [2, 2, 2, 2, ..., 2]. If we split it into, say, 54 chunks of length 1,000, every chunk is identical and hashes to the same value, e.g. 0x923099304902934. This means that in the DB we are saving a TON of space, storing 54 chunks' worth of data as a single chunk since they are all the same, redundant data.
Given certain runs of numbers could be repeated across tons of validators, best case scenario we would have only 2 chunks in the database for the entire validator registry!
This means that our DB would look something like this
0x923099304902934 -> [2, 2, 2, ..., 2]
0x1231a3723762833 -> [0, 0, 0, ..., 0]
if we were to store chunks addressed by hash and we had perfect uniformity. Even if we do not have perfect uniformity, the more uniform each chunk is, the more we can save on space and reuse chunks!
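As a rough sketch of this content-addressed idea (hypothetical in-memory stand-ins for the two boltDB buckets, not our actual schema code), storage boils down to two mappings:

import (
	"crypto/sha256"
	"encoding/binary"
)

// Hypothetical in-memory stand-ins for the two buckets described above.
var (
	chunkKeys = map[[16]byte][32]byte{} // (validator_idx ++ chunk_idx) => chunk_hash
	chunks    = map[[32]byte][]uint16{} // chunk_hash => chunk
)

// saveChunk stores a chunk by hash; identical chunks across validators and
// chunk indices collapse into a single stored copy.
func saveChunk(validatorIdx, chunkIdx uint64, chunk []uint16) {
	buf := make([]byte, 2*len(chunk))
	for i, v := range chunk {
		binary.LittleEndian.PutUint16(buf[2*i:], v)
	}
	hash := sha256.Sum256(buf)

	var key [16]byte
	binary.LittleEndian.PutUint64(key[:8], validatorIdx)
	binary.LittleEndian.PutUint64(key[8:], chunkIdx)

	chunkKeys[key] = hash
	chunks[hash] = chunk
}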
This approach is great; however, it becomes complex because we need to prune chunks that are no longer in use, and the simplicity of the schema makes this really hard to do. We would need to tag each chunk with the latest epoch at which it was used and run a garbage collection routine over the database. The Lighthouse team abandoned this approach given the complexity of that pruning routine.
The parameter C, the chunk length, will come back in the improved approach below.
An even better approach is as follows. Instead of storing chunks and hashes for each validator index for min and max spans, we can be a little more clever with how we store data on disk. Under ideal network conditions, where every target epoch immediately follows its source, min spans for a set of validators will look as follows:
min_spans_validator_0 = [2, 2, 2, ..., 2]
min_spans_validator_1 = [2, 2, 2, ..., 2]
min_spans_validator_2 = [2, 2, 2, ..., 2]
...
Next, we can chunk this list of min spans into chunks of length C. Using chunks of length 2 for this example:
                           chunk0  chunk1       chunkN
                           {    }  {    }       {    }
chunked_min_spans_val_0 = [[2, 2], [2, 2], ..., [2, 2]]

                           chunk0  chunk1       chunkN
                           {    }  {    }       {    }
chunked_min_spans_val_1 = [[2, 2], [2, 2], ..., [2, 2]]

                           chunk0  chunk1       chunkN
                           {    }  {    }       {    }
chunked_min_spans_val_2 = [[2, 2], [2, 2], ..., [2, 2]]
Finally, we can store each chunk index for batches of K validators at a time (here K = 3), flattening each batch into a single array:
                                 val0  val1  val2
                                 {  }  {  }  {  }
chunk_0_for_validators_0_to_3 = [2, 2, 2, 2, 2, 2]

                                 val0  val1  val2
                                 {  }  {  }  {  }
chunk_1_for_validators_0_to_3 = [2, 2, 2, 2, 2, 2]

                                 val0  val1  val2
                                 {  }  {  }  {  }
chunk_2_for_validators_0_to_3 = [2, 2, 2, 2, 2, 2]
Then we have an easy way of representing a chunk for a whole batch of K validators as a single flat array.
We store these in a single bucket in our database with the schema chunk_id => flat_chunk, where the chunk_id for an incoming attestation is computed as follows from a validator index and epoch:
validator_chunk_number = validator_index / K
chunk_number = att.source.epoch / C
chunk_id = validator_chunk_number * C + chunk_number
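As a minimal sketch mirroring those formulas (the K and C values below are illustrative placeholders, not our final parameters):

const (
	K = 256 // validators per flat chunk
	C = 16  // epochs per chunk
)

// chunkID maps a validator index and an attestation's source epoch to the
// database key of the flat chunk that must be read and updated, following
// the formulas above.
func chunkID(validatorIndex, sourceEpoch uint64) uint64 {
	validatorChunkNumber := validatorIndex / K
	chunkNumber := sourceEpoch / C
	return validatorChunkNumber*C + chunkNumber
}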
The Lighthouse design doc outlines this approach and mentions that, when receiving an attestation, the chunk to update is looked up via chunk_id(validator_index, att.source.epoch).
There is no need for hashing in this scheme, and there is less data to store overall, making this approach space efficient and predictable.
Let's take a look at how an update process specifically works for this 2D chunking scheme:
Let's set C = 2, meaning each chunk will hold 2 epochs worth of attesting history. Then we set the history length H = 4, meaning each validator's min span is chunked into arrays each of length 2, and K = 3, meaning each flat chunk holds data for 3 validators at a time.

So assume we receive an attestation with source 0 and target 3 for validator 0. Then we need to update every epoch in the span from 3 down to 0, inclusive. First, we find out which chunk epoch 3 falls into, which is calculated as:

chunk_idx = (epoch % H) / C = (3 % 4) / 2 = 1
                                    val0        val1        val2
                                 {        }  {        }  {        }
chunk_1_for_validators_0_to_3 = [[nil, nil], [nil, nil], [nil, nil]]
                                   |    |
                                   |    |-> epoch 3 for validator 0
                                   |
                                   |-> epoch 2 for validator 0

                                    val0        val1        val2
                                 {        }  {        }  {        }
chunk_0_for_validators_0_to_3 = [[nil, nil], [nil, nil], [nil, nil]]
                                   |    |
                                   |    |-> epoch 1 for validator 0
                                   |
                                   |-> epoch 0 for validator 0
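To make the indexing in these diagrams concrete, here is a minimal sketch of how an (epoch, validator) pair maps into this layout; the helper names are hypothetical and the constants match the example parameters above:

const (
	C = 2 // epochs per chunk
	H = 4 // history length, in epochs
	K = 3 // validators per flat chunk
)

// chunkIndex returns which chunk an epoch falls into, as in the formula above.
func chunkIndex(epoch uint64) uint64 {
	return (epoch % H) / C
}

// cellIndex returns the position of (validator, epoch) inside the flattened
// chunk, where validatorOffset = validatorIndex % K.
func cellIndex(validatorOffset, epoch uint64) uint64 {
	return validatorOffset*C + (epoch % C)
}

For example, chunkIndex(3) = 1 and cellIndex(0, 3) = 1, which is exactly the cell marked "epoch 3 for validator 0" in the diagram.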
Next up, we proceed with the update process for validator index 0, starting at epoch 3 and updating every value along the way according to the update rules for min spans. Once we finish updating a chunk, we move on to the next chunk. We stop whenever we reach the minimum epoch we need to update; in our example, we stop at epoch 0, which is part of chunk 0, so we need to perform updates across 2 different "2D" data structures as shown above.
After doing the math, chunk_1 for validator_0 will look like [1, 0] and chunk_0 for validator_0 will look like [3, 2].
When actually writing the code for our slasher, we need to keep the following items in mind:
Currently, we designed our slasher to be its own independent process. However, this led to a lot more pain than benefit, as we realized the needs of slasher a bit too late. In particular, slasher needs access to all indexed attestations observed by the beacon node, and also needs access to certain information about the validator registry. Given we do not store indexed attestations in the database, our current slasher reaches out to a beacon node gRPC endpoint which duplicates work by converting attestations to indexed form and also aggregating them, putting a lot of load on our beacon node process at runtime, especially during times of low chain participation. Given that, at the most basic level, a simple feed of indexed attestations is all we need for slashing detection, we posit that slasher should instead be a service within a running beacon node. Another reason is that slasher is so tightly coupled with a beacon node that it makes sense to have both of them together.
Inspired by lighthouse's slasher implementation (https://github.com/sigp/lighthouse/tree/stable/slasher/src), our approach mirrors its general flow.
Here are the required components for our implementation:
- A Chunker interface, specifying methods to update chunks, detect slashings within a chunk, etc.
- A MinChunkList implementation of the Chunker interface
- A MaxChunkList implementation of the Chunker interface

When we first receive an attestation via some source, we enqueue it for batch processing.

In the background, we run a goroutine which processes enqueued attestations on tick intervals (per epoch), grouping them by validator chunk index, loading the relevant chunks from disk, applying the updates, and checking for slashable offenses along the way (see the sketch below for the grouping step). Grouping attestations by validator chunk index allows us to process batches easily and concurrently in goroutines, making our lives easier.
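A hedged sketch of that grouping step; the queuedAtt type is a hypothetical simplification (real indexed attestations carry a list of attesting validator indices):

// Hypothetical minimal type for illustration only.
type queuedAtt struct {
	ValidatorIndex uint64
	Source         uint64
	Target         uint64
}

// groupByValidatorChunkIndex batches queued attestations by validator chunk
// index (validator_index / K), so each batch can be processed concurrently
// against the flat chunks it owns.
func groupByValidatorChunkIndex(atts []queuedAtt, k uint64) map[uint64][]queuedAtt {
	groups := make(map[uint64][]queuedAtt)
	for _, att := range atts {
		idx := att.ValidatorIndex / k
		groups[idx] = append(groups[idx], att)
	}
	return groups
}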
Given the routine for updating chunks is the same at a high level for both min and max spans, we can define a Chunker interface that specifies the methods that actually do the heavy lifting.
type Chunker interface {
NeutralElement() uint16
Chunk() []uint16
CheckSlashable(
validatorIdx uint64,
attestation *ethpb.IndexedAttestation,
) (bool, slashingKind, error)
Update(
chunkIdx,
validatorIdx,
startEpoch uint64,
attestation *ethpb.IndexedAttestation,
) (bool, error)
StartEpoch(sourceEpoch, currentEpoch uint64) (uint64, bool)
NextEpoch(startEpoch uint64) uint64
}
Then, we can implement this interface for min and max spans, as they differ slightly in how the underlying methods work.
type MinChunk struct {
config *Config
data []uint16
}
type MaxChunk struct {
config *Config
data []uint16
}
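As one example of where the two implementations differ, consider the NeutralElement method, which provides the initial value for empty chunk cells. A hedged sketch follows (the exact sentinel we settle on may differ): max spans can safely start at 0, as explained earlier, while min spans need a large sentinel so an empty entry can never trigger the surround check.

import "math"

// NeutralElement for min spans: a large sentinel so that an epoch with no
// recorded attestation can never satisfy the "target > min target" check.
func (m *MinChunk) NeutralElement() uint16 {
	return math.MaxUint16
}

// NeutralElement for max spans: 0 works because an attestation's target is
// always at least its source, so the "target < max target" check cannot fire.
func (m *MaxChunk) NeutralElement() uint16 {
	return 0
}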
Given how unintuitive the optimized slasher implementation is, we need a way to ensure it works as expected for all kinds of edge cases, and that it can handle as many test scenarios as we can devise. To do this, we leverage the simplicity of the slasher entrypoint being a single feed of indexed attestations. Slasher does not care what the source is, as long as it can receive these attestations from some event feed.
Additionally, we can create a simulator that can craft all sorts of combinations of indexed attestations for any number of validators and epochs.
type MockFeeder struct {
feed *event.Feed
config *Config
}
func (m *MockFeeder) IndexedAttestationFeed() *event.Feed {
return m.feed
}
func (m *MockFeeder) generateFakeAttestations(ctx context.Context) {
	ticker := slotutil.GetEpochTicker(m.config.GenesisTime, m.config.Interval)
	defer ticker.Done()
	for {
		select {
		case currentEpoch := <-ticker.C():
			// Craft a fake indexed attestation for the current epoch and
			// push it onto the feed slasher listens to.
			att := m.generateIndexedAtt(ctx, currentEpoch)
			m.feed.Send(att)
		case <-ctx.Done():
			return
		}
	}
}

func (m *MockFeeder) generateIndexedAtt(ctx context.Context, epoch uint64) *ethpb.IndexedAttestation {
	...
}
This design has now been completely implemented in Prysm under the feature/slasher branch as of March 25, 2021.