# Optimized Prysm Slasher Design

## Background

Currently, our slasher implementation uses a simple approach towards min-max span detection. The problem of slashing detection is non-trivial: a slasher needs to keep track of attesting histories for every single validator in some space-efficient format, and it needs to observe every single attestation the beacon node sees in order to perform slashing detection. The hardest part of building a slasher is optimizing its resource use. Our end goal is for an average staker to be able to run a slasher in the same way they can run a beacon node.

Unfortunately, our current approach does not fare well in terms of resource use. We need to revamp it from the ground up and test the assumptions we made when first developing it. Thankfully, Lighthouse has done a lot of work on optimized slashing detection here: https://hackmd.io/@sproul/min-max-slasher, and their slasher implementation is a great reference that will be the main backbone of this design discussion. Also, see protolambda's exploration of this problem here: https://github.com/protolambda/eth2-surround.

**We have a working prototype** of this design document here: https://github.com/prysmaticlabs/prysm/tree/advancedslasher

### Explaining surround votes

The hardest part of slashing detection is detecting **surround votes** for attesters. As such, this design will focus only on that aspect of slasher. At a high level, a surround vote happens when an attester creates an attestation that "contradicts" history and chain consensus. For example, they create an attestation showing they believe consensus should be different from what the chain believes. Let's take a look at an example first:

```
A = (source: 1, target: 2)
B = (source: 0, target: 3)
```

![](https://i.imgur.com/0OITIpy.png)

The second attestation, $B$, is _surrounding_ the first one, $A$, because the validator is changing their mind by saying the justified epoch is actually earlier than what their previous attestation said it was. This is a slashable offense. Conversely, $A$ is _surrounded_ by $B$. If we see that a validator created this pair of attestations in either order, that validator needs to be **slashed**. That is, both surrounding and surrounded attestations count as slashable offenses.

The reason this sort of event is difficult to detect efficiently is that, with a naive approach, we need to keep track of _every_ (source, target) attestation pair for every single validator, and for every new attestation that comes in, we need to loop through all of these pairs for each validator to cross-check whether there is a slashable event. As expected, the memory and time complexity of the naive approach is unreasonable.

Naive approach:

1. Receive an attestation
2. Store source_epoch => [target_epoch] in the DB
3. When checking if an attestation = att is slashable for a public key
    - For every (source, [target]) in the DB
    - checkSurround(att.source, source, att.target, target)

### Better way to detect surround votes

Formally, a surround vote is defined as follows: an attestation $A$ surrounds $B$ if and only if

$$
A.source.epoch < B.source.epoch < B.target.epoch < A.target.epoch
$$
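To make the definition concrete, here is a minimal Go sketch of the pairwise surround check (the `checkSurround` predicate referenced in the naive approach above). The `attestationRecord` type and the function names are illustrative only, not the actual Prysm types.

```go
// attestationRecord is an illustrative stand-in for an attestation's
// (source, target) epoch pair.
type attestationRecord struct {
	source uint64
	target uint64
}

// isSurrounding returns true if attestation a surrounds attestation b, i.e.
// a.source < b.source < b.target < a.target (b.source < b.target holds for any
// valid attestation, so only the outer comparisons need to be checked).
func isSurrounding(a, b attestationRecord) bool {
	return a.source < b.source && b.target < a.target
}

// isSlashableSurround returns true if either attestation surrounds the other,
// since both orderings are slashable offenses.
func isSlashableSurround(a, b attestationRecord) bool {
	return isSurrounding(a, b) || isSurrounding(b, a)
}
```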
Using this definition, we can create two arrays that help us quickly detect surround votes, namely `min_targets` and `max_targets`, where we define:

- `min_targets[i]` is the minimum target of an attestation with source epoch greater than $i$
- `max_targets[i]` is the maximum target of an attestation with source epoch less than $i$

In Lighthouse's design doc, https://hackmd.io/@sproul/min-max-slasher, they mathematically prove that surround votes can be caught using these arrays because, for an attestation $A$:

- $A$ surrounds an existing attestation $\iff$ $A$.target.epoch > min_targets[$A$.source.epoch]
- $A$ is surrounded by an existing attestation $\iff$ $A$.target.epoch < max_targets[$A$.source.epoch]

Given we only care about attestations within a weak subjectivity period, for every validator we would have to store 2 arrays of type `[]uint64`, each with length 54k. Even then, these arrays are wasteful for many validators, because we do not actually need to store target epochs at all. Instead, we can store the difference between the target epoch and the epoch index, guaranteeing the values in our arrays are bounded by `MaxUint16` and further reducing the space requirements. **We will be calling these epoch-distance arrays `min_spans` and `max_spans`, respectively**.

Let's take a closer look to ensure we understand what these arrays are.

`min_spans[e]` is defined as `min((att.target.epoch - e) for att in attestations)` where `att.source.epoch > e`. That is, it is the minimum distance between the specified epoch and the target epochs of all attestations a validator has created where `att.source.epoch > e`.

`max_spans[e]` is defined as `max((att.target.epoch - e) for att in attestations)` where `att.source.epoch < e`. That is, it is the maximum distance between the specified epoch and the target epochs of all attestations a validator has created where `att.source.epoch < e`.

Remember, the reason we do this is to optimize for storage. Differences between epochs can be stored as `uint16` values instead of `uint64`, giving us precious space savings.

The above is pretty hard to understand without an example, so let's construct some min and max spans from the following attestation history, which is the ideal attesting history from genesis up to and including epoch 4:

```
(source: 0, target: 1)
(source: 1, target: 2)
(source: 2, target: 3)
(source: 3, target: 4)
```

The attestations above represent ideal network conditions, and we have `5` epochs worth of data (epochs 0 through 4). We'll build up min spans first, computing entries for epochs `0` through `3`.

**Building min spans manually**

For `min_spans[0]`, we find all attestations with `source > 0`. Then, for each, we compute the difference `(target - 0)`. The smallest such value is `2`, so we set the value as `2`.

For `min_spans[1]`, we find all attestations with `source > 1`. Then, for each, we compute the difference `(target - 1)`. The smallest such value is `2`, so we set the value as `2`.

For `min_spans[2]`, we find all attestations with `source > 2`. Then, for each, we compute the difference `(target - 2)`. The smallest such value is `2`, so we set the value as `2`.

For `min_spans[3]`, we find all attestations with `source > 3`. We have none, so we set the value to `undefined`.

Our list looks like this: `min_spans = [2, 2, 2, undefined]`
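As a sanity check on the manual construction above, here is a brute-force Go sketch that computes `min_spans` directly from its definition (max spans are analogous, with the comparison flipped and `0` as the starting value). This is for illustration only and is not how the optimized slasher computes spans; the type and function names are made up for the example.

```go
// attestation is an illustrative (source, target) epoch pair.
type attestation struct {
	source uint64
	target uint64
}

// undefinedSpan marks epochs for which no attestation has source > e.
const undefinedSpan = ^uint64(0)

// computeMinSpans brute-forces min_spans[e] = min(att.target - e) over all
// attestations with att.source > e, for epochs 0 through numEpochs-1.
func computeMinSpans(atts []attestation, numEpochs uint64) []uint64 {
	spans := make([]uint64, numEpochs)
	for e := uint64(0); e < numEpochs; e++ {
		spans[e] = undefinedSpan
		for _, att := range atts {
			if att.source > e && att.target-e < spans[e] {
				spans[e] = att.target - e
			}
		}
	}
	return spans
}
```

Running this over the four attestations above with `numEpochs = 4` yields `[2, 2, 2, undefinedSpan]`, matching the manual construction.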
**Building max spans manually**

For `max_spans[0]`, we find all attestations with `source < 0`. We have none, so we set the value as `0`. The reason we can use `0` here is that detecting a surrounded vote for an attestation $A$ involves checking whether $A$'s target distance is _less than_ the stored max span at $A$.source. Since no distance can be less than `0`, `0` serves as a good neutral element for max spans. More on this is explained in the next section.

For `max_spans[1]`, we find all attestations with `source < 1`. Then, for each, we find the maximum `target - 1`, getting a value of `0`.

For `max_spans[2]`, we find all attestations with `source < 2`. Then, for each, we find the maximum `target - 2`, getting a value of `0`.

For `max_spans[3]`, we find all attestations with `source < 3`. Then, for each, we find the maximum `target - 3`, getting a value of `0`.

Our list looks like this: `max_spans = [0, 0, 0, 0]`

### Detecting surround votes with min/max spans

Let's take a look at a surround vote example and see why these span arrays are so helpful to us:

```
A = (source: 3, target: 4)
B = (source: 2, target: 5)
```

The second attestation, $B$, is _surrounding_ the first one, $A$. How can we verify that $B$ is surrounding $A$ using our spans? We use `min_spans` in this case, with the following condition explained perfectly by the Lighthouse team:

> If the target epoch of a new attestation, $B$, is greater than the min target for its source, that means there exists a previously signed attestation $A$ with $A$.source > $B$.source and $A$.target < $B$.target, i.e. $B$ surrounds $A$. Moreover, because $A$ is the attestation with the minimum target amongst attestations that could be surrounded by $B$, we know that if the check fails, then $B$ cannot surround any other previously signed attestation.

Since our spans store epoch _distances_ rather than absolute targets, the condition becomes: $B$ surrounds a previously signed attestation $\iff$ $B$.target - $B$.source > min_spans[$B$.source].

Ok, so we want to check:

```
B.target - B.source = 5 - 2 = 3 > min_spans[B.source = 2] ?
```

First of all, let's update our min spans with attestation $A$. For `min_spans[2]`, we find all attestations with `source > 2`. Then, for each, we compute the difference `(target - 2)`. The smallest such value is `(4 - 2) = 2`, so we set the value as `2`.

So then $3 > 2$: we have a surround vote! That's great! Now what if we saw $B$ first and then $A$? How can we verify that $A$ is surrounded by $B$? We use `max_spans` in this case, with the following condition: $A$ is surrounded by a previously signed attestation $\iff$ $A$.target - $A$.source < max_spans[$A$.source].

So then, we want to check:

```
A.target - A.source = 4 - 3 = 1
max_spans[A.source = 3] = ???
```

To do this, let's update our max spans with attestation $B$ at epoch 3. For `max_spans[3]`, we find all attestations with `source < 3`. Then, for each, we compute the difference `(target - 3)`. The largest such value is `(5 - 3) = 2`, so we set the value as `2`.

```
A.target - A.source = 4 - 3 = 1
max_spans[A.source = 3] = 2
```

So then $1 < 2$: we have a surrounded vote! The rationale is as follows:

> If the target epoch of an attestation $A$ is less than the max target for its source, that means there exists a previously signed attestation $B$ with $B$.source < $A$.source and $B$.target > $A$.target, i.e. $B$ surrounds $A$. Moreover, because $B$ is the attestation with the maximum target amongst attestations that could surround $A$, we know that if the check fails, then $A$ cannot be surrounded by any other previously signed attestation.

These span arrays give us a nice, compact way of mathematically checking surround votes quite easily!
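Putting both conditions together, here is a minimal Go sketch of the detection step for an incoming attestation against distance-based span arrays, assuming unset `min_spans` entries hold a large sentinel (e.g. `MaxUint16`) and unset `max_spans` entries hold the neutral element `0`; the function names are illustrative.

```go
// surroundsExisting reports whether a new attestation (source, target) surrounds
// some previously recorded attestation: true iff target - source > minSpans[source].
// A large sentinel for unset entries keeps the check from firing spuriously.
func surroundsExisting(source, target uint64, minSpans []uint16) bool {
	return target-source > uint64(minSpans[source])
}

// surroundedByExisting reports whether a new attestation (source, target) is
// surrounded by some previously recorded attestation: true iff
// target - source < maxSpans[source]. The neutral element 0 can never trigger it.
func surroundedByExisting(source, target uint64, maxSpans []uint16) bool {
	return target-source < uint64(maxSpans[source])
}
```

For the example above, after recording $A$ we have `minSpans[2] = 2`, and $B$ gives `5 - 2 = 3 > 2`, so $B$ surrounds a prior attestation; in the reverse order, `maxSpans[3] = 2` and `4 - 3 = 1 < 2`, so $A$ is surrounded.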
## Towards an optimal approach: 1D chunking

The great thing about these span arrays is that under ideal network conditions, as we covered in the previous section and as we are seeing on mainnet today, min and max spans for most validators will look like this (assuming each of their attestations' target epochs immediately follows their source epochs):

$$
\text{min_spans} = [2, 2, 2, ...]
$$

and for max spans:

$$
\text{max_spans} = [0, 0, 0, ...]
$$

Let's think about the naive way of storing these on disk. Something we could do is store two buckets in BoltDB, called `minSpansBucket` and `maxSpansBucket`, where the keys are validator indices and the values are the respective arrays of 54000 elements (the weak subjectivity period). Since each array is a `[]uint16` with 54000 elements, the total data stored on disk for 100,000 validators is:

```
min_spans_size      = 54000 * 2 bytes (since each elem is uint16) = 108 KB
max_spans_size      = 54000 * 2 bytes                             = 108 KB
total_per_validator = min_spans_size + max_spans_size             = 216 KB
total               = num_validator_indices * total_per_validator
                    = 100,000 * 216 KB ≈ 21.6 GB
```

Roughly 21.6 GB for 100k validators ends up being pretty big! Can we do something more to reduce this space? Let's take a look at a single validator with 100% attestation effectiveness. Their span arrays will look like `[2, 2, 2, ...]` and `[0, 0, 0, ...]`.

Given this data is highly redundant, we could split each array up into **chunks** of length $C$. There will be $54000 / C$ chunks. So if $C = 1000$, then we have $54$ chunks of length $1000$: chunk index $1$ starts at epoch $1000$, chunk index $10$ starts at epoch $10000$, etc. Then, we can take each chunk's hash and store the chunk under it in the DB, and in a separate bucket we store `(validator_idx ++ chunk_idx) => chunk_hash`. So our schema looks like this:

```
(validator_idx ++ chunk_idx) => chunk_hash
chunk_hash => chunk
```

To give an example, let's say validator index 1 is attesting perfectly and has a min span array that looks like `[2, 2, 2, 2, ..., 2]`. We split it up into $54$ chunks of $C = 1000$. We take the hash of each chunk, and since every element of the array is $2$, all chunks have the same hash, say `0x923099304902934`. This means that in the DB we are saving a TON of space, storing 54 chunks worth of data as a single chunk since they contain the same, redundant data. Given certain runs of numbers could be repeated across tons of validators, in the best case we would have only 2 chunks in the database for the entire validator registry! That is, if we store chunks addressed by hash and had perfect uniformity, our DB would look something like this:

```
0x923099304902934 -> [2, 2, 2, ..., 2]
0x1231a3723762833 -> [0, 0, 0, ..., 0]
```

Even if we do not have perfect uniformity, the more uniform each chunk is, the more we can save on space by reusing chunks!

This approach is great; however, it becomes complex because we need to perform **database pruning** for chunks that are no longer in use, and the simplicity of the schema makes this really hard to do. We would need to tag each chunk with the latest epoch at which it was used and run a garbage collection routine. The Lighthouse team abandoned this approach given the complexity of this pruning routine.

### How do you choose C?

The parameter $C$, the number of elements per chunk, determines how well our approach saves disk space. The space used on disk is proportional to $C * U$, where $U$ is a _uniformity_ factor of a chunk, namely how redundant the data in a chunk actually is. The Lighthouse team chose a default value of $16$ based on an optimization analysis they performed [here](https://docs.google.com/spreadsheets/d/1UEWfQ_Oh73uarXLqTuKz9JERoRGutfLr-M6bGqX9u-w/edit#gid=0).
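For concreteness, here is a hedged Go sketch of how the 1D content-addressed keys could be computed; the key layout, the use of SHA-256, and the function names are illustrative assumptions rather than the actual schema of any implementation.

```go
import (
	"crypto/sha256"
	"encoding/binary"
)

// chunkKey builds the (validator_idx ++ chunk_idx) key under which a chunk's
// hash is stored for a given validator.
func chunkKey(validatorIdx, chunkIdx uint64) []byte {
	key := make([]byte, 16)
	binary.LittleEndian.PutUint64(key[:8], validatorIdx)
	binary.LittleEndian.PutUint64(key[8:], chunkIdx)
	return key
}

// hashChunk content-addresses a chunk of span values, so that identical chunks
// across many validators deduplicate to a single stored entry.
func hashChunk(chunk []uint16) [32]byte {
	buf := make([]byte, 2*len(chunk))
	for i, v := range chunk {
		binary.LittleEndian.PutUint16(buf[2*i:], v)
	}
	return sha256.Sum256(buf)
}
```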
## Easiest and optimal approach: 2D chunking

An even better approach is as follows. Instead of storing chunks and hashes for each validator index for min and max spans, we can be a little more clever with how we store data on disk. Under ideal network conditions, where every target epoch immediately follows its source, min spans for a set of validators will look as follows:

```
min_spans_validator_0 = [2, 2, 2, ..., 2]
min_spans_validator_1 = [2, 2, 2, ..., 2]
min_spans_validator_2 = [2, 2, 2, ..., 2]
...
```

Next, we can chunk this list of min spans into chunks of length $C$. For $C$ = 2, for example:

```
                           chunk0  chunk1       chunkN
                           {    }  {    }       {    }
chunked_min_spans_val_0 = [[2, 2], [2, 2], ..., [2, 2]]

                           chunk0  chunk1       chunkN
                           {    }  {    }       {    }
chunked_min_spans_val_1 = [[2, 2], [2, 2], ..., [2, 2]]

                           chunk0  chunk1       chunkN
                           {    }  {    }       {    }
chunked_min_spans_val_2 = [[2, 2], [2, 2], ..., [2, 2]]
```

Finally, we can store each chunk index for $K$ validators into a single flat slice. For $K$ = 3:

```
                                 val0  val1  val2
                                 {  }  {  }  {  }
chunk_0_for_validators_0_to_3 = [2, 2, 2, 2, 2, 2]

                                 val0  val1  val2
                                 {  }  {  }  {  }
chunk_1_for_validators_0_to_3 = [2, 2, 2, 2, 2, 2]

                                 val0  val1  val2
                                 {  }  {  }  {  }
chunk_2_for_validators_0_to_3 = [2, 2, 2, 2, 2, 2]
```

Then we have an easy way of representing a chunk $C_i$ for $K$ validators on disk, all in a single flat byte slice. The 1D approach provides de-duplication but requires garbage collection and uses more space in the worst case than 2D, because we have to store hashes for every chunk as well as garbage-collection markers that allow us to prune old chunks. In both approaches, we still have to store the 2 * 54000 span values per validator on disk.

We store these flat chunks in a single bucket in our database with the schema `chunk_id => flat_chunk`, where the `chunk_id` for an incoming attestation is computed from a validator index and epoch as follows ($H$ below is the history length we keep per validator, so $H / C$ is the number of chunks in a validator's history):

```
validator_chunk_number = validator_index / K
chunk_number           = (att.source.epoch % H) / C
chunk_id               = validator_chunk_number * (H / C) + chunk_number
```

The Lighthouse design doc outlines this approach and mentions that the new update method when receiving an attestation is:

1. Load the chunk for `chunk_id(validator_index, att.source.epoch)`
2. Fetch the validator's chunk from the `validator_index % K` sub-chunk
3. Check for slashing and update the chunk (lazily loading neighboring chunks and modifying them as necessary)
4. Write all updated chunks back to the database

Upsides:

- Single chunk table (no need for the chunk-hash index, thus smaller overall)
- No need for hashing in this scheme, so there is less data we need to store, making this approach space-efficient and predictable
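Here is a small Go sketch of that index arithmetic, using the parameters $C$ (chunk size), $K$ (validators per chunk), and $H$ (history length); the struct and method names are illustrative, and the exact on-disk key encoding in a real implementation may differ.

```go
// chunkParams holds the illustrative 2D chunking parameters: C epochs per
// chunk, K validators packed per chunk, and H epochs of history per validator
// (H is assumed to be a multiple of C).
type chunkParams struct {
	C uint64
	K uint64
	H uint64
}

// validatorChunkNumber groups validators into batches of K.
func (p chunkParams) validatorChunkNumber(validatorIdx uint64) uint64 {
	return validatorIdx / p.K
}

// chunkNumber maps an epoch (kept modulo the history length) to the chunk
// covering it within a validator's span.
func (p chunkParams) chunkNumber(epoch uint64) uint64 {
	return (epoch % p.H) / p.C
}

// chunkID derives the flat on-disk key for the chunk holding (validatorIdx, epoch),
// using H/C, the number of chunks per validator batch, as the stride.
func (p chunkParams) chunkID(validatorIdx, epoch uint64) uint64 {
	return p.validatorChunkNumber(validatorIdx)*(p.H/p.C) + p.chunkNumber(epoch)
}

// cellIndex locates the value for (validatorIdx, epoch) inside a flat chunk of
// length C*K, where each validator's sub-chunk of length C is laid out back to back.
func (p chunkParams) cellIndex(validatorIdx, epoch uint64) uint64 {
	return (validatorIdx%p.K)*p.C + epoch%p.C
}
```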
Let's take a look at how the update process specifically works for this 2D chunking scheme. Let's set $H$ = historyLength = 4, meaning a min span will hold 4 epochs worth of attesting history, $C$ = 2, meaning we will chunk each min span into arrays of length 2, and $K$ = 3, meaning we store each chunk index for 3 validators at a time. Now assume we receive an attestation with target 3 and source 0 for validator 0; we then need to update every epoch in the span from 3 down to 0, inclusive.

First, we find out which chunk epoch 3 falls into, which is calculated as: `chunk_idx = (epoch % H) / C = (3 % 4) / 2 = 1`

```
                                    val0         val1         val2
                                 {        }   {        }   {        }
chunk_1_for_validators_0_to_3 = [[nil, nil], [nil, nil], [nil, nil]]
                                   |    |
                                   |    |-> epoch 3 for validator 0
                                   |
                                   |-> epoch 2 for validator 0

                                    val0         val1         val2
                                 {        }   {        }   {        }
chunk_0_for_validators_0_to_3 = [[nil, nil], [nil, nil], [nil, nil]]
                                   |    |
                                   |    |-> epoch 1 for validator 0
                                   |
                                   |-> epoch 0 for validator 0
```

Next, we proceed with the update process for validator index 0, starting at epoch 3 and updating every value along the way according to the update rules for min spans. Once we finish updating a chunk, we move on to the next (lower) chunk. We stop whenever we reach the lowest epoch we need to update; in our example, we stop at epoch 0, which is part of chunk 0, so we need to perform updates across 2 different "2D" data structures as shown above. After doing the math, chunk_1 for validator 0 will look like `[1, 0]` (epochs 2 and 3) and chunk_0 will look like `[3, 2]` (epochs 0 and 1).
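The walk described above can be modeled with a short Go sketch; the in-memory map of chunks, the neutral-element choice for min spans, and the early-stop rule are simplifying assumptions for illustration, not the real implementation.

```go
// minSpanNeutral is the assumed "undefined" value for min span entries.
const minSpanNeutral = ^uint16(0)

// updateMinSpans is a simplified model of the chunked min span update walk for a
// single validator. chunks maps chunk index -> that validator's sub-chunk of
// length c. We walk from startEpoch down to epoch 0, setting each entry to
// min(existing, target - epoch), moving to the next lower chunk whenever the
// walk crosses a chunk boundary and stopping early once an entry already holds
// a smaller-or-equal distance. Assumes startEpoch <= target.
func updateMinSpans(chunks map[uint64][]uint16, c, h, startEpoch, target uint64) {
	for epoch := startEpoch; ; epoch-- {
		chunkIdx := (epoch % h) / c
		chunk, ok := chunks[chunkIdx]
		if !ok {
			// Lazily create the chunk filled with the neutral element.
			chunk = make([]uint16, c)
			for i := range chunk {
				chunk[i] = minSpanNeutral
			}
			chunks[chunkIdx] = chunk
		}
		newDistance := uint16(target - epoch)
		if chunk[epoch%c] <= newDistance {
			return // existing minimum already covers this attestation
		}
		chunk[epoch%c] = newDistance
		if epoch == 0 {
			return
		}
	}
}
```

Running this with `c = 2`, `h = 4`, `startEpoch = 3`, and `target = 3` reproduces the result above: `chunks[1] = [1, 0]` and `chunks[0] = [3, 2]`.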
## Requirements of our implementation

When actually writing the code for our slasher, we need to keep the following items in mind:

- Simplicity in the implementation
- Caching strategies only where needed and done at the very end
- Ability to scale to large numbers of validators under reasonable resource constraints
- Extremely easy to test
- Easy entrypoint to run a simulation of any number of validators and attestations for testing correctness and benchmarking performance
- Readable code with comprehensive godoc comments and a simple package structure
- Performant in terms of disk I/O

## Design Question: Part of Beacon Node vs. Independent Process

Currently, we designed our slasher to be its own independent process. However, this led to a lot more pain than benefit, as we realized the needs of slasher a bit too late. In particular, slasher needs access to all indexed attestations observed by the beacon node, and also needs access to certain information about the validator registry. Given we do not store indexed attestations in the database, our current slasher reaches out to a beacon node gRPC endpoint which duplicates work by converting attestations to indexed form and also aggregating them, which puts a lot of load on our beacon node process at runtime, especially during times of low chain participation.

Given that, at the most basic level, a simple feed of indexed attestations is all we need for slashing detection, we posit that slasher should instead be a service within a running beacon node. Another reason is that slasher is so tightly coupled with a beacon node that it makes sense to have both of them together.

## Implementation

Inspired by Lighthouse's slasher implementation (https://github.com/sigp/lighthouse/tree/stable/slasher/src), our approach mirrors its general flow.

### Components

Here are the required components for our implementation:

- Source of indexed attestations (a beacon node sending to an event feed, for example)
- Receiver of attestations from an event feed, able to perform basic integrity checks and enqueue received attestations for processing
- Background routine running on_tick to dequeue attestations, group them by validator chunk indices, and process those batches
- A `Chunker` interface, specifying methods to update chunks, detect slashings within a chunk, etc.
- A `MinChunkList` implementation of the `Chunker` interface
- A `MaxChunkList` implementation of the `Chunker` interface
- A function which, for each attestation, is able to get the chunks for all validators involved in the attesting indices, update min and max chunks, and save them to the database. This function will leverage the `Chunker` interface
- Simulator that can serve as a mock source of indexed attestations for benchmarking, testing, and more
- Background routine which periodically deletes redundant attestations from the DB
- API endpoints for remote slashing protection
- Ability to prune old data and defer attestations from the future for later processing (out of scope)

### The flow of an attestation

When we first receive an attestation from some source, we:

- Check if the attestation is valid (source epoch < target epoch)
    - If not, discard
- Check if attestation.target.epoch <= current epoch
    - If not, defer for future processing
- Enqueue verified attestations for batch processing

In the background, we run a goroutine which processes enqueued attestations on tick intervals (per epoch):

```flow
st=>start: OnTick(Epoch)
ed=>end: End
group=>operation: Group by validator_chunk_idx
process_batch=>operation: Process batch (validator_chunk_idx, []attestations)
split_batch=>operation: Group batch by source_chunk_idx
save_atts=>operation: SaveToDB(attestations in batch)
update_min=>subroutine: slashings := UpdateChunks(MinChunk)
update_max=>subroutine: slashings := UpdateChunks(MaxChunk)
para=>parallel: FetchChunks(batch)
process_slashings=>operation: ProcessSlashings(slashings)
update_epoch_written=>operation: Update latest epoch written(validators in chunk)

st->group->process_batch->save_atts->para
para(path1, right)->update_max->process_slashings
para(path2, bottom)->update_min->process_slashings
process_slashings->ed
```

Grouping attestations by validator chunk index will allow us to process batches easily and concurrently in goroutines, making our lives easier.

### The chunk interface

Given the routine for updating chunks is the same at a high level for both min and max spans, we can define a `Chunker` interface with the methods that do the heavy lifting:

```go
type Chunker interface {
	NeutralElement() uint16
	Chunk() []uint16
	CheckSlashable(
		validatorIdx uint64,
		attestation *ethpb.IndexedAttestation,
	) (bool, slashingKind, error)
	Update(
		chunkIdx, validatorIdx, startEpoch uint64,
		attestation *ethpb.IndexedAttestation,
	) (bool, error)
	StartEpoch(sourceEpoch, currentEpoch uint64) (uint64, bool)
	NextEpoch(startEpoch uint64) uint64
}
```

Then, we can implement this interface separately for min and max spans, as they differ slightly in how the underlying implementations of these methods work:

```go
type MinChunk struct {
	config *Config
	data   []uint16
}

type MaxChunk struct {
	config *Config
	data   []uint16
}
```
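To illustrate how the two implementations diverge, here is a hedged sketch of `NeutralElement` and `NextEpoch` for the two chunk types defined above, under the assumptions that min spans use a large sentinel for "undefined" entries and are walked toward older epochs, while max spans use `0` (as discussed earlier) and are walked toward newer epochs; the actual method bodies in the prototype may differ.

```go
import "math"

// NeutralElement for min chunks is assumed to be the maximum uint16, so any
// real target distance replaces an "undefined" entry.
func (m *MinChunk) NeutralElement() uint16 {
	return math.MaxUint16
}

// NeutralElement for max chunks is 0: no target distance is smaller, so it can
// never trigger a false surrounded-vote detection.
func (m *MaxChunk) NeutralElement() uint16 {
	return 0
}

// Min span updates walk toward older epochs.
func (m *MinChunk) NextEpoch(startEpoch uint64) uint64 {
	return startEpoch - 1
}

// Max span updates walk toward newer epochs.
func (m *MaxChunk) NextEpoch(startEpoch uint64) uint64 {
	return startEpoch + 1
}
```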
### Testing

Given how unintuitive the optimized slasher implementation is, we need a way to ensure it works as expected for all kinds of edge cases, and that it works on as many test scenarios as we can devise. To do this, we leverage the simplicity of the slasher entrypoint being a single feed of indexed attestations. Slasher does not care what the source is, as long as it can receive these attestations from some event feed. Additionally, we can create a simulator that can craft all sorts of combinations of indexed attestations for any number of validators and epochs.

```go
type MockFeeder struct {
	feed   *event.Feed
	config *Config
}

func (m *MockFeeder) IndexedAttestationFeed() *event.Feed {
	return m.feed
}

func (m *MockFeeder) generateFakeAttestations(ctx context.Context) {
	ticker := slotutil.GetEpochTicker(m.config.GenesisTime, m.config.Interval)
	defer ticker.Done()
	for {
		select {
		case currentEpoch := <-ticker.C():
			// Craft an indexed attestation for the current epoch and send it
			// out over the feed.
			att := m.generateIndexedAtt(ctx, currentEpoch)
			m.feed.Send(att)
		case <-ctx.Done():
			return
		}
	}
}

func (m *MockFeeder) generateIndexedAtt(ctx context.Context, epoch uint64) *ethpb.IndexedAttestation {
	...
}
```

## Work Estimates

This design has now been completely implemented in Prysm under the `feature/slasher` branch as of March 25, 2021.