These are my design notes for a min-max slasher for Lighthouse, the Ethereum 2.0 client from Sigma Prime.
In this post Proto describes a scheme for efficiently storing attestation data so that surround slashings can be quickly detected. Detecting surrounds is the difficult part of building a slasher, so we'll focus mainly on that for now and come back to detecting double votes later.
Proto's construction is as follows:
The most conceptually straight-forward variant involves storing the min and max targets directly.
Let \(A_v\) be the set of all attestations signed by validator \(v\). Each \(a \in A_v\) has type `AttestationData`.
For each validator \(v\), let \(\text{min_targets}_v\) and \(\text{max_targets}_v\) be 54K element arrays such that:
\[ \text{min_targets}_v[i] = \textbf{min}(\{a\text{.target.epoch} \mid a \in A_v,\, a\text{.source.epoch} > i\}) \\ \text{max_targets}_v[i] = \textbf{max}(\{a\text{.target.epoch} \mid a \in A_v,\, a\text{.source.epoch} < i\}) \]
In natural language:
\(\text{min_targets}_v[i]\) is the minimum target of an attestation with source epoch greater than \(i\).
\(\text{max_targets}_v[i]\) is the maximum target of an attestation with source epoch less than \(i\).
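For example, suppose \(v\) has signed attestations with (source, target) pairs \((3, 4)\) and \((5, 9)\). Then \(\text{min_targets}_v[i] = 4\) for \(i < 3\) and \(\text{min_targets}_v[3] = \text{min_targets}_v[4] = 9\), while \(\text{max_targets}_v[4] = \text{max_targets}_v[5] = 4\) and \(\text{max_targets}_v[i] = 9\) for \(i \geq 6\). A new attestation with source 4 and target 10 satisfies \(10 > \text{min_targets}_v[4] = 9\), which flags that it surrounds \((5, 9)\).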
Lemma 1:
\[ x \;\text{surrounds an existing attestation} \iff x\text{.target.epoch} > \text{min_targets}_v[x\text{.source.epoch}] \]
If we have a new attestation \(x\), then we can check whether it is surrounding any existing attestation by checking if \(x\text{.target.epoch} > \text{min_targets}_v[x\text{.source.epoch}]\). If the target epoch of the new attestation is greater than the min target for its source, that means there exists a previously signed attestation \(a\) with \(a\text{.source.epoch} > x\text{.source.epoch}\) and \(a\text{.target.epoch} < x\text{.target.epoch}\), i.e. \(x\) surrounds \(a\). Moreover, because \(a\) is the attestation with the minimum target amongst attestations that could be surrounded by \(x\), we know that if the check fails, then \(x\) cannot surround any previously signed attestation. This has been a hand-wavy proof of Lemma 1.
Lemma 2:
\[ x \;\text{is surrounded by an existing attestation} \iff x\text{.target.epoch} < \text{max_targets}_v[x\text{.source.epoch}] \]
Similarly, the max targets array allows us to check whether a new attestation \(x\) is surrounded by any existing attestation. If \(x\text{.target.epoch} < \text{max_targets}_v[x\text{.source.epoch}]\) then \(\exists a \in A_v. a\text{.source.epoch} < x\text{.source.epoch} \wedge a\text{.target.epoch} > x\text{.target.epoch}\), i.e. \(a\) surrounds \(x\). By similar reasoning to above we have the other direction of the iff.
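Putting the two lemmas together, the per-attestation check is just a pair of array lookups. A minimal sketch (hypothetical names, plain slices of absolute target epochs, ignoring the wrap-around introduced below):

```rust
// Outcome of checking a new attestation (source, target) against one validator's arrays.
enum Surround {
    SurroundsExisting,    // Lemma 1: slashable
    SurroundedByExisting, // Lemma 2: slashable
    NotSlashable,
}

// `min_targets` and `max_targets` are indexed by source epoch.
fn check_surround(min_targets: &[u64], max_targets: &[u64], source: u64, target: u64) -> Surround {
    if target > min_targets[source as usize] {
        Surround::SurroundsExisting
    } else if target < max_targets[source as usize] {
        Surround::SurroundedByExisting
    } else {
        Surround::NotSlashable
    }
}
```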
So far we've considered arrays of length 54K epochs without considering what happens once the chain progresses past epoch 54K. In this case, we want to wrap around the end of the array so that we are still storing the same amount of history, while overwriting the oldest values.
For an epoch \(e\) within the weak subjectivity period we store the minimum/maximum target for \(e\) at index \(e \bmod N\).
\[ \text{min_targets}_v[e \bmod N] = \textbf{min}(\{a\text{.target.epoch} \mid a \in A_v,\, a\text{.source.epoch} > e\}) \\ \text{max_targets}_v[e \bmod N] = \textbf{max}(\{a\text{.target.epoch} \mid a \in A_v,\, a\text{.source.epoch} < e\}) \]
This means that the arrays now require knowledge of the current epoch, which necessitates some clarifications:
Let \(\top\) (top) be the target that is greater than all others (e.g. `INT_MAX`).
For \(\text{min_targets}_v\):
Initialising the min target for new epochs to \(\top\) reflects that we have no information about attestations with source epochs greater than new epochs, and ensures that any attestation with source equal to the new epoch is unslashable (Lemma 1).
For \(\text{max_targets}_v\):
For max targets, the initial 0 value represents a lack of information, and ensures new attestations are not slashable (Lemma 2). The epoch update is more interesting: if a maximum target is known for the current epoch, then it also applies to the next epoch because the \(a\) with max target has \(a\text{.source.epoch} < \text{current_epoch} < \text{next_epoch}\). However, there could also be attestations with \(a\text{.source.epoch} = \text{current_epoch}\) which should also be taken into account. To deal with this, an epoch of lookahead could be implemented, either by constraining the source of attestations processed: \(\text{current_epoch} - (N - 1) < a\text{.source.epoch} \leq \text{current_epoch} - 1\), or something equivalent. We'll see how this falls out in the implementation.
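As a sketch of that per-epoch bookkeeping (absolute targets in ring buffers indexed by `epoch % N`; the sentinel representation is an assumption):

```rust
const N: u64 = 54_000;    // weak subjectivity period in epochs
const TOP: u64 = u64::MAX; // the ⊤ sentinel

// Prepare the arrays for `next_epoch` (assumed >= 1), overwriting the slot
// previously used by the oldest epoch.
fn advance_to(min_targets: &mut [u64], max_targets: &mut [u64], next_epoch: u64) {
    let next = (next_epoch % N) as usize;
    let current = ((next_epoch - 1) % N) as usize;
    // No attestations with source greater than the new epoch are known yet.
    min_targets[next] = TOP;
    // A max target known for the current epoch also applies to the next epoch,
    // because its attestation has source < current_epoch < next_epoch. Attestations
    // with source == current_epoch are covered by the lookahead constraint above.
    max_targets[next] = max_targets[current];
}
```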
Lemmas 1 & 2 give us rules for checking when a new attestation is slashable, but we also need to update the min-max arrays for the existence of the new attestation. We can do this efficiently using the following algorithms:
Rather than storing the target of attestations in the arrays directly, we can instead equivalently store their distance from epoch \(e\).
\[ \text{min_targets}_v[e \bmod N] = \textbf{min}(\{a\text{.target.epoch} - e \mid a \in A_v,\, a\text{.source.epoch} > e\}) \\ \text{max_targets}_v[e \bmod N] = \textbf{max}(\{a\text{.target.epoch} - e \mid a \in A_v,\, a\text{.source.epoch} < e\}) \]
This has several advantages:
* The elements can be stored as `u16`s: the distance will always fit inside a `u16`.
* If all attestations are targeting the epoch immediately after their source epoch, then the arrays are completely uniform:
\[ \text{min_targets}_v = [2, 2, 2, \ldots] \\ \text{max_targets}_v = [0, 0, 0, \ldots] \\ \]
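A sketch of reading a `min_targets` entry back under this encoding, assuming `u16::MAX` stands in for the \(\top\) sentinel:

```rust
const N: u64 = 54_000; // array length (weak subjectivity period in epochs)

// Decode the minimum target recorded for source epoch `e` from the distance-encoded
// ring buffer. Real distances are at most N - 1 < u16::MAX, so the ⊤ sentinel decodes
// to a target larger than any genuine one and Lemma 1's comparison still works.
fn min_target_for(min_targets: &[u16], e: u64) -> u64 {
    let distance = min_targets[(e % N) as usize];
    e + distance as u64
}
```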
In implementation, we have several options for compressing these arrays, depending on our choice of data store.
It is likely that as well as repetition within the arrays for a single validator, there will also be repetition across the attestation behaviours of validators. It is essential that we take advantage of this repetition to reduce disk usage, as naively the arrays could use up to 64.8 GB, and we don't want to be rewriting entire arrays all the time.
Here's a first draft of a chunking scheme, which comes with some drawbacks:

* Let `C` be the number of array elements per chunk.
* Store a mapping `(validator_index: u64, chunk_idx: u16) => ChunkHash`, with the chunks themselves stored under their hash.

When a new attestation appears:

* Look up the `chunk_hash` for `(validator_index, chunk_idx)`, where `chunk_idx = att.source.epoch / C`.
* Update the chunk, compute its new `chunk_hash` and store it under `(validator_index, chunk_idx)`. This will involve hashing each updated chunk, so we should choose a cheap hash function (maybe BLAKE2b).

In order to implement garbage collection, we could tag each chunk with the epoch at which it was last used. This would use an extra 8 bytes per chunk, and require an extra write to every chunk used in an epoch.
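As an illustration of the lookup/update path, here's an in-memory sketch; the `ChunkHash` and `Chunk` types, and the two maps standing in for on-disk tables, are assumptions, and the hash itself would be computed with the cheap hash function mentioned above:

```rust
use std::collections::HashMap;

type ChunkHash = [u8; 32]; // output of the chosen hash function
type Chunk = Vec<u16>;     // C distance values

struct ChunkStore {
    c: u64,                                // elements per chunk
    index: HashMap<(u64, u16), ChunkHash>, // (validator_index, chunk_idx) => hash
    chunks: HashMap<ChunkHash, Chunk>,     // hash => chunk data
}

impl ChunkStore {
    fn chunk_idx(&self, source_epoch: u64) -> u16 {
        (source_epoch / self.c) as u16
    }

    // Load the chunk covering `source_epoch` for this validator, if any.
    fn load(&self, validator_index: u64, source_epoch: u64) -> Option<&Chunk> {
        let key = (validator_index, self.chunk_idx(source_epoch));
        self.index.get(&key).and_then(|hash| self.chunks.get(hash))
    }

    // Store an updated chunk under its new hash and repoint the index entry.
    fn store(&mut self, validator_index: u64, source_epoch: u64, chunk: Chunk, hash: ChunkHash) {
        let key = (validator_index, self.chunk_idx(source_epoch));
        // Identical chunks hash to the same key, so they are stored only once.
        self.chunks.entry(hash).or_insert(chunk);
        self.index.insert(key, hash);
    }
}
```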
Some properties of this scheme:

* Identical chunks (e.g. `232323...`) can be stored once and referenced from multiple `(validator_id, chunk_idx)` pairs.
* Each epoch, only `2 * V` chunks need to be written to disk. With C=240 (~one day) and V=300k this is only `2 * 2 * 240 * 300_000` bytes = 288 MB.
* A quick benchmark of hashing a `[u16; 240]` shows that it runs in less than 1μs! For arrays of 2048 elements (far beyond what we're likely to want) the hashing time is ~4μs.
* The index of chunk hashes requires `V * N/C * H` bytes, where `V` is num validators and `H` is the size of a hash (probably 32 bytes). That's around 2.16 GB for `C=240`, smaller with larger chunk sizes.
* The chunks themselves use `U` times the full size `2 * 2 * V * N` (64.8 GB for V=300k, U=1.0).

Here's another approach to chunking, due to Proto.
Split each array into "horizontal" runs of length `C`. Then, take `K` runs of length `C` from different validators and store these as a chunk on disk (size `2 * K * C` bytes).
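A sketch of what one such chunk could look like in memory (the value `K = 16` is purely illustrative; `C = 240` is the example chunk size used earlier):

```rust
const K: usize = 16;  // validators per chunk (illustrative)
const C: usize = 240; // epochs per chunk

// One on-disk chunk: K "horizontal" runs of C u16 distances, one run per
// validator, flattened row-major. Total size: 2 * K * C bytes.
struct Chunk2D {
    data: Vec<u16>, // length K * C
}

impl Chunk2D {
    fn get(&self, validator_index: u64, epoch: u64) -> u16 {
        let row = validator_index as usize % K;
        let col = epoch as usize % C;
        self.data[row * C + col]
    }

    fn set(&mut self, validator_index: u64, epoch: u64, distance: u16) {
        let row = validator_index as usize % K;
        let col = epoch as usize % C;
        self.data[row * C + col] = distance;
    }
}
```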
Store a mapping `chunk_id => chunk`, where `chunk_id` is computed from validator index and epoch like so:

```
validator_chunk_number = validator_index / K
chunk_number = att.source.epoch / C
chunk_id = validator_chunk_number * C + chunk_number
```

(not necessary in an SQL DB)

The update protocol for a new `att` is:
* Compute `chunk_id(validator_index, att.source.epoch)` and load the corresponding chunk.
* Update the `validator_index % K` sub-chunk.
* `K` chunks containing all validators get updated… TODO

The real differentiation between 1D and 2D is:
1D provides de-duplication but requires garbage collection and uses more space in the worst-case than 2D.
What's the break-even point for 1D to use less space?
Assuming a single array for simplicity:

1D Disk Usage:

* Index: `H*V*N/C`
* Chunks: `U*2*V*N` (`U` = uniqueness factor)

2D Disk Usage:

* Chunks: `2*V*N`

Solving for equality yields `U = 1 - H/2C` as the break-even point at which both strategies use the same disk space.
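Spelling out the algebra (equating the two totals and dividing through by \(VN\)):

\[ \frac{HVN}{C} + 2UVN = 2VN \;\implies\; \frac{H}{C} + 2U = 2 \;\implies\; U = 1 - \frac{H}{2C} \]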
With the `last_used_epoch` tags for GC:

1D Disk Usage:

* Index: `H*V*N/C`
* Chunks: `U*V*N/C*(2C + 8)`

We get `U = (2C - H)/(2C + 8)`.
Some numbers are here: https://docs.google.com/spreadsheets/d/1UEWfQ_Oh73uarXLqTuKz9JERoRGutfLr-M6bGqX9u-w
It looks like the de-duplication could prove useful, particularly if we allow ourselves 8 or 16 byte BLAKE2b hashes. However, due to the extra complexity of this approach, I'm going to start by implementing 2D chunking.
Although we could rely on the compression provided by a key-value store or relational database, support is patchy between databases, and we can probably do better with a custom-designed format. For completeness, here's a quick summary of compression support across different stores.
LevelDB uses Snappy, but LMDB doesn't support any form of compression.
Postgres has no support for compressed tables, but can compress individual rows, which could work with a one-array-per-row storage schema. MariaDB supports compressed pages, which would work with one-array-per-row or one-element-per-row.
This section is mostly obsoleted by the chunking schemes above.
To compress the long runs of identical distances, it seems run-length encoding (RLE) could be effective. Noting that the min target for the current epoch \(e_C\) will always be \(\top\), the run-length encoding \(E\) of an ideal \(\text{min_targets}\) array would be:
\[ \begin{equation} \begin{aligned} \text{min_targets}_v = & \, [2, \ldots, 2, \top, 2, \ldots] \\ E(\text{min_targets}_v) = & \, [(e_C \bmod N, 2), (1, \top), (N - (e_C \bmod N) - 1, 2)] \end{aligned} \end{equation} \]
We're using tuples of \((\text{count}, \text{value})\) to represent the run-length encoding (a `Vec<(u16, u16)>`).
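A minimal sketch of this encoding and its inverse, assuming runs are capped at `u16::MAX`:

```rust
// Run-length encode a slice of distances into (count, value) pairs.
fn rle_encode(values: &[u16]) -> Vec<(u16, u16)> {
    let mut encoded: Vec<(u16, u16)> = Vec::new();
    for &v in values {
        // Extend the current run if the value repeats and the count won't overflow.
        if let Some(last) = encoded.last_mut() {
            if last.1 == v && last.0 < u16::MAX {
                last.0 += 1;
                continue;
            }
        }
        encoded.push((1, v));
    }
    encoded
}

// Expand the encoding back into the full array.
fn rle_decode(encoded: &[(u16, u16)]) -> Vec<u16> {
    encoded
        .iter()
        .flat_map(|&(count, value)| std::iter::repeat(value).take(count as usize))
        .collect()
}
```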
With this encoding, it doesn't matter what data store we use, we can store a value (or row) for each validator index, and read/write it as it changes, without using much space on disk.
To really optimise for the best-case scenario, we could use a run-length encoded queue which always stores the value for the current epoch at the last index. In the best case this could remove the need to write the queue to disk every epoch, in a flow like the following:
* At epoch `e`, we have `min_targets = [(N - 1, 2), (1, 0xff)]` stored on disk.
* At the epoch transition this becomes `min_targets = [(N - 2, 2), (2, 0xff)]` in memory.
* At epoch `e + 1`, in memory, the new value is `min_targets = [(N - 1, 2), (1, 0xff)]`. This is the same as the value already stored on disk – no need to write it.

In the best case, the min-max target queues could require only four `u16`s each (8 bytes). Assuming the data store requires a `u64` to encode the length of values/rows, this pushes the minimum size to 16 bytes. For 300k validators, we're looking at around `2 * 16 * 300_000` bytes = 9.6 MB.
Keep in mind this is just the disk usage for the min-max arrays, and that we also need to store all the attestations, which will require hundreds of gigabytes.
This is probably also very unrealistic, so let's look at some less uniform attestation patterns. Here are the expected sizes for different run-length encodings with X% of the full number of elements:
Beyond 50% uniqueness the RLE is actually less efficient than storing every value, as 2 arrays * 54k epochs * 2 bytes (`u16`) * 300k validators = 64.8 GB. We could have a flag bit to switch between RLE and direct encoding if 50%+ uniqueness is observed in practice. Hopefully it won't be! And if we can get some compression from the filesystem or the DB, that will help too (particularly if it acts across rows/values).
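A sketch of that flag, reusing `rle_encode` from the sketch above; the 50% threshold falls out of each RLE pair being twice the size of a raw element:

```rust
// Choose between the two encodings based on which is smaller: each RLE pair is
// 4 bytes versus 2 bytes per raw element.
enum TargetArray {
    RunLength(Vec<(u16, u16)>), // (count, value) pairs
    Direct(Vec<u16>),           // one value per epoch
}

fn encode_array(values: &[u16]) -> TargetArray {
    let rle = rle_encode(values); // from the RLE sketch above
    if 2 * rle.len() <= values.len() {
        TargetArray::RunLength(rle)
    } else {
        TargetArray::Direct(values.to_vec())
    }
}
```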
If every validator attests every epoch, we're going to be reading (and often writing) every min-max array on disk… That's a lot of data! In the worst-case that's 64.8 GB per epoch, or 180 MB/s… That's just within the realm of feasibility if we can avoid writing unchanged values and sit below the worst-case most of the time.
TODO