Along with proof of stake, the other central feature in the eth2 design is sharding. This proposal introduces a limited form of sharding, called "data sharding", as per the rollup-centric roadmap: the shards would store data, and attest to the availability of ~250 kB sized blobs of data. This availability verification provides a secure and high-throughput data layer for layer-2 protocols such as rollups.
To verify the availability of high volumes of data without requiring any single node to personally download all of the data, two techniques are stacked on top of each other: (i) attestation by randomly sampled committees, and (ii) data availability sampling (DAS).
Suppose you have a large amount of data (think: 16 MB, the average amount that the eth2 chain will actually process per slot, at least initially). You represent this data as 64 "blobs" of 256 kB each. You have a proof of stake system with ~6400 validators. How do you check all of the data without (i) requiring anyone to download the whole thing, or (ii) opening the door for an attacker who controls only a few validators to sneak an invalid block through?
We can solve the first problem by splitting up the work: validators 1…100 have to download and check the first blob, validators 101…200 have to download and check the second blob, and so on. The validators in each of these subgroups (or "committees") simply make a signature attesting that they have verified the blob, and the network as a whole only accepts the blob if they have seen signatures from the majority of the corresponding committee. But this leads to a problem: what if the attacker controls some contiguous subset of validators (eg. 1971…2070)? If this were the case, then even though the attacker controls only ~1.5% of the whole validator set, they would dominate a single committee (in this case, they would have ~70% of committee 20, containing validators 2001…2100), and so they would be able to control the committee and push even invalid/unavailable blobs into the chain.
Random sampling solves this by using a random shuffling algorithm to select the committees. We use some hash as the seed of a random number generator, which we then use to randomly shuffle the list [1..6400]. The first 100 values in the shuffled list are the first committee, the next 100 are the second committee, etc.
The RNG seed is chosen after validators deposit and each validator's index is fixed, so the attacker has no way to try to "put all their validators" into a single committee. They may get lucky once in a while by random chance, but only if they control a very large fraction (> 1/3) of all validators.
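As a concrete illustration, here is a minimal Python sketch of seeded shuffling into committees. The constants and the use of Python's `random` module are stand-ins (the beacon chain uses its own shuffling algorithm), but the security argument is the same: the seed is fixed only after validator indices are fixed, so an attacker cannot aim their validators at one committee.

```python
import random

VALIDATOR_COUNT = 6400
COMMITTEE_SIZE = 100

def get_committees(seed: bytes) -> list:
    """Shuffle all validator indices with a seed and split them into committees.

    Illustrative only: the real protocol uses its own shuffle, not Python's RNG.
    """
    indices = list(range(VALIDATOR_COUNT))
    random.Random(seed).shuffle(indices)
    return [indices[i:i + COMMITTEE_SIZE]
            for i in range(0, VALIDATOR_COUNT, COMMITTEE_SIZE)]

committees = get_committees(b"seed derived from a recent hash")
assert len(committees) == 64 and len(committees[0]) == COMMITTEE_SIZE
```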
Data availability sampling is in some ways a mirror image of randomly sampled committees. There is still sampling going on, in that each node only ends up downloading a small portion of the total data, but the sampling is done client-side, and within each blob rather than between blobs. Each node (including client nodes that are not participating in staking) checks every blob, but instead of downloading the whole blob, they privately select N random indices in the blob (eg. N = 20), and attempt to download the data at just those positions.
The goal of this task is to verify that at least half of the data in each blob is available. If less than half of the data is available, then it is almost certain that at least one of the indices that any given client samples will not be available, and so the client will reject the blob. Note that this mechanism is (i) efficient, as a client only needs to download a small portion of each blob to verify its availability, and also (ii) highly secure, as even a 51% attacker cannot trick a client into accepting unavailable blobs.
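To put a number on "almost certain": if at most half of a blob is available and a client samples 20 positions, the chance that every one of its samples happens to land in the available half is at most roughly one in a million:

```python
N_SAMPLES = 20

# If at most half of the blob's samples are available, each sampled index has
# at most a 1/2 chance of being answerable, so the chance that a client gets
# answers to all of its samples is at most:
p_fooled = 0.5 ** N_SAMPLES
print(p_fooled)  # ~9.5e-07, i.e. less than one in a million
```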
To cover the case that an attacker makes available 50-99% of the data (which can easily lead to some clients accepting and other clients rejecting the blob at first), we use a technology called erasure coding. Erasure coding allows us to encode blobs in such a way that if at least half of the data in a blob is published, anyone in the network can reconstruct and re-publish the rest of the data; once that re-published data propagates, the clients that initially rejected the blob will converge to accept it (note: there is no time limit to accepting a blob; a client accepts a blob as available whenever it has received responses to all of its sample indices).
The simplest mathematical analogy to understand erasure coding is the idea that "two points are always enough to recover a line": if I construct the "file" consisting of four points ((1, 4), (2, 7), (3, 10), (4, 13)), all of which are on one line, then if you have any two of those points, you can reconstruct the line, and compute the remaining two points (we assume the x coordinates 1, 2, 3, 4 are a fixed parameter of the system, not the file creator's choice). With higher-degree polynomials, we can extend this idea to create 3-of-6 files, 4-of-8 files, and generally n-of-2n files for any arbitrary n, with the property that if you have any n points of the file, you can compute the remaining n out of the 2n which are missing.
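Here is a toy sketch of that recovery for the 2-of-4 line example above, using exact Lagrange interpolation over the rationals. Real erasure coding works the same way but over a finite field (Reed-Solomon style), which this sketch glosses over.

```python
from fractions import Fraction

def lagrange_interpolate(points, x):
    """Evaluate at x the unique lowest-degree polynomial through `points`."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

# The "file" is the four points on the line y = 3x + 1; any two of them suffice.
known = [(2, 7), (4, 13)]                        # only half the file survived
recovered = {x: lagrange_interpolate(known, x) for x in (1, 3)}
assert recovered == {1: 4, 3: 10}                # the missing points are recomputed
```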
It is also possible that an attacker publishes none of the blob by default, and only selectively publishes chunks in response to the sample queries that it sees. But this would only fool a small number of clients before the attacker would need to publish more than half of the blob to answer all the queries (we assume clients publicly rebroadcast all responses they receive).
We use polynomial commitments (specifically, Kate commitments) instead of Merkle roots as pointers to these data blobs, because polynomial commitments let us easily prove that a given value actually is the correct evaluation of a particular degree-n polynomial at the desired coordinate. Otherwise, we would have to either prove (using SNARKs, for example) that a Merkle root encodes a low-degree polynomial (the likely eventual solution once we need something post-quantum, but too expensive for now), or rely on fraud proofs which can be broadcast in case of incorrect encoding (adding high complexity and more synchronicity assumptions).
Using only committees has the following weaknesses:
Using only DAS has the following weaknesses:
This has been discussed at length in other materials too long to copy here; I recommend:
A key takeaway is that BitTorrent, IPFS and similar systems do not solve data availability. While BitTorrent is a good scalable data publishing source, it does not provide a guarantee of consensus about whether or not a piece of data is available, leaving open the possibility of inextricable "edge case" attacks where nodes disagree on when a piece of data was published, preventing hybrid layer 2 protocols from working. To achieve consensus on data availability, the stronger techniques described in this document are required.
In order to achieve the scalability gains of sharding, we need a P2P system that avoids the need for every node to download all the data. Fortunately, we have a form of P2P-layer sharding already live in phase 0! Specifically, there are 64 subnets that are being used for aggregation of attestations. Each validator need only be on (i) the main "global subnet" and (ii) their own attestation aggregation subnet; they do not receive any data from the other 63 attestation aggregation subnets.
In committee + DAS sharding, we expand this out to a "grid" structure, with 2048 horizontal subnets (one for each (shard, slot) pair in an epoch), and 2048 vertical subnets (one for each index within a blob).
During each slot, we select a proposer for each shard. Each proposer is entitled to propose a blob: a lump of arbitrary data of size up to 512 kB (which we interpret as a collection of "samples" of ~512 bytes each), along with the erasure code extension as well as extra proofs to allow each part of the blob to be independently verified.
The full "body" of a blob consists of the original data, the extended data, and the proofs (though if desired, for data efficiency the extended data can be left out as it is relatively fast for each node receiving the blob to reconstruct it).
The "header" of a blob is the the Kate commitment to the blob, plus a bit of other miscallaneous data (slot, shard, length proof), and a signature from the proposer.
When a blob is published, the header is published on the global subnet, and the blob is published on the horizontal subnet corresponding to the slot number and shard ID. For each i in `[0 ... SAMPLES_PER_BLOCK - 1]`, the i'th sample is published on vertical subnet i.
In practice, there would be 2048 horizontal subnets, to allow one horizontal subnet per (shard, slot) combo in each epoch. This is done to ensure that each validator can join a single horizontal subnet where they will only receive the blob corresponding to the committee that they were assigned to (in addition to the small number of vertical subnets they join to participate in sampling).
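A minimal sketch of how a (shard, slot) pair and a sample index could map onto this grid, assuming 64 shards, 32 slots per epoch, and 2048 samples per extended blob; the exact numbering convention below is illustrative, not the spec's.

```python
SHARD_COUNT = 64
SLOTS_PER_EPOCH = 32
SAMPLES_PER_BLOB = 2048   # after erasure-code extension

def horizontal_subnet_id(shard: int, slot: int) -> int:
    # One horizontal subnet per (shard, slot-within-epoch) pair: 64 * 32 = 2048 of them.
    return (slot % SLOTS_PER_EPOCH) * SHARD_COUNT + shard

def vertical_subnet_id(sample_index: int) -> int:
    # One vertical subnet per sample index within the extended blob.
    return sample_index % SAMPLES_PER_BLOB
```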
Each validator must join the following subnets:
There is a procedure by which a blob proposer can distribute the samples to all subnets without the proposer being part of the subnets themselves. This procedure is as follows:
For example, if the chunk size is 512 bytes, and the maximum data blob size (without erasure code expansion) is 512 kB, then with erasure code expansion it's 1 MB, so there would be 2048 vertical subnets. If each node has 15 private vertical subnets and 5 public vertical subnets, and 50 peers, and assuming a worst-case scenario where there are 128 members in each horizontal subnet (just the committee), then the subnet members alone would directly publish to 128 * 20 = 2560 subnets (~1461 after taking into account redundant publishing), and including peers this would increase to 128 * 4 * 50 = 25600 subnets (all 2048 subnets 99.2% of the time after taking into account redundant publishing).
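These figures can be sanity-checked with a short calculation that treats each publish as hitting a uniformly random vertical subnet (and, for the coverage probability, approximates coverage of different subnets as independent):

```python
SUBNETS = 2048

def expected_distinct(publishes: int) -> float:
    # Expected number of distinct subnets hit by `publishes` uniformly random publishes.
    return SUBNETS * (1 - (1 - 1 / SUBNETS) ** publishes)

def prob_all_covered(publishes: int) -> float:
    # Approximate probability that every one of the 2048 subnets is hit at least once.
    return (1 - (1 - 1 / SUBNETS) ** publishes) ** SUBNETS

print(expected_distinct(128 * 20))      # ~1461 subnets from committee members alone
print(prob_all_covered(128 * 4 * 50))   # ~0.992: all 2048 subnets covered via peers
```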
Note that it is theoretically possible for a malicious block proposer to publish samples to the vertical subnets without publishing the full block. To cover this case, we add a procedure by which a partially published (meaning >= 50% available but not 100% available) block can be "self-healed". The self-healing procedure has three basic steps:
In each slot, we randomly select a proposer for each of the 64 shards. The proposer has the right to create a shard blob, and broadcast it using the procedure above, as well as broadcasting the `ShardHeader` of the blob to the global subnet. The `ShardHeader` can be included in the beacon chain in that slot or any subsequent slot in the same or next epoch.
The beacon chain keeps track of a list of `PendingShardHeader` objects. A `PendingShardHeader` stores (i) the key information in a `ShardHeader` (the shard and slot, a commitment to the blob, and its length), and (ii) a bitfield keeping track of which validators out of a randomly selected committee (in fact, the same committees as already introduced in phase 0) signed off on the blob. The `AttestationData` struct is expanded to include a `shard_header_root`, a root hash of the `ShardHeader` that the given attester is voting for. Attesters can also vote for an empty root if they see no valid and available shard blob for the (shard, slot) pair they have been assigned to.
If a `ShardHeader` gets the support of 2/3 of the committee, it is confirmed immediately. If a `ShardHeader` has the support of a larger share of the committee than any other `ShardHeader` (including the default empty one) at the end of the next epoch, then it is confirmed at the end of that epoch.
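A hypothetical Python sketch of this bookkeeping and confirmation rule; the field names and structures here are illustrative rather than the spec's exact definitions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PendingShardHeader:
    slot: int
    shard: int
    commitment: bytes          # Kate commitment to the blob (placeholder type)
    root: bytes                # hash root of the ShardHeader
    votes: List[bool]          # one bit per member of the assigned committee
    confirmed: bool = False

def try_confirm(candidates: List[PendingShardHeader], end_of_next_epoch: bool) -> None:
    """Confirm a header with 2/3 committee support immediately; otherwise, at the
    end of the next epoch, confirm the most-supported candidate (which may be
    the default empty one)."""
    for header in candidates:
        if sum(header.votes) * 3 >= len(header.votes) * 2:
            header.confirmed = True
            return
    if end_of_next_epoch:
        best = max(candidates, key=lambda h: sum(h.votes))
        best.confirmed = True
```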
The fork choice rule is changed so that a block is only eligible for consideration if all blobs confirmed in that block or its ancestors have passed an availability check. This is called tight coupling: if a chain points to (meaning, has confirmed) even a single invalid blob, the whole chain is considered invalid. This is a key difference from "sidechain" architectures, where it's possible for a sidechain to fail while the central chain remains valid.
See here for a further exploration of tight coupling and why it's valuable.
If there are fewer than 262144 validators (32 slots * 64 shards * 128 minimum committee size), then instead of selecting a proposer for every shard, we select proposers for a limited subset of shards, cycling through the shards (eg. if there were 32 * 128 * 50 validators, and at slot N the start shard was 0, then slot N would assign a proposer for shards 0…49, slot N+1 for shards 50…63 and 0…35, slot N+2 for shards 36…63 and 0…21, etc). This is done to ensure that committee sizes remain sufficient even under conditions of low participation.
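A sketch of the cycling logic using the parameters from the example above; the function name and structure are illustrative.

```python
SHARD_COUNT = 64
SLOTS_PER_EPOCH = 32
MIN_COMMITTEE_SIZE = 128

def shards_with_proposers(active_validators: int, start_shard: int) -> list:
    """Return the shards that get a proposer in this slot, cycling from start_shard."""
    # How many shards can we cover while keeping every committee at full size?
    committees_per_slot = min(
        SHARD_COUNT,
        active_validators // (SLOTS_PER_EPOCH * MIN_COMMITTEE_SIZE),
    )
    return [(start_shard + i) % SHARD_COUNT for i in range(committees_per_slot)]

# With 32 * 128 * 50 validators: slot N covers shards 0..49,
# slot N+1 covers shards 50..63 and 0..35, and so on.
assert shards_with_proposers(32 * 128 * 50, 0) == list(range(50))
assert shards_with_proposers(32 * 128 * 50, 50) == list(range(50, 64)) + list(range(36))
```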
An EIP-1559-like mechanism is added by which shard data is charged for per byte, with the price adjusted automatically: if blobs are on average more than 50% full, the price is increased, and if they are on average less than 50% full, it is decreased. Hence, an average fill rate of 50% is targeted.
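A simplified sketch of the adjustment direction; the actual update formula and constants are not specified here, so the multiplicative step below is purely illustrative.

```python
TARGET_FILL = 0.5          # target: blobs on average 50% full
ADJUSTMENT_QUOTIENT = 8    # illustrative smoothing constant

def update_data_price(price_per_byte: float, fill_fraction: float) -> float:
    """Nudge the per-byte price up when blobs exceed the 50% target, down when under it."""
    delta = (fill_fraction - TARGET_FILL) / TARGET_FILL   # in [-1, 1]
    return price_per_byte * (1 + delta / ADJUSTMENT_QUOTIENT)
```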
The data-blob-only sharding design is powerful because it is surprisingly light on security assumptions compared to other sharding schemes. In particular, it avoids both honest majority assumptions (as DAS can detect unavailable blobs pushed by a majority) and timing assumptions (as unlike earlier DAS schemes, this scheme uses Kate commitments instead of fraud proofs, and so does not rely on any assumption that fraud proofs will be broadcast quickly enough).
A malicious 51% coalition can censor blobs, but 51% censorship is possible in non-sharded chains as well.
The main new assumption that is added is the "honest minority DAS assumption": that there are enough nodes sampling that an attacker cannot satisfy them without publishing more than half the block. If there are 2048 samples in a blob, with 1024 needed to recover (2048 * ln(2) ~= 1419 taking into account that some clients will sample the same points), and each client takes 20 samples, then the system is secure if more than ~70 clients are sampling each shard.
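The ~70-client figure can be reproduced directly:

```python
from math import log

SAMPLES = 2048            # samples per extended blob
NEEDED = 1024             # enough to reconstruct the blob
SAMPLES_PER_CLIENT = 20

# Clients sample with replacement, so to force an attacker who answers queries
# to reveal 1024 *distinct* samples, roughly 2048 * ln(2) total queries are needed.
queries_needed = SAMPLES * log(SAMPLES / (SAMPLES - NEEDED))
print(queries_needed)                          # ~1420 total queries
print(queries_needed / SAMPLES_PER_CLIENT)     # ~71 sampling clients per shard
```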
The data-blob-only sharding design is forward-compatible with many approaches for adding execution in shards if this is later desired. In particular, the scheme could be modified to require blobs to include a pre-state and post-state root, and we could use either fraud proofs or ZK-SNARKs to verify that the state transition in a blob is correct. Note that in either option, ensuring correctness of execution in shards would NOT depend on any honest majority assumption.
Enjoy!
https://github.com/ethereum/eth2.0-specs/pull/2146