# An explanation of the sharding + DAS proposal

Along with proof of stake, the other central feature in the eth2 design is sharding. This proposal introduces a limited form of sharding, called "data sharding", as per the [rollup-centric roadmap](https://twitter.com/vitalikbuterin/status/1311921668005060608?lang=en): the shards would store data, and attest to the availability of ~250 kB sized blobs of data. This availability verification provides a secure and high-throughput data layer for layer-2 protocols such as rollups.

![](https://i.imgur.com/uwgJM0q.png)

To verify the availability of high volumes of data without requiring any single node to personally download _all_ of the data, two techniques are stacked on top of each other: (i) attestation by randomly sampled committees, and (ii) data availability sampling (DAS).

## ELI5: randomly sampled committees

Suppose you have a large amount of data (think: 16 MB, the average amount that the eth2 chain will actually process per slot, at least initially). You represent this data as 64 "blobs" of 256 kB each. You have a proof of stake system, with ~6400 validators. How do you check all of the data without (i) requiring anyone to download the whole thing, or (ii) opening the door for an attacker who controls only a few validators to sneak an invalid block through?

We can solve the first problem by splitting up the work: validators 1...100 have to download and check the first blob, validators 101...200 have to download and check the second blob, and so on. The validators in each of these subgroups (or "committees") simply make a signature attesting that they have verified the blob, and the network as a whole only accepts the blob if it has seen signatures from the majority of the corresponding committee.

But this leads to a problem: what if the attacker controls some _contiguous_ subset of validators (eg. 1971...2070)? If this were the case, then even though the attacker controls only ~1.5% of the _whole_ validator set, they would dominate a single committee (in this case, they would have ~70% of the committee containing validators 2001...2100), and so they would be able to control the committee and push even invalid/unavailable blobs into the chain.

Random sampling solves this by using a random shuffling algorithm to select the committees. We use some hash as the seed of a random number generator, which we then use to randomly shuffle the list [1..6400]. The first 100 values in the shuffled list are the first committee, the next 100 are the second committee, etc.

![](https://i.imgur.com/mLZab0f.png)

The RNG seed is chosen _after_ validators deposit and each validator's index is fixed, so the attacker has no way to try to "put all their validators" into a single committee. They may get lucky once in a while by random chance, but only if they control a very large fraction (> 1/3) of _all_ validators.

## ELI5: data availability sampling

Data availability sampling is in some ways a mirror image of randomly sampled committees. There is still sampling going on, in that each node only ends up downloading a small portion of the total data, but the sampling is done _client-side_, and _within_ each blob rather than between blobs. Each node (including client nodes that are not participating in staking) checks every blob, but instead of downloading the whole blob, they privately select N random indices in the blob (eg. N = 20), and attempt to download the data at just those positions.
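To make the client-side sampling concrete, here is a minimal simulation sketch in Python (the function and variable names are purely illustrative and not from the spec; the sample counts follow the numbers used elsewhere in this post):

```python
import random

SAMPLES_PER_BLOB = 2048   # samples per extended blob (matches the numbers used later in this post)
N_QUERIES = 20            # random positions each client checks (the "N = 20" above)

def blob_appears_available(published_indices, rng):
    """Client-side check: pick N_QUERIES random positions and see whether every one
    of them can be downloaded (simulated here as membership in `published_indices`)."""
    queries = rng.sample(range(SAMPLES_PER_BLOB), N_QUERIES)
    return all(i in published_indices for i in queries)

# Simulate an attacker who publishes just under half of the samples.
rng = random.Random(0)
published = set(rng.sample(range(SAMPLES_PER_BLOB), SAMPLES_PER_BLOB // 2 - 1))

# Essentially no client is fooled: each query succeeds with probability < 1/2,
# so the chance that all 20 succeed is < (1/2)**20, roughly one in a million.
fooled = sum(blob_appears_available(published, random.Random(seed)) for seed in range(100_000))
print(f"clients fooled: {fooled} / 100000")
```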
![](https://i.imgur.com/nFbGvQd.png)

The goal of this task is to verify that at least half of the data in each blob is available. If less than half of the data is available, then it is almost certain that at least one of the indices that any given client samples will not be available, and so the client will reject the blob. Note that this mechanism is (i) efficient, as a client only needs to download a small portion of each blob to verify its availability, and (ii) highly secure, as even a 51% attacker cannot trick a client into accepting unavailable blobs.

### Erasure coding

To cover the case where an attacker makes available 50-99% of the data (which can easily lead to some clients accepting and other clients rejecting the blob _at first_), we use a technology called **erasure coding**. Erasure coding allows us to encode blobs in such a way that if at least half of the data in a blob is published, _anyone_ in the network can reconstruct and re-publish the rest of the data; once that re-published data propagates, the clients that initially rejected the blob will converge to accept it (note: there is no time limit to accepting a blob; a client accepts a blob as available whenever it has received responses to all of its sample indices).

The simplest mathematical analogy for understanding erasure coding is the idea that "two points are always enough to recover a line": if I construct the "file" consisting of the four points `((1, 4), (2, 7), (3, 10), (4, 13))`, all of which are on one line, then if you have any two of those points, you can reconstruct the line and compute the remaining two points (we assume the x coordinates `1, 2, 3, 4` are a fixed parameter of the system, not the file creator's choice). With higher-degree polynomials, we can extend this idea to create 3-of-6 files, 4-of-8 files, and generally `n`-of-`2n` files for any arbitrary `n`, with the property that if you have *any* `n` of the `2n` points of the file, you can compute all of the points that are missing.

It is also possible that an attacker by default makes _none_ of the blob available, and only selectively publishes chunks in response to sample queries that it sees, but this would only fool a small number of clients before the attacker would need to publish more than half the blob to answer all the queries (we assume clients publicly rebroadcast all responses they receive).

We use _polynomial commitments_ (specifically, [Kate commitments](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html)) instead of Merkle roots as pointers to these data blobs, because polynomial commitments let us easily prove that a given value actually is the correct evaluation of a particular degree `< n` polynomial at the desired coordinate. Otherwise, we would have to either prove (using SNARKs, for example) that a Merkle root encodes a low-degree polynomial (the likely eventual solution once we need something post-quantum, but too expensive for now) or rely on fraud proofs which can be broadcast in case of incorrect encoding (adding high complexity and more synchronicity assumptions).

## Why not use just committees and not DAS?

Using only committees has the following weaknesses:

* Weaker protection in case of an actual 51% attack. In current (non-scalable) blockchains, a 51% attack can only revert transactions or censor; it cannot push through invalid blocks. A committee-based system would remove this guarantee.
Furthermore, it would be very difficult to effectively penalize a 51% attacker, as only a very small portion of their deposits (the deposits that participated in that particular committee) would be provably implicated in the malicious behavior and could conceivably be penalized.
* Requires a specific threshold (what percentage of a committee needs to approve a blob for the chain to accept it?). If that threshold is high, then the sharding functionality shuts down entirely in periods where few validators are online. If the threshold is low (or is some dynamic mechanism like a threshold share only of recently online validators), then an attacker can try to knock many nodes offline to increase their share of the remaining online nodes and push a bad blob through.
* DAS is slightly easier to make quantum-proof than committees (which would require post-quantum aggregate signatures).

## Why not just use DAS and not committees?

Using only DAS has the following weaknesses:

* DAS is a new and untested technology, with key components (eg. [this](https://github.com/khovratovich/Kate/blob/master/Kate_amortized.pdf)) having literally been developed in the last year. So having committees as a backstop seems wise, in case DAS breaks or takes an unexpectedly long time to develop.
* DAS has higher latency than committees.
* DAS has more edge cases, which committees can help plug. One particular example is that in a DAS-only system, it's difficult to avoid the beacon block proposer being one of the first actors to make DAS queries to verify blob availability. This creates a heightened risk that an attacker will publish unavailable blobs and release responses only to the proposer's queries. This will not cause the rest of the network to accept the unavailable blobs, but it may lead to easy attacks with the goal of causing beacon blocks constructed by honest proposers to be rejected and fork away from the canonical chain. Committees can remedy this.
* Committees are more forward-compatible with the future addition of in-shard execution.

## Why is data availability important and why is it hard to solve?

This has been discussed at length in other materials too long to copy here; I recommend:

* [A note on data availability and erasure coding](https://github.com/ethereum/research/wiki/A-note-on-data-availability-and-erasure-coding) (the original introduction to the data availability problem)
* The [paper by Mustafa Al-Bassam, Alberto Sonnino and Vitalik Buterin](https://arxiv.org/abs/1809.09044) expanding on these concepts
* [The Dawn of Hybrid Layer 2 Protocols](https://vitalik.ca/general/2019/08/28/hybrid_layer_2.html), exploring the game theory of data availability
* [Base Layers and Functionality Escape Velocity](https://vitalik.ca/general/2019/12/26/mvb.html), specifically the [section on data scalability](https://vitalik.ca/general/2019/12/26/mvb.html#sufficient-data-scalability-and-low-latency) expanding on the above ideas
* [The Data Availability Problem (Ethereum Silicon Valley Meetup)](https://www.youtube.com/watch?v=OJT_fR7wexw), much of the same content in video form

A key takeaway is that **BitTorrent, IPFS and similar systems do not solve data availability**. While BitTorrent is a good scalable _data publishing_ source, it does not provide a guarantee of _consensus_ about whether or not a piece of data is available, leaving open the possibility of inextricable "edge case" attacks where nodes disagree on when a piece of data was published, preventing hybrid layer 2 protocols from working.
To achieve consensus on data availability, the stronger techniques described in this document are required.

## How does sharding work on the P2P layer?

In order to achieve the scalability gains of sharding, we need a P2P system that avoids the need for every node to download all the data. Fortunately, we have a form of P2P-layer sharding already live in phase 0! Specifically, there are 64 subnets that are being used for [aggregation of attestations](https://notes.ethereum.org/@hww/aggregation). Each validator need only be on (i) the main "**global subnet**" and (ii) their own attestation aggregation subnet; they do not receive any data from the other 63 attestation aggregation subnets.

In committee + DAS sharding, we expand this out to a "grid" structure, with 2048 **horizontal subnets** (one for each (shard, slot) pair in an epoch), and 2048 **vertical subnets** (one for each index within a blob). During each slot, we select a proposer for each shard. Each proposer is entitled to propose a **blob**: a lump of arbitrary data of size up to 512 kB (which we interpret as a collection of "samples" of ~512 bytes each), along with the erasure code extension as well as extra proofs to allow each part of the blob to be independently verified.

### Structure of a blob

![](https://i.imgur.com/j4afAvY.png)

The full "body" of a blob consists of the original data, the extended data, and the proofs (though if desired, for data efficiency the extended data can be left out, as it is relatively fast for each node receiving the blob to reconstruct it). The "header" of a blob is the Kate commitment to the blob, plus a bit of other miscellaneous data (slot, shard, length proof), and a signature from the proposer.

### Blob publication process

When a blob is published, the header is published on the global subnet, and the blob is published on the horizontal subnet corresponding to the slot number and shard ID. For each $i$ in `[0 ... SAMPLES_PER_BLOCK - 1]`, the i'th sample is published on vertical subnet $i$, along with a proof that the i'th sample actually is the evaluation of the polynomial committed to in the header at the i'th x coordinate.

![](https://i.imgur.com/UJrhdSL.png)

In practice, there would be 2048 horizontal subnets, to allow one horizontal subnet per (shard, slot) combo in each epoch. This is done to ensure that each validator can join a single horizontal subnet where they will only receive the blob corresponding to the committee that they were assigned to (in addition to the small number of vertical subnets they join to participate in sampling). Each validator must join the following subnets:

* The **global subnet**
* The **horizontal subnet** corresponding to the (shard, slot) combo (ie. the committee) that they were assigned to
* The **vertical subnets** corresponding to the indices that they are assigned to (each validator computes these for themselves using a private seed)

### Broadcasting blobs

There is a procedure by which a blob proposer can distribute the samples to all subnets without the proposer themselves being a part of those subnets. This procedure is as follows:

1. **Publication**: the proposer publishes the blob, along with a proof for each sample, on the appropriate horizontal subnet.
2. **Direct sample distribution**: the other participants on the horizontal subnet publish the chunks to each vertical subnet that they personally are in.
3. **Indirect sample distribution**: nodes reveal a few (think: ~4) of the vertical subnets they are on to their peers. Hence, each participant in the horizontal subnet can also look at the vertical subnets that their peers are on, and broadcast the appropriate chunks to those peers.

For example, if the chunk size is 512 bytes, and the maximum data blob size (without erasure code expansion) is 512 kB, then with erasure code expansion it's 1 MB, so there would be 2048 vertical subnets. If each node has 15 private vertical subnets and 5 public vertical subnets, and 50 peers, and assuming a worst-case scenario where there are 128 members in each horizontal subnet (just the committee), then the subnet members alone would directly publish to 128 * 20 = 2560 subnets (~1461 after taking into account redundant publishing), and including peers this would increase to 128 * 4 * 50 = 25600 subnets (all 2048 subnets 99.2% of the time after taking into account redundant publishing).

Note that it is theoretically possible for a malicious blob proposer to publish samples to the vertical subnets without publishing the full blob. To cover this case, we add a procedure by which a partially published (meaning >= 50% available but not 100% available) blob can be "self-healed". The self-healing procedure has three basic steps:

1. **Reverse distribute**: a procedure identical to the distribution procedure above is used, except that in this case peers on a vertical subnet propagate samples from that vertical subnet to the *horizontal* subnet corresponding to the blob that the sample is a part of.
2. **Reconstruct**: if there are >= 1024 samples on the horizontal subnet (or generally, half of the sample count), anyone can reconstruct the full blob, and publish the reconstructed blob to the horizontal subnet.
3. **Distribute**: the distribution procedure above is repeated.

## How does the beacon chain work?

In each slot, we randomly select a proposer for each of the 64 shards. The proposer has the right to create a shard blob and broadcast it using the procedure above, as well as broadcasting the `ShardHeader` of the blob to the global subnet. The `ShardHeader` can be included in the beacon chain in that slot or any subsequent slot in the same or next epoch.

The beacon chain keeps track of a list of `PendingShardHeader` objects. A `PendingShardHeader` stores (i) the key information in a `ShardHeader` (the shard and slot, a commitment to the blob, and its length), and (ii) a bitfield keeping track of which validators out of a randomly selected committee (in fact, the same committees as already introduced in phase 0) signed off on the blob.

The `AttestationData` struct is expanded to include a `shard_header_root`, the root hash of the `ShardHeader` that the given attester is voting for. Attesters can also vote for an empty root if they see no valid and available shard blob for the (shard, slot) pair they have been assigned to. If a `ShardHeader` gets the support of 2/3 of the committee, it is **confirmed** immediately. If a `ShardHeader` has the support of a larger share of the committee than any other `ShardHeader` (including the default empty one) at the end of the next epoch, then it is confirmed at the end of that epoch.

### Fork choice rule

The fork choice rule is changed so that a block is only eligible for consideration if all blobs confirmed in that block or its ancestors have passed an availability check.
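As a minimal sketch of this availability-gated filtering (toy stand-in types only; the actual spec containers and fork choice are more involved):

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Toy stand-in types purely for illustration; not the actual spec containers.
@dataclass
class Blob:
    available: bool                     # result of the data availability check

@dataclass
class Block:
    parent: Optional["Block"]
    confirmed_blobs: List[Blob] = field(default_factory=list)

def is_eligible_for_fork_choice(block: Optional[Block]) -> bool:
    """A block enters the fork choice only if every blob confirmed in it or in any
    of its ancestors has passed the availability check."""
    while block is not None:
        if not all(blob.available for blob in block.confirmed_blobs):
            return False
        block = block.parent
    return True

# A chain that confirms even one unavailable blob poisons all of its descendants:
genesis = Block(parent=None)
bad = Block(parent=genesis, confirmed_blobs=[Blob(available=False)])
child = Block(parent=bad, confirmed_blobs=[Blob(available=True)])
print(is_eligible_for_fork_choice(child))   # False
```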
This is called **tight coupling**: if a chain points to (meaning, has confirmed) even a single invalid blob, the whole chain is considered invalid. This is a key difference from "sidechain" architectures, where it's possible for a sidechain to fail while the central chain remains valid. See [here](https://vitalik.ca/general/2019/06/12/plasma_vs_sharding.html) for a further exploration of tight coupling and why it's valuable.

### The low-validator-count case

If there are fewer than 262144 validators (32 slots * 64 shards * 128 minimum committee size), then instead of selecting a proposer for _all_ shards, we select a proposer for a limited subset of shards, cycling through the shards (eg. if there were 32 * 128 * 50 validators, and at slot N the start shard was 0, then slot N would assign a proposer for shards 0...49, slot N+1 for shards 50...63 and 0...35, slot N+2 for shards 36...63 and 0...21, etc). This is done to ensure that committee sizes remain sufficient even under conditions of low participation.

### Gas prices for shard data

An EIP-1559-like mechanism is added by which shard data is charged for per byte, and this price is adjusted: if blocks are on average more than 50% full it is increased, and if blocks are on average less than 50% full it is decreased. Hence, an average block fullness of 50% is targeted.

## Security assumptions

The data-blob-only sharding design is powerful because it is surprisingly light on security assumptions compared to other sharding schemes. In particular, it avoids both honest majority assumptions (as DAS can detect unavailable blobs pushed by a majority) and timing assumptions (as unlike [earlier DAS schemes](https://arxiv.org/abs/1809.09044), this scheme uses Kate commitments instead of fraud proofs, and so does not rely on any assumption that fraud proofs will be broadcast quickly enough). A malicious 51% coalition can censor blobs, but 51% censorship is possible in non-sharded chains as well.

The main new assumption that is added is the "honest minority DAS assumption": that there are enough nodes sampling that an attacker cannot satisfy them all without publishing more than half the blob. If there are 2048 samples in a blob, with 1024 needed to recover it (in practice ~1419 = 2048 * ln(2) queries must be answered before 1024 distinct samples are available, since some clients will sample the same points), and each client takes 20 samples, then the system is secure if more than ~70 clients are sampling each shard.

## Forward compatibility

The data-blob-only sharding design is forward-compatible with many approaches for adding execution in shards if this is later desired. In particular, the scheme could be modified to require blobs to include a pre-state and post-state root, and we could use either fraud proofs or ZK-SNARKs to verify that the state transition in a blob is correct. Note that in either option, ensuring correctness of execution in shards would NOT depend on any honest majority assumption.

## The pull request

Enjoy!

https://github.com/ethereum/eth2.0-specs/pull/2146

## Other resources

* [Protolambda's work-in-progress implementation](https://github.com/protolambda/eth2-das)