# An explanation of the sharding + DAS proposal

Along with proof of stake, the other central feature in the eth2 design is sharding. This proposal introduces a limited form of sharding, called "data sharding", as per the [rollup-centric roadmap](https://twitter.com/vitalikbuterin/status/1311921668005060608?lang=en): the shards would store data, and attest to the availability of ~250 kB sized blobs of data. This availability verification provides a secure and high-throughput data layer for layer-2 protocols such as rollups.

![](https://i.imgur.com/uwgJM0q.png)

To verify the availability of high volumes of data without requiring any single node to personally download _all_ of the data, two techniques are stacked on top of each other: (i) attestation by randomly sampled committees, and (ii) data availability sampling (DAS).

## ELI5: randomly sampled committees

Suppose you have a large amount of data (think: 16 MB, the average amount that the eth2 chain will actually process per slot, at least initially). You represent this data as 64 "blobs" of 256 kB each. You have a proof of stake system with ~6400 validators. How do you check all of the data without (i) requiring anyone to download the whole thing, or (ii) opening the door for an attacker who controls only a few validators to sneak an invalid block through?

We can solve the first problem by splitting up the work: validators 1...100 have to download and check the first blob, validators 101...200 have to download and check the second blob, and so on. The validators in each of these subgroups (or "committees") simply make a signature attesting that they have verified the blob, and the network as a whole only accepts the blob if it has seen signatures from the majority of the corresponding committee.

But this leads to a problem: what if the attacker controls some _contiguous_ subset of validators (eg. 1971...2070)? In that case, even though the attacker controls only ~1.5% of the _whole_ validator set, they would dominate a single committee (here, they would have ~70% of committee 20, which contains validators 2001...2100), and so they would be able to control the committee and push even invalid/unavailable blobs into the chain.

Random sampling solves this by using a random shuffling algorithm to select the committees. We use some hash as the seed of a random number generator, which we then use to randomly shuffle the list [1..6400]. The first 100 values in the shuffled list are the first committee, the next 100 are the second committee, etc.

![](https://i.imgur.com/mLZab0f.png)

The RNG seed is chosen _after_ validators deposit and each validator's index is fixed, so the attacker has no way to deliberately "put all their validators" into a single committee. They may get lucky once in a while by random chance, but only if they control a very large fraction (> 1/3) of _all_ validators.
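To make the shuffle-then-slice step concrete, here is a minimal Python sketch. It is purely illustrative: the beacon chain's actual shuffling is a "swap-or-not" construction so that each index can be computed independently, but the security argument (the attacker cannot choose where their validators land) is the same.

```python
import hashlib
import random

def select_committees(num_validators, committee_size, seed):
    """Shuffle validator indices with a seeded RNG, then slice into committees."""
    rng = random.Random(hashlib.sha256(seed).digest())
    indices = list(range(num_validators))
    rng.shuffle(indices)
    return [indices[i:i + committee_size]
            for i in range(0, num_validators, committee_size)]

# 6400 validators split into 64 committees of 100, matching the example above.
committees = select_committees(6400, 100, seed=b"hash used as RNG seed")
assert len(committees) == 64 and all(len(c) == 100 for c in committees)
```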
## ELI5: data availability sampling

Data availability sampling is in some ways a mirror image of randomly sampled committees. There is still sampling going on, in that each node only ends up downloading a small portion of the total data, but the sampling is done _client-side_, and _within_ each blob rather than between blobs.

Each node (including client nodes that are not participating in staking) checks every blob, but instead of downloading the whole blob, it privately selects N random indices in the blob (eg. N = 20), and attempts to download the data at just those positions.

The goal is to verify that at least half of the data in each blob is available. If less than half of the data is available, it is almost certain that at least one of any given client's sample indices will be unavailable, leading the client to reject the blob. Note that this mechanism is (i) efficient, as a client only needs to download a small portion of each blob to verify its availability, and (ii) highly secure, as even a 51% attacker cannot trick a client into accepting unavailable blobs.
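To put a rough number on "almost certain": if at most half of a blob's data was actually published, each sample query is answered with probability at most about 1/2, so the chance that a 20-sample client sees all of its queries answered is at most roughly 2^-20. A quick back-of-the-envelope check:

```python
# Chance that a blob with at most half of its data available still answers
# every one of a single client's sample queries, i.e. escapes detection.
N_SAMPLES = 20        # the "N = 20" samples per client from the text
p_answered = 0.5      # at most ~1/2 per query when less than half the data is out

p_escape = p_answered ** N_SAMPLES
print(f"Chance one client misses an unavailable blob: {p_escape:.1e}")  # ~9.5e-07
```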
### Erasure coding

To cover the case where an attacker makes available 50-99% of the data (which can easily lead to some clients accepting and other clients rejecting the blob _at first_), we use a technology called **erasure coding**. Erasure coding allows us to encode blobs in such a way that if at least half of the data in a blob is published, _anyone_ in the network can reconstruct and re-publish the rest of the data; once that re-published data propagates, the clients that initially rejected the blob will converge to accept it (note: there is no time limit on accepting a blob; a client accepts a blob as available whenever it has received responses to all of its sample indices).

The simplest mathematical analogy for understanding erasure coding is the idea that "two points are always enough to recover a line": if I construct the "file" consisting of the four points `((1, 4), (2, 7), (3, 10), (4, 13))`, all of which are on one line, then if you have any two of those points, you can reconstruct the line and compute the remaining two points (we assume the x coordinates `1, 2, 3, 4` are a fixed parameter of the system, not the file creator's choice). With higher-degree polynomials, we can extend this idea to create 3-of-6 files, 4-of-8 files, and generally `n`-of-`2n` files for any arbitrary `n`, with the property that if you have *any* `n` points of the file, you can compute the remaining `n`.
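Here is a small, purely illustrative sketch of that recovery step using Lagrange interpolation over the rationals. Real erasure coding (including the scheme here) works over a finite field with FFT-based algorithms, but the principle is identical: any `n` points determine the degree < `n` polynomial, and hence all `2n` points.

```python
from fractions import Fraction

def lagrange_eval(points, x):
    """Evaluate, at x, the unique lowest-degree polynomial through `points`."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

# The 2-of-4 "file" from the text: four points, all on one line.
full_file = [(1, 4), (2, 7), (3, 10), (4, 13)]

# Suppose only the points at x=2 and x=4 were published...
survivors = [full_file[1], full_file[3]]
# ...any two points pin down the line, so the missing points can be recomputed.
recovered = [(x, lagrange_eval(survivors, x)) for x in (1, 3)]
assert recovered == [(1, 4), (3, 10)]
```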
It is also possible that an attacker makes _none_ of the block available by default, and only selectively publishes chunks in response to sample queries that it sees; but this would only fool a small number of clients before the attacker would need to publish more than half the block to answer all the queries (we assume clients publicly rebroadcast all responses they receive).

We use _polynomial commitments_ (specifically, [Kate commitments](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html)) instead of Merkle roots as pointers to these data blobs, because polynomial commitments let us easily prove that a given value actually is the correct evaluation of a particular degree `n` polynomial at the desired coordinate. Otherwise, we would have to either prove (using SNARKs, for example) that a Merkle root encodes a low degree polynomial (the likely eventual solution once we need something post-quantum, but too expensive for now) or rely on fraud proofs which can be broadcast in case of incorrect encoding (adding high complexity and more synchronicity assumptions).

## Why not use just committees and not DAS?

Using only committees has the following weaknesses:

* Weaker protection in case of an actual 51% attack. In current (non-scalable) blockchains, a 51% attack can only revert transactions or censor; it cannot push through invalid blocks. A committee-based system would remove this guarantee. Furthermore, it would be very difficult to effectively penalize a 51% attacker, as only a very small portion of their deposits (the deposits that participated in that particular committee) would be provably implicated in the malicious behavior and could conceivably be penalized.
* Requires a specific threshold (what percentage of a committee needs to approve a blob for the chain to accept it?). If that threshold is high, then the sharding functionality shuts down entirely in periods where few validators are online. If the threshold is low (or is some dynamic mechanism, like a threshold share only of recently online validators), then an attacker can try to knock many nodes offline to increase their share of the remaining online nodes and push a bad blob through.
* DAS is slightly easier to make quantum-proof than committees (which would require post-quantum aggregate signatures).

## Why not just use DAS and not committees?

Using only DAS has the following weaknesses:

* DAS is a new and untested technology, with key components (eg. [this](https://github.com/khovratovich/Kate/blob/master/Kate_amortized.pdf)) having literally been developed in the last year. So having committees as a backstop seems wise, in case DAS breaks or takes an unexpectedly long time to develop.
* DAS has higher latency than committees.
* DAS has more edge cases, which committees can help plug. One particular example is that in a DAS-only system, it's difficult to avoid the beacon block proposer being one of the first actors to make DAS queries to verify blob availability. This creates a heightened risk that an attacker will publish unavailable blobs and release responses only to the proposer's queries. This will not cause the rest of the network to accept the unavailable blobs, but it may enable easy attacks aimed at causing beacon blocks constructed by honest proposers to be rejected and fork away from the canonical chain. Committees can remedy this.
* Committees are more forward-compatible with the future addition of in-shard execution.

## Why is data availability important and why is it hard to solve?

This has been discussed at length in other materials too long to copy here; I recommend:

* [A note on data availability and erasure coding](https://github.com/ethereum/research/wiki/A-note-on-data-availability-and-erasure-coding) (the original introduction to the data availability problem)
* The [paper by Alberto Sonnino, Mustafa Al-Bassam and Vitalik Buterin](https://arxiv.org/abs/1809.09044) expanding on these concepts
* [The Dawn of Hybrid Layer 2 Protocols](https://vitalik.ca/general/2019/08/28/hybrid_layer_2.html), exploring the game theory of data availability
* [Base Layers and Functionality Escape Velocity](https://vitalik.ca/general/2019/12/26/mvb.html), specifically the [section on data scalability](https://vitalik.ca/general/2019/12/26/mvb.html#sufficient-data-scalability-and-low-latency) expanding on the above ideas
* [The Data Availability Problem (Ethereum Silicon Valley Meetup)](https://www.youtube.com/watch?v=OJT_fR7wexw), much of the same content in video form

A key takeaway is that **BitTorrent, IPFS and similar systems do not solve data availability**. While BitTorrent is a good scalable _data publishing_ source, it does not provide a guarantee of _consensus_ about whether or not a piece of data is available, leaving open the possibility of intractable "edge case" attacks where nodes disagree on when a piece of data was published, preventing hybrid layer 2 protocols from working. To achieve consensus on data availability, the stronger techniques described in this document are required.

## How does sharding work on the P2P layer?

In order to achieve the scalability gains of sharding, we need a P2P system that avoids the need for every node to download all the data. Fortunately, we have a form of P2P-layer sharding already live in phase 0! Specifically, there are 64 subnets that are being used for [aggregation of attestations](https://notes.ethereum.org/@hww/aggregation). Each validator need only be on (i) the main "**global subnet**" and (ii) their own attestation aggregation subnet; they do not receive any data from the other 63 attestation aggregation subnets.

In committee + DAS sharding, we expand this out to a "grid" structure, with 2048 **horizontal subnets** (one for each (shard, slot) pair in an epoch), and 2048 **vertical subnets** (one for each index within a blob). During each slot, we select a proposer for each shard. Each proposer is entitled to propose a **blob**: a lump of arbitrary data of size up to 512 kB (which we interpret as a collection of "samples" of ~512 bytes each), along with the erasure code extension as well as extra proofs to allow each part of the blob to be independently verified.

### Structure of a blob

![](https://i.imgur.com/j4afAvY.png)

The full "body" of a blob consists of the original data, the extended data, and the proofs (though if desired, for data efficiency the extended data can be left out, as it is relatively fast for each node receiving the blob to reconstruct it). The "header" of a blob is the Kate commitment to the blob, plus a bit of other miscellaneous data (slot, shard, length proof), and a signature from the proposer.

### Blob publication process

When a blob is published, the header is published on the global subnet, and the blob is published on the horizontal subnet corresponding to the slot number and shard ID. For each $i$ in `[0 ... SAMPLES_PER_BLOCK - 1]`, the i'th sample is published on vertical subnet $i$, along with a proof that the i'th sample actually is the evaluation of the polynomial committed to in the header at the i'th x coordinate.

![](https://i.imgur.com/UJrhdSL.png)

In practice, there would be 2048 horizontal subnets, to allow one horizontal subnet per (shard, slot) combo in each epoch. This is done to ensure that each validator can join a single horizontal subnet where they will only receive the blob corresponding to the committee that they were assigned to (in addition to the small number of vertical subnets they join to participate in sampling).

Each validator must join the following subnets (see the sketch after this list for how these assignments might be computed):

* The **global subnet**
* The **horizontal subnet** corresponding to the (shard, slot) combo (ie. the committee) that they were assigned to
* The **vertical subnets** corresponding to the indices that they are assigned to (each validator computes this for themselves using a private seed)
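As a rough illustration of those assignments, here is a hypothetical Python sketch. The constants, function names, and the exact derivation of the private sample indices are assumptions chosen for illustration (they are not the spec's actual layout); the point is only that the horizontal subnet follows from the (shard, slot) assignment while the vertical subnets follow from a private seed.

```python
import hashlib

SHARD_COUNT = 64
SLOTS_PER_EPOCH = 32
HORIZONTAL_SUBNETS = SHARD_COUNT * SLOTS_PER_EPOCH   # 2048: one per (shard, slot) pair
VERTICAL_SUBNETS = 2048                              # one per sample index in the extended blob
SAMPLES_PER_VALIDATOR = 20                           # private sample indices per validator

def horizontal_subnet(shard, slot):
    # One horizontal subnet per (shard, slot-within-epoch) pair.
    return (slot % SLOTS_PER_EPOCH) * SHARD_COUNT + shard

def vertical_subnets(private_seed, epoch):
    # Derive the validator's private sample indices from a secret seed, so that
    # an attacker cannot predict which positions this validator will query.
    chosen, counter = set(), 0
    while len(chosen) < SAMPLES_PER_VALIDATOR:
        digest = hashlib.sha256(
            private_seed + epoch.to_bytes(8, "little") + counter.to_bytes(8, "little")
        ).digest()
        chosen.add(int.from_bytes(digest[:4], "little") % VERTICAL_SUBNETS)
        counter += 1
    return sorted(chosen)

# A hypothetical validator assigned to shard 5 at slot 7 of epoch 12345:
my_subnets = {
    "global": "global",
    "horizontal": horizontal_subnet(shard=5, slot=7),
    "vertical": vertical_subnets(b"validator private seed", epoch=12345),
}
print(my_subnets)
```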
### Broadcasting blocks

There is a procedure by which a blob proposer can distribute the samples to all subnets without the proposer being part of those subnets themselves. This procedure is as follows:

1. **Publication**: the proposer publishes the blob, along with a proof for each sample, on the appropriate horizontal subnet.
2. **Direct sample distribution**: the other participants on the horizontal subnet publish the chunks to each vertical subnet that they personally are in.
3. **Indirect sample distribution**: participants reveal a few (think: ~4) of the vertical subnets they are on to their peers. Hence, each participant in the horizontal subnet can also look at the vertical subnets that their peers are on, and broadcast the appropriate chunks to those peers.

For example, if the chunk size is 512 bytes, and the maximum data blob size (without erasure code expansion) is 512 kB, then with erasure code expansion it's 1 MB, so there would be 2048 vertical subnets. If each node has 15 private vertical subnets and 5 public vertical subnets, and 50 peers, and assuming a worst-case scenario where there are 128 members in each horizontal subnet (just the committee), then the subnet members alone would directly publish to 128 * 20 = 2560 subnets (~1461 after taking into account redundant publishing), and including peers this would increase to 128 * 4 * 50 = 25600 subnets (all 2048 subnets 99.2% of the time after taking into account redundant publishing).
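A quick sanity check of those coverage figures, treating each publish as landing on a uniformly random vertical subnet (a simplification, but it reproduces the numbers quoted above):

```python
V = 2048                       # vertical subnets
members = 128                  # worst case: only the committee is on the horizontal subnet

# Direct distribution: each member forwards samples to its 20 vertical subnets.
direct = members * 20                                   # 2560 publish operations
expected_covered = V * (1 - (1 - 1 / V) ** direct)
print(round(expected_covered))                          # ~1461 distinct subnets reached

# Indirect distribution: each member also serves ~4 revealed subnets of each of ~50 peers.
indirect = members * 4 * 50                             # 25600 publish operations
p_all_covered = (1 - (1 - 1 / V) ** indirect) ** V
print(f"{p_all_covered:.1%}")                           # ~99.2% chance all 2048 are reached
```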
Note that it is theoretically possible for a malicious block proposer to publish samples to the vertical subnets without publishing the full block. To cover this case, we add a procedure by which a partially published (meaning >= 50% available but not 100% available) block can be "self-healed". The self-healing procedure has three basic steps:

1. **Reverse distribute**: a procedure identical to the distribution procedure above is used, except that in this case peers on a vertical subnet propagate samples from that vertical subnet to the *horizontal* subnet corresponding to the blob that the sample is a part of.
2. **Reconstruct**: if there are >= 1024 samples on the horizontal subnet (or generally, half of the sample count), anyone can reconstruct the full blob and publish the reconstructed blob to the horizontal subnet.
3. **Distribute**: the distribution procedure above is repeated.

## How does the beacon chain work?

In each slot, we randomly select a proposer for each of the 64 shards. The proposer has the right to create a shard blob and broadcast it using the procedure above, as well as broadcasting the `ShardHeader` of the blob to the global subnet. The `ShardHeader` can be included in the beacon chain in that slot or any subsequent slot in the same or next epoch.

The beacon chain keeps track of a list of `PendingShardHeader` objects. A `PendingShardHeader` stores (i) the key information in a `ShardHeader` (the shard and slot, a commitment to the blob, and its length), and (ii) a bitfield keeping track of which validators out of a randomly selected committee (in fact, the same committees as already introduced in phase 0) signed off on the blob.

The `AttestationData` struct is expanded to include a `shard_header_root`, a root hash of the `ShardHeader` that the given attester is voting for. Attesters can also vote for an empty root if they see no valid and available shard blob for the (shard, slot) pair they have been assigned to.

If a `ShardHeader` gets the support of 2/3 of the committee, it is **confirmed** immediately. If a `ShardHeader` has the support of a larger share of the committee than any other `ShardHeader` (including the default empty one) at the end of the next epoch, then it is confirmed at the end of that epoch.

### Fork choice rule

The fork choice rule is changed so that a block is only eligible for consideration if all blobs confirmed in that block or its ancestors have passed an availability check. This is called **tight coupling**: if a chain points to (meaning, has confirmed) even a single invalid blob, the whole chain is considered invalid. This is a key difference from "sidechain" architectures, where it's possible for a sidechain to fail while the central chain remains valid.

### The low-validator-count case

If there are fewer than 262144 validators (32 slots * 64 shards * 128 minimum committee size), then instead of selecting a proposer for _all_ shards, we select a proposer for a limited subset of shards, cycling through the shards (eg. if there were 32 * 128 * 50 validators, and at slot N the start shard was 0, then slot N would assign a proposer for shards 0...49, slot N+1 for shards 50...63 and 0...35, slot N+2 for shards 36...63 and 0...21, etc). This is done to ensure that committee sizes remain sufficient even under conditions of low participation.

### Gas prices for shard data

An EIP-1559-like mechanism is added by which shard data is charged for per byte, and this price is adjusted: if blocks are on average more than 50% full it is increased, and if blocks are on average less than 50% full it is decreased. Hence, an average block size of 50% of the maximum is targeted.

## Security assumptions

The data-blob-only sharding design is powerful because it is surprisingly light on security assumptions compared to other sharding schemes. In particular, it avoids both honest majority assumptions (as DAS can detect unavailable blobs pushed by a majority) and timing assumptions (as unlike [earlier DAS schemes](https://arxiv.org/abs/1809.09044), this scheme uses Kate commitments instead of fraud proofs, and so does not rely on any assumption that fraud proofs will be broadcast quickly enough). A malicious 51% coalition can censor blobs, but 51% censorship is possible in non-sharded chains as well.

The main new assumption that is added is the "honest minority DAS assumption": that there are enough nodes sampling that an attacker cannot satisfy all of them without publishing more than half the block. If there are 2048 samples in a blob, with 1024 needed to recover it (2048 * ln(2) ~= 1419 once you take into account that some clients will sample the same points), and each client takes 20 samples, then the system is secure as long as more than ~70 clients are sampling each shard.
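A short check of that arithmetic, under the simplifying assumption that clients sample uniformly and independently (which is where the "some clients will sample the same points" correction comes from):

```python
import math

SAMPLES = 2048          # sample indices per extended blob
PER_CLIENT = 20         # samples each client takes

# Reconstruction needs 1024 *distinct* samples, but independent clients often hit
# the same indices; expected coverage after q random queries is
# SAMPLES * (1 - exp(-q / SAMPLES)), which reaches SAMPLES / 2 at q = SAMPLES * ln(2).
queries_for_half = SAMPLES * math.log(2)
print(int(queries_for_half))                      # ~1419 total sample queries
print(math.ceil(queries_for_half / PER_CLIENT))   # ~71 clients per shard (the "~70" above)
```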
## Forward compatibility

The data-blob-only sharding design is forward-compatible with many approaches for adding execution in shards, if this is later desired. In particular, the scheme could be modified to require blobs to include a pre-state and post-state root, and we could use either fraud proofs or ZK-SNARKs to verify that the state transition in a blob is correct. Note that in either option, ensuring correctness of execution in shards would NOT depend on any honest majority assumption.

## The pull request

Enjoy!

https://github.com/ethereum/eth2.0-specs/pull/2146

## Other resources

* [Protolambda's work-in-progress implementation](https://github.com/protolambda/eth2-das)
