State of the art of Data Availability
=======

François THIRE

[toc]

In this section, we explain the *data availability problem*, present well-known solutions to it, and give an overview of the data availability solutions proposed by various blockchains. In some contexts, solutions to data availability are referred to as *sharding* solutions.

## The Data Availability Problem

One way to increase the scalability of a blockchain is to let its users post the content of their operations off-chain. This allows going beyond the bandwidth limit of the L1, which cannot be set too high. But if the data is posted off-chain, how can we reach a consensus on that data being available without requiring all L1 nodes to download it? This is the *data availability* problem.

More precisely, the problem of *data availability* is to ensure that some data has been published at some given point in time. This point in time is generally the moment a block proposal containing commitments to the data is made. The name "data availability" can be a bit misleading: ensuring data availability **does not ensure** that the data can be retrieved after the point of publication.

### Data Availability vs Data Retrievability

*Data retrievability* is the property ensuring that data can be retrieved for a fixed time period (on the order of a few days). While reaching a consensus on the *availability* of data posted off-chain is generally considered the most important property, *retrievability* is important for some use cases. Chiefly, for smart rollups, this data may be required during the refutation period of a commitment, making retrievability a very useful feature. Some discussion of the difference can be found [here](https://hackmd.io/@alexbeckett/a-brief-data-availability-and-retrievability-faq).

## Solutions for the Data Availability Problem

### Solution 0: All full nodes download all the data

Let's first look at the most naive approach to data availability: have all full nodes download all data. This is what existing L1 blockchains such as Bitcoin, Ethereum, and Tezos do for their normal transactions.

![](https://i.imgur.com/rbxvSFm.jpg =500x)

Requiring all nodes to download all data naturally ensures that the data is available. If a node proposes a block whose data is unavailable, honest nodes in the network won't accept it since they can't download and execute it, making it impossible to build on top of the block.

The problem with this approach is that it doesn't scale. This becomes especially problematic as we move to an L2-centric world, which requires ensuring the availability of large amounts of data.

:::info
#### All full nodes download all the data
- Trust assumption for data availability:
    - Honest minority of the L1 network. When an honest node encounters an unavailable block, it will fork from that chain even if the majority of nodes are following the unavailable block.
- Pros:
    - Can protect against a data withholding attack from a malicious majority.
- Cons:
    - Does not scale.
:::

### Non-solution: IPFS, BitTorrent, etc.

A common question when people first hear about the data availability problem is: why not just post all the data to P2P data storage protocols such as [IPFS](https://fr.wikipedia.org/wiki/InterPlanetary_File_System) and [BitTorrent](https://www.bittorrent.com/)? That is, have the user post the data to IPFS/BitTorrent and only post some metadata, a *commitment* to the data, to the L1 network.

![](https://i.imgur.com/tJJPr1y.jpg =500x)

The problem with this approach is that such protocols do not have a built-in consensus mechanism. Consider the scenario where both Alice and Mallory claim to have published a set of transactions to IPFS/BitTorrent at a specific time, and suppose Alice actually published the data while Mallory did not. The network cannot agree on which set of transactions to accept and which to reject. While it is possible for all nodes to download the full data to reach consensus, doing so would be no different from the "all full nodes download all the data" solution. So the question is: how do we provide a solution that is scalable while also providing consensus on data availability?

![](https://i.imgur.com/Gng321M.png =450x)
([Source](https://twitter.com/VitalikButerin/status/1477402775784939524))

### Solution 1: Data Availability Committees

[Data availability committees](https://ethereum.org/ja/developers/docs/data-availability/#data-availability-committees) (DAC) are trusted parties that ensure that given data is available. When a user submits a data blob, DAC members download the full data and post signatures to the L1 to attest to the availability of the data. The data is only considered available if some threshold of members signs this attestation.

![](https://i.imgur.com/jdAyvWl.jpg =500x)

Note that when setting a threshold for a DAC, there is a trade-off between trustlessness and liveness. A higher threshold makes it more difficult for malicious members to withhold data while making the network believe it is available. However, it also makes it easier for malicious members to halt the DAC by not attesting to any data. For instance, if there are 10 members in the DAC and the threshold is set to 10 (meaning that all 10 members must attest to availability), then a single honest member can prevent unavailable data from being accepted, but a single malicious member can prevent any data from being accepted. This trade-off is illustrated in the sketch after the summary box below.

:::info
#### Data Availability Committees
- Trust assumption for data availability:
    - Honest majority of the DAC.
- Pros:
    - Scalable. We can have very large data blobs as their size won't directly affect the L1 network bandwidth.
- Cons:
    - Requires trusting the DAC and is not backed by L1 security.
:::
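To make the threshold trade-off concrete, here is a minimal Python sketch. It is purely illustrative: the committee size, threshold values, and member behaviours are hypothetical, not taken from any deployed DAC.

```python
# Illustrative only: how a DAC acceptance threshold trades safety
# (blocking unavailable data) against liveness (tolerating silent members).

def dac_accepts(signatures: int, threshold: int) -> bool:
    """Data is considered available iff enough members signed."""
    return signatures >= threshold

committee_size = 10

for threshold in (6, 10):
    # Safety scenario: 9 malicious members sign for data they withhold,
    # 1 honest member refuses to sign.
    dishonest_signers = committee_size - 1
    blocks_unavailable = not dac_accepts(dishonest_signers, threshold)

    # Liveness scenario: the data is available, but 1 malicious member
    # stays silent and only 9 honest signatures arrive.
    honest_signers = committee_size - 1
    stays_live = dac_accepts(honest_signers, threshold)

    print(f"threshold={threshold}: blocks unavailable data={blocks_unavailable}, "
          f"survives one silent member={stays_live}")

# threshold=6  -> blocks unavailable data=False, survives one silent member=True
# threshold=10 -> blocks unavailable data=True,  survives one silent member=False
```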
### Solution 2: Randomly Sampled Committees

A [randomly sampled committee](https://hackmd.io/@vbuterin/sharding_proposal#ELI5-randomly-sampled-committees) is a subset of validators randomly selected by the L1 to attest to the availability of a blob of data. When a user submits a data blob, the L1 assigns a randomly selected subset of validators, a committee, to attest to the blob. The committee members download the data blob and post attestations to the L1 that the data was available. The network as a whole only accepts the data if it sees signatures from a majority of the committee.

![](https://i.imgur.com/VVNi52I.jpg =500x)

Any committee-based system is vulnerable to [adaptive corruption](https://github.com/ethereum/wiki/wiki/Sharding-FAQs/c54cf1b520b0bd07468bee6950cda9a2c4ab4982#what-are-the-security-models-that-we-are-operating-under), where an attacker can corrupt the whole system by corrupting only a small number of nodes. To address this attack, the committee members are periodically shuffled, making them harder to target (see the sketch after the summary box below). Some analysis of adaptive corruption can be found [here](https://medium.com/nearprotocol/how-unrealistic-is-bribing-frequently-rotated-validators-d859adb464c8).

:::info
#### Randomly Sampled Committees
- Trust assumption for data availability:
    - Honest majority of the randomly sampled committee.
- Pros:
    - Can be naturally combined with part-wise attestations or DAS (both explained later) to check the availability of unassigned data blobs.
- Cons:
    - Without combining with part-wise attestations or DAS, it requires trusting the sampled committee, which in general is not large, e.g., 5 nodes.
    - Without a proof of custody scheme (explained in the following section), it cannot prevent the "lazy validator" problem.
:::
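Below is a minimal sketch of deterministic committee sampling with periodic shuffling. It is a hypothetical construction, not any chain's actual algorithm: the validator set, committee size, and randomness source are made up for illustration.

```python
# A public random seed is mixed with the epoch number so every node
# derives the same committee, and committees change each epoch to
# resist adaptive corruption.
import hashlib
import random

VALIDATORS = [f"validator-{i}" for i in range(100)]
COMMITTEE_SIZE = 5

def committee_for(blob_id: str, epoch: int, seed: bytes) -> list[str]:
    """Deterministically sample a committee for a blob in a given epoch."""
    digest = hashlib.sha256(seed + epoch.to_bytes(8, "big") + blob_id.encode())
    rng = random.Random(digest.digest())
    return rng.sample(VALIDATORS, COMMITTEE_SIZE)

seed = b"on-chain randomness beacon output"     # assumed public randomness
print(committee_for("blob-42", epoch=1, seed=seed))
print(committee_for("blob-42", epoch=2, seed=seed))  # reshuffled next epoch
```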
#### The Lazy Validator Problem

Any system that relies on validators attesting that data is available is susceptible to the *[lazy validator](https://dankradfeist.de/ethereum/2021/09/30/proofs-of-custody.html)* problem. Naive attestations are essentially self-declarations, hence there is nothing preventing validators from attesting to data that they have not downloaded. There are two potential solutions to this problem:

- Proof of custody (sketched below):
    - Associate a custody bit with each of the data parts, require the validator to report the custody bit in the attestation, and make reporting a wrong custody bit slashable.
- Interactive dispute:
    - If a validator fails to provide data that it attested to, the nodes in the network can require the validator to reveal the data on the L1 chain. This scheme not only ensures that validators download the data but also ensures that they provide it.

The "proof of custody" scheme is being considered for [Ethereum](https://dankradfeist.de/ethereum/2021/09/30/proofs-of-custody.html) and [NEAR](https://near.org/papers/nightshade/#:~:text=Dealing%20with%20lazy%20block%20producers). The concept of an "interactive dispute" can be challenging because data unavailability is [not a uniquely attributable fault](https://github.com/ethereum/research/wiki/A-note-on-data-availability-and-erasure-coding). There is ongoing research (such as this [paper](https://arxiv.org/pdf/2208.02999.pdf) and [talk](https://www.youtube.com/watch?v=iJ7kpBTQrBg)) to find the optimal reward, slashing, and fee structure for such interactive games.
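Here is a heavily simplified, hypothetical sketch of the custody-bit idea: the reported bit is a function of the validator's secret *and* the actual data, so it cannot be guessed reliably without downloading the part. Real proof-of-custody schemes are considerably more involved.

```python
import hashlib
import secrets

def custody_bit(validator_secret: bytes, part: bytes) -> int:
    """One bit derived from the validator's secret and the data part."""
    return hashlib.sha256(validator_secret + part).digest()[0] & 1

validator_secret = secrets.token_bytes(32)   # later revealed for auditing
part = b"some erasure-coded data part"

# An honest validator downloads the part and reports the correct bit.
reported = custody_bit(validator_secret, part)

# A lazy validator that never downloaded the part can only guess: it is
# wrong half the time per part, and any wrong bit is slashable evidence.
guess = 0
print(f"reported={reported}, lazy guess slashable: {guess != reported}")
```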
### Solution 3: Part-wise Attestation

When a user submits data, the data blob is erasure coded and split into parts. Each validator in the network is assigned to download and attest to the availability of a part (or multiple parts). Since only a subset of the parts is required to reconstruct the whole data, the network can tolerate data-withholding attacks from some threshold of malicious actors.

![](https://i.imgur.com/XXHORKa.jpg =500x)

Suppose the erasure coding parameter is set to $2$, which means that any selection of half of the parts can be used to reconstruct the full data. In this case, as long as at least half of the validators behave honestly, it will be possible to reconstruct the entire data. A sketch of this reconstruction property follows the summary box below.

Note that the proof of custody scheme discussed in the [previous section](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?both#Proof-of-custody) can also be applied in this context to prevent validators from attesting to an unavailable part.

:::info
#### Part-wise Attestation
- Trust assumption for data availability:
    - Honest majority in the L1 network. The full data blob can be reconstructed if a majority of the network is honest and stores its parts.
- Pros:
    - Each node only needs to download a small amount of data.
- Cons:
    - Cannot protect against data withholding attacks from a malicious majority. A malicious majority can trick the whole network into believing that unavailable data is available.
:::
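To see why any half of the parts suffices when the expansion factor is 2, here is a self-contained, Reed-Solomon-style sketch over the field GF(257). The parameters are toy values for illustration, not what any production chain uses.

```python
# n data bytes become 2n parts; ANY n parts suffice to reconstruct.
P = 257  # prime field GF(257); every byte value 0..255 is an element

def lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate the unique degree < n polynomial through `points` at x."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data: bytes) -> list[tuple[int, int]]:
    """Treat data as evaluations at x = 1..n, extend to x = 1..2n."""
    pts = list(enumerate(data, start=1))
    n = len(data)
    return pts + [(x, lagrange_eval(pts, x)) for x in range(n + 1, 2 * n + 1)]

def reconstruct(parts: list[tuple[int, int]], n: int) -> bytes:
    """Rebuild the original bytes from ANY n surviving parts."""
    pts = parts[:n]
    return bytes(lagrange_eval(pts, x) for x in range(1, n + 1))

data = b"hello"                  # n = 5 original symbols
parts = encode(data)             # 10 parts in total
survivors = parts[5:]            # lose the first half of the parts entirely
assert reconstruct(survivors, len(data)) == data
print("reconstructed from half of the parts")
```

Dropping any `n` of the `2n` parts still allows full reconstruction, so a withholding attack must suppress more than half of the parts to succeed.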
### Solution 4: Data Availability Sampling

The user submits an erasure-coded data blob to the L1 network. Full nodes (or a subset of them, if combined with randomly sampled committees) download the data and attest to its availability. Light nodes, i.e., nodes that did not download the data, can check the availability of the data by running *data availability sampling (DAS)* against it. DAS is explained in detail in the [next section](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?both#A-brief-explanation-of-Data-Availability-Sampling). Note that DAS does not rely on trusting the majority of the network. Even if a majority of validators act maliciously and accept an unavailable data blob, the light clients will be able to detect the unavailability in a trustless manner.

![](https://i.imgur.com/JKcJzqn.jpg =500x)

The term "light clients" here refers to nodes that have not completely downloaded a specific set of data. For instance, when paired with randomly sampled committees, full nodes that were not chosen to download the data can be considered "light clients" for that data.

It is worth emphasizing that passing DAS is often part of the *requirement* for a block to be accepted. In other words, it is considered part of the validity condition of the chain. This means that unavailable data is treated similarly to invalid transactions. If a block contains a transaction that creates a large amount of currency out of nowhere, it will be considered invalid and will be rejected by honest nodes that execute the transactions. Similarly, a block containing unavailable data will be rejected by honest nodes that run DAS, regardless of what the rest of the network thinks.

:::info
#### Data Availability Sampling
- Trust assumption for data availability:
    - Honest minority in the L1 network. When an honest node fails DAS for a block, it can reject that chain even if the majority of nodes are following the unavailable block.
- Pros:
    - Can protect against a data withholding attack from a malicious majority.
- Cons:
    - Inefficient when the data is split into multiple independent "shards". See this later section for more detail:
        - [Problem with the Eth2 sharding approach](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?view#Problem-with-the-Eth2-sharding-approach)
    - Designing an efficient P2P network for DAS is a very difficult problem. See this later section for more information:
        - [Difficulty in P2P for Danksharding DAS](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?view#Difficulty-in-P2P-for-Danksharding-DAS)
    - Without a private P2P network, the system is susceptible to [selective disclosure attacks](https://forum.celestia.org/t/evaluation-of-private-networks-for-bypassing-selective-disclosure-attacks-in-data-availability-sampling/819). In this type of attack, the attacker responds to the DAS requests of a specific subset of nodes and convinces them to accept a block without revealing enough parts to reconstruct the data.

<!--- Quote from John Adler, co-founder of Celestia, from [this podcast](https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy84OWY0YWE2OC9wb2RjYXN0L3Jzcw/episode/VGhlRGVscGhpUG9kY2FzdC5wb2RiZWFuLmNvbS9mOGI2YzBjOS04MmJjLTM2NzQtOGY2Mi1lM2I0ZTJiYjczMjA?ep=14):
> If you have multiple applications, or multiple subchains, for example in Eth2 you have 64 shards, you need to do DAS for each of these. This means you are doing extra work. The best way to get the most scale out of DAS is by having very large blocks, because the scaling factor is the square root of the block size. If you have a block size of 1 million, then square root is 1000.
--->
:::

##### A brief explanation of Data Availability Sampling

Data availability sampling[^references] is a technique to ensure, with high probability, that some data is available while only downloading a small portion of the entire data. It works as follows:

[^references]: [References](https://hackmd.io/@p-cUv0l5RNaDKBCowZ0IzA/HJgFgSzpo/https%3A%2F%2Fhackmd.io%2F%40p-cUv0l5RNaDKBCowZ0IzA%2FH1_zMMl0s)

The data is erasure coded and split into `k*n` parts, where `n` is the size of the original data and `k` is some parameter. Due to the nature of erasure coding, any subset of `n` parts out of the `k*n` parts is sufficient to reconstruct the entire data. When a client in the network wants to ensure the availability of the data, it randomly samples parts of the erasure-coded data and only considers the data available if all sampling requests succeed.

Now, how does the above scheme actually ensure data availability? Suppose that a malicious actor wants to withhold the data while convincing the network that the data is available. Since `n` parts out of the `2*n` parts (for this explanation, we assume `k=2`) are enough to reconstruct the whole data, they must withhold at least `n+1` parts from the network (i.e., they can make at most `n-1` parts available).

![](https://i.imgur.com/apnLBTL.png =500x)

Now suppose a client attempts to check the availability of the data. Since the probability of one sampling request succeeding is less than `1/2` (because `(n-1)/2n < 1/2`), the probability of `x` consecutive sampling requests succeeding is less than `(1/2)^x`; hence the chance of successfully withholding data becomes exponentially smaller with the number of sampling requests. For example, with `x=20`, the probability of passing the DAS check while withholding data is less than `1/10^6` (one in a million). A numeric check of this bound is sketched below.

This solution is discussed further [here](https://hackmd.io/E6_izRwdRbaas-2rMZ_f0A#Sampling-for-the-DAL), where we discuss future improvements to the DAL design for Tezos.
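The following small computation (illustration only; `n` is a hypothetical part count) checks the detection bound numerically:

```python
# If at most n-1 of 2n parts are available, each uniform random sample
# succeeds with probability at most (n-1)/(2n) < 1/2, so x independent
# samples all succeed with probability below (1/2)^x.
n = 512                      # hypothetical number of original parts
available = n - 1            # the most an attacker can reveal while withholding
p_one = available / (2 * n)  # success probability of a single sample

for x in (10, 20, 30):
    p_fooled = p_one ** x    # all x samples succeed despite withholding
    print(f"x={x:2d}: P(light client fooled) <= {p_fooled:.2e}")

# x=20 gives roughly 9.2e-07, i.e. less than one in a million, matching
# the (1/2)^20 ~ 1/10^6 figure above.
```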
## Approaches of various blockchains

### Summary

The table below is a summary of the data availability solutions of the blockchains covered in this document. The columns "Randomly Sampled Committee," "Part-wise Attestations," and "Data Availability Sampling" indicate whether each solution utilizes the corresponding data availability technique discussed in this document. The "Decentralized Block Production" column indicates whether or not the network relies on powerful block producers to erasure code and distribute all the data in a block. Note that, as mentioned in Vitalik's [endgame post](https://vitalik.ca/general/2021/12/06/endgame.html), centralizing block production is not necessarily bad as long as block *validation* is trustless.

| Blockchain | Randomly Sampled Committee | Part-wise Attestations | Data Availability Sampling | Decentralized Block Production |
|:-------------------- |:-------------------------- |:---------------------- |:-------------------------- |:------------------------------ |
| Polkadot | ◯ | ◯ | X | ◯ |
| NEAR | ◯ | ◯ | X | ◯ |
| Eth2 | ◯ | X | ◯ | ◯ |
| Danksharding Phase 1 | X | ◯ | X | X |
| Danksharding Phase 2 | X | ◯ | ◯ | X |
| Celestia | X | X | ◯ | X |
| Tezos DAL V1 | X | ◯ | X | ◯ |

### Polkadot

The structure of the Polkadot network is represented in the following figure.

![An overview of Polkadot's blockchain with parachains](https://hackmd.io/_uploads/H1tCwgpjo.png =500x)

Polkadot has two types of chains. At the center of the figure above is a chain called the [*relay chain*](https://wiki.polkadot.network/docs/learn-architecture#relay-chain), which acts as the hub for the whole system. Transactions are processed by several different blockchains, known as [*parachains*](https://wiki.polkadot.network/docs/learn-architecture#parachain-and-parathread-slots), which operate independently. Each parachain is managed by its own group of full nodes, called [*collators*](https://wiki.polkadot.network/docs/learn-architecture#collators). The collators are responsible for creating blocks for their specific parachain.

A group of relay chain [validators](https://wiki.polkadot.network/docs/learn-architecture#validators) is selected to become the *para-validators* for each parachain. The para-validators are responsible for downloading and verifying new parachain blocks generated by the collators. Once a majority of the para-validators agree on the validity of a block, they send the header to all relay chain validators.

Furthermore, in order to ensure data availability, para-validators construct erasure-coded chunks of the parachain block (or more accurately, of the block's [PoV](https://wiki.polkadot.network/docs/glossary#proof-of-validity), which includes the transactions and the state that the transactions touch) and send a piece to each validator in the relay chain. The parachain block is accepted in the relay chain only if some threshold of validators report that they downloaded their chunk. The erasure coding is done so that only $1/3 + 1$ of the chunks are needed to reconstruct the full block; hence it is ensured that the block is reconstructable as long as more than $1/3$ of the validators are honest (see the sketch after the summary below).

In summary, in terms of data availability, the validators only accept a relay chain block if they were able to:

- Download all the data, for para-validators ([Solution 2](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?both#Solution-2-Randomly-Sampled-Committees)).
- Download a part of the data, for all the other relay chain validators ([Solution 3](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?both#Solution-3-Part-wise-Attestation)).
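The threshold arithmetic can be made concrete with a tiny illustrative computation (the validator count is hypothetical):

```python
# Erasure coding is tuned so that any floor(v/3) + 1 of the v chunks
# reconstruct the parachain block's PoV.
v = 300                     # hypothetical number of relay chain validators
chunks_needed = v // 3 + 1  # 101 chunks suffice to reconstruct

# Worst case tolerated by the BFT assumption: just over 1/3 of the
# validators are honest and keep serving their chunk.
honest_chunks = v // 3 + 1
print(f"needed: {chunks_needed}, held by honest minority: {honest_chunks}")
print("reconstructable:", honest_chunks >= chunks_needed)   # True
```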
### NEAR Nightshade

![Schema of Nightshade](https://hackmd.io/_uploads/SJWrL63ij.png)

[NEAR Nightshade](https://near.org/papers/nightshade/)'s data availability solution is similar to Polkadot's. Each block is split into *chunks*. Each validator is assigned to a specific chunk and is tasked with downloading, executing, and verifying that chunk. To ensure data availability, validators not only download their assigned chunk but also download small erasure-coded *parts* of all the other chunks.

In terms of data availability, the validators only accept a block if they were able to:

- Download the whole data of their assigned chunk ([Solution 2](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?both#Solution-2-Randomly-Sampled-Committees)).
- Download parts of all the other chunks ([Solution 3](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?both#Solution-3-Part-wise-Attestation)).

Furthermore, in order to prevent validators from being [lazy](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?both#The-Lazy-Validator-Problem), NEAR introduces a concept of *colored parts*.

#### Colored Parts

Preventing laziness via *[colored parts](https://near.org/papers/nightshade/#:~:text=Dealing%20with%20lazy%20block%20producers)* works as follows (a code sketch is given at the end of this section):

- When the chunk is produced and split into erasure-coded parts, the chunk producer chooses a color (red or blue) for each part.
- The color is combined with the part when computing the Merkle root of the parts.
- The color is combined with the part when it is being distributed.
- When a validator attests to a chunk part, it must include the color of the part.
- If a validator includes the wrong color, this can be proven via the Merkle tree; hence the validator can be slashed.

This can be thought of as a simplified version of the proof of custody scheme described [here](https://dankradfeist.de/ethereum/2021/09/30/proofs-of-custody.html). In proof of custody, the colors for each part are personalized for each validator, which avoids depending on a centralized color provider.
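Here is a toy sketch of the colored-parts idea (a hypothetical simplification, not NEAR's actual code): the color is hashed together with each part, so a wrong color in an attestation is provably inconsistent with the committed Merkle root.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(part: bytes, color: str) -> bytes:
    # The color is bound to the part inside the leaf hash.
    return h(color.encode() + b"|" + part)

def merkle_root(leaves: list[bytes]) -> bytes:
    nodes = leaves
    while len(nodes) > 1:
        if len(nodes) % 2:                 # duplicate last node if odd
            nodes = nodes + [nodes[-1]]
        nodes = [h(nodes[i] + nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

parts = [b"part-0", b"part-1", b"part-2", b"part-3"]
colors = ["red", "blue", "blue", "red"]    # chosen by the chunk producer

root = merkle_root([leaf(p, c) for p, c in zip(parts, colors)])

# A lazy validator that never downloaded part 1 must guess its color.
# Attesting "red" for part 1 yields a leaf that cannot be opened against
# `root`, which is exactly the slashable evidence.
wrong = leaf(parts[1], "red")
print(wrong != leaf(parts[1], colors[1]))  # True: the mismatch is provable
```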
### Eth2

In this section, we look at Eth2's proposal for data availability, described [here](https://hackmd.io/@vbuterin/sharding_proposal). Note that this proposal has been dropped in favor of the Danksharding solution described later. As the term "Eth2" had been [phased out](https://blog.ethereum.org/2022/01/24/the-great-eth2-renaming) around the time Danksharding was proposed, we refer to the pre-Danksharding Eth2 design as just *Eth2*.

In this proposal, block data is split into *shards*. To each shard, a randomly sampled set of validators, called a *committee*, is assigned. The committee is responsible for proposing and verifying the given shard, and is shuffled at each epoch (a fixed number of blocks).

For each shard in each block, a single member of the shard's committee is selected as the shard's *proposer*. The proposer compiles a *blob* of arbitrary data for the shard and broadcasts a *shard header*, which contains the metadata of the blob. The other members of the shard's committee attempt to download the full blob and sign the shard header if they succeed.

Additionally, in order to ensure data availability of unassigned shards, validators run data availability sampling against the shards they did not fully download.

In summary, in terms of data availability, the validators only accept a block if they were able to:

- Download all data for their assigned shard ([Solution 2](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?both#Solution-2-Randomly-Sampled-Committees)).
- Successfully conduct DAS against all the other shards ([Solution 4](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?both#Solution-4-Data-Availability-Sampling)).

All the L1 nodes accept and propagate a block only if they can successfully conduct DAS against all the shards proposed in the block.

### Celestia

Unlike other solutions we have seen so far, [Celestia](https://celestia.org/) does not "shard" its blocks. All data is directly stored in large blocks in the main chain. Since full nodes are required to download all the data, Celestia has higher hardware requirements compared to sharded blockchains, which leads to centralization (i.e., it requires *supernodes*). In such a setting, it is important to have a mechanism to monitor the full nodes in a decentralized way. Celestia uses data availability sampling ([Solution 4](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?both#Solution-4-Data-Availability-Sampling)) for this monitoring.

With DAS, light clients (i.e., clients that do not download the full block) can validate the block's availability without relying on the honesty of a full node. Even if *all* the full nodes in the network conspire and publish an unavailable block, the light clients of the network will simply stop following the unavailable block, which effectively halts the progression of the chain. At this point, light client users can fall back to social consensus and spin up a new honest node and/or slash the malicious full nodes via a hard fork.

Furthermore, in Celestia, light clients cache the fetched samples and serve them on request. This means that with enough light clients, full blocks can be reconstructed from the samples collectively stored in the light clients.

(Note: [Polygon Avail](https://medium.com/the-polygon-blog/introducing-avail-by-polygon-a-robust-general-purpose-scalable-data-availability-layer-98bc9814c048), which is a data availability solution for the Polygon ecosystem, takes a similar approach to Celestia. One difference is that Polygon Avail relies on [Kate commitments](https://wiki.polygon.technology/ko/docs/avail/faq/#what-are-kate-commitments) instead of [fraud proofs](https://docs.celestia.org/concepts/data-availability-faq/#why-is-block-reconstruction-necessary-for-security) to prevent invalid erasure codings.)

### Danksharding

Danksharding ([original proposal](https://notes.ethereum.org/@dankrad/new_sharding), [article from Delphi Digital](https://members.delphidigital.io/reports/the-hitchhikers-guide-to-ethereum#danksharding), [workshop at Devcon Bogotá](https://www.youtube.com/watch?v=8L2C6RDMV9Q)) is the latest sharding proposal for Ethereum. Before explaining what it is, let's first look at the problem with the previous Eth2 sharding proposal described [in the above section](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?both#Eth2).

#### Problem with the Eth2 sharding approach

In the Eth2 approach, data is split into shards, where each shard is built and proposed independently:

![](https://i.imgur.com/tlAagen.jpg =450x)

As a consequence, we must check availability (via DAS) for each shard independently. This is because, as illustrated below, it is enough for a malicious actor to make a single shard unavailable to harm the system.

![](https://i.imgur.com/jlV3Bvr.jpg =450x)

Hence, with $N$ shards where each shard requires $K$ samples for DAS, $N * K$ samples are needed in total.
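To put numbers on this, here is a tiny illustrative comparison, using the 64 shards mentioned in the quote earlier and the `x=20` sample count from the DAS section:

```python
# Per-node sampling cost: DAS over N independent shards vs. DAS over one
# unified erasure-coded blob, at the same per-check confidence (1/2)^K.
N = 64   # shards in the old Eth2 proposal
K = 20   # samples per DAS check, giving ~1e-6 false-accept probability

per_shard_total = N * K  # every shard must be sampled independently
unified_total = K        # one blob, one DAS check

print(f"sharded DAS: {per_shard_total} samples per node")   # 1280
print(f"unified DAS: {unified_total} samples per node")     # 20
```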
#### Utilizing Powerful Block Builders

One way to improve this situation is to introduce a new role, a *block builder*, that gathers the data for all the shards in a block and creates a single erasure-coded data blob.

![](https://i.imgur.com/W089QLu.jpg)

This way, one only needs to run DAS once to check availability.

Since such a block builder would require a lot of network bandwidth and computing power, it would naturally be a centralization vector. However, due to MEV, introducing centralization in the block-building process is *already* being proposed in the form of [*proposer builder separation (PBS)*](https://notes.ethereum.org/@vbuterin/pbs_censorship_resistance). Such centralization of block *production* is accepted as long as the *validation* stays decentralized, as expressed in the "centralized block production, decentralized validation" slogan of Vitalik's [endgame blog post](https://vitalik.ca/general/2021/12/06/endgame.html).

Danksharding is a proposal to enable efficient sharding by utilizing the powerful block builder role introduced for PBS. The basic idea is to have the block builder be responsible for gathering the shard data and erasure coding/distributing it. Here is a [quote](https://www.reddit.com/r/ethereum/comments/w4nqmr/questions_on_danksharding/) that nicely summarizes the core idea of Danksharding:

> genius of danksharding is to realize that MEV is incentivizing this powerful hardware anyway, so we might as well leverage their specialized role and have it be unified with data sharding.

Note that such a design makes Ethereum's sharding closer to Celestia's design. Below is a [relevant tweet exchange](https://twitter.com/musalbas/status/1476172269315444740?s=20&t=hcwCJCAqqK843dqupksMiw) between the Danksharding proposer and the Celestia founder:

![](https://hackmd.io/_uploads/ByJlCG6is.png =300x)

#### 2D Erasure Coding Scheme

In the above explanation, we represented the erasure-coded block as a single 1D erasure coding. The problem with this approach is that it not only requires a powerful computer to create the block, but also a powerful computer to reconstruct the data from the erasure coding. In order to support reconstruction on lower-resource computers, Danksharding employs a 2D erasure coding scheme. This approach arranges the data into rows and columns and then applies erasure coding in both the row and column directions.

![](https://i.imgur.com/3nN6rAf.jpg =500x)

This erasure coding scheme allows the data to be reconstructed row by row or column by column, which enables incremental reconstruction and makes it feasible for computers with limited resources to perform the reconstruction. An abstract sketch of this incremental reconstruction is given at the end of this section.

Furthermore, having both rows *and columns*, and not just rows, makes the system more robust against data withholding attacks. This is because a malicious actor must withhold at least 25% of the erasure-coded data to conduct a data unavailability attack.

![](https://i.imgur.com/kAF9HSC.jpg =450x)

If more than 75% of the data is available, we can reconstruct the original data by combining the reconstruction of columns (e.g., (A) below) and rows (e.g., (B) below).

![](https://i.imgur.com/2oJbIY6.jpg =450x)
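Here is an abstract sketch of the greedy row/column decoding loop, with hypothetical parameters. It tracks only *which* cells are available rather than implementing an actual erasure code: a row (or column) with at least half of its cells available can be fully decoded, which then fills in the whole row (or column), and the process iterates until a fixpoint.

```python
import random

n = 4                 # original data is n x n; extended matrix is 2n x 2n
size = 2 * n

random.seed(7)
# Suppose an attacker withholds a random 20% of cells (below the 25%
# needed for a withholding attack to have a chance of succeeding).
avail = [[random.random() > 0.20 for _ in range(size)] for _ in range(size)]

def reconstruct(avail):
    changed = True
    while changed:
        changed = False
        for i in range(size):
            row = avail[i]
            if row.count(False) > 0 and row.count(True) >= n:
                for j in range(size):
                    row[j] = True       # decode the row, fill all cells
                changed = True
            col = [avail[r][i] for r in range(size)]
            if col.count(False) > 0 and col.count(True) >= n:
                for r in range(size):
                    avail[r][i] = True  # decode the column
                changed = True
    return all(all(r) for r in avail)

print("fully reconstructed:", reconstruct(avail))
```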
#### Difficulty in P2P for Danksharding DAS

Danksharding is planned to be split into two (or more) phases, where the initial version will NOT include DAS. This is because the P2P networking design for efficient DAS in Ethereum remains an open question. [Here](https://github.com/ethereum/requests-for-proposals/blob/master/open-rfps/das.md) is an open RFP for the DAS P2P, [here](https://youtu.be/8L2C6RDMV9Q?t=5502) is a talk explaining the difficulty of P2P for DAS, and [here](https://www.paradigm.xyz/2022/08/das) is a blog post that touches on the topic. Furthermore, below is a quote from Vitalik in [this podcast](https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5saWJzeW4uY29tLzI0NzQyNC9yc3M/episode/ZjNjZjgwYzktN2FmNy00NWM4LTkzMzUtMTg2YzdhMTM2MmUy?ep=14).

> Moderator: If people want to contribute from the research/engineering point of view, what are the big open questions in sharding?
>
> Vitalik: One problem is definitely sorting out the networking of DAS. There are designs that we have that work in theory, like doing it based on subnets, making DHT faster, a couple of other techniques in the middle. But taking these ideas from the engineering perspective and optimizing is very hard. Basically, it is like a scalable specialized DHT where publishing/downloading can happen extremely quickly. I think in general, in the Ethereum ecosystem, the networking side is talked about the least. ... But it is still a very important problem, and it would be amazing if we had more networking expertise in Ethereum.

Note that in Celestia, the P2P networking is simplified by [relying on multiple supernodes](https://notes.ethereum.org/@dankrad/pbs-for-das-ideas#Supernodes) to download and distribute the complete data. However, in Ethereum, although high requirements for the block builder are accepted, the aim is still to keep the requirements for validators low. Therefore, they need to design a P2P network that does not necessitate high bandwidth for validators. In conclusion, designing a P2P network for DAS that does not rely on validators being supernodes is an open problem.

#### Danksharding without DAS

The initial phase will rely on [part-wise attestation](https://hackmd.io/GfXfa1CqQtGY6zoessBzIw?both#Solution-3-Part-wise-Attestation) for availability. Each validator will be assigned to download a few rows and columns of the erasure-coded matrix (e.g., two rows and two columns) and will only attest to a block if the assigned rows/columns are available. This ensures that the data is reconstructable as long as the majority of the validators are honest. This is described as "Danksharding honest majority validation" in the [slide](https://youtu.be/8L2C6RDMV9Q?t=5354) below:

![](https://i.imgur.com/pIatWO5.png =450x)

#### Danksharding with DAS

As a future upgrade, Danksharding will incorporate DAS in order to obtain "malicious majority safety":

![](https://i.imgur.com/XW7b7CV.png =450x)
<!---
#### DAS v.s. Part Attestations

TODO: Revisit and refine.
TODO: What is the incentive for nodes to run the DAS? If there is no incentive for nodes to be "not lazy", then does it really give benefit compared to part attestation?

In traditional blockchains, all transactions must be validated in order for a block to be accepted. This means that all nodes in the network will execute each transaction in the block and, if any of the transactions are found to be invalid, the block will be rejected. The decision to accept or reject a block is "universal" for all honest nodes since they can independently and without trust execute the transactions and reach the same conclusion.

Now suppose a new type of transaction called a "data commitment transaction" is introduced. This type of transaction is a commitment to a data blob, such as a slot header in the Tezos DAL or a parachain block header in Polkadot. The challenge with these transactions is that they cannot be validated in the same way as traditional transactions, as there is nothing to execute. Instead, validation involves checking that the data was actually published.

There are two ways to validate these data commitment transactions: part attestation and DAS. These methods differ in how they preserve the universality of the apply function. Part attestation results in the apply function becoming parametrized by the public key hash of the node running the apply function, meaning that universality is broken (i.e., it is not guaranteed that all nodes reach the same conclusion). On the other hand, DAS retains the universality of the apply function, as it is not parametrized. Or if you look at it in another way, with part attestation, the validation of data commitment transactions is moved outside of the apply function, while with DAS, validation remains within the apply function.

## TODO
- Be clear about the terminology of columns in the table.
- Comment column and mention about Polygon Avail.
- OK to add twitter links.
- Can leave out intro part.
--->
