# Data Availability Sampling Phase 1 Proposal

*WIP*

The purpose of this document is to describe in more detail a proposal for how phase 1 can be structured based on a "data-availability-focused" approach. The main addition to the beacon chain will be a `Vector` of `ShardDataHeader` objects, one for each shard. A `ShardDataHeader` is a small object which represents a large amount of underlying data (roughly 0-512 kB in size). A block is only valid if the underlying data that the `ShardDataHeader` points to is _available_ - that is, it has been published to the network and anyone can download it. However, to preserve scalability, clients will not try to download the full underlying data of every `ShardDataHeader` to verify the block. Instead, they will verify that the data is available using an indirect technique called _data availability sampling_.

### Parameters

| Parameter | Value | Description |
| - | - | - |
| `SHARD_COUNT` | 64 | Number of `ShardDataHeader` objects per block |
| `POINTS_PER_SAMPLE` | 8 | Number of evaluations of the polynomial in each chunk that gets sampled (we group a few evaluations together for efficiency reasons) |
| `MODULUS` | [$\approx 5.2 * 10^{76}$](https://github.com/ethereum/py_ecc/blob/master/py_ecc/bls12_381/bls12_381_curve.py#L21) | The modulus used for arithmetic operations (the polynomial construction and evaluation are all done in [modular arithmetic](https://en.wikipedia.org/wiki/Modular_arithmetic)). With this modulus we have ~31.7 bytes per evaluation |
| `SAMPLES_PER_BLOCK` | 4096 | Number of samples in a block when that block is extended (so the amount of actual data is at most half this) |
| `FAST_SHUFFLING_SAMPLES` | 12 | Number of sample indices that adjust quickly |
| `SLOW_SHUFFLING_SAMPLES` | 4 | Number of sample indices that adjust slowly |
| `SLOW_SHUFFLING_PERIOD` | 4096 | Number of slots between reshuffles of the slow indices |

## Background: Data Availability Sampling

There exists an efficient algorithm to encode N chunks of data $D[0] ... D[N-1]$ (the chunks can have any size) into 2N chunks, such that _any_ N of those chunks suffice to reconstruct the full data.

The algorithm is as follows. Compute the polynomial $P$ where $P(0) = D[0]$, $P(1) = D[1]$, ..., $P(N-1) = D[N-1]$. This polynomial is guaranteed to have degree < N. Then, evaluate this polynomial at $2N$ positions: the original N, but also $P(N) ... P(2N-1)$, to get 2N evaluations.

We now use an important property of polynomials: given a polynomial of degree < N, _any_ N evaluations of the polynomial at N different points can be used to reconstruct the polynomial, and hence the original data. Hence, if we consider the collection of 2N evaluations of this polynomial at the values $0 ... 2N-1$, we can see that any N of them can be used to reconstruct the polynomial, and from there reconstruct the original data - exactly our goal.

(Note that the above description is simplified for exposition; to make it possible to use [fast Fourier transforms](https://vitalik.ca/general/2019/05/12/fft.html) to do the above operations in $N*log(N)$ time, powers of some root of unity $\omega$ are used as evaluation points instead of the integers $0...2N-1$.)
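To make the extension-and-reconstruction idea concrete, here is a minimal self-contained sketch over a small prime field. The names `eval_from_points` and `extend`, the toy modulus 257, and the $O(N^2)$ Lagrange evaluation are all illustrative choices of this sketch; the actual scheme works modulo `MODULUS` and uses FFTs over powers of $\omega$.

```python
P = 257  # toy prime modulus, standing in for MODULUS

def eval_from_points(xs, ys, x):
    # Lagrange-evaluate, at x, the unique degree < len(xs) polynomial through (xs, ys)
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def extend(data):
    # Treat data as P(0) ... P(N-1) and additionally evaluate P at N ... 2N-1
    n = len(data)
    xs = list(range(n))
    return list(data) + [eval_from_points(xs, data, x) for x in range(n, 2 * n)]

data = [3, 14, 15, 92]
extended = extend(data)

# Any N of the 2N chunks suffice: throw half away and reconstruct from the rest
kept = list(enumerate(extended))[1::2]   # keep only the odd-position chunks
xs, ys = zip(*kept)
recovered = [eval_from_points(xs, ys, x) for x in range(len(data))]
assert recovered == data
```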
What does this give us? It gives us a way to turn _50% availability_ into _100% availability_. This matters because checking for 50% availability is vastly easier than checking for 100% availability. Checking for 100% availability can only be done by downloading all the data; even if you download 90% of the data, there is still a 10% chance that a single missing chunk is hiding somewhere in the remaining 10%. **But checking for 50% availability can be done probabilistically, with only a few random sample checks.** This is the key trick that allows us to do scalable data availability validation.
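As a rough illustration of why a handful of samples suffice (this back-of-the-envelope calculation is not part of the proposal itself): if an attacker publishes less than half of the 2N extended chunks, the data cannot be reconstructed, and each uniformly random sample a client requests is then unavailable with probability at least 1/2. The chance that k independent samples all land on published chunks is therefore at most $2^{-k}$:

```python
# Upper bound on the probability that a client checking k random samples is
# fooled when at most half of the extended chunks were actually published.
for k in (10, 20, 30):
    print(f"{k} samples -> fooled with probability <= {0.5 ** k:.2e}")
```

With 30 samples the client is fooled with probability below one in a billion, while downloading only a tiny fraction of the block.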
## Kate commitments

See https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html for an introduction to Kate commitments.

A Kate commitment is a fixed-size, elliptic-curve-pairing-based commitment $com(P)$ to a polynomial $P$, which has the property that for any evaluation point $z$ where $P(z) = a$ for some $a$, you can provide an _opening proof_: another elliptic curve point $Q$, such that a verifier given $com(P)$ and $Q$ can be convinced that $P(z)$ actually does equal $a$.

This is very useful technology for us, because it means that if we generate the commitment for the polynomial and use that to represent the data, and we generate openings for each of the 2N outputs, then the openings become "self-verifying". **As long as the header is well-formatted, there is no way for a shard data block to be invalid. It can only be unavailable**. This key property allows us to completely avoid the need for any fraud proofs, eliminating a large class of complexity and attacks.

## Subnets

Unlike in eth1, we can no longer simply publish all shard data to the same shared p2p network. This is because a node that is on that network would be forced to download all of that data, removing the benefits of sharding! With the above parameters, the amount of data broadcasted would be 256 kB per block (on average, assuming an EIP-1559-like mechanism) * 64 shards / 12 seconds per slot = 1.33 MB/sec (~11 Mbps), plus P2P overhead, and this is expected to grow over time with more shards and a larger block size.

Instead, we use a **subnet** architecture, where there are multiple p2p networks and each node is usually only on a few of them at a time. We have three kinds of subnets:

* Shard block subnets (or **horizontal subnets**)
* Sample index subnets (or **vertical subnets**)
* The `ShardDataHeader` subnet (or the **global subnet**)

When a shard data block is published, the header is published on the global subnet, and the block is published on the horizontal subnet corresponding to the shard ID. For each $i$ in `[0 ... SAMPLES_PER_BLOCK - 1]`, the i'th set of evaluations (`[POINTS_PER_SAMPLE * i ... POINTS_PER_SAMPLE * (i+1) - 1]`) is published on vertical subnet $i$.

![](https://i.imgur.com/UJrhdSL.png)

Figure 1. _(Note: a possible extension is to shuffle which vertical subnet corresponds to which sample index in each slot, so as to balance the load of the different shards in the case of blocks of different sizes.)_

In practice, there would be 2048 horizontal subnets instead of 64, to allow one horizontal subnet per (shard, slot) combo in each epoch. This is done to ensure that each validator can join a single subnet where they will only receive the block corresponding to the committee that they were assigned to.

Each validator must join the following subnets:

* The global subnet
* The horizontal subnet corresponding to the (shard, slot) combo (ie. the committee) that they were assigned to
* The vertical subnets corresponding to the indices that they are assigned to (each validator computes this for themselves using a private seed)

### Index calculation algorithm

`sample_indices` returns the vertical subnets that a validator should make sure to be in during a given slot, based on their private seed. _The code below can be run self-contained._

```python
from hashlib import sha256
from typing import List, Tuple

def hash(x: bytes) -> bytes:
    return sha256(x).digest()

SAMPLES_PER_BLOCK = 4096
FAST_SHUFFLING_SAMPLES = 12
SLOW_SHUFFLING_SAMPLES = 4
SLOW_SHUFFLING_PERIOD = 4096
OFFSET = 2**24

bytes32 = bytes

def get_delta(seed: bytes32, adjusted_slot: int, positions: int) -> Tuple[int, int]:
    # TODO: consider replacing with an MPC-friendly function like Legendre
    sub_seed = hash(
        seed + adjusted_slot.to_bytes(32, 'little')
    )
    # Which of the validator's sample positions to overwrite...
    position = int.from_bytes(sub_seed[:8], 'little') % positions
    # ...and the new vertical subnet index to put there
    new_index = int.from_bytes(sub_seed[8:16], 'little') % SAMPLES_PER_BLOCK
    return (position, new_index)

def sample_helper(seed: bytes32, slot: int, positions: int) -> List[int]:
    # Walk backwards from the current slot, applying one overwrite per slot,
    # until every position has been assigned an index
    output = [None] * positions
    adjusted_slot = slot + OFFSET
    while None in output:
        delta_position, delta_new_index = get_delta(seed, adjusted_slot, positions)
        if output[delta_position] is None:
            output[delta_position] = delta_new_index
        adjusted_slot -= 1
    return output

def sample_indices(secret_seed: bytes32, public_seed: bytes32, slot: int) -> List[int]:
    # Fast-reshuffling indices: private, one changes every slot
    fast_indices = sample_helper(secret_seed, slot, FAST_SHUFFLING_SAMPLES)
    # Slow-reshuffling indices: public, change once per SLOW_SHUFFLING_PERIOD slots
    offset = int.from_bytes(public_seed[:4], 'little')
    period = (slot + offset) // SLOW_SHUFFLING_PERIOD
    slow_indices = sample_helper(public_seed, period, SLOW_SHUFFLING_SAMPLES)
    return fast_indices + slow_indices
```

The intended behavior is that there are two types of indices: fast-reshuffling indices and slow-reshuffling indices. One fast-reshuffling index changes every slot; one slow-reshuffling index changes every `SLOW_SHUFFLING_PERIOD` slots. The slow-reshuffling indices are chosen from a public seed; this allows easy discoverability of nodes that are guaranteed to be on particular subnets.
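Continuing from the code above, a quick usage sketch (the seed values here are arbitrary placeholders, not part of the proposal) shows the intended behavior: between consecutive slots, typically exactly one of the 12 fast indices changes, while the 4 slow indices stay fixed within a `SLOW_SHUFFLING_PERIOD`.

```python
secret_seed = hash(b"example validator secret")   # placeholder; private to the validator
public_seed = hash(b"example beacon randomness")  # placeholder; known to everyone

a = sample_indices(secret_seed, public_seed, slot=100000)
b = sample_indices(secret_seed, public_seed, slot=100001)

# Fast indices (first 12): usually differ in exactly one position across slots
print(sum(x != y for x, y in zip(a[:12], b[:12])))
# Slow indices (last 4): identical unless a SLOW_SHUFFLING_PERIOD boundary was crossed
print(a[12:] == b[12:])
```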
### ShardDataCommitment structure

```python
class ShardDataCommitment(Container):
    commitment: BLSPubkey
    length: uint64
    length_proof: BLSPubkey
    # Signatures and signer-related data TBD
```

### Publishing

Note that while we can reasonably expect the publisher of a shard data block to be on the horizontal subnet of the desired shard, they are not going to be on _all_ the vertical subnets. To solve this, we use a two-step broadcasting mechanism:

* Every block has an associated committee that is entirely on that horizontal subnet. The committee members (~128) receive the block.
* Each of the committee members (128) asks each of their peers (conservatively, 128 * 10) what vertical subnets they are in (128 * 10 * 16). They then broadcast the appropriate samples to each peer.
* Each peer that receives a valid sample broadcasts it to their subnet.

This broadcasts an expected 20480 samples, enough to provide in expectation ~14 times the number of samples needed to reconstruct the block (including the inefficiency from double-covering an index, you need ~69.3% of a block, or ~1419 samples). This provides a large safety margin and ensures that the block actually does get broadcasted.

### Technical details: Fast Fourier Transform

To make the erasure coding efficient, we find a _root of unity_ of order $2N$ = `POINTS_PER_SAMPLE * SAMPLES_PER_BLOCK`; that is, we find an $\omega$ such that $\omega^{2N} = 1$ (in modular arithmetic modulo the `MODULUS` in the config above) but $\omega^k \ne 1$ for any $1 \le k < 2N$. We use $\omega = 5^{\frac{MODULUS - 1}{32768}}$:

`w = 30195699792882346185164345110260439085017223719129789169349923251189180189908`

Instead of using $[0 ... 2N-1]$ as our evaluation points, we use powers of $\omega$ _in order of reverse bit representation_:

```python
def reverse_bit_order(bits):
    if bits == 0:
        return [0]
    return (
        [x*2 for x in reverse_bit_order(bits-1)] +
        [x*2+1 for x in reverse_bit_order(bits-1)]
    )
```

Here are some example outputs for low values (the value that would actually be used as an input to generate the exponents is `log2(32768) = 15`):

```python
>>> reverse_bit_order(1)
[0, 1]
>>> reverse_bit_order(2)
[0, 2, 1, 3]
>>> reverse_bit_order(3)
[0, 4, 2, 6, 1, 5, 3, 7]
>>> reverse_bit_order(4)
[0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
```

Reverse bit order has the neat property that any power-of-two-aligned subset (another way to think of this: any subset that would form a clean subtree if the list were put into a Merkle tree) is a subgroup (or a coset of a subgroup) of the multiplicative group of powers of $\omega$. This makes it very mathematically easy and clean to:

* Use a [fast Fourier transform](https://vitalik.ca/general/2019/05/12/fft.html) to extend the original data ($N$ points) into the extended data ($2N$ points). In particular, note that the original data, which goes in the first half of the evaluation points, neatly falls into the even powers of $\omega$ (ie. the powers of $\omega^2$), which form a subgroup, and so an FFT can be applied directly.
* Prove that any particular contiguous sub-range is full of zeroes. For this, we compute a polynomial $Z(x) = (x - \omega^{rev(i)}) * ... * (x - \omega^{rev(j)})$, and provide a commitment to $Q(x) = \frac{P(x)}{Z(x)}$. The verifier can generate the G2 representation of $Z$ and do a pairing check to verify $e(com(Q), com(Z)) = e(com(P), com(1))$. Note that the powers of $\omega$ involved can be viewed as at most $2 * log_2(j-i)$ subtrees ($log_2(j-i)$ if $j = \frac{N}{2}$), making it easy to compute $Q(x)$. $Z(x)$ too will be a product of a few $x^{2^k} - \omega^{2^k * r}$ terms, making it, and the commitment to it, easy to calculate.

The latter property is needed because, for EIP 1559 purposes, we need to enforce the size of a shard block in such a way that it can be verified from the shard header. For this, we use the application-specific SNARK summarized above to prove that the data outside the range is zeroes, so no one needs to try to propagate or download those coordinates.
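As a quick sanity check of the constants above (a sketch, assuming `MODULUS` is the BLS12-381 curve order from the linked py_ecc file), $\omega$ can be recomputed directly:

```python
# BLS12-381 curve order, i.e. the MODULUS from the parameter table
MODULUS = 52435875175126190479447740508185965837690552500527637822603658699938581184513
ORDER = 8 * 4096  # POINTS_PER_SAMPLE * SAMPLES_PER_BLOCK = 32768

w = pow(5, (MODULUS - 1) // ORDER, MODULUS)
print(w)                                  # compare against the constant quoted above
assert pow(w, ORDER, MODULUS) == 1        # w is a 32768'th root of unity
print(pow(w, ORDER // 2, MODULUS) != 1)   # True iff w has exact multiplicative order 32768
```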
### Serialization

One important point that must be stressed in this data availability scheme is that **the shard block data should be viewed as natively _being_ a list of integers in the range `[0...MODULUS-1]`**. This is a departure from eth1 and indeed almost all other blockchains (except perhaps iota, which uses ternary?), which tend to view data as natively being bytes. That is, if for example `MODULUS = 13`, then `[5, 10, 2, 7, 12, 4, 9, 1]` would be admissible data, but `[1, 8, 15, 6, 13, 4, 11, 2]` would not: 13 and 15 are invalid elements. This should be viewed philosophically similarly to how eg. `0x2babd4ba18af` is valid hexadecimal, but `0x7d394a1bdghi` is not. The data availability check that we use is natively built around modular arithmetic, and so the data whose availability it verifies is in the format of a list of integers in the range `[0...MODULUS-1]`.

The main challenge here is that serialization and networking use bytes, and so there is a disconnect of formats between serialization/networking and the data availability scheme (which uses points): the 32-byte blob `0xffffffff...ffffffff` is valid _bytes_, but it is not a valid _point_, because points must be lower than the modulus (which is a prime number somewhere between $2^{254}$ and $2^{255}$).

To deal with this disconnect, we encode every point as 32 bytes (in little endian), but we add to every object that contains a point a validity condition requiring the point to be less than the modulus. Objects that contain out-of-range points are effectively ignored and treated as nonexistent, as though they had never been received by the client.

Note that _at the application layer_, applications usually prefer to use bytes. How to convert a list of points into bytes is an application-layer choice; a natural default is to just take $x \mod 256^{31}$, looking only at the bottom 31 bytes of every point.
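A minimal sketch of this encoding rule (the helper names here are purely illustrative, not part of the proposal):

```python
MODULUS = 52435875175126190479447740508185965837690552500527637822603658699938581184513

def point_to_bytes(x: int) -> bytes:
    assert 0 <= x < MODULUS
    return x.to_bytes(32, 'little')

def bytes_to_point(b: bytes) -> int:
    x = int.from_bytes(b, 'little')
    if x >= MODULUS:
        # Validity condition fails: the containing object is treated as nonexistent
        raise ValueError("out-of-range point")
    return x

def point_to_application_bytes(x: int) -> bytes:
    # Application-layer default: keep only the bottom 31 bytes (x mod 256**31)
    return (x % 256**31).to_bytes(31, 'little')
```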
