The purpose of this document is to describe in more detail a proposal for how phase 1 can be structured based on a "data-availability-focused" approach.
The main addition to the beacon chain will be a `Vector` of `ShardDataHeader` objects, one for each shard. A `ShardDataHeader` is a small object which represents a large amount of underlying data (roughly 0-512 kB in size). A block is only valid if the underlying data that the `ShardDataHeader` points to is available - that is, it has been published to the network and anyone can download it. However, to preserve scalability, clients will not try to download the full underlying data of every `ShardDataHeader` to verify the block. Instead, they will verify that the data is available using an indirect technique called data availability sampling.
Parameter | Value | Description |
---|---|---|
`SHARD_COUNT` | 64 | Number of `ShardDataHeader` objects per block |
`POINTS_PER_SAMPLE` | 8 | Number of evaluations of the polynomial in each chunk that gets sampled (we group a few evaluations together for efficiency reasons) |
`MODULUS` | | The modulus that is used for arithmetic operations (the polynomial construction and evaluation are all done in modular arithmetic). With this modulus we have ~31.7 bytes per evaluation |
`SAMPLES_PER_BLOCK` | 4096 | Number of samples in a block when that block is extended (so the amount of actual data is at most half this) |
`FAST_SHUFFLING_SAMPLES` | 12 | Number of sample indices that adjust quickly |
`SLOW_SHUFFLING_SAMPLES` | 4 | Number of sample indices that adjust slowly |
`SLOW_SHUFFLING_PERIOD` | 4096 | Number of slots between reshuffles of the slow-adjusting indices |
There exists an efficient algorithm to encode $N$ chunks of data $D[0], \ldots, D[N-1]$ into $2N$ chunks, such that any $N$ of the $2N$ chunks are enough to reconstruct all of the original data.
The algorithm is as follows. Compute the polynomial $P$ of degree $< N$ such that $P(i) = D[i]$ for $i \in \{0, \ldots, N-1\}$.
We now use an important property of polynomials: given a polynomial of degree < N, any N evaluations of the polynomial at N different points can be used to reconstruct the polynomial and hence the original data.
Hence, if we consider the collection of 2N evaluations of this polynomial at the values $x = 0, \ldots, 2N-1$, any $N$ of those evaluations can be used to reconstruct the polynomial, and from there recover the full set of $2N$ evaluations and the original data.
(Note that the above description is simplified for exposition; to make it possible to use fast Fourier transforms to do the above operations in $O(N \log N)$ time, the evaluation points are actually powers of a root of unity, as described later in this document.)
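To make this concrete, here is a minimal illustrative sketch of the extend-and-reconstruct procedure, using naive Lagrange interpolation over a small toy prime rather than FFTs over the real `MODULUS`; the function names and the toy modulus are invented for this example, not taken from the spec:

```python
# Illustrative Reed-Solomon-style extension: N data chunks -> 2N evaluations,
# any N of which suffice to recover the data. Uses naive Lagrange interpolation
# over a small toy prime for clarity; real implementations use FFTs over MODULUS.
TOY_MODULUS = 65537  # illustration only, not the spec's modulus

def lagrange_eval(xs, ys, x, m=TOY_MODULUS):
    # Evaluate at x the unique polynomial of degree < len(xs)
    # that passes through the points (xs[i], ys[i]), working mod m.
    total = 0
    for i in range(len(xs)):
        num, den = 1, 1
        for j in range(len(xs)):
            if i != j:
                num = num * (x - xs[j]) % m
                den = den * (xs[i] - xs[j]) % m
        total = (total + ys[i] * num * pow(den, -1, m)) % m
    return total

def extend(data, m=TOY_MODULUS):
    # Treat data[i] as the evaluation at x=i; output evaluations at x = 0 ... 2N-1.
    xs = list(range(len(data)))
    return [lagrange_eval(xs, data, x, m) for x in range(2 * len(data))]

def reconstruct(known, m=TOY_MODULUS):
    # `known` maps x -> evaluation, with exactly N entries; returns the N data chunks.
    xs, ys = list(known.keys()), list(known.values())
    return [lagrange_eval(xs, ys, x, m) for x in range(len(xs))]

data = [5, 10, 2, 7]                              # N = 4 chunks
extended = extend(data)                           # 2N = 8 evaluations
subset = {x: extended[x] for x in [1, 3, 4, 6]}   # keep any 4 of the 8
assert reconstruct(subset) == data
print("recovered:", reconstruct(subset))
```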
What does this give us? It gives us a way to turn 50% availability into 100% availability. This matters because checking for 50% availability is vastly easier than checking for 100% availability. Checking for 100% availability can only be done by downloading all the data; even if you download an entire 90% of the data, there's a 10% chance that some single missing chunk is somewhere in the remaining 10%. But checking for 50% availability can be done probabilistically, with only a few random sample checks. This is the key trick that allows us to do scalable data availability validation.
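For intuition, here is the arithmetic behind that claim, modeling the samples as independent for simplicity (the sample counts below are illustrative):

```python
# If fewer than half of the extended chunks are available, the block cannot be
# reconstructed, and each uniformly random sample is then unavailable with
# probability > 1/2. The chance that k samples all succeed anyway is < 2**-k.
def prob_fooled(k, available_fraction=0.5):
    return available_fraction ** k

for k in [1, 4, 8, 16, 30]:
    print(f"{k:2d} samples -> fooled with probability <= {prob_fooled(k):.2e}")
```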
See https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html for an introduction to Kate commitments.
A Kate commitment is a fixed-size elliptic-curve-pairing-based commitment to a polynomial. It supports compact *openings*: for any point, one can produce a small proof that the committed polynomial evaluates to a particular value at that point, and anyone can verify such a proof against the commitment alone.
This is very useful technology for us, because it means that if we generate the commitment for the polynomial and use that to represent the data, and we generate openings for each of the 2N outputs, then the openings become "self-verifying". As long as the header is well-formatted, there is no way for a shard data block to be invalid. It can only be unavailable. This key property allows us to completely avoid the need for any fraud proofs, eliminating a large class of complexity and attacks.
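To make the "self-verifying openings" idea concrete, here is a toy commit/open/verify cycle written against the `py_ecc` BLS12-381 primitives. This is only a sketch: the trusted-setup secret is generated in-process (which defeats the purpose of a real setup), the polynomial is tiny, and none of the batching or FFT tricks the real scheme relies on are shown. The point is simply that verification needs only the commitment, the evaluation point, the claimed value, and the proof:

```python
# Toy KZG (Kate) commitment sketch over BLS12-381 using py_ecc.
# WARNING: illustrative only. The secret `s` is known to this script,
# which defeats the purpose of a real trusted setup. Pure-Python pairings
# are slow, so this takes a little while to run.
from py_ecc.bls12_381 import G1, G2, add, multiply, neg, pairing, curve_order

s = 123456789  # toy "trusted setup" secret (never known in a real setup)
DEGREE = 4
SETUP_G1 = [multiply(G1, pow(s, i, curve_order)) for i in range(DEGREE)]
S_G2 = multiply(G2, s)  # [s]_2, part of the public setup

def commit(coeffs):
    # Commitment = sum_i coeffs[i] * [s^i]_1  (a single G1 point)
    result = multiply(SETUP_G1[0], coeffs[0] % curve_order)
    for i in range(1, len(coeffs)):
        result = add(result, multiply(SETUP_G1[i], coeffs[i] % curve_order))
    return result

def poly_eval(coeffs, x):
    return sum(c * pow(x, i, curve_order) for i, c in enumerate(coeffs)) % curve_order

def poly_quotient(coeffs, z, y):
    # Compute q(x) = (p(x) - y) / (x - z) by synthetic division;
    # the division is exact because p(z) = y.
    p = list(coeffs)
    p[0] = (p[0] - y) % curve_order
    q = [0] * (len(p) - 1)
    for i in range(len(p) - 1, 0, -1):
        q[i - 1] = p[i]
        p[i - 1] = (p[i - 1] + z * p[i]) % curve_order
        p[i] = 0
    return q

def open_at(coeffs, z):
    y = poly_eval(coeffs, z)
    proof = commit(poly_quotient(coeffs, z, y))
    return y, proof

def verify(commitment, z, y, proof):
    # Check e(proof, [s - z]_2) == e(commitment - [y]_1, [1]_2),
    # i.e. that (p(x) - y) is divisible by (x - z).
    lhs = pairing(add(S_G2, neg(multiply(G2, z % curve_order))), proof)
    rhs = pairing(G2, add(commitment, neg(multiply(G1, y % curve_order))))
    return lhs == rhs

coeffs = [5, 10, 2, 7]          # p(x) = 5 + 10x + 2x^2 + 7x^3
C = commit(coeffs)
y, proof = open_at(coeffs, 3)   # open at z = 3
print("opening verified:", verify(C, 3, y, proof))  # expected: True
```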
Unlike in eth1, we can no longer simply publish all shard data to the same shared p2p network, because any node on that network would be forced to download all of that data, negating the benefits of sharding. With the above parameters, the amount of data broadcast would be 256 kB per block (on average, assuming an EIP-1559-like mechanism) * 64 shards / 12 seconds per slot = 1.33 MB/sec (~11 Mbps), plus p2p overhead, and this is expected to grow over time with more shards and a larger block size.
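The 1.33 MB/sec figure is just the arithmetic above spelled out:

```python
# Back-of-the-envelope bandwidth estimate for a single shared p2p network
# carrying every shard's data (which is exactly what we want to avoid).
AVERAGE_BLOCK_KB = 256   # average shard block size under an EIP-1559-like mechanism
SHARDS = 64
SLOT_SECONDS = 12

kb_per_sec = AVERAGE_BLOCK_KB * SHARDS / SLOT_SECONDS
print(f"{kb_per_sec:.0f} kB/s  =  {kb_per_sec / 1024:.2f} MB/s  =  {kb_per_sec * 8 / 1024:.1f} Mbps")
# -> 1365 kB/s  =  1.33 MB/s  =  10.7 Mbps (plus p2p overhead)
```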
Instead, we use a subnet architecture, where there are multiple p2p networks and each node is usually only on a few of them at a time. We have three kinds of subnets:
- Horizontal subnets, one per shard, on which full shard data blocks are published
- Vertical subnets, one per sample index, on which individual samples are published
- The `ShardDataHeader` subnet (or the global subnet), on which headers are published

When a shard data block is published, the header is published on the global subnet, and the block is published on the horizontal subnet corresponding to the shard ID. For each `i` in `[0 ... SAMPLES_PER_BLOCK - 1]`, the `i`'th set of evaluations (evaluations `[POINTS_PER_SAMPLE * i ... POINTS_PER_SAMPLE * (i+1) - 1]`) is published on vertical subnet `i`.

Figure 1. (Note: a possible extension is to shuffle which vertical subnet corresponds to which sample index in each slot, so as to balance the load of the different shards in the case of blocks of different sizes.)
In practice, there would be 2048 horizontal subnets instead of 64, to allow one horizontal subnet per (shard, slot) combo in each epoch. This is done to ensure that each validator can join a single subnet where they will only receive the block corresponding to the committee that they were assigned to.
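With the standard `SLOTS_PER_EPOCH = 32`, the count works out as follows; the indexing function shown is purely hypothetical, included only to illustrate one possible layout:

```python
# 64 shards x 32 slots per epoch = 2048 (shard, slot) combinations,
# hence 2048 horizontal subnets.
SHARD_COUNT = 64
SLOTS_PER_EPOCH = 32  # standard beacon chain parameter
print(SHARD_COUNT * SLOTS_PER_EPOCH)  # -> 2048

def horizontal_subnet_id(shard: int, slot: int) -> int:
    # Hypothetical layout: group by slot-within-epoch, then by shard.
    return (slot % SLOTS_PER_EPOCH) * SHARD_COUNT + shard
```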
Each validator must join the following subnets:

- The global (`ShardDataHeader`) subnet
- The horizontal subnet corresponding to the committee that they are currently assigned to
- The vertical subnets returned by `sample_indices` for the current slot

`sample_indices` returns the vertical subnets that a validator should make sure to be in during a given slot, based on their private seed. The code below can be run self-contained.
from hashlib import sha256
def hash(x):
return sha256(x).digest()
SAMPLES_PER_BLOCK = 4096
FAST_SHUFFLING_SAMPLES = 12
SLOW_SHUFFLING_SAMPLES = 4
SLOW_SHUFFLING_PERIOD = 4096
OFFSET = 2**24
# Minimal stand-ins so that the type annotations below evaluate without extra imports
bytes32 = None; List = {int: None}
def get_delta(seed: bytes32, adjusted_slot: int, positions: int) -> (int, int):
    # Pseudorandomly pick which output position to update at this step,
    # and which new sample index to place there.
    # TODO: consider replacing with an MPC-friendly function like Legendre
    sub_seed = hash(
        seed +
        adjusted_slot.to_bytes(32, 'little')
    )
    position = int.from_bytes(sub_seed[:8], 'little') % positions
    new_index = int.from_bytes(sub_seed[8:16], 'little') % SAMPLES_PER_BLOCK
    return (position, new_index)
def sample_helper(seed: bytes32, slot: int, positions: int) -> List[int]:
    # Fill `positions` output slots by walking backwards from the current
    # (offset) slot, keeping for each position the most recent update.
    # Since each step updates at most one position, at most one output
    # index changes between consecutive slots.
    output = [None] * positions
    adjusted_slot = slot + OFFSET
    while None in output:
        delta_position, delta_new_index = get_delta(seed, adjusted_slot, positions)
        if output[delta_position] is None:
            output[delta_position] = delta_new_index
        adjusted_slot -= 1
    return output
def sample_indices(secret_seed: bytes32,
                   public_seed: bytes32,
                   slot: int) -> List[int]:
    # Fast indices reshuffle every slot, driven by the validator's private seed;
    # slow indices reshuffle every SLOW_SHUFFLING_PERIOD slots, driven by a
    # public seed (so membership in those subnets is publicly discoverable).
    fast_indices = sample_helper(secret_seed, slot, FAST_SHUFFLING_SAMPLES)
    offset = int.from_bytes(public_seed[:4], 'little')
    period = (slot + offset) // SLOW_SHUFFLING_PERIOD
    slow_indices = sample_helper(public_seed, period, SLOW_SHUFFLING_SAMPLES)
    return fast_indices + slow_indices
The intended behavior is that there are two types of indices: fast-reshuffling indices and slow-reshuffling indices. One fast-reshuffling index changes every slot; one slow-reshuffling index changes every SLOW_SHUFFLING_PERIOD
slots. The slow-reshuffling indices are chosen from a public seed; this allows easy discoverability of nodes that are guaranteed to be on particular subnets.
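As a usage example, running the code above with some made-up seeds shows the intended behavior: between consecutive slots at most one index changes, and the slow indices only change at period boundaries:

```python
# Assumes the definitions above (hash, sample_indices, FAST_SHUFFLING_SAMPLES, ...).
secret_seed = hash(b"example validator secret")   # illustrative values only
public_seed = hash(b"example public randomness")

prev = None
for slot in range(1000, 1006):
    indices = sample_indices(secret_seed, public_seed, slot)
    if prev is not None:
        changed = sum(1 for a, b in zip(prev, indices) if a != b)
        print(f"slot {slot}: {changed} of {len(indices)} indices changed")
    prev = indices

# The last SLOW_SHUFFLING_SAMPLES entries are the slow-reshuffling ones:
print("slow indices:", indices[FAST_SHUFFLING_SAMPLES:])
```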
class ShardDataCommitment(Container):
commitment: BLSPubkey
length: uint64
length_proof: BLSPubkey
# Signatures and signer-related data TBD
Note that while we can reasonably expect the publisher of a shard data block to be on the horizontal subnet of the desired shard, they are not going to be on all the vertical subnets. To solve this, we use a two-step broadcasting mechanism:

1. The publisher broadcasts the full block on the horizontal subnet (and the header on the global subnet).
2. Each validator on that horizontal subnet takes the samples corresponding to the vertical subnets that they are currently in (their `sample_indices` output) and rebroadcasts those samples on those vertical subnets.

This broadcasts an expected 20480 samples, enough to provide, in expectation, ~14 times the number of samples needed to reconstruct the block (including inefficiency from double-covering an index, you need ~69.3% of a block, or ~1419 samples). This provides a large safety margin and ensures that the block actually does get broadcasted.
To make the erasure coding efficient, we find a root of unity with order `POINTS_PER_SAMPLE * SAMPLES_PER_BLOCK = 32768`; that is, we find a value $w$ such that $w^{32768} = 1$ modulo the `MODULUS` in the config above, but $w^{16384} \ne 1$. We use

`w = 30195699792882346185164345110260439085017223719129789169349923251189180189908`
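The claimed order of $w$ can be checked directly. The snippet below assumes `MODULUS` is the BLS12-381 curve order (an assumption of this sketch, consistent with the Kate-commitment machinery used in this proposal):

```python
# Check that w is a root of unity of order exactly 32768 = 2**15:
# w**32768 == 1 and w**16384 != 1 (mod MODULUS).
from py_ecc.bls12_381 import curve_order as MODULUS  # assumed modulus

w = 30195699792882346185164345110260439085017223719129789169349923251189180189908
ORDER = 8 * 4096  # POINTS_PER_SAMPLE * SAMPLES_PER_BLOCK

print(pow(w, ORDER, MODULUS) == 1)       # expected: True
print(pow(w, ORDER // 2, MODULUS) != 1)  # expected: True
```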
Instead of using the powers $w^0, w^1, w^2, \ldots$ in ascending order as the evaluation points, we use them in "reverse bit order": the $i$'th evaluation point is $w$ raised to the exponent whose binary representation is that of $i$ reversed. The following function computes this permutation for a given bit width:
def reverse_bit_order(bits):
if bits == 0:
return [0]
return (
[x*2 for x in reverse_bit_order(bits-1)] +
[x*2+1 for x in reverse_bit_order(bits-1)]
)
Here are examples of the output for small input values (the value that would actually be used as the input, to generate the exponents that actually get used, is `log2(32768) = 15`):
>>> reverse_bit_order(1)
[0, 1]
>>> reverse_bit_order(2)
[0, 2, 1, 3]
>>> reverse_bit_order(3)
[0, 4, 2, 6, 1, 5, 3, 7]
>>> reverse_bit_order(4)
[0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
Reverse bit order has a neat property: for any power-of-two-aligned subset (another way to think of this: any subset that would form a clean subtree if the list were put into a Merkle tree), the corresponding set of evaluation points is a subgroup (or a coset of a subgroup) of the multiplicative group of powers of $w$.
This makes it very mathematically easy and clean to:

- make proofs about the evaluations over any such subset (for example, the evaluations that make up a single sample), and
- prove that all of the data outside of some power-of-two-aligned range is zero.
The latter property is needed because for EIP 1559 purposes, we need to enforce the size of a shard block in such a way that it can be verified from the shard header. For this, we use the application-specific SNARK summarized above to prove that the data outside the range is zeroes, so no one needs to try to propagate or download those coordinates.
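As a small numerical sanity check of the subgroup/coset property, here is a toy example with $p = 17$ and $w = 3$ (which has multiplicative order 16 mod 17); these values are chosen purely for illustration:

```python
# Verify, in a toy field, that power-of-two-aligned chunks of the
# reverse-bit-order list correspond to cosets of subgroups of <w>.
P = 17   # toy prime
W = 3    # 3 has multiplicative order 16 mod 17

def reverse_bit_order(bits):   # as defined above
    if bits == 0:
        return [0]
    return (
        [x*2 for x in reverse_bit_order(bits-1)] +
        [x*2+1 for x in reverse_bit_order(bits-1)]
    )

exponents = reverse_bit_order(4)            # 2**4 = 16 evaluation points
points = [pow(W, e, P) for e in exponents]

# Take an aligned chunk of size 4 (indices 4..7) and compare it to a coset:
chunk = set(points[4:8])
subgroup = {pow(W, 4 * k, P) for k in range(4)}   # subgroup generated by w^4
coset = {points[4] * g % P for g in subgroup}     # that subgroup shifted by one chunk element
print(chunk == coset)  # expected: True
```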
One important point that must be stressed in this data availability scheme is that the shard block data should be viewed as natively being a list of integers in the range [0...MODULUS-1]
. This is a departure from eth1 and indeed almost all other blockchains (except perhaps iota, which uses ternary?), which tend to view data as natively being bytes.
That is, if for example MODULUS = 13
, the set [5, 10, 2, 7, 12, 4, 9, 1]
would be admissible data, but [1, 8, 15, 6, 13, 4, 11, 2]
is not: 13 and 15 are invalid elements. This should be viewed philosophically similarly to how eg. 0x2babd4ba18af
is valid hexadecimal, but 0x7d394a1bdghi
is not. The data availability check that we use is natively built around modular arithmetic, and so the data that it verifies availability of is in the format of a list of integers in the range [0...MODULUS-1]
.
The main challenge here is that serialization and networking use bytes, and so there is a disconnect of formats between serialization/networking and the data availability scheme (which uses points): the 32-byte blob `0xffffffff...ffffffff` is valid bytes, but it's not a valid point, because points must be lower than the modulus (which is a prime number somewhere between $2^{254}$ and $2^{255}$).
To deal with this disconnect, we encode every point as 32 bytes (in little endian), but we add to every object that contains a point a validity condition that requires the point to be less than the modulus. Objects that contain points that are out of range are effectively ignored and treated as nonexistent, as though they had never been received by the client.
Note that at the application layer, applications usually prefer to use bytes. How to convert a list of points into bytes is an application-layer choice; a natural default is to just use the bottom 31 bytes of each point, since any 31-byte value is guaranteed to be less than the modulus.
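A minimal sketch of both conversions described above: the consensus-layer encoding of points as 32 little-endian bytes with the range-validity condition, and an illustrative application-layer default of packing 31 bytes of payload per point. As before, treating `MODULUS` as the BLS12-381 curve order is an assumption of this sketch, and the helper names are made up for this example:

```python
# Point <-> byte conversions, illustrating the two layers discussed above.
from py_ecc.bls12_381 import curve_order as MODULUS  # assumed modulus

def point_to_bytes32(point: int) -> bytes:
    # Consensus-layer encoding: 32 bytes, little endian.
    assert 0 <= point < MODULUS
    return point.to_bytes(32, 'little')

def bytes32_to_point(b: bytes) -> int:
    # Validity condition: out-of-range encodings are rejected (the containing
    # object would be treated as nonexistent).
    point = int.from_bytes(b, 'little')
    assert point < MODULUS, "not a valid point"
    return point

def bytes_to_points(data: bytes) -> list:
    # Application-layer default: pack 31 bytes per point, so every chunk is
    # guaranteed to be below the modulus.
    return [int.from_bytes(data[i:i+31], 'little') for i in range(0, len(data), 31)]

def points_to_bytes(points: list, length: int) -> bytes:
    return b''.join(p.to_bytes(31, 'little') for p in points)[:length]

payload = b"hello, sharded world"
points = bytes_to_points(payload)
assert points_to_bytes(points, len(payload)) == payload
assert bytes32_to_point(point_to_bytes32(points[0])) == points[0]
print("round trips OK")
```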