This document presents a study of Data Availability Sampling (DAS) for Ethereum, a technique that enables efficient verification of data availability in blockchain networks. We study two primary approaches for data dissemination: unstructured (gossip-based) and structured (DHT-based). We focus on the structured approach, tailoring our random sampling design accordingly. Multiple algorithms and their variations are proposed and evaluated with a prototype written in Rust. Additionally, we provide design considerations encompassing networking, security, and privacy aspects. Our results showcase the performance of and trade-offs between different approaches, offering insights for selecting appropriate techniques. This study contributes to ongoing research and development efforts to secure the rollup-centric future of Ethereum L1.
The future of Ethereum is rollup-centric: instead of providing more space for transactions, it will reduce the cost of storing blobs of data, which the core protocol does not attempt to interpret. Verifying a blob simply requires checking that the data is available - that it can be downloaded from the network. However, supporting a constant inflow of extrinsic data will require Ethereum's consensus to upgrade the way such data availability (DA) is guaranteed.
Today, ensuring availability requires nodes to download a full block and verify the Merkle proofs associated with it. With an increased storage capacity per block, such a check will only be feasible for nodes running specialized hardware with high-throughput network connectivity, which goes against the core design principles of Ethereum. DA must remain sufficiently decentralized and be able to withstand targeted attacks.
For this purpose, a technique called random sampling was popularized in a paper by Mustafa Al-Bassam, Alberto Sonnino, and Vitalik Buterin in 2018. Since then it has gone through some major changes, introducing KZG instead of fraud proofs and replacing the trusted committee with Danksharding (DS), but the core idea remains the same - extend useful data with multi-dimensional erasure codes such as Reed-Solomon, disseminate it to the network, and require validators to query a number of individual samples at random until statistical confidence of full availability is reached.
In this work, we lay down the theoretical background for Data Availability Sampling (DAS) and propose multiple options for deploying its core components onto the Ethereum network. The primary focus of this research is on sample dissemination and random sampling. However, we include some considerations regarding fault identification and data reconstruction as well.
Reed-Solomon (RS) is a type of error-correcting code that works by encoding original data consisting of k symbols into n > k coded symbols, such that the original data can be recovered from any k of them. In DAS, RS codes (with a rate of 1/2) are used to extend the block data, so that any 50% of the coded symbols are sufficient to reconstruct the original data.
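To make the encode-and-recover property concrete, here is a minimal sketch of RS extension over a small prime field using plain Lagrange interpolation; the field, parameters, and symbol encoding are purely illustrative and are not those used by Ethereum:

    # Illustrative Reed-Solomon sketch over a small prime field.
    P = 65537  # prime modulus

    def poly_eval(coeffs, x):
        # Evaluate a polynomial given by ascending coefficients at x (mod P).
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % P
        return acc

    def rs_extend(data, n):
        # Treat the k data symbols as polynomial coefficients and evaluate at n > k points.
        return [poly_eval(data, x) for x in range(n)]

    def rs_recover(points, k):
        # Lagrange-interpolate the degree-(k-1) polynomial from any k (x, y) pairs,
        # returning its coefficients, i.e. the original data symbols.
        xs, ys = zip(*points[:k])
        coeffs = [0] * k
        for i in range(k):
            basis = [1]  # basis polynomial l_i(x), built up as ascending coefficients
            denom = 1
            for j in range(k):
                if j == i:
                    continue
                # Multiply basis by (x - xs[j]).
                basis = [(a - xs[j] * b) % P for a, b in zip([0] + basis, basis + [0])]
                denom = (denom * (xs[i] - xs[j])) % P
            scale = ys[i] * pow(denom, -1, P)
            coeffs = [(c + scale * b) % P for c, b in zip(coeffs, basis)]
        return coeffs

    data = [3, 1, 4, 1]                      # k = 4 original symbols
    extended = rs_extend(data, 8)            # n = 8 coded symbols (rate 1/2)
    subset = list(enumerate(extended))[4:8]  # any 4 of the 8 points suffice
    assert rs_recover(subset, 4) == data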
Kate commitments (KZG) are a type of polynomial commitment scheme that employs a structured reference string (SRS) and a trusted setup, thus producing toxic waste. It requires pairing-friendly curves, and its security assumes the hardness of problems in pairing groups. In KZG, the prover can convince a verifier that a committed polynomial evaluates to a given value at a given point, using a constant-size proof.
In DAS, KZG is used to prove that RS codes are well-formed (that all extension points lie on the same polynomial as the original data points, more details here) and can be used to reconstruct data. Being an alternative to fraud proofs, it has much lower communication overhead, though it does require more resources from the node generating the proofs. Committing to all of the data in a single KZG commitment would also require nodes to download all of the data for reconstruction. We are okay with a super-node requirement for initial building, but we need to avoid such an assumption for reconstruction.
The 2D-KZG scheme is a way to make reconstruction easier: instead of committing to the whole block at once, the block is split into a square matrix of chunks, and each row and column of the extended matrix is erasure-coded and committed to separately. This way, a single row or column can be reconstructed from 50% of its chunks without downloading the rest of the block.
In the rest of this document, when we mention rows and columns we will refer to the commitments encoded along the horizontal and vertical axes according to the 2D-KZG scheme we have just outlined.
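As an illustration of this layout, the following sketch maps a flat sample index to its row and column in the extended matrix and lists the samples a validator assigned to a given row or column cares about; the 512x512 dimensions match the numbers used later in this document, while the row-major indexing is an assumption made for the example:

    EXTENDED_DIM = 512  # extended matrix is 512 x 512 samples (256 x 256 before extension)

    def sample_position(sample_index):
        # Row-major mapping of a flat sample index to its (row, column) in the extended matrix.
        return divmod(sample_index, EXTENDED_DIM)

    def samples_in_row(row):
        return [row * EXTENDED_DIM + col for col in range(EXTENDED_DIM)]

    def samples_in_column(col):
        return [row * EXTENDED_DIM + col for row in range(EXTENDED_DIM)]

    assert sample_position(0) == (0, 0)
    assert sample_position(EXTENDED_DIM * EXTENDED_DIM - 1) == (511, 511)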
Builders are resource-optimized and highly-incentivized actors that are responsible for compiling the "exec block body" (an ordered list of transactions) and submitting it along with a bid to the network. According to Vitalik's "Endgame", as long as block validation is decentralized, block building can be more centralized, i.e. run by companies/institutions rather than ordinary users. Nonetheless, we must not rely on builders to continuously (in the span of multiple epochs) act consistently or even honestly. More details in the spec here.
In DAS, builders are expected to erasure-code blobs of data and generate KZG proofs attesting to the validity of extended chunks.
Validators are actors whose responsibility is to verify the blocks produced by the current builder. It is crucial for validators to remain decentralized, so we should not assume much about their hardware.
In DAS, validators attest to the availability of 2 rows and 2 columns of the erasure-coded data by querying the network for chunks associated with those locations. While they need to attest only in their own slot, validators are expected to keep querying the rows and columns assigned to them until they are re-assigned in the next epoch.
The proposer is an actor selected from the validator set using the RANDAO mechanism. According to the proposer-builder separation (PBS) proposal, these nodes are purposely separated from the builders to prevent censorship and MEV attacks. The proposer's main responsibility is to accept the exec block body with the highest bid. DAS makes no distinction between the proposer and other validator nodes.
Regular nodes and light clients are expected to partake in random sampling and to rebroadcast any samples they receive, helping disseminate samples to the entirety of the network.
Kademlia is a distributed hash table (DHT) protocol that organizes the network into a binary tree. Each node maintains a partial view of the network tree stored in its routing table, which is further separated into k-buckets, each covering a distinct range of XOR-distances from the local node. In order to find the nodes that are currently responsible for a specific key, a node iteratively queries the closest contacts it knows of, learning about ever closer nodes from their responses, until no closer nodes can be found.
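A minimal sketch of the distance metric and of a single lookup step, assuming 256-bit integer ids (the helper names are illustrative):

    def xor_distance(a: int, b: int) -> int:
        # Kademlia distance between two 256-bit ids is simply their XOR.
        return a ^ b

    def bucket_index(local_id: int, other_id: int) -> int:
        # Position of the highest differing bit, i.e. log2 of the distance (ids assumed distinct).
        return xor_distance(local_id, other_id).bit_length() - 1

    def closest(known_ids, key, count=16):
        # One iteration of a lookup: pick the peers closest to the key and query them
        # for even closer peers, repeating until no progress is made.
        return sorted(known_ids, key=lambda nid: xor_distance(nid, key))[:count]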
Discv5 is a discovery protocol built on top of Kademlia, but without FIND_VALUE (and STORE) queries, as for peer discovery only FIND_NODES is needed. For security reasons, discv5 nodes must regularly perform liveness checks and routing table maintenance to favor long-lived nodes. The communication between discv5 nodes is done over the UDP transport.
Discv5 is already tightly integrated into Ethereum client design and implementation. Hence, whenever structured overlays are considered, we should prefer it over other Kademlia-based systems.
GossipSub is a p2p protocol that is based on a publish-subscribe system, in which members of the network subscribe to topics and messages will be pushed to them. A popular implementation of pub-sub systems uses a mechanism called flood-sub, which upon receiving a message, will forward it to any peers in the network that are subscribed to the topic. GossipSub takes this concept and makes it a bit more intelligent by asking peers if they have seen the message before forwarding it, “have you heard the gossip”, which reduces unnecessary communications.
Values above are taken from multiple sources [I, II, III] and are subject to change.
Validators need to download KZG commitments for their assigned rows and columns so that they can take part in the reconstruction. The exact numbers may vary for blocks produced at different slots. The target is 256, while the max is 512. Each commitment is 48 bytes, so it would be practical to send all of them in one message.
This data needs to be disseminated at the same time for each slot, similar to how full blocks are broadcast today. Hence, it would be wise to consider the same tool for the job. A flooding-based pub-sub gossip protocol such as GossipSub organizes communication into different "broadcast groups" ("topics") that participants can join ("subscribe") and send messages ("publish") to.
There exist multiple configurations for how messages are sent over gossip networks: push, pull, and push-pull. According to [1: p. 10-11], pull is generally faster and takes less bandwidth than other configurations; however, this advantage quickly vanishes when a connection limit is present. Hence, assuming the worst-case hardware run by validators, we give preference to push-based gossiping, which can also benefit from the higher bandwidth available to publishers, i.e. builders.
Another relevant observation is that builders may choose to include different amounts of data in their exec bodies for the same block. Since it is up to the proposer to decide on the winning body, to save time builders may need to broadcast rows & columns to the interested validators before the winning body is revealed. This way, builders will generate rows & columns and send them as a commitment to their exec block body, along with their bids.
Alternatively, the dimensions can be set to a predetermined upper bound. To fill empty rows and columns, builders would have to pad blobs of data with zeros. This spares the need to broadcast the dimensions, which could potentially save some time. However, it results in unnecessary communication and storage overhead, arguably a worse trade-off.
A properly erasure-coded block results in more than a quarter of a million (512^2) samples. Any 50% of these is sufficient to reconstruct the whole block. Furthermore, any 50% of the samples associated with a specific row/column is sufficient to reconstruct that row/column. However, before being requested, all these samples must first be somehow put into the network. Hence, the aim of the current component is to prepare the network for the upcoming sampling queries.
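For a sense of scale, a quick back-of-the-envelope calculation using the 512-byte sample size mentioned later in this document:

    EXTENDED_DIM = 512
    SAMPLE_SIZE = 512  # bytes per sample

    total_samples = EXTENDED_DIM ** 2                   # 262,144 samples
    reconstruct_whole = total_samples // 2              # any 50% suffice for the whole block
    reconstruct_row = EXTENDED_DIM // 2                 # any 50% of a row/column suffice for that row/column
    extended_block_bytes = total_samples * SAMPLE_SIZE  # 128 MiB of extended data to disseminate

    print(total_samples, reconstruct_whole, reconstruct_row, extended_block_bytes // 2**20)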
There are two types of sampling the network must be able to support: row/column sampling performed by validators, and random sampling performed by any client wishing to verify availability.
For the former, it would be sufficient to have a topology-aware broadcast that could reach all interested peers within the tightly-constrained time without creating a lot of unnecessary bandwidth. According to the PBS proposal, validators will have roughly ~2.5 seconds (between T+5.33s and T+8s) to receive samples for their rows and columns. However, KZG proving for 32 MB of raw data takes ~1s on a 32-64 core CPU, so unless all builders do this heavy computation before the winner announcement, the dissemination has even less time (~1.5s).
For the latter, it is crucial that samples are distributed uniformly and deterministically, such that any client can figure out where to look for the randomly chosen ones. Such a predictable configuration needs to remain in place even after the block has been declared available. This is needed for nodes that have gone offline for an extended period of time and now wish to regain trust in the current leading chain, i.e. to know whether it was available that whole time.
In the following sections, we discuss some approaches for sample dissemination. We will then evaluate their performance and applicability according to the known parameters and based on our own simulation tests.
Individual validators remain interested in samples from the same rows and columns during the whole epoch (6.4 min today). Such a rate of rotation is persistent enough to at least consider the use of gossip protocols for disseminating samples to validators that follow the head.
At the start of each epoch, validators will subscribe to the topics indexed by the rows and columns they have just been assigned. From then on, they will receive samples from their subscribed topics until being re-assigned in the next epoch.
There are 512 individual rows and columns (1024 total). The problem is that gossip communication tends to become unreliable if there is a large number of topics, each with few subscribers. This is because the sub-graph of nodes subscribed to a particular topic may no longer be connected, causing flooding to fail.
Churn caused by frequently changing topics is another issue. There are some strategies to perform such rotation slowly, which may help avoid churn. However, these are tricky to implement in practice, especially with our strict time constraints.
One such approach was proposed by Vitalik. It involves sharding pub-sub topics horizontally on subnets and propagating samples to subscribers vertically. This grid structure could provide a limited improvement when it comes to churn, as horizontal subnets rotate slowly (once per epoch) while vertical subnets are not tied to the epoch schedule and thus can remain even longer. It also comes with some interesting shortcuts, e.g. indirect sample distribution, where peers could share a small number (~4) of public vertical subnets to act as a force multiplier and increase coverage.
At this point, we already have one implementation of multi-sharded gossip at such a scale. The eth2-das prototype developed by Protolambda, based on a slightly updated design, showed that, while feasible, this approach produces a lot of undesirable bandwidth. Most importantly, any gossip-based method only works for live sampling, as there is no way to reliably request samples past the most recent block. With historical retrievability being a crucial requirement for DAS, this renders the gossip-based approach suitable only as a parallel mechanism working alongside a structured network, in case the latter cannot satisfy block time constraints.
Communication conducted over structured overlays is known to have a much lower overhead compared to gossip-based protocols. Distributed hash tables (DHT) offer scalable routing latency and an efficient lookup interface. However, structured protocols are generally more fragile in the presence of failures, lacking the natural resilience of gossip protocols. DHTs were also not designed for large-scale broadcast or data dissemination, requiring additional coordination to be used for sample dissemination.
A promising approach is to construct new broadcast primitives that combine structured and gossip-based approaches. In that way, one can benefit from the scalability and resilience of pure gossip while approximating the efficiency of structured solutions. Finally, data once disseminated into a P2P data structure can then be retrieved at any point in the future, as long as it remains stored by at least one live node.
In the rest of this section, we develop a specialized data dissemination protocol that considers network topology and the nature of the data to be disseminated. Compared to existing broadcast solutions that only define how a message is routed (routing strategy), our solution must also specify how the individual pieces that collectively form the original data are bundled together into messages (bundling strategy) that are deterministically addressed to a certain selection of remote peers (peer selection strategy).
We define bundling as a means of grouping multiple pieces (samples, from here on) of the same message into a bundle that can be sent over to certain remote peers. How the data was originally split (RS encoding here) is outside the bundling strategy definition, but it may be useful to consider it when allocating pieces to a group.
In practice, a bundling algorithm must take a list of samples and produce one or more groups of samples, each addressed to one or more remote peers. Cases where a single group is addressed to more than one peer will be covered by the peer selection strategy definition. For now, assume that each bundle is intended for a certain remote peer based on some commonly shared characteristic(s) that must be unambiguously recognized by any node in the network. To ensure determinism and reproducibility, only public values must be used when mapping samples to their destination peers.
Below we present 3 bundling strategies and the trade-offs between them.
Kademlia uses XOR to determine the distance between two points in the id space. XOR is symmetric, allowing participants to evaluate queries from each other's point of view by only knowing their node identities. This property makes Kademlia's distance metric perfectly suited for grouping samples.
The initiator starts by calculating the distances between each sample-id and the node-ids of all remote peers it knows. She then sorts the samples by distance and, for each node-id, selects the k closest sample-ids, resulting in one or more bundles of samples, mapped by the node-id they are intended for.
pseudocode:
def group_by_closest_node(self, sample_ids, k):
    node_ids = self.discv5.get_all_known_node_ids()
    result = {}  # Map<NodeId, [SampleId]>
    for node_id in node_ids:
        # For each known peer, pick the k samples closest to its node-id by XOR-distance.
        sorted_sample_ids = sorted(sample_ids, key=lambda sample_id: sample_id.xor_distance(node_id))
        result[node_id] = sorted_sample_ids[:k]
    return result
This method is simple and understandable, but it is far from efficient: the complexity of this step is O(n · m log m) for n known nodes and m samples, since the full list of samples is sorted once per node.
In Kademlia, peers are organized into a binary tree. Its routing table, usually referred to as k-buckets table, is organized around these subtrees. Hence, the most natural way to map Kademlia's broadcasting space is bucket-wise.
The initiator starts by mapping samples to k-bucket indexes based on the XOR-distance between each sample-id and the local node-id. For each bucket index, the initiator then selects node-ids from the corresponding bucket of its routing table to receive the resulting bundle.
pseudocode:
def group_by_buckets(self, sample_ids, k):
    local_view = self.discv5.get_nodes_per_bucket()
    # Map each sample to the bucket index corresponding to its XOR-distance from the local node-id.
    bucket_indexes = [sample_id.xor_distance(self.local_node_id).bucket_index() for sample_id in sample_ids]
    grouped_sample_ids = group_sample_ids_by_bucket(sample_ids, bucket_indexes)
    result = {}  # Map<[NodeId], [SampleId]>
    for (bucket_idx, samples) in grouped_sample_ids:
        # Address each bundle to up to k contacts from the corresponding bucket.
        nodes = local_view[bucket_idx][:k]
        result[nodes] = samples
    return result
A remote peer receiving a bundle of samples from the initiator will perform the same operations from its own perspective, using its local node-id. The one tricky part comes from the fact that buckets are different for each node (as they represent distances to that node). Thus, one must be sure that when a node sends a message to a contact from one of its buckets, this contact will in turn be able to map this bucket to its own routing table. Luckily, thanks to the proof in [2: p. 758], we know that for any forwarder such a mapping always exists, so the receiving contact can consistently re-bundle the samples from its own point of view.
It is true that ids generated using collision-resistant hash functions form a uniform distribution over the Kademlia key space. This, however, is no longer the case when the relative distances are mapped to a finite number of buckets. Closer buckets are narrower, while most of the samples will correspond to a few furthest buckets. This will result in distant nodes receiving larger bundles than closer ones. Although not particularly problematic, since the dissemination origin changes each slot, one can minimize this disproportion by tweaking the parameter k (the number of contacts selected per bucket).
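The skew is easy to observe with a quick simulation over uniformly random ids, where the bucket index is taken as the position of the highest bit differing from the local node-id (illustrative sketch):

    import random
    from collections import Counter

    ID_BITS = 256
    local_id = random.getrandbits(ID_BITS)

    def bucket_index(local, other):
        # Highest differing bit; a larger index means a greater XOR-distance.
        return (local ^ other).bit_length() - 1

    counts = Counter(bucket_index(local_id, random.getrandbits(ID_BITS)) for _ in range(10_000))
    # Roughly half the ids land in the furthest bucket, a quarter in the next one, and so on;
    # the closest buckets receive almost nothing.
    for idx in sorted(counts, reverse=True)[:5]:
        print(idx, counts[idx])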
The Kademlia authors indicate that, although described in different terms [4: p. 1], Pastry's prefix-based routing is very similar to that of Kademlia. Indeed, Kademlia's buckets can be mapped to entries in a prefix-based routing table, while an increasing common prefix directly results in a shorter XOR-distance. We use this insight to define prefix-based bundling. A similar method was also proposed by Protolambda in his more recent work.
The initiator must maintain a secondary routing table along with Kademlia's primary one; it is essentially a binary prefix tree (trie) which organizes remote peers into branches based on their common prefix (see example). An initiator will start by grouping samples by the common prefix they share with the node-ids of known peers, addressing each group to the peers in the matching trie branch.
A peer receiving a bundle is required to determine the current branching position on the trie to decide on further sample bundling. As proposed in [5: p. 2], such context awareness can be achieved by sending packets carrying the prefix currently addressed, which the authors call the destination prefix. This destination prefix grows in length with every forwarding hop while descending the trie.
Since common prefix length and XOR-distance are directly correlated, it is likely that samples disseminated with this method will generally be retrievable by standard Kademlia lookups. Nonetheless, one should mind edge cases, account for the possibility of degraded lookup efficiency, and consider using specialized prefix-based routing for lookups too.
We define peer selection as a set of rules that control the quantity and order in which peers are picked as destinations for bundles. Under stable conditions it would be most efficient for each sample to be stored in one place. However, we cannot assume such conditions hold continuously, so some redundancy settings must be applied.
The main problem with the structured dissemination is that a failure in a node will cause a whole tree branch to be mishandled. This is especially likely in networks where nodes join and leave the system often, known as churn. Common mitigation methods trade better coverage for an increased latency or a higher number of sent messages. In this section we cover the latter trade-off, introducing 3 settings to exploit redundancy for churn and network fault tolerance. In later sections we cover other techniques for dealing with these threats.
The advantage of this technique is that it adds no latency, since parallel messages are sent almost simultaneously. When redundancy is applied, the number of messages is multiplied and a given node may receive the same message several times. This requires each message (bundle) to have a unique identifier so that nodes can process it according to special pre-configured policies.
The first setting is the replication multiplier, which controls how many peers each bundle is replicated to. A related setting is the replication radius - a XOR-distance between a sample-id and any node encountered during the dissemination, within which that node is considered interested in storing the sample.
Another pair of settings controls when messages are replicated: either once, by the initiator (replicate-one/R1), or on each hop, so that any node receiving a message would forward it to several more locations (replicate-all/RA).
A node receiving the same bundle of samples for a second time can simply skip handling it again (the forward-one/F1 configuration). Alternatively, the node can keep forwarding it to the next nodes in the routing tree (forward-all/FA). With the latter option, if a node receives a sample it has already handled, it will attempt to forward it to the same set of peers as the first time, unless its routing table was updated between invocations. This is expected behavior, meant to account for lost messages and mishandled routing branches in an attempt to maximize coverage.
Considering both previously introduced policies, we now have four possible configurations: R1/F1, R1/FA, RA/F1 and RA/FA. We can immediately discard the RA/FA option because it would lead to an explosion of messages on the network. As mentioned, the R1/F1 option can only be applied under stable network conditions, but since there is no reliable way for a single node to estimate network health from its own point of view, use of this setting is not recommended. In [6: p. 7] the authors concluded, based on their experimental results, that the best coverage under simulated churn is achieved with RA/FS, where forward-some/FS is an additional setting that forwards duplicated messages only up to a certain threshold, which needs to be experimentally determined based on the network size and topology.
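A minimal sketch of how these redundancy settings could be represented and consulted when a duplicate bundle arrives; the names ReplicationPolicy and ForwardPolicy are illustrative, chosen to match the FP.ForwardAll check used in the pseudocode later in this section:

    from enum import Enum

    class ReplicationPolicy(Enum):
        ReplicateOne = "R1"   # only the initiator replicates bundles
        ReplicateAll = "RA"   # every hop replicates bundles

    class ForwardPolicy(Enum):
        ForwardOne = "F1"     # skip bundles that were already handled
        ForwardAll = "FA"     # keep forwarding duplicates
        ForwardSome = "FS"    # forward duplicates only up to a threshold

    def should_forward(policy, times_seen, threshold=2):
        if times_seen == 0:
            return True
        if policy == ForwardPolicy.ForwardAll:
            return True
        if policy == ForwardPolicy.ForwardSome:
            return times_seen < threshold
        return False  # ForwardOne: handle duplicates only once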
Sometimes it is more useful to consider quality rather than quantity. Peer scoring is a powerful technique that can greatly improve both coverage and latency during sample dissemination. Peer scoring works best when applied together with redundancy, so that messages are replicated to both more and better peers.
In dv5das, Proto proposes a set of behaviors that either positively or negatively impact a peer's score. In this work we analyze three more metrics, which emerge from the following considerations:
There are, however, some difficulties with translating these metrics into scores, which we discuss in the rest of this section:
Routing tree resolution requires variable time. Assuming that a remote peer won't respond until all subsequent nodes in the tree finish their lookups, there is no point in measuring request handling time as a latency metric. However, we can require nodes to respond with a handle-id immediately and acknowledge the finished broadcast as a separate request referencing the previously generated handle-id. Here, the handle-id is a one-time identifier that associates a response sent separately with a given request. This way, one can measure the time it took a remote peer to respond initially with a handle-id and apply it to their score.
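A sketch of the resulting latency metric, assuming a hypothetical send_and_await_handle_id helper that returns as soon as the immediate handle-id response arrives:

    import time

    def record_response_latency(self, peer, request):
        # Measure only the time until the immediate handle-id response,
        # not the time the peer spends resolving its routing subtree.
        started = time.monotonic()
        handle_id = self.send_and_await_handle_id(peer, request)
        rtt = time.monotonic() - started
        # An exponential moving average keeps the score stable across single outliers.
        previous = self.latency_scores.get(peer, rtt)
        self.latency_scores[peer] = 0.8 * previous + 0.2 * rtt
        return handle_id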
Ethereum nodes do not expose their routing table sizes, as there is no reliable and efficient way to prove them. However, one can make an approximate estimate of another node's network view by just looking at its NODES responses. If a remote peer consistently replies with close to 16 nodes, then it is likely to have a larger view than peers that are more restrained in their responses. Still, relying on this logic may prove misleading, as nodes are less likely to have many closer nodes when the lookup destination itself is close.
pseudocode:
def estimate_table_size(self, remote_peer, lookup_key, closer_nodes):
    previous_score = self.routing_table_sizes[remote_peer]
    distance = remote_peer.xor_distance(lookup_key)
    # COEFF is an empirically determined constant balancing the lookup distance
    # against the number of received closer nodes.
    new_score = (previous_score + (distance * len(closer_nodes) * COEFF)) / 2
    self.routing_table_sizes[remote_peer] = new_score
    return new_score
Accounting for peer role could be the easiest and perhaps most effective metric. However, the node ids of builders aren't publicly known: exposing them would violate validator privacy and open a possible DoS vector. Today, builders are only identified by their special builder pub-key, which relays collect when receiving a new block.
Routing defines how a message is propagated in the overlay network, which nodes are responsible for its destination, and how it can be found again later. It is crucial for routing to remain deterministic; otherwise, disseminated data won't be retrievable from the network by anyone other than those who put it there.
In Kademlia, routing is done by comparing distances between the lookup key and the known node ids. It is an iterative process: each request either yields more discovered nodes to send new requests to, or finds the node/value for the lookup key, finishing the routing session. For sample dissemination, the iteration handler is extended with bundling, peer selection, and storage writes (in cases when no closer nodes were found for the processed samples), but the underlying scheme generally remains the same.
Literature distinguishes two alternative routing modes: iterative and recursive.
Concrete implications of using one mode or another will be more relevant in the context of random sampling. For now, assume that iterative routing provides the initiator node with more information and influence over the routing path, while recursive mode is generally faster and results in less overhead, most notably in stable networks.
A natural way to route data dissemination is the recursive one. Existing data dissemination schemes [I, II, III] are all based on recursive routing, as it appears to be easier to reason about and results in a much cleaner implementation. One only needs to figure out a single algorithm intended for all nodes in the DAS overlay network. Recursive routing makes no distinction between initiator and forwarder nodes, as nodes on each hop act as the root of the broadcast tree and perform the same set of operations. It is also more hardware-friendly, as it assumes no unique bandwidth requirements for the initiator or intermediary nodes.
The initiator starts by bundling samples and picking destination node-ids according to the chosen bundling strategy. Each bundle is assigned a bundle-id, needed to avoid double-handling when redundancy settings are applied. The initiator then sends parallel and asynchronous RPCs to push bundles along with their ids to their intended nodes. If the destination is the initiator node itself, the samples are written to local storage.
Specific wire protocols will be discussed in later sections. For now, we only require the protocol to be asynchronous and to support acknowledgements for successfully handled requests, so that the initiator can send the next bundles at a pace limited by the concurrency parameter. In general, for recursive routing we recommend an unbounded concurrency parameter, but additional considerations about the wire protocol or implementation specifics may apply.
A peer receiving an RPC request will parse it into a bundle-id and a list of samples. If a request with the same bundle-id was already processed and the redundancy settings require no additional forwarding, then it can be skipped. Otherwise, the receiver node works as follows:
it re-bundles the received samples by their sample-ids using the same bundling strategy and forwards each new bundle to the closer nodes it knows; if no closer nodes can be found in the receiver's routing table, the samples are written to local storage.
pseudocode:
def disseminate_samples_recursively(self, samples, args):
    allocation = self.bundle_samples(samples, args.bundling_strategy)
    for (node_id, samples) in allocation:
        if node_id == self.local_node_id:
            # No closer node is known for these samples, keep them locally.
            self.store_samples(samples)
            continue
        bundle_id = generate_id()
        msg = (bundle_id, samples)
        self.send_to(node_id, msg)

def handle_dissemination_request(self, from_node, msg, args):
    (bundle_id, samples) = msg
    if self.handled_messages[bundle_id] and args.forward_policy != FP.ForwardAll:
        return
    self.handled_messages[bundle_id] = True
    self.disseminate_samples_recursively(samples, args)
With iterative routing, the initiator has more control over the route, allowing them to react to problems faster, e.g. switch to alternative branches in case of remote node failure. As an obvious trade-off, it requires greater resources from the initiator (e.g. bandwidth), but this isn't necessarily a problem. On the contrary, since the initiator will always be a builder, this may lead to a more optimal utilization of available resources.
The initiator starts the same way as with recursive routing, but instead of sending bundles with full samples, she only sends their ids. She then waits for remote peers to respond. When all responses associated with a given bundle-id are received, the initiator re-bundles samples for any closer nodes reported back and repeats the process; once a peer reports no closer nodes, the full samples are sent to that peer for storage.
A peer receiving an RPC request starts the same way as with recursive routing, but instead of sending re-bundled samples to closer nodes it knows, it sends their ENRs back to the initiator. Note that, in this design, it is up to the initiator to decide which samples to send to a remote peer, based on its own assessment of that peer's local view. In an alternative design, a remote peer responds with NODES(Vec<Enr>) if it knows closer nodes to a given sample-id, and with STORE(Vec<SampleId>) if not. This solution is simpler, but incurs more communication overhead than the former one.
pseudocode:
def disseminate_samples_iteratively(self, samples, args):
    allocation = bundle_samples(samples, self.nodes, args.bundling_strategy)
    while True:
        new_allocation = {}
        for (node_id, samples) in allocation:
            if node_id == self.local_node_id:
                self.store_samples(samples)
                continue
            bundle_id = generate_id()
            # First send only the sample ids and ask for closer nodes.
            msg = (bundle_id, "Keys", samples.to_ids())
            closer_nodes = self.send_to(node_id, msg)
            if len(closer_nodes) > 0:
                # Re-bundle the samples against the newly discovered, closer nodes.
                new_allocation.update(bundle_samples(samples, closer_nodes, args.bundling_strategy))
            else:
                # No closer nodes are known; push the full samples for storage.
                msg = (bundle_id, "Samples", samples)
                self.send_to(node_id, msg)
        if len(new_allocation) == 0:
            return
        allocation = new_allocation

def handle_dissemination_request(self, from_node, msg, args):
    (bundle_id, msg_type, samples) = msg
    if msg_type == "Keys":
        closer_nodes = set()
        for sample_id in samples:
            closer_nodes.update(self.close_nodes_to(sample_id))
        self.respond_to(closer_nodes, from_node)
    else:
        self.store_samples(samples)
The destructive implications of high churn, along with one mitigation strategy (redundancy), were covered in Section 4.4. Here, we take a look at some other known techniques for dealing with churn in structured networks.
An anti-entropy protocol based on gossip/flooding operating on top of a DHT-based structured overlay can improve coverage and consistency. This family of protocols lets nodes gossip among each other to find out whether they are missing any data, which can then be transmitted. Combined with a structured broadcast algorithm that attempts to disseminate data to all nodes while guaranteeing to never send redundant messages, an anti-entropy protocol can help ensure that all nodes, with high probability and at low cost, get all the messages.
The major concern is that gossip protocols have probabilistic guarantees, i.e. the more times the gossip is repeated, the more reliable the broadcast gets. Considering our strict time constraints, the anti-entropy might never reach enough repetitions to be useful. According to [6: p. 11], parallel flooding produces a rather significant increase in the number of sent messages while resulting in only minor coverage improvements. In the same work, the authors recommend using more specialized and deterministic methods, which we discuss next.
In order to fight unexpected failures, one can enhance the protocols with a simple acknowledgment mechanism: sending of an ACK as soon as a node has correctly processed and forwarded a received message. If the ACK is not received, the sender will resubmit to a different contact in the bucket. Once a node has acknowledged the received bundle, it's her responsibility to wait for ACKs from all subsequent nodes in the broadcast tree and to handle failures accordingly.
If only ACKs and resubmissions are applied, nodes should only receive a message once, since no parallel paths are used. However, this is most effective when both redundancy and ACKs are applied together. In [6: p. 11], the authors experimentally showed that R1/FA + ACKs achieves excellent coverage even under extreme churn and failure rates. As a trade-off, each resubmission adds a slight increase in latency, though this is amortized when redundant paths are used prior to resubmissions. Overall, a 5-10% latency spike seems well worth the gains in coverage, especially considering the long-term need for historical sampling.
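A sketch of the acknowledgment-and-resubmission logic, assuming a hypothetical send_bundle call that raises TimeoutError when no ACK arrives in time:

    def send_with_resubmission(self, bucket_contacts, bundle, max_attempts=3):
        # Try contacts from the same bucket until one of them acknowledges the bundle.
        for contact in bucket_contacts[:max_attempts]:
            try:
                self.send_bundle(contact, bundle, await_ack=True)
                return contact  # the acknowledging node now owns failure handling downstream
            except TimeoutError:
                continue  # resubmit to a different contact in the bucket
        return None  # no contact acknowledged; the branch is reported as failed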
By default, regions (buckets) without contacts are just ignored in broadcasts. In stable conditions, this is safe, but in the presence of churn, routing data can be momentarily inaccurate and new nodes may be unknown. The ERR lets empty regions be handled by peers in close regions, in the hope that they may know about nodes the current handler still ignores. The use of such an algorithm does not increase latency or the number of messages, but under severe churn or failure conditions, it offers limited coverage gain.
Establishing the ability to selectively query samples from the network is essential, as it allows clients to probabilistically check data availability by downloading only a small random portion rather than the whole block. For a network with 10k validators actively involved in DAS, the availability of a 32 MB block can be verified when each validator samples ~75 randomly selected chunks.
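The confidence behind such a sample count is easy to check: if less than half of the extended data was actually published, each uniformly random sample is available with probability below 1/2, so a single client passing all of its queries is astronomically unlikely (illustrative calculation):

    samples_per_client = 75
    # Probability that a client is fooled, i.e. all its random samples happen to be
    # among the published ones when less than 50% of the data is available.
    false_positive = 0.5 ** samples_per_client
    print(false_positive)  # ~2.6e-23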
Following are properties that sampling functionality must satisfy:
For identifiability (1), we consider two approaches. The obvious one is to randomly pick a number of sample-ids and use them to query chunks from the network; we refer to this as structured sampling. The second option assumes a uniform distribution of chunks in the network, allowing the network to be sampled in an unstructured way without deciding on specific ids beforehand.
For addressability (2), we consider data retrieval in a pull (DHT lookups, req/resp) or push-like manner (topic-sub). Due to the need for samples to be disseminated to their corresponding hosts beforehand, the pull model implies higher latency. Nonetheless, we believe this is the only right way to meet all five properties, including historical retrievability (4), since the unstructured/push option is characterized by ephemeral access to data, compared to the persistence associated with structured/pull sampling.
In the following sections, we discuss both of these approaches in greater detail. However, only the structured/pull one will be considered for evaluation and simulation tests.
Validators can subscribe to the topics which correspond to the samples they wish to receive. Latency reduction is possible because validators don't wait until samples are disseminated, but can receive their assigned data while the dissemination is taking place.
Topic subscription could fit row/column sampling, as validators remain interested in the same chunks during the whole epoch and the number of topics is bounded by the number of rows and columns (1024, though still high). The same isn't true for private random sampling. Even if topics are allocated per 1000 samples, thus requiring around 200 topics in total, individual clients still only need ~75 out of 1000. This would produce a lot of wasted bandwidth and is likely to still suffer from churn issues.
Perhaps less intuitively, this approach has another fault that both validators and private clients could suffer from. A rogue builder trying to make a block unavailable could potentially push (to the majority of topics) just enough samples to convince a majority of validators of availability, but not enough to reconstruct the full block. Even if the builder is unaware of the subscribers' identities, it still has more control over sample disclosure. And since there is a finite number of groups (subnets), the builder can perform a withholding attack on a minority of peers.
The primary observation behind this approach comes from the fact that it is possible to randomly sample entries of a set that is uniformly distributed across the P2P network by simply asking some number of peers about the entries they store. However, there are a number of challenges with this method. First of all, the client may not (and probably won't) have enough contacts to ask for the needed number of samples. This is especially true for light clients, which usually maintain one or two connections to full nodes in the network. The solution to this problem is a random walk. A simple random walk is a sequence of nodes visited where, at each step, the next destination node is selected with a fixed probability.
Collecting uniform random data samples is also challenging due to the irregular nature of the P2P network: varying degrees of connectivity between peers and widely varying amounts of shared data. In [7: p. 3-6], the authors use an algorithm based on Metropolis-Hastings: actively tweaking the length of the random walk (known as the mixing time of the Markov chain) allows converging to a stationary uniform distribution. For this, the authors also instantiate a virtual network, where internal (close) links are used for communication, while external (long-range) links are used to perform the random walk. Further research on this method is needed, but it is clear that implementing it in the existing network would be quite challenging.
In DHT networks, there is an opportunity to develop specialized random walk algorithms that take advantage of the distinctive properties of the network graphs. The simplest method is to address queries to random node ids until, by chance, an occupied one is found. Using this method, a node is found with a probability proportional to the portion of the id space it covers, which is not the same for every node. It is possible to account for this unevenness when accepting samples from a "randomly" found node: accept with a probability inversely proportional to the size of the id-space region that node covers, and retry otherwise.
Overall, random walks can be an interesting option for private random sampling, but they lack the precision needed for row/column sampling and reconstruction. It would also be more challenging to formally prove the security and effectiveness of this technique.
In Kademlia, key lookups involve contacting a logarithmic number of nodes until a node responsible for the key is found. It is a simple, reliable, and time-proven method for searching data in P2P networks. To assess the suitability of KAD lookups for random sampling, we identify the following three requirements:
Our findings show that Kademlia is in fact a promising candidate for carrying out a large load of random sample queries:
Now we define how a sampling client works. A sampling client starts by randomly picking sample ids = [H(fork_digest, randao_mix, data_id, i) for i in sample_indexes]. It then takes the first sample-ids and works as follows: for each sample-id, in parallel and asynchronously, the client searches its routing table for the nodes closest to that sample-id and performs a lookup; if a lookup fails, the client can take different sample-ids and retry; once enough samples are found, the client can consider the block available. In this case, the client also re-broadcasts all found samples to the rest of the network.
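A sketch of the id derivation above, assuming SHA-256 as H and illustrative byte encodings of the inputs:

    import hashlib

    def sample_id(fork_digest: bytes, randao_mix: bytes, data_id: bytes, index: int) -> int:
        # H(fork_digest, randao_mix, data_id, i) interpreted as a 256-bit key in the DHT id space.
        digest = hashlib.sha256(fork_digest + randao_mix + data_id + index.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "big")

    sample_indexes = range(75)  # ~75 randomly chosen indexes in practice
    ids = [sample_id(b"\x00" * 4, b"\x11" * 32, b"\x22" * 32, i) for i in sample_indexes]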
In the rest of this section, we take a look at optimizations to further improve latency, coverage, and lookup consistency.
Random sampling is a latency-sensitive application. Kademlia lookup latency, on the other hand, is only barely good enough. Following are some optimization strategies that can be incorporated into the overlay DHT to consistently improve lookup time.
Lookup latency is proportional to the number of hops it takes to reach the destination node. It is possible to reduce this number by increasing the total number of buckets, which is one of the modifications in the Discv5 protocol. However, an extensive study in [11: p. 7] reveals that it is just as effective to increase the size of individual buckets, which is also simpler to implement, results in lower maintenance cost, and improves resistance to churn as a side effect.
Another particularly interesting insight was mentioned in [9: p. 5]: given the structure of a Kademlia routing table, on average, the first bucket is used in half of the lookups, the second bucket in a quarter of the lookups, and so forth. Hence it is only logical to enlarge buckets proportionally to the probability of them being used in a given lookup, e.g. the first buckets hold 128, 64, 32, and 16 nodes respectively, while the rest of the bucket sizes remain at 8 nodes. In the same work, the authors measure the effectiveness of this optimization and conclude that only 0.1% of modified lookups take over a second, compared to over 27% in the Mainline DHT.
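In code, such a bucket-size schedule is trivial to express (sizes taken from the example above; the indexing convention is illustrative):

    def bucket_capacity(bucket_number):
        # The first (most frequently used) buckets are enlarged; the rest keep the default size.
        sizes = {0: 128, 1: 64, 2: 32, 3: 16}
        return sizes.get(bucket_number, 8)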
To achieve lower latencies, the overlay nodes must be aware of the underlying network topology. Without any such knowledge, a single overlay hop has an average latency equal to the average round-trip time (RTT) of the underlying network. In [12: p. 2-3] the authors describe three approaches to reduce routing latency by exploiting knowledge of the physical network topology, including proximity-aware neighbor selection and topology-aware node-id distribution.
Obviously, the above methods assume some degree of awareness of the nodes' physical distribution, which may not be known in advance. Also note that such knowledge may open certain DoS attack vectors that wouldn't otherwise exist. Nonetheless, it might be quite advantageous to consider the latency of past responses in the routing algorithm. In Section 4.4.3 we proposed scoring peers based on RTT. The impact of low-RTT bias was studied in [9: p. 4]. The authors reveal that Kademlia nodes with a proximity routing modification are 90% more likely to perform routing under 500 ms.
Kademlia's original paper only specifies iterative lookups, as described in Section 2.3. The definition of recursive routing is given in Section 4.5. The primary benefit of doing lookups recursively is to reduce latency, since on each hop the node forwards the message directly, without communicating with the lookup initiator first. Additionally, this greatly reduces time spent traversing NAT, which we discuss in Section 7.2.2.
In iterative routing, data collected during routing helps maintain and fill up Kademlia's k-buckets. Hence, in recursive mode there must be a mechanism to notify the initiator of the nodes found on the path. More importantly, the initiator also needs to receive the value corresponding to the key it started the lookup for.
For this, the authors of R/Kademlia [13: p. 104-105] present two signaling modes to pass data back to the lookup initiator: direct signaling, where each hop reports back to the initiator directly, and source-routing signaling, where the response travels back along the accumulated path.
Acknowledgments are essential for recursive routing too. With ACKs the occurrence of failed nodes on the path does not result in a rollback of the whole routing procedure, but only in forwarding the message to an alternative node.
The experimental results in [13: p. 106] reveal an almost two-times latency reduction compared to scenarios where iterative routing is used.
We have already described a number of techniques and optimizations to achieve good coverage during the sample dissemination phase. We believe that clients doing random sampling can assist with this goal, even after the main dissemination work has taken place. This is particularly useful for historical sampling, but also helps prevent the destructive effects of high churn. Methods in this section assume that the network is configured to accept a certain level of redundancy when it comes to storing samples, which it certainly should (again, because of churn).
During the course of the sample lookup process, a client may encounter nodes along the search path which do not have the sample but for which the sample-id does fall within the replication radius. These are nodes that should be interested in storing this sample. If the client successfully retrieves the sample from another node, it should send an Offer message for that sample to those interested nodes. This mechanism is designed and implemented by the Trin team to help continually improve coverage of the content stored in the Portal Network. We believe that it would be useful for the purposes of DAS as well.
The aim of DAS is to prove that data was made available at some point in time, and for long enough that all interested actors could download it. Generally, any attempt to prevent this from happening should be considered adversarial behavior. We assume that DAS components are deployed properly, i.e. there are enough sampling nodes in the network, clients query the network for a sufficient number of randomly chosen sample ids, KZG proofs are sound, and all actors are sufficiently incentivized. We thereby make the consequent assumption that any attack that results in explicitly unavailable data (e.g. failed dissemination, a large-scale eclipse attack, etc.) will only lead to the block being discarded due to a failed availability check.
Hence, the real concern is attacks that allow one to trick the majority of sampling validators about data availability while selectively withholding that data. We must also seek ways to prevent attacks that result in continuous failure of availability verification, thereby halting consensus and block production.
In the original DAS paper [14: p. 15], the "selective share disclosure" attack is defined by the ability of adversaries to link sample requests to individual sampling clients, which allows them to selectively disclose just enough shares to convince a majority of clients of availability when in reality not enough data was released to reconstruct the whole block. To prevent this attack, each sample request must be sent over an anonymous channel and received in random order, such that it would be impossible to correlate it back to its sender.
Achieving this in practice, however, is far from trivial. Firstly, such a channel would require adding a random delay to each request, thus increasing the overall latency. Secondly, while anonymizing the inbound requests is generally figured out, it is usually much harder to route responses back to the requestor without compromising on either privacy or latency even more. Finally, the sheer load of sampling requests is significant enough to cripple any privacy layer that doesn't scale well.
The use of Chaumian mixnets was mentioned in the DAS paper. Unfortunately, the problem with their underlying fixed-cascade topology is that it scales pretty badly. Additionally, their mixing latency is unbounded, which is particularly bad considering our strict time constraints. An alternative design, known as stratified topology, is shown [15: p. 13] to have superior scalability and is currently led by the Nym mixnet. In fact, Nym latency is claimed to reduce with the number of users, as less per-node mixing delay is required: for 10k clients the expected latency is around 150ms. Receiving responses, however, is still challenging: Nym provides the SURBs (Single Use Reply Blocks) mechanism, but it imposes a payload limit (2 KB) and causes additional latency. Even more problematic is conducting Kademlia lookups behind the mixnet: if all requests are sent anonymously, then for a network of 10k validators the average lookup latency grows with every hop, as each one pays the full mixing delay.
It is useful to break the anonymity problem with Kademlia lookups in two: 1) private routing and 2) private data retrieval. The goal of (1) is to find the nodes in the network responsible for storing a sample without revealing interest in the specific sample-id to all neighbors. For this, we propose a modified lookup procedure that searches by a prefix of the sample-id rather than the full sample-id, which only the requesting client knows. The client can then select only those nodes whose ids fall within the shortest XOR-proximity to the actual sample-id and proceed to the next step.
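A sketch of this candidate-selection step, assuming a hypothetical lookup_by_prefix primitive that returns the ENRs of nodes whose ids fall into the queried prefix region:

    def find_candidate_holders(self, sample_id: int, prefix_bits: int = 16, keep: int = 3):
        # Query the network only by a short prefix of the sample-id, so neighbors cannot
        # tell which exact sample the client is interested in.
        prefix = sample_id >> (256 - prefix_bits)
        candidates = self.lookup_by_prefix(prefix, prefix_bits)
        # Locally narrow the results down to the nodes closest to the full sample-id.
        candidates.sort(key=lambda node: node.node_id ^ sample_id)
        return candidates[:keep]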
The aim of (2) is to download the sample data from the node storing it (or to gain confidence that it really does) without revealing the identity of the interested client. As the previous step resulted in multiple found nodes, they all need to be asked. However, instead of asking them openly or through a slow mixnet, we propose the following clarification step: a client asks each node what samples they HAVE, to which a node could respond with all the sample-ids it knows. Of course, due to the large bandwidth overhead, that would be impractical. Instead, two cryptographic techniques are at our disposal: private set intersection (PSI) and Bloom filters.
PSI allows checking whether an element or key is known by both parties, without revealing any additional information. In practice, this is usually done by running a multi-round protocol between the parties that involves ECDH (Elliptic Curve Diffie-Hellman), binding, and checking steps. A Bloom filter (or Cuckoo filter, or Golomb-compressed set) is a space-efficient probabilistic data structure that a node can send to a client to allow checking, with a small false-positive probability, whether a sample is known by that node. Often both of these primitives are used together to achieve the accuracy and efficiency they lack in isolation. Finally, once a node that is likely to store a sample is identified, a client can issue a download request secured by the mixnet or a PSI-compatible data transfer (needs further research).
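A minimal Bloom filter sketch for such a HAVE exchange; the sizes and hash construction are illustrative, not a vetted parameterization:

    import hashlib

    class BloomFilter:
        def __init__(self, size_bits=8192, hashes=3):
            self.size = size_bits
            self.hashes = hashes
            self.bits = 0

        def _positions(self, item: bytes):
            for i in range(self.hashes):
                digest = hashlib.sha256(bytes([i]) + item).digest()
                yield int.from_bytes(digest, "big") % self.size

        def add(self, item: bytes):
            for pos in self._positions(item):
                self.bits |= 1 << pos

        def may_contain(self, item: bytes) -> bool:
            # False positives are possible, false negatives are not.
            return all(self.bits >> pos & 1 for pos in self._positions(item))

    # A node advertises its stored sample-ids; a client checks locally which of its
    # randomly chosen ids the node is likely to hold, without naming them to the node.
    node_filter = BloomFilter()
    node_filter.add(b"sample-id-1")
    assert node_filter.may_contain(b"sample-id-1")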
Currently, two independent teams are working on prototyping such modifications for IPFS in hopes of improving privacy for content consumers. ChainSafe's privacy engineering team has successfully prototyped and tested prefix lookups for the Libp2p-based KAD system. In parallel, another line of prototyping work is being done to extend the Bitswap protocol with PSI and bloom filters. Overall, this is quite an interesting design space and we look forward to expanding on it in future research.
A failed availability check doesn't pose an existential threat as long as it is possible to retry in the next slot. What's worrying, however, is an attacker who is able to continuously make data unavailable, thus halting the chain and all L2s with it. Attacks such as Sybil, Eclipse, and DoS can potentially be performed to cripple the DAS network. Unfortunately, DHTs, including Kademlia, are known to be especially vulnerable to these attacks. It is possible to limit the attacker's ability to carry out these attacks by preferring long-lived contacts, implementing a more sophisticated identity system, and other well-researched techniques. In this work, we focus on one technique that seems most effective in the context of DAS.
A structured overlay can employ not one but multiple routing tables, each containing different sets of peers. In S/Kademlia [16: p. 3-5], key lookups are performed first using the primary table (which involves all nodes), and if this fails, using the secondary table, which contains only the nodes who have solved PoW puzzles to mine hardened node-ids. In a recent adaptation by Dankrad, PoW puzzles were replaced with "proof of validator" and the idea of multiple routing tables was applied to secure DAS. The result is a hardened DHT that can theoretically withstand large-scale Sybil and DoS attacks. This promising design space is currently being explored by EchoAlice.
In this section, we go over some practical considerations for implementing DAS components and the specific algorithms we have introduced so far.
There are a number of considerations when it comes to transport layer protocols: reliability, communication overhead, packet size limits, and open connection limits. As Discv5 is the primary tool for achieving the goals of our work, we will focus on the UDP (User Datagram Protocol) transport. Reasons why UDP is said to be a mandatory wire protocol for Discv5 can be found here.
A defining design choice of UDP is the absence of acknowledgements for packets. Thanks to this, UDP's speed is superior to TCP's, but it is also a big reliability trade-off that should be considered. To minimize the likelihood of packet loss, application-layer protocols usually enforce a packet size limit; the Discv5 spec sets this to 1280 bytes. Now, since each sample is 512 bytes and nodes would potentially need to send them by the thousands, this immediately becomes a major issue. Following are some approaches to work around this limitation.
The first approach can be found inside Discv5 itself. Namely, the NODES(request-id, total, [ENR, ...]) response supports sending up to 16 ENRs, but since each ENR can be up to 300 bytes, the remote node sends multiple such responses, each carrying at most 3 ENRs. The same trick can be applied when sending samples, i.e. up to 2 samples can be sent in one UDP packet.
This method is very efficient, as each message consists of useful bytes only, nothing more, nothing less. Unless, of course, a packet is lost, in which case those samples may never be disseminated and data availability will be at risk. Even a 0.01% chance of packet loss is considerable enough to warrant some sort of automatic retransmission mechanism. Implementing it at the application layer is tricky and error-prone, so adopting some existing solution is generally safer and easier.
Discv5 is designed to be minimal and concise, but it can be easily extended using the TALKREQ and TALKRESP methods, which can power an unlimited number of sub-protocols. To avoid conflicts, one only needs to specify the protocol argument in the request body. It is therefore possible to support any higher-level UDP-based protocol on top of Discv5's generic RPCs. As it happens, one such protocol is already implemented by the Portal Network team in their Rust client called Trin.
μTP (Micro Transport Protocol) is a simple streaming protocol originally designed to provide reliable and ordered delivery for BitTorrent over UDP, while avoiding the poor latency and congestion control problems found in TCP-based clients. The current version of the Portal Network follows the original BEP29 spec and the uTP reference implementation, but instead of a raw UDP packet, the TALKREQ message is used as transport.
In Trin, uTP streaming is always used in a response (e.g. ACCEPT(conn_id) for OFFER(content_id)). In our case, streaming is instead applied as a request call (e.g. SAMPLES(conn_id)). Note that a Discv5 implementation may queue sending the next TALKREQ until a TALKRESP for the previous request to the same node is received. This would block the uTP handshake when used as a request. To avoid this, we propose that nodes receiving a uTP connection-id respond with PROMISE(promise_id) immediately and use it to send the request results later as a separate TALKREQ. See the diagram below for an example.
As seen, Discv5 implementations come with their own design restrictions, which may cause inconvenience or even degrade performance. Instead, it could be better to use raw uTP or perhaps some other protocol that already has a performant library. The only concern is that time spent negotiating another transport over a new connection may end up offsetting the benefits of that transport.
The Discv5 specification describes TALKREQ as a means of pre-negotiating connections for new application-specific protocols. In the words of Felix Lange, the intended use of this method is "agreeing to talk", which is also the reason for its name. Overall, there are two things to exchange before streaming over a new protocol can proceed: a new port and an encryption key.
Key exchange is the more expensive of the two, so it is worth reusing existing keys. In theory, we can skip negotiating a new port as well: since UDP is stateless, nothing prevents multiple processes from receiving packets on the same port. In practice, this requires extending each packet with a special header, which would be used to dispatch messages to their intended handlers: some to Discv5, others to whatever secondary wire protocol is chosen. This isn't a significant overhead, but it certainly is more than just sending a 16-bit number (the port) once.
The most notable advantage of having a secondary wire protocol decoupled from Discv5 is the ease and flexibility of integrating such a protocol. For instance, one can plug in any existing and battle-tested implementation of uTP: libutp, meqif/rust-utp, anacrolix/go-libutp. There are other UDP-based protocols worth considering.
KCP is a reliable ARQ (Automatic Repeat reQuest) protocol that trades 10-20% bandwidth overhead for a 30-40% latency reduction (compared to TCP). Among other optimizations, KCP retransmits packets more quickly by anticipating packet loss instead of waiting for retransmission timeouts. It natively supports stream encryption and forward error correction. KCP is often used on low-resource devices like routers and in applications related to online gaming and audio/video transmission. Available implementations are skywind3000/kcp, xtaci/kcp-go, en/kcp-rs + libp2p stack integration paralin/go-libp2p-kcp. Learn more about KCP here.
QUIC is a recently standardized transport protocol that stacks multiplexing and retransmission control on top of UDP. All data streams in QUIC are independent and processed concurrently; each manages its own flow control and retransmission of lost data. QUIC connections can be terminated on demand or automatically after an idle timeout. It natively handles stream encryption, but forward error correction is still in draft. Available implementations are cloudflare/quiche, lucas-clemente/quic-go, quinn-rs/quinn + official integration with the libp2p stack.
According to this simulation, KCP tends to have better latency than QUIC, particularly with high packet loss rates.
Most p2p routing and search algorithms, such as flooding, random walk, and DHT lookups, assume that peers are equally accessible to each other. Literature [17: p. 378], however, shows that more than 80% of computers are behind routers or firewalls, thus requiring NAT (Network Address Translation) traversal to be reached. As a result, messages follow a longer path: the sender cannot initiate a direct connection to a remote peer behind NAT right away, so a technique known as "hole punching" must be applied first. This becomes particularly problematic with Kademlia's iterative lookups, where the initiator needs to contact many previously unknown peers, leading to high NAT traversal overhead.
In UDP, no explicit hole punching is required if the NAT setup is capable of full-cone translation, i.e. the first packet sent to a remote node establishes a port mapping that allows subsequent packets to reach the node behind NAT. Nonetheless, it's worth considering application-layer techniques capable of amortizing NAT overhead regardless of the address/port translation type used.
With iterative routing, a direct connection can be established with the help of neighbor nodes. When a node queries one of its neighbors for certain data and retrieves a list of candidate nodes, this neighbor can help the initiator and the candidates establish a direct connection, since the neighbor is already connected to both. This assumes that peers maintain long-lived connections with the neighbors from their routing table. In practice, this requires nodes behind NAT to periodically send UDP keep-alive messages to all their neighbor nodes; nodes in Discv5 already do this as part of the routing table maintenance routine. The authors of [17: p. 6] show that using this technique in what they call the "strongly-connected model" leads to a 60% reduction in the number of hops and generally outperforms the existing model in delivery rate and scalability.
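As a rough illustration, the keep-alive routine can be as simple as the following sketch; the 15-second period and the tiny payload are illustrative assumptions, and in Discv5 this role is effectively played by the regular table-maintenance PINGs:

```rust
use std::net::SocketAddr;
use std::time::Duration;
use tokio::net::UdpSocket;

// A minimal sketch of a keep-alive routine a NAT-ed node could run to keep
// port mappings towards its routing-table neighbors fresh. Period and payload
// are illustrative assumptions.
async fn keep_alive(socket: &UdpSocket, neighbors: &[SocketAddr]) -> std::io::Result<()> {
    let mut interval = tokio::time::interval(Duration::from_secs(15));
    loop {
        interval.tick().await;
        for addr in neighbors {
            // Any small datagram is enough to refresh the NAT translation entry.
            socket.send_to(b"ping", *addr).await?;
        }
    }
}
```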
Recursive routing can greatly reduce the need for establishing new connections, as a lookup message travels between neighboring peers, which by assumption already maintain existing connections. However, since the lookup originator still needs to learn about the nodes found along the path, those nodes, including the destination, must report the contacted candidates back to the originator. The problem is that the originator and most nodes on the routing path are not peers, so they are not directly connected, which leads to NAT overhead yet again. Luckily, the source-routing signaling described in Section 5.4.3 reduces NAT-ed communication to once per lookup (between the destination node and the initiator).
The Discovery V5 network consists not only of Ethereum nodes; it also serves other networks, forks, testnets, etc. Hence, we need a way to separate DAS nodes from everything else, i.e. an overlay. This should require no changes to the Discv5 protocol, but may extend it with new features, such as FIND_VALUE lookups, an alternative/bigger routing table, a modified maintenance routine, peer scoring, etc.
Topic advertisements are used by nodes to make themselves discoverable under a specific topic. Internally, this is implemented using a data structure known as a "topic table", which holds multiple queues (FIFO lists), each containing nodes that have advertised themselves under a specific topic. Ethereum nodes involved in DAS can advertise themselves using a special topic. This way, dissemination requests and sample lookups can be restricted based on the current state of local topic tables. Assuming that new ads are throttled for a certain amount of time (15 min) before being put on air, it would be challenging to keep this data up to date, especially in the presence of high churn. This approach also gives no advantage in extending Discv5, as it only provides a way to separate nodes based on the services they provide.
Peers can maintain a secondary routing table forming an independent DHT exclusively for the purposes of DAS. The application-specific routing table can be initialized with records from Discv5's primary table. In case only a subset of Ethereum nodes is intended to assist with DAS, topic advertisement can be used to selectively pick relevant peers within the advertisement radius. Once initiated, the secondary routing table can grow independently from Discv5's primary one, hosting records that are not present in the latter.
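A minimal sketch of this bootstrapping step follows, with a simplified stand-in for ENR records and an assumed DAS-support flag; a real implementation would rely on a dedicated ENR key or topic advertisement instead:

```rust
// A minimal sketch of seeding the DAS overlay table from records already known
// to the primary Discv5 table. The `Enr` stand-in and the `supports_das` flag
// are simplified placeholders for real ENR records and a dedicated ENR key.
#[derive(Clone)]
struct Enr {
    node_id: [u8; 32],
    supports_das: bool,
}

struct OverlayTable {
    entries: Vec<Enr>,
}

impl OverlayTable {
    /// Bootstrap the overlay from the primary table, keeping only peers that
    /// advertise DAS support; the table can then grow independently.
    fn bootstrap_from_primary(primary: &[Enr]) -> Self {
        let entries = primary
            .iter()
            .filter(|enr| enr.supports_das)
            .cloned()
            .collect();
        OverlayTable { entries }
    }
}
```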
Modification-wise, the application-specific table can be configured to be larger, and the k-buckets table can be replaced with an alternative data structure, e.g. the binary trie used for prefix-based routing described in Section 4.3.3. Discv5's TALKREQ/TALKRESP messages are well suited to carry application-specific communication, though it's also possible to open a separate UDP connection or use other higher-level wire protocols for this purpose. The overhead of maintaining two routing tables is actually quite low, because Kademlia can do most updates based on the data extracted from DHT requests. Additional tasks specified by Discv5 are infrequent and are performed on a per-peer basis anyway. Any extra parameters (e.g. peer scoring) can be maintained as a separate routine.
Finally, one can extend the overlay functionality by introducing application-specific routing and lookup algorithms. An example of this is the FIND_CONTENT lookup specified by the Portal Network and implemented in Trin.
In order to evaluate and compare the different algorithms presented in this work, we developed a simulator[1]. At its center we placed a modular DAS node that behaves according to a simulation plan supplied as a set of CLI arguments, which include: 1) the core algorithm (dissemination/sampling) to run, 2) various strategies (routing, bundling, peer selection), 3) the wire protocol (Discv5-uTP/Libp2p-TCP), and other algorithm-specific parameters (e.g. parallelism).
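For illustration, here is a hypothetical sketch of how such a plan could be expressed with clap; the flag names and enum variants are illustrative and do not claim to match the prototype's actual CLI:

```rust
use clap::{Parser, ValueEnum};

// A hypothetical sketch of a simulation plan parsed from CLI arguments.
// Flag names and variants are illustrative, not the prototype's actual CLI.
#[derive(Parser)]
struct SimulationPlan {
    /// Core algorithm to run.
    #[arg(long, value_enum)]
    algorithm: Algorithm,
    /// Routing mode used to traverse the DHT.
    #[arg(long, value_enum)]
    routing: Routing,
    /// Bundling strategy used during dissemination.
    #[arg(long, value_enum)]
    bundling: Bundling,
    /// Wire protocol used between nodes.
    #[arg(long, value_enum)]
    wire: Wire,
    /// Number of in-flight requests per node.
    #[arg(long, default_value_t = 16)]
    parallelism: usize,
}

#[derive(Clone, ValueEnum)]
enum Algorithm { Dissemination, Sampling }

#[derive(Clone, ValueEnum)]
enum Routing { Iterative, Recursive }

#[derive(Clone, ValueEnum)]
enum Bundling { Distance, Buckets, Prefix }

#[derive(Clone, ValueEnum)]
enum Wire { Discv5Utp, Libp2pTcp }

fn main() {
    let plan = SimulationPlan::parse();
    // hand `plan` to the simulator...
    let _ = plan.parallelism;
}
```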
We developed a simple topology bootstrapping mechanism that generates a configured number of nodes (ENRs) and fills their routing tables according to the passed arguments. Supported topologies are: linear (for simple routing latency tests) and uniform (closer to a live network). We allow saving the generated topology along with the randomness seeds used elsewhere during the simulation (e.g. for picking a set of sample-ids during random sampling). Although we cannot guarantee full reproducibility the way something like testground does, it is quite convenient for comparing alternative algorithms on the same topology.
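A minimal sketch of the seeded generation idea: re-running with the same seed reproduces the same node ids, and hence the same topology, which is what makes side-by-side comparison of algorithms possible. The 32-byte node id and the uniform generation are illustrative simplifications.

```rust
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

// A minimal sketch of seeded topology generation: the same seed always yields
// the same node ids, so routing tables built from them are reproducible too.
fn generate_node_ids(num_nodes: usize, seed: u64) -> Vec<[u8; 32]> {
    let mut rng = StdRng::seed_from_u64(seed);
    (0..num_nodes)
        .map(|_| {
            let mut id = [0u8; 32];
            rng.fill(&mut id[..]);
            id
        })
        .collect()
}
```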
There are a few reasons why we decided to roll our own simulation software as opposed to using existing tools like testground or PeerSim. Firstly, instead of plain Kademlia we employ Discv5, which, while technically similar, still has some major design differences. Secondly, our solutions often require particular modifications to the original components of Kademlia, such as changes to the underlying routing protocol. Finally, we wanted to re-use some of the existing solutions built by other teams, such as ethereum/trin, from which we forked the discv5-overlay functionality. It is also because of the latter that we chose Rust as the primary programming language for this project.
The simulation results are prepared based on a modified Performance vs. Cost (PVC) framework [10: p. 2] to identify the best algorithms and parameter combinations with regard to routing latencies and bandwidth consumption. We measure the total number of messages sent over the wire as our cost metric (orange lines). We ignore storage cost (e.g. samples written to nodes' local stores) because communication is typically far more expensive than storage. For the performance metric we measure the total time it took to disseminate all samples into the network (blue lines). We then plot both of these metrics on vertical axes against the number of samples. We tried to keep the network size (number of nodes) consistent across all tests, but for certain configurations that wasn't possible due to technical difficulties that we haven't yet solved.
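In code, the two metrics boil down to a message counter and a wall-clock timer; the following is a minimal sketch, with `record_message_sent` assumed to be called from the wire layer:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

// A minimal sketch of the two PVC metrics: cost as the total number of
// messages sent over the wire, performance as the wall-clock time of a run.
static MESSAGES_SENT: AtomicU64 = AtomicU64::new(0);

/// Called by the wire layer whenever a message goes out.
fn record_message_sent() {
    MESSAGES_SENT.fetch_add(1, Ordering::Relaxed);
}

/// Runs one dissemination pass and returns (performance, cost).
fn run_and_measure<F: FnOnce()>(disseminate: F) -> (Duration, u64) {
    MESSAGES_SENT.store(0, Ordering::Relaxed);
    let start = Instant::now();
    disseminate();
    (start.elapsed(), MESSAGES_SENT.load(Ordering::Relaxed))
}
```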
The following figures demonstrate performance vs. cost simulation results for the different routing and bundling strategies (left graph). To demonstrate coverage efficiency and uniformity, we also plot the distribution of samples stored by each node in the network (right graph).
So far we have only seen the performance of a network communicating through uTP over Discv5 TALKREQ RPCs. This approach is efficient, but it doesn't scale very well as bandwidth (the number of samples) increases, which is most noticeable with iterative routing, where the originator performs most of the dissemination work. For comparison, we include the same measurements for a network running on a strikingly different transport: TCP as implemented in the Libp2p stack.
We evaluate random sampling using the same cost vs. performance framework. For performance, we measure the time it took 100 concurrent clients to query 75 random samples each (blue bars). The cost metric is measured as the number of messages sent over the wire (orange bars). We show the results of these metrics for each of the four dissemination configurations. Additionally, we demonstrate the sampling failure rate by comparing the number of successfully retrieved samples to the total number of queries (7500).
This document presents our research on the Data Availability problem space. This includes findings sourced from a large body of academic work and our discussions with industry experts. The primary focus of this work was on sample dissemination and random sampling, for which we lay down different approaches related to unstructured peer-to-peer networks and structured overlays, as well as some practical considerations regarding security, network faults, and transport protocols. We then propose and briefly specify multiple algorithms and optimization strategies. In doing so we carefully consider the underlying cryptography and information theory of DAS as well as Ethereum's networking model.
Our sample dissemination design includes 2 routing modes (iterative and recursive) used to traverse the DHT, 3 bundling strategies (by distance/buckets/prefix) for routing data chunked into a large number of pieces, and multiple peer selection settings that control redundancy and peer scoring. We conclude that bucket-wise bundling has a bigger optimization surface, but its design is more nuanced than the distance- or prefix-based strategies. The design space of peer selection strategies is even broader: the studied peer scoring models (based on latency, routing table size, and role) all come with inherent flaws that make it hard to justify their use. Redundancy also comes with a variety of configurations, of which we lean towards replicate-some/RS + forward-all/FA, but leave determining a specific redundancy multiplier for future research and experimentation.
For random sampling, we analyzed both unstructured approaches (topic subscription and random walk) and structured ones (DHT lookups) and conclude that only the latter satisfies all underlying requirements. We also include different techniques to improve lookup performance and coverage. Among them, the most promising in our view are bucket resizing, the POKE mechanism, and recursive routing.
Finally, we developed specialized simulation software and used it to test and benchmark our solutions. The results show that iterative dissemination is consistently slower than the recursive routing mode. This trend persists during random sampling as well, resulting in longer lookups and a significant failure rate. When it comes to bundling strategies, distance-wise bundling appears to be more efficient during dissemination but leads to slower sampling than bucket-wise bundling.
For the future, we identify the following directions worth further exploration: 1) proofs of custody and 2) sampling validator privacy. Proofs of custody were initially considered as a means to penalize "lazy validators" for signing data without actually downloading and verifying its correctness, but they can also be used to require validators to store their assigned chunks for a certain period of time, e.g. 1 month. State-of-the-art techniques are interactive (MPC) and non-interactive (SNARKs). Current efforts [I, II, III] target bandwidth and interactivity reduction for the existing schemes.
The sampling validator privacy problem was already discussed in this work. We have proposed and briefly analyzed two approaches, based on mixnets and on prefix-based lookups + bloom filters respectively. In the future, we plan to build on the second approach and test its practicality in the context of DAS.
We thank Protolambda (Optimism), Jacob Kaufmann, Ognyan Ogenev (EF/Portal Network), Cayman Nava (ChainSafe) and EchoAlice (EPF) for feedback and all the helpful discussions.