Thanks Emmanuel Nalepa for the review and contribution.
This doc aims to analyze PeerDAS sampling spec and propose a change to the spec to mitigate the underlying issue.
EIP-7594: PeerDAS - Peer Data Availability Sampling proposes an intermediate step towards full Danksharding, its design separates network interaction between nodes into two phaes: distribution and sampling.
During distribution phase, blobs were extended horizontally (1D PeerDAS), and then divided into NUMBER_OF_COLUMS (currently 128)
sections and distributed across DATA_COLUMN_SIDECAR_SUBNET_COUNT (currently 32)
subnets. Each node is required to custody at least CUSTODY_REQUIREMENT (currently 1)
subnets (4 columns).
During sampling phase, quote the current spec
There is 1 issue with the current spec:
a diverse set of peers for each column and each slot
does not provide worst case guarantee that all the peers the node has covers them all. In fact, from calculations we'll see in later sections, it's far from enough and mathematically impossible if we just select random peers. Let's do some calculations:
NUMBER_OF_COLUMNS = 128
DATA_COLUMN_SIDECAR_SUBNET_COUNT = 32
TARGET_NUMBER_OF_PEERS = 70
CUSTODY_REQUIREMENT = 1 => custody 4 columns at least
Let's calculate the expected number of columns our peer set could cover:
Probability that a Specific Number is Not Covered:
Probability that a Specific Number is Not Covered in 70 peers:
Probability that a Specific Number is covered at Least Once in 70 peers:
expected number of distinct numbers covered
With E(distinct number) = 114
, this means that with 70 expected peers, they're only expected to cover 114 of total 128 columns, leaving 14 columns not covered. And increasing peer number here does not have significant effect on the outcome (See figure below).
The expectation brings the issue in front of us: how do we select columns to sample?
Approach: randomly select number of columns from what our peer set custodies (minus the columns the node currently custodies)
Pros:
Cons:
Approach: we randomly select SAMPLES_PER_SLOT
columns from 0 to 127.
The drawback is that with expected distinct columns covered at 114, the chance that they are all covered are very low, clients need to consider how to handle this mismatch situation and this greatly complicates implementation (in terms of retry, error handling and parallelization).