# DAS modeling and simulations

## Dissemination of rows and columns to validators

***Baseline assumptions (this is how the current model works, but almost none of these are cast in stone):***

* Block data is encoded using 2D RS coding from 256x256 to 512x512
* Dissemination is done using GossipSub channels per row and per column
* Validators instruct their beacon nodes to subscribe to the channels corresponding to the selected rows and columns
* Beacon nodes subscribe to the union of the channels of all their validators
* Reconstruction is done at the beacon node (not validator) level
* Reconstruction is done by beacon nodes on the fly, injecting missing (reconstructed) data into the respective channels when possible
* Attestations are based on the information received this way; no further sampling is performed
* Validators also listen when not attesting, and take this into account in their later attestations

### Find baseline node-to-validator ratio (and distribution) to use in our simulations

* Goal: determine a realistic and parametrizable distribution for the number of validators per beacon node
* Methodology: reference data points from a crawler -> analytical model
* Data
  * Number of nodes in the network: 3500 ~ 5500 (based on Migalabs)
  * Number of nodes in the network: ~10K total, ~7K synced
  * Number of validators: 450K ~ 500K (based on beaconcha.in)
* Model
  * Model different node-to-validator distributions as a superposition of validator "categories"
  * 1st model:
    * Staking pools: Gaussian centered around 100?
    * Solo stakers: spike around 1 validator/node?
    * Ratio of the two categories

### Each validator selects X rows and columns based on local randomness (X=2 by default)

* Vary from 1 to 512? (Some configurations are unrealistic)

[csaba] Q: With 5K-ish nodes and 400K-ish validators, and with 2\*512 channels, how many nodes will be on a given row (or column) channel?
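The question above can be estimated numerically: a node with $v$ validators subscribes to a given channel with probability $1-(1-X/512)^v$, so the expected number of nodes on a channel is the sum of this probability over all nodes. A minimal sketch, assuming the two-category distribution described in the 1st model (solo ratio, pool mean, and all other numbers are illustrative placeholders, not measured values):

```python
import random

CHANNELS = 512   # rows (or columns) after 2D extension
X = 2            # rows (or columns) selected per validator

def p_in_channel(v, x=X, channels=CHANNELS):
    """P that a node hosting v validators subscribes to a given channel."""
    return 1 - (1 - x / channels) ** v

def sample_validators_per_node(n_nodes, solo_ratio=0.8, pool_mean=100, pool_sd=30):
    """Superposition of solo stakers (1 val/node) and pools (Gaussian)."""
    counts = []
    for _ in range(n_nodes):
        if random.random() < solo_ratio:
            counts.append(1)
        else:
            counts.append(max(1, round(random.gauss(pool_mean, pool_sd))))
    return counts

random.seed(42)
nodes = sample_validators_per_node(5000)
# Expected nodes on one row channel = sum of per-node subscription probabilities
expected_nodes_on_channel = sum(p_in_channel(v) for v in nodes)
print(f"total validators: {sum(nodes)}")
print(f"expected nodes on one channel: {expected_nodes_on_channel:.0f}")
```

Varying `solo_ratio` and `pool_mean` shows how channel population depends on the assumed node-to-validator distribution, not just on the totals.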
Probability of node $n$ with $v$ validators participating in a given channel $C$:

$$ P(n \in C) = 1 - \left(1 - {X \over 512}\right)^v $$

### What is the effect of X on attestations, bandwidth, and latency?

* Metrics: what are the right metrics to measure attestation and networking performance?
  * attestation:
    * attestation ratio: ratio of attestations to validators in the committee
    * attestation delay distribution: sum <= 1, with 1 - sum being the ratio of missing attestations
    * attestation "precision": attestations should reflect the availability of the data. Similar to a statistical test, a TP/TN/FP/FN characterisation might make sense. TBD

#### Start with a Jupyter node model (assumptions, parameters, and metrics)

* Start with a direct flat (unrealistic) network?
  * Random distribution assumed
  * GossipSub delay distribution
    * Push to K-mesh network
    * Pull (IHAVE/IWANT) for the tail of the distribution
* Network failure model: how much was sent out into the network?
* Network failure model: network partitions?

#### Simulation

* Latency and bandwidth assumptions
  * Cloud nodes: high bandwidth and low latency
  * Home nodes: lower network resources
* GossipSub assumptions
  * Overlay stable (first model)
  * Sybil attack
  * Different channels for row/column distribution
    * Network mesh dynamics
    * Impact of many channels
* Configurations
  * Desired lowest common denominator (a.k.a. Raspberry Pis)
  * Desired network composition (a.k.a. what can actually work?)
  * Realistic network distribution

**Confirm validator rotation and duties**

### Assuming failures, how many reconstruction steps does it take to recreate the full block?

* When you have enough data to reconstruct, you do so and send it on as if you had received it
* Parameter: how much of the block was sent?
* Parameter: how many nodes do not get GossipSub data from the X channels?

### How long does it take to rebuild the block in the worst case? How to attack the structure?

### What is the worst case from the attestation / erasure coding / latency perspective?
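The reconstruction-steps question can be explored with a toy "peeling" model of the 512x512 extended block: under the baseline 2D RS assumption, a row or column becomes decodable once at least 256 of its 512 cells are available, and decoding it makes all 512 cells available to crossing lines. A sketch under these assumptions (random independent cell availability; real dissemination failures are correlated):

```python
import random

N, K = 512, 256  # extended line length, minimum cells to decode a line

def reconstruction_rounds(avail_fraction, seed=0):
    """Rounds of row/column peeling until the block is recovered or stuck."""
    rng = random.Random(seed)
    have = [[rng.random() < avail_fraction for _ in range(N)] for _ in range(N)]
    rounds = 0
    while True:
        progress = False
        for i in range(N):                      # decode rows with >= K cells
            if K <= sum(have[i]) < N:
                have[i] = [True] * N
                progress = True
        for j in range(N):                      # decode columns with >= K cells
            col_count = sum(have[i][j] for i in range(N))
            if K <= col_count < N:
                for i in range(N):
                    have[i][j] = True
                progress = True
        if not progress:
            break
        rounds += 1
    complete = all(all(row) for row in have)
    return rounds, complete

for f in (0.30, 0.50, 0.75):
    r, done = reconstruction_rounds(f)
    print(f"availability {f:.0%}: rounds={r}, fully recovered={done}")
```

The sharp threshold visible here (stuck below ~50% random availability, full recovery in one or two rounds above it) is what an adversary would target by withholding structured, rather than random, subsets of cells.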
### How much network bandwidth does it take per node?

### How much erasure coding CPU load does it take per node?

### What is the optimal number of GossipSub topics?

### Can we suffer network splits? What parameters should we use to avoid them?

### How frequently should we change the topics? (Hours, days, weeks)

### What is the effect of distributing only half (or partial) rows and columns?

## Failure model

### Block proposer with bandwidth/network/congestion issues

* Effects:
  * Sending partial data
* Mitigation:
  * Use RS to reconstruct the missing data on arrival and propagate the full data
  * If the proposer sends sample by sample and randomizes the sample order, we get better fault dispersal across multiple rows

## Open questions

* Is a GossipSub message a single sample or an entire row?
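The per-node bandwidth question above can be bounded with simple arithmetic: a node downloads every cell of every row/column channel it subscribes to, and the subscribed-channel count follows from the union of its validators' X-row/X-column selections. A back-of-the-envelope sketch (cell size, slot time, and X are assumed placeholders, not protocol constants, and GossipSub mesh duplication would multiply the result several-fold):

```python
N = 512            # extended rows = columns
CELL_BYTES = 512   # assumed bytes per cell (sample)
SLOT_SECONDS = 12
X = 2              # rows (and columns) selected per validator

def channels_subscribed(validators):
    """Expected distinct row + column channels for a node whose validators
    each pick X rows and X columns uniformly at random (union over all)."""
    per_dim = N * (1 - (1 - X / N) ** validators)
    return 2 * per_dim

def download_bps(validators):
    """Expected raw download rate: subscribed lines x cells x cell size."""
    return channels_subscribed(validators) * N * CELL_BYTES / SLOT_SECONDS

for v in (1, 10, 100):
    print(f"{v:3d} validators: ~{download_bps(v) / 1e6:.2f} MB/s download")
```

Because channel subscriptions saturate (a node with many validators eventually covers most of the 2x512 channels), bandwidth grows sublinearly in validator count, which matters when comparing pool nodes against Raspberry-Pi-class solo stakers.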