# Prysm Sharding Plan
## Objective
Define how Prysm can be at the forefront of the upcoming eth2 work on sharding and data availability sampling at scale, ensuring that we, as an implementation team, help drive the research and development effort on this matter.
## Background
The basic idea of sharding in eth2 is to broadcast a shard block over its appropriate shard subnet in p2p, then chunk a piece of data called a "data availability commitment" and broadcast each chunk chunk_i over a subnet s_i, where the set of subnets spans all 64 shards. Using this commitment, it becomes easy to prove that the data for a shard block exists, so the shard block can be tightly coupled with the beacon chain.
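As a rough illustration of the chunk-to-subnet mapping, consider the sketch below. All constants and the topic naming scheme are assumptions for illustration, not values from the eth2 spec:

```go
package das

import "fmt"

// Illustrative constants; the real values would come from the eth2
// sharding spec, not this sketch.
const (
	ShardCount        = 64
	SampleSubnetCount = 256 // hypothetical number of "vertical" sample subnets
)

// SubnetForChunk maps the i-th chunk of a data availability commitment to
// the gossipsub topic of the vertical subnet s_i it is broadcast on. The
// topic naming scheme here is made up for illustration.
func SubnetForChunk(chunkIndex uint64) string {
	return fmt.Sprintf("/eth2/das_sample_%d", chunkIndex%SampleSubnetCount)
}
```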

There are two major aspects towards implementing sharding in eth2. Namely:
1. How data availability works. For example, how validators can challenge other validators for shard block data and ensure it exists via fancy math
2. How sharding of data is actually done at the p2p layer, and how it can be done efficiently and correctly
The first is currently addressed by work on a cryptographic scheme known as KZG commitments, which is out of scope for this document. The latter seems to be the most unknown, with not enough work done to understand its implications for the p2p layer of eth2.
- [An explanation of sharding and data availability sampling](https://hackmd.io/@HWeNw8hNRimMm2m2GH56Cw/sharding_proposal)
- [Eth2 sharding slides](https://vitalik.ca/files/misc_files/eth2_sharding.pdf)
- [Data availability sampling proposal](https://hackmd.io/@HWeNw8hNRimMm2m2GH56Cw/r1XzqYIOv)
- [Data availability in practice](https://notes.ethereum.org/@vbuterin/r1v8VCULP)
- [Protolambda's idea: gossip sampling](https://notes.ethereum.org/McLGvrWgSX6Cpg60ewMYeQ)
- [Network spec ideas](https://github.com/protolambda/eth2-das/blob/master/spec/shards_p2p.md)
## Current Work
First, it is highly recommended to read Vitalik's [data availability sampling proposal](https://hackmd.io/@HWeNw8hNRimMm2m2GH56Cw/r1XzqYIOv). In his post, he explains the idea of spreading the data availability chunks across subnets spanning all shards. Additionally, Vitalik has a post on how this would work [in practice](https://notes.ethereum.org/@vbuterin/r1v8VCULP).
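The core of that proposal is the sampling step itself. Here is a minimal sketch in Go, assuming n total chunks of which each node checks k; the sampling and retrieval details in the proposal are more involved:

```go
package das

import (
	"fmt"
	"math/rand"
)

// SampleIndices picks k distinct chunk indices out of n total chunks. A
// node then tries to fetch each sampled chunk from its subnet; if all k
// fetches succeed, the block's data is available with high probability.
func SampleIndices(n, k int) ([]int, error) {
	if k > n {
		return nil, fmt.Errorf("cannot sample %d of %d chunks", k, n)
	}
	// math/rand is fine for a simulation; a real validator needs
	// unpredictable, private randomness so peers cannot answer only
	// the queries they expect.
	return rand.Perm(n)[:k], nil
}
```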
### Protolambda's Prototype
We need to work closely with the researchers to surface the bottlenecks and problems we foresee when adapting their research ideas into Go code. Proto has started a really cool prototype, [eth2-das](https://github.com/protolambda/eth2-das), in which he creates N mock "beacon nodes", sets up gossipsub, subscribes to subnet topics, and implements data availability sampling and retrieval from shard subnets. The purpose of his project is to let someone pass in configuration options and spin up a simulation that can then be analyzed to understand the performance of these algorithms for data availability sampling, peer rotation, peer subnet connections, and more.
Unfortunately, it is a very experimental repo, unmaintained since November 2020, and it no longer builds.
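Since the repo no longer builds, the sketch below shows the basic pattern it implements for each mock node: a libp2p host with a gossipsub router subscribed to a sample subnet topic. It uses the go-libp2p APIs of that era (newer go-libp2p releases drop the context argument from `libp2p.New`), and the topic name is hypothetical:

```go
package main

import (
	"context"
	"fmt"

	"github.com/libp2p/go-libp2p"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

func main() {
	ctx := context.Background()

	// Spin up a libp2p host and a gossipsub router, as eth2-das does
	// for each of its mock beacon nodes.
	host, err := libp2p.New(ctx)
	if err != nil {
		panic(err)
	}
	gs, err := pubsub.NewGossipSub(ctx, host)
	if err != nil {
		panic(err)
	}

	// Subscribe to one hypothetical vertical sample subnet topic.
	topic, err := gs.Join("/eth2/das_sample_0")
	if err != nil {
		panic(err)
	}
	sub, err := topic.Subscribe()
	if err != nil {
		panic(err)
	}

	// Block until a sample chunk arrives on the subnet.
	msg, err := sub.Next(ctx)
	if err != nil {
		panic(err)
	}
	fmt.Printf("received %d-byte sample from %s\n", len(msg.Data), msg.ReceivedFrom)
}
```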
## Plans for Prysmatic Labs
Where we can be most helpful is in continuing Protolambda's prototype of subnet-based, chunk-based sharding for data availability sampling ([eth2-das](https://github.com/protolambda/eth2-das)) as a well-maintained, well-developed library that can be used for mass-scale simulations with tools such as testground. If we carry the work forward ourselves, with Proto's help, and build something really solid, it could evolve into something researchers can better understand. Moreover, we can build the code to be reusable so that we can eventually plug it straight into Prysm.
We have started a prototype that takes parts of Proto's code, refactors them, and can serve as a production-focused repository for testing here:
https://github.com/prysmaticlabs/sharding-testground
The following milestones are noteworthy:
1. Having a production-quality, simulation repository using the real algorithms needed for p2p sharding using go-libp2p and data availability sampling using KZG commitments
2. Running the simulation in a large-scale network environment and profiling various metrics (a configuration sketch follows this list)
3. Merging this code with the Prysm codebase
4. Running a Prysm-only testnet that can successfully handle 64 shards with data availability sampling
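As a concrete starting point for milestones 1 and 2, the sketch below shows the kind of parameters such a simulation repository would need to expose. The field names and values are assumptions for illustration, not taken from eth2-das or sharding-testground:

```go
package das

import "time"

// SimulationConfig gathers the knobs a testground-style run would sweep
// over when profiling the p2p sharding algorithms at scale.
type SimulationConfig struct {
	NodeCount      int           // number of simulated beacon nodes
	ShardCount     int           // 64 in the current eth2 design
	SampleSubnets  int           // vertical sample subnets spanning all shards
	SamplesPerSlot int           // chunks each node samples per slot
	PeersPerSubnet int           // target gossipsub mesh degree per subnet
	SlotDuration   time.Duration // 12s on mainnet
	ChurnInterval  time.Duration // how often nodes rotate across subnets
}
```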
If we can accomplish milestone 4 by Q4, we are in a great spot. Moreover, if we help drive this implementation effort, the research can move faster and the future parts of eth2 can ship sooner than folks anticipate.
## Unknowns and answers
- Q: Discovery for the subnets used for data sharding at the p2p layer is something no one has covered yet, and discovery v5 will likely need to change significantly to adapt. What needs to be done here? Is it a big problem?
- Proto: A primary idea for DAS gossip subnet stability is assigning nodes to subnets based on their identity. This does not take away from the required sample count, but it serves as a backbone: no discv5 advertisement is required; just hash the known peers in your discv5 table/peerstore and you find the ones you need (a sketch of this appears at the end of this section).
- Q: There seems to be a need for individuals with strong graph theory backgrounds to better understand how validators can optimally navigate the gossipsub mesh once these new subnets are added. How much is unknown here?
- Proto: You are piecing two unrelated things together there. We were asking for someone with a graph theory background to review a more complicated version of my Whirlpool design doc, v1 :sweat_smile: The gossipsub mesh navigation is still complicated, but the backbone idea is very promising; it should work.
- Q: Who is working on production-quality KZG libraries? Is Supranational a candidate, and if so, is the EF planning to pay for something like this?
- Proto: The KZG group. TL;DR: my go-kzg is for prototyping; Mamy is exploring the optimizations further; Ben is getting KZG around BLST started. Then we iterate more later, probably with eventual grants etc. for production quality. Just onboarding clients on testnet/prototype work is the first step.
- Q: How confident does the research team feel about the shard chunk subnet approach so far? Is there a possibility this could all get thrown out halfway through the year?
- Proto: Definitely not 100% on subnets, but some form of them will likely cover the happy case. The non-happy case is the real problem: sampling 80 days' worth of full-capacity shards, syncing that data, and seeding the network as a validator (many DoS and privacy issues). That is what we are actively looking at and are not confident in yet. Solving only the happy case is just not that useful. We might do the merge first, or do sharding with fewer security enhancements (i.e. no data availability sampling) if necessary.
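Here is a minimal sketch of the backbone assignment Proto describes above: hash a peer's identity to derive the subnets it serves, so that assignments are computable locally from the discv5 table/peerstore with no advertisement. The hashing scheme and parameters are assumptions for illustration, and the import path reflects go-libp2p packages of that era:

```go
package das

import (
	"crypto/sha256"
	"encoding/binary"

	"github.com/libp2p/go-libp2p-core/peer"
)

// BackboneSubnets derives the stable subnets a node serves purely from its
// peer ID, so any other node can compute the assignment locally from the
// peers it already knows, with no discv5 advertisement. The hash-to-subnet
// scheme is illustrative; a real design would also mix in an epoch value
// for slow rotation, and deduplicate repeated subnet indices.
func BackboneSubnets(id peer.ID, perNode, subnetCount uint64) []uint64 {
	subnets := make([]uint64, 0, perNode)
	for i := uint64(0); i < perNode; i++ {
		h := sha256.Sum256(append([]byte(id), byte(i)))
		subnets = append(subnets, binary.LittleEndian.Uint64(h[:8])%subnetCount)
	}
	return subnets
}
```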