---
tags: ["[rcnyc]"]
---
# Comparative Analysis of Block Propagation Mechanisms
Delivering block proposals from a leader to the rest of the network is a challenging problem in high-performance distributed consensus environments. Block Propagation Engines are specialized message delivery protocols used within Byzantine Fault Tolerant (BFT) consensus mechanisms to send block proposals from leaders to participating validators.
Naively, there are two approaches:
- The leader sends the block proposal to each validator directly, but this leads to high upload bandwidth costs.
- The leader sends the block proposal to a few peers, who then rebroadcast that block proposal to the rest of the network.
Both approaches present major scalability bottlenecks, so we have to design innovative and efficient block propagation techniques to power high-throughput systems.
### Guiding Topics & Questions
- What is the thinking behind the major design choices in the leading block propagation implementations? Reed-Solomon vs. RLNC? The degree of fanout? Push- vs. pull-based mechanisms?
- What should we optimize for, latency or throughput?
- To what extent can we decouple and parallelize/pipeline tasks?
- What are further design optimizations that we can make?
- Which of these mechanisms is most susceptible to threat vectors?
- How should the network topology be set up? Physical links? Stake weight? Ad hoc?
- What is the intersection of multiple concurrent proposers and block propagation? Is it better to have faster blocks with monopolistic proposers who can give immediate inclusion and ordering guarantees, or increased censorship-resistance with multiple concurrent proposers but strictly more validator <> validator communication?
**In this document and the following discussions in [[rcnyc2]](https://lu.ma/lubx51jw), we will focus on the *Guiding Topics and Questions* and on popular mechanisms.**
## Monad
- **Consensus Mechanism:** MonadBFT
- **Block Propagation Mechanism:** RaptorCast

The primary design requirements for a block propagation system under MonadBFT's consensus mechanism:
1. Reliable message delivery to all participating consensus nodes is guaranteed if 2/3 of the stake weight is non-Byzantine.
2. Upload bandwidth requirements of a validator are linearly proportional to the message size.
3. Worst-case latency is twice the worst-case one-way latency between any two nodes; essentially, propagation of the message completes within the round-trip time between the two most distant nodes.
4. Messages are delivered with a configurable amount of redundancy, where increased redundancy implies reduced message latency.
Monad's RaptorCast achieves these design requirements through an erasure coding mechanism, an efficient message and chunk distribution model, and a strong fault tolerance guarantee.
With erasure coding, the leader encodes the message into a set of chunks such that the message can be decoded from any sufficiently large subset of those chunks.
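To make that property concrete, below is a minimal sketch of a k-of-n erasure code built from polynomial evaluation over a prime field. This is a textbook toy for illustration only; RaptorCast's actual encoding is a Raptor-style fountain code, and the field, chunk layout, and function names here are assumptions.

```python
# Toy k-of-n erasure code over a prime field (illustration only; RaptorCast's
# real encoding is a Raptor-style fountain code, not this construction).
P = 2**61 - 1  # prime modulus for the toy field

def encode(symbols, n):
    """Treat the k symbols as polynomial coefficients; emit n point evaluations."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(symbols)) % P)
            for x in range(1, n + 1)]

def decode(chunks, k):
    """Recover the k coefficients from any k chunks via Lagrange interpolation."""
    xs, ys = zip(*chunks[:k])
    coeffs = [0] * k
    for j in range(k):
        basis, denom = [1], 1          # Lagrange basis polynomial, constant term first
        for m in range(k):
            if m == j:
                continue
            # Multiply the basis polynomial by (x - xs[m]).
            basis = [((basis[i - 1] if i > 0 else 0)
                      - xs[m] * (basis[i] if i < len(basis) else 0)) % P
                     for i in range(len(basis) + 1)]
            denom = denom * (xs[j] - xs[m]) % P
        scale = ys[j] * pow(denom, P - 2, P) % P   # ys[j] / denom in the field
        for i, b in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * b) % P
    return coeffs

data = [42, 7, 13]                # k = 3 original symbols
chunks = encode(data, n=6)        # any 3 of these 6 chunks recover the data
assert decode(chunks[3:], k=3) == data
```

The key property is the one the text relies on: any k of the n chunks suffice to reconstruct the original message, so the leader can spread chunks widely without every recipient needing every packet.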
The message and chunk distribution model proceeds as follows (a sketch of the stake-weighted assignment in step 2 follows the list):
1. The leader constructs chunks from the proposed block.
2. The leader distributes chunks to validators as first-hop recipients in proportion to their relative stake weights.
3. The first-hop recipients rebroadcast the chunks they received to the rest of the network for redundancy's sake.
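A minimal sketch of step 2, assuming chunk indices are handed to first-hop validators in proportion to their stake; the proportional rule, rounding, and names below are illustrative assumptions, not Monad's exact assignment logic.

```python
# Hypothetical stake-proportional chunk assignment for first-hop recipients.
def assign_chunks(num_chunks: int, stakes: dict[str, int]) -> dict[str, list[int]]:
    """Split chunk indices among validators in proportion to their stake."""
    total_stake = sum(stakes.values())
    assignment: dict[str, list[int]] = {v: [] for v in stakes}
    validators = sorted(stakes, key=stakes.get, reverse=True)
    cursor = 0
    for v in validators:
        # Each validator gets a contiguous slice of chunks sized by its stake share.
        share = round(num_chunks * stakes[v] / total_stake)
        assignment[v] = list(range(cursor, min(cursor + share, num_chunks)))
        cursor += share
    # Any chunks left over due to rounding go to the highest-stake validator.
    if cursor < num_chunks:
        assignment[validators[0]].extend(range(cursor, num_chunks))
    return assignment

print(assign_chunks(10, {"A": 50, "B": 30, "C": 20}))
# {'A': [0, 1, 2, 3, 4], 'B': [5, 6, 7], 'C': [8, 9]}
```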
For fault tolerance, RaptorCast opts to use UDP (User Datagram Protocol) as its transport protocol rather than TCP (Transmission Control Protocol). While TCP prioritizes reliability, UDP prioritizes speed. Given this choice, RaptorCast does not include a recovery mechanism for downstream nodes in the case of packet loss or drops; to compensate, RaptorCast transmits the message redundantly. This raises the question: *what are the tradeoffs between the added upload bandwidth cost of redundant packets over UDP vs. the latency introduced by the reliability mechanisms of TCP?*
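One back-of-the-envelope way to reason about that question: assuming each UDP packet is dropped independently with some probability, the chance that a receiver collects enough chunks to decode in a single shot grows quickly with the redundancy factor. The loss rate and chunk counts below are made up for illustration, not measurements of RaptorCast.

```python
from math import comb

def decode_probability(k: int, redundancy: float, loss_rate: float) -> float:
    """P(at least k of n = round(k * redundancy) chunks arrive), i.i.d. packet loss."""
    n = round(k * redundancy)
    p = 1 - loss_rate  # per-packet delivery probability
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k, n + 1))

# Block split into 100 source chunks, 1% independent packet loss:
for redundancy in (1.0, 1.1, 1.25, 1.5):
    print(f"redundancy {redundancy:.2f}: "
          f"P(decode without retransmission) = {decode_probability(100, redundancy, 0.01):.6f}")
# More redundancy buys near-certain one-shot delivery at the cost of
# proportionally more leader upload bandwidth; TCP instead pays with
# retransmission round trips when packets are lost.
```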
## Solana Turbine
- **Consensus Mechanism:** TowerBFT with PoH serving as a global clock (PBFT-inspired, prioritizing liveness > safety)
- **Block Propagation Mechanism:** Turbine

Solana's Turbine is a block propagation protocol inspired by BitTorrent. Its primary design goal is to address the scalability trilemma, which is dominated by the bandwidth limitation per node: given a fixed bandwidth per node, adding more nodes to the network increases the time it takes to propagate all data to all nodes.
This poses an optimization problem of maximizing bandwidth utilization while minimizing latency, which is precisely Solana's stated focus on *Increasing Bandwidth and Reducing Latency*.
Turbine is optimized for streaming, hence its foundation in UDP, and implements a random path per packet through the network as leaders stream their data.
The protocol is defined below:
1. The proposer has a message to broadcast and does so by breaking the message down into smaller packets, which it then transmits to different validators.
2. In turn, each receiving validator retransmits the packet to a group of peers in its local purview - a neighborhood.
3. This neighborhood architecture creates a global vs. local mesh in a tree format: the entire tree forms the global mesh, while the individual neighborhoods house closely connected nodes forming the local mesh.
4. Each neighborhood is responsible for transmitting a portion of its packets to each neighborhood below it.
Currently, the network topology is organized by stake weight, so nodes with the highest stake receive block shreds first and then forward them to the next layer. Furthermore, the fanout is set to 200, so most validators receive shreds within 2-3 hops.
To guard against adversarial nodes, Turbine has the proposer generate Reed-Solomon erasure codes, which allow nodes to reconstruct the block without receiving all the packets. Furthermore, Turbine's fanout algorithm generates a stake-weighted tree for every packet using a random seed derived from the packet's digital signature, thereby randomizing the path each piece of data takes through the network.
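A sketch of that per-packet tree construction, under the assumption that the shred's signature seeds a deterministic stake-weighted shuffle and the shuffled ordering is sliced into layers that widen by the fanout; the shuffle and layering rules here are illustrative, not Solana's production algorithm.

```python
import hashlib
import random

def turbine_layers(validators: dict[str, int], shred_signature: bytes, fanout: int = 200):
    """Deterministically derive a per-shred broadcast tree.

    validators: validator id -> stake. The shred signature seeds the RNG, so
    every shred gets a different (but reproducible) stake-weighted ordering.
    """
    seed = int.from_bytes(hashlib.sha256(shred_signature).digest(), "big")
    rng = random.Random(seed)
    ids = list(validators)
    weights = [validators[v] for v in ids]
    # Stake-weighted sampling without replacement: higher stake tends to land
    # closer to the root and therefore receives shreds earlier.
    order = []
    while ids:
        pick = rng.choices(range(len(ids)), weights=weights, k=1)[0]
        order.append(ids.pop(pick))
        weights.pop(pick)
    # Slice the ordering into layers: a root layer of `fanout` nodes, then each
    # subsequent layer is `fanout` times wider than the previous one.
    layers, start, width = [], 0, fanout
    while start < len(order):
        layers.append(order[start:start + width])
        start += width
        width *= fanout
    return layers

stakes = {f"v{i}": 1000 - i for i in range(10)}
print(turbine_layers(stakes, shred_signature=b"example-shred-sig", fanout=3))
```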
## Ethereum
- **Consensus Mechanism:** LMD GHOST with Casper FFG (chain-based PoS, prioritizing liveness > safety)
- **Block Propagation Mechanism:** libp2p gossipsub

Ethereum currently utilizes libp2p's gossipsub protocol to disseminate validator information, including block propagation, amongst its consensus clients.
Ethereum's networking layer is split amongst its execution and consensus clients. Given the focus on block propagation, we will focus on the consensus layer's networking protocol implementation. The networking layer includes two distinct phases:
(1) Discovery Phase: networking stack built on top of UDP that allows new nodes to find peers to connect with
(2) DevP2P Layer: networking stack built on top of TCP and SSZ that establishes and maintains communication sessions between nodes
The discovery process begins with a new node connecting to a set of hardcoded *bootnodes* that serve the sole purpose of facilitating new connections. Every new connection is dictated by a PING-PONG mechanism, where the source node exchanges hashed information about its identity with the target node, which verifies this information to establish a connection. Since discovery is a "spray-and-pray" strategy with minimally complex message passing, it is built on top of UDP.
The DevP2P layer exists to maintain these connections and to handle more complex message passing with the necessary functionality, hence the choice of building on top of TCP. On the consensus layer, peer-to-peer communication utilizes SSZ (Simple Serialize), as this encoding integrates easily with Merkle roots and allows clients to decode parts of propagated messages with minimal effort, compared to the RLPx encoding used on the execution layer.
The current gossip implementation, simplified, works as follows (a toy sketch follows the list):
1. The block proposer selects a random set of its peers, effectively creating a *mesh*, and broadcasts the full block to all of them.
2. The receiving peers verify the proposer's signature and rebroadcast the verified full block to another set of random peers in their local *mesh*.
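A toy rendering of those two steps with duplicate suppression keyed on the block hash; peer selection, mesh maintenance, and validation are all heavily simplified relative to real gossipsub, and the class and method names are invented for illustration.

```python
import hashlib
import random

MESH_SIZE = 8  # illustrative; gossipsub's mesh degree is a configurable parameter

class Node:
    def __init__(self, name, peers=None):
        self.name = name
        self.peers = peers or []      # local mesh candidates
        self.seen = set()             # block hashes already processed

    def publish(self, block: bytes):
        """Step 1: proposer pushes the full block to a random subset of peers."""
        for peer in random.sample(self.peers, min(MESH_SIZE, len(self.peers))):
            peer.receive(block, sender=self)

    def receive(self, block: bytes, sender):
        """Step 2: verify, then rebroadcast once into the local mesh."""
        digest = hashlib.sha256(block).hexdigest()
        if digest in self.seen:
            return                    # duplicate: already forwarded this block
        if not self.verify_signature(block):
            return                    # drop invalid proposals
        self.seen.add(digest)
        for peer in random.sample(self.peers, min(MESH_SIZE, len(self.peers))):
            if peer is not sender:
                peer.receive(block, sender=self)

    def verify_signature(self, block: bytes) -> bool:
        return True                   # placeholder for the proposer-signature check
```

In real gossipsub the mesh is maintained continuously via GRAFT/PRUNE control messages rather than re-sampled on every publish, and metadata-only IHAVE/IWANT gossip supplements the full-message mesh.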
### Proposal for Faster Block Propagation on Ethereum

To provide motivation for the proposal, listed below are some important limiting factors for Ethereum's networking layer:
#### *External Factors*
- Block building timing games within the PBS transaction pipeline impact when the consensus clients can share a finalized block with the execution layer.
- The size of the block, determined by the per-block gas limit, impacts the performance of the protocol due to worst-case bandwidth limitations.
- The geographical distribution of consensus client nodes impacts the latency of the gossip network.
#### *Internal Limitations*
- Gossipsub's implementation guarantees duplicated messages for redundancy and security's sake, but larger messages (or blocks) imply larger duplicates, which means longer propagation times.
- Each hop adds delays equivalent to a full block transfer from one peer to another.
- Peers frequently send unnecessary full blocks to peers that have already received the same information.
The proposal focuses on implementing a gossip network based on Random Linear Network Coding (RLNC). A succinct description of the process:
1) The proposer splits the block information into *N* chunks and, instead of sending the full block to *x* random peers, sends a single chunk to a greater (>*x*) number of peers. These chunks take the form of random linear combinations of the original chunks.
2) The receiving peers still need the full block information, and they obtain it because they receive all necessary block information in parallel from their peers. Once they have received a set of *N* linearly independent chunks, the nodes can recompose the original block by solving a linear system of equations.
*The receiving nodes keep track of the coded messages they have received and the subspace those messages span. Upon receiving a new message, they can check for redundant information by projecting the message's coefficient vector onto the existing subspace.*
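A compact sketch of the RLNC encode step and the receiver-side subspace (linear-independence) check over a small prime field; real deployments typically work over GF(2^8)-style fields, and the chunk sizes, field, and function names here are illustrative assumptions rather than the proposal's specification.

```python
import random

P = 257  # small prime field for illustration

def encode(chunks, rng):
    """Produce one coded packet: random coefficients plus the combined payload."""
    coeffs = [rng.randrange(P) for _ in chunks]
    payload = [sum(c * chunk[i] for c, chunk in zip(coeffs, chunks)) % P
               for i in range(len(chunks[0]))]
    return coeffs, payload

def rank(rows):
    """Rank of a matrix over GF(P) via Gaussian elimination (the 'subspace' check)."""
    m = [row[:] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        inv = pow(m[r][col], P - 2, P)
        m[r] = [(x * inv) % P for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col]:
                factor = m[i][col]
                m[i] = [(a - factor * b) % P for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_innovative(received_coeffs, new_coeffs):
    """A packet is useful only if it increases the rank of the coefficient matrix."""
    return rank(received_coeffs + [new_coeffs]) > rank(received_coeffs)

# N = 3 source chunks of 4 field elements each.
rng = random.Random(7)
source = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
received = []
while rank([c for c, _ in received]) < len(source):
    coeffs, payload = encode(source, rng)
    if is_innovative([c for c, _ in received], coeffs):
        received.append((coeffs, payload))
```

Once the coefficient matrix reaches full rank N, recovering the source chunks is a standard linear solve over the same field.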
The major distinction between Ethereum's proposed faster block propagation mechanism and Monad's RaptorCast is the design decision between Random Linear Network Coding and Reed-Solomon erasure coding with routing.
::: info
### Random Linear Network Coding vs. Reed-Solomon
Reed-Solomon erasure coding is derived from the fundamental principle that a polynomial of degree k-1 is uniquely determined by k distinct points. Therefore, the encoding and decoding processes are centered on representing the original message as a polynomial, which can be uniquely determined by evaluating it at different points. Since nodes need to know the relative positions of missing packets, Reed-Solomon is better suited for static erasure protection (Data Availability Sampling, light client proofs, etc.).
Random Linear Network Coding is derived from (re)constructing messages using independent linear combinations that span unique subspaces. RLNC is more robust to packet loss and dynamic network topology and is therefore better suited for live, resilient message dissemination across unpredictable, unreliable links.
:::
## Sei
- **Consensus Mechanism:** Autobahn
- **Block Propagation Mechanism:** In-built Autobahn propagation engine

Sei's upcoming upgrade, Sei Giga, is rooted in the implementation of Autobahn, a high-speed parallelized BFT consensus mechanism with a unique propagation engine. The core innovation of Autobahn is its separation of partially synchronous consensus from its asynchronous data dissemination layer.
::: info
### Autobahn's Consensus Mechanism
Within the data dissemination layer, each proposer maintains its own lane in which it proposes independent transactions. Occasionally, the proposer of a lane constructs a batch of transactions - a proposal - which essentially acts as that lane's local block. This proposal is disseminated to other nodes, who vote on it and assemble a Proof-of-Availability attesting to the validity and availability of the data.
Each epoch, the distinct consensus layer is driven by a global proposer selected via the traditional stake-weighted selection mechanism. This global proposer takes a global cut of the local tips (the most recent proposals from each independent lane), which are then deterministically ordered and brought to consensus.
:::
Sei's block propagation mechanism is rooted in Autobahn's asynchronous data dissemination layer. This networking layer works through parallel DAG lanes that validators use to asynchronously propagate blocks, enabling simultaneous *local* block production and dissemination. Each of these *local* blocks is supplemented with a Proof-of-Availability, ensuring data availability before consensus finalization. These local blocks are routed to the single stake-weighted leader through global cuts, where the leader references incremental block data rather than waiting for full downloads.
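A sketch of the global-cut idea, assuming each lane exposes its latest proposal carrying a Proof-of-Availability quorum and the leader's cut is simply the vector of certified tips ordered by lane; the data structures and names are illustrative, not Sei's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposal:
    lane: str               # the proposing validator's lane
    height: int             # position within that lane
    batch_digest: str       # commitment to the batch of transactions
    poa_signers: frozenset  # validators that signed the Proof-of-Availability

def certified_tip(lane_proposals, quorum: int):
    """Return the highest proposal in a lane whose PoA has enough signatures."""
    certified = [p for p in lane_proposals if len(p.poa_signers) >= quorum]
    return max(certified, key=lambda p: p.height, default=None)

def global_cut(lanes: dict, quorum: int):
    """The leader's cut: one certified tip per lane, ordered by lane id.

    Consensus is then run once over this small vector instead of over the
    underlying transaction data, which has already been disseminated."""
    return [tip for lane in sorted(lanes)
            if (tip := certified_tip(lanes[lane], quorum)) is not None]

lanes = {
    "validator-A": [Proposal("validator-A", 1, "0xaa", frozenset({"A", "B", "C"}))],
    "validator-B": [Proposal("validator-B", 1, "0xbb", frozenset({"A", "B"}))],
}
print(global_cut(lanes, quorum=3))  # only validator-A's tip is certified
```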
## Aptos
- **Consensus Mechanism:** AptosBFT
- **Block Propagation Mechanism:** Narwhal-based Quorum Store

*Aptos' mechanism is similar to Sei's Autobahn, described in the section above.*
Aptos' block propagation and dissemination system operates under the following design requirements for its AptosBFT consensus mechanism:
- Maintain Byzantine fault tolerance against up to 1/3 malicious validators
- Achieve sub-second finality with ≤200ms block times
- Limit network communication complexity to *O(n)* per block
- Enable continuous network bandwidth utilization without idle periods
- Guarantee data availability before execution finalization
To achieve these goals, Aptos implements a pipelined dissemination model with three coordinated subsystems:
**1. Decoupled Transaction Streaming**: Validators continuously broadcast transaction batches to their peers, independent of the consensus voting phases. These concurrent transaction streams overlap with the Quorum Certificate phase of the previous block.
**2. Leader-Chained Vote Aggregation**: The selected leader proposes block metadata, and validators send their votes to *the next round's leader*, who then aggregates the votes into a Quorum Certificate. The final commit requires *2f+1* stake-weighted signatures.
**3. Pacemaker-Driven Synchronization**: This component manages round progression for each block and timeout and fallback messages, and implements a reputation-based leader rotation.
The separation of data availability from consensus voting allows independent optimization of propagation parameters without modifying core consensus logic.
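To illustrate the vote-aggregation rule from subsystem 2, here is a minimal stake-weighted quorum check, assuming a Quorum Certificate forms once signatures cover more than 2/3 of the total stake; the names and exact threshold convention are simplifications of AptosBFT.

```python
from fractions import Fraction

def has_quorum(votes: dict[str, bool], stakes: dict[str, int]) -> bool:
    """Quorum Certificate check: signatures covering more than 2/3 of total stake.

    With n = 3f + 1 equal-stake validators this matches the classic
    2f + 1 signature threshold."""
    total = sum(stakes.values())
    signed = sum(stakes[v] for v, ok in votes.items() if ok and v in stakes)
    return Fraction(signed, total) > Fraction(2, 3)

stakes = {"v1": 10, "v2": 10, "v3": 10, "v4": 10}
votes = {"v1": True, "v2": True, "v3": True}           # 30/40 = 75% of stake
print(has_quorum(votes, stakes))                        # True: QC can be formed
print(has_quorum({"v1": True, "v2": True}, stakes))     # False: only 50%
```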
## Celestia
- **Consensus Mechanism:** Tendermint
- **Block Propagation Mechanism:** Vacuum!

Vacuum! is a pull-based, "lazy" gossip network wherein peers optimistically declare the data they have access to and only send data to those that explicitly ask for it.
The protocol utilizes Validator Availability Certificates (VACs) to allow multi-height pipelining of block data, thereby providing a mechanism for validators to signal to proposers up front which block data they already have. Furthermore, specific VACs are used to indicate which blob data is most valuable, helping the just-in-time (JIT) creation of optimal paths for sharing the most valuable data.
Vacuum! further introduces two familiar optimizations: data chunking and pipelined *HAVE* and *WANT* requests. Data chunking allows for parallelized data dissemination, thereby not straining a single node's throughput capabilities. Pipelined *HAVE* and *WANT* requests create two phases in the propagation mechanism: an optimistic declaration from nodes about what data they hold (enter VACs) and a pull-based recovery mechanism for requesting information when the block has been produced.
How does this look in practice? (A toy sketch follows the list.)
1. Each validator starts the cycle by gathering blob transactions from its mempool. VACs are created for each transaction, and VAC roots are then created and signed over these transactions using the validator's keys.
2. The VAC roots are sent to all nodes, and the highest-priority VACs are forwarded to all nodes as well. This allows for prioritized gossiping of the most valuable data.
3. For data requests, the requesting and recipient nodes go through a series of *HAVE* and *WANT* requests over JIT-optimized routing paths, and the requesting node eventually receives the requested data.
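A toy rendering of the HAVE/WANT exchange between two peers, assuming chunk-level identifiers and a request-only-what-is-missing rule; the message shapes and names are illustrative, not Celestia's wire protocol.

```python
class VacuumPeer:
    def __init__(self, name: str):
        self.name = name
        self.chunks: dict[str, bytes] = {}   # chunk id -> data held locally

    def have(self) -> set[str]:
        """Optimistic declaration phase: announce which chunk ids we hold."""
        return set(self.chunks)

    def want(self, announced: set[str]) -> set[str]:
        """Pull phase: request only the chunk ids we are missing."""
        return announced - set(self.chunks)

    def serve(self, wanted: set[str]) -> dict[str, bytes]:
        """Send data only for chunks that were explicitly requested."""
        return {cid: self.chunks[cid] for cid in wanted if cid in self.chunks}

    def store(self, delivered: dict[str, bytes]) -> None:
        self.chunks.update(delivered)

# One round of lazy gossip between a data holder and a requester.
holder, requester = VacuumPeer("holder"), VacuumPeer("requester")
holder.chunks = {"blob-1/0": b"...", "blob-1/1": b"...", "blob-2/0": b"..."}
requester.chunks = {"blob-1/0": b"..."}

announced = holder.have()                 # HAVE: all three chunk ids
requested = requester.want(announced)     # WANT: only the two missing chunks
requester.store(holder.serve(requested))  # data flows strictly on request
assert requester.have() == announced
```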
---
### Related Work
- [Monad RaptorCast Documentation](https://docs.monad.xyz/monad-arch/consensus/raptorcast)
- [Solana Turbine Documentation](https://medium.com/solana-labs/turbine-solanas-block-propagation-protocol-solves-the-scalability-trilemma-2ddba46a51db)
- [Ethereum Networking Layer Documentation](https://ethereum.org/en/developers/docs/networking-layer/)
- [Ethereum's Underlying P2P Network](https://youtu.be/Bwtjvmjtyjg)
- [Block Propagation in Ethereum's P2P Network](https://www.youtube.com/watch?v=AH8NmuW7pw8)
- [*Proposal:* Faster Block Propagation on Ethereum](https://ethresear.ch/t/faster-block-blob-propagation-in-ethereum/21370)
- [Attack Vectors & Analysis for gossipsub](https://www.youtube.com/watch?v=VEEEaf8B35w&t=65s)
- [Synchrony, Asynchrony and Partial Synchrony](https://decentralizedthoughts.github.io/2019-06-01-2019-5-31-models/)
- [Aptos' Quorum Store](https://medium.com/aptoslabs/quorum-store-how-consensus-horizontally-scales-on-the-aptos-blockchain-988866f6d5b0)
- [Celestia Vacuum! Docs #1](https://github.com/celestiaorg/celestia-app/blob/e666c7d38940ef32c475d8347eee301fa91fe327/specs/src/vacuum.md?ref=blog.celestia.org)
- [Celestia Vacuum! Docs #2](https://github.com/celestiaorg/celestia-app/blob/4add30d8dd4c85fb838ac8e792c73fc6458a22c1/specs/src/recovery.md?ref=blog.celestia.org)