owned this note
owned this note
Published
Linked with GitHub
# Scalability and BFT in context of p2p
## Abstract
Scalability, BFT and other issues when implementing Ethereum 2.0 consensus protocol over p2p network.
## Problem overview
Let's review first basic assumptions underlying Ethereum 2.0 networking, e.g. libp2p, PubSub, gossipsub, etc.
### General considerations
#### Scalability
Distributed protocols employ patterns of communication like 1-to-n (scatter), n-to-1 (gather) or n-to-n, where n is the amount of nodes. It's obviously non-scalable with n, i.e. with large enough n, one has to send/receive too many messages, given limited network bandwidth and time constraints.
There are two basic ways to solve the problem:
* lower n where possible
* restrict node communication within a p2p-graph of limited maximum degree m, where m << n. I.e. a sender sends to m peers, they send to m their peers and so on.
One can also weaken bandwidth, time and/or message size constraints.
#### p2p graph
While limiting node degrees scales to large networks, Byzantine Fault Tolerance becomes lower.
#### Lowering amount of senders/receivers
If we have less senders/receivers than it require less efforts to deliver messages. However, then it's easier for and adversary node to de-anonymize proposers/attesters.
#### Managing overlay/mesh structure
Limiting participants (senders or receivers) in a p2p graph means some overlay/mesh structure should be organized and maintained over the network of nodes. Let's assume such purpose-specific subgroup of participants corresponds to a topic (as it is in the case of Ethereum 2.0).
Members of a subgroup/topic should have ways to find peers (within the subgroup/topic) to establish links and route messages.
It also may be required to be able for a non-member to communicate to a subgroup/topic. In some sense it's a lighter form of member-to-group communication, which requires discovery, but avoids full participation in mesh building/routing.
Topic membership can be rather stable (change slowly over time) or it can be dynamic and fast changing. For example, a next block proposer (who should receive aggregated attestations) changes each slot. That means membership subprotocol should be rather fast.
#### Topic discovery/DHT
Topic membership protocol requires discovery of peers (within topic) in some form. When a new member subscribes to a topic or want to leave, it should propagate info about itself, like node's address and port. A non-member willing to communicate to a topic, also should be able to obtain the info somehow.
This can be implemented via DHT. However, DHT are known to have weak BFT properties, which can lead to attacks.
### Ethereum 2.0 details
Ethereum 2.0 specifications considers most of the above issues, either implicitly or explicitly.
Specification assumes proposers and attesters communicate with PubSub/topic pattern. There are three (main) kind of such communication:
* a proposer sends a new block to beacon_block topic
* attesters aggregate their attestation in shard##\_attestaion subnet (topic also)
* result aggregate attestations are sent to beacon_attestation topic
#### Patterns
#### Overlay/mesh structure
#### Network details
### BFT and robustness
???
## Problems
### Scalability aspects of Ethereum 2.0
#### Block size
Block size considerations are described in a separate [document](https://hackmd.io/ESCE0EAaRhqM5qsJDAY1gA).
#### In-/outbound traffic
### Pattern aspects
Publish Subscribe is not appropriate in general for block and attestation propagation in Ethereum 2.0. While initially it seems like a good pattern, when looking at p2a, a2a and a2p communication in details, one can see that it's not really an appropriate pattern.
In general, PubSub separates Publisher from Subscribers, so that an architecture can be flexible. However, the core of Ethereum 2.0 is well defined, and scalability issues are more important, since nodes must propagate relevant information (blocks, attestations) to relevant receivers (proposers, attesters) within half a slot (3 sec, it's critical for new blocks and maybe relaxed for attestations, though still highly desirable). Thus flexibility of existing implementations of PubSub/topic pattern can contradict scalability requirements.
#### Proposer to attesters
In this case, PubSup topic pattern is more or less well matched: one sender sends to a stable group (most nodes are expected to subscribe).
However, a sender changes each round, so there is no possibility to organize effective routes (e.g. tree-like).
The other problem is that a proposer sends to the whole network (most nodes will subscribe beacon_block). While an immediate requirement is to propagate to active validators only, which is about six time less nodes (64 / 10) during half a slot. The others can receive a bit later.
So, while it may be an appropriate pattern to disseminate new blocks in general, it's not a good fit, when the goal is to reliably propagate to a smaller group within 3 second period.
#### a to a
#### p to a
### Overlay management
#### Topic discovery
### BFT problems
#### p2p graph
#### topic discovery
### BFT and robustness
Many BFT protocols assume point to point channels. However, such an approach doesn't scale to large size networks.
For example, consider a system with 5K-15K attesting nodes (an expected number of active validators in a slot). If a proposer sends a 64K message to attesters via point to point channels, it should send 320-960 Mbytes total, during a short period of time (e.g. 3 sec = half slot duration).
An obvious solution is to send to a much smaller amount of peers, and they resending the message further to their respective peers. From network topology point of view, we are restricting maximum degree of a communication graph. And limiting the outbound traffic.
However, in a "Byzantine" context, there arise opportunities for an adversary to block message propagation. The message propagating paths include intermediate nodes now, and while there are redundant paths in such a graph, the redundancy is limited by the maximum degree of the graph and by routing logic. For example, if a proposer sends to 10 peers, 10 malicious nodes would be enough, if all of the proposer peers happen to be such nodes.
That means, BF tolerance contradicts scalability - in order to increase amount of non-overlapping paths in a p2p-graph, one has to increase node degrees in the graph and that means higher outbound traffic.
One way to lower the outbound traffic is to reduce the message size. E.g. in the gossipsub protocol a sender sends a message to small amount of peers, but also sends an IHAVE message to a significantly bigger set of nodes, which can request the message, if they haven't received it from their peers.
Still, it's not clear Byzantine Fault tolerance properties of the approach. Especially, robustness could become a problem. For example, if all ten immediate peers of the proposer happen to be malicious nodes and block the message, the other nodes which received IHAVE message, will request it, resulting in outbound traffic growth, which means the target to propagate the message during three seconds can be missed.
## Communication patterns
Specification assumes proposers and attesters communicate with PubSub pattern.
I.e. a proposer sends a new block to beacon_block topic, attesters communicate to each other in shard##_attestaion subnet. And the result aggregate attestations are sent to beacon_attestation.
However, there can be some mismatch in PubSub pattern and Eth 2.0 requirements.
Let's consider this in details:
### Proposer to attesters
In this case, PubSup topic pattern is well matched: one sender sends to a stable group (topic subscribers), the sender being a member of group (topic subscriber).
One problem is that sender changes each round, so establishing tree-like routes can be a problem.
Scalability vs BFT issue is discussed above.
An additional problem is that a proposer sends to the whole network (most nodes will subscribe beacon_block). While an immediate requirement is to propagate to active validators only, which is about six time less nodes (64 / 10) during half a slot. The others can receive a bit later.
### Attesters to attesters
Attesters need to aggregate their attestations prior to sending them out, in order to reduce traffic. There should be around 5-15K attestations each slot, so compressing them before propagating to beacon_attestations is a good idea.
Validators of an active committee from a stable group (i.e. validators are assigned to a shard for a month or so) and the coummunication in the phase is inside group. However, it's not PubSub pattern, since the (partial) attestation aggregate grows as it is propagated through active committee subnet.
So, while the network mesh would be similar to topic PubSub and stable through time, it's another pattern.
BFT can be an issue since nodes are sending to a limited number of peers, that means an adversary can block propagation with relatively small amount of nodes.
### Attesters to proposer
Specification prescribes attesters to send their results to beacon_attestation topic, which members are to be proposers mostly.
That means senders are not members of the topic, so it's a bit different protocol than when a node sends to the topic it is a member of.
If active validators are to be members of beacon_attestation than there will be scalability issues, since it's much bigger targer, i.e. lots of senders and lots of destinations.
In both cases, beacon_attestation topic membership is very dynamic - it should change each slot, since a proposer should subscribe and current members should gradually unsubscribe after its slot elapsed (in order to keep topic and propagating efforts small).
However, it's unclear how to maintain necessary routing information in the case:
* small size topic, i.e. very sparse overlay mesh over entire p2p-graph
* non-members sending to the topic, how they are supposed to obtain routing information?
Altogether, it seems that PubSub/gossipsub is execessive here. Non-topic senders should somehow obtain a proposer address anyway (because of lack of a stable mesh of significant size) and can send ther result directly (without bothering with gossipsub details and BFT issues).
### Topic discovery
It's presumed that nodes subscribing to a topic, should use Topic Discovery service. Non-members willing to send to a topic, could also use the service.
Two issues here:
* how fast topic discovery/subscription? Topic Discovery states that average process is about 5 minutes. While a new proposer will be chosen for a slot one-two epochs beforehand. Since an epoch duration is lightly bigger than 3 minutes, it will be a problem.
* BF tolerance. DHTs are known to be not quite safe from BFT perspective. So, if attesters will requiest Topic Discovery, which is based on Kademlia DHT, for an address of beacon_attestation members, an adversary can attack DHT and populate it with wrong information.