changed 2 years ago
Linked with GitHub

GossipSub for big messages

Background

This document sums up my findings on using GossipSub for bigger messages, specifically for EIP-4844.
It's based on various simulations to understand real network behaviors.
You'll find the less polished history here: https://github.com/status-im/nim-libp2p/issues/850

The main goal is to keep the message propagation time low, and possibly save bandwidth.

All of the simulations are based on the assumption that one 0.5mb message will be broadcasted every 12 seconds in the network.

Thanks to Anton Nashatyrev for his help and simulations, and arnetheduck for his ideas

Learnings

1 - Duplicates don't matter that much for propagation time


Wisdom was that once we reached messages as big as 0.5mb, duplicates were becoming our worst enemies, since the connections were busy sending them instead of sending useful data.

In practice, even eleminating 100% of duplicates has a very limited impact on propagation time in this scenario (about ~12%).

This can be explained because most of the duplicates actually happen once a majority of peer already have the messages, so while that bandwidth will be wasted, it will not be at the detriment of the sending speed of useful messages.

Obviously, reducing duplicates will still have an impact on the overall traffic

2 - Bursty traffic is hard

TCP (cubic) tries to guess how much bandwidth is available on a specific socket by looking at packet looses. If packets are lost, it will decrease its estimate, if no packets are lost, it will increase it.

That's great, except that in ethereum case, we only use the connection to its full potential every few seconds, leaving it idle / at very slow packet rate for the rest of the time.

Since TCP won't have enough packets to estimate the bandwidth, it will just reset its estimate, and restart the guess process. So sending a 0.5mb message on a "cold" connection will take a lot longer than on a "warm" one.

Proposed gossipsub modifications

1 - ChokeMessage

Upon receiving a big enough message, a node can send a message to inform its mesh that he doesn't need this message anymore. This enables to reduces duplicates on large messages

2 - Ping

By adding ping support to gossipsub, we can continously ping each connection to make sure they stay warm (at least every 2 RTTs). This is a bit hacky, as the TCP estimate might be off, but it's still better than going back to a cold connection.

Another solution might be to use the tcp_slow_start_after_idle sysctl parameter, but that can have some side effects, since this is system-wide

3 - Staggered (/ sequential) relaying


Instead of relaying a message to all peers in your mesh at once, it's better to relay one (or two) at a time. This enables the peers to start relaying said message faster. This also gives more time to receive ChokeMessages from peers, reducing duplicates further.

Each message should be sent to the mesh in a random order. This will help distribute the load better accross the mesh

To implement this, two ways are proposed:

  1. ACKed based
    After sending each message, ping the peer on the same stream and wait for a response. This will cost half a RTT more than necessary, but is bulletproof and easy to implement. Because of the half RTT, this should be limited to big messages

  2. Bandwidth estimation
    Use the data from the continuous pings to estimate the available bandwidth with a peer. This data can then be used to estimate how long the send will take for each peer. See an example here

4 - BBR

When possible, the BBR congestion algorithm should be used. It's unfortunately hard to enforce, since this requires a modprobe by the users on linux, but it gives a significant boost in message propagation time (>15%)

All together

Gains compared to Default:

Default 4 chunks 4 chunks + 2 parallel 4 chunks + 2 parallel + choke
Max lat 0.00% 26.00% 50.03% 61.47%
Mean lat 0.00% 18.92% 54.00% 58.69%
Bandwidth 0.00% 12.40% 20.80% 52.79%

4 chunks: splitting the message in 4 chunks
2 parallel: limiting to 2 parallel relays
choke: ChokeMessage

Potential improvements which are still WiP

"Notify" message

Before sending a big message, a node could send a small "About to send <msgid>", which the receive node could use to send Choke to its mesh optimistically

Because of mixed results in the simulations and potential security issues, its still early to push for wider adoption

Lazy publish

A node could push eagerly the message to 2 peers, and just send IHAVE to the rest of his mesh. Assuming nodes sent a single IWANT per message, this will further reduce the amount of duplicates in the network.

This needs further investigation in terms of latency & security considerations.

Select a repo