This document sums up my findings on using GossipSub for larger messages, specifically for EIP-4844.
It's based on various simulations to understand real network behaviors.
You'll find the less polished history here: https://github.com/status-im/nim-libp2p/issues/850
The main goal is to keep the message propagation time low, and possibly save bandwidth.
All of the simulations are based on the assumption that one 0.5 MB message will be broadcast every 12 seconds in the network.
Thanks to Anton Nashatyrev for his help and simulations, and to arnetheduck for his ideas.
The conventional wisdom was that once messages reached 0.5 MB, duplicates would become our worst enemy, since connections would be busy sending them instead of sending useful data.
In practice, even eliminating 100% of duplicates has a very limited impact on propagation time in this scenario (about ~12%).
This can be explained because most of the duplicates actually happen once a majority of peers already have the message, so while that bandwidth is wasted, it is not to the detriment of the sending speed of useful messages.
Obviously, reducing duplicates will still lower the overall traffic.
TCP (cubic) tries to guess how much bandwidth is available on a specific socket by looking at packet losses. If packets are lost, it will decrease its estimate; if no packets are lost, it will increase it.
That's great, except that in Ethereum's case, we only use the connection to its full potential every few seconds, leaving it idle or at a very slow packet rate the rest of the time.
Since TCP won't have enough packets to estimate the bandwidth, it will just reset its estimate and restart the guessing process. So sending a 0.5 MB message on a "cold" connection will take a lot longer than on a "warm" one.
Upon receiving a big enough message, a node can send a control message to inform its mesh that it no longer needs this message. This reduces duplicates for large messages.
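A minimal sketch of the idea, assuming hypothetical `Peer`/`send_control` names and a size threshold; this is not the nim-libp2p API:

```python
# Hypothetical sketch: only choke for messages large enough that
# skipping a duplicate actually saves meaningful bandwidth.
CHOKE_THRESHOLD = 100_000  # bytes (illustrative value)

class Peer:
    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.control_log = []  # control messages sent to this peer

    def send_control(self, msg):
        self.control_log.append(msg)

def on_full_message(msg_id, data, mesh):
    """Once a large message is fully received, tell every mesh peer
    we no longer need it, so they can skip relaying it to us."""
    if len(data) >= CHOKE_THRESHOLD:
        for peer in mesh:
            peer.send_control({"type": "choke", "msg_id": msg_id})
```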
By adding ping support to gossipsub, we can continuously ping each connection to make sure it stays warm (at least every 2 RTTs). This is a bit hacky, as the TCP estimate might be off, but it's still better than falling back to a cold connection.
Another solution might be to use the `tcp_slow_start_after_idle` sysctl parameter, but since it is system-wide, changing it can have side effects.
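For reference, disabling it on Linux looks like this (note again that it affects every TCP connection on the machine, not just libp2p's):

```shell
# Disable the post-idle slow-start reset (system-wide, requires root)
sysctl -w net.ipv4.tcp_slow_start_after_idle=0
```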
Instead of relaying a message to every peer in your mesh at once, it's better to relay to one (or two) at a time. This lets those peers start relaying the message sooner, and gives more time to receive ChokeMessages from peers, reducing duplicates further.
Each message should be sent to the mesh in a random order. This helps distribute the load better across the mesh.
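A sketch of this staggered relay, where `send` and `is_choked` are hypothetical callbacks into the transport and the choke bookkeeping:

```python
import random

def relay(msg_id, data, mesh, send, is_choked, batch_size=2):
    """Relay `data` to the mesh in random order, `batch_size` peers at
    a time, skipping peers that choked us for this message."""
    order = list(mesh)
    random.shuffle(order)  # randomize order to spread the load
    for i in range(0, len(order), batch_size):
        for peer in order[i:i + batch_size]:
            # a Choke received while earlier batches were in flight
            # lets us skip this peer entirely
            if not is_choked(peer, msg_id):
                send(peer, data)
```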
To implement this, two ways are proposed:
ACK-based
After sending each message, ping the peer on the same stream and wait for a response. This costs an extra half RTT, but is bulletproof and easy to implement. Because of that half RTT, this should be limited to big messages.
Bandwidth estimation
Use the data from the continuous pings to estimate the available bandwidth with a peer. This data can then be used to estimate how long the send will take for each peer. See an example here
When possible, the BBR congestion-control algorithm should be used. It's unfortunately hard to enforce, since it requires users to load a kernel module (modprobe) on Linux, but it gives a significant boost in message propagation time (>15%).
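Enabling it on Linux typically looks like this (requires root, and again applies system-wide):

```shell
# Load the BBR module and select it as the TCP congestion control
modprobe tcp_bbr
sysctl -w net.ipv4.tcp_congestion_control=bbr
```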
Gains compared to Default:
| | Default | 4 chunks | 4 chunks + 2 parallel | 4 chunks + 2 parallel + choke |
| --- | --- | --- | --- | --- |
| Max lat | 0.00% | 26.00% | 50.03% | 61.47% |
| Mean lat | 0.00% | 18.92% | 54.00% | 58.69% |
| Bandwidth | 0.00% | 12.40% | 20.80% | 52.79% |
- 4 chunks: splitting the message in 4 chunks
- 2 parallel: limiting to 2 parallel relays
- choke: sending a ChokeMessage once the full message is received
Before sending a big message, a node could send a small "About to send <msgid>" notice, which the receiving node could use to optimistically send Choke to its mesh.
Because of mixed results in the simulations and potential security issues, it's still too early to push for wider adoption.
A node could eagerly push the message to 2 peers, and just send IHAVE to the rest of its mesh. Assuming nodes send a single IWANT per message, this would further reduce the number of duplicates in the network.
This needs further investigation in terms of latency & security considerations.
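The eager/lazy split could be sketched as follows, with `send` and `send_ihave` as hypothetical callbacks (peers receiving an IHAVE would answer with a single IWANT if they still need the message):

```python
import random

def publish(msg_id, data, mesh, send, send_ihave, eager=2):
    """Push the full message to `eager` random peers and only an
    IHAVE announcement to the rest of the mesh."""
    peers = list(mesh)
    random.shuffle(peers)
    for peer in peers[:eager]:
        send(peer, data)          # eager push: full message right away
    for peer in peers[eager:]:
        send_ihave(peer, msg_id)  # lazy push: announcement only
```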