---
mathjax:
config:
TeX:
autoNumber: "ams"
---
# Simulating Link Bandwidth
In this writeup, I will be reasoning through approaches for modeling bandwidth constraints in our simulations. Our goal is to come up with a strategy that is:
1. simple;
2. efficient;
3. realistic enough for our purposes.
As for (3), our primary goal is to understand download latency (i.e. the time it takes for a node to obtain a viable copy of a file) and resource usage (i.e. the load to expect on the various participants of the swarm).
## Strategy 1 - Don't Do It
Not modeling bandwidth constraints is, of course, the go-to strategy if you can get away with it. This typically hinges on being able to assume that link capacity does not dominate what you are trying to measure, and is a common assumption when simulating, say, the delivery latency of a push gossip protocol [^1], where one can assume messages are so small that they will typically not congest a link.
Even round-based protocols like anti-entropy [^1], which in principle move around a lot more data, are also amenable to such simplifications if we assume that the round length $\Delta_r$ is large enough that latency is dominated not by available bandwidth but by the round length itself.
Indeed, even in cases in which we cannot make such a claim beforehand (i.e. that bandwidth scarcity is not the dominating factor), we can test for it. For instance, one could:
1. put no caps on bandwidth;
2. simulate;
3. look at stats like the number of messages a node has to process per round when running the simulation;
4. draw conclusions as to whether the assumption holds or not.
If, after simulating, we conclude that the protocol requires nodes to process one billion messages of $1$MB in size per $\Delta_r$, and $\Delta_r$ equals $1$ second, then we are probably in trouble.
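To illustrate step (4), a back-of-the-envelope check over the simulation stats might look as follows. This is a minimal sketch only; all names (`bandwidth_assumption_holds`, `messages_per_round`, etc.) are hypothetical stand-ins for whatever the simulator actually reports:
```python=
def bandwidth_assumption_holds(messages_per_round: int,
                               message_size: float,
                               round_length: float,
                               link_capacity: float) -> bool:
    """True if the traffic a node must process per round fits within
    its link capacity (sizes in bits, times in seconds, capacities in
    bits per second)."""
    required_rate = messages_per_round * message_size / round_length
    return required_rate <= link_capacity

# One billion 1MB (8e6-bit) messages per 1-second round over a 1Gbps
# link requires 8e15 bits/s of capacity -- clearly in trouble:
print(bandwidth_assumption_holds(10**9, 8e6, 1.0, 1e9))  # False
```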
Bitswap and BitTorrent, however, _i)_ are not round based, and _ii)_ do not exchange messages of negligible size. Indeed, unlike in push gossip protocols or anti-entropy, link capacity is a key driver of expected download times: if we assume infinite bandwidth, nodes can simply download everything as fast as the network latency model allows -- or instantaneously, if there is none.
## Strategy 2 - Rate-Limited Downloads
A simple way to try to work around the problem would be to assume that a node can download at most $\alpha$ blocks at a time (based on some simple estimate of bandwidth and block size), and that downloading this number of blocks takes time $t_\alpha$. If we do that, downloading $n$ blocks from neighbors is no longer instantaneous, and takes $t_{\alpha} \times n / \alpha$ instead (Figure 1).

**Figure 1.** Rate-limited downloads.
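Concretely, this estimate could be computed with a helper as simple as the following sketch (the name `download_time` is illustrative only):
```python=
def download_time(n_blocks: int, alpha: int, t_alpha: float) -> float:
    """Time to download n_blocks at a rate of at most `alpha` blocks
    per `t_alpha` seconds, i.e. t_alpha * n / alpha."""
    return t_alpha * n_blocks / alpha

# e.g. 1000 blocks, 50 at a time, 2 seconds per batch -> 40 seconds:
print(download_time(1000, 50, 2.0))
```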
This rate-limiting scheme is, to some extent, equivalent to transforming our exchange protocol into a round-based protocol, where $\Delta_r = t_\alpha$. While definitely an improvement over having no cap in place, this approach has its own share of issues. If we have a network with $100$ leechers and $1$ seeder (using BitTorrent terminology), we will be assuming that the seeder can serve $\alpha \times 100$ blocks per $t_\alpha$ as the swarm starts, which might not be reasonable. Simulated download times may therefore still not be trustworthy, and we will still need to keep an eye out for absurd emergent link capacity requirements and act on them.
It may also be difficult to get an accurate view of what protocol performance might look like if all we can tune are swarm-wide, time-invariant parameters $\alpha$ and $t_{\alpha}$. To make it more tunable, we would likely need to introduce an $(\alpha, t_\alpha)$ pair per node, always with the caveat that actual link utilization depends not on $\alpha$ but on the number of requests a node receives over time, which is not explicitly parameterizable here.
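For illustration, such a per-node parameterization could be as simple as the sketch below (the `NodeRateLimit` name and structure are hypothetical):
```python=
from dataclasses import dataclass

@dataclass
class NodeRateLimit:
    """Per-node rate-limit parameters: at most `alpha` blocks in
    flight at once, each batch taking `t_alpha` seconds."""
    alpha: int
    t_alpha: float

    def download_time(self, n_blocks: int) -> float:
        return self.t_alpha * n_blocks / self.alpha

# A fast and a slow node downloading the same 1000-block file:
fast = NodeRateLimit(alpha=100, t_alpha=1.0)
slow = NodeRateLimit(alpha=10, t_alpha=1.0)
print(fast.download_time(1000), slow.download_time(1000))  # 10.0 100.0
```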
## Strategy 3 - Simple Link Capacity Delays
We could instead attempt to model link capacity directly, assuming that a network link $l_j$ has outbound and inbound bandwidth constraints $(o_j, i_j)$ and computing the delay of sending a message $m$ of size $s_m$ over $l_j$ as:
\begin{equation}
t_m = \max \left(\frac{s_m}{o_j}, \frac{s_m}{i_j} \right)
\tag{1}
\end{equation}
This is quite simple and has been done before, for instance [here](https://github.com/heikoheiko/p2p-network-simulator/blob/master/peer.py#L38). The obvious shortcoming of this approach is that it assigns a fixed delay to each message without taking into account whether or not the link is congested; congestion will typically incur more complex delays than what Eq. (1) can capture.
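Eq. (1) translates directly into code; a minimal sketch (names are illustrative):
```python=
def link_delay(message_size: float, outbound_bw: float,
               inbound_bw: float) -> float:
    """Fixed delay of Eq. (1): the slower of the two link directions
    dominates (sizes in Mb, bandwidths in Mbps, delay in seconds)."""
    return max(message_size / outbound_bw, message_size / inbound_bw)

# A 100Mb message over a link that sends at 100Mbps but can only be
# received at 50Mbps takes 2 seconds:
print(link_delay(100.0, 100.0, 50.0))  # 2.0
```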
## Strategy 4 - Link Capacity and Queueing
If we want to capture link congestion, then there is probably no way around using a queueing model. We will start with a simple model which produces a pessimistic estimate of delays, and then refine it into a more realistic one, which we refer to as the "double clearance" model.
**Simple Model.** Every node keeps an _inbound_ and an _outbound_ message queue per (full duplex) link. A node wishing to send a message over a link starts by putting it into its outbound queue.
If the queue is empty, we set the outbound portion of the link to _busy_ and immediately schedule a `Transmit` event with delay $s_m/o_j$, simulating the time the message takes to be pushed onto the wire. If the queue is not empty, the message enters the tail of the queue, where it must wait for the other messages to be transmitted first. Note that this guarantees that there is always at most one pending `Transmit` event per link, namely the one for the message currently being transmitted.
Once the `Transmit` event for the in-flight message runs, it triggers a callback (`on_transmit`) which repeats this same procedure on the next message in the queue.
A `Transmit` will also cause the corresponding message to be put into the receiver's inbound queue, where it follows a similar process: it has to wait in the receiver's inbound queue until we can schedule a corresponding `Receive` event with delay $s_m/i_j$. When the `Receive` event finally runs, the message is delivered at the receiver.
**Shortcomings of the Simple Model.** The main problem with this model is that it counts the delays incurred by the sender's link and delays incurred by the receiver's link separately. To see why this is a problem, suppose that we had a 100Mbps link between nodes $a$ and $b$, and that we wanted to send a $100$Mb message from $a$ to $b$.
The delay computed by the method would be $s_m/o_j + s_m/i_j = 100/100 + 100/100 = 2$ seconds. But it is clear that the transmission delay, in the absence of congestion or network latency, should be $1$ second. How can we fix this?
**The Double Clearance Model.** The flaw in the simple model happens because it assumes that the receiver starts receiving only after the transmitter finishes transmitting, which is false: as soon as the first byte enters the network, the receiver can already receive it. A simple way to address this is to enqueue the message at the receiver _as soon as the corresponding_ `Transmit` _is scheduled._
To see why this works, note that we can visualize a link as something that produces a certain number of "tokens" every second. If a node can transmit at 1Mbps, then it generates 1M tokens per second. Similarly, a node that can receive at 2Mbps will generate 2M tokens per second.
A successful transmission happens when a message $m$ is able to clear $s_m$ tokens at the sender and then clear another $s_m$ tokens at the receiver (hence double clearance); i.e., both ends need to have enough tokens, and this needs to happen first at the transmitter and then at the receiver.
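To make this concrete, revisit the $100$Mb message over the $100$Mbps link from before. For a single message over otherwise idle links, the `Receive` gets scheduled as soon as the `Transmit` is, so the two clearances overlap in time and the delivery delay becomes
\begin{equation}
t_m = \max \left(\frac{s_m}{o_j}, \frac{s_m}{i_j} \right) = \max \left(\frac{100}{100}, \frac{100}{100} \right) = 1
\tag{2}
\end{equation}
second, as expected. Note that in this uncongested case the delay coincides with Eq. (1); the difference is that queueing delays now emerge on top of it as links get busy.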
This model allows us to simulate multi-sender, multi-receiver scenarios with differing and asymmetric link capacities: if a sender has a lot of bandwidth, it will eat through its queue quickly, but receiver queues might pile up.
The algorithm is expressed as Python pseudocode in Listings 1, 2, and 3.
```python=
def send_message(outbound: Link, inbound: Link,
                 message: Message, engine: Engine):
    outbound.queue.enqueue(message)
    # If the link is idle, start transmitting right away. Otherwise, a
    # pending on_transmit will eventually pick this message up.
    if outbound.state != LinkState.busy:
        transmit_next_message(outbound, inbound, engine)

def transmit_next_message(outbound: Link, inbound: Link, engine: Engine):
    outbound.state = LinkState.busy
    message = outbound.queue.dequeue()
    # outbound.bandwidth is o_j, so the delay below is s_m / o_j.
    engine.schedule(Transmit(message, inbound, outbound),
                    delay=message.size / outbound.bandwidth)
    # Double clearance: enqueue at the receiver as soon as the
    # Transmit is scheduled, not when it completes.
    receive_message(inbound, message, engine)

def on_transmit(event: Transmit, engine: Engine):
    event.message.sender_cleared()
    if not event.outbound.queue.empty():
        transmit_next_message(event.outbound, event.inbound, engine)
    else:
        event.outbound.state = LinkState.free
```
**Listing 1.** Messages with bandwidth-incurred delays (transmitter).
Protocols call `send_message` when they need to send messages (Listing 1). This causes the message to be enqueued and, if the link is not already transmitting, `transmit_next_message` gets called. Otherwise, a pending `on_transmit` is already scheduled and will take care of transmitting the next message.
`transmit_next_message` takes the next message from the queue and schedules a `Transmit` with delay $s_m/o_j$, which is the time the outbound link needs to push the message onto the wire. It also calls `receive_message` on the inbound link so that the message gets queued at the receiver right away.
`receive_message` works like `send_message` and enqueues the message at the inbound link, calling `receive_next_message` if the link is not busy. Once again, if the link is busy it means we have a pending `on_receive` scheduled which will process the next message.
`receive_next_message` then dequeues the next message and schedules a `Receive` event with a delay of $s_m/i_j$.
```python=
def receive_message(inbound: Link, message: Message, engine: Engine):
    inbound.queue.enqueue(message)
    # If the link is idle, start receiving right away. Otherwise, a
    # pending on_receive will eventually pick this message up.
    if inbound.state != LinkState.busy:
        receive_next_message(inbound, engine)

def receive_next_message(inbound: Link, engine: Engine):
    inbound.state = LinkState.busy
    message = inbound.queue.dequeue()
    # inbound.bandwidth is i_j, so the delay below is s_m / i_j.
    engine.schedule(Receive(message, inbound),
                    delay=message.size / inbound.bandwidth)

def on_receive(event: Receive, engine: Engine):
    event.message.receiver_cleared()
    if not event.inbound.queue.empty():
        receive_next_message(event.inbound, engine)
    else:
        event.inbound.state = LinkState.free
```
**Listing 2.** Messages with bandwidth-incurred delays (receiver).
Eventually, both `on_transmit` and `on_receive` will get triggered for the message, which will ultimately cause it to be delivered (Listing 3, `deliver`).
```python=
class Message:
    def __init__(self, receiver: Peer, size: float):
        self.receiver = receiver
        self.size = size                 # s_m
        self._sender_cleared = False
        self._receiver_cleared = False

    def sender_cleared(self):
        self._sender_cleared = True
        self.deliver()

    def receiver_cleared(self):
        self._receiver_cleared = True
        self.deliver()

    def deliver(self):
        # Deliver only once the message has cleared both the sender's
        # outbound queue and the receiver's inbound queue.
        if self._sender_cleared and self._receiver_cleared:
            self.receiver.deliver(self)
```
**Listing 3.** A message.
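For completeness, here is a minimal sketch of the scaffolding the listings assume. All of it (`Queue`, `Link`, `Transmit`, `Receive`, `Engine`, `Peer`) is a hypothetical toy implementation, not part of the model itself; a real simulator would be more elaborate. If you paste everything into one file, put these definitions (and `from __future__ import annotations`, so the listings' type hints resolve regardless of definition order) before Listings 1-3. Running the $100$Mb-over-$100$Mbps example from before then delivers the message at $t = 1$, as argued above:
```python=
from __future__ import annotations
import heapq
from collections import deque

class Queue:
    """Minimal FIFO exposing the enqueue/dequeue/empty API of the listings."""
    def __init__(self):
        self._items = deque()
    def enqueue(self, item): self._items.append(item)
    def dequeue(self): return self._items.popleft()
    def empty(self): return not self._items

class LinkState:
    free, busy = "free", "busy"

class Link:
    """One direction of a full-duplex link: a bandwidth plus a queue."""
    def __init__(self, bandwidth: float):
        self.bandwidth = bandwidth   # o_j or i_j
        self.state = LinkState.free
        self.queue = Queue()

class Transmit:
    def __init__(self, message, inbound, outbound):
        self.message, self.inbound, self.outbound = message, inbound, outbound
    def fire(self, engine): on_transmit(self, engine)

class Receive:
    def __init__(self, message, inbound):
        self.message, self.inbound = message, inbound
    def fire(self, engine): on_receive(self, engine)

class Engine:
    """Toy discrete-event loop: fires scheduled events in time order."""
    def __init__(self):
        self.now, self._seq, self._events = 0.0, 0, []
    def schedule(self, event, delay: float):
        self._seq += 1  # tie-breaker so the heap never compares events
        heapq.heappush(self._events, (self.now + delay, self._seq, event))
    def run(self):
        while self._events:
            self.now, _, event = heapq.heappop(self._events)
            event.fire(self)

class Peer:
    def __init__(self, engine: Engine):
        self.engine = engine
    def deliver(self, message):
        print(f"delivered at t = {self.engine.now:.1f}")

# The example from before: a 100Mb message over a 100Mbps link.
engine = Engine()
outbound, inbound = Link(bandwidth=100.0), Link(bandwidth=100.0)
message = Message(receiver=Peer(engine), size=100.0)
send_message(outbound, inbound, message, engine)
engine.run()  # prints "delivered at t = 1.0"
```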
[^1]: A. Demers et al. "[Epidemic algorithms for replicated database maintenance](https://dl.acm.org/doi/10.1145/41840.41841)".