# "Impact of Network Topology on Anonymity and Overhead in Low-Latency Anonymity Networks"
###### tags: `Tag(HashCloak - Validator Privacy)`
Author(s): Claudia Diaz, Steven J. Murdoch, and Carmela Troncoso
Paper: https://www.freehaven.net/anonbib/cache/topology-pet2010.pdf
### Table of Contents
[toc]
:::info
>Abstract: Low-latency anonymous communication networks require padding to resist timing analysis attacks, and dependent link padding has been proven to prevent these attacks with minimal overhead. In this paper we consider low-latency anonymity networks that implement dependent link padding, and examine various network topologies. We find that the choice of the topology has an important influence on the padding overhead and the level of anonymity provided, and that Stratified networks offer the best trade-off between them.
>
>We show that fully connected network topologies (Free Routes) are impractical when dependent link padding is used, as they suffer from feedback effects that induce disproportionate amounts of padding; and that Cascade topologies have the lowest padding overhead at the cost of poor scalability with respect to anonymity. Furthermore, we propose a variant of dependent link padding that considerably reduces the overhead at no loss in anonymity with respect to external adversaries. Finally, we discuss how Tor, a deployed large-scale anonymity network, would need to be adapted to support dependent link padding.
:::
## 1 Introduction
A common solution to thwart timing analysis is the use of padding, i.e., dummy packets indistinguishable from (encrypted) real data. In this paper we consider Dependent Link Padding (DLP), a variant of padding in which the amount of dummy traffic generated at the output of a node depends on its input traffic.
We find that the topology of the network has a strong influence on both overhead and anonymity.
* Cascade networks introduce the lowest overhead, but at the cost of poor scalability in terms of anonymity.
* Fully connected networks (Free Routes) offer high anonymity, but suffer from feedback effects that cause huge overhead.
* Stratified networks offer the best anonymity vs. overhead trade-off: of all topologies, they provide the best level of anonymity, and their overhead is much lower than that of Free Routes.
We introduce a restricted variant of the Stratified topology that further reduces the overhead at almost no cost in anonymity. Moreover, restricted topologies have better scalability.
In anonymity networks, connections between two routers are commonly encrypted and carry multiple data flows. We propose Reduced Overhead Dependent Link Padding (RO-DLP), a variant of dependent link padding that takes advantage of this property. RO-DLP provides the same level of protection as DLP towards external adversaries – who can observe communications but do not control any router – while substantially reducing the overhead.
## 2 Background and Related Work
**Low-latency Anonymous Communications:** Goldschlag, Reed, and Syverson introduced onion routing in 1996, and a second generation protocol has been implemented in the Tor network.
* Onion routing is designed to provide a bidirectional, low-latency anonymous communication channel that can be used for applications like web browsing.
* Onion routers perform cryptographic operations on the data they relay, so that the relationship between input and output data packets cannot be inferred from analyzing their content.
* The feasibility of delaying and reordering packets in onion routing is, however, limited by latency constraints; incoming and outgoing packets may therefore be linked in these systems by means of timing analysis or end-to-end correlation attacks.
**Padding to Resist Timing Analysis:**
To conceal the relationship between incoming and outgoing flows, dummy traffic (padding) must be added to the data flows.
* Data packets leaving each node are augmented by dummy packets which the adversary cannot distinguish from (encrypted) real data packets.
* In addition, the start and end time of the flows must be obscured to prevent traffic analysis attacks based on correlating the timing of these events [12].
* This can be achieved by synchronizing session start and end between all clients [13].
An algorithm for performing DLP, whilst guaranteeing a maximum latency $∆$ at each node, and minimizing the amount of padding, was independently discovered by Venkitasubramaniam and Tong [22] and Wang et al. [23].
Their algorithm works as follows: when a packet is received at time $t$, the node checks whether a padding packet has already been scheduled for transmission on the corresponding outgoing link.
* If it is, the padding packet is replaced with the real packet.
* If not, the real packet is scheduled at time $t + ∆$ and padding packets are scheduled at the same time on all other outgoing links.
In this way, no packet will be delayed for more than $∆$ and the scheme is optimal in that it achieves mixing with the minimal amount of padding.
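The scheduling rule above can be condensed into a minimal Python sketch. The class name and data structures are our own illustration; the papers specify only the rule, not an implementation:

```python
DELTA = 1.0  # maximum per-node latency in seconds (illustrative value)

class DLPNode:
    """Sketch of the DLP scheduling rule of [22, 23]."""

    def __init__(self, out_links):
        self.out_links = out_links
        # per-link list of (send_time, is_dummy) scheduled transmissions
        self.schedule = {link: [] for link in out_links}

    def on_receive(self, t, link):
        """A real cell arrives at time t, to be forwarded on `link`.
        Returns the time at which the cell will leave the node."""
        # If a dummy is already scheduled on that link, replace it with
        # the real cell; it then leaves in the dummy's slot.
        for i, (send_time, is_dummy) in enumerate(self.schedule[link]):
            if is_dummy and send_time >= t:
                self.schedule[link][i] = (send_time, False)
                return send_time  # delayed by at most DELTA
        # Otherwise, schedule the real cell at t + DELTA on its link and
        # a dummy at the same time on every other outgoing link.
        self.schedule[link].append((t + DELTA, False))
        for other in self.out_links:
            if other != link:
                self.schedule[other].append((t + DELTA, True))
        return t + DELTA
```

Note how a cell arriving on one link triggers synchronized transmissions on all outgoing links, which is what makes the output timing uninformative.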
**Anonymity Metrics:** By observing – or actively attacking – an anonymous communication system, the adversary typically obtains a probability distribution linking the initiator of a communication to all possible recipients, and vice versa.
Computing the anonymity of communications in complex networks while taking into account all information available to the adversary is infeasible to do analytically [15], as it requires enumerating all possible combinations of internal states in the routers, as well as initiator-recipient relationships.
Previous comparisons of network topologies [7] avoided this problem by simplifying the analyzed scenario – e.g., assuming that the load on each internal link within the network is exactly equal to the statistically expected load given a particular network topology.
* The Markov Chain Monte Carlo methodology recently proposed in [20] is based on sampling possible internal states and initiator-recipient correspondences that satisfy all constraints.
* This allows the efficient estimation (for a given confidence interval) of the adversary’s probability distribution, taking into account all the available information.
* The model in [20] considers high-latency threshold mix networks.
* We note that the adversary’s observation of a low-latency anonymity network that implements DLP, and has synchronous starts and ends of connections, is equivalent to that of a high-latency network made of threshold mixes.
* Therefore, the methodology can be used without major modifications to extend the analysis in [22] to consider routing constraints.
## 3 Model
**System Model:** We consider an anonymity system based on onion routing that implements dependent link padding and has synchronous starts and ends of connections. For simplicity, we assume that the path length of circuits is always three and require that all three routers in the path are distinct.
When a client wishes to make a request, the system works as follows.
* First, it constructs a route selecting three routers (nodes) from the list of all available nodes, subject to topology constraints.
* It then connects to the first node (entry) and exchanges keys to form an encrypted tunnel.
* Over this encrypted tunnel, the client connects to the second node (middle) and then the third node (exit), exchanging keys at each point such that each node knows the previous and next hops, but no more.
The connection through the three nodes, along with the corresponding keys, is known as a *circuit*.
* Once the circuit is established, the client requests that the last node create a *stream* to carry the application data.
* Data is packaged into fixed length cells which are subsequently encrypted under the keys shared with the recipient, exit, middle, and entry nodes, and sent to the entry node.
* At each hop, one layer is removed until the recipient finally decrypts the payload.
* Note that DLP requires that the recipient be able to decrypt data cells, and to discard the dummy cells that have been added to the stream.
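The layered encryption and decryption described above can be illustrated with a toy sketch. XOR with a hash-derived stream stands in for the real cipher, and all keys and names are hypothetical:

```python
import hashlib

def layer(key: bytes, data: bytes) -> bytes:
    """Toy layer cipher: XOR with a key-derived stream. XOR is involutive,
    so applying `layer` again removes the layer. A stand-in for the
    AES-based onion encryption real systems use."""
    stream = hashlib.sha256(key).digest()
    while len(stream) < len(data):
        stream += hashlib.sha256(stream).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

# hypothetical per-hop keys shared between the client and each party
keys = {"recipient": b"k0", "exit": b"k1", "middle": b"k2", "entry": b"k3"}

cell = b"application data"

# The client wraps the cell under the recipient, exit, middle, and entry
# keys (inside-out), producing the onion sent to the entry node.
wrapped = cell
for hop in ["recipient", "exit", "middle", "entry"]:
    wrapped = layer(keys[hop], wrapped)

# Each hop peels one layer; the recipient finally recovers the payload.
for hop in ["entry", "middle", "exit", "recipient"]:
    wrapped = layer(keys[hop], wrapped)
assert wrapped == cell
```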
Multiple circuits may be carried on the link between any given pair of nodes.
* In addition to the circuit-level cryptography, which is end-to-end, there is also hop-by-hop link-level cryptography protecting the traffic between nodes.
* An external adversary will therefore not be able to tell, based on content, whether two cells correspond to the same circuit or to different ones.
**Attacker Model:** We assume that the adversary is *global*: it observes traffic on all communication links and knows the number of circuits routed over each of them. Furthermore, the adversary is active, and may introduce, delay, or drop cells.
* We note that DLP protects against active attacks, as all streams coming out of a node are identical.
* e.g., if the adversary deploys traffic watermarking attacks [9, 24], then all outgoing streams will carry the watermark.
Throughout the analysis, we also assume that all nodes are trustworthy, and thus the adversary is *external* and has no knowledge of node keys or other internal state information.
## 4 RO-DLP: Reducing Padding Overhead in DLP
![](https://i.imgur.com/HOHO7IK.png)
In the original DLP proposals, nodes pad every outgoing circuit in the same way, independently of whether or not some circuits are being multiplexed over the same link. In anonymity networks however, nodes typically use link encryption, which hides the correspondence of cells to circuits within a link.
In this section we present the Reduced Overhead Dependent Link Padding (RO-DLP) algorithm. Compared to simple DLP, RO-DLP reduces the amount of dummy traffic sent over links that multiplex several circuits, while achieving the same level of security against global external adversaries that do not control nodes.
The goal of link padding is to prevent the adversary from learning the correspondence between incoming and outgoing circuits. Given that at time $t$ the node forwards $R_t$ cells, we show that it is enough to send $R_t$ cells over links that contain a number $c_i$ of circuits that is larger than $R_t$.
* Let us consider a node $n$ that routes $C$ circuits over $L$ links (note that $L ≤ C$), and let $c_i$ denote the number of circuits multiplexed over link $l_i$ ($1 ≤ i ≤ L$, and $\sum^L_{i=1} c_i = C$).
* Initially, RO-DLP schedules a cell for each of the $C$ outgoing circuits, as in DLP.
* Thus, at time $t$ a set of $C$ cells are scheduled, of which $R_t$ correspond to cells that are being forwarded, and $C −R_t$ are dummy cells generated by node $n$.
* RO-DLP removes $r_i$ dummy cells from link $l_i$ as follows:
![](https://i.imgur.com/3tkWIUs.png)
The intuition behind this algorithm is the following:
* The adversary monitors the number of cells arriving at node $n$ and can predict the number $R_t$ of cells that will be forwarded at time $t$.
* When $c_i > R_t$ cells are sent over link $l_i$, the adversary knows that (at least) $c_i − R_t$ of them are dummy cells generated by $n$, and thus they do not provide any additional protection.
Note that if each link contains only one circuit, then no dummies can be removed and RO-DLP's overhead is the same as DLP's [22, 23]. If all circuits going through a node are routed over one single link (e.g., in a Cascade topology), then no dummies would be sent by that node and RO-DLP would not generate any intra-network overhead.
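The removal rule can be condensed into a small sketch (a hypothetical Python helper of our own; the paper specifies the rule, not this code):

```python
def ro_dlp_link_loads(circuits_per_link, r_t):
    """Cells RO-DLP sends on each link at a transmission time where r_t
    real cells are forwarded in total.

    Plain DLP sends one cell per circuit, i.e. sum(circuits_per_link)
    cells. On a link multiplexing c_i circuits, any cells beyond r_t are
    provably dummies to the adversary, so RO-DLP sends only min(c_i, r_t)
    cells; the r_i = max(0, c_i - r_t) removed cells are all dummies,
    since at most r_t real cells can traverse any one link.
    """
    return [min(c_i, r_t) for c_i in circuits_per_link]

# A node routing C = 6 circuits over links carrying (4, 1, 1) circuits,
# forwarding R_t = 2 real cells: DLP sends 6 cells, RO-DLP sends 4.
# With all 6 circuits on one link (a cascade), RO-DLP sends only R_t cells.
```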
## 5 Experimental Setup
We have implemented a simulator to evaluate the anonymity and dummy traffic overhead in anonymity networks that implement dependent link padding. Our simulator generates networks of $N$ nodes, where $N$ is an input parameter. Users create circuits that traverse three nodes before reaching their destination. We call entry node the first node in the circuit path, middle node the second, and exit node the third and last node. We consider four possible topologies, shown in **Figure 2**:
![](https://i.imgur.com/nAjTANc.png)
* **Free Routes (FR):** Any combination of three distinct nodes is a valid circuit path.
* Given an entry node, we choose, uniformly at random, a middle node from the remaining $N − 1$ nodes, and an exit node from the remaining $N − 2$ nodes.
* **Stratified (S):** Nodes are divided into entries, middle nodes, and exits ($N/3$ nodes in each category), such that any entry connects to any middle, and any middle to any exit.
* Given an entry node, we choose uniformly at random one of the $N/3$ middle nodes, and one of the $N/3$ exits.
* **Stratified Restricted (SR):** As in the previous case, nodes are divided into entry, middle and exit nodes.
* We have chosen values of $N$ of the form $N = 3K^2$, where $K$ is an integer.
* Each entry node is connected to $K = \sqrt{N/3}$ middle nodes, and each middle node to $K$ exits.
* An entry node $i \space (0 ≤ i ≤ N/3 − 1)$ is connected to middle nodes $N/3 + [(i + j) \bmod N/3]$, with $j = 0, \ldots, K − 1$;
* and a middle node $i \space (N/3 ≤ i ≤ 2N/3 − 1)$ is connected to exit nodes $2N/3 + [(i + j · K) \bmod N/3]$, with $j = 0, \ldots, K − 1$.
* Given an entry node, we construct the circuit path by choosing uniformly at random one of the $N/3$ exits, and then finding the middle node that connects the entry to the exit (in this topology, each entry is connected to each exit by exactly one middle node).
* The intuition behind this topology is to allow every entry node to connect to any exit node, while minimizing the number of links.
* **Cascades (C):** We consider $N/3$ parallel cascades of three nodes each.
* Given an entry node, the middle and exit nodes are fixed by the topology.
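The Stratified Restricted connectivity formulas above can be checked programmatically. The following sketch (our own code, with our own node numbering: entries $0..N/3−1$, middles $N/3..2N/3−1$, exits $2N/3..N−1$) builds the topology and confirms that each entry reaches each exit through exactly one middle node:

```python
def sr_topology(N):
    """Build the Stratified Restricted links for N = 3*K^2 nodes,
    following the formulas in the text."""
    K = round((N / 3) ** 0.5)
    assert 3 * K * K == N, "N must be of the form 3*K^2"
    third = N // 3
    # entry e connects to middles third + (e + j) mod third, j = 0..K-1
    entry_to_mid = {e: {third + (e + j) % third for j in range(K)}
                    for e in range(third)}
    # middle m connects to exits 2*third + (m + j*K) mod third, j = 0..K-1
    mid_to_exit = {m: {2 * third + (m + j * K) % third for j in range(K)}
                   for m in range(third, 2 * third)}
    return entry_to_mid, mid_to_exit

def middles_between(e2m, m2x, entry, exit_node):
    """Middle nodes connecting a given entry to a given exit."""
    return [m for m in e2m[entry] if exit_node in m2x[m]]
```

For instance, for $N = 12$ ($K = 2$) and $N = 27$ ($K = 3$), every (entry, exit) pair is connected by exactly one middle node, as claimed.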
## 6 Results
We examine networks in terms of *anonymity loss* and dummy traffic (padding) *overhead factor*.
The anonymity loss is the difference between the maximum achievable anonymity given the total number of circuits routed by the network and the actual anonymity that the network provides to its circuits. We note that when DLP is deployed, the timing of packets does not leak any information, and the anonymity provided by the system depends only on the routing constraints.
Given a circuit $c_x$, we compute its anonymity loss as $H_{loss} = H_{max} − H(c_x)$.
* The maximum achievable anonymity is given by $H_{max} = \log_2(C_T)$, where $C_T$ is the total number of circuits routed by the network.
* For Stratified and Free Route networks, $H(c_x)$ is estimated by the Markov Chain Monte Carlo method of [20], using the obtained lower bound as our estimation.
* In Cascades, we compute the anonymity $H(c_x)$ of a circuit $c_x$ routed by cascade$_i$ as $H(c_x) = \log_2(C_i)$, where $C_i$ is the number of circuits routed by cascade$_i$ (note that $\sum_i C_i = C_T$).
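As a worked example of these metrics (a hypothetical Python helper of our own), consider a Cascade network:

```python
import math

def anonymity_loss_cascades(circuits_per_cascade):
    """H_loss = H_max - log2(C_i) for a circuit in each cascade, where
    H_max = log2(C_T) and C_T is the network-wide circuit count."""
    c_t = sum(circuits_per_cascade)
    h_max = math.log2(c_t)
    return [h_max - math.log2(c_i) for c_i in circuits_per_cascade]

# Example: 4 parallel cascades routing 3 circuits each (C_T = 12).
# Each circuit hides among only the 3 circuits of its own cascade,
# losing log2(12) - log2(3) = 2 bits relative to H_max.
losses = anonymity_loss_cascades([3, 3, 3, 3])
```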
To present the results for the dummy traffic (padding) overhead, we use the *overhead factor*, which is computed as $\frac{Dum}{Real}$, where $Dum$ is the number of dummy cells sent in the network every second, and $Real$ is the number of real data cells sent over the same time period. Thus, the overhead factor indicates the number of padding cells sent for each real data cell.
### 6.1 Feedback Effects in Free Route Networks
If dependent link padding is implemented in a Free Route network, feedback effects are likely to occur. The feedback effect arises with both DLP and RO-DLP, and it causes dummy traffic to be generated even in the absence of real traffic (a case that leads to an infinite padding overhead factor).
To illustrate this effect, consider two nodes routing two circuits in opposite directions, as shown in **Figure 3**.
![](https://i.imgur.com/4lKG8EJ.png)
One real cell is sent into node $A$, on circuit $Y$. This cell is relayed to node $B$. Node $B$ will, after a delay of $∆$, relay this cell onto the next hop of circuit $Y$, but also generate a padding cell on circuit $X$. When node $A$ receives this cell, it cannot tell that the cell is padding. Thus, $A$ sends it onto the next hop of circuit $X$, and also generates a new dummy cell that is sent back to $B$, repeating the cycle.
> Note that feedback loops may form not only between pairs of nodes, but also in more complex structures involving several nodes.
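The two-node loop can be mimicked by a toy discrete-time simulation (our own sketch, with time in units of the DLP delay $∆$), showing that the in-loop traffic becomes stationary and never dies out:

```python
def feedback_loop_traffic(steps, initial_cells=1):
    """Cells crossing the A<->B loop of Figure 3 in each ∆-sized step.
    `a_to_b` / `b_to_a` count cells in flight on the two directions of
    the loop; every cell a node receives is forwarded out of the loop
    and, per DLP, matched by a padding cell sent back into it."""
    a_to_b, b_to_a = initial_cells, 0
    traffic = []
    for _ in range(steps):
        # B pads circuit X back to A for every cell received on circuit Y,
        # and A does the symmetric thing; neither can recognize padding,
        # so each consumed cell spawns a replacement in the loop.
        a_to_b, b_to_a = b_to_a, a_to_b
        traffic.append(a_to_b + b_to_a)
    return traffic
```

A single real cell entering the loop sustains one cell per step indefinitely: the traffic is stationary at a nonzero level, which is the infinite-overhead case described above.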
![](https://i.imgur.com/iWpr4xG.png)
**Figure 4(a)** compares the total traffic (number of cells per second) in a Stratified and a Free Route network that route the same input traffic (the networks have $N = 12$ nodes, route a total of $C_T = 12$ circuits, and there are 10 cells per circuit within the first 3 seconds).
* Although there is no more input traffic after $t = 4$, we can see that the Free Route network continues to generate dummy traffic that quickly becomes stationary.
* In the Stratified network, traffic stops once the last real cell has left the network.
* This happens at most $3 · ∆$ seconds after it has entered; in our case $∆$ is 1 second, and thus the last cell leaves before $t = 7$.
### 6.2 Comparison of Network Topologies
**Figure 4(b)** shows the tradeoff offered by the four considered topologies in a network with 12 nodes that implement RO-DLP.
* The $x$ axis shows the overhead factor
* i.e., the number of padding cells sent in the network for each real cell, taking into account both the traffic between nodes and the traffic at the edges of the network.
* The $y$ axis shows the anonymity loss with respect to the maximum achievable level in each of the experiments
* Therefore, lower values in the $y$ axis correspond to networks that come closer to providing maximum anonymity to the circuits they are routing.
* The symbol at the center of the plot for each topology represents the median values for anonymity and overhead, the lines indicate the first and third quartiles.
* Although it is not shown in the figure, the overhead of Free Routes tends to infinity when the real traffic is very low.
As we can see in the figure, the overhead is lowest in Cascades, and it increases as more routes are possible in the network (i.e., the next best is Stratified Restricted, then Stratified, and the worst is Free Route). This is rather intuitive, as restricting the routing implies that more circuits are routed (multiplexed) over fewer links, and thus less overhead is generated by the RO-DLP algorithm.
> The fact that Free Routes has a much higher overhead than the other topologies is due to the feedback effects explained in the previous section.
Overall, Stratified topologies provide the best anonymity / overhead tradeoffs, with restrictions in the routing reducing the overhead at the cost of slightly worse anonymity. Cascades are better than Stratified topologies in terms of overhead, but this comes at a high cost in anonymity (a problem that becomes worse as the network grows, as shown in Section 6.4).
### 6.3 Dummy Traffic Overhead with DLP and RO-DLP
The boxplots of **Figure 5(a)** show the intra-network overhead (i.e., only considering links between nodes) of RO-DLP compared to DLP.
* The results were obtained by performing several dozen simulation experiments on networks of 12 nodes, using real traffic as input and having each node route the same amount of traffic as the Tor router from which the data was collected.
![](https://i.imgur.com/G1Cz9OQ.png)
In Cascades, all circuits going through a node are multiplexed over one link leading to the next node, which explains why RO-DLP reduces the intra-network overhead factor from 20 to zero (note that RO-DLP still generates padding cells on the edges, which is not shown, and thus the overall overhead is greater than zero).
The overhead reduction of RO-DLP over DLP is rather significant in Stratified networks too. In Stratified Restricted topologies, the median overhead factor is reduced from 23 to 1.5 (i.e., from sending 23 padding cells for each real cell to just 1.5 padding cells per real cell), and in Stratified from 27 to 8.
There is a very direct relationship between the number of possible routes (i.e., amount of circuit multiplexing) and the reduction in overhead: the fewer the possible routes, the lower the overhead.
RO-DLP does not have a beneficial effect in Free Route topologies though. This is due to two effects.
* First, because Free Routes allow many more possible circuit paths, circuits are more spread over links and thus links multiplex fewer circuits.
* This mitigates to a large extent the benefits of RO-DLP.
* Furthermore, RO-DLP fails to counter the feedback effects explained in Section 6.1, because it only affects the removal of padding cells at the node where they are generated.
### 6.4 Network Scalability: Anonymity and Overhead
In this section we show how anonymity and overhead vary with the size of the network that implements RO-DLP.
**Figure 5(b)** shows the anonymity loss for the four topologies and network sizes of 12, 27, 48 and 300 nodes.
* The $y$ axis represents the anonymity loss in these networks with respect to the maximum achievable $H_{max} = \log_2(C_T)$, where $C_T$ is the total number of circuits routed by the network.
* Note that larger networks route more circuits and thus have a bigger $H_{max}$.
* Stratified, Stratified Restricted, and Free Route topologies scale very well in terms of anonymity.
* Their anonymity remains very close to the maximum when the network size grows.
* In networks of 300 nodes, the anonymity loss for any of those topologies is less than one bit.
Cascade topologies however, have poor scalability in terms of anonymity. In this topology a larger network implies more parallel cascades. Given that cascades are independent of each other, they provide a constant level of anonymity, and thus a bigger anonymity loss as $H_{max}$ increases.
We show in **Figure 6(a)** and **Figure 6(b)** the overhead in intra-network links (which multiplex several circuits) and at the edges of networks of 12, 27 and 48 nodes.
* As expected, overhead is unaffected by the growth of the network in Cascade topologies.
* The overhead factor remains at zero in the links between cascade nodes, as all circuits are multiplexed over a single link; and it remains constant at the network edges, as circuits routed by parallel cascades do not mix with each other:
* bursts in the traffic of one circuit only produce padding in the circuits going through the same cascade.
![](https://i.imgur.com/l0aTEbo.png)
In the other three topologies, we can observe that network size has a negative impact on the overhead factor of the network. This is because a traffic burst in a single circuit produces a burst of dummy traffic in all other circuits, and as more circuits are routed by the network, bursts occur at a higher frequency.
## 7 Applying DLP to Tor
In DLP, edge nodes need not generate padding, but they do need to consume it. Moreover, an adversary should not be able to distinguish padding from normal traffic. If we consider only internal circuits – where both the initiator and destination run the Tor software, though they do not necessarily need to route anyone else’s traffic – DLP is straightforward to implement. Tor already uses internal circuits for connecting to hidden services (where the destination server wishes to hide its identity), and when the destination server is known to be running a Tor node.
It may also be possible to implement DLP without the destination being aware of Tor, provided there is end-to-end encryption and the exit node can inject padding which will be ignored by the destination. This option is more complex, because it requires that the exit node be aware of the encryption protocol.
**Network Topology:**
The original Tor design was a free-route network; however, for efficiency reasons it has since moved to a more complex topology. Currently only a subset of nodes can act as the entry (because they must be fast and highly reliable), and only a subset can act as the exit (because nodes must opt in to allowing exit traffic). To balance load over the network, nodes which can neither be entry nor exit are preferentially selected as middle nodes. Strictly speaking, entry and exit nodes can be selected for the middle position, provided that there is sufficient entry and exit bandwidth, respectively. However, most of the time this is not the case, and in practice Tor has a network topology very close to Stratified.
**Implementation of Padding Modes in Tor:** In order to implement DLP, it is necessary that connections between nodes are encrypted, so that an external adversary cannot tell whether two cells belong to the same circuit or different ones.
* Tor uses TLS for protecting both confidentiality and integrity on links, so complies with this requirement.
With respect to the creation of padding, the Tor protocol does permit dummy cells to be inserted, although the current implementation does not generate padding nor are there any plans to do so. Two types of padding cells are offered:
* link padding
* and circuit padding.
However, neither meets our requirements: the former is detected as padding by nodes and dropped, and the latter can only be injected by the initiator. There is no way for a node to inject a padding cell such that subsequent nodes on the path cannot distinguish it from data cells sent by the initiator.
The fundamental problem for implementing DLP in Tor is that the variant of onion routing adopted uses a stream cipher.
* Each Tor relayed cell contains a circuit ID that identifies which circuit the message pertains to, and an encrypted payload.
* On receiving such a cell, the Tor node checks if a key has been negotiated for the given circuit ID.
* If so, the node uses AES CTR mode to decrypt the cell with the counter being the number of ciphertext blocks seen in that circuit.
* Then, the node verifies whether a 32 bit digest in the cell matches the SHA-1 hash of all valid cells in the circuit.
* Therefore, if a cell is injected by an intermediate node, the counter will be desynchronized, the data corrupted, and the digest check will fail.
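The desynchronization can be demonstrated with a toy counter-mode cipher (a SHA-256-based keystream stands in for AES-CTR, which is not in the Python standard library; one counter value per cell, and all names are illustrative):

```python
import hashlib

def keystream(key: bytes, counter: int, n: int) -> bytes:
    """Toy CTR keystream: hash(key || counter), extended as needed."""
    out = b""
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = b"circuit-key"
cells = [b"cell-one-payload", b"cell-two-payload"]

# The sender encrypts cells with a running counter (one value per cell).
ciphertexts = [xor(c, keystream(key, i, len(c))) for i, c in enumerate(cells)]

# Honest receiver, counter in sync: both cells decrypt correctly.
assert [xor(ct, keystream(key, i, len(ct)))
        for i, ct in enumerate(ciphertexts)] == cells

# An intermediate node injects a cell between the two real cells.
injected = [ciphertexts[0], b"padding-padding!", ciphertexts[1]]

# The receiver decrypts in order; after the injected cell its counter is
# off by one, so the second real cell decrypts to garbage (and its digest
# check would fail).
decrypted = [xor(ct, keystream(key, i, len(ct)))
             for i, ct in enumerate(injected)]
assert decrypted[0] == cells[0]
assert decrypted[2] != cells[1]  # counter desynchronized
```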
To resolve this problem, a simple addition to the Tor protocol would be another type of link padding cell which triggers a new padding cell to be emitted for the same circuit on the output link. In this way, each hop could add their own padding, which would be maintained all the way to the last hop. As links are protected using TLS, an adversary cannot distinguish padding cells from real cells, and so the goal of the padding would be maintained.
This approach would resist the external adversary considered in the anonymity analysis of RO-DLP. However, in a more realistic scenario the adversary may control some routers, and any corrupt node on the path would be able to trivially tell which cells are padding. Circuit padding cannot be used to resolve this weakness because intermediate nodes cannot inject new cells. However, if instead of CTR mode, a per-cell IV was used, this problem would not exist – i.e., padding would not affect the decryption of other cells.
* Intermediate nodes do not know the key shared by the sender and other nodes in the path, hence the padding will fail the integrity check at the final hop and be discarded.
* A node would therefore be able to add padding cells with a random IV, and intermediate nodes will be unable to distinguish them from real data cells.
* A downside of this approach is the overhead of an IV per cell.
* Nevertheless, it has the advantage that there is no longer any need for a reliable transport protocol between nodes, provided there is an end-to-end error recovery mechanism.
Moving from TCP to UDP for node-to-node communication has been shown to offer significant performance benefits, especially under congestion [14].
## 8 Conclusions and Future Work
Dependent link padding prevents timing analysis in low-latency anonymity networks while minimizing the overhead. However, the impact of complex topologies on the performance of this technique had not yet been assessed. In this work we have analyzed anonymity / overhead trade-offs in low-latency anonymity systems that implement dependent link padding, and compared three topologies: Cascades, Free Routes and Stratified networks.
> In this work we have assumed that all the nodes in the network are trustworthy. This is essential for RO-DLP to achieve the same level of protection against an external adversary, when compared to previous dependent link padding proposals. If the adversary has control over some of the nodes in the network [7], she would see partially padded circuits, and potentially correlate traffic based on timing analysis.