# Challenges in deploying low-latency anonymity (DRAFT)
###### tags: `Tag(HashCloak - Validator Privacy)`
Authors: Roger Dingledine, Nick Mathewson, and Paul Syverson
Paper: https://www.onion-router.net/Publications/challenges.pdf
### Table of Contents
[toc]
:::info
>Abstract: There are many unexpected or unexpectedly difficult obstacles to deploying anonymous communications. Drawing on our experiences deploying Tor (the second-generation onion routing network), we describe social challenges and technical issues that must be faced in building, deploying, and sustaining a scalable, distributed, low-latency anonymity network.
:::
### Introduction
> Anonymous communication is full of surprises.
This paper discusses some unexpected challenges
arising from our experiences deploying Tor, a low-latency general-purpose anonymous communication system. We will discuss some of the difficulties we have experienced and how we have met them (or how we plan to meet them, if we know). We also discuss some less troublesome open problems that we must nevertheless eventually address.
* Tor is an overlay network for anonymizing TCP streams over the Internet.
* It addresses limitations in earlier Onion Routing designs by adding perfect forward secrecy, congestion control, directory servers, data integrity, configurable exit policies, and location-hidden services using rendezvous points.
* Tor works on the real-world Internet, requires no special privileges or kernel modifications, requires little synchronization or coordination between nodes, and provides a reasonable trade-off between anonymity, usability, and efficiency.
* We deployed the public Tor network in October 2003.
* It has grown to over a hundred volunteer-operated nodes, carrying as much as 80 megabits per second of average traffic.
* Tor’s research strategy has focused on deploying a network to as many users as possible.
### Background
> Basic overview of the Tor design and its properties, and compare Tor to other low-latency anonymity designs.
#### Tor, threat models, and distributed trust
* Tor provides forward privacy
* Provides location-hidden services
Tor provides these protections even when a portion of its infrastructure is compromised.
To connect to a remote server via Tor, the client software learns a signed list of Tor nodes from one of several central directory servers, and incrementally creates a private pathway or circuit of encrypted connections through authenticated Tor nodes on the network, negotiating a separate set of encryption keys for each hop along the circuit.
The circuit is extended one node at a time, and each node along the way knows only the immediately previous and following nodes in the circuit, so no individual Tor node knows the complete path that each fixed-sized data packet (or cell) will take.
Later requests use a new circuit, to complicate long-term linkability between different actions by a single user.
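To make the layering concrete, here is a minimal Python sketch of the "onion" structure of a cell. A toy XOR keystream derived from SHA-256 stands in for Tor's actual counter-mode AES, and key negotiation, integrity checking, and cell headers are all omitted; only the layered encrypt/peel flow is illustrated.
```python
import hashlib
import os

CELL_SIZE = 512  # Tor relays fixed-size cells

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode; a stand-in for the
    counter-mode AES that Tor actually uses."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def layer(cell: bytes, key: bytes, nonce: bytes) -> bytes:
    """XOR one hop's keystream onto the cell. Applying it a second time
    with the same key removes the layer."""
    ks = keystream(key, nonce, len(cell))
    return bytes(a ^ b for a, b in zip(cell, ks))

# The client has negotiated a separate key with each of three hops.
hop_keys = [os.urandom(16) for _ in range(3)]  # entry, middle, exit
nonce = os.urandom(8)
cell = b"GET / HTTP/1.0".ljust(CELL_SIZE, b"\x00")

# Client: add one layer per hop, with the exit node's layer innermost.
onion = cell
for key in reversed(hop_keys):
    onion = layer(onion, key, nonce)

# Network: each relay peels exactly one layer and forwards the rest;
# only the exit node recovers the plaintext stream data.
for key in hop_keys:
    onion = layer(onion, key, nonce)

assert onion == cell
```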
Tor also helps servers hide their locations while providing services such as web publishing or instant messaging.
Tor attempts to anonymize the transport layer, not the application layer. This approach is useful for applications such as SSH where authenticated communication is desired. However, when anonymity from those with whom we communicate is desired, application protocols that include personally identifying information need additional application-level scrubbing proxies, such as Privoxy for HTTP.
Tor does not relay arbitrary IP packets; it only anonymizes TCP streams and DNS requests.
Most node operators do not want to allow arbitrary TCP traffic. To address this, Tor provides exit policies so each exit node can block the IP addresses and ports it is unwilling to allow. Tor nodes advertise their exit policies to the directory servers, so that clients can tell which nodes will support their connections.
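As a concrete illustration, here is a small exit-policy matcher. The rule syntax (accept/reject lines with CIDR prefixes and port ranges, first match wins) is modeled loosely on Tor's exit-policy lines; the real descriptor grammar differs in detail.
```python
import ipaddress

# Hypothetical exit policy: first matching rule wins.
POLICY = [
    ("reject", "0.0.0.0/0", (25, 25)),      # no SMTP anywhere (anti-spam)
    ("reject", "10.0.0.0/8", (1, 65535)),   # no private address space
    ("accept", "0.0.0.0/0", (80, 80)),
    ("accept", "0.0.0.0/0", (443, 443)),
    ("reject", "0.0.0.0/0", (1, 65535)),    # default-deny everything else
]

def exit_allows(policy, ip: str, port: int) -> bool:
    """Walk the rules in order, mirroring how exit policies are read."""
    addr = ipaddress.ip_address(ip)
    for action, prefix, (lo, hi) in policy:
        if addr in ipaddress.ip_network(prefix) and lo <= port <= hi:
            return action == "accept"
    return False  # unreachable with a default rule, but safe

assert exit_allows(POLICY, "93.184.216.34", 443)
assert not exit_allows(POLICY, "93.184.216.34", 25)
```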
**Threat models and design philosophy**
> The ideal Tor network would be practical, useful
and anonymous.
Tor has a weaker threat model than many designs in the literature. In particular, because we support interactive communications without impractically expensive padding, we fall prey to a variety of intra-network and end-to-end anonymity-breaking attacks.
Tor does not attempt to defend against a global observer. In general, an attacker who can measure both ends of a connection through the Tor network can correlate the timing and volume of data on that connection as it enters and leaves the network, and so link communication partners.
> Known solutions to this attack would seem to require introducing a prohibitive degree of traffic padding between the user and the network, or introducing an unacceptable degree of latency. Also, it is not clear that these methods would work at all against a minimally active adversary who could introduce timing patterns or additional traffic.
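To see why this correlation attack is so cheap, consider the sketch below: an observer who merely counts cells per time window at the two ends of the network can link flows with a simple statistic. The window length, traffic counts, and threshold are all illustrative, not from the paper.
```python
# End-to-end traffic correlation: match an entry-side volume series
# against candidate exit-side series using Pearson correlation.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

# Cells observed per one-second window on one entry-side flow...
entry_flow = [12, 0, 7, 30, 2, 0, 18, 5]
# ...and on two candidate exit-side flows.
exit_a = [11, 1, 8, 28, 3, 0, 17, 6]   # same traffic, slightly perturbed
exit_b = [4, 9, 2, 5, 20, 1, 3, 12]    # unrelated flow

# The matching flow stands out even with noise added by the network.
assert pearson(entry_flow, exit_a) > 0.9 > pearson(entry_flow, exit_b)
```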
If the user continues to build random circuits over time, an adversary is pretty certain to see a statistical sample of the user’s traffic, and thereby can build an increasingly accurate profile of her behavior.
An adversary who controls a popular service outside the Tor network can be certain to observe all connections to that service.
* An attacker who can catalog data volumes of popular responder destinations (say, websites with consistent data volumes) may not need to observe both ends of a stream to learn source-destination links for those responders.
* Similarly, latencies of going through various routes can be cataloged to connect endpoints.
> It has not yet been shown whether these attacks will succeed or fail in the presence of the variability and volume quantization introduced by the Tor network, but it seems likely that these factors will at best delay rather than halt the attacks in the cases where they succeed.
* Clogging attack: the throughput on a circuit slows down observably when an adversary clogs the right nodes with his own traffic.
> To determine the nodes in a circuit this attack requires the ability to continuously monitor the traffic exiting the network on a circuit that is up long enough to probe all network nodes in binary fashion.
* An outside attacker can actively trace a circuit through the Tor network by observing changes in the latency of his own traffic sent through various Tor nodes.
> This can be done simultaneously at multiple nodes; however, like clogging, this attack only reveals the Tor nodes in the circuit, not initiator and responder addresses, so it is still necessary to discover the endpoints to complete an effective attack.
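The following toy simulation illustrates the clogging idea under stated assumptions: circuit latency is modeled as a constant plus Gaussian noise, and clogging a node on the victim's circuit adds a fixed delay to that circuit. None of the numbers are measurements; they only show the shape of the attack.
```python
import random

random.seed(1)
nodes = [f"node{i}" for i in range(10)]
victim_circuit = {"node2", "node5", "node7"}  # unknown to the attacker

def victim_latency(clogged=None):
    """Latency the attacker observes on the victim's circuit (e.g. by
    watching its exit traffic). Clogging a node on the circuit slows
    the whole circuit."""
    latency = 0.150 + random.gauss(0, 0.005)
    if clogged in victim_circuit:
        latency += 0.040
    return latency

def identify_circuit():
    baseline = sum(victim_latency() for _ in range(20)) / 20
    # Clog one candidate node at a time and watch for a latency spike.
    return {n for n in nodes if victim_latency(clogged=n) > baseline + 0.020}

assert identify_circuit() == victim_circuit
```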
**Distributed trust**
> In practice Tor’s threat model is based on dispersal and diversity.
Our defense lies in having a diverse enough set of nodes to prevent most real-world adversaries from being in the right places to attack users, by distributing each transaction over several nodes in the network. This “distributed trust” approach means the Tor network can be safely operated and used by a wide variety of mutually distrustful users, providing sustainability and security.
For maximum protection, the Tor design includes an enclave approach that lets data be encrypted (and authenticated) end-to-end, so high-sensitivity users can be sure it hasn’t been read or modified. This even works for Internet services that don’t have built-in encryption and authentication, such as unencrypted HTTP or chat, and it requires no modification of those services.
#### Related work
The Freedom network from Zero-Knowledge Systems [3] was even more flexible than Tor in transporting arbitrary IP packets, and also supported pseudonymity in addition to anonymity; but it had a different approach to sustainability (collecting money from users and paying ISPs to run nodes), and was eventually shut down due to financial load.
### Social Challenges
> In particular, the Tor project’s image with respect to its users and the rest of the Internet impacts the security it can provide. With this image issue in mind, this section discusses the Tor user base and Tor’s interaction with other services on the Internet.
#### Communicating security
> Usability for anonymity systems contributes to their security, because usability affects the possible anonymity set.
Users should choose which anonymity system to use based in part on how usable and secure others will find it, in order to get the protection of a larger anonymity set. Thus we might supplement the adage “usability is a security parameter” with a new one: “perceived usability is a security parameter.” From here we can better understand the effects of publicity on security: the more convincing your advertising, the more likely people will believe you have users, and thus the more users you will attract. Perversely, over-hyped systems (if they are not too broken) may be a better choice than modestly promoted ones, if the hype attracts more users.
#### Reputability and perceived social value
> Another factor impacting the network’s security is its reputability: the perception of its social value based on its current user base.
If Alice is the only user who has ever downloaded the software, it might be socially accepted, but she’s not getting much anonymity. Add a thousand activists, and she’s anonymous, but everyone thinks she’s an activist too. Add a thousand diverse citizens (cancer survivors, privacy enthusiasts, and so on) and now she’s harder to profile.
#### Sustainability and incentives
> One of the unsolved problems in low-latency anonymity designs is how to keep the nodes running.
People and organizations who use Tor for anonymity depend on the continued existence of the Tor network to do so; running a node helps to keep the network operational.
Since Tor is run by volunteers, the most crucial software usability issue is usability by operators: when an operator leaves, the network becomes less usable by everybody. To keep operators pleased, we must try to keep Tor’s resource and administrative demands as low as possible.
Because of ISP billing structures, many Tor operators have underused capacity that they are willing to donate to the network, at no additional monetary cost to them.
#### Bandwidth and file-sharing
> Once users have configured their applications to work with Tor, the largest remaining usability issue is performance.
Users begin to suffer when websites “feel slow.” Clients currently try to build their connections through nodes that they guess will have enough bandwidth. But even if capacity is allocated optimally, it seems unlikely that the current network architecture will have enough capacity to provide every user with as much bandwidth as she would receive if she weren’t using Tor, unless far more nodes join the network.
Much of Tor’s recent bandwidth difficulties have come from file-sharing applications. These applications provide two challenges to any anonymizing network: their intensive bandwidth requirement, and the degree to which they are associated (correctly or not) with copyright infringement.
High-bandwidth protocols can make the network unresponsive, but tend to be somewhat self-correcting, as lack of bandwidth drives away users who need it.
Tor will likely remain attractive for limited use in file-sharing protocols that have separate control and data channels.
#### Tor and blacklists
> It was long expected that, alongside legitimate users, Tor would also attract troublemakers who exploit Tor to abuse services on the Internet with vandalism, rude mail, and so on.
Our initial answer to this situation was to use “exit policies” to allow individual Tor nodes to block access to specific IP/port ranges. This approach aims to make operators more willing to run Tor by allowing them to prevent their nodes from being used for abusing particular services. For example, all Tor nodes currently block SMTP (port 25), to avoid being used for spam.
Some services rely on blacklists instead: by blocking IPs used by Tor nodes, open proxies, and service abusers, these systems hope to make ongoing abuse difficult. Although the approach is imperfect, it works tolerably well for them in practice.
Services use IP blocking because pseudonymous identities are otherwise too cheap: to deter abuse, acquiring a new identity needs to carry a significant switching cost in resources or human time. Some popular webmail applications impose such a cost with Reverse Turing Tests, but this step may not deter all abusers. Freedom used blind signatures to limit the number of pseudonyms for each paying account, but Tor has neither the ability nor the desire to collect payment.
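For reference, a textbook RSA blind signature (the primitive Freedom reportedly used) can be sketched in a few lines. The parameters below are toy-sized and the message is unpadded; this only shows how a signer can certify a pseudonym token without seeing it, or linking it to the signing session.
```python
import math
import random

# Toy RSA keypair (tiny primes for illustration only; real systems
# use keys of 2048 bits or more).
p, q = 61, 53
N = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

def blind(m, N, e):
    """User: multiply the token by r^e so the signer can't read it."""
    while True:
        r = random.randrange(2, N)
        if math.gcd(r, N) == 1:
            return (m * pow(r, e, N)) % N, r

def sign_blinded(mb, N, d):
    """Signer: signs the blinded value (e.g. once per paying account)."""
    return pow(mb, d, N)

def unblind(sb, r, N):
    """User: divide out r to recover a valid signature on m."""
    return (sb * pow(r, -1, N)) % N

m = 99  # pseudonym token (toy message)
mb, r = blind(m, N, e)
s = unblind(sign_blinded(mb, N, d), r, N)
assert pow(s, e, N) == m  # verifies, yet the signer never saw m
```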
> The Freenode IRC network had a problem with a coordinated group of abusers joining channels and subtly taking over the conversation; but when they labelled all users coming from Tor IPs as “anonymous users,” removing the ability of the abusers to blend in, the abuse stopped.
### Design choices
> Tor also faces some design trade-offs that must be investigated as the network develops.
#### Transporting the stream vs transporting the packets
> Tor transports streams; it does not tunnel packets.
It has often been suggested that like the old Freedom network, Tor should "obviously" anonymize IP traffic at the IP layer. Before this could be done, many issues would need to be resolved:
1. IP packets reveal OS characteristics. We would still need to do IP-level packet normalization, to stop things like TCP fingerprinting attacks. This is unlikely to be a trivial task, given the diversity and complexity of TCP stacks.
2. Application-level streams still need scrubbing. We still need Tor to be easy to integrate with user-level application-specific proxies such as Privoxy. So it’s not just a matter of capturing packets and anonymizing them at the IP layer.
3. Certain protocols will still leak information. For example, we must rewrite DNS requests so they are delivered to an unlinkable DNS server rather than the DNS server at a user’s ISP; thus, we must understand the protocols we are transporting.
4. The crypto is unspecified. First we need a block-level encryption approach that can provide security despite packet loss and out-of-order delivery. Freedom allegedly had one, but it was never publicly specified. Also, TLS over UDP is not yet implemented or specified, though some early work has begun.
5. We’ll still need to tune network parameters. Since the above encryption system will likely need sequence numbers (and maybe more) to do replay detection, handle duplicate frames, and so on, we will be reimplementing a subset of TCP anyway—a notoriously tricky path.
6. Exit policies for arbitrary IP packets mean building a secure IDS. Our node operators tell us that exit policies are one of the main reasons they’re willing to run Tor. Adding an Intrusion Detection System to handle exit policies would increase the security complexity of Tor, and would likely not work anyway, as evidenced by the entire field of IDS and counter-IDS papers. Many potential abuse issues are resolved by the fact that Tor only transports valid TCP streams (as opposed to arbitrary IP including malformed packets and IP floods), so exit policies become even more important as we become able to transport IP packets. We also need to compactly describe exit policies so clients can predict which nodes will allow which packets to exit.
7. The Tor-internal name spaces would need to be redesigned. We support hidden service .onion addresses (and other special addresses, like .exit which lets the user request a particular exit node), by intercepting the addresses when they are passed to the Tor client. Doing so at the IP level would require a more complex interface between Tor and the local DNS resolver. A sketch of this interception follows the list.
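A toy dispatcher can make item 7 concrete: the client must intercept special names before any DNS query is made. The function name and return conventions below are hypothetical, purely for illustration.
```python
# The Tor client sees application hostnames before any DNS resolution,
# so special top-level names like .onion and .exit never leak to a resolver.

def route_request(hostname: str, port: int):
    if hostname.endswith(".onion"):
        # Rendezvous with a hidden service; never touches DNS.
        return ("hidden-service", hostname, port)
    if hostname.endswith(".exit"):
        # "host.nodename.exit" requests a particular exit node.
        host, node = hostname[: -len(".exit")].rsplit(".", 1)
        return ("circuit-via-exit", node, host, port)
    # Ordinary names are resolved remotely at the exit node, not at the
    # user's ISP resolver (item 3's concern).
    return ("resolve-at-exit", hostname, port)

assert route_request("duskgytldkxiuqc6.onion", 80)[0] == "hidden-service"
assert route_request("example.com.trusted.exit", 443)[1] == "trusted"
```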
#### Mid-latency
> Some users need to resist traffic correlation attacks.
Can we improve Tor's resistance without losing too much usability? We need to learn whether we can trade a small increase in latency for a large anonymity increase, or if we'd end up trading a lot of latency for only a minimal security gain. A trade-off might be worthwhile even if we could only protect certain use cases, such as infrequent short-duration transactions.

We might adapt existing mix-network batching techniques to a lower-latency mix network, where the messages are batches of cells in temporally clustered connections. These large fixed-size batches can also help resist volume signature attacks. We could also experiment with traffic shaping to get a good balance of throughput and security.

We must keep usability in mind too. How much can latency increase before we drive users away? We've already been forced to increase latency slightly, as our growing network incorporates more DSL and cable-modem nodes and more nodes on distant continents. Perhaps we can harness this increased latency to improve anonymity rather than just reduce usability. Further, if we let clients label certain circuits as mid-latency as they are constructed, we could handle both types of traffic on the same network, giving users a choice between speed and security, and giving researchers a chance to experiment with parameters to improve the quality of those choices.
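One way to picture the mid-latency idea is a batching relay: cells are queued, padded with dummies up to a fixed batch size, shuffled, and released on a timer. The batch size and flush interval below are arbitrary illustrations, not parameters from the paper.
```python
import random
from collections import deque

BATCH_SIZE = 16        # cells per batch (hypothetical)
FLUSH_INTERVAL = 2.0   # seconds between batch releases (hypothetical)
DUMMY = b"\x00" * 512  # padding cell, same size as a real cell

queue = deque()

def enqueue_cell(cell: bytes) -> None:
    queue.append(cell)

def flush_batch(send) -> None:
    """Release exactly BATCH_SIZE cells: take what is queued, pad with
    dummies, and shuffle so batch contents reveal neither arrival order
    nor true volume."""
    batch = [queue.popleft() for _ in range(min(len(queue), BATCH_SIZE))]
    batch += [DUMMY] * (BATCH_SIZE - len(batch))
    random.shuffle(batch)
    for cell in batch:
        send(cell)

# Usage: a relay would call flush_batch(transport_send) every
# FLUSH_INTERVAL seconds, where transport_send is its cell-forwarding
# function (hypothetical here).
```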
> Personal Note: Do all the validators in the beacon chain need to be low-latency validators? Could dark/private validators have variable/different latency requirements?
#### Enclaves and helper nodes
> It has long been thought that users can improve their anonymity by running their own node, and using it in an enclave configuration, where all their circuits begin at the node under their control.
* Running Tor clients or servers at the enclave perimeter is useful when policy or other requirements prevent individual machines within the enclave from running Tor clients.
* Of course, Tor’s default path length of three is insufficient for these enclaves, since the entry and/or exit themselves are sensitive. Tor thus increments path length by one for each sensitive endpoint in the circuit.
> Simply adding to the path length, or using a helper node, may not protect an enclave node.
Using randomized path lengths may help some, since the attacker will never be certain he has identified all nodes in the path unless he probes the entire network, but as long as the network remains small this attack will still be feasible.
The literature does not describe how to choose helpers from a list of nodes that changes over time. If Alice is forced to choose a new entry helper every d days and c of the n nodes are bad, she can expect to choose a compromised node around every dn/c days. Statistically over time this approach only helps if she is better at choosing honest helper nodes than at choosing honest nodes. Worse, an attacker with the ability to DoS nodes could force users to switch helper nodes more frequently, or remove other candidate helpers.
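A quick simulation (with made-up numbers) confirms the dn/c estimate: each helper choice is bad with probability c/n, so the number of choices until a bad one is geometric with mean n/c, and each choice lasts d days.
```python
import random

random.seed(0)
n, c, d = 100, 10, 30  # illustrative numbers, not from the paper

def days_until_bad_helper():
    days = 0
    while random.randrange(n) >= c:  # indices 0..c-1 model bad nodes
        days += d                    # another d days on an honest helper
    return days + d                  # the bad helper's own period

trials = [days_until_bad_helper() for _ in range(10_000)]
print(sum(trials) / len(trials), "vs predicted", d * n / c)  # ~300 vs 300
```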
#### Location-hidden services
> Tor’s rendezvous points let users provide TCP services to other Tor users without revealing the service’s location.
First, our implementation of hidden services seems less hidden than we’d like, since they build a different rendezvous circuit for each user, and an external adversary can induce them to produce traffic. This insecurity means that they may not be suitable as a building block for Free Haven or other anonymous publishing systems that aim to provide long-term security, though helper nodes would seem to help.
Hot-swap hidden services, where more than one location can provide the service and loss of any one location does not imply a change in service, would help foil intersection and observation attacks where an adversary monitors availability of a hidden service and also monitors whether certain users or servers are online.
#### Location diversity and ISP-class adversaries
> Anonymity networks have long relied on diversity of node location for protection against attacks: typically, an adversary who can observe a larger fraction of the network can launch a more effective attack.
1. One way to achieve dispersal involves growing the network so a given adversary sees less.
2. Alternately, we can arrange the topology so traffic can enter or exit at many places
> for example, by using a free-route network like Tor
3. Lastly, we can use distributed trust to spread each transaction over multiple jurisdictions.
#### The Anti-censorship problem
> Citizens in a variety of countries, such as most recently China and Iran, are blocked from accessing various sites outside their country.
Even though Tor wasn’t designed with ubiquitous access to the network in mind, thousands of users across the world are now using it for exactly this purpose.
Anti-censorship networks hoping to bridge country-level blocks face a variety of challenges.
One of these is that they need to find enough exit nodes: servers on the 'free' side that are willing to relay traffic from users to their final destinations.
> Anonymizing networks like Tor are well-suited to this task since we have already gathered a set of exit nodes that are willing to tolerate some political heat.
The other main challenge is to distribute a list of reachable relays to the users inside the country, and give them software to use those relays, without letting the censors also enumerate this list and block each relay.
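One hedge against enumeration, sketched below, is to map each requester deterministically to a small bucket of relays, so a censor needs many distinct identities to harvest the full list. This bucketing scheme is a plausible illustration, not Tor's actual bridge-distribution design; the relay addresses are examples.
```python
import hashlib

RELAYS = [f"198.51.100.{i}:443" for i in range(1, 101)]  # example addresses
BUCKETS = 20
PER_REQUEST = 3

def relays_for(requester_id: str):
    """Hash the requester identity into one of BUCKETS stable buckets and
    hand back a few relays from that bucket only."""
    bucket = int.from_bytes(
        hashlib.sha256(requester_id.encode()).digest()[:4], "big"
    ) % BUCKETS
    members = [r for i, r in enumerate(RELAYS) if i % BUCKETS == bucket]
    return members[:PER_REQUEST]

# The same requester always sees the same few relays; enumerating all of
# them requires at least BUCKETS distinct identities (more in practice).
assert relays_for("alice@example.com") == relays_for("alice@example.com")
```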
### Scaling
> Tor is running today with hundreds of nodes and tens of thousands of users, but it will certainly not scale to millions.
Scaling Tor involves four main challenges.
1. First, to get a large set of nodes, we must address incentives for users to carry traffic for others.
2. Next is safe node discovery, both while bootstrapping (Tor clients must robustly find an initial node list) and later (Tor clients must learn about a fair sample of honest nodes and not let the adversary control circuits).
3. We must also detect and handle node speed and reliability as the network becomes increasingly heterogeneous: since the speed and reliability of a circuit is limited by its worst link, we must learn to track and predict performance.
4. Finally, we must stop assuming that all points on the network can connect to all other points.
#### Incentives by Design
> There are three behaviors we need to encourage for each Tor node: relaying traffic; providing good throughput and reliability while doing it; and allowing traffic to exit the network from that node.
We encourage these behaviors through indirect incentives: that is, by designing the system and educating users in such a way that users with certain goals will choose to relay traffic.
* One main incentive for running a Tor node is social: volunteers altruistically donate their bandwidth and time.
* We further explain to users that they can get deniability for any traffic emerging from the same address as a Tor exit node, and they can use their own Tor node as an entry or exit point with confidence that it’s not run by an adversary.
* Further, users may run a node simply because they need such a network to be persistently available and usable, and the value of supporting this exceeds any countervailing costs.
* Finally, we can encourage operators by improving the usability and feature set of the software: rate limiting support and easy packaging decrease the hassle of maintaining a node, and our configurable exit policies allow each operator to advertise a policy describing the hosts and ports to which he feels comfortable connecting.
A more promising option is to use a tit-for-tat incentive scheme, where nodes provide better service to nodes that have provided good service for them. Unfortunately, such an approach introduces new anonymity problems. There are many surprising ways for nodes to game the incentive and reputation system to undermine anonymity—such systems are typically designed to encourage fairness in storage or bandwidth usage, not fairness of provided anonymity. An adversary can attract more traffic by performing well or can target individual users by selectively performing, to undermine their anonymity. Typically a user who chooses evenly from all nodes is most resistant to an adversary targeting him, but that approach hampers the efficient use of heterogeneous nodes.
A possible solution is a simplified approach to the tit-for-tat incentive scheme, based on two rules (sketched after the list):
1. Each node should measure the service it receives from adjacent nodes, and provide service relative to the received service, but
2. When a node is making decisions that affect its own security (such as building a circuit for its own application connections), it should choose evenly from a sufficiently large set of nodes that meet some minimum service threshold.
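A minimal sketch of these two rules; all thresholds and service measurements below are hypothetical.
```python
import random

# Service each adjacent peer has provided us, as measured locally.
received = {"A": 9.0, "B": 4.0, "C": 0.2, "D": 6.0}
MIN_SERVICE = 1.0  # minimum service threshold for rule 2 (hypothetical)

def relay_priority(peer: str) -> float:
    """Rule 1: relay effort for a peer scales with what it provided us."""
    return received.get(peer, 0.0)

def pick_circuit_nodes(k: int):
    """Rule 2: for our *own* circuits, choose uniformly among all peers
    meeting the threshold; never weight by reputation here, so path
    choice leaks nothing about service history."""
    eligible = [p for p, s in received.items() if s >= MIN_SERVICE]
    return random.sample(eligible, k)

assert relay_priority("A") > relay_priority("B")   # rule 1 in action
print(sorted(pick_circuit_nodes(2)))  # uniform over {'A','B','D'}; never 'C'
```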
#### Trust and discovery
> The published Tor design is deliberately simplistic in how new nodes are authorized and how clients are informed about Tor nodes and their status.
* All nodes periodically upload a signed description of their locations, keys, and capabilities to each of several well-known directory servers.
* These directory servers construct a signed summary of all known Tor nodes (a "directory"), and a signed statement of which nodes they believe to be currently operational (a "network status").
* Clients periodically download a directory to learn the latest nodes and keys, and more frequently download a network status to learn which nodes are likely to be running.
* Tor nodes also operate as directory caches, to lighten the bandwidth load on the directory servers (a client-side sketch of this flow follows).
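Here is a client-side sketch of that flow. Real Tor directories are RSA-signed documents with a specific format; an HMAC with a placeholder key stands in for signature verification here, purely to show the verify-then-filter sequence.
```python
import hashlib
import hmac
import json

DIR_KEY = b"example-directory-authority-key"  # placeholder shared key

def sign(document: bytes) -> bytes:
    # Stand-in for the directory server's real signature.
    return hmac.new(DIR_KEY, document, hashlib.sha256).digest()

def verify(document: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(sign(document), tag)

def usable_nodes(directory: dict, network_status: dict) -> list:
    """Keep only nodes the directory lists and the status marks running."""
    return [n for n in directory["nodes"]
            if network_status.get(n["name"]) == "running"]

# Directory servers publish a signed directory and a signed network status.
directory = {"nodes": [{"name": "nodeA"}, {"name": "nodeB"}]}
doc = json.dumps(directory, sort_keys=True).encode()
tag = sign(doc)

# The client verifies the signature before trusting the node list, then
# filters by the (more frequently fetched) network status.
assert verify(doc, tag)
status = {"nodeA": "running", "nodeB": "hibernating"}
assert [n["name"] for n in usable_nodes(directory, status)] == ["nodeA"]
```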
To prevent Sybil attacks, this design requires the directory server operators to manually approve new nodes. Unapproved nodes are included in the directory, but clients do not use them at the start or end of their circuits.
> This procedure may prevent trivial automated Sybil attacks, but will do little against a clever and determined attacker.
There are a number of flaws in this system that need to be addressed as we move forward.
1. First, each directory server represents an independent point of failure: any compromised directory server could start recommending only compromised nodes.
2. Second, as more nodes join the network, directories become infeasibly large, and downloading the list of nodes becomes burdensome.
3. Third, the validation scheme may do as much harm as it does good. It does not prevent clever attackers from mounting Sybil attacks, and it may deter node operators from joining the network if they expect the validation process to be difficult, or they do not share any languages in common with the directory server operators.
> Ultimately, of course, we cannot escape the problem of a first introducer: since most users will run Tor in whatever configuration the software ships with, the Tor distribution itself will remain a single point of failure so long as it includes the seed keys for directory servers, a list of directory servers, or any other means to learn which nodes are on the network. But omitting this information from the Tor distribution would only delegate the trust problem to each individual user. A well publicized, widely available, authoritatively and independently endorsed and signed list of initial directory servers and their keys is a possible solution. But, setting that up properly is itself a large bootstrapping task.
#### Measuring performance and capacity
> One of the paradoxes with engineering an anonymity network is that we’d like to learn as much as we can about how traffic flows so we can improve the network, but we want to prevent others from learning how traffic flows in order to trace users’ connections through the network.
* Currently, nodes try to deduce their own available bandwidth (based on how much traffic they have been able to transfer recently) and include this information in the descriptors they upload to the directory.
* Clients choose servers weighted by their bandwidth, neglecting really slow servers and capping the influence of really fast ones.
While it seems plausible that bandwidth data alone is not enough to reveal sender-recipient connections under most circumstances, it could certainly reveal the path taken by large traffic flows under low-usage circumstances.
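The weighting scheme in the bullets above can be sketched directly; the floor and cap constants here are illustrative, not Tor's actual values.
```python
import random

MIN_BW = 20      # KB/s: "really slow" nodes are neglected entirely
MAX_BW = 1_500   # KB/s: influence of "really fast" nodes is capped

def pick_node(advertised: dict) -> str:
    """Choose a node with probability proportional to its advertised
    bandwidth, after dropping slow nodes and capping fast ones."""
    candidates = {n: min(bw, MAX_BW)
                  for n, bw in advertised.items() if bw >= MIN_BW}
    names = list(candidates)
    return random.choices(names, weights=[candidates[n] for n in names])[0]

advertised = {"slow": 5, "mid": 400, "fast": 900, "huge": 9_000}
picks = [pick_node(advertised) for _ in range(10_000)]
assert "slow" not in picks                      # neglected entirely
assert picks.count("huge") < 0.6 * len(picks)   # influence capped
```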
#### Non-clique topologies
> Tor’s comparatively weak threat model may allow easier scaling than other designs.
High-latency mix networks need to avoid partitioning attacks, where network splits let an attacker distinguish users in different partitions.
Since Tor assumes the adversary cannot cheaply observe nodes at will, a network split may not decrease protection much. Thus, one option when the scale of a Tor network exceeds some size is simply to split it.
> Personal Note: Sharding?
Nodes could be allocated into partitions while hampering collaborating hostile nodes from taking over a single partition. Clients could switch between networks, even on a per-circuit basis.
More conservatively, we can try to scale a single Tor network. Likely problems with adding more servers to a single Tor network include an explosion in the number of sockets needed on each server as more servers join, and increased coordination overhead to keep each user's view of the network consistent.
As we grow, we will also have more instances of servers that can’t reach each other simply due to Internet topology or routing problems.
There are many open questions: how to distribute connectivity information (presumably nodes will learn about the central nodes when they download Tor), whether central nodes will need to function as a ‘backbone’, and so on. As above, this could reduce the amount of anonymity available from a mix-net, but for a low-latency network where anonymity derives largely from the edges, it may be feasible.
### The Future
> Tor is the largest and most diverse low-latency anonymity network available, but we are still in the beginning stages of deployment.
* For applications where it is desirable to keep identifying information out of application traffic, someone must build more and better protocol-aware proxies that are usable by ordinary people.
* We need to gain a reputation for social good, and learn how to coexist with the variety of Internet services and their established authentication mechanisms. We can't just keep escalating the blacklist standoff forever.