owned this note
owned this note
Published
Linked with GitHub
# "Introducing MorphMix: Peer-to-Peer based Anonymous Internet Usage with Collusion Detection"
###### tags: `Tag(HashCloak - Validator Privacy)`
Author(s): Marc Rennhard, Bernhard Plattner
Paper: https://www.freehaven.net/anonbib/cache/morphmix:wpes2002.pdf
### Table of Contents
[toc]
:::info
>Abstract: Traditional mix-based systems are composed of a small set of static, well known, and highly reliable mixes. To resist traffic analysis attacks at a mix, cover traffic must be used, which results in significant bandwidth overhead. End-to-end traffic analysis attacks are even more difficult to counter because there are only a few entry-and exit-points in the system. Static mix networks also suffer from scalability problems and in several countries, institutions operating a mix could be targeted by legal attacks. In this paper, we introduce MorphMix, a system for peer-to-peer based anonymous Internet usage. Each MorphMix node is a mix and anyone can easily join the system. We believe that MorphMix overcomes or reduces several drawbacks of static mix networks. In particular, we argue that our approach offers good protection from traffic analysis attacks without employing cover traffic. But MorphMix also introduces new challenges. One is that an adversary can easily operate several malicious nodes in the system and try to break the anonymity of legitimate users by getting full control over their anonymous paths. To counter this attack, we have developed a collusion detection mechanism, which allows to identify compromised paths with high probability before they are being used.
:::
## 2. BASIC ARCHITECTURE AND DESIGN
MorphMix consists of an open-ended set of nodes. A node is defined be its IP address $ip_i$. In addition, each node has a key-pair consisting of a private key PrK$_i$ and a public key PuK$_i$. This key -pair is generated locally when a node runs for the first time.
MorphMix is a circuit-based network.
* To access the Internet anonymously, a user sets up an *anonymous tunnel*, which starts at her own node, via some other nodes.
* Starting node is named *initiator*.
* The last node is called the *final node*.
* The nodes in-between are the *intermediate nodes*.
* There is also distinguishment between *well-behaving nodes* and *malicious nodes*.
* Layered encryption is used, similar to the approach proposed by Chaum [5].
Figure 1 depicts a fully set up anonymous tunnel from n$_1$ via n$_2$, n$_3$, and n$_4$.
![](https://i.imgur.com/Nki6VG9.png)
* All messages exchanged between two nodes have the same length.
* Denote by $\{{m}\}_k$ the encryption of message $m$ with a key $m$.
* When n$_1$ sends a message $m$ through the anonymous tunnel, it encrypts it repeatedly with the symmetric keys corresponding to the *nested encryption* (NEs), which results in $\{\{\{m\} _{k_{N,3}} \} _{k_{N,2}} \} _{k_{N,1}}$.
* The header also contains a sequence number to counter replay-attacks and a type to distinguish control and data messages.
Before n$_1$ sends the message to n$_2$, the header is encrypted according to the *link encryption* )LE) between n$_1$ and n$_2$ using the symmetric key k$_{L,1}$.
* When n$_2$ receives the message:
* It removes the link encryption using k$_{L,1}$, removes one layer of encryption using k$_{N,1}$,
* Determines the next hop according to the identifier in the header,
* Sets the fields in the header for the next link,
* Encrypts the header according to the link encryption between n$_2$ and n$_3$ using k$_{L,2}$, and sends it to n$_3$.
* This continues until the final node is reached, which relays the data to the server n$_1$ wants to communicate with.
An important design decision is whether the mux network operated on top of the IP layer or on the application layer.
1. In the first case, the system is transparent for end-to-end transport and application protocols.
* Data is extracted at the initiator after the IP-layer and transported hop-by-hop through the mix network within UDP diagrams.
* The end-to-end transport of application protocols are responsible to provide a reliable data stream.
2. In the second case, the user's application usually accesses the mix network in the same way a web browser accesses a web proxy: a TCP-connection is set up to an access program running on the initiator's computer, which in turn handles the communication which the mix network
* The system is no longer transparent for the applications, and the access program usually needs to understand the protocol of each application it supports.
In traditional mix networks where each link between two mixes carries the data of several users, UDP is the better choice because with TCP, one lost packet between two mixes stalls every user on that link. Furthermore, UDP makes sense in an environment where all mixes have similar computing power and network connectivity. Each link between two mixes can be tuned to its maximum throughput without having too many lost datagrams.
In MorphMix cover traffic is not employed; due to the large number of mixes, no link is used by very many users at the same time.
* Furthermore, given the heterogeneity of the nodes, using TCP makes life much easier.
* With UDP, two nodes would have to employ some sort of flow control between them in order to not lose so many packets that end-to-end performance would get unacceptable.
MorphMix is an application level mix network using TCP between mixes.
## 3. ANONYMOUS TUNNEL SETUP
### 3.1 Selecting the Next Hop
In MorphMix, the initiator selects only the first intermediate node and each node along the anonymous tunnel then picks the following node.
* Letting each node select the next hop makes MorphMix highly scalable because a node only has to manage its local environment.
* There is one problem with this approach: once we hit a malicious node that wants to collect data about anonymous tunnels, this node could either simulate all remaining hops by itself or use an accomplice as the next hop.
* Solution to this in section 3.3.
### 3.2 Local Environment and Peer Discovery
At any time, a node knows about some other nodes (their IP addresses and public keys).
* We say that two nodes are *connected* if they have currently established a link encryption.
* The set of nodes a node $a$ is connected to are $a$'s neighbors.
* Two connected nodes exchange control information, which tells them if the other peer is willing to accept further anonymous tunnels.
* They can also check the quality of the link by using ping-messages to find out if it actually makes sense to use that link to set up anonymous tunnels.
> Important that different sources are contacted to learn about other nodes.
### 3.3 Setting up the Link and Nested Encryption
When node $a$ wants to set up the link encryption with another node $b$, it first establishes a TCP-connection with $b$. $a$ then selects a random bit-string that serves as the symmetric key for the link encryption. The key is encrypted with $b$'s public key and send to $b$.
Since the initiator does not know the nodes and their public keys along its tunnel beforehand (except the first intermediate node), we use the Diffie-Hellman (DH) key-exchange.
* If the initiator simply sent its half of the DH key-exchange to node $b$ responsible for selecting the next hop $c$, $b$ could easily play the role of $c$ (and other nodes following $c$) itself without the initiator noticing this.
To counter this attack, we must not allow $b$ to see the initiator's half of the DH key-exchange in the clear. To solve this problem we introduce the notion of a witness.
* For each hop, the initiator selects a witness randomly from the nodes it currently nodes. The witness' task is to act as a third party in the process of selecting the next hop of an anonymous tunnel.
**Figure 2** illustrates the procedure to set up the nested encryption.
![](https://i.imgur.com/MdyQlYf.png)
Node $a$ is the initiator. We assume the tunnel has already been set up to node $b$ (via zero or more intermediate nodes). To set up the nested encryption to the next node, the following steps are carried out:
1. $a$ picks a witness $w$ randomly from the set of nodes it currently knows.
* It generates its half of a DH key-exchange (DH$_a$) and nonce$_1$ to prevent replay attacks.
* nonce$_1$ and DH$_a$ are encrypted using $w$'s public key PuK$_w$. resulting in {nonce$_1$,DH$_a$}$_{PuKw}$.
* $a$ also specifies $s$, which is the number of nodes $b$ has to offer to $w$ in message 2.
* Here, we assume $s$ = 3.
* $a$ then sends a message to $b$ consisting of:
* $w$'s IP address ip$_w$,
* PuK$_w$,
* $s$,
* And the encrypted nonce and DH perameters.
* The message tells $b$ to append a node to the tunnel using witness $w$.
2. $b$ receives this message and sets up a link encryption to $w$, using ip$_w$ and PuK$_w$.
* It generates nonce$_2$, which is used to recognize message 6.
* $b$ generates a message containing the encrypted nonce and DH parameters from $a$, nonce$_2$, the IP addresses of 3 potential next hop nodes (ip$_c$, ip$_d$, and ip$_e$) and sends it to $w$.
* We name the list of IP addresses offered by $b$ the *selection* of $b$.
3. $3$ receives the message and randomly picks one node from the selection of $b$ as the hop.
* In **figure 2**, $w$ picks node $c$ and establishes a link encryption with $c$ using PuK$_c$.
* $w$ also decrypts nonce$_1$ and DH$_a$ using its private key PrK$_w$, generates a message consisting of nonce$_2$, ip$_b$, and DH$_a$, and sends it to $c$.
4. $c$ gents the message and checks if it is indeed willing to accept an anonymous tunnel from $b$. If yes, $c$ generates an ok-message and sends it back to $w$.
5. $w$ receives the ok-message and generates a receipt for $a$.
* The receipt contains the IP address offered by $b$ and is signed by $w$ using PrK$_w$.
* The first IP address in the receipt is the one $w$ has picked as the next hop.
* The receipt also contains nonce$_1$ to guarantee its freshness and is sent to $b$.
6. $b$ receivs the message from $w$ and learns that $w$ has selected $c$ as the next hop.
* It generates a message containing nonce$_2$ and the identifier id to be used to identify data belonging to this anonymous tunnel on the link between $b$ and $c$.
* After having sent this message, $w$'s task is completed and the connection between $b$ and $w$ can be torn down again.
7. $c$ gets the message and sends its part of the DH key-exchange (DH$_c$) back to $b$ via a message identified with id.
8. $b$ generates a message consisting of DH$_c$ and the receipt from $w$ and sends it to $a$.
If anything fails, a nok-message is sent back to $a$ and $a$ can either decide to tear down the tunnel completely or try again.
### 3.4 Analysis of the Nested Encryption Setup
If $b$ wants to simulate the next hop $c$, it can provide $w$ in message 2 with fake public keys it knows the private keys of, intercept message 3 and act as $c$ itself. To do so, $b$ needs active control over the link between $w$ and $c$ to intercept and inject data packets. However, $b$ cannot predict which witness $a$ is going to choose, so $b$ cannot prepare itself in advance and it is difficult to intercept packets close to $w$. It seems more realistic for $b$ to intercept packets close to $c$, especially as it is $b$ that selects the list of nodes in message 2.
* To make this attack as difficult as possible, we require that all IP addresses offered by $b$ in its selection and $b$'s own IP address must not have similar IP prefixes.
* Since $a$ chooses randomly a different witness for each hop, the probability that all witnesses are cooperating with b is quite small.
All other attacks require active control over several network links and are therefore much harder to carry out. In addition, if something like a world-wide PKI got deployed, the use of digital certificates would defeat those attacks; impersonating another party would no longer be possible if $c$ signed message 7.
## 4. TRAFFIC ANALYSIS ATTACKS
The large number of mixes and the dynamism of MorphMix helps to protect from passive traffic analysis attacks. If a global eavesdropper can observe every single MorphMix node, we are doomed.
* Due to the limited mix functionality of the nodes - in particular because there is no cover traffic - such an adversary should be able to break the anonymity of all MorphMix users by means of timing attacks at the nodes along the anonymous tunnels or end-to-end timing attacks at the nodes along the first and final nodes.
In MorphMix, because of the large number of mixes, a global observer seems extremely unlikely. To ensure this, each node should constantly try and learn about other peers that can be used as possible next hop nodes in anonymous tunnels and forget about those it has been using for a while. Adding cover traffic would not add much more to the resistance of MorphMix.
* In particular, keeping up constant traffic flows between nodes in a way that really protects from traffic analysis attacks without significantly degrading the end-to-end performance would be very difficult in a highly dynamic environment with unreliable nodes.
A limited eavesdropper that is able to monitor several nodes but not a significant portion of the system may occasionally break the anonymity of a user if he manages to observe at least the traffic at the initiator and final node of an anonymous tunnel. As soon as the user switches to another tunnel, her identity is protected again.
* This implies that MorphMix is well suited to protect its users from long-term profiling without guaranteeing the anonymity of every single transaction.
## 5. DETECTING COLLUSION ATTACKS
It is a hard problem to detect nodes that are just collecting data but otherwise offer good service. However, there is one key difference between an anonymous tunnel that was set up via well-behaving nodes and one that is partly composed of cooperating malicious nodes:
* In the first case, each node is selected more or less randomly among all active nodes in the system.
* While in the second case, nodes from the malicious set appear with higher probability.
Detecting nodes that appear more often together in anonymous tunnels than others can only work when a user has set up and used a variety of different anonymous tunnels.
Each node maintains an *internal table* that contains a row for each selection it has received during the setup of anonymous tunnels.
* Each row is a combination of a selection and the node that offered the selection, which we name *extended selection*.
* If node $b$ has offered the selection {ip$_b$, ip$_c$, ip$_e$}, the resulting extended selection is {ip$_b$, ip$_c$, ip$_d$, ip$_e$}.
![](https://i.imgur.com/yUQsfjT.png)
![](https://i.imgur.com/8W52WvV.png)
In step 3 of algorithm 1, we want to find out what the nodes in the new extended selection have done before. For reasons (1-4), we can state the following properties about the set ES$_R$:
1. If ES$_N$ mainly consists of colluding nodes, ES$_R$ will contain relatively few different nodes and many occurrences of several colluding nodes.
* This implies a big $m$ and a small $u$, resulting in a big $c$.
2. If ES$_N$ mainly consists of well-behaving nodes, ES$_R$ will contain relatively many different nodes with only a few of them occurring several times.
* This implies a small $m$ and a big $u$, resulting in a small $c$.
What distinguishes well-behaving popular nodes from colluding nodes is that although popular nodes appear frequently in selections of well-behaving nodes, less popular nodes appear in the same selections, too. Consequently, the variety of nodes being selected by well-behaving nodes is always bigger than the one selected by malicious nodes, even if there are some very popular nodes.
### Detecting Malicious Tunnels
IF a new correlation $c$ is computed, it basically affects the slot closest to $c$ by incrementing its value by 1.
* However, in order not to let grow the values in the array grow indefinitely, they follow an exponential weighted moving average (EWMA) with parameter $\alpha$ and (1 - $\alpha$) is added to the slot that corresponds to the new correlation.
We assume a system with 10,000 nodes, where some are malicious and in the same colluding set.
* Each node is equally popular.
* We set up 5,000 anonymous tunnels, whereas each tunnel consists of 5 nodes total.
* This means that the initiator gets 3 different selections during the setup of each tunnel, one from each of the interemediate nodes.
* Each selection contains 10 nodes.
* For now, we assume that malicious nodes offer only other malicious nodes from their collusion in their selections.
**Figure 3** shows the correlation distribution when 5, 10, 20, or 30% of all nodes are malicious.
![](https://i.imgur.com/jKOwOT2.png)
The strategy a node follows to detect malicious anonymous tunnels is as follows:
* At any time, the node knows the correlation distribution it has generated based on selections it received previously.
* Based on this distribution, the node determines a *correlation limit*.
* Only if the final node in the tunnel is malicious, then this is difficult to detect because it does not offer a selection.
* However, this final node cannot learn anything about the anonymous tunnel by itself.
The steps the initiator carries out during the setup of an anonymous tunnel to determine whether it is considered good or malicious are listed in algorithm 2:
![](https://i.imgur.com/z5voJ9r.png)
**Figure 4** shows the false positives and negatives for the setting in **Figure 3**. The graphs show the cumulated percentages of false positives and negatives after $n$ anonymous tunnels have been set up.
* For instance, in **figure 3a**, the line with the false positives shows about 20% false positives after 2,000 anonymous tunnels.
* This means that 20% of all good anonymous tunnels were wrongly classified as malicious during setup of the first 2,000 anonymous tunnels.
* The table lists the absolute figures of false positives and negatives after all 5,000 anonymous tunnels have been set up.
![](https://i.imgur.com/JaUpLa5.png)
It is seen that false negatives mainly occur when only a few tunnels have been set up. The reason is that in the beginning, it is difficult to determine if a new extended selection is good or malicious because the internal table of the initiator does not yet contain enough extended elections.
False positives happens from time to times, which is caused by the fact that the correlation limit is always chosen to minimize the false negatives at the cost of a few false positives.
The collusion detection mechanism has its limit. If the amount of malicious nodes increased to 50% and beyond, detecting malicious tunnels is no longer possible because the two peaks in the correlation distribution merge into one.
### 5.2 A More Clever Adversary
A different adversarial game is to offer not only malicious nodes but also well-behaving nodes in their selections.
In contrast to section 5.1, it is now no longer the case that all remaining nodes of a tunnel are malicious one a malicious node has been hit because a witness can choose a well behaving node from the selection of a malicious node. As there is no cover traffic employed, it could be the case that an advanced adversary makes use of timing attacks to learn these two nodes belong to the same tunnel, which means he would have fully compromised thee tunnel. We therefore look more closely at two cases:
1. The adversary controls all nodes following the initiator along an anonymous tunnel.
2. The adversary controls at least the first intermediate and the final node.
**Figure 5** shows the percentage of all anonymous tunnels the adversary is expected to compromise according to the two cases described above. The adversary's chances to fully compromise anonymous tunnels increases compared to **Figure 4*.
* For instance, *Figure 5a** shows that an adversary controlling 1,-000 nodes is likely to fully compromise nearly 1% of the anonymous tunnels if the nodes he controls offer 5 malicious nodes in their selections
![](https://i.imgur.com/tlgxSqV.png)
Another strategy of the adversary could be to make sure the nodes he controls are not very popular.
* The main idea behind this strategy is to have only a few extended selections from malicious nodes in the internal tables of the initiators to keep their correlations small.
**Figure 6** shows the adversary's expected percentage of compromised tunnels if he varies the relative popularities of the malicious nodes from $0.05...1.0$ of the well-behaving nodes.
![](https://i.imgur.com/GCik2UV.png)
> Note: there are always ~10% false positives for **Figures 5 and 6**
### 5.3 Scalability
The first parameter looked at is the size of the selection. In general, larger selections yield better separations of the two peaks in the correlation distribution. However, very large selections require each node to be connected to a lot of other nodes at the same time.
Formula that provides a good compromise:
* IF $n$ is the number of different nodes in the system, then the selection size $s$ should be chosen as $s$ = max($\lceil 5 \cdotp$ $_{10} n - 10\rceil, 1)$.
* This means the selection size grows logarithmically with the system size.
* As an example, with 10,000 nodes in the system, $s$ should be set to 10, as in sections 5.1 and 5.2
Another issue is the size of the internal table. The cmplexity to compute the correlation of a new extended selection is proportional to the number of extended selections in the internal table.
* Idea is to forget old extended selections and only keep the $k$ *least recently received extended selections* in the internal table.
If $\overline{s}$ is the average number of elements in a selection and $n$ the number of nodes in the system, the number of extended selections $k$ in the intrnal table should be $k$ = 2 $\cdotp n/ \overline{s}$[16], which means the internal table grows linearly with the size of the system.
## 7. CONCLUSIONS AND FUTURE WORK
* MorphMix is reasonably resistant to collusion attacks as long as the adversary does not control significantly more than about 20-30% of all participating nodes.
* Scalability is mainly an issue when determining whether an anonymous tunnel is good or boad.
* MorphMix eventually reaches its limits when the number of nodes approaches 1,000,000.