# "The Phantom Protocol DEFCON 16 Video + Whitepaper"
###### tags: `Tag(HashCloak - Validator Privacy)`
Author(s): Magnus Bråding, Security Researcher, Fortego Security
Paper: https://github.com/hashcloak/phantom / https://www.youtube.com/watch?v=IAJn3Ag3imY / http://www.magnusbrading.com/phantom/phantom-design-paper.pdf
### Table of Contents
[toc]
:::info
>Abstract: Recent years, and especially this past year, have seen a notable upswing in developments toward anti-online privacy around the world, primarily in the form of draconian surveillance and censorship laws (both legislated and suggested) and ISPs being pressured into individually acting as both police and informants for various commercial interests. Once such first steps have been taken, it is of course also of huge concern how these newly created possibilities could be used outside of their originally stated bounds, and what the future of such developments may hold in store for online privacy.
>
>There are no signs of this trend being broken anytime soon. Combined with the ever growing online migration of everything in general, and privacy sensitive activities in particular (like e.g. voting, all nature of personal and interpersonal discussions, and various personal groupings), this trend will in turn unavoidably lead to a huge demand for online anonymization tools and similar means of maintaining privacy.
>
>However, if not carefully designed, such anonymization tools will, ultimately, be easy targets for additional draconian legislation and directed [il]legal pressure from big commercial and political interests. Therefore, a well-conceived, robust and theoretically secure design for such an anonymization protocol and infrastructure is needed, which is exactly what is set out to be done with this project.
>
>What is presented in this paper is the design of a protocol and complete system for anonymization, intended as a candidate for a free, open, community owned, de facto anonymization standard, aimed at improving on existing solutions such as e.g. TOR.
:::
## Video: DEFCON 16: Generic, Decentralized, Unstoppable Anonymity: The Phantom Protocol
### 0:00 Magnus Bråding
* Swedish security researcher (Fortego Security).
* 10+ years in the security business.
* Central contributor and driving force behind woodmann.com reverse engineering community.
### 0:13 Project Background (why is this interesting?)
* Big upswing in anti-online privacy measures in recent years.
* A huge upcoming demand for anonymity seems unavoidable!
* Existing anonymization solutions are in many ways not well suited for this upcoming demand and the circumstances surrounding it.
* There is no real “standard” for anonymization, like BitTorrent is for P2P.
### 5:15 Goals of the Project
* To be a good reference for future work within the field of anonymization
* To inspire further discussion about the optimal requirements for the future anonymization demand
* To be a starting point and inspiration for the design and development of a global de facto standard for generic anonymization
* Not to be a complete detailed specification ready to be implemented, but rather to be built upon
### 6:15 Limitations
The protocol is designed to work in any network environment as long as no single attacker is able to eavesdrop on all participating nodes in a correlated fashion, or directly controls a large majority of all nodes in the network.
### 7:00 Further Assumptions and Directives
* Arbitrary random peers in the network are assumed to be compromised and/or adverse
* CPU power, network bandwidth, working memory and secondary storage resources are all relatively cheap, and will all be available in ever increasing quantity during coming years and thereafter.
### 8:40 Design Goal Overview
* The design goals are stipulated with the requirements and demand of today and the future in mind
* Eight primary design goals:
1. Complete decentralization.
2. Maximum DoS resistance.
3. Theoretically secure anonymization.
4. Theoretically secure end-to-end encryption.
5. Complete isolation from the ”normal” Internet.
6. Protection against protocol identification.
7. High Traffic Volume and Throughput Capability.
8. Generic, Well-Abstracted and Backward Compatible
**Design Goal #1:
Complete Decentralization**
* No central or weak points can exist.
* Both ownership and technical design must be decentralized.
* Open/community owned design & source code.
**Design Goal #2:
Maximum DoS Resistance**
* The only way to stop a decentralized system without any legal owners is to DoS it.
* It only takes one weakness, so defensive thinking must be applied throughout all levels of the design.
**Design Goal #3:
Theoretically Secure Anonymization**
* Nothing should be left to chance.
* No security by obscurity.
* All anonymization aspects should be able to be expressed as a risk probability or a theoretical (cryptographic) proof.
**Design Goal #4:
Theoretically Secure End-to-End Encryption**
* Confidentiality is not only important by itself, but also directly important to anonymity!
* Even if someone would monitor and correlate all traffic at all points in the entire network, they should not be able to see what is communicated, no matter what.
**Design Goal #5:
Isolation from the "Normal" Internet**
* Users should not have to worry about Internet crimes being perpetrated from their own IP address.
* An isolated network is necessary to be able to enforce end-to-end encryption for generic traffic.
* Using an isolated network has many advantages, but not so many disadvantages in the end.
* Out-proxies to the ”normal” Internet can still be implemented on the application level, selectively.
**Design Goal #6:
Protection against Protocol Identification**
* Many powerful interests will lobby against a protocol like this, both to lawmakers and ISPs (who are already today filtering traffic).
* The harder it is made to positively identify the usage of the protocol, the harder it will be to track, throttle and block it.
**Design Goal #7:
High Volume / Throughput Capacity**
* The traffic volume for ”normal usage” of the Internet increases every day.
* More or less high speed / throughput is necessary for many Internet applications.
* Popularity will be proportionally related to transfer speed and volume.
* Anonymity is directly related to popularity.
**Design Goal #8:
Generic, Well-Abstracted and Backward Compatible**
* A generic system is practically always superior to a specific system in the long run.
* A well-abstracted system allows for efficient, distributed design and implementation.
* A system compatible with all pre-existing network enabled applications will get a much quicker takeoff and community penetration, and will have a much larger potential.
### 18:00 A Bird’s-Eye View
**The Basic Idea**
* Usually when nodes connect to the internet they exchange data.
* They also usually end up exchanging identifying information such as IP addresses.
* Using routing paths, this identifying information can be hidden.
* Nodes can create their own paths.
* Choice of size/composition.
* Choice of throughput/strength.
### 22:08 AP Addresses
* ”Anonymous Protocol” addresses.
* Equivalent to IP addresses in their format.
* Equivalent to IP addresses in functionality, with the exception that they allow communication between two peers without automatically revealing their identity.
* Backward compatible with IP applications.
### 23:15 The Network Database
* Equivalent to the routing tables of the ”normal” Internet.
* Distributed and decentralized database based on DHT (Distributed Hash Table) technology.
* Proven technology
* Automatic resilience to constantly disappearing and newly joining nodes.
* Automatic resilience to malicious nodes of some kinds.
* The network nodes are the database.
### 25:21 Design Details
**Secure Routing Path Establishment**
The selection of the order of nodes in the sequence must obey the following rules:
* No two X-nodes can be adjacent to each other
* A Y-node should be located in one end of the sequence
* A number of Y-nodes equal to the total number of X-nodes minus one, should be located adjacent to each other in the other end of the sequence
* One end of the sequence should be chosen at random to be the beginning of the sequence
![](https://i.imgur.com/RF0eAGH.png)
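
The four ordering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_setup_sequence` and `obeys_rules` are hypothetical, exactly one Y-node is placed between each pair of X-nodes (the rules only require that X-nodes are not adjacent), and nodes are represented as plain labels.

```python
import random

def build_setup_sequence(x_nodes, y_nodes, rng=random):
    """Arrange X- and Y-nodes into a setup sequence obeying the four rules."""
    n = len(x_nodes)
    if n == 0:
        raise ValueError("at least one X-node is required")
    needed_y = 2 * n - 1  # 1 end node + (n - 1) separators + (n - 1) block
    if len(y_nodes) < needed_y:
        raise ValueError(f"need at least {needed_y} Y-nodes")
    ys = iter(y_nodes)
    seq = [("Y", next(ys))]              # a single Y-node at one end
    for i, x in enumerate(x_nodes):
        if i > 0:
            seq.append(("Y", next(ys)))  # keeps X-nodes from being adjacent
        seq.append(("X", x))
    for _ in range(n - 1):
        seq.append(("Y", next(ys)))      # block of (n - 1) Y-nodes at the other end
    if rng.random() < 0.5:               # choose the starting end at random
        seq.reverse()
    return seq

def _leading_y_run(kinds):
    """Count how many Y-nodes appear before the first X-node."""
    run = 0
    for kind in kinds:
        if kind != "Y":
            break
        run += 1
    return run

def obeys_rules(seq):
    """Check the adjacency and end-of-sequence rules for a finished sequence."""
    kinds = [kind for kind, _ in seq]
    n = kinds.count("X")
    if any(a == "X" and b == "X" for a, b in zip(kinds, kinds[1:])):
        return False
    head, tail = _leading_y_run(kinds), _leading_y_run(kinds[::-1])
    return (head >= 1 and tail >= n - 1) or (tail >= 1 and head >= n - 1)

sequence = build_setup_sequence(["x1", "x2", "x3"], [f"y{i}" for i in range(5)])
print(sequence, obeys_rules(sequence))
```

With three X-nodes the sketch uses five Y-nodes, so the setup sequence has eight entries in total.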
### 34:39 The Goodie Box (SSL/Port 443)
* The routing path construction certificate.
* IP and port of next, and IP of previous node.
* Random IDs of next/previous node connections.
* Communication certificate of next/previous nodes.
* Seeds and params for dummy package creation.
* Seeds and params for stream encryption keys.
* Flags.
* Secure hash of the entire (encrypted) setup package array in currently expected state.
* To make it impossible to piggy back info in box.
* Secure hash of the entire (decrypted) contents of the current setup package.
* To know which packets are their own.
**Second Round Extras**
A signed routing table entry, for the AP address associated with the routing path.
### 36:43 Secure Routing Tunnel Establishment (outbound)
![](https://i.imgur.com/h3txwmi.png)
* This is all to obscure AP from IP addresses.
1. $\alpha$ sends a notification package ("black") and remembers it.
2. The next node reads the package, chooses stream encryption keys (e.g. 100) randomly, and encrypts the package with it.
* Again, remembers it.
3. The next node does the same thing.
4. The last node does the same.
* Then it adds another package.
5. (On the way back) Creates a new TCP connection to the previous node.
6. Node receives both packets, remembers the old packet is connected to a certain stream key, and binds it to both connections (in and outbound).
7. Then both packets are decrypted with that key.
* First packet is decrypted to its original form of yellow (heading towards "black").
* Second packet is now a new form.
8. Packets get sent back to previous node.
9. Both packets are decrypted.
* First packet reappears as "black".
* New packet decrypted with keys of every node appears.
* The reply package enables the anonymized node to derive the keys of all the intermediary nodes, while it is impossible for any of them to derive any key with it themselves
* Easy success checking using this method although it takes 4 rounds to establish a connection.
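
The layering in the steps above can be illustrated with a toy model. This is a sketch under loud assumptions, not the cipher or key handling from the paper: `keystream`, `xor_layer` and `round_trip` are hypothetical names, the stream cipher is a simple XOR keystream built from SHA-256, and each node's key is passed in directly instead of being picked from seeded candidates.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Apply (or remove) one node's stream-cipher layer."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def round_trip(notification: bytes, reply: bytes, node_keys):
    """Send a packet out through the nodes, then bring two packets back."""
    packet = notification
    for key in node_keys:                # outbound: each node adds its layer
        packet = xor_layer(key, packet)
    second = reply                       # the last node adds a second packet
    for key in reversed(node_keys):      # return trip: each node decrypts both
        packet = xor_layer(key, packet)
        second = xor_layer(key, second)
    return packet, second

keys = [b"node-1-key", b"node-2-key", b"node-3-key"]
first, second = round_trip(b"notification", b"reply-package", keys)
print(first)  # the first packet is back in its original form
```

In this toy model the first packet returns unchanged, which is the "easy success checking" mentioned above, while the second packet arrives transformed by every node's key. A party that knows all the keys can undo that transformation; a single intermediate node cannot.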
### 44:15 Secure Routing Tunnel Establishment (inbound)
![](https://i.imgur.com/BOgaXP7.png)
* Same process as Outbound.
* An extra dummy packet is sent for symmetry.
### 48:00 Secure End-to-End Encryption
* Once a full anonymized end-to-end connection has been established between two peers, double authenticated SSL can be used over it, as a final layer of encryption / authentication.
* The used certificates can be stored in the network database, in the individual entries for each AP address.
### 49:38 IP Backward Compatibility
* Identical format and functionality of IP.
* Binary hooks for all common network APIs.
* The common Internet DNS system can be used.
* Simple to start supporting IPv6 and similar too.
### 51:30 The Network Database
* Contains separate tables.
* Node IP address table, with associated info.
* Node AP address table, with associated info.
* The database can be accessed through a specific strict API
* Voting algorithms, digital signatures and enforced entry expiry dates are used on top of the standard DHT technology in some cases, to help enforce permissions and protect from malicious manipulation of database contents and query results.
* Resilient to ”net splits”.
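
How voting and expiry dates might be combined when reading an entry can be sketched as follows. Everything here is an assumption for illustration: the function name `resolve_entry`, the `(value, expires_at)` response format and the simple-majority threshold are hypothetical, and signature checking is left out entirely.

```python
from collections import Counter

def resolve_entry(responses, now, quorum=0.5):
    """Pick the value most nodes agree on, ignoring expired entries.

    responses: iterable of (value, expires_at) pairs, one per queried node.
    Returns the winning value, or None if nothing exceeds the quorum.
    """
    live = [value for value, expires_at in responses if expires_at > now]
    if not live:
        return None
    value, votes = Counter(live).most_common(1)[0]
    return value if votes / len(live) > quorum else None

answers = [("entry-A", 100), ("entry-A", 100), ("entry-B", 100), ("entry-A", 5)]
print(resolve_entry(answers, now=10))
```

Here the fourth answer has already expired, so the vote is taken over the remaining three, and a single malicious node returning `entry-B` is outvoted.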
### 53:23 Manual Override Command Support
* Powerful emergency measure
* Protection against DoS attacks
* Signed commands can be flooded to all clients
* Many DHT implementations natively support this feature
* Commands signed by trusted party, e.g. project maintainers etc
* Verification certificate hard coded into the client application
* Only commands for banning IP addresses, manually editing the network database etc., never commands affecting client computers!
* No real worry if signing keys would leak or be cracked.
### 57:00 High-Availability Routing Paths
![](https://i.imgur.com/mYvBJQp.png)
### 57:50 Aftermath
**Legal Aspects & Implications**
File sharing example:
1. Today: Lawsuits based on people connecting to a certain torrent.
2. Lawsuits based on people using a certain file sharing program / protocol.
3. Lawsuits against endpoints in anonymization networks.
4. Lawsuits against routers on the Internet?
5. Lawsuits based on people using a generic anonymization protocol
6. Lawsuits based on people using cryptography?
7. Lawsuits based on people using the Internet?
File sharing is a useful use case, but many people/actors may want to sue for varying levels of application/network usage depending on its usage implications.
### 1:11 Review of Design Goals
* Review of our eight original design goals:
1. Complete decentralization.
2. Maximum DoS resistance.
3. Theoretically secure anonymization.
4. Theoretically secure end-to-end encryption.
5. Complete isolation from the ”normal” Internet.
6. Protection against protocol identification.
7. High Traffic Volume and Throughput Capability.
8. Generic, Well-Abstracted and Backward Compatible.
### 1:14:00 Comparison with Other Anonymization Solutions (TOR)
* Advantages of Phantom over TOR.
* Designed from the ground up with current and future practical anonymization needs and demand in mind.
* Compatible with all existing and future network enabled software, without any need for adaptations or upgrades.
* Higher throughput.
* No traffic volume limits.
* Isolated from the ”normal” Internet.
* End-to-end encryption.
* Better prevents positive protocol identification?
* Not vulnerable to ”DNS leak” attacks and similar.
### 1:17:57 Comparison with Other Anonymization Solutions (I2P)
* Advantages of Phantom over I2P
* Compatible with all existing and future network enabled software, without any need for adaptations or upgrades.
* Higher throughput.
* End-to-end encryption.
* Better prevents positive traffic analysis identification?
### 1:19:28 Comparison with Other Anonymization Solutions (anonymized P2P)
* Advantages of Phantom over anonymized P2P (app specific).
* Less likely to be target of “general ban”.
* The generic nature of Phantom opens up infinitely much more potential than just binding the anonymization to a single application or usage area.
### 1:21:17 Known Weaknesses
1. If all the nodes in a routing path are being controlled by the same attacker, this attacker can bind the anonymized node to the entry/exit node.
2. If an attacker monitors the traffic of all nodes in the network, it will be able to conclude the same thing as in the previous weakness, without even having to doubt where the routing paths end.
3. Individual intermediate nodes in a routing path could try to communicate their identity to other non-adjacent attacker controlled intermediate nodes in the same routing path, by means of different kinds of covert channels.
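
The first weakness can be put into the "risk probability" form called for by design goal #3. As a rough sketch, assume an attacker controls a fraction $f$ of all nodes and that the $n$ nodes of a routing path are selected independently and uniformly at random (both are simplifying assumptions, not claims from the talk). The chance that every node in the path belongs to the attacker is then

$$
P(\text{path fully compromised}) = f^{\,n}
$$

For example, with $f = 0.2$ and $n = 5$ this gives $0.2^5 = 0.00032$, while a shorter path with $n = 3$ gives $0.2^3 = 0.008$. Under these assumptions the risk falls exponentially with path length, which is why the anonymized node's choice of path size matters.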
## Extra Points from Whitepaper
> http://www.magnusbrading.com/phantom/phantom-design-paper.pdf
Anonymity at its simplest core could be described as the inability for (all) other parties to discover,
in a given context, the identity of the party defined as being anonymous.
The identity of a person could be defined as the
set of information that directly, conclusively and uniquely describes and singles out that individual
among all other human beings in this world (and possibly other worlds too, depending on one’s
religious preferences).
Under pseudonymity, a party can
operate without revealing its real identity, but various acts performed can still all be connected
and bound to this same entity, i.e. to its pseudonym identity.
**3.1. Design Assumptions**
Yet again, for a topic as multifaceted as this, some important assumptions on which the design is
based should be declared:
1. The traffic of every node in the network is assumed to be eavesdropped on (individually, but
not globally in a fully correlated fashion) by an external party.
2. Arbitrary random peers participating in the anonymization network/protocol are assumed to
be compromised and/or adverse.
3. The protocol design cannot allow for any trusted and/or central entity, since such an entity
would always run the risk of being potentially forced offline or manipulated, if for no other
reason due to large enough quantities of money (or lawyers) being “misplaced”.
**3.2. Important Consequences of Design Goals and Assumptions**
As a result of the design assumptions listed above, a number of consequences critical to the
design of the protocol can be deduced, out of which some important ones are:
1. The protocol needs to be fully decentralized and distributed.
2. No other peer participating in the protocol can be trusted by itself.
3. Probabilistically secure algorithms need to be used instead of deterministically secure ones.
> why?
**3.3. Design Directives**
During the course of any design procedure, one is often faced with different alternatives or
options at different levels. Design directives are meant to assist in making these decisions, and
also to make sure that all such decisions are made in a consistent manner, toward the same
higher level goals. The most important design directive for this project is:
1. CPU power, network bandwidth, working memory and secondary storage resources are all
relatively cheap, and will all be available in ever increasing quantities during the coming
years and thereafter. Thus, wherever a choice must be made between better security or
better performance/lower resource consumption, the most secure alternative should be
chosen (within reasonable bounds, of course).
**4.3. Theoretically Secure Anonymization**
As always in the field of security, any solution relying solely on the obscurity and practical
difficulty of cracking it, will always fail sooner or later. Usually sooner too actually, if just enough
motivation and resources exist among its adversaries.
Thus, an important design goal of the protocol is that the security of its anonymity should be
theoretically provable, regardless of being deterministic or probabilistic.
**4.4. Theoretically Secure End-to-End Encryption**
End-to-end encryption, and the subsequent prevention of anyone eavesdropping on the contents
of single communication sessions, is something that is normally taken for granted on the Internet
today. This kind of secrecy is also of extra importance when it comes to the field of anonymization, due to the simple fact that if someone is able to eavesdrop on the contents of the
communication between two otherwise anonymous parties, it is highly likely that information of
more or less identifying nature will occur at some point. In such case, the identities of the
communicating parties can be deduced by means of this information, instead of through network
specific address information.
The practical goal will rather be to induce a large
enough amount of uncertainty and false positives into any reasonably resourceful traffic analysis
method, in order to prevent real-time throttling and blocking.
![](https://i.imgur.com/3iVaPqR.png)
**6.1. Some Further Definitions**
* A routing path is a number of network nodes in the anonymous network, in a defined order, selected by a particular anonymized node that also “owns” the path, over which communications to/from this anonymized node can be forwarded/routed in order to help keep its real identity hidden from its communication peers in the anonymous network.
* An exit routing path, or exit route, is a routing path through which the owning anonymized node can make outgoing connections to other nodes in the anonymous network, and thus, a mechanism for anonymizing network clients.
* The outermost node in an exit route is called an exit node.
* An entry routing path, or entry route, is a routing path through which the owning anonymized node can accept incoming connections from other nodes in the anonymous network, and thus, a mechanism for anonymizing network servers.
* The outermost node in an entry route is called an entry node.
* A routing tunnel is a connection established over a routing path, over which the anonymized node owning the routing path can perform TCP-equivalent anonymized communication with a specific peer node in the anonymous network.
* The network database is a fully distributed, decentralized database, based on DHT (distributed hash table) technology. It contains a number of individual virtual “tables”, which in turn contain all global information necessary for the operation of the anonymous network. All network nodes have access to the necessary parts of the contents of this database, through a well-defined API.
* An AP address (Anonymous Protocol address) is the equivalent of an IP address within the bounds of the anonymous network. AP addresses are used to identify individual nodes on the anonymous network (without for that sake being able to deduce their IP address or any similarly identifying real-world information).
**6.2. Routing Paths**
The concept of routing paths is the central anonymizing mechanism.
As described in the previous chapter, not all nodes in the anonymous network are necessarily
anonymized though. Some of the nodes could just as well only be in the network to be able to
communicate with other anonymized nodes, without for that matter having the need to anonymize
themselves, thus saving resources and improving performance.
Even nodes that don’t make
use of routing paths themselves can still in many ways be considered as being anonymized, if by
no other means, through the existence of “reasonable doubt”, which has very important
performance implications for the protocol. This kind of reasonable doubt is, of course, of little use
for a server with a static AP address, since it won’t take long for any client to notice that the IP
address of its “entry node”, i.e. the server node itself in the non-anonymized case, is always the
same. For various clients in different situations, however, this can be of great usefulness,
discussed in more detail a bit later.
This has been solved by using a distributed and decentralized database, of the DHT (Distributed
Hash Table) type. This way, the collected set of all nodes in the anonymous network actually are
the database. Several successful large scale implementations of this kind of database already
exist today, among which the Kademlia based Kad network database (of eMule fame) is one of
the largest.
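
Since the excerpt points to Kademlia as an existing large-scale DHT, a minimal sketch of its core idea may help: node IDs and keys share one ID space, and an entry is stored on the nodes whose IDs are closest to the key under the XOR metric. The helper names `dht_id` and `closest_nodes`, the use of SHA-1 to derive IDs, and the example labels are assumptions for illustration; the source does not say that Phantom uses this exact scheme.

```python
import hashlib

def dht_id(label: str) -> int:
    """Map a node name or a lookup key into a shared 160-bit ID space."""
    return int.from_bytes(hashlib.sha1(label.encode()).digest(), "big")

def closest_nodes(key: str, nodes, k: int = 3):
    """Return the k nodes whose IDs are nearest to the key by XOR distance."""
    target = dht_id(key)
    return sorted(nodes, key=lambda node: dht_id(node) ^ target)[:k]

nodes = [f"node{i}" for i in range(20)]
print(closest_nodes("ap-address-entry", nodes))
```

Because responsibility for a key depends only on ID distance, a node that joins or leaves changes which nodes hold nearby keys without any central coordinator, which is the sense in which the nodes themselves are the database.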
All that is needed from a node in order to connect to, and become part of, the distributed
database, and thus the anonymous network, is to get hold of any single node that is already
connected. Without a central point, this might, at first glance, appear to be a big problem,
especially for first-time users of such a network. It really isn’t though. For any user that has
already been connected to the network, even once, thousands upon thousands of node
addresses can be cached until the next time the node wants to connect. At that point, significant
numbers of these will most likely still be available, and, as already mentioned, contact with just
one single connected node is all that is needed to become a fully qualified part of the database
and network, thus again getting access to large volumes of node IP addresses to use as entry
points for subsequent connections. Also, to create easily accessible entry points for never-before
connected nodes, any node that has been connected to the network just once can easily publish
and share its list of valid node IP addresses to the world in any number of ways, from their
website, from their blog, in forums, blog comments, news group postings, or even by email. This
guarantees that, as long as there are any remaining members in the anonymous network, an
entry point is only a few mouse clicks or a URL away.
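
The bootstrap procedure described above amounts to trying known addresses until one responds. A minimal sketch, assuming a hypothetical `find_entry_point` helper and a caller-supplied `probe` function that reports whether an address is reachable (the real connection handshake is not modelled, and the addresses are examples):

```python
def find_entry_point(cached, published, probe):
    """Return the first reachable node address, or None if none respond.

    cached:    addresses remembered from earlier sessions
    published: addresses obtained from a website, forum post, email, etc.
    probe:     callable returning True if the address answers
    """
    seen = set()
    for address in list(cached) + list(published):
        if address in seen:          # skip addresses that were already tried
            continue
        seen.add(address)
        if probe(address):
            return address
    return None

alive = {"203.0.113.7:443"}
entry = find_entry_point(
    cached=["198.51.100.1:443", "203.0.113.7:443"],
    published=["203.0.113.7:443"],
    probe=lambda address: address in alive,
)
print(entry)
```

A single successful probe is enough, matching the point that contact with one connected node gives access to the rest of the network.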
The anonymized node prepares a special unique “setup package” for each individual X-node
and Y-node, being asymmetrically encrypted with the path building certificate of the
individual recipient, symmetrically encrypted with the 128-bit ID of its incoming connection,
and signed with the routing path construction key from the previous step. The package
contains the following (just as with all other steps, this will be explained in more detail in the
next section):
5.1. IP address of the expected previous node in the sequence.
5.2. IP address and port number of the next node in the sequence.
5.3. The routing path construction certificate, generated by the anonymized node in the
previous step.
5.4. A random 128-bit ID, associated with the connection from the previous node.
5.5. A random 128-bit ID, associated with the connection to the next node.
5.6. The communication certificate of the next and previous node.
5.7. A constant number of tuples containing a 128-bit seed, a size, an index and flags for
creation of dummy setup packages (more info about this in the details later).
5.8. A constant number of 128-bit seeds for stream encryption key generation + the
number of keys to be generated.
5.9. A collection of flags, telling if the node is an intermediate X-node, a terminating X-node or a Y-node, among other things.
5.10. A secure cryptographic hash of the entire (encrypted) setup package array (see the
next step) - except the package itself - in the expected state upon reception from the
previous node.
5.11. A secure cryptographic hash of the (decrypted) contents of the current setup
package.
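
The field list 5.1 to 5.11 can be pictured as a record type. The sketch below is an illustration only: the class name `SetupPackage`, the field names and types, the `repr`-based encoding and the choice of SHA-256 are all assumptions, since the excerpt does not specify a wire format or hash function. Encryption and signing of the package are not modelled.

```python
import hashlib
import os
from dataclasses import dataclass, astuple, replace

@dataclass(frozen=True)
class SetupPackage:
    prev_ip: str               # 5.1
    next_ip: str               # 5.2
    next_port: int             # 5.2
    construction_cert: bytes   # 5.3
    prev_conn_id: bytes        # 5.4 (128-bit random ID)
    next_conn_id: bytes        # 5.5 (128-bit random ID)
    comm_certs: tuple          # 5.6 (next node, previous node)
    dummy_params: tuple        # 5.7 tuples of (seed, size, index, flags)
    stream_key_seeds: tuple    # 5.8
    stream_key_count: int      # 5.8
    flags: int                 # 5.9
    array_hash: bytes          # 5.10

    def content_hash(self) -> bytes:
        """5.11: hash of the decrypted contents (placeholder encoding)."""
        return hashlib.sha256(repr(astuple(self)).encode()).digest()

pkg = SetupPackage(
    prev_ip="10.0.0.1", next_ip="10.0.0.3", next_port=443,
    construction_cert=b"example-cert",
    prev_conn_id=os.urandom(16), next_conn_id=os.urandom(16),
    comm_certs=(b"next-cert", b"prev-cert"),
    dummy_params=((os.urandom(16), 512, 0, 0),),
    stream_key_seeds=(os.urandom(16),), stream_key_count=100, flags=0b001,
    array_hash=hashlib.sha256(b"encrypted package array").digest(),
)
digest = pkg.content_hash()
tampered = replace(pkg, next_port=8443)
print(tampered.content_hash() != digest)
```

Any change to a field changes the content hash, which is how a receiving node could recognise its own package and detect modification.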
The reason for having “a number of Y-nodes equal to the total number of X-nodes minus one, located adjacent to each other in one end of the sequence”, is because we don’t want, at any
point, to have a shorter distance between the anonymized node and the terminating X-node at
the other end of the intended routing path than we will have in the final path (where “distance”
between two nodes denotes the number of randomly selected intermediary nodes between
them). These Y-nodes thus work as a temporary “buffer” to maintain this distance during the
creation process of the routing path. The “total number of X-nodes minus one” will always be
equal to the distance between the anonymized node and the terminating X-node in the fully
established routing path, and thus this number of Y-nodes will assure that the distance, and thus
the security, selected by the anonymous node to begin with, will also be maintained throughout
the entire setup process of the routing path.
The reason for having “no two X-nodes adjacent to each other” and “a Y-node located in one end
of the sequence” (where the other end has the multiple Y-nodes discussed above) is that no X-node should be able to know which nodes will be adjacent to it in the final routing path, and thus
be able to influence, interfere or otherwise behave any differently during the path setup process
based on such information (an X-node controlled by an attacker would always want to have other
X-nodes controlled by the same attacker adjacent to it, in order to achieve a fully compromised
path).
Finally, the reason for “choosing one end of the sequence at random to be the beginning of the
sequence” is that the direction in which the one-way communication occurs during the setup of
the path should have no connection to the directions in which the anonymized node and the
terminating node at the ends of the path are located. That is, the intermediary nodes should not
be able to tell which direction of the path is which. Combined with the fact that a routing path can
be either an entry path, an exit path, or both, the intermediary nodes will never be able to derive
at which end of the path the anonymized node is located, not even by taking note of the direction
in which tunnels are established over the path once it is fully setup.
7.6. The node makes a decision whether it is possible for it to take part in the path
building process (under normal circumstances the answer should be yes, which will
be assumed in this process summary, possible exception conditions will be
discussed later).
The only legitimate reason for “saying no” would probably be if the node in question is too heavily
loaded already. Otherwise nodes could easily be DoS attacked by opening too many paths
through them. On the other hand, a built-in option in the protocol to “say no” could encourage
“cheat clients”, who don’t share their bandwidth, but still use others’ bandwidth. Then again,
clients that really wanted to cheat could, of course, just disconnect any such requests completely,
so leaving out the option wouldn’t really be an efficient solution either. Either way, the “cheat
client” scenario is something of a dilemma, and should be considered further.
Depending on whether the tunnel is an inbound tunnel (i.e. initiated by the entry node of a routing
path, in response to an incoming connection from a third-party) or an outbound tunnel (i.e.
initiated by the anonymized node that owns the routing path, as a result of this node wanting to
create a connection to another AP address in the anonymous network), the process will be
internally different. Seen from the outside, or even from the viewpoint of all intermediary nodes in
the affected routing path, however, the process is seemingly identical and symmetrical, which is
an important property further aiding the anonymity and zero-knowledge of the system.