# Design and analysis methods for privacy technologies - Carmela Troncoso 2011
###### tags: `Tag(HashCloak - Validator Privacy)`
Paper: https://www.esat.kuleuven.be/cosic/publications/thesis-187.pdf
#### Table of Contents
[toc]
:::info
>Abstract: As advances in technology increase data processing and storage capabilities, the collection of massive amounts of electronic data raises new challenging privacy concerns. Nevertheless, the privacy community has not yet developed a general methodology that allows engineers to embed privacy-preserving mechanisms in their designs and test their efficacy.
>In this thesis we investigate whether general methodologies for the design and analysis of privacy-preserving systems can be developed. Our goal is to lay down the foundations for a privacy engineering discipline that provides system designers with tools to build robust privacy-preserving systems.
:::
## Introduction
Motivation: The use of electronic communications changes the flow of information with respect to an off-line society in terms of data storage, data distribution, ease of access to data, etc. This new flow, together with an increased availability of information, raises new concerns and risks with respect to privacy.
The research community has tackled privacy problems from different perspectives, developing a wide range of solutions that achieve different privacy properties such as:
* Anonymity - a subject is anonymous if she is not identifiable within a set of subjects, the anonymity set (a common way to quantify this is sketched after this list).
* Unlinkability - two actions are unlinkable if, after the adversary's observation, they are no more and no less related than they were given the adversary's a priori knowledge.
* Unobservability - an action is unobservable if performing it is indistinguishable from not performing any action at all.
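These properties can be made quantitative. A common metric from the anonymity literature measures anonymity as the entropy of the adversary's probability distribution over the anonymity set; the sketch below is a minimal illustration with made-up distributions.
```python
import math

def anonymity_entropy(probs):
    """Shannon entropy (in bits) of the adversary's distribution over the
    anonymity set; log2(N) means perfect anonymity among N subjects."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform over 4 suspects: maximal anonymity (2 bits).
print(anonymity_entropy([0.25] * 4))             # 2.0
# Skewed distribution: the adversary has learned something.
print(anonymity_entropy([0.7, 0.1, 0.1, 0.1]))   # ~1.36
```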
### Two main categories of Privacy Models
1. In the first category the privacy model assumes that users reveal their private data to trusted data controllers (e.g., service providers).
2. In the second category, instead of relying on a trusted service provider to protect their data, the model assumes that users actively take part in protecting their privacy by using so-called Privacy Enhancing Technologies (PETs).
### Thesis
In this thesis we tackle the problem of designing systems that provide hard privacy guarantees to their users. We aim to lay down the foundations for a privacy engineering discipline that provides system designers with tools to build robust privacy-preserving systems. These tools shall allow engineers to embed privacy-preserving mechanisms in their designs, and test them for potential information leaks – assuming that such leakage could lead to a privacy breach.
The thesis has two parts: the analysis of privacy-preserving systems, and their design.
Analysis: The first part of this thesis investigates whether there is a general way to quantify information leaks in any privacy-preserving design such that we can evaluate the design’s degree of protection.
Design: Designing a privacy-preserving system is a complex task. A first, straightforward approach to ensure privacy could be to not give any private information to the service provider. However, hiding all private information from providers can be detrimental to the service because it may conflict with other security requirements of the system, such as integrity or accountability, and in some extreme cases can even prevent the provision of the service itself.
## Part 1 - Analysis of privacy-preserving systems
### Chapter 2: Traffic analysis in anonymous communications
**Who talks to whom?**
Communicating with a particular person or entity can reveal commercial intentions when two companies start contacting each other, sexual orientation when contacting people from a specific community, or that instructions are being delivered in a military context.
**Who talks where?**
The location where communications take place can reveal the medical status of a patient visiting a clinic, political affiliation when a person visits the headquarters of a given party, or the movement of troops in a war scenario.
**Who talks how much?**
Frequent communications can reveal that two people are involved in a relationship, or imminent actions or planning in a commercial or military context.
**Who does not talk?** The absence of communication can be as informative as its presence. It can reveal the completion of a plan, or a lack of activity.
#### Intersection attacks
> One of the main building blocks for high-latency anonymous communications is the mix. Introduced by Chaum in 1981, a mix is a router that hides the links between its inputs and its outputs by altering the appearance and timing of the messages. Mixes are suitable to build message-based systems, e.g., email, that do not have latency restrictions.
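A rough sketch of the intersection idea (all names and round data below are illustrative): a persistent contact of the target must appear among the recipients of every round in which the target sends, so intersecting those recipient sets shrinks the candidate set over time.
```python
from functools import reduce

# Recipient anonymity sets observed at the mix output, restricted to
# the rounds in which the target (Alice) sent a message.
rounds_with_alice = [
    {"Bob", "Carol", "Dave"},
    {"Bob", "Eve", "Frank"},
    {"Bob", "Carol", "Grace"},
]

# Alice's persistent contact must appear in every such round.
candidates = reduce(set.intersection, rounds_with_alice)
print(candidates)  # {'Bob'}
```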
#### Traffic confirmation attacks
> The attacks described in the previous section are effective when users communicate repeatedly with their contacts. Nevertheless, a passive adversary can also trace communications when they are sporadic. An adversary with access to traffic flows can exploit the traffic shape to match incoming and outgoing streams of packets in what is called a traffic confirmation attack. These attacks can be applied to both high-latency and low-latency anonymous communication systems, and do not require the adversary to observe all communications (i.e., the adversary can be global or partial).
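A toy illustration of the matching step, assuming the adversary has already binned each flow into packets-per-window counts (the traces below are synthetic):
```python
import numpy as np

def flow_correlation(in_counts, out_counts):
    """Pearson correlation of per-window packet counts; a value close
    to 1 suggests the two flows belong to the same connection."""
    return np.corrcoef(in_counts, out_counts)[0, 1]

incoming  = np.array([5, 0, 3, 7, 1, 4])                # entry-side flow
matching  = incoming + np.random.randint(0, 2, size=6)  # same flow + jitter
unrelated = np.random.randint(0, 8, size=6)             # some other flow

print(flow_correlation(incoming, matching))    # close to 1
print(flow_correlation(incoming, unrelated))   # typically much lower
```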
##### Web fingerprinting attacks
> One of the simplest attacks a partial attacker can deploy on a low-latency anonymity system is called the web site fingerprinting attack. The attack consists of two phases. First a training phase in which the adversary creates fingerprints for a number of sites and stores them. This fingerprint represents the web page in terms of packet sizes, timings, counts, etc. The second is a testing phase in which the attacker monitors the encrypted traffic of the user and tries to correlate its shape with the “templates” in her database. For this purpose the attacker only needs to have access to the victim’s traffic, and to have knowledge of the identity of the victim.
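A minimal nearest-neighbour rendering of the two phases, with made-up feature vectors standing in for the fingerprints:
```python
import numpy as np

# Training phase: hypothetical fingerprints, e.g. packet-size histograms.
templates = {
    "site-a": np.array([12, 3, 40, 5]),
    "site-b": np.array([2, 30, 8, 25]),
}

def classify(trace):
    """Testing phase: match the shape of the observed (encrypted) trace
    against the stored templates by Euclidean distance."""
    return min(templates, key=lambda s: np.linalg.norm(templates[s] - trace))

observed = np.array([11, 4, 38, 6])   # features of the victim's traffic
print(classify(observed))             # 'site-a'
```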
##### Routing constraints-based attacks
>Anonymity systems impose route constraints on the paths chosen by users. For instance, given the underlying anonymity protocol some attributes of paths are fixed (e.g., in Tor paths are composed by three distinct relays, and the first relay must be of a special type); or given the node discovery algorithm users have a partial view of the network, hence their paths can only be formed by a subset of all the nodes.
##### Watermarking attacks
> Here the adversary alters some characteristic of a target flow (normally packet timings) in a known fashion, hence introducing a “watermark.” The adversary then searches for this watermark in other flows. Flows containing the watermark are considered to be the same. The most popular are interval based watermarks, which modify the inter-packet delays to mark the flows.
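A toy sketch of an interval-based watermark, not any specific published scheme; the bump size and marked positions are illustrative.
```python
import random

BUMP = 0.05  # seconds added to the inter-packet delay at marked positions

def embed(delays, mark_positions):
    """Enlarge the inter-packet delay at the chosen positions."""
    return [d + BUMP if i in mark_positions else d
            for i, d in enumerate(delays)]

def detect(delays, mark_positions):
    """Flag a flow whose delays at the marked positions all stand out
    against the flow's median delay."""
    median = sorted(delays)[len(delays) // 2]
    return all(delays[i] > median + BUMP / 2 for i in mark_positions)

flow = [random.uniform(0.01, 0.03) for _ in range(20)]
marked = embed(flow, {3, 8, 15})
print(detect(marked, {3, 8, 15}))  # True: watermark found
print(detect(flow,   {3, 8, 15}))  # False: unmarked flow
```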
##### Clogging attacks
>Clogging attacks are similar to watermarking attacks, in the sense that the adversary manipulates the shape of the traffic to be able to re-identify flows of packets. Clogging permits a partial adversary to find the route followed by a target stream of messages in low-latency communication systems. In the simplest version of this attack an adversary sequentially clogs nodes suspected to be in the victim’s path, and looks for the corresponding decrease in bandwidth at the client’s connection in order to identify which of the suspects actually belongs to the path.
#### Blending attacks
> The attack targets one message, for which it aims to identify the receiver. When the target message enters the mix, the adversary delays every other incoming message. Simultaneously she generates many messages such that the only message unknown to her in the batch is the target message.
### Chapter 3: Perfect matching disclosure attacks
The cryptographic transformation prevents bitwise linkability: the input and output messages appear different to a passive observer, so that connections between them based on the bits they contain cannot be established. On the other hand, the shuffling of messages prevents linkability based on the timing and order of outgoing messages with respect to the inputs received by the mix.
Mixes provide anonymity at the cost of substantial delays in the communication, and therefore they can only be used for applications that tolerate high latencies, such as anonymous email or e-voting protocols.
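A minimal sketch of the batching-and-shuffling half of this behaviour as a toy threshold mix (the cryptographic transformation of each message is omitted):
```python
import random

class ThresholdMix:
    """Toy threshold mix: collects t messages, then flushes them all at
    once in random order, breaking input/output timing and ordering."""
    def __init__(self, t):
        self.t, self.pool = t, []

    def receive(self, msg):
        self.pool.append(msg)
        if len(self.pool) == self.t:
            batch, self.pool = self.pool, []
            random.shuffle(batch)   # destroy the arrival order
            return batch            # flush as one indistinguishable batch
        return None                 # keep batching: nothing leaves yet

mix = ThresholdMix(t=3)
outputs = [mix.receive(m) for m in ["m1", "m2", "m3"]]
print(outputs[-1])  # the three messages in a random order
```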
### Chapter 4: Bayesian inference to de-anonymize persistent communications
Besides their flexibility, Bayesian techniques output reliable error estimates, allowing the adversary to evaluate the confidence she can place in the results obtained. For instance, the algorithm can point at Charlie as Alice's most likely receiver; but it matters greatly whether the adversary is 100% certain of this assignment or only 50% certain. The ability to obtain accurate error estimates allows the attacker to judge the quality of her inferences when operating in the wild.
##### Bayesian inference and Monte Carlo methods
> Bayesian inference is a branch of statistics with applications to machine learning and estimation.
Despite their power, Bayesian techniques come at a considerable computational cost. It is often not possible to compute or characterize directly the distribution due to its complexity. In those cases, sampling-based methods are available to extract some of its characteristics.
For this purpose, Markov chain Monte Carlo methods have been proposed. These are stochastic techniques that perform a long random walk on a state space representing the hidden information, using specially crafted transition probabilities that make the walk converge to the target stationary distribution. Once the Markov Chain has been built, samples of the hidden states of the system can be obtained by taking the current state of the simulation after a certain number of iterations.
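A compact random-walk Metropolis sampler illustrates the mechanics; the standard-normal target is only an example stand-in for the (much more complex) hidden-state distributions of interest.
```python
import math, random

def metropolis(log_target, x0, steps, scale=1.0):
    """Random-walk Metropolis: propose a local move and accept it with
    probability min(1, target(x')/target(x)), so the chain converges to
    the target as its stationary distribution."""
    x, samples = x0, []
    for _ in range(steps):
        proposal = x + random.gauss(0, scale)
        if math.log(random.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples

# Target: standard normal, via its log-density up to a constant.
chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, steps=50_000)
kept = chain[10_000:]         # discard burn-in before estimating
print(sum(kept) / len(kept))  # close to 0.0, the target mean
```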
##### Gibbs sampler
>The Gibbs sampler is a Markov chain Monte Carlo method to sample from joint distributions that have easy-to-sample marginal distributions. These joint distributions are often the a posteriori distribution resulting from the application of Bayes theorem, and thus Gibbs sampling has been extensively used to solve Bayesian inference problems. The operation of the Gibbs sampler is often referred to as simulation, but we must stress that it is unrelated to simulating the operation of the system under attack.
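A minimal Gibbs sampler for a toy joint distribution, a bivariate normal whose one-dimensional conditionals are easy to sample:
```python
import random

def gibbs_bivariate_normal(rho, steps):
    """Gibbs sampling for a bivariate normal with unit variances and
    correlation rho: alternately draw each coordinate from its exact
    conditional, N(rho * other, 1 - rho**2)."""
    x = y = 0.0
    sd = (1 - rho ** 2) ** 0.5
    samples = []
    for _ in range(steps):
        x = random.gauss(rho * y, sd)
        y = random.gauss(rho * x, sd)
        samples.append((x, y))
    return samples

pts = gibbs_bivariate_normal(rho=0.8, steps=20_000)[2_000:]  # drop burn-in
print(sum(x * y for x, y in pts) / len(pts))  # approaches 0.8, the true E[XY]
```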
##### The Vida general Black-box model for anonymity systems
It is computationally infeasible to exhaustively enumerate the states of this distribution. Hence, to calculate the marginals of interest such as users’ profiles, or likely recipients of specific messages, we have to resort to sampling states from that a posteriori distribution. Sampling directly is very hard (due to the interrelation between the profiles, the matches, and the assignments) hence Markov chain Monte Carlo methods are used.
### Chapter 5: A Bayesian framework for the analysis of anonymous communication systems
Our key contribution is a framework to estimate, with arbitrarily high accuracy, the distributions necessary for computing a wide variety of anonymity metrics for relay-based mix networks.
Our analysis of mix networks incorporates most aspects and attacks previously presented in the literature: constraints on path length, node selection, bridging and fingerprinting attacks, social relations of users, and erratic user behavior. For the first time, all these aspects are brought under a common framework, allowing the adversary to combine them all when analyzing a system.
The Bayesian traffic analysis techniques presented have two key advantages. First, they allow optimal use of all information when drawing conclusions about who is talking to whom. Second, they provide the analyst with an a posteriori probability over all scenarios of interest, whereas previous attacks only provided the most likely candidate solution.
## Part 2 - Design of privacy-preserving systems
### Chapter 6: Location Privacy: an overview
#### Introduction
Even though location-based services have an enormous potential to benefit service providers and users, these advantages come at a cost for the users. Pervasive communication implicitly generates a large amount of sensitive information encoded in the location and timing where and when this communication takes place. The fact that individuals interact with their environment may allow the service provider, or even passive eavesdroppers, to track users' movements. The analysis of location data can expose aspects of users' private lives that may not be apparent at first, and sensitive information can be inferred from it.
#### Anonymizing unlinkable events
An adversary who can trace an individual along several locations may also profile the individual’s behavior over time. A family of solutions to this problem tries to break the linkability of subsequent location samples by changing the identity assigned to them (which can be a one-time or a persistent pseudonym).
#### Adding dummy events
An alternative to anonymization for achieving location privacy is to add fake samples to the location traces. These dummy events, indistinguishable from real actions in the eyes of the adversary, aim to confuse the attacker as to which are the actual movements of the user.
#### Obfuscating events
A third approach to achieve location privacy is to modify the location and/or the timing of events. This adds inaccuracy or imprecision to the adversary's observation, hampering inferences on users' behavior. This can be implemented by adding noise to the actual times and/or locations, or by coarse-graining them.
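A minimal sketch of both steps with illustrative noise and grid parameters (for simplicity it ignores the latitude-dependence of the longitude scale):
```python
import random

def obfuscate(lat, lon, noise_m=200, grid_m=500):
    """Add Gaussian noise to a location sample, then snap it to a coarse
    grid. One degree of latitude is roughly 111 km."""
    to_deg = lambda meters: meters / 111_000
    lat += random.gauss(0, to_deg(noise_m))
    lon += random.gauss(0, to_deg(noise_m))
    step = to_deg(grid_m)
    snap = lambda v: round(v / step) * step
    return snap(lat), snap(lon)

print(obfuscate(50.8798, 4.7005))  # a noisy, coarse-grained sample
```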
#### Hiding events
An alternative is to not only anonymize the location samples but also remove a subset of them before they are transferred to the server in charge of computing statistics.
### Chapter 7: Privacy-friendly pay-as-you-drive applications
#### Introduction
Insurance represents a large fraction of the cost of owning a car. In order to lower costs for both owners and insurers, insurance companies have developed pay-as-you-drive (PAYD) schemes.
#### PriPAYD: Privacy-friendly pay-as-you-drive insurance
* We can distinguish three types of policies, based on how privacy-invasive they are. The first type does not imply any breach of privacy: only the number of kilometers traveled (no location information) is needed to compute the premium, and it is provided only once a year from a fixed location.
* The second type, despite not recording location information, collects data in geographically distributed points, allowing the insurance company to estimate the movements of the vehicle.
* Finally, the last model collects GPS data to track the vehicle's location over time.
#### The security of PriPAYD
The only party that is authorized to access the confidential information is the customer, while the insurance company is only authorized to access the billing information.
Three key security properties are required from the channel that transfers the billing data from the vehicle to the insurance company (a minimal authenticity sketch follows this list):
* Authenticity. Only the OBU can produce billing data that is accepted as genuine by the insurer or any other third party.
* Confidentiality. Only the insurer and the car owner should be able to read the billing data transmitted.
* Privacy. The customer should be able to verify that only the billing data is sent to the insurer.
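As a sketch of how the authenticity property might be met, the OBU can sign each billing record before transmission. The snippet below uses Ed25519 signatures from the PyNaCl library as a stand-in; the thesis does not prescribe this particular library or scheme.
```python
from nacl.signing import SigningKey
from nacl.exceptions import BadSignatureError

# Key pair provisioned in the OBU at manufacture (illustrative).
obu_key = SigningKey.generate()
verify_key = obu_key.verify_key        # given to insurer / third parties

billing_record = b"period=2011-03;fee_cents=4250"
signed = obu_key.sign(billing_record)  # only the OBU can produce this

try:
    verify_key.verify(signed)          # insurer checks authenticity
    print("billing data accepted as genuine")
except BadSignatureError:
    print("billing data rejected")
```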
#### PrETP: privacy-preserving Electronic Toll Pricing
In this section we present PrETP, a system based on the same design principles as PriPAYD that uses cryptographic commitments to prevent fraud. We propose a protocol, Optimistic Payment (OP), that makes use of homomorphic commitments, which allow the OBU to prove remotely to the service provider that it carries out the correct computations, thus relaxing the tamper-resistance requirements.
* Vehicles with inactive OBUs. Drivers should not be able to shut down their OBUs at will to pretend that they drove less.
* OBUs reporting false GPS location data. Drivers should not be able to spoof the GPS signal and simulate a cheaper route than the actual roads on which they are driving.
* OBUs using incorrect road prices. Drivers should not be able to assign arbitrary prices to the roads on which they are driving, to lower the final fee. If the policy assigns a price p to a road, drivers cannot use a price p′ < p for this road.
* OBUs reporting false final fees. Drivers should not be able to report an arbitrary fee, but only the result of the correct calculations in the OBU. If at the end of the tax period the final fee corresponding to the GPS location data collected by the OBU is fee, a driver cannot claim that she must pay fee′ < fee.
#### Optimistic Payment
Technical preliminaries
* Signature Schemes. A signature scheme consists of the algorithms SigKeygen, SigSign and SigVerify.
* Commitment schemes. A non-interactive commitment scheme consists of the algorithms ComSetup, Commit and Open (a toy homomorphic example follows this list).
* Proofs of Knowledge. A zero-knowledge proof of knowledge is a two-party protocol between a prover and a verifier. The prover proves to the verifier knowledge of some secret values that fulfill some statement without disclosing the secret values to the verifier.
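As an illustration of the homomorphic property that OP relies on, here is a toy Pedersen-style commitment over a small prime field. The parameters are far too small for real use, and this is not the exact scheme PrETP instantiates; it only shows how the product of per-segment commitments opens to the sum of the committed fees.
```python
import random

p = 2_147_483_647   # small prime modulus (toy; real parameters are huge)
q = p - 1           # exponent group order, by Fermat's little theorem
g, h = 3, 7         # two fixed bases, assumed independent for the toy

def commit(value, rnd):
    """Commit(value; rnd) = g^value * h^rnd mod p."""
    return (pow(g, value, p) * pow(h, rnd, p)) % p

# Commit to two per-segment fees with fresh randomness.
r1, r2 = random.randrange(q), random.randrange(q)
c1, c2 = commit(30, r1), commit(45, r2)

# Homomorphism: multiplying commitments commits to the sum of the fees,
# so the provider can check the total without seeing individual segments.
assert (c1 * c2) % p == commit(75, (r1 + r2) % q)
print("product of commitments opens to the sum of the fees")
```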
### Discussion
#### Technical discussion
Cost
> The additional engineering effort required to build a slightly more complex OBU should be more than balanced by the reduced costs of the back-end systems, since they handle less, as well as less sensitive, data.
(The cost of our prototype amounts to $500; such a number would be drastically reduced in a mass-production scenario.)
Strengthening privacy
> A first concern is the use of GSM to transfer the data back to the service provider. In our scheme the billing data does not contain any sensitive location information, but an active GSM device registered in the network does leak the cells the vehicle is transmitting from. Hence it is prudent to keep the GSM system powered down at all times except when transmitting, and to choose the transmission time and location so as to minimize the location leakage inherent to GSM technology.
Certification and independent monitoring
> A key objective in our design is to reduce the tamper-resistance requirements of the OBU to the minimum needed to guarantee users' privacy. However, as the OBU is commissioned by the service provider, the user has limited capacity to check whether it is functioning correctly or leaking private information. Our design choice is to allow users to have a full view of the output of the OBU and to ensure that only the minimum billing information is transmitted.
### Conclusion
In order to complete our design, we have implemented a prototype OBU and demonstrated that, contrary to common belief, the overhead introduced by privacy enhancing technologies is moderate, and that they are efficient enough to be integrated into commercial in-vehicle devices. Finally, we have verified that the design is compliant with the legal framework in which the application is to be deployed. We recall that there are some fundamental limits to the privacy protection offered by technical means. As a complement to our solution, in the previous section we have outlined a series of non-technical measures to mitigate privacy threats that cannot be addressed using engineering solutions.
> There is no component or infrastructure required by PriPAYD or PrETP that would make them significantly more expensive than their privacy-invasive alternatives, as we demonstrate with our prototype. One could in fact argue that in the long run PriPAYD and PrETP, like any other privacy enhancing technology, are cheaper than privacy-invasive systems. The cost of protecting private data stores is often overlooked in the accounting of costs, as is the risk of a single security breach leaking the location data of millions of customers.