# Research Batching for Privacy
## Madars's Blurb for the Website
### Empirical Effectiveness Estimation of Bitcoin Privacy Tools
Bitcoin is the leading public permissionless digital currency. Through the use of a transparent public ledger, Bitcoin achieves strong consistency and security properties, and resistance against double-spending. However, this public ledger also poses challenges for user privacy despite Bitcoin's pseudonymity and various privacy-enhancing tools.
Our research aims to evaluate the real-world privacy capabilities of Bitcoin’s privacy solutions, focusing on mixing techniques like CoinJoin. CoinJoin is a popular Bitcoin privacy solution that combines funds from multiple independent users into a single transaction, thereby obfuscating the traceability of funds from source to destination. In this project we will analyze on-chain data to evaluate the prevalence of specific user behaviors, such as linking change outputs with CoinJoin outputs, and estimate their effect on users' privacy. Additionally, building upon prior fingerprinting work (Misra 2023), we will investigate how distinctive wallet characteristics might further isolate a user’s transactions within a CoinJoin.
## Dan's ongoing research
### Unnecessary Input Heuristics and PayJoin Transactions
Some transactions have more inputs than they need to make a payment. Some are just Bitcoin Core consolidations, but others are Payjoin transactions. How common are these, and do different varieties leave distinguishable fingerprints we should know about?
https://eprints.cs.univie.ac.at/7018/
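A minimal sketch of how an "unnecessary input" check might look, assuming one common formalization: a transaction is flagged if its largest output plus the fee could still be covered after dropping its smallest input. The function name and this exact rule are illustrative assumptions, not necessarily the definition used in the linked paper.

```python
def has_unnecessary_input(input_values, output_values):
    """Flag a transaction whose smallest input was not needed.

    Assumption (hypothetical rule): the payment is the largest output,
    and the fee is whatever the inputs leave over after all outputs.
    Values are in satoshis.
    """
    if len(input_values) < 2 or not output_values:
        return False
    fee = sum(input_values) - sum(output_values)
    remaining = sum(input_values) - min(input_values)
    # If the remaining inputs still cover the largest output and the fee,
    # the smallest input was not required to make the payment.
    return remaining >= max(output_values) + fee


# Two inputs, but the 50,000 sat input alone covers the 40,000 sat output.
flagged = has_unnecessary_input([50_000, 10_000], [40_000, 19_000])
# Both inputs are needed to fund the 50,000 sat output.
not_flagged = has_unnecessary_input([30_000, 30_000], [50_000, 9_000])
```

A flagged transaction is only a candidate: consolidations and Payjoins both trip this check, which is why distinguishing between them is the open question.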
### Payjoin V2 (BIP 77)
An asynchronous two-party Payjoin protocol that breaks the common input heuristic without leaking privacy unnecessarily. It is a store-and-forward protocol combined with Oblivious HTTP for network privacy. It is under active development and review, and has been implemented and specified in some form since before July 2023.
https://github.com/bitcoin/bips/pull/1483
### Anonymous CoinJoin Transactions with Arbitrary Values
This paper conceptualizes CoinJoin privacy as ambiguity of subset sums of "non-derived subtransactions" instead of output denomination ambiguity. This model concerns input-input, input-output, and output-output links. I find this model more powerful since it allows us to reason about payment batching that both saves money and preserves privacy, versus typical mixing transactions that multiply payment overhead by ~3x.
The paper identifies two algorithms to take payment intents and split outputs to produce a privacy-preserving structure. It leaves open the question of how a series of these transactions might or might not preserve privacy in context of the whole transaction graph. It also alludes to a third splitting algorithm.
https://www.comsys.rwth-aachen.de/fileadmin/papers/2017/2017-maurer-trustcom-coinjoin.pdf
code used to generate results in the paper: https://github.com/payjoin/cja
The author ran out of time to publish results of the third algorithm in the cja repo.
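To make the subset-sum view concrete, here is a rough sketch that brute-forces every pair of proper input and output subsets with equal sums. Each such pair is a candidate sub-transaction, and more candidates means more ambiguity. This is a simplification under stated assumptions: fees are ignored, the function name is hypothetical, and it is not the code from the cja repo. The search is exponential, so it is only usable on toy transactions.

```python
from itertools import combinations


def balanced_subsets(inputs, outputs):
    """Return (input_indices, output_indices) pairs with equal sums.

    Only proper, non-empty subsets are considered, so the trivial
    "all inputs pay all outputs" mapping is excluded. Fees are ignored.
    """
    def proper_subsets(n):
        for size in range(1, n):
            yield from combinations(range(n), size)

    # Index output subsets by their sum for quick lookup.
    by_sum = {}
    for outs in proper_subsets(len(outputs)):
        by_sum.setdefault(sum(outputs[i] for i in outs), []).append(outs)

    matches = []
    for ins in proper_subsets(len(inputs)):
        total = sum(inputs[i] for i in ins)
        for outs in by_sum.get(total, []):
            matches.append((ins, outs))
    return matches


# Inputs 6 + 4 and outputs 7 + 3 both sum to 10, so the lone 10 input
# could plausibly have funded either the 10 output or the 7 + 3 pair.
ambiguous = balanced_subsets([10, 6, 4], [10, 7, 3])
# No proper subsets balance here, so no sub-transaction structure exists.
unambiguous = balanced_subsets([10, 6], [9, 7])
```

In the first example the four matches correspond to two competing decompositions, which is the kind of ambiguity the model treats as privacy.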
#### Payjoin V3
I am currently collecting data for use with the third algorithm, which each participant can run in a distributed way. The goal is perhaps to develop a Payjoin V3 protocol that allows arbitrary numbers of participants to batch their payments, **combine payments to the same destination**, and individually split output payments so that privacy can be achieved with cost savings, or at least come closer to break-even than old-school equal-output mixing. The thought is that it can be coordinated with something like [Dicemix](https://eprint.iacr.org/2016/824).
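As a small illustration of the "combine payments to the same destination" step only, the sketch below merges payment intents that share a destination into a single output. The function name, the intent format, and the placeholder addresses are all hypothetical. The output-splitting step and any coordination between participants are not covered here.

```python
from collections import defaultdict


def combine_payments(intents):
    """Merge (destination, amount_sats) intents that share a destination."""
    totals = defaultdict(int)
    for destination, amount_sats in intents:
        totals[destination] += amount_sats
    return dict(totals)


# Three intents from different participants, two to the same destination.
intents = [("addr_A", 30_000), ("addr_B", 20_000), ("addr_A", 15_000)]
outputs = combine_payments(intents)
```

Here three intents collapse into two outputs, which saves one output's worth of block space and also hides how many payers contributed to `addr_A`.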
## Yuval's research question ideas
- empirical / data scienc-y
- coinjoin tx graphs
- whirlpool
- tx0 change and post-mix interactions
- simulation approach to determine critical threshold of non-dojo users needed to deanonymize dojo users assuming adversary can observe xpubs
- wasabi 2
- non-uniformity of coin selection and amount decomposition are privacy leaks. how bad are they?
- electrum protocol privacy
- can private usage of electrum servers be achieved?
- if so, at what cost? (bandwidth, latency, ...?)
- how do existing client implementations fare?
- consequences of celsius court data
- many addresses revealed
- parameterize Kelen-Seres model with these boundary conditions
- perhaps a general model for quantifying the privacy cost of KYC can be done?
- from the point of an individual: how costly is segregation of KYC affected transactions from e.g. cold storage?
- what are the consequences for bitcoin as a whole?
- ethical review seems especially prudent, as these are real individuals and their safety was apparently not really considered when the data was made public
- tor connection isolation
- are bitcoin wallets using tor correctly?
- wasabi misuses tor, resulting in a serious (according to me) privacy leak
- other bitcoin wallets which use tor don't need as many isolated connections
- nostr + bitcoin
- apply Diaz's entropic model to clients & relays
- analyze fingerprints due to e.g. JSON serialization or HTTP client variations
- can nostr be used (safely? easily?) to build bitcoin privacy tools?
- theoretical
- adversarial
- n-k deanonymization attack economics (coinjoin)
- what do marginal costs of sybil deanonymization attacks look like for multiple targets?
- lightning routing privacy
- technically, off chain data is just a transaction graph, same anonymity set modeling should apply, but generalized for local information leaks (adversary can only view published parts of the transaction graph or private information of compromised nodes, c.f. https://pure.mpg.de/rest/items/item_3500837/component/file_3500838/content)
- is it viable to make privacy enhanced routing decisions by keeping track of which nodes observed what information using the entropic anonymity model?
- rigorous analysis of path-like anonymity set
- my bitcoin ATM example is somewhat contrived
- is the entropic interpretation justified under this threat model?
- can this threat model be generalized?
- coinswap privacy
- update Moser et al's work from 2016 (?)
- the model in https://fc24.ifca.ai/preproceedings/146.pdf seems useful for quantifying this
- can this coherently account for teleport-tx style coinswaps https://github.com/citadel-tech/coinswap
- mercury statecoins may have particular fingerprints?
- mathy
- expanders <-> graph differential privacy
- some open questions arising from the synthesis of the Kelen-Seres model with Maurer et al.
- can we quantify the robustness of the entropic anonymity set?
- can mechanisms be designed to improve robustness?
- algebraic study of sub-txn model
- ... hard mode: generalize from decisional version of subset sum to optimization version
- here too some natural connections to expander graphs seem worth exploring
- engineer-y
- incentive compatible light client privacy
- can we build a non-parasitic, partially validating node class?
- a number of related ideas i've had for a while suggest that maybe we can?
- privacy similar to compact block filters approach
- https://drive.google.com/file/d/1YrJPBl70UhV5CP9L32-1GyH1pSFccEKm/view
- randomized forest instead of flat partition
- flyclient
### What Yuval is working on
- somewhat similar to payjoin v3 ideas, but different approach
- distributed coinjoin protocol
- coalition formation phase
- lots of interesting stuff here, mainly game theory
- tl;dr - how can mutually distrusting agents form coalitions where they commit to constructing privacy enhanced transactions together?
- different coalitions (sets of inputs) are more or less amenable to constructing ambiguous transactions
- how to protect privacy in such a protocol/mechanism
- transaction construction phase
- less interesting, most open questions are bikeshedding
- sybil resistance (measurement & mechanism design)
- practical/implementation considerations
- esp. for light clients, rigorous metrics for privacy are hard to compute
- when and how can we approximate conservative estimates?
## Literature Review
### How to Not Get Caught When You Launder Money on Blockchain?
Cuneyt G. Akcora, Sudhanva Purusotham, Yulia R. Gel, Mitchell Krawiec-Thayer, Murat Kantarcioglu 2020
https://arxiv.org/abs/2010.15082
from Madars
> the title is provocative but we should all read - it has very good observations about practical privacy
### Towards Measuring Anonymity (2002)
Claudia Díaz, Stefaan Seys, Joris Claessens, and Bart Preneel
The entropy-based "degree of anonymity" over an anonymity set was defined here!
https://sci-hub.se/https://link.springer.com/chapter/10.1007/3-540-36467-6_5
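The paper's normalized metric, the degree of anonymity d = H(X) / log2(N), can be sketched as follows, where H(X) is the Shannon entropy of the adversary's probability distribution over the N members of the anonymity set. The function name is an assumption for illustration.

```python
import math


def degree_of_anonymity(probabilities):
    """Normalized entropy d = H(X) / log2(N) of an attacker's distribution.

    `probabilities` holds the attacker's probability for each of the N
    candidates and is assumed to sum to 1.
    """
    n = len(probabilities)
    if n <= 1:
        return 0.0  # a set of one provides no anonymity
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return entropy / math.log2(n)


uniform = degree_of_anonymity([0.25, 0.25, 0.25, 0.25])  # maximal: 1.0
skewed = degree_of_anonymity([0.7, 0.1, 0.1, 0.1])       # strictly below 1.0
```

Both sets have four members, but the skewed one is clearly weaker, which is the gap that a raw anonymity set size fails to capture.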
### Anonymity Loves Company (2006)
Roger Dingledine and Nick Mathewson
Case studies in "how usability impacts security" focused on "network effects of usability on privacy and security"
https://www.freehaven.net/anonbib/cache/usability:weis2006.pdf
### Why I'm Not an Entropist (2009)
Paul Syverson
Naval Research Laboratory
TL;DR: the anonymity set is broken as a metric, since not every member of the set is equally vulnerable to deanonymization attacks.
https://www.freehaven.net/anonbib/cache/entropist.pdf
### Towards Measuring the Traceability of Cryptocurrencies (2022, rev 2024)
Domokos Miklós Kelen, István András Seres
> a formal framework to measure the (un)traceability and anonymity of cryptocurrencies, allowing us to quantitatively reason about the mixing characteristics of cryptocurrencies and the privacy-enhancing technologies built on top of them. Our methods apply absorbing Markov chains combined with Shannon entropy
https://arxiv.org/abs/2211.04259
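A toy sketch of the combination the abstract describes: treat coin flows as an absorbing Markov chain, compute where a coin's probability mass ends up, and take the Shannon entropy of that distribution. The state names, transition probabilities, and function names are made up for illustration and this is not the authors' implementation. Absorption probabilities are found by simple forward propagation rather than by solving the fundamental matrix.

```python
import math


def absorption_distribution(transitions, start, tol=1e-12, max_steps=10_000):
    """Probability of ending in each absorbing state, starting from `start`.

    `transitions` maps a state to {next_state: probability}. States with
    no entry (or an empty one) are treated as absorbing.
    """
    mass = {start: 1.0}
    absorbed = {}
    for _ in range(max_steps):
        next_mass = {}
        for state, p in mass.items():
            successors = transitions.get(state)
            if not successors:
                absorbed[state] = absorbed.get(state, 0.0) + p
                continue
            for nxt, q in successors.items():
                next_mass[nxt] = next_mass.get(nxt, 0.0) + p * q
        mass = next_mass
        if sum(mass.values()) < tol:
            break
    return absorbed


def shannon_entropy(distribution):
    """Entropy in bits of a {state: probability} mapping."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


# Hypothetical flow: a coin enters a mix and leaves via one of four
# equally likely outputs.
chain = {
    "in": {"mix": 1.0},
    "mix": {"out1": 0.25, "out2": 0.25, "out3": 0.25, "out4": 0.25},
}
endpoints = absorption_distribution(chain, "in")
bits = shannon_entropy(endpoints)
```

With four equally likely endpoints the entropy is 2 bits. Skewing the transition probabilities, for example to model a change output that is easy to link, lowers it.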
### On the Anonymity of Peer-To-Peer Network Anonymity Schemes Used by Cryptocurrencies (2022)
Piyush Kumar Sharma, Devashish Gosain, Claudia Diaz
Absolutely demolishes privacy assumptions of Lightning, Dandelion, and Dandelion++
https://arxiv.org/pdf/2201.11860v2