# Second Discussion with Vint
Dear Vint,
I am grateful for your time and willingness to engage in discussions on how to keep the web open. This email summarizes and articulates my thoughts about our conversation last week.
## Summary / TL;DR
1. Proposes a problem statement that our future discussion aims to address: "The anonymity vs. accountability dilemma in an open web."
> Comment: maybe this is not the right framing.
> Identity is not the same as privacy. The real tension here is among identity, accountability, and anonymity: you can have privacy even if you are identifiable.
> Under what circumstances is it acceptable to be anonymous in an online environment?
> A service doesn't need to know who you are to serve you.
> If you choose to identify yourself to the service, the provider can do a better job serving you.
> Even if there are ways to hold generic parties accountable, that might not be sufficient to undo the harm.
>
> What are the limits of anonymity?
> Who gets to expose the identity behind the anonymity?
> The companies that offer online services are being held responsible for identifying parties who have used the service to cause harm.
> Example 1: someone threatens physical harm to the president; who is responsible for exposing their identity?
> Example 2: someone is a whistleblower; journalists shield the identities of their whistleblowers.
> Look into the jurisprudence around speech.
> We may need more "proof of validity."
> Process of critical thinking.
> Example of a dangerous phenomenon: somebody using a chatbot to take exams, e.g., medical / law.
> Another example of attributing too much validity:
> 1973, Xerox Palo Alto Research.
> We have to learn new cues for accepting "proof of validity."
> "Inserting false statements into a paper full of true statements."
> Action item: look up the paper "How do we know what we know."
> A lot of people assume that if the facts are known, the conclusion will be trivial.
> Belief.
>
2. Uses the real-world example of Wikipedia to illustrate the concepts of "privacy" vs. "accountability."
3. Makes a first attempt to formalize and generalize definitions of privacy and accountability.
4. Offers some thoughts and two possible solutions, along with a brief survey of related academic and industrial prior work.
## Background
### Problem Statement
Allow me to articulate our *problem statement* a little more: "To make the web open (e.g., Freedom of Speech), we need to solve the dilemma of privacy vs. accountability."
### Wikipedia's Privacy vs. Accountability
To put that problem statement into context, let's use the example of Wikipedia, as you participated in and supported the Wikipedians@Google internal group that I initiated while I was at Google.
Wikipedia is an encyclopedia "everyone" can edit. To maximize openness, Wikipedia doesn't require users to register:
> [Wikipedia](https://en.wikipedia.org/wiki/Wikipedia) is a [wiki](https://en.wikipedia.org/wiki/Wiki), meaning anyone can edit nearly any [[1]](https://en.wikipedia.org/wiki/Help:Editing#cite_note-protection-1) page and improve articles immediately. You do not need to register to do this, and anyone who has edited is known as a [Wikipedian](https://en.wikipedia.org/wiki/Wikipedia:Wikipedians) or _editor._ https://en.wikipedia.org/wiki/Help:Editing
#### Anyone can edit Wikipedia, but not the bad actors
However, as Wikipedia becomes more popular, abuse, misinformation, and spam increase. To maintain a certain level of quality and integrity, Wikipedia must mitigate these negative elements, including removing content and banning certain bad actors from editing.
How does Wikipedia "ban" bad actors from editing when "everyone can edit"?
Abstractly, the fact that Wikipedia is banning a subset of "everyone" (human, bot, [animal](https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you%27re_a_dog), or anything) from editing means it's no longer "everyone." Instead, it becomes "everyone but the bad actors." (Using "bad actors" instead of "policy violators" to avoid distraction.)
Before Wikipedia can "ban" bad actors, the first step is to "identify users." Here is Wikipedia's current behavior:
1. If the editor is registered and logs in with a user account, the ban is applied to that account, which can no longer edit.
2. If the editor does not log in and remains anonymous, the ban is applied to the IP address of the unregistered/not-logged-in user.
_Note: We use "login user" and "registered user" interchangeably because, in today's Wikipedia, one needs to register before they can log in, although this may change with public-key authentication in the future._
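To make the two ban paths concrete, here is a toy sketch in Python (hypothetical names throughout; this is not Wikipedia's actual implementation, only the logic described above):

```python
# Toy model of the two ban paths above; not Wikipedia's actual implementation.

banned_accounts: set[str] = set()  # bans keyed by registered username
banned_ips: set[str] = set()       # bans keyed by IP address for anonymous editors

def ban(username: str | None, ip: str) -> None:
    """Ban the account when the editor is logged in; otherwise ban the IP."""
    if username is not None:
        banned_accounts.add(username)
    else:
        banned_ips.add(ip)

def may_edit(username: str | None, ip: str) -> bool:
    """A logged-in editor is checked by account; an anonymous one by IP."""
    if username is not None:
        return username not in banned_accounts
    return ip not in banned_ips

# The shared-IP problem from the next subsection, in two lines: banning one
# anonymous bad actor blocks everyone behind the same NAT address.
ban(None, "203.0.113.7")
assert not may_edit(None, "203.0.113.7")
```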
#### The Challenge of Using IP Addresses to Identify Users
Using IP addresses to identify users (in order to ban them) is an unfortunate consequence of Wikipedia's prioritization of allowing "everyone to edit," even without registration. On the positive side, allowing anonymous users to edit without registration reduces the opportunity for censorship and the chance that a "Wikipedia site master" dictates who can edit, ultimately increasing "web openness." As a compromise, however, the IP address of an anonymous user is recorded and published publicly (https://meta.wikimedia.org/wiki/Unregistered_user), and banning penalties are assessed against IP addresses.
IP addresses, for various reasons, are not the best choice for identifying users. Here are a few reasons why:
1. In countries with abundant IP address resources, typically those that participated early in IP address allocation, an IP address can expose a user's location at a very fine granularity.
2. In countries without abundant IP resources, such as late participants or developing countries, IPv4 addresses are often shared among many users in the same subnet, usually with technologies like [NAT](https://en.wikipedia.org/wiki/Network_address_translation). When a shared IP address is banned, all users using that same IP address are penalized for the bad behavior of one individual.
More importantly, using IP addresses to identify users counteracts the original purpose of preserving editor privacy and reducing censorship.
Additionally, other identity issues, such as sockpuppetry, arise.
#### The Challenge of Preventing Sockpuppetry
Another problem on Wikipedia is known as "[Sockpuppetry](https://en.wikipedia.org/wiki/Wikipedia:Sockpuppetry)", which is more generally referred to as a [Sybil attack](https://en.wikipedia.org/wiki/Sybil_attack) or pseudospoofing in computer security.
In the context of Wikipedia, the goal is to ensure that when a ban is imposed on a malicious user, that user doesn't create another account or switch to a different network environment with a new IP address to continue their harmful behavior.
The challenge of preventing sockpuppetry is a lesser-discussed aspect of "identifying users." Typically, we discuss authenticating users or "proving you are you," but it is also crucial to prevent a user from being associated with another identity when they already have an identity in the system, i.e., deduplication.
> Note: The appropriate term to describe this process is still unclear. If proving $a = a$ is "authentication," what should we call proving $x \notin A$? Perhaps "deduplication" or "deduplithentation"?
### Generalization
Using Wikipedia as a context, we attempt to generalize and define the terms we have described:
1. **Authentication**: Proving that Alice is Alice.
2. **Deduplication** / **Deduplithentation**: Proving that Alice is not any other member in the set.
3. **Privacy**: When Alice presents proof to support a claim, she does not reveal her identity or enable an adversary to distinguish her from anyone else.
However, there is an additional assumption of accountability:
4. **Accountability**: There is a mechanism, when necessary, to prevent Alice from performing certain actions.
> Note: Generalizing these terms can be challenging.
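To make the distinction between these definitions concrete, here is a minimal, non-cryptographic sketch (illustrative names; in a real system each direct comparison would be replaced by a cryptographic proof):

```python
# Plain-set versions of the definitions above, for intuition only.
# A real system replaces each direct comparison with a cryptographic proof.

ids = {"alice", "bob", "carol"}  # S_ids: identities the namespace recognizes

def authenticate(user: str, claimed_id: str) -> bool:
    """Authentication: Alice is Alice, i.e., u = id_x and id_x is in S_ids."""
    return claimed_id in ids and user == claimed_id

def deduplicate(user: str, claimed_id: str) -> bool:
    """Deduplication: Alice is not any *other* member of S_ids."""
    return all(user != other for other in ids if other != claimed_id)

# Privacy is a property of *how* these statements are proven: a zero-knowledge
# proof convinces the verifier without revealing which member "user" is.
assert authenticate("alice", "alice")
assert deduplicate("alice", "alice")
```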
## Solutions
To grant a user permission to edit Wikipedia (or, more generally, to perform a state-changing action on a state machine) while concealing their identity but holding them accountable when needed, we propose two potential solutions:
### Solution 1: ZK Proof for Set Membership
Alice becomes a member of a group. Any member of that group can edit Wikipedia, and accountability is held at the group level. When Alice abuses her editing privileges, either her group is disciplined or the group removes Alice from its membership. For example, only a quorum of group members would be able to reveal Alice's identity.
1. Alice presents a zero-knowledge proof of the claim that she is a member of a small editor group of size $N$.
2. Any group member can edit Wikipedia.
3. When it is necessary to hold Alice accountable, a quorum of $Q$ group members cooperates to reveal Alice's identity, and Alice can then be identified and penalized.
One such set-membership ZKP uses Merkle trees: the $N$ group members together generate a Merkle tree in which each member's leaf is a secret known only to that member, and the hashes of the tree are published. On a normal day, Alice can prove she is a member of the group by presenting a proof for her leaf; when the group needs to hold her accountable, a quorum of $Q = N - 1$ members can publish their own leaves to differentiate themselves from Alice, and thus Alice is revealed.
Refer to other ZKP Set Membership algorithms for optimization in different contexts: https://zkproof.org/2020/02/27/zkp-set-membership/
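As a rough sketch of the Merkle tree construction above, the following shows the plain inclusion proof only; in an actual deployment the verification would run inside a zero-knowledge circuit so that Alice's leaf and path stay hidden, and the quorum-reveal step is simply each member publishing their own leaf:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold the leaf hashes pairwise up to the single published root."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes (with a left/right flag) from one leaf up to the root."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # True: sibling on left
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

# N = 4 members, each holding a secret leaf value known only to themselves.
secrets = [b"alice-secret", b"bob-secret", b"carol-secret", b"dave-secret"]
root = merkle_root(secrets)  # the published tree root
assert verify(secrets[0], merkle_proof(secrets, 0), root)  # Alice's membership
```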
### Solution 2: ZKP to Prove Greater Score in a PageRank-like Trust Score Graph
Assume a set of Wikipedia users in which each user assigns a "trust score" to a small number of other users they know and trust; $S_{trust}(i,j)$ denotes the trust level from user $u_i$ to user $u_j$. Using a PageRank/EdgeRank-like algorithm that takes random walks from the original distribution of trust, we can compute a relative trust score from any user $i$ to any user $j$, and in particular from existing members to Alice.
Alice gains trust in the network by being assigned high enough scores from a few people in the graph ("trustors"). Alice is referred to as a "trustee" in this context.
Alice then provides a zero-knowledge proof to convince others that her score exceeds the minimum trust score required to edit Wikipedia. When Alice does something wrong and needs to be held accountable, some of her "trustors" become aware (by a mechanism to be discussed) that Alice took the bad action and reduce their trust in her. Alice's score then drops under the algorithm, and she can no longer edit Wikipedia.
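A minimal sketch of the scoring side is below (the zero-knowledge "score exceeds threshold" proof is omitted, and the damping factor and threshold are arbitrary illustrative choices):

```python
# PageRank-style trust propagation over S_trust(i, j); illustration only.
import numpy as np

# Row-stochastic trust matrix: S[i, j] is the share of u_i's trust given to u_j.
S = np.array([
    [0.0, 0.7, 0.3],  # u0 trusts u1 heavily and u2 a little
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],  # u2 places all of their trust in u0
])

def trust_scores(S: np.ndarray, damping: float = 0.85, iters: int = 100) -> np.ndarray:
    """Random walk with restart: the stationary trust mass held by each user."""
    n = S.shape[0]
    scores = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(iters):
        scores = (1 - damping) / n + damping * (scores @ S)
    return scores

MIN_EDIT_SCORE = 0.25  # arbitrary threshold required to edit
scores = trust_scores(S)
can_edit = scores > MIN_EDIT_SCORE  # in the proposal, Alice proves this in ZK
print(scores, can_edit)
```

When a trustor lowers a matrix entry pointing at Alice (and renormalizes their row), the random walk delivers less mass to her, so her score can fall below the threshold without any central party banning her.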
> Note: It is still necessary to consider how to prevent Alice from being known by her "trustor" when she edits in the first place. Perhaps she could "stake" some of her trust score in a virtual identity and use that identity to edit; when that identity needs to be penalized, a negative score is imposed upon it, indirectly reducing Alice's trust by the network.
### Other Notes
These are early thoughts on what we could start exploring.
### Bibliography
- [Decentralizing Privacy: Using Blockchain to Protect Personal Data](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7163223)
- [A-PoA: Anonymous Proof of Authorization for Decentralized Identity Management](https://tum-esi.github.io/publications-list/PDF/2021-ICBC-APoA-Anonymous%20Proof%20of%20Authorization%20for%20Decentralized%20Identity%20Management.pdf)
- [Efficient Zero-Knowledge Proof of Algebraic and Non-Algebraic Statements with Applications to Privacy Preserving Credentials](https://eprint.iacr.org/2016/583.pdf)
- [Verifiable Credentials Data Model v1.1](https://www.w3.org/TR/vc-data-model/)
- [TrustChain: A Privacy-Preserving Blockchain with Edge Computing](https://www.hindawi.com/journals/wcmc/2019/2014697/): uses PageRank score for trust
- [A Privacy-enhanced Usage Control Model](https://aran.library.nuigalway.ie/bitstream/handle/10379/1849/thesisFinal.pdf?sequence=1) (thesis, a deep dive into using ZKP for DID)
- [ZkRep: A Privacy-Preserving Scheme for Reputation-Based Blockchain System](https://dr.ntu.edu.sg/bitstream/10356/157156/2/zkRep_A_Privacy-preserving_Scheme_for_Reputation-based_Blockchain_System.pdf)
- [A Framework for Data Privacy Preserving in Supply Chain Management Using Hybrid Meta-Heuristic Algorithm with Ethereum Blockchain Technology](https://www.mdpi.com/2079-9292/12/6/1404/pdf?version=1678939714) (good survey)
### DRAFTS
1. **Authentication**: Proving Alice is Alice. Given a user $u$ and a set of identities $S_{ids}$ recognized by the namespace, confirm that $u = id_x \in S_{ids}$ (checking that the user is who they say they are).
2. **Deduplication** / **Deduplithentation**: Proving Alice is not Bob. Given a user $u$ and a set of identities $S_{ids}$ recognized by the namespace, confirm that $u \neq id_y$ for all $y \neq x$, where $id_x, id_y \in S_{ids}$ (checking that the user is not someone else).
3. **Privacy**: When Alice needs to prove a claim, the $proof$ she provides does not reveal her identity. In other words, an adversary cannot tell Alice apart from any other member of the set with probability better than $1/2 + \mathrm{negl}(k)$.
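One way to phrase this privacy notion as a standard indistinguishability game (a sketch only; the precise game depends on the proof system chosen):

```latex
% Sketch: the adversary picks two candidate members id_0, id_1 of S_ids, the
% challenger has id_b (for a random bit b) produce the proof, and the adversary
% must guess b. Privacy holds when the advantage is negligible in k.
\[
  \mathrm{Adv}^{\mathrm{priv}}_{\mathcal{A}}(k)
  \;=\; \Bigl|\Pr\bigl[\mathcal{A}(\mathit{proof}_{id_b}) = b\bigr] - \tfrac12\Bigr|
  \;\le\; \mathrm{negl}(k),
  \qquad b \xleftarrow{\$} \{0,1\},\; id_0, id_1 \in S_{ids}.
\]
```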