Semaphore, a project developed by PSE at the Ethereum Foundation, exists and it is dope. It is a primitive which allows identity providers to create groups and issue identities within these groups to users. These identities have two important properties:
Non-attribution: A user with a Semaphore identity can sign messages in such a way that a verifier of a message can ensure the signer is in a particular group, but without revealing who in the group signed the message.
Uniqueness: A message can come with a nullifier which ensures that a user cannot sign two distinct messages within the same scope (a.k.a. context).
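To make the uniqueness property concrete, here is a minimal sketch of how a scope-bound nullifier can be checked. It is an illustration only: SHA-256 stands in for the hash Semaphore computes inside its zero-knowledge circuit, the names `nullifier` and `ScopeVerifier` are hypothetical, and the proof that the nullifier was derived honestly from a group member's secret is omitted entirely.

```python
import hashlib


def nullifier(identity_secret: bytes, scope: bytes) -> str:
    """Derive a deterministic tag from a secret and a scope.

    Assumption: SHA-256 is a stand-in for the in-circuit hash.
    """
    h = hashlib.sha256()
    # Length-prefix the scope so different (scope, secret) pairs
    # can never produce the same hash input.
    h.update(len(scope).to_bytes(4, "big"))
    h.update(scope)
    h.update(identity_secret)
    return h.hexdigest()


class ScopeVerifier:
    """Tracks the nullifiers seen in one scope and rejects repeats."""

    def __init__(self, scope: bytes):
        self.scope = scope
        self.seen = set()

    def accept(self, message_nullifier: str) -> bool:
        if message_nullifier in self.seen:
            return False  # same identity already signed in this scope
        self.seen.add(message_nullifier)
        return True
```

The same secret always yields the same nullifier within a scope, so a second message is rejected, while a different scope yields an unrelated value that cannot be linked back to the first.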
Figure 1: Semaphore Overview
These two properties make Semaphore very attractive for private voting, messaging and generally any application which would benefit from private authentication. For example, a provider could administer a "resident of city X" group which could be used by residents to privately vote on the best local restaurant, elect the city council, blow the whistle on corruption in local government, or voice political dissent.
However, Semaphore does not solve other critical challenges identity systems face:
Sybil Attacks: While it provides the uniqueness property mentioned earlier, it does not prevent a user from obtaining multiple identities within a group. Preserving a 1-1 mapping between identities and unique humans is a task left up to the provider.
Trust: Applications which choose to delegate their identity system must trust the provider to not issue themselves identities used to exploit the app. This is less of an issue when a group is administered by a smart contract using on-chain data for registration. But the most valuable identity information is not typically available on-chain.
One approach to ensuring a user is not able to obtain multiple identities is to have the provider require the user to disclose personally identifiable information (PII) to them. The provider then verifies and stores this PII in a private database upon registration. Now for all future registrations the provider can check to make sure a user cannot use the same PII more than once. Great! Now we have some strong assurances that users cannot obtain multiple identities.
But hold on, this approach has some pretty severe drawbacks:
There is a long-standing tradition of incumbent identity providers (governments, financial institutions, social platforms, etc.) distributing access to sensitive PII, whether by accidentally leaving the front door wide open, falling victim to sophisticated attacks, or simply selling it intentionally via data sharing agreements. A lot can be said about their shortcomings but, as mentioned above, replicating PII to a new set of providers is not an improvement. Unless you're in the business of vetting and storing private information and capable of doing so securely: don't.
Alternative approaches of varying effectiveness do exist, such as simply charging a fee per identity or relying on peer attestation networks, to mention a couple. Here we'll focus specifically on identities issued by traditional identity providers.
A large number of sybil resistant identities already exist today. For example, while not perfect, a nation state has a pretty good idea of how many citizens it has. Further, most have assigned unique identifiers to each of their citizens. Some have even gone as far as issuing cryptographic credentials which can already be leveraged by applications.
Governments aren't the only ones in the business of keeping track of identities. Here's a non-exhaustive list of others:
That's great and all, so why aren't we using them? There are a number of reasons why we don't see more applications leveraging these existing identities.
Internal: A lot of the entities which vet and house PII use it for internal purposes and have no interest in being identity providers.
Non-cryptographic: Many (most) identity providers do not issue cryptographic credentials. Instead they provide forgery-resistant physical credentials such as cards and passports, or they operate API-based solutions to which they can monitor and restrict access, such as OAuth, OpenID, or bespoke deployments.
Until recently, there simply were no satisfactory methods for people to utilize these in other applications.
TLSNotary is another open-source protocol developed by PSE. It started its life as an independent project, first conceived in 2013 on a Bitcoin forum. Its purpose is to solve one conceptually simple problem: How can one query a webserver and share the data with another party in a secure way?
Figure 2: Data sharing
The fact that the internet is largely missing such basic functionality is seldom noticed, but it has influenced its current architecture to a degree that is hard to overstate (and worth an article of its own). Perhaps you've once thought there must be a better way whilst going through the motions of forwarding someone a screenshot of a page on a website.
Using some fancy cryptography, TLSNotary addresses this issue while having some interesting properties:
Connecting the dots, TLSNotary can be used to prove any existing identity information on the internet even if the webserver hosting it wasn't designed to be an identity provider. For example, one could log in to a feature-lacking government website and prove their citizen ID to a third party without revealing their login credentials or anything extra.
In combination with Semaphore, it's possible to reuse identity data to bootstrap massive private identity sets. This can enable people to use their existing information to join new systems while preserving both privacy and sybil resistance.
Figure 3: Bootstrapping with TLSNotary
By simply combining TLSNotary and Semaphore we're already able to do some really interesting stuff! However, recall from the introduction that the privacy and integrity of the registration process still rely on the party administering the Semaphore group. The Semaphore provider knows who joins the group and can trivially insert fake identities if they so desire. Adding yet another trusted authority to the mix is not satisfactory. Can we do better?
Figure 4: Patrick Bateman wants better
Fortunately, we can! Both the privacy and trust issues can be addressed in tandem by adding more parties and using something called multi-party computation (MPC).
First, a Semaphore group can be configured such that multiple parties must come to agreement when adding an identifier. With this, a user registers to a group by proving their identity to all parties. Any application which wants to incorporate a group can then decide for itself whether that set of parties is sufficiently decentralized for its needs. To address liveness issues it's also possible to configure thresholds, i.e. requiring only a quorum of the parties, rather than all of them, for registration.
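The quorum idea can be sketched as simple bookkeeping: an identifier only becomes a member once enough distinct parties have approved it. This is a toy model under stated assumptions, not how a Semaphore group is actually administered; `QuorumGroup`, `threshold` and `approve` are hypothetical names, and a real deployment would check each party's signature or on-chain vote rather than trusting a string.

```python
from dataclasses import dataclass, field


@dataclass
class QuorumGroup:
    parties: frozenset   # identities of the registering parties
    threshold: int       # approvals needed before an identifier is added
    approvals: dict = field(default_factory=dict)  # identifier -> approving parties
    members: list = field(default_factory=list)

    def approve(self, party: str, identifier: str) -> bool:
        """Record one approval; return True once the identifier is a member."""
        if party not in self.parties:
            raise ValueError(f"unknown party: {party}")
        if identifier in self.members:
            return True
        votes = self.approvals.setdefault(identifier, set())
        votes.add(party)  # a set, so repeat approvals by one party count once
        if len(votes) >= self.threshold:
            self.members.append(identifier)
            return True
        return False
```

Setting `threshold` equal to the number of parties models unanimous agreement; a lower value trades some integrity for liveness, since registration still works when a few parties are offline.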
Now that we have multiple parties we can address the privacy issue using MPC. A full introduction to MPC is out of scope for this article, but put simply: MPC allows multiple parties to compute some public function on private inputs provided by each party, such that every party only learns the output.
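For intuition, here is about the smallest MPC example there is: several parties compute the sum of their private inputs using additive secret sharing. It is a sketch that assumes honest-but-curious parties and simulates all of them in one process; it is not the registration protocol discussed in this article, and `share` and `secure_sum` are illustrative names.

```python
import secrets

PRIME = 2**61 - 1  # field modulus; must exceed any sum we expect to compute


def share(value: int, n: int) -> list:
    """Split value into n random-looking shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_inputs: list) -> int:
    n = len(private_inputs)
    # Party i splits its input and sends share j to party j.
    distributed = [share(x, n) for x in private_inputs]
    # Party j adds the shares it received. Each share is masked by
    # randomness, so no single party sees another party's input.
    partials = [sum(distributed[i][j] for i in range(n)) % PRIME
                for j in range(n)]
    # Publishing the partial sums reveals the total and nothing more.
    return sum(partials) % PRIME
```

Here the public function is addition and the only thing anyone learns is the total, e.g. `secure_sum([3, 10, 29])` evaluates to 42 without any party revealing its own number.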
Many MPC protocols can compute arbitrary functions with varying levels of efficiency. Fortunately, in this case we only need to do two very simple things:
With those two functionalities we have enough to upgrade the registration process to provide much better privacy assurances.
Figure 5: Private trust minimized registration
Now, during registration, TLSNotary is used to prove a commitment [1] to the identifier, which is subsequently provided as an input to the registration MPC, which outputs a private identifier. This system has several nice properties:
An astute reader may wonder: why not just use the commitment as the private identifier? That would be a good question, and it's true that in some cases that would be sufficient. However, the original identifiers more often than not contain little to no entropy and can be easily recovered by brute force using a lookup table. Additionally, as mentioned in the points above, there is value in hiding the members of the set from the source identity provider.
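The lookup-table concern is easy to demonstrate. The sketch below assumes, purely for illustration, that the commitment is a plain unblinded SHA-256 hash and that identifiers are 5-digit numbers; both are assumptions, and `naive_commitment` is a hypothetical name rather than the commitment scheme TLSNotary uses.

```python
import hashlib


def naive_commitment(identifier: str) -> str:
    # Hypothetical unblinded commitment: just a hash of the identifier.
    return hashlib.sha256(identifier.encode()).hexdigest()


def build_lookup_table(digits: int) -> dict:
    """Hash every possible identifier of the given length."""
    table = {}
    for i in range(10 ** digits):
        candidate = f"{i:0{digits}d}"
        table[naive_commitment(candidate)] = candidate
    return table


# 100,000 candidates: built in well under a second on a laptop.
table = build_lookup_table(5)

# Anyone who sees the commitment can invert it with one dictionary lookup.
leaked = naive_commitment("04217")
recovered = table[leaked]
```

Real identifiers are longer, but the search space of most national or account IDs is still small enough to enumerate, which is why the private identifier should depend on something an outside observer cannot compute alone.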
With tools that exist today, users can be empowered to convert existing and otherwise inaccessible identity information into private Semaphore identities. Instead of duplicating sensitive PII into yet another database, users can use zero-knowledge proofs to prove more general statements about themselves and link them to their private identifiers.
Under the hood, Semaphore groups are essentially just Merkle trees with some extra cryptographic ornaments. The ideas presented in this article can be framed as "merklizing" existing web databases in a way where the users themselves help port each leaf and privately claim it as their own.
Figure 6: Merklizing the Web
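As a rough picture of what a group as a Merkle tree means, the sketch below builds a tree over a list of leaves and verifies a membership path against the root. It makes several simplifying assumptions: SHA-256 instead of the circuit-friendly hash a real Semaphore tree uses, odd levels padded by duplicating the last node, and a verification done in the clear. In Semaphore the equivalent check happens inside a zero-knowledge proof, so the leaf and path are never revealed. All function names are illustrative.

```python
import hashlib


def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def build_levels(leaves: list) -> list:
    """Return every level of the tree: leaves first, root level last."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]  # pad odd levels by duplicating the last node
        levels.append([h(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def prove(levels: list, index: int) -> list:
    """Collect (sibling, node_is_left) pairs from the leaf up to the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return path


def verify(leaf: bytes, path: list, root: bytes) -> bool:
    node = leaf
    for sibling, node_is_left in path:
        node = h(node, sibling) if node_is_left else h(sibling, node)
    return node == root


# Leaves stand in for identity commitments ported from an existing database.
leaves = [hashlib.sha256(f"commitment-{i}".encode()).digest() for i in range(5)]
levels = build_levels(leaves)
root = levels[-1][0]
```

Only the root needs to be published; a member holding their leaf and its path can show inclusion with a number of hashes logarithmic in the size of the group.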
In the most basic case, as described in the Login with Anything section, it's possible for an off-chain application to trustlessly tap into any existing identity system without requiring direct integration. This can simplify building things like private anonymous chat, voting apps, or perhaps new social platforms.
For cases where we want to unlock this for a broader set of applications, such as on the Ethereum blockchain, things begin to resemble an oracle system (and inherit the associated trust challenges).
Figure 7: Shared Identity Sets
Further, users can aggregate their group memberships into composite proofs such as those depicted below. Keep in mind that, thanks to Semaphore, it's possible for applications to limit each to a single use in any given context!
Figure 8: Composite Identity Proofs
With a high-level overview of how we can bootstrap private identity sets in hand, we can now touch on concrete steps which can be taken to realize this.
A pragmatic start would be to admit that the path to the trust minimized version comes with a lot of engineering and incentive challenges while the Login with Anything approach could provide an immediate and low investment solution for off-chain products today. With that in mind, below is a potential plan of attack:
If success indicators are looking good for the above track, the trust minimized effort could be approached: