## Biometric Uniqueness
Worldcoin’s proof of personhood credential relies on iris biometrics to verify uniqueness among a global set. [Worldcoin settled on the iris](https://worldcoin.org/blog/engineering/humanness-in-the-age-of-ai) as it provides the most [entropy](https://worldcoin.org/blog/worldcoin/about-entropy-why-irises-beat-fingerprints-faces-proving-personhood) and therefore has the highest accuracy. [Iris codes are calculated](https://worldcoin.org/blog/engineering/iris-recognition-inference-system) in a manner similar to the method Daugman described in his [seminal paper](https://www.robots.ox.ac.uk/~az/lectures/est/iris.pdf) on the subject. Iris codes are compact binary representations of the unique pattern found in a person's iris.
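To make the idea concrete, here is a minimal sketch of Daugman-style iris-code extraction, assuming the iris has already been segmented and unwrapped into a polar (radius × angle) image. The filter, parameters, and function names are illustrative and not Worldcoin's actual pipeline; a real system uses 2D Gabor wavelets at multiple scales and positions.

```python
import numpy as np

def iris_code(polar_iris: np.ndarray, n_freqs: int = 4) -> np.ndarray:
    """Quantize the phase of complex filter responses into 2 bits each."""
    rows, cols = polar_iris.shape
    t = np.arange(cols)
    bits = []
    for f in range(1, n_freqs + 1):
        # Complex sinusoid as a simplified stand-in for a 2D Gabor wavelet.
        carrier = np.exp(-2j * np.pi * f * t / cols)
        response = polar_iris @ carrier        # one complex value per row
        bits.append(response.real >= 0)        # phase quadrant, bit 1
        bits.append(response.imag >= 0)        # phase quadrant, bit 2
    return np.concatenate(bits).astype(np.uint8)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Fractional Hamming distance between two iris codes."""
    return np.count_nonzero(a != b) / a.size
```

Two codes from the same iris have a small fractional Hamming distance; codes from different irises cluster around 0.5, which is what makes the comparison in the next section possible.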
Once the iris code is calculated, it is compared against the iris codes of all previously verified users to determine whether the user is unique to the set. This comparison is called the biometric uniqueness service. Today, this service is operated by Tools for Humanity, and there are several ideas for how it could be decentralized.
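A minimal sketch of such a uniqueness check: the new iris code is compared against every enrolled code, and a fractional Hamming distance below a threshold counts as a match (i.e. the person is already enrolled). The threshold value here is illustrative; real systems tune it empirically against false-match and false-non-match rates.

```python
import numpy as np

MATCH_THRESHOLD = 0.34  # illustrative; tuned empirically in practice

def is_unique(new_code: np.ndarray, enrolled: list[np.ndarray]) -> bool:
    """Return True if no enrolled iris code matches the new one."""
    for code in enrolled:
        dist = np.count_nonzero(new_code != code) / new_code.size
        if dist < MATCH_THRESHOLD:
            return False  # collision: this iris is already in the set
    return True
```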
To decentralize the uniqueness service, the iris code would need to be published on-chain, which raises the question of irreversibility. In the almost 30 years since the invention of the Daugman method, no one has managed to reconstruct an image of the eye from an iris code that resembles the original input. The only successful attempts [reversed an iris code to an image](https://www.wired.com/2012/07/reverse-engineering-iris-scans/) that generates the same iris code but looks very different, which does not pose an attack vector in the scope of the Worldcoin project. Even a perfectly reversed image could not be used to game the system, since the Orb includes a liveness check. Additionally, the system is designed such that no personal data is linked to the iris code, so the risk of a reversed image being usable for identification is virtually non-existent. However, since irreversibility cannot be mathematically proven, it remains a blocker for publishing iris codes on-chain and therefore for decentralization. Tools for Humanity is exploring several options to address this.
## Stable Representations
Iris codes computed with the Daugman algorithm are slightly different for different images of the same individual's iris. Finding a path to create stable representations would enable the representation to be hashed and therefore become cryptographically irreversible. Two different ways to achieve this are the following:
**Fuzzy Extractors**
[Fuzzy extractors](https://en.wikipedia.org/wiki/Fuzzy_extractor) can be applied to iris codes and employ error-correcting codes. They exploit the fact that iris codes from the same subject have a smaller Hamming distance than codes from different subjects, so the error between two readings can be corrected and the readings matched. This makes it possible to cryptographically hash the original template and compare only the hashes.
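A toy sketch of the code-offset ("secure sketch") construction underlying fuzzy extractors, using a simple repetition code for error correction. Real deployments would use far stronger codes (e.g. BCH) and proper randomness; this only illustrates how two noisy readings of the same iris can reproduce the same hash.

```python
import hashlib
import numpy as np

REP = 5  # repetition factor; a 5x repetition code corrects 2 flips per block

def enroll(iris_bits: np.ndarray, rng: np.random.Generator):
    """Return (hash of secret key, public helper data)."""
    n_key = iris_bits.size // REP
    key = rng.integers(0, 2, n_key, dtype=np.uint8)  # fresh random key
    codeword = np.repeat(key, REP)                   # encode with repetition
    helper = codeword ^ iris_bits[: n_key * REP]     # code-offset sketch
    return hashlib.sha256(key.tobytes()).digest(), helper

def reproduce(noisy_bits: np.ndarray, helper: np.ndarray) -> bytes:
    """Recover the same key hash from a noisy re-reading of the iris."""
    noisy_codeword = helper ^ noisy_bits[: helper.size]
    blocks = noisy_codeword.reshape(-1, REP)
    key = (blocks.sum(axis=1) > REP // 2).astype(np.uint8)  # majority decode
    return hashlib.sha256(key.tobytes()).digest()
```

Only the hash and the helper data would need to be stored; as the next paragraph explains, this alone is not enough, because the helper data leaks if the iris bits lack sufficient entropy.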
Apart from practical issues (e.g. computational performance at the scale of billions of people), hashing alone is not sufficient to guarantee privacy (i.e. irreversibility). The iris may not provide enough entropy on its own, so the hashes could be reversed to the underlying iris code through a brute-force attack, which would render the whole scheme useless. The entropy required to distinguish among 8 billion individuals is lower than what would be required to prevent brute-force attacks, so external entropy would need to be added. While custody of this entropy can be distributed rather than held by a central party, the risk of leaking it persists. One approach to adding external entropy is through [oblivious pseudorandom functions.](https://en.wikipedia.org/wiki/Pseudorandom_function_family#Oblivious_pseudorandom_functions)
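A toy sketch of an OPRF in a multiplicative group, showing how external entropy (the server's key `k`) can be mixed into a biometric-derived value without the server ever seeing that value. The tiny safe prime and the hash-to-group step are for illustration only and are NOT secure parameters; real protocols use elliptic-curve groups.

```python
import hashlib

P = 1019  # toy safe prime: P = 2*Q + 1 (insecure, illustration only)
Q = 509   # order of the quadratic-residue subgroup

def hash_to_group(data: bytes) -> int:
    """Map input into the order-Q subgroup (squaring yields a QR)."""
    h = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return pow(h % P, 2, P)

def blind(x: bytes, r: int) -> int:
    """Client: hide the input under a random blinding exponent r."""
    return pow(hash_to_group(x), r, P)

def evaluate(blinded: int, k: int) -> int:
    """Server: apply its secret key k without learning the input."""
    return pow(blinded, k, P)

def unblind(evaluated: int, r: int) -> int:
    """Client: strip the blinding; the result equals H(x)^k."""
    r_inv = pow(r, -1, Q)  # inverse of r modulo the group order
    return pow(evaluated, r_inv, P)
```

The client ends up with a pseudorandom value that depends on both the biometric input and the server's key, so brute-forcing the hash additionally requires the externally held key.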
**Neural Network and Locality-Sensitive Hashing**
An alternative to classical iris-code algorithms for establishing uniqueness among a set of people based on the iris is deep learning. In that case a convolutional neural network generates a vector embedding to represent the individual. Even though the network is trained to produce embeddings that are as close as possible for different images of the same individual's iris, the embeddings still differ slightly and therefore cannot be hashed directly. The general classes of [locality-sensitive hash (LSH)](https://en.wikipedia.org/wiki/Locality-sensitive_hashing) functions and [property-preserving hash (PPH)](https://eprint.iacr.org/2022/842) functions are a possible means of creating an irreversible representation: they quantize the space of embeddings and generate a stable representation for any vector within a given subspace, which can then be hashed.
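A minimal sketch of random-hyperplane LSH over such embeddings: each hyperplane contributes one bit (which side of it the vector falls on), nearby vectors therefore land in the same bucket with high probability, and the bucket label rather than the embedding itself is what gets cryptographically hashed. The neural network that would produce the embeddings is assumed, not shown.

```python
import hashlib
import numpy as np

def lsh_bucket(embedding: np.ndarray, hyperplanes: np.ndarray) -> bytes:
    """One bit per hyperplane; the bit vector is then hashed with SHA-256."""
    bits = (hyperplanes @ embedding >= 0).astype(np.uint8)
    return hashlib.sha256(bits.tobytes()).digest()  # stable, opaque label
```

Note that the sign pattern is invariant to positive scaling of the embedding, but an embedding close to a hyperplane can still flip a bit between readings, which is one reason this approach affects accuracy, as discussed below.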
There are three major challenges to this approach that require further research. Firstly, as of today no neural-network-based approach reaches an accuracy similar to iris codes calculated with classical algorithms. While recent advances in face recognition through deep learning indicate that this can likely be done, it is still an area of active research. Secondly, LSH will affect accuracy as well as the computational performance of the uniqueness check; the exact impact can only be measured once such a neural network has been developed. Last but not least, this approach also requires external entropy, similar to fuzzy extractors. Several external teams and researchers are working with the Worldcoin Foundation and Tools for Humanity to identify potential solutions. If you are interested in the topic, feel free to reach out!
### Irreversible Vector Embeddings
[Recent research](https://arxiv.org/abs/2305.05391) has shown promising progress toward making embeddings generated by neural networks empirically irreversible. While this irreversibility is only empirical and not mathematically provable, it is a direction worth exploring that could, in combination with other mechanisms, enable mathematically provable irreversibility.
### Sharding of the Uniqueness Service
Decentralization is not binary: between the uniqueness service run by Tools for Humanity today and a perfectly decentralized system where all data for verification is on-chain, there is a gradient. Publishing all data on-chain requires solving several hard research questions. However, sharding iris codes (the same works for iris embeddings) between different verifiers would provide significantly more resilience, increase privacy, and reduce single points of failure, even though it would not be perfectly transparent and verifiable on-chain. Assume there are N verifiers. For every new user who seeks to verify their uniqueness, each verifier receives an M-th fraction of the iris code, which it compares against its private set of fractional iris codes. The verifiers then exchange the indices and fractional Hamming distances of the iris codes that could be a potential collision depending on the results of the other verifiers. Based on that information it can be determined whether there is a collision.
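The scheme above can be sketched as follows. Each verifier privately stores one shard of every enrolled iris code, computes per-shard Hamming distances for a query, and the distances are combined to decide whether there is a collision. The shard layout, class names, and threshold are illustrative, and the exchange of partial results is simplified to a single aggregation step.

```python
import numpy as np

MATCH_THRESHOLD = 0.34  # illustrative

class Verifier:
    def __init__(self):
        self.shards: list[np.ndarray] = []  # private fractional iris codes

    def enroll(self, shard: np.ndarray) -> None:
        self.shards.append(shard)

    def distances(self, query_shard: np.ndarray) -> list[int]:
        """Hamming distance of the query shard to every enrolled shard."""
        return [int(np.count_nonzero(query_shard != s)) for s in self.shards]

def enroll(code: np.ndarray, verifiers: list[Verifier]) -> None:
    for v, shard in zip(verifiers, np.array_split(code, len(verifiers))):
        v.enroll(shard)

def check_uniqueness(code: np.ndarray, verifiers: list[Verifier]) -> bool:
    shards = np.array_split(code, len(verifiers))
    per_verifier = [v.distances(s) for v, s in zip(verifiers, shards)]
    # Combine the partial distances across shards for each enrolled index.
    for idx in range(len(per_verifier[0]) if per_verifier[0] else 0):
        total = sum(d[idx] for d in per_verifier)
        if total / code.size < MATCH_THRESHOLD:
            return False  # collision found
    return True
```

Because each verifier only ever sees its own fraction of any iris code, no single party can reconstruct a full code, yet the combined distances yield the same decision as a centralized comparison.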
>Fig: Exemplary sharding of the iris code uniqueness comparison among three verifiers. This increases privacy, as nobody knows the full iris code, as well as resilience, since no single verifier has the power to censor the comparison or create fake verifications. Further, single points of failure are reduced: any of the verifiers could go offline and the system would still function, although with reduced security guarantees. The higher the number of verifiers, the stronger the guarantees.
As long as fewer than M out of N verifiers are malicious, they cannot influence the end result, as the discrepancy would be uncovered. The more verifiers there are, the harder collusion becomes, since M verifiers (a number that increases with N) would need to be convinced to act maliciously.