Building Verifiable Computation

# Building Verifiable Computation ![for_distinguished_user](https://hackmd.io/_uploads/H1UjrcnSa.png) ## Introduction **Verifiable computation** is the notion of proving correctness and security between nodes in network systems. Proof-carrying data (PCD) is an approach to achieving verifiable computation through cryptographic mechanisms. PCD has been a topic of research for a little more than a decade — if you're unfamiliar, we suggest checking out the [appendix](https://) before we discuss our experiments in practically building PCD systems. ## Improving PCD Systems An emergent PCD system is [Zupass](https://github.com/proofcarryingdata/zupass), created by the [PCD team](https://pcd.team/) at [0xPARC Foundation](https://0xparc.org/). You may have run into Zupass as the ticketing system for [Zuzalu](https://www.palladiummag.com/2023/10/06/why-i-built-zuzalu/), Zuconnect, and Devconnect. Zupass enables users to produce proofs on their mobile devices every time they enter an event. In some cases, every time they change floors. Beyond ticketing, Zupass serves as an interface for managing private user-controlled data. This itself is a user-centric application on top of the generalizable [PCD SDK framework](https://pcd.team/). The framework intends to provide a set of interfaces for reasoning about and processing PCDs. It also intends to support the various types of underlying PCDs. What do we mean by "various types" of PCDs? There are many options for the type of *proofs* that the data *carries* — [probabilistic proofs like zkSNARKs](https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.pdf), signatures, merkle proofs, hash commitments, keypairs, and beyond. While Zupass can hold many types of PCDs, we observe that the current library of PCD types is limited to simpler proofs like the [Semaphore set membership proof](https://semaphore.pse.dev/docs/V1/introduction). This is because the current implementation uses [Circom](https://github.com/iden3/circom), a novel but slightly arduous domain-specific language for building proofs. We've spent much of our time designing APIs to make proof-writing easier. In particular, we built the [EZKL](https://github.com/zkonduit/ezkl) library for straightforward in-program proof generation. Let's experiment with using EZKL to rewrite the business logic for Zupass. We aim for this to provide foundation for a more extensive set of PCDs available in Zupass. ## Rewriting Set Membership The default PCD in Zupass is set membership, which is a proof that a user is included in a larger group. Traditional set membership associates a given user with a password or leverages an identification system (government IDs, a named concert tickets, membership cards) Though these systems are effective, they reveal fragments of potentially identifying information and as such are by default _not private_ and *less secure*. It's one of the reasons plaintext passwords remain prevalent (except for local applications that can access areas like your secure enclave). One of the first applications of zkSNARKs was the [Sempahore protocol](https://semaphore.pse.dev/docs/V1/howitworks), a set membership system for proving you belong while masking your identity (and more). We can use EZKL to even extend private set membership protocols to encompass a wider range of identity schemes such as vocal and facial recognition. Further, we can build applications that allow for key recovery or private verification of a user’s identity based on biometric data. Some of our users are already working on this. As a first demonstration of the EZKL x PCD combination, we’ve built a simple set membership system using EZKL and Zupass. We built the EZKL set membership circuit in a few lines of Python code, and use Zupass to display the resulting proof — the process doesn’t require any knowledge of Circom, and can be done entirely in Python and Javascript. We hope that this broadens the community of developers and applications that use zero knowledge proofs. ## How it Works ### Set Membership Model with Pytorch ![Untitled-2022-01-06-1700(15)](https://hackmd.io/_uploads/H1GSXCjra.png) With just a few lines of Pytorch and a few lines of EZKL we can build a set membership system. ```python class set membershipModel(nn.Module): def __init__(self): super(set membershipModel, self).__init__() def forward(self, individual, members): diff = individual - members membership_test = torch.prod(diff, dim=1) return (membership_test, members) ``` We define a model that takes two inputs: - The `individual` being checked - The `members` of the set We then subtract the `individual` from each of the `members` (the set difference). This results in a tensor (array) where each element is the different between `individual` and each of the `members`. We then take the product of this tensor to test for the zero product property. That is, the product of the set of numbers is zero if and only if at least one of the numbers in the set is zero. Therefore, if the `individual` is a member of the set, the product will be zero. Otherwise, the product will be non-zero. > Consider a trivial example where `individual=3` and `members=[1,2,3,4]`. The set difference will be `[1-3,2-3,3-3,4-4]=[-2,-1,0,1]`, which implies the product will be zero. ### Zero Knowledge Circuit with EZKL In the EZKL config, we design the circuit such that we hash the `individual` and `members` with [Poseidon (a common hash function)](https://eprint.iacr.org/2019/458.pdf) and fix the output to be `0`. This means that if the hashed `individual` is a member of the set, the product will be nil and the fixed output will be *satisfied*. If the `individual` is not a member of the set, the product will be non-zero and the fixed output will *not be satisfied*. ```python run_args.input_visibility = "hashed/private/0" run_args.output_visibility = "fixed" ``` Semantically, we are claiming that the individual knows or has a pre-image (password), and the hash of that pre-image matches a Poseidon hash that was previously registered to the set as a member. In this way the individual is able to prove they are a member of the set without revealing their identity or the pre-image (password) that hashes to the Poseidon hash. For the full set of python code see [this Colab notebook](https://colab.research.google.com/github/zkonduit/ezkl/blob/main/examples/notebooks/set_membership.ipynb). > **Note 1**: This construction is not the most asymptotically efficient construction! We are essentially using an array, linear search, and copy-on-add instead of a tree with efficient insertion and search. However, for small “human-event-sized” sets and where there is large overhead for hashes inside proofs, it can be faster in practice. > **Note 2**: This scheme is not necessarily collision-resistant and does not represent the most secure verification system. We use it here as an intuitive starting point and an easy-to-follow example. ### PCD Management with Zupass We build the application logic into Zupass via the components defined by the [PCD interface](https://github.com/proofcarryingdata/zupass/blob/0550db156d8a7712bdfae68b1f1135ab214404e9/packages/pcd-types/src/pcd.ts#L11)). The application logic here is simply a registration and login flow, but with PCDs. We first define two new components: - **Secret Store**: the pre-image chosen by the user and set in the system. - **Group Membership PCD**: the proof we display as a sequence of QR codes and can verify. Together, these allow for Zupass-based application where a user can create a secret pre-image and generate the EZKL proof, and a third-party can verify it – all while keeping the set of members up-to-date and private. ### Secret Store The user provides a pre-image (password), adding the new secret PCD to the user’s Zupass interface. Think of this as personal PCD, which someone can prove is theirs. The application hashes the pre-image and then registers the resulting Poseidon hash within the set membership server. Once the server rebuilds the set membership model with the new Poseidon hash set, the user is officially a registered member of the set. This might be done at ticket purchase time, or perhaps after verifying another PCD that establishes the right to be included. ### Group Membership PCD When the user wishes to prove they are a member of the set, the Zupass application calls the `@ezkljs/engine` function `prove()`. This creates an EZKL proof, which we render as an animated sequence of QR codes. This is a neat trick for displaying large PCDs. Here, the proof is too large to display in a single QR code. The dynamism means the proof size does not have an upper bound, so it remains practical to use audio or image data as the secret store. This also prevents the user from using a screenshot of the EZKL proof. ![gif-pcd](https://hackmd.io/_uploads/S1hIQRsr6.gif) ### Proof Scanner In practice, a security guard scans the recently generated QR code with the EZKL proof Scanner. The scanner picks up each QR code _slide_ and stitches them together into a single EZKL proof. If the proof is valid, the user is known to be a member of the set! 🎉 > There's more discussion to be had around making this particular example copy-proof, but we omit that here. ![ezgif.com-crop](https://hackmd.io/_uploads/ryixDp2H6.gif) ## Fin! This experiment is a simple example of how EZKL can be used to build real world prover / verifier systems without having to know a specialized language or framework like Circom, or even interact with a blockchain. All relevant code can be found in the resources section below. Beyond simplifying developer experience, EZKL opens the door to a whole new class of applications that leverage AI. For example, you could build a set membership system that uses a voice recognition model to prove you are a member of a set without revealing your identity in the vein of [cryptoidol](https://cryptoidol.tech/). We’re excited to see what you build with EZKL, Zupass, and the PCD SDK. If you have any questions or comments, please reach out to us on [Discord](https://discord.gg/pU7Qc2ECWb) or [Telegram](https://t.me/+CoTsgokvJBQ2NWQx). We’d love to hear from you! ## Special Thanks Special thanks to the [0xPARC team](https://0xparc.org/) for their work on the PCD SDK and Zupass. Thank you in particular to Gubsheep, Ivan, and Richard for their help in building this demo. ## Resources - [Colab notebook for the set membership PCD](https://colab.research.google.com/github/zkonduit/ezkl/blob/main/examples/notebooks/set_membership.ipynb) - [ezkljs engine](https://github.com/zkonduit/ezkljs) - [Set membership server](https://github.com/zkonduit/set-membership-server) - [Zupass fork with new EZKL PCDs](https://github.com/zkonduit/zupass) ## Appendix ### What is proof-carrying data? Proof-carrying data was [proposed by Alessandro Chiesa in 2010](http://dspace.mit.edu/handle/1721.1/61151) as follows. > *Proof-carrying data* sidesteps the threat of faults and leakage by reasoning about properties of a computation’s output *data*, regardless of the *process* that produced it. > In PCD, the system designer prescribes the *desired properties* of a computation’s *outputs*. > Corresponding proofs are attached to every message flowing through the system, and are mutually verified by the system’s components. Each such proof attests that the message’s data *and all of its history* comply with the prescribed properties. Proof-carrying data is a method where **data itself carries a proof of some computation**. Think of it like a parcel that comes with a certificate ensuring everything inside is exactly as it should be, according to some predefined rules or computations. PCD is an approach to achieving a the broader scheme of **verifiable computation**, which Babai et al. introduced in their [1991 paper](https://dl.acm.org/doi/10.1145/103418.103428) on checking computations. The [Pinocchio team](https://www.microsoft.com/en-us/research/publication/pinocchio-nearly-practical-verifiable-computation/) at Microsoft highlights renewed interest in the [early 2010s](https://www.microsoft.com/en-us/research/project/verifiable-computing/overview/), given the pervasiveness of cloud computing. Verifiable computation is important in situtations where we **outsource computations** to a third-party, potentially untrusted worker and seek to verify the correctness of the returned results. Today, this pervasiveness becomes increasingly noticable as we outsource decision making to remote models or forms of articial intelligence. Network systems already utilize various cryptographic protocols for data transit and handling. Verifiable computation and PCD present a new cryptographic primitive focused on correctness. The trivial approach to verifying the correctness of a computation is replication. If Alice outsources a computation to Bob, she can prove the correctness of his returned results by simply redoing the computation. However, the trivial approach becomes infeasible when the computation is sufficiently difficult (for example, Bob is running a large ML model). The main advantage of verifiable computation is **succinctness**, meaning proving / verifying is much less expensive than redoing the computation. There are many proposed approaches to verifiable computation, and more specifically PCDs. For this piece, we've focused on PCDs in the form of zkSNARK proofs (a type of zero knowledge proof) produced by the EZKL library. However, there's been a number of recent papers proposing PCDs with [split accumulation](https://link.springer.com/chapter/10.1007/978-3-030-84242-0_24), [additive polynomial commitments](https://link.springer.com/chapter/10.1007/978-3-030-84242-0_23), [arithmetized random oracles](https://link.springer.com/chapter/10.1007/978-3-031-30617-4_13), [multi-folding schemes](https://eprint.iacr.org/2023/1282), and more. There are also a few implementations of PCD, notably by the [arkworks team](https://github.com/arkworks-rs/pcd) and the [PCD team](https://github.com/proofcarryingdata/zupass) as in the article. Another core property of PCDs is that they are **transitive**: if you pass a PCD to a third party, they should be able to verify the correctness of the computation without further context. This is a powerful property first highlighted by Alessandro Chiesa in his [2010 paper](http://dspace.mit.edu/handle/1721.1/61151). ![Untitled-2022-01-06-1700(22)](https://hackmd.io/_uploads/ryWnRF2BT.png) > **Note**: The current Internet is secured using public key cryptography protocols like TLS and HTTPS. But these protocols are not _arbitrarily programmable_ in the sense that they do not allow for the verification of the correctness of arbitrary computations. Further, transitivity is missing, which is why we need constructions such as [TLS Notary](https://tlsnotary.org/). ### Isn't this just blockchain technology? The astute reader will note that PCDs sound akin to smart contracts, the programmable substrate that underpins blockchains like Ethereum. Smart contracts and blockchain are one approach to composability, with some advantages (especially around data availability and double spend protection) and some costs (consensus is expensive in terms of time, compute, and security). PCDs built from ZK provide a different tradeoff: no inherent double spend protection, but built-in privacy, and much of the composability and cryptographic functionality provisioned in a more scalable way. So while blockchain makes trustworthy and composable computation possible largely through consensus, ZK and PCD proceed in a kind of orthogonal technical direction, allowing for “trust and composability without consensus.” Recent efforts have focused on building out SDKs and APIs that define common interfaces for the transmission and ingestion of cryptographic data. These interfaces have the potential to expand the scope of verifiable computation systems beyond blockchains to encompass the entire Internet. Diagramatically this looks like the expansion of current verifiable computation systems from a financial center (blockchains) to a larger ecosystem of applications (the rest of the Internet): ![Untitled-2022-01-06-1700(16)](https://hackmd.io/_uploads/SJToQAirp.png) You may notice that many of our [tutorials](https://github.com/zkonduit/ezkl/tree/main/examples/notebooks) so far have focused on building zero knowledge circuits whose proofs can be verified either in-program or with a _verifier contract_ deployed on an EVM-based blockchain for added security. Smart contracts as verifiers offers a few advantages: 1. The verifier contract can be deployed once and used to verify an infinite number of proofs, and as such has a one time deployment cost. 2. Verification does not modify the state of the verifier contract, and as such verification is free (if run locally). 3. They offer a canonical source of truth for the verifier code that can’t be post-hoc modified. However, such verifiers aren’t always best for applications with frequently-changing or large, complex circuits. Blockchain-based verification is useful for peristent security, but **it is not a dependency** for verifiable computation and PCD.