# The Nullifier Dilemma

The work on the nullifier dilemma is heavily influenced and encouraged by the work previously done by Aayush with [Plume](https://blog.aayushg.com/posts/nullifier).

### Context

[Summa](https://github.com/summa-dev): an application for proof of solvency.

[*add more context on why proof of assets is even relevant*]

[*specify that this whole scheme is specifically designed with Ethereum applications in mind*]

A private proof of assets needs to be achieved: a Centralized Exchange (CEX) needs to prove that it owns an amount of assets greater than its total liabilities. The privacy part makes it possible to hide the amount of assets and the addresses that the CEX claims to control. For the scope of this article we can simplify this to "a CEX needs to prove that they control an address with balance $\gt 0.1 ETH$ at a specific block without revealing the address".

This can be achieved by snapshotting the state of Ethereum at that specific block and creating a (publicly available) Merkle tree accumulator that contains all the addresses that own $\gt 0.1 ETH$. The root of this Merkle tree will be stored on a blockchain. In order to prove control of an address with more than 0.1 ETH, an actor would prove inclusion in the Merkle tree (identified by the publicly available Merkle root) inside a zkSNARK.

Note: a similar result can be achieved without *manually* creating a Merkle tree accumulator, by using Patricia Merkle Trie state proofs. In this write-up, the proposed nullifier scheme is explained using the "traditional" Merkle tree approach, although the same scheme applies equally when using Patricia Merkle Trie state proofs.

*[Add image of the group inside a merkle tree here]*

This is a common design pattern of zero knowledge proof applications like [Semaphore](https://semaphore.appliedzkp.org/), [Heyanon](https://twitter.com/heyanonxyz) or Tornado Cash (RIP).
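As a concrete toy illustration of the accumulator idea, here is a minimal Merkle tree sketch in Python. SHA-256 stands in for a SNARK-friendly hash such as Poseidon, the addresses are made up, and in the real scheme the inclusion proof would be verified inside the zkSNARK rather than in the clear:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_merkle_tree(leaves):
    """Build a Merkle tree bottom-up; returns the list of levels (leaves first)."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        if len(prev) % 2 == 1:              # pad odd levels by duplicating the last node
            prev = prev + [prev[-1]]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def merkle_proof(levels, index):
    """Sibling path from a leaf up to the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        path.append((level[index ^ 1], index % 2))  # (sibling hash, am-I-the-right-child?)
        index //= 2
    return path

def verify_proof(leaf, path, root):
    node = leaf
    for sibling, is_right in path:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root

# Hypothetical snapshot: leaves are hashes of addresses holding > 0.1 ETH.
addresses = [f"0x{i:040x}".encode() for i in range(8)]
levels = build_merkle_tree([h(a) for a in addresses])
root = levels[-1][0]                         # this root would be published on chain
proof = merkle_proof(levels, 5)
assert verify_proof(h(addresses[5]), proof, root)
```

An actor proving control of `addresses[5]` would carry out exactly this path recomputation inside the circuit, with the root as a public input.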
The idea here is proving that you are part of a certain group **made of Ethereum addresses** without revealing who you are, in order to **signal** your opinion (in the case of Semaphore or Heyanon) or withdraw your money (in the case of Tornado Cash).

The zkSNARK will contain the following computation integrity checks:

- The prover controls an address (achievable by ECDSA signature verification)
- This address is included in the Merkle tree committed on chain with a publicly available **root**

*[Add image of the circuit here]*

*[Specify what is public and what is private]*

This allows the prover to **prove that they are a member of the group without revealing who they are**.

A typical problem when designing cryptographic schemes that involve zk is that the privacy element can be exploited to achieve **double-spending**. For example:

- The same user can double-signal, leading to multiple votes expressed by a single user
- The same user can double-spend the same receipt, leading to multiple withdrawals performed by a single user who made a single deposit

In the case of Proof of Assets, two malicious addresses can double-claim the same balance. For example, two malicious CEXs can cooperate and share the same signature. Since it's all happening in the dark of a zkSNARK, from the outside it's impossible to tell whether they used the same signature (for proving ownership of an address) or different ones.

Enter the **nullifier**, a unique identifier for each member of the group that:

**Property 1**: Uniquely identifies a member of the set. More precisely, the uniqueness relation should be bi-directional: a member of the set, identified by their address, should map to a unique nullifier, and a nullifier should map to a unique address.

**Property 2**: Preserves the privacy of the member, as no information about the member is retrievable starting from the nullifier.
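To see how Property 1 is used in practice, here is a minimal sketch (with hypothetical names) of the deduplication a verifier can perform once every proof exposes its nullifier publicly:

```python
def count_votes(proofs):
    """Count at most one vote per nullifier; later duplicates are discarded."""
    seen = set()
    tally = {}
    for vote, nullifier in proofs:   # each entry = (vote choice, public nullifier output)
        if nullifier in seen:
            continue                 # same member voting again: ignore
        seen.add(nullifier)
        tally[vote] = tally.get(vote, 0) + 1
    return tally

# Two proofs share nullifier "n1": only the first is counted.
assert count_votes([("yes", "n1"), ("no", "n1"), ("no", "n2")]) == {"yes": 1, "no": 1}
```

The verifier never learns *who* is behind `"n1"`; it only learns that both proofs came from the same member.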
In terms of circuit design, a **nullifier** is computed inside a circuit and exposed as a public output. It is called a nullifier because it nullifies an address's ability to perform an action again. Every time the same user generates a proof, the nullifier output will be the same. Considering the previous example, if the same user expresses multiple votes via zkp generation, each proof will contain the same output nullifier. Therefore, the vote counter will be able to tell that these votes are coming from the same member of the set (but, still, without accessing their identity), and only count the first vote cast, while discarding the other ones.

The nullifier, composed with the previous component, allows the prover to **prove that they are a unique and consistent member of the group without revealing who they are**.

### Possible Designs of Nullifiers

##### 1. $H(address)$

H is a generic hash function. This satisfies property 1, but doesn't satisfy property 2, as anyone can easily hash all the public keys in the (publicly available) group by brute force and unmask the member of the group behind the nullifier.

##### 2. $H(address, secret)$

Letting the prover add a secret seems a clever idea. Note that in this case we consider the *secret* to be a random 32-byte string, not the private key of the user. In this way you would achieve property 2, but at the same time we lose property 1. The problem is that the secret has no **connection** with the address, so the same actor can modify the secret used as (private) input and, at every round of proof generation, output a different nullifier.

##### 3. $H(address, secret)$ with $H(secret)$ committed inside the tree

To solve the previous problem, we force the user to commit to $H(secret)$ before generating the proof. In this scenario, the user will compute their *secret* locally and add the commitment $H(secret)$ to the Merkle tree that stores the members of the group, next to their address.
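The weaknesses of designs #1 and #2 can each be demonstrated in a few lines. The sketch below uses SHA-256 as the generic hash $H$ and made-up addresses:

```python
import hashlib, os

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

# Public anonymity set (illustrative addresses).
group = [f"0x{i:040x}".encode() for i in range(1000)]

# Design #1: nullifier = H(address). Observing a nullifier on chain...
nullifier = h(group[42])

# ...anyone can rebuild the full dictionary and unmask the member.
dictionary = {h(addr): addr for addr in group}
assert dictionary[nullifier] == group[42]    # Property 2 broken

# Design #2: nullifier = H(address, secret). Privacy holds, but the same
# member can pick a fresh secret each time and emit a brand-new nullifier.
n1 = h(group[42] + os.urandom(32))
n2 = h(group[42] + os.urandom(32))
assert n1 != n2                              # Property 1 broken
```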
*[Add image of the updated merkle tree group here]*

In this way a **connection** between the *secret* and the *address* is achieved. The zkSNARK will now contain an additional computation integrity check:

- The prover knows the secret such that $H(secret)$ matches the commitment associated to the address they proved they control.

This approach works, but it adds a few more problems:

- It requires an active action from each member of the group to add their commitment to the group.
- It requires each member of the group to store an additional secret. This must be kept secret, as revealing it would reveal the identity of the prover (it would enable a brute force attack similar to design #1).

For the proof of assets example, the anonymity set of the addresses that control more than 0.1 ETH is composed of $10^{14}$ addresses. If we add the requirement for the members to commit to a secret and add the commitment to the tree, the anonymity set gets restricted from $10^{14}$ addresses to the set of those addresses that actually added their commitment $H(secret)$ to the tree. The consequence is that the anonymity set gets highly restricted.

This leads us to two more desired properties of the nullifier:

**Property 3**: Does not require an interactive phase prior to the proving phase where each member of the set needs to create an additional commitment and add it to the anonymity set.

**Property 4**: Does not require the prover to store an additional secret.

##### 4. $H(privateKey)$

Using the hash of a user's private key as nullifier seems the perfect solution. It satisfies properties 1 and 2, as it is inherently connected to the address, and it also satisfies the newly added properties 3 and 4. The main problem with this solution is related to the security (and user experience) of the user generating the proof. The user would need to manually copy and paste their private key and add it as a (private!) input to the circuit, which is something that we want to avoid.
Therefore, we add another desired property for the nullifier.

**Property 5**: Does not require the user to provide their private key as input to the circuit.

[*Is this actually so bad*?]

##### 5. $H(signature)$

This seems a great idea too! The signature is exactly how we prove ownership of an address/public key every day. In fact, even though the signature is a function of the private key, the signature can be generated by the user in their secure enclave outside of the circuit and later used as a private input for the circuit. In this case there's no security threat for the user.

The problem here, as explained in the [0xPARC zk-bug-tracker](https://github.com/0xPARC/zk-bug-tracker#8-0xparc-stealthdrop-nondeterministic-nullifier), is that ECDSA signatures (the ones performed for Ethereum addresses) are not deterministic by nature. A single user can modify the randomness in the signing function while keeping the same message. This can lead to different (valid!) signatures being generated for the same message by the same user. Since a user can generate infinitely many signatures for the same message, they can generate infinitely many nullifiers, defeating property 1 of the system.

A fix for that would be to force the ECDSA signature to be deterministic. In zk-design terms, it would mean proving that the ECDSA signature has been generated using a *fixed* randomness. As described in [Plume's blogpost](https://blog.aayushg.com/posts/nullifier), this solution wouldn't work as a valid nullifier, as it still requires access to the private key, defeating property 5.

##### 6. $H(publicKey)$

This is a less intuitive idea. Why use something public like the public key as the hash preimage? Wouldn't it lead to the same problem as design #1? Surprisingly enough, it's unfeasible to retrieve the public key behind an address **if, and only if, the wallet hasn't made any transaction or signed any message before**.
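The nondeterminism issue can be made concrete with a deliberately minimal, textbook ECDSA implementation over secp256k1 (educational only — not constant-time, not hardened, never to be used in production): signing the same message twice with different nonces yields two different valid signatures, hence two different candidate nullifiers.

```python
import hashlib, secrets

# secp256k1 parameters (toy educational implementation).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(a, b):
    """Affine point addition on y^2 = x^3 + 7; None is the point at infinity."""
    if a is None: return b
    if b is None: return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow(b[0] - a[0], -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)

def ec_mul(k, pt):
    """Double-and-add scalar multiplication."""
    acc = None
    while k:
        if k & 1:
            acc = ec_add(acc, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return acc

def sign(priv, msg, k):
    """Textbook ECDSA with an explicit nonce k."""
    z = int.from_bytes(hashlib.sha256(msg).digest(), "big") % N
    r = ec_mul(k, G)[0] % N
    s = pow(k, -1, N) * (z + r * priv) % N
    return (r, s)

priv = secrets.randbelow(N - 1) + 1
msg = b"I control this address"

# Same key, same message, two different nonces -> two different valid signatures.
sig1 = sign(priv, msg, k=secrets.randbelow(N - 1) + 1)
sig2 = sign(priv, msg, k=secrets.randbelow(N - 1) + 1)
assert sig1 != sig2
# So H(signature) is not a stable identifier for this user.
```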
If the wallet has performed a transaction or signed a message, the public key can be easily retrieved from an address (and the signature) using [ecrecover](https://forum.openzeppelin.com/t/how-to-extract-public-key-for-a-signed-transaction-in-solidity-contract-from-ecrecover-function/19700). This design would accomplish the four desired properties of the protocol only if the user added to the group has never made a transaction or shared any signature with the public. In a generic scenario, this nullifier scheme is very weak since it maintains privacy only until the user performs a transaction or produces a signature. As soon as the user performs a transaction, everyone will be able to retroactively unmask the nullifier.

##### 7. PLUME standard

As extensively described in the Plume [blogpost](https://blog.aayushg.com/posts/nullifier) and [paper](https://aayushg.com/thesis.pdf), deploying verifiably deterministic signatures on ECDSA within wallets makes it possible to achieve all the properties that a nullifier should have. The only issue here pertains to the current availability of this solution. As described in the blog post, these nullifiers can only be produced by installing a [Metamask Snap Extension](https://ethglobal.com/showcase/zk-nullifier-snap-6a9sq). While this solution looks like the way to go in the long term, requiring a Centralized Exchange to move their private keys into an experimental Metamask extension is not feasible today.

##### 8. $H(diffieHellmanSharedSecret, publicKey)$

The solution proposed here builds on top of the idea proposed in design #3 and makes clever use of the Diffie-Hellman Key Exchange (DHKE). In particular, the core idea is to remove the interactivity by generating the secret on behalf of the user using DHKE.

For this scenario, let's consider Alice, identified by her public key $p_{a}$. Alice has a balance of 0.2 ETH and therefore her $p_{a}$ is part of the anonymity set.
On the other side there's Bob, identified by his public key $p_{b}$, a trusted party in charge of maintaining the anonymity set. Bob can generate a shared secret between himself and Alice using DHKE. Since the shared secret can be generated by Bob using only $p_{a}$, no interaction from Alice is needed at this step. Starting from the shared secret, Bob can generate a commitment to it, $H(sharedSecret)$, and map it to Alice inside the anonymity set. Note that this same process needs to be performed for each member of the anonymity set. Luckily, this can all be performed by Bob, without requiring any intervention from the anonymity set members.

If Alice ever wants to privately claim inclusion in the anonymity set, she would need to retrieve the shared secret using her private key and $p_{b}$ (outside of the zkSNARK) and prove, inside the zkSNARK, that:

- She controls $p_{a}$ (achievable by ECDSA signature verification)
- $p_{a}$ is included in the Merkle tree committed on chain with a publicly available **root**
- She knows $sharedSecret$ such that $H(sharedSecret)$ is associated to $p_{a}$ in the Merkle tree committed on chain with a publicly available root
- $H(sharedSecret, p_{a})$ is used as the nullifier

Let's analyze it according to our five desired properties:

| Property | Achieved | How? |
|:--- |:--- |:--- |
| Uniquely identifies a member of the set | Yes | The commitment to the hash of the shared secret is connected to the related public key |
| Preserves the privacy of the member as no information about the member is retrievable starting from the nullifier | No | The trusted party Bob has access to the sharedSecret, so he can brute force the nullifier to unmask it |
| Does not require an interactive phase prior to the proving phase | Yes | The commitment is generated and added to the commitment tree solely by Bob |
| Does not require the prover to store an additional secret | Yes | The shared secret is a function of Alice's private key, so as long as Alice has access to her private key, the shared secret can always be generated on the fly |
| Does not require the user to provide their private key as input to the circuit | Yes | The shared secret is a function of the private key, but the computation of this function is executed outside of the circuit. Only the shared secret is passed as input to the circuit |

Related to the last point, note that leaking the shared secret wouldn't have the same consequences as leaking the private key. Losing the shared secret would, at most, leak the anonymity of the user inside the anonymity set. It's also worth mentioning that the shared secret doesn't reveal any information about the underlying private key, thanks to the Discrete Log Problem.

This scheme looks promising. If only there were a solution that allowed many parties to compute fractions of the $sharedSecret$ without ever having access to the whole $sharedSecret$...

### The Nullifier (proposed) Solution

The proposed solution to the nullifier problem is structured in 2 steps:

1.
**trusted setup**

Each actor is identified by $p$ and $P=G^{p}$, where $p$ is their private key and $P$ is the corresponding public key. Since the implementation is based on Ethereum, consider $G$ to be the generator point of the secp256k1 elliptic curve. $p$ is a random 32-byte string while $P$ is a point on the secp256k1 elliptic curve.

Suppose that the actors Bob and Carl take part in the trusted setup. In particular, their goal is to generate a commitment to Alice's secret. The goals of the scheme are to:

- Generate a secret for Alice to be used as nullifier, without requiring her to interact with the protocol
- Prevent Bob and Carl from accessing the secret, as it would let them unmask Alice's nullifier

Bob and Carl separately generate a shared secret with Alice using Diffie-Hellman Key Exchange: $H_{B}^{A}$ and $H_{C}^{A}$, where the former is the secret shared between Bob and Alice generated by Bob, and the latter is the secret shared between Carl and Alice generated by Carl. Note that each of them can generate the secret requiring only access to Alice's public key $G^{a}$, "adding" their private key to it: $H_{B}^{A}={(G^{a})}^b=G^{ab}$ and $H_{C}^{A}={(G^{a})}^c=G^{ac}$. This step doesn't require any action from Alice.

Bob and Carl separately commit to the previously generated shared secrets, for example by applying a constant scalar to them: $C_{B}^{A}=G^{abk}$ and $C_{C}^{A}=G^{ack}$, where $C_{B}^{A}$ indicates the commitment of Bob to his shared secret with Alice. The constant scalar should be publicly known and the same for every participant of the trusted setup. The commitments are then published by Bob and Carl with a flag that indicates that the commitments relate to Alice: $C_{B}^{A} \Rightarrow A$ and $C_{C}^{A} \Rightarrow A$, where $A$ is Alice's public key.

Note that from the commitments $C_{B}^{A}$ and $C_{C}^{A}$ it is unfeasible to retrieve the underlying secrets $H_{B}^{A}$ and $H_{C}^{A}$, as it would mean breaking the Discrete Log Problem.
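A toy end-to-end sketch of this scheme, including the commitment accumulation and the consistency check Alice performs at proof time: a multiplicative group modulo a prime stands in for secp256k1, so exponentiation replaces scalar multiplication and multiplication of group elements replaces point addition. All parameters and values are illustrative.

```python
import secrets

# Toy group: multiplicative group mod a large prime (stand-in for secp256k1).
P = 2**255 - 19          # a known prime
g = 2                    # generator (illustrative)
k = 7                    # publicly known constant scalar used for commitments

# Private keys of Alice, Bob and Carl.
a = secrets.randbelow(P - 2) + 1
b = secrets.randbelow(P - 2) + 1
c = secrets.randbelow(P - 2) + 1
A = pow(g, a, P)         # Alice's public key; Bob and Carl only ever see this.

# --- Trusted setup (no action from Alice) ---
H_BA = pow(A, b, P)      # Bob's shared secret with Alice:  g^(ab)
H_CA = pow(A, c, P)      # Carl's shared secret with Alice: g^(ac)
C_BA = pow(H_BA, k, P)   # Bob's commitment:  g^(abk)
C_CA = pow(H_CA, k, P)   # Carl's commitment: g^(ack)
accC = C_BA * C_CA % P   # published accumulated commitment for Alice

# --- Proof generation (Alice, using her private key and B and C's pubkeys) ---
B, C = pow(g, b, P), pow(g, c, P)
accH = pow(B, a, P) * pow(C, a, P) % P   # g^(ab) * g^(ac): Alice's secret

# The in-circuit consistency check: applying the public scalar k to the
# secret must reproduce the published accumulated commitment.
assert pow(accH, k, P) == accC
```

Neither Bob nor Carl alone ever holds `accH`, yet Alice can reconstruct it on the fly from her private key, which is exactly the non-interactivity this design is after.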
An accumulator should sum together these individual commitments related to each individual: $accC_{A}= C_{B}^{A} + C_{C}^{A} = G^{abk} + G^{ack}$

2. **proof generation**

Now Alice comes into play because she needs to generate a proof. For example, she needs to cast an anonymous vote by proving that she is a unique voter within a set of eligible voters. Before generating the actual zero knowledge proof she needs to recover her shared secret, starting from the public keys of the participants in the trusted setup by "adding her private key". In particular, $H_{A}^{B}={(G^{b})}^a=G^{ab}$ and $H_{A}^{C}={(G^{c})}^a=G^{ac}$. Furthermore, she would accumulate these together: $accH_{A}=H_{A}^{B} + H_{A}^{C} = G^{ab} + G^{ac}$. $accH_{A}$ is Alice's secret inside this scheme. Note that Alice needs to use her private key to retrieve this shared secret, but this all happens outside the zero knowledge proof.

In order to cast a vote, Alice would need to prove that:

- She is in control of a public key $A$ that is part of the (publicly available) anonymity set, using an ECDSA signature
- She knows the secret behind the commitment $accC_{A}$. This is proven by applying the scalar $k$ to her secret and checking that the output matches the publicly available commitment. In particular: $accH_{A} = G^{ab} + G^{ac}$ and $accH_{A} \cdot k = G^{abk} + G^{ack} = accC_{A}$
- $accC_{A}$ is associated to the public key $A$ that she controls
- $H(accH_{A}, A, appID)$ is the *nullifier*, specific to the application, that prevents double voting

Note: the application ID is needed to prevent Alice from generating the same nullifier across different applications, as that would allow tracking the behaviour of an **unknown** user across different applications.

Benefits of this scheme:

- Non-interactive: the user can decide to engage while their commitment is already public.
- No need to use their private key as an input to the circuit.
On top of that, losing the shared secret won't reveal anything about the private key of the user.

- No need to store an additional secret, as the shared secret can always be recomputed by Alice on the fly starting from her private key.

![](https://hackmd.io/_uploads/SJ8BIlxE3.png)

An alternative commitment scheme would be, for example, taking the x coordinate of the shared secret $G^{ac} = (x_{ac}, y_{ac})$ and committing to its hash $H(x_{ac})$. In this case the whole scheme would look like this:

![](https://hackmd.io/_uploads/SyEB4x-N3.png)

### Non interactivity

I want to focus on how important the non-interactivity feature is for application design. Consider an application that wants to conduct anonymous voting. The census of the voters is the set $[P1, P2, P3, ..., Pi]$. If the application relies on an interactive nullifier scheme, users need to generate a secret and share a commitment to this secret with the group manager, who would associate the commitment to the corresponding public key in the census.

The problem here, common to many voting systems, is that it is very unlikely that **all** the members of the census will commit to their secret in order to cast their vote. In this case the census will look like this: $[P1 \Rightarrow C1, P2 \Rightarrow -, P3 \Rightarrow -, ..., Pi \Rightarrow Ci]$. Since there's no commitment associated to $P2$ and $P3$, this automatically reveals that these two users didn't join the voting. Therefore, the **anonymity set** of the voters is restricted to only the set of users that shared their commitment.

This issue becomes even more relevant when the original anonymity set is large and the usage of the application is only relevant for a fraction of it. For example, in a proof of solvency application, a CEX needs to prove ownership of a pubkey with a balance greater than x, without revealing the pubkey.
In this case the anonymity set is the whole set of Ethereum addresses, while the only parties interested in using the application are the CEXs. In this scenario, it is likely that only the interested parties, the CEXs, will commit to a secret. This practically restricts the anonymity set to a fraction of the original one.

A further issue related to interactive nullifier schemes is that of **timing**. Going back to the example of the voting system, let's consider the scenario where the vote is cast publicly and the commitment is also collected publicly, i.e. on a blockchain. In this scenario, a user will try to delay casting the vote after having submitted the commitment, in order to mix themselves among the other voters. On the contrary, casting a vote right after submitting the commitment signals a correlation between the two actions and therefore probabilistically reveals the identity of the voter. In a non-interactive scenario, both problems are solved!

### Limits of the system and open questions

- Large setup: the size of the setup grows linearly with the number of users included in the original anonymity set.
- The user still needs access to their private key in order to compute the shared secret, although this happens outside of the circuit. How would that be achievable? Don't we need a browser extension similar to what the Plume nullifier uses?
- Proving the correct execution of the commitment by the trusted setup contributors: how can we make sure that the commitment published by a member of the setup actually uses the shared secret with Alice as preimage? And how can we make sure that the scalar applied is actually the constant? Maybe this can be mitigated by attaching a zkProof of correct execution to it.
- Is the commitment scheme safe enough? Ideally the commitment scheme should allow everyone to verify a commitment, similarly to what happens with a Groth16 trusted setup.
- Trusted component: the system is sound as long as at least one contributor to the trusted setup doesn't share their shared secret with the other, colluding, members of the setup. Note that recovering the whole shared key behind the commitment would "only" allow the group to unmask the identity behind a nullifier; it wouldn't allow the group to generate valid proofs.
- How would this be achievable with smart contract wallets?
- Here's a [known bug](https://github.com/0xPARC/zk-bug-tracker#9-a16z-zkdrops-missing-nullifier-range-check) that should be taken into account when designing the system.