Proof of Assets proposal - v2

# Proof of Assets proposal - v2 ## Version 1 proposal doc The [previous doc](https://hackmd.io/@JI2FtqawSzO-olUw-r48DQ/ByfjBnNBa) spoke about the snark design, limitations, workarounds for limitations, language & proving system to be used, a ZK block-processing protocol, and dev time estimates. Some more information has been gathered since that doc, hence this new doc. This doc will lay out the snark design (copied from version 1) and give an analysis on it. It will also disect the ZK block-processing protocol in more depth. One does not need to read version 1 before reading this doc. ## Protocol overview ### Requirements An exchange would like to give evidence that they own a set of blockchain accounts, without exposing which ones. They would also like to commit to the sum of the balances of these accounts, without revealing said sum. The only way they can show ownership of an account is by providing a digital signature (i.e. they are not able to provide the private keys). More formally.. Let: -- $\mathcal{E}$ be the exchange -- $\mathcal{C}$ be the blockchain -- $A$ be the set of public addresses for $\mathcal{E}$'s accounts on $\mathcal{C}$ *(needs to be kept private)* -- $\alpha$ be the sum of $\mathcal{E}$'s account balances i.e. $\alpha = \sum_{a \in A}{\text{bal}(a)}$ *(needs to be kept private)* -- $C_{\alpha}$ be the Pedersen commitment to $\alpha$ with blinding factor $r$ *(commitment can be made public)* *(the commitment will need to be used in conjunction with the output of the Proof of Liabilities protocol in order to show solvency, which is the ultimate goal)* $\mathcal{E}$ wants to show that they control the private key for each $a \in A$ and that $C_{\alpha}$ is indeed a commitment to $\alpha$. $\mathcal{E}$ is able to produce a set of digital signatures, $S$, s.t. each $s \in S$ corresponds to exactly one $a \in A$. Let $\mathcal{S}$ be the digital signature scheme used. Ideally, $\mathcal{E}$ wants to show its solvency as often as possible (the finest granularity being at every block). ### Solution A ZK-SNARK will be used to a) validate each $s \in S$, and verify that it corresponds to exactly one $a \in A$ b) construct a Pedersen commitment to the sum of balances of $A$ and verify that it is equal to the public value $C_{\alpha}$ $A$, $S$ & $\alpha$ will be kept private by the snark. A snark of this size (see below for size details) cannot be generated for every block, so we use a 2nd protocol to achieve the desired granularity. Once the snark proof has been generated it will be posted on a PBB at time $t$, at which point the 2nd protocol will take over. It will use $A$ to update $C_{\alpha}$ by processing every transaction on $\mathcal{C}$ after $t$, without revealing any information not already revealed by the snark. Apart from perfect granularity, the block-processing protocol also loosens the input requirements for $\mathcal{E}$ since it does not require any digital signatures. ## 1. ZK-SNARK ### Design The method for hiding $\mathcal{E}$'s addresses works like this: have $A$ be a private input signal to the snark, verify inside the snark that $A \subset A^*$ for some public input signal $A^*$ (called the anonymity set), then have the verifier check that indeed $A^*$ is a valid set of addresses for $\mathcal{C}$. This method verifies that $A$ is a valid set of addresses without revealing them to the verifier. The verifier only knows that $A \subset A^*$. To incorporate the summing of balances of $A$ we add as a public input to the snark the set of balances of $A^*$, have the snark add up balances that correspond to $A$, and have the verifier check that the balance input matches those seen on $\mathcal{C}$. This is the basic idea. **Signatures:** we can't just have $A$ as a private input to the snark since $\mathcal{E}$ then has free choice over how to construct $A$ (can add addresses they do not own, for example). This is where the signatures come in. $S$ is given as a private input to the snark, as opposed to $A$, and $A$ is reconstructed from $S$ inside the snark. **Merkle tree:** having $A^*$ as a public input signal has some issues, one of which being that it's size is directly represented in the size of the proof. So large $A^* \implies$ large proof. We get around this issue by constructing [outside the snark] a Merkle tree with elements of $A^*$ as the leaf nodes, and then feed only the root hash as a public input signal to the snark. Of course this then requires $\mathcal{E}$ to produce Merkle proofs for each element of $A$ and feed those as private inputs to the snark, which the snark will then verify with the root hash. We also need to include the balances in the tree if we are to sum those up verifiably in the snark. More formally.. Let: -- $N = |A^*|$ -- $n = |S| = |A|$ -- $\textbf{a}=(a_0, a_1, ..., a_{n-1})$ be the vector containing elements of $A$ s.t. $\forall \space a \in A \space \exists \space i \in \mathbb{Z}_n \text{ s.t. } \space a = a_i \space \wedge \space \forall \space i \in \mathbb{Z}_n \space a_i \in A$ Full workflow: 1. A Merkle tree is generated to encode $A^*$ and its balances. The leaves of the tree are $(a,\text{bal}(a)) : a \in A^*$. Let $\mathcal{M}$ be the tree and let $\mathcal{M}_h$ be the root hash of the tree. 1. $\mathcal{E}$ generates signatures for each of the addresses in $A$ (this is the set $S$). Let $\textbf{s}=(s_0, s_1, ..., s_{n-1})$ s.t. $s_i \in S \space \forall \space i$ and each $s_i$ corresponds to $a_i$ in $\textbf{a}$ 2. $\mathcal{E}$ generates Merkle proofs for each of the address in $A$. Let $\textbf{m}=(m_0, m_1, ..., m_{n-1})$ be the vector containing all the Merkle proofs s.t. the leaf node for $m_i$ is $(a_i, \text{bal}(a_i))$ 3. $\mathcal{E}$ produces a snark proof that verifies the signatures & Merkle proofs ($\textbf{s}$ & $\textbf{m}$ are provided as private inputs to the snark, while $\mathcal{M}_h$ is a public input). 4. The verifier processes the snark proof, checks that $\mathcal{M}_h$ is correctly constructed for $A^*$, and checks that $A^*$ & the associated balances match $\mathcal{C}$. Furthermore: -- Let $\bar{m}$ be the number of Merkle proof parameters i.e. $m_i=(\mu_0, \mu_1, ... ,\mu_{\bar{m}})$ -- Let $\bar{s}$ be the number of signature parameters needed for verification i.e. $\mathcal{S}_{verify}(s)=\mathcal{S}_{verify}(\sigma_0, \sigma_1, ... ,\sigma_{\bar{s}})$ -- Let $u$ be the number of snark field elements needed to store a single element of the base field of the EC group for $C_{\alpha}$. This number will be $>1$ if the size of the base field for the EC group is greater than the size of the snark field. -- Assume the order of the group used for the Pedersen commitmnts is less than that of the base field for the snark, so that we only need 1 snark field element to store the blinding factor $r$. *see [Pedersen commitments section](#Pedersen-commitments) for more info on the last 2 points* snark params: ``` [pvt] m : field[n][m_bar] [pvt] s : field[n][s_bar] [pvt] r : field [pub] c : field[2][u] // <-- 2 because it is an EC point [pub] h : field ``` snark computation: ```rust let sum = 0; let address_prev = 0; for i in 0..n { let (address, balance) = get_leaf_node(m[i]); assert(address > address_prev); address_prev = address; verify_signature(s[i]); verify_association(address, s[i]); verify_merkle_proof(m[i], h); sum += balance; } let com = pedersen_commitment(sum, r); assert(com == c); ``` ### Example snark for Ethereum We are going to set $\mathcal{C}$ to Ethereum, since this seems to be the worst case scenario. "Worst" because ECDSA on secp256k1 & keccak are not SNARK-friendly. It is a similar case for Bitcoin. We have the following: -- $\mathcal{C}=\text{Ethereum}$ -- $\mathcal{S}=\text{ECDSA on secp256k1}$ ECDSA gives us $s=\{\sigma_{\text{public key}}, \sigma_r, \sigma_s, \sigma_{\text{message}}\}$ And Ethereum gives us our `verify_association` check: ```rust fn verify_association(a, s) { let address = keccak256(get_public_key(s)); assert(address == a); } ``` ### Language, libraries & proving system ECDSA verification, keccak256 hashing & Pedersen commitment construction require some preliminary circuit design (sometimes quite a lot). Due to the time constraints it seems wise to opt for a language & proving system that already has libraries for doing these 2 operations. Circom is one such ecosystem. #### Keccak256 The go-to Keccak256 Circom circuit ([keccak256-circom](https://github.com/vocdoni/keccak256-circom)) uses around 150k R1CS constraints per invocation, which is not too bad. #### ECDSA on secp256k1 ECDSA verification is rather expensive because the characteristic of the base field of secp256k1 is larger than [any of the primes](https://github.com/iden3/circom/blob/c822cddce0010356a4a2bd696cbf36a44a1c333e/program_structure/src/utils/constants.rs#L6) for the 6 curves offered by Circom. This means that doing secp256k1 field operations in the snark requires 2 snark field elements, and extra math. ECDSA verification requires EC operations which require field operations, so there is a lot of extra math to do. The ECDSA verification circuit from 0xPARC ([circom-ecdsa](https://github.com/0xPARC/circom-ecdsa)) takes ~1.5M R1CS constraints per invocation (see [their](https://0xparc.org/blog/zk-ecdsa-1) [blogs](https://0xparc.org/blog/zk-ecdsa-2) for more info). 0xPARC has a different circuit ([batch-ecdsa](https://github.com/puma314/batch-ecdsa)) that does batch verification of ECDSA signatures (see [their blog](https://0xparc.org/blog/batch-ecdsa) for more info). Their algorithm does not offer better than $O(n)$ performance but it does offer a ~$3\text{x}$ improvement on the naive solution in both number of constraints and proving time (see [here](https://www.desmos.com/calculator/thzhbu5o69) for exact constraint & timing data). Even with the batch-verify algorithm the [R1CS constraint limit](https://github.com/iden3/snarkjs#guide) of ~$268\text{M}$ means that a max of ~$600$ ECDSA signatures can be verified inside a snark, which puts an upper bound on $n$. If we need our set $A$ to be larger than that then we must use recursive snarks. #### Groth16 verification Circom offers 3 proving systems: groth16, Plonk and FFlonk. 0xPARC have also created circuits ([circom-pairing](https://github.com/yi-sun/circom-pairing/blob/107c316223a08ac577522c54edd81f0fc4c03130/circuits/bn254/groth16.circom)) for verifying groth16 proofs inside a snark (see [blog](https://0xparc.org/blog/groth16-recursion) for more details), so it is easiest to use the groth16 proving system if we want to do recursive proofs. The circuit requires ~$20\text{M}$ constraints for g16 verification, which is quite a lot. Note that the number of public inputs affects the proof size for g16. So if the size of the inputs is large then g16 verification could take more $20\text{M}$ constraints. #### Pedersen commitments The DAPOL library uses the [Ristretto Group](https://ristretto.group) for the [Edwards form](https://en.wikipedia.org/wiki/EdDSA#Ed25519) of [Curve25519](https://en.wikipedia.org/wiki/Curve25519) to produce the Pedersen commitments. This is set by the Bulletproofs library, which only supports this group. This means that we have to use the same group in the snark. Electron Labs created circuits for doing Ed25519 signature verification ([ed25519-circom](https://github.com/Electron-Labs/ed25519-circom)), which has the EC operations for the Edwards form of Curve25519. Since the Ristretto Group is a prime order subgroup of the main EC group, the operations are the same and so this library can be used for our purposes. Note that some implementations of the Curve25519 group do some tricks (such as clearing certain bits) in order to protect against attacks using the Pohlig–Hellman algorithm (see more details [here](https://ristretto.group/why_ristretto.html)). We just need to double check that such tricks are not done in the ed25519-circom circuits. Some care needs to be taken with regard to the size of the base field of the snark and the size of the base field for 25519 & the order of the Ristretto Group. 1. The base field for Curve25519 has size $p=2^{255}-19$ 2. The Ristretto Group has size $q=2^{252} + 27742317777372353535851937790883648493$ $p$ is greater than all the prime numbers offered by Circom, but $q$ is less than all of them. This means we need 2 snark field elements to store a single 25519 field element, and only 1 snark field element to store scalar multipliers of elements of the Ristretto Group. #### Merkle proofs 0xPARC created a circuit for verifying Merkle proofs using the Poseidon hash function ([cabal](https://github.com/0xPARC/cabal/blob/main/circuits/test/merkle.mjs)). If we set $N=10\text{M}$ then our Merkle tree would have height 24, which ends in ~$6000$ constraints for the verification circuit. This is much smaller than any of the other circuit sizes and so is not a limiting factor. #### Library security None of the libraries mentioned above have been audited, and they all warn about using the code in production. Some of the circuits, however, are currently used in production environments, examples: - [Electron Labs](https://electronlabs.org/) uses ed25519-circom (although they are not particularly popular) - [heyanon](https://www.heyanon.xyz/), [Private Market](https://www.privatemarket.dev/), [SealHub](https://hub.sealcred.xyz/) & [zkPoll](https://zkpoll.xyz/) are some live projects that use circom-ecdsa - [Private Market](https://www.privatemarket.dev/) & [DendrETH](https://github.com/metacraft-labs/DendrETH) both use circom-pairing for groth16 verification - circom-ecdsa & [spartan-ecdsa](https://github.com/personaelabs/spartan-ecdsa) (more on this awesome project below!) both use keccak256-circom - [heyanon](https://www.heyanon.xyz/) uses the Merkle proof verifier circuit ### Recursive snark design The idea here is to do have separate snarks for ECDSA verification and keccak hashing + Merkle proof verification. Multiple instances of these snarks can be run in parallel. This will allow us to get around the constraint limit. In the first set of snarks, ECDSA verification will be done. In the second set of snarks, the ECDSA snark proofs will be verified, the keccak hashing will be done, and Merkle proofs will be verified. In the final snark, the 2nd layer snark proofs will be verified, and the total asset sum commitment will be calculated. Let $k \approx \frac{n}{600}$ be the number of snarks to run in parallel, then the following system gives us what we want: ![poa_snarks_3](https://hackmd.io/_uploads/ry6dRll5p.jpg) *view better quality image [here](https://drive.google.com/file/d/1POxZfg4Y2LH8lbGfXvIrAhXP6apQPjYW/view)* $k$ is limited by the number of constraints for g16 verification (~$20\text{M}$), since $k$ snark proofs need to be verified in the final snark. The max value it can take is $13$ (assuming the max number of constraints allowed is ~$268\text{M}$). In practice this number will be lower since the final circuit needs to also construct the Pedersen commitment. If we assume a conservative value of $k=10$ then the system can support ~$6000$ sigantures, which should be plenty for a first attempt (see below for discussion on ways to improve this). ## 2. ZK transaction sync protocol The protocol that will take over after the snark is detailed in this paper: [Instant Zero Knowledge Proof of Reserve](https://eprint.iacr.org/2023/1156.pdf) (IZKPoR). At a high level the protocol takes as input: commitments to $A$ & $\alpha$ from $\mathcal{E}$, block information from $\mathcal{C}$; and outputs a proof of correct balance change for $\alpha$. If any of the accounts in the set are involved in transactions then the balance change is used to update the sum. In more detail.. 1. **Setup phase**: $\mathcal{E}$ publishes commitments to - The asset sum $\alpha$, using a Pedersen commitment: $C_{\alpha}$ - The account set $A$, by first transforming it into a polynomial and then evaluating it at a secret point in KZG style: $\textbf{C}_A$ 3. **Per-block setup for block $\mathcal{B}$**: - $\mathcal{C}$ emits the set of accounts $\hat{A}$ and their balance changes $\Delta\hat{B}$ for $\mathcal{B}$ - $\mathcal{E}$ publishes a Pedersen commitment to the change in $\alpha$ due to transactions in $\mathcal{B}$ involving accounts in $A$: $C_{\Delta \alpha}$ 5. **Per-block ZK proof**: using a ZK version of the [Cached Quotients](https://eprint.iacr.org/2022/1763.pdf) lookup argument ($\mathfrak{cq}$), $\mathcal{E}$ publishes a proof that $C_{\Delta \alpha}$ is constructed correctly 7. The total asset balance of $\mathcal{E}$ is updated by adding $C_{\alpha}$ & $C_{\Delta \alpha}$ ### Analysis of the protocol See [this doc](https://hackmd.io/@JI2FtqawSzO-olUw-r48DQ/BJ8SNhLYa ) for Q&A with the author. There are a few constructions that are not fully fleshed out, and there are no formal proofs for security or zero-knowledge. A 3rd version of the paper is still in the works, which will cover much of the missing information. The protocol has also been implemented in Rust, which shows promise of it's validity. ## Linking the snark & IZKPoR In order to integrate with IZKPoR the final snark will need to build the KZG commitment to $A$ and check that it is indeed equal to the one being used for IZKPoR proofs. Note that we can include addresses with balance of 0 to prepare for inevitable fund transfers to new accounts. Without this functionality the system would only work if funds stayed within $A$. The yellow & green snarks from above will be adjusted like so (with adjustments commented to make them noticeable): ![poa_snarks_3_izkpr_yellow](https://hackmd.io/_uploads/BJgWxGe5p.png) ![poa_snarks_3_izkpr_green](https://hackmd.io/_uploads/rk_O4-gqa.png) It's not clear how expensive the polynomial commitment will be to construct inside a snark (it depends on the field used and if this coincides with the field of the snark), but it will definitely not be cheap. ## Summary #### With reference to the recursive snark system Bounds on the parameters: -- $n \le 6000$ -- $N = 10\text{M}$ Unknowns: -- How many constraints does it take to produce a Pedersen commitment? #### With reference to the recursive snark system + IZKPoR integration Unknowns: -- How many constraints does it take to construct the polynomial commitment? ## Future work & optimizations ### Better ECDSA verification Personae Labs have created 2 libraries for doing faster ZK ECDSA verification for secp256k1. Their first library ([efficient-zk-ecdsa](https://github.com/personaelabs/efficient-zk-ecdsa)) uses a trick first seen on [crypto.stackexchange](https://crypto.stackexchange.com/questions/81089/is-this-a-safe-way-to-prove-the-knowledge-of-an-ecdsa-signature) to move some of the expensive calculations outside the snark. The resulting Circom ciruit produces ~$160\text{k}$ constraints, which is a reduction of ~$10\text{x}$ from the original 0xPARC one. The circuit works for the groth16 proving system. Their second library ([spartan-ecdsa](https://github.com/personaelabs/spartan-ecdsa)) uses, ontop of the tricks from the first library, the Spartan proving system with the secq256k1 curve (a cousin to secp256k1) to allow for right-field arithmetic. Their circuits for ECDSA verification need only ~$8\text{k}$ constraints. This library has even [been audited](https://github.com/nullity00/audits/blob/b4aea5880dd06ca09c62713d836705c136f0d597/Spartan-ecdsa.md). ### Different curve for Pedersen commitments It's possible to change the curve but this would mean changing the Bulletproofs code. There is already a fork of the code for the BLS12-381 curve ([bls_bulletproofs](https://github.com/maidsafe/bls_bulletproofs)) but it is not clear if the repo is complete, and it is definitely not still maintained. Using a different curve for the Pedersen commitments could have the effect of essentially free computations in the snark. E.g. if the BLS12-381 curve is used for groth16 & the paring group $\mathbb{G}_1$ for the same curve is used for the Pedersen commitments then doing $g_1^ag_2^b$ can be done with a single constraint in the snark: `a*g_1 + b*g_2`. ### Different proving systems Nova is a proving system that allows the folding of repetitive computations for better performance. Since the base algorithm for our PoA snark is just a large loop it seems a perfect fit. There has even been done some work to use Nova for batch-verifying ECDSA signatures for secp256k1 ([nova-browser-ecdsa](https://github.com/dmpierre/nova-browser-ecdsa)) and the results are very promising. It's possible that a lot of work would have to be done to move over to Nova because it's not clear how many of the Circom libraries we rely on exitst in that ecosystem, or how difficult it is to migrate them over. [Lurk](https://filecoin.io/blog/posts/introducing-lurk-a-programming-language-for-recursive-zk-snarks/) is a language specifically used for recursive snarks so if still need to do recursion then it may be useful to try this out. In zk.fm (episode 232, ~45 min in) there is discussion about batch-verification of g16 proofs inside a snark. There seems to be a method to improve performance by batching, so if we still need to recursively verify g16 proofs then this is an avenue to explore. ## Plan for end of Feb 2024 ### Already done The following libraries have been tested to work: - circom-ecdsa ([testing results](https://github.com/silversixpence-crypto/zk-proof-of-assets/blob/b4a358d668e7bf133f4a2005a60828fbd3781505/scripts/ecdsa_verification.sh)) - keccak256-circom ([testing results](https://github.com/silversixpence-crypto/zk-proof-of-assets/blob/b4a358d668e7bf133f4a2005a60828fbd3781505/scripts/keccak.sh)) - batch-ecdsa ([testing results](https://github.com/silversixpence-crypto/zk-proof-of-assets/blob/b4a358d668e7bf133f4a2005a60828fbd3781505/scripts/batch_ecdsa.sh)) - circom-pairing ([testing results](https://github.com/silversixpence-crypto/zk-proof-of-assets/blob/b4a358d668e7bf133f4a2005a60828fbd3781505/scripts/groth16_verification.sh)) - Merkle proofs ### Still to do - Test Pedersen commitment circuit - Put all the circuits together and run a full workflow - Read the mathemagic behind efficient-zk-ecdsa & spartan-ecdsa, and integrate with their circuits & proving system ## Post-Feb - Explore different curves for Pedersen commitments - Explore Merkle proofs for set membership - Explore Nova - IZKPoR