Another great week of the zkhack event has passed, and I'm thrilled to share with you a writeup for another challenge that I (Matan Hamilis) solved together with Elichai Turkel and Shalevos.
We have also written solutions to the first problem and the second problem, go ahead and check them out someday :)
This puzzle, called "Double Trouble", gives us some introductory material about ZK-proofs and Schnorr's Knowledge-of-Discrete-Logarithm Zero-Knowledge Proof-of-Knowledge protocol. If all this means nothing to you, we will get to it right away; if you already know about it, feel free to skip forward.
If you're taking part in the awesome zkhack event you've probably already heard about Zero-Knowledge proofs, at least by name. Just to get things straight, the setting is as follows: we have two entities, a prover (denoted by $P$) and a verifier (denoted by $V$). The prover wants to convince the verifier that a certain statement holds. To do so it sends the verifier pieces of information, to which the verifier is allowed to respond with further "questions", to which the prover should respond in turn. The conversation continues until the point at which the verifier decides to either accept or reject the proof. So far we have discussed proof systems in general and nothing here is zero-knowledge.
Many proofs in life, and especially in mathematics, can be done by supplying an example. For example (did you get the recursion?), we can prove that a number is composite (not prime) by supplying a divisor of that number: 3 is a witness to the fact that 15 is composite. Such "examples" are usually referred to as "witnesses". So let's say we have a number $N$ that we want to prove is composite, but we don't want to hand our witness to the verifier. In fact, we want the verifier to gain zero knowledge about $N$ besides the fact that it is composite.
This is where Zero-Knowledge proofs come to the rescue. If we devise a protocol that lets us convince the verifier that a composite number really is composite without disclosing our witness, this is a Zero-Knowledge Proof (or ZKP in short). It is also important that the verifier rejects false proofs with overwhelming probability.
Let's assume there is a proof system in which we, as verifiers, are convinced that a certain number is indeed composite. How can we show that this proof system is zero-knowledge? Well, if we could "simulate" a proof between a prover and a verifier that looks exactly the same as a regular proof, without knowing the actual secret, this would mean that zero information is leaked in the process of proving, right? Otherwise, we could gain this information by simply simulating the proof! So, a proof protocol is a ZKP if we can create a simulator that produces transcripts identical to those of a real proof taking place.
You may have heard some "heavy" terms like "statistical indistinguishability" and such, but all they do is give a proper and precise mathematical definition of what "identical transcripts" really means.
The fact that someone has just given us a proof that some number is indeed composite doesn't mean that they really know a divisor of that number, right? While it makes sense to us that this is "the only" way to prove that some number is composite, we still can't really know that the prover knows the divisor. In some cases, we don't only want to be convinced that some property holds, but also that the prover actually knows such a witness. These proofs are known as proofs of knowledge (PoK).
We can't argue anything about what exactly the prover holds in its memory, so what does "knowledge" even mean? In cryptography it is common to think of knowing something as "being able to efficiently compute its value". While we can't claim that a prover really holds a divisor of some number in its memory, we can prove that, using the information it provides us, such a divisor can be extracted efficiently! Formally speaking, to prove that a proof system is a PoK of some value $w$, we have to present an extractor of $w$ for that proof system. The extractor is an efficient algorithm that has the magical power to rewind the proof procedure, and using the information it gathers it can derive (again, efficiently) the value of $w$.
For example, an extractor for the knowledge of a divisor of a composite number will be able to run and rewind the proof procedure to gather enough transcripts, and use those transcripts to extract the value of a divisor. This shows us that the prover itself could have extracted this value (since everything the extractor sees is generated by the prover), and therefore we can say that the prover knows it (or is able to know it without too much effort).
As said, many proof systems are interactive. Therefore, the prover and the verifier have multiple rounds of communication in which the verifier asks questions / sends challenges and the prover responds accordingly. To make a proof system non-interactive, so that the proof can be generated offline and verified later (even publicly, for example on a blockchain), we need a way to generate the verifier's challenges in a trustworthy way. This can be done using the Fiat-Shamir transform, which basically says that instead of the verifier sending random challenges, the prover can generate the challenges by itself.
But how is this fair? If the prover chooses the challenges, it can create false proofs! That's correct, and that's why we don't want the prover to be able to control the challenges easily. Each challenge will be the output of a cryptographically secure hash function applied to the whole transcript of the proof so far. Since such functions are modeled as random (under the assumption known as the "random oracle" assumption), we can assume that the challenges the prover gets from the hash function are really "random" and couldn't have been specially crafted to trick the verifier.
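As a small illustration (my own sketch, not code from the challenge), this is the general shape of deriving a challenge from the transcript. The standard library's DefaultHasher is used only as a stand-in for a real cryptographic hash such as SHA-256:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive the verifier's challenge by hashing everything sent so far.
// NOTE: DefaultHasher is NOT cryptographically secure; it only stands in
// for a real hash function (e.g. SHA-256 / BLAKE2) to illustrate the idea.
fn fiat_shamir_challenge(transcript: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    transcript.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // The transcript would contain the encoded statement and the prover's
    // first-round commitments; here a few dummy numbers stand in for it.
    let transcript = [42u64, 7, 1337];
    let gamma = fiat_shamir_challenge(&transcript);
    println!("challenge = {gamma}");
}
```

Because the prover cannot predict or steer the hash output, fixing the challenge this way is (heuristically) as good as having the verifier pick it at random.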
One last thing we should mention is Pedersen commitments, which are used heavily in the challenge. Let's see what they are.
A commitment scheme in general gives the functionality of committing to a specific value while exposing the committed value only later. A commitment scheme has two main "functions": a commit function, which takes the committed value $v$ together with some supplementary (random) data $r$ and outputs a commitment $c = \mathrm{Com}(v; r)$, and a reveal (open) function, in which the committer discloses $(v, r)$ and anyone can check that they indeed match $c$.
The most simple commitment scheme, which relies on the hardness of the discrete-log assumption, uses some group with a generator $G$ and works as follows: to commit to a value $v$, simply publish $\mathrm{Com}(v) = vG$.
Notice that in this simplistic commitment scheme we haven't used the supplementary data at all. This scheme is perfectly binding, meaning that it binds the committer to the value of $v$, without it being able to later claim that the commitment actually opens to another value. However, this scheme is only computationally hiding. That is, given sufficiently large computational power, someone could derive the value of $v$ by breaking the discrete log of the commitment.
The Pedersen commitment, however, is perfectly hiding: it hides the committed value based on information-theoretic arguments, without relying on any hardness assumption (such as discrete log). On the other hand, it is only computationally binding, so that given enough computational resources someone could claim that a commitment to a value $v$ is actually a commitment to some other value $v'$.
The way Pedersen commitments work is by holding two generators $G, H$ of a prime-order group, such that the discrete log between them is unknown, and computing $\mathrm{Com}(v; r) = vG + rH$ for the committed value $v$ and random supplementary data $r$.
This commitment scheme is perfectly hiding because for every commitment $C$ and every candidate value $v$ there exists some randomness $r$ such that $C = vG + rH$.
However, it is only computationally binding, since it is possible, with enough resources, to find the discrete log of $H$ with respect to $G$ and use it to open the commitment to another value.
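To see both properties side by side, here is a tiny self-contained toy. It deliberately works in a group where the discrete log $t$ between the two generators is known (plain residues modulo a prime stand in for group elements), which is exactly the power an unbounded attacker would have against a real Pedersen commitment:

```rust
// Toy Pedersen commitment C = v*G + r*H, written additively. Here the "group" is
// just Z_q with q prime and G, H are plain residues, so the discrete log t between
// them is known. That is precisely the power an unbounded attacker would have, and
// it lets us demonstrate how binding breaks while hiding stays perfect. In a real
// instantiation G and H are curve points and no such t is known.
const Q: i128 = 1_000_003; // toy prime group order

fn modq(x: i128) -> i128 {
    ((x % Q) + Q) % Q
}

// Modular inverse via Fermat's little theorem (Q is prime).
fn inv(x: i128) -> i128 {
    let (mut base, mut exp, mut acc) = (modq(x), Q - 2, 1i128);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = modq(acc * base);
        }
        base = modq(base * base);
        exp >>= 1;
    }
    acc
}

fn commit(g: i128, h: i128, v: i128, r: i128) -> i128 {
    modq(v * g + r * h)
}

fn main() {
    let g = 5i128;
    let t = 777i128;     // the discrete log of H with respect to G: unknown in real life!
    let h = modq(t * g); // H = t*G

    let (v, r) = (42i128, 1234i128);
    let c = commit(g, h, v, r);

    // Perfect hiding: for ANY other value v2 there is randomness r2 giving the very
    // same commitment. Computing that r2, however, requires knowing t.
    let v2 = 9999i128;
    let r2 = modq(r + modq(v - v2) * inv(t));
    assert_eq!(c, commit(g, h, v2, r2));
    println!("opened the same commitment to {v} and to {v2}");
}
```

With $t$ in hand the committer can open the same commitment to any value, which is why binding is only computational; conversely, the existence of such an opening for every value is what makes the commitment perfectly hiding.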
In this puzzle we are using a special variant of the Pedersen commitment used to commit to vectors instead of just scalars, in which we can commit to vectors of some size $n$ by having generators $G_1, \dots, G_n$ and a hiding generator $H$, such that no discrete log between any two generators (or combination of generators) is known. Committing to a vector $\vec{a} = (a_1, \dots, a_n)$ with supplementary data $r$ is done simply by computing $\mathrm{Com}(\vec{a}; r) = a_1 G_1 + \dots + a_n G_n + rH$. Just like the original Pedersen commitment, this scheme is also perfectly hiding and computationally binding. To make notation simpler we will use the inner-product based notation, so that $\mathrm{Com}(\vec{a}; r) = \langle \vec{a}, \vec{G}\rangle + rH$ where $\vec{G} = (G_1, \dots, G_n)$.
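As a quick illustration of the formula (and nothing more), here is the vectorized commitment in the same toy additive group; the real commit_key generators are curve points rather than small integers:

```rust
// Toy vectorized Pedersen commitment: Com(a; r) = a_1*G_1 + ... + a_n*G_n + r*H,
// in an additive group that is really just Z_q with q prime. The real commit_key
// generators are elliptic-curve points; small integers are used here only to show
// the formula itself.
const Q: i128 = 1_000_003;

fn vector_commit(gs: &[i128], h: i128, a: &[i128], r: i128) -> i128 {
    assert_eq!(gs.len(), a.len());
    let mut acc = (r * h) % Q;
    for (g, ai) in gs.iter().zip(a) {
        acc = (acc + g * ai) % Q;
    }
    acc
}

fn main() {
    let gs = [3i128, 5, 7, 11]; // toy generators G_1..G_n
    let h = 13i128;             // toy hiding generator H
    let a = [1i128, 2, 3, 4];   // the committed vector
    let r = 99i128;             // supplementary randomness
    println!("Com(a; r) = {}", vector_commit(&gs, h, &a, r));
}
```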
Great, now that we have enough introductory material, let's try and solve the puzzle!
The puzzle goes as follows: we are given an (allegedly) zero-knowledge protocol for proving that the inner product of some unknown vector $\vec{a}$, to which the prover only commits, with a known public vector $\vec{b}$ equals some value which is committed to by the prover as well. We are also told that Bob, the inventor of the protocol, has implemented his own version of the prover which works very fast, and we are given two examples of such proofs.
Bob's proof system is made non-interactive using the Fiat-Shamir transform we discussed earlier. The underlying interactive protocol is a Schnorr-style protocol over the vectorized Pedersen commitment, and it works roughly as follows: the prover, holding an opening $(\vec{a}, \alpha)$ of the public commitment $C_a = \langle \vec{a}, \vec{G}\rangle + \alpha H$, first commits to a freshly sampled random vector $\vec{r}$ with a random blinder $\rho$ and sends $C_r = \langle \vec{r}, \vec{G}\rangle + \rho H$; the verifier answers with a random challenge $\gamma$; the prover then responds with $\vec{s} = \vec{a} + \gamma\vec{r}$ and $u = \alpha + \gamma\rho$ (alongside whatever is needed to check the claimed inner product with $\vec{b}$, a part of the protocol that will not matter for the attack); and the verifier accepts only if $\langle \vec{s}, \vec{G}\rangle + uH = C_a + \gamma\, C_r$.
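To make the flow concrete, here is a toy, self-contained run of the commitment part of such a protocol. Everything in it is a stand-in: residues modulo a small prime play the role of group elements (so discrete logs are actually easy here), and a non-cryptographic hash plays the Fiat-Shamir function; the puzzle's real implementation works over an elliptic curve with proper primitives.

```rust
// A toy end-to-end run of the commitment part of the protocol described above,
// over a stand-in group Z_q (plain residues instead of curve points) and with a
// non-cryptographic hash as the Fiat-Shamir function. Purely illustrative.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const Q: i128 = 1_000_003;

fn modq(x: i128) -> i128 {
    ((x % Q) + Q) % Q
}

// Com(a; r) = a_1*G_1 + ... + a_n*G_n + r*H in the toy group.
fn commit(gs: &[i128], h: i128, a: &[i128], r: i128) -> i128 {
    assert_eq!(gs.len(), a.len());
    let mut acc = modq(r * h);
    for (g, ai) in gs.iter().zip(a) {
        acc = modq(acc + g * ai);
    }
    acc
}

// Fiat-Shamir stand-in: hash the transcript into a challenge scalar.
fn challenge(transcript: &[i128]) -> i128 {
    let mut hasher = DefaultHasher::new();
    transcript.hash(&mut hasher);
    modq(hasher.finish() as i128)
}

fn main() {
    let gs = [3i128, 5, 7, 11]; // G_1..G_n
    let h = 13i128;             // H

    // The prover's secret vector a and blinder alpha, and the public C_a.
    let a = [1i128, 2, 3, 4];
    let alpha = 99i128;
    let c_a = commit(&gs, h, &a, alpha);

    // Round 1: commit to a "random" vector r with blinder rho.
    let r = [17i128, 23, 29, 31];
    let rho = 57i128;
    let c_r = commit(&gs, h, &r, rho);

    // Fiat-Shamir challenge derived from the transcript so far.
    let gamma = challenge(&[c_a, c_r]);

    // Round 2: the response s = a + gamma*r, u = alpha + gamma*rho.
    let s: Vec<i128> = a.iter().zip(&r).map(|(ai, ri)| modq(ai + gamma * ri)).collect();
    let u = modq(alpha + gamma * rho);

    // Verifier's check: Com(s; u) == C_a + gamma * C_r.
    assert_eq!(commit(&gs, h, &s, u), modq(c_a + gamma * c_r));
    println!("verification equation holds");
}
```

The point to notice is the final check: it is linear in the challenge $\gamma$, and that linearity is exactly what we will exploit below.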
We are also given two proofs, each containing the whole transcript of the corresponding proof (i.e. the commitment $C_r$ and the response $\vec{s}, u$; the challenge $\gamma$ can be recomputed from the transcript via Fiat-Shamir). Since we have two proofs, we will add the subscripts 1 and 2, writing $C_{a,i}, \vec{b}_i, C_{r,i}, \gamma_i, \vec{s}_i, u_i$ for the values belonging to the $i$-th instance and proof transcript.
We are also given the commit_key, which is the set of generators $\vec{G} = (G_1, \dots, G_n)$ and the hiding generator $H$ taking part in the (vectorized) Pedersen commitment scheme.
We can also see how the generators were generated: they are the output of a ChaChaRng, which is considered secure, so nothing "fishy" determined the values of the generators and there really is no discrete log known between them.
One can also verify that both given proofs are valid by running the provided main function with the cargo run command line utility.
The proofs we are given are therefore known to verify successfully. Therefore, the only thing we can learn about them is that for $i \in \{1, 2\}$:

$$\langle \vec{s}_i, \vec{G}\rangle + u_i H = C_{a,i} + \gamma_i\, C_{r,i}$$
Recall that $\langle \vec{v}, \vec{G}\rangle + wH = \mathrm{Com}(\vec{v}; w)$ for a vector $\vec{v}$ and some scalar $w$, so the left-hand side is simply the commitment $\mathrm{Com}(\vec{s}_i; u_i)$.
Notice that since we are dealing with another prover (Bob's home-made one), we can't tell that:

$$\vec{s}_i = \vec{a} + \gamma_i \vec{r}_i, \qquad u_i = \alpha + \gamma_i \rho_i$$

as an honest prover would have computed them.
An important observation we make is that the commitment to the secret vector is the same in both proofs, i.e. $C_{a,1} = C_{a,2}$, so the same secret vector $\vec{a}$ and blinder $\alpha$ underlie both of them. This can be verified using a short piece of Rust code:
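Our original snippet isn't reproduced here, so the following is only a stand-in sketch: the Instance struct and its field are simplified assumptions, and the real check is just the analogous assert_eq! on the values loaded from the puzzle data.

```rust
// Stand-in sketch: check that the commitment to the secret vector is identical in
// both given proofs. The struct below is a simplified assumption; in the actual
// solution the same assert_eq! is run on the values loaded from the puzzle data.
#[derive(Clone, Debug, PartialEq)]
struct Instance {
    comm_a: (u64, u64), // stand-in for the curve point C_a
}

fn main() {
    let instance1 = Instance { comm_a: (1, 2) };
    let instance2 = instance1.clone();

    // The important observation: both proofs talk about the very same C_a,
    // hence about the very same secret vector a and blinder alpha.
    assert_eq!(instance1.comm_a, instance2.comm_a);
    println!("both proofs share the same C_a");
}
```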
Therefore we will drop the index and denote this common commitment simply by $C_a$, and its (unknown) opening by $\vec{a}$ and $\alpha$.
Eventually we want to find $\vec{a}$ and $\alpha$ such that $C_a = \langle \vec{a}, \vec{G}\rangle + \alpha H$; this is exactly what the puzzle asks us to recover. However, we can't tell much about how the remaining values ($C_{r,1}$, $C_{r,2}$ and the responses) were generated, so all we can safely write down is that the following holds for us:

$$\langle \vec{s}_1, \vec{G}\rangle + u_1 H = C_a + \gamma_1\, C_{r,1}$$
$$\langle \vec{s}_2, \vec{G}\rangle + u_2 H = C_a + \gamma_2\, C_{r,2}$$

Now let's get rid of all this notation and write the commitments explicitly:

$$s_{1,1}G_1 + s_{1,2}G_2 + \dots + s_{1,n}G_n + u_1 H = C_a + \gamma_1\, C_{r,1}$$
$$s_{2,1}G_1 + s_{2,2}G_2 + \dots + s_{2,n}G_n + u_2 H = C_a + \gamma_2\, C_{r,2}$$
Remember: because the discrete logs between $H$ and the $G_j$'s are unknown and the prover isn't almighty, we assume the prover doesn't know any such discrete log, and therefore the only way it could have created $\vec{s}_i$ and $u_i$ that satisfy these equations is by matching the scalars that multiply each generator on both sides of each equation. The only problem is that it couldn't have simply done that as long as not all summands in the equation are represented as multiples of $H$ and the $G_j$'s, and $C_a$, $C_{r,1}$, $C_{r,2}$ aren't given in that form. So, the prover can only either know some discrete-log relation between the generators after all, or know how the commitments it created ($C_a$ and the $C_{r,i}$'s) decompose over the generators, possibly having crafted them with a simple relation between them.
Since the first option is not reasonable (the prover isn't almighty!), we went with the second one: we decided to try to break the discrete log between $C_{r,1}$ and $C_{r,2}$, since these two values are simply given to us by the prover, and it could have created them to satisfy pretty much anything it wanted.
We weren't sure which of the two would turn out to be a small multiple of the other (or vice versa), so we fed our Rust code multiple options, checking candidate multiples against both of them.
We ran a short piece of code implementing this search.
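The actual snippet isn't reproduced here either; the sketch below only shows the shape of the search, with residues in a toy group standing in for the curve points taken from the puzzle data.

```rust
// Sketch of the brute-force search for a small discrete-log relation between the two
// randomness commitments. In the real solution C_r1 and C_r2 are curve points taken
// from the puzzle data; here residues in a toy additive group Z_q stand in for them,
// since only the shape of the search matters.
const Q: i64 = 1_000_003;

fn modq(x: i64) -> i64 {
    ((x % Q) + Q) % Q
}

/// Returns Some(k) if target == k * base, i.e. target is a small scalar multiple of base.
fn find_small_multiple(base: i64, target: i64, bound: i64) -> Option<i64> {
    (1..=bound).find(|&k| modq(k * base) == target)
}

fn main() {
    // Pretend these two came out of the two proofs in the puzzle data.
    let c_r1 = 123_456;
    let c_r2 = modq(2 * c_r1); // here the second one is the double of the first

    println!("C_r1 -> C_r2: {:?}", find_small_multiple(c_r1, c_r2, 1_000));
    println!("C_r2 -> C_r1: {:?}", find_small_multiple(c_r2, c_r1, 1_000));
}
```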
The search came back positive, and it turned out that $C_{r,2} = 2\, C_{r,1}$. This is awesome, double trouble indeed! Denoting $C_r := C_{r,1}$ and going back to our original equations, we now have:

$$\langle \vec{s}_1, \vec{G}\rangle + u_1 H = C_a + \gamma_1\, C_r$$
$$\langle \vec{s}_2, \vec{G}\rangle + u_2 H = C_a + 2\gamma_2\, C_r$$
So if we just modify the equations a little bit, writing the commitment we are after explicitly as $C_a = \langle \vec{a}, \vec{G}\rangle + \alpha H$ with the unknowns $\vec{a}$ and $\alpha$ spelled out, we have:

$$\langle \vec{s}_1, \vec{G}\rangle + u_1 H = \langle \vec{a}, \vec{G}\rangle + \alpha H + \gamma_1\, C_r$$
$$\langle \vec{s}_2, \vec{G}\rangle + u_2 H = \langle \vec{a}, \vec{G}\rangle + \alpha H + 2\gamma_2\, C_r$$
Now we multiply the first equation by $2\gamma_2$ and the second equation by $\gamma_1$ and we get:

$$2\gamma_2\langle \vec{s}_1, \vec{G}\rangle + 2\gamma_2 u_1 H = 2\gamma_2\langle \vec{a}, \vec{G}\rangle + 2\gamma_2\alpha H + 2\gamma_1\gamma_2\, C_r$$
$$\gamma_1\langle \vec{s}_2, \vec{G}\rangle + \gamma_1 u_2 H = \gamma_1\langle \vec{a}, \vec{G}\rangle + \gamma_1\alpha H + 2\gamma_1\gamma_2\, C_r$$
Now if we subtract the second equation from the first we can cancel out $C_r$:

$$\langle 2\gamma_2\vec{s}_1 - \gamma_1\vec{s}_2, \vec{G}\rangle + (2\gamma_2 u_1 - \gamma_1 u_2) H = (2\gamma_2 - \gamma_1)\langle \vec{a}, \vec{G}\rangle + (2\gamma_2 - \gamma_1)\alpha H$$

To satisfy the equation (and find $\vec{a}$ and $\alpha$) we only have to require that, after moving everything to one side, the multiple of each generator is zeroed out, so for every coordinate $j$:

$$2\gamma_2 s_{1,j} - \gamma_1 s_{2,j} - (2\gamma_2 - \gamma_1) a_j = 0, \qquad 2\gamma_2 u_1 - \gamma_1 u_2 - (2\gamma_2 - \gamma_1)\alpha = 0$$
By solving each equation we get:

$$a_j = \frac{2\gamma_2 s_{1,j} - \gamma_1 s_{2,j}}{2\gamma_2 - \gamma_1} \quad \text{for every } j, \qquad \alpha = \frac{2\gamma_2 u_1 - \gamma_1 u_2}{2\gamma_2 - \gamma_1}$$
We then computed their values with a few more lines of Rust.
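As before, this is a sketch rather than our actual solution code: it applies the formulas we just derived in a toy prime field (the real code does the same arithmetic with the puzzle's scalar-field type), on transcripts synthesized so that the second proof's randomness is the double of the first's.

```rust
// Recovering (a, alpha) from the two transcripts using the formulas derived above:
//   a_j   = (2*g2*s1_j - g1*s2_j) / (2*g2 - g1)
//   alpha = (2*g2*u1   - g1*u2)   / (2*g2 - g1)
// The real solution does exactly this arithmetic in the puzzle's scalar field; here a
// toy prime field stands in for it, and the transcripts are synthesized so that the
// second proof's randomness is the double of the first's.
const Q: i128 = 1_000_003; // toy field modulus

fn modq(x: i128) -> i128 {
    ((x % Q) + Q) % Q
}

// Modular inverse via Fermat's little theorem (Q is prime).
fn inv(x: i128) -> i128 {
    let (mut base, mut exp, mut acc) = (modq(x), Q - 2, 1i128);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = modq(acc * base);
        }
        base = modq(base * base);
        exp >>= 1;
    }
    acc
}

fn main() {
    // Alice's secret vector and blinder (what we want to recover).
    let a = [11i128, 22, 33, 44];
    let alpha = 555i128;

    // The prover's randomness for the first proof; the second proof uses its double.
    let r1 = [101i128, 202, 303, 404];
    let rho1 = 777i128;
    let (g1, g2) = (48_271i128, 16_807i128); // stand-ins for the two Fiat-Shamir challenges

    // Responses: s_i = a + gamma_i * r_i and u_i = alpha + gamma_i * rho_i,
    // with r_2 = 2*r_1 and rho_2 = 2*rho_1.
    let s1: Vec<i128> = a.iter().zip(&r1).map(|(x, y)| modq(x + g1 * y)).collect();
    let u1 = modq(alpha + g1 * rho1);
    let s2: Vec<i128> = a.iter().zip(&r1).map(|(x, y)| modq(x + g2 * 2 * y)).collect();
    let u2 = modq(alpha + g2 * 2 * rho1);

    // The extraction formulas.
    let d = inv(modq(2 * g2 - g1));
    let a_rec: Vec<i128> = s1.iter().zip(&s2)
        .map(|(x, y)| modq(modq(2 * g2 * x - g1 * y) * d))
        .collect();
    let alpha_rec = modq(modq(2 * g2 * u1 - g1 * u2) * d);

    assert_eq!(a_rec, a.to_vec());
    assert_eq!(alpha_rec, alpha);
    println!("recovered a = {a_rec:?}, alpha = {alpha_rec}");
}
```

The assertions at the end confirm that the formulas indeed recover the committed vector and its blinder.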
Finally, we integrated all of our code into the provided main function, ran it, and saw that our solution actually works!