Another great week of the zkhack event has passed, and I'm thrilled to share with you a writeup for another challenge that I (Matan Hamilis) solved together with Elichai Turkel and Shalevos.
We have also written solutions to the first problem and the second problem, go ahead and check them out someday :)
This puzzle, called "Double Trouble", gives us some introductory material about ZK-proofs and Schnorr's Knowledge-of-Discrete-Logarithm Zero-Knowledge Proof-of-Knowledge protocol. If all this means nothing to you, we will get to it right away; if you already know about it, feel free to skip forward.
If you're taking part in the awesome zkhack event you've probably already heard about Zero-Knowledge proofs, at least by name. Just to get things straight, the setting is as follows: we have two entities, a prover (denoted by $P$) and a verifier (denoted by $V$). The prover wants to convince the verifier that a certain statement holds. To do so it sends the verifier pieces of information, to which the verifier is allowed to respond with further "questions", to which the prover should respond in turn. The conversation continues until the point at which the verifier decides to either accept or reject the proof. So far we have discussed proof systems in general and nothing here is zero-knowledge.
Many proofs in life, and especially in mathematics, can be done by supplying an example. For example (did you get the recursion?), we can prove that a number is composite (not prime) by supplying a divisor of that number: 3 is a witness to the fact that 15 is composite. Such "examples" are usually referred to as "witnesses". So let's say we have a number $N$ that we want to prove is composite, but we don't want to hand our witness to the verifier. In fact, we want the verifier to gain zero knowledge about $N$ besides the fact that it is composite.
This is where Zero-Knowledge proofs come to the rescue. If we devise a protocol that lets us convince the verifier that a composite number really is composite without disclosing our witness, this is a Zero-Knowledge Proof (or ZKP in short). It is also important that the verifier rejects false proofs with overwhelming probability.
Let's assume there is a proof system in which we, as verifiers, are convinced that a certain number is indeed composite. How can we show that this proof system is zero-knowledge? Well, if we could "simulate" a proof between a prover and a verifier that looks exactly the same as a regular proof, without knowing the actual secret, this would mean that zero information is leaked in the process of proving, right? Otherwise, we could gain this information by simply simulating the proof! So, a proof protocol is a ZKP if we can create a simulator that produces transcripts identical to those of a real proof taking place.
You may have heard some "heavy" terms like "statistical indistinguishability" and such, but all they do is give a proper and precise mathematical definition of what "identical transcripts" really means.
The fact that someone has just given us a proof that some number is indeed composite doesn't mean that they really know a divisor of that number, right? While it makes sense to us that this is "the only" way to prove that some number is composite, we still can't really know that the prover knows the divisor. In some cases, we don't only want to be convinced that some property holds, but also that the prover actually knows such a witness. These proofs are known as proofs of knowledge (PoK).
We can't argue anything about what exactly the prover holds in its memory, so what does "knowledge" even mean? In cryptography it is common to think of knowing something as "being able to efficiently compute its value". While we can't claim that a prover really holds a divisor of some number in its memory, we can prove that, using the information it provides us, such a divisor can be extracted efficiently! Formally speaking, to prove that a proof system is a PoK of some value $w$, we have to present an extractor of $w$ for that proof system. The extractor is an efficient algorithm that has the magical power to rewind the proof procedure, and using the information it gathers it can derive (again, efficiently) the value of $w$.
For example, an extractor for the knowledge of a divisor of a composite number will be able to run and rewind the proof procedure to gather enough transcripts, and use those transcripts to extract the value of a divisor. This shows us that the prover itself could have extracted this value (since everything the extractor sees is generated by the prover), and therefore we can say that the prover knows it (or is able to know it without too much effort).
As said, many proof systems are interactive. Therefore, the prover and the verifier have multiple rounds of communication in which the verifier asks questions / sends challenges and the prover responds accordingly. To make a proof system non-interactive, so that the proof can be generated offline and verified later (even publicly, for example on a blockchain), we need a way to generate the verifier's challenges in a trustworthy way. This can be done using the Fiat-Shamir transform, which basically says that instead of the verifier sending random challenges, the prover can generate the challenges by itself.
But how is this fair? If the prover chooses the challenges, it can create false proofs! That's correct, and that's why we don't want the prover to be able to control the challenges easily. Each challenge will be the output of a cryptographically secure hash function applied to the whole transcript of the proof so far. Since such functions are modeled as random (under the assumption known as the "random oracle" assumption), we can assume that the challenges the prover gets from the hash function are really "random" and couldn't have been specially crafted to trick the verifier.
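As a small illustration (my own sketch, not code from the challenge), this is the general shape of deriving a challenge from the transcript. The standard library's DefaultHasher is used only as a stand-in for a real cryptographic hash such as SHA-256:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive the verifier's challenge by hashing everything sent so far.
// NOTE: DefaultHasher is NOT cryptographically secure; it only stands in
// for a real hash function (e.g. SHA-256 / BLAKE2) to illustrate the idea.
fn fiat_shamir_challenge(transcript: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    transcript.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // The transcript would contain the encoded statement and the prover's
    // first-round commitments; here a few dummy numbers stand in for it.
    let transcript = [42u64, 7, 1337];
    let gamma = fiat_shamir_challenge(&transcript);
    println!("challenge = {gamma}");
}
```

Because the prover cannot predict or steer the hash output, fixing the challenge this way is (heuristically) as good as having the verifier pick it at random.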
One last thing we should mention is Pedersen commitments, which are used heavily in the challenge. Let's see what they are.
A commitment scheme in general gives the functionality of committing to a specific value while exposing the committed value only later. A commitment scheme has two main "functions": a commit function, which takes the committed value $v$ together with some supplementary (random) data $r$ and outputs a commitment $c = \mathrm{Com}(v; r)$, and a reveal (open) function, in which the committer discloses $(v, r)$ and anyone can check that they indeed match $c$.
The most simple commitment scheme, which relies on the hardness of the discrete-log assumption, uses some group with a generator $G$ and works as follows: to commit to a value $v$, simply publish $\mathrm{Com}(v) = vG$.
Notice that in this simplistic commitment scheme we haven't used the supplementary data at all. This scheme is perfectly binding, meaning that it binds the committer to the value of $v$, without it being able to later claim that the commitment actually opens to another value. However, this scheme is only computationally hiding. That is, given sufficiently large computational power, someone could derive the value of $v$ by breaking the discrete log of the commitment.
The Pedersen commitment, however, is perfectly hiding: it hides the committed value based on information-theoretic arguments, without relying on any hardness assumption (such as discrete log). On the other hand, it is only computationally binding, so that given enough computational resources someone could claim that a commitment to a value $v$ is actually a commitment to some other value $v'$.
The way Pedersen commitments work is by holding two generators $G, H$ of a prime-order group, such that the discrete log between them is unknown, and computing $\mathrm{Com}(v; r) = vG + rH$ for the committed value $v$ and random supplementary data $r$.
This commitment scheme is perfectly hiding because for every commitment $C$ and every candidate value $v$ there exists some randomness $r$ such that $C = vG + rH$.
However, it is only computationally binding, since it is possible, with enough resources, to find the discrete log of $H$ with respect to $G$ and use it to open the commitment to another value.
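To see both properties side by side, here is a tiny self-contained toy. It deliberately works in a group where the discrete log $t$ between the two generators is known (plain residues modulo a prime stand in for group elements), which is exactly the power an unbounded attacker would have against a real Pedersen commitment:

```rust
// Toy Pedersen commitment C = v*G + r*H, written additively. Here the "group" is
// just Z_q with q prime and G, H are plain residues, so the discrete log t between
// them is known. That is precisely the power an unbounded attacker would have, and
// it lets us demonstrate how binding breaks while hiding stays perfect. In a real
// instantiation G and H are curve points and no such t is known.
const Q: i128 = 1_000_003; // toy prime group order

fn modq(x: i128) -> i128 {
    ((x % Q) + Q) % Q
}

// Modular inverse via Fermat's little theorem (Q is prime).
fn inv(x: i128) -> i128 {
    let (mut base, mut exp, mut acc) = (modq(x), Q - 2, 1i128);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = modq(acc * base);
        }
        base = modq(base * base);
        exp >>= 1;
    }
    acc
}

fn commit(g: i128, h: i128, v: i128, r: i128) -> i128 {
    modq(v * g + r * h)
}

fn main() {
    let g = 5i128;
    let t = 777i128;     // the discrete log of H with respect to G: unknown in real life!
    let h = modq(t * g); // H = t*G

    let (v, r) = (42i128, 1234i128);
    let c = commit(g, h, v, r);

    // Perfect hiding: for ANY other value v2 there is randomness r2 giving the very
    // same commitment. Computing that r2, however, requires knowing t.
    let v2 = 9999i128;
    let r2 = modq(r + modq(v - v2) * inv(t));
    assert_eq!(c, commit(g, h, v2, r2));
    println!("opened the same commitment to {v} and to {v2}");
}
```

With $t$ in hand the committer can open the same commitment to any value, which is why binding is only computational; conversely, the existence of such an opening for every value is what makes the commitment perfectly hiding.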
In this puzzle we are using a special variant of the Pedersen commitment used to commit to vectors instead of just scalars, in which we can commit to vectors of some size $n$ by having generators $G_1, \dots, G_n$ and a hiding generator $H$, such that no discrete log between any two generators (or combination of generators) is known. Committing to a vector $\vec{a} = (a_1, \dots, a_n)$ with supplementary data $r$ is done simply by computing $\mathrm{Com}(\vec{a}; r) = a_1 G_1 + \dots + a_n G_n + rH$. Just like the original Pedersen commitment, this scheme is also perfectly hiding and computationally binding. To make notation simpler we will use the inner-product based notation, so that $\mathrm{Com}(\vec{a}; r) = \langle \vec{a}, \vec{G}\rangle + rH$ where $\vec{G} = (G_1, \dots, G_n)$.
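As a quick illustration of the formula (and nothing more), here is the vectorized commitment in the same toy additive group; the real commit_key generators are curve points rather than small integers:

```rust
// Toy vectorized Pedersen commitment: Com(a; r) = a_1*G_1 + ... + a_n*G_n + r*H,
// in an additive group that is really just Z_q with q prime. The real commit_key
// generators are elliptic-curve points; small integers are used here only to show
// the formula itself.
const Q: i128 = 1_000_003;

fn vector_commit(gs: &[i128], h: i128, a: &[i128], r: i128) -> i128 {
    assert_eq!(gs.len(), a.len());
    let mut acc = (r * h) % Q;
    for (g, ai) in gs.iter().zip(a) {
        acc = (acc + g * ai) % Q;
    }
    acc
}

fn main() {
    let gs = [3i128, 5, 7, 11]; // toy generators G_1..G_n
    let h = 13i128;             // toy hiding generator H
    let a = [1i128, 2, 3, 4];   // the committed vector
    let r = 99i128;             // supplementary randomness
    println!("Com(a; r) = {}", vector_commit(&gs, h, &a, r));
}
```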
Great, now that we have enough introductory material, let's try and solve the puzzle!
The puzzle goes as follows: we are given an (allegedly) zero-knowledge protocol for proving that the inner product of some unknown vector $\vec{a}$, to which the prover only commits, with a known public vector $\vec{b}$ equals some value which is committed to by the prover as well. We are also told that Bob, the inventor of the protocol, has implemented his own version of the prover which works very fast, and we are given two examples of such proofs.
Bob's proof system is made non-interactive using the Fiat-Shamir transform we discussed earlier. The underlying interactive protocol is a Schnorr-style protocol over the vectorized Pedersen commitment, and it works roughly as follows: the prover, holding an opening $(\vec{a}, \alpha)$ of the public commitment $C_a = \langle \vec{a}, \vec{G}\rangle + \alpha H$, first commits to a freshly sampled random vector $\vec{r}$ with a random blinder $\rho$ and sends $C_r = \langle \vec{r}, \vec{G}\rangle + \rho H$; the verifier answers with a random challenge $\gamma$; the prover then responds with $\vec{s} = \vec{a} + \gamma\vec{r}$ and $u = \alpha + \gamma\rho$ (alongside whatever is needed to check the claimed inner product with $\vec{b}$, a part of the protocol that will not matter for the attack); and the verifier accepts only if $\langle \vec{s}, \vec{G}\rangle + uH = C_a + \gamma\, C_r$.
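To make the flow concrete, here is a toy, self-contained run of the commitment part of such a protocol. Everything in it is a stand-in: residues modulo a small prime play the role of group elements (so discrete logs are actually easy here), and a non-cryptographic hash plays the Fiat-Shamir function; the puzzle's real implementation works over an elliptic curve with proper primitives.

```rust
// A toy end-to-end run of the commitment part of the protocol described above,
// over a stand-in group Z_q (plain residues instead of curve points) and with a
// non-cryptographic hash as the Fiat-Shamir function. Purely illustrative.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const Q: i128 = 1_000_003;

fn modq(x: i128) -> i128 {
    ((x % Q) + Q) % Q
}

// Com(a; r) = a_1*G_1 + ... + a_n*G_n + r*H in the toy group.
fn commit(gs: &[i128], h: i128, a: &[i128], r: i128) -> i128 {
    assert_eq!(gs.len(), a.len());
    let mut acc = modq(r * h);
    for (g, ai) in gs.iter().zip(a) {
        acc = modq(acc + g * ai);
    }
    acc
}

// Fiat-Shamir stand-in: hash the transcript into a challenge scalar.
fn challenge(transcript: &[i128]) -> i128 {
    let mut hasher = DefaultHasher::new();
    transcript.hash(&mut hasher);
    modq(hasher.finish() as i128)
}

fn main() {
    let gs = [3i128, 5, 7, 11]; // G_1..G_n
    let h = 13i128;             // H

    // The prover's secret vector a and blinder alpha, and the public C_a.
    let a = [1i128, 2, 3, 4];
    let alpha = 99i128;
    let c_a = commit(&gs, h, &a, alpha);

    // Round 1: commit to a "random" vector r with blinder rho.
    let r = [17i128, 23, 29, 31];
    let rho = 57i128;
    let c_r = commit(&gs, h, &r, rho);

    // Fiat-Shamir challenge derived from the transcript so far.
    let gamma = challenge(&[c_a, c_r]);

    // Round 2: the response s = a + gamma*r, u = alpha + gamma*rho.
    let s: Vec<i128> = a.iter().zip(&r).map(|(ai, ri)| modq(ai + gamma * ri)).collect();
    let u = modq(alpha + gamma * rho);

    // Verifier's check: Com(s; u) == C_a + gamma * C_r.
    assert_eq!(commit(&gs, h, &s, u), modq(c_a + gamma * c_r));
    println!("verification equation holds");
}
```

The point to notice is the final check: it is linear in the challenge $\gamma$, and that linearity is exactly what we will exploit below.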
We are also given two proofs, each containing the whole transcript of the corresponding proof (i.e. the commitment $C_r$ and the response $\vec{s}, u$; the challenge $\gamma$ can be recomputed from the transcript via Fiat-Shamir). Since we have two proofs, we will add the subscripts 1 and 2, writing $C_{a,i}, \vec{b}_i, C_{r,i}, \gamma_i, \vec{s}_i, u_i$ for the values belonging to the $i$-th instance and proof transcript.
We are also given the commit_key, which is the set of generators $\vec{G} = (G_1, \dots, G_n)$ and the hiding generator $H$ taking part in the (vectorized) Pedersen commitment scheme.
We can also see how the generators were generated: they are the output of a ChaChaRng, which is considered secure, so nothing "fishy" determined the values of the generators and there really is no discrete log known between them.
One can also verify that both given proofs are valid by running the provided main function with the cargo run command line utility.
The proofs we are given are therefore known to verify successfully. Therefore, the only thing we can learn about them is that for $i \in \{1, 2\}$:

$$\langle \vec{s}_i, \vec{G}\rangle + u_i H = C_{a,i} + \gamma_i\, C_{r,i}$$
Recall that $\langle \vec{v}, \vec{G}\rangle + wH = \mathrm{Com}(\vec{v}; w)$ for a vector $\vec{v}$ and some scalar $w$, so the left-hand side is simply the commitment $\mathrm{Com}(\vec{s}_i; u_i)$.
Notice that since we are dealing with another prover (Bob's home-made one), we can't tell that:

$$\vec{s}_i = \vec{a} + \gamma_i \vec{r}_i, \qquad u_i = \alpha + \gamma_i \rho_i$$

as an honest prover would have computed them.
An important observation we make is that the commitment to the secret vector is the same in both proofs, i.e. $C_{a,1} = C_{a,2}$, so the same secret vector $\vec{a}$ and blinder $\alpha$ underlie both of them. This can be verified using a short piece of Rust code:
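Our original snippet isn't reproduced here, so the following is only a stand-in sketch: the Instance struct and its field are simplified assumptions, and the real check is just the analogous assert_eq! on the values loaded from the puzzle data.

```rust
// Stand-in sketch: check that the commitment to the secret vector is identical in
// both given proofs. The struct below is a simplified assumption; in the actual
// solution the same assert_eq! is run on the values loaded from the puzzle data.
#[derive(Clone, Debug, PartialEq)]
struct Instance {
    comm_a: (u64, u64), // stand-in for the curve point C_a
}

fn main() {
    let instance1 = Instance { comm_a: (1, 2) };
    let instance2 = instance1.clone();

    // The important observation: both proofs talk about the very same C_a,
    // hence about the very same secret vector a and blinder alpha.
    assert_eq!(instance1.comm_a, instance2.comm_a);
    println!("both proofs share the same C_a");
}
```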
Therefore we will drop the index and denote this common commitment simply by $C_a$, and its (unknown) opening by $\vec{a}$ and $\alpha$.
Eventually we want to find $\vec{a}$ and $\alpha$ such that $C_a = \langle \vec{a}, \vec{G}\rangle + \alpha H$; this is exactly what the puzzle asks us to recover. However, we can't tell much about how the remaining values ($C_{r,1}$, $C_{r,2}$ and the responses) were generated, so all we can safely write down is that the following holds for us:

$$\langle \vec{s}_1, \vec{G}\rangle + u_1 H = C_a + \gamma_1\, C_{r,1}$$
$$\langle \vec{s}_2, \vec{G}\rangle + u_2 H = C_a + \gamma_2\, C_{r,2}$$

Now let's get rid of all this notation and write the commitments explicitly:

$$s_{1,1}G_1 + s_{1,2}G_2 + \dots + s_{1,n}G_n + u_1 H = C_a + \gamma_1\, C_{r,1}$$
$$s_{2,1}G_1 + s_{2,2}G_2 + \dots + s_{2,n}G_n + u_2 H = C_a + \gamma_2\, C_{r,2}$$
Remember: because the discrete logs between $H$ and the $G_j$'s are unknown and the prover isn't almighty, we assume the prover doesn't know any such discrete log, and therefore the only way it could have created $\vec{s}_i$ and $u_i$ that satisfy these equations is by matching the scalars that multiply each generator on both sides of each equation. The only problem is that it couldn't have simply done that as long as not all summands in the equation are represented as multiples of $H$ and the $G_j$'s, and $C_a$, $C_{r,1}$, $C_{r,2}$ aren't given in that form. So, the prover can only either know some discrete-log relation between the generators after all, or know how the commitments it created ($C_a$ and the $C_{r,i}$'s) decompose over the generators, possibly having crafted them with a simple relation between them.
Since the first option is not reasonable (the prover isn't almighty!), we went with the second one: we decided to try to break the discrete log between $C_{r,1}$ and $C_{r,2}$, since these two values are simply given to us by the prover, and it could have created them to satisfy pretty much anything it wanted.
We weren't sure which of the two would turn out to be a small multiple of the other (or vice versa), so we fed our Rust code multiple options, checking candidate multiples against both of them.
We ran a short piece of code implementing this search.
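The actual snippet isn't reproduced here either; the sketch below only shows the shape of the search, with residues in a toy group standing in for the curve points taken from the puzzle data.

```rust
// Sketch of the brute-force search for a small discrete-log relation between the two
// randomness commitments. In the real solution C_r1 and C_r2 are curve points taken
// from the puzzle data; here residues in a toy additive group Z_q stand in for them,
// since only the shape of the search matters.
const Q: i64 = 1_000_003;

fn modq(x: i64) -> i64 {
    ((x % Q) + Q) % Q
}

/// Returns Some(k) if target == k * base, i.e. target is a small scalar multiple of base.
fn find_small_multiple(base: i64, target: i64, bound: i64) -> Option<i64> {
    (1..=bound).find(|&k| modq(k * base) == target)
}

fn main() {
    // Pretend these two came out of the two proofs in the puzzle data.
    let c_r1 = 123_456;
    let c_r2 = modq(2 * c_r1); // here the second one is the double of the first

    println!("C_r1 -> C_r2: {:?}", find_small_multiple(c_r1, c_r2, 1_000));
    println!("C_r2 -> C_r1: {:?}", find_small_multiple(c_r2, c_r1, 1_000));
}
```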
The search came back positive, and it turned out that $C_{r,2} = 2\, C_{r,1}$. This is awesome, double trouble indeed! Denoting $C_r := C_{r,1}$ and going back to our original equations, we now have:

$$\langle \vec{s}_1, \vec{G}\rangle + u_1 H = C_a + \gamma_1\, C_r$$
$$\langle \vec{s}_2, \vec{G}\rangle + u_2 H = C_a + 2\gamma_2\, C_r$$
So if we just modify the equations a little bit, writing the commitment we are after explicitly as $C_a = \langle \vec{a}, \vec{G}\rangle + \alpha H$ with the unknowns $\vec{a}$ and $\alpha$ spelled out, we have:

$$\langle \vec{s}_1, \vec{G}\rangle + u_1 H = \langle \vec{a}, \vec{G}\rangle + \alpha H + \gamma_1\, C_r$$
$$\langle \vec{s}_2, \vec{G}\rangle + u_2 H = \langle \vec{a}, \vec{G}\rangle + \alpha H + 2\gamma_2\, C_r$$
Now we multiply the first equation by $2\gamma_2$ and the second equation by $\gamma_1$ and we get:

$$2\gamma_2\langle \vec{s}_1, \vec{G}\rangle + 2\gamma_2 u_1 H = 2\gamma_2\langle \vec{a}, \vec{G}\rangle + 2\gamma_2\alpha H + 2\gamma_1\gamma_2\, C_r$$
$$\gamma_1\langle \vec{s}_2, \vec{G}\rangle + \gamma_1 u_2 H = \gamma_1\langle \vec{a}, \vec{G}\rangle + \gamma_1\alpha H + 2\gamma_1\gamma_2\, C_r$$
Now if we subtract the second equation from the first we can cancel out $C_r$:

$$\langle 2\gamma_2\vec{s}_1 - \gamma_1\vec{s}_2, \vec{G}\rangle + (2\gamma_2 u_1 - \gamma_1 u_2) H = (2\gamma_2 - \gamma_1)\langle \vec{a}, \vec{G}\rangle + (2\gamma_2 - \gamma_1)\alpha H$$

To satisfy the equation (and find $\vec{a}$ and $\alpha$) we only have to require that, after moving everything to one side, the multiple of each generator is zeroed out, so for every coordinate $j$:

$$2\gamma_2 s_{1,j} - \gamma_1 s_{2,j} - (2\gamma_2 - \gamma_1) a_j = 0, \qquad 2\gamma_2 u_1 - \gamma_1 u_2 - (2\gamma_2 - \gamma_1)\alpha = 0$$
By solving each equation we get:

$$a_j = \frac{2\gamma_2 s_{1,j} - \gamma_1 s_{2,j}}{2\gamma_2 - \gamma_1} \quad \text{for every } j, \qquad \alpha = \frac{2\gamma_2 u_1 - \gamma_1 u_2}{2\gamma_2 - \gamma_1}$$
We then computed their values with a few more lines of Rust.
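As before, this is a sketch rather than our actual solution code: it applies the formulas we just derived in a toy prime field (the real code does the same arithmetic with the puzzle's scalar-field type), on transcripts synthesized so that the second proof's randomness is the double of the first's.

```rust
// Recovering (a, alpha) from the two transcripts using the formulas derived above:
//   a_j   = (2*g2*s1_j - g1*s2_j) / (2*g2 - g1)
//   alpha = (2*g2*u1   - g1*u2)   / (2*g2 - g1)
// The real solution does exactly this arithmetic in the puzzle's scalar field; here a
// toy prime field stands in for it, and the transcripts are synthesized so that the
// second proof's randomness is the double of the first's.
const Q: i128 = 1_000_003; // toy field modulus

fn modq(x: i128) -> i128 {
    ((x % Q) + Q) % Q
}

// Modular inverse via Fermat's little theorem (Q is prime).
fn inv(x: i128) -> i128 {
    let (mut base, mut exp, mut acc) = (modq(x), Q - 2, 1i128);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = modq(acc * base);
        }
        base = modq(base * base);
        exp >>= 1;
    }
    acc
}

fn main() {
    // Alice's secret vector and blinder (what we want to recover).
    let a = [11i128, 22, 33, 44];
    let alpha = 555i128;

    // The prover's randomness for the first proof; the second proof uses its double.
    let r1 = [101i128, 202, 303, 404];
    let rho1 = 777i128;
    let (g1, g2) = (48_271i128, 16_807i128); // stand-ins for the two Fiat-Shamir challenges

    // Responses: s_i = a + gamma_i * r_i and u_i = alpha + gamma_i * rho_i,
    // with r_2 = 2*r_1 and rho_2 = 2*rho_1.
    let s1: Vec<i128> = a.iter().zip(&r1).map(|(x, y)| modq(x + g1 * y)).collect();
    let u1 = modq(alpha + g1 * rho1);
    let s2: Vec<i128> = a.iter().zip(&r1).map(|(x, y)| modq(x + g2 * 2 * y)).collect();
    let u2 = modq(alpha + g2 * 2 * rho1);

    // The extraction formulas.
    let d = inv(modq(2 * g2 - g1));
    let a_rec: Vec<i128> = s1.iter().zip(&s2)
        .map(|(x, y)| modq(modq(2 * g2 * x - g1 * y) * d))
        .collect();
    let alpha_rec = modq(modq(2 * g2 * u1 - g1 * u2) * d);

    assert_eq!(a_rec, a.to_vec());
    assert_eq!(alpha_rec, alpha);
    println!("recovered a = {a_rec:?}, alpha = {alpha_rec}");
}
```

The assertions at the end confirm that the formulas indeed recover the committed vector and its blinder.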
Finally, we integrated all of our code into the provided main function, ran it, and saw that our solution actually works!