PLONK
In this note we describe PLONK, a recent zero-knowledge proof system. The goal is to walk through the scheme step by step and explain the cryptographic primitives at play.
Arithmetization
The arithmetization of a program consists in expressing it as a sequence of constraints which have a simple mathematical form. There are lots of ways to do this, but in this post we will look at a variation of Rank 1 Constraint Systems (R1CS).
Namely, each constraint binds three variables $a$, $b$, $c$ by a relation of this form:
$$a \times b = c$$
Any instance of an NP-complete problem can be represented as a system of constraints of this sort. For instance the function $f(x) = x^3$ can be expressed as:
$$x \times x = v, \qquad v \times x = z$$
$v$ is called an intermediate variable, because it's neither part of the input set $\{x\}$ nor the output set $\{z\}$.
This constraint system has an intuitive matrix representation. We note $w$ the vector of all the variables, here $w = (x, v, z)$. There are pseudo permutation matrices $L$, $R$, $O$ (meaning that they have exactly one non zero entry per row, equal to $1$), such that the arithmetization can be expressed as $Lw \circ Rw = Ow$, where $\circ$ denotes the entrywise product.
Our function can be represented using $3$ variables and $3$ matrices, in the following way:
$$\begin{pmatrix}1&0&0\\0&1&0\end{pmatrix} w \;\circ\; \begin{pmatrix}1&0&0\\1&0&0\end{pmatrix} w \;=\; \begin{pmatrix}0&1&0\\0&0&1\end{pmatrix} w$$
Here $w = (x, v, z)$, and
$$L = \begin{pmatrix}1&0&0\\0&1&0\end{pmatrix}, \quad R = \begin{pmatrix}1&0&0\\1&0&0\end{pmatrix}, \quad O = \begin{pmatrix}0&1&0\\0&0&1\end{pmatrix}$$
The number of lines in the matrices $L$, $R$, $O$, equal to the size of $Lw$, corresponds to the number of constraints, here $2$.
If $z$ was a public input, the role of a "prover" would be to find $w = (x, v, z)$ such that $Lw \circ Rw = Ow$, that is find $x$ such that $x^3 = z$.
If one replaces $f$ by a more complicated function, like SHA-256, one would have a lot more variables and the R1CS would be much more complicated. Solving the corresponding constraint system would boil down to finding $x$ such that SHA-256$(x) = z$: it's the job that the prover has to do, and it's supposedly impossible to do unless the prover actually knows $x$.
For the next sections, $p$ is a big prime number, we place ourselves in a finite field $\mathbb{F}_p$ of high characteristic $p$, and the variables and the matrices all live in $\mathbb{F}_p$.
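To make the arithmetization concrete, here is a minimal sketch in Python. The circuit ($x \times x = v$, $v \times x = z$, so $z = x^3$), the matrices and the tiny prime are illustrative assumptions, not part of any real proof system:

```python
# Sketch of the pseudo-permutation R1CS check (Lw) o (Rw) = Ow over F_p,
# for the assumed toy circuit x*x = v, v*x = z (i.e. z = x^3).
p = 101  # a small prime standing in for a cryptographic-size field

# w = (x, v, z); each matrix row has exactly one non zero entry, equal to 1
L = [[1, 0, 0], [0, 1, 0]]
R = [[1, 0, 0], [1, 0, 0]]
O = [[0, 1, 0], [0, 0, 1]]

def mat_vec(M, w):
    # matrix-vector product mod p
    return [sum(m * wi for m, wi in zip(row, w)) % p for row in M]

def satisfies(w):
    lw, rw, ow = mat_vec(L, w), mat_vec(R, w), mat_vec(O, w)
    # the entrywise (Hadamard) product of Lw and Rw must equal Ow
    return all((a * b) % p == c for a, b, c in zip(lw, rw, ow))

x = 3
w = [x, x * x % p, x ** 3 % p]   # (x, v, z): a valid witness
print(satisfies(w))              # True
print(satisfies([3, 9, 28]))     # False: 28 is not 3^3
```

Each row of the matrices merely selects one variable, so every constraint is a single multiplication between entries of $w$.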
The cryptographic toolbox
We describe fundamental tools that are used in most flavors of SNARKs, especially for the succinctness part.
Schwartz Zippel lemma
The first tool is the Schwartz-Zippel lemma:
Take a non zero polynomial $P$ of degree $d \ll p$ in $\mathbb{F}_p[X_1, \dots, X_n]$. If one picks $r = (r_1, \dots, r_n) \in \mathbb{F}_p^n$ randomly, then $P(r)$ is almost surely not zero.
In practice, with the parameters chosen for crypto, you can translate "almost surely" by "surely". The probability that $P(r)$ is $0$ is actually at most $d/p$, and for cryptographic purposes $p$ is typically a $256$ bits number, while $d$ ranges at most in the billions (about a $30$ bits number).
To justify this lemma, we proceed by induction on $n$, the number of variables. For $n = 1$, we know that on $\mathbb{F}_p$, a non zero polynomial of degree $d$ has at most $d$ zeros, so sampling $r$ in the set of zeros of $P$ happens with a probability of at most $d/p$. Now for $n > 1$, write $P$, a non zero $n$-variate polynomial, as $P = \sum_i P_i(X_1, \dots, X_{n-1}) X_n^i$, with $P_i$ of degree at most $d - i$. Let $k$ be the largest $i$ such that $P_i \neq 0$. Let $(r_1, \dots, r_{n-1})$ be a random element of $\mathbb{F}_p^{n-1}$. Either $P_i(r_1, \dots, r_{n-1}) = 0$ for all $i$, which in particular requires $P_k(r_1, \dots, r_{n-1}) = 0$ and so by induction happens with probability at most $(d-k)/p$, and then $P(r) = 0$ no matter $r_n$, or there is a largest $j$ such that $P_j(r_1, \dots, r_{n-1}) \neq 0$, and then $P(r_1, \dots, r_{n-1}, X_n)$ is a non zero univariate polynomial of degree at most $k$, so by the $n = 1$ case the probability that it vanishes at a random $r_n$ is less than $k/p$. Combining the two cases using our bounds, we have that the probability that $P(r) = 0$ is less than $(d-k)/p + k/p = d/p$.
For the various flavors of SNARKs that we will describe, the converse of the Schwartz-Zippel lemma is equally useful:
Let $P$ be a polynomial of degree $d \ll p$. If one picks $r$ randomly and $P(r) = 0$, then $P$ is almost surely the zero polynomial.
Choosing $r$ at random is crucial here, otherwise one just has to pick a root (if it exists) of $P$ to obtain $P(r) = 0$ with $P \neq 0$.
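The lemma is easy to check empirically. The sketch below samples random evaluation points of a fixed non zero polynomial over a small prime field (sizes are illustrative; real systems use primes of roughly 256 bits) and compares the observed vanishing rate with the $d/p$ bound:

```python
# Empirical illustration of Schwartz-Zippel: a non zero polynomial of
# degree d over F_p vanishes at a random point with probability <= d/p.
import random

p = 10007   # small illustrative prime
d = 25

# a non zero degree-d polynomial: random low coefficients, leading coeff 1
coeffs = [random.randrange(p) for _ in range(d)] + [1]

def evaluate(cs, x):
    # Horner evaluation mod p; cs[i] is the coefficient of X^i
    acc = 0
    for c in reversed(cs):
        acc = (acc * x + c) % p
    return acc

trials = 20000
zeros = sum(evaluate(coeffs, random.randrange(p)) == 0 for _ in range(trials))
print(zeros / trials, "<=", d / p)  # observed rate vs the d/p bound
```

With cryptographic parameters ($p \approx 2^{256}$) the bound is so small that a vanishing evaluation is never observed in practice.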
Low degree extension (LDE)
The second is known as the Low Degree Extension (or LDE) of a vector. In fact, this is just an interpolation of a vector: if one has a vector $v = (v_1, \dots, v_n)$ with $v_i \in \mathbb{F}_p$, and a set $S \subset \mathbb{F}_p$ with $|S| = n$, the LDE of $v$, which we name $\hat{v}$, is the interpolation of $v$ on $S$. Concretely it means that $\hat{v}$ is a polynomial of degree $n - 1$, and with a correct indexing $s_1, \dots, s_n$ of the elements of $S$, one has $\hat{v}(s_i) = v_i$.
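An LDE is just Lagrange interpolation over $\mathbb{F}_p$. Here is a self-contained sketch (the function name, the set `S` and the prime are illustrative choices):

```python
# Lagrange interpolation mod p: the Low Degree Extension of a vector v
# on a set S of the same size.
p = 97

def poly_mul(a, b):
    # product of two coefficient lists mod p
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def evaluate(cs, x):
    acc = 0
    for c in reversed(cs):
        acc = (acc * x + c) % p
    return acc

def lagrange_lde(S, v):
    """Coefficients of the unique P with deg P < len(S) and P(S[i]) = v[i]."""
    n = len(S)
    coeffs = [0] * n
    for i in range(n):
        num, denom = [1], 1            # prod_{j != i} (X - S[j])
        for j in range(n):
            if j != i:
                num = poly_mul(num, [-S[j] % p, 1])
                denom = denom * (S[i] - S[j]) % p
        scale = v[i] * pow(denom, -1, p) % p
        coeffs = [(c + scale * m) % p for c, m in zip(coeffs, num)]
    return coeffs

S = [1, 2, 3, 4]
v = [5, 0, 7, 42]
P = lagrange_lde(S, v)
print([evaluate(P, s) for s in S])     # recovers v on S
```

This naive construction is quadratic; real provers interpolate over multiplicative subgroups with FFTs, which is one reason the domain $S$ is later required to be a subgroup.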
Kate commitment
The Kate commitment scheme is a polynomial commitment scheme, meaning that it allows one to create a digest of a polynomial that is binding to the polynomial and hiding.
We describe shortly how it works. Let $\mathbb{G}$ be a group of $p$ elements generated by $g$, in which the discrete log is hard, and suppose that a trusted setup has been performed so that the sequence $g, g^{\tau}, g^{\tau^2}, \dots, g^{\tau^D}$, with $\tau \in \mathbb{F}_p$, is publicly available, but nobody knows $\tau$.
Given a degree $d$ polynomial $P = \sum_{i=0}^{d} a_i X^i$, where $d \leq D$, the Kate commitment of $P$ is the element $[P] = g^{P(\tau)} = \prod_{i=0}^{d} (g^{\tau^i})^{a_i}$.
**Binding property** Note that $[P] = g^{P(\tau)}$. To find a collision, either one has to solve the discrete log to find the value of $\tau$, which is supposedly hard, or one has to pick a polynomial $Q \neq P$ such that $Q(\tau) = P(\tau)$. But since $\tau$ is unknown, the Schwartz-Zippel lemma tells us that this happens with probability less than $d/p$. **Hiding property** It stems from the fact that $\tau$ is unknown.
Note that the commitment is additively homomorphic: $[P + Q] = [P] \cdot [Q]$.
In what follows, the group $\mathbb{G}$ needs an extra feature, called a pairing, which is a non degenerate bilinear application $e : \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$, where $\mathbb{G}_T$ is another group (the $T$ stands for target, as $\mathbb{G}_T$ is often called the target group). It allows to "multiply" elements in $\mathbb{G}$ (but only once).
This feature allows a prover to open a commitment $[P]$ at a given value $a$ of the verifier's choice:
The prover computes $H = \frac{P - P(a)}{X - a}$, which is in $\mathbb{F}_p[X]$, and gives $[H] = g^{H(\tau)}$ and the claimed value $v = P(a)$ to the verifier
The verifier checks that $e([P] \cdot g^{-v}, g) = e([H], g^{\tau} \cdot g^{-a})$ (remember that $g$ and $g^{\tau}$ are public outputs from the trusted setup).
Completeness: it stems from the bilinearity and non degeneracy of the pairing. Soundness: since $\tau$ is unknown under the hardness of discrete log, if the claimed evaluation is wrong, call it $v' \neq P(a)$, the prover has to send a commitment $[Q]$ such that by luck $\tau$ is a root of $Q(X)(X - a) - (P(X) - v')$, a non zero polynomial. The degree of $Q$ is at most $D$ due to the trusted setup. The overall expression is at most of degree $D + 1$, so according to the Schwartz-Zippel lemma the probability that $\tau$ is a root of it is at most $(D+1)/p$.
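Behind the pairing check sits the purely algebraic identity $P(X) - P(a) = H(X)(X - a)$. The sketch below computes $H$ by synthetic division modulo a small illustrative prime and verifies the identity coefficient-wise (the group and pairing operations are deliberately elided):

```python
# The quotient polynomial of a Kate opening: H = (P - P(a)) / (X - a).
p = 101

def evaluate(cs, x):
    acc = 0
    for c in reversed(cs):
        acc = (acc * x + c) % p
    return acc

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def quotient(cs, a):
    """Coefficients of H = (P - P(a)) / (X - a), by synthetic division mod p."""
    h, acc = [], 0
    for c in reversed(cs):       # from the leading coefficient down
        acc = (acc * a + c) % p
        h.append(acc)
    h.pop()                      # the last value is the remainder P(a)
    return list(reversed(h))

P = [7, 0, 2, 5]                 # P(X) = 7 + 2X^2 + 5X^3
a = 13
v = evaluate(P, a)
H = quotient(P, a)
# check P(X) - v == H(X) * (X - a) coefficient-wise mod p
lhs = [(c - (v if i == 0 else 0)) % p for i, c in enumerate(P)]
rhs = poly_mul(H, [-a % p, 1])
print(lhs == rhs)                # True
```

The pairing equation only checks this identity "in the exponent" at the secret point $\tau$, which is sound precisely because of the Schwartz-Zippel argument above.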
There is a way to batch reveal values of different commitments at a single point using a round of interaction with a verifier:
Let $[P_1], \dots, [P_k]$ be a list of Kate commitments to $P_1, \dots, P_k$. To query the values of these polynomials at $a$:
the prover sends the supposed evaluations $v_1, \dots, v_k$ of $P_1(a), \dots, P_k(a)$ to the verifier
the verifier sends a random challenge $\gamma$ to the prover
the prover computes the proof $\pi = g^{H(\tau)}$, where $H = \sum_{i=1}^{k} \gamma^{i-1} \frac{P_i - v_i}{X - a}$, and sends it to the verifier
the verifier computes $[P] = \prod_i [P_i]^{\gamma^{i-1}}$ and $v = \sum_i \gamma^{i-1} v_i$, and checks if $e([P] \cdot g^{-v}, g) = e(\pi, g^{\tau} \cdot g^{-a})$. If the last check holds, the claimed values are, with very high probability, correct.
Since the commitments are given prior to the challenge $\gamma$, it's exactly the same argument as above.
Finally, we describe how to open different commitments at different points.
Let $[P_1], \dots, [P_k]$ be a set of commitments and $a_1, \dots, a_k$ a set of points in $\mathbb{F}_p$. To open the commitments on the $a_i$:
a prover sends the individual proofs $\pi_1, \dots, \pi_k$ as well as the evaluations $v_i = P_i(a_i)$
a verifier chooses random numbers $\lambda_1, \dots, \lambda_k$, computes $C = \prod_i \left([P_i] \cdot g^{-v_i} \cdot \pi_i^{a_i}\right)^{\lambda_i}$ and $\pi = \prod_i \pi_i^{\lambda_i}$, and checks that $e(C, g) = e(\pi, g^{\tau})$
Note: this method is fine as long as there are not too many points, because computing the combined check is expensive (usually $\mathbb{G}$ is a subgroup of the rational points of an elliptic curve over a finite field, and computing the products requires a multi exponentiation on this group).
PLONK
We have a constraint system such that the $i$-th constraint binds entries of the variable vector $w$ by $w_{a_i} \times w_{b_i} = w_{c_i}$, and the whole system has the matrix representation
$$Lw \circ Rw = Ow$$
where $L$, $R$, $O$ are pseudo permutation matrices.
The goal of PLONK is to get rid of $L$, $R$, $O$ by encoding the corresponding permutations using polynomials, so they can be committed using the Kate commitment scheme. A circuit is represented by the vectors $(l, r, o) = (Lw, Rw, Ow)$, with the condition that the entries of $l$, $r$, $o$ are copies of the entries of the variable set $w$.
Example: Imagine a program computing $z = x^3$ where $x$ is a given input. The arithmetization of this circuit consists of the following set of constraints:
$$x \times x = v, \qquad v \times x = z$$
The result is $z$. With matrices, we have:
$$\begin{pmatrix}1&0&0\\0&1&0\end{pmatrix}\begin{pmatrix}x\\v\\z\end{pmatrix} \circ \begin{pmatrix}1&0&0\\1&0&0\end{pmatrix}\begin{pmatrix}x\\v\\z\end{pmatrix} = \begin{pmatrix}0&1&0\\0&0&1\end{pmatrix}\begin{pmatrix}x\\v\\z\end{pmatrix}$$
Notice that there is exactly $1$ non zero entry in each line of the matrices.
To take the PLONK version of this, we get rid of the matrices by applying the matrix multiplications, corresponding to permutations, and we end up with:
$$l = (x, v), \qquad r = (x, x), \qquad o = (v, z)$$
Here $l \circ r = o$, and the number of constraints is $n = 2$.
The general PLONK representation is therefore
$$l \circ r = o, \qquad l, r, o \in \mathbb{F}_p^n$$
We need a compact way to encode that $l$, $r$, $o$ are of this form, so a prover can convince the verifier of this.
If we concatenate $l, r, o$ in that order, we obtain a vector $d$ of size $3n$, and we observe that $d$ is invariant under certain permutations of its elements. In fact, in our example $d$ should look like this: $d = (x, v, x, x, v, z)$. The first, third and fourth entries are the same, equal to $x$, so for instance $d$ is invariant when the cycle $(1\;3\;4)$ acts on the entries. Identifying all the cycles that leave $d$ invariant (those cycles shift the entries of the duplicated variable, like $(2\;5)$ for $v$ in the example), we end up with a permutation $\sigma$ which completely characterizes a circuit-compliant solution.
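As a sanity check, here is this invariance condition on the assumed toy circuit $x \times x = v$, $v \times x = z$ (indices are 0-based in the code):

```python
# d = l || r || o for the assumed toy circuit x*x = v, v*x = z, with x = 3
x, v, z = 3, 9, 27
l, r, o = [x, v], [x, x], [v, z]
d = l + r + o                     # (3, 9, 3, 3, 9, 27)

# sigma as a list: entry i holds sigma(i) (0-based); it is the product of
# the cycles (0 2 3) for the copies of x and (1 4) for the copies of v
sigma = [2, 4, 3, 0, 1, 5]

print(all(d[sigma[i]] == d[i] for i in range(len(d))))    # True: copies agree
bad = [3, 9, 3, 5, 9, 27]         # the second right input is no longer x
print(all(bad[sigma[i]] == bad[i] for i in range(len(bad))))  # False
```

A vector is circuit-compliant exactly when it survives this permutation test, which is what claim 2 below proves in polynomial form.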
To prove the knowledge of a solution to the system, the prover needs to:
claim 1: prove that $l \circ r = o$
claim 2: prove that $d = l \| r \| o$ is invariant under the action of $\sigma$.
In what follows, we drop the public inputs and the outputs to simplify the notations, so claim 1 simply reads $l \circ r = o$ on the full vectors. Without this shortcut the notations are overloaded and it's harder to read, and it changes nothing to the explanations.
claim 1
To prove the first claim, prover and verifier agree on a set $S \subset \mathbb{F}_p$ of size $n$, and the prover computes the LDEs $L$, $R$, $O$ of $l$, $r$, $o$ on $S$ (we reuse the letters $L$, $R$, $O$: the matrices are not needed anymore).
If $l \circ r = o$, we have $Z_S \mid LR - O$, where $Z_S = \prod_{s \in S}(X - s)$.
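The divisibility claim can be checked on a toy instance: if every constraint holds, $LR - O$ vanishes on all of $S$, which is equivalent to $Z_S \mid LR - O$ since the elements of $S$ are distinct roots. A sketch with an assumed two-constraint system $x \times x = v$, $v \times v = w$ (chosen so that $LR - O$ is not identically zero):

```python
# Claim 1 in miniature: LR - O vanishes on S when the constraints hold.
p = 97
S = [1, 2]

def lde(vals):
    # the line through (S[0], vals[0]) and (S[1], vals[1]) mod p
    slope = (vals[1] - vals[0]) * pow(S[1] - S[0], -1, p) % p
    return lambda X: (vals[0] + slope * ((X - S[0]) % p)) % p

# constraints x*x = v, v*v = w with x = 2: l = (2,4), r = (2,4), o = (4,16)
L, R, O = lde([2, 4]), lde([2, 4]), lde([4, 16])
E = lambda X: (L(X) * R(X) - O(X)) % p

print([E(s) for s in S])   # [0, 0]: every element of S is a root of LR - O
print(E(5) != 0)           # True: generally non zero away from S
```

Since $LR - O$ has the $|S|$ distinct elements of $S$ among its roots, it is divisible by $Z_S = (X-1)(X-2)$ here.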
The general strategy to prove that an algebraic relation between polynomials $P_1, \dots, P_k$ holds is:
Verifier asks the prover the commitments $[P_1], \dots, [P_k]$
Verifier chooses $a$ at random, and queries the values at $a$ for all but one polynomial (say $P_k$)
Verifier computes the value $v_k$ by finding the value such that the relation evaluated at $a$ holds
Verifier asks a proof for the opening of all the $[P_i]$ at $a$
Verifier checks that the proof is correct, and that $[P_k]$ indeed opens to $v_k$
To see why this works: since the commitments are sent prior to evaluating them at $a$, the Schwartz-Zippel lemma tells us that the chances that $[P_k]$ opens correctly to a value $v_k$ satisfying the relation at $a$ are negligible if the polynomial relation does not hold.
In what we need to prove, the relation is $L(X)R(X) - O(X) = Z_S(X)H(X)$. The role of the last polynomial in the strategy is played by the quotient $H = (LR - O)/Z_S$.
So the procedure is the following, where $S$ and $Z_S$ are public:
The prover sends $[L], [R], [O], [H]$ to the verifier
Verifier queries $L$, $R$, $O$ (but not $H$) at a random challenge $a$
Verifier computes $h = (L(a)R(a) - O(a))/Z_S(a)$
Verifier checks a batch opening proof of $[L], [R], [O], [H]$ at $a$ to check that they open correctly to $(L(a), R(a), O(a), h)$
We mention that there are ways to modify this protocol to use commitments of the public data instead of directly using $Z_S$, for memory bounded verifiers.
claim 2
The goal is to prove that $d$ is invariant under $\sigma$, where $d = l \| r \| o$, by characterizing this invariance in terms of divisibility by $Z_S$, so that a similar strategy as in the first claim can be used.
The solution stems from the following property:
Let $d$ be a vector in $\mathbb{F}_p^N$, where $N = 3n$, let $s_1, \dots, s_N$ be a set of distinct elements of $\mathbb{F}_p$ and $\sigma$ a permutation of $N$ elements. We have the following equivalence:
$$\prod_{i=1}^{N} (X + d_i + s_i Y) = \prod_{i=1}^{N} (X + d_i + s_{\sigma(i)} Y) \iff d_{\sigma(i)} = d_i \text{ for all } i$$
where the left hand side is an equality in $\mathbb{F}_p[X, Y]$.
$\Leftarrow$ is obvious. For $\Rightarrow$: since $\mathbb{F}_p[X, Y]$ is a unique factorization domain, meaning there is a unique way of decomposing polynomials in irreducible components, and the $X + d_i + s_i Y$ are irreducible components, for each $i$ there is a $j$ such that $X + d_i + s_i Y = X + d_j + s_{\sigma(j)} Y$. Comparing the coefficients of $Y$ gives $s_i = s_{\sigma(j)}$, so $i = \sigma(j)$ because the $s_i$ are distinct, and therefore (with the same argument on the constant coefficients) $d_j = d_{\sigma(j)}$.
Now we assume that $X$ and $Y$ are replaced by random challenges $\gamma$ and $\beta$, chosen by the verifier. The Schwartz-Zippel lemma ensures that there is (almost) no loss of information.
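A quick numeric check of this property with random challenges; the vector, the distinct labels and the permutation below are illustrative:

```python
# Permuted-product identity with the formal variables X, Y replaced by
# random challenges gamma, beta (Schwartz-Zippel).
import random

p = 10007
d = [3, 9, 3, 3, 9, 27]          # invariant under sigma: copies agree
sigma = [2, 4, 3, 0, 1, 5]
s = [1, 5, 25, 24, 17, 8]        # distinct labels, one per entry of d

gamma, beta = random.randrange(p), random.randrange(p)

def product(labels):
    acc = 1
    for di, si in zip(d, labels):
        acc = acc * (gamma + di + beta * si) % p
    return acc

lhs = product(s)                              # prod (gamma + d_i + beta s_i)
rhs = product([s[sigma[i]] for i in range(6)])  # labels permuted by sigma
print(lhs == rhs)                # True whenever d is sigma-invariant
```

When $d$ is $\sigma$-invariant the two products contain exactly the same factors in a different order, so the equality is exact, not just probabilistic; randomness is only needed for the converse direction.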
To apply this property to $d$, we would like to have the LDE of $d$, but the problems are that
this vector is of size $3n$, and computing its LDE would require a set of interpolation of size $3n$
we would not reuse the LDEs of $l$, $r$, $o$ on $S$ that the prover already computed.
Encoding the permutation $\sigma$:
One thing not avoidable is to extend the set $S$ so that it becomes a set of $3n$ elements for encoding the permutation.
To extend $S$, pick $k_1 \notin S$ and add the set $k_1 S = \{k_1 s,\ s \in S\}$, then pick $k_2 \notin S \cup k_1 S$ and build the set $k_2 S$. Those sets are disjoint (it stems from the fact that $S$ is a group, see below).
Now that the set is extended to $S' = S \cup k_1 S \cup k_2 S$, we interpret $\sigma$ as a permutation of $S'$, and we can rewrite the products in the property by grouping the entries of $d$ into the $l$, $r$ and $o$ parts:
$$\prod_{i=1}^{3n}(\gamma + d_i + s'_{\sigma(i)} \beta) = \prod_{i=1}^{n}(\gamma + l_i + s'_{\sigma(i)} \beta)(\gamma + r_i + s'_{\sigma(n+i)} \beta)(\gamma + o_i + s'_{\sigma(2n+i)} \beta)$$
and similarly on the non-permuted side, where $s'$ enumerates $S'$ in the order $s_1, \dots, s_n, k_1 s_1, \dots, k_1 s_n, k_2 s_1, \dots, k_2 s_n$ (so $s'_{n+i} = k_1 s_i$ and $s'_{2n+i} = k_2 s_i$).
This grouping allows to compute the LDEs of $l$, $r$ and $o$ on $S$ only, and not on something $3$ times bigger. Namely
$\prod_{i=1}^{n}(\gamma + l_i + s_i\beta)(\gamma + r_i + k_1 s_i\beta)(\gamma + o_i + k_2 s_i\beta)$ is encoded as $\prod_{s \in S}(\gamma + L(s) + s\beta)(\gamma + R(s) + k_1 s\beta)(\gamma + O(s) + k_2 s\beta)$
the permuted side is encoded as $\prod_{s \in S}(\gamma + L(s) + S_{\sigma,1}(s)\beta)(\gamma + R(s) + S_{\sigma,2}(s)\beta)(\gamma + O(s) + S_{\sigma,3}(s)\beta)$, where $S_{\sigma,1}$ is the LDE of $(s'_{\sigma(i)})_{1 \le i \le n}$, $S_{\sigma,2}$ the LDE of $(s'_{\sigma(n+i)})_{1 \le i \le n}$, and $S_{\sigma,3}$ the LDE of $(s'_{\sigma(2n+i)})_{1 \le i \le n}$.
Now we can write the previous equivalence in this way: $d$ is invariant under $\sigma$ if and only if (up to a negligible soundness error coming from the random choice of $\beta$, $\gamma$)
$$\prod_{s \in S}(\gamma + L(s) + s\beta)(\gamma + R(s) + k_1 s\beta)(\gamma + O(s) + k_2 s\beta) = \prod_{s \in S}(\gamma + L(s) + S_{\sigma,1}(s)\beta)(\gamma + R(s) + S_{\sigma,2}(s)\beta)(\gamma + O(s) + S_{\sigma,3}(s)\beta)$$
Everything is in polynomial form, and the polynomials involved are LDEs on $S$, so it is a huge step forward.
From now on we require that the domain of interpolation $S$ is a multiplicative subgroup of $\mathbb{F}_p^{*}$; in particular it's a cyclic subgroup generated by an element $\omega$, of size $n$, and its elements are (in that order) $1, \omega, \omega^2, \dots, \omega^{n-1}$.
We also set the following notations:
Call $f(X) = (\gamma + L(X) + \beta X)(\gamma + R(X) + \beta k_1 X)(\gamma + O(X) + \beta k_2 X)$
Call $g(X) = (\gamma + L(X) + \beta S_{\sigma,1}(X))(\gamma + R(X) + \beta S_{\sigma,2}(X))(\gamma + O(X) + \beta S_{\sigma,3}(X))$
Call $F_i = f(\omega^i)$ and $G_i = g(\omega^i)$.
The key idea to encode the permutation using polynomials is to take the LDE $Z$ of the vector $(z_0, \dots, z_{n-1})$, so $Z(\omega^i) = z_i$, where $z_0$ is $1$, otherwise $z_i = \prod_{j=0}^{i-1} F_j/G_j$ for $i \geq 1$. The term $\prod_{j=0}^{n-1} F_j/G_j$ is not taken.
Observe that $Z(\omega^{i+1}) G_i = Z(\omega^i) F_i$ for $i$ from $0$ to $n-2$. Since $F_i = f(\omega^i)$ and $G_i = g(\omega^i)$, we almost have a relation on all of $S$ which is $Z(\omega X) g(X) = Z(X) f(X)$. The only possibility of failure is when $X = \omega^{n-1}$.
In fact, on $\omega^{n-1}$, since $S$ is a cyclic group, we have $Z(\omega \cdot \omega^{n-1}) = Z(\omega^n) = Z(1) = 1$. On the other hand, $Z(\omega^{n-1}) F_{n-1}/G_{n-1} = \prod_{i=0}^{n-1} F_i/G_i$ should be $1$ if (and only if, up to a negligible soundness error) $\prod_{i=0}^{n-1} F_i = \prod_{i=0}^{n-1} G_i$.
So the relation $Z(\omega X) g(X) = Z(X) f(X)$, together with $Z(1) = 1$, is true on all of $S$ iff $\prod_{i=0}^{n-1} F_i = \prod_{i=0}^{n-1} G_i$.
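The accumulator $Z$ can be sketched numerically: given stand-in values for $F_i = f(\omega^i)$ and $G_i = g(\omega^i)$ with $\prod F_i = \prod G_i$ (the values below are illustrative), the running product comes back to $1$ at the wrap-around:

```python
# The accumulator Z: Z(w^0) = 1, Z(w^{i+1}) = Z(w^i) * F_i / G_i mod p.
p = 10007
F = [3, 8, 2, 9]                 # stand-ins for f(w^i)
G = [8, 9, 2, 3]                 # a reordering of F, so prod F == prod G

z = [1]
for Fi, Gi in zip(F, G):
    z.append(z[-1] * Fi * pow(Gi, -1, p) % p)

# z has n + 1 entries; the last one is prod F_i / G_i and must equal z[0]
print(z[-1] == 1)                # True: the recurrence closes the cycle
# the vector actually interpolated as Z is z[0..n-1] (last entry dropped)
```

This is exactly why the polynomial relation can hold on all of $S$ (including the wrap-around point $\omega^{n-1}$) only when the two grand products agree.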
Since vanishing on $S$ is characterized by divisibility by $Z_S = \prod_{s \in S}(X - s) = X^n - 1$, and the condition $Z(1) = 1$ can be enforced with $L_1$, where $L_1$ is the LDE of $(1, 0, \dots, 0)$ on $S$, we have
$$Z_S \mid L_1(X)(Z(X) - 1)$$
and
$$Z_S \mid Z(\omega X) g(X) - Z(X) f(X)$$
We finally have a characterization of claim 2 in terms of divisibility by $Z_S$, involving $L$, $R$, $O$, $Z$ and the $S_{\sigma,i}$ (since $f$ and $g$ depend on those).
Since the permutations describe the circuit, they are part of the public data, and we suppose that $S_{\sigma,1}$, $S_{\sigma,2}$, $S_{\sigma,3}$ are precomputed. Note that the LDEs of $(s_i)$, $(k_1 s_i)$ and $(k_2 s_i)$ are directly available since they are $X$, $k_1 X$, $k_2 X$.
The divisibility claims can be collapsed into one if the verifier checks that a random linear combination $Z(\omega X) g(X) - Z(X) f(X) + \lambda L_1(X)(Z(X) - 1)$ is divisible by $Z_S$; call the quotient $T$.
We apply exactly the same strategy as in claim 1, where the relation is $Z(\omega X) g(X) - Z(X) f(X) + \lambda L_1(X)(Z(X) - 1) = Z_S(X) T(X)$ and the role of $H$ is played by $T$.
Here is a description of the proof of claim 2, where $Z_S$, $L_1$, $k_1$, $k_2$ and the commitments $[S_{\sigma,1}], [S_{\sigma,2}], [S_{\sigma,3}]$ are public:
Prover sends $[L], [R], [O]$
Verifier sends challenges $\beta, \gamma$ (used to compute $f$ and $g$)
Prover sends $[Z]$ (computed with $\beta, \gamma$)
Verifier sends a random $\lambda$ to the prover (used to collapse the divisibility claims)
Prover computes $T = \big(Z(\omega X) g(X) - Z(X) f(X) + \lambda L_1(X)(Z(X) - 1)\big)/Z_S(X)$ and sends $[T]$ to the verifier
Verifier queries $L, R, O, S_{\sigma,1}, S_{\sigma,2}, S_{\sigma,3}$ and $Z$ at a random $a$, as well as $Z$ at $\omega a$
Verifier computes $f(a)$ and $g(a)$ from these values
Verifier computes the expected value $t = \big(Z(\omega a) g(a) - Z(a) f(a) + \lambda L_1(a)(Z(a) - 1)\big)/Z_S(a)$
The verifier sends a random challenge $\gamma'$ to the prover (to batch the openings at $a$)
The prover sends a proof for the batch opening of $[L], [R], [O], [S_{\sigma,1}], [S_{\sigma,2}], [S_{\sigma,3}], [Z], [T]$ at $a$ and for the opening of $[Z]$ at $\omega a$
The verifier chooses a random $\lambda'$ and verifies the openings (using the Kate reveal at different points), checking in particular that $[T]$ opens to $t$.
Note that claim 1 and claim 2 can be collapsed into a single claim. Namely, the $\lambda$ used to check that $Z(\omega X) g(X) - Z(X) f(X) + \lambda L_1(X)(Z(X) - 1)$ is divisible by $Z_S$ can be reused to incorporate the first claim, so the final protocol checks that $Z(\omega X) g(X) - Z(X) f(X) + \lambda L_1(X)(Z(X) - 1) + \lambda^2 \big(L(X)R(X) - O(X)\big)$ is divisible by $Z_S$.
There are ways to customize this protocol to use commitments of the public data instead of their raw values, for memory bounded verifiers.
Conclusion
There are a lot of small variations to PLONK. Since PLONK relies on general polynomial commitment schemes (here only the Kate commitment scheme was presented), it is quite flexible and can be used for proof carrying data constructions such as Halo.