# Kate Commitments: A Primer

Polynomial commitments underpin a lot of ZK proof systems these days - but to the newcomer the question arises: 'what have polynomials got to do with this?' After all, our mental image of polynomials is of long wavy lines that cross the x axis a few times and then trot off to ± infinity.

### Why do we need PCSs?

**Polynomial Commitment Schemes** (PCSs) are a sort of "mathematical scaffolding" that allows cryptographers to load a large amount of information (the 'execution trace of a circuit') into a single elliptic curve point -- a single 'number'.

SNARKs (the 'S' standing for 'succinct') need to prove to a verifier that a large computation has taken place without the verifier having to re-run the whole computation (which would defeat the point). Usually, in the world of Web3, the verifier is a blockchain. The succinctness of these proofs (i.e. keeping the amount of data that needs to be sent to the verifier as small as possible) is very important on blockchains, because storing data on-chain is typically very expensive.

### Please be concrete -- what's the 'trace of a circuit'?

Take this algorithm ('circuit'): given a starting $x$, compute $x \mapsto x^3 + 5$, and repeat that 1 million times.

I am going to undertake the enormous task of running this circuit. My ultimate goal is to prove to you that I computed this, and did so correctly - without you having to re-run the whole thing.

Suppose our starting number is $x=2$. Then in the first round of computation, the prover computes:

- $x^2 = 4$
- $x^3 = x^2 \times x = 4 \times 2 = 8$
- $x^3 + 5 = 13$

So our trace is $\{ 2, 4, 8, 13, ... \}$. Each round adds three numbers to the starting value, so we will produce $3,000,001$ numbers in computing the circuit -- that's a lot of data to send, especially given the vastness of the numbers later in the sequence!

How can I convince you, the verifier, that I correctly computed all these numbers without breaking the internet trying to send them all to you?
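To make the trace tangible, here is a minimal Python sketch that generates it for a handful of rounds. The function name `trace` is my own invention, and the sketch works over plain integers as in the example above (real proof systems do their arithmetic modulo a prime, which keeps the numbers bounded).

```python
def trace(x0, rounds):
    """Return the execution trace of `rounds` repetitions of x -> x^3 + 5."""
    values = [x0]
    x = x0
    for _ in range(rounds):
        square = x * x      # x^2
        cube = square * x   # x^3
        x = cube + 5        # x^3 + 5, which feeds the next round
        values.extend([square, cube, x])
    return values


print(trace(2, 2))       # [2, 4, 8, 13, 169, 2197, 2202]
print(len(trace(2, 5)))  # 16, i.e. 1 starting value + 3 values per round
```

Two rounds already take us from 2 to 2202; don't try a million rounds over the integers at home.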
I would like to 'commit' to some piece of data that is a digest of all these $3,000,001$ numbers (that is, nail my colours to the mast so I can't shuffle my numbers around later on), and do so in such a way that you can probe me with questions, so that if I am lying, you can quickly find me out, and otherwise convince yourself I must have done this correctly. This is what succinct proofs are for.

### Fine, but what do polynomials have to do with anything?

If you did this in a really naïve way, for example by just hashing all the data together, e.g. $H(a_0| ... | a_d)$, you would be unable to test any arithmetic at all inside that 'mush' of information, and to prove you'd committed to it you would therefore have to reveal / transmit all those $a_i$ values.

Polynomial commitment schemes use polynomials almost like a vector space -- each 'dimension' is 'loaded' with one of the numbers in the 'trace' of computing a programme (by 'trace', we mean the sequence of all inputs, intermediary values and outputs computed/used in the course of running the algorithm concerned). More importantly, the *polynomial* bit is useful because it allows the 'uploaded' numbers to stay mathematically separated 'inside' that single committed number, and in such a way that mathematical relations can be tested there.

The 2010 Kate scheme (properly called the Kate-Zaverucha-Goldberg scheme) allows a prover to create some data which binds them to an evaluation of a polynomial $f(X) = \sum_{i=0}^d a_i X^i$ in a provable fashion. That data is called a 'commitment'.

### Role of the Reference String

I referred earlier to PCSs as a sort of 'mathematical scaffolding'.
What I was describing in particular is a list of precomputed elliptic-curve points known as a **Reference String** -- in the case of Kate, a list of successive powers of an unknown quantity $\alpha$, such as that generated in the recent AZTEC [Ignition Ceremony](https://www.aztecprotocol.com/ignition):

\begin{align*}
\langle g, g^\alpha, ... g^{\alpha^t} \rangle \in \mathbb{G}^{t+1}
\end{align*}

where $\mathbb{G}$ is some elliptic curve group. What are these quantities? Well, they're just the polynomials (monomials) $1, X, X^2, ... X^t$ evaluated at a number $\alpha$, and written in elliptic curve form rather than integer form.

### The t-Strong Discrete Log Assumption

There is a big assumption we rely on: that even if the whole world knows these elliptic curve points $g$, $g^\alpha$, ... $g^{\alpha^t}$, no one can back out $\alpha$. It's known as the **$t$-Strong Discrete Log Assumption**. It sounds hard, but it's simple to state: given this string of elliptic curve points, it's computationally infeasible to find $\alpha$.

That's quite eyebrow-raising if you've not seen it before. Actually, when $t$ gets very big (~$2^{40}$ for Aztec's chosen curve BN254) the Reference String provides so much structural information that it does indeed start to corrode the curve's security.

Anyway, this Reference String provides us with little monomial blocks of the form $g^{\alpha^i}$, from $i=0$ (the generator point) right up to $i=d$, where $d \leq t$ is the degree of the biggest polynomials we want this Reference String to be used for.

:::spoiler On the Length of the Reference String
> **Length of Reference String:** These polynomials 'represent' the data to which we've committed (often, we are etching the trace of numbers created in performing a computation into these polynomials -- e.g. the execution of a smart contract -- that is, the inputs, the intermediate values, and the outputs).
The more terms $g^{\alpha^i}$ we include in our Reference String on Day 1, the greater the complexity of computations this Reference String can support. For example, a Reference String comprising just $\langle g^{\alpha^0}, g^{\alpha^1}, g^{\alpha^2} \rangle$ can never help us 'store' more than three numbers - most computations will create many more than 3 numbers in the course of execution! So when you set your number $d$, you need to be really sure it's big enough to support all conceivable computations.
:::

Along with polynomial coefficients, these little blocks can be used to create encrypted evaluations of a given polynomial $f(X):=\sum_{i=0}^d a_i.X^i$ in the obvious way, by just exponentiating those blocks by the coefficients of $f(X)$, and multiplying them all together:

\begin{align*}
g^{f(\alpha)} = g^{a_0 + a_1.\alpha + a_2.\alpha^2 + ... a_d.\alpha^d} = g^{a_0}.g^{a_1.\alpha}... g^{a_d.\alpha^d} = (g)^{a_0}.(g^{\alpha^1})^{a_1}...(g^{\alpha^d})^{a_d}
\end{align*}

And this is where the **polynomial commitment scheme** begins -- the Prover has now created a 'commitment' to a polynomial. It has bound the Prover to *this particular polynomial* (so they can't change their mind later on). Using the Reference String, the Prover had enough building blocks to form $g^{f(\alpha)}$, so long as $deg(f) \leq t$, where $g^{\alpha^t}$ is the maximum power in the Reference String. And in doing so, their first stake is in the ground, and the tent can't be pitched anywhere else (pictorial analogy to follow shortly).

### A Mental Image

So often a good picture can be instructive. The image I tend to have in mind for polynomial commitment schemes is pitching a tent. You start by putting in the central post, which roots the tent -- 'committing' at the unmanipulable, unknowable value $\alpha$.
To make a proof, you tie a number of guy ropes (different evaluation points) -- as many as are required to support the structure of the canopy (the zero knowledge proving scheme relying on this commitment scheme). Some schemes may only require one such point (and here the analogy gets a bit flimsy -- it would be an odd-looking tent!). Others may require two or more.

![Picturing Polynomial Commitment Schemes](https://i.imgur.com/eW5SxIt.png)

### 1. Commitment

> The prover starts by 'committing' to their chosen polynomial, by evaluating $f$ at a value they don't know and can't influence -- for this they use the Reference String

How do you commit to a value you don't know? Well, we can commit to the polynomial evaluated at this mystery number $\alpha$ whilst that number is inside the exponent of the point $g$. After all, we know the points $g^{\alpha^i}$ for each power $i$:

\begin{align*}
C = g^{f(\alpha)} = \prod_{i=0}^{d} (g^{\alpha^i})^{a_i}
\end{align*}

So this number is the value that $f$ takes at $\alpha$, but encrypted as an elliptic curve point.

### 2. Opening

> Prove that the inscrutable $C$ value actually represents a polynomial in $\alpha$, the secret setup value

Some time after the Commitment was published, the Verifier says -- "now, prove your original commitment $C$ really was the encrypted evaluation of some polynomial at $\alpha$". In fact, says the Verifier, here's a point $\beta$ at which to make your proof (which is almost certainly $\neq \alpha$).

The Prover does a quick calculation which they could only do if $C$ was really $g^{f(\alpha)}$:

\begin{align*}
\psi_\beta (\alpha) = \frac{f(\alpha) - f(\beta)}{\alpha - \beta}
\end{align*}

The observation of Kate here is the following -- given a polynomial $f(X)$ and any point $\beta$, the polynomial $f(X) - f(\beta)$ is divisible by $X-\beta$. This is because $f(X) - f(\beta)$ vanishes at $X=\beta$, i.e. $f(X)|_{X=\beta} - f(\beta) = 0$, so $\beta$ is a root and $X-\beta$ is a factor.
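The divisibility claim is easy to check by hand with synthetic division. The sketch below is only an illustration over plain integers; the helper names `evaluate` and `divide_by_linear` are hypothetical, and coefficients are listed lowest degree first.

```python
def evaluate(coeffs, x):
    """Evaluate a polynomial (lowest-degree-first coefficients) with Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def divide_by_linear(coeffs, beta):
    """Divide a polynomial by (X - beta) using synthetic division.

    Returns (quotient coefficients, remainder).
    """
    quotient = [0] * (len(coeffs) - 1)
    carry = 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = coeffs[i] + carry * beta
        quotient[i - 1] = carry
    remainder = coeffs[0] + carry * beta
    return quotient, remainder


f = [5, 0, 0, 1]           # f(X) = X^3 + 5
beta = 2
y = evaluate(f, beta)      # f(2) = 13

# Honest case: f(X) - f(beta) divides cleanly by (X - beta).
psi, rem = divide_by_linear([f[0] - y] + f[1:], beta)
print(psi, rem)            # [4, 2, 1] 0, i.e. X^2 + 2X + 4 with no remainder

# Dishonest case: claim f(2) = 12 instead, and a remainder appears.
_, bad_rem = divide_by_linear([f[0] - 12] + f[1:], beta)
print(bad_rem)             # 1
```

The non-zero remainder in the second case is exactly what a cheating Prover cannot hide.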
:::spoiler Another way to think about this
> You can also think of this as 'deleting the constant term' -- note that $f(X) - f(\beta) = \sum_i (a_i.X^i - a_i.\beta^i)$. Now for $i=0$ this is just $a_0 - a_0 = 0$ and all the other terms are of the form $a_i (X^i - \beta^i)$. And $X - \beta$ is a factor of each of these. So the whole thing is divisible by $X-\beta$.
:::

Now, the Prover must provide the Verifier with the following information:

\begin{align*}
\langle \beta, f(\beta), g^{\psi_\beta(\alpha)} \rangle
\end{align*}

Now, if the value sent as $f(\beta)$ isn't the true evaluation, we have a problem -- because then the division leaves a remainder, and $g^{\psi_{\beta}(\alpha)}$ would have to contain negative monomial powers (because of that $\alpha - \beta$ term in the denominator). There are no negative powers in the Reference String by definition! We'll discuss this a bit further in the next section.

### 3. Verify

> The computation the Verifier (read: smart contract) must do to check the proof is correct -- more precisely, they must check the relationship between the original commitment $C$ and the quotient $g^{\psi_\beta (\alpha)}$.

It is then quite a simple matter for the Verifier to check that the following holds:

\begin{align*}
e(C, g) \stackrel{?}{=} e(g^{\psi_\beta(\alpha)}, g^\alpha. g^{-\beta}).e(g,g)^{f(\beta)}
\end{align*}

Hmm. Why? What we're trying to do is to work out whether the Prover successfully divided the polynomial $(f(X) - f(\beta))$ by the linear term $(X-\beta)$ with no remainder -- and recall that comparing factors is exactly what pairings do.

Take another look -- the left hand side of the equation is:

\begin{align*}
e(C, g) = e(g^{f(\alpha)}, g) = e(g, g)^{f(\alpha)}
\end{align*}

Ok, so that's a constituent bit of the more complicated term $\psi_\beta (\alpha) = \frac{f(\alpha)-f(\beta)}{\alpha - \beta}$.
And the right hand side is:

\begin{align*}
e(g^{\psi_\beta(\alpha)}, g^{\alpha - \beta}).e(g, g)^{f(\beta)} &= e(g,g )^{\psi_\beta(\alpha).(\alpha - \beta)}.e(g, g)^{f(\beta)} \\
&= e(g,g )^{\frac{f(\alpha) - f(\beta)}{\alpha - \beta}.{(\alpha - \beta)}}.e(g, g)^{f(\beta)}
\end{align*}

-- which you can see reduces to $f(\alpha)$ in the exponent, matching the left hand side.

:::spoiler Something that may be gnawing at you
- I'm uncomfortable about this. I thought $\mathbb{G}$ was a cyclic group of prime order, i.e. we can think of the numbers in the exponent as elements of $\mathbb{Z}_r$, for some prime $r$?
- Yes, that's right
- Well, $\mathbb{Z}_r$ is then a field -- multiplicative inverses everywhere
- That's also right
- So what exactly am I proving by dividing out by this $\alpha - \beta$ term? So long as they're not equal, no problem?
- Actually, without access to this linear-division argument, there's a huge problem. Remember $\alpha$ has only ever been served to you in encrypted form, as $g^\alpha$ etc. So you could only perform the division by knowing the polynomial result. Remember always that $\alpha$ is locked inside this unwieldy elliptic curve point, which means you can neither divide nor multiply by $\alpha$ - only add.
:::

### 4. Breathe

Take stock -- what have we just done?

- The **Prover** committed to a polynomial using the Reference String (i.e. gluing together terms of the form $g^{\alpha^i}$, exponentiating by the polynomial $f$'s coefficients to make $g^{f(\alpha)}$)
- The **Verifier** said 'hang on, how do I know you've committed to any kind of polynomial'?
Please evaluate this thing at a point of my choosing $\beta$, and come back to me and prove that both values are evaluations of the same polynomial
- The **Prover** comes back with two items -- the evaluation $f(\beta)$ and an elliptic curve point encoding the difference between the evaluations $f(\alpha)$ and $f(\beta)$, divided out by the linear term $(\alpha - \beta)$
- The **Verifier** then uses a nice(ish) pairings equation to check this computation was done correctly inside the exponent of $g$

Notice one very important thing -- when the Verifier was running their checks, they had to use the element $g^\alpha$ from the Reference String. This means that not only is the Prover proving they can find this solution, but that it was created out of a degree 1 or larger polynomial in $\alpha$. In other words, the Prover showed the 'polynomialness' of the thing which was originally committed to.