# Circle STARKs writeup

## Notation

- $x \in X$ means $x$ is an element of the set $X$.
- $\mathbb{F}$ is a finite field.
- $\mathbb{F}^N = \{(x_1,\dots,x_N): x_1,\dots,x_N \in \mathbb{F}\}$.
- $\mathbb{F}[X]$ is the set of polynomials in one variable with coefficients in $\mathbb{F}$.
- $\mathbb{F}[X, Y]$ is the set of polynomials in two variables with coefficients in $\mathbb{F}$.

## STARKs

Over the past few years many different STARK implementations have appeared. So far, one thing they all have in common is that the order of the base field is a smooth prime $p$. That means it is a prime such that $p-1$ is divisible by a high power of two, say $p-1 = 2^nk$ with $n$ larger than $20$. These choices are efficiency-driven. Such primes allow for fast implementations of the algorithms needed in the generation of the proof, like the Fast Fourier Transform.

Let's loosely recall what the happy path looks like in a STARK prover.

1. Interpolate the columns $T_i$ of the trace to polynomials $p_i\in\mathbb{F}[X]$ over a domain $D=\{d_0, \dots, d_{N-1}\}$ efficiently and send the oracles $[p_i]$ to the verifier.
1. Given the interpolants $p_1, \dots, p_k$ of all the columns, construct from them the polynomial $f\in\mathbb{F}[X]$ that encodes the AIR constraints and satisfies $f(d) = 0$ for all $d\in D$.
1. Write $f = vt$, where $v, t$ are polynomials and $v$ is the public *vanishing* polynomial of $D$. The polynomial $v$ is efficient to evaluate. Send the oracle $[t]$ to the verifier.
1. Open the polynomials $p_i(z)$, $t(z)$ for a random $z\in\mathbb{F}$ chosen by the verifier. Or $z\in\mathbb{E}$ for an extension field $\mathbb{E}$ of $\mathbb{F}$ in case $\mathbb{F}$ is a small field.

In every finite field one can start with any non-zero element $w$ and construct the set formed by its powers: $D = \{1, w, w^2, w^3, \dots\}$. Since the field is finite, the powers eventually cycle back to $1$, and so the set $D$ looks like $D = \{1, w, \dots, w^{N-1}\}$, where $N$ is the least positive integer such that $w^N=1$. In addition, when the finite field is $\mathbb{F}_p$ with $p$ a smooth prime, one can choose $N$ to be a large power of two. The existence of these sets $D = \{1, w, \dots, w^{2^n-1}\}$ with $w^{2^n}=1$ is the main reason to choose smooth primes. Let's zoom in and see why.

### Fast Fourier Transform

Given a domain $D=\{d_0, \dots, d_{N-1}\}$ and a column $T = (t_0,\dots,t_{N-1}) \in \mathbb{F}^N$, interpolating means computing the unique polynomial $p$ of degree at most $N-1$ such that $p(d_i) = t_i$ for all $i=0,\dots,N-1$. The set $D$ can be any set of $N$ distinct elements for such polynomials $p$ to always exist (and be unique). But computing them is horribly slow for a generic set $D$. The Fast Fourier Transform is a fast algorithm for computing them when the set $D$ is of the form
$$D = \{1, w, w^2, \dots, w^{N-1}\}$$
for some element $w\in\mathbb{F}$ such that $w^N=1$ and with $N=2^n$ for some $n$. It is a divide-and-conquer algorithm that exploits the consequences of the equality $w^N=1$ together with $N$ being a power of two, so both properties are important for the algorithm to work.

If $w=-1$, then $D = \{1, -1\}$ is such a set for $N = 2^1$. But that's the only set of that shape that's guaranteed to exist for every prime field $\mathbb{F}$ (except for $\mathbb{F}_2$). A necessary and sufficient condition for a set $D = \{1, w, \dots, w^{N-1}\} \subseteq \mathbb{F}$ with $N$ distinct elements and $w^N = 1$ to exist is that $N$ divides $p-1$.
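To make this concrete, here is a minimal sketch of the radix-2 FFT over a prime field, with interpolation obtained as its inverse. It is only an illustration: the prime $p=17$, the generator and the column values below are chosen arbitrarily.

```python
def fft(coeffs, w, p):
    """
    Evaluates the polynomial with coefficient list `coeffs` at the points
    1, w, w^2, ..., w^(N-1), where N = len(coeffs) is a power of two and
    w has order N modulo p. Classic radix-2 divide and conquer.
    """
    N = len(coeffs)
    if N == 1:
        return coeffs[:]
    even = fft(coeffs[0::2], pow(w, 2, p), p)
    odd = fft(coeffs[1::2], pow(w, 2, p), p)
    result = [0] * N
    power = 1
    for i in range(N // 2):
        # p(w^i) = even(w^{2i}) + w^i * odd(w^{2i}), and w^{N/2} = -1
        result[i] = (even[i] + power * odd[i]) % p
        result[i + N // 2] = (even[i] - power * odd[i]) % p
        power = (power * w) % p
    return result

def ifft(values, w, p):
    """
    Interpolation: recovers the coefficients of the unique polynomial of
    degree < N whose evaluations over 1, w, ..., w^(N-1) are `values`.
    It is the FFT run with the inverse root, scaled by N^{-1}.
    """
    N = len(values)
    coeffs = fft(values, pow(w, -1, p), p)
    n_inv = pow(N, -1, p)
    return [(c * n_inv) % p for c in coeffs]

p = 17
w = pow(3, 2, p)  # 3 has order 16 mod 17, so w = 3^2 has order 8
column = [2, 0, 1, 4, 4, 1, 16, 3]
coeffs = ifft(column, w, p)

# Sanity check: evaluating the interpolant over D gives the column back
assert fft(coeffs, w, p) == column
```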
That's why prime fields of order $p = Nk+1$ with large $N=2^n$ are desirable: they allow for the use of the FFT to interpolate large columns.

### Factorization and zerofiers

There's another aspect where one can leverage domains $D\subseteq \mathbb{F}$ of the form $\{1, w, \dots, w^{N-1}\}$ with $w^N=1$ and $N=2^n$. After computing all the polynomials that interpolate the columns of the trace, they are combined (added, multiplied by constants, multiplied by each other) in a specific way determined by the AIR, such that the result $f$ is a polynomial that satisfies $f(d) = 0$ for all $d\in D$.

The theory of univariate polynomials says that $f(d)=0$ for a specific $d$ if and only if $f$ can be written as $f = (X-d)t$ for some polynomial $t$. Iterating this fact we get that $f(d)=0$ for all $d\in D$ if and only if there exists a polynomial $t$ such that
$$ f = (X-d_0)(X-d_1)\cdots(X-d_{N-1})t,$$
where $D = \{d_0,\dots,d_{N-1}\}$. The factor $(X-d_0)\cdots(X-d_{N-1})$ is called a *zerofier*.

At some point the STARK prover needs to evaluate $t(a)$ for some $a\in\mathbb{F}$, for which it computes
$$t(a) = \frac{f(a)}{(a-d_0)\cdots(a-d_{N-1})}.$$
But the product $(a-d_0)\cdots(a-d_{N-1})$ has $N$ factors. So when $N$ is large, say $N=2^{20}$, this cost is far from negligible. The advantage of the sets of the form $D = \{1, w, \dots, w^{N-1}\}$ with $w^N=1$ is the equality
$$ (a-1)(a-w)\cdots(a-w^{N-1}) = a^N - 1,$$
which is a handy shortcut, since computing $a^N - 1$ is much faster using a square-and-multiply algorithm. Especially if $N=2^n$, in which case computing $a^N$ takes only $n$ squarings. And so $t(a)$ ends up having the nice form $t(a) = f(a) / (a^N - 1)$.

### Shifting

Another very important aspect of domains $D$ of this shape is the relation an interpolating polynomial has with the interpolants of the shifted data. In the context of STARKs this is very important, as it is the way to model constraint systems. For example, suppose $T = (t_0, \dots, t_{N-1})$ is a column holding a Fibonacci sequence. That means $t_{i+2} = t_{i+1} + t_i$. More precisely, suppose $p$ is the interpolant of $T$ at the domain $D=\{1, w, \dots, w^{N-1}\}$ with $w^N=1$. Then $p(w^i) = t_i$ for all $i$. Suppose we write $p$ in its coefficient form
$$p = p_0 + p_1 X + \cdots + p_{N-1}X^{N-1},$$
with $p_i\in\mathbb{F}$. Then $p(wX)$ denotes the polynomial
$$p(wX) = p_0 + p_1wX + p_2w^2X^2 + \cdots + p_{N-1}w^{N-1}X^{N-1}.$$
It is another polynomial, different from $p$. Similarly, $p(w^2X)$ denotes
$$p(w^2X) = p_0 + p_1w^2X + p_2w^4X^2 + \cdots + p_{N-1}w^{2(N-1)}X^{N-1}.$$
Then, the Fibonacci constraint is that the polynomial $f := p(w^2X) - p(wX) - p(X)$ satisfies $f(w^i) = 0$ for all $i=0, \dots, N-3$. This ability to express polynomial equations relating the entries of a column to its previous values is particularly easy to achieve for sets $D$ with such multiplicative structure.

### Takeaway: good interpolation domains

The takeaway here is that, for efficiency reasons, the primes so far chosen for implementations of STARKs have the form $p = 2^nk + 1$ for some big $n$, where "big" means that the expected traces that are going to be fed to the prover have at most $2^n$ rows. And this is so because that property translates into the existence of domains $D=\{1, w, \dots, w^{2^n-1}\}$ with $w^{2^n}=1$. It's the structure of these sets that enables all the advantages. Let's dig a little into how one constructs these sets, since it will be illustrative for what's next.
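Before moving on, here is a quick numerical check of the zerofier shortcut from the previous subsections (a sketch; the prime $p=17$, the generator and the evaluation point are chosen only for illustration):

```python
p = 17
w = pow(3, 2, p)   # an element of order N = 8 modulo 17
N = 8
D = [pow(w, i, p) for i in range(N)]
a = 5              # an arbitrary evaluation point

# Naive zerofier evaluation: N multiplications
naive = 1
for d in D:
    naive = (naive * (a - d)) % p

# Shortcut: a^N - 1, computed with log2(N) squarings
shortcut = (pow(a, N, p) - 1) % p

assert naive == shortcut
print(naive, shortcut)
# 15 15
```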
### Generators

For every prime $p$ there exists an element (many, actually) $w$ such that the set $\{1, w, \dots, w^{p-2}\}$ is equal to the set $\{1, 2, \dots, p-1\}$. That means the powers of $w$ cover all nonzero elements. An element $w$ with that property is called a *generator of the units group*. For example, if we take $p=13$, then the element $w=2$ is a generator of the units group:

```python
p = 13
w = 2
powers_of_w = [pow(w, i, p) for i in range(p-1)]
print(powers_of_w)
# [1, 2, 4, 8, 3, 6, 12, 11, 9, 5, 10, 7]

# `powers_of_w` has `p-1` distinct elements
assert(len(set(powers_of_w)) == len(powers_of_w) == p-1)
```

Why do such elements $w$ always exist? That's a non-trivial fact and out of the scope of these notes. But they will always be there. What's easier to grasp is the fact that if we take any $w$ (generator or not) and construct the set of its powers, then it will eventually cycle back to $1$. This is a consequence of $\mathbb{F}_p$ being finite. So that's a way of trying to construct good interpolation domains $D$. We just need to get one with a power-of-two number of elements.

```python
def generate_set_D(w, p):
    """
    Returns the set of powers of w modulo p,
    D = [1, w, w**2, ..., w**(N-1)],
    where N is the least positive integer such that `w**N == 1`.
    """
    D = [1]
    new_element = w % p
    while new_element != 1:
        D.append(new_element)
        new_element = (new_element * w) % p
    return D

p = 13

# Number of elements of every possible generated set
print([
    len(generate_set_D(w=i, p=p))
    for i in range(1, p)
])
# [1, 12, 3, 6, 4, 12, 12, 4, 3, 6, 12, 2]
```

As you can see from this example, the possible sizes of such sets always divide $12$, which is $p-1$. And also, for every divisor of $12$ there is a set of powers of some $w$ with that size. That's a general fact for every prime $p$, and the reason why $p-1$ must be divisible by a large power of $2$ for a prime $p$ to be the base field modulus of a STARK implementation. Because then there are nice interpolation domains to use: the ones with a size equal to $2^k$ for some $k$.

#### Generator of the largest interpolation domain suitable for FFT

If $2^n$ is the largest power of two that divides $p-1$, then there is an element $w$ such that $D=\{1, w, \dots, w^{2^n-1}\}$ has $2^n$ distinct elements and $w^{2^n} = 1$. How to construct it doesn't really matter now. There are algorithms and ways to find it, and it is fixed once and for all. What's more important is that, given that element, one can construct good interpolation domains for smaller sizes $2^k < 2^n$. And that's as easy as taking the set generated by $w^{2^{n-k}}$.

```python
p = 17

# The set generated by the powers of this `w` has size 16 = 2^4
w = 3

# The set generated by the element `w^{2^i}` has size `2^{4-i}`
print([
    len(generate_set_D(w=pow(w, 2**i, p), p=p))
    for i in [0, 1, 2, 3, 4]
])
# [16, 8, 4, 2, 1]
```

And this is important because one would usually take the smallest interpolation domain in which the columns fit. So being able to adjust it to the particular trace is handy.

## Circle STARKs

The primes that are the aim of the Circle STARKs paper are the Mersenne primes, which are the primes of the form $p = 2^m-1$. So why are these primes not smooth? If $p=2^m-1$, then $p-1 = 2^m - 2 = 2(2^{m-1}-1)$. Since $2^{m-1}-1$ is an odd number, no further factor of $2$ can be extracted from it. With the notation of the previous section, $p = 2^nk + 1$ with $k$ odd and $n=1$.
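As a quick check, here is a tiny sketch confirming that for Mersenne primes only a single factor of $2$ divides $p-1$ (the examples used are $7$, $127$ and the Mersenne prime $2^{31}-1$):

```python
def power_of_two_in(n):
    """Returns the largest k such that 2^k divides n."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k

# For a Mersenne prime p = 2^m - 1, only one factor of 2 divides p - 1,
# so the largest classical FFT domain in F_p has size 2.
for p in [7, 127, 2**31 - 1]:
    assert power_of_two_in(p - 1) == 1
```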
Therefore, with the usual FFT techniques, this prime would only allow interpolation of columns of length $2$, which is useless in practice. This inability to run the classical FFT over $\mathbb{F}_p$ for Mersenne primes $p$ is the problem that motivated the Circle STARKs paper. The authors find a workaround to do FFT for these primes by going from the realm of univariate polynomials and domains $D\subseteq\mathbb{F}$ to that of bivariate polynomials and domains contained in $\mathbb{F}^2$.

### Domains in $\mathbb{F}^2$

In classical FFT, the interpolation domains are the sets $D = \{1, w, \dots, w^{2^n-1}\}$ with $w^{2^n}=1$, which in particular are subsets of $\mathbb{F}$. Since such domains don't exist for the Mersenne primes, the idea of Circle FFT (and ECFFT before that) is to go and look for domains elsewhere. And the answer turns out to be sets $D$ contained in $\mathbb{F}^2$. This is the set of pairs of elements in the finite field $\mathbb{F}$:
$$ \mathbb{F}^2 = \{(x,y): x, y \in\mathbb{F}\}$$

But here is the question: if one aims to do FFT and knows that good domains are made out of powers $w^i$ of some element $w$ such that $w^N=1$, then we need a notion of multiplication between elements of $\mathbb{F}^2$. One could go ahead and define the most obvious multiplication rule, that is, the component-wise multiplication $(x,y)(x',y') = (xx', yy')$. But this doesn't add anything new. The same problem we had before in $\mathbb{F}$ carries over to this setting if we use this multiplication rule. More precisely, if such a $(w_0, w_1)$ existed whose powers generate a set $D$ of size $2^n$, then at least one of $w_0$ and $w_1$ would generate a domain of size $2^n$ in $\mathbb{F}$. And those don't exist.

It turns out that one can define a multiplication in a very natural way that works. It is the same rule that the complex numbers have:
$$(x,y)\cdot (x', y') = (xx'-yy', xy'+yx')$$
So for example, when $p=7$, then $(2, 4)\cdot (0, 1) = (3, 2)$. In Python this would be

```python
def multiplication_rule(a, b, p):
    result_x = (a[0] * b[0] - a[1] * b[1]) % p
    result_y = (a[0] * b[1] + b[0] * a[1]) % p
    return (result_x, result_y)

a = (2, 4)
b = (0, 1)
p = 7
print(multiplication_rule(a, b, p))
# (3, 2)
```

The reference to the complex numbers is explained as follows. A complex number is expressed as $x + yi$ with $x,y$ real numbers and $i$ the number such that $i^2=-1$. So if two complex numbers are multiplied and everything is expanded out, one obtains
$$(x+yi)(x'+y'i) = xx' + xy'i + yix' + yiy'i.$$
Then, using that $yiy'i = yy'ii = yy'i^2 = -yy'$, and that $xy'i + yix' = (xy' + yx')i$, we get
$$(x+yi)(x'+y'i) = (xx' - yy') + (xy' + yx')i.$$
And there it is, the same rule. But in our context $x,x',y,y'$ are not real numbers, they are all elements of the prime field $\mathbb{F}$.

The set $\mathbb{F}_p^2$ with the above rule of multiplication, and component-wise addition, turns out to be a field when $p$ is a Mersenne prime. It is a field with $p^2$ elements! And many properties of prime fields carry over to it. In particular, one key property that we mentioned in the previous section for $\mathbb{F}_p$ is that an element $w$ such that $D=\{1, w, \dots, w^{N-1}\}$ with $w^N=1$ exists if and only if $N$ divides $p-1$. Well, a very similar thing holds for $\mathbb{F}_p^2$. A set $D$ with $N$ different elements of the form
$$D = \{(1, 0), (w_0,w_1), \dots, (w_0, w_1)^{N-1}\},$$
with $(w_0, w_1)^N=(1,0)$, exists if and only if $N$ divides $p^2-1$.
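As a quick sanity check of these two claims, that the multiplication rule turns $\mathbb{F}_7^2$ into a field and that the sizes of the generated sets divide $p^2-1$, here is a brute-force sketch for $p=7$. It reuses `multiplication_rule` from above; the helper `order_in_F2` is written just for this check.

```python
import itertools

p = 7
nonzero = [ab for ab in itertools.product(range(p), repeat=2) if ab != (0, 0)]

# Field check: every nonzero element has a multiplicative inverse
for a in nonzero:
    assert any(multiplication_rule(a, b, p) == (1, 0) for b in nonzero)

def order_in_F2(a, p):
    """Least positive N with a^N == (1, 0) under the complex-style rule."""
    power, N = a, 1
    while power != (1, 0):
        power = multiplication_rule(power, a, p)
        N += 1
    return N

# Order check: the number of powers of any nonzero element divides p^2 - 1 = 48
assert all((p**2 - 1) % order_in_F2(a, p) == 0 for a in nonzero)
```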
Now, note that $p^2 - 1 = (p-1)(p+1)$. And if in addition $p=2^m-1$ is a Mersenne prime, then
$$p^2 - 1 = (p+1)(p-1)= 2^m(p-1) = 2^m(2^m-2) = 2^{m+1}(2^{m-1}-1).$$
This implies that $N$ can be taken to be as large as $2^{m+1}$.

For example take $p=7$. It is a Mersenne prime, since $7 = 2^3 - 1$. That is, $m=3$. The set $\mathbb{F}_7^2$ has $49=7^2$ elements. And
$$7^2 - 1 = 48 = 2^4(4 - 1) = 2^4\cdot 3.$$
That means that such sets $D$ with $2^4 = 16$ elements exist.

In summary, using the multiplication rule inspired by the complex numbers, for Mersenne primes $p=2^m-1$ there exist elements $(w_0, w_1)$ such that their powers cycle back after $2^{m+1}$ steps, giving rise to sets $D=\{(1,0), (w_0, w_1), \dots, (w_0, w_1)^{2^{m+1}-1}\}$, with $(w_0, w_1)^{2^{m+1}}=(1,0)$.

This of course is only the beginning of this approach. There are still many remaining questions, like whether there exists an FFT algorithm that works over such sets and can leverage their properties. But more importantly, what does it even mean to interpolate over sets $D\subseteq\mathbb{F}^2$? Let's do an example before answering those questions.

#### Example in $\mathbb{F}_7$

Let $w_0=2$ and $w_1=4$. Let's write some Python code to generate the powers of $(w_0, w_1)$ using the multiplication rule described above.

```python
def generate_set_D_in_F2(w0, w1, p):
    """
    Returns the set of powers of (w_0, w_1) in 𝔽²,
    D = [(1, 0), (w_0, w_1), (w_0, w_1)**2, ..., (w_0, w_1)**(N-1)],
    where N is the least positive integer such that `(w_0, w_1)**N == (1, 0)`.
    """
    D = [(1, 0)]
    w0 = w0 % p
    w1 = w1 % p
    new_element = (w0, w1)
    while new_element != (1, 0):
        D.append(new_element)
        new_element = multiplication_rule(new_element, (w0, w1), p)
    return D

(w0, w1) = (2, 4)
D = generate_set_D_in_F2(w0, w1, 7)

# The resulting set `D` has 16 different elements
assert(len(set(D)) == len(D) == 2**4)

print(D)
# [(1, 0), (2, 4), (2, 2), (3, 5), (0, 1), (3, 2), (5, 2), (2, 3), (6, 0), (5, 3), (5, 5), (4, 2), (0, 6), (4, 5), (2, 5), (5, 4)]
```

Therefore $D$ is a subset of $\mathbb{F}_7^2$ similar in spirit to the classical domains used for interpolation in STARKs, in the sense that it has $2^4=16$ elements (a power of two), and it is formed by the powers of an element $(w_0, w_1) := (2,4)$ such that $(w_0, w_1)^{16}=(1,0)$.
$$
\begin{align}
D = \{&(1, 0), (2, 4), (2, 2), (3, 5), (0, 1), (3, 2), (5, 2), (2, 3),\\& (6, 0), (5, 3), (5, 5), (4, 2), (0, 6), (4, 5), (2, 5), (5, 4) \}
\end{align}
$$

Note the potential of this approach. The field $\mathbb{F}_7$ has only $7$ elements, and the only set contained in $\mathbb{F}_7$ suitable for classical FFT is the set $\{1, -1\}$ (since the only power of $2$ that divides $p-1=6$ is $2$). But by going to $\mathbb{F}_7^2$ we are able to get a set $D$ of size $16$ that has very similar properties to those classical FFT-friendly sets. Namely, its size is $16$, a power of $2$, and it is the set of powers of a single element: $(2, 4) \in \mathbb{F}_7^2$ such that $(2, 4)^{16} = (1, 0)$.

### Complex numbers, rotations

The reference to the complex numbers is not anecdotal. If in addition to the multiplication rule defined above we consider component-wise addition, then $\mathbb{F}^2$ is a field. It is the field with $p^2$ elements. And there's a very clear parallelism with the algebraic properties of $\mathbb{C}$.

The element $(1, 0)$ really behaves like the element $1$ of the complex numbers. It is the neutral element for the multiplication rule. That means that $(1, 0) \cdot (x, y) = (x, y)$ for all $(x, y)$. It is just like the number $1$.
$$(1, 0)\cdot(x, y) = (1\cdot x - 0\cdot y, 0\cdot x + 1\cdot y) = (x, y)$$

And if $(1, 0)$ is like the number $1$, then $(-1, 0)$ is like the number $-1$:
$$(-1, 0)\cdot(x, y) = (-x, -y), \text{ for all } (x, y).$$

Similarly, the element $(0, 1)$ behaves like the complex number $i$. For example
$$(0, 1)\cdot(0, 1) = (0\cdot0 - 1\cdot 1, 0\cdot 1 + 1\cdot 0) = (-1, 0).$$
So the equality $(0, 1)\cdot(0, 1) = (-1, 0)$ is the same as $i^2 = -1$.

But the parallelism is more profound than that. Since $\mathbb{F}^2$ is a field, polynomial equations in one variable have at most as many solutions as their degree. For example, in $\mathbb{C}$ the equation $z^4 - 1 = 0$ has only four solutions, namely $\{1, i, -1, -i\}$. In $\mathbb{F}^2$, we could ask which elements $(x, y)$ satisfy $(x, y)^4 = (1, 0)$. The answer is *the same*. The solutions are
$$\{(1, 0), (0, 1), (-1, 0), (0, -1)\}.$$
It is easy to check that these elements satisfy the equation. And they are all the possible solutions, since the polynomial $X^4 - 1$ has at most $4$ roots in any field. That includes $\mathbb{F}^2$.

### What does it mean to interpolate over a set contained in $\mathbb{F}^2$?

Let's briefly recall the classical FFT setting. After settling on a domain $D = \{1, w, \dots, w^{N-1}\}$, one would use it to interpolate columns $T=(t_0, \dots, t_{N-1})$ of size $N$, with $t_i\in\mathbb{F}$. That means finding the unique polynomial $p\in\mathbb{F}[X]$ of degree at most $N-1$ such that $p(w^i) = t_i$ for all $i=0, \dots, N-1$.

Now, let's say we settle on a domain $D = \{(1,0), (w_0, w_1), \dots, (w_0, w_1)^{N-1} \} \subseteq \mathbb{F}^2$ with $(w_0, w_1)^N = (1,0)$. And say we are given a column $T=(t_0, \dots, t_{N-1})$ with $t_i\in\mathbb{F}$. Interpolating in the previous sense doesn't even make sense here, since a polynomial $p\in\mathbb{F}[X]$ cannot be evaluated at the points of $D$: they are elements of $\mathbb{F}^2$. Polynomials in one variable can only be evaluated at points in $\mathbb{F}$. What can be evaluated at points in $\mathbb{F}^2$ are polynomials in two variables! Moreover, if $p\in\mathbb{F}[X,Y]$ and $(x,y)$ is a point in $\mathbb{F}^2$, then the evaluation $p(x,y)$ is an element of $\mathbb{F}$, and it makes sense to ask for it to be equal to $t_i$, an entry of the column $T$.

So, once we embark on the quest of making things work over domains $D\subseteq\mathbb{F}^2$, we need to start thinking that interpolating means finding a polynomial $p\in\mathbb{F}[X, Y]$ such that $p((w_0, w_1)^i) = t_i$ for all $i=0,\dots, N-1$.

```python
T = [2, 2, 4, 6, 3, 3, 0, 6, 5, 3, 5, 2, 3, 6, 6, 0]

(w0, w1) = (2, 4)
D = generate_set_D_in_F2(w0, w1, 7)

# An interpolating polynomial for T pulled out of a hat
def p(x, y):
    """
    Returns the evaluation at (x, y) of the polynomial
    p(X, Y) = X + X^3 + 4 X^4 Y^2 + 5 Y X + 3 Y^2
    over F_7
    """
    return (x + x**3 + 4*x**4*y**2 + 5*y*x + 3*y**2) % 7

evaluations = [p(x, y) for (x, y) in D]

assert(T == evaluations)
```

### Factoring polynomials that vanish over a domain $D$

Unfortunately, moving to bivariate polynomials is not straightforward. Univariate STARKs and all the math tools in them heavily rely on properties of univariate polynomials that don't generalize so well to multiple variables.

Let's focus for now on the factorization $f = vt$ needed in the protocol. By this we mean the step where the prover constructs the polynomial $f$ that encodes the AIR constraints and satisfies $f(d)=0$ for all $d\in D$.
In the univariate case the polynomial $v$ is just $(X-d_0)\cdots(X-d_{N-1})$. This polynomial has the property that $D = \{x\in\mathbb{F}: v(x) = 0\}$. Finding such a polynomial for the domains $D\subseteq \mathbb{F}^2$ we've been considering would be the first step. From there, the kind of factorization $f = vt$ we need would essentially be a consequence of a result known as Hilbert's Nullstellensatz.

The question is: is there a polynomial $v\in\mathbb{F}[X, Y]$ such that the domain $D = \{(1,0), (w_0,w_1), \dots, (w_0, w_1)^{2^n-1}\}$ is equal to $\{(x,y)\in\mathbb{F}^2: v(x, y)=0\}$?

Let's try to answer this question for the set $D$ generated by $(w_0, w_1) = (-1, 0)$. That is,
$$D = \{(1, 0), (-1, 0)\}.$$
As we mentioned before, the set $D$ is the analog of the set $\{1, -1\}$ of the complex numbers. And it can be described as
$$D = \{(x, y) \in \mathbb{F}^2: (x, y)^2 = (1, 0)\}.$$
Let's expand $(x, y)^2$:
$$(x, y)^2 = (x^2 - y^2, 2xy)$$
Therefore, the set $D$ can be described as
$$D = \{(x, y)\in\mathbb{F}^2: x^2 - y^2 = 1 \text{ and } xy = 0\}$$
And this is almost what we wanted. Except for the fact that we didn't find a single polynomial $v$. We found two. And there's no escaping from that: no single polynomial plays here the role the zerofier plays in the univariate case.

Let's do one more example to see how this escalates. Let $D$ be the domain generated by $(0, 1)$. That is, $D = \{(1, 0), (0, 1), (-1, 0), (0, -1)\}$. As we mentioned before, this is the set of elements $(x, y)$ such that $(x, y)^4 = (1, 0)$. If we expand $(x, y)^4$ we get
$$(x, y)^4 = (x^2 - y^2, 2xy)^2 = ((x^2 - y^2)^2 - (2xy)^2, 4(x^2 - y^2)xy).$$
And so, the set $D$ can be described as
$$D = \{(x, y)\in\mathbb{F}^2: (x^2 - y^2)^2 - (2xy)^2 = 1 \text{ and }(x^2-y^2)xy = 0\}$$
Again, there's no escaping from the two defining polynomials.

```python
import itertools

def v1(x, y, p):
    return ((x**2 - y**2)**2 - (2 * x * y)**2 - 1) % p

def v2(x, y, p):
    return ((x**2 - y**2) * x * y) % p

p = 127
F = list(itertools.product(range(p), range(p)))
D = [(x, y) for (x, y) in F if v1(x, y, p) == 0 and v2(x, y, p) == 0]

# Recall -1 = 126 mod 127
print(D)
# [(0, 1), (0, 126), (1, 0), (126, 0)]
```

In general, for every domain $D$ generated by an element $(w_0, w_1)$ of order $2^n$ there exist polynomials $v_1, v_2 \in\mathbb{F}[X, Y]$ such that
$$ D = \{(x, y)\in\mathbb{F}^2: v_1(x, y) = 0 \text{ and } v_2(x, y) = 0\},$$
and there's no single polynomial equation that defines it like in the case of univariate polynomials.

How bad is that? Well, it's not hopeless. What can be proven is that if $f$ is a polynomial such that $f(d)=0$ for all $d\in D$, then $f$ can be written as
$$f = v_1s + v_2t$$
for some polynomials $s, t$. I guess one could think of incorporating such a decomposition into a STARK protocol, although it would introduce a lot of overhead to actually find $s$ and $t$. The happy path would look something like this.

1. Interpolate the columns $T_i$ of the trace to polynomials $p_i\in\mathbb{F}[X, Y]$ over a domain $D=\{(1,0), (w_0, w_1), \dots, (w_0, w_1)^{2^n-1}\}$ efficiently and send the oracles $[p_i]$ to the verifier.
1. Given the interpolants $p_1, \dots, p_k$ of all the columns, construct from them the polynomial $f\in\mathbb{F}[X, Y]$ that encodes the AIR constraints and satisfies $f(d) = 0$ for all $d\in D$.
1. Write $f = v_1s + v_2t$, where $v_1, v_2, t, s$ are polynomials in $\mathbb{F}[X, Y]$ and $v_1$ and $v_2$ are the public *vanishing* polynomials of $D$. The polynomials $v_1$ and $v_2$ should be efficient to evaluate.
   Send the oracles $[s]$ and $[t]$ to the verifier.
1. Open the polynomials $p_i(z_0,z_1)$, $t(z_0, z_1)$, $s(z_0, z_1)$ for a random $(z_0, z_1)\in\mathbb{F}^2$ chosen by the verifier. Or $(z_0, z_1)\in\mathbb{E}^2$ for an extension field $\mathbb{E}$ of $\mathbb{F}$ in case $\mathbb{F}$ is a small field.

One problem with this is that in step $3$ there's no easy shortcut to compute $s$ and $t$. When the factorization is just $f = vt$, one can find $t$ by interpolating the values $f(d)/v(d)$ over a domain disjoint from $D$. When the best we can get is an expression of the form $f = v_1s + v_2t$, we lose that trick. Or, more precisely, it gets more difficult to carry over.

The straightforward way to translate the trick to this setting would be to exploit the following fact. If $d\in\mathbb{F}^2$ is an element such that $v_1(d) \neq 0$ and $v_2(d) = 0$, then
$$f(d) = v_1(d)s(d) + v_2(d)t(d) = v_1(d)s(d),$$
and so $s(d) = f(d)/v_1(d)$. And then, if we find many points $d_0, \dots, d_k$ like $d$, we can interpolate the values $f(d_i)/v_1(d_i)$ to obtain $s$. Similarly, if we find an element $d'\in\mathbb{F}^2$ such that $v_1(d') = 0$ and $v_2(d') \neq 0$, then $t(d') = f(d') / v_2(d')$. Finding many such elements would allow us to interpolate these values and find $t$.

There are still open issues, like how to efficiently interpolate such polynomials, and how many points $d_i$ and $d_i'$ we would need. But the idea of Circle STARKs goes in this direction, although the authors find a much better approach.

### One ring to almost rule them all

We are stuck with two defining polynomials $v_1$ and $v_2$ for the domains $D\subseteq\mathbb{F}^2$ we've been considering. And so far both $v_1$ and $v_2$ heavily depend on $D$, and their degrees increase with the number of elements of $D$. In the Circle STARKs paper the authors show that, even though there will always be two defining polynomials, one can restrict to domains $D$ that all share one of their defining polynomials. And that shared polynomial is
$$v_1 = X^2 + Y^2 - 1.$$
More precisely, there is a subfamily of the domains $D$ we've been considering whose members can be described as
$$D = \{(x, y) \in\mathbb{F}^2: x^2 + y^2 = 1 \text{ and } v(x, y) = 0\},$$
for some polynomial $v\in\mathbb{F}[X, Y]$ that depends on $D$. The set of all points that satisfy the equation $x^2 + y^2 = 1$ is called the circle. Let's denote it
$$C = \{(x, y) \in \mathbb{F}^2: x^2 + y^2 - 1 = 0\}.$$
There are enough of these sets for any practical purpose, and one can restrict to using only such domains $D$. Slightly modifying the happy path to only ever evaluate the polynomials at points on the circle, we obtain something along these lines.

1. Efficiently interpolate the columns $T_i$ of the trace to polynomials $p_i\in\mathbb{F}[X, Y]$ over a domain $D=\{(1,0), (w_0, w_1), \dots, (w_0, w_1)^{2^n-1}\}$ contained in the circle and send the oracles $[p_i]$ to the verifier. Let $v$ be the polynomial such that
$$D = \{(x, y): x^2 + y^2 = 1 \text{ and } v(x, y)=0\}.$$
1. Given the interpolants $p_1, \dots, p_k$ of all the columns, construct from them the polynomial $f\in\mathbb{F}[X, Y]$ that encodes the AIR constraints and satisfies $f(d) = 0$ for all $d\in D$.
1. Interpolate the values $f(d) / v(d)$ for $d\in \hat D$, where $\hat D$ is a domain contained in the circle and disjoint from $D$, and denote the resulting polynomial by $t$. Send the oracle $[t]$ to the verifier.
1. Open the polynomials $p_i(z_0,z_1)$, $t(z_0, z_1)$ for a random $(z_0, z_1)\in C\setminus D$ chosen by the verifier. Or $(z_0, z_1)\in C(\mathbb{E})$ for an extension field $\mathbb{E}$ of $\mathbb{F}$ in case $\mathbb{F}$ is a small field.

### Enter the Circle

On the real cartesian plane the unit circle is the set of elements that lie at distance $1$ from the origin. The distance of an element $(x,y) \in \mathbb{R}^2$ to the origin can be computed as $\sqrt{x^2 + y^2}$ (Pythagoras' theorem). So being at distance $1$ from the origin means exactly $\sqrt{x^2 + y^2} = 1$. And this is equivalent to $x^2 + y^2 = 1$. Therefore, the circle in the real cartesian plane is the set
$$\{(x,y)\in\mathbb{R}^2: x^2 + y^2 = 1\}.$$
For this reason, in our context of finite fields, the set
$$\{(x,y)\in\mathbb{F}^2: x^2 + y^2 = 1\}$$
is also called the circle. When $p=2^m - 1$ is a Mersenne prime, this set has many nice properties. The main two are (not straightforward to see):

1. The number of elements of $C$ is $2^m$.
1. If $(x, y), (x', y')$ are two points on $C$, then the product $(x, y)\cdot (x', y')$ is also on the circle.

One consequence of these two facts is that any element $(w_0, w_1)$ on the circle satisfies $(w_0, w_1)^{2^m} = (1, 0)$. The converse is almost true. There are in total $2^{m+1}$ elements $(w_0, w_1)\in\mathbb{F}^2$ with the property that $(w_0, w_1)^{2^n} = (1,0)$ for some $n$. So, half of all such elements are on the circle.

Another consequence is that if an element $(w_0, w_1)$ belongs to the circle, then the domain it generates, $D = \{(1, 0), (w_0, w_1), \dots, (w_0, w_1)^{2^n-1}\}$, is also entirely contained in the circle. This will give us some shortcuts to find good vanishing polynomials for those domains $D$.

So, in summary, for $p=2^m-1$:

- The set $C = \{(x, y) \in\mathbb{F}^2: x^2 + y^2 = 1\}$, called the **circle**, has $2^m$ elements.
- The elements of $C$ are half of the total elements in $\mathbb{F}^2$ with order a power of two.
- The order of every element $(x, y) \in C$ is a power of two.
- The product of two points on the circle lies again on the circle.

##### Example

```python
def is_power_of_two(n):
    """
    Returns `True` if `n` is a power of two and `False` otherwise
    """
    if n <= 0:
        return False
    return n & (n-1) == 0

def order_of(x, y, p):
    """
    Returns the order of the element `(x, y)`
    """
    return len(generate_set_D_in_F2(x, y, p))

m = 3
p = 2**m - 1  # p = 7

elements_with_order_a_power_of_two = [
    (x, y)
    for x in range(p)
    for y in range(p)
    if (x, y) != (0, 0) and is_power_of_two(order_of(x, y, p))
]

circle = [
    (x, y)
    for x in range(p)
    for y in range(p)
    if (x**2 + y**2 - 1) % p == 0
]

# There are 16 elements with order a power of two
assert(len(elements_with_order_a_power_of_two) == 2**(m+1))

# The circle has 8 elements
assert(len(circle) == 2**m)

# The elements of the circle have order a power of two
assert(set(circle).issubset(elements_with_order_a_power_of_two))
```

### Vanishing polynomials for domains contained in the circle

Suppose we are working over a prime field $\mathbb{F}$ with $p$ elements, and $p$ is a Mersenne prime. Say we start with an element $(w_0, w_1)$ on the circle and generate the domain $D = \{(1, 0), (w_0, w_1), \dots, (w_0, w_1)^{N-1}\}$, with $N=2^n$, the order of $(w_0, w_1)$. Also, note that $(x, y)^{N} = (1, 0)$ for all $(x, y) \in D$.
That's because $(x, y)$ is equal to $(w_0, w_1)^i$ for some $i$, and then
$$(x, y)^N = ((w_0, w_1)^{i})^N = ((w_0, w_1)^N)^i = (1, 0)^i = (1, 0).$$
Then, $D$ is a set of $N$ different elements $(x, y)$ that satisfy $(x, y)^N = (1, 0)$. Since $\mathbb{F}^2$ is a field, the equation $(x, y)^N = (1, 0)$ cannot have more than $N$ solutions. That means
$$D = \{(x, y)\in\mathbb{F}^2: (x, y)^N = (1, 0)\}.$$
Expanding $(x, y)^N = (1, 0)$ would lead us to two polynomials $v_1, v_2 \in\mathbb{F}[X, Y]$ like in the examples we did before. And we don't want that. We want it to be expressed as
$$D = \{(x, y)\in\mathbb{F}^2: x^2 + y^2 = 1 \text{ and } v(x, y) = 0\},$$
for some polynomial $v\in\mathbb{F}[X, Y]$. From the previous section we know that $D$ is in fact contained in the circle, since it is the domain generated by an element $(w_0, w_1)$ that's already on the circle.

The trick is the following. First, recall that if an element $(x, y)$ is on the circle, then $(x, y)^N$ is on the circle as well. Second, notice that the only element on the circle whose first coordinate is $1$ is the element $(1, 0)$. Let's see why. Say $(a, b)$ is an element on the circle whose first coordinate is $1$. That means that $a^2 + b^2 = 1$ and $a=1$. But then, replacing $a=1$ in the circle equation we get $1^2 + b^2 = 1$. Which leaves us with $b^2 = 0$. Hence $b=0$, and $(a, b) = (1, 0)$.

Now, if $(x, y)$ is on the circle and the first coordinate of $(x, y)^N$ is $1$, then the only possibility is that $(x, y)^N = (1, 0)$, since $(x, y)^N$ is also an element of the circle. All this to say that these conditions are equivalent for an element $(x, y)$ that's on the circle:

1. $(x, y)^N = (1, 0)$.
1. The first coordinate of $(x, y)^N$ is $1$.

Therefore
$$D = \{(x, y)\in\mathbb{F}^2: x^2 + y^2 - 1 = 0 \text{ and the first coordinate of $(x, y)^N$ is $1$}\}$$

Let's compute the first coordinate of $(x, y)^N$. Now we are going to use that $N=2^n$. Let's start by computing the first coordinate of $(x, y)^2$. We'll heavily use that $x^2 + y^2 = 1$, or what's the same, that $-y^2 = x^2 - 1$.
$$ (x, y)^2 = (x^2 - y^2, 2xy) = (x^2 + x^2 - 1, 2xy) = (2x^2 - 1, 2xy). $$
Then the first coordinate of $(x, y)^2$ is $2x^2 - 1$. Note that this doesn't depend on the $y$ coordinate! This is nice since now we can complete the whole picture by induction. Now that we know the first coordinate of $(x, y)^2$ we can compute the first coordinate of $((x, y)^2)^2 = (x, y)^4$. It is $2(2x^2 -1)^2 - 1$. And so on.
```python
def compute_first_coordinate_of_power(x, y, n, p):
    """
    Returns the first coordinate of (x, y)^{2^n} mod p when p is a
    Mersenne prime and (x, y) is on the circle
    """
    if not is_power_of_two(p + 1):
        raise ValueError
    if (x**2 + y**2 - 1) % p != 0:
        raise ValueError
    result = x
    for i in range(n):
        result = (2 * result**2 - 1) % p
    return result

def power_in_F2(a, n, p):
    """
    Computes the n-th power of an element `a` in 𝔽²
    """
    if n < 0:
        raise ValueError
    if n == 0:
        return (1, 0)
    b = a
    for _ in range(n-1):
        b = multiplication_rule(a, b, p)
    return b

p = 127
(x, y) = (100, 65)
assert((x**2 + y**2 - 1) % p == 0)

# Compute the first coordinate of the powers (x, y)^{2^i}
# using the multiplication rule
print([power_in_F2((x, y), 2**i, p)[0] for i in range(8)])
# [100, 60, 87, 24, 8, 0, 126, 1]

# Compute the first coordinate of the powers (x, y)^{2^i}
# using the shortcut
print([compute_first_coordinate_of_power(x, y, i, p) for i in range(8)])
# [100, 60, 87, 24, 8, 0, 126, 1]
```

In this way we obtain the following list of polynomials $v_n$:
$$
\begin{aligned}
& v_1(x, y) = x, \\
& v_2(x, y) = 2x^2 - 1, \\
& v_3(x, y) = 2(2x^2 - 1)^2 - 1, \\
& v_4(x, y) = 2(2(2x^2 - 1)^2 - 1)^2 - 1, \\
& \dots
\end{aligned}
$$
The polynomial $v_n$ computes the first coordinate of $(x, y)^{2^{n-1}}$. So, finally, we get that the domain $D$ of size $2^n$ contained in the circle is equal to
$$D = \{(x, y)\in\mathbb{F}^2: x^2 + y^2 = 1 \text{ and } v_{n+1}(x, y) = 1\}$$

### Standard position cosets

The actual domains used in Circle STARKs are a slight variation of the ones we've been discussing. They are called *standard position cosets*. They have a better symmetry with respect to the $x$-axis, which is beneficial for adapting the univariate FFT and the univariate FRI (the proximity test behind the polynomial commitment scheme) to the bivariate setting.

Let's start motivating the use of standard position cosets as a slight improvement over the domains $D$ discussed so far. First, let's improve our notation. We've been denoting by $D$ the domain generated by the powers of some element $(w_0, w_1)$. Let's indicate the number of elements in the notation by calling $D_n$ the set with $2^n$ elements. That is,
$$D_n = \{(x, y)\in\mathbb{F}^2: x^2 + y^2 = 1 \text{ and } v_{n+1}(x, y) = 1\}.$$
As we've seen, these are all sets equal to the set of powers of some element $(w_0, w_1)$ of order $2^n$.

One moral of the previous section is that $(1, 0)$ is the only point on the circle with first coordinate equal to $1$. And that was kind of the kick-starter. Describing sets in terms of their first coordinates is nice, since we already have efficiently computable polynomials that produce the first coordinate of points on the circle and of their powers.

The *standard position coset* $H_n$ with $2^n$ elements is defined as
$$H_n = \{(x, y) \in\mathbb{F}^2: x^2 + y^2 = 1 \text{ and } v_n(x, y) = 0\}.$$
Can you spot the difference with the set $D_n$? The difference is that the second equation is $v_n(x, y) = 0$, instead of $v_{n+1}(x, y) = 1$. That is, the set $H_n$ is the set of points $(x, y)$ on the circle such that the first coordinate of $(x, y)^{2^{n-1}}$ is $0$.

##### The set $H_1$

In this section we are going to take a similar approach, but starting by analyzing which points of the circle have null first coordinate. This leads to domains similar to the ones discussed so far. Say $(x, y)$ is in $H_1$. That means $(x, y)$ is a point on the circle with $v_1(x, y) = x = 0$. Then $y^2 = 1$, which means that $y$ is either $1$ or $-1$.
So, this time we have two points: $(0, 1)$ and $(0, -1)$. That is $H_1$:
$$H_1 = \{(0, 1), (0, -1)\}.$$
This set is not of the form $\{(1, 0), (w_0, w_1), \dots, (w_0, w_1)^{2^{n}-1}\}$ for any $(w_0, w_1)$, since $(1, 0)$ does not belong to $H_1$. But it is a *rotation* of such a set. The set $H_1$ is equal to
$$\{Q\cdot(1, 0), Q\cdot (-1, 0)\}$$
for $Q=(0, 1)$. This means the set $H_1$ is a rotation by $Q$ of the set $D_1 = \{(1, 0), (-1, 0)\}$, which is the set of powers of $(-1, 0)$. This relation between $H_1$ and $D_1$ is denoted by $H_1 = Q\cdot D_1$, and it means that the elements of $H_1$ are the elements of $D_1$ multiplied by $Q$.

```python
p = 127

# The circle over 𝔽₁₂₇ (the previous `circle` was computed for p = 7)
circle = [
    (x, y)
    for x in range(p)
    for y in range(p)
    if (x**2 + y**2 - 1) % p == 0
]

H_1 = [(x, y) for (x, y) in circle if x == 0]
print(H_1)
# [(0, 1), (0, 126)]

(w0, w1) = (-1, 0)
D_1 = generate_set_D_in_F2(w0, w1, p)
print(D_1)
# [(1, 0), (126, 0)]

Q = (0, 1)
QD_1 = [multiplication_rule(Q, (x, y), p) for (x, y) in D_1]
assert(H_1 == QD_1)
```

##### The set $H_2$

Let's go one step further and see what $H_2$ is. What are the points $(x, y)$ on the circle such that $(x, y)^2$ has null first coordinate? We know that the first coordinate can be computed as $2x^2 - 1$. So, these are the points with $2x^2-1 = 0$. That is, $x$ must be a field element such that $x^2 = 2^{-1}$. And since the point $(x, y)$ is on the circle, then
$$y^2 = 1 - x^2 = 1 - 2^{-1} = 2^{-1}$$
(recall $2^{-1}$ behaves like $1/2$). So the points we are looking for are all the points $(x, y)$ such that $x$ and $y$ are solutions to the equation $X^2 - 2^{-1} = 0$. It can be seen that for $\mathbb{F}_p$ with $p$ a Mersenne prime, that equation always has exactly two solutions $\{a, -a\}$. So, the set $H_2$ is
$$H_2 = \{(a, a), (-a, a), (-a, -a), (a, -a)\}.$$
Again, this set is not the set of powers of some $(w_0, w_1)$ in the circle, because for example $(1, 0)$ is not in there. But notice that if we take $Q = (a, a)$, then this set can be expressed as
$$H_2 = \{Q\cdot(1, 0), Q\cdot(0, 1), Q\cdot(-1, 0), Q\cdot(0, -1) \}.$$
That is, $H_2 = Q\cdot D_2$ where $D_2$ is
$$\{(1, 0), (0, 1), (-1, 0), (0, -1)\},$$
which is the set of powers of the point $(0, 1)$.

```python
p = 127

H_2 = {
    (x, y)
    for (x, y) in circle
    if compute_first_coordinate_of_power(x, y, 1, p) == 0
}

# Recall 119 == -8 in 𝔽₁₂₇
print(H_2)
# {(119, 8), (8, 8), (119, 119), (8, 119)}

(w0, w1) = (0, 1)
D_2 = generate_set_D_in_F2(w0, w1, p)
print(D_2)
# [(1, 0), (0, 1), (126, 0), (0, 126)]

a = 8

# a^2 = 2^{-1}
assert(a**2 == pow(2, -1, p))

Q = (a, a)
QD_2 = {multiplication_rule(Q, (x, y), p) for (x, y) in D_2}
assert(H_2 == QD_2)
```

##### General case

TODO
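Although the general case is left as a TODO above, here is a small numerical check (a sketch for $p=127$ and $n=3$, reusing the helper functions and the $p=127$ `circle` defined above) suggesting that the same pattern continues: $H_3$ is again a rotation $Q\cdot D_3$ of a power-of-two domain, where $Q$ can be taken to be any element of $H_3$ and $D_3$ is the set of powers of $Q^2$.

```python
p = 127
n = 3

# H_3: points on the circle whose (2^{n-1})-th power has null first coordinate
H_3 = {
    (x, y)
    for (x, y) in circle
    if compute_first_coordinate_of_power(x, y, n - 1, p) == 0
}
assert(len(H_3) == 2**n)

# Pick any Q in H_3. Here Q has order 16, so Q*Q has order 8
# and its powers form a domain D_3 of size 2^3
Q = next(iter(H_3))
D_3 = generate_set_D_in_F2(*multiplication_rule(Q, Q, p), p)
assert(len(D_3) == 2**n)

# H_3 is the rotation of D_3 by Q
QD_3 = {multiplication_rule(Q, (x, y), p) for (x, y) in D_3}
assert(H_3 == QD_3)
```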