The STARK Protocol

# The STARK Protocol ## High-Level Protocol Description The core of the STARK (Scalable Transparent Argument of Knowledge) protocol involves proving the validity of an execution trace through polynomial commitments and interactive proofs. The protocol can be summarized in three major steps: 1. **Arithmetization and Commitment of Execution Trace:** The prover encodes the computation's execution trace into a matrix $T$, which is transformed into polynomial form. The prover commits to these polynomials, which encode the trace's validity. 2. **Construction and Commitment of the Composition Polynomial:** The prover constructs the polynomial $F$, which encodes all the constraints from the execution trace, and computes the quotient polynomial $H = \frac{F}{G}$. 3. **Opening of Polynomials at Random Point:** The verifier challenges the prover with a random point $z$ and checks whether the values $F(z)$ and $H(z)$ satisfy the equation $H(z) = \frac{F(z)}{G(z)}$, thus confirming the trace's correctness. By following this process, the STARK protocol achieves both **soundness** and **efficiency**, ensuring that a verifier can be convinced of the correctness of a computation with high confidence, while minimizing the computational and communication costs involved. ## Arithmetization of the Execution Trace ### Trace Representation The **execution trace** of a computation is a matrix $T$, where each row represents the system's state at a specific time step, and each column tracks the values of specific variables (or registers) of the system. Denote the number of rows in the trace as $T$ and the number of columns (registers) as $w$. The entry $T_{i,j}$ represents the value of the $j$-th register at the $i$-th time step. Formally, we define the trace $T$ as a $T \times w$ matrix: $$T = \begin{pmatrix} T_{1,1} & T_{1,2} & \dots & T_{1,w} \\ T_{2,1} & T_{2,2} & \dots & T_{2,w} \\ \vdots & \vdots & \ddots & \vdots \\ T_{T,1} & T_{T,2} & \dots & T_{T,w} \\ \end{pmatrix}$$ Each column $T_j$ (for $j = 1, \dots, w$) represents the values of a particular register across all time steps. These columns will be arithmetized as polynomials over a finite field $\mathbb{F}$, which allows the use of algebraic techniques to verify the trace. ### Polynomial Representation of the Trace To transform the trace into a polynomial form, each column of the trace is interpreted as a polynomial evaluation over a set of points in a **trace evaluation domain**. Specifically, for each column $T_j$, we define a polynomial $t_j(x)$, where $t_j(g^i) = T_{i+1,j}$ for $i = 0, 1, \dots, T-1$, and $g$ is a generator of a multiplicative subgroup of the finite field $\mathbb{F}$. Thus, for each column $T_j$, we construct a polynomial $t_j(x)$ such that: $$t_j(1) = T_{1,j}, \quad t_j(g) = T_{2,j}, \quad \dots, \quad t_j(g^{T-1}) = T_{T,j}$$ Here, $g$ is the generator of a cyclic subgroup of size $N = 2^n$ (such that $2^n \geq T$), $\{ 1, g , g^2 , g^3 ,..., g^N \}$ ensuring that all trace points are covered by the polynomial evaluations. ## Constraint Systems in STARKs The correctness of the trace is verified by imposing two types of constraints: **boundary constraints** and **transition constraints**. These constraints are represented as polynomial relations that the trace polynomials must satisfy. ### Boundary Constraints **Boundary constraints** ensure that certain registers take specific values at particular time steps. For example, in the case of a Fibonacci computation, the first two elements of the sequence are fixed at 1. This can be enforced as boundary conditions on the first column of the trace: $$T_{0, 1} = 1, T_{1, 1} = 1$$ We can translate the constraints into polynomial relations. We know that $T_{0, 1} = t_1(1)$ and $T_{1, 1} = t_1(g)$. If the constraint holds, say at $x = g$, then the monomial $x - g$ devides $t_1(x) - 1$. This means that the result of the division of $t_1(x) - 1$ by $x - g$ is a polynomial, $$Q_{BC,1}(x)=\frac{t_1(x)-1}{x-g}$$ Analogously, $$Q_{BC,0}(x)=\frac{t_1(x)-1}{x-1}$$ One drawback in this approach is that if we have $n$ boundary constraints, we get $n$ polynomials. One optimization is to interpolate boundary constraints and obtain a new polynomial. In this case, $$t_{BC}(1) = 1, t_{BC}(g) = 1$$ Combining everything, we get $$Q_{BC}(x)=\frac{t_1(x)-t_{BC}(x)}{Z_{BC}(x)}$$ where $Z_{BC}(x)$ is the polynomial vanishing on the points where the boundary conditions are enforced: $$Z_{BC}(x)=(x-1)(x-g)$$ ### Transition Constraints **Transition constraints** ensure that the values in consecutive rows of the trace are related according to the program's transition rules. For example, in the Fibonacci sequence, each value is the sum of the previous two values, which can be written as: $$ T_{i+2,1} = T_{i+1,1} + T_{i,1} $$ This transition rule can be arithmetized as a polynomial constraint. The transition constraint for the Fibonacci sequence can be written as: $$ t_1(g^2 x) - t_1(g x) - t_1(x) = 0 $$ This ensures that at every step of the trace, the values follow the specified transition rule. To enforce this, we construct a **transition polynomial** $Z_T(x)$ that vanishes at the points corresponding to the transition steps: $$ Z_T(x) = \prod_{i=0}^{T-2} (x - g^i) $$ The transition constraint can then be expressed as: $$ Q_T(x) = \frac{t_1(g^2 x) - t_1(g x) - t_1(x)}{Z_T(x)} $$ If the transition rule holds, $Q_T(x)$ will be a polynomial. ## Combining Constraints To verify the trace's correctness, we need to ensure that all boundary and transition constraints hold. One way to do this is to combine the boundary and transition constraints into a single polynomial verification step. We take a random linear combination of the boundary and transition polynomials: $$ CP(x) = \alpha_{BC} Q_{BC}(x) + \alpha_T Q_T(x) $$ where $\alpha_{BC}$ and $\alpha_T$ are random coefficients chosen by the verifier. If both $Q_{BC}(x)$ and $Q_T(x)$ are polynomials, then $CP(x)$ will also be a polynomial. ## Vector Commitments ### Overview Given a vector $Y = (y_0, y_1, \dots, y_M)$, **committing** to the vector means the prover constructs a Merkle tree from the vector elements and sends the **root** of this tree, denoted $[Y]$, to the verifier. The commitment allows the verifier to later check any specific element of the vector at index $i$. When queried, the prover reveals the value $y_i$ and provides the **authentication path** from $y_i$ to the Merkle root. This path encodes both the position $i$ of the element in the vector and the vector’s length $M$. The verifier uses this path to ensure that the revealed value is indeed part of the committed vector. ### Commitment Protocol 1. **Commitment**: The prover builds a Merkle tree from the vector $Y = (y_0, y_1, \dots, y_M)$. The root of this Merkle tree, denoted $[Y]$, is sent to the verifier as the commitment to the vector. 2. **Opening**: To reveal the value of $Y$ at index $i$, the prover sends $y_i$ and the authentication path. The authentication path consists of the hashes needed for the verifier to recompute the Merkle root from $y_i$. 3. **Verification**: The verifier uses the received $y_i$, the Merkle root $[Y]$, and the authentication path to check if the value corresponds to the committed vector. The commitment scheme ensures that the prover cannot change the vector after committing to it, and the verifier can efficiently check the integrity of any element in the vector. ### Role in STARKs In STARKs, vectors committed via Merkle trees are typically evaluations of polynomials over a finite field. These polynomials are defined over a fixed **domain** $D = (d_1, \dots, d_M)$, which is known to both the prover and the verifier. The vector $Y$ is viewed as the evaluation of a polynomial $p(x)$ at the points in the domain $D$: $$ Y = (p(d_1), p(d_2), \dots, p(d_M)) $$ By committing to this vector, the prover essentially commits to the polynomial $p(x)$, allowing the verifier to check if the vector corresponds to a polynomial of a specific degree during the verification process. ## FRI Protocol The **FRI** (Fast Reed-Solomon Interactive Oracle Proof of Proximity) protocol is an interactive protocol that proves the proximity of a given vector to the evaluations of a low-degree polynomial. ### Key Parameters in FRI - $N = 2^n$ and $M = 2^m$, with $n < m$, define the powers of two that characterize the domain. - A domain $D = (d_1, \dots, d_M)$, where $d_i = h \omega^i$, with $h \in \mathbb{F}$ and $\omega$ a primitive $M$-th root of unity. The domain $D$ must exhibit symmetric properties necessary for FRI, such as $-d_i \in D$ for all $i$. ### Blowup Factor The **blowup factor** $\beta = M/N = 2^{m-n}$ is an important parameter in the FRI protocol. It controls the relation between the size of the evaluation domain $M$ and the degree $N-1$ of the polynomial being tested. The security of the protocol depends in part on this blowup factor, as it influences the probability of catching cheating provers. ## Degree Adjustment in Polynomial Constraints In the context of STARKs, the constraints imposed on the trace polynomials need to be adjusted to ensure they are of low degree. This is achieved through **degree adjustment**. ### Degree Adjustment Method Suppose a constraint polynomial $C_j(x)$ has a degree $D_j$. To ensure the final degree of the constraints is sufficiently low, the following adjustment is performed: $$ C_j(x) \cdot (\alpha_j x^{N - D_j - 1} + \beta_j) $$ where $\alpha_j$ and $\beta_j$ are random field elements chosen by the verifier, and $D$ is the smallest power of 2 greater than the maximum degree of any constraint. This adjustment ensures that the resulting constraint polynomials have a degree less than $N$, as required. ### Composition of Constraints Once the constraints are adjusted, the prover creates a **composition polynomial** by combining all individual constraints using a random linear combination. The composition polynomial is of the form: $$ CP(x) = \sum_{j=1}^{k} C_j(x) \cdot (\alpha_j x^{N - D_j - 1} + \beta_j) $$ where $k$ is the number of constraints. The degree of the composition polynomial is checked via a low-degree test, ensuring that the overall constraints are satisfied. ## Committing to the Trace The **execution trace** of a computation is represented as a matrix $T$ with $w$ columns, each corresponding to a **register** in the computation, and $T_{i, j}$ denoting the state of the $j$-th register at the $i$-th time step. To perform commitments, each column $T_j$ of the trace matrix must be transformed into a polynomial through interpolation. 1. For each column $T_j$, interpolate its values over a known **domain** $D_S = \{ 1, g , g^2 , g^3 ,..., g^{N - 1} \}$, resulting in a polynomial $t_j(x)$. This polynomial satisfies the property: $$ t_j(g^i) = T_{i,j}, \quad \text{for all} \, i $$ where $g$ is a generator of the domain. 2. Once the polynomials $t_j(x)$ are obtained, they are **committed** by evaluating them over a larger domain, called the **Low-Degree Extension (LDE) domain**, denoted $D_{LED}$. The prover computes the following commitment for each polynomial: $$ [t_j] := \text{Commit}(t_j(D_{LED})) $$ Here, the values of the polynomial $t_j$ are evaluated on the extended domain $D_{LED}$, and a **Merkle tree** is constructed from the evaluations. The **root of the Merkle tree** serves as the prover’s commitment to the polynomial. ## Overview of the FRI Protocol The goal of the FRI protocol is for the prover to demonstrate that the **composition polynomial** $CP(x)$ is close to a **low-degree polynomial**. This is achieved through a process of recursive folding, where the degree of the polynomial is halved at each step, until a constant polynomial (or a low-degree polynomial) is reached. ### Recursive Folding The process of recursive folding is central to the FRI protocol. At each stage, the polynomial is folded by combining its values at corresponding points on a domain and its negations. This halving process continues until the polynomial is reduced to a form that can easily be checked by the verifier. The FRI protocol is divided into two phases: - **Commitment Phase**: The prover commits to the polynomial evaluations over progressively smaller domains. - **Query Phase**: The verifier queries random points to check the consistency of the polynomial across the different levels of folding. ## Commitment Phase The commitment phase begins with the prover taking the composition polynomial $CP(x)$ and transforming it by **halving its degree** through a series of recursive transformations. The transformation is as follows: 1. **Initial Step**: The prover starts with the polynomial $CP(x)$ and splits it into two parts using the following formulas: $$ g(x^2) = \frac{CP(x) + CP(-x)}{2} $$ $$ x \cdot h(x^2) = \frac{CP(x) - CP(-x)}{2} $$ Hence, the original polynomial $CP(x)$ can be expressed as: $$ CP(x) = g(x^2) + x \cdot h(x^2) $$ This step effectively reduces the degree of the polynomial by half. The polynomials $g(x^2)$ and $h(x^2)$ are defined over a **new domain** $D_1 = \{ h^2, h^2 \omega^2, \dots, h^2 \omega^{m - 2} \}$, which has half the size of the original domain $D$. 2. **Random Linear Combination**: The verifier selects a random coefficient $\alpha_0$, and the prover constructs a new polynomial: $$ P_1(x^2) = g(x^2) + \alpha_0 h(x^2) $$ This polynomial is evaluated over the new domain $D_1$. The prover computes the values of $P_1(x)$ over $D_1$ and **commits** to them by constructing a Merkle tree and sending the root of the tree to the verifier. 3. **Recursive Halving**: The prover continues this process by recursively halving the degree of the polynomial at each step. At step $k$, the polynomial is divided as follows: $$ P_k(y^2) = \frac{P_{k-1}(y) + P_{k-1}(-y)}{2} + \alpha_{k-1} \left( \frac{P_{k-1}(y) - P_{k-1}(-y)}{2} \right) $$ The domain $D_k$ at step $k$ is: $$ D_k = \{ h^{2^k}, (h\omega)^{2^k}, \dots, h^{2^k} \omega^{M - 2^k} \} $$ The prover evaluates the polynomial $P_k(x)$ over the domain $D_k$ and commits to these evaluations by constructing a Merkle tree and sending the Merkle root to the verifier. 4. **Final Layer**: When the folding process reaches the final layer $n$, the degree of the polynomial becomes 0, and the resulting polynomial $P_{n + 1}(x) = c$ is a constant value. The prover sends this constant $c$ to the verifier. ### Summary of Commitment Phase At the end of the commitment phase, the prover has committed to the evaluations of the polynomial across multiple domains of progressively smaller sizes. The Merkle roots at each step ensure the verifier can later check the consistency of the prover's claims. ## Query Phase In the query phase, the verifier selects a random point $q$ from the original domain $D_k$ and asks the prover to prove that the values of the polynomials in the FRI layers are consistent at that point. ### Prover's Response The prover must send the following information to the verifier for each FRI layer: 1. **Evaluations at $q$ and $-q$**: The prover sends the values $P_k(q)$ and $P_k(-q)$ at the chosen point $q$ and its negation $-q$, (the verifier randomly samples an index $q \in D_k$). 2. **Merkle Paths**: For each value $P_k(q)$ and $P_k(-q)$, the prover provides the Merkle proof (authentication path) that shows these values are part of the Merkle tree committed to in the previous step. 3. **Consistency Check**: For each level, the prover sends two elements, $P_k(q)$ and $P_k(-q)$, and demonstrates that these values are related to the next layer. The verifier checks that these values are consistent with the next layer $P_{k+1}(q^2)$. ### Verifier's Checks At each FRI layer, the verifier performs a **collinearity check** to ensure the consistency of the polynomial folding. Specifically, given the values $P_k(q)$, $P_k(-q)$, and $P_{k+1}(q^2)$, we can generate the following 3 points: - $A: (q, P_k(q))$ - $B: (-q, P_k(-q))$ - $C: (\alpha_k, P_{k+1}(q^2))$ The verifier verifies these paths against their proper roots and follows up by verifying that $A$, $B$ and $C$ fall on a straight line. This test is known as the *colinearity check*. Why would $A$, $B$, and $C$ lie on a straight line? Let’s find the line that passes through $A$ and $B$ and see what that means for $C$. An elementary Lagrange interpolation yields $$\begin{align*} y &= \sum_i y_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j} \\ &= P_k(q) \cdot \frac{x - (-q)}{q - (-q)} + P_k(-q) \cdot \frac{x - q}{-q - q} \\ &= P_k(q) \cdot 2^{-1} \cdot q^{-1} \cdot (x + q) - P_k(-q) \cdot 2^{-1} \cdot q^{-1} (x - q) \\ &= \frac{P_k(q) + P_k(-q)}{2} + x \cdot \frac{P_k(q) - P_k(-q)}{2q} \enspace . \end{align*}$$ By setting $x = \alpha_k$ we get exactly the y-coordinate of $C$. ### Verifying the Last Folding with $k = n$ In the final folding layer $i = n$, the polynomial $P_n(x)$ is reduced to a constant value $c$. To verify this, the verifier computes $c'$ from the evaluations of the previous layer’s polynomial $P_{n-1}(x)$ at $\omega_j^{2^{n-1}}$ and $-\omega_j^{2^{n-1}}$, as follows: $$ c' = \frac{P_{n-1}(\omega_j^{2^{n-1}}) + P_{n-1}(-\omega_j^{2^{n-1}})}{2} + \alpha_{n-1} \left( \frac{P_{n-1}(\omega_j^{2^{n-1}}) - P_{n-1}(-\omega_j^{2^{n-1}})}{2} \right) $$ The verifier then checks if this computed $c'$ matches the constant value $c$ the prover committed to in the final layer. If the values are consistent, it confirms that the folding process was correctly executed and the polynomial was properly reduced to a constant. ## References 1. https://aszepieniec.github.io/stark-anatomy/fri 2. https://blog.lambdaclass.com/lambdaworks-or-how-we-decided-to-created-our-zksnarks-library-and-a-stark-prover/ 3. https://github.com/lambdaclass/lambdaworks/blob/main/docs/src/starks/protocol.md 4. https://github.com/lambdaclass/lambdaworks/blob/main/docs/src/starks/protocol_overview.md 5. https://eprint.iacr.org/2021/582.pdf 6. https://drops.dagstuhl.de/storage/00lipics/lipics-vol107-icalp2018/LIPIcs.ICALP.2018.14/LIPIcs.ICALP.2018.14.pdf 7. https://hackmd.io/@deanstef/SJTT3MDhC