# 1/4: Studying FRI Soundness
I actually skipped over a lot of parts on this, settling for an intuitive understanding only. The reason was that the proof techniques were way too mathematically advanced, and they are really specialized towards Reed-Solomon codes. It seems that the really hard part only comes from the list decoding part of the correlated agreement results, and the other parts aren't that bad. I'm also feeling a need to understand FRI much, much better, so...
IMO the most painful part is that [BBHR18b] is the most accessible proof of FRI soundness, yet it writes everything in additive form, making it hard to read for my not-abstract-enough brain.
Anyways this is more of a proof writing session than a short summary.
Denote $\text{RS}[\mathbb{F}, D, k]$ as the Reed-Solomon code over domain $D$ with degree $\le k$ polynomials.
Let $\rho = (k + 1) / n$ where $n = \lvert D \rvert$. $n$ is assumed to be a power of $2$.
We'll work with the "binary" version of FRI, where the decomposition is $f(x) = g(x^2) + x h(x^2)$.
We do not care about zero-knowledge for now. See [2022/1216](https://eprint.iacr.org/2022/1216.pdf) for adding ZK.
Some parts from [[BCI+23]](https://eprint.iacr.org/2020/654.pdf) will be studied later, especially the Berlekamp-Welch decoder part.
## FRI Description
Start with $f_0(x)$ over evaluation domain $D_0 = D$. The domains will be reduced as
$$D_0 \supseteq D_1 \supseteq \cdots \supseteq D_r$$
where the size halves each time. One decomposes, for each round,
$$f_{i-1}(x) = g_{i-1}(x^2) + x \cdot h_{i-1}(x^2)$$
then on a random challenge $\lambda_i$ one computes
$$f_i(x) = g_{i-1}(x) + \lambda_i \cdot h_{i-1}(x)$$
over the domain $D_i = D_{i-1}^2$ and sends the evaluations to the verifier as a Merkle tree root.
The query phase takes $x_0 \in D_0$, computes $x_i \in D_i$ iteratively, and checks
$$f_i(x_i) = g_{i-1}(x_i) + \lambda_i \cdot h_{i-1}(x_i) = \frac{f_{i-1}(x_{i-1}) + f_{i-1}(-x_{i-1})}{2} + \lambda_i \cdot \frac{f_{i-1}(x_{i-1}) - f_{i-1}(-x_{i-1})}{2x_{i-1}}$$
as $$f_{i-1}(x_{i-1}) = g_{i-1}(x_i) + x_{i-1} \cdot h_{i-1}(x_i), \quad f_{i-1}(-x_{i-1}) = g_{i-1}(x_i) - x_{i-1} \cdot h_{i-1}(x_i)$$
The query phase is repeated for $s$ rounds to improve soundness.
The consistency of the final evaluation and the final polynomial can be done directly.
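To make the protocol concrete, here is a minimal Python sketch of the commit-phase fold and the query check, over the toy field $\mathbb{F}_{97}$ (all parameters are hypothetical and chosen only to be small; the element $8$ has order $16$ mod $97$, so it generates a suitable $D_0$):

```python
# Toy FRI fold over F_97: f(x) = g(x^2) + x*h(x^2), folded to g + lam*h.
p = 97
w = 8                                     # order 16 mod 97, so |D0| = 16
D0 = [pow(w, i, p) for i in range(16)]    # note D0[i + 8] == -D0[i] mod p

def evaluate(coeffs, x):
    """Horner evaluation of a coefficient list at x, mod p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def fold(f, lam, domain):
    """One FRI round: from evaluations {x: f(x)} on `domain`, return the
    evaluations of g + lam*h on domain^2, halving the degree bound."""
    half = len(domain) // 2
    out = {}
    for i in range(half):
        x, fx, fmx = domain[i], f[domain[i]], f[domain[i + half]]
        g = (fx + fmx) * pow(2, -1, p) % p          # even part g(x^2)
        h = (fx - fmx) * pow(2 * x, -1, p) % p      # odd part h(x^2)
        out[x * x % p] = (g + lam * h) % p
    return out

def query_check(f_prev, f_next, lam, x):
    """The verifier's consistency check at x: exactly the displayed identity."""
    g = (f_prev[x] + f_prev[p - x]) * pow(2, -1, p) % p
    h = (f_prev[x] - f_prev[p - x]) * pow(2 * x, -1, p) % p
    return f_next[x * x % p] == (g + lam * h) % p

f0 = {x: evaluate([3, 1, 4, 1], x) for x in D0}     # a degree-3 polynomial
f1 = fold(f0, lam=7, domain=D0)                     # degree <= 1 on D0^2
assert all(query_check(f0, f1, 7, x) for x in D0)   # honest prover passes
```

Merkle commitments and the final direct check are omitted; the point is only the algebra of the fold and the query identity.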
## [[BBHR18b]](https://drops.dagstuhl.de/storage/00lipics/lipics-vol107-icalp2018/LIPIcs.ICALP.2018.14/LIPIcs.ICALP.2018.14.pdf): FRI Soundness
The main theorem is Theorem 3.3 in [BBHR18b], which is that the soundness is at least
$$1 - \left( \frac{3 \lvert D_0 \rvert}{\lvert \mathbb{F} \rvert} + \left( 1 - \min \left\{ \delta_0, \frac{1 - 3 \rho - 4/\sqrt{\lvert D_0 \rvert}}{4} \right\} \right)^s \right)$$
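To get a feel for the numbers before unpacking the definitions, here is a quick back-of-the-envelope evaluation with made-up parameters (rate $1/8$, $\lvert D_0 \rvert = 2^{20}$, a $\sim 2^{64}$-sized field; none of these are from the paper):

```python
# Soundness error of Theorem 3.3 for illustrative parameters and s queries.
rho, n0, F, delta0 = 1/8, 2**20, 2.0**64, 0.3   # delta0 = 0.3 is an assumption
per_query = 1 - min(delta0, (1 - 3*rho - 4 / n0**0.5) / 4)
for s in (10, 30, 80):
    print(s, 3*n0/F + per_query**s)   # roughly 0.19, 6e-3, 1e-6
```

Note that the $(1 - 3\rho - 4/\sqrt{\lvert D_0 \rvert})/4 \approx 0.155$ term, not $\delta_0$, is the binding one here; this "distance loss" is exactly what the lemmas below are fighting.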
To define $\delta_0$, we need to define some distances.
Denoting $S_i$ as the set of cosets w.r.t. $\{-1, 1\}$ in $D_i$ (i.e. the pairs $\{x, -x\}$), we let $\delta_i$ be the relative blockwise distance between $f_i$ and $\text{RS}_i = \text{RS}[\mathbb{F}, D_i, k / 2^i]$ over $S_i$. Blockwise distance is like Hamming distance, but it counts disagreements coset-by-coset (a coset counts as one disagreement if the functions differ anywhere on it) instead of point-by-point.
$$\delta_i = \Delta_i (f_i, \text{RS}_i)$$
Note that the relative blockwise distance is always no less than the relative Hamming distance,
$$\delta_i = \Delta_i (f_i, \text{RS}_i) \ge \Delta_H (f_i, \text{RS}_i)$$
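As a small illustration of the two distances (a hedged sketch; `u`, `v` are evaluation dictionaries like in the earlier code, and `cosets` is a list of the pairs $\{x, -x\}$):

```python
# Relative Hamming distance counts bad points; relative blockwise distance
# counts bad cosets, so one bad point already spoils its whole coset.
def hamming_dist(u, v, domain):
    return sum(u[x] != v[x] for x in domain) / len(domain)

def blockwise_dist(u, v, cosets):
    return sum(any(u[x] != v[x] for x in S) for S in cosets) / len(cosets)
```

With the earlier $D_0$, the cosets are `[{D0[i], D0[i+8]} for i in range(8)]`; corrupting a single evaluation gives `hamming_dist` $1/16$ but `blockwise_dist` $1/8$, so the inequality above can be strict.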
Define a round error set to be
$$RE_i(f_i, f_{i-1}, \lambda_i) = \left\{ x_{i-1}^2 : f_i(x_{i-1}^2) \neq \frac{f_{i-1}(x_{i-1}) + f_{i-1}(-x_{i-1})}{2} + \lambda_i \cdot \frac{f_{i-1}(x_{i-1}) - f_{i-1}(-x_{i-1})}{2x_{i-1}} \right\}$$
i.e. the set of points $x_{i-1}^2 \in D_{i}$ that make the query phase fail at round $i$.
The round error probability is
$$err(f_i, f_{i-1}, \lambda_i) = \lvert RE_i(f_i, f_{i-1}, \lambda_i) \rvert / \lvert D_{i} \rvert$$
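In code (a sketch reusing `fold` from the earlier toy example; hypothetical as before):

```python
# err(f_i, f_{i-1}, lam): fraction of y in D_i where the fold identity fails.
def round_err(f_next, f_prev, lam, domain):
    honest = fold(f_prev, lam, domain)   # what f_i would be if derived honestly
    return sum(f_next[y] != honest[y] for y in honest) / len(honest)
```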
Define the closest codeword $\overline{f}_i$ to be the codeword in $\text{RS}_i$ that's closest to $f_i$ in the blockwise distance. Define the set of bad cosets $BS_i(f_i) \subset S_i$ as the disagreements of $f_i$ and $\overline{f}_i$, i.e.
$$BS_i(f_i) = \{S \in S_i: f_i|_S \neq \overline{f}_i|_S\}$$
and let $BD_i$ be the union of all elements in $BS_i(f_i)$, so $BD_i \subset D_i$.
Define the distortion set $B(f_{i-1}, \epsilon)$ of $f_{i-1}$ for $\epsilon > 0$ as the set of $\lambda_{i} \in \mathbb{F}$ such that
$$f_i(x_{i-1}^2) = \frac{f_{i-1}(x_{i-1}) + f_{i-1}(-x_{i-1})}{2} + \lambda_i \cdot \frac{f_{i-1}(x_{i-1}) - f_{i-1}(-x_{i-1})}{2x_{i-1}}$$
over $D_i$ has relative Hamming distance to $\text{RS}_i$ less than $\epsilon$. Here, $f_i$ is purely derived from $f_{i-1}$; we'll denote this derived $f_i$ as $f_{i, f_{i-1}, \lambda_{i}}$ in some cases.
The key lemmas are, of course, about distance preservation. These are Lemmas 4.3 and 4.4.
### Lemma 4.3
For any $\epsilon \ge 4/\lvert \mathbb{F} \rvert$ and $\delta_{i} > 0$, one has the "bad event" probability bounded as
$$\text{Pr} \left(\lambda_{i+1} \in B\left(f_{i}, \frac{1}{2} \left(\delta_{i} \left(1-\epsilon\right) - \rho \right)\right) \right) \le \frac{4}{\epsilon \lvert \mathbb{F} \rvert} $$
### Lemma 4.4
(Part 1): If $\delta_{i} < (1 - \rho) / 2$, then one has the "bad event" probability bounded as
$$\text{Pr} \left( \lambda_{i+1} \in B\left( f_{i}, \delta_{i} \right) \right) \le \frac{\lvert D_{i} \rvert}{\lvert \mathbb{F} \rvert}$$
(Part 2): Also, for $i < r$, suppose $f_i, f_{i+1}, \cdots, f_r$ and $\lambda_{i+1}, \cdots, \lambda_r$ satisfy
- all $\delta_j$ values are below $(1-\rho)/2$ (always in unique decoding radius)
- $\overline{f}_{j+1}$ is the "natural codeword derived from $\overline{f}_j$", i.e. $f_{j+1, \overline{f}_j, \lambda_{j+1}}$
- no distortion sets are touched, i.e. $\lambda_{j+1} \notin B(f_{j}, \delta_{j})$
Then choosing any element in $BD_i$ for the query phase will make it fail. Therefore,
$$\text{Pr}_{x_i \in D_i} (\text{QUERY FAIL}) \ge \delta_i $$
### Main Proof
Assume Lemma 4.3, 4.4 are true. Set $\epsilon = 4 / \lvert D_{r/2} \rvert$, while assuming $r$ is even for simplicity.
It can be easily computed that the total probability of a bad event is at most
$$\sum_{i=0}^r \max \left( \frac{\lvert D_{r/2} \rvert}{\lvert \mathbb{F} \rvert}, \frac{\lvert D_i \rvert}{\lvert \mathbb{F} \rvert} \right) \le \frac{3\lvert D_0 \rvert}{\lvert \mathbb{F} \rvert}$$
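A quick numeric sanity check of this bound (illustrative $r$ and $\lvert D_0 \rvert$ only):

```python
# Sum of max(|D_{r/2}|, |D_i|) over i = 0..r, compared against 3|D_0|.
r, n0 = 20, 2**20
sizes = [n0 >> i for i in range(r + 1)]           # |D_i| = |D_0| / 2^i
total = sum(max(sizes[r // 2], s) for s in sizes)
assert total <= 3 * n0
print(total / n0)                                 # about 2.01 here
```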
So we now assume that no bad event occurs. Therefore, we assume that
$$\delta_{i-1} \ge \frac{1 - \rho}{2} \implies \Delta_H (f_{i, f_{i-1}, \lambda_{i}}, \text{RS}_i) \ge \frac{1}{2} \left( \delta_{i-1} (1-\epsilon) - \rho \right) \ge \frac{1 - 3\rho - \epsilon}{4}$$
$$\delta_{i-1} < \frac{1- \rho}{2} \implies \Delta_H (f_{i, f_{i-1}, \lambda_{i}}, \text{RS}_i) \ge \delta_{i-1}$$
If the three assumptions from Part 2 of Lemma 4.4 hold, the remaining proof is easy. Let's deal with the case where they don't. We know that if assumptions 1 and 2 hold but 3 doesn't, that's a "bad event" case which we already assumed does not happen. Therefore, either assumption 1 or 2 must fail. Take $i$ to be the largest integer such that either
- $\delta_i \ge (1 - \rho) / 2$
- $\delta_i < (1 - \rho) / 2$, but $\overline{f}_{i+1} \neq f_{i+1, \overline{f}_i, \lambda_{i+1}}$
Due to maximality of $i$, we know that $\delta_{i+1} < (1 - \rho) / 2$, providing uniqueness of $\overline{f}_{i+1}$.
The first claim is that
$$\Delta_H(\overline{f}_{i+1}, f_{i+1, f_i, \lambda_{i+1}}) \ge \frac{1 - 3\rho - \epsilon}{4}$$
If $\delta_i \ge (1 - \rho) / 2$, this holds as no bad events happen and Lemma 4.3 is true.
Now we are in the case $\delta_{i} < (1 - \rho) / 2$ and $\overline{f}_{i+1} \neq f_{i+1, \overline{f}_i, \lambda_{i+1}}$. The approach: $\overline{f}_{i+1}$ and $f_{i+1, \overline{f}_i, \lambda_{i+1}}$ are distinct codewords of $\text{RS}_{i+1}$, so their relative distance must be at least $1 - \rho$. Therefore,
$$1 - \rho \le \Delta_H( \overline{f}_{i+1}, f_{i+1, \overline{f}_i, \lambda_{i+1}}) \le \Delta_H(\overline{f}_{i+1}, f_{i+1, f_i, \lambda_{i+1}}) + \Delta_H(f_{i+1, f_i, \lambda_{i+1}}, f_{i+1, \overline{f}_i, \lambda_{i+1}})$$
So it suffices to bound $\Delta_H(f_{i+1, f_i, \lambda_{i+1}}, f_{i+1, \overline{f}_i, \lambda_{i+1}})$. Thankfully, since $f_{i+1, f_i, \lambda_{i+1}}(x_i^2)$ depends only on the values of $f_i$ over the coset $\{x_i, -x_i\}$, the two derived functions can disagree at $x_i^2$ only if $f_i$ and $\overline{f}_i$ disagree on that coset. Therefore,
$$\Delta_H(f_{i+1, f_i, \lambda_{i+1}}, f_{i+1, \overline{f}_i, \lambda_{i+1}}) \le \delta_i < (1 - \rho) / 2$$
This proves
$$\Delta_H(\overline{f}_{i+1}, f_{i+1, f_i, \lambda_{i+1}}) > (1 - \rho) / 2 \ge \frac{1 - 3\rho - \epsilon}{4}$$
The second claim is that
$$\frac{\lvert RE(f_{i+1}, f_i, \lambda_{i+1}) \cup BD_{i+1} \rvert}{\lvert D_{i+1} \rvert} \ge \Delta_H(\overline{f}_{i+1}, f_{i+1, f_i, \lambda_{i+1}})$$
which is actually quite trivial, since if an element is not in $RE(f_{i+1}, f_i, \lambda_{i+1}) \cup BD_{i+1}$ then
$$\overline{f}_{i+1}(x_{i+1}) = f_{i+1}(x_{i+1}) = f_{i+1, f_i, \lambda_{i+1}}(x_{i+1})$$
where the first equality holds since $x_{i+1} \notin BD_{i+1}$ and the latter since $x_{i+1} \notin RE(f_{i+1}, f_i, \lambda_{i+1}).$
So in conclusion, we have
$$\frac{\lvert RE(f_{i+1}, f_i, \lambda_{i+1}) \cup BD_{i+1} \rvert}{\lvert D_{i+1} \rvert} \ge \frac{1 - 3 \rho - \epsilon}{4}$$
Let's consider the $x_{i+1}$ of the query phase: if it is in $RE(f_{i+1}, f_i, \lambda_{i+1})$, the query fails. If not, we move on to the next round, from which point the three assumptions are all true; so if $x_{i+1}$ is in $BD_{i+1}$, the query fails as well by Part 2 of Lemma 4.4. Therefore, the failure probability is at least $(1-3\rho-\epsilon)/4$.
We now move onto the proof of the Lemmas.
### Proof of Lemma 4.4
Since $\delta_i < (1 - \rho) / 2$, $\overline{f}_i$ and $BS_i$ are uniquely defined. For a bad coset $S \in BS_i$, define
$$X_{i, S} = \{ \lambda_{i+1}: f_{i+1, f_i, \lambda_{i+1}}(S^2) = f_{i+1, \overline{f}_i, \lambda_{i+1}}(S^2) \}$$
where $S^2 = x^2$ if $S = \{-x, x\}$. The claim is that
$$B(f_i, \delta_i) = \bigcup_{S \in BS_i} X_{i, S}$$
From $\delta_i < (1 - \rho) / 2$ we know that
$$\Delta_H(f_{i+1, f_i, \lambda_{i+1}}, f_{i+1, \overline{f}_i, \lambda_{i+1}}) \le \delta_i < (1 - \rho) / 2$$
so due to unique decoding radius
$$\Delta_H(f_{i+1, f_i, \lambda_{i+1}}, f_{i+1, \overline{f}_i, \lambda_{i+1}}) = \Delta_H(f_{i+1, f_i, \lambda_{i+1}}, \text{RS}_{i+1})$$
Now, it's clear that this is less than $\delta_i$ precisely when at least one of the bad cosets has $f_{i+1, f_i, \lambda_{i+1}}$ and $f_{i+1, \overline{f}_i, \lambda_{i+1}}$ agreeing on it - so the claim follows.
Also, $X_{i, S}$ has size at most $1$: for $S = \{-x, x\}$, the difference $f_{i+1, f_i, \lambda_{i+1}}(x^2) - f_{i+1, \overline{f}_i, \lambda_{i+1}}(x^2)$ is an affine function of $\lambda_{i+1}$ with not all coefficients zero (as $f_i|_S \neq \overline{f}_i|_S$), so it vanishes at no more than one $\lambda_{i+1}$. Hence $\lvert B(f_i, \delta_i) \rvert \le \lvert S_i \rvert \le \lvert D_i \rvert$ follows.
Now assume that all three assumptions hold for Part 2. By translation, assume $\overline{f}_i = 0$. Via induction and assumption 2, we get $\overline{f}_j = 0$ for all $j \ge i$. Via assumption 3, we have
$$\lambda_{j+1} \notin \bigcup_{S \in BS_j} X_{j, S}$$
Assume that we take $x_i, x_{i+1}, \cdots$ for the query phase, with $x_i \in BD_i$. We show that the query phase fails. Take the maximum $j$ such that $x_j \in BD_j$ - since the final $f_r$ is checked directly against a polynomial (so $BD_r = \emptyset$), $j$ is well-defined and $j < r$.
At this point, we know (via $x_j \in BD_j$ and $\lambda_{j+1}$ not inside any $X_{j, S}$)
$$f_{j+1, f_j, \lambda_{j+1}}(x_{j+1}) \neq f_{j+1, \overline{f}_j, \lambda_{j+1}}(x_{j+1}) = 0$$
and
$$f_{j+1}(x_{j+1}) = \overline{f}_{j+1}(x_{j+1}) = 0$$
so
$$f_{j+1, f_j, \lambda_{j+1}}(x_{j+1}) \neq f_{j+1}(x_{j+1})$$
making the query phase fail. Note that $x_{j+1} \notin BD_{j+1}$ via maximality.
### Proof of Lemma 4.3
This is the first part of the proof in [BBHR18b] that introduces some bivariate polynomial stuff.
Before getting into the machinery, let's go over why we need that in the first place.
The approach is to prove that, if $\epsilon \ge 4 / \lvert \mathbb{F} \rvert$ and
$$\frac{\lvert B\left(f_i, \frac{1}{2} \left( \delta (1 - \epsilon) - \rho \right)\right) \rvert}{\lvert \mathbb{F} \rvert} > \frac{4}{\epsilon \lvert \mathbb{F} \rvert}$$
then one must have $\Delta_i (f_i, \text{RS}_i) < \delta$.
First, some redefinitions - take
$$n = \lvert D_{i+1} \rvert, \quad \alpha = \frac{1}{2\delta} \left(\delta (1-\epsilon) - \rho\right), \quad \delta' = \delta \cdot \alpha, \quad B = B(f_i, \delta'), \quad m = \lvert B \rvert$$
so by definition, for each $\lambda_{i+1} \in B$ we have $\Delta_H(f_{i+1, f_i, \lambda_{i+1}}, \text{RS}_{i+1}) < \delta'$.
Define $\overline{f}_{i+1, f_i, \lambda_{i+1}}$ to be the closest codeword from $f_{i+1, f_i, \lambda_{i+1}}$.
Now we move on to the bivariate polynomials. The main idea is to work over $B \times D_{i+1}$.
First, since $\overline{f}_{i+1, f_i, \lambda_{i+1}}$ has degree less than $\rho n$, one can find (interpolating over the $m$ values $\lambda_{i+1} \in B$) a polynomial $C(X, Y)$ such that
$$\deg_X(C) < m, \quad \deg_Y(C) < \rho n, \quad C(X, Y) = \overline{f}_{i+1, f_i, X}(Y)$$
also, since the fold is affine in the challenge, one can create a $Q(X, Y)$ such that
$$\deg_X(Q) < 2 < \epsilon m, \quad Q(X, Y) = f_{i+1, f_i, X}(Y)$$
(here $2 < \epsilon m$ since $m > 4/\epsilon$ by assumption).
Meanwhile, by definition, for each fixed $X = \lambda_{i+1} \in B$, $C$ and $Q$ disagree on less than a $\delta'$ fraction of $Y \in D_{i+1}$. The total number of disagreement points in $B \times D_{i+1}$ is therefore less than $\delta' mn = (\alpha m)(\delta n)$, while polynomials with $\deg_X \le \alpha m$ and $\deg_Y \le \delta n$ form a space of dimension $(\alpha m + 1)(\delta n + 1)$, so linear algebra gives a nonzero $E(X, Y)$ such that
$$C(X, Y) \neq Q(X, Y) \implies E(X, Y) = 0, \quad \deg_X(E) \le \alpha m, \quad \deg_Y(E) \le \delta n$$
We now wish to work with the following identity, which holds at all points of $B \times D_{i+1}$:
$$P(x, y) = C(x, y) \cdot E(x, y) = Q(x, y) \cdot E(x, y)$$
First, we show that there exists such polynomial $P$ with
$$\deg_X(P) \le (\epsilon + \alpha)m, \quad \deg_Y(P) \le (\rho + \delta)n$$
This is via Proposition 4.2.9 in [[Spi95]](https://www.cs.yale.edu/homes/spielman/PAPERS/thesis.pdf). The statement is as follows.
Let $f(x, y)$ be a function over $X \times Y$ with $X = \{x_1, \cdots, x_m\}$ and $Y = \{y_1, \cdots, y_n\}$. If
- For each $1 \le j \le n$, $f(x, y_j)$ agrees with a degree-$d$ polynomial in $x$
- For each $1 \le i \le m$, $f(x_i, y)$ agrees with a degree-$e$ polynomial in $y$
Then, $f$ agrees with a polynomial $P(x, y)$ with degree $(d, e)$.
The proof is very simple - take $p_j(x) = f(x, y_j)$, which agrees with a degree-$d$ polynomial. Also, for each $1 \le j \le e+1$, define $\delta_j(y)$ to be the degree-$e$ polynomial that is $1$ at $y_j$ and $0$ at the other points of $\{y_1, \cdots, y_{e+1}\}$.
Now define $P(x, y) = \sum_{j=1}^{e+1} \delta_j(y) p_j(x)$ - this polynomial has degree $(d, e)$, and
$$P(x, y_k) = \sum_{j=1}^{e+1} \delta_j(y_k) p_j(x) = p_k(x) = f(x, y_k)$$
for $1 \le k \le e+1$. Therefore, each $f(x_i, y)$ agrees with $P(x_i, y)$ on at least $e+1$ points.
Since $f(x_i, y)$ agrees with a degree-$e$ polynomial in $y$, that polynomial must be $P(x_i, y)$, so $f$ and $P$ agree on the entire grid.
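Here is a quick sympy check of this construction on a tiny hypothetical grid (the polynomial and grid points are made up; `interpolate` builds the Lagrange interpolant):

```python
# [Spi95] Prop 4.2.9: build P from the first e+1 rows, check the whole grid.
from sympy import symbols, interpolate, expand

x, y = symbols('x y')
d, e = 2, 1
Xs, Ys = [0, 1, 2, 3], [0, 1, 2]              # m = 4, n = 3 grid points
F = expand((x**2 + 3*x + 1) * (2*y + 5))      # secretly of degree (2, 1)
f = {(a, b): F.subs({x: a, y: b}) for a in Xs for b in Ys}

P = 0
for j in range(e + 1):
    p_j = interpolate([(a, f[(a, Ys[j])]) for a in Xs], x)  # row interpolant
    delta_j = interpolate([(b, int(k == j))
                           for k, b in enumerate(Ys[:e + 1])], y)
    P += delta_j * p_j                        # P = sum of delta_j(y) * p_j(x)

assert all(expand(P).subs({x: a, y: b}) == f[(a, b)] for a in Xs for b in Ys)
```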
Now, some fiddling with the degrees shows that
$$E(X, y) \mid P(X, y) \text{ } \forall y \in D_{i+1}, \quad E(x, Y) \mid P(x, Y) \text{ } \forall x \in B$$
To be more exact, note that $E(X, y) \cdot Q(X, y)$ agrees with $P(X, y)$ on the $m$ points of $B$, while both have $X$-degree at most $(\epsilon + \alpha) m < m$. This shows that for each $y \in D_{i+1}$ we have $P(X, y) = E(X, y) \cdot Q(X, y)$; the case of fixed $x \in B$ is analogous, with $C$ in place of $Q$.
Now the goal is to show that $E(X, Y) | P(X, Y)$ as a bivariate polynomial. If so, we set
$$Q'(X, Y) = P(X, Y) / E(X, Y)$$
We know that $Q(X, Y) = Q'(X, Y)$ on all points where $E(X, Y) \neq 0$.
This of course has $\deg_X(Q') < 2$ and $\deg_Y(Q') < \rho n$. Writing
$$Q'(X, Y) = P_1(Y) + X \cdot P_2(Y)$$
we can consider
$$\overline{f}_i(x) = P_1(x^2) + x \cdot P_2(x^2)$$
We claim that $\overline{f}_i$ disagrees with $f_i$ on at most $\delta n$ cosets.
We know that
$$\frac{f_i(x) + f_i(-x)}{2} + \lambda \cdot \frac{f_i(x) - f_i(-x)}{2x} = Q(\lambda, x^2)$$
$$Q'(\lambda, x^2) = \frac{\overline{f}_i(x) + \overline{f}_i(-x)}{2} + \lambda \cdot \frac{\overline{f}_i(x) - \overline{f}_i(-x)}{2x}$$
If $Q'(\lambda, x^2) = Q(\lambda, x^2)$ for at least two values of $\lambda$, then since both sides are affine in $\lambda$, they agree identically, forcing $f_i = \overline{f}_i$ on $\{-x, x\}$.
Assume otherwise - then $Q(\lambda, x^2) = Q'(\lambda, x^2)$ holds for at most one $\lambda$, and since $E(\lambda, x^2) \cdot Q(\lambda, x^2) = P(\lambda, x^2) = E(\lambda, x^2) \cdot Q'(\lambda, x^2)$, every other $\lambda$ must have $E(\lambda, x^2) = 0$. This forces $E(X, x^2)$ to be the zero polynomial in $X$, as its $X$-degree is at most $\alpha m$, far below the number of zeros. Since $\deg_Y(E) \le \delta n$ and $E \not\equiv 0$, there are at most $\delta n$ possible such values of $x^2$.
This proves that $\Delta_i(f_i, \text{RS}_i) < \delta$, as desired. Now we move on to proving $E(X, Y) | P(X, Y)$.
### Lemma 4.7.
If $E(X, Y)$ is a polynomial of degree $(\alpha m, \beta n)$ and $P(X, Y)$ is a polynomial of degree $((\alpha + \delta)m, (\beta + \epsilon) n)$ (the Greek letters here are fresh parameters, not the earlier ones), and there exist distinct $x_1, \cdots, x_m$ such that $E(x_i, Y) \mid P(x_i, Y)$ and distinct $y_1, \cdots, y_n$ such that $E(X, y_i) \mid P(X, y_i)$, and the following inequality holds:
$$1 > \max \left\{ \beta + \epsilon, 2 \alpha + \delta + \frac{\epsilon}{\beta} \right\}$$
then $E(X, Y) | P(X, Y)$ as a bivariate polynomial.
To finish Lemma 4.3 with Lemma 4.7, raw computation is sufficient.
First, one shows that we can assume $\gcd(P, E) = 1$ - this can be done by raw computation: showing that removing the common factors keeps the main inequality assumption true.
Assume $\gcd(P, E) = 1$ with $E$ nontrivial. We show that the resultant is zero.
Assume $\beta \ge \alpha$. Consider the usual resultant (Sylvester) matrix with respect to $Y$ (see page 98 of the paper) - denote
$$P(x, y) = \sum_{i=0}^{(\beta + \epsilon)n} P_i(x) y^i$$
$$E(x, y) = \sum_{i=0}^{\beta n} E_i(x) y^i$$
so, as usual, it has $\beta n$ rows of $P_i$ coefficients and $(\beta + \epsilon)n$ rows of $E_i$ coefficients. The degree of the resultant is then at most
$$(\alpha + \delta)m \cdot \beta n + \alpha m \cdot (\beta + \epsilon) n = mn(\alpha \beta + \delta \beta + \alpha \beta + \epsilon \alpha)$$
The first idea is that $E(x_i, Y) \mid P(x_i, Y)$ as polynomials, so the resultant matrix has rank at most $(\beta + \epsilon) n$ after substituting $x = x_i$. Also, by repeated application of the product rule, differentiating a determinant yields a sum of determinants in each of which exactly one row has been differentiated, so each differentiation increases the rank by at most $1$. See Proposition 4.2.17 in the [Spi95] paper.
Therefore, as the matrix has size $(2\beta + \epsilon) n$, each $x_i$ is a zero of the resultant polynomial with multiplicity at least $\beta n$. This gives at least $mn\beta$ roots, counted with multiplicity.
It can be seen with some computation (multiply the assumed inequality $1 > 2\alpha + \delta + \epsilon/\beta$ by $\beta$ and use $\epsilon \alpha \le \epsilon$) that
$$mn\beta > mn(\alpha \beta + \delta \beta + \alpha \beta + \epsilon \alpha)$$
which forces the resultant polynomial to be identically zero. But the resultant (in $Y$) of two polynomials vanishes identically precisely when they share a common factor, contradicting $\gcd(P, E) = 1$ with $E$ nontrivial.
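As a closing sanity check, the "identically zero resultant means a common factor" fact is easy to see with sympy (a hedged toy example, not the paper's polynomials):

```python
# Res_Y(P, E) vanishes identically in X iff P and E share a factor.
from sympy import symbols, resultant, gcd

X, Y = symbols('X Y')
E = Y**2 - X
P_shared = (Y**2 - X) * (Y + X**2)       # shares the factor Y^2 - X with E
P_coprime = Y**3 + X*Y + 1               # gcd with E is 1

print(resultant(P_shared, E, Y))         # 0: the zero polynomial in X
print(resultant(P_coprime, E, Y))        # a nonzero polynomial in X
assert gcd(P_coprime, E) == 1
```

In the proof above, the counting argument shows the resultant has more roots (with multiplicity) than its degree allows, so it must be the zero polynomial, and the common factor this produces is what contradicts $\gcd(P, E) = 1$.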