# Decoding Reed-Solomon Codes
When we discuss Reed-Solomon codes, or codes in general, in ZKPs, the main ingredient is really proximity and proximity testing. For example, to test if $f_1, f_2, \cdots, f_m$ have correlated agreement with a codeword, how would you do it efficiently?
In that sense, if I recall correctly, you need to dig a little deeper to find relations between Reed-Solomon decoding and ZKPs. Of course, that doesn't mean there's no relation between the two. The most critical relationship between the two comes from, none other than, [[BCIKS20]](https://eprint.iacr.org/2020/654.pdf). The proof technique for correlated agreement starts with running an efficient Reed-Solomon decoding algorithm on a word over a formal variable.
Also, the fact that there is a polynomial number of codewords within the list decoding regime is an important part of some proofs, such as the soundness of STIR.
Therefore, it seems like a good idea to study the relevant theory regarding Reed-Solomon Codes and their decoding possibilities. Also, it's just good coding theory study and it's fun.
This note aims to study four things.
- The Johnson Bound, which works for arbitrary codes
- Berlekamp-Welch Decoder, which works within the unique decoding radius
- Guruswami-Sudan Decoder, which works up to the Johnson Bound
- The STOC 2024 paper ["Randomly punctured Reed–Solomon codes achieve list-decoding capacity over linear-sized fields"](https://arxiv.org/pdf/2304.09445), which is a more combinatorial paper
- To be more exact, the paper doesn't discuss efficient decoding algorithms
as well as some bounds on linear codes from the literature.
# Johnson Bound
We show two proofs: one combinatorial, and one geometric.
The desired statement is that for a code $C$ with distance $d$ and length $n$, $C$ is
$$\left(1-\sqrt{1 - d/n}, nd\right)$$
list decodable. Note that this bound doesn't depend on $q$, the alphabet size.
## Proof 1
I learned this from [this lecture note](https://courses.csail.mit.edu/6.440/spring08/notes/ST08-Lecture12.pdf) and confirmed the result [here](https://courses.cs.washington.edu/courses/cse533/06au/lecnotes/lecture6.pdf).
It also seems like [this note](https://www.cs.cmu.edu/~venkatg/teaching/au18-coding-theory/lec-scribes/list-decoding.pdf) is very good, although I found this after I wrote this.
Let $\rho < 1 - \sqrt{1 - d/n}$ and $y$ be a word. Consider $L$ codewords that are within $\rho n$ distance from $y$. Denote those codewords as $c_1, \cdots, c_L$. Now, we take a bipartite graph $[L] \times [n]$ where the $i$th vertex on the left is connected to the $j$th vertex on the right iff
$$(c_i)_j = y_j$$
holds. There are two important facts here. The first is that each vertex on the left has degree at least $(1 - \rho)n$, as the distance from $y$ is at most $\rho n$. The second fact is that this bipartite graph does not contain $K_{2, n-d+1}$ as a subgraph. Indeed, if this graph did contain $K_{2, n-d+1}$, then two codewords would agree on $n - d + 1$ entries, which contradicts the fact that the code has distance $d$. Now we're ready to finish.
The idea is to double count the tuple $T = (i_1, i_2, j)$ where
$$i_1 \neq i_2, \quad (c_{i_1})_j = (c_{i_2})_j = y_j$$
First, for each pair $(i_1, i_2)$, there are at most $n - d$ values of $j$. Therefore,
$$\#T \le (n-d) \cdot \binom{L}{2}$$
Also, for each $j$, denote $d_j$ as the degree of vertex $j$ on the right. Note that
$$\#E = \sum_{j=1}^n d_j \ge (1 - \rho)n \cdot L$$
and now, by simply using Jensen's inequality
$$\#T = \sum_{j=1}^n \binom{d_j}{2} \ge n \cdot \binom{(1 - \rho)L}{2}$$
Combining these two results, we obtain
$$n \cdot \binom{(1-\rho)L}{2} \le (n-d) \cdot \binom{L}{2}$$
Since $\rho < 1 - \sqrt{1 - d/n}$, we know $n(1 - \rho)^2 > n-d$ and now
$$L \le \frac{d - n\rho}{n(1 - \rho)^2 - (n - d)}$$
Note that $n(1-\rho)^2 > n - d$, and $\rho = \Delta / n$ for some integer $\Delta$. Therefore the denominator, being positive and an integer divided by $n$, must be at least $1/n$. This shows
$$L \le (d - n \rho) \cdot n \le nd$$
as desired.
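To see the numbers in action, here is a tiny sketch of mine that evaluates the list size bound from the double count; the parameters are purely illustrative. Note how the denominator really is an integer over $n$.
```python=
from fractions import Fraction

def johnson_list_bound(n: int, d: int, Delta: int) -> Fraction:
    # list size bound at radius rho = Delta / n, valid whenever
    # rho < 1 - sqrt(1 - d/n), i.e. when n * (1 - rho)^2 > n - d
    rho = Fraction(Delta, n)
    denom = n * (1 - rho) ** 2 - (n - d)
    assert denom > 0, "radius is not below the Johnson bound"
    return (d - n * rho) / denom

# e.g. n = 64, d = 49: the Johnson radius is 1 - sqrt(15/64) ~ 0.5159,
# so Delta = 33 (rho ~ 0.5156) barely qualifies, with denominator 1/64
print(johnson_list_bound(64, 49, 33))  # 1024, well below nd = 3136
```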
## Proof 2
The second proof is geometric, and I learned it from [here](https://cse.buffalo.edu/faculty/atri/courses/coding-theory/lectures/lect17.pdf) and [here](https://www.cs.cmu.edu/~venkatg/teaching/codingtheory/notes/notes4.pdf).
First, we show that there's a function $f: C \rightarrow \mathbb{R}^{nq}$ such that
$$c \in C \implies \lVert f(c) \rVert = 1$$
$$c_1, c_2 \in C, c_1 \neq c_2 \implies \langle f(c_1), f(c_2) \rangle = 1 - \frac{q}{q-1} \cdot \left( \frac{\Delta(c_1, c_2)}{n} \right)$$
This can be shown by making a map $\phi: [q] \rightarrow \mathbb{R}^q$ as
$$\phi(i) = \left\langle \frac{1}{q}, \cdots, \frac{1}{q}, -\frac{q-1}{q}, \frac{1}{q}, \cdots, \frac{1}{q} \right\rangle$$
where the index of $-(q-1)/q$ is $i$. Then, by mapping $(c_1, c_2, \cdots, c_n)$ to
$$\sqrt{\frac{q}{n(q-1)}} \cdot \left\langle \phi(c_1), \cdots, \phi(c_n) \right\rangle$$
the proof follows by direct computation. This doesn't need to start from a codeword.
Now, take $y$ and say there are $L$ codewords that are within $\rho n$ distance from $y$.
For any two distinct codewords $c_1, c_2$ within $B(y, \rho n)$ and any real $\alpha > 0$, we have
$$
\begin{aligned}
& \langle f(c_1) - \alpha f(y), f(c_2) - \alpha f(y) \rangle = \\
& (\alpha - 1)^2 - \frac{q}{q-1} \cdot \frac{\Delta(c_1, c_2)}{n} + \alpha \cdot \frac{q}{q-1} \cdot \left( \frac{\Delta(c_1, y) + \Delta(c_2, y)}{n} \right) \\
& \le (\alpha - 1)^2 - \frac{q}{q-1} \cdot \frac{d}{n} + 2\alpha\rho \cdot \frac{q}{q-1}
\end{aligned}
$$
Also, note that
$$
\begin{aligned}
\langle f(c_1) - \alpha f(y), f(c_1) - \alpha f(y) \rangle &= (\alpha - 1)^2 + 2\alpha \cdot \frac{q}{q-1} \cdot \left( \frac{\Delta(c_1, y)}{n} \right) \\
& \le (\alpha - 1)^2 + 2\alpha\rho \cdot \frac{q}{q-1}
\end{aligned}
$$
Let's denote
$$c = \frac{q}{q-1}, \quad \alpha = 1 - \rho c$$
If $\rho < 1 - 1/q$, then $\alpha > 0$. Plugging this in, we see that
$$\langle f(c_1) - \alpha f(y), f(c_2) - \alpha f(y) \rangle \le \rho c(2 - \rho c) - c (d/n) $$
$$\langle f(c_1) - \alpha f(y), f(c_1) - \alpha f(y) \rangle \le \rho c(2 - \rho c)$$
First, we show that the first inner product is negative. Indeed, after dividing out the common factor $c > 0$,
$$\rho (2 - \rho c) - d/n < \rho(2 - \rho) - d/n = 1 - d/n - (1 - \rho)^2 < 0$$
as $\rho < 1 - \sqrt{1 - d/n}$. Therefore, by rescaling each $f(c_i) - \alpha f(y)$ to unit length, we obtain $L$ distinct vectors of unit length such that any two (distinct) vectors have inner product at most
$$\frac{\rho c(2 - \rho c) - c (d/n)}{\rho c(2 - \rho c)} = 1 - \frac{d/n}{\rho (2 - \rho c)}$$
At this point, we show the following Lemma. If $m$ unit vectors in $\mathbb{R}^n$ satisfy
$$\langle v_i, v_j \rangle \le -\epsilon, \quad \forall i \neq j$$
then $m \le 1 + \epsilon^{-1}$. This can be shown easily, by noting
$$0 \le \lVert \sum_{i=1}^m v_i \rVert^2 \le m - m(m-1)\epsilon$$
Therefore, we can now deduce that
$$L \le 1 + \frac{1}{\frac{d/n}{\rho (2 - \rho c)} - 1} = \frac{d/n}{d/n - \rho (2 - \rho c)}$$
There are many finishes here - see Exercise 4 and Theorem 11 in [this note](https://www.cs.cmu.edu/~venkatg/teaching/codingtheory/notes/notes4.pdf) for concrete results. These results can be directly derived from the computation so far.
For our purposes, however, we can just assume $q > n$ (this is the case for ZKPs), which lets us ignore the condition $\rho < 1 - 1/q$, and then simply use
$$L \le \frac{d/n}{d/n - \rho(2-\rho c)} \le \frac{d/n}{d/n - \rho(2 - \rho)} = \frac{d}{d - \rho n (2 - \rho)}$$
Since the denominator $d - \rho n (2 - \rho)$ is an integer divided by $n$, it is at least $1/n$ whenever positive, so $L \le dn$ as desired.
You can also solve the reverse problem, fixing $L$ and solving for $\rho$. You can prove that there are at most $\epsilon^{-1}$ codewords within distance $n - \sqrt{n(n-d) + nd\epsilon}$. Basically, just set the $L$ bound as $\epsilon^{-1}$ and solve for $\rho$. The same computation works out, along with
$$1 - \sqrt{1-x} \le \left(1 - \frac{1}{q}\right) \cdot \left(1 - \sqrt{1 - \frac{qx}{q-1}}\right)$$
which holds for $0 \le x \le 1 - 1/q$. Refer to the note linked above for more discussion.
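For completeness, the reverse direction as a one-line helper (a sketch of mine under the same assumptions, rounding the distance down to an integer):
```python=
from math import floor, sqrt

def radius_for_list_size(n: int, d: int, eps: float) -> int:
    # distance within which at most 1/eps codewords exist
    return floor(n - sqrt(n * (n - d) + n * d * eps))
```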
# Berlekamp-Welch Decoder
I studied this from [this lecture note](https://cse.buffalo.edu/faculty/atri/courses/coding-theory/lectures/lect27.pdf) and compared it to Lemma 4.2 of [[BCIKS20]](https://eprint.iacr.org/2020/654.pdf).
Now we look at efficient decoders. We'll just focus on polynomial running time.
Consider the Reed-Solomon code with parameters $[n, k, n - k + 1]$: in other words, the code consists of evaluations of polynomials of degree less than $k$, and has distance $n - k + 1$.
The Berlekamp-Welch decoder works within the unique decoding radius $(n - k + 1) / 2$.
Let $e \in \left(0, \frac{n-k+1}{2}\right)$ be an integer, and say we have inputs $\{(\alpha_i, y_i)\}_{i=1}^n$. We aim to output a polynomial $P$ of degree less than $k$ such that $\Delta(y, P(\alpha)) \le e$. Note that this $P$, if it exists, is unique.
The idea is to take two polynomials $E, Q$ with $\deg E = e$ and $\deg Q \le e + k - 1$ such that
$$y_i E(\alpha_i) = Q(\alpha_i), \quad \forall 1 \le i \le n$$
If such polynomials are recovered, one simply outputs $P = Q/E$.
We need to show three things.
- Part 1: If $P$ exists, such $Q$ and $E$ exist.
- Part 2: If $P$ exists, then the output $Q/E$ result is unique.
- Part 3: If $P$ exists, such $Q$ and $E$ can be recovered in polynomial time.
## Part 1
This is very simple: basically, we force $E$ and $Q$ to vanish on the disagreements between $y$ and $P(\alpha)$. Let $D$ be the set of disagreement points. We can simply set
$$E(x) = \prod_{a \in D} (x - a), \quad Q(x) = P(x) E(x)$$
If $\alpha_i \notin D$, then since $y_i = P(\alpha_i)$ we are good. If $\alpha_i \in D$, then $E(\alpha_i) = Q(\alpha_i) = 0$.
Since $\lvert D \rvert \le e$, we can pad both $E, Q$ by multiplying by an appropriate power of $x$ so that $\deg E$ is exactly $e$.
## Part 2
Assume that $Q_1, E_1, Q_2, E_2$ both satisfy our conditions, yet $Q_1/E_1 \neq Q_2/E_2$.
The polynomial
$$R(x) = Q_1(x) E_2(x) - Q_2(x)E_1(x)$$
vanishes at every $\alpha_i$, since $Q_1(\alpha_i)E_2(\alpha_i) = y_i E_1(\alpha_i) E_2(\alpha_i) = Q_2(\alpha_i)E_1(\alpha_i)$. Degree counting gives
$$\deg R \le e + (e + k - 1) < n$$
so $R \equiv 0$, which contradicts $Q_1/E_1 \neq Q_2/E_2$.
## Part 3
This can be done by solving a system of linear equations. The number of equations is $n$. The number of variables is $e + 1 + e + k \le n + 1$. We add the condition that the coefficient of $x^e$ in $E$ is $1$, as both $E, Q$ can be multiplied by a constant factor.
Now we can simply solve for a system of $n + 1$ equations and $n + 1$ variables.
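Since the whole decoder is one linear solve plus one polynomial division, it fits in a short sketch. The following is a minimal implementation of mine over a prime field $\mathbb{F}_p$ (prime so that `pow(x, -1, p)` works); all helper names are mine, not from the lecture note.
```python=
# Minimal Berlekamp-Welch sketch over GF(p) for p prime, assuming at most
# e errors with e < (n - k + 1) / 2. Unknowns: E_0..E_{e-1} (E_e = 1 is
# fixed) and Q_0..Q_{e+k-1}, constrained by y_i * E(a_i) = Q(a_i).

def solve_mod(A, b, p):
    """Gaussian elimination over GF(p); returns one solution or None."""
    rows, cols = len(A), len(A[0])
    M = [row[:] + [bi % p] for row, bi in zip(A, b)]
    piv_cols, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] % p), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], -1, p)
        M[r] = [x * inv % p for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] % p:
                f = M[i][c]
                M[i] = [(x - f * y) % p for x, y in zip(M[i], M[r])]
        piv_cols.append(c)
        r += 1
    if any(row[-1] % p for row in M[r:]):
        return None  # inconsistent system
    x = [0] * cols
    for i, c in enumerate(piv_cols):
        x[c] = M[i][-1]
    return x

def poly_div_exact(Q, E, p):
    """Divide Q by E (with invertible leading coeff); quotient if remainder 0."""
    Q = Q[:]
    out = [0] * (len(Q) - len(E) + 1)
    for i in reversed(range(len(out))):
        out[i] = Q[i + len(E) - 1] * pow(E[-1], -1, p) % p
        for j, c in enumerate(E):
            Q[i + j] = (Q[i + j] - out[i] * c) % p
    return out if not any(Q) else None

def berlekamp_welch(points, k, e, p):
    A, b = [], []
    for a, y in points:
        row = [y * pow(a, j, p) % p for j in range(e)]    # y * (low part of E(a))
        row += [-pow(a, j, p) % p for j in range(e + k)]  # -Q(a)
        A.append(row)
        b.append(-y * pow(a, e, p) % p)                   # move y * a^e to the RHS
    sol = solve_mod(A, b, p)
    if sol is None:
        return None
    E, Q = sol[:e] + [1], sol[e:]
    return poly_div_exact(Q, E, p)  # coefficients of P, low to high

p = 97
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [(3 * x * x + 1) % p for x in xs]  # f(x) = 3x^2 + 1, so k = 3
ys[0], ys[4] = 55, 10                   # two errors, e = 2 < (7 - 3 + 1) / 2
print(berlekamp_welch(list(zip(xs, ys)), 3, 2, p))  # [1, 0, 3]
```
By Part 2, any solution of the linear system yields the same ratio $Q/E = P$, so setting the free variables to zero in the solver is fine.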
# Guruswami-Sudan Decoder
I studied this from [this note](https://www.cs.cmu.edu/~venkatg/teaching/au18-coding-theory/lec-scribes/list-decoding.pdf), [this note](https://www.cs.cmu.edu/~venkatg/teaching/au18-coding-theory/lec-scribes/RS-list-decoding.pdf) and [this note](https://courses.csail.mit.edu/6.440/spring08/scribe/lec13.pdf).
The only pre-requisite (or thing we need to take for granted) is that bivariate polynomials over $\mathbb{F}_q$ can be factorized in polynomial time. Anyways, we are given $n$ points $(a_i, y_i)_{i=1}^n$ and will try to find all polynomials of degree at most $k$ with more than $\sqrt{kn}$ agreements.
- Actually, if you know the algorithm for factoring univariate polynomials over any finite field, you can do the bivariate polynomial factorization by taking an extension field with $x$ and factoring over $y$. After the initial factoring, you can go back to $\mathbb{F}_q[x]$.
- This is due to us only needing to find factors of the form $y - f(x)$.
## Main Approach
The rough framework for our decoder is as follows. First, we fix some parameter $l$. Then, we take a bivariate polynomial $Q(x, y) \neq 0$ with $Q(a_i, y_i) = 0$ for all $i$, where $Q$ has the form
$$Q(x, y) = \sum_{j=0}^l A_j(x) y^j, \quad \deg (A_j) \le d_j$$
Then, we take a factor $y - f(x)$ of $Q$, and output $f$ if it matches our goals.
The first choice is to use $l = \sqrt{n/k}$ and $d_j = n/l = \sqrt{nk}$. Note that the system has a non-zero solution, as the number of variables is $(l+1)(n/l+1)$ while there are only $n$ equations.
Suppose that $f$ satisfies $\deg f \le k$ and $f(a_i)$ agrees with $y_i$ on more than $2 \sqrt{kn}$ entries. We aim to show that $y - f(x)$ is a factor of $Q(x, y)$. Due to long division, it suffices to show $Q(x, f(x)) = 0$. By shoving $x = a_i$ for the agreements, we know that
$$Q(a_i, f(a_i)) = Q(a_i, y_i) = 0$$
so $Q(x, f(x))$ has more than $2\sqrt{kn}$ roots. Also, the degree of $Q(x, f(x))$ is at most
$$\deg Q(x, f(x)) \le \max_j \left(d_j + j \cdot \deg f\right) \le \sqrt{nk} + l \cdot k = 2\sqrt{nk}$$
so we have $Q(x, f(x)) = 0$ as we desired. So this works for more than $2\sqrt{kn}$ agreements.
## Optimization
Of course, now we see that we can do better - let's set $d_j = D - k \cdot j$. We will have
$$\deg Q(x, f(x)) \le D$$
We still need the degree of freedom condition to hold - which requires
$$\sum_{i=0}^{\lfloor D/k \rfloor} (D - k \cdot i) \ge n$$
This can be shown to hold if $D = \lfloor \sqrt{2kn} \rfloor$. Therefore, this works for more than $\sqrt{2kn}$ agreements. This is better, but still off by a factor of $\sqrt{2}$. We patch this using multiplicities.
## Final Idea and Proof
This time, we set a bivariate polynomial $Q(x, y)$ such that
$$Q(\alpha_i, y_i) = 0, \quad \text{multiplicity} = r, \quad \forall 1 \le i \le n$$
$$\deg Q(x, f(x)) \le D$$
Working similarly, the number of equations we have to satisfy is $n \cdot \binom{r+1}{2}$, while the number of variables we have for our use is $\sum_{i=0}^{\lfloor D/k \rfloor} (D - k \cdot i)$. Therefore, we need
$$\sum_{i=0}^{\lfloor D/k \rfloor} (D - k \cdot i) > n \cdot \binom{r+1}{2}$$
We show that if $f$ is a polynomial with $f(\alpha_i) = y_i$, then $(x-\alpha_i)^r$ divides $Q(x, f(x))$.
Basically, the multiplicity condition says that, expanding around $(\alpha_i, y_i)$, all monomials $(x - \alpha_i)^l (y - y_i)^m$ with $l + m < r$ have zero coefficients. Therefore, expanding $Q(x, f(x))$ around $(\alpha_i, y_i)$, we see that the expansion looks like the following.
$$Q(x,f(x)) = \sum_{l + m \ge r} q_{l, m} (x - \alpha_i)^l (f(x) - f(\alpha_i))^m$$
but since $f(x) - f(\alpha_i)$ has a factor of $(x - \alpha_i)$, we are done.
Now, we can see that if $\deg f \le k$ and $f(\alpha_i) = y_i$ for the $t$ values $i \in I$, with $rt > D$, then $y - f(x)$ must be a factor of $Q(x, y)$. This is simple - we know we need to show $Q(x, f(x)) = 0$. This is of degree at most $D$, but we also know that $Q(x, f(x))$ must be a multiple of
$$\prod_{i \in I} (x - \alpha_i)^r$$
which is of degree $rt$. Since $rt > D$, we can conclude $Q(x, f(x)) = 0$.
Therefore, we need the setup where
$$rt > D, \quad \frac{D(D+2)}{2k} > n \binom{r+1}{2}$$
Note that our focus is on $t$. To see how far we can go with $t$, note that
$$rt(rt+2) > nk(r+1)r$$
by shoving $rt > D$. This roughly gives us
$$t(t+2/r) > nk(1+1/r)$$
for large values of $r$, for example $r = nk$, so we arrive at $t > \sqrt{nk}$ as desired.
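To get a feel for the parameters, here is a small sketch of mine that, given $n, k, r$, finds the smallest valid $D$ and the resulting agreement threshold $t$. This is only the parameter search - the interpolation and bivariate factoring steps are omitted.
```python=
from math import comb

def gs_parameters(n: int, k: int, r: int):
    # smallest D such that the number of coefficients of Q with
    # (1, k)-weighted degree <= D exceeds the n * C(r+1, 2) constraints
    need = n * comb(r + 1, 2)
    D = 0
    while sum(D - k * i + 1 for i in range(D // k + 1)) <= need:
        D += 1
    t = D // r + 1  # smallest t with r * t > D
    return D, t

# n = 256, k = 64: unique decoding needs agreement >= 160, and larger r
# pushes the threshold t down toward sqrt(n * k) = 128
for r in [1, 2, 8, 32]:
    print(r, gs_parameters(256, 64, r))
```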
# [[AGL24]](https://arxiv.org/pdf/2304.09445) - The Context
So we arrive at the final boss, which I decided to study mostly for entertainment.
To understand the context, we need to study some bounds regarding list decoding.
First of all, for general codes of alphabet size $q$, there's a fundamental bound on what's possible in list decoding. This is called the "list decoding capacity".
> Theorem. Let $q \ge 2$, $0 \le \rho < 1 - 1/q$, and $\epsilon > 0$ be a small real.
> For sufficiently large block length $n$, the following is true.
> (i). If $R \le 1 - H_q(\rho) - \epsilon$, there exists a $(\rho, \mathcal{O}(1/\epsilon))$-list-decodable code with rate $R$.
> (ii). If $R \ge 1 - H_q(\rho) + \epsilon$, every $(\rho, L)$-list-decodable code with rate $R$ has $L \ge q^{\Omega(\epsilon n)}$.
This implies that the regime changes at
$$1 - H_q(\rho)$$
where $H_q(p)$ is the $q$-ary entropy function
$$H_q(p) = p \log_q (q-1) - p \log_q p - (1-p) \log_q (1 - p)$$
Note that for large $q \ge \exp(\Omega(1/\epsilon))$, we have something along the lines of
$$H_q(p) = p + \mathcal{O}(1/\log q)$$
so the regime changes on $R = 1 - \rho - \epsilon$. This means
$$\rho = 1 - R - \epsilon$$
is the boundary, i.e. $\rho \approx 1 - R$ is the target.
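Here is the $q$-ary entropy in a few lines, as a sanity check on the large-$q$ approximation (the convention $H_q(0) = 0$ is mine):
```python=
from math import log

def H_q(p: float, q: int) -> float:
    # q-ary entropy, with the convention H_q(0) = 0
    if p == 0:
        return 0.0
    return p * log(q - 1, q) - p * log(p, q) - (1 - p) * log(1 - p, q)

# H_q(rho) -> rho as q grows, so 1 - H_q(rho) really approaches 1 - rho
for bits in [8, 16, 64]:
    print(bits, H_q(0.75, 2**bits))  # ~0.851, ~0.801, ~0.763 -> 0.75
```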
Also, on list decoding, there's a generalization of the Singleton bound, from [[ST20]](https://arxiv.org/pdf/1911.01502).
> Theorem. If $C$ is a linear code in $\mathbb{F}_q^n$ with $q > L$, and $C$ is $(\rho, L)$-list-decodable, then
>$$ \frac{ \log (\lvert C \rvert )}{\log q} \le n - \left\lfloor \frac{(L+1)\rho n}{L} \right\rfloor$$
This means that, roughly speaking
$$\rho \approx \frac{L}{L+1} (1 - R)$$
is the optimal distance.
Therefore, one could ask if Reed-Solomon codes are
$$\left( \frac{L}{L+1}(1 - R - \epsilon), L \right)$$
list-decodable, and under what conditions.
This paper shows that Reed-Solomon codes, created with **random evaluation points**, are
$$\left( \frac{L}{L+1}(1 - R - \epsilon), L \right)$$
average-radius list decodable, if
$$q \ge n + k \cdot 2^{10L/\epsilon}$$
This is **combinatorial**, so no efficient decoding algorithm is derived from the paper.
Before we get into the paper, let's study the proofs of the two Theorems above.
## Proof of List Decoding Capacity - Part 1
This is Theorem 7.4.2 in [this book](https://cse.buffalo.edu/faculty/atri/courses/coding-theory/book/web-coding-book.pdf) and also can be found in [this lecture note](https://www.cs.cmu.edu/~venkatg/teaching/au18-coding-theory/lec-scribes/list-decoding.pdf). The proofs shown here, both Part 1 and Part 2, are taken from the material in the links above.
For Part 1, we just choose the codewords randomly. Note that we don't care whether or not the codes are linear. This Theorem is about general codes.
So basically the idea is to choose $q^{nR}$ codewords randomly, and show via union bound that the probability that there exists a $y$ such that
$$\lvert B_q(y, \rho n) \cap C \rvert > L$$
is less than $1$. To do so, we concretely choose $R = 1 - H_q(\rho) - 1/L$.
For a fixed $y$, the probability that
$$\lvert B_q(y, \rho n) \cap C \rvert \ge L + 1$$
holds is, by the union bound, at most
$$\binom{\lvert C \rvert}{L+1} \cdot \left(\frac{\text{Vol}(B_q(y, \rho n))}{q^n} \right)^{L+1}$$
Here, we apply the important bound on Hamming ball's volumes, i.e.
$$\text{Vol}(B_q(y, \rho n)) \le q^{H_q(\rho) n}$$
Let's give a quick proof of this, taken from [this lecture note](https://www.cs.cmu.edu/~venkatg/teaching/codingtheory/notes/notes2.pdf).
Basically, one can compute that
$$\begin{aligned}
\frac{\text{Vol}(B_q(y, \rho n))}{q^{H_q(\rho) n}} &= \frac{\sum_{j=0}^{\rho n} \binom{n}{j} (q-1)^j}{(q-1)^{\rho n} \rho^{-\rho n}(1-\rho)^{-(1-\rho)n}} \\
&= \sum_{j=0}^{\rho n} \binom{n}{j} (q-1)^j (1-\rho)^n \left(\frac{\rho}{(q-1)(1-\rho)} \right)^{\rho n} \\
& \le \sum_{j=0}^{\rho n} \binom{n}{j} (q-1)^j (1-\rho)^n \left(\frac{\rho}{(q-1)(1-\rho)} \right)^j \\
&= \sum_{j=0}^{\rho n} \binom{n}{j} (1-\rho)^{n-j} \rho^j \le (\rho + (1 - \rho))^n = 1
\end{aligned}$$
Here, we have used the fact that
$$\frac{\rho}{(q-1)(1-\rho)} \le 1$$
which comes from the assumption $\rho \le 1 - 1/q$. Therefore, the probability is at most
$$\binom{\lvert C \rvert}{L+1} \cdot \left(\frac{\text{Vol}(B_q(y, \rho n))}{q^n} \right)^{L+1} \le q^{nR(L+1)} \cdot q^{(L+1)(H_q(\rho) n - n)} $$
and summing the bound over all $q^n$ possible $y$, we see that the total probability is at most
$$q^{n + nR(L+1) + (L+1)(H_q(\rho) n - n)} = q^{-n/L} < 1$$
by routine computation. This means that there must be a $C$ that we want.
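Here is a quick numeric sanity check of the volume bound $\text{Vol}(B_q(y, \rho n)) \le q^{H_q(\rho)n}$ used above, reusing the `H_q` helper from the earlier snippet; the parameters are arbitrary.
```python=
from math import comb, log

def ball_volume(n: int, q: int, radius: int) -> int:
    return sum(comb(n, j) * (q - 1) ** j for j in range(radius + 1))

n, q, rho = 100, 16, 0.3
vol = ball_volume(n, q, int(rho * n))
print(log(vol, q), H_q(rho, q) * n)  # the left value stays below the right
```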
## Proof of List Decoding Capacity - Part 2
We choose $y$ at random to compute the expected number of codewords in $B_q(y, \rho n)$.
By linearity of expectation, one can simply write
$$\mathbb{E}[\lvert B_q(y, \rho n) \cap C \rvert] = \frac{\text{Vol}(B_q(y, \rho n)) \cdot \lvert C \rvert}{q^n}$$
So now we need a lower bound of $\text{Vol}(B_q(y, \rho n))$. This can be done by showing
$$\text{Vol}(B_q(y, \rho n)) \ge \binom{n}{\rho n} (q-1)^{\rho n} \ge q^{H_q(\rho)n - o(n)}$$
The first inequality is trivial - the second can be derived via Stirling's approximation. A very simple way to derive the second inequality is to use elementary calculus to show
$$\log (m-1)! \le m \log m - m + 1 \le \log m!$$
which immediately gives
$$\frac{m^m}{e^{m-1}} \le m! \le \frac{m^{m+1}}{e^{m-1}}$$
Now, if $R = 1 - H_q(\rho) + \epsilon$, we see that
$$\frac{\text{Vol}(B_q(y, \rho n)) \cdot \lvert C \rvert}{q^n} \ge q^{nR + nH_q(\rho) - n - o(n)} = q^{\epsilon n - o(n)} \ge q^{\epsilon n / 2}$$
so there must be a $y$ with $B_q(y, \rho n)$ having at least $q^{\epsilon n / 2}$ codewords.
## Proof of Generalized Singleton Bound
We assume by contradiction that
$$ \frac{ \log (\lvert C \rvert )}{\log q} > n - \left\lfloor \frac{(L+1)\rho n}{L} \right\rfloor$$
and show that there's a $y$ such that $B_q(y, \rho n)$ has $L+1$ codewords.
We can safely assume $\rho n$ is an integer, and we'll write
$$a = \left\lfloor \frac{(L+1)\rho n}{L} \right\rfloor = \rho n + \left\lfloor \frac{\rho n}{L} \right\rfloor$$
First, since $\lvert C \rvert > q^{n-a}$, by Pigeonhole we can find two codewords $c_1$ and $c_2$ that match the first $n-a$ coordinates. Also, since we are in a linear code, for each $\alpha \in \mathbb{F}_q$ we have $c_1 + \alpha (c_2 - c_1) \in C$. Note that these codewords also match each other in the first $n-a$ coordinates. Finally, note that we have the condition $q > L$, so with this we can find $L+1$ distinct codewords $c_1, c_2, \cdots, c_{L+1}$ that have equal first $n-a$ coordinates.
Now, we divide the remaining $a$ coordinates to "equal" (as equal as possible) $L+1$ chunks. Then, we define our $y$ to share the first $n-a$ coordinates, and for the remaining $a$ coordinates, $y$ is equal to $c_i$ on the $i$th chunk of the remaining $a$ coordinates.
In this case, the distance from $y$ to each $c_i$ is at most
$$a - \left \lfloor \frac{a}{L+1} \right \rfloor \le \rho n$$
so we found $B_q(y, \rho n) \cap C$ having at least $L+1$ elements, a contradiction.
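The construction is easy to mirror in code. Here is a sketch of mine, assuming we already found the $L+1$ codewords agreeing on the first $n - a$ coordinates:
```python=
def adversarial_word(codewords, a):
    # codewords: L + 1 words agreeing on the first n - a coordinates.
    # y copies that prefix, then takes chunk i of the last a coordinates
    # from codeword i, with chunks as equal as possible.
    m = len(codewords)  # m = L + 1
    n = len(codewords[0])
    bounds = [n - a + (a * i) // m for i in range(m + 1)]
    y = list(codewords[0][: n - a])
    for i in range(m):
        y += list(codewords[i][bounds[i] : bounds[i + 1]])
    return y

# each codeword agrees with y on >= floor(a / (L+1)) of the last a
# coordinates, so Delta(y, c_i) <= a - floor(a / (L+1)) <= rho * n
```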
# [[AGL24]](https://arxiv.org/pdf/2304.09445) - Full Proof
The paper link is https://arxiv.org/pdf/2304.09445.
Also see [this youtube video - the talk on STOC24](https://www.youtube.com/watch?v=qjY23tSPbog), as it's very nice.
This paper continues the work of [[BGM22]](https://arxiv.org/pdf/2206.05256) and [[GZ23]](https://arxiv.org/pdf/2304.01403). These three proofs work on similar ideas, but the analysis and techniques get better. We'll describe the core plan first.
Also, although (AFAIK) the [[AGL24]](https://arxiv.org/pdf/2304.09445) paper introduced the hypergraph perspective of the proof, we will use this idea to explain [BGM22] and [GZ23] to make this note more concise.
This note will be as self-contained as possible, but some graph theory may be omitted.
## Outline #1 - Looking at [BGM22] and [GZ23]
We are dealing with $\text{RS}_{n, k}(\alpha_1, \cdots, \alpha_n)$, so the evaluation points are $\alpha_1, \cdots, \alpha_n$.
Consider $y$ and codewords $c^{(1)}, \cdots, c^{(t)}$. We are to discuss the agreements between $y$ and the codewords, and their properties. To do so, we define an Agreement Hypergraph.
A *hypergraph* $\mathcal{H} = (V, \mathcal{E})$ is like a graph, but with *hyperedges*. The hyperedges are subsets of $V$, and they are labeled. Each hyperedge has a weight $\text{wt}(e) = \max(0, \lvert e \rvert - 1)$. Also, the weight of a hyperedge set is $\text{wt}(\mathcal{E}) = \sum_{e \in \mathcal{E}} \text{wt}(e)$.
Now, a natural definition of an agreement hypergraph is, with
$$V = [t], \quad \mathcal{E} = (e_1, e_2, \cdots, e_n), \quad e_i = \{j \in [t] : c_i^{(j)} = y_i\}$$
in other words, hyperedge $e_i$ connects the codewords whose $i$th coordinate equals $y$'s.
The good thing about the agreement hypergraph is that this perspective allows us to care less about $q$. After all, the most important parts come from whether or not there is an agreement or a disagreement. This symmetry is used in all the relevant works listed above.
Anyways, in this hypergraph, we can define a weak partition connectivity.
This means that for any partition $\mathcal{P}$ of $[t]$, we have
$$\sum_{e \in \mathcal{E}} \max \{ \lvert \mathcal{P}(e) \rvert - 1, 0 \} \ge k (\lvert \mathcal{P} \rvert - 1)$$
where $\lvert \mathcal{P} \rvert$ is the number of sets in the partition, and $\lvert \mathcal{P}(e) \rvert$ is the number of sets the edge $e$ intersects. This property is called $k$-weakly-partition-connectivity.
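Both definitions are small and finite, so they can be checked by brute force. Here is a sketch of mine (0-indexed, exponential in $t$, so toy sizes only):
```python=
from itertools import product

def agreement_hypergraph(y, codewords):
    # e_i = { j : codeword j agrees with y at coordinate i }
    t = len(codewords)
    return [frozenset(j for j in range(t) if codewords[j][i] == y[i])
            for i in range(len(y))]

def is_k_weakly_partition_connected(t, edges, k):
    # enumerate partitions of [t] as labelings (each partition shows up
    # under several labelings, which is fine for a yes/no check)
    for labels in product(range(t), repeat=t):
        parts = len(set(labels))
        if parts == 1:
            continue
        total = sum(max(len({labels[v] for v in e}) - 1, 0) for e in edges)
        if total < k * (parts - 1):
            return False
    return True
```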
Why is this relevant? There are graph theoretic properties associated with them (for example, in normal graphs we have the [Nash-Williams Theorem](https://en.wikipedia.org/wiki/Nash-Williams_theorem) which is quite similar) but for us, the key idea is that it's relevant to a list-decodability counterexample.
**Lemma 2.3 in [AGL24]**
> If $y, c^{(1)}, \cdots, c^{(L+1)}$ are vectors in $\mathbb{F}_q^n$ such that
> $$\sum_{j=1}^{L+1} \Delta(y, c^{(j)}) \le L(n-k)$$
> then for some subset $J \subset [L+1]$ with $\lvert J \rvert \ge 2$, the agreement hypergraph of $y$ and $c^{(j)}$ for $j \in J$ is $k$-weakly-partition-connected.
This is the definition of "average-radius" list decoding - note that the above condition says the average Hamming distance from $y$ to the $L+1$ codewords is at most
$$\frac{L}{L+1} (n - k) = \frac{L}{L+1} n(1-R)$$
In usual list decoding, we would instead require that at most $L$ codewords lie within
$$\frac{L}{L+1} (n-k) = \frac{L}{L+1} n(1-R)$$
distance. If $L+1$ codewords each lay within this radius, their distances would sum to at most $L(n-k)$, so average-radius list decodability implies the usual kind: it is a stronger variant.
The second element in the mix is the so called "Reduced Intersection Matrix".
Basically, we are building a set of linear constraints that each codeword has to satisfy. For example, say that $t=4$ and $e_1 = (1, 2, 4)$. In that case, we know that, for sure,
$$f_1(\alpha_1) = f_2(\alpha_1) = f_4(\alpha_1)$$
must hold. This means that, if we write
$$\left[ \begin{matrix} V_1 & -V_1 & 0 & 0 \\ V_1 & 0 & 0 & -V_1 \end{matrix} \right] \cdot \left[ \begin{matrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{matrix} \right] = 0$$
where we abuse the notation $f_i$ as the vector of their coefficients and
$$V_1 = [1, \alpha_1, \alpha_1^2, \cdots, \alpha_1^{k-1}]$$
This can be generalized further. If $e_2 = (2, 3)$, then we can add
$$\left[ \begin{matrix} V_1 & -V_1 & 0 & 0 \\ V_1 & 0 & 0 & -V_1 \\ 0 & V_2 & -V_2 & 0\end{matrix} \right] \cdot \left[ \begin{matrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{matrix} \right] = 0$$
and so on. So, we can kinda see how this construction transforms an agreement hypergraph into a matrix whose properties we can analyze. However, there is some groundwork to do first. This matrix clearly does not have full column rank - if $f_1 = f_2 = \cdots = f_t$, then the matrix-vector product clearly computes to zero. To avoid this triviality, we can simply remove the last block column, making the number of columns $(t-1) \cdot k$. In this case, if the resulting matrix is derived from an actual agreement hypergraph of codewords, we can see that this matrix still satisfies, for example,
$$\left[ \begin{matrix} V_1 & -V_1 & 0 \\ V_1 & 0 & 0 \\ 0 & V_2 & -V_2 \end{matrix} \right] \cdot \left[ \begin{matrix} f_1 - f_4 \\ f_2 - f_4 \\ f_3 - f_4 \end{matrix} \right] = 0$$
Note that the vector is non-zero, as the codewords are distinct. Therefore, we get the more non-trivial fact that this new matrix does not have full column rank.
Finally, we want to utilize the fact that $\alpha_1, \cdots, \alpha_n$ are random. Indeed, we want to show that **no** $k$-weakly-partition-connected hypergraph arises as the agreement hypergraph of actual codewords. To do so, we could show that the matrix derived above, when generated from a $k$-weakly-partition-connected hypergraph, has full column rank. If it was generated from a codeword agreement hypergraph, we already know the resulting matrix will **not** have full column rank. Therefore, our goal is to show that $k$-weakly-partition-connected hypergraphs lead to full-column-rank matrices.
Now, given that we want to use the randomness of $\alpha_1, \cdots, \alpha_n$, the next idea is clear. We replace $\alpha_1, \cdots, \alpha_n$ with formal variables, and then claim that the resulting matrix has full rank. So, we now replace the $V_1, V_2$ from before with
$$V_1 = [1, X_1, \cdots, X_1^{k-1}], \quad V_2 = [1, X_2, \cdots, X_2^{k-1}]$$
and show that the matrices have full column rank in terms of formal variables. In this case, for the matrix that arises from substituting $\alpha_1, \cdots, \alpha_n$ into $X_1, \cdots, X_n$ to not have full column rank, these $\alpha_1, \cdots, \alpha_n$ must be roots of a non-zero multivariate polynomial (the determinant of some submatrix), so Schwartz-Zippel can handle everything for us.
Of course, this plan turns out to work.
**Theorem 2.11 in [AGL24]**
> If $\mathcal{H}$ is a $k$-weakly-partition-connected hypergraph with $n$ hyperedges and at least $2$ vertices, then $\text{RIM}_{\mathcal{H}}$ has full column rank over the field $\mathbb{F}_q(X_1, \cdots, X_n)$.
Here, $\mathbb{F}_q(X_1, \cdots, X_n)$ is the field of rational functions in $X_1, \cdots, X_n$ over $\mathbb{F}_q$.
So, that's the Reduced Intersection Matrix. To reiterate, the RIM for $t=4$ and $e_1 = (1, 2, 4)$ and $e_2 = (2, 3)$ is (whether or not it comes from an agreement) the matrix
$$\left[ \begin{matrix} V_1 & -V_1 & 0 \\ V_1 & 0 & 0 \\ 0 & V_2 & -V_2 \end{matrix} \right]$$
where
$$V_1 = [1, X_1, \cdots, X_1^{k-1}], \quad V_2 = [1, X_2, \cdots, X_2^{k-1}]$$
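Here is how I would write the RIM down symbolically, following the $t = 4$ example above (I use the star pattern from the smallest vertex of each hyperedge; the paper's exact row convention may differ in inessential details):
```python=
import sympy as sp

def reduced_intersection_matrix(t, k, edges):
    # one row per (min vertex of e, other vertex of e) pair; the t-th
    # block column is dropped, which pins f_t = 0
    X = sp.symbols(f"X1:{len(edges) + 1}")
    rows = []
    for i, e in enumerate(edges):
        V = [X[i] ** m for m in range(k)]
        verts = sorted(e)
        for j in verts[1:]:
            row = [0] * ((t - 1) * k)
            row[verts[0] * k : (verts[0] + 1) * k] = V
            if j < t - 1:
                row[j * k : (j + 1) * k] = [-v for v in V]
            rows.append(row)
    return sp.Matrix(rows)

# t = 4, e_1 = {1, 2, 4}, e_2 = {2, 3}, written 0-indexed:
M = reduced_intersection_matrix(4, 3, [{0, 1, 3}, {1, 2}])
print(M.shape, M.rank())  # (3, 9), rank 3 over F_q(X1, X2)
```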
At this point, the results in [BGM22] can be recovered. If there was a counterexample to
$$\left(\frac{L}{L+1}(1-R), L\right)$$
average-list-decodability, then there would be a $k$-weakly-partition-connected hypergraph, arising from agreement, with at most $L+1$ vertices. We would have a RIM that has full column rank over $\mathbb{F}_q(X_1, \cdots, X_n)$ but somehow not full column rank once $\alpha_1, \cdots, \alpha_n$ are substituted in. This means that we would have a submatrix of size at most $Lk \times Lk$ whose determinant is a non-zero polynomial, yet evaluates to zero when $\alpha_1, \cdots, \alpha_n$ are substituted in. This happens with probability at most $\text{poly}(L, k)/q$ via Schwartz-Zippel. Since the number of possible hypergraphs is at most $\sum_{t=2}^{L+1} 2^{tn} \le 2^{(L+2)n}$, by the union bound we can see that with
$$q \gg \text{poly}(L, k) \cdot 2^{(L+2)n}$$
this can only happen with a very small probability. Note the exponential dependency on $n$.
To go further, with the argument of [GZ23], we need to utilize that we are interested in
$$\left( \frac{L}{L+1}(1-R-\epsilon), L \right)$$
average-list-decoding. In other words, we have an $\epsilon$ gap that we can utilize.
Here, the idea is that in the counterexample case, we can get away with removing $\epsilon n$ edges from our hypergraph and *still* end up with a full column rank RIM. Indeed, by Lemma 2.3, in the counterexample we will end up with a $(k+\epsilon n)$-weakly-partition-connected hypergraph $\mathcal{H}$. If we remove any $\epsilon n$ edges and end up with $\mathcal{H}' = ([t], \mathcal{E}')$, then for any partition $\mathcal{P}$ of $[t]$ we will have the following inequality.
$$\begin{aligned}\sum_{e \in \mathcal{E}'} \max \{ \lvert \mathcal{P}(e) \rvert - 1, 0 \} & \ge \sum_{e \in \mathcal{E}} \max \{ \lvert \mathcal{P}(e) \rvert - 1, 0\} - \sum_{e \in \mathcal{E} \setminus \mathcal{E}'} \max \{ \lvert \mathcal{P}(e) \rvert - 1, 0 \} \\
& \ge (k + \epsilon n) (\lvert \mathcal{P} \rvert - 1) - \lvert \mathcal{E} \setminus \mathcal{E}' \rvert \cdot (\lvert \mathcal{P} \rvert - 1) \ge k \cdot (\lvert \mathcal{P} \rvert - 1) \end{aligned}$$
which shows that $\mathcal{H}'$ is indeed $k$-weakly-partition-connected. This is **Lemma 2.14**.
Intuitively, this means that our RIM is "so" full-column-rank in $\mathbb{F}_q(X_1, \cdots, X_n)$, yet the evaluated version isn't. Therefore, we can do our Schwartz-Zippel across multiple $Lk \times Lk$ submatrices rather than just one, which makes our bound on $q$ better, i.e. quadratic in $n$.
## Outline #2 - Main Idea of [GZ23]
We continue with the $(k+\epsilon n)$-weakly-partition-connected hypergraph notion.
**Lemma 3.1 in [AGL24]**
> If $\mathcal{H} = ([t], (e_1, \cdots, e_n))$ is $(k+\epsilon n)$-weakly-partition-connected with $t \ge 2$, then with $r = \lfloor \epsilon n / 2 \rfloor$ and randomly generated distinct $\alpha_1, \cdots, \alpha_n$, we have
> $$\text{Pr}[ \text{RIM}_{\mathcal{H}}(\alpha_1, \cdots, \alpha_n) \text{ doesn't have full column rank}] \le \binom{n}{r} 2^{tr} \cdot \left( \frac{(t-1)k}{q - n} \right)^{r}$$
First, let's quickly show why Lemma 3.1 finishes the problem. If there were $y$ and distinct codewords $c^{(1)}, c^{(2)}, \cdots, c^{(L+1)}$ such that the average-list-decoding fails, i.e.
$$\sum_{i=1}^{L+1} \Delta(y, c^{(i)}) \le L(n-(k+\epsilon n))$$
then we know this results in an agreement hypergraph (on some subset of the codewords, via Lemma 2.3) being $(k+\epsilon n)$-weakly-partition-connected. Also, since this comes from an agreement hypergraph, the resulting RIM is not a full-column-rank matrix once evaluated. So, by the union bound over the $2^{(L+2)n}$ possible hypergraphs, the probability, over $\alpha_1, \cdots, \alpha_n$, of $\text{RS}_{n,k}(\alpha_1, \cdots, \alpha_n)$ being not
$$\left( \frac{L}{L+1}(1-R-\epsilon), L \right)$$
average-list-decodable will be at most
$$2^{(L+2)n} \cdot \binom{n}{r} 2^{(L+1)r} \cdot \left( \frac{Lk}{q - n} \right)^{r}$$
With $q = n + k \cdot 2^{10L/\epsilon}$, we can show this is at most $2^{-Ln}$ with routine computation.
What does the bound in Lemma 3.1,
$$\binom{n}{r} 2^{tr} \cdot \left( \frac{(t-1)k}{q-n} \right)^r$$
mean? Well, it turns out this is another union bound. A matrix that is not of full column rank will have a *certificate* proving this property, like a singular submatrix. The proof in [AGL24] shows
- **Lemma 3.8**: if the RIM is not full column rank, it'll have a certificate.
- **Corollary 3.10**: the number of possible certificates is small, at most
$$\binom{n}{r} 2^{tr}$$
- **Corollary 3.12**: over random $\alpha_1, \cdots, \alpha_n$, the probability of a given certificate is at most
$$\left( \frac{(t-1)k}{q-n} \right)^r$$
From these three subproblems, Lemma 3.1 clearly follows via the union bound. We note here that the improvement of [AGL24] is from the certificate definition - in [GZ23], the maximum number of certificates was $n^r$, which is larger than the bound in Corollary 3.10.
Let's look at the $n^r$ certificate version of [GZ23] first.
So what *are* the certificates anyway? Well, yes, they are the submatrices that are singular when evaluated on $\alpha_1, \cdots, \alpha_n$, but that is not quite sufficient. To have a result like Corollary 3.12, we want the singularity to happen over "somewhat independent chunks". Let me explain what I mean. If we had $r$ singular $(t-1)k \times (t-1)k$ submatrices, we kinda want to argue that the probability of this happening is something like
$$\left( \frac{(t-1)k}{q} \right)^r$$
which is exactly what's happening in Corollary 3.12. But this is certainly not possible immediately, as the Schwartz-Zippel over the $r$ submatrices are not independent.
To handle this, we need a new idea - a "point" at which the $\text{RIM}_{\mathcal{H}}$ (or rather, one of its submatrices) becomes singular. Define the "intermediate" substitutions
$$\text{RIM}_{\mathcal{H}} (X_{\le i} = \alpha_{\le i})$$
as the $\text{RIM}_{\mathcal{H}}$ with $X_1, \cdots, X_i$ substituted with $\alpha_1, \cdots, \alpha_i$. Define the lexicographical order on all the $(t-1)k \times (t-1)k$ submatrices based on the row index tuple. Now, we define the "faulty index" of a full-column-rank $A(X_1, \cdots, X_n)$ as the minimum $i$ such that
$$\det(M(X_{\le i} = \alpha_{\le i})) = 0$$
where $M$ is the lexicographically smallest nonsingular submatrix of $A(X_1, \cdots, X_n)$. In other words, we take the "first" nonsingular submatrix $M$ of the original formal-variable matrix that witnesses full column rank. Then, we incrementally substitute $X_i \leftarrow \alpha_i$; the point when $M$ becomes singular is the "faulty index". Note that if $A(\alpha_1, \cdots, \alpha_n)$ is not full-column-rank, such a "faulty index" must exist.
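In code, the faulty index is a straightforward loop. A sketch of mine, where `X` is the ordered variable list and `detM` a symbolic determinant (e.g. from the sympy RIM sketch earlier):
```python=
import sympy as sp

def faulty_index(detM, X, alphas):
    # smallest i with det(M)(X_1..X_i = alpha_1..alpha_i) identically zero,
    # or None if the fully evaluated determinant stays nonzero
    d = detM
    for i, (x, a) in enumerate(zip(X, alphas), start=1):
        d = sp.expand(d.subs(x, a))
        if d == 0:
            return i
    return None
```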
Now, the idea to bring in faulty index is intuitively connected to the whole "independent Schwartz-Zippel" debacle - if two submatrices $A, B$ had different faulty indices, say $i_A < i_B$, then you can Schwartz-Zippel once for $\alpha_{i_A}$ and once for $\alpha_{i_B}$ separately.
To do so, we utilize the previously mentioned row deletion trick. Define $\text{RIM}_{\mathcal{H}}^B$ as the reduced intersection matrix, but with all the rows relevant to edges in $B$ deleted.
We run the following to get the certificate $(i_1, i_2, \cdots, i_r)$.
```python=
B = []
for i in [1, 2, ..., r]:
    idx = faulty index of RIM_H^B
    assert idx != None
    B.append(idx)
return B
```
Why does this work? Well, first of all, since we are working with $(k+\epsilon n)$-weakly-partition-connected hypergraphs, we know that deleting $r < \epsilon n$ hyper-edges still gives a $k$-weakly-partition-connected hypergraph. Therefore, the RIM would still have full column rank under all formal variables. Also, as we know that the matrix with $\alpha_1, \cdots, \alpha_n$ substituted will not have full column rank, the faulty index is well-defined.
Note that these faulty indices must be different, as after selecting `idx` as the faulty index, the hyper-edge `idx` itself gets removed, which means the variable itself is now gone. Since the variable is gone, it cannot be the "point" a submatrix becomes singular.
Also, note that these $r$ submatrices chosen as the lexicographically smallest must be different, as a submatrix with faulty index `idx` must have that variable somewhere in the submatrix, but that variable is deleted in the matrix afterwards.
The finisher is **Lemma 3.11** of [AGL24], which also directly proves **Corollary 3.12** later.
**Lemma 3.11** in [AGL24] (Also **Lemma 4.5** in [GZ23])
> Let $i_1, \cdots, i_r$ be distinct indices and $M_1, \cdots, M_r$ be $(t-1)k \times (t-1)k$ submatrices of $\text{RIM}_{\mathcal{H}}$. If distinct evaluation points $\alpha_1, \cdots, \alpha_n$ are randomly chosen, the probability that all the events below hold for all $j=1, 2, \cdots, r$ is at most
> $$\left( \frac{(t-1)k}{q-n} \right)^r$$
> - $E_j$: event that $M_j(X_{\le i} = \alpha_{\le i})$ is nonsingular for $i < i_j$
> - $F_j$: event that $M_j(X_{\le i_j} = \alpha_{\le i_j})$ is singular
Note that there's no need to do a union bound over $M_1, \cdots, M_r$ - they are defined in the certificate generation algorithm, as $M_l$ is the lexicographically smallest nonsingular $(t-1)k \times (t-1)k$ submatrix of $\text{RIM}_{\mathcal{H}}^B$ for $B = [i_1, \cdots, i_{l-1}]$ for each $1 \le l \le r$.
As the events depend solely on $\alpha_1, \cdots, \alpha_n$, we can just sort all the indices. (Note that $i_1 < i_2 < \cdots < i_r$ is **not** necessarily true for the generated certificate, but assuming it is fine for proving Lemma 3.11.)
To bound this, we simply need to bound
$$\text{Pr}[F_j | E_1 \wedge F_1 \wedge \cdots \wedge E_{j-1} \wedge F_{j-1} \wedge E_j] \le \frac{(t-1)k}{q-n}$$
as the final probability will be at most the product of these conditional probabilities.
Note that $E_1 \wedge F_1 \wedge \cdots \wedge E_{j-1} \wedge F_{j-1} \wedge E_j$ is solely dependent on $\alpha_1, \cdots, \alpha_{i_j - 1}$, and $F_j$ is dependent on $\alpha_1, \cdots, \alpha_{i_j}$. Therefore, we can just prove that for each $\alpha_1, \cdots, \alpha_{i_j - 1}$ that makes $E_1 \wedge F_1 \wedge \cdots \wedge E_{j-1} \wedge F_{j-1} \wedge E_j$ true, we would have
$$\text{Pr}[F_j | \alpha_1, \cdots, \alpha_{i_j - 1}] \le \frac{(t-1)k}{q-n}$$
This is not too hard - $M_j$ on $\alpha_1, \cdots, \alpha_{i_j - 1}$ is not already singular as $E_j$ is true, and $\det(M_j)$ has $X_{i_j}$-degree at most $(t-1)k$, as at most $t-1$ rows contain $X_{i_j}$ and each such row contributes degree at most $k-1$. Now, as $\alpha_{i_j}$ is randomly selected from $q - i_j + 1 \ge q - n$ elements, working over the field $\mathbb{F}_q(X_{i_j+1}, \cdots, X_n)$ we see
$$\text{Pr}[F_j | \alpha_1, \cdots, \alpha_{i_j - 1}] \le \frac{(t-1)k}{q-n}$$
by Schwartz-Zippel. This concludes the proof of **Lemma 3.11**.
Now the proof in [GZ23] is done, with $n^r$ possible certificates. This gives us the bound
$$2^{(L+2)n} \cdot n^r \cdot \left( \frac{Lk}{q-n} \right)^r$$
from which we can see that $q$ needs to be at least $\Omega(nk)$ for the bound to go through. Indeed, this quadratic bound is what [GZ23] achieves, giving the polynomial bound on $q$.
## Outline #3 - Certificates in [AGL24]
This requires the additional idea that there are only $2^t$ possible "types" of hyper-edges, one for each subset of $[t]$. This symmetry will be utilized in [AGL24], as follows.
Let's take some $M_1$, the lexicographically smallest nonsingular $(t-1)k \times (t-1)k$ submatrix. Let's say that this becomes singular at substitution $X_{i_1} \leftarrow \alpha_{i_1}$. What we now do is replace all instances of $X_{i_1}$ with $X_{i_1'}$, where the hyperedge $i_1' > i_1$ has the same type as $i_1$. Note that for whatever rows of the matrix that contain $X_{i_1}$, there are analogous rows with $X_{i_1'}$ instead, as the two hyper-edges are of the same type. Let $M_2$ be the matrix derived by changing $X_{i_1}$ to $X_{i_1'}$ in $M_1$. The glorious thing is that $M_2$ is in fact a submatrix of the original matrix. Also, $M_2$ will become singular at some point $i_2 > i_1$, as $M_2$ will not become singular at $X_{i_1} \leftarrow \alpha_{i_1}$. If there's a hyperedge $i_2' > i_2$ with the same type, we can do the same thing, replacing $X_{i_2}$ with $X_{i_2'}$. This can continue until we run out of types, at which point we remove all indices used so far and restart the process. The nice thing about this is that the **faulty indices increase until we pull the restart trigger**.
Let's have a first attempt at an algorithm.
```python=
def faulty_index(M):
    return minimum `i` such that M(X_..=i = \alpha_..=i) is singular

# let's just use 1-indexing
M_list = [""]
l_list = [""]
B = set([1, 2, ..., n])
M_1 = lexicographically smallest (t-1)k x (t-1)k nonsingular submatrix of RIM_H
run_start = 1
M_list.append(M_1)
l_list.append(faulty_index(M_1))
for idx in [2, 3, ..., r]:
    tau = type(l_list[idx - 1])  # get the type of the last faulty index
    s = number of values in l_list[run_start..idx-1] with type tau
    next_var = s-th smallest element in B with type tau
    if next_var != None:
        M_idx = swap_var(M_list[idx - 1], l_list[idx - 1], next_var)
        # swap X_{l_list[idx - 1]} to X_{next_var}
        M_list.append(M_idx)
        l_list.append(faulty_index(M_idx))
    else:
        # we ran out of variables with the same type - "refresh"
        B = set([1, 2, ..., n]) \ set(l_list)
        M_idx = lexicographically smallest (t-1)k x (t-1)k nonsingular submatrix of RIM_H^{set(l_list)}
        M_list.append(M_idx)
        l_list.append(faulty_index(M_idx))
        run_start = idx
return M_list, l_list
```
You can already try to prove a lot of good properties. We'll state these a bit informally, as our presentation is already a bit different from the original [AGL24] paper. I'll link our observations with the Lemmas in the [AGL24] paper which have more formal treatment.
> Spoiler, some proofs below are wrong, but we'll fix them and the algorithm soon.
For example, **Lemma 3.2, 3.3** state that all the `M_idx` and `faulty index` are well-defined on refreshes. This can be easily derived as we did on [GZ23] - the number of variable deletions is at most $r < \epsilon n$, so our hypergraph is still $k$-weakly-partition-connected, which gives us a full-column-rank matrix over the formal variables. Therefore, in the case where the RIM doesn't have full-column-rank under evaluation $\alpha_1, \cdots, \alpha_n$, the algorithm runs.
Another good property in **Lemma 3.5, 3.6, 3.9** is that the faulty indices in `l_list` are all different, and that in a single "run", the faulty indices are increasing. Of course, the way we choose the variables in a single run forces us to choose different variables each time. Also, by swapping a faulty index variable to a larger index variable with the same type, we will be increasing the faulty index of the said matrix. Of course, when we do a refresh, the indices gathered so far will be removed from the matrix. This means that the matrices from the new runs will not be able to have the same faulty indices, as they were already removed. Note that this also means `M_list` are all pairwise distinct as well.
Finally, in **Lemma 3.7, 3.8** we show that the matrices in `M_list` are all valid submatrices of the original $\text{RIM}_{\mathcal{H}}$. Of course, since we work in the case where $\text{RIM}_{\mathcal{H}}$ has full column rank over the formal variables yet not over the substitutions $\alpha_1, \cdots, \alpha_n$, this also shows that each matrix in `M_list` has a valid faulty index. As mentioned before, because the types are equal, they have the same type of rows. Therefore, a replacement of variables will not change whether it is a submatrix, unless...
Unless $M_{l-1}$ already had an instance of $X_{i_{l-1}'}$ in it. In this case, if we change $X_{i_{l-1}}$ to $X_{i_{l-1}'}$, we would have multiple instances of $X_{i_{l-1}'}$ in the new matrix $M_l$, threatening the submatrix property of $M_l$. This is our first major problem. Because the variables selected during the run are all different, all we need is that the matrix selected on refresh doesn't have $X_{i_{l-1}'}$ in it. This means that we actually have to remove this variable before selecting the lexicographically smallest nonsingular submatrix. Let's rewrite the algorithm.
```python=
def faulty_index(M):
    return minimum `i` such that M(X_..=i = \alpha_..=i) is singular

# let's just use 1-indexing
M_list = [""]
l_list = [""]
for idx in [1, 2, 3, ..., r]:
    if idx >= 2:
        tau = type(l_list[idx - 1])  # get the type of the last faulty index
        s = number of values in l_list[run_start..idx-1] with type tau
        next_var = s-th smallest element in B with type tau
    if idx >= 2 and next_var != None:
        M_idx = swap_var(M_list[idx - 1], l_list[idx - 1], next_var)
        # swap X_{l_list[idx - 1]} to X_{next_var}
        M_list.append(M_idx)
        l_list.append(faulty_index(M_idx))
    else:
        # immediately start here on idx == 1
        # we ran out of variables with the same type - "refresh"
        B = set()
        for tau in range(1 << t):
            B = B.union([largest floor(r / (1 << t)) indices of type tau in [1, 2, ..., n] \ set(l_list)])
        M_idx = lexicographically smallest (t-1)k x (t-1)k nonsingular submatrix of RIM_H^{B.union(set(l_list))}
        M_list.append(M_idx)
        l_list.append(faulty_index(M_idx))
        run_start = idx
return M_list, l_list
```
OK, so now things make more sense.
**Lemma 3.2, 3.3** are still fine, as the set $B$ has at most $r$ elements, and the set `l_list` also has at most $r$ elements. Therefore, the variable deletions are at most $2r \le \epsilon n$, which still preserves $k$-weakly-partition-connectivity. The logic from before still stands.
**Lemma 3.5, 3.6, 3.9**'s main ideas are still fine - note that in the flawed proof above, we didn't account for the fact that if we had two identical rows while swapping variables for `M_idx`, the resulting matrix wouldn't even have full column rank over the formal variables, so the faulty index couldn't be defined. Thankfully, with the patch, the idea works. Indeed, on refresh, the `M_idx` will not have any variables in $B$, as we removed them before taking the lexicographically smallest nonsingular submatrix. The fact and logic that the selected variables are different within a single run stands. Therefore, we don't have to worry about swapping variables leading to two formally equal rows - so the faulty indices can be appropriately defined. The logic that the faulty indices increase within a single run stands, and the logic that `l_list` is pairwise different stands. This also resolves **Lemma 3.7, 3.8**: these `M_idx`'s are all submatrices of the original $\text{RIM}_\mathcal{H}$, and there must be a valid certificate in the case where $\text{RIM}_{\mathcal{H}}(\alpha_1, \cdots, \alpha_n)$ is not full-column-rank.
The certificate here is the `l_list`. Due to the pairwise-distinct property, we can finish with **Lemma 3.11** we used for [GZ23] once we bound the number of possible certificates.
The main idea is that the number of distinct runs (or refreshes) is at most $2^t$. Since each run must be increasing, the number of possible certificates then becomes at most
$$\binom{n}{r} 2^{tr}$$
by selecting $r$ index values then assigning which run the index should go to.
This is derived from the fact that each run has length at least $r/2^t$. Let's say a type $T$ doesn't have the full $\lfloor r/2^t \rfloor$ elements in $B$. That means that in `[1, 2, ... n] \ set(l_list)`, there are fewer than $\lfloor r/2^t \rfloor$ edges with type $T$, and they are all added to $B$. Therefore, upon deletion of $B$ and `set(l_list)`, no rows of type $T$ remain. This means that the `M_idx` on the refresh will have no rows of type $T$, and as `M_idx` over a single run keeps the same set of row types, no matrix in this run will have a row of type $T$, and no faulty index of type $T$ will appear during this run either. This gives us the conclusion that **all faulty indices in this run come from types with the full $\lfloor r/2^t \rfloor$ elements in $B$**.
Since a run continues until we run out of a type, a single run consists of one refresh and then at least $\lfloor r/2^t \rfloor$ swap steps with $B$, so its length is at least $1 + \lfloor r/2^t \rfloor > r/2^t$.
This means that we would have at most $2^t$ runs, which gives us the desired result.
# [[AGL24]](https://arxiv.org/pdf/2304.09445) - Delegated Proofs
## Lemma 2.3 & Lemma 2.4 in [AGL24]
Connects *average-radius-list-decoding* and *$k$-weakly-partition-connectivity*.
**Lemma 2.3 in [AGL24]**
> If $y, c^{(1)}, \cdots, c^{(L+1)}$ are vectors in $\mathbb{F}_q^n$ such that
> $$\sum_{j=1}^{L+1} \Delta(y, c^{(j)}) \le L(n-k)$$
> then for some subset $J \subset [L+1]$ with $\lvert J \rvert \ge 2$, the agreement hypergraph of $y$ and $c^{(j)}$ for $j \in J$ is $k$-weakly-partition-connected.
The total edge weight in the agreement hypergraph is
$$\sum_{e} \text{wt}(e) \ge -n + \sum_{e} \lvert e \rvert \ge -n + \sum_{j=1}^{L+1} (n - \Delta(y, c^{(j)})) \ge Lk$$
from our assumption. Therefore, we prove the general statement in
**Lemma 2.4 in [AGL24]**
> Let $\mathcal{H} = (V, \mathcal{E})$ be a hypergraph with at least two vertices, and total edge weight
> $$\sum_{e \in \mathcal{E}} \text{wt}(e) \ge k \cdot (\lvert V \rvert - 1)$$
> for some positive integer $k$. Then, there exists a subset $V' \subset V$ such that the induced hypergraph $\mathcal{H}' = (V', \{V' \cap e : e \in \mathcal{E}\})$ is $k$-weakly-partition-connected.
To do so, we select an inclusion-minimal subset $V' \subset V$ such that $\lvert V' \rvert \ge 2$ and
$$\sum_{e \in \mathcal{E}} \text{wt}(e \cap V') \ge k \cdot (\lvert V' \rvert -1)$$
since $V$ itself satisfies this, such a $V'$ exists. We claim that this $V'$ satisfies our needs. Due to inclusion-minimality, for all $V'' \subsetneq V'$ we will have
$$\sum_{e \in \mathcal{E}} \text{wt}(e \cap V'') \le k \cdot (\lvert V'' \rvert -1)$$
Now, for any partition $\mathcal{P} = P_1 \cup \cdots \cup P_p$ of $V'$, we compute
$$\begin{aligned}\sum_{e \in \mathcal{E}'} \max \{ \lvert \mathcal{P}(e) \rvert - 1, 0 \} &= \sum_{e \in \mathcal{E}'} \text{wt}(e) - \sum_{l=1}^p \sum_{e \in \mathcal{E}'} \text{wt}(e \cap P_l) \\ & \ge k \cdot (\lvert V' \rvert - 1) - \sum_{l=1}^{p} k \cdot (\lvert P_l \rvert - 1) = k(\lvert \mathcal{P} \rvert -1)\end{aligned}$$
as $\sum_{l=1}^p \lvert P_l \rvert = \lvert V' \rvert$ and $\lvert \mathcal{P} \rvert = p$. The first equality can be checked for each $e$ separately and added up. As this is our definition of $k$-weakly-partition-connectivity, the proof is complete.
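For toy hypergraphs, the inclusion-minimal $V'$ can be found by brute force; here is a sketch of mine, using the same edge representation (sets of vertices) as the earlier snippet:
```python=
from itertools import combinations

def find_heavy_subset(V, edges, k):
    # smallest V' (hence inclusion-minimal) with
    # sum_e wt(e ∩ V') >= k * (|V'| - 1); Lemma 2.4 says the induced
    # hypergraph on V' is k-weakly-partition-connected
    for size in range(2, len(V) + 1):
        for Vp in combinations(sorted(V), size):
            s = set(Vp)
            if sum(max(len(e & s) - 1, 0) for e in edges) >= k * (size - 1):
                return s
    return None
```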
## Theorem 2.11 in [AGL24]
Connects *$k$-weakly-partition-connectivity* and *column rank of random intersection matrix*.
**Theorem 2.11 in [AGL24]**
> If $\mathcal{H}$ is a $k$-weakly-partition-connected hypergraph with $n$ hyperedges and at least $2$ vertices, then $\text{RIM}_{\mathcal{H}}$ has full column rank over the field $\mathbb{F}_q(X_1, \cdots, X_n)$.
We delegate the proof to [AGL24]'s Appendix, mostly due to the fact that I haven't studied some ingredients of the proof and I don't have much intuition on this.
For completeness I'll briefly go over the proof idea here.
You need three ingredients for this, and we will skip their proofs.
- *$k$-weakly-partition-connectivity* to *$k$-edge-disjoint paths* from root via graph theory
- *$k$-edge-disjoint paths* to *Generic Zero Pattern* also via graph theory
- *Generic Zero Pattern* to *Reed-Solomon Generators* via **GM-MDS Theorem**
### Part 1 - Graph Theory (Theorem A.3 in [AGL24])
Theorem A.3 in [AGL24] states
> A hypergraph $\mathcal{H}$ is $k$-weakly-partition-connected if and only if it has an orientation such that, for some root vertex $v$, every other vertex $u$ has $k$ edge-disjoint paths to $v$.
Here, an orientation means that in each hyperedge, we choose one vertex as the head and consider all others as tails. We give a direction from tails to heads.
This is really graph-theory - and similar properties exist for graphs (not hypergraphs).
### Part 2 - Generic Zero Pattern (Corollary A.4 in [AGL24])
For $k \le n$, an $(n, n-k)$-generic-zero-pattern is a collection of sets $S_1, \cdots, S_{n-k}$ such that
$$\left\lvert \bigcap_{l \in K} S_l \right\rvert \le n -k - \lvert K \rvert$$
for all $K \subset \{1, 2, \cdots, n-k\}$. The main intuition for these patterns comes from the zeros of MDS matrices. If an $(n-k) \times n$ matrix $M$ is to have all $(n-k) \times (n-k)$ submatrices nonsingular, then the zero indices of its $n-k$ rows must satisfy the generic zero pattern. Indeed, if two rows had $n-k-1$ zeroes in common, we could create a singular $(n-k) \times (n-k)$ submatrix from them, and so on. This idea will continue in Part 3.
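The definition is directly checkable by brute force; a small sketch of mine:
```python=
from itertools import combinations

def is_generic_zero_pattern(S, n, k):
    # S: the n - k sets; check |∩_{l in K} S_l| <= n - k - |K| for all K
    m = len(S)  # m = n - k
    for size in range(1, m + 1):
        for K in combinations(range(m), size):
            inter = set.intersection(*(set(S[l]) for l in K))
            if len(inter) > n - k - size:
                return False
    return True
```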
Anyways, the idea is that with our hypergraph $\mathcal{H}$ and the orientation from Part 1, we can create an $(n, n-k)$-generic-zero-pattern with sets of the form
$$S_j = \{i \in [n] : j \notin e_i \}$$
which is basically like $\Delta(c^{(j)}, y)$ from previous discussions.
Parts 1 and 2 can be combined by using arguments in [BGM22]'s Lemma 2.8, which uses some generalizations of Hall's marriage theorem. It seems that the theory of generic zero patterns is closely related to Hall's theorem, and that fact is indeed quite intuitive.
### Part 3 - Reed-Solomon Generators (Theorem A.2 in [AGL24])
We go back to the Generic Zero Pattern. We already know that the zeroes of MDS matrices satisfy the generic zero pattern. One can ask the converse: whether there exists an MDS matrix with a given set of zeroes, if that set satisfies the generic zero pattern.
If the underlying $q$ is very large, you can prove this by allocating formal variables to the remaining entries, then running Schwartz-Zippel on all possible submatrices. This would give the result that such an MDS matrix exists for $q \gg \text{exp}(n)$. However, better bounds exist, and they have been proved in [[Lov18]](https://arxiv.org/pdf/1803.02523) & [[YH19]](https://arxiv.org/pdf/1803.03752) as the GM-MDS Theorem.
Note that the dual of the Reed-Solomon code is a generalized Reed-Solomon code.
Indeed, there exist $\beta_1, \cdots, \beta_n$ such that the matrix
$$H = \left[ \begin{matrix} \beta_1 & \beta_2 & \cdots & \beta_n \\ \beta_1 \alpha_1 & \beta_2 \alpha_2 & \cdots & \beta_n \alpha_n \\ \cdots & \cdots & \cdots & \cdots \\ \beta_1 \cdot \alpha_1^{n-k-1} & \beta_2 \cdot \alpha_2^{n-k-1} & \cdots & \beta_n \cdot \alpha_n^{n-k-1} \end{matrix} \right]$$
satisfies $Hc = 0^{n-k}$ if and only if $c \in \text{RS}_{n,k}(\alpha_1, \cdots, \alpha_n)$. Indeed, we select
$$\beta_i = \frac{1}{\prod_{j \neq i} (\alpha_i - \alpha_j)}$$
We basically need to show that for any $f, g$ with $\deg f < k$ and $\deg g < n-k$, we have
$$\sum_{i=1}^n \frac{f(\alpha_i) g(\alpha_i)}{\prod_{j \neq i} (\alpha_i - \alpha_j)} = 0$$
This is a classic - since $fg$ is degree at most $n - 2$, we note that
$$f(x) g(x) = \sum_{i=1}^n f(\alpha_i) g(\alpha_i) \cdot \left( \prod_{j \neq i} \frac{x - \alpha_j}{\alpha_i - \alpha_j} \right)$$
via interpolation on $\alpha_1, \cdots, \alpha_n$. Comparing the coefficient of $x^{n-1}$ on both sides finishes the proof.
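This identity is fun to verify numerically. A quick sketch of mine over a prime field (10007 is prime), with random $f, g$:
```python=
import random

p, n, k = 10007, 8, 3
alphas = random.sample(range(p), n)
f = [random.randrange(p) for _ in range(k)]      # deg f < k
g = [random.randrange(p) for _ in range(n - k)]  # deg g < n - k
ev = lambda c, x: sum(ci * pow(x, i, p) for i, ci in enumerate(c)) % p

total = 0
for i, a in enumerate(alphas):
    denom = 1
    for j, b in enumerate(alphas):
        if j != i:
            denom = denom * (a - b) % p
    total = (total + ev(f, a) * ev(g, a) * pow(denom, -1, p)) % p
print(total)  # always 0
```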
Now, we look at generators of the Reed-Solomon code which achieve a given generic zero pattern. This is another variant of the GM-MDS Theorem.
**Theorem A.2 in [AGL24]** (proved in [DSY14], [Lov18] and [YH19])
> If $q \ge 2n-k-1$ and $S_1, \cdots, S_{n-k}$ form an $(n,n-k)$-generic-zero-pattern, there exist pairwise distinct evaluation points $\alpha_1, \cdots, \alpha_n \in \mathbb{F}_q$ and an invertible matrix $M \in \mathbb{F}_q^{(n-k) \times (n-k)}$ such that if $H$ is the parity-check matrix for $\text{RS}_{n,k}(\alpha_1, \cdots, \alpha_n)$, then $MH$ achieves the zero pattern $S_1, \cdots, S_{n-k}$, i.e.
> $$(MH)_{i, l} = 0, \quad \forall l \in S_i$$
### Main Proof of Theorem 2.11
From Parts 1, 2, 3, we have $\alpha_1, \cdots, \alpha_n \in \mathbb{F}_q$ and invertible matrix $M$ such that
$$(MH)_{i, l} = 0, \quad \forall l \in S_i$$
where all $S_i$'s are of the form
$$\{i \in [n]: j \notin e_i\} = \Delta(c^{(j)}, y)$$
Ultimately, the nice thing about having this type of zero pattern is
$$\begin{aligned}(MH(y - c^{(j)}))_i &= \sum_{l=1}^n (MH)_{i, l} \cdot (y - c^{(j)})_l \\ &= \sum_{l \in S_i} (MH)_{i, l} \cdot (y - c^{(j)})_l + \sum_{l \notin S_i} (MH)_{i, l} \cdot (y - c^{(j)})_l \\ &= \sum_{l \in S_i} 0 \cdot (y - c^{(j)})_l + \sum_{l \notin S_i} (MH)_{i, l} \cdot 0 = 0 \end{aligned}$$
as $(MH)_{i, l} = 0$ for $l \in S_i$, and $y_l = c^{(j)}_l$ for $l \notin S_i$ as $S_i = \Delta(c^{(j)}, y)$.
With this in mind, let's begin. To prove $\text{RIM}_{\mathcal{H}}$ has full column rank over the formal variables $X_1, \cdots, X_n$, we just need to prove full-column-rankness for one substitution $(X_1, \cdots, X_n) \leftarrow (\alpha_1, \cdots, \alpha_n)$. Indeed, we will use the $(\alpha_1, \cdots, \alpha_n)$ from Parts 1, 2, 3.
Assume for the sake of contradiction that $\text{RIM}_{\mathcal{H}}(\alpha_1, \cdots, \alpha_n)$ is not full-column-rank, so
$$\text{RIM}_{\mathcal{H}}(\alpha_1, \cdots, \alpha_n) \cdot \left[ \begin{matrix} f_1 \\ f_2 \\ \cdots \\ f_{t-1} \end{matrix} \right] = 0$$
and let's just write $f_t = 0$. These $f_i$'s can be regarded as polynomials, $\deg f_i < k$.
Due to the definition of $\text{RIM}_{\mathcal{H}}$, it's clear that if $j, j' \in e_i$, then $f_j(\alpha_i) = f_{j'}(\alpha_i)$.
Take, for each $1 \le j \le t$
$$c^{(j)} = (f_j(\alpha_1), \cdots, f_j(\alpha_n))$$
and define $y$'s $l$th entry as $f_j(\alpha_l)$ for any $j \in e_l$, which is well-defined by the above (if $e_l$ is empty, pick anything).
Now we have recovered our desired setup, and can move forward. Indeed, for each $1 \le i \le n-k$, the set $S_i$ is of the form $\Delta(c^{(j)}, y)$ for some $j$, so we can recover
$$(MH(y - c^{(j)}))_i = 0$$
Also, since $c^{(j)}$ is a codeword of $\text{RS}_{n,k}(\alpha_1, \cdots, \alpha_n)$, we know that $MHc^{(j)} = 0$ as well.
Therefore, for each $i$ we recover $(MHy)_i = 0$, which means that $MHy = 0$. Since $M$ is invertible we arrive at the conclusion $Hy = 0$, i.e. $y \in \text{RS}_{n,k}(\alpha_1, \cdots, \alpha_n)$.
Due to the $k$-weakly-partition-connectivity, we can show, for each $j$, using the partition $\{\{j\}, [t] \setminus \{j\}\}$, that there must be at least $k$ hyperedges including $j$. This means that $y$ and $c^{(j)}$ agree on at least $k$ evaluation points, and since both of them are codewords, this forces $y = c^{(j)}$. Therefore, we end up with
$$y = c^{(1)} = c^{(2)} = \cdots = c^{(t)} = 0$$
which means that our kernel vector was in fact the zero vector, a contradiction.