# Proximity Gaps for Reed–Solomon Codes
**Authors:** Eli Ben-Sasson, Dan Carmon, Yuval Ishai, Swastik Kopparty, Shubhangi Saraf
**Date:** July 3, 2021
**Link:** https://eprint.iacr.org/2020/654.pdf
## Abstract / Core Idea
This paper establishes a fundamental structural property of Reed-Solomon (RS) codes, which the authors term a **proximity gap**. The core finding is that for any affine subspace of words, either *all* words in the subspace are close to the RS code, or *almost none* are. There is no "in-between" scenario in which a significant fraction of words are close while another significant fraction is far. This gap property is proven to hold for distances up to the Johnson/Guruswami-Sudan list-decoding bound and has major applications in cryptography and proof systems, most notably improving the soundness of the FRI protocol used in STARKs.
:::info
## An Example of the Proximity Gap
To understand the proximity gap, let's use a simple analogy. Imagine a code where "words" are lists of numbers, and a word is considered "close" if it has at most 1 error. We'll look at an "affine subspace", which for our purposes is just a line of words generated from two starting words, `u₀` and `u₁`.
The proximity gap theorem states that for any such line, only two scenarios are truly possible.
### Scenario 1: "All-in" - Correlated Errors
This scenario occurs when the errors in the starting words are perfectly aligned.
Let's define the error vector as the difference between a word and its closest valid codeword. Suppose `u₀` and `u₁` both have their single error in the exact same position:
* The error in `u₀` is `error₀ = (0, 0, 0, 5)` (an error of 5 in the 4th slot).
* The error in `u₁` is `error₁ = (0, 0, 0, 2)` (an error of 2 in the 4th slot).
Any word on the line is formed by the expression `u₀ + z * u₁`, where `z` is a number. The total error for any word on this line is `Total Error(z) = error₀ + z * error₁`. Let's see what this looks like:
`Total Error(z) = (0, 0, 0, 5) + z * (0, 0, 0, 2) = (0, 0, 0, 5 + 2z)`
Notice that this resulting error vector still has only one non-zero entry (the last one). No matter which value you choose for `z`, the error remains neatly confined to that single position.
**Result:** Every single word on this line has just one error. Therefore, all words are "close" to the code.
### Scenario 2: "All-out" - Uncorrelated Errors
This happens when the initial errors are scattered in different places.
* The error in `u₀` is `error₀ = (5, 0, 0, 0)` (error in the 1st slot).
* The error in `u₁` is `error₁ = (0, 0, 2, 0)` (error in the 3rd slot).
Now, the total error is calculated with the same formula:
`Total Error(z) = (5, 0, 0, 0) + z * (0, 0, 2, 0) = (5, 0, 2z, 0)`
For any `z` that isn't zero, this error vector now has two non-zero entries. The number of errors has jumped from 1 to 2, making these words "far" from the code. Only the single word `u₀` (when `z=0`) remains "close".
**Result:** At most one word on this line is "close," meaning **almost all words are "far"**.
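To see this counting concretely, here is a tiny Python check of the two scenarios (our own illustration, using plain integers rather than a finite field; the counting argument is the same):

```python
# Count the nonzero entries of the combined error vector error0 + z * error1.
def num_errors(e0, e1, z):
    return sum((a + z * b) != 0 for a, b in zip(e0, e1))

correlated   = ((0, 0, 0, 5), (0, 0, 0, 2))   # Scenario 1: errors share a position
uncorrelated = ((5, 0, 0, 0), (0, 0, 2, 0))   # Scenario 2: errors in different positions

for z in range(5):
    print(z, num_errors(*correlated, z), num_errors(*uncorrelated, z))
# Correlated: 1 error for every z.  Uncorrelated: 1 error at z=0, then 2 errors for z >= 1.
```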
### The Key Insight: Why There's No "Middle Ground"
Let's look at the error formula again:
`Total Error(z) = error₀ + z * error₁`
If we zoom in on a single coordinate, say position `i`, the error value there is determined by a simple linear equation:
`Error at position i = error₀[i] + z * error₁[i]`
A fundamental rule of algebra is that a linear equation in a single variable `z` has at most one solution, as long as it is not the trivial equation $0 = 0$. This means that for any coordinate that actually carries an error, there is at most one special value of `z` that can make the error at that specific position disappear (become zero).
So, imagine you wanted to create a "close" word. This would require the errors at several different positions to magically become zero for the *exact same value of `z`*. For instance:
* To fix the error at coordinate #5, you might need `z = 2`.
* To fix the error at coordinate #12, you might need `z = 5`.
* To fix the error at coordinate #28, you might need `z = 1`.
You cannot find a single value of `z` that satisfies all these conditions simultaneously. It is algebraically impossible to "clean up" scattered errors across many coordinates for a large number of `z` values.
This rigid structure is what creates the sharp split. The errors either line up perfectly from the beginning (the correlated error case), or their interaction is constrained by these algebraic rules, which prevents widespread, simultaneous cancellation and immediately pushes almost all the words into the uncorrelated error category. That is the essence of the gap.
:::
## The Central Question: Proximity Gaps
Many protocols in distributed storage, cryptography, and proof systems rely on verifying that a whole *batch* of words $\{u_0, u_1, \dots, u_l\}$ are close to a linear code $V$. A natural and efficient way to do this is to take a single random linear combination $u' = \sum r_i u_i$ and check if $u'$ is close to $V$.
The critical question is: **How sound is this test?**
If even one of the $u_i$ is far from $V$, does that guarantee that the random combination $u'$ will also be far from $V$?
Previous work showed this is often true but with a degradation in the distance parameter (e.g., if a $u_i$ is $\delta$-far, $u'$ is only guaranteed to be $\delta/2$-far). This paper seeks to eliminate this degradation. The authors formalize this goal using the concept of a proximity gap.

:::success
### Definition: Proximity Gap
This definition establishes a powerful "all-or-nothing" principle. It says that for a given collection of sets, every single set must fall into one of two extreme categories regarding a specific property, with no middle ground.
A collection of sets $\mathcal{C}$ (e.g., all possible lines or planes of data words) displays a **$(\delta, \epsilon)$-proximity gap** with respect to a property $P$ (e.g., "being a valid Reed-Solomon codeword") if every set $S \in \mathcal{C}$ satisfies *exactly one* of the following conditions.
Here, $\delta$ is the **proximity parameter**, a threshold defining how "close" a member needs to be to the property. For example, being $\delta$-close could mean differing in at most $\delta \cdot 100\%$ of its positions. The parameter $\epsilon$ is the **error parameter**, a very small number (like 0.001) that defines what "a tiny fraction" means.
1. **All-In:** All members of the set $S$ are $\delta$-close to the property $P$. In this case, there is a 100% chance that any randomly chosen member from the set satisfies the closeness condition.
$$\Pr[s \in S \text{ is } \delta\text{-close}] = 1$$

2. **All-Out:** A tiny fraction of members of the set $S$ are $\delta$-close to the property $P$. Here, the probability of finding a member that satisfies the closeness condition is negligible — at most $\epsilon$.
$$\Pr[s \in S \text{ is } \delta\text{-close}] \leq \epsilon$$

The crucial part of this definition is the "exactly one" condition. It means no set $S$ can be, for instance, 40% "close" and 60% "far." It must be one of these two extremes.
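As a quick illustration of the definition (our own sketch, not from the paper), the dichotomy can be phrased as a predicate over a finite set $S$; `is_delta_close` is a hypothetical closeness test supplied by the caller:

```python
def classify(S, is_delta_close, eps):
    """Classify a finite set S according to the (delta, eps)-proximity-gap dichotomy."""
    frac_close = sum(1 for s in S if is_delta_close(s)) / len(S)
    if frac_close == 1:
        return "all-in"            # every member is delta-close
    if frac_close <= eps:
        return "all-out"           # only a negligible fraction is delta-close
    return "gap violated"          # the theorem says this never happens for affine S
```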
:::
## The Main Result: Proximity Gaps for Reed-Solomon Codes (Theorem 1.2)
The paper's main discovery establishes a powerful "all-or-nothing" rule for data protected with **Reed-Solomon (RS) codes**. It proves that any structured collection of data words, like a line or a plane (known as an **affine space**), must adhere to a stark dichotomy.
To understand this, let's first define our terms:
* **The Code ($V$):** A Reed-Solomon code, denoted $V = \text{RS}[F_q, D, k]$, is a set of "valid" words of length $n = |D|$. Each word is created by evaluating a polynomial of degree at most $k$ at the $n$ points of the evaluation domain $D \subseteq F_q$.
* **The Rate ($\rho$):** The code's rate, $\rho = (k+1)/n$, is its efficiency — the ratio of the original message size to the final encoded word size.
* **The Distance ($\delta$):** This is the fraction of symbols that are different between two words (the relative Hamming distance). A word is **$\delta$-close** to the code if it has at most $\delta \cdot 100\%$ errors.
The main theorem states that a proximity gap exists for any affine space: either every word in the space is $\delta$-close, or almost none are. This rule holds even for a large number of errors, up to a famous limit in coding theory.
:::success
### Theorem 1.2 (Proximity Gap for RS Codes)
For any Reed-Solomon code $V$, the collection of affine spaces displays a **proximity gap** for any error level $\delta$ in the range:
$$\delta < 1 - \sqrt{\rho}$$
The term $1 - \sqrt{\rho}$ is the **Johnson bound**, a well-known theoretical limit for how many errors a word can have while still allowing for reliable list-decoding.
:::
:::info
### Example: A Miniature Reed-Solomon System
Let's build a small-scale version of the system described in the paper to make the terms clear.
#### The Code ($V$)
Imagine we want to encode data using a small alphabet of just five numbers, $F_5 = \{0, 1, 2, 3, 4\}$. We'll use a Reed-Solomon code to turn a 2-number message into a 4-number "codeword".
* **Our Code:** $V = \text{RS}[F_5, \{0,1,2,3\}, 1]$
* This means our messages are linear polynomials (degree $k=1$) like $P(x) = ax + b$.
* We create a codeword of length $n=4$ by evaluating the polynomial at the points $D=\{0, 1, 2, 3\}$.
* **Creating a Codeword:**
Let's pick the message polynomial $P(x) = 2x + 1$. We evaluate it at our four points (remembering that math is done modulo 5):
* $P(0) = 2(0) + 1 = 1$
* $P(1) = 2(1) + 1 = 3$
* $P(2) = 2(2) + 1 = 5 \equiv 0$
* $P(3) = 2(3) + 1 = 7 \equiv 2$
The resulting valid codeword is the list of these values: `v = (1, 3, 0, 2)`. The set of all such possible codewords is our code, $V$.
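A few lines of Python make the encoding step concrete (a sketch of this toy example only; the parameters $q = 5$, $D = \{0,1,2,3\}$, $k = 1$ are from the example above, not the paper):

```python
q = 5                # field size
D = [0, 1, 2, 3]     # evaluation domain

def encode(a, b):
    """Encode the message polynomial P(x) = a*x + b by evaluating it on D, mod q."""
    return [(a * x + b) % q for x in D]

print(encode(2, 1))  # [1, 3, 0, 2]  -> the codeword v from the text
```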
#### The Rate ($\rho$)
The rate measures the code's efficiency.
* **Formula:** $\rho = (k+1)/n$
* **Our Example:** We used a message of size 2 (for the numbers `a` and `b`) to create a codeword of length 4. So, the rate is $\rho = (1+1)/4 = 0.5$.
* This means our code is 50% efficient; half the data is the original message, and half is redundancy for error protection.
#### An Affine Space
An affine space is just a structured set of words. The simplest is a line, created from a base word `u₀` and a direction word `u₁`.
* Let `u₀ = (1, 1, 1, 1)` and `u₁ = (0, 1, 2, 3)`.
* The affine space is the set of all words `u₀ + z * u₁`, where `z` is from our alphabet $\{0, 1, 2, 3, 4\}$.
* For example, if we pick `z=2`, we get the word:
`(1,1,1,1) + 2 * (0,1,2,3) = (1,1,1,1) + (0,2,4,1) = (1, 3, 0, 2)`
This specific word happens to be a valid codeword we saw earlier. The other words on this line may or may not be.
The theorem applies to this entire set of 5 words. It says either all 5 are "close" to a codeword, or almost none are.
#### The Distance ($\delta$)
This measures the fraction of errors in a word.
* Let's take our valid codeword `v = (1, 3, 0, 2)`.
* Suppose we receive a corrupted version, `w = (1, 3, 4, 2)`.
* Comparing `v` and `w`, we see they differ in only **one** position (the third one).
* The relative distance is the number of differences divided by the length: $\delta = 1/4 = 0.25$.
We can therefore say that the word `w` is **0.25-close** to the code $V$. The theorem's range, $\delta < 1 - \sqrt{\rho}$, tells us how large this error fraction can be while the proximity gap still holds.
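Putting the pieces together, a small brute-force script (our own sanity check, not from the paper) computes these distances for the toy code and walks the affine line from above. This particular line turns out to be "All-In" in the strongest sense, since `u₀` and `u₁` happen to be codewords themselves:

```python
from itertools import product

q, D = 5, [0, 1, 2, 3]
# All 25 codewords of the toy code: evaluations of a*x + b for every a, b in F_5.
codewords = [tuple((a * x + b) % q for x in D) for a, b in product(range(q), repeat=2)]

def rel_distance_to_code(word):
    """Relative Hamming distance from `word` to the nearest codeword."""
    return min(sum(wi != ci for wi, ci in zip(word, c)) for c in codewords) / len(D)

print(rel_distance_to_code((1, 3, 4, 2)))   # 0.25, as computed above

u0, u1 = (1, 1, 1, 1), (0, 1, 2, 3)
for z in range(q):
    u_z = tuple((a + z * b) % q for a, b in zip(u0, u1))
    print(z, u_z, rel_distance_to_code(u_z))   # every word on this line has distance 0
```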
:::
The precise nature of this "all-or-nothing" guarantee depends on how many errors we are considering. The paper splits the result into two key regimes:
### The Unique Decoding Regime: High Certainty with Few Errors
This applies when the number of errors is relatively small, small enough that there's only one possible valid codeword nearby.
**Condition:** The error level $\delta$ is less than half the code's relative distance.
$$\delta < \frac{1-\rho}{2}$$
**Result:** In this case, the gap is extremely sharp. For an "All-Out" space, the fraction of words that are $\delta$-close is at most:
$$\epsilon = \frac{n}{q}$$
where $n$ is the word length and $q$ is the size of the alphabet (the field size).
**What this means:** This is a very strong guarantee. If an affine space is not "All-In," then the chance of randomly finding a "close" word is incredibly small. For instance, if your alphabet size $q$ is much larger than your word length $n$ (a common scenario), this probability becomes almost zero.
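For a feel of the numbers (illustrative values, not taken from the paper), take a word length of $n = 2^{20}$ and a field of size $q = 2^{64}$:
$$\epsilon = \frac{n}{q} = \frac{2^{20}}{2^{64}} = 2^{-44} \approx 5.7 \times 10^{-14}.$$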
### The List Decoding (Johnson) Regime: Handling Many More Errors
This is a major extension, showing the gap still holds even when the number of errors is much larger. In this regime, there might be a *list* of potential valid codewords close to a given word.
**Condition:** The error level $\delta$ is larger, approaching the Johnson bound.
$$\frac{1-\rho}{2} \le \delta < 1 - \sqrt{\rho}$$
**Result:** The guarantee is still powerful, but the formula is more complex. For an "All-Out" space, the fraction of close words is at most:
$$\epsilon = O\left( \frac{n^2}{q \cdot (\eta\rho)^{O(1)}} \right)$$
Here, $\eta = (1 - \sqrt{\rho}) - \delta$ is a "safety margin" that measures how far your error level $\delta$ is from the absolute limit.
**What this means:** The gap still exists, but to ensure that $\epsilon$ is small, the alphabet size $q$ needs to be larger (roughly proportional to $n^2$). The guarantee is slightly weaker if your error level $\delta$ is very close to the maximum possible (i.e., when the safety margin $\eta$ is tiny), but it robustly holds across this entire range of high error rates.
## The Proof Technique: A Unified Algebraic Approach
The most insightful contribution of the paper is its novel proof strategy. Instead of analyzing the words of an affine space one by one, the authors use a higher level of algebraic abstraction to analyze them all simultaneously. This shift in perspective transforms a messy statistical problem into a single, structured algebraic one.
### The Core Idea: From Many Words to One Formal Word
The central challenge is to understand the collective behavior of a whole line of words. A line (a simple affine space) is the set of all words $u_z$ of the form:
$$u_z = u_0 + z \cdot u_1$$
where $u_0$ and $u_1$ are fixed words and $z$ takes on every value in the finite field $F_q$.
The authors' idea is to represent this entire collection with a single formal word, $w$, defined over a more abstract mathematical world called a function field, $K = F_q(Z)$. Think of this field as the set of all rational functions (ratios of polynomials) in a formal variable $Z$.
The formal word $w$ is defined as:
$$w(x) = \underbrace{u_0(x)}_{\text{base word}} + \underbrace{Z}_{\substack{\text{formal}\\\text{variable}}} \cdot \underbrace{u_1(x)}_{\text{direction word}}$$
This new word $w$ is not just a list of numbers; it's a list of polynomials in $Z$.
Hence, a single formal word $w$ encapsulates every word on the original line. If we take $w$ and simply substitute a specific value $z$ from our original field $F_q$ in place of the formal variable $Z$, we get back the exact word $u_z$:
$$w(x) \bigg|_{Z=z} = u_0(x) + z \cdot u_1(x) = u_z(x)$$
By running classical decoding algorithms on the single abstract word $w$, the authors can deduce properties that must hold for *all* the concrete words $u_z$ at once.
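The following short sympy sketch (our own illustration, reusing the miniature example's `u₀` and `u₁`) shows the formal word and how substituting a concrete $z$ recovers an ordinary word on the line:

```python
import sympy as sp

Z = sp.Symbol('Z')
u0 = (1, 1, 1, 1)      # base word from the miniature example
u1 = (0, 1, 2, 3)      # direction word

# The formal word w: one polynomial in Z per coordinate.
w = [a + Z * b for a, b in zip(u0, u1)]
print(w)               # [1, Z + 1, 2*Z + 1, 3*Z + 1]

# Substituting a concrete z from F_5 recovers the ordinary word u_z = u0 + z*u1 (mod 5).
z = 2
u_z = [int(expr.subs(Z, z)) % 5 for expr in w]
print(u_z)             # [1, 3, 0, 2]
```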
### Part A: The Unique Decoding Regime via Berlekamp-Welch
This approach is used when the number of errors ($\delta$) is small. The goal is to show that if many words on the line are close to a codeword, then the initial words $u_0$ and $u_1$ must share a highly correlated structure.
#### Step 1: The Berlekamp-Welch Algorithm
The Berlekamp-Welch (BW) algorithm is a classic method for correcting errors in RS codes. Given a received word $r$ with a few errors, it finds two polynomials:
* An **error-locator polynomial**, $A(X)$, which is zero at the error positions.
* A **codeword-evaluator polynomial**, $B(X)$.
These polynomials satisfy the key equation $A(x) \cdot r(x) = B(x)$ at every evaluation point $x$. Because $A$ vanishes exactly at the error positions, multiplying by it cancels the corrupted values of $r$, so the key equation can hold at every position even though some values of $r$ are wrong. The corrected codeword is then recovered as $P(X) = B(X) / A(X)$.
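To make the algorithm concrete, here is a minimal Berlekamp-Welch sketch over the miniature code $\text{RS}[F_5, \{0,1,2,3\}, 1]$ from the earlier example (our own toy implementation with hypothetical helper names, not the paper's code); it recovers $P(X) = 2X + 1$ from the corrupted word `(1, 3, 4, 2)`:

```python
P = 5                  # field modulus
DOMAIN = [0, 1, 2, 3]  # evaluation points
K = 1                  # degree bound of message polynomials
E = 1                  # number of errors we try to correct

def poly_eval(coeffs, x):
    """Evaluate a polynomial given as a coefficient list (lowest degree first), mod P."""
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

def nullspace_vector(rows, ncols):
    """Return one nonzero solution v of the homogeneous system rows * v = 0 (mod P)."""
    rows, pivots, r = [row[:] for row in rows], {}, 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        inv = pow(rows[r][c], P - 2, P)
        rows[r] = [(v * inv) % P for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(a - f * b) % P for a, b in zip(rows[i], rows[r])]
        pivots[c], r = r, r + 1
    free = next(c for c in range(ncols) if c not in pivots)   # a free column exists
    v = [0] * ncols
    v[free] = 1
    for c, row in pivots.items():
        v[c] = (-rows[row][free]) % P
    return v

def berlekamp_welch(received):
    """Recover the message polynomial (degree <= K) from a word with <= E errors."""
    # Unknowns: A = a_0..a_E (error locator), B = b_0..b_{E+K} (evaluator).
    # One equation per position x: A(x) * r(x) - B(x) = 0.
    rows = []
    for x, rx in zip(DOMAIN, received):
        row = [(rx * pow(x, i, P)) % P for i in range(E + 1)]     # A-part
        row += [(-pow(x, i, P)) % P for i in range(E + K + 1)]    # B-part
        rows.append(row)
    sol = nullspace_vector(rows, (E + 1) + (E + K + 1))
    A, B = sol[:E + 1], sol[E + 1:]
    # B = A * P as polynomials, so search the (tiny) message space for the quotient.
    for a in range(P):
        for b in range(P):
            cand = [b, a]   # candidate P(X) = a*X + b
            if all((poly_eval(A, x) * poly_eval(cand, x)) % P == poly_eval(B, x)
                   for x in range(P)):
                return cand
    return None

print(berlekamp_welch([1, 3, 4, 2]))   # [1, 2]  ->  P(X) = 2*X + 1
```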
#### Step 2: Applying BW to the Formal Word
The authors apply the BW algorithm to their formal word $w(x) = u_0(x) + Z \cdot u_1(x)$. The algorithm now operates over the function field, producing two new polynomials, $A(X, Z)$ and $B(X, Z)$, whose coefficients are themselves functions of $Z$. The key equation becomes:
$$A(x, Z) \cdot \overbrace{\left( u_0(x) + Z \cdot u_1(x) \right)}^{w(x)} = B(x, Z) \quad \text{for every position } x$$
#### Step 3: The "Lift" from Many Instances to a Formal Property
The starting assumption is that for many specific values $z \in F_q$, the corresponding word $u_z$ is close to a codeword $P_z(X)$. For these "good" $z$'s, the BW properties guarantee that when we substitute $Z=z$, the resulting polynomial $A(X, z)$ must divide $B(X, z)$:
$$\frac{B(X, z)}{A(X, z)} = P_z(X) \quad (\text{for many "good" } z\text{'s})$$
This is where the crucial insight comes in. The authors use the **Polishchuk-Spielman lemma**, a powerful algebraic tool which states, in essence:
> *If a polynomial $A$ divides another polynomial $B$ after you plug in many different values for a variable $Z$, then $A$ must have divided $B$ formally to begin with. In other words, it deduces divisibility of bivariate polynomials from divisibility of univariate restrictions.*
This allows them to "lift" the property from the many concrete instances to the abstract, formal setting. They conclude that $A(X, Z)$ must divide $B(X, Z)$.
### Step 4: Recovering the Shared Structure and Proving the Gap
The proof culminates in this step. The previous steps have established that if a line of words is not in the "All-Out" category, then a single, formal polynomial $P(X, Z)$ must exist. This polynomial represents the "corrected" version of the formal word $w(x)$. A final algebraic argument reveals the remarkably simple structure of this polynomial:
\begin{equation}
P(X, Z) = v_0(X) + Z \cdot v_1(X)
\end{equation}
This polynomial $P(X, Z)$ represents a "perfect line" composed entirely of valid codewords, since both $v_0(X)$ and $v_1(X)$ are themselves valid Reed-Solomon codewords.
The key insight is that the original formal word, $w(x)$, must be "close" to this perfect formal codeword, $P(X, Z)$. This means they must agree on a large number of coordinate positions $x$. For any such position $x$ where they agree, the following identity must hold:
\begin{equation}
u_0(x) + Z \cdot u_1(x) = v_0(x) + Z \cdot v_1(x)
\end{equation}
Because this is an identity between two linear polynomials in the formal variable $Z$, their corresponding coefficients must be equal. This forces two conditions to hold simultaneously for this large set of agreeing positions:
1. $u_0(x) = v_0(x)$
2. $u_1(x) = v_1(x)$
This is the shared structure, also referred to as correlated agreement. It implies that the set of positions where $u_0$ has errors (that is, differs from $v_0$) must be the exact same small set of positions where $u_1$ has errors (differs from $v_1$).
This directly proves the proximity gap. If this shared error structure exists, then for any word $u_z = u_0 + z \cdot u_1$ on the line, its errors relative to the corresponding codeword $v_z = v_0 + z \cdot v_1$ are confined to that same small set of positions. As a result, every single word on the line is close to the code. This is the definition of the "All-In" case.
Therefore, the proof has established that if a line of words is not "All-Out" (the assumption we started from), it is forced to be "All-In." No middle ground can exist, which completes the proof of the proximity gap.
### Part B: The List Decoding Regime via Guruswami-Sudan
This approach is necessary to prove the gap for a much larger number of errors. In this regime, a received word might be close to a whole list of valid codewords, requiring a more powerful algorithm than Berlekamp-Welch.
#### Step 1: The Guruswami-Sudan Algorithm
The Guruswami-Sudan (GS) algorithm is the standard list-decoding algorithm for Reed-Solomon codes. Instead of finding a corrected codeword directly, it first constructs a geometric object. Given a received word $r$, it finds an auxiliary polynomial, $Q(X, Y)$, which defines a curve passing through every point $(x, r(x))$ of the word.
The key property is that any valid Reed-Solomon codeword, $P(X)$, that is close to the received word $r$ will correspond to a simple factor of this auxiliary polynomial. Specifically, $(Y - P(X))$ will divide $Q(X, Y)$. This elegantly transforms the problem of error correction into a problem of polynomial factorization.
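A tiny sympy check (our own illustration; no actual Guruswami-Sudan interpolation is performed, $Q$ is simply built by hand) shows the factor property in action:

```python
import sympy as sp

X, Y = sp.symbols('X Y')
P1 = 2*X + 1                                  # a codeword polynomial "close" to r
Q = sp.expand((Y - P1) * (Y - (3*X + 4)))     # a hand-built auxiliary polynomial

quotient, remainder = sp.div(Q, Y - P1, Y)    # divide out the factor Y - P1
print(remainder)                              # 0  ->  (Y - P1) divides Q
```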
#### Step 2: Applying GS to the Formal Word
The authors apply this algorithm to their formal word $w(x) = u_0(x) + Z \cdot u_1(x)$. The process now operates over the function field, yielding a single, three-variable polynomial $Q(X, Y, Z)$. This polynomial is constructed such that for every coordinate $x$, the formal point $(x, w(x))$ lies on the surface it defines. This gives the key property:
\begin{equation}
Q(x, u_0(x) + Z \cdot u_1(x), Z) = 0
\end{equation}
#### Step 3: Connecting Back to the Finite Field
The initial assumption is that for many "good" values of $z$, the corresponding word $u_z$ is close to a valid codeword $P_z(X)$. For these specific values, the properties of the GS algorithm guarantee that when we substitute $Z=z$, the polynomial $P_z(X)$ must be a "root" of $Q(X, Y, z)$. This means $(Y - P_z(X))$ must be a factor:
\begin{equation}
Q(X, Y, z) = (Y - P_z(X)) \cdot (\text{some other polynomial})
\end{equation}
#### Step 4: The Advanced Algebraic Lift 🛠️
The central challenge is to prove that if $Q(X, Y, z)$ has a linear factor $(Y - P_z(X))$ for many specific values of $z$, then the formal polynomial $Q(X, Y, Z)$ must also have a similar factor. Standard algebraic tools are not sufficient for this leap. The solution requires the following toolkit.
* **Tool 1: Algebraic Extension of a Function Field**
The first step is to move into a mathematical world where the equation $Q(X, Y, Z) = 0$ is guaranteed to have a solution for $Y$. Our starting field of rational functions $F_q(Z)$ may not be rich enough. For instance, if $Q$ were $Y^2 - Z$, there would be no solution for $Y$ in $F_q(Z)$. The authors move to an **algebraic extension**, a specially constructed field where such solutions, like a formal $\sqrt{Z}$, are given meaning. This guarantees that a root for $Y$ exists.
* **Tool 2: Hensel Lifting**
Now that we know a solution exists in this richer world, we need a way to find it. This is done using a technique called **Hensel lifting**. It is a powerful iterative algorithm, analogous to how you might find the decimal expansion of $\sqrt{2}$: you start with an approximation (like 1.4) and then algorithmically find the next digit (1.41), then the next (1.414), and so on.
Hensel lifting does this for polynomials. It starts with a solution at a single point, say $X=x_0$, and then builds the solution up, term by term, as an infinite power series in $(X-x_0)$. The result is a power series, $\gamma(X, Z)$, whose coefficients are algebraic functions of $Z$.
\begin{equation}
\gamma(X, Z) = \underbrace{c_0(Z)}_{\text{Solution at } X=x_0} + \underbrace{c_1(Z)(X-x_0)}_{\text{First-order correction}} + \underbrace{c_2(Z)(X-x_0)^2}_{\text{Second-order correction}} + \dots
\end{equation}
This power series $\gamma(X, Z)$ is now a formal root of the polynomial $Q(X, Y, Z)$.
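A compact sympy sketch of this lifting idea (our own toy example over the rationals, not the function-field setting of the paper): starting from the root $Y = 1$ of $Q(X, Y) = Y^2 - (1 + X)$ at $X = 0$, Newton/Hensel iteration builds the power-series root term by term:

```python
import sympy as sp

X, Y = sp.symbols('X Y')
Q = Y**2 - (1 + X)          # toy bivariate polynomial; we want a power-series root in Y

def hensel_root(Q, y0, prec):
    """Lift an approximate root y0 (valid mod X) to a power-series root mod X**prec."""
    y, QY, k = sp.Integer(y0), sp.diff(Q, Y), 1
    while k < prec:
        k = min(2 * k, prec)
        # Newton step: y <- y - Q(y)/Q'(y), truncated to order X**k.
        y = sp.series(y - Q.subs(Y, y) / QY.subs(Y, y), X, 0, k).removeO()
    return sp.expand(y)

# The binomial series of sqrt(1 + X): 1 + X/2 - X**2/8 + X**3/16 - 5*X**4/128
print(hensel_root(Q, 1, 5))
```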
#### Step 5: The Contradiction Argument
The logical core of the proof is to show that this seemingly complex power series, $\gamma(X, Z)$, must actually be a simple polynomial. This is done by contradiction.
1. **Connecting the Formal and Concrete:** The proof first establishes that for every "good" value of $z$ from our initial assumption, this formal power series solution, when specialized, must equal the corresponding concrete codeword polynomial $P_z(X)$.
\begin{equation}
\gamma(X, z) = P_z(X)
\end{equation}
2. **The Assumption for Contradiction:** We assume the opposite of what we want to prove. Let's assume that $\gamma(X, Z)$ is *not* a simple polynomial of the form $v_0(X) + Z \cdot v_1(X)$. This means that either its power series is truly infinite, or its coefficients $c_i(Z)$ are complex algebraic functions (e.g., involving terms like $\sqrt{Z^3+1}$), not just simple polynomials in $Z$.
3. **The "Too Many Roots" Problem:** The contradiction comes from a conflict between the properties of the formal solution and the properties of the concrete codewords it must represent. We know from the previous step that the formal power series solution $\gamma(X, Z)$ must simplify, for every "good" value of $z$, into a valid, low-degree Reed-Solomon codeword $P_z(X)$. This imposes incredibly strong constraints: for any degree $d$ greater than the degree bound of the RS code, the coefficient of $X^d$ in $\gamma(X, z)$ must be zero, so the formal coefficient of $X^d$ in $\gamma(X, Z)$, a potentially complex algebraic function, must evaluate to zero at every "good" $z$.
Let's imagine our Reed-Solomon code contains polynomials of degree at most 100. This means any valid codeword $P_z(X)$ has no term with $X^{101}$, $X^{102}$, or higher. The coefficients for these terms are all zero.
Now, let's look at our formal power series:
\begin{equation}
\gamma(X, Z) = c_0(Z) + c_1(Z)X + \dots + c_{100}(Z)X^{100} + c_{101}(Z)X^{101} + \dots
\end{equation}
The coefficients, like $c_{101}(Z)$, are algebraic functions of the formal variable $Z$. They could be simple, like $Z^2+1$, or complex, like $\sqrt{Z^3-Z+5}$.
Here is the constraint: For every single "good" value of $z$, the specialized polynomial $\gamma(X, z)$ must be a codeword of degree at most 100. This forces the coefficient of the $X^{101}$ term in $\gamma(X, z)$ to be zero. That coefficient is simply $c_{101}(z)$.
So, for thousands of different "good" values of $z$, we must have:
\begin{equation}
c_{101}(z) = 0
\end{equation}
This creates the "Too Many Roots" problem. We have an algebraic function, $c_{101}(Z)$, which our argument assumes is a potentially complex, non-zero function. Yet, we have found that it must have a root (it must equal zero) at every single one of the thousands of "good" $z$ values.
A fundamental fact of algebra is that a non-zero polynomial, and more generally a non-zero algebraic function of bounded degree, can have only a bounded number of roots. It is mathematically impossible for a non-zero function like $\sqrt{Z^3-Z+5}$ to be zero for thousands of different inputs. The only way for a function to have this many roots is if it was the zero function to begin with.
This forces the conclusion that the coefficient $c_{101}(Z)$ must be formally equal to zero. The same logic applies to all higher-order coefficients ($c_{102}(Z)$, $c_{103}(Z)$, etc.). They must all be zero, which proves that the infinite power series was actually a finite, low-degree polynomial all along. This contradiction proves our initial assumption was false.
#### Step 6: Recovering the Shared Structure
The contradiction in the previous step forces the conclusion that the formal power series $\gamma(X, Z)$ is not an infinitely complex object. Instead, it must be a simple polynomial with a very specific structure.
The infinite tail of the series must be all zeros, and its coefficients must be simple linear polynomials in $Z$. Therefore, the solution has the structure we were looking for:
\begin{equation}
\gamma(X, Z) = P(X, Z) = \underbrace{v_0(X)}_{\text{a valid RS codeword}} + Z \cdot \underbrace{v_1(X)}_{\text{a valid RS codeword}}
\end{equation}
Since $\gamma(X, Z)$ is a root of $Q(X, Y, Z)$, it follows that $(Y - P(X, Z))$ is a formal factor of $Q(X, Y, Z)$. This discovery of a single, structured formal factor is the key to proving the proximity gap.
### The Link to the Proximity Gap
The entire proof, from Step 1 to this point, has operated under a single premise: we assumed that our affine line of words was **not** in the "All-Out" category. That is, we started by assuming a significant fraction of words on the line $\{u_0 + z \cdot u_1\}$ were close to the code.
The discovery of the polynomial $P(X, Z)$ reveals the consequences of that assumption. It represents a "perfect line" of valid codewords, $\{v_0 + z \cdot v_1\}$. The proof has shown that our original, potentially corrupted formal word, $w(x)$, is "close" to this perfect formal codeword, $P(X, Z)$.
This "closeness" on the formal level implies a powerful structural property on the concrete level. For the vast majority of coordinate positions $x$, the following identity must hold:
\begin{equation}
\overbrace{u_0(x) + Z \cdot u_1(x)}^{w(x)} = \overbrace{v_0(x) + Z \cdot v_1(x)}^{P(x,Z)}
\end{equation}
Because this equation holds formally for the variable $Z$, the coefficients must match. This forces a simultaneous agreement for most positions $x$:
\begin{equation}
\begin{cases}
u_0(x) = v_0(x) \\
u_1(x) = v_1(x)
\end{cases}
\end{equation}
This is the **shared structure**. It means the errors in $u_0$ and $u_1$ are not random; they are **correlated**, occurring only on the same small set of positions where this agreement fails.
If the errors are correlated in this way, then for any choice of $z$, the combined word $u_z = u_0 + z \cdot u_1$ will also have its errors confined to that same small set of positions. Therefore, its distance to the corresponding valid codeword $v_z = v_0 + z \cdot v_1$ will be small.
This leads to the final conclusion: if the line is not "All-Out," then **every single one of its words must be $\delta$-close to the code**. This is the definition of the "All-In" case. The proof has shown there is no middle ground, which establishes the proximity gap.