# The simplest proximity-to-low-degree-polynomial test and how to rehabilitate approximate polynomials
## Introduction
This post is an apetizer for a blog post in preparation on the __FRI__ protocol. We describe what is arguably the simplest of all __proof of proximity (to low degree polynomial functions)__[^algthough] scheme out there. Here and elsewhere, __low degree__ means "of degree $\leq d$" for some fixed positive integer $d$. But our interest in this scheme is as a means to shed light on a 👻 _dual_ 👻 problem to that of proximity testing.
This scheme was originally described in the 1991 paper [Self Testing/Correcting for Polynomials and Approximate Functions](http://people.csail.mit.edu/ronitt/papers/glrsw.pdf) by P. Gemmel, R. Lipton, R. Rubinfeld, M. Sudhan and A. Widegerson (we shall sometimes use the acronym __GLRSW__). It has been referred to as "Sudhan $d+1$" or some variation on that name in presentations on FRI given by Eli Ben Sasson. In StarkWare's blogpost on [Low Degree Testing](https://medium.com/starkware/low-degree-testing-f7614f5172db), it is what is called "the direct test".
[^algthough]: although, more on that later.
Compared to modern proof of proximity schemes such as FRI it is __spectacularly inefficient__: it runs in linear time $O(d)$ as opposed to FRI's polylogarithmic time $O(\ln(d)^2)$. However, it has the advantage of __sheer simplicity__:
* there is only ever _one_ map at play (as opposed to the $O(\ln(d))$ inter-dependent maps in FRI);
* coherence is checked for that one map as opposed to having a "trickle down" sequence of coherence conditions relating all the maps in that sequence;
* the redundancy property at the heart of this scheme is second nature to anyone familiar with polynomial interpolation.
Also, proving its "soundness properties", while involved, follows a clear path. It also illustrates well a popular outlook for proving soundness: "_deal with the good cases in detail, don't bother analyzing what happens in the bad cases except for bounding the probability of something bad happening in the first place_". Usually the good case is when everything happens according to plan, bad cases are everything else. This makes the __GLRSW__ scheme a perfect entry point into proof of proximity schemes.
---
[TOC]
---
## Role of proofs of proximity in transparent computational integrity schemes
We somewhat cryptically indicated that __GLRSW__ is really about a _dual_ problem to that of low-degree proximity testing. Let us try to explain that claim. So before we go any further: __what are proof of proximity schemes and what are they useful for?__ Certainly the main use-case relevant to blockchain related applications is as part of computational integrity schemes.
__Computational integrity schemes__ such as STARKs or SNARKs and many others are protocols whereby a __prover__ can convince a __verifier__ that a complex and/or confidential computation was performed as advertized. The tremedous appeal of such schemes comes from the existence of computational integrity schemes that are
* far quicker to verify than it is to run the computation,
* leak no private information,
* are nigh impossible to twist into producing false positives.
Such schemes rely on a __prover__ to construct a __proof of computational integrity__. The prover's work can be broadly separated into two phases.
The __first phase__, common to all such schemes[^impl_varies], is the __arithmetization of the computation__ (R1CS or algebraic execution trace for instance). The _raw data_ produced during the computation is converted into _algebraic form_ (typically field elements) which in turn is condensed into _polynomial form_ by means of [polynomial interpolation](https://en.wikipedia.org/wiki/Polynomial_interpolation)[^sfo]. Polynomials are useful in this regard: besides addition and scalar multiplication[^vectors_do_too], they support __products__ and __divisibility__. Importantly, this first phase can be done so that computational integrity (i.e. validity of the underlying computation) is _equivalent_ to the satisfaction of some algebraic constraints by the resulting polynomials. Typically __low-degreeness__ and __divisibility__ conditions.
[^impl_varies]: although concrete implementations vary greatly
[^sfo]: the precise way in which this interpolation is done varies. In particular, the __domain of interpolation__ is usually a _structured subset_ of a (finite) field $\Bbb{F}$: a subgroup/coset of $\Bbb{F}^\times$ or an additive sugroup/coset of $\Bbb{F}_{p^d}$.
[^vectors_do_too]: which vectors also support
The __second phase__, i.e. actually compiling the proof, comes down to finding an efficient commitment of these polynomials. This commitment should allow a verifier to convince themselves of the claim to be proven. In particular, of __low-degreeness__ and __divisibility__ conditions that may apply. There are various ways of doing this, and (at least) two competing philosophies for carrying out the second step.
__Philosophy 1: use opaque data and secrets to force the prover down a particular path; valid proofs are those producing ultra-rare collisions.__ The verifier thus generates some secrets along with associated opaque data (hidden behind a hard discrete log problem, say) called the __proving key__ to be handed to the prover. The proving key is generated in such a way that for one to generate a convincing proof from it, one has to
1. either comply with the verifier and produce a proof according to protocol; such proofs encode polynomials that are __low-degree by construction__;
2. or successfully break a cryptographically hard problem (e.g. variants of discrete log computations).
For checking __divisibility__ conditions, the relevant fact is that two distinct low degree polynomials $P$ and $Q$ virtually never[^virtually_zero] produce a collision when evaluated at a random (secret) point. Thus, producing an exceedingly rare collision $P(s) = Q(s)$ at a random (secret) point $s$ is seen as strong supporting evidence for the claim " $P=Q$ ". By extension, a collision of the form $A(s)\cdot H(s) = B(s)$ is interpreted as strong evidence for a divisibility claim " $A\mid B$ ".
[^and_others]: as well as a polynomial $R$ needed to force internal consistency among $L$, $R$ and $O$ and a polynomial $H$ to witness divisibility by $Z$, i.e. $LR-O=HZ$.
[^modulo_consistency_checks]: modulo some necessary internal consistency checks.
[^virtually_zero]: indeed, if $\deg(P),\deg(Q)\lll\#\Bbb{F}$ and if $s$ is drawn at random in a large field $\Bbb{F}$, then $P(s)=Q(s)$ happens with probability at most $\frac{\max\{\deg(P),\deg(Q)\}}{\#\Bbb{F}}\lll 1$.
__Philosophy 2: no secrets, find another way to enforce low-degreeness / divisibility conditions.__ Schemes of the first kind had low-degreeness baked into them. For schemes where the prover isn't forced down a narrow path for constructing polynomial commitments, this is no longer the case.
This is immediately problematic: producing collisions between polynomials becomes easy if one is free to use large degree polynomials. __Checking for low-degreness thus becomes the central problem__. And a difficult problem it certainly is. Checking low-degreeness _on the nose_ is computationnally demanding, at least as demanding as the original computation the verifier may wish to bypass. __Proofs of proximity to low-degree maps__ are an efficient substitute for _on the nose_ low-degree tests. They don't prove that a map is low-degree, they show that a map $f$ is (likely) in the immediate neighborhood of __some__ low degree map whose properties may be extracted by other means from $f$.
Once the verifier has sufficient supporting evidence for believing in low-degreeness (or at least proximity to a low degree map), checking divisibilty conditions follows much of the same path as before: open commitments at a random points $r$ (that usually need not be secret) and check $A(r)\cdot H(r) \overset?= B(r)$ in the clear.
## Proofs of proximity vs the __GLRSW__ scheme
The __GLRSW__ scheme we will describe below can be seen as a crude means to discriminate between maps. To distinguish between those that are very close to being polynomial and those that are far from polynomial. In that sense it can work like a __proof of proximity scheme__. Indeed, the purpose of such a scheme is to
* ACCEPT with probability 1 low degree polynomial functions
* REJECT with high probability maps that are far from any low degree polynomial function.
The behaviour of such a proof of proximity scheme is illustrated below: low degree maps (here the central red dot) are to be accepted all the time, maps that are outside of some small neighborhood of the set of low degree polynomial maps are to be rejected with high probability.
![](https://i.imgur.com/hXeTgwP.jpg)
And then there is a grey area (a small Hamming neighborhood of some low degree polynomial map) where the test's behaviour is unspecified. The __GLRSW__ scheme on the other hand fulfils the complementary (or dual) role of "discovering" the nearest low degree map when fed one of maps in that small neighborhood.
* Proofs of proximity can be seen as means of amplifying the distinction between polynomial maps and their distant neighbors,
* the __GLRSW__ scheme can be understood as contracting small neighborhoods of polynomial functions onto that polynomial function.
So while proof of proximity schemes are in the business of discrimination, __GLRSW__ is in the business of rehabilitation. The next section are our attempt to make explicit the above picture.
## The barren wasteland of the space of all maps
Let us think for a moment about the space $\mathcal{F}$ of all maps $f:\Bbb{F}\to\Bbb{F}$ $$\mathcal{F}=\left\{\begin{array}{c}\text{the space of all}\\\text{set maps }\Bbb{F}\to\Bbb{F}\end{array}\right\}$$ (similar mental pictures will apply to the space of all maps $S\to\Bbb{F}$ for some large subset $f:S\subset\Bbb{F}$). When $\Bbb{F}$ is a large finite field, $\mathcal{F}$ is a __very large set__: $$\#\mathcal{F}=(\#\Bbb{F})^{\#\Bbb{F}}$$ For instance, if $\Bbb{F}$ is a 256 bit prime field, we get $\#\mathcal{F} \approx 2^{2^{264}}$. In other words, $\mathcal{F}$'s size is beyond comprehension. Most of the points in that space[^points] are totally unstructured ''random'' maps.
__The overwhelming majority of points of $\mathcal{F}$ are maps whose interpolation polynomial has degree $\#\Bbb{F}$.__
Indeed, there are only $(\#\Bbb{F})^{\#\Bbb{F} - 1}$ polynomial maps of degree $<\#\Bbb{F}$, i.e. the probability of stumbling on a polynomial maps of degree $<\#\Bbb{F}$ by chance is $1/(\#\Bbb{F})\approx 2^{-256}$.
__For all intents and purposes low degree maps don't exist.__
For instance, if we consider the collection of all polynomial maps of degree $\leq 10^9$, a very reasonable bound for any real world application, the likelyhood of stumbling on such a function by chance is 1 in $(\#\Bbb{F})^{\#\Bbb{F}-10^9}$ which is, for a 256 bit field, $\lll$ than 1 in $2^{2^{263}}$.
[^points]: i.e. we think of maps $f:\Bbb{F}\to\Bbb{F}$ as the points of $\mathcal{F}$
## The civilizing influence of low degree maps on their neighbors
### A neighborhood of maps
While low degree maps are exceedingly rare they do, of course, exist. Furthermore, __they exert a taming influence on their immediate neighbor functions__. When talking about neighboring functions we are implicitly talking about [distances](https://en.wikipedia.org/wiki/Distance#General_metric) between maps. To measure distances between maps, i.e. points in $\mathcal{F}$, we use [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance). The Hamming distance $d(f,g)\in[0,1]$ between two maps $f,g:S\to\Bbb{F}$ is the proportion of inputs in $S$ on which they differ. Thus if $f=g$ they differ on no inputs and $d(f,g)=0$ while if for all $s\in S,~f(s)\neq g(s)$ then $d(f,g)=1$. Thus, for any $\epsilon >0$ we can consider the "neighborhood of $f\in\mathcal{F}$" comprised of all maps $g$ with $d(f,g)<\epsilon$.
The picture below is a mental picture of a neighborhood of a low degree map ${\color{red}P}$ in $\mathcal{F}$. It's a pretty large picture, and opening it in a new window will make its features clearer.
![](https://i.imgur.com/stMvKsT.png)
In it we depict
* a low degree map ${\color{red}P}$,
* a somewhat close neighbor ${\color{orange}{f}}$ with, say, $d({\color{orange}{f}},{\color{red}P})\approx.001$)
* a slightly remote neighbor ${\color{green}{g}}$ with, say, $d({\color{green}{g}},{\color{red}P})\approx .1$)
* and a map ${\color{blue}{h}}$ that bears some ressemblance (yet not all that much) with ${\color{red}P}$, say $h$ with $d({\color{blue}{h}},{\color{red}P})\approx.5$).
The white halo around ${\color{red}P}$ is meant to represent the small neighborhood on which the civilizing influence of ${\color{red}P}$ can be felt. And then there is a dense and featureless blue ocean of "generic maps".
### ... Taming / civilizing influence?
What do we mean by "taming influence"? First of all, let us say clearly what we _don't_ mean. We _don't mean to suggest_ that close neighbors of a low degree map ${\color{red}P}$ are themselves low degree ... __Far from it.__ Close neighbors are overwhelmingly likely to be of degree $\#\Bbb{F}$, the maximum degree possible... Yet, to the casual observer working with incomplete information (e.g. a probabilistic polynomial time Turing machine with oracle access to a close neighbor of ${\color{red}P}$) they are nigh indistinguishable from the polynomial map ${\color{red}P}$ in whose light they bask. Indeed ...
:::info
__Close neighbors of low degree maps exhibit many of the same local redundancies which characterize low degree maps.__
:::
Let us qualify that statement: they do _with exceedingly high probability_. When tested on random inputs:
* maps that are close to a polynomial exhibit many of the local redundancies exhibited by low degree polynomials
* maps that are far from low degree polynomial maps don't.
This loose dichotomy is the __basis for proofs of proximity__. We can probabilistically test for proximity to a polynomial by checking if the map in question exhibits the expected amount of redundancies.
## Polynomials <=> built in redundancy
Let us now be slightly more clear about the way in which polynomials exhibit redundancy. This is pretty basic stuff (interpolation understood using linear algebra) and can nicely be illustrated. In one word this boils down to the fact that the space $\mathcal{P}_d$ of polynomial maps of degree $<d$ is a vector space of dimension $d$ and the linear functions " evaluation at $x_i$, $i=1,\dots, d$ " $$
\mathrm{ev}_{x_i}:\left\{
\begin{array}{rcl}
\mathcal{P}_d & \longrightarrow & \Bbb{F}\\
f & \longmapsto & f(x_i)
\end{array}
\right.
$$ for distinct points $x_1,x_2,\dots,x_d\in\Bbb{F}$ form a basis of the [dual space](https://en.wikipedia.org/wiki/Dual_space) $\mathcal{P}_d^*$.
So, for instance, while there is a whole " line's worth " of polynomial maps $f$ of degree $<2$ satisfying the constraint $f(x_1)=y_1$, there is but __one__ polynomial map of degree $<2$ satisfying the extra constraint $f(x_2)=y_2$, as depicted below:
| Single Constraint | Two constraints |
| -------- | -------- |
| ![](https://i.imgur.com/ZSLTqrB.png =600x600) | ![](https://i.imgur.com/hG0IzaN.jpg =600x600) |
Similarly there is a whole " line's worth " of polynomial maps $f$ of degree $<3$ satisfying the constraints $f(x_1)=y_1$ and $f(x_2)=y_2$ (whatever the values $y_1,y_2$ as long as $x_1\neq x_2$), but there is only __one__ polynomial map of degree $<3$ satisfying the extra constraint $f(x_3)=y_3$, as depicted below:
| Two Constraints | Three constraints |
| -------- | -------- |
| ![All degree $<3$ polynomials with two values fixed](https://i.imgur.com/Tdlz6hI.png =600x400) | ![A single degree $<3$ polynomial satisfies these two constraints](https://i.imgur.com/MdZI452.png =600x400) |
This is simply expressing the fact that for distinct $x_1,\dots,x_d\in\Bbb{F}$ the " evaluation at $x_i$ linear forms " form a basis of the dual $\mathcal{P}_d^*$.
Its [(ante)dual basis](https://en.wikipedia.org/wiki/Dual_basis) is the basis of Lagrange polynomials. Indeed, to produce the only polynomial function $f\in\mathcal{P}_d$ such that $f(x_i)=y_i$ for $i=1,\dots,d$, one forms $f=\sum_{i=1}^dy_i L_i$ where the Lagrange polynomials of the set $\{x_1,\dots,x_d\}$: $$
L_i=
\prod_{\substack{1\leq j\leq d\\j\neq i}}
\frac{X-x_j}{x_i-x_j}
$$
The fact that the family of linear functionals $(\mathrm{ev}_{x_1}, \dots, \mathrm{ev}_{x_d})$ forms a basis of the space of linear functionals on $\mathcal{P}_d$ also means that evaluations at points $x\in\Bbb{F}\setminus\{x_1,\dots,x_d\}$ can be expressed as linear combinations of evaluations at the $x_i$. This accounts for the " in-built redundancy " of polynomial maps.
![Predictive power of polynomials](https://i.imgur.com/scojUcH.png)
Indeed, if $f$ is a degree $<d$ polynomial function we can compute $f(x)$ for any $x$ by plugging $x$ into the formula above and using its values at the $x_i$: $$f(x) = \sum_{i=1}^dL_i(x)\cdot f(x_i)$$ or, if we write $\lambda_i^x=L_i(x)$, $$f(x) = \sum_{i=1}^d\lambda_i^x\cdot f(x_i).$$
## An inefficient proximity test
We describe a (crude) proximity test to low-degree polynomials. The redundancy within low-degree polynomials is at the heart of it. As we tried to suggest earlier, these redundancies remain by and large true for functions that are very close to low degree polynomial functions. This, in spite of the fact that these functions are hopelessly high degree polynomial functions. These maximal degree polynomial maps do their darndest to emulate low degree maps. These maps will usually pass the following test:
![](https://i.imgur.com/Nywpnh4.png)
The number of repetitions $N$ required will depend on a target soundness.
### A simple criterion to rule out maps that are far from low degree maps
Let us put $\epsilon = d(f,\mathcal{P}_d)\lll 1$: in other words, there is some low-degree polynomial $P_f$ that agrees with $f$ on a random input with probability $1-\epsilon$. Then the equation $$f(x) \overset{?}{=}
\sum_{i=1}^d
\lambda^x_i\cdot f(x_i),$$ for randomly and independently sampled $x,x_1,\dots,x_d\in\Bbb{F}$ is satisfied with probability _at least_ $(1-\epsilon)^{d+1}$[^when_everything_goes_well] and fails with probability _at least_ $(d+1)\epsilon\cdot(1-\epsilon)^d$[^when_one_things_fails]. Using these simple estimates and fixing some desired threshold $\epsilon_0$ one can find $N$ so that with overwhelming probability the test will reject maps that are at least $\epsilon_0$ far from low degree.
[^when_everything_goes_well]: when _all conditions_ $f(x)=P_f(x)$, $f(x_1)=P_f(x_1)$, ..., $f(x_d)=P_f(x_d)$ are met simultaneously. There can of course be serendipitous collisions, but those are harder to account for.
[^when_one_things_fails]: when _precisely one of the conditions_ $f(x)=P_f(x)$, $f(x_1)=P_f(x_1)$, ..., $f(x_d)=P_f(x_d)$ isn't met. Of course more than one of these may fail, but it is then unclear whether the equation fails too.
### On the wastefulness of this test
One could establish "soundness results" for this test. The proof sketched below would likely adapt, albeit with a much larger parameter space: $\Bbb{F}^{d+1}$ as opposed to $\Bbb{F}^2$. But one should note that this test is __particularly impractical__. The reason is that __every cycle requires recomputing a whole new set of weights__ $\lambda_1^x,\dots,\lambda_d^x$. This requires heavy lifting: at least $O(d)$ products and inversions. The main point, however, testing $$f(x) \overset{?}{=}
\sum_{i=1}^d
\lambda^x_i\cdot f(x_i),$$ requires __minimal computation__.
## Outline of the GLRSW scheme
The test we describe here is a refinement of the previous one. It works very much the same, but it manages to bypass the constant recomputation of the weights $\lambda_1^x,\dots,\lambda_d^x$. The trick is to select $x,x_1,x_2,\dots,x_d$ at every round so that __the associated weights stay the same through-out__.
A simple way to ensure this is to initially choose $d+1$ distinct points $b,a_1,\dots,a_d$, compute once and for all the $$\alpha_i = \prod_{\substack{1\leq j\leq d\\j\neq i}}\frac{b-a_j}{a_i-a_j},\qquad i=1,\dots,d$$ and at every round draw random $s,t\leftarrow\Bbb{Z}/p\Bbb{Z}$ and define $$\left\{
\begin{array}{lcl}
x & \leftarrow & s + bt \\[1mm]
x_1 & \leftarrow & s + a_1t \\[1mm]
x_2 & \leftarrow & s + a_2t \\[1mm]
~\vdots & & \quad\vdots\\[2mm]
x_d & \leftarrow & s + a_dt
\end{array}
\right.$$
It is quite obvious that for all $s,t\in\Bbb{Z}/p\Bbb{Z}$, $$\lambda_i^x = \prod_{\substack{1\leq j\leq d\\j\neq i}}\frac{x-x_j}{x_i-x_j}=
\prod_{\substack{1\leq j\leq d\\j\neq i}}\frac{(s + ta)-(s + a_jt)}{(s + a_it)-(s + a_jt)}=
\prod_{\substack{1\leq j\leq d\\j\neq i}}\frac{b-a_j}{a_i-a_j}=\alpha_i$$ since the $s$ disappear both in the numerators and denominators when taking differences, and the $t$'s factor out in every quotient. We thus have a new improved test:
![](https://i.imgur.com/IUYsg4z.png)
__Note.__ The reader may have noticed that the definition of the $\lambda_i^x$ doesn't make sense when $t=0$: we are dividing by zero. This isn't really a problem: for $f$ polynomial function of degree $<d$, the functional equation $f(s+bt) \overset?= \sum_{i=1}^d \alpha_i\cdot f(s+a_it)$ _does_ still hold in that case, which is all we need.
# Details
In this second half of the post we dive into details about the __GLRSW__ scheme, in particular how it rehabilitates approximate polynomials by computing their closest low-degree polynomial. The proofs we present below are fleshed out versions of those from the original paper [Self Testing/Correcting for Polynomials and Approximate Functions](http://people.csail.mit.edu/ronitt/papers/glrsw.pdf).
## Conventions
In what follows $\Bbb{F}$ is a finite _prime_ field and $d$ is a positive integer with $d<\#\Bbb{F}$. We will sometimes write $n=\#\Bbb{F}$. We define the __degree of a map__ $f:\Bbb{F}\to\Bbb{F}$ to be the degree of its interpolation polynomial. Thus $\mathrm{deg}(f) < \#\Bbb{F}$. We write $\Bbb{F}_d[X]$ for the space of all degree $\leq d$ polynomials with coefficients in $\Bbb{F}$. We furthermore say that a map $f:\Bbb{F}\to\Bbb{F}$ has __low degree__ if $\mathrm{deg}(f)\leq d$.
In line with the __GLRSW__ paper, we will consider an efficient program $P$ that supposedly computes a certain function $f$. We write $P(x)$ for the output of $P$ on input $x$. The program $P$ should compute $f$ correctly most of the time, that is, we expect $P(x)=f(x)$ for most inputs $x\in\Bbb{F}$. One may think for instance of $P$ as verifying a Merkle branch from (supposedly) $f(x)$ to a Merkle root representing $f$.
## A quick rundown
### Step 1: Characterizing low-degreeness
We first establish that a certain form of redundancy in the values of a map $f:\Bbb{F}\to\Bbb{F}$ is __equivalent__ to the interpolating polynomial of that map having low degree (i.e. degree $\leq d$).
* One direction ($\implies$) is [easily checked](#Low-degreeness-implies-redundancy). It is a fact about abstract polynomials.
* The converse, i.e. the fact that this form of redundancy in maps __precisely__ characterizes those maps with low degree interpolating polynomial is established in [the next section](#Redundancy-implies-low-degreeness).
The converse seemingly requires one to work in a prime field to avoid having to divide by binomial coefficients which may be zero (because of positive characteristic).
### The helper function _g_
We next introduce a helper function $g:\Bbb{F}\to\Bbb{F}$. The definition of $g$ can be explained as follows. If $P$ were indeed a low-degree polynomial function, then we would be able to predict any of its values $P(x)$ starting with __any__ $d+1$ of its values $P(x_0)$, $\dots$, $P(x_d)$. We would simply use the interpolation formula to make a (correct) prediction. What if $P$ isn't polynomial, though? Then different sets of values $x_0,\dots,x_d$ might lead to different predictions of the value of $P(x)$. We thus define
> __Informal definition of $g$.__ For $x\in\Bbb{F}$, $g(x)$ is be __the most popular prediction__ we get by interpolating values of $P$.
This definition is a little unsatisfying, though. There is potentially room for ambiguity:
:::warning
__Potential source of ambiguity 1.__ Of the predicted values, there might be two values that are tied for most popular predicted value.
:::
That would be unfortunate, and it is something we have to deal with. A simple (yet not completely satisfactory) work-around is to arbitrarily break ties. Furthermore, one might question the value of this prediction: what if 998 values are respectively predicted $0.1001\%$ of the time, and another value is predicted $0.1002\%$ of the time? That value would be our definition of $g(x)$, but its value as a majority value is dubious.
:::warning
__Potential source of ambiguity 2.__ There might be no value that is predicted $>50\%$ of the time.
:::
__Note.__ It is important to note that
* $g$ is never explicitly computed by anybody
* nor is $g$ meant to be efficiently computable.
If the verifier had the computational power to compute $g$ even on a single value it might as well skip computing $g$ altogether and directly verify that $P$ is polynomial. The function $g$ is simply here to be used in abstract arguments.
What matters most is that this $g$, whatever it may be, is unambiguously defined and contains information about $P$. The fact that $g$ may indeed be unambiguously defined and without arbitrary choices (such as arbitrarily resolving ties) is the [first thing we establish] __TODO__
### Step 2: Analyzing the helper function and drawing conclusions
We formulate the expectations one might place on $g$:
:::success
__Expectation 1.__ If $P$ is close enough to being polynomial, there is no ambiguity in the definition of $g$.
:::
This is proven in [__Lemma 1__](#Lemma-1---the-helper-function-and-the-oracle-agree-a-lot). This result also has psychological value, as it tells us that we don't have to deal with the potential ambiguities listed above, see its [__Corollary__](#Corollary). In this context, __close enough__ means $\delta <\frac1{4(d+1)}$.
:::success
__Expectation 2.__ If $P$ computes a map that is close to polynomial, then $g$ and $P$ ought to agree most of the time.
:::
This is indeed true and established in [__Lemma 2__](#Lemma-2---the-helper-function-and-the-oracle-agree-a-lot). In this context, __most of the time__ means on a proportion $> 1-2\delta$ of all inputs.
:::success
__Expectation 3.__ The if $P$ is close enough to _some_ low-degree polynomial map, then the helper function $g$ _is_ that low-degree polynomial map.
:::
This is indeed true and established in [__Lemma 3__](#Lemma-3). Thus $g$ _rehabilitates_ $P$ provided $P$ is accurate enough.
## Low-degreeness implies redundancy
The following lemma is the theoretical foundation of the test. We consider pairwise distinct field elements $b,a_0,a_1,\dots,a_d\in\Bbb{F}$. These will remain constant through out.
:::info
__Lemma.__ There exist coefficients $\alpha_0,\dots,\alpha_d\in\Bbb{F}$ such that for all $P\in\Bbb{F}_{d}[X]$, $$
P(X+bT)=
\sum_{k=0}^d
\alpha_kP(X+a_kT)
$$
:::
__Proof.__ We start with a special case of $P$ which turns out to be sufficient. Thus, expand, for $P=X^d$ and arbitrary $\alpha_0,\dots,\alpha_d\in\Bbb{F}$, both sides of the above:
$$\left\{~
\begin{array}{rcl}
(X+bT)^d
& = &
\displaystyle\sum_{k=0}^d\binom{d}{k} b^kT^kX^{d-k}\\
\displaystyle\sum_{k=0}^d
\alpha_k (X+a_kT)^d
& = &
\displaystyle\sum_{k=0}^d\binom{d}{k}
\Big(
\sum_{l=0}^d \alpha_l a_l^k
\Big)T^kX^{d-k}
\end{array}
\right.$$
To achieve equality, it is _enough_ that the $\alpha_i$ be a solution of the linear system
$$
\begin{pmatrix}
1 & 1 & 1 & \cdots{} & 1\\
a_{0} & a_{1} & a_{2} & \cdots{} & a_{d}\\
a_{0}^{2} & a_{1}^{2} & a_{2}^{2} & \cdots{} & a_{d}^{2}\\
\vdots &\vdots &\vdots &\ddots &\vdots\\
a_{0}^{d} & a_{1}^{d} & a_{2}^{d} & \cdots{} & a_{d}^{d}
\end{pmatrix}
\begin{pmatrix}
\alpha_0\\\alpha_1\\\alpha_2\\\vdots\\\alpha_d
\end{pmatrix}
=
\begin{pmatrix}
1\\b\\b^2\\\vdots\\b^d
\end{pmatrix}
$$
This [Vandermonde system](https://en.wikipedia.org/wiki/Vandermonde_matrix) is known to be invertible and so there is a unique solution $(\alpha_0,\dots,\alpha_{d})\in\Bbb{F}^{d+1}$. We note that had we written a similar system for $X^k$ with $k\leq d$, the resulting constraints would have been a subset of those for $P=X^d$. Hence the $\alpha_i$ we just identified work for all degree $\leq d$ polynomials.
## Redundancy implies low degreeness
In view of the previous lemma, any degree $\leq d$ polynomial map $f:\Bbb{F}\to\Bbb{F}$ satisfies $$\forall s,t\in\Bbb{F},\quad
f(s+bt)=
\sum_{k=0}^d
\alpha_kf(s+a_kt)
$$ Since this is the basis of the proximity test above, one should wonder about the converse: are degree $\leq d$ these the _only_ maps $f:\Bbb{F}\to\Bbb{F}$ satisfying this property? To answer to that question let us define, for every map $f:\Bbb{F}\to\Bbb{F}$ a new map $\widehat{f}:\Bbb{F}\times\Bbb{F}\to\Bbb{F}$ by the formula $$\displaystyle
\forall s,t\in\Bbb{F},
\quad\widehat{f}\big(s,t\big) := f(s+bt) - \sum_{i=0}^{d}\alpha_i f(s+a_i\cdot t)$$ We can thus restate the previous question as
:::success
__Question.__ Are degree $\leq d$ these the only maps $f:\Bbb{F}\to\Bbb{F}$ satisfying $\widehat{f}=0$?
:::
The answer is YES:
:::info
__Lemma.__ Let $f:\Bbb{F}\to\Bbb{F}$ be a map. The following are equivalent:
* $\widehat{f}=0$
* $\mathrm{deg}(f)\leq d$
:::
__Proof.__ The proof isn't complicated. We use polynomial interpolation of bivariate maps to convert the problem from one about __maps__ to one about __polynomials__. First of all, bivariate polynomial interpolation is possible. The map $\Bbb{F}_{n-1,n-1}[X,Y]\to\mathcal{F}(\Bbb{F}\times\Bbb{F},\Bbb{F})$ that sends a bivariate polynomial $P$ of bidegree $\leq (n-1,n-1)$ to the associated function $\Bbb{F}\times\Bbb{F}\to\Bbb{F},~(x,y)\mapsto P(x,y)$ is an isomorphism. This is easily seen using the polynomials $L_{a,b}$ defined by $$
L_{a,b}(X,Y)
=
\Bigg(
\prod_{\substack{\alpha\in\Bbb{F}\\\alpha\neq a}}
\frac{X-\alpha}{a-\alpha}
\Bigg)
\cdot
\Bigg(
\prod_{\substack{\beta\in\Bbb{F}\\\beta\neq a}}
\frac{Y-\beta}{b-\beta}
\Bigg)
$$ which vanish everywhere except at the one point $(a,b)\in\Bbb{F}^2$.
Similarly to what we did with functions, we can define, for any univariate polynomial $P\in\Bbb{F}_{n-1}[X]$ a bivariate polynomial $\widehat{P}\in\Bbb{F}_{n-1,n-1}[S,T]$ like so $$
\displaystyle \widehat{P}= P(S+bT) - \sum_{i=0}^{d}\alpha_i P(S+a_i\cdot T)
$$ The procedures $f\mapsto \widehat{f}$ and $P\mapsto \widehat{P}$ are compatible with interpolation in the sense that if $f$ is a map $\Bbb{F}\to\Bbb{F}$ and $P_f$ is its degree $<n$ interpolating polynomial, then $\widehat{P_f}$ is $\widehat{f}$'s bivariate interpolation polynomial.
Now suppose $f$ satisfies $\widehat{f}=0$. Then $\widehat{P_f}=0$ and if $e$ is $f$'s degree (i.e. $e=\deg(P_f)$), then by looking solely at the degree $e$ term of $P_f$ we get that $$(S+bT)^e=\sum_{i=0}^d\alpha_i(S+a_iT)^e$$ and by isolating the $X^{e-j}T^{j}$ terms we see that $$\forall
j\in\{0,\dots,e\},
b^j=\sum_{i=0}^d
\alpha_i
a_i^j$$ and so $$
\begin{pmatrix}
1\\
b\\
b^{2}\\
\vdots\\
b^{d}\\
\vdots\\
b^{e}
\end{pmatrix}
=
\alpha_0
\cdot
\begin{pmatrix}
1\\
a_0\\
a^{2}_0\\
\vdots\\
a^{d}_0\\
\vdots\\
a^{e}_0
\end{pmatrix}
+
\alpha_1
\cdot
\begin{pmatrix}
1\\
a_1\\
a^{2}_1\\
\vdots\\
a^{d}_1\\
\vdots\\
a^{e}_1
\end{pmatrix}
+
\alpha_2
\cdot
\begin{pmatrix}
1\\
a_2\\
a^{2}_2\\
\vdots\\
a^{d}_2\\
\vdots\\
a^{e}_2
\end{pmatrix}
+
\cdots+
\alpha_d
\cdot
\begin{pmatrix}
1\\
a_d\\
a^{2}_d\\
\vdots\\
a^{d}_d\\
\vdots\\
a^{e}_d
\end{pmatrix}
$$ This is where we get a contradiction: this is impossible for $e\geq d+1$ for it would contradict the invertibility of the Vandermonde matrix $\mathrm{VdM}(b,a_0,a_1,\dots,a_d)$ (recall that the $b,a_0,\dots,a_d$ are supposed pairwise distinct.)
__Note.__ We implicitely used the assumption that $\Bbb{F}$ is a prime field. Indeed, in the step where we " looked at $X^{e-j}T^{j}$ terms " we divided by a binomial coefficient $\binom{e}{j}$ with $e<\#\Bbb{F}$. To be legitimate in doing so (i.e. to be sure we don't divide by zero) we have to assume $n=\#\Bbb{F}$ is prime.
## Expected value of interpolation: the helper function _g_
Suppose you are given oracle access to the values of a map $P$. In other words, you have some black box that spits out values $P(x)$ when you feed it some field element $x\in\Bbb{F}$. Suppose furthermore you are told: "That function $P:\Bbb{F}\to\Bbb{F}$ is a low degree polynomial map. How low you ask? Its degree is $\leq d$." How would you go about convincing yourself of that claim? If $P$'s domain $\Bbb{F}$ is large, it is hopeless to ask for a perfect proof: you would have to interpolate $P$ from $d+1$ values and compare the values predicted by your formula to those queried from the oracle.
You might on the other hand try to use the characterization of low degree maps presented above. While we won't be able to find efficient tests of low-degreeness _per se_, what we can do is reliably reject oracles $P$ that are far from any degree $\leq d$ polynomial. That is, what we describe below is a __probabilistic test of proximity__.
To simplify things slightly, we fix $b=0$ and consider pairwise distinct nonzero $a_0,\dots,a_d\in\Bbb{F}$. There is no reason to try to be fancy about choosing the $a_i$, so taking $a_i=i$ for instance is a perfectly valid choice. If $P$ were indeed polynomial of degree $\leq d$, we would have, for all $x,t\in\Bbb{F}$, $$P(x) = \sum_{i=0}^{d}\alpha_i P(x+a_i\cdot t)$$
This can't be reasonably checked on all inputs $x,t$. But starting from this observation we can make some abstract definitions that turn out to be useful in the analysis. First of all we define a nonnegative $\delta=\delta_P$ by the formula $$\delta = \frac{\displaystyle\Bigg|\Big\{(x,t)\in\Bbb{F}^2~\Big|~P(x)\neq\sum_{i=0}^{d}P(x+\alpha_i\cdot t)\Big\}\Bigg|}{|\Bbb{F}|^2}$$ This is simply the proportion of inputs $(x,t)$ that fail the expected equation. We can also define subsets of $\Bbb{F}$ associated with $P$. For instance, $\mathsf{Maj}_P$ is the set of all $x$ such that the right hand side of this equation takes on a particular value a __majority__ of the time, i.e. more than half of the time with respect to $t$. We thus define $$
\mathsf{Maj}_P
=\left\{
x\in\Bbb{F}
~\bigg|~
\exists\theta\in\Bbb{F},\text{ s.t. $>50\%$ of all $t$ satisfy }
\theta=\sum_{i=0}^{d}\alpha_i P(x+a_i\cdot t)
\right\}
$$
We can define a map[^reason] $g:\mathrm{Maj}_P\to\Bbb{F}$ by setting $g(x)=\theta$ where $\theta$ is the value that appears $>50\%$ of the time as $\sum_{i=0}^{d}\alpha_i P(x+a_i\cdot t)$. No harm is made by arbitrarily extending $g$ to a map $\Bbb{F}\to\Bbb{F}$. We can also define the subset where, __conveniently__, that majority value coindices with the value predicted by the oracle and that predicted by the supposed low degreeness $$\mathsf{Conv}_P =
\left\{
x\in\Bbb{F}
~\bigg|~
\text{ $>50\%$ of all $t$ satisfy }
P(x)=\sum_{i=1}^{d+1}\alpha_i P(x+a_i\cdot t)\right\}$$
[^reason]: we defined $\mathrm{Maj}_P$ precisely in order to define $g$
Here's a picture to accompany these definitions. We can think of every pair $(x,t)\in\Bbb{F}^2$ as a test case for the expected equality $P(x)\overset?=\sum_{i=1}^{d+1}\alpha_i P(x+a_i\cdot t)$. In the diagram below, blue dots signal that this expected equality holds true, yellow crosses that it fails.
![](https://i.imgur.com/9NX53sl.png)
For that particular oracle $P$, most pairs $(x,t)$ satisfy the expected equality. If we actually count the yellow crosses we get $\delta=\delta_P=39/256\approx 15,23\%$. Every column that is majority blue is, by definition, indexed by some $x\in\mathrm{Conv}_P$.
![](https://i.imgur.com/hyoL4l1.png)
Note that three columns have fewer than half of the expected equalities hold. These are columns indexed by $x\notin\mathrm{Conv}_P$, yet some, such as the first from the left, might still define an unambiguous majority value, e.g. the $x$-coordinate of the first red column might still belong to $\mathrm{Maj}_P$.
## Lemma 1 - the helper function and the oracle agree a lot
We start with a simple lemma. Let $\mathbf{X},\mathbf{Y}$ be independent, identically distributed (_iid_) random values with values in a finite set $V$. Also let $v_0$ be (a) the value $\mathbf{X}$ is likeliest to have, that is $$\Bbb{P}[\mathbf{X}=v_0]=\max_{v\in V}\Bbb{P}[\mathbf{X}=v]$$ Let $\mathcal{E}$ be the event that $\mathbf{X}$ and $\mathbf{Y}$ agree, i.e. $\mathcal{E}=\big[\mathbf{X}=\mathbf{Y}\big]$, then
:::info
__Lemma.__ $\Bbb{P}\big[\mathcal{E}\big]\leq \Bbb{P}[\mathbf{X}=v_0]$.
:::
__Proof.__ Decompose the event $\mathcal{E}$ according to the common value of $\mathbf{X}$ and $\mathbf{Y}$: $$\mathcal{E} = \bigsqcup_{v\in V}\big[\mathbf{X} =v\big]\cap\big[ \mathbf{Y}=v\big].$$ Taking probabilities and since $\mathbf{X}$ and $\mathbf{Y}$ are _iid_, $$\begin{array}{rcl} \Bbb{P}[\mathcal{E}] & = & \sum_{v\in V} \Bbb{P}[\mathbf{X}= v]^2 \\ & \leq & \sum_{v\in V} \Bbb{P}[\mathbf{X}=v]\Bbb{P}[\mathbf{X}=v_0] \\ & = & \Bbb{P}[\mathbf{X}=v_0]. \end{array}$$
---
Fix $x\in\Bbb{F}$. Consider two independent uniformly distributed random variables $\mathbf{t_1}$ and $\mathbf{t_2}$ with values in $\Bbb{F}$. Consider the random variables $$\begin{cases} \displaystyle \mathbf{Pred_1} = \sum_{i=0}^{d}\alpha_iP(x+a_i\cdot\mathbf{t_1})\\[1mm] \displaystyle \mathbf{Pred_2} = \sum_{j=0}^{d}\alpha_iP(x+a_j\cdot\mathbf{t_2})\end{cases}$$ These represent random predictions of the value of $P(x)$ obtained by interpolation. These random variables and are independent and identically distributed. Also $g(x)$ is, by definition, the most likely value of either. In light of the previous lemma we le the $\mathcal{E}$ be the event where two predictions agree: $$\mathcal{E}=\Bigg[ \mathbf{Pred_1} ~=~ \mathbf{Pred_1} \Bigg]$$ The previous lemma tells us that $$\Bbb{P}\big[\mathcal{E}\big]\leq\left[\begin{array}{c}\text{Proportion of }t\in\Bbb{F}\text{ such that}\\\sum_{i=0}^{d}\alpha_iP(x+a_i\cdot t) =g(x) \end{array}\right]$$ If we can find a usable __lower bound__ for the probability $\Bbb{P}[\mathcal{E}]$, we get a lower bound for the proportion of inputs realizing the majority value $g(x)$.
In order to produce such a lower bound, we consider the subevent $$\begin{array}{rcl}
\mathcal{E}
& \supset &
\overbrace{\displaystyle\bigcap_{i=0}^{d} \left[ P(x+a_i\cdot\mathbf{t_1}) = \sum_{j=0}^{d}\alpha_jP(x+a_i\cdot\mathbf{t_1}+a_j\cdot\mathbf{t_2}) \right]}^{\displaystyle (1)}\\
& & \quad
\cap~\underbrace{\displaystyle\bigcap_{j=0}^{d} \left[ P(x+a_j\cdot\mathbf{t_2}) = \sum_{i=0}^{d}\alpha_iP(x+a_i\cdot\mathbf{t_1}+a_j\cdot\mathbf{t_2}) \right]}_{\displaystyle (2)}\qquad \end{array}$$ The right hand side is _indeed_ a subset of $\mathcal{E}$: if all the conditions on the right are met, then $$\begin{array}{rcl}
\mathbf{Pred_1} & = & \sum_{i=0}^{d}\alpha_iP(x+i\cdot\mathbf{t_1})\\
& \overset{(1)}= &
\sum_{i=0}^{d}\alpha_i\left( \sum_{j=0}^{d}\alpha_jP(x+i\cdot\mathbf{t_1}+a_j\cdot\mathbf{t_2}) \right)\\
& = &
\sum_{j=0}^{d}\alpha_j \left( \sum_{i=0}^{d}\alpha_iP(x+a_i\cdot\mathbf{t_1}+a_j\cdot\mathbf{t_2}) \right)\\
& \overset{(2)}= &
\sum_{j=0}^{d}\alpha_iP(x+a_j\cdot\mathbf{t_2})\\
& = &
\mathbf{Pred_2}\end{array}$$ These subevents allow us to have th random variables $\mathbf{t_1}, \mathbf{t_2}$ interact. Obtaining a lower bound on $\Bbb{P}[\mathcal{E}]$ is equivalent to getting an upper bound on $\Bbb{P}[\Omega\setminus\mathcal{E}]$. The above gives $$\begin{array}{rcl}
\Omega\setminus\mathcal{E}
& \subset &
\bigcup_{i=0}^{d} \left[ P(x+i\cdot\mathbf{t_1}) \neq \sum_{j=1}^{d+1}\alpha_jP(x+a_i\cdot\mathbf{t_1}+a_j\cdot\mathbf{t_2}) \right]\\
& & \quad\cup~
\bigcup_{j=0}^{d} \left[ P(x+j\cdot\mathbf{t_2}) \neq \sum_{i=1}^{d+1}\alpha_iP(x+a_i\cdot\mathbf{t_1}+a_j\cdot\mathbf{t_2}) \right]
\end{array}$$
So that $$\begin{array}{rcl}
\Bbb{P}\left[\Omega\setminus\mathcal{E}\right]
& \leq &
\sum_{i=0}^{d}\Bbb{P} \left[ P(x+a_i\cdot\mathbf{t_1}) \neq
\sum_{j=0}^{d}\alpha_jP(x+a_i\cdot\mathbf{t_1}+a_j\cdot\mathbf{t_2}) \right]\\
& & \quad+~
\sum_{j=0}^{d}\Bbb{P} \left[ P(x+a_j\cdot\mathbf{t_2}) \neq \sum_{i=0}^{d}\alpha_iP(x+a_i\cdot\mathbf{t_1}+a_j\cdot\mathbf{t_2}) \right]
\end{array}$$
Now for any $i=0,\dots, d$ the random variable $\mathbf{t'_{1}}=x+a_i\cdot\mathbf{t_1}$ is uniformly distributed and independent from $\mathbf{t_2}$, and similarly for any $j=0,\dots, d$ the random variable $\mathbf{t'_{2}}=x+a_j\cdot\mathbf{t_2}$ is uniformly distributed and independent from $\mathbf{t_1}$, and so, by definition of $\delta$, $$\begin{array}{l}
\Bbb{P} \left[ P(x+a_i\cdot\mathbf{t_1}) \neq \sum_{j=0}^{d}\alpha_jP(x+a_i\cdot\mathbf{t_1}+a_j\cdot\mathbf{t_2}) \right] \\
\qquad\qquad = \Bbb{P} \left[ P(\mathbf{t'_1}) \neq \sum_{j=0}^{d}\alpha_jP(\mathbf{t'_1}+a_j\cdot\mathbf{t_2}) \right] \\
\qquad\qquad = \delta
\end{array}$$
and similarly, $$\begin{array}{l}
\Bbb{P} \left[ P(x+a_j\cdot\mathbf{t_2}) \neq \sum_{i=0}^{d}\alpha_iP(x+a_j\cdot\mathbf{t_2}+a_i\cdot\mathbf{t_1}) \right] \\
\qquad\qquad = \Bbb{P} \left[ P(\mathbf{t'_2}) \neq \sum_{j=0}^{d}\alpha_iP(\mathbf{t'_2}+a_i\cdot\mathbf{t_1}) \right] \\
\qquad\qquad = \delta
\end{array}$$ Injecting these into the upper bound for $\Bbb{P}\left[\Omega\setminus\mathcal{E}\right] = 1 - \Bbb{P}\left[\mathcal{E}\right]$ obtained previously proves
:::info
__Lemma 1.__ For any $x\in \Bbb{F}$, $$1-2(d+1)\delta ~\leq~ \Bbb{P}\big[\mathcal{E}\big] ~\leq~ \Bbb{P}\big[\mathbf{Pred_1}=g(x)\big].$$
:::
As promised, we can thus lift any and all unpleasant ambiguities in $g$'s definition. Notice that $1-2(d+1)\delta >\frac12$ as soon as $\delta<\frac1{4(d+1)}$.
:::info
__Corollary.__ Suppose that $\delta<\frac1{4(d+1)}$, then for all $x\in \Bbb{F}$, $g(x)$ represents $>50\%$ of the the values $\sum_{i=0}^{d}\alpha_jP(x + a_i\cdot t)$ where $t\in \Bbb{F}$. In other words, $\mathbf{Maj}_P=\Bbb{F}$.
:::
## Lemma 2 - the helper function and the oracle agree a lot
Let us make the headline precise. We just proved that whenever $\delta<\frac1{4(d+1)}$, then $g(x)$ is unambiguously defined for all $x\in\Bbb{F}$, that is $\mathbf{Maj}_P=\Bbb{F}$. Recall that we defined a subset $\mathbf{Conj}_P\subset\mathbf{Maj}_P$ of so-called "convenient" inputs. Those were the $x\in\Bbb{F}$ where, not only $g$ was unambiguously defined, but $g(x)$ coïncided with $P(x)$. We now establish that as long as $\delta$ is small enough, $\mathbf{Conj}_P$ will be large. I.e. if $\delta$ is small, then the helper function $g$ and the oracle agree often.
:::info
__Lemma 2.__ We have $\frac{|\mathsf{Conv}_P|}{|\Bbb{F}|} \geq 1-2\delta$
:::
__Proof.__ Let us consider the pairs $(x,t)$ where the expected equality $$P(x)\overset?=\sum_{i=0}^{d}\alpha_i P(x+a_i\cdot t')$$ fails. We distinguish two cases: $x\in\mathrm{Conj}_P$ and $x\notin\mathrm{Conj}_P$ :
![](https://i.imgur.com/h6vJy1e.png)
Thus, by definition of $\delta$,
![](https://i.imgur.com/WOJFojl.png)
The last inequality follows from the fact that when $x\notin\mathbf{Conv}_P$, $g(x)\neq P(x)$ and thus $P(x)$ is not a majority value (or it shares that title with $g(x)$ and was arbitrarily disregarded as the choice for the majority value). In particular, the proportion of $t\in\Bbb{F}$ such that $P(x)\neq \sum_{i=0}^d P(x+a_i\cdot t)$ is at least $50\%$. Rearranging things, we get the inequality from the lemma.
## Lemma 3 - the helper function is a low degree polynomial
We first state an informal version of the lemma to prove. A precise statment will be given at the end.
:::info
__Lemma 3 (informal).__ If $\delta$ is small enough, then the helper function $g$ is polynomial.
:::
The proof strategy is quite interesting: to prove that $g$ is polynomial of degree $\leq d$ (or to be precise: $g$'s interpolating polynomial has degree $\leq d$) the authorsinvoke the [characterization of low degree functions](#Redundancy-implies-low-degreeness). The goal is thus to show that for any two $x,t\in\Bbb{F}$ one has $$g(x)=
\sum_{i=0}^{d}\alpha_ig(x+a_i\cdot t)
\qquad\text{i.e.}\qquad
\underbrace{\sum_{i=0}^{d+1}\alpha_ig(x+a_i\cdot t)=0}_{\displaystyle(\text{Eq. }{\color{red}{\Large\star}}_{x,t})}
$$ where we set $a_{d+1}=b=0$ and $\alpha_{d+1}=-1$. What makes the proof strategy interesting is that the satisfaction of (Eq. ${\color{red}{\Large\star}}_{x,t}$) for a particular pair $(x,t)\in\Bbb{F}^2$, which is either true or false, is established using a probabilistic method. To wit, consider the following event $$\begin{array}{rcl}
\mathcal{F} = \mathcal{F}_{x,t}
& = &
\displaystyle
\overbrace{\bigcap_{i=0}^{d+1}
\left[ g(x+a_it) = \sum_{j=0}^{d} \alpha_j P\big(x+a_it+a_j(\mathbf{t_1}+a_i\mathbf{t_2})\big) \right]}^{\displaystyle (3)} \\
&&
\displaystyle\qquad\cap~
\underbrace{\bigcap_{j=1}^d
\left[ 0 = \sum_{i=0}^{d+1} \alpha_i P\big(x+a_j\mathbf{t_1}+a_i(t+a_j\mathbf{t_2})\big) \right]}_{\displaystyle (4)}
\end{array}$$ where $\mathbf{t_1}$ and $\mathbf{t_2}$ are independent uniformly distributed random variables in $\Bbb{F}$. The relation with (Eq. ${\color{red}{\Large\star}}_{x,t}$) is that, whenever the conditions in $\mathcal{F}$ are met, one has $$\begin{array}{rcl}
\displaystyle\sum_{i=0}^{d+1} \alpha_i g(x+a_it)
& \overset{(3)}= &
\displaystyle\sum_{i=0}^{d+1} \alpha_i \sum_{j=0}^{d} \alpha_j P\big(x+a_it+a_j(\mathbf{t_1}+i\mathbf{t_2})\big) \\
& = &
\displaystyle\sum_{j=0}^{d} \alpha_j \sum_{i=0}^{d+1} \alpha_i P\big(x+a_j\mathbf{t_1}+a_i(t+j\mathbf{t_2})\big) \\
& \overset{(4)}= & 0.
\end{array}$$ Thus, __to conclude that (Eq. ${\color{red}{\Large\star}}_{x,t}$) holds it is enough to show that the event $\mathcal{F}_{x,t}$ has nonzero probability.__ The goal now is to bound the probability of $\mathcal{F}$ from below, in particular, to show that it is $>0$ when $\delta$ is small enough.
Again, the idea will to bound $\Bbb{P}[\mathcal{F}^c]$ from above. First of all, $$\begin{array}{rcl}
\mathcal{F}^c
& = &
\displaystyle \bigcup_{i=0}^{d+1} \left[ g(x+a_it) \neq \sum_{j=0}^{d} \alpha_j P\big(x+a_it+a_j(\mathbf{t_1}+a_i\mathbf{t_2})\big) \right] \\
&& \qquad
\displaystyle \cup~\bigcup_{j=0}^d \left[ 0 \neq \sum_{i=0}^{d+1}\alpha_i P\big(x+a_j\mathbf{t_1}+a_i(t+a_j\mathbf{t_2})\big) \right]
\end{array}$$ so that $$\begin{array}{rcl}
1-\Bbb{P}[\mathcal{F}]=\Bbb{P}[\mathcal{F}^c]
& \leq &
\displaystyle
\sum_{i=0}^{d+1} \overbrace{\Bbb{P}\left[ g(x+a_it) \neq \sum_{j=0}^{d} \alpha_j P\big(x+a_it+a_j(\mathbf{t_1}+a_i\mathbf{t_2})\big) \right]}^{\displaystyle\color{red}{\textbf{I}}} \\
&& \qquad
\displaystyle
+~ \sum_{j=0}^{d} \underbrace{\Bbb{P}\left[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(x+a_it+a_j(\mathbf{t_1}+a_i\mathbf{t_2})\big) \right]}_{\displaystyle\color{blue}{\textbf{II}}}
\end{array}$$
__Bounding $\color{red}{\textbf{I}}$.__ Since for all $i$, the random variable $\mathbf{t_1}+a_i\mathbf{t_2}$ is uniformly distributed, we can apply the result from [Lemma 1](#Lemma-1---the-helper-function-and-the-oracle-agree-a-lot) and obtain $$\forall i,\quad
\Bbb{P}\left[ g(x+a_it) \neq \sum_{j=0}^{d} \alpha_j P\big(x+a_it+a_j(\mathbf{t_1}+a_i\mathbf{t_2})\big) \right]
\leq 2(d+1)\delta.$$
__Bounding $\color{blue}{\textbf{II}}$.__ Take $j\in\{0,\dots,d\}$. For any $i\in\{0,\dots,d+1\}$, rearranging terms $$P\big(x+a_it+a_j(\mathbf{t_1}+a_i\mathbf{t_2})\big)=P\big(\boldsymbol{\xi}+a_i\boldsymbol{\tau}\big)$$ where $\boldsymbol{\xi}=x+a_j\mathbf{t_1}$ and $\boldsymbol{\tau}=t+a_j\mathbf{t_2}$. Note that are $\boldsymbol{\xi}$ and $\boldsymbol{\tau}$ are both uniformly distributed in $\Bbb{F}$ (since $a_j\neq 0$) and independent (since $\mathbf{t_1}$ and $\mathbf{t_2}$ are). Then by definition of $\delta$, $$\begin{array}{rcl}
\displaystyle
\quad \Bbb{P} \left[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(x+a_it+a_j(\mathbf{t_1}+a_i\mathbf{t_2})\big) \right]
& = &
\displaystyle
\Bbb{P} \left[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(x+a_j\mathbf{t_1}+a_i(t+a_j\mathbf{t_2})\big) \right]\\
& = &
\displaystyle
\Bbb{P} \left[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(\boldsymbol{\xi}+a_i\boldsymbol{\tau})\big) \right]\\
& = &
\delta.
\end{array}$$
Combining these bounds yields: $$\begin{array}{rcl}
1-\Bbb{P}[\mathcal{F}]=\Bbb{P}[\mathcal{F}^c]
& \leq &
\displaystyle \sum_{i=0}^{d+1} \Bbb{P}\left[ g(x+a_it) \neq \sum_{j=0}^{d} \alpha_j P\big(x+a_it+a_j(\mathbf{t_1}+a_i\mathbf{t_2})\big) \right] \\
&& \qquad
+~ \displaystyle \sum_{j=0}^{d} \Bbb{P}\left[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(x+a_it+a_j(\mathbf{t_1}+a_i\mathbf{t_2})\big) \right] \\
& \leq &
(d+2)\cdot 2(d+1)\delta + (d+1)\cdot \delta = (d+1)(2d+5)\delta
\end{array}$$ i.e. $$1-(d+1)(2d+5)\delta\leq\Bbb{P}[\mathcal{F}].$$ Therefore, as soon as $\delta<\frac1{(d+1)(2d+5)}$, $g$ is polynomial. We can now state __Lemma 3__ with precision.
:::info
__Lemma 3.__ If $\delta<\frac1{(d+1)(2d+5)}$, the helper function $g$ is polynomial.
:::
## Conclusion
__Lemma 3__ shows how to "recover" the unique low-degree neighbor function $g$ of a map $P$ as soon as $\delta$ is small enough. Thus $g$ is the polynomial function the program $P$ is meant to be computing.
## A question
Let $\Bbb{F}$ be a __prime__ field. Let $x,x_1,\dots,x_d\in\Bbb{F}$ are pairwise distinct and, similarly, let $y,y_1,\dots,y_d\in\Bbb{F}$ are pairwise distinct. Suppose that for all $i=1,\dots,d$, $$\prod_{\substack{1\leq j\leq d\\j\neq i}}\frac{x-x_j}{x_i-x_j}=
\prod_{\substack{1\leq j\leq d\\j\neq i}}
\frac{y-y_j}{y_i-y_j}$$ This is the case for instance if $y, y_1,\dots,y_d$ are obtained from $x,x_1,\dots,x_d$ by means of an affine transformation $u\mapsto au+b$ (with $a\neq 0$). One might ask: is the converse true?
:::success
__Question.__ Is it necessarily the case that there exist $(a,b)\in\Bbb{F}^\times\times\Bbb{F}$ such that $y,y_1,\dots,y_d$ are obtained from the $x,x_1,\dots,x_d$ by application of the affine transformation $\phi_{a,b}:u\mapsto au + b$ ?
:::
__Note.__ The restriction to prime fields could potentially be relaxed by allowing automorphisms $\sigma\in\mathrm{Aut}(\Bbb{F})$ in the affine transformation formula: $\phi_{a,b,\sigma}:u\mapsto a\sigma(u) + b$ and with, say, $\sigma$ the Frobenius automorphism of the subfield generated by the $d$ weights above.