
The simplest proximity-to-low-degree-polynomial test and how to rehabilitate approximate polynomials

Introduction

This post is an appetizer for a blog post in preparation on the FRI protocol. We describe what is arguably the simplest of all proof of proximity (to low degree polynomial functions)[1] schemes out there. Here and elsewhere, low degree means "of degree $\le d$" for some fixed positive integer $d$. But our interest in this scheme is as a means to shed light on a 👻 dual 👻 problem to that of proximity testing.

This scheme was originally described in the 1991 paper Self Testing/Correcting for Polynomials and Approximate Functions by P. Gemmell, R. Lipton, R. Rubinfeld, M. Sudan and A. Wigderson (we shall sometimes use the acronym GLRSW). It has been referred to as "Sudan $d+1$" or some variation on that name in presentations on FRI given by Eli Ben-Sasson. In StarkWare's blog post on Low Degree Testing, it is what is called "the direct test".

Compared to modern proof of proximity schemes such as FRI it is spectacularly inefficient: it runs in linear time $O(d)$ as opposed to FRI's polylogarithmic time $O(\ln(d)^2)$. However, it has the advantage of sheer simplicity:

  • there is only ever one map at play (as opposed to the $O(\ln(d))$ inter-dependent maps in FRI);
  • coherence is checked for that one map as opposed to having a "trickle down" sequence of coherence conditions relating all the maps in that sequence;
  • the redundancy property at the heart of this scheme is second nature to anyone familiar with polynomial interpolation.

Also, proving its "soundness properties", while involved, follows a clear path. It illustrates well a popular outlook for proving soundness: "deal with the good cases in detail; don't bother analyzing what happens in the bad cases beyond bounding the probability of something bad happening in the first place". Usually the good case is when everything happens according to plan; bad cases are everything else. This makes the GLRSW scheme a perfect entry point into proof of proximity schemes.



Role of proofs of proximity in transparent computational integrity schemes

We somewhat cryptically indicated that GLRSW is really about a dual problem to that of low-degree proximity testing. Let us try to explain that claim. But before we go any further: what are proof of proximity schemes and what are they useful for? Certainly the main use case relevant to blockchain related applications is as part of computational integrity schemes.

Computational integrity schemes such as STARKs or SNARKs and many others are protocols whereby a prover can convince a verifier that a complex and/or confidential computation was performed as advertised. The tremendous appeal of such schemes comes from the existence of computational integrity schemes that are

  • far quicker to verify than it is to run the computation,
  • leak no private information,
  • are nigh impossible to twist into producing false positives.

Such schemes rely on a prover to construct a proof of computational integrity. The prover's work can be broadly separated into two phases.

The first phase, common to all such schemes[2], is the arithmetization of the computation (R1CS or algebraic execution trace for instance). The raw data produced during the computation is converted into algebraic form (typically field elements) which in turn is condensed into polynomial form by means of polynomial interpolation[3]. Polynomials are useful in this regard: besides addition and scalar multiplication[4], they support products and divisibility. Importantly, this first phase can be done so that computational integrity (i.e. validity of the underlying computation) is equivalent to the satisfaction of some algebraic constraints by the resulting polynomials. Typically low-degreeness and divisibility conditions.

The second phase, i.e. actually compiling the proof, comes down to finding an efficient commitment of these polynomials. This commitment should allow a verifier to convince themselves of the claim to be proven. In particular, of low-degreeness and divisibility conditions that may apply. There are various ways of doing this, and (at least) two competing philosophies for carrying out the second step.

Philosophy 1: use opaque data and secrets to force the prover down a particular path; valid proofs are those producing ultra-rare collisions. The verifier thus generates some secrets along with associated opaque data (hidden behind a hard discrete log problem, say) called the proving key to be handed to the prover. The proving key is generated in such a way that for one to generate a convincing proof from it, one has to

  1. either comply with the verifier and produce a proof according to protocol; such proofs encode polynomials that are low-degree by construction;
  2. or successfully break a cryptographically hard problem (e.g. variants of discrete log computations).

For checking divisibility conditions, the relevant fact is that two distinct low degree polynomials $P$ and $Q$ virtually never[5] produce a collision when evaluated at a random (secret) point. Thus, producing an exceedingly rare collision $P(s) = Q(s)$ at a random (secret) point $s$ is seen as strong supporting evidence for the claim "$P = Q$". By extension, a collision of the form $A(s)H(s) = B(s)$ is interpreted as strong evidence for a divisibility claim "$A \mid B$".

Philosophy 2: no secrets, find another way to enforce low-degreeness / divisibility conditions. Schemes of the first kind had low-degreeness baked into them. For schemes where the prover isn't forced down a narrow path for constructing polynomial commitments, this is no longer the case.

This is immediately problematic: producing collisions between polynomials becomes easy if one is free to use large degree polynomials. Checking for low-degreeness thus becomes the central problem. And a difficult problem it certainly is. Checking low-degreeness on the nose is computationally demanding, at least as demanding as the original computation the verifier may wish to bypass. Proofs of proximity to low-degree maps are an efficient substitute for on-the-nose low-degree tests. They don't prove that a map is low-degree; they show that a map $f$ is (likely) in the immediate neighborhood of some low degree map whose properties may be extracted from $f$ by other means.

Once the verifier has sufficient supporting evidence for believing in low-degreeness (or at least proximity to a low degree map), checking divisibility conditions follows much the same path as before: open commitments at a random point $r$ (which usually need not be secret) and check $A(r)H(r) \stackrel{?}{=} B(r)$ in the clear.
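To make this "check in the clear" concrete, here is a minimal Python sketch of the random-evaluation check $A(r)H(r) \stackrel{?}{=} B(r)$ over a toy prime field. The field size, the particular polynomials and the helper names are all illustrative, not part of any real protocol.

```python
# A minimal sketch of "checking divisibility in the clear": to support the
# claim that A divides B with quotient H (i.e. A*H = B), evaluate all three
# polynomials at a random point r and check A(r)*H(r) == B(r) mod p.
import random

p = 2**31 - 1  # a small prime field F_p, stand-in for a 256-bit field

def ev(coeffs, x):
    """Evaluate a polynomial (coefficients in low-degree-first order) at x."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

A = [5, 0, 1]          # A(X) = X^2 + 5
H = [3, 7]             # H(X) = 7X + 3
# B = A * H, computed honestly so that the check passes
B = [0] * (len(A) + len(H) - 1)
for i, a in enumerate(A):
    for j, h in enumerate(H):
        B[i + j] = (B[i + j] + a * h) % p

r = random.randrange(p)
assert ev(A, r) * ev(H, r) % p == ev(B, r)  # a collision here is overwhelming evidence
```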

Proofs of proximity vs the GLRSW scheme

The GLRSW scheme we will describe below can be seen as a crude means to discriminate between maps: to distinguish those that are very close to being polynomial from those that are far from polynomial. In that sense it can work like a proof of proximity scheme. Indeed, the purpose of such a scheme is to

  • ACCEPT with probability 1 low degree polynomial functions
  • REJECT with high probability maps that are far from any low degree polynomial function.

The behaviour of such a proof of proximity scheme is illustrated below: low degree maps (here the central red dot) are to be accepted all the time, maps that are outside of some small neighborhood of the set of low degree polynomial maps are to be rejected with high probability.

And then there is a grey area (a small Hamming neighborhood of some low degree polynomial map) where the test's behaviour is unspecified. The GLRSW scheme, on the other hand, fulfils the complementary (or dual) role of "discovering" the nearest low degree map when fed one of the maps in that small neighborhood.

  • Proofs of proximity can be seen as means of amplifying the distinction between polynomial maps and their distant neighbors,
  • the GLRSW scheme can be understood as contracting small neighborhoods of polynomial functions onto that polynomial function.

So while proof of proximity schemes are in the business of discrimination, GLRSW is in the business of rehabilitation. The next sections are our attempt to make the above picture explicit.

The barren wasteland of the space of all maps

Let us think for a moment about the space $\mathcal{F}$ of all maps $f : \mathbb{F} \to \mathbb{F}$:
$$\mathcal{F} = \{\text{set maps } \mathbb{F} \to \mathbb{F}\}$$
(similar mental pictures will apply to the space of all maps $f : S \to \mathbb{F}$ for some large subset $S \subseteq \mathbb{F}$). When $\mathbb{F}$ is a large finite field, $\mathcal{F}$ is a very large set:
$$\#\mathcal{F} = (\#\mathbb{F})^{\#\mathbb{F}}$$
For instance, if $\mathbb{F}$ is a 256 bit prime field, we get $\#\mathcal{F} \approx 2^{2^{264}}$. In other words, $\mathcal{F}$'s size is beyond comprehension. Most of the points in that space[6] are totally unstructured "random" maps.

The overwhelming majority of points of $\mathcal{F}$ are maps whose interpolation polynomial has degree $\#\mathbb{F} - 1$, the maximum possible.

Indeed, there are only $(\#\mathbb{F})^{\#\mathbb{F}-1}$ polynomial maps of degree $< \#\mathbb{F}-1$, i.e. the probability of stumbling on a polynomial map of degree $< \#\mathbb{F}-1$ by chance is $1/\#\mathbb{F} \approx 2^{-256}$.

For all intents and purposes low degree maps don't exist.

For instance, if we consider the collection of all polynomial maps of degree $\le 10^9$, a very reasonable bound for any real world application, the likelihood of stumbling on such a function by chance is 1 in $(\#\mathbb{F})^{\#\mathbb{F} - 10^9}$ which is, for a 256 bit field, less than 1 in $2^{2^{263}}$.
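The counting argument can be checked by brute force in a toy setting. The following Python sketch (with the illustrative field size $p = 5$) enumerates all maps $\mathbb{F}_5 \to \mathbb{F}_5$ and all degree $\le 1$ polynomial maps among them:

```python
# Brute-force illustration of how rare low degree maps are, in a toy field F_5:
# of the 5^5 = 3125 maps F_5 -> F_5, exactly 5^2 = 25 are polynomial of
# degree <= 1. (In a 256-bit field the same ratio becomes astronomically small.)
from itertools import product

p = 5
all_maps = set(product(range(p), repeat=p))             # all 3125 maps, as value tables
low_degree = {tuple((a * x + b) % p for x in range(p))  # value tables of aX + b
              for a in range(p) for b in range(p)}

assert low_degree <= all_maps
print(len(low_degree), "/", len(all_maps))              # 25 / 3125, i.e. 1 in p^3
```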

The civilizing influence of low degree maps on their neighbors

A neighborhood of maps

While low degree maps are exceedingly rare, they do, of course, exist. Furthermore, they exert a taming influence on their immediate neighbor functions. When talking about neighboring functions we are implicitly talking about distances between maps. To measure distances between maps, i.e. points in $\mathcal{F}$, we use the Hamming distance. The Hamming distance $d(f,g) \in [0,1]$ between two maps $f, g : S \to \mathbb{F}$ is the proportion of inputs in $S$ on which they differ. Thus if $f = g$ they differ on no inputs and $d(f,g) = 0$, while if $f(s) \neq g(s)$ for all $s \in S$, then $d(f,g) = 1$. Thus, for any $\epsilon > 0$ we can consider the "neighborhood of $f \in \mathcal{F}$" comprised of all maps $g$ with $d(f,g) < \epsilon$.

The picture below is a mental picture of a neighborhood of a low degree map ${\color{red}P}$ in $\mathcal{F}$. It's a pretty large picture, and opening it in a new window will make its features clearer.

In it we depict

  • a low degree map ${\color{red}P}$,
  • a somewhat close neighbor ${\color{orange}f}$ (with, say, $d({\color{orange}f}, {\color{red}P}) \approx .001$),
  • a slightly remote neighbor ${\color{green}g}$ (with, say, $d({\color{green}g}, {\color{red}P}) \approx .1$),
  • and a map ${\color{blue}h}$ that bears some resemblance (yet not all that much) to ${\color{red}P}$ (with, say, $d({\color{blue}h}, {\color{red}P}) \approx .5$).

The white halo around ${\color{red}P}$ is meant to represent the small neighborhood on which the civilizing influence of ${\color{red}P}$ can be felt. And then there is a dense and featureless blue ocean of "generic maps".

Taming / civilizing influence?

What do we mean by "taming influence"? First of all, let us say clearly what we don't mean. We don't mean to suggest that close neighbors of a low degree map ${\color{red}P}$ are themselves low degree. Far from it! Close neighbors are overwhelmingly likely to be of degree $\#\mathbb{F}-1$, the maximum degree possible. Yet, to the casual observer working with incomplete information (e.g. a probabilistic polynomial time Turing machine with oracle access to a close neighbor of ${\color{red}P}$) they are nigh indistinguishable from the polynomial map ${\color{red}P}$ in whose light they bask. Indeed:

Close neighbors of low degree maps exhibit many of the same local redundancies which characterize low degree maps.

Let us qualify that statement: they do with exceedingly high probability. When tested on random inputs:

  • maps that are close to a polynomial exhibit many of the local redundancies exhibited by low degree polynomials
  • maps that are far from low degree polynomial maps don't.

This loose dichotomy is the basis for proofs of proximity. We can probabilistically test for proximity to a polynomial by checking if the map in question exhibits the expected amount of redundancies.

Polynomials <=> built in redundancy

Let us now be slightly more precise about the way in which polynomials exhibit redundancy. This is pretty basic stuff (interpolation understood through linear algebra) and can nicely be illustrated. In one word, this boils down to the fact that the space $\mathcal{P}_d$ of polynomial maps of degree $< d$ is a vector space of dimension $d$, and that for distinct points $x_1, x_2, \dots, x_d \in \mathbb{F}$ the linear functions "evaluation at $x_i$", $i = 1, \dots, d$,
$$\mathrm{ev}_{x_i} : \begin{cases} \mathcal{P}_d \to \mathbb{F} \\ f \mapsto f(x_i) \end{cases}$$
form a basis of the dual space $\mathcal{P}_d^*$.

So, for instance, while there is a whole "line's worth" of polynomial maps $f$ of degree $< 2$ satisfying the constraint $f(x_1) = y_1$, there is but one polynomial map of degree $< 2$ satisfying the extra constraint $f(x_2) = y_2$, as depicted below:

*(Figure: one constraint vs. two constraints.)*

Similarly there is a whole "line's worth" of polynomial maps $f$ of degree $< 3$ satisfying the constraints $f(x_1) = y_1$ and $f(x_2) = y_2$ (whatever the values $y_1, y_2$, as long as $x_1 \neq x_2$), but there is only one polynomial map of degree $< 3$ satisfying the extra constraint $f(x_3) = y_3$, as depicted below:

*(Figure: all degree $<3$ polynomial maps with two values fixed vs. the single degree $<3$ polynomial map satisfying a third constraint.)*

This is simply expressing the fact that for distinct $x_1, \dots, x_d \in \mathbb{F}$ the "evaluation at $x_i$" linear forms form a basis of the dual $\mathcal{P}_d^*$.

Its (ante)dual basis is the basis of Lagrange polynomials. Indeed, to produce the only polynomial function $f \in \mathcal{P}_d$ such that $f(x_i) = y_i$ for $i = 1, \dots, d$, one forms
$$f = \sum_{i=1}^d y_i L_i$$
where the $L_i$ are the Lagrange polynomials of the set $\{x_1, \dots, x_d\}$:
$$L_i = \prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{X - x_j}{x_i - x_j}$$

The fact that the family of linear functionals $(\mathrm{ev}_{x_1}, \dots, \mathrm{ev}_{x_d})$ forms a basis of the space of linear functionals on $\mathcal{P}_d$ also means that evaluations at points $x \in \mathbb{F} \setminus \{x_1, \dots, x_d\}$ can be expressed as linear combinations of evaluations at the $x_i$. This accounts for the "in-built redundancy" of polynomial maps.

Predictive power of polynomials

Indeed, if $f$ is a degree $< d$ polynomial function we can compute $f(x)$ for any $x$ by plugging $x$ into the formula above and using $f$'s values at the $x_i$:
$$f(x) = \sum_{i=1}^d L_i(x) f(x_i)$$
or, if we write $\lambda_i^x = L_i(x)$,
$$f(x) = \sum_{i=1}^d \lambda_i^x f(x_i).$$
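Here is a minimal Python sketch of this predictive power over a toy prime field: $d$ values of a degree $< d$ map determine its value anywhere else. All parameters ($p$, $d$, the map `f`, the points) are illustrative.

```python
# Predicting f(x) from d known values of a degree < d map over F_p,
# using the weights lambda_i^x = L_i(x) from the formula above.
p = 101
d = 3                                     # f has degree < 3
f = lambda x: (4 * x * x + 2 * x + 7) % p

xs = [1, 2, 3]                            # d distinct interpolation points
x = 57                                    # point whose value we predict

def lagrange_weights(xs, x):
    """lambda_i^x = prod_{j != i} (x - x_j) / (x_i - x_j) over F_p."""
    ws = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        ws.append(num * pow(den, p - 2, p) % p)   # Fermat inversion
    return ws

ws = lagrange_weights(xs, x)
prediction = sum(w * f(xi) for w, xi in zip(ws, xs)) % p
assert prediction == f(x)                 # d values determine all the others
```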

An inefficient proximity test

We describe a (crude) proximity test to low-degree polynomials. The redundancy within low-degree polynomials is at the heart of it. As we tried to suggest earlier, these redundancies remain by and large true for functions that are very close to low degree polynomial functions, in spite of the fact that these functions are hopelessly high degree polynomial functions. These maximal degree polynomial maps do their darndest to emulate low degree maps. Such maps will usually pass the following test: repeatedly draw random points $x, x_1, \dots, x_d$ and check that $f(x) = \sum_{i=1}^d \lambda_i^x f(x_i)$.

The number of repetitions $N$ required will depend on a target soundness.
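Below is a minimal Python sketch of this naive test, under illustrative assumptions: `f` stands in for the map being tested (here an honest low degree one), `N`, `p` and `d` are toy parameters, and `lagrange_weights` is the same helper as in the previous sketch.

```python
# The naive test: N times, draw fresh random points, recompute the Lagrange
# weights from scratch, and check the interpolation identity on the oracle f.
import random

p, d, N = 101, 3, 40
f = lambda x: (4 * x * x + 2 * x + 7) % p        # an honest degree < d oracle

def lagrange_weights(xs, x):
    ws = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        ws.append(num * pow(den, p - 2, p) % p)
    return ws

def naive_test(f):
    for _ in range(N):
        pts = random.sample(range(p), d + 1)     # distinct x, x_1, ..., x_d
        x, xs = pts[0], pts[1:]
        ws = lagrange_weights(xs, x)             # O(d) products/inversions *per round*
        if sum(w * f(xi) for w, xi in zip(ws, xs)) % p != f(x):
            return "REJECT"
    return "ACCEPT"

print(naive_test(f))                             # ACCEPT: f really is low degree
```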

A simple criterion to rule out maps that are far from low degree maps

Let us put $\epsilon = d(f, \mathcal{P}_d) \ll 1$: in other words, there is some low-degree polynomial $P_f$ that agrees with $f$ on a random input with probability $1 - \epsilon$. Then the equation
$$f(x) \stackrel{?}{=} \sum_{i=1}^d \lambda_i^x f(x_i),$$
for randomly and independently sampled $x, x_1, \dots, x_d \in \mathbb{F}$, is satisfied with probability at least $(1-\epsilon)^{d+1}$[7] and fails with probability at least $(d+1)\,\epsilon\,(1-\epsilon)^d$[8]. Using these simple estimates and fixing some desired threshold $\epsilon_0$, one can find $N$ so that with overwhelming probability the test will reject maps that are at least $\epsilon_0$ far from low degree.

On the wastefulness of this test

One could establish "soundness results" for this test. The proof sketched below would likely adapt, albeit with a much larger parameter space: $\mathbb{F}^{d+1}$ as opposed to $\mathbb{F}^2$. But one should note that this test is particularly impractical. The reason is that every cycle requires recomputing a whole new set of weights $\lambda_1^x, \dots, \lambda_d^x$. This requires heavy lifting: at least $O(d)$ products and inversions per round. The actual check, however, namely testing
$$f(x) \stackrel{?}{=} \sum_{i=1}^d \lambda_i^x f(x_i),$$
requires minimal computation.

Outline of the GLRSW scheme

The test we describe here is a refinement of the previous one. It works very much the same, but it manages to bypass the constant recomputation of the weights $\lambda_1^x, \dots, \lambda_d^x$. The trick is to select $x, x_1, x_2, \dots, x_d$ at every round so that the associated weights stay the same throughout.

A simple way to ensure this is to initially choose $d+1$ distinct points $b, a_1, \dots, a_d$, compute once and for all the
$$\alpha_i = \prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{b - a_j}{a_i - a_j}, \quad i = 1, \dots, d,$$
and at every round draw random $s, t \in \mathbb{Z}/p\mathbb{Z}$ and define
$$\begin{cases} x \leftarrow s + bt \\ x_1 \leftarrow s + a_1 t \\ x_2 \leftarrow s + a_2 t \\ \quad\vdots \\ x_d \leftarrow s + a_d t \end{cases}$$

It is quite obvious that for all $s, t \in \mathbb{Z}/p\mathbb{Z}$,
$$\lambda_i^x = \prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{x - x_j}{x_i - x_j} = \prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{(s + bt) - (s + a_j t)}{(s + a_i t) - (s + a_j t)} = \prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{b - a_j}{a_i - a_j} = \alpha_i$$
since the $s$'s disappear both in the numerators and denominators when taking differences, and the $t$'s factor out in every quotient. We thus have a new, improved test:

Note. The reader may have noticed that the definition of the $\lambda_i^x$ doesn't make sense when $t = 0$: we are dividing by zero. This isn't really a problem: for $f$ a polynomial function of degree $< d$, the functional equation
$$f(s + bt) \stackrel{?}{=} \sum_{i=1}^d \alpha_i f(s + a_i t)$$
does still hold in that case, which is all we need.
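Here is a hedged Python sketch of the improved test: the $\alpha_i$ are computed once, and each round costs only $d+1$ oracle queries plus a handful of multiplications and additions, with no inversions. All parameters are illustrative, and we take $b = 0$ for concreteness (the convention adopted later in the post).

```python
# The GLRSW test: weights alpha_i computed once from fixed points b, a_1..a_d;
# each round draws s, t and checks f(s + b t) == sum_i alpha_i f(s + a_i t).
import random

p, d, N = 101, 3, 40
f = lambda x: (4 * x * x + 2 * x + 7) % p        # an honest degree < d oracle

b = 0
a = [1, 2, 3]                                    # d distinct points, all != b

# alpha_i = prod_{j != i} (b - a_j) / (a_i - a_j), computed once and for all
alpha = []
for i, ai in enumerate(a):
    num, den = 1, 1
    for j, aj in enumerate(a):
        if j != i:
            num = num * (b - aj) % p
            den = den * (ai - aj) % p
    alpha.append(num * pow(den, p - 2, p) % p)

def glrsw_test(f):
    for _ in range(N):
        s, t = random.randrange(p), random.randrange(p)
        lhs = f((s + b * t) % p)
        rhs = sum(al * f((s + ai * t) % p) for al, ai in zip(alpha, a)) % p
        if lhs != rhs:                           # cheap: no inversions per round
            return "REJECT"
    return "ACCEPT"

print(glrsw_test(f))
```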

Details

In this second half of the post we dive into details about the GLRSW scheme, in particular how it rehabilitates approximate polynomials by computing their closest low-degree polynomial. The proofs we present below are fleshed out versions of those from the original paper Self Testing/Correcting for Polynomials and Approximate Functions.

Conventions

In what follows, $\mathbb{F}$ is a finite prime field and $d$ is a positive integer with $d < \#\mathbb{F}$. We will sometimes write $n = \#\mathbb{F}$. We define the degree of a map $f : \mathbb{F} \to \mathbb{F}$ to be the degree of its interpolation polynomial. Thus $\deg(f) < \#\mathbb{F}$. We write $\mathbb{F}_d[X]$ for the space of all degree $\le d$ polynomials with coefficients in $\mathbb{F}$. We furthermore say that a map $f : \mathbb{F} \to \mathbb{F}$ has low degree if $\deg(f) \le d$.

In line with the GLRSW paper, we will consider an efficient program $P$ that supposedly computes a certain function $f$. We write $P(x)$ for the output of $P$ on input $x$. The program $P$ should compute $f$ correctly most of the time; that is, we expect $P(x) = f(x)$ for most inputs $x \in \mathbb{F}$. One may think of $P$, for instance, as verifying a Merkle branch from (supposedly) $f(x)$ to a Merkle root representing $f$.

A quick rundown

Step 1: Characterizing low-degreeness

We first establish that a certain form of redundancy in the values of a map $f : \mathbb{F} \to \mathbb{F}$ is equivalent to the interpolating polynomial of that map having low degree (i.e. degree $\le d$).

  • One direction (low-degreeness implies redundancy) is easily checked. It is a fact about abstract polynomials.
  • The converse, i.e. the fact that this form of redundancy in maps precisely characterizes those maps with low degree interpolating polynomial, is established in the next section.

The converse seemingly requires one to work in a prime field to avoid having to divide by binomial coefficients which may be zero (because of positive characteristic).

The helper function g

We next introduce a helper function $g : \mathbb{F} \to \mathbb{F}$. The definition of $g$ can be explained as follows. If $P$ were indeed a low-degree polynomial function, then we would be able to predict any of its values $P(x)$ starting with any $d+1$ of its values $P(x_0), \dots, P(x_d)$. We would simply use the interpolation formula to make a (correct) prediction. What if $P$ isn't polynomial, though? Then different sets of values $x_0, \dots, x_d$ might lead to different predictions of the value of $P(x)$. We thus define

Informal definition of $g$. For $x \in \mathbb{F}$, $g(x)$ is the most popular prediction we get by interpolating values of $P$.

This definition is a little unsatisfying, though. There is potentially room for ambiguity:

Potential source of ambiguity 1. Of the predicted values, there might be two values that are tied for most popular predicted value.

That would be unfortunate, and it is something we have to deal with. A simple (yet not completely satisfactory) work-around is to arbitrarily break ties. Furthermore, one might question the value of this prediction: what if 998 values are each predicted $0.1001\%$ of the time, and another value is predicted $0.1002\%$ of the time? That value would be our definition of $g(x)$, but its status as a majority value is dubious.

Potential source of ambiguity 2. There might be no value that is predicted $> 50\%$ of the time.
Note. It is important to note that

  • $g$ is never explicitly computed by anybody,
  • nor is $g$ meant to be efficiently computable.

If the verifier had the computational power to compute $g$ even on a single value, it might as well skip computing $g$ altogether and directly verify that $P$ is polynomial. The function $g$ is simply here to be used in abstract arguments.

What matters most is that this $g$, whatever it may be, is unambiguously defined and contains information about $P$. The fact that $g$ may indeed be unambiguously defined, without arbitrary choices (such as arbitrarily resolving ties), is the first thing we establish below.

Step 2: Analyzing the helper function and drawing conclusions

We formulate the expectations one might place on $g$:

Expectation 1. If $P$ is close enough to being polynomial, there is no ambiguity in the definition of $g$.

This is proven in Lemma 1. This result also has psychological value, as it tells us that we don't have to deal with the potential ambiguities listed above; see its Corollary. In this context, close enough means $\delta < \frac{1}{4(d+1)}$.

Expectation 2. If $P$ computes a map that is close to polynomial, then $g$ and $P$ ought to agree most of the time.

This is indeed true and established in Lemma 2. In this context, most of the time means on a proportion $\ge 1 - 2\delta$ of all inputs.

Expectation 3. If $P$ is close enough to some low-degree polynomial map, then the helper function $g$ is that low-degree polynomial map.

This is indeed true and established in Lemma 3. Thus $g$ rehabilitates $P$, provided $P$ is accurate enough.

Low-degreeness implies redundancy

The following lemma is the theoretical foundation of the test. We consider pairwise distinct field elements $b, a_0, a_1, \dots, a_d \in \mathbb{F}$. These will remain constant throughout.

Lemma. There exist coefficients $\alpha_0, \dots, \alpha_d \in \mathbb{F}$ such that for all $P \in \mathbb{F}_d[X]$,
$$P(X + bT) = \sum_{k=0}^d \alpha_k P(X + a_k T)$$

Proof. We start with a special case of $P$ which turns out to be sufficient. Thus, expand, for $P = X^d$ and arbitrary $\alpha_0, \dots, \alpha_d \in \mathbb{F}$, both sides of the above:
$$\begin{cases} (X + bT)^d = \sum_{k=0}^d \binom{d}{k} b^k T^k X^{d-k} \\ \sum_{k=0}^d \alpha_k (X + a_k T)^d = \sum_{k=0}^d \binom{d}{k} \Big( \sum_{l=0}^d \alpha_l a_l^k \Big) T^k X^{d-k} \end{cases}$$

To achieve equality, it is enough that the $\alpha_i$ be a solution of the linear system
$$\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ a_0 & a_1 & a_2 & \cdots & a_d \\ a_0^2 & a_1^2 & a_2^2 & \cdots & a_d^2 \\ \vdots & & & & \vdots \\ a_0^d & a_1^d & a_2^d & \cdots & a_d^d \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_d \end{pmatrix} = \begin{pmatrix} 1 \\ b \\ b^2 \\ \vdots \\ b^d \end{pmatrix}$$

This Vandermonde system is known to be invertible, and so there is a unique solution $(\alpha_0, \dots, \alpha_d) \in \mathbb{F}^{d+1}$. We note that had we written a similar system for $X^k$ with $k \le d$, the resulting constraints would have been a subset of those for $P = X^d$. Hence the $\alpha_i$ we just identified work for all degree $\le d$ polynomials.
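The lemma can be checked numerically. The following Python sketch (illustrative parameters, with Gaussian elimination spelled out to stay dependency-free) solves the Vandermonde system above for the $\alpha_k$ over a toy prime field and verifies the identity on random inputs for a random $P$ of degree $\le d$:

```python
# Solve VdM(a_0..a_d) * alpha = (1, b, ..., b^d) over F_p, then verify
# P(x + b t) == sum_k alpha_k P(x + a_k t) for a random P of degree <= d.
import random

p, d = 101, 4
b = 5
a = [1, 2, 3, 4, 6]                        # d+1 points, pairwise distinct, != b

inv = lambda x: pow(x, p - 2, p)

# Augmented matrix: row k is (a_0^k, ..., a_d^k | b^k); reduce mod p.
M = [[pow(ai, k, p) for ai in a] + [pow(b, k, p)] for k in range(d + 1)]
for col in range(d + 1):
    piv = next(r for r in range(col, d + 1) if M[r][col] != 0)
    M[col], M[piv] = M[piv], M[col]
    t = inv(M[col][col])
    M[col] = [x * t % p for x in M[col]]
    for r in range(d + 1):
        if r != col and M[r][col]:
            M[r] = [(x - M[r][col] * y) % p for x, y in zip(M[r], M[col])]
alpha = [M[r][-1] for r in range(d + 1)]

# Check the identity on random (x, t) for a random polynomial of degree <= d.
P = [random.randrange(p) for _ in range(d + 1)]
ev = lambda z: sum(c * pow(z, k, p) for k, c in enumerate(P)) % p
for _ in range(100):
    x, t = random.randrange(p), random.randrange(p)
    assert ev((x + b * t) % p) == sum(al * ev((x + ai * t) % p)
                                      for al, ai in zip(alpha, a)) % p
```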

Redundancy implies low degreeness

In view of the previous lemma, any degree $\le d$ polynomial map $f : \mathbb{F} \to \mathbb{F}$ satisfies
$$\forall s, t \in \mathbb{F}, \quad f(s + bt) = \sum_{k=0}^d \alpha_k f(s + a_k t)$$
Since this is the basis of the proximity test above, one should wonder about the converse: are degree $\le d$ maps the only maps $f : \mathbb{F} \to \mathbb{F}$ satisfying this property? To answer that question let us define, for every map $f : \mathbb{F} \to \mathbb{F}$, a new map $\hat{f} : \mathbb{F} \times \mathbb{F} \to \mathbb{F}$ by the formula
$$\forall s, t \in \mathbb{F}, \quad \hat{f}(s,t) := f(s + bt) - \sum_{i=0}^d \alpha_i f(s + a_i t)$$
We can thus restate the previous question as

Question. Are degree $\le d$ maps the only maps $f : \mathbb{F} \to \mathbb{F}$ satisfying $\hat{f} = 0$?

The answer is YES:

Lemma. Let $f : \mathbb{F} \to \mathbb{F}$ be a map. The following are equivalent:

  • $\hat{f} = 0$
  • $\deg(f) \le d$

Proof. The proof isn't complicated. We use polynomial interpolation of bivariate maps to convert the problem from one about maps to one about polynomials. First of all, bivariate polynomial interpolation is possible. The map $\mathbb{F}_{n-1,n-1}[X,Y] \to \mathcal{F}(\mathbb{F} \times \mathbb{F}, \mathbb{F})$ that sends a bivariate polynomial $P$ of bidegree $\le (n-1, n-1)$ to the associated function $\mathbb{F} \times \mathbb{F} \to \mathbb{F},\ (x,y) \mapsto P(x,y)$, is an isomorphism. This is easily seen using the polynomials $L_{a,b}$ defined by
$$L_{a,b}(X,Y) = \left( \prod_{\substack{\alpha \in \mathbb{F} \\ \alpha \neq a}} \frac{X - \alpha}{a - \alpha} \right) \left( \prod_{\substack{\beta \in \mathbb{F} \\ \beta \neq b}} \frac{Y - \beta}{b - \beta} \right)$$
which vanish everywhere except at the one point $(a,b) \in \mathbb{F}^2$.

Similarly to what we did with functions, we can define, for any univariate polynomial $P \in \mathbb{F}_{n-1}[X]$, a bivariate polynomial $\hat{P} \in \mathbb{F}_{n-1,n-1}[S,T]$ like so:
$$\hat{P} = P(S + bT) - \sum_{i=0}^d \alpha_i P(S + a_i T)$$
The procedures $f \mapsto \hat{f}$ and $P \mapsto \hat{P}$ are compatible with interpolation in the sense that if $f$ is a map $\mathbb{F} \to \mathbb{F}$ and $P_f$ is its degree $< n$ interpolating polynomial, then $\widehat{P_f}$ is $\hat{f}$'s bivariate interpolation polynomial.

Now suppose $f$ satisfies $\hat{f} = 0$. Then $\widehat{P_f} = 0$ and if $e$ is $f$'s degree (i.e. $e = \deg(P_f)$), then by looking solely at the degree $e$ term of $P_f$ we get that
$$(S + bT)^e = \sum_{i=0}^d \alpha_i (S + a_i T)^e$$
and by isolating the $S^{e-j} T^j$ terms we see that
$$\forall j \in \{0, \dots, e\}, \quad b^j = \sum_{i=0}^d \alpha_i a_i^j$$
and so
$$\begin{pmatrix} 1 \\ b \\ b^2 \\ \vdots \\ b^d \\ \vdots \\ b^e \end{pmatrix} = \alpha_0 \begin{pmatrix} 1 \\ a_0 \\ a_0^2 \\ \vdots \\ a_0^d \\ \vdots \\ a_0^e \end{pmatrix} + \alpha_1 \begin{pmatrix} 1 \\ a_1 \\ a_1^2 \\ \vdots \\ a_1^d \\ \vdots \\ a_1^e \end{pmatrix} + \alpha_2 \begin{pmatrix} 1 \\ a_2 \\ a_2^2 \\ \vdots \\ a_2^d \\ \vdots \\ a_2^e \end{pmatrix} + \cdots + \alpha_d \begin{pmatrix} 1 \\ a_d \\ a_d^2 \\ \vdots \\ a_d^d \\ \vdots \\ a_d^e \end{pmatrix}$$
This is where we get a contradiction: this is impossible for $e \ge d+1$, for it would contradict the invertibility of the Vandermonde matrix $\mathrm{VdM}(b, a_0, a_1, \dots, a_d)$ (recall that $b, a_0, \dots, a_d$ are supposed pairwise distinct).

Note. We implicitly used the assumption that $\mathbb{F}$ is a prime field. Indeed, in the step where we "looked at $S^{e-j} T^j$ terms" we divided by a binomial coefficient $\binom{e}{j}$ with $e < \#\mathbb{F}$. To be legitimate in doing so (i.e. to be sure we don't divide by zero) we have to assume $n = \#\mathbb{F}$ is prime.

Expected value of interpolation: the helper function g

Suppose you are given oracle access to the values of a map $P$. In other words, you have some black box that spits out values $P(x)$ when you feed it some field element $x \in \mathbb{F}$. Suppose furthermore you are told: "That function $P : \mathbb{F} \to \mathbb{F}$ is a low degree polynomial map. How low, you ask? Its degree is $\le d$." How would you go about convincing yourself of that claim? If $P$'s domain $\mathbb{F}$ is large, it is hopeless to ask for a perfect proof: you would have to interpolate $P$ from $d+1$ values and compare the values predicted by your formula to those queried from the oracle, at every single point of $\mathbb{F}$.

You might on the other hand try to use the characterization of low degree maps presented above. While we won't be able to find efficient tests of low-degreeness per se, what we can do is reliably reject oracles $P$ that are far from any degree $\le d$ polynomial. That is, what we describe below is a probabilistic test of proximity.

To simplify things slightly, we fix $b = 0$ and consider pairwise distinct nonzero $a_0, \dots, a_d \in \mathbb{F}$. There is no reason to try to be fancy about choosing the $a_i$; taking $a_i = i + 1$, for instance, is a perfectly valid choice. If $P$ were indeed polynomial of degree $\le d$, we would have, for all $x, t \in \mathbb{F}$,
$$P(x) = \sum_{i=0}^d \alpha_i P(x + a_i t)$$

This can't reasonably be checked on all inputs $x, t$. But starting from this observation we can make some abstract definitions that turn out to be useful in the analysis. First of all we define a nonnegative $\delta = \delta_P$ by the formula
$$\delta = \frac{\big|\{(x,t) \in \mathbb{F}^2 \ \mid\ P(x) \neq \sum_{i=0}^d \alpha_i P(x + a_i t)\}\big|}{|\mathbb{F}|^2}$$
This is simply the proportion of inputs $(x,t)$ that fail the expected equation. We can also define subsets of $\mathbb{F}$ associated with $P$. For instance, $\mathrm{Maj}_P$ is the set of all $x$ such that the right hand side of this equation takes on a particular value a majority of the time, i.e. more than half of the time with respect to $t$. We thus define
$$\mathrm{Maj}_P = \Big\{ x \in \mathbb{F} \ \Big|\ \exists \theta \in \mathbb{F} \text{ s.t. } > 50\% \text{ of all } t \text{ satisfy } \theta = \sum_{i=0}^d \alpha_i P(x + a_i t) \Big\}$$

We can define a map[9] $g : \mathrm{Maj}_P \to \mathbb{F}$ by setting $g(x) = \theta$ where $\theta$ is the value that appears $> 50\%$ of the time as $\sum_{i=0}^d \alpha_i P(x + a_i t)$. No harm is done by arbitrarily extending $g$ to a map $\mathbb{F} \to \mathbb{F}$. We can also define the subset where, conveniently, that majority value coincides with the value predicted by the oracle and that predicted by the supposed low-degreeness:
$$\mathrm{Conv}_P = \Big\{ x \in \mathbb{F} \ \Big|\ > 50\% \text{ of all } t \text{ satisfy } P(x) = \sum_{i=0}^d \alpha_i P(x + a_i t) \Big\}$$

Here's a picture to accompany these definitions. We can think of every pair $(x,t) \in \mathbb{F}^2$ as a test case for the expected equality $P(x) \stackrel{?}{=} \sum_{i=0}^d \alpha_i P(x + a_i t)$. In the diagram below, blue dots signal that this expected equality holds true, yellow crosses that it fails.

For that particular oracle $P$, most pairs $(x,t)$ satisfy the expected equality. If we actually count the yellow crosses we get $\delta = \delta_P = 39/256 \approx 15.23\%$. Every column that is majority blue is, by definition, indexed by some $x \in \mathrm{Conv}_P$.

Note that three columns have fewer than half of the expected equalities hold. These are columns indexed by $x \notin \mathrm{Conv}_P$; yet some, such as the first one from the left, might still define an unambiguous majority value, e.g. the $x$-coordinate of the first red column might still belong to $\mathrm{Maj}_P$.
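These definitions are easy to explore by brute force in a toy setting. The following Python sketch builds a slightly corrupted oracle `P` (the corruption pattern and all parameters are illustrative), computes $\delta$, and recovers $g$ as the most popular prediction; for small $\delta$, $g$ coincides with the underlying low degree map:

```python
# Brute-force computation of delta and of the helper function g over F_p,
# for an oracle P that agrees with a degree <= d polynomial except on 3 inputs.
import random
from collections import Counter

p, d = 101, 2
b, a = 0, [1, 2, 3]                        # b = 0 and d+1 nonzero points a_0..a_d

inv = lambda x: pow(x, p - 2, p)
alpha = []
for i, ai in enumerate(a):                 # alpha_i = prod_{j!=i} (0-a_j)/(a_i-a_j)
    num, den = 1, 1
    for j, aj in enumerate(a):
        if j != i:
            num = num * (0 - aj) % p
            den = den * (ai - aj) % p
    alpha.append(num * inv(den) % p)

poly = lambda x: (7 * x * x + x + 3) % p   # the "true" low degree map
table = [poly(x) for x in range(p)]
for x in random.sample(range(p), 3):       # corrupt the oracle on 3 inputs
    table[x] = random.randrange(p)
P = lambda x: table[x % p]

predictions = lambda x: [sum(al * P(x + ai * t) for al, ai in zip(alpha, a)) % p
                         for t in range(p)]

delta = sum(P(x) != pr for x in range(p) for pr in predictions(x)) / p**2
g = [Counter(predictions(x)).most_common(1)[0][0] for x in range(p)]

print(f"delta = {delta:.4f}")
assert g == [poly(x) for x in range(p)]    # g rehabilitates P (delta is small)
```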

Lemma 1 - the majority value is unambiguously defined

We start with a simple lemma. Let $X, Y$ be independent, identically distributed (iid) random variables with values in a finite set $V$. Also let $v_0$ be (one of) the value(s) $X$ is likeliest to take, that is,
$$\mathbb{P}[X = v_0] = \max_{v \in V} \mathbb{P}[X = v]$$
Let $E$ be the event that $X$ and $Y$ agree, i.e. $E = [X = Y]$; then

Lemma. $\mathbb{P}[E] \le \mathbb{P}[X = v_0]$.

Proof. Decompose the event $E$ according to the common value of $X$ and $Y$:
$$E = \bigsqcup_{v \in V} [X = v] \cap [Y = v].$$
Taking probabilities, and since $X$ and $Y$ are iid,
$$\mathbb{P}[E] = \sum_{v \in V} \mathbb{P}[X = v]^2 \le \sum_{v \in V} \mathbb{P}[X = v] \, \mathbb{P}[X = v_0] = \mathbb{P}[X = v_0].$$


Fix $x \in \mathbb{F}$. Consider two independent uniformly distributed random variables $t_1$ and $t_2$ with values in $\mathbb{F}$, and the random variables
$$\begin{cases} \mathrm{Pred}_1 = \sum_{i=0}^d \alpha_i P(x + a_i t_1) \\ \mathrm{Pred}_2 = \sum_{j=0}^d \alpha_j P(x + a_j t_2) \end{cases}$$
These represent random predictions of the value of $P(x)$ obtained by interpolation. These random variables are independent and identically distributed. Also, $g(x)$ is, by definition, the most likely value of either. In light of the previous lemma we let $E$ be the event where the two predictions agree:
$$E = [\mathrm{Pred}_1 = \mathrm{Pred}_2]$$
The previous lemma tells us that
$$\mathbb{P}[E] \le \Big[ \text{proportion of } t \in \mathbb{F} \text{ such that } \sum_{i=0}^d \alpha_i P(x + a_i t) = g(x) \Big]$$
If we can find a usable lower bound for the probability $\mathbb{P}[E]$, we get a lower bound for the proportion of inputs realizing the majority value $g(x)$.

In order to produce such a lower bound, we consider the subevent
$$E \supseteq \bigcap_{i=0}^d \Big[ P(x + a_i t_1) = \sum_{j=0}^d \alpha_j P(x + a_i t_1 + a_j t_2) \Big] \quad (1)$$
$$\cap\ \bigcap_{j=0}^d \Big[ P(x + a_j t_2) = \sum_{i=0}^d \alpha_i P(x + a_i t_1 + a_j t_2) \Big] \quad (2)$$
The right hand side is indeed a subset of $E$: if all the conditions on the right are met, then
$$\begin{aligned} \mathrm{Pred}_1 = \sum_{i=0}^d \alpha_i P(x + a_i t_1) &\stackrel{(1)}{=} \sum_{i=0}^d \alpha_i \Big( \sum_{j=0}^d \alpha_j P(x + a_i t_1 + a_j t_2) \Big) \\ &= \sum_{j=0}^d \alpha_j \Big( \sum_{i=0}^d \alpha_i P(x + a_i t_1 + a_j t_2) \Big) \\ &\stackrel{(2)}{=} \sum_{j=0}^d \alpha_j P(x + a_j t_2) = \mathrm{Pred}_2 \end{aligned}$$
These subevents allow us to have the random variables $t_1, t_2$ interact. Obtaining a lower bound on $\mathbb{P}[E]$ is equivalent to getting an upper bound on $\mathbb{P}[\Omega \setminus E]$. The above gives
$$\Omega \setminus E \subseteq \bigcup_{i=0}^d \Big[ P(x + a_i t_1) \neq \sum_{j=0}^d \alpha_j P(x + a_i t_1 + a_j t_2) \Big] \ \cup\ \bigcup_{j=0}^d \Big[ P(x + a_j t_2) \neq \sum_{i=0}^d \alpha_i P(x + a_i t_1 + a_j t_2) \Big]$$

So that
$$\mathbb{P}[\Omega \setminus E] \le \sum_{i=0}^d \mathbb{P}\Big[ P(x + a_i t_1) \neq \sum_{j=0}^d \alpha_j P(x + a_i t_1 + a_j t_2) \Big] + \sum_{j=0}^d \mathbb{P}\Big[ P(x + a_j t_2) \neq \sum_{i=0}^d \alpha_i P(x + a_i t_1 + a_j t_2) \Big]$$

Now for any $i = 0, \dots, d$ the random variable $t_1' = x + a_i t_1$ is uniformly distributed and independent from $t_2$, and similarly for any $j = 0, \dots, d$ the random variable $t_2' = x + a_j t_2$ is uniformly distributed and independent from $t_1$, and so, by definition of $\delta$,
$$\mathbb{P}\Big[ P(x + a_i t_1) \neq \sum_{j=0}^d \alpha_j P(x + a_i t_1 + a_j t_2) \Big] = \mathbb{P}\Big[ P(t_1') \neq \sum_{j=0}^d \alpha_j P(t_1' + a_j t_2) \Big] = \delta$$

and similarly,
$$\mathbb{P}\Big[ P(x + a_j t_2) \neq \sum_{i=0}^d \alpha_i P(x + a_j t_2 + a_i t_1) \Big] = \mathbb{P}\Big[ P(t_2') \neq \sum_{i=0}^d \alpha_i P(t_2' + a_i t_1) \Big] = \delta$$
Injecting these into the upper bound for $\mathbb{P}[\Omega \setminus E] = 1 - \mathbb{P}[E]$ obtained previously proves

Lemma 1. For any $x \in \mathbb{F}$,
$$1 - 2(d+1)\delta \ \le\ \mathbb{P}[E] \ \le\ \mathbb{P}[\mathrm{Pred}_1 = g(x)].$$

As promised, we can thus lift any and all unpleasant ambiguities in $g$'s definition. Notice that $1 - 2(d+1)\delta > \frac{1}{2}$ as soon as $\delta < \frac{1}{4(d+1)}$.

Corollary. Suppose that $\delta < \frac{1}{4(d+1)}$; then for all $x \in \mathbb{F}$, $g(x)$ represents $> 50\%$ of the values $\sum_{i=0}^d \alpha_i P(x + a_i t)$, $t \in \mathbb{F}$. In other words, $\mathrm{Maj}_P = \mathbb{F}$.

Lemma 2 - the helper function and the oracle agree a lot

Let us make the headline precise. We just proved that whenever $\delta < \frac{1}{4(d+1)}$, $g(x)$ is unambiguously defined for all $x \in \mathbb{F}$, that is, $\mathrm{Maj}_P = \mathbb{F}$. Recall that we defined a subset $\mathrm{Conv}_P \subseteq \mathrm{Maj}_P$ of so-called "convenient" inputs. Those were the $x \in \mathbb{F}$ where not only is $g$ unambiguously defined, but $g(x)$ coincides with $P(x)$. We now establish that as long as $\delta$ is small enough, $\mathrm{Conv}_P$ will be large. I.e. if $\delta$ is small, then the helper function $g$ and the oracle agree often.

Lemma 2. We have
$$\frac{|\mathrm{Conv}_P|}{|\mathbb{F}|} \ge 1 - 2\delta$$

Proof. Let us consider the pairs $(x,t)$ where the expected equality $P(x) \stackrel{?}{=} \sum_{i=0}^d \alpha_i P(x + a_i t)$ fails. We distinguish two cases: $x \in \mathrm{Conv}_P$ and $x \notin \mathrm{Conv}_P$. Thus, by definition of $\delta$,
$$\delta = \frac{1}{|\mathbb{F}|^2} \sum_{x \in \mathbb{F}} \Big|\Big\{ t \ \mid\ P(x) \neq \sum_{i=0}^d \alpha_i P(x + a_i t) \Big\}\Big| \ \ge\ \frac{1}{|\mathbb{F}|^2} \sum_{x \notin \mathrm{Conv}_P} \Big|\Big\{ t \ \mid\ P(x) \neq \sum_{i=0}^d \alpha_i P(x + a_i t) \Big\}\Big| \ \ge\ \frac{1}{|\mathbb{F}|^2} \cdot \big|\mathbb{F} \setminus \mathrm{Conv}_P\big| \cdot \frac{|\mathbb{F}|}{2}$$
The last inequality follows from the fact that when $x \notin \mathrm{Conv}_P$, $g(x) \neq P(x)$ and thus $P(x)$ is not a majority value (or it shares that title with $g(x)$ and was arbitrarily disregarded as the choice for the majority value). In particular, the proportion of $t \in \mathbb{F}$ such that $P(x) \neq \sum_{i=0}^d \alpha_i P(x + a_i t)$ is at least $50\%$. Rearranging things, we get the inequality from the lemma.

Lemma 3 - the helper function is a low degree polynomial

We first state an informal version of the lemma to prove. A precise statement will be given at the end.

Lemma 3 (informal). If $\delta$ is small enough, then the helper function $g$ is polynomial.

The proof strategy is quite interesting: to prove that $g$ is polynomial of degree $\le d$ (or, to be precise, that $g$'s interpolating polynomial has degree $\le d$), the authors invoke the characterization of low degree functions. The goal is thus to show that for any two $x, t \in \mathbb{F}$ one has
$$g(x) = \sum_{i=0}^d \alpha_i g(x + a_i t) \quad \text{i.e.} \quad \sum_{i=0}^{d+1} \alpha_i g(x + a_i t) = 0 \qquad (\text{Eq. } {\color{red}{x,t}})$$
where we set $a_{d+1} = b = 0$ and $\alpha_{d+1} = -1$. What makes the proof strategy interesting is that the satisfaction of (Eq. ${\color{red}{x,t}}$) for a particular pair $(x,t) \in \mathbb{F}^2$, which is either true or false, is established using a probabilistic method. To wit, consider the following event
$$F = F_{x,t} = \bigcap_{i=0}^{d+1} \Big[ g(x + a_i t) = \sum_{j=0}^d \alpha_j P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) \Big] \quad (3)$$
$$\cap\ \bigcap_{j=0}^d \Big[ 0 = \sum_{i=0}^{d+1} \alpha_i P\big(x + a_j t_1 + a_i(t + a_j t_2)\big) \Big] \quad (4)$$
where $t_1$ and $t_2$ are independent uniformly distributed random variables in $\mathbb{F}$. The relation with (Eq. ${\color{red}{x,t}}$) is that, whenever the conditions in $F$ are met, one has
$$\begin{aligned} \sum_{i=0}^{d+1} \alpha_i g(x + a_i t) &\stackrel{(3)}{=} \sum_{i=0}^{d+1} \alpha_i \sum_{j=0}^d \alpha_j P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) \\ &= \sum_{j=0}^d \alpha_j \sum_{i=0}^{d+1} \alpha_i P\big(x + a_j t_1 + a_i(t + a_j t_2)\big) \stackrel{(4)}{=} 0. \end{aligned}$$
Thus, to conclude that (Eq. ${\color{red}{x,t}}$) holds it is enough to show that the event $F_{x,t}$ has nonzero probability. The goal now is to bound the probability of $F$ from below; in particular, to show that it is $> 0$ when $\delta$ is small enough.

Again, the idea will be to bound $\mathbb{P}[F^c]$ from above. First of all,
$$F^c = \bigcup_{i=0}^{d+1} \Big[ g(x + a_i t) \neq \sum_{j=0}^d \alpha_j P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) \Big] \ \cup\ \bigcup_{j=0}^d \Big[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(x + a_j t_1 + a_i(t + a_j t_2)\big) \Big]$$
so that
$$1 - \mathbb{P}[F] = \mathbb{P}[F^c] \le \underbrace{\sum_{i=0}^{d+1} \mathbb{P}\Big[ g(x + a_i t) \neq \sum_{j=0}^d \alpha_j P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) \Big]}_{\color{red}{I}} + \underbrace{\sum_{j=0}^d \mathbb{P}\Big[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(x + a_j t_1 + a_i(t + a_j t_2)\big) \Big]}_{\color{blue}{II}}$$

Bounding ${\color{red}{I}}$. Since for all $i$ the random variable $t_1 + a_i t_2$ is uniformly distributed, we can apply the result from Lemma 1 and obtain
$$\forall i, \quad \mathbb{P}\Big[ g(x + a_i t) \neq \sum_{j=0}^d \alpha_j P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) \Big] \le 2(d+1)\delta.$$

Bounding ${\color{blue}{II}}$. Take $j \in \{0, \dots, d\}$. For any $i \in \{0, \dots, d+1\}$, rearranging terms,
$$P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) = P(\boldsymbol{\xi} + a_i \boldsymbol{\tau})$$
where $\boldsymbol{\xi} = x + a_j t_1$ and $\boldsymbol{\tau} = t + a_j t_2$. Note that $\boldsymbol{\xi}$ and $\boldsymbol{\tau}$ are both uniformly distributed in $\mathbb{F}$ (since $a_j \neq 0$) and independent (since $t_1$ and $t_2$ are). Then, by definition of $\delta$,
$$\mathbb{P}\Big[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) \Big] = \mathbb{P}\Big[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(x + a_j t_1 + a_i(t + a_j t_2)\big) \Big] = \mathbb{P}\Big[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P(\boldsymbol{\xi} + a_i \boldsymbol{\tau}) \Big] = \delta.$$

Combining these bounds yields
$$1 - \mathbb{P}[F] = \mathbb{P}[F^c] \le (d+2) \cdot 2(d+1)\delta + (d+1)\delta = (d+1)(2d+5)\delta$$
i.e.
$$1 - (d+1)(2d+5)\delta \le \mathbb{P}[F].$$
Therefore, as soon as $\delta < \frac{1}{(d+1)(2d+5)}$, every event $F_{x,t}$ has positive probability, so every (Eq. ${\color{red}{x,t}}$) holds and $g$ is polynomial. We can now state Lemma 3 with precision.

Lemma 3. If $\delta < \frac{1}{(d+1)(2d+5)}$, the helper function $g$ is polynomial of degree $\le d$.

Conclusion

Lemma 3 shows how to "recover" the unique low-degree neighbor function $g$ of a map $P$, as soon as $\delta$ is small enough. Thus $g$ is the polynomial function the program $P$ is meant to be computing.

A question

Let $\mathbb{F}$ be a prime field. Let $x, x_1, \dots, x_d \in \mathbb{F}$ be pairwise distinct and, similarly, let $y, y_1, \dots, y_d \in \mathbb{F}$ be pairwise distinct. Suppose that for all $i = 1, \dots, d$,
$$\prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{x - x_j}{x_i - x_j} = \prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{y - y_j}{y_i - y_j}$$
This is the case for instance if $y, y_1, \dots, y_d$ are obtained from $x, x_1, \dots, x_d$ by means of an affine transformation $u \mapsto au + b$ (with $a \neq 0$). One might ask: is the converse true?

Question. Is it necessarily the case that there exists $(a,b) \in \mathbb{F}^\times \times \mathbb{F}$ such that $y, y_1, \dots, y_d$ are obtained from the $x, x_1, \dots, x_d$ by application of the affine transformation $\phi_{a,b} : u \mapsto au + b$?

Note. The restriction to prime fields could potentially be relaxed by allowing automorphisms $\sigma \in \mathrm{Aut}(\mathbb{F})$ in the affine transformation formula:
$$\phi_{a,b,\sigma} : u \mapsto a\,\sigma(u) + b$$
with, say, $\sigma$ the Frobenius automorphism of the subfield generated by the $d$ weights above.


  1. although, more on that later. ↩︎

  2. although concrete implementations vary greatly ↩︎

  3. the precise way in which this interpolation is done varies. In particular, the domain of interpolation is usually a structured subset of a (finite) field $\mathbb{F}$: a subgroup/coset of $\mathbb{F}^\times$ or an additive subgroup/coset of $\mathbb{F}_{p^d}$. ↩︎

  4. which vectors also support ↩︎

  5. indeed, if $\deg(P), \deg(Q) \ll \#\mathbb{F}$ and if $s$ is drawn at random in a large field $\mathbb{F}$, then $P(s) = Q(s)$ happens with probability at most $\frac{\max\{\deg(P), \deg(Q)\}}{\#\mathbb{F}} \ll 1$. ↩︎

  6. i.e. we think of maps $f : \mathbb{F} \to \mathbb{F}$ as the points of $\mathcal{F}$ ↩︎

  7. when all the conditions $f(x) = P_f(x)$, $f(x_1) = P_f(x_1)$, $\dots$, $f(x_d) = P_f(x_d)$ are met simultaneously. There can of course be serendipitous collisions, but those are harder to account for. ↩︎

  8. when precisely one of the conditions $f(x) = P_f(x)$, $f(x_1) = P_f(x_1)$, $\dots$, $f(x_d) = P_f(x_d)$ isn't met. Of course more than one of these may fail, but it is then unclear whether the equation fails too. ↩︎

  9. we defined $\mathrm{Maj}_P$ precisely in order to define $g$ ↩︎