
The simplest proximity-to-low-degree-polynomial test and how to rehabilitate approximate polynomials

Introduction

This post is an appetizer for a blog post in preparation on the FRI protocol. We describe what is arguably the simplest of all proof of proximity (to low degree polynomial functions)[1] schemes out there. Here and elsewhere, low degree means "of degree $\le d$" for some fixed positive integer $d$. But our interest in this scheme is as a means to shed light on a 👻 dual 👻 problem to that of proximity testing.

This scheme was originally described in the 1991 paper Self Testing/Correcting for Polynomials and Approximate Functions by P. Gemmell, R. Lipton, R. Rubinfeld, M. Sudan and A. Wigderson (we shall sometimes use the acronym GLRSW). It has been referred to as "Sudan $d+1$" or some variation on that name in presentations on FRI given by Eli Ben-Sasson. In StarkWare's blog post on Low Degree Testing, it is what is called "the direct test".

Compared to modern proof of proximity schemes such as FRI it is spectacularly inefficient: it runs in linear time $O(d)$ as opposed to FRI's polylogarithmic time $O(\ln(d)^2)$. However, it has the advantage of sheer simplicity:

  • there is only ever one map at play (as opposed to the $O(\ln(d))$ inter-dependent maps in FRI);
  • coherence is checked for that one map as opposed to having a "trickle down" sequence of coherence conditions relating all the maps in that sequence;
  • the redundancy property at the heart of this scheme is second nature to anyone familiar with polynomial interpolation.

Also, proving its "soundness properties", while involved, follows a clear path. It illustrates well a popular outlook for proving soundness: "deal with the good cases in detail; don't bother analyzing what happens in the bad cases beyond bounding the probability of something bad happening in the first place". Usually the good case is when everything happens according to plan; bad cases are everything else. This makes the GLRSW scheme a perfect entry point into proof of proximity schemes.



Role of proofs of proximity in transparent computational integrity schemes

We somewhat cryptically indicated that GLRSW is really about a dual problem to that of low-degree proximity testing. Let us try to explain that claim. But before we go any further: what are proof of proximity schemes and what are they useful for? Certainly the main use case relevant to blockchain related applications is as part of computational integrity schemes.

Computational integrity schemes such as STARKs or SNARKs and many others are protocols whereby a prover can convince a verifier that a complex and/or confidential computation was performed as advertised. The tremendous appeal of such schemes comes from the existence of computational integrity schemes that are

  • far quicker to verify than it is to run the computation,
  • leak no private information,
  • are nigh impossible to twist into producing false positives.

Such schemes rely on a prover to construct a proof of computational integrity. The prover's work can be broadly separated into two phases.

The first phase, common to all such schemes[2], is the arithmetization of the computation (R1CS or algebraic execution trace for instance). The raw data produced during the computation is converted into algebraic form (typically field elements) which in turn is condensed into polynomial form by means of polynomial interpolation[3]. Polynomials are useful in this regard: besides addition and scalar multiplication[4], they support products and divisibility. Importantly, this first phase can be done so that computational integrity (i.e. validity of the underlying computation) is equivalent to the satisfaction of some algebraic constraints by the resulting polynomials. Typically low-degreeness and divisibility conditions.

The second phase, i.e. actually compiling the proof, comes down to finding an efficient commitment of these polynomials. This commitment should allow a verifier to convince themselves of the claim to be proven. In particular, of low-degreeness and divisibility conditions that may apply. There are various ways of doing this, and (at least) two competing philosophies for carrying out the second step.

Philosophy 1: use opaque data and secrets to force the prover down a particular path; valid proofs are those producing ultra-rare collisions. The verifier thus generates some secrets along with associated opaque data (hidden behind a hard discrete log problem, say) called the proving key to be handed to the prover. The proving key is generated in such a way that for one to generate a convincing proof from it, one has to

  1. either comply with the verifier and produce a proof according to protocol; such proofs encode polynomials that are low-degree by construction;
  2. or successfully break a cryptographically hard problem (e.g. variants of discrete log computations).

For checking divisibility conditions, the relevant fact is that two distinct low degree polynomials $P$ and $Q$ virtually never[5] produce a collision when evaluated at a random (secret) point. Thus, producing an exceedingly rare collision $P(s) = Q(s)$ at a random (secret) point $s$ is seen as strong supporting evidence for the claim "$P = Q$". By extension, a collision of the form $A(s)H(s) = B(s)$ is interpreted as strong evidence for a divisibility claim "$A \mid B$".

Philosophy 2: no secrets, find another way to enforce low-degreeness / divisibility conditions. Schemes of the first kind had low-degreeness baked into them. For schemes where the prover isn't forced down a narrow path for constructing polynomial commitments, this is no longer the case.

This is immediately problematic: producing collisions between polynomials becomes easy if one is free to use large degree polynomials. Checking for low-degreeness thus becomes the central problem. And a difficult problem it certainly is. Checking low-degreeness on the nose is computationally demanding, at least as demanding as the original computation the verifier may wish to bypass. Proofs of proximity to low-degree maps are an efficient substitute for on-the-nose low-degree tests. They don't prove that a map is low-degree; they show that a map $f$ is (likely) in the immediate neighborhood of some low degree map whose properties may be extracted from $f$ by other means.

Once the verifier has sufficient supporting evidence for believing in low-degreeness (or at least proximity to a low degree map), checking divisibility conditions follows much the same path as before: open commitments at a random point $r$ (which usually need not be secret) and check $A(r)H(r) \stackrel{?}{=} B(r)$ in the clear.
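To make this "check in the clear" concrete, here is a minimal Python sketch of the random-evaluation check $A(r)H(r) \stackrel{?}{=} B(r)$ over a toy prime field. The field size, the particular polynomials and the helper names are all illustrative, not part of any real protocol.

```python
# A minimal sketch of "checking divisibility in the clear": to support the
# claim that A divides B with quotient H (i.e. A*H = B), evaluate all three
# polynomials at a random point r and check A(r)*H(r) == B(r) mod p.
import random

p = 2**31 - 1  # a small prime field F_p, stand-in for a 256-bit field

def ev(coeffs, x):
    """Evaluate a polynomial (coefficients in low-degree-first order) at x."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

A = [5, 0, 1]          # A(X) = X^2 + 5
H = [3, 7]             # H(X) = 7X + 3
# B = A * H, computed honestly so that the check passes
B = [0] * (len(A) + len(H) - 1)
for i, a in enumerate(A):
    for j, h in enumerate(H):
        B[i + j] = (B[i + j] + a * h) % p

r = random.randrange(p)
assert ev(A, r) * ev(H, r) % p == ev(B, r)  # a collision here is overwhelming evidence
```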

Proofs of proximity vs the GLRSW scheme

The GLRSW scheme we will describe below can be seen as a crude means to discriminate between maps: to distinguish those that are very close to being polynomial from those that are far from polynomial. In that sense it can work like a proof of proximity scheme. Indeed, the purpose of such a scheme is to

  • ACCEPT with probability 1 low degree polynomial functions
  • REJECT with high probability maps that are far from any low degree polynomial function.

The behaviour of such a proof of proximity scheme is illustrated below: low degree maps (here the central red dot) are to be accepted all the time, maps that are outside of some small neighborhood of the set of low degree polynomial maps are to be rejected with high probability.

And then there is a grey area (a small Hamming neighborhood of some low degree polynomial map) where the test's behaviour is unspecified. The GLRSW scheme, on the other hand, fulfils the complementary (or dual) role of "discovering" the nearest low degree map when fed one of the maps in that small neighborhood.

  • Proofs of proximity can be seen as means of amplifying the distinction between polynomial maps and their distant neighbors,
  • the GLRSW scheme can be understood as contracting small neighborhoods of polynomial functions onto that polynomial function.

So while proof of proximity schemes are in the business of discrimination, GLRSW is in the business of rehabilitation. The next sections are our attempt to make the above picture explicit.

The barren wasteland of the space of all maps

Let us think for a moment about the space $\mathcal{F}$ of all maps $f : \mathbb{F} \to \mathbb{F}$:
$$\mathcal{F} = \{\text{set maps } \mathbb{F} \to \mathbb{F}\}$$
(similar mental pictures will apply to the space of all maps $f : S \to \mathbb{F}$ for some large subset $S \subseteq \mathbb{F}$). When $\mathbb{F}$ is a large finite field, $\mathcal{F}$ is a very large set:
$$\#\mathcal{F} = (\#\mathbb{F})^{\#\mathbb{F}}$$
For instance, if $\mathbb{F}$ is a 256 bit prime field, we get $\#\mathcal{F} \approx 2^{2^{264}}$. In other words, $\mathcal{F}$'s size is beyond comprehension. Most of the points in that space[6] are totally unstructured "random" maps.

The overwhelming majority of points of $\mathcal{F}$ are maps whose interpolation polynomial has degree $\#\mathbb{F} - 1$, the maximum possible.

Indeed, there are only $(\#\mathbb{F})^{\#\mathbb{F}-1}$ polynomial maps of degree $< \#\mathbb{F}-1$, i.e. the probability of stumbling on a polynomial map of degree $< \#\mathbb{F}-1$ by chance is $1/\#\mathbb{F} \approx 2^{-256}$.

For all intents and purposes low degree maps don't exist.

For instance, if we consider the collection of all polynomial maps of degree $\le 10^9$, a very reasonable bound for any real world application, the likelihood of stumbling on such a function by chance is 1 in $(\#\mathbb{F})^{\#\mathbb{F} - 10^9}$ which is, for a 256 bit field, less than 1 in $2^{2^{263}}$.
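The counting argument can be checked by brute force in a toy setting. The following Python sketch (with the illustrative field size $p = 5$) enumerates all maps $\mathbb{F}_5 \to \mathbb{F}_5$ and all degree $\le 1$ polynomial maps among them:

```python
# Brute-force illustration of how rare low degree maps are, in a toy field F_5:
# of the 5^5 = 3125 maps F_5 -> F_5, exactly 5^2 = 25 are polynomial of
# degree <= 1. (In a 256-bit field the same ratio becomes astronomically small.)
from itertools import product

p = 5
all_maps = set(product(range(p), repeat=p))             # all 3125 maps, as value tables
low_degree = {tuple((a * x + b) % p for x in range(p))  # value tables of aX + b
              for a in range(p) for b in range(p)}

assert low_degree <= all_maps
print(len(low_degree), "/", len(all_maps))              # 25 / 3125, i.e. 1 in p^3
```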

The civilizing influence of low degree maps on their neighbors

A neighborhood of maps

While low degree maps are exceedingly rare, they do, of course, exist. Furthermore, they exert a taming influence on their immediate neighbor functions. When talking about neighboring functions we are implicitly talking about distances between maps. To measure distances between maps, i.e. points in $\mathcal{F}$, we use the Hamming distance. The Hamming distance $d(f,g) \in [0,1]$ between two maps $f, g : S \to \mathbb{F}$ is the proportion of inputs in $S$ on which they differ. Thus if $f = g$ they differ on no inputs and $d(f,g) = 0$, while if $f(s) \neq g(s)$ for all $s \in S$, then $d(f,g) = 1$. Thus, for any $\epsilon > 0$ we can consider the "neighborhood of $f \in \mathcal{F}$" comprised of all maps $g$ with $d(f,g) < \epsilon$.

The picture below is a mental picture of a neighborhood of a low degree map ${\color{red}P}$ in $\mathcal{F}$. It's a pretty large picture, and opening it in a new window will make its features clearer.

In it we depict

  • a low degree map ${\color{red}P}$,
  • a somewhat close neighbor ${\color{orange}f}$ (with, say, $d({\color{orange}f}, {\color{red}P}) \approx .001$),
  • a slightly remote neighbor ${\color{green}g}$ (with, say, $d({\color{green}g}, {\color{red}P}) \approx .1$),
  • and a map ${\color{blue}h}$ that bears some resemblance (yet not all that much) to ${\color{red}P}$ (with, say, $d({\color{blue}h}, {\color{red}P}) \approx .5$).

The white halo around ${\color{red}P}$ is meant to represent the small neighborhood on which the civilizing influence of ${\color{red}P}$ can be felt. And then there is a dense and featureless blue ocean of "generic maps".

Taming / civilizing influence?

What do we mean by "taming influence"? First of all, let us say clearly what we don't mean. We don't mean to suggest that close neighbors of a low degree map ${\color{red}P}$ are themselves low degree. Far from it! Close neighbors are overwhelmingly likely to be of degree $\#\mathbb{F}-1$, the maximum degree possible. Yet, to the casual observer working with incomplete information (e.g. a probabilistic polynomial time Turing machine with oracle access to a close neighbor of ${\color{red}P}$) they are nigh indistinguishable from the polynomial map ${\color{red}P}$ in whose light they bask. Indeed:

Close neighbors of low degree maps exhibit many of the same local redundancies which characterize low degree maps.

Let us qualify that statement: they do with exceedingly high probability. When tested on random inputs:

  • maps that are close to a polynomial exhibit many of the local redundancies exhibited by low degree polynomials
  • maps that are far from low degree polynomial maps don't.

This loose dichotomy is the basis for proofs of proximity. We can probabilistically test for proximity to a polynomial by checking if the map in question exhibits the expected amount of redundancies.

Polynomials <=> built in redundancy

Let us now be slightly more precise about the way in which polynomials exhibit redundancy. This is pretty basic stuff (interpolation understood through linear algebra) and can nicely be illustrated. In one word, this boils down to the fact that the space $\mathcal{P}_d$ of polynomial maps of degree $< d$ is a vector space of dimension $d$, and that for distinct points $x_1, x_2, \dots, x_d \in \mathbb{F}$ the linear functions "evaluation at $x_i$", $i = 1, \dots, d$,
$$\mathrm{ev}_{x_i} : \begin{cases} \mathcal{P}_d \to \mathbb{F} \\ f \mapsto f(x_i) \end{cases}$$
form a basis of the dual space $\mathcal{P}_d^*$.

So, for instance, while there is a whole "line's worth" of polynomial maps $f$ of degree $< 2$ satisfying the constraint $f(x_1) = y_1$, there is but one polynomial map of degree $< 2$ satisfying the extra constraint $f(x_2) = y_2$, as depicted below:

*(Figure: one constraint vs. two constraints.)*

Similarly there is a whole "line's worth" of polynomial maps $f$ of degree $< 3$ satisfying the constraints $f(x_1) = y_1$ and $f(x_2) = y_2$ (whatever the values $y_1, y_2$, as long as $x_1 \neq x_2$), but there is only one polynomial map of degree $< 3$ satisfying the extra constraint $f(x_3) = y_3$, as depicted below:

*(Figure: all degree $<3$ polynomial maps with two values fixed vs. the single degree $<3$ polynomial map satisfying a third constraint.)*

This is simply expressing the fact that for distinct $x_1, \dots, x_d \in \mathbb{F}$ the "evaluation at $x_i$" linear forms form a basis of the dual $\mathcal{P}_d^*$.

Its (ante)dual basis is the basis of Lagrange polynomials. Indeed, to produce the only polynomial function $f \in \mathcal{P}_d$ such that $f(x_i) = y_i$ for $i = 1, \dots, d$, one forms
$$f = \sum_{i=1}^d y_i L_i$$
where the $L_i$ are the Lagrange polynomials of the set $\{x_1, \dots, x_d\}$:
$$L_i = \prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{X - x_j}{x_i - x_j}$$

The fact that the family of linear functionals $(\mathrm{ev}_{x_1}, \dots, \mathrm{ev}_{x_d})$ forms a basis of the space of linear functionals on $\mathcal{P}_d$ also means that evaluations at points $x \in \mathbb{F} \setminus \{x_1, \dots, x_d\}$ can be expressed as linear combinations of evaluations at the $x_i$. This accounts for the "in-built redundancy" of polynomial maps.

Predictive power of polynomials

Indeed, if $f$ is a degree $< d$ polynomial function we can compute $f(x)$ for any $x$ by plugging $x$ into the formula above and using $f$'s values at the $x_i$:
$$f(x) = \sum_{i=1}^d L_i(x) f(x_i)$$
or, if we write $\lambda_i^x = L_i(x)$,
$$f(x) = \sum_{i=1}^d \lambda_i^x f(x_i).$$
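Here is a minimal Python sketch of this predictive power over a toy prime field: $d$ values of a degree $< d$ map determine its value anywhere else. All parameters ($p$, $d$, the map `f`, the points) are illustrative.

```python
# Predicting f(x) from d known values of a degree < d map over F_p,
# using the weights lambda_i^x = L_i(x) from the formula above.
p = 101
d = 3                                     # f has degree < 3
f = lambda x: (4 * x * x + 2 * x + 7) % p

xs = [1, 2, 3]                            # d distinct interpolation points
x = 57                                    # point whose value we predict

def lagrange_weights(xs, x):
    """lambda_i^x = prod_{j != i} (x - x_j) / (x_i - x_j) over F_p."""
    ws = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        ws.append(num * pow(den, p - 2, p) % p)   # Fermat inversion
    return ws

ws = lagrange_weights(xs, x)
prediction = sum(w * f(xi) for w, xi in zip(ws, xs)) % p
assert prediction == f(x)                 # d values determine all the others
```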

An inefficient proximity test

We describe a (crude) proximity test to low-degree polynomials. The redundancy within low-degree polynomials is at the heart of it. As we tried to suggest earlier, these redundancies remain by and large true for functions that are very close to low degree polynomial functions, in spite of the fact that these functions are hopelessly high degree polynomial functions. These maximal degree polynomial maps do their darndest to emulate low degree maps. Such maps will usually pass the following test: repeatedly draw random points $x, x_1, \dots, x_d$ and check that $f(x) = \sum_{i=1}^d \lambda_i^x f(x_i)$.

The number of repetitions $N$ required will depend on a target soundness.
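Below is a minimal Python sketch of this naive test, under illustrative assumptions: `f` stands in for the map being tested (here an honest low degree one), `N`, `p` and `d` are toy parameters, and `lagrange_weights` is the same helper as in the previous sketch.

```python
# The naive test: N times, draw fresh random points, recompute the Lagrange
# weights from scratch, and check the interpolation identity on the oracle f.
import random

p, d, N = 101, 3, 40
f = lambda x: (4 * x * x + 2 * x + 7) % p        # an honest degree < d oracle

def lagrange_weights(xs, x):
    ws = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        ws.append(num * pow(den, p - 2, p) % p)
    return ws

def naive_test(f):
    for _ in range(N):
        pts = random.sample(range(p), d + 1)     # distinct x, x_1, ..., x_d
        x, xs = pts[0], pts[1:]
        ws = lagrange_weights(xs, x)             # O(d) products/inversions *per round*
        if sum(w * f(xi) for w, xi in zip(ws, xs)) % p != f(x):
            return "REJECT"
    return "ACCEPT"

print(naive_test(f))                             # ACCEPT: f really is low degree
```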

A simple criterion to rule out maps that are far from low degree maps

Let us put $\epsilon = d(f, \mathcal{P}_d) \ll 1$: in other words, there is some low-degree polynomial $P_f$ that agrees with $f$ on a random input with probability $1 - \epsilon$. Then the equation
$$f(x) \stackrel{?}{=} \sum_{i=1}^d \lambda_i^x f(x_i),$$
for randomly and independently sampled $x, x_1, \dots, x_d \in \mathbb{F}$, is satisfied with probability at least $(1-\epsilon)^{d+1}$[7] and fails with probability at least $(d+1)\,\epsilon\,(1-\epsilon)^d$[8]. Using these simple estimates and fixing some desired threshold $\epsilon_0$, one can find $N$ so that with overwhelming probability the test will reject maps that are at least $\epsilon_0$ far from low degree.

On the wastefulness of this test

One could establish "soundness results" for this test. The proof sketched below would likely adapt, albeit with a much larger parameter space: $\mathbb{F}^{d+1}$ as opposed to $\mathbb{F}^2$. But one should note that this test is particularly impractical. The reason is that every cycle requires recomputing a whole new set of weights $\lambda_1^x, \dots, \lambda_d^x$. This requires heavy lifting: at least $O(d)$ products and inversions per round. The actual check, however, namely testing
$$f(x) \stackrel{?}{=} \sum_{i=1}^d \lambda_i^x f(x_i),$$
requires minimal computation.

Outline of the GLRSW scheme

The test we describe here is a refinement of the previous one. It works very much the same, but it manages to bypass the constant recomputation of the weights $\lambda_1^x, \dots, \lambda_d^x$. The trick is to select $x, x_1, x_2, \dots, x_d$ at every round so that the associated weights stay the same throughout.

A simple way to ensure this is to initially choose $d+1$ distinct points $b, a_1, \dots, a_d$, compute once and for all the
$$\alpha_i = \prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{b - a_j}{a_i - a_j}, \quad i = 1, \dots, d,$$
and at every round draw random $s, t \in \mathbb{Z}/p\mathbb{Z}$ and define
$$\begin{cases} x \leftarrow s + bt \\ x_1 \leftarrow s + a_1 t \\ x_2 \leftarrow s + a_2 t \\ \quad\vdots \\ x_d \leftarrow s + a_d t \end{cases}$$

It is quite obvious that for all $s, t \in \mathbb{Z}/p\mathbb{Z}$,
$$\lambda_i^x = \prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{x - x_j}{x_i - x_j} = \prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{(s + bt) - (s + a_j t)}{(s + a_i t) - (s + a_j t)} = \prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{b - a_j}{a_i - a_j} = \alpha_i$$
since the $s$'s disappear both in the numerators and denominators when taking differences, and the $t$'s factor out in every quotient. We thus have a new, improved test:

Note. The reader may have noticed that the definition of the $\lambda_i^x$ doesn't make sense when $t = 0$: we are dividing by zero. This isn't really a problem: for $f$ a polynomial function of degree $< d$, the functional equation
$$f(s + bt) \stackrel{?}{=} \sum_{i=1}^d \alpha_i f(s + a_i t)$$
does still hold in that case, which is all we need.
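Here is a hedged Python sketch of the improved test: the $\alpha_i$ are computed once, and each round costs only $d+1$ oracle queries plus a handful of multiplications and additions, with no inversions. All parameters are illustrative, and we take $b = 0$ for concreteness (the convention adopted later in the post).

```python
# The GLRSW test: weights alpha_i computed once from fixed points b, a_1..a_d;
# each round draws s, t and checks f(s + b t) == sum_i alpha_i f(s + a_i t).
import random

p, d, N = 101, 3, 40
f = lambda x: (4 * x * x + 2 * x + 7) % p        # an honest degree < d oracle

b = 0
a = [1, 2, 3]                                    # d distinct points, all != b

# alpha_i = prod_{j != i} (b - a_j) / (a_i - a_j), computed once and for all
alpha = []
for i, ai in enumerate(a):
    num, den = 1, 1
    for j, aj in enumerate(a):
        if j != i:
            num = num * (b - aj) % p
            den = den * (ai - aj) % p
    alpha.append(num * pow(den, p - 2, p) % p)

def glrsw_test(f):
    for _ in range(N):
        s, t = random.randrange(p), random.randrange(p)
        lhs = f((s + b * t) % p)
        rhs = sum(al * f((s + ai * t) % p) for al, ai in zip(alpha, a)) % p
        if lhs != rhs:                           # cheap: no inversions per round
            return "REJECT"
    return "ACCEPT"

print(glrsw_test(f))
```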

Details

In this second half of the post we dive into details about the GLRSW scheme, in particular how it rehabilitates approximate polynomials by computing their closest low-degree polynomial. The proofs we present below are fleshed out versions of those from the original paper Self Testing/Correcting for Polynomials and Approximate Functions.

Conventions

In what follows, $\mathbb{F}$ is a finite prime field and $d$ is a positive integer with $d < \#\mathbb{F}$. We will sometimes write $n = \#\mathbb{F}$. We define the degree of a map $f : \mathbb{F} \to \mathbb{F}$ to be the degree of its interpolation polynomial. Thus $\deg(f) < \#\mathbb{F}$. We write $\mathbb{F}_d[X]$ for the space of all degree $\le d$ polynomials with coefficients in $\mathbb{F}$. We furthermore say that a map $f : \mathbb{F} \to \mathbb{F}$ has low degree if $\deg(f) \le d$.

In line with the GLRSW paper, we will consider an efficient program $P$ that supposedly computes a certain function $f$. We write $P(x)$ for the output of $P$ on input $x$. The program $P$ should compute $f$ correctly most of the time; that is, we expect $P(x) = f(x)$ for most inputs $x \in \mathbb{F}$. One may think of $P$, for instance, as verifying a Merkle branch from (supposedly) $f(x)$ to a Merkle root representing $f$.

A quick rundown

Step 1: Characterizing low-degreeness

We first establish that a certain form of redundancy in the values of a map $f : \mathbb{F} \to \mathbb{F}$ is equivalent to the interpolating polynomial of that map having low degree (i.e. degree $\le d$).

  • One direction (low-degreeness implies redundancy) is easily checked. It is a fact about abstract polynomials.
  • The converse, i.e. the fact that this form of redundancy in maps precisely characterizes those maps with low degree interpolating polynomial, is established in the next section.

The converse seemingly requires one to work in a prime field to avoid having to divide by binomial coefficients which may be zero (because of positive characteristic).

The helper function g

We next introduce a helper function $g : \mathbb{F} \to \mathbb{F}$. The definition of $g$ can be explained as follows. If $P$ were indeed a low-degree polynomial function, then we would be able to predict any of its values $P(x)$ starting with any $d+1$ of its values $P(x_0), \dots, P(x_d)$. We would simply use the interpolation formula to make a (correct) prediction. What if $P$ isn't polynomial, though? Then different sets of values $x_0, \dots, x_d$ might lead to different predictions of the value of $P(x)$. We thus define

Informal definition of $g$. For $x \in \mathbb{F}$, $g(x)$ is the most popular prediction we get by interpolating values of $P$.

This definition is a little unsatisfying, though. There is potentially room for ambiguity:

Potential source of ambiguity 1. Of the predicted values, there might be two values that are tied for most popular predicted value.

That would be unfortunate, and it is something we have to deal with. A simple (yet not completely satisfactory) work-around is to arbitrarily break ties. Furthermore, one might question the value of this prediction: what if 998 values are each predicted $0.1001\%$ of the time, and another value is predicted $0.1002\%$ of the time? That value would be our definition of $g(x)$, but its status as a majority value is dubious.

Potential source of ambiguity 2. There might be no value that is predicted $> 50\%$ of the time.
Note. It is important to note that

  • $g$ is never explicitly computed by anybody,
  • nor is $g$ meant to be efficiently computable.

If the verifier had the computational power to compute $g$ even on a single value, it might as well skip computing $g$ altogether and directly verify that $P$ is polynomial. The function $g$ is simply here to be used in abstract arguments.

What matters most is that this $g$, whatever it may be, is unambiguously defined and contains information about $P$. The fact that $g$ may indeed be unambiguously defined, without arbitrary choices (such as arbitrarily resolving ties), is the first thing we establish below.

Step 2: Analyzing the helper function and drawing conclusions

We formulate the expectations one might place on $g$:

Expectation 1. If $P$ is close enough to being polynomial, there is no ambiguity in the definition of $g$.

This is proven in Lemma 1. This result also has psychological value, as it tells us that we don't have to deal with the potential ambiguities listed above; see its Corollary. In this context, close enough means $\delta < \frac{1}{4(d+1)}$.

Expectation 2. If $P$ computes a map that is close to polynomial, then $g$ and $P$ ought to agree most of the time.

This is indeed true and established in Lemma 2. In this context, most of the time means on a proportion $\ge 1 - 2\delta$ of all inputs.

Expectation 3. If $P$ is close enough to some low-degree polynomial map, then the helper function $g$ is that low-degree polynomial map.

This is indeed true and established in Lemma 3. Thus $g$ rehabilitates $P$, provided $P$ is accurate enough.

Low-degreeness implies redundancy

The following lemma is the theoretical foundation of the test. We consider pairwise distinct field elements $b, a_0, a_1, \dots, a_d \in \mathbb{F}$. These will remain constant throughout.

Lemma. There exist coefficients $\alpha_0, \dots, \alpha_d \in \mathbb{F}$ such that for all $P \in \mathbb{F}_d[X]$,
$$P(X + bT) = \sum_{k=0}^d \alpha_k P(X + a_k T)$$

Proof. We start with a special case of $P$ which turns out to be sufficient. Thus, expand, for $P = X^d$ and arbitrary $\alpha_0, \dots, \alpha_d \in \mathbb{F}$, both sides of the above:
$$\begin{cases} (X + bT)^d = \sum_{k=0}^d \binom{d}{k} b^k T^k X^{d-k} \\ \sum_{k=0}^d \alpha_k (X + a_k T)^d = \sum_{k=0}^d \binom{d}{k} \Big( \sum_{l=0}^d \alpha_l a_l^k \Big) T^k X^{d-k} \end{cases}$$

To achieve equality, it is enough that the $\alpha_i$ be a solution of the linear system
$$\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ a_0 & a_1 & a_2 & \cdots & a_d \\ a_0^2 & a_1^2 & a_2^2 & \cdots & a_d^2 \\ \vdots & & & & \vdots \\ a_0^d & a_1^d & a_2^d & \cdots & a_d^d \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_d \end{pmatrix} = \begin{pmatrix} 1 \\ b \\ b^2 \\ \vdots \\ b^d \end{pmatrix}$$

This Vandermonde system is known to be invertible, and so there is a unique solution $(\alpha_0, \dots, \alpha_d) \in \mathbb{F}^{d+1}$. We note that had we written a similar system for $X^k$ with $k \le d$, the resulting constraints would have been a subset of those for $P = X^d$. Hence the $\alpha_i$ we just identified work for all degree $\le d$ polynomials.
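The lemma can be checked numerically. The following Python sketch (illustrative parameters, with Gaussian elimination spelled out to stay dependency-free) solves the Vandermonde system above for the $\alpha_k$ over a toy prime field and verifies the identity on random inputs for a random $P$ of degree $\le d$:

```python
# Solve VdM(a_0..a_d) * alpha = (1, b, ..., b^d) over F_p, then verify
# P(x + b t) == sum_k alpha_k P(x + a_k t) for a random P of degree <= d.
import random

p, d = 101, 4
b = 5
a = [1, 2, 3, 4, 6]                        # d+1 points, pairwise distinct, != b

inv = lambda x: pow(x, p - 2, p)

# Augmented matrix: row k is (a_0^k, ..., a_d^k | b^k); reduce mod p.
M = [[pow(ai, k, p) for ai in a] + [pow(b, k, p)] for k in range(d + 1)]
for col in range(d + 1):
    piv = next(r for r in range(col, d + 1) if M[r][col] != 0)
    M[col], M[piv] = M[piv], M[col]
    t = inv(M[col][col])
    M[col] = [x * t % p for x in M[col]]
    for r in range(d + 1):
        if r != col and M[r][col]:
            M[r] = [(x - M[r][col] * y) % p for x, y in zip(M[r], M[col])]
alpha = [M[r][-1] for r in range(d + 1)]

# Check the identity on random (x, t) for a random polynomial of degree <= d.
P = [random.randrange(p) for _ in range(d + 1)]
ev = lambda z: sum(c * pow(z, k, p) for k, c in enumerate(P)) % p
for _ in range(100):
    x, t = random.randrange(p), random.randrange(p)
    assert ev((x + b * t) % p) == sum(al * ev((x + ai * t) % p)
                                      for al, ai in zip(alpha, a)) % p
```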

Redundancy implies low degreeness

In view of the previous lemma, any degree $\le d$ polynomial map $f : \mathbb{F} \to \mathbb{F}$ satisfies
$$\forall s, t \in \mathbb{F}, \quad f(s + bt) = \sum_{k=0}^d \alpha_k f(s + a_k t)$$
Since this is the basis of the proximity test above, one should wonder about the converse: are degree $\le d$ maps the only maps $f : \mathbb{F} \to \mathbb{F}$ satisfying this property? To answer that question let us define, for every map $f : \mathbb{F} \to \mathbb{F}$, a new map $\hat{f} : \mathbb{F} \times \mathbb{F} \to \mathbb{F}$ by the formula
$$\forall s, t \in \mathbb{F}, \quad \hat{f}(s,t) := f(s + bt) - \sum_{i=0}^d \alpha_i f(s + a_i t)$$
We can thus restate the previous question as

Question. Are degree $\le d$ maps the only maps $f : \mathbb{F} \to \mathbb{F}$ satisfying $\hat{f} = 0$?

The answer is YES:

Lemma. Let $f : \mathbb{F} \to \mathbb{F}$ be a map. The following are equivalent:

  • $\hat{f} = 0$
  • $\deg(f) \le d$

Proof. The proof isn't complicated. We use polynomial interpolation of bivariate maps to convert the problem from one about maps to one about polynomials. First of all, bivariate polynomial interpolation is possible. The map $\mathbb{F}_{n-1,n-1}[X,Y] \to \mathcal{F}(\mathbb{F} \times \mathbb{F}, \mathbb{F})$ that sends a bivariate polynomial $P$ of bidegree $\le (n-1, n-1)$ to the associated function $\mathbb{F} \times \mathbb{F} \to \mathbb{F},\ (x,y) \mapsto P(x,y)$, is an isomorphism. This is easily seen using the polynomials $L_{a,b}$ defined by
$$L_{a,b}(X,Y) = \left( \prod_{\substack{\alpha \in \mathbb{F} \\ \alpha \neq a}} \frac{X - \alpha}{a - \alpha} \right) \left( \prod_{\substack{\beta \in \mathbb{F} \\ \beta \neq b}} \frac{Y - \beta}{b - \beta} \right)$$
which vanish everywhere except at the one point $(a,b) \in \mathbb{F}^2$.

Similarly to what we did with functions, we can define, for any univariate polynomial $P \in \mathbb{F}_{n-1}[X]$, a bivariate polynomial $\hat{P} \in \mathbb{F}_{n-1,n-1}[S,T]$ like so:
$$\hat{P} = P(S + bT) - \sum_{i=0}^d \alpha_i P(S + a_i T)$$
The procedures $f \mapsto \hat{f}$ and $P \mapsto \hat{P}$ are compatible with interpolation in the sense that if $f$ is a map $\mathbb{F} \to \mathbb{F}$ and $P_f$ is its degree $< n$ interpolating polynomial, then $\widehat{P_f}$ is $\hat{f}$'s bivariate interpolation polynomial.

Now suppose $f$ satisfies $\hat{f} = 0$. Then $\widehat{P_f} = 0$ and if $e$ is $f$'s degree (i.e. $e = \deg(P_f)$), then by looking solely at the degree $e$ term of $P_f$ we get that
$$(S + bT)^e = \sum_{i=0}^d \alpha_i (S + a_i T)^e$$
and by isolating the $S^{e-j} T^j$ terms we see that
$$\forall j \in \{0, \dots, e\}, \quad b^j = \sum_{i=0}^d \alpha_i a_i^j$$
and so
$$\begin{pmatrix} 1 \\ b \\ b^2 \\ \vdots \\ b^d \\ \vdots \\ b^e \end{pmatrix} = \alpha_0 \begin{pmatrix} 1 \\ a_0 \\ a_0^2 \\ \vdots \\ a_0^d \\ \vdots \\ a_0^e \end{pmatrix} + \alpha_1 \begin{pmatrix} 1 \\ a_1 \\ a_1^2 \\ \vdots \\ a_1^d \\ \vdots \\ a_1^e \end{pmatrix} + \alpha_2 \begin{pmatrix} 1 \\ a_2 \\ a_2^2 \\ \vdots \\ a_2^d \\ \vdots \\ a_2^e \end{pmatrix} + \cdots + \alpha_d \begin{pmatrix} 1 \\ a_d \\ a_d^2 \\ \vdots \\ a_d^d \\ \vdots \\ a_d^e \end{pmatrix}$$
This is where we get a contradiction: this is impossible for $e \ge d+1$, for it would contradict the invertibility of the Vandermonde matrix $\mathrm{VdM}(b, a_0, a_1, \dots, a_d)$ (recall that $b, a_0, \dots, a_d$ are supposed pairwise distinct).

Note. We implicitly used the assumption that $\mathbb{F}$ is a prime field. Indeed, in the step where we "looked at $S^{e-j} T^j$ terms" we divided by a binomial coefficient $\binom{e}{j}$ with $e < \#\mathbb{F}$. To be legitimate in doing so (i.e. to be sure we don't divide by zero) we have to assume $n = \#\mathbb{F}$ is prime.

Expected value of interpolation: the helper function g

Suppose you are given oracle access to the values of a map $P$. In other words, you have some black box that spits out values $P(x)$ when you feed it some field element $x \in \mathbb{F}$. Suppose furthermore you are told: "That function $P : \mathbb{F} \to \mathbb{F}$ is a low degree polynomial map. How low, you ask? Its degree is $\le d$." How would you go about convincing yourself of that claim? If $P$'s domain $\mathbb{F}$ is large, it is hopeless to ask for a perfect proof: you would have to interpolate $P$ from $d+1$ values and compare the values predicted by your formula to those queried from the oracle, at every single point of $\mathbb{F}$.

You might on the other hand try to use the characterization of low degree maps presented above. While we won't be able to find efficient tests of low-degreeness per se, what we can do is reliably reject oracles $P$ that are far from any degree $\le d$ polynomial. That is, what we describe below is a probabilistic test of proximity.

To simplify things slightly, we fix $b = 0$ and consider pairwise distinct nonzero $a_0, \dots, a_d \in \mathbb{F}$. There is no reason to try to be fancy about choosing the $a_i$; taking $a_i = i + 1$, for instance, is a perfectly valid choice. If $P$ were indeed polynomial of degree $\le d$, we would have, for all $x, t \in \mathbb{F}$,
$$P(x) = \sum_{i=0}^d \alpha_i P(x + a_i t)$$

This can't reasonably be checked on all inputs $x, t$. But starting from this observation we can make some abstract definitions that turn out to be useful in the analysis. First of all we define a nonnegative $\delta = \delta_P$ by the formula
$$\delta = \frac{\big|\{(x,t) \in \mathbb{F}^2 \ \mid\ P(x) \neq \sum_{i=0}^d \alpha_i P(x + a_i t)\}\big|}{|\mathbb{F}|^2}$$
This is simply the proportion of inputs $(x,t)$ that fail the expected equation. We can also define subsets of $\mathbb{F}$ associated with $P$. For instance, $\mathrm{Maj}_P$ is the set of all $x$ such that the right hand side of this equation takes on a particular value a majority of the time, i.e. more than half of the time with respect to $t$. We thus define
$$\mathrm{Maj}_P = \Big\{ x \in \mathbb{F} \ \Big|\ \exists \theta \in \mathbb{F} \text{ s.t. } > 50\% \text{ of all } t \text{ satisfy } \theta = \sum_{i=0}^d \alpha_i P(x + a_i t) \Big\}$$

We can define a map[9] $g : \mathrm{Maj}_P \to \mathbb{F}$ by setting $g(x) = \theta$ where $\theta$ is the value that appears $> 50\%$ of the time as $\sum_{i=0}^d \alpha_i P(x + a_i t)$. No harm is done by arbitrarily extending $g$ to a map $\mathbb{F} \to \mathbb{F}$. We can also define the subset where, conveniently, that majority value coincides with the value predicted by the oracle and that predicted by the supposed low-degreeness:
$$\mathrm{Conv}_P = \Big\{ x \in \mathbb{F} \ \Big|\ > 50\% \text{ of all } t \text{ satisfy } P(x) = \sum_{i=0}^d \alpha_i P(x + a_i t) \Big\}$$

Here's a picture to accompany these definitions. We can think of every pair $(x,t) \in \mathbb{F}^2$ as a test case for the expected equality $P(x) \stackrel{?}{=} \sum_{i=0}^d \alpha_i P(x + a_i t)$. In the diagram below, blue dots signal that this expected equality holds true, yellow crosses that it fails.

For that particular oracle $P$, most pairs $(x,t)$ satisfy the expected equality. If we actually count the yellow crosses we get $\delta = \delta_P = 39/256 \approx 15.23\%$. Every column that is majority blue is, by definition, indexed by some $x \in \mathrm{Conv}_P$.

Note that three columns have fewer than half of the expected equalities hold. These are columns indexed by $x \notin \mathrm{Conv}_P$; yet some, such as the first one from the left, might still define an unambiguous majority value, e.g. the $x$-coordinate of the first red column might still belong to $\mathrm{Maj}_P$.
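These definitions are easy to explore by brute force in a toy setting. The following Python sketch builds a slightly corrupted oracle `P` (the corruption pattern and all parameters are illustrative), computes $\delta$, and recovers $g$ as the most popular prediction; for small $\delta$, $g$ coincides with the underlying low degree map:

```python
# Brute-force computation of delta and of the helper function g over F_p,
# for an oracle P that agrees with a degree <= d polynomial except on 3 inputs.
import random
from collections import Counter

p, d = 101, 2
b, a = 0, [1, 2, 3]                        # b = 0 and d+1 nonzero points a_0..a_d

inv = lambda x: pow(x, p - 2, p)
alpha = []
for i, ai in enumerate(a):                 # alpha_i = prod_{j!=i} (0-a_j)/(a_i-a_j)
    num, den = 1, 1
    for j, aj in enumerate(a):
        if j != i:
            num = num * (0 - aj) % p
            den = den * (ai - aj) % p
    alpha.append(num * inv(den) % p)

poly = lambda x: (7 * x * x + x + 3) % p   # the "true" low degree map
table = [poly(x) for x in range(p)]
for x in random.sample(range(p), 3):       # corrupt the oracle on 3 inputs
    table[x] = random.randrange(p)
P = lambda x: table[x % p]

predictions = lambda x: [sum(al * P(x + ai * t) for al, ai in zip(alpha, a)) % p
                         for t in range(p)]

delta = sum(P(x) != pr for x in range(p) for pr in predictions(x)) / p**2
g = [Counter(predictions(x)).most_common(1)[0][0] for x in range(p)]

print(f"delta = {delta:.4f}")
assert g == [poly(x) for x in range(p)]    # g rehabilitates P (delta is small)
```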

Lemma 1 - the majority value is unambiguously defined

We start with a simple lemma. Let $X, Y$ be independent, identically distributed (iid) random variables with values in a finite set $V$. Also let $v_0$ be (one of) the value(s) $X$ is likeliest to take, that is,
$$\mathbb{P}[X = v_0] = \max_{v \in V} \mathbb{P}[X = v]$$
Let $E$ be the event that $X$ and $Y$ agree, i.e. $E = [X = Y]$; then

Lemma. $\mathbb{P}[E] \le \mathbb{P}[X = v_0]$.

Proof. Decompose the event $E$ according to the common value of $X$ and $Y$:
$$E = \bigsqcup_{v \in V} [X = v] \cap [Y = v].$$
Taking probabilities, and since $X$ and $Y$ are iid,
$$\mathbb{P}[E] = \sum_{v \in V} \mathbb{P}[X = v]^2 \le \sum_{v \in V} \mathbb{P}[X = v] \, \mathbb{P}[X = v_0] = \mathbb{P}[X = v_0].$$


Fix $x \in \mathbb{F}$. Consider two independent uniformly distributed random variables $t_1$ and $t_2$ with values in $\mathbb{F}$, and the random variables
$$\begin{cases} \mathrm{Pred}_1 = \sum_{i=0}^d \alpha_i P(x + a_i t_1) \\ \mathrm{Pred}_2 = \sum_{j=0}^d \alpha_j P(x + a_j t_2) \end{cases}$$
These represent random predictions of the value of $P(x)$ obtained by interpolation. These random variables are independent and identically distributed. Also, $g(x)$ is, by definition, the most likely value of either. In light of the previous lemma we let $E$ be the event where the two predictions agree:
$$E = [\mathrm{Pred}_1 = \mathrm{Pred}_2]$$
The previous lemma tells us that
$$\mathbb{P}[E] \le \Big[ \text{proportion of } t \in \mathbb{F} \text{ such that } \sum_{i=0}^d \alpha_i P(x + a_i t) = g(x) \Big]$$
If we can find a usable lower bound for the probability $\mathbb{P}[E]$, we get a lower bound for the proportion of inputs realizing the majority value $g(x)$.

In order to produce such a lower bound, we consider the subevent
$$E \supseteq \bigcap_{i=0}^d \Big[ P(x + a_i t_1) = \sum_{j=0}^d \alpha_j P(x + a_i t_1 + a_j t_2) \Big] \quad (1)$$
$$\cap\ \bigcap_{j=0}^d \Big[ P(x + a_j t_2) = \sum_{i=0}^d \alpha_i P(x + a_i t_1 + a_j t_2) \Big] \quad (2)$$
The right hand side is indeed a subset of $E$: if all the conditions on the right are met, then
$$\begin{aligned} \mathrm{Pred}_1 = \sum_{i=0}^d \alpha_i P(x + a_i t_1) &\stackrel{(1)}{=} \sum_{i=0}^d \alpha_i \Big( \sum_{j=0}^d \alpha_j P(x + a_i t_1 + a_j t_2) \Big) \\ &= \sum_{j=0}^d \alpha_j \Big( \sum_{i=0}^d \alpha_i P(x + a_i t_1 + a_j t_2) \Big) \\ &\stackrel{(2)}{=} \sum_{j=0}^d \alpha_j P(x + a_j t_2) = \mathrm{Pred}_2 \end{aligned}$$
These subevents allow us to have the random variables $t_1, t_2$ interact. Obtaining a lower bound on $\mathbb{P}[E]$ is equivalent to getting an upper bound on $\mathbb{P}[\Omega \setminus E]$. The above gives
$$\Omega \setminus E \subseteq \bigcup_{i=0}^d \Big[ P(x + a_i t_1) \neq \sum_{j=0}^d \alpha_j P(x + a_i t_1 + a_j t_2) \Big] \ \cup\ \bigcup_{j=0}^d \Big[ P(x + a_j t_2) \neq \sum_{i=0}^d \alpha_i P(x + a_i t_1 + a_j t_2) \Big]$$

So that
$$\mathbb{P}[\Omega \setminus E] \le \sum_{i=0}^d \mathbb{P}\Big[ P(x + a_i t_1) \neq \sum_{j=0}^d \alpha_j P(x + a_i t_1 + a_j t_2) \Big] + \sum_{j=0}^d \mathbb{P}\Big[ P(x + a_j t_2) \neq \sum_{i=0}^d \alpha_i P(x + a_i t_1 + a_j t_2) \Big]$$

Now for any $i = 0, \dots, d$ the random variable $t_1' = x + a_i t_1$ is uniformly distributed and independent from $t_2$, and similarly for any $j = 0, \dots, d$ the random variable $t_2' = x + a_j t_2$ is uniformly distributed and independent from $t_1$, and so, by definition of $\delta$,
$$\mathbb{P}\Big[ P(x + a_i t_1) \neq \sum_{j=0}^d \alpha_j P(x + a_i t_1 + a_j t_2) \Big] = \mathbb{P}\Big[ P(t_1') \neq \sum_{j=0}^d \alpha_j P(t_1' + a_j t_2) \Big] = \delta$$

and similarly,
$$\mathbb{P}\Big[ P(x + a_j t_2) \neq \sum_{i=0}^d \alpha_i P(x + a_j t_2 + a_i t_1) \Big] = \mathbb{P}\Big[ P(t_2') \neq \sum_{i=0}^d \alpha_i P(t_2' + a_i t_1) \Big] = \delta$$
Injecting these into the upper bound for $\mathbb{P}[\Omega \setminus E] = 1 - \mathbb{P}[E]$ obtained previously proves

Lemma 1. For any $x \in \mathbb{F}$,
$$1 - 2(d+1)\delta \ \le\ \mathbb{P}[E] \ \le\ \mathbb{P}[\mathrm{Pred}_1 = g(x)].$$

As promised, we can thus lift any and all unpleasant ambiguities in $g$'s definition. Notice that $1 - 2(d+1)\delta > \frac{1}{2}$ as soon as $\delta < \frac{1}{4(d+1)}$.

Corollary. Suppose that $\delta < \frac{1}{4(d+1)}$; then for all $x \in \mathbb{F}$, $g(x)$ represents $> 50\%$ of the values $\sum_{i=0}^d \alpha_i P(x + a_i t)$, $t \in \mathbb{F}$. In other words, $\mathrm{Maj}_P = \mathbb{F}$.

Lemma 2 - the helper function and the oracle agree a lot

Let us make the headline precise. We just proved that whenever $\delta < \frac{1}{4(d+1)}$, $g(x)$ is unambiguously defined for all $x \in \mathbb{F}$, that is, $\mathrm{Maj}_P = \mathbb{F}$. Recall that we defined a subset $\mathrm{Conv}_P \subseteq \mathrm{Maj}_P$ of so-called "convenient" inputs. Those were the $x \in \mathbb{F}$ where not only is $g$ unambiguously defined, but $g(x)$ coincides with $P(x)$. We now establish that as long as $\delta$ is small enough, $\mathrm{Conv}_P$ will be large. I.e. if $\delta$ is small, then the helper function $g$ and the oracle agree often.

Lemma 2. We have
$$\frac{|\mathrm{Conv}_P|}{|\mathbb{F}|} \ge 1 - 2\delta$$

Proof. Let us consider the pairs $(x,t)$ where the expected equality $P(x) \stackrel{?}{=} \sum_{i=0}^d \alpha_i P(x + a_i t)$ fails. We distinguish two cases: $x \in \mathrm{Conv}_P$ and $x \notin \mathrm{Conv}_P$. Thus, by definition of $\delta$,
$$\delta = \frac{1}{|\mathbb{F}|^2} \sum_{x \in \mathbb{F}} \Big|\Big\{ t \ \mid\ P(x) \neq \sum_{i=0}^d \alpha_i P(x + a_i t) \Big\}\Big| \ \ge\ \frac{1}{|\mathbb{F}|^2} \sum_{x \notin \mathrm{Conv}_P} \Big|\Big\{ t \ \mid\ P(x) \neq \sum_{i=0}^d \alpha_i P(x + a_i t) \Big\}\Big| \ \ge\ \frac{1}{|\mathbb{F}|^2} \cdot \big|\mathbb{F} \setminus \mathrm{Conv}_P\big| \cdot \frac{|\mathbb{F}|}{2}$$
The last inequality follows from the fact that when $x \notin \mathrm{Conv}_P$, $g(x) \neq P(x)$ and thus $P(x)$ is not a majority value (or it shares that title with $g(x)$ and was arbitrarily disregarded as the choice for the majority value). In particular, the proportion of $t \in \mathbb{F}$ such that $P(x) \neq \sum_{i=0}^d \alpha_i P(x + a_i t)$ is at least $50\%$. Rearranging things, we get the inequality from the lemma.

Lemma 3 - the helper function is a low degree polynomial

We first state an informal version of the lemma to prove. A precise statement will be given at the end.

Lemma 3 (informal). If $\delta$ is small enough, then the helper function $g$ is polynomial.

The proof strategy is quite interesting: to prove that $g$ is polynomial of degree $\le d$ (or, to be precise, that $g$'s interpolating polynomial has degree $\le d$), the authors invoke the characterization of low degree functions. The goal is thus to show that for any two $x, t \in \mathbb{F}$ one has
$$g(x) = \sum_{i=0}^d \alpha_i g(x + a_i t) \quad \text{i.e.} \quad \sum_{i=0}^{d+1} \alpha_i g(x + a_i t) = 0 \qquad (\text{Eq. } {\color{red}{x,t}})$$
where we set $a_{d+1} = b = 0$ and $\alpha_{d+1} = -1$. What makes the proof strategy interesting is that the satisfaction of (Eq. ${\color{red}{x,t}}$) for a particular pair $(x,t) \in \mathbb{F}^2$, which is either true or false, is established using a probabilistic method. To wit, consider the following event
$$F = F_{x,t} = \bigcap_{i=0}^{d+1} \Big[ g(x + a_i t) = \sum_{j=0}^d \alpha_j P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) \Big] \quad (3)$$
$$\cap\ \bigcap_{j=0}^d \Big[ 0 = \sum_{i=0}^{d+1} \alpha_i P\big(x + a_j t_1 + a_i(t + a_j t_2)\big) \Big] \quad (4)$$
where $t_1$ and $t_2$ are independent uniformly distributed random variables in $\mathbb{F}$. The relation with (Eq. ${\color{red}{x,t}}$) is that, whenever the conditions in $F$ are met, one has
$$\begin{aligned} \sum_{i=0}^{d+1} \alpha_i g(x + a_i t) &\stackrel{(3)}{=} \sum_{i=0}^{d+1} \alpha_i \sum_{j=0}^d \alpha_j P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) \\ &= \sum_{j=0}^d \alpha_j \sum_{i=0}^{d+1} \alpha_i P\big(x + a_j t_1 + a_i(t + a_j t_2)\big) \stackrel{(4)}{=} 0. \end{aligned}$$
Thus, to conclude that (Eq. ${\color{red}{x,t}}$) holds it is enough to show that the event $F_{x,t}$ has nonzero probability. The goal now is to bound the probability of $F$ from below; in particular, to show that it is $> 0$ when $\delta$ is small enough.

Again, the idea will be to bound $\mathbb{P}[F^c]$ from above. First of all,
$$F^c = \bigcup_{i=0}^{d+1} \Big[ g(x + a_i t) \neq \sum_{j=0}^d \alpha_j P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) \Big] \ \cup\ \bigcup_{j=0}^d \Big[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(x + a_j t_1 + a_i(t + a_j t_2)\big) \Big]$$
so that
$$1 - \mathbb{P}[F] = \mathbb{P}[F^c] \le \underbrace{\sum_{i=0}^{d+1} \mathbb{P}\Big[ g(x + a_i t) \neq \sum_{j=0}^d \alpha_j P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) \Big]}_{\color{red}{I}} + \underbrace{\sum_{j=0}^d \mathbb{P}\Big[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(x + a_j t_1 + a_i(t + a_j t_2)\big) \Big]}_{\color{blue}{II}}$$

Bounding ${\color{red}{I}}$. Since for all $i$ the random variable $t_1 + a_i t_2$ is uniformly distributed, we can apply the result from Lemma 1 and obtain
$$\forall i, \quad \mathbb{P}\Big[ g(x + a_i t) \neq \sum_{j=0}^d \alpha_j P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) \Big] \le 2(d+1)\delta.$$

Bounding ${\color{blue}{II}}$. Take $j \in \{0, \dots, d\}$. For any $i \in \{0, \dots, d+1\}$, rearranging terms,
$$P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) = P(\boldsymbol{\xi} + a_i \boldsymbol{\tau})$$
where $\boldsymbol{\xi} = x + a_j t_1$ and $\boldsymbol{\tau} = t + a_j t_2$. Note that $\boldsymbol{\xi}$ and $\boldsymbol{\tau}$ are both uniformly distributed in $\mathbb{F}$ (since $a_j \neq 0$) and independent (since $t_1$ and $t_2$ are). Then, by definition of $\delta$,
$$\mathbb{P}\Big[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(x + a_i t + a_j(t_1 + a_i t_2)\big) \Big] = \mathbb{P}\Big[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P\big(x + a_j t_1 + a_i(t + a_j t_2)\big) \Big] = \mathbb{P}\Big[ 0 \neq \sum_{i=0}^{d+1} \alpha_i P(\boldsymbol{\xi} + a_i \boldsymbol{\tau}) \Big] = \delta.$$

Combining these bounds yields
$$1 - \mathbb{P}[F] = \mathbb{P}[F^c] \le (d+2) \cdot 2(d+1)\delta + (d+1)\delta = (d+1)(2d+5)\delta$$
i.e.
$$1 - (d+1)(2d+5)\delta \le \mathbb{P}[F].$$
Therefore, as soon as $\delta < \frac{1}{(d+1)(2d+5)}$, every event $F_{x,t}$ has positive probability, so every (Eq. ${\color{red}{x,t}}$) holds and $g$ is polynomial. We can now state Lemma 3 with precision.

Lemma 3. If $\delta < \frac{1}{(d+1)(2d+5)}$, the helper function $g$ is polynomial of degree $\le d$.

Conclusion

Lemma 3 shows how to "recover" the unique low-degree neighbor function $g$ of a map $P$, as soon as $\delta$ is small enough. Thus $g$ is the polynomial function the program $P$ is meant to be computing.

A question

Let $\mathbb{F}$ be a prime field. Let $x, x_1, \dots, x_d \in \mathbb{F}$ be pairwise distinct and, similarly, let $y, y_1, \dots, y_d \in \mathbb{F}$ be pairwise distinct. Suppose that for all $i = 1, \dots, d$,
$$\prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{x - x_j}{x_i - x_j} = \prod_{\substack{1 \le j \le d \\ j \neq i}} \frac{y - y_j}{y_i - y_j}$$
This is the case for instance if $y, y_1, \dots, y_d$ are obtained from $x, x_1, \dots, x_d$ by means of an affine transformation $u \mapsto au + b$ (with $a \neq 0$). One might ask: is the converse true?

Question. Is it necessarily the case that there exists $(a,b) \in \mathbb{F}^\times \times \mathbb{F}$ such that $y, y_1, \dots, y_d$ are obtained from the $x, x_1, \dots, x_d$ by application of the affine transformation $\phi_{a,b} : u \mapsto au + b$?

Note. The restriction to prime fields could potentially be relaxed by allowing automorphisms $\sigma \in \mathrm{Aut}(\mathbb{F})$ in the affine transformation formula:
$$\phi_{a,b,\sigma} : u \mapsto a\,\sigma(u) + b$$
with, say, $\sigma$ the Frobenius automorphism of the subfield generated by the $d$ weights above.


  1. although, more on that later. ↩︎

  2. although concrete implementations vary greatly ↩︎

  3. the precise way in which this interpolation is done varies. In particular, the domain of interpolation is usually a structured subset of a (finite) field $\mathbb{F}$: a subgroup/coset of $\mathbb{F}^\times$ or an additive subgroup/coset of $\mathbb{F}_{p^d}$. ↩︎

  4. which vectors also support ↩︎

  5. indeed, if $\deg(P), \deg(Q) \ll \#\mathbb{F}$ and if $s$ is drawn at random in a large field $\mathbb{F}$, then $P(s) = Q(s)$ happens with probability at most $\frac{\max\{\deg(P), \deg(Q)\}}{\#\mathbb{F}} \ll 1$. ↩︎

  6. i.e. we think of maps $f : \mathbb{F} \to \mathbb{F}$ as the points of $\mathcal{F}$ ↩︎

  7. when all the conditions $f(x) = P_f(x)$, $f(x_1) = P_f(x_1)$, $\dots$, $f(x_d) = P_f(x_d)$ are met simultaneously. There can of course be serendipitous collisions, but those are harder to account for. ↩︎

  8. when precisely one of the conditions $f(x) = P_f(x)$, $f(x_1) = P_f(x_1)$, $\dots$, $f(x_d) = P_f(x_d)$ isn't met. Of course more than one of these may fail, but it is then unclear whether the equation fails too. ↩︎

  9. we defined $\mathrm{Maj}_P$ precisely in order to define $g$ ↩︎