The simplest proximity-to-low-degree-polynomial test and how to rehabilitate approximate polynomials
Introduction
This post is an appetizer for a blog post in preparation on the FRI protocol. We describe what is arguably the simplest of all proof of proximity (to low degree polynomial functions)[1] schemes out there. Here and elsewhere, low degree means "of degree $\leq d$" for some fixed positive integer $d$. But our interest in this scheme is as a means to shed light on a 👻 dual 👻 problem to that of proximity testing.
This scheme was originally described in the 1991 paper Self Testing/Correcting for Polynomials and Approximate Functions by P. Gemmell, R. Lipton, R. Rubinfeld, M. Sudan and A. Wigderson (we shall sometimes use the acronym GLRSW). It has been referred to as "the Sudan test" or some variation on that name in presentations on FRI given by Eli Ben-Sasson. In StarkWare's blogpost on Low Degree Testing, it is what is called "the direct test".
Compared to modern proof of proximity schemes such as FRI it is spectacularly inefficient: it runs in linear time as opposed to FRI's polylogarithmic time. However, it has the advantage of sheer simplicity:
there is only ever one map at play (as opposed to the whole sequence of inter-dependent maps in FRI);
coherence is checked for that one map as opposed to having a "trickle down" sequence of coherence conditions relating all the maps in that sequence;
the redundancy property at the heart of this scheme is second nature to anyone familiar with polynomial interpolation.
Also, proving its "soundness properties", while involved, follows a clear path. It also illustrates well a popular outlook for proving soundness: "deal with the good cases in detail, don't bother analyzing what happens in the bad cases except for bounding the probability of something bad happening in the first place". Usually the good case is when everything happens according to plan, bad cases are everything else. This makes the GLRSW scheme a perfect entry point into proof of proximity schemes.
Role of proofs of proximity in transparent computational integrity schemes
We somewhat cryptically indicated that GLRSW is really about a dual problem to that of low-degree proximity testing. Let us try to explain that claim. So before we go any further: what are proof of proximity schemes and what are they useful for? Certainly the main use-case relevant to blockchain related applications is as part of computational integrity schemes.
Computational integrity schemes such as STARKs or SNARKs and many others are protocols whereby a prover can convince a verifier that a complex and/or confidential computation was performed as advertised. The tremendous appeal of such schemes comes from the existence of computational integrity schemes that are
far quicker to verify than it is to run the computation,
leak no private information,
are nigh impossible to twist into producing false positives.
Such schemes rely on a prover to construct a proof of computational integrity. The prover's work can be broadly separated into two phases.
The first phase, common to all such schemes[2], is the arithmetization of the computation (R1CS or algebraic execution trace for instance). The raw data produced during the computation is converted into algebraic form (typically field elements) which in turn is condensed into polynomial form by means of polynomial interpolation[3]. Polynomials are useful in this regard: besides addition and scalar multiplication[4], they support products and divisibility. Importantly, this first phase can be done so that computational integrity (i.e. validity of the underlying computation) is equivalent to the satisfaction of some algebraic constraints by the resulting polynomials. Typically low-degreeness and divisibility conditions.
The second phase, i.e. actually compiling the proof, comes down to finding an efficient commitment of these polynomials. This commitment should allow a verifier to convince themselves of the claim to be proven. In particular, of low-degreeness and divisibility conditions that may apply. There are various ways of doing this, and (at least) two competing philosophies for carrying out the second step.
Philosophy 1: use opaque data and secrets to force the prover down a particular path; valid proofs are those producing ultra-rare collisions. The verifier thus generates some secrets along with associated opaque data (hidden behind a hard discrete log problem, say) called the proving key to be handed to the prover. The proving key is generated in such a way that for one to generate a convincing proof from it, one has to
either comply with the verifier and produce a proof according to protocol; such proofs encode polynomials that are low-degree by construction;
or successfully break a cryptographically hard problem (e.g. variants of discrete log computations).
For checking divisibility conditions, the relevant fact is that two distinct low degree polynomials $P$ and $Q$ virtually never[5] produce a collision when evaluated at a random (secret) point. Thus, producing an exceedingly rare collision at a random (secret) point is seen as strong supporting evidence for the claim "$P = Q$". By extension, a collision of the form $P(z) = Q(z) \cdot Z(z)$ is interpreted as strong evidence for a divisibility claim "$Z$ divides $P$".
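To get a feel for how rare such collisions are, here is a minimal sketch (the field size, the degree and the random polynomials are toy choices of mine, not from the post): two distinct polynomials of degree at most $d$ can agree on at most $d$ points, so a single random evaluation point exposes a difference with probability at least $1 - d/p$.

```python
# Toy illustration: random-point collisions between distinct low degree
# polynomials over F_p are exceedingly unlikely (parameters are mine).
import random

p, d = 10007, 5
P = [random.randrange(p) for _ in range(d + 1)]   # coefficients of P
Q = [random.randrange(p) for _ in range(d + 1)]   # coefficients of Q

def evaluate(coeffs, x):
    return sum(a * pow(x, k, p) for k, a in enumerate(coeffs)) % p

z = random.randrange(p)                           # the random "secret" point
print(evaluate(P, z) == evaluate(Q, z))           # almost surely False
```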
Philosophy 2: no secrets, find another way to enforce low-degreeness / divisibility conditions. Schemes of the first kind had low-degreeness baked into them. For schemes where the prover isn't forced down a narrow path for constructing polynomial commitments, this is no longer the case.
This is immediately problematic: producing collisions between polynomials becomes easy if one is free to use large degree polynomials. Checking for low-degreeness thus becomes the central problem. And a difficult problem it certainly is. Checking low-degreeness on the nose is computationally demanding, at least as demanding as the original computation the verifier may wish to bypass. Proofs of proximity to low-degree maps are an efficient substitute for on-the-nose low-degree tests. They don't prove that a map $f$ is low-degree, they show that $f$ is (likely) in the immediate neighborhood of some low degree map $g$ whose properties may be extracted from $f$ by other means.
Once the verifier has sufficient supporting evidence for believing in low-degreeness (or at least proximity to a low degree map), checking divisibility conditions follows much of the same path as before: open commitments at random points (that usually need not be secret) and check in the clear.
Proofs of proximity vs the GLRSW scheme
The GLRSW scheme we will describe below can be seen as a crude means to discriminate between maps: to distinguish those that are very close to being polynomial from those that are far from polynomial. In that sense it can work like a proof of proximity scheme. Indeed, the purpose of such a scheme is to
ACCEPT low degree polynomial functions with probability 1,
REJECT with high probability maps that are far from any low degree polynomial function.
The behaviour of such a proof of proximity scheme is illustrated below: low degree maps (here the central red dot) are to be accepted all the time, maps that are outside of some small neighborhood of the set of low degree polynomial maps are to be rejected with high probability. And then there is a grey area (a small Hamming neighborhood of some low degree polynomial map) where the test's behaviour is unspecified. The GLRSW scheme on the other hand fulfils the complementary (or dual) role of "discovering" the nearest low degree map when fed one of the maps in that small neighborhood.
Proofs of proximity can be seen as means of amplifying the distinction between polynomial maps and their distant neighbors,
the GLRSW scheme can be understood as contracting small neighborhoods of polynomial functions onto that polynomial function.
So while proof of proximity schemes are in the business of discrimination, GLRSW is in the business of rehabilitation. The next sections are our attempt to make the above picture explicit.
The barren wasteland of the space of all maps
Let us think for a moment about the space $\mathbb{F}^{\mathbb{F}}$ of all maps $\mathbb{F} \to \mathbb{F}$ (similar mental pictures will apply to the space of all maps $D \to \mathbb{F}$ for some large subset $D \subset \mathbb{F}$). When $\mathbb{F}$ is a large finite field, $\mathbb{F}^{\mathbb{F}}$ is a very large set: it contains $|\mathbb{F}|^{|\mathbb{F}|}$ maps. For instance, if $\mathbb{F}$ is a 256 bit prime field, we get $|\mathbb{F}^{\mathbb{F}}| \approx \big(2^{256}\big)^{2^{256}}$. In other words, $\mathbb{F}^{\mathbb{F}}$'s size is beyond comprehension. Most of the points in that space[6] are totally unstructured "random" maps.
The overwhelming majority of points of $\mathbb{F}^{\mathbb{F}}$ are maps whose interpolation polynomial has degree $|\mathbb{F}| - 1$, the largest possible degree.
Indeed, there are only $|\mathbb{F}|^{|\mathbb{F}| - 1}$ polynomial maps of degree $< |\mathbb{F}| - 1$, i.e. the probability of stumbling on a polynomial map of degree $< |\mathbb{F}| - 1$ by chance is $1/|\mathbb{F}|$.
For all intents and purposes low degree maps don't exist.
For instance, if we consider the collection of all polynomial maps of degree $\leq d$ for any $d$ that is a reasonable bound for a real world application, the likelihood of stumbling on such a function by chance is 1 in $|\mathbb{F}|^{|\mathbb{F}| - d - 1}$, which is, for a 256 bit field, unimaginably small.
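For a toy sense of scale (the parameters below are mine and are, of course, absurdly small compared to a 256 bit field):

```python
# Over F_5 there are 5**5 = 3125 maps F_5 -> F_5, but only 5**(d+1) of them
# are polynomial of degree <= d.
p, d = 5, 1
total_maps = p ** p                      # 3125
low_degree_maps = p ** (d + 1)           # 25 affine maps
print(low_degree_maps / total_maps)      # 0.008, i.e. 1 in 125
```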
The civilizing influence of low degree maps on their neighbors
A neighborhood of maps
While low degree maps are exceedingly rare they do, of course, exist. Furthermore, they exert a taming influence on their immediate neighbor functions. When talking about neighboring functions we are implicitly talking about distances between maps. To measure distances between maps, i.e. points in $\mathbb{F}^{\mathbb{F}}$, we use Hamming distance. The Hamming distance $d(f, g)$ between two maps $f, g : \mathbb{F} \to \mathbb{F}$ is the proportion of inputs in $\mathbb{F}$ on which they differ. Thus $d(f, g) = 0$ if they differ on no inputs, while $d(f, g) = 1$ if $f(x) \neq g(x)$ for all $x \in \mathbb{F}$. Thus, for any $\epsilon > 0$ we can consider the "$\epsilon$-neighborhood of $f$" comprised of all maps $g$ with $d(f, g) \leq \epsilon$.
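A minimal sketch of that distance, with maps over a toy field represented as tables of values (field size and maps are my own choices):

```python
# Relative Hamming distance between two maps F_p -> F_p given as value tables.
p = 11

def hamming_distance(f, g):
    """Proportion of inputs in F_p on which the maps differ."""
    return sum(1 for x in range(p) if f[x] != g[x]) / p

f = [x * x % p for x in range(p)]     # the squaring map
g = list(f)
g[3] = (g[3] + 1) % p                 # corrupt a single value

print(hamming_distance(f, f))         # 0.0
print(hamming_distance(f, g))         # 1/11 ≈ 0.09
```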
The picture below is a mental picture of a neighborhood of a low degree map in $\mathbb{F}^{\mathbb{F}}$. It's a pretty large picture, and opening it in a new window will make its features clearer.
In it we depict
a low degree map $f$,
a somewhat close neighbor (with, say, a small Hamming distance to $f$),
a slightly remote neighbor (with, say, a somewhat larger Hamming distance to $f$),
and a map that bears some resemblance (yet not all that much) with $f$ (with a fairly large Hamming distance to $f$).
The white halo around $f$ is meant to represent the small neighborhood on which the civilizing influence of $f$ can be felt. And then there is a dense and featureless blue ocean of "generic maps".
… Taming / civilizing influence?
What do we mean by "taming influence"? First of all, let us say clearly what we don't mean. We don't mean to suggest that close neighbors of a low degree map are themselves low degree … Far from it. Close neighbors are overwhelmingly likely to be of degree $|\mathbb{F}| - 1$, the maximum degree possible… Yet, to the casual observer working with incomplete information (e.g. a probabilistic polynomial time Turing machine with oracle access to a close neighbor of $f$) they are nigh indistinguishable from the polynomial map $f$ in whose light they bask. Indeed …
Close neighbors of low degree maps exhibit many of the same local redundancies which characterize low degree maps.
Let us qualify that statement: they do with exceedingly high probability. When tested on random inputs:
maps that are close to a polynomial exhibit many of the local redundancies exhibited by low degree polynomials
maps that are far from low degree polynomial maps don't.
This loose dichotomy is the basis for proofs of proximity. We can probabilistically test for proximity to a polynomial by checking if the map in question exhibits the expected amount of redundancies.
Polynomials <=> built in redundancy
Let us now be slightly more clear about the way in which polynomials exhibit redundancy. This is pretty basic stuff (interpolation understood using linear algebra) and can nicely be illustrated. In a word this boils down to the fact that the space of polynomial maps of degree $\leq d$ is a vector space of dimension $d + 1$ and the linear functions "evaluation at $a$", $f \mapsto f(a)$, for $d + 1$ distinct points $a_1, \dots, a_{d+1}$ form a basis of the dual space.
So, for instance, while there is a whole "line's worth" of polynomial maps of degree $\leq 1$ satisfying the constraint $f(a_1) = \alpha_1$, there is but one polynomial map of degree $\leq 1$ satisfying the extra constraint $f(a_2) = \alpha_2$, as depicted below:
Single Constraint
Two constraints
Similarly there is a whole "line's worth" of polynomial maps of degree $\leq 2$ satisfying the constraints $f(a_1) = \alpha_1$ and $f(a_2) = \alpha_2$ (whatever the values as long as $a_1 \neq a_2$), but there is only one polynomial map of degree $\leq 2$ satisfying the extra constraint $f(a_3) = \alpha_3$, as depicted below:
Two Constraints
Three constraints
This is simply expressing the fact that for distinct $a_1, \dots, a_{d+1}$ the "evaluation at $a_i$" linear forms $f \mapsto f(a_i)$ form a basis of the dual of the space of degree $\leq d$ polynomials.
Its (ante)dual basis is the basis of Lagrange polynomials. Indeed, to produce the only polynomial function $f$ of degree $\leq d$ such that $f(a_i) = \alpha_i$ for $i = 1, \dots, d+1$, one forms $f = \sum_{i=1}^{d+1} \alpha_i L_i$ where the $L_i$ are the Lagrange polynomials of the set $\{a_1, \dots, a_{d+1}\}$:
$$L_i(X) = \prod_{j \neq i} \frac{X - a_j}{a_i - a_j}.$$
The fact that the family of linear functionals $f \mapsto f(a_i)$ forms a basis of the space of linear functionals on degree $\leq d$ polynomials also means that evaluations at other points can be expressed as linear combinations of evaluations at the $a_i$. This accounts for the "in-built redundancy" of polynomial maps. Indeed, if $f$ is a degree $\leq d$ polynomial function we can compute $f(x)$ for any $x$ by plugging $x$ into the formula above and using its values at the $a_i$:
$$f(x) = \sum_{i=1}^{d+1} f(a_i)\, L_i(x)$$
or, if we write $c_i = L_i(x)$,
$$f(x) = \sum_{i=1}^{d+1} c_i\, f(a_i).$$
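The sketch below illustrates this redundancy over a small prime field (the prime, the degree and the points $a_i$ are arbitrary choices of mine): the weights $c_i = L_i(x)$ depend only on $x$ and the $a_i$, and every degree $\leq d$ polynomial map is then forced to satisfy the resulting linear relation.

```python
# In-built redundancy of degree <= d polynomial maps over F_p (toy parameters).
import random

p, d = 97, 3

def lagrange_weights(nodes, x, p):
    """Weights c_i = L_i(x), so that f(x) = sum_i c_i * f(nodes[i]) whenever
    deg(f) <= len(nodes) - 1."""
    weights = []
    for i, a_i in enumerate(nodes):
        num, den = 1, 1
        for j, a_j in enumerate(nodes):
            if j != i:
                num = num * (x - a_j) % p
                den = den * (a_i - a_j) % p
        weights.append(num * pow(den, -1, p) % p)
    return weights

# A random polynomial of degree <= d, viewed as a map F_p -> F_p.
coeffs = [random.randrange(p) for _ in range(d + 1)]
def f(x):
    return sum(a * pow(x, k, p) for k, a in enumerate(coeffs)) % p

nodes = [1, 2, 3, 4]                  # any d+1 distinct points
x = random.randrange(p)
c = lagrange_weights(nodes, x, p)

# The value at x is a fixed linear combination of the values at the nodes.
assert f(x) == sum(ci * f(ai) for ci, ai in zip(c, nodes)) % p
```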
An inefficient proximity test
We describe a (crude) proximity test to low-degree polynomials. The redundancy within low-degree polynomials is at the heart of it. As we tried to suggest earlier, these redundancies remain by and large true for functions that are very close to low degree polynomial functions. This, in spite of the fact that these functions are hopelessly high degree polynomial functions. These maximal degree polynomial maps do their darndest to emulate low degree maps. These maps will usually pass the following test: repeat, some number of times, "draw a random point $x$ and $d + 1$ random distinct points $z_1, \dots, z_{d+1}$, compute the Lagrange weights $c_i = L_i(x)$ of $x$ with respect to the $z_i$, and check that $f(x) = \sum_i c_i f(z_i)$"; ACCEPT if every round succeeds, REJECT otherwise.
The number of repetitions required will depend on a target soundness.
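Here is a minimal sketch of that crude test (reusing the `lagrange_weights` helper from the previous snippet; the parameters and the acceptance policy are my own illustrative choices):

```python
# Crude proximity test: fresh interpolation points and fresh weights each round.
import random

p, d = 97, 3

def crude_low_degree_test(f, rounds):
    for _ in range(rounds):
        zs = random.sample(range(p), d + 1)   # d+1 distinct interpolation points
        x = random.randrange(p)               # the point whose value we predict
        c = lagrange_weights(zs, x, p)        # recomputed from scratch every round!
        if f(x) != sum(ci * f(zi) for ci, zi in zip(c, zs)) % p:
            return "REJECT"
    return "ACCEPT"
```

A genuinely low degree oracle always comes back ACCEPT, and a map that is far from low degree is overwhelmingly likely to be caught within a modest number of rounds — but every round pays the full cost of recomputing the weights.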
A simple criterion to rule out maps that are far from low degree maps
Let us put $\delta = d(f, \mathbb{F}[X]^{\leq d})$: in other words, there is some low-degree polynomial $p$ that agrees with $f$ on a random input with probability $1 - \delta$. Then the equation $f(x) = \sum_i c_i f(z_i)$ for randomly and independently sampled $x, z_1, \dots, z_{d+1}$ is satisfied with probability at least $(1 - \delta)^{d+2}$[7] and fails with probability at least $(d+2)\,\delta\,(1-\delta)^{d+1}$[8]. Using these simple estimates and fixing some desired distance threshold $\delta_0$ one can find a number of repetitions so that with overwhelming probability the test will reject maps that are at least $\delta_0$-far from low degree.
On the wastefulness of this test
One could establish "soundness results" for this test. The proof sketched below would likely adapt, albeit with a much larger parameter space: $\mathbb{F}^{d+2}$ as opposed to $\mathbb{F}^2$. But one should note that this test is particularly impractical. The reason is that every cycle requires recomputing a whole new set of weights $c_i$. This requires heavy lifting: on the order of $d^2$ products and $d + 1$ inversions. The whole point, however, is that testing should require minimal computation.
Outline of the GLRSW scheme
The test we describe here is a refinement of the previous one. It works very much the same, but it manages to bypass the constant recomputation of the weights $c_i$. The trick is to select the interpolation points at every round so that the associated weights stay the same throughout.
A simple way to ensure this is to initially choose distinct nonzero points $e_1, \dots, e_{d+1}$, compute once and for all the weights
$$c_i = \prod_{j \neq i} \frac{e_j}{e_j - e_i},$$
and at every round draw random $x, t \in \mathbb{F}$ and use the interpolation points $z_i = x + e_i t$. It is quite obvious that the Lagrange weights of $x$ with respect to the $z_i$ are precisely the $c_i$, since the $x$'s disappear both in the numerators and denominators when taking differences, and the $t$'s factor out in every quotient. We thus have a new improved test: at every round, draw random $x, t$ and check that $f(x) = \sum_{i=1}^{d+1} c_i f(x + e_i t)$.
Note. The reader may have noticed that the interpretation of the $c_i$ as Lagrange weights doesn't make sense when $t = 0$: the points $x + e_i t$ all collapse onto $x$ and we would be dividing by zero. This isn't really a problem: for polynomial functions of degree $\leq d$, the functional equation $f(x) = \sum_i c_i f(x + e_i t)$ does still hold in that case (the weights sum to 1), which is all we need.
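A sketch of the resulting test (same toy parameters, my own naming): the weights $c_i$ are now computed once and for all from the $e_i$, and each round only draws $x$ and $t$.

```python
# GLRSW-style test: fixed offsets e_i, fixed weights c_i, random x and t per round.
import random

p, d = 97, 3
e = list(range(1, d + 2))            # distinct nonzero offsets e_1, ..., e_{d+1}

c = []                               # c_i = prod_{j != i} e_j / (e_j - e_i)
for i in range(d + 1):
    num, den = 1, 1
    for j in range(d + 1):
        if j != i:
            num, den = num * e[j] % p, den * (e[j] - e[i]) % p
    c.append(num * pow(den, -1, p) % p)

def glrsw_test(f, rounds):
    for _ in range(rounds):
        x, t = random.randrange(p), random.randrange(p)
        # t = 0 is harmless: the weights sum to 1, so the check trivially passes.
        if f(x) != sum(ci * f((x + ei * t) % p) for ci, ei in zip(c, e)) % p:
            return "REJECT"
    return "ACCEPT"
```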
Details
In this second half of the post we dive into details about the GLRSW scheme, in particular how it rehabilitates approximate polynomials by computing their closest low-degree polynomial. The proofs we present below are fleshed out versions of those from the original paper Self Testing/Correcting for Polynomials and Approximate Functions.
Conventions
In what follows $\mathbb{F} = \mathbb{F}_p$ is a finite prime field and $d$ is a positive integer with $d + 1 < p$. We will sometimes simply write $\mathbb{F}$ for $\mathbb{F}_p$. We define the degree of a map $f : \mathbb{F} \to \mathbb{F}$ to be the degree of its interpolation polynomial. Thus $\deg(f) \leq p - 1$ for any map $f$. We write $\mathbb{F}[X]^{\leq d}$ for the space of all degree $\leq d$ polynomials with coefficients in $\mathbb{F}$. We furthermore say that a map $f$ has low degree if $\deg(f) \leq d$.
In line with the GLRSW paper, we will consider an efficient program $P$ that supposedly computes a certain low degree function. We write $f(x)$ for the output of $P$ on input $x$. The program should compute that function correctly most of the time, that is, we expect $f(x)$ to equal the intended value for most inputs $x$. One may think for instance of $P$ as verifying a Merkle branch from (supposedly) $f(x)$ to a Merkle root representing $f$.
A quick rundown
Step 1: Characterizing low-degreeness
We first establish that a certain form of redundancy in the values of a map is equivalent to the interpolating polynomial of that map having low degree (i.e. degree $\leq d$).
One direction (low-degreeness $\Rightarrow$ redundancy) is easily checked. It is a fact about abstract polynomials.
The converse, i.e. the fact that this form of redundancy in maps precisely characterizes those maps with low degree interpolating polynomial is established in the next section.
The converse seemingly requires one to work in a prime field to avoid having to divide by binomial coefficients which may be zero (because of positive characteristic).
The helper function g
We next introduce a helper function $g$. The definition of $g$ can be explained as follows. If $f$ were indeed a low-degree polynomial function, then we would be able to predict any of its values $f(x)$ starting with the values $f(x + e_1 t)$, $f(x + e_2 t)$, …, $f(x + e_{d+1} t)$ for any $t$. We would simply use the interpolation formula to make a (correct) prediction. What if $f$ isn't polynomial, though? Then different sets of values might lead to different predictions of the value of $f(x)$. We thus define
Informal definition of $g$. For $x \in \mathbb{F}$, $g(x)$ is the most popular prediction $\sum_i c_i f(x + e_i t)$ we get by interpolating values of $f$, as $t$ ranges over $\mathbb{F}$.
This definition is a little unsatisfying, though. There is potentially room for ambiguity:
Potential source of ambiguity 1. Of the $|\mathbb{F}|$ predicted values, there might be two values that are tied for most popular predicted value.
That would be unfortunate, and it is something we have to deal with. A simple (yet not completely satisfactory) work-around is to arbitrarily break ties. Furthermore, one might question the value of this prediction: what if 998 values are each predicted a tiny fraction of the time, and another value is predicted only marginally more often? That value would be our definition of $g(x)$, but its value as a majority value is dubious.
Potential source of ambiguity 2. There might be no value that is predicted more than half of the time.
Note. It is important to note that
$g$ is never explicitly computed by anybody,
nor is $g$ meant to be efficiently computable.
If the verifier had the computational power to compute $g$ even on a single value it might as well skip computing $g$ altogether and directly verify that $f$ is polynomial. The function $g$ is simply here to be used in abstract arguments.
What matters most is that this $g$, whatever it may be, is unambiguously defined and contains information about $f$. The fact that $g$ may indeed be unambiguously defined, and without arbitrary choices (such as arbitrarily resolving ties), is the first thing we establish below.
Step 2: Analyzing the helper function and drawing conclusions
We formulate the expectations one might place on $g$:
Expectation 1. If $f$ is close enough to being polynomial, there is no ambiguity in the definition of $g$.
This is proven in Lemma 1. This result also has psychological value, as it tells us that we don't have to deal with the potential ambiguities listed above, see its Corollary. In this context, close enough means $\delta < \frac{1}{4(d+1)}$, where $\delta$ is the failure probability of the test, defined below.
Expectation 2. If the program computes a map $f$ that is close to polynomial, then $f$ and $g$ ought to agree most of the time.
This is indeed true and established in Lemma 2. In this context, most of the time means on a proportion $\geq 1 - 2\delta$ of all inputs.
Expectation 3. If $f$ is close enough to some low-degree polynomial map, then the helper function $g$ is that low-degree polynomial map.
This is indeed true and established in Lemma 3. Thus $g$ rehabilitates $f$ provided the program is accurate enough.
Low-degreeness implies redundancy
The following lemma is the theoretical foundation of the test. We consider pairwise distinct nonzero field elements $e_1, \dots, e_{d+1}$. These will remain constant throughout.
Lemma. There exist coefficients $c_1, \dots, c_{d+1}$ such that for all polynomials $f$ of degree $\leq d$ and all $x, t \in \mathbb{F}$,
$$f(x) = \sum_{i=1}^{d+1} c_i\, f(x + e_i t).$$
Proof. We start with a special case of $f$, namely $f = X^d$, which turns out to be sufficient. Thus, expand, for $f = X^d$ and arbitrary $x, t$, both sides of the above:
$$x^d \quad\text{and}\quad \sum_{i=1}^{d+1} c_i (x + e_i t)^d = \sum_{j=0}^{d} \binom{d}{j} x^{d-j} t^j \Big( \sum_{i=1}^{d+1} c_i e_i^j \Big).$$
To achieve equality, it is enough that the $c_i$ be a solution of the linear system
$$\sum_{i=1}^{d+1} c_i e_i^0 = 1, \qquad \sum_{i=1}^{d+1} c_i e_i^j = 0 \quad \text{for } j = 1, \dots, d.$$
This Vandermonde system is known to be invertible and so there is a unique solution $(c_1, \dots, c_{d+1})$. We note that had we written a similar system for $f = X^k$ with $k < d$, the resulting constraints would have been a subset of those for $X^d$. Hence the $c_i$ we just identified work for all degree $\leq d$ polynomials.
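As a quick sanity check (toy parameters, reusing `p`, `d`, `e` and the weights `c` from the earlier sketch), the closed-form weights $c_i = \prod_{j \neq i} e_j / (e_j - e_i)$ do solve this Vandermonde system:

```python
# The weights c_i computed earlier satisfy sum_i c_i * e_i^j = 1 if j == 0
# and 0 for j = 1, ..., d — exactly the system appearing in the proof.
for j in range(d + 1):
    assert sum(ci * pow(ei, j, p) for ci, ei in zip(c, e)) % p == (1 if j == 0 else 0)
```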
Redundancy implies low-degreeness
In view of the previous lemma, any degree $\leq d$ polynomial map $f$ satisfies
$$f(x) = \sum_{i=1}^{d+1} c_i\, f(x + e_i t) \quad \text{for all } x, t \in \mathbb{F}.$$
Since this is the basis of the proximity test above, one should wonder about the converse: are degree $\leq d$ maps the only maps satisfying this property? To answer that question let us define, for every map $f : \mathbb{F} \to \mathbb{F}$, a new map $\Delta f : \mathbb{F}^2 \to \mathbb{F}$ by the formula
$$\Delta f(x, t) = f(x) - \sum_{i=1}^{d+1} c_i\, f(x + e_i t).$$
We can thus restate the previous question as
Question. Are degree $\leq d$ maps the only maps $f$ satisfying $\Delta f = 0$?
The answer is YES:
Lemma. Let $f : \mathbb{F} \to \mathbb{F}$ be a map. The following are equivalent:
(i) $f$ has degree $\leq d$;
(ii) $\Delta f = 0$, i.e. $f(x) = \sum_{i=1}^{d+1} c_i f(x + e_i t)$ for all $x, t \in \mathbb{F}$.
Proof. The proof isn't complicated. We use polynomial interpolation of bivariate maps to convert the problem from one about maps to one about polynomials. First of all, bivariate polynomial interpolation is possible. The map that sends a bivariate polynomial of bidegree $\leq (p-1, p-1)$ to the associated function $\mathbb{F}^2 \to \mathbb{F}$ is an isomorphism. This is easily seen using the polynomials $\delta_{(a,b)}$ defined by
$$\delta_{(a,b)}(X, Y) = \prod_{a' \neq a} \frac{X - a'}{a - a'} \; \prod_{b' \neq b} \frac{Y - b'}{b - b'},$$
which vanish everywhere except at the one point $(a, b)$.
Similarly to what we did with functions, we can define, for any univariate polynomial $Q$, a bivariate polynomial $\Delta Q$ like so:
$$\Delta Q(X, Y) = Q(X) - \sum_{i=1}^{d+1} c_i\, Q(X + e_i Y).$$
The procedures $f \mapsto \Delta f$ and $Q \mapsto \Delta Q$ are compatible with interpolation in the sense that if $f$ is a map and $\hat{f}$ is its degree $\leq p - 1$ interpolating polynomial, then $\Delta \hat{f}$ is $\Delta f$'s bivariate interpolation polynomial.
Now suppose $f$ satisfies $\Delta f = 0$. Then $\Delta \hat{f}$, which has bidegree $\leq (p-1, p-1)$, vanishes as a function on $\mathbb{F}^2$ and is therefore the zero polynomial, and if $D$ is $f$'s degree (i.e. $D = \deg \hat{f}$), then by looking solely at the total degree $D$ part of $\Delta \hat{f}$ we get that
$$X^D = \sum_{i=1}^{d+1} c_i (X + e_i Y)^D$$
(after dividing by $\hat{f}$'s leading coefficient) and by isolating the $X^{D-j} Y^j$ terms we see that
$$\binom{D}{j} \sum_{i=1}^{d+1} c_i e_i^j = 0 \quad \text{for } j = 1, \dots, D,$$
and so $\sum_i c_i e_i^j = 0$ for those $j$, while $\sum_i c_i = 1$. This is where we get a contradiction: this is impossible for $D > d$, for it would contradict the invertibility of the Vandermonde-type matrix $(e_i^j)_{1 \leq j \leq d+1}$ (recall that the $e_i$ are supposed pairwise distinct and nonzero): the system would force all the $c_i$ to vanish, whereas they sum to 1.
Note. We implicitly used the assumption that $\mathbb{F}$ is a prime field. Indeed, in the step where we "looked at $X^{D-j} Y^j$ terms" we divided by a binomial coefficient $\binom{D}{j}$ with $D \leq p - 1$. To be legitimate in doing so (i.e. to be sure we don't divide by zero) we have to assume $\mathbb{F}$ is prime: the binomial coefficients $\binom{D}{j}$ with $D \leq p - 1$ are then nonzero modulo $p$.
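The obstruction is easy to exhibit in a non-prime field; for instance in characteristic 3 (a toy example of mine):

```python
# In F_9 (characteristic 3) interpolation polynomials can have degree up to 8,
# and binomial coefficients such as C(3, 1) = 3 vanish modulo 3, so the
# "divide by a binomial coefficient" step above would break down.
from math import comb
print(comb(3, 1) % 3)   # 0
```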
Expected value of interpolation: the helper function g
Suppose you are given oracle access to the values of a map $f : \mathbb{F} \to \mathbb{F}$. In other words, you have some black box that spits out the value $f(x)$ when you feed it some field element $x$. Suppose furthermore you are told: "That function is a low degree polynomial map. How low you ask? Its degree is $\leq d$." How would you go about convincing yourself of that claim? If $f$'s domain is large, it is hopeless to ask for a perfect proof: you would have to interpolate $f$ from $d + 1$ of its values and compare the values predicted by your formula to all those queried from the oracle.
You might on the other hand try to use the characterization of low degree maps presented above. While we won't be able to find efficient tests of low-degreeness per se, what we can do is reliably reject oracles that are far from any degree $\leq d$ polynomial. That is, what we describe below is a probabilistic test of proximity.
To simplify things slightly, we fix $d$ and consider pairwise distinct nonzero elements $e_1, \dots, e_{d+1}$. There is no reason to try to be fancy about choosing the $e_i$, so taking for instance $e_i = i$ is a perfectly valid choice. If $f$ were indeed polynomial of degree $\leq d$, we would have, for all $x, t \in \mathbb{F}$,
$$f(x) = \sum_{i=1}^{d+1} c_i\, f(x + e_i t).$$
This can't be reasonably checked on all inputs $(x, t) \in \mathbb{F}^2$. But starting from this observation we can make some abstract definitions that turn out to be useful in the analysis. First of all we define a nonnegative $\delta$ by the formula
$$\delta = \Pr_{(x, t) \in \mathbb{F}^2}\Big[ f(x) \neq \sum_{i=1}^{d+1} c_i\, f(x + e_i t) \Big].$$
This is simply the proportion of inputs $(x, t)$ that fail the expected equation. We can also define subsets of $\mathbb{F}$ associated with $f$. For instance, we may consider the set of all $x$ such that the right hand side of this equation takes on a particular value a majority of the time, i.e. more than half of the time with respect to $t$. We thus define
$$\mathrm{Maj} = \Big\{ x \in \mathbb{F} \;\Big|\; \exists y \in \mathbb{F},\ \Pr_{t}\Big[ \sum_i c_i f(x + e_i t) = y \Big] > \tfrac{1}{2} \Big\}.$$
We can define a map[9] $g : \mathrm{Maj} \to \mathbb{F}$ by setting $g(x) = y$ where $y$ is the value that appears more than half of the time as $\sum_i c_i f(x + e_i t)$, $t$ ranging over $\mathbb{F}$. No harm is done by arbitrarily extending $g$ to a map $\mathbb{F} \to \mathbb{F}$ (say by picking a most popular prediction, breaking ties arbitrarily). We can also define the subset of inputs where, conveniently, that majority value coincides with the value predicted by the oracle and that predicted by the supposed low-degreeness:
$$\mathrm{Conv} = \big\{ x \in \mathrm{Maj} \;\big|\; g(x) = f(x) \big\}.$$
Here's a picture to accompany these definitions. We can think of every pair $(x, t)$ as a test case for the expected equality $f(x) = \sum_i c_i f(x + e_i t)$. In the diagram below, blue dots signal that this expected equality holds true, yellow crosses that it fails.
For that particular oracle $f$, most pairs $(x, t)$ satisfy the expected equality. If we actually count the yellow crosses (and divide by the total number of pairs) we get $\delta$. Every column that is majority blue is, by definition, indexed by some $x \in \mathrm{Conv}$.
Note that three columns have fewer than half of their expected equalities hold. These are columns indexed by $x \notin \mathrm{Conv}$, yet some, such as the first from the left, might still define an unambiguous majority value, e.g. the $x$-coordinate of the first red column might still belong to $\mathrm{Maj}$.
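The definitions above can be spelled out by brute force on a toy example (everything below — field size, degree, offsets, function names — is my own choice; this is exactly the kind of exhaustive computation nobody is expected to carry out in practice):

```python
# Brute-force computation of delta and of the majority-vote helper function g
# for an oracle f given as a table of values over F_p (toy parameters).
from collections import Counter

p, d = 13, 2
e = [1, 2, 3]                                        # d+1 distinct nonzero offsets

c = []                                               # c_i = prod_{j != i} e_j / (e_j - e_i)
for i in range(d + 1):
    num, den = 1, 1
    for j in range(d + 1):
        if j != i:
            num, den = num * e[j] % p, den * (e[j] - e[i]) % p
    c.append(num * pow(den, -1, p) % p)

def prediction(f, x, t):
    """Interpolated guess for f(x) made from the values f(x + e_i * t)."""
    return sum(ci * f[(x + ei * t) % p] for ci, ei in zip(c, e)) % p

def delta_and_g(f):
    failures = 0
    g = []
    for x in range(p):
        votes = Counter(prediction(f, x, t) for t in range(p))
        value, count = votes.most_common(1)[0]
        g.append(value if count > p // 2 else None)  # None: no strict majority
        failures += sum(1 for t in range(p) if prediction(f, x, t) != f[x])
    return failures / p**2, g
```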
Lemma 1 - the helper function is unambiguously defined
We start with a simple lemma. Let $A_1, A_2$ be independent, identically distributed (iid) random variables with values in a finite set $E$. Also let $a^\star$ be (a) value $A_1$ is likeliest to have, that is $\Pr[A_1 = a^\star] = \max_{a \in E} \Pr[A_1 = a]$. Let $\mathcal{C}$ be the event that $A_1$ and $A_2$ agree, i.e. $A_1 = A_2$, then
Lemma. $\Pr[A_1 = a^\star] \geq \Pr[\mathcal{C}]$.
Proof. Decompose the event $\mathcal{C}$ according to the common value of $A_1$ and $A_2$:
$$\mathcal{C} = \bigsqcup_{a \in E} \{A_1 = a\} \cap \{A_2 = a\}.$$
Taking probabilities and since $A_1$ and $A_2$ are iid,
$$\Pr[\mathcal{C}] = \sum_{a \in E} \Pr[A_1 = a]^2 \leq \Pr[A_1 = a^\star] \sum_{a \in E} \Pr[A_1 = a] = \Pr[A_1 = a^\star].$$
Fix $x \in \mathbb{F}$. Consider two independent uniformly distributed random variables $t_1$ and $t_2$ with values in $\mathbb{F}$. Consider the random variables
$$A_1 = \sum_{i=1}^{d+1} c_i\, f(x + e_i t_1), \qquad A_2 = \sum_{i=1}^{d+1} c_i\, f(x + e_i t_2).$$
These represent random predictions of the value of $f(x)$ obtained by interpolation. These random variables $A_1$ and $A_2$ are independent and identically distributed. Also $g(x)$ is, by definition, the most likely value of either. In light of the previous lemma we let $\mathcal{C}$ be the event where the two predictions agree:
$$\mathcal{C} = \{ A_1 = A_2 \}.$$
The previous lemma tells us that $\Pr_{t_1}[A_1 = g(x)] \geq \Pr[\mathcal{C}]$. If we can find a usable lower bound for the probability $\Pr[\mathcal{C}]$, we get a lower bound for the proportion of $t$ realizing the majority value $g(x)$.
In order to produce such a lower bound, we consider the subevent
$$\mathcal{D} = \Big\{ \forall i,\ f(x + e_i t_1) = \sum_{j=1}^{d+1} c_j\, f(x + e_i t_1 + e_j t_2) \Big\} \cap \Big\{ \forall j,\ f(x + e_j t_2) = \sum_{i=1}^{d+1} c_i\, f(x + e_j t_2 + e_i t_1) \Big\}.$$
The right hand side is indeed a subset of $\mathcal{C}$: if all the conditions on the right are met, then
$$A_1 = \sum_i c_i\, f(x + e_i t_1) = \sum_i \sum_j c_i c_j\, f(x + e_i t_1 + e_j t_2) = \sum_j c_j\, f(x + e_j t_2) = A_2.$$
These subevents allow us to have the two random variables interact. Obtaining a lower bound on $\Pr[\mathcal{C}]$ is equivalent to getting an upper bound on $\Pr[\neg \mathcal{C}]$. The above gives $\neg \mathcal{C} \subset \neg \mathcal{D}$ so that
$$\Pr[\neg \mathcal{C}] \leq \sum_{i=1}^{d+1} \Pr\Big[ f(x + e_i t_1) \neq \sum_j c_j f(x + e_i t_1 + e_j t_2) \Big] + \sum_{j=1}^{d+1} \Pr\Big[ f(x + e_j t_2) \neq \sum_i c_i f(x + e_j t_2 + e_i t_1) \Big].$$
Now for any $i$ the random variable $x + e_i t_1$ is uniformly distributed and independent from $t_2$, and similarly for any $j$ the random variable $x + e_j t_2$ is uniformly distributed and independent from $t_1$, and so, by definition of $\delta$,
$$\Pr\Big[ f(x + e_i t_1) \neq \sum_j c_j f(x + e_i t_1 + e_j t_2) \Big] = \delta$$
and similarly,
$$\Pr\Big[ f(x + e_j t_2) \neq \sum_i c_i f(x + e_j t_2 + e_i t_1) \Big] = \delta.$$
Injecting these into the upper bound for $\Pr[\neg \mathcal{C}]$ obtained previously proves
Lemma 1. For any $x \in \mathbb{F}$,
$$\Pr_t\Big[ \sum_{i=1}^{d+1} c_i\, f(x + e_i t) = g(x) \Big] \geq 1 - 2(d+1)\delta.$$
As promised, we can thus lift any and all unpleasant ambiguities in $g$'s definition. Notice that $1 - 2(d+1)\delta > \tfrac{1}{2}$ as soon as $\delta < \frac{1}{4(d+1)}$.
Corollary. Suppose that $\delta < \frac{1}{4(d+1)}$, then for all $x$, $g(x)$ represents more than half of the values $\sum_i c_i f(x + e_i t)$, $t$ ranging over $\mathbb{F}$. In other words, $\mathrm{Maj} = \mathbb{F}$.
Lemma 2 - the helper function and the oracle agree a lot
Let us make the headline precise. We just proved that whenever $\delta < \frac{1}{4(d+1)}$, then $g(x)$ is unambiguously defined for all $x$, that is $\mathrm{Maj} = \mathbb{F}$. Recall that we defined a subset $\mathrm{Conv}$ of so-called "convenient" inputs. Those were the $x$ where, not only was $g(x)$ unambiguously defined, but $g(x)$ coincided with $f(x)$. We now establish that as long as $\delta$ is small enough, $\mathrm{Conv}$ will be large. I.e. if $\delta$ is small, then the helper function $g$ and the oracle $f$ agree often.
Lemma 2. We have $d(f, g) \leq 2\delta$.
Proof. Let us consider the pairs $(x, t)$ where the expected equality fails. We distinguish two cases: $g(x) = f(x)$ and $g(x) \neq f(x)$. In the first case we simply discard the pair; in the second case, as we argue below, the equality fails for at least half of all $t$.
Thus, by definition of $\delta$,
$$\delta = \Pr_{(x,t)}\Big[ f(x) \neq \sum_i c_i f(x + e_i t) \Big] \geq \Pr_x\big[ g(x) \neq f(x) \big] \cdot \tfrac{1}{2}.$$
The last inequality follows from the fact that when $g(x) \neq f(x)$, $f(x)$ is not a majority value (or it shares that title with $g(x)$ and was arbitrarily disregarded as the choice for the majority value). In particular, the proportion of $t$ such that $f(x) \neq \sum_i c_i f(x + e_i t)$ is at least $\tfrac{1}{2}$. Rearranging things, we get the inequality from the lemma.
Lemma 3 - the helper function is a low degree polynomial
We first state an informal version of the lemma to prove. A precise statement will be given at the end.
Lemma 3 (informal). If $\delta$ is small enough, then the helper function $g$ is polynomial.
The proof strategy is quite interesting: to prove that $g$ is polynomial of degree $\leq d$ (or to be precise: that $g$'s interpolating polynomial has degree $\leq d$) the authors invoke the characterization of low degree functions. The goal is thus to show that for any two $x, t \in \mathbb{F}$ one has
$$g(x_0) = \sum_{i=1}^{d+1} c_i\, g(x_i) \qquad (\text{Eq.})$$
where we set $e_0 = 0$ and $x_i = x + e_i t$ for $i = 0, 1, \dots, d+1$ (so that $x_0 = x$). What makes the proof strategy interesting is that the satisfaction of (Eq.) for a particular pair $(x, t)$, which is either true or false, is established using a probabilistic method. To wit, consider the following event
$$E = \Big\{ \forall i \in \{0, \dots, d+1\},\ g(x_i) = \sum_{j=1}^{d+1} c_j\, f\big(x_i + e_j (t_1 + e_i t_2)\big) \Big\} \cap \Big\{ \forall j \in \{1, \dots, d+1\},\ f(x + e_j t_1) = \sum_{i=1}^{d+1} c_i\, f\big(x + e_j t_1 + e_i (t + e_j t_2)\big) \Big\}$$
where $t_1$ and $t_2$ are independent uniformly distributed random variables in $\mathbb{F}$. The relation with (Eq.) is that, whenever the conditions in $E$ are met, one has (note that $x_i + e_j(t_1 + e_i t_2) = x + e_j t_1 + e_i (t + e_j t_2)$)
$$\sum_{i=1}^{d+1} c_i\, g(x_i) = \sum_{i=1}^{d+1} \sum_{j=1}^{d+1} c_i c_j\, f\big(x + e_j t_1 + e_i(t + e_j t_2)\big) = \sum_{j=1}^{d+1} c_j\, f(x + e_j t_1) = g(x_0).$$
Thus, to conclude that (Eq.) holds it is enough to show that the event $E$ has nonzero probability. The goal now is to bound the probability of $E$ from below, in particular, to show that it is positive when $\delta$ is small enough.
Again, the idea will be to bound $\Pr[\neg E]$ from above. First of all, writing $E = E_1 \cap E_2$ where $E_1$ collects the conditions involving $g$ and $E_2$ the conditions involving only $f$, $\neg E = \neg E_1 \cup \neg E_2$ so that
$$\Pr[\neg E] \leq \Pr[\neg E_1] + \Pr[\neg E_2].$$
Bounding $\Pr[\neg E_1]$. Since for all $i \in \{0, \dots, d+1\}$, the random variable $t_1 + e_i t_2$ is uniformly distributed, we can apply the result from Lemma 1 and obtain
$$\Pr[\neg E_1] \leq (d + 2) \cdot 2(d+1)\delta.$$
Bounding $\Pr[\neg E_2]$. Take $j \in \{1, \dots, d+1\}$. For any $i$, rearranging terms,
$$x + e_j t_1 + e_i (t + e_j t_2) = y_j + e_i s_j \quad \text{where } y_j = x + e_j t_1 \text{ and } s_j = t + e_j t_2.$$
Note that $y_j$ and $s_j$ are both uniformly distributed in $\mathbb{F}$ (since $e_j \neq 0$) and independent (since $t_1$ and $t_2$ are). Then by definition of $\delta$,
$$\Pr\Big[ f(y_j) \neq \sum_{i=1}^{d+1} c_i f(y_j + e_i s_j) \Big] = \delta, \quad\text{hence}\quad \Pr[\neg E_2] \leq (d+1)\delta.$$
Combining these bounds yields:
$$\Pr[\neg E] \leq 2(d+1)(d+2)\delta + (d+1)\delta = (d+1)(2d+5)\delta,$$
i.e. $\Pr[E] \geq 1 - (d+1)(2d+5)\delta$. Therefore, as soon as $\delta < \frac{1}{(d+1)(2d+5)}$, $g$ is polynomial. We can now state Lemma 3 with precision.
Lemma 3. If $\delta < \frac{1}{(d+1)(2d+5)}$, the helper function $g$ is polynomial.
Conclusion
Lemma 3 shows how to "recover" the unique low-degree neighbor function $g$ of a map $f$ as soon as $\delta$ is small enough. This $g$ is the polynomial function the program $P$ is meant to be computing.
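To see the rehabilitation at work on the toy example from before (reusing `p`, `prediction` and `delta_and_g` from the earlier sketch; the corrupted oracle below is my own construction):

```python
# Corrupt a degree-2 polynomial map on two inputs and recover it via g.
import random

poly = lambda x: (3 * x * x + 5 * x + 1) % p   # a degree-2 polynomial map over F_13
g_true = [poly(x) for x in range(p)]           # the map we hope to recover
f = list(g_true)
for x in random.sample(range(p), 2):           # corrupt 2 of the 13 values
    f[x] = (f[x] + 1) % p

delta, g = delta_and_g(f)
print(delta)          # small but nonzero
print(g == g_true)    # True: the majority vote reproduces the low degree map
```

With only two corrupted values out of thirteen, every column still has a strict majority and the majority vote reproduces the original degree 2 polynomial map exactly.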
A question
Let $\mathbb{F}_p$ be a prime field. Let $a_0, a_1, \dots, a_{d+1}$ be pairwise distinct and, similarly, let $b_0, b_1, \dots, b_{d+1}$ be pairwise distinct. Suppose that the two configurations give rise to the same interpolation weights, i.e. for all $i \in \{1, \dots, d+1\}$,
$$\prod_{j \neq i,\, j \geq 1} \frac{a_0 - a_j}{a_i - a_j} = \prod_{j \neq i,\, j \geq 1} \frac{b_0 - b_j}{b_i - b_j}.$$
This is the case for instance if the $b_i$ are obtained from the $a_i$ by means of an affine transformation $x \mapsto \lambda x + \mu$ (with $\lambda \neq 0$). One might ask: is the converse true?
Question. Is it necessarily the case that there exist $\lambda \neq 0$ and $\mu$ such that the $b_i$ are obtained from the $a_i$ by application of the affine transformation $x \mapsto \lambda x + \mu$?
Note. The restriction to prime fields could potentially be relaxed by allowing field automorphisms in the affine transformation formula: $x \mapsto \lambda\,\sigma(x) + \mu$ with, say, $\sigma$ the Frobenius automorphism of the subfield generated by the weights above.
the precise way in which this interpolation is done varies. In particular, the domain of interpolation is usually a structured subset of a (finite) field $\mathbb{F}$: a multiplicative subgroup/coset of $\mathbb{F}^{\times}$ or an additive subgroup/coset of $\mathbb{F}$.↩︎
when all conditions $f(x) = p(x)$, $f(z_1) = p(z_1)$, …, $f(z_{d+1}) = p(z_{d+1})$ are met simultaneously. There can of course be serendipitous collisions, but those are harder to account for.↩︎
when precisely one of the conditions $f(x) = p(x)$, $f(z_1) = p(z_1)$, …, $f(z_{d+1}) = p(z_{d+1})$ isn't met. Of course more than one of these may fail, but it is then unclear whether the equation fails too.↩︎