Recently I've been working on an implementation of FFTs over finite fields for some purposes (which I'll write about some day). I was finding the algorithm and the problem neat and thought it would be nice to share what are FFTs, how are they computed and what can be done with them.
While reading please remember that FFTs are a gigantic subject with lots of cool caveats, algorithms and use-cases. This post only intends to be merely an introductory to the subject. If you have further questions about this post or if you find this topic interesting and want to discuss about it or general aspects of crypto, feel free to reach out on Telegram or on Twitter.
Polynomials are a very interesting algebraic construct. In particular they have many use cases in cryptography such as SNARKs/STARKs, Shamir-Secret-Sharing (see my post about it), Ring-LPN and more. It's 100% ok if you don't know any of those buzzwords, I will not assume any prior knowledge about them. The only point here is that polynomials are very useful.
A quick recap on what are polynomials. A function is a polynomial if it can be expressed using the following algebraic term: For some non-negative integer and a set of numbers . In this post we will sometimes write only instead of since all polynomials we shall discuss are of a single variable, typically .
Generally speaking, the elements are elements of some field. In this article we will primarily focus on finite fields and thus assume this field to be a finite field of some prime size .
The degree of a polynomial is the largest index such that . For a polynomial we denote its degree using . In the edge-case where , all coefficients are zero we say that the .
Let and be two polynomials of degree (so and ).
One thing we can do with is adding them. So, the sum of and is:
In terms of computational complexity, the addition of two degree- polynomials takes field operations. In other words, it is linear in the size of the polynomial. This is because, as the formula suggests, there are coefficients in the results polynomial and each coefficient can be computed with a single addition of two field elements.
We can also multiply and by each other. But when it comes to multiplication, things get a little bit more complicated. The product of and is:
First, notice that the degree of the resulting polynomial is , so by multiplying we get a polynomial of a higher degree.
As this formula suggests the multiplication take time, this is because for each of the coefficients of we multiply it by each of the coefficients of , thus performing as many as field multiplications.
As some of you may find it more appealing, we can also write the multiplication as:
This is because the coefficient of in the resulting polynomial is the inner sum: .
So, is nice, but can we do better? I mean, imagine if are of degree of . It would be infeasible to multiply them!
At this point you're probably wondering what all these have to do with FFTs. But don't worry, I promised FFTs, and FFTs you will get. We're getting there slowly but surely.
Recall the "unique-interpolation-theorem" I've discussed in a previous post of mine. This theorem claims that for each set of pairs of points such that for all we have there is a unique polynomial of degree at most such that for all .
So far, to represent a polynomial of degree , we used a set of coefficients such that . We call this coefficient representation. The term represent means that using the given constraints, or information, we can uniquely identify a specific polynomial of degree who satisfies those constraints.
However, we can think of another way to represent a polynomial. Consider a fixed set of points . To represent we can take the evaluations of at these points . From the "unique-interpolation-theorem" is the only polynomial of degree with the given evaluations at the given points. We call this evaluation representation. In other words the evaluation representation of a polynomial of degree is its evaluation on a set of predetermined points
Now let's say we're given two polynomials of degree and we want to compute their addition , and let's both input polynomials are represented using the evaluation representation. We know that for each the following equation holds: So adding two polynomials represented by their evaluations only takes field operations (additions).
As for multiplying and the following equation also holds: So to multiply two polynomials we only have to perform field operations (multiplications) in the evaluation-representation. This is because after the the point-wise multiplication we will have the evaluation of polynomial over the set of points . This is exactly the evaluation-representation of . This is much faster then multiplying two polynomials using coefficient-representation.
Given a polynomial of degree represented in the coefficient-representation. How long does it take to change its representation into the evaluation-representation over some points ?
We can do this by sequentially evaluating for each . Since each evaluation takes up to additions and multiplications, the evaluation of points takes time.
The opposite direction, in which we take a polynomial represented in the evaluation-representation a compute is coefficient-representation would take as well using the Lagrange-Interpolation algorithm (I have described it in a previous post).
As mentioned, the schoolbook algorithm to multiply two polynomials (in the coefficient representation) takes . With our previous observation, however, we can think of another algorithm to multiply two degree polynomials. In this new algorithm we take the polynomials, evaluate them over points, multiply the evaluations and interpolate the evaluations to obtain the coefficients of the product-polynomial. For completeness the algorithm is given here:
Input: Two degree polynomials and .
Output: A polynomial of degree .
- Select arbitrary points .
- Compute and
- Compute
- Interpolate to create a polynomial of degree .
Let's see how much time each step of the algorithm takes:
So, overall, our new algorithm takes time, just like the old algorithm. Not too bad.
Our current bottleneck of the algorithm is changing the representation of the polynomial back and forth between coefficient and evaluation representations. If only we could do those faster, our multiplication algorithm will be faster overall.
Remember we have chosen the evaluation points of our evaluation-representation 100% arbitrarily. What if those points weren't chosen arbitrarily? Are there specific points with which we can change representations faster?
Well, apparently the answer is yes! and this is exactly where Fast-Fourier-Transforms (FFTs) come to the rescue.
Just like polynomials are defined over a specific field, so are FFTs. The field can be either a finite field (e.g. , the finite field of prime size ) or infinite (e.g. , the field of complex numbers), but it has to be a field.
FFTs can be used to change the representation of a polynomial of degree with coefficients in some field from coefficients-representation to evaluation-representation (and vice-versa) where the evaluation is given specifically over unique field elements who are roots of unity.
So what are roots of unity?
We begin with a definition.
Definition [root of unity]: Let be a field element of some field and let be an integer. We say that is a root of unity of order (or root of unity) if . The power and equivalence in the equation follow the finite field arithmetic.
So ROUs are, as their name suggest, roots of the unit element of the field. Let's have an example. Consider the finite field with 5 elements with modulo-5 addition and multiplication. Now, let's look at the powers of the field element . Now, since it is a root of unity of order 4.
Great, now let's consider another element, say . We know that and thus, So 4 is a root of unity of order 2. However, since we can also tell that 4 is a root of unity of order 4, because:
In fact, we can prove an even deeper theorem about roots-of-unity:
Theorem: Let be a root of unity of order , then is also a root of unity for all such that ( divides ).
The proof is very straight forward, since we can write for some integer . Now we can prove that is a ROU of order by showing that . Notice that because we assumed is a ROU of order . So, from this observation we define a special kind of ROUs which are primitive-roots-unity:
Definition [primitive root of unity]: Let be a field element and let be an integer. We say that is a primitive root of unity (PROU) of order if is a root of unity of order but isn't a root of unity of order for any .
So, following this definition we can tell that is a PROU of order 4, because but are all not equal to . However, is not a PROU of order 4, because even though we also have .
Another way to think about ROUs and PROUs is through the order of elements.
Definition [order of element]: Given a non-zero field element , the order of is the smallest positive integer such that . We denote the order as .
Therefore, an element is -PROU if .
Ok, so now that we know a little about roots of unity, you are probably wondering, what do they have to do with FFTs? We have already said that FFTs can be thought of as an algorithm to quickly change the representation of a polynomial of degree from coefficient-representation to evaluation-representation and vice-versa where the evaluation is computation over points who are roots of unity.
Let's give a more explicit definition for FFTs.
Let be a number and let's assume we have a field element such that , so is PROU. The FFT algorithm takes points: and computes new points such that:
To make the definition more straightforward, we can think of a polynomial . (Notice - is a function of !) And define:
So, we practically compute as the evaluations of on the powers of a PROU , which are .
In this context we usually call the size of the FFT.
There are various algorithms to compute FFTs and we will try and focus on the most basic one, also known as the Cooley-Tuckey Algorithm (CT) which was invented by Gauss and rediscovered independently in the 1960s by Cooley and Tuckey.
In its core, the CT algorithm computes an FFT of size where is typically a product of many small primes. It is common to call such numbers smooth-numbers. For example is a smooth-number. is also a smooth number, but is not smooth however. So CT can also work for non-smooth sizes of FFTs, but it's getting really efficient only for such smooth sizes. In other words, the smoother the size of the FFT, the bigger the advantage of the CT algorithm over the traditional algorithm.
In the rest of this section I'll try to first give some informal sense behind the core idea of the CT algorithm. Next, I'll give the explicit algorithm for both the FFT and the inverse FFT in the spirit of CT.
Let be a finite field of prime size . Let be a number dividing and let be a polynomial of degree at most over . We are given the coefficient-representation of , so where are all elements of the field .
Since is dividing we have a -PROU, denote it with . So for and . Let's assume that is smooth, in particular that for some integer , so is a power of two.
We can write the polynomial as follows: So we can express using and where are polynomials (and a little multiplication by ). Let's write those polynomials explicitly, we replace the term with a so will be our polynomials.
The following figure visualizes the reduction step for a polynomial of degree :
The degree of each polynomial is less than half the degree of the original polynomial, so we have expressed the problem of evaluating a polynomial, by the problem of evaluating two polynomials of half the degree (and some extra linear-time processing to multiply by the remaining term), right?
Well no, not yet. The original problem was evaluating a degree polynomial over the -ROUs in (). Our new two polynomials still have to be evaluated over points and not over points!
To reduce the number of points we should evaluate we add another observation as follows. Notice that in expressing we evaluate both and on and not on . Originally we were evaluating where is -ROU, so by squaring it is now a -ROU. Since there are only ROUs of order , we have to make only evaluations on and .
So, to evaluate where is -ROU we take the evaluation of , a -ROU, over both and add it to the evaluation of on , multiplied by . In conclusion, to evaluate over ROUs of order , we have to:
For completeness, notice that in the end of the recursion, if the FFT size is then is of degree so for some field element , and thus the evaluation of it on a single point is exactly .
To summarize, our FFT algorithm goes as follows:
Input
Output
Algorithm
If return , because our polynomial is of degree , so it's degree . Therefore the coefficient is the evaluation of .
Split into two polynomials:
Compute, recursively, the evaluations of on the powers of , a PROU of order .
Let be the arrays of legth containing of the evaluations of respectively from the recursive invocation. So and . The polynomials satisfy: .
For each set:
Output .
Let denote the number of computational steps we have to perform over an FFT of size . Since we have to solve a similar problem of size twice + additions and multiplications. So:
So, we devised a algorithm to change the representation of a polynomial from coefficient-representation to evaluation-representation of ROUs of order . As we will explain next, the inverse FFT (from evaluation into coefficient representation) also takes and thus we will be able to multiply polynomials in time.
The Inverse-FFT (IFFT) algorithm, as its name suggests, simply reverts the operation of the original FFT algorithm. Namely, let be a -PROU. The IFFT algorithm takes , the evaluation-representation of some polynomial of degree and outputs the coefficient-representation .
In this section we'll give an informal explanation about the IFFT algorithm. As we did with the FFT algorithm, we'll assume for the sake of simplicity that is a smooth number, . If we could find the coefficient representation of two polynomials and both of degree such that:
Then we could immediately derive the coefficient representation of . Let's see how its done. If and then:
Therefore, the coefficient of in is . Similarly, , the coefficient of in .
So, if we had the coefficient-representation of such we could obtain coefficient-representation of .
At this point you've probably noticed already that this could be our recursive-step! We split the problem of obtaining the coefficient representation of a degree polynomial into obtaining the polynomial representation of two degree polynomials.
The only thing left to do then, is to translate the input of our original problem (evaluations of on all -ROUs) into the inputs of our new, smaller, problems (evaluations of and on all -ROUs).
We know (from the FFT algorithm) that polynomials of degree exist such that . Now, let be a -PROU. And let be an integer in the range . We have the following two equations obtain by setting and in the equation above:
Arranging the second equation a little bit we get: Now, recall that is -PROU so and . Substituting these two identities in the second equation, we get:
This is looking good! Let's see what we have here:
So we have to simply solve a system of two equations with two variables. Solving it we get:
By solving this system of equations for all values of we get the evaluations of and of the powers of which is a PROU of order .
This was the recursive step. The recursion ends when is of degree , in that case is of degree . So is a constant polynomial, and we are given its evaluation on power . So we have and therefore the evaluation itself is exactly the single coefficient of our algorithm.
Let's try and write the algorithm:
Input
Output
Algorithm
What if isn't a power of 2? In both our FFT and inverse FFT we were expressing the input polynomial using two polynomials such that: This was simple because the group of powers of , our -PROU could be split into pairs of elements and such that the squarings of both are equal to .
So, if isn't a power of two, for example, , we can split the polynomial using three polynomials such that:
This will utilize the fact that the order of is a multiple of and therefore we could split the powers of into triplets such that cubing them (i.e. - raising them by the power of ) will yield .
We will also call the number of polynomials was split into the split-factor of this recursive step. Notice that if has different prime factors we may use a different split-factor in each recursive step.
We stated at the beginning of the post that we prefer the case in which is smooth. Why is that? If you closely pay attention you'll notice that step of the FFT and step of the IFFT are both taking time where is the length of the input and is the split factor. For example, if the split factor is then each entry in the output of the FFT is computed using three evaluations, one from one from and one from . Overall, we make operations.
Now, if instead of 3, we a very large prime, say , then each entry in the output will require at least field operations to be computed taking us closer to the old school-book algorithm and slowing down our algorithm. Therefore, given a polynomial of degree which is not a power of we look at as a polynomial of degree for some and solve an FFT of size . This may not work if we specifically are interested in the evaluations of some PROU of order which is not a power of 2.
It would have been really nice if finite fields had PROUs of smooth orders so we could use them to construct efficient FFTs. The problem is that finite fields don't always have a root of unity of order for any , so computing FFTs of the closest power of may not work. To make things clear, let's look at secp256k1.
The order of the generator of secp256k1 is the prime number: p=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
.
Therefore, the scalars we use are elements in the field .
So, the multiplicative group is of size whose factorization is where are some very large primes.
The order of each non-zero element in the field must divide and therefore we don't have a PROU for any order of our choice. Actually, even if we consider and to be smooth, the largest FFT we can compute over secp256k1 is of size .
In other words, the largest "smooth" PROU we can find in secp256k1 group is of order .
One implication of this is that many projects that rely on FFTs of very high orders use curve which has as a factor of the multiplicative group of the scalars. If you want to compute FFTs over the scalar field of some curve which doesn't have a large smooth factor, you may consider other options such as using ECFFTs.
In this post we have introduced FFTs and why they are needed. We have also presented one of the most popular FFT algorithms, the Cooley-Tuckey algorithm and its inverse, the IFFT algorithm.