Try   HackMD

FFT over Finite Fields

Disclaimer and Foreword

Recently I've been working on an implementation of FFTs over finite fields for some purposes (which I'll write about some day). I was finding the algorithm and the problem neat and thought it would be nice to share what are FFTs, how are they computed and what can be done with them.

While reading please remember that FFTs are a gigantic subject with lots of cool caveats, algorithms and use-cases. This post only intends to be merely an introductory to the subject. If you have further questions about this post or if you find this topic interesting and want to discuss about it or general aspects of crypto, feel free to reach out on Telegram or on Twitter.

Introduction

Use cases

Polynomials are a very interesting algebraic construct. In particular they have many use cases in cryptography such as SNARKs/STARKs, Shamir-Secret-Sharing (see my post about it), Ring-LPN and more. It's 100% ok if you don't know any of those buzzwords, I will not assume any prior knowledge about them. The only point here is that polynomials are very useful.

Definition

A quick recap on what are polynomials. A function

P(x) is a polynomial if it can be expressed using the following algebraic term:
P(x)=i=0naixi
For some non-negative integer
n
and a set of
n+1
numbers
a0,a1,,an
. In this post we will sometimes write only
P
instead of
P(x)
since all polynomials we shall discuss are of a single variable, typically
x
.

Generally speaking, the elements

a0,,an are elements of some field. In this article we will primarily focus on finite fields and thus assume this field to be a finite field of some prime size
Z/pZ
.

The degree of a polynomial is the largest index

i such that
ai0
. For a polynomial
P(x)
we denote its degree using
deg(P)
. In the edge-case where
P(x)=0
, all coefficients are zero we say that the
deg(P)=
.

Operations

Let

P(x)=i=0npixi and
Q(x)=i=0nqixi
be two polynomials of degree
n
(so
pn0
and
qn0
).

One thing we can do with

P,Q is adding them. So, the sum of
P
and
Q
is:
Q+P=i=0n(pi+qi)xi

In terms of computational complexity, the addition of two degree-

n polynomials takes
O(n)
field operations. In other words, it is linear in the size of the polynomial. This is because, as the formula suggests, there are
n
coefficients in the results polynomial and each coefficient can be computed with a single addition of two field elements.

We can also multiply

P and
Q
by each other. But when it comes to multiplication, things get a little bit more complicated. The product of
P
and
Q
is:

QP=i=0nj=0npiqjxi+j

First, notice that the degree of the resulting polynomial is

2n, so by multiplying we get a polynomial of a higher degree.

As this formula suggests the multiplication take

O(n2) time, this is because for each of the
n+1
coefficients of
P
we multiply it by each of the coefficients of
Q
, thus performing as many as
(n+1)(n+1)
field multiplications.

As some of you may find it more appealing, we can also write the multiplication as:

QP=i=02nxij=0ipjqij

This is because the coefficient of

xi in the resulting polynomial is the inner sum:
j=0ipjqij
.

So,

O(n2) is nice, but can we do better? I mean, imagine if
P,Q
are of degree of
109
. It would be infeasible to multiply them!

At this point you're probably wondering what all these have to do with FFTs. But don't worry, I promised FFTs, and FFTs you will get. We're getting there slowly but surely.

Polynomial Representation

Recall the "unique-interpolation-theorem" I've discussed in a previous post of mine. This theorem claims that for each set of

n+1 pairs of points
{(x0,y0),,(xn,yn)}
such that for all
ij
we have
xixj
there is a unique polynomial
P(x)
of degree at most
n
such that
P(xi)=yi
for all
0in
.

So far, to represent a polynomial

P of degree
n
, we used a set of
n+1
coefficients
(p0,...,pn)
such that
P(x)=i=0npixi
. We call this coefficient representation. The term represent means that using the given constraints, or information, we can uniquely identify a specific polynomial of degree
n
who satisfies those constraints.

However, we can think of another way to represent a polynomial. Consider a fixed set of

n+1 points
(x0,...,xn)
. To represent
P
we can take the evaluations of
P
at these points
P(x0),...,P(xn)
. From the "unique-interpolation-theorem"
P(x)
is the only polynomial of degree
n
with the given evaluations at the given points. We call this evaluation representation. In other words the evaluation representation of a polynomial
P(x)
of degree
n
is its evaluation on a set of predetermined points
(x0,...,xn)

Now let's say we're given two polynomials

P(x),Q(x) of degree
n
and we want to compute their addition
S(x)=P(x)+Q(x)
, and let's both input polynomials are represented using the evaluation representation. We know that for each
x0,...,xn
the following equation holds:
S(xi)=P(xi)+Q(xi)
So adding two polynomials represented by their evaluations only takes
n
field operations (additions).

As for multiplying

P(x) and
Q(x)
the following equation also holds:
S(xi)=P(xi)Q(xi)
So to multiply two polynomials we only have to perform
n
field operations (multiplications) in the evaluation-representation. This is because after the the point-wise multiplication we will have the evaluation of polynomial
S
over the set of points
{xi}
. This is exactly the evaluation-representation of
S
. This is much faster then multiplying two polynomials using coefficient-representation.

Changing Representation

Given a polynomial

P(x) of degree
n
represented in the coefficient-representation. How long does it take to change its representation into the evaluation-representation over some points
(x0,...,xn)
?

We can do this by sequentially evaluating

P(xi) for each
xi
. Since each evaluation takes up to
n+1
additions and
n+1
multiplications, the evaluation of
n+1
points takes
O(n2)
time.

The opposite direction, in which we take a polynomial represented in the evaluation-representation a compute is coefficient-representation would take

O(n2) as well using the Lagrange-Interpolation algorithm (I have described it in a previous post).

Multiplying Polynomials

As mentioned, the schoolbook algorithm to multiply two polynomials (in the coefficient representation) takes

O(n2). With our previous observation, however, we can think of another algorithm to multiply two degree
n
polynomials. In this new algorithm we take the polynomials, evaluate them over
2n+1
points, multiply the evaluations and interpolate the evaluations to obtain the coefficients of the product-polynomial. For completeness the algorithm is given here:

Input: Two degree

n polynomials
P(x)=i=0npixi
and
Q(x)=i=0nqixi
.

Output: A polynomial

S(x)=i=02nsixi of degree
2n
.

  1. Select arbitrary
    2n+1
    points
    x0,...,x2n+1
    .
  2. Compute
    P=(P(x0),...,P(x2n+1))
    and
    Q=(Q(x0),...,Q(x2n+1))
  3. Compute
    S=(P(x0)Q(x0),...,P(x2n+1)Q(x2n+1))
  4. Interpolate
    S
    to create a polynomial
    S(x)
    of degree
    2n
    .

Let's see how much time each step of the algorithm takes:

  1. O(n)
    time to select
    2n+1
    points.
  2. O(n2)
    to change the representation into evaluation-representation of
    P,Q
    .
  3. O(n)
    time to perform
    2n+1
    field multiplications.
  4. O(n2)
    time to revert the representation of
    S
    back to coefficient-representation.

So, overall, our new algorithm takes

O(n2) time, just like the old algorithm. Not too bad.

Can we do better?

Our current bottleneck of the algorithm is changing the representation of the polynomial back and forth between coefficient and evaluation representations. If only we could do those faster, our multiplication algorithm will be faster overall.

Remember we have chosen the evaluation points

(x0,...,xn) of our evaluation-representation 100% arbitrarily. What if those points weren't chosen arbitrarily? Are there specific points with which we can change representations faster?

Well, apparently the answer is yes! and this is exactly where Fast-Fourier-Transforms (FFTs) come to the rescue.

FFTs

Just like polynomials are defined over a specific field, so are FFTs. The field can be either a finite field (e.g.

Z/pZ, the finite field of prime size
p
) or infinite (e.g.
C
, the field of complex numbers), but it has to be a field.

FFTs can be used to change the representation of a polynomial

P(x) of degree
<n
with coefficients in some field
F
from coefficients-representation to evaluation-representation (and vice-versa) where the evaluation is given specifically over
n
unique field elements who are
nth
roots of unity.

So what are roots of unity?

Roots of Unity

We begin with a definition.

Definition [root of unity]: Let

r be a field element of some field and let
i
be an integer. We say that
r
is a root of unity of order
i
(or
ith
root of unity) if
ri1(mod p)
. The power and equivalence in the equation follow the finite field arithmetic.

So ROUs are, as their name suggest, roots of the unit element of the field. Let's have an example. Consider

Z/5Z the finite field with 5 elements
{0,1,2,3,4}
with modulo-5 addition and multiplication. Now, let's look at the powers of the field element
e=2
.
e1=2e2=4e3=3e4=1
Now, since
e4=1
it is a root of unity of order 4.

Great, now let's consider another element, say

4. We know that
e2=4
and thus,
42=(e2)2=e4=1
So 4 is a root of unity of order 2. However, since
42=1
we can also tell that 4 is a root of unity of order 4, because:
44=(42)2=12=1

In fact, we can prove an even deeper theorem about roots-of-unity:

Theorem: Let

r be a root of unity of order
i
, then
r
is also a root of unity for all
n
such that
i|n
(
i
divides
n
).

The proof is very straight forward, since

i|n we can write
n=ki
for some integer
k
. Now we can prove that
r
is a ROU of order
n
by showing that
rn=1
.
rn=rik=(ri)k=1k=1
Notice that
ri=1
because we assumed
r
is a ROU of order
i
. So, from this observation we define a special kind of ROUs which are primitive-roots-unity:

Definition [primitive root of unity]: Let

r be a field element and let
i
be an integer. We say that
r
is a primitive root of unity (PROU) of order
i
if
r
is a root of unity of order
i
but isn't a root of unity of order
j
for any
1<j<i
.

So, following this definition we can tell that

2 is a PROU of order 4, because
24=1
but
21,22,23
are all not equal to
1
. However,
4
is not a PROU of order 4, because even though
44=1
we also have
42=1
.

Another way to think about ROUs and PROUs is through the order of elements.

Definition [order of element]: Given a non-zero field element

e, the order of
e
is the smallest positive integer
k
such that
ek=1
. We denote the order
e
as
ord(e)
.

Therefore, an element

e is
kth
-PROU if
ord(e)=k
.

Ok, so now that we know a little about roots of unity, you are probably wondering, what do they have to do with FFTs? We have already said that FFTs can be thought of as an algorithm to quickly change the representation of a polynomial of degree

<n from coefficient-representation to evaluation-representation and vice-versa where the evaluation is computation over
n
points who are
nth
roots of unity.

Let's give a more explicit definition for FFTs.

Definition of FFT

Let

n be a number and let's assume we have a field element
g
such that
ord(g)=n
, so
g
is
nth
PROU. The FFT algorithm takes
n
points:
x0,...,xn1
and computes new
n
points
X0,...,Xn1
such that:
Xk=i=0n1xigki

To make the definition more straightforward, we can think of a polynomial

P(e)=xiei. (Notice -
P
is a function of
e
!) And define:
Xk=P(gk)

So, we practically compute

X0,...,Xn1 as the evaluations of
P(e)
on the powers of a
nth
PROU
g
, which are
g0,g1,...,gn1
.

In this context we usually call

n the size of the FFT.

The Algorithm

There are various algorithms to compute FFTs and we will try and focus on the most basic one, also known as the Cooley-Tuckey Algorithm (CT) which was invented by Gauss and rediscovered independently in the 1960s by Cooley and Tuckey.

In its core, the CT algorithm computes an FFT of size

n where
n
is typically a product of many small primes
. It is common to call such numbers smooth-numbers. For example
n=213
is a smooth-number.
n=2735113
is also a smooth number, but
n=67,280,421,310,721524,287
is not smooth however. So CT can also work for non-smooth sizes of FFTs, but it's getting really efficient only for such smooth sizes. In other words, the smoother the size of the FFT, the bigger the advantage of the CT algorithm over the traditional
O(n2)
algorithm.

In the rest of this section I'll try to first give some informal sense behind the core idea of the CT algorithm. Next, I'll give the explicit algorithm for both the FFT and the inverse FFT in the spirit of CT.

FFT

Let

Z/pZ be a finite field of prime size
p
. Let
n
be a number dividing
p1
and let
P(x)
be a polynomial of degree at most
n1
over
Z/pZ
. We are given the coefficient-representation of
P
, so
P(x)=i=0n1aixi
where
a0,...,an1
are all elements of the field
Z/pZ
.

Since

n is dividing
p1
we have a
nth
-PROU, denote it with
g
. So
gi1
for
1in1
and
gn=1
. Let's assume that
n
is smooth, in particular that
n=2k
for some integer
k
, so
n
is a power of two.

We can write the polynomial

P(x) as follows:
P(x)=i=0n1aixi=i=0n/21a2ix2ieven-index terms+i=0n/21a2i+1x2i+1 odd-index terms=i=0n/21a2i(x2)iP0(x2)+xi=0n/21a2i+1(x2)iP1(x2)=P0(x2)+xP1(x2)
So we can express
P(x)
using
P0(x2)
and
P1(x2)
where
P0,P1
are polynomials (and a little multiplication by
x
). Let's write those polynomials explicitly, we replace the
x2
term with a
y
so
P0(y),P1(y)
will be our polynomials.

P0(y)=i=0n/21a2iyiandP1(y)=i=0n/21a2i+1yi The following figure visualizes the reduction step for a polynomial of degree
8
:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

The degree of each polynomial is less than half the degree of the original polynomial, so we have expressed the problem of evaluating a polynomial, by the problem of evaluating two polynomials of half the degree (and some extra linear-time processing to multiply

P1(x2) by the remaining
x
term), right?

Well no, not yet. The original problem was evaluating a degree

<n polynomial
P(x)
over the
n
-ROUs in
Zp
(
g0,...,gn1
). Our new two polynomials still have to be evaluated over
n
points and not over
n/2
points!

To reduce the number of points we should evaluate we add another observation as follows. Notice that in expressing

P(x)=P0(x2)+xP1(x2) we evaluate both
P0
and
P1
on
x2
and not on
x
. Originally we were evaluating
P(x)
where
x
is
nth
-ROU, so by squaring it
x2
is now a
(n/2)th
-ROU
. Since there are only
n/2
ROUs of order
n/2
, we have to make only
n/2
evaluations on
P0
and
P1
.

So, to evaluate

P(x) where
x
is
nth
-ROU we take the evaluation of
x2
, a
(n/2)th
-ROU, over both
P0
and add it to the evaluation of
P1
on
x2
, multiplied by
x
. In conclusion, to evaluate
P(x)
over
n
ROUs of order
n
, we have to:

  1. Evaluate
    P0(y)
    and
    P1(y)
    over
    n/2
    ROUs of order
    n/2
    .
  2. Compute
    P(x)=P0(x2)+xP1(x2)
    for each
    x
    which is an ROU of order
    n
    .

For completeness, notice that in the end of the recursion, if the FFT size is

1 then
P(x)
is of degree
0
so
P(x)=c
for some field element
c
, and thus the evaluation of it on a single point is exactly
c
.

To summarize, our FFT algorithm goes as follows:

Input

  • A power of two number,
    n=2k
    .
  • A
    nth
    -PROU,
    g
    .
  • The coefficient representation of a polynomial
    P(x)=i=0n1cixi
    .

Output

  • The evaluation representation of
    P(x)
    on the
    n
    powers of
    g
    :
    P=[P(g0),...,P(gn1)]
    .

Algorithm

  1. If

    n=1 return
    [c0]
    , because our polynomial is of degree
    <1
    , so it's degree
    0
    . Therefore the coefficient
    c0
    is the evaluation of
    P(g0)
    .

  2. Split

    P(x) into two polynomials:

    • P0(x)=i=0n/21c2ixi
      which will be built using the even coefficients of
      P(x)
      .
    • P1(x)=i=0n/21c2i+1xi
      which will be built using the odd coefficients of
      P(x)
      .
  3. Compute, recursively, the evaluations of

    P0(x),P1(x) on the
    n/2
    powers of
    g2
    , a PROU of order
    n/2
    .

  4. Let

    S,T be the arrays of legth
    n/2
    containing of the evaluations of
    P0,P1
    respectively from the recursive invocation. So
    S[i]=P0((g2)i)
    and
    T[i]=P1((g2)i)
    . The polynomials
    P0,P1
    satisfy:
    P(x)=P0(x2)+xP1(x2)
    .

  5. For each

    0i<n/2 set:

    • P[i]=S[i]+giT[i]
    • P[i+n/2]=S[i]+gn/2+iT[i]
  6. Output

    P.

Time complexity

Let

T(n) denote the number of computational steps we have to perform over an FFT of size
n=2k
. Since we have to solve a similar problem of size
n/2
twice +
n
additions and
n
multiplications. So:
T(n)=2T(n/2)+2n=2(2T(n/4)+2n2)=4T(n/4)+4n=4(2T(n/8)+2n4)=8T(n/8)+6n=...=nT(1)+2log2(n)n=O(nlog2(n))

So, we devised a

O(nlog(n)) algorithm to change the representation of a polynomial from coefficient-representation to evaluation-representation of
n
ROUs of order
n
. As we will explain next, the inverse FFT (from evaluation into coefficient representation) also takes
O(nlog(n))
and thus we will be able to multiply polynomials in
O(nlog(n))
time.

The Inverse FFT

The Inverse-FFT (IFFT) algorithm, as its name suggests, simply reverts the operation of the original FFT algorithm. Namely, let

g be a
nth
-PROU. The IFFT algorithm takes
P(g0),P(g1),...,P(gn1)
, the evaluation-representation of some polynomial
P(x)
of degree
<n
and outputs the coefficient-representation
P(x)=i=0n1cixi
.

In this section we'll give an informal explanation about the IFFT algorithm. As we did with the FFT algorithm, we'll assume for the sake of simplicity that

n is a smooth number,
n=2k
. If we could find the coefficient representation of two polynomials
P0
and
P1
both of degree
<n/2
such that:
P(x)=P0(x2)+xP1(x2)

Then we could immediately derive the coefficient representation of

P. Let's see how its done. If
P0(y)=i=0n/21aiyi
and
P1(y)=i=0n/21biyi
then:

P(x)=P0(x2)+xP1(x2)=i=0n/21aix2i+xi=0n/21bix2i=i=0n/21(aix2i+bix2i+1)

Therefore,

c2i the coefficient of
x2i
in
P(x)
is
ai
. Similarly,
c2i+1=bi
, the coefficient of
x2i+1
in
P(x)
.

So, if we had the coefficient-representation of such

P0,P1 we could obtain coefficient-representation of
P
.

At this point you've probably noticed already that this could be our recursive-step! We split the problem of obtaining the coefficient representation of a

<n degree polynomial into obtaining the polynomial representation of two
<n/2
degree polynomials.

The only thing left to do then, is to translate the input of our original problem (evaluations of

P on all
nth
-ROUs) into the inputs of our new, smaller, problems (evaluations of
P0
and
P1
on all
(n/2)th
-ROUs).

We know (from the FFT algorithm) that polynomials

P0,P1 of degree
<n/2
exist such that
P(x)=P0(x2)+xP1(x2)
. Now, let
g
be a
nth
-PROU. And let
i
be an integer in the range
[0,n/2)
. We have the following two equations obtain by setting
x=gi
and
x=gn/2+i
in the equation above:

P(gi)=P0(g2i)+giP1(g2i)P(gn/2+i)=P0(g2(n/2+i))+gn/2+iP1(g2(n/2+i)) Arranging the second equation a little bit we get:
P(gi)=P0(g2i)+giP1(g2i)P(gn/2+i)=P0(gng2i)+gn/2giP1(gng2i)
Now, recall that
g
is
nth
-PROU so
gn=1
and
gn/2=1
. Substituting these two identities in the second equation, we get:
P(gi)=P0((g2)i))+giP1((g2)i))P(gn/2+i)=P0((g2)i)giP1((g2)i)

This is looking good! Let's see what we have here:

  • P(gi)
    and
    P(gn/2+i)
    are known to us, since we are given an array with the evaluations of
    P
    .
  • gi
    is known to us.
  • P0((g2)i)
    and
    P1((g2)i)
    are unknown to us, when taken over all
    i
    values in the range
    [0,n/2)
    these are exactly the evaluations of
    P0
    and
    P1
    over the
    n/2
    powers of a
    n/2th
    -PROU (which is
    g2
    ).

So we have to simply solve a system of two equations with two variables. Solving it we get:

P0((g2)i)=P(gi)+P(gn/2+i)2P1((g2)i)=P(gi)P(gn/2+i)2gi

By solving this system of equations for all values of

i we get the
n/2
evaluations of
P0
and
P1
of the powers of
g2
which is a PROU of order
n/2
.

This was the recursive step. The recursion ends when

P is of degree
<1
, in that case
P
is of degree
0
. So
P(x)=c
is a constant polynomial, and we are given its evaluation on power
g0=1
. So we have
P(1)=c
and therefore the evaluation itself is exactly the single coefficient of our algorithm.

Let's try and write the algorithm:

Input

  • g
    - an
    nth
    -PROU.
  • A number
    n=2k
    , a power of two.
  • An array
    P
    of length
    n
    such that
    P[i]=P(gi)
    , the evaluation of some polynomial
    P
    of degree
    <n
    . This is the evaluation representation of polynomial
    P
    .

Output

  • A sequence of
    n
    field elements
    p0,...,pn1
    such that:
    P(x)=i=0n1pixi
    . This is the coefficient representation of polynomial
    P
    .

Algorithm

  1. If
    n=1
    return
    c0=P[i]
    .
  2. Otherwise, compute arrays
    S,T
    of length
    n/2
    such that:
    S[i]=P[i]+P[n/2+i]2T[i]=P[i]P[n/2+i]2gi
    Those arrays are the evaluations of polynomials
    P0,P1
    such that
    P(x)=P0(x2)+xP1(x2)
    over the
    n/2
    powers of
    g2
    .
  3. Recursively, obtain the coefficient representation of such
    P0,P1
    which are
    (s0,...,sn/21)
    and
    (t0,...,tn/21)
    .
  4. Output
    (p0,p1,...,pn1)=(s0,t0,s1,t1,...,sn/21,tn/21)
    .

The case where
n2k

What if

n isn't a power of 2? In both our FFT and inverse FFT we were expressing the input polynomial
P(x)
using two polynomials
P0,P1
such that:
P(x)=P0(x2)+xP1(x2)
This was simple because the group of powers of
g
, our
nth
-PROU could be split into pairs of elements
gi
and
gi+n/2
such that the squarings of both are equal to
g2i
.

So, if

n isn't a power of two, for example,
n=3m
, we can split the polynomial
P
using three polynomials
P0,P1,P2
such that:
P(x)=P0(x3)+xP1(x3)+x2P2(x3)

This will utilize the fact that the order of

g is a multiple of
3
and therefore we could split the powers of
g
into triplets
gi,gn/3+i,g2n/3+1
such that cubing them (i.e. - raising them by the power of
3
) will yield
g3i
.

We will also call the number of polynomials

P(x) was split into the split-factor of this recursive step. Notice that if
n
has different prime factors we may use a different split-factor in each recursive step.

So why do we care about smooth numbers?

We stated at the beginning of the post that we prefer the case in which

n is smooth. Why is that? If you closely pay attention you'll notice that step
5
of the FFT and step
2
of the IFFT are both taking
ns
time where
n
is the length of the input and
s
is the split factor. For example, if the split factor is
3
then each entry in the output of the FFT is computed using three evaluations, one from
P0
one from
P1
and one from
P2
. Overall, we make
n3
operations.

Now, if instead of 3, we a very large prime, say

3259, then each entry in the output will require at least
3259
field operations to be computed taking us closer to the old school-book algorithm and slowing down our algorithm. Therefore, given a polynomial
P(x)
of degree which is not a power of
2
we look at
P
as a polynomial of degree
<2k
for some
k
and solve an FFT of size
2k
. This may not work if we specifically are interested in the evaluations of some PROU of order
n
which is not a power of 2.

The Problem for Finite Fields

It would have been really nice if finite fields had PROUs of smooth orders so we could use them to construct efficient FFTs. The problem is that finite fields don't always have a root of unity of order

n for any
n
, so computing FFTs of the closest power of
2
may not work. To make things clear, let's look at secp256k1.

The order of the generator of secp256k1 is the prime number: p=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141. Therefore, the scalars we use are elements in the field

Z/pZ.

So, the multiplicative group is of size

p1 whose factorization is
263149631p0p1p2
where
p0,p1,p2
are some very large primes.

The order of each non-zero element in the field must divide

p1 and therefore we don't have a PROU for any order of our choice. Actually, even if we consider
149
and
631
to be smooth, the largest FFT we can compute over secp256k1 is of size
263149631=18,051,648
.

In other words, the largest "smooth" PROU we can find in secp256k1 group is of order

18,051,648.

One implication of this is that many projects that rely on FFTs of very high orders use

BLS12-381 curve which has
232
as a factor of the multiplicative group of the scalars. If you want to compute FFTs over the scalar field of some curve which doesn't have a large smooth factor, you may consider other options such as using ECFFTs.

Summary

In this post we have introduced FFTs and why they are needed. We have also presented one of the most popular FFT algorithms, the Cooley-Tuckey algorithm and its inverse, the IFFT algorithm.