# Tutorial 4: Understanding FFT

---

# Preface.

The FFT is probably the most famous algorithm in computer science, and there is a boatload of resources out there! This can be quite overwhelming, and perhaps I am being redundant by publishing my own, ha ha! Be that as it may, I will try my best to provide a potent summary for our purposes. However, I would highly recommend you check out some of the resources I have drawn from in order to get a holistic understanding.

My explanation of FFT will draw heavily from the great Professor Aleks! [Here](https://www.cse.unsw.edu.au/~cs4121/FFT.pdf) is the link; it goes into much more detail. I would also recommend the [Algorithms book](http://algorithmics.lsi.upc.edu/docs/Dasgupta-Papadimitriou-Vazirani.pdf), as it has a great explanation as well as some superb visuals under section 2.6 (page 64), "The fast Fourier transform". And for code samples I find the "Geeks for Geeks" resources quite good; they have provided a [recursive implementation](https://www.geeksforgeeks.org/fast-fourier-transformation-poynomial-multiplication/) and an [iterative implementation](https://www.geeksforgeeks.org/iterative-fast-fourier-transformation-polynomial-multiplication/). I will provide some examples in Rust.

Anyway, if you just want my subpar explanation, then let's get into it!

# Understanding FFT with an example.

## Context.

The FFT is an efficient algorithm which computes the DFT of a sequence of numbers. These numbers usually define a function, and the output of the transform is a series of "samples" which uniquely define that function. The transform is invertible, so we can "undo" it and return to the original domain. In terms of polynomials, the DFT converts an input polynomial from coefficient representation to evaluation representation: it takes a polynomial of length $n$ (degree $n-1$) and converts it into $n$ evaluation points.
The evaluation points are often referred to as samples, and they are enough to uniquely define the polynomial, so we can use the inverse DFT to return to the coefficient domain. Calculating the DFT naively is highly inefficient: evaluating a polynomial at a single point has $O(n)$ complexity, and this must be done for $n$ points, giving $O(n^2)$, and that is before we have even chosen the points cleverly! The FFT reduces the complexity to $O(n \log n)$ by employing a "divide and conquer" strategy and selecting the points intelligently such that its computations can be "reused".

## Understanding the divide and conquer strategy.

In order to achieve an efficient algorithm, we must consider a divide and conquer approach, where the heart of the procedure involves splitting up the polynomial into subsets of odd and even powers.

### Example 1: Simple polynomial

To investigate this, let's consider a simple concrete polynomial of degree three, $p(x)$:

$$
\begin{align}
p(x) &= 1 + 2x + 3x^2 + 4x^3 \\
\end{align}
$$

We need to split $p(x)$ into its odd and even powers:

$$
\begin{align}
p(x) &= 1 + 2x + 3x^2 + 4x^3 \\
p(x) &= (1 + 3x^{2}) + (2x + 4x^{3}) \\
&= (1 + 3x^{2}) + x(2 + 4x^{2}) \\
\end{align}
$$

We have neatly rearranged the equation, and the powers on each side now share the same pattern. Now, let's simplify this further by substituting $y=x^2$ into the inner sub-equations:

$$
\begin{align}
p(x) &= (1 + 3y) + x(2 + 4y) \\
\end{align}
$$

Which can be rewritten as:

$$
\begin{align}
p^0(y) &= 1 + 3y \\
p^1(y) &= 2 + 4y \\
p(x) &= p^0(x^2) + xp^1(x^2) \\
\end{align}
$$

Now, this seems like a lot of work just to rewrite the equation, but it is the key to efficiently evaluating the polynomial at certain points, i.e. it is the key to the transform. The formula applies to bigger polynomials too, and we can apply it recursively until we reach the base case, which is what we see above. Therefore, the evaluation results of the simple base case can be "recycled" into bigger polynomials.
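Here is a small self-contained Rust sketch that checks this rearrangement numerically (the helper names `eval_poly` and `split_even_odd` are my own, not from any library):

```rust
/// Evaluate a polynomial given by `coeffs` (lowest degree first) at `x`,
/// using Horner's method.
fn eval_poly(coeffs: &[f64], x: f64) -> f64 {
    coeffs.iter().rev().fold(0.0, |acc, &c| acc * x + c)
}

/// Split coefficients into even-index and odd-index halves.
fn split_even_odd(coeffs: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let evens = coeffs.iter().step_by(2).cloned().collect();
    let odds = coeffs.iter().skip(1).step_by(2).cloned().collect();
    (evens, odds)
}

fn main() {
    // p(x) = 1 + 2x + 3x^2 + 4x^3
    let p = [1.0, 2.0, 3.0, 4.0];
    let (p0, p1) = split_even_odd(&p); // p0(y) = 1 + 3y, p1(y) = 2 + 4y
    let x = 5.0;
    // Check: p(x) = p0(x^2) + x * p1(x^2)
    let direct = eval_poly(&p, x);
    let split = eval_poly(&p0, x * x) + x * eval_poly(&p1, x * x);
    assert_eq!(direct, split); // both are 586
    println!("p({}) = {} via both routes", x, direct);
}
```

Both routes compute the same value, which is the whole point of the rearrangement.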
### Example 2: Abstract polynomial

Consider the polynomial:

$$
\begin{align}
A(x) = c_0 x^0 + c_1 x^1 + \ldots + c_{n-1} x^{n-1}
\end{align}
$$

And apply the same rearranging:

$$
\begin{align}
A(x) &= (c_0 + c_2x^2 + c_4x^4 + \ldots + c_{n-2}x^{n-2}) + (c_1x + c_3x^3 + c_5x^5 + \ldots + c_{n-1}x^{n-1}) \\
&= (c_0 + c_2x^2 + c_4x^4 + \ldots + c_{n-2}x^{n-2}) + x(c_1 + c_3x^2 + c_5x^4 + \ldots + c_{n-1}x^{n-2}) \\
\end{align}
$$

Notice again the "duplication" of the powers between the two halves! And remember that computing large powers can be very expensive, so if we can "build them up" from smaller powers then this will reduce the cost. Let's make this obvious and the pattern will become clear.

$$
\begin{align}
&= (c_0 + c_2(x^2) + c_4(x^2)^2 + \ldots + c_{n-2}(x^2)^{(n-2)/2}) + x(c_1 + c_3(x^2) + c_5(x^2)^2 + \ldots + c_{n-1}(x^2)^{(n-2)/2}) \\
\end{align}
$$

Now let $y=x^2$. We have the two polynomials that form $A(x)$:

$$
\begin{align}
A^0(y) &= c_0 + c_2y + c_4y^2 + \ldots + c_{n-2}y^{(n-2)/2} \\
A^1(y) &= c_1 + c_3y + c_5y^2 + \ldots + c_{n-1}y^{(n-2)/2} \\
A(x) &= A^0(x^2) + x A^1(x^2)
\end{align}
$$

Again, we can continue to apply the same rearranging strategy to break the problem down to its base case.

$$
\begin{align}
A^{0}(y) &= c_0 + c_2y + c_4y^2 + \ldots + c_{n-2}y^{(n-2)/2} \\
A^{0, 0}(y) &= c_0 + c_4y + \ldots \\
A^{0, 1}(y) &= c_2 + c_6y + \ldots \\
A^{0}(y) &= A^{0, 0}(y^2) + y A^{0,1}(y^2) \\
\end{align}
$$

And also for the other half of the polynomial:

$$
\begin{align}
A^{1}(y) &= c_1 + c_3y + c_5y^2 + \ldots + c_{n-1}y^{(n-2)/2} \\
A^{1, 0}(y) &= c_1 + c_5y + \ldots \\
A^{1, 1}(y) &= c_3 + c_7y + \ldots \\
A^{1}(y) &= A^{1, 0}(y^2) + y A^{1,1}(y^2) \\
\end{align}
$$

Notice the pattern as we rearrange the sides of the polynomial; this pattern continues as we increase the size of the polynomial, and we keep rearranging the equation recursively until we reach the "sub-equations" with two coefficients.
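To make the index pattern concrete, here is a small self-contained Rust sketch (my own illustration, not from any library) that performs the recursive even/odd split down to the two-coefficient base cases. Setting $c_i = i$ lets us read the index pattern straight off the result:

```rust
/// Recursively split a coefficient list (lowest degree first) into its
/// even-indexed and odd-indexed parts, collecting the two-coefficient
/// base cases in the order they are reached.
fn base_cases(coeffs: &[f64], out: &mut Vec<Vec<f64>>) {
    if coeffs.len() <= 2 {
        out.push(coeffs.to_vec()); // base case reached
        return;
    }
    let evens: Vec<f64> = coeffs.iter().step_by(2).cloned().collect();
    let odds: Vec<f64> = coeffs.iter().skip(1).step_by(2).cloned().collect();
    base_cases(&evens, out); // the A^{...,0} branch
    base_cases(&odds, out);  // the A^{...,1} branch
}

fn main() {
    // A(x) = c_0 + c_1 x + ... + c_7 x^7, with c_i = i.
    let a: Vec<f64> = (0..8).map(|i| i as f64).collect();
    let mut leaves = Vec::new();
    base_cases(&a, &mut leaves);
    // The base cases match the abstract derivation:
    // A^{0,0} = c_0 + c_4 y, A^{0,1} = c_2 + c_6 y,
    // A^{1,0} = c_1 + c_5 y, A^{1,1} = c_3 + c_7 y.
    assert_eq!(
        leaves,
        vec![
            vec![0.0, 4.0],
            vec![2.0, 6.0],
            vec![1.0, 5.0],
            vec![3.0, 7.0],
        ]
    );
}
```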
### Example 3: Larger concrete polynomial

Now let's consider a polynomial of degree 7:

$$
\begin{align}
q(x) &= 3 + 4x + 6x^2 + 2x^3 + 1x^4 + 10x^5 + 11x^{6}+ 7x^7 \\
&= (3 + 6x^{2} + 1x^{4} + 11x^{6} ) + x(4 + 2x^{2} + 10x^{4} + 7x^6) \\
&= ((3 + 1x^{4}) + x^2 (6 + 11x^{4})) + x((4 + 10x^{4}) + x^2(2 + 7x^4)) \\
\end{align}
$$

Or, if you prefer, as abstract coefficients:

$$
\begin{align}
A(x) &= (c_{0} + c_2x^{2} + c_{4}x^{4} + c_{6}x^{6}) + x(c_1 + c_3x^{2} + c_{5}x^{4} +c_7x^{6}) \\
&= ((c_{0} + c_{4}x^{4}) + x^{2} (c_2 + c_{6}x^{4})) + x( (c_1 + c_{5}x^{4}) + x^{2} (c_{3} +c_{7}x^{4})) \\
\end{align}
$$

In each "sub" equation, the powers of $x$ are mirrored on the other side, which allows us to reuse the results, and we can see the most basic form for each coefficient pair. With this rearrangement we have reduced the computation to a single addition and a multiplication by some even power of $x$: $x^{2}, x^{4}, x^{6}, \ldots$ The result is then reused up the recursion.

$$
next_{even} = curr_{even} + curr_{odd} \cdot x
$$

Where $x$ is an even power, $x^{0}, x^{2}, x^{4}, x^{6}, \ldots$, and at each level of recursion the $x$ from the previous level is squared. More generally:

$$
A(x) = A_{even}(x^2) + xA_{odd}(x^2)
$$

Let's look at some pseudocode (abbreviated Rust) to understand how this would play out in code.

```
fn FFT(a: [F], x: F) -> [F] {
    let n = a.len();
    if n == 1 {
        return a;
    }
    let even = FFT(get_evens(a), x ^ 2);
    let odd = FFT(get_odds(a), x ^ 2);
    let half_n = n / 2;
    let r = [];
    for j in 0..n {
        // even and odd each hold only n/2 results, so the index wraps.
        // We are NOT exploiting the "butterfly" trick yet (for explanation purposes).
        r[j] = even[j % half_n] + (x ^ j) * odd[j % half_n];
    }
    return r;
}
```

Notice how the above equation is turned into code:

$$
\begin{align}
r_j &= even_j + (x ^ j) * odd_j \\
\\
\text{where:} \\
even_j &= FFT(a_{evens}, x^2) \\
odd_j &= FFT(a_{odds}, x^2) \\
\end{align}
$$

**HOWEVER**, the algorithm can be optimised further, because we have not exploited the "recycling" of results correctly. The trick is usually known as "the butterfly".
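We can sanity-check the two-level rearrangement of $q(x)$ numerically; this Rust sketch (the helper `eval_poly` is my own) evaluates both forms at $x = 2$:

```rust
/// Evaluate a polynomial (coefficients lowest degree first) with Horner's method.
fn eval_poly(coeffs: &[f64], x: f64) -> f64 {
    coeffs.iter().rev().fold(0.0, |acc, &c| acc * x + c)
}

fn main() {
    // q(x) = 3 + 4x + 6x^2 + 2x^3 + x^4 + 10x^5 + 11x^6 + 7x^7
    let q = [3.0, 4.0, 6.0, 2.0, 1.0, 10.0, 11.0, 7.0];
    let x: f64 = 2.0;
    let (x2, x4) = (x * x, x * x * x * x);
    // Read straight off the fully rearranged form:
    // ((3 + 1x^4) + x^2(6 + 11x^4)) + x((4 + 10x^4) + x^2(2 + 7x^4))
    let even_half = (3.0 + 1.0 * x4) + x2 * (6.0 + 11.0 * x4);
    let odd_half = (4.0 + 10.0 * x4) + x2 * (2.0 + 7.0 * x4);
    assert_eq!(eval_poly(&q, x), even_half + x * odd_half); // both equal 1987
    println!("q(2) = {}", even_half + x * odd_half);
}
```

Note that the rearranged form only ever needed $x^2$ and $x^4$, never $x^7$ directly.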
## Understanding the "butterfly" trick.

Remember, the goal of the FFT is to compute a bunch of results at certain evaluation points. Considering the polynomial:

$$
\begin{align}
A(x) = c_0 x^0 + c_1 x^1 + \ldots + c_{n-1} x^{n-1}
\end{align}
$$

The FFT would be the resulting values computed at the points $x_0, x_1, x_2, \ldots, x_{n-1}$:

$$
\begin{align}
FFT(A(x)) &= \{ A(x_{0}), A(x_{1}), A(x_{2}), \ldots, A(x_{n-1}) \} \\
\end{align}
$$

And this computation would be performed with the recursive relationship:

$$
A(x) = A_{even}(x^2) + xA_{odd}(x^2)
$$

Thus the FFT would look something like this:

$$
\begin{align}
A(x_0) &= A_{e}(x_{0}^2) + x_{0} A_{o}(x_{0}^2) \\
A(x_1) &= A_{e}(x_{1}^2) + x_{1} A_{o}(x_{1}^2) \\
A(x_2) &= A_{e}(x_{2}^2) + x_{2} A_{o}(x_{2}^2) \\
\ldots \\
A(x_{n-2}) &= A_{e}(x_{n-2}^2) + x_{n-2} A_{o}(x_{n-2}^2) \\
A(x_{n-1}) &= A_{e}(x_{n-1}^2) + x_{n-1} A_{o}(x_{n-1}^2) \\
\end{align}
$$

Now let's remember we can contrive these points to be anything we want; we are just plugging them into the polynomial and computing the result. The "butterfly" trick involves noticing that we can reuse the components of the equation for computing $x$ in the computation of negative $x$. This is because $x$ squared is equal to negative $x$ squared:

$$
x^2 = (-x)^2
$$

Thus, notice that if we have a point $x_i$:

$$
A(x_{i}) = A_e(x_{i}^{2}) + x_{i}A_o(x_{i}^{2})
$$

We can simply negate that point, $-x_i$, and it gives us a new evaluation point for free!

$$
A(-x_{i}) = A_e(x_{i}^{2}) - x_{i}A_o(x_{i}^{2})
$$

Thus, at each level of our "divide and conquer" algorithm, we only need to compute half as many results as before! This is because the results of both $A_e(x_{i}^{2})$ and $A_o(x_{i}^{2})$ can be used to "build up" the result of the evaluation for both $x_i$ and $-x_i$.
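A quick numerical check of the butterfly identity (again with my own `eval_poly` helper): one shared pair of sub-evaluations yields both $A(x)$ and $A(-x)$.

```rust
/// Evaluate a polynomial (coefficients lowest degree first) with Horner's method.
fn eval_poly(coeffs: &[f64], x: f64) -> f64 {
    coeffs.iter().rev().fold(0.0, |acc, &c| acc * x + c)
}

fn main() {
    // A(x) = 1 + 2x + 3x^2 + 4x^3, so A_e(y) = 1 + 3y and A_o(y) = 2 + 4y.
    let a = [1.0, 2.0, 3.0, 4.0];
    let (a_e, a_o) = ([1.0, 3.0], [2.0, 4.0]);
    let x: f64 = 3.0;
    // Shared work: A_e(x^2) and A_o(x^2) are computed once...
    let (e, o) = (eval_poly(&a_e, x * x), eval_poly(&a_o, x * x));
    // ...and reused for BOTH evaluation points.
    assert_eq!(eval_poly(&a, x), e + x * o);  // A(x)  = A_e(x^2) + x * A_o(x^2)
    assert_eq!(eval_poly(&a, -x), e - x * o); // A(-x) = A_e(x^2) - x * A_o(x^2)
}
```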
In other words, to evaluate $A(x)$, which has a length of $n$, we would need $n$ evaluation points; however, this trick allows us to reduce the number of points we must actively compute to half of $n$:

$$
\pm x_{0}, \pm x_{1}, \pm x_{2}, \ldots , \pm x_{(n/2) - 1}
$$

This halves the computation cost, because we only have to compute half as many multiplications. The same reduction applies at every recursion level: evaluating $A_e(x)$ and $A_o(x)$ (each of which has half the length of $A(x)$) again requires only half as many distinct points. Thus, our general recursive relationship becomes:

$$
\begin{align}
A(x_{i}) &= A_e(x_{i}^{2}) + x_{i}A_o(x_{i}^{2}) \\
A(-x_{i}) &= A_e(x_{i}^{2}) - x_{i}A_o(x_{i}^{2}) \\
\end{align}
$$

This is known as the "butterfly", and our FFT equations can be rewritten as follows, where we can see how the $A_e(x_i^2)$ and $A_o(x_i^2)$ computations are "recycled".

$$
\begin{align}
A(x_0) &= A_{e}(x_{0}^2) + x_{0} A_{o}(x_{0}^2) \\
A(x_1) &= A_{e}(x_{1}^2) + x_{1} A_{o}(x_{1}^2) \\
A(x_2) &= A_{e}(x_{2}^2) + x_{2} A_{o}(x_{2}^2) \\
\ldots \\
A(x_{(n/2)-1}) &= A_{e}(x_{(n/2)-1}^2) + x_{(n/2)-1} A_{o}(x_{(n/2)-1}^2) \\
A(-x_0) &= A_{e}(x_{0}^2) - x_{0} A_{o}(x_{0}^2) \\
A(-x_1) &= A_{e}(x_{1}^2) - x_{1} A_{o}(x_{1}^2) \\
A(-x_2) &= A_{e}(x_{2}^2) - x_{2} A_{o}(x_{2}^2) \\
\ldots \\
A(-x_{(n/2)-1}) &= A_{e}(x_{(n/2)-1}^2) - x_{(n/2)-1} A_{o}(x_{(n/2)-1}^2) \\
\end{align}
$$

And we can now complete our pseudocode to be the "definitive" version:

```
fn FFT(a: [F], x: F) -> [F] {
    let n = a.len();
    if n == 1 {
        return a;
    }
    let even = FFT(get_evens(a), x ^ 2);
    let odd = FFT(get_odds(a), x ^ 2);
    let half_n = n / 2;
    let r = [];
    for j in 0..half_n {
        r[j] = even[j] + (x ^ j) * odd[j];
        r[j + half_n] = even[j] - (x ^ j) * odd[j];
    }
    return r;
}
```

Notice how the above "butterfly" equations are turned into the recursive code:

$$
\begin{align}
r_j &= even_j + (x ^ j) * odd_j \\
r_{j+n/2} &= even_j - (x ^ j) *
odd_j \\ \\ \text{where:} \\ even_j &= FFT(a_{evens}, x^2) \\ odd_j &= FFT(a_{odds}, x^2) \\ \end{align} $$ Usually, the butterfly recursive relationship is shown with a pretty diagram, and I would point to the [Algorithms book](http://algorithmics.lsi.upc.edu/docs/Dasgupta-Papadimitriou-Vazirani.pdf) as an excellent resource for seeing this: see "The fast Fourier transform unraveled" page 77. ## Understanding the choice of evaluation points and complex $n$'th root of unity. You may have noticed that we have a problem: the "butterfly" trick will only work on the top level of recursion. Let's look at the FFT equations again for the evaluation points: $\pm x_{0}, \pm x_{1}, \pm x_{2}, \ldots , \pm x_{(n - 1)/2}$. $$ \begin{align} A(x_0) &= A_{e}(x_{0}^2) + x_{0} A_{o}(x_{0}^2) \\ A(x_1) &= A_{e}(x_{1}^2) + x_{1} A_{o}(x_{1}^2) \\ A(x_2) &= A_{e}(x_{2}^2) + x_{2} A_{o}(x_{2}^2) \\ \ldots \\ A(x_{(n-1)/2}) &= A_{e}(x_{(n-1)/2}^2) + x_{2} A_{o}(x_{(n-1)/2}^2) \\ \ldots \\ A(-x_0) &= A_{e}(x_{0}^2) - x_{0} A_{o}(x_{0}^2) \\ A(-x_1) &= A_{e}(x_{1}^2) - x_{1} A_{o}(x_{1}^2) \\ A(-x_2) &= A_{e}(x_{2}^2) - x_{2} A_{o}(x_{2}^2) \\ \ldots \\ A(-x_{(n-1)/2}) &= A_{e}(x_{(n-1)/2}^2) - x_{(n-1)/2} A_{o}(x_{(n-1)/2}^2) \\ \end{align} $$ The next level of recursion would be feed squared values, and these values would themselves have to be "plus minus". $$ \begin{align} & \pm x_{0}^2, \pm x_{1}^2, \pm x_{2}^2, \ldots , \pm x_{(n - 1)/2}^2 \\ \end{align} $$ However a square cannot be negative! This is where complex numbers come into play. However, what choice of complex numbers? This is best demonstrated by "reverse engineering" the squaring process until we reach the set of $n$ points. ``` ... \ / \ / \ / \ / +1 -1 +i -i \ / \ / +1 -1 \ / \ / +1 ``` At the very bottom of the recursion we have to choose a point, this will be $1$. 
The level above must consist of its square roots:

$$
\pm \sqrt{1} = \pm 1
$$

The next level above has:

$$
\pm \sqrt{+1} = \pm 1
$$

As well as the complex component:

$$
\pm \sqrt{-1} = \pm i
$$

Where $i$ is the imaginary unit. Continuing this process, we would eventually reach the initial set of $n$ points. Therefore, the points are the complex $n$'th roots of unity, that is, the $n$ complex solutions to the equation:

$$
z^n = 1
$$

Complex numbers essentially are just numbers on a circle that go round and round like a clock; the following panel from [the Algorithms book](http://algorithmics.lsi.upc.edu/docs/Dasgupta-Papadimitriou-Vazirani.pdf) does a good job of visualising it:

![[Pasted image 20240122104646.png]]

## $n$'th roots of unity.

Using the complex $n$'th roots of unity is extremely helpful! Provided we know the length of the input polynomial, we can generate all the evaluation points for free! And our squaring won't generate huge numbers, but rather just wrap around the unit circle. And by using Euler's formula for complex numbers:

$$
w = e^{2\pi{}i/n}
$$

We can easily compute the roots of unity as we go! Formally, the $n$'th roots of unity would be:

$$
1, w, w^2, w^3, \ldots, w^{n-2}, w^{n-1}
$$

We can now exploit the fact that the "plus minus" trick still works, so we can generate the second half of the points by negating the first half; that is, the $n$'th roots are "plus minus" paired:

$$
w^{(n/2) + j} = -w^j
$$

And squaring the roots will produce the $(n/2)$'th roots of unity. Therefore, starting with some $n$ (a power of 2), each level of recursion ($k=0,1,2,3,\ldots$) will use the $(n/2^{k})$'th roots of unity. Let's consider our FFT equations, where $n$ is the length of the original polynomial.
$$
\begin{align}
k=0) & & \pm x_{0}, \pm x_{1}, \pm x_{2}, \ldots , \pm x_{(n/2) - 1} \\
k=1) & & \pm x_{0}^2, \pm x_{1}^2, \pm x_{2}^2, \ldots , \pm x_{(n/4) - 1}^2 \\
k=2) & & \pm x_{0}^4, \pm x_{1}^4, \pm x_{2}^4, \ldots , \pm x_{(n/8) - 1}^4 \\
k=3) & & \pm x_{0}^8, \pm x_{1}^8, \pm x_{2}^8, \ldots , \pm x_{(n/16) - 1}^8 \\
k=4) & & \pm x_{0}^{16}, \pm x_{1}^{16}, \pm x_{2}^{16}, \ldots , \pm x_{(n/32) - 1}^{16} \\
& \ldots \\
\end{align}
$$

The size of the polynomial is halved at each recursion level, and the number of roots of unity is also halved. Since the $n$'th roots of unity are the solutions to the equation $z^n = 1$, there are exactly $n$ solutions; we can see below how the number of solutions (or roots of unity) halves at each level of recursion.

| k    | $n/2^k$ 'th roots of unity |
| ---- | -------------------------- |
| 0    | n/1                        |
| 1    | n/2                        |
| 2    | n/4                        |
| 3    | n/8                        |
| 4    | n/16                       |
| .... |                            |

Where the primitive root of unity is:

$$
w = e^{2\pi{}i/n}
$$

The fact that we can square our primitive root at every recursion level works perfectly with the "divide and conquer" recursion approach!

# Bringing it all together; and, how to do the Inverse FFT.

## Key Takeaway.

So that was some heavy stuff; what is the key takeaway? Let's go over the facts. The primitive root of unity is:

$$
w_n = e^{i \frac{2\pi{}}{n}}
$$

$w_n$ satisfies the root of unity equation:

$$
w_n^n = 1
$$

And:

$$
\begin{align}
|w_n^m| &= 1 && \text{for all } m
\end{align}
$$

This means all the powers of $w_n$ belong to the unit circle and are evenly spaced, having arguments which are integer multiples of $\frac{2\pi}{n}$. Now, we have the fact that if $n>0$ is an even number, then the squares of the $n$'th roots of unity are exactly the $n/2$ complex roots of unity of order $n/2$.
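Both facts, the "plus minus" pairing and the order-halving under squaring, are easy to check numerically. Here is a Rust sketch with a hand-rolled complex type (Rust's standard library has none; the names `C` and `root` are my own):

```rust
#[derive(Clone, Copy)]
struct C { re: f64, im: f64 }

impl C {
    fn mul(self, o: C) -> C {
        C { re: self.re * o.re - self.im * o.im,
            im: self.re * o.im + self.im * o.re }
    }
    fn close(self, o: C) -> bool {
        (self.re - o.re).abs() < 1e-12 && (self.im - o.im).abs() < 1e-12
    }
}

/// The k'th power of the primitive n'th root of unity, w_n = e^{2*pi*i/n}.
fn root(n: usize, k: usize) -> C {
    let theta = 2.0 * std::f64::consts::PI * (k as f64) / (n as f64);
    C { re: theta.cos(), im: theta.sin() }
}

fn main() {
    let n = 8;
    for j in 0..n / 2 {
        let w_j = root(n, j);
        // Plus-minus pairing: w^{(n/2)+j} = -w^j
        let minus_w_j = C { re: -w_j.re, im: -w_j.im };
        assert!(root(n, n / 2 + j).close(minus_w_j));
        // Squaring an n'th root gives an (n/2)'th root: (w_n^j)^2 = w_{n/2}^j
        assert!(w_j.mul(w_j).close(root(n / 2, j)));
    }
    println!("both root-of-unity facts hold for n = {}", n);
}
```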
We demonstrated this above with a somewhat "hand wavy" explanation; here is the proof by the "cancellation property":

$$
w_n^{2k} = (w_{2(\frac{n}{2})})^{2k} = w_{\frac{n}{2}}^k
$$

Now, let's remember the way we implement it in code. Essentially, we can start with the primitive root of unity, and each recursion level receives the squared root for that particular level. Therefore, we only need to know the length of the input polynomial: from it we can easily compute the primitive root of unity, and from that all of our evaluation points for the evaluation form. Some implementations calculate the root of unity at each particular level, based on the current input size, but as shown above, every time we halve the polynomial we can simply square the primitive root to obtain the correct root of unity for that level.

## The definitive FFT pseudocode

Now, with all that in mind, we can finally complete our pseudocode! Let's define the driver function, where we compute the primitive root of unity using Euler's formula. Assume the $F$ type is a complex floating point number.

```
fn FFT_driver(a: [F]) -> [F] {
    let n = a.len();
    let w_n = e ^ (2 * pi * i / n); // primitive n'th root of unity
    return FFT(a, w_n);
}
```

Now we see the driver can plug this into the core of the FFT.

```
// function FFT(a, w)
// Input:
//  - a : An array, where, a = (a_0, a_1, ... , a_{n-1}), for n a power of 2
//  - w : A primitive n'th root of unity
fn FFT(a: [F], w: F) -> [F] {
    let n = a.len();
    if n == 1 {
        return a;
    }
    let even = FFT(get_evens(a), w ^ 2);
    let odd = FFT(get_odds(a), w ^ 2);
    let half_n = n / 2;
    let r = [];
    for j in 0..half_n {
        r[j] = even[j] + (w ^ j) * odd[j];
        r[j + half_n] = even[j] - (w ^ j) * odd[j];
    }
    return r;
}
```

## Inverse FFT

The last thing to cover is how to "undo" the transform, that is, return from the evaluation domain to the coefficient domain. Simply put, this involves using the "inverse" of the roots of unity, but what does that mean in the context of complex numbers?
Simply, we just need to "undo" the multiplication by the original $w$. Formally, this means $w \cdot w_{inv} = 1$, where $w$ is a complex number and $w_{inv}$ is its inverse. Therefore, $w_{inv}$ would be:

$$
w_{inv} = \frac{1}{w} = w^{-1}
$$

Therefore, all we need to do to invert the transform is to apply the FFT with the inverse of the primitive root of unity; that is, instead of $w_n$, use $w_n^{-1}$.

### Normalisation

The result of the inverse FFT must also be normalised to get the correct scale, or in other words, to ensure a "round trip" through the forward (FFT) and reverse (inverse FFT) transforms. This is achieved by dividing the result of the inverse FFT by $n$ (the length of the original polynomial), because the transforms are in fact summing $n$ samples. A total normalisation factor of $\frac{1}{n}$ must be applied somewhere; a symmetric convention multiplies the result of each transform by $\frac{1}{\sqrt{n}}$, but the most common approach is to normalise only the inverse result. Normalising the forward transform as well is computationally more expensive and doesn't actually affect any of the operations in evaluation form. So the "definitive" approach is to just divide the result of the inverse FFT by $n$.

## The definitive Inverse FFT pseudocode

Now we can define the pseudocode for the inverse FFT (IFFT). The driver function is almost exactly the same, where $F$ is a complex floating point type:

```
fn IFFT_driver(a: [F]) -> [F] {
    let n = a.len();
    let w_n = e ^ (2 * pi * i / n); // same primitive root as the forward driver
    return IFFT(a, w_n);
}
```

The core of the inverse function must simply compute the inverse of the primitive root, apply the FFT transform with it, and then normalise the result.

```
fn IFFT(a: [F], w: F) -> [F] {
    let n = a.len();
    let r = FFT(a, w ^ -1); // FFT with the inverse primitive root
    for i in 0..n {
        r[i] = r[i] / n; // normalise
    }
    return r;
}
```
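To close, here is a complete, runnable Rust version of the forward and inverse transforms above, with a minimal hand-rolled complex type (Rust's standard library has no complex number; names like `C`, `fft_driver` and `ifft_driver` are my own). It checks the round trip on $p(x) = 1 + 2x + 3x^2 + 4x^3$:

```rust
use std::f64::consts::PI;

#[derive(Clone, Copy, Debug)]
struct C { re: f64, im: f64 }

impl C {
    fn new(re: f64, im: f64) -> C { C { re, im } }
    fn add(self, o: C) -> C { C::new(self.re + o.re, self.im + o.im) }
    fn sub(self, o: C) -> C { C::new(self.re - o.re, self.im - o.im) }
    fn mul(self, o: C) -> C {
        C::new(self.re * o.re - self.im * o.im, self.re * o.im + self.im * o.re)
    }
}

/// Core FFT: `a.len()` must be a power of two and `w` a primitive n'th root of unity.
fn fft(a: &[C], w: C) -> Vec<C> {
    let n = a.len();
    if n == 1 {
        return a.to_vec();
    }
    let evens: Vec<C> = a.iter().step_by(2).cloned().collect();
    let odds: Vec<C> = a.iter().skip(1).step_by(2).cloned().collect();
    let even = fft(&evens, w.mul(w)); // recurse with the squared root
    let odd = fft(&odds, w.mul(w));
    let mut r = vec![C::new(0.0, 0.0); n];
    let mut w_j = C::new(1.0, 0.0); // w^j, built up incrementally
    for j in 0..n / 2 {
        let t = w_j.mul(odd[j]);
        r[j] = even[j].add(t);         // butterfly: A(x_j)
        r[j + n / 2] = even[j].sub(t); //            A(-x_j)
        w_j = w_j.mul(w);
    }
    r
}

fn fft_driver(a: &[C]) -> Vec<C> {
    let n = a.len() as f64;
    let w = C::new((2.0 * PI / n).cos(), (2.0 * PI / n).sin()); // e^{2*pi*i/n}
    fft(a, w)
}

fn ifft_driver(a: &[C]) -> Vec<C> {
    let n = a.len() as f64;
    // The inverse primitive root: w^{-1} = e^{-2*pi*i/n}
    let w_inv = C::new((2.0 * PI / n).cos(), -(2.0 * PI / n).sin());
    fft(a, w_inv)
        .into_iter()
        .map(|c| C::new(c.re / n, c.im / n)) // normalise by 1/n
        .collect()
}

fn main() {
    // p(x) = 1 + 2x + 3x^2 + 4x^3
    let p: Vec<C> = [1.0, 2.0, 3.0, 4.0].iter().map(|&c| C::new(c, 0.0)).collect();
    let evals = fft_driver(&p);     // evaluation form: p at the 4th roots of unity
    let back = ifft_driver(&evals); // back to coefficient form
    for (orig, rec) in p.iter().zip(back.iter()) {
        assert!((orig.re - rec.re).abs() < 1e-9 && rec.im.abs() < 1e-9);
    }
    println!("round trip ok");
}
```

This sketch recomputes `w.mul(w)` twice and allocates new vectors at every level, which a production (typically iterative, in-place) FFT would avoid, but it maps one-to-one onto the pseudocode above.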