# Annotate and explain Quiz7 Problem D (2024 Fall)
## Problem `D`
### 1. What is the relationship between FFT and convolution?
Answer:
Mathematical relationship (the convolution theorem):
Convolution in the time domain, $(f*g)(t)$, becomes point-wise multiplication in the frequency domain: $F\{f*g\}=F\{f\} \times F\{g\}$, where $F$ represents the Fourier transform.
Practical implication:
Computing a convolution directly costs $O(n^2)$ operations, while the FFT-based route costs only $O(n \log n)$, so for long signals it is far cheaper.
To replace direct convolution with the FFT:
1. Take the FFT of both signals ($f$ and $g$) - $O(n \log n)$
2. Multiply them point-wise in the frequency domain - $O(n)$
3. Take the inverse FFT of the product to get back to the time domain - $O(n \log n)$
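For contrast, the direct time-domain convolution that these three steps replace is a double loop, which is where the $O(n^2)$ cost comes from. A minimal sketch (the function name is my own, for illustration):

```c
#include <stddef.h>

/* Direct linear convolution: out must have length na + nb - 1. */
void convolve_direct(const double *f, size_t na,
                     const double *g, size_t nb, double *out)
{
    for (size_t i = 0; i < na + nb - 1; i++)
        out[i] = 0.0;
    /* Every pair (i, j) contributes f[i] * g[j] to out[i + j]:
     * na * nb multiply-adds in total. */
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            out[i + j] += f[i] * g[j];
}
```

For `na = nb = n` this performs exactly $n^2$ multiply-adds, whereas the FFT route needs only $O(n \log n)$ work overall.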
### 2. What is the meaning of the following code?
```c
complex double *even = malloc(N / 2 * sizeof(complex double));
complex double *odd = malloc(N / 2 * sizeof(complex double));
```
These lines are part of implementing the Cooley-Tukey FFT algorithm, which uses a divide-and-conquer approach.

Purpose:
1. The algorithm splits the input array into two halves:
   - `even`: stores the elements at even indices ($x[0], x[2], x[4], ...$)
   - `odd`: stores the elements at odd indices ($x[1], x[3], x[5], ...$)
2. Memory allocation:
   - `N / 2`: after splitting the array in half, each part needs only $N/2$ slots.
   - `sizeof(complex double)`: each slot holds a complex number (both real and imaginary parts).
   - `malloc`: dynamically allocates the buffers on the heap.
### 3. Why must we separate the calculation into even and odd parts?

The separation into even and odd parts (also known as decimation-in-time) provides several key advantages:
1. Computational Efficiency:
By splitting an N-point DFT into two N/2-point DFTs, we reduce the number of operations from $O(N^2)$ to $O(N \log N)$.
Each half can be computed independently and in parallel.
The divided problems follow the same pattern, enabling recursive implementation.
2. Hardware Implementation Benefits:
The even-odd decomposition creates a butterfly structure (as shown in the diagram).
This structure is highly suitable for parallel processing.
It enables efficient memory access patterns since we can process elements with stride-2.
The regular pattern makes it easier to pipeline in hardware implementations.
3. Memory Usage:
The algorithm can be implemented in-place, with minimal additional memory requirements.
The butterfly structure allows for efficient cache utilization since adjacent memory elements are processed together.
This divide-and-conquer approach is the fundamental insight that makes FFT so efficient compared to the direct DFT computation.
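The complexity claim can be checked with the standard divide-and-conquer recurrence: each level splits the problem into two half-size transforms plus $O(N)$ butterfly work, and there are $\log_2 N$ levels:

$$T(N) = 2\,T\!\left(\frac{N}{2}\right) + O(N) \;\Rightarrow\; T(N) = O(N \log N)$$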
```c
for (int i = 0; i < N / 2; i++) {
    even[i] = x[2 * i];    // Process even indices
    odd[i] = x[2 * i + 1]; // Process odd indices
}
```
Please check reference #3 for the full code of `rfft.h`.
The separation into even and odd parts (decimation-in-time) is clearly demonstrated in the radix-2 FFT implementation, and it provides several key insights:
- Bit-Reversal Permutation (`fft_transform_radix2`):
```c
// Permute vec by reversing the bits of addresses
for (size_t i = 0; i < n; i++) {
    // Reverse the bits of i
    size_t j = 0, ii = i;
    for (int k = 0; k < levels; k++) {
        j = (j << 1) | (ii & 1);
        ii >>= 1;
    }
    // ... the full code then swaps vec[i] and vec[j] when j > i
}
```
This initial rearrangement groups even and odd indices together, setting up the butterfly operations.
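The inner loop above can be isolated as a small helper (a restatement for illustration, not a function from `rfft.h`):

```c
#include <stddef.h>

/* Reverse the lowest `levels` bits of i, e.g. for levels = 3: 110b -> 011b. */
size_t reverse_bits(size_t i, int levels)
{
    size_t j = 0;
    for (int k = 0; k < levels; k++) {
        j = (j << 1) | (i & 1); /* shift the next low bit of i into j */
        i >>= 1;
    }
    return j;
}
```

For `n = 8` (`levels = 3`) the permuted order is 0, 4, 2, 6, 1, 5, 3, 7: even-indexed elements come before odd-indexed ones at every level of the recursion, which is exactly the decimation-in-time grouping.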
- In-Place Butterfly Computations (`fft_transform_radix2`):
```c
for (size_t half = 1; half < n; half *= 2) {
    size_t size = 2 * half;
    size_t step = n / size;
    for (size_t j = 0, k = 0; j < half; j++, k += step) {
        double angle = (inverse ? 2 : -2) * M_PI * k / n;
        double complex omega = cos(angle) + I * sin(angle);
        for (size_t i = 0; i < n; i += size) {
            double complex tmp = vec[i + j + half] * omega;
            vec[i + j + half] = vec[i + j] - tmp;
            vec[i + j] += tmp;
        }
    }
}
```
The butterfly structure:
1. Operates on pairs of elements (even and odd)
2. Performs additions and subtractions in-place
3. Uses only O(1) extra storage
4. Allows parallel processing of independent pairs
- Power-of-Two Optimization (`fft_transform`):
```c
if ((n & (n - 1)) == 0) // Power of 2
    fft_transform_radix2(vec, n, inverse);
else
    fft_transform_bluestein(vec, n, inverse);
```
The code explicitly checks for power-of-2 sizes because:
1. Even-odd decomposition applies cleanly at every recursion level for power-of-2 lengths
2. It leads to uniform butterfly patterns
3. It enables efficient in-place computation
4. It results in simpler memory access patterns
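The check itself relies on a bit trick: `n & (n - 1)` clears the lowest set bit of `n`, so the result is zero exactly when `n` has a single set bit. A small self-contained version (note that `0` must be excluded separately, which the trick alone does not do):

```c
#include <stdbool.h>
#include <stddef.h>

/* True when n is a power of two, i.e. exactly one bit is set. */
bool is_pow2(size_t n)
{
    /* n - 1 flips the lowest set bit and all bits below it,
     * so n & (n - 1) == 0 iff n had only that one bit set. */
    return n != 0 && (n & (n - 1)) == 0;
}
```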
### 4. What is the central part of convolution in the code?
```c
fft(av, roots, n); // Transform first signal to the frequency domain
fft(bv, roots, n); // Transform second signal to the frequency domain

/* Multiplication in the frequency domain */
complex double *cv = malloc(n * sizeof(complex double));
for (int i = 0; i < n; i++)
    cv[i] = av[i] * bv[i]; // Point-wise multiplication
/* Transform back to the time domain */
fft(cv, roots, n);
```
### 5. How do you get more precise results from the following program?
```c
int mysqrt(int i)
{
    if (i == 0)
        return 0;
    int x = 1 << (32 - __builtin_clz(i)) / 2;
    x = (x + i / x) / 2; // Repeat this line to get a more precise number
    return x;
}
```
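The repeated update is just Newton's method for $x^2 - i = 0$, so it can be parameterized by an iteration count. A sketch (assumes GCC/Clang for `__builtin_clz`; the function name is mine):

```c
/* Integer square root via Newton's method; more iterations, more precision.
 * Assumes a GCC/Clang compiler for __builtin_clz. */
int mysqrt_iter(int i, int iterations)
{
    if (i == 0)
        return 0;
    /* Initial guess: 2^(bits/2), where bits is the bit length of i */
    int x = 1 << (32 - __builtin_clz(i)) / 2;
    for (int k = 0; k < iterations; k++)
        x = (x + i / x) / 2; /* Newton step for x^2 - i = 0 */
    return x;
}
```

With one iteration this matches `mysqrt`; each extra pass improves the estimate, though the integer division can make the value oscillate between `floor(sqrt(i))` and `floor(sqrt(i)) + 1` rather than settle, which is why the MATLAB experiment below compares several iteration counts.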
Here is the MATLAB code
`sqrt_approxi.m`
```matlab
% sqrt_approxi: integer Newton's-method square root, mirroring mysqrt in C
function y = sqrt_approxi(i, iterations)
    if i == 0
        y = 0;
        return;
    end
    % Initial approximation: 2^(floor(bits/2)), bits = bit length of i
    bits = floor(log2(i)) + 1;
    x = 2^floor(bits/2);
    % Apply Newton's method for the specified number of iterations
    for iter = 1:iterations
        x = floor((x + floor(i/x))/2);
    end
    y = x;
end
```
`problem_d.m`
```matlab
% sqrt_approxi.m must be on the MATLAB path
main_sqrt

% Main function to run everything
function main_sqrt()
    % Create array of x values (0 to 300)
    x = 0:300;
    actual_sqrt = sqrt(x);

    % Calculate approximations with different iteration counts
    approx = cell(1, 4);
    for iters = 1:4
        approx{iters} = zeros(size(x));
        for i = 1:length(x)
            approx{iters}(i) = sqrt_approxi(x(i), iters);
        end
    end

    % One subplot per iteration count
    figure('Position', [100, 100, 1200, 800]);
    titles = {'1 Iteration', '2 Iterations', '3 Iterations', '4 Iterations'};
    for k = 1:4
        subplot(2, 2, k);
        plot(x, actual_sqrt, 'Color', [0.7 0.7 0.7], 'LineWidth', 2);
        hold on;
        stairs(x, approx{k}, 'k-', 'LineWidth', 1.5);
        grid on;
        title(titles{k});
        xlabel('X (Square Value)');
        ylabel('Y (Square Root)');
        legend('Actual', 'Approximated');
        axis([0 300 0 18]);
    end
    sgtitle('Square Root Approximation with Different Iterations');
end
```
Result

## Reference
1. [Fast Fourier transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform)
2. [Quiz7 - Problem D](https://hackmd.io/@sysprog/arch2024-quiz7-sol)
3. [rfft.h](https://github.com/grego/rfft.h/blob/master/rfft.h)