# Annotate and explain Quiz7 Problem D (2024 Fall)

## Problem `D`

### 1. What is the relationship between FFT and convolution?

Answer:

Mathematical relationship: convolution in the time domain, $(f*g)(t)$, becomes multiplication in the frequency domain:

$$F\{f*g\} = F\{f\} \times F\{g\}$$

where $F$ represents the Fourier transform.

Practical implication: direct convolution costs $O(n^2)$, while the FFT costs $O(n \log n)$, so the FFT can replace the direct convolution procedure:

1. Take the FFT of both signals ($f$ and $g$): $O(n \log n)$
2. Multiply them point-wise in the frequency domain: $O(n)$
3. Take the inverse FFT of the product to get back to the time domain: $O(n \log n)$

### 2. What is the meaning of the following code?

```c
complex double *even = malloc(N / 2 * sizeof(complex double));
complex double *odd = malloc(N / 2 * sizeof(complex double));
```

These lines are part of the Cooley-Tukey FFT algorithm, which uses a divide-and-conquer approach.

![image](https://hackmd.io/_uploads/r1GiOzMwkg.png)

Purpose:

1. The algorithm splits the input array into two halves:
   1. `even`: stores elements at even indices ($x[0], x[2], x[4], \dots$)
   2. `odd`: stores elements at odd indices ($x[1], x[3], x[5], \dots$)
2. Memory allocation:
   1. `N / 2`: since the array is split in half, each part needs only $N/2$ elements.
   2. `sizeof(complex double)`: allocates space for complex numbers (both real and imaginary parts).
   3. `malloc`: dynamically allocates the memory on the heap.

### 3. Why must we separate the calculation into even and odd parts?

![image](https://hackmd.io/_uploads/r1GiOzMwkg.png)

The separation into even and odd parts (also known as decimation-in-time) provides several key advantages:

1. Computational efficiency: splitting an $N$-point DFT into two $N/2$-point DFTs reduces the number of operations from $O(N^2)$ to $O(N \log N)$.
Each half can be computed independently and in parallel, and the subproblems follow the same pattern, enabling a recursive implementation.

2. Hardware implementation benefits: the even-odd decomposition creates a butterfly structure (as shown in the diagram). This structure is highly suitable for parallel processing, enables efficient memory access patterns since elements can be processed with stride 2, and its regular pattern makes it easier to pipeline in hardware implementations.
3. Memory usage: the algorithm can be implemented in place, with minimal additional memory requirements. The butterfly structure also allows efficient cache utilization, since adjacent memory elements are processed together.

This divide-and-conquer approach is the fundamental insight that makes the FFT so efficient compared with direct DFT computation.

```c
for (int i = 0; i < N / 2; i++) {
    even[i] = x[2 * i];     // Copy even-indexed samples
    odd[i] = x[2 * i + 1];  // Copy odd-indexed samples
}
```

Please check reference 3 to see the full code of `rfft.h`. The separation into even and odd parts (decimation-in-time) is clearly demonstrated in its radix-2 FFT implementation, and it provides several key insights:

- Bit-reversal permutation (`fft_transform_radix2`):

```c
// Permute vec by reversing the bits of addresses
for (size_t i = 0; i < n; i++) {
    // Reverse the bits of i
    size_t j = 0, ii = i;
    for (int k = 0; k < levels; k++) {
        j = (j << 1) | (ii & 1);
        ii >>= 1;
    }
    // (In the full source, vec[i] and vec[j] are then swapped when j > i.)
}
```

This initial rearrangement groups even and odd indices together, setting up the butterfly operations.

- In-place butterfly computations (`fft_transform_radix2`):

```c
for (size_t half = 1; half < n; half *= 2) {
    size_t size = 2 * half;
    size_t step = n / size;
    for (size_t j = 0, k = 0; j < half; j++, k += step) {
        double angle = (inverse ? 2 : -2) * M_PI * k / n;
        double complex omega = cos(angle) + I * sin(angle);
        for (size_t i = 0; i < n; i += size) {
            double complex tmp = vec[i + j + half] * omega;
            vec[i + j + half] = vec[i + j] - tmp;
            vec[i + j] += tmp;
        }
    }
}
```

The butterfly structure:

1. Operates on pairs of elements (even and odd)
2. Performs additions and subtractions in place
3. Uses only $O(1)$ extra storage
4. Allows parallel processing of independent pairs

- Power-of-two optimization (`fft_transform`):

```c
if ((n & (n - 1)) == 0)  // Power of 2
    fft_transform_radix2(vec, n, inverse);
else
    fft_transform_bluestein(vec, n, inverse);
```

The code explicitly checks for power-of-two sizes because:

1. Even-odd decomposition works perfectly for power-of-two lengths
2. It leads to uniform butterfly patterns
3. It enables efficient in-place computation
4. It results in simpler memory access patterns

### 4. What is the central part of convolution in the code?

```c
fft(av, roots, n);  // Transform first signal to frequency domain
fft(bv, roots, n);  // Transform second signal to frequency domain

/* Multiplication in frequency domain */
complex double *cv = malloc(n * sizeof(complex double));
for (int i = 0; i < n; i++)
    cv[i] = av[i] * bv[i];  // Point-wise multiplication

/* Transform back to time domain */
fft(cv, roots, n);
```

Mathematically, the final step must be an inverse FFT (including the $1/n$ normalization) so that the point-wise product returns to the time domain as the convolution of the two signals.

### 5. How do you get more precise numbers in the following program?
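The update line in the routine below is one step of Newton's method applied to $f(x) = x^2 - i$, so repeating it refines the estimate:

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} = x_k - \frac{x_k^2 - i}{2x_k} = \frac{1}{2}\left(x_k + \frac{i}{x_k}\right)$$

Convergence is quadratic near the root, so each additional copy of the update line roughly doubles the number of correct digits, until integer truncation dominates.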
```c
int mysqrt(int i)
{
    if (i == 0)
        return 0;
    int x = 1 << (32 - __builtin_clz(i)) / 2;
    x = (x + i / x) / 2;  // Repeat this line to get a more precise result
    return x;
}
```

Here is the MATLAB code `sqrt_approxi.m`:

```matlab
% Integer square-root approximation with a configurable iteration count
function y = sqrt_approxi(i, iterations)
    if i == 0
        y = 0;
        return;
    end

    % Initial approximation
    bits = floor(log2(i)) + 1;
    x = 2^floor(bits/2);

    % Apply Newton's method for the specified number of iterations
    for iter = 1:iterations
        x = floor((x + floor(i/x))/2);
    end
    y = x;
end
```

`problem_d.m`:

```matlab
import sqrt_approxi.*
main_sqrt

% Main function to run everything
function main_sqrt()
    % Create array of x values (0 to 300)
    x = 0:300;
    actual_sqrt = sqrt(x);

    % Calculate approximations with different iteration counts
    approx_sqrt1 = zeros(size(x));
    approx_sqrt2 = zeros(size(x));
    approx_sqrt3 = zeros(size(x));
    approx_sqrt4 = zeros(size(x));
    for i = 1:length(x)
        approx_sqrt1(i) = sqrt_approxi(x(i), 1);
        approx_sqrt2(i) = sqrt_approxi(x(i), 2);
        approx_sqrt3(i) = sqrt_approxi(x(i), 3);
        approx_sqrt4(i) = sqrt_approxi(x(i), 4);
    end

    % Create subplots
    figure('Position', [100, 100, 1200, 800]);

    % Subplot 1: 1 iteration
    subplot(2,2,1);
    plot(x, actual_sqrt, 'Color', [0.7 0.7 0.7], 'LineWidth', 2);
    hold on;
    stairs(x, approx_sqrt1, 'k-', 'LineWidth', 1.5);
    grid on;
    title('1 Iteration');
    xlabel('X (Square Value)');
    ylabel('Y (Square Root)');
    legend('Actual', 'Approximated');
    axis([0 300 0 18]);

    % Subplot 2: 2 iterations
    subplot(2,2,2);
    plot(x, actual_sqrt, 'Color', [0.7 0.7 0.7], 'LineWidth', 2);
    hold on;
    stairs(x, approx_sqrt2, 'k-', 'LineWidth', 1.5);
    grid on;
    title('2 Iterations');
    xlabel('X (Square Value)');
    ylabel('Y (Square Root)');
    legend('Actual', 'Approximated');
    axis([0 300 0 18]);

    % Subplot 3: 3 iterations
    subplot(2,2,3);
    plot(x, actual_sqrt, 'Color', [0.7 0.7 0.7], 'LineWidth', 2);
    hold on;
    stairs(x, approx_sqrt3, 'k-', 'LineWidth', 1.5);
    grid on;
    title('3 Iterations');
    xlabel('X (Square Value)');
    ylabel('Y (Square Root)');
    legend('Actual', 'Approximated');
    axis([0 300 0 18]);

    % Subplot 4: 4 iterations
    subplot(2,2,4);
    plot(x, actual_sqrt, 'Color', [0.7 0.7 0.7], 'LineWidth', 2);
    hold on;
    stairs(x, approx_sqrt4, 'k-', 'LineWidth', 1.5);
    grid on;
    title('4 Iterations');
    xlabel('X (Square Value)');
    ylabel('Y (Square Root)');
    legend('Actual', 'Approximated');
    axis([0 300 0 18]);

    sgtitle('Square Root Approximation with Different Iterations');
end
```

Result:

![sqrt_approxi_plot](https://hackmd.io/_uploads/S1BN0Mzwyx.png)

## Reference

1. [Fast Fourier transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform)
2. [Quiz7 - Problem D](https://hackmd.io/@sysprog/arch2024-quiz7-sol)
3. [rfft.h](https://github.com/grego/rfft.h/blob/master/rfft.h)