# FPGA requirements ## Setup Define prime field with a modulus `p` such that `p-1` is divisible by `2^k` with a large `k`. Such field allows to perform polynomial multipoint evaluation and interpolation by use of FFT (called NTT in this case) with an operation count `O(NlogN)`. Bit size of `p` is around 250-256 bits. Representation of the numbers in memory is up to implementor, as well as storage patters in memory. Montgommery reduction of the general form is most likely the best place to start. It may be benefitial to look at special reduction for Proth primes, but it's unlikely to be used for Marsene primes. Barett reduction is also an option. Similar operations were implemented in Rust and can be provided as an example. ## Operation performed Low degree extension - Program running on CPU should write data to DRAM of the FPGA device that is a set of coefficient for polynomial defined over prime field of characteristic `p`. Assume polynomial degree of ~`2^32` - FPGA device should perform a coset FFT operation that takes the following two steps: - Multiply each of the polynomial coefficients by the corresponding power of some scalar - Perform an FFT using any of the algorithms (use of precomputations is allowed, output order of the result can be both natural or bit-reversed) - Output the result to DRAM most likely as a copy - For the same set of coefficients being loaded once to DRAM it will be required to perform such coset FFT more than once (8-32 times) Pure FFT (IFFT) - Program running on CPU should write data to DRAM of the FPGA device that is a set of coefficient for polynomial defined over prime field of characteristic `p`. Assume polynomial degree of ~`2^32` - FPGA device should perform a coset FFT operation that takes the following two steps: - Multiply each of the polynomial coefficients by the corresponding power of some scalar - Perform an FFT using any of the algorithms (use of precomputations is allowed, output order of the result can be both natural or bit-reversed) - Optionally (IFFT) multiply each of the resulting values by some scalar - Output the result to DRAM either as copy or in-place