# FPGA requirements
## Setup
Define prime field with a modulus `p` such that `p-1` is divisible by `2^k` with a large `k`. Such field allows to perform polynomial multipoint evaluation and interpolation by use of FFT (called NTT in this case) with an operation count `O(NlogN)`. Bit size of `p` is around 250-256 bits.
Representation of the numbers in memory is up to implementor, as well as storage patters in memory.
Montgommery reduction of the general form is most likely the best place to start. It may be benefitial to look at special reduction for Proth primes, but it's unlikely to be used for Marsene primes. Barett reduction is also an option.
Similar operations were implemented in Rust and can be provided as an example.
## Operation performed
Low degree extension
- Program running on CPU should write data to DRAM of the FPGA device that is a set of coefficient for polynomial defined over prime field of characteristic `p`. Assume polynomial degree of ~`2^32`
- FPGA device should perform a coset FFT operation that takes the following two steps:
- Multiply each of the polynomial coefficients by the corresponding power of some scalar
- Perform an FFT using any of the algorithms (use of precomputations is allowed, output order of the result can be both natural or bit-reversed)
- Output the result to DRAM most likely as a copy
- For the same set of coefficients being loaded once to DRAM it will be required to perform such coset FFT more than once (8-32 times)
Pure FFT (IFFT)
- Program running on CPU should write data to DRAM of the FPGA device that is a set of coefficient for polynomial defined over prime field of characteristic `p`. Assume polynomial degree of ~`2^32`
- FPGA device should perform a coset FFT operation that takes the following two steps:
- Multiply each of the polynomial coefficients by the corresponding power of some scalar
- Perform an FFT using any of the algorithms (use of precomputations is allowed, output order of the result can be both natural or bit-reversed)
- Optionally (IFFT) multiply each of the resulting values by some scalar
- Output the result to DRAM either as copy or in-place

In recent progress reports there were a lot of improvements on the execution speed of EVM384 based on the EVMOne implementation of EVM. It was claimed that implementation of the miller loop + final exponentiation for the pairing of two pairs takes around 4.8 ms, that is a great result, but unfortunately it can only be considered as a wonderful academic achievement that unfortunately have nothing to do with the final cost of this operation to end-users. There is a fundamental difference between the precompile and the EVM384 based contract from perspective of heavy cryptography: in precompile the final cost covers everything: parsing input validation control flow (fast and native) arithmetics (fast and native)

11/5/2020Simple binary Merkle tree contains 2^N items. Each item is an "account" that follows this declaration and follow this procedure for serialization into the vector of field elements. After serialization item is hashed with Rescue hash in a form of "committer" created like here There are few fields that are must have certain default values: pub balances_tree_root: E::Fr - must be a merkle root of the simple binary merkle tree with 2^K items each of them being zero (as field element) pub sync_pubkey_hash: E::Fr - is zero by default pub self_state: E::Fr - is zero by default pub contract_code_root: E::Fr - is zero by default

9/7/2020
Published on ** HackMD**