# NTT: For NTRUP761 (P = 761, Q = 4591)
## Brief introduction for the NTRUP761 and its constraints
* One thing we have to first check when it comes to use NTT to accelerate the calculation of the following function:
$$(f/(x^{761}-1)) mod(4591)$$
is that $N|(Q-1)$. In this case, $N=761$ and $Q=4591$. The reason of that is because we need to make sure the existence of the root of unity, which is the key for why we could do the element-wise multiplication when the vectors are all transform into the NTT domain.
* However, as you can see that $761|(q-1)$ **doesn't** hold. Therefore, we need to find another bigger ${N}^{'}$ such that ${N}^{'}|(4591-1)$. In this case, we could find that $1620=(5)(3^4)(2^2)$ meets the condition and approximate the value of $761*2$, therefore, the next question is how to do the mixed-radix NTT(rather than pure radix-2 NTT as we often see in the website).
## Mixed-Radix NTT
* Reference: [Radix-3 NTT-Based Polynomial Multiplication for Lattice-Based Cryptography](https://open.metu.edu.tr/handle/11511/97928)
* From the paper, we could get the fomula of radix-3 ntt as follows:

The $\mu$ here is $\frac{2n}{3}\omega$, and we could derive it via the equation $x^n - 1 =(x^{n/3} - 1)(x^{2n/3} + x^{n/3} + 1)$. Notice that since $(x^{n/3} - 1) modq\neq 0$, we have $(x^{2n/3} + x^{n/3} + 1)=0$. That is, $\mu^{2} +\mu+1=0$.
From the details described above, we could get the truth that $(x^{2n/3} + x^{n/3} + 1)=(x^{n/3}-\mu)(x^{n/3}-\mu^{2})$.
* From the example of radix-3, we could simply derive the desired equation of radix-5 as follows:
$$x^{n}-1=(x^{n/5}-1)(x^{n/5}-\mu)(x^{n/5}-\mu^{2})(x^{n/5}-\mu^{3})(x^{n/5}-\mu^{4})$$
where $\mu=(2n/5)\omega$
* Now, we could try to derive the the mixed-radix formula for $1620=(5)(3^4)(2^2)$. We first do one layer of radix-5 and then do four layers of radix-3, and finally do the two layers of radix-2, which seems intuitive. But when it comes to implementation, you need to first be familiar with the implementation of incomplete NTT.
* Incomplete NTT:
Let me use a example in the paper to see how to realize this idea:

The 3-layer for loop is the classic NTT implementation, which would derive a algorithm with time complexity $O(nlogn)$. The reason and details how this be derived and the correctness could refer to the website 」[快速数论变换(NTT)及蝴蝶操作构造详解](https://zhuanlan.zhihu.com/p/80297169) which help me a lot to get a deeper understanding of how this algorithm be constructed.
Back the the implementation of incomplete NTT, as we can see that the first loop (line 3) represents the index of layers we currently in to conduct the operations (butterfly operation). And the second loop represents the index of group we are in. The third loop shows the index of elements (member of the current group) we focus on.
Therefore, for the concept of incomplete NTT, simply means we don't execute the whole layers and stop in the certain layers. Now the problem is how to deal with the original bit reverse. From the graph we could see that since we stop in the intermediate layer, the original derivation of the bit reverse equation doens't hold.
## Reference:
* [快速数论变换(NTT)及蝴蝶操作构造详解](https://zhuanlan.zhihu.com/p/80297169)
* [Radix-3 NTT-Based Polynomial Multiplication for Lattice-Based Cryptography](https://open.metu.edu.tr/handle/11511/97928)
* [Split-Radix FFT Algorithms](https://www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/transform/fft.html)
* [A Complete Beginner Guide to the Number Theoretic Transform (NTT)](https://eprint.iacr.org/2024/585.pdf)
* [Conceptual Review on Number Theoretic Transform and Comprehensive Review on Its Implementations](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10177902)
* [Ring-SIS and Ideal Lattices](https://people.csail.mit.edu/vinodv/6876-Fall2018/RingSISclass.pdf)
* [FASTER ARITHMETIC FOR NUMBER-THEORETIC
TRANSFORMS](https://arxiv.org/pdf/1205.2926)
* [Negacyclic Polynomial Multiplication](https://www.jeremykun.com/2022/12/09/negacyclic-polynomial-multiplication/)