The Role of NTT in PQC, FHE, and ZKP

# The Role of NTT in PQC, FHE, and ZKP #### Author: [Aikata (Ph.D. Candidate, TU Graz)](https://www.isec.tugraz.at/person/aikata-aikata/) <img src="https://hackmd.io/_uploads/rybcTBjxex.png" width="30" height="30"> ___ <div style="text-align:center"> <img src="https://hackmd.io/_uploads/HyRgfHixxg.jpg" width="700" height="400"> </div> This article explores the implementation techniques behind Number Theoretic Transform (NTT) and how they play critical roles in a wide range of cryptographic schemes-- from Post-Quantum Cryptography (PQC) to Fully Homomorphic Encryption (FHE) and Zero-Knowledge Proofs (ZKPs). ## <img src="https://hackmd.io/_uploads/Bk3m6Hsllg.png" width="50" height="50">What is NTT ? > Amber's excellent guide on [NTT for PQC](https://electricdusk.com/ntt.html) is a must-read for anyone diving into NTT. The **Number Theoretic Transform (NTT)** is a transformation applied on rings that allows for more efficient multiplication of elements within those rings. We work with polynomials whose coefficients are integers modulo $q$, and the polynomials are reduced modulo $X^N+1$. If you're familiar with finite fields, this is similar, but without division. Addition of two polynomials is performed coefficient-wise. Multiplication is more involved. The plain SchoolBook multiplication technique performs up to $N$ multiplications for each coefficient in the result, and we repeat this for all $N$ coefficients. This results in a naive complexity of **O(**$N^2$**)**, which becomes inefficient for large $N$. To make polyonimal multiplication faster, especially in polynomial multiplication, we apply the **NTT**. With the help of the NTT, we can reduce the multiplication complexity from **O(**$N^2$**)** to **O(**$N \log{N}$**)**. ### NTT and [Reed–Solomon Codes](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction) <img src="https://hackmd.io/_uploads/rkLkRSixll.png" width="60" height="50"> The **NTT** flow is also conceptually related to the **Discrete Fourier Transform (DFT)** over finite fields, which underlies **Reed–Solomon error-correcting codes**. In both cases: --> All the elements are mapped from coefficient space to evaluation space using a transform (NTT or DFT). In Reed–Solomon codes, this structure is used for **efficient encoding and decoding**, as well as for **error detection and correction** by working with evaluations at known points (usually roots of unity in a finite field). The NTT enables error correction by ensuring that any error in transmission manifests as discrepancies at specific points. For example, the additive NTT used for Binary Reed-Solomon codes in [Binius](https://www.binius.xyz/blueprint/cryptography/commitment/additive/). <br> > <img src="https://hackmd.io/_uploads/HknZl8sggg.png" width="30" height="30"> While the goals differ-- **error correction** in coding theory vs. **fast multiplication** in lattice-based cryptography, the evaluation at roots of unity is a shared core idea. --- ## Why One Size Doesn’t Fit All? <img src="https://hackmd.io/_uploads/H1Ox-Uoxel.png" width="35" height="30"><img src="https://hackmd.io/_uploads/rJXVW8sege.png" width="35" height="30"><img src="https://hackmd.io/_uploads/r1zu-Islxg.png" width="35" height="30"> There is no "best" implementation of NTT. Different applications impose different constraints based on their use-cases and target platforms. The need for an efficient NTT is **constant**, but the **parameters** for judging that efficiency, such as Polynomial Degree $N$ and modulus size ($q$), change drastically between PQC, FHE, and ZKP. ![Parameter Comparison Between PQC, FHE, ZKP](https://hackmd.io/_uploads/SytQNrslgg.jpg) - **PQC** focuses on **low-area and low-latency** implementations, often for microcontrollers and lightweight IoT platforms. - **FHE** and **ZKP**, on the other hand, emphasize **high throughput and performance**, typically targeting datacenter-scale compute, or High performnace ASIC chip designs. Consequently, each direction requires its own dedicated design exploration, as demonstrated in prior research. --- ## Our Optimized NTT Designs <img src="https://hackmd.io/_uploads/Hkv6N8iege.png" width="40" height="30"> We have proposed specialized NTT architectures for **both PQC and FHE**: - The **PQC accelerator**- **[KaLi](https://eprint.iacr.org/2022/1086)** continues to be **state-of-the-art** for supporting fast and area-efficient transforms on constrained devices for Kyber and Dilithium. The NTT unit is one of the giants building blocks in both the schemes. Our NTT design features a fully utilized datapath for both Kyber (12-bit) and Dilithium moduli (23-bit). - The **FHE-optimized NTT design** in **[REED](https://eprint.iacr.org/2021/1234)** targets high-performance lattice-based homomorphic encryption using optimized polynomial arithmetic-- a design starkly different from embedded PQC implementations. Other than FHE, it finds **cross-cutting applications** in: - ECC-based ZKPs (Elliptic Curve-based Zero-Knowledge Proofs) - Fourier Transform in **error-correcting codes** used in hash-based ZKPs > <img src="https://hackmd.io/_uploads/HknZl8sggg.png" width="30" height="30"> This dual-pronged design strategy shows how tailored NTTs are essential in pushing the limits of cryptographic protocols. For the following disussion, we will focus on the NTT designs for large parameters suitable for both FHE and ZKP. --- ## REED's Hybrid-NTT <img src="https://hackmd.io/_uploads/rJLNtIsegl.png" width="40" height="30"> [REED](https://eprint.iacr.org/2021/1234) is a chiplet-based hardware accelerator built for the [CKKS](https://eprint.iacr.org/2016/421.pdf) FHE scheme. Among its key innovations is the Hybrid NTT-- the first of its kind, which redefines the design of Number Theoretic Transform units. This novel architecture merges the best aspects of pipelined, iterative, and hierarchical NTT implementations to eliminate expensive transpose operations and optimize routing. Inspired by a "Frankenstein-style" fusion, it combines pipelined single-delay feedback (SDF) NTTs with fully unrolled NTTs to deliver both efficiency and scalability, as shown below. ![Hybrid_NTT](https://hackmd.io/_uploads/ByX1e9k-lg.png)*Figure: Novel routing-friendly Hybrid NTT/INTT design flow in REED for $N = N_1 \times N_2$.* It achieves the following: 1. Transpose-free computation, saving area by upto $14\%$ and reducing complexity. 2. Bi-directional flow, enabling efficient forward and inverse NTT (INTT). 3. Scalability with configurable parameters $N_1$, $N_2$ (where, $N_1\times N_2=N$) for performance-area tradeoffs. 4. On-the-fly twiddle factor generation, reducing constant memory usage by over $98\%$. 5. Seamless pipelining, reading and writing $N_2$ coefficients per cycle. ![Hybrid_NTT](https://hackmd.io/_uploads/SkXkUHjxll.png)*Figure: The proposed Hybrid NTT/INTT design flow with Memory access for (a) NTT and (b) INTT with $N_1=N_2$=4 (N=16). The Gentleman-Sande butterfly operation are employed.* These features not only accelerate FHE workloads (like key switching and bootstrapping) but are also highly advantageous for ZKP systems, which often operate on even large polynomial degrees, for NTT-based polynomial multiplication or Reed-Solomon codes. > <img src="https://hackmd.io/_uploads/HknZl8sggg.png" width="30" height="30"> In ZKPs, large-degree polynomial operations often require costly matrix transpositions. The proposed transpose-free Hybrid NTT eliminates this bottleneck, offering a superior area-time tradeoff and scalability, with configurable design adaptable to different bandwidth needs. While NTT-based implementations are known to be efficient for polynomial multiplication, their performance in error-correcting code scenarios was not well established. Specifically, it was unclear how Reed-Solomon codes perform in comparison to other error-correcting codes, such as linear expander graphs used in [BrakeDown](https://eprint.iacr.org/2021/1043) and [Orion](eprint.iacr.org/2022/1010) Polynomial Commitment Schemes. However, a recent work [[CCC+25]](https://eprint.iacr.org/2025/286) conducted this comparison and demonstrated that Reed-Solomon codes outperform linear expanders by up to 115x. Thus, solidifying NTT’s potential in hash-based ZKPs. ![ZKP_NTT](https://hackmd.io/_uploads/rJ6_PHiegx.png) *Figure: Results from [[CCC+25]](https://eprint.iacr.org/2025/286) showing performance for Brakedown commitments using Linear Expander versus RS codes. (ST) denotes single-threaded execution.* > <img src="https://hackmd.io/_uploads/HknZl8sggg.png" width="30" height="30"> It remains interesting to explore how REED's NTT design can futher contribute to achieving scalable and efficient acceleration for ZKP systems. --- ## Closing Thoughts Today’s acceleration efforts often address isolated goals: - **PQC** **secures communication**. - **FHE** enables **privacy-preserving** computation, and **ZKP** ensures its **verifiability**. But these are pieces of the same puzzle. <img src="https://hackmd.io/_uploads/SJ-9UUille.png" width="50" height="30"> Tomorrow’s secure systems will interconnect them: **FHE computation verified by ZKPs**, protected by **PQC-authenticated communication**. Architectures will evolve to handle all of this **seamlessly**, pointing toward a unified and resilient future. > <img src="https://hackmd.io/_uploads/HysMrIjexl.png" width="40" height="30"> Will these dual tracks converge, or will they evolve in silos? What’s clear is that the NTT is no longer just a mathematical construct, it has become a foundational enabler of the emerging cryptographic and privacy-preserving revolution, powering technologies like Web3. <img src="https://hackmd.io/_uploads/ryckn_n3a.png" alt="drawing" width="50" /> With better understanding comes better design. Let's keep pushing the boundaries! ___ > A big thanks to [Ahmet Can Mert](https://acmert.github.io/) for his fantastic NTT tutorials. ___ <p style="text-align:center;font-family:georgia,garamond;"> Made with ♥️ </p>