# Efficient Prime Fields for Zero-knowledge proof
## TL;DR
This article explores efficient prime fields tailored to enhance the performance of code-based zero-knowledge proofs. These fields, implemented in Plonky3, also potentially benefit pairing-based protocols. We cover four prime fields:
1. Goldilocks Field ($p=2^{64} - 2^{32} + 1$) utilized in plonky2
2. Mersenne31 Field ($p=2^{31}-1$) used in plonky3
4. BabyBear Field ($p=2^{31} - 2^{27} + 1$) used in zkVM by Risc0
5. KoalaBear Field ($p=2^{31} - 2^{24} + 1$)
We aim to address the following questions:
### Q1. What are those prime fields and why are they chosen?
Prime fields consist of a finite number of elements with defined operations for addition, subtraction, multiplication, and division, adhering to field properties. For example, consider the set {0,1,2,3,4,5,6} under arithmetic modulo 7, where $4 + 5 \equiv 2 \pmod{7}$ and $3 \times 4 \equiv 5 \pmod{7}$. This set forms a finite field with seven elements. Prime fields are chosen for their computational efficiency across various architectures, and their structure makes them particularly suited for Fast Reed-solomon Interactive oracle proofs (FRI).
The two rules:
1. the modular operations is efficient in several architecture instruction. and small prime field is obviously more efficient and cheaper than large prime field.
2. Those fields have as large a power of 2 in the factors of $p-1$ as possible, so they are "FRI-friendly".
### Q2. How do these fields differ?
The primary difference lies in their computational efficiency and structural characteristics across different computing architectures. Additionally, the order of the prime field $\mathbb{F}_p$ is $p$, which varies among fields.
### Q3. How are these prime fields utilized in zero-knowledge proof protocols?
In Zero-knowledge proof protocols, the data in computation trace are represented as those prime field In zero-knowledge proof protocols, computational trace data are represented as elements of these prime fields. Fields such as Goldilocks, BabyBear, and Mersenne31 are designed to enhance proof generation efficiency in systems employing FRI-based commitment schemes.
### Q4. Why are these prime fields beneficial for zero-knowledge proofs?
Employing smaller prime fields in zero-knowledge proofs optimizes prover and verifier times and reduces proof sizes. This optimization leverages various techniques including diverse interactive oracle proofs (IOPs), polynomial commitment schemes, and efficient hardware use, making calculations more feasible on standard consumer hardware.
## Primer on Prime Numbers
Prime numbers are natural numbers greater than 1, divisible only by 1 and themselves. The smallest prime is 2, which is also the only even prime.
There are several types of primes:
1. Bell primes
2. Euclid primes
3. Fermat primes
4. Fibonacci primes
5. Mersenne primes
6. Solinas primes
7. ...
More detailed information about the type of primes, see [Wikipedia](https://en.wikipedia.org/wiki/List_of_prime_numbers#Lists_of_primes_by_type).
In this article, we introduce Mersenne primes, Crandall primes and Solinas primes deeply.
### Mersenne Primes
Mersenne primes are the prime numbers which is of the form $2^k-1$. The modulus of the Mersenne31 field is a Mersenne prime ($p = 2^{31}-1$) where $k=31$.
### Crandall Primes
Crandall primes are the prime numbers of the form $2^k-c$. To limit carry propagation during modular multiplication, $c$ should be small. It limits the fastest Crandall choices to a few options. such as $2^{379}-19$, $2^{389}-17$, and $2^{414}-17$
### Solinas Primes
In mathematics, a Solinas prime, or generalized Mersenne prime, is a prime number that has the form
$f(2^{m})$, where $f(x)$ is a low-degree polynomial with small integer coefficients. These primes allow **fast modular reduction algorithms** and are widely used in cryptography. They are named after Jerome Solinas. Solina primes encompasses those two categories of prime numbers mentioned before: Mersenne primes and Crandall primes.
We take Mersenne primes $p =2^{31}-1$ as an example to explain why this kind of prime allows fast modular reduction algorithms. Given a number $n \leq p^2$ , bit-length of $n$ is less than 64. We represent $n$ in base 2^{31}, than we have $$n \mod p = a_0\cdot 2^{62}+a_1\cdot 2^{31} + a_2 \mod p = a_0 + a_1 + a_2 \mod p$$. For any 64-bit integer $n$, the result of the modular reduction is equivalent to the modular addition of three segments of its binary representation.
Four of the recommended primes by NIST are Solinas primes:
1. p-192: $2^{192}-2^{64}-1$.
2. p-224: $2^{224}-2^{96}+1$.
3. p-256: $2^{256}-2^{224}+2^{192}+2^{96}-1$.
4. p-384: $2^{384}-2^{128}-2^{96}+2^{32}-1$.
### Goldilocks Primes
Goldilock primes are the Solinas tribomial primes. In [[Hamburg15]](https://eprint.iacr.org/2015/625.pdf), Mike Hamburg introduced a class of primes of the form $p = \phi^{2}-\phi-1$ and named them Goldilocks primes, where $phi$ satisfies the Golden ratio relation $\phi^2 = \phi+1 \mod p$. When $\phi$ is a power of two this allows for very efficient implementation as the benifits of Solinas primes. Unfortunately, the $−1$ at the end means it will have very few power-of-two roots of unity, which is not friendly for doing large-scale number theoretic transforms (NTT) used in many ZKP systems..
#### Goldilocks Primes in plonky2
To solve the drawbacks of Goldilock primes poposed by Hamburg, the researchers in plonky use the prime which is similar to the form of Goldilock primes, $p = \phi^2-\phi+1 = 2^{64}-2^{32}+1$ and keep the name Goldilocks prime.
The first advantage is : the $p = 2^{64}-2^{32}+1$ has a $\phi = 2^{32}$-th root of unity. Because $p-1 = 2^{32}\cdot 3 \cdot 5 \cdot 17 \cdot 257 \cdot 65537$, it is friendly for doing number theoretic transforms for the polynomials with degree up to $2^{32}$.
The second advantage is from the prime of the form $f(2^m)=2^{2m}-2^{m}+1$. To multiplication two number $a,b\in \mathbb{F}_{2^{64}-2^{32}+1}$, we can represent them in base $2^{32}$ and make use of Karatsuba multiplication:
$$
a\cdot b = (a_0+a_1\cdot \phi)\cdot (b_0+b_1\cdot \phi)=a_0\cdot b_0-a_1\cdot b_1+((a_0+a_1)\cdot (b_0+b_1)-a_0\cdot b_0)\cdot \phi$$
## Field Extension
There are several implementation methods for field extension in plonky3:
1. \~128 bit extension field(Mersenne31\BabyBear\KoalaBear\Goldilocks)
2. "complex" extension field (Mersenne31).
3. AVX2 (Mersenne31\BabyBear\KoalaBear): Optimization relied on x86_64_AVX2 CPU instructions
4. AVX-512 (Mersenne31\BabyBear\KoalaBear): Optimization relied on x86_64_AVX-512 CPU instructions
5. NEON (Mersenne31\BabyBear\KoalaBear: Optimization relied on aarch64_neon instructions
### ~128 bit Extension Field
For 64-bits Goldilocks field, the ~128 bit extension field is the quadratic extension of base field $p = 2^{64}-2^{32}+1$. It is easy to find a irreducible 2-degree polynomial: $x^2-w$, where $w$ is 7. Based on this irreducible polynomial, we can construct the extend field $\mathbb{F}_{p^2}$.
We can use vector to represent a element in the extend field $[a_0,a_1], \text{where}\ a_0, a_1 \in \mathbb{F}_{p}$. We can also use a polynomial $a_0+a_1x$ to represent this element.
Now the two basic operations ( field addition operation and field multiplication operation) can be define as the polynomial arithmetic.
- Field addition: Given $a,b\in\mathbb{F}_{p^2}$, $a+b = [a_0+b_0, a_1+b_1]$. Notably, the addition of a_i or b_i is defined in base field $\mathbb{F}_{p}$.
- Field multiplication: $a\cdot b = [a_0\cdot b_0+ a_1\cdot b_1 \cdot w, a_0\cdot b_1+ a_1 \cdot b_0]$.
Through the field extension, the field arithmetic is defined in extend field can be computed efficiently based on the small based field.
For ~32-bits field (Mersenne31\BabyBear\KoalaBear), the ~128 bit extension field can be constructed by using two quadratic field extensions.
### Complex Extension
## More references
For those interested in a deeper dive into small field applications in zero-knowledge proofs, I recommend the article [Small Fields for ZKP](https://blog.icme.io/small-fields-for-zero-knowledge/)