# Faster Extension Field multiplications for Emulated Pairing Circuits
In this article we showcase a new way to greatly enhance the cost of the Miller loop and final exponentiation of pairings in circuit by extensively using hints and moving away from the traditional towered extension approach.
It uses a combination of the Fiat-Shamir Heurisitic, the Schwartz-Zippel lemma, and a polynomial commitment scheme.
The work has been done in CairoZero inside the Garaga library[^1] for the bn254 curve. Equivalent optimisations to BLS12-381 will be applied in the near future.
## 1. Coming back to *direct* extensions
Usually in almost all efficient pairings implementations, a tower of field extension is used to obtain fast formulas for $\mathbb F_{p^{12}}$ arithmetics.
For example, the following tower is used for the Bn254 curve in Gnark[^3] :
$$
\begin{aligned}
\mathbb F_{p^2} &= \mathbb F_p[u]/(u^2 + 1) \\
\mathbb F_{p^6} &= \mathbb F_{p^2}[v]/(v^3 - 9 - u) \\
\mathbb F_{p^{12}} &= \mathbb F_{p^6}[w]/(w^2 - v)
\end{aligned}
$$
We can construct back $\mathbb F_{p^{12}}$ and $\mathbb F_{p^{6}}$ as direct extensions of $\mathbb F_{p}$ and obtain bijections between the tower and direct representation.
#### Obtaining the irreducible polynomials
For $\mathbb F_{p^{12}}$ :
$$
\begin{aligned}
\left\{
\begin{aligned}
u^2 + 1 &= 0 \\
v^3 - 9 - u &= 0 \\
w^2 - v &= 0
\end{aligned}
\right.
& \quad\Leftrightarrow\quad
\left\{
\begin{aligned}
u^2 &= -1 \\
v^3 &= 9 + u \\
w^2 &= v
\end{aligned}
\right.
\end{aligned}
$$
therefore
$$
\begin{aligned}
w^2 &= v \\
\Leftrightarrow w^6 &= v^3\\
\Leftrightarrow w^6 &= 9+u\\
\Leftrightarrow w^6-9 &= u\\
\Leftrightarrow w^{12} - 18w^6 + 81 &= -1\\
\end{aligned}
\quad \Leftrightarrow\boxed{w^{12} - 18w^6 + 82 =0}
$$
Similarly, for $\mathbb F_{p^{6}}$ :
$$
\begin{aligned}
v^3 &= 9+u \\
\Leftrightarrow v^6 &= u^2 + 18u + 81 \\
\Leftrightarrow v^6 &= 18u + 80\\
\Leftrightarrow v^6 &= 18(v^3-9) + 80\\
\Leftrightarrow v^6 &= 18v^3-162 + 80\\
\end{aligned}
\quad \Leftrightarrow\boxed{v^{6} - 18v^3 + 82 =0}
$$
We can define $\boxed{P_{12}(x) = x^{12} - 18x^6 + 82}$ and $\boxed{P_{6}(x)=x^{6} - 18x^3 + 82}$ as the irreducible polynomials generating the *direct* extensions $\mathbb F_{p^{12}}= \mathbb F_p[w]/P_{12}(w)$ and $\mathbb F_{p^{6}}= \mathbb F_p[v]/P_{6}(v)$.
We can verify those polynomials are indeed irreducible over $\mathbb F_p$ using the following `.sage` script :
```python
from sage.all import *
# BN254 prime field
K = GF(21888242871839275222246405745257275088696311157297823662689037894645226208583)
R.<x> = PolynomialRing(K)
f12 = x^12 - 18*x^6 + 82
f6 = x^6 - 18*x^3 + 82
f2 = x^2 +1
def check_irreducibility(polynomial, field):
is_irred = "is" if polynomial.is_irreducible() else "is NOT"
print(f"The polynomial {polynomial} {is_irred} irreducible over {field}.")
check_irreducibility(f12, K)
check_irreducibility(f6, K)
check_irreducibility(f2, K)
```
#### Bijection between tower and *direct* representation
A $\mathbb F_{p^{12}}$ element can be represented by twelve $\mathbb F_{p}$ elements but the actual coefficients will differ depending on if you're using the tower extension stucture or the direct extension.
Using the tower structure, $X\in \mathbb F_{p^{12}}$ is represented by twelve coefficients $a_0, ... , a_{11} \in \mathbb F_{p}$ :
$$
\begin{aligned}
X = a_0 + a_1u + (a_2 + a_3 u)v + (a_4+a_5u)v^2 + (a_6 + a_7u + (a_8 + a_9 u)v + (a_{10}+a_{11}u)v^2)w \\
\end{aligned}
$$
Whereas, using the direct representation, one can write $X\in \mathbb F_{p^{12}}$ using $x_0, ... , x_{11} \in \mathbb F_{p}$ :
$$
\begin{aligned}
X = x_0 + x_1w + x_2w^2+ x_3w^3 + x_4w^4+x_5w^5 + x_6w^6 + x_7w^7 + x_8w^8 + x_9w^9 + x_{10}w^{10}+x_{11}w^{11} \\
\end{aligned}
$$
To obtain the bijective map between $a_i$ and $x_i$, one can express the variables multiplied by each $a_i$ in function of $w$ only, using the tower formulas above.
$$
\begin{aligned}
a_0 &\rightarrow 1\\
a_1 &\rightarrow u = w^6 -9 \\
a_2 &\rightarrow v = w^2\\
a_3 &\rightarrow uv = (w^6-9)w^2 = w^8 -9w^2\\
a_4 &\rightarrow v^2 = w^4\\
a_5 &\rightarrow uv^2 = w^{10} -9w^4\\
a_6 &\rightarrow w\\
a_7 &\rightarrow uw = w^7 -9w\\
a_8 &\rightarrow vw = w^3\\
a_9 &\rightarrow uvw = w^9 -9w^3\\
a_{10} &\rightarrow v^2w = w^5\\
a_{11} &\rightarrow uv^2w = w^{11} -9w^5\\
\end{aligned}
$$
Therefore, you can convert back and forth between the two representations. A simple gist is provided here [^4].
## 2. Multiply in the direct extension using polynomial Euclidean division
Using the direct extensions, one can multiply $A, B\in \mathbb F_{p^{12}}$ or $A, B\in \mathbb F_{p^{6}}$ using Euclidean division of polynomials. Garaga uses both $F_{p^{12}}$ multiplication for the Miller loop and $F_{p^{6}}$ multiplications for the final exponentiation performed in compressed Torus form, as outlined by Youssef El Housni in his work on pairings in R1CS[^2], Torus based final exponentiation is more efficient because inversions costs the same as multiplications in circuits.
:::info
**Note:** It's remarkable to observe that certain operations traditionally viewed as inefficient for real computers—such as field inversions and polynomial divisions as we'll se here—emerge as the most effective approaches when applied to circuit designs.
:::
We will focus here on $F_{p^{12}}$ multiplications but the principle is exactly the same for $F_{p^{6}}$ using the irreducible polynomial $P_6$ instead of $P_{12}$.
To compute $(A*B)\in \mathbb F_{p^{12}}$, we can aply the following :
- Treat $A, B$ as polynomials of degree (at most) 11. For simplicity, we will forget the distinction between the polynomial representation and the extension field element and use the same notation for both mathematical objects.
- Compute $C=AB$, the polynomial of degree (at most) 22.
- Perform the Euclidean division of $C$ by the irreducible polynomial $P_{12}$ of degree 12 and obtain the quotient $Q$ and the remainder $R$ of degree (at most) 10 and (at most) 11 such that:
$$
\boxed{A(x)*B(x) = Q(x)*P_{12}(x) + R(x)} \tag{1}
$$
- Use $R \in \mathbb F_{p^{12}}$ as our result for extension field multiplication.
The condition on the $deg(R) < deg(P_{12})$ guarantees that $Q$ and $R$ are unique[^6].
## 3. Using Fiat-Shamir and the Schwartz-Zippel lemma in circuit
To take advantage of this approach inside a circuit, the following steps can be performed to multiply $A_i, B_i\in \mathbb F_{p^{12}}$:
- Request the prover to supply $Q_i$ and $R_i$ of the correct degrees, as per Equation (1)
- Hash $A_i, B_i, Q_i$ and $R_i$ all together to obtain $z_i \in \mathbb F_p$ (using the Fiat-Shamir heuristic to derive a random point).
- Evaluate $a_i=A_i(z_i), b_i=B_i(z_i), q_i=Q_i(z_i), p_i=P_{12}(z_i), r_i=R_i(z_i)$
- Verify that $a_i*b_i=q_i*p_i+r_i$.
According to the Schwartz-Zippel Lemma, if $Q$ and $R$ are incorrect, the evaluated polynomials will differ with a very high probability.
## 4. Adding a polynomial commitment scheme for efficiency
#### Obtaining a single random point
The latter approach forces us to compute $z_i^2, ..., z_i^{12}$ for each quadruplet $A_i, B_i, Q_i, R_i$ and evaluate $P_{12}(z_i)$ for each $i$.
Pairing algorithms involve hundreds of extension field multiplications and we would like to find some ways of batching those verifications.
Instead of computing as many $z_i$ as there are extension field multiplication, we can instead do this work at the end of the algorithm and obtain one single random point $z\in\mathbb F_p$ by hashing all quadruplets together.
$$
z = Hash(A_0, B_0, Q_0, R_0, \cdots, A_{n-1}, {n-1}, Q_{n-1}, R_{n-1})
$$
Therefore we can evaluate the polynomials as $a_i=A_i(z), b_i=B_i(z), q_i=Q_i(z), p_i=P_{12}(z), r_i=R_i(z)$ and save $n*11$ field multiplications required initially to compute $n$ times $(z_i^2, ..., z_i^{12})$.
#### Verifying a single equation with a polynomial commitment scheme
We can go further by verifying one single equation instead of $n$ by using a linear combination with random coefficients $c_i$:
$$
\sum_{i=0}^{n-1}{c_i*A_i(z)*B_i(z)} = \sum_{i=0}^{n-1}{c_i*(Q_i(z)*P_{12}(z) + R_i(x))}
$$
We can re-arrange the formula by separating the $Q*P$ term from the $R$ term to obtain two sums:
$$
\sum_{i=0}^{n-1}{c_i*A_i(z)*B_i(z)} = \sum_{i=0}^{n-1}{c_i*Q_i(z)*P_{12}(z)} + \sum_{i=0}^{n-1}{c_i*R_i(x)}
$$
Since $P_{12}(z)$ is a constant, we can rewrite this as :
$$
\boxed{\sum_{i=0}^{n-1}{c_i*A_i(z)*B_i(z)} =P_{12}(z) \sum_{i=0}^{n-1}{c_i*Q_i(z)} + \sum_{i=0}^{n-1}{c_i*R_i(x)}} \tag{2}
$$
Using Cairo's Poseidon implementation as a hashing function, we derive each $c_i$ from the second output state of each intermediate hash.
More precisely, at a given multiplication $i$, we derive the *next* $z$ and $c_i$ by computing $Poseidon(z, A_i, B_i, Q_i, R_i)$, which gives us $s_0, s_1$, both of 252 bits as output.
We then assign $z=s_0$ that we use as input for the next $i$, and assign $c_i=s_1\bmod 2^k$, with $k \approx 100$ to achieve targeted 100 bits security in the commitment scheme. $z$ is kept as is as a 252 bit number.
With those conditions, and the base chosen to represent the emulated $\mathbb F_p$ elements with 3 limbs, the equation (2) allows us to perform several optimisations :
- We can multiply each limb by $c_i$ for each coefficient of $Q_i$ and $R_i$ without overflowing Cairo's prime field.
- We can add each multipied-by-$c_i$ limb for each coefficient of $Q_i$ and $R_i$ without overflowing Cairo's prime field, allowing us to obtain $\sum_{i=0}^{n-1}c_iQ_i$ and $\sum_{i=0}^{n-1}c_iR_i$ as polynomials and evaluating them only once in the end since for polynomials $(\sum_{i}X_i)(z) = \sum_{i}(X_i(z))$.
#### Pre-computing $z$ to allow inlined operations to save a large circuit overhead
Finally, instead of keeping on the side all the quadruplets $A_i, B_i, Q_i, R_i$ to hash them in the end and obtain $z$ to evaluate the polynomials, we provide the precomputed $z$ as input to the circuit and perform the evaluations and polynomial accumulation in a inlined way in *advance* as we continue the hashing process.
We assert in the end that the initially provided $z$ matches the computed one. This saves a large overhead in the Cairo program.
## 5. Benchmarks
We obtained more than a 50% reduction in cost following this approach from what was already quite optimized using a lot of unreduced additions and multiplications, as well as multi-miller loop, and Torus based final exponentiation, largely inspired by El Housni's work.
Other optimisations are coming from the use of sparsity in the extension field elements, as sometimes the degree of $Q$ can be smaller by construction or some elements in $R$ are sparsed or forced to be equal to 1 in some cases.
| Operation on curve BN254 | Cairo steps |
|---------|---------------|
| miller_loop | 320 848 |
| multi_miller_loop (N points) | ~ N * 260441 + 60407 |
| final_exponentiation | 270 215 |
| one pairing | 591 063 |
| Product of three pairings | 1 088 319 |
| Starknet current limitations | Cairo steps |
|---------|---------------|
| Transaction | 3 000 000 |
| Block | 8 000 000 |
To our knowledge and all things being equal, this is presumably currently the fastest way of doing emulated pairing in circuits for this curve.
## 6. Acknowledgments and links
I would like to thank Shahar Papini from Starkware for his kind support and the original idea of using direct extensions and Swchartz-Zippel lemma for extension field multiplication.
The actual implementation can be found in the Garaga repository under the `src/bn254` subfolder.
[^1]: https://github.com/keep-starknet-strange/garaga
[^2]: https://eprint.iacr.org/2022/1162
[^3]:https://github.com/Consensys/gnark-crypto/blob/0d4950450c33014e7717d44941cb10e2f64daec3/ecc/bn254/bn254.go#L27-L31
[^4]:https://gist.github.com/feltroidprime/bd31ab8e0cbc0bf8cd952c8b8ed55bf5
[^6]: https://en.wikipedia.org/wiki/Polynomial_long_division