What changes for BW6-761?

Authors: Youssef El Housni (ConsenSys, GRACE) and Aurore Guillevic (Inria, Aarhus University)

Last year in [HG20], we introduced the BW6-761 curve which is a pairing-friendly curve that forms a 2-chain with BLS12-377 (previously introduced in ZEXE). That is, BW6-761 has a subgroup order

r

precisely equal to the characteristic

q

of the base field on which BLS12-377 is defined. In a recent work [HG21], we generalised the curve construction, the pairing formulas (

e : G_{1} \times G_{2} \to G_{T}

) and the group operations to any BW6 curve defined on top of any BLS12 curve, forming a family of 2-chain pairing-friendly curves. We also investigated other possible 2-chain families, and much more (read the paper!).

Since BW6-761 is already implemented in few projects (gnark-crypto, arkworks, EY/libff, kilic/bw6 and constantine) and is used in production (Celo, Aleo), natural questions arise: What changes for BW6-761 after this work? Is the new formulas backward-compatible with the current implementations? If not, does it worth switching?

The new general construction where BW6-761 would fall, is coherent with the groups operations and pairing formulae found previously in [HG20]. However, we found new pairing algorithms that are faster. In this note, we try to shed light on the differences and the backward compatibilty.

Pairing computation

Given

P \in G_{1}

and

Q \in G_{2}

, a pairing computation for BW6 is

e (P, Q) = m (P, Q)^{(q^{6} - 1) / r}

where

m (P, Q)

is the Miller loop and

z \mapsto z^{(q^{6} - 1) / r}

is the final exponentiation.

Final exponentiation

The final exponentiation corresponds to raising the Miller loop result to

(q^{6} - 1) / r

or to a multiple

h \cdot (q^{6} - 1) / r

(where

r ∤ h

). In [HG20], we raise to

3 (u^{3} - u^{2} + 1) (q^{6} - 1) / r

which gives a fast algorithm described in appendix B (alg.6) in that paper. It is the one implemented in all the code bases (AFAWK). In the new work, we give a generic formula for which the BW6-761 specification would be Equation (28) of that paper. This corresponds to raising to

3 (u + 1) (q^{6} - 1) / r

which gives a different result than previously. However, this new algortihm, while generic, is less efficient (previously we took exact values of

h_{t}

and

h_{y}

and this gave a better LLL decomposition).

$\to$ conclusion: There is nothing to change regarding BW6-761 final exponentiation.

Miller loop

Previously, we presented an optimal ate Miller loop

\begin{matrix} (1) & m (P, Q) = f_{u + 1, Q} (P) \cdot f_{u^{3} - u^{2} - u, Q}^{q} (P) \end{matrix}

which can be optimized in

\begin{matrix} (2) & m (P, Q) = f_{u, Q} (P) \cdot l_{[u] Q, Q} (P) \cdot (f_{u})_{u^{2} - u - 1, [u] Q}^{q} (P) \end{matrix}

Where

u = 0x8508c00000000001

is the BW6-761 (and BLS12-377) seed. We note that

integer	bitsize	Hamming weight	2-NAF hamming weight
$u$ or $u + 1$	64	7	7
$u^{3} - u^{2} - u$	190	136	31
$u^{2} - u - 1$	127	66	19

Equations (1) and (2) give the same result but it seems that the fastest algorithm is equation (2) with

f_{u, Q} (Q)

loop in binary and

(f_{u})_{u^{2} - u - 1, [u] Q}^{q} (P)

loop in 2-NAF. Note that implementing the second loop in 2-NAF requires precomputing

f_{u, Q}^{- 1}

(needed when the second loop integer has a bit

- 1

in its 2-NAF form) but this inverse can be replaced by a conjugate in

F_{q^{6}}

.
All mentioned code bases implement Equation (1) with two independent Miller loops, the first one in binary and the second in 2-NAF (no inverse/conjugate when the two loops are independent).
In a fork of gnark-crypto, we implement all three options and report timings on an AWS z1d.large (3.4 GHz Intel Xeon) below

Equation	option (1st loop - 2nd loop)	timing (ms)
(1)	binary - 2NAF	1.62
(2)	binary - binary	1.59
(2')	binary - 2NAF	1.27

So in practice, the third option is the fastest because

{HW}_{2NAF} (u^{2} - u - 1) = 19

compared to

66

in binary. This is backward compatible with existing codes.

Now, let's take a look at the new algorithms introduced in [HG21]. First, there is a different generic formula (e.g. equation (23)) when applied to BW6-761 gives

\begin{matrix} (3) & m (P, Q) = f_{u + 1, Q}^{q} (P) \cdot f_{u^{3} - u^{2} + 1} (P) \end{matrix}

This is slower than before (for the specific BW6-761 choice of

u

) because

{HW}_{2NAF} (u^{3} - u^{2} + 1) = 37

and we cannot share the computation over the two loops. Moreover, it is not backward compatible.

Next, we remark that

(u - 1)^{2}

has a lower Hamming weight and derive the formula

\begin{matrix} (4) & m (P, Q) = f_{u + 1, Q} (P) \cdot (f_{u + 1})_{(u - 1)^{2}, [u + 1] Q}^{q} (P) \cdot l_{[(u + 1) (u - 1)^{2}] Q, - Q}^{q} (P) \end{matrix}

This should be faster because

{HW}_{2NAF} ((u - 1)^{2}) = 12

15

in binary. In practice,

Equation	option (1st loop - 2nd loop)	timing (ms)
(4)	binary - binary	1.250
(4')	binary - 2NAF	1.224

However, this is not backward compatible with equations (1) and (2) as the result is different.

Finally, we derive a novel optimal Tate (or twisted ate) Miller loop (presented in Algorithm 2 in [HG21]) that leverages the endomorphism

ϕ : (x, y) \mapsto (ω x, - y)

G_{1}

with eigenvalue

λ = t - 1

, where

t

is the Frobenius trace. We show that in general it is faster or at least competitive with previous algorithms. For BW6-761, we have two formulae

\begin{matrix} (5) & m_{T a t e} (P, Q) = f_{u^{3} - u^{2} + 1 - λ \cdot (u + 1), P} (Q) \end{matrix}

\begin{matrix} (5') & m_{a t e} (P, Q) = f_{u^{3} - u^{2} + 1 - λ \cdot (u + 1), Q} (P) \end{matrix}

and

\begin{matrix} (6) & m_{T a t e} (P, Q) = f_{u + 1 + λ \cdot (u^{3} - u^{2} - u), P} (Q) \end{matrix}

\begin{matrix} (6') & m_{a t e} (P, Q) = f_{u + 1 + λ \cdot (u^{3} - u^{2} - u), Q} (P) \end{matrix}

Equation (5) and (5') are slightly slower than respectively (6) and (6') because

{HW}_{2NAF} (u^{3} - u^{2} + 1) = 37

while

{HW}_{2NAF} (u^{3} - u^{2} - u) = 31

. We report the following timings

Equation	option	timing (ms)
(5)	2NAF	1.28
(6)	2NAF	1.26
(5')	2NAF	1.29
(6')	2NAF	1.28

To sum up, taking the fastest options:

Equation	option	timing (ms)	backward compatibily
(1)*	binary - 2NAF	1.62	Yes
(2')	binary - 2NAF	1.27	Yes
(4')	binary - 2NAF	1.22	No
(5)	2NAF	1.28	No
(6)	2NAF	1.26	No
(5')	2NAF	1.29	No
(6')	2NAF	1.28	Yes

*the one currently implemented in most code bases.

Note that these are the timings for single pairings. For multi-pairings (product of multiple single pairings), equation (2) and (4) might be slower depending on the number of pairs. In fact, we cannot share squares in the first loop as the second one needs the result of the first one for each independant pair. Here are, for example, the timings of a multi-pairing of size 3:

Equation	multi-pairing size	option	timing (ms)	backward compatibily
(1)*	3	binary-2NAF	3.81	Yes
(2')	3	binary-2NAF	3.82	Yes
(4')	3	binary-binary	3.67	No
(5)	3	2NAF	2.98	No
(6)	3	2NAF	2.95	No
(5')	3	2NAF	3.03	No
(6')	3	2NAF	2.99	Yes

$\to$ conclusion If you need,

single pairings and backward compatibility choose (2') or (6')
single pairings and no backward compatibility choose (4')
large multi-pairings and backward compatibility choose (6')
large multi-pairings and no backward compatibility choose (6)

Remark: For BW6 constructed over BLS24, optimal Tate Miller loops are the fastest for single and multi-pairings.

What changes for BW6-761?

Pairing computation

Final exponentiation

Miller loop

Read more

Yet another curve, but THE curve for your KZG!

BW6 over BLS12-381

Benchmarking pairing-friendly elliptic curves libraries

PLONK