Try   HackMD

What changes for BW6-761?

Authors: Youssef El Housni (ConsenSys, GRACE) and Aurore Guillevic (Inria, Aarhus University)

Last year in [HG20], we introduced the BW6-761 curve which is a pairing-friendly curve that forms a 2-chain with BLS12-377 (previously introduced in ZEXE). That is, BW6-761 has a subgroup order

r precisely equal to the characteristic
q
of the base field on which BLS12-377 is defined. In a recent work [HG21], we generalised the curve construction, the pairing formulas (
e:G1×G2GT
) and the group operations to any BW6 curve defined on top of any BLS12 curve, forming a family of 2-chain pairing-friendly curves. We also investigated other possible 2-chain families, and much more (read the paper!).

Since BW6-761 is already implemented in few projects (gnark-crypto, arkworks, EY/libff, kilic/bw6 and constantine) and is used in production (Celo, Aleo), natural questions arise: What changes for BW6-761 after this work? Is the new formulas backward-compatible with the current implementations? If not, does it worth switching?

The new general construction where BW6-761 would fall, is coherent with the groups operations and pairing formulae found previously in [HG20]. However, we found new pairing algorithms that are faster. In this note, we try to shed light on the differences and the backward compatibilty.

Pairing computation

Given

PG1 and
QG2
, a pairing computation for BW6 is
e(P,Q)=m(P,Q)(q61)/r

where
m(P,Q)
is the Miller loop and
zz(q61)/r
is the final exponentiation.

Final exponentiation

The final exponentiation corresponds to raising the Miller loop result to

(q61)/r or to a multiple
h(q61)/r
(where
rh
). In [HG20], we raise to
3(u3u2+1)(q61)/r
which gives a fast algorithm described in appendix B (alg.6) in that paper. It is the one implemented in all the code bases (AFAWK). In the new work, we give a generic formula for which the BW6-761 specification would be Equation (28) of that paper. This corresponds to raising to
3(u+1)(q61)/r
which gives a different result than previously. However, this new algortihm, while generic, is less efficient (previously we took exact values of
ht
and
hy
and this gave a better LLL decomposition).

conclusion: There is nothing to change regarding BW6-761 final exponentiation.

Miller loop

Previously, we presented an optimal ate Miller loop

(1)m(P,Q)=fu+1,Q(P)fu3u2u,Qq(P)
which can be optimized in
(2)m(P,Q)=fu,Q(P)l[u]Q,Q(P)(fu)u2u1,[u]Qq(P)

Where

u=0x8508c00000000001 is the BW6-761 (and BLS12-377) seed. We note that

integer bitsize Hamming weight 2-NAF hamming weight
u
or
u+1
64 7 7
u3u2u
190 136 31
u2u1
127 66 19

Equations (1) and (2) give the same result but it seems that the fastest algorithm is equation (2) with

fu,Q(Q) loop in binary and
(fu)u2u1,[u]Qq(P)
loop in 2-NAF. Note that implementing the second loop in 2-NAF requires precomputing
fu,Q1
(needed when the second loop integer has a bit
1
in its 2-NAF form) but this inverse can be replaced by a conjugate in
Fq6
.
All mentioned code bases implement Equation (1) with two independent Miller loops, the first one in binary and the second in 2-NAF (no inverse/conjugate when the two loops are independent).
In a fork of gnark-crypto, we implement all three options and report timings on an AWS z1d.large (3.4 GHz Intel Xeon) below

Equation option (1st loop - 2nd loop) timing (ms)
(1) binary - 2NAF 1.62
(2) binary - binary 1.59
(2') binary - 2NAF 1.27

So in practice, the third option is the fastest because

HW2NAF(u2u1)=19 compared to
66
in binary. This is backward compatible with existing codes.

Now, let's take a look at the new algorithms introduced in [HG21]. First, there is a different generic formula (e.g. equation (23)) when applied to BW6-761 gives

(3)m(P,Q)=fu+1,Qq(P)fu3u2+1(P)
This is slower than before (for the specific BW6-761 choice of
u
) because
HW2NAF(u3u2+1)=37
and we cannot share the computation over the two loops. Moreover, it is not backward compatible.

Next, we remark that

(u1)2 has a lower Hamming weight and derive the formula
(4)m(P,Q)=fu+1,Q(P)(fu+1)(u1)2,[u+1]Qq(P)l[(u+1)(u1)2]Q,Qq(P)

This should be faster because
HW2NAF((u1)2)=12
or
15
in binary. In practice,

Equation option (1st loop - 2nd loop) timing (ms)
(4) binary - binary 1.250
(4') binary - 2NAF 1.224

However, this is not backward compatible with equations (1) and (2) as the result is different.

Finally, we derive a novel optimal Tate (or twisted ate) Miller loop (presented in Algorithm 2 in [HG21]) that leverages the endomorphism

ϕ:(x,y)(ωx,y) on
G1
with eigenvalue
λ=t1
, where
t
is the Frobenius trace. We show that in general it is faster or at least competitive with previous algorithms. For BW6-761, we have two formulae

(5)mTate(P,Q)=fu3u2+1λ(u+1),P(Q)
(5')mate(P,Q)=fu3u2+1λ(u+1),Q(P)

and
(6)mTate(P,Q)=fu+1+λ(u3u2u),P(Q)

(6')mate(P,Q)=fu+1+λ(u3u2u),Q(P)

Equation (5) and (5') are slightly slower than respectively (6) and (6') because

HW2NAF(u3u2+1)=37 while
HW2NAF(u3u2u)=31
. We report the following timings

Equation option timing (ms)
(5) 2NAF 1.28
(6) 2NAF 1.26
(5') 2NAF 1.29
(6') 2NAF 1.28

To sum up, taking the fastest options:

Equation option timing (ms) backward compatibily
(1)* binary - 2NAF 1.62 Yes
(2') binary - 2NAF 1.27 Yes
(4') binary - 2NAF 1.22 No
(5) 2NAF 1.28 No
(6) 2NAF 1.26 No
(5') 2NAF 1.29 No
(6') 2NAF 1.28 Yes

*the one currently implemented in most code bases.

Note that these are the timings for single pairings. For multi-pairings (product of multiple single pairings), equation (2) and (4) might be slower depending on the number of pairs. In fact, we cannot share squares in the first loop as the second one needs the result of the first one for each independant pair. Here are, for example, the timings of a multi-pairing of size 3:

Equation multi-pairing size option timing (ms) backward compatibily
(1)* 3 binary-2NAF 3.81 Yes
(2') 3 binary-2NAF 3.82 Yes
(4') 3 binary-binary 3.67 No
(5) 3 2NAF 2.98 No
(6) 3 2NAF 2.95 No
(5') 3 2NAF 3.03 No
(6') 3 2NAF 2.99 Yes

conclusion If you need,

  • single pairings and backward compatibility choose (2') or (6')
  • single pairings and no backward compatibility choose (4')
  • large multi-pairings and backward compatibility choose (6')
  • large multi-pairings and no backward compatibility choose (6)

Remark: For BW6 constructed over BLS24, optimal Tate Miller loops are the fastest for single and multi-pairings.