# What changes for BW6-761?
*Authors: [Youssef El Housni](https://yelhousni.github.io/) ([ConsenSys](https://consensys.net/), [GRACE](https://team.inria.fr/grace/)) and [Aurore Guillevic](https://members.loria.fr/AGuillevic/) ([Inria](https://www.inria.fr/en/caramba), [Aarhus University](https://cs.au.dk/research/cryptography-and-security/))*
Last year in [[HG20](https://eprint.iacr.org/2020/351.pdf)], we introduced the BW6-761 curve which is a pairing-friendly curve that forms a 2-chain with BLS12-377 (previously introduced in [ZEXE](https://eprint.iacr.org/2018/962.pdf)). That is, BW6-761 has a subgroup order $r$ precisely equal to the characteristic $q$ of the base field on which BLS12-377 is defined. In a recent work [[HG21](https://eprint.iacr.org/2021/1359.pdf)], we generalised the curve construction, the pairing formulas ($e : \mathbb{G}_1 \times \mathbb{G}_2 → \mathbb{G}_T$) and the group operations to any BW6 curve defined on top of any BLS12 curve, forming a family of 2-chain pairing-friendly curves. We also investigated other possible 2-chain families, and much more (read the [paper](https://eprint.iacr.org/2021/1359.pdf)!).
Since BW6-761 is already implemented in few projects ([gnark-crypto](https://github.com/ConsenSys/gnark-crypto/tree/master/ecc/bw6-761), [arkworks](https://github.com/arkworks-rs/curves/tree/master/bw6_761), [EY/libff](https://github.com/EYBlockchain/zk-swap-libff/tree/ey/libff/algebra/curves/bw6_761), [kilic/bw6](https://github.com/kilic/bw6) and [constantine](https://github.com/mratsim/constantine/tree/master/constantine/curves)) and is used in production ([Celo](https://celo.org/), [Aleo](https://aleo.org/)), natural questions arise: *What changes for BW6-761 after this work? Is the new formulas backward-compatible with the current implementations? If not, does it worth switching?*
The new general construction where BW6-761 would fall, is coherent with the groups operations and pairing formulae found previously in [HG20]. However, we found new pairing algorithms that are faster. In this note, we try to shed light on the differences and the backward compatibilty.
## Pairing computation
Given $P \in \mathbb{G}_1$ and $Q \in \mathbb{G}_2$, a pairing computation for BW6 is
$$ e(P,Q) = m(P,Q)^{(q^6-1)/r}$$
where $m(P,Q)$ is the Miller loop and $z \mapsto z^{(q^6-1)/r}$ is the final exponentiation.
### Final exponentiation
The final exponentiation corresponds to raising the Miller loop result to $(q^6-1)/r$ or to a multiple $h\cdot (q^6-1)/r$ (where $r \nmid h$). In [HG20], we raise to $3(u^3-u^2+1)(q^6-1)/r$ which gives a fast algorithm described in appendix B (alg.6) in that paper. It is the one implemented in all the code bases (AFAWK). In the new work, we give a generic formula for which the BW6-761 specification would be Equation (28) of that paper. This corresponds to raising to $3(u+1)(q^6-1)/r$ which gives a different result than previously. However, this new algortihm, while generic, is less efficient (previously we took exact values of $h_t$ and $h_y$ and this gave a better LLL decomposition).
**$\rightarrow$ conclusion**: There is nothing to change regarding BW6-761 final exponentiation.
### Miller loop
Previously, we presented an optimal ate Miller loop
$$
m(P,Q) = f_{u+1,Q}(P) \cdot f^q_{u^3-u^2-u,Q}(P)
\tag{1}
$$
which can be optimized in
$$
m(P,Q) = f_{u,Q}(P) \cdot l_{[u]Q,Q}(P) \cdot (f_u)^q_{u^2-u-1,[u]Q}(P)
\tag{2}
$$
Where $u=\texttt{0x8508c00000000001}$ is the BW6-761 (and BLS12-377) seed. We note that
integer | bitsize | Hamming weight | 2-NAF hamming weight |
------- | ------- | -------------- | -------------------- |
$u$ or $u+1$ | 64 | 7 | 7
$u^3-u^2-u$ | 190 | 136 | 31
$u^2-u-1$ | 127 | 66 | 19
Equations (1) and (2) give the same result but it seems that the fastest algorithm is equation (2) with $f_{u,Q}(Q)$ loop in binary and $(f_u)^q_{u^2-u-1,[u]Q}(P)$ loop in 2-NAF. Note that implementing the second loop in 2-NAF requires precomputing $f_{u,Q}^{-1}$ (needed when the second loop integer has a bit $-1$ in its 2-NAF form) but this inverse can be replaced by a conjugate in $\mathbb{F}_{q^6}$.
All mentioned code bases implement Equation (1) with two independent Miller loops, the first one in binary and the second in 2-NAF (no inverse/conjugate when the two loops are independent).
In [a fork of gnark-crypto](https://github.com/yelhousni/gnark-crypto), we implement all three options and report timings on an AWS z1d.large (3.4 GHz Intel Xeon) below
Equation | option (1st loop - 2nd loop) | timing (ms)
-------- | ------------ | ------
[(1)](https://github.com/yelhousni/gnark-crypto/blob/e19af89e10fd6b421b06f9d5072434bbba89f003/ecc/bw6-761/pairing.go#L946) | binary - 2NAF | 1.62
[(2)](https://github.com/yelhousni/gnark-crypto/blob/e19af89e10fd6b421b06f9d5072434bbba89f003/ecc/bw6-761/pairing.go#L704) | binary - binary| 1.59
[(2')](https://github.com/yelhousni/gnark-crypto/blob/e19af89e10fd6b421b06f9d5072434bbba89f003/ecc/bw6-761/pairing.go#L819) | binary - 2NAF | 1.27
So in practice, the third option is the fastest because $\texttt{HW}_{\texttt{2NAF}}(u^2-u-1)=19$ compared to $66$ in binary. This is backward compatible with existing codes.
Now, let's take a look at the new algorithms introduced in [HG21]. First, there is a different generic formula (e.g. equation (23)) when applied to BW6-761 gives
$$
m(P,Q) = f^q_{u+1,Q}(P) \cdot f_{u^3-u^2+1}(P)
\tag{3}
$$
This is slower than before (for the specific BW6-761 choice of $u$) because $\texttt{HW}_{\texttt{2NAF}}(u^3-u^2+1)=37$ and we cannot share the computation over the two loops. Moreover, it is not backward compatible.
Next, we remark that $(u-1)^2$ has a lower Hamming weight and derive the formula
$$
m(P,Q) = f_{u+1,Q}(P) \cdot (f_{u+1})^q_{(u-1)^2,[u+1]Q}(P) \cdot l^q_{[(u+1)(u-1)^2]Q,-Q}(P)
\tag{4}
$$
This should be faster because $\texttt{HW}_{\texttt{2NAF}}((u-1)^2)=12$ or $15$ in binary. In practice,
Equation | option (1st loop - 2nd loop) | timing (ms)
-------- | ------------ | ------
[(4)](https://github.com/yelhousni/gnark-crypto/blob/e19af89e10fd6b421b06f9d5072434bbba89f003/ecc/bw6-761/pairing.go#L584) | binary - binary | 1.250
[(4')](https://github.com/yelhousni/gnark-crypto/blob/e19af89e10fd6b421b06f9d5072434bbba89f003/ecc/bw6-761/pairing.go#L459) | binary - 2NAF | 1.224
However, this is not backward compatible with equations (1) and (2) as the result is different.
Finally, we derive a novel **optimal Tate** (or twisted ate) Miller loop (presented in Algorithm 2 in [HG21]) that leverages the endomorphism $\phi: (x,y) \mapsto (\omega x,-y)$ on $\mathbb{G}_1$ with eigenvalue $\lambda=t-1$, where $t$ is the Frobenius trace. We show that in general it is faster or at least competitive with previous algorithms. For BW6-761, we have two formulae
$$
m_{Tate}(P,Q) = f_{u^3-u^2+1-\lambda\cdot(u+1),P}(Q)
\tag{5}
$$
$$
m_{ate}(P,Q) = f_{u^3-u^2+1-\lambda\cdot(u+1),Q}(P)
\tag{5'}
$$
and
$$
m_{Tate}(P,Q) = f_{u+1+\lambda\cdot(u^3-u^2-u),P}(Q)
\tag{6}
$$
$$
m_{ate}(P,Q) = f_{u+1+\lambda\cdot(u^3-u^2-u),Q}(P)
\tag{6'}
$$
Equation (5) and (5') are slightly slower than respectively (6) and (6') because $\texttt{HW}_{\texttt{2NAF}}(u^3-u^2+1)=37$ while $\texttt{HW}_{\texttt{2NAF}}(u^3-u^2-u)=31$. We report the following timings
Equation | option | timing (ms)
-------- | ------ | ------
[(5)](https://github.com/yelhousni/gnark-crypto/blob/e19af89e10fd6b421b06f9d5072434bbba89f003/ecc/bw6-761/pairing.go#L312) | 2NAF | 1.28
[(6)](https://github.com/yelhousni/gnark-crypto/blob/e19af89e10fd6b421b06f9d5072434bbba89f003/ecc/bw6-761/pairing.go#L163) | 2NAF | 1.26
[(5')](https://github.com/yelhousni/gnark-crypto/blob/67e6bcd98549096529a977ff04d317f763643c68/ecc/bw6-761/pairing.go#L165) | 2NAF | 1.29
[(6')](https://github.com/yelhousni/gnark-crypto/blob/67e6bcd98549096529a977ff04d317f763643c68/ecc/bw6-761/pairing.go#L311) | 2NAF | 1.28
To sum up, taking the fastest options:
Equation| option | timing (ms) | backward compatibily
--------| ------- | ----------- | --------------------
(1)* | binary - 2NAF | 1.62 | Yes
(2') | binary - 2NAF | 1.27 | Yes
(4') | binary - 2NAF | 1.22 | No
(5) | 2NAF | 1.28 | No
(6) | 2NAF | 1.26 | No
(5') | 2NAF | 1.29 | No
(6') | 2NAF | 1.28 | Yes
*the one currently implemented in most code bases.
Note that these are the timings for single pairings. For multi-pairings (product of multiple single pairings), equation (2) and (4) might be slower depending on the number of pairs. In fact, we cannot share squares in the first loop as the second one needs the result of the first one for each independant pair. Here are, for example, the timings of a multi-pairing of size 3:
Equation | multi-pairing size |option | timing (ms) | backward compatibily
-------- | ------ | ------ | --|-
(1)* | 3 | binary-2NAF | 3.81 | Yes
(2') | 3| binary-2NAF | 3.82 | Yes
(4') | 3 | binary-binary | 3.67 | No
(5) | 3 | 2NAF | 2.98 | No
(6) | 3 | 2NAF | 2.95 | No
(5') | 3 | 2NAF | 3.03 | No
(6') | 3 | 2NAF | 2.99| Yes
**$\rightarrow$ conclusion** If you need,
- single pairings and backward compatibility choose (2') or (6')
- single pairings and **no** backward compatibility choose (4')
- large multi-pairings and backward compatibility choose (6')
- large multi-pairings and **no** backward compatibility choose (6)
*Remark*: For BW6 constructed over BLS24, optimal Tate Miller loops are the fastest for single and multi-pairings.