This note describes the techniques used for optimizing the bilinear pairing computation with field emulation in Groth16 using gnark.
A shortlist of the techniques used:
- emulating multi-precision non-native field arithmetic on the native field;
- probabilistic checks using an in-circuit Fiat-Shamir challenge derived from a commitment to the witness (following LegoSNARK);
- log-derivative lookup tables for cheap range checks (following Haböck);
- circuit finalizer callbacks and caches in gnark to keep the gadgets modular and safe.
When using pairing-based SNARK constructions (for example, Groth16 and PLONK), the computations are performed on the scalar field of the underlying elliptic curve. Depending on the context, either the elliptic curve is fixed (for example, when we target a verifier running as an Ethereum smart contract) or we have little freedom in choosing a curve with a suitable scalar field. This means that if we want to perform computations over an arbitrary finite field (further on, let's call it the non-native field), then we have to emulate multi-precision arithmetic on the native field.
For consistency, let us denote the modulus of the native field by $p$ and the modulus of the non-native field by $q$.
Usually, multi-precision arithmetic is implemented by representing an integer value $x$ as a vector of limbs $(x_0, x_1, \ldots, x_{n-1})$ such that $x = \sum_{i=0}^{n-1} x_i 2^{wi}$ for some limb width $w$.
Assuming add-with-carry and mul-with-carry primitives, we can implement multi-precision addition and multiplication by operating on the limbs and carrying the excess over to the next limb, as learned in school.
This approach works really well on native hardware, which has suitable primitive integer types (8-, 16-, 32- and 64-bit wide unsigned integers, i.e. uint8, uint16, uint32 and uint64) with well-defined overflow behaviour. In-circuit, however, the only primitive type is the native field element, and any intermediate value exceeding the native modulus $p$ wraps around silently.
To avoid this, we need to ensure that all the limb values are always small enough that the multiplication (and, to a lesser extent, addition) never overflows the native field. But as the bitlength of the product of two $w$-bit limbs is $2w$, we cannot keep multiplying indefinitely: we have to reduce the result modulo the non-native modulus $q$ and prove in-circuit that the reduction is correct.
Essentially, we have to show that there exist $z$ and $k$ such that
$$x \cdot y = z + k \cdot q,$$
or alternatively
$$x \cdot y - z - k \cdot q = 0.$$
We compute both $z$ and $k$ outside the circuit using hints and provide them as witness values, so in-circuit we only have to verify the relation.
By computing the integer value $x \cdot y - z - k \cdot q$ directly on the limbs we would have to track all the carries explicitly, which is expensive in-circuit.
For that, we can instead consider the polynomial representation of the multi-precision integer: the limbs $x_i$ are the coefficients of a polynomial $x(X) = \sum_i x_i X^i$, and the integer value is its evaluation $x = x(2^w)$.
With this representation we have
$$x(X) \cdot y(X) = z(X) + k(X) \cdot q(X) + (X - 2^w) \cdot c(X)$$
for some carry polynomial $c(X)$, which we can show by evaluating both sides at sufficiently many points: two polynomials of degree at most $d$ that agree on $d+1$ points are equal.
Although we did not explicitly mention it, a large part of the complexity of multiplication is showing that the limbs of the hint values are within the expected range.
More formally, we can write the value of every limb $x_i$ in binary as
$$x_i = \sum_{j=0}^{w-1} x_{i,j} 2^j,$$
where every bit is constrained by $x_{i,j} (1 - x_{i,j}) = 0$.
We can see that this approach is really inefficient, as we have to create a witness variable and a constraint for every bit of every limb.
Let's see how to make the multiplication more efficient in three steps.
The first critical step is to obtain some randomness in-circuit. But the circuits are fixed, and for plain Groth16 (and also PLONK) the only randomness could theoretically be provided by the prover. By itself this is not sufficient for probabilistic checks, as the prover could provide values which lead to false positive outcomes.
The simplest way to overcome this problem is the Fiat-Shamir heuristic: the verifier's challenges in the protocol are replaced with hashes of the protocol transcript up to that moment. So, for example, if we have an interactive protocol
PROVER                      VERIFIER
msg
        -- msg -->
                            ch <- Sample()
        <-- ch --
use ch
...
then it is turned into a non-interactive protocol as
PROVER                      DETERMINISTIC VERIFIER
msg
        -- msg -->
                            ch <- H(msg)
        <-- ch --
use ch
...
Now, the proof is the transcript (msg, ch, ...) of the protocol. Everyone can verify the correctness of the proof by checking that the verifier's challenges are correctly computed.
It depends on the viewpoint, but normally this transformation, assuming that we use strong hash functions, is sound. This is due to the difficulty of choosing inputs to the hash function which lead to an output suitable for a cheating prover. With strong hash functions, if the prover changes even a single bit of the input, the change cascades and flips many output bits.
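A minimal sketch of the transformation: the verifier's random sampling is replaced by hashing the transcript so far, so anyone can recompute and check the challenge (SHA-256 and the truncation to 8 bytes are illustrative choices; a real protocol would reduce the digest into the scalar field):

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// challenge derives a verifier challenge deterministically from the protocol
// transcript, replacing the verifier's random sampling.
func challenge(transcript ...[]byte) uint64 {
	h := sha256.New()
	for _, msg := range transcript {
		h.Write(msg)
	}
	digest := h.Sum(nil)
	// interpret the first 8 bytes of the digest as the challenge
	return binary.BigEndian.Uint64(digest[:8])
}

func main() {
	msg := []byte("prover first message")
	ch := challenge(msg)
	// anyone can recompute the challenge from the transcript and verify it
	fmt.Println(ch == challenge(msg)) // true
}
```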
We can use this approach also in SNARK circuits, but we stumble again against the wall of inefficiency. Cryptographic hash functions are not SNARK-friendly and create a lot of constraints in the circuit. Even with SNARK-friendly hash functions, it becomes infeasible to feed the whole protocol transcript into the hash function, as the hashing overhead cancels out any possible saving coming from probabilistic checks.
However, there is a way: constructing a commitment from the witness, an approach which comes from the LegoSNARK paper. To explain it, we have to recall how Groth16 works at a high level.
The arithmetic circuit is compiled into a rank-1 constraint system (R1CS): matrices $A$, $B$, $C$ such that the circuit is satisfied exactly when $(Az) \circ (Bz) = Cz$, where $\circ$ denotes the element-wise product.
Here the vector $z$ contains the constant $1$, the public inputs and the private witness.
It is possible to represent the matrices $A$, $B$, $C$ as polynomials by interpolating over their rows, turning the R1CS into a quadratic arithmetic program (QAP).
Omitting the full SNARK proof computation (see the Groth16 paper for details), the prover computes the proof elements $[A]_1$, $[B]_2$, $[C]_1$,
and the verifier checks
$$e([A]_1, [B]_2) = e([\alpha]_1, [\beta]_2) \cdot e([\mathsf{pub}]_1, [\gamma]_2) \cdot e([C]_1, [\delta]_2).$$
Now, we can use a Pedersen commitment to the witness elements as a source of a Fiat-Shamir challenge. We choose a subset $S$ of witness indices to commit to,
and the commitment value is
$$M = \sum_{i \in S} \mathsf{w}_i [g_i]_1,$$
where the $[g_i]_1$ are commitment keys added to the proving key during setup.
Additionally, the prover will have to provide a proof of knowledge of the committed values.
The modified prover computation outputs $M$ and the proof of knowledge alongside the usual Groth16 proof,
and the verifier's computation checks the proof of knowledge and folds $M$ into the public-input term of the pairing check.
The commitment $M$ can now be used in-circuit as a Fiat-Shamir challenge on the committed inputs: it is binding, and the verifier checks that it is computed correctly, so the prover cannot choose the challenge freely.
We can build a similar argument in PLONK using a custom gate to mark the variables we wish to commit to.
We can now compute a single Fiat-Shamir challenge over some chosen variables in the circuit. Let's choose those inputs well.
In the first section we had to check that the limbs obtained from hints are less than $2^w$. Instead of decomposing every limb into bits, we can use a lookup argument: write all valid values $0, \ldots, 2^w - 1$ into a table $(t_j)$ and show that every queried limb $f_i$ appears in the table.
We compute the histogram of the queries over the table, i.e. the multiplicities $m_j$ counting how many times each table entry $t_j$ occurs among the queried values $f_i$,
and need to show that
$$\prod_j (\chi + t_j)^{m_j} = \prod_i (\chi + f_i)$$
for a random challenge $\chi$.
By computing a commitment to the queries and the multiplicities, we obtain the challenge $\chi$ as described in the previous section.
The evaluations cost one constraint per multiplication, but exponentiating by the multiplicities $m_j$ on the left-hand side is expensive.
There is an additional trick which allows us to get rid of the exponents: taking the logarithmic derivative of both sides turns the products into sums of inverses,
$$\sum_j \frac{m_j}{\chi + t_j} = \sum_i \frac{1}{\chi + f_i}.$$
On both sides we replaced the products with inverses, which have the same cost. But we were able to lose the exponent, which decreases the number of constraints needed to evaluate the left side.
This idea of the log-derivative lookup table comes from Haböck.
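As a quick sanity check of the log-derivative identity, the following sketch verifies it numerically over a small prime field (the table, queries, challenge and modulus are all illustrative):

```go
package main

import (
	"fmt"
	"math/big"
)

// a small prime standing in for the native field modulus (illustrative)
var p = big.NewInt(2147483647)

// inv returns 1/x mod p.
func inv(x int64) *big.Int {
	return new(big.Int).ModInverse(big.NewInt(x), p)
}

// identityHolds checks sum_j m_j/(chi+t_j) == sum_i 1/(chi+f_i) mod p,
// where m_j is the histogram of the queries over the table.
func identityHolds(table, queries []int64, chi int64) bool {
	mult := map[int64]int64{}
	for _, f := range queries {
		mult[f]++
	}
	lhs := new(big.Int) // table side, one term per entry, weighted by m_j
	for _, t := range table {
		term := new(big.Int).Mul(big.NewInt(mult[t]), inv(chi+t))
		lhs.Add(lhs, term).Mod(lhs, p)
	}
	rhs := new(big.Int) // query side, one term per query
	for _, f := range queries {
		rhs.Add(rhs, inv(chi+f)).Mod(rhs, p)
	}
	return lhs.Cmp(rhs) == 0
}

func main() {
	table := []int64{0, 1, 2, 3}      // range check table t_j
	queries := []int64{1, 1, 3, 2, 1} // queried limb values f_i
	fmt.Println(identityHolds(table, queries, 12345)) // true
}
```

Note that a query outside the table makes the identity fail, which is exactly what the range check relies on.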
Gnark aims to be a modular and safe tool for circuit developers. For example, we saw in the range checking gadget that before we can compute the challenge commitment and assert the product check, we have to collect all the queries and compute the multiplicities. From the user perspective this means that the computations are performed out of order. An option would be to require the developer to perform these steps manually, but this would provide an opportunity to write unsound circuits if the user forgets to include any query.
To avoid this, we have to build the range checking gadget in a way which ensures that the checks are always included in the circuit. For that, we added methods to register circuit finalizer callbacks and caches. The finalizer callbacks are functions which are called after the user-defined part of the circuit has been constructed.
Additionally, we use caches to instantiate some constraints only once. We can see that in the range checking gadget the left-hand side of the equation adds a term per table entry, independent of the number of queries, so a single cached gadget instance lets all callers share one table and one set of multiplicity terms.
As an example, given the following simple circuit in gnark, let's see what happens in the background:
type PairingComputationCircuit struct {
    // the witness vector is automatically allocated and initialised
    ElG1       sw_bn254.G1Affine
    ElG2       sw_bn254.G2Affine
    ExpectedGt sw_bn254.GTEl
    // the in-circuit types are fully compatible with their out-circuit
    // counterparts in gnark-crypto
}

func (c *PairingComputationCircuit) Define(api frontend.API) error {
    // we use a cached instance of the pairing context. The pairing context
    // itself uses a cached instance of the range check context. We can call
    // different gadgets which will all use the same range check gadget
    // instance.
    pairingCtx, err := sw_bn254.NewPairing(api)
    if err != nil {
        return err
    }
    // the witness non-native element limbs are automatically constrained the
    // first time they are used in a circuit. Intermediate results are
    // automatically constrained.
    res, err := pairingCtx.Pair([]*sw_bn254.G1Affine{&c.ElG1},
        []*sw_bn254.G2Affine{&c.ElG2})
    if err != nil {
        return err
    }
    pairingCtx.AssertIsEqual(res, &c.ExpectedGt)
    // we check that all witness elements are used in the circuit; without
    // this assertion we get a compile-time error.
    return nil
    // at compile time the optimal range check table size is computed, and the
    // range-checked elements are decomposed into table entries and checked.
}
func Example() {
    var P bn254.G1Affine
    var Q bn254.G2Affine
    var a, b fr.Element
    _, _, g1gen, g2gen := bn254.Generators()
    a.SetRandom()
    b.SetRandom()
    P.ScalarMultiplication(&g1gen, a.BigInt(new(big.Int)))
    Q.ScalarMultiplication(&g2gen, b.BigInt(new(big.Int)))
    x, err := bn254.Pair([]bn254.G1Affine{P}, []bn254.G2Affine{Q})
    if err != nil {
        panic(err)
    }
    assignment := PairingComputationCircuit{
        ElG1:       sw_bn254.NewG1Affine(P),
        ElG2:       sw_bn254.NewG2Affine(Q),
        ExpectedGt: sw_bn254.NewGTEl(x),
        // the constructors automatically split the extension field elements
        // into limbs for non-native field arithmetic.
    }
    ccs, err := frontend.Compile(ecc.BN254.ScalarField(),
        r1cs.NewBuilder, &PairingComputationCircuit{})
    if err != nil {
        panic(err)
    }
    pk, vk, err := groth16.Setup(ccs)
    if err != nil {
        panic(err)
    }
    w, err := frontend.NewWitness(&assignment, ecc.BN254.ScalarField())
    if err != nil {
        panic(err)
    }
    // the prover computes the Groth16 proof and the commitment proof of
    // knowledge
    proof, err := groth16.Prove(ccs, pk, w)
    if err != nil {
        panic(err)
    }
    pw, err := w.Public()
    if err != nil {
        panic(err)
    }
    // the verifier checks the Groth16 proof and the commitment proof
    err = groth16.Verify(proof, vk, pw)
    if err != nil {
        panic(err)
    }
}
Using these tricks, the performance of BN254-in-BN254 and BLS12-381-in-BN254 pairing computation on a MacBook Pro M1 is:
In the next part we will describe further optimisations for pairing computation.