P-256, also known as secp256r1 and prime256v1, is a 256-bit prime field Weierstrass curve standardized by the NIST. It is widely adopted in internet systems, which explains its myriad use cases in platforms such as TLS, DNSSEC, Apple’s Secure Enclave, Passkeys, Android Keystore, and Yubikey. The key operation in elliptic curves based cryptography is the scalar multiplication. When the curve is equipped with an efficient endomorphism it is possible to speed up this operation through the well-known GLV algorithm. P-256 does unfortunately not have an efficient endomorphism (see parameters) to enjoy this speedup.
Verifying ECDSA signatures on Ethereum through precompiled contracts, i.e. smart contracts built into the Ethereum protocol (there are only 9) is only possible with the secp256k1 curve and not the P-256.
Verifying ECDSA signatures on P-256 requires computing scalar multiplications in Solidity and is especially useful for smart-contract wallets, enabling hardware-based signing keys and safer, easier self-custody. Different solutions can bring P-256 signatures on-chain. There are primarily three interesting approaches: (zk)-SNARK based verifiers, smart contract verifiers (e.g. [Dubois23], Ledger/FCL (deprecated), smoo.th/SCL and daimo/p256verifier), and native protocol precompiles (EIP/RIP 7212).
Using SNARK (succinctness) properties, provides a great way to reduce gas cost for computation on Ethereum (e.g. ~232k gas for Groth16, ~285k gas for PLONK and ~185k gas for FFLONK). This is very competitive with (and sometimes better that) the currently gas-optimal smart contract verifier. Moreover one can batch many ECDSA verifications in a single proof, amortizing thus the gas cost. However verifying P-256 signatures in a SNARK circuit can be very expensive i.e. long proving time. This is because the field where the points on the P-256 curve lie is different than the field where the SNARK computation is usually expressed. To be able to verify the proof onchain through the procompile the SNARK field needs to be the BN254 scalar field. Different teams tried to implement the ECDSA verification on P-256 in a BN254 SNARK circuit efficiently. Among these: zkwebauthn/webauthn-halo2, https://github.com/zkwebauthn/webauthn-circom and PSE/circom-ecdsa-p256.
If P-256 had an efficient endomorphism we could have optimized the proving time a great deal!
In this note we show a way to implement a GLV-like scalar multiplications in-circuit without having an efficient endomorphism.
Let be an elliptic curve defined over the prime field and let be a prime divisor of the curve order (i.e. the number of points).
Let and , we are interested in proving scalar multiplication over the -torsion subgroup of , denoted (i.e. the subset of points of order ).
The simplest algorithm is the standard left-to-right double-and-add:
If/else branching is not possible in SNARK circuits so this is replaced by constant window table lookups inside the circuit. This can be achieved using polynomials which vanish at the constants that aren’t being selected, i.e. a 1-bit table lookup Q ← s_i * Q + (1 - s_i) * (Q+P)
. Hence this double-and-add algorithm requires doublings, additions and 1-bit table lookup.
This can be extended to windowed double-and-add, i.e. scanning more than a bit per iteration using larger window tables, but the multiplicative depth of the evaluation increases exponentially. We use affine coordinates for doubling/adding points because inverses cost as much as multiplications, i.e. instead of checking that is we provide out-circuit and check in-circuit that . However since we start with it is infeasible to avoid conditional branching since affine formulas are incomplete. Instead, we scan the bits right-to-left and assume that the first bit s_0
is 1 (so that we start at Q ← P
), we double the input point P
instead of the accumulator Q
in this algorithm and finally conditionally subtract (using the 1-bit lookup) P
if s_0
was 0.
However it is well known that if the curve is equipped with an efficient endomorphism then there exists a faster algorithm known as [GLV].
Example 1 : suppose that has Complex Multiplication (CM) with discrimant , i.e. is of the form , with . This is the case of BN254
, BLS12-381
and secp256k1
elliptic curves used in Ethereum. There is an efficient endomorphism defined by (and ) that acts on as . Both and are cube roots of unity in and respectively, i.e. and .
Example 2 : suppose that has Complex Multiplication (CM) with discrimant , meaning that the endomorphism ring is . This is the case of the Bandersnatch
elliptic curves specified in Ethereum Verkle trie. There is an efficient endomorphism whose kernel is generated by a 2-torsion point. The map can be found by looking at 2-isogeneous curves and applying Vélu's formulas. For Bandersnatch it is defined by for some constants (and ) that acts on as where .
The GLV algorithm starts by decomposing as and then replacing the scalar multiplication by . Because and are guaranteed to be (see Sec.4 of [GLV] and Sec.4 of [FourQ] for an optimization trick), we can halve the size of the for loop in the double-and-add algorithm. We can then scan simultaenously the bits of and and apply the Strauss-Shamir trick. This results in a significant speed up but only when an endomorphism is available. For example the left-to-right double-and-add would become:
Using the efficient endomorphism in-circuit is also possible (see [Halo, Sec. 6.2 and Appendix C] or [gnark implementation] for short Weierstrass curves and [arkworks] and [gnark] implementations for twisted Edwards). But one should be careful about some extra checks of the decomposition (not the SNARK modulus). The integers can possibly be negative in which case they will be reduced in-circuit modulo the SNARK field and not .
Remember that we are proving that and not computing it. We can "hint" the result and check in-circuit that . Now, if we can find such that then we can check instead that
which is equivalent to
The thing now is that and are "small" and we can, similarly to the GLV algorithm, halve the size of the double-and-add loop and apply the Strauss-Shamir trick.
Solution: running the half-GCD algorithm (i.e. running GCD half-way) is sufficient to find and . We can apply the exact same trick for finding the lattice basis as in the GLV paper (Sec. 4). For completeness we recall the algorithm hereafter.
We apply the extended Euclidean algorithm to find the greatest common divisor of and (This gcd is 1 since is prime.) The algorithm produces a sequence of equations
for where , and for all . We stop at the index for which and take and .
Note: By construction is guaranteed to be a positive integer but can be negative, in which case it would be reduced in-circuit modulo the SNARK modulus and not . To circumvent this we return in the hint , and a if is negative and otherwise. In-circuit we negate instead when .
A generic implementation in the gnark library is available at gnark.io (feat/fake-GLV branch). For Short Weierstrass (e.g. P256) look at the scalarMulFakeGLV
method in the emulated package and for twisted Edwards (e.g. Bandersnatch/Jubjub) look at the scalarMulFakeGLV
method in the native package.
The best algorithm to implement scalar multiplication in a non-native circuit (i.e. circuit field ≠ curve field) when an efficient endomorphism is not available is an adaptation of [Joye07] (implemented in gnark here).
Next we compare this scalar multiplication with our fake GLV in a PLONKish vanilla (i.e. no custom gates) circuit (scs) over the BN254 curve (Ethereum compatible). We also give benchmarks in R1CS.
P-256 | Old (Joye07) | New (fake GLV) |
---|---|---|
738,031 scs 186,466 r1cs |
385,412 scs 100,914 r1cs |
|
ECDSA verification | 1,135,876 scs 293,814 r1cs |
742,541 scs 195,266 r1cs |
Note here that the old ECDSA verification uses Strauss-Shamir trick for computing while the new version is merely two fake GLV multiplications and an addition.
p256wallet.org is an ERC-4337 smart contract wallet that leverages zk-SNARKs for WebAuthn and P-256 signature verification. It uses PSE/circom-ecdsa-p256 to generate the webAuthn proof, and underneath PSE/circom-ecdsa-p256 to generate the ECDSA proof on P-256 curve. The github README reports 1,972,905 R1CS
. Compiling our circuit in R1CS results in 195,266 R1CS
. This is more than a 10x reduction, which is not only due to the fake GLV algorithm but also to optimized non-native field arithmetic in gnark.
Similar results are noticed for other curves in short Weirstrass, e.g. P-384 and STARK curve:
P-384 | Old (Joye07) | New (fake GLV) |
---|---|---|
1,438,071 scs | 782,674 scs | |
ECDSA verification | 2,174,027 scs | 1,419,929 scs |
STARK curve | Old (Joye07) | New (fake GLV) |
---|---|---|
727,033 scs | 380,210 scs | |
ECDSA verification | 1,137,459 scs | 732,131 scs |
and also in twisted Edwards e.g. Jubjub vs. Bandersnatch:
Jubjub | Old (2-bit double-and-add) | New (fake GLV) |
---|---|---|
5,863 scs 3,314 r1cs |
4,549 scs 2,401 r1cs |
Bandersnatch | Old (GLV) | New (fake GLV) |
---|---|---|
4,781 scs 2,455 r1cs |
4,712 scs 2,420 r1cs |
EDIT: Thanks to Ben Smith for reporting that a similar idea was proposed in [SAC05:ABGL+] for ECDSA verification. We note that, in our context, the trick applies to a single scalar multiplication and that the half GCD is free through the hint.