Crypto roadmap for Substrate

# Crypto roadmap for Substrate We outline the trade offs between doing host calls for algebraic operations vs doing host calls for full verification routines. We prefer full verification routines once they become well established, but right now the SNARK world has enough churn that exposing algebra likely makes sense, and this benefits more bespoke protocols longer term. As a rule, cryptrograhpy crates could be compiled to WASM, with the exception being protocols employing secrects or randomness, like some batch verification schemes. All cryptography involves substancial computation within some critical functions. At the same time, host calls incur both a computational overhead, due to context switching. There is also a considerable development and maintanance burden for algebraic calls due to there being many costly operations. Almost all low-level cryptography crates already have a distinguished frontend and backend, which either do hardware specific optimizations, or else abstract operations over multiple lower level primitives. We can therefore provide substrate-crypto-* forks for cryptographic crates that alter existing backends to make algebraic host calls where appropriate. ## Host calls TODO: Fix wrong assumptions about how the host call interface works here. ### Types There are two concerns when passing argumnts more complex than `[u8]` but still `'static'`: - Alignment requirements could differ between WASM and native code. - Arithmetic crates customize their internal representations for hardware optimizations. In principle, callees could repair alignments without allocations using [`unsized_locals`](https://github.com/rust-lang/rfcs/blob/master/text/1909-unsized-rvalues.md). We cannot uniformly impose that internal representations be shared between WASM and native. We should therefore pass bytes and presumably ask nothing about alignments. As a rule, crypto crates expose only secure de/serializaation routines, which becomes expensive due to doing divisions and square roots. All substrate-crypto-* should therefor provide their own internal insecure-for-wire de/serialization mechanisms, which they align with their internal representations as appropriate. We'd face an uphill battle upstreaming any insecure-for-wire de/serialization mechanisms, but we might fare better if it uses opaque types that only substrate understands. ### Borrowing We often pass `Vec<u8>`s across the host boundary, but `&[u8]` and `&mut [u8]` work under some constraints, and higher alignment types like `[u64]` work too. We expect verification routines for full proof systems to prefer `Vec<u8>`s since these should do batch verification making them impure. Algebraic host calls might prefer `&[u8]` and `&mut [u8]` though. We'd expect these work due to algebra being mostly pure, except possibly for internal or returnned allocations. ### Compilation We shall assume WASM compiled to native code. In particular, compiled WASM runs roughly half the speed of native code, and imposes context switches on host calls. We'd adopt different strategies for interpreted WASM, but those strategies appear more complex. We'd achieve the best results by optimising for crypto for compiled WASM and sepertely pushing towards more compiled WASM everywhere else. ## Elliptic curves An elliptic curve has a base field, an additive group of points, and a scalar field. We have bespoke functions to hash into the group or fields, and to (de)compress group elements. These operations should require a couple hundred to a couple thousand cycles, so they remain compeditive with context switches. As pure functions, they could perhap be invoked without context switchs, but their number still imposes a maintanence burden. Internally, one represents curve points in several ways: - Ristretto/Ed25519 uses extended projective corrdinates for all variable points, but also optimises for constant points with precomputed tables aka fixed base. - Short Wierstrass curves exploit both projective and affine corrdinates. Ideally, these should support fixed base too, but almost never do so, although they do create temporary tables called prepared form (see pairings below) There are several heavy operations common to all elliptic curves: - Scalar multiplications always output projective coodinates, but inputs could be projective, affine, or constant with precomputed tables aka fixed, but not all implementaitons provide the later. - Multi-scalar-point multiplication always output projective coodinates too, but inputs are ocasionally mixed between projective and fixed. - Batch inversion prevides faster convertion from projective coodinates to affine, which helps with serialization and pairings. Among these, we only benefit much from mutable arguments with batch inversion, although multiplying a projective point times a scalar sometimes mutates the point in place. We ignore variable vs constant time distinctions here because variable time should alwayas suffice inside the runtime. We could skip optimizations for constant points initially because few curve implementations exposes any fixed point inteface, perhaps only [dalek](https://doc.dalek.rs/curve25519_dalek/traits/trait.VartimePrecomputedMultiscalarMul.html#method.vartime_multiscalar_mul). We have enough variety here that each substrate-crypto-* crate should probably manage invoking its own algebraic self calls, without further structuring those calls around specific algebaic operations. We could pass some rather generic type that identifies the substrate-crypto-* crate, specifies the command or even script to execute, and passes slices with the various borrowed and owned argument types. ``` pub struct AlgebraHostCall<'a> { pub system: u16, pub script: &'a [u8], pub refs: &'a mut [&'a [u8]], pub muts: &'a mut [&'a mut [u8]], pub vecs: &'a mut [Vec<u8>], } ``` Although one could pass curve paramters at runtime, we caution agasinst this because code bases that abstract the curve paramaters, like [Zexe](https://github.com/scipr-lab/zexe/tree/master/algebra/src) and [MIRACL Core](https://github.com/miracl/core/tree/master/rust) still employ constants for performance, and Zexe provides relatively bespoke curve models. As an example, Ristretto could be exposed by a fork substrate-crypto-curve25519-dalek of curve25519-dalek that users [`[patch]`](https://doc.rust-lang.org/cargo/reference/overriding-dependencies.html?highlight=replace#the-patch-section) into their dependencies. We'd give substrate-crypto-curve25519-dalek a backend that invokes `mul_assign` and `multiscalar_mul`. In the host, we'd build substrate-crypto-curve25519-dalek with its avx2 or ifma backend, but expose compatable functions for `mul_assign` and `multiscalar_mul` with some scarily named feature. In principle, one could build bulletproofs, perhaps even [ZkVM](https://github.com/stellar/slingshot/), with this much Ristretto, although doing so requires addressing the constant points issues. [Halo](https://github.com/ebfull/halo/tree/master/src/curves) appears similar if implementation matures. ## Pairings We need pairing friendly curves for most SNARKs and for polynomial commitment schemes. In these, we have two groups representations G1 and G2 for the curve over different fields k1 and k2 with k2 normally an extension field of k1. We've far slower operation on G2, and far larger elements, due to the extension field arithemtic being slower. We have a third subgroup GT of the multiplicative group of yet another extension field as well. Again, we do some arithmetic like addition and negation inside WASM for all three G1, G2, and GT, but we require scalar multiplication and multi-scalar multiplication for G1, G2, and GT. We have new methods for the pairing itself, the multi Miller loop and final exponentiation, and possibly preparation for G2 points. We implement the final exponentiation with a scalar multiplication in GT using a distinguished scalar, but the multi Miller loop and possibly preparation require additional host calls. We suggest Zexe over MIRACL Core for pairing friendly curves: Zexe already abstracts all curve arithmetic behind traits provided by their algebra-core crate, which also provides most implementation code, and a seperate [algebra](https://github.com/scipr-lab/zexe/tree/master/algebra/src) crate with curve paramaters. We could build a substrate-crypto-zexe-algebra crate that delegates some operations to zexe's existing code, but invokes host calls for expensive operations. We'd provide the pairing friendly curves BLS12-381, BLS12-377, and BW6-761 this way, along with one associated Edwards curve for each of these. We expect BW6-761 to superseed SW6, but it currently lacks an associated Edwards curve. We caution the four MNT curves there might prove extremely slow, but again they could be supported. We note that ZCash and zexe's curve implementations have both advanced considerably and diverged, so one should monitor ZCash's code base too. As an aside, ZCash [exploits the relationship](https://github.com/zcash/librustzcash/blob/master/bls12_381/src/pairings.rs#L444) between the implementation of Miller loops and point preperation, but cannot obviously merge preperation and the Miller loop into one multi-use host call. ## Proof systems We do gain from embedding larger protocols instead of algerba, but some SNARK verifiers require relatively few calls, so the gains look minimal unless we also batch the verifications. As an example, Groth16 requires one multiscalar multiplication, one preperation, one multi Miller loop covering three pairings, and one final exponentiation. We would not add a host call just to combine these four host calls. Yet, we can batch verify Groth16 by merging the multiscalar multiplications with tweaked constants and doing the multi Miller loop over `n+2` pairings. As a practical matter, we could find that doing a Groth16 batch verifier that supports all the Zexe curves BLS12-381, BLS12-377, and BW6-761 requires less time than implementing the algerbaic host calls described above. Aside from Groth16, there are several new SNARKs like RedShift, Plonk, Marlin, and [more](https://github.com/matter-labs/awesome-zero-knowledge-proofs). We track these as the stabalize and shall slowly access the trade offs, impact of batch verification, etc. ## RSA and bignums Although large, RSA signatures have reasonably fast verification times. RSA has many variations though: - RSA-FDH is rarely deployed, but provides blind signatures. - Rabin-Williams signatures have incredibly fast verification. We could expose a bignum crate optimized for doing RSA style verifications. See https://github.com/w3f/Open-Grants-Program/pull/16 ## Aritmetic hash functions There are various optimization targets for hash functions, like performance in hardware (Shake128) or software (Blake2). Arithmetic aka SNARK friendly hash functions optimize for efficent verification inside SNARKs, normally by using curve scalars for registers. As a result, they require modular reductions during normal computaiton, which makes them quite slow compared with normal hash functions. Zcash's Pederson hash runs 500-1000 times slower than Blake2, but needs only 1% the constraints aka SNARK prover time. It must however be tuned to the specific elliptic curve. There are newer arithmetic hash functions like Poseidon provides a both fewer constraints and less running time, but must also be tunned to the elliptic curve and proof system. We'll run these only inside proof systems mostly, so normally they require no special treatment. Yet, they could arise in non-anonymous payments on anonymous chains, opertunistic protocols, or altered chain sync procedures. ## Symetric cryptography We've some hashing infrastructure already that supports Bkake2b. We should not require anything beyond this anytime soon, but we can discuss it anyways. We might eventually have parachains that benefit from ChaCha ala `ChaChaRng`. We'd expect parachains to seek within the steam directly to a couple output blocks they require however, for which WASM suffices. In ChaCha, you minimse host calls by extracting data in full 64 byte buffers, but you want morethan 64 bytes often, so the actual host call could support copying the prefix and poistfix data seperately: ``` pub fn chacha_fill_bytes(state: ChaChaState, prefix: &mut [u8], middle: &mut [u8] postfix: &mut [u8]) ``` In this, we embed the block offset into `state` and expect `prefix.len() + middle.len() + postfix.len()` to be divisible by 64. Increasingly, there are zero-knowledge proof systems beyond bulletproofs that use [merlin](https://merlin.cool/) transcripts, which possibly benefits from some [STROBE](https://strobe.sourceforge.io/specs/) host call. We'd need to open up [strobe-rs](https://github.com/rozbb/strobe-rs) to minimize the invokations, perhaps its internal `operate` method.