Write high-performance cryptography software in Agda (or at least try and fail)
--------------------------------------------------------------------------------
Warning: This is all very experimental and very much a work in progress.
### Context
Recently I've been working on something called "zero-knowledge proofs". These are cryptographic protocols allowing one party (the "prover") to convince another party (the "verifier") of the truth of a statement $R(x,w)\in\mathsf{Bool}$, where $x\in X$ (called the "instance") is known to all parties, but $w\in W$ (called the "witness") is known only to the prover. For example, $x$ could be a hash value and $w$ a preimage, with $R(x,w) := (\mathsf{hash}(w) = x)$.
These are very complex protocols, building on a large number of mathematical abstractions, for example:
- (large) finite fields and field extensions
- univariate and multivariate polynomials
- fast Fourier transform
- elliptic curves and elliptic curve pairings
- high dimensional lattices
- etc
And this being cryptography, it is both:
- security critical (by definition!)
- performance critical (these protocols typically have huge overheads; think several orders of magnitude)
Furthermore, this is _programmable cryptography_, that is, some of the inputs (in this case, the relation $R(x,w)$ above) are _programs_. However, we often need to express these programs in unusual ways so that the mathematical machinery can consume them.
All this together presents serious engineering challenges, which are in my opinion very far from being solved.
An algebra library
------------------
Let's say we want to build an algebra library handling all the above math abstractions.
Of course such libraries already exist, in several languages (C++, Rust, Go, Nim, even Haskell), but they are not without problems. One typical problem is that you have to use them from those same languages... which in turn means imprecise and badly designed APIs, and thus a bad programming experience.
In any case, how would one go about building such a library? These abstractions build on top of each other:
1. we start with big integers (modulo $2^{64k}$, for example $2^{256}$)
2. small prime fields: $p < 2^{64}$ (see the sketch right after this list)
3. large prime fields: $p \gg 2^{64}$
- in standard representation (integers in $[0,p-1]$ interpreted as $\textrm{mod}\;p$)
- Montgomery representation (more efficient multiplication)
4. field extensions, e.g. $\mathbb{F}_p[x] / (x^3 - x + 1)$, then iterating this...
5. vector operations (pointwise, dot product, etc)
6. matrices (dense and sparse)
7. polynomials (univariate and multivariate)
8. NTT (fast Fourier transform / polynomial interpolation and evaluation)
9. elliptic curves
- short Weierstrass representation
- projective representation
- Jacobian (or weighted projective) representation
10. elliptic curve multi-scalar multiplication (MSM)
11. elliptic curve pairings
12. lattices over finite fields
13. lattices over polynomial rings over finite fields
14. [...and so on...]
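To make one rung of this tower concrete, here is a minimal Haskell sketch of item 2, a small prime field. The `Fq` type and the choice of the Goldilocks prime are my illustrative assumptions, not code from any existing library; a generated library would also replace the `Integer` detour with CPU carry/widening primitives.

```haskell
import Data.Word (Word64)

-- The "Goldilocks" prime 2^64 - 2^32 + 1, a popular small field
-- in recent proof systems.
p :: Word64
p = 18446744069414584321

newtype Fq = Fq Word64 deriving (Eq, Show)

-- Widening to Integer sidesteps 64-bit overflow; generated code
-- would use add-with-carry / widening-multiply instructions instead.
add :: Fq -> Fq -> Fq
add (Fq a) (Fq b) =
  Fq (fromIntegral ((toInteger a + toInteger b) `mod` toInteger p))

mul :: Fq -> Fq -> Fq
mul (Fq a) (Fq b) =
  Fq (fromIntegral ((toInteger a * toInteger b) `mod` toInteger p))
```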
For these operations to have acceptable performance, they must be implemented carefully, using all kinds of subtle and tricky algorithms, and specialized to the various parameters (field size, elliptic curve parameters, etc), and often unrolled / inlined (and sometimes also optimized for SIMD / GPU).
Multiplication modulo a prime, for example, is often implemented in hand-written assembly, because it sits at the bottom of everything and is worth the pain... Real cryptographers also care about these operations running in constant time (because of timing side-channel leaks), which I simply ignore here.
As many different parameter sets are used in practice, essentially all such libraries involve some form of metaprogramming (macros, direct code generation, etc).
We believe that the right framework to do this is staged metaprogramming in two-level type theory, as developed by András Kovács. This approach has several advantages:
- we can use very high-level abstractions in a safe way
- relatively nice syntax, as far as metaprogramming goes
- still generate efficient low-level code
- static guarantees about the generated code
- importantly, we can write our application too as a meta-program, meaning that we can use the full expressiveness of a dependently typed language, in particular for the APIs between the library and the application; and these APIs can involve further code-generation
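To illustrate the staging idea on a toy example (my own sketch, not actual 2LTT or `csip` syntax): when the exponent is known at the meta level, a meta-program computing powers produces fully unrolled object code.

```haskell
-- Toy object language: expressions in one free variable.
data Expr = X | Lit Integer | Mul Expr Expr
  deriving Show

-- Meta-level program: 'n' is static (known while generating code),
-- the base is dynamic, so the result is a fully unrolled product.
power :: Int -> Expr -> Expr
power 0 _ = Lit 1
power n x = Mul x (power (n - 1) x)

-- ghci> power 3 X
-- Mul X (Mul X (Mul X (Lit 1)))
```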
### Programmable cryptography
Staged metaprogramming also seems to be a good approach for writing the programs in the "programmable" part of "programmable cryptography".
You often need versions of the same program in (widely) different contexts; metaprogramming promises to let us write these once and then interpret them in each context.
An example would be representing the same program as CPU code, as GPU code, and, say, as an arithmetic circuit (and maybe even in hardware: an FPGA or even an ASIC). In the industry this is almost always done manually, which is painful, error-prone, resource-hungry (lots of humans!) and very hard to maintain.
**Exhibit A:** In the mainstream proof system called "Plonky2", everything is implemented at least 3 times (and in Rust to start with!):
- once in the base field
- once in the extension field
- and once as an arithmetic circuit
These look almost like copy-paste, except they are not really...
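What one would want instead is to write the arithmetic once, against an interface, and instantiate it for each of the three backends. A hypothetical Haskell sketch of that idea (this is not Plonky2's actual API):

```haskell
-- A minimal ring interface; instances could be the base field,
-- the extension field, or symbolic wires in a circuit builder.
class Ring r where
  rzero, rone :: r
  radd, rmul  :: r -> r -> r

-- Written once, usable at every instance: Horner evaluation of a
-- polynomial given by its coefficient list [c0, c1, c2, ...].
horner :: Ring r => [r] -> r -> r
horner cs x = foldr (\c acc -> radd c (rmul x acc)) rzero cs
```

A circuit backend would implement `Ring` with symbolic wires, so the same `horner` builds a circuit instead of computing a number.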
Agda prototype
--------------
As there are some features (like stage inference) which can make such metaprogramming more user friendly, but are not typically present in mainstream dependently typed languages, Péter Diviánszky is developing a new language / compiler called `csip` to facilitate writing these metaprograms.
Meanwhile, I'm developing an Agda prototype of such an algebra library. It may be a bit more verbose, but it is still a very useful study for working out the details and figuring out the ergonomics.
HOAS representation
-------------------
We use HOAS syntax to write metaprograms. This is very convenient in practice: only a small amount of boilerplate has to be inserted manually by the programmer. And in a dedicated language like `csip`, even that can be inferred!
This is the representation of the object programs I use at the moment:
```agda
data Ty : Set where
  _⇒_    : Ty -> Ty -> Ty
  Bit    : Ty
  U64    : Ty
  Struct : {n : ℕ} -> Vec Ty n -> Ty
  ...

-- implicit variables used below
variable
  n   : ℕ
  s t : Ty
  ts  : Vec Ty n

data Tm : (ty : Ty) -> Set where
  Lam : (Tm s -> Tm t) -> Tm (s ⇒ t)
  Let : Tm s -> (Tm s -> Tm t) -> Tm t
  App : Tm (s ⇒ t) -> Tm s -> Tm t
  Fix : Tm ((s ⇒ t) ⇒ (s ⇒ t)) -> Tm (s ⇒ t)
  Pri : PrimOp Tm t -> Tm t
  Lit : Val t -> Tm t

data PrimOp (tm : Ty -> Set) : Ty -> Set where
  AddCarryU64 : tm Bit -> tm U64 -> tm U64 -> PrimOp tm (Pair Bit U64)
  MulExtU64   : tm U64 -> tm U64 -> PrimOp tm (Pair U64 U64)
  ...

data Val : Ty -> Set where
  BitV    : Bool   -> Val Bit
  U64V    : Word64 -> Val U64
  StructV : HList Val ts -> Val (Struct ts)
  ...
```
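For readers unfamiliar with HOAS: binders of the object language are represented by functions of the host language, so the metaprogrammer never juggles variable names or de Bruijn indices. Here is an untyped Haskell miniature of the idea (my sketch, deliberately much simpler than the Agda types above):

```haskell
import Data.Word (Word64)

-- Untyped miniature of Tm: object-level binding is a host-level function.
data Tm
  = Lam (Tm -> Tm)
  | Let Tm (Tm -> Tm)
  | App Tm Tm
  | LitW Word64
  | Add Tm Tm

-- \x -> let y = x + x in y + y, written with host lambdas;
-- the Let node expresses sharing in the generated code.
quadruple :: Tm
quadruple = Lam (\x -> Let (Add x x) (\y -> Add y y))
```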
Metaprogramming
---------------
Ideally, metaprogramming should look very similar to normal programming. (Of course, the programmer has to keep in mind which parts of the program happen at the meta level and which at the object level... We hope that in the future tooling will help with this, for example with coloring.)
Here is what metaprogramming looks like in the current Agda proof of concept:
```agda
-- greater-than-or-equal relation between big integers
isGE : Tm (BigInt n) -> Tm (BigInt n) -> Tm Bit
isGE {n} big1 big2 = unwrap2 big1 big2 \x y -> worker x y (allFin n) where

  worker : Tm (BigInt' n) -> Tm (BigInt' n) -> Vec (Fin n) n -> Tm Bit
  worker x y list = go list where

    -- compare limbs starting from the most significant one
    -- (hence the `opposite`)
    go : {k : ℕ} -> Vec (Fin n) k -> Tm Bit
    go []       = true
    go (j ∷ js) = Let (vecproj (opposite j) x) \(a : Tm U64) ->
                  Let (vecproj (opposite j) y) \(b : Tm U64) ->
                  ifte (a < b)
                    false
                    (ifte (a > b)
                      true
                      (go js))
```
or:
```agda
-- addition of big integers, with carry
addCarry : Tm Bit -> Tm (BigInt n) -> Tm (BigInt n) -> Tm (Pair Bit (BigInt n))
addCarry {n} carry₀ big1 big2 =
  unwrap2 big1 big2 \xs ys -> runGen do
    (carry₁ , tms) ← worker xs ys (Data.Vec.allFin n)
    return (mkPair carry₁ (mkBigInt tms))
  where
    worker : Tm (BigInt' n) -> Tm (BigInt' n) -> Vec (Fin n) n -> Gen (Tm Bit × Vec (Tm U64) n)
    worker xs ys vec = go vec carry₀ where

      go : {k : ℕ} -> Vec (Fin n) k -> Tm Bit -> Gen (Tm Bit × Vec (Tm U64) k)
      go []       c₀ = return (c₀ , [])
      go (j ∷ js) c₀ = do
        c₁ , z  ← pair⇑ (addCarryU64 c₀ (vecproj j xs) (vecproj j ys))
        c₂ , zs ← go js c₁
        return (c₂ , z ∷ zs)
```
This is indeed _quite verbose_. However, in the second example, for instance, the essential part of the code is the following (the rest is just boilerplate):
```agda
go []       c₀ = return (c₀ , [])
go (j ∷ js) c₀ = do
  c₁ , z  ← pair⇑ (addCarryU64 c₀ (vecproj j xs) (vecproj j ys))
  c₂ , zs ← go js c₁
  return (c₂ , z ∷ zs)
```
which to me looks "kind of OK".
Not surprisingly, with less trivial algorithms, it gets somewhat worse...
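For reference, the computation that the generated object code is supposed to perform can also be written directly in plain (unstaged) Haskell. This is only a specification sketch of limb-wise carry propagation, not output of the pipeline:

```haskell
import Data.Word (Word64)

-- One limb: add with carry-in, returning (carry-out, sum).
addCarryU64 :: Word64 -> Word64 -> Word64 -> (Word64, Word64)
addCarryU64 c a b =
  let t     = a + b        -- wraps modulo 2^64
      s     = t + c
      carry = if t < a || s < t then 1 else 0
  in (carry, s)

-- Whole number: limbs stored least significant first.
addBig :: Word64 -> [Word64] -> [Word64] -> (Word64, [Word64])
addBig c (x : xs) (y : ys) =
  let (c' , z ) = addCarryU64 c x y
      (c'', zs) = addBig c' xs ys
  in (c'', z : zs)
addBig c _ _ = (c, [])
```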
Compilation pipeline
--------------------
1. You write your application as a metaprogram in Agda, using HOAS representation and whatever convenience the library can offer
2. This gets translated into a standard well-typed first-order representation of the simply typed lambda calculus
3. In theory you could have a well-typed interpreter at this level, but I haven't implemented the primops yet
4. The first order representation gets downgraded to an untyped "raw" representation
5. That raw AST is then exported as a text file (using a hand-written `Show` instance)
6. The resulting text file can be read by Haskell with the standard derived `Read` instance
7. It can also be interpreted at this point
8. Lambda lifting transformation
9. ANF conversion
10. Code generation (for example as C source code)
11. Finally the resulting C code is compiled by a standard C compiler
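Steps 5 and 6 work because GHC's derived `Read` parses exactly the format that `Show` produces, so the Agda side only has to print in a `Show`-compatible way. A sketch with a made-up `Raw` type (the real constructors differ):

```haskell
-- Illustrative stand-in for the raw AST; the project's actual
-- constructors are different.
data Raw
  = RVar Int
  | RLam Raw
  | RApp Raw Raw
  | RLit Integer
  deriving (Show, Read, Eq)

-- The hand-written Show on the Agda side mimics this format,
-- so the Haskell side gets parsing for free:
roundTrip :: Raw -> Bool
roundTrip t = read (show t) == t
```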
Actual code
-----------
(I mean the above was already actual code!)
Maybe switch to `github` at this point?
Demo
----
The planned demo application is computing the Poseidon2 hash.
This requires the following layers of abstractions:
- first implement big (e.g. 256-bit) integers using the 64-bit primitives supported by the CPU
- then build field operations modulo a big (254-bit) prime number (only addition and multiplication are needed for this)
- (to be efficient, we should use the Montgomery representation of field elements; see the sketch after this list)
- finally implement the Poseidon2 permutation (using modulo prime addition and multiplication)
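The Montgomery representation mentioned above stores $a$ as $a \cdot r \bmod p$ with $r = 2^{64k}$, so that modular reduction needs no division by $p$. A sketch over `Integer` (the real code works limb by limb on 64-bit words; names and signatures here are my own):

```haskell
-- Montgomery reduction: for r = 2^k > p, p odd, pInv = (-p)^(-1) mod r,
-- redc computes t * r^(-1) mod p without dividing by p.
redc :: Integer -> Integer -> Integer -> Integer -> Integer
redc p r pInv t =
  let m = (t * pInv) `mod` r    -- chosen so that t + m*p ≡ 0 (mod r)
      u = (t + m * p) `div` r   -- exact division: just a shift in practice
  in if u >= p then u - p else u

-- The product of Montgomery-form numbers a*r and b*r reduces to (a*b)*r,
-- so multiplication stays inside the representation.
montMul :: Integer -> Integer -> Integer -> Integer -> Integer -> Integer
montMul p r pInv x y = redc p r pInv (x * y)
```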
Unfortunately, the last step (C code generation from ANF syntax) is not yet in a working state :)
Experience
----------
Things I enjoyed in Agda:
- having precise types
- module system
- the syntax is quite nice
- probably some more I forgot :)
Things I didn't enjoy in Agda:
- transports - maybe use rewrite rules? and/or tactics?
- pattern matching on implicit arguments - feature request: `ScopedTypeVariables`
- Emacs mode - feature request: in holes show definitions in scope from `where` blocks
- name resolution when multiple definitions exist (for example `_::_` and `[]` seem to work, as they are constructors (?); but other names, like `_++_`, don't)
- the standard library is hard to navigate and some parts look rather ad-hoc
- nice to have the `Word64` type, but operations are incomplete / missing
- speed (or the lack of it)