In the Zero Knowledge sphere, there exists a plethora of Zero Knowledge Systems and languages: from zk-SNARK languages like Zokrates, Leo, and Vamp-IR, to STARK machines like Triton VM, Miden VM, and RISC Zero. In the current landscape, it is difficult for developers to make an informed decision about which system fits their use case best. This benchmarking endeavor seeks to alleviate that problem by making performance numbers for the various systems readily available and by providing an easy way to compare the same program across languages and systems.
Currently, the number of backends and programs being compared is limited. However, community contributions should amplify the value provided by the benchmarking project.
In particular, the benchmarking results and programs will be most useful to five kinds of individuals:
1. Users who are building an application with a Zero Knowledge System.
2. Compiler writers who wish to target a Zero Knowledge System.
3. Users who have a choice of the Zero Knowledge System in their application.
4. Zero Knowledge System implementers who wish to know optimization points.
5. Performance-oriented users who wish to see optimal code in their desired system.

Individuals 4 and 5 can use these results in a straightforward manner: seeing what to improve in their programs and compilers, and having a comparison point if any results seem suspect.

Individuals 1 through 3 all have the system as a free variable and can pick the language/system that best fits their problem domain. It is important to note that, for these individuals, the benchmarking results are not the full story when making decisions; the benchmark results should be taken as one data point in a fuller analysis:
These are all equally important questions, and no singular system wins in every single category. For example, RISC Zero may be slower than another STARK system like Miden; however, it offers your users the ability to use a language they are more familiar with (Rust, C, etc.) with little effort, while Miden has the advantage of being faster but requires writing tooling or using a language your users are unlikely to be familiar with.
With these caveats made apparent, we can now discuss the data collection methodology.
Most Zero Knowledge Systems are implemented in Rust. This gives the benchmarking initiative an easy through line for implementing a general framework. In particular, we define the `ZeroKnowledge` trait within Rust to act as our API across any Zero Knowledge System. Future work will incorporate systems/languages that do not expose a Rust API into the benchmark.
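A minimal sketch of what such a trait can look like is given below; the associated types and exact signatures are illustrative assumptions rather than the precise definitions in the benchmark repository:

```rust
/// Illustrative sketch of a cross-system benchmarking API. The associated
/// types and exact signatures are assumptions for exposition and may differ
/// from the trait in the benchmark repository.
pub trait ZeroKnowledge {
    /// The compiled circuit/program for the given system.
    type C;
    /// The proof (or receipt) produced by the prover.
    type R;

    /// Names the circuit for data-reporting purposes.
    fn name(&self) -> String;

    /// Compiles the source program into the system's circuit representation.
    fn compile(&self) -> Self::C;

    /// Runs the prover over the compiled circuit, producing a proof.
    fn prove(&self, circuit: &mut Self::C) -> Self::R;

    /// Runs the verifier on the produced proof.
    fn verify(&self, proof: Self::R, circuit: &mut Self::C);

    /// Runs the full pipeline: compile, prove, then verify.
    fn prove_and_verify(&self) {
        let mut circuit = self.compile();
        let proof = self.prove(&mut circuit);
        self.verify(proof, &mut circuit);
    }
}
```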
The `compile`, `prove`, and `verify` functions are responsible for compiling the circuit, running the prover, and running the verifier, respectively, with `prove_and_verify` running all of these steps in sequence. The `name` function is there to name the circuit for data-reporting reasons.

The `prove` and `verify` functions, in particular, are where most of the benchmarking interest lies. Since every Zero Knowledge proof system offers some kind of prover and verifier, this interface is easy to implement for any such system.
For the benchmarking aspect in Rust, we have decided to go with the criterion micro-benchmarker. We use criterion to aggregate the average times of each interface function (`compile`, `prove`, `verify`, and `prove_and_verify`). For the sake of speed, we collect only twenty samples each, with the default benchmarking settings otherwise.
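As a sketch of how this can be wired up, a criterion harness over the trait above could look like the following; the helper name `bench_zk` and the exact grouping are assumptions rather than the benchmark suite's actual organization:

```rust
use criterion::{BatchSize, Criterion};

// Hypothetical helper: benchmark the interface functions of any ZeroKnowledge
// implementor, using only twenty samples per benchmark for speed.
fn bench_zk<Z: ZeroKnowledge>(c: &mut Criterion, zk: &Z) {
    let mut group = c.benchmark_group(zk.name());
    group.sample_size(20);

    group.bench_function("compile", |b| b.iter(|| zk.compile()));

    group.bench_function("prove", |b| {
        // Compile outside the measured routine so only proving is timed.
        b.iter_batched(
            || zk.compile(),
            |mut circuit| zk.prove(&mut circuit),
            BatchSize::SmallInput,
        )
    });

    group.bench_function("verify", |b| {
        // Produce a fresh proof in the setup so only verification is timed.
        b.iter_batched(
            || {
                let mut circuit = zk.compile();
                let proof = zk.prove(&mut circuit);
                (proof, circuit)
            },
            |(proof, mut circuit)| zk.verify(proof, &mut circuit),
            BatchSize::SmallInput,
        )
    });

    group.bench_function("prove_and_verify", |b| b.iter(|| zk.prove_and_verify()));

    group.finish();
}
```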
As alluded to earlier, in analyzing the data it is important to understand the capabilities of the machinery of each system. For example, we can break the base machinery down into a few broad categories:

- ZK-SNARKs
- Recursive ZK-SNARKs
- ZK-STARKs
Even within these categories, great differences can be seen. For example, within zk-SNARKs there is a divide between the R1CS style of systems and the more Plonkish style of systems. Typically, Plonk-style circuits are universal (due to the polynomials making up the circuit being of a fixed degree), while many R1CS-style circuits are not universal (and thus require a setup for every new circuit).
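For intuition, the standard shapes of the two constraint styles are shown below. This is generic textbook notation, not taken from any particular backend in the benchmark:

```latex
% R1CS: the i-th constraint multiplies two linear combinations of the witness z
(A_i \cdot z)\,(B_i \cdot z) = C_i \cdot z

% Plonkish: every gate has the same fixed low-degree shape; the selector values
% q_L, q_R, q_O, q_M, q_C determine what the gate computes over wires a, b, c
q_L \, a + q_R \, b + q_O \, c + q_M \, a b + q_C = 0
```

Roughly speaking, because every Plonkish gate has the same fixed shape, the setup can be reused across circuits (up to a size bound), whereas the R1CS matrices A, B, and C change with every circuit.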
With that said, this categorization is useful for the rough properties we can surmise. A good comparison table can be found in the awesome zero knowledge proofs repository. Recursive ZK-SNARKs are not included in the table; for analysis purposes, one can lump them in with ZK-SNARKs. However, where the difference comes in is the expressivity of each of the systems:
For non-recursive ZK-SNARKs, doing something as basic as a conditional requires executing both the then branch (3 + 5) and the else branch (5 + 8). This is in contrast to both ZK-STARKs and recursive ZK-SNARKs, which allow execution of only one branch [1]. It is due to this difference in expressivity that projects like the STARK machines mentioned above (Triton VM, Miden VM, and RISC Zero) can all be Turing complete. The kind of conditional in question is sketched below.
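The snippet is hypothetical, written only to match the arithmetic above (3 + 5 and 5 + 8); it is not a program from the benchmark suite:

```rust
// Hypothetical conditional: a non-recursive ZK-SNARK circuit must encode both
// arms (3 + 5 and 5 + 8) and then select one result, while a ZK-STARK or
// recursive ZK-SNARK machine can execute only the arm that is actually taken.
fn branch_example(condition: bool) -> u32 {
    if condition {
        3 + 5
    } else {
        5 + 8
    }
}
```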
Another point to consider when comparing data is that many of these projects sit at different levels of abstraction. For example, we may have benchmark results for RISC0 code and Miden code. The RISC0 code is generated from Rust; that is to say, we have taken a von Neumann programming language, compiled it down to the RISC-V instruction set, and then converted it into the underlying ZK-STARK circuit. The Miden code, by contrast, may have been written directly in Miden, meaning that the code only has to be translated from Miden's stack representation to the underlying ZK-STARK representation. Thus, the Miden code used for benchmarking will be closer to the underlying ZK-STARK representation and will likely have better performance characteristics.
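To make this abstraction gap concrete, here is a rough sketch of what a RISC0 guest program for a Fibonacci benchmark could look like. The module paths and macros used here (`risc0_zkvm::guest::env`, `entry!`) vary between risc0 releases, so treat this as an illustration of "ordinary Rust compiled to RISC-V" rather than the exact benchmark code:

```rust
// Illustrative RISC0 guest program: plain Rust that is compiled to RISC-V,
// whose execution trace is then proven by the underlying ZK-STARK.
#![no_main]
#![no_std]

use risc0_zkvm::guest::env;

risc0_zkvm::guest::entry!(main);

pub fn main() {
    // Read the iteration count from the host.
    let n: u32 = env::read();

    // Ordinary iterative Fibonacci, written as if for a normal CPU.
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a.wrapping_add(b);
        a = b;
        b = next;
    }

    // Commit the result to the journal so the verifier can read it.
    env::commit(&a);
}
```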
Further, machines like Miden and Triton have static call graphs (MAST for both Miden and Triton), which means that all the jumps/branches on these machines must be known statically. This results in machines with various limitations when building on top of them. Due to these limitations, and also to make maintaining multiple non-trivial programs easier, we have also provided minimal languages on top of the assembly languages. Compiler writers (audience 2) should be interested in these, to see what kinds of languages are easy to build on these machines.
For those interested in these minimal languages, see:

- `src/programs.lisp` to see the program generators
- `src/programs.lisp` to see some programs

For audience 2, it may be better to look at issues 16 and 17 instead.

This article is published under Anoma's research publications, as such research is useful to the Anoma project. In fact, Anoma makes up all five of the interested individuals:
- Users who are building an application with a Zero Knowledge System.
Anoma, through the Taiga protocol, will offer an interface for evaluating validity predicates (hereafter referred to as VPs) in a private circuit.
- Compiler writers who wish to target a Zero Knowledge System.
Anoma as a project has many compiler efforts that wish to compile to a zero knowledge circuit: from VAMP-IR, which seeks to be the "LLVM of Zero Knowledge Systems", to GEB, which is a categorical language that seeks to formally verify compilation from languages such as the lambda calculus to polynomial circuits [2].
- Users who have a choice of the Zero Knowledge System in their application.
Anoma is a heterogeneous protocol and thus allows developers to choose the Zero Knowledge System that works best for their use case.
- Zero Knowledge System implementers who wish to know optimization points.
Anoma has created Plonkup, which is a system that combines Plonk and Plookup.
- Performance-oriented users who wish to see optimal code in their desired system.
The aforementioned VPs are run in a system that has gas costs, and so Anoma developers will wish to optimize their circuits.
The data currently collected is at a preliminary stage, so the following caveats should be kept in mind when looking at the data:
The data can be seen in an HTML rendered format: here.
All results are taken from an AMD Ryzen 7 5700X 8-Core @ 16x 3.4GHz CPU.
| Sudoku | Miden | Plonk: 3 by 3 | Risc | Halo: 3 by 3 |
|---|---|---|---|---|
| All Combined | 321.75 ms (✅ 1.00x) | 128.18 ms (🚀 2.51x faster) | 1.39 s (❌ 4.32x slower) | 353.85 ms (✅ 1.10x slower) |
| Compile | 2.92 ms (✅ 1.00x) | 60.78 ms (❌ 20.84x slower) | 726.71 us (🚀 4.01x faster) | 271.36 ms (❌ 93.06x slower) |
| Prove | 313.32 ms (✅ 1.00x) | 64.96 ms (🚀 4.82x faster) | 1.37 s (❌ 4.36x slower) | 84.19 ms (🚀 3.72x faster) |
| Verify | 1.94 ms (✅ 1.00x) | 4.89 ms (❌ 2.52x slower) | 1.78 ms (✅ 1.09x faster) | 3.21 ms (❌ 1.65x slower) |
| Fibonacci | Miden: iter-93 | Miden: fixed-92 | Miden: fixed-50 | Risc0: iter-93 | Risc0: iter-50 | Risc0: fixed-50 | Risc0: fixed-92 |
|---|---|---|---|---|---|---|---|
| All Combined | 317.49 ms (✅ 1.00x) | 151.65 ms (🚀 2.09x faster) | 154.28 ms (🚀 2.06x faster) | 336.80 ms (✅ 1.06x slower) | 333.00 ms (✅ 1.05x slower) | 332.51 ms (✅ 1.05x slower) | 333.55 ms (✅ 1.05x slower) |
| Compile | 64.50 us (✅ 1.00x) | 53.27 us (✅ 1.21x faster) | 41.98 us (✅ 1.54x faster) | 113.36 us (❌ 1.76x slower) | 111.91 us (❌ 1.74x slower) | 114.57 us (❌ 1.78x slower) | 114.30 us (❌ 1.77x slower) |
| Prove | 316.77 ms (✅ 1.00x) | 151.03 ms (🚀 2.10x faster) | 150.90 ms (🚀 2.10x faster) | 334.39 ms (✅ 1.06x slower) | 332.21 ms (✅ 1.05x slower) | 330.52 ms (✅ 1.04x slower) | 332.26 ms (✅ 1.05x slower) |
| Verify | 1.90 ms (✅ 1.00x) | 1.88 ms (✅ 1.01x faster) | 1.88 ms (✅ 1.01x faster) | 1.67 ms (✅ 1.14x faster) | 1.67 ms (✅ 1.14x faster) | 1.65 ms (✅ 1.15x faster) | 1.65 ms (✅ 1.15x faster) |
| Fibonacci | Miden: iter-1000 | Risc0: iter-1000 |
|---|---|---|
| All Combined | 2.76 s (✅ 1.00x) | 2.74 s (✅ 1.00x faster) |
| Compile | 63.84 us (✅ 1.00x) | 108.18 us (❌ 1.69x slower) |
| Prove | 2.72 s (✅ 1.00x) | 2.70 s (✅ 1.01x faster) |
| Verify | 2.11 ms (✅ 1.00x) | 12.08 ms (❌ 5.73x slower) |
This does not imply that it is always cheaper to execute only one branch, as on a von Neumann machine. For simpler branches, it is still more efficient to execute both branches. ↩︎
GEB is much more general, as it seeks to be a categorical compiler pipeline from any language to another, allowing programmers to write code in more flexible ways while also allowing them to give up power for various mathematical properties. ↩︎