Things we want to improve in Nova and the Nova-based ecosystem.
Context: these are issues that came out of work at the ZK Vietnam Residency, specifically towards a ZK VM design (https://hackmd.io/0bcDwP5QQp-eiAMVEulH7Q) and assessing the feasibility of a different architecture vs. e.g. the current Halo2-based ZKEVM CE.
Audience: (i) ourselves when we keep working on it (ii) upstream people to understand pain points (iii) Zuzalu hackathon in mid-April (iv) Wider community.
Upstream repo: https://github.com/microsoft/nova
PSE fork: https://github.com/privacy-scaling-explorations/nova
Note that the first clear thing is that now, for each step (
We should explore a bit more the performance of the
Also, it's important to highlight that the folds should be big: as big as possible, since the advantage of Nova mostly comes from not needing to compute FFTs. Hence, small
`rayon`
. The idea for the future is that instead of having multiple threads computing
This would definitely decrease the memory consumption significantly, as multithreading would be used within the `Parallel Nova` implementation. But it might be better to first do an exhaustive analysis of the proposed PoC solution and determine its correctness as well as its performance implications. This basically means that we need to address all the points marked above before we polish and upstream the PoC done during the Vietnam Residency.
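The memory trade-off between the two threading strategies can be sketched with plain `std::thread` (the actual PoC uses `rayon`); `fold_step`, `folds_across_threads`, and `threads_within_fold` are hypothetical stand-ins for illustration, not Nova's API:

```rust
use std::thread;

// Hypothetical stand-in for the per-fold heavy work (e.g. an MSM).
fn fold_step(points: &[u64]) -> u64 {
    points.iter().sum()
}

// Strategy A (current PoC): one thread per fold, so every fold's working
// set is resident in memory at the same time.
fn folds_across_threads(folds: &[Vec<u64>]) -> Vec<u64> {
    thread::scope(|s| {
        let handles: Vec<_> = folds
            .iter()
            .map(|f| s.spawn(move || fold_step(f)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

// Strategy B (proposed): process folds one at a time, parallelising
// *inside* each fold (here: a chunked sum), so only one fold's data
// needs to be resident at once.
fn threads_within_fold(fold: &[u64], n_threads: usize) -> u64 {
    let chunk_size = (fold.len() / n_threads.max(1)).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = fold
            .chunks(chunk_size)
            .map(|c| s.spawn(move || fold_step(c)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```

Both strategies compute the same results; only the number of simultaneously live working sets changes.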
Some exploration was done regarding taking advantage of the `Neptune-cuda` and `pasta-msm` features. The issue was that the AWS servers we had access to either did not have GPUs or had impossible-to-configure ones. Hence, we weren't able to test the performance of the GPU backend on powerful GPU cards.
The intuition is that if we have big-enough folds with several private and public inputs, the speedup should be considerable, especially since we know that the only heavy operations we perform are multiscalar multiplications and hashing.
A big leftover from this Residency that we would love to see happen is a set of benchmarks that gives serious intuition about the expected performance gains when these features are used.
The APIs of Nova and Nova-Scotia are a bit painful to work with.
There are a lot of improvements that we could do so that the parallel solution is easier to implement:
Make the APIs of Nova and Nova-Scotia easier to use by providing more abstractions & automation capabilities.
https://github.com/privacy-scaling-explorations/Nova/blob/parallel_prover_bench/src/lib.rs#L191-L200 is an example of all the inputs that need to be provided. It would be much better if we could have a structure that wraps them up and provides constructors and other helpful methods.
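A hypothetical sketch of such a wrapper, assuming made-up field names (`z0_primary`, `z0_secondary`) as placeholders rather than Nova's real parameter list:

```rust
// Hypothetical wrapper bundling the long argument list currently passed
// at the call site linked above. Fields are illustrative placeholders.
struct ProveStepArgs {
    z0_primary: Vec<u64>,
    z0_secondary: Vec<u64>,
    // ...the remaining parameters would live here too.
}

impl ProveStepArgs {
    // Constructor applying sensible defaults, so callers only supply
    // what actually varies between steps.
    fn new(z0_primary: Vec<u64>) -> Self {
        Self {
            z0_primary,
            z0_secondary: vec![0],
        }
    }

    // Builder-style override for the non-default cases.
    fn with_secondary(mut self, z0_secondary: Vec<u64>) -> Self {
        self.z0_secondary = z0_secondary;
        self
    }
}
```

With a builder like this, the call site shrinks from roughly ten positional arguments to one struct, and new optional parameters can be added without breaking every caller.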
This is especially useful for the parallel case, where we created the `FoldInput` struct to manage the parallel witnessing. See: https://github.com/privacy-scaling-explorations/Nova/blob/parallel_prover_bench/src/parallel_prover.rs#L652-L656.
We believe this can be significantly improved so that all the public inputs and outputs can be generated prior to the IVC accumulation/folding, as well as allowing for easier interfaces for witness generation. See a first attempt in this direction here: https://github.com/privacy-scaling-explorations/Nova-Scotia/blob/parallel_nova/src/lib.rs#L93-L172
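As a rough illustration of the idea, a simplified `FoldInput`-style struct plus a pre-computation of the chained public IO; the types and the toy step function below are placeholders, not the PoC's actual definitions:

```rust
use std::collections::HashMap;

// Simplified sketch of the FoldInput idea: each fold carries its public
// inputs together with its private witness (keyed by signal name), so
// everything can be prepared before the IVC accumulation starts.
#[derive(Clone)]
struct FoldInput {
    public_inputs: Vec<u64>,
    witness: HashMap<String, Vec<u64>>,
}

// Pre-compute the chained public IO for every step up front: step i's
// public output is step i+1's public input. The transition here is a
// toy placeholder for the real step circuit.
fn precompute_public_io(z0: u64, steps: usize) -> Vec<u64> {
    let mut io = Vec::with_capacity(steps + 1);
    let mut z = z0;
    io.push(z);
    for _ in 0..steps {
        z = z * 2 + 1; // placeholder step function
        io.push(z);
    }
    io
}
```

Because the chain of public IO values is known before folding begins, every `FoldInput` can be fully constructed (and its witness generated) independently, which is exactly what the parallel case needs.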
Currently the PoC is focused on implementing `Parallel Nova`, not on API ergonomics.
We should try to unify the API for the parallel and regular cases, and let Nova do the re-arranging work over the folds behind the scenes.
This means that the API is the same (or as close as possible) and we don't worry about providing public inputs/outputs.
We can also consider a parallel feature-flag so that writing the same circuit allows us to
One of the later things to pay attention to would be the development of an FPGA-based prover. Combined with the fact that we don't do FFTs, this would imply a significant performance boost for the protocol implementation.
It is also easy to implement with a feature flag.
We find it useful to make some changes to the API so that it already accounts for receiving all of the `FoldInput`s, and we let Nova internally handle the ordering and the checks of the folds depending on the feature flag used.
See: https://github.com/privacy-scaling-explorations/Nova/blob/parallel_prover_bench/src/parallel_prover.rs#L652-L656 and https://github.com/privacy-scaling-explorations/Nova/blob/parallel_prover_bench/src/parallel_prover.rs#L658-L716
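A minimal sketch of what such a unified entry point could look like, with a `parallel` feature flag toggling the strategy internally; `Fold` and `prove_all` are hypothetical names, not the PoC's API:

```rust
// Hypothetical unified entry point: the caller hands over all folds in
// any order, and the library (not the caller) restores the ordering and
// picks the folding strategy based on the feature flag.
struct Fold {
    index: usize,
    data: Vec<u64>,
}

fn prove_all(mut folds: Vec<Fold>) -> Vec<usize> {
    // Nova internally re-arranges the folds into accumulation order.
    folds.sort_by_key(|f| f.index);

    #[cfg(feature = "parallel")]
    {
        // Tree-style parallel folding would run here.
    }
    #[cfg(not(feature = "parallel"))]
    {
        // Sequential IVC folding would run here.
    }

    // Return the order actually proven, for illustration.
    folds.iter().map(|f| f.index).collect()
}
```

The point is that the caller-facing signature is identical in both modes; only the `cfg`-gated internals differ.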
We've also experimented with using both sides of the curve cycle so that we can double the amount of work we do at the same time. This means that on both `pallas` and `vesta` we perform useful work, instead of just verifying fold-accumulation correctness on one side and doing the useful program-logic work on the other.
See:
`F: PrimeField` and `G: Group`, when the ideal scenario, to avoid trait-madness, is to just use `G: Group` and invoke `G::Scalar` whenever we need something that implements `PrimeField`. We could have `From<F: PrimeField>` or something similar and just use it instead of the current conversion madness. `Nova-Scotia` shouldn't need to worry at all about these things. See https://github.com/privacy-scaling-explorations/Nova-Scotia/blob/parallel_nova/src/lib.rs#L51-L65 as an example.
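The single-generic-parameter idea can be shown with minimal stand-in traits (these are simplified placeholders, not the real `ff`/`group` crate traits):

```rust
// Minimal stand-ins for the real traits, just to show the shape.
trait PrimeField {
    fn from_u64(v: u64) -> Self;
}

trait Group {
    type Scalar: PrimeField;
}

// Toy implementations (placeholders for a real curve like pallas).
struct Fp(u64);
impl PrimeField for Fp {
    fn from_u64(v: u64) -> Self {
        Fp(v)
    }
}
struct Pallas;
impl Group for Pallas {
    type Scalar = Fp;
}

// Before: fn encode<G: Group, F: PrimeField>(...) forces callers to keep
// two generic parameters consistent. After: a single parameter, and the
// field is reached through the associated type.
fn encode<G: Group>(v: u64) -> G::Scalar {
    G::Scalar::from_u64(v)
}
```

Callers then write `encode::<Pallas>(x)` and never mention the field type at all, which is the "no trait-madness" goal described above.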
No type aliases and make everything single-trait-based. See: this
The type aliases `G1`, `G2`, `F1`, `F2` make the library extremely confusing. We should delete them and make everything trait-based everywhere.
Find better abstractions for `CircomInput`, which is hard to work with. See: https://github.com/privacy-scaling-explorations/Nova-Scotia/blob/parallel_nova/src/lib.rs#L121-L124
It would also be nice to be able to abstract this from the user.
O(N*(C*L)) to O(N*(C+L)) complexity (https://hackmd.io/0bcDwP5QQp-eiAMVEulH7Q#SuperNova-VM).
Improve Nova benchmarks. See current results. Specifically, we want to (i) ensure correctness (verification checks etc.), (ii) clean up the tests and make them easier to run (less manual editing, use criterion), (iii) upstream to more standardized benchmark efforts (zk-bench.org or the Celer Network benches).
The Bellperson SHA-256 benchmark from Celer Network appears to be more performant. We should reproduce this and understand why there's a difference.
ZK VM spec: https://hackmd.io/0bcDwP5QQp-eiAMVEulH7Q
(…other code bases? like Halo2 support, plonkish…)