# Towards a Nova-based ZK VM
---
## A really quick summary of our journey
<img src="https://i.imgur.com/1XYtlno.png" width="400px">
Note:
- "Unlimited" parallelization (Not true, but sounds cool)
- No FFTs!
- There's already an implementation that we could bench
https://github.com/microsoft/Nova
---
## Motivations
😕😕We wanted to do ZKWASM!! Why did we end up here?😕😕
- Alternative designs to ZKEVM
- New proving systems sounded really interesting
- WASM doesn't have gas -> Unbounded computations
- Aggregation would be a nightmare.
---
## Goals
- Be able to sketch out memory and program-execution consistency.
- Figure out if `Parallel Supernova` was feasible.
- Be able to implement a Parallel-ready version of Nova in Rust.
- Benchmark the overall solution to see how it compares to other state-of-the-art proving systems available to developers.
---
## The Parallelization Promised Land
* Since the IVC process separates the folding into many small steps, can we allocate different threads, or even different machines, to subsections of the proof?
* This would give us almost unlimited parallelization capacity: an arbitrarily long computation could be split across an arbitrary number of machines.
Unfortunately, Nova can't support this out of the box.
---
## Curve Cycling and Sequential Nova
**Folding != Proving**
* Each folding step only guarantees that, if it is performed correctly and is given two satisfiable relaxed R1CS instances, it produces a satisfiable relaxed R1CS instance.
* Therefore we must prove that every fold was done correctly, so that validity of the final instance implies validity of all the folded steps.
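For reference, one NIFS folding step combines two committed relaxed R1CS instances $(\bar E_1, u_1, \bar W_1, x_1)$ and $(\bar E_2, u_2, \bar W_2, x_2)$ using a verifier challenge $r$ and a commitment $\bar T$ to the cross term (roughly, in the Nova paper's notation):
$$
\bar E \gets \bar E_1 + r\,\bar T + r^2\,\bar E_2, \qquad
u \gets u_1 + r\,u_2, \qquad
\bar W \gets \bar W_1 + r\,\bar W_2, \qquad
x \gets x_1 + r\,x_2
$$
Nothing in this step by itself shows it was carried out honestly; that is exactly what the F' circuit has to verify.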
---
**But folding is a wrong-field operation: it directly manipulates points on the elliptic curve, whose coordinates live in the curve's base field rather than the scalar field the circuit works over**
* Therefore, to be efficient, we have to use curve cycling
* We run two Nova folding chains and two F' circuits alongside each other, where each chain's circuit verifies its own F and the folding of **the other** Nova chain
---
## How sequential Nova looks in Practice
```graphviz
digraph structs {
node [shape=box];
newrank=true;
style=dashed;
compound=true;
subgraph cluster1 {
label="Primary curve"
subgraph some {
rank = same
U₃
U₂
U₁
U₀
ubrick2 [style=invis];
ubrick1 [style=invis];
ubrick0 [style=invis];
}
subgraph some1 {
rank = same
u₃
u₂
u₁
u₀
ufold2 [shape=circle, label="Fold"]
ufold1 [shape=circle, label="Fold"]
ufold0 [shape=circle, label="Fold"]
}
U₃ -> u₃ [style=invis, weight=100];
U₂ -> u₂ [style=invis, weight=100];
U₁ -> u₁ [style=invis, weight=100];
U₀ -> u₀ [style=invis, weight=100];
u₃ -> ubrick2 [style=invis, weight=100];
u₂ -> ubrick1 [style=invis, weight=100];
u₁ -> ubrick0 [style=invis, weight=100];
ubrick2 -> u₂ [style=invis, weight=100];
ubrick1 -> u₁ [style=invis, weight=100];
ubrick0 -> u₀ [style=invis, weight=100];
ubrick2 -> ufold2 [style=invis, weight=100];
ubrick1 -> ufold1 [style=invis, weight=100];
ubrick0 -> ufold0 [style=invis, weight=100];
{U₂, u₂} -> ufold2;
ufold2 -> U₃;
{U₁, u₁} -> ufold1;
ufold1 -> U₂ ;
{U₀, u₀} -> ufold0;
ufold0 -> U₁;
}
subgraph cluster2 {
rankdir="BT";
label="Secondary curve"
subgraph some1 {
rank = same
r₃
r₂
r₁
r₀
rfold2 [shape=circle, label="Fold"]
rfold1 [shape=circle, label="Fold"]
rfold0 [shape=circle, label="Fold"]
}
subgraph some {
rank = same
R₃
R₂
R₁
R₀
rbrick2 [style=invis];
rbrick1 [style=invis];
rbrick0 [style=invis];
}
R₃ -> r₃ [style=invis, weight=100];
R₂ -> r₂ [style=invis, weight=100];
R₁ -> r₁ [style=invis, weight=100];
R₀ -> r₀ [style=invis, weight=100];
r₃ -> rbrick2 [style=invis, weight=100];
r₂ -> rbrick1 [style=invis, weight=100];
r₁ -> rbrick0 [style=invis, weight=100];
rbrick2 -> r₂ [style=invis, weight=100];
rbrick1 -> r₁ [style=invis, weight=1000];
rbrick0 -> r₀ [style=invis, weight=1000];
rbrick2 -> rfold2 [style=invis, weight=1000];
rbrick1 -> rfold1 [style=invis, weight=1000];
rbrick0 -> rfold0 [style=invis, weight=1000];
{R₂, r₂} -> rfold2;
rfold2 -> R₃;
{R₁, r₁} -> rfold1;
rfold1 -> R₂ ;
{R₀, r₀} -> rfold0;
rfold0 -> R₁;
}
u₃ -> r₃ [style=invis, weight=1000];
u₂ -> r₂ [style=invis, weight=1000];
u₁ -> r₁ [style=invis, weight=1000];
u₀ -> r₀ [style=invis, weight=1000];
u₁ -> rfold0 [color=green];
u₂ -> rfold1 [color=green];
u₃ -> rfold2 [color=green];
r₁ -> ufold0 [color=green];
r₂ -> ufold1 [color=green];
r₃ -> ufold2 [color=green];
}
```
---
## Parallelizing Nova
* The challenge is that we can't just fold instances and then use the result as a proof: at each fold we have to construct new R1CS instances that validate the folds.
* We do this by curve cycling and arranging the execution like this:
---
## Parallelizing Nova (cont.)
```graphviz
digraph hierarchy {
node [fontname=Monospace,fontsize=10,shape=box]
"F3" -> {"F1"} [dir=back, label=" (0,2)"];
"F3" -> {"F5"} [dir=back, label=" (4,6)"];
"F7" -> {"F9"} [dir=back, label=" (8,10)"];
"F7" -> {"F3"} [dir=back, label=" (0,6)"];
"F1" -> {"F0"} [dir=back,color="green", label=" (0,0)"];
"F1" -> {"F2"} [dir=back,color="green", label=" (2,2)"];
"F5" -> {"F4"} [dir=back, label=" (4,4)"];
"F5" -> {"F6"} [dir=back, label = " (6,6)"];
"F9" -> {"F8"} [dir=back, label=" (8,8)"];
"F9" -> {"F10"} [dir=back, label=" (10,10)"];
F0 [label="F0", color="green"];
F2 [label="F2", color="green"];
F4 [label="F4"];
F6 [label="F6"];
F8 [label="F8"];
F10 [label="F10"];
"(init)" [style=dashed];
"(end)" [style=dashed];
"proof" [dir=blue, style=dashed, label="Proof: \n (z0 => z10) "];
{
rank=same;
F7 -> "proof"
}
{
rank=same;
"(init)" -> "F0" [label="z0"];
"F10" -> "(end)" [label="z10"];
"F8"
}
}
```
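A hedged Rust sketch of how this tree can be reduced layer by layer across worker threads; `Node`, `combine`, and `reduce` are hypothetical placeholders, not the Nova API, and real nodes would also carry the running relaxed instances on both curves plus the circuit attesting each fold:
```rust
// Illustrative only: reduce the leaves of the tree above pairwise, in parallel,
// until a single root node (the final proof claim) remains.
use rayon::prelude::*;

#[derive(Clone)]
struct Node {
    // Range of execution steps (i, j) this node attests to, i.e. z_i -> z_j.
    start: usize,
    end: usize,
}

/// Fold two adjacent subtrees into a node claiming the combined range.
/// The combining node itself executes the F step that sits between the two
/// ranges, which is why `right.start` is two steps after `left.end` in the diagram.
fn combine(left: &Node, right: &Node) -> Node {
    assert_eq!(left.end + 2, right.start);
    Node { start: left.start, end: right.end }
}

/// Reduce a layer of nodes pairwise until a single root remains.
fn reduce(mut layer: Vec<Node>) -> Node {
    while layer.len() > 1 {
        layer = layer
            .par_chunks(2)
            .map(|pair| match pair {
                [l, r] => combine(l, r),
                [single] => single.clone(),
                _ => unreachable!(),
            })
            .collect();
    }
    layer.remove(0)
}

fn main() {
    // Leaves F0, F2, ..., F10 from the diagram, each covering a single step (i, i).
    let leaves: Vec<Node> = (0..=10).step_by(2).map(|i| Node { start: i, end: i }).collect();
    let root = reduce(leaves);
    assert_eq!((root.start, root.end), (0, 10)); // one claim for z0 => z10
}
```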
---
## Parallelizing F' in Depth
Given two nodes $(L, l)$ and $(R, r)$ claiming $(z_i, z_{k-1})$ and $(z_k, z_j)$, we fold all instances together with three foldings, then we create a proof that $F(z_{k-1}) = z_k$ and that the triple folding is correct. The folding on the primary curve is proven by the opposing secondary circuit, which in turn proves our folding of $(L, l)$ and $(R, r)$.
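In symbols (just restating the text above, with the names used in the diagram on the next slide):
$$
L \gets \mathrm{Fold}(L_1, l_1), \qquad
R \gets \mathrm{Fold}(R_1, r_1), \qquad
N \gets \mathrm{Fold}(L, R)
$$
and the new node additionally proves that $F(z_{k-1}) = z_k$ and that these three folds were computed correctly, with the fold-correctness check living on the opposing curve.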
---
## Parallelizing F' in Depth (cont)
```graphviz
digraph structs {
node [shape=box];
newrank=true;
style=dashed;
compound=true;
subgraph cluster1 {
labeljust="l";
labelloc="b";
label="Primary curve"
L₁
l₁
R₁
r₁
R;
R₁ -> R [label=" Right Fold "];
r₁ -> R;
L₁ -> L [label=" Left Fold"];
l₁ -> L;
L -> N [label=" Final Fold"];
R -> N;
}
subgraph cluster2 {
label="Secondary curve"
u
}
u -> N [label = " Proves Fold", style=bold];
u -> R₁ [label = " Input", style=dashed];
u -> r₁ [label = " Input", style=dashed];
u -> L₁ [label = " Input", style=dashed];
u -> l₁ [label = "Input ", style=dashed];
}
```
---
## Parallelization result
Super naive POC in the Microsoft Research Nova Lib:
1. Proves any combination of nodes that are one step apart.
2. Achieves splitting any computation of N fold steps into N proof steps across N/2 processes.
3. Still has bugs and is not optimized yet, but memory use and proving time should end up not much worse than sequential Nova.
---
## Extracting Execution Trace
To prove arbitrary computation we need to extract the execution trace and prove that every step of the computation is correct.
```graphviz
digraph structs {
node [shape=box];
newrank=true;
style=dashed;
compound=true;
subgraph cluster1 {
label="Wasmi's Engine"
subgraph some {
rank = same
e [label="Executor", color=red]
ex [label="Engine Executor"]
ee [label="Engine"]
l [label="Linker"]
}
l -> ee [weight=100];
ee -> ex [weight=100];
ex -> e [weight=100];
}
subgraph cluster2 {
rankdir="BT";
label="Injector"
subgraph some1 {
rank = same
t [label="Trace"]
p [label="Predator"]
}
p -> t [label="1..*"]
}
e -> p [style=dashed];
}
```
We picked `wasmi` as a starting point, but this approach can be applied to any kind of VM.
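As a rough illustration (hypothetical types, not wasmi's real interfaces), the injector boils down to a callback that the instrumented executor invokes once per executed instruction:
```rust
// Hedged sketch: `TraceRecord` and `Tracer` are made-up shapes for the data the
// injector collects from the executor; the real integration hooks into wasmi internals.

/// One row of the execution trace, emitted for every executed instruction.
#[derive(Debug, Clone)]
pub struct TraceRecord {
    pub time: u64,        // global step counter
    pub pc: u32,          // program counter before the step
    pub opcode: u16,      // numeric opcode that was executed
    pub stack_depth: u32, // stack height after the step
}

/// Whatever the instrumented executor can call back into (the "Injector" above).
pub trait Tracer {
    fn on_step(&mut self, record: TraceRecord);
}

/// Simplest possible tracer: keep every record in memory for later proving.
#[derive(Default)]
pub struct VecTracer {
    pub records: Vec<TraceRecord>,
}

impl Tracer for VecTracer {
    fn on_step(&mut self, record: TraceRecord) {
        self.records.push(record);
    }
}
```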
---
## zkMemory Module
You might be aware that the memory can be modelled as a simple state machine with three instructions `INIT`, `READ`, `WRITE` and a configurable `WORD_SIZE` ([details here](https://hackmd.io/vF1jobzsRoubyUASqQG0Zw?both)).
| Address | Time | Instruction | Value |
| ------- | ---- | ----------- | --------- |
| 0x...00 | 1 | INIT | 0x...0000 |
| 0x...00 | 2 | READ | 0x...0000 |
| 0x...00 | 3 | WRITE | 0x...0a20 |
| 0x...20 | 4 | INIT | 0x...0010 |
| 0x...20 | 5 | READ | 0x...0010 |
| 0x... | .. | ... | ... |
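A minimal Rust sketch of that state machine (illustrative names, with a single `u64` word standing in for the configurable `WORD_SIZE`):
```rust
// Toy zkMemory trace builder: every INIT/READ/WRITE appends one row of the table above.
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Instruction {
    Init,
    Read,
    Write,
}

/// One row of the memory trace table: (Address, Time, Instruction, Value).
#[derive(Debug, Clone, Copy)]
pub struct MemoryEntry {
    pub address: u64,
    pub time: u64,
    pub instruction: Instruction,
    pub value: u64,
}

#[derive(Default)]
pub struct MemoryTrace {
    time: u64,
    state: HashMap<u64, u64>,
    pub entries: Vec<MemoryEntry>,
}

impl MemoryTrace {
    fn log(&mut self, address: u64, instruction: Instruction, value: u64) {
        self.time += 1;
        self.entries.push(MemoryEntry { address, time: self.time, instruction, value });
    }

    pub fn init(&mut self, address: u64, value: u64) {
        self.state.insert(address, value);
        self.log(address, Instruction::Init, value);
    }

    pub fn write(&mut self, address: u64, value: u64) {
        self.state.insert(address, value);
        self.log(address, Instruction::Write, value);
    }

    pub fn read(&mut self, address: u64) -> u64 {
        let value = *self.state.get(&address).expect("READ before INIT");
        self.log(address, Instruction::Read, value);
        value
    }
}
```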
---
## Memory consistency with examples
The idea is to commit to the memory state of the VM before and after the $F'$ execution.
With vector commitments we have the following operations:
- Open(Com, Idx) = Value
- Edit(Com, Idx, NewValue) = NewCommitment
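A hypothetical Rust interface for these two operations (not any existing crate's API; a concrete instantiation could be the KZG/Lagrange construction on the next slides):
```rust
/// Illustrative trait for the vector-commitment operations used in the memory argument.
pub trait VectorCommitment {
    type Commitment;
    type Value;
    type OpeningProof;

    /// Open(Com, Idx) = Value: reveal position `idx` together with an opening proof.
    fn open(&self, com: &Self::Commitment, idx: usize) -> (Self::Value, Self::OpeningProof);

    /// Check an opening produced by `open`.
    fn verify(&self, com: &Self::Commitment, idx: usize, value: &Self::Value, proof: &Self::OpeningProof) -> bool;

    /// Edit(Com, Idx, NewValue) = NewCommitment: update one position, get the new commitment.
    fn edit(&self, com: &Self::Commitment, idx: usize, new_value: &Self::Value) -> Self::Commitment;
}
```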
Note:
So we can open all the positions we want and prove inside the R1CS that they are equal to the witness values we will then operate on.
One challenge with a VM is that you need to make sure your memory is consistent.
---
## Vector commitments
Remember that a vector commitment commits to a vector $a_0,…,a_{n−1}$ and lets you prove that you committed to $a_i$ for some $i$.
$$
p(X) = \sum_{i=0}^{i_{max}}Mem_i \prod_{\substack{j=0 \\ j \neq i}}^{i_{max}} {X-j\over i-j}
$$
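A toy Rust evaluation of this polynomial over a small prime field, just to make the formula concrete (a real deployment works over the commitment scheme's scalar field):
```rust
// Naive Lagrange evaluation of p(X) = sum_i Mem_i * prod_{j != i} (X - j)/(i - j) mod P.
const P: u64 = 65_537; // toy prime modulus

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

fn inv(x: u64) -> u64 {
    pow_mod(x, P - 2) // Fermat's little theorem
}

fn eval_lagrange(mem: &[u64], x: u64) -> u64 {
    let x = x % P;
    let mut acc = 0;
    for (i, &m) in mem.iter().enumerate() {
        let mut term = m % P;
        for j in 0..mem.len() {
            if i == j {
                continue;
            }
            let num = (x + P - j as u64) % P;
            let den = (i as u64 + P - j as u64) % P;
            term = term * num % P * inv(den) % P;
        }
        acc = (acc + term) % P;
    }
    acc
}

fn main() {
    let mem = [7, 11, 13, 17];
    // p(i) = Mem_i for every index i, which is exactly what the commitment needs.
    for (i, &m) in mem.iter().enumerate() {
        assert_eq!(eval_lagrange(&mem, i as u64), m);
    }
}
```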
Note:
We can reproduce this using the Kate commitment scheme: Let $p(X)$ be the polynomial that for all $i$ evaluates as $p(i)=a_i$.
This completely matches the behaviour we need where we will commit to all the Memory positions for example, and we need to prove inside of the circuit that we are providing the witnesses that satisfy the openings at particular places of the memory.
As per https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html, we know there is such a polynomial, and we can, for example, compute it using Lagrange interpolation.
---
## MUL opcode example
$$
C_M = \sum_{i=0}^{MaxMemPos}Mem_i \prod_{\substack{j=0 \\ j \neq i}}^{MaxMemPos} {X-j\over i-j} \\
C_S = \sum_{i=0}^{MaxStackPos}Stack_i \prod_{\substack{j=0 \\ j \neq i}}^{MaxStackPos} {X-j\over i-j}
$$
---
## MUL opcode example
$$
\mathbb{Opcode}_{MUL} \gets a \times b = c
$$
$$
Fetch(Mem(idx_a)) \gets \mathbb O(C_M, idx_a) = a\\
Fetch(Mem(idx_b)) \gets \mathbb O(C_M, idx_b) = b\\
Set(Mem(idx_{c+1})) \gets \mathbb E(C_{M+1}, c_{+1}, idx_{c+1})\\
$$
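A runnable toy model of these checks (the "commitment" below is just the memory vector itself, so the logic can be exercised end to end; in the real scheme `open`/`edit` are the vector-commitment operations $\mathbb O$ and $\mathbb E$ above):
```rust
// Toy model of the F' checks for one MUL step: operands open against the pre-state
// commitment, the result opens against the post-state commitment, and a * b = c holds.
type Com = Vec<u64>; // stand-in for C_M

fn open(com: &Com, idx: usize) -> u64 {
    com[idx]
}

fn edit(com: &Com, idx: usize, new_value: u64) -> Com {
    let mut next = com.clone();
    next[idx] = new_value;
    next
}

fn check_mul_step(c_m: &Com, c_m_next: &Com, idx_a: usize, idx_b: usize, idx_c: usize) -> bool {
    let a = open(c_m, idx_a);
    let b = open(c_m, idx_b);
    let c = open(c_m_next, idx_c);
    // In the circuit this is the R1CS constraint a * b = c plus the commitment-update check.
    a.wrapping_mul(b) == c && *c_m_next == edit(c_m, idx_c, c)
}

fn main() {
    let c_m = vec![3, 5, 0];          // a = 3 at idx_a = 0, b = 5 at idx_b = 1
    let c_m_next = edit(&c_m, 2, 15); // c = 15 written at idx_c = 2
    assert!(check_mul_step(&c_m, &c_m_next, 0, 1, 2));
}
```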
Note:
**Note that $c$ is obtained from the commitment of the memory at the end of the step ($C_{M+1}$)**
---
## MUL opcode example
$$
Set(Stack(a)) \gets \mathbb E(C_{S+1}, a, idx_{a+1})\\
Set(Stack(b)) \gets \mathbb E(C_{S+1}, b, idx_{b+1})\\
Set(Stack(c_{+1})) \gets \mathbb E(C_{S+1}, c_{+1}, idx_{c+1})\\
$$
$$
R1CS(a \times b = c_{+1})
$$
Note:
We now check the Stack consistency by checking that $a$, $b$ and $c$ are at the correct positions in the Stack at the end.
Now we apply a constraint in R1CS which checks the correctness of the MUL execution.
---
## MUL opcode example
:::success
- The folds require that the `Public Inputs` of the current fold are the `Public Outputs` of the previous one. In that way, at the very top of the aggregation/folding tree we are sure all the memory is consistent.
- We can also account for program-counter correctness with this technique, and indeed for most of the things we have been able to imagine related to VM execution traces.
:::
Note:
We have just proven the correctness of the memory handling for one Opcode. This has 2 important considerations:
---
## Benchmarks
- Benchmarks to understand Nova performance
- Recursively hashing SHA256 $k$ times:
- $h(h(h(x)))$
- Nova, Circom (Groth16) and Halo2 (KZG)
- Rationale: Similar to [zk-benchmark](https://github.com/celer-network/zk-benchmark) but recursive to take advantage of Nova+IVC
- Also many hashes in a single fold
- Disclaimer - preliminary, take with grain of salt
Note:
celer-network sha256 standard
30k constraints sha256
some shortcuts so might not be 100% complete
---
## Many hashes in a single fold
$\underbrace{h(h(\dots h(x)))}_{d \text{ times}} \rightarrow \underbrace{h(h(\dots h(\cdot)))}_{d \text{ times}} \rightarrow \dots$
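As plain Rust (using the `sha2` crate; inside Nova this relation is of course expressed as an R1CS circuit rather than native code), the per-fold step function is simply:
```rust
// Apply SHA-256 d times per folding step; k recursive hashes become k/d folds.
use sha2::{Digest, Sha256};

fn step(z: [u8; 32], d: usize) -> [u8; 32] {
    let mut state = z;
    for _ in 0..d {
        state = Sha256::digest(state).into();
    }
    state
}

fn main() {
    let z0 = [0u8; 32];
    // k = 100 recursive hashes, arranged as 10 folds with d = 10 hashes each.
    let via_folds = (0..10).fold(z0, |z, _| step(z, 10));
    assert_eq!(via_folds, step(z0, 100));
}
```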
---
## Proving systems
| Framework | Arithmetization | Algorithm | Curve |
|------------------|-----------------|-----------|--------|
| Circom (snarkjs) | R1CS | Groth16 | BN254 |
| Nova (seq) | Relaxed R1CS | Nova | Pasta |
| Nova (par PoC) | Relaxed R1CS | Nova | Pasta |
| Halo2 | Plonkish | KZG | BN254 |
---
## Prover time on powerful laptop
*Hardware: MBP M1 Max 2021, 64GB memory*
| k | Circom | Nova total d=1 | Nova step-sum d=1 | Halo 2 (KZG) |
|-------|--------|------------------|---------------------|--------------|
| 1 | 0.3s | 0.2s | 0.1s | 0.8s |
| 10 | 7.3s | 2.4s | 1.2s | 0.8s |
| 100 | 62s | 24s | 12.5s | 1.6s |
| 1000 | - | 240s | 125s | 25s |
Note:
Total includes witness gen and some other overhead, calling from Nova Scotia
Step sum is sum of all individual folding steps
Not including SNARK verification done at end
Will vary d parameter next slide
---
## Prover time on powerful server
*Hardware: Server with 72 cores and ~350GB RAM*
| k | Nova d=1 | Nova d=10 | Nova d=100 | Nova d=100 par | Halo 2 (KZG) |
|---------|----------------|-----------|-------------|----------------|--------------|
| 100 | 19s | 3.6s | - | - | 2.5s |
| 1000 | 190s | 36s | 26.5s | 28.5s | 41.6s |
| 10000 | 1900s | 360s | **265s** | **226s** | 389.1s |
| 100000 | 19000s | | | | ? |
Note:
Not apples-to-apples, Circom in-circuit, Halo2 lookup table, but also standard op
We see Nova faster than Halo2 both for seq and par for d=100, par has bugs
---
## Memory usage and SRS
*Hardware: Server with 72 cores and ~350GB RAM*
| k | Nova (seq) d=1 | Halo 2 (KZG) | Nova (par PoC) |
|---------|-----------------|--------------|-------------|
| 100 | 1.6GB | 3.7GB | 9GB |
| 1000 | 1.6GB | 32GB | 244GB |
| 10000 | 1.6GB | 245GB | OOM |
| 100000 | 1.6GB | ? | ? |
Note:
Circom k=100 3m constraints, ptau23
Halo2 plonkish better, 2^25 for k=10k
Nova seq has constant memory overhead; par should too but currently has a bug
---
## Benchmark conclusion
1. Nova is memory-efficient.
2. R1CS vs Plonkish matters a lot.
3. Consider constraints vs recursion overhead.
4. Large circuits: Nova seems faster than Halo2.
Note:
Mem matters, Circom and Halo2 req big SRS OOM, low-mem env
Plonkish lookup table prover time for Halo2 w d=1, motivates alt folding like Sangria
SHA256 d=1 no constraints 30k not big vs recursive overhead 10k to see perf gain, for d=10,100 see better perf - suggest larger folding
Suboptimal par see ~75% improvement over Halo2 for 10k rec hashes
Check link for more/future
---
## Conclusion and next steps
- Nova new but promising for ZKVM etc
- Nova wish list/next steps
- Parallel, GPU, API, ...
- Nova, Nova Scotia, Supernova, Nova Bench
- Contribute to [pse/zkvm-ideas](https://github.com/privacy-scaling-explorations/zkvm-ideas) repo
- Join Nova track at Zuzalu hackathon
Note:
New and promising, great opportunity
Things to improve in Nova-based ecosystem, fix par eg also maybe FPGA
Work on this and other stuff at Nova track Zuzalu
---
## Links and Q&A
- [zkresear.ch post](https://zkresear.ch/t/towards-a-nova-based-zk-vm/105)
- [pse/zkvm-ideas](https://github.com/privacy-scaling-explorations/zkvm-ideas)
- [Nova-based ZKVM spec](https://hackmd.io/0bcDwP5QQp-eiAMVEulH7Q?view=)
- [Benchmarks](https://hackmd.io/0gVClQ9IQiSXHYAK0Up9hg?view=)
- [Nova wishlist and next steps](https://hackmd.io/80cPXRc0Sa2MPuYX4Wft8w?view=)
Note:
Thanks! Check out above, zkresear.ch post