Folding Circom circuits: a ZKML case study

Previous work from other groups

Zator: Verified inference of a 512-layer neural network using recursive SNARKs 🐊

Folding model architecture vs inference

Folding model architecture
- Pros: the compression ratio per inference can be made as high as we wish
- Cons: requires models to be designed specifically for folding, which might not be practical for the current web2 industry
Folding model inference
- Pros: no need to modify existing models
- Cons: performance is still limited by the model depth and complexity

Let's try to fold model inference!

In order to use Nova-Scotia to fold Circom circuits:

We write a circuit that takes a list of public inputs (which must be named step_in for the Nova-Scotia interface) and produces the same number of public outputs (named step_out). These public outputs are then routed to the next step of the recursion as its step_in, and this continues until the last recursion iteration. Within a step circuit, besides the public inputs, a Circom circuit can take additional private inputs (with any name or JSON structure that Circom accepts).
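
The routing described above can be sketched outside the circuit as a plain loop. This is only an illustrative Python model of the data flow, not Nova-Scotia's API: `fold`, `toy_step`, and the private input key `x` are hypothetical names chosen for this sketch.

```python
from typing import Callable, Dict, List, Sequence

State = List[int]


def fold(step: Callable[[State, Dict[str, int]], State],
         z0: State,
         private_inputs: Sequence[Dict[str, int]]) -> State:
    """Run `step` once per private input, feeding each step_out into the next step_in."""
    step_in = list(z0)
    for priv in private_inputs:
        step_out = step(step_in, priv)
        # The step circuit must emit as many public outputs as it has public inputs.
        assert len(step_out) == len(step_in)
        step_in = step_out
    return step_in


def toy_step(step_in: State, priv: Dict[str, int]) -> State:
    """Hypothetical step: keep a running sum and a step counter."""
    total, count = step_in
    return [total + priv["x"], count + 1]


final = fold(toy_step, [0, 0], [{"x": 3}, {"x": 4}, {"x": 5}])
```

The public state keeps a fixed width across all steps, while the private inputs may differ at every step.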

Let's try to write out the step_in, step_out, and private inputs at each step!

Scenario 1: Public Data, Private Model

Attempt 1: naive approach

signals	Step 1	Step 2	…	Step N-1	Step N
$step_{in}$	$[modelHash, data_{1}, data_{2}, \ldots, data_{N}, 0, 0, \ldots, 0, 0]$	$step_{out,1}$	…	$step_{out,N-2}$	$step_{out,N-1}$
$step_{out}$	$[modelHash, data_{1}, data_{2}, \ldots, data_{N}, out_{1}, 0, \ldots, 0, 0]$	$[modelHash, data_{1}, data_{2}, \ldots, data_{N}, out_{1}, out_{2}, \ldots, 0, 0]$	…	$[modelHash, data_{1}, data_{2}, \ldots, data_{N}, out_{1}, out_{2}, \ldots, out_{N-1}, 0]$	$[modelHash, data_{1}, data_{2}, \ldots, data_{N}, out_{1}, out_{2}, \ldots, out_{N-1}, out_{N}]$
private inputs	model weights	model weights	…	model weights	model weights

Issues:

- the size of the public outputs (and hence the size of the proof) grows linearly with the number of steps N
- many switchers are needed to insert each output at the correct position
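
A rough Python model of this naive step shows where both issues come from. Everything here is an assumption made for illustration: SHA-256 stands in for a circuit-friendly hash, `toy_model` is a placeholder for real inference, and the write index `i` is passed in as a hypothetical extra input because the table does not say how the circuit learns it.

```python
import hashlib
from typing import List


def h(*vals: int) -> int:
    """SHA-256 stand-in for a circuit-friendly hash (an assumption of this sketch)."""
    data = b"".join(v.to_bytes(32, "big") for v in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def toy_model(weights: List[int], x: int) -> int:
    """Placeholder for model inference: a single affine map."""
    return weights[0] * x + weights[1]


def naive_step(step_in: List[int], weights: List[int], i: int, n: int) -> List[int]:
    model_hash = step_in[0]
    data = step_in[1:1 + n]
    outs = step_in[1 + n:]
    # The private weights must match the public commitment.
    assert h(*weights) == model_hash
    # Reading data[i] with a signal-valued index costs one selector per slot.
    x = sum((1 if j == i else 0) * d for j, d in enumerate(data))
    out = toy_model(weights, x)
    # Writing out_i also costs one selector per slot: keep the old value unless j == i.
    new_outs = [old + (1 if j == i else 0) * (out - old) for j, old in enumerate(outs)]
    return [model_hash, *data, *new_outs]


weights = [2, 1]
data = [5, 6, 7]
n = len(data)
state = [h(*weights), *data, *([0] * n)]
for i in range(n):
    state = naive_step(state, weights, i, n)
```

The state has 1 + 2N entries, and every step touches all N output slots even though it only fills one of them.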

Attempt 2: Merkle root?

signals	Step 1	Step 2	…	Step N-1	Step N
$step_{in}$	$[modelHash, zeroRoot, zeroRoot]$	$step_{out,1}$	…	$step_{out,N-2}$	$step_{out,N-1}$
$step_{out}$	$[modelHash, dataMerkleRoot_{1}, outputMerkleRoot_{1}]$	$[modelHash, dataMerkleRoot_{2}, outputMerkleRoot_{2}]$	…	$[modelHash, dataMerkleRoot_{N-1}, outputMerkleRoot_{N-1}]$	$[modelHash, dataMerkleRoot_{N}, outputMerkleRoot_{N}]$
private inputs	$[modelWeights, dataLeaf_{0}, merklePaths_{0}]$	$[modelWeights, dataLeaf_{1}, merklePaths_{1}]$	…	$[modelWeights, dataLeaf_{N-2}, merklePaths_{N-2}]$	$[modelWeights, dataLeaf_{N-1}, merklePaths_{N-1}]$

Improvements:

- fewer public inputs/outputs

Issues:

- the order of the data matters!
- relatively cheaper than Attempt 3 if we try to verify the entire tree on chain
- $2 \log N$ hashes per step
- the Merkle leaves need to be published elsewhere
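
The per-step Merkle work can be sketched as recomputing a root from one leaf and its sibling path. This is a minimal Python model under stated assumptions: SHA-256 stands in for a circuit-friendly hash, and `build_levels` and `path_for` are hypothetical off-circuit helpers that prepare the private inputs.

```python
import hashlib
from typing import List


def h(left: int, right: int) -> int:
    """SHA-256 stand-in for a circuit-friendly two-to-one hash (an assumption)."""
    data = left.to_bytes(32, "big") + right.to_bytes(32, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def root_from_path(leaf: int, index: int, siblings: List[int]) -> int:
    """Recompute the root from a leaf and its sibling path: one hash per level."""
    node = leaf
    for sib in siblings:
        node = h(sib, node) if index & 1 else h(node, sib)
        index >>= 1
    return node


def build_levels(leaves: List[int]) -> List[List[int]]:
    """Off-circuit helper: all levels of a full binary tree, leaves first."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[k], prev[k + 1]) for k in range(0, len(prev), 2)])
    return levels


def path_for(levels: List[List[int]], index: int) -> List[int]:
    """Off-circuit helper: sibling path for the leaf at `index`."""
    siblings = []
    for level in levels[:-1]:
        siblings.append(level[index ^ 1])
        index >>= 1
    return siblings


zero_levels = build_levels([0, 0, 0, 0])        # the tree behind zeroRoot
path = path_for(zero_levels, 2)                 # private input for this step
old_root = root_from_path(0, 2, path)           # matches the incoming root
new_root = root_from_path(11, 2, path)          # root after writing leaf 2
```

With one such root computation for the data tree and one for the output tree, the count is $2 \log N$ hashes per step. Whether the step also re-checks the incoming root against the same path is not stated in the table; doing so would add further hashes.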

Attempt 3: Recursive hashing?

signals	Step 1	Step 2	…	Step N-1	Step N
$step_{in}$	$[modelHash, 0, 0]$	$step_{out,1}$	…	$step_{out,N-2}$	$step_{out,N-1}$
$step_{out}$	$[modelHash, H(0, data_{1}), H(0, out_{1})]$	$[modelHash, H(H(0, data_{1}), data_{2}), H(H(0, out_{1}), out_{2})]$	…	$[modelHash, H(\ldots, data_{N-1}), H(\ldots, out_{N-1})]$	$[modelHash, H(\ldots, data_{N}), H(\ldots, out_{N})]$
private inputs	$[modelWeights, data_{1}]$	$[modelWeights, data_{2}]$	…	$[modelWeights, data_{N-1}]$	$[modelWeights, data_{N}]$

Improvements:

- only 2 hashes per step
- fewer signals to pass around

Issues:

- the order of the data still matters
- recomputing the hash chain on chain is very expensive, but luckily we can precompute and commit the final hash
- the raw inputs need to be published elsewhere
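
The recursive hashing step can be modelled in a few lines of Python. This is a sketch under assumptions: SHA-256 stands in for a circuit-friendly hash, `toy_model` is a placeholder for real inference, and `chain` is a hypothetical helper that a verifier could run over the published raw inputs and outputs.

```python
import hashlib
from typing import List


def h(*vals: int) -> int:
    """SHA-256 stand-in for a circuit-friendly hash (an assumption of this sketch)."""
    data = b"".join(v.to_bytes(32, "big") for v in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def toy_model(weights: List[int], x: int) -> int:
    """Placeholder for model inference: a single affine map."""
    return weights[0] * x + weights[1]


def hashed_step(step_in: List[int], weights: List[int], x: int) -> List[int]:
    model_hash, data_acc, out_acc = step_in
    # The private weights must match the public commitment.
    assert h(*weights) == model_hash
    out = toy_model(weights, x)
    # Exactly two chain hashes per step: one for the data, one for the output.
    return [model_hash, h(data_acc, x), h(out_acc, out)]


def chain(values: List[int]) -> int:
    """Recompute H(...H(H(0, v_1), v_2)..., v_N) from published raw values."""
    acc = 0
    for v in values:
        acc = h(acc, v)
    return acc


weights = [2, 1]
data = [5, 6, 7]
state = [h(*weights), 0, 0]
outs = []
for x in data:
    outs.append(toy_model(weights, x))
    state = hashed_step(state, weights, x)
```

The public state stays at three signals regardless of N, and anyone holding the raw inputs and outputs can check them against the final state by calling `chain`. Reordering the inputs changes the final hash, which is why the order of the data still matters.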

Scenario 2: Private Data, Public Model

Attempt 1: naive approach

signals	Step 1	Step 2	…	Step N-1	Step N
$step_{in}$	$[modelWeights, 0, 0, \ldots, 0, 0, 0, 0, \ldots, 0, 0]$	$step_{out,1}$	…	$step_{out,N-2}$	$step_{out,N-1}$
$step_{out}$	$[modelWeights, dataHash_{1}, 0, \ldots, 0, 0, out_{1}, 0, \ldots, 0, 0]$	$[modelWeights, dataHash_{1}, dataHash_{2}, \ldots, 0, 0, out_{1}, out_{2}, \ldots, 0, 0]$	…	$[modelWeights, dataHash_{1}, dataHash_{2}, \ldots, dataHash_{N-1}, 0, out_{1}, out_{2}, \ldots, out_{N-1}, 0]$	$[modelWeights, dataHash_{1}, dataHash_{2}, \ldots, dataHash_{N-1}, dataHash_{N}, out_{1}, out_{2}, \ldots, out_{N-1}, out_{N}]$
private inputs	$data_{1}$	$data_{2}$	…	$data_{N-1}$	$data_{N}$

Issues: same as Scenario 1 Attempt 1

Solution: see Scenario 1 Attempt 3

Issues:

- we will need to publish the model weights elsewhere
