Distributed Nova Prover sketch

This section sketches what a distributed Nova prover might look like and what kind of performance we can expect.

The following can be simulated on a server with N threads. Alternatively, it can be done in a distributed fashion, either in a controlled data center environment or in a permissionless environment. As we move from a single server to a data center to a permissionless network we can expect more participants and more total parallelization capacity, at the cost of higher latency and complexity.

High level overview

  • We have a coordinator and N workers
  • The coordinator is responsible for distributing the workload across workers
  • Workers are responsible for computing the base and internal cases
  • Workers process the base-level folds in parallel in chunks of two
  • Once a worker is done, it signals this to the coordinator
  • When two adjacent base nodes are computed, they are merged
  • This continues until we reach the root of the tree
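The tree-folding flow above can be sketched as a small simulation. Note that `fold` here is a placeholder that only records which steps a node covers, not Nova's actual folding operation:

```python
def fold(left, right):
    # Stand-in for a Nova fold: combines two instances into one.
    # We only track the range of execution steps each node covers.
    return (left[0], right[1])

def prove_tree(num_steps):
    """Simulate the binary fold tree: base instances are folded
    pairwise, level by level, until a single root remains."""
    assert (num_steps & (num_steps - 1)) == 0, "pad to a power of two first"
    level = [(i, i) for i in range(num_steps)]  # base-level instances
    while len(level) > 1:
        # Adjacent nodes are merged; in the real prover each pair would
        # be handled by whichever worker the coordinator assigns.
        level = [fold(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

print(prove_tree(8))  # root covers steps 0..7 -> (0, 7)
```

With 8 steps this performs 4 + 2 + 1 merges over \(\log_2 8 = 3\) levels, matching the binary-tree structure described above.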

In more detail

In cases where we don't have \(2^n\) steps, we can either pad the tree, or we can "skip" the computation and compute it two levels above (this is due to elliptic curve cycles).
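For the padding option, a minimal helper for rounding the step count up to a complete tree might look like this:

```python
def padded_size(n):
    """Round the number of steps up to the next power of two so the
    fold tree is complete; the prover could instead skip levels."""
    p = 1
    while p < n:
        p *= 2
    return p

print(padded_size(5))  # -> 8, i.e. 3 padding steps are added
```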

A base-level computation involves three folds: the aggregated, primary, and secondary circuits.

The three folds of the par case are: folding the left running instance with the left new instance, doing the same for the right pair, and then folding the results of those two folds.

This merges two nodes where each node has a running instance and new instance.
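The structure of the par case can be sketched as follows; `fold` is again a placeholder that only records the shape of the computation, not a real fold:

```python
def fold(a, b):
    # Placeholder for a Nova fold of two instances; we just
    # record the structure of the computation as a nested tuple.
    return ("fold", a, b)

def merge_nodes(left, right):
    """The three folds of the par case: fold each node's running
    instance with its new instance, then fold the two results."""
    l = fold(left["running"], left["new"])
    r = fold(right["running"], right["new"])
    return fold(l, r)

node_a = {"running": "Ra", "new": "Na"}
node_b = {"running": "Rb", "new": "Nb"}
print(merge_nodes(node_a, node_b))
# -> ('fold', ('fold', 'Ra', 'Na'), ('fold', 'Rb', 'Nb'))
```

The result is itself a node with a running and new instance, which is what allows the merge step to repeat up the tree.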

At the very top of the tree a CompressedSNARK is generated, either by the coordinator or the first available worker.

If we have N execution steps and N workers, we can roughly expect the following overhead:

  • Coordinator sends load to N workers with some latency L
  • For simplicity we assume the same overhead for every coordination step
  • Each worker performs \(\frac{1}{N}\) of the work, times overhead
  • A worker computes \(3\) folds
  • Coordinator sends and receives folds ~\(2 \times L\)
  • For the base layer we thus have \(3 \times 2 \times L = 6 \times L\)
  • For intermediate steps, we have similar overhead
  • We have a binary tree so a total of \(\log(N)\) steps
  • We get \(6 \times \log{N} \times L\), i.e. \(O(\log{N} \times L)\)
  • We can compare this with the sequential case, which is \(O(N)\)
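The overhead model in the bullets above can be written out directly. The constants and the latency unit are illustrative, not measurements:

```python
import math

def parallel_overhead(n_workers, latency):
    # 6 coordination messages per tree level (3 folds, send + receive),
    # and log2(N) levels in the binary tree.
    return 6 * math.log2(n_workers) * latency

def sequential_cost(n_steps):
    # The sequential prover processes all N steps one after another.
    return n_steps

# With N = 1024 workers/steps and latency L = 1 (arbitrary unit):
print(parallel_overhead(1024, 1))  # 6 * 10 * 1 = 60.0
print(sequential_cost(1024))       # 1024
```

Even at this coarse level the gap between \(O(\log{N} \times L)\) coordination overhead and \(O(N)\) sequential work is visible, as long as \(L\) stays small.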

If \(L\) is high, we lose much of the benefit of parallelization. Also note that in the sequential case, many MSM operations can be performed in a multi-threaded environment (a powerful server).

To reduce load on the coordinator, we can also send results directly to a predefined worker, assuming equal load. For example, worker 2 could send its result to worker 1, and so on. This assumes that (i) workers are reliable and (ii) each worker finishes in about the same time.
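One possible predefined routing scheme (0-indexed workers, lower index of each pair receives; this is an illustrative choice, not a prescribed protocol):

```python
def receiver_for(worker_id, level):
    """Which worker collects the pair's merged result at a given tree
    level (level 0 = base). The lower-indexed worker of each pair
    receives; the higher-indexed one sends its result to it."""
    stride = 2 ** level
    group = worker_id // stride
    if group % 2 == 1:           # higher half of the pair: send
        return worker_id - stride
    return worker_id             # lower half: receive and merge

# Level 0: worker 1 sends to worker 0, worker 3 sends to worker 2, ...
print([receiver_for(w, 0) for w in range(4)])  # -> [0, 0, 2, 2]
# Level 1: worker 2 sends its merged result to worker 0
print(receiver_for(2, 1))  # -> 0
```

With this scheme half of the remaining workers drop out at each level, and worker 0 ends up holding the root, so the coordinator is only involved in the initial distribution.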

Also note that we switch fields at each level of the tree, which may incur extra overhead.
