Distributed Nova Prover sketch

This section sketches what a distributed Nova prover might look like and what kind of performance we can expect.

The following can be simulated on a server with N threads. Alternatively, it can be done in a distributed fashion, either in a controlled data center environment or in a permissionless environment. As we move from a single server to a data center to a permissionless network we can expect more participants and more total parallelization capacity, at the cost of higher latency and complexity.

High level overview

  • We have a coordinator and N workers
  • The coordinator is responsible for distributing the workload across workers
  • Workers are responsible for computing the base and internal cases
  • Workers process the base-level folds in parallel in chunks of two
  • Once a worker is done, it signals this to the coordinator
  • When two adjacent base nodes are computed, they are merged
  • This continues until we reach the root of the tree
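The tree-folding flow above can be sketched as a small simulation. Note that `fold` here is a placeholder that only records which steps a node covers, not Nova's actual folding operation:

```python
def fold(left, right):
    # Stand-in for a Nova fold: combines two instances into one.
    # We only track the range of execution steps each node covers.
    return (left[0], right[1])

def prove_tree(num_steps):
    """Simulate the binary fold tree: base instances are folded
    pairwise, level by level, until a single root remains."""
    assert (num_steps & (num_steps - 1)) == 0, "pad to a power of two first"
    level = [(i, i) for i in range(num_steps)]  # base-level instances
    while len(level) > 1:
        # Adjacent nodes are merged; in the real prover each pair would
        # be handled by whichever worker the coordinator assigns.
        level = [fold(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

print(prove_tree(8))  # root covers steps 0..7 -> (0, 7)
```

With 8 steps this performs 4 + 2 + 1 merges over \(\log_2 8 = 3\) levels, matching the binary-tree structure described above.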

In more detail

In cases where we don't have \(2^n\) steps, we can either pad the tree, or we can "skip" the computation and compute it two levels above (this is due to elliptic curve cycles).
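For the padding option, a minimal helper for rounding the step count up to a complete tree might look like this:

```python
def padded_size(n):
    """Round the number of steps up to the next power of two so the
    fold tree is complete; the prover could instead skip levels."""
    p = 1
    while p < n:
        p *= 2
    return p

print(padded_size(5))  # -> 8, i.e. 3 padding steps are added
```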

A base-level computation involves three folds: the aggregated, primary, and secondary circuits.

The three folds of the par case are: folding the left running instance with the left new instance, doing the same for the right pair, and then folding the results of those two folds.

This merges two nodes where each node has a running instance and new instance.
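The structure of the par case can be sketched as follows; `fold` is again a placeholder that only records the shape of the computation, not a real fold:

```python
def fold(a, b):
    # Placeholder for a Nova fold of two instances; we just
    # record the structure of the computation as a nested tuple.
    return ("fold", a, b)

def merge_nodes(left, right):
    """The three folds of the par case: fold each node's running
    instance with its new instance, then fold the two results."""
    l = fold(left["running"], left["new"])
    r = fold(right["running"], right["new"])
    return fold(l, r)

node_a = {"running": "Ra", "new": "Na"}
node_b = {"running": "Rb", "new": "Nb"}
print(merge_nodes(node_a, node_b))
# -> ('fold', ('fold', 'Ra', 'Na'), ('fold', 'Rb', 'Nb'))
```

The result is itself a node with a running and new instance, which is what allows the merge step to repeat up the tree.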

At the very top of the tree a CompressedSNARK is generated, either by the coordinator or the first available worker.

If we have N execution steps and N workers, we can roughly expect the following overhead:

  • Coordinator sends load to N workers with some latency L
  • For simplicity we assume the same overhead for every coordination step
  • Each worker performs \(\frac{1}{N}\) of the work, times overhead
  • A worker computes \(3\) folds
  • Coordinator sends and receives folds ~\(2 \times L\)
  • For the base layer we thus have \(3 \times 2 \times L = 6 \times L\)
  • For intermediate steps, we have similar overhead
  • We have a binary tree so a total of \(\log(N)\) steps
  • We get \(6 \times \log{N} \times L\), i.e. \(O(\log{N} \times L)\)
  • We can compare this with the sequential case, which is \(O(N)\)
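The overhead model in the bullets above can be written out directly. The constants and the latency unit are illustrative, not measurements:

```python
import math

def parallel_overhead(n_workers, latency):
    # 6 coordination messages per tree level (3 folds, send + receive),
    # and log2(N) levels in the binary tree.
    return 6 * math.log2(n_workers) * latency

def sequential_cost(n_steps):
    # The sequential prover processes all N steps one after another.
    return n_steps

# With N = 1024 workers/steps and latency L = 1 (arbitrary unit):
print(parallel_overhead(1024, 1))  # 6 * 10 * 1 = 60.0
print(sequential_cost(1024))       # 1024
```

Even at this coarse level the gap between \(O(\log{N} \times L)\) coordination overhead and \(O(N)\) sequential work is visible, as long as \(L\) stays small.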

If \(L\) is high, we lose much of the benefit of parallelization. Also note that in the sequential case, many MSM operations can be performed in a multi-threaded environment (a powerful server).

To reduce load on the coordinator, we can also send results directly to a predefined worker, assuming equal load. For example, worker 2 could send its result to worker 1, and so on. This assumes that (i) workers are reliable and (ii) each worker finishes in about the same time.
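One possible predefined routing scheme (0-indexed workers, lower index of each pair receives; this is an illustrative choice, not a prescribed protocol):

```python
def receiver_for(worker_id, level):
    """Which worker collects the pair's merged result at a given tree
    level (level 0 = base). The lower-indexed worker of each pair
    receives; the higher-indexed one sends its result to it."""
    stride = 2 ** level
    group = worker_id // stride
    if group % 2 == 1:           # higher half of the pair: send
        return worker_id - stride
    return worker_id             # lower half: receive and merge

# Level 0: worker 1 sends to worker 0, worker 3 sends to worker 2, ...
print([receiver_for(w, 0) for w in range(4)])  # -> [0, 0, 2, 2]
# Level 1: worker 2 sends its merged result to worker 0
print(receiver_for(2, 1))  # -> 0
```

With this scheme half of the remaining workers drop out at each level, and worker 0 ends up holding the root, so the coordinator is only involved in the initial distribution.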

Also note that we switch fields at each level of the tree, which may incur extra overhead.
