# Bittensor Apollo subnet

### Table of contents

- [Pianist](#Pianist)
- [Network layout and incentive structure](#Network-layout-and-incentive-structure)
- [Expansion and future work](#Expansion-and-future-work)
  * [Generalizing to any PLONK circuit setup](#Generalizing-to-any-PLONK-circuit-setup)
  * [Supporting R1CS and FFLONK proof systems (Circom compatibility)](#Supporting-R1CS-and-FFLONK-proof-systems-Circom-compatibility)
  * [Adapting the robust model to a STARK](#Adapting-the-robust-model-to-a-STARK)
  * [Infrastructure for further distribution](#Infrastructure-for-further-distribution)
- [Appendix A: A polynomial protocol for distributed PLONK circuits](#Appendix-A-A-polynomial-protocol-for-distributed-PLONK-circuits)
  * [Gate checks](#Gate-checks)
  * [Wiring checks](#Wiring-checks)
  * [The full proof](#The-full-proof)
  * [Making these concurrently computable](#Making-these-concurrently-computable)

---

Bittensor subnets are a great way to provide open APIs for outsourcing computational workloads, whether model inference, distributed learning, or other highly specific compute. However, a key insight seems to be missing from most subnets today: almost none of them coordinate their miners to collaborate in order to achieve greater computational power. Instead, they tend to either perform computation redundantly, or pit miners in competition with each other. Competition may create a free-market dynamic for producing models, but it severely limits the size of those models (and in some cases, the resulting models appear to simply overfit a specific training set). We believe that Bittensor will only be fully leveraged if subnets begin thinking more deeply about collaborative mining, rather than redundant or competitive strategies. This document contains the specifications for the creation of the Bittensor Apollo subnet.
[Pianist](https://eprint.iacr.org/2023/1271.pdf) is a recently released paper by Liu et al. describing the first 'robust' distributed ZKP generation protocol. It aggregates SNARKs in a straightforward way, by folding the univariate polynomials of $M$ sub-circuits into a single bivariate polynomial. This allows for organized indexing and, in the case of PLONK constraint systems, for valid copy constraints across sub-circuits, so that not only data-parallel circuits are sped up, but general, arbitrary circuits see performance gains as well. Additionally, the approach introduced by Liu et al. is 'robust': bad actors cannot significantly slow down proof generation through foul play. This works well both for the responsiveness of the subnet and for reaching consensus and rewarding miners (or punishing them for misbehavior). We believe Pianist is excellently suited to be deployed as a subnet on TAO and will be a great addition to the network's many use cases. In this document, we will explore how this work will be implemented and what the end product and future work can look like.

## Pianist

Unlike prior works such as [DIZK](https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-wu.pdf), where the network performs a distributed FFT to compute the proof, Pianist slightly modifies the PLONK constraint system to accommodate independent pieces of work, which can be handed out, verified, and then aggregated by a 'master node' (a position naturally assumed by the group of validators in the subnet). Simply speaking, DIZK requires all workers in the network to cooperate very tightly, and a single disconnected worker (or a malicious participant) can ruin the work performed entirely.
In contrast, Pianist's distributed work is perfectly independent on a per-worker basis, and it allows for easy recovery in the case of outages or malicious actors. As a bonus, this greatly reduces the communication complexity of the protocol: DIZK is M-to-M, or quadratic in the size of the network, whereas Pianist is 1-to-M, or linear in the size of the network. It works, roughly, as follows. Pianist builds on the PLONK polynomial protocol, which defines two generic checks:

- The gate checks, which, for each row in the circuit, verify that the selected equation holds. This can be addition, multiplication, or any kind of custom-defined operation.
- The wiring checks, which ensure continuity of the circuit variables. E.g., the output of row $n$ is used in row $n + 1$, and no foreign variables are injected by the prover to make a false proof.

Detailed explanations of how these are modified to fit the distributed setting can be found in [Appendix A](#Appendix-A-A-polynomial-protocol-for-distributed-PLONK-circuits). For now, we will provide a simple explanation of how this distribution of work is made possible. A key ingredient in understanding the Pianist protocol is what cryptographers call 'additive homomorphism'. It means that a cryptographic encoding of some value still upholds the rules of addition as defined in basic arithmetic. What does this mean? Let's say that we have two numbers: $7$ and $5$. If we add $7$ and $5$ together, we get $12$. Now, let's add a commitment scheme into the mix: some encoding we can apply to a number to hide it and ensure that nobody learns our secret value. Let's denote the commitments to $7$ and $5$ as $c(7)$ and $c(5)$. In this context, a commitment scheme is additively homomorphic if the sum of $c(7)$ and $c(5)$ is the same as $c(12)$. Practically, this would look as follows.
$$ \displaylines{ 7 + 5 = 12 \\ c(7) := 55 \\ c(5) := 92 \\ c(12) := 147 \\ c(7) + c(5) = 55 + 92 = 147 = c(12) } $$

As we can see, adding the committed values $c(7)$ and $c(5)$ together produces the same output as committing to $12$ directly to create $c(12)$. This is what cryptographers understand as 'additive homomorphism': addition works the same under the commitment scheme as it does with cleartext values. Pianist leverages the additive homomorphism of KZG (a popular polynomial commitment scheme used to create zero-knowledge proofs) to cheaply combine work from separate machines. Essentially, we take a giant polynomial describing the entire circuit we wish to prove, chop it up into smaller chunks, and send those to workers. Workers then create commitments to these smaller polynomials and send the commitments back to the master node. The master node can then simply combine them with addition operations to create the final product: a proof for the full circuit. This incurs only a very small overhead on top of the actual proving work, allowing Pianist to scale proof generation speed almost linearly.

## Network layout and incentive structure

As you may guess from the prior section, the two classes of worker in the Apollo distributed network fit perfectly into the hierarchy of a subnet. At the heart of the protocol is a master node, which receives the circuit and witness for which a proof shall be generated. The master node chops this up into $M$ sub-circuits (where $M$ is the number of miners in the subnet), and each piece is then distributed to a miner. Hence, validators can perfectly play the part of 'master node', as they prepare and then verify and complete the work performed by the worker nodes, i.e. the subnet's miners.
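To make the additive aggregation concrete, here is a minimal sketch in which 'miners' commit to chunks of a polynomial and a 'master node' combines the commitments by simple addition. The 'commitment' used here is a toy, entirely insecure stand-in (it just evaluates the polynomial at a hidden point modulo a prime); real KZG hides the evaluation point in the exponent of a pairing-friendly elliptic-curve group, but the additive structure is the same. The modulus and secret below are illustrative assumptions.

```python
# Toy, INSECURE stand-in for a KZG-style polynomial commitment:
# evaluate the polynomial at a hidden point S modulo a prime P.
P = 2**61 - 1       # illustrative prime modulus (assumption)
S = 1234567891011   # illustrative trusted-setup secret (assumption)

def commit(coeffs):
    """Commit to a polynomial given by its coefficient list."""
    acc = 0
    for c in reversed(coeffs):      # Horner evaluation at S, mod P
        acc = (acc * S + c) % P
    return acc

# Two 'miners' each commit to their chunk of the circuit polynomial.
chunk_a = [3, 1, 4, 1]   # 3 + x + 4x^2 + x^3
chunk_b = [2, 7, 1, 8]   # 2 + 7x + x^2 + 8x^3

# The 'master node' adds the commitments; the result equals a direct
# commitment to the sum of the chunks (additive homomorphism).
combined = [(a + b) % P for a, b in zip(chunk_a, chunk_b)]
assert (commit(chunk_a) + commit(chunk_b)) % P == commit(combined)
```

The cheapness of the master node's job follows directly from this property: combining work is a handful of additions, regardless of how expensive each chunk was to prove.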
> In order to increase efficiency, the subnet may elect to split the circuit into sub-circuits with sizes on a sliding scale, where the bigger portions are given to the miners known to have faster hardware and/or better software implementations, essentially cranking up prover performance. The implications of this approach will need to be benchmarked to ensure the speedup is significant, but it is worth exploring regardless.

Validators may set reasonable timers on the work being done by miners. Rewards will be distributed to miners according to their speed of execution, in order to incentivize faster hardware and competition in writing more efficient software algorithms for the work at hand (as [many](https://www.ingonyama.com/) [companies](https://cysic.xyz/) [currently](https://www.supranational.net/) [already](https://www.ulvetanna.io/) [do](https://www.zprize.io/)). Upon expiry of a timer, workloads that have not been completed will be marked as invalid and redistributed to workers who have proven to be online and capable of completing the work fast enough for the current proof generation phase, further rewarding those miners for their extra work, and allowing validators to punish miners who are not online or cannot keep up. Timers may be set on a rolling basis, taking an average of proving times from past attempts, and miners may be asked to perform bandwidth and computational checks prior to registering, to ensure that they are capable of the intense work at hand. In case of a majority or total network failure (for instance, due to a widespread outage), timers may be rolled back immediately so that the next attempt can succeed. Miners may also be actively checked for liveness through frequent pings. The communication complexity of the protocol is straightforward, and thus will not need very complicated wiring either.
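One way the rolling timers described above could be computed is sketched below. The window size, slack factor, and fallback deadline are illustrative assumptions, not fixed protocol parameters.

```python
from collections import deque

class RollingTimer:
    """Tracks recent proving times and derives a deadline for the
    next workload as a slack-adjusted rolling average. Window size,
    slack factor, and fallback are illustrative assumptions."""

    def __init__(self, window=20, slack=1.5, fallback=60.0):
        self.samples = deque(maxlen=window)  # only the last `window` attempts count
        self.slack = slack
        self.fallback = fallback  # seconds, used before any samples exist

    def record(self, proving_time):
        self.samples.append(proving_time)

    def deadline(self):
        if not self.samples:
            return self.fallback
        return self.slack * sum(self.samples) / len(self.samples)

    def rollback(self):
        # On a widespread outage, discard recent samples so the next
        # attempt is not judged against pre-outage speeds.
        self.samples.clear()

timer = RollingTimer()
for t in (10.0, 12.0, 11.0):
    timer.record(t)
assert timer.deadline() == 16.5  # 1.5 * average of the three samples
```

A bounded window keeps the deadline responsive to hardware upgrades, while the slack factor absorbs ordinary variance without letting slow miners stall a proving round.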
As long as validators can directly send miners their work, and miners can directly send the work back, we are all set to run the Apollo protocol on a TAO subnet.

## Expansion and future work

We believe the Pianist system is incredibly useful, and the discoveries made by Liu et al. may be applied separately to other proof systems and other configurations. We outline some potential future work here.

### Generalizing to any PLONK circuit setup

Initially, our library will support the 'vanilla PLONK' circuit arithmetization; that is, the use of addition and multiplication gates only, in a fan-in-two circuit. Since the mathematical software we will employ is not particularly concerned with the homogeneity of a proof system, but rather just with the compilation of polynomials into commitments, we should easily be able to support any type of 'PLONKish' circuit design (including the incredibly popular [Halo2](https://github.com/zcash/halo2) and all its different configurations). In the Pianist paper, the authors also mention the possibility of expanding to any custom circuit type, and even supporting worker-localized lookup tables to further speed up proof generation and reduce the workload on the master node, which is something we intend to implement.

### Supporting R1CS and FFLONK proof systems (Circom compatibility)

Besides supporting generalized PLONKish circuit types, we should also be able to research extending the intuitions provided by Liu et al. in Pianist to other polynomial protocols that compile down to a KZG-based SNARK. Our high-priority targets are R1CS circuits and generalized FFLONK circuits, since both of these circuit types are exceedingly popular, given that they are provided as options in the [Circom](https://github.com/iden3/circom) circuit compiler.
Circom (and particularly R1CS circuits) is very popular with most simpler ZK dApps on Ethereum and other chains, and is also extensively used by multiple rollups to produce SNARK wrapper proofs for STARK proofs, which are often too expensive to post on chain. The latter use case is often filled by Circom FFLONK circuits, which incur a higher prover cost in exchange for a greatly reduced verifier cost, making proofs cheaper to store and verify on chain; something this subnet is very well suited for due to its great proving power.

### Adapting the robust model to a STARK

This idea will require more research and development than the aforementioned work, since adapting a STARK commitment scheme such as FRI to a decentralized setting is less straightforward. This is because the commitment scheme is not additively homomorphic, so one cannot simply add commitments together as in Pianist. However, through prior work we are aware of ways to distribute work fairly between multiple workers and scale the computation of FRI-like proofs horizontally, and it is something we would eventually like to offer as an API as well, given the rapid recent rise in popularity of STARKs. As STARKs are much more scalable to begin with, they may unlock even more interesting applications for high-degree circuits (especially AIR-based STARKs, which are very well suited for structured computation, something that is common in many AI models and in most larger programs a user might want a ZKP for). This is considered future research work and will require some mathematical groundwork prior to implementation, for which we may also be able to produce a paper.
### Infrastructure for further distribution

In case a user has a full cluster of hardware they would like to use as a miner, rather than a single computer, we would like to provide functionality in the miner software to set up LAN or WAN networks, so that the registered miner can further distribute the received work and speed up proof generation even more. This would create a kind of 'fractally scaled' distributed ZKP generation network. Eventually, this may give rise to fractal mining pools, in the event that the subnet hits capacity but people are still looking to enter and provide hardware to scale up the subnet's proving power. If this becomes viable, it is something we would like to provide infrastructure for down the line; but this is still highly speculative, and we assume it will not be necessary for now (and may never become necessary). It is merely something we will keep an eye on.

## Appendix A: A polynomial protocol for distributed PLONK circuits

We will now explain in more depth how the PLONK polynomial protocol is modified by Liu et al. to accommodate the distribution of work. Recall that we need to modify the gate checks and the wiring checks. On paper, these checks boil down to the following.

### Gate checks

We take the example of the [vanilla PLONK](https://eprint.iacr.org/2019/953.pdf) paper, in which the constraint system is defined as a fan-in-two arithmetic circuit with one output per gate, exposing addition and multiplication gates which can be interwoven through the use of selector polynomials. The polynomial equation for this constraint system's gate checks looks as follows. Let $a(X), b(X), c(X)$ be polynomial extensions of the vector of all circuit variables, where $a(X)$ represents the left input of each gate, $b(X)$ the right input of each gate, and $c(X)$ the output value of each gate.
Let $q_a(X), q_b(X), q_c(X), q_{ab}(X)$ be selector polynomials for the left input, the right input, the output value, and the multiplication of both inputs. Finally, we define $p(X)$ as the public input polynomial, which allows us to include public inputs in the gate equation. The final combination of these polynomials looks like this:

$$ g(X) := q_a(X)a(X) + q_b(X)b(X) + q_c(X)c(X) + q_{ab}(X)a(X)b(X) + p(X) = 0. $$

Checking whether this polynomial is zero at a random point convinces the verifier that each row of the circuit is computed correctly, due to the Schwartz-Zippel lemma.

### Wiring checks

The proof also needs to ensure that the variables used in the circuit are related to one another, so that the integrity of the computation can be proved. More specifically, there are redundant elements in the vector of circuit variables (particularly between $a, b, c$, since outputs of certain gates are inputs to others), and we need to show that these redundant variables link up. In PLONK, this is done by showing that the grand product of one list equals the grand product of another, where the second list is simply a permutation of the first. We want to show that, for a list $f$:

$$ \{(f_i, i)\}_{i \in I} = \{(f_i, \sigma(i))\}_{i \in I}. $$

To prove this, the prover creates a 'product polynomial' $z(X)$. Let $\omega$ be the generator of a subgroup $H \subset \mathbb{F}$ over which we evaluate the PLONK polynomials.
Fix $z(\omega^0) = 1$ and define the remainder of $z(X)$ as:

$$ z(\omega^{j+1}) = \prod_{i = 0}^{j}f(\omega^i) / h(\omega^i) $$

where

$$ \displaylines{f(X) := (a(X) + \eta\sigma_a(X) + \gamma)(b(X) + \eta\sigma_b(X) + \gamma)(c(X) + \eta\sigma_c(X) + \gamma) \\ h(X) := (a(X) + \eta k_aX + \gamma)(b(X) + \eta k_bX + \gamma)(c(X) + \eta k_cX + \gamma)} $$

and $\eta, \gamma \in \mathbb{F}$ are two random scalars, $\sigma_i(X)$ is the polynomial returning the permuted index of element $X$ in column $i$, $k_a = 1$, $k_b$ is any quadratic non-residue in $\mathbb{F}$, and $k_c$ is any quadratic non-residue not contained in $k_b \cdot H$. The prover may then convince the verifier of the legitimacy of the wiring via two polynomial equations:

$$ \displaylines{p_0(X) := L_0(X)(z(X) - 1)\\p_1(X) := z(X)f(X) - z(\omega X)h(X)} $$

which should both equal zero for all $X \in H$.

### The full proof

We can finally combine all checks into a single polynomial identity test. Define $Q_H(X)$ as the quotient polynomial for subgroup $H$, and $V(X) := X^n - 1$ (where $n$ is the number of gates in the circuit). Finally, define $\lambda \in \mathbb{F}$ as another verifier challenge scalar. The final check then becomes

$$ g(X) + \lambda p_0(X) + \lambda^2p_1(X) = V(X)Q_H(X). $$

Since $g$, $p_0$ and $p_1$ all evaluate to zero on every point of $H$ for an honest prover, the left-hand side is divisible by $V(X)$, and this identity holds exactly when the proof is correct.

### Making these concurrently computable

Here is where the work on Pianist comes in, dividing the work of proof generation between $M$ machines. Liu et al. modify the PLONK constraint system to allow sectioning into $M$ smaller univariate polynomials which are later combined into a larger bivariate polynomial. This works as follows. Define a bivariate polynomial $S(Y, X) := \sum_{i=0}^{M-1}R_i(Y)s_i(X)$, where $R_i(Y)$ is the $i$'th Lagrange polynomial over a subgroup of size $M$, and $s_i(X)$ is the univariate polynomial produced by miner $i$.
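To make this aggregation concrete, here is a toy numeric sketch (small prime field, illustrative miner polynomials; all concrete values are assumptions) showing the defining property of $S(Y, X)$: restricted to the $i$'th point of the $Y$-domain, it collapses to exactly miner $i$'s polynomial.

```python
# Toy sketch of S(Y, X) = sum_i R_i(Y) * s_i(X) over a small field.
P = 17                      # toy prime field (assumption)
DOMAIN = [1, 4, 16, 13]     # powers of omega = 4, a 4th root of unity mod 17

def lagrange(i, y):
    """Evaluate the i'th Lagrange basis polynomial R_i at y, mod P."""
    num, den = 1, 1
    for j, yj in enumerate(DOMAIN):
        if j != i:
            num = num * (y - yj) % P
            den = den * (DOMAIN[i] - yj) % P
    return num * pow(den, P - 2, P) % P  # divide via modular inverse

def eval_poly(coeffs, x):
    acc = 0
    for c in reversed(coeffs):  # Horner evaluation mod P
        acc = (acc * x + c) % P
    return acc

# Each 'miner' i contributes a univariate polynomial s_i(X)
# (illustrative coefficients).
miners = [[1, 2], [3, 5], [7, 11], [4, 9]]

def S(y, x):
    return sum(lagrange(i, y) * eval_poly(s, x)
               for i, s in enumerate(miners)) % P

# At Y = DOMAIN[i], only miner i's work survives.
for i, s in enumerate(miners):
    for x in range(P):
        assert S(DOMAIN[i], x) == eval_poly(s, x)
```

Because $R_i(Y)$ is $1$ at the $i$'th domain point and $0$ at the others, each miner's constraints can be checked independently inside the single bivariate object.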
The constraint system above can then be used per-miner, and the 'master node' may aggregate all produced polynomials into a single bivariate polynomial containing all of the miners' work. As such, we can extend the constraints from the prior section as follows:

> Note that for any bivariate extension of a univariate polynomial $f$, we use the uppercase version of this polynomial, $F$.

$$ \begin{aligned} G(Y, X) & := Q_A(Y, X)A(Y, X) + Q_B(Y, X)B(Y, X) + Q_C(Y, X)C(Y, X) \\ & + Q_{AB}(Y, X)A(Y, X)B(Y, X) + P(Y, X) \\ P_0(Y, X) & := L_0(X)(Z(Y, X) - 1) \\ P_1(Y, X) & := Z(Y, X) \prod_{S \in \{A, B, C\}} \left(S(Y, X) + \eta\sigma_S(Y, X) + \gamma\right) \\ & - Z(Y, \omega X) \prod_{S \in \{A, B, C\}} \left(S(Y, X) + \eta k_SX + \gamma\right). \end{aligned} $$

The final identity test comes down to:

$$ G(Y, X) + \lambda P_0(Y, X) + \lambda^2P_1(Y, X) - V_X(X)Q_H(Y, X) = V_Y(Y)Q_G(Y, X) $$

where the left-hand side evaluates to 0 for all $Y$ in a subgroup $G \subset \mathbb{F}$ of order $M$. With all of this combined, we can now analyze the final complexity of generating a distributed proof. Define $N$ as the number of gates in the full circuit, $M$ as the number of machines in the network, and $T = N/M$ as the size of each sub-circuit. Each prover's independent work then comes to $O(T \log T)$ field operations and $O(T)$ group operations, as per the [KZG complexity analysis](https://www.iacr.org/archive/asiacrypt2010/6477178/6477178.pdf). Additionally, the 'master node' workload comes to $O(T \log T + M \log M)$ field operations and $O(T + M)$ group operations. With all of this defined, we now have a distributedly computable PLONK + KZG proving system thanks to the work of Liu et al.
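As a closing sanity check, the copy-constraint grand product underlying both the univariate and bivariate wiring checks can be verified numerically. The sketch below uses a single column and toy parameters (all values are illustrative assumptions) rather than the full three-column $a, b, c$ layout with $k$ coset shifts.

```python
import random

P = 2**31 - 1   # toy prime field (assumption)

def grand_product_holds(values, sigma, eta, gamma):
    """Accumulate the running product z over the rows, as in PLONK's
    z(X); it returns to 1 iff the (value, index) multiset is invariant
    under the permutation sigma."""
    z = 1
    for i, v in enumerate(values):
        num = (v + eta * sigma[i] + gamma) % P
        den = (v + eta * i + gamma) % P
        z = z * num * pow(den, P - 2, P) % P  # modular division
    return z == 1

values = [5, 9, 5, 9]     # wires 0 and 2 equal, wires 1 and 3 equal
sigma = [2, 3, 0, 1]      # permutation encoding those copy constraints
eta, gamma = random.randrange(1, P), random.randrange(1, P)

assert grand_product_holds(values, sigma, eta, gamma)
# Breaking a copy constraint makes the check fail with overwhelming
# probability over the random eta, gamma.
assert not grand_product_holds([5, 9, 6, 9], sigma, eta, gamma)
```

The same accumulator, split across $M$ miners and recombined through the Lagrange weights $R_i(Y)$, is exactly what $Z(Y, X)$ carries in the bivariate identity above.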