Bittensor subnets are a great way to provide open APIs for outsourcing certain computational workloads, whether that be model inference, distributed learning, or other highly specific compute. However, a key insight seems to be missing from most subnets out there today: almost none of them coordinate their miners to collaborate in order to achieve greater computational power. Instead, computation is often performed redundantly, or miners are pitted against each other in competition. Competition may create a free-market dynamic for producing models, but it severely limits the size of those models (and in some cases, the models appear to simply overfit to a specific training set). We believe that Bittensor will only be fully leveraged if subnets begin thinking more deeply about collaborative mining, rather than redundant or competitive strategies.
This document contains the specifications for the creation of the Bittensor Apollo subnet. Pianist refers to the recently released paper by Liu et al., which describes the first 'robust' distributed ZKP generation protocol and makes use of a straightforward way to aggregate SNARKs by folding the univariate polynomials of PLONK into a single bivariate polynomial, with each machine responsible for one slice of it.
We believe Pianist is excellently suited to be deployed as a subnet on TAO and will make a great addition to the network's many use cases. In this document, we will explore how this work will be implemented and what the end product and future work can look like.
Unlike prior works such as DIZK, where the network in question performs a distributed FFT in order to compute the proof, Pianist works by slightly modifying the PLONK constraint system in order to accommodate independent pieces of work which can be handed out, verified and then aggregated by a 'master node' (a position naturally assumed in the subnet by the group of validators). Simply put, DIZK requires all workers in the network to cooperate very tightly, and one disconnected worker (or a malicious participant) can ruin the work performed entirely. In contrast, Pianist's distributed work is fully independent on a per-worker basis, which allows for easy recovery in the case of outages or malicious actors. As a bonus, this greatly reduces communication complexity for the protocol as well: DIZK is M-to-M, or quadratic in the size of the network, whereas Pianist is 1-to-M, or linear in the size of the network. It works, roughly, as follows.
Pianist works specifically on the PLONK polynomial protocol, which defines two generic checks: the gate check, which ensures that every gate in the circuit is computed correctly, and the wiring (copy-constraint) check, which ensures that values shared between gates are consistent with one another.
Detailed explanations of how they are modified to fit in the distributed setting can be found in Appendix A. For now, we will provide a simple explanation of how this distribution of work is made possible.
A key ingredient in understanding the workings of the Pianist protocol is what cryptographers call 'additive homomorphism'. It means that a cryptographic encoding of some value still upholds the rules of addition as defined in basic arithmetic: for an encoding E, we have E(a) + E(b) = E(a + b). What does this mean? Well, let's say that we have two numbers, 17 and 25. If E is additively homomorphic, then combining E(17) and E(25) yields exactly E(42), without ever decoding the inputs. As we can see, adding the encoded values gives the encoding of the sum, so anyone can aggregate encoded results without touching the underlying data.
Pianist leverages the additive homomorphism of KZG (a popular polynomial commitment scheme used to create zero-knowledge proofs) to cheaply combine work from separate machines. Essentially, we take a giant polynomial that describes the entire circuit we wish to prove, chop it up into smaller chunks, and send those to workers. Workers then create commitments to these smaller polynomials and send their commitments back to the master node. The master node can then simply combine them with addition operations to create the final product: a proof for the full circuit. This incurs only a small overhead on top of the actual proving work, allowing Pianist to scale proof generation speeds almost linearly.
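The split-commit-combine flow can be sketched with a toy Pedersen-style vector commitment C(c) = prod_i g_i^{c_i} mod p, used here in place of KZG since both are additively homomorphic in the polynomial's coefficients (all parameters are illustrative, not secure):

```python
import random

p = 2**61 - 1
n = 8                                          # number of polynomial coefficients
gens = [pow(7, i + 1, p) for i in range(n)]    # toy generator vector, one per coefficient

def commit(coeffs, gen_slice):
    """Commit to a coefficient slice using the matching generator slice."""
    c = 1
    for g, a in zip(gen_slice, coeffs):
        c = c * pow(g, a, p) % p
    return c

poly = [random.randrange(1000) for _ in range(n)]

# Reference: the full commitment computed in one place.
full = commit(poly, gens)

# Distributed: chop the coefficients into chunks; each worker commits to its
# own chunk, and the master combines the results with group operations only.
chunk = n // 2
parts = [commit(poly[i:i + chunk], gens[i:i + chunk]) for i in range(0, n, chunk)]
combined = 1
for c in parts:
    combined = combined * c % p

assert combined == full   # aggregation reproduces the monolithic commitment
```

The master never needs the workers' coefficients, only their (constant-size) commitments.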
As you may guess from the prior section, the two classes of worker in the Apollo distributed network fit perfectly with the hierarchy of a subnet. At the heart of the protocol is a master node, which receives the circuit and witness for which a proof shall be generated. The master node proceeds to chop this up into sub-circuits, one per participating miner, and distributes the pieces; each miner proves its own piece and returns the result for aggregation.
To increase efficiency, the subnet may elect to split the circuit into sub-circuits with sizes on a sliding scale, where the bigger portions are given to the miners known to have faster hardware and/or better software implementations that allow them to work faster, essentially cranking up prover performance. The implications of this approach will need to be benchmarked to ensure the speedup is significant, but it is worth exploring regardless.
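A minimal sketch of such proportional splitting, assuming a hypothetical benchmark score per miner (e.g. gates proven per second); real deployments would likely round sub-circuit sizes to powers of two for FFT domains:

```python
def allocate(total: int, scores: dict) -> dict:
    """Split `total` gates among miners in proportion to their benchmark score."""
    weight = sum(scores.values())
    alloc = {m: total * s // weight for m, s in scores.items()}
    # Hand any rounding remainder to the fastest miner.
    fastest = max(scores, key=scores.get)
    alloc[fastest] += total - sum(alloc.values())
    return alloc

# Hypothetical miners: miner_a benchmarks twice as fast as the other two.
sizes = allocate(1 << 20, {"miner_a": 4, "miner_b": 2, "miner_c": 2})
assert sum(sizes.values()) == 1 << 20
assert sizes["miner_a"] == 1 << 19   # half the circuit goes to the fastest miner
```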
Validators may set reasonable timers on the work being done by miners. Rewards will be distributed to miners according to their speed of execution, incentivizing faster hardware and competition in writing more efficient software for the work at hand (as many companies already do). Upon expiry of a timer, workloads that have not been completed will be marked as invalid and redistributed to workers who have proven to be online and capable of completing the work fast enough for the current proof generation phase; those miners are rewarded further for their extra work, while validators can punish miners who are not online or cannot keep up. Timers may be set on a rolling basis, taking an average of proving times from past attempts, and miners may be asked to perform bandwidth and computational checks prior to registering, to ensure that they are capable of the intense work at hand. In case of a majority or full network failure (for instance, due to a widespread outage), timers may be rolled back immediately to ensure that the next attempt can succeed. Miners may also be actively checked for liveness through frequent pings.
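The rolling-timer policy could look like the following sketch, where the deadline for the next round is a multiple of the average proving time over recent rounds; the class name, window size and 1.5x slack factor are illustrative choices, not part of Pianist or the subnet specification:

```python
from collections import deque

class RollingTimer:
    """Deadline policy: slack factor times the mean of the last `window` proving times."""

    def __init__(self, window: int = 10, slack: float = 1.5, initial: float = 60.0):
        self.times = deque(maxlen=window)   # only the most recent rounds count
        self.slack = slack
        self.initial = initial              # fallback before any history exists

    def record(self, proving_time: float):
        self.times.append(proving_time)

    def deadline(self) -> float:
        if not self.times:
            return self.initial
        return self.slack * sum(self.times) / len(self.times)

timer = RollingTimer()
for t in [10.0, 12.0, 8.0]:
    timer.record(t)
assert timer.deadline() == 15.0   # average of 10s, with 1.5x slack
```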
The communication complexity of the protocol is pretty straightforward, and thus will not need very complicated wiring either. As long as validators can directly send miners their work, and miners can directly send the work back, we are all set to run the Apollo protocol on a TAO subnet.
We believe the Pianist system is incredibly useful and the discoveries made by Liu et al. may be applied separately to other proof systems and other configurations. We outline some potential future work here.
Initially, our library will support the 'vanilla PLONK' circuit arithmetization; that is, the use of addition and multiplication gates only, in a fan-in-two circuit. Seeing as the mathematical software we will employ is not greatly concerned with the homogeneity of a proof system - rather just with the compilation of polynomials into commitments - we should easily be able to support any type of 'PLONKish' circuit design (including the incredibly popular Halo2 and all its different configurations). In the Pianist paper, the authors also mention the possibility of expanding to any custom gate type, and even supporting worker-localized lookup tables to further speed up proof generation and reduce the workload on the master node, which is something we intend to implement.
Besides supporting generalized PLONKish circuit types, we should also be able to perform some research into extending the intuitions provided by Liu et al. in Pianist to other polynomial protocols that can compile down to a KZG-based SNARK. Our high-priority targets would be R1CS circuits and generalized FFLONK circuits, since both are exceedingly popular, given that they are offered as options in the Circom circuit compiler. Circom (and particularly R1CS) is very popular with simpler ZK dApps on Ethereum and other chains, and is also used extensively by multiple rollups to produce SNARK wrapper proofs for STARK proofs, which are often too expensive to post on chain directly. The latter use case is often filled by Circom FFLONK circuits, as FFLONK incurs a higher prover cost in exchange for a greatly reduced verifier cost, making proofs cheaper to store and verify on chain; something this subnet is very well suited for due to its great proving power.
This idea will require more research and development than the aforementioned work, since the adaptation of a STARK commitment scheme such as FRI to a decentralized setting is less straightforward: the commitment scheme is not additively homomorphic, so one cannot simply add commitments together as in Pianist. However, through prior work we are aware of ways to distribute work fairly between multiple workers and scale the computation of FRI-like proofs horizontally, and it is something we would eventually like to offer as an API as well, given the rapid recent rise in popularity of STARKs. As STARKs are much more scalable to begin with, they may enable even more interesting applications for high-degree circuits (especially AIR-based STARKs, which are very well suited to structured computation - something common in many AI models and most larger programs a user might want a ZKP for). This is considered future research work and will require some mathematical work prior to implementation, for which we may also be able to produce a paper.
In case a user has a full cluster of hardware they would like to use as a miner, rather than a single computer, we would like to provide functionality in the supplied miner software to set up LAN or WAN networks, so the registered miner can further distribute the received work and speed up proof generation even more. This would create a kind of 'fractally scaled' distributed ZKP generation network. Eventually, this may give rise to fractal mining pools, in the event that the subnet hits capacity but people are still looking to enter and provide hardware to scale up the subnet's proving power. If this becomes viable, it is something we would like to provide infrastructure for down the line, but it is still highly speculative and we assume it will not be necessary for now - and might never become necessary. It is merely something we will keep an eye on.
We will now proceed to explain in more depth how the PLONK polynomial protocol is modified by Liu et al. to accommodate distribution of work. Recall that we need to modify the gate checks and the wiring checks. On paper, these checks boil down to this.
We take the example of the vanilla PLONK paper, in which the constraint system is defined as a fan-in-two arithmetic circuit with one output per gate, exposing addition and multiplication gates which can be interwoven through the use of selector polynomials. Let a(X), b(X) and o(X) be the polynomials encoding the left inputs, right inputs and outputs of the circuit's gates over the evaluation domain, and let q_a(X), q_b(X), q_o(X), q_ab(X) and q_c(X) be the selector polynomials. The gate check then requires, for every point x in the domain:

g(x) := q_a(x)a(x) + q_b(x)b(x) + q_o(x)o(x) + q_ab(x)a(x)b(x) + q_c(x) = 0
Checking whether this polynomial is zero at a random point will convince the verifier that each row of the circuit is computed correctly, due to the Schwartz–Zippel lemma.
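The Schwartz–Zippel idea can be demonstrated in a few lines: a nonzero polynomial of degree d vanishes at a uniformly random field point with probability at most d/|F|, so a single random evaluation distinguishes identical polynomials from tampered ones with overwhelming probability (toy field, illustrative polynomials):

```python
import random

p = 2**61 - 1   # toy prime field

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial given low-to-high coefficients, mod p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

lhs = [1, 2, 1]   # x^2 + 2x + 1
rhs = [1, 2, 1]   # the same polynomial, i.e. (x + 1)^2 expanded
bad = [2, 2, 1]   # a tampered constant term

r = random.randrange(p)
assert evaluate(lhs, r) == evaluate(rhs, r)   # identical polynomials always agree
assert evaluate(lhs, r) != evaluate(bad, r)   # distinct ones collide w/ prob <= 2/p
```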
The proof also needs to ensure that the variables used in the circuit are related to one another, so that the integrity of the computation can be proved. More specifically, there are redundant elements in the vector of circuit variables (particularly between a(X), b(X) and o(X), since the output of one gate is often an input of another), and we must show that these duplicated entries indeed carry the same value.
In PLONK, this is done by showing that the grand product of one list equals the grand product of another - in this case, the second list is simply a permutation of the original list. We want to show that, for a list f = (f_1, ..., f_n) and a permutation g = (g_1, ..., g_n) of it:

f_1 * f_2 * ... * f_n = g_1 * g_2 * ... * g_n

To prove this, the prover creates a 'product polynomial' z(X) over the multiplicative subgroup H = {ω, ω^2, ..., ω^n}, where

z(ω) = 1

and

z(ω^(i+1)) = z(ω^i) * f_i / g_i for 1 ≤ i < n.

The prover may then prove to the verifier about the legitimacy of the wiring checks by two polynomial equations:

L_1(x) * (z(x) − 1) = 0
z(x) * f(x) − z(ωx) * g(x) = 0 for all x in H

(where L_1 is the Lagrange polynomial which is 1 at ω and 0 on the rest of H), which should both equal zero if and only if the grand products are equal: the first pins the running product's starting value to 1, and the second, applied across the whole subgroup, wraps around so that z(ω^(n+1)) = z(ω) forces the ratio of the two grand products to be 1. (The full protocol folds random challenges β, γ and the permutation σ into f and g so that equal grand products actually imply correct wiring, but the mechanics are the same.)
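The grand-product argument can be sketched directly on evaluations over a domain of size n, with index i standing in for the point ω^i (this simplified version checks only product equality, without the β, γ randomization of the full protocol):

```python
p = 2**61 - 1   # toy prime field

def grand_product_check(f, g):
    """Return True iff prod(f) == prod(g) mod p, via the running product z."""
    n = len(f)
    z = [1]   # z at the first domain point is 1
    for i in range(n - 1):
        # z_{i+1} = z_i * f_i / g_i, with division via Fermat inverse
        z.append(z[i] * f[i] % p * pow(g[i], p - 2, p) % p)
    # Check 1: z starts at 1.  Check 2: z(ωx)*g(x) == z(x)*f(x) everywhere;
    # the wraparound at i = n-1 enforces that the two grand products match.
    return z[0] == 1 and all(
        z[(i + 1) % n] * g[i] % p == z[i] * f[i] % p for i in range(n)
    )

f = [3, 5, 7, 11]
assert grand_product_check(f, [7, 3, 11, 5])      # a permutation of f passes
assert not grand_product_check(f, [7, 3, 11, 4])  # an unequal product fails
```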
We can finally combine all checks together into a single polynomial identity test. Define, with a random challenge α drawn by the verifier:

G(x) := g(x) + α * (z(x)f(x) − z(ωx)g(x)) + α^2 * (L_1(x)(z(x) − 1))

If each of the constituent polynomials evaluates to 0 on every point of the subgroup, the identity G(x) = 0 holds across the subgroup; conversely, by the random choice of α, G vanishing on the subgroup implies with overwhelming probability that every individual check passes, and the proof is considered to be correct.
Here is where the work on Pianist comes in, dividing the work of proof generation between the M machines in the network, each of which handles one slice of the circuit.
Define a bivariate polynomial for each wire polynomial by combining the machines' local slices over a second domain V = {v_0, ..., v_(M−1)}, with one point per machine:

A(Y, X) := Σ_{i=0}^{M−1} R_i(Y) * a_i(X)

where R_i(Y) is the Lagrange basis polynomial which equals 1 at v_i and 0 at the other points of V, and a_i(X) is machine i's local slice of the wire polynomial a. Note that for any bivariate extension of a univariate polynomial constructed in this way, we use the uppercase version of this polynomial. Each machine's local checks carry over to the bivariate setting, with the final identity test coming down to:

G(Y, X) := Σ_{i=0}^{M−1} R_i(Y) * G_i(X) = 0

where the left hand side of the equation evaluates to 0 for all Y in V and X in H exactly when every machine's local identity G_i holds on H, since fixing Y = v_i recovers machine i's local check. Each machine can therefore commit to and prove its own slice independently, and the master node aggregates the commitments using the additive homomorphism described earlier. (The full protocol additionally handles wiring constraints that cross machine boundaries; see the Pianist paper for details.)
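The aggregation step can be illustrated numerically: restricting Y to machine j's domain point recovers exactly machine j's local check values, so the aggregate vanishes everywhere iff all local checks do (tiny toy field and domains, illustrative values):

```python
p = 97            # tiny toy field
V = [1, 2, 3]     # toy machine domain: one point per machine

def lagrange_basis(i, y):
    """Evaluate the i-th Lagrange basis polynomial R_i over the points V at y, mod p."""
    num, den = 1, 1
    for j, vj in enumerate(V):
        if j != i:
            num = num * (y - vj) % p
            den = den * (V[i] - vj) % p
    return num * pow(den, p - 2, p) % p

# Local check values G_i(x) for each machine over a toy X-domain of size 4.
# All zeros means every local constraint holds; machine 2 has one bad row.
local = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 5, 0, 0]]

def aggregated(y, x_idx):
    """G(y, x) = sum_i R_i(y) * G_i(x), the master node's combined check."""
    return sum(lagrange_basis(i, y) * local[i][x_idx] for i in range(len(V))) % p

# Fixing Y at machine j's point recovers that machine's local values exactly:
assert all(aggregated(V[j], x) == local[j][x] for j in range(3) for x in range(4))
# So the aggregate is nonzero precisely where a local check fails:
assert aggregated(V[2], 1) != 0
```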
With all of this combined, we can now analyze the final complexity of generating a distributed proof. Define N as the total circuit size and M as the number of machines, so that each machine handles a sub-circuit of size T = N/M. Each machine then performs O(T log T) work on its own slice, sends a constant number of commitments back to the master node, and the master node performs work linear in M to aggregate them, with the proof size and verifier time remaining constant.
With this all defined, we now have a PLONK + KZG proving system that can be computed in a distributed fashion, thanks to the work of Liu et al.