The current bottleneck of our chain is that a unique proof is verified by the whole network for each single 32/64GB sector. Hence, we want to remove the linear amount of proofs being verified onchain. With Hyperdrive, we are already aggregating proofs, and that will give a noticeable throughput improvement. However, onchain verification still contains linear operations, such as individual sector book-keeping, that will soon dominate the cost of verification. In this document we evaluate solutions to remove, as much as possible on a short/mid-term basis, any linearities in the state and in onchain verification.
We want to achieve the following goals:
Optionals:
SnarkPack uses the public inputs in two places, and this proposal removes most linearities in both:
The public inputs for one ProveCommit are composed of multiple pieces:
Deals: With respect to the miner storage power, deals are considered in two places:
There exist two main "practical" solutions (in the sense of not requiring a re-design of everything in Filecoin) to achieve these goals; both are centered around the idea of aggregating the CommRs and keeping only a single aggregate onchain. We call an aggregation of sectors proved at the same time a supersector, which is designated by a single CommRAgg onchain.
There are currently two ways envisioned for doing this:
Size of supersectors: Given we use Groth16 and don't have unlimited recursion available, we must fix a constant size for how many sectors are included in a supersector. If a miner provides fewer sectors than this constant, a simple padding is applied at the proof level.
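As an illustration, a minimal padding sketch; `S` and `PAD_COMMITMENT` are hypothetical constants for illustration, and the real padding happens at the proof level, not on raw commitment lists:

```python
# Hypothetical constants: S is the fixed supersector size, and
# PAD_COMMITMENT stands in for whatever padding value the proof layer uses.
S = 8
PAD_COMMITMENT = b"\x00" * 32

def pad_to_supersector(comm_rs):
    """Pad a miner's list of sector commitments up to exactly S slots."""
    if len(comm_rs) > S:
        raise ValueError("more sectors than the supersector size")
    return comm_rs + [PAD_COMMITMENT] * (S - len(comm_rs))
```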
For each of these solutions, we need to guarantee that the same guarantees are preserved as the ones made by onchain verification; we can use SNARKs to help us. One particular constraint is to enforce the uniqueness of sector IDs: they must all be unique - either by proving they are unique or by showing they are incremental. Indeed, the sectorID is the only (minimal) "variable" for a given miner when sealing a new sector. We want to remove this check from onchain verification, such that there is no need to pass individual CommRs or sectorIDs onchain. There are two envisioned ways of doing this, both compatible with any of the above proposals:
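A minimal out-of-circuit sketch of the incremental-ID idea; `sector_ids_incremental` is a hypothetical helper, not the real circuit constraint:

```python
def sector_ids_incremental(first_id, count, ids):
    """Uniqueness by construction: the IDs must be exactly
    first_id, first_id + 1, ..., first_id + count - 1."""
    return ids == list(range(first_id, first_id + count))
```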
Why can't we simply put the sectorIDs into an accumulator?:
Because the SNARK detects with high probability that the prover used an invalid sectorID, since all the labellings will be incorrect with respect to the valid encoding. However, the prover can still create a new ReplicaID from the same replica, and the proof, by design, allows the prover to cheat up to a certain percentage, like 20%.
Because deals touch many areas in Filecoin, including the market actors, completely removing the linearities around them would probably require a more in-depth restructuring of the state (moving to a fully stateless approach).
The current state of Filecoin does not keep around the CommDs of each sector already, so the mapping is between deal IDs and a sector.
In this proposal, the mapping is now between deal IDs and a supersector, but it keeps the same logic: the prover has to show in a ProveCommit that it knows some leaves corresponding to a CommD that depends on the deals.
Proposal 1 requires the engineering of:
Proposal 2 requires the engineering of:
In both cases, 3 new circuits will have to be developed, and the new way of aggregating sectors into supersectors will have to be implemented on the proofs side as well.
Assume the prover has a list of n deal IDs that he wants to put in the supersector. Let's call D_i the subset of deals he wants to include in the i-th sector.
Note: how deals are partitioned between the different sectors is something to be customized according to specific needs / constraints. This solution enables any kind of subset.
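A sketch of the repartition step; `partition_deals` is a hypothetical helper, and the check that every deal is placed exactly once is an assumption for illustration (the proposal itself allows any kind of subset):

```python
def partition_deals(deal_ids, assignment):
    """assignment[i] lists the positions in deal_ids that go into the
    i-th sector; returns the per-sector deal subsets D_i. Here we only
    check every deal is placed exactly once (illustrative assumption)."""
    placed = sorted(p for positions in assignment for p in positions)
    assert placed == list(range(len(deal_ids))), "each deal placed exactly once"
    return [[deal_ids[p] for p in positions] for positions in assignment]
```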
Prover submits the following:
Steps of the prover:
For the deals, the verifier first has to compute all the CommDs manually before giving them to the aggregation SNARK. This work is linear in the number of deals. As well, the state must store all deal IDs individually, as is currently the case.
Constant public input: the Filecoin API could just give a single value, and the proof system will take care of using the same one at the relevant places. In this case, it has to perform a multiexp like the following:
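The exact formula is not spelled out here; as a hedged illustration of the shape of such a multiexp, with integers mod a prime standing in for BLS12-381 group elements and `r` a hypothetical verifier randomizer:

```python
# Toy model only: integers mod P stand in for group elements; the real
# verification works on BLS12-381 points inside SnarkPack.
P = 2**61 - 1

def batched_input_multiexp(r, inputs_per_proof, bases):
    """Computes sum_i r^i * (sum_j a_ij * U_j) mod P: the kind of
    multiexp a verifier runs over the public inputs of n proofs."""
    acc, r_pow = 0, 1
    for a_row in inputs_per_proof:
        inner = sum(a * u for a, u in zip(a_row, bases)) % P
        acc = (acc + r_pow * inner) % P
        r_pow = (r_pow * r) % P
    return acc
```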
We can't gain anything on the computation by having the same value across all proofs, but this detail will be hidden at the border between Lotus and Filecoin.
For WindowPoSt, we need to make a new circuit with the following characteristics (OR change the inner circuit of ProveCommit):
Trade-off: We can also create an extra "glue" SNARK, similar to ProveCommit, that aggregates the public inputs so we don't change the WindowPoSt circuit. But this solution requires aggregating multiple WindowPoSts since we are using SnarkPack.
Given the circuit above, we can use SnarkPack natively to aggregate WindowPoSt proofs. The work will still be linear, but in the number of supersectors.
The main condition is that merging supersectors can only happen when the total number of sectors stays within the fixed supersector size. Let's say one wants to aggregate two supersectors. Here is some notation:
As noted in the overview section, we must have that .
The circuit looks like the following:
This "naive" mechanism forces all sectors in the supersector to be proven in this new circuit. Indeed, if we allowed a restricted number, the verifier wouldn't know whether the prover is re-proving old proofs or not. So by default, we could assume a supersector contains 2500 sectors. When aggregating with SnarkPack, there would be 1 public input per supersector, so linearity is OK there.
For larger supersector sizes, assuming 2500 sectors is the maximum we can prove inside a WindowPoSt, we can change the above circuit to prove only a subset by including the indexes as public inputs. The circuit will only verify inclusion proofs for the given indexes. In that way, we can prove X sectors in one proof, and X in another proof, both for the same supersector.
In that case, when aggregating with SnarkPack, we would need a number of inputs linear in the number of sectors being proven. These inputs are only indexes, but the verifier needs to know which ones (for example, 0->2500 in one proof, then 2501->5000 in another proof).
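A sketch of the bookkeeping the verifier could do on those index ranges; `ranges_cover_supersector` is a hypothetical helper, assuming inclusive ranges:

```python
def ranges_cover_supersector(ranges, total):
    """Check the per-proof inclusive index ranges [(start, end), ...]
    are disjoint and together cover every index 0..total-1, so the
    verifier knows no sector was skipped or proven twice."""
    expect = 0
    for start, end in sorted(ranges):
        if start != expect or end < start:
            return False
        expect = end + 1
    return expect == total
```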
Similar to proposal 1.
In this proposal, we modify the ProveCommit circuit to check directly inside that the sector belongs to the supersector. The circuit still proves only one sector:
The prover creates a proof for each sector in the supersector, and then aggregates them using SnarkPack natively.
Note how the verifier must, as in Proposal 1, compute the CommDs himself to give as public inputs to the SnarkPack aggregation routine.
SnarkPack at this stage still needs the public inputs of each PoRep: it leads to a still-linear amount of public inputs, but with a much lower constant. For each ProveCommit proof to be aggregated, the verifier needs to provide:
That means there are 6 public inputs per proof. Using 8192 PoReps, that still means 49152 public inputs (6 x 8192). Going to a higher number of proofs using SnarkPack, verification time may suffer because of this linearity.
TODO: to be analyzed more deeply if we want to go with this solution, since the bottleneck lies inside SnarkPack. General consensus seems to be that the current bottleneck is state reading, so that would be fine in this case - except for
Similar to Proposal 1.
Proposal 1 has the following pros:
However, it requires a complex interaction between SnarkPack and the proofs (which may be hard to prove); we're not using SnarkPack as a black box anymore.
Proposal 2 is potentially more elegant, as it brings us closer to the design of fully stateless mining in Filecoin. However, it requires new proof systems for ProveCommit and, unfortunately, still keeps some linearities in the ProveCommit phase.
However, both systems require a new prove commit.
The linearity still present in both systems is due to the deals, and is left to solve. Namely, both proposals remove the need to store individual CommRs, but the individual deal IDs stored inside a supersector are still needed in state. Moving that offchain requires touching many other components of Filecoin, and would most likely end up with the full state being provable offchain, i.e. the "stateless" mining solution.
There is also another way to ensure sectorID uniqueness that removes the need for incremental sector IDs, using accumulators, and that approach may lead closer to the fully stateless mining approach. It is however a bit more complex to implement in a SNARK.
The key problem here is how to build an accumulator that (1) supports non membership proofs and (2) produces proofs that can be efficiently verified in a circuit.
Using accumulators in pairing settings, or even in regular-curve or RSA settings, yields (1) but not (2) given our current SNARK framework (BLS12-381). We could use a different curve like BLS12-377 that enables efficient pairings, or another proof system, but that is unlikely to give a solution short term.
–> Merkle Trees:
Using a Merkle Tree naively fits (2) but not (1). To have (1), we need special properties of the Merkle Tree, and it isn't clear yet how these would impact the computation inside the circuit.
A sorted tree allows proving non-membership by proving the existing neighbors in the tree. Example: to prove 10 isn't included in the tree, prove that 9 and 11 share a common ancestor and there is no leaf for 10 between them. This solution requires complex machinery to update the tree each time we include a new value. If the tree is empty at the beginning, it isn't clear we can do this in a straightforward fashion using our current SNARK toolset (Groth16 / R1CS). We would potentially need more recent constructions that allow incremental computations to support a varying depth of updates.
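A minimal sketch of the neighbor argument on a sorted leaf list, ignoring the Merkle layer; `non_membership_witness` is a hypothetical helper (in the tree version, one would additionally prove the two leaves are adjacent under a common ancestor):

```python
import bisect

def non_membership_witness(sorted_leaves, value):
    """Exhibit the adjacent leaves sandwiching `value` in a sorted leaf
    list. Returns (left, right) neighbors, or None if `value` is
    actually a member (no witness exists)."""
    i = bisect.bisect_left(sorted_leaves, value)
    if i < len(sorted_leaves) and sorted_leaves[i] == value:
        return None
    left = sorted_leaves[i - 1] if i > 0 else None
    right = sorted_leaves[i] if i < len(sorted_leaves) else None
    return (left, right)
```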
Another practical approach is to consider a Merkle Tree full of zeros, with a maximum size defined. To prove non-membership of an entry, we look at its index and prove inclusion of the leaf "0". To update, we simply put the value at the index and prove inclusion of the value.
This gives (1) and (2). However, it might come at a high practical cost, given miners must now keep in memory/on disk a large Merkle tree, regardless of the number of sectors they have currently proven! Potentially we can cache most layers of the tree so miners don't need to compute it from scratch. It will nevertheless probably impact the proving computation negatively.
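A sketch of the zero-initialized tree, assuming sha256 as the hash and a small fixed depth for illustration; a real instantiation would use a SNARK-friendly hash and a much larger depth:

```python
import hashlib

ZERO = b"\x00" * 32

def h(left, right):
    return hashlib.sha256(left + right).digest()

class ZeroTree:
    """Fixed-depth Merkle tree whose leaves all start at zero.
    Non-membership of index i = inclusion proof of the zero leaf at i;
    an update sets leaf i and then proves inclusion of the new value."""

    def __init__(self, depth):
        self.depth = depth
        self.leaves = [ZERO] * (1 << depth)

    def _layers(self):
        layer, layers = self.leaves, [self.leaves]
        for _ in range(self.depth):
            layer = [h(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
            layers.append(layer)
        return layers

    def root(self):
        return self._layers()[-1][0]

    def proof(self, index):
        path, i = [], index
        for layer in self._layers()[:-1]:
            path.append(layer[i ^ 1])  # sibling at this level
            i //= 2
        return path

    def insert(self, index, value):
        assert self.leaves[index] == ZERO, "index already used"
        self.leaves[index] = value

def verify(root, index, leaf, path):
    node = leaf
    for sibling in path:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node == root

# Non-membership of sectorID 3 = leaf 3 is still zero.
t = ZeroTree(depth=3)
assert verify(t.root(), 3, ZERO, t.proof(3))
t.insert(3, b"\x01" * 32)
assert verify(t.root(), 3, b"\x01" * 32, t.proof(3))
```

Caching the upper (all-zero) layers would avoid recomputing the full tree on every proof, which is where the practical cost mentioned above mostly lies.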
@anca @rosario @nikkolasg
@anca @rosario @nikkolasg @nicola
ReplicaID is unique per PoRep, what makes it unique per sector is the SectorID
Aggregation (CommR1, CommR2, CommR3) -> CommRAgg
ProveCommit(CommRAgg, proof, XXXXX)
Problem: If we naively aggregate inputs, we lose track of whether CommRs have been re-used
Circuit(CommRAgg, SuperSectorID, index):
SnarkPack([snark1(CommRAgg, 1)…snarkn(CommRAgg, n)])
Verification(CommRAgg, [1…n])
Way 1: Prover packs all the inputs into the snark
Glue SNARK with replicaID inside of circuit (non-linear):
Way 2: Verifier computes inputs to be passed to snarkpack
SnarkPack with replicaID outside of circuit (non-linear):
w/ luca + nikkolasg
precommit:
prover gives the subsets of dealIDs (which dealIDs are in which sector), and
provecommit:
Verifier recomputes the CommDs from the dealIDs (sha256, so it is fast)
circuit public input takes , as well as the
Regular ProveCommit verifies that for each challenge at a sector, the prover knows the path from the leaf to the root
The aggregation circuit also verifies that each CommD is part of the aggregate - so each public input passed to ProveCommit is also proven to be part of the aggregate; therefore the whole is proven correct, and we have a mapping from dealIDs -> supersector
Linearities:
Computation:
precommit:
prover gives the dealIDs, verifier computes the CommDs himself
provecommit:
circuit public input takes all the CommDs; SnarkPack simply aggregates them normally
verifier needs to load these as public inputs
Linearities:
Computation: