Aggregation proposals

Context

The current bottleneck of our chain is that a unique proof is verified by the whole network for each single 32/64GB sector. Hence, we want to remove the linear number of proofs being verified onchain. With Hyperdrive, we are already aggregating proofs, which will give a noticeable throughput improvement. However, onchain verification still contains linear operations, such as individual sector book-keeping, that will soon dominate the cost of verification. In this document we evaluate solutions to remove linearities from the state and from onchain verification, as much as possible on a short/mid-term basis.

Goals

We want the following goals:

  • The state does not contain a linear amount of information anymore, in particular the individual $comR$s, but only an "aggregation" of comRs.
  • As a consequence, the SnarkPack verifier does not need to read all the public inputs to pass as input to the Fiat-Shamir transform.
  • Likewise, the verifier does not need to read all the public inputs inside the Groth16 equation.

Optionals:

  • Keeping the same circuit for ProveCommit would be ideal
  • Find a way to compress the $comD$s as well

Background

  • The prover is aggregating $n$ proofs
  • Each proof has $p$ public inputs $a_j$ for $j: 0 \ldots p$
  • We denote by $a_{i,j} \in \mathbb{F}$ the $j$-th public input of the $i$-th proof

SnarkPack uses the public inputs in two places, and these are exactly the places where this proposal removes most linearities (a toy sketch follows the list):

  1. Inputs to the Fiat-Shamir transform that generates a random $r$, used for a random linear combination of all the proofs:
    • $r = H(C_p, \{a_{i,j}\}) \in \mathbb{F}$, with $i: 0 \ldots n$ and $j: 0 \ldots p$
    • $C_p$ is a commitment to all the proofs
  2. Verification of the Groth16 equation using the random linear combination:
    • The relevant term that uses these public inputs is:
      $$T = \sum_{j=0}^{p} S_j \cdot \sum_{i=0}^{n} a_{i,j}\, r^i$$
      where $S_j \in \mathbb{G}_1$ is part of the verification key of the Groth16 ProveCommit proof.
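To make these two computations concrete, here is a minimal sketch over a toy prime field; the field size, hash and byte encodings are illustrative stand-ins for the BLS12-381 scalar field and the real transcript encoding:

```python
# Toy sketch of the two places SnarkPack consumes public inputs.
import hashlib

P = 2**61 - 1  # toy prime modulus, stand-in for the BLS12-381 scalar field


def fiat_shamir_r(commitment_cp: bytes, public_inputs) -> int:
    """r = H(C_p, {a_ij}), interpreted as a field element."""
    h = hashlib.sha256(commitment_cp)
    for proof_inputs in public_inputs:
        for a in proof_inputs:
            h.update(a.to_bytes(32, "big"))
    return int.from_bytes(h.digest(), "big") % P


def input_terms(public_inputs, r: int):
    """l_j = sum_i a_{i,j} * r^i: once these are known, the verifier only
    needs T = sum_j S_j * l_j instead of touching every a_{i,j}."""
    n = len(public_inputs)      # number of proofs
    p = len(public_inputs[0])   # public inputs per proof
    return [sum(public_inputs[i][j] * pow(r, i, P) for i in range(n)) % P
            for j in range(p)]


# a_{i,j}: 4 proofs with 3 public inputs each
a = [[(10 * i + j) % P for j in range(3)] for i in range(4)]
r = fiat_shamir_r(b"commitment-to-all-proofs", a)
print(input_terms(a, r))
```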

The public inputs for one ProveCommit are composed of multiple pieces (an illustrative sketch follows the list):

  • ReplicaID = H(ProverID, SectorID, ReplicaSeed, CommD, PoRepID)
    • The hash is computed outside the circuit.
  • The $comR$: this is generated at PreCommit time, so it is currently known to the chain before the miner sends its ProveCommit. One per sector.
  • The $comD$: this is a commitment to the data being embedded in the sector.
  • The challenges $c_k$: these are generated from a fixed seed $s$ from the random beacon (let's say up to $k$ challenges).
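A minimal sketch of these derivations, assuming sha256 and naive concatenation; the real Filecoin derivation uses different domain separation and challenge generation, so this only shows the shape of the computation:

```python
import hashlib


def replica_id(prover_id: bytes, sector_id: int, replica_seed: bytes,
               comm_d: bytes, porep_id: bytes) -> bytes:
    """ReplicaID = H(ProverID, SectorID, ReplicaSeed, CommD, PoRepID),
    computed outside the circuit."""
    h = hashlib.sha256()
    for part in (prover_id, sector_id.to_bytes(8, "big"), replica_seed,
                 comm_d, porep_id):
        h.update(part)
    return h.digest()


def challenges(seed: bytes, k: int, num_leaves: int):
    """Up to k challenges c_k derived from the beacon seed s; here simply
    H(s, k) reduced modulo the number of leaves."""
    return [int.from_bytes(hashlib.sha256(seed + idx.to_bytes(4, "big")).digest(),
                           "big") % num_leaves
            for idx in range(k)]


print(challenges(b"beacon-seed", 3, 1 << 30))
```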

Deals: With respect to the miner's storage power, deals are taken into account in two places (a sketch of the $comD$ computation follows the list):

  • PreCommit time: the miner advertises which deal IDs ($dealID$s) he is integrating into his sector. The commitment (root of a Merkle tree) over these $dealID$s is called $comD$.
  • ProveCommit:
    • The miner proves he knows some leaves whose root is $comD$.
    • The verifier also checks that $comD$ is correctly computed from the $dealID$s advertised at PreCommit.
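A sketch of that commitment, with sha256 and duplicate-last-node padding standing in for the real tree parameters:

```python
import hashlib


def node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def com_d(deal_ids):
    """comD = root of a Merkle tree whose leaves are the dealIDs."""
    layer = [hashlib.sha256(d).digest() for d in deal_ids]
    while len(layer) > 1:
        if len(layer) % 2:              # pad odd layers
            layer.append(layer[-1])
        layer = [node(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


print(com_d([d.to_bytes(8, "big") for d in (17, 42, 99)]).hex())
```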

Overview of the proposals

Aggregating sectors

There exist two main "practical" solutions (in the sense of not requiring a redesign of everything in Filecoin) to achieve these goals; both ideas are centered around aggregating the $comR$s and keeping only a single aggregate $comR^*$ onchain. We call an aggregation of sectors proved at the same time a supersector, which is designated onchain by a single $comR^*$.
There are currently two ways envisioned for doing this:

  1. SnarkPack inputs aggregation: onchain verification now includes verifying a SNARK that proves the aggregation of inputs has been done correctly, and gives the aggregation to the SnarkPack verifier directly; this removes the linearity in the SnarkPack system.
  2. Circuit inputs aggregation: onchain verification now passes only a constant number of inputs per proof, i.e. the $comR^*$, and the circuit checks that the $comR$ belongs to $comR^*$, i.e. that the sector belongs to the supersector.

Size of supersectors: Given that we use Groth16 and don't have unlimited recursion available, we must fix a constant size $n$ for how many sectors are included in a supersector. If a miner provides fewer than $n$ sectors, a simple padding is applied at the proof level.

Enforcing uniqueness of sectors

For each of these solutions, we need to preserve the same guarantees as the ones currently made by onchain verification. We can use SNARKs to help us. One particular constraint is to enforce the uniqueness of sectorIDs: they must all be unique, either by proving they are unique or by showing they are incremental. Indeed, the sectorID is the only (minimal) "variable" for a given miner when sealing a new sector. We want to remove this check from onchain verification, such that there is no need to pass individual $comR$s or sectorIDs onchain. There are two envisioned ways of doing this, both compatible with any of the above proposals:

  1. Incremental SectorIDs: the miner proves inside a SNARK that he is using sequential sectorIDs
  2. Unique SectorIDs: the miner proves inside a SNARK that he is using unique sectorIDs

Why can't we simply put the $comR$s into an accumulator? The SNARK does detect, with high probability, that the prover used an invalid sectorID, because all the labellings would be incorrect with respect to the valid encoding. However, the prover can still create a new $comR$ from the same replica, and the proof, by design, allows the prover to cheat up to a certain percentage, like 20%.

Deals

Because deals touch many areas of Filecoin, including the market actors, completely removing the linearities around them would probably require a more in-depth restructuring of the state (moving to a fully stateless approach).
The current state of Filecoin does not keep the $comD$s of each sector around already, so the mapping is between $dealID$s and a sector.
In this proposal, the mapping is instead between $dealID$s and a supersector, but it keeps the same logic: the prover has to show in a ProveCommit that it knows some leaves corresponding to a $comD$ that depends on the deals.

Overview of engineering costs

Proposal 1 requires the engineering of:

  • 1 aggregation SNARK proof
  • 1 WindowPost proof
  • 1 WinningPost proof
  • 1 merging-supersector proof
  • reworking the SnarkPack library to operate in two steps (mild effort)

Proposal 2 requires the engineering of:

  • 1 PoRep proof
  • 1 WindowPost proof
  • 1 merging-supersector proof
  • 1 WinningPost proof

In both cases, 3 new circuits will have to be developed, and the new way of aggregating sectors into supersectors will have to be implemented on the proofs side as well.

Proposal 1: SnarkPack inputs aggregation with incremental SectorIDs

PreCommit step

  • Generate a unique SuperSector ID, or $SSID$
  • Generate the different sectorIDs $S_i = H(SSID, MinerID, i)$
  • Generate all the respective ReplicaIDs $R_i$ using these $S_i$
  • Generate all the PoReps using the respective sectorIDs, leading to many $comR$s
  • Create a Merkle tree of all of these $comR$s, whose root is $comR^*$
    • We must use a SNARK-friendly hash function here, because we are going to verify inclusion proofs in circuits later on (see the sketch below).
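A sketch of these steps, with sha256 standing in for the SNARK-friendly hash (e.g. Poseidon) and placeholder comRs; only the shape of the derivations is meaningful:

```python
import hashlib


def H(*parts: bytes) -> bytes:
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()


def sector_ids(ssid: bytes, miner_id: bytes, n: int):
    """S_i = H(SSID, MinerID, i)"""
    return [H(ssid, miner_id, i.to_bytes(4, "big")) for i in range(n)]


def merkle_root(leaves):
    layer = list(leaves)
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [H(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


s_ids = sector_ids(b"supersector-1", b"miner-42", 8)
com_rs = [H(b"placeholder-comR-for", sid) for sid in s_ids]  # stand-in comRs
com_r_star = merkle_root(com_rs)  # the only root kept onchain
```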

Deals information

Assume the prover has a list $L = (D_1, \ldots, D_p)$ of deal IDs ($p$ of them) that he wants to put into the supersector. Let's call $L_i$ the subset of deals he wants to include in the $i$-th sector.

Note: the repartition of deals between the different sectors is something to be customized according to specific needs/constraints. This solution enables any kind of subset.

On-chain submission

The prover submits the following:

  • $comR^*$
  • $SSID$
  • the randomness epoch chosen for this PreCommit step
  • deals-related information: all the subsets of deals $L_i$ with $i: 0 \ldots p$. This linear information is kept in the state.

Trade-off

  • It is not strictly required that all sectors use the same PreCommit epoch to get the same randomness. If we don't, it means we still have at most a linear number of epochs to track onchain for the duration of the PreCommit; this eventually gets removed after some time, or after the ProveCommit. As a first step it might be an acceptable cost, because we still get rid of the linear number of $comR$s onchain, and miners don't need huge hardware to make and keep all the proofs at the same time.

ProveCommit

Steps of the prover (a toy sketch of the verifier-side computation follows the list):

  1. The prover gets the seed $s$ and generates the challenges $c_{i,k}$ from it ($i$: sector, $k$: challenge index).
  2. The prover generates all its individual ProveCommit proofs as usual.
  3. The prover starts the SnarkPack procedure, but with a modified $r$:
    • $r = H(C_p, comR^*, s, minerID)$; all of this information is public.
  4. The prover creates an "aggregation" Groth16 proof:
    • Public inputs (statement): $r$, the $comR^*$, the miner ID, $s$, $SSID$, all the $l_j$ (see below), and all the $comD_i$
    • Private inputs (witness): the individual $comR$s, the individual challenges $c_i$, the individual $S_i$
    • Circuit:
      a. Prove correct generation of all the sectorIDs $S_i$ (hashing)
      b. Prove correct generation of all the challenges (hashing)
      c. Prove the Merkle inclusion of all the $comR$s in $comR^*$
      d. Prove correct computation of the $j$-th terms of the linear combination of SnarkPack: $l_j = \sum_{i=0}^{n} a_{i,j}\, r^i \in \mathbb{F}$ for $j: 0 \ldots p$. Note that the prover has access to all the $a_{i,j}$ from the public + private inputs.
      e. Ensure all the $comR$s are different.
  5. The prover sends onchain: the SnarkPack proof + the "aggregation" proof + the $l_j$s.
  6. The verifier loads $comR^*$ and the minerID from the state.
  7. The verifier generates the $r$ from SnarkPack and all the $comD_i$ from the PreCommit information (from all the $L_i$).
  8. The verifier now verifies the "aggregation" proof (now that he has all the public inputs).
  9. If successful, he verifies SnarkPack, using the output of the "aggregation" proof for the random linear combination:
    $$T = \sum_{j=0}^{p} S_j \cdot l_j$$
    This takes time proportional only to the number of public inputs of one ProveCommit, not to the number of proofs.
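A toy sketch of the verifier side (steps 7 to 9): the modified Fiat-Shamir hash over constant-size public values, and the final input term $T$. Integers modulo a prime stand in for $\mathbb{G}_1$ points, and the $S_j$ / $l_j$ values are made up; a real verifier performs an elliptic-curve multi-scalar multiplication here:

```python
import hashlib

P = 2**61 - 1


def fiat_shamir_r(c_p: bytes, com_r_star: bytes, seed: bytes,
                  miner_id: bytes) -> int:
    """r = H(C_p, comR*, s, minerID): only public, constant-size values."""
    digest = hashlib.sha256(c_p + com_r_star + seed + miner_id).digest()
    return int.from_bytes(digest, "big") % P


def input_term(S, l):
    """T = sum_j S_j * l_j: one scalar multiplication per public-input
    slot j, independent of the number n of aggregated proofs."""
    return sum(S_j * l_j for S_j, l_j in zip(S, l)) % P


r = fiat_shamir_r(b"C_p", b"comR-star", b"seed", b"miner-42")
S = [3, 5, 7]     # toy verification-key elements S_j
l = [11, 13, 17]  # l_j terms vouched for by the "aggregation" proof
T = input_term(S, l)
```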

Deals

For the deals, the verifier first has to compute all the $comD_i$ manually before giving them to the aggregation SNARK. This work is linear in the number of deals. As well, the state must store all deal IDs individually, as is currently the case.

Constant $comD$: the Filecoin API could just give a single $comD$, and the proof system would take care of using the same one at the relevant places. In this case, it has to perform a multiexponentiation like the following:
$$\sum_{j=0}^{p} S_j \cdot comD_j$$
We can't gain anything on the computation by having the same $comD_j$ across all $j$, but this detail will be hidden at the boundary between Lotus and Filecoin.

WindowPost

For WindowPost, we need to make a new circuit with the following characteristics (OR change the inner circuit of ProveCommit); reference logic for these checks is sketched after the list:

  • Public inputs (statement): the $comR^*$, the seeds $s_i$ that generate the challenges
  • Private inputs (witness): all the individual $comR$s from the supersector, all the challenges $c_i$ ($x$ for each sector) generated from the seed $s_i$
  • Circuit:
    • verify that all the individual $comR$s are included in $comR^*$
    • for each $comR_i$, take the seed $s_i$ and generate $x$ challenges
      • for each such challenge $x_i$, prove the inclusion of the leaf inside the relevant $comR$ (native WindowPost circuit)
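As plain reference logic (not circuit code), here is roughly what those constraints must enforce, under the same illustrative sha256 assumptions as the earlier sketches; the binding of each challenge to a specific leaf index is elided:

```python
import hashlib


def H(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()


def verify_inclusion(leaf: bytes, path, root: bytes) -> bool:
    """path: list of (sibling, node_is_left) pairs from leaf to root."""
    node = leaf
    for sibling, node_is_left in path:
        node = H(node, sibling) if node_is_left else H(sibling, node)
    return node == root


def check_windowpost(com_r_star: bytes, sectors, seed: bytes, x: int):
    for i, (com_r, path_to_star, leaf_openings) in enumerate(sectors):
        # 1. each individual comR is included under comR*
        assert verify_inclusion(com_r, path_to_star, com_r_star)
        # 2. x challenges per sector from the seed (illustrative derivation;
        #    matching challenges to leaf positions is elided here)
        chals = [hashlib.sha256(seed + bytes([i, k])).digest()
                 for k in range(x)]
        # 3. each challenged leaf opens correctly against that comR
        for _, (leaf, path) in zip(chals, leaf_openings):
            assert verify_inclusion(leaf, path, com_r)


# tiny demo of the inclusion check on a two-leaf tree
leaf0, leaf1 = b"\x01" * 32, b"\x02" * 32
assert verify_inclusion(leaf0, [(leaf1, True)], H(leaf0, leaf1))
```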

Trade-off: We can also create an extra "glue" SNARK, similar to the ProveCommit one, that aggregates the public inputs so that we don't have to change the WindowPost circuit. But this solution requires aggregating multiple WindowPosts, as we are using SnarkPack.

Aggregating WindowPosts

Given the circuit above, we can use SnarkPack natively to aggregate WindowPost proofs. The work will be linear, but in the number of supersectors.

Merging SuperSectors

The main condition is that merging supersectors can only happen when the total number of sectors is at most $n$. Let's say one wants to merge two supersectors. Here is some notation:

  • The first one is designated by $comR^*_1$, with $n_1$ actual sectors $comR_{1,1} \ldots comR_{1,n_1}$ inside.
  • The second one is designated by $comR^*_2$, with $n_2$ actual sectors $comR_{2,1} \ldots comR_{2,n_2}$ inside.
  • The new supersector containing the sectors from both supersectors is designated by $comR^*_a$ and contains $n_1 + n_2$ sectors.

As noted in the overview section, we must have $n_1 + n_2 \le n$.

The circuit looks like the following (a sketch of the prover's out-of-circuit work follows the list):

  • Public inputs: $comR^*_1, comR^*_2, n_1, n_2, comR^*_a$
  • Private inputs: $comR_{1,1} \ldots comR_{1,n_1}, comR_{2,1} \ldots comR_{2,n_2}$
  • Outside the circuit: the prover computes the new Merkle tree whose leaves are all the $comR$s from both supersectors
  • Circuit:
    • For each $comR_{1,i}$, verify the inclusion proof to $comR^*_1$
    • For each $comR_{2,i}$, verify the inclusion proof to $comR^*_2$
    • For each $comR_{i,j}$, with $i = 1,2$, verify the inclusion proof to $comR^*_a$
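A sketch of the prover's out-of-circuit work for a merge, with an example maximum size $n$; the circuit itself only verifies the inclusion proofs listed above. Tree parameters are the same illustrative sha256 ones as before:

```python
import hashlib

N_MAX = 2500  # example maximum number of sectors per supersector


def H(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()


def merkle_root(leaves):
    layer = list(leaves)
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [H(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


def merge_supersectors(leaves1, leaves2):
    """Rebuild comR*_a over the n1 + n2 comRs of both supersectors."""
    assert len(leaves1) + len(leaves2) <= N_MAX, "n1 + n2 must be <= n"
    return merkle_root(leaves1 + leaves2)


com_r_a = merge_supersectors([b"\x01" * 32] * 3, [b"\x02" * 32] * 2)
```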

Constraint & Variations

This "naive" mechanism forces all sectors in the supersector to be proven in this new circuit. Indeed, if we allowed only a restricted number, the verifier wouldn't know whether the prover is re-proving old proofs or not. So by default, we could assume a supersector contains 2500 sectors. When aggregating with SnarkPack, there would be 1 public input per supersector, so the linearity is OK there.

For larger supersector sizes, assuming 2500 sectors is the maximum we can prove inside one WindowPost, we can change the above circuit to only prove a subset, by including the indexes as public inputs. The circuit will then only verify inclusion proofs for the given indexes. That way, we can prove $X$ sectors in one proof and $X$ more in another proof, both for the same supersector.
In that case, when aggregating with SnarkPack, we would need a number of inputs linear in the number of sectors being proven. These inputs are only indexes, but the verifier would need to know which ones (for example, the first 0 -> 2500 in one proof, then 2501 -> 5000 in another proof).

Proposal 2: Circuit inputs aggregation with incremental SectorIDs

PreCommit

Similar to proposal 1.

ProveCommit

Single Sector

In this proposal, we modify the ProveCommit circuit to check directly inside it that the sector belongs to the supersector. The circuit still proves only one sector:

  • Public inputs (statement): $comR^*, seed, i, minerID, SSID, comD_i$
  • Private inputs (witness): $comR$, the inclusion proof for $comR$, the $sectorID$, the $replicaID$, the challenges $x_i$
  • Circuit:
    • Verify correct computation of the $sectorID$ (hashing)
    • Verify correct computation of the ReplicaID
    • Verify correct computation of the challenges $x_i$ (hashing)
    • Verify the inclusion proof from $comR$ to $comR^*$
    • Verify the inclusion proof from $comD_i$ to $comD$ (hashing)
    • Verify the ProveCommit as before (inclusion proofs on the challenges)

Super Sector

The prover creates a proof for each sector in the supersector, and then aggregates them using SnarkPack natively.
Note how the verifier must, as in Proposal 1, compute the $comD_i$s himself to give as public inputs to the SnarkPack aggregation routine.

Linearities

SnarkPack at this stage still needs the public inputs of each PoRep: this still leads to a linear number of public inputs, but with a much lower constant. For each ProveCommit proof to be aggregated, the verifier needs to provide:

  • $comR^*$: must be read from the state, but only once, since all the sectors belong to the same supersector
  • $i$, the index of the sector: no read, the verifier simply inputs this index
  • $seed$: read only once per supersector from the chain info
  • $minerID$: read only once per supersector from the state
  • $SSID$: read only once per supersector from the state
  • $comD_i$: read from the state, per proof (linear)

That means there are 6 public inputs per proof. Using 8192 PoReps, that still means 49152 public inputs. Going to a higher number of proofs with SnarkPack, verification time may suffer because of this linearity.

TODO: to be analyzed more deeply if we want to go with this solution, where the bottleneck lies inside SnarkPack. The general consensus seems to be that the current bottleneck is state reading, so this would be fine in that case, except for the $comD_i$s.

WindowPost

Similar to Proposal 1.

Comparisons & Overview

Proposal 1 has the following pros:

  • The ProveCommit circuit is NOT modified, so there is no distinction in hardware/proving time between miners that use supersectors and miners that don't.
  • It removes all linearities when verifying ProveCommits for a supersector.

However, it requires a complex interaction between SnarkPack and the proofs (which may be hard to prove secure); we're not using SnarkPack as a black box anymore.

Proposal 2 is potentially more elegant, as it brings us closer to the design of fully stateless mining in Filecoin. However, it requires a new proof system for ProveCommit and, unfortunately, still keeps some linearities in the ProveCommit phase.

Note, however, that both proposals require a new ProveCommit procedure.

Areas of improvement

Deals

The linearity still present in both systems is due to the deals; that is left to solve. Namely, both proposals remove the need to store the individual $comD$s, but the individual $dealID$s stored inside a supersector are still needed in the state. Moving that offchain requires touching many other components of Filecoin, and would most likely end up requiring the full state to be provable offchain, i.e. the "stateless mining" solution.

Proving Uniqueness of Sector IDs

There is also another way to ensure sectorID uniqueness that removes the need for incremental sectorIDs, using accumulators, and that approach may lead closer to the fully stateless mining approach. It is, however, a bit more complex to implement in a SNARK.

The key problem here is how to build an accumulator that (1) supports non-membership proofs and (2) produces proofs that can be efficiently verified in a circuit.

"Pure" cryptographic accumulators

Accumulators in pairing settings, or even on regular curves or in RSA settings, yield (1) but not (2) given our current SNARK framework (BLS12-381). We could use a different curve like BLS12-377, which enables efficient pairings, or another proof system, but that is unlikely to give a solution short term.

Merkle Trees

Using a Merkle tree naively fits (2) but not (1). To get (1), we need special properties of the Merkle tree, and it isn't clear yet how these would impact the computation inside the circuit.

Sorted Merkle Tree

A sorted tree allows proving non-membership by opening the existing neighbors in the tree. Example: to prove 10 isn't included in the tree, prove that 9 and 11 share a common ancestor and that there is no leaf in between. This solution requires complex machinery to update the tree each time we include a new value. If the tree is empty at the beginning, it isn't clear we can do this in a straightforward fashion using our current SNARK toolset (Groth16 / R1CS). We would potentially need more recent constructions that allow incremental computation, to support a varying depth of updates.
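A sketch of the neighbor-based witness at the leaf level only; the Merkle openings of the two adjacent leaves, which a circuit would also verify, are as in the earlier inclusion-proof sketches:

```python
def non_membership_witness(sorted_leaves, x):
    """To show x is absent, exhibit adjacent leaves u, v with u < x < v;
    adjacency plus the sorted-tree invariant rules x out."""
    for u, v in zip(sorted_leaves, sorted_leaves[1:]):
        if u < x < v:
            return (u, v)
    raise ValueError("x is a member, or outside the leaf range")


# the example from the text: 9 and 11 prove that 10 is not in the tree
assert non_membership_witness([3, 9, 11, 20], 10) == (9, 11)
```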

A Merkle Tree of "0"s

Another practical approach is to consider a Merkle tree full of zeros, with a defined maximum size $M$. To prove non-membership of an entry $s_i$, we look at index $s_i$ and prove inclusion of the leaf "0". To update, we simply put the value $s_i$ at the index $s_i$ and prove inclusion of the value $s_i$.
This gives (1) and (2). However, it might come at a high practical cost, given that miners must now keep a large Merkle tree in memory/on disk, regardless of the number of sectors they have currently proven! Potentially, we can cache most layers of the tree so miners don't need to compute it from scratch. It will nevertheless probably impact the computation of proofs negatively.
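A sketch of this zero-tree accumulator with a toy depth and sha256: non-membership of $s_i$ is inclusion of the zero leaf at index $s_i$, insertion overwrites that leaf, and the all-zero subtree roots are precomputed once, which is the caching mentioned above:

```python
import hashlib

DEPTH = 16  # toy size: the tree has 2**DEPTH slots


def H(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()


class ZeroTree:
    def __init__(self):
        self.leaves = {}  # only overwritten (non-zero) leaves are stored
        self.zero = [b"\x00" * 32]
        for _ in range(DEPTH):  # cached roots of all-zero subtrees
            self.zero.append(H(self.zero[-1], self.zero[-1]))

    def leaf(self, index: int) -> bytes:
        return self.leaves.get(index, b"\x00" * 32)

    def root(self) -> bytes:
        layer = dict(self.leaves)
        for d in range(DEPTH):
            parents = {}
            for i in {k // 2 for k in layer}:
                left = layer.get(2 * i, self.zero[d])
                right = layer.get(2 * i + 1, self.zero[d])
                parents[i] = H(left, right)
            layer = parents
        return layer.get(0, self.zero[DEPTH])

    def insert(self, s_i: int):
        assert self.leaf(s_i) == b"\x00" * 32, "sectorID already used"
        self.leaves[s_i] = s_i.to_bytes(32, "big")


t = ZeroTree()
assert t.leaf(777) == b"\x00" * 32  # non-membership: zero leaf at index 777
t.insert(777)                       # update: put s_i at index s_i
```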

Notes

Notes 2020-06-16

@anca @rosario @nikkolasg

  • It is highly likely that miners/nodes still need access to the individual comRs for disputing WindowPosts (maybe even for deals?); we should look into this more.
  • We need to make sure the comR* is correctly formed from valid comRs. So at PreCommit time, the prover sends the individual comRs in the transaction data, and verifiers recreate the commitment and validate each individual comR. This is to avoid Anca's attack, where one can reuse previous comRs.
  • We CAN remove the hashing of the challenges inside the extra SNARK if we want to. That means the verifier will have to compute these challenges outside the circuit. We could use a fast hash function like BLAKE3 so that this is done efficiently. Even more so if we use the same challenges across all proofs.
  • Using the same challenges across all proofs needs to be checked with Lucas, to see whether the PoRep would still be secure overall for many proofs. More thought is needed on this topic.
  • Not clear yet whether we can avoid having a new circuit for this.

Notes 2020-06-17

@anca @rosario @nikkolasg @nicola

  • ReplicaID = H(ProverID, SectorID, ReplicaSeed, CommD, PoRepID)
    • ProverID: miner address
    • SectorID: picked by the miner; it can't already be used
    • ReplicaSeed: some randomness picked in the past from the chain (this ties the proof to a fork of the blockchain)
    • PoRepID: the identifier for this type of proof of replication version

ReplicaID is unique per PoRep; what makes it unique per sector is the SectorID.

Aggregation (CommR1, CommR2, CommR3) -> CommRAgg

  • Proof 1 (SectorID) -> CommR
  • Proof 2 (SectorID) -> CommR
  • Proof 3 (SectorID) -> CommR

ProveCommit(CommRAgg, proof, XXXXX)

Problem: If we naively aggregate inputs, we lose track of whether CommRs have been re-used

Input aggregation at circuit layer

  • Circuit(CommRAgg, index):
    • SubCircuit1: there exists a path at index i whose leaf is CommR
    • SubCircuit2: prove that CommR is highly likely to be correctly generated from the SectorID, using some challenges which select some nodes that must be correctly generated and consistent with CommR
      • Check that all Merkle tree witnesses of the challenged nodes are consistent with their positions and with CommR and CommD
      • Check that the labeling of a node = H(ReplicaID||parents)

```
              CommRAgg
             /        \
           ...        ...
     CommR1    ...    CommRn
       /\
      ...
  node_1 ... node_z
```

  • Make all proofs to be aggregated have the same SectorID?
    • This does not work, because sectors with the same sectorIDs lead to the same CommR, and one could generate one replica and aggregate it many times.
  • Subsectors have a SectorID -> this would be a public input to the SNARK:
    • This makes verification time linear AND requires the chain to store SectorIDs in the state, so the state size is linear in the subsectors.
  • Subsectors have a subsectorID which is SuperSectorID||i.

Circuit(CommRAgg, SuperSectorID, index):

  • Private input: the CommR at the given index in the supersector
  • SubCircuit1(i): there exists a path at index i whose leaf is CommR
  • SubCircuit2(i):
    • Prove all the Merkle tree witnesses for the parents and challenged nodes
    • Check that the labeling is done correctly
    • Check that ReplicaID = H(ProverID, SuperSectorID||index, ReplicaSeed, CommD, PoRepID)

SnarkPack([snark_1(CommRAgg, 1) ... snark_n(CommRAgg, n)])

Verification(CommRAgg, [1 ... n])

Input aggregation at aggregation layer

Way 1: the prover packs all the inputs into the SNARK.
Glue SNARK with the ReplicaID inside the circuit (non-linear):

  • Subcircuit 1: ensure the linear combinations are all correctly computed, with inputs that are all aggregated into a CommInputs: a Merkle tree of (CommR1, 1, ReplicaID1), ..., (CommRn, n, ReplicaIDn)
  • Subcircuit 2: for each leaf i, show that
    • the second item is increasing from 1 to n
    • the ReplicaID is constructed as H(...||i)

Way 2: the verifier computes the inputs to be passed to SnarkPack.
SnarkPack with the ReplicaID outside the circuit (non-linear):

  • At verification time, the verifier must generate 1..n and the ReplicaID1..ReplicaIDn
  • Glue circuit:
    • Subcircuit 1: ensure the linear combinations are all correctly computed, with inputs that are all aggregated into a CommInputs: a Merkle tree of (CommR1, 1, ReplicaID1), ..., (CommRn, n, ReplicaIDn)

Meeting 6 July

w/ luca + nikkolasg

  • We should draft something about WinningPost (that means 3 new circuits to realize)
  • We should confirm that no sectorIDs are passed into WindowPost / WinningPost
  • Challenges:

Drafts - workplace - do not look if you don't want to lose time

Deal solution 1

PreCommit:
The prover gives the $L_i$ (subsets describing which dealIDs are in which sector), the $comD_i$, and $comD$.

ProveCommit:
The verifier recomputes the $comD_i$ from the dealIDs (sha256, so it is fast).
The circuit takes $comD$ as public input, as well as the $comD_i$.

The regular ProveCommit verifies that, for each challenge $j$ at sector $i$, the prover knows the path from the leaf to $comD_i$.

The aggregation circuit also verifies that each $comD_i$ is part of $comD$: each public input $comD_i$ passed to a ProveCommit is also proven to be part of $comD$, therefore the whole $comD$ is proven correct, and therefore we have a mapping from dealIDs -> $comD$.

Linearities:

Computation:
  • the verifier needs to compute all the $comD_i$ manually

Storage:
  • the list of deals tied to the supersector, with $comD$

Deal solution 2

PreCommit:
The prover gives the $L_i$; the verifier computes the $comD_i$ himself.

ProveCommit:
The circuit takes all the $comD_i$ as public inputs and simply aggregates them normally.
The verifier needs to load these as public inputs.

Linearities:

Computation:
  • the verifier needs to compute all the $comD_i$ manually

Storage:
  • the list of deals with each $comD_i$