# Verifiable Firehose: Problem Statement and Requirements

# Executive Summary

This document aims to define the design requirements for verifying StreamingFast's Firehose, a software stack that efficiently extracts blockchain data. It also provides a reference for future discussions regarding data integrity within The Graph protocol. The role of verifiability in The Graph, the importance of modular verifiability solutions in a World of Data Services (WoDS) environment, and the necessity of composability are initially discussed. We then present a detailed data pipeline for The Graph, built on Firehose, and outline a problem scope for verification of the Firehose process. The document's second section introduces terminology and metrics used to quantify trade-offs within the design space, accompanied by a brief discussion of the high-level design patterns of existing solutions to inform the design space. A roadmap for near-term design and prototyping of verifiability solutions for Firehose concludes the document.

The benefits of Verifiable Firehose include lowered entry barriers for Indexers, a mechanism to verify that an Indexer possesses the necessary data, and a stepping stone for verifying Firehose-derived queries. Additionally, by delivering Verifiable Firehose, The Graph can potentially provide a verifiable solution to EIP-4444, proving that an Indexer possesses an accurate copy of the Ethereum blockchain's history and that data can be verifiably retrieved from this history.

# Introduction

Firehose is a software stack built by StreamingFast to efficiently extract blockchain data. The data extracted by Firehose can be filtered with *Substreams* to provide real-time data to consumers or further processed with *Substreams modules* to create other downstream data services like GraphQL, SQL, or key-value stores. Extraction, filtering, and processing are performed by Indexers. The Graph is a decentralized and permissionless protocol, so it is critical to verify the integrity of the data served by Indexers.

The goals of this document are to **define the Verifiable Firehose problem statement** and **state the design requirements**. This document will serve as a reference for discussion regarding verifiability within The Graph protocol, in particular with regard to The Graph protocol built on Firehose.

The first section provides a brief discussion regarding the role of **verifiability in The Graph** and how it impacts the decentralization of the protocol, how the **World of Data Services** promotes the need for **modular verifiability solutions**, and why the **composability** of those solutions is important. A detailed **data pipeline** for The Graph built on Firehose is introduced, and the verifiable solution space is scoped for the Firehose process.

The second section focuses on **design requirements**. **Terminology** and **metrics** are introduced; these will be used to quantify **trade-offs** in the **design space**. A brief discussion regarding **high-level design patterns of existing solutions** is included to provide intuition for the design space. This document concludes with a **roadmap** for the near-term design and prototyping of verifiability solutions for Firehose.

## Verifiability in The Graph

The Graph is continuously evolving.
In particular, there are two developments that are closely related to the design of verifiability solutions for The Graph: incrementally removing points of centralization within the protocol (progressive decentralization) and expanding the scope of the services provided by The Graph ([GIP-0042 World of Data Services, WoDS](https://forum.thegraph.com/t/gip-0042-a-world-of-data-services/3761)).

The Graph started as a centralized data provider, i.e. The Graph’s hosted service, with the goal of progressive decentralization. The first step towards decentralization is the migration to the decentralized network. The decentralized network is enabled via smart contracts on the base chain (Arbitrum). These contracts provide incentive mechanisms for encouraging honest Indexer behavior, with the objective of commoditizing the process of serving queries and ensuring that any entity can efficiently participate in the network and serve queries. The architecture of these mechanisms significantly impacts the efficiency, scalability, and ultimately the decentralization of the overall protocol. Currently, the verifiability solution implemented for these mechanisms is a trusted oracle: if an Indexer is suspected of providing erroneous data, then a dispute is initiated (via smart contract) and the dispute is resolved by a trusted entity appointed by The Graph Council. This solution provides a stepping stone towards decentralization by allowing for permissionless participation in the verifiability of the protocol via smart contracts; however, relying on a trusted entity is a point of centralization, and scalability is limited by requiring a human in the loop to arbitrate every dispute. A primary design goal for the verifiability solutions in this document is to automate the dispute process, removing the need for a trusted human in the loop (oracle).

Another significant development is the WoDS vision. In that future, The Graph provides expanded services beyond GraphQL queries. For verifiability, this means there will likely be a diverse, and potentially open-ended, set of verifiability solutions integrated into the protocol, e.g. verifying key-value store queries can be handled differently than verifying SQL queries. Additionally, consumers will likely have different data-security/verifiability needs, e.g. the requirements of a dashboard provider are likely different than those of an on-chain oracle provider. Considering the diversity and complexity of this paradigm, a modular design strategy is likely more effective than attempting to architect a one-size-fits-all solution for all of The Graph’s verifiability needs.

## The Graph’s Data Pipeline(s)

There are several processes in The Graph’s current GraphQL-based data pipeline, and, with the introduction of WoDS, there are many end-to-end pipelines. At a high level, two major pipelines are considered in this document, here called the “Archive Node Pipeline” and the “Firehose Pipeline”; see figure:

![The Graph Stack and its Data Services](https://hackmd.io/_uploads/rkE95_t32.png)

### The Archive Node Pipeline

The archive node pipeline was the first data processing pipeline introduced for providing indexing and querying services for The Graph. This pipeline requires an Indexer to first sync (or make RPC calls to) a blockchain archive node. Graph-Node, operating in index mode, extracts data from the underlying blockchain by making RPC requests to an archive node for the specific data related to the subgraph currently being synced.
The data output from this process is stored in a PostgreSQL database. Graph-Node, in query mode, then responds to user GraphQL queries by first translating them into SQL queries and then querying the PostgreSQL database.

### The Firehose Pipeline

An alternative pipeline can be built using [Firehose](https://firehose.streamingfast.io/) in place of the archive node. In this pipeline, the Indexer must sync a Firehose-enabled blockchain full node (or have access to a Firehose data provider). Firehose extracts data from the underlying blockchain by reading from the [tracing API](https://geth.ethereum.org/docs/developers/evm-tracing) provided by the full node, giving the Indexer access to, e.g., EVM opcode execution traces of smart contract execution in addition to the conventional blockchain data (account balances, transactions, receipts, and block header metadata). The additional data is specifically categorized as *trace data* in this paper because verifying this data might require a different solution from verifying the conventional blockchain data.

The data extracted using Firehose is stored in object storage in Protobuf format (*flat files*). From here, there are multiple potential data pipelines. For example, the data can be processed by a Substreams module for indexing and then be stored in a variety of sinks, e.g. a PostgreSQL database, a key-value store, etc., and made available for querying by users of The Graph protocol. Alternatively, the data from Firehose could be used by a Substreams module to create a real-time streaming service.

### Disputes and Arbitration

*Disputes* and *arbitration* refer to the processes used to ensure that Indexers are operating as expected within The Graph. A dispute is initiated via a smart contract whenever an Indexer is suspected of providing an incorrect response. Arbitration is the process used to resolve the dispute and determines the overall verifiability characteristics of the protocol. Arbitration will include on-chain computation, so these costs are an important consideration in the protocol design.

**Design question:** Should Indexers be required to prove that they possess accurate flat files?

- One thought is that a mechanism could be designed that forces an Indexer to prove that they have a complete copy of a blockchain’s archive. This would encourage decentralization, e.g. Indexers would not be able to simply read from a Firehose endpoint. This seems analogous to PoI, but instead of proving data availability for a subgraph, this proves data availability for the entire blockchain history.

### [wip] Indexing Rewards

*Indexing rewards* are currently used in The Graph to incentivize Indexer participation in the protocol. Whenever an Indexer wishes to collect a payment, they post a **Proof of Indexing** (PoI) for the period they claim to have been indexing. Assuming the PoI is valid, the Indexer is paid the indexing rewards for that period. PoIs can be disputed via the dispute and arbitration process described above. PoIs are currently specific to the GraphQL pipeline in the figure above, where the PoI is a digest of the PostgreSQL database; “PoI” will be used to specifically refer to this process in the remainder of this document. An analogous process for Firehose will be discussed in this document.

## Verifiability of the Firehose Pipeline(s)

There are many potential end-to-end Firehose pipelines; however, all of them must start with Firehose.
This means that all Indexers must either run a Firehose-enabled full node or have access to a Firehose endpoint. On the other hand, each unique pipeline, e.g. Firehose → Uniswap Substreams → PostgreSQL DB, might only be run by a handful of Indexers. Note in particular that the flat files will be identical for all Indexers honestly running Firehose for the same blockchain and that the downstream services will be increasingly specialized.

This has a few implications for verifiability. The verifiability of Firehose data places an upper bound on the verifiability of all downstream processes, i.e. if the data from Firehose is corrupted, data from Substreams consuming that data will be corrupted. Because every Indexer using the Firehose pipeline must use Firehose data, regardless of their end-to-end pipeline, the solution space for verifiability might be broader, e.g. solutions might take advantage of a large number of Indexers accessing the same data to provide verifiability guarantees using an M of N trust assumption.

<aside>
💡 The proposed starting point for verifying the Firehose pipeline(s) is to develop a solution for verifying the Firehose process.
</aside>

### Scoping Firehose Verifiability Solutions

To effectively scope verifiability solutions for Firehose, it is important to understand how Firehose data will be consumed and the questions that Verifiable Firehose answers.

**How does the Indexer trust the data stored in flat files?**

In The Graph, an Indexer is an interface between the blockchain and end users. Each Indexer must directly store an archive (either the full history or a slice) of the blockchain data. Using Firehose, this data is stored in flat files. If The Graph protocol has a mechanism to verify that data was derived from the blockchain, then an Indexer can be challenged to provide accurate blockchain data, so Indexers must ensure that the data they have stored is accurate. One way to ensure this is to sync a Firehose node directly, pulling data from the blockchain’s P2P network and relying on the full node’s logic to check the validity of all data received. Currently, this takes several months for Ethereum and potentially more or less time for other blockchains. An alternative would be to download flat files already generated by another source. The Indexer must either trust the source of the data or a mechanism must be developed to check the validity of the data. To be useful, the checking mechanism must be significantly quicker than a full sync. A mechanism providing this ability may also enable an alternative [sync strategy](https://github.com/ethereum/devp2p/blob/master/caps/snap.md) for archive nodes, e.g. by downloading all block data first (flat files) and then reconstructing the Merkle tries for each block.

**How does the protocol ensure that Indexers have a copy of the correct flat files?**

The answer to this question would address a similar problem to PoIs in today’s protocol. It would provide a mechanism to verify that an Indexer possesses the necessary data to serve queries. In this case, specifically, it would prove that the Indexer has a complete archive of the underlying blockchain data extracted during Firehose execution. This mechanism could serve a similar purpose to PoIs, enabling an Indexer to signal that they have done some required amount of work and are eligible to receive “indexing rewards”.
If Indexers are behaving honestly, and if there are no software bugs or other errors, then they should all have the same, verifiably correct, copy of the flat files.

**How does the protocol ensure that an Indexer is serving data from the correct flat files?**

This is the “verifiable queries” problem for Firehose. Consumers of Firehose data receive a stream of blocks derived from the flat files. The consumer then needs a mechanism to verify that the received block was derived from the correct data set.

**Benefits of Verifiable Firehose to The Graph**

- Lowers the barrier to entry for Indexers: Indexers do not need to sync a Firehose node. They can simply download flat files from another source (i.e. a previously synced node) and verify their correctness.
- Provides a mechanism within the protocol to verify that an Indexer possesses the required information to serve correct Firehose data. This is similar to PoIs, but it proves possession of the entire blockchain history (vs. a single subgraph) and relies on cryptographic proofs and automated smart contract protocols instead of human-in-the-loop arbitration.
- Provides a mechanism for the consumer of Firehose data to verify the correctness of that data. This mechanism will be composable to enable other downstream data services, e.g. verifiable Substreams.
- Note that by delivering Verifiable Firehose, as described in this document, The Graph is positioned to provide a verifiable solution to EIP-4444: it can be proven that an Indexer possesses an accurate copy of the Ethereum blockchain’s history and that data can be verifiably retrieved from this history.

**Another Perspective on Firehose**

Firehose enables an alternative architecture for, or a supplement to, an archive node. Instead of storing verified blockchain data from the P2P network in a database, e.g. LevelDB in Geth or a KVS in Erigon, data is stored in flat files. In order to serve as, e.g., a full node, Firehose needs to be supplemented with a process to recreate the state tree; all of the data required is stored in the flat files, and the supplementary process would need to compute the intermediate Merkle trie nodes. This approach is similar to syncing a full node from [snapshots](https://blog.ethereum.org/2021/03/03/geth-v1-10-0#snapshots).

The input data to Firehose are transactions queried from the blockchain’s P2P network (via a full node), and the output data are the block headers, bodies, receipts, and state data, here called *conventional blockchain data*, along with the additional trace data. The output data are stored in object storage in Protobuf format. There are two notions of verifiability related to the data output from Firehose: verifying that all the data *stored* in object storage is correct, and verifying that some data *retrieved* from object storage is correct; correctness in this case means that the Firehose data is a direct copy of the data in the blockchain. In this document, these two notions are called *verifiable commitments (VC)* and *verifiable queries (VQ)*.
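To make the split between conventional blockchain data and trace data concrete, the sketch below shows a simplified, hypothetical Rust data model for a single block record in a flat file. The field names and structure are illustrative assumptions only; the actual Firehose schema is defined in Protobuf and is considerably richer.

```rust
#![allow(dead_code)]
// Simplified, hypothetical model of one block record in a Firehose flat file.
// Field names and types are illustrative assumptions, not the real Protobuf schema.

/// Conventional blockchain data: committed to by the chain's block headers.
struct ConventionalData {
    header: BlockHeader,         // parent hash, roots, etc.
    transactions: Vec<Vec<u8>>,  // serialized transaction bodies
    receipts: Vec<Vec<u8>>,      // serialized receipts
}

/// Trace data: produced by the node's tracing API; not directly committed to
/// by block headers, so it may need a different verifiability solution.
struct TraceData {
    call_traces: Vec<Vec<u8>>,   // opaque, serialized EVM call/opcode traces
    state_changes: Vec<Vec<u8>>, // per-transaction balance/storage deltas
}

struct BlockHeader {
    number: u64,
    parent_hash: [u8; 32],
    state_root: [u8; 32],
    transactions_root: [u8; 32],
    receipts_root: [u8; 32],
}

/// One record in a flat file = conventional data plus trace data for a block.
struct FlatFileBlock {
    conventional: ConventionalData,
    trace: TraceData,
}

fn main() {
    // A flat file is, conceptually, a sequence of such records in object storage.
    let _flat_file: Vec<FlatFileBlock> = Vec::new();
}
```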
## Commitments, Verifiable Queries, and Verifiable Commitments

A *query* is a request for data from a dataset. A *verifiable query* is a query whose response can be verified to be the specific data requested from a specific dataset. Two requirements must be satisfied to verify a query result: 1) the specific query was correctly executed, and 2) the query result was derived from the expected dataset. If both of these requirements are not met, then there can be no guarantees about the validity of the query result.

A trivial verifiable query would be for the verifier to maintain an exact copy of the dataset used by the prover and to execute the same query against that dataset. The verifier could then check that the results match, verifying the result. This approach requires the verifier to maintain an exact copy of the dataset, which imposes significant resource requirements.

Cryptographic *commitments* provide a mechanism for a prover and verifier to perform computations relative to some dataset without requiring the verifier to maintain the dataset. A commitment binds a prover to a particular dataset, and a VQ from that dataset can be proven with respect to the commitment. The important question is: how does the verifier trust that the commitment was computed correctly? If the commitment is incorrect, then the VQ is proven with respect to a dataset that the verifier does not expect, and so the VQ cannot be trusted. A verifiable commitment is used to ensure that the commitment was computed as expected by the verifier.

For example, in Ethereum, block headers (which contain the roots of the Patricia-Merkle tries) are commitments to the data “stored” in the blockchain. Verifiable key-value queries can be proven by computing a Merkle inclusion proof with respect to a given block header (commitment). There are three ways to compute a VC for Ethereum block headers:

1. The verifier could recompute all block headers for the entire Ethereum history, starting from genesis up to the block of interest.
2. Alternatively, the verifier could use a light client approach, where the verifier trusts a provider to give the correct block header at a checkpoint. This is the approach used by light clients like [Helios](https://github.com/a16z/helios/blob/ff800484bc0cb4d5ea55979da235f555eee1c90c/config.md?plain=1#L15).
3. Another light client approach to obtaining a VC is to use a proving system along with the verifiability infrastructure provided by PoS Ethereum (e.g. the [Sync Committee](https://github.com/ethereum/annotated-spec/blob/master/altair/sync-protocol.md)) to obtain verified block headers with minimal trust assumptions. This is the approach taken by [Kevlar](https://kevlar.sh/), which uses a refereed game with a 1 of N trust assumption. The Sync Committee signs block headers attesting to their validity → the signed block headers are a VC.

**Prioritization**

<aside>
💡 There are six identified challenges for verifying Firehose data:

- Verification that the conventional data in flat files are correct
- VC for Conventional Data
- VQ for Conventional Data
- Verification that trace data in flat files are correct
- VC for trace data
- VQ for trace data
</aside>

The verifiability of Conventional Data benefits from the fact that this data is verifiably committed to by the blockchain, and so VC and VQ solutions for this data may build on top of this, e.g. Conventional Data can be proven to be included in the blockchain via Merkle inclusion proofs relative to a specific block header (see the sketch below).

**Discussion Point:** It is less clear how Firehose trace data might be verified. An argument might be made that verifying this data should have a lower priority because this data has not previously been included in the protocol and is not currently used.
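As a concrete illustration of the Merkle-inclusion approach for Conventional Data mentioned above, the sketch below verifies a simplified inclusion proof for a leaf against a root taken from a block header. A plain binary Merkle tree and a toy hash (std’s `DefaultHasher`) stand in for Ethereum’s hexary Patricia trie and keccak256, so this is a minimal model of the idea, not production verification code.

```rust
// Minimal sketch: verify that a leaf is included under a root committed to by
// a block header. Binary Merkle paths and a toy hash replace Ethereum's
// Patricia tries and keccak256 (illustrative assumption only).

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_leaf(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// One step of the proof: the sibling hash and whether it sits on the left.
struct ProofStep {
    sibling: u64,
    sibling_is_left: bool,
}

/// Recompute the root from a leaf and its Merkle path, then compare it to the
/// root committed to in the block header (the VC for this query).
fn verify_inclusion(leaf: &[u8], proof: &[ProofStep], header_root: u64) -> bool {
    let mut acc = hash_leaf(leaf);
    for step in proof {
        acc = if step.sibling_is_left {
            hash_pair(step.sibling, acc)
        } else {
            hash_pair(acc, step.sibling)
        };
    }
    acc == header_root
}

fn main() {
    // Tiny two-leaf tree: root = H(H(leaf_a), H(leaf_b)).
    let leaf_a: &[u8] = b"account:0xabc";
    let leaf_b: &[u8] = b"account:0xdef";
    let root = hash_pair(hash_leaf(leaf_a), hash_leaf(leaf_b));

    // Prove that leaf_a is included under the header's root.
    let proof = vec![ProofStep { sibling: hash_leaf(leaf_b), sibling_is_left: false }];
    assert!(verify_inclusion(leaf_a, &proof, root));

    // A tampered leaf fails verification.
    assert!(!verify_inclusion(b"account:0x999", &proof, root));
    println!("inclusion proof checks passed");
}
```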
### Composability

Composability is applied to two relationships in this document. One is concerned with the composability of VC and VQ solutions, and the other is the composability of sequential processes in a data pipeline.

In general, VQ requires VC. However, this does not mean that VC should be developed completely independently of VQ. The type of commitment used impacts the verifiability of queries, so care must be taken to ensure that the VC solution is composable with a VQ solution.

The data output from Firehose is consumed by other processes in the pipeline, e.g. Substreams. A design goal is to develop verifiability solutions that can be composed with other processes in the pipeline and reduce the overhead costs for verifying an end-to-end pipeline. For example, VQs from Firehose could feed VCs to Substreams.

# Roadmap

There are three verifiability needs for Firehose identified in the problem statement above:

1. Verification of flat file correctness
2. Verification of a commitment to a valid set of conventional data in flat files (VC)
3. Verification that data requested from a flat file is correct (VQ)

## **Verification of flat file correctness**

Develop a mechanism for an Indexer to verify that the data stored in a flat file is correct according to the blockchain consensus. In this case, the Indexer is the verifier and they have the complete set of data in the flat file. The mechanism will provide a *validity proof* that the data is correct according to headers (commitments) from the underlying blockchain.

### Potential Solution

One approach to verifying the correctness of the flat files is to reconstruct the Merkle trie for each block based on the data stored in the flat file. The Indexer would then need to sync a consensus client to verify that the headers are as expected. This approach seems similar to snap sync; a minimal sketch of the per-block check is given at the end of this section.

This solution proposes a validity proof based on the headers. The security of this approach depends on how the headers are verified. If the headers are verified by fully syncing a consensus client, then the security depends only on the security of the underlying blockchain. If an ultralight client approach, like Kevlar, is used, then the security assumption is optimistic 1 of N, where N is the number of Indexers participating as provers for header verification.

This approach uses only existing primitives. Reconstructing the Merkle tries from block data is already done in other full node architectures, e.g. snap sync. The engineering challenge will be ensuring that trie reconstruction is fast enough to be composed with other solutions, e.g. verifiable commitments and verifiable queries.
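The following is a minimal sketch of the per-block check under simplifying assumptions: it recomputes a commitment over a block’s transactions from the flat-file data and compares it to the corresponding root in a header that has already been verified (e.g. via a synced consensus client). A real implementation would rebuild the actual Patricia-Merkle tries (transactions, receipts, state) with RLP and keccak256; here an ordered binary Merkle tree over a toy hash stands in for that.

```rust
// Sketch of flat-file correctness checking for one block, under simplifying
// assumptions: an ordered binary Merkle tree over a toy hash stands in for the
// transactions trie, and `VerifiedHeader` is assumed to have already been
// checked against the consensus layer (e.g. by a synced consensus client).

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct VerifiedHeader {
    number: u64,
    transactions_root: u64, // toy stand-in for the header's 32-byte root
}

struct FlatFileBlock {
    number: u64,
    transactions: Vec<Vec<u8>>, // serialized transactions read from the flat file
}

fn hash_bytes(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// Recompute the transactions commitment from flat-file data.
fn transactions_root(txs: &[Vec<u8>]) -> u64 {
    let mut level: Vec<u64> = txs.iter().map(|tx| hash_bytes(tx)).collect();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| hash_pair(pair[0], *pair.last().unwrap()))
            .collect();
    }
    level.first().copied().unwrap_or(0)
}

/// A block in the flat file is accepted only if the recomputed root matches
/// the root committed to by the (already verified) header.
fn check_block(block: &FlatFileBlock, header: &VerifiedHeader) -> bool {
    block.number == header.number
        && transactions_root(&block.transactions) == header.transactions_root
}

fn main() {
    let block = FlatFileBlock {
        number: 17_000_000,
        transactions: vec![b"tx-1".to_vec(), b"tx-2".to_vec(), b"tx-3".to_vec()],
    };
    let header = VerifiedHeader {
        number: 17_000_000,
        transactions_root: transactions_root(&block.transactions),
    };
    assert!(check_block(&block, &header));
    println!("flat-file block matches the verified header");
}
```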
## **Verification of a commitment to a valid set of conventional data in flat files (VC)**

The solution above for verifying flat file correctness is useful for Indexers; however, it does not, by itself, provide a solution that can be used with verifiable queries from flat files. Specifically, the solution above assumes that the verifier has access to all the data stored in the flat file. The verifiable commitment in this section assumes that the verifier has a commitment to the data in the flat file, i.e. they do not have access to all of the data stored in the flat file, and that they can verify that the commitment was generated correctly.

Develop a mechanism enabling an Indexer to efficiently prove that they possess a correct set of flat files. In this case, the verifier is a consumer of The Graph protocol. By definition, the verifier does not have access to the full set of data contained in the flat file, so a commitment to the data is used. To provide downstream composability, this commitment should be useable in the VQ solution for Firehose. This commitment might also be used for enabling indexing rewards, similar to how PoIs are used today.

In this architecture, the prover/Indexer must convince the verifier/consumer that they possess an accurate set of flat files, relative to the history of the blockchain, without requiring that the verifier has a complete set of flat files. This implies that the prover must commit to the data stored in the flat files. The architecture for this solution depends on the use case. Here we assume the case where the VC is used analogously to PoIs for indexing rewards; the case where the VC is used as the foundation for a VQ is discussed in the following section. When the VC is used to claim indexing rewards, the Indexer will make a smart contract call to submit the commitment along with a proof that it was computed correctly. This proof can then be used to settle disputes in arbitration.

### Potential Solution

One possible approach would be to commit to the block headers for every block using a vector commitment scheme, e.g. a Merkle trie, Verkle trie, PCS-based vector commitment, etc. The commitment could then be opened at any block, and the block header could be verified using consensus information, i.e. headers signed by a valid sync committee. The challenge is that such a commitment can be constructed using only header information; the Indexer doesn’t need to know the leaf data for each block. What is needed is a two-tiered opening proof to prove that the Indexer knows state leaf information as well as block headers. This might be accomplished by having the Indexer produce an inclusion proof for a random leaf of a random block. The PoI analog in this case would be the VC along with the two-tiered opening proof at random challenge points. The proof could be checked for correctness in a smart contract, yielding a validity proof. The proof would check that the block header opening was signed by a valid sync committee, that the opening proof relative to the commitment was correct, and that the opening proof for the leaf data relative to the block header was correct. A minimal sketch of this two-tiered check is given at the end of this section.

The overall security of this scheme would depend on the process used to identify the correct sync committee. There are a few options for determining correct sync committees using the consensus layer information provided by Ethereum. One option is to start from the genesis sync committee and verify the chain of sync committee signatures up to the epoch under validation. Another option would be to trust an oracle to provide the sync committee for a specific epoch; this approach would be similar to the light client sync strategy used by solutions like Helios.

Commitment schemes are well understood, and this approach does not require any modification to existing schemes. From a research standpoint, it needs to be proven that the randomized two-tiered opening challenge is sufficient to prove that the Indexer has knowledge of the entire archive up to a specified block. From an engineering standpoint, the Merkle tree reconstruction and proof generation must be performant enough to generate real-time inclusion proofs.
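The sketch below illustrates the structure of the two-tiered check under simplifying assumptions: tier one opens the Indexer’s vector commitment (modeled as a binary Merkle tree over header hashes) at a challenged block, and tier two opens that block’s data commitment at a challenged leaf. The sync committee signature check is stubbed out, and a toy hash replaces keccak256; this is an illustration of the shape of the check, not the proposed proof system.

```rust
// Sketch of a two-tiered opening check, under simplifying assumptions:
// tier 1 opens the Indexer's commitment over all block headers at a random
// block; tier 2 opens that block's data commitment at a random leaf.
// A toy hash and binary Merkle paths replace keccak256/Patricia tries, and the
// sync committee signature check is a stub.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn h1(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

fn h2(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (left, right).hash(&mut h);
    h.finish()
}

struct Step { sibling: u64, sibling_is_left: bool }

fn verify_path(mut acc: u64, path: &[Step], root: u64) -> bool {
    for s in path {
        acc = if s.sibling_is_left { h2(s.sibling, acc) } else { h2(acc, s.sibling) };
    }
    acc == root
}

/// Stub: a real check would verify an aggregate sync committee signature over the header.
fn header_signed_by_sync_committee(_header_hash: u64) -> bool { true }

struct TwoTierProof {
    block_number: u64,      // challenged block
    data_root: u64,         // data root contained in that block's (simplified) header
    header_path: Vec<Step>, // tier 1: header inclusion in the Indexer's commitment
    leaf: Vec<u8>,          // challenged leaf data from the flat file
    leaf_path: Vec<Step>,   // tier 2: leaf inclusion under data_root
}

fn verify_two_tier(commitment: u64, proof: &TwoTierProof) -> bool {
    // Recompute the header hash from its (simplified) contents so the leaf
    // opening is bound to the same header that sits under the commitment.
    let header_hash = h2(proof.block_number, proof.data_root);
    header_signed_by_sync_committee(header_hash)
        && verify_path(header_hash, &proof.header_path, commitment)
        && verify_path(h1(&proof.leaf), &proof.leaf_path, proof.data_root)
}

fn main() {
    // Two blocks; the commitment is a Merkle root over their header hashes.
    let data_roots = [h2(h1(b"block0-leaf-a"), h1(b"block0-leaf-b")),
                      h2(h1(b"block1-leaf-a"), h1(b"block1-leaf-b"))];
    let header_hashes = [h2(0, data_roots[0]), h2(1, data_roots[1])];
    let commitment = h2(header_hashes[0], header_hashes[1]);

    // Challenge: block 1, leaf "block1-leaf-a".
    let proof = TwoTierProof {
        block_number: 1,
        data_root: data_roots[1],
        header_path: vec![Step { sibling: header_hashes[0], sibling_is_left: true }],
        leaf: b"block1-leaf-a".to_vec(),
        leaf_path: vec![Step { sibling: h1(b"block1-leaf-b"), sibling_is_left: false }],
    };
    assert!(verify_two_tier(commitment, &proof));
    println!("two-tiered opening verified");
}
```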
## [wip] **Verification that data requested from a flat file is correct (VQ)**

Develop a mechanism enabling an Indexer to prove that the response to a request for data from Firehose is correct relative to a commitment. As discussed, VQ requires a verifiable commitment to the dataset being queried to enable verification without requiring the full dataset. This may or may not be the same as the VC used in the previous section. It is necessary to determine what a **unit** of Firehose data is; the verifiability solution for VQ will then be defined for this unit of data.

Because Firehose data is sourced directly from the blockchain, it seems reasonable that existing verifiability solutions for the blockchain should be applicable. For example, by reconstructing the Merkle tries as in the section above, it should be possible to produce Merkle inclusion proofs for accounts in a given block. Further, using the VC above, it should be possible to produce inclusion proofs for a given block in the history of the blockchain.

## Milestones

The following milestones are written in roughly expected chronological order. Verifying flat file correctness seems to be the most straightforward, requiring the least research effort and mostly engineering effort using existing techniques, and prototyping can start almost immediately. The output of these milestones will be Rust/Solidity prototypes demonstrating that the integrated systems will be performant enough for integration into Indexer operations built on Firehose.

**Verification of Flat File Correctness**

Specify solution architecture:

- Requires specifying how to construct Merkle tries from block data stored in flat files
- Requires specifying how to verify the correctness of block headers, e.g. using the Ethereum consensus layer

Implement prototype and compute expected costs:

- Rust-based prototype
- A module that reads flat files and computes the Merkle tries and roots from the data
- A module that obtains consensus layer information and checks the validity of the latest root output from the flat file trie. For Ethereum, block headers (state roots) can be verified via the consensus layer by checking that the header has been signed by a valid sync committee.
- No smart contracts

**VC for Conventional Data**

Specify solution architecture:

- Prove the soundness/security of two-tiered opening proofs
- Specify the scheme for committing to block headers, e.g. Merkle trees, Verkle trees, etc.
- Requires specifying how to verify the correctness of block headers, e.g. using the Ethereum consensus layer (re-use the solution above)

Implement prototype and compute expected costs:

- Rust-based prover and verifier
- A smart contract verifier might be needed depending on the use case of the VC, e.g. if the commitment is used to distribute indexing rewards

[wip] **VQ for Conventional Data**

Specify solution architecture

Implement prototype and compute expected costs:

- Rust-based prover and verifier
- Smart contract verifier needed for the dispute and automated arbitration process

### Relationship to The Graph’s Roadmap

The solutions described here provide a mechanism to reward Indexer participation in the network (to provide Firehose data) while ensuring that Indexers are participating honestly, by designing verification protocols that do not require human-in-the-loop arbitration. This improves the scalability and decentralization of the protocol. Additionally, by focusing on verifiability solutions for Firehose, these solutions provide a foundation for building verifiable data services for pipelines consuming Firehose data in the WoDS vision.

# Appendix

## Alignment with Horizon

Horizon is very flexible and allows for modular verifiability solutions, specified through the collateralization contract.
The architecture defined in this paper only assumes an arbitration smart contract used to settle disputes. Horizon allows applications to specify their own slashing authority, and so the arbitration contracts developed here are directly useable within Horizon. For example, a trusted SQL database manager (i.e. a verifiable commitment) could be arbitrated via Horizon collateralization; verifiable SQL queries can then be served relative to the commitment from the manager.

### The Role of Verifiability in Horizon

In the context of Horizon, the role of verifiability can be succinctly stated as: **Verifiability automates the enforcement of correctness warranties.** Similar to the above, automation reduces trust and increases the scalability of the protocols built on the network.

## Trust Assumptions

The trust assumptions described below are primarily concerned with how trust is distributed within a network. In particular, a key distinguishing feature is how the truth is verified within the network.

### **Oracle** - ∞

The trust within the network is placed on a single entity. Networks with this trust assumption rely on a trusted entity (an oracle) to verify the truth within the network. It is relatively easy to implement a system relying on an oracle for verifiability, as all it requires is a mechanism to prove that the oracle has made a certain decision; this can be accomplished with, e.g., a digital signature. The trusted entity becomes a point of centralization in systems relying on this trust assumption. In addition to the overhead required to ensure that the oracle is trustworthy, the scalability of the system is limited by the arbitrator’s dispute-processing throughput. There are no proofs for the correctness of statements in these networks; the oracle’s decision is the “proof”. Disputes are resolved by the oracle.

The Graph’s current verifiability solution relies on this trust assumption. Specifically, the [arbitrator](https://hackmd.io/@4Ln8SAS4RX-505bIHZTeRw/BJcHzpHDu) specified in the dispute smart contract is trusted to determine the correctness of disputed Proofs of Indexing and Attestations.

### **Quorum** - M of N

Trust is distributed among participants in a protocol. Network participants (or a subset of the participants) vote on the correctness of statements within the network, and trust is placed in the honesty of a subset of participants. These networks trade complexity (e.g. consensus protocols) for distributing trust across multiple participants in the network. The effectiveness of verifiability within this protocol depends on the number of independent, self-interested and incentivized participants in the quorum → more participants = more trust. These protocols use public key cryptography (e.g. digital signatures) to ensure that votes are from valid network participants. Additionally, these networks rely on economic incentives to maintain security. There are no proofs for the correctness of statements in these networks; what the quorum says is the truth. Disputes are resolved by the quorum.

Consensus protocols can be viewed as relying on this trust assumption; other systems fall into this category as well. One example is [Chainlink](https://research.chain.link/whitepaper-v1.pdf). The verification protocol used in these schemes verifies that oracle answers are provided by valid members of the network that actually queried the data, and prevents freeloading (oracles cheating by replaying results from other oracles).
The answers from participating oracles are aggregated off-chain using a threshold signature scheme combined with a commitment scheme. As long as a quorum of 2/3 of the participating oracles is honest, the verification scheme is sound. [wip] Another interesting example is the [Portal Network](https://github.com/ethereum/portal-network-specs/tree/master).

### **Optimistic** - 1 of N

Trust is placed in the assumption that a single participant in a network will be honest. Statements from network participants are assumed to be correct. If there is ever a discrepancy between two or more participants, indicating an incorrect response, then a dispute protocol can be initiated which can be used to prove which participant is incorrect (Fraud Proof). An important property of the dispute protocol is that the communication and/or computation used to generate and verify the proof is lower than naively verifying by recomputing the statement verbatim. These protocols trade more complicated dispute resolution protocols for lower trust assumptions. The Fraud Proof in these networks lowers the trust threshold for verification: a verifier only trusts that a single participant in the protocol is honest. Disputes are resolved by the Fraud Proof.

The Optimistic Layer 2 protocols [Optimism](https://optimism.mirror.xyz/fLk5UGjZDiXFuvQh6R_HscMQuuY9ABYNF7PI76-qJYs) and [Arbitrum](https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-kalodner.pdf) and the super-light client [Kevlar](https://github.com/lightclients/kevlar/tree/main) rely on Optimistic trust assumptions. At a high level, all three protocols have a similar architecture. A verifier requests responses from multiple participants in the network; if there is a discrepancy between any of the responses, then a refereed game-based proving system is initiated to determine the correct response. If there is no discrepancy, then it is assumed that the response is correct.

[wip] Need to clarify the roles of participants and the type of information available. Optimistic trust assumptions assume a fisherman role, who has all information available to the Indexer and so can detect fraud. Validity trust assumptions don’t need a fisherman: the end user can check the validity of the proof and can dispute if the proof check fails, without needing all information available to the Indexer.

### **Validity** - 0

Trustless. Every interaction within the protocol can be confirmed to adhere to the rules of the protocol. Statements from network participants include a proof that the statement is correct (Validity Proof), and verification amounts to checking that the proof is valid. An important feature of checking the proofs is that the communication and/or computation costs for checking the proof are lower than naively verifying by recomputing the statement verbatim. These protocols trade computation/communication overhead for trustlessness: a proof must be generated and transmitted for every statement, but the verifier does not have to trust any entity regarding the validity of the statement. Validity proofs enable trustless verification. Disputes are resolved by the Validity Proof.

The zk Layer 2 protocols rely on the Validity trust assumption. These protocols use SNARKs to prove the correctness of state transitions resulting from transactions. If the SNARK proof is valid, then the state can be advanced.
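To summarize how dispute resolution differs across these trust assumptions, the sketch below models each category as a variant of a hypothetical arbitration interface. The types and checks are illustrative only and do not correspond to any existing contract or library.

```rust
// Hypothetical, illustrative model of how an arbitration component might
// resolve a dispute under each trust assumption described above. None of the
// types or checks correspond to an existing contract or library.

#[derive(Debug, PartialEq)]
enum Verdict { IndexerCorrect, IndexerFaulty }

#[allow(dead_code)]
enum TrustAssumption {
    /// Oracle (∞): a single trusted entity signs the verdict.
    Oracle { signed_verdict: Verdict },
    /// Quorum (M of N): the verdict with at least the threshold of votes wins.
    Quorum { votes_faulty: usize, total: usize, threshold_num: usize, threshold_den: usize },
    /// Optimistic (1 of N): a verified fraud proof, if one exists, decides the outcome.
    Optimistic { fraud_proof_verified: bool },
    /// Validity (0): every response carries a proof; an invalid proof is a fault.
    Validity { validity_proof_verified: bool },
}

fn resolve(dispute: TrustAssumption) -> Verdict {
    match dispute {
        TrustAssumption::Oracle { signed_verdict } => signed_verdict,
        TrustAssumption::Quorum { votes_faulty, total, threshold_num, threshold_den } => {
            // e.g. 2/3 of the quorum must agree that the response was faulty.
            if votes_faulty * threshold_den >= total * threshold_num {
                Verdict::IndexerFaulty
            } else {
                Verdict::IndexerCorrect
            }
        }
        TrustAssumption::Optimistic { fraud_proof_verified } => {
            if fraud_proof_verified { Verdict::IndexerFaulty } else { Verdict::IndexerCorrect }
        }
        TrustAssumption::Validity { validity_proof_verified } => {
            if validity_proof_verified { Verdict::IndexerCorrect } else { Verdict::IndexerFaulty }
        }
    }
}

fn main() {
    // Today's protocol: the trusted arbitrator's decision is the outcome.
    assert_eq!(
        resolve(TrustAssumption::Oracle { signed_verdict: Verdict::IndexerFaulty }),
        Verdict::IndexerFaulty
    );
    // Target for Verifiable Firehose: an automatically checkable proof decides.
    assert_eq!(
        resolve(TrustAssumption::Validity { validity_proof_verified: true }),
        Verdict::IndexerCorrect
    );
}
```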
### Combining Trust Assumptions

To achieve specific performance requirements, protocols can be composed of sub-protocols, each relying on different trust assumptions. For example, Validiums (e.g. zkPorter) rely on Validity proofs to prove the correctness of state transitions, but instead of directly verifying the proof on chain (as a zk Layer 2 does), the protocol relies on a separate protocol to verify the correctness of the data required for verifying the proof, e.g. a data availability committee (Quorum), a trusted data provider (Oracle), etc.

## [wip] Metrics

**Ease of Implementation**

Do off-the-shelf solutions exist? How many changes to existing protocols are required? How complex are the underlying cryptographic protocols?

**Decentralization**

How do the architectures impact decentralization?

**Security**

In some cases, lowering trust assumptions relies on more novel cryptography. Increasing complexity increases the surface area for mistakes.

**Communication and Computation**

Proving and verifying a statement requires some communication and computational overhead.

**Speed/Latency**

Proof generation and verification are additional overhead costs. The more complex the proof system, the larger the overhead costs.

**Interactivity**