## [Proposal] Prover model for Aztec's RFP (Randomness, Staking and Reputation) - draft/v1

**UPDATE:** The **final version** of our team's proposal has been published here: https://forum.aztec.network/t/proposal-decentralized-prover-network-staking-reputations-and-proof-races/2489/1

______________________________

Authors: Norbert, Nilu, Rachit

### Summary:

The proposed prover selection and incentive model is a staking-based, in-protocol mechanism in which provers are selected through a VRF from a pool of the N provers with the highest reputation score. We are proposing an enshrined prover network with the ability for prover pools to join. The model promotes permissionless entry, liveness and a certain level of competition (through reputation scores and a secondary "backup prover" mechanism), honest behavior (through staking and slashing), and fair and inclusive prover selection (through randomness).

### Decentralization:

Decentralization is a key element in Aztec's RFP; we therefore propose implementing the indexes below to monitor the health and balance of the prover network and to minimize inequalities.

* **Geographic Diversity (GD)**: The goal of this index is to monitor network health and to prevent centralization, monopolies and Sybil attacks. Maximize **GD = Total distinct geographical regions / Total number of provers in the network**
* **Resource Distribution Inequality (RDI)**: The primary goal is to quantify the overall inequality, taking into account both within-group and between-group inequality, in the distribution of resource concentration among provers in the network.
  * *Computing Power Concentration (CPC)*: the Theil T index for the computing power distribution.
  * *Stake Concentration (SC)*: the Theil T index for the stake distribution.
  * Minimize **RDI = CPC + SC**
  * A maximum level of acceptable inequality is to be defined.

### Permissionless entry:

Any prover meeting basic eligibility criteria can join the network at any time.
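Before moving on: the GD and RDI indexes defined above could be computed as in the following sketch (function and variable names are ours, purely illustrative). The Theil T index is zero under perfect equality and grows with concentration:

```python
import math

def theil_t(shares):
    """Theil T index: 0 for perfect equality, up to ln(n) for full concentration."""
    n = len(shares)
    mu = sum(shares) / n
    return sum((x / mu) * math.log(x / mu) for x in shares if x > 0) / n

def geographic_diversity(regions):
    """GD = total distinct geographical regions / total provers (maximize)."""
    return len(set(regions)) / len(regions)

def rdi(computing_power, stakes):
    """RDI = CPC + SC, both measured as Theil T indexes (minimize)."""
    return theil_t(computing_power) + theil_t(stakes)
```

For example, four provers spread over three regions give GD = 3/4, and an equal split of compute and stake gives RDI = 0.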
The criteria serve the purpose of maximizing liveness and reliability, while minimizing prover failure and dishonest behavior:

* Staking: Provers are required to deposit a pre-defined amount of stake on L1 in order to join the prover network. Their public key is registered, and a unique identifier calculated from it is assigned to them. Both the identifier and the public key are included in the proofs they generate. An activation and cooldown period of N L1 blocks is applied (the idea has been borrowed from the [Aztec team/Fernet](https://hackmd.io/@aztec-network/fernet)).
* Minimum system requirements: to ensure all provers have sufficient compute capacity to generate a valid proof within the given proof time window.

### Randomness:

A verifiable random function is used for prover selection to rank each prover in the pool. This ensures an inclusive and fair selection method. For simplicity, we apply the same mechanism as for sequencer randomness in [Fernet](https://hackmd.io/@aztec-network/fernet) (borrowed [from the Espresso team](https://discourse.aztec.network/t/proposal-sequencer-selection-irish-coffee/483#vrf-specification-4)), i.e. generating a SNARK of a hash over the prover's private key and a public input. The latter could be the current block number and a random value. The prover with the highest score is selected to generate the proof for the block content revealed by the sequencer.

### Reputation score:

We propose applying a *Prover Reputation Score (PRS)*, which ensures that there is always an available prover to generate proofs, maximizing liveness and reliability, and promoting competition in a "soft" way. The PRS is based on Prover Uptime and Prover Reliability. New joiners are assigned a base score to ensure that they can become active provers from day one. For this we apply an optimistic assumption: the base PRS is the maximum possible PRS, i.e. 1, on a scale where the min/max PRS is 0/1.
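To make the randomness-based ranking above concrete, here is a minimal sketch using a plain SHA-256 hash as a stand-in for the VRF output. In the actual design the prover would also produce a SNARK proving the hash was computed over their private key and the public input, so the score is verifiable without revealing the key; the function names are ours:

```python
import hashlib

def vrf_score(private_key: bytes, block_number: int, random_value: bytes) -> int:
    # Stand-in for the VRF: hash over the prover's private key and the
    # public input (current block number plus a random value).
    public_input = block_number.to_bytes(8, "big") + random_value
    return int.from_bytes(hashlib.sha256(private_key + public_input).digest(), "big")

def select_prover(pool_keys: dict, block_number: int, random_value: bytes) -> str:
    # pool_keys maps prover id -> private key. In practice each prover
    # computes its own score locally; keys are gathered here only so the
    # illustration is self-contained. Highest score wins the proving task.
    return max(pool_keys, key=lambda pid: vrf_score(pool_keys[pid], block_number, random_value))
```

Because the score is a deterministic function of the key and the public input, every participant can recompute and verify the winner for a given block.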
Misbehavior will then gradually decrease the PRS, and also result in slashing. The PRS consists of the following components:

* *Prover Uptime (PU)* represents the amount of time a prover is available and actively participating in the network. (It can be measured in units of time or epochs.) A minimum uptime should be determined to ensure liveness and discourage free riders.
* *Prover Reliability (PR)* accounts for the reliability of each prover in terms of their ability to generate valid proofs within the determined proof time window. It represents the ratio of valid proofs to total proofs generated by the prover: *PR = Total valid proofs by prover / Total proofs by prover*

Maximize **PRS = PU * PR**

### Pool of provers with highest Prover Reputation Score (PRS):

Provers are selected from a pool of the N provers with the highest reputation score (borrowing some elements from [Taiko’s Alpha-4 testnet](https://community.taiko.xyz/t/taiko-proving-design-overview-grimsvotn-and-eldfell-cases/1014) and [the early concepts of Scroll](https://hackmd.io/@yezhang/SkmyXzWMY)). (N could be 128/256/512 depending on the block time and the proof time window.) The pool is dynamic: any prover selected to generate a zero-knowledge proof is excluded from the pool until their proving task is completed and the proof has been submitted to L1, i.e. until their computation capacity is available again.

* A secondary "backup prover" mechanism is implemented to mitigate prover failure: if any prover fails to generate a valid proof (whether intentionally or due to random failure), the proving task is put up for an open competition based on proof racing among 4 provers randomly selected from the pool. The fastest prover submitting a valid proof is rewarded, while the other three provers get uncle rewards (part of the slashed amount from the failed prover is used to cover this).
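The score and the dynamic pool described above could be sketched as follows (field names and the dict-based prover record are ours, for illustration only):

```python
def prs(uptime: float, valid_proofs: int, total_proofs: int) -> float:
    """PRS = PU * PR on a 0..1 scale; new joiners start optimistically at 1."""
    if total_proofs == 0:
        return uptime  # no proving history yet: reliability assumed perfect
    return uptime * (valid_proofs / total_proofs)

def build_pool(provers: list, n: int = 256) -> list:
    """Dynamic pool: the N idle (not currently proving) provers with the
    highest reputation score."""
    idle = [p for p in provers if not p["busy"]]
    idle.sort(key=lambda p: prs(p["uptime"], p["valid"], p["total"]), reverse=True)
    return idle[:n]
```

Rebuilding the pool each round automatically excludes busy provers and re-admits them once their capacity is free again.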
While the proof race entails some wasted computation, it can be considered network redundancy that maximizes liveness and minimizes the cost of a block not being proved. The implementation of the secondary mechanism increases resilience against certain attack vectors and malicious behavior, can streamline block proving, and minimizes reorgs.

### Proving process:

The sequencer selected to exclusively decide block contents must upload the contents to either L1 or a verifiable DA layer. If the sequencer fails to reveal the block contents and that block is flagged as skipped (as per Fernet), the randomly selected prover will automatically be assigned to the next revealed block without a new random prover selection taking place. (If this is not feasible due to the randomness mechanism, and a new prover must be randomly selected, the previous prover could receive part of the slashed stake of the sequencer as compensation.)

![](https://hackmd.io/_uploads/ByFBG6zMT.png) <div style="text-align: right"> *Source: Aztec* </div>

After the sequencer has revealed the block contents, a prover is selected (via the above mechanism) to prove the block. Once the proof is generated, the prover submits it to L1 for verification.

* If the proof is valid and successfully verified, the block reaches finality.
* If no valid proof is submitted on time, the open competition for the proof race is initiated by the protocol. In this secondary proving mechanism, the proofs generated are submitted to L1 for verification, and the fastest prover to have submitted a valid proof is eligible for the proof rewards.

Provers are to submit a proof for their specific block back to the L1 smart contract before the end of the proving phase, within the pre-set proof time window of N minutes. The proof submitted during the proving phase needs to reference the new state root uploaded by the sequencer (similar to how it works in [Fernet](https://hackmd.io/@aztec-network/fernet)).
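The two outcomes of the proving phase described above (finality on a timely valid proof, otherwise a race among 4 randomly drawn provers) could be sketched as:

```python
import random

def resolve_proving_phase(valid_proof_submitted: bool, pool: list,
                          rng=random, k: int = 4):
    """Happy path: a valid proof within the window finalizes the block.
    Fallback: draw k provers at random from the pool for a proof race;
    the fastest valid submission wins, the rest earn uncle rewards.
    (Function and parameter names are illustrative.)"""
    if valid_proof_submitted:
        return ("finalized", [])
    racers = rng.sample(pool, min(k, len(pool)))
    return ("race", racers)
```

`random.sample` draws without replacement, so the four racers are always distinct provers.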
If the prover fails to submit a valid proof on time, the prover’s stake will be slashed. The proof submitted by the prover to L1 is verified against the initial block commitment. Assuming that the parent block has reached finality, the new block will also become final once included on L1.

Estimated duration of the proving phase: 40-50 Ethereum blocks (8-10 minutes)

#### Proof batching:

In the case of proof batching, a certain number of proofs generated for individual blocks could be aggregated into a batch. Only the aggregated proof is then sent to L1 for verification. This could decrease transaction fees but results in an extended time to finality. As proposed in [Fernet](https://hackmd.io/@aztec-network/fernet), proving of each block happens as the block is revealed. However, the resulting proof is not submitted to L1. Instead, the individual proofs are aggregated into a batch proof every N blocks, and the aggregate prover (selected just as the normal provers are) submits the aggregate proof to L1 for verification.

Batch proving could be implemented as a mechanism to manage peak periods of incoming transactions. We don’t want to delay finalization under normal conditions; however, in times of load peaks, batching could ensure balanced transaction processing and proving with minimal finality delay. Instead of determining block proposal frequency based on time (every N seconds), block proposal could be driven by a minimum number of transactions to be included from the mempool. When several blocks are proposed one right after the other due to a high influx of transactions, proving of individual blocks could run “almost” in parallel, and batching would only cause minimal delay (if any) in finalization on L1.

### Incentives and slashing:

In our model we use staking and slashing to promote honest behavior and discourage malicious proving.
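As an aside on the batching cadence described above (aggregate every N blocks, submit only the aggregate to L1), a minimal sketch; the class name is ours, and real aggregation would be recursive proof composition rather than a list:

```python
class ProofBatcher:
    """Collects per-block proofs and releases a batch every n blocks.
    Aggregation is modeled as a plain list handed to the aggregate prover."""

    def __init__(self, n: int):
        self.n = n
        self.pending = []

    def add(self, proof):
        self.pending.append(proof)
        if len(self.pending) == self.n:
            batch, self.pending = self.pending, []
            return batch  # ready for the aggregate prover to submit to L1
        return None  # keep accumulating; finality for this block is deferred
```

This makes the finality trade-off visible: a block proved early in a batch waits up to N-1 more blocks before its proof reaches L1.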
During the proposal phase, sequencers need to indicate the reward they are willing to pay for the proof in the block commitment (as proverFee). Once a valid proof is submitted to L1 and becomes final, the rewards are paid by the protocol to the prover’s Ethereum address in ETH. Rewards could be summarized as follows:

* Sequencer reward: user transaction fees + user tips + MEV – proverFee
* Prover reward: proverFee + L1 gas/calldata fees

Provers are slashed for failing to submit a valid proof within the given proof time window (or for extended inactivity). Part of the slashed amount will be used to cover the rewards paid in the secondary "backup prover" mechanism. The rest is burnt.

* The slashing penalty could be calculated as follows:
  * Penalty = I * Base penalty * Penalty Severity
  * Where ‘I’ represents the total number of penalty instances generated by the prover. The slashed amount increases with every new misbehavior.
  * Base penalty could be 5%.
  * Penalty Severity is a coefficient that determines the severity of the penalty based on the degree of misbehavior; e.g. inactivity over a period of time could be less severe than submitting an invalid proof (the coefficients could be 0.5 and 1 respectively).
  * If a prover's stake decreases by 50% or more, they are forced to exit the prover network, i.e. given the above formula and coefficients, any prover will be automatically kicked out after failing four times to submit a valid proof.

### Transaction phases:

We propose the following transaction states representing the different phases of a transaction lifecycle:

1. Executed: the transaction is approved locally and sent to the mempool,
2. Processed: the transaction is included in a revealed block,
3. Proved: proof has been generated for the block including the transaction,
4. Verified: the proof has been verified on L1,
5. Finalized: the transaction has been finalized.
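The slashing schedule above can be worked through in a short sketch (assuming, as we do here, that each penalty is a fraction of the prover's initial stake):

```python
BASE_PENALTY = 0.05  # 5% of the initial stake
SEVERITY = {"inactivity": 0.5, "invalid_proof": 1.0}

def penalty(instance: int, kind: str) -> float:
    """Penalty = I * Base penalty * Penalty Severity for the I-th misbehavior."""
    return instance * BASE_PENALTY * SEVERITY[kind]

def total_slashed(history: list) -> float:
    """Cumulative slashed fraction over a prover's misbehavior history."""
    return sum(penalty(i, kind) for i, kind in enumerate(history, start=1))

# Four missed/invalid proofs at severity 1.0:
# 5% + 10% + 15% + 20% = 50% of the stake -> forced exit from the network.
```

This is where the "kicked out after four failures" figure comes from: the escalating instance multiplier reaches the 50% exit threshold on the fourth invalid proof.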
### Addressing graceful recovery:

As mentioned above, provers are selected from a pool of the N provers with the highest reputation score. N could be any predefined number, e.g. 256. The pool is also designed to be dynamic: any prover currently working on a proving task is excluded from the pool until they have submitted the proof to L1, i.e. until their computation capacity is available again.

1. If, for example, 128 provers out of 256 were to suddenly stop operating, the pool composition would automatically be adjusted, and the pool would be filled up with participants from outside the pool (if available), without any noticeable impact on performance.
2. If there are not enough provers available to fill the gap, or there are more blocks to be proved than provers in the pool, the network could temporarily switch to "emergency mode" for a predefined period and apply the secondary mechanism (proof racing) described above. After the predefined period is over, the network automatically returns to "normal mode". If network balance has still not been restored, emergency mode is activated again for a period twice as long as the initial one (and so on, 3x, 4x longer if further activations become necessary), to ensure the network is able to reach balance again. (Credits to Barnabé and Cosmos for inspiring this.)

### Comparisons:

#### To existing Aztec proposals:

* [Sidecar](https://discourse.aztec.network/t/proposal-prover-coordination-sidecar/2428) is different from our model in most aspects, as it is based on an out-of-protocol proof market where provers submit quotes for specific proving tasks. We are proposing an enshrined prover network with the ability for prover pools to join the network. The proposed in-protocol prover mechanism improves native token utility: the protocol can use and leverage its own native token for staking and prover incentives.
It also removes the dependency on an external point of failure, and gives the protocol and its ecosystem greater sovereignty.

* Our model is similar to [Anon’s model](https://discourse.aztec.network/t/proposal-provers-bonded-prover-auction/2401) in that both are in-protocol mechanisms. However, we propose a staking-based mechanism where provers are selected through a VRF from a pool of the N provers with the highest reputation score, while Anon’s approach is a bonded prover auction.
* Similar to the [cooperative model](https://discourse.aztec.network/t/proposal-cooperative-proving-network-for-fernet/2400), our model is also a staking-based mechanism utilizing a VRF for prover selection; however, in our case the proof for each block is generated by a single prover. Only in the case of proof batching is there a kind of cooperation, in the sense that multiple provers contribute to the batch proof.
* Unlike the [Staking Proving Network model](https://discourse.aztec.network/t/proposal-staking-proving-network-for-fernet/2439) and [Fernet on the Rocks](https://discourse.aztec.network/t/proposal-fernet-on-the-rocks/2460), we propose to use a network of provers separate from the network of sequencers, because the gap between the system/compute requirements for sequencing and proving may well increase over time. As sequencers could potentially outsource/sell block-building to specialized searchers/builders (similar to validators on Ethereum), a smaller compute capacity could be enough for them, while fast and cost-efficient proof generation could require much higher compute capacity.
* Similar to [When the levee breaks](https://discourse.aztec.network/t/proposal-when-the-levee-breaks/2457), our proposal also has sequencers indicate the fees they are willing to pay for a proof generated for their block.
In our case a reputable prover is randomly selected to prove the block, while in the above-mentioned proposal the sequencer is entrusted with selecting from the proof proposals the provers have submitted.

### Open questions:

#### 1. How can we ensure fair prover rewards?

Provers should receive sufficient rewards to cover their costs (L1 gas/calldata fees, computation, electricity, bandwidth, etc.) and remain profitable. Therefore, benchmarking should be done for proof generation costs through modelling and simulation. The transaction fees, the tokenomics of Aztec's native token and the rollup economics should be designed accordingly to ensure prover profitability.

#### 2. Should new block frequency be defined based on time, or based on a min/max number of transactions to be included in a block?

If new block frequency is based on the number of transactions, the frequency will be driven by the number of incoming transactions and automatically adjusted based on demand (fewer new blocks in periods with fewer transactions, and more new blocks in peak periods). Further analysis should be done on this.

#### 3. What if provers fail to deliver a valid proof on purpose?

Further analysis should be done to estimate the cost and impact of provers intentionally failing to deliver a valid proof in order to trigger the secondary mechanism.