*PoV stands for Proof-of-Validity*

**Questions**:
- [x] ❓ how/where/when is the PoV constructed? (code)
- [x] ❓ how is the PoV verified? (code)
- [x] ❓ why does the parachain validation function (PVF) need the storage items to be bounded for constructing the PoV?

from [StackExchange](https://substrate.stackexchange.com/questions/518/what-does-pov-stand-for/519), answer by @shawntabrizi:

> This is a specific concept in the parachains protocol which allows validators on Polkadot to execute and verify blocks on the relay chain, which ultimately provides the shared security that Polkadot advertises.
>
> At a high level, for the relay chain to execute a parachain block, it needs:
>
> - The parachain block, which contains the extrinsics.
> - The state transition function, which is the Wasm runtime.
> - The relevant state which is read to complete the extrinsics.
>
> The PoV is that last bullet point. Basically, the parachain collator will have the entire current state of the chain, and when it produces a new block, it will capture the relevant subset of the state and send it to the relay chain so that it can do its job and fully execute the block. This is possible because we use a merkle trie, and thus do not need the full state in order to verify the state transition, just the relevant nodes which can be used to recompute the new state root.
>
> The PoV has a maximum limit for parachain blocks. On Polkadot it is currently configured to 5 MB, so that there are limits to how long it could take to gossip that information to the relevant Polkadot validators. This is of relevance since this limit can be reached quite quickly depending on the operations your node is doing, and thus limits the number of transactions you can include in a single parachain block.
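The merkle-trie point is the crux: a validator can recompute the new state root from the parent root plus only the entries the block touched. A toy sketch of that property (all names hypothetical; the "root" here is just an XOR of per-entry hashes, which shares the recompute-from-a-subset property while being cryptographically worthless, unlike a real merkle trie):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

type State = BTreeMap<String, u64>;

// Hash of a single key/value entry.
fn leaf_hash(key: &str, value: u64) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    value.hash(&mut h);
    h.finish()
}

// Toy state commitment: XOR of all leaf hashes. NOT secure; it only
// illustrates that the new root is computable from the old root plus the
// *touched* entries, without the full state.
fn root(state: &State) -> u64 {
    state.iter().fold(0, |acc, (k, v)| acc ^ leaf_hash(k, *v))
}

// Validator side: given the parent root and a witness holding the pre-state
// of every touched key, recompute the child root after a transfer, without
// ever seeing the untouched entries.
fn recompute_root(parent_root: u64, witness: &State, from: &str, to: &str, amount: u64) -> u64 {
    let mut post = witness.clone();
    *post.get_mut(from).unwrap() -= amount;
    *post.get_mut(to).unwrap() += amount;
    let mut new_root = parent_root;
    for (k, v) in witness {
        new_root ^= leaf_hash(k, *v); // remove the old leaves
    }
    for (k, v) in &post {
        new_root ^= leaf_hash(k, *v); // insert the updated leaves
    }
    new_root
}
```

The witness plays the role of the "relevant subset of the state" shipped in the PoV: entries the block never reads or writes stay on the collator.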
---

**Context summary** (based on [protocol overview docs](https://paritytech.github.io/polkadot/book/protocol-overview.html)): the main goal of the parachains protocol is to carry a parachain block from authoring to secure inclusion on the relay chain.

- **Validators**: responsible for validating proposed parachain blocks by verifying the proof of validity of the blocks. They also ensure that the PoV remains available, since PoVs are too large to include in the relay chain block; the PoV-related data must be kept available.
- **Collators**: responsible for creating PoVs that validators can check. To create PoVs, the collators need access to the PVF (parachain validation function, i.e. the runtime Wasm) and to the full storage state of the parachain.
- **Inclusion pipeline**:
  1. validators are assigned to a specific parachain by the Validator Assignment routine;
  2. collators build a parachain block (candidate block) and calculate the block's PoV;
  3. collators send the proposed blocks to the validators assigned to their parachain;
  4. validators check the PoV and the proposed block and sign a verdict, which can be positive or negative. Negative verdicts trigger a dispute where many other validators will be assigned to check the parachain block;
  5. a parachain block is only considered finalized in the relay chain after a proof of availability;
  6. in the following relay chain blocks, validators participate in the Availability Distribution subsystem to note and ensure that the previous parablocks have all their data available;
  7. once the relay chain has enough information about the candidate and its availability, the block is considered "pending approval". Although it is now a parablock that can have child blocks, other validators still need to validate the parablock;
  8. the final step is to accept the valid block.
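The pipeline steps above can be condensed into a small state machine. The stage names below are my own shorthand for these notes, not the actual types used by the Polkadot subsystems:

```rust
// Hypothetical stages of a candidate parablock moving through inclusion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CandidateStage {
    /// Assigned para-validators checked the PoV and signed positive verdicts.
    Backed,
    /// Erasure-coded PoV data is being distributed and acknowledged.
    PendingAvailability,
    /// Included on-chain; approval checkers are still validating.
    PendingApproval,
    /// Backed, available and undisputed.
    Accepted,
    /// A negative verdict escalated to a dispute.
    Disputed,
}

// Advance one step: `ok == true` means the current stage's check passed
// (enough positive verdicts / availability votes / approvals).
fn advance(stage: CandidateStage, ok: bool) -> CandidateStage {
    use CandidateStage::*;
    if !ok {
        return Disputed;
    }
    match stage {
        Backed => PendingAvailability,
        PendingAvailability => PendingApproval,
        PendingApproval => Accepted,
        terminal => terminal,
    }
}
```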
This requires more validators to verify the included parablock besides the initial set, in order to ensure that the overall majority of validators backing parachain blocks are honest. After this phase, the block has been accepted (backed, available and undisputed).

## how/where/when is the PoV constructed?

A struct `PoV` is defined under `polkadot/node/primitives`. It consists of a `BlockData`, which contains all the information required to validate a parablock (e.g. block and witness data, serializable).

```rust
/// Parachain block data.
///
/// Contains everything required to validate para-block, may contain block and
/// witness data.
#[derive(PartialEq, Eq, Clone, Encode, Decode, derive_more::From, TypeInfo, RuntimeDebug)]
#[cfg_attr(feature = "std", derive(Serialize, Deserialize))]
pub struct BlockData(#[cfg_attr(feature = "std", serde(with = "bytes"))] pub Vec<u8>);

/// A Proof-of-Validity
#[derive(PartialEq, Eq, Clone, Encode, Decode, Debug)]
pub struct PoV {
	/// The block witness data.
	pub block_data: BlockData,
}

impl PoV {
	/// Get the blake2-256 hash of the PoV.
	pub fn hash(&self) -> Hash {
		BlakeTwo256::hash_of(self)
	}
}
```

The collator type implements a constructor for the `CollatorFn`, which is a closure that implements the state transition of a parachain. The collation function returns a `CollationResult`, which contains the PoV of the state transition (within the `Collation` struct).

```rust
/// Collation function.
///
/// Will be called with the hash of the relay chain block the parachain block
/// should be build on and the [`ValidationData`] that provides information
/// about the state of the parachain on the relay chain.
///
/// Returns an optional [`CollationResult`].
#[cfg(not(target_os = "unknown"))]
pub type CollatorFn = Box<
	dyn Fn(
			Hash,
			&PersistedValidationData,
		) -> Pin<Box<dyn Future<Output = Option<CollationResult>> + Send>>
		+ Send
		+ Sync,
>;

/// Result of the [`CollatorFn`] invocation.
#[cfg(not(target_os = "unknown"))]
pub struct CollationResult {
	/// The collation that was build.
	pub collation: Collation,
	/// An optional result sender that should be informed about a successfully
	/// seconded collation.
	///
	/// There is no guarantee that this sender is informed ever about any
	/// result, it is completely okay to just drop it. However, if it is
	/// called, it should be called with the signed statement of a parachain
	/// validator seconding the collation.
	pub result_sender: Option<futures::channel::oneshot::Sender<CollationSecondedSignal>>,
}
```

`polkadot/node/primitives/src/lib.rs`

In Cumulus, the `CollatorService` calls into the `polkadot` primitives and other helpers to implement the collator logic. The `CollatorService` implementation exposes a `fn build_collation` method that constructs and returns the candidate's 1) PoV and 2) `Collation` type (defined in the Polkadot primitives). The PoV is encapsulated in the `ParachainBlockData<Block>`, which consists of:

1. a `header`
2. a set of `extrinsics`
3. a `storage_proof` (a compact proof)

```rust
/// The parachain block that is created by a collator.
///
/// This is send as PoV (proof of validity block) to the relay-chain validators.
/// There it will be passed to the parachain validation Wasm blob to be validated.
#[derive(codec::Encode, codec::Decode, Clone)]
pub struct ParachainBlockData<B: BlockT> {
	/// The header of the parachain block.
	header: B::Header,
	/// The extrinsics of the parachain block.
	extrinsics: sp_std::vec::Vec<B::Extrinsic>,
	/// The data that is required to emulate the storage accesses executed by all extrinsics.
	storage_proof: sp_trie::CompactProof,
}
```

## Traits `StorageInfoTrait` and `PartialStorageInfoTrait`

```rust
/// A trait to give information about storage.
///
/// It can be used to calculate PoV worst case size.
pub trait StorageInfoTrait {
	// Required method
	fn storage_info() -> Vec<StorageInfo>;
}
```

```rust
/// Similar to `StorageInfoTrait`, a trait to give partial information about
/// storage.
///
/// This is useful when a type can give some partial information with its
/// generic parameter doesn't implement some bounds.
pub trait PartialStorageInfoTrait {
	// Required method
	fn partial_storage_info() -> Vec<StorageInfo>;
}
```

The traits `StorageInfoTrait` and `PartialStorageInfoTrait` require a method with the same signature (no input, returns a `Vec<StorageInfo>`). `PartialStorageInfoTrait` is used when the storage items are not bounded. The `#[pallet::pallet]` macro implements `StorageInfoTrait` for every storage type, which provides metadata about the storage type. If a storage type has the `#[pallet::unbounded]` attribute, or the whole pallet has the `#[pallet::without_storage_info]` attribute, the `StorageInfo` will instead be provided by `PartialStorageInfoTrait`.

- [x] ❓ How is the `StorageInfoTrait` used to calculate the worst case size?

The `StorageInfo` for each storage type is used to calculate the worst case size of the PoV; thus storage items that will be included in the PoV need to be bounded.

```rust
pub struct StorageInfo {
	/// Encoded string of pallet name.
	pub pallet_name: Vec<u8>,
	/// Encoded string of storage name.
	pub storage_name: Vec<u8>,
	/// The prefix of the storage. All keys after the prefix are considered part
	/// of this storage.
	pub prefix: Vec<u8>,
	/// The maximum number of values in the storage, or none if no maximum
	/// specified.
	pub max_values: Option<u32>,
	/// The maximum size of key/values in the storage, or none if no maximum
	/// specified.
	pub max_size: Option<u32>,
}
```

The parameter `max_size` is used to calculate the worst-case size of the key/value pairs of a particular storage item. This parameter is fundamental for calculating the PoV size of a parablock.
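One way to see how these fields yield a worst case: sum `max_values * max_size` over all storage items, failing as soon as any item is unbounded. The helper below is hypothetical (the real calculation also has to account for trie-node overhead, which is ignored here), but it shows why every item that may land in the PoV must be bounded:

```rust
// Only the sizing-relevant fields of `StorageInfo` (the real struct lives in
// `frame_support::traits` and also carries pallet/storage names and prefix).
struct StorageInfo {
    max_values: Option<u32>,
    max_size: Option<u32>,
}

// Hypothetical upper bound on the bytes a set of storage items can contribute
// to a PoV, assuming every value is touched. Returns `None` as soon as any
// item is unbounded (i.e. its info came from `partial_storage_info`), since
// no worst case can be stated then.
fn worst_case_storage_bytes(infos: &[StorageInfo]) -> Option<u64> {
    infos.iter().try_fold(0u64, |acc, info| {
        let values = info.max_values? as u64;
        let size = info.max_size? as u64;
        Some(acc + values.saturating_mul(size))
    })
}
```

A `StorageValue` would show up here with `max_values: Some(1)`; a `StorageMap` annotated `#[pallet::unbounded]` would poison the whole sum with `None`.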
Each storage type implementation (`StorageMap`, `StorageValue`, etc.) implements both the `traits::StorageInfoTrait` and `traits::PartialStorageInfoTrait` traits. The partial-info implementation sets `max_size` to `None`. `StorageInfoTrait`, on the other hand, requires the storage type's key and value to implement the `MaxEncodedLen` trait, since computing `max_size` uses it. Note that it is the responsibility of the pallet macro, when expanding the attribute macros related to the storage items, to select which trait implementation to use, based on the `#[pallet::unbounded]` and `#[pallet::without_storage_info]` annotations.

```rust
impl<Prefix, Hasher, Key, Value, QueryKind, OnEmpty, MaxValues> crate::traits::StorageInfoTrait
	for StorageMap<Prefix, Hasher, Key, Value, QueryKind, OnEmpty, MaxValues>
where
	Prefix: StorageInstance,
	Hasher: crate::hash::StorageHasher,
	Key: FullCodec + MaxEncodedLen,
	Value: FullCodec + MaxEncodedLen,
	QueryKind: QueryKindTrait<Value, OnEmpty>,
	OnEmpty: Get<QueryKind::Query> + 'static,
	MaxValues: Get<Option<u32>>,
{
	fn storage_info() -> Vec<StorageInfo> {
		vec![StorageInfo {
			pallet_name: Self::module_prefix().to_vec(),
			storage_name: Self::storage_prefix().to_vec(),
			prefix: Self::final_prefix().to_vec(),
			max_values: MaxValues::get(),
			max_size: Some(
				Hasher::max_len::<Key>()
					.saturating_add(Value::max_encoded_len())
					.saturated_into(),
			),
		}]
	}
}

/// It doesn't require to implement `MaxEncodedLen` and give no information for `max_size`.
impl<Prefix, Hasher, Key, Value, QueryKind, OnEmpty, MaxValues>
	crate::traits::PartialStorageInfoTrait
	for StorageMap<Prefix, Hasher, Key, Value, QueryKind, OnEmpty, MaxValues>
where
	Prefix: StorageInstance,
	Hasher: crate::hash::StorageHasher,
	Key: FullCodec,
	Value: FullCodec,
	QueryKind: QueryKindTrait<Value, OnEmpty>,
	OnEmpty: Get<QueryKind::Query> + 'static,
	MaxValues: Get<Option<u32>>,
{
	fn partial_storage_info() -> Vec<StorageInfo> {
		vec![StorageInfo {
			pallet_name: Self::module_prefix().to_vec(),
			storage_name: Self::storage_prefix().to_vec(),
			prefix: Self::final_prefix().to_vec(),
			max_values: MaxValues::get(),
			max_size: None,
		}]
	}
}
```

- [x] ❓ Where is the `StorageInfoTrait::max_size()` used in the collation process to stop adding extrinsics to the parablock, so that it does not exceed the maximum encoded size of parablocks accepted by the relay chain validators?

Each collator node initiates a proposer factory, which keeps track of the node's transaction pool. A `Proposer` type can be generated from the proposer factory. The proposer must implement the `sp_consensus::Proposer` trait, which exposes a `fn propose` method where both the maximum duration of the parablock authoring process and the maximum size limit of the parablock are given.

```rust
/// Logic for a proposer.
///
/// This will encapsulate creation and evaluation of proposals at a specific
/// block.
///
/// Proposers are generic over bits of "consensus data" which are
/// engine-specific.
pub trait Proposer<B: BlockT> {
	type Error: From<Error> + Error + 'static;
	type Transaction: Default + Send + 'static;
	type Proposal: Future<Output = Result<Proposal<B, Self::Transaction, Self::Proof>, Self::Error>>
		+ Send
		+ Unpin
		+ 'static;
	type ProofRecording: ProofRecording<Proof = Self::Proof> + Send + Sync + 'static;
	type Proof: Send + Sync + 'static;

	/// Create a proposal.
	///
	/// Gets the `inherent_data` and `inherent_digests` as input for the
	/// proposal. Additionally, a maximum duration for building this proposal
	/// is given. If building the proposal takes longer than this maximum, the
	/// proposal will very likely be discarded.
	///
	/// If `block_size_limit` is given, the proposer should push transactions
	/// until the block size limit is hit. Depending on the `finalize_block`
	/// implementation of the runtime, it probably incorporates other
	/// operations (that are happening after the block limit is hit). So, when
	/// the block size estimation also includes a proof that is recorded
	/// alongside the block production, the proof can still grow. This means
	/// that the `block_size_limit` should not be the hard limit of what is
	/// actually allowed.
	fn propose(
		self,
		inherent_data: InherentData,
		inherent_digests: Digest,
		max_duration: Duration,
		block_size_limit: Option<usize>,
	) -> Self::Proposal;
}
```

`substrate/consensus/common/src/lib.rs`

The proposer used in all system parachains is implemented in the `substrate/client/basic-authorship` crate. The proposer implements a `fn apply_extrinsics` method that keeps track of the elapsed time and block size as it polls extrinsics from the pool and adds them to the block being proposed. `apply_extrinsics` returns an `EndProposingReason`, which may be `HitBlockSizeLimit`, `HitDeadline`, `HitBlockWeightLimit` or `NoMoreTransactions`.

```rust
/// Apply as many extrinsics as possible to the block.
async fn apply_extrinsics(
	&self,
	block_builder: &mut sc_block_builder::BlockBuilder<'_, Block, C, B>,
	deadline: time::Instant,
	block_size_limit: Option<usize>,
) -> Result<EndProposingReason, sp_blockchain::Error> {
	// ..snip
	// for all transactions in the pool:
	let block_size =
		block_builder.estimate_block_size(self.include_proof_in_block_size_estimation);
	if block_size + pending_tx_data.encoded_size() > block_size_limit {
		// ..snip
		// may return early if limits are reached while adding the pending tx
		// to the block.
	}

	match sc_block_builder::BlockBuilder::push(block_builder, pending_tx_data) {
		// ..snip
		// may return with `EndProposingReason::HitBlockWeightLimit`
	}
}
```

`substrate/client/basic-authorship/src/basic_authorship.rs`

The parachains use the `DEFAULT_BLOCK_SIZE_LIMIT` set in `substrate/client/basic-authorship/src/basic_authorship.rs` as the default size limit for parachain blocks, which is ~4 MB.

- [ ] ❓ How is the PoV calculated per parablock, and how does the size of the storage items impact the overall PoV size?

The PoV size is re-calculated whenever a transaction to include in the block is fetched from the pool by the proposer (in `fn apply_extrinsics`). The size of the pending extrinsic fetched from the pool is calculated by calling `fn encoded_size()` on the items returned by the transaction pool's `ready_at`. In system parachains, the transaction pool implements the `TransactionPool` trait (in `substrate/client/transaction-pool/api`), which is generic over a block type. The block type in system parachains must implement the `sp_runtime::traits::Block` trait, which requires the `parity_scale_codec::Encode` trait. This trait exposes a `fn encoded_size` method used for the size metering of the extrinsics and the overall parablock.

Check the `StorageInfoTrait` implementation for the different storage types.
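The size-metering loop above can be stripped down to its essence. In this hypothetical std-only sketch, extrinsics are opaque byte blobs and only the size check is modeled; the real `apply_extrinsics` also meters weight, the deadline, and the recorded storage proof:

```rust
// Mirrors the variants the real proposer can return (minus `HitDeadline` /
// `HitBlockWeightLimit`, which this sketch does not meter).
#[derive(Debug, PartialEq)]
enum EndProposingReason {
    NoMoreTransactions,
    HitBlockSizeLimit,
}

// Toy proposer loop: pull encoded extrinsics from a "pool", track the running
// block size, and stop before the limit would be exceeded.
fn apply_extrinsics(pool: &[Vec<u8>], block_size_limit: usize) -> (Vec<Vec<u8>>, EndProposingReason) {
    let mut block = Vec::new();
    let mut block_size = 0usize;
    for pending_tx_data in pool {
        // The real code calls `estimate_block_size(..)` on the block builder and
        // `encoded_size()` on the pending extrinsic; here both are just lengths.
        if block_size + pending_tx_data.len() > block_size_limit {
            return (block, EndProposingReason::HitBlockSizeLimit);
        }
        block_size += pending_tx_data.len();
        block.push(pending_tx_data.clone());
    }
    (block, EndProposingReason::NoMoreTransactions)
}
```

With a 5-byte limit and three 2-byte extrinsics, only two fit before the limit check trips, which is exactly how a too-large PoV caps the number of transactions per parablock.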