# [EXTERNAL] Approval's Rewards Implementation Design

https://github.com/polkadot-fellows/RFCs/pull/119#discussion_r1997322791

![availability rewards - offchain diagram](https://hackmd.io/_uploads/Hk1pFetDxg.svg)

### Data collection

We track this data for each candidate during the approvals process:

```rs=
/// Our subjective record of our availability transfers for this candidate.
struct CandidateRewards {
    /// Anyone who backed this parablock
    backers: [AuthorityId; NumBackers],
    /// Anyone who we think no-showed, even if only briefly.
    noshows: HashSet<AuthorityId>,
    /// Anyone who sent us chunks for this candidate
    downloaded_from: HashMap<AuthorityId, u16>,
    /// Anyone to whom we sent chunks for this candidate
    uploaded_to: HashMap<AuthorityId, u16>,
}
```

### Offchain work

We start by instantiating the candidate rewards once the relay block arrives and the block entry is created; I assume this is also the point where it is possible to fetch the backers.

#### Storing the candidate rewards within the block entry

Currently the approvals subsystem holds the information about how many included but non-finalized candidates there are in a specific relay block. They are kept as `candidates: Vec<(CoreIndex, Hash)>` in the `BlockEntry` struct (also stored on disk), so we can include the `candidate_rewards` as another vector in the same struct, where the `i`'th candidate reward corresponds to the `i`'th candidate in `candidates`.

#### Storing the candidate rewards within the in-memory state

Another alternative is to keep them in the subsystem's in-memory state as a hash map of `RelayBlockHash` -> `Vec<CandidateRewards>`, again where the `i`'th candidate reward corresponds to the `i`'th candidate in `BlockEntry::candidates`. This approach is better, as it does not require changes to the already established schemas.

#### Fetching the no-shows

We can grab the `no_show_validators` inside the `approval_status` method, which is a `State` method.

- Candidate rewards as part of the block entry: the `approval_status` method receives the block entry, but the block entry parameter is not mutable, so for this approach we should make it mutable, store the new `no_show_validators` in the candidate rewards for that candidate at that relay block, and reflect this change on disk.
- Candidate rewards as part of the state: this still touches `approval_status`, since it is a `State` method; however `self` is an immutable reference, so we should make it mutable and store the `no_show_validators` inside the hash map of `RelayBlockHash` -> `Vec<CandidateRewards>`. This requires an `O(N)` search over the block entry's candidates to retrieve the candidate hash index, so we can update the correct candidate rewards entry (see the sketch after this list).
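For the in-memory variant, here is a minimal sketch of the state addition and the no-show recording step. It assumes the types from the sketches above, direct access to a `BlockEntry::candidates` field, and a hypothetical helper `record_no_shows` called from `approval_status` once `no_show_validators` is computed:

```rs=
use std::collections::HashMap;

struct State {
    /// One `CandidateRewards` per entry in `BlockEntry::candidates`,
    /// kept in the same order so the indexes line up.
    candidate_rewards: HashMap<RelayBlockHash, Vec<CandidateRewards>>,
    // ... existing fields ...
}

impl State {
    /// Record the validators that no-showed for `candidate_hash` within
    /// `block_hash`. Calling this from `approval_status` requires taking
    /// `&mut self` there.
    fn record_no_shows(
        &mut self,
        block_entry: &BlockEntry,
        block_hash: RelayBlockHash,
        candidate_hash: Hash,
        no_show_validators: &[AuthorityId],
    ) {
        // O(N) scan over the block entry's candidates for the matching index.
        let Some(index) = block_entry
            .candidates
            .iter()
            .position(|(_, hash)| *hash == candidate_hash)
        else {
            return;
        };
        if let Some(rewards) = self
            .candidate_rewards
            .get_mut(&block_hash)
            .and_then(|rewards| rewards.get_mut(index))
        {
            rewards.noshows.extend(no_show_validators.iter().cloned());
        }
    }
}
```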
#### Download from / uploaded to

- **Q ->** In the `CandidateRewards` struct, the download/upload counters make no distinction about the recovery strategy used; should we introduce a more granular analysis? For example `downloaded_from: HashMap<AuthorityId, Downloaded>`, where `Downloaded` could be defined as (the authority is already the map key, so it need not be repeated in the variants):

```rs=
enum Downloaded {
    /// The authority provided the full available data.
    Full,
    /// The authority provided individual chunks.
    Chunks {
        is_systematic: bool,
        amount: u16,
    },
}
```

  This would enable a granular analysis, since it becomes possible to separate validators that are full-data providers from the ones that send only chunks/systematic chunks.

- Suggestion with a simple counter: I would assume a data collection separate from the approvals subsystem, as the data exists inside the availability recovery subsystem. Availability recovery contains a shared state that lives under the `run` method; we could track the chunk downloads and uploads per candidate in it, something like `HashMap<CandidateHash, HashMap<AuthorityId, u16>>`. Any update then takes `O(1)` to increment the download counter.

- **Q ->** Given that availability recovery has the request receiver for `AvailableDataFetchingRequest` separately from the chunk request receiver that lives in availability distribution, should we consider the response we return to the requester an upload too?
  - If `YES`, then we can just adjust the type to also count the uploads and not only the downloads, like `HashMap<CandidateHash, HashMap<AuthorityId, (u16, u16)>>`, where the tuple's first item relates to downloads and the second to uploads.

##### Collecting Downloads

The problem is with the download metrics: the current tasks that perform the routine to fetch the availability chunks do not have access to the availability recovery shared state; they manage another state that is placed under the `strategy` module.

A suggested approach: the inner strategies have access to a channel that is shared with the main availability recovery loop. We can create a channel pair `on_strategy_finished_tx`/`on_strategy_finished_rx`, where we keep `on_strategy_finished_rx` in the main loop while `on_strategy_finished_tx` is shared with the running strategies. Whenever a strategy finishes, successfully or not, we collect its download metrics and send them to the availability recovery main loop through `on_strategy_finished_tx`, enabling us to store them properly in the main/queryable state.

Another approach: we can use `state.ongoing_recoveries.select_next_some()`, as it already notifies the main loop when a recovery task is done. For getting the metrics, we should update the return value from `Result<AvailableData, RecoveryError>` to `Result<(AvailableData, DownloadMetrics), RecoveryError>`.

##### Collecting Uploads

**Q ->** I assume that sending chunks is part of the availability distribution subsystem; however, I notice that inside availability recovery there exists a request receiver for `request_v1::AvailableDataFetchingRequest` incoming requests. Is that specific to handling the full-recovery strategy?

Following the download data collection, the upload side can take the same path, with the subsystem state holding the information. The collection should happen inside the `run_chunk_receivers` job that is spawned by the `AvailabilityDistributionSubsystem`. The job is responsible for listening on the request receivers `IncomingRequestReceiver<v1::ChunkFetchingRequest>` and `IncomingRequestReceiver<v2::ChunkFetchingRequest>`; for both incoming requests the handler function is `answer_chunk_request`, which queries the chunk and sends the response.

There, inside `answer_chunk_request`, there could exist a communication channel with the `AvailabilityDistributionSubsystem` main loop, and for every successfully answered request a message of the kind `(AuthorityId, CandidateHash, u16)` is sent to the main loop, which updates the mapping. For the state's field type, we can use the same type as in the download data collection, `HashMap<CandidateHash, HashMap<AuthorityId, u16>>`, but the counter increments for every upload (every successfully sent response). A combined sketch of this message flow follows below.

- Alternative using a job's internal state: we can also define an inner state for `run_chunk_receivers` that manages the upload metrics collection. This approach does not require any message exchange between the job and the subsystem main loop; when we need to query the information for a candidate, we reach the subsystem, and the subsystem forwards the request to the job, which ends up sending the information back through a response channel.
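To make the channel-based collection concrete, here is a minimal sketch under the assumptions above; `MetricsMessage`, `TransferMetrics`, and `metrics_rx` are hypothetical names, and the same shape works whether the owning loop is availability recovery (downloads) or availability distribution (uploads):

```rs=
use std::collections::HashMap;

/// Hypothetical message sent by the recovery strategies (on finish) and by
/// `answer_chunk_request` (per answered request) towards the owning
/// subsystem's main loop.
enum MetricsMessage {
    /// `amount` chunks were downloaded from `authority` for `candidate`.
    Downloaded { candidate: CandidateHash, authority: AuthorityId, amount: u16 },
    /// One chunk was successfully uploaded to `authority` for `candidate`.
    Uploaded { candidate: CandidateHash, authority: AuthorityId },
}

/// Per-candidate, per-authority `(downloads, uploads)` counters, matching
/// the `HashMap<CandidateHash, HashMap<AuthorityId, (u16, u16)>>` type above.
#[derive(Default)]
struct TransferMetrics(HashMap<CandidateHash, HashMap<AuthorityId, (u16, u16)>>);

impl TransferMetrics {
    fn apply(&mut self, msg: MetricsMessage) {
        match msg {
            MetricsMessage::Downloaded { candidate, authority, amount } => {
                let counters = self.0.entry(candidate).or_default().entry(authority).or_default();
                counters.0 = counters.0.saturating_add(amount);
            }
            MetricsMessage::Uploaded { candidate, authority } => {
                let counters = self.0.entry(candidate).or_default().entry(authority).or_default();
                counters.1 = counters.1.saturating_add(1);
            }
        }
    }
}

// In the main loop, next to the existing `select!` arms, fold incoming
// messages into the queryable state:
//
//     msg = metrics_rx.next() => if let Some(msg) = msg { metrics.apply(msg) },
```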
##### Collecting Approvals Usage

> After we approve a relay chain block, then we collect all its `CandidateRewards` into an `ApprovalsTally`, with one `ApprovalTallyLine` for each validator. In this, we compute `approval_usages` from the final run of the approvals loop, plus `0.8` for each backer. We say a validator 𝑢 uses an approval vote by a validator 𝑣 on a candidate 𝑐 if the approval assignments loop by 𝑢 counted the vote by 𝑣 towards approving the candidate 𝑐.

This information already exists in the approvals subsystem under the `CandidateEntry` approvals bitfield, as it stores the validator indexes that approved the candidate. The difference is that once a vote is valid we just mark the candidate as approved by its candidate hash and do not store the validator who sent the approval vote. Here we should track that information, essentially as a counter per relay block; I do not think it needs to be detailed per candidate hash, since in the approvals tally we want the "contributions" a given validator made towards the relay block's approval:

```rs=
approvals_usage: HashMap<RelayBlockHash, HashMap<AuthorityId, u32>>,
```

Each time an authority's vote contributes to the approval of one of a relay block's candidates, this counter is incremented, as sketched below.
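A minimal sketch of that increment, assuming a hypothetical helper `note_approval_used` invoked at the point where the approvals loop counts a vote towards a candidate's approval:

```rs=
use std::collections::HashMap;

/// Hypothetical helper: called whenever the approvals loop counts a vote by
/// `authority` towards approving some candidate included in `relay_block`.
fn note_approval_used(
    approvals_usage: &mut HashMap<RelayBlockHash, HashMap<AuthorityId, u32>>,
    relay_block: RelayBlockHash,
    authority: AuthorityId,
) {
    *approvals_usage
        .entry(relay_block)
        .or_default()
        .entry(authority)
        .or_insert(0) += 1;
}
```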
#### Relay block approved

Following the RFC, we should retrieve all the information collected for a candidate once the relay block it belongs to is marked as approved by the approvals subsystem. The routine to collect all the candidate rewards into an approvals tally struct is straightforward:

1. Query the backers and no-shows from the approvals subsystem for the approved relay block, which means retrieving the information from the subsystem's state (or database, depending on the approach).
2. Collect the downloads & uploads from availability recovery and availability distribution, respectively (considering that availability recovery may also have some upload information); this could be done by setting up an overseer message to query this information.
    - We might face a problem here, as the candidate's chunk downloads/uploads are tagged not by the relay block but by the candidate's hash. Given that we have access to the approved relay block's candidates & candidate hashes, we might need to adjust the query message to take a list of candidate hashes as parameter instead of a single relay block hash.
3. After constructing the `CandidateRewards` for each candidate in the approved relay block, build the `ApprovalsTally`.

```rs=
/// Our subjective record of what we used from, and provided to, all other
/// validators on the finalized chain.
pub struct ApprovalsTally(Vec<ApprovalTallyLine>);

/// Our subjective record of what we used from, and provided to, one other
/// validator on the finalized chain.
pub struct ApprovalTallyLine {
    /// Approvals by this validator which our approvals gadget used in marking candidates approved.
    approval_usages: u32,
    /// How many times we think this validator no-showed, even if only briefly.
    noshows: u32,
    /// Availability chunks we downloaded from this validator for the approval checks we used.
    used_downloads: u32,
    /// Availability chunks we uploaded to this validator whose approval checks we used.
    used_uploads: u32,
}
```

Given the approvals usage in the approvals subsystem, for each validator that approved a candidate in the approved relay block, iterate through all candidate rewards, extracting the no-shows count, `downloaded_from` and `uploaded_to`, and finally its approvals usage:

```rs=
// Consider that `candidates_rewards` is a `Vec<CandidateRewards>` computed
// previously for all candidates in the approved block.
let approvals_usage: &HashMap<AuthorityId, u32> =
    state.approvals_usage.get(&approved_block).unwrap();

let mut approvals_tally = ApprovalsTally(Vec::new());
for (authority, usages) in approvals_usage {
    // We compute `approval_usages` from the final run of the approvals loop,
    // plus `0.8` for each candidate this authority backed. `new_with_usage`
    // is a hypothetical constructor; the fractional credit implies a scaled
    // or fixed-point representation in practice.
    let backed = candidates_rewards
        .iter()
        .filter(|c| c.backers.contains(authority))
        .count();
    let final_usages = f64::from(*usages) + 0.8 * backed as f64;
    let mut line = ApprovalTallyLine::new_with_usage(final_usages);

    let mut noshows_count: u32 = 0;
    let mut used_downloads: u32 = 0;
    let mut used_uploads: u32 = 0;
    for candidate_reward in &candidates_rewards {
        if candidate_reward.noshows.contains(authority) {
            noshows_count += 1;
        }
        if let Some(downloads) = candidate_reward.downloaded_from.get(authority) {
            used_downloads += u32::from(*downloads);
        }
        if let Some(uploads) = candidate_reward.uploaded_to.get(authority) {
            used_uploads += u32::from(*uploads);
        }
    }
    line.noshows = noshows_count;
    line.used_downloads = used_downloads;
    line.used_uploads = used_uploads;
    approvals_tally.0.push(line);
}
```

The created approvals tally should be attached to the current epoch.

#### Finality & Storing Approvals Tally

We should keep these approvals tallies in a pending state. The RFC states that the approvals tally should be stored for one whole epoch; once the epoch is finalized, the stored data becomes eligible for cleanup.

**IMPORTANT**: What would be the best approach to clean the stored data (a sketch of the second option follows after this list)?

- When we perform the query, do we then clean the retrieved information from the in-memory mapping?
- OR, when finalization happens, do we perform the cleaning for the respective candidates that belong to the recently finalized blocks?
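For the second option, a minimal cleanup sketch, assuming a hypothetical `prune_finalized` hook and that the caller can enumerate the relay block hashes and candidate hashes covered by the newly finalized blocks:

```rs=
use std::collections::HashMap;

/// Hypothetical cleanup hook for the finalization-driven variant: drop the
/// per-block and per-candidate collections once their blocks are finalized
/// and their tallies have been folded into the epoch's `ApprovalsTally`.
fn prune_finalized(
    approvals_usage: &mut HashMap<RelayBlockHash, HashMap<AuthorityId, u32>>,
    transfer_metrics: &mut HashMap<CandidateHash, HashMap<AuthorityId, (u16, u16)>>,
    finalized_blocks: &[RelayBlockHash],
    finalized_candidates: &[CandidateHash],
) {
    for block in finalized_blocks {
        approvals_usage.remove(block);
    }
    for candidate in finalized_candidates {
        transfer_metrics.remove(candidate);
    }
}
```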