# Validator rewards in Polkadot/Kusama

Ideally, almost all rewards in Kusama/Polkadot would come from work done securing parachains, like backing and approval checks, including work supporting XCMP and availability. We must reward participation in relay chain consensus too, which de facto means block production but not finality, so some rewards must continue, but their levels shall wind up greatly reduced.

## Backing rewards

We can reward backing checks easily since they appear on-chain anyway. We do not reward the on-chain backing attestation itself, but only availability being completed, aka inclusion.

We envision back pressure being applied at backing, meaning if nodes feel overworked then they should drop backing checks, not approval checks. We need backing groups to be small enough that this creates system-wide back pressure, ala 2-of-3, 1-of-1, or 2-of-2. We thus need backing work to be rewarded less well than approvals, so that validators handle back pressure correctly.

At the same time, backers do more work than approval checkers, due to availability distribution. We omit separate rewards for availability chunk distribution though. As a compromise, we suggest backers earn 50% of what approval checkers earn. We do reward availability chunk providers below, which presumably incentivizes fetching the chunks, including from backers.

#### Attacks

We foresee some small risk that backing checkers shirk their duties, expecting other backers to pick up their slack. We believe backing groups are small enough that such lazy backers suffer lower backing rewards if this behavior becomes widespread and/or slows backing.

We risk backers not distributing systematic chunks, so that only they earn rewards from the preference for fetching systematic chunks. We could simply reward backers less when fewer availability providers offer systematic chunks.

Alternatives:

- Reward backers based upon systematic availability votes, or all availability votes.
- Do not reward availability chunk providers either.

## Approval rewards

We've explored several plausible architectures for rewards. We briefly discuss approaches not taken below, but mostly those incurred considerable complexity.

All validators run an approval assignment loop for each candidate, in which the validator listens to other approval checkers' assignments and approval statements/votes, so the validator marks candidates approved and also determines and announces its own approval check assignments. Any validator should always conclude whatever approval checks it begins, but our approval assignment loop ignores some approval checks, either because they were announced too soon or because an earlier checker delivered its approval vote slowly. We say a validator $u$ *uses* an approval vote by a validator $v$ on a candidate $c$ if the approval assignment loop of $u$ counted the vote by $v$ towards approving the candidate $c$.

We propose a simple approximate solution based upon computing medians across validators for used votes.

0. In an epoch $e$, each validator $u$ counts the number $\alpha_{u,v}$ of approval votes they *used* from each validator $v$, including themselves. Any time a validator marks a candidate approved, they increment these counts appropriately.
1. After epoch $e$'s last block gets finalized, all validators of epoch $e$ vote upon, aka reveal, the number $\alpha_{u,v}$ of useful approvals they saw from each validator $v$ on candidates that became available in epoch $e$. We do not send $\alpha_{u,u}$, for tit-for-tat reasons discussed below, not for bias concerns.
2. We compute the median $\alpha_v := \textrm{median} \{ \alpha_{u,v} : u \}$ of approval statements for each validator $v$.

We must compute these medians somehow: As a naive option, we could first post the $\alpha_{u,v}$ on-chain unsorted and then sort to extract the midpoint, which we repost on-chain.
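As a minimal sketch of this naive option, with all names and counts purely illustrative: sort the submitted counts for one validator and take the midpoint. A handful of dishonest reports barely moves the result.

```python
# Naive median computation for the used-approvals counts alpha_{u,v}.
# `reports` holds the counts alpha_{u,v} submitted by every validator u
# for one fixed validator v.  (Illustrative sketch, not protocol code.)

def median_used_approvals(reports):
    """Return alpha_v, the median of the submitted counts for one validator."""
    counts = sorted(reports)
    mid = len(counts) // 2
    return counts[mid]  # midpoint; for even n this takes the upper median

# A few under-reporting validators barely move the median:
honest = [40, 41, 41, 42, 42, 43, 44]
assert median_used_approvals(honest) == 42
skewed = honest[:-2] + [0, 0]          # two validators report zero
assert median_used_approvals(skewed) == 41
```

Posting the raw counts and the sorted extraction both on-chain is what drives the $O(n^2)$ storage figure discussed next.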
If $n$ is the number of validators, then these records require $O(n^2)$ space on-chain, but they require only $n$ accesses. As an alternative, we also consider semi-online algorithms for computing the median, like a [max heap and min heap combination](https://www.quora.com/Is-there-an-online-algorithm-to-calculate-the-median-of-a-stream-of-numbers-if-stream-elements-can-be-added-or-removed-at-any-point?share=1). We therefore propose extending substrate storage with a priority queue, aka heap, subtree data structure, so that a pair of heaps helps compute each $\alpha_v$ and Merkelizes well. We expect other semi-functional or change-resistant tree-like data structures Merkelize well too, which requires further exploration.

We do not achieve true consensus on approval checkers and their approval votes. Yet, our approval assignment loop gives a rough consensus, under some synchrony assumption and our Byzantine assumption. We thus expect the $\alpha_v$ agree well in practice. It then follows that misreporting by malicious validators should not appreciably alter the medians $\alpha_v$ and hence rewards.

We do however face two dilemmas: A priori, validators should fix their $\alpha_{u,v}$ updates when they first mark a candidate approved, because doing so simplifies the code and makes rewards more fair for faster nodes. We'd perhaps achieve better consensus on the median $\alpha_v$ if nodes waited slightly longer however.

As always, rewards exist to prevent validators from gaining an advantage by modifying their nodes' code. We hope validator operators benefit little from changing their code, but our median computation creates an incentive to do so. As part of the solution, all validators know their own self count $\alpha_{u,u}$, so we define $u$'s opinion of how much $v$ *stiffed* $u$ by $\zeta_{u,v} := \alpha_{u,u} - \alpha_{v,u}$, which we handle in the next section.

We expect relay chain equivocations to be vanishingly rare.
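The two-heap running median mentioned above can be sketched as follows; `RunningMedian` is a hypothetical name, and a Merkelized substrate variant would require further design. The lower half lives in a max-heap and the upper half in a min-heap, kept within one element of each other in size.

```python
import heapq

# Two-heap running median sketch.  Python's heapq is a min-heap, so the
# lower half stores negated values to act as a max-heap.

class RunningMedian:
    def __init__(self):
        self.lo = []  # max-heap (negated): smaller half of the stream
        self.hi = []  # min-heap: larger half of the stream

    def push(self, x):
        if self.lo and x > -self.lo[0]:
            heapq.heappush(self.hi, x)
        else:
            heapq.heappush(self.lo, -x)
        # Rebalance so the heap sizes differ by at most one.
        if len(self.lo) > len(self.hi) + 1:
            heapq.heappush(self.hi, -heapq.heappop(self.lo))
        elif len(self.hi) > len(self.lo) + 1:
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        if len(self.hi) > len(self.lo):
            return self.hi[0]
        return -self.lo[0]  # tie: take the lower median

rm = RunningMedian()
for count in [5, 1, 4, 2, 3]:
    rm.push(count)
assert rm.median() == 3
```

Each insertion costs $O(\log n)$ and the median read is $O(1)$, which is what makes the on-chain bookkeeping cheaper than repeatedly sorting all $n$ reports.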
We could therefore ignore approval assignments on candidate equivocations, aka relay chain equivocation assignments, and their statements.

#### Approaches not taken

We caution against two earlier approaches now discarded:

1. Rewarding only tranche zero assignments simplifies some probabilistic designs, but risks a tragedy of the commons, in which validators do not check the more delayed tranches.
2. Probabilistic rewards risk validators manipulating the randomness, which favours whales. We do want better randomness for applications like games, but racing sounds unwise.

We also considered "approval rewards transactions" that contain all approval assignments and validity statements for recently approved candidates. In fact, we designed the approval assignment loop to be usable on-chain, which enables processing such reward transactions. Although too heavy for the relay chain, we could collect "approval rewards transactions" onto a single "system parachain". Any system parachain incurs some complexity though, so it is unclear whether this would be simpler than the tit-for-tat scheme described below. We need the tit-for-tat scheme for availability provider rewards anyway, so we incur only moderate development and maintenance burden from the combined tit-for-tat system.

Although objective, we think "approval rewards transactions" could be more easily gamed than the off-chain system proposed above. In other words, approval checkers could delay releasing their assignment or vote until approval looks assured, and then vote yes without checking. We risk this anyway of course, but doing so looks more risky, especially if the checkers all run long.

## Availability provider rewards

As approval checkers could easily perform useless checks, we shall reward availability providers for the availability chunks they provide that resulted in useful approval checks.
We enforce honesty using a tit-for-tat mechanism because chunk transfers are inherently subjective.

We reconstruct a full parachain block by downloading $f+1$ distinct chunks from other validators. In this, we mix downloading arbitrary chunks from $f+1$ different validators, out of the $n = 3f+1$ total validators, together with downloading the initial systematic $f+1$ chunks from validators who already voted valid, like backing checkers. We could even expand this with torrent-like chunk sharing among our directly connected peer set. In all cases, we shall account for the download in terms of chunks. We remark that some validators should receive credit for more than one chunk per candidate.

We expect a validator $v$ has actually performed more approval checks $\omega_v$ than the median $\alpha_v$ for which they received credit. In fact, approval checkers even ignore some of their own approval checks, meaning $\alpha_{v,v} \le \omega_v$.

Alongside the approvals count for epoch $e$, approval checker (aka validator) $v$ computes the counts $\beta_{u,v}$ of the number of chunks they downloaded from each availability provider (aka validator) $u$, including themselves, which they perceive useful, meaning their own approval counts in $\alpha_{v,v}$. Approval checkers publish $\beta_{u,v}$ alongside $\alpha_{u,v}$. Symmetrically, availability provider $u$ computes the counts $\gamma_{u,v}$ of the number of chunks they uploaded to each approval checker $v$, again including themselves, again which they perceive useful, meaning the approval by $v$ counts in $\alpha_{u,v}$. Availability provider $u$ never reveals its $\gamma_{u,v}$ however.

At this point, $\alpha_v$, $\alpha_{v,v}$, and $\alpha_{u,v}$ all potentially differ.
We established consensus upon $\alpha_v$ above however, with which we avoid approval checkers printing unearned availability provider rewards: After receiving "all" pairs $(\alpha_{u,v},\beta_{u,v})$, validator $w$ re-weights the $\beta_{u,v}$ and their own $\gamma_{w,v}$.

$$
\begin{aligned}
\beta'_{w,v} &= {(f+1) \alpha_v \over \sum_u \beta_{u,v}} \beta_{w,v} \\
\gamma'_{w,v} &= {(f+1) \alpha_w \over \sum_v \gamma_{w,v}} \gamma_{w,v} \\
\end{aligned}
$$

At this point, we compute $\beta'_w = \sum_v \beta'_{w,v}$ on-chain for each $w$ and reward $w$ proportionally.

We now apply a tit-for-tat mechanism to prevent validators lying to reward their friends more than whoever actually gave them data, which our re-weighting turns into them stiffing real availability providers. An availability provider $w$ defines $\delta'_{w,v} := \gamma'_{w,v} - \beta'_{w,v}$ to be the re-weighted number of chunks by which $v$ *stiffed* $w$. Above we defined $\zeta_{w,v}$ to be the number of approval checks by which $v$ *stiffed* $w$ too. So $w$ increments their cumulative stiffing perception $\eta_{w,v}$ of $v$ by $\eta_{w,v} \mathrel{+}= \delta'_{w,v} + c \zeta_{w,v}$, for some fixed constant $c>1$.

In future, anytime $w$ seeks chunks in reconstruction, $w$ *skips* $v$ proportional to $\eta_{w,v} / \sum_u \eta_{w,u}$, with each skip reducing $\eta_{w,v}$ by 1. We expect honest accidental availability stiffs have only fractional $\delta'_{w,v}$, so they clear out quickly, but intentional stiffs add up more quickly. As $c>1$, accidental approval stiffs add up more quickly too, but not necessarily too fast.

We keep $\gamma_{w,v}$ and $\alpha_{u,u}$ secret so that approval checkers cannot really know others' stiffing perceptions, although $\alpha_{u,v}$ leaks some relevant information. We expect this secrecy keeps skips secret and thus prevents the tit-for-tat escalating beyond one round, which hopefully creates a desirable Nash equilibrium.
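The bookkeeping above can be sketched as follows; the names `reweight`, `Stiffing`, and the constant `C` are illustrative, not protocol names, and a real node would track these per epoch.

```python
# Tit-for-tat bookkeeping sketch for one availability provider w.

C = 2.0  # fixed constant c > 1: approval stiffs weigh more than chunk stiffs

def reweight(count, total, f, alpha):
    """Scale a raw chunk count so each candidate credits f+1 chunks:
    count' = (f+1) * alpha * count / total."""
    return (f + 1) * alpha * count / total

class Stiffing:
    def __init__(self):
        self.eta = {}  # cumulative stiffing perception eta_{w,v}

    def update(self, v, delta, zeta):
        # eta_{w,v} += delta'_{w,v} + c * zeta_{w,v}
        self.eta[v] = self.eta.get(v, 0.0) + delta + C * zeta

    def skip_probability(self, v):
        # w skips v proportional to eta_{w,v} / sum_u eta_{w,u}.
        total = sum(self.eta.values())
        return self.eta.get(v, 0.0) / total if total > 0 else 0.0

    def record_skip(self, v):
        # Each actual skip works off one unit of perceived stiffing.
        self.eta[v] = max(0.0, self.eta.get(v, 0.0) - 1.0)

assert reweight(3, 6, 1, 4) == 4.0       # (1+1) * 4 * 3 / 6

s = Stiffing()
s.update("v1", delta=0.5, zeta=0.0)      # small accidental chunk stiff
s.update("v2", delta=2.0, zeta=1.0)      # chunk stiff plus one approval stiff
assert s.eta["v2"] == 4.0                # 2.0 + C * 1.0
assert abs(s.skip_probability("v2") - 4.0 / 4.5) < 1e-9
s.record_skip("v1")                      # fractional stiffs clear out quickly
assert s.eta["v1"] == 0.0
```

The example shows the intended asymmetry: the fractional accidental stiff vanishes after a single skip, while the intentional stiff persists across several reconstructions.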
We favor fetching systematic chunks to reduce reconstruction costs, so we face costs when skipping them. We could however fetch systematic chunks from availability providers as well as backers, or even other approval checkers, so this might not become problematic in practice.

## Non-laziness hashes

Above, we never prevent validators from simply claiming they checked the parablock, perhaps after watching several others run checks, but without actually doing the work. We expect this resembles a validator delaying their announcement and vote until after many others vote valid.

A priori, we expect economics prevents lazy validators from lying about performing approval or backing checks like this, for several reasons:

1. We expect maintaining software forks costs far more than running a few nodes honestly, especially since doing so risks losing rewards, being slashed, and perhaps not having innocent slashes refunded due to non-conformant bugs. Yet, this cost declines as code stabilizes.
2. Any such lazy validators risk being slashed 100% if enough others do likewise, perhaps emboldening attackers. Alone we doubt this dissuades laziness, but software forks and common delays being detected risk angering nominators.
3. Availability provider rewards cannot be cheated so easily, so such lazy validators still participate somewhat in availability, and thus still pay significant bandwidth costs. We expect hosting and bandwidth costs exceed the fixed CPU costs, making this our dominant economic argument.

As an option, we propose one simple check that helps prevent laziness: We ask all PVFs to return an execution hash, dependent upon the PVF's own code and the parablock, but which otherwise carries no special meaning. We define the extended execution hash to be the hash of the execution hash followed by the parablock data, or perhaps the PoV block data.
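A hedged sketch of this hash chain, assuming blake2b as the hash function and byte concatenation as the encoding, neither of which the scheme fixes:

```python
from hashlib import blake2b

# Sketch of the execution hash and extended execution hash.  In reality
# the PVF itself would derive execution_hash during execution; here a
# stand-in digest over the PVF code and parablock suffices.

def execution_hash(pvf_code: bytes, parablock: bytes) -> bytes:
    # Any digest depending on the PVF's own code and the parablock;
    # it carries no meaning beyond witnessing that execution happened.
    return blake2b(pvf_code + parablock, digest_size=32).digest()

def extended_execution_hash(exec_hash: bytes, parablock: bytes) -> bytes:
    # Hash of the execution hash followed by the parablock data.
    return blake2b(exec_hash + parablock, digest_size=32).digest()

eh = execution_hash(b"pvf-code", b"parablock-data")
xeh = extended_execution_hash(eh, b"parablock-data")
assert len(xeh) == 32
```

Since both inputs feed through the parablock data, a node that never obtained the parablock cannot produce these values, which is the property the per-validator hash defined next relies upon.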
We then define a validator $v$'s 16 byte non-laziness hash to be the hash of $v$'s public key, the extended execution hash, and maybe the candidate receipt. We never share execution hashes or extended execution hashes over the network, and the non-laziness hash depends upon the parablock data, so dishonest nodes could only compute the non-laziness hash by at least obtaining the parablock, and ideally by running the PVF on it.

All checkers place their 16 byte non-laziness hash into their validity statements. All checkers save their extended execution hashes so they can quickly check one another's non-laziness hashes. Any honest validator who detects an invalid non-laziness hash broadcasts an accusation. We make W3F and/or Parity log and investigate accusations of invalid non-laziness hashes. Any accusations found correct result in, first, public criticism of the validator operator, and second, an upgrade of the non-laziness hash system to perform on-chain punishments.

As Sophia observed, Ed Felten (Arbitrum) discussed almost exactly this under the catchier name [attention challenges](https://medium.com/offchainlabs/cheater-checking-how-attention-challenges-solve-the-verifiers-dilemma-681a92d9948e), including more accusation discussion. Again, these sound unnecessary if bandwidth costs dominate CPU costs.

## Issues

We do not prevent validators from preferentially obtaining their pieces from their friends. We should analyze, or at least observe, the long-term consequences.

We track rewards so coarsely that identifying bugs may prove more difficult, or similarly distinguishing bugs from modified behavior, which also demands careful observation. In particular, we do not have the per block provider bitfields from https://github.com/w3f/research-security-issues/issues/30#issuecomment-768593265.
#### Rob's nominator-wise skipping

A priori, a whale nominator's validators could stiff validators but then rotate their validators quickly enough that they never suffer being skipped back. As a fix, we could assign skip weights to nominators and then deduce validators' skip weights from those. We do not require additional bandwidth or on-chain computation for nominator-wise skip computations, only extra local computation, but such local computations shall not perfectly align with shifting balances.

We do caution that nominator-wise skipping complicates our ongoing nomination protocol work, which eventually extends outside Polkadot itself, ala parachains nominating. We worry nominator-wise skipping creates an arguably worse risk: bad validators with enough nominators boost their rewards by accruing penalties, which stay with their innocent nominators, who never even know their skip penalties, and then penalize the other validators those nominators nominate.

Initially, we propose only validator-wise skipping by default, but eventually either implement nominator-wise skipping as a parallel optional system, or else some simpler detection system. In theory, we could give validators the option to run a hybrid of both systems, and locally determine their own strategy. All these concerns depend somewhat upon the relative rewards values, so we'll ultimately adopt one or more of the following:

- Prove some Nash equilibrium among hybrids of validator-wise and nominator-wise skipping.
- Increase costs for whale nominators churning their validators, like by increasing minimum self stake not tied to identity.
- Do rewards votes inside TEEs like SGX.
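One hypothetical way to deduce a validator's skip weight from its nominators, sketched here under the assumption that skip weight scales with each nominator's share of the validator's backing stake; the function and field names are illustrative only.

```python
# Hypothetical derivation of a validator's skip weight from nominator
# skip weights, weighted by the stake each nominator puts behind it.

def validator_skip_weight(backing, nominator_weight):
    """backing: {nominator: stake backing this validator};
    nominator_weight: {nominator: accrued skip weight}.
    Returns the stake-weighted average of the nominators' skip weights."""
    total = sum(backing.values())
    if total == 0:
        return 0.0
    return sum(stake * nominator_weight.get(n, 0.0)
               for n, stake in backing.items()) / total

backing = {"nom1": 100.0, "nom2": 300.0}
weights = {"nom1": 2.0}                  # nom1 previously backed a stiffer
assert validator_skip_weight(backing, weights) == 0.5
```

Under this scheme a whale rotating validators carries its skip weight along, since the penalty follows the nominator's stake rather than the validator identity, which is precisely the behavior the fix targets.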