# NPoS validator election protocol: running computations off-chain
A suggestion to optimize the NPoS validator election protocol is to let general dot holders submit tentative solutions (where a solution consists of a committee of validators + a distribution of the nominators' stake among them) during a submission period at the end of each era. In order to find these solutions they may run off-chain one of our suggested solution-finding algorithms (see our [NPoS paper](https://github.com/w3f/research-internal/blob/master/papers/NPoS_validator_selection.pdf)), or even any other algorithm of their choice. They can also use any randomness and tie breaking rule of their choice, which will result in different solutions even for the same algorithm. On top of this voluntary participation open to anyone, **we make each validator compute a solution off-chain and submit it**.
What we will run on-chain is a fast protocol that checks the validity of each submission, computes a numerical **score** representing its quality, and uses this score to rank the submissions, keeping only the best one seen so far. The best submission at the end of the submission period becomes the official solution for the next era. Optionally, to prepare for the unlikely case that there are no (good) submissions, we can also run the seqPhragmen algorithm on-chain, which is the fastest algorithm we have that satisfies the proportional justified representation (PJR) property.
The main gain of this approach is that on-chain computations become much faster, since we move the heavy tasks off-chain. Besides, we can now afford to run solution-finding algorithms other than seqPhragmen off-chain, which are slower but tend to find better solutions. Moreover, if different agents run different algorithms and we keep the best solution found, we are likely to end up with a better solution than if everybody ran the same algorithm. In this note we provide general algorithmic considerations for the tasks of submitting and processing solutions.
## Definitions and notation
Recall that an instance of the NPoS validator election problem is given by the bipartite graph $(N\cup V, E)$, where $N$ is the set of nominators, $V$ is the set of validator candidates, and there is an edge $nv\in E$ whenever nominator $n\in N$ supports the election of candidate $v\in V$. The instance also includes nominator budgets $b:N\rightarrow \mathbb{R}_{\geq 0}$, and a target number $m$ of validators to elect.
A valid solution consists of a tuple $(S,w)$, where $S\subseteq V$ is a committee of $m$ validators, and $w:E\rightarrow \mathbb{R}_{\geq 0}$ is an edge weight vector which describes how the budget of each nominator $n$ is distributed among $n$'s elected neighbors. Vector $w$ must be *valid*, meaning that $\sum_{v\in S: \ nv\in E} w_{nv}\leq b_n$ for each nominator $n\in N$. Solution $(S,w)$ provides a support $supp_{w}(v):=\sum_{n\in N: \ nv\in E} w_{nv}$ to each validator $v\in S$, and the overall support of the solution is defined as $supp_w(S):=\min_{v\in S} supp_w(v)$, i.e. the minimum of the validator supports.
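For concreteness, here is a minimal Python sketch of these definitions; all names and the representation (weights keyed by (nominator, validator) pairs, budgets keyed by nominator) are illustrative assumptions, not part of any actual implementation.

```python
from collections import defaultdict

def is_valid(w, budgets):
    """A weight vector is valid if no nominator spends more than its budget."""
    spent = defaultdict(float)
    for (n, _), x in w.items():
        spent[n] += x
    return all(spent[n] <= budgets[n] for n in spent)

def solution_support(w, S):
    """Per-validator supports and the overall (minimum) support of (S, w)."""
    supp = {v: 0.0 for v in S}
    for (_, v), x in w.items():
        if v in supp:
            supp[v] += x
    return supp, min(supp.values())
```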
## Solution reduction
A tentative solution $(S,w)$ must be submitted on-chain as a transaction. This imposes a hard constraint on the encoding length of the solution, as the size of any block, and thus of any transaction, is a couple hundred kilobytes at most. To deal with this, we show **a)** how to **reduce** the number of edges with non-zero weight in any solution without changing the supports it provides to validators, and **b)** how to **encode** this reduced solution compactly. The idea is that every agent that finds a solution must perform these two steps before submitting it, and we shall only accept submitted solutions that are so encoded.
**Definitions.** Given a solution $(S,w)$, we denote by $E_w$ the **edge support of $w$**, i.e.
$$E_w:=\{e\in E: \ w_e>0\}.$$
We say that a solution $(S,w)$ is **reduced** if $E_w$ is a forest, i.e. it contains no cycles. Furthermore, let $k$ be the maximum degree of a nominator in any solution of size $m$. Parameter $k$ is always upper bounded by $m$, as well as by the maximum nominator degree of the input graph. In particular, in Polkadot we let each nominator approve only up to 16 candidates; in this case, $k\leq 16$.
**Theorem. For any valid solution $(S,w)$ there is a valid reduced solution $(S,w')$ with $supp_{w'}(v)=supp_{w}(v)$ for each validator $v\in S$. Moreover, such a solution can be computed from $(S,w)$ in time $O(\min\{|E_{w}|\cdot k + m^3, |E_w|\cdot m\})$.**
Before we present the explicit algorithm, we explain the general idea. We consider the input edge weight vector $w$ as a directed flow over $E_w$, where the flow in each edge is directed from the nominator to the validator. We build $w'$ from $w$ by removing circulations to cancel out the flow over as many edges as possible, while preserving flow demands over all vertices and without reverting the flow direction over any edge. As long as there is a cycle, we can remove an additional circulation and eliminate at least one new edge from $E_{w'}$. This shows that the algorithm always progresses and will eventually finish with an acyclic edge support.
We keep a data structure that represents a forest of rooted trees, which is initialized as a collection of singletons -- one per vertex -- and to which edges in $E_w$ are added one by one, causing the trees to merge. Whenever a new edge creates a cycle, we detect it and destroy it by removing a circulation.
We also run a pre-computation -- Phase 1 below -- which is designed to detect and remove cycles of length four exclusively. This pre-computation is optional, and if we skip it then the running time becomes $O(|E_w|\cdot m)$, so the pre-computation makes sense only if $m\gg k$ and $|E_w|\gg m^2$.
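For intuition, here is a small worked example with made-up weights. Suppose $E_w$ contains the length-4 cycle $n, v, n', v'$ with $w_{nv}=3$, $w_{n'v}=5$, $w_{n'v'}=2$ and $w_{nv'}=4$. The edge of least weight is $n'v'$, so we subtract a circulation of value $2$ along the cycle:
$$w_{n'v'}\leftarrow 0, \quad w_{n'v}\leftarrow 7, \quad w_{nv}\leftarrow 1, \quad w_{nv'}\leftarrow 6.$$
Each nominator still spends the same total ($7$ each), each validator keeps the same support ($8$ for $v$ and $6$ for $v'$), and edge $n'v'$ leaves the edge support.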
**Algorithm $Reduce(S,w)$.**
1. Initialize the edge list $F\leftarrow E_w:=\{e\in E: \ w_e>0\}$ and edge weight vector $w'\leftarrow w$.
(Phase 1: removing cycles of length 4)
2. Create a hash map $M$ whose keys are the $\binom{m}{2}$ pairs of validators in $S$, and whose values will be nominators. It is initialized as empty.
3. For each nominator $n\in N$ (outer loop), and each pair of validators $\{v,v'\}$ with $nv, nv'\in F$ (inner loop):
a. If key $\{v,v'\}$ has no value, add $n$ as its value and continue.
b. Else, let $n'$ be the value in key $\{v,v'\}$. If $n'v\notin F$ or $n'v'\notin F$, replace value $n'$ with $n$ in this key and continue.
c. Else, we have identified a size-4 cycle $C=nvn'v'$ in $F$:
c.1. Considering the weights over $w'$ as flows directed from nominators to validators, find the edge $e\in C$ with least flow $w'_e$, and subtract from $C$ a circulation of value $w'_e$, so that the new weight of $e$ is zero (i.e. alternately add and remove the amount $w'_e$ from the weights of the edges along $C$).
c.2. Remove from $F$ the edge $e$ and any other edge $e'\in C$ whose new weight is $w'_{e'}=0$.
c.3. Leave $n$ as a value in key $\{v,v'\}$ iff $nv, nv'\in F$, and leave $n'$ as a value in this key iff $n'v, n'v'\in F$ (notice we'll always leave at most one value in this key).
(Phase 2: removing cycles of all lengths)
4. Define a hash map $M'$ where both the keys and the values will be elements in $N\cup S$, and all values are initialized to $null$. Given a key $x\in N\cup S$, its value $y$ in $M'$ is denoted by $M'(x)$ and called $x$'s "parent". An invariant of our algorithm is that the arcs of this map always correspond to a forest of rooted trees over $N\cup S$.
5. For each edge $nv\in F$:
a. (Dealing with new vertices) If both $M'(n)=null$ and $M'(v)=null$, set $M'(n)\leftarrow v$ and $M'(v)\leftarrow v$, and continue. If $M'(n)=null$, set $M'(n)\leftarrow v$, and continue. If $M'(v)=null$, set $M'(v)\leftarrow n$, and continue. (Notice the distinction between the cases $M'(x)=null$ and $M'(x)=x$: the former means "$x$ is not yet part of any tree", while the latter means "$x$ is the root of its own tree".)
b. Let $r_n$ and $r_v$ be respectively the roots of $n$'s tree and $v$'s tree, which we find by following the parents iteratively. In doing so, we also keep track of these paths' lengths.
c. If $r_n\neq r_v$, we merge these trees as follows. If path $n-r_n$ is shorter than path $v-r_v$, revert the parent relations of all vertices along path $n-r_n$, and set $M'(n)=v$; else, revert the parent relations of all vertices along path $v-r_v$, and set $M'(v)=n$. Continue.
d. If $r_n=r_v$, identify the common suffix of paths $n-r_n$ and $v-r_v$, and then identify the simple cycle $C$ that contains the edges on the different prefixes of these paths plus edge $nv$.
e. In cycle $C$, identify the edge $e$ of least weight $w'_{e}$, and subtract from $C$ a circulation of value $w'_e$, so that the new weight of $e$ is zero. Remove $e$ from $F$.
f. (Rearranging the tree structure) If edge $e$ belongs to path $n-r_n$, revert the parent relations of all nodes between $n$ and $e$ along this path, and set $M'(n)=v$. Else, if edge $e$ belongs to path $v-r_v$, revert the parent relations of all nodes between $v$ and $e$ along this path, and set $M'(v)=n$. (If $e=nv$, do nothing at this step.)
g. For each additional edge $e'\in C\setminus\{e\}$ with $w'_{e'}=0$, remove $e'$ from $F$ and remove the corresponding parent relation between the endpoints of that edge. (More precisely, if $M'(x)=y$ where $x$ and $y$ are the endpoints of $e'$, then set $M'(x)\leftarrow x$.)
6. Return $w'$ and $F$.
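For concreteness, below is a deliberately simple Python sketch of the cycle-cancelling idea, under the same illustrative representation as before (weights keyed by (nominator, validator) pairs). It finds cycles with a plain DFS instead of the rooted-forest data structure above, so it is asymptotically slower, but it shows the circulation-removal step explicitly.

```python
def reduce_solution(w):
    """Cancel circulations until the edge support of w is a forest.
    Returns the new weight vector; its keys are the reduced edge support."""
    w = {e: x for e, x in w.items() if x > 0}

    def find_cycle():
        # Undirected adjacency lists over the current edge support.
        adj = {}
        for (n, v) in w:
            adj.setdefault(n, []).append(v)
            adj.setdefault(v, []).append(n)
        visited = set()

        def dfs(x, parent, path):
            visited.add(x)
            path.append(x)
            for y in adj[x]:
                if y == parent:
                    continue
                if y in path:                    # back edge: cycle y ... x
                    return path[path.index(y):]
                if y not in visited:
                    cycle = dfs(y, x, path)
                    if cycle is not None:
                        return cycle
            path.pop()
            return None

        for s in list(adj):
            if s not in visited:
                cycle = dfs(s, None, [])
                if cycle is not None:
                    return cycle
        return None

    while True:
        cycle = find_cycle()
        if cycle is None:
            return w                             # acyclic: solution is reduced
        k = len(cycle)
        # Orient each cycle edge as stored in w, i.e. (nominator, validator).
        edges = []
        for i in range(k):
            a, b = cycle[i], cycle[(i + 1) % k]
            edges.append((a, b) if (a, b) in w else (b, a))
        i0 = min(range(k), key=lambda i: w[edges[i]])
        delta = w[edges[i0]]
        # Alternately subtract/add delta around the even-length cycle: each
        # vertex has one "+" and one "-" incident cycle edge, so nominator
        # expenditures and validator supports are preserved, and edge i0
        # (the minimum) drops to exactly zero.
        for i, e in enumerate(edges):
            w[e] += delta if (i - i0) % 2 == 1 else -delta
        for e in edges:
            if w[e] <= 0:
                del w[e]
```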
The previous theorem follows from the next claims.
**Lemma.** In the algorithm above:
i. At the end of each iteration of Steps 3 and 5, $w'$ remains valid, and keeps all validator supports unchanged. Moreover, as $w'$ evolves, list $F$ remains equal to edge support $E_{w'}$.
ii. Step 3 executes in time $O(|E_w|\cdot k)$.
iii. In the hash map $M$ defined in Step 2, each key has at most one value throughout the execution of Step 3.
iv. In Step 5, there are at most $\min\{|E_w|,m^2\}$ iterations that execute in time $O(m)$ each, and at most $|N|$ iterations that execute in time $O(1)$ each. Thus, Step 5 takes time $O(\min\{|E_w|\cdot m, m^3+|N|\})$.
v. The overall running time of the algorithm is $O(\min\{|E_w|\cdot k+m^3, |E_w|\cdot m\})$.
vi. The hash map $M'$ in step 4 has the property that the underlying *undirected* graph over vertices $N\cup S$ defined by its arcs remains cycle-free after each iteration of Step 5. Moreover, at the end of Step 5 the set of edges in this graph coincides with $F$.
*Proof.* i. This follows from the fact that we only subtract circulations from the network flow, so all the node demands are preserved. The second statement is evident.
ii. If a nominator $n$ has $d_n$ incident edges in $E_w$, it will account for at most $\binom{d_n}{2}$ iterations of Step 3. The total number of iterations is then at most
$$\sum_{n\in N} \binom{d_n}{2}< \sum_{n\in N} \frac{d_n\cdot k}{2}=\frac{k}{2}\sum_{n\in N}d_n = \frac{k}{2}\cdot |E_w|=O(|E_w|\cdot k),$$
where we used the fact that $d_n\leq k$ for each nominator $n$. It is evident that each iteration of Step 3 takes constant time.
iii. Assume there is an iteration of Step 3 such that some key $\{v,v'\}$ starts with one value $n'$ and ends with two values $\{n,n'\}$. This means that each edge in the cycle $n-v-n'-v'$ is in $F$ by the end of the iteration; but this is precisely the cycle $C$ considered in that iteration, so at least one of these edges should have been cancelled in Step 3.c. We reach a contradiction.
iv. Consider the state of edge set $F$ by the end of Step 3. We call a nominator $n$ a *leaf* if it has a single incident edge in $F$, and the corresponding edge is called a *leaf edge*. Clearly, there are as many leaf edges as leaves, both at most $|N|$. Whenever $nv$ is a leaf edge, the corresponding iteration of Step 5 stops at 5.a., taking time $O(1)$. On the other hand, for a non-leaf edge the iteration of Step 5 takes time $O(m)$, because the length of the path between any node $n$ (or $v$) and its root $r_n$ (or $r_v$) is at most $2m$, as every second node along the path is a distinct validator in $S$. Clearly, there are at most $|E_w|$ non-leaf edges, even if we skip Phase 1. To complete the proof of the claim, we also show that if we do perform Phase 1, then the number of non-leaf edges in $F$ is also bounded by $m^2$. In the hash map $M$, for every non-leaf edge $nv$ there must be at least one key $\{v,v'\}$ having value $n$. Since each such key $\{v,v'\}$ has at most one value, a value $n$ in it represents two (non-leaf) edges (namely $nv$ and $nv'$), and there are $\binom{m}{2}$ keys, there can be at most $2\cdot \binom{m}{2}<m^2$ non-leaf edges.
v. This running time follows immediately from points ii. and iv.
vi. Let $H$ be the underlying undirected graph of the linked data structure. It can be checked that if an iteration of Step 5 adds an edge $e\in F$, then in this and all subsequent iterations it holds that $e\in F$ iff $e\in H$. Therefore, once we have inspected all edges in $F$ we have that $F$ coincides with the edge set of $H$. Moreover, it is easy to see that $H$ always remains acyclic.
$\square$
## Solution encoding
Notice that if $(S,w)$ is a reduced solution with $m$ validators, the size of its edge support is
$$|E_w|< |N|+m.$$
This is because the graph induced by the edge support $E_w$ forms a forest with $|N| + m$ vertices, and a forest always has fewer edges than vertices; see [here](https://proofwiki.org/wiki/Number_of_Edges_in_Forest).
We exploit this fact to achieve a small encoding. We also exploit the fact that, for each nominator, the sum of weights of its incident edges in $E_w$ must be equal to its budget, so we can skip the weight of one edge per nominator in the encoding. In particular for nominators which are leaves (i.e. with a single incident edge) we don't need to give any weight in the encoding. We assume here that there is a publicly known array of nominators (with a canonical traversal ordering), a list of their budgets, and a list of candidates, so that this information does not need to be part of the encoding.
We **encode** a reduced solution $(S,w)$ with
1. an array of length $m$, with identifiers of the validators in $S$, and
2. an array of length $|N|$, with one entry per nominator which encodes that nominator's incident edges and edge weights in $E_w$,
where for a nominator $n\in N$, the entry mentioned in Point 2 is as follows. If $n$ is not part of the solution, the entry is null. Else, we provide an array listing each neighbor $v$ of $n$ in $E_w$, using $v$'s index in the array of Point 1, followed by an array of all the edge weights $w_{nv}$, expressed as a fraction of $n$'s budget, except for the last weight which can be inferred from the other ones as the fractions must sum up to one.
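Below is a hypothetical Python sketch of this encoding. The field widths (four bytes per validator identifier, two bytes per committee index and per weight) and all names are illustrative assumptions, not the actual on-chain format; `assignments` maps each nominator id to its list of (validator id, weight fraction) pairs in $E_w$, and `all_nominators` gives the publicly known canonical nominator order.

```python
import struct

def encode(committee, assignments, all_nominators):
    out = bytearray()
    for v in committee:
        out += struct.pack(">I", v)          # validator identifier (B_V = 32 bits here)
    index = {v: i for i, v in enumerate(committee)}
    for n in all_nominators:                 # canonical, publicly known order
        pairs = assignments.get(n)
        if not pairs:
            out += struct.pack(">B", 0)      # null entry: nominator not in the solution
            continue
        out += struct.pack(">B", len(pairs))
        for v, _ in pairs:
            out += struct.pack(">H", index[v])   # committee index: log2(m)+O(1) bits
        for _, frac in pairs[:-1]:               # last weight is inferred by the reader
            # Round down to a multiple of 1/2^16 (B_w = 16 bits).
            out += struct.pack(">H", min(int(frac * (1 << 16)), 0xFFFF))
    return bytes(out)
```

Note that in this sketch a nominator with a single incident edge costs only three bytes (a count byte plus one committee index) and no weight at all, which is consistent with the few kilobytes per thousand nominators estimated below.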
**Theorem.** The bit-length of the encoding above is
$$(|N|+m)\cdot (\log_2 m +O(1)) + m\cdot (B_V + B_w),$$
where $B_V$ is an upper bound on the bit-length required to identify a validator candidate, and $B_w$ is an upper bound on the bit-length required to express an edge weight as a fraction of the nominator's budget (depending on the desired level of accuracy).
*Proof.* The array of elected validators in Point 1 has a bit-length of $m\cdot B_V$. There are at most $(|N|+m)$ edges in the solution, and each one can be identified, without its weight, with only $(\log_2 m +O(1))$ bits, because it is enough to indicate the validator's index at the nominator's list entry. Thus, all edges without weights take $(|N|+m)\cdot (\log_2 m +O(1))$ bits.
To complete the proof we show that, by skipping one edge weight per nominator, we only need to encode at most $m$ weights, and thus only $m\cdot B_w$ bits are needed to encode weights: If $d_n$ is the degree (number of incident edges) of $n$ in $E_w$, then $\sum_{n} d_n=|E_w|<|N|+m$, as mentioned at the beginning of this section. Therefore, the number of encoded weights is $\sum_{n} (d_n - 1)=|E_w| - |N| < m$.
$\square$
Notice that the size of this encoding is $O(|N|\cdot \log m)$, i.e. linear in the number of nominators, and in practical terms it takes between 2 and 3 kilobytes for every one thousand nominators, which makes it a reasonable size for a transaction. Notice also that parameters $B_V$ and $B_w$ have low impact on the encoding size, as they don't appear in the leading term, and that parameter $k$ (the number of nominations per nominator) has *no impact* on the encoding size.
**Note about encoding and rounding edge weights.** We suggest expressing each edge weight as a fraction of the nominator's budget, instead of its absolute value in dots, for two reasons. First, it makes it easier to check for feasibility on-chain, as we only need to check that the sum of the encoded edge weights is at most one for each nominator, without having to query the nominators' budgets. Second, it is convenient for rounding: we suggest rounding each edge weight to a multiple of $1/2^{16}$ of the nominator's budget. This way, we only need two bytes per edge weight (i.e. parameter $B_w$ above equals 16 bits), and the values of the validator supports change minimally. We present next an analysis of the change due to such rounding.
**Lemma.** If a solution is balanced (meaning that the variance across the validator supports is minimized, by running an algorithm such as star-balancing; see [paper](https://github.com/w3f/research-internal/blob/master/papers/NPoS_validator_selection.pdf)), and each edge weight is rounded down to a multiple of $1/2^{16}$ of the corresponding nominator's budget, then each validator support loses at most a fraction $m/2^{16}$ of its original value. In particular, if $m\leq 650$, the loss is less than $1\%$.
*Proof.* If a solution $(S,w)$ is balanced, then the graph induced by its edge support $E_w$ has the property that in each of its [connected components](https://en.wikipedia.org/wiki/Component_(graph_theory)), all validators have the same support. To see why, notice that otherwise there must be a nominator that gives non-zero backings to two validators with different supports, so the star-balancing algorithm centered at that nominator could rebalance the weights to reduce the difference in supports, proving that the solution wasn't fully balanced.
Now let $D$ be a connected component. By the above claim, each validator $v$ in it receives a support of $supp_w(v)=\frac{1}{|S\cap D|}\sum_{n\in N\cap D} b_n$. If validator $v$ loses an amount at most $b_n/2^{16}$ of support from each nominator $n$ in that connected component due to rounding, then it loses at most $\sum_{n\in N\cap D} b_n / 2^{16} = |S\cap D|\cdot supp_w(v)/2^{16}\leq (m/2^{16})\cdot supp_w(v)$. This completes the proof.
$\square$
## How to submit a solution
As mentioned before, we make each validator submit a tentative solution, and allow anybody else to do it as well. For a publicly available instance of the validator election problem, defined by a graph $(N\cup V, E)$, a vector $b$ of nominator budgets and a committee target size $m$, the steps required to find and submit a solution are as follows:
1. Find a valid solution $(S,w)$. A submitter can do so by running one of our suggested algorithms (see [NPoS paper](https://github.com/w3f/research-internal/blob/master/papers/NPoS_validator_selection.pdf)), or possibly an algorithm of their own. Preferably, they should run an algorithm that guarantees a constant-factor approximation for the maximin support objective (we propose two such algorithms). Notice that the seqPhragmen method does not provide this guarantee, but is the fastest, with a runtime of $O(|E|\cdot m)$.
2. (Optional) Make the solution *balanced*, which means that the variance across the validator supports is minimized (see [paper](https://github.com/w3f/research-internal/blob/master/papers/NPoS_validator_selection.pdf)), and which is achieved by running some rounds of the star-balancing algorithm on it. (This step is recommended after seqPhragmen but unnecessary for our other algorithms, which already guarantee the output solution to be balanced.) This will improve the *score* of the solution (we explain scores in the next section).
3. Reduce the solution, in time $O(|E|\cdot m)$. This leaves the score unchanged.
4. (Optional) Run our *PJR-enabler* as a post-computation, described below, in time $O(|E|\cdot m\cdot \log m)$. The PJR-enabler takes the current reduced solution $(S,w)$, tries swapping some candidates, and eventually outputs a new reduced valid solution $(S',w')$ that a) is guaranteed to satisfy the property of proportional justified representation (PJR), and b) has a score at least as good as the input solution $(S,w)$. We remark that the seqPhragmen algorithm already guarantees that its output satisfies PJR, but performing this step is still a good idea as it is likely to improve the score considerably.
5. Compute the score of the solution.
6. Encode the solution, and submit the encoded solution, together with its score.
The complexity of the whole operation is thus likely to be dominated by the first step, i.e. running a solution-finding algorithm.
We now explain the details of the PJR-enabler; we refer the reader first to [our NPoS paper](https://github.com/w3f/research-internal/blob/master/papers/NPoS_validator_selection.pdf), as we rely on it heavily in what follows.
In the algorithm below, the input is a valid and reduced solution $(S,w)$, and a small constant $\varepsilon>0$. It corresponds precisely to $PJRenabler$ (Algorithm 7 in the PDF), with the only difference that in Step f we ensure that the solution remains reduced after each iteration.
**Algorithm.** $EnablePJR(S,w, \varepsilon)$
1. Let $\hat{t}\leftarrow \sum_{n\in N} b_n/m$;
2. While True:
a. (Find candidate with least support) Find a tuple $(v_{\min}, t_{\min})$ so that $t_{\min}=supp_w(v_{\min})=\min_{v'\in S}supp_w(v')$;
b. (Remove it from solution) Define partial solution $(\bar{S}, \bar{w})$ where $\bar{S}\leftarrow S\setminus\{v_{\min}\}$, $\bar{w}\leftarrow w$, and $\bar{w}_{nv_{\min}}\leftarrow 0$ for each $n\in N$ with $nv_{\min}\in E$;
c. (Find best replacement) Let $(c_{\max}, t_{\max})\leftarrow MaxScore(\bar{S}, \bar{w})$;
d. (Stopping condition) If $t_{\max} < \min\{(1+\varepsilon)t_{\min}, \hat{t}\}$, return solution $(S,w)$;
e. Update $(S,w)\leftarrow Insert(\bar{S}, \bar{w}, c_{\max}, t_{\max})$;
f. Reduce $(S,w)$.
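As a structural Python sketch of this loop: here $MaxScore$ and $Insert$ from the PDF are assumed to be available as callables rather than implemented, `reduce_solution` is the sketch from the Reduce section, $S$ is a set of validators, and `support` recomputes a validator's support from the weight vector.

```python
def support(w, v):
    """Support of validator v under weight vector w."""
    return sum(x for (_, u), x in w.items() if u == v)

def enable_pjr(S, w, budgets, m, eps, max_score, insert):
    t_hat = sum(budgets.values()) / m                       # step 1
    while True:                                             # step 2
        v_min = min(S, key=lambda v: support(w, v))         # step a
        t_min = support(w, v_min)
        S_bar = S - {v_min}                                 # step b
        w_bar = {e: x for e, x in w.items() if e[1] != v_min}
        c_max, t_max = max_score(S_bar, w_bar)              # step c
        if t_max < min((1 + eps) * t_min, t_hat):           # step d
            return S, w
        S, w = insert(S_bar, w_bar, c_max, t_max)           # step e
        w = reduce_solution(w)                              # step f
```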
**Lemma:** The algorithm above has the following properties.
1. If the input solution provides an $\alpha$-approximation for the maximin support problem, for some parameter $\alpha\geq 1$, then the algorithm executes in time $O(|E|\cdot m \cdot \log m\cdot (1+\varepsilon^{-1} \cdot \log \alpha))$. In particular, by making $\varepsilon\rightarrow \infty$, the running time goes down to $O(|E|\cdot m \cdot \log m)$, regardless of the approximation factor $\alpha$ of the input solution.
2. The output solution has a minimum support at least as high as the minimum support of the input, hence it maintains the $\alpha$-factor approximation for the maximin support problem.
3. The output solution satisfies PJR (regardless of the value of $\varepsilon$), as well as the stronger property of being "$\varepsilon$-close to locally optimal" (see the PDF).
4. At the end of each iteration, the solution is of size $m$ and reduced, and thus the same is true for the output solution.
**Proof.** Points 2 and 3 follow from the proof of Theorem 13 in the PDF, and Point 4 easily follows by inspection. Hence, we focus on Point 1. It is proven in the PDF that the claimed running time holds when we remove the Reduce operation (step f). Hence, it is sufficient to prove that the aggregate running time of the Reduce operations throughout the iterations is $O(|E|\cdot m\cdot (1+\varepsilon^{-1} \cdot \log \alpha))$, since this shows that the overhead caused by these operations is dominated in complexity by the rest of the algorithm.
We assume here that we keep the data structure built for the Reduce operation from one iteration to the next, i.e. at the start of each iteration we have a solution whose edge support is cycle-free, together with a hash map that represents a collection of rooted trees over the graph induced by it. If we remove candidate $u$ and add candidate $v$ in that iteration, and these candidates have degrees (number of nominators supporting them) $d_u$ and $d_v$ respectively, then we can update this hash map in time $O(d_u+m\cdot d_v)$, because we can remove each edge in $O(1)$ time and add each edge in $O(m)$ time, where for each added edge we may need to remove a circulation over a cycle of size at most $2m$. Next, for each candidate $v\in V$ let $c_v$ count the number of times $v$ is added to the solution in some iteration throughout the algorithm, meaning that it is also removed in some other iteration at least $c_v-1$ times and at most $c_v+1$ times. Then, the aggregate running time of all the Reduce operations is
$$\sum_{v\in V}c_v \cdot O(m\cdot d_v) = O(m \cdot \sum_{v\in V} d_v \cdot \max_{v\in V} c_v) = O(|E|\cdot m\cdot \max_{v\in V} c_v).$$
To complete the proof, notice that it suffices to show that for any candidate $v\in V$, $c_v\leq 1+ \log_{1+\varepsilon}\alpha = O(1+\varepsilon^{-1} \cdot \log \alpha)$. To this end, we claim that if $v$ is added to the solution in the $i$-th iteration, and then removed in the $j$-th iteration for some $j>i$, then the value of the least support in the current solution, $supp_w(S)$, increases by at least a multiplicative factor $(1+\varepsilon)$ between the $i$-th and the $j$-th iteration. Indeed, if the state of the current solution at the start of the $i$-th iteration is $(S^i,w^i)$, we know from the stopping condition that $v$ is added with a support of at least $(1+\varepsilon)\cdot supp_{w^i}(S^i)$, and we also know that this support does not decrease below that threshold in the future iterations. If $v$ is then removed at the $j$-th iteration, it must be the validator with least support at the time, thus $supp_{w^j}(S^j)=supp_{w^j}(v)\geq (1+\varepsilon)\, supp_{w^i}(S^i)$.
We now apply the last claim as follows. If candidate $v\in V$ is added to the solution $c_v$ times throughout the iterations of the algorithm, then it is also removed at least $c_v -1$ times, and this implies that the support of the input solution is multiplied by a factor at least $(1+\varepsilon)^{c_v - 1}$. However, since the input solution had a factor-$\alpha$ approximation guarantee for the maximin support objective, it must hold that $(1+\varepsilon)^{c_v - 1} \leq \alpha$, and consequently $c_v\leq 1+ \log_{1+\varepsilon} \alpha$. This completes the proof.
$\square$
## The score function used to rank tentative solutions
We evaluate a solution $(S,w)$ by the following three objectives, in lexicographical order:
1. its minimum validator support, $supp_w(S):=\min_{v\in S} supp_w(v)$ (the larger the better),
2. its sum of validator supports, $\sum_{v\in S} supp_w(v)$ (the larger the better), and
3. its sum of validator supports squared, $\sum_{v\in S} (supp_w(v))^2$ (the smaller the better).
The first and most important objective corresponds to maximin support, ensuring that the *minimum* validator support is high. Notice that our algorithms provide approximation guarantees for it. Subordinate to that, the second objective ensures that the *sum* of validator supports is high. This value should be equal to the sum of budgets of the represented nominators (nominators with at least one supported candidate in the committee), if the solution uses all the available budget, and thus the value depends only on the elected committee and not on the edge weights. Finally, subordinate to both previous objectives, the third one ensures that the *variance* in the distribution of validator supports is low. Notice that the star-balancing algorithm minimizes this third objective for the elected committee. Altogether, these objectives ensure that the selected solution is as secure and decentralized as possible.
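A minimal sketch of this score and its lexicographic comparison, using plain Python numbers (on-chain these are fixed-width integers; see the remark below about overflow):

```python
def score(supports):
    """The three score components, from a list of validator supports."""
    return (min(supports),                    # 1. minimum support: maximize
            sum(supports),                    # 2. sum of supports: maximize
            sum(s * s for s in supports))     # 3. sum of squares: minimize

def is_better(new, old):
    """Lexicographic comparison of two score triples."""
    if new[0] != old[0]:
        return new[0] > old[0]
    if new[1] != old[1]:
        return new[1] > old[1]
    return new[2] < old[2]
```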
**Remark.** Check out [this note](https://hackmd.io/GjZrMV5nQpGNVCXbAnlMPQ?both) to handle the possible problem that the third component of the score does not fit inside a u128 variable.
**Notes on incentives and anti-spamming.** We probably don't need to offer rewards to agents that submit solutions, or slash validators that don't, though we could contemplate these possibilities in the future. This is because we can expect anyone who's a candidate for the next era to be motivated to find and submit a high quality solution where they get elected. In particular, if $S$ is the committee that our suggested algorithm finds, then we can expect most candidates in $S$ to try to submit this solution. Thus, assuming the network has a reasonable level of censorship resistance, we'll end up with a solution at least as good as $S$ in terms of score.
To avoid spamming with submissions, we propose three mechanisms:
a) Any submitted solution must begin with a *claim of its score*, and if this value isn't higher than the score of the tentative solution kept on-chain (the best one seen so far), the submission is discarded. Ideally, this check happens in the ingress queue, so that a discarded submission causes zero on-chain overhead, and the transaction fee is not charged to the submitter.
b) Non-validators pay a heavy transaction fee for each submission.
c) Validators don't pay a transaction fee, and may potentially submit more than one solution in the same submission period. But, if the claimed score of a solution is checked to be incorrect, we ignore any further submissions of the same author.
**Minimizing submissions.** To minimize the number of submissions, and to reward early submissions, we also suggest that we only accept a new solution on-chain if it is *considerably better* than the current best solution (see the sketch after this list). In particular, for some constant $\varepsilon>0$, say $\varepsilon=0.05$, we accept a new solution $(S',w')$ as a replacement to the current solution $(S,w)$ only if
- $supp_{w'}(S')\geq (1+\varepsilon)\cdot supp_w(S)$, or
- $supp_{w'}(S')\geq supp_w(S)$ and $\sum_{v\in S'} supp_{w'}(v)\geq (1+\varepsilon)\cdot \sum_{v\in S} supp_w(v)$, or
- $supp_{w'}(S')\geq supp_w(S)$ and $\sum_{v\in S'} supp_{w'}(v)\geq \sum_{v\in S} supp_w(v)$ and $\sum_{v\in S'} supp_{w'}^2(v)\leq (1-\varepsilon) \cdot \sum_{v\in S} supp_w^2(v)$.
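A minimal sketch of this acceptance rule, with each score given as the (minimum support, sum of supports, sum of squares) triple from the previous section:

```python
def considerably_better(new, old, eps=0.05):
    """Accept `new` only if it beats `old` by a factor eps in some component,
    without being worse in the more significant ones."""
    x_new, y_new, z_new = new
    x_old, y_old, z_old = old
    if x_new >= (1 + eps) * x_old:
        return True
    if x_new >= x_old and y_new >= (1 + eps) * y_old:
        return True
    return x_new >= x_old and y_new >= y_old and z_new <= (1 - eps) * z_old
```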
**Note.** Instead of the current three-component score, we could use a two-component score, consisting of
a) $\min_v supp_w(v)$, to be maximized, and
b) $[\sum_v supp_w(v)]^2 - \sum_v [supp_w(v)]^2$, also to be maximized.
It's not clear to me that this is preferable to the three-component score in any significant way, but it can be proved to be *equivalent*. I can add such a proof if there's interest.
## Processing submissions efficiently
On the on-chain side, submissions are processed as follows (we organized the required checks from fastest to slowest). Suppose we have a current tentative solution $(S,w)$ with score $(x,y,z)$, and receive a submission with a new solution $(S',w')$ and claimed score $(x',y',z')$:
First, we discard any submission that is not given in the correct format, i.e. a claimed score + a reduced encoded solution. In particular, remember that such a solution has a specific bound on its bit length (see the section on encoding), so any submission that is too long can be discarded immediately. Next, we discard the submission unless its claimed score is *considerably better* (see the section on scores) than the score of the current solution. Ideally, these two checks are done off-chain by each validator in the validation phase, i.e. as soon as a submission arrives in the ingress queue.
Then, for each nominator $n$, compute the missing edge weight, as a fraction of $n$'s budget, by subtracting from one the weights of the other adjacent edges. If this computation gives a negative number, discard the solution.
Then, check for feasibility, by checking that each used nominator, validator, and edge, is part of the input problem. This step takes $O(|E_{w'}|)=O(|N|)$ lookups in memory, where we recall that $|E_{w'}|$ is the number of edges with non-zero weight in the solution, which is linear in the number of nominators (regardless of the number of nominations per nominator) because the solution was reduced.
(Possibly at the same time as the last step.) Retrieve each nominator's budget, convert each edge weight into a value in dots by multiplying it with the nominator's budget, and compute each validator's support. This step also takes time $O(|E_{w'}|)=O(|N|)$. Then, compute the actual score of the solution, and if it differs (considerably) from the claimed score, discard the solution and ignore any new submissions by the same author.
(Optional) Furthermore, we only accept the solution if it satisfies the PJR property. We run the *PJR-checker*, specified in [Section 5.1 of the PDF](https://github.com/w3f/research-internal/blob/master/papers/NPoS_validator_selection.pdf). This check runs in time $O(|E|)$; in particular, it needs to read information about nominators, candidates and edges that are part of the input but not part of the solution, so this is the slowest step. If the *PJR-enabler* has been run on the solution by the submitter, then the solution is guaranteed to pass the PJR-checker.
If the new solution $(S',w')$ passes all these steps, then we keep it and discard the previous solution $(S,w)$.
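Putting the steps together, here is a hypothetical Python sketch of this processing order. The `decode` function (the inverse of the encoding sketch), the shape of `instance`, and all names are assumptions; `score` and `considerably_better` are the sketches from the previous sections.

```python
from collections import defaultdict

def process_submission(raw, claimed, current_score, instance, max_len):
    if len(raw) > max_len:
        return None                          # wrong format: encoding too long
    if not considerably_better(claimed, current_score):
        return None                          # claimed score not good enough
    committee, assignments = decode(raw)     # assignments: n -> (validators, fractions)
    supports = defaultdict(float)
    for n, (vals, fracs) in assignments.items():
        last = 1.0 - sum(fracs)              # the one weight skipped in the encoding
        if last < 0:
            return None                      # weights exceed the nominator's budget
        for v, f in zip(vals, fracs + [last]):
            if (n, v) not in instance.edges: # feasibility: edge must be in the input
                return None
            supports[v] += f * instance.budgets[n]   # convert fraction to dots
    actual = score(list(supports.values()))
    if actual != claimed:                    # in practice, compare with a tolerance,
        return None                          # and ignore this author's future submissions
    return committee, supports, actual
```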
**Note.** Other ideas to improve the speed of the on-chain computation are:
* We let each block producer include at most one submission per block, to avoid block stuffing. The submission period should be long enough so that each validator gets to produce at least one block with high probability, and thus gets a chance to include their own submission.
* As we saw, the bit length of our encoded solutions is linear in the number of nominators (and independent of the number of nominations per nominator), and takes a couple of kilobytes for every thousand nominators. In the future, we can add a minimum stake requirement to become a nominator, to ensure that this number stays bounded. We can also add liquid democracy, so that second-order nominators with low stake can delegate their votes to first-order nominators.
* If after the end of the submission period there are no good submissions, we execute Phragmén's algorithm on-chain. As this algorithm elects one validator at a time throughout $m$ rounds, each taking time $O(|E|)$, we can execute one round per block, so that each block performs a task in time $O(|E|)$.
## Ideas (by Kian)
- When parsing a compact solution on-chain, very small edges can be omitted. This can also help with rewards.
- The score comparison can be moved to the validation phase, because it is dead cheap to test and it can prevent someone from paying a hefty fee for a solution which is valid but not good enough.
- We can have more than one phase: a nomination period, a computation period and a submission period. For now, just one would work.