owned this note
owned this note
Published
Linked with GitHub
*migrated to http://research.web3.foundation/en/latest/polkadot/NPoS/3.%20The%20maximin%20support%20problem/*
# NPoS: The maximin support problem
In NPoS we need to select a committee of validators of size $m$, and assign to them the nominators' stake, according to some objective. In this note we consider the objective of maximizing the minimum stake support perceived by any elected validator. We call this the maximin support problem.
In Section 1 we define the problem formally, and remark that it is NP-hard. In Section 2 we present a $(2+\varepsilon)$-factor approximation algorithm for it. Finally, in Section 3 we establish a property of the maximin support objective which, informally speaking, says that in an optimal committee no pool of nominators is over-represented, thus motivating the choice of this objective for NPoS.
We remark that this problem has been recently studied by [Sánchez-Fernández et al. (2018)](https://arxiv.org/abs/1609.05370), who proposed a greedy algorithm for it. We take ideas from that paper but propose a different, more efficient algorithm.
## 1. Definitions and notation
An instance of the maximin support problem is given by a bipartite graph $G=(N\cup V, E)$ of nominator-validator relations, a vector of nominator budgets $b\in \mathbb{R}_{\geq 0}^{N}$, and the target number $m$ of validators to be elected.
A feasible solution consists of a pair $(S,w)$, where $S\subseteq V$ is a committee of size $m$, and $w\in\mathbb{R}^E_{\geq 0}$ is a vector of edge weights. Vector $w$ must be "affordable", meaning that it observes the budget constraints $\sum_{v\in V: \ nv\in E} w_{nv}\leq b_n$ for each $n\in N$. In this note, we do not require solutions to be maximally affordable. For each validator $v$ we define its _support_ relative to $w$ as $supp_w(v):= \sum_{n\in N: \ nv \in E} w_{nv}$, and extend this definition to any set $V'\subseteq V$ by $supp_w(V'):=\min_{v\in V'} supp_w(s)$.
The objective of the maximin support problem is to find a feasible solution $(S,w)$ that maximizes $supp_w(S)$. It was observed by [Sánchez-Fernández et al. (2018)](https://arxiv.org/abs/1609.05370) that this optimization problem is NP-hard.
## 2. The algorithm
For a committee $S\subseteq V$ of $m$ validators, we say that an affordable edge weight vector $w\in\mathbb{R}_{\geq 0}^E$ is optimal for $S$ if it maximizes $supp_w(S)$ over all affordable vectors. Given a committee $S$, an optimal vector $w$ for it can be computed in polynomial time. In particular, in [our note on the min-norm max-flow problem](https://hackmd.io/r_arDp1LRRWtu8t5nGZo_g), we provide two algorithms to compute this vector, which respectively run in time $O(|E|m+m^3)$ and $\tilde{O}(|E|m^2)$ (ignoring logarithmic terms in the latter expression). Let $F$ be the running time required to compute this vector, depending on the algorithm used.
Fix a constant $\varepsilon>0$. Our proposed algorithm computes the optimal weight vectors of at most $O(|V|\log(\varepsilon^{-1}\log m))$ committees, and returns the best one found. Therefore, the running time is clearly
$$O(F|V|\log(\varepsilon^{-1}\log m)),$$
and we will prove that its output $(S,w)$ has an objective value $supp_w(S)$ that is at a multiplicative factor at most $(2+\varepsilon)$ away from optimal.
The algorithm is performed using two nested loops:
__Guessing the objective value__ (external loop): For a target support value $d$, our algorithm tries to construct an affordable solution $(S,w)$ with $supp_w(S)\geq d$, and declares either success or failure. We respectively raise or decrease the value of $d$, and try again. We perform binary search on $d$, and output the successfully built solution of highest support. (We provide the precise parameters of this binary search later on.)
__Building the set__ (core execution): Assume a target support value $d$ is fixed, and fix an arbitrary order over the candidate set $V$. We execute a lazy algorithm that inspects each validator in $V$ once and decides on the spot whether to include it in the solution or discard it. Namely, we start with the empty set $S$, and then for each $v\in V$ we check whether the set $S\cup\{v\}$ has an affordable weight vector $w$ such that $supp_w(S\cup\{v\})\geq d$. Recall that this check takes time $F$. If it does, we update $S$ to $S\cup\{v\}$. As soon as $|S|=m$ we stop, declare success, and store the found solution. Else, if after traversing the full validator set $V$ we still have that $|S|<m$, we declare failure.
### Analysis of approximation factor
__Theorem__: _For any constant $\varepsilon>0$, our algorithm provides an approximation ratio of $2+\varepsilon$._
Fix a constant $\varepsilon>0$, and assume that the external loop of our algorithm is tuned so that the binary search over the support stops when the ratio between the lowest support $d$ returning failure and the highest support $d$ returning success is at most $(1+\varepsilon/2)$.
__Lemma 1:__ _If $d^*$ is the optimal support value of the given instance, then whenever our algorithm tries to build a solution with a target support value $d$ observing $d\leq d^*/2$, it is guaranteed to succeed._
Notice that Lemma 1 and the previous stopping rule immediately imply the main theorem.
__Lemma 2:__ _If an instance $(N\cup V, E, b, m)$ of NPoS has two affordable solutions $(S, w)$ and $(S^*, w^*)$ with $|S_1|<|S_2|$, then there is a validator $v\in S_2\setminus S_1$_ and an affordable solution $(S\cup \{v\}, w')$ with $supp_{w'}(S\cup\{v\})\geq \min\{supp_{w}(S), \frac{1}{2}supp_{w^*}(S^*)\}$.
Before we prove Lemma 2, let's prove that it implies Lemma 1.
_Proof of Lemma 1._ Let $(S^*, w^*)$ be the optimal solution with optimal support $supp_{w^*}(S^*)=d^*$. Assume by contradiction that for some target support value $d$ with $d\leq d^*/2$, our algorithm fails. Thus, after traversing the whole validator set, the algorithm ends up with an affordable solution $(S,w)$ with $supp_w(S)\geq d$ and $|S|<m$. By Lemma 2, there is a validator $v\in S^*\setminus S$ such that $S\cup \{v\}$ has an affordable weight vector $w'$ that provides support of at least $d$. Notice as well that *any subset* of $S\cup \{v\}$ also has an affordable weight vector that provides support at least $d$ - namely $w'$. But this means that at whichever point that our algorithm inspected validator $v$, it should have included it in the then-current solution, which was a subset of $S$, and thus $v$ should be in $S$. We reach a contradiction.
$\square$
_Proof of Lemma 2._ Consider solutions $(S, w)$ and $(S^*, w^*)$ as in the statement of the lemma, with respective supports $d$ and $d^*$. By replacing $d$ with $\min\{d,d^*/2\}$, we can assume without loss of generality that $d\leq d^*/2$. We can also assume that these solutions give each elected validator the exact same support, respectively $d$ and $d^*$ (in general, vectors $w$ and $w^*$ are not maximally affordable). Consider vector $f:=w^*-w\in \mathbb{R}^E$, which we see as a vector of edge flows over the network induced by $N\cup S \cup S^*$. We partition the nodes into four groups: relative to $f$, we have that
* $N$ has a net excess of $|S^*|\cdot d^* - |S|\cdot d$,
* $S\setminus S^*$ has a net excess of $|S \setminus S^*|\cdot d$,
* $S^*\setminus S$ has a net demand of $|S^* \setminus S|\cdot d^*$, and
* $S\cap S^*$ has a net demand of $|S\cap S^*|\cdot (d^*-d)$.
Now, using the flow decomposition theorem, we can decompose flow $f$ into simple path flows, where each path starts in one of the groups with net excess and ends in one of the groups with net demand. If we eliminate all paths starting from $S\setminus S^*$, we obtain a subflow $f'$, relative to which
* $N$ has a net excess of $|S^*|\cdot d^* - |S|\cdot d$, and
* $S\cap S^*$ has a net demand of at most $|S\cap S^*|\cdot (d^*-d)$.
Therefore, the net demand of $S^*\setminus S$ relative to $f'$ is at least
\begin{align}
|S^*|\cdot d^* - |S|\cdot d - |S\cap S^*|\cdot (d^*-d)
&= |S^*\setminus S|\cdot d^* - |S\setminus S^*|\cdot d \\
&> |S^*\setminus S|\cdot (d^* - d) \\
&\geq |S^* \setminus S|\cdot d,
\end{align}
where the inequalities follow respectively from $|S|< |S^*|$ and $d\leq d^*/2$. By an averaging argument, this implies that there is a validator $v\in S^*\setminus S$ with a demand relative to $f'$ of at least $d$.
To conclude we notice that, from the fact that $f'$ is a subflow of flow $f=w^*-w$, it follows that weight vector $w':=w+f'$ is non-negative and affordable. Moreover, $w'$ does not decrease the support of any validator in $S$ compared to $w$, and gives a support of at least $d$ to $v$. Hence, $supp_{w'}(S\cup \{v\})\geq d$, which is what we needed to show.
$\square$
### Analysis of running time
In this section, we give further details on the implementation of the algorithm, and analyze its running time. Recall that the algorithm guesses a target support value $d$ in the external loop and then tries to build a set of that value. Clearly, for a fixed value $d$, the core execution computes optimal weight vectors for at most $|V|$ sets. Moreover, we show below that only $O(\log(\epsilon^{-1}\log m))$ guesses of $d$ are needed, and thus the total running time of the algorithm is
$$O(F|V|\log(\varepsilon^{-1}\log m)),$$
where $F$ is the time it takes to compute the optimal weight vector of a given committee.
We start by setting upper and lower bounds for the optimal support value $d^*$. For any validator $v\in V$ we define its _potential support_ as $p_v:=\sum_{n\in N: \ nv\in E} b_n$, and notice that $p_v\geq supp_w(v)$ holds for any affordable vector $w\in\mathbb{R}_{\geq 0}^E$. As a pre-computation, we find the potential supports of all validators in $V$, and identify the _$m$-th highest potential support_, which we denote by $p_m$. It easily follows that $d^*\leq p_m$. On the other hand, we also have that $d^*\geq p_m/m$, as we show now: Consider the $m$-solution $(S,w)$ where $S$ contains the $m$ validators with highest potential support, and $w\in\mathbb{R}_{\geq 0}^E$ is defined by $w_{nv}=b_n/m$ for each $nv\in E\cap(N\times S)$. Vector $w$ provides a support of $p_v/m$ to each $v\in S$, so $supp_w(S)\geq p_m/m$. Moreover, each nominator $n$ provides a support of $b_n/m$ to no more than $m$ validators in $S$, so $w$ is affordable. Therefore, $p_m/m \leq d^* \leq p_m.$
Recall that we need to estimate, within a multiplicative factor of $(1+\varepsilon/2)$, a threshold value $d'$ where our algorithm switches response from sucess to failure. Value $d'$ must be in the range $[d^*/2, d^*]$, which means that it is safe to search in the range $[p_m/(2m), p_m]$. The ratio between upper and lower bounds of our search range thus starts at $2m$, and can be square-rooted at each iteration of the outer loop, by always selecting the geometric mean of the bounds. Hence in order to reach a ratio below $(1+\varepsilon/2)$, only $O(\log(\varepsilon^{-1}\log m))$ iterations are needed.
We remark that the running time of the mentioned pre-computations will be dominated by that of the main algorithm.
## 3. Properties of the maximin support objective
Intuitively, we want to select a set of validators that is suitably representative. In the worst case, it can't be that representative. Suppose that there are $100m$ nominators and they each nominate a different validator candidate with $1/100m$ of the total stake. Then whichever set of candidates are elected, it can only be backed by $1/100$ of the total stake.
In the field of election theory for multiwinner elections, a common concern is ensuring that no set of voters is underrepresented. An axiom like Proportional Justified Representation (see our note on the sequential Phragmén method) says that if a set of voters is coordinated and they vote for enough of the same candidates, they are guaranteed a minimum amount of representation in the winning committee. However, for securing blockchains with proof-of-stake, we consider that some stake may be owned by adversarial entities and want to limit their power to attack the system if their stake is small. Thus for us, it is more important that no set of voters is overrepresented.
To attack the system, the adversarial nominators could use their stake to elect a subset of adversarial candidates $A$, who we can assume that no honest nominator would trust with their stake. So we want to avoid having a subset $A$ of the elected validators such that the total stake of the nominators backing $A$ is small compared to $|A|$. In Polkadot, validators observe multiple roles, and an adversary needs to control different numbers of validators to perform different attacks, with the attacks that need more validators being worse. Hence, we do not want to maximise the stake needed to get a set $A$ of a single size elected. Since we'd expect the number of votes needed to scale with the size of $A$, we are interested in the objective
$$a :=\max_{S\subseteq V: \ |S|=m} \min_{A \subseteq S, \ A\neq \emptyset} (\sum_{n:\ \exists v \in A, \ nv \in E} b_n)/|A|$$
However, it turns out that this objective is the same as the one above:
__Lemma 3.1:__ The maximin support objective is equal to the maximum $a$ above.
This means that a multiplicative approximation of this objective ensures that a set of voters cannot be much more overrepresented than some set of voters in the set of elected candidates given by another rule.
We are also interested in rewarding nominators for securing the system and punishing those who back validators who behave badly. We want to slash nominators who back malicious validators in extreme circumstances, and it is important that this costs enough stake. Note that the above characterisation of overrepresentation is enough for this, even if we do not uniquely assign stake to validators. However with uniquely assigned stake, we can slash only the nominator's stake that is used uniquely to back badly behaving validators and still get similar guarantees, but now the relationship is clearer. We can reward nominators based on their assigned stake as this is a good indication of their contribution to the system. Nominators who back less popular validators are more likely to have all their stake assigned and so would be paid more.
_Proof of Lemma 3.1:_ Let $(S,w)$ be a solution that maximises $d=supp_w(S)$ over all solutions. Then, for any subset $A \subseteq S$,
\begin{align*}
\sum_{n: \ \exists v \in A, \ nv \in E} b_n
& \geq \sum_{n\in N} \sum_{v\in A: \ \ nv \in E} w_{nv} \\
&=\sum_{v\in A} \sum_{n\in N: \ nv\in E} w_{nv} = \sum_{v \in A} supp_w(v) \\
& \geq \sum_{v\in A} d = |A| d,
\end{align*}
and so $a \geq d$. To show $a \leq d$, we need to show that there is a subset for which this is tight.
We call a validator $v$ loose in $(S,w)$ if there is an affordable solution $w'$ with $supp_{w'}(S) \geq supp_w(S)$ and $supp_{w'}(v) > supp_w(v)$, and tight otherwise. Since a convex combination of affordable weight vectors is also affordable, there exists a $w'$ such that all loose $v$ have $supp_{w'}(v) > supp_w(v)$, i.e. they are loose simultaneously. Thus, we can assume without loss of generality that $(S,w)$ is a solution of maximal support $d=supp_w(S)$, where $supp_w(v)>d$ for each loose validator.
Since $(S,w)$ maximises $supp_w(S)$, there must be at least one tight validator. Let $A$ be the set of tight validators. We claim that for each nominator $n$ that supports at least one tight validator, we have $\sum_{v \in A} w_{nv} = b_n$. Suppose for a contradiction that this is not the case for some nominator $n$ supporting a tight validator $v'$; then either $\sum_{v\in S} w_{nv} < b_n$ or there is a $v'' \in S \setminus A$ with $w_{nv''} > 0$. In the former case, vector $w'$ with $w'_{nv'}=w_{nv'}+ (b_n - \sum_{v\in S} w_{nv})$ and all other coordinates identical to $w$, would be an affordable solution with $supp_{w'}(v') > supp_{w}(v) = s$, which shows that $v'$ is loose, a contradiction. In the latter case, since $v''$ is loose for $(S,w)$, we have that $supp_w(v'')>s$. If $n$ shifts some load from $v''$ into $v'$, we obtain a new affordable weight vector for which both $v'$ and $v''$ are loose, again a contradiction. This completes the proof of the claim.
Now for this set $A$ of tight validators the inequalities above are tight i.e.
\begin{align*}
\sum_{n: \ \exists v \in A, \ nv \in E} b_n
& = \sum_{n\in N} \sum_{v\in A: \ \ nv \in E} w_{nv} \\
&=\sum_{v\in A} \sum_{n\in N: \ nv\in E} w_{nv} = \sum_{v \in A} supp_w(v) \\
& = \sum_{v\in A} d = |A| d.
\end{align*}
This completes the proof that $a=d$.
$\square$