Cluster Discrimination Analysis

# Cluster Discrimination Analysis In this post we analyse the benefit a cluster will get and how a rational cluster can decrease it's costs by sending the messages to few receivers. ## Assumptions * Clusters are rational actors * All clusters will act similarly * Each cluster will try to send the messages to only a specified percentage of the receivers * Clusters doesn't collude with receivers * Receiver set is large enough compared to set size picked to ensure that probability of finding a malicious cluster is always same ## Definitions * Let $R$ be the total number of receivers in Marlin network. * Let $R_r$ be the total number of receivers who have received the chunks from the cluster. * Let $R_d$ be the total number of receivers who haven't received the chunks from the cluster. * Let $S$ be the set of receivers picked for the dispute resolution in case a dispute is raised by a receiver that hasn't received chunks. * Let $S_d$ be the receivers who haven't received the message and also are in the dispute resolution voting set * Let $S_t$ be the target number of receivers required to slash the cluster as part of the dispute resolution voting * Let $P(x)$ be the probability that x of the receivers picked for dispute resolution voting are from $S_d$ * Let $P(x+)$ be the probability that atleast x of the receivers picked for dispute resolution voting are from $S_d$ * Let $PH(x, y)$ be the probability that exactly $y$ out of $x$ receivers are honest * Let $PH(x, y+)$ be the probability that atleast $y$ out of $x$ receivers are honest * Let p be the probability that any receiver is honest. ## Analysis For a cluster which doesn't send messages to a portion of the receivers. The probability that the cluster will be caught as part of the dispute resolution process i.e receivers who haven't received messages are more than the threshold votes required to slash. The probaility is calculated as follows $P(S_t+)=\sum_{i=S_t}^{S}P(i)*PH(i, S_t+)$ We now calculate the probability that exactly x receivers among the ones picked have received the messages. $P(x)=\frac{Ways\ to\ pick\ x \ receivers\ from R_d\ and\ S-x\ receivers\ from R_r}{Ways\ to\ pick\ S\ receivers\ from\ R}$ $= \frac{{Ways\ to\ pick\ x \ receivers\ from R_d}\ * {Ways\ to\ pick\ \ S-x\ receivers\ from R_r}}{Ways\ to\ pick\ S\ receivers\ from\ R}$ $=\frac{{R_d \choose x}*{R_r \choose S-x}}{R \choose S}$ // pick without repetition Then we calculate the probability of atleast $y$ out of $x$ receivers are honest $PH(x, y) = p^y*(1-p)^{x-y}$ $PH(x, y+)=\sum_{i=y}^{x}p^i*(1-p)^{x-i}$ Using the above result we calculate $P(S_t+)$ $P(S_t+)=\sum_{i=S_t}^{S}\frac{{R_d \choose i}*{R_r \choose S-i}}{R \choose S}*(\sum_{j=S_t}^{i}p^j*(1-p)^{i-j})$  ## Conclusion Assuming a total of $50000$ receivers among which the cluster sends the chunk to $80\%$ of the receivers. If the dispute resolution set size is around 10 receivers among which $50\%$ needs to vote for cluster to get slashed. Assuming 2/3rd of the receiver set is honest,\. The probaility of cluster getting caught comes out to be $0.044$. Which means the amount at stake should be higher than $1/0.044\approx23$ times the cost of dispute resolution process. This value shoots up to $1/0.00022\approx4545$ times the cost of dispute resolution process when the chunks are sent to $90\%$ of the receivers. Sending it to only $90\%$ of the receivers might enable the clusters to discard serving receivers in areas where the cost of infrastructure might be relatively higher. If n messages are sent within the time period of dispute resolution process. The total stake should be $23*n$ in case of 80% receivers getting the message. As the number of messages can be significant as the dispute resolution is an onchain process and will take time, the total stake required will be extremely high which renders this approach not very practical. The probaility of cluster getting caught doesn't change significantly with total number of receivers. ## Reference Code ```python from decimal import * def fac(n): pdt = 1 # iteration to avoid reaching max recursion depth for k in range(1, n+1): pdt = pdt * k return pdt def choose(n, k): return fac(n) / fac(k) / fac(n-k) def honProb(p, x, y): return sum([p**i*(1-p)**(x-i) for i in range(y, x+1)]) def probOne(w, n, k, s, p, ht): b = n - w return Decimal(choose(w, k)*choose(b, s-k))/choose(n, s)*honProb(p, k, ht) def probAll(w, n, s, p, ht): return sum([probOne(w, n, i, s, p, ht) for i in range(ht, s+1)]) # w: Receivers who haven't received messages # n: Total receivers # s: dispute res set size # p: Probability that receiver is honest # ht: Number of people who must have received among them # prob: Probability of cluster getting caught prob = probAll(w, n, s, p, ht) ```