# A cluster-based approach to relay networks

This post summarizes the cluster-based approach we are exploring for the Marlin architecture. It covers the approach itself, its assumptions, our analysis, the lessons we learnt, and the issues we faced.

## Overview

In this approach the network is divided into two parts:

* The Core network
* The Fringe network

### Core Network

The Core network is the part of the Marlin network that provides the SLA for sending blocks from producers to consumers. Core network relayers form entities called clusters. A cluster consists of a group of relayers and acts as a single entity. Any misbehaviour, or any failure to meet the SLA, by one relayer in a cluster affects all the relayers in that cluster. Each cluster is also expected to be available and able to provide the SLA to all receivers.

The producer divides the block into several chunks using erasure coding and sends them to designated clusters, picked pseudo-randomly using a deterministic function. For each chunk, a set of such clusters is offered to the producer as options, in order to encourage competition among the clusters. Each cluster is expected to deliver its chunks within the SLA to all the consumers in the network.

Receivers can raise a dispute if they did not receive enough chunks to reconstruct the block, or if certain clusters have not sent any messages. Producers submit an acknowledgment that they have sent messages to a sufficient number of clusters. A Schelling game is then started on whether a cluster sent its chunks to all the receivers: a small set of receivers is picked at random and asked to vote. If the producer or the cluster is found to be at fault, it is slashed.

Clusters, producers and consumers in the network stake LIN. This stake is used to dissuade malicious behaviour and, in the case of consumers, to prevent Sybil attacks.

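
The chunk-to-cluster assignment described in this section can be sketched as follows. This is only an illustration under assumptions: the post does not specify the deterministic function, so the SHA-256 ranking, the `seed` input (for example a value both sides already agree on), the function name `candidate_clusters`, and the `options` parameter are all hypothetical.

```python
import hashlib
from typing import List, Sequence


def candidate_clusters(seed: bytes, chunk_index: int,
                       cluster_ids: Sequence[str], options: int) -> List[str]:
    """Pick the clusters a producer may use for one chunk.

    Hypothetical scheme, not Marlin's actual function: every cluster is
    scored with SHA-256 over (seed, chunk index, cluster id) and the
    `options` lowest scores are returned. Anyone who knows the same
    inputs can recompute the result. Cluster ids are assumed unique.
    """
    if not 0 < options <= len(cluster_ids):
        raise ValueError("options must be between 1 and the number of clusters")

    def score(cluster_id: str) -> bytes:
        material = seed + chunk_index.to_bytes(4, "big") + cluster_id.encode("utf-8")
        return hashlib.sha256(material).digest()

    # Sorting by score makes the result independent of the input order.
    return sorted(cluster_ids, key=score)[:options]


if __name__ == "__main__":
    clusters = ["cluster-a", "cluster-b", "cluster-c", "cluster-d", "cluster-e"]
    for chunk in range(3):
        print(chunk, candidate_clusters(b"example-seed", chunk, clusters, options=2))
```

Returning more than one cluster per chunk mirrors the idea of giving the producer several options so that clusters compete. Because the ranking depends only on public inputs, a receiver could recompute which clusters were eligible for a chunk when raising a dispute.
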
### Fringe Network

The Fringe network is the network of nodes that are connected to the producers and consumers of the Core network. It does not offer any SLA. It is used to pass the messages that the Core network has propagated across the world on to all the full nodes that want to receive blocks at an extremely low fee. Core network receivers obtain the data at no extra cost, they compete with one another, and each of them can send the same message to multiple members of the Fringe network, so the cost per transmission can be only a small delta above the bandwidth cost.

The Fringe network provides a low-cost alternative for nodes that do not need an SLA for receiving blocks and do not want to stake or pay much for receiving the block data.

## Assumptions

* A significant portion of clusters are rational
* A significant portion of receivers are rational

## Analysis

### Identification of malicious actors

The cluster-based approach makes certain challenges simpler, because each cluster acts as a single entity and, along with the producer, is the only entity in the transmission chain. This narrows down the identification of malicious actors.

As producers and clusters have opposing interests in general, and as clusters are picked at random, we can reasonably assume that it is very hard for producers and clusters to collude. Since they cannot collude, the availability of receipts is an important proof that the chunk was indeed sent through the cluster. It also frees the producer from any suspicion of misbehaviour regarding the availability of data.

The propagation of data by the clusters is ensured and audited using a fisherman approach, where a small set of voters is selected from the receivers and asked to vote on the cluster's performance and adherence to the SLA. This vote decides whether the cluster is to be slashed.
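
To make the fisherman-style audit above more concrete, here is a minimal sketch of sampling voters and of how likely a small sample is to include a receiver that was skipped by the cluster. Everything in it is an assumption for illustration: the function names, the use of Python's `random` module in place of an unbiasable randomness source, and the simplified model in which `n_censored` receivers got nothing and the sample is uniform without replacement.

```python
import random
from math import comb
from typing import List, Sequence


def sample_voters(receivers: Sequence[str], k: int, seed: int) -> List[str]:
    """Pick k distinct voters, reproducibly for a given seed.

    `random.Random` stands in for whatever randomness source the protocol
    would use; it is NOT suitable against an adversary who can predict
    or influence the seed.
    """
    rng = random.Random(seed)
    # Sort first so the outcome does not depend on the input order.
    return rng.sample(sorted(receivers), k)


def detection_probability(n_receivers: int, n_censored: int, k: int) -> float:
    """P(at least one of k sampled voters is a censored receiver).

    Hypergeometric model: 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= n_censored <= n_receivers:
        raise ValueError("n_censored must be between 0 and n_receivers")
    if not 0 <= k <= n_receivers:
        raise ValueError("k must be between 0 and n_receivers")
    return 1 - comb(n_receivers - n_censored, k) / comb(n_receivers, k)


if __name__ == "__main__":
    receivers = [f"receiver-{i}" for i in range(1000)]
    print(sample_voters(receivers, k=5, seed=2024))
    print(detection_probability(1000, 10, 20))
```

Under this simplified model, with 1000 receivers, 10 of them censored, and 20 voters per dispute, `detection_probability(1000, 10, 20)` comes out at roughly 0.18. A single vote therefore usually samples no affected receiver at all, which gives some intuition for why selective censorship of a few receivers is hard to catch with a small voter set.
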
The vote on whether to slash a cluster is conducted in such a way that participation and votes are not provable (using something like MACI), which avoids bribing attacks. The voting can be made incentive compatible for the voters by requiring the creator of the dispute to pay the transaction fees for the dispute process. Relayers in the Marlin protocol can therefore be slashed if the SLA is not met, which provides the economic guarantees for block relay.

The issue with this approach is that any honesty assumption about the receivers is hard to justify, because the stake, which is the Sybil resistance mechanism, cannot be very high. A very high stake means low usability of the network. A low stake means that the cost of an attack against the dispute resolution process is low.

A mathematical analysis of the above procedure ([here](https://hackmd.io/c_UDCzBPQPu7VOB1ASZnKw) and [here](https://hackmd.io/DugznPkPSPS_UuT6Uykkog)) shows that, even though high reliability of message delivery can be achieved, a cluster can still censor messages selectively. To ensure that a high proportion of receivers get the message, the stake provided by the cluster needs to be extremely high, because the dispute resolution mechanism is not efficient enough at detecting malicious behaviour and needs a significant number of attempts.

### Clustering of Relayers

The core idea of clustering relayers solves many problems for the Marlin protocol, as all the relayers in a cluster are treated as a single entity. Each cluster has to ensure an optimal internal topology in order to stay competitive with other clusters. However, this creates additional overhead for relayers in managing the clusters, which raises the barrier to entry. Managing a cluster, or forming a new one, is a significant barrier in terms of coordination and setting up the infrastructure.
Clusters exist to ensure the SLA, and other nodes have the option of taking part in the Fringe network, where the barrier to entry is very low. However, nodes taking part in the Fringe network do not contribute to the guarantees provided by the Core network.

### Reward distribution

Rewards are given to the cluster that propagates the chunks of a block in the canonical chain. Reward distribution becomes complex, because a Schelling game needs to take place for each reward to ensure that the block was indeed propagated by the cluster to all the receivers.

### Clock Sync issues

It is still hard to measure latencies accurately because of clock synchronization issues, which might reduce the granularity of the SLA that the network can provide.

## Conclusion

Based on the above analysis, the cluster-based approach has the following problems:

* Voting to slash clusters might require clusters to put up significant stake, which leads to higher fees as a reward for that stake.
* Clusters abstract away significant challenges in terms of block propagation and topology across the network, but they add management overheads that might be significant for the clusters.
* Because of these overheads, not every node in the network may be able to contribute to the guarantees that the Marlin network provides.

Based on the above issues, we conclude that the cluster-based approach can be a viable approach to block propagation in blockchains, provided that certain aspects, such as the Schelling game used to enforce the SLA, are made more reliable.