# Building Bulletproof Communities: Combatting Sybil Attacks in Web3

By Juan P. Madrigal Cianci.

## Introduction

As the prevalence of Sybil attacks continues to rise in Web3, the integrity of blockchain communities faces significant challenges. This document delves into the nuances of these threats and discusses strategies to effectively mitigate their impact.

Web3 is a domain where community visibility and engagement are crucial to the success of any project. Protocols and initiatives must not only attract but also retain a strong user base. Although retaining users is often more challenging and closely tied to the quality of the product, one innovative trend for attracting users involves implementing points systems. These systems frequently signal upcoming airdrops, combining enthusiasm with strategic planning.

In these systems, users earn points by engaging in activities that benefit the protocol. These may be protocol-based activities, such as depositing tokens, providing liquidity, or executing specific trades, or they may involve contributions to the broader ecosystem, like participating in the ETH research forum, contributing to relevant repositories, and aligning with Ethereum's goals (cf. [Celestia's airdrop criteria](https://bsc.news/post/celestia-tia-airdrop-everything-you-need-to-know)).

This modern approach to loyalty programs, adapted for the blockchain environment, not only deepens user engagement but also helps expand user bases. By awarding tokens or points, projects not only motivate participation but also facilitate a robust introduction of their tokens into the market. This strategy extends beyond mere promotion; it is a sophisticated tool designed to catalyze network effects and foster a sustainable ecosystem.

The advantages of adopting a points-based system are manifold. Primarily, it boosts user engagement by rewarding active participation, and promotes decentralization through the widespread distribution of tokens.
This strategy also enables projects to collect valuable data on user behavior and preferences, informing future development and marketing efforts. Moreover, by linking rewards to specific actions, projects can steer user activities towards supporting the overall health and growth of the ecosystem. Additionally, for protocols beyond their initial growth phase and moving into a user-retention stage (cf. Arbitrum's Short-Term Incentive Program (STIP) and retroPGF rounds), points and airdrops serve as potent tools for maintaining engagement.

**What's the Catch?**

However, like any system of value, points systems in crypto are vulnerable to exploitation. Sophisticated users or "farmers" may conduct what are known as "Sybil attacks," creating multiple fake identities to gain undue rewards. This not only skews the intended distribution and utility of the tokens but also siphons significant value from genuine contributors to the network's functionality. Furthermore, such manipulation can inflate user numbers, presenting an illusion of growth. Given these risks, it is imperative for projects to design airdrop mechanisms that are resistant to such abuses, ensuring that the incentives promote long-term value for all stakeholders in the ecosystem.

Having introduced how points and airdrops serve as vital tools for enhancing community engagement, we must now turn our attention to the vulnerabilities they introduce, particularly the threat of Sybil attacks, which we will explore in depth in the following section.

## Sybil Detection

So, what's a Sybil attack? In our context, a Sybil attack occurs when an individual or group creates multiple fake identities to gain disproportionate influence or unfairly increase their share of rewards in decentralized networks. These attacks exploit the anonymous nature of blockchains to manipulate airdrop distributions, voting mechanisms, or any other activity where influence is measured by the number of entities participating.
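
To make "disproportionate share" concrete, consider a toy calculation. Assume, purely for illustration, a reward pool $R$ that is split equally per address among $N$ honest addresses, and an attacker who controls $k$ addresses (the equal split, $R$, $N$ and $k$ are all hypothetical). The attacker's share is then

$$
\text{share}(k) = \frac{k}{N+k}\,R .
$$

For instance, with $N = 900$ and $k = 100$, the attacker captures $100/1000 = 10\%$ of the pool, versus $1/901 \approx 0.11\%$ with a single address, while each honest address's reward falls from $R/900$ to $R/1000$. Real airdrops rarely split rewards equally, so this only illustrates the mechanism, not the magnitude.
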

Sybil detection is a critical component of maintaining the integrity and fairness of blockchain ecosystems. Effective detection strategies are essential, and they must continuously evolve to address new tactics employed by attackers. Enhancing the accuracy of detection algorithms, reducing false positives, and scaling efficiently to accommodate growing datasets are foundational elements of robust Sybil detection. Projects must also manage computational costs, as more sophisticated algorithms may affect system performance. Integrating regular updates based on emerging Sybil behaviors and incorporating community feedback ensures that these systems remain effective and trustworthy. By understanding the mechanics behind Sybil attacks and implementing advanced detection strategies, projects can safeguard their ecosystems against exploitation and support a healthier, more equitable Web3 environment.

**Ok, so how can I detect them?**

**Remark.** *This section borrows from and builds on top of the work performed by [Trusta Labs](https://ethresear.ch/t/trusta-s-ai-and-machine-learning-framework-for-robust-sybil-resistance-in-airdrops/16828).*

In theory, there are several ways of detecting potential Sybil attackers:

- One can use a proof of personhood (cf. Worldcoin), or KYC if there is a need to be stricter.
- Alternatively, one can create economic games to detect them, e.g., by rewarding users who can identify Sybils with very high certainty (cf. [Hop's detection mechanism](https://github.com/hop-protocol/hop-airdrop)), or by promoting self-identification as a Sybil in exchange for a piece of the rewards they would have gotten (with the underlying assumption that they would get nothing if caught).
- A different variant is to take a data-driven approach, in which the actions of each individual user/wallet are examined in a broad context and then processed through Machine Learning and other statistical methodologies to determine whether or not they can be classified as Sybil.

While there is some merit to the first two approaches, they suffer from several drawbacks. KYC from a dApp can be seen as antithetical to crypto, while proof-of-personhood protocols have raised several concerns about personal biometric data. Lastly, incentive-driven methodologies are not necessarily effective (cf. [Hop's detection mechanism](https://github.com/hop-protocol/hop-airdrop)), nor are they necessarily aligned with the network's best interest.

With this in mind, we will focus on the third alternative: the data-driven approach. Not only can such an approach provide a robust and replicable argument behind the classification, it can also be argued that it is more aligned with the *ethos* of Web3. Indeed, as mentioned by *Trusta Labs*:

> 1. AI-ML preserves privacy as users don't provide their bio-information and identities in Web2. Proof of personhood compromises anonymity by requiring identity confirmation.
> 2. AI-ML comprehensively analyzes massive on-chain data to reduce vulnerability. Proof of personhood is vulnerable, as verified identities can be exploited.
> 3. AI-ML is inherently permissionless, as anyone can analyze the same public on-chain data.
> 4. Sybil's judgments can be publicly double-verified through transparent analysis.

Equipped with an understanding of the theoretical basis for detecting Sybil attackers, let's examine a practical approach to implementing these concepts.
The process begins with constructing a Sybil detection algorithm, which is outlined in the steps below.

### Main steps towards a simple algorithm

We will now construct a simple Sybil detection algorithm and discuss potential improvements to this mechanism. Assume there's a list of addresses $\mathscr{A}=\{a_1,a_2,\dots,a_N\}$ that we want to test for Sybil-ness. One can then construct a simple Sybil detection methodology as follows.

1. **Construct an ATG.** The first step in constructing this sort of Sybil detection methodology is to create what's known as an Asset-Transfer Graph (ATG). In this case, the ATG can be understood as a table that contains, in one column, all the addresses in $\mathscr{A}$ and, in the other, the address that first transferred some native token (e.g., ETH) to each wallet in $\mathscr{A}$, i.e., the address that first funded it. The intuition behind this is that many different wallets sharing the same funder indicates a potential Sybil. Mathematically, denoting by $\mathscr{F}$ the list of funding addresses, this step consists of constructing a graph $\mathscr{G}$ whose nodes are $\mathscr{F}\cup\mathscr{A}$ and whose edges represent whether there was a transfer between any two nodes.
2. **Community detection.** Once the graph $\mathscr{G}$ has been constructed, one can use community-detection algorithms to detect patterns that would be difficult to spot otherwise. In particular, the Louvain algorithm ([Que et al. (2015)](https://ieeexplore.ieee.org/abstract/document/7161493)) has been used in several approaches in the literature. Such an algorithm returns a set of communities, i.e., several groups of addresses that can be shown to be linked with one another (due to having interacted with each other).

![image](https://hackmd.io/_uploads/S1itSa1IR.png)

**Figure 1.** *An illustration of the Louvain Algorithm. [Source.](https://medium.com/@nimratahir1212/community-detection-louvain-algorithm-3f3b7058047f)*

Once these communities have been detected, one can examine each specific community in more depth to detect common traits among its members that suggest malicious behavior. Some examples of these communities are shown below:

![image](https://hackmd.io/_uploads/SknhPTk8A.png)

**Figure 2.** *A chain-like pattern. [Source: TrustaLabs](https://ethresear.ch/t/trusta-s-ai-and-machine-learning-framework-for-robust-sybil-resistance-in-airdrops/16828)*

In the Figure above, a community presents a chain-like pattern. Specifically, this means that address $a_1$ *instantiated* address $a_2$ (i.e., sent it its first native tokens to pay for gas), which in turn instantiated $a_3$, and so forth. These addresses then participated in the protocol. These sorts of patterns are unlikely to arise organically, without malicious behavior. Another type of possible pattern is shown below.

![image](https://hackmd.io/_uploads/rk0TwaJ80.png)

**Figure 3.** *A star-like pattern. [Source: TrustaLabs](https://ethresear.ch/t/trusta-s-ai-and-machine-learning-framework-for-robust-sybil-resistance-in-airdrops/16828)*

The Figure above shows a star-like pattern. Here, one address (the one at the star's center) instantiates all other addresses shown as red dots in the graph. While this might suggest malicious behavior, it can occur more naturally if the address in the middle is a Centralized Exchange, for example. The Figure above therefore suggests that classification based purely on the patterns above may not be enough, as there is room for misclassification. Given this, how can these Sybil detection methodologies be improved?

#### Refining these metrics

While effective, the previous steps miss out on a wealth of readily available information due to the on-chain nature of these transactions. We now discuss a few ways of improving these detection mechanisms.

1. **Anomaly detection based on several other metrics.** The previous approach is somewhat myopic in the sense that it only looks at connections between two addresses (i.e., a *yes* or *no* metric on whether they interacted or not). It misses a wealth of other information that can also be found on-chain: the amounts transferred, which protocols the addresses have interacted with, interaction times, deposit time in the protocol of interest, deposit amount, and funding amount, among many others. There are several ways to include this additional information; for example, running an anomaly detection algorithm across several of these metrics for each detected community might help identify more clearly whether a given address is a Sybil or not. Note, however, that the more metrics we include, the more computationally expensive the algorithm can become.
2. **Hand-checking.** Furthermore, given that the methodology at hand relies on statistical methods, it is essential to hand-check suspicious activity (via, e.g., Etherscan) to avoid potential misclassifications.
3. **Using graph properties.** One can also use graph-related metrics, such as degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality, as tools to detect the *most essential nodes*, whether in the global graph or for each community. In particular, this could mean finding a node (address) that is highly connected to other nodes, which can, in turn, suggest a Sybil.
4. **Incentive-specific metrics.** The metrics described above are general and are based only on properties of the ATG of the instantiated amounts. However, one can also use different metrics to detect malicious behavior based on the specific nature of the incentive.
   In a protocol where the total traded volume is incentivized, for example, one can also identify so-called *wash traders* (e.g., two addresses trading the same pair back and forth) using similar methods or by constructing an ATG of the traded amounts.

While the proposed Sybil detection algorithm provides a framework for identifying potential threats, it's essential to consider the broader implications of such systems. This leads us to critically evaluate their effectiveness and the ethical considerations they raise.

#### Criticism

While Sybil detection systems are crucial for maintaining the integrity of Web3 platforms, they could certainly be improved. These systems, designed to weed out deceptive practices, raise concerns about fairness, privacy, and the risk of excluding legitimate participants.

1. **The Beta Tester Argument.** Some community members see Sybil attackers and token farmers as unintentional "beta testers" who stress-test new economic mechanisms. These individuals may uncover vulnerabilities in token distribution algorithms and incentive structures, highlighting areas for necessary security enhancements. Although their actions can be disruptive, they also strengthen the system by exposing weaknesses that require attention. One potential way of leveraging this is for the protocol to agree to release some (or all) of the Sybil-ed funds in exchange for valuable and constructive feedback (i.e., farming the farmer). This would provide a positive-sum scenario for both parties, although the meaning of *valuable and constructive feedback* is quite subjective and would need to be defined on a case-by-case basis.
2. **False Positives and Social Cost.** Algorithmic detection methods, especially those employing statistical models, can sometimes mistakenly identify genuine users as malicious actors. Known as false positives, these errors can have significant social costs, including damaging trust in the platform and alienating active community members.
   Such risks highlight the need for precision in detection algorithms and the incorporation of manual review processes or appeals systems to address and rectify wrongful accusations.
3. **Non-Sybil attacks.** While the focus is often on multiple-address Sybil strategies, single-actor manipulations can also exploit incentive mechanisms. Additionally, seemingly malicious patterns may, in specific contexts, represent legitimate strategies. For instance, a user engaging in what appears to be arbitrage on a derivatives platform might be employing a valid hedging approach, underscoring the need for nuanced analysis and cautious application of Sybil detection metrics. Consider the following example. Suppose you're creating a points-based incentive mechanism for a perps exchange protocol (e.g., GMX). Specifically, you want to reward users who open long or short positions. Suppose that a given user opens both a long and a short position. While such an action might look like an *arbitrage* that earns points at a low cost (the difference between the funding rates of the two positions), it is also, strictly speaking, a valid trading strategy and, as such, should not be penalized.

Understanding these criticisms helps us appreciate the complexities of Sybil detection. Next, we will explore how these insights can guide the refinement of detection mechanisms, ensuring they are both practical and equitable.

## Wat do? -- Future Directions

Sybil detection is neither a set-it-and-forget-it nor a *one-size-fits-all* system. It has to be (i) tailor-made and (ii) constantly monitored and revisited. As long as there's a way of gaming the system, users **will** find it. Hence, these systems require ongoing refinement and updates to stay ahead of increasingly sophisticated attackers.
These continuous improvements should focus on enhancing the accuracy of detection algorithms, reducing false positives, and scaling the systems to handle larger datasets as the user base grows. This raises the question: *would airdrop programs do better if they didn't announce how to get points before the initial airdrop?*

Furthermore, the system designer needs to be aware that, in all likelihood, not *all* Sybils will be detected (false negatives) and that some non-Sybils might get labeled as such (false positives). Thus, the underlying incentive program needs to take these errors into account, whether by planning for a chunk of the rewards to be captured by these actors or by having a mechanism for those who were false positives to contest their labeling.

Projects should also consider the computational costs associated with these improvements, as more complex algorithms might require more resources, which could affect the performance and responsiveness of the system. Regularly updating the detection algorithms with new patterns of Sybil behavior and integrating community feedback into the system design is essential for maintaining the integrity of airdrops.

In particular, one potential improvement here is to introduce uncertainty quantification methodologies in the classification/community detection algorithm (cf. [Leeney et al., 2024](https://www.mdpi.com/1099-4300/26/1/78)). These methodologies can provide an error estimate for the classification as a Sybil. This can also be achieved by, e.g., introducing a Bayesian sub-step in the classification part of the methodology (cf. [van der Pas and van der Vaart, 2018](https://projecteuclid.org/journals/bayesian-analysis/volume-13/issue-3/Bayesian-Community-Detection/10.1214/17-BA1078.pdf)).
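
As a rough illustration of what a Bayesian sub-step could look like, the sketch below turns binary on-chain signals into a posterior probability of being a Sybil instead of a hard label. It is a naive-Bayes toy, not the method of either cited paper: it assumes the signals are conditionally independent, and the prior, the signal names (`shared_funder`, `same_hour_deposit`) and every likelihood are made-up values that a real project would have to estimate from labeled data.

```python
# Hypothetical base rate of Sybil addresses among participants.
PRIOR = 0.10

# Hypothetical likelihoods: signal -> (P(signal | Sybil), P(signal | honest)).
LIKELIHOODS = {
    "shared_funder": (0.8, 0.1),      # funded by an address that funded many others
    "same_hour_deposit": (0.6, 0.2),  # deposited within the same hour as its community
}


def sybil_posterior(prior, observed, likelihoods):
    """Return P(Sybil | observed signals) via Bayes' rule.

    `observed` maps each signal name to True/False. Signals are treated as
    conditionally independent given the class (the naive-Bayes assumption).
    """
    p_sybil, p_honest = prior, 1.0 - prior
    for name, seen in observed.items():
        p_given_sybil, p_given_honest = likelihoods[name]
        p_sybil *= p_given_sybil if seen else 1.0 - p_given_sybil
        p_honest *= p_given_honest if seen else 1.0 - p_given_honest
    return p_sybil / (p_sybil + p_honest)


both = {"shared_funder": True, "same_hour_deposit": True}
one = {"shared_funder": True, "same_hour_deposit": False}
none = {"shared_funder": False, "same_hour_deposit": False}

print(round(sybil_posterior(PRIOR, both, LIKELIHOODS), 3))  # 0.727
print(round(sybil_posterior(PRIOR, one, LIKELIHOODS), 3))   # 0.308
print(round(sybil_posterior(PRIOR, none, LIKELIHOODS), 3))  # 0.012
```

A probability of this kind lets a protocol act only above a chosen threshold and route borderline cases (such as the 0.308 above) to hand-checking or an appeals process rather than excluding them outright.
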

Lastly, one potential improvement that can be achieved as these methodologies improve is to create a list of block-listed addresses, i.e., addresses that have been classified as Sybil beyond any reasonable doubt in previous airdrops. Protocols can then decide whether to deny these addresses any rewards or to severely limit them. On the one hand, this would make the exclusion process simpler. On the other, it would contribute to a labeled dataset, which can simplify the use of ML/AI-based methods.

As projects navigate the strategic distribution of airdrops, balancing the reward of genuine users while maintaining the token's value is critical. Indeed, given the link between protocol (and hence token) demand and price, artificially inflated demand due to the presence of Sybils can affect the token price, as well as alter the perception of the protocol, pre- and post-airdrop. Furthermore, if there's a significant proportion of Sybils, it stands to reason that once they have been rewarded, they are thoroughly incentivized to *dump* the token, potentially creating significant sell pressure. Striking this balance involves careful planning regarding the number of tokens distributed, which should reflect the total supply, projected platform growth, and the behaviors the project wishes to incentivize. Adopting dynamic scaling of rewards based on overall engagement levels and the economic health of the platform can optimize the impact of these distributions. Additionally, ensuring compliance with local regulations is paramount to avoid legal repercussions, necessitating that projects work closely with legal experts to understand how airdrops are perceived across different jurisdictions.

The regulatory compliance efforts tie directly into managing the market impact of airdrops. Since the crypto regulatory landscape remains fluid, compliance helps mitigate the risk of legal challenges.
Moreover, the token distribution method needs careful analysis to prevent market volatility, which might result from a sudden influx of tokens. Mechanisms like vesting periods or staggered distributions help smooth out potential market disruptions, ensuring a more stable integration of new tokens into the market.

Further enhancing the effectiveness of these systems involves a collaborative approach that integrates diverse perspectives from team members, ranging from data scientists to blockchain experts. This multidisciplinary input can help refine detection mechanisms and ensure they are comprehensive and up-to-date. Engaging with the community is equally important, as user feedback can provide valuable insights into the system's effectiveness and areas needing improvement. Community-driven initiatives, such as bounty programs for identifying vulnerabilities, strengthen the systems and foster a sense of involvement and ownership among users.

Lastly, addressing the ethical considerations and ensuring transparency in how Sybil detection systems operate is fundamental to maintaining user trust. Transparent practices and clear communication about the detection criteria and the steps users can take if they feel wrongly flagged are crucial. It is also vital to ensure that these methods respect user privacy and do not unnecessarily infringe upon it, upholding the privacy-centric values of Web3. These ethical practices are not just about adhering to principles but about building and maintaining a respectful and fair community environment that encourages long-term engagement.
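
In the spirit of that transparency, the first-funder heuristics from the "Main steps towards a simple algorithm" section are simple enough to publish as reproducible code. The following is a minimal sketch, not a production detector and not Trusta Labs' implementation. It builds the ATG as a plain funder-to-addresses table and looks for the star-like and chain-like patterns directly, which is a deliberate simplification that stands in for Louvain community detection. The addresses, the helper names (`build_atg`, `star_hubs`, `funding_chains`) and the thresholds are all hypothetical.

```python
from collections import defaultdict

# Hypothetical input: each address mapped to the address that first sent it
# native tokens (its first funder). Real data would come from an indexer.
FIRST_FUNDER = {
    "a2": "a1", "a3": "a2", "a4": "a3",     # chain: a1 -> a2 -> a3 -> a4
    "b1": "hub", "b2": "hub", "b3": "hub",  # star: hub funds b1, b2, b3
    "c1": "cex",                            # single pair, too short to flag
}


def build_atg(first_funder):
    """Invert {address: funder} into {funder: [addresses it instantiated]}."""
    children = defaultdict(list)
    for address, funder in first_funder.items():
        children[funder].append(address)
    return dict(children)


def star_hubs(children, min_size=3):
    """Funders that instantiated at least `min_size` addresses (star-like)."""
    return {f: sorted(kids) for f, kids in children.items() if len(kids) >= min_size}


def funding_chains(first_funder, min_length=3):
    """Paths a1 -> a2 -> ... along which each address but the last is the
    first funder of exactly one address (chain-like)."""
    children = build_atg(first_funder)

    def single_child(node):
        return len(children.get(node, [])) == 1

    chains = []
    for head in children:
        parent = first_funder.get(head)
        # A chain starts at a single-child funder that is not itself the
        # single child of another address.
        if not single_child(head) or (parent is not None and single_child(parent)):
            continue
        chain, seen = [head], {head}
        while single_child(chain[-1]):
            nxt = children[chain[-1]][0]
            if nxt in seen:  # defensive guard against malformed, cyclic data
                break
            chain.append(nxt)
            seen.add(nxt)
        if len(chain) >= min_length:
            chains.append(chain)
    return chains


print(star_hubs(build_atg(FIRST_FUNDER)))  # {'hub': ['b1', 'b2', 'b3']}
print(funding_chains(FIRST_FUNDER))        # [['a1', 'a2', 'a3', 'a4']]
```

Both outputs are only candidates for review. As noted in the discussion of Figure 3, a flagged hub may simply be a centralized exchange, so any published criteria of this kind should be paired with hand-checking and a way for flagged users to contest the result.
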