# Restaking Network Risk Evaluation: Developing a Fundamental Approach

<br>

## Acknowledgements

*Special thanks to Brett Palatiello and Kunz Mainali from EigenLayer, Felix Lutsch from Symbiotic, Charles Mountain from Etherfi, Chadi from Re7, the Puffer team, the eOracle team, Kashish from Lagrange, Pankaj from Witness Chain, Nav from Ungate, and Yarden from Othentic for review and feedback.*

<br>

## Abstract

Even though Proof of Stake (PoS) has been around for quite some time, there is still a lack of consistent frameworks that systematically address staking risks at the protocol level. Most discussions focus on operator (validator) risk factors, which are very important but only part of the story. With restaking, the same staked assets secure multiple protocols simultaneously, exposing them to all associated slashing conditions. In this setup, protocol-level risks become even more critical, warranting a structured approach to portfolio management. Later this year, Ethereum restaking protocols and restaking networks will introduce slashing functionality on the mainnet, making this more than just a theoretical concern. For example, Symbiotic has already released [the Slasher module](https://docs.symbiotic.fi/modules/vault/slasher/).

In this article, we introduce a protocol risk evaluation approach designed for restaking portfolio management on any restaking platform. However, this methodology isn't limited to restaking — any staking protocol can be assessed in the same way. Unlike most existing approaches, we focus on fundamental analysis of protocols. To illustrate this, we evaluate six networks and explore how this framework can be applied to portfolio construction.

This reading is for anyone seeking to understand the risks associated with restaking networks, whether managing their own assets or advising clients.

:::info
Worth noting that we'll stick to Symbiotic's "network" as the term for protocols that use stake for cryptoeconomic security, even though various communities refer to the same concept with different acronyms. For example, EigenLayer uses Actively Validated Service (AVS), Babylon - Bitcoin Secured Networks (BSN), Jito - Node Consensus Networks (NCN), etc. **We think "network" is the best fit, since the framework we're building can be applied beyond the restaking ecosystem.**
:::

*Disclaimer: This content is presented to you on an "as is" basis for general information and educational purposes only, without representation or warranty of any kind. It should not be construed as financial, legal or other professional advice, nor is it intended to recommend the purchase of any specific product or service.*

<br>

## Introduction

### Utility & Purpose of Staking

Delegated staking is a deal between the protocol, operators, and delegators. Asset holders are generally not capable of running and supporting infrastructure, so they delegate these duties to operators. At the same time, operators generally do not hold enough tokens of their own for infrastructure provision to be economically feasible. Protocols require stake (locked assets) and continuous, compliant support of their operations. In exchange for supporting network operations and adhering to protocol rules, stakers receive rewards on their locked assets.
Staking assets (or voting) with an operational entity, such as a node operator, serves several key purposes:

- **Sybil resistance** – Preventing attackers from flooding the network with fake nodes, making a 51% attack more difficult.
- **Behavioral incentives** – Discouraging malicious actions, negligence, and operational failures through slashing penalties.
- **Operator selection** – Helping curate a set of operators that align with specific goals, such as decentralization or network liveness.

:::info
Stakers "vote for"/"hire"/"buy a service from" the node operators. Node operators generally risk nothing but reputation unless they stake their own tokens. In the majority of cases, there is no way for a delegator to bring a misbehaving node operator to justice. A malicious operator could exploit staked assets to launch on-chain attacks or cut corners to save resources, leading to operational failures.
:::

Not every failure can be blamed on the operators. It's important to acknowledge that they are constrained by the protocol's design as they perform their duties. An operator might excel in every way and still be affected by a flawed node client provided by the developers. Likewise, even if they follow best practices, poorly designed rules or architecture could still lead to penalties.

:::info
A notable example is the [Kusama Era 4543](https://forum.polkadot.network/t/kusama-era-4543-slashing/1410/2) *mass slashing*, which happened due to a bug in the core protocol code. Over 300 validators were slashed, prompting collective network action to reverse the financial consequences.
:::

### Primer on Restaking

**Restaking** is a concept that enhances staking's capital efficiency by allowing the same assets to be staked across multiple protocols simultaneously. While it inherits all the principles of traditional staking, restaking introduces additional challenges for stakers and investors, making it essential to carefully consider protocol-induced risk factors, as there are now many to account for.

![image](https://hackmd.io/_uploads/BJXHe9bpyx.png)

We'll assume the chosen operators are honest and meet all the technical and operational requirements. Or at least, that's how they are perceived by the delegator. With the operators selected, the next step is to choose a protocol (or protocols), also referred to as networks. Different restaking protocols offer varying flexibility for constructing a restaking strategy; e.g., EigenLayer allows the choice of a single operator plus a set of networks per unit of capital. Regardless of the technical specifics of restaking portfolio construction, the choice of networks is present everywhere. Evaluating the risk of networks is essential for understanding restaking strategies, and we are happy to share our perspective on how a risk evaluation framework can be built.

### **What Is Risk?**

Simply speaking, risk is the possibility of some negative outcome. The higher the probability of that outcome, and the closer the actual state of the world is to its definition, the greater the risk. Even if we intuitively understand "risk" in the context of (re)staking, we still need a clear definition — an objective metric to optimize, which could take various forms.

Generally, protocol-level penalties come in a few forms:

- **Ejection** - a temporary (**jailing**) or permanent (**tombstoning**) ban from the active set. It means the cessation of all the operator's activity; therefore, they stop earning rewards.
- **Freezing** of staked assets is something that might accompany the ejection. The protocol can limit withdrawals of staked assets for some period of time. Usually this period equals the unbonding time, but it can sometimes be prolonged, either intentionally or as a side effect of the slashing mechanics. For example, in Ethereum a slashed validator needs to wait ~36 days before the stake is withdrawn, while a standard exit takes much less.
- **Slashing** - a penalty that imposes a partial or full loss of the staked assets.

Besides, one might consider "exogenous" risks such as:

- **Market risks** related to the execution of rewards conversion or price risks of stake/rewards value.
- **Smart contract risks** - improperly set-up/programmed contracts pose potential security vulnerabilities.
- **Regulatory/legal risks** - leaving this for the reader's imagination.

Generally, a restaker would likely be concerned about their restaked assets and/or rewards. Losing potential rewards is equal to, or strictly better than (in the case of a partial loss), not opting into networks at all: if you don't opt in to the network, you won't receive any rewards either. The only nuance is a prolonged withdrawal time in the case of jailing (freezing), which introduces additional opportunity costs.

The jailing/freezing risks seem to be somewhat *minor* when it comes to restaking portfolio management. The presence of jailing shouldn't disincentivize restaking in almost all cases: for example, the minimal unbonding period in EigenLayer is 7 days (already long enough to rule out reacting to sudden events), not to mention that the L1 validator withdrawal time may add even more days on top of it.

![re_§](https://hackmd.io/_uploads/HJSjgc-6kg.png)

So, having restaked assets slashed is definitely something to be genuinely worried about. At a minimum, slashing will likely lead to jailing as well; even worse, it could wipe out more staked assets than the rewards earned. Due to this, the net outcome of choosing a network can become *negative*.

![re_2](https://hackmd.io/_uploads/ryu2xqW6ye.png)

In light of the above, we set the following goal for the network risk evaluation framework:

- *Creation of a reasonable estimate (score) of potential losses due to slashing*
- Or, more formally, creation of $f: factors \to risk\:score$, such that $risk\:score \sim expected\:loss$, where the mapping $f$ represents our methodology and $risk\:score$ is **a proxy metric** for the expected losses due to slashing.

For now, *we don't focus on the reward aspect*. Those interested in conducting a full risk/reward assessment can leverage our risk evaluation methodology and findings as inputs for their models.

<br>

## Risk Scoring Philosophy

### Possible Paths to Risk Evaluation

As we outlined above, the end goal of the evaluation would be a reasonable estimate of the expected loss. Following the probabilistic definition of expected value, we can express $expected\:loss$, or $\mathbb{E}[loss]$, as $\sum p_i \cdot loss_i$ over all possible slashing events (or scenarios) $i$. In addition to that, we can compute the variance, $\mathbb{V}[loss]$, to reflect the uncertainty. Ideally, the calculation of both metrics requires the definition and knowledge of a probability space of slashing outcomes. This means that, technically, we must foresee all possible scenarios leading to loss of funds and be able to assign each outcome a probability, given the staking timeframe (e.g. a year).
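To make the definitions concrete, here is a minimal sketch of both quantities in Python. The scenario names, probabilities, and loss fractions are entirely hypothetical placeholders for illustration, not estimates for any real network.

```python
# A minimal sketch of E[loss] and V[loss] over a hypothetical scenario space.
# All numbers below are invented for illustration only.

scenarios = {
    # scenario: (annual probability, loss as a fraction of stake)
    "double-sign slashing": (0.0004, 0.05),
    "protocol-bug mass slash": (0.0001, 0.50),
    "liveness-fault penalty": (0.0020, 0.01),
}

stake = 100_000  # restaked amount, hypothetical units

# E[loss] = sum(p_i * loss_i); the implicit "no incident" outcome has zero loss.
expected_loss = sum(p * frac * stake for p, frac in scenarios.values())

# V[loss] = E[loss^2] - E[loss]^2, treating the scenarios as mutually exclusive.
second_moment = sum(p * (frac * stake) ** 2 for p, frac in scenarios.values())
variance = second_moment - expected_loss ** 2

print(f"E[loss] ≈ {expected_loss:.2f} per year")  # 9.00
print(f"V[loss] ≈ {variance:.2f}")
```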
With this basic mindset, we can structure the outcome space in different ways:

- The simplest approach is to define it by the number of "slashing occurrences" within a given timeframe, along with the total amount slashed.
- A more detailed approach is to break it down into specific causes, like "code bugs" or "network issues."
- Each of these causes can then be further refined into more granular scenarios — like "staking module malfunction" or "P2P protocol implementation flaws."

While these examples illustrate the varying depth in defining the outcome space, other factors can come into play as well. For example, different network risk analysts may only see a subset of potential issues due to limitations in knowledge or resources.

Once the loss scenarios are identified, the evaluation process moves to assessing their likelihoods. Typically, researchers try to follow a positivist ("scientific") path, based on empirical studies of realized event occurrences. This can be a simple probabilistic model based on the Poisson distribution, estimated from the historical rate of incidents, or a regression with multiple factors for the prediction.

Consider these examples of Ethereum slashing risk prediction:

- **Historical Slashing Rate** sits at 0.04% per validator, annually. We can assume that this rate will persist this year too, or use a stochastic approach, such as a [Martingale](https://en.wikipedia.org/wiki/Martingale_(probability_theory)).
- **Contextual Reputation**: a validator was slashed twice on some non-Ethereum L1s previously, due to bad signing performance both times, whereas on Ethereum the same validator has a perfect record. We therefore conclude that staking with them bears no risk, at least for now.

*Note: the examples are illustrative.*

There will almost always be multiple approaches, and the choice (or development) of the methodology can be very subjective. The subjectivity of the methodology can then be minimized through model reevaluation: figuring out whether it delivered good predictions (which is assumed to be the target). That said, to properly validate or invalidate the approach, there should be multiple realizations of the events. This is a *huge* problem when it comes to slashing (a numerical illustration follows the list below):

- ***Slashing events are very rare*** for a specific protocol. Even taking a broader perspective (like all protocols that have ever existed), they are still too rare to provide enough data points for reliable estimates. It can take quite a long time for a protocol to see a single slashing event after slashing is enabled.
- ***A substantial share of these events occurs due to operator failures***, meaning that we likely cannot use these data points for a protocol risk evaluation.
- The reasons behind slashing events are sometimes very different. They can be identified, but the ***predictive power of the estimate is likely to worsen because of [higher model dimensionality](https://www.statisticshowto.com/dimensionality/)***.
- Slashing events that happen due to protocol failures are usually followed by bug fixes or other measures to prevent the same incident. On the one hand, we could say that protocols with frequent slashing are bad; on the other hand, they are the most battle-tested. ***Can we realistically expect to capture this through modeling?***
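To illustrate how thin the data really is, the sketch below fits the naive Poisson model from the earlier example. The incident count and observation window are invented for illustration, and the interval uses a standard normal approximation.

```python
import math

# Hypothetical history: slashing incidents observed for one protocol.
incidents = 3             # observed slashing events (made-up)
validator_years = 7_500   # total observation window across operators (made-up)

# Maximum-likelihood Poisson rate: incidents per validator-year.
rate = incidents / validator_years

# P(a validator is slashed at least once in a year) under Poisson(rate).
p_slash = 1 - math.exp(-rate)

# With only 3 observations, the ~95% confidence interval on the rate is wide:
# rate ± 1.96 * sqrt(incidents) / exposure (normal approximation).
half_width = 1.96 * math.sqrt(incidents) / validator_years

print(f"rate ≈ {rate:.5f} ± {half_width:.5f} per validator-year")
print(f"P(slashed within a year) ≈ {p_slash:.4%}")
```

The uncertainty band here is wider than the estimate itself, which is exactly the rare-event problem described above.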
Some might take a **Bayesian approach** to the problem, starting with prior knowledge and refining it over time as new information becomes available. This method feels natural since it mirrors the way we learn — adjusting our beliefs based on experience. However, despite its advantages, the updating process still runs into the same challenges.

Taking the aforementioned into account, it becomes clear that purely data-driven approaches to slashing risk evaluation may suffer from low precision and slow (assuming it is possible at all) convergence to a state with sufficient predictive power. Accurately estimating the likelihood of loss appears to be an extremely complicated problem.

### Fundamental Infrastructure Analysis

Nevertheless, there's still a strong demand for a tool to manage restaking portfolios, prompting the search for workarounds. In the meantime, ***fundamental infrastructure analysis*** can bridge the gap until sufficient data is available and serve as a foundation for future models, especially when forming prior knowledge.

By fundamental infrastructure analysis we mean the exploration of the root-level causes of slashing. It requires a careful definition of the slashing risk landscape: what scenarios are possible and how they might unfold. This approach can include statistical tools, but as a complement, usually used for inspecting a specific scenario.

:::info
**Abstract Example:**

- Let's assume there's a condition that enables a scenario in which honest operators can be mass-slashed due to other operators' actions (e.g. simply halting a node).
- Let it also happen only if those operators gain a critical amount of stake.

First of all, one should examine how slashing is actually triggered, ideally in the code, to confirm that such a risk exists. Then one must evaluate the decentralization of the operator set and figure out whether its current state is close to dangerous.
:::

The analysis can be multifaceted, involving various metrics (e.g. to evaluate decentralization, as in the example). In the end, one should be able to conclude whether a mass-slashing event is possible or not.

:::info
A typical question that a researcher would want to answer:

- Is there anything that can trigger slashing if an operator is honest?
- If the answer is no, then the network-related restaking risk is zero. Everything is reduced to operator risk.
:::

Usually, this does not translate well into an actual probability of a slashing event: a researcher using this approach will most likely only be able to describe it in terms like "it will not happen given the existing environment" or "there is a high chance it happens soon". These conclusions can be expressed as *a normalized score*, e.g. from 1 to 10. The more well-structured the framework and the deeper the analysis, the stronger the connection between the score and the actual probability or expected loss.

Because networks generally differ from an operational perspective, there will be unique low-level points to consider for each. However, we can (or even must) systematize them at a higher level of abstraction that is appropriate for network comparison. If we are successful enough at this stage, we can then proceed with portfolio construction.

Unfortunately, we can't guarantee the precision of this approach. For now, we have to rely on assumptions and treat them as such — educated guesses, even when working with objective raw data. It is *an expert assessment*, which is inherently subjective:

- The process of transforming raw facts into scores depends on the specific person doing it or developing the methodology.
- The score interpretation (its translation into a likelihood of loss) likewise depends on the observer of the final scores.

In any case, the process and outcome of fundamental analysis provide a wealth of valuable (often objective) insights that can either put all fears to rest or immediately exclude a network from a portfolio. And once again, combined with statistical methods, it serves as a foundation for prior knowledge and model development. ***This is a necessary first step for any sensible risk evaluation.***

<br>

## Risk Evaluation Framework

In this section, we'll break down the network risk evaluation framework built on fundamental analysis. Broadly speaking, the outcome of a network evaluation is intended to be *a combination of raw facts and recommendations* that form:

- A list of standard metrics and their values. This is a fairly abstract feature set, designed to be applicable across a range of possible network types.
- Metrics should be backed by justifications, but we're not doing this for every network/metric here to keep the article concise.
- Each metric's value is accompanied by a confidence score that represents the uncertainty around the evaluation result. The uncertainty can be driven by subjectivity of the reasoning and/or a lack of information. It aims to achieve higher transparency in communicating the results and to avoid possible biases.
- The final risk score, which is calculated by aggregating the metric values.

### Which Conditions Can Trigger Slashing Risk?

Slashing is triggered when operators fail to meet protocol requirements, which can occur in a couple of circumstances:

- **Faulty Execution**: When an operator performs computations incorrectly, violating service commitments to a network.
- **Liveness Failures**: When an operator fails to maintain the required availability or responsiveness, leading to service disruptions.

Any of the above can occur independently of operator intent, making it essential to evaluate both voluntary and involuntary actions when assessing slashing risks. Some failures stem from honest mistakes, like misconfigurations or downtime, while others involve deliberate exploitation for economic gain. Slashing is enforced when either type of misbehavior threatens protocol security or functionality, requiring a clear distinction between systemic vulnerabilities and adversarial intent for effective mitigation.

Common reasons that can trigger slashing in networks include misattestation of blocks/data, double-signing, operational downtime, laziness (imitating proper work), a breach of developers' secret keys, wrong computation output, or censorship.

### **Risk Metrics for Network Evaluation**

Our scoring framework evaluates various aspects of security and execution within networks, from observable system-level factors to more nuanced and subjective considerations, from which a quantitative scoring scale is heuristically derived. By incorporating these metrics, we aim to comprehensively assess the risks that potential slashing events pose to restakers.

1. **Execution Architecture:** measures how well a network fulfils its intended purpose through its infrastructure and architecture, focusing on whether its execution layer aligns with proper standards or, in some cases, state-of-the-art advancements.
For example, **DA networks** that allow operators to reserve only insufficient bandwidth chunks (e.g., less than 0.5 MB/s) face increased data congestion, surge pricing, and fluctuating gas fees, significantly limiting throughput compared to optimal allocations like 1 MB/s. Similarly, the absence of anti-Sybil protections in **Interop networks** exposes the system to malicious or spam transactions, reducing its reliability and security. For **Oracle networks**, relying solely on TLS for secure data transmission may be adequate, but integrating zkTLS as a state-of-the-art solution further enhances privacy and ensures authenticity without revealing sensitive data.

An inadequate or absent execution design may not only degrade performance but also increase the likelihood of slashing, as attackers may exploit these vulnerabilities to disrupt operations or compromise protocol integrity. In a nutshell, this metric contrasts the protocol's execution profile with optimal designs to assess its resilience, scalability, and overall effectiveness in fulfilling its teleology, i.e., ultimate function.

<details>
<summary>Clarifying Qs</summary>

- How does the current network execution architecture deviate from adequate or state-of-the-art standards in ways that expose it to heightened slashing risk?
- What are the viable attack vectors/scenarios that may disturb the operational stability of the network? Can these scenarios lead to slashing events? Are these scenarios realistic?
</details>

2. **Consensus Design:** evaluates whether a network's consensus model is designed to meet its specific throughput demands, validator uptime requirements, and decentralization or permissioning constraints. A poorly matched consensus model can lead to inefficiencies in state resolution, delayed finality, and a higher likelihood of protocol breaches, such as missed attestations or conflicting signatures. Additionally, the absence of features like DVT for improved fault tolerance or TEE for secure computation can weaken decentralization and increase execution risks. Ensuring that the consensus profile is well integrated with the infrastructure and ethos is essential for maintaining system integrity, sustaining validator participation, and minimizing slashing risks.

<details>
<summary>Clarifying Qs</summary>

- How does the current network consensus architecture deviate from adequate finality, throughput, and validator agreement requirements in ways that increase slashing risk?
</details>

3. **Slashing Conditions:** evaluates whether the slashing rules of a network are clearly defined, well designed, and effectively implemented. Ambiguous or poorly thought-out slashing conditions may fail to penalize malicious operators or unjustly penalize honest ones, leading to severe stake capital losses and reputation damage.

:::info
At the time of writing, networks are still brainstorming the design of slashing rules.
:::

<details>
<summary>Clarifying Qs</summary>

- Are all the slashing conditions truly objective (unless slashing conditions are explicitly specified for intersubjective slashing mechanisms)?
- What is the mechanism of attribution? Are all triggering events tracked? How is consensus over slashing reached?
- Is there any scenario in which an honest operator is slashed while being online and correctly performing their duties? How realistic are these scenarios, and what would have to happen?
</details>

4. **Security Audits** identify vulnerabilities in the codebase or protocol design pre-deployment.
A lack of sufficient audits—whether in quantity and/or quality—significantly increases the chance of validation errors and bugs, making operators more prone to slashing.

<details>
<summary>Clarifying Qs</summary>

- How many unique audits (by distinct firms) have been conducted?
- How reputable are the audit firms?
- When was the most recent audit conducted?
- What exactly was audited? E.g., specific smart contracts or the client software run by operators.
- Is it still relevant for the current version of the protocol/client software?
- What was the outcome of the latest audits? What vulnerabilities were spotted? Were they fixed?
</details>

5. **Code Complexity** considers the difficulty of maintaining and validating complex systems and codebases. Highly complex, extensive, and intricate codebases are harder to maintain and thus increase the likelihood of bugs or misconfigurations, even by honest operators.

<details>
<summary>Clarifying Qs</summary>

Considering: LoC (lines of code), external dependencies and third-party integrations, number of repositories, product verticals, documentation comprehensiveness, and unnecessary vs. inevitable code complexity.
</details>

6. **Maturity Level** reflects how well-tested and operationally established a network is. Immature protocols carry slightly higher risks due to mechanisms untested by both users and operators, while more mature networks are likely to have fewer operational pitfalls and a stronger track record.

<details>
<summary>Clarifying Qs</summary>

- How much time has passed since the first testnet? How much time has passed since mainnet launch?
- How much time has passed since the latest protocol update with critical operational changes?
- Has the protocol experienced peak loads on mainnet? Were there realistic stress tests? This is important if operational stability can be related to slashing conditions.
- How did these events impact operations? If the impact was negative, what was fixed (if anything)?
</details>

7. **Reputation Level** is a flexible metric designed to account for long-tail risks associated with a network development team having bad intentions or failing to deliver high-quality products. For instance, a team might fake progress to execute a rug pull or have a track record of producing poor-quality code. In the worst-case scenario, they could deliberately develop features that trigger mass slashing events purely out of malice. In many ways, assigning a reputation score resembles a due diligence process.

<details>
<summary>Clarifying Qs</summary>

- Have any founding team members ever been involved in fraudulent activities?
- What other (past) projects are they associated with?
- What do reputable, independent experts say about the protocol?
- Has there been any sign of a potential rug pull?
- What does contribution activity in their repositories look like over time?
</details>

:::info
Note, the analysis excludes specifics of slashing-related procedures at the restaking protocol level, such as slashing committees or validator unbonding periods.
:::

### Formulas

Our approach computes the final network risk score ($R$) by quantifying metrics based on their risk scores ($m_i$), weights ($w_i$), confidence levels ($c_i$), and an overall risk sensitivity parameter ($S$). $R$ provides a balanced and comprehensive estimate of slashing risk across a network protocol, serving as a proxy for a restaker's expected stake loss.
**Network Table Template**

The table below offers a template for the evaluations in the next section, structuring the metrics denoted above, their scores and assigned weights, and specific confidence levels.

| **NETWORK METRICS** | **METRIC RISK SCORE (mi)** | **WEIGHTING (wi)** | **CONF. LEVEL RANGE** | **CONF. LEVEL SPEC. (ci)** |
| --- | --- | --- | --- | --- |
| Execution Architecture | ?/10 | 20% | [0%, 100%] | [0%, 100%] |
| Consensus Design | ?/10 | 20% | [0%, 100%] | [0%, 100%] |
| Slashing Conditions | ?/10 | 20% | [0%, 100%] | [0%, 100%] |
| Security Audits | ?/10 | 10% | [0%, 100%] | [0%, 100%] |
| Code Complexity | ?/10 | 10% | [0%, 100%] | [0%, 100%] |
| Maturity Level | ?/10 | 10% | [0%, 100%] | [0%, 100%] |
| Reputation Level | ?/10 | 10% | [0%, 100%] | [0%, 100%] |

*Notes*:

- Metric risk scores $(m_i)$ range from 1 (least risky) to 10 (most risky);
- Weights $(w_i)$ reflect the relative importance of each metric to the slashing assessment, totalling 100%;
- Confidence levels $(c_i)$ indicate our perceived reliability of each metric's risk score, ranging from 0% (low confidence) to 100% (high confidence).

**Execution Architecture, Consensus Design, and Slashing Conditions** are the primary contributors, accounting for **60% of the total weight**, as they are the most critical and likely causes of slashing risk. The remaining metrics—Security Audits, Code Complexity, Maturity Level, and Reputation Level—each contribute 10%, as they are often more subjective and deemed less critical to the evaluation's end goal.

**Methodology**

The scoring model adjusts each metric's risk based on its confidence level, applies an exponential weighting factor to emphasize higher-risk metrics, and computes a balanced score between best-case and worst-case slashing scenarios.

Confidence levels $(c_i)$ define the range of possible risk for each metric by setting upper $(m_{i+}')$ and lower $(m_{i-}')$ bounds:

$$
m_{i+}' = \min\left(m_i + m_i \cdot (1 - c_i), 10\right)
$$

$$
m_{i-}' = \max\left(m_i - m_i \cdot (1 - c_i), 0\right)
$$

- A low confidence level (e.g., 10%) widens the range between $m_{i+}'$ and $m_{i-}'$, reflecting greater uncertainty;
- A high confidence level (e.g., 90%) narrows the range, reinforcing precision by constraining fluctuation;
- Metrics with limited data contribute wider deviations, while well-supported metrics anchor the estimate more firmly.

To emphasize metrics with higher risk in the final scoring, an exponential factor adjusts the weights $(w_i)$, based on the risk sensitivity $(S)$ and the metric risk bounds $(m_{i\pm}')$:

$$
w_{i\pm}' = w_i^{\left(1 + \frac{S-1}{5} \cdot \max(0, m_{i\pm}' - 5)\right)}
$$

- If $S=1$, weights remain unchanged, treating all metrics equally;
- If $S>1$, metrics with $m_{i\pm}'>5$ receive exponentially greater influence in the scoring;
- At this phase of network development, and to highlight the last point, we decided to assign $S=2$ in our calculations.

The risk bounds $(R_{\pm})$ are calculated by weighting each metric's bounds $(m_{i\pm}')$ by the adjusted weights $(w_{i\pm}')$ and normalizing by the sum of the weights. The final network score $(R)$ is the average of the upper-bound $(R_+)$ and lower-bound $(R_-)$ risks, providing a balanced measure of the best- and worst-case slashing risk scenarios, independent of network sector and use case.

$$
R_{\pm} = \frac{\sum\limits_{i} m_{i\pm}' w_{i\pm}'}{\sum\limits_{i} w_{i\pm}'}
$$

$$
R = \frac{R_-+R_+}{2}
$$
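For readers who prefer code, below is a minimal Python transcription of the formulas above, useful for sanity-checking the tables that follow. Function and variable names are ours; at $S=1$ it reproduces, for example, EigenDA's 4.10 from the Appendix.

```python
from typing import Sequence

def risk_bounds(m: float, c: float) -> tuple[float, float]:
    """Confidence-adjusted bounds (m'_{i-}, m'_{i+}) for one metric."""
    spread = m * (1 - c)
    return max(m - spread, 0.0), min(m + spread, 10.0)

def adjusted_weight(w: float, m_bound: float, S: float) -> float:
    """Sensitivity-adjusted weight w'_{i±}, transcribing the formula as printed."""
    return w ** (1 + (S - 1) / 5 * max(0.0, m_bound - 5))

def network_risk(m: Sequence[float], w: Sequence[float],
                 c: Sequence[float], S: float = 2.0) -> float:
    """Final score R: the mean of the weighted lower (R-) and upper (R+) bounds."""
    R = []
    for side in (0, 1):  # 0 -> lower bound R-, 1 -> upper bound R+
        bounds = [risk_bounds(mi, ci)[side] for mi, ci in zip(m, c)]
        weights = [adjusted_weight(wi, b, S) for wi, b in zip(w, bounds)]
        R.append(sum(b * wi for b, wi in zip(bounds, weights)) / sum(weights))
    return (R[0] + R[1]) / 2
```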
<br>

## **Practical Application: Risk Scoring of Networks**

The following networks were selected for our practical calculations because they span diverse tech categories and maturity levels, providing interesting examples that demonstrate the versatility and real-world utility of our formulas and theory. Risk assessment resources include the networks' documentation, GitHub repositories, and Tokensight's [u--1](https://u--1.com/avs) risk integrations. We utilized Tokensight's existing analysis for EigenDA, eOracle, and Lagrange, and conducted new evaluations for Witness Chain, Ungate InfiniRoute, and Hyperlane, within the EigenLayer ecosystem.

### Network Risk Table Summary

Below are the condensed results for each network at each sensitivity degree of the final risk score:

| **Network** | **EigenDA** | **eOracle** | **Lagrange** | **Ungate** | **Witness Chain** | **Hyperlane** |
| --- | --- | --- | --- | --- | --- | --- |
| **Category** | DA | Oracle | ZK | AI | DePIN | Interop |
| **Final Risk Score $(R)$, S=1** | 4.10 | 4.00 | 3.70 | 4.80 | 4.17 | 5.08 |
| **Final Risk Score $(R)$, S=2** | 5.63 | 5.78 | 5.37 | 6.26 | 5.68 | 6.16 |
| **Final Risk Score $(R)$, S=3** | 6.10 | 6.43 | 5.82 | 6.80 | 6.00 | 6.51 |

*We encourage the curious reader to refer to the Appendix for the complete rationale and calculations behind each risk score.*

### Risk Score Interpretation

As covered earlier,

> *Following the probabilistic definition of expected value, we can express $expected\:loss$, or $\mathbb{E}[loss]$, as $\sum p_i \cdot loss_i$ over all possible slashing events (or scenarios) $i$. In addition to that, we can compute the variance, $\mathbb{V}[loss]$, to reflect the uncertainty.*

The formulated $R$ serves as a proxy metric for $\mathbb{E}[loss]$, since network slashing risk relates directly to the expected stake loss for the restaker, with the built-in variance $(\mathbb{V}[loss])$ represented through the weighting $(w_i)$ and confidence $(c_i)$. Detailed $R$ calculations for EigenDA, eOracle, Lagrange ZK Coprocessor, Ungate InfiniRoute, Witness Chain, and Hyperlane, on EigenLayer, are presented in Appendix A.

---

Here's how a Risk Score $(R)$ should be interpreted:

- **$R$ = 0** reflects a network with a highly secure and robust infrastructure across all key metrics. The **Execution Architecture** is robust and well designed, and the **Consensus Design** effectively meets protocol requirements and fault tolerance standards, ensuring very strong protection against intentional and unintentional exploits and liveness risks, significantly reducing the likelihood of slashing activations. **Slashing Conditions** are clearly defined and fair, minimizing the risk of unjust penalties. Extensive **Security Audits** by reputable firms ensure vulnerabilities are identified and mitigated, and **Code Complexity** remains manageable, reducing the risk of bugs. A **high Maturity Level** indicates extensive real-world testing and a strong track record, and a strong **Reputation Level** suggests a history of reliability and trust, with respect to both the founders and the sentiment around the protocol itself. *Main takeaway: Slashing is highly unlikely in such a network, making it a safe choice for restakers.*
- **$R$ = 5** signals moderate slashing risk due to trade-offs in key areas. **Execution Architecture** may be partially optimized but still susceptible to performance bottlenecks.
**Consensus Design** might rely on mechanisms that are functional but introduce potentially unnecessary disputes or liveness risks. **Slashing Conditions** may lack full clarity and scope, increasing uncertainty around penalties. **Security Audits** may be fewer, of lower quality, or performed by a single entity, leaving some vulnerabilities unaddressed, while **Code Complexity** could be moderately high, making maintenance and error detection more difficult. A **medium Maturity Level** suggests the network still lacks extensive testing, and a **Reputation Level** that is not fully established may indicate a history of operational issues or governance concerns. *Main takeaway: Restakers should monitor developments and proceed cautiously.*
- **$R$ = 10** represents a high-risk network with significant weaknesses across multiple factors. The **Execution Architecture** is poorly designed, leading to performance instability and security gaps. **Consensus Design** may be inadequate for its purpose and needs, making it fragile, centralized, or vulnerable to manipulation. **Slashing Conditions** may be unclear, too lenient, or unjustly punitive. A lack of **Security Audits** increases the likelihood of undetected exploits, and high **Code Complexity** invites errors and makes misconfigurations more probable. A **low Maturity Level** indicates limited real-world testing, increasing uncertainty, while a **poor Reputation Level** suggests previous failures, governance issues, or untrustworthy founders' pasts. *Main takeaway: Restaking in such a network carries a high probability of financial loss.*

To categorize networks into risk groups, we suggest three straightforward tiers that align with our scoring methodology: **Blue Chip Networks** (R = [0, 5]) that offer exceptional security with minimal slashing risk, **Moderately Risky Networks** (R = (5, 8]) that pose mild security risks requiring careful monitoring, and **Extremely Risky Networks** (R = (8, 10]) that demonstrate fundamental flaws that significantly heighten the probability of financial loss. Restakers can use this classification framework as a clear decision-making guide when evaluating networks.

<br>

## Conclusion

As the number of networks continues to grow, managing a restaking portfolio becomes an increasingly complex challenge. Restakers aiming for a mindful risk/reward approach must navigate multiple layers of analysis.

In this article, we presented a framework for network risk evaluation that addresses the limitations of purely data-driven approaches. It provides a perspective focused on understanding slashing risk from a fundamental standpoint, including the analysis of architecture, slashing conditions, and other factors that could impact the likelihood of local "black swan" events.

However, despite the compelling insights our approach offers, there are still some challenges, the major ones being:

- Possible expert evaluation bias in metric score estimation & a lack of formal rules
- Final score interpretation & quantification

To address the first issue, we aim to introduce clearer standards for how metric scores are derived, as we currently follow a more narrative-driven approach. This should also help minimize the second one. In addition, we aim to align other methodologies with ours and implement cross-validation.

We welcome the broader community to contribute to the protocol evaluations, and we hope our framework will help drive the continued development of (re)staking risk management.
Thank you for reading!

<br>

---

<br>

### **Suggestions for Further Research & Methodology Refinement**

Although extensive, this article is not entirely exhaustive on certain important topics; still, we believe it provides a strong foundation for future exploration and refinement of the methodology.

1. **Validation Against Mature Networks**: As discussed, the current framework extends beyond restaking. It would be valuable to include a mature PoS protocol that is not a restaking network, with a well-documented history of slashing events, to test the applicability of the methodology. Doing so would help correlate theoretical risk scores with empirical slashing data.
2. **In-Depth Analysis of Slashing Protection Mechanisms**
   1. **Network Layer**: Network slashing data is currently unavailable. As initial implementations roll out, we will be better positioned to clarify slashing conditions and refine the existing categories—*Faulty Execution* and *Liveness Failures*.
   2. **Restaking Protocol Layer**: Further examination is needed to understand how slashing implementations vary across protocols like EigenLayer, Symbiotic, Jito, and others. These implementations likely differ in meaningful ways that impact the risk profile. Incorporating protocol-specific nuances will improve the accuracy and granularity of risk assessments.
   3. **L1 Layer**: Understanding how networks mitigate slashing risk can refine the model. Ethereum, for example, uses built-in protections like attestation history and cross-verification, though many slashing events still arise from node migrations that bypass these safeguards. Identifying the presence or absence of similar mechanisms in other L1s would provide valuable insight.
3. **Externalities Triggered by a Slashing Event**: The inherent shared-security characteristic of restaking protocols implies that a single slashing event can have ripple effects across the ecosystem, affecting other networks as well. In a future version, we might consider this angle in greater depth.
4. **Scoring More Networks**: Expanding the dataset by evaluating more networks would strengthen the study. However, this remains a complex task that likely requires coordination with network teams to access the necessary data.

<br>

### Recommended reading / Learn more

Below we propose resources to assist readers on their journey to understanding restaking basics and networks:

* [You Could've Invented EigenLayer](https://www.blog.eigenlayer.xyz/ycie/)
* [AVS Overview by EigenLayer](https://docs.eigenlayer.xyz/developers/Concepts/avs-developer-guide)
* [Introducing Symbiotic: Permissionless Restaking](https://blog.symbiotic.fi/symbiotic-intro/)
* [Understanding Networks by Symbiotic](https://docs.symbiotic.fi/intro/networks)
* [What are actively validated services (AVS)? by Binance](https://academy.binance.com/en/articles/what-are-actively-validated-services-avs)
* [Understanding the EigenLayer AVS Landscape by Coinbase](https://www.coinbase.com/blog/eigenlayer)
* [Introduction to Restaking Risk Framework by P2P.org](https://p2p.org/economy/restaking-risk-surface/)
* [u--1: Restaking Dashboard](https://u--1.com/avs)
* [Intro to EigenDA: Hyperscale Data Availability for Rollups by EigenLayer](https://www.blog.eigenlayer.xyz/intro-to-eigenda-hyperscale-data-availability-for-rollups/)
* [EigenDA: AVS Cryptoeconomic Risk Analysis by Tokensight](https://paragraph.xyz/@tokensightxyz/eigenda-avs-cryptoeconomic-risk-analysis)

<br>

### Authors

**Pavel Yashin** is a researcher at [P2P.org](http://P2P.org).
P2P.org is an industry-leading secure, non-custodial staking platform serving over 90,000 delegators globally. With billions of dollars in staked and restaked digital assets, the company ranks among the world's largest Ethereum validators.

- [Twitter](https://x.com/PaulYa5hin)
- [LinkedIn](https://www.linkedin.com/in/pavel-yashin-a8bb49258/)

**Bernardo Vicente** is the founder of Tokensight, a leading (re)staking research firm specializing in AVS/Network and LRT protocol security, leveraging autonomous risk agents.

- Twitter: [bvicentes](https://x.com/bvicentes) / [tokensight](https://x.com/tokensightxyz)
- Telegram: benvicc

*This is a living document. If you notice inaccuracies or have suggestions for improvement, please reach out. The restaking ecosystem evolves rapidly, and community input is crucial for keeping risk evaluations accurate and relevant.*

<br>

## Appendix A

***Disclaimer:** This assessment is intended to illustrate how the framework and methodology function in practice and was conducted thoughtfully and in good faith. To reiterate, the selection of networks/services was neither biased nor sponsored by any party. While the evaluation is as comprehensive as possible, we acknowledge that certain elements may be absent. The evaluations below were performed as of early February 2025; protocol upgrades, updates, and newly provided feedback will be considered in future iterations of this research.*

In this section, we proceed with detailed assessments of each network, within the EigenLayer protocol, including technical scoring of the $m_i$ and $c_i$ values, based on technical research, reference materials, and the calculated results shown in the tables below.

### EigenDA

| **NETWORK METRICS** | **METRIC RISK VALUE (mi)** | **WEIGHTING (wi)** | **CONF. LEVEL SPEC. (ci)** |
| --- | --- | --- | --- |
| Execution Architecture | 3 | 20% | 90% |
| Consensus Design | 2 | 20% | 90% |
| Slashing Conditions | 5 (Default value for N/A) | 20% | 5% |
| Security Audits | 7 | 10% | 80% |
| Code Complexity | 5 | 10% | 50% |
| Maturity Level | 7 | 10% | 70% |
| Reputation Level | 2 | 10% | 80% |

<details>
<summary><strong>Explanation for values assignment</strong></summary>

**Execution**: Very well architected, with bandwidth reserves enabled and node-to-node unicast communication for data ordering (no single leader); weak points are the permissioned disperser and missing TLS (to secure node-to-node communication), DAS (to enable light clients to verify data availability), and threshold cryptography (to prevent single-party control over data attestations). **Consensus**: BFT appropriate; low node requirements foster decentralization; still missing DVT and TEE. Medium code complexity. Medium maturity (mainnet live 6–12 months). Extremely good reputation. [1 audit performed](https://github.com/Layr-Labs/eigenda/tree/master/docs/audits).

*Resources*:
- https://u--1.com/avs/0x870679e138bdcf293b7f14dd44b70fc97e12fc0
- https://docs.eigenda.xyz/
- https://github.com/Layr-Labs/eigenda
</details>

| **Risk Score by Sensitivity** | **Value** |
| --- | --- |
| **R, S=1** | 4.10 |
| **R, S=2** | **5.63** |
| **R, S=3** | 6.10 |

---

### eOracle

| **NETWORK METRICS** | **METRIC RISK VALUE (mi)** | **WEIGHTING (wi)** | **CONF. LEVEL SPEC. (ci)** |
| --- | --- | --- | --- |
| Execution Architecture | 3 | 20% | 90% |
| Consensus Design | 3 | 20% | 90% |
| Slashing Conditions | 5 (Default value for N/A) | 20% | 5% |
| Security Audits | 1 | 10% | 80% |
| Code Complexity | 7 | 10% | 90% |
| Maturity Level | 7 | 10% | 70% |
| Reputation Level | 3 | 10% | 70% |

<details>
<summary><strong>Explanation for values assignment</strong></summary>

**Execution**: Very good execution design, with usage of third-party data providers and threshold BLS signatures; possibly missing TLS/zkTLS (for secure, encrypted communication between parties), MPC (to eliminate single points of failure in execution), and FHE (to allow computations to be performed on encrypted data). **Consensus**: Adequate BFT (Tendermint) + PoS mechanism. Highly complex codebase by necessity (not by negligence), due to a custom blockchain, extensive LoC, etc. Medium maturity (mainnet live 6–12 months). Very good reputation. [8 audits performed by 4 different entities](https://github.com/Eoracle).

*Resources*:
- https://u--1.com/avs/0x23221c5bb90c7c57ecc1e75513e2e4257673f0ef
- https://docs.eoracle.io/docs
- https://github.com/Eoracle
</details>

| **Risk Score by Sensitivity** | **Value** |
| --- | --- |
| **R, S=1** | 4.00 |
| **R, S=2** | **5.78** |
| **R, S=3** | 6.43 |

---

### Lagrange ZK Coprocessor

| **NETWORK METRICS** | **METRIC RISK VALUE (mi)** | **WEIGHTING (wi)** | **CONF. LEVEL SPEC. (ci)** |
| --- | --- | --- | --- |
| Execution Architecture | 2 | 20% | 90% |
| Consensus Design | 2 | 20% | 90% |
| Slashing Conditions | 5 (Default value for N/A) | 20% | 5% |
| Security Audits | 4 | 10% | 80% |
| Code Complexity | 5 | 10% | 50% |
| Maturity Level | 7 | 10% | 70% |
| Reputation Level | 3 | 10% | 70% |

<details>
<summary><strong>Explanation for values assignment</strong></summary>

**Execution**: Very solid profile, with decentralized sequencing through the DARA mechanism. The use of zkSNARKs requires a trusted setup (zkSTARKs would not). **Consensus**: A pure zero-knowledge system like Lagrange is secure against incorrect computations unless the circuit logic breaks. A Tendermint-like validator rotation feature could improve protection against liveness failures. Medium code complexity. Medium maturity (mainnet live 6–12 months). Very good reputation. [4 audits by the same firm](https://github.com/Lagrange-Labs/lsc-contracts/tree/develop/audits). Would most likely only require intersubjective slashing rules.

*Resources*:
- https://u--1.com/avs/0x22cac0e6a1465f043428e8aef737b3cb09d0eeda
- https://docs.lagrange.dev/
- https://lagrange.dev/blog
- https://github.com/Lagrange-Labs
</details>

| **Risk Score by Sensitivity** | **Value** |
| --- | --- |
| **R, S=1** | 3.70 |
| **R, S=2** | **5.37** |
| **R, S=3** | 5.82 |

---

### Ungate InfiniRoute

| **NETWORK METRICS** | **METRIC RISK VALUE (mi)** | **WEIGHTING (wi)** | **CONF. LEVEL SPEC. (ci)** |
| --- | --- | --- | --- |
| Execution Architecture | 4 | 20% | 90% |
| Consensus Design | 4 | 20% | 90% |
| Slashing Conditions | 5 (Default value for N/A) | 20% | 5% |
| Security Audits | 8 | 10% | 50% |
| Code Complexity | 3 | 10% | 60% |
| Maturity Level | 7 | 10% | 80% |
| Reputation Level | 4 | 10% | 80% |

<details>
<summary><strong>Explanation for values assignment</strong></summary>

**Execution**: Well-designed ML-reinforced routing system. Operator attestations are published to L1 via Merkle proofs (an L2 integration is worth considering for cost efficiency and reduced latency). AI model selections are benchmarked against "accuracy parameters"—unclear criteria.
**Consensus**: OBLS (Othentic BLS): BLS-aggregated signatures, stake-weighted control (with max power limits). At scale, minor centralization concerns may persist (a Tendermint/CometBFT-style leader rotation feature is worth considering). Missing DVT (for improved validator decentralization) and TEE (for secure execution even under validator compromise). Low code complexity. Medium-low maturity (mainnet < 6 months). Good reputation. No audits performed, relying solely on Othentic's audited codebase.

*Resources*:
- https://ungate.gitbook.io/ungate-infiniroute-avs
- https://github.com/Ungate-Ai
</details>

| **Risk Score by Sensitivity** | **Value** |
| --- | --- |
| **R, S=1** | 4.80 |
| **R, S=2** | **6.26** |
| **R, S=3** | 6.80 |

---

### Witness Chain

| **NETWORK METRICS** | **METRIC RISK VALUE (mi)** | **WEIGHTING (wi)** | **CONF. LEVEL SPEC. (ci)** |
| --- | --- | --- | --- |
| Execution Architecture | 4 | 20% | 90% |
| Consensus Design | 2 | 20% | 90% |
| Slashing Conditions | 5 (Default value for N/A) | 20% | 5% |
| Security Audits | 4 | 10% | 80% |
| Code Complexity | 6 | 10% | 50% |
| Maturity Level | 7 | 10% | 50% |
| Reputation Level | 3 | 10% | 70% |

<details>
<summary><strong>Explanation for values assignment</strong></summary>

**Execution**: Well-architected proving system, though with a centralized aggregator. Missing TLS (to prevent MitM attacks on UDP pings), threshold cryptography (to secure shared delay reference data), and MPC (to protect watchtower latency measurement data). **Consensus**: ZK proof verification-based, with a stake-weighted design—raising centralization concerns. A Tendermint-like validator rotation feature could help mitigate liveness risks. Missing DVT. Medium code complexity. Low maturity (mainnet < 6 months). Very good reputation. [2 audits by 2 firms](https://github.com/witnesschain-com/diligencewatchtower-contracts/tree/main/audits).

*Resources*:
- https://docs.witnesschain.com/
- https://github.com/witnesschain-com
</details>

| **Risk Score by Sensitivity** | **Value** |
| --- | --- |
| **R, S=1** | 4.17 |
| **R, S=2** | **5.68** |
| **R, S=3** | 6.00 |

---

### Hyperlane

| **NETWORK METRICS** | **METRIC RISK VALUE (mi)** | **WEIGHTING (wi)** | **CONF. LEVEL SPEC. (ci)** |
| --- | --- | --- | --- |
| Execution Architecture | 4 | 20% | 90% |
| Consensus Design | 5 | 20% | 90% |
| Slashing Conditions | 5 (Default value for N/A) | 20% | 5% |
| Security Audits | 7 | 10% | 80% |
| Code Complexity | 7 | 10% | 50% |
| Maturity Level | 5 | 10% | 70% |
| Reputation Level | 4 | 10% | 60% |

<details>
<summary><strong>Explanation for values assignment</strong></summary>

**Execution**: Good execution profile, with a permissionless relayer; missing MPC (for validator key security in the multisig ISM), TLS (to protect relayer communication), and BLS threshold cryptography (useful for multisig ISM validator signature aggregation and checkpoint attestations). **Consensus**: Modular consensus mechanism via ISMs. PoS or basic BFT may bottleneck liveness/throughput in an interoperability service; CometBFT would be more suitable. Missing DVT and TEE (introducing some collusion risk). Medium-high code complexity (due to the modular architecture). Good maturity (mainnet live 12–18 months). Good reputation. [1 audit performed](https://github.com/oak-security/audit-reports/tree/main/Hyperlane).
*Resources*:
- https://docs.hyperlane.xyz/
- https://github.com/hyperlane-xyz
</details>

| **Risk Score by Sensitivity** | **Value** |
| --- | --- |
| **R, S=1** | 5.08 |
| **R, S=2** | **6.16** |
| **R, S=3** | 6.51 |
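As a quick cross-check, any of the tables above can be replayed through the `network_risk` sketch from the Formulas section. For example, EigenDA's values at $S=1$:

```python
# EigenDA inputs from the table above, in the order: Execution Architecture,
# Consensus Design, Slashing Conditions, Security Audits, Code Complexity,
# Maturity Level, Reputation Level. Requires network_risk() defined earlier.
m = [3, 2, 5, 7, 5, 7, 2]
w = [0.20, 0.20, 0.20, 0.10, 0.10, 0.10, 0.10]
c = [0.90, 0.90, 0.05, 0.80, 0.50, 0.70, 0.80]

print(round(network_risk(m, w, c, S=1), 2))  # 4.1, matching the S=1 row
```

Under the tiering introduced earlier, the S=1 figure falls into the Blue Chip range, while the baseline S=2 score of 5.63 lands in the moderately risky band, illustrating how the sensitivity parameter shifts interpretation.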