# Runtime Monitoring as DeFi Security Solutions

**tl;dr**

- Cryptocurrency's promise of a decentralized financial future is challenged by losses that reached at least $3.24 billion from 2018 to 2022. This sobering reality underscores the urgency of improving security.
- Current approaches to crypto security mostly concentrate on ex-ante solutions. While these methods have value, they fall short in various ways. They are often time-consuming, limited to a fixed set of attack vectors, and constrained by scalability and coverage complexities. The inherent challenge is that protocols must operate without error, while attackers need only one successful exploit.
- A new breed of method, runtime monitoring, shifts the focus from months of defense preparation to preemptive prevention, enabling responses within seconds. By tracking fund flow, identifying malevolent contract deployments, and monitoring live state updates, runtime monitoring offers a more dynamic and comprehensive approach to DeFi security.
- However, the adoption of runtime monitoring must be approached with caution. While it offers promising solutions, its centralized nature and the infancy of the technology could lead to transaction censorship. Striking a proper balance is essential: this type of service enhances security but could come with downsides such as censorship.

:::info
*"One vision for crypto is, like, 'This is a computer game, and if I can figure out a way to win the computer game that the designers did not intend, then I am cool and clever. Also, the game pays money.' Another vision is, like, 'We are working to build a new financial system and a new vision of the web.'" - Matt Levine*
:::

In crypto security, the stakes are high.
According to the [study](https://arxiv.org/pdf/2208.13035.pdf), users, liquidity providers, speculators, and protocol operators suffered a total loss of at least $3.24 billion from Apr 30, 2018 to Apr 30, 2022. To put it into perspective, that's roughly one-ninth of the [circulating supply of USDC](https://coinmarketcap.com/currencies/usd-coin/) (~$27 billion) and more than half of [the entire cash reserve of Lithuania](https://data.imf.org/regular.aspx?key=61280813) (~$5.4 billion). That's not just a number; it's a testament to the magnitude of the challenge we as an industry face. This reality underscores the urgency to improve crypto security.

## Lay of the Land - Crypto Security Practices Today

![diagrampng](https://hackmd.io/_uploads/HyoOSJhPT.png)

:::info
Ex-ante Solution: Security measures implemented prior to contract deployment, designed to identify and mitigate potential vulnerabilities during the earliest stages of development.
:::

The prevailing strategy in crypto security today is like fortifying a castle before the enemy arrives, focusing on ex-ante risk management. The idea is that robust defenses may deter attackers. Below, we explore the commonly used measures.

**Smart Contract Audit**

- **What is a Smart Contract Audit?** A smart contract audit is akin to an accounting audit for a pre-IPO company. Before deployment on mainnet, a smart contract typically needs to be audited by one or more smart contract auditors. **[Research](https://arxiv.org/pdf/2208.13035.pdf)** shows that audits can decrease the average probability of an exploit by a factor of four. It's essentially a white-glove service that lets developers consult with domain experts, which is especially valuable for complex logic.
- **Why is a Smart Contract Audit Not Good Enough?** Audits can take time and can be expensive.
To speed things up, some auditors are now using automated tools, such as [Slither](https://github.com/crytic/slither), [Echidna](https://github.com/crytic/echidna), or [Mythril](https://github.com/ConsenSys/mythril), to catch the low-hanging fruit, freeing them up to focus on the trickier issues. Still, a smart contract audit is not a highly scalable solution. Automated tools excel at recognizing known vulnerabilities but fall short in detecting zero-day attacks. Thus, manual audits remain crucial, and efforts to accelerate the process must be balanced carefully against the risks posed by unforeseen attacks.

  More importantly, audits cover only a limited range of attacks. In fact, **[51.5% of the hacked protocols were audited](https://beosin.com/resources/Global_Web3_Security_Report_2022_.pdf)**. For instance, oracle attacks, which constitute more than half of the ecosystem hacks during 2020-2022, cannot be effectively detected by audits, revealing the limitations of relying solely on this method.

**Chaos Engineering**

- **What is Chaos Engineering?** Chaos engineering shifts the paradigm of manual audits, which are effective within the code boundary but struggle with DeFi's composability and external vulnerabilities. It views the protocol in aggregate with external systems, performing stress tests to uncover unforeseen issues.

  Chaos engineering is a discipline in software engineering that tests resilience by intentionally introducing faults, often in production environments. This "fire drill" for smart contracts uncovers and fixes weaknesses before they cause system-wide issues.

  The rise of economic attacks in DeFi, such as **[Mango Markets' $100 million hack](https://www.bloomberg.com/news/articles/2022-10-12/crypto-platform-mango-hit-by-latest-hack-in-digital-asset-sector)**, indicates the need for new solutions. Chaos engineering fills this void by intentionally creating disruptions to identify weak points and fortify the system against real-world attacks.
- **Why is Chaos Engineering Not Good Enough?** While chaos engineering is a powerful tool for improving the resilience of cryptosystems, its scope is limited to economic attacks: it operates largely within the risk engineering layer and leaves the underlying codebase unexamined. Hence, [a joint effort with code audits is often needed](https://quantstamp.com/blog/quantstamp-audits-gauntlets-updates-to-compound-governance-capabilities).

**Crypto Native Insurance**

- **What is Crypto Native Insurance?** Crypto-native insurance covers protocols and DAOs against on-chain hacks. Unlike preventative measures like audits, it shifts focus from code quality to ensuring losses are covered, offering a more streamlined process. However, the broad attack surface in crypto makes complete coverage challenging.
- **Why is Crypto Native Insurance Not Good Enough?** While promising, crypto-native insurance faces pricing challenges. Traditional insurance relies on the law of large numbers, but crypto is at an early stage and lacks a well-defined dataset for rigorous pricing. Consequently, crypto-native insurance can only cover a limited range of attacks.

  Fundamentally, crypto-native insurance has a fatal flaw: one can't insure against a sinking boat if the insurer is also on board. In crypto, if a hack is so monumental that it triggers a system-wide contagion, the insurance issuer might find themselves in the same sinking boat, unable to process the gargantuan claim.

### The Fundamental Problem of DeFi Security

Coverage limitations are a common issue in crypto security's ex-ante risk management, worsened by the complexity of new protocols and unexpected attack vectors. It's like a high-stakes poker game where protocols must be perfect, while opponents can lose repeatedly but only need to win once. Runtime monitoring solutions change this dynamic. They're akin to seeing each player's cards and identifying malicious intent before it's played.
Instead of focusing on defenses, they catch bad actors in action, minimizing preparation and addressing the problem of attack coverage.

## A New Horizon - Runtime Monitoring Solutions

![diagram](https://hackmd.io/_uploads/rkWD812PT.png)

:::info
Runtime Monitoring Solutions: Preventative or mitigative measures that, upon detection of a hack, take effect within minutes or even seconds before, during, or after the attack through proactive actions.
:::

Runtime monitoring solutions shift the focus from months of defense preparation to preemptive prevention, enabling responses within seconds. **[Research](https://arxiv.org/pdf/2208.13035.pdf)** on over 183 smart contract exploits from 2018 to 2022 found that 47.5% of exploited protocols had circuit breaker functionality; further, 56% of the 183 attacks covered were not executed atomically. This provides a vital window for runtime monitoring solutions.

Runtime monitoring solutions comprise two steps:

- Threat detection
- Threat prevention/mitigation

The following sections break down the common approaches to each.

### Threat Detection

Prompt detection of malevolent transactions interacting with a protocol serves as the starting point for proactive defense. Commonly, attacks unfold in a structured three-step process, each step necessitating distinct detection mechanisms based on unique heuristics:

1. Funding
2. Preparation
3. Exploitation

These steps represent key stages in the cycle of an exploit, each offering unique challenges and opportunities for threat detection.

![phases](https://hackmd.io/_uploads/SJBh8y2D6.png)

**Funding Detection**

To execute an attack, an attacker needs funds for gas, trades, or collateral. Due to KYC procedures in centralized exchanges, attackers often use privacy-centric protocols like Tornado Cash. Detecting such activities involves tracking fund flow from mixers to addresses within three interactions (hops) from those previously implicated in incidents.
This can identify potential attacks with moderate certainty. However, fund movement alone is rarely a standalone, reliable attack indicator. It's most effective when combined with subsequent detection techniques. Sole reliance on funding detection could lower the signal-to-noise ratio, compromising detection effectiveness.

**Exploit Preparation Detection**

In the preparation phase, the attacker, depending on the chosen attack vector, typically either deploys a smart contract (as seen in reentrancy attacks) or secures token approvals from the victim (common in ice phishing attacks). This stage presents a critical "rescue time" window, during which potential attacks can be detected and thwarted before actual exploitation occurs.

![funding](https://hackmd.io/_uploads/r1nAI12wp.png)

An adversary A can deploy a smart contract with transaction txdeploy and then initiate an incident by calling the contract with txfirst. From SoK: Decentralized Finance (DeFi) Attacks

[When contrasted with the "incident time" - the time window following the incident - "rescue time" generally spans a longer duration for most types of incidents](https://arxiv.org/pdf/2208.13035.pdf). This highlights the strategic importance of monitoring and action during the preparation phase of an attack.

![lifecycle](https://hackmd.io/_uploads/rJwXvJhPa.png)

The incident and rescue time frame per incident type. From SoK: Decentralized Finance (DeFi) Attacks

Smart contract bytecode similarity analysis has proven to be a particularly effective method of detection at the preparation stage. As noted in the [study](https://arxiv.org/pdf/2208.13035.pdf), investigators managed to cluster 173 vulnerabilities and 155 adversarial contracts using a robust similarity threshold score (above 80%). Essentially, this means that when a new contract is deployed, bytecode similarity analysis can determine if the new contract is malevolent by cross-referencing it with previous incidents.
This information can potentially enable real-time identification of adversarial smart contracts.

Moreover, additional heuristics such as **[machine learning techniques](https://ieeexplore.ieee.org/document/10130487)** can be used to scrutinize smart contract opcodes for abnormalities in real time. These methods further enhance the proactive defensive capabilities of DeFi security mechanisms during the preparation stage.

**Exploitation Detection**

A significant limitation of ex-ante risk solutions lies in their inability to address attack vectors that stem from protocol composability. To tackle this challenge, exploitation detection simulates and monitors live state updates of critical addresses, marking a shift in approach.

Given that most exploits are executed through a series of transactions, exploitation detection can promptly flag incidents and curtail further damage. This method is particularly effective when used in conjunction with preparation detection.

A prime example of this approach was the [Euler Finance Hack](https://forta.org/blog/detecting-a-197-million-hack-before-exploitation-euler-finance-hack-retrospective/). Forta's detection bots correctly detected the funding and malicious contract deployment 3 minutes before the first attack transaction and correctly identified Euler Finance as the victim. Unfortunately, the Euler Finance team did not react promptly enough to prevent the attack.

**Summary**

Threat detection techniques offer real-time solutions to exploits, reducing reaction time and broadening attack surface coverage. Yet relying solely on single-stage detection can be inaccurate. The process should span the entire lifecycle of an attack vector, from funding to exploitation. Failure to integrate these stages can lead to a low signal-to-noise ratio, with false positives compromising detection efficacy. A holistic approach maximizes the effectiveness of these mechanisms.
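To make the idea of chaining detection stages concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how Forta or the cited study implement detection. The three-hop limit and the 80% similarity threshold come from the text above. The Jaccard measure over byte 4-grams is a simple stand-in for whatever similarity metric a real system uses, and the `Signals` fields, the `outflow_threshold` value, and the "alert"/"watch"/"ignore" labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


def ngrams(code: bytes, n: int = 4) -> Set[bytes]:
    """Return the set of byte n-grams found in a contract's bytecode."""
    return {code[i:i + n] for i in range(len(code) - n + 1)}


def bytecode_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of byte n-grams, between 0.0 and 1.0 (stand-in metric)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


@dataclass
class Signals:
    mixer_hops: Optional[int]  # hops from a mixer withdrawal; None if no link was found
    max_similarity: float      # best match against known adversarial bytecode
    outflow_ratio: float       # simulated outflow divided by protocol balance


def assess(s: Signals, max_hops: int = 3, sim_threshold: float = 0.8,
           outflow_threshold: float = 0.2) -> str:
    """Combine the funding, preparation, and exploitation signals into one verdict."""
    funding = s.mixer_hops is not None and s.mixer_hops <= max_hops
    preparation = s.max_similarity > sim_threshold
    exploitation = s.outflow_ratio >= outflow_threshold  # hypothetical threshold
    stages_hit = sum([funding, preparation, exploitation])
    if stages_hit >= 2:
        return "alert"   # corroborated by more than one stage
    if stages_hit == 1:
        return "watch"   # a single-stage hit is too noisy to act on alone
    return "ignore"


if __name__ == "__main__":
    # A mixer-funded deployer whose contract closely matches a known exploit.
    print(assess(Signals(mixer_hops=2, max_similarity=0.91, outflow_ratio=0.0)))
```

Requiring two corroborating stages before raising an alert is one simple way to trade recall for fewer false positives; a production system would more likely weight the signals than count them.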
### Threat Prevention

Threat prevention techniques include:

1. Front-running exploits
2. Circuit breakers

**Front-running exploits**

Front-running an exploit is in principle the same as the generalized front-running technique used by the bots documented in the article [Ethereum is a Dark Forest](https://www.paradigm.xyz/2020/08/ethereum-is-a-dark-forest). The difference is that the target is not a benign transaction from the mempool, but a malevolent one identified after profiling the potential attacker(s). At a high level, it involves:

1. Detecting an exploit transaction in the public mempool
2. Initiating an identical transaction with a higher gas fee so that the duplicate transaction receives priority processing
3. Sending funds to a safe haven with the intention of returning them to the victim

Depending on the protocol and attack vector, implementation details can vary drastically. Let's walk through an example to illustrate the process.

:::info
**Case Study: [The Anyswap (aka Multichain) Rescue](https://blocksecteam.medium.com/the-race-against-time-and-strategy-about-the-anyswap-rescue-and-things-we-have-learnt-4fe086b186ac)**

#### Detection

BlockSec identified an attack on the Anyswap protocol, targeting a flaw in the **`anySwapOutUnderlyingWithPermit()`** function. Despite Anyswap's alerts, many users couldn't respond in time.

#### Whitehat Rescue

For almost two months, from January 21st to March 11th, 2022, BlockSec secured remaining funds by transferring them to a safe address.

#### Technique Used

The team monitored the accounts that had approved WETH to the vulnerable contract in the mempool. When any WETH was transferred to the vulnerable contract, the team would craft and submit a transaction with a higher gas fee to get ahead of the attackers.

#### Challenges Faced

Several challenges emerged throughout this rescue operation:

- The use of Flashbots (pre-PBS) complicated the gas fee bidding strategy, as attackers, other white hats, and unrelated regular transactions were all using Flashbots, which resulted in an opaque competitive environment.
- A lack of coordination among the white hats led to unintended competition, exacerbating the difficulty of the rescue operation.
:::

These circumstances pose several pertinent questions:

- **Scalability of Front Running:** With high stakes and complex attack vectors, plus the 24/7 nature of blockchain, can front-running scale as a solution? A dedicated team may be needed to respond at any moment, with block builders opting in to provide additional support.
- **Coordination among Different Interest Groups:** Secure coordination among white hats, users, and protocols is challenging. Balancing confidentiality with guidance for judicious action may be difficult.
- **Addressing Attacks That Avoid the Mempool:** Many transactions, especially exploits, bypass the mempool to evade detection. How can front-running work when most go through MEV-Boost?

Given these concerns, it seems likely that front-running exploits may evolve into a white-glove service, tailored to each specific occasion. One potential solution could be a war-room-as-a-service model.

**Circuit Breaker**

The circuit breaker in Solidity lets contract owners temporarily halt and resume contract functions, which is useful during attacks or bug discoveries. The most recent iteration, **[ERC-7265](https://github.com/ethereum/EIPs/pull/7265)**, introduces additional features such as a grace period, giving protocol owners further control over fund outflow. Similar to a traditional financial market's circuit breaker, ERC-7265 facilitates automatic triggering in the event of extreme tail events.
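The automatic-trigger idea can be sketched as an off-chain Python model. This is a simplified illustration of an outflow rate limit, not the ERC-7265 interface or any deployed contract: the class name, the basis-point limit, and the rolling-window rule are all assumptions chosen for the example. Amounts are integers to mirror on-chain token units.

```python
from collections import deque


class OutflowCircuitBreaker:
    """Toy model: pause withdrawals when outflow in a rolling window is too large."""

    def __init__(self, balance: int, max_outflow_bps: int = 3000,
                 window_seconds: int = 3600) -> None:
        self.balance = balance                  # funds currently held by the protocol
        self.max_outflow_bps = max_outflow_bps  # hypothetical limit, 3000 bps = 30%
        self.window_seconds = window_seconds
        self.recent = deque()                   # (timestamp, amount) pairs in the window
        self.paused = False

    def withdraw(self, now: int, amount: int) -> bool:
        """Return True if the withdrawal goes through, False if it is blocked."""
        if self.paused or amount > self.balance:
            return False
        # Forget withdrawals that have aged out of the rolling window.
        while self.recent and self.recent[0][0] <= now - self.window_seconds:
            self.recent.popleft()
        window_outflow = sum(a for _, a in self.recent)
        # Balance at the start of the window = current balance + what already left.
        window_start_balance = self.balance + window_outflow
        if (window_outflow + amount) * 10_000 > self.max_outflow_bps * window_start_balance:
            self.paused = True                  # trip the breaker automatically
            return False
        self.balance -= amount
        self.recent.append((now, amount))
        return True

    def resume(self) -> None:
        """Manual reset by the protocol owner after reviewing the incident."""
        self.paused = False
        self.recent.clear()


if __name__ == "__main__":
    breaker = OutflowCircuitBreaker(balance=1000)
    print(breaker.withdraw(now=0, amount=100))   # within the limit
    print(breaker.withdraw(now=10, amount=150))  # cumulative 25%, still within the limit
    print(breaker.withdraw(now=20, amount=100))  # would reach 35%, trips the breaker
    print(breaker.paused)
```

The manual `resume` step in this model is the kind of privileged control that sits behind the centralization concern raised in this section.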
However, this functionality does raise several critical questions:

- **Scope of the Pause:** The pause function can be applied to an entire protocol or be limited to specific modules, features, pools, or assets. A precise application of the pause function strikes a balance between risk management and minimizing disruptions for legitimate users. But how is the scope of the pause function determined?
- **Risk of Centralization:** Manual activation raises centralization concerns. A malicious project could freeze the contract, as seen in rug pull schemes. Thus, the question of maintaining decentralization while providing security measures remains critical.

## JIT Risk Management in the Context of Transaction Pipeline

The preceding section delved into current methodologies employed to counteract and mitigate exploits in real time. Yet an unanswered question remains: who is in the best position to deliver these services? The following section delves into the role of runtime monitoring solutions in the transaction pipeline after [PBS](https://ethereum.org/en/roadmap/pbs/) and [EIP-4337](https://eips.ethereum.org/EIPS/eip-4337).

![pipe](https://hackmd.io/_uploads/rJGpO12wp.png)

Transaction life cycle after PBS and EIP-4337

The transaction life cycle can be broken down into three key stages:

1. Generation of the transaction
2. Propagation of the transaction
3. Consolidation and validation of the transaction

Each stage presents its own unique opportunities for applying runtime monitoring solutions, helping to counteract specific types of exploits.

### Wallet

The wallet acts as the first defense in the transaction pipeline, intercepting risks at the outset.

Firstly, the wallet possesses a unique ability to detect security threats that may emanate from the front-end application. This includes risks like phishing attacks or **[DNS hijacking](https://en.wikipedia.org/wiki/DNS_hijacking)**.
Secondly, when confronted with deceptive contracts that aim to defraud users, such as rug pulls, the wallet can leverage several metrics, such as source verification, token sellability, creator's contract holding, and liquidity distribution. Wallets could create scoring systems to predict the safety level of an address.

Moreover, wallets can block transactions with sanctioned entities or mixed-fund pools, allowing institutional users to adjust risk factors to meet compliance guidelines and avoid regulatory risks.

**4337 Smart Wallet**

EIP-4337 introduced a new wallet standard based on account abstraction, adding components like the alternative mempool and bundler. These enable user-friendly features previously unattainable with traditional EOA wallets and open the door for just-in-time threat detection.

The alternative mempool, unlike its conventional counterpart, enables customization. It allows for practices such as **[whitelisting](https://hackmd.io/@dancoombs/BJYRz3h8n)** paymaster addresses to circumvent canonical simulation rules. With this adaptability, it's possible to develop an alternative mempool aimed at adding security-related simulation rules, such as simulating token sellability. However, this design essentially creates a private mempool that, without proper balance, could lead to [compromises to censorship resistance and mempool fragmentation](https://alchemy.com/blog/erc-4337-gas-estimation). It's therefore essential that any new implementations of the alternative mempool pass the **[bundler spec test](https://github.com/eth-infinitism/bundler-spec-tests)** to avoid these potential issues.

A further approach is to enable bundle-level simulation. Once the bundler peer-to-peer network is operational, it's expected that there will be more bundlers than alternative mempools, as not all bundlers need to run a mempool.
Users seeking safety features could whitelist their preferred bundlers on their account contract by including a `mapping(address => bool) whitelistedBundler` and adding `require(whitelistedBundler[tx.origin], "Bundler not whitelisted")` in the [`_validateSignature` function](https://github.com/eth-infinitism/account-abstraction/blob/87bcb3e42da80bd7ae3c85edb2b324e2df3aa657/contracts/core/BaseAccount.sol#L70). This strategy helps mitigate mempool fragmentation and lays the groundwork for feature-specific bundlers.

### Node Service Providers

Many users submit transactions through node providers like Infura or Alchemy to avoid running a node. These providers could monitor transactions for potential attack vectors and issue alerts. However, malicious actors may avoid third-party providers, so the best use for node providers might be anti-fraud features like [address risk scores](https://marketplace.quicknode.com/add-on/anti-fraud-apis).

Private RPC endpoints present an alternate pathway for including transactions, bypassing the public mempool. Given the inherently fragmented nature of private mempools as they exist today, implementing security surveillance measures there is less likely to create centralization issues.

### Public and Private Mempools

Transactions in the public mempool are visible, aiding threat detection. By chaining the three attack stages (funding, preparation, and exploitation), false positives can be reduced, and white hats can preemptively thwart threats.

The challenge with relying on the public mempool for detection and prevention is that sophisticated malevolent actors often employ private mempools to initiate attacks. Owing to their private nature, private mempools make threat detection difficult. However, this is not to suggest that private mempools will always serve as an infallible escape for malicious activities.
[Based on post-incident analysis by BlockSec following the Anyswap incidents](https://blocksecteam.medium.com/the-race-against-time-and-strategy-about-the-anyswap-rescue-and-things-we-have-learnt-4fe086b186ac), due to the congestion of private mempools, front-running exploits in the public mempool could potentially be more effective, depending on the circumstances. Therefore, a hybrid approach that employs both public and private mempools to front-run exploits warrants further research.

### Block Builder/Relay

Block builders and relays, the final defense before block inclusion, require transaction simulation by default. In theory, additional simulation rules could be incorporated to identify potentially harmful transactions using the heuristics from the previous section.

If they can achieve a significant market share, block builders and relays could not only detect malevolent transactions but also reject flagged transactions, as they would transactions involving OFAC-sanctioned addresses. This defense can handle even crafty attackers, as they must rely on block builders and relays to submit transactions, even if they evade earlier checkpoints like wallets, node services, or public mempools.

![relays](https://hackmd.io/_uploads/HJlsAClnD6.png)

MEV-Boost market participants, including block builders (>1500 blocks/month), relays, and proposers (left to right). Dec 2023. From https://mevboost.pics/

Relays face challenges in implementing additional features. Canonical relay simulation is computationally intensive, **[causing delays](https://frontier.tech/optimistic-relays-and-where-to-find-them)**, and more rules could exacerbate this. Without a clear monetization model, increased computation would further raise operating costs. In short, the concentration of transactions through a few relays and the risk of false positives could increase the censorship pressure.
Block builders are better positioned to enable security features, with less risk of centralization and the ability to charge for extra security, though censorship risks from **[builder centralization](https://joncharbonneau.substack.com/p/decentralizing-the-builder-role)** should not be ignored.

An experimental approach, despite yielding a less-than-ideal user experience, might involve creating a whitelist of block builders who have integrated safety measures. Transactions interacting with participating protocols could be exclusively channeled through whitelisted block builders. This could be achieved by requiring each user to use a bespoke multi-sig wallet that designates the whitelisted block builder's address as an additional signer, together with a specially designed EIP-1271-style **`isValidSignature`** function. If a block builder spots a transaction that leads to an abnormally large outflow of funds from the protocol, they would be authorized to place a transaction at the top of the block that activates the circuit breaker. If the stakes are high and security is crucial, adopting this approach could make sense.

### Validator

In the study titled **[The Blockchain Imitation Game](https://arxiv.org/pdf/2303.17877.pdf)**, the authors suggest a front-running strategy. This strategy could be adapted by validators to automate a form of whitehat hacking directed at DeFi attacks, replicating these attacks and returning the acquired funds to the victim. However, developing an effective system of coordination and incentives requires additional investigation before such a method can be feasibly implemented.

## Word of Caution

Implementing real-time monitoring solutions involves checkpoints to filter transactions, but this faces the challenge of high false positive rates. If mishandled, an entity with a large market share could exert significant censorship pressure on the network.
To address these issues, it's crucial for industry participants to undertake extensive research and collaboration. The goal should be to strike a balance between the security benefits provided by real-time monitoring solutions and the risk of centralization. This requires effort across technical, operational, and business functions, and a collaborative approach among all teams involved in this sector.

## **Key Takeaways**

- Traditional security measures in the crypto world are constrained by their limitations in scope, effectiveness, and adaptability.
- Runtime monitoring emerges as a promising solution, offering real-time detection and prevention, and a more dynamic approach to DeFi security.
- A comprehensive integration of threat detection stages is vital for maximizing the effectiveness of security mechanisms.
- The urgency to improve crypto security is underscored, with runtime monitoring presented as a forward-thinking approach.
- However, an additional concern lies in the potential centralization and transaction censorship that runtime monitoring might introduce. This emphasizes the need for a thoughtful balance in its implementation, ensuring that the pursuit of security does not compromise the decentralized principles of crypto.