---
tags: LIP, Oracle, ZK
status: draft
author: Alexey Potapkin, Greg Shestakov, Eugene Pshenichniy, Eugene Mamin, George Avsetsyn
---

# LIP-21: trustless safety net for updating the protocol TVL

## 🚧 WORK IN PROGRESS 🚧

## Abstract

This proposal integrates external trustless ZK-based Beacon Chain oracles as an extra safety measure for the `stETH` token rebase within the Lido protocol. It includes:

- Required changes for the protocol to add these new checks on rebase
- 2-of-3 Multiprover to improve security and reliability
- Tiebreaker quasi-oracle to speed up the start
- Emergency lever to prevent DoS for the integration period (1st year)

## Motivation

The `AccountingOracle` contract is a fundamental component of the Lido protocol that delivers, among other data, the aggregate of all Lido validators' Beacon Chain balances (`clBalance` in the report) to the protocol, thereby facilitating the daily rebase of the stETH token. The accuracy of this value is critical to the integrity of the protocol, and it relies on a committee of independently operated Oracle daemons in a 5-of-9 configuration. The protocol could be harmed if this committee is compromised, malfunctions, or colludes. This risk is acknowledged and constrained by a sanity cap that restricts the discrepancy in balance that the Oracle can report.

However, with the anticipated implementation of [EIP-4788: Beacon block root](https://eips.ethereum.org/EIPS/eip-4788) by Ethereum, it becomes possible to deliver the validators' balances trustlessly. This eliminates the risk of protocol TVL manipulation by the Oracle committee, reduces the amount of trust required, and opens the road to permissionless unmanned protocol operation.

Three proposals of trustless Oracle implementations have already been posted on the Lido research forum:

- [[ZKLLVM] Trustless ZK-proof TVL oracle](https://research.lido.fi/t/zkllvm-trustless-zk-proof-tvl-oracle/5028)
- [DendrETH: A trustless oracle for liquid staking protocols](https://research.lido.fi/t/dendreth-a-trustless-oracle-for-liquid-staking-protocols/5136)
- [ZK Lido Oracle powered by Succinct](https://research.lido.fi/t/zk-lido-oracle-powered-by-succinct/5747)

However, since the proposed solutions are MVPs that do not deliver a comprehensive accounting report, and ZK technology is still at the bleeding edge of cryptography and may contain bugs and vulnerabilities, the present proposal focuses on integrating these oracles first as a sanity check for the existing accounting report, with proper paranoid precautions.

## Specification

In the first stage, only Beacon Chain values (`clBalanceGwei`, `numValidators`, and `exitedValidators`) will be delivered by ZK-Oracles, as they are the most crucial for protocol security. Later, other report values can be added as ZK-Oracles extend their functionality.

The basic flow includes:

- a new check injection point in `AccountingOracle`'s `submitReportData` function
- a new sanity check in `OracleReportSanityChecker` that verifies that the ZK Oracle report values match the values from the original Oracle report
- a `Multiprover` contract responsible for the aggregation of different ZK reports

```mermaid
flowchart LR;
    A[Accounting Oracle] --> S[Sanity Check]
    S --> M[Multiprover]
    M --> Z1[ZK Oracle 1]
    M --> Z2[ZK Oracle 2]
    M --> Z3[ZK Oracle 3]
```

### ZK Oracles

The proposed solution relies on several ZK-Oracle implementations that can be plugged in and out by the DAO. Each is represented by a corresponding on-chain contract that implements the following interface:

```solidity=
interface LidoZKOracle {
    function getReport(uint256 refSlot) external view returns (
        bool success,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    );
}
```

This allows the ZK Oracle contract to be used for:

- learning whether a report for the slot has been submitted to the contract
- retrieving the report values if the report is present

#### Synchronisation

Lido Oracle daemons submit reports based on the blockchain state at a specific reference slot (`refSlot`), within the boundaries defined by the current frame and the `processingDeadlineTime` parameter.

```solidity=
LidoLocator locator = LidoLocator(locatorAddress);
// the locator returns a plain address, so it is cast to the contract type
AccountingOracle oracle = AccountingOracle(locator.accountingOracle());
HashConsensus hashConsensus = HashConsensus(oracle.getConsensusContract());

// refSlot, the beginning of the frame
(uint256 refSlot,) = hashConsensus.getCurrentFrame();
(,uint256 epochsPerFrame,) = hashConsensus.getFrameConfig();
(uint256 slotsPerEpoch,,) = hashConsensus.getChainConfig();

// refSlot, the beginning of the next frame
uint256 nextRefSlot = refSlot + epochsPerFrame * slotsPerEpoch;
```

> NOTE: If the `refSlot` is missed, the Oracle should return values proved against the last available non-missed slot but still accept the original `refSlot` as an argument.

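To make the interface contract and the missed-slot note more concrete, here is a minimal sketch of how a ZK oracle contract could store one proved report per `refSlot`. Everything except `LidoZKOracle` is hypothetical: the names `StoredReportZKOracle`, `ProvedValues`, `submitReport`, `_verifyProof` and the custom errors are illustrative assumptions, and the actual proof verification is left abstract because it depends on the ZK stack.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.9;

interface LidoZKOracle {
    function getReport(uint256 refSlot) external view returns (
        bool success,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    );
}

/// @notice Hypothetical sketch, not part of the proposal: keeps one proved report per refSlot.
abstract contract StoredReportZKOracle is LidoZKOracle {
    /// @dev Hypothetical container for the proved values.
    struct ProvedValues {
        uint256 provedSlot; // slot the proof was built against; earlier than refSlot if refSlot was missed
        uint256 clBalanceGwei;
        uint256 numValidators;
        uint256 exitedValidators;
    }

    error ReportAlreadySubmitted(uint256 refSlot);
    error ProvedSlotAfterRefSlot(uint256 refSlot, uint256 provedSlot);
    error InvalidProof();

    mapping(uint256 => ProvedValues) private _reports;
    mapping(uint256 => bool) private _submitted;

    /// @dev Placeholder for the stack-specific ZK verification (an assumption of this sketch).
    function _verifyProof(ProvedValues calldata values, bytes calldata proof)
        internal
        view
        virtual
        returns (bool);

    /// @notice Stores a report under the original refSlot, even if it was proved against an earlier slot.
    function submitReport(uint256 refSlot, ProvedValues calldata values, bytes calldata proof) external {
        if (_submitted[refSlot]) revert ReportAlreadySubmitted(refSlot);
        if (values.provedSlot > refSlot) revert ProvedSlotAfterRefSlot(refSlot, values.provedSlot);
        if (!_verifyProof(values, proof)) revert InvalidProof();

        ProvedValues storage stored = _reports[refSlot];
        stored.provedSlot = values.provedSlot;
        stored.clBalanceGwei = values.clBalanceGwei;
        stored.numValidators = values.numValidators;
        stored.exitedValidators = values.exitedValidators;
        _submitted[refSlot] = true;
    }

    /// @notice Returns success == false (and zero values) until a report for refSlot is submitted.
    function getReport(uint256 refSlot) external view override returns (
        bool success,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    ) {
        ProvedValues storage stored = _reports[refSlot];
        return (_submitted[refSlot], stored.clBalanceGwei, stored.numValidators, stored.exitedValidators);
    }
}
```

In this sketch a consumer never needs to know which slot was actually proved: it always queries by the frame's `refSlot` and treats `success == false` as "not delivered yet".
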
### Multiprover

Because ZK-crypto is still immature and based on very intricate cryptography, it is reasonable to add some redundancy to the system by having several ZK-oracles, each based on a different technical stack, reporting in parallel. If a bug or vulnerability exists in one implementation, it is unlikely to be repeated the same way on another ZK stack.

At the same time, each oracle includes an off-chain part that is responsible for creating the proof and submitting it on-chain. This part can hardly achieve the reliability level of the Ethereum blockchain, so it can fail to deliver the proof at all. The Multiprover in a 2-out-of-3 configuration can mitigate this issue as well.

The downside is the cost of the approach, which roughly doubles the protocol spending on oracles. Still, the importance of security for this critical part of the protocol, considering the overall size of the protocol TVL and the possibilities that ZK-tech opens up, may justify these costs.

So, it is proposed to build a "decorator" contract with the same interface as each separate oracle (`LidoZKOracle`), responsible for aggregating ZK reports from three different oracle contracts. It requires at least two identical reports to be submitted to return the report successfully and must revert with an error if it fails to reach consensus.

```solidity=
error NoConsensus();

interface Multiprover is LidoZKOracle { }
```

#### Tiebreaker

It is possible to start a multiprover with an intermediary configuration, where one of the oracles is replaced with a quasi-oracle tiebreaker controlled by the protocol emergency committee multisig.

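The 2-of-3 aggregation described in the Multiprover section could be sketched roughly as follows. This is an illustration under assumptions, not the proposed implementation: the contract name `TwoOfThreeMultiprover`, the `Report` struct, and the helpers `_fetch` and `_equal` are hypothetical, and treating a reverting oracle as "report not delivered" is a design choice made only for this sketch.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.9;

interface LidoZKOracle {
    function getReport(uint256 refSlot) external view returns (
        bool success,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    );
}

/// @notice Hypothetical sketch of a 2-of-3 aggregator exposing the same interface as a single oracle.
contract TwoOfThreeMultiprover is LidoZKOracle {
    error NoConsensus();

    /// @dev Hypothetical in-memory view of one oracle's answer.
    struct Report {
        bool success;
        uint256 clBalanceGwei;
        uint256 numValidators;
        uint256 exitedValidators;
    }

    LidoZKOracle[3] public oracles;

    constructor(LidoZKOracle[3] memory oracles_) {
        for (uint256 i = 0; i < 3; ++i) {
            oracles[i] = oracles_[i];
        }
    }

    /// @notice Returns the values shared by at least two oracles, or reverts with NoConsensus.
    function getReport(uint256 refSlot) external view override returns (
        bool success,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    ) {
        Report memory a = _fetch(oracles[0], refSlot);
        Report memory b = _fetch(oracles[1], refSlot);
        Report memory c = _fetch(oracles[2], refSlot);

        Report memory agreed;
        if (_equal(a, b) || _equal(a, c)) {
            agreed = a;
        } else if (_equal(b, c)) {
            agreed = b;
        } else {
            revert NoConsensus();
        }
        return (true, agreed.clBalanceGwei, agreed.numValidators, agreed.exitedValidators);
    }

    /// @dev A reverting oracle is treated as "no report" (assumption of this sketch).
    function _fetch(LidoZKOracle oracle, uint256 refSlot) private view returns (Report memory r) {
        try oracle.getReport(refSlot) returns (bool ok, uint256 cl, uint256 nv, uint256 ev) {
            r = Report(ok, cl, nv, ev);
        } catch {
            // r.success stays false
        }
    }

    /// @dev Two reports agree only if both were delivered and all three values are identical.
    function _equal(Report memory x, Report memory y) private pure returns (bool) {
        return x.success && y.success
            && x.clBalanceGwei == y.clBalanceGwei
            && x.numValidators == y.numValidators
            && x.exitedValidators == y.exitedValidators;
    }
}
```

Under the same assumptions, the tiebreaker would simply occupy one of the three `oracles` slots as an ordinary `LidoZKOracle` whose report is set by the emergency committee multisig, which is what the intermediary configuration amounts to.
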
This configuration allows us to:

- launch faster and connect more ZK oracles later
- still have increased reliability
- keep the influence of the emergency multisig on the Oracle report very limited

### Sanity check

A new function is introduced in the `OracleReportSanityChecker` contract. It MUST be called during the processing of the `AccountingOracle` report, and it reverts if the report values do not match the ones provided by ZK-oracles or if there is no aggregated ZK-proved report available in `Multiprover`.

> IMPORTANT: we assume that the `StakingRouter` state is updated earlier in the same transaction, so that `exitedValidators` can be checked properly.

```solidity=
error ZKReportIsNotReady();

function checkAccountingReportZKP(ReportData calldata) external view;
```

The logic of this check can be illustrated with the flowchart:

```mermaid
flowchart LR
    A[start] --> B{Is proof delivered?}
    B -->|Yes| C{Does it match?}
    B -->|No| M[revert Missed]
    C -->|No| R[revert Mismatch]
    C -->|Yes| S[return]
    style R stroke:#f00
    style M stroke:#f00
    style S stroke:#0f0
```

#### Auto resettable fuse

During the integration period, risks from using the novel approach are anticipated, and the following precautions are reasonable to have:

- An integration period of one year
- Possibility to pause the ZK-check until it is fixed, to avoid multiple missed Oracle reports
- Possibility to resume the check without DAO intervention
- No way to turn off the check without a reason, to prevent abuse

The chosen approach:

1. **Fuse**: the emergency committee multisig can disable the check only if the previous Oracle report was missed
2. **Auto-reset**: if `Multiprover` has consensus and the ZK values match the reported ones during three consecutive Oracle reports, the check is automatically re-enabled
3. **Expiry period**: the fuse should be automatically disarmed after a year

#### Matching

We know from the proposals on the forum that the inclusion criterion for a validator to be considered as belonging to Lido may result in an error, given that anyone can spawn a validator with Lido's withdrawal credentials. Hence, ZK-proved values can only serve as upper limits at this stage.

However, since such an attack costs ether, we can assume that for `clBalance` there is a lower boundary defined by the economic viability of the attack. So, there MUST be a parameter for the sanity check, `maxClBalanceError`, that defines the tolerance level for `clBalance` matching. Other values are only validated against the respective upper limits.

```solidity=
error ClBalanceMismatch(uint256 reportedValue, uint256 provedValue);
error NumValidatorsMismatch(uint256 reportedValue, uint256 provedValue);
error ExitedValidatorsMismatch(uint256 reportedValue, uint256 provedValue);
```

The `exitedValidators` value is compared with the sum of the exited validators in each module, which can be retrieved from the `StakingRouter`.

## Rationale

### Why matching is not precise from the start

From our research, adding a precise inclusion proof for Lido validators will require substantial modifications to the protocol's public key storage system or building a ZKP-based indexer. Considering that:

- It requires time (minimal estimation is 3 months)
- `clBalance` delivery is a very important part to secure
- a rough implementation already decreases the possible impact significantly

we decided to go forward without a public key index.

### Why so many precautions?

As you probably noticed, the proposal includes several ways to prevent this check from malfunctioning:

- three distributed ZK providers for reliability
- three different implementations for bug/vulnerability avoidance
- emergency fuse
- reserve infrastructure for running provers

These decisions stem from the following considerations:

- the risk of unknowns is moderate/high
- unknown tech stacks, which are novel and complex
- lack of expertise among core Lido contributors
- amount of value at risk

So, the main principle of this design is: don't make things worse. The design makes it possible to roll back to the previous security model if there are any troubles. It still improves security because it makes an attack more difficult. It also allows us to add an experimental but very promising security module.

### Why Fuse, not Gateseal?

We already have a great, audited, and stylish emergency stopper, [Gateseal](https://github.com/lidofinance/gate-seals), but in our case, we need a slightly different setup:

- gate seal is one-time, but the auto-resettable fuse is not
- gate seal resumes after some time, the fuse after some conditions are met
- both expire after some time

So, the fuse allows us to have an experimental module that can fail, but if it restores its functions, it will be resumed without heavy governance operations. To avoid abuse by the fuse committee, the precondition (a missed oracle report) must be met before the fuse can be triggered; such a miss is already a serious incident and should be avoided at all costs.

## Backward compatibility

The Oracle daemon MUST work as intended without modification and without explicit knowledge of ZK sanity checks. It will be able to utilize fastlane mechanics and reach consensus but will fail to submit report data until the multiprover is ready. So, it will retry and finally succeed after all the proofs have been delivered. However, it can be optimized to avoid this polling loop and reduce resource utilization.

## Security consideration

ZK Oracle does not entirely save us from Oracle committee corruption (which could be caused by code mistakes, supply chain attacks, collusion, etc.). It reduces the effect of slashing concealment but adds a new DoS attack vector.

### DoS

It is possible to stop Oracle reports by adding new validators with Lido's withdrawal credentials. In this case, the `clBalance` reported by the Oracle and by the ZK Oracle would not match, which might cause a `ClBalanceMismatch` error and stop Oracle reports.

For now, this attack is mitigated by the fuse mechanism; later, the mitigation will rely on exact matching of ZK and regular Oracle reports.

The following attacks work under the assumption that the Oracle committee is corrupted:

### TVL attack

Because the provided values are not precise, the risk of a TVL manipulation attack is not mitigated completely, but the impact of such an attack is reduced.

A malicious actor can report an incorrect `clBalance`. Until this proposal is implemented, such manipulation is limited by sanity checks, which are different for a `clBalance` increase and decrease.

```
maxMaliciousDecrease = oneOffCLBalanceDecreaseBPLimit = -5%
maxMaliciousIncrease = maxAnnualIncrease / 365 = 10% / 365 ~= 0.027%
```

The sanity check would remain the same for an increase, but for a decrease, we can add the effect of the `maxClBalanceError` parameter.

Potential attack scenario: a malicious actor decides to short LDO and stETH and stops all withdrawals (while the oracle still sends reports). To make prices drop even more, the actor starts to artificially lower `clBalance` to cause a negative rebase and liquidate positions on lending markets.

A malicious actor cannot lower `clBalance` by more than 5% per report due to the sanity check, so, assuming governance takes 4 days to react, with the current parameter it is possible to lower the initial `clBalance` by 18.5%, which could cause a lot of liquidations on lending markets. It is possible to take loans on stETH with up to a 97% LTV ratio (AAVE e-mode).

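The 18.5% figure is consistent with compounding the 5% cap over four consecutive reports. This is a reconstruction under two assumptions that are not stated explicitly above: one report per day during the 4-day governance window, and the cap being applied each time to the balance left after the previous report.

$$
1 - (1 - 0.05)^{4} = 1 - 0.8145 \approx 0.185
$$
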
With `maxClBalanceError` equal to **0.749%**, users can be saved from liquidations on lending markets caused by lowering `clBalance` (0.749% * 4 days of governance = 2.996%, which stays below the 3% margin left by a 97% LTV), while it is still possible to face liquidations due to a price drop.

### Slashing concealment

In case of massive slashing, sophisticated actors may prolong turbo mode and withdraw all their funds to avoid slashing consequences. ZK Oracle can reduce the effect of such an attack in a doomsday scenario.

The idea is that even with corrupted oracles, a `clBalance` decrease can trigger a sanity check. Once the decrease is more than 5.75% (sanity check + `maxClBalanceError`), in any case, the oracle report will be reverted and withdrawals will stop:

- If the regular Oracle provides the correct `clBalance`, a sanity check will be triggered
- If the regular Oracle provides an artificial `clBalance`, a `ClBalanceMismatch` error will be triggered

With the current sanity check, even if all Lido validators were slashed, the `clBalance` decrease would still be less than 5%, but a decrease from midterm slashing could reach such a percentage. Thus, there will be only 18 days when withdrawals could be artificially prolonged; after that, a sanity check or a `ClBalanceMismatch` error would revert all reports.

### Conclusion

We propose to set `maxClBalanceError` at **0.749%**:

- This value does not allow liquidations to be triggered artificially on the DEX market by TVL manipulation
- In case of a DoS attack, it is high enough to be sure that such an amount of ETH added to Lido withdrawal credentials is a malicious action rather than a human error

## Links

- [[ZKLLVM] Trustless ZK-proof TVL oracle](https://research.lido.fi/t/zkllvm-trustless-zk-proof-tvl-oracle/5028)
- [DendrETH: A trustless oracle for liquid staking protocols](https://research.lido.fi/t/dendreth-a-trustless-oracle-for-liquid-staking-protocols/5136)
- [ZK Lido Oracle powered by Succinct](https://research.lido.fi/t/zk-lido-oracle-powered-by-succinct/5747)
- https://docs.lido.fi/contracts/lido#oracle-report
