LIP-21: trustless safety net for updating the protocol TVL

🚧 WORK IN PROGRESS 🚧

Abstract

This proposal integrates external trustless ZK-based Beacon Chain oracles as an extra safety measure for the stETH token rebase within the Lido protocol.

It includes:

  • Required changes for the protocol to add these new checks on rebase
  • 2-of-3 Multiprover to improve security and reliability
  • Tiebreaker quasi-oracle to speed up the start
  • Emergency lever to prevent DoS for the integration period (1st year)

Motivation

The AccountingOracle contract is a fundamental component of the Lido protocol that delivers, amongst other data, the aggregate of all Lido validators' Beacon Chain balances (clBalance in the report) to the protocol, thereby facilitating the daily rebase of the stETH token. The accuracy of this value is critical to the integrity of the protocol, and it relies on a committee of independently operated Oracle daemons in a 5-of-9 configuration. The protocol could be harmed if this committee is compromised, malfunctions, or colludes. This risk is acknowledged and constrained by a sanity cap that restricts the possible discrepancy in balance that Oracle can report.

However, with the anticipated implementation of EIP-4788: Beacon block root by Ethereum, it becomes possible to deliver the validators' balances trustlessly. This eliminates the risk of protocol TVL manipulation by the Oracle committee, reduces the amount of trust required, and opens the road to permissionless, unmanned protocol operation. Three proposals of trustless Oracle implementations have already been posted on the Lido research forum.

However, because the proposed solutions are MVPs that do not deliver a comprehensive accounting report, and because ZK technology is still at the bleeding edge of cryptography and may contain bugs and vulnerabilities, the present proposal focuses first on integrating these oracles as a sanity check for the existing accounting report, with suitably paranoid precautions.

Specification

In the first stage, only the Beacon Chain values (clBalanceGwei, numValidators, and exitedValidators) will be delivered by ZK-Oracles, as these are the most crucial for protocol security. Other report values can be added later, when ZK-Oracles extend their functionality.

Basic flow includes:

  • a new point of check injection in AccountingOracle's submitReportData function
  • a new sanity check in OracleReportSanityChecker that checks whether the ZK Oracle report values match the values from the original Oracle report
  • a Multiprover contract responsible for the aggregation of different ZK reports

[Diagram: Accounting Oracle -> Sanity Check -> Multiprover -> ZK Oracle 1, ZK Oracle 2, ZK Oracle 3]

ZK Oracles

The proposed solution relies on several ZK-Oracle implementations that can be plugged in and out by the DAO, each represented by a corresponding on-chain contract that implements the following interface:

    interface LidoZKOracle {
        function getReport(uint256 refSlot) external view returns (
            bool success,
            uint256 clBalanceGwei,
            uint256 numValidators,
            uint256 exitedValidators
        );
    }

which allows the use of the ZK Oracle contract for:

  • learning if a report for the slot is submitted to the contract
  • retrieving the report values if it is present
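Both uses can be sketched with a small consumer contract. This is only an illustration under stated assumptions: `ZKReportReader`, `ReportNotAvailable`, and `clBalanceWeiAt` are hypothetical names that do not appear in the proposal, and only the `LidoZKOracle` interface is taken from the text above.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.9;

interface LidoZKOracle {
    function getReport(uint256 refSlot) external view returns (
        bool success,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    );
}

/// Hypothetical consumer, for illustration only.
contract ZKReportReader {
    /// Hypothetical error, not part of the proposal.
    error ReportNotAvailable(uint256 refSlot);

    LidoZKOracle public immutable zkOracle;

    constructor(LidoZKOracle _zkOracle) {
        zkOracle = _zkOracle;
    }

    /// Use 1: learn whether a report for the slot has been submitted.
    function isReportSubmitted(uint256 refSlot) external view returns (bool) {
        (bool success,,,) = zkOracle.getReport(refSlot);
        return success;
    }

    /// Use 2: retrieve a value if the report is present, revert otherwise.
    function clBalanceWeiAt(uint256 refSlot) external view returns (uint256) {
        (bool success, uint256 clBalanceGwei,,) = zkOracle.getReport(refSlot);
        if (!success) revert ReportNotAvailable(refSlot);
        // The interface reports the balance in gwei; convert to wei.
        return clBalanceGwei * 1 gwei;
    }
}
```

Returning a `success` flag instead of reverting lets a caller distinguish "not delivered yet" from a hard failure without needing `try`/`catch`.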

Synchronisation

Lido Oracle daemons submit reports based on the blockchain state for the specific reference slot (refSlot) in the boundaries defined by the current frame and processingDeadlineTime parameter.

    LidoLocator locator = LidoLocator(locatorAddress);
    AccountingOracle oracle = AccountingOracle(locator.accountingOracle());
    // the oracle exposes its HashConsensus contract as a plain address
    HashConsensus hashConsensus = HashConsensus(oracle.getConsensusContract());

    // refSlot, the beginning of the frame
    (uint256 refSlot,) = hashConsensus.getCurrentFrame();
    (,uint256 epochsPerFrame,) = hashConsensus.getFrameConfig();
    (uint256 slotsPerEpoch,,) = hashConsensus.getChainConfig();

    // refSlot, the beginning of the next frame
    uint256 nextRefSlot = refSlot + epochsPerFrame * slotsPerEpoch;

NOTE: If the refSlot is missed, the Oracle should return values proved against the last available non-missed slot but still accept the original refSlot as an argument.

Multiprover

Because ZK-crypto is still immature and based on very intricate cryptography, it is reasonable to add some redundancy to the system by having several ZK-oracles, each based on a different technical stack, reporting in parallel. If a bug or vulnerability exists in one implementation, it is unlikely to be repeated in the same way on another ZK stack.

At the same time, each oracle includes an off-chain part that is responsible for creating the proof and submitting it on-chain. This part can hardly achieve the reliability level of the Ethereum blockchain, so it can fail to deliver the proof at all. A Multiprover in a 2-out-of-3 configuration mitigates this issue as well, because one missing proof does not block consensus.

The downside is the cost of the approach, which roughly doubles the protocol spending on oracles. Still, the importance of security for this critical part of the protocol, given the overall size of the protocol TVL and the possibilities that ZK tech opens up, may justify these costs.

So, it's proposed to build a "decorator" contract with the same interface as each separate oracle (LidoZKOracle) responsible for aggregating different ZK reports from three different oracle contracts. It requires at least two identical reports to be submitted to return the report successfully and must revert with an error if it fails to reach consensus.

    error NoConsensus();

    interface Multiprover is LidoZKOracle { }
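A minimal sketch of such a decorator is shown below. It is not the proposal's implementation: the contract name `MultiproverSketch`, the `Report` struct, and the helper functions are hypothetical. It also assumes one reading of the text: fewer than two delivered reports yields `success = false` (not ready yet), while two or more delivered reports with no matching pair revert with `NoConsensus`.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.9;

interface LidoZKOracle {
    function getReport(uint256 refSlot) external view returns (
        bool success,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    );
}

/// Illustrative 2-of-3 aggregator exposing the same interface as one oracle.
contract MultiproverSketch is LidoZKOracle {
    error NoConsensus();

    struct Report {
        bool success;
        uint256 clBalanceGwei;
        uint256 numValidators;
        uint256 exitedValidators;
    }

    LidoZKOracle[3] private oracles;

    constructor(LidoZKOracle a, LidoZKOracle b, LidoZKOracle c) {
        oracles[0] = a;
        oracles[1] = b;
        oracles[2] = c;
    }

    function getReport(uint256 refSlot) external view returns (
        bool success,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    ) {
        Report[3] memory reports;
        uint256 delivered;
        for (uint256 i = 0; i < 3; ++i) {
            reports[i] = _fetch(oracles[i], refSlot);
            if (reports[i].success) ++delivered;
        }

        // Assumption: fewer than two delivered reports means "not ready yet".
        if (delivered < 2) return (false, 0, 0, 0);

        // Return the first pair of delivered reports that agree on every field.
        for (uint256 i = 0; i < 2; ++i) {
            for (uint256 j = i + 1; j < 3; ++j) {
                if (_equal(reports[i], reports[j])) {
                    Report memory r = reports[i];
                    return (true, r.clBalanceGwei, r.numValidators, r.exitedValidators);
                }
            }
        }
        revert NoConsensus();
    }

    /// A reverting oracle is treated the same as one that has not delivered.
    function _fetch(LidoZKOracle oracle, uint256 refSlot) private view returns (Report memory rep) {
        try oracle.getReport(refSlot) returns (bool s, uint256 b, uint256 n, uint256 e) {
            rep = Report(s, b, n, e);
        } catch {
            // rep keeps its default value, so rep.success stays false
        }
    }

    function _equal(Report memory x, Report memory y) private pure returns (bool) {
        return x.success && y.success
            && x.clBalanceGwei == y.clBalanceGwei
            && x.numValidators == y.numValidators
            && x.exitedValidators == y.exitedValidators;
    }
}
```

Because the aggregator implements `LidoZKOracle` itself, the sanity check can be pointed either at a single oracle or at the aggregator without any change to the caller.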

Tiebreaker

It's possible to start a multiprover with an intermediary configuration, where one of the oracles is replaced with a quasi-oracle tiebreaker controlled by the protocol emergency committee multisig. It allows us to:

  • launch faster and connect more ZK oracles later
  • still have increased reliability
  • give the emergency multisig only very limited influence over the Oracle report
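One way such a quasi-oracle could look is sketched below. Everything except the `LidoZKOracle` interface is an assumption made for illustration: the names `TiebreakerSketch`, `submitReport`, and `committee`, and the rule that a report for a slot can be submitted only once.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.9;

interface LidoZKOracle {
    function getReport(uint256 refSlot) external view returns (
        bool success,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    );
}

/// Illustrative quasi-oracle whose reports come from a multisig, not a proof.
contract TiebreakerSketch is LidoZKOracle {
    error NotCommittee();
    error AlreadySubmitted(uint256 refSlot);

    struct Report {
        bool submitted;
        uint256 clBalanceGwei;
        uint256 numValidators;
        uint256 exitedValidators;
    }

    /// Hypothetical: address of the emergency committee multisig.
    address public immutable committee;

    mapping(uint256 => Report) private reports;

    constructor(address _committee) {
        committee = _committee;
    }

    /// The committee publishes the values it vouches for at `refSlot`.
    function submitReport(
        uint256 refSlot,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    ) external {
        if (msg.sender != committee) revert NotCommittee();
        if (reports[refSlot].submitted) revert AlreadySubmitted(refSlot);
        reports[refSlot] = Report(true, clBalanceGwei, numValidators, exitedValidators);
    }

    function getReport(uint256 refSlot) external view returns (
        bool success,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    ) {
        Report storage r = reports[refSlot];
        return (r.submitted, r.clBalanceGwei, r.numValidators, r.exitedValidators);
    }
}
```

In a 2-of-3 setup the tiebreaker's report only counts when it is identical to a report from one of the real ZK oracles, which is why the multisig's influence stays limited: it can confirm a proved report but cannot push a value through on its own.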

Sanity check

There is a new function introduced in the OracleReportSanityChecker contract that MUST be called during the processing of the AccountingOracle report and reverts if the report values do not match the ones provided by ZK-oracles or there is no aggregated ZK proved report available in Multiprover.

IMPORTANT: we assume that the StakingRouter state is updated earlier in the same transaction, so that exitedValidators can be checked properly.

    error ZKReportIsNotReady();

    function checkAccountingReportZKP(ReportData calldata) external view;

The logic of this check can be illustrated with the flowchart:

    start -> Is proof delivered?
        No  -> revert Missed
        Yes -> Does it match?
            No  -> revert Mismatch
            Yes -> return

Auto resettable fuse

During the integration period, risks from this novel approach are expected, so the following precautions are reasonable:

  • An integration period of one year
  • Possibility to pause the ZK-check until it's fixed to avoid multiple missing Oracle reports
  • Possibility to resume the check without the DAO intervention
  • No way to turn off the check without a valid reason, to prevent abuse

The chosen approach:

  1. Fuse: the multisig (emergency) committee that could disable the check if the previous Oracle report was missed
  2. Auto-reset: if Multiprover has the consensus and ZK values match with reported ones during three consecutive Oracle reports, the check is automatically re-enabled
  3. Expiry period: the fuse should be automatically disarmed in a year
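The three rules above can be sketched as a small state machine. This is a sketch under several assumptions that the proposal does not spell out: a trusted `reporter` contract (for example the sanity checker) calls `recordFrame` once per frame, "disarmed" means the committee can no longer disable the check after expiry, and all names (`ZKCheckFuseSketch`, `blow`, `recordFrame`, `isCheckEnforced`) are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.9;

/// Illustrative auto-resettable fuse guarding the ZK sanity check.
contract ZKCheckFuseSketch {
    error NotCommittee();
    error NotReporter();
    error FuseExpired();
    error NoMissedReport();

    /// Matching reports in a row needed to re-enable the check.
    uint256 public constant MATCHES_TO_RESET = 3;

    address public immutable committee;       // emergency multisig
    address public immutable reporter;        // hypothetical per-frame caller
    uint256 public immutable expiryTimestamp; // end of the integration period

    bool public blown;                // true while the ZK check is disabled
    bool public previousReportMissed; // precondition for blowing the fuse
    uint256 public consecutiveMatches;

    constructor(address _committee, address _reporter, uint256 lifetime) {
        committee = _committee;
        reporter = _reporter;
        expiryTimestamp = block.timestamp + lifetime;
    }

    /// 1. Fuse: only allowed right after a missed Oracle report.
    function blow() external {
        if (msg.sender != committee) revert NotCommittee();
        if (block.timestamp >= expiryTimestamp) revert FuseExpired();
        if (!previousReportMissed) revert NoMissedReport();
        blown = true;
        consecutiveMatches = 0;
    }

    /// 2. Auto-reset: called once per frame with the outcome of that frame.
    function recordFrame(bool reportDelivered, bool zkMatched) external {
        if (msg.sender != reporter) revert NotReporter();
        previousReportMissed = !reportDelivered;
        if (!blown) return;

        consecutiveMatches = zkMatched ? consecutiveMatches + 1 : 0;
        if (consecutiveMatches >= MATCHES_TO_RESET) {
            blown = false;
            consecutiveMatches = 0;
        }
    }

    /// 3. Expiry: after the integration period the check is always enforced.
    function isCheckEnforced() external view returns (bool) {
        return !blown || block.timestamp >= expiryTimestamp;
    }
}
```

Deploying with `lifetime = 365 days` would correspond to the one-year integration period. A single mismatch resets the counter, so the check only comes back after three clean reports in a row.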

Matching

We know from the proposals on the forum that the inclusion criterion for a validator to be considered belonging to Lido may result in an error, given that anyone can spawn a validator with Lido's withdrawal credentials. Hence, ZK-proved values can only serve as the upper limits at this stage.

However, since such an attack costs ether, we can assume that for clBalance there is a lower boundary defined by the economic viability of the attack.

So, there MUST be a parameter for the sanity check, maxCLBalanceError, that defines the tolerance level for clBalance matching. Other values are only validated against their respective upper limits.

    error CLBalanceMismatch(uint256 reportedValue, uint256 provedValue);
    error NumValidatorsMismatch(uint256 reportedValue, uint256 provedValue);
    error ExitedValidatorsMismatch(uint256 reportedValue, uint256 provedValue);

The exitedValidators value is compared with the sum of the exited validators in each module, which can be retrieved from the StakingRouter.
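Putting the check and the matching rules together, the logic might look roughly like the sketch below. It rests on assumptions: the function takes plain values instead of the real `ReportData` struct, the tolerance is stored in units of 1/100,000 (so 0.749% is 749) and is applied relative to the proved balance, and the contract and function names are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.9;

interface LidoZKOracle {
    function getReport(uint256 refSlot) external view returns (
        bool success,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    );
}

/// Illustrative matching logic; simplified relative to the real checker.
contract ZKSanityCheckSketch {
    error ZKReportIsNotReady();
    error CLBalanceMismatch(uint256 reportedValue, uint256 provedValue);
    error NumValidatorsMismatch(uint256 reportedValue, uint256 provedValue);
    error ExitedValidatorsMismatch(uint256 reportedValue, uint256 provedValue);

    /// Assumption: tolerance is expressed in 1/100_000 units (0.749% == 749).
    uint256 private constant ERROR_DENOMINATOR = 100_000;

    LidoZKOracle public immutable multiprover;
    uint256 public immutable maxCLBalanceError;

    constructor(LidoZKOracle _multiprover, uint256 _maxCLBalanceError) {
        require(_maxCLBalanceError <= ERROR_DENOMINATOR, "tolerance too large");
        multiprover = _multiprover;
        maxCLBalanceError = _maxCLBalanceError;
    }

    /// `exitedValidators` is expected to be the sum over all staking modules.
    function checkReport(
        uint256 refSlot,
        uint256 clBalanceGwei,
        uint256 numValidators,
        uint256 exitedValidators
    ) external view {
        (
            bool success,
            uint256 provedBalance,
            uint256 provedValidators,
            uint256 provedExited
        ) = multiprover.getReport(refSlot);
        if (!success) revert ZKReportIsNotReady();

        // The proved balance is an upper limit; the tolerance sets the floor.
        uint256 minBalance =
            provedBalance - (provedBalance * maxCLBalanceError) / ERROR_DENOMINATOR;
        if (clBalanceGwei > provedBalance || clBalanceGwei < minBalance) {
            revert CLBalanceMismatch(clBalanceGwei, provedBalance);
        }

        // The remaining values are only checked against their upper limits.
        if (numValidators > provedValidators) {
            revert NumValidatorsMismatch(numValidators, provedValidators);
        }
        if (exitedValidators > provedExited) {
            revert ExitedValidatorsMismatch(exitedValidators, provedExited);
        }
    }
}
```

For example, with a proved balance of 1,000,000 gwei and a tolerance of 749, any reported balance from 992,510 to 1,000,000 gwei passes, and anything outside that range reverts with `CLBalanceMismatch`.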

Rationale

Why matching is not precise from the start

From our research, adding a precise inclusion proof for Lido validators would require substantial modifications to the protocol's public key storage system or building a ZKP-based indexer. Considering that:

  • It requires time (minimal estimation is 3 months)
  • clBalance delivery is a very important part to secure
  • even a rough implementation significantly decreases the possible impact

we decided to go forward without a public key index.

Why so many precautions?

As you probably noticed, the proposal includes several ways to prevent this check from malfunctioning:

  • three distributed ZK providers for reliability
  • three different implementations for bug/vulnerability avoidance
  • emergency fuse
  • reserve infrastructure for running provers

These decisions stem from the following considerations:

  • the risk of unknowns is moderate/high.
    • unknown tech stacks, which are novel and complex
    • lack of expertise among core Lido contributors
    • amount of value at risk

So, the main principle of this design was: don't make it worse. The design makes it possible to roll back to the previous security model if anything goes wrong. It is still a security improvement because it makes an attack more difficult, and it allows us to add an experimental but very promising security module.

Why Fuse, not GateSeal?

We already have a great, audited, and stylish emergency stopper, GateSeal, but in our case we need a slightly different setup:

  • GateSeal is one-time, but the auto-resettable fuse is not
  • GateSeal resumes after some time; the fuse resumes after certain conditions are met
  • both expire after some time

So, fuse allows us to have an experimental module that can fail, but if it restores its functions, it will be resumed without heavy governance operations.

To avoid abuse by the fuse committee, a precondition (a missed oracle report) must be met before the fuse can be triggered. A missed report is itself a serious incident that should be avoided at all costs.

Backward compatibility

Oracle daemon MUST work as intended without modification and explicit knowledge about ZK sanity checks. It will be able to utilize fastlane mechanics and reach consensus but will fail to submit report data until the multiprover is ready. So, it will retry and finally succeed after all the proofs have been delivered. However, it can be optimized to avoid this polling loop and reduce resource utilization.

Security consideration

ZK Oracle does not entirely save us from Oracle committee corruption (which could be caused by code mistakes, supply chain attacks, collusion, etc.). It reduces the effect of slashing concealment but adds a new DoS attack vector.

DoS

It is possible to stop Oracle reports by adding new validators to Lido’s withdrawal credentials. In this case, the clBalance reported by Oracle and by ZK Oracle wouldn’t match, which might cause a CLBalanceMismatch error and stop Oracle reports.

For now, this attack is mitigated by the fuse mechanism; later, mitigation will rely on the exact matching of ZK and regular Oracle reports.

The following attacks work under the assumption that the Oracle committee is corrupted:

TVL attack

Because the provided values are not precise, the risk of a TVL manipulation attack is not mitigated completely, but the impact of such an attack is reduced.

A malicious actor can report an incorrect clBalance. Until this proposal is implemented, such manipulation is limited by sanity checks, which are different for clBalance increase and decrease.

maxMaliciousDecrease = oneOffCLBalanceDecreaseBPLimit = -5%
maxMaliciousIncrease = maxAnnualIncrease / 365 = 10% / 365 ~= 0.027%  

The sanity check would remain the same for increases, but for decreases the maxCLBalanceError parameter additionally limits how far the reported value can fall below the ZK-proved value.

Potential attack scenario: a malicious actor decides to short LDO and stETH and stops all withdrawals (while the oracle still sends reports).

To make prices drop even more, such an actor starts to artificially lower CLBalance to cause a negative rebase and liquidate positions on lending markets. A malicious actor cannot lower CLBalance by more than 5% per report due to the sanity check, so assuming governance time is 4 days, with the current parameter it is possible to lower initialCLBalance by 18.5%, which could cause a lot of liquidations on lending markets.

It is possible to take loans on stETH with up to a 97% LTV ratio (AAVE e-mode). With maxCLBalanceError equal to 0.749%, the cumulative decrease over the governance period (0.749% * 4 days = 2.996%) stays below the 3% margin left by that LTV, so users are protected from liquidations on lending markets caused by lowering CLBalance, although liquidations due to a price drop remain possible.
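The two figures in this scenario can be checked by hand. The working below assumes one Oracle report per day over the 4-day governance window, with the 5% limit applied to each report independently:

$$
1 - (1 - 0.05)^4 = 1 - 0.8145 \approx 18.5\%
$$

$$
4 \times 0.749\% = 2.996\% < 3\% = 100\% - 97\%
$$

The second line adds the daily tolerance linearly. Compounding it instead gives $1 - (1 - 0.00749)^4 \approx 2.96\%$, so the linear figure is a slightly conservative upper bound.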

Slashing concealment

In case of massive slashing, sophisticated actors may prolong turbo mode and withdraw all their funds to avoid slashing consequences. ZK Oracle can reduce the effect of such an attack in a doomsday scenario.

The idea is that even with corrupted oracles, a CLBalance decrease can trigger a sanity check. Once the decrease exceeds 5.75% (sanity check limit + maxCLBalanceError), the oracle report will revert in any case and withdrawals will stop:

  • If regular Oracle provides the correct CLBalance - a sanity check will be triggered
  • If regular Oracle provides artificial CLBalance - CLBalanceMismatch error would be triggered

With the current sanity check, even if all Lido validators were slashed, the CLBalance decrease would still be less than 5%, but the decrease from midterm slashing could reach that level. Thus, withdrawals could be artificially prolonged for only 18 days; after that, a sanity check or a CLBalanceMismatch error would revert all reports.

Conclusion

We propose to set maxCLBalanceError at 0.749%:

  • This value does not allow liquidations to be artificially triggered on the DEX market through TVL manipulation
  • In case of a DoS attack, it is high enough to be sure that adding such an amount of ETH to Lido's withdrawal credentials is a malicious action rather than human error