This document addresses the issues that currently block the connection of permissionless modules to the Staking Router and proposes solutions to them.
The document assumes the reader is already familiar with the current design of the Deposit Security Module, the key submitting process, and key vetting.
With the imminent introduction of new permissionless modules, the question arises whether the current vetting of keys through the DAO is compatible with the permissionless model of these modules, as it requires explicit approval from governance for each operator.
The current deposit data vetting process has a vulnerability described in the issue. By exploiting it, a malicious node operator can replace keys that are about to be marked as vetted with invalid keys.
The problem becomes more acute with the appearance of permissionless modules. In a curated set, operators risk their reputation and offboarding from the set, while such levers do not exist for permissionless operators. Thus, the current approach is not applicable for new modules and should be reconsidered.
With the upcoming introduction of permissionless modules, the question of revising the module pause arises. The current design entails pausing the module at any attempt of a front-run attack. For modules with permissionless entry, this opens the possibility for an arbitrary actor to trigger a pause on the module and stop deposits for all operators in the module. The problem motivates a reevaluation of the current approach and handling such attack attempts at the operator level, rather than the module.
It is proposed to modify the current deposit flow by adding a key unvetting process, allowing vetting of keys through different actors, and changing the conditions for pausing modules. In brief, the proposed process can be described as follows:
It is proposed to assign the responsibility for vetting keys for some modules to Council Daemons, and also to add a new actor to DSM, responsible for aggregating vetting messages and delivering them. The new processes and actors are shown in black in the figure.
Not implemented in the first version due to lack of modules with this need
The current process of key vetting in Curated modules is difficult to change. Although the `stakingLimit` has been split into `vettedKeys` and `targetLimit`, operators still use vetting not only for the key validation process but also for active key management. However, vetting through governance does not suit permissionless modules, therefore it is proposed to split the vetting process across different actors for different modules.
For curated modules, governance can still act as the vetting actor, while for permissionless modules the actor is a security tool performing the necessary checks. The Lido protocol already has infrastructure for securing deposits, the DSM, to which it is proposed to assign this role.
DSM can take on the role of vetting keys in the module, provided that the module supports the necessary interface and the necessary roles are granted to the DSM contract.
It is proposed that the Council Daemon iterates through modules that support the vetting interface, retrieves data for each of the active operators, checks for the presence of unvetted keys and, if any are found, performs the following checks: validation of signatures, no duplicates, no intersections with previously deposited keys. The checks are detailed in the Checks section.
If all checks are successfully passed, the key is considered valid and can be used for deposit. After passing the checks, one of the following scenarios should be executed:
In case the check fails, the Council Daemon sends a message to the Data Bus about the found invalid keys and the reason they failed the check. Messages are grouped by module, operator, and type of failed check. Based on such messages, it is proposed to build monitoring, in addition to the existing frontend.
Note: The daemon should cache invalid keys and ignore them in further check cycles to avoid cluttering the Data Bus. Caching should include the complete Deposit Data, not just the public key.
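The caching rule in the note above can be illustrated with a minimal sketch. The `DepositData` record and the `InvalidKeyCache` class are hypothetical names introduced here for illustration, not part of the Council Daemon code base. The cache is keyed on the complete deposit data, presumably so that a key re-submitted with a different signature is checked again rather than silently skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepositData:
    """Hypothetical record: the complete deposit data of one submitted key."""
    pubkey: bytes
    signature: bytes


class InvalidKeyCache:
    """Remembers deposit data that has already been reported as invalid."""

    def __init__(self) -> None:
        self._seen = set()

    def should_report(self, module_id: int, operator_id: int, data: DepositData) -> bool:
        """Return True only the first time this exact deposit data is seen."""
        entry = (module_id, operator_id, data)
        if entry in self._seen:
            return False  # already reported, do not clutter the Data Bus
        self._seen.add(entry)
        return True
```

With this keying, the same public key with a new signature produces a new cache entry and is reported (or vetted) on its own merits.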
In case the check is passed, the Council Daemon sends a message to the Data Bus about keys that have passed the check and can be marked as vetted. Vetting for operators in one module can be grouped into one transaction.
Vetting bot aggregates vetting key messages from different Council Daemons from the Data Bus and forms a quorum. After that, it forms and sends a transaction to the DSM contract. It is proposed that the DSM contract has a new method `vetOperatorsKeys` for this, and the corresponding role is granted to the Staking Router contract:
The contract performs onchain checks:

- `nonce` and `depositRoot` match the current onchain values to ensure the contract states have not changed
- `blockHash` matches the onchain block hash at `blockNumber` to ensure no reorganization has occurred

After passing the onchain DSM checks, the contract calls the `increaseStakingModuleVettedKeysCountByNodeOperator` method on the Staking Router contract:
Council Daemon can group reports by operators of one module. The number of operators in one transaction is limited by the parameter `maxOperatorsPerVetting`, and the report for each module goes through a separate transaction.
Messages for vetting keys are invalidated with a change of `nonce`, `deposit_root`, or after 255 blocks, which is the limit for obtaining a historical `block_hash` onchain.
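The validity conditions for a vetting message can be sketched offchain as follows. This is a simplified Python model, not the DSM contract code: `VettingMessage`, `ChainState`, and `is_message_valid` are hypothetical names, and quorum signature verification is deliberately left out.

```python
from dataclasses import dataclass

# The text gives 255 blocks as the limit for reading a historical block hash.
MAX_BLOCK_AGE = 255


@dataclass(frozen=True)
class VettingMessage:
    """Hypothetical shape of a signed vetting message."""
    block_number: int
    block_hash: str
    deposit_root: str
    nonce: int


@dataclass(frozen=True)
class ChainState:
    """Hypothetical snapshot of the onchain values the checks compare against."""
    current_block: int
    deposit_root: str
    module_nonce: int
    block_hashes: dict  # block number -> canonical block hash


def is_message_valid(msg: VettingMessage, chain: ChainState) -> bool:
    """Return True if the message still matches the current chain state."""
    if msg.nonce != chain.module_nonce:
        return False  # keys in the module changed since signing
    if msg.deposit_root != chain.deposit_root:
        return False  # the deposit contract changed since signing
    age = chain.current_block - msg.block_number
    if age < 1 or age > MAX_BLOCK_AGE:
        return False  # the block hash cannot be read onchain
    # A mismatch means the signed block is not part of the canonical chain.
    return chain.block_hashes.get(msg.block_number) == msg.block_hash
```

A message that fails any of these conditions has to be re-signed by the Council Daemons against the new state.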
`nonce`.

Different modules may have their keys vetted by different actors, including vetting that may be optimistic, where uploaded keys are immediately considered vetted. Keys may become invalid over time in case of a front-run attack attempt. Therefore, it's important to introduce a mechanism for unvetting keys, which will be uniform and required for all modules.
It is proposed to grant the role of decreasing the number of vetted keys to DSM and to modify the `IStakingModule` interface that must be supported by every module. Thus, unvetting occurs through the DSM contract, which calls the corresponding method on the module side through the Staking Router.
Council Daemon monitors the state of keys in all modules as well as new deposits in the deposit contract and performs the following checks over keys in the deposit queue for each contract change: validation of signatures, no duplicates, no intersections with previously deposited keys. The checks are detailed in the Checks section.
In case the Council Daemon finds an invalid key in the deposit queue (even for modules which have not opted into using the DSM for vetting, but instead rely on the governance-based approach), it sends a transaction for unvetting keys to the DSM contract.
It also sends a signed message to the Data Bus, which can be used by the Vetting bot to perform the transaction if the Guardian's balance is not sufficient to carry it out.
A signed intent is invalidated when the module's `nonce` changes or once the block hash of the signed block is no longer retrievable onchain.
Council Daemon can group reports by operators of one module. The number of operators in one transaction is limited by the parameter `maxOperatorsPerUnvetting`, and the report for each module goes through a separate transaction.
Since Council Daemons will have to send unvetting transactions almost simultaneously, it is proposed to have an early exit in the unvetting method upon checking the module's `nonce` to prevent unnecessary gas expenses.
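The early exit can be illustrated with a toy model. `ModuleStub` and `unvet` are hypothetical names, and the sketch assumes that a successful unvetting itself changes the module nonce, so that identical transactions from other Council Daemons arrive with a stale nonce and return without doing any work.

```python
class ModuleStub:
    """Hypothetical stand-in for a staking module that tracks a nonce."""

    def __init__(self) -> None:
        self.nonce = 0
        self.vetted = {0: 10}  # operator id -> vetted keys count

    def decrease_vetted(self, operator_id: int, new_count: int) -> None:
        self.vetted[operator_id] = min(self.vetted[operator_id], new_count)
        self.nonce += 1  # assumption: any key operation changes the nonce


def unvet(module: ModuleStub, signed_nonce: int, operator_id: int, new_count: int) -> bool:
    """Apply the unvetting unless the module state has already moved on."""
    if module.nonce != signed_nonce:
        return False  # early exit: another daemon's transaction landed first
    module.decrease_vetted(operator_id, new_count)
    return True
```

In this model the first transaction does the work and later duplicates stop at the nonce comparison.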
A quorum is not formed, and only one Council Daemon is enough to perform the unvetting of keys. The same principle that underlies the current DSM design for pausing remains – one honest guardian should be enough to prevent collusion of other guardians. Any unvetting of keys is treated as an incident and is thoroughly investigated to exclude operator censorship by guardians.
The design assumes that vetting within a module can be implemented differently, including being optimistic. In this case, it is assumed that keys upon being submitted to the module are marked as vetted. Thus, the pointer to the total number of keys is synchronized with the pointer to the number of vetted keys of the operator, as long as there are no issues with the keys. Synchronization implies that when new keys are submitted, the vetted pointer moves along with the total pointer.
Upon detecting any issues with the keys, deposits in the module go into soft pause, and an unvetting transaction is called on the DSM contract. The DSM contract calls the `decreaseVettedSigningKeysCount` method on the module for a specific operator, shifting the pointer to the first valid key. Desynchronization implies that when new keys are uploaded, the vetted pointer remains in place.
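The synchronization and desynchronization of the two pointers can be modelled in a few lines. The `OperatorKeys` class and its field names are hypothetical, and the sketch reads "the pointer" as a count of keys that remain depositable; an actual module may store this state differently.

```python
from dataclasses import dataclass


@dataclass
class OperatorKeys:
    """Hypothetical per-operator counters in an optimistically vetted module."""
    total: int = 0
    vetted: int = 0
    synced: bool = True  # while True, the vetted pointer follows the total pointer

    def add_keys(self, count: int) -> None:
        self.total += count
        if self.synced:
            self.vetted = self.total  # optimistic vetting on submission

    def decrease_vetted(self, new_vetted_count: int) -> None:
        # Keys at positions >= new_vetted_count leave the deposit queue.
        self.vetted = min(self.vetted, new_vetted_count)
        self.synced = False  # later uploads no longer move the vetted pointer
```

For example, an operator with 5 keys that is unvetted down to 3 and then uploads 2 more keys ends up with `total == 7` and `vetted == 3`.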
After unvetting, the operator loses synchronization between the number of vetted keys and the total number of submitted keys, until some action by the operator restores it. Such action can be any deletion of keys or calling a special method that allows the operator to signal to the module that the issue has been fixed. The choice of approach is left to the discretion of the module, the proposed design only dictates the following constraints:
- `decreaseVettedSigningKeysCount` on the module should shift the vetted keys pointer and exclude all keys to the right of the pointer from the deposit queue.

The flow of deposits itself is proposed to remain unchanged. DSM checks the signed data in the deposit message and, in case of success, calls the corresponding method on the Lido contract.
It is proposed to move the parameters `maxDepositsPerBlock` and `minDepositBlockDistance` from DSM to the Staking Router level. Modules with different properties have different risks when making deposits, so these parameters can be different for different modules. More reasons will be analyzed below in the Guardian Collusion section and in the research doc.
Tests show that backward compatibility remains for both offchain tooling and possible onchain integrations: https://github.com/lidofinance/sr-1.5-compatibility-tests. The responses of the modified methods are correctly decoded by the standard Solidity decoder and the ethers.js library. New bytes in the responses are ignored.
For curated modules, the values are proposed to remain unchanged, while the `maxDepositsPerBlock` value is proposed to be reduced for the Community Staking Module. The proposed values:
Module | `maxDepositsPerBlock` | `minDepositBlockDistance` |
---|---|---|
Curated | 150 | 25 |
Simple DVT | 150 | 25 |
Community Staking | 30 | 25 |
The value is chosen based on research.
It is proposed to add a general limit on the frequency of deposits on the DSM contract side. This way there will be distance between deposits to different modules, similar to deposits within a single module. It is assumed that the deposit frequency checks in `depositBufferedEther` and `canDeposit` methods of the DSM contract use the maximum value of `lastDepositBlock` from the module and DSM contract.
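The combined frequency check can be sketched as a single comparison. The function and parameter names below are hypothetical, and the inclusive `>=` comparison is an assumption of this sketch rather than something stated in the text.

```python
def can_deposit(current_block: int,
                module_last_deposit_block: int,
                dsm_last_deposit_block: int,
                min_deposit_block_distance: int) -> bool:
    """Check the deposit distance against the most recent deposit of either kind."""
    # The later of the two blocks is the binding constraint.
    last_deposit_block = max(module_last_deposit_block, dsm_last_deposit_block)
    return current_block - last_deposit_block >= min_deposit_block_distance
```

With a distance of 25, a module last deposited at block 100 is blocked at block 125 if the DSM recorded a deposit to another module at block 110, and becomes eligible again at block 135.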
- `maxDepositsPerBlock` and `minDepositBlockDistance` for each module.
- `maxDepositsPerBlock` is reduced for the Community Staking Module.

Council Daemon performs key checks each iteration cycle before making a deposit. In case any problems with keys in the module are detected, the Council Daemon goes into soft pause mode – it stops signing deposit messages for the selected module until the problem is resolved. At the same time, either unvetting of keys or a complete module pause should occur.
The emergence of new modules with permissionless entry, where the number of node operators is unlimited and the node operators themselves are unknown, imposes new constraints on the design of the deposit pause. Malicious behavior of one of the operators in such a module should not negatively affect the rest of the participants.
It is proposed to pause deposits to all modules only when a front-run has already occurred, while an attempt to steal users' ETH is mitigated by removing the keys from the deposit queue (see Unvetting section). Thus, protection against attempted theft becomes more targeted and directed at a specific operator, rather than the entire module. The deposit pause remains as the next layer of defense and should trigger in the event of a front-run, which would mean a collusion of guardians or unforeseen circumstances.
It is proposed that the deposit pause will be applied to all modules at once. Otherwise, colluding guardians could execute theft from each module individually. The proposed design no longer allows an operator to trigger a pause, which eliminates the need to isolate pauses by modules and implements an approach of a universal deposit pause, reducing the risks of a guardian collusion attack.
The risks of false positives remain, and in this case, the impact will be higher since the pause will affect deposits in all modules. However, the impact on the protocol in the event of guardian collusion remains significantly higher.
Consider the worst-case scenario of a false positive. The calculations do not take many factors into account, but they allow us to estimate the order of magnitude. `150,000 ETH` (the daily stake limit) would not be deposited daily over `3 days` (the response time of governance). This corresponds to the launch of `4687 validators per day`. With the average rewards of one validator at `0.0034 ETH per day`, this would lead to total losses over 3 days of `96 ETH = 16 ETH + 32 ETH + 48 ETH`.

Based on the current protocol earnings of `~1000 ETH per day`, this would result in a lost profit for the protocol over 3 days of `3.2% = 96 ETH / (3 * 1000 ETH)`.

Considering that the protocol reached 10 million ETH of TVL in approximately 4 years, the average daily stake is `~7k` ETH (10 million / 365 / 4), and according to statistics for the last 6 months, this figure is 8k ETH per day. This is roughly `20 times` less than the limit considered in the worst-case scenario. This allows us to assume that the real figures will be an order of magnitude less than the worst-case scenario.
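The worst-case estimate can be reproduced step by step. The constant names are introduced here for illustration; the 32 ETH per validator figure is the standard deposit size implied by the ratio 150,000 / 4687, and the daily loss is rounded to a whole ETH as in the text.

```python
DAILY_STAKE_LIMIT_ETH = 150_000
ETH_PER_VALIDATOR = 32
REWARD_PER_VALIDATOR_PER_DAY_ETH = 0.0034
PAUSE_DAYS = 3
PROTOCOL_EARNINGS_PER_DAY_ETH = 1_000
RECENT_DAILY_STAKE_ETH = 8_000

# Validators that would have been launched on each day of the pause.
validators_per_day = DAILY_STAKE_LIMIT_ETH // ETH_PER_VALIDATOR  # 4687

# Rewards missed per day by one day's worth of validators, rounded to whole ETH.
daily_loss_eth = round(validators_per_day * REWARD_PER_VALIDATOR_PER_DAY_ETH)  # 16

# On day n, the validators missed on days 1..n are all not earning.
total_loss_eth = sum(daily_loss_eth * day for day in range(1, PAUSE_DAYS + 1))  # 96

# Share of protocol earnings lost over the pause.
lost_share = total_loss_eth / (PAUSE_DAYS * PROTOCOL_EARNINGS_PER_DAY_ETH)  # 0.032

# How far the worst-case limit is from the recent real inflow.
limit_to_recent_ratio = DAILY_STAKE_LIMIT_ETH / RECENT_DAILY_STAKE_ETH  # 18.75
```

The last ratio, 18.75, is what the text rounds to "20 times".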
It is proposed that the process of unpausing is left to governance.
Council Daemon spends ETH on unvetting and pause operations. Since these operations are critical, it is absolutely necessary to monitor the balance of each Daemon. For this purpose, it is proposed to introduce 2 thresholds and organize monitoring and alerting for them:
Depositor Bot and Pause Bot have a unified code base and are launched in one infrastructure under one private key. It is proposed to implement Vetting Bot in the same code base and launch it alongside.
The replenishment of the balance of all DSM actors: Council Daemon, Depositor, Vetting, and Pause Bots is proposed to be assigned to the Gas Supply Committee.
When vetting keys, as well as when changing the state of modules or the deposit contract, Council Daemons perform checks over keys, ensuring that keys can be safely used for deposit.
Deposit Data consists of a public key and signature over the deposit message. In this message, withdrawal credentials, deposit amount, domain are included. Council Daemon reconstructs the expected message and checks that the signed message matches the expected one.
Council Daemon checks that the public keys in the investigated deposit data are not duplicates relative to Lido keys: previously deposited, in the deposit queue, or not yet checked. The main task of this check is to distinguish original keys from duplicates. To make duplicates easier to find, it is proposed to use the existing `SigningKeyAdded` event from the Node Operators Registry contract and introduce it in the `IStakingModule` interface:
Let's consider possible scenarios and the behavior algorithm in them. Note that the state of keys can be different at the time of check (all or some of the keys can be vetted or unvetted). From DSM's side, the reaction or its absence is assumed, which will lead to the required state.
Duplicates at one operator in one module. The key with the lowest index is considered the original, so all keys up to the first duplicate are considered valid.
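The single-operator rule amounts to finding the length of the duplicate-free prefix of the operator's key list. A minimal sketch, with a hypothetical function name and public keys represented as strings:

```python
def valid_prefix_length(pubkeys: list) -> int:
    """Number of keys before the first repeated public key."""
    seen = set()
    for index, pubkey in enumerate(pubkeys):
        if pubkey in seen:
            return index  # first duplicate: everything before it stays valid
        seen.add(pubkey)
    return len(pubkeys)
```

For the key list `["a", "b", "a", "c"]` the function returns 2: keys at indices 0 and 1 are valid, and the vetted pointer would not move past the duplicate at index 2.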
Duplicates between different operators. This case is the same for operators in one module and for operators in different modules. The original key is considered the one that was uploaded earlier. For this, the offchain part receives events for each key by operator and public key. The earliest uploaded key is considered the original.
Since deposit data can be deleted and re-submitted, leading to multiple `SigningKeyAdded` events for one key for one operator, the event of addition is considered the earliest one.
There can be an attempt to front-run the key submission transaction. In this case it is difficult to determine who was first, so it is proposed to unvet the entire set of duplicates. Ordering by log index does not help, because a malicious actor can back-run the transaction instead.
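The cross-operator rules above can be sketched as follows. The `KeyAdded` tuple is a hypothetical decoded form of the `SigningKeyAdded` event, and the sketch assumes that a front-run shows up as two operators adding the same key in the same block, in which case no original is chosen and the whole set of duplicates is a candidate for unvetting.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class KeyAdded(NamedTuple):
    """Hypothetical decoded SigningKeyAdded event."""
    block_number: int
    module_id: int
    operator_id: int
    pubkey: str


def find_original_owners(events: list) -> dict:
    """Map each pubkey to its original (module_id, operator_id), or None on a tie."""
    # For one operator and one key, only the earliest addition counts.
    first_block = {}
    for e in events:
        claim = (e.pubkey, (e.module_id, e.operator_id))
        first_block[claim] = min(first_block.get(claim, e.block_number), e.block_number)

    claims_by_pubkey = defaultdict(list)
    for (pubkey, owner), block in first_block.items():
        claims_by_pubkey[pubkey].append((block, owner))

    owners = {}
    for pubkey, claims in claims_by_pubkey.items():
        earliest = min(block for block, _ in claims)
        winners = [owner for block, owner in claims if block == earliest]
        original: Optional[tuple] = winners[0] if len(winners) == 1 else None
        owners[pubkey] = original
    return owners
```

A `None` result marks a key for which ordering within a block would be needed to pick a winner, which is exactly the case the text proposes not to resolve by log index.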
The attack makes little economic sense, provided that the impact is limited to vetting the last submitted keys of the operator. However, in case attempts of such attacks are identified, it is proposed to mitigate the problem using private mempools. An operator facing such a problem should be able to delete the key and submit a new one through a private mempool. A stricter mitigation may include checking some signed message by the validator's private key, but such a solution increases the chances of compromising the validator's private key and is not recommended without extreme necessity.
It is important for modules to consider the features of the attack through duplicates in their design, to limit attacks by operators from other modules or by operators within the module. The impact on the operator due to the unvetting of keys should be limited. In the ideal case – limited to invalidation of one attacked key, or a batch of the last submitted keys.
The Council Daemon checks that the public keys in the modules have not been previously deposited directly through the Deposit Contract with different withdrawal credentials from Lido.
Signatures from Deposit Events are validated and invalid ones are rejected. Such deposits are ignored on the Consensus Layer side. Filtering out such deposits prevents censoring of the deposit queue by depositing 1 ETH (the minimum deposit size in the deposit contract) to the pubkey of the attacked operator.
Deposit Events containing deposits to Lido Withdrawal Credentials are ignored and do not block deposits to keys in the queue unless they have been previously deposited through Lido. Such a key can be deposited by Lido without any consequences. Once the validator is activated, the donated ETH will be skimmed to the withdrawal credentials contract.
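The filtering of Deposit Contract events described in the last three paragraphs can be sketched as below. `DepositEvent` and `front_run_pubkeys` are hypothetical names, the BLS signature check is represented by a precomputed flag rather than performed, and the case of keys already deposited through Lido is left to the duplicates check and not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepositEvent:
    """Hypothetical decoded Deposit Contract event."""
    pubkey: str
    withdrawal_credentials: str
    signature_valid: bool  # result of an offchain signature check, not modelled


def front_run_pubkeys(events: list, lido_wc: str) -> set:
    """Pubkeys already deposited with a valid signature and non-Lido credentials."""
    blocked = set()
    for e in events:
        if not e.signature_valid:
            continue  # ignored by the Consensus Layer, so it cannot block a key
        if e.withdrawal_credentials == lido_wc:
            continue  # per the text, such a key can still be deposited by Lido
        blocked.add(e.pubkey)
    return blocked
```

Any key in a module whose public key appears in the returned set would fail the check and become a candidate for unvetting.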
In the event of Withdrawal Credentials change, it is assumed that all submitted but not yet deposited keys must be unvetted. Such an operation implies significant changes that need to be made both in the offchain tooling and in the state of the contracts. Therefore, the operation is expected to be coordinated. Nonetheless, it is assumed that the Council Daemon reads the Withdrawal Credentials from the Staking Router contract and uses it to verify signatures. If the Withdrawal Credentials are changed but there remain modules with vetted keys whose signatures are made for old Withdrawal Credentials, this will lead to the regular unvetting of all such keys.
Vetting of keys is used solely for verifying the keys' suitability for deposits and is limited to the checks described in the Checks section. Other conditions limiting deposits on operators must be separately enforced by the module in the contract code. Such conditions may include, for example, the presence of stuck keys, a set target limit, or an insufficient bond.
DSM monitors and performs vetting and unvetting operations only for modules in the Active status; other statuses prohibit deposits at the smart contract level. Deactivated operators are also ignored.
When vetting and unvetting for operators in batches, the Council Daemon must sort the array of operators by index from smallest to largest.
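A minimal sketch of preparing operator batches for one module is shown below. The function name is hypothetical; deduplicating the ids and splitting an oversized report into several transactions are assumptions of this sketch, with the batch size standing in for `maxOperatorsPerVetting` or `maxOperatorsPerUnvetting`.

```python
def build_batches(operator_ids: list, max_operators_per_tx: int) -> list:
    """Sort operator ids ascending and split them into per-transaction batches."""
    if max_operators_per_tx <= 0:
        raise ValueError("max_operators_per_tx must be positive")
    ordered = sorted(set(operator_ids))  # ascending order, no repeated operators
    return [
        ordered[start:start + max_operators_per_tx]
        for start in range(0, len(ordered), max_operators_per_tx)
    ]
```

For example, `build_batches([7, 2, 9, 2, 4], 2)` yields `[[2, 4], [7, 9]]`.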
With the emergence of several modules, the possibility arises for operators of one module to influence the state of operators of another module using the protocol property that deposited keys must be unique. For example, an operator from module A can upload an existing key from module B and the protocol should do something about it. This problem is also valid for operators within one module and becomes more acute due to the appearance of permissionless modules, where the protocol has fewer levers on operators.
Mitigation involves more careful identification of original keys and reporting of duplicates, as described in the Checks section.
With the emergence of permissionless modules, it is necessary to reconsider the collusion scenario, as the ability to introduce permissionless modules and modules with FIFO can alter the attack patterns.
Consider a potential scenario:
At least one honest guardian mitigates the potential attack by:
DSM has limits on the frequency of deposits and the number of deposits at a time. Current values: 25 blocks between transactions and 150 keys for curated and 30 keys for CSM at a time. Thus, at least one honest guardian has a sufficient time window to react, and the amount of funds that can be stolen, in the case of guardian collusion, is limited.
In summary: the attack through permissionless modules with FIFO becomes easier than the attack through a curated module. At the same time, the attack becomes more expensive because a bond is required for each validator.
Different properties of modules, changing the conditions of the attack, are proposed to be mitigated by separate parameters `maxDepositsPerBlock` and `minDepositBlockDistance` for each module. Details can be found in the Deposit section.
It is also proposed to mitigate the one-time attack damage by pausing all modules. Details can be found in the Pause section.
Module operators may censor DSM transactions by front-running them with any operations on keys that change the module's `nonce`. The cost of the attack is high, since it requires a key operation transaction in each block, and the impact is limited to deferred deposits. Mitigation of the attack involves having levers over operators in curated modules and having fees for some operations, such as key deletion, in permissionless modules.
The goal of the changes is to unlock the addition of new modules, such as CSM to the Staking Router, so the proposed changes are limited to the minimally necessary set. Other improvements that could be useful are intentionally left out of scope but can be worked on separately outside the main scope.
The current setup of offchain tools supports RabbitMQ and Kafka as data buses, requiring a centralized server. Transitioning to a decentralized data bus solution reduces legal and infrastructure failure risks and allows anyone to make deposits or pause deposits using signed messages from a publicly accessible data bus.
This improvement is separate from the main scope and can be done in parallel. It will be covered in a separate document.
This improvement guarantees that there will be no onchain deposits to the same key. However, it does not fully solve the problem of duplicates, and all checks remain necessary. Storing the root of all previously deposited keys will allow making checks in ZK Oracle cheaper.
This improvement is separate from the main scope and can be done in parallel. It will be covered in a separate document.
The implementation of EIP-2537 in Ethereum could improve the procedure for checking signatures of submitted deposit data and guarantee onchain the absence of invalid signatures. Nevertheless, these changes are out of scope, as the delivery of changes to DSM is planned before the Prague/Electra hardfork, which might include the BLS precompiles.
It is assumed that the next iteration of DSM should include research on the integration of EIP-2537.
The implementation in Ethereum of EIP-7251 may require a significant change in the deposit process and accounting: revising the handling of duplicates, accounting of keys in modules, accounting of deposited validators in the Lido contract, etc. The proposed DSM design does not consider possible changes in the protocol that would be required in case this EIP is implemented.
The emergence of DVT based modules may lead to a reconsideration of the protocol's relationship with operators, as the technology implies the relationship of several operators to one validator. The proposed DSM design does not consider possible changes in the protocol due to the emergence of DVT based modules.
`nonce` remains (Critical-02). Mitigated by code audit of modules.