CSM Strikes

This doc is closed!

Please comment on and refer to https://hackmd.io/@lido/csm-v2-internal instead.

Preface

One of the challenges with permissionless systems is to ensure the appropriate behavior of the participants. Simply speaking, they should follow the rules. Should they violate the rules, a response should follow.

There are several potential avenues for rule violations in CSM:

MEV stealing (using the wrong feeRecipient)
Incorrect MEV configuration (disallowed relay, vanilla block > 0.07 ETH)
Stuck validator
Bad performance for a long time (below threshold, offline)
CSM queue pollution
Intentional block proposal or sync committee miss
Slashing

Let's look at each of the violations and the response measures that are currently available.

MEV stealing

The existing CSM version covers this violation extensively. The CSM Committee detects and reports cases of MEV stealing. During the negotiation period, the Node Operator can compensate for the stolen funds plus an additional fine. If the funds are not compensated, Lido DAO confirms the penalty via an Easy Track motion, and the penalty is burned from the Node Operator's bond. Any validators that become unbonded as a result are requested to exit by VEBO. If they do not exit, they are marked as stuck, and no Node Operator rewards are distributed until they have exited. Once EIP-7002 is implemented at the Lido protocol level, stuck validators will be forcefully ejected.

Current response measures ensure:

Compensation of the stolen funds for the stETH holders;
Request for exit (and ejection after EIP-7002) of the unbonded validators for the Node Operators who committed stealing;
If the stolen amount exceeds the available bond, all validators are requested to exit;

Given that, no additional measures are required for this violation.

Incorrect MEV configuration

This violation currently lacks any response measures other than notification of the operators (if they have provided contact information).

This situation is suboptimal and can be improved by a policy update that sets explicit conditions under which Node Operators are penalized for the violation. The addition might read: "In case of repeated violations of the Block Proposals SNOP, a penalty might be reported and Node Operators can be penalized 0.1 ETH (additional stealing fine)". This penalty could be reported by the existing CSM Committee using the same mechanism as in the previous section.

Stuck validator

The term "stuck validator" refers to a validator that has not exited after an exit request by VEBO. Currently, if a Node Operator has stuck validators, that Node Operator's rewards are canceled. After EIP-7002, these validators can be ejected, and a corresponding penalty can be applied.

No additional measures are required here.

Bad performance for a long time

CSM does not currently cover this type of violation. A new strike system is proposed as a response measure. The system itself is described below.

CSM queue pollution

In a nutshell, this case stands for uploading multiple validator keys and then deleting them, effectively creating invalid deposit queue items in the CSM queue.

Currently, CSM charges keyDeletionFee for each deleted key, making this attack economically unfeasible. keyDeletionFee is transferred to the Lido DAO treasury, allowing for compensation of the associated maintenance costs (cleanDepositQueue method calls).

No additional measures are required here.

Intentional block proposal or sync committee miss

Intentional or not, missed block proposals and sync committee duties are not currently accounted for by the performance oracle. However, in the updated CSM Performance Oracle version, block proposals and attestations during sync committee participation will be included in the performance metric.

No additional measures are required here.

Slashing

Slashing is probably the most significant violation that CSM validators can commit. Luckily, this violation is extensively covered by the current CSM code. Any losses caused by slashing are compensated upon validator withdrawal reporting.

The initial slashing penalty will be reduced with EIP-7251 from 1 ETH to 1/128 ETH for the 32 ETH validators. This reduction will be reflected in the bond reduction for the CSM Operators. From the code perspective, separate slashing reporting will be removed, and all penalties caused by slashing will be compensated upon validator withdrawal reporting (the difference between the withdrawal balance and 32 ETH will be burned from the Node Operator's bond).
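The figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the consensus-spec quotients `MIN_SLASHING_PENALTY_QUOTIENT_BELLATRIX = 32` and `MIN_SLASHING_PENALTY_QUOTIENT_ELECTRA = 4096`; the helper names are hypothetical and are not part of the CSM code.

```python
GWEI_PER_ETH = 10**9
DEPOSIT_SIZE_GWEI = 32 * GWEI_PER_ETH

# Consensus-spec quotients (assumed here; verify against the spec version in use).
MIN_SLASHING_PENALTY_QUOTIENT_BELLATRIX = 32
MIN_SLASHING_PENALTY_QUOTIENT_ELECTRA = 4096


def initial_slashing_penalty(effective_balance_gwei: int, quotient: int) -> int:
    """Initial slashing penalty in Gwei: effective balance divided by the quotient."""
    return effective_balance_gwei // quotient


def bond_burn_on_withdrawal(withdrawal_balance_gwei: int) -> int:
    """Hypothetical helper: shortfall below 32 ETH that is burned from the bond."""
    return max(0, DEPOSIT_SIZE_GWEI - withdrawal_balance_gwei)


before = initial_slashing_penalty(DEPOSIT_SIZE_GWEI, MIN_SLASHING_PENALTY_QUOTIENT_BELLATRIX)
after = initial_slashing_penalty(DEPOSIT_SIZE_GWEI, MIN_SLASHING_PENALTY_QUOTIENT_ELECTRA)
print(before / GWEI_PER_ETH)  # 1.0
print(after / GWEI_PER_ETH)   # 0.0078125, i.e. 1/128 ETH
```

Under this model a validator that withdraws with 31.5 ETH would have 0.5 ETH burned from the bond, regardless of whether the loss came from the initial penalty or from later penalties.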

No additional measures are required here.

Summary

Given the above, the only violation lacking response measures is "Bad performance for a long time". It is proposed to introduce a strike system to ensure sufficient response to the violation.

Strike system goals

Protect protocol from systematic bad performers while keeping performance leeway;
Disincentivise systematic bad performance;
Do not create additional operational costs for the protocol;

General description

One of the unresolved issues in the current version of CSM is the ejection of bad-performing validators. Although these validators will not earn the Node Operator's reward, the bond rebases will persist, and such validators will negatively impact the overall LoE protocol APR. Hence, the cost of a theoretical attack on the LoE protocol is lower than for other permissionless protocols.

As described in the document attached to the CSM Architecture, one optimal way to tackle the issue is to introduce a bad-performance strikes system into CSM.

Strikes assignment

It is proposed to have a single actor responsible for assigning performance strikes: the CSM Performance Oracle.

Once in a frame, CSM Performance Oracle delivers an additional tree root with information about "strikes" for the validators. A strike means that the validator performed below the threshold in this frame. When updating this tree, CSM Performance Oracle considers the previous values from the old tree. All strikes older than 6 months are dropped.

Strikes tree leaves have the form {noID, validatorPubkey, [strikeTimestamps]}.
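The per-frame update described above can be sketched off-chain as follows. This is an illustration only, not the oracle's actual code. It assumes that "6 months" is approximated as 180 days, that validators without live strikes are omitted from the tree, and that leaves are keyed by (noID, pubkey). `StrikesLeaf` and `update_strikes` are hypothetical names, and Merkle tree construction is left out.

```python
from dataclasses import dataclass

# Assumption: "6 months" approximated as 180 days; the real lifetime is a protocol parameter.
STRIKE_LIFETIME_SECONDS = 180 * 24 * 60 * 60


@dataclass(frozen=True)
class StrikesLeaf:
    no_id: int
    validator_pubkey: str
    strike_timestamps: tuple = ()


def update_strikes(old_leaves: dict, below_threshold: set, frame_timestamp: int) -> dict:
    """Build the new leaves from the previous tree and this frame's bad performers.

    old_leaves: {(no_id, pubkey): StrikesLeaf} taken from the previous tree
    below_threshold: {(no_id, pubkey)} of validators below the threshold in this frame
    """
    cutoff = frame_timestamp - STRIKE_LIFETIME_SECONDS
    new_leaves = {}
    for key in old_leaves.keys() | below_threshold:
        old = old_leaves.get(key)
        previous = old.strike_timestamps if old else ()
        # Drop strikes older than the lifetime, keep the rest.
        kept = tuple(t for t in previous if t >= cutoff)
        if key in below_threshold:
            kept += (frame_timestamp,)
        if kept:  # validators with no live strikes are left out of the tree
            new_leaves[key] = StrikesLeaf(key[0], key[1], kept)
    return new_leaves
```

For example, a validator with strikes at day 0 and day 100 that performs badly again at day 200 ends up with two live strikes (day 100 and day 200), because the day 0 strike has expired.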

The main reason for assigning strikes to validators and not Node Operators is to maintain consistency in the performance measurements. Currently, CSM Performance Oracle considers validators' performance individually. Hence, strikes should also be a validator property to ensure precise ejections of the bad-performing validators.

It is crucial to note that strikes are not a penalty but an indicator of bad performance that should be considered by the Node Operators as a signal to improve their performance.

Fixed bad performance fine

In the initial proposal, the term "performance tax" was used. However, this value might be challenging to calculate accurately. It seems reasonable to rename it to "bad performance fine" and make it a fixed value that is confiscated from the Node Operator's bond should their validators be ejected after accumulating enough strikes.

It is proposed to have a fixed configurable value for the "bad performance fine", with the Lido DAO being the actor to set/update the actual value should the network conditions change. This approach allows Lido protocol to keep "bad performance fine" up to date.

Ejection due to strikes

Once a validator's number of strikes reaches 3 (3 strikes in 6 months), anyone can call a permissionless method that triggers the validator's exit and confiscates a "bad performance fine" from the Node Operator's bond.

Node Operator key indices in the CSM keys storage might change under the new optimistic vetting approach (a deleted key is swapped with the last key in the keys storage). Therefore, the caller must provide the key's current index in the Node Operator's storage to the permissionless method, which checks that the key in the leaf and the key in storage are identical.

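function ejectBadPerformingValidator(uint64 noId, bytes32[] calldata proof, uint256 keyIndex, bytes32 strikesData) external {
	// Read the key currently stored at keyIndex (indices can shift when keys are deleted)
	bytes memory validatorKey = getNoKey(noId, keyIndex);
	// Ensure the stored key is the one the strikes leaf refers to
	checkKey(proof, validatorKey);
	// Require enough strikes for ejection
	assertStrikesCount(strikesData);
	// Verify the leaf against the strikes tree root delivered by the CSM Performance Oracle
	checkProof(proof, strikesData);
	// Trigger the exit and charge the Node Operator's bond
	requestEjection(validatorKey);
	confiscateEjectionFee(noId, strikesData);
	confiscateBadPerfPenalty(noId, strikesData);
}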

Since validator ejection via EIP-7002 comes at a price, this price should be confiscated from the Node Operator's bond and transferred to the Lido DAO treasury to cover the corresponding operational expenses.
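The key-index check and the bond accounting can be modelled off-chain to see why the caller has to pass the current index. This is a minimal Python sketch under stated assumptions, not the contract logic. `NodeOperator`, `remove_key` and `eject_bad_performing_validator` are hypothetical names, the fee values are placeholders, capping the charge at the available bond and rejecting a repeated ejection are assumptions, and Merkle proof verification is omitted.

```python
from dataclasses import dataclass, field

STRIKES_THRESHOLD = 3  # from the proposal: 3 strikes within the strike lifetime


@dataclass
class NodeOperator:
    keys: list
    bond: int  # arbitrary units for this sketch
    exit_requested: set = field(default_factory=set)


def remove_key(operator: NodeOperator, key_index: int) -> None:
    """Swap-and-pop deletion: the last key takes the deleted key's slot."""
    operator.keys[key_index] = operator.keys[-1]
    operator.keys.pop()


def eject_bad_performing_validator(operator, key_index, leaf_pubkey, strike_timestamps,
                                   ejection_fee, bad_performance_fine):
    """Model of the permissionless call; returns the amount taken from the bond."""
    if operator.keys[key_index] != leaf_pubkey:
        raise ValueError("key at index does not match the strikes leaf")
    if len(strike_timestamps) < STRIKES_THRESHOLD:
        raise ValueError("not enough strikes")
    if leaf_pubkey in operator.exit_requested:
        raise ValueError("ejection already requested")  # assumption: charge only once
    operator.exit_requested.add(leaf_pubkey)
    # Assumption: the charge is capped at the available bond.
    charged = min(operator.bond, ejection_fee + bad_performance_fine)
    operator.bond -= charged
    return charged
```

In this model, deleting the key at index 0 of `["0xaa", "0xbb", "0xcc"]` moves `0xcc` from index 2 to index 0, so a call that still uses the old index for `0xcc` is rejected while a call with index 0 succeeds.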

Exiting earlier than getting the third strike

The Node Operator may decide to exit their validator before it gets a third strike, which allows them to avoid confiscation of the "bad performance fine".

It is crucial to note that all direct losses will be confiscated, and no staking rewards will be distributed during frames with poor performance anyway.

Given the strike system's goal of "Protecting protocol from systematic bad performers while keeping performance leeway," it is reasonable to allow bad-performing validators to voluntarily leave the protocol, effectively reducing the number of bad-performing validators in the protocol.

Also, accounting for the already assigned strikes upon validator withdrawal would require a direct connection between the withdrawal reporting process and CSM Performance Oracle, effectively making exits permissioned and heavily dependent on the CSM Performance Oracle operation.

Possible attack vectors

Exiting after 2 strikes and joining back

This attack vector might seem very harmful to the protocol since it basically allows for indefinite bad performance without any response measures from the protocol side.

However, the devil is in the detail. To make a noticeable impact on the protocol's effectiveness, malicious actors would need to control a significant portion of the protocol's validators (> 1%). If all of these validators perform below the threshold for 2 Performance Oracle frames, they all get strikes assigned and must be voluntarily exited before the third report. Once exited, they need to join again. To do so, they are placed at the very end of the CSM deposit queue and must wait to be deposited before the attack can be repeated. In real-world conditions, this process might take significant time or might not even be feasible, since the validators ahead of them in the queue will be deposited first.

Hence, the described attack will most likely become less effective with each cycle, assuming the attacker is not the only user of CSM.
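The queueing argument can be illustrated with a toy FIFO model. This is a sketch with made-up numbers, not a model of the real deposit queue. It assumes a constant number of deposits per frame, whereas real throughput depends on stake inflow and allocation, and the function name is hypothetical.

```python
from collections import deque


def frames_until_fully_deposited(queue, owner, deposits_per_frame):
    """Frames needed until all of `owner`'s keys are deposited from a FIFO queue.

    queue: iterable of owner labels, one entry per key, head of the queue first.
    """
    if deposits_per_frame <= 0:
        raise ValueError("deposits_per_frame must be positive")
    pending = deque(queue)
    frames = 0
    while owner in pending:
        for _ in range(min(deposits_per_frame, len(pending))):
            pending.popleft()
        frames += 1
    return frames


# Hypothetical numbers: 300 honest keys already queued, attacker re-joins with 100 keys.
rejoin = ["honest"] * 300 + ["attacker"] * 100
print(frames_until_fully_deposited(rejoin, "attacker", 50))  # 8
```

With the same throughput, an attacker at the head of the queue would be fully deposited in 2 frames, so every re-entry behind other operators lengthens the period during which the attacker's stake is idle.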

Bad performance for 2 periods within each 6 periods

This case is less of an attack and more of a regular bad-performance situation. A validator can indeed perform below the threshold for 2 periods within every 6 periods and fully avoid additional penalization. Given the required 4 periods of good performance within each 6 periods, this case is less invasive than the one described above.
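The claim can be checked with a sliding-window count. This is a minimal sketch, assuming a strike stays live for 6 consecutive frames including the frame in which it was issued (that is, one frame per month); the function name is hypothetical.

```python
def max_strikes_in_window(below_threshold, window):
    """Largest number of bad frames within any `window` consecutive frames."""
    starts = range(max(1, len(below_threshold) - window + 1))
    return max(sum(below_threshold[i:i + window]) for i in starts)


# Two bad frames followed by four good ones, repeated for four cycles.
two_of_six = [True, True, False, False, False, False] * 4
print(max_strikes_in_window(two_of_six, 6))  # 2, never reaches the ejection threshold of 3

# Three bad frames in a row do reach the threshold.
three_of_six = [True, True, True, False, False, False] * 4
print(max_strikes_in_window(three_of_six, 6))  # 3
```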

One possible solution here might be making strikes valid indefinitely. However, this will not have a meaningful effect since validators can be exited after 2 strikes and then created again, as described above.

It is proposed that the risks associated with this attack vector be accepted due to the low probability of it ever happening, the low impact on the protocol, and the lack of economic reasons to perform this attack. At the end of the day, it is way more profitable to perform well and benefit from the CSM capital multiplier.