---
title: Notes from slashing/NPoS session
tags: parity, staking, npos, FRAME
---

# Notes from Slashing in NPoS session

## Parsed notes

There were a few main topics discussed:

- How to decrease the complexity and the state updates/reads when reverting a slash
  - re-nominate nominators to validators that had a slash reverted
  - a slash requires a lot of state reads and updates for nominators; an immediate slash makes these operations very expensive, and they are liable to be reverted in the future;
- Slashing spectrum based on offense criticality: consider handling different types of slashing offenses through separate flows
  - critical slashes will necessarily be expensive, to ensure security and fairness;
  - non-critical offenses may be handled more leniently without considerably impacting security
- Immediate vs lazy slashing and chilling
  - **Idea 1**: when an offence is reported, the validator is "blocked" (cannot be elected) and the nominations are kept.
    - the problem with this approach is that slashed validators will be fully backed by their nominators again once they are unblocked;
    - the solution is to apply this logic only to minor offences
  - **Idea 2**: keep a slashed balance per nominator
    - let nominators nominate only if their `nominated_balance/slashed_balance` ratio is above a minimum
    - if the ratio is too low, block/chill the nominator for security reasons
  - **Idea 3**: Governance controlling the granularity of the slash reversal

Distinguishing between different types of slashing according to how critical the offenses are could allow us to better balance the security, complexity and performance tradeoffs of performing slashing. For example, slashes associated with non-critical misbehavior (e.g. [level 1-2 offenses](https://wiki.polkadot.network/docs/learn-staking#slashing)) could apply a reduced set of state updates, at some expense of security. Critical slashes, on the other hand, would be more expensive but ensure maximum security of the network (hopefully these are less frequent).

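
The cheap non-critical path (Idea 1: block the validator, keep the nominations), the expensive critical path and the Idea 2 ratio check can be sketched as a toy model in Rust. Everything here is a hypothetical illustration, not FRAME's `pallet-staking` API: `ToyStaking`, `Severity`, `on_offence`, `revert_non_critical`, `may_nominate` and `min_ratio` are made-up names, storage is modelled with in-memory maps, `writes` counts storage writes as a rough stand-in for weight, and balance deduction is not modelled at all.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical aliases; real chains use runtime-configured types.
type AccountId = u32;
type Balance = u128;

/// Hypothetical two-way split of offences (e.g. level 1-2 offences are non-critical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    NonCritical,
    Critical,
}

/// Toy staking state held in memory. `writes` counts storage writes as a rough
/// stand-in for weight; balance deduction is not modelled.
#[derive(Debug, Default)]
struct ToyStaking {
    /// validator -> nominators currently backing it
    nominations: BTreeMap<AccountId, Vec<AccountId>>,
    /// validators that cannot be elected for now (Idea 1)
    blocked: BTreeSet<AccountId>,
    /// nominators that have been chilled
    chilled: BTreeSet<AccountId>,
    writes: usize,
}

impl ToyStaking {
    fn on_offence(&mut self, validator: AccountId, severity: Severity) {
        // Both paths block the validator from being elected: one write.
        self.blocked.insert(validator);
        self.writes += 1;

        match severity {
            // Non-critical: nominations are kept as-is, so the cost is constant.
            Severity::NonCritical => {}
            // Critical: the validator loses its nominations and every backing
            // nominator is chilled, so the cost grows with the nominator count.
            Severity::Critical => {
                if let Some(nominators) = self.nominations.remove(&validator) {
                    self.writes += 1;
                    for nominator in nominators {
                        self.chilled.insert(nominator);
                        self.writes += 1;
                    }
                }
            }
        }
    }

    /// Reverting a non-critical slash only unblocks the validator, because its
    /// nominations were never removed.
    fn revert_non_critical(&mut self, validator: AccountId) {
        if self.blocked.remove(&validator) {
            self.writes += 1;
        }
    }
}

/// Idea 2: allow nominating only if nominated / slashed >= min_ratio.
/// Written as a multiplication to avoid dividing by a zero slashed balance.
fn may_nominate(nominated: Balance, slashed: Balance, min_ratio: Balance) -> bool {
    nominated >= slashed.saturating_mul(min_ratio)
}
```

Reverting a critical slash is deliberately left out of the sketch: it would have to restore every removed nomination and un-chill every nominator, which is exactly the expensive path the notes want to reserve for critical offences.
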

Differences between critical and non-critical slashes:

- **non-critical slashes**:
  - nominators do not necessarily need to be chilled;
  - validators keep their nominations;
  - cheap state r/w;
- **critical slashes**:
  - nominators should also be chilled;
  - validators lose their nominations (and are perhaps blacklisted?);
  - expensive state r/w (hopefully less frequent);

---

## Raw notes

Engineering retreat 31st/Oct/22

**Slashing**

- The slashing system is not scalable yet and perhaps needs to be rethought from scratch
- A few open problems:
  - (main 1): we require all slashes to be applied immediately (both validator and nominators)
    - the number of nominators is bounded, so that's good (but still not optimal, as the number of nominators may be large)
  - Lazy slashing: could this be an option? Especially for small slashes spread over many parties
  - Slashing deferral: should we get rid of slash deferral? Its point is to allow reversing the slash, but that can be done through other governance mechanisms.
- 2 phases in slashing:
  - Slash is reported
  - Slash is enacted: this is the tricky part, since the chain has to update state over many entities (validator and nominators). Slashing is a state-intensive operation.
- How about reducing the number of nominators to reduce the cost of slash enactment? Perhaps not a big saving, and it may affect the security of staking. This may be a good short-term solution, but it would be good to have a future-proof approach to unbounding the number of nominators (and to how that affects slashing costs).
  - Nomination pools will reduce the strain on the max number of nominators per validator.
  - In nomination pools, the slashing of nominators is lazy (through subpools, the points-to-balance ratio, etc.)
- Idea: kick nominators out if they are badly slashed? (similar to what's done with validators at this point). Slashed nominators are not chilled, but perhaps should be.
- The ideal chilling scenario depends on the type of slashing.
  - In principle, validators should be disabled if 100% of the slash is enacted (or reported?).
  - Currently, chilling is implicit (the state is re-read every 24h).
  - Should we chill on an economic slash or on a non-economic slash? And what about timing: should the chill happen when the slash is reported?
  - When considering chilling, the security of the system may depend on how serious the slash is (unclear at the moment) (Ali)
  - Perhaps we should not give users too many options; chilling is there to protect validators (Jeff)
- Taking a step back: what is the point of slashing?
  - Remove bad actors/misconfigured nodes from the system
  - Get security from taking the stake of bad actors
  - Wasting people's time should not be a slash, but it should be a financial penalty (unless the behaviour keeps happening over time; if that's the case, the validator should be disabled and not shown to nominators) (Jeff)
- Suggestion (Kian): move the chilling to when the slash is enacted, not when it is reported.
  - Problem: in this case, we need to make sure the slash is not due to very serious issues (Ali)
- Should we ever defer slashes?
  - Practical reason: misconfigurations and bugs happen, so slashes are deferred to protect validators in case of bugs, etc.
- Ideal solution (Kian): we do what we do now, but better: when the slash is reported, the validator loses its nominators, but if the slash is rolled back, we give the nominators back to the validator.
  - We don't chill nominators; we only remove the validator. Ali: that's not good.
- The slashing protocol was designed to be fair to all parties. Simplifying the code is possible if the initial assumptions don't hold true in reality.
- Implementation issue: currently, the code that fetches slashing records reads a lot of data from the state of all slashes. That storage item is not cleaned/pruned, so complexity increases over time.
  - Even if slashing spans go away, governance needs to be able to apply slashes back in time.
  - The principle of slashing spans is to make sure we can go back in time and learn the amount of slash that should be applied to a nominator. Slashes may be reported out of order, so we need to keep the state of nominations over time and persist it (Jeff)
- Current bottlenecks:
  - Chilling
  - Deducting balances when a slash is enacted
  - The counter is an expensive fetch
- Currently, if someone gets slashed (ignoring the chilling details/process) and the slash gets cancelled, the validators need to go back and find their nominators again. This sucks, and users complain about it. Regardless of whether the validator is chilled, it should be able to get its nominations back after the slash is reverted (Kian).
- ⭐ Appealing in terms of simplicity: should we just block a validator for a period of time after slashing, without touching the nominators? (Alfonso) The problem is that slashed validators, when coming back, will be fully backed by their nominators again (they most likely won't unbond).
  - This makes reversal easier: if the slash is reversed, nominators do not need to come back and re-nominate.
  - The chain doesn't need to do a lot of work.
  - 2 cases:
    - The validator is slashed, and the validator needs to start again
    - If the slash gets reversed, there's no cost to re-nominate the nominators.
  - How about cases where we slash a tiny amount? (Ali) In the current implementation, there's no difference between large and small slashes.
  - Fixes the historical problem (the threshold of when chilling happens does not need to be computed); no need to care about the validators
  - Blacklisting: keys are filtered so they cannot validate. Bringing back validators (and nominators) is just a question of un-blacklisting a key.
  - If this is fixed, we probably don't have any other fundamental performance bottleneck in staking.
  - This is perhaps not very secure for very large slashes (see points below).
- ⭐ When there is a big security slash, we should take away nominators' ability to nominate for a while (Ali)
  - This differs from the approach of only chilling/blocking nominators. (Ali): not enough; we also want to slash/disable nominators that are backing *very bad* validators, since they are probably part of the attack.
  - This may be a dealbreaker for the optimization mentioned above.
- ⭐ How about differentiating between large and small slashes, and acting accordingly? (optimised for non-critical slashes, on-chain heavy for critical slashes to keep security)
- Data structure: an on-chain proof that allows the slash to be reconstructed on chain (long term)

2 conflicting but good ideas:

1. Simple slashing (does not work for critical slashes). However, the slash logic can be one of two:
   - Beefy (heavyweight) slash for critical slashes
   - Simple slash for non-critical slashes
2. There's a spectrum based on how critical the slashes are:
   1. If the slash is minor, do not touch the nominators
   2. If the slash is medium, block nominators.
   3. If the slash is critical, block the validator and nominators

- We don't have the underlying chilling behaviour that Jeff was thinking about.
- Should nominators take responsibility to un-nominate if their votes are slashed?
- Repeated slashes are most likely the result of bugs.
- ⭐ Another simple version (Jeff)
  - Apply slashes immediately and record them
  - Increment something in the nominator record that tracks how much it has been slashed
  - When we check the nominator, force the nominator to explicitly nominate again if they have a history of large slashes in the past
  - Use a "slash balance" counter. If the slash balance exceeds a certain % of the nomination bond, that should affect the nominator.
  - Problem: keeping the slash counter in sync, and keeping the slash balance non-negative.
- Governance can control the granularity of the slash reversal, which could be useful (why?)
- Original vision: protect nominators.
- Should we pursue the current changes? In the short term we don't want to filter nominators; it would be better to defer that to when the slash is applied.
  - Instead of reading all the slashing records, un-nominate (remove the nominators from the validator) whenever the slash is applied.
  - If a validator gets slashed and the slash is reversed, it keeps its nominators. The problem is the big slashes.
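
Jeff's "slash balance" variant can be sketched in Rust as below. This is a minimal sketch under stated assumptions, not `pallet-staking` code: `SlashLedger`, `NominatorRecord`, `threshold_percent` and all method names are hypothetical; the threshold compares the slash balance against the *current* bond; and re-nominating resets the counter. The check runs lazily when nominations are read, so no extra chilling or un-nominating pass is needed when the slash is enacted.

```rust
use std::collections::BTreeMap;

// Hypothetical aliases; real chains use runtime-configured types.
type AccountId = u32;
type Balance = u128;

/// Hypothetical nominator record, not the pallet-staking storage layout.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
struct NominatorRecord {
    /// Currently bonded stake.
    bonded: Balance,
    /// "Slash balance": total slashed since the nominator last (re)nominated.
    slash_balance: Balance,
}

struct SlashLedger {
    records: BTreeMap<AccountId, NominatorRecord>,
    /// Hypothetical parameter: slash balance, as a % of the current bond, above
    /// which the nominator's nominations stop counting until it re-nominates.
    threshold_percent: Balance,
}

impl SlashLedger {
    fn new(threshold_percent: Balance) -> Self {
        Self { records: BTreeMap::new(), threshold_percent }
    }

    fn bond(&mut self, who: AccountId, amount: Balance) {
        let rec = self.records.entry(who).or_default();
        rec.bonded = rec.bonded.saturating_add(amount);
    }

    /// Apply the slash immediately and record it in the counter.
    fn apply_slash(&mut self, who: AccountId, amount: Balance) {
        if let Some(rec) = self.records.get_mut(&who) {
            let taken = amount.min(rec.bonded);
            rec.bonded -= taken;
            rec.slash_balance = rec.slash_balance.saturating_add(taken);
        }
    }

    /// Governance reversal: refund the stake and decrement the counter. The
    /// counter is clamped at zero because it may already have been reset by a
    /// re-nomination, and `Balance` is unsigned.
    fn reverse_slash(&mut self, who: AccountId, amount: Balance) {
        if let Some(rec) = self.records.get_mut(&who) {
            rec.bonded = rec.bonded.saturating_add(amount);
            rec.slash_balance = rec.slash_balance.saturating_sub(amount);
        }
    }

    /// Checked lazily when nominations are read, instead of chilling or
    /// un-nominating everyone when the slash is enacted.
    fn nominations_active(&self, who: AccountId) -> bool {
        self.records.get(&who).map_or(false, |rec| {
            rec.slash_balance.saturating_mul(100)
                <= rec.bonded.saturating_mul(self.threshold_percent)
        })
    }

    /// The nominator explicitly nominates again, acknowledging its slash history.
    fn renominate(&mut self, who: AccountId) {
        if let Some(rec) = self.records.get_mut(&who) {
            rec.slash_balance = 0;
        }
    }
}
```

Because the check is recomputed from the counter rather than stored as a flag, a governance reversal that brings the slash balance back under the threshold re-activates the nominations without the nominator doing anything, which matches Kian's request above. The reset in `renominate` is where the sync problem from the notes shows up: without the `saturating_sub` clamp, a reversal arriving after a re-nomination would underflow the unsigned counter.
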