# Wrong Slashing in Council
## Problem Description
During a session with @apopiak and @guillaume to debug the migration in https://github.com/paritytech/substrate/pull/7040/files, we found a bug in the `elections-phragmen` crate. The PR intended to migrate to a recorded deposit system in which, for every voter, we store the amount that we have reserved for them as a deposit.
Before doing a cowboy move and just inserting this data into the chain, we decided to check a simple assertion: all accounts that are voters must have `VotingBond` reserved. Using `remote-externalities` to test this, we quickly found that on both Kusama and Polkadot a handful of accounts do not satisfy this criterion.
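The assertion itself is simple enough to sketch in a few lines. The snippet below is not the `remote-externalities` test we ran; it is a minimal illustration that assumes the relevant state has already been read into a hypothetical `AccountSnapshot` type, and the helper name `voters_missing_bond` is made up for this note.

```rust
/// Hypothetical snapshot of one account, as it might be read from chain state.
#[derive(Debug, Clone)]
pub struct AccountSnapshot {
    pub id: String,
    pub is_voter: bool,
    /// Total reserved balance of the account (all pallets combined).
    pub reserved: u128,
}

/// Returns the ids of all voters whose total reserved balance is below
/// `voting_bond`. Non-voters are ignored.
pub fn voters_missing_bond(accounts: &[AccountSnapshot], voting_bond: u128) -> Vec<String> {
    accounts
        .iter()
        .filter(|a| a.is_voter && a.reserved < voting_bond)
        .map(|a| a.id.clone())
        .collect()
}
```

Note that this check only looks at the total reserved balance, so it can miss accounts whose voting bond was slashed but who also hold reserves from other modules.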
A bit more investigation revealed that the root of the issue is in the `elections-phragmen` pallet. Before explaining the issue, let's explain the correct behavior.
In the `elections-phragmen` pallet, each candidate has a life cycle with 3 possible stages, namely `Candidate`, `Member` or `Runner-up`. First, accounts can `submit_candidacy()`, and the pallet will reserve `CandidacyBond`. If the account makes it into the `Members` or `Runners-up` list, then this bond is kept. Otherwise, a losing candidate always loses their bond (`slash_reserved()`). Moreover, members and runners-up must also `renounce_candidacy()` in time to keep their bond: if they lose their spot in an election round (i.e. no longer hold any role), they also lose their bond as a consequence.
All in all, there are ***two valid*** slash scenarios:
1. A candidate who loses.
2. A member/runner-up who loses their spot in **both** sets.
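The two scenarios can be written down as a small decision function. This is only a sketch of the intended rule, not the pallet's code; the `Role` and `Outcome` enums and the `should_slash` function are hypothetical names introduced for illustration.

```rust
/// Role held by an account going into an election round (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Candidate,
    Member,
    RunnerUp,
}

/// What the account ends up as after the round (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Member,
    RunnerUp,
    NoRole,
}

/// Returns true only in the two valid slash scenarios.
pub fn should_slash(before: Role, after: Outcome) -> bool {
    match (before, after) {
        // Scenario 1: a candidate who loses.
        (Role::Candidate, Outcome::NoRole) => true,
        // Scenario 2: a member/runner-up who is in neither set afterwards.
        (Role::Member, Outcome::NoRole) | (Role::RunnerUp, Outcome::NoRole) => true,
        // Anyone who still holds a role keeps their bond, including
        // accounts that merely move between the two sets.
        (_, Outcome::Member) | (_, Outcome::RunnerUp) => false,
    }
}
```

In particular, `should_slash(Role::Member, Outcome::RunnerUp)` is `false`: moving between the two sets is not a reason to slash.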
With this, let's demonstrate the issues:
1. Due to an API confusion, we sometimes miscalculated the outgoing runners-up, thus slashing them prematurely. https://github.com/paritytech/substrate/pull/7384
2. Moreover, we mistakenly slashed members and runners-up when they moved between the roles (e.g. a member gets downgraded to a runner-up). https://github.com/paritytech/substrate/pull/7394
Fixing both of these was fairly simple. Orders of magnitude more interesting and challenging was identifying the affected accounts and figuring out how much had been wrongly slashed from each of them.
This is not *just* a matter of lost tokens. It is also corrupt state, because the reserving mechanism does not distinguish between amounts reserved by different modules. In other words, if multiple modules have reserved some tokens, then `elections-phragmen` could eat into the amounts reserved by the others and slash them.
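To make the corruption concrete, here is a deliberately simplified model of an account with a single, undifferentiated reserved balance. It is not the real balances implementation; the `Account` type and its methods are assumptions that only mimic the behavior described above, and the amounts are arbitrary.

```rust
/// Simplified account with one shared reserved balance (hypothetical model).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Account {
    pub free: u128,
    pub reserved: u128,
}

impl Account {
    /// Moves `amount` from free to reserved. Returns false and changes
    /// nothing if the free balance is insufficient.
    pub fn reserve(&mut self, amount: u128) -> bool {
        if self.free < amount {
            return false;
        }
        self.free -= amount;
        self.reserved += amount;
        true
    }

    /// Slashes up to `amount` from reserved, regardless of which module
    /// reserved it. Returns the amount actually slashed.
    pub fn slash_reserved(&mut self, amount: u128) -> u128 {
        let slashed = amount.min(self.reserved);
        self.reserved -= slashed;
        slashed
    }

    /// Moves up to `amount` from reserved back to free. Returns the amount
    /// actually moved, which is 0 if nothing is left in reserve.
    pub fn unreserve(&mut self, amount: u128) -> u128 {
        let moved = amount.min(self.reserved);
        self.reserved -= moved;
        self.free += moved;
        moved
    }
}
```

If an account reserves 100 for a candidacy bond and 50 for some other module, a wrong slash of 100 followed by a second slash of 100 removes the other module's 50 as well, and that module's later `unreserve(50)` moves nothing.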
## State Recovery
The state recovery uses two tools:
1. Rust code that re-creates the elections state and runs both the correct and the faulty code.
2. JavaScript code that does the same by just observing the state and recreating parts of the faulty code in JavaScript.
The goal of the state recovery is to determine: **How much money was effectively slashed from each account?**
We first applied the Rust method to Polkadot. This method is more accurate, but it is almost impossible to use on Kusama, because the runtime source code there has changed many, many times.
Then, we developed the JavaScript method and double-checked its result against the Rust method on Polkadot. Once it was verified to be correct, we used only the JavaScript method to recover Kusama.
### Initial Results.
The following CSV-formatted snippet shows the effective slash in Polkadot and Kusama. The Polkadot accounts come first; the Kusama accounts follow the `Kusama` line.
```
12xGDBh6zSBc3D98Jhw9jgUVsK8jiwGWHaPTK21Pgb7PJyPn,1402990000000
167rjWHghVwBJ52mz8sNkqr5bKu5vpchbc9CBoieBhVX714h,1000000000000
15MUBwP6dyVw5CXF9PjSSv7SdXQuDSwjX86v1kBodCSWVR7c,2050000000000
12Vv2LsLCvPKiXdoVGa3QSs2FMF8zx2c8CPTWwLAwfYSFVS1,4252580000000
128qRiVjxU3TuT37tg7AX99zwqfPtj2t4nDKUv9Dvi5wzxuF,1266880000000
15akrup6APpRegG1TtWkYVuWHYc37tJ8XPN61vCuHQUi65Mx,1407820000000
15aKvwRqGVAwuBMaogtQXhuz9EQqUWsZJSAzomyb5xYwgBXA,1452990000000
14mSXQeHpF8NT1tMKu87tAbNDNjm7q9qh8hYa7BY2toNUkTo,1452990000000
13Gdmw7xZQVbVoojUCwnW2usEikF2a71y7aocbgZcptUtiX9,1202580000000
15BQUqtqhmqJPyvvEH5GYyWffXWKuAgoSUHuG1UeNdb8oDNT,1804170000000
13pdp6ALhYkfEBqBM98ztL2Xhv4MTkm9rZ9vyjyXSdirJHx6,2806820000000
1RG5T6zGY4XovW75mTgpH6Bx7Y6uwwMmPToMCJSdMwdm4EW,1603640000000
12Y8b4C9ar162cBgycxYgxxHG7cLVs8gre9Y5xeMjW3izqer,1202910000000
12xG1Bn4421hUQAxKwZd9WSxZCJQwJBbwr6aZ4ZxvuR7A1Ao,1000000000000
14krbTSTJv3aaT1VeBRX7CzoV4crr3adeF3KutdpkCttrxsZ,1000000000000
1hJdgnAPSjfuHZFHzcorPnFvekSHihK9jdNPWHXgeuL7zaJ,1252580000000
12mP4sjCfKbDyMRAEyLpkeHeoYtS5USY4x34n9NMwQrcEyoh,1202580000000
1rwgen2jqJNNg7DpUA4jBvMjyepgiFKLLm3Bwt8pKQYP8Xf,1252580000000
1dGsgLgFez7gt5WjX2FYzNCJtaCjGG6W9dA42d9cHngDYGg,1607240000000
1WG3jyNqniQMRZGQUc7QD2kVLT8hkRPGMSqAb5XYQM1UDxN,1252580000000
Kusama
CanLB42xJughpTRC1vXStUryjWYkE679emign1af47QnAQC,1060000000000
CcKPhXSyZgATZD1wVaRsSk81UfLcQvyuuS2i9FNhsoeQeWr,1060000000000
CdEm1ErGKML3waXabLvn3NyqdAGXBQJVngLaM86YM5Yb9dr,1060000000000
DMF8a34emwapz9mV5P5PTDcghh1ZR3miH9ad9mHzfAUMSXU,1176666666666
DTLcUu92NoQw4gg6VmNgXeYQiNywDhfYMQBPYg2Y1W6AkJF,1060000000000
DaCSCEQBRmMaBLRQQ5y7swdtfRzjcsewVgCCmngeigwLiax,12060000000000
DfiSM1qqP11ECaekbA64L2ENcsWEpGk8df8wf1LAfV2sBd4,4060000000000
ET9SkhNZhY7KT474vkCEJtAjbgJdaqAGW4beeeUJyDQ3SnA,2060000000000
FcxNWVy5RESDsErjwyZmPCW6Z8Y3fbfLzmou34YZTrbcraL,3010000000000
GcqKn3HHodwcFc3Pg3Evcbc43m7qJNMiMv744e5WMSS7TGn,4059999999996
Gth5jQA6v9EFbpqSPgXcsvpGSrbTdWwmBADnqa36ptjs5m5,11070000000000
HUewJvzVuEeyaxH2vx9XiyAPKrpu1Zj5r5Pi9VrGiBVty7q,1060000000000
FcjmeNzPk3vgdENm1rHeiMCxFK96beUoi2kb59FmCoZtkGF,1060000000000
ELharp1RuLgHUE6436sWSpW6XB2BqhaSq9L1Wexs83aodZ6,1050000000000
FX7ar9G6Lm3KXt431UHfcEETU9NpAyoT4MtCDVuFq6WgdfV,1050000000000
H9eSvWe34vQDJAWckeTHWSqSChRat8bgKHG39GC1fjvEm7y,6160000000000
D9rwRxuG8xm8TZf5tgkbPxhhTJK5frCJU9wvp59VRjcMkUf,1060000000000
Dtj5dNTPxX6UNfz7DvB3wKUiWcYpjmcDBnjWyYEShgBbtnQ,1060000000000
EGVQCe73TpFyAZx5uKfE1222XfkT3BSKozjgcqzLBnc5eYo,12060000000000
GJ7JBaqx4ys5LrkECQKmEWdjGgAoXChsLepjEqdHuKggymK,1000000000000
DbF59HrqrrPh9L2Fi4EBd7gn4xFUSXmrE6zyMzf3pETXLvg,3393333333332
ESred3TPEQhyNBuqYLoKrRQdq6msiX7EC14h3fDx3yDvt65,1000000000000
CzJfMJpzjyjPXbW1wTjFxq3v8aAXpSxwNnuR6oTRCgZehMS,1000000000000
E35K8FV3K1vn1wJfkjUZcDnG8mmcXVre4LiDZxKWDdjtaVE,11050000000000
G7eJUS1A7CdcRb2Y3zEDvfAJrM1QtacgG6mPD1RsPTJXxPQ,3000000000000
HgiRxRWYPoEF5RTAcP8tn5vJ5JoFNW7BZC4ootwmnSQvTpp,1000000000000
HFG4FvoJv8uanizzetS1tPA6wigNAiKuEHKcm1NaKNNDwve,8333333333332
DokiayXWoMvotzchNdLSH4P4Fe7EvMESZvZL4Fn3NekoFtf,4000000000000
FXkJddo2bypz4EyikYZiCLcYv31afnPQkCzt7H6Zn6u6YXZ,5000000000000
J6xn7Mr8pfed6gvvRPZ8HEEb89RCwheTBtxymg9Xw36hUUS,2000000000000
FnoYEw7vhMb5uLDaqjYZ84viAAt4nM7S9sD2oV9owZJTsc4,11000000000000
Hjuii5eGVttxjAqQrPLVN3atxBDXPc4hNpXF6cPhbwzvtis,1000000000000
DWUAQt9zcpnQt5dT48NwWbJuxQ78vKRK9PRkHDkGDn9TJ1j,1050000000000
HSNBs8VHxcZiqz9NfSQq2YaznTa8BzSvuEWVe4uTihcGiQN,7000000000000
GvyfytrxFQbHK8ZFNT3h12dJPfBXFjVV7k98cXni8VAgjKX,2000000000000
J9nD3s7zssCX7bion1xctAF6xcVexcpy2uwy4jTm9JL8yuK,4000000000000
DiUYgG4o2MZg9SwZcEekbXTeGkkWfzXnkTEXKmRx1xZEVY4,333333333332
GZrYeXSDWXQASq4uhrpKG8rx2ziR6XpLxiNo9Z5tJyZBpQX,333333333332
CpjsLDC1JFyrhm3ftC9Gs4QoyrkHKhZKtK7YqGTRFtTafgp,1833333333326
Etj64GQ5Mzm98HijdnvqjxMyK8xemPtLTpWdnwEiEvFygJa,833333333330
GPA7hzVXcZ22vhHYNu2c74e7KMhR5dJzk9uLfdVsdBrKK3x,666666666664
JKoSyjg9nvVZterFB5XssM7eaYj4Ty6LhCLRGUfy6NKGNC3,166666666666
```
### Next step.
*The big question is then, how to refund the accounts.*
A naive refund strategy would be to refund these accounts directly via a tip, effectively topping up the free balance. But we need to make sure that this will not lead to any further problems. For example, for all accounts that have more than 100 DOTs in effective slash, `elections-phragmen` *could* have slashed reserved tokens from other modules. Therefore, the main dilemma here is:
1. Refund to free balance?
2. Refund to reserved balance?
3. Or something in between?
> While the ***target*** of the refund is subject to debate, the ***source*** is known. All the premature slashes went to the treasury's account. Thus, the treasury should pay for the refund in any case.
To investigate further, we wrote a script that tries to compute the amount that ***should*** be recorded as deposit on chain (most modules record the value on-chain, or it can be inferred from a certain role, e.g. if voter, then `x` tokens reserved) and compares it with the ***actual*** value. We then used the script to inspect the status of each of the affected accounts and to infer how the refund should happen.
After some consideration, we propose the following method:
We look at 3 variables, `should_reserve`, `has_reserve`, and `effective_slash`, for each account. The names are self-explanatory. For all accounts of interest, we compute `missing` as `should_reserve - has_reserve`. Then, we calculate:
1. The amount that should be refunded to reserved balance is equal to `missing`.
2. The amount that should be refunded to free balance is equal to `effective_slash - missing`.
The second point basically means: it is likely that an account reserved some bond in the past, but that bond was slashed by `elections-phragmen` by mistake. Moreover, the user may have already attempted to unreserve these funds, in which case the call to `unreserve()` was a no-op. The second point is essentially a best-effort attempt at compensating for this. We refund the difference `effective_slash - missing` **to free balance** under the *assumption* that it has already been unreserved.
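The split described above can be sketched as follows. This is an illustration of the arithmetic only, not the actual analysis script; the `Refund` type and `compute_refund` function are hypothetical names. The note does not say what happens when `has_reserve` exceeds `should_reserve` or when `missing` exceeds `effective_slash`, so flooring both subtractions at zero is an assumption of this sketch.

```rust
/// How a refund is split between the two balances (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Refund {
    pub to_reserved: u128,
    pub to_free: u128,
}

/// Splits a refund according to the two rules above.
pub fn compute_refund(should_reserve: u128, has_reserve: u128, effective_slash: u128) -> Refund {
    // missing = should_reserve - has_reserve.
    // Assumption: floored at zero if the account holds more than expected.
    let missing = should_reserve.saturating_sub(has_reserve);
    Refund {
        // Rule 1: the reserved balance is topped up by `missing`.
        to_reserved: missing,
        // Rule 2: the rest of the slash goes to the free balance.
        // Assumption: floored at zero if `missing` exceeds the slash.
        to_free: effective_slash.saturating_sub(missing),
    }
}
```

For example, with `should_reserve = 150`, `has_reserve = 50` and `effective_slash = 300`, the account gets 100 back into its reserved balance and 200 into its free balance.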
[This spreadsheet](https://docs.google.com/spreadsheets/d/1dHch86flQ-yXw-BJcnjWLqND2nxOcp4p-hppcuD_DqA/edit?usp=sharing) shows this entire process in one page for both chains.
![](https://i.imgur.com/xxdLzT6.png)
## Conclusion
Although the above approach is more accurate, it has two drawbacks, one major and one minor:
1. The major drawback is that for (anonymous) proxy and multisig accounts, we have not yet been able to accurately determine the should-be reserved amount. Therefore, the above approach will not be fully accurate anyway.
2. It is not sufficient to compute these values offline and apply them in the next upgrade. All of the calculations need to be translated to Rust code, audited, and re-run at the time of the next upgrade or whenever the fix is applied. This increases the complexity of the patch and is itself a potential source of error.
Therefore, all in all, for now we propose a refund to the free balance of the accounts via `Tips`. Once all modules move to a recorded deposit system, and our abilities to compute the should-be reserved balance of accounts improve, we can double check and fix the reserved balance of all accounts, including the affected ones here.
## How to re-create.
Sadly, the setup to re-create these tests is not trivial. Here's a brief guide:
1. Clone and place this commit of substrate-debug-kit next to your substrate folder: https://github.com/paritytech/substrate-debug-kit/commit/4bf6f078c2fe5c7f9b237e3df01b1aac2c0da075. The resulting layout should look like this:
```
.
| -- /substrate
| -- /substrate-debug-kit
```
2. Check out the `kiz-fix-faulty-election-1` branch in your substrate folder. This contains both the correct and the faulty code.
3. To get the list of all election blocks, go to the `js` folder and change the wrapped async function at the top level to call `await findElections(api)`. The output should be comma-separated tuples.
4. Paste this output into the `let elections = vec![ ... ]` in `src/main.rs`.
5. Run the main Rust script. This will calculate `effective_slash` for all accounts.
6. Run the JS script again, but this time change the main function to call `await computeRefund(api)`. This will do the final analysis and output a comma-separated table.